Ready for Discovery, held back by user concerns
If you were to try it, you’d find yourself in back-and-forth conversations with the chatbot, not just by voice but also on video, where the AI assistant can see what’s in front of you. ‘Multimodal’ AI is a powerful step and opens up new possibilities. Yet it remains underrated and underused. A host of very human concerns prevents its enthusiastic uptake.
Each GenAI app uses its Live mode slightly differently. Gemini is the most multi-featured and has an interesting screen sharing option; ChatGPT specializes in warm, friendly interaction; Perplexity uses the camera outside of its own app; Grok has AI modes with different personalities. Microsoft’s Copilot is built into Windows and focuses on workplace tasks. And Meta doesn’t have a Live mode in the app, but connects to smart glasses to be a wearable AI assistant.
New possibilities
Adding video to the voice interaction opens up entirely new possibilities. Some use cases are obvious. Anyone with a visual impairment can hold up the device with video Live mode on, ask questions about what lies ahead, and navigate an unfamiliar space. You can point the camera at a place, a store, or an object and ask questions.
The tech isn’t flawless. It often gives long, chatty explanations when what you want is short, quick guidance. I asked about obstacles on a staircase, but by the time it was done with its elaborate description, I was already up the stairs and gone.
Video can identify places, objects, and even people if their pictures are stored in memory for this purpose. You can then get information by asking questions, such as the historical background of a site.
Not every chat assistant is equally skilled. I had Perplexity look at my laptop and tell me what it saw, but it just could not, although it did think there was a “little device with some numbers under it” in front of me. It also saw a window that doesn’t exist. That kind of error could be dangerous in some scenarios.
If you show the Live AI something you’re trying to do, such as repairing an appliance, cooking something from a new recipe, or assembling an item of furniture, the assistant will answer questions or lead you through the process step by step. It’s also able to switch languages on the fly, so the implications for education and training are immense.
In work situations, customized variants of a chatbot can all but take over chunks of training, onboarding or orientation.
Gemini’s Live mode includes screen sharing that can even guide you through a task on the device itself. And if you attach a file (document, audio, video, image), you can discuss it with Gemini. I showed it a long document I didn’t feel like reading and asked it specific questions about the content, which it answered in mere seconds.
ChatGPT’s Live mode sees everything in detail. I can vouch for it after having recently spilt boiling tea over myself and acquired burn injuries. These were mostly first-degree, so I let ChatGPT guide me through taking care of the injuries for a period of several weeks—a job it did remarkably well, always telling me what to watch out for in case a doctor’s visit was necessary.
Grok does much the same. When I showed its Live mode my laptop, it began to tell me all about the article I was writing. Perplexity, as mentioned, fared much worse with the same test.
Trust deficit
That bit of hallucination leads us to why users are wary of the Live mode. As it stands, public understanding of AI is only in its infancy, as is AI itself. Letting it see one’s personal space and other private details is more than a little frightening. Where does the information go, what is done with it, what happens if it’s breached—all valid questions.
Some users even believe Live means you’re broadcasting to the world, and anyone anywhere can look in. The trust deficit is enormous, leading to a reluctance to try the Live feature.
There’s also no apparent compelling reason to try. What will the Live AI do that my phone doesn’t already manage? If users don’t try it, they never become aware of the potential, and grow warier still. Companies, meanwhile, are investing heavily in Live AI capabilities. If they want users to adopt them, they will have to do more to build both trust and awareness.
The New Normal: The world is at an inflexion point. Artificial intelligence (AI) is set to be as massive a revolution as the Internet has been. The option to just stay away from AI will not be available to most people, as all the tech we use takes the AI route. This column series introduces AI to the non-techie in an easy and relatable way, aiming to demystify and help a user to actually put the technology to good use in everyday life.
Mala Bhargava is most often described as a ‘veteran’ writer who has contributed to several publications in India since 1995. Her domain is personal tech, and she writes to simplify and demystify technology for a non-techie audience.