After spending the last month playing around with Alexa and the Alexa Skills Kit (ASK), testing its potential and experimenting with building a couple of skills, I chose voice UIs as this week's story.
I want to start by saying: “Voice UIs are not THE future … they are A future.” Humans communicate with each other in multiple ways, and for us to communicate efficiently and effectively with a machine, the machine must be able to understand us when we use the same mechanisms we use human-to-human. As for why voice UIs are taking off, it’s because of their immense usefulness. After all, we’ve been imagining what it would be like to control practically everything with our voice ever since TV shows like Star Trek and movies like Iron Man (J.A.R.V.I.S.) showed us the potential, and now that the possibility is becoming reality, we’re quick to make the most of it.
Apart from that, technological breakthroughs are also making it possible to implement VUIs in many places. For example, as this post on designing VUIs explains:
● Web Services, IoT Have Opened Doors: Web services and the Internet of Things provide ready-made opportunities for voice. Sensors and readouts, for example, make for natural smart-home integrations.
● The Science Is Accessible: Now anyone can leverage learnings from fields like automatic speech recognition (ASR), natural language understanding (NLU) and text to speech (TTS).
● The Hardware Can Support the Use Case: Existing hardware can support far-field voice input processing (FFVIP), enabling a wider range of experiences with VUIs.
● AI Is Making VUIs Smarter: Thanks to advances in machine learning, VUIs are learning and adapting to users’ speech patterns, preferences, and contexts over time.
Here are the major reasons why people are enamored with voice as the interface of the future:
- Speed of interaction: it’s way faster to say something than to type it, and that’s arguably the main reason why voice has become so huge. In an age of efficiency, the ability to save time will always be extremely appealing to customers, whether they’re booking a reservation at a restaurant or managing a smart home.
- Access: voice is the natural interface so, by default, no user-friendly site or app can compare with it. The frictionless nature of voice means there are far fewer hoops to jump through in order to begin the user experience. Not that the process that starts with you pulling out your phone and ends with the app loading is soul-crushing by any means, but it’s still more of a hassle than simply saying a few words. And once again, people don’t like hassle, and if they can cut down on it, they’ll be for it big time.
- Natural interaction mode: Human beings use spoken language. It’s one of the defining characteristics of our species. People are very comfortable speaking to interact.
- Increased flexibility: Currently, interacting with a computer is essentially performing a series of steps in a particular order to achieve a given outcome. There is almost no flexibility in how to perform those steps. It is up to the human to learn the computer. The dream with voice interfaces is to remove some of that burden, and indeed have the computer try to figure out what the human is trying to say.
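To make the flexibility point a bit more concrete, here is a toy sketch of the idea that several different phrasings can land on the same intent. Everything in it is an assumption made up for illustration: the intent names, the sample phrases, the `match_intent` helper, and the word-overlap scoring. Real voice platforms rely on trained NLU models rather than anything this simple.

```python
import re

# Hypothetical intents and sample phrasings, invented for this illustration.
INTENTS = {
    "TurnOnLight": [
        "turn on the light",
        "switch the lights on",
        "lights on please",
    ],
    "BookTable": [
        "book a table",
        "reserve a table for dinner",
        "make a restaurant reservation",
    ],
}


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, min_overlap=0.5):
    """Return the intent whose sample phrase shares the most words with the
    utterance (Jaccard similarity), or None if nothing is close enough."""
    words = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for intent, samples in INTENTS.items():
        for sample in samples:
            sample_words = tokenize(sample)
            union = words | sample_words
            if not union:
                continue
            score = len(words & sample_words) / len(union)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= min_overlap else None


# Different wordings, same intent: the user does not have to learn one exact command.
for phrase in ["please switch the lights on", "could you turn the light on"]:
    print(phrase, "->", match_intent(phrase))
```

The point is only that the machine does the work of mapping loose human phrasing onto an action, instead of the human memorizing a fixed sequence of steps.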
Now, as convenient as voice UIs are, they are somewhat limited when it comes to browsing and general discovery. Those tasks are still best handled visually, because we can scan a list of search results more quickly than we can listen to each one, and a screen makes it easier to take in the broader range of possibilities. That’s why a multi-modal voice experience augmented with a screen will likely be the user interface of the future. The best part is that voice is still in the early stages of development, so it’s bound to get better, especially as it moves away from the Q&A principle and becomes more conversational.
For now, voice UIs are undoubtedly a powerful addition to certain interactions, but nothing more.
Because remember, even in Star Trek, when they needed to get shit done quickly, Data just used the keyboard. #Spreadknowledge
Footnotes
Some stuff you can go thru —