Voice interfaces have finally made the jump from science fiction to the realm of current technological possibility, and they are now seeing mass-market adoption.
From the perspective of User Experience (UX) design, this is an opportunity to apply the principles of user-centered design to shape a young technology. And we designers are already on the task, working hard to supplant all earlier ‘primitive’ interfaces with the ‘natural’ one.
But are there any problems with such a focused approach to Voice User Interface (VUI) design?
To answer that, let’s look at two important challenges which we are trying to tackle in VUIs.
1. A voice assistant should sound ‘human’
Voice interfaces today are modeled on human conversational behavior. But is that a case of designers basing their designs on their own mental models?
I am not denying that human conversation influences a user’s perception of a VUI. In fact, many articles outline how significant that influence is. However, it isn’t the only factor shaping the user’s mental model. A user’s perception of an interface is also influenced by GUIs, IVRs (interactive voice response systems), chatbots, and every other interaction they have had with machines.
So, while users expect the freedom to speak to the machine ‘in their own tongue’, they may also rely on the vocabulary they already share with machines (e.g., ‘log out’, ‘check out’). Speaking keywords is cognitively less taxing than formulating complete phrases. So, for first-time users, it may be fine to list the possible commands ‘robotically’ rather than ask an open-ended question and wait for the no-input error state to read out those options.
Second, VUIs can give terse responses in confirmation states rather than trying to sound witty every time. There are two reasons for this. First, lengthy responses defeat the purpose of using voice for speed, and an experienced user doesn’t need that level of confirmation. Second, while this doesn’t seem to be a problem now, it may become irritating as people use VUIs for more of their daily tasks. The enthusiasm with which Reddit users have embraced Alexa’s Brief Mode already suggests as much.
Third, there is the open question of how much we want a machine to sound like a human. The entire ‘should you say please to Alexa?’ debate boils down to a basic question: should we treat VUIs as human? There is no clear answer yet.
Of course, chatbots and services designed specifically for banter are exceptions.
2. Voice can and should support everything
Currently, services in every arena, from education to healthcare to banking, are rushing to jump on the VUI bandwagon. But can and should everything be supported by voice?
Some tasks are less efficient by voice. While speaking a command can be the easier way to complete a complex task or jump straight to its final step, it is much easier to glance over a list on screen than to have an assistant read it out. GUIs have also made a lot of progress in glanceability through icons; VUIs don’t yet have anything concrete that corresponds to icons.
However, this point becomes moot if the assistant is smart enough to choose and read out only the ‘best’, ‘personalized’ result for you.
Which brings me to my next point: filter bubbles. The term “filter bubble” refers to the effect of the algorithms that dictate what we encounter online. According to Eli Pariser, who coined the term, those algorithms create “a unique universe of information for each of us … which fundamentally alters the way we encounter ideas and information.” An assistant that reads out a single ‘best’ answer narrows that universe even further, because the user never hears the alternatives.
It’s important to note that while adults mostly use VUIs for a few tasks and simple questions, children who use VUIs from a young age turn to them for answers to all of their questions. As designers, shouldn’t we consider the effects of our designs on such a vulnerable group?
So, is voice bad?
I am not disputing the usefulness of VUIs. Voice is undoubtedly a powerful platform. However, we shouldn’t make design decisions just to force voice technology into places where it doesn’t fit.
Voice is another tool for designers to tackle challenges in human-computer interaction. It is a versatile tool, but not one that can supplant every other tool. Every type of interface has its own benefits, behavioral or technical, that work for certain types of users or situations.
All in all, voice should be infrastructure for the user experience, not the end goal.