You talkin’ to me? 7 Principles when designing for voice user interfaces (VUI)

By Tim Daines - Last updated: Tuesday, January 3, 2017

Amazon’s Echo and Apple’s Siri have blazed the trail. Voice input and control have reached the technology mainstream and a new generation of digital services and physical devices will soon be upon us. But for those of us in technology development, how should we go about designing an experience that knows us as well as our mother?

There’s much discussion around the subject of voice user interface (VUI, for the acronym lovers), but nobody quite knows what’s in store for the future. One thing we can be sure of, though, is that it’s not the shiny ‘hard’ design that matters here. Instead it’s the ‘soft’ design that must be perfected and made delightful: specifically, a ‘service’ that uses ‘audio data’ to assist someone in their life. This ‘service’ will provide a user experience, but the ‘audio data’ design challenge matters just as much and must be worked through, particularly as there aren’t any screens for a user to interact with.

These are early days for VUI, and best practice is yet to be established. With that in mind, I wanted to propose my own core principles when designing for voice user interface. Some of this thinking is the result of having recently attended a user experience hackathon, focused on designing for VUI.

  1. Don’t build a service like Fawlty Towers

The legendary UK TV series Fawlty Towers follows the exploits and misadventures of a short-fused hotel owner named Basil Fawlty. His relationship with the waiter Manuel is comically confused and irritated. Basil’s frustration with Manuel grows, based on constant complications and mistakes in ‘dialogue’ due to cultural language barriers (he’s “from Barcelona”). The result is escalating frustration and a one-way shouting match. Understanding how to design for dialogue will be key to ensuring your VUI returns a response it’s confident in and which supports the user’s needs. By making dialogue a core design principle, technology developers have a chance of building products that users can enjoy interacting with. Beware though, even the very best get this wrong.
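To make the idea of a response the VUI is "confident in" a little more concrete, here is a minimal sketch in Python. It is an illustration only, not how Echo or Siri actually work: the `Interpretation` type, the `choose_reply` function and both threshold values are hypothetical names and numbers invented for this example. The point is simply that a system unsure of what it heard should check or ask again, rather than plough on like Basil and Manuel.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    """One hypothetical guess at what the user meant."""
    intent: str        # e.g. "play_music" (illustrative label)
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


def choose_reply(guess: Interpretation,
                 act_at: float = 0.8,
                 confirm_at: float = 0.5) -> str:
    """Pick a dialogue move based on how sure the system is.

    The thresholds are assumed values; in practice they would be
    tuned by experimenting with real users.
    """
    if guess.confidence >= act_at:
        # Confident enough to act without bothering the user.
        return f"OK, doing: {guess.intent}"
    if guess.confidence >= confirm_at:
        # Plausible but uncertain: check before acting.
        return f"Did you mean: {guess.intent}?"
    # Too uncertain: admit it and invite a rephrase.
    return "Sorry, I didn't catch that. Could you say it another way?"
```

With these assumed thresholds, a guess scored at 0.6 produces a confirmation question instead of an action, which is the conversational equivalent of checking before you shout.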

  2. Look mum, no hands!

Remember we’re designing a VUI product that will listen to the users in a variety of conditions. There’s no machine equivalent of the cocktail party effect, whereby we humans can focus our auditory attention on a particular stimulus while filtering out a range of other stimuli. VUI design will need to be defined around how normal human behaviour tunes into a single voice. Understanding this dialogue experience is vitally important, as it will inform the ‘hard’ design in relation to audio source separation, including the combination of intelligent signal processing and the placement of microphones. Without understanding how noise suppression will be designed to immediately detect directed user dialogue, it’s likely that the entire ‘soft’ design will fail.

  3. It’s the 21st Century

Have you ever wondered why Amazon Echo defaults to a female voice? Is it because females are more helpful and males are not? Take great care when choosing the voice that speaks to the user. We are a complicated species with a mix of cultures, languages and preferences. The choice of default VUI voice could easily send a message we didn’t intend.

  4. Why complement the VUI with an App?

Does it really need an app?

I have serious reservations about the value that Amazon’s Alexa app for Echo brings to the user. It’s useful during setup and acts as a repository for past questions and answers, but Amazon is introducing further complexity into an already unusual and complex system, and designers run the risk of cognitively overloading the user of the VUI. Without wishing to be too purist about it, I’d be happier if VUI products and services were so unique as to be impossible to replicate with a simple app.

  5. The user’s journey is VUI not GUI

In the world of graphical user interface (GUI) design, I’ve come to understand a rule of thumb that suggests our work is 80% design and 20% experimenting with the user. But with voice-interface design we’re not designing and experimenting with an interface; rather, we’re designing for dialogue. My own experience is that this completely upends the old rule of thumb, moving us to 80% experimenting and 20% design. Remember, we’re designing for what the user is saying (conversational and transactional), and not what the user is doing (interaction and navigation). This shift to a focus on experimentation and learning leads to a complete renegotiation of the relationship between designer and developer across the entire product strategy. Don’t take this lightly.

  6. Build for empathy

Whilst a VUI system won’t (yet) know how the user is feeling, the dialogue and perceived character of the VUI will be important when responding to the tone of voice it picks up from the user. This understanding should be learned and improved over time from the numerous conversations that take place between the user and the VUI. In short, the conversation needs to be as natural as possible: humans intelligently match their tone of voice to those they’re speaking with as a way of putting people at ease, and the best machines will do the same. This is part of what the vital Machine Learning element of such services can deliver.

  7. VUI needs to mature with the user

Cloud computing has matured and is all-pervasive. Users are now able to take their data with them when they upgrade their ‘hard’ design product to the latest technology. This must be true of a VUI technology as well, particularly when concerning the preservation of numerous conversations. Whilst this ‘hard’ design element – the physical thing – may remain locked in conventional product lifecycles, the ‘soft’ design part – the digital service – should have loftier ambitions and a longer lifespan. It should evolve its existence based on learning how the user’s dialogue changes as they themselves evolve and age. Truly visionary VUI will design for this process of adapting, learning and understanding the biological evolution of its user.

An area I haven’t examined in detail here is AI, and the ways in which Machine Learning and the right inputs and outputs might enable VUIs to replicate the way a human mind thinks. Machine Learning today is improving VUI, but not convincing anyone that they’re speaking to a real human. Indeed it may be that a real human hand is delivering some VUI improvements.

In addition, as VUIs become better at detecting the emotions of humans we must be mindful of the ethical considerations at play. Should we allow VUIs into the medical space, for instance, allowing them to give advice on medical conditions such as depression, perhaps even making referrals or advising a next of kin? It’s a fair bet that voice tools aided by Machine Learning will eventually be able to infer a long term deterioration of the user’s mood, so what should be done with such knowledge?

These principles are an early attempt to codify the core principles of designing for voice user interface (VUI). They’re not comprehensive and they’ll evolve over time. In the meantime, share your thoughts, along with any principles that I might have missed.