Richard Silvester
November 8, 2018
Reading time: 6 min

Do you remember the distant 1980s, when you interacted with your personal computer via a command line? Or maybe you remember when Apple brought the Graphical User Interface (GUI) to the mainstream personal computer? If you are a millennial like me, you will definitely remember how our web consumption shifted from desktops to mobile in the mid-2000s. Fast forward to the present, and a new way of interacting with our devices is here to stay – the Voice User Interface (VUI).

According to a survey of over 500 adults in the UK, 99% of respondents are aware of voice assistants and 73% have used one at some point. These are impressive numbers, but they should come as no surprise.

The number of voice assistants and devices that support them has grown rapidly in the past few years, with the introduction of Siri in 2011, followed by Alexa in 2014 and Google Assistant in 2016. This is an interesting trend, considering that voice interactions once belonged only to science fiction films.

Voice is the most natural way of communicating and one of the first things we learn as human beings. VUIs work best when they make a task easier, faster and more natural to complete. For example, Ocado has both a mobile app and an Alexa skill. Let’s examine the steps required to order a can of tomatoes with each:

Mobile app:

  1. Locate and unlock phone
  2. Find the Ocado app
  3. Search for tomato cans
  4. Make a selection and order
  5. Lock phone and put away

Voice assistant skill:

  1. “Alexa, order a can of tomatoes.”

Here you might ask how exactly Alexa knows which tomatoes to order. Ocado has been particularly thoughtful here and bases the selection on past purchases and popularity. Imagine if Alexa approached the task in the conventional GUI way: “There are 20 varieties of tomato cans available. Starting from the cheapest to the most expensive, the first option is a pack of 6 organically grown tomatoes for the price of £2, and you can save if you order 2 packs…” and so on. This would be a very poor user experience. It also shows that voice commands suit repetitive or limited tasks and are not very good for exploratory ones.
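To make the idea concrete, here is a minimal sketch of how a skill might choose a single default product from past purchases and popularity. It is not Ocado's actual logic, which the article does not describe in detail. The `Product` class, the `pick_default` function, the product names and all the numbers are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    popularity: int  # hypothetical store-wide order count


def pick_default(candidates, past_purchases):
    """Return one product for a vague request, or None if nothing matches.

    past_purchases maps a product name to how often this shopper bought it.
    """
    if not candidates:
        return None
    # Prefer what this shopper buys most often; break ties by overall popularity.
    return max(
        candidates,
        key=lambda p: (past_purchases.get(p.name, 0), p.popularity),
    )


# Illustrative data only.
catalogue = [
    Product("Chopped tomatoes 400g", popularity=900),
    Product("Organic plum tomatoes 400g", popularity=350),
    Product("Cherry tomatoes in juice 400g", popularity=120),
]
history = {"Organic plum tomatoes 400g": 4}

choice = pick_default(catalogue, history)
print(f"Ordering {choice.name}.")
```

A shopper with no purchase history falls back to the most popular item, so the skill can still answer in one sentence instead of reading out the whole catalogue.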

So how do you design optimally for voice? Amazon suggests the following:

  • Start with people, not technology – write for how people talk, instead of how they read and write. Do not try to force the implementation of a technology if there is not enough evidence that it will enhance the experience.
  • The one breath test – ask for one piece of information at a time and keep interactions brief
  • Eyes expect uniformity, ears expect variety – avoid repetitive phrases
  • Don’t assume that the user knows what to do or what will happen – indicate when the user needs to provide input
  • Clearly present options, but not more than 3 at a time – be mindful that our auditory sensory memory lasts up to 2 seconds and that it takes up to 30 seconds to store something in short-term memory
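Two of the guidelines above, presenting no more than 3 options at a time and varying the phrasing, can be sketched in a few lines. This is an illustrative sketch only, not part of any Alexa SDK. The names `chunk`, `speak_list` and `build_turns` and the prompt wording are assumptions made for the example.

```python
import itertools

MAX_OPTIONS = 3  # guideline: present no more than 3 options at a time

# Hypothetical prompt variations, so the user does not hear the same phrase twice in a row.
PROMPTS = [
    "Which one would you like?",
    "Which should I pick?",
    "What sounds good?",
]


def chunk(options, size=MAX_OPTIONS):
    """Split a list of options into groups of at most `size` items."""
    return [options[i:i + size] for i in range(0, len(options), size)]


def speak_list(items):
    """Join items the way a person would say them aloud."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} or {items[1]}"
    return ", ".join(items[:-1]) + f", or {items[-1]}"


def build_turns(options):
    """Return one spoken turn per group of options, each ending with a clear prompt."""
    groups = chunk(options)
    prompts = itertools.cycle(PROMPTS)
    turns = []
    for index, group in enumerate(groups):
        turn = f"I found {speak_list(group)}. {next(prompts)}"
        if index < len(groups) - 1:
            # Tell the user how to hear the remaining options.
            turn += " You can also say 'more'."
        turns.append(turn)
    return turns


for turn in build_turns(["chopped", "plum", "cherry", "peeled"]):
    print(turn)
```

Each turn ends with an explicit question, which also covers the guideline about indicating when the user needs to provide input.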

Voice assistants are just at their inception and are far from being a mature technology. At present, they are very good at speech recognition and generation, language understanding, dialog tasking and world knowledge. However, they lack skills in turn-taking, attention and memory, theory of mind, grounding and error correction, and conveying emotion and social relationships. With advancements in AI, voice capabilities will expand in these areas, which will have major implications for many industries. In the world of content marketing, the development of voice technology is something to watch very closely.

Do you want to start designing for voice? Start by visiting and for best practices.