Voice interfaces have been around for a while now. Over the past two decades they’ve sat quietly on our phones and home computers, generally unloved and unwanted. More recently, however, they have taken off in a way never seen before. In this article I would like to explore the growth of voice UI, look at the reasons for its sudden resurgence in popularity and ask where it could take us next.
Growing up in the 80s was an amazing time for a geek like myself. Home computers were starting to become a reality and seemingly doubling in their power every year. Science fiction was finally beginning to move beyond stereotypical alien invasions and starting to squeeze exciting technology concepts into real-life situations, giving us a glimpse of potential visions of the future. TV shows such as Airwolf, Street Hawk and Knight Rider were hypothesising futuristic artificially intelligent vehicles controlled by leather-clad, heavily permed protagonists - containing many concept technologies that have since been adopted. And even Jessica Fletcher in Murder, She Wrote got in on the action by dedicating an entire storyline to the concept of Virtual Reality.
Seriously. This actually happened. Jessica Fletcher predicted the future.
The tech explosion of the 80s was incredible to witness. But there was one technology that consistently featured throughout the decade’s science fiction that felt natural, surprisingly obvious and didn’t require a perm or leather to use it. That technology was voice control.
Voice control became such an 80s science fiction staple that after a while you began to forget that it was a technology that not only didn’t actually exist, but one that would be incredibly difficult to put into practice. But the concept of intelligent voice control was old even in the 80s. Over a decade earlier, 2001: A Space Odyssey hypothesised that in order for voice control to work efficiently as an interface, an artificial intelligence (in this case the infamous HAL 9000) would need to be present. This idea was later replicated with Knight Rider’s artificially intelligent car, K.I.T.T., and a variety of other popular science fiction machines.
These examples concentrated on artificial intelligence as the central technology in the story, and the voice control was just a natural byproduct of the AI. A few other notable pieces of science fiction portrayed voice control as an everyday assistant which contains just enough intelligent learning to understand the human interaction. This is the kind of voice assistant we are starting to see today in our homes with the likes of Amazon’s Alexa, Google’s Assistant and Apple’s Siri. All three have added a level of personality so that users are more likely to interact with them in an anthropomorphic manner, but they’re still simply voice interfaces that learn from the way you use them.
Roads? Where we're going, we don't need roads.
Back to the Future II’s vision was clear and simple: the year is 2015 and your home contains a higher level of automation. You can enter your house using a fingerprint scanner and turn on the lights by saying “lights on”. Setting the temperature in your house is as easy as verbalising a setting. The scenic views broadcast to you on your large flatscreen TV can quickly switch to a web conferencing system, or an entertainment system, controlled entirely with your voice. It even recognises which person in the house is issuing the commands and responds appropriately.
Put simply, Back to the Future II nailed it.
The year is 2017 and your house can contain a high level of automation. You can enter your house using a fingerprint scanner if you so wish (although most people still prefer to use keys). Using the Google Home or Amazon Echo you can say “OK Google, lights on” or “Alexa, turn on the lights” and it will automatically fade up your Philips Hue, or alternative smart lighting system. You can even set the mood of the lights with your voice if you want. You can set the temperature of your house by issuing a voice command and the Home or Echo will interface directly with Nest, Honeywell or whichever other intelligent heating system you have installed.
Your 50” flatscreen TV can switch between the 4K aquarium that it’s broadcasting from Netflix to whichever other entertainment channel you prefer, again by interfacing with your preferred voice assistant. Google Home ties in with Google’s Chromecast devices and Amazon’s Echo works well with home remote hubs such as Logitech’s Harmony series. Voice assistants can even tie in with your other appliances such as ovens, fridges and washer/dryers if they’re already classed as “smart” devices, or you can automate your entire home on a budget using a variety of smart sockets.
The future is here. We’ve taken science fiction and made it science fact. In fact, we’ve taken it one step further. With the internet we can find information, buy goods and services, get the news and weather, find recipes, educate our children, interact with polls, connect with each other socially and, most importantly, watch as many “fluffy kittens” videos on demand as we desire. And now, thanks to voice assistants, we can do all of these things as simply as talking to a box in the corner of our rooms. It’s mind boggling to wonder what Back to the Future’s director and co-writer Robert Zemeckis would have conceived if he’d realised the sheer connectivity that the internet would give us.
Zemeckis had a vision and the dawn of the internet expanded the potential of that vision massively, but you might ask why voice interaction is only now breaking through. It’s a good question and it has a simple answer. We have now entered the age of accessible machine learning and an easy-to-use voice activated assistant is the first obvious application.
The return of a lost technology
In order for a system to recognise and translate your voice to text, it has to take into account a number of different factors: language being the most obvious, but also accent, background noise and, in the case of whole sentences, context. Voice recognition has been around since the 1950s, starting with recognising only numbers spoken by a single voice. By the time that Back to the Future was first released in the mid-80s, it had advanced to recognising 1000 words in a single language, which was better but still severely limited its usability in a real-life scenario. By the 1990s, voice recognition had begun to hit the consumer market in the form of dictation software, but the software was inaccessibly expensive, took hours to train to recognise your voice and was often still incorrect in what it recognised.
The truth is that in order to accurately recognise a voice, a system needs an enormous amount of example data and the processing power to make the comparisons. Home computers simply weren’t up to the task, so dictation software was naturally left behind in favour of typing and auto spell-checking.
But in 2010 Google revolutionised the world of voice recognition by introducing "Personalised Recognition" to Android phones (and later in 2011 to the Chrome browser) as a way of improving the Google voice search. This opt-in feature would record and send any voice searches that you made to the cloud to be statistically compared to other users’ voice commands. This comparison with other users would allow it to pick up on the nuances of language, accent, pitch, pace, gender and age. Thanks to Personalised Recognition, Google’s English voice search system now incorporates over 230 billion words gathered from actual user queries. Combining all of this data with cloud computing and machine learning meant that they could finally produce a recognition system more accurate than anything we’d seen before.
Apple followed suit with a cloud-based processing approach for Siri in 2011, and Amazon slightly later with Alexa in 2014. Now in 2017, voice recognition has become a completely viable interface for the modern user. It’s affordable, easy to use and already compatible with a number of smart devices. Tim Tuttle, CEO of MindMeld, was recently quoted in an interview with TechCrunch as saying:
"In the past 18 months, commercial speech recognition technologies have seen a dramatic 30 percent improvement. To put that into perspective, that’s a bigger gain in performance than we’ve seen in the past 15 years combined."
A new kind of internet
But what’s next for voice assistants? Well, at Toaster we see voice interfaces becoming a quickly accepted UI now that they’re more accurate and practical to use. They continue to learn and improve as more and more users interact with them. Apple announced the HomePod in June and are due to release it in December. Although more of a direct Sonos competitor than Home or Echo, the HomePod will bring Siri to the living room. In advance of the HomePod’s release, both Google and Amazon will be concentrating their efforts on improving their services, each hoping to corner the market. Whichever proves to be the best seller, it's a guarantee that the collective influence of Google, Amazon and Apple will make voice assistants this year’s must-have Christmas present for many.
Looking further ahead, an audio-only web interface will appeal to security-conscious parents. An audio-based internet immediately addresses many levels of child security, without the need to enforce obvious restrictions. A whole new generation could now grow up with voice assistants being a natural interface to the internet.
Large brands are beginning to adopt voice assistants as an easy way to promote their products and services. You can already order a Domino’s pizza from the comfort of your sofa, order an Uber or even get your horoscope from Elle magazine, simply by issuing a voice command. At Toaster we’ve been experimenting with this technology and have even released our own application for the Google Assistant.
In this article, our Strategy Director, Zanya Fahy, discusses in more detail what brands should consider in order to make the most of this new form of interaction.