Voice assistants like Alexa, Google Assistant, and Siri have come a long way in the last few years. But, for all their improvements, one thing holds them back: They don't understand you. They rely too much on specific voice commands.

Speech Recognition is Just a Magic Trick

An Echo dot saying "Hmmm... I don't know that"
Amazon

Voice assistants don't understand you. Not really, anyway. When you speak to a Google Home or Amazon Echo, it essentially converts your words to a text string and then compares that string to a set of expected commands. If it finds an exact match, it follows a set of instructions. If it doesn't, it looks for an alternative action based on what information it does have, and if that doesn't work, you get a failure message such as "I'm sorry, but I don't know that." It's little more than sleight-of-hand magic to trick you into thinking it understands.

It can't use contextual clues to make the best guess, or even use an understanding of similar topics to inform its decisions. It isn't hard to trip up voice assistants either. While you can ask Alexa "Do you work for the NSA?" and get an answer, if you ask "Are you secretly part of the NSA?" you get an "I don't know that one" response (at least at the time of this writing).
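The matching process described above can be sketched in a few lines of Python. This is only an illustration of exact-match lookup, not how Alexa or Google Assistant is actually implemented: the command table, the canned responses, and the function names are all hypothetical, and the intermediate "look for an alternative" step is left out for brevity.

```python
import string

# Hypothetical table of expected phrases mapped to canned responses.
EXPECTED_COMMANDS = {
    "do you work for the nsa": "canned answer about the NSA",
    "lock the front door": "locking the front door",
}

FAILURE_MESSAGE = "I'm sorry, but I don't know that."


def normalize(utterance: str) -> str:
    """Lowercase the transcribed text and strip punctuation and extra spaces."""
    no_punct = utterance.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


def respond(utterance: str) -> str:
    """Return the canned response on an exact match, else the failure message."""
    return EXPECTED_COMMANDS.get(normalize(utterance), FAILURE_MESSAGE)


print(respond("Do you work for the NSA?"))
print(respond("Are you secretly part of the NSA?"))
```

The second question means roughly the same thing to a person, but because its text matches no key in the table, a lookup like this one falls straight through to the failure message.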

Humans, who genuinely understand speech, don't work like this. Suppose you ask a human, "What is that klarvain in the sky? The one that is arched, and full of striped colors like red, orange, yellow, and blue." Despite klarvain being a made-up word, the person you asked could likely figure out from the context that you're describing a rainbow.

While you could argue that a human is converting speech to ideas, a human can then apply knowledge and understanding to conclude an answer. If you ask a human if they secretly work for the NSA, they'll give you a yes or no answer, even if that answer is a lie. A human wouldn't say "I don't know that one" to a question like that. That humans can lie is something that comes with real understanding.

Voice Assistants Can't Go Beyond Their Programming

Voice assistants are ultimately limited to the parameters they were programmed to expect, and wandering outside of them will break the process. That fact shows when third-party devices come into play. Usually, the command to interact with those is very unwieldy, amounting to "tell device manufacturer to command optional argument." An exact example would be: "Tell Whirlpool to pause the dryer." For an even harder-to-remember example, the Geneva Alexa skill controls some GE ovens. A user of the skill needs to remember to say "tell Geneva," not "tell GE," and then the rest of the command. And while you can ask it to preheat the oven to 350 degrees, you can't follow up with a request to increase the temperature by another 50 degrees. A human could follow these requests, though.
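To make the "tell device manufacturer to command" pattern concrete, here is a minimal sketch of a router that only accepts that exact shape. The skill table, the supported commands, and the error strings are assumptions made up for illustration; real skills are far more elaborate.

```python
import re

# Hypothetical invocation names and the exact commands each one accepts.
SKILLS = {
    "whirlpool": {"pause the dryer"},
    "geneva": {"preheat the oven to 350 degrees"},
}

# The rigid "tell <skill> to <command>" template.
PATTERN = re.compile(r"^tell (?P<skill>\w+) to (?P<command>.+)$")


def route(utterance: str) -> str:
    """Dispatch an utterance only if it fits the template and a known command."""
    match = PATTERN.match(utterance.lower().strip().rstrip("."))
    if match is None:
        return "error: expected 'tell <skill> to <command>'"
    skill, command = match["skill"], match["command"]
    if skill not in SKILLS:
        return f"error: unknown skill '{skill}'"
    if command not in SKILLS[skill]:
        return f"error: '{skill}' does not support '{command}'"
    return f"{skill}: {command}"


print(route("Tell Whirlpool to pause the dryer."))
print(route("Tell GE to preheat the oven to 350 degrees"))
print(route("Tell Geneva to increase the temperature by another 50 degrees"))
```

Saying "GE" instead of "Geneva" fails at the skill lookup, and the follow-up request fails because it isn't in the list of supported commands, even though a person would handle both without trouble.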

Amazon and Google have worked very hard to overcome these obstacles, and it shows. Where once you had to follow the above sequence to control a smart lock, now you can say "lock the front door" instead. Alexa used to be confused by "tell me a dog joke," but ask for one today, and it will work. They've added variations to the commands you use, but ultimately you still have to know the right command to say. You need to use the correct syntax, in the correct order.

And if you think that sounds a lot like a command line, you're not wrong.

Voice Assistants are a Fancy Command Line

A command prompt with search text

A command line is narrowly defined to perform simple tasks, but only if you know the proper syntax. If you slip out of that correct syntax and type dyr instead of dir, the command prompt will give you an error message. You can use aliases for easier-to-remember commands, but you have to have an idea of what the original commands were, how they work, and how to use aliases efficiently. If you don't take the time to learn the ins and outs of the command line, you won't ever get much out of it.
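The same behavior is easy to reproduce in a toy command interpreter. The sketch below is not a real shell: the command table, the "ls" alias, and the error wording are hypothetical stand-ins chosen to mirror the dir and dyr example.

```python
# Hypothetical built-in commands; each maps a name to a function to run.
COMMANDS = {
    "dir": lambda: "listing directory contents",
}

# Aliases are just extra names that point at an existing command.
ALIASES = {
    "ls": "dir",
}


def run(line: str) -> str:
    """Run a command by exact name (or alias); anything else is an error."""
    typed = line.strip()
    name = ALIASES.get(typed.lower(), typed.lower())
    handler = COMMANDS.get(name)
    if handler is None:
        return f"'{typed}' is not recognized as a command"
    return handler()


print(run("dir"))
print(run("ls"))
print(run("dyr"))
```

The alias only helps if someone already knew that "dir" existed and what it did. A one-letter typo gets no benefit of the doubt.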

Voice assistants are no different. You need to know the correct way to say a command or ask a question. And you need to know how to set up groups for Google and Alexa, why grouping your devices is essential, and how to name your smart devices. If you don't follow these necessary steps, you'll feel the frustration of asking your voice assistant to turn off the study only to be asked, "which study" should be turned off.
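One plausible way the "which study" question comes about is a name that matches more than one target. The sketch below assumes a flat list of named groups and devices; the data layout, the names, and the replies are hypothetical, not taken from Google's or Amazon's software.

```python
# Hypothetical smart home setup where a group and a device share a name.
TARGETS = [
    {"name": "study", "kind": "group"},
    {"name": "study", "kind": "device"},
    {"name": "kitchen", "kind": "group"},
]


def turn_off(name: str) -> str:
    """Turn off the single target with this name, or ask when it's ambiguous."""
    wanted = name.strip().lower()
    matches = [t for t in TARGETS if t["name"] == wanted]
    if not matches:
        return f"I couldn't find {wanted}."
    if len(matches) > 1:
        return f"Which {wanted}?"
    return f"Turning off the {wanted} {matches[0]['kind']}."


print(turn_off("study"))
print(turn_off("kitchen"))
```

Giving every group and device a distinct name removes the ambiguity, which is why naming and grouping matter so much in practice.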

Even when you do use the correct syntax in the right order, the process may fail, either with the wrong response or with a surprising result. Two Google Homes in the same house may give weather for slightly different locations even though they have access to the same user account info and internet connection.

In one example, the command "Set a timer for a half hour" was given. The Google Home Hub created a timer named "Hour" and then asked how long the timer should be. And yet repeating the same command three other times worked correctly and created a 30-minute timer. Using the command "Set a timer for 30 minutes" works correctly on a more consistent basis.

While speaking to a Google Home or Echo may be more fluid, under the hood voice assistants and command lines work the same way. You may not need to learn a new language, but you do need to learn a new dialect.

The Narrow Understanding of Voice Assistants Will Limit Growth

An Echo Spot, Google Home, Smart bulb and smart plug on a wooden surface.

None of this prevents voice assistants like Google Assistant and Alexa from working well enough (although Cortana is a different story). Google Assistant and Alexa can search online for answers decently (though, not surprisingly, Google is better at search), and they can answer basic questions like measurement conversions and simple math. With a correctly set up smart home and a well-trained user, most smart home commands will work as intended. But this came through work and effort, not intellectual understanding.

Timers and alarms used to be simplistic. Over time, naming was added, then the ability to add time to a running timer. They moved from simplistic to more complicated. Voice assistants can answer more questions, and each day brings new skills and features. But that isn't a product of self-growth that comes from learning and understanding.

And none of that delivers the inherent capability to use what is known to reach the unknown. For every command and question that does work, there will always be three that don't. Without a breakthrough in A.I. that grants a human-like ability to understand, voice assistants aren't assistants at all. They're just voice command lines, useful in the right scenario but limited to those scenarios they've been programmed to understand.

In other words: machines are learning things, but can't understand them.
