People have been wanting to talk to their computers for a long time. As humans, we’re wired for conversational interaction, so it’s not a giant leap to imagine talking to a computer as being easier (for the human at least) than typing at one. In 1966, there was the talking computer of the starship Enterprise in the original Star Trek television series, voiced by Majel Barrett. The fictional HAL 9000 was the calm-voiced villain of Stanley Kubrick’s 2001: A Space Odyssey in 1968. In 1983’s WarGames, the primitive text-to-speech voice of the WOPR computer asked, “How about a nice game of chess?”
Fast forward to 2011, when Apple announced that its voice-capable virtual assistant, Siri, would be standard on the iPhone 4S. Suddenly, people could talk to their smartphones. As of early 2017, the trend toward virtual assistants has reached critical mass, with offerings from Microsoft (Cortana), Google (Google Home and Google Assistant), Samsung (Viv) and, perhaps most importantly, Amazon’s Alexa technology, which is employed in products like the Amazon Echo and Echo Dot.
In case you’re not aware, virtual assistants are now available not just on smartphones, but as tabletop devices like the Amazon Echo, a cylinder about 9” high and 3” in diameter, or Google Home, which looks like an oversized air freshener. The second-generation Echo Dot looks like a hockey puck.
A number of separate technologies underlie these products. First, there is speech recognition, which takes the sounds emitted by the human vocal tract and turns them into machine-readable text. Once the sound of, say, “What time is it?” has been turned into text, a second technology must make sense of the question and formulate a reply in text form. This falls under the general heading of Natural Language Processing (NLP). Finally, the text of the response, “It’s 3:21 p.m.,” is translated back into human-sounding speech. As you might expect, it’s taken a lot of smart people to bring these technologies to the point where they can be bundled together in a commercial product such as a smartphone or a tabletop device like Google Home or Amazon Echo.
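To make the three stages concrete, here is a minimal Python sketch of the pipeline described above. It is a toy under stated assumptions, not how any commercial assistant is built: the function names are hypothetical, the "speech recognition" and "speech synthesis" stages are stubs that simply treat audio as text, and the "NLP" stage only understands one question.

```python
from datetime import datetime


def recognize_speech(audio: bytes) -> str:
    """Stage 1 (stub): a real recognizer would decode sound into text.

    Assumption: the 'audio' here is just UTF-8 text standing in for sound.
    """
    return audio.decode("utf-8")


def understand_and_reply(text: str, now: datetime) -> str:
    """Stage 2 (toy NLP): make sense of the question, reply in text form."""
    normalized = text.lower().strip(" ?!.")
    if normalized == "what time is it":
        hour = now.hour % 12 or 12  # convert 24-hour clock to 12-hour clock
        suffix = "a.m." if now.hour < 12 else "p.m."
        return f"It's {hour}:{now.minute:02d} {suffix}"
    return "Sorry, I don't know how to help with that."


def synthesize_speech(text: str) -> bytes:
    """Stage 3 (stub): a real synthesizer would produce spoken audio."""
    return text.encode("utf-8")


def handle_utterance(audio: bytes, now: datetime) -> bytes:
    """Chain the three stages: speech -> text -> reply text -> speech."""
    question = recognize_speech(audio)
    reply = understand_and_reply(question, now)
    return synthesize_speech(reply)
```

Calling `handle_utterance(b"What time is it?", datetime(2017, 1, 15, 15, 21))` walks the question through all three stages. In a real product, each stub would be replaced by a far more sophisticated component.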
In fact, most of the processing that makes the magic happen doesn’t occur locally. Instead, a compact representation of what you say is processed in “the cloud” by much more powerful computers, which your phone or device reaches via a wireless Internet connection. To make the experience more seamlessly human-like, these devices are listening to you all the time. On an iPhone 6S or later model, you can say “Hey, Siri!” and Siri will respond. (Earlier models must be plugged into power for this feature to work.) Amazon’s Echo listens for a “wake word,” which defaults to “Alexa.” Privacy advocates have expressed concern about what happens to the information these devices “hear.” In fact, police recently subpoenaed an Amazon Echo device to see whether it contains audio data that could shed light on a murder.
Building an Alexa, Siri, Viv, (Google) Assistant, or Cortana isn’t easy. And this is where Amazon came up with a brilliant idea. They’re offering interested developers the Alexa Skills Kit, “a collection of self-service APIs, tools, documentation and code samples that make it fast and easy for you to add skills to Alexa.” A “skill” adds new voice-controlled abilities to Alexa’s skill set.
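For readers curious what a skill looks like in code, here is a rough Python sketch. Alexa sends a custom skill a JSON request and expects a JSON reply containing the text to speak. The intent name `GetGreetingIntent` and both function names are hypothetical, and the request and response fields shown follow the general shape of the Alexa custom-skill JSON interface as I understand it, so check Amazon’s documentation before relying on the details.

```python
def build_response(speech_text: str) -> dict:
    """Wrap plain text in the reply structure a custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": True,
        },
    }


def handle_request(event: dict) -> dict:
    """Pick a spoken reply based on the intent named in the request.

    Assumption: 'GetGreetingIntent' is a made-up intent for illustration.
    """
    request = event.get("request", {})
    intent_name = request.get("intent", {}).get("name")
    if request.get("type") == "IntentRequest" and intent_name == "GetGreetingIntent":
        return build_response("Hello from a custom skill.")
    return build_response("Sorry, I didn't understand that.")
```

The speech recognition and speech synthesis stay on Amazon’s side; the developer’s code only has to map a recognized intent to a reply, which is a large part of why the kit lowers the barrier to voice control.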
As a result, a gaggle of products at the Consumer Electronics Show in January featured either the ability to function as an Echo (a refrigerator, for example, which can tell you the time), or the ability to use a device with Alexa (such as an Echo or that refrigerator) to control the product. For instance, one firm demonstrated arming and disarming its home security system via Alexa commands. Given how much easier Alexa makes voice control, you can expect to see more of these integrations. And if you’re in business, you should definitely consider whether integrating with Alexa makes sense for your product offerings (take a look at some of the 2,000-plus add-on skills at www.amazon.com).
Though I’ve focused on Alexa and Amazon Echo, Google Home (with its Assistant software) gets high marks for its ability to respond to questions. (See tinyurl.com/hlwrn9q for a head-to-head comparison of Alexa, Siri, Cortana, and Assistant. For the record, all of them respond to the command, “Talk dirty to me.”)
Author
Michael E. Duffy is a 70-year-old senior software engineer for Electronic Arts. He lives in Sonoma County and has been writing about technology and business for NorthBay biz since 2001.