The art of story: Crafting helpful voice assistants

Key takeaways

  • Voice assistant use is on the rise, but even so, their conversational ability can be less than dazzling.
  • For voice designers, the art of a good conversation can be difficult to pin down, but plenty of examples can be found in movies and television.
  • By ensuring character, tone and narrative are taken into account, voice assistants can do more assisting and less accidental frustrating.

From Amazon’s Echo to Google’s Assistant and Apple’s Siri, consumers are expected to chat with more than 8.4 billion voice devices by 2024.1

Yet the conversational aspect of these interfaces remains limited. Despite their ability to set an alarm, relay directions, or even offer a glimpse into what NASA’s Mars Rover is up to,2 voice assistants are still programmed with limited language models based on ‘command-and-response’ queries.

The need to use ‘wake’ words to get their attention, or repeat queries because they weren’t understood, means interactions can at times be frustrating. Potentially, this friction is more than just annoying: 33 percent of Americans say they’ll consider switching companies after a single instance of poor service.3 For voice technology, that could be as simple as hearing “I can’t do that”.

So how can we approach creating voice interfaces that engage users, minimise interaction costs, and produce an experience that builds trust? The answer lies in the strength of the story.

1. Create a rich character

Think of the last movie you watched. What made you invest in the characters? Were they outrageous? Courageous? Complicated? Character is fundamental to how we relate to and interpret the world around us, and good stories are the vehicle for seeing it.

In Disney’s Moana, for example, viewers empathise with the title character because they believe her cause is just: to save her island and prove herself as a leader. Her story hooks us in, and voice assistants can do the same. Outer Space Alice, an Alexa skill for kids, gives users a character story to engage with.4 Alice is a fictitious teenage stowaway on the International Space Station who talks to kids, telling them where she is and sharing fun facts about the location.

Like our celluloid heroes, voice assistants must also be designed with characteristics that create likeability and investment. In Moana, a cast of supporting characters embodies relatable traits (bravado, wisdom, fallibility, eccentricity), each resonating with viewers in different ways. With voice assistants too, where queries are not necessarily utilitarian in value (Alexa, tell me a joke?), traits need to lead to likeability to encourage future interaction.

For voice interfaces, personality is generally a set of character traits developers ascribe, and tone is the way in which this personality is expressed. Just as we want a newscaster to embody trustworthiness and sound authoritative, we want our assistants to be reliable and personable. Alice, for instance, was created with her own teen voice instead of the normal Alexa voice, to help engage with kids.5

2. Write human speech for robot minds

Designers often assume that ‘personality’ equals ‘person’ and try to make their creations as human as possible. But voice assistants are machines and they simply don’t work in the same way that humans do — programming needs to take this into account.

Take the anthropomorphic android, Data, in Star Trek: The Next Generation. Constructed to function as a human, Data nevertheless struggles to understand jokes, irony, and other subtleties his human colleagues use.

While his brain, unlike most computer algorithms, finds it easy enough to understand ‘continuous contingent interaction’ (that is, a sequential conversation that builds off things previously discussed*), he still lacks the conceptual ability human minds possess to abstract and understand things like irony.

In one scene, Wesley says, “Say goodbye, Data,” and Data replies, “Goodbye, Data”. Other characters find this funny because he has misinterpreted what Wesley meant. Data, however, sees no humor in this: he executed the command he was given.

Additional semantic issues such as specificity (human speech can be notoriously vague); overlapping intent (for example, the word ‘target’ can mean multiple things based on context); and the inability to understand long, complex thoughts can all keep a voice assistant from feeling like talking to a human.6
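The overlapping-intent problem can be made concrete with a small sketch. The example below is illustrative only: the sense names, the cue words, and the function `resolve_target` are all assumptions made for this illustration, not part of any real assistant platform. It picks a meaning for ‘target’ from the surrounding words and, when the context does not settle the question, signals that the assistant should ask rather than guess.

```python
# Hypothetical senses of the ambiguous word "target".
# The cue words are illustrative assumptions, not a real vocabulary.
SENSES = {
    "retailer": {"store", "shop", "buy", "nearest"},
    "goal": {"sales", "fitness", "savings", "reach"},
}


def resolve_target(utterance: str) -> str:
    """Pick a sense of 'target' from surrounding words, or ask when unsure."""
    cleaned = utterance.lower().replace("?", "").replace(",", "")
    words = set(cleaned.split())
    # A sense matches when at least one of its cue words appears.
    matches = [sense for sense, cues in SENSES.items() if words & cues]
    if len(matches) == 1:
        return matches[0]
    # Zero or several senses fit, so the assistant should ask a question.
    return "clarify"
```

Real assistants rely on far richer language understanding than keyword overlap; the point of the sketch is only that an explicit ‘ask for clarification’ outcome is better than silently choosing the wrong meaning.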

As AI gets smarter, voice assistants will become better at navigating these challenges. In the meantime, special attention must be paid to ensuring that they can mimic human conversation well enough that humans will want to keep talking.7

3. Understand the hero’s quest

Behind any movie plot is a character with a pain point that needs a solution. Epic stories nearly always have a protagonist overcoming a problem of great significance to them, both materially and emotionally.

This is also true of a user’s experience. Let’s say a mother wants to buy her child a set of books before school starts. While the material need is straightforward — buying a product — the emotional payoff is a complex mix of wants and needs. Perhaps this mother wants her daughter to succeed in school or wants to be considered a good caregiver.

In Star Wars, protagonist Luke Skywalker must venture beyond the comfort of his home planet to deliver a message from Princess Leia and, ultimately, help the rebellion in the fight against the evil Empire. Woven within this narrative is Luke’s personal journey in understanding his Jedi heritage. The entire movie is a call-and-response to these central themes, helping the character to overcome challenges (trash compactors, tractor beams, Darth Vader) and discover what’s important to him.

Similarly, the emotional underpinnings of the mother’s quest will drive her to overcome pain points. She may find items out of stock, in the wrong colour, with unworkable delivery times or expensive postage costs. The stakes of her child’s education (and her perceived worth as a parent) are high enough that she persists until she achieves material and emotional success.

Understanding these emotional drivers is key to understanding the level of internal motivation towards a service or product, and how to design the best conversational path to achieve that goal. The hero’s journey for voice assistants is an exercise in designing robust dialogue to account for all the unexpected behaviors that occur during normal conversation.

4. Design for unpredictability

In a perfect world, voice assistant conversations should be fluid, sequential, and have an intended goal. But what happens when users stray from the ideal hero’s journey we believe they’re on? For instance, what if a customer wants to purchase a TV but doesn’t want it delivered during the week, wants to throw in some DVDs but needs to know some titles, or gets interrupted in the middle of the request?

A voice assistant cannot easily respond to these scenarios, yet an in-person employee could handle all of them.

Designing error paths to account for such scenarios can go a long way towards retaining user attention and trust. In The Lord of the Rings, Frodo’s epic quest to reach Mount Doom is interrupted by trolls, giant spiders, a Balrog and Sauron’s armies. Even though his journey ends up taking a different route than intended, he ultimately reaches his destination and destroys the One Ring. Like such narratives, a ‘Choose Your Own Adventure’ model of conversational flow should have a way to return the user to their original query and get them to their intended goal as quickly as possible.

Successful error paths elegantly reroute users back towards their intended goal. Sometimes, it’s as simple as stating “I can’t do that. Try asking me something else” or offering suggestions that help consumers best interact with the voice interface.
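One way to picture such an error path is the minimal sketch below. It is an illustration under stated assumptions, not a description of how any particular assistant works: the `Dialogue` class, the `respond` function, and the keyword-based handlers are all hypothetical names invented for this example. The idea is simply that the assistant remembers the user’s original goal, so that when it cannot handle a request it admits the limit, offers suggestions, and steers back to that goal.

```python
from dataclasses import dataclass, field


@dataclass
class Dialogue:
    """Tracks the user's original goal so a detour can end back on it."""
    goal: str  # e.g. "buy a TV" (hypothetical example goal)
    suggestions: list[str] = field(default_factory=list)

    def fallback(self) -> str:
        """Error path: admit the limit, offer options, restate the goal."""
        reply = "I can't do that."
        if self.suggestions:
            reply += " You could try: " + ", ".join(self.suggestions) + "."
        return reply + f" Shall we get back to your request to {self.goal}?"


def respond(dialogue: Dialogue, utterance: str, handlers: dict) -> str:
    """Route an utterance to a handler, or fall back without dropping the goal."""
    for keyword, handler in handlers.items():
        if keyword in utterance.lower():
            return handler(utterance)
    # Nothing matched: take the error path instead of a dead end.
    return dialogue.fallback()
```

For example, with a goal of "buy a TV" and a single handler registered for the keyword "deliver", a question about delivery days is answered directly, while an unrelated question produces the fallback reply that ends by returning to the TV purchase.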

Write your customer’s story

The fundamental driver of a good user experience is fulfilling the customer’s needs. Doing so via voice technology has its challenges, but the fact that the technology has limitations doesn’t mean the interactions themselves must be limited. By taking lessons from storycraft on personality, tone and narrative, we can create interactions in voice that build trust and loyalty.

Let the adventure begin.

* The use of the word ‘that’, for example, in referencing an object previously mentioned is an easy enough concept for a child to learn, but one that has proven nearly impossible to teach to natural language programs.


John Jones

John is a managing director in PwC’s New York Experience Center, PwC United States