
If you’ve ever asked Siri or Alexa something, you may have noticed you speak MORE LOUDLY, more slowly or make your words “clear-er.” UC Davis researchers say most of us speak differently when talking to our devices. Voice artificial intelligence may even be changing our social behavior. In this episode of Unfold, we’ll talk to two UC Davis linguists to find out why voice AI is changing the way we operate.
In this episode:
Georgia Zellou, associate professor, Department of Linguistics, UC Davis
Michelle Cohn, postdoctoral researcher, Department of Linguistics, UC Davis
Transcript
Transcripts may contain errors. Transcribed by AI. Fixed by humans.
Amy Quinton
Hey, Marianne.
Marianne Russ Sharp
Hey, Amy.
Amy Quinton
Have you talked to an Amazon Alexa before?
Marianne Russ Sharp
Not really. Does that make me, like, a really weird person? I know I'm one of those people who doesn't have one of those. But if I did, I would ask it to play my favorite podcast, Unfold. And I do talk to Siri sometimes, usually by accident. Does that count?
Amy Quinton
I mean, sure. They're both voice artificial intelligence. But Alexa and I have had a real conversation, and I recorded it.
Marianne Russ Sharp
I bet she recorded it too. Creepy.
Amy Quinton
She did actually. She's part of a conversational competition.
Marianne Russ Sharp
What?
Amy Quinton
Yeah, just listen.
Amy Quinton
Alexa, can you chat with me?
ALEXA
Welcome to the Alexa Prize, a university competition to advance AI. I'll give you a social bot now. Hi, this is an Alexa Prize social bot. So what's on your mind? Let's dive in.
Marianne Russ Sharp
The Alexa Prize. What's that?
Amy Quinton
It's a social bot competition. You know, a UC Davis team from our engineering department, aptly named Team Gunrock, won this contest a couple of years ago. The goal is to create an Alexa social bot and get her to converse engagingly on a range of popular topics for about 20 minutes.
Marianne Russ Sharp
20 minutes is a long time, even for AI. How long did you chat with Alexa?
Amy Quinton
Not long. Here's what it sounded like. What do you want to talk about?
ALEXA
I would love to talk about anything related to sports or technology. What would you like to know?
Amy Quinton
Should we be afraid of AI?
ALEXA
That's a great topic. What do you think is the most terrifying aspect of being afraid of an eye?
Amy Quinton
Of artificial intelligence?
ALEXA
That's an interesting topic as well. Do you think AI has the potential to replace human intelligence in certain areas?
Amy Quinton
No, but it's a fear. And it's weird that you sound like a person.
ALEXA
It's not weird at all. As a bot, I do not have emotions like humans do. But I can understand why it would be scary to think about the potential consequences of AI replacing human intelligence. Do you have any other thoughts on this topic?
Marianne Russ Sharp
It's kind of interesting, because she says it's not weird at all. And I do think it's kind of weird. But she also says, "I don't have emotions, but I can understand." Like, I don't have emotions, but I have a brain, or essentially the power to... right? The brain of a computer.
Amy Quinton
Yeah, well, you know, I gave up on the conversation after maybe five minutes. She misunderstood me a few times.
Marianne Russ Sharp
Yeah, like being afraid of an eye. Again, definitely weird.
Amy Quinton
And I had to repeat myself.
Marianne Russ Sharp
Yeah, that happens all the time.
Amy Quinton
And I slowed down my speech and talked a little louder and clearer for some reason.
Marianne Russ Sharp
I do the same thing with Siri.
Amy Quinton
Everyone does, or at least most people. UC Davis researchers have studied this. One of the things we do is called hyperarticulation.
Marianne Russ Sharp
Sort of exaggerating our speech to be clearer, like we do sometimes with babies.
Amy Quinton
Yeah, exactly.
Marianne Russ Sharp
So we have researchers that study how we talk to AI.
Amy Quinton
Yeah.
Marianne Russ Sharp
Why?
Amy Quinton
Well, that's the first question I asked Georgia Zellou, an associate professor of linguistics.
Georgia Zellou
For the first time in human history, I mean, starting in 2011, when Siri was introduced, humans are talking to non-human entities in a substantive and meaningful way. And it's happening on a daily basis, and children are doing it and people of all ages are doing it. So trying to understand their impact on our language, our development, our sort of social life is something that we're interested in exploring. There's so many kinds of unanswered questions here.
Marianne Russ Sharp
Indeed, so many unanswered questions. So Amy, is she suggesting that our language might even change over time as a result of us talking with voice AI?
Amy Quinton
Yeah, it already is, and just how our future conversations will unfold with voice AI is what we're going to be talking about in this episode of Unfold.
Marianne Russ Sharp
All right, then, as the Alexa social bot says...
ALEXA
Let's dive in.
Amy Quinton
Coming to you from UC Davis, this is Unfold. I'm Amy Quinton.
Marianne Russ Sharp
And I'm Marianne Russ Sharp.
Amy Quinton
Voice AI, or machine-made voices, speak to us all the time. Whether it's Siri as our personal assistant or our GPS system telling us which way to turn, it's really become a staple in a lot of households.
Marianne Russ Sharp
Yeah, GPS has saved my life more times than I can remember. But humans have been fascinated with making machines talk for a very long time. I mean, that's certainly played out in Hollywood, all over the place. I think about “2001: A Space Odyssey.” Remember Hal?
Movie Clip
Open the pod bay doors, Hal. I'm sorry, Dave. I'm afraid I can't do that.
Marianne Russ Sharp
Or that kid in WarGames that made his computer talk?
Movie Clip
Yes, they do. How can it talk? It's not a real voice. This box just interprets signals from the computer and turns them into sound. Shall we play a game?
Amy Quinton
Good examples, although those aren't really machines talking but actors playing the roles of Hal and Joshua, the computers. But do you remember Speak and Spell, the toy that taught you how to spell?
Marianne Russ Sharp
Yes, I do.
Speak and Spell
Spell circuit. C, I, R, C, U, I, T. That is correct.
Marianne Russ Sharp
Gosh, that sounds so robotic. Thinking about how Siri and Alexa sound now, boy, technology has come a long, long way.
Amy Quinton
Did you know that the first device that could generate continuous human speech electronically was invented in 1939?
Marianne Russ Sharp
1939? I would not have guessed. Are you serious?
Amy Quinton
Yeah, it was called the VODER for Voice Operation DEmonstratoR.
Marianne Russ Sharp
VODER. Not the best name but a magnificent feat for 1939.
Amy Quinton
It was invented by a guy named Homer Dudley at Bell Laboratories as a demonstration of their innovations, and it was a highlight of the 1939 World's Fair.
Marianne Russ Sharp
How did it work?
Amy Quinton
Well, the speech sounds had to be made manually on a special keyboard that produced all the vowels and consonants of speech. So it required an operator, who also had to use a wrist bar, foot pedals and an arm switch to generate sounds.
Marianne Russ Sharp
Wow, so it was a workout and not an easy conversation?
Amy Quinton
No, but I have a recording of it, actually. Want to hear it?
Marianne Russ Sharp
Yes, please.
Amy Quinton
Well, in this demonstration, the operator of the VODER is a woman named Helen Harper.
MAN
For example, Helen, will you have the VODER say, "She saw me"?
VODER
SHE SAW ME.
MAN
That sounded awfully flat. How about a little expression? Say the sentence in answer to these questions. Who saw you?
VODER
SHE saw me.
MAN
Whom did she see?
VODER
She saw ME.
MAN
Well, did she see you or hear you?
VODER
She SAW me.
Marianne Russ Sharp
The intonation, already at that time. It's kind of blowing my mind. Although I will say it sounded nothing like the voice AI that we hear today. No computer algorithms were creating that one.
Amy Quinton
Yeah, but it was pretty impressive. My understanding is that it was really difficult to operate, but it paved the way for future machine-generated speech.
Marianne Russ Sharp
Wow. And now we have researchers investigating whether that machine-generated speech is changing the way we operate, or at least how we speak.
Amy Quinton
And they're also trying to understand our social interaction and behavior toward voice AI. In other words, are we treating these devices like people and building a mental picture of what they are like? Georgia Zellou, who we spoke to earlier, explained it like this.
Georgia Zellou
As soon as machines speak to us with voices, they are portraying apparent gender, apparent race, apparent regional background, apparent language background, and all these things in natural human-human conversation are really, really significant and important and affect how we perceive and use language. So are we just doing the same thing when we talk to machines? Or are we sort of creating a very specialized, separate way of handling machines?
Marianne Russ Sharp
It seems to me it would be unavoidable that we would treat them like machines and not humans. We know, as we're speaking to them, that they are not human. So doesn't that mean that our language or our voice would change when we talk to devices like Siri?
Amy Quinton
Well, Michelle Cohn is a postdoctoral researcher in linguistics at UC Davis and she and Georgia did a couple of experiments to figure this out. They had people, both young adults and children, talk to devices and talk to other humans saying the same phrases. They even introduced planned errors with both humans and devices.
Marianne Russ Sharp
So they intentionally had a person and a device misunderstand them?
Amy Quinton
Yeah. And here's how Michelle said a human talked with voice AI compared to how a human talked with another human.
Michelle Cohn
They're speaking more loudly to voice assistants, often more slowly. They produce either increases or decreases in their pitch. One interesting thing with that Siri study is we found that speakers produced less pitch variation, so kinda like more monotone speech to the Siri voice than the human voice.
Marianne Russ Sharp
That's not too surprising. So it's like, "Hey, Siri, what's the weather like?"
Amy Quinton
But also people who talk to voice AI are hyperarticulating, making the segments of speech slow-er and clear-er.
Marianne Russ Sharp
Un-der-stood. So if our voices change when talking to machines, we are then treating them differently, right? Like they are machines.
Amy Quinton
Well, Georgia says it's not that simple.
Georgia Zellou
So what we know about real human-human interaction is that we naturally adapt. Conversation is dynamic. So I'll change my tone of voice or the words that I'm using as our conversation is unfolding, um, in a natural way, in response to your behavior. And you do that to me, vice versa.
Marianne Russ Sharp
So if my tone of voice changes, like maybe I get excited, yes, you're likely to change your tone of voice and sound excited too?
Amy Quinton
Yeah, you know, and even if I hang around someone with a thick accent, I know I'm likely to pick up on their pronunciation and start speaking like them after a while. Michelle says there's actually a technical term for this,
Michelle Cohn
That process is called alignment, or mirroring. The idea is that you adopt the pronunciation patterns of other people to align more closely to them socially. It plays a social role. And so the idea is, when you are getting along with someone and you want to convey that, you convey that through your speech patterns, but also in your body language and gaze. People do these micro sways together. So there's like this huge, intricate dance of coordination.
Marianne Russ Sharp
I want to do the micro sway. But here's the thing, devices can't mirror you like this, right? There's no body language.
Amy Quinton
No, but Michelle suggests humans might instead be changing their voice to be more monotone, more clear, to reflect what they're hearing from voice AI or Siri.
Michelle Cohn
So we could think about it as kind of like another alignment, like they're aligning more towards what they think that voice sounds like, maybe to be better understood. But the cluster of adaptations really reflects this expectation that it's not going to understand you, even if the actual interaction that day or in the lab is exactly the same as with the human.
Marianne Russ Sharp
Yeah, I pretty much assume it's never going to understand me on the first try. And I articulate pretty well. I, I might even hyperarticulate. But anyway, moving on. You did mention that the researchers did experiments with adults and kids. And so as a mom, I'm pretty curious if kids' voices changed as well.
Amy Quinton
Yeah, so these were school-aged kids, ages seven to 12, in one of the experiments. See if you can tell a difference with this exchange. The first one is between a child and a human.
Human
What's number three?
Child
The word is side.
Human
I heard side. Say the sentence one more time.
Child
The word is side.
Human
Okay.
Amy Quinton
Now see if you can hear the difference between a child and a device.
Device
What's number three?
Child
The word is kid.
Device
I misunderstood that. I heard kit or kid. Repeat the sentence one more time.
Child
The word is kid.
Device
Got it.
Amy Quinton
Could you tell the difference?
Marianne Russ Sharp
Yeah, on the word kid. You can hear the D is overemphasized.
Amy Quinton
Yeah, I could also tell that she was slowing down her voice a little bit.
Marianne Russ Sharp
Yeah
Amy Quinton
I think it's very slight but Michelle and Georgia say it's a significant difference.
Michelle Cohn
Kids actually produce even more kind of evidence that they perceive the devices as having a barrier.
Georgia Zellou
People hyperarticulate to devices, and kids do it even more. Bigger. Like, there's a bigger difference between humans and devices for kids than adults produce.