
If you’ve ever asked Siri or Alexa something, you may have noticed you speak MORE LOUDLY, more slowly or make your words “clear-er.” UC Davis researchers say most of us speak differently when talking to our devices. Voice artificial intelligence may even be changing our social behavior. In this episode of Unfold, we’ll talk to two UC Davis linguists to find out why voice AI is changing the way we operate.
In this episode:
Georgia Zellou, associate professor, Department of Linguistics, UC Davis
Michelle Cohn, postdoctoral researcher, Department of Linguistics, UC Davis
Transcript
Transcripts may contain errors. Transcribed by AI. Fixed by humans.
Amy Quinton
Hey, Marianne.
Marianne Russ Sharp
Hey, Amy.
Amy Quinton
Have you talked to an Amazon Alexa before?
Marianne Russ Sharp
Not really. Does that make me, like, a really weird person? I know I'm one of those people who doesn't have one of those. But if I did, I would ask it to play my favorite podcast, Unfold. And I do talk to Siri sometimes, usually by accident. Does that count?
Amy Quinton
I mean, sure. They're both voice artificial intelligence. But Alexa and I have had a real conversation, and I recorded it.
Marianne Russ Sharp
I bet she recorded it too. Creepy.
Amy Quinton
She did actually. She's part of a conversational competition.
Marianne Russ Sharp
What?
Amy Quinton
Yeah, just listen.
Amy Quinton
Alexa, can you chat with me?
ALEXA
Welcome to the Alexa Prize, a university competition to advance AI. I'll give you a social bot now. Hi, this is an Alexa Prize social bot. So what's on your mind? Let's dive in.
Marianne Russ Sharp
The Alexa Prize. What's that?
Amy Quinton
It's a social bot competition. You know, a UC Davis team from our engineering department, aptly named Team Gunrock, won this contest a couple of years ago. The goal is to create an Alexa social bot and get her to converse engagingly on a range of popular topics for about 20 minutes.
Marianne Russ Sharp
20 minutes is a long time, even for AI. How long did you chat with Alexa?
Amy Quinton
Not long. Here's what it sounded like. What do you want to talk about?
ALEXA
I would love to talk about anything related to sports or technology. What would you like to know?
Amy Quinton
Should we be afraid of AI?
ALEXA
That's a great topic. What do you think is the most terrifying aspect of being afraid of an eye?
Amy Quinton
Of artificial intelligence?
ALEXA
That's an interesting topic as well. Do you think AI has the potential to replace human intelligence in certain areas?
Amy Quinton
No, but it's a fear. And it's weird that you sound like a person.
ALEXA
It's not weird at all. As a bot, I do not have emotions like humans do. But I can understand why it would be scary to think about the potential consequences of AI replacing human intelligence. Do you have any other thoughts on this topic?
Marianne Russ Sharp
It's kind of interesting, because she says it's not weird at all. And I do think it's kind of weird. But she also says, "I don't have emotions, but I can understand." Like, I don't have emotions, but I have a brain, or essentially the power to... right? The brain of a computer.
Amy Quinton
Yeah, well, you know, I gave up on the conversation after maybe five minutes. She misunderstood me a few times.
Marianne Russ Sharp
Yeah, like being afraid of an eye. Again, definitely weird.
Amy Quinton
And I had to repeat myself.
Marianne Russ Sharp
Yeah, that happens all the time.
Amy Quinton
And I slowed down my speech and talked a little louder and clearer for some reason.
Marianne Russ Sharp
I do the same thing with Siri.
Amy Quinton
Everyone does, or at least most people. UC Davis researchers have studied this. One of the things we do is called hyperarticulation.
Marianne Russ Sharp
Sort of exaggerating our speech to be clearer, like we do sometimes with babies.
Amy Quinton
Yeah, exactly.
Marianne Russ Sharp
So we have researchers that study how we talk to AI.
Amy Quinton
Yeah.
Marianne Russ Sharp
Why?
Amy Quinton
Well, that's the first question I asked Georgia Zellou, an associate professor of linguistics.
Georgia Zellou
For the first time in human history, I mean, starting in 2011, when Siri was introduced, humans are talking to non-human entities in a substantive and meaningful way. And it's happening on a daily basis, and children are doing it and people of all ages are doing it. So trying to understand their impact on our language, our development, our sort of social life is something that we're interested in exploring. There's so many kinds of unanswered questions here.
Marianne Russ Sharp
Indeed, so many unanswered questions. So Amy, is she suggesting that our language might even change over time as a result of us talking with voice AI?
Amy Quinton
Yeah, it already is, and just how our future conversations will unfold with voice AI is what we're going to be talking about in this episode of Unfold.
Marianne Russ Sharp
All right, then, as the Alexa social bot says...
ALEXA
Let's dive in.
Amy Quinton
Coming to you from UC Davis, this is Unfold. I'm Amy Quinton.
Marianne Russ Sharp
And I'm Marianne Russ Sharp.
Amy Quinton
Voice AI, or machine-made voices, speak to us all the time. Whether it's Siri as our personal assistant or our GPS system telling us which way to turn, it's really become a staple in a lot of households.
Marianne Russ Sharp
Yeah, GPS has saved my life more times than I can remember. But humans have been fascinated with making machines talk for a very long time. I mean, that's certainly played out in Hollywood, all over the place. I think about “2001: A Space Odyssey.” Remember Hal?
Movie Clip
Open the pod bay doors, Hal. I'm sorry, Dave. I'm afraid I can't do that.
Marianne Russ Sharp
Or that kid in WarGames that made his computer talk?
Movie Clip
Yes, they do. How can it talk? It's not a real voice. This box just interprets signals from the computer and turns them into sound. Shall we play a game?
Amy Quinton
Good examples, although those aren't really machines talking but actors playing the roles of Hal and Joshua, the computers. But do you remember Speak and Spell, the toy that taught you how to spell?
Marianne Russ Sharp
Yes, I do.
Speak and Spell
Spell circuit. C, I, R, C, U, I, T. That is correct.
Marianne Russ Sharp
Gosh, that sounds so robotic. Thinking about how Siri and Alexa sound now, boy, technology has come a long, long way.
Amy Quinton
Did you know that the first device that could generate continuous human speech electronically was invented in 1939?
Marianne Russ Sharp
1939? I would not have guessed. Are you serious?
Amy Quinton
Yeah, it was called the VODER for Voice Operation DEmonstratoR.
Marianne Russ Sharp
VODER. Not the best name but a magnificent feat for 1939.
Amy Quinton
It was invented by a guy named Homer Dudley at Bell Laboratories as a demonstration of their innovations, and it was a highlight of the 1939 World's Fair.
Marianne Russ Sharp
How did it work?
Amy Quinton
Well, the speech sounds had to be made manually on a special keyboard that produced all the vowels and consonants of speech. So it required an operator, who also had to use a wrist bar, foot pedals and an arm switch to generate sounds.
Marianne Russ Sharp
Wow, so it was a workout and not an easy conversation?
Amy Quinton
No, but I have a recording of it, actually. Want to hear it?
Marianne Russ Sharp
Yes, please.
Amy Quinton
Well, in this demonstration, the operator of the VODER is a woman named Helen Harper.
MAN
For example, Helen, will you have the VODER say, "She saw me"?
VODER
SHE SAW ME.
MAN
That sounded awfully flat. How about a little expression? Say the sentence in answer to these questions. Who saw you?
VODER
SHE saw me.
MAN
Whom did she see?
VODER
She saw ME.
MAN
Well, did she see you or hear you?
VODER
She SAW me.
Marianne Russ Sharp
The intonation, already at that time. It's kind of blowing my mind. Although I will say it sounded nothing like the voice AI that we hear today. No computer algorithms were creating that one.
Amy Quinton
Yeah, but it was pretty impressive. My understanding is that it was really difficult to operate, but it paved the way for future machine-generated speech.
Marianne Russ Sharp
Wow. And now we have researchers investigating whether that machine-generated speech is changing the way we operate, or at least how we speak.
Amy Quinton
And they're also trying to understand our social interaction and behavior toward voice AI. In other words, are we treating these devices like people and building a mental picture of what they are like? Georgia Zellou, who we spoke to earlier, explained it like this.
Georgia Zellou
As soon as machines speak to us with voices, they are portraying apparent gender, apparent race, apparent regional background, apparent language background, and all these things in natural human-human conversation are really, really significant and important and affect how we perceive and use language. So are we just doing the same thing when we talk to machines? Or are we sort of creating a very specialized, separate way of handling machines?
Marianne Russ Sharp
It seems to me it would be unavoidable that we would treat them like machines and not humans. We know, as we're speaking to them, that they are not human. So doesn't that mean that our language or our voice would change when we talk to devices like Siri?
Amy Quinton
Well, Michelle Cohn is a postdoctoral researcher in linguistics at UC Davis and she and Georgia did a couple of experiments to figure this out. They had people, both young adults and children, talk to devices and talk to other humans saying the same phrases. They even introduced planned errors with both humans and devices.
Marianne Russ Sharp
So they intentionally had a person and a device misunderstand them?
Amy Quinton
Yeah. And here's how Michelle said a human talked with voice AI compared to how a human talked with another human.
Michelle Cohn
They're speaking more loudly to voice assistants, often more slowly. They produce either increases or decreases in their pitch. One interesting thing with that Siri study is we found that speakers produced less pitch variation, so kinda like more monotone speech to the Siri voice than the human voice.
Marianne Russ Sharp
That's not too surprising. So it's like, "Hey, Siri, what's the weather like?"
Amy Quinton
But also people who talk to voice AI are hyperarticulating, making the segments of speech slow-er and clear-er.
Marianne Russ Sharp
Un-der-stood. So if our voices change when talking to machines, we are then treating them differently, right? Like they are machines.
Amy Quinton
Well, Georgia says it's not that simple.
Georgia Zellou
So what we know about real human-human interaction is that we naturally adapt. Conversation is dynamic. So I'll change my tone of voice or the words that I'm using as our conversation is unfolding, um, in a natural way, in response to your behavior. And you do that to me, vice versa.
Marianne Russ Sharp
So if my tone of voice changes, like maybe I get excited, yes, you're likely to change your tone of voice and sound excited too?
Amy Quinton
Yeah, you know, and even if I hang around someone with a thick accent, I know I'm likely to pick up on their pronunciation and start speaking like them after a while. Michelle says there's actually a technical term for this,
Michelle Cohn
That process is called alignment, or mirroring. The idea is that you adopt the pronunciation patterns of other people to align more closely to them socially. It plays a social role. And so the idea is, when you are getting along with someone and you want to convey that, you convey that through your speech patterns, but also in your body language and gaze. People do these micro sways together. So there's like this huge, intricate dance of coordination.
Marianne Russ Sharp
I want to do the micro sway. But here's the thing, devices can't mirror you like this, right? There's no body language.
Amy Quinton
No, but Michelle suggests humans might instead be changing their voice to be more monotone, more clear, to reflect what they're hearing from voice AI or Siri.
Michelle Cohn
So we could think about it as kind of like another alignment, like they're aligning more towards what they think that voice sounds like, maybe to be better understood. But the cluster of adaptations really reflects this expectation that it's not going to understand you, even if the actual interaction that day or in the lab is exactly the same as with the human.
Marianne Russ Sharp
Yeah, I pretty much assume it's never going to understand me on the first try. And I articulate pretty well. I, I might even hyperarticulate. But anyway, moving on. You did mention that the researchers did experiments with adults and kids. And so as a mom, I'm pretty curious if kids' voices changed as well.
Amy Quinton
Yeah, so these were school-aged kids, ages seven to 12, in one of the experiments. See if you can tell a difference with this exchange. The first one is between a child and a human.
Human
What's number three?
Child
The word is side.
Human
I heard side. Say the sentence one more time.
Child
The word is side.
Human
Okay.
Amy Quinton
Now see if you can hear the difference between a child and a device.
Device
What's number three?
Child
The word is kid.
Device
I misunderstood that. I heard kit or kid. Repeat the sentence one more time.
Child
The word is kid.
Device
Got it.
Amy Quinton
Could you tell the difference?
Marianne Russ Sharp
Yeah, on the word kid. You can hear the D is overemphasized.
Amy Quinton
Yeah, I could also tell that she was slowing down her voice a little bit.
Marianne Russ Sharp
Yeah
Amy Quinton
I think it's very slight but Michelle and Georgia say it's a significant difference.
Michelle Cohn
Kids actually produce even more kind of evidence that they perceive the devices as having a barrier.
Georgia Zellou
People hyperarticulate to devices, and kids do it even more. Bigger. Like, there's a bigger difference between humans and devices for kids than adults produce.