Android Director: 'We Have the Most Accurate, Conversational, Synthesized Voice in the World'

Google's Hugo Barra, the product manager for Android, talks with Wired about what's new for the world's most popular mobile operating system: Google's voice, Google Now, Jelly Bean and the Asus Nexus 7 tablet.

When Google unveiled its latest mobile operating system to the world last week, the company asked a reserved but extremely confident man named Hugo Barra to grab the microphone and celebrate Android 4.1 as the best mobile operating system the world has seen. It couldn't have been easy to sing the praises of an OS code-named "Jelly Bean" with a completely straight face, but Barra, Android's director of product management, was cool and composed as he shared Android's latest killer features.

There was the new graphically enhanced search tool, Google Now. There was the new voice-based search assistant -- Google's answer to Apple's Siri. And there was also a new piece of hardware -- the Nexus 7 -- which would show off Android's full potential. Barra anchored all these announcements, reporting the Google I/O news that the world was most interested in hearing.

And now he speaks directly with Wired about Google's mobile future. We sat down with Barra last week at Google I/O to pick his brain about the Nexus 7, and all the other key Android announcements. Here is the edited conversation.

Wired: Jelly Bean really has two major new features -- Google Now and voice search. Walk us through the thinking behind these additions.

Hugo Barra: The concept of a card with some information in it [Google Now] isn't actually new. For a long time, we've had the notion of "One Boxes." Whenever Google presents information to you on top of search results -- it's sort of formatted in a particular way, and physically separate from the search results -- we've called that a "One Box" for a while. So we've taken that concept of a card with information in it just a few steps further by formatting it in a way that's more appropriate for mobile devices and giving it a significant amount of visual polish. It's not a new concept. It's just an advancement of an existing concept when it comes to search.

Wired: Is Google Now just making things look prettier, or is this actually a use case-driven enhancement? Can you quantify whether this makes information easier or more accessible to the user?

Barra: It certainly is. If you've asked a question for which a specific answer or a small set of specific answers exist, you're likely wanting to see that specific answer, right? So rather than trusting that the user will sift through the web in a highly precisely ranked form, we take it one step further, and serve that answer up on an information card.

The second thing you talked about -- giving Google a voice -- is very use case-driven. If you're in a situation where you're asking a question with your voice, there's a significant chance you're in a somewhat constrained environment. You're on the go, you're rushing. You might be in the car. You're carrying something else with your hands. You can't really pause to look at your screen or type.

So speaking it back to you seems pretty natural, right? That's how humans communicate. But we also wanted to do that only when we had a text-to-speech engine that was extremely high quality. And what you hear today, if you ask Google a question on Jelly Bean, is quite spectacular. There isn't a text-to-speech engine, as we call them, that has accuracy as high as that.

We didn't talk about this in the keynote, but we have built a text-to-speech engine that's network-based, meaning it uses a very large amount of data to compose a spoken answer. You know, purely from a synthesis perspective -- forget about answering questions -- it takes a very large amount of data to generate a synthesized audio of someone speaking. But we also have a matching engine that sits on the device. It's the exact same voice but with a very different computational technique. You'll always hear the same voice whether it's speaking back to you in a connected use-case, in which it comes from the server, or a disconnected offline use-case, in which it would just be synthesized on the device.

Wired: What makes a good voice? Did you model it after someone?

Barra: I actually come from speech recognition, and I worked in speech in general for a very long time. So don't let me talk about this all day. But it's a very, very intricate process. And it starts with finding a voice talent.

Wired: A real person?

Barra: Finding a person who has a voice that just nails it. And in this day and age, it's actually a very different voice talent than the voice talents that power most of the voice technology that exists today. A lot of today's voice technology comes from the companies you'd expect -- Nuance and Microsoft and others. That technology is built for a telephony world, for a customer service environment where you need this posh, powerful voice -- a branding approach to things.

We set out to create the very first conversational voice, and I think we nailed that. I think we have the very first high-quality, natural-sounding, conversational, synthesized voice in the entire world.

Between a bunch of designers, engineers and speech scientists, we sat down and tried to describe the personality of the person, the personality of the voice that we were trying to create. We wrote down "friendly" [as a product goal] and there were literally 15 different ways to describe what friendly means. So that was the brief that we gave to a casting agency, and they came back with 10 candidates. We recorded those 10 candidates, and we did a bunch of blind tests with all sorts of different people, and we voted it down to two people. And then we recorded more of those people, and we did some tests and we decided "OK, we're going to go with this one person."

I don't actually know her name. In fact, no one knows her name.

Wired: It's a secret?

Barra: It's supposed to be. It's not something that you publicize because it needs to be the voice of Google. And then you create the voice, you collect a lot of data. What we did is an industry first.

Wired: While it does sound more human-like, it doesn't have a lot of personality in the sense that it doesn't say funny things back to you. It doesn't deliver jokes.

Barra: So nothing to do with the voice itself, but what it says and how it says it?

Wired: Exactly. Is that something you guys were looking to add in the future, or is that something you wanted to leave out?

Barra: It's very deliberately not making jokes with you. Google is a neutral party -- it's not your friend, secretary or sister. It's not your mom. It's not your girlfriend or boyfriend. It is an information retrieval entity. You ask, we respond. And it's very important that this entity be impartial, and adding jokes and other mannerisms to the voice would take away from that.

It's something that we've talked about, and it's pretty clear. There hasn't been a single person in the company who thinks we should have gone the other direction.

Wired: Samsung already has S Voice and LG is working on its Quick Voice feature. So is Google introducing its own voice feature because it doesn't want 15 different variations of the same sort of function on Android devices?

Barra: It is not. It's simply an evolution of the Google search experience. All of the assets that we use -- both the online and offline speech engine, as well as the speech synthesizer -- those are all assets that our hardware partners can use to compose whatever experience they want. Our goal was simply to build the next-generation Google search experience. Voice in and voice out, and then a brand new feature called Google Now.

Wired: Is there a name for the voice that we hear in Jelly Bean?

Barra: Google Voice Search. It's always been called Voice Search. It continues to be called voice search.

Wired: What does Jelly Bean say about Google's view on the direction of mobile operating systems and devices, and the industry as a whole?

Barra: Some of the things that we did in Jelly Bean are representative of where we think the industry should go. I'll just mention two.

One is the home screen experience. We did this with Android with the first generation of widgets -- this notion of having an application space of your own where stuff appears and actions can be invoked, without having to dive into an application. People want that, people need that.

The second thing is task switching. There are all these awesome, specialized applications that exist today. I think there's a specialization trend, by the way, in mobile. You use a lot more applications a lot more often, often for very simple tasks, so put those in the notification shade. Something as simple as calling back should not be three clicks away. It should be one click away. Bringing the application action value to the surface, when it’s needed, where it’s needed. We think we’re doing a lot of things that set direction for the industry.

Wired: Android 4.0, Ice Cream Sandwich, at this point, is on only about 7 percent of Android devices. The fact that Ice Cream Sandwich and Jelly Bean are so similar, is that going to make it easier for hardware partners to port their software over? Or will we see the same lag in adopting the latest software that we saw with Ice Cream Sandwich?

Barra: We don't know. These are business decisions that our partners make, but we certainly are making it easier.

First, you are right that they are similar and that does, yes, make it easier. If you take a look at the difference between the two platforms, you'll see that there is a smaller difference between Jelly Bean and Ice Cream Sandwich than there was between Ice Cream Sandwich and Gingerbread.

But we are launching the Platform Development Kit, the PDK, for our hardware partners. It's starting in beta. It's really going to be full-on in the next release, but it's already there. We want partners to innovate in parallel so that by the time we're ready, they're ready. I think that will shorten the cycle and that's really the objective of the PDK.

Wired: The Nexus 7 tablet is the first Jelly Bean tablet and it looks really different than any Honeycomb or Ice Cream Sandwich tablets that are out there. The operating system stays in a portrait orientation. You even have an app tray very similar to what we see on our phones. Is this a signal to your hardware partners that says, "This is the style in which you should make tablets"?

Barra: It is a signal to the industry. We've done a tremendous amount of user research to understand what people want. But first, a few things.

We do think that this form factor is one that the industry hasn’t embraced as much as it should. This fills a very important gap. It’s a device that you can carry in a small purse or back pocket. Just walk around Moscone, and that’s what you're going to see. It's the device that's totally cool to have with you in the subway or the bus, and then when you get up, you don't have to put it away.

There's a huge market gap that we're filling with the Nexus 7, and we're doing it really well because it's a really powerful computer. It's the most powerful 7-inch tablet the world has seen by leaps and bounds. In that sense, we're setting a direction for the industry, or suggesting a direction for the industry.

As far as the user interface is concerned, we think that Jelly Bean is a much more modern UI for a tablet of this size. When it comes to the 10-inch size, that's really going to depend on the product partners.

Wired: Are we going to see a Nexus 10?

Barra: This is where we're starting. We'll take it one step at a time. This is where we're starting and we'll see what partners do in the 10-inch form factor.

Wired: What did the relationship with Asus look like? Did you just get a bunch of guys from Asus to come down to Mountain View and work together every day? Or did Google design something, and say, "Hey, build this for us?"

Barra: I think it was about four months, and we did have them over and we also went over there ourselves. It was a lot of hard work at high intensity because of the short period of time. We really wanted to get something out here, but it was actually great to have a particular place in time where it was "If we don't make it by then, it'll no longer be available to us." We wanted to launch something here at I/O and it was a lot of work.

Wired: Four months is a very short period of time. Did Google see the MeMO 370T at CES and turn that into a Nexus tablet? Or were you all looking for the right hardware partner, and just hadn't found it until four months ago?

Barra: We didn't think that someone had nailed the digital content device. I'm talking about a device that allows you to do movies, books, magazines and so on, but also gaming. Super high-performance gaming, with a gyroscope, a pretty powerful GPU and so on. We didn't think anyone had nailed that in this form factor. We thought there was an opportunity, a gap in the world. So we spent a little bit of time talking to people until we found the right partner and when we did, it was full speed ahead.

Wired: Does Google need to convince consumers that the Nexus 7 is an entertainment device worth picking up? The price is right, the hardware and specs are right, and the content is there, but consumers haven't traditionally seen Google as a place to buy digital media.

Barra: We've just built a new brand that didn't exist a few months ago.

Wired: Google Play?

Barra: Yes, Google Play. We all know that new brands don't make themselves. They require education and marketing. Android Market wasn't an obvious destination for you to go buy a book. It really wasn't. And therefore, yeah, we do have to make it known to people that it's a destination that will have the stuff that they want.

Google Play is Nexus 7 and Nexus 7 is Google Play. So which one is it that you are selling? Is it Google Play or the Nexus 7? Well, it's really both. So hopefully that'll work. And you know, page 5 of the Wall Street Journal, we had a full page ad [on Thursday]. We're really serious about this.