How Google Goggles Won, Then Lost, the Camera-First Future

As more people talk, play, and work through the lens of their smartphone, Google's trying to finish what it started.

Google's first public foray into augmented reality began with an argument in a bar. It was 2008, and David Petrou, a longtime Google engineer, was sitting in a Brooklyn watering hole explaining to his friends how someday, you'd be able to do a search just by pointing your phone's camera at something. He likened it to pointing and asking, what's that? It would be faster and richer than typing, and could help you ask questions you'd never be able to put into words. Based on what he'd seen within Google, Petrou thought the tech could already work. His friends, of course, said he was crazy. They thought computer vision was science fiction.

Petrou left the bar early and angry, went home, and started coding. Despite having no background in computer vision, and a day job working on Google's Bigtable database system, Petrou taught himself Java so he could write an Android app and immersed himself in Google's latest work on computer vision. After a month of feverish hacking, Petrou had the very first prototype of what would soon become Google Goggles.

Petrou still has a video of an early demo. He's crammed into a Google conference room with Ted Power, a UX designer, talking into a webcam. Before he starts, Petrou explains what he's working on. "The idea is generic image annotation, where an image can come in to Google and a number of back-ends can annotate that image with some interesting features." Crystal clear, right?
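In modern terms, what Petrou is describing is a fan-out architecture: one image goes to several independent annotators, and their labels get merged into a single result. Here's a toy sketch of that idea; the back-ends are hypothetical stand-ins, not anything from Google's actual system:

```python
# Toy sketch of "generic image annotation": fan one image out to several
# independent back-ends and merge their annotations. All back-ends here
# are hypothetical placeholders for illustration only.
from concurrent.futures import ThreadPoolExecutor

def ocr_backend(image: bytes) -> list[str]:
    return ["text:..."]          # stand-in for a real OCR service

def landmark_backend(image: bytes) -> list[str]:
    return ["landmark:..."]      # stand-in for a landmark matcher

def logo_backend(image: bytes) -> list[str]:
    return ["logo:..."]          # stand-in for a logo matcher

BACKENDS = [ocr_backend, landmark_backend, logo_backend]

def annotate(image: bytes) -> list[str]:
    """Send one image to every back-end in parallel and merge the labels."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda backend: backend(image), BACKENDS)
    return [label for labels in results for label in labels]
```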

To explain, Petrou grabs a G1, Google's then-new Android phone, and takes a photo of a newspaper article about Congressional oversight of ExxonMobil. A moment later, the phone spits back all the article's text, rendered in white text on a black background. It looked like a DOS prompt, not a smartphone app, but it worked impressively—except near the end when it spelled the company Em<onMobile. A few minutes later, Petrou showed a terribly lit photo of Power's desk, littered with books and cables, with a MacBook in the center. The app surveyed the image and returned 10 terms to describe it. Some made sense, like "room" and "interior." Others, like "Nokia," less so. Two terms particularly excited Petrou: "laptop" and "MacBook." They showed this camera could see objects, and understand them. But still, almost immediately after, Petrou preached caution: "We are a long way to providing perfect results," he said into the webcam.

The first versions of Goggles couldn't do much, and couldn't do it very well. But searching the web just by taking a photo still felt like magic. Over the next few years, Google Goggles would capture the imaginations of Google executives and users alike. Before Apple built ARKit and Microsoft made HoloLens, before anyone else began to publicly explore the possibilities of augmented reality, Goggles provided a crucial early example of how smartphones could interact with the real world.

Then Goggles died. The first great experiment in smartphone AR came and went before anyone else could even copy it.

Robin Williams used to joke that the Irish discovered civilization, then had a Guinness and forgot where they left it. So it was with Google and smartphone cameras. Nearly a decade ago, Google engineers were working on ideas that you'll now find in Snapchat, Facebook, the iPhone X, and elsewhere. As the tech industry moves towards the camera-first future, in which people talk, play, and work through the lens of their smartphone, Google's now circling back, tapping those same ideas and trying to finish what it started. This time it's hoping it's not too late.

Seeing Is Believing

When Petrou first started working on Goggles, he had no idea how many other Googlers were working on the same stuff, or how long they'd been at it. In 2006, Google had acquired a Santa Monica-based company called Neven Vision, which possessed some of the most advanced computer-vision tools anywhere. Google had a particular idea for where to deploy the technology: its Picasa photo-sharing app. "It could be as simple as detecting whether or not a photo contains a person, or, one day, as complex as recognizing people, places, and objects," Adrian Graham, Picasa's product manager, wrote in a blog post announcing the acquisition. "This technology just may make it a lot easier for you to organize and find the photos you care about."

After a couple of years, as Neven Vision's tech integrated further into Picasa, founder Hartmut Neven and his team started to think a little bigger. "We were all inspired by the Terminator movie, when he walks into the bar and everything gets identified," says Shailesh Nalawadi, a former product manager on the team and now CEO at Mavin, Inc. "We thought, 'Hey, wouldn't it be amazing if you could have something like that, match it against a database, and it would tell you what's in that picture?'"

Eventually the Neven Vision crew met Petrou, and they started working on a better prototype. They built an app that could identify book covers, album art, paintings, landmarks, and lots of other well-known images. You'd take a picture, and after 20 seconds or so of uploading and processing, the app would return search results for whatever you were looking at. It was primitive, but it worked.

Lots of projects within Google start the same way: one person builds something, shows it around, generates enough excitement to get a few more people interested, and they contribute resources to build it out further. For the Goggles team, that happened easily. Almost everyone who saw the app walked away amazed by it. Two execs in particular became high-level champions of the idea: Vic Gundotra, a vice president of engineering, and Alan Eustace, a senior vice president of knowledge. They brought resources, energy, and ambition to Goggles. Googlers started to talk about how great it would be when the app was universal, when it could recognize anything and everything. "Everyone at Google understood that this was possible, this was familiar, and yet transformative," Nalawadi remembers. "That we were on the cusp of this thing, and it could be done." He likens it to self-driving cars: wild and futuristic, but also completely natural. Why shouldn't you be able to point your phone at something and ask, what's that? It felt inherently Google-y.

Google launched Goggles as a public product in December of 2009, at an event at the Computer History Museum down the street from Google's Mountain View campus. The version Google demoed had only a few features: It could identify landmarks, works of art, and some consumer products, but little else. Google projected both caution and optimism about the product. It was part of Google Labs, and even in the app's setup it told you all the things it couldn't do. But everyone knew the plan. "Google Goggles today works very well on certain types of objects in certain categories, but it is our goal to be able to visually identify any image over time," Gundotra said at the launch. "Today you have to frame a picture and snap a photo, but in the future you'll simply be able to point to it...and we'll be able to treat it like a mouse pointer for the real world."

Internally, though, the team behind Goggles was staring down a long list of problems with the technology. They knew that mouse-pointer future was years off, if it was even possible. "We always knew it was more like a research project," one former engineer says. Even the most advanced computer vision was still quite primitive, and since Google hadn't yet begun to work deeply with machine learning and neural networks, all Goggles could do was pattern-match a photo against a database.
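To picture what pattern-matching against a database means in practice, here's a minimal sketch using OpenCV's ORB local features as a stand-in technique. Google's actual pipeline was never public, so this illustrates the general approach, not the product's code:

```python
# Minimal sketch of database pattern-matching: extract local features from a
# query photo and pick the reference image with the most matching features.
# ORB is our illustrative choice; Goggles' real feature pipeline isn't public.
import cv2

def match_score(query_path: str, reference_path: str) -> int:
    """Count mutual-best local-feature matches between query and reference."""
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    reference = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create()  # detects keypoints, computes binary descriptors
    _, query_desc = orb.detectAndCompute(query, None)
    _, ref_desc = orb.detectAndCompute(reference, None)
    if query_desc is None or ref_desc is None:  # featureless image, e.g. blank wall
        return 0

    # Brute-force matching with cross-checking keeps only mutual best matches
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return len(matcher.match(query_desc, ref_desc))

def identify(query_path: str, database: dict[str, str]) -> str:
    """Return the label of the reference image that best matches the query."""
    return max(database, key=lambda label: match_score(query_path, database[label]))

# e.g. identify("photo.jpg", {"Mona Lisa": "monalisa.jpg", "Starry Night": "starry.jpg"})
```

Note the core limitation the team ran into: an approach like this can only recognize things that already sit in the database as near-identical images, which is why logos and book covers worked while plants and animals didn't.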

Some of the problems weren't even Google's to solve. Smartphone cameras weren't yet great, nor were people very good at using them. And even when people took good photos, there were often lots of potentially interesting things in them; Google couldn't know whether you cared about the tree, the bench, the puppy, or the sign in your shot. Text-recognition tech could help identify things, but even that was brand new. Curved or handwritten text challenged the algorithms, as did objects, like a particular model of car, identifiable only by subtle differences. Logos were easy; plants were hard. Barcodes were simple; animals were impossible. Even the things that did work took too long over 3G.

Most frustrating, Google couldn't even use the thing it did best, the most Terminator-y feature of all: facial recognition. "If there are six or more pictures of you on the internet that are well-tagged, and you through our system take a seventh picture, you had 90 percent probability of the right answer being in the first ten search results," Nalawadi says. But Google knew it couldn't roll out the feature at a time when regulators and consumers were already beginning to worry about how much Google knew about them. Scarred by the launch of Google Buzz a few months earlier, which had been rife with privacy problems, the team left facial recognition on the cutting-room floor.

What it looked like to use the first versions of Google Goggles.

Even as the team hammered away at the many mountainous tasks, Google kept preaching the Goggles gospel. In the summer of 2010, Petrou delivered a keynote address at the Hot Chips conference at Stanford, in which he laid out an even more exciting vision. About halfway through an otherwise deeply technical talk, Petrou flipped to a slide called "Digression into Augmented Reality." The Goggles team had been thinking about AR for a while, it turned out. They figured that if your camera could understand what it was seeing, it could potentially add more things into the scene. One former engineer recalled experimenting with ways to identify things inside your viewfinder, so that when a car drove through your view, a small AR arrow reading "Subaru" might follow it. Petrou, likewise, imagined a user standing on the famous Abbey Road crosswalk and watching the Beatles re-create their album cover in AR. Or, in another Terminator-inspired idea, he imagined amplifying certain things in your view, as if you were looking through a thermal camera.

Toward the end of that same talk, Petrou acknowledged what came to be the most important question facing Goggles, one that would come to plague every company that worked on AR later. He put up that iconic image from WALL-E: a bunch of uniform-wearing obese people seated in chairs, sipping drinks and staring at screens. "If this is our future, then maybe AR is not all that important," Petrou said. Augmented reality and image search only matter if people care about the world around them, and every screen-time trend suggests they care less and less.

The Goggles team searched constantly for ways to get people using Goggles more often. Goggles became a Sudoku solver, a translation tool, and a barcode scanner, all to give people more reasons to return to the app. Petrou remembers working on a feature called "Virtual Graffiti," where you could draw in AR and leave it somewhere for others to find. The feature sounds virtually identical to the augmented-reality art Facebook showed off for its Facebook Camera platform in 2017. Google was years earlier to the idea, but never shipped it.

Glass Shattering

Google continued developing Goggles, but progress soon stalled. The company had promised a full iPhone version of Goggles, but eventually folded it into the Google app instead, then quickly removed the feature. Google hardly talked about Goggles after 2011. By 2012, the company had more or less shut down development.

Most of the people I spoke to had differing ideas about what killed Goggles. One member of the team says they eventually saw the limits of the tech and just gave up. Another says people weren't yet comfortable with the idea of walking around holding their camera up all the time. But there was one other factor, the only one everyone mentioned, that may have been the real culprit.

In 2011, Google filed a patent application for a "head-mounted display that displays a visual representation of physical interaction with an input interface outside of the field of view." That's a whole bunch of words, but the picture told the story: It was Google Glass. The name on the patent? David Petrou.

Google Glass promised all the features of Goggles, right in front of your eyes.


Petrou says that "we never questioned mobile phones" as a useful place for visual search, but others say the Goggles team always knew smartphones weren't the ideal devices for their tech. Eventually, they figured, users would rather have a gadget they don't have to hold up or manage; a pair of glasses made sense. (Contacts seemed even cooler.) All that tech seemed years away, though, and would require big leaps in processing power, battery efficiency, and internet connectivity. They kept working on smartphones because smartphones worked.

But practicality didn't matter to everyone. One former member of the Goggles team told me that, in part, Google executives liked Goggles simply because it was "a whizzy demo." Co-founders Larry Page and Sergey Brin loved showing Goggles to people, this person said, because it was new and nifty and futuristic. When Glass came along, promising not just camera-enabled search but a whole new kind of device and platform, Goggles paled in comparison. "It was an even whizzier demo," the former engineer says.

Indeed, Glass was touted far beyond any other Google product before or since. Brin interrupted a keynote address at the Google I/O conference in 2012 just in time to watch Glass-wearing skydivers fall through the air, land on the roof of the conference center, and ride BMX bikes into the auditorium. In a remarkable video titled "One day..." Google showed what a Glass-augmented life might look like. Brin even took Glass to the TED conference in 2013, passionately arguing for a future where gadgets free your eyes, hands, and ears rather than occupying them. Glass offered a complete and enticing view of the future, and inspired many inside and outside Google. Never mind that the tech didn't really work.

Pretty quickly, Nalawadi says, "I think the momentum shifted to project Glass." A few Goggles employees even went to work on the team. Others went elsewhere: to Maps, to YouTube, to Google Now. Some left Google altogether. At some point, Goggles just wasn't a thing anymore. By mid-2014, nobody was left to even update the Android app.

Back Where We Started

Right as Google gave up on Goggles, other companies began to see value in the idea. Snapchat launched in 2011 as a tool for sending disappearing messages, but quickly embraced smartphone cameras as a powerful platform. Pinterest hinged on turning images into search queries; pin a chair you like, and Pinterest helped you decorate your house. For Apple, Facebook, and others, augmented reality shifted from sci-fi impossibility to near-future product.

Even within Google, the underlying technology wasn't going to waste. In fact, it was improving faster than ever. "We had this big step-function jump because of deep learning," says Aparna Chennapragada, a senior director of product at Google. "The same step-function jump we got with voice, we started seeing in image search." Thanks to Google's investment in AI chips, and its company-wide shift to AI thinking, results got better and improved more quickly.¹ The first result of the shift: Google Photos, with its powerful search and assistive abilities. (Here Google finally got to roll out its facial recognition, too.)

After all these years, most of what held Goggles back has been solved. Smartphone cameras are excellent, as are the context-gathering sensors, like gyroscopes and GPS, that help anchor a user's position in the world. As a result, billions of users happily open their phones dozens of times a day to share memories, capture receipts, live-stream events, and save things to remember later. The back-end tech is faster, and the front-end interfaces are easier. Nobody's wearing face-puters yet, but users don't mind pointing their phones at the world instead.

Matt Vokoun, director of product management at Google, introduces Google Lens at a product launch event on October 4, 2017, in San Francisco, California.

All that helps explain what happened in May of 2017, when Google CEO Sundar Pichai took the stage at the I/O developer conference and announced... Goggles again, basically. Only this time it's called Lens. "Google Lens is a set of vision-based computing abilities that can understand what you're looking at, and help you take action based on that information," Pichai said. He gave demos: identifying a type of flower, or automatically connecting to Wi-Fi just by taking a picture of a network name and password. So far, so Goggles. Including the fact that none of what Pichai showed onstage would be possible in the actual product anytime soon. Right now, Lens does the same things Goggles did in 2010, only much faster.

It's easy to wonder if Google squandered a years-long advantage in thinking about how people might want to use their camera. A few people in the company understood that users might someday want to explore the world through the screen of their phone. They might want to point their phone at something to understand it better, and might want to overlay the digital world on top of the physical one. Google may have known it first, but others beat it in the race to build something that captured the hearts and minds of users.

Still, even if Google could have been earlier to the party, it's not too late now. Google does have a huge set of intrinsic advantages, from its search-engine knowledge to its long history of collecting and personalizing data. And Google did learn a few lessons from the Goggles experiment. This time, Lens won't be a standalone app. Instead, the tech will course through lots of Google products. It already helps you grab phone numbers or restaurant info from any shot in Google Photos. Soon it'll be part of Google Assistant, helping you search for anything you need, any way you want. Rather than make an app you may never open, Google's putting Lens everywhere you already are, with the hope that you'll discover it and use it.

Google's made clear that Lens is a long-term bet for the company, and a platform for lots of use cases. Pichai compared Lens to Google's beginnings, how search was only possible because Google understood web pages. Now, it's learning to understand the world. You can bet that next time Google tries to put a computer on your face, Lens will be there. That'll make for quite a demo.

¹ UPDATE: This piece now accurately reflects which parts of Google's AI investments directly contributed to its visual search projects.