Google's Speech Recognition Can Now Handle Cocktail Parties

Noisy environments and multiple voices no longer stop Google from understanding what is being said.

April 13, 2018

Humans are great at picking out and focusing on a single voice in a noisy environment. Computers are getting much better at speech recognition, as the growing number of digital assistants proves, but they still struggle when there are multiple voices or lots of background noise. Google seems to have solved the problem, though, by using both audio and video to train a system to isolate speech.

The phenomenon Google was attempting to copy is known as the cocktail party effect. It's the brain's ability to selectively focus on one source of audio while filtering out all other stimuli. A good example of this is listening to someone talk in a very noisy room.

Google Research tackled the problem by combining video and audio in order to identify who is speaking based on mouth movements and linking that up to the audio being heard. Training a "multi-stream convolutional neural network" to carry out this task required collecting 100,000 high-quality video lectures and talks from YouTube, then extracting clean speech segments from them.

This resulted in 2,000 hours of clean data with which to create "synthetic cocktail parties." Google achieved that by mixing the videos together so that two people were talking simultaneously. Non-speech background noise was also added just to make things more realistic (and difficult).
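To give a rough sense of what building a "synthetic cocktail party" involves on the audio side, here is a minimal Python sketch. It is not Google's code: the function names, the 10 dB speech-to-noise ratio, and the toy tones standing in for real voices are all assumptions made for illustration. The idea is simply to add two clean speech tracks together, layer scaled background noise on top, and keep the clean tracks so a model has something to aim for.

```python
import math
import random

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def scale_to_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio equals `snr_db` decibels."""
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [n * gain for n in noise]

def make_synthetic_mixture(speaker_a, speaker_b, noise, snr_db=10.0):
    """Sum two clean speech tracks plus scaled background noise.

    Returns the mixture along with the clean tracks, which a
    separation model could use as training targets.
    (Hypothetical helper; the 10 dB default is an assumption.)
    """
    n = min(len(speaker_a), len(speaker_b), len(noise))
    a, b = speaker_a[:n], speaker_b[:n]
    speech_sum = [x + y for x, y in zip(a, b)]
    scaled_noise = scale_to_snr(speech_sum, noise[:n], snr_db)
    mixture = [s + z for s, z in zip(speech_sum, scaled_noise)]
    return mixture, a, b

# Toy stand-ins for real audio: two tones as "voices", random values as noise.
rate = 16000
voice_a = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
voice_b = [math.sin(2 * math.pi * 330 * t / rate) for t in range(rate)]
rng = random.Random(0)
background = [rng.uniform(-1.0, 1.0) for _ in range(rate)]

mixture, target_a, target_b = make_synthetic_mixture(voice_a, voice_b, background)
```

In a real pipeline the two "voices" would be the clean speech segments extracted from the YouTube videos, and the matching face video for each speaker would be kept alongside the audio.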

As Google's demonstration video shows, once trained, the system is capable of focusing on a single voice and filtering out everything else. The same is possible when only one person is speaking but the background noise is bad enough that you struggle to hear what is being said.

Another of Google's examples shows how the system can improve audio in a noisy cafeteria setting.

As you can imagine, there are many situations where this technology could have a positive impact. For pre-recorded video, it should make automatic captioning much more accurate because each voice can be focused on as part of the process. It may take multiple passes, but it's worth it if the recognition accuracy increases significantly.

For the hard of hearing, the system could be used as part of a hearing aid and smart glasses combo. The wearer looks at the person they want to listen to in a noisy environment, and the hearing aid filters out everything but that person's voice because the camera on the glasses is tracking their mouth movements. The same is possible when watching TV, which could benefit from a new "speech focus" setting for audio output. YouTube would probably get such a feature first, though.

Google is already exploring how it can incorporate the technology into its products, and it's obvious Google's digital assistant will be a focus. Conversing with Google Home devices in a noisy family environment, or instructing Google to do something using your smartphone in any number of noisy public situations, are clear near-future uses for this tech.

About Matthew Humphries

Senior Editor

I started working at PCMag in November 2016, covering all areas of technology and video game news. Before that I spent nearly 15 years working at Geek.com as a writer and editor. I also spent the first six years after leaving university as a professional game designer working with Disney, Games Workshop, 20th Century Fox, and Vivendi.
