Machine learning —

Apple published a surprising amount of detail about how the HomePod works

Machine learning is a big focus at Apple right now—a blog post explains why.

Image: Siri on Apple's HomePod speaker.

Today, Apple published a long and informative blog post by its audio software engineering and speech teams about how they use machine learning to make Siri responsive on the HomePod. The post reveals a lot about why Apple has made machine learning such a focus of late.

The post discusses the far-field setting, in which users may summon Siri from anywhere in the room relative to the HomePod. The premise is that this makes Siri harder to get right on the HomePod than on the iPhone: the device has to pick out a distant voice while also contending with echo from its own loud music playback and with background noise and other talkers in the room.
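
Apple's post does not spell out its echo canceller, but the classic starting point for this problem is an adaptive filter that uses the known playback signal as a reference and subtracts its estimated echo from the microphone signal. The following is a minimal, illustrative NLMS (normalized least mean squares) sketch in Python; the function name, filter length, and step size are assumptions for illustration, not Apple's implementation.

```python
import numpy as np

def nlms_echo_cancel(mic, playback, num_taps=256, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `playback` from `mic`.

    mic:      float array of microphone samples (near-end speech plus echo)
    playback: float array of the loudspeaker signal driving the echo
    Returns the echo-reduced residual signal.
    """
    w = np.zeros(num_taps)                    # adaptive FIR estimate of the echo path
    out = np.zeros_like(mic)
    for n in range(num_taps, len(mic)):
        x = playback[n - num_taps:n][::-1]    # most recent reference samples
        echo_est = w @ x                      # predicted echo at sample n
        e = mic[n] - echo_est                 # residual after cancellation
        w += step * e * x / (x @ x + eps)     # NLMS weight update
        out[n] = e
    return out

# Toy usage: the "echo" is a delayed, attenuated copy of the playback signal.
rng = np.random.default_rng(0)
playback = rng.standard_normal(16000)
mic = 0.6 * np.roll(playback, 40) + 0.1 * rng.standard_normal(16000)
cleaned = nlms_echo_cancel(mic, playback)
```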

Apple addresses these issues with multiple microphones along with machine learning methods—specifically:

  • Mask-based multichannel filtering using deep learning to remove echo and background noise (see the sketch after this list)
  • Unsupervised learning to separate simultaneous sound sources and trigger-phrase based stream selection to eliminate interfering speech
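
To make the first technique concrete, here is a heavily simplified sketch of mask-based filtering: a model predicts a time-frequency mask between 0 and 1 indicating how speech-dominated each spectrogram bin is, and that mask is applied to the noisy multichannel STFT before the channels are combined. The random mask below is a stand-in for the output of a trained deep network, and the plain channel average is a deliberate simplification; real systems typically use the masked statistics to drive a beamformer. Shapes and variable names are assumptions for illustration.

```python
import numpy as np

def apply_speech_mask(noisy_stft, speech_mask):
    """Apply a time-frequency speech mask to a multichannel STFT.

    noisy_stft:  complex array, shape (channels, freq_bins, frames)
    speech_mask: real values in [0, 1], shape (freq_bins, frames),
                 in practice predicted by a trained deep network
    Returns a single enhanced STFT of shape (freq_bins, frames).
    """
    masked = noisy_stft * speech_mask    # attenuate bins dominated by noise or echo
    return masked.mean(axis=0)           # naive channel combination, for illustration only

# Toy usage with random stand-ins for a six-microphone STFT and the mask.
rng = np.random.default_rng(0)
channels, freq_bins, frames = 6, 257, 100
noisy = (rng.standard_normal((channels, freq_bins, frames))
         + 1j * rng.standard_normal((channels, freq_bins, frames)))
mask = rng.uniform(0.0, 1.0, size=(freq_bins, frames))
enhanced = apply_speech_mask(noisy, mask)    # shape (257, 100)
```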

Apple's teams write that "speech enhancement performance has improved substantially due to deep learning."

This new post reads like an effort to establish Apple as a leader in the space, which the company is in some ways but not others. It covers a wide range of topics, including echo cancellation and suppression, mask-based noise reduction, and deep learning-based stream selection, and it goes into a considerable amount of technical and mathematical detail; it is written much like an academic paper, complete with citations. We won't recap it all here, but give it a read if you're interested in a fairly deep dive into the techniques being used at Apple (and at other tech companies, although specific approaches do vary).
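
The stream-selection step mentioned above can be illustrated in a few lines: after source separation produces several candidate audio streams, each is scored by a trigger-phrase ("Hey Siri") detector, and the highest-scoring stream is handed to speech recognition. The energy-based scoring lambda below is a placeholder; in Apple's system that score comes from a deep trigger-detection model.

```python
from typing import Callable, Sequence
import numpy as np

def select_stream(streams: Sequence[np.ndarray],
                  trigger_score: Callable[[np.ndarray], float]) -> int:
    """Return the index of the candidate stream most likely to contain the
    trigger phrase, as judged by `trigger_score` (a stand-in for a deep
    trigger-phrase detector)."""
    scores = [trigger_score(s) for s in streams]
    return int(np.argmax(scores))

# Toy usage: three separated streams, scored here by plain signal energy.
rng = np.random.default_rng(0)
streams = [gain * rng.standard_normal(16000) for gain in (0.2, 1.0, 0.4)]
chosen = select_stream(streams, trigger_score=lambda s: float(np.mean(s ** 2)))
```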

As noted previously, Apple has made machine learning a major focus of its work over the past couple of years. The Neural Engine in the iPhone's A12 and the iPad Pro's A12X chips is many times more powerful than what was included in previous Apple devices, and it's much more powerful than the machine learning silicon in competing SoCs.

We hear about "machine learning" so much in tech product marketing pitches that it starts to sound like a catch-all that doesn't mean a lot to the user, so pieces like this can be helpful for context even if they're fundamentally promotional. Google has generally done a good job using its blogs to give users and partners a deeper understanding; here, Apple is doing the same.

For now, Amazon and Google are the market leaders in digital assistant technology, and Apple has some catching up to do with Siri. But the approaches behind these competitors are not the same, so comparisons aren't as easy as they could be. Most relevant for users, Apple focuses on doing machine learning tasks on the local machine (either the user's device or the application or feature developer's) rather than in the cloud. Apple's Core ML API does allow developers to tap into outside cloud networks, and Android devices also do some local processing (with photos, for example), but the emphasis is different.
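
As a concrete illustration of that on-device emphasis, the usual Core ML workflow is to train a model elsewhere and then convert it so inference runs locally on the user's device. The sketch below uses Apple's open source coremltools package with a tiny placeholder PyTorch model; the model, input shape, and file name are made up for illustration, and the exact conversion options vary across coremltools and PyTorch versions.

```python
import torch
import coremltools as ct

# A tiny placeholder model standing in for whatever a developer has trained.
model = torch.nn.Sequential(torch.nn.Linear(16, 4), torch.nn.Softmax(dim=-1)).eval()
example_input = torch.rand(1, 16)

# Trace the model and convert it to Core ML so inference can run on-device.
traced = torch.jit.trace(model, example_input)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="features", shape=example_input.shape)],
    convert_to="mlprogram",
)
mlmodel.save("Classifier.mlpackage")  # the bundle an app would ship and load via Core ML
```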

The HomePod smart speaker launched early this year. In our review, we found its sound quality to be outstanding and Siri to be responsive, but the lack of on-device Spotify support, the price, and other limitations of Siri as compared to Amazon's Alexa (found in the Sonos One and many other smart speakers) prevented us from making an unequivocal recommendation. Apple has not specifically shared individual unit sales of the HomePod in its quarterly earnings reports.
