Microsoft's Cortana can now recognise human speech with the same accuracy as a professional human transcriber

  • Microsoft made improvements to its conversational speech recognition system
  • This resulted in a 5.1 per cent word error rate, in line with trained professionals
  • Last year, the firm achieved a 5.9 per cent error rate, equal to that of the average person
  • The Washington-based company is now setting its sights on getting machines to understand the meaning behind the words they recognise

A new milestone in human speech recognition has been reached by Microsoft, matching the accuracy of trained human transcribers.

The firm's software, used in its Cortana voice assistant, has achieved a 5.1 per cent word error rate, putting it on a par with professionals.

One of the big frustrations of voice recognition has been getting machines to accept commands, a process which often involves repetition and exaggerated speech. 

The development means the company's products could soon accept spoken commands with human-level precision.

WHAT NEXT? 

Microsoft says it is now turning its attention to solving some of the remaining challenges facing speech recognition, as well as teaching machines to understand what they hear.

Some problems still to be addressed by the software include achieving human levels of recognition in noisy environments with distant microphones as well as recognising accented speech or speaking styles and languages for which only limited training data is available.

Microsoft says it has much work to do in teaching computers not just to transcribe the words spoken, but also to understand their meaning and intent.

The firm believes moving from recognising to understanding speech is the next major frontier for speech technology.

The findings appear in a technical report released by Microsoft on Saturday.

Last year, researchers from Microsoft Artificial Intelligence and Research reached a 5.9 per cent error rate, the same as the average person. 

The new paper details how the researchers used advances in AI to refine the firm's conversational speech recognition system.

This allows the system to better recognise the waveform of speech patterns, moment to moment and word to word.

It also uses the context of a conversation to predict what is likely to come next. 
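A toy way to picture predicting the next word from context is a bigram model, which simply counts which word most often follows another in example text. The Python sketch below is a deliberately simplified illustration and is not the method described in Microsoft's report; the function names and training phrases are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the sample text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair each word with the word that comes directly after it.
        for prev_word, next_word in zip(words, words[1:]):
            counts[prev_word][next_word] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training phrases, invented for this example.
model = train_bigrams([
    "check the weather",
    "check the weather today",
    "check the news",
])
print(predict_next(model, "the"))  # prints "weather"
```

A recogniser can use a prediction like this to favour the more plausible of two similar-sounding words.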

The technology is used in the company's Cortana voice assistant, which allows users to perform a range of tasks, from checking the weather to chatting.

It also provides a voice translation service.

Writing on the Microsoft Research blog, technical fellow Xuedong Huang said: 'Reaching human parity with an accuracy on par with humans has been a research goal for the last 25 years. 

'Microsoft's willingness to invest in long-term research is now paying dividends for our customers in products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services. 

'It's deeply gratifying to our research teams to see our work used by millions of people each day.' 

The error rates were measured on Switchboard, a body of recorded telephone conversations that the speech research community has used for more than 20 years to test voice recognition systems.

The task involves transcribing conversations between strangers discussing topics ranging from sports to politics.
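Error rates on a task like this are usually reported as a word error rate: the number of words a transcript gets wrong (substituted, dropped or wrongly inserted) divided by the number of words actually spoken. As a rough illustration only, and not Microsoft's own scoring code, a minimal Python sketch of that calculation might look like this. The function name and sample sentences are made up for the example, and real scoring tools also normalise punctuation and spelling first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word was dropped
                curr[j - 1] + 1,     # insertion: extra word in the transcript
                prev[j - 1] + cost,  # match, or substitution of one word
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of six spoken.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")  # prints "16.7%"
```

On this measure, a 5.1 per cent error rate means roughly one word in 20 is transcribed incorrectly.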

Previous research has shown that humans achieve higher levels of agreement on the precise words spoken as they expend more care and effort, as in the case of professional transcribers.

Microsoft says it is now turning its attention to solving some of the remaining challenges facing speech recognition, as well as teaching machines to understand what they hear.

Mr Huang added: 'While achieving a 5.1 per cent word error rate on the Switchboard speech recognition task is a significant achievement, the speech research community still has many challenges to address.

'[This includes] achieving human levels of recognition in noisy environments with distant microphones, in recognising accented speech, or speaking styles and languages for which only limited training data is available. 

'Moreover, we have much work to do in teaching computers not just to transcribe the words spoken, but also to understand their meaning and intent. 

'Moving from recognising to understanding speech is the next major frontier for speech technology.'
