Microsoft’s AI as good as humans at voice recognition

Last year, Microsoft’s AI attained parity with human transcribers in the industry standard Switchboard test. Switchboard is a collection of real telephone conversations covering a wide range of disparate subjects. To complete the task, the AI has to accurately transcribe the conversations, without having previously heard them.
In its original test, Microsoft measured a word error rate of 5.9 percent for professional human transcribers. The company’s AI then transcribed the same conversations with the same error rate, suggesting it was as proficient at the task as the trained humans.
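For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a transcript into the reference, divided by the number of words in the reference. The sketch below is a simplified illustration of that calculation, not Microsoft’s scoring pipeline: the function name `word_error_rate` and the sample sentences are made up for this example, and real benchmark scoring also normalises the text (casing, punctuation, filler words) before counting errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: splits on whitespace and applies no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one word out of six is dropped.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sat on mat"
    print(f"{word_error_rate(reference, hypothesis):.1%}")
```

On this definition, a 5.9 percent error rate corresponds to roughly 59 word-level errors for every 1,000 words in the reference transcript.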
Since then, other researchers have replicated Microsoft’s work. They found that humans working as a team have an error rate of only 5.1 percent. In a blog post earlier this week, Microsoft announced its AI has reached parity with this figure too.
The company said it attained the new milestone by making improvements to the AI’s acoustic and language models. It tweaked the way in which the AI handles acoustic modelling and word prediction. The language element was also overhauled to offer more context, allowing the AI to use the entire conversation history to predict the words likely to come next.
The development is significant for voice recognition tech. It demonstrates that machines can transcribe conversational speech as accurately as humans, at least on this benchmark, something that will become more important as digital assistants and voice-controlled interfaces develop.
Microsoft acknowledged that there is more work ahead, though. The test was completed in ideal conditions that don’t represent the real world. Voice recognition tech in actual operation has to deal with noisy background environments and a variety of speaking styles and accents. Accuracy can suffer dramatically as a result. Microsoft is now turning its attention to improving the AI’s word error rate under these conditions.
“While achieving a 5.1 percent word error rate on the Switchboard speech recognition task is a significant achievement, the speech research community still has many challenges to address, such as achieving human levels of recognition in noisy environments with distant microphones, in recognizing accented speech, or speaking styles and languages for which only limited training data is available,” said Microsoft.

Real-time voice translation in PowerPoint (image: Microsoft)


The research is already having an impact on Microsoft’s products. The company’s voice recognition technology has been integrated into its cloud-based Cognitive Services toolkit. It also powers its Cortana digital assistant and has been integrated into PowerPoint to translate presentations in real time for multilingual audiences.
According to Microsoft, human levels of speech recognition could unlock new ways of interacting with computers and completing work. The next stage is to train AI to interpret the meaning in different conversations, allowing machines to understand intentions and expressions. Microsoft said its current studies are just the gateway to this kind of system.
“We have much work to do in teaching computers not just to transcribe the words spoken, but also to understand their meaning and intent,” said Microsoft. “Moving from recognizing to understanding speech is the next major frontier for speech technology.”
