
Microsoft's voice recognition now almost as accurate as humans

By James Walker     Sep 14, 2016 in Technology
Microsoft has announced it has reached a milestone in the development of more accurate speech recognition. The latest version of its technology has just achieved the lowest word error rate in the industry, bringing full voice input closer to reality.
Only a decade ago, voice recognition was still a new gimmick for most consumers. The relatively recent rise of digital assistants such as Apple's Siri, Google Now and Microsoft's Cortana has made the tech more accessible, introducing people to the idea that voice input can have practical uses. It's far from perfect, however. All three assistants can easily mishear what you're saying, wasting time and forcing you to repeat key words and phrases.
A team of Microsoft researchers, led by company chief speech scientist Xuedong Huang, has been attempting to raise the level of voice recognition accuracy. In a benchmark evaluation this week, the group achieved a word error rate of just 6.3 percent, the lowest in the industry. The software was tested against the Switchboard speech recognition task, accepted as a standard by all key voice recognition vendors.
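Word error rate, the metric behind both companies' claims, is simply the word-level edit distance between what the system transcribed and a reference transcript, divided by the length of the reference. As a minimal sketch (the function name and sample phrases are illustrative, not from the Switchboard task itself):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# Two words misheard out of five: WER = 2/5 = 0.4, i.e. 40 percent.
print(word_error_rate("set a reminder for nine", "set the reminder for wine"))
```

A 6.3 percent word error rate means roughly one word in sixteen is substituted, dropped or invented, averaged over the benchmark's conversational test recordings.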
The figure beats a claim made by IBM this past weekend. At the Interspeech international conference, the company said its systems had achieved a word error rate of 6.6 percent. Mere days later, Microsoft shaved off another 0.3 percentage points. Twenty years ago, the most advanced system available had a word error rate of 43 percent.
"Our best single system achieves an error rate of 6.9% on the NIST 2000 Switchboard set," said Huang. "We believe this is the best performance reported to date for a recognition system not based on system combination. An ensemble of acoustic models advances the state of the art to 6.3 percent on the Switchboard test data."
Both Microsoft and IBM cited deep neural networks as the enabler that has made advanced speech recognition possible. Modern speech analysis systems, such as Cortana and Siri, are powered by vast cloud services making millions of calculations per second. Each time you talk to the assistant, your phone communicates with the cloud to work out what you are saying.
The advent of neural networks, computing systems designed to emulate the connections in the human brain, has made it possible to bypass many of the inaccuracies in traditional speech recognition. It makes it far easier for a computer to identify objects in an image or words in speech, making voice recognition more accurate and reliable.
Microsoft Chief Speech Scientist Xuedong Huang
The advances will directly benefit the future of digital assistants. Microsoft has made Cortana a core component of Windows 10, encouraging users to engage with the assistant to set reminders, check the weather and play games. Cortana was recently improved by the introduction of GPU processing, enabling the system to ingest 10 times more data than before in the same amount of time. It's driven by the deep learning algorithms provided by Microsoft's Computational Network Toolkit.
Microsoft's researchers believe that digital voice recognition could soon be sophisticated enough that computers understand spoken words as well as other humans do. The company said the work "aligns with Microsoft's strategy to provide more personal computing experiences," noting the technology is already in use in Cortana and its real-time Skype Translator service. Microsoft described the result as a "significant" step forward in its progress towards creating an AI that anticipates a user's needs instead of merely responding to commands.