Speech Technology Finds its Voice

Digital Journal — Waleed Qirbi was an up-and-coming MBA student at the University of Toronto when the headaches started. Already equipped with an honours political science BA from the University of Western Ontario, Qirbi was intent on getting his Masters in this highly competitive program.

Things were going well. His grades were high, he was leading the school’s Business Technology Group and he came in second at the 1998 MBA Games in Quebec City. Then, just days after the competition, he felt a swelling headache that wouldn’t go away. It turned out to be a brain tumour that kept him hospitalized for months. While his classmates were finishing their MBAs, Qirbi was finishing his treatment.

He was considered lucky. He left the hospital alive but with only 15 per cent of his former eyesight. Yet he returned to school the next semester and completed his Masters despite his new disability.

Even such heroic determination wouldn’t have been enough on its own. It was only by purchasing a top-of-the-line Dell system, scanner, tape recorder, microphone and a host of speech software like TextHelp and Dragon NaturallySpeaking that Qirbi was able to get by. But working with such technologies was often a taxing affair. He was constantly correcting simple mistakes, scanning in textbooks and listening to hours of lifeless synthesized playback.

Speech technology has come a long way over the years, much further than most people would think. Around the turn of the second millennium, the Benedictine monk Gerbert d’Aurillac was said to own a talking bronze head that could answer “yes” or “no” questions through simple binary automation. Its mechanics were so complicated for the time that most people assumed it was co-developed by Lucifer himself.

Fast-forward through the next eight centuries, when other inventors built monstrous talking machines out of hand-operated valves, rubber vocal cords and even metal tongues. Similar contraptions, incorporating the principles of electronics, eventually pushed the technology to a more practical level when Bell Labs developed its VODER synthesizer in the 1930s. This keyboard-operated system wowed audiences at the 1939 World’s Fair in New York with a surprisingly heartfelt rendition of “Auld Lang Syne.”

Speech technology really got serious in the early 1970s, when researchers began applying hidden Markov models to speech. All modern speech-recognition programs are based on these statistical models, which infer the most likely sequence of hidden states (such as the sounds in a word) from observed data (such as an audio signal). These algorithms were first applied to speech technology during a series of research projects funded by the U.S. Defense Advanced Research Projects Agency.
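To give a feel for how a hidden Markov model picks out a pattern, here is a minimal sketch of the Viterbi algorithm, a standard way to find the most likely hidden sequence. Everything in it is a toy assumption for illustration: the two hidden states ("consonant" and "vowel"), the two observations ("quiet" and "loud") and all the probabilities are made up, and no commercial recognizer is this simple.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a sequence of observations."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               the state that path came from)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Choose the predecessor that gives the most probable path into s.
            p, r = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], r)
        best.append(column)
    # Start from the most probable final state and walk the back-pointers.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical toy model: all numbers below are invented for illustration.
STATES = ("consonant", "vowel")
START_P = {"consonant": 0.6, "vowel": 0.4}
TRANS_P = {
    "consonant": {"consonant": 0.3, "vowel": 0.7},
    "vowel": {"consonant": 0.6, "vowel": 0.4},
}
EMIT_P = {
    "consonant": {"quiet": 0.8, "loud": 0.2},
    "vowel": {"quiet": 0.1, "loud": 0.9},
}

print(viterbi(["quiet", "loud", "quiet"], STATES, START_P, TRANS_P, EMIT_P))
# ['consonant', 'vowel', 'consonant']
```

The program never sees the hidden states directly. It only sees "quiet, loud, quiet" and works backward to the likeliest explanation, which is roughly what a recognizer does with sounds and words on a far larger scale.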

Among the researchers were the founders of Dragon Systems (now part of ScanSoft), which first released its industry-standard NaturallySpeaking program in 1997. NaturallySpeaking was billed as the first “continuous speech” dictation software, meaning users didn’t have to pause between words, as with older systems. It boasted an accuracy rate of about 95 per cent, or five mistakes per 100 words.

With today’s Dragon NaturallySpeaking 7, ScanSoft claims 98 per cent accuracy, a figure meant to keep improving as vocabulary is added to its database. It still has limitations, such as an inability to recognize subtle differences in intonation. And of course, it will always have difficulties with homophones and sound-alike phrases, a problem sometimes known as “wreck a nice beach” (“recognize speech”) syndrome. However, considering it lets you input up to 160 words per minute, it’s probably worth that extra editing time.
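The accuracy figures quoted here translate into editing time with simple arithmetic. The sketch below assumes "accuracy" means the share of words transcribed correctly and that mistakes are spread evenly, which the vendors' claims do not actually specify; the function names are ours.

```python
def errors_per_100_words(accuracy):
    """Expected mistakes in 100 dictated words at a given word accuracy (0 to 1)."""
    return round((1 - accuracy) * 100, 2)


def errors_per_minute(accuracy, words_per_minute):
    """Expected mistakes per minute of dictation at a given speaking rate."""
    return round((1 - accuracy) * words_per_minute, 2)


print(errors_per_100_words(0.95))    # 5.0  (the 1997 release)
print(errors_per_100_words(0.98))    # 2.0  (the claim for version 7)
print(errors_per_minute(0.98, 160))  # 3.2  (at the top quoted dictation speed)
```

On those assumptions, someone dictating flat out at 160 words per minute would still have about three words to fix for every minute of speech.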

Other champions of speech technology include Microsoft, whose new Speech Server 2004 platform has caused quite a stir, and IBM, which applies its technology to everything from Web browsers to call centres to in-car telematics.

“Speech technology has only just begun,” says Igor Jablokov, program director of IBM’s Multimodal and Voice Portal Technologies. “We’re where the Web was in the mid-’90s, just starting to tap the potential of the technology. IBM Research, for instance, is working on something called the Superhuman project. Our goal: Improving word error rates by 25 per cent annually, with the end goal of designing a system to recognize speech better than humans can in a decade.”
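It is worth working out what that goal adds up to. The sketch below reads "25 per cent annually" as a relative cut that compounds each year, which is our assumption; the quote does not spell out the baseline or the method.

```python
def remaining_error_fraction(annual_reduction, years):
    """Fraction of the starting error rate left after compounding yearly cuts."""
    return (1 - annual_reduction) ** years


fraction = remaining_error_fraction(0.25, 10)
print(round(fraction, 4))  # 0.0563
```

On that reading, a decade of 25 per cent annual improvements would leave a system making roughly one-eighteenth as many errors as it does today.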

Waleed Qirbi says the main problem with speech-based programs is not the actual software, but the fact that users are not suitably equipped to run them. A voice-ready PC should include a powerful CPU, a reliable soundcard and at least a gigabyte of RAM. Users should also invest in a high-quality array microphone, and most importantly, get the necessary training to install and operate these programs.

Soon after Qirbi graduated, he dreamed he owned a company called “VoicePC,” selling high-end desktops specifically designed for speech technology. He picked up the relevant domain names in 2001 and today successfully runs VoicePC Inc. out of Ottawa.

It’s a testament to the power of speech technology that Qirbi — who can barely see text on a screen — is able to manage an international, online retail business without ever having to touch a keyboard or a mouse.

Synthetic Speech’s Greatest Hits: Some words of wisdom from talking heads through the ages.

(1.) “Good evening, radio audience. Good afternoon, radio audience.”
Said by: VODER
Created by: Homer Dudley, 1939
Sounds like: A robot being strangled.

(2.) “How are you? I love you.”
Said by: Cascade Formant Synthesizer OVE
Created by: Gunnar Fant, 1953
Sounds like: Corky from Life Goes On, but with a mild Boston accent.

(3.) “Daisy, Daisy, give me your answer, do/I’m half crazy, all for the love of yooou . . . .”
Said by: IBM 7094
Created by: Bell Labs, 1961
Sounds like: Sheer, unrestrained terror.

(4.) “Look Dave, I can see you’re really upset about this. I honestly think you ought to sit down calmly, take a stress pill and think things over.”
Said by: HAL 9000
Created by: Arthur C. Clarke and Stanley Kubrick, 1968
Sounds like: One smug bastard of a program.

(5.) “I enjoy the simple life, as long as there’s plenty of comfort.”
Said by: JSRU Parallel-Formant Synthesizer
Created by: John Holmes, 1973
Sounds like: A kindly old British robo-grandpa.

(6.) “The juice of lemons makes a fine punch. A box was thrown beside a parked truck.”
Said by: Type ‘n Talk
Created by: Votrax, 1978
Sounds like: Complete nonsense.

(7.) “Spell ‘one,’ as in ‘one word.’ O, N, E. Correct. Now spell ‘Earth.’”
Said by: Speak & Spell
Created by: Texas Instruments, 1978
Sounds like: Mom, this toy’s trying to make me learn!

(8.) “I am Beautiful Betty, the standard female voice. Some people think I sound a bit like a man.”
Said by: DECTalk
Created by: Digital Equipment Corporation, 1985
Sounds like: Somebody’s gotta lay off the steroids there, hon.

Digital Journal’s Top 5 commercial speech-tech programs:

Dragon NaturallySpeaking Preferred 7
ScanSoft, Inc.
www.ibm.com/software/voice/viavoice

Speech Server 2004
Microsoft Corp.
www.texthelp.com

Nuance 8.5
Nuance