How Speech Recognition Works

Speech Recognition: Weaknesses and Flaws

A high-quality noise-canceling microphone can help the accuracy of your speech recognition system.
Image courtesy Amazon

No speech recognition system is 100 percent perfect; several factors can reduce accuracy. Some of these factors are issues that continue to improve as the technology improves. Others can be lessened -- if not completely corrected -- by the user.

Low signal-to-noise ratio


The program needs to "hear" the words spoken distinctly, and any extra noise introduced into the sound will interfere with this. The noise can come from a number of sources, including loud background noise in an office environment. Users should work in a quiet room with a quality microphone positioned as close to their mouths as possible. Low-quality sound cards, which provide the input for the microphone to send the signal to the computer, often do not have enough shielding from the electrical signals produced by other computer components. They can introduce hum or hiss into the signal.

Overlapping speech

Current systems have difficulty separating simultaneous speech from multiple users. "If you try to employ recognition technology in conversations or meetings where people frequently interrupt each other or talk over one another, you're likely to get extremely poor results," says John Garofolo.

Intensive use of computer power

Running the statistical models needed for speech recognition requires the computer's processor to do a lot of heavy work. One reason for this is the need to remember each stage of the word-recognition search in case the system needs to backtrack to come up with the right word. The fastest personal computers in use today can still have difficulties with complicated commands or phrases, slowing down the response time significantly. The vocabularies needed by the programs also take up a large amount of hard drive space. Fortunately, disk storage and processor speed are areas of rapid advancement -- the computers in use 10 years from now will benefit from an exponential increase in both factors.


Homonyms are two words that are spelled differently and have different meanings but sound the same. "There" and "their," "air" and "heir," "be" and "bee" are all examples. There is no way for a speech recognition program to tell the difference between these words based on sound alone. However, extensive training of systems and statistical models that take into account word context have greatly improved their performance.

We'll look at the future of speech recognition programs next.