Nowadays, speech recognition is a very important topic for IT companies, because people want to control their devices using voice commands. Of course, there are solutions on the market like Siri or Cortana, but they are not perfect. People would also like to use speech recognition software for “writing” documents. Microsoft claims that its artificial intelligence system is able to recognize speech with the same accuracy as people.
You are probably wondering what this claim is based on. Speech recognition accuracy can be measured using WER (Word Error Rate) on the Switchboard recognition task. This is a very challenging test, because the benchmarked software has to recognize speech from over 300 hours of conversations. The WER is calculated using the formula:
WER = (S + D + I) / N
where:
- S is the number of substitutions,
- D is the number of deletions,
- I is the number of insertions,
- C is the number of correctly recognized words,
- N is the number of words in the reference (N=S+D+C).
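The quantities above can be computed with a word-level edit-distance alignment between the reference transcript and the recognizer output. The following is only a minimal sketch, assuming the standard definition WER = (S + D + I) / N, whitespace tokenization, and no text normalization; the function names `wer_counts` and `wer` are illustrative and are not taken from Microsoft's evaluation tooling.

```python
def wer_counts(reference, hypothesis):
    """Return (S, D, I, N) for the minimum-error word alignment."""
    ref = reference.split()
    hyp = hypothesis.split()

    # table[i][j] = (errors, S, D, I) for aligning ref[:i] with hyp[:j]
    table = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, 0, j)      # only insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (e, s, d, ins)          # correct word
            else:
                diagonal = (e + 1, s + 1, d, ins)  # substitution
            e, s, d, ins = table[i - 1][j]
            deletion = (e + 1, s, d + 1, ins)
            e, s, d, ins = table[i][j - 1]
            insertion = (e + 1, s, d, ins + 1)
            # Tuples compare by total error count first
            table[i][j] = min(diagonal, deletion, insertion)

    _, s, d, ins = table[len(ref)][len(hyp)]
    return s, d, ins, len(ref)


def wer(reference, hypothesis):
    """Word Error Rate: (S + D + I) / N."""
    s, d, ins, n = wer_counts(reference, hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return (s + d + ins) / n


if __name__ == "__main__":
    # "sat" -> "sit" is a substitution, the second "the" is deleted
    print(wer_counts("the cat sat on the mat", "the cat sit on mat"))
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

In the example, the recognizer makes one substitution and one deletion against a six-word reference, so the WER is 2/6. The number of correct words follows from the other counts as C = N - S - D.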
Recently, Microsoft’s speech recognition system achieved a WER of 5.9%.
This is 0.4 percentage points better than the result from a month earlier, which was also achieved by Microsoft. Professional transcriptionists also achieved a 5.9% WER on the Switchboard speech recognition test. The neural networks used by Microsoft are a self-learning mechanism that is able to learn which words follow one another. Microsoft’s Speech and Dialog research group is managed by Geoffrey Zweig. He is very proud of this breakthrough, because the group worked for over twenty years to achieve it. Five years ago, he would not have dared even to think that this was possible.