As reported by Derrick Harris of Gigaom, the Chinese search engine company Baidu claims to have made a major breakthrough in the realm of speech recognition software with its new Deep Speech system:
“Chinese search engine giant Baidu says it has developed a speech recognition system, called Deep Speech, the likes of which has never been seen, especially in noisy environments. In restaurant settings and other loud places where other commercial speech recognition systems fail, the deep learning model proved accurate nearly 81 percent of the time.
That might not sound too great, but consider the alternative: commercial speech-recognition APIs against which Deep Speech was tested, including those for Microsoft Bing, Google and Wit.AI, topped out at nearly 65 percent accuracy in noisy environments.”
Baidu’s chief scientist is wisely tempering expectations. He told Harris that the company’s work is still just research. Even so, the reported breakthrough could bring a major shift down the line if Deep Speech is incorporated into products such as smartphones.
Harris explains that the key to Deep Speech’s reported accuracy is a massive trove of training data, combined with training the system to cope with background noise:
“Baidu gathered about 7,000 hours of data on people speaking conversationally, and then synthesized a total of roughly 100,000 hours by fusing those files with files containing background noise. That was noise from a restaurant, a television, a cafeteria, and the inside of a car and a train.”
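To make the idea concrete, here is a minimal Python sketch of how clean speech might be fused with background noise to multiply a training set. It is an illustration only, not Baidu's actual pipeline: the function names (`rms`, `mix_with_noise`, `augment`), the use of a target signal-to-noise ratio, and the looping of short noise clips are all assumptions made for the example. Going by the figures quoted above, each clean hour would need to yield roughly 14 noisy variants to reach about 100,000 hours.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_with_noise(speech, noise, snr_db):
    """Overlay a noise clip on speech at a target signal-to-noise ratio (dB).

    Hypothetical helper: both inputs are lists of float samples that are
    assumed to share one sample rate, and the noise must not be all zeros.
    """
    # Loop the noise clip so that it covers the whole utterance.
    repeats = len(speech) // len(noise) + 1
    tiled = (noise * repeats)[:len(speech)]
    # Scale the noise so that 20 * log10(rms(speech) / rms(noise)) == snr_db.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(tiled)
    return [s + gain * n for s, n in zip(speech, tiled)]


def augment(utterances, noises, snr_choices_db, rng):
    """Return one noisy copy of every utterance for every noise clip."""
    return [
        mix_with_noise(u, n, rng.choice(snr_choices_db))
        for u in utterances
        for n in noises
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 220 Hz tone stands in for a recorded utterance.
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    # Uniform random samples stand in for a restaurant recording.
    restaurant = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    noisy = mix_with_noise(speech, restaurant, snr_db=10)
    print(len(noisy), "samples in the noisy copy")
```

Pairing every utterance with several noise sources and signal-to-noise levels is what lets a modest set of recordings grow into a much larger one. A real system would read audio files and would likely vary the noise offset and volume as well.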
For more about Deep Speech and voice recognition software, be sure to check out Harris’ full article linked below. Included is a video demonstration of Baidu’s new software. It’s worth a look.
Read more at Gigaom
Photo credit: Carlos Amarillo / Shutterstock