Microsoft’s conversational speech recognition system, designed to recognise the words in a conversation as accurately as people do, has reached a 5.1 percent error rate, its lowest so far.
This milestone means that, for the first time, a computer can recognise the words in a conversation as well as a person would.
“Our research team reached that 5.1 percent error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved last year,” Microsoft said in a blog post late on Sunday.
In October last year, the team from Microsoft Artificial Intelligence and Research reported a speech recognition system that makes the same or fewer errors than professional transcriptionists.
The researchers had then reported a word error rate (WER) of 5.9 percent.
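For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only; it is not Microsoft's scoring tooling, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace and
    does no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: one substitution ("talked" -> "talk") and one
# deletion ("and") against a six-word reference.
sample = word_error_rate(
    "we talked about sports and politics",
    "we talk about sports politics",
)
print(f"{sample:.3f}")
```

On this measure, a WER of 5.1 percent corresponds to roughly five word errors for every 100 words in the reference transcript.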
“Last year, Microsoft’s speech and dialog research group announced a milestone in reaching human parity on the ‘Switchboard’ conversational speech recognition task, meaning we had created technology that recognised words in a conversation as well as professional human transcribers,” said Xuedong Huang, Technical Fellow, Microsoft.
‘Switchboard’ is a corpus of recorded telephone conversations that the speech research community has used for more than 20 years to benchmark speech recognition systems.
The task involves transcribing conversations between strangers discussing topics such as sports and politics.
The team used “Microsoft Cognitive Toolkit 2.1” (CNTK), which the company describes as the most scalable deep learning software available, for exploring model architectures.
Additionally, Microsoft’s investment in cloud compute infrastructure, specifically Azure GPUs, helped improve effectiveness and speed.
Reaching human parity, meaning accuracy on par with people, has been a research goal for the last 25 years.
“Microsoft’s willingness to invest in long-term research is now paying dividends for our customers in products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services,” the post read.
“Moving from recognising to understanding speech is the next major frontier for speech technology,” the post added.