The Microsoft 2017 Conversational Speech Recognition System

Alleva, F.; Droppo, J.; Huang, X.; Stolcke, A.; Wu, L.; Xiong, W.

Publication date: 24 August 2017

Abstract

We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog-session-aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by word-level voting via confusion networks. We also add a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
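The two-stage combination mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the log-linear averaging rule with equal default weights, and the assumption that the confusion-network slots of all systems are already aligned are illustrative choices made here, and the abstract does not specify any of them.

```python
import math
from collections import defaultdict


def combine_frame_posteriors(model_log_posts, weights=None):
    """Stage 1 (assumed form): weighted average of senone log-posteriors per frame.

    model_log_posts[m][t][s] is the log-posterior of senone s at frame t
    under acoustic model m. Returns renormalized log-posteriors per frame.
    """
    n_models = len(model_log_posts)
    if weights is None:
        # Equal weights are an assumption, not taken from the paper.
        weights = [1.0 / n_models] * n_models
    n_frames = len(model_log_posts[0])
    n_senones = len(model_log_posts[0][0])
    combined = []
    for t in range(n_frames):
        scores = [
            sum(w * m[t][s] for w, m in zip(weights, model_log_posts))
            for s in range(n_senones)
        ]
        # Renormalize so the combined scores form a distribution again.
        log_z = math.log(sum(math.exp(x) for x in scores))
        combined.append([x - log_z for x in scores])
    return combined


def vote_confusion_slots(system_slots, system_weights):
    """Stage 2 (assumed form): word-level voting over pre-aligned slots.

    system_slots[k][i] maps each candidate word in slot i of system k to
    its posterior. The empty string stands for "no word" (a deletion).
    """
    n_slots = len(system_slots[0])
    hypothesis = []
    for i in range(n_slots):
        totals = defaultdict(float)
        for weight, slots in zip(system_weights, system_slots):
            for word, posterior in slots[i].items():
                totals[word] += weight * posterior
        best = max(totals, key=totals.get)
        if best != "":
            hypothesis.append(best)
    return hypothesis
```

In a real system the word-level stage would first have to align the confusion networks of the individual systems; the sketch skips that step and only shows the weighted vote within each aligned slot.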

