A Comparison of Hybrid and End-to-End Models for Syllable Recognition
This paper presents a comparison of a traditional hybrid speech recognition
system (Kaldi using WFST and TDNN with lattice-free MMI) and a lexicon-free
end-to-end model (TensorFlow implementation of multi-layer LSTM with CTC
training) for German syllable recognition on the Verbmobil corpus. The results
show that explicitly modeling prior knowledge is still valuable in building
recognition systems. With a strong language model (LM) based on syllables, the
structured approach significantly outperforms the end-to-end model. The best
word error rate (WER) regarding syllables was achieved using Kaldi with a
4-gram LM, modeling all syllables observed in the training set. It achieved
10.0% WER w.r.t. the syllables, compared to the end-to-end approach where the
best WER was 27.53%. The work presented here has implications for building
future recognition systems that operate independently of a large vocabulary, as
typically used in tasks such as recognition of syllabic or agglutinative
languages, out-of-vocabulary techniques, keyword search indexing and medical
speech processing.
Comment: 22nd International Conference of Text, Speech and Dialogue TSD201
The Microsoft 2016 Conversational Speech Recognition System
We describe Microsoft's conversational speech recognition system, in which we
combine recent developments in neural-network-based acoustic and language
modeling to advance the state of the art on the Switchboard recognition task.
Inspired by machine learning ensemble techniques, the system uses a range of
convolutional and recurrent neural networks. I-vector modeling and lattice-free
MMI training provide significant gains for all acoustic model architectures.
Language model rescoring with multiple forward and backward running RNNLMs, and
word posterior-based system combination provide a 20% boost. The best single
system uses a ResNet architecture acoustic model with RNNLM rescoring, and
achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The
combined system has an error rate of 6.2%, representing an improvement over
previously reported results on this benchmark task.