Search CORE

4,353 research outputs found

The Microsoft 2016 Conversational Speech Recognition System

Author: Droppo J.
Huang X.
Seide F.
Seltzer M.
Stolcke A.
Xiong W.
Yu D.
Zweig G.
Publication venue
Publication date: 25/01/2017
Field of study

We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task

arXiv.org e-Print Archive

Crossref

The Microsoft 2017 Conversational Speech Recognition System

Author: Alleva F.
Droppo J.
Huang X.
Stolcke A.
Wu L.
Xiong W.
Publication venue
Publication date: 24/08/2017
Field of study

We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1\% word error rate on the 2000 Switchboard evaluation set

arXiv.org e-Print Archive

Crossref

TheanoLM - An Extensible Toolkit for Neural Network Language Modeling

Author: Enarvi Seppo
Kurimo Mikko
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2016
Field of study

We present a new tool for training neural network language models (NNLMs), scoring sentences, and generating text. The tool has been written using Python library Theano, which allows researcher to easily extend it and tune any aspect of the training process. Regardless of the flexibility, Theano is able to generate extremely fast native code that can utilize a GPU or multiple CPU cores in order to parallelize the heavy numerical computations. The tool has been evaluated in difficult Finnish and English conversational speech recognition tasks, and significant improvement was obtained over our best back-off n-gram models. The results that we obtained in the Finnish task were compared to those from existing RNNLM and RWTHLM toolkits, and found to be as good or better, while training times were an order of magnitude shorter

arXiv.org e-Print Archive

Crossref

Aaltodoc Publication Archive

Conversational Analysis using Utterance-level Attention-based Bidirectional Recurrent Neural Networks

Author: Bothe Chandrakant
Magg Sven
Weber Cornelius
Wermter Stefan
Publication venue: 'International Speech Communication Association'
Publication date: 20/06/2018
Field of study

Recent approaches for dialogue act recognition have shown that context from preceding utterances is important to classify the subsequent one. It was shown that the performance improves rapidly when the context is taken into account. We propose an utterance-level attention-based bidirectional recurrent neural network (Utt-Att-BiRNN) model to analyze the importance of preceding utterances to classify the current one. In our setup, the BiRNN is given the input set of current and preceding utterances. Our model outperforms previous models that use only preceding utterances as context on the used corpus. Another contribution of the article is to discover the amount of information in each utterance to classify the subsequent one and to show that context-based learning not only improves the performance but also achieves higher confidence in the classification. We use character- and word-level features to represent the utterances. The results are presented for character and word feature representations and as an ensemble model of both representations. We found that when classifying short utterances, the closest preceding utterances contributes to a higher degree.Comment: Proceedings of INTERSPEECH 201

arXiv.org e-Print Archive

Crossref

Automatic speech recognition with deep neural networks for impaired speech

Author: España-i-Bonet Cristina
Rodríguez Fonollosa José Adrián
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

The final publication is available at https://link.springer.com/chapter/10.1007%2F978-3-319-49169-1_10Automatic Speech Recognition has reached almost human performance in some controlled scenarios. However, recognition of impaired speech is a difficult task for two main reasons: data is (i) scarce and (ii) heterogeneous. In this work we train different architectures on a database of dysarthric speech. A comparison between architectures shows that, even with a small database, hybrid DNN-HMM models outperform classical GMM-HMM according to word error rate measures. A DNN is able to improve the recognition word error rate a 13% for subjects with dysarthria with respect to the best classical architecture. This improvement is higher than the one given by other deep neural networks such as CNNs, TDNNs and LSTMs. All the experiments have been done with the Kaldi toolkit for speech recognition for which we have adapted several recipes to deal with dysarthric speech and work on the TORGO database. These recipes are publicly available.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC