1,256 research outputs found
Conversational Analysis using Utterance-level Attention-based Bidirectional Recurrent Neural Networks
Recent approaches for dialogue act recognition have shown that context from
preceding utterances is important to classify the subsequent one. It was shown
that the performance improves rapidly when the context is taken into account.
We propose an utterance-level attention-based bidirectional recurrent neural
network (Utt-Att-BiRNN) model to analyze the importance of preceding utterances
to classify the current one. In our setup, the BiRNN is given the input set of
current and preceding utterances. Our model outperforms previous models that
use only preceding utterances as context on the used corpus. Another
contribution of the article is to discover the amount of information in each
utterance to classify the subsequent one and to show that context-based
learning not only improves the performance but also achieves higher confidence
in the classification. We use character- and word-level features to represent
the utterances. The results are presented for character and word feature
representations and as an ensemble model of both representations. We found that
when classifying short utterances, the closest preceding utterances contributes
to a higher degree.Comment: Proceedings of INTERSPEECH 201
The Microsoft 2017 Conversational Speech Recognition System
We describe the 2017 version of Microsoft's conversational speech recognition
system, in which we update our 2016 system with recent developments in
neural-network-based acoustic and language modeling to further advance the
state of the art on the Switchboard speech recognition task. The system adds a
CNN-BLSTM acoustic model to the set of model architectures we combined
previously, and includes character-based and dialog session aware LSTM language
models in rescoring. For system combination we adopt a two-stage approach,
whereby subsets of acoustic models are first combined at the senone/frame
level, followed by a word-level voting via confusion networks. We also added a
confusion network rescoring step after system combination. The resulting system
yields a 5.1\% word error rate on the 2000 Switchboard evaluation set
- …