Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
Recently, there has been a growth in providers of speech transcription services,
enabling others to leverage technology they would not normally be able to use.
As a result, speech-enabled solutions have become commonplace. Their success
critically relies on the quality, accuracy, and reliability of the underlying
speech transcription systems. Those black box systems, however, offer limited
means for quality control as only word sequences are typically available. This
paper examines this limited resource scenario for confidence estimation, a
measure commonly used to assess transcription reliability. In particular, it
explores what other sources of word and sub-word level information available in
the transcription process could be used to improve confidence scores. To encode
all such information this paper extends lattice recurrent neural networks to
handle sub-words. Experimental results using the IARPA OpenKWS 2016 evaluation
system show that the use of additional information yields significant gains in
confidence estimation accuracy. The implementation for this model can be found
online.
Comment: 5 pages, 8 figures, ICASSP submission
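As a rough illustration of the approach described above, the following sketch shows how a lattice recurrent network might score confidence for each arc of a recognition lattice, given arcs that carry concatenated word and sub-word features. The class name, feature layout, and mean-pooling of incoming hidden states are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a lattice RNN confidence estimator (PyTorch).
# Assumption: the lattice is given as topologically sorted arcs
# (src_node, dst_node, feature_vector), where each feature vector
# concatenates word-level and sub-word-level information.
import torch
import torch.nn as nn

class LatticeRNNConfidence(nn.Module):
    def __init__(self, word_dim, subword_dim, hidden_dim):
        super().__init__()
        self.cell = nn.GRUCell(word_dim + subword_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, arcs, num_nodes):
        """Return one confidence score per arc, in arc order."""
        node_state = [torch.zeros(self.cell.hidden_size) for _ in range(num_nodes)]
        node_count = [0] * num_nodes  # incoming arcs seen per node
        scores = []
        for src, dst, feat in arcs:
            # Mean-pool the hidden states of all arcs entering the source node.
            h_in = node_state[src] / max(node_count[src], 1)
            h = self.cell(feat.unsqueeze(0), h_in.unsqueeze(0)).squeeze(0)
            scores.append(torch.sigmoid(self.out(h)))  # per-arc confidence
            node_state[dst] = node_state[dst] + h
            node_count[dst] += 1
        return torch.stack(scores)
```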
Future word contexts in neural network language models
Recently, bidirectional recurrent network language models (bi-RNNLMs) have
been shown to outperform standard, unidirectional, recurrent neural network
language models (uni-RNNLMs) on a range of speech recognition tasks. This
indicates that future word context information beyond the word history can be
useful. However, bi-RNNLMs pose a number of challenges as they make use of the
complete previous and future word context information. This impacts both
training efficiency and their use within a lattice rescoring framework. In this
paper these issues are addressed by proposing a novel neural network structure,
succeeding word RNNLMs (su-RNNLMs). Instead of using a recurrent unit to
capture the complete future word context, a feedforward unit is used to model
a finite number of succeeding words. This model can be trained much more
efficiently than bi-RNNLMs and can also be used for lattice rescoring.
Experimental results on a meeting transcription task (AMI) show that the
proposed model consistently outperforms uni-RNNLMs and yields only a slight
degradation compared to bi-RNNLMs in N-best rescoring. Additionally,
performance improvements can be obtained using lattice rescoring and
subsequent confusion network decoding.
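To make the structure concrete, here is a minimal PyTorch sketch of a su-RNNLM-style model under stated assumptions: a recurrent unit summarises the word history, a feedforward unit embeds a fixed window of succeeding words, and the two representations are concatenated before the softmax. Layer sizes, the tanh nonlinearity, and the combination step are assumptions, not details from the paper.

```python
# Minimal sketch of a succeeding-word RNNLM (su-RNNLM) in PyTorch.
# Assumption: num_future succeeding words are always available
# (e.g. via padding at sentence ends).
import torch
import torch.nn as nn

class SuRNNLM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_future=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Recurrent unit over the complete word history.
        self.history_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Feedforward unit over a fixed window of succeeding words.
        self.future_ff = nn.Linear(num_future * embed_dim, hidden_dim)
        self.out = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, history_ids, future_ids):
        """history_ids: (batch, t); future_ids: (batch, num_future)."""
        _, h = self.history_rnn(self.embed(history_ids))  # h: (1, batch, hidden)
        f = torch.tanh(self.future_ff(self.embed(future_ids).flatten(1)))
        logits = self.out(torch.cat([h.squeeze(0), f], dim=-1))
        return torch.log_softmax(logits, dim=-1)  # next-word log-probabilities
```

Because the future window is a fixed-size feedforward input rather than a full backward recurrence, such a model can be trained like a uni-RNNLM and, unlike a bi-RNNLM, applied within a lattice rescoring framework.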
- …