ASR error management for improving spoken language understanding
This paper addresses the problem of automatic speech recognition (ASR) error
detection and its use for improving spoken language understanding (SLU)
systems. In this study, the SLU task consists of automatically extracting
semantic concepts and concept/value pairs from ASR transcriptions in, for
example, a tourist information system. An approach is proposed that enriches
the set of semantic labels with error-specific labels and uses a recently
proposed neural approach based on word embeddings to compute well-calibrated
ASR confidence measures. Experimental results are reported showing that it is
possible to significantly decrease the Concept/Value Error Rate with a
state-of-the-art system, outperforming previously published results on the
same experimental data. It is also shown that, by combining an SLU approach
based on conditional random fields with a neural encoder/decoder
attention-based architecture, it is possible to effectively identify
confidence islands and uncertain semantic output segments that are useful for
deciding on appropriate error-handling actions in the dialogue manager's
strategy.
Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201
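The Concept/Value Error Rate reported above is computed like WER, but over (concept, value) pairs rather than words. A minimal sketch, assuming a standard Levenshtein alignment; the function names and the toy example are illustrative, not the paper's scoring protocol:

```python
# Illustrative Concept/Value Error Rate (CVER): (S + D + I) / |reference|,
# computed over (concept, value) pairs via edit distance.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of hashable items."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[m][n]

def concept_value_error_rate(ref_pairs, hyp_pairs):
    """Errors over (concept, value) pairs, normalised by the reference length."""
    return edit_distance(ref_pairs, hyp_pairs) / max(len(ref_pairs), 1)

# Toy example: one substituted value and one deleted pair out of three references.
ref = [("city", "stockholm"), ("date", "aug-2017"), ("venue", "hotel")]
hyp = [("city", "stockholm"), ("date", "aug-2016")]
print(concept_value_error_rate(ref, hyp))  # 2 errors / 3 references
```

A pair counts as correct only if both the concept label and its value match, which is why the substituted value above contributes a full error.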
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of
ASR hypotheses to perform the unsupervised adaptation of a deep neural network
modeling acoustic probabilities. Our hypothesis is that significant
improvements can be achieved by: i) automatically transcribing the evaluation
data we are currently trying to recognise, and ii) selecting from it a subset
of "good quality" instances based on the word error rate (WER) scores predicted
by a QE component. To validate this hypothesis, we run several experiments on
the evaluation data sets released for the CHiME-3 challenge. First, we operate
in oracle conditions in which manual transcriptions of the evaluation data are
available, thus allowing us to compute the "true" sentence WER. In this
scenario, we perform the adaptation with variable amounts of data, which are
characterised by different levels of quality. Then, we move to realistic
conditions in which the manual transcriptions of the evaluation data are not
available. In this case, the adaptation is performed on data selected according
to the WER scores "predicted" by a QE component. Our results indicate that: i)
QE predictions allow us to closely approximate the adaptation results obtained
in oracle conditions, and ii) the overall ASR performance based on the proposed
QE-driven adaptation method is significantly better than the strong, most
recent CHiME-3 baseline.
Comment: Computer Speech & Language, December 201
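The selection step in (ii) can be pictured as a simple threshold on the QE-predicted sentence WER, keeping only "good quality" hypotheses for unsupervised adaptation. A minimal sketch; the QE model is stubbed out as a score dictionary, and all names and the threshold value are illustrative assumptions:

```python
# Illustrative QE-driven data selection: keep utterances whose *predicted*
# sentence WER is below a threshold, and adapt the acoustic model on them.

def select_adaptation_data(hypotheses, predicted_wer, max_wer=0.15):
    """Return (audio_id, hypothesis) pairs predicted to be 'good quality'."""
    selected = []
    for audio_id, hyp in hypotheses.items():
        if predicted_wer[audio_id] <= max_wer:
            selected.append((audio_id, hyp))
    return selected

# Toy example with hand-made QE predictions standing in for a real QE component.
hyps = {"utt1": "turn on the light",
        "utt2": "turner of a light",
        "utt3": "play music"}
qe_scores = {"utt1": 0.05, "utt2": 0.40, "utt3": 0.10}  # predicted sentence WER
print(select_adaptation_data(hyps, qe_scores))
# [('utt1', 'turn on the light'), ('utt3', 'play music')]
```

In the oracle condition described above, `predicted_wer` would simply be replaced by the true sentence WER computed against manual transcriptions.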
Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks
Recently, there has been growth in providers of speech transcription services
enabling others to leverage technology they would not normally be able to use.
As a result, speech-enabled solutions have become commonplace. Their success
critically relies on the quality, accuracy, and reliability of the underlying
speech transcription systems. Those black box systems, however, offer limited
means for quality control as only word sequences are typically available. This
paper examines this limited resource scenario for confidence estimation, a
measure commonly used to assess transcription reliability. In particular, it
explores what other sources of word and sub-word level information available in
the transcription process could be used to improve confidence scores. To encode
all such information this paper extends lattice recurrent neural networks to
handle sub-words. Experimental results using the IARPA OpenKWS 2016 evaluation
system show that the use of additional information yields significant gains in
confidence estimation accuracy. The implementation for this model can be found
online.
Comment: 5 pages, 8 figures, ICASSP submission
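One way to picture the use of sub-word level information is to pool per-unit scores up to the word level before estimating confidence. The paper learns this combination inside a lattice recurrent neural network; the hand-crafted mean/min pooling below is only an illustrative stand-in, with all names assumed:

```python
# Illustrative word-level feature construction from sub-word information:
# a word-level score is concatenated with pooled statistics of the scores of
# the sub-word units that compose the word.

def word_features(word_score, subword_scores):
    """Concatenate a word-level score with pooled sub-word statistics."""
    pooled_mean = sum(subword_scores) / len(subword_scores)  # average unit score
    pooled_min = min(subword_scores)                         # weakest unit score
    return [word_score, pooled_mean, pooled_min]

# A word recognised from three sub-word units with individual scores:
feats = word_features(0.82, [0.90, 0.75, 0.60])
print(feats)
```

A confidence estimator fed such features can react to a single weak sub-word unit even when the overall word-level score looks acceptable.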
Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks
The standard approach to assess reliability of automatic speech
transcriptions is through the use of confidence scores. If accurate, these
scores provide a flexible mechanism to flag transcription errors for upstream
and downstream applications. One challenging type of error that recognisers
make is deletions. These errors are not accounted for by standard confidence
estimation schemes and are hard to rectify in upstream and downstream
processing. High deletion rates are prominent in limited-resource and highly
mismatched training/testing conditions studied under the IARPA Babel and
Material programs. This paper looks at the use of bidirectional recurrent
neural networks to yield confidence estimates for predicted as well as deleted
words. Several simple combination schemes are examined. To assess
usefulness of this approach, the combined confidence score is examined for
untranscribed data selection that favours transcriptions with lower deletion
errors. Experiments are conducted using IARPA Babel/Material program languages.
Funding: ALTA Institute, Cambridge University; The Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory (AFRL
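One "simple combination scheme" in the spirit of this abstract is to fuse a per-word confidence with a per-gap deletion probability into a single utterance-level score, then rank untranscribed utterances so that those with fewer predicted deletions are favoured. The scoring formula and all names below are illustrative assumptions, not the paper's exact scheme:

```python
# Illustrative combined scoring for untranscribed data selection: average word
# confidence, penalised by the predicted deletion mass of the utterance.

def utterance_score(word_confidences, deletion_probs):
    """Combine word confidences with deletion predictions into one score."""
    avg_conf = sum(word_confidences) / max(len(word_confidences), 1)
    del_penalty = sum(deletion_probs) / max(len(deletion_probs), 1)
    return avg_conf * (1.0 - del_penalty)

def rank_for_selection(utterances):
    """Sort utterance ids best-first by the combined score."""
    return sorted(utterances,
                  key=lambda u: utterance_score(*utterances[u]),
                  reverse=True)

utts = {
    "a": ([0.9, 0.8], [0.05, 0.05, 0.05]),  # confident words, few deletions
    "b": ([0.9, 0.9], [0.60, 0.40, 0.50]),  # confident words, likely deletions
}
print(rank_for_selection(utts))  # ['a', 'b']
```

Ranking on the combined score rather than on word confidence alone is what lets the selection favour transcriptions with lower deletion errors, as the abstract describes.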