1,660 research outputs found
ASR error management for improving spoken language understanding
This paper addresses the problem of automatic speech recognition (ASR) error
detection and their use for improving spoken language understanding (SLU)
systems. In this study, the SLU task consists in automatically extracting, from
ASR transcriptions , semantic concepts and concept/values pairs in a e.g
touristic information system. An approach is proposed for enriching the set of
semantic labels with error specific labels and by using a recently proposed
neural approach based on word embeddings to compute well calibrated ASR
confidence measures. Experimental results are reported showing that it is
possible to decrease significantly the Concept/Value Error Rate with a state of
the art system, outperforming previously published results performance on the
same experimental data. It also shown that combining an SLU approach based on
conditional random fields with a neural encoder/decoder attention based
architecture , it is possible to effectively identifying confidence islands and
uncertain semantic output segments useful for deciding appropriate error
handling actions by the dialogue manager strategy .Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201
Tracking of enriched dialog states for flexible conversational information access
Dialog state tracking (DST) is a crucial component in a task-oriented dialog
system for conversational information access. A common practice in current
dialog systems is to define the dialog state by a set of slot-value pairs. Such
representation of dialog states and the slot-filling based DST have been widely
employed, but suffer from three drawbacks. (1) The dialog state can contain
only a single value for a slot, and (2) can contain only users' affirmative
preference over the values for a slot. (3) Current task-based dialog systems
mainly focus on the searching task, while the enquiring task is also very
common in practice. The above observations motivate us to enrich current
representation of dialog states and collect a brand new dialog dataset about
movies, based upon which we build a new DST, called enriched DST (EDST), for
flexible accessing movie information. The EDST supports the searching task, the
enquiring task and their mixed task. We show that the new EDST method not only
achieves good results on Iqiyi dataset, but also outperforms other
state-of-the-art DST methods on the traditional dialog datasets, WOZ2.0 and
DSTC2.Comment: 5 pages, 2 figures, accepted by ICASSP201
Survey on Evaluation Methods for Dialogue Systems
In this paper we survey the methods and concepts developed for the evaluation
of dialogue systems. Evaluation is a crucial part during the development
process. Often, dialogue systems are evaluated by means of human evaluations
and questionnaires. However, this tends to be very cost and time intensive.
Thus, much work has been put into finding methods, which allow to reduce the
involvement of human labour. In this survey, we present the main concepts and
methods. For this, we differentiate between the various classes of dialogue
systems (task-oriented dialogue systems, conversational dialogue systems, and
question-answering dialogue systems). We cover each class by introducing the
main technologies developed for the dialogue systems and then by presenting the
evaluation methods regarding this class
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog
We present a novel end-to-end trainable neural network model for
task-oriented dialog systems. The model is able to track dialog state, issue
API calls to knowledge base (KB), and incorporate structured KB query results
into system responses to successfully complete task-oriented dialogs. The
proposed model produces well-structured system responses by jointly learning
belief tracking and KB result processing conditioning on the dialog history. We
evaluate the model in a restaurant search domain using a dataset that is
converted from the second Dialog State Tracking Challenge (DSTC2) corpus.
Experiment results show that the proposed model can robustly track dialog state
given the dialog history. Moreover, our model demonstrates promising results in
producing appropriate system responses, outperforming prior end-to-end
trainable neural network models using per-response accuracy evaluation metrics.Comment: Published at Interspeech 201
Spoken Language Intent Detection using Confusion2Vec
Decoding speaker's intent is a crucial part of spoken language understanding
(SLU). The presence of noise or errors in the text transcriptions, in real life
scenarios make the task more challenging. In this paper, we address the spoken
language intent detection under noisy conditions imposed by automatic speech
recognition (ASR) systems. We propose to employ confusion2vec word feature
representation to compensate for the errors made by ASR and to increase the
robustness of the SLU system. The confusion2vec, motivated from human speech
production and perception, models acoustic relationships between words in
addition to the semantic and syntactic relations of words in human language. We
hypothesize that ASR often makes errors relating to acoustically similar words,
and the confusion2vec with inherent model of acoustic relationships between
words is able to compensate for the errors. We demonstrate through experiments
on the ATIS benchmark dataset, the robustness of the proposed model to achieve
state-of-the-art results under noisy ASR conditions. Our system reduces
classification error rate (CER) by 20.84% and improves robustness by 37.48%
(lower CER degradation) relative to the previous state-of-the-art going from
clean to noisy transcripts. Improvements are also demonstrated when training
the intent detection models on noisy transcripts
- …