Transfer Learning for Context-Aware Spoken Language Understanding
Spoken language understanding (SLU) is a key component of task-oriented
dialogue systems. SLU parses natural language user utterances into semantic
frames. Previous work has shown that incorporating context information
significantly improves SLU performance for multi-turn dialogues. However,
collecting a large-scale human-labeled multi-turn dialogue corpus for the
target domains is complex and costly. To reduce dependency on the collection
and annotation effort, we propose a Context Encoding Language Transformer
(CELT) model that facilitates exploiting various types of context information
for SLU. We explore different transfer learning approaches to reduce dependency
on data collection and annotation. In addition to unsupervised pre-training
using large-scale general-purpose unlabeled corpora, such as Wikipedia, we
explore unsupervised and supervised adaptive training approaches for transfer
learning to benefit from other in-domain and out-of-domain dialogue corpora.
Experimental results demonstrate that the proposed model with the proposed
transfer learning approaches achieves significant improvements in SLU
performance over state-of-the-art models on two large-scale single-turn
dialogue benchmarks and one large-scale multi-turn dialogue benchmark.
Comment: 6 pages, 3 figures, ASRU 2019
Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems
This work investigates embeddings for representing dialog history in
spoken language understanding (SLU) systems. We focus on the scenario in which
the semantic information is extracted directly from the speech signal by means
of a single end-to-end neural network model. We propose to integrate dialog
history into an end-to-end signal-to-concept SLU system. The dialog history is
represented in the form of dialog history embedding vectors (so-called
h-vectors) and is provided as additional information to end-to-end SLU models
in order to improve system performance. The following three types of h-vectors
are proposed and experimentally evaluated in this paper: (1) supervised-all
embeddings, predicting the bag of concepts expected in the user's answer to
the last dialog system response; (2) supervised-freq embeddings, focusing on
predicting only a selected set of semantic concepts (corresponding to the most
frequent errors in our experiments); and (3) unsupervised embeddings.
Experiments on the MEDIA corpus for the semantic slot filling task demonstrate
that the proposed h-vectors improve model performance.
Comment: Accepted for ICASSP 2020 (Submitted: October 21, 2019)
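A supervised h-vector of the "bag-of-concepts" kind described above can be pictured as a multi-hot vector over a fixed concept inventory, concatenated to the model's per-frame features. The sketch below is a minimal illustration under assumed names (`CONCEPTS`, `h_vector`, `augment_features`); it is not the paper's implementation.

```python
# Minimal, illustrative sketch of a supervised h-vector: a multi-hot
# bag-of-concepts vector built from the concepts expected in the user's
# answer to the last system response. Inventory and helpers are assumptions.

CONCEPTS = ["date", "time", "location", "price", "number-of-rooms"]

def h_vector(expected_concepts, inventory=CONCEPTS):
    """Encode the expected concepts as a multi-hot vector (the h-vector)."""
    index = {c: i for i, c in enumerate(inventory)}
    vec = [0.0] * len(inventory)
    for concept in expected_concepts:
        if concept in index:  # silently skip out-of-inventory concepts
            vec[index[concept]] = 1.0
    return vec

def augment_features(frame_features, hvec):
    """Concatenate the h-vector to each frame's feature vector, so the
    end-to-end SLU model sees the dialog history at every time step."""
    return [list(f) + hvec for f in frame_features]

hv = h_vector(["date", "price"])
frames = [[0.1, 0.2], [0.3, 0.4]]
augmented = augment_features(frames, hv)
```

Because the same h-vector is appended to every frame, the dialog-history signal is available throughout the utterance rather than only at a single position.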