An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking
We highlight a practical yet rarely discussed problem in dialogue state
tracking (DST), namely handling unknown slot values. Previous approaches
generally assume predefined candidate lists and thus are not designed to output
unknown values, especially when the spoken language understanding (SLU) module
is absent as in many end-to-end (E2E) systems. We describe in this paper an E2E
architecture based on the pointer network (PtrNet) that can effectively extract
unknown slot values while still obtaining state-of-the-art accuracy on the
standard DSTC2 benchmark. We also provide extensive empirical evidence to show
that tracking unknown values can be challenging and our approach can bring
significant improvement with the help of an effective feature dropout
technique.
Comment: Accepted by ACL 2018
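The pointer idea is easy to see in miniature: instead of classifying over a fixed candidate list, the model points at positions in the dialogue itself, so any value that appears in context is reachable. Below is a minimal, illustrative PyTorch sketch of that extraction step, not the paper's architecture; all sizes and names are made up.

```python
# Minimal pointer-network-style span extractor; illustrative only.
import torch
import torch.nn as nn

class PointerSlotExtractor(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        # Pointing heads: score every token as the start/end of the value span.
        self.start_head = nn.Linear(hidden, 1)
        self.end_head = nn.Linear(hidden, 1)

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))   # (B, T, H)
        return (self.start_head(states).squeeze(-1),      # (B, T) start scores
                self.end_head(states).squeeze(-1))        # (B, T) end scores

model = PointerSlotExtractor(vocab_size=1000)
tokens = torch.randint(0, 1000, (1, 12))   # a fake 12-token dialogue turn
start, end = model(tokens)
# The predicted value is a span of the input, so unseen values stay reachable.
print("value span:", (start.argmax(-1).item(), end.argmax(-1).item()))
```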
BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer
An important yet rarely tackled problem in dialogue state tracking (DST) is
scalability for dynamic ontology (e.g., movie, restaurant) and unseen slot
values. We focus on a specific condition, where the ontology is unknown to the
state tracker, but the target slot value (except for none and dontcare),
possibly unseen during training, can be found as a word segment in the dialogue
context. Prior approaches often rely on candidate generation from n-gram
enumeration or slot tagger outputs, which can be inefficient or suffer from
error propagation. We propose BERT-DST, an end-to-end dialogue state tracker
which directly extracts slot values from the dialogue context. We use BERT as
the dialogue context encoder, whose contextualized language representations are
suitable for scalable DST to identify slot values from their semantic context.
Furthermore, we employ encoder parameter sharing across all slots, with two
advantages: (1) the number of parameters does not grow linearly with the
ontology size; (2) language representation knowledge can be transferred among
slots. Empirical
evaluation shows BERT-DST with cross-slot parameter sharing outperforms prior
work on the benchmark scalable DST datasets Sim-M and Sim-R, and achieves
competitive performance on the standard DSTC2 and WOZ 2.0 datasets.
Comment: Published in Interspeech 2019
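The parameter-sharing point is concrete enough to sketch: one encoder serves every slot, and only a small head is added per slot. Below is a hedged toy using a randomly initialized transformers BertModel so it runs without pretrained weights; the slot names, head design, and sizes are illustrative, not the paper's.

```python
# Shared-encoder, per-slot-head sketch in the spirit of BERT-DST.
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class BertDstSketch(nn.Module):
    def __init__(self, slots, hidden=128):
        super().__init__()
        cfg = BertConfig(hidden_size=hidden, num_hidden_layers=2,
                         num_attention_heads=4, intermediate_size=256)
        self.encoder = BertModel(cfg)          # one encoder shared by all slots
        self.heads = nn.ModuleDict({
            s: nn.ModuleDict({
                "gate": nn.Linear(hidden, 3),  # none / dontcare / span
                "span": nn.Linear(hidden, 2),  # start / end logits per token
            }) for s in slots
        })

    def forward(self, input_ids):
        out = self.encoder(input_ids=input_ids).last_hidden_state  # (B, T, H)
        preds = {}
        for slot, head in self.heads.items():
            gate = head["gate"](out[:, 0])     # [CLS] decides the slot's status
            start, end = head["span"](out).split(1, dim=-1)
            preds[slot] = (gate, start.squeeze(-1), end.squeeze(-1))
        return preds

model = BertDstSketch(["food", "area", "price_range"])
ids = torch.randint(0, model.encoder.config.vocab_size, (1, 16))
for slot, (gate, start, end) in model(ids).items():
    print(slot, gate.argmax(-1).item(), start.argmax(-1).item(), end.argmax(-1).item())
```

Because the heads are tiny relative to the encoder, adding a slot adds almost no parameters, which is exactly the scalability argument above.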
Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces
This paper presents the machine learning architecture of the Snips Voice
Platform, a software solution to perform Spoken Language Understanding on
microprocessors typical of IoT devices. The embedded inference is fast and
accurate while enforcing privacy by design, as no personal user data is ever
collected. Focusing on Automatic Speech Recognition and Natural Language
Understanding, we detail our approach to training high-performance Machine
Learning models that are small enough to run in real-time on small devices.
Additionally, we describe a data generation procedure that provides sufficient,
high-quality training data without compromising user privacy.
Comment: 29 pages, 9 figures, 17 tables
STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++
Slot-filling, Translation, Intent classification, and Language
identification, or STIL, is a newly-proposed task for multilingual Natural
Language Understanding (NLU). By performing simultaneous slot filling and
translation into a single output language (English in this case), some portion
of downstream system components can be monolingual, reducing development and
maintenance cost. Results are given using the multilingual BART model (Liu et
al., 2020) fine-tuned on 7 languages using the MultiATIS++ dataset. When no
translation is performed, mBART's performance is comparable to the current
state-of-the-art system (Cross-Lingual BERT by Xu et al. (2020)) for the
languages tested, with better average intent classification accuracy (96.07%
versus 95.50%) but worse average slot F1 (89.87% versus 90.81%). When
simultaneous translation is performed, average intent classification accuracy
degrades by only 1.7% relative and average slot F1 degrades by only 1.2%
relative.
Comment: 4 pages; To be published at AACL 2020; For code, see:
https://github.com/jgmfitz/stil-mbart-multiatispp-aacl202
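Framed as plain sequence-to-sequence fine-tuning, the STIL setup reduces to pairing a source-language utterance with a single English target string that carries the translation, intent, and slots. Here is a sketch using Hugging Face's MBartForConditionalGeneration; the target serialization format is an assumption, not the paper's exact scheme, and running it downloads the pretrained facebook/mbart-large-cc25 checkpoint.

```python
# Hedged seq2seq sketch of the STIL training objective with mBART.
from transformers import MBartForConditionalGeneration, MBartTokenizer

tok = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25",
                                     src_lang="es_XX", tgt_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

src = "vuelos de madrid a boston"  # Spanish ATIS-style query
# Illustrative target packing intent + translated, slot-annotated utterance.
tgt = "atis_flight | flights from [madrid](fromloc) to [boston](toloc)"

batch = tok(src, text_target=tgt, return_tensors="pt")
loss = model(**batch).loss        # fine-tune by minimizing this on MultiATIS++
loss.backward()
print(float(loss))
```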
STN4DST: A Scalable Dialogue State Tracking based on Slot Tagging Navigation
Scalability for handling unknown slot values is an important problem in
dialogue state tracking (DST). To the best of our knowledge, previous scalable
DST approaches generally rely on either candidate generation from slot tagging
output or span extraction in the dialogue context. However,
candidate-generation-based DST often suffers from error propagation due to its
pipelined two-stage process, while span-extraction-based DST risks producing
invalid spans in the absence of semantic constraints between the start and end
position pointers. To tackle these drawbacks, in this paper we propose a novel
scalable dialogue state tracking method based on slot tagging navigation,
which implements an end-to-end single-step pointer to locate and extract slot
values quickly and accurately, especially unknown ones, by jointly learning
slot tagging and slot value position prediction in the dialogue context.
Extensive experiments on several benchmark datasets show that the proposed
model substantially outperforms state-of-the-art baselines.
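One way to read the navigation idea, sketched below under that assumption: a single-step pointer proposes the value's start token, and the span is then grown along contiguous "inside" tags from the jointly learned tagger, so start and end can never disagree. This is an illustrative interpretation, not the paper's model.

```python
# Joint tagging + single-step pointer; span is navigated along I-tags.
import torch
import torch.nn as nn

class TaggingNavigator(nn.Module):
    def __init__(self, vocab_size, hidden=128, n_tags=3):  # tags: O=0, B=1, I=2
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.tagger = nn.Linear(hidden, n_tags)   # per-token slot tagging
        self.pointer = nn.Linear(hidden, 1)       # single-step start pointer

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        return self.tagger(states), self.pointer(states).squeeze(-1)

def extract_span(tags, start):
    """Grow the span rightward while tags stay 'inside' (2)."""
    end = start
    while end + 1 < len(tags) and tags[end + 1] == 2:
        end += 1
    return start, end

model = TaggingNavigator(vocab_size=1000)
ids = torch.randint(0, 1000, (1, 10))
tag_logits, start_logits = model(ids)
tags = tag_logits.argmax(-1)[0].tolist()
print(extract_span(tags, start_logits.argmax(-1).item()))
```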
Copy-Enhanced Heterogeneous Information Learning for Dialogue State Tracking
Dialogue state tracking (DST) is an essential component in task-oriented
dialogue systems, which estimates user goals at every dialogue turn. However,
most previous approaches suffer from the following problems. Many
discriminative models, especially end-to-end (E2E) models, struggle to extract
unknown values that are not in the candidate ontology, while previous
generative models, which can extract unknown values from utterances, degrade
performance by ignoring the semantic information of the pre-defined ontology.
Besides, previous generative models usually need a hand-crafted list to
normalize the generated values. How to integrate the semantic information of
the pre-defined ontology and the dialogue text (heterogeneous texts) to
generate unknown values and improve performance thus remains a significant
challenge. In this paper, we propose a Copy-Enhanced Heterogeneous Information
Learning model with multiple encoder-decoders for DST (CEDST), which can
effectively generate all possible values, including unknown values, by copying
values from heterogeneous texts. Meanwhile, CEDST can effectively decompose
the large state space into several small state spaces through its multiple
encoders, and employ the multiple decoders to make full use of the reduced
spaces to generate values. This multi-encoder-decoder architecture
significantly improves performance. Experiments show that CEDST achieves
state-of-the-art results on two datasets and on our constructed datasets with
many unknown values.
Comment: 12 pages, 4 figures
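The copy mechanism at the heart of this is the familiar pointer-generator mix: the output distribution blends a vocabulary softmax with attention weights scattered onto source-token ids, so words that occur only in the ontology or dialogue remain generatable. A shapes-only sketch under that assumption; CEDST's actual multi-encoder-decoder wiring is not reproduced.

```python
# Pointer-generator-style copy step, shapes only.
import torch
import torch.nn.functional as F

vocab_size, src_len = 500, 8
src_token_ids = torch.randint(0, vocab_size, (1, src_len))  # heterogeneous source text

gen_logits = torch.randn(1, vocab_size)        # decoder's generation scores
attn = F.softmax(torch.randn(1, src_len), -1)  # attention over source tokens
p_gen = torch.sigmoid(torch.randn(1, 1))       # learned generate-vs-copy gate

p_vocab = p_gen * F.softmax(gen_logits, -1)
# Scatter copy probability mass onto the source tokens' vocabulary ids.
p_final = p_vocab.scatter_add(1, src_token_ids, (1 - p_gen) * attn)
print(p_final.sum())  # ~1.0: a valid distribution that can favor source-only words
```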
CASA-NLU: Context-Aware Self-Attentive Natural Language Understanding for Task-Oriented Chatbots
Natural Language Understanding (NLU) is a core component of dialog systems.
It typically involves two tasks - intent classification (IC) and slot labeling
(SL), which are then followed by a dialogue management (DM) component. Such NLU
systems cater to utterances in isolation, thus pushing the problem of context
management to DM. However, contextual information is critical to the correct
prediction of intents and slots in a conversation. Prior work on contextual NLU
has been limited in terms of the types of contextual signals used and the
understanding of their impact on the model. In this work, we propose a
context-aware self-attentive NLU (CASA-NLU) model that uses multiple signals,
such as previous intents, slots, dialog acts and utterances over a variable
context window, in addition to the current user utterance. CASA-NLU outperforms
a recurrent contextual NLU baseline on two conversational datasets, yielding a
gain of up to 7% on the IC task for one of the datasets. Moreover, a
non-contextual variant of CASA-NLU achieves state-of-the-art performance on the
IC task on standard public datasets: Snips and ATIS.
Comment: To appear at EMNLP 2019
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding
Traditional slot filling in natural language understanding (NLU) predicts a
one-hot vector for each word. This form of label representation lacks semantic
correlation modelling, which leads to a severe data sparsity problem, especially
when adapting an NLU model to a new domain. To address this issue, a novel
label embedding based slot filling framework is proposed in this paper. Here,
distributed label embedding is constructed for each slot using prior knowledge.
Three encoding methods are investigated to incorporate different kinds of prior
knowledge about slots: atomic concepts, slot descriptions, and slot exemplars.
The proposed label embeddings tend to share text patterns and reuse data
across different slot labels, which makes them useful for adaptive NLU with
limited data. Also, since the label embedding is independent of the NLU model,
it is compatible with
almost all deep learning based slot filling models. The proposed approaches are
evaluated on three datasets. Experiments on single domain and domain adaptation
tasks show that label embedding achieves significant performance improvement
over traditional one-hot label representation as well as advanced zero-shot
approaches.
Comment: 11 pages, 6 figures; Accepted for IEEE/ACM Transactions on Audio, Speech, and Language Processing
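The core trick is to replace the one-hot output layer with label vectors built from prior knowledge, so tokens are tagged by similarity to label embeddings rather than by index. A minimal sketch using description words (one of the paper's three knowledge sources); the descriptions, ids, and sizes are made up.

```python
# Label-embedding slot tagging: score tokens against knowledge-derived labels.
import torch
import torch.nn as nn

word_emb = nn.Embedding(1000, 64)

def label_embedding(desc_word_ids):
    """Encode a slot label from its description words (alongside atomic
    concepts and exemplars, per the paper's three encoding methods)."""
    return word_emb(torch.tensor(desc_word_ids)).mean(dim=0)

# Hypothetical descriptions mapped to word ids, e.g. O, B-food, B-area.
labels = torch.stack([label_embedding(ids) for ids in
                      [[10, 11], [20, 21, 22], [30]]])

token_states = torch.randn(7, 64)  # encoder output for a 7-token utterance
logits = token_states @ labels.T   # similarity to every label embedding
print(logits.argmax(-1))           # predicted tag per token
```

Because similar labels get nearby vectors, data for one slot can inform another, which is what makes the scheme attractive for low-resource domain adaptation.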
Measuring and Reducing Gendered Correlations in Pre-trained Models
Pre-trained models have revolutionized natural language understanding.
However, researchers have found they can encode artifacts undesired in many
applications, such as professions correlating with one gender more than
another. We explore such gendered correlations as a case study for how to
address unintended correlations in pre-trained models. We define metrics and
reveal that it is possible for models with similar accuracy to encode
correlations at very different rates. We show how measured correlations can be
reduced with general-purpose techniques, and highlight the trade-offs different
strategies have. With these results, we make recommendations for training
robust models: (1) carefully evaluate unintended correlations, (2) be mindful
of seemingly innocuous configuration differences, and (3) focus on general
mitigations.
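A toy probe makes the "similar accuracy, very different correlations" point concrete: compare how strongly profession vectors associate with two gendered anchor words. The vectors below are random stand-ins, and the paper's actual metrics operate on model predictions rather than raw embeddings.

```python
# Toy gendered-association probe over (here, random) word vectors.
import torch
import torch.nn.functional as F

emb = {w: torch.randn(64) for w in ["nurse", "engineer", "he", "she"]}

def gender_skew(profession):
    """Positive = closer to 'he', negative = closer to 'she'."""
    p = emb[profession]
    return (F.cosine_similarity(p, emb["he"], dim=0)
            - F.cosine_similarity(p, emb["she"], dim=0)).item()

for job in ("nurse", "engineer"):
    # Two models with equal task accuracy can still score very differently here.
    print(job, round(gender_skew(job), 3))
```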
Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN
Video reviews are the natural evolution of written product reviews. In this
paper we target this phenomenon and introduce the first dataset created from
closed captions of YouTube product review videos as well as a new attention-RNN
model for aspect extraction and joint aspect extraction and sentiment
classification. Our model provides state-of-the-art performance on aspect
extraction on the SemEval ABSA corpus without requiring hand-crafted
features, and it outperforms the baseline on the joint task. In our
dataset, the attention-RNN model outperforms the baseline for both tasks, but
we observe substantial performance drops for all models compared to SemEval.
These results, as well as further experiments on domain adaptation for aspect
extraction, suggest that differences between speech and written text, which
have been discussed extensively in the literature, also extend to the domain of
product reviews, where they are relevant for fine-grained opinion mining.
Comment: 8th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA)
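The model itself follows a familiar pattern: a recurrent encoder with an attention layer on top, used here for aspect extraction cast as BIO tagging. A minimal illustrative sketch, not the paper's exact attention-RNN.

```python
# Attention-RNN sketch for aspect extraction as BIO tagging.
import torch
import torch.nn as nn

class AttentionRNNTagger(nn.Module):
    def __init__(self, vocab_size, hidden=64, n_tags=3):  # O / B-ASP / I-ASP
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=2, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):
        states, _ = self.rnn(self.embed(token_ids))  # (B, T, 2H)
        attended, _ = self.attn(states, states, states)
        return self.out(attended)                    # per-token tag logits

model = AttentionRNNTagger(vocab_size=1000)
ids = torch.randint(0, 1000, (1, 9))   # a 9-token caption sentence
print(model(ids).argmax(-1))           # BIO tags marking aspect terms
```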