Natural Language Interactions in Autonomous Vehicles: Intent Detection and Slot Filling from Passenger Utterances
Understanding passenger intents and extracting relevant slots are important
building blocks towards developing contextual dialogue systems for natural
interactions in autonomous vehicles (AV). In this work, we explored AMIE
(Automated-vehicle Multi-modal In-cabin Experience), the in-cabin agent
responsible for handling certain passenger-vehicle interactions. When the
passengers give instructions to AMIE, the agent should parse such commands
properly and trigger the appropriate functionality of the AV system. In our
current explorations, we focused on AMIE scenarios covering setting or
changing the destination and route, updating driving behavior or speed,
finishing the trip, and other use-cases to support various natural
commands. We collected a multi-modal in-cabin dataset with multi-turn dialogues
between the passengers and AMIE using a Wizard-of-Oz scheme via a realistic
scavenger hunt game activity. After exploring various recent Recurrent Neural
Network (RNN) based techniques, we introduced our own hierarchical joint
models to recognize passenger intents along with relevant slots associated with
the action to be performed in AV scenarios. Our models outperformed certain
competitive baselines, achieving overall F1 scores of 0.91 for utterance-level
intent detection and 0.96 for slot filling. In
addition, we conducted initial speech-to-text explorations by comparing
intent/slot models trained and tested on human transcriptions versus noisy
Automatic Speech Recognition (ASR) outputs. Finally, we compared results for
single-passenger rides versus rides with multiple passengers.
Comment: Accepted and presented as a full paper at the 20th International
Conference on Computational Linguistics and Intelligent Text Processing
(CICLing 2019), April 7-13, 2019, La Rochelle, France.
Joint Slot Filling and Intent Detection via Capsule Neural Networks
Being able to recognize words as slots and detect the intent of an utterance
has been a key problem in natural language understanding. Existing works
either treat slot filling and intent detection separately in a pipeline manner,
or adopt joint models which sequentially label slots while summarizing the
utterance-level intent without explicitly preserving the hierarchical
relationship among words, slots, and intents. To exploit the semantic hierarchy
for effective modeling, we propose a capsule-based neural network model which
accomplishes slot filling and intent detection via a dynamic
routing-by-agreement schema. A re-routing schema is proposed to further
improve slot filling performance using the inferred intent
representation. Experiments on two real-world datasets show the effectiveness
of our model when compared with other alternative model architectures, as well
as existing natural language understanding services.
Comment: In ACL 2019 as a long paper. Code and data available at
https://github.com/czhang99/Capsule-NLU.
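For readers unfamiliar with routing-by-agreement, here is a toy PyTorch sketch of the dynamic routing mechanism (after Sabour et al., 2017) that this line of work builds on to route word capsules to slot and intent capsules. It is not the authors' Capsule-NLU implementation; capsule counts and dimensions are illustrative.

```python
# Toy dynamic routing-by-agreement: lower-level capsules vote for higher-level
# capsules, and coupling coefficients are iteratively sharpened by agreement.
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Shrink a vector's norm into (0, 1) while preserving its direction."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: (num_in, num_out, out_dim) prediction vectors from lower capsules."""
    num_in, num_out, _ = u_hat.shape
    b = torch.zeros(num_in, num_out)              # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=1)                   # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=0)  # weighted vote per output capsule
        v = squash(s)                             # (num_out, out_dim)
        b = b + (u_hat * v.unsqueeze(0)).sum(-1)  # reward agreeing votes
    return v

v = dynamic_routing(torch.randn(12, 5, 16))  # 12 word capsules -> 5 slot capsules
print(v.shape)  # torch.Size([5, 16])
```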
Coupled Representation Learning for Domains, Intents and Slots in Spoken Language Understanding
Representation learning is an essential problem in a wide range of
applications and it is important for performing downstream tasks successfully.
In this paper, we propose a new model that learns coupled representations of
domains, intents, and slots by taking advantage of their hierarchical
dependency in a Spoken Language Understanding system. Our proposed model learns
the vector representation of intents based on the slots tied to these intents
by aggregating the representations of the slots. Similarly, the vector
representation of a domain is learned by aggregating the representations of the
intents tied to a specific domain. To the best of our knowledge, this is the
first approach to jointly learn the representations of domains, intents, and
slots using their hierarchical relationships. The experimental results
demonstrate the effectiveness of the representations learned by our model, as
evidenced by improved performance on the contextual cross-domain reranking
task.
Comment: IEEE SLT 2018.
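A small illustrative sketch of the hierarchical aggregation idea: intent vectors are built by aggregating the vectors of their slots, and domain vectors by aggregating their intents. The toy ontology and the mean-pooling aggregator below are assumptions, not the paper's exact aggregation function.

```python
# Illustrative coupled representations: slots -> intents -> domains,
# each level aggregated from the level below it.
import torch
import torch.nn as nn

slot_names = ["city", "date", "airline", "song", "artist"]
intent_to_slots = {"book_flight": ["city", "date", "airline"],
                   "play_music": ["song", "artist"]}
domain_to_intents = {"travel": ["book_flight"], "media": ["play_music"]}

dim = 32
slot_emb = nn.Embedding(len(slot_names), dim)
slot_idx = {s: i for i, s in enumerate(slot_names)}

def intent_vector(intent):
    ids = torch.tensor([slot_idx[s] for s in intent_to_slots[intent]])
    return slot_emb(ids).mean(dim=0)          # aggregate slot representations

def domain_vector(domain):
    vs = [intent_vector(i) for i in domain_to_intents[domain]]
    return torch.stack(vs).mean(dim=0)        # aggregate intent representations

print(intent_vector("book_flight").shape, domain_vector("travel").shape)
```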
Cross-lingual transfer learning for spoken language understanding
Typically, spoken language understanding (SLU) models are trained on
annotated data, which is costly to gather. Aiming to reduce the data needed to
bootstrap an SLU system for a new language, we present a simple but
effective weight transfer approach using data from another language. The
approach is evaluated within our multi-task SLU framework, which is designed
to extend to different languages. We evaluate our approach on ATIS and a
real-world SLU dataset, showing that i) our monolingual models outperform the
state-of-the-art, ii) we can greatly reduce the amount of data needed to
bootstrap an SLU system for a new language, and iii) while multitask training improves
over separate training, different weight transfer settings may work best for
different SLU modules.
Comment: Accepted at ICASSP 2019.
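A minimal sketch of the weight-transfer idea, assuming a toy SLU model: a target-language model is initialized from the language-independent modules of a source-language model, while language-specific embeddings are trained from scratch. The module names and the choice of what to transfer are illustrative, echoing the paper's finding that different transfer settings suit different modules.

```python
# Bootstrap a target-language SLU model by copying selected weights from a
# model trained on a source language, then fine-tune on the small target set.
import torch.nn as nn

def make_slu_model(vocab_size, num_intents, hidden=64):
    return nn.ModuleDict({
        "embed": nn.Embedding(vocab_size, hidden),
        "encoder": nn.LSTM(hidden, hidden, batch_first=True),
        "intent_head": nn.Linear(hidden, num_intents),
    })

source = make_slu_model(vocab_size=20000, num_intents=18)  # source language
target = make_slu_model(vocab_size=15000, num_intents=18)  # new language

# Transfer only language-independent modules; embeddings stay language-specific.
for name in ["encoder", "intent_head"]:
    target[name].load_state_dict(source[name].state_dict())
print("transferred:", ["encoder", "intent_head"])
```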
An Efficient Approach to Encoding Context for Spoken Language Understanding
In task-oriented dialogue systems, spoken language understanding, or SLU,
refers to the task of parsing natural language user utterances into semantic
frames. Making use of context from prior dialogue history holds the key to more
effective SLU. State-of-the-art approaches to SLU use memory networks to encode
context by processing multiple utterances from the dialogue at each turn,
resulting in significant trade-offs between accuracy and computational
efficiency. On the other hand, downstream components like the dialogue state
tracker (DST) already keep track of the dialogue state, which can serve as a
summary of the dialogue history. In this work, we propose an efficient approach
to encoding context from prior utterances for SLU. More specifically, our
architecture includes a separate recurrent neural network (RNN) based encoding
module that accumulates dialogue context to guide the frame parsing sub-tasks
and can be shared between SLU and DST. In our experiments, we demonstrate the
effectiveness of our approach on dialogues from two domains.
Comment: Submitted to INTERSPEECH 2018.
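A rough sketch of the efficiency argument: a small recurrent module folds each new utterance into a fixed-size context vector, so SLU at each turn conditions on a running summary instead of re-encoding the whole dialogue history. The GRU cell and dimensions below are assumptions, not the paper's exact encoder.

```python
# One cheap recurrent update per turn accumulates dialogue context; the
# resulting vector can guide frame parsing and be shared with the DST.
import torch
import torch.nn as nn

class DialogueContextEncoder(nn.Module):
    def __init__(self, utt_dim=64, ctx_dim=64):
        super().__init__()
        self.cell = nn.GRUCell(utt_dim, ctx_dim)

    def forward(self, utterance_vecs):
        """utterance_vecs: (num_turns, utt_dim); returns context after each turn."""
        h = torch.zeros(1, self.cell.hidden_size)
        contexts = []
        for u in utterance_vecs:                 # one update per dialogue turn
            h = self.cell(u.unsqueeze(0), h)
            contexts.append(h.squeeze(0))
        return torch.stack(contexts)             # summaries for the frame parser

enc = DialogueContextEncoder()
print(enc(torch.randn(5, 64)).shape)  # torch.Size([5, 64])
```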
A Survey on Dialogue Systems: Recent Advances and New Frontiers
Dialogue systems have attracted increasing attention. Recent advances in
dialogue systems have overwhelmingly been driven by deep learning techniques,
which have been employed to enhance a wide range of big data applications such
as computer vision, natural language processing, and recommender systems. For
dialogue systems, deep learning can leverage a massive amount of data to learn
meaningful feature representations and response generation strategies, while
requiring a minimum amount of hand-crafting. In this article, we give an
overview of these recent advances in dialogue systems from various perspectives
and discuss some possible research directions. In particular, we generally
divide existing dialogue systems into task-oriented and non-task-oriented
models, then detail how deep learning techniques help them with representative
algorithms and finally discuss some appealing research directions that can
bring dialogue system research into a new frontier.
Comment: 13 pages. arXiv admin note: text overlap with arXiv:1703.01008 by
other authors.
Deep Cascade Multi-task Learning for Slot Filling in Online Shopping Assistant
Slot filling is a critical task in natural language understanding (NLU) for
dialog systems. State-of-the-art approaches treat it as a sequence labeling
problem and adopt models such as BiLSTM-CRF. While these models work relatively
well on standard benchmark datasets, they face challenges in the context of
E-commerce where the slot labels are more informative and carry richer
expressions. In this work, inspired by the unique structure of E-commerce
knowledge base, we propose a novel multi-task model with cascade and residual
connections, which jointly learns segment tagging, named entity tagging and
slot filling. Experiments show the effectiveness of the proposed cascade and
residual structures. Our model has a 14.6% advantage in F1 score over the
strong baseline methods on a new Chinese E-commerce shopping assistant dataset,
while achieving competitive accuracies on a standard dataset. Furthermore, an
online test deployed on this dominant E-commerce platform shows a 130%
improvement in the accuracy of understanding user utterances. Our model has
already gone into production on the platform.
Comment: AAAI 2019.
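An illustrative sketch of the cascade: a shared encoder feeds segment tagging, whose outputs cascade (alongside a residual connection to the shared features) into named entity tagging, and then into slot filling. The wiring and sizes are assumptions; the paper's model additionally uses CRF output layers, while this toy version stops at logits.

```python
# Cascade multi-task tagging: each later task sees the shared features plus
# the previous task's predictions, so coarse labels guide finer ones.
import torch
import torch.nn as nn

class CascadeTagger(nn.Module):
    def __init__(self, vocab=1000, dim=64, n_seg=3, n_ner=9, n_slot=30):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.seg_head = nn.Linear(dim, n_seg)
        self.ner_head = nn.Linear(dim + n_seg, n_ner)
        self.slot_head = nn.Linear(dim + n_ner, n_slot)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))          # shared features (B, T, dim)
        seg = self.seg_head(h)                           # task 1: segment tagging
        ner = self.ner_head(torch.cat([h, seg], -1))     # task 2: residual h + seg cue
        slot = self.slot_head(torch.cat([h, ner], -1))   # task 3: slot filling
        return seg, ner, slot

seg, ner, slot = CascadeTagger()(torch.randint(0, 1000, (2, 7)))
print(seg.shape, ner.shape, slot.shape)
```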
OneNet: Joint Domain, Intent, Slot Prediction for Spoken Language Understanding
In practice, most spoken language understanding systems process user input in
a pipelined manner: first the domain is predicted, then the intent and semantic
slots are inferred according to the semantic frames of the predicted domain. The
pipeline approach, however, has some disadvantages: error propagation and lack
of information sharing. To address these issues, we present a unified neural
network that jointly performs domain, intent, and slot predictions. Our
approach adopts a principled architecture for multitask learning to fold in the
state-of-the-art models for each task. With a few more ingredients, e.g.
orthography-sensitive input encoding and curriculum training, our model
delivered significant improvements in all three tasks across all domains over
strong baselines, including one using oracle prediction for domain detection,
on real user data of a commercial personal assistant.
Comment: 5-page conference paper accepted to IEEE ASRU 2017, to be published
in December 2017.
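A schematic sketch of the unified setup: one shared encoder with three heads for domain, intent, and slots, trained with a single summed loss so errors do not propagate through a pipeline. Sizes are placeholders, and this is not the OneNet architecture itself, which also folds in ingredients such as orthography-sensitive encoding and curriculum training.

```python
# Joint domain/intent/slot prediction from one shared encoder, trained with a
# single multi-task loss instead of a pipeline of separate models.
import torch
import torch.nn as nn

class OneNetSketch(nn.Module):
    def __init__(self, vocab=1000, dim=64, n_dom=7, n_int=25, n_slot=60):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.domain_head = nn.Linear(dim, n_dom)
        self.intent_head = nn.Linear(dim, n_int)
        self.slot_head = nn.Linear(dim, n_slot)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        pooled = h.mean(dim=1)                           # utterance summary
        return self.domain_head(pooled), self.intent_head(pooled), self.slot_head(h)

model = OneNetSketch()
tokens = torch.randint(0, 1000, (2, 6))
dom, intent, slot = model(tokens)
ce = nn.CrossEntropyLoss()
dom_y, int_y = torch.tensor([0, 3]), torch.tensor([1, 4])
slot_y = torch.randint(0, 60, (2, 6))
loss = ce(dom, dom_y) + ce(intent, int_y) + ce(slot.reshape(-1, 60), slot_y.reshape(-1))
print(loss.item())  # one joint loss over all three tasks
```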
BERT for Joint Intent Classification and Slot Filling
Intent classification and slot filling are two essential tasks for natural
language understanding. They often suffer from small-scale human-labeled
training data, resulting in poor generalization capability, especially for rare
words. Recently, the language representation model BERT (Bidirectional
Encoder Representations from Transformers) has enabled pre-training deep
bidirectional representations on large-scale unlabeled corpora, yielding
state-of-the-art models for a wide variety of natural language processing tasks
after simple fine-tuning. However, there has not been much effort to explore
BERT for natural language understanding. In this work, we propose a joint
intent classification and slot filling model based on BERT. Experimental
results demonstrate that our proposed model achieves significant improvement on
intent classification accuracy, slot filling F1, and sentence-level semantic
frame accuracy on several public benchmark datasets, compared to the
attention-based recurrent neural network models and slot-gated models.
Comment: 4 pages, 1 figure.
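A minimal sketch of the joint BERT formulation using the Hugging Face transformers library: the [CLS] representation drives intent classification while per-token representations drive slot tagging. The head sizes are placeholder assumptions, and training code is omitted.

```python
# Joint intent classification and slot filling on top of a pre-trained BERT:
# the [CLS] vector feeds the intent head, token vectors feed the slot head.
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
num_intents, num_slots = 21, 120                 # placeholder label sets
intent_head = nn.Linear(bert.config.hidden_size, num_intents)
slot_head = nn.Linear(bert.config.hidden_size, num_slots)

batch = tokenizer(["play jazz in the kitchen"], return_tensors="pt")
out = bert(**batch)
intent_logits = intent_head(out.last_hidden_state[:, 0])  # [CLS] -> intent
slot_logits = slot_head(out.last_hidden_state)            # every token -> slot
print(intent_logits.shape, slot_logits.shape)
```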
Efficient Large-Scale Domain Classification with Personalized Attention
In this paper, we explore the task of mapping spoken language utterances to
one of thousands of natural language understanding domains in intelligent
personal digital assistants (IPDAs). This scenario is observed for many
mainstream IPDAs in industry that allow third parties to develop thousands of
new domains, augmenting built-in ones to rapidly increase domain coverage and
overall IPDA capabilities. We propose a scalable neural model architecture with
a shared encoder, a novel attention mechanism that incorporates personalization
information, and domain-specific classifiers, which together solve the problem
efficiently. Our architecture is designed to efficiently accommodate new
domains that appear in-between full model retraining cycles with a rapid
bootstrapping mechanism two orders of magnitude faster than retraining. We
account for practical constraints in real-time production systems, and design
to minimize memory footprint and runtime latency. We demonstrate that
incorporating personalization results in significantly more accurate domain
classification in the setting with thousands of overlapping domains.
Comment: Accepted to ACL 2018.
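A toy sketch of the personalization idea: attention over embeddings of the domains a user has enabled yields a personalized summary, which is combined with the utterance encoding before scoring all domains. The dot-product attention form and all dimensions are assumptions, not the paper's exact mechanism.

```python
# Personalized attention for domain classification: weight the user's enabled
# domains by relevance to the utterance, then score against all domains.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, num_domains = 64, 1000
domain_emb = nn.Embedding(num_domains, dim)        # one vector per domain
scorer = nn.Linear(2 * dim, num_domains)           # domain classifier

def classify(utterance_vec, enabled_domain_ids):
    """utterance_vec: (dim,); enabled_domain_ids: ids of this user's domains."""
    E = domain_emb(enabled_domain_ids)             # (k, dim) personal domain set
    attn = F.softmax(E @ utterance_vec, dim=0)     # weight domains by relevance
    personal = attn @ E                            # (dim,) personalized summary
    return scorer(torch.cat([utterance_vec, personal]))

logits = classify(torch.randn(dim), torch.tensor([3, 17, 256]))
print(logits.shape)  # torch.Size([1000])
```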