Coupled Representation Learning for Domains, Intents and Slots in Spoken Language Understanding
Representation learning is an essential problem in a wide range of
applications and it is important for performing downstream tasks successfully.
In this paper, we propose a new model that learns coupled representations of
domains, intents, and slots by taking advantage of their hierarchical
dependency in a Spoken Language Understanding system. Our proposed model learns
the vector representation of intents based on the slots tied to these intents
by aggregating the representations of the slots. Similarly, the vector
representation of a domain is learned by aggregating the representations of the
intents tied to a specific domain. To the best of our knowledge, it is the
first approach to jointly learning the representations of domains, intents, and
slots using their hierarchical relationships. The experimental results
demonstrate the effectiveness of the representations learned by our model, as
evidenced by improved performance on the contextual cross-domain reranking
task.
Comment: IEEE SLT 201
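The hierarchical aggregation described above can be sketched as follows; mean pooling stands in for the paper's aggregation function, and the toy intent and slot names are hypothetical:

```python
import numpy as np

def aggregate(child_vectors):
    """Aggregate child representations into a parent vector (mean pooling)."""
    return np.mean(np.stack(child_vectors), axis=0)

# Hypothetical slot embeddings, grouped under two intents of one domain.
slots = {
    "BookFlight":   [np.array([1., 0., 0., 0.]), np.array([0., 1., 0., 0.])],
    "CancelFlight": [np.array([0., 0., 1., 0.]), np.array([0., 0., 0., 1.])],
}

# Intent vectors are aggregated from the slots tied to each intent ...
intents = {name: aggregate(vecs) for name, vecs in slots.items()}

# ... and the domain vector from the intents tied to the domain.
domain = aggregate(list(intents.values()))
print(domain)  # -> [0.25 0.25 0.25 0.25]
```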
OneNet: Joint Domain, Intent, Slot Prediction for Spoken Language Understanding
In practice, most spoken language understanding systems process user input in
a pipelined manner: first, the domain is predicted, then the intent and
semantic slots are inferred according to the semantic frames of the predicted domain. The
pipeline approach, however, has some disadvantages: error propagation and lack
of information sharing. To address these issues, we present a unified neural
network that jointly performs domain, intent, and slot predictions. Our
approach adopts a principled architecture for multitask learning to fold in the
state-of-the-art models for each task. With a few more ingredients, e.g.
orthography-sensitive input encoding and curriculum training, our model
delivered significant improvements in all three tasks across all domains over
strong baselines, including one using oracle prediction for domain detection,
on real user data of a commercial personal assistant.
Comment: 5 pages conference paper accepted to IEEE ASRU 2017. Will be
published in December 201
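One way to picture the joint prediction is a shared encoder feeding three task heads trained with a summed loss; the sketch below is a minimal numpy stand-in, not the paper's actual architecture, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared encoding of a 5-token utterance (mean of 8-dim token embeddings
# stands in for the paper's encoder).
tokens = rng.normal(size=(5, 8))
h = tokens.mean(axis=0)

# Task-specific heads: 3 domains, 4 intents, 6 slot labels per token.
W_dom, W_int, W_slot = (rng.normal(size=(8, n)) for n in (3, 4, 6))
p_domain = softmax(h @ W_dom)
p_intent = softmax(h @ W_int)
p_slots = softmax(tokens @ W_slot)      # one label distribution per token

# Joint multitask training minimizes the summed cross-entropy of all
# three predictions, so errors do not propagate between pipeline stages.
gold_dom, gold_int, gold_slots = 0, 1, [0, 2, 2, 5, 1]
loss = (-np.log(p_domain[gold_dom])
        - np.log(p_intent[gold_int])
        - sum(np.log(p_slots[i, t]) for i, t in enumerate(gold_slots)))
```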
Cross-lingual transfer learning for spoken language understanding
Typically, spoken language understanding (SLU) models are trained on
annotated data which are costly to gather. Aiming to reduce data needs for
bootstrapping a SLU system for a new language, we present a simple but
effective weight transfer approach using data from another language. The
approach is evaluated within our multi-task SLU framework, developed for
different languages. We evaluate our approach on the ATIS dataset and a
real-world SLU dataset, showing that i) our monolingual models outperform the
state-of-the-art, ii) we can reduce data amounts needed for bootstrapping a SLU
system for a new language greatly, and iii) while multitask training improves
over separate training, different weight transfer settings may work best for
different SLU modules.
Comment: accepted at ICASSP, 201
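A weight-transfer setting of this kind can be sketched as copying selected layers from a source-language model into a freshly initialized target-language model before fine-tuning; the layer names and sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source-language SLU model: layer name -> weights.
source = {
    "encoder":     rng.normal(size=(8, 8)),
    "slot_output": rng.normal(size=(8, 12)),   # 12 source slot labels
}

def init_target(source, n_target_slots, transfer_layers):
    """Bootstrap a target-language model, copying selected source layers.
    All layers, copied or fresh, are then fine-tuned on target data."""
    target = {
        "encoder":     rng.normal(size=(8, 8)),
        "slot_output": rng.normal(size=(8, n_target_slots)),
    }
    for name in transfer_layers:
        target[name] = source[name].copy()
    return target

# One possible setting: transfer the encoder, keep the language-specific
# output layer randomly initialized.
target = init_target(source, n_target_slots=9, transfer_layers=["encoder"])
```

Which layers to transfer is exactly the per-module choice the abstract says may differ between SLU modules.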
An Efficient Approach to Encoding Context for Spoken Language Understanding
In task-oriented dialogue systems, spoken language understanding, or SLU,
refers to the task of parsing natural language user utterances into semantic
frames. Making use of context from prior dialogue history holds the key to more
effective SLU. State-of-the-art approaches to SLU use memory networks to encode
context by processing multiple utterances from the dialogue at each turn,
resulting in significant trade-offs between accuracy and computational
efficiency. On the other hand, downstream components like the dialogue state
tracker (DST) already keep track of the dialogue state, which can serve as a
summary of the dialogue history. In this work, we propose an efficient approach
to encoding context from prior utterances for SLU. More specifically, our
architecture includes a separate recurrent neural network (RNN) based encoding
module that accumulates dialogue context to guide the frame parsing sub-tasks
and can be shared between SLU and DST. In our experiments, we demonstrate the
effectiveness of our approach on dialogues from two domains.
Comment: Submitted to INTERSPEECH 201
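The efficiency argument is that a recurrent context state is updated once per turn instead of re-encoding the whole history; a minimal sketch, with a plain tanh RNN standing in for the paper's encoding module:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 6
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def context_step(h, x):
    """Fold one utterance encoding into the running dialogue context."""
    return np.tanh(h @ W_h + x @ W_x)

# Each past turn contributes a single utterance encoding; the context
# state is updated incrementally rather than re-reading the full
# history at every turn.
h = np.zeros(d)
for utterance_encoding in rng.normal(size=(3, d)):   # three prior turns
    h = context_step(h, utterance_encoding)

# h now summarizes the dialogue and could condition both the SLU frame
# parser and the dialogue state tracker, since the module is shared.
```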
A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding
Intelligent personal digital assistants (IPDAs), a popular real-life
application with spoken language understanding capabilities, can cover
potentially thousands of overlapping domains for natural language
understanding, and the task of finding the best domain to handle an utterance
becomes a challenging problem on a large scale. In this paper, we propose a set
of efficient and scalable neural shortlisting-reranking models for large-scale
domain classification in IPDAs. The shortlisting stage focuses on efficiently
trimming all domains down to a list of k-best candidate domains, and the
reranking stage performs a list-wise reranking of the initial k-best domains
with additional contextual information. We show the effectiveness of our
approach with extensive experiments on 1,500 IPDA domains.
Comment: Accepted to NAACL 201
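The two-stage pipeline can be sketched as a cheap top-k cut followed by a list-wise rescoring; the per-domain contextual bonus below is a hypothetical stand-in for the paper's reranking features:

```python
import numpy as np

rng = np.random.default_rng(0)

def shortlist(scores, k):
    """Stage 1: cheaply trim all domains to the k best candidates."""
    return list(np.argsort(scores)[-k:][::-1])

def rerank(candidates, contextual_bonus):
    """Stage 2: list-wise rescoring of the shortlist with additional
    contextual information (here a simple per-domain bonus)."""
    return sorted(candidates,
                  key=lambda c: contextual_bonus.get(c, 0.0),
                  reverse=True)

scores = rng.normal(size=1500)              # one score per IPDA domain
top5 = shortlist(scores, k=5)
final = rerank(top5, contextual_bonus={top5[2]: 1.0})
```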
Speaker Role Contextual Modeling for Language Understanding and Dialogue Policy Learning
Language understanding (LU) and dialogue policy learning are two essential
components in conversational systems. Human-human dialogues are not
well-controlled, and are often random and unpredictable because each speaker
has their own goals and speaking habits. This paper proposes a role-based
contextual model to consider
different speaker roles independently based on the various speaking patterns in
the multi-turn dialogues. The experiments on the benchmark dataset show that
the proposed role-based model successfully learns role-specific behavioral
patterns for contextual encoding and then significantly improves language
understanding and dialogue policy learning tasks.
Comment: Accepted by IJCNLP 2017, The 8th International Joint Conference on
Natural Language Processing (IJCNLP 2017)
Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques
Intent classification has been widely researched on English data with deep
learning approaches that are based on neural networks and word embeddings. The
challenge for Chinese intent classification stems from the fact that, unlike
English where most words are made up of 26 phonologic alphabet letters, Chinese
is logographic, where a Chinese character is a more basic semantic unit that
can be informative and its meaning does not vary too much in contexts. Chinese
word embeddings alone can be inadequate for representing words, and pre-trained
embeddings can suffer from not aligning well with the task at hand. To account
for the inadequacy and leverage Chinese character information, we propose a
low-effort and generic way to dynamically integrate character embedding based
feature maps with word embedding based inputs, whose resulting word-character
embeddings are stacked with a contextual information extraction module to
further incorporate context information for predictions. On top of the proposed
model, we employ an ensemble method to combine single models and obtain the
final result. The approach is data-independent without relying on external
sources like pre-trained word embeddings. The proposed model outperforms
baseline models and existing methods.
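The integration step can be sketched as character-level convolution feature maps concatenated with the word embedding; the filter counts and dimensions below are illustrative, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def char_feature(char_embs, filt):
    """One 1-D convolution filter over the character sequence,
    max-pooled over positions."""
    n, w = char_embs.shape[0], filt.shape[0]
    return max(np.sum(char_embs[i:i + w] * filt)
               for i in range(n - w + 1))

# Toy word of 4 characters with 3-dim character embeddings, and 8
# convolution filters of width 2 (all sizes are illustrative).
char_embs = rng.normal(size=(4, 3))
filters = rng.normal(size=(8, 2, 3))

char_features = np.array([char_feature(char_embs, f) for f in filters])
word_emb = rng.normal(size=8)

# The word-character embedding concatenates the word vector with the
# character feature maps before the contextual extraction module.
word_char_emb = np.concatenate([word_emb, char_features])
```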
Towards Open Intent Discovery for Conversational Text
Detecting and identifying user intent from text, both written and spoken,
plays an important role in modelling and understanding dialogs. Existing
research on intent discovery models it as a classification task with a
predefined set of known categories. To generalize beyond these preexisting
classes, we define a new task of open intent discovery. We investigate how
intent can be generalized to those not seen during training. To this end, we
propose a two-stage approach to this task: predicting whether an utterance
contains an
intent, and then tagging the intent in the input utterance. Our model consists
of a bidirectional LSTM with a CRF on top to capture contextual semantics,
subject to some constraints. Self-attention is used to learn long-distance
dependencies. Further, we adapt an adversarial training approach to improve
robustness and performance across domains. We also present a dataset of 25k
real-life utterances that have been labelled via crowd sourcing. Our
experiments across different domains and real-world datasets show the
effectiveness of our approach, with less than 100 annotated examples needed per
unique domain to recognize diverse intents. The approach outperforms
state-of-the-art baselines by 5-15 F1 score points.
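The two stages can be illustrated with a toy rule-based stand-in; in the paper both stages are learned (a BiLSTM-CRF with self-attention), and the action lexicon and example below are hypothetical:

```python
# Hypothetical action lexicon standing in for a learned classifier.
ACTION_WORDS = {"book", "cancel", "find", "play"}

def contains_intent(tokens):
    """Stage 1: does the utterance express an actionable intent at all?"""
    return any(t in ACTION_WORDS for t in tokens)

def tag_intent(tokens):
    """Stage 2: BIO-tag the intent span in the input utterance."""
    tags = ["O"] * len(tokens)
    for i, t in enumerate(tokens):
        if t in ACTION_WORDS:
            tags[i] = "B-INTENT"
            if i + 1 < len(tokens):
                tags[i + 1] = "I-INTENT"
    return tags

utt = "please book a flight to boston".split()
if contains_intent(utt):
    print(tag_intent(utt))
# -> ['O', 'B-INTENT', 'I-INTENT', 'O', 'O', 'O']
```

Because stage 2 tags a span rather than picking from a fixed label set, the pipeline can surface intents never seen during training.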
Speaker-Sensitive Dual Memory Networks for Multi-Turn Slot Tagging
In multi-turn dialogs, natural language understanding models can introduce
obvious errors by being blind to contextual information. To incorporate dialog
history, we present a neural architecture with Speaker-Sensitive Dual Memory
Networks which encode utterances differently depending on the speaker. This
addresses the different extents of information available to the system - the
system knows only the surface form of user utterances while it has the exact
semantics of system output. We performed experiments on real user data from
Microsoft Cortana, a commercial personal assistant. The result showed a
significant performance improvement over the state-of-the-art slot tagging
models using contextual information.
Comment: 5 pages conference paper accepted to IEEE ASRU 2017. Will be
published in December 201
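The core idea of speaker-sensitive encoding can be sketched as separate parameters and separate memories per speaker role; this is a minimal stand-in, not the paper's memory-network architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 6
W_user, W_system = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def encode(utterance_vec, speaker):
    """Encode an utterance with speaker-specific parameters: the system
    sees only the surface form of user turns but knows the exact
    semantics of its own output, so the two are encoded differently."""
    W = W_user if speaker == "user" else W_system
    return np.tanh(utterance_vec @ W)

history = [(rng.normal(size=d), "user"),
           (rng.normal(size=d), "system"),
           (rng.normal(size=d), "user")]

# Two separate memories, one per speaker role.
user_memory = [encode(v, s) for v, s in history if s == "user"]
system_memory = [encode(v, s) for v, s in history if s == "system"]
```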
Efficient Large-Scale Domain Classification with Personalized Attention
In this paper, we explore the task of mapping spoken language utterances to
one of thousands of natural language understanding domains in intelligent
personal digital assistants (IPDAs). This scenario is observed for many
mainstream IPDAs in industry that allow third parties to develop thousands of
new domains to augment built-in ones to rapidly increase domain coverage and
overall IPDA capabilities. We propose a scalable neural model architecture with
a shared encoder, a novel attention mechanism that incorporates personalization
information, and domain-specific classifiers, which together solve the problem
efficiently. Our architecture is designed to efficiently accommodate new
domains that appear in-between full model retraining cycles with a rapid
bootstrapping mechanism two orders of magnitude faster than retraining. We
account for practical constraints in real-time production systems, and design
to minimize memory footprint and runtime latency. We demonstrate that
incorporating personalization results in significantly more accurate domain
classification in the setting with thousands of overlapping domains.
Comment: Accepted to ACL 201
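One plausible reading of the personalized attention mechanism is attention over the embeddings of the domains a user has enabled, producing a personalization vector for the classifiers; the sketch below makes that assumption, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Shared-encoder output for the utterance, plus embeddings of the four
# domains this (hypothetical) user has enabled.
utterance = rng.normal(size=8)
enabled_domains = rng.normal(size=(4, 8))

# Attention over the user's enabled domains yields a personalized
# summary vector, concatenated with the utterance encoding before the
# domain-specific classifiers.
attn = softmax(enabled_domains @ utterance)
personalized = attn @ enabled_domains
features = np.concatenate([utterance, personalized])
```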