An Efficient Approach to Encoding Context for Spoken Language Understanding
In task-oriented dialogue systems, spoken language understanding, or SLU,
refers to the task of parsing natural language user utterances into semantic
frames. Making use of context from prior dialogue history holds the key to more
effective SLU. State-of-the-art approaches to SLU use memory networks to encode
context by processing multiple utterances from the dialogue at each turn,
resulting in significant trade-offs between accuracy and computational
efficiency. On the other hand, downstream components like the dialogue state
tracker (DST) already keep track of the dialogue state, which can serve as a
summary of the dialogue history. In this work, we propose an efficient approach
to encoding context from prior utterances for SLU. More specifically, our
architecture includes a separate recurrent neural network (RNN) based encoding
module that accumulates dialogue context to guide the frame parsing sub-tasks
and can be shared between SLU and DST. In our experiments, we demonstrate the
effectiveness of our approach on dialogues from two domains.
Comment: Submitted to INTERSPEECH 201
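The abstract does not spell out the encoder's internals; as a loose sketch (the weights, dimensions, and tanh mixing rule below are invented for illustration), a fixed-size recurrent state that accumulates utterance vectors, and could be shared between SLU and DST, might look like:

```python
import math

def recurrent_step(h, x, w=0.5, u=0.5):
    # Toy recurrent update: mix the incoming utterance vector x into
    # the running context state h (elementwise, squashed by tanh).
    return [math.tanh(w * xi + u * hi) for xi, hi in zip(x, h)]

def encode_dialogue(utterance_vectors, dim=4):
    # Fold the whole dialogue history into one fixed-size state that
    # can condition the frame-parsing sub-tasks (and a DST) cheaply,
    # instead of re-reading every past utterance at each turn.
    h = [0.0] * dim
    for x in utterance_vectors:
        h = recurrent_step(h, x)
    return h

context = encode_dialogue([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
```

Because the state is updated once per new utterance, the per-turn cost stays constant regardless of dialogue length, which is the efficiency argument the abstract makes.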
Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding
Spoken Language Understanding (SLU) converts hypotheses from automatic speech
recognizer (ASR) into structured semantic representations. ASR recognition
errors can severely degrade the performance of the subsequent SLU module. To
address this issue, word confusion networks (WCNs) have been used to encode the
input for SLU, as they contain richer information than 1-best or n-best
hypothesis lists. To further reduce ambiguity, the last system act of the
dialogue context is also utilized as additional input. In this paper, a novel
BERT based SLU model (WCN-BERT SLU) is proposed to encode WCNs and the dialogue
context jointly. It can integrate both structural information and ASR posterior
probabilities of WCNs in the BERT architecture. Experiments on DSTC2, a
benchmark of SLU, show that the proposed method is effective and can outperform
previous state-of-the-art models significantly.
Comment: Accepted to INTERSPEECH 202
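The BERT integration itself is not reproduced here; the one ingredient that is easy to isolate, folding ASR posterior probabilities into the input representation, can be sketched as a posterior-weighted sum over the candidate words of one confusion-network bin (the embeddings and words below are made up):

```python
def wcn_bin_embedding(bin_arcs, embeddings):
    # Collapse one confusion-network bin into a single vector by
    # weighting each candidate word's embedding with its ASR posterior.
    dim = len(next(iter(embeddings.values())))
    out = [0.0] * dim
    for word, prob in bin_arcs:
        out = [o + prob * v for o, v in zip(out, embeddings[word])]
    return out

# Two acoustically confusable candidates in the same bin:
emb = {"flight": [1.0, 0.0], "light": [0.0, 1.0]}
bin_vec = wcn_bin_embedding([("flight", 0.7), ("light", 0.3)], emb)
# bin_vec == [0.7, 0.3]
```

The resulting vector keeps evidence for both hypotheses, letting later layers (BERT, in the paper) recover from a wrong 1-best choice.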
Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models
In this paper, we present a deep reinforcement learning (RL) framework for
iterative dialog policy optimization in end-to-end task-oriented dialog
systems. Popular approaches in learning dialog policy with RL include letting a
dialog agent learn against a user simulator. Building a reliable user
simulator, however, is not trivial, often as difficult as building a good
dialog agent. We address this challenge by jointly optimizing the dialog agent
and the user simulator with deep RL by simulating dialogs between the two
agents. We first bootstrap a basic dialog agent and a basic user simulator by
learning directly from dialog corpora with supervised training. We then improve
them further by letting the two agents conduct task-oriented dialogs and
iteratively optimizing their policies with deep RL. Both the dialog agent and
the user simulator are designed with neural network models that can be trained
end-to-end. Our experiment results show that the proposed method leads to
promising improvements on task success rate and total task reward compared to
supervised training and single-agent RL training baseline models.
Comment: Accepted at ASRU 201
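The paper trains two neural policies with deep RL; stripped down to scalar "policies" and an invented differentiable stand-in for task success, the alternating structure of the joint optimization can be caricatured as:

```python
def success(agent, user):
    # Toy stand-in for the expected task success of a simulated dialog:
    # higher when both the agent and the user simulator are stronger.
    return agent * user

def ascent_step(p, grad, lr=0.1):
    # One clipped gradient-ascent step on a scalar "policy".
    return min(1.0, p + lr * grad)

# Bootstrap both sides (stand-in for supervised pre-training), then
# iterate: each side improves while the other is held fixed.
agent, user = 0.5, 0.5
for _ in range(50):
    agent = ascent_step(agent, user)  # d success / d agent = user
    user = ascent_step(user, agent)   # d success / d user = agent
```

Each side ascends the shared objective against the other's current policy, mirroring the iterative agent/simulator updates the abstract describes.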
OneNet: Joint Domain, Intent, Slot Prediction for Spoken Language Understanding
In practice, most spoken language understanding systems process user input in
a pipelined manner: first the domain is predicted, then the intent and semantic
slots are inferred according to the semantic frames of the predicted domain. The
pipeline approach, however, has some disadvantages: error propagation and lack
of information sharing. To address these issues, we present a unified neural
network that jointly performs domain, intent, and slot predictions. Our
approach adopts a principled architecture for multitask learning to fold in the
state-of-the-art models for each task. With a few more ingredients, e.g.
orthography-sensitive input encoding and curriculum training, our model
delivered significant improvements in all three tasks across all domains over
strong baselines, including one using oracle prediction for domain detection,
on real user data of a commercial personal assistant.
Comment: 5-page conference paper accepted to IEEE ASRU 2017. Will be
published in December 201
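OneNet's multitask objective is, at heart, a sum of per-task losses computed on top of a shared encoder, so every task contributes gradient signal to the shared representation. A minimal sketch of such a joint loss (the probability vectors and gold label ids below are hypothetical):

```python
import math

def cross_entropy(probs, gold):
    # Negative log-likelihood of the gold label under the model.
    return -math.log(probs[gold])

def joint_loss(domain_probs, intent_probs, slot_probs_seq, gold):
    # Sum the domain, intent, and per-token slot losses so one
    # network is optimized for all three predictions jointly.
    loss = cross_entropy(domain_probs, gold["domain"])
    loss += cross_entropy(intent_probs, gold["intent"])
    loss += sum(cross_entropy(p, g)
                for p, g in zip(slot_probs_seq, gold["slots"]))
    return loss

gold = {"domain": 0, "intent": 1, "slots": [0, 2]}
loss = joint_loss([0.9, 0.1], [0.2, 0.8],
                  [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], gold)
```

Training on the summed loss is what removes the pipeline's error propagation: a slot-level mistake can push the domain prediction, and vice versa.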
Noise-robust Named Entity Understanding for Virtual Assistants
Named Entity Understanding (NEU) plays an essential role in interactions
between users and voice assistants, since successfully identifying entities and
correctly linking them to their standard forms is crucial to understanding the
user's intent. NEU is a challenging task in voice assistants due to the
ambiguous nature of natural language and because noise introduced by speech
transcription and user errors occurs frequently in spoken natural language
queries. In this paper, we propose an architecture with novel features that
jointly solves the recognition of named entities (a.k.a. Named Entity
Recognition, or NER) and the resolution to their canonical forms (a.k.a. Entity
Linking, or EL). We show that by combining NER and EL information in a joint
reranking module, our proposed framework improves accuracy in both tasks. This
improved performance and the features that enable it, also lead to better
accuracy in downstream tasks, such as domain classification and semantic
parsing.
Comment: 9 pages
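The joint reranking idea, letting EL evidence overturn NER's top hypothesis and vice versa, can be sketched with a simple weighted linear combination of the two scores (the weights, spans, and candidates below are invented):

```python
def rerank(candidates, w_ner=0.5, w_el=0.5):
    # Each candidate interpretation carries a NER score and an EL
    # score; the joint reranker orders candidates by their weighted
    # combination so that strong evidence from either task can
    # overturn the other's best guess.
    return sorted(candidates,
                  key=lambda c: w_ner * c["ner"] + w_el * c["el"],
                  reverse=True)

candidates = [
    {"span": "john", "entity": "John Smith", "ner": 0.9, "el": 0.2},
    {"span": "john", "entity": "John Mayer", "ner": 0.8, "el": 0.7},
]
best = rerank(candidates)[0]
# best["entity"] == "John Mayer"  (joint score 0.75 beats 0.55)
```

Here the linker's confidence flips the decision away from NER's slightly higher-scoring candidate, which is exactly the interaction the joint module exploits.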
Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge
Building systems that can communicate with humans is a core problem in
Artificial Intelligence. This work proposes a novel neural network architecture
for response selection in an end-to-end multi-turn conversational dialogue
setting. The architecture applies context level attention and incorporates
additional external knowledge provided by descriptions of domain-specific
words. It uses a bi-directional Gated Recurrent Unit (GRU) for encoding context
and responses and learns to attend over the context words given the latent
response representation and vice versa. In addition, it incorporates external
domain-specific information using another GRU for encoding the domain keyword
descriptions. This allows better representation of domain-specific keywords in
responses and hence improves the overall performance. Experimental results show
that our model outperforms all other state-of-the-art methods for response
selection in multi-turn conversations.
Comment: Published as a conference paper at CoNLL 201
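The attention step the abstract describes, attending over context words given the latent response representation, reduces, with the GRUs stripped away and toy one-hot word vectors standing in for learned states, to plain dot-product attention:

```python
import math

def attend(context_vecs, query_vec):
    # Score each context word vector against the query (here, the
    # latent response representation), softmax the scores, and return
    # the attention-weighted sum of the context vectors.
    scores = [sum(c * q for c, q in zip(vec, query_vec))
              for vec in context_vecs]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query_vec)
    return [sum(w * vec[d] for w, vec in zip(weights, context_vecs))
            for d in range(dim)]

ctx = [[1.0, 0.0], [0.0, 1.0]]
pooled = attend(ctx, [2.0, 0.0])  # attends mostly to the first word
```

The "vice versa" direction in the abstract is the same operation with the roles swapped: response words scored against a latent context representation.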
Indexing, browsing and searching of digital video
Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver
Toward Mention Detection Robustness with Recurrent Neural Networks
One of the key challenges in natural language processing (NLP) is to yield
good performance across application domains and languages. In this work, we
investigate the robustness of the mention detection systems, one of the
fundamental tasks in information extraction, via recurrent neural networks
(RNNs). The advantage of RNNs over the traditional approaches is their capacity
to capture long ranges of context and implicitly adapt the word embeddings,
trained on a large corpus, into a task-specific word representation, but still
preserve the original semantic generalization to be helpful across domains. Our
systematic evaluation of RNN architectures demonstrates that RNNs not only
outperform the best reported systems (up to 9% relative error reduction) in
the general setting but also achieve the state-of-the-art performance in the
cross-domain setting for English. Regarding other languages, RNNs are
significantly better than the traditional methods on the similar task of named
entity recognition for Dutch (up to 22% relative error reduction).
Comment: 13 pages, 11 tables, 3 figures
Label-Dependencies Aware Recurrent Neural Networks
In the last few years, Recurrent Neural Networks (RNNs) have proved effective
on several NLP tasks. Despite such great success, their ability to model
sequence labeling is still limited. This led research toward solutions
where RNNs are combined with models which already proved effective in this
domain, such as CRFs. In this work we propose a solution far simpler but very
effective: an evolution of the simple Jordan RNN, where labels are re-injected
as input into the network, and converted into embeddings, in the same way as
words. We compare this RNN variant to the other RNN models (Elman and
Jordan RNNs, LSTM and GRU) on two well-known tasks of Spoken Language
Understanding (SLU). Thanks to label embeddings and their combination at the
hidden layer, the proposed variant, which uses more parameters than Elman and
Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than
other RNNs but also outperforms sophisticated CRF models.
Comment: 22 pages, 3 figures. Accepted at CICling 2017 conference. Best
Verifiability, Reproducibility, and Working Description award
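The variant's key move, converting the previously predicted label into an embedding and re-injecting it as input alongside the word, can be caricatured in a few lines (the weights, embeddings, and two-label tag set are invented; label 0 doubles as the start symbol):

```python
import math

def step(x_vec, label_emb, h_prev, w=0.4, u=0.3, v=0.3):
    # Jordan-style update: the hidden state mixes the word vector,
    # the previous hidden state, and the previous label's embedding.
    return [math.tanh(w * x + u * h + v * l)
            for x, h, l in zip(x_vec, h_prev, label_emb)]

def tag(word_vecs, label_embs):
    # label_embs maps a label id to its embedding; the label predicted
    # at position t-1 is embedded and fed back in at position t.
    h = [0.0, 0.0]
    prev_label = 0
    out = []
    for x in word_vecs:
        h = step(x, label_embs[prev_label], h)
        prev_label = max(range(len(h)), key=lambda i: h[i])
        out.append(prev_label)
    return out

labels = tag([[1.0, 0.0], [0.0, 1.0]], {0: [0.1, 0.0], 1: [0.0, 0.1]})
```

Treating labels exactly like words (one embedding table each) is what keeps the parameter count between the plain Elman/Jordan RNNs and the gated LSTM/GRU models.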
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension
We present DREAM, the first dialogue-based multiple-choice reading
comprehension dataset. Collected from English-as-a-foreign-language
examinations designed by human experts to evaluate the comprehension level of
Chinese learners of English, our dataset contains 10,197 multiple-choice
questions for 6,444 dialogues. In contrast to existing reading comprehension
datasets, DREAM is the first to focus on in-depth multi-turn multi-party
dialogue understanding. DREAM is likely to present significant challenges for
existing reading comprehension systems: 84% of answers are non-extractive, 85%
of questions require reasoning beyond a single sentence, and 34% of questions
also involve commonsense knowledge.
We apply several popular neural reading comprehension models that primarily
exploit surface information within the text and find them to, at best, just
barely outperform a rule-based approach. We next investigate the effects of
incorporating dialogue structure and different kinds of general world knowledge
into both rule-based and (neural and non-neural) machine learning-based reading
comprehension models. Experimental results on the DREAM dataset show the
effectiveness of dialogue structure and general world knowledge. DREAM will be
available at https://dataset.org/dream/.
Comment: To appear in TAC