Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability
Generative encoder-decoder models offer great promise in developing
domain-general dialog systems. However, they have mainly been applied to
open-domain conversations. This paper presents a practical and novel framework
for building task-oriented dialog systems based on encoder-decoder models. This
framework enables encoder-decoder models to accomplish slot-value independent
decision-making and interact with external databases. Moreover, this paper
shows the flexibility of the proposed method by interleaving chatting
capability with a slot-filling system for better out-of-domain recovery. The
models were trained on both real-user data from a bus information system and
human-human chat data. Results show that the proposed framework achieves good
performance in both offline evaluation metrics and in task success rate with
human users.
Comment: Accepted as a long paper in SIGDIAL 201
The Rapidly Changing Landscape of Conversational Agents
Conversational agents have become ubiquitous, ranging from goal-oriented
systems for helping with reservations to chit-chat models found in modern
virtual assistants. In this survey paper, we explore this fascinating field. We
look at some of the pioneering work that defined the field and gradually move
to the current state-of-the-art models. We look at statistical, neural,
generative adversarial network based and reinforcement learning based
approaches and how they evolved. Along the way we discuss various challenges
that the field faces: lack of context in utterances, the lack of a good
quantitative metric to compare models, and lack of trust in agents because they
do not have a consistent persona. We structure this paper in a way that
answers these pertinent questions and discusses competing approaches to solve
them.
Comment: 14 pages, 7 figures. arXiv admin note: text overlap with
arXiv:1704.07130, arXiv:1507.04808, arXiv:1603.06155, arXiv:1611.06997,
arXiv:1704.08966 by other author
A Survey on Dialogue Systems: Recent Advances and New Frontiers
Dialogue systems have attracted more and more attention. Recent advances on
dialogue systems are overwhelmingly contributed by deep learning techniques,
which have been employed to enhance a wide range of big data applications such
as computer vision, natural language processing, and recommender systems. For
dialogue systems, deep learning can leverage a massive amount of data to learn
meaningful feature representations and response generation strategies, while
requiring a minimum amount of hand-crafting. In this article, we give an
overview of these recent advances in dialogue systems from various perspectives
and discuss some possible research directions. In particular, we generally
divide existing dialogue systems into task-oriented and non-task-oriented
models, then detail how deep learning techniques help them with representative
algorithms and finally discuss some appealing research directions that can
bring the dialogue system research into a new frontier.
Comment: 13 pages. arXiv admin note: text overlap with arXiv:1703.01008 by
other author
Learning Robust Dialog Policies in Noisy Environments
Modern virtual personal assistants provide a convenient interface for
completing daily tasks via voice commands. An important consideration for these
assistants is the ability to recover from automatic speech recognition (ASR)
and natural language understanding (NLU) errors. In this paper, we focus on
learning robust dialog policies to recover from these errors. To this end, we
develop a user simulator which interacts with the assistant through voice
commands in realistic scenarios with noisy audio, and use it to learn dialog
policies through deep reinforcement learning. We show that dialogs generated by
our simulator are indistinguishable from human generated dialogs, as determined
by human evaluators. Furthermore, preliminary experimental results show that
the learned policies in noisy environments achieve the same execution success
rate with fewer dialog turns compared to fixed rule-based policies.
Comment: 1st Workshop on Conversational AI at NIPS 201
Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning
This paper presents an end-to-end framework for task-oriented dialog systems
using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to
interface with a relational database and jointly learn policies for both
language understanding and dialog strategy. Moreover, we propose a hybrid
algorithm that combines the strength of reinforcement learning and supervised
learning to achieve faster learning speed. We evaluated the proposed model on a
20 Question Game conversational game simulator. Results show that the proposed
method outperforms the modular-based baseline and learns a distributed
representation of the latent dialog state.
Comment: In proceeding of SIGDIAL 2016. Added changes based-on peer review,
including: 1. Added references, 2. fixed typos in text and figures, 3. added
minor change to introductio
A Simple Baseline for Audio-Visual Scene-Aware Dialog
The recently proposed audio-visual scene-aware dialog task paves the way to a
more data-driven way of learning virtual assistants, smart speakers and car
navigation systems. However, very little is known to date about how to
effectively extract meaningful information from a plethora of sensors that
pound the computational engine of those devices. Therefore, in this paper, we
provide and carefully analyze a simple baseline for audio-visual scene-aware
dialog which is trained end-to-end. Our method differentiates in a data-driven
manner useful signals from distracting ones using an attention mechanism. We
evaluate the proposed approach on the recently introduced and challenging
audio-visual scene-aware dataset, and demonstrate the key features that allow it
to outperform the current state of the art by more than 20% on CIDEr.
Comment: Accepted to CVPR 201
Quantized-Dialog Language Model for Goal-Oriented Conversational Systems
We propose a novel methodology to address dialog learning in the context of
goal-oriented conversational systems. The key idea is to quantize the dialog
space into clusters and create a language model across the clusters, thus
allowing for an accurate choice of the next utterance in the conversation. The
language model relies on n-grams associated with clusters of utterances. This
quantized-dialog language model methodology has been applied to the end-to-end
goal-oriented track of the latest Dialog System Technology Challenges (DSTC6).
The objective is to find the correct system utterance from a pool of candidates
in order to complete a dialog between a user and an automated
restaurant-reservation system. Our results show that the technique proposed in
this paper achieves high accuracy regarding selection of the correct candidate
utterance, and outperforms other state-of-the-art approaches based on neural
networks.
A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management
Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid
the significant effort needed to hand-craft the required dialogue flow, the
Dialogue Management (DM) module can be cast as a continuous Markov Decision
Process (MDP) and trained through Reinforcement Learning (RL). Several RL
models have been investigated over recent years. However, the lack of a common
benchmarking framework makes it difficult to perform a fair comparison between
different models and their capability to generalise to different environments.
Therefore, this paper proposes a set of challenging simulated environments for
dialogue model development and evaluation. To provide some baselines, we
investigate a number of representative parametric algorithms, namely deep
reinforcement learning algorithms - DQN, A2C and Natural Actor-Critic and
compare them to a non-parametric model, GP-SARSA. Both the environments and
policy models are implemented using the publicly available PyDial toolkit and
released on-line, in order to establish a testbed framework for further
experiments and to facilitate experimental reproducibility.
Comment: Accepted at the Deep Reinforcement Learning Symposium, 31st
Conference on Neural Information Processing Systems (NIPS 2017) Paper updated
with minor change
Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models
In this paper, we present a deep reinforcement learning (RL) framework for
iterative dialog policy optimization in end-to-end task-oriented dialog
systems. Popular approaches in learning dialog policy with RL include letting a
dialog agent learn against a user simulator. Building a reliable user
simulator, however, is not trivial, often as difficult as building a good
dialog agent. We address this challenge by jointly optimizing the dialog agent
and the user simulator with deep RL by simulating dialogs between the two
agents. We first bootstrap a basic dialog agent and a basic user simulator by
learning directly from dialog corpora with supervised training. We then improve
them further by letting the two agents conduct task-oriented dialogs and
iteratively optimizing their policies with deep RL. Both the dialog agent and
the user simulator are designed with neural network models that can be trained
end-to-end. Our experiment results show that the proposed method leads to
promising improvements in task success rate and total task reward compared to
supervised training and single-agent RL training baseline models.
Comment: Accepted at ASRU 201
An Incremental Turn-Taking Model For Task-Oriented Dialog Systems
In a human-machine dialog scenario, deciding the appropriate time for the
machine to take the turn is an open research problem. In contrast, humans
engaged in conversations are able to timely decide when to interrupt the
speaker for competitive or non-competitive reasons. In state-of-the-art
turn-by-turn dialog systems the decision on the next dialog action is taken at
the end of the utterance. In this paper, we propose a token-by-token prediction
of the dialog state from incremental transcriptions of the user utterance. To
identify the point of maximal understanding in an ongoing utterance, we a)
implement an incremental Dialog State Tracker which is updated on a token basis
(iDST) b) re-label the Dialog State Tracking Challenge 2 (DSTC2) dataset and c)
adapt it to the incremental turn-taking experimental scenario. The re-labeling
consists of assigning a binary value to each token in the user utterance that
allows identifying the appropriate point for taking the turn. Finally, we
implement an incremental Turn Taking Decider (iTTD) that is trained on these
new labels for the turn-taking decision. We show that the proposed model can
achieve a better performance compared to a deterministic handcrafted
turn-taking algorithm.
Comment: Accepted to INTERSPEECH 201