Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability

Author: Eskenazi Maxine
Lee Kyusong
Lu Allen
Zhao Tiancheng
Publication date: 26/06/2017

Generative encoder-decoder models offer great promise in developing domain-general dialog systems. However, they have mainly been applied to open-domain conversations. This paper presents a practical and novel framework for building task-oriented dialog systems based on encoder-decoder models. This framework enables encoder-decoder models to accomplish slot-value independent decision-making and interact with external databases. Moreover, this paper shows the flexibility of the proposed method by interleaving chatting capability with a slot-filling system for better out-of-domain recovery. The models were trained on both real-user data from a bus information system and human-human chat data. Results show that the proposed framework achieves good performance in both offline evaluation metrics and in task success rate with human users.

Comment: Accepted as a long paper in SIGDIAL 201

arXiv.org e-Print Archive

The Rapidly Changing Landscape of Conversational Agents

Author: Mathur Vinayak
Singh Arpit
Publication date: 24/03/2018

Conversational agents have become ubiquitous, ranging from goal-oriented systems for helping with reservations to chit-chat models found in modern virtual assistants. In this survey paper, we explore this fascinating field. We look at some of the pioneering work that defined the field and gradually move to the current state-of-the-art models. We look at statistical, neural, generative adversarial network based and reinforcement learning based approaches and how they evolved. Along the way, we discuss various challenges that the field faces: lack of context in utterances, the absence of a good quantitative metric to compare models, and lack of trust in agents because they do not have a consistent persona. We structure this paper in a way that answers these pertinent questions and discusses competing approaches to solve them.

Comment: 14 pages, 7 figures. arXiv admin note: text overlap with arXiv:1704.07130, arXiv:1507.04808, arXiv:1603.06155, arXiv:1611.06997, arXiv:1704.08966 by other author

arXiv.org e-Print Archive

A Survey on Dialogue Systems: Recent Advances and New Frontiers

Author: Chen Hongshen
Liu Xiaorui
Tang Jiliang
Yin Dawei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 11/01/2018

Dialogue systems have attracted more and more attention. Recent advances in dialogue systems are overwhelmingly driven by deep learning techniques, which have been employed to enhance a wide range of big data applications such as computer vision, natural language processing, and recommender systems. For dialogue systems, deep learning can leverage a massive amount of data to learn meaningful feature representations and response generation strategies, while requiring a minimum amount of hand-crafting. In this article, we give an overview of these recent advances in dialogue systems from various perspectives and discuss some possible research directions. In particular, we broadly divide existing dialogue systems into task-oriented and non-task-oriented models, then detail how deep learning techniques help them with representative algorithms, and finally discuss some appealing research directions that can bring dialogue system research to a new frontier.

Comment: 13 pages. arXiv admin note: text overlap with arXiv:1703.01008 by other author

arXiv.org e-Print Archive

Learning Robust Dialog Policies in Noisy Environments

Author: Cao Jin
Casale Jared
Fazel-Zarandi Maryam
Geramifard Alborz
Henderson Peter
Li Shang-Wen
Whitney David
Publication date: 11/12/2017

Modern virtual personal assistants provide a convenient interface for completing daily tasks via voice commands. An important consideration for these assistants is the ability to recover from automatic speech recognition (ASR) and natural language understanding (NLU) errors. In this paper, we focus on learning robust dialog policies to recover from these errors. To this end, we develop a user simulator which interacts with the assistant through voice commands in realistic scenarios with noisy audio, and use it to learn dialog policies through deep reinforcement learning. We show that dialogs generated by our simulator are indistinguishable from human generated dialogs, as determined by human evaluators. Furthermore, preliminary experimental results show that the learned policies in noisy environments achieve the same execution success rate with fewer dialog turns compared to fixed rule-based policies.

Comment: 1st Workshop on Conversational AI at NIPS 201

arXiv.org e-Print Archive

Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

Author: Eskenazi Maxine
Zhao Tiancheng
Publication date: 15/09/2016

This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy. Moreover, we propose a hybrid algorithm that combines the strengths of reinforcement learning and supervised learning to achieve faster learning speed. We evaluated the proposed model on a 20 Question Game conversational game simulator. Results show that the proposed method outperforms the modular-based baseline and learns a distributed representation of the latent dialog state.

Comment: In proceedings of SIGDIAL 2016. Added changes based on peer review, including: 1. Added references, 2. fixed typos in text and figures, 3. added minor change to introductio

arXiv.org e-Print Archive

A Simple Baseline for Audio-Visual Scene-Aware Dialog

Author: Hazan Tamir
Schwartz Idan
Schwing Alexander
Publication date: 11/04/2019

The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems. However, very little is known to date about how to effectively extract meaningful information from a plethora of sensors that pound the computational engine of those devices. Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. Our method differentiates, in a data-driven manner, useful signals from distracting ones using an attention mechanism. We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that allow it to outperform the current state-of-the-art by more than 20% on CIDEr.

Comment: Accepted to CVPR 201

arXiv.org e-Print Archive

Quantized-Dialog Language Model for Goal-Oriented Conversational Systems

Author: Fadnis Kshitij P.
Ganhotra Jatin
Gunasekara R. Chulaka
Nahamoo David
Polymenakos Lazaros C.
Publication date: 26/12/2018

We propose a novel methodology to address dialog learning in the context of goal-oriented conversational systems. The key idea is to quantize the dialog space into clusters and create a language model across the clusters, thus allowing for an accurate choice of the next utterance in the conversation. The language model relies on n-grams associated with clusters of utterances. This quantized-dialog language model methodology has been applied to the end-to-end goal-oriented track of the latest Dialog System Technology Challenges (DSTC6). The objective is to find the correct system utterance from a pool of candidates in order to complete a dialog between a user and an automated restaurant-reservation system. Our results show that the technique proposed in this paper achieves high accuracy in selecting the correct candidate utterance, and outperforms other state-of-the-art approaches based on neural networks.

arXiv.org e-Print Archive
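
The idea summarized in the abstract above (map utterances to clusters, build an n-gram model over cluster sequences, then rank candidate utterances) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes utterances have already been assigned integer cluster ids, uses a bigram model with add-alpha smoothing, and all function names and example data are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(dialogs):
    """Count cluster-to-cluster transitions; each dialog is a list of cluster ids."""
    counts = defaultdict(Counter)
    for dialog in dialogs:
        for prev, nxt in zip(dialog, dialog[1:]):
            counts[prev][nxt] += 1
    return counts

def score(counts, prev, cand, num_clusters, alpha=1.0):
    """Add-alpha smoothed estimate of P(cand cluster | prev cluster)."""
    total = sum(counts[prev].values())
    return (counts[prev][cand] + alpha) / (total + alpha * num_clusters)

def pick_candidate(counts, history, candidates, num_clusters):
    """Return the candidate text whose cluster is most probable after the last cluster in history.

    candidates is a list of (utterance_text, cluster_id) pairs.
    """
    prev = history[-1]
    best = max(candidates, key=lambda c: score(counts, prev, c[1], num_clusters))
    return best[0]

# Hypothetical training data: cluster ids 0..3 (assumed meanings: greeting,
# request, ask-for-detail, closing). Real clusters would come from a
# clustering step that is not shown here.
training_dialogs = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 2, 3]]
model = train_bigram(training_dialogs)

candidates = [("Which cuisine would you like?", 2), ("Goodbye.", 3)]
best = pick_candidate(model, history=[0, 1], candidates=candidates, num_clusters=4)
print(best)
```

A bigram is the smallest n-gram that conditions on context; a longer history would follow the same pattern with tuples of cluster ids as keys.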

A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management

Author: Budzianowski Paweł
Casanueva Iñigo
Gašić Milica
Mrkšić Nikola
Rojas-Barahona Lina
Su Pei-Hao
Ultes Stefan
Wen Tsung-Hsien
Young Steve
Publication date: 06/04/2018

Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid the significant effort needed to hand-craft the required dialogue flow, the Dialogue Management (DM) module can be cast as a continuous Markov Decision Process (MDP) and trained through Reinforcement Learning (RL). Several RL models have been investigated over recent years. However, the lack of a common benchmarking framework makes it difficult to perform a fair comparison between different models and their capability to generalise to different environments. Therefore, this paper proposes a set of challenging simulated environments for dialogue model development and evaluation. To provide some baselines, we investigate a number of representative parametric algorithms, namely the deep reinforcement learning algorithms DQN, A2C and Natural Actor-Critic, and compare them to a non-parametric model, GP-SARSA. Both the environments and policy models are implemented using the publicly available PyDial toolkit and released online, in order to establish a testbed framework for further experiments and to facilitate experimental reproducibility.

Comment: Accepted at the Deep Reinforcement Learning Symposium, 31st Conference on Neural Information Processing Systems (NIPS 2017). Paper updated with minor change

arXiv.org e-Print Archive

Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models

Author: Lane Ian
Liu Bing
Publication date: 18/09/2017

In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. Popular approaches to learning dialog policy with RL include letting a dialog agent learn against a user simulator. Building a reliable user simulator, however, is not trivial, and is often as difficult as building a good dialog agent. We address this challenge by jointly optimizing the dialog agent and the user simulator with deep RL by simulating dialogs between the two agents. We first bootstrap a basic dialog agent and a basic user simulator by learning directly from dialog corpora with supervised training. We then improve them further by letting the two agents conduct task-oriented dialogs and iteratively optimizing their policies with deep RL. Both the dialog agent and the user simulator are designed with neural network models that can be trained end-to-end. Our experimental results show that the proposed method leads to promising improvements in task success rate and total task reward compared to supervised training and single-agent RL training baseline models.

Comment: Accepted at ASRU 201

arXiv.org e-Print Archive

An Incremental Turn-Taking Model For Task-Oriented Dialog Systems

Author: Coman Andrei C.
Murase Yukitoshi
Nakamura Satoshi
Riccardi Giuseppe
Yoshino Koichiro
Publication date: 11/07/2019

In a human-machine dialog scenario, deciding the appropriate time for the machine to take the turn is an open research problem. In contrast, humans engaged in conversations are able to decide in a timely manner when to interrupt the speaker for competitive or non-competitive reasons. In state-of-the-art turn-by-turn dialog systems, the decision on the next dialog action is taken at the end of the utterance. In this paper, we propose a token-by-token prediction of the dialog state from incremental transcriptions of the user utterance. To identify the point of maximal understanding in an ongoing utterance, we a) implement an incremental Dialog State Tracker which is updated on a token basis (iDST), b) re-label the Dialog State Tracking Challenge 2 (DSTC2) dataset, and c) adapt it to the incremental turn-taking experimental scenario. The re-labeling consists of assigning a binary value to each token in the user utterance, which makes it possible to identify the appropriate point for taking the turn. Finally, we implement an incremental Turn Taking Decider (iTTD) that is trained on these new labels for the turn-taking decision. We show that the proposed model can achieve better performance than a deterministic handcrafted turn-taking algorithm.

Comment: Accepted to INTERSPEECH 201

arXiv.org e-Print Archive