Gaussian processes for POMDP-based dialogue manager optimization

Author: Gašić M
Young S
Publication venue: IEEE Transactions on Audio, Speech and Language Processing
Publication date: 16/09/2013

A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real-world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GP increases the learning rate by an order of magnitude, thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space, thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step forward towards fully automatic dialog policy optimization in real-world systems. This is the accepted manuscript version of an article first published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. The final published version is available online from IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6601004. © 2013 IEEE

Apollo (Cambridge)

Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems

Author: Gasic Milica
Mrksic Nikola
Su Pei-Hao
Vandyke David
Wen Tsung-Hsien
Young Steve
Publication date: 01/01/2015

Statistical spoken dialogue systems have the attractive property of being able to be optimised from data via interactions with real users. However, in the reinforcement learning paradigm the dialogue manager (agent) often requires significant time to explore the state-action space to learn to behave in a desirable manner. This is a critical issue when the system is trained on-line with real users, where learning is expensive. Reward shaping is one promising technique for addressing these concerns. Here we examine three recurrent neural network (RNN) approaches for providing reward shaping information in addition to the primary (task-orientated) environmental feedback. These RNNs are trained on returns from dialogues generated by a simulated user and attempt to diffuse the overall evaluation of the dialogue back down to the turn level to guide the agent towards good behaviour faster. In both simulated and real user scenarios these RNNs are shown to increase policy learning speed. Importantly, they do not require prior knowledge of the user's goal. Comment: Accepted for publication in SigDial 201

arXiv.org e-Print Archive

Crossref

"None of the Above": Measure Uncertainty in Dialog Response Retrieval

Author: Eskenazi Maxine
Feng Yulan
Mehri Shikib
Zhao Tiancheng
Publication date: 01/01/2020

This paper discusses the importance of uncovering uncertainty in end-to-end dialog tasks, and presents our experimental results on uncertainty classification on the Ubuntu Dialog Corpus. We show that, instead of retraining models for this specific purpose, the original retrieval model's underlying confidence concerning the best prediction can be captured with trivial additional computation. Comment: Accepted to ACL 2020 as short pape

arXiv.org e-Print Archive

Crossref

Policy committee for adaptation in multi-domain spoken dialogue systems

Author: Gašić M
Mrkšić N
Su PH
Vandyke D
Wen TH
Young Steve
Publication venue: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Publication date: 01/01/2001

Moving from limited-domain dialogue systems to open-domain dialogue systems raises a number of challenges. One of them is the ability of the system to utilise small amounts of data from disparate domains to build a dialogue manager policy. Previous work has focused on using data from different domains to adapt a generic policy to a specific domain. Inspired by Bayesian committee machines, this paper proposes the use of a committee of dialogue policies. The results show that such a model is particularly beneficial for adaptation in multi-domain dialogue systems. The use of this model significantly improves performance compared to a single-policy baseline, as confirmed by a real-user trial. This is the first time a dialogue policy has been trained on multiple domains on-line in interaction with real users. The research leading to this work was funded by the EPSRC grant EP/M018946/1 "Open Domain Statistical Spoken Dialogue Systems". This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ASRU.2015.740487

Publikationer från KTH

CiteSeerX

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Apollo (Cambridge)