Search CORE

45 research outputs found

Recommended from our members

Gaussian processes for POMDP-based dialogue manager optimization

Author: Gašić M
Young S
Publication venue: IEEE Transactions on Audio, Speech and Language Processing
Publication date: 16/09/2013
Field of study

A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GP increases the learning rate by an order of magnitude thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step forward towards fully automatic dialog policy optimization in real world systems.This is the accepted manuscript version of an article first published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. The final published version is available online from IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6601004. © 2013 IEEE

Apollo (Cambridge)

Four Mode Based Dialogue Management with Modified POMDP Model

Author: Sathulla Sabiha
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2010
Field of study

This thesis proposes a method to manage the interaction between the user and the system dynamically, through speech or text input which updates the user goals, select system actions and calculate rewards for each system response at each time-stamp. The main focus is made on the dialog manager, which decides how to continue the dialogue. We have used POMDP technique, as it maintains a belief distribution on the dialogue states based on the observations over the dialogue even in a noisy environment. Four contextual control modes are introduced in dialogue management for decision-making mechanism, and to keep track of machine behaviour for each dialogue state. The result obtained proves that our proposed framework has overcome the limitations of prior POMDP methods, and exactly understands the actual intention of the users within the available time, providing very interactive conversation between the user and the computer

Scholarship at UWindsor

Deep reinforcement learning for multi-domain dialogue systems

Author: Carse Jacob
Cuayahuitl Heriberto
Williamson Ashley
Yu Seunghak
Publication venue: 'Center for Open Science'
Publication date: 26/11/2016
Field of study

Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems. We propose a method for multi-domain dialogue policy learning---termed NDQN, and apply it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. Experimental results comparing DQN (baseline) versus NDQN (proposed) using simulations report that our proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems

University of Lincoln Institutional Repository

arXiv.org e-Print Archive

System Optimisation for Multi-access Edge Computing Based on Deep Reinforcement Learning

Author: Wang J
Publication venue: 'Division of Chemical Information and Computer Sciences'
Publication date: 11/11/2021
Field of study

Multi-access edge computing (MEC) is an emerging and important distributed computing paradigm that aims to extend cloud service to the network edge to reduce network traffic and service latency. Proper system optimisation and maintenance are crucial to maintaining high Quality-of-service (QoS) for end-users. However, with the increasing complexity of the architecture of MEC and mobile applications, effectively optimising MEC systems is non-trivial. Traditional optimisation methods are generally based on simplified mathematical models and fixed heuristics, which rely heavily on expert knowledge. As a consequence, when facing dynamic MEC scenarios, considerable human efforts and expertise are required to redesign the model and tune the heuristics, which is time-consuming. This thesis aims to develop deep reinforcement learning (DRL) methods to handle system optimisation problems in MEC. Instead of developing fixed heuristic algorithms for the problems, this thesis aims to design DRL-based methods that enable systems to learn optimal solutions on their own. This research demonstrates the effectiveness of DRL-based methods on two crucial system optimisation problems: task offloading and service migration. Specifically, this thesis first investigate the dependent task offloading problem that considers the inner dependencies of tasks. This research builds a DRL-based method combining sequence-to-sequence (seq2seq) neural network to address the problem. Experiment results demonstrate that our method outperforms the existing heuristic algorithms and achieves near-optimal performance. To further enhance the learning efficiency of the DRL-based task offloading method for unseen learning tasks, this thesis then integrates meta reinforcement learning to handle the task offloading problem. Our method can adapt fast to new environments with a small number of gradient updates and samples. Finally, this thesis exploits the DRL-based solution for the service migration problem in MEC considering user mobility. This research models the service migration problem as a Partially Observable Markov Decision Process (POMDP) and propose a tailored actor-critic algorithm combining Long-short Term Memory (LSTM) to solve the POMDP. Results from extensive experiments based on real-world mobility traces demonstrate that our method consistently outperforms both the heuristic and state-of-the-art learning-driven algorithms on various MEC scenarios

Open Research Exeter

Single-Agent vs. Multi-Agent Techniques for Concurrent Reinforcement Learning of Negotiation Dialogue Policies

Author: Claire Nelson
David Traum
Kallirroi Georgila
Publication venue
Publication date: 11/04/2020
Field of study

Abstract We use single-agent and multi-agent Reinforcement Learning (RL) for learning dialogue policies in a resource allocation negotiation scenario. Two agents learn concurrently by interacting with each other without any need for simulated users (SUs) to train against or corpora to learn from. In particular, we compare the Qlearning, Policy Hill-Climbing (PHC) and Win or Learn Fast Policy Hill-Climbing (PHC-WoLF) algorithms, varying the scenario complexity (state space size), the number of training episodes, the learning rate, and the exploration rate. Our results show that generally Q-learning fails to converge whereas PHC and PHC-WoLF always converge and perform similarly. We also show that very high gradually decreasing exploration rates are required for convergence. We conclude that multiagent RL of dialogue policies is a promising alternative to using single-agent RL and SUs or learning directly from corpora

CiteSeerX

Survey on reinforcement learning for language processing

Author: Martin-Gonzalez Anabel
Navarro-Guerrero Nicolas
Uc-Cetina Victor
Weber Cornelius
Wermter Stefan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/03/2022
Field of study

In recent years some researchers have explored the use of reinforcement learning (RL) algorithms as key components in the solution of various natural language processing tasks. For instance, some of these algorithms leveraging deep neural learning have found their way into conversational systems. This paper reviews the state of the art of RL methods for their possible use for different problems of natural language processing, focusing primarily on conversational systems, mainly due to their growing relevance. We provide detailed descriptions of the problems as well as discussions of why RL is well-suited to solve them. Also, we analyze the advantages and limitations of these methods. Finally, we elaborate on promising research directions in natural language processing that might benefit from reinforcement learning

arXiv.org e-Print Archive

Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems

Author: Asri Layla El
Fine Emery
Harris Justin
Mehrotra Rahul
Schulz Hannes
Sharma Shikhar
Suleman Kaheer
Zumer Jeremie
Publication venue
Publication date: 01/01/2017
Field of study

This paper presents the Frames dataset (Frames is available at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tracked simultaneously. We propose a baseline model for this task. We show that Frames can also be used to study memory in dialogue management and information presentation through natural language generation

arXiv.org e-Print Archive

Crossref

Recommended from our members

Recurrent Neural Network Language Generation for Dialogue Systems

Author: Wen Tsung-Hsien
Publication venue: University of Cambridge
Publication date: 08/05/2018
Field of study

Language is the principal medium for ideas, while dialogue is the most natural and effective way for humans to interact with and access information from machines. Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact on usability and perceived quality. Many commonly used NLG systems employ rules and heuristics, which tend to generate inflexible and stylised responses without the natural variation of human language. However, the frequent repetition of identical output forms can quickly make dialogue become tedious for most real-world users. Additionally, these rules and heuristics are not scalable and hence not trivially extensible to other domains or languages. A statistical approach to language generation can learn language decisions directly from data without relying on hand-coded rules or heuristics, which brings scalability and flexibility to NLG. Statistical models also provide an opportunity to learn in-domain human colloquialisms and cross-domain model adaptations. A robust and quasi-supervised NLG model is proposed in this thesis. The model leverages a Recurrent Neural Network (RNN)-based surface realiser and a gating mechanism applied to input semantics. The model is motivated by the Long-Short Term Memory (LSTM) network. The RNN-based surface realiser and gating mechanism use a neural network to learn end-to-end language generation decisions from input dialogue act and sentence pairs; it also integrates sentence planning and surface realisation into a single optimisation problem. The single optimisation not only bypasses the costly intermediate linguistic annotations but also generates more natural and human-like responses. Furthermore, a domain adaptation study shows that the proposed model can be readily adapted and extended to new dialogue domains via a proposed recipe. Continuing the success of end-to-end learning, the second part of the thesis speculates on building an end-to-end dialogue system by framing it as a conditional generation problem. The proposed model encapsulates a belief tracker with a minimal state representation and a generator that takes the dialogue context to produce responses. These features suggest comprehension and fast learning. The proposed model is capable of understanding requests and accomplishing tasks after training on only a few hundred human-human dialogues. A complementary Wizard-of-Oz data collection method is also introduced to facilitate the collection of human-human conversations from online workers. The results demonstrate that the proposed model can talk to human judges naturally, without any difficulty, for a sample application domain. In addition, the results also suggest that the introduction of a stochastic latent variable can help the system model intrinsic variation in communicative intention much better.Tsung-Hsien Wen's Ph.D. is supported by Toshiba Research Europe Ltd, Cambridge Research Laborator

Apollo (Cambridge)

A Survey of Available Corpora For Building Data-Driven Dialogue Systems: The Journal Version

Author: Charlin Laurent
Henderson Peter
Lowe Ryan
Pineau Joelle
Serban Iulian Vlad
Publication venue: University of Illinois at Chicago Library
Publication date: 11/05/2018
Field of study

During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn diverse dialogue strategies, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choice of evaluation metrics for the learning objective

University of Illinois at Chicago: Journals@UIC

Dialogue & Discourse (E-Journal - Universität Bielefeld)