26 research outputs found

Goal-oriented Dialogue Policy Learning from Failures

Authors: Chen Xiaoping; Lu Keting; Zhang Shiqi
Publication date: 22/11/2018

Reinforcement learning methods have been used for learning dialogue policies. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues and the very few successful dialogues in the early learning phase. Hindsight experience replay (HER) enables learning from failures, but vanilla HER is inapplicable to dialogue learning because dialogue goals are implicit. In this work, we develop two complex HER methods providing different trade-offs between complexity and performance and, for the first time, enable HER-based dialogue policy learning. Experiments using a realistic user simulator show that our HER methods perform better than existing experience replay methods (as applied to deep Q-networks) in learning rate.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Towards Solving Text-based Games by Producing Adaptive Action Spaces

Authors: Abdala Julián; Baldini Patricia Noemí; Balmaceda Javier; Bambill Héctor; Calandrini Guillermo; Coppo Ricardo; Fernández Andrés; Jakomin Marina; Silva Bustos Camila; Tarnoski Santiago; Tourret María
Publication date: 03/12/2018

To solve a text-based game, an agent needs to formulate valid text commands for a given context and find the ones that lead to success. Recent attempts at solving text-based games with deep reinforcement learning have focused on the latter, i.e., learning to act optimally when valid actions are known in advance. In this work, we propose to tackle the first task and train a model that generates the set of all valid commands for a given context. We try three generative models on a dataset generated with Textworld. The best model can generate valid commands which were unseen at training and achieve high F1 score on the test set.

arXiv.org e-Print Archive

Servicio de Difusión de la Creación Intelectual

Learning a Policy for Opportunistic Active Learning

Authors: Mooney Raymond J.; Padmakumar Aishwarya; Stone Peter
Publication date: 01/01/2018

Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task to learn a policy that effectively trades off task completion against model improvement that would benefit future tasks. Comment: EMNLP 2018 Camera Read

arXiv.org e-Print Archive

Crossref

Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems

Authors: Asri Layla El; Fine Emery; Harris Justin; Mehrotra Rahul; Schulz Hannes; Sharma Shikhar; Suleman Kaheer; Zumer Jeremie
Publication date: 01/01/2017

This paper presents the Frames dataset (Frames is available at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which extends state tracking to a setting where several states are tracked simultaneously. We propose a baseline model for this task. We show that Frames can also be used to study memory in dialogue management and information presentation through natural language generation.

arXiv.org e-Print Archive

Crossref

Budgeted Reinforcement Learning in Continuous State Space

Authors: Carrara Nicolas; Laroche Romain; Leurent Edouard; Maillard Odalric-Ambrym; Pietquin Olivier; Urvoy Tanguy
Publication venue: HAL CCSD
Publication date: 01/12/2019

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.