
Budgeted Reinforcement Learning in Continuous State Space

Author: Carrara Nicolas
Laroche Romain
Leurent Edouard
Maillard Odalric-Ambrym
Pietquin Olivier
Urvoy Tanguy
Publication date: 27/05/2019

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved for finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
Comment: N. Carrara and E. Leurent contributed equally.
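As a rough illustration of the kind of constraint a BMDP imposes, the following Python sketch estimates, by averaging over recorded episodes, a policy's expected discounted return and expected discounted cost, then checks the cost against a budget `beta`. It assumes the cost is discounted with the same factor `gamma` as the reward and that per-step reward and cost lists are available; the function names (`discounted_sum`, `estimate_budgeted_objective`, `best_feasible`) are hypothetical. It is not the Budgeted Bellman Optimality operator or the deep RL algorithms described above, only a Monte Carlo check of a return-under-cost-budget objective.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One episode = (per-step rewards, per-step costs). Hypothetical data layout.
Episode = Tuple[Sequence[float], Sequence[float]]


def discounted_sum(signal: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * signal[t]."""
    return sum((gamma ** t) * x for t, x in enumerate(signal))


def estimate_budgeted_objective(
    episodes: List[Episode], gamma: float, beta: float
) -> Tuple[float, float, bool]:
    """Monte Carlo estimates of expected discounted return and cost.

    Assumption: reward and cost share the same discount factor gamma.
    Returns (mean_return, mean_cost, feasible), where feasible means
    the estimated cost does not exceed the budget beta.
    """
    returns = [discounted_sum(r, gamma) for r, _ in episodes]
    costs = [discounted_sum(c, gamma) for _, c in episodes]
    mean_return = sum(returns) / len(returns)
    mean_cost = sum(costs) / len(costs)
    return mean_return, mean_cost, mean_cost <= beta


def best_feasible(
    candidates: Dict[str, List[Episode]], gamma: float, beta: float
) -> Optional[str]:
    """Pick the candidate policy with the highest estimated return whose
    estimated cost stays within beta; None if no candidate is feasible."""
    best_name, best_return = None, float("-inf")
    for name, episodes in candidates.items():
        ret, _, ok = estimate_budgeted_objective(episodes, gamma, beta)
        if ok and ret > best_return:
            best_name, best_return = name, ret
    return best_name


candidates = {
    "cautious": [([1.0, 0.0], [0.0, 0.0])],  # return 1.0, cost 0.0 (gamma = 0.5)
    "bold": [([2.0, 2.0], [0.0, 1.0])],      # return 3.0, cost 0.5 (gamma = 0.5)
}
print(best_feasible(candidates, gamma=0.5, beta=0.1))  # cautious
print(best_feasible(candidates, gamma=0.5, beta=1.0))  # bold
```

Raising `beta` from 0.1 to 1.0 switches the selection from the low-cost `"cautious"` policy to the higher-return `"bold"` one, which is one simple way to picture an adjustable risk threshold.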

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Nonstrict hierarchical reinforcement learning for interactive systems and robots

Author: Beck A.
Belpaeme T.
Betteridge J.
Crook P. A.
Cuayáhuitl H.
Daubigney L.
Dethlefs N.
Heeman P.
Janarthanam S.
Keizer S.
Kruijff-Korbayová I.
Lemon O.
Li L.
Mitsunaga N.
Nalin M.
Pietquin O.
Schlangen D.
Thomaz A. L.
Williams J.
Young S.
Zue V.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 21/11/2014

Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that approximate the true value function of a policy or by hierarchically decomposing a learning task into subtasks. We present a novel approach to dialogue policy optimization that combines the benefits of hierarchical control and function approximation and that allows flexible transitions between dialogue subtasks, giving human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space to allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation in order to generalize decision making to situations unseen in training. The proposed approach is evaluated in an interactive conversational robot that learns to play quiz games. Experimental results with simulated and real users provide evidence that the proposed approach can lead to more flexible (natural) interactions than strict hierarchical control and that it is preferred by human users.
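The idea of representing a policy with linear function approximation can be illustrated, in a generic form, by Q-learning with a linear value function Q(s, a) = w_a · φ(s). This Python sketch is an assumption-laden stand-in, not the authors' method: it models a single flat agent and omits the hierarchy, the subtask transition function, and the dynamic state space. The class `LinearQ`, its parameters, and the feature vectors are hypothetical.

```python
from typing import List, Sequence


class LinearQ:
    """Q(s, a) = w_a . phi(s), with one weight vector per action.

    Hypothetical illustration of linear function approximation;
    phi(s) is any fixed-length feature vector describing a state.
    """

    def __init__(self, n_features: int, n_actions: int,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.w: List[List[float]] = [[0.0] * n_features for _ in range(n_actions)]
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor

    def q(self, phi: Sequence[float], a: int) -> float:
        """Linear estimate of the value of action a in a state with features phi."""
        return sum(wi * xi for wi, xi in zip(self.w[a], phi))

    def greedy(self, phi: Sequence[float]) -> int:
        """Index of the highest-valued action (first one on ties)."""
        values = [self.q(phi, a) for a in range(len(self.w))]
        return max(range(len(values)), key=values.__getitem__)

    def update(self, phi: Sequence[float], a: int, reward: float,
               phi_next: Sequence[float], done: bool) -> float:
        """One Q-learning step; returns the temporal-difference error."""
        target = reward
        if not done:
            target += self.gamma * max(self.q(phi_next, b) for b in range(len(self.w)))
        td_error = target - self.q(phi, a)
        # Semi-gradient step: the gradient of w_a . phi with respect to w_a is phi.
        self.w[a] = [wi + self.alpha * td_error * xi
                     for wi, xi in zip(self.w[a], phi)]
        return td_error


agent = LinearQ(n_features=2, n_actions=2, alpha=0.5, gamma=0.9)
# One terminal transition: action 1 in a state with features [1, 0] earns reward 1.
agent.update([1.0, 0.0], 1, 1.0, [0.0, 0.0], done=True)
print(agent.w[1])                # [0.5, 0.0]
# A state never seen in training that shares the first feature also prefers action 1.
print(agent.greedy([0.5, 1.0]))  # 1
```

Because the weights are shared across all states with overlapping features, a single update also changes the values of states never visited, which is the generalization property that linear function approximation provides.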