Budgeted Reinforcement Learning in Continuous State Space

Authors: Carrara, Nicolas; Laroche, Romain; Leurent, Edouard; Maillard, Odalric-Ambrym; Pietquin, Olivier; Urvoy, Tanguy
Publication date: 27/05/2019

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments with unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.

Comment: N. Carrara and E. Leurent have equally contributed
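To make the idea of a cost signal held below an adjustable threshold concrete, here is a minimal sketch. It assumes the constraint applies to the discounted cumulative cost of an episode, which the abstract does not spell out. The names `discounted_sum`, `respects_budget`, `gamma`, and `budget` are hypothetical and chosen for illustration only; this is not the authors' algorithm and does not implement the Budgeted Bellman Optimality operator.

```python
from typing import Sequence


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma^t * values[t] for one episode."""
    total = 0.0
    for t, value in enumerate(values):
        total += (gamma ** t) * value
    return total


def respects_budget(costs: Sequence[float], gamma: float, budget: float) -> bool:
    """Check whether the discounted cumulative cost stays at or below the budget.

    Assumption: the threshold is compared against the discounted cost of a
    single episode; an expected-cost constraint would average this quantity
    over many episodes.
    """
    return discounted_sum(costs, gamma) <= budget


# Toy episode with made-up numbers: one reward signal, one cost signal.
rewards = [1.0, 0.0, 2.0]
costs = [0.0, 1.0, 0.0]
gamma = 0.5

episode_return = discounted_sum(rewards, gamma)
episode_cost = discounted_sum(costs, gamma)
within_loose_budget = respects_budget(costs, gamma, budget=0.5)
within_tight_budget = respects_budget(costs, gamma, budget=0.4)
```

Varying `budget` shows the trade-off the abstract alludes to: the same episode is acceptable under a loose threshold and rejected under a tighter one.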

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot