60 research outputs found
Interval Prediction for Continuous-Time Systems with Parametric Uncertainties
The problem of behaviour prediction for linear parameter-varying systems is
considered in the interval framework. It is assumed that the system is subject
to uncertain inputs and the vector of scheduling parameters is unmeasurable,
but all uncertainties take values in a given admissible set. An interval predictor is then designed, and its stability is guaranteed by applying a Lyapunov function with a novel structure. The stability conditions are formulated as linear matrix inequalities. The efficiency of the theoretical results is demonstrated in an application to safe motion planning for autonomous vehicles.
Comment: 6 pages, CDC 2019. Website: https://eleurent.github.io/interval-prediction
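As a rough illustration of the interval framework (not the paper's LMI-certified predictor design), the sketch below propagates state bounds one step for a discrete-time system x_{k+1} = A x_k + B u_k whose matrix A is only known elementwise within [A_low, A_up]; the discrete-time setting and all names are assumptions made for the example.

```python
import numpy as np

def split_pos_neg(M):
    """Elementwise positive/negative parts, so that M = M_plus - M_minus."""
    return np.maximum(M, 0.0), np.maximum(-M, 0.0)

def interval_step(x_low, x_up, A_low, A_up, Bu_low, Bu_up):
    """One-step interval enclosure for x_{k+1} = A x_k + B u_k, with A only
    known elementwise in [A_low, A_up] and the input term B u_k lying in
    [Bu_low, Bu_up].  Returns bounds containing every reachable next state
    (plain interval arithmetic, without the paper's stability-certified
    predictor design)."""
    A_mid = 0.5 * (A_low + A_up)          # nominal matrix
    A_rad = 0.5 * (A_up - A_low)          # elementwise uncertainty radius
    A_mid_pos, A_mid_neg = split_pos_neg(A_mid)
    x_abs = np.maximum(np.abs(x_low), np.abs(x_up))  # bound on |x_k|
    x_up_next = A_mid_pos @ x_up - A_mid_neg @ x_low + A_rad @ x_abs + Bu_up
    x_low_next = A_mid_pos @ x_low - A_mid_neg @ x_up - A_rad @ x_abs + Bu_low
    return x_low_next, x_up_next
```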
Practical Open-Loop Optimistic Planning
We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies (i.e. sequences of actions) and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, which leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
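The abstract's key modification is replacing Hoeffding-style upper-confidence bounds with tighter KL-based ones. Below is a minimal, generic kl-UCB computation by bisection for rewards assumed to lie in [0, 1]; the function names and the threshold parameter are illustrative, not taken from the paper's implementation.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_upper_bound(mean, count, threshold, tol=1e-6):
    """Largest q >= mean such that count * KL(mean, q) <= threshold, found
    by bisection.  This is the generic KL-based upper-confidence bound
    (as in kl-UCB), tighter than a Hoeffding bound for rewards in [0, 1];
    'threshold' plays the role of the confidence log-term."""
    if count == 0:
        return 1.0
    low, high = mean, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if count * bernoulli_kl(mean, mid) <= threshold:
            low = mid
        else:
            high = mid
    return low
```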
Budgeted Reinforcement Learning in Continuous State Space
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov
Decision Process to critical applications requiring safety constraints. It
relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown
dynamics. We show that the solution to a BMDP is a fixed point of a novel
Budgeted Bellman Optimality operator. This observation allows us to introduce
natural extensions of Deep Reinforcement Learning algorithms to address
large-scale BMDPs. We validate our approach on two simulated applications:
spoken dialogue and autonomous driving.
Comment: N. Carrara and E. Leurent contributed equally.
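To make the budgeted greedy step concrete, here is a minimal sketch assuming scalar costs and a discrete action set: it picks a (possibly randomized) mixture of actions that maximizes expected reward while keeping expected cost within the budget. The brute-force pair scan and all names are illustrative; the paper's operator acts on learned Q-functions rather than given arrays.

```python
import numpy as np

def budgeted_greedy(q_reward, q_cost, budget):
    """Greedy step of a budgeted backup: choose a probability mixture over
    actions maximizing expected reward subject to expected cost <= budget.
    With a scalar cost, an optimal mixture needs at most two actions, so a
    brute-force scan over pairs suffices here.  Returns (probabilities,
    actions), or None if no action fits the budget."""
    n = len(q_reward)
    feasible = [a for a in range(n) if q_cost[a] <= budget]
    best_value, best = -np.inf, None
    # Best deterministic choice among budget-respecting actions.
    for a in feasible:
        if q_reward[a] > best_value:
            best_value, best = q_reward[a], ([1.0], [a])
    # Mixtures of a cheap action with an expensive one, saturating the budget.
    for a in feasible:
        for b in range(n):
            if q_cost[b] <= budget:
                continue
            p = (budget - q_cost[a]) / (q_cost[b] - q_cost[a])  # weight on b
            value = (1.0 - p) * q_reward[a] + p * q_reward[b]
            if value > best_value:
                best_value, best = value, ([1.0 - p, p], [a, b])
    return best
```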
Monte-Carlo Graph Search: the Value of Merging Similar States
We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and a limited computational budget. Although the underlying MDP transitions have a graph structure, popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not merge two similar states reached via different trajectories, which are instead represented in separate branches of the tree. In this work, we propose a graph-based planning algorithm that takes this state similarity into account. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty, which improves on the original tree-based bound in MDPs where trajectories overlap, and recovers it otherwise. We then show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach.
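A minimal sketch of the state-merging idea, assuming hashable states and a generic UCB selection rule (not the paper's exact algorithm or confidence bonus): node statistics are stored in a dictionary keyed by the state, so two trajectories that reach the same state share one estimate instead of two tree branches.

```python
import math
from collections import defaultdict

class GraphNode:
    """Statistics shared by every trajectory that reaches the same state."""
    def __init__(self):
        self.visits = defaultdict(int)    # action -> visit count
        self.value = defaultdict(float)   # action -> mean return estimate

def select_action(nodes, state, actions, c=1.0):
    """UCB-style selection on a graph of states: statistics are looked up in
    a dictionary keyed by the state itself, so trajectories that reach the
    same state through different paths share one node."""
    node = nodes.setdefault(state, GraphNode())
    total = sum(node.visits[a] for a in actions) + 1
    def ucb(a):
        n = node.visits[a]
        if n == 0:
            return float("inf")           # try unvisited actions first
        return node.value[a] + c * math.sqrt(math.log(total) / n)
    return max(actions, key=ucb)

def update(nodes, state, action, sampled_return):
    """Incremental-mean update of the shared node's estimate."""
    node = nodes.setdefault(state, GraphNode())
    node.visits[action] += 1
    node.value[action] += (sampled_return - node.value[action]) / node.visits[action]
```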
Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs
We consider the problem of robust and adaptive model predictive control (MPC) of a linear system, with unknown parameters that are learned along the way (adaptive), in a critical setting where failures must be prevented (robust). This problem has been studied from different perspectives by different communities. However, the existing theory deals only with the case of quadratic costs (the LQ problem), which limits applications to stabilisation and tracking tasks only. In order to handle the more general (non-convex) costs that naturally arise in many practical problems, we carefully select and bring together several tools from different communities, namely non-asymptotic linear regression, recent results in interval prediction, and tree-based planning. Combining and adapting the theoretical guarantees at each layer is non-trivial, and we provide the first end-to-end suboptimality analysis for this setting. Interestingly, our analysis naturally adapts to handle many models and combines with a data-driven robust model-selection strategy, which makes it possible to relax the modelling assumptions. Finally, we strive to preserve tractability at every stage of the method, which we illustrate on two challenging simulated environments.
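As an illustration of the model-estimation layer mentioned above, the sketch below performs regularized least-squares on observed transitions and returns an ellipsoidal confidence region in the style of self-normalized bounds (Abbasi-Yadkori et al.); the radius formula, constants, and the assumed parameter-norm bound are illustrative, not the paper's.

```python
import numpy as np

def fit_with_confidence(features, targets, reg=1.0, noise_std=1.0, delta=0.05):
    """Regularized least-squares estimate of unknown dynamics parameters,
    plus an ellipsoidal confidence radius in the style of self-normalized
    bounds: with probability >= 1 - delta, the true parameter theta satisfies
    (theta - theta_hat)^T G (theta - theta_hat) <= beta^2 (constants here are
    illustrative and assume ||theta||_2 <= 1)."""
    Phi = np.asarray(features, dtype=float)   # (n, d) regressor rows
    y = np.asarray(targets, dtype=float)      # (n,) observed responses
    d = Phi.shape[1]
    G = Phi.T @ Phi + reg * np.eye(d)         # regularized Gram matrix
    theta_hat = np.linalg.solve(G, Phi.T @ y)
    _, logdet_G = np.linalg.slogdet(G)
    beta = noise_std * np.sqrt(2.0 * np.log(1.0 / delta) + logdet_G - d * np.log(reg)) \
        + np.sqrt(reg)                        # + sqrt(reg) * (parameter norm bound)
    return theta_hat, G, beta
```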
Adaptive Reward-Free Exploration
Reward-free exploration is a reinforcement learning setting studied by Jin et
al. (2020), who address it by running several algorithms with regret guarantees
in parallel. In our work, we instead give a more natural adaptive approach for
reward-free exploration which directly reduces upper bounds on the maximum MDP
estimation error. We show that, interestingly, our reward-free UCRL algorithm
can be seen as a variant of an algorithm of Fiechter from 1994, originally
proposed for a different objective that we call best-policy identification. We
prove that RF-UCRL needs of order $(SAH^4/\varepsilon^2)(\log(1/\delta) + S)$ episodes to output, with probability $1-\delta$, an $\varepsilon$-approximation of the optimal policy for any reward function. This bound improves over existing sample-complexity bounds in both the small $\varepsilon$ and the small $\delta$ regimes. We further investigate the relative complexities of reward-free exploration and best-policy identification.
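A rough sketch of the adaptive idea described above, under strong simplifying assumptions (a tabular finite-horizon MDP, a known empirical transition model, and an illustrative 1/sqrt(n) bonus that is not the paper's exact confidence term): backward induction yields an upper bound W_h(s, a) on the estimation error, and exploration stops once this bound is small at the initial state.

```python
import numpy as np

def error_upper_bounds(p_hat, counts, H, bonus_scale=1.0):
    """Backward induction for W_h(s, a), an upper bound on the estimation
    error after exploration, in a tabular finite-horizon MDP.
    p_hat: empirical transition probabilities, shape (S, A, S).
    counts: visit counts, shape (S, A).
    The 1/sqrt(n) bonus and its scale are illustrative placeholders for the
    actual confidence term."""
    S, A, _ = p_hat.shape
    W = np.zeros((H + 1, S, A))
    bonus = bonus_scale / np.sqrt(np.maximum(counts, 1))
    for h in range(H - 1, -1, -1):
        next_best = W[h + 1].max(axis=1)          # max_a' W_{h+1}(s', a')
        W[h] = np.minimum(H, H * bonus + p_hat @ next_best)
    return W

def should_stop(W, initial_state, epsilon):
    """Stop exploring once the worst-case error at the initial state is small."""
    return W[0, initial_state].max() <= epsilon / 2
```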
- …