60 research outputs found
Interval Prediction for Continuous-Time Systems with Parametric Uncertainties
The problem of behaviour prediction for linear parameter-varying systems is
considered in the interval framework. It is assumed that the system is subject
to uncertain inputs and the vector of scheduling parameters is unmeasurable,
but all uncertainties take values in a given admissible set. An interval predictor is then designed, and its stability is guaranteed by applying a Lyapunov function with a novel structure. The stability conditions are formulated as linear matrix inequalities. The efficiency of the theoretical results is demonstrated in an application to safe motion planning for autonomous vehicles.
Comment: 6 pages, CDC 2019. Website: https://eleurent.github.io/interval-prediction
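As a rough illustration of the interval framework (not the paper's LMI-certified predictor design), the sketch below propagates state bounds one step for a discrete-time system x_{k+1} = A x_k + B u_k whose matrix A is only known elementwise within [A_low, A_up]; the discrete-time setting and all names are assumptions made for the example.

```python
import numpy as np

def split_pos_neg(M):
    """Elementwise positive/negative parts, so that M = M_plus - M_minus."""
    return np.maximum(M, 0.0), np.maximum(-M, 0.0)

def interval_step(x_low, x_up, A_low, A_up, Bu_low, Bu_up):
    """One-step interval enclosure for x_{k+1} = A x_k + B u_k, with A only
    known elementwise in [A_low, A_up] and the input term B u_k lying in
    [Bu_low, Bu_up].  Returns bounds containing every reachable next state
    (plain interval arithmetic, without the paper's stability-certified
    predictor design)."""
    A_mid = 0.5 * (A_low + A_up)          # nominal matrix
    A_rad = 0.5 * (A_up - A_low)          # elementwise uncertainty radius
    A_mid_pos, A_mid_neg = split_pos_neg(A_mid)
    x_abs = np.maximum(np.abs(x_low), np.abs(x_up))  # bound on |x_k|
    x_up_next = A_mid_pos @ x_up - A_mid_neg @ x_low + A_rad @ x_abs + Bu_up
    x_low_next = A_mid_pos @ x_low - A_mid_neg @ x_up - A_rad @ x_abs + Bu_low
    return x_low_next, x_up_next
```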
Practical Open-Loop Optimistic Planning
We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies (i.e. sequences of actions) and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, which leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
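The abstract's key modification is replacing Hoeffding-style upper-confidence bounds with tighter KL-based ones. Below is a minimal, generic kl-UCB computation by bisection for rewards assumed to lie in [0, 1]; the function names and the threshold parameter are illustrative, not taken from the paper's implementation.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_upper_bound(mean, count, threshold, tol=1e-6):
    """Largest q >= mean such that count * KL(mean, q) <= threshold, found
    by bisection.  This is the generic KL-based upper-confidence bound
    (as in kl-UCB), tighter than a Hoeffding bound for rewards in [0, 1];
    'threshold' plays the role of the confidence log-term."""
    if count == 0:
        return 1.0
    low, high = mean, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if count * bernoulli_kl(mean, mid) <= threshold:
            low = mid
        else:
            high = mid
    return low
```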
Budgeted Reinforcement Learning in Continuous State Space
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov
Decision Process to critical applications requiring safety constraints. It
relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown
dynamics. We show that the solution to a BMDP is a fixed point of a novel
Budgeted Bellman Optimality operator. This observation allows us to introduce
natural extensions of Deep Reinforcement Learning algorithms to address
large-scale BMDPs. We validate our approach on two simulated applications:
spoken dialogue and autonomous driving.
Comment: N. Carrara and E. Leurent contributed equally.
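To make the budgeted greedy step concrete, here is a minimal sketch assuming scalar costs and a discrete action set: it picks a (possibly randomized) mixture of actions that maximizes expected reward while keeping expected cost within the budget. The brute-force pair scan and all names are illustrative; the paper's operator acts on learned Q-functions rather than given arrays.

```python
import numpy as np

def budgeted_greedy(q_reward, q_cost, budget):
    """Greedy step of a budgeted backup: choose a probability mixture over
    actions maximizing expected reward subject to expected cost <= budget.
    With a scalar cost, an optimal mixture needs at most two actions, so a
    brute-force scan over pairs suffices here.  Returns (probabilities,
    actions), or None if no action fits the budget."""
    n = len(q_reward)
    feasible = [a for a in range(n) if q_cost[a] <= budget]
    best_value, best = -np.inf, None
    # Best deterministic choice among budget-respecting actions.
    for a in feasible:
        if q_reward[a] > best_value:
            best_value, best = q_reward[a], ([1.0], [a])
    # Mixtures of a cheap action with an expensive one, saturating the budget.
    for a in feasible:
        for b in range(n):
            if q_cost[b] <= budget:
                continue
            p = (budget - q_cost[a]) / (q_cost[b] - q_cost[a])  # weight on b
            value = (1.0 - p) * q_reward[a] + p * q_reward[b]
            if value > best_value:
                best_value, best = value, ([1.0 - p, p], [a, b])
    return best
```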
Monte-Carlo Graph Search: the Value of Merging Similar States
We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and a limited computational budget. Although the underlying MDP transitions have a graph structure, popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not merge two similar states reached via different trajectories, which are instead represented in separate branches of the tree. In this work, we propose a graph-based planning algorithm that takes this state similarity into account. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty, which improves on the original tree-based bound in MDPs where trajectories overlap, and recovers it otherwise. We then show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach.
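A minimal sketch of the state-merging idea, assuming hashable states and a generic UCB selection rule (not the paper's exact algorithm or confidence bonus): node statistics are stored in a dictionary keyed by the state, so two trajectories that reach the same state share one estimate instead of two tree branches.

```python
import math
from collections import defaultdict

class GraphNode:
    """Statistics shared by every trajectory that reaches the same state."""
    def __init__(self):
        self.visits = defaultdict(int)    # action -> visit count
        self.value = defaultdict(float)   # action -> mean return estimate

def select_action(nodes, state, actions, c=1.0):
    """UCB-style selection on a graph of states: statistics are looked up in
    a dictionary keyed by the state itself, so trajectories that reach the
    same state through different paths share one node."""
    node = nodes.setdefault(state, GraphNode())
    total = sum(node.visits[a] for a in actions) + 1
    def ucb(a):
        n = node.visits[a]
        if n == 0:
            return float("inf")           # try unvisited actions first
        return node.value[a] + c * math.sqrt(math.log(total) / n)
    return max(actions, key=ucb)

def update(nodes, state, action, sampled_return):
    """Incremental-mean update of the shared node's estimate."""
    node = nodes.setdefault(state, GraphNode())
    node.visits[action] += 1
    node.value[action] += (sampled_return - node.value[action]) / node.visits[action]
```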
Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs
We consider the problem of robust and adaptive model predictive control (MPC) of a linear system, with unknown parameters that are learned along the way (adaptive), in a critical setting where failures must be prevented (robust). This problem has been studied from different perspectives by different communities. However, the existing theory deals only with the case of quadratic costs (the LQ problem), which limits applications to stabilisation and tracking tasks only. In order to handle the more general (non-convex) costs that naturally arise in many practical problems, we carefully select and bring together several tools from different communities, namely non-asymptotic linear regression, recent results in interval prediction, and tree-based planning. Combining and adapting the theoretical guarantees at each layer is non-trivial, and we provide the first end-to-end suboptimality analysis for this setting. Interestingly, our analysis naturally adapts to handle many models and combines with a data-driven robust model-selection strategy, which makes it possible to relax the modelling assumptions. Finally, we strive to preserve tractability at every stage of the method, which we illustrate on two challenging simulated environments.
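As an illustration of the model-estimation layer mentioned above, the sketch below performs regularized least-squares on observed transitions and returns an ellipsoidal confidence region in the style of self-normalized bounds (Abbasi-Yadkori et al.); the radius formula, constants, and the assumed parameter-norm bound are illustrative, not the paper's.

```python
import numpy as np

def fit_with_confidence(features, targets, reg=1.0, noise_std=1.0, delta=0.05):
    """Regularized least-squares estimate of unknown dynamics parameters,
    plus an ellipsoidal confidence radius in the style of self-normalized
    bounds: with probability >= 1 - delta, the true parameter theta satisfies
    (theta - theta_hat)^T G (theta - theta_hat) <= beta^2 (constants here are
    illustrative and assume ||theta||_2 <= 1)."""
    Phi = np.asarray(features, dtype=float)   # (n, d) regressor rows
    y = np.asarray(targets, dtype=float)      # (n,) observed responses
    d = Phi.shape[1]
    G = Phi.T @ Phi + reg * np.eye(d)         # regularized Gram matrix
    theta_hat = np.linalg.solve(G, Phi.T @ y)
    _, logdet_G = np.linalg.slogdet(G)
    beta = noise_std * np.sqrt(2.0 * np.log(1.0 / delta) + logdet_G - d * np.log(reg)) \
        + np.sqrt(reg)                        # + sqrt(reg) * (parameter norm bound)
    return theta_hat, G, beta
```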
Adaptive Reward-Free Exploration
Reward-free exploration is a reinforcement learning setting studied by Jin et
al. (2020), who address it by running several algorithms with regret guarantees
in parallel. In our work, we instead give a more natural adaptive approach for
reward-free exploration which directly reduces upper bounds on the maximum MDP
estimation error. We show that, interestingly, our reward-free UCRL algorithm
can be seen as a variant of an algorithm of Fiechter from 1994, originally
proposed for a different objective that we call best-policy identification. We
prove that RF-UCRL needs of order $(SAH^4/\varepsilon^2)(\log(1/\delta) + S)$ episodes to output, with probability $1-\delta$, an $\varepsilon$-approximation of the optimal policy for any reward function. This bound improves over existing sample-complexity bounds in both the small $\varepsilon$ and the small $\delta$ regimes. We further investigate the relative complexities of reward-free exploration and best-policy identification.
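A rough sketch of the adaptive idea described above, under strong simplifying assumptions (a tabular finite-horizon MDP, a known empirical transition model, and an illustrative 1/sqrt(n) bonus that is not the paper's exact confidence term): backward induction yields an upper bound W_h(s, a) on the estimation error, and exploration stops once this bound is small at the initial state.

```python
import numpy as np

def error_upper_bounds(p_hat, counts, H, bonus_scale=1.0):
    """Backward induction for W_h(s, a), an upper bound on the estimation
    error after exploration, in a tabular finite-horizon MDP.
    p_hat: empirical transition probabilities, shape (S, A, S).
    counts: visit counts, shape (S, A).
    The 1/sqrt(n) bonus and its scale are illustrative placeholders for the
    actual confidence term."""
    S, A, _ = p_hat.shape
    W = np.zeros((H + 1, S, A))
    bonus = bonus_scale / np.sqrt(np.maximum(counts, 1))
    for h in range(H - 1, -1, -1):
        next_best = W[h + 1].max(axis=1)          # max_a' W_{h+1}(s', a')
        W[h] = np.minimum(H, H * bonus + p_hat @ next_best)
    return W

def should_stop(W, initial_state, epsilon):
    """Stop exploring once the worst-case error at the initial state is small."""
    return W[0, initial_state].max() <= epsilon / 2
```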
- …