3,621 research outputs found

    Budgeted Reinforcement Learning in Continuous State Space

    Get PDF
    A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an - adjustable - threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to continuous spaces environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.Comment: N. Carrara and E. Leurent have equally contribute

    Budgeted Reinforcement Learning in Continuous State Space

    Get PDF
    International audienceA Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an-adjustable-threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to continuous spaces environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving

    Budgeted Reinforcement Learning in Continuous State Space

    Get PDF
    International audienceA Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an-adjustable-threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to continuous spaces environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving

    A Fitted-Q Algorithm for Budgeted MDPs

    Get PDF
    Workshop on Safety, Risk and Uncertainty in Reinforcement Learning. https://sites.google.com/view/rl-uai2018/We address the problem of bud-geted/constrained reinforcement learning in continuous state-space using a batch of transitions. For this purpose, we introduce a novel algorithm called Budgeted Fitted-Q (BFTQ). We carry out some preliminary benchmarks on a continuous 2-D world. They show that BFTQ performs as well as a penalized Fitted-Q algorithm while also allowing ones to adapt the trained policy on-the-fly for a given amount of budget and without the need of engineering the reward penalties. We believe that the general principles used to design BFTQ could be used to extend others classical reinforcement learning algorithms to budget-oriented applications

    Regression with Linear Factored Functions

    Full text link
    Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling. This paper introduces a novel regression-algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow to analytically solve certain integrals and to calculate point-wise products. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme, that learns factored basis functions during training. The novel regression algorithm performs competitively to Gaussian processes on benchmark tasks, and the learned LFF functions are with 4-9 factored basis functions on average very compact.Comment: Under review as conference paper at ECML/PKDD 201
    corecore