Budgeted Reinforcement Learning in Continuous State Space
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov
Decision Process to critical applications requiring safety constraints. It
relies on a notion of risk implemented in the shape of a cost signal
constrained to lie below an adjustable threshold. So far, BMDPs could only
be solved in the case of finite state spaces with known dynamics. This work
extends the state of the art to continuous-space environments with unknown
dynamics. We show that the solution to a BMDP is a fixed point of a novel
Budgeted Bellman Optimality operator. This observation allows us to introduce
natural extensions of Deep Reinforcement Learning algorithms to address
large-scale BMDPs. We validate our approach on two simulated applications:
spoken dialogue and autonomous driving.
Comment: N. Carrara and E. Leurent have equally contribute
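As a reading aid, the constrained problem described in this abstract can be written as a reward-maximisation problem with a bound on expected cost. The notation below is an assumption for illustration and is not taken from the abstract: $r_t$ is the reward, $c_t$ the cost signal, $\gamma$ a discount factor, and $\beta$ the adjustable threshold (budget).

```latex
\[
\pi^\star \in \arg\max_{\pi} \;
\mathbb{E}\Big[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\Big|\, \pi \Big]
\quad \text{subject to} \quad
\mathbb{E}\Big[ \sum_{t=0}^{\infty} \gamma^{t} c_t \,\Big|\, \pi \Big] \le \beta .
\]
```

Under this reading, changing $\beta$ trades expected reward against tolerated risk, which is what makes the threshold "adjustable".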
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Trial-and-error based reinforcement learning (RL) has seen rapid advancements
in recent times, especially with the advent of deep neural networks. However,
the majority of autonomous RL algorithms require a large number of interactions
with the environment. A large number of interactions may be impractical in many
real-world applications, such as robotics, and many practical systems have to
obey limitations in the form of state space or control constraints. To reduce
the number of system interactions while simultaneously handling constraints, we
propose a model-based RL framework based on probabilistic Model Predictive
Control (MPC). In particular, we propose to learn a probabilistic transition
model using Gaussian Processes (GPs) to incorporate model uncertainty into
long-term predictions, thereby reducing the impact of model errors. We then
use MPC to find a control sequence that minimises the expected long-term cost.
We provide theoretical guarantees for first-order optimality in the GP-based
transition models with deterministic approximate inference for long-term
planning. We demonstrate that our approach not only achieves
state-of-the-art data efficiency, but is also a principled approach to RL in
constrained environments.
Comment: Accepted at AISTATS 2018
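The receding-horizon idea behind MPC can be sketched in a few lines. This is only a minimal illustration under loud assumptions: the learned probabilistic GP model is replaced by a hypothetical known, deterministic scalar system (`step`), the expected long-term cost by a plain quadratic cost, and the optimiser by exhaustive search over a small discrete control set. None of these names or numbers come from the paper.

```python
from itertools import product

# Hypothetical control constraint |u| <= 1, enforced by the candidate set.
ACTIONS = (-1.0, -0.5, 0.0, 0.5, 1.0)


def step(x, u):
    # Assumed toy dynamics standing in for a learned transition model.
    return 0.9 * x + 0.5 * u


def sequence_cost(x, controls):
    # Quadratic cost accumulated along the predicted trajectory.
    cost = 0.0
    for u in controls:
        x = step(x, u)
        cost += x * x + 0.1 * u * u
    return cost


def mpc_action(x, horizon=4):
    # Evaluate every admissible control sequence over the horizon and
    # return only the first control of the cheapest one.
    best = min(product(ACTIONS, repeat=horizon),
               key=lambda seq: sequence_cost(x, seq))
    return best[0]


def run_mpc(x0=3.0, steps=30):
    # Receding horizon: re-plan from the current state at every step.
    traj = [x0]
    for _ in range(steps):
        u = mpc_action(traj[-1])
        traj.append(step(traj[-1], u))
    return traj


if __name__ == "__main__":
    print(run_mpc()[:6])
```

Only the first control of each planned sequence is applied before re-planning, which is the defining feature of MPC; in the setting of the abstract the rollout inside `sequence_cost` would instead propagate uncertainty through the learned model.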
Hybrid Behaviour of Markov Population Models
We investigate the behaviour of population models written in Stochastic
Concurrent Constraint Programming (sCCP), a stochastic extension of Concurrent
Constraint Programming. In particular, we focus on models from which we can
define a semantics of sCCP both in terms of Continuous Time Markov Chains
(CTMC) and in terms of Stochastic Hybrid Systems, in which some populations are
approximated continuously, while others are kept discrete. We prove the
correctness of the hybrid semantics from the point of view of the limiting
behaviour of a sequence of models for increasing population size. More
specifically, we prove that, under suitable regularity conditions, the sequence
of CTMC constructed from sCCP programs for increasing population size converges
to the hybrid system constructed by means of the hybrid semantics. We
investigate in particular what happens for sCCP models in which some
transitions are guarded by boolean predicates or in the presence of
instantaneous transitions.
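The kind of limit result described here can be illustrated numerically in its simplest special case, where every population is approximated continuously. The sketch below is an assumption-laden toy, not the sCCP construction: it simulates a hypothetical SIS-style population CTMC with Gillespie's algorithm and compares the infected fraction at a fixed time with the corresponding ODE. The rates `BETA` and `GAMMA` and all function names are invented for illustration, and the sketch does not cover discrete components, guarded transitions, or instantaneous transitions.

```python
import random

BETA, GAMMA = 2.0, 1.0  # hypothetical infection and recovery rates


def simulate_ctmc(n, i0_frac=0.1, t_end=10.0, seed=0):
    """Gillespie simulation; returns the infected fraction at time t_end."""
    rng = random.Random(seed)
    infected = round(i0_frac * n)
    t = 0.0
    while infected > 0:
        rate_inf = BETA * infected * (n - infected) / n  # density-dependent
        rate_rec = GAMMA * infected
        total = rate_inf + rate_rec
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_end:
            break
        if rng.random() * total < rate_inf:
            infected += 1
        else:
            infected -= 1
    return infected / n


def fluid_limit(i0_frac=0.1, t_end=10.0, dt=1e-3):
    """Euler integration of di/dt = BETA*i*(1-i) - GAMMA*i."""
    i = i0_frac
    for _ in range(int(round(t_end / dt))):
        i += dt * (BETA * i * (1.0 - i) - GAMMA * i)
    return i


if __name__ == "__main__":
    ode = fluid_limit()
    for n in (50, 500, 5000):
        print(n, abs(simulate_ctmc(n) - ode))
```

For this toy model the gap between a single stochastic run and the ODE typically shrinks as `n` grows, which is the flavour of convergence the abstract establishes in the more general hybrid setting.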
- …