Optimal Reinforcement Learning for Gaussian Systems
The exploration-exploitation trade-off is among the central challenges of
reinforcement learning. The optimal Bayesian solution is intractable in
general. This paper studies to what extent analytic statements about optimal
learning are possible if all beliefs are Gaussian processes. A first-order
approximation of learning of both loss and dynamics, for nonlinear,
time-varying systems in continuous time and space, subject to a relatively weak
restriction on the dynamics, is described by an infinite-dimensional partial
differential equation. An approximate finite-dimensional projection gives an
impression of how this result may be helpful. Comment: Final pre-conference version of this NIPS 2011 paper. Once again,
please note some nontrivial changes to the exposition and interpretation of the
results, in particular in Eq. (9) and Eqs. (11)-(14). The algorithm and
results have remained the same, but their theoretical interpretation has
changed.
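To make "all beliefs are Gaussian processes" concrete, the sketch below computes the posterior mean and variance of a one-dimensional GP belief from a few noisy observations. It is an illustration only, not the paper's method: the squared-exponential kernel, its hyperparameters, the noise level and the helper names (`rbf`, `solve`, `gp_posterior`) are all assumptions. The posterior variance shrinks near observed inputs and returns to the prior far from them, which is the uncertainty an exploration-exploitation analysis has to reason about.

```python
import math

def rbf(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance k(x1, x2) (assumed kernel choice)."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def gp_posterior(xs, ys, x_star, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_star given noisy observations."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(K, ys)          # K^{-1} y
    mean = sum(k * w for k, w in zip(k_star, alpha))
    v = solve(K, k_star)          # K^{-1} k_*
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Hypothetical data: three noisy-free samples of sin(x).
xs = [-1.0, 0.0, 1.0]
ys = [math.sin(x) for x in xs]
m_near, v_near = gp_posterior(xs, ys, 0.0)    # at an observed input: small variance
m_far, v_far = gp_posterior(xs, ys, 10.0)     # far from data: back to the prior
```

Far from the data the belief reverts to the prior (mean near 0, variance near 1), so an optimal learner must weigh the value of visiting such regions against their expected cost.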
Data-driven Economic NMPC using Reinforcement Learning
Reinforcement Learning (RL) is a powerful tool to perform data-driven optimal
control without relying on a model of the system. However, RL struggles to
provide hard guarantees on the behavior of the resulting control scheme. In
contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC)
are standard tools for the closed-loop optimal control of complex systems with
constraints and limitations, and benefit from a rich theory to assess their
closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the
quality of the model underlying the control scheme. In this paper, we show that
an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system
even when using a wrong model. This result also holds for real systems having
stochastic dynamics. This entails that ENMPC can be used as a new type of
function approximator within RL. Furthermore, we investigate our results in the
context of ENMPC and formally connect them to the concept of dissipativity,
which is central for the ENMPC stability. Finally, we detail how these results
can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply
these tools to both a classical linear MPC setting and a standard nonlinear
example from the ENMPC literature.
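The claim that an MPC scheme built on a wrong model can still be tuned to the real system's optimal policy can be checked in a toy setting. The sketch below is an assumption-laden illustration, not the authors' formulation: a scalar linear real system x+ = a x + b u with quadratic cost, a one-step MPC that predicts with a wrong coefficient `a_model` and has a tunable terminal weight `theta`, and a plain grid search over `theta` on simulated closed-loop cost standing in for the RL tuning the abstract describes. All names and numbers are hypothetical.

```python
def lqr_gain(a, b, q, r, iters=1000):
    """Optimal gain K* (u = -K* x) for x+ = a x + b u, cost sum q x^2 + r u^2, via Riccati iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def mpc_gain(theta, a_model, b, r):
    """One-step MPC on the (wrong) model: argmin_u r u^2 + theta (a_model x + b u)^2 equals -K x."""
    return theta * b * a_model / (r + theta * b * b)

def closed_loop_cost(gain, a, b, q, r, x0=1.0, horizon=500):
    """Finite-horizon cost of applying u = -gain * x to the REAL system."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost

# Hypothetical numbers: the model underestimates the real open-loop coefficient.
a_true, a_model, b, q, r = 1.2, 0.9, 1.0, 1.0, 1.0

# Grid search over the MPC weight using real closed-loop cost (stand-in for RL tuning).
thetas = [10 ** (-1 + 3 * i / 2000) for i in range(2001)]
best_theta = min(thetas, key=lambda th: closed_loop_cost(mpc_gain(th, a_model, b, r), a_true, b, q, r))

k_star = lqr_gain(a_true, b, q, r)          # optimal policy of the real system
k_tuned = mpc_gain(best_theta, a_model, b, r)
k_naive = mpc_gain(1.0, a_model, b, r)      # untuned weight, for comparison
```

With these numbers the tuned gain lands close to the real system's LQR gain even though the MPC predicts with the wrong coefficient, while the untuned weight does not. This only works because the optimal gain lies inside the range the parameterized MPC can represent.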
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Trial-and-error based reinforcement learning (RL) has seen rapid advancements
in recent times, especially with the advent of deep neural networks. However,
the majority of autonomous RL algorithms require a large number of interactions
with the environment. A large number of interactions may be impractical in many
real-world applications, such as robotics, and many practical systems have to
obey limitations in the form of state space or control constraints. To reduce
the number of system interactions while simultaneously handling constraints, we
propose a model-based RL framework based on probabilistic Model Predictive
Control (MPC). In particular, we propose to learn a probabilistic transition
model using Gaussian Processes (GPs) to incorporate model uncertainty into
long-term predictions, thereby reducing the impact of model errors. We then
use MPC to find a control sequence that minimises the expected long-term cost.
We provide theoretical guarantees for first-order optimality in the GP-based
transition models with deterministic approximate inference for long-term
planning. We demonstrate that our approach not only achieves
state-of-the-art data efficiency, but is also a principled way to perform RL in
constrained environments. Comment: Accepted at AISTATS 2018
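Propagating model uncertainty through long-term predictions and minimizing the expected cost can be sketched without a full GP. In the sketch below, which is an assumption standing in for the learned GP model, the input gain of a scalar model is uncertain, b ~ N(b_mean, b_var), independent of the state; the state mean and variance are propagated in closed form, the expected quadratic cost uses E[x^2] = mu^2 + s, and exhaustive search over a small control grid replaces the paper's optimizer. Function names, the grid and all constants are hypothetical.

```python
import itertools

def rollout_expected_cost(mu, s, controls, a=1.0, b_mean=1.0, b_var=0.0, r=0.1):
    """Expected cost of a control sequence for x+ = a x + b u with b ~ N(b_mean, b_var)."""
    cost = 0.0
    for u in controls:
        mu = a * mu + b_mean * u          # E[x+]
        s = a * a * s + b_var * u * u     # Var[x+]: larger |u| injects more uncertainty
        cost += mu * mu + s + r * u * u   # E[x^2] = mu^2 + s, plus control penalty
    return cost

def mpc_first_control(mu, s, horizon=3, b_var=0.0, grid=None):
    """Search all control sequences on a grid; return the first control of the best one."""
    if grid is None:
        grid = [0.25 * i for i in range(-8, 9)]   # -2.0 ... 2.0
    best = min(itertools.product(grid, repeat=horizon),
               key=lambda seq: rollout_expected_cost(mu, s, seq, b_var=b_var))
    return best[0]

u_certain = mpc_first_control(1.0, 0.0, b_var=0.0)   # trusts the model completely
u_cautious = mpc_first_control(1.0, 0.0, b_var=1.0)  # accounts for model uncertainty
```

Because a larger control injects more predicted variance, the planner with `b_var > 0` chooses a smaller first control than the one that trusts its model. In closed loop, only the first control would be applied before re-planning from the newly observed state.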
Control Regularization for Reduced Variance Reinforcement Learning
Dealing with high variance is a significant challenge in model-free
reinforcement learning (RL). Existing methods are unreliable, exhibiting high
variance in performance from run to run using different initializations/seeds.
Focusing on problems arising in continuous control, we propose a functional
regularization approach to augmenting model-free RL. In particular, we
regularize the behavior of the deep policy to be similar to a policy prior,
i.e., we regularize in function space. We show that functional regularization
yields a bias-variance trade-off, and propose an adaptive tuning strategy to
optimize this trade-off. When the policy prior has control-theoretic stability
guarantees, we further show that this regularization approximately preserves
those stability guarantees throughout learning. We validate our approach
empirically on a range of settings, and demonstrate significantly reduced
variance, guaranteed dynamic stability, and more efficient learning than deep
RL alone. Comment: Appearing in ICML 201
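One simple way to regularize a learned policy toward a prior in action space is a fixed-weight convex blend of the two controllers' outputs. The sketch below assumes that form and a deterministic prior (both assumptions; the adaptive tuning of the weight is not shown), and `regularized_action` and `lam` are hypothetical names. It shows the bias-variance trade-off directly: blending scales the variance of the learned policy's action by 1/(1+lam)^2 while pulling its mean toward the prior.

```python
import random
import statistics

def regularized_action(u_learned, u_prior, lam):
    """Convex blend of a learned action and a prior controller's action; lam >= 0 weights the prior."""
    return (u_learned + lam * u_prior) / (1.0 + lam)

# Hypothetical noisy actions from a learned policy and a deterministic prior controller.
random.seed(0)
learned = [random.gauss(1.0, 0.5) for _ in range(1000)]
u_prior = 0.0
lam = 3.0
blended = [regularized_action(u, u_prior, lam) for u in learned]

# Variance shrinks by 1/(1+lam)^2; the mean is biased toward u_prior.
var_ratio = statistics.pvariance(blended) / statistics.pvariance(learned)
mean_learned = statistics.mean(learned)
mean_blended = statistics.mean(blended)
```

Larger `lam` lowers variance further but adds more bias toward the prior, which is why a tuning strategy for the weight is needed.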