Optimal Reinforcement Learning for Gaussian Systems
The exploration-exploitation trade-off is among the central challenges of
reinforcement learning. The optimal Bayesian solution is intractable in
general. This paper studies to what extent analytic statements about optimal
learning are possible if all beliefs are Gaussian processes. A first-order
approximation of learning of both loss and dynamics, for nonlinear,
time-varying systems in continuous time and space, subject to a relatively weak
restriction on the dynamics, is described by an infinite-dimensional partial
differential equation. An approximate finite-dimensional projection illustrates
how this result may be applied in practice.

Comment: final pre-conference version of this NIPS 2011 paper. Please note some
nontrivial changes to the exposition and interpretation of the results, in
particular in Equation (9) and Eqs. 11-14. The algorithm and results have
remained the same, but their theoretical interpretation has changed.
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Trial-and-error based reinforcement learning (RL) has seen rapid advancements
in recent times, especially with the advent of deep neural networks. However,
the majority of autonomous RL algorithms require a large number of interactions
with the environment, which may be impractical in many real-world applications,
such as robotics. In addition, many practical systems must obey constraints on
their states or controls. To reduce
the number of system interactions while simultaneously handling constraints, we
propose a model-based RL framework based on probabilistic Model Predictive
Control (MPC). In particular, we propose to learn a probabilistic transition
model using Gaussian Processes (GPs) to incorporate model uncertainty into
long-term predictions, thereby reducing the impact of model errors. We then
use MPC to find a control sequence that minimises the expected long-term cost.
We provide theoretical guarantees for first-order optimality in the GP-based
transition models with deterministic approximate inference for long-term
planning. We demonstrate that our approach not only achieves
state-of-the-art data efficiency but is also a principled way to perform RL in
constrained environments.

Comment: Accepted at AISTATS 2018
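To make the pipeline concrete, here is a heavily simplified sketch: a GP transition model fitted to random interactions with an invented 1-D toy system, and random-shooting MPC over the GP's mean rollout with a predictive-variance penalty. The paper's actual method propagates full distributions with deterministic approximate inference; the variance penalty below is only a crude stand-in, and all names and dynamics are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy 1-D system standing in for the unknown dynamics x_{t+1} = f(x_t, u_t).
def true_dynamics(x, u):
    return 0.9 * x + 0.5 * np.sin(u)

# A small batch of random interactions: (state, control) -> next state.
X = rng.uniform(-2, 2, size=(40, 2))
y = true_dynamics(X[:, 0], X[:, 1]) + 0.01 * rng.standard_normal(40)

# Probabilistic transition model: a GP over (state, control) pairs.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                              normalize_y=True).fit(X, y)

def expected_cost(x0, u_seq, target=0.0):
    """Roll the GP mean forward and accumulate a quadratic cost plus a
    predictive-variance penalty (a crude proxy for model uncertainty)."""
    x, cost = x0, 0.0
    for u in u_seq:
        mean, std = gp.predict(np.array([[x, u]]), return_std=True)
        x = float(mean[0])
        cost += (x - target) ** 2 + float(std[0]) ** 2
    return cost

def mpc_action(x0, horizon=5, n_candidates=100):
    """Random-shooting MPC: sample control sequences and apply the first
    action of the cheapest one."""
    candidates = rng.uniform(-2, 2, size=(n_candidates, horizon))
    costs = [expected_cost(x0, c) for c in candidates]
    return float(candidates[int(np.argmin(costs))][0])

u = mpc_action(x0=1.5)
```

State and control constraints would enter here by restricting the sampled candidate sequences, which is what makes the MPC formulation a natural fit for constrained RL.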
Bayesian Learning-Based Adaptive Control for Safety Critical Systems
Deep learning has enjoyed much recent success, and applying state-of-the-art
model learning methods to controls is an exciting prospect. However, there is a
strong reluctance to use these methods on safety-critical systems, which have
constraints on safety, stability, and real-time performance. We propose a
framework which satisfies these constraints while allowing the use of deep
neural networks for learning model uncertainties. Central to our method is the
use of Bayesian model learning, which provides an avenue for maintaining
appropriate degrees of caution in the face of the unknown. In the proposed
approach, we develop an adaptive control framework leveraging the theory of
stochastic CLFs (Control Lyapunov Functions) and stochastic CBFs (Control
Barrier Functions) along with tractable Bayesian model learning via Gaussian
Processes or Bayesian neural networks. Under reasonable assumptions, we
guarantee stability and safety while adapting to unknown dynamics with
probability 1. We demonstrate this architecture for high-speed terrestrial
mobility targeting potential applications in safety-critical high-speed Mars
rover missions.

Comment: Corrected an error in Section II, where the problem was previously
introduced in a non-stochastic setting and it was wrongly assumed that the
solution to an ODE with Gaussian-distributed parametric uncertainty is
equivalent to an SDE with a learned diffusion term. See Lew, T. et al., "On the
Problem of Reformulating Systems with Uncertain Dynamics as a Stochastic
Differential Equation".
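The barrier-function idea behind this safety guarantee can be illustrated with a deterministic toy example (the paper itself works with stochastic CBFs and Bayesian-learned dynamics, which this sketch does not reproduce). With invented dynamics x' = u and barrier h(x) = x_max - x, the CBF condition reduces to a simple clipping rule on the desired control:

```python
# Barrier h(x) = x_max - x keeps the state below x_max. With the toy
# dynamics x' = u, the CBF condition dh/dt >= -alpha * h(x) becomes
# -u >= -alpha * (x_max - x), i.e. u <= alpha * (x_max - x).
def cbf_filter(x, u_des, x_max=1.0, alpha=2.0):
    return min(u_des, alpha * (x_max - x))

# Push toward the boundary: the filtered trajectory stays safe.
x, dt, traj = 0.0, 0.05, []
for _ in range(200):
    x += dt * cbf_filter(x, u_des=1.0)
    traj.append(x)
```

The filtered control lets the nominal behaviour through far from the boundary and vanishes as the state approaches it, so the safe set is forward-invariant; in the stochastic, learned-dynamics setting the same condition is enforced in expectation against the Bayesian model.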
Gaussian Max-Value Entropy Search for Multi-Agent Bayesian Optimization
We study the multi-agent Bayesian optimization (BO) problem, where multiple
agents maximize a black-box function via iterative queries. We focus on Entropy
Search (ES), a sample-efficient BO algorithm that selects queries to maximize
the mutual information about the maximum of the black-box function. One of the
main challenges of ES is that calculating the mutual information requires
computationally costly approximation techniques. For multi-agent BO problems,
the computational cost of ES is exponential in the number of agents. To address
this challenge, we propose the Gaussian Max-value Entropy Search, a multi-agent
BO algorithm with favorable sample and computational efficiency. The key idea
is to use a normal distribution to approximate the function maximum and
calculate its mutual information accordingly. The resulting approximation
allows queries to be cast as the solution of a closed-form optimization problem
which, in turn, can be solved via a modified gradient ascent algorithm and
scaled to a large number of agents. We demonstrate the effectiveness of
Gaussian Max-value Entropy Search through numerical experiments on standard
test functions and real-robot experiments on the source-seeking problem.
Results show that the proposed algorithm outperforms the multi-agent BO
baselines in the numerical experiments and can stably seek the source with a
limited number of noisy observations on real robots.

Comment: 10 pages, 9 figures
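A single-agent sketch of the core approximation may help: fit a Gaussian to sampled posterior maxima of a GP, then score candidate queries with the standard max-value entropy-search expression, collapsing f* to its approximating mean. The toy function, grid, and sample counts are invented; the paper's exact closed-form objective and multi-agent gradient-ascent solver are not reproduced here.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
f = lambda x: -np.sin(3 * x) - x ** 2 + 0.7 * x   # toy black-box function

# A few initial noisy queries.
X = rng.uniform(-1, 2, size=(5, 1))
y = f(X[:, 0]) + 0.01 * rng.standard_normal(5)
gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

grid = np.linspace(-1, 2, 200)[:, None]

# Gaussian approximation of the maximum f*: sample posterior functions
# on the grid and fit a normal distribution to their maxima.
samples = gp.sample_y(grid, n_samples=50, random_state=2)   # shape (200, 50)
mu_star = samples.max(axis=0).mean()

# Max-value entropy-search acquisition with f* collapsed to mu_star;
# gamma is the standardised gap between mu_star and the posterior mean.
mu, sigma = gp.predict(grid, return_std=True)
gamma = (mu_star - mu) / (sigma + 1e-9)
acq = gamma * norm.pdf(gamma) / (2 * norm.cdf(gamma) + 1e-12) - norm.logcdf(gamma)

x_next = float(grid[np.argmax(acq), 0])   # next query location
```

Because every term above is an elementary function of the GP posterior, the acquisition is cheap to evaluate and differentiate, which is what lets the multi-agent version scale where sampling-based entropy estimates do not.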
Probabilistic models for data efficient reinforcement learning
Trial-and-error based reinforcement learning (RL) has seen rapid advancements
in recent times, especially with the advent of deep neural networks. However,
standard deep learning methods often overlook the progress made in control
theory by treating systems as black boxes. We propose a model-based RL framework based
on probabilistic Model Predictive Control (MPC). In particular, we propose to learn
a probabilistic transition model using Gaussian Processes (GPs) to incorporate model
uncertainty into long-term predictions, thereby reducing the impact of model errors. We
provide theoretical guarantees for first-order optimality in the GP-based transition models
with deterministic approximate inference for long-term planning. We demonstrate that
our approach not only achieves state-of-the-art data efficiency but is also a
principled way to perform RL in constrained environments.
When the true state of the dynamical system cannot be fully observed, standard
model-based methods cannot be applied directly. Such systems require an
additional state-estimation step. We propose distributed message passing for state estimation in
non-linear dynamical systems. In particular, we propose to use expectation propagation
(EP) to iteratively refine the state estimate, i.e., the Gaussian posterior distribution on the
latent state. We show two things: (a) Classical Rauch-Tung-Striebel (RTS) smoothers,
such as the extended Kalman smoother (EKS) or the unscented Kalman smoother (UKS),
are special cases of our message passing scheme; (b) running the message passing
scheme more than once can lead to significant improvements over the classical RTS
smoothers. We show the explicit connection between message passing with EP and
well-known RTS smoothers and provide a practical implementation of the suggested
algorithm. Furthermore, we address convergence issues of EP by generalising this
framework to damped updates and to general α-divergences.
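The special case in (a) is easy to make concrete. Below is a minimal linear-Gaussian RTS smoother, one forward Kalman-filter pass and one backward pass, on an invented scalar random-walk model; the EP scheme generalises this by iterating such passes on nonlinear models, which this sketch does not attempt.

```python
import numpy as np

# Linear-Gaussian model: x_t = A x_{t-1} + w_t,  y_t = H x_t + v_t.
A, H = np.array([[1.0]]), np.array([[1.0]])
Q, R = np.array([[0.1]]), np.array([[0.5]])

def rts_smoother(ys, m0=0.0, P0=1.0):
    """One forward Kalman-filter pass followed by one backward RTS pass."""
    m, P = np.array([m0]), np.array([[P0]])
    m_f, P_f, m_p, P_p = [], [], [], []
    for yt in ys:                               # forward filtering
        mp, Pp = A @ m, A @ P @ A.T + Q         # predict
        K = Pp @ H.T @ np.linalg.inv(H @ Pp @ H.T + R)
        m = mp + K @ (np.array([yt]) - H @ mp)  # measurement update
        P = Pp - K @ H @ Pp
        m_p.append(mp); P_p.append(Pp); m_f.append(m); P_f.append(P)
    ms, Ps = [m_f[-1]], [P_f[-1]]
    for t in range(len(ys) - 2, -1, -1):        # backward smoothing
        G = P_f[t] @ A.T @ np.linalg.inv(P_p[t + 1])
        ms.insert(0, m_f[t] + G @ (ms[0] - m_p[t + 1]))
        Ps.insert(0, P_f[t] + G @ (Ps[0] - P_p[t + 1]) @ G.T)
    return np.array(ms).ravel(), np.array(Ps).ravel()

# Smooth noisy observations of a random walk.
rng = np.random.default_rng(3)
x_true = np.cumsum(rng.standard_normal(30) * np.sqrt(0.1))
ys = x_true + rng.standard_normal(30) * np.sqrt(0.5)
ms, Ps = rts_smoother(ys)
```

A single such forward-backward sweep is exactly one round of the message-passing scheme; point (b) above says that, for nonlinear models, repeating the sweep with EP-refined messages can improve on this one-shot estimate.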
Probabilistic models can also be used to generate synthetic data. In model-based RL
we use 'synthetic' data as a proxy for real environments and in order to achieve high data
efficiency. The ability to generate high-fidelity synthetic data is crucial when available
(real) data is limited as in RL or where privacy and data protection standards allow
only for limited use of the given data, e.g., in medical and financial data-sets. Current
state-of-the-art methods for synthetic data generation are based on generative models,
such as Generative Adversarial Networks (GANs). Even though GANs have achieved
remarkable results in synthetic data generation, they are often challenging to interpret.
Furthermore, GAN-based methods can struggle with mixed real-valued and categorical
variables. Moreover, the design of the loss function (the discriminator loss) is itself
problem-specific, i.e., the generative model may not be useful for tasks it was not
explicitly trained for. We therefore propose to use a probabilistic model as a synthetic data generator.
Learning the probabilistic model for the data is equivalent to estimating the density of
the data. Based on the copula theory, we divide the density estimation task into two parts,
i.e., estimating univariate marginals and estimating the multivariate copula density over
the univariate marginals. We use normalising flows to learn both the copula density and
univariate marginals. We benchmark our method on both simulated and real data-sets in
terms of density estimation as well as the ability to generate high-fidelity synthetic data.
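The two-part split can be sketched in a few lines. The simplification below uses an empirical rank transform for the marginals and a Gaussian copula for the dependence structure, whereas the thesis learns both the copula density and the marginals with normalising flows; the toy data set is invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy data with dependent, non-Gaussian marginals.
z = rng.standard_normal((500, 2))
data = np.column_stack([np.exp(z[:, 0]), z[:, 0] + 0.5 * z[:, 1]])

# Part 1: univariate marginals via empirical CDFs (rank transform).
u = (stats.rankdata(data, axis=0) - 0.5) / len(data)   # uniforms in (0, 1)

# Part 2: copula over the marginals; here a Gaussian copula estimated
# from the correlation of the normal scores.
scores = stats.norm.ppf(u)
corr = np.corrcoef(scores.T)

# Generate synthetic data: sample the copula, then invert the marginals.
g = rng.multivariate_normal(np.zeros(2), corr, size=1000)
u_new = stats.norm.cdf(g)
synthetic = np.column_stack(
    [np.quantile(data[:, j], u_new[:, j]) for j in range(2)]
)
```

Separating the marginals from the copula is what makes the approach interpretable and robust to mixed variable types: each marginal is estimated on its own natural scale, and only the dependence structure is modelled jointly.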