Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Trial-and-error based reinforcement learning (RL) has seen rapid advances in
recent years, especially with the advent of deep neural networks. However, the
majority of autonomous RL algorithms require a large number of interactions
with the environment, which may be impractical in many real-world applications
such as robotics; moreover, many practical systems must obey limitations in the
form of state-space or control constraints. To reduce
the number of system interactions while simultaneously handling constraints, we
propose a model-based RL framework based on probabilistic Model Predictive
Control (MPC). In particular, we propose to learn a probabilistic transition
model using Gaussian Processes (GPs) to incorporate model uncertainty into
long-term predictions, thereby reducing the impact of model errors. We then
use MPC to find a control sequence that minimises the expected long-term cost.
We provide theoretical guarantees for first-order optimality in the GP-based
transition models with deterministic approximate inference for long-term
planning. We demonstrate that our approach not only achieves state-of-the-art
data efficiency but is also a principled way to perform RL in constrained
environments.
Comment: Accepted at AISTATS 2018
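The core loop described above — learn a GP transition model, then plan with MPC under the model's predictive uncertainty — can be sketched as follows. The 1-D toy system, RBF kernel hyperparameters, random-shooting optimizer, and variance-penalized cost are all illustrative assumptions, not the paper's formulation (which uses deterministic approximate inference for long-term predictions rather than propagating only the mean).

```python
import numpy as np

def rbf(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel on row-vector inputs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

class GPDynamics:
    """GP model of the one-step transition x' = f(x, u) + noise."""
    def __init__(self, X, U, Xn, sn=0.1):
        self.Z = np.column_stack([X, U])
        self.Xn = Xn
        K = rbf(self.Z, self.Z) + sn**2 * np.eye(len(X))
        self.Kinv = np.linalg.inv(K)

    def predict(self, x, u):
        z = np.array([[x, u]])
        k = rbf(z, self.Z)
        mu = (k @ self.Kinv @ self.Xn).item()
        var = (rbf(z, z) - k @ self.Kinv @ k.T).item()
        return mu, max(var, 0.0)

def mpc_action(gp, x0, horizon=5, n_candidates=64, rng=None):
    """Random-shooting MPC: return the first action of the candidate
    sequence with the lowest cost. Predictive variance inflates the
    cost, so the controller avoids regions the model is unsure about."""
    rng = rng or np.random.default_rng(0)
    best_cost, best_u = np.inf, 0.0
    for _ in range(n_candidates):
        us = rng.uniform(-1.0, 1.0, horizon)
        x, cost = x0, 0.0
        for u in us:
            mu, var = gp.predict(x, u)
            cost += mu**2 + var       # expected quadratic state cost
            x = mu                    # propagate the mean only (simplification)
        if cost < best_cost:
            best_cost, best_u = cost, us[0]
    return best_u

# Illustrative usage on a toy linear system x' = 0.8x + 0.5u
rng = np.random.default_rng(1)
X, U = rng.uniform(-1, 1, 30), rng.uniform(-1, 1, 30)
Xn = 0.8 * X + 0.5 * U + 0.01 * rng.standard_normal(30)
gp = GPDynamics(X, U, Xn)
u0 = mpc_action(gp, x0=1.0)
```

Penalizing the predictive variance is one simple way to make planning uncertainty-aware; the paper's treatment of expected long-term cost is more refined.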
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks. Copyright 2011 by the author(s)/owner(s)
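The PILCO recipe — propagate a state distribution through a probabilistic model in closed form, then improve the policy by gradient descent on the expected long-term cost — can be illustrated with a toy linear-Gaussian model standing in for the GP. The model parameters, the linear policy class, and the finite-difference gradient (standing in for PILCO's analytic policy gradients) are all illustrative assumptions.

```python
a, b, sigma2 = 0.9, 0.4, 0.05   # assumed learned model: x' ~ N(a*x + b*u, sigma2)

def expected_cost(k, m0=1.0, v0=0.0, horizon=10):
    """E[sum_t x_t^2] under the policy u = -k*x, with the state's mean
    and variance propagated through the model in closed form."""
    m, v, J = m0, v0, 0.0
    for _ in range(horizon):
        c = a - b * k               # closed-loop gain
        m, v = c * m, c**2 * v + sigma2
        J += m**2 + v               # E[x^2] = mean^2 + variance
    return J

def improve_policy(k=0.0, lr=0.05, iters=200, eps=1e-4):
    """Gradient descent on the expected cost (finite differences stand
    in for PILCO's analytic gradients)."""
    for _ in range(iters):
        g = (expected_cost(k + eps) - expected_cost(k - eps)) / (2 * eps)
        k -= lr * g
    return k
```

Because the cost is an expectation over the full state distribution, the model's uncertainty (sigma2) directly shapes the optimized policy, which is the mechanism by which PILCO reduces model bias.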
Agile Autonomous Driving using End-to-End Deep Imitation Learning
We present an end-to-end imitation learning system for agile, off-road
autonomous driving using only low-cost sensors. By imitating a model predictive
controller equipped with advanced sensors, we train a deep neural network
control policy to map raw, high-dimensional observations to continuous steering
and throttle commands. Compared with recent approaches to similar tasks, our
method requires neither state estimation nor on-the-fly planning to navigate
the vehicle. Our approach relies on, and experimentally validates, recent
imitation learning theory. Empirically, we show that policies trained with
online imitation learning overcome well-known challenges related to covariate
shift and generalize better than policies trained with batch imitation
learning. Built on these insights, our autonomous driving system demonstrates
successful high-speed off-road driving, matching state-of-the-art performance.
Comment: 13 pages, Robotics: Science and Systems (RSS) 201
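The online imitation learning the abstract contrasts with batch learning can be sketched in the DAgger style its theory builds on: the learner's own policy generates the visited states, the expert labels those states, and the policy is refit on the aggregated dataset — which is exactly what counters covariate shift. The 1-D dynamics, the linear expert standing in for the model predictive controller, and the least-squares policy class are illustrative assumptions.

```python
import numpy as np

def expert(x):                     # stand-in for the MPC expert with good sensors
    return -0.8 * x

def rollout(policy_w, x0=2.0, steps=20):
    """Roll out the LEARNER's policy, so training states come from the
    learner's own state distribution (the key difference from batch IL)."""
    xs, x = [], x0
    for _ in range(steps):
        xs.append(x)
        x = x + policy_w * x       # learner's action drives the state
    return np.array(xs)

def dagger(iters=5):
    w, X, Y = 0.0, [], []          # start from a do-nothing policy
    for _ in range(iters):
        xs = rollout(w)
        X.extend(xs)
        Y.extend(expert(x) for x in xs)      # expert labels visited states
        Xa, Ya = np.array(X), np.array(Y)
        w = (Xa @ Ya) / (Xa @ Xa)            # least-squares fit of u = w*x
    return w
```

A batch learner trained only on the expert's own trajectories never sees the states its mistakes produce; the aggregation loop above does, which is why the online variant generalizes better.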
Pseudospectral Model Predictive Control under Partially Learned Dynamics
Trajectory optimization of a controlled dynamical system is an essential part
of autonomy, however many trajectory optimization techniques are limited by the
fidelity of the underlying parametric model. In the field of robotics, a lack
of model knowledge can be overcome with machine learning techniques, utilizing
measurements to build a dynamical model from the data. This paper aims to take
the middle ground between these two approaches by introducing a semi-parametric
representation of the underlying system dynamics. Our goal is to leverage the
considerable information contained in a traditional physics based model and
combine it with a data-driven, non-parametric regression technique known as a
Gaussian Process. Integrating this semi-parametric model with model predictive
pseudospectral control, we demonstrate this technique on both a cart pole and
quadrotor simulation with unmodeled damping and parametric error. In order to
manage parametric uncertainty, we introduce an algorithm that utilizes Sparse
Spectrum Gaussian Processes (SSGP) for online learning after each rollout. We
implement this online learning technique on a cart pole and quadrotor, then
demonstrate the use of online learning and obstacle avoidance for Dubins
vehicle dynamics.
Comment: Accepted but withdrawn from AIAA Scitech 201
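The semi-parametric split — a physics-based model supplies the bulk of the prediction while a GP regresses only the residual — can be sketched on a scalar system. The toy integrator, the unmodeled damping term, and the kernel settings are illustrative assumptions, not the paper's cart-pole or quadrotor models.

```python
import numpy as np

def f_phys(x, u, dt=0.1):          # parametric model: undamped integrator
    return x + dt * u

def f_true(x, u, dt=0.1):          # reality has unmodeled damping
    return x + dt * (u - 0.5 * x)

def rbf(A, B, ell=0.5):
    return np.exp(-0.5 * (A[:, None] - B[None, :])**2 / ell**2)

class SemiParametricModel:
    """Physics prior plus GP correction fit to the model's residual error."""
    def __init__(self, X, U, Xn, sn=1e-3):
        self.X = X
        resid = Xn - f_phys(X, U)                 # GP sees only the model error
        K = rbf(X, X) + sn**2 * np.eye(len(X))
        self.alpha = np.linalg.solve(K, resid)

    def predict(self, x, u):
        gp_corr = rbf(np.array([x]), self.X) @ self.alpha
        return f_phys(x, u) + gp_corr.item()

# Illustrative usage: train on noise-free transitions of the true system
X = np.linspace(-1.0, 1.0, 20)
U = np.zeros(20)
model = SemiParametricModel(X, U, f_true(X, U))
```

Because the GP only has to capture the (small, smooth) residual rather than the full dynamics, it needs far less data than a purely non-parametric model of the same system.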
Stochastic MPC Design for a Two-Component Granulation Process
We address the control of a stochastic two-component granulation process in
pharmaceutical applications using Stochastic Model Predictive Control (SMPC)
and model reduction to obtain the desired particle
distribution. We first use the method of moments to reduce the governing
integro-differential equation to a nonlinear ordinary differential
equation (ODE). This reduced-order model is employed in the SMPC formulation.
The probabilistic constraints in this formulation keep the variance of
particles' drug concentration in an admissible range. To solve the resulting
stochastic optimization problem, we first employ polynomial chaos expansion to
obtain the Probability Distribution Function (PDF) of the future state
variables using the uncertain variables' distributions. As a result, the
original stochastic optimization problem for a particulate system is converted
to a deterministic dynamic optimization. This approximation lessens the
computational burden of the controller and makes its real-time application
possible.
Comment: American Control Conference, May 201
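The polynomial chaos step — turning a stochastic problem into a deterministic one — can be illustrated on a scalar toy system: the uncertain state is expanded in Hermite polynomials of a standard-normal parameter, and Gauss-Hermite quadrature converts the expectations into deterministic sums. The dynamics and the second-order truncation are illustrative, not the paper's granulation model.

```python
import numpy as np

def propagate(theta, x0=1.0, steps=5):
    """State after a few steps of x' = (0.8 + 0.1*theta) * x,
    where theta is the uncertain parameter."""
    x = x0
    for _ in range(steps):
        x = (0.8 + 0.1 * theta) * x
    return x

# Probabilists' Hermite polynomials He_0..He_2, orthogonal under N(0,1)
hermite = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2 - 1]
norms = [1.0, 1.0, 2.0]                  # E[He_k(theta)^2]

# Gauss-Hermite nodes/weights for weight exp(-t^2/2), rescaled so the
# weights sum to 1 (i.e., a quadrature rule for the standard normal)
nodes, weights = np.polynomial.hermite_e.hermegauss(8)
weights = weights / np.sqrt(2 * np.pi)

# PCE coefficients c_k = E[x(theta) He_k(theta)] / E[He_k^2] -- each a
# deterministic sum, no sampling involved
coeffs = [np.sum(weights * propagate(nodes) * H(nodes)) / n
          for H, n in zip(hermite, norms)]

mean = coeffs[0]                                         # E[x]
var = sum(c**2 * n for c, n in zip(coeffs[1:], norms[1:]))   # truncated Var[x]
```

Once mean and variance of future states are available as deterministic functions of the decision variables, the probabilistic constraints become ordinary algebraic constraints, which is what makes the controller tractable in real time.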
Bayesian model predictive control: Efficient model exploration and regret bounds using posterior sampling
Tight performance specifications in combination with operational constraints
make model predictive control (MPC) the method of choice in various industries.
As the performance of an MPC controller depends on a sufficiently accurate
objective and prediction model of the process, a significant effort in the MPC
design procedure is dedicated to modeling and identification. Driven by the
increasing amount of available system data and advances in the field of machine
learning, data-driven MPC techniques have been developed to facilitate the MPC
controller design. While these methods are able to leverage available data,
they typically do not provide principled mechanisms to automatically trade off
exploitation of available data and exploration to improve and update the
objective and prediction model. To this end, we present a learning-based MPC
formulation using posterior sampling techniques, which provides finite-time
regret bounds on the learning performance while being simple to implement using
off-the-shelf MPC software and algorithms. The performance analysis of the
method is based on posterior sampling theory and its practical efficiency is
illustrated using a numerical example of a highly nonlinear dynamical
car-trailer system.
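The posterior-sampling mechanism can be sketched on a scalar system: each episode draws one dynamics model from the current posterior, controls against that sample, and updates the posterior with the observed transitions, so exploration falls out of posterior uncertainty rather than a hand-tuned heuristic. The scalar dynamics, the conjugate Gaussian posterior, and the certainty-equivalent controller standing in for the MPC are all illustrative assumptions.

```python
import numpy as np

a_true, noise = 0.7, 0.05      # unknown gain in x' = a*x + u + w (ground truth)

def episode(a_sample, rng, x0=1.0, steps=10):
    """Run one episode controlling the TRUE system as if the sampled
    model were correct (u = -a_sample*x drives the sampled model to 0)."""
    data, x = [], x0
    for _ in range(steps):
        u = -a_sample * x
        xn = a_true * x + u + noise * rng.standard_normal()
        data.append((x, u, xn))
        x = xn
    return data

def thompson_mpc(episodes=20, seed=0):
    """Posterior sampling: draw a model, act on it, update the posterior."""
    rng = np.random.default_rng(seed)
    mu, prec = 0.0, 1.0                    # Gaussian prior over the gain a
    for _ in range(episodes):
        a_sample = mu + rng.standard_normal() / np.sqrt(prec)
        for x, u, xn in episode(a_sample, rng):
            # conjugate Gaussian update for the regression xn - u = a*x + w
            prec_new = prec + x**2 / noise**2
            mu = (prec * mu + x * (xn - u) / noise**2) / prec_new
            prec = prec_new
    return mu
```

Early episodes act on uncertain samples and therefore explore; as the posterior concentrates, the sampled models converge to the truth and the controller becomes exploitative, which is the behavior the regret bounds formalize.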