10 research outputs found
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be
capable of learning a wide range of robotic skills, but typically require a
very large number of samples to achieve good performance. Model-based
algorithms, in principle, can provide for much more efficient learning, but
have proven difficult to extend to expressive, high-capacity models such as
deep neural networks. In this work, we demonstrate that medium-sized neural
network models can in fact be combined with model predictive control (MPC) to
achieve excellent sample complexity in a model-based reinforcement learning
algorithm, producing stable and plausible gaits to accomplish various complex
locomotion tasks. We also propose using deep neural network dynamics models to
initialize a model-free learner, in order to combine the sample efficiency of
model-based approaches with the high task-specific performance of model-free
methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure
model-based approach trained on just random action data can follow arbitrary
trajectories with excellent sample efficiency, and that our hybrid algorithm
can accelerate model-free learning on high-speed benchmark tasks, achieving
sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents.
Videos can be found at https://sites.google.com/view/mbm
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
Trial-and-error based reinforcement learning (RL) has seen rapid advancements
in recent times, especially with the advent of deep neural networks. However,
the majority of autonomous RL algorithms require a large number of interactions
with the environment. A large number of interactions may be impractical in many
real-world applications, such as robotics, and many practical systems have to
obey limitations in the form of state space or control constraints. To reduce
the number of system interactions while simultaneously handling constraints, we
propose a model-based RL framework based on probabilistic Model Predictive
Control (MPC). In particular, we propose to learn a probabilistic transition
model using Gaussian Processes (GPs) to incorporate model uncertainty into
long-term predictions, thereby, reducing the impact of model errors. We then
use MPC to find a control sequence that minimises the expected long-term cost.
We provide theoretical guarantees for first-order optimality in the GP-based
transition models with deterministic approximate inference for long-term
planning. We demonstrate that our approach does not only achieve
state-of-the-art data efficiency, but also is a principled way for RL in
constrained environments.Comment: Accepted at AISTATS 2018
Constructing Parsimonious Analytic Models for Dynamic Systems via Symbolic Regression
Developing mathematical models of dynamic systems is central to many
disciplines of engineering and science. Models facilitate simulations, analysis
of the system's behavior, decision making and design of automatic control
algorithms. Even inherently model-free control techniques such as reinforcement
learning (RL) have been shown to benefit from the use of models, typically
learned online. Any model construction method must address the tradeoff between
the accuracy of the model and its complexity, which is difficult to strike. In
this paper, we propose to employ symbolic regression (SR) to construct
parsimonious process models described by analytic equations. We have equipped
our method with two different state-of-the-art SR algorithms which
automatically search for equations that fit the measured data: Single Node
Genetic Programming (SNGP) and Multi-Gene Genetic Programming (MGGP). In
addition to the standard problem formulation in the state-space domain, we show
how the method can also be applied to input-output models of the NARX
(nonlinear autoregressive with exogenous input) type. We present the approach
on three simulated examples with up to 14-dimensional state space: an inverted
pendulum, a mobile robot, and a bipedal walking robot. A comparison with deep
neural networks and local linear regression shows that SR in most cases
outperforms these commonly used alternative methods. We demonstrate on a real
pendulum system that the analytic model found enables a RL controller to
successfully perform the swing-up task, based on a model constructed from only
100 data samples
Learning Terrain Dynamics: A Gaussian Process Modeling and Optimal Control Adaptation Framework Applied to Robotic Jumping
The complex dynamics characterizing deformable terrain presents significant impediments toward the real-world viability of locomotive robotics, particularly for legged machines. We explore vertical, robotic jumping as a model task for legged locomotion on presumed-uncharacterized, nonrigid terrain. By integrating Gaussian process (GP)-based regression and evaluation to estimate ground reaction forces as a function of the state, a 1-D jumper acquires the capability to learn forcing profiles exerted by its environment in tandem with achieving its control objective. The GP-based dynamical model initially assumes a baseline rigid, noncompliant surface. As part of an iterative procedure, the optimizer employing this model generates an optimal control strategy to achieve a target jump height. Experiential data recovered from execution on the true surface model are applied to train the GP, in turn, providing the optimizer a more richly informed dynamical model of the environment. The iterative control-learning procedure was rigorously evaluated in experiment, over different surface types, whereby a robotic hopper was challenged to jump to several different target heights. Each task was achieved within ten attempts, over which the terrain's dynamics were learned. With each iteration, GP predictions of ground forcing became incrementally refined, rapidly matching experimental force measurements. The few-iteration convergence demonstrates a fundamental capacity to both estimate and adapt to unknown terrain dynamics in application-realistic time scales, all with control tools amenable to robotic legged locomotion
Achieving Practical Functional Electrical Stimulation-driven Reaching Motions In An Individual With Tetraplegia
Functional electrical stimulation (FES) is a promising technique for restoring the ability to complete reaching motions to individuals with tetraplegia due to a spinal cord injury (SCI). FES has proven to be a successful technique for controlling many functional tasks such as grasping, standing, and even limited walking. However, translating these successes to reaching motions has proven difficult due to the complexity of the arm and the goaldirected nature of reaching motions. The state-of-the-art systems either use robots to assist the FES-driven reaching motions or control the arm of healthy subjects to complete planar motions. These controllers do not directly translate to controlling the full-arm of an individual with tetraplegia because the muscle capabilities of individuals with spinal cord injuries are unique and often limited due to muscle atrophy and the loss of function caused by lower motor neuron damage. This dissertation aims to develop a full-arm FES-driven reaching controller that is capable of achieving 3D reaching motions in an individual with a spinal cord injury. Aim 1 was to develop a complete-arm FES-driven reaching controller that can hold static hand positions for an individual with high tetraplegia due to SCI. We developed a combined feedforward-feedback controller which used the subject-specific model to automatically determine the muscle stimulation commands necessary to hold a desired static hand position. Aim 2 was to develop a subject-specific model-based control strategy to use FES to drive the arm of an individual with high tetraplegia due to SCI along a desired path in the subject’s workspace. We used trajectory optimization to find feasible trajectories which explicitly account for the unique muscle characteristics and the simulated arm dynamics of our subject with tetraplegia. We then developed a model predictive control controller to iii control the arm along the desired trajectory. The controller developed in this dissertation is a significant step towards restoring full arm reaching function to individuals with spinal cord injuries