10 research outputs found

    Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

    Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that medium-sized neural network models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits to accomplish various complex locomotion tasks. We also propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbm
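The planning step this abstract describes, a learned neural-network dynamics model queried inside MPC, can be sketched with a simple random-shooting planner. This is an illustrative minimal version under assumed interfaces, not the paper's implementation: `dyn_model` and `cost_fn` stand in for the learned network and the task cost.

```python
import numpy as np

def random_shooting_mpc(dyn_model, cost_fn, state, horizon=10,
                        n_candidates=500, action_dim=2, rng=None):
    """Return the first action of the lowest-cost random action sequence.

    dyn_model(state, action) -> next_state is the learned dynamics model;
    cost_fn(state, action) -> scalar is the task cost to minimize.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Sample candidate action sequences uniformly in [-1, 1].
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    costs = np.zeros(n_candidates)
    states = np.tile(np.asarray(state, dtype=float), (n_candidates, 1))
    for t in range(horizon):
        step = actions[:, t]
        # Accumulate cost, then roll every candidate forward through the model.
        costs += np.array([cost_fn(s, a) for s, a in zip(states, step)])
        states = np.array([dyn_model(s, a) for s, a in zip(states, step)])
    return actions[np.argmin(costs), 0]
```

In MPC fashion, only this first action would be executed before replanning from the next observed state.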

    Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control

    Trial-and-error based reinforcement learning (RL) has seen rapid advances in recent years, especially with the advent of deep neural networks. However, most autonomous RL algorithms require a large number of interactions with the environment, which may be impractical in many real-world applications such as robotics; moreover, many practical systems must obey limitations in the form of state-space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way to do RL in constrained environments. (Comment: Accepted at AISTATS 2018.)
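The GP transition-model regression at the heart of such a framework can be illustrated with the closed-form GP posterior in plain NumPy. This is a generic GP-regression sketch, not the paper's full framework (no moment matching or long-term uncertainty propagation), and the kernel hyperparameters are arbitrary:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-4, **kw):
    """Closed-form GP posterior mean and variance at X_test."""
    K = rbf_kernel(X_train, X_train, **kw) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train, **kw)
    K_ss = rbf_kernel(X_test, X_test, **kw)
    L = np.linalg.cholesky(K)                     # stable inversion of K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - (v**2).sum(0)           # predictive variance
    return mean, var
```

The predictive variance is what lets a probabilistic MPC scheme discount plans that rely on poorly-modelled regions of the state space.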

    Constructing Parsimonious Analytic Models for Dynamic Systems via Symbolic Regression

    Developing mathematical models of dynamic systems is central to many disciplines of engineering and science. Models facilitate simulations, analysis of the system's behavior, decision making and design of automatic control algorithms. Even inherently model-free control techniques such as reinforcement learning (RL) have been shown to benefit from the use of models, typically learned online. Any model construction method must address the tradeoff between the accuracy of the model and its complexity, a balance that is difficult to strike. In this paper, we propose to employ symbolic regression (SR) to construct parsimonious process models described by analytic equations. We have equipped our method with two different state-of-the-art SR algorithms which automatically search for equations that fit the measured data: Single Node Genetic Programming (SNGP) and Multi-Gene Genetic Programming (MGGP). In addition to the standard problem formulation in the state-space domain, we show how the method can also be applied to input-output models of the NARX (nonlinear autoregressive with exogenous input) type. We present the approach on three simulated examples with up to 14-dimensional state space: an inverted pendulum, a mobile robot, and a bipedal walking robot. A comparison with deep neural networks and local linear regression shows that SR in most cases outperforms these commonly used alternative methods. We demonstrate on a real pendulum system that the analytic model found enables an RL controller to successfully perform the swing-up task, based on a model constructed from only 100 data samples.
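The accuracy-versus-complexity tradeoff that SR navigates can be illustrated with a toy complexity-penalized model selection. Real SR searches expression trees with genetic programming (SNGP/MGGP); here a small fixed candidate library stands in for that search, and the complexity scores and penalty weight are arbitrary:

```python
import numpy as np

# A tiny stand-in library of candidate analytic models, each with a
# hand-assigned complexity; real SR evolves these expressions.
CANDIDATES = {
    "x": (lambda x: x, 1),
    "x^2": (lambda x: x**2, 2),
    "sin(x)": (np.sin, 2),
    "x + sin(x)": (lambda x: x + np.sin(x), 3),
}

def select_parsimonious(x, y, lam=0.01):
    """Pick the candidate minimising MSE + lam * complexity."""
    def score(item):
        f, complexity = item[1]
        return np.mean((f(x) - y) ** 2) + lam * complexity
    return min(CANDIDATES.items(), key=score)[0]
```

Because fit error and the complexity penalty are summed, a slightly worse but much simpler equation can win, which is exactly the parsimony pressure the paper exploits.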

    Learning Terrain Dynamics: A Gaussian Process Modeling and Optimal Control Adaptation Framework Applied to Robotic Jumping

    The complex dynamics characterizing deformable terrain present significant impediments to the real-world viability of locomotive robotics, particularly for legged machines. We explore vertical robotic jumping as a model task for legged locomotion on presumed-uncharacterized, nonrigid terrain. By integrating Gaussian process (GP)-based regression and evaluation to estimate ground reaction forces as a function of the state, a 1-D jumper acquires the capability to learn the forcing profiles exerted by its environment while achieving its control objective. The GP-based dynamical model initially assumes a baseline rigid, noncompliant surface. As part of an iterative procedure, the optimizer employing this model generates an optimal control strategy to achieve a target jump height. Experiential data recovered from execution on the true surface are then used to train the GP, in turn providing the optimizer a more richly informed dynamical model of the environment. The iterative control-learning procedure was rigorously evaluated in experiments over different surface types, in which a robotic hopper was challenged to jump to several different target heights. Each task was achieved within ten attempts, over which the terrain's dynamics were learned. With each iteration, GP predictions of ground forcing became incrementally refined, rapidly matching experimental force measurements. The few-iteration convergence demonstrates a fundamental capacity to both estimate and adapt to unknown terrain dynamics on application-realistic time scales, all with control tools amenable to robotic legged locomotion.
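The iterative control-learning loop described here, plan with the current surface model, execute, and refit from the measured outcome, can be caricatured in a few lines. A single scalar gain stands in for the paper's GP force model, and the hypothetical `true_height` response plays the role of the unknown compliant surface:

```python
def iterative_control_learning(true_height, target, n_iters=10):
    """Toy control-learn loop: refit a scalar surface model after each jump.

    true_height(u) -> achieved height is the unknown real surface response;
    the scalar `gain` is a stand-in for the paper's GP dynamics model.
    """
    gain = 1.0          # initial rigid, noncompliant-surface assumption
    heights = []
    u = target
    for _ in range(n_iters):
        u = target / gain       # "optimizer": invert the current model
        h = true_height(u)      # execute the jump on the true surface
        heights.append(h)
        if abs(h - target) < 1e-3:
            break
        gain = h / u            # refit the model from the measured outcome
    return u, heights
```

For a linear surface the loop converges in two attempts; the paper's GP version handles state-dependent, nonlinear forcing but follows the same plan-execute-refit cycle.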

    Achieving Practical Functional Electrical Stimulation-driven Reaching Motions In An Individual With Tetraplegia

    Functional electrical stimulation (FES) is a promising technique for restoring the ability to complete reaching motions to individuals with tetraplegia due to a spinal cord injury (SCI). FES has proven to be a successful technique for controlling many functional tasks such as grasping, standing, and even limited walking. However, translating these successes to reaching motions has proven difficult due to the complexity of the arm and the goal-directed nature of reaching motions. The state-of-the-art systems either use robots to assist the FES-driven reaching motions or control the arms of healthy subjects to complete planar motions. These controllers do not directly translate to controlling the full arm of an individual with tetraplegia because the muscle capabilities of individuals with spinal cord injuries are unique and often limited due to muscle atrophy and the loss of function caused by lower motor neuron damage. This dissertation aims to develop a full-arm FES-driven reaching controller that is capable of achieving 3D reaching motions in an individual with a spinal cord injury. Aim 1 was to develop a complete-arm FES-driven reaching controller that can hold static hand positions for an individual with high tetraplegia due to SCI. We developed a combined feedforward-feedback controller which used a subject-specific model to automatically determine the muscle stimulation commands necessary to hold a desired static hand position. Aim 2 was to develop a subject-specific model-based control strategy to use FES to drive the arm of an individual with high tetraplegia due to SCI along a desired path in the subject’s workspace. We used trajectory optimization to find feasible trajectories which explicitly account for the unique muscle characteristics and the simulated arm dynamics of our subject with tetraplegia. We then developed a model predictive controller to control the arm along the desired trajectory. The controller developed in this dissertation is a significant step towards restoring full-arm reaching function to individuals with spinal cord injuries.
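The combined feedforward-feedback idea from Aim 1 can be sketched generically: a model-derived feedforward term holds the posture while PD feedback corrects for model error, with stimulation clipped to its physical range. This is an illustrative sketch, not the dissertation's subject-specific controller; the gains, bounds, and single-channel simplification are all assumptions.

```python
import numpy as np

def stimulation_command(q_des, q, qd, feedforward, kp=5.0, kd=0.5):
    """Combined feedforward-feedback stimulation law (generic sketch).

    q_des / q / qd are desired position, measured position, and velocity;
    `feedforward` is the model-predicted stimulation that holds the posture.
    """
    u = feedforward + kp * (q_des - q) - kd * qd
    # Stimulation is physically bounded; clip to [0, 1] (0-100% of max).
    return np.clip(u, 0.0, 1.0)
```

At the desired posture with zero velocity the command reduces to the feedforward term alone; any sag below the target raises the stimulation proportionally, which is the error-correcting role feedback plays on top of the model.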

    Modeling of Magnetic Fields and Extended Objects for Localization Applications
