Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be
capable of learning a wide range of robotic skills, but typically require a
very large number of samples to achieve good performance. Model-based
algorithms, in principle, can provide for much more efficient learning, but
have proven difficult to extend to expressive, high-capacity models such as
deep neural networks. In this work, we demonstrate that medium-sized neural
network models can in fact be combined with model predictive control (MPC) to
achieve excellent sample complexity in a model-based reinforcement learning
algorithm, producing stable and plausible gaits to accomplish various complex
locomotion tasks. We also propose using deep neural network dynamics models to
initialize a model-free learner, in order to combine the sample efficiency of
model-based approaches with the high task-specific performance of model-free
methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure
model-based approach trained on just random action data can follow arbitrary
trajectories with excellent sample efficiency, and that our hybrid algorithm
can accelerate model-free learning on high-speed benchmark tasks, achieving
sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents.
Videos can be found at https://sites.google.com/view/mbm
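
The combination of a dynamics model with MPC described in the abstract above can be illustrated with a small random-shooting planner. This is only a sketch under stated assumptions, not the authors' implementation: `toy_dynamics` is a hand-written stand-in for a learned neural network model, and `toy_reward`, the horizon, and the number of candidates are hypothetical choices made for illustration.

```python
import numpy as np

def toy_dynamics(state, action):
    # Hypothetical stand-in for a learned dynamics model:
    # a 1-D point mass with state = [position, velocity].
    pos, vel = state[..., 0], state[..., 1]
    new_vel = vel + 0.1 * action[..., 0]
    new_pos = pos + 0.1 * new_vel
    return np.stack([new_pos, new_vel], axis=-1)

def toy_reward(state, action):
    # Hypothetical task: drive the position toward 1.0.
    return -(state[..., 0] - 1.0) ** 2

def random_shooting_mpc(state, dynamics, reward, rng,
                        horizon=10, n_candidates=500):
    # Sample candidate action sequences uniformly in [-1, 1].
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, 1))
    # Roll every candidate forward through the model in parallel.
    states = np.repeat(state[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        states = dynamics(states, actions[:, t])
        returns += reward(states, actions[:, t])
    # Execute only the first action of the best sequence, then replan.
    return actions[np.argmax(returns), 0]

rng = np.random.default_rng(0)
final_state = np.array([0.0, 0.0])
for _ in range(100):
    action = random_shooting_mpc(final_state, toy_dynamics, toy_reward, rng)
    final_state = toy_dynamics(final_state, action)
```

Replanning at every step is what lets a short planning horizon work despite model error: only the first action of each plan is ever executed.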
Stability-Guaranteed Reinforcement Learning for Contact-rich Manipulation
Reinforcement learning (RL) has had its fair share of success in contact-rich
manipulation tasks but it still lags behind in benefiting from advances in
robot control theory such as impedance control and stability guarantees.
Recently, the concept of variable impedance control (VIC) was adopted into RL
with encouraging results. However, the more important issue of stability
remains unaddressed. To clarify the challenge in stable RL, we introduce the
term all-the-time-stability that unambiguously means that every possible
rollout will be stability certified. Our contribution is a model-free RL method
that not only adopts VIC but also achieves all-the-time-stability. Building on
a recently proposed stable VIC controller as the policy parameterization, we
introduce a novel policy search algorithm that is inspired by Cross-Entropy
Method and inherently guarantees stability. Our experimental studies confirm
the feasibility and usefulness of the stability guarantee and also feature, to the
best of our knowledge, the first successful application of RL with
all-the-time-stability on the benchmark problem of peg-in-hole.
Comment: Accepted at Robotics and Automation Letters
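
For readers unfamiliar with the Cross-Entropy Method that inspires the policy search above, a generic version can be sketched as follows. This is not the paper's algorithm: it carries no stability guarantee, and the Gaussian sampling distribution, the `toy_return` objective, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def cem_search(objective, dim, rng, n_iters=30, pop_size=100, elite_frac=0.2):
    # Start from a broad Gaussian over parameter vectors.
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(n_iters):
        # Sample a population of candidate parameters.
        samples = rng.normal(mean, std, size=(pop_size, dim))
        scores = np.array([objective(s) for s in samples])
        # Keep the highest-scoring candidates and refit the Gaussian to them.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a zero-width distribution
    return mean

# Hypothetical objective standing in for the return of a policy rollout.
target = np.array([0.5, -1.0, 1.0])

def toy_return(params):
    return -np.sum((params - target) ** 2)

best = cem_search(toy_return, dim=3, rng=np.random.default_rng(0))
```

In a policy search setting, `objective` would evaluate a rollout of the policy defined by each parameter vector; here a quadratic stands in so the example runs on its own.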