Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be
capable of learning a wide range of robotic skills, but typically require a
very large number of samples to achieve good performance. Model-based
algorithms, in principle, can provide for much more efficient learning, but
have proven difficult to extend to expressive, high-capacity models such as
deep neural networks. In this work, we demonstrate that medium-sized neural
network models can in fact be combined with model predictive control (MPC) to
achieve excellent sample complexity in a model-based reinforcement learning
algorithm, producing stable and plausible gaits to accomplish various complex
locomotion tasks. We also propose using deep neural network dynamics models to
initialize a model-free learner, in order to combine the sample efficiency of
model-based approaches with the high task-specific performance of model-free
methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure
model-based approach trained on just random action data can follow arbitrary
trajectories with excellent sample efficiency, and that our hybrid algorithm
can accelerate model-free learning on high-speed benchmark tasks, achieving
sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents.
Videos can be found at https://sites.google.com/view/mbm
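The model-based component described above can be sketched as random-shooting model predictive control over a learned dynamics model. The sketch below is a minimal, hedged illustration in which a toy point mass stands in for both the learned neural network and the environment; the cost function, horizon, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dynamics_model(state, action):
    """Stand-in for the learned model f(s, a) -> s': a 1-D point mass
    (position, velocity) integrated with time step 0.1."""
    pos, vel = state
    return np.array([pos + 0.1 * vel, vel + 0.1 * action])

def cost(state, action):
    """Illustrative quadratic cost: reach position 1.0 with small effort."""
    return (state[0] - 1.0) ** 2 + 0.01 * action ** 2

def mpc_action(state, horizon=10, n_candidates=200, rng=None):
    """Random-shooting MPC: sample candidate action sequences, roll each
    out under the model, return the first action of the cheapest one."""
    rng = np.random.default_rng(0) if rng is None else rng
    seqs = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    totals = np.zeros(n_candidates)
    for k in range(n_candidates):
        s = state.copy()
        for t in range(horizon):
            totals[k] += cost(s, seqs[k, t])
            s = dynamics_model(s, seqs[k, t])
    return seqs[np.argmin(totals), 0]

rng = np.random.default_rng(0)
state = np.array([0.0, 0.0])
for _ in range(60):                   # closed-loop rollout
    a = mpc_action(state, rng=rng)
    state = dynamics_model(state, a)  # here the model doubles as the "real" system
print(state[0])  # final position; should settle near the 1.0 target
```

In the paper the planner replans from the real state at every step, which is what the closed loop above mimics; the model there is a trained network rather than the known dynamics used here.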
Learning to stop: a unifying principle for legged locomotion in varying environments.
Evolutionary studies have unequivocally established that living organisms transitioned from water to land. Consequently, it can be deduced that locomotion strategies must have evolved from one environment to the other. However, the mechanism by which this transition happened, and its implications for bio-mechanical studies and robotics research, have not been explored in detail. This paper presents a unifying control strategy for locomotion in varying environments based on the principle of 'learning to stop'. Using a common reinforcement learning framework, the deep deterministic policy gradient (DDPG) algorithm, we show that our proposed learning strategy facilitates a fast and safe methodology for transferring learned controllers from the forgiving water environment to the harsh land environment. Our results not only propose a plausible mechanism for the safe and quick transition of locomotion strategies from water to land but also provide a novel alternative for safer and faster training of robots.
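DDPG, the training framework named in this abstract, relies on slowly updated target networks and a one-step temporal-difference target. A minimal, hedged sketch of those two building blocks follows, with parameter vectors standing in for full networks; all names and values are illustrative, not from the paper.

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    """DDPG soft target update: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return tau * online_params + (1.0 - tau) * target_params

def td_target(reward, next_q, done, gamma=0.99):
    """Critic's TD target: y = r + gamma * Q_target(s', mu_target(s')) if s' is non-terminal."""
    return reward + gamma * (1.0 - done) * next_q

theta_target = np.zeros(4)
theta_online = np.ones(4)
for _ in range(1000):
    theta_target = soft_update(theta_target, theta_online)
print(np.round(theta_target, 3))  # slowly tracks the online parameters

y = td_target(reward=1.0, next_q=2.0, done=0.0)  # 1.0 + 0.99 * 2.0 = 2.98
```

The slow target update is what stabilizes bootstrapped value learning in DDPG, which presumably matters here for transferring controllers safely between environments.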
From Knowing to Doing: Learning Diverse Motor Skills through Instruction Learning
Recent years have witnessed many successful trials in the robot learning
field. For contact-rich robotic tasks, it is challenging to learn coordinated
motor skills by reinforcement learning. Imitation learning solves this problem
by using a mimic reward to encourage the robot to track a given reference
trajectory. However, imitation learning is not especially efficient and may constrain
the learned motion. In this paper, we propose instruction learning, which is
inspired by the human learning process and is highly efficient, flexible, and
versatile for robot motion learning. Instead of using a reference signal in the
reward, instruction learning applies a reference signal directly as a
feedforward action, and it is combined with a feedback action learned by
reinforcement learning to control the robot. In addition, we propose an action
bounding technique and remove the mimic reward, which proves crucial
for efficient and flexible learning. We compare instruction learning with
imitation learning and find that instruction learning greatly speeds up the
training process while ensuring the desired motion is learned correctly. The
effectiveness of instruction learning is validated through a range of motion
learning examples for a biped robot and a quadruped robot, where skills are
typically learned within several million steps. We
also conduct sim-to-real transfer and online learning experiments on a real
quadruped robot. Instruction learning shows clear merits and potential,
making it a promising alternative to imitation learning.