Learning Image-Conditioned Dynamics Models for Control of Under-actuated Legged Millirobots
Millirobots are a promising robotic platform for many applications due to
their small size and low manufacturing costs. Legged millirobots, in
particular, can provide increased mobility in complex environments and improved
scaling of obstacles. However, controlling these small, highly dynamic, and
underactuated legged systems is difficult. Hand-engineered controllers can
sometimes control these legged millirobots, but they have difficulties with
dynamic maneuvers and complex terrains. We present an approach for controlling
a real-world legged millirobot that is based on learned neural network models.
Using less than 17 minutes of data, our method can learn a predictive model of
the robot's dynamics that can enable effective gaits to be synthesized on the
fly for following user-specified waypoints on a given terrain. Furthermore, by
leveraging expressive, high-capacity neural network models, our approach allows
for these predictions to be directly conditioned on camera images, endowing the
robot with the ability to predict how different terrains might affect its
dynamics. This enables sample-efficient and effective learning for locomotion
of a dynamic legged millirobot on various terrains, including gravel, turf,
carpet, and styrofoam. Experiment videos can be found at
https://sites.google.com/view/imageconddy
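The central object in this abstract is a learned dynamics model whose next-state prediction is conditioned on a terrain image. A minimal sketch of that idea follows; the dimensions, the single-hidden-layer network, and the random weights (standing in for a trained model) are all illustrative assumptions, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper).
STATE_DIM, ACTION_DIM, IMG_FEAT_DIM, HIDDEN = 6, 2, 8, 32

# Randomly initialised weights stand in for a trained model.
W1 = rng.normal(0, 0.1, (STATE_DIM + ACTION_DIM + IMG_FEAT_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, STATE_DIM))
b2 = np.zeros(STATE_DIM)

def predict_next_state(state, action, img_feat):
    """Predict s_{t+1} = s_t + f(s_t, a_t, image). Predicting the state
    *delta* rather than the raw next state is a common parameterisation
    for learned dynamics models."""
    x = np.concatenate([state, action, img_feat])
    h = np.tanh(x @ W1 + b1)
    return state + h @ W2 + b2

s = np.zeros(STATE_DIM)
a = rng.uniform(-1, 1, ACTION_DIM)
img = rng.normal(size=IMG_FEAT_DIM)  # e.g. CNN features of a terrain image
s_next = predict_next_state(s, a, img)
print(s_next.shape)
```

Because the image features enter the model alongside state and action, the same network can predict different dynamics on gravel, turf, carpet, or styrofoam without a per-terrain model.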
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be
capable of learning a wide range of robotic skills, but typically require a
very large number of samples to achieve good performance. Model-based
algorithms, in principle, can provide for much more efficient learning, but
have proven difficult to extend to expressive, high-capacity models such as
deep neural networks. In this work, we demonstrate that medium-sized neural
network models can in fact be combined with model predictive control (MPC) to
achieve excellent sample complexity in a model-based reinforcement learning
algorithm, producing stable and plausible gaits to accomplish various complex
locomotion tasks. We also propose using deep neural network dynamics models to
initialize a model-free learner, in order to combine the sample efficiency of
model-based approaches with the high task-specific performance of model-free
methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure
model-based approach trained on just random action data can follow arbitrary
trajectories with excellent sample efficiency, and that our hybrid algorithm
can accelerate model-free learning on high-speed benchmark tasks, achieving
sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents.
Videos can be found at https://sites.google.com/view/mbm
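The model-based component described above pairs a learned dynamics model with model predictive control. A common way to do this, sketched below, is random-shooting MPC: sample many candidate action sequences, roll each through the model, and execute only the first action of the cheapest sequence. The linear stand-in dynamics, cost, and all dimensions here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
STATE_DIM, ACTION_DIM = 4, 2
HORIZON, N_CANDIDATES = 10, 500

def dynamics(state, action):
    # Stand-in for a trained neural-network model (a fixed linear map here).
    A = np.eye(STATE_DIM)
    B = np.ones((ACTION_DIM, STATE_DIM)) * 0.1
    return state @ A + action @ B

def cost(state, goal):
    return np.sum((state - goal) ** 2, axis=-1)

def mpc_action(state, goal):
    """Random-shooting MPC: sample candidate action sequences, roll each
    through the learned model, return the first action of the best one."""
    seqs = rng.uniform(-1, 1, (N_CANDIDATES, HORIZON, ACTION_DIM))
    s = np.tile(state, (N_CANDIDATES, 1))
    total = np.zeros(N_CANDIDATES)
    for t in range(HORIZON):
        s = dynamics(s, seqs[:, t])
        total += cost(s, goal)
    return seqs[np.argmin(total), 0]

state, goal = np.zeros(STATE_DIM), np.ones(STATE_DIM)
for _ in range(20):  # re-plan at every step, execute only the first action
    state = dynamics(state, mpc_action(state, goal))
print(cost(state, goal))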
Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion
Deep reinforcement learning (RL) can enable robots to autonomously acquire
complex behaviors, such as legged locomotion. However, RL in the real world is
complicated by constraints on efficiency, safety, and overall training
stability, which limits its practical applicability. We present APRL, a policy
regularization framework that modulates the robot's exploration over the course
of training, striking a balance between flexible improvement potential and
focused, efficient exploration. APRL enables a quadrupedal robot to efficiently
learn to walk entirely in the real world within minutes and continue to improve
with more training where prior work saturates in performance. We demonstrate
that continued training with APRL results in a policy that is substantially
more capable of navigating challenging situations and is able to adapt to
changes in dynamics with continued training.
Comment: First two authors contributed equally. Project website:
https://sites.google.com/berkeley.edu/apr
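APRL's core idea, as described above, is to modulate the robot's exploration over the course of training. A minimal sketch of one way such a schedule could work is below; the specific update rule, thresholds, and multipliers are hypothetical and not the authors' method.

```python
# Illustrative sketch (not the authors' exact rule): start with a tight
# limit on the policy's action range and grow it as training succeeds.
def update_action_limit(limit, episode_return, threshold,
                        grow=1.05, shrink=0.9, max_limit=1.0):
    """Expand the exploration range after good episodes, contract it after
    poor ones, to balance improvement potential against safe training."""
    if episode_return >= threshold:
        return min(limit * grow, max_limit)
    return max(limit * shrink, 0.1)

limit = 0.2
for ret in [1.0, 2.0, 0.1, 3.0, 4.0]:  # hypothetical episode returns
    limit = update_action_limit(limit, ret, threshold=1.5)
print(limit)
```

The appeal of this shape of curriculum is that early training stays conservative (small, safe motions), while later training is free to explore the dynamic behaviors where prior methods saturate.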
Deep Reinforcement Learning for Tensegrity Robot Locomotion
Tensegrity robots, composed of rigid rods connected by elastic cables, have a
number of unique properties that make them appealing for use as planetary
exploration rovers. However, control of tensegrity robots remains a difficult
problem due to their unusual structures and complex dynamics. In this work, we
show how locomotion gaits can be learned automatically using a novel extension
of mirror descent guided policy search (MDGPS) applied to periodic locomotion
movements, and we demonstrate the effectiveness of our approach on tensegrity
robot locomotion. We evaluate our method with real-world and simulated
experiments on the SUPERball tensegrity robot, showing that the learned
policies generalize to changes in system parameters, unreliable sensor
measurements, and variation in environmental conditions, including varied
terrains and a range of different gravities. Our experiments demonstrate that
our method not only learns fast, power-efficient feedback policies for rolling
gaits, but that these policies can succeed with only the limited onboard
sensing provided by SUPERball's accelerometers. We compare the learned feedback
policies to learned open-loop policies and hand-engineered controllers, and
demonstrate that the learned policy enables the first continuous, reliable
locomotion gait for the real SUPERball robot. Our code and other supplementary
materials are available from http://rll.berkeley.edu/drl_tensegrity
Comment: International Conference on Robotics and Automation (ICRA), 2017.
Project website link is http://rll.berkeley.edu/drl_tensegrit
Reliable Trajectories for Dynamic Quadrupeds using Analytical Costs and Learned Initializations
Dynamic traversal of uneven terrain is a major objective in the field of
legged robotics. The most recent model predictive control approaches for these
systems can generate robust dynamic motion of short duration; however, planning
over a longer time horizon may be necessary when navigating complex terrain. A
recently-developed framework, Trajectory Optimization for Walking Robots
(TOWR), computes such plans but does not guarantee their reliability on real
platforms, under uncertainty and perturbations. We extend TOWR with analytical
costs to generate trajectories that a state-of-the-art whole-body tracking
controller can successfully execute. To reduce online computation time, we
implement a learning-based scheme for initialization of the nonlinear program
based on offline experience. The execution of trajectories as long as 16
footsteps and 5.5 s over different terrains by a real quadruped demonstrates
the effectiveness of the approach on hardware. This work builds toward an
online system which can efficiently and robustly replan dynamic trajectories.
Comment: Video: https://youtu.be/LKFDB_BOhl
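The learning-based initialization described above amounts to warm-starting the nonlinear program from offline experience. One simple realization of that idea, sketched below, is nearest-neighbor retrieval over a library of previously solved problems; the descriptors, library sizes, and retrieval rule are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Offline experience: task descriptors (e.g. terrain and goal parameters)
# paired with the NLP solutions found for them. Random data stands in here.
library_tasks = rng.normal(size=(100, 5))
library_solutions = rng.normal(size=(100, 60))  # flattened trajectories

def warm_start(task):
    """Return the stored solution whose task descriptor is closest to the
    new task, to use as the nonlinear program's initial guess."""
    idx = np.argmin(np.linalg.norm(library_tasks - task, axis=1))
    return library_solutions[idx]

x0 = warm_start(rng.normal(size=5))
print(x0.shape)
```

A good initial guess of this kind is what cuts online computation time: the solver starts near a feasible, previously executed trajectory instead of from scratch.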