Shaping in Practice: Training Wheels to Learn Fast Hopping Directly in Hardware
Learning robot controllers, instead of designing them, can greatly reduce the
engineering effort required while also promoting robustness. Despite
considerable progress in simulation, applying learning directly in hardware is
still challenging, in part due to the necessity to explore potentially unstable
parameters. We explore the concept of shaping the reward landscape with
training wheels: temporary modifications of the physical hardware that
facilitate learning. We demonstrate the concept with a robot leg mounted on a
boom learning to hop fast. This proof of concept embodies typical challenges
such as instability and contact, while being simple enough to empirically map
out and visualize the reward landscape. Based on our results we propose three
criteria for designing effective training wheels for learning in robotics. A
video synopsis can be found at https://youtu.be/6iH5E3LrYh8.
Comment: Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2018; 6 pages, 6 figures
Robust Quadrupedal Locomotion via Risk-Averse Policy Learning
The robustness of legged locomotion is crucial for quadrupedal robots in
challenging terrains. Recently, Reinforcement Learning (RL) has shown promising
results in legged locomotion and various methods try to integrate privileged
distillation, scene modeling, and external sensors to improve the
generalization and robustness of locomotion policies. However, these methods
struggle to handle uncertain scenarios such as abrupt terrain changes or
unexpected external forces. In this paper, we consider a novel risk-sensitive
perspective to enhance the robustness of legged locomotion. Specifically, we
employ a distributional value function learned by quantile regression to model
the aleatoric uncertainty of environments, and perform risk-averse policy
learning by optimizing the worst-case scenarios via a risk distortion measure.
Extensive experiments in both simulation environments and a real Aliengo robot
demonstrate that our method is efficient in handling various external
disturbances, and the resulting policy exhibits improved robustness in harsh
and uncertain situations in legged locomotion. Videos are available at
https://risk-averse-locomotion.github.io/.
Comment: 8 pages, 5 figures
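The risk-averse objective described above can be illustrated with a minimal sketch. Assuming a quantile-regression critic that outputs N quantile estimates of the return distribution per action (the quantile values and the CVaR risk level below are hypothetical, not taken from the paper), a distortion measure such as Conditional Value-at-Risk (CVaR) averages only the worst quantiles, so actions with heavy left tails are penalized even when their mean return looks attractive:

```python
import numpy as np

def cvar(quantiles, alpha=0.25):
    """Conditional Value-at-Risk: mean of the worst alpha fraction of quantiles."""
    sorted_q = np.sort(quantiles)
    k = max(1, int(np.ceil(alpha * len(sorted_q))))
    return sorted_q[:k].mean()

# Hypothetical quantile estimates (N = 8 atoms) for two candidate actions.
q_risky = np.array([-2.0, 0.5, 1.0, 1.2, 1.5, 1.8, 2.0, 3.0])  # heavy left tail
q_safe  = np.array([ 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.2, 1.3])  # narrow distribution

# A risk-neutral (mean) objective prefers the risky action,
# while the CVaR objective prefers the safe one.
print(q_risky.mean(), q_safe.mean())   # risky mean is higher
print(cvar(q_risky), cvar(q_safe))     # safe CVaR is higher
```

In practice the distorted value replaces the plain expectation in the policy-gradient objective, which is what makes the learned gait conservative under disturbances.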
Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion
Deep reinforcement learning (RL) can enable robots to autonomously acquire
complex behaviors, such as legged locomotion. However, RL in the real world is
complicated by constraints on efficiency, safety, and overall training
stability, which limits its practical applicability. We present APRL, a policy
regularization framework that modulates the robot's exploration over the course
of training, striking a balance between flexible improvement potential and
focused, efficient exploration. APRL enables a quadrupedal robot to efficiently
learn to walk entirely in the real world within minutes and continue to improve
with more training where prior work saturates in performance. We demonstrate
that continued training with APRL results in a policy that is substantially
more capable of navigating challenging situations and is able to adapt to
changes in dynamics with continued training.
Comment: First two authors contributed equally. Project website: https://sites.google.com/berkeley.edu/apr