Learning When to Switch: Composing Controllers to Traverse a Sequence of Terrain Artifacts
Legged robots often use separate control policies that are highly engineered
for traversing difficult terrain such as stairs, gaps, and steps, where
switching between policies is only possible when the robot is in a region that
is common to adjacent controllers. Deep Reinforcement Learning (DRL) is a
promising alternative to hand-crafted control design, though it typically requires
the full set of test conditions to be known before training. DRL policies can
result in complex (often unrealistic) behaviours that have few or no
overlapping regions between adjacent policies, making it difficult to
switch behaviours. In this work we develop multiple DRL policies with Curriculum
Learning (CL), each of which can traverse a single respective terrain condition,
while ensuring an overlap between policies. We then train a network for each
destination policy that estimates the likelihood of successfully switching from
any other policy. We evaluate our switching method on a previously unseen
combination of terrain artifacts and show that it performs better than heuristic
methods. While our method is trained on individual terrain types, it
performs comparably to a Deep Q Network trained on the full set of terrain
conditions. This approach allows the development of separate policies in
constrained conditions with embedded prior knowledge about each behaviour, is
scalable to any number of behaviours, and prepares DRL methods for
applications in the real world.
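The switching rule described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `SwitchEstimator` type stands in for the trained per-destination network, and the name `choose_policy` and the 0.9 threshold are assumptions made for the example.

```python
from typing import Callable, Dict, Sequence

State = Sequence[float]
# Hypothetical stand-in for a trained network: maps the robot state to the
# estimated probability that switching to a destination policy succeeds.
SwitchEstimator = Callable[[State], float]


def choose_policy(current: str, upcoming: str, state: State,
                  estimators: Dict[str, SwitchEstimator],
                  threshold: float = 0.9) -> str:
    """Return the name of the policy to run on this control step.

    The threshold is an assumed value, not one taken from the paper.
    """
    if upcoming == current:
        return current
    # One estimator per destination policy, as the abstract describes.
    p_success = estimators[upcoming](state)
    # Stay with the current policy until the switch looks safe.
    return upcoming if p_success >= threshold else current
```

A heuristic baseline would replace the estimator with a fixed rule, for example switching at a set distance from the next terrain artifact.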
Guided Curriculum Learning for Walking Over Complex Terrain
Reliable bipedal walking over complex terrain is a challenging problem, and using
a curriculum can help learning. Curriculum learning is the idea of starting
with an achievable version of a task and increasing the difficulty as a success
criterion is met. We propose a 3-stage curriculum to train Deep Reinforcement
Learning policies for bipedal walking over various challenging terrains. In the
first stage, the agent starts on an easy terrain and the terrain difficulty is
gradually increased, while forces derived from a target policy are applied to
the robot joints and the base. In the second stage, the guiding forces are
gradually reduced to zero. Finally, in the third stage, random perturbations
with increasing magnitude are applied to the robot base, so that the robustness of
the policies is improved. In simulation experiments, we show that our approach
is effective in learning walking policies, separate from each other, for five
terrain types: flat, hurdles, gaps, stairs, and steps. Moreover, we demonstrate
that in the absence of human demonstrations, a simple hand designed walking
trajectory is a sufficient prior to learn to traverse complex terrain types. In
ablation studies, we show that taking out any one of the three stages of the
curriculum degrades the learning performance.
Comment: Submitted to Australasian Conference on Robotics and Automation
(ACRA) 202
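The three-stage schedule described in this abstract might be tracked with a small state object like the one below. It is a sketch under stated assumptions: the class name `Curriculum`, the normalised 0-to-1 scales, and the 0.25 increment per met success criterion are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Curriculum:
    terrain: float = 0.0   # 0 = easiest terrain, 1 = full difficulty
    guide: float = 1.0     # scale on guiding forces from the target policy
    perturb: float = 0.0   # scale on random perturbations of the base
    step: float = 0.25     # assumed increment applied per success

    @property
    def stage(self) -> int:
        """Stage 1 raises terrain, stage 2 removes guidance, stage 3 perturbs."""
        if self.terrain < 1.0:
            return 1
        if self.guide > 0.0:
            return 2
        return 3

    def on_success(self) -> None:
        """Advance the curriculum each time the success criterion is met."""
        if self.stage == 1:
            self.terrain = min(1.0, self.terrain + self.step)
        elif self.stage == 2:
            self.guide = max(0.0, self.guide - self.step)
        else:
            self.perturb = min(1.0, self.perturb + self.step)
```

A training loop would call `on_success()` whenever the agent meets the criterion and read `terrain`, `guide`, and `perturb` when configuring the next episode.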
DiffMimic: Efficient Motion Mimicking with Differentiable Physics
Motion mimicking is a foundational task in physics-based character animation.
However, most existing motion mimicking methods are built upon reinforcement
learning (RL) and suffer from heavy reward engineering, high variance, and slow
convergence with hard explorations. Specifically, they usually take tens of
hours or even days of training to mimic a simple motion sequence, resulting in
poor scalability. In this work, we leverage differentiable physics simulators
(DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our
key insight is that DPS casts a complex policy learning task to a much simpler
state matching problem. In particular, DPS learns a stable policy by analytical
gradients with ground-truth physical priors, hence leading to significantly
faster and more stable convergence than RL-based methods. Moreover, to escape from
local optima, we utilize a Demonstration Replay mechanism to enable stable
gradient backpropagation in a long horizon. Extensive experiments on standard
benchmarks show that DiffMimic has a better sample efficiency and time
efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a
physically simulated character to learn Backflip after 10 minutes of training
and be able to cycle it after 3 hours of training, while the existing approach
may require about a day of training to cycle Backflip. More importantly, we
hope DiffMimic can benefit more differentiable animation systems with
techniques like differentiable clothes simulation in future research.
Comment: ICLR 2023. Code is at https://github.com/jiawei-ren/diffmimic
Project page is at https://diffmimic.github.io
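The idea of state matching with analytical gradients can be illustrated on a toy problem. The sketch below is not DiffMimic and uses no physics engine: it assumes a one-dimensional system x[t+1] = x[t] + dt * u[t], open-loop controls in place of a policy, and hand-derived gradients. All function names and the learning-rate and step-count values are assumptions chosen for the example, and Demonstration Replay is not modelled.

```python
def rollout(u, x0=0.0, dt=0.1):
    """Simulate x[t+1] = x[t] + dt * u[t]; return the visited states."""
    xs, x = [], x0
    for a in u:
        x = x + dt * a
        xs.append(x)
    return xs


def loss_and_grad(u, ref, x0=0.0, dt=0.1):
    """Squared state-matching loss and its exact gradient w.r.t. u."""
    xs = rollout(u, x0, dt)
    errs = [x - r for x, r in zip(xs, ref)]
    loss = sum(e * e for e in errs)
    grad, tail = [0.0] * len(u), 0.0
    # u[k] shifts every state from index k onward by dt, so its gradient
    # is dt times the sum of 2 * error over those later states.
    for k in reversed(range(len(u))):
        tail += 2.0 * errs[k]
        grad[k] = dt * tail
    return loss, grad


def fit(ref, steps=2000, lr=4.0, dt=0.1):
    """Plain gradient descent on the controls to match the reference states."""
    u = [0.0] * len(ref)
    for _ in range(steps):
        _, g = loss_and_grad(u, ref, dt=dt)
        u = [a - lr * gi for a, gi in zip(u, g)]
    return u
```

Because the gradient is exact rather than estimated from sampled returns, the update has no sampling variance, which is the property the abstract credits for faster convergence.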