Model-free trajectory optimization for reinforcement learning
Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update.
However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy.
In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation
of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system
dynamics, and it demonstrates improved performance compared with related Trajectory Optimization algorithms that linearize the dynamics.
OPT-Mimic: Imitation of Optimized Trajectories for Dynamic Quadruped Behaviors
Reinforcement Learning (RL) has seen many recent successes for quadruped
robot control. The imitation of reference motions provides a simple and
powerful prior for guiding learning towards desired solutions without the need
for meticulous reward design. While much work uses motion capture data or
hand-crafted trajectories as the reference motion, relatively little work has
explored the use of reference motions coming from model-based trajectory
optimization. In this work, we investigate several design considerations that
arise with such a framework, as demonstrated through four dynamic behaviours:
trot, front hop, 180 backflip, and biped stepping. These are trained in
simulation and transferred to a physical Solo 8 quadruped robot without further
adaptation. In particular, we explore the space of feed-forward designs
afforded by the trajectory optimizer to understand its impact on RL learning
efficiency and sim-to-real transfer. These findings contribute to the
long-standing goal of producing robot controllers that combine the interpretability
and precision of model-based optimization with the robustness that model-free
RL-based controllers offer.