Reinforcement Learning (RL) has seen many recent successes for quadruped
robot control. The imitation of reference motions provides a simple and
powerful prior for guiding learning towards desired solutions without the need
for meticulous reward design. While much work uses motion capture data or
hand-crafted trajectories as the reference motion, relatively little work has
explored reference motions derived from model-based trajectory
optimization. In this work, we investigate several design considerations that
arise with such a framework, as demonstrated through four dynamic behaviours:
trot, front hop, 180° backflip, and biped stepping. These are trained in
simulation and transferred to a physical Solo 8 quadruped robot without further
adaptation. In particular, we explore the space of feed-forward designs
afforded by the trajectory optimizer to understand their impact on RL learning
efficiency and sim-to-real transfer. These findings contribute to the
long-standing goal of producing robot controllers that combine the
interpretability and precision of model-based optimization with the robustness
that model-free RL-based controllers offer.
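
As a purely illustrative sketch of one point in the feed-forward design space mentioned above (not the paper's actual controller or API; all names and gain values are assumptions): joint torques can combine PD feedback on the optimizer's reference trajectory with an optional feed-forward torque taken from the same optimizer solution, and disabling the feed-forward term recovers plain PD tracking at one end of the design space.

    import numpy as np

    KP = 3.0   # proportional gain (assumed value)
    KD = 0.1   # derivative gain (assumed value)

    def joint_torques(q, qd, q_ref, qd_ref, tau_ff=None):
        """PD tracking of a reference trajectory with optional feed-forward.

        q, qd         : measured joint positions / velocities
        q_ref, qd_ref : references from the trajectory optimizer
        tau_ff        : feed-forward torque from the optimizer's solution;
                        None disables it (pure PD tracking).
        """
        tau = KP * (q_ref - q) + KD * (qd_ref - qd)
        if tau_ff is not None:
            tau = tau + tau_ff
        return tau

    # Example with 8 joints (as on Solo 8): pure PD vs. PD + feed-forward
    q, qd = np.zeros(8), np.zeros(8)
    q_ref, qd_ref = 0.1 * np.ones(8), np.zeros(8)
    tau_pd = joint_torques(q, qd, q_ref, qd_ref)
    tau_with_ff = joint_torques(q, qd, q_ref, qd_ref, tau_ff=0.05 * np.ones(8))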