2 research outputs found

OPT-Mimic: Imitation of Optimized Trajectories for Dynamic Quadruped Behaviors

Authors: Fuchioka Yuni, van de Panne Michiel, Xie Zhaoming
Publication date: 04/11/2022

Reinforcement Learning (RL) has seen many recent successes for quadruped robot control. The imitation of reference motions provides a simple and powerful prior for guiding solutions towards desired solutions without the need for meticulous reward design. While much work uses motion capture data or hand-crafted trajectories as the reference motion, relatively little work has explored the use of reference motions coming from model-based trajectory optimization. In this work, we investigate several design considerations that arise with such a framework, as demonstrated through four dynamic behaviours: trot, front hop, 180 backflip, and biped stepping. These are trained in simulation and transferred to a physical Solo 8 quadruped robot without further adaptation. In particular, we explore the space of feed-forward designs afforded by the trajectory optimizer to understand its impact on RL learning efficiency and sim-to-real transfer. These findings contribute to the long-standing goal of producing robot controllers that combine the interpretability and precision of model-based optimization with the robustness that model-free RL-based controllers offer.

arXiv.org e-Print Archive

Imitating optimized trajectories for dynamic quadruped behaviors

Author: Fuchioka Yuni
Publication venue: University of British Columbia Press
Publication date: 01/05/2023

Reinforcement Learning (RL) has seen many recent successes for quadruped robot control. The imitation of reference motions provides a simple and powerful prior for guiding solutions towards desired solutions without the need for meticulous reward design. While much work uses motion capture data or hand-crafted trajectories as the reference motion, relatively little work has explored the use of reference motions coming from model-based trajectory optimization. This may be advantageous because using high-quality reference motions from trajectory optimization could alleviate the need to tune RL environments specifically to every task, thus shortening the time necessary to design controllers for robots through RL. In this work, we investigate several design considerations that arise with such a framework, as demonstrated through four dynamic behaviours: trot, front hop, 180 backflip, and biped stepping. These are trained in simulation and transferred to a physical Solo 8 quadruped robot without further adaptation. In particular, we explore the space of feed-forward designs afforded by the trajectory optimizer to understand its impact on RL learning efficiency and sim-to-real transfer. These findings contribute to the long-standing goal of producing robot controllers that combine the interpretability and fast optimization of model-based optimization with the robustness that model-free RL-based controllers offer.

Science, Faculty of; Computer Science, Department of; Graduat

University of British Columbia: cIRcle - UBC's Information Repository