Everyday tasks that are long-horizon and comprise a sequence of multiple implicit
subtasks still pose a major challenge in offline robot control. While a
number of prior methods have aimed to address this setting with variants of
imitation and offline reinforcement learning, the learned behavior is typically
narrow and often struggles to reach configurable long-horizon goals. As both
paradigms have complementary strengths and weaknesses, we propose a novel
hierarchical approach that combines the two to learn
task-agnostic long-horizon policies from high-dimensional camera observations.
Concretely, we combine a low-level policy that learns latent skills via
imitation learning with a high-level policy, learned via offline reinforcement
learning, that chains these latent behavior priors. Experiments in various
simulated and real robot control tasks show that our formulation produces
previously unseen combinations of skills, reaching temporally extended
goals by "stitching" together latent skills through goal chaining, and
improves performance by an order of magnitude over state-of-the-art baselines.
We even learn a single multi-task visuomotor policy for 25 distinct real-world
manipulation tasks that outperforms both imitation learning and offline
reinforcement learning techniques.

Comment: CoRL 2022. Project website: http://tacorl.cs.uni-freiburg.de
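The hierarchical control loop described in the abstract (a high-level policy that selects latent skills, and a low-level policy that decodes a latent skill into actions) can be sketched as follows. This is a toy illustration under our own assumptions, not the paper's implementation: the 1-D point environment, the discrete candidate skill set, and the helper names `low_level_policy`, `q_value`, `high_level_policy`, and `rollout` are all hypothetical, and `q_value` simulates the toy dynamics directly where a critic learned from offline data would normally be used.

```python
from typing import List, Sequence

# Toy, hypothetical setting: the state is the position of a point on a line,
# and a latent skill z is a single float read as a per-step displacement.


def low_level_policy(state: float, z: float, max_step: float = 1.0) -> float:
    """Decode latent skill z into a primitive action (a clipped step).

    Placeholder for an imitation-learned skill decoder; `state` is accepted
    to mirror an (observation, latent) -> action interface but is unused here.
    """
    return max(-max_step, min(max_step, z))


def q_value(state: float, goal: float, z: float, horizon: int) -> float:
    """Hypothetical stand-in for a goal-conditioned critic Q(s, g, z).

    Scores a skill by the negative distance to the goal after executing it
    for `horizon` steps. Here the toy dynamics are simulated directly; an
    offline-RL method would instead learn such a value from logged data.
    """
    s = state
    for _ in range(horizon):
        s += low_level_policy(s, z)
    return -abs(goal - s)


def high_level_policy(state: float, goal: float,
                      skills: Sequence[float], horizon: int) -> float:
    """Greedily pick the latent skill with the highest (toy) Q-value."""
    return max(skills, key=lambda z: q_value(state, goal, z, horizon))


def rollout(start: float, goals: Sequence[float], skills: Sequence[float],
            horizon: int = 4, max_segments: int = 20,
            tol: float = 1e-9) -> List[float]:
    """Chain latent skills to reach each goal in `goals`, in order.

    Every `horizon` low-level steps the high-level policy re-selects z.
    """
    state = start
    trace = [state]
    for goal in goals:
        for _ in range(max_segments):
            if abs(goal - state) <= tol:
                break  # goal reached; move on to the next one
            z = high_level_policy(state, goal, skills, horizon)
            for _ in range(horizon):
                state += low_level_policy(state, z)
                trace.append(state)
    return trace


if __name__ == "__main__":
    skills = [-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0]
    trace = rollout(0.0, goals=[7.0, 3.0], skills=skills)
    print(trace[-1])  # 3.0
```

Re-selecting the latent skill every `horizon` steps is what lets the high-level policy chain short skills toward goals that no single skill reaches on its own; in the usage example, three differently sized skills are stitched together to reach 7.0 before a fourth returns the point to 3.0.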