SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks
Existing internet-scale image and video datasets cover a wide range of
everyday objects and tasks, offering the potential to learn policies that
generalize broadly. Prior work has explored visual pre-training with
different self-supervised objectives, but the generalization capabilities of
the learned policies remain relatively unknown. In this work, we take the first
step towards this challenge, focusing on how pre-trained representations can
help the generalization of the learned policies. We first identify the key
bottleneck in using a frozen pre-trained visual backbone for policy learning.
We then propose SpawnNet, a novel two-stream architecture that learns to fuse
pre-trained multi-layer representations into a separate network to learn a
robust policy. Through extensive simulated and real experiments, we demonstrate
significantly better categorical generalization compared to prior approaches in
imitation learning settings.
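
To make the two-stream idea concrete, the following is a minimal sketch of a forward pass, not the authors' implementation. The abstract only states that pre-trained multi-layer representations are fused into a separate network; everything else here is an assumption made for illustration: the per-layer adapters, fusion by concatenation, the fully connected layers standing in for a visual backbone, and all names (FrozenBackbone, TwoStreamPolicy, act).

```python
import math
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix with out_dim rows and in_dim columns."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


class FrozenBackbone:
    """Hypothetical stand-in for a pre-trained encoder; never updated."""

    def __init__(self, dims, rng):
        self.layers = [linear(i, o, rng) for i, o in zip(dims[:-1], dims[1:])]

    def features(self, x):
        """Return the activation of every layer, not only the last one."""
        feats = []
        for weights in self.layers:
            x = apply(weights, x)
            feats.append(x)
        return feats


class TwoStreamPolicy:
    """Separate stream that takes in backbone features layer by layer."""

    def __init__(self, backbone, obs_dim, hidden_dim, action_dim, rng):
        self.backbone = backbone
        # Assumption: one adapter per backbone layer projects its features
        # to the width of the separate stream.
        self.adapters = [linear(len(w), hidden_dim, rng)
                         for w in backbone.layers]
        # Each stream layer sees its own state concatenated with the
        # adapted backbone features for the matching depth.
        in_dims = [obs_dim + hidden_dim]
        in_dims += [2 * hidden_dim] * (len(backbone.layers) - 1)
        self.stream = [linear(d, hidden_dim, rng) for d in in_dims]
        self.head = linear(hidden_dim, action_dim, rng)

    def act(self, obs):
        h = list(obs)
        feats = self.backbone.features(obs)
        for feat, adapter, layer in zip(feats, self.adapters, self.stream):
            fused = h + apply(adapter, feat)  # fusion by concatenation
            h = apply(layer, fused)
        # Linear action head (no ReLU).
        return [sum(w * v for w, v in zip(row, h)) for row in self.head]


rng = random.Random(0)
backbone = FrozenBackbone([8, 16, 16, 16], rng)
policy = TwoStreamPolicy(backbone, obs_dim=8, hidden_dim=12,
                         action_dim=4, rng=rng)
obs = [0.1] * 8
action = policy.act(obs)
```

In an imitation learning setup along these lines, only the adapters, the separate stream, and the action head would be trained on demonstrations, while the backbone weights stay fixed.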