The grand aim of having a single robot that can manipulate arbitrary objects
in diverse settings is at odds with the paucity of robotics datasets. Acquiring
and growing such datasets is strenuous due to manual effort, operational
costs, and safety challenges. A path toward such a universal agent would
require a structured framework capable of wide generalization but trained
within a reasonable data budget. In this paper, we develop an efficient system
(RoboAgent) for training universal agents capable of multi-task manipulation
skills using (a) semantic augmentations that can rapidly multiply existing
datasets and (b) action representations that can extract performant policies
from small yet diverse multi-modal datasets without overfitting. In addition,
reliable task conditioning and an expressive policy architecture enable our
agent to exhibit a diverse repertoire of skills in novel situations specified
using language commands. Using merely 7500 demonstrations, we are able to train
a single agent capable of 12 unique skills, and demonstrate its generalization
over 38 tasks spread across common daily activities in diverse kitchen scenes.
On average, RoboAgent outperforms prior methods by over 40% in unseen
situations while being more sample-efficient and amenable to capability
improvements and extensions through fine-tuning. Videos at
https://robopen.github.io