General-purpose robots coexisting with humans in their environment must learn
to relate human language to their perceptions and actions to be useful in a
range of daily tasks. Moreover, they need to acquire a diverse repertoire of
general-purpose skills that allow them to compose long-horizon tasks by
following unconstrained language instructions. In this paper, we present CALVIN
(Composing Actions from Language and Vision), an open-source simulated
benchmark for learning long-horizon language-conditioned tasks. Our aim is to make
it possible to develop agents that can solve many robotic manipulation tasks
over a long horizon, from onboard sensors, specified only via human language.
CALVIN tasks are more complex in terms of sequence length, action space, and
language than those in existing vision-and-language task datasets, and the
benchmark supports flexible specification of sensor suites. We evaluate agents
in zero-shot settings, measuring generalization to novel language instructions
and to novel environments and objects.
We show that a baseline model based on multi-context imitation learning
performs poorly on CALVIN, suggesting that there is significant room for
developing innovative agents that learn to relate human language to their world
models with this benchmark.

Comment: Accepted for publication at IEEE Robotics and Automation Letters (RA-L). Code, models, and dataset available at http://calvin.cs.uni-freiburg.d