Pre-trained and frozen large language models (LLMs) can effectively map
simple scene rearrangement instructions to programs over a robot's visuomotor
functions through appropriate few-shot example prompting. To parse open-domain
natural language and adapt to a user's idiosyncratic procedures, not known
during prompt engineering time, fixed prompts fall short. In this paper, we
introduce HELPER, an embodied agent equipped with an external memory of
language-program pairs that parses free-form human-robot dialogue into action
programs through retrieval-augmented LLM prompting: relevant memories are
retrieved based on the current dialogue, instruction, correction, or VLM
description, and used as in-context prompt examples for LLM querying. The
memory is expanded during deployment to include pairs of user's language and
action plans, to assist future inferences and personalize them to the user's
language and routines. HELPER sets a new state-of-the-art in the TEACh
benchmark in both Execution from Dialog History (EDH) and Trajectory from
Dialogue (TfD), with a 1.7x improvement over the previous state-of-the-art for
TfD. Our models, code, and video results can be found in our project's website:
https://helper-agent-llm.github.io.Comment: Project page with code & videos: https://helper-agent-llm.github.i