When humans work out how to perform a particular task, they do so
hierarchically, splitting higher-level tasks into smaller sub-tasks. However,
in the literature on natural language (NL) command of situated agents, most
work has treated the procedures to be executed as flat sequences of simple
actions, or has used procedure hierarchies that are shallow at best. In this
paper, we propose a formalism of procedures as programs, a powerful yet
intuitive method of representing hierarchical procedural knowledge for agent
command and control. We further propose a modeling paradigm of hierarchical
modular networks, which consist of a planner and reactors that convert NL
intents to predictions of executable programs and probe the environment for
information necessary to complete the program execution. We instantiate this
framework on the IQA and ALFRED datasets for NL instruction following. Our
model outperforms reactive baselines by a large margin on both datasets. We
also demonstrate that our framework is more data-efficient, and that it allows
for fast iterative development.
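
To make the idea of procedures as programs concrete, the following is a minimal illustrative sketch in Python, not the formalism or model used in this work. All names here (`find`, `goto`, `pick_up`, `fetch`, `put`, the dictionary-based environment, and the action strings) are hypothetical. It shows a higher-level procedure calling lower-level ones, with one step that probes the environment for information available only at execution time.

```python
from typing import Dict, List

# Hypothetical toy environment: maps an object name to its current location.
Env = Dict[str, str]


def find(env: Env, obj: str) -> str:
    """Probe the environment for information needed at run time (assumed reactor-like step)."""
    return env[obj]


def goto(trace: List[str], location: str) -> None:
    """Primitive action: record a navigation step."""
    trace.append(f"goto {location}")


def pick_up(trace: List[str], obj: str) -> None:
    """Primitive action: record a pick-up step."""
    trace.append(f"pickup {obj}")


def fetch(env: Env, trace: List[str], obj: str) -> None:
    """Mid-level procedure: locate an object, walk to it, and pick it up."""
    location = find(env, obj)
    goto(trace, location)
    pick_up(trace, obj)


def put(env: Env, trace: List[str], obj: str, dest: str) -> None:
    """High-level procedure built from the mid-level `fetch` plus primitives."""
    fetch(env, trace, obj)
    goto(trace, dest)
    trace.append(f"put {obj} {dest}")


def run_demo() -> List[str]:
    """Execute the hierarchical program and return the flat action trace it produces."""
    env: Env = {"mug": "counter"}
    trace: List[str] = []
    put(env, trace, "mug", "sink")
    return trace


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the flat action sequence is a by-product of executing the hierarchy rather than the representation itself, which is the contrast with flat-sequence approaches drawn above.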