We seek to create agents that both act and communicate with other agents in
pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al. 2019)---a
large-scale crowd-sourced fantasy text-game---with a dataset of quests. These
contain natural language motivations paired with in-game goals and human
demonstrations; completing a quest might require dialogue or actions (or both).
We introduce a reinforcement learning system that (1) incorporates large-scale
language modeling-based and commonsense reasoning-based pre-training to imbue
the agent with relevant priors; and (2) leverages a factorized action space of
action commands and dialogue, balancing between the two. We conduct zero-shot
evaluations using held-out human expert demonstrations, showing that our agents
are able to act consistently and talk naturally with respect to their
motivations.