Children acquire their native language with apparent ease by observing how
language is used in context and attempting to use it themselves. They do so
without laborious annotations, negative examples, or even direct corrections.
We take a step toward robots that can do the same by training a grounded
semantic parser, which discovers latent linguistic representations that can be
used for the execution of natural-language commands. In particular, we focus on
the difficult domain of commands with a temporal aspect, whose semantics we
capture with Linear Temporal Logic, LTL. Our parser is trained with pairs of
sentences and executions as well as an executor. At training time, the parser
hypothesizes a meaning representation for the input as a formula in LTL. Three
competing pressures allow the parser to discover meaning from language. First,
any hypothesized meaning for a sentence must be permissive enough to reflect
all the annotated execution trajectories. Second, the executor -- a pretrained
end-to-end LTL planner -- must find that the observe trajectories are likely
executions of the meaning. Finally, a generator, which reconstructs the
original input, encourages the model to find representations that conserve
knowledge about the command. Together these ensure that the meaning is neither
too general nor too specific. Our model generalizes well, being able to parse
and execute both machine-generated and human-generated commands, with
near-equal accuracy, despite the fact that the human-generated sentences are
much more varied and complex with an open lexicon. The approach presented here
is not specific to LTL: it can be applied to any domain where sentence meanings
can be hypothesized and an executor can verify these meanings, thus opening the
door to many applications for robotic agents.Comment: 10 pages, 2 figures, Accepted in Conference on Robot Learning (CoRL)
202