We study the problem of modeling spatiotemporal trajectories over long time
horizons using expert demonstrations. For instance, in sports, agents often
choose action sequences with long-term goals in mind, such as achieving a
certain strategic position. Conventional policy learning approaches, such as
those based on Markov decision processes, generally fail at learning cohesive
long-term behavior in such high-dimensional state spaces, and are only
effective when myopic modeling lead to the desired behavior. The key difficulty
is that conventional approaches are "shallow" models that only learn a single
state-action policy. We instead propose a hierarchical policy class that
automatically reasons about both long-term and short-term goals, which we
instantiate as a hierarchical neural network. We showcase our approach in a
case study on learning to imitate demonstrated basketball trajectories, and
show that it generates significantly more realistic trajectories compared to
non-hierarchical baselines as judged by professional sports analysts.Comment: Published in NIPS 201