A reinforcement learning agent that needs to pursue different goals across
episodes requires a goal-conditional policy. In addition to their potential to
generalize desirable behavior to unseen goals, such policies may also enable
higher-level planning based on subgoals. In sparse-reward environments, the
capacity to exploit information about the degree to which an arbitrary goal has
been achieved while another goal was intended appears crucial to enable sample
efficient learning. However, reinforcement learning agents have only recently
been endowed with such capacity for hindsight. In this paper, we demonstrate
how hindsight can be introduced to policy gradient methods, generalizing this
idea to a broad class of successful algorithms. Our experiments on a diverse
selection of sparse-reward environments show that hindsight leads to a
remarkable increase in sample efficiency.Comment: Accepted to ICLR 201