CompILE: Compositional Imitation Learning and Execution
We introduce Compositional Imitation Learning and Execution (CompILE): a
framework for learning reusable, variable-length segments of
hierarchically-structured behavior from demonstration data. CompILE uses a
novel unsupervised, fully-differentiable sequence segmentation module to learn
latent encodings of sequential data that can be re-composed and executed to
perform new tasks. Once trained, our model generalizes to sequences of longer
length and from environment instances not seen during training. We evaluate
CompILE in a challenging 2D multi-task environment and a continuous control
task, and show that it can find correct task boundaries and event encodings in
an unsupervised manner. Latent codes and associated behavior policies
discovered by CompILE can be used by a hierarchical agent, where the high-level
policy selects actions in the latent code space, and the low-level,
task-specific policies are simply the learned decoders. We found that our
CompILE-based agent could learn given only sparse rewards, where agents without
task-specific policies struggle.
Comment: ICML 2019
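The hierarchical-agent setup described above can be sketched minimally: a high-level policy chooses an entry from a codebook of latent skill encodings, and a low-level "decoder" policy maps the chosen code plus the current state to an action. Everything here (the dimensions, the random codebook, the linear decoder, the toy scoring rule) is a hypothetical stand-in for CompILE's learned components, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
STATE_DIM, LATENT_DIM, N_CODES, ACTION_DIM = 8, 4, 5, 3

# Codebook of latent skill encodings (in CompILE these are learned
# from demonstrations by the segmentation module).
codebook = rng.normal(size=(N_CODES, LATENT_DIM))

# Low-level decoder policy: a fixed random linear map for this sketch.
W = rng.normal(size=(STATE_DIM + LATENT_DIM, ACTION_DIM))


def high_level_policy(state):
    """Select a latent code index (toy scoring rule, not a learned policy)."""
    scores = codebook @ state[:LATENT_DIM]
    return int(np.argmax(scores))


def low_level_policy(state, z):
    """Decode a latent code plus the state into an action."""
    x = np.concatenate([state, codebook[z]])
    return x @ W


state = rng.normal(size=STATE_DIM)
z = high_level_policy(state)          # high-level acts in latent-code space
action = low_level_policy(state, z)   # low-level executes the selected skill
```

The key structural point is that the high-level policy's action space is the discrete set of latent codes, while each code indexes a reusable behavior via the shared decoder.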
Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline
We study methods for task-agnostic continual reinforcement learning (TACRL).
TACRL is a setting that combines the difficulties of partially-observable RL (a
consequence of task agnosticism) and the difficulties of continual learning
(CL), i.e., learning on a non-stationary sequence of tasks. We compare TACRL
methods with their soft upper bounds prescribed by previous literature:
multi-task learning (MTL) methods which do not have to deal with non-stationary
data distributions, as well as task-aware methods, which are allowed to operate
under full observability. We consider a previously unexplored and
straightforward baseline for TACRL, replay-based recurrent RL (3RL), in which
we augment an RL algorithm with recurrent mechanisms to mitigate partial
observability and experience replay mechanisms for catastrophic forgetting in
CL.
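The two ingredients of the 3RL baseline (a recurrent mechanism for partial observability, an experience-replay buffer against forgetting) can be sketched as follows. The scalar-gated recurrent update and the buffer sizes here are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import deque


class ReplayBuffer:
    """Experience replay: stores transitions across the task sequence so
    earlier tasks keep contributing gradients (mitigating forgetting)."""

    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, k):
        return random.sample(list(self.buf), min(k, len(self.buf)))


def recurrent_step(h, obs, alpha=0.9):
    """Toy GRU-like update: the hidden state h summarizes observation
    history, standing in for the recurrence that mitigates partial
    observability under task agnosticism."""
    return [alpha * hi + (1.0 - alpha) * o for hi, o in zip(h, obs)]


buf = ReplayBuffer()
h = [0.0, 0.0]
for t in range(100):
    obs = [float(t % 3), float(t % 5)]   # placeholder observations
    h = recurrent_step(h, obs)
    buf.add((tuple(h), t))               # store hidden-state-tagged transition

batch = buf.sample(8)                    # minibatch mixing old and new tasks
```

Because sampled minibatches mix transitions from past and current tasks, the replay component plays the role that the full multi-task data distribution plays for the MTL soft upper bound.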
By studying empirical performance in a sequence of RL tasks, we find
surprising occurrences of 3RL matching and overcoming the MTL and task-aware
soft upper bounds. We lay out hypotheses that could explain this inflection
point of continual and task-agnostic learning research. Our hypotheses are
empirically tested in continuous control tasks via a large-scale study of the
popular multi-task and continual learning benchmark Meta-World. By analyzing
different training statistics, including gradient conflict, we find evidence
that 3RL's outperformance stems from its ability to quickly infer how new tasks
relate to previous ones, enabling forward transfer
- …