World models power some of the most efficient reinforcement learning
algorithms. In this work, we show that they can also be harnessed for continual
learning, a setting in which the agent faces a sequence of changing environments. World models
typically employ a replay buffer for training, which can be naturally extended
to continual learning. We systematically study how different selective
experience replay methods affect performance, forgetting, and transfer. We also
provide recommendations regarding various modeling options for using world
models. We call the best-performing combination of choices Continual-Dreamer; it is
task-agnostic and utilizes the world model for continual exploration.
Continual-Dreamer is sample efficient and outperforms state-of-the-art
task-agnostic continual reinforcement learning methods on Minigrid and Minihack
benchmarks.

Comment: Accepted at CoLLAs 2023, 21 pages, 15 figures
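To make the replay-buffer idea concrete, below is a minimal sketch of one generic selective experience replay strategy, reservoir sampling, which keeps a uniform subsample of all transitions seen across tasks. This is an illustrative assumption, not the paper's exact method; the class name ReservoirReplayBuffer and its interface are hypothetical.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-capacity buffer that retains a uniform random sample of all
    transitions seen so far (reservoir sampling), so experience from earlier
    tasks is not fully overwritten by data from the current task."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.storage = []        # stored transitions
        self.num_seen = 0        # total transitions observed so far
        self.rng = random.Random(seed)

    def add(self, transition):
        self.num_seen += 1
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            # With probability capacity / num_seen, overwrite a random slot;
            # this keeps every transition seen so far with equal probability.
            idx = self.rng.randrange(self.num_seen)
            if idx < self.capacity:
                self.storage[idx] = transition

    def sample(self, batch_size):
        # Uniform minibatch for world-model training.
        return self.rng.sample(self.storage, min(batch_size, len(self.storage)))
```

In a continual setting, the same buffer instance would simply be kept across task boundaries and sampled from when training the world model, which is what makes replay-based world-model training a natural fit for continual learning.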