6,590 research outputs found

    The Effectiveness of World Models for Continual Reinforcement Learning

    World models power some of the most efficient reinforcement learning algorithms. In this work, we show that they can be harnessed for continual learning, a setting in which the agent faces changing environments. World models typically employ a replay buffer for training, which can be naturally extended to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer, and we provide recommendations regarding various modeling options for using world models. We call the best set of choices Continual-Dreamer; it is task-agnostic and utilizes the world model for continual exploration. Continual-Dreamer is sample-efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on the Minigrid and Minihack benchmarks.
    Comment: Accepted at CoLLAs 2023, 21 pages, 15 figures
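    A minimal sketch of the kind of task-agnostic replay buffer this abstract refers to, assuming reservoir sampling as the selective-replay rule; the capacity, transition layout, and all names below are illustrative and not taken from the paper.

    import random
    from collections import namedtuple

    Transition = namedtuple("Transition", "obs action reward next_obs done")

    class ReservoirReplayBuffer:
        """Keeps an approximately uniform sample of the whole experience stream."""

        def __init__(self, capacity=100_000, seed=0):
            self.capacity = capacity
            self.data = []
            self.num_seen = 0  # transitions observed over the whole stream
            self.rng = random.Random(seed)

        def add(self, transition):
            self.num_seen += 1
            if len(self.data) < self.capacity:
                self.data.append(transition)
            else:
                # Each new transition replaces a stored one with probability
                # capacity / num_seen, so no task-boundary information is needed.
                j = self.rng.randrange(self.num_seen)
                if j < self.capacity:
                    self.data[j] = transition

        def sample(self, batch_size):
            return self.rng.sample(self.data, min(batch_size, len(self.data)))

    Because the buffer never uses task identities, the same world-model training loop can run unchanged as the environment stream changes.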

    Using Hindsight to Anchor Past Knowledge in Continual Learning

    In continual learning, the learner faces a stream of data whose distribution changes over time. Modern neural networks are known to suffer in this setting, as they quickly forget previously acquired knowledge. To address such catastrophic forgetting, many continual learning methods implement different types of experience replay, re-learning on past data stored in a small buffer known as episodic memory. In this work, we complement experience replay with a new objective that we call anchoring, where the learner uses bilevel optimization to update its knowledge on the current task while keeping intact the predictions on some anchor points of past tasks. These anchor points are learned using gradient-based optimization to maximize forgetting, which is approximated by fine-tuning the currently trained model on the episodic memory of past tasks. Experiments on several supervised continual learning benchmarks demonstrate that our approach improves over standard experience replay in terms of both accuracy and forgetting metrics, across various sizes of episodic memory.
    Comment: Accepted at AAAI 202
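    A hedged sketch of the anchoring step described above, assuming PyTorch and a classification loss: forgetting is estimated by fine-tuning a temporary copy of the model on the episodic memory, and the anchor inputs are then moved by gradient ascent on that estimate. The helper names, memory-sampling interface, and learning rates are illustrative, not the paper's implementation.

    import copy
    import torch
    import torch.nn.functional as F

    def update_anchors(model, episodic_memory, anchors, anchor_labels,
                       ft_steps=5, ft_lr=1e-2, anchor_lr=1e-1):
        # 1) Fine-tune a throwaway copy of the model on the episodic memory,
        #    mimicking the drift that future training could cause.
        tmp = copy.deepcopy(model)
        opt = torch.optim.SGD(tmp.parameters(), lr=ft_lr)
        for _ in range(ft_steps):
            x_mem, y_mem = episodic_memory.sample()   # assumed (inputs, labels) batch
            opt.zero_grad()
            F.cross_entropy(tmp(x_mem), y_mem).backward()
            opt.step()

        # 2) Move the anchors in the direction that maximizes the estimated forgetting,
        #    i.e. the loss increase between the fine-tuned and the current model.
        anchors = anchors.clone().requires_grad_(True)
        forgetting = (F.cross_entropy(tmp(anchors), anchor_labels)
                      - F.cross_entropy(model(anchors), anchor_labels))
        forgetting.backward()
        with torch.no_grad():
            anchors = anchors + anchor_lr * anchors.grad   # gradient ascent step
        return anchors.detach()

    During training on a new task, the usual replay loss can then be augmented with a penalty that keeps the model's predictions on these anchors unchanged.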

    Offline Experience Replay for Continual Offline Reinforcement Learning

    The capability to continually learn new skills from a sequence of pre-collected offline datasets is desirable for an agent. However, consecutively learning a sequence of offline tasks is likely to lead to catastrophic forgetting under resource-limited scenarios. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), in which an agent learns a sequence of offline reinforcement learning tasks and pursues good performance on all learned tasks with a small replay buffer, without exploring the environment of any of the sequential tasks. To learn consistently across all sequential tasks, an agent must acquire new knowledge while preserving old knowledge in an offline manner. To this end, we introduce continual learning algorithms and experimentally find experience replay (ER) to be the most suitable algorithm for the CORL problem. However, we observe that introducing ER into CORL raises a new distribution shift problem: the mismatch between the experiences in the replay buffer and the trajectories generated by the learned policy. To address this issue, we propose a model-based experience selection (MBES) scheme for building the replay buffer, in which a transition model is learned to approximate the state distribution. This model is used to bridge the distribution gap between the replay buffer and the learned policy by filtering the offline data and storing the transitions that most closely resemble those the learned policy would generate. Moreover, to enhance the ability to learn new tasks, we retrofit the experience replay method with a new dual behavior cloning (DBC) architecture that avoids the disturbance of the behavior-cloning loss on the Q-learning process. We call the overall algorithm offline experience replay (OER). Extensive experiments demonstrate that OER outperforms state-of-the-art baselines in widely used MuJoCo environments.
    Comment: 9 pages, 4 figures
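    A hedged sketch of the experience-selection idea above, under the assumption that candidate offline transitions are scored by how closely their next state matches what a learned transition model predicts for the current policy's action; the interfaces of policy and transition_model and the scoring rule are illustrative, not the paper's code.

    import torch

    def select_for_replay(offline_transitions, policy, transition_model, buffer_size):
        """Keep the offline transitions that look most like on-policy data."""
        scores = []
        for (s, a, r, s_next) in offline_transitions:
            a_pi = policy(s)                    # action the learned policy would take in s
            s_pred = transition_model(s, a_pi)  # next state the model expects under that action
            scores.append(-torch.norm(s_next - s_pred).item())  # higher = closer to on-policy
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [offline_transitions[i] for i in order[:buffer_size]]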

    Class-Incremental Learning Using Generative Experience Replay Based on Time-aware Regularization

    Learning new tasks cumulatively without forgetting remains a critical challenge in continual learning. Generative experience replay addresses this challenge by synthesizing pseudo-data points for previously learned tasks and later replaying them for concurrent training along with the new tasks' data. Generative replay is the best strategy for continual learning under a strict class-incremental setting when certain constraints need to be met: (i) constant model size, (ii) no pre-training dataset, and (iii) no memory buffer for storing past tasks' data. Inspired by mechanisms of the biological nervous system, we introduce a time-aware regularization method to dynamically fine-tune the three training objective terms used for generative replay: supervised learning, latent regularization, and data reconstruction. Experimental results on major benchmarks indicate that our method pushes the limit of brain-inspired continual learners under such strict settings, improves memory retention, and increases average performance over continually arriving tasks.
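    A hedged sketch of what a time-aware weighting of the three objectives named above could look like, assuming a VAE-style generator with a classification head; the particular schedule (weights shifting linearly with task index) is an illustrative assumption, not the paper's regularizer.

    import torch
    import torch.nn.functional as F

    def generative_replay_loss(logits, targets, mu, logvar, x_recon, x,
                               task_idx, total_tasks):
        progress = task_idx / max(total_tasks - 1, 1)   # 0 on the first task, 1 on the last
        w_sup = 1.0                                     # supervised term kept constant
        w_latent = 0.5 + 0.5 * progress                 # tighten the latent prior over time
        w_recon = 1.0 - 0.5 * progress                  # relax reconstruction over time

        sup = F.cross_entropy(logits, targets)                           # supervised learning
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())    # latent regularization
        recon = F.mse_loss(x_recon, x)                                   # data reconstruction
        return w_sup * sup + w_latent * kl + w_recon * recon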

    Segmentation of Multiple Sclerosis Lesions across Hospitals: Learn Continually or Train from Scratch?

    Segmentation of Multiple Sclerosis (MS) lesions is a challenging problem. Several deep-learning-based methods have been proposed in recent years. However, most methods tend to be static, that is, a single model trained on a large, specialized dataset, which does not generalize well. Instead, the model should learn across datasets arriving sequentially from different hospitals, building upon the characteristics of lesions in a continual manner. In this regard, we explore experience replay, a well-known continual learning method, in the context of MS lesion segmentation across multi-contrast data from 8 different hospitals. Our experiments show that replay achieves positive backward transfer and reduces catastrophic forgetting compared to sequential fine-tuning. Furthermore, replay outperforms multi-domain training, thereby emerging as a promising solution for the segmentation of MS lesions. The code is available at: https://github.com/naga-karthik/continual-learning-ms
    Comment: Accepted at the Medical Imaging Meets NeurIPS (MedNeurIPS) Workshop 202
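    A hedged sketch of the replay scheme this abstract describes, assuming a PyTorch segmentation model and a plain list as the memory buffer: while training on a new hospital's data, each batch is occasionally mixed with a stored (image, mask) pair from earlier hospitals. The mixing ratio, storage probability, and interfaces are illustrative assumptions.

    import random
    import torch

    def train_on_hospital(model, optimizer, loss_fn, current_loader, replay_buffer,
                          replay_ratio=0.5, store_prob=0.05):
        model.train()
        for images, masks in current_loader:
            if replay_buffer and random.random() < replay_ratio:
                # Mix in one stored example from a previously seen hospital.
                old_img, old_mask = random.choice(replay_buffer)
                images = torch.cat([images, old_img.unsqueeze(0)])
                masks = torch.cat([masks, old_mask.unsqueeze(0)])
            optimizer.zero_grad()
            loss_fn(model(images), masks).backward()
            optimizer.step()
            if random.random() < store_prob:
                # Keep a small random subset of the current hospital's data for later replay.
                replay_buffer.append((images[0].detach(), masks[0].detach()))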