The Effectiveness of World Models for Continual Reinforcement Learning
World models power some of the most efficient reinforcement learning
algorithms. In this work, we show that they can also be harnessed for continual
learning, a setting in which the agent faces changing environments. World models
typically employ a replay buffer for training, which can be naturally extended
to continual learning. We systematically study how different selective
experience replay methods affect performance, forgetting, and transfer. We also
provide recommendations regarding various modeling options for using world
models. We call the best-performing set of choices Continual-Dreamer; it is
task-agnostic and uses the world model for continual exploration.
Continual-Dreamer is sample efficient and outperforms state-of-the-art
task-agnostic continual reinforcement learning methods on the MiniGrid and
MiniHack benchmarks.
Comment: Accepted at CoLLAs 2023; 21 pages, 15 figures.
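The abstract does not include implementation details, but the replay machinery it refers to is easy to picture. Below is a minimal, hypothetical sketch (all names are ours, not the paper's) of a task-agnostic reservoir-sampling buffer, one common baseline among the selective experience replay methods such a study compares; the world model would be trained on batches drawn from it:

```python
import random

class ReservoirReplayBuffer:
    """Task-agnostic replay buffer using reservoir sampling: every
    transition ever observed has an equal chance of being retained, so
    the buffer approximates a uniform sample over the whole
    (non-stationary) stream of environments."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.num_seen = 0  # total transitions observed so far

    def add(self, transition):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            # Keep the new transition with probability capacity / num_seen.
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = transition

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Because retention never consults task identity, a buffer like this extends naturally from single-task training of the world model to the continual setting.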
Using Hindsight to Anchor Past Knowledge in Continual Learning
In continual learning, the learner faces a stream of data whose distribution
changes over time. Modern neural networks are known to suffer under this
setting, as they quickly forget previously acquired knowledge. To address such
catastrophic forgetting, many continual learning methods implement different
types of experience replay, re-learning on past data stored in a small buffer
known as episodic memory. In this work, we complement experience replay with a
new objective that we call anchoring, where the learner uses bilevel
optimization to update its knowledge on the current task, while keeping intact
the predictions on some anchor points of past tasks. These anchor points are
learned using gradient-based optimization to maximize forgetting, which is
approximated by fine-tuning the currently trained model on the episodic memory
of past tasks. Experiments on several supervised learning benchmarks for
continual learning demonstrate that our approach improves on standard
experience replay in both accuracy and forgetting metrics, across various
episodic memory sizes.
Comment: Accepted at AAAI 2021.
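As a loose illustration of the anchoring idea (not the paper's exact bilevel procedure; the function names and hyperparameters below are ours), this sketch first approximates forgetting by fine-tuning a throwaway copy of the model on the episodic memory, then nudges anchor points toward inputs whose predictions drift the most, and finally penalizes the learner for moving its predictions on those anchors while it trains on the current task:

```python
import copy
import torch
import torch.nn.functional as F

def fine_tune_on_memory(model, memory_loader, lr=0.01):
    """Approximate future forgetting: fine-tune a throwaway copy of the
    current model on the episodic memory of past tasks."""
    tuned = copy.deepcopy(model)
    opt = torch.optim.SGD(tuned.parameters(), lr=lr)
    for x, y in memory_loader:
        opt.zero_grad()
        F.cross_entropy(tuned(x), y).backward()
        opt.step()
    return tuned

def update_anchors(model, tuned, anchors, ascent_lr=0.1, steps=5):
    """Move anchor points, by gradient ascent, toward inputs whose
    predictions drift the most under the fine-tuned copy, i.e. points
    that would be forgotten."""
    a = anchors.clone().detach().requires_grad_(True)
    for _ in range(steps):
        drift = F.mse_loss(tuned(a), model(a).detach())
        (grad,) = torch.autograd.grad(drift, a)
        a = (a + ascent_lr * grad).detach().requires_grad_(True)
    return a.detach()

def train_step(model, opt, batch, anchors, anchor_targets, lam=1.0):
    """Current-task update plus a penalty that keeps the model's
    predictions on the anchor points intact."""
    x, y = batch
    loss = F.cross_entropy(model(x), y)
    loss = loss + lam * F.mse_loss(model(anchors), anchor_targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
```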
Offline Experience Replay for Continual Offline Reinforcement Learning
An agent should be able to continually learn new skills from a sequence of
pre-collected offline datasets. However, learning a sequence of offline tasks
consecutively is likely to cause catastrophic forgetting under
resource-limited scenarios. In this paper, we formulate
a new setting, continual offline reinforcement learning (CORL), where an agent
learns a sequence of offline reinforcement learning tasks and pursues good
performance on all learned tasks with a small replay buffer without exploring
any of the environments of the sequential tasks. To learn consistently across
all tasks, an agent must acquire new knowledge while preserving old knowledge,
entirely offline. To this end, we evaluate existing continual learning
algorithms and experimentally find experience replay (ER) to be the most
suitable for the CORL problem. However, we
observe that introducing ER into CORL encounters a new distribution shift
problem: the mismatch between the experiences in the replay buffer and
trajectories generated by the learned policy. To address this issue, we
propose a new model-based experience selection (MBES) scheme for building the
replay buffer: a transition model is learned to approximate the state
distribution, and it bridges the distribution gap between the replay buffer
and the learned policy by selecting, for storage, the offline data that most
closely resembles what the learned policy would generate. Moreover, to improve
the ability to learn new tasks, we retrofit experience replay with a new dual
behavior cloning (DBC) architecture that prevents the behavior-cloning loss
from disturbing the Q-learning process. We call the overall
algorithm offline experience replay (OER). Extensive experiments demonstrate
that our OER method outperforms state-of-the-art baselines in widely used
MuJoCo environments.
Comment: 9 pages, 4 figures.
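The paper's MBES scheme relies on a learned transition model to approximate the state distribution; the toy sketch below swaps in a much cruder stand-in (distance between the stored action and the current policy's action) purely to illustrate the selection idea of keeping the offline data that best resembles the learned policy. All names are hypothetical:

```python
import numpy as np

def select_for_replay(states, actions, policy, buffer_size):
    """Keep the offline transitions whose stored actions are closest to
    what the learned policy would take in the same states, so the replay
    buffer better matches the policy's own trajectory distribution."""
    policy_actions = np.stack([policy(s) for s in states])
    # Smaller action gap => transition better resembles the learned policy.
    gaps = np.linalg.norm(actions - policy_actions, axis=-1)
    keep = np.argsort(gaps)[:buffer_size]
    return states[keep], actions[keep]
```

The DBC side of the method would then route the behavior-cloning loss and the Q-learning loss through separate heads so the two objectives do not interfere; we omit that here.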
Class-Incremental Learning Using Generative Experience Replay Based on Time-aware Regularization
Learning new tasks cumulatively without forgetting remains a critical
challenge in continual learning. Generative experience replay addresses this
challenge by synthesizing pseudo-data points for past learned tasks and later
replaying them for concurrent training along with the new tasks' data.
Generative replay is the best strategy for continual learning under a strict
class-incremental setting when certain constraints need to be met: (i) constant
model size, (ii) no pre-training dataset, and (iii) no memory buffer for
storing past tasks' data. Inspired by mechanisms of the biological nervous
system, we introduce a time-aware regularization method that dynamically
fine-tunes the
three training objective terms used for generative replay: supervised learning,
latent regularization, and data reconstruction. Experimental results on major
benchmarks indicate that our method pushes the limit of brain-inspired
continual learners under such strict settings, improves memory retention, and
increases the average performance over continually arriving tasks.
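The abstract names the three objective terms but not how the time-aware weights are computed. The sketch below is one plausible reading, assuming a classifier-VAE-style model that returns logits, latent statistics, and a reconstruction; the age-based schedule is our illustrative choice, not the paper's:

```python
import torch
import torch.nn.functional as F

def replay_loss(model, x, y, task_id, tasks_seen, base=(1.0, 1.0, 1.0)):
    """Time-aware weighting of the three generative-replay terms:
    supervised learning, latent regularization, and reconstruction."""
    logits, z_mu, z_logvar, x_rec = model(x)  # assumed model interface
    sup = F.cross_entropy(logits, y)
    # KL(q(z|x) || N(0, I)) keeps the latent space well-formed.
    latent = -0.5 * torch.mean(1 + z_logvar - z_mu.pow(2) - z_logvar.exp())
    rec = F.mse_loss(x_rec, x)
    # Illustrative schedule: the older the task, the more weight its
    # retention terms receive relative to the supervised term.
    age = (tasks_seen - task_id) / max(tasks_seen, 1)
    w_sup, w_lat, w_rec = base
    return w_sup * sup + (w_lat + age) * latent + (w_rec + age) * rec
```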
Segmentation of Multiple Sclerosis Lesions across Hospitals: Learn Continually or Train from Scratch?
Segmentation of Multiple Sclerosis (MS) lesions is a challenging problem.
Several deep-learning-based methods have been proposed in recent years.
However, most methods are static: a single model is trained on one large,
specialized dataset and does not generalize well beyond it. Instead, the model
should learn across datasets arriving sequentially from different hospitals by
building upon the characteristics of lesions in a continual manner. In this
regard, we explore experience replay, a well-known continual learning method,
in the context of MS lesion segmentation across multi-contrast data from 8
different hospitals. Our experiments show that replay is able to achieve
positive backward transfer and reduce catastrophic forgetting compared to
sequential fine-tuning. Furthermore, replay outperforms multi-domain training,
emerging as a promising solution for the segmentation of MS
lesions. The code is available at this link:
https://github.com/naga-karthik/continual-learning-ms
Comment: Accepted at the Medical Imaging Meets NeurIPS (MedNeurIPS) Workshop,
2022.
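To make the setup concrete, here is a minimal, hypothetical training loop (the names and the `train_step` callable are ours; the actual implementation lives in the linked repository) showing how replay across sequentially arriving hospital datasets might look:

```python
import random

def train_with_replay(model, hospital_loaders, train_step,
                      buffer_cap=200, replay_k=8):
    """Hypothetical loop: hospitals arrive sequentially, each batch is
    trained jointly with a few (image, mask) pairs replayed from earlier
    hospitals, and a small reservoir buffer retains a roughly uniform
    sample of everything seen so far."""
    buffer, seen = [], 0
    for loader in hospital_loaders:          # one hospital at a time
        for images, masks in loader:
            replayed = random.sample(buffer, min(replay_k, len(buffer)))
            train_step(model, images, masks, replayed)
            for pair in zip(images, masks):  # reservoir-style retention
                seen += 1
                if len(buffer) < buffer_cap:
                    buffer.append(pair)
                else:
                    j = random.randrange(seen)
                    if j < buffer_cap:
                        buffer[j] = pair
```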