Adaptive Regularization for Class-Incremental Learning
Class-Incremental Learning updates a deep classifier with new categories
while maintaining accuracy on previously observed classes. Regularizing the
neural network weights is a common method to prevent forgetting previously
learned classes while learning novel ones. However, existing regularizers use a
constant magnitude throughout the learning sessions, which may not reflect the
varying levels of difficulty of the tasks encountered during incremental
learning. This study investigates the necessity of adaptive regularization in
Class-Incremental Learning, which dynamically adjusts the regularization
strength according to the complexity of the task at hand. We propose a Bayesian
Optimization-based approach to automatically determine the optimal
regularization magnitude for each learning task. Our experiments on two
datasets with two regularizers demonstrate the importance of adaptive
regularization for achieving accurate and less forgetful visual incremental
learning.
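The abstract does not spell out how the Bayesian Optimization loop is wired up; the following is only a minimal sketch of the general idea, assuming an EWC-style penalty coefficient tuned per task with scikit-optimize. The helpers train_on_task and evaluate are hypothetical placeholders, not functions from the paper.

    # Sketch: choose a per-task regularization magnitude with Bayesian Optimization.
    # Assumes scikit-optimize; train_on_task() and evaluate() are hypothetical
    # placeholders for the incremental training loop and the validation step.
    from skopt import gp_minimize
    from skopt.space import Real

    def tune_reg_strength(model, task_data, val_data, n_calls=15):
        def objective(params):
            lam = params[0]
            candidate = train_on_task(model, task_data, reg_strength=lam)
            return -evaluate(candidate, val_data)  # gp_minimize minimizes, so negate accuracy

        result = gp_minimize(
            objective,
            dimensions=[Real(1e-2, 1e4, prior="log-uniform")],  # search range for the penalty weight
            n_calls=n_calls,
            random_state=0,
        )
        return result.x[0]  # regularization magnitude chosen for this task

A log-uniform search space is used here because useful penalty weights typically span several orders of magnitude.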
Continual Learning with Dynamic Sparse Training: Exploring Algorithms for Effective Model Updates
Continual learning (CL) refers to the ability of an intelligent system to
sequentially acquire and retain knowledge from a stream of data with as little
computational overhead as possible. To this end, regularization, replay,
architecture, and parameter isolation approaches have been introduced in the
literature. Parameter isolation uses a sparse network, which makes it possible
to allocate distinct parts of the neural network to different tasks and also to
share parameters between tasks if they are similar. Dynamic Sparse
Training (DST) is a prominent way to find these sparse networks and isolate
them for each task. This paper is the first empirical study investigating the
effect of different DST components under the CL paradigm to fill a critical
research gap and shed light on the optimal configuration of DST for CL, if it
exists. Therefore, we perform a comprehensive study in which we investigate
various DST components to find the best topology per task on the well-known
CIFAR100 and miniImageNet benchmarks in a task-incremental CL setup, since our
primary focus is to evaluate the performance of various DST criteria, rather
than the process of mask selection. We found that, at a low sparsity level,
Erdős-Rényi Kernel (ERK) initialization utilizes the backbone more efficiently
and allows increments of tasks to be learned effectively. At a high sparsity
level, unless it is extreme, uniform initialization demonstrates more reliable
and robust performance. In terms of growth strategy, performance depends on the
chosen initialization strategy and the extent of sparsity. Finally, adaptivity
within DST components is a promising way toward better continual learners.
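For readers unfamiliar with the two sparsity initializations compared above, the snippet below sketches how ERK and uniform allocation distribute a global density budget across layers. The layer shapes are illustrative, and the cap at density 1.0 is handled naively rather than with the iterative correction used in full DST implementations.

    # Sketch: Erdős-Rényi-Kernel (ERK) vs. uniform per-layer density allocation.
    # Layer shapes are illustrative; real DST code redistributes any density
    # clipped at 1.0 back to the remaining layers.
    import numpy as np

    def erk_densities(layer_shapes, target_density):
        numel = np.array([np.prod(s) for s in layer_shapes], dtype=float)
        # ERK scales each layer by (sum of dims) / (product of dims), so thin
        # layers keep proportionally more weights than wide ones.
        score = np.array([np.sum(s) / np.prod(s) for s in layer_shapes])
        eps = target_density * numel.sum() / (score * numel).sum()
        return np.minimum(eps * score, 1.0)

    def uniform_densities(layer_shapes, target_density):
        # Uniform allocation keeps the same fraction of weights in every layer.
        return np.full(len(layer_shapes), target_density)

    # Example: two conv layers (out, in, kh, kw) and one classifier head (out, in).
    shapes = [(64, 3, 3, 3), (128, 64, 3, 3), (100, 2048)]
    print(erk_densities(shapes, target_density=0.1))
    print(uniform_densities(shapes, target_density=0.1))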
FOCIL: Finetune-and-Freeze for Online Class Incremental Learning by Training Randomly Pruned Sparse Experts
Class incremental learning (CIL) in an online continual learning setting strives to acquire knowledge on a series of novel classes from a data stream, using each data point only once for training. This is more realistic than offline modes, where all data from the novel class(es) is assumed to be readily available. Current online CIL approaches store a subset of the previous data, which creates heavy memory and computation overhead as well as privacy issues. In this paper, we propose a new online CIL approach called FOCIL. It fine-tunes the main architecture continually by training a randomly pruned sparse subnetwork for each task. Then, it freezes the trained connections to prevent forgetting. FOCIL also determines the sparsity level and learning rate per task adaptively and ensures (almost) zero forgetting across all tasks without storing any replay data. Experimental results on 10-Task CIFAR100, 20-Task CIFAR100, and 100-Task TinyImagenet demonstrate that our method outperforms the SOTA by a large margin. The code is publicly available at https://github.com/muratonuryildirim/FOCIL.
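FOCIL's full training procedure is in the linked repository; the snippet below is only a rough sketch of the core idea stated in the abstract, namely training each task on a randomly pruned subnetwork of the backbone and freezing it afterwards, written in PyTorch with simplified mask handling.

    # Sketch of the idea described above: each task trains a randomly pruned
    # subnetwork of the shared backbone, whose connections are frozen afterwards.
    # This is an illustrative simplification, not the released FOCIL implementation.
    import torch

    def make_task_mask(weight, sparsity, frozen_mask):
        # Randomly pick (1 - sparsity) of the weights not yet frozen by earlier tasks.
        scores = torch.rand_like(weight).masked_fill(frozen_mask, -1.0)
        k = int((1.0 - sparsity) * weight.numel())
        mask = torch.zeros_like(weight, dtype=torch.bool)
        mask.view(-1)[torch.topk(scores.flatten(), k).indices] = True
        return mask & ~frozen_mask

    def mask_gradients(model, task_masks):
        # Zero gradients outside the current task's subnetwork so frozen and
        # unassigned weights stay untouched during the online update.
        for name, p in model.named_parameters():
            if name in task_masks and p.grad is not None:
                p.grad.mul_(task_masks[name].float())

After a task finishes, its mask would be OR-ed into frozen_mask so that later tasks can only claim the remaining free connections.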