Frosting Weights for Better Continual Training
Training a neural network model can be a lifelong learning process and is a
computationally intensive one. A severe adverse effect that may occur in deep
neural network models is catastrophic forgetting during retraining on new
data. One appealing property for avoiding such disruptions in continual
learning is the additive nature of ensemble models. In this paper, we propose
two generic ensemble approaches, gradient boosting and meta-learning, to
solve the catastrophic forgetting problem when tuning pre-trained neural
network models.
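The additive-ensemble idea can be illustrated with a minimal boosting-style sketch (the names and the constant corrector below are illustrative toys, not the paper's actual method): the old model stays frozen, and a small corrector fitted on new data is added to its output, so behaviour on past data is preserved by construction.

```python
# Toy sketch: additive ensemble for continual updates. Instead of
# retraining a model in place (risking catastrophic forgetting), keep
# the old model frozen and fit a corrector on new data, boosting-style:
# prediction = old(x) + corrector.

def fit_constant_corrector(old_model, xs, ys):
    """Fit the simplest possible booster: a constant minimizing the
    mean residual of the frozen old model on the new data."""
    residuals = [y - old_model(x) for x, y in zip(xs, ys)]
    return sum(residuals) / len(residuals)

def boosted(old_model, correction):
    # The ensemble is additive: the old model's outputs on past data
    # are untouched; new knowledge lives entirely in the corrector.
    return lambda x: old_model(x) + correction

# Toy regression: the old model underestimates new-domain targets by 2.
old = lambda x: 1.0 * x
xs, ys = [1.0, 2.0, 3.0], [3.0, 4.0, 5.0]  # targets are x + 2
c = fit_constant_corrector(old, xs, ys)
new = boosted(old, c)
```

A real booster would fit a small network on the residuals rather than a constant, but the additivity argument is the same.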
SIESTA: Efficient Online Continual Learning with Sleep
In supervised continual learning, a deep neural network (DNN) is updated with
an ever-growing data stream. Unlike the offline setting where data is shuffled,
we cannot make any distributional assumptions about the data stream. Ideally,
only one pass through the dataset is needed for computational efficiency.
However, existing methods are inadequate and make many assumptions that cannot
be made for real-world applications, while simultaneously failing to improve
computational efficiency. In this paper, we propose SIESTA, a novel continual
learning method based on a wake/sleep training framework that is well
aligned with the needs of on-device learning. The major goal of SIESTA is to
advance compute efficient continual learning so that DNNs can be updated
efficiently using far less time and energy. The principal innovations of SIESTA
are: 1) rapid online updates using a rehearsal-free, backpropagation-free, and
data-driven network update rule during its wake phase, and 2) expedited memory
consolidation using a compute-restricted rehearsal policy during its sleep
phase. For memory efficiency, SIESTA adapts latent rehearsal using memory
indexing from REMIND. Compared to REMIND and prior art, SIESTA is far more
computationally efficient, enabling continual learning on ImageNet-1K in under
2 hours on a single GPU; moreover, in the augmentation-free setting it matches
the performance of the offline learner, a milestone critical to driving
adoption of continual learning in real-world applications. Comment: Accepted to TMLR 202
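The wake/sleep split can be sketched with a toy learner (class names, the running-mean prototype rule, and the buffer policy below are illustrative assumptions, not SIESTA's exact algorithm): wake performs a backpropagation-free online update, and sleep performs compute-restricted consolidation over a bounded latent buffer.

```python
# Toy wake/sleep continual learner in the spirit of SIESTA.
# Wake: rehearsal-free, backprop-free running class-mean update.
# Sleep: consolidation limited to a fixed compute budget over a
# bounded buffer of stored (here, uncompressed scalar) latents.

from collections import defaultdict

class WakeSleepLearner:
    def __init__(self, buffer_size=100, sleep_budget=50):
        self.proto = {}                  # label -> running mean feature
        self.count = defaultdict(int)
        self.buffer = []                 # latent rehearsal buffer
        self.buffer_size = buffer_size
        self.sleep_budget = sleep_budget  # max samples replayed per sleep

    def wake_update(self, feat, label):
        # Online update: incremental mean, no gradients required.
        self.count[label] += 1
        n = self.count[label]
        old = self.proto.get(label, 0.0)
        self.proto[label] = old + (feat - old) / n
        if len(self.buffer) < self.buffer_size:
            self.buffer.append((feat, label))

    def sleep(self):
        # Compute-restricted consolidation: rebuild prototypes from at
        # most sleep_budget buffered latents.
        sums, counts = defaultdict(float), defaultdict(int)
        for feat, label in self.buffer[: self.sleep_budget]:
            sums[label] += feat
            counts[label] += 1
        for label in sums:
            self.proto[label] = sums[label] / counts[label]

    def predict(self, feat):
        # Nearest-prototype classification on scalar features.
        return min(self.proto, key=lambda c: abs(self.proto[c] - feat))

learner = WakeSleepLearner()
for feat, label in [(0.0, "cat"), (0.1, "cat"), (1.0, "dog"), (0.9, "dog")]:
    learner.wake_update(feat, label)
learner.sleep()
```

In the actual system the wake-phase update and sleep-phase rehearsal operate on quantized deep features, but the budgeted two-phase structure is the point of the sketch.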
Adversary Aware Continual Learning
Continual learning approaches are useful as they help the model to learn new information (classes) sequentially while also retaining the previously acquired information (classes). However, these approaches are adversary agnostic, i.e., they do not consider the possibility of malicious attacks. In this dissertation, we demonstrate that continual learning approaches are extremely vulnerable to adversarial backdoor attacks, where an intelligent adversary can introduce a small amount of misinformation into the model in the form of an imperceptible backdoor pattern during training to cause deliberate forgetting of a specific class at test time. We then propose a novel defensive framework to counter such an insidious attack, where we use the attacker’s primary strength – hiding the backdoor pattern by making it imperceptible to humans – against it, and propose to learn a perceptible (stronger) pattern (also during training) that can overpower the attacker’s imperceptible (weaker) pattern. We demonstrate the effectiveness of the proposed defensive mechanism through various commonly used replay-based (both generative and exact replay-based) continual learning algorithms using the CIFAR-10, CIFAR-100, and MNIST benchmark datasets. Most notably, we show that our proposed defensive framework considerably improves the robustness of continual learning algorithms with ZERO knowledge of the attacker’s target task, the attacker’s target class, or the shape, size, and location of the attacker’s pattern. The proposed defensive framework also does not depend on the underlying continual learning algorithm. We term our proposed defensive framework Adversary Aware Continual Learning (AACL).
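The contrast between the two patterns can be made concrete in pixel space (the functions, patch location, and magnitudes below are hypothetical, chosen only to illustrate "weak imperceptible trigger" versus "strong perceptible pattern"):

```python
# Toy illustration of the two signals involved in the defense.
# Attacker: a tiny additive perturbation (imperceptible trigger).
# Defender: a saturated, clearly visible patch stamped during
# training, large in amplitude so the model can latch onto it.
# Images are nested lists of floats in [0, 1].

def imperceptible_trigger(img, eps=0.01):
    # Attacker's weak trigger: barely changes any pixel.
    return [[min(1.0, p + eps) for p in row] for row in img]

def stamp_pattern(img, value=1.0, size=2):
    # Defender's perceptible pattern: a saturated size x size patch
    # in the top-left corner, orders of magnitude stronger than eps.
    out = [row[:] for row in img]
    for i in range(size):
        for j in range(size):
            out[i][j] = value
    return out

img = [[0.5] * 4 for _ in range(4)]
defended = stamp_pattern(imperceptible_trigger(img))
```

In the actual defense this stamped pattern is learned during training, but the asymmetry in signal strength, visible above even in raw pixel values, is what lets it overpower the trigger.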
Scalable approximate inference methods for Bayesian deep learning
This thesis proposes multiple methods for approximate inference in deep Bayesian neural networks split across three parts.
The first part develops a scalable Laplace approximation based on a block-diagonal Kronecker-factored approximation of the Hessian. This approximation accounts for parameter correlations – overcoming the overly restrictive independence assumption of diagonal methods – while avoiding the quadratic scaling in the number of parameters of the full Laplace approximation. The chapter further extends the method to online learning, where datasets are observed one at a time. As the experiments demonstrate, modelling correlations between the parameters leads to improved performance over the diagonal approximation in uncertainty estimation and continual learning; in the latter setting in particular, the improvements can be substantial.
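The scaling argument can be sketched numerically: for a layer with an m x n weight matrix, the full Hessian block has (mn)^2 entries, while a Kronecker factorisation H ≈ A ⊗ G stores only m^2 + n^2 while still coupling parameters across rows and columns. A minimal sketch (the 2x2 factors are arbitrary illustrative values):

```python
# Kronecker product of two square matrices, given as nested lists.
# With factors A (n x n) and G (m x m), the (nm) x (nm) product
# A ⊗ G encodes cross-parameter correlations that a diagonal
# curvature approximation would discard, at the storage cost of the
# two small factors rather than the full block.

def kron(a, b):
    n, m = len(a), len(b)
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
        for i in range(n * m)
    ]

A = [[2.0, 0.0], [0.0, 3.0]]   # illustrative "input" factor
G = [[1.0, 0.5], [0.5, 1.0]]   # illustrative "output" factor
H = kron(A, G)                 # 4x4 approximation of a Hessian block
```

Note the off-diagonal entries of H: these are exactly the correlations a diagonal method sets to zero.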
The second part explores two parameter-efficient approaches for variational inference in neural networks, one based on factorised binary distributions over the weights, the other extending ideas from sparse Gaussian processes to neural network weight matrices. The former encounters similar underfitting issues as mean-field Gaussian approaches, which can be alleviated by a MAP-style method in a hierarchical model. The latter, based on an extension of Matheron’s rule to matrix normal distributions, achieves uncertainty estimation performance comparable to ensembles and the accuracy of a deterministic network while using only 25% of the parameters of a single ResNet-50.
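Matheron’s rule itself is easy to state in one dimension (the thesis extends it to matrix normal distributions; the scalar version below, with unit-variance marginals and correlation rho, is only a toy): to sample from the Gaussian conditional p(x | y = y0), draw (x̄, ȳ) from the joint prior and correct x̄ by the regression of the observed discrepancy.

```python
# Scalar Matheron's rule: conditional sampling via a corrected prior
# draw. For jointly Gaussian (x, y) with Cov(x, y) = rho, Var(y) = 1,
# the sample x_bar + rho * (y0 - y_bar) is distributed as p(x | y=y0).

import random

def sample_joint(rng, rho):
    # Joint prior with unit marginals and correlation rho.
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y = z1
    x = rho * z1 + (1 - rho ** 2) ** 0.5 * z2
    return x, y

def matheron_conditional_sample(rng, y0, rho):
    x_bar, y_bar = sample_joint(rng, rho)
    # Correction term: Cov(x, y) / Var(y) * (y0 - y_bar).
    return x_bar + rho * (y0 - y_bar)

rng = random.Random(0)
rho, y0 = 0.8, 1.5
samples = [matheron_conditional_sample(rng, y0, rho) for _ in range(20000)]
mean = sum(samples) / len(samples)   # should approach rho * y0 = 1.2
```

The appeal is that only prior samples and a linear correction are needed, which is what makes the matrix-normal extension tractable for weight matrices.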
The third part introduces TyXe, a probabilistic programming library built on top of Pyro to facilitate turning PyTorch neural networks into Bayesian ones. In contrast to existing frameworks, TyXe avoids introducing a layer abstraction, allowing it to support arbitrary architectures. This is demonstrated in a range of applications, from image classification with torchvision ResNets, through node labelling with DGL graph neural networks, to incorporating uncertainty into neural radiance fields with PyTorch3d.
Continual semi-supervised learning through contrastive interpolation consistency
Continual Learning (CL) investigates how to train Deep Networks on a stream of tasks without incurring forgetting. CL settings proposed in the literature assume that every incoming example is paired with ground-truth annotations. However, this clashes with many real-world applications: gathering labeled data, which is in itself tedious and expensive, becomes infeasible when data flow as a stream. This work explores Continual Semi-Supervised Learning (CSSL): here, only a small fraction of the input examples shown to the learner come with labels. We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, where overfitting entangles with forgetting. Subsequently, we design a novel CSSL method that exploits metric learning and consistency regularization to leverage unlabeled examples while learning. We show that our proposal exhibits higher resilience to diminishing supervision and, even more surprisingly, that relying on only partial supervision suffices to outperform SOTA methods trained under full supervision.
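The consistency-regularization ingredient can be sketched with an interpolation-consistency penalty (the scalar "models" and function names below are illustrative stand-ins for the paper's networks): a model's prediction on a mixed input should match the same mix of its predictions on the originals, which requires no labels at all.

```python
# Interpolation consistency on unlabeled data:
#   loss = ( f(mix(x1, x2)) - mix(f(x1), f(x2)) )^2
# Scalar inputs and models keep the computation exact.

def mix(a, b, lam):
    # Convex combination with mixing coefficient lam in [0, 1].
    return lam * a + (1 - lam) * b

def interpolation_consistency_loss(model, x1, x2, lam):
    return (model(mix(x1, x2, lam)) - mix(model(x1), model(x2), lam)) ** 2

linear = lambda x: 3.0 * x + 1.0   # affine models are perfectly consistent
bent = lambda x: x * x             # nonlinear models pay a penalty

l1 = interpolation_consistency_loss(linear, 0.0, 2.0, 0.5)
l2 = interpolation_consistency_loss(bent, 0.0, 2.0, 0.5)
```

Minimizing this penalty pushes the decision function toward linear behaviour between unlabeled points, which is the regularizing effect the method exploits.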
Domain-incremental Cardiac Image Segmentation with Style-oriented Replay and Domain-sensitive Feature Whitening
Contemporary methods have shown promising results on cardiac image
segmentation, but only in the static-learning setting, i.e., optimizing the
network once for all while ignoring the potential need for model updating. In
real-world scenarios, new data continues to be gathered from multiple
institutions over time, and demands keep growing for more satisfactory
performance. The desired model should incrementally learn from each incoming
dataset and progressively update with improved functionality as time goes
by. As the datasets sequentially
delivered from multiple sites are normally heterogeneous with domain
discrepancy, each updated model should not catastrophically forget previously
learned domains while generalizing well to newly arrived domains or even
unseen domains. In medical scenarios, this is particularly challenging as
accessing or storing past data is commonly not allowed due to data privacy. To
this end, we propose a novel domain-incremental learning framework to recover
past domain inputs first and then regularly replay them during model
optimization. Particularly, we first present a style-oriented replay module to
enable structure-realistic and memory-efficient reproduction of past data, and
then incorporate the replayed past data to jointly optimize the model with
current data to alleviate catastrophic forgetting. During optimization, we
additionally perform domain-sensitive feature whitening to suppress the model's
dependency on features that are sensitive to domain changes (e.g.,
domain-distinctive style features) to assist domain-invariant feature
exploration and gradually improve the generalization performance of the
network. We have extensively evaluated our approach on the M&Ms Dataset in
single-domain and compound-domain incremental learning settings, showing
improved performance over other comparison approaches. Comment: Accepted to IEEE Transactions on Medical Imaging
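The intuition behind suppressing domain-sensitive statistics can be sketched with per-sample feature whitening (a simplified, instance-norm-style stand-in for the paper's domain-sensitive feature whitening; the "domains" below are fabricated toy signals): first- and second-order feature statistics often carry the domain's style, so normalizing them out leaves a more domain-invariant representation.

```python
# Per-sample whitening: remove the mean/variance "style" statistics
# that differ across imaging domains, keeping the shape of the
# feature vector. Two domains with the same underlying structure but
# different brightness/contrast collapse to the same representation.

def whiten(feats, eps=1e-5):
    mu = sum(feats) / len(feats)
    var = sum((f - mu) ** 2 for f in feats) / len(feats)
    return [(f - mu) / (var + eps) ** 0.5 for f in feats]

# Same "anatomy" signal under two styles: domain_b is brighter and
# higher-contrast (shifted and scaled version of domain_a).
domain_a = [0.0, 1.0, 2.0, 3.0]
domain_b = [10.0, 12.0, 14.0, 16.0]
wa, wb = whiten(domain_a), whiten(domain_b)
```

After whitening, the two domains are nearly indistinguishable, which is the property a downstream segmentation head can exploit for domain-invariant features.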