Frosting Weights for Better Continual Training
Training a neural network model can be a lifelong learning process and is a
computationally intensive one. A severe adverse effect that may occur in deep
neural network models is catastrophic forgetting during retraining on new
data. One appealing property for avoiding such disruptions in continual
learning is the additive nature of ensemble models. In this paper, we propose
two generic ensemble approaches, gradient boosting and meta-learning, to
solve the catastrophic forgetting problem when tuning pre-trained neural
network models.
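The additive-ensemble idea can be illustrated with a minimal boosting-style sketch (the names and the constant corrector below are illustrative toys, not the paper's actual method): the old model stays frozen, and a small corrector fitted on new data is added to its output, so behaviour on past data is preserved by construction.

```python
# Toy sketch: additive ensemble for continual updates. Instead of
# retraining a model in place (risking catastrophic forgetting), keep
# the old model frozen and fit a corrector on new data, boosting-style:
# prediction = old(x) + corrector.

def fit_constant_corrector(old_model, xs, ys):
    """Fit the simplest possible booster: a constant minimizing the
    mean residual of the frozen old model on the new data."""
    residuals = [y - old_model(x) for x, y in zip(xs, ys)]
    return sum(residuals) / len(residuals)

def boosted(old_model, correction):
    # The ensemble is additive: the old model's outputs on past data
    # are untouched; new knowledge lives entirely in the corrector.
    return lambda x: old_model(x) + correction

# Toy regression: the old model underestimates new-domain targets by 2.
old = lambda x: 1.0 * x
xs, ys = [1.0, 2.0, 3.0], [3.0, 4.0, 5.0]  # targets are x + 2
c = fit_constant_corrector(old, xs, ys)
new = boosted(old, c)
```

A real booster would fit a small network on the residuals rather than a constant, but the additivity argument is the same.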
SIESTA: Efficient Online Continual Learning with Sleep
In supervised continual learning, a deep neural network (DNN) is updated with
an ever-growing data stream. Unlike the offline setting where data is shuffled,
we cannot make any distributional assumptions about the data stream. Ideally,
only one pass through the dataset is needed for computational efficiency.
However, existing methods are inadequate and make many assumptions that cannot
be made for real-world applications, while simultaneously failing to improve
computational efficiency. In this paper, we propose SIESTA, a novel continual
learning method based on a wake/sleep training framework that is well
aligned with the needs of on-device learning. The major goal of SIESTA is to
advance compute efficient continual learning so that DNNs can be updated
efficiently using far less time and energy. The principal innovations of SIESTA
are: 1) rapid online updates using a rehearsal-free, backpropagation-free, and
data-driven network update rule during its wake phase, and 2) expedited memory
consolidation using a compute-restricted rehearsal policy during its sleep
phase. For memory efficiency, SIESTA adapts latent rehearsal using memory
indexing from REMIND. Compared to REMIND and prior art, SIESTA is far more
computationally efficient, enabling continual learning on ImageNet-1K in under
2 hours on a single GPU; moreover, in the augmentation-free setting it matches
the performance of the offline learner, a milestone critical to driving
adoption of continual learning in real-world applications. Comment: Accepted to TMLR 202
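The wake/sleep split can be sketched with a toy learner (class names, the running-mean prototype rule, and the buffer policy below are illustrative assumptions, not SIESTA's exact algorithm): wake performs a backpropagation-free online update, and sleep performs compute-restricted consolidation over a bounded latent buffer.

```python
# Toy wake/sleep continual learner in the spirit of SIESTA.
# Wake: rehearsal-free, backprop-free running class-mean update.
# Sleep: consolidation limited to a fixed compute budget over a
# bounded buffer of stored (here, uncompressed scalar) latents.

from collections import defaultdict

class WakeSleepLearner:
    def __init__(self, buffer_size=100, sleep_budget=50):
        self.proto = {}                  # label -> running mean feature
        self.count = defaultdict(int)
        self.buffer = []                 # latent rehearsal buffer
        self.buffer_size = buffer_size
        self.sleep_budget = sleep_budget  # max samples replayed per sleep

    def wake_update(self, feat, label):
        # Online update: incremental mean, no gradients required.
        self.count[label] += 1
        n = self.count[label]
        old = self.proto.get(label, 0.0)
        self.proto[label] = old + (feat - old) / n
        if len(self.buffer) < self.buffer_size:
            self.buffer.append((feat, label))

    def sleep(self):
        # Compute-restricted consolidation: rebuild prototypes from at
        # most sleep_budget buffered latents.
        sums, counts = defaultdict(float), defaultdict(int)
        for feat, label in self.buffer[: self.sleep_budget]:
            sums[label] += feat
            counts[label] += 1
        for label in sums:
            self.proto[label] = sums[label] / counts[label]

    def predict(self, feat):
        # Nearest-prototype classification on scalar features.
        return min(self.proto, key=lambda c: abs(self.proto[c] - feat))

learner = WakeSleepLearner()
for feat, label in [(0.0, "cat"), (0.1, "cat"), (1.0, "dog"), (0.9, "dog")]:
    learner.wake_update(feat, label)
learner.sleep()
```

In the actual system the wake-phase update and sleep-phase rehearsal operate on quantized deep features, but the budgeted two-phase structure is the point of the sketch.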
Adversary Aware Continual Learning
Continual learning approaches are useful as they help the model to learn new information (classes) sequentially while also retaining the previously acquired information (classes). However, these approaches are adversary agnostic, i.e., they do not consider the possibility of malicious attacks. In this dissertation, we demonstrate that continual learning approaches are extremely vulnerable to adversarial backdoor attacks, where an intelligent adversary can introduce a small amount of misinformation into the model in the form of an imperceptible backdoor pattern during training to cause deliberate forgetting of a specific class at test time. We then propose a novel defensive framework to counter such an insidious attack, where we use the attacker’s primary strength – hiding the backdoor pattern by making it imperceptible to humans – against it, and propose to learn a perceptible (stronger) pattern (also during training) that can overpower the attacker’s imperceptible (weaker) pattern. We demonstrate the effectiveness of the proposed defensive mechanism through various commonly used replay-based (both generative and exact replay-based) continual learning algorithms using the CIFAR-10, CIFAR-100, and MNIST benchmark datasets. Most notably, we show that our proposed defensive framework considerably improves the robustness of continual learning algorithms with ZERO knowledge of the attacker’s target task, the attacker’s target class, or the shape, size, and location of the attacker’s pattern. The proposed defensive framework also does not depend on the underlying continual learning algorithm. We term our proposed defensive framework Adversary Aware Continual Learning (AACL).
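The contrast between the two patterns can be made concrete in pixel space (the functions, patch location, and magnitudes below are hypothetical, chosen only to illustrate "weak imperceptible trigger" versus "strong perceptible pattern"):

```python
# Toy illustration of the two signals involved in the defense.
# Attacker: a tiny additive perturbation (imperceptible trigger).
# Defender: a saturated, clearly visible patch stamped during
# training, large in amplitude so the model can latch onto it.
# Images are nested lists of floats in [0, 1].

def imperceptible_trigger(img, eps=0.01):
    # Attacker's weak trigger: barely changes any pixel.
    return [[min(1.0, p + eps) for p in row] for row in img]

def stamp_pattern(img, value=1.0, size=2):
    # Defender's perceptible pattern: a saturated size x size patch
    # in the top-left corner, orders of magnitude stronger than eps.
    out = [row[:] for row in img]
    for i in range(size):
        for j in range(size):
            out[i][j] = value
    return out

img = [[0.5] * 4 for _ in range(4)]
defended = stamp_pattern(imperceptible_trigger(img))
```

In the actual defense this stamped pattern is learned during training, but the asymmetry in signal strength, visible above even in raw pixel values, is what lets it overpower the trigger.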
Scalable approximate inference methods for Bayesian deep learning
This thesis proposes multiple methods for approximate inference in deep Bayesian neural networks split across three parts.
The first part develops a scalable Laplace approximation based on a block-diagonal Kronecker-factored approximation of the Hessian. This approximation accounts for parameter correlations – overcoming the overly restrictive independence assumption of diagonal methods – while avoiding the quadratic scaling in the number of parameters of the full Laplace approximation. The chapter further extends the method to online learning, where datasets are observed one at a time. As the experiments demonstrate, modelling correlations between the parameters leads to improved performance over the diagonal approximation in uncertainty estimation and continual learning; in the latter setting in particular, the improvements can be substantial.
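The scaling argument can be sketched numerically: for a layer with an m x n weight matrix, the full Hessian block has (mn)^2 entries, while a Kronecker factorisation H ≈ A ⊗ G stores only m^2 + n^2 while still coupling parameters across rows and columns. A minimal sketch (the 2x2 factors are arbitrary illustrative values):

```python
# Kronecker product of two square matrices, given as nested lists.
# With factors A (n x n) and G (m x m), the (nm) x (nm) product
# A ⊗ G encodes cross-parameter correlations that a diagonal
# curvature approximation would discard, at the storage cost of the
# two small factors rather than the full block.

def kron(a, b):
    n, m = len(a), len(b)
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
        for i in range(n * m)
    ]

A = [[2.0, 0.0], [0.0, 3.0]]   # illustrative "input" factor
G = [[1.0, 0.5], [0.5, 1.0]]   # illustrative "output" factor
H = kron(A, G)                 # 4x4 approximation of a Hessian block
```

Note the off-diagonal entries of H: these are exactly the correlations a diagonal method sets to zero.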
The second part explores two parameter-efficient approaches for variational inference in neural networks, one based on factorised binary distributions over the weights, the other extending ideas from sparse Gaussian processes to neural network weight matrices. The former encounters similar underfitting issues as mean-field Gaussian approaches, which can be alleviated by a MAP-style method in a hierarchical model. The latter, based on an extension of Matheron’s rule to matrix normal distributions, achieves uncertainty estimation performance comparable to ensembles and the accuracy of a deterministic network while using only 25% of the parameters of a single ResNet-50.
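Matheron’s rule itself is easy to state in one dimension (the thesis extends it to matrix normal distributions; the scalar version below, with unit-variance marginals and correlation rho, is only a toy): to sample from the Gaussian conditional p(x | y = y0), draw (x̄, ȳ) from the joint prior and correct x̄ by the regression of the observed discrepancy.

```python
# Scalar Matheron's rule: conditional sampling via a corrected prior
# draw. For jointly Gaussian (x, y) with Cov(x, y) = rho, Var(y) = 1,
# the sample x_bar + rho * (y0 - y_bar) is distributed as p(x | y=y0).

import random

def sample_joint(rng, rho):
    # Joint prior with unit marginals and correlation rho.
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y = z1
    x = rho * z1 + (1 - rho ** 2) ** 0.5 * z2
    return x, y

def matheron_conditional_sample(rng, y0, rho):
    x_bar, y_bar = sample_joint(rng, rho)
    # Correction term: Cov(x, y) / Var(y) * (y0 - y_bar).
    return x_bar + rho * (y0 - y_bar)

rng = random.Random(0)
rho, y0 = 0.8, 1.5
samples = [matheron_conditional_sample(rng, y0, rho) for _ in range(20000)]
mean = sum(samples) / len(samples)   # should approach rho * y0 = 1.2
```

The appeal is that only prior samples and a linear correction are needed, which is what makes the matrix-normal extension tractable for weight matrices.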
The third part introduces TyXe, a probabilistic programming library built on top of Pyro to facilitate turning PyTorch neural networks into Bayesian ones. In contrast to existing frameworks, TyXe avoids introducing a layer abstraction, allowing it to support arbitrary architectures. This is demonstrated in a range of applications, from image classification with torchvision ResNets, through node labelling with DGL graph neural networks, to incorporating uncertainty into neural radiance fields with PyTorch3d.
Continual semi-supervised learning through contrastive interpolation consistency
Continual Learning (CL) investigates how to train Deep Networks on a stream of tasks without incurring forgetting. CL settings proposed in the literature assume that every incoming example is paired with ground-truth annotations. However, this clashes with many real-world applications: gathering labeled data, which is in itself tedious and expensive, becomes infeasible when data flow as a stream. This work explores Continual Semi-Supervised Learning (CSSL): here, only a small fraction of the input examples shown to the learner come with labels. We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, where overfitting entangles with forgetting. Subsequently, we design a novel CSSL method that exploits metric learning and consistency regularization to leverage unlabeled examples while learning. We show that our proposal exhibits higher resilience to diminishing supervision and, even more surprisingly, that relying on only partial supervision suffices to outperform SOTA methods trained under full supervision.
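The consistency-regularization ingredient can be sketched with an interpolation-consistency penalty (the scalar "models" and function names below are illustrative stand-ins for the paper's networks): a model's prediction on a mixed input should match the same mix of its predictions on the originals, which requires no labels at all.

```python
# Interpolation consistency on unlabeled data:
#   loss = ( f(mix(x1, x2)) - mix(f(x1), f(x2)) )^2
# Scalar inputs and models keep the computation exact.

def mix(a, b, lam):
    # Convex combination with mixing coefficient lam in [0, 1].
    return lam * a + (1 - lam) * b

def interpolation_consistency_loss(model, x1, x2, lam):
    return (model(mix(x1, x2, lam)) - mix(model(x1), model(x2), lam)) ** 2

linear = lambda x: 3.0 * x + 1.0   # affine models are perfectly consistent
bent = lambda x: x * x             # nonlinear models pay a penalty

l1 = interpolation_consistency_loss(linear, 0.0, 2.0, 0.5)
l2 = interpolation_consistency_loss(bent, 0.0, 2.0, 0.5)
```

Minimizing this penalty pushes the decision function toward linear behaviour between unlabeled points, which is the regularizing effect the method exploits.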
Domain-incremental Cardiac Image Segmentation with Style-oriented Replay and Domain-sensitive Feature Whitening
Contemporary methods have shown promising results on cardiac image
segmentation, but only in the static-learning setting, i.e., optimizing the
network once for all while ignoring the potential need for model updating. In
real-world scenarios, new data continues to be gathered from multiple
institutions over time, and demands keep growing for more satisfactory
performance. The desired model should incrementally learn from each incoming
dataset and progressively update with improved functionality as time goes
by. As the datasets sequentially
delivered from multiple sites are normally heterogeneous with domain
discrepancy, each updated model should not catastrophically forget previously
learned domains while generalizing well to newly arrived domains or even
unseen domains. In medical scenarios, this is particularly challenging as
accessing or storing past data is commonly not allowed due to data privacy. To
this end, we propose a novel domain-incremental learning framework to recover
past domain inputs first and then regularly replay them during model
optimization. Particularly, we first present a style-oriented replay module to
enable structure-realistic and memory-efficient reproduction of past data, and
then incorporate the replayed past data to jointly optimize the model with
current data to alleviate catastrophic forgetting. During optimization, we
additionally perform domain-sensitive feature whitening to suppress the model's
dependency on features that are sensitive to domain changes (e.g.,
domain-distinctive style features) to assist domain-invariant feature
exploration and gradually improve the generalization performance of the
network. We have extensively evaluated our approach on the M&Ms Dataset in
single-domain and compound-domain incremental learning settings, showing
improved performance over other comparison approaches. Comment: Accepted to IEEE Transactions on Medical Imaging
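The intuition behind suppressing domain-sensitive statistics can be sketched with per-sample feature whitening (a simplified, instance-norm-style stand-in for the paper's domain-sensitive feature whitening; the "domains" below are fabricated toy signals): first- and second-order feature statistics often carry the domain's style, so normalizing them out leaves a more domain-invariant representation.

```python
# Per-sample whitening: remove the mean/variance "style" statistics
# that differ across imaging domains, keeping the shape of the
# feature vector. Two domains with the same underlying structure but
# different brightness/contrast collapse to the same representation.

def whiten(feats, eps=1e-5):
    mu = sum(feats) / len(feats)
    var = sum((f - mu) ** 2 for f in feats) / len(feats)
    return [(f - mu) / (var + eps) ** 0.5 for f in feats]

# Same "anatomy" signal under two styles: domain_b is brighter and
# higher-contrast (shifted and scaled version of domain_a).
domain_a = [0.0, 1.0, 2.0, 3.0]
domain_b = [10.0, 12.0, 14.0, 16.0]
wa, wb = whiten(domain_a), whiten(domain_b)
```

After whitening, the two domains are nearly indistinguishable, which is the property a downstream segmentation head can exploit for domain-invariant features.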