Continual Learning via Sequential Function-Space Variational Inference
Sequential Bayesian inference over predictive functions is a natural
framework for continual learning from streams of data. However, applying it to
neural networks has proved challenging in practice. Addressing the drawbacks of
existing techniques, we propose an optimization objective derived by
formulating continual learning as sequential function-space variational
inference. In contrast to existing methods that regularize neural network
parameters directly, this objective allows parameters to vary widely during
training, enabling better adaptation to new tasks. Compared to objectives that
directly regularize neural network predictions, the proposed objective allows
for more flexible variational distributions and more effective regularization.
We demonstrate that, across a range of task sequences, neural networks trained
via sequential function-space variational inference achieve better predictive
accuracy than networks trained with related methods while depending less on
maintaining a set of representative points from previous tasks.
Comment: Published in Proceedings of the 39th International Conference on Machine Learning (ICML 2022).
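To make the contrast with parameter-space regularization concrete, here is a minimal sketch, not the paper's actual objective (which performs variational inference over functions): the current network's predictive distribution is penalized for drifting from the previous model's predictions at a small set of context inputs, while the parameters themselves remain free to move. The names function_space_kl, context_x, and lam are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def function_space_kl(model, prev_model, context_x, temperature=1.0):
    # KL between the previous and current predictive distributions at a
    # set of remembered context points; a crude stand-in for the paper's
    # sequential function-space variational objective.
    with torch.no_grad():
        prev_logits = prev_model(context_x)
    curr_logits = model(context_x)
    prev_probs = F.softmax(prev_logits / temperature, dim=-1)
    curr_log_probs = F.log_softmax(curr_logits / temperature, dim=-1)
    # KL(prev || curr): penalizes the new function for moving away from
    # the old one on these inputs, without constraining the weights.
    return F.kl_div(curr_log_probs, prev_probs, reduction="batchmean")

def training_step(model, prev_model, x, y, context_x, lam=1.0):
    # New-task loss plus the function-space penalty.
    task_loss = F.cross_entropy(model(x), y)
    return task_loss + lam * function_space_kl(model, prev_model, context_x)
```

Note that the weights can change arbitrarily between tasks as long as the induced function stays close on the context points, which is exactly the flexibility the abstract highlights.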
Recent Advances of Continual Learning in Computer Vision: An Overview
In contrast to batch learning where all training data is available at once,
continual learning represents a family of methods that accumulate knowledge and
learn continuously as data arrives in sequential order. Like the human learning
process, which fuses and accumulates new knowledge arriving at different time
steps, continual learning is considered to have high practical significance.
Hence, continual learning has been studied
in various artificial intelligence tasks. In this paper, we present a
comprehensive review of the recent progress of continual learning in computer
vision. In particular, the works are grouped by their representative
techniques, including regularization, knowledge distillation, memory,
generative replay, parameter isolation, and a combination of the above
techniques. For each category of these techniques, both its characteristics and
applications in computer vision are presented. At the end of this overview, we
discuss several subareas where continuous knowledge accumulation is potentially
helpful but continual learning has not yet been well studied.
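As a concrete illustration of the memory category listed above, a fixed-size replay buffer filled by reservoir sampling is a common building block; the sketch below is generic rather than taken from any particular surveyed method.

```python
import random

class ReservoirBuffer:
    # Fixed-size memory via reservoir sampling: after n examples have
    # streamed past, each one is retained with equal probability.
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0

    def add(self, example):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Keep the new example with probability capacity / n_seen.
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        # Mix stored examples into each new-task training batch.
        return random.sample(self.data, min(k, len(self.data)))
```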
Continual Learning with Extended Kronecker-factored Approximate Curvature
We propose a quadratic penalty method for continual learning of neural
networks that contain batch normalization (BN) layers. The Hessian of a loss
function represents the curvature of the quadratic penalty function, and a
Kronecker-factored approximate curvature (K-FAC) is used widely to practically
compute the Hessian of a neural network. However, the approximation is not
valid if there is dependence between examples, typically caused by BN layers in
deep network architectures. We extend the K-FAC method so that the
inter-example relations are taken into account and the Hessian of deep neural
networks can be properly approximated under practical assumptions. We also
propose a method of weight merging and reparameterization to properly handle
statistical parameters of BN, which play a critical role in continual
learning with BN, and a method that selects hyperparameters without source task
data. Our method shows better performance than baselines in the permuted MNIST
task with BN layers and in sequential learning from the ImageNet classification
task to fine-grained classification tasks with ResNet-50, without any explicit
or implicit use of source task data for hyperparameter selection.
Comment: CVPR 2020.
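For reference, the standard per-layer K-FAC penalty that the paper extends can be computed without ever materializing the full Hessian, using the Kronecker identity (A ⊗ G) vec(dW) = vec(G dW A) for symmetric A. The sketch below covers only this baseline case; the paper's actual contribution, handling the inter-example dependence introduced by BN, is not shown, and the variable names are illustrative.

```python
import torch

def kfac_quadratic_penalty(W, W_prev, A, G, lam):
    # Quadratic penalty 0.5 * lam * vec(dW)^T (A kron G) vec(dW) for one
    # linear layer, where the layer's Hessian/Fisher block is approximated
    # by the Kronecker product of:
    #   A: (in, in)   second moment of the layer's input activations
    #   G: (out, out) second moment of the pre-activation gradients
    dW = W - W_prev                      # (out, in) drift from old task
    return 0.5 * lam * torch.sum(dW * (G @ dW @ A))
```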
Online Continual Learning via Logit Adjusted Softmax
Online continual learning is a challenging problem where models must learn
from a non-stationary data stream while avoiding catastrophic forgetting.
Inter-class imbalance during training has been identified as a major cause of
forgetting, leading to model prediction bias towards recently learned classes.
In this paper, we show theoretically that inter-class imbalance is entirely
attributable to imbalanced class priors, and that the function learned from the
intra-class intrinsic distributions is the Bayes-optimal classifier. Building on
this, we show that a simple adjustment of the model logits during training can
effectively counteract the class-prior bias and pursue the corresponding Bayes
optimum.
Our proposed method, Logit Adjusted Softmax, can mitigate the impact of
inter-class imbalance not only in class-incremental but also in realistic
general setups, with little additional computational cost. We evaluate our
approach on various benchmarks and demonstrate significant performance
improvements compared to prior art. For example, our approach improves on the
best baseline by 4.6% on CIFAR10.
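The core recipe is simple enough to state in a few lines: shift the training logits by a scaled log of the class priors, so that the learned function targets the balanced Bayes-optimal classifier rather than the prior-biased one. The sketch below shows the standard logit-adjustment form; how the priors are tracked over the non-stationary stream, and the exact loss used in the paper, may differ, and class_counts and tau are illustrative names.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_counts, tau=1.0):
    # Shift each logit by tau * log prior of its class; frequently seen
    # (recent) classes receive a larger handicap, so the network is not
    # rewarded for simply predicting recently learned classes.
    prior = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(prior + 1e-12)
    return F.cross_entropy(adjusted, targets)
```

At test time the unadjusted logits are used, which under this kind of analysis approximates the Bayes-optimal classifier for balanced priors.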
Continual Learning with Adaptive Weights (CLAW)
Approaches to continual learning aim to successfully learn a set of related tasks that arrive in an online manner. Recently, several frameworks have been developed which enable deep learning to be deployed in this learning scenario. A key modelling decision is to what extent the architecture should be shared across tasks. On the one hand, separately modelling each task avoids catastrophic forgetting, but it does not support transfer learning and leads to large models. On the other hand, rigidly specifying a shared component and a task-specific part enables task transfer and limits the model size, but it is vulnerable to catastrophic forgetting and restricts the form of task transfer that can occur. Ideally, the network should adaptively identify which parts of the network to share in a data-driven way. Here we introduce such an approach called Continual Learning with Adaptive Weights (CLAW), which is based on probabilistic modelling and variational inference. Experiments show that CLAW achieves state-of-the-art performance on six benchmarks, both in terms of overall continual learning performance, as measured by classification accuracy, and in terms of addressing catastrophic forgetting.
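CLAW's adaptive sharing can be loosely pictured as learned per-unit, per-task modulation of a shared backbone; the toy layer below conveys only that gating idea, with hypothetical names (AdaptiveSharedLayer, gate_logits), and does not reproduce CLAW's actual variational treatment of the adaptation parameters.

```python
import torch
import torch.nn as nn

class AdaptiveSharedLayer(nn.Module):
    # A shared linear layer whose units are modulated by learned
    # per-task gates, so how much each neuron is shared versus
    # task-specific is decided from data rather than fixed a priori.
    def __init__(self, d_in, d_out, n_tasks):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        # One gate logit per unit per task; sigmoid(0) = 0.5 at init.
        self.gate_logits = nn.Parameter(torch.zeros(n_tasks, d_out))

    def forward(self, x, task_id):
        gates = torch.sigmoid(self.gate_logits[task_id])
        return gates * self.shared(x)
```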
A Wholistic View of Continual Learning with Deep Neural Networks: Forgotten Lessons and the Bridge to Active and Open World Learning
Current deep learning research is dominated by benchmark evaluation. A method
is regarded as favorable if it empirically performs well on the dedicated test
set. This mentality is seamlessly reflected in the resurfacing area of
continual learning, where consecutively arriving sets of benchmark data are
investigated. The core challenge is framed as protecting previously acquired
representations from being catastrophically forgotten due to the iterative
parameter updates. However, comparison of individual methods is treated in
isolation from real-world application and is typically judged by monitoring
accumulated test set performance. The closed-world assumption
remains predominant. It is assumed that during deployment a model is guaranteed
to encounter data that stems from the same distribution as used for training.
This poses a massive challenge as neural networks are well known to provide
overconfident false predictions on unknown instances and break down in the face
of corrupted data. In this work we argue that notable lessons from open set
recognition, the identification of statistically deviating data outside of the
observed dataset, and the adjacent field of active learning, where data is
incrementally queried such that the expected performance gain is maximized, are
frequently overlooked in the deep learning era. Based on these forgotten
lessons, we propose a consolidated view to bridge continual learning, active
learning and open set recognition in deep neural networks. Our results show
that this not only benefits each individual paradigm, but highlights the
natural synergies in a common framework. We empirically demonstrate
improvements in alleviating catastrophic forgetting, querying data in active
learning, and selecting task orders, while exhibiting robust open-world
application where previously proposed methods fail.
Comment: 32 pages.
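One way to see the claimed synergy is that a single uncertainty score can serve multiple paradigms at once: rejecting statistically deviating inputs (open set recognition) and choosing what to label next (active learning). The sketch below uses plain predictive entropy for both; the paper's consolidated framework is more elaborate, and the function names and threshold here are illustrative.

```python
import torch
import torch.nn.functional as F

def predictive_entropy(logits):
    # Entropy of the softmax predictive distribution, per example.
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)

def reject_unknown(logits, threshold):
    # Open-set style rejection: abstain on inputs whose predictive
    # uncertainty exceeds a threshold instead of forcing a label.
    return predictive_entropy(logits) > threshold

def select_queries(logits, k):
    # Active-learning style acquisition: query labels for the k most
    # uncertain unlabeled examples under the same score.
    return torch.topk(predictive_entropy(logits), k).indices
```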
On the exploration of incremental learning for fine-grained image retrieval