A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Forgetting refers to the loss or deterioration of previously acquired
information or knowledge. While the existing surveys on forgetting have
primarily focused on continual learning, forgetting is a prevalent phenomenon
observed in various other research domains within deep learning. Forgetting
manifests in research fields such as generative models, due to generator shifts,
and federated learning, due to heterogeneous data distributions across clients.
Addressing forgetting encompasses several challenges, including balancing the
retention of old task knowledge with fast learning of new tasks, managing task
interference with conflicting goals, and preventing privacy leakage.
Moreover, most existing surveys on continual learning implicitly assume that
forgetting is always harmful. In contrast, our survey argues that forgetting is
a double-edged sword and can be beneficial and desirable in certain cases, such
as privacy-preserving scenarios. By exploring forgetting in a broader context,
we aim to present a more nuanced understanding of this phenomenon and highlight
its potential advantages. Through this comprehensive survey, we aspire to
uncover potential solutions by drawing upon ideas and approaches from various
fields that have dealt with forgetting. By examining forgetting beyond its
conventional boundaries, we hope to encourage the future development of novel
strategies for mitigating, harnessing, or even embracing forgetting in real
applications. A comprehensive list of papers about forgetting in various
research fields is available at
\url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}.
Dynamic Scalable Self-Attention Ensemble for Task-Free Continual Learning
Continual learning is challenging for modern deep neural networks because catastrophic forgetting follows the adaptation of network parameters to new tasks. In this paper, we address a more challenging learning paradigm called Task-Free Continual Learning (TFCL), in which task information is missing during training. To deal with this problem, we introduce the Dynamic Scalable Self-Attention Ensemble (DSSAE) model, which dynamically adds new Vision Transformer (ViT) based experts to handle data distribution shifts during training. To avoid frequent expansions and ensure an appropriate number of experts, we propose a new dynamic expansion mechanism that evaluates the novelty of incoming samples as an expansion signal. Furthermore, the proposed expansion mechanism requires neither task information nor class labels, making it applicable in realistic learning environments. Empirical results demonstrate that the proposed DSSAE achieves state-of-the-art performance in a series of TFCL experiments.
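The novelty-triggered expansion idea in this abstract can be illustrated with a minimal sketch. This is not the authors' ViT-based implementation; each "expert" is reduced to a single prototype vector, and the function names (`novelty`, `maybe_expand`) and the threshold value are hypothetical choices for illustration only.

```python
# Hypothetical sketch of novelty-triggered expert expansion, not the DSSAE code:
# each expert is summarised by a prototype vector; an incoming sample whose
# distance to every prototype exceeds a threshold signals a distribution shift
# and triggers the creation of a new expert.

def novelty(sample, prototypes):
    """Distance from a sample to its nearest expert prototype."""
    return min(
        sum((a - b) ** 2 for a, b in zip(sample, p)) ** 0.5 for p in prototypes
    )

def maybe_expand(sample, prototypes, threshold=1.0):
    """Add a new expert (prototype) when the sample looks novel."""
    if not prototypes or novelty(sample, prototypes) > threshold:
        prototypes.append(list(sample))  # new expert initialised at the sample
        return True
    return False

experts = []
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
for x in stream:
    maybe_expand(x, experts)
print(len(experts))  # two clusters in the stream -> two experts
```

Note that, as in the abstract, the expansion decision uses neither task identifiers nor class labels, only the incoming samples themselves.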
Training deep generative models via lifelong learning
Lifelong learning represents an essential function of an artificial intelligence system: the ability to continually acquire and learn novel knowledge without forgetting previously learned knowledge. Lately, deep learning has brought new possibilities for the development of artificial intelligence, including artificial lifelong learning systems. However, most existing lifelong learning systems are limited to classification tasks, and lifelong generative modelling remains at an early stage. In this PhD thesis, our research goal mainly focuses on training deep generative models in the context of lifelong learning. The advantage of our research topic over general continual learning is that we can implement many downstream tasks within a unified framework, including classification, image generation, image interpolation, and disentangled representation learning. Firstly, we propose a new lifelong hybrid approach, combining the advantages of the Generative Adversarial Network (GAN) and the Variational Autoencoder (VAE) for lifelong generative modelling. The proposed model can learn a robust generative replay network that provides high-quality generative replay samples to relieve forgetting, while we also train inference models to capture meaningful latent representations over time. Secondly, to learn a long sequence of tasks, we propose a novel dynamic expansion model that can reuse existing network parameters and knowledge to learn a related task while building a new component to deal with a novel task. Thirdly, we propose a novel lifelong teacher-student framework where a dynamically expandable GAN mixture model implements the teacher module. Then, we introduce a novel self-supervised learning approach for the student that allows capturing cross-domain latent representations from the entire knowledge accumulated by the teacher as well as from novel data. Finally, we extend the lifelong teacher-student framework to task-free continual learning, where task information is unavailable.
The proposed model can adaptively expand its network architecture when detecting a data distribution shift during training, and can therefore be applied to infinite data streams.
Robust and efficient inference and learning algorithms for generative models
Generative modelling is a popular paradigm in machine learning due to its natural
ability to describe uncertainty in data and models and for its applications including data
compression (Ho et al., 2020), missing data imputation (Valera et al., 2018), synthetic
data generation (Lin et al., 2020), representation learning (Kingma and Welling, 2014),
robust classification (Li et al., 2019b), and more. For generative models, the task of
finding the distribution of unobserved variables conditioned on observed ones is referred
to as inference. Finding the optimal model that makes the model distribution close to the
data distribution according to some discrepancy measures is called learning. In practice,
existing learning and inference methods can fall short on robustness and efficiency. A
method that is more robust to its hyper-parameters or different types of data can be
more easily adapted to various real-world applications. How efficient a method is in
regard to the size and the dimensionality of data determines at what scale the method
can be applied. This thesis presents four pieces of my original work that improve these
properties in generative models.
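The inference/learning distinction drawn above can be made concrete with a minimal sketch. This toy conjugate-Gaussian model is my own illustration, not from the thesis; the function names and the specific model are hypothetical.

```python
import math

# Toy generative model: latent mean z ~ N(mu0, tau0^2), data x_i | z ~ N(z, sigma^2).
# "Inference" = finding the distribution of the unobserved z given observed data;
# "learning" = choosing model parameters to bring the model close to the data.

def posterior_of_latent(xs, mu0=0.0, tau0=1.0, sigma=1.0):
    """Exact Gaussian posterior p(z | x) for the conjugate model (inference)."""
    n = len(xs)
    prec = 1.0 / tau0**2 + n / sigma**2          # posterior precision
    mean = (mu0 / tau0**2 + sum(xs) / sigma**2) / prec
    return mean, math.sqrt(1.0 / prec)

def learn_sigma(xs, z):
    """MLE of the noise scale sigma given a known latent z (a learning step)."""
    return math.sqrt(sum((x - z) ** 2 for x in xs) / len(xs))

xs = [1.2, 0.8, 1.1, 0.9]
mean, std = posterior_of_latent(xs)
print(mean, std)  # posterior concentrates near the sample mean as n grows
print(learn_sigma(xs, z=1.0))
```

In this conjugate case both steps are closed-form; the methods in the thesis target the general case, where inference needs MCMC or variational approximations and learning needs discrepancy minimisation.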
First, I introduce two novel Bayesian inference algorithms. One is called coupled
multinomial Hamiltonian Monte Carlo (Xu et al., 2021a); it builds on Heng and Jacob
(2019), which is a recent work in unbiased Markov chain Monte Carlo (MCMC) (Jacob
et al., 2019b) and has been found to be sensitive to hyper-parameters and less efficient
compared to normal, biased MCMC. These issues are solved by establishing couplings
to the widely-used multinomial Hamiltonian Monte Carlo, leading to a statistically
more efficient and robust method. The other method is called roulette-based variational
expectation (RAVE; Xu et al., 2019), which applies amortised inference to a model family
called Bayesian non-parametric models, in which the number of parameters is allowed
to grow unbounded as the data gets more complex. Unlike previous sampling-based
methods that are slow or variational inference methods that rely on truncation, RAVE
combines the advantages of both to achieve flexible inference that is also
computationally efficient. Second, I introduce two novel learning methods. One is called generative
ratio-matching (Srivastava et al., 2019) which is a learning algorithm that makes deep
generative models based on kernel methods applicable to high-dimensional data. The
key innovation of this method is learning a projection of the data to a lower-dimensional
space in which the density ratio is preserved such that learning can be done in the
lower-dimensional space where kernel methods are effective. The other method is called
Bayesian symbolic physics that combines Bayesian inference and symbolic regression
in the context of naïve physics—the study of how humans understand and learn physics.
Unlike classic generative models for which the structure of the generative process is
predefined or deep generative models where the process is represented by data-hungry
neural networks, Bayesian-symbolic generative processes are defined by functions over
a hypothesis space specified by a context-free grammar. This formulation allows these
models to incorporate domain knowledge in learning, which gives highly-improved
sample efficiency. For all four pieces of work, I provide theoretical analyses and/or
empirical results to validate that the algorithmic advances lead to improvements in
robustness and efficiency for generative models.
Lastly, I summarise my contributions to free and open-source software on generative
modelling. This includes a set of Julia packages that I contributed and are currently
used by the Turing probabilistic programming language (Ge et al., 2018). These packages,
which are highly reusable components for building probabilistic programming
languages, together form a probabilistic programming ecosystem in Julia. An important
package that is primarily developed by me is called AdvancedHMC.jl (Xu et al.,
2020), which provides robust and efficient implementations of HMC methods and has
been adopted as the backend of Turing. Importantly, the design of this package allows
an intuitive abstraction to construct HMC samplers similarly to how they are mathematically
defined. The promise of these open-source packages is to make generative
modelling techniques more accessible to domain experts from various backgrounds and
to make relevant research more reproducible to help advance the field.
Meta learning for few shot learning
Few-shot learning aims to scale visual recognition to the open-ended growth of new classes with limited labelled examples, thus alleviating the data and computation bottlenecks of conventional deep learning. This thesis proposes a meta learning (a.k.a. learning to learn) paradigm to tackle real-world few-shot learning challenges.
Firstly, we present a parameterized multi-metric based meta learning algorithm (RelationNet2). Existing metric learning algorithms are typically based on training a global deep embedding and metric to support image similarity matching, but we propose a deep comparison network comprised of embedding and relation modules that learn multiple non-linear distance metrics based on different levels of features simultaneously. Furthermore, images are represented as a distribution rather than as vectors via learned parameterized Gaussian noise regularization, reducing overfitting and enabling the use of deeper embeddings.
We next consider the fact that several recent competitors develop effective few-shot learners through strong conventional representations in combination with very simple classifiers, questioning whether “meta-learning” is necessary or whether highly effective features are sufficient. To defend meta-learning, we take an approach agnostic to the off-the-shelf features, and focus exclusively on meta-learning the final classifier layer. Specifically, we introduce MetaQDA, a Bayesian meta-learning extension of the quadratic discriminant analysis classifier, which is complementary to advances in feature representations and leads to high accuracy and state-of-the-art uncertainty calibration in predictions.
Finally, we investigate the extension of MetaQDA to more general real-world scenarios beyond the narrow standard few-shot benchmarks. Our model maintains both many-shot and few-shot classification accuracy in generalized few-shot learning. In terms of few-shot class-incremental learning, MetaQDA is inherently suited to scenarios where novel classes keep growing. As for open-set recognition, we calculate the probability of a sample belonging to a novel class by Bayes' rule, maintaining high accuracy in both closed-set recognition and open-set rejection.
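The open-set use of Bayes' rule described above can be sketched in a simplified form. This is not MetaQDA itself: the sketch uses fixed one-dimensional class-conditional Gaussians, and the function names and the rejection threshold are hypothetical illustration choices.

```python
import math

# Illustrative open-set classification sketch (not MetaQDA): class-conditional
# Gaussians plus Bayes' rule give class posteriors; a low total density under
# all known classes marks a sample as open-set and triggers rejection.

def gauss_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def classify_open_set(x, classes, density_floor=1e-3):
    """classes: list of (label, mean, std, prior). Returns a label or 'unknown'."""
    joint = [(label, prior * gauss_pdf(x, m, s)) for label, m, s, prior in classes]
    evidence = sum(p for _, p in joint)
    if evidence < density_floor:      # x is unlike every known class -> reject
        return "unknown"
    label, _ = max(joint, key=lambda t: t[1])
    return label                      # Bayes-optimal among the known classes

classes = [("cat", 0.0, 1.0, 0.5), ("dog", 4.0, 1.0, 0.5)]
print(classify_open_set(0.2, classes))   # near the 'cat' mode
print(classify_open_set(20.0, classes))  # far from both -> 'unknown'
```

The same decision rule supports both behaviours the abstract claims: closed-set accuracy (the argmax over known classes) and open-set rejection (the density floor).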
Overall, our contributions in few-shot meta-learning advance the state of the art under both accuracy and calibration metrics and explore a series of increasingly realistic problem settings, to support more researchers and practitioners in future exploration.
Deep Learning in Medical Image Analysis
The accelerating power of deep learning in diagnosing diseases will empower physicians and speed up decision making in clinical environments. Applications of modern medical instruments and the digitalization of medical care have generated enormous amounts of medical images in recent years. In this big data arena, new deep learning methods and computational models for efficient data processing, analysis, and modeling of the generated data are crucially important for clinical applications and for understanding the underlying biological processes. This book presents and highlights novel algorithms, architectures, techniques, and applications of deep learning for medical image analysis.
Connected Attribute Filtering Based on Contour Smoothness
A new attribute measuring the contour smoothness of 2-D objects is presented in the context of morphological attribute filtering. The attribute is based on the ratio of circularity and non-compactness, and has a maximum of 1 for a perfect circle. It decreases as the object boundary becomes irregular. Computation on hierarchical image representation structures relies on five auxiliary data members and is rapid. Contour smoothness is a suitable descriptor for detecting and discriminating man-made structures from other image features. An example is demonstrated on a very-high-resolution satellite image using connected pattern spectra and the switchboard platform.
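One ingredient of such a smoothness attribute, circularity, can be sketched directly. This is a hedged illustration, not the paper's exact attribute (which combines circularity with non-compactness and is computed incrementally on hierarchical structures); the classical circularity measure 4&#960;A/P&#178; likewise peaks at 1 for a perfect disc and drops as the boundary grows irregular.

```python
import math

# Classical circularity of a closed contour: 4*pi*Area / Perimeter^2.
# Equals 1 for a perfect circle and decreases for irregular boundaries.

def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as (x, y) pairs."""
    area = 0.0
    perim = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        area += x0 * y1 - x1 * y0
        perim += math.hypot(x1 - x0, y1 - y0)
    return abs(area) / 2.0, perim

def circularity(points):
    area, perim = polygon_area_perimeter(points)
    return 4 * math.pi * area / perim**2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
near_circle = [
    (math.cos(2 * math.pi * t / 64), math.sin(2 * math.pi * t / 64))
    for t in range(64)
]
print(circularity(square))       # pi/4, about 0.785
print(circularity(near_circle))  # just below 1
```

The attribute in the paper goes further by being computable incrementally (via five auxiliary data members) as connected components merge in a hierarchical image representation, which this per-polygon sketch does not attempt.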