A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Forgetting refers to the loss or deterioration of previously acquired
information or knowledge. While the existing surveys on forgetting have
primarily focused on continual learning, forgetting is a prevalent phenomenon
observed in various other research domains within deep learning. Forgetting
manifests in research fields such as generative models, due to generator shifts,
and federated learning, due to heterogeneous data distributions across clients.
Addressing forgetting encompasses several challenges, including balancing the
retention of old task knowledge with fast learning of new tasks, managing task
interference with conflicting goals, and preventing privacy leakage.
Moreover, most existing surveys on continual learning implicitly assume that
forgetting is always harmful. In contrast, our survey argues that forgetting is
a double-edged sword and can be beneficial and desirable in certain cases, such
as privacy-preserving scenarios. By exploring forgetting in a broader context,
we aim to present a more nuanced understanding of this phenomenon and highlight
its potential advantages. Through this comprehensive survey, we aspire to
uncover potential solutions by drawing upon ideas and approaches from various
fields that have dealt with forgetting. By examining forgetting beyond its
conventional boundaries, we hope to encourage the future development of novel
strategies for mitigating, harnessing, or even embracing forgetting in real
applications. A comprehensive list of papers about forgetting in various
research fields is available at
\url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}.
Dynamic Scalable Self-Attention Ensemble for Task-Free Continual Learning
Continual learning is challenging for modern deep neural networks because catastrophic forgetting follows the adaptation of network parameters to new tasks. In this paper, we address a more challenging learning paradigm called Task-Free Continual Learning (TFCL), in which task information is missing during training. To deal with this problem, we introduce the Dynamic Scalable Self-Attention Ensemble (DSSAE) model, which dynamically adds new Vision Transformer (ViT) based experts to handle data distribution shifts during training. To avoid frequent expansions and ensure an appropriate number of experts, we propose a new dynamic expansion mechanism that evaluates the novelty of incoming samples as an expansion signal. Furthermore, the proposed expansion mechanism requires neither task information nor class labels, making it applicable in realistic learning environments. Empirical results demonstrate that the proposed DSSAE achieves state-of-the-art performance in a series of TFCL experiments.
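The novelty-triggered expansion idea in this abstract can be illustrated with a minimal sketch. This is not the authors' ViT-based implementation; each "expert" is reduced to a single prototype vector, and the function names (`novelty`, `maybe_expand`) and the threshold value are hypothetical choices for illustration only.

```python
# Hypothetical sketch of novelty-triggered expert expansion, not the DSSAE code:
# each expert is summarised by a prototype vector; an incoming sample whose
# distance to every prototype exceeds a threshold signals a distribution shift
# and triggers the creation of a new expert.

def novelty(sample, prototypes):
    """Distance from a sample to its nearest expert prototype."""
    return min(
        sum((a - b) ** 2 for a, b in zip(sample, p)) ** 0.5 for p in prototypes
    )

def maybe_expand(sample, prototypes, threshold=1.0):
    """Add a new expert (prototype) when the sample looks novel."""
    if not prototypes or novelty(sample, prototypes) > threshold:
        prototypes.append(list(sample))  # new expert initialised at the sample
        return True
    return False

experts = []
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
for x in stream:
    maybe_expand(x, experts)
print(len(experts))  # two clusters in the stream -> two experts
```

Note that, as in the abstract, the expansion decision uses neither task identifiers nor class labels, only the incoming samples themselves.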
Training deep generative models via lifelong learning
Lifelong learning represents an essential function of an artificial intelligence system: the ability to continually acquire and learn novel knowledge without forgetting previously learned knowledge. Lately, deep learning has brought new possibilities for the development of artificial intelligence, including artificial lifelong learning systems. However, most existing lifelong learning systems are limited to classification tasks, and lifelong generative modelling remains at an early stage. In this PhD thesis, our research goal mainly focuses on training deep generative models in the context of lifelong learning. The advantage of our research topic over general continual learning is that we can implement many downstream tasks within a unified framework, including classification, image generation, image interpolation, and disentangled representation learning. Firstly, we propose a new lifelong hybrid approach, combining the advantages of the Generative Adversarial Network (GAN) and the Variational Autoencoder (VAE) for lifelong generative modelling. The proposed model can learn a robust generative replay network that provides high-quality generative replay samples to relieve forgetting, while we also train inference models to capture meaningful latent representations over time. Secondly, to learn a long sequence of tasks, we propose a novel dynamic expansion model that can reuse existing network parameters and knowledge to learn a related task while building a new component to deal with a novel task. Thirdly, we propose a novel lifelong teacher-student framework where a dynamically expandable GAN mixture model implements the teacher module. Then, we introduce a novel self-supervised learning approach for the student that allows capturing cross-domain latent representations from the entire knowledge accumulated by the teacher as well as from novel data. Finally, we extend the lifelong teacher-student framework to task-free continual learning, where task information is unavailable.
The proposed model can adaptively expand its network architecture when detecting a data distribution shift during training, and can therefore be applied to infinite data streams.
Robust and efficient inference and learning algorithms for generative models
Generative modelling is a popular paradigm in machine learning due to its natural
ability to describe uncertainty in data and models and for its applications including data
compression (Ho et al., 2020), missing data imputation (Valera et al., 2018), synthetic
data generation (Lin et al., 2020), representation learning (Kingma and Welling, 2014),
robust classification (Li et al., 2019b), and more. For generative models, the task of
finding the distribution of unobserved variables conditioned on observed ones is referred
to as inference. Finding the optimal model that makes the model distribution close to the
data distribution according to some discrepancy measures is called learning. In practice,
existing learning and inference methods can fall short on robustness and efficiency. A
method that is more robust to its hyper-parameters or different types of data can be
more easily adapted to various real-world applications. How efficient a method is in
regard to the size and the dimensionality of data determines at what scale the method
can be applied. This thesis presents four pieces of my original work that improve these
properties in generative models.
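The inference/learning distinction drawn above can be made concrete with a minimal sketch. This toy conjugate-Gaussian model is my own illustration, not from the thesis; the function names and the specific model are hypothetical.

```python
import math

# Toy generative model: latent mean z ~ N(mu0, tau0^2), data x_i | z ~ N(z, sigma^2).
# "Inference" = finding the distribution of the unobserved z given observed data;
# "learning" = choosing model parameters to bring the model close to the data.

def posterior_of_latent(xs, mu0=0.0, tau0=1.0, sigma=1.0):
    """Exact Gaussian posterior p(z | x) for the conjugate model (inference)."""
    n = len(xs)
    prec = 1.0 / tau0**2 + n / sigma**2          # posterior precision
    mean = (mu0 / tau0**2 + sum(xs) / sigma**2) / prec
    return mean, math.sqrt(1.0 / prec)

def learn_sigma(xs, z):
    """MLE of the noise scale sigma given a known latent z (a learning step)."""
    return math.sqrt(sum((x - z) ** 2 for x in xs) / len(xs))

xs = [1.2, 0.8, 1.1, 0.9]
mean, std = posterior_of_latent(xs)
print(mean, std)  # posterior concentrates near the sample mean as n grows
print(learn_sigma(xs, z=1.0))
```

In this conjugate case both steps are closed-form; the methods in the thesis target the general case, where inference needs MCMC or variational approximations and learning needs discrepancy minimisation.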
First, I introduce two novel Bayesian inference algorithms. One is called coupled
multinomial Hamiltonian Monte Carlo (Xu et al., 2021a); it builds on Heng and Jacob
(2019), which is a recent work in unbiased Markov chain Monte Carlo (MCMC) (Jacob
et al., 2019b) and has been found to be sensitive to hyper-parameters and less efficient
compared to normal, biased MCMC. These issues are solved by establishing couplings
to the widely-used multinomial Hamiltonian Monte Carlo, leading to a statistically
more efficient and robust method. The other method is called roulette-based variational
expectation (RAVE; Xu et al., 2019), which applies amortised inference to a model family
called Bayesian non-parametric models, in which the number of parameters is allowed
to grow unbounded as the data gets more complex. Unlike previous sampling-based
methods that are slow or variational inference methods that rely on truncation, RAVE
combines the advantages of both to achieve flexible inference that is also
computationally efficient. Second, I introduce two novel learning methods. One is called generative
ratio-matching (Srivastava et al., 2019) which is a learning algorithm that makes deep
generative models based on kernel methods applicable to high-dimensional data. The
key innovation of this method is learning a projection of the data to a lower-dimensional
space in which the density ratio is preserved such that learning can be done in the
lower-dimensional space where kernel methods are effective. The other method is called
Bayesian symbolic physics that combines Bayesian inference and symbolic regression
in the context of naïve physics—the study of how humans understand and learn physics.
Unlike classic generative models for which the structure of the generative process is
predefined or deep generative models where the process is represented by data-hungry
neural networks, Bayesian-symbolic generative processes are defined by functions over
a hypothesis space specified by a context-free grammar. This formulation allows these
models to incorporate domain knowledge in learning, which gives highly-improved
sample efficiency. For all four pieces of work, I provide theoretical analyses and/or
empirical results to validate that the algorithmic advances lead to improvements in
robustness and efficiency for generative models.
Lastly, I summarise my contributions to free and open-source software on generative
modelling. This includes a set of Julia packages that I contributed and are currently
used by the Turing probabilistic programming language (Ge et al., 2018). These packages,
which are highly reusable components for building probabilistic programming
languages, together form a probabilistic programming ecosystem in Julia. An important
package that is primarily developed by me is called AdvancedHMC.jl (Xu et al.,
2020), which provides robust and efficient implementations of HMC methods and has
been adopted as the backend of Turing. Importantly, the design of this package allows
an intuitive abstraction to construct HMC samplers similarly to how they are mathematically
defined. The promise of these open-source packages is to make generative
modelling techniques more accessible to domain experts from various backgrounds and
to make relevant research more reproducible to help advance the field.
Meta learning for few shot learning
Few-shot learning aims to scale visual recognition to the open-ended growth of new classes with limited labelled examples, thus alleviating the data and computation bottlenecks of conventional deep learning. This thesis proposes a meta learning (a.k.a. learning to learn) paradigm to tackle real-world few-shot learning challenges.
Firstly, we present a parameterized multi-metric based meta learning algorithm (RelationNet2). Existing metric learning algorithms are typically based on training a global deep embedding and metric to support image similarity matching, but we propose a deep comparison network comprised of embedding and relation modules that learn multiple non-linear distance metrics based on different levels of features simultaneously. Furthermore, images are represented as a distribution rather than as vectors via learned parameterized Gaussian noise regularization, reducing overfitting and enabling the use of deeper embeddings.
We next consider the fact that several recent competitors develop effective few-shot learners through strong conventional representations in combination with very simple classifiers, questioning whether “meta-learning” is necessary or whether highly effective features are sufficient. To defend meta-learning, we take an approach agnostic to the off-the-shelf features, and focus exclusively on meta-learning the final classifier layer. Specifically, we introduce MetaQDA, a Bayesian meta-learning extension of the quadratic discriminant analysis classifier, which is complementary to advances in feature representations and leads to high accuracy and state-of-the-art uncertainty calibration in predictions.
Finally, we investigate the extension of MetaQDA to more general real-world scenarios beyond the narrow standard few-shot benchmarks. Our model maintains both many-shot and few-shot classification accuracy in generalized few-shot learning. In terms of few-shot class-incremental learning, MetaQDA is inherently suited to scenarios where novel classes keep growing. As for open-set recognition, we calculate the probability of a sample belonging to a novel class by Bayes' rule, maintaining high accuracy in both closed-set recognition and open-set rejection.
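The open-set use of Bayes' rule described above can be sketched in a simplified form. This is not MetaQDA itself: the sketch uses fixed one-dimensional class-conditional Gaussians, and the function names and the rejection threshold are hypothetical illustration choices.

```python
import math

# Illustrative open-set classification sketch (not MetaQDA): class-conditional
# Gaussians plus Bayes' rule give class posteriors; a low total density under
# all known classes marks a sample as open-set and triggers rejection.

def gauss_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def classify_open_set(x, classes, density_floor=1e-3):
    """classes: list of (label, mean, std, prior). Returns a label or 'unknown'."""
    joint = [(label, prior * gauss_pdf(x, m, s)) for label, m, s, prior in classes]
    evidence = sum(p for _, p in joint)
    if evidence < density_floor:      # x is unlike every known class -> reject
        return "unknown"
    label, _ = max(joint, key=lambda t: t[1])
    return label                      # Bayes-optimal among the known classes

classes = [("cat", 0.0, 1.0, 0.5), ("dog", 4.0, 1.0, 0.5)]
print(classify_open_set(0.2, classes))   # near the 'cat' mode
print(classify_open_set(20.0, classes))  # far from both -> 'unknown'
```

The same decision rule supports both behaviours the abstract claims: closed-set accuracy (the argmax over known classes) and open-set rejection (the density floor).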
Overall, our contributions in few-shot meta-learning advance the state of the art under both accuracy and calibration metrics and explore a series of increasingly realistic problem settings, to support more researchers and practitioners in future exploration.
Deep Learning in Medical Image Analysis
The accelerating power of deep learning in diagnosing diseases will empower physicians and speed up decision making in clinical environments. Applications of modern medical instruments and the digitalization of medical care have generated enormous amounts of medical images in recent years. In this big data arena, new deep learning methods and computational models for efficient data processing, analysis, and modeling of the generated data are crucially important for clinical applications and for understanding the underlying biological processes. This book presents and highlights novel algorithms, architectures, techniques, and applications of deep learning for medical image analysis.
Connected Attribute Filtering Based on Contour Smoothness
A new attribute measuring the contour smoothness of 2-D objects is presented in the context of morphological attribute filtering. The attribute is based on the ratio of circularity and non-compactness, and has a maximum of 1 for a perfect circle. It decreases as the object boundary becomes irregular. Computation on hierarchical image representation structures relies on five auxiliary data members and is rapid. Contour smoothness is a suitable descriptor for detecting and discriminating man-made structures from other image features. An example is demonstrated on a very-high-resolution satellite image using connected pattern spectra and the switchboard platform.
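One ingredient of such a smoothness attribute, circularity, can be sketched directly. This is a hedged illustration, not the paper's exact attribute (which combines circularity with non-compactness and is computed incrementally on hierarchical structures); the classical circularity measure 4&#960;A/P&#178; likewise peaks at 1 for a perfect disc and drops as the boundary grows irregular.

```python
import math

# Classical circularity of a closed contour: 4*pi*Area / Perimeter^2.
# Equals 1 for a perfect circle and decreases for irregular boundaries.

def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as (x, y) pairs."""
    area = 0.0
    perim = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        area += x0 * y1 - x1 * y0
        perim += math.hypot(x1 - x0, y1 - y0)
    return abs(area) / 2.0, perim

def circularity(points):
    area, perim = polygon_area_perimeter(points)
    return 4 * math.pi * area / perim**2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
near_circle = [
    (math.cos(2 * math.pi * t / 64), math.sin(2 * math.pi * t / 64))
    for t in range(64)
]
print(circularity(square))       # pi/4, about 0.785
print(circularity(near_circle))  # just below 1
```

The attribute in the paper goes further by being computable incrementally (via five auxiliary data members) as connected components merge in a hierarchical image representation, which this per-polygon sketch does not attempt.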