
    Continual Learning with Adaptive Weights (CLAW)

    Approaches to continual learning aim to successfully learn a set of related tasks that arrive in an online manner. Recently, several frameworks have been developed which enable deep learning to be deployed in this learning scenario. A key modelling decision is to what extent the architecture should be shared across tasks. On the one hand, separately modelling each task avoids catastrophic forgetting, but it does not support transfer learning and leads to large models. On the other hand, rigidly specifying a shared component and a task-specific part enables task transfer and limits the model size, but it is vulnerable to catastrophic forgetting and restricts the form of task transfer that can occur. Ideally, the network should adaptively identify which parts of the network to share in a data-driven way. Here we introduce such an approach, called Continual Learning with Adaptive Weights (CLAW), which is based on probabilistic modelling and variational inference. Experiments show that CLAW achieves state-of-the-art performance on six benchmarks in terms of overall continual learning performance, as measured by classification accuracy, and in terms of addressing catastrophic forgetting.
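
    As a rough illustration of the adaptive-sharing idea (a minimal sketch, not the authors' code), a shared layer can be modulated by small learned per-task gates, so the data decides how much of each unit is shared versus task-specific; CLAW itself goes further and places variational distributions over such adaptation parameters. All names below (AdaptiveLinear, task_logits) are hypothetical.

```python
# Hypothetical sketch of per-task adaptation of a shared layer (not CLAW's actual code).
import torch
import torch.nn as nn

class AdaptiveLinear(nn.Module):
    def __init__(self, in_features, out_features, num_tasks):
        super().__init__()
        self.shared = nn.Linear(in_features, out_features)   # weights shared across all tasks
        # Per-task, per-unit adaptation parameters; the sigmoid keeps each gate in (0, 1),
        # interpolating between "fully shared" (gate near 1) and "switched off" for a task.
        self.task_logits = nn.Parameter(torch.zeros(num_tasks, out_features))

    def forward(self, x, task_id):
        gate = torch.sigmoid(self.task_logits[task_id])       # how strongly this task uses each unit
        return gate * self.shared(x)

# Usage: the same backbone serves every task; only the small gate vectors differ.
layer = AdaptiveLinear(784, 256, num_tasks=5)
h = layer(torch.randn(32, 784), task_id=2)
```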

    Scalable approximate inference methods for Bayesian deep learning

    This thesis proposes multiple methods for approximate inference in deep Bayesian neural networks, split across three parts. The first part develops a scalable Laplace approximation based on a block-diagonal Kronecker-factored approximation of the Hessian. This approximation accounts for parameter correlations, overcoming the overly restrictive independence assumption of diagonal methods, while avoiding the quadratic scaling in the number of parameters of the full Laplace approximation. The chapter further extends the method to online learning, where datasets are observed one at a time. As the experiments demonstrate, modelling correlations between the parameters leads to improved performance over the diagonal approximation in uncertainty estimation and continual learning; in the latter setting in particular, the improvements can be substantial. The second part explores two parameter-efficient approaches for variational inference in neural networks: one based on factorised binary distributions over the weights, the other extending ideas from sparse Gaussian processes to neural network weight matrices. The former encounters similar underfitting issues as mean-field Gaussian approaches, which can be alleviated by a MAP-style method in a hierarchical model. The latter, based on an extension of Matheron’s rule to matrix normal distributions, achieves uncertainty estimation performance comparable to ensembles with the accuracy of a deterministic network, while using only 25% of the number of parameters of a single ResNet-50. The third part introduces TyXe, a probabilistic programming library built on top of Pyro to facilitate turning PyTorch neural networks into Bayesian ones. In contrast to existing frameworks, TyXe avoids introducing a layer abstraction, allowing it to support arbitrary architectures. This is demonstrated in a range of applications, from image classification with torchvision ResNets and node labelling with DGL graph neural networks to incorporating uncertainty into neural radiance fields with PyTorch3d.
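
    The Kronecker-factored Laplace approximation from the first part can be sketched for a single linear layer as follows (a minimal sketch under assumed names, not the thesis code): the curvature block is approximated as the Kronecker product of an input-covariance factor A and an output-gradient factor G, so correlations between the layer's weights are retained without forming the full Hessian, and weight samples follow a matrix-normal distribution.

```python
# Minimal sketch of a Kronecker-factored Laplace approximation for one linear layer
# (illustrative only; function names are hypothetical).
import torch

def kfac_factors(acts, grads, damping=1e-3):
    """acts: (N, in) layer inputs; grads: (N, out) gradients of the loss w.r.t. the
    layer's pre-activations. Returns damped Kronecker factors A (in x in) and G (out x out)."""
    n = acts.shape[0]
    A = acts.T @ acts / n + damping * torch.eye(acts.shape[1])
    G = grads.T @ grads / n + damping * torch.eye(grads.shape[1])
    return A, G

def sample_laplace_weights(w_map, A, G, num_samples=1):
    """Draw samples from the matrix-normal Laplace posterior with mean w_map (out x in)
    and covariance A^{-1} (Kronecker) G^{-1} over the flattened weights."""
    L_G = torch.linalg.cholesky(torch.linalg.inv(G))   # out x out
    L_A = torch.linalg.cholesky(torch.linalg.inv(A))   # in x in
    z = torch.randn(num_samples, *w_map.shape)         # (S, out, in) standard normal
    return w_map + L_G @ z @ L_A.T
```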

    Continual machine learning for non-stationary data analysis

    Although deep learning models have achieved significant successes in various fields, most of them have limited capacity for learning multiple tasks sequentially. The issue of forgetting previously learned tasks in continual learning is known as catastrophic forgetting or interference. When the input data or the goal of learning changes, a conventional machine learning model will learn and adapt to the new state, but it will not remember or recognise any revisits to previous states. This causes performance degradation and repeated re-training when dealing with periodic or irregularly recurring changes in the data or goals. Without the ability to learn continually, one cannot deploy an adaptive machine learning model in a changing environment. This thesis investigates continual learning and the mitigation of catastrophic forgetting in neural networks. We assume the non-stationary data comprises multiple tasks that arrive in sequence and are not stored. We first propose a regularisation method that identifies the parameters important to previous tasks and penalises changes to them while a new task is learned. However, when the number of tasks is sufficiently large, this method either cannot preserve all the previously learned knowledge or impedes the integration of new knowledge; this is known as the stability-plasticity dilemma. To address this, we propose a replay method based on Generative Adversarial Networks (GANs). Unlike other replay methods, the proposed model is not bounded by the fitting capacity of the generator. However, its number of parameters grows rapidly with the number of learned tasks. We therefore propose a continual learning model based on Bayesian neural networks and a Mixture of Experts (MoE) framework, which integrates experts responsible for different tasks into a single large model. Previously learned knowledge is preserved, and new tasks can be learned efficiently by assigning new experts. However, the performance of inference based on Monte-Carlo sampling is unsatisfactory. To address this issue, we propose a Probabilistic Neural Network (PNN) and integrate it with a conventional neural network. The PNN produces the likelihood of a given input and can be used in a variety of fields. To apply continual learning methods to real-world applications, we then propose a semi-supervised learning model to analyse healthcare datasets. The proposed framework extracts general features from unlabelled data; we integrate the PNN into the framework to classify data using a smaller set of labelled samples and to continually learn from new cases. The proposed model has been tested on benchmark datasets as well as a real-world clinical dataset. The results show that our model outperforms state-of-the-art models in overall continual learning accuracy without requiring prior knowledge of the tasks. The experiments on the real-world clinical data were designed to identify the risk of Urinary Tract Infections (UTIs) using in-home monitoring data. The UTI risk analysis model has been deployed in a digital platform and is currently part of the ongoing Minder clinical study at the UK Dementia Research Institute (UK DRI). An earlier version of the model was deployed as part of a Class-I CE-marked medical device. The UK DRI Minder platform and the deployed machine learning models, including the UTI risk analysis model developed in this research, are in the process of being accredited as a Class-IIa medical device.
Overall, this PhD research tackles theoretical and applied challenges of continual learning models in dealing with real-world data. We evaluate the proposed continual learning methods on a variety of benchmarks with comprehensive analysis and show their effectiveness. Furthermore, we have applied the proposed methods in real-world applications and demonstrated their applicability to real-world settings and clinical problems.
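
    The regularisation method described above can be sketched roughly as follows (a minimal sketch in the spirit of importance-weighted quadratic penalties such as elastic weight consolidation, not necessarily the thesis' exact formulation): per-parameter importance is estimated on the old task, and a penalty keeps important parameters close to their old values while the new task is learned.

```python
# Illustrative sketch of importance-based regularisation for continual learning
# (hypothetical helper names, not the thesis code).
import torch

def estimate_importance(model, loss_fn, data_loader):
    """Diagonal, Fisher-style importance: average squared gradient per parameter on the old task."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.detach() ** 2
    return {n: v / len(data_loader) for n, v in importance.items()}

def regularisation_penalty(model, old_params, importance, strength=100.0):
    """Quadratic penalty that discourages changes to parameters important for previous tasks."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (importance[n] * (p - old_params[n]) ** 2).sum()
    return strength / 2 * penalty

# When training on a new task: loss = task_loss + regularisation_penalty(model, old_params, importance)
```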

    Function Space Bayesian Pseudocoreset for Bayesian Neural Networks

    A Bayesian pseudocoreset is a compact synthetic dataset summarizing essential information of a large-scale dataset and thus can be used as a proxy dataset for scalable Bayesian inference. Typically, a Bayesian pseudocoreset is constructed by minimizing a divergence measure between the posterior conditioned on the pseudocoreset and the posterior conditioned on the full dataset. However, evaluating the divergence can be challenging, particularly for models such as deep neural networks with high-dimensional parameters. In this paper, we propose a novel Bayesian pseudocoreset construction method that operates in function space. Unlike previous methods, which construct and match the coreset and full-data posteriors in the space of model parameters (weights), our method constructs a variational approximation to the coreset posterior in function space and matches it to the full-data posterior in function space. By working directly in function space, our method can bypass several challenges that arise when working in weight space, including limited scalability and multi-modality issues. Through various experiments, we demonstrate that the Bayesian pseudocoresets constructed with our method enjoy enhanced uncertainty quantification and better robustness across various model architectures.
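
    The function-space matching idea can be illustrated with a small sketch (hypothetical names, not the paper's algorithm): instead of comparing posteriors over weights, a model conditioned on the synthetic coreset is compared to models conditioned on the full dataset through their outputs on a set of probe inputs, and the pseudocoreset is optimized to minimize the resulting divergence.

```python
# Illustrative sketch of a function-space matching objective for a Bayesian pseudocoreset
# (not the paper's exact divergence; names are hypothetical).
import torch
import torch.nn.functional as F

def function_space_divergence(coreset_model, full_data_models, probe_inputs):
    """Squared distance between the coreset-conditioned predictions and the average
    prediction of models conditioned on the full dataset, evaluated on probe inputs."""
    f_coreset = coreset_model(probe_inputs)                                    # (N, out)
    f_full = torch.stack([m(probe_inputs) for m in full_data_models]).mean(0)  # (N, out)
    return F.mse_loss(f_coreset, f_full)
```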