Learning to Learn Variational Semantic Memory
In this paper, we introduce variational semantic memory into meta-learning to
acquire long-term knowledge for few-shot learning. The variational semantic
memory accrues and stores semantic information for the probabilistic inference
of class prototypes in a hierarchical Bayesian framework. The semantic memory
is grown from scratch and gradually consolidated by absorbing information from
tasks it experiences. By doing so, it is able to accumulate long-term, general
knowledge that enables it to learn new concepts of objects. We formulate memory
recall as the variational inference of a latent memory variable from addressed
contents, which offers a principled way to adapt the knowledge to individual
tasks. Our variational semantic memory, as a new long-term memory module,
confers principled recall and update mechanisms that enable semantic
information to be efficiently accrued and adapted for few-shot learning.
Experiments demonstrate that the probabilistic modelling of prototypes achieves
a more informative representation of object classes compared to deterministic
vectors. The consistent new state-of-the-art performance on four benchmarks
shows the benefit of variational semantic memory in boosting few-shot
recognition.
Comment: accepted to NeurIPS 2020; code is available at https://github.com/YDU-uva/VS
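The abstract above argues that a probabilistic prototype (a distribution over class representations) is more informative than a deterministic mean vector. As a minimal, hypothetical sketch of that idea — not the paper's actual hierarchical Bayesian memory model — a Gaussian class prototype with a mean and a diagonal variance can be built from a few support embeddings like this:

```python
import numpy as np

def probabilistic_prototype(support_embeddings):
    """Summarise a class's support embeddings as a Gaussian prototype:
    a mean plus a diagonal variance, rather than a single deterministic vector."""
    mu = support_embeddings.mean(axis=0)
    var = support_embeddings.var(axis=0) + 1e-6  # variance floor for 1-shot stability
    return mu, var

def log_prob(query, mu, var):
    """Log-density of a query embedding under the Gaussian prototype."""
    return -0.5 * np.sum((query - mu) ** 2 / var + np.log(2 * np.pi * var))

rng = np.random.default_rng(0)
support = rng.normal(size=(5, 8))      # a 5-shot support set of 8-dim embeddings
mu, var = probabilistic_prototype(support)
score = log_prob(support[0], mu, var)  # higher = query better explained by the class
```

Classifying a query by the highest `log_prob` across classes then weighs distance by per-dimension uncertainty, which is the intuition behind preferring distributions over point prototypes.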
Addressing Catastrophic Forgetting in Few-Shot Problems
Neural networks are known to suffer from catastrophic forgetting when trained
on sequential datasets. While there have been numerous attempts to solve this
problem in large-scale supervised classification, little has been done to
overcome catastrophic forgetting in few-shot classification problems. We
demonstrate that the popular gradient-based model-agnostic meta-learning
algorithm (MAML) indeed suffers from catastrophic forgetting and introduce a
Bayesian online meta-learning framework that tackles this problem. Our
framework utilises Bayesian online learning and meta-learning along with
Laplace approximation and variational inference to overcome catastrophic
forgetting in few-shot classification problems. The experimental evaluations
demonstrate that our framework can effectively achieve this goal in comparison
with various baselines. As an additional utility, we also demonstrate
empirically that our framework is capable of meta-learning on sequentially
arriving few-shot tasks from a stationary task distribution.
Comment: ICML 202
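The combination of Bayesian online learning with a Laplace approximation can be illustrated on a toy scalar problem. This is a hedged sketch under strong simplifying assumptions (quadratic task losses, a scalar parameter and precision), not the paper's algorithm:

```python
def laplace_online_step(theta, grad_loss, hess_diag, prior_prec, lr=0.1, steps=500):
    """One Bayesian online-learning step with a scalar Laplace approximation.
    Minimises task_loss(theta) + 0.5 * prior_prec * (theta - prior_mean)**2 by
    gradient descent, then adds the task's Hessian to the posterior precision."""
    prior_mean = theta  # previous posterior mean acts as the new prior mean
    for _ in range(steps):
        g = grad_loss(theta) + prior_prec * (theta - prior_mean)
        theta -= lr * g
    return theta, prior_prec + hess_diag(theta)

theta, prec = 0.0, 1.0     # N(0, 1) prior over a scalar parameter
for target in (2.0, 4.0):  # two sequential "tasks" with loss 0.5 * (theta - target)**2
    theta, prec = laplace_online_step(theta, lambda th, t=target: th - t,
                                      lambda th: 1.0, prec)
```

In this conjugate toy case the recursion reproduces the exact Gaussian posterior (mean 2, precision 3): precision accumulated from earlier tasks penalises drifting away from their solutions, which is the mechanism that counters catastrophic forgetting.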
BRUNO: A Deep Recurrent Model for Exchangeable Data
We present a novel model architecture which leverages deep learning tools to
perform exact Bayesian inference on sets of high dimensional, complex
observations. Our model is provably exchangeable, meaning that the joint
distribution over observations is invariant under permutation: this property
lies at the heart of Bayesian inference. The model does not require variational
approximations to train, and new samples can be generated conditional on
previous samples, with cost linear in the size of the conditioning set. The
advantages of our architecture are demonstrated on learning tasks that require
generalisation from short observed sequences while modelling sequence
variability, such as conditional image generation, few-shot learning, and
anomaly detection.
Comment: NIPS 201
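Exchangeability, the property at the heart of this abstract, can be checked numerically on a simple compound Gaussian (a stand-in model, not BRUNO itself): the joint density must be unchanged by any permutation of the observations.

```python
import numpy as np

def joint_logpdf(x, tau2=1.0, sigma2=0.5):
    """Joint log-density of the exchangeable compound model
    z ~ N(0, tau2), x_i | z ~ N(z, sigma2); marginally x ~ N(0, Sigma)
    with Sigma = sigma2 * I + tau2 * ones, which is permutation-invariant."""
    n = len(x)
    cov = sigma2 * np.eye(n) + tau2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x @ np.linalg.solve(cov, x) + logdet + n * np.log(2 * np.pi))

x = np.array([0.3, -1.2, 0.8, 2.1])
perm = np.array([2, 0, 3, 1])
same = np.isclose(joint_logpdf(x), joint_logpdf(x[perm]))  # → True
```

BRUNO obtains the same invariance with far more expressive per-observation transformations, but the permutation check is identical in spirit.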
Advances in Probabilistic Modelling: Sparse Gaussian Processes, Autoencoders, and Few-shot Learning
Learning is the ability to generalise beyond training examples; but because many generalisations are consistent with a given set of observations, all machine learning methods rely on inductive biases to select certain generalisations over others. This thesis explores how the model structure
and priors affect the inductive biases of probabilistic models, and our ability to learn and make inferences from data.
Specifically, we present theoretical analyses alongside algorithmic and modelling advances in three areas of probabilistic machine learning: sparse Gaussian process approximations and invariant covariance functions, learning flexible priors for variational autoencoders, and probabilistic approaches for few-shot learning. As inference is rarely tractable, we discuss variational inference methods as a secondary theme.
First, we disentangle the theoretical properties and optimisation behaviour
of two widely used sparse Gaussian process approximations. We conclude that a variational free energy approximation is more principled and extensible and should be used in practice despite
potential optimisation difficulties. We then discuss how general symmetries and invariances can be integrated into Gaussian process priors and can be learned using the marginal likelihood. To make inference tractable, we develop a variational inference scheme that uses unbiased estimates of intractable covariance functions.
We then address the mismatch between aggregate posteriors and priors in variational autoencoders and propose a mechanism to define flexible distributions using a form of rejection sampling. We use this approach to define a more flexible prior distribution on the latent space of a variational autoencoder, which generalises to unseen test data and reduces the number of low quality samples from the model in a practical way.
Finally, we propose two probabilistic approaches to few-shot learning that achieve state-of-the-art results on benchmarks, building on multi-task probabilistic models with adaptive classifier heads. Our first approach combines a pre-trained deep feature extractor with a simple probabilistic model for the head, and can be linked to automatically regularised softmax regression. The second employs an amortised head model; it can be viewed as meta-learning probabilistic inference for prediction, and can be generalised to other contexts such as few-shot regression.
UK Engineering and Physical Sciences Research Council (EPSRC) DTA, Qualcomm Studentship in Technology, Max Planck Society
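The thesis's flexible prior via rejection sampling can be sketched with a fixed acceptance function in place of a learned one; the function and constants below are illustrative assumptions, not the thesis's actual construction:

```python
import numpy as np

def rejection_prior_sample(accept_fn, rng, max_tries=1000):
    """Draw from the density proportional to accept_fn(z) * N(z; 0, 1)
    by rejection sampling from a standard-normal proposal."""
    for _ in range(max_tries):
        z = rng.normal()
        if rng.uniform() < accept_fn(z):
            return z
    return z  # truncated rejection: fall back to the last proposal

# Illustrative acceptance function favouring |z| > 1; in the thesis it is learned.
accept = lambda z: 1.0 / (1.0 + np.exp(-4.0 * (abs(z) - 1.0)))
rng = np.random.default_rng(0)
samples = np.array([rejection_prior_sample(accept, rng) for _ in range(2000)])
frac_far = np.mean(np.abs(samples) > 1.0)  # most mass is pushed away from zero
```

Reshaping the proposal this way lets a simple base distribution match an aggregate posterior that a standard-normal prior would fit poorly, reducing low-quality samples drawn from regions the encoder never uses.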
SCHA-VAE: Hierarchical Context Aggregation for Few-Shot Generation
A few-shot generative model should be able to generate data from a novel
distribution by only observing a limited set of examples. In few-shot learning
the model is trained on data from many sets from distributions sharing some
underlying properties such as sets of characters from different alphabets or
objects from different categories. We extend current latent variable models for
sets to a fully hierarchical approach with an attention-based point-to-set-level
aggregation and call our method SCHA-VAE for
Set-Context-Hierarchical-Aggregation Variational Autoencoder. We explore
likelihood-based model comparison, iterative data sampling, and adaptation-free
out-of-distribution generalization. Our results show that the hierarchical
formulation better captures the intrinsic variability within the sets in the
small data regime. This work generalizes deep latent variable approaches to
few-shot learning, taking a step toward large-scale few-shot generation with a
formulation that readily works with current state-of-the-art deep generative
models.
Comment: ICML 202
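A minimal stand-in for attention-based point-to-set aggregation (with a single random query in place of learned parameters, so only a caricature of SCHA-VAE's aggregation) looks like the following; note the set-level summary is invariant to permuting the set, as it should be:

```python
import numpy as np

def attention_set_aggregate(points, query):
    """Pool a set of point embeddings into one set-level context vector via
    dot-product attention against a query vector (learned in practice)."""
    scores = points @ query / np.sqrt(points.shape[1])  # scaled dot-product scores
    weights = np.exp(scores - scores.max())             # stable softmax
    weights /= weights.sum()
    return weights @ points                             # weighted average of the set

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 16))  # a few-shot set of 6 embeddings
query = rng.normal(size=16)
context = attention_set_aggregate(points, query)  # shape (16,)
```

Conditioning a hierarchical latent variable model on such a context vector is what lets generation adapt to a novel set without any gradient updates.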
Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation
A critical challenge faced by supervised word sense disambiguation (WSD) is
the lack of large annotated datasets with sufficient coverage of words in their
diversity of senses. This inspired recent research on few-shot WSD using
meta-learning. While such work has successfully applied meta-learning to learn
new word senses from very few examples, its performance still lags behind its
fully supervised counterpart. Aiming to further close this gap, we propose a
model of semantic memory for WSD in a meta-learning setting. Semantic memory
encapsulates prior experiences seen throughout the lifetime of the model, which
aids better generalization in limited data settings. Our model is based on
hierarchical variational inference and incorporates an adaptive memory update
rule via a hypernetwork. We show our model advances the state of the art in
few-shot WSD, supports effective learning in extremely data-scarce (e.g.
one-shot) scenarios, and produces meaning prototypes that capture similar senses
of distinct words.
Comment: 15 pages, 5 figures
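The adaptive memory update via a hypernetwork can be caricatured as a gated convex combination, where a small network (here just one illustrative linear map `W`, not the paper's actual architecture) produces per-dimension gates deciding how much new content overwrites old memory:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hypernet_memory_update(memory, content, W):
    """Gated memory update: a tiny 'hypernetwork' (one linear map W) reads
    [memory, content] and emits per-dimension gates g in (0, 1), then
    memory <- (1 - g) * memory + g * content."""
    g = sigmoid(W @ np.concatenate([memory, content]))
    return (1.0 - g) * memory + g * content

rng = np.random.default_rng(0)
d = 4
memory = rng.normal(size=d)
content = rng.normal(size=d)            # new task information to absorb
W = rng.normal(size=(d, 2 * d)) * 0.1   # stand-in for learned hypernetwork weights
updated = hypernet_memory_update(memory, content, W)
```

Because each updated dimension is a convex combination of the old memory and the new content, consolidation is gradual: the gates learn when to preserve accumulated semantic knowledge and when to absorb a new sense.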