Probabilistic task modelling for meta-learning
We propose probabilistic task modelling -- a generative probabilistic model
for collections of tasks used in meta-learning. The proposed model combines
variational auto-encoding and latent Dirichlet allocation to model each task as
a mixture of Gaussian distributions in an embedding space. Such modelling
provides an explicit representation of a task through its task-theme mixture.
We present an efficient approximate inference technique based on variational
inference for empirical Bayes parameter estimation. We perform empirical
evaluations to validate the task uncertainty and task distance produced by the
proposed method through correlation diagrams of the prediction accuracy on
testing tasks. We also carry out experiments on task selection in meta-learning
to demonstrate how the task relatedness inferred from the proposed model helps
to facilitate meta-learning algorithms.
Comment: Accepted at UAI 202
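To make "a task as a mixture of Gaussian themes" concrete, here is a minimal Python sketch. It is not the authors' model, and it omits the variational inference they use to estimate the mixture. A task is assumed to be summarised by a vector of theme weights over a few fixed Gaussian themes in a 2-D embedding space. Using the entropy of those weights as an uncertainty proxy and the Hellinger distance between weight vectors as a task distance are illustrative assumptions. All function names are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of the isotropic Gaussian N(mean, var * I) at point x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2.0 * math.pi * var) + sq / var)

def task_loglik(embeddings, theme_weights, theme_means, theme_var=1.0):
    """Log-likelihood of a task's embeddings under its mixture of Gaussian themes."""
    total = 0.0
    for x in embeddings:
        # log sum_k w_k N(x; mu_k, var I), computed with the log-sum-exp trick
        logs = [math.log(w) + gaussian_logpdf(x, m, theme_var)
                for w, m in zip(theme_weights, theme_means) if w > 0]
        top = max(logs)
        total += top + math.log(sum(math.exp(v - top) for v in logs))
    return total

def mixture_entropy(theme_weights):
    """Entropy of the task-theme weights (assumed proxy for task uncertainty)."""
    return -sum(w * math.log(w) for w in theme_weights if w > 0)

def hellinger(p, q):
    """Hellinger distance between two task-theme weight vectors (assumed task distance)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

# Hypothetical example: two themes in a 2-D embedding space.
theme_means = [[0.0, 0.0], [3.0, 3.0]]
task_a, task_b = [0.9, 0.1], [0.1, 0.9]
embeddings_a = [[0.1, -0.2], [-0.3, 0.2], [0.2, 0.1]]
print(hellinger(task_a, task_b), mixture_entropy(task_a))
```

Embeddings that lie near the first theme score higher under task_a's weights than under task_b's. This is the sense in which the theme mixture acts as an explicit task representation.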
Variational methods with dependence structure
It is a common practice among humans to deduce, to explain and to make predictions based on concepts that are not directly observable. In Bayesian statistics, the underlying propositions of the unobserved latent variables are summarized in the posterior distribution. With the increasing complexity of real-world data and statistical models, fast and accurate inference for the posterior becomes essential. Variational methods, by casting the posterior inference problem in the optimization framework, are widely used for their flexibility and computational efficiency. In this thesis, we develop new variational methods, studying their theoretical properties and applications.
In the first part of the thesis, we utilize dependence structures towards addressing fundamental problems in variational inference (VI): posterior uncertainty estimation, convergence properties, and discrete optimization. Though it is flexible, variational inference often underestimates the posterior uncertainty. This is a consequence of the over-simplified variational family. Mean-field variational inference (MFVI), for example, uses a product of independent distributions as a coarse approximation to the posterior. As a remedy, we propose a hierarchical variational distribution with flexible parameterization that can model the dependence structure between latent variables. With a newly derived objective, we show that the proposed variational method can achieve accurate and efficient uncertainty estimation.
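A standard worked example shows why a factorised (mean-field) family underestimates posterior uncertainty. The sketch below assumes a bivariate Gaussian target p = N(0, Σ) with correlation rho. For that target, the optimal factorised Gaussian under KL(q || p) has variances 1 / Λ_ii, where Λ = Σ⁻¹. The hierarchical variational family proposed in the thesis is not shown, and the function name is hypothetical.

```python
def mean_field_variances(sigma1, sigma2, rho):
    """Return (true marginal variances, optimal mean-field variances) for
    p = N(0, Sigma) with Sigma = [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]]."""
    # Diagonal of the precision matrix Lambda = Sigma^{-1} for a 2x2 covariance.
    det = (sigma1 ** 2) * (sigma2 ** 2) * (1.0 - rho ** 2)
    lam11 = sigma2 ** 2 / det
    lam22 = sigma1 ** 2 / det
    true_vars = (sigma1 ** 2, sigma2 ** 2)
    # Minimising KL(q || p) over q(x1) q(x2) gives each factor variance 1 / Lambda_ii,
    # i.e. s_i^2 * (1 - rho^2), which shrinks as the correlation grows.
    mf_vars = (1.0 / lam11, 1.0 / lam22)
    return true_vars, mf_vars

true_vars, mf_vars = mean_field_variances(1.0, 2.0, 0.9)
print(true_vars, mf_vars)
```

With rho = 0.9, each mean-field variance is only 19% of the true marginal variance. A family that models the dependence between latent variables is meant to close exactly this gap.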
We further study structured variational inference theoretically in the setting of the Stochastic Blockmodel (SBM). The variational distribution is constructed with a pairwise structure among the nodes of a graph. We prove that, in a broad density regime and for general random initializations, the class labels estimated by structured VI converge to the ground truth with high probability. Empirically, we demonstrate that structured VI is more robust than MFVI when the graph is sparse and the signal-to-noise ratio is low.
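For readers unfamiliar with the SBM setting, the sketch below samples a two-block SBM and computes one common signal-to-noise ratio for the sparse regime, (a − b)² / (2(a + b)) with a = n·p_in and b = n·p_out. The thesis may define its SNR and density regime differently, so treat that formula as an assumption. The helper names are hypothetical, and the structured VI algorithm itself is not reproduced.

```python
import random

def sample_sbm(n, p_in, p_out, seed=0):
    """Sample a two-block SBM; returns (labels, symmetric 0/1 adjacency matrix)."""
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability depends only on whether i and j share a block.
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return labels, adj

def sparse_snr(n, p_in, p_out):
    """Assumed SNR definition (a - b)^2 / (2 (a + b)) with a = n*p_in, b = n*p_out."""
    a, b = n * p_in, n * p_out
    return (a - b) ** 2 / (2.0 * (a + b))

labels, adj = sample_sbm(30, 0.5, 0.05)
print(sum(map(sum, adj)) // 2, "edges; SNR =", sparse_snr(100, 0.05, 0.01))
```

Lowering p_in toward p_out, or shrinking both, drives this ratio down. That corresponds to the sparse, low-SNR regime in which the abstract reports structured VI to be more robust than MFVI.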
When the latent variables are discrete, gradient-based VI often suffers from bias and high variance in the gradient estimates. Using correlated random samples, we propose a novel unbiased, low-variance gradient estimator. We demonstrate that, under certain constraints, such correlated sampling yields an optimal control variate for variance reduction. The efficient gradient estimator can be applied to a wide range of problems, such as variable selection, reinforcement learning, and natural language processing.
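As an illustration of how correlated samples give an unbiased gradient for a discrete latent variable, the sketch below implements the published ARM (augment-REINFORCE-merge) estimator for a single Bernoulli variable. It is a stand-in chosen for familiarity and may differ from the estimator developed in the thesis. Each uniform draw u produces two correlated Bernoulli samples, one from u and one from its antithetic counterpart 1 − u. The test function f(z) = (z − t)² has a closed-form gradient, so the estimate can be checked.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def arm_gradient(f, phi, n_samples=100_000, seed=0):
    """ARM estimate of d/dphi E_{z ~ Bernoulli(sigmoid(phi))}[f(z)]."""
    rng = random.Random(seed)
    p = sigmoid(phi)
    total = 0.0
    for _ in range(n_samples):
        u = rng.random()
        z_anti = 1 if u > 1.0 - p else 0  # 1[u > sigmoid(-phi)]: Bernoulli sample driven by 1 - u
        z = 1 if u < p else 0             # 1[u < sigmoid(phi)]: Bernoulli sample driven by u
        total += (f(z_anti) - f(z)) * (u - 0.5)
    return total / n_samples

def exact_gradient(phi, t):
    """Closed form for f(z) = (z - t)^2: sigmoid'(phi) * ((1 - t)^2 - t^2)."""
    p = sigmoid(phi)
    return p * (1.0 - p) * (1.0 - 2.0 * t)

t, phi = 0.45, 0.3
estimate = arm_gradient(lambda z: (z - t) ** 2, phi)
print(estimate, exact_gradient(phi, t))
```

When both correlated samples coincide, the term vanishes. This is one intuition for why coupling the samples reduces variance compared with evaluating f on independent draws.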
For the second part of the thesis, we apply variational methods to the study of generalization problems in meta-learning. When meta-learning algorithms are trained over multiple tasks, we find that many of them implicitly require the tasks to have a mutually exclusive dependence structure. This structure prevents task-level overfitting and ensures fast adaptation of the algorithm to a new task. However, such a dependence structure may not exist for general tasks. For tasks that are not mutually exclusive, we develop new meta-learning algorithms with variational regularization to prevent task-level overfitting. Consequently, meta-learning can be extended to domains where it was previously ineffective.
Statistic
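The difference between mutually exclusive and non-mutually-exclusive few-shot tasks can be illustrated with a per-task label permutation. The sketch below is an assumption-laden toy. Classes are grouped, and each N-way task draws one class per group. Without shuffling, every class always receives the same label, so a single model could memorise the whole task family. With a random per-task permutation, the class-to-label assignment differs across tasks. The group names and the make_task helper are hypothetical, and the variational regularizer from the thesis is not shown.

```python
import random

def make_task(groups, rng, shuffle_labels):
    """Build an N-way task {class_name: label}, drawing one class from each group."""
    classes = [rng.choice(g) for g in groups]
    labels = list(range(len(groups)))
    if shuffle_labels:
        # Per-task permutation: the same class can get different labels in different tasks.
        rng.shuffle(labels)
    # Without shuffling, the label is the group index, fixed across all tasks.
    return dict(zip(classes, labels))

groups = [["cat", "dog"], ["car", "bus"], ["oak", "pine"]]
rng = random.Random(0)
fixed_tasks = [make_task(groups, rng, shuffle_labels=False) for _ in range(50)]
shuffled_tasks = [make_task(groups, rng, shuffle_labels=True) for _ in range(50)]
print(fixed_tasks[0], shuffled_tasks[0])
```

In the non-shuffled family, the query input alone determines the label. A meta-learner can therefore ignore the support set and overfit at the task level, which is the failure mode the abstract describes.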
ContraBAR: Contrastive Bayes-Adaptive Deep RL
In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal
policy -- the optimal policy when facing an unknown task that is sampled from
some known task distribution. Previous approaches tackled this problem by
inferring a belief over task parameters, using variational inference methods.
Motivated by recent successes of contrastive learning approaches in RL, such as
contrastive predictive coding (CPC), we investigate whether contrastive methods
can be used for learning Bayes-optimal behavior. We begin by proving that
representations learned by CPC are indeed sufficient for Bayes optimality.
Based on this observation, we propose a simple meta RL algorithm that uses CPC
in lieu of variational belief inference. Our method, ContraBAR, achieves
performance comparable to the state of the art in domains with state-based
observations and circumvents the computational toll of reconstructing future
observations, enabling learning in domains with image-based observations. It
can also be combined with image augmentations for domain randomization and used
seamlessly in both online and offline meta RL settings.
Comment: ICML 2023. PyTorch code available at
https://github.com/ec2604/ContraBA
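Contrastive predictive coding trains a representation with an InfoNCE loss. The loss asks a context vector to score the true future observation above a set of negatives. Below is a minimal pure-Python sketch of that loss for a single anchor. The dot-product score is an assumption: the original CPC uses a learned log-bilinear score, and ContraBAR computes its context from the agent's history with its own architecture, which this sketch does not reproduce.

```python
import math

def info_nce_loss(context, positive, negatives):
    """InfoNCE for one anchor: -log softmax probability of the positive among all candidates."""
    def score(v):
        # Assumed critic: plain dot product between context and candidate.
        return sum(a * b for a, b in zip(context, v))
    logits = [score(positive)] + [score(n) for n in negatives]
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))  # log-sum-exp
    return log_norm - logits[0]

aligned = info_nce_loss([1.0, 0.0], [5.0, 0.0], [[-1.0, 0.0], [0.0, 1.0]])
uninformative = info_nce_loss([0.0, 0.0], [5.0, 0.0], [[-1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(aligned, uninformative)
```

An uninformative context gives every candidate the same score, so the loss equals log K for K candidates. Driving the loss below log K requires the context to carry information that identifies the positive.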
Learning to Learn Kernels with Variational Random Features
In this work, we introduce kernels with random Fourier features in the
meta-learning framework to leverage their strong few-shot learning ability. We
propose meta variational random features (MetaVRF) to learn adaptive kernels
for the base-learner, which is developed in a latent variable model by treating
the random feature basis as the latent variable. We formulate the optimization
of MetaVRF as a variational inference problem by deriving an evidence lower
bound under the meta-learning framework. To incorporate shared knowledge from
related tasks, we propose a context inference of the posterior, which is
implemented with an LSTM architecture. The LSTM-based inference network can
effectively integrate the context information of previous tasks with
task-specific information, generating informative and adaptive features. The
learned MetaVRF can produce kernels of high representational power with a
relatively low spectral sampling rate and also enables fast adaptation to new
tasks. Experimental results on a variety of few-shot regression and
classification tasks demonstrate that MetaVRF delivers much better, or at least
competitive, performance compared to existing meta-learning alternatives.
Comment: ICML 2020; code is available at:
https://github.com/Yingjun-Du/MetaVR
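For context, the sketch below shows the random Fourier feature construction that MetaVRF builds on. For an RBF kernel with lengthscale ℓ, draw frequencies ω ~ N(0, ℓ⁻² I) and phases b ~ U(0, 2π). Then φ(x) = sqrt(2/D)·cos(ωᵀx + b), and φ(x)·φ(y) approximates the kernel. Here the frequencies come from a fixed Gaussian. MetaVRF instead treats the feature basis as a latent variable inferred per task with an LSTM-based network, which is not shown. Function names are hypothetical.

```python
import math
import random

def sample_rff(dim, n_features, lengthscale, rng):
    """Draw frequencies omega ~ N(0, lengthscale^-2 I) and phases b ~ U(0, 2*pi)."""
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)] for _ in range(n_features)]
    biases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return omegas, biases

def rff_map(x, omegas, biases):
    """Feature map phi(x) = sqrt(2/D) * cos(omega^T x + b)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(omegas, biases)]

def rbf(x, y, lengthscale):
    """Exact RBF kernel exp(-||x - y||^2 / (2 lengthscale^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))

rng = random.Random(0)
omegas, biases = sample_rff(dim=2, n_features=4000, lengthscale=1.0, rng=rng)
x, y = [0.3, -0.2], [0.1, 0.4]
approx = sum(a * b for a, b in zip(rff_map(x, omegas, biases), rff_map(y, omegas, biases)))
print(approx, rbf(x, y, 1.0))
```

The approximation error shrinks roughly as 1/sqrt(D). The abstract's claim of high representational power at a "relatively low spectral sampling rate" refers to achieving good kernels with fewer such features.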
Variational Metric Scaling for Metric-Based Meta-Learning
Metric-based meta-learning has attracted a lot of attention due to its
effectiveness and efficiency in few-shot learning. Recent studies show that
metric scaling plays a crucial role in the performance of metric-based
meta-learning algorithms. However, a principled method for learning the metric
scaling parameter automatically is still lacking. In this paper, we recast
metric-based meta-learning from a Bayesian perspective and develop a
variational metric scaling framework for learning a proper metric scaling
parameter. First, we propose a stochastic variational method to learn a
single global scaling parameter. To better fit the embedding space to a given
data distribution, we extend our method to learn a dimensional scaling vector
to transform the embedding space. Furthermore, to learn task-specific
embeddings, we generate task-dependent dimensional scaling vectors with
amortized variational inference. Our method is end-to-end without any
pre-training and can be used as a simple plug-and-play module for existing
metric-based meta-algorithms. Experiments on mini-ImageNet show that our
methods can be used to consistently improve the performance of existing
metric-based meta-algorithms including prototypical networks and TADAM. The
source code can be downloaded from
https://github.com/jiaxinchen666/variational-scaling.
Comment: AAAI202
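The role of a metric scaling parameter in a prototypical-network-style classifier can be sketched as follows. Class probabilities are a softmax over −α·‖q − c_k‖². Here α is either one global scalar or a per-dimension vector that weights each squared coordinate difference. The Gaussian variational posterior over α with a reparameterised draw is an assumption used for illustration. The paper's exact parameterisation and its amortised, task-dependent version are not reproduced. Names are hypothetical.

```python
import math
import random

def prototypes(support):
    """support: dict label -> list of embedding vectors; returns label -> mean vector."""
    return {lab: [sum(col) / len(vecs) for col in zip(*vecs)] for lab, vecs in support.items()}

def class_probs(query, protos, scale):
    """Softmax over -scale * squared Euclidean distance to each prototype.
    `scale` is a float (global scaling) or a list (one weight per dimension)."""
    labels = sorted(protos)
    logits = []
    for lab in labels:
        c = protos[lab]
        s = scale if isinstance(scale, list) else [scale] * len(c)
        logits.append(-sum(si * (q - ci) ** 2 for si, q, ci in zip(s, query, c)))
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    z = sum(exps)
    return {lab: e / z for lab, e in zip(labels, exps)}

def sample_scale(mu, sigma, rng):
    """Reparameterised draw alpha = mu + sigma * eps from an assumed Gaussian posterior."""
    return mu + sigma * rng.gauss(0.0, 1.0)

protos = prototypes({"a": [[0.0, 0.0], [0.2, 0.0]], "b": [[1.0, 1.0], [1.2, 1.0]]})
alpha = sample_scale(mu=2.0, sigma=0.1, rng=random.Random(0))
print(class_probs([0.1, 0.0], protos, alpha))
```

A larger α sharpens the softmax, and a smaller one flattens it. This is why the scale matters for training, and why learning it, rather than hand-tuning it, is useful.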
Amortised Inference in Bayesian Neural Networks
Meta-learning is a framework in which machine learning models train over a
set of datasets in order to produce predictions on new datasets at test time.
Probabilistic meta-learning has received an abundance of attention from the
research community in recent years, but a problem shared by many existing
probabilistic meta-models is that they require a very large number of datasets
in order to produce high-quality predictions with well-calibrated uncertainty
estimates. In many applications, however, such quantities of data are simply
not available.
In this dissertation we present a significantly more data-efficient approach
to probabilistic meta-learning through per-datapoint amortisation of inference
in Bayesian neural networks, introducing the Amortised Pseudo-Observation
Variational Inference Bayesian Neural Network (APOVI-BNN). First, we show that
the approximate posteriors obtained under our amortised scheme are of similar
or better quality than those obtained through traditional variational inference,
despite the fact that the amortised inference is performed in a single forward
pass. We then discuss how the APOVI-BNN may be viewed as a new member of the
neural process family, motivating the use of neural process training objectives,
which may yield better predictive performance on complex problems.
Finally, we assess the predictive performance of the APOVI-BNN against other
probabilistic meta-models in both a one-dimensional regression problem and in a
significantly more complex image completion setting. In both cases, when the
amount of training data is limited, our model is the best in its class.
Comment: This thesis served as the author's final project report for the
University of Cambridge part IIB Engineering Tripos. 37 pages, 7 figure
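One way to picture per-datapoint amortisation with pseudo-observations is the conjugate case. Each datapoint contributes a Gaussian pseudo-likelihood factor N(ỹ_n | w·x_n, 1/λ_n), and the posterior over a weight is the prior multiplied by those factors. The sketch below does this for a single scalar weight in y = w·x. The hypothetical_inference_net stand-in, which would be a learned network in an amortised scheme, simply returns the observed target and a fixed precision. This is a toy illustration under those assumptions, not the APOVI-BNN architecture, which applies the idea layer by layer in a neural network.

```python
def posterior_from_pseudo_obs(xs, pseudo_ys, pseudo_precisions, prior_precision=1.0):
    """Gaussian posterior (mean, variance) over scalar w in y = w * x, given prior
    N(0, 1/prior_precision) and one factor N(pseudo_y | w * x, 1/pseudo_precision) per point."""
    precision = prior_precision + sum(lam * x * x for x, lam in zip(xs, pseudo_precisions))
    mean = sum(lam * x * y for x, y, lam in zip(xs, pseudo_ys, pseudo_precisions)) / precision
    return mean, 1.0 / precision

def hypothetical_inference_net(x, y):
    """Stand-in for a learned amortisation network: maps (x_n, y_n) to (pseudo_y, precision)."""
    return y, 4.0  # assumption: pseudo target = y, fixed pseudo precision

xs, ys = [1.0, 2.0], [2.0, 4.1]
pseudo = [hypothetical_inference_net(x, y) for x, y in zip(xs, ys)]
mean, var = posterior_from_pseudo_obs(xs, [p[0] for p in pseudo], [p[1] for p in pseudo])
print(mean, var)
```

Because each factor depends on only one datapoint, a single forward pass over the context set produces all factors. The posterior is then available in closed form without an inner optimisation loop.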
Learning to Learn Variational Semantic Memory
In this paper, we introduce variational semantic memory into meta-learning to
acquire long-term knowledge for few-shot learning. The variational semantic
memory accrues and stores semantic information for the probabilistic inference
of class prototypes in a hierarchical Bayesian framework. The semantic memory
is grown from scratch and gradually consolidated by absorbing information from
tasks it experiences. By doing so, it is able to accumulate long-term, general
knowledge that enables it to learn new concepts of objects. We formulate memory
recall as the variational inference of a latent memory variable from addressed
contents, which offers a principled way to adapt the knowledge to individual
tasks. Our variational semantic memory, as a new long-term memory module,
confers principled recall and update mechanisms that enable semantic
information to be efficiently accrued and adapted for few-shot learning.
Experiments demonstrate that the probabilistic modelling of prototypes achieves
a more informative representation of object classes compared to deterministic
vectors. The consistent new state-of-the-art performance on four benchmarks
shows the benefit of variational semantic memory in boosting few-shot
recognition.
Comment: accepted to NeurIPS 2020; code is available at
https://github.com/YDU-uva/VS
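Memory recall "as the variational inference of a latent memory variable from addressed contents" can be pictured with a generic content-addressed read followed by a reparameterised Gaussian sample. The sketch below makes several assumptions: dot-product addressing, a softmax over memory slots, a fixed noise scale, and a running-average update standing in for consolidation. It is not the paper's architecture, and all helpers are hypothetical.

```python
import math
import random

def recall(memory, query):
    """Content-based read: softmax(query . m_i) weights, then a weighted sum of slots."""
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    read = [sum(w * slot[d] for w, slot in zip(weights, memory)) for d in range(len(query))]
    return weights, read

def sample_memory_latent(read, sigma, rng):
    """Reparameterised sample m = read + sigma * eps, eps ~ N(0, I) (assumed posterior form)."""
    return [r + sigma * rng.gauss(0.0, 1.0) for r in read]

def consolidate(slot, new_info, rate=0.1):
    """Hypothetical consolidation rule: move a slot toward new information."""
    return [(1.0 - rate) * s + rate * v for s, v in zip(slot, new_info)]

memory = [[1.0, 0.0], [0.0, 1.0]]
weights, read = recall(memory, [5.0, 0.0])
latent = sample_memory_latent(read, 0.1, random.Random(0))
print(weights, read, latent)
```

The sampled latent would then condition the inference of class prototypes. The consolidation rule is one simple way a memory can be "grown from scratch" as tasks arrive.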
Advances in Probabilistic Modelling: Sparse Gaussian Processes, Autoencoders, and Few-shot Learning
Learning is the ability to generalise beyond training examples; but because many generalisations are consistent with a given set of observations, all machine learning methods rely on inductive biases to select certain generalisations over others. This thesis explores how model structure and priors affect the inductive biases of probabilistic models, and our ability to learn and make inferences from data.
Specifically we present theoretical analyses alongside algorithmic and modelling advances in three areas of probabilistic machine learning: sparse Gaussian process approximations and invariant covariance functions, learning flexible priors for variational autoencoders, and probabilistic approaches for few-shot learning. As inference is rarely tractable, we discuss variational inference methods as a secondary theme.
First, we disentangle the theoretical properties and optimisation behaviour of two widely used sparse Gaussian process approximations. We conclude that a variational free energy approximation is more principled and extensible and should be used in practice despite potential optimisation difficulties. We then discuss how general symmetries and invariances can be integrated into Gaussian process priors and can be learned using the marginal likelihood. To make inference tractable, we develop a variational inference scheme that uses unbiased estimates of intractable covariance functions.
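The variational free energy (VFE) bound for sparse GP regression includes a trace penalty tr(K_nn − Q_nn) / (2σ²). Here Q_nn = K_nm K_mm⁻¹ K_mn is the Nyström approximation built from inducing inputs. The sketch below computes only that penalty for 1-D inputs with an RBF kernel, using a small dense solver. The kernel choice, the jitter value, and the helper names are assumptions, and the data-fit term of the bound is omitted.

```python
import math

def rbf(a, b, lengthscale=1.0, variance=1.0):
    return variance * math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def vfe_trace_penalty(xs, zs, noise_var, jitter=1e-8):
    """tr(K_nn - Q_nn) / (2 noise_var) with Q_nn = K_nm K_mm^{-1} K_mn (diagonal only)."""
    K_mm = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(zs)]
            for i, a in enumerate(zs)]
    total = 0.0
    for x in xs:
        k_m = [rbf(x, z) for z in zs]
        q_nn = sum(k * a for k, a in zip(k_m, solve(K_mm, k_m)))  # k_m^T K_mm^{-1} k_m
        total += rbf(x, x) - q_nn
    return total / (2.0 * noise_var)

xs = [0.0, 1.5, 3.0]
print(vfe_trace_penalty(xs, zs=xs, noise_var=0.5), vfe_trace_penalty(xs, zs=[100.0], noise_var=0.5))
```

The penalty vanishes when the inducing inputs coincide with the training inputs, and it grows when they explain the data poorly. This term is part of what makes the VFE objective a proper lower bound on the marginal likelihood.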
We then address the mismatch between aggregate posteriors and priors in variational autoencoders and propose a mechanism to define flexible distributions using a form of rejection sampling. We use this approach to define a more flexible prior distribution on the latent space of a variational autoencoder, which generalises to unseen test data and, in practice, reduces the number of low-quality samples from the model.
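The idea of defining a flexible prior through rejection sampling can be sketched generically. Propose z from a simple base distribution, here N(0, 1), and accept it with probability a(z) ∈ [0, 1]. The accepted samples then follow a density proportional to N(z; 0, 1)·a(z). In a VAE, a(z) would be a learned network. Here it is a fixed, hypothetical step function so that the result can be checked: with a(z) = 1 for z > 0 and 0.2 otherwise, a fraction 0.5 / 0.6 ≈ 0.833 of accepted samples should be positive. The thesis's exact mechanism, including any truncation of the sampling loop, may differ.

```python
import random

def sample_resampled_prior(accept_prob, rng, max_tries=10_000):
    """Draw z with density proportional to N(z; 0, 1) * accept_prob(z) via rejection sampling."""
    for _ in range(max_tries):
        z = rng.gauss(0.0, 1.0)           # proposal from the base prior
        if rng.random() < accept_prob(z):  # accept with probability a(z) <= 1
            return z
    raise RuntimeError("acceptance probability too low for max_tries")

def step_accept(z):
    """Hypothetical acceptance function standing in for a learned network."""
    return 1.0 if z > 0 else 0.2

rng = random.Random(0)
samples = [sample_resampled_prior(step_accept, rng) for _ in range(20_000)]
frac_positive = sum(z > 0 for z in samples) / len(samples)
print(frac_positive)
```

Because a(z) only reweights the base density, regions of latent space where the prior and the aggregate posterior disagree can be down-weighted without changing the proposal distribution.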
Finally, we propose two probabilistic approaches to few-shot learning that achieve state-of-the-art results on benchmarks, building on multi-task probabilistic models with adaptive classifier heads. Our first approach combines a pre-trained deep feature extractor with a simple probabilistic model for the head, and can be linked to automatically regularised softmax regression. The second employs an amortised head model; it can be viewed as meta-learning probabilistic inference for prediction, and can be generalised to other contexts such as few-shot regression.
UK Engineering and Physical Sciences Research Council (EPSRC) DTA, Qualcomm Studentship in Technology, Max Planck Societ
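The link between a probabilistic classifier head and regularised softmax regression can be seen in the MAP case. A Gaussian prior N(0, λ⁻¹ I) on the head weights, combined with a softmax likelihood on fixed features, gives an L2-regularised softmax regression. The sketch below fits such a head by gradient descent on toy 2-D features. In this sketch the prior precision λ is set by hand, whereas the thesis describes the regularisation as automatic, so that part is not reproduced. All names and hyperparameters are assumptions.

```python
import math

def softmax(logits):
    top = max(logits)
    e = [math.exp(l - top) for l in logits]
    s = sum(e)
    return [v / s for v in e]

def fit_map_softmax(features, labels, n_classes, prior_precision=1.0, lr=0.1, steps=500):
    """MAP softmax head: minimise -sum log p(y|x, W, b) + (prior_precision / 2) * ||W||^2."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes  # biases left unregularised (assumption)
    n = len(features)
    for _ in range(steps):
        gW = [[prior_precision * w for w in row] for row in W]  # gradient of the Gaussian prior term
        gb = [0.0] * n_classes
        for x, y in zip(features, labels):
            p = softmax([sum(wi * xi for wi, xi in zip(W[k], x)) + b[k] for k in range(n_classes)])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)  # d(-log p_y) / d logit_k
                gb[k] += err
                for d in range(dim):
                    gW[k][d] += err * x[d]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for d in range(dim):
                W[k][d] -= lr * gW[k][d] / n
    return W, b

def predict(W, b, x):
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return max(range(len(logits)), key=lambda k: logits[k])

features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
W, b = fit_map_softmax(features, labels, n_classes=2)
print([predict(W, b, x) for x in features])
```

Raising prior_precision shrinks the weights and softens the predicted probabilities. Choosing that strength from data, rather than by hand, is what "automatically regularised" refers to.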