257 research outputs found
A Unified View on PAC-Bayes Bounds for Meta-Learning
Meta-learning automatically infers an inductive bias, which includes the hyperparameters of the base-learning algorithm, by observing data from a finite number of related tasks. This paper studies PAC-Bayes bounds on the meta-generalization gap. The meta-generalization gap comprises two sources of generalization gap: the environment-level and task-level gaps, resulting from observing a finite number of tasks and a finite number of data samples per task, respectively. In this paper, by upper bounding arbitrary convex functions, which link the expected and empirical losses at the environment level and also at the per-task level, we obtain new PAC-Bayes bounds. Using these bounds, we develop new PAC-Bayes meta-learning algorithms. Numerical examples demonstrate the merits of the proposed novel bounds and algorithms in comparison to prior PAC-Bayes bounds for meta-learning.
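For reference, the single-task PAC-Bayes bound that such meta-level results extend has the classical McAllester-style form, stated here as standard background rather than as this paper's specific bound:

```latex
\Pr_{S \sim \mathcal{D}^n}\!\left(\forall Q:\;
  \mathbb{E}_{h \sim Q}\big[L(h)\big]
  \;\le\;
  \mathbb{E}_{h \sim Q}\big[\hat{L}_S(h)\big]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
\right) \;\ge\; 1 - \delta
```

Meta-level bounds of the kind described combine two such terms: one penalty for observing only finitely many tasks (environment level) and one for the finite samples within each task.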
Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice
Meta-Learning aims to speed up the learning process on new tasks by acquiring
useful inductive biases from datasets of related learning tasks. While, in
practice, the number of related tasks available is often small, most of the
existing approaches assume an abundance of tasks, making them unrealistic and
prone to overfitting. A central question in the meta-learning literature is how
to regularize to ensure generalization to unseen tasks. In this work, we
provide a theoretical analysis using PAC-Bayesian theory and present a
generalization bound for meta-learning, which was first derived by Rothfuss et
al. (2021). Crucially, the bound allows us to derive the closed form of the
optimal hyper-posterior, referred to as PACOH, which leads to the best
performance guarantees. We analyze, theoretically and in an empirical case
study, under which conditions and to what extent these guarantees for
meta-learning improve upon PAC-Bayesian per-task learning bounds. The
closed-form PACOH inspires a practical meta-learning approach that avoids the
reliance on bi-level optimization, giving rise to a stochastic optimization
problem that is amenable to standard variational methods that scale well. Our
experiments show that, when instantiating the PACOH with Gaussian process and
Bayesian neural network models, the resulting methods are more scalable and
yield state-of-the-art performance, both in terms of predictive accuracy and
the quality of uncertainty estimates.
Comment: 61 pages. arXiv admin note: text overlap with arXiv:2002.0555
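The closed-form optimal posterior behind PACOH is Gibbs-shaped: each candidate is reweighted by its prior mass times an exponentiated negative empirical loss. A toy, discretized sketch of that shape (the function names, the finite candidate set, and the temperature `beta` are illustrative assumptions; PACOH itself works with continuous hyper-posteriors over priors):

```python
import numpy as np

# Gibbs-style optimal posterior over a finite set of candidates:
#   q*(h) ∝ p(h) * exp(-beta * empirical_loss(h)).
# The discretization is for illustration only.

def gibbs_posterior(prior_probs, losses, beta):
    """Return the Gibbs posterior over a finite candidate set."""
    prior_probs = np.asarray(prior_probs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    log_w = np.log(prior_probs) - beta * losses
    log_w -= log_w.max()          # subtract max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

q = gibbs_posterior([0.25, 0.25, 0.25, 0.25],
                    [0.9, 0.4, 0.1, 0.7], beta=5.0)
print(q)  # probability mass concentrates on the lowest-loss candidate
```

Larger `beta` trades the prior off more aggressively against the empirical loss, which is exactly the knob the PAC-Bayes bound controls.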
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into, what we refer to as, sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting,
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
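The sample-based category described above can be sketched with importance weighting: each labelled source example is weighted by an estimated density ratio so that target-like points dominate the training loss. A minimal sketch, assuming 1-D Gaussian density estimates and made-up variable names (none of this is the review's code):

```python
import numpy as np

# Sample-based adaptation sketch: weight source example x by
#   w(x) = p_target(x) / p_source(x),
# estimated here with one fitted Gaussian per domain.

def gaussian_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=500)   # source inputs (labelled)
x_tgt = rng.normal(1.0, 1.0, size=500)   # target inputs (unlabelled)

# Fit a Gaussian to each domain and form the importance weights.
w = np.exp(gaussian_logpdf(x_src, x_tgt.mean(), x_tgt.var())
           - gaussian_logpdf(x_src, x_src.mean(), x_src.var()))

# A classifier would then minimize  sum(w_i * loss_i) / sum(w_i)
# over the source data instead of the unweighted average loss.
```

Source points that look target-like (here: larger x) receive larger weights, which is the mechanism behind covariate-shift correction.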
Hypernetwork approach to Bayesian MAML
The main goal of Few-Shot learning algorithms is to enable learning from
small amounts of data. One of the most popular and elegant Few-Shot learning
approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this
method is to learn the shared universal weights of a meta-model, which are then
adapted for specific tasks. However, the method suffers from over-fitting and
poorly quantifies uncertainty due to limited data size. Bayesian approaches
could, in principle, alleviate these shortcomings by learning weight
distributions in place of point-wise weights. Unfortunately, previous
modifications of MAML are limited due to the simplicity of Gaussian posteriors,
MAML-like gradient-based weight updates, or by the same structure enforced for
universal and adapted weights.
In this paper, we propose a novel framework for Bayesian MAML called
BayesianHMAML, which employs Hypernetworks for weight updates. It learns the
universal weights point-wise, but a probabilistic structure is added when
adapted for specific tasks. In such a framework, we can use simple Gaussian
distributions or more complicated posteriors induced by Continuous Normalizing
Flows.
Comment: arXiv admin note: text overlap with arXiv:2205.1574
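The "shared universal weights, adapted per task" idea that BayesianHMAML builds on can be sketched with a first-order MAML loop on toy 1-D regression tasks. This is a simplification for illustration only; the hypernetwork-based updates and the probabilistic adapted weights of the paper are not reproduced:

```python
import numpy as np

# First-order MAML sketch on tasks y = a * x, each with its own slope a.
# A single shared weight w is meta-trained so that one inner gradient
# step adapts it well to any sampled task.

rng = np.random.default_rng(0)
w = 0.0                     # shared (universal) weight
alpha, beta = 0.1, 0.05     # inner / outer learning rates

def grad(w, x, y):
    """Gradient of the squared loss 0.5 * mean((w*x - y)^2) w.r.t. w."""
    return np.mean((w * x - y) * x)

for step in range(200):
    meta_grad = 0.0
    tasks = rng.uniform(0.5, 1.5, size=4)        # sample a batch of tasks
    for a in tasks:
        x = rng.normal(size=10)
        y = a * x
        w_task = w - alpha * grad(w, x, y)       # inner adaptation step
        x2 = rng.normal(size=10)
        meta_grad += grad(w_task, x2, a * x2)    # first-order outer gradient
    w -= beta * meta_grad / len(tasks)

print(round(w, 2))          # should land near the mean task slope (~1.0)
```

The Bayesian variants discussed in the abstract replace the point-wise adapted weight `w_task` with a distribution over weights, e.g. a Gaussian or a normalizing-flow posterior.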
Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes
PAC-Bayes bounds are among the most accurate generalization bounds for
classifiers learned from independently and identically distributed (IID) data,
and it is particularly so for margin classifiers: there have been recent
contributions showing how practical these bounds can be either to perform model
selection (Ambroladze et al., 2007) or even to directly guide the learning of
linear classifiers (Germain et al., 2009). However, there are many practical
situations where the training data show some dependencies and where the
traditional IID assumption does not hold. Stating generalization bounds for
such frameworks is therefore of the utmost interest, both from theoretical and
practical standpoints. In this work, we propose the first (to the best of our
knowledge) PAC-Bayes generalization bounds for classifiers trained on data
exhibiting interdependencies. The approach undertaken to establish our results
is based on the decomposition of a so-called dependency graph, which encodes the
dependencies within the data, into sets of independent data, thanks to graph
fractional covers. Our bounds are very general, since being able to find an
upper bound on the fractional chromatic number of the dependency graph is
sufficient to obtain new PAC-Bayes bounds for specific settings. We show how our
results can be used to derive bounds for ranking statistics (such as the AUC) and
classifiers trained on data distributed according to a stationary β-mixing
process. Along the way, we show how our approach seamlessly allows us to deal with
U-processes. As a side note, we also provide a PAC-Bayes generalization bound
for classifiers learned on data from stationary -mixing distributions.
Comment: Long version of the AISTATS 09 paper:
http://jmlr.csail.mit.edu/proceedings/papers/v5/ralaivola09a/ralaivola09a.pd
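The dependency-graph decomposition can be pictured with an ordinary proper coloring: each color class is an independent set, to which IID-style PAC-Bayes arguments apply, and the number of classes drives the penalty in the bound. A toy greedy-coloring sketch (illustrative only; greedy coloring merely upper-bounds the fractional chromatic number used in the paper's fractional covers):

```python
# Split dependent data points into independent sets via a proper coloring
# of their dependency graph.  Nodes are data points; an edge means the two
# points are statistically dependent.

def greedy_coloring(adj):
    """adj: dict mapping node -> set of neighbours (dependencies)."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:          # smallest color unused by neighbours
            c += 1
        color[v] = c
    return color

# Chain dependencies, e.g. overlapping pairs in a ranking/U-statistic:
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
coloring = greedy_coloring(adj)
print(coloring)  # two color classes: {0, 2} and {1, 3}
```

Each color class contains mutually non-adjacent, hence independent, points, so a standard IID bound holds within each class and the classes are then recombined.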
PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison
PAC-Bayes has recently re-emerged as an effective theory with which one can
derive principled learning algorithms with tight performance guarantees.
However, applications of PAC-Bayes to bandit problems are relatively rare,
which is a great misfortune. Many decision-making problems in healthcare,
finance and natural sciences can be modelled as bandit problems. In many of
these applications, principled algorithms with strong performance guarantees
would be very much appreciated. This survey provides an overview of PAC-Bayes
bounds for bandit problems and an experimental comparison of these bounds. On
the one hand, we found that PAC-Bayes bounds are a useful tool for designing
offline bandit algorithms with performance guarantees. In our experiments, a
PAC-Bayesian offline contextual bandit algorithm was able to learn randomised
neural network policies with competitive expected reward and non-vacuous
performance guarantees. On the other hand, the PAC-Bayesian online bandit
algorithms that we tested had loose cumulative regret bounds. We conclude by
discussing some topics for future work on PAC-Bayesian bandit algorithms.
Comment: 32 pages, 8 figures
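The offline bandit setting the survey covers rests on off-policy value estimators; the simplest is inverse propensity scoring (IPS). A hedged toy sketch (the logging policy, reward model, and names below are made up for illustration and are not taken from the survey):

```python
import numpy as np

# Inverse propensity scoring: estimate a target policy's expected reward
# from logs collected under a different (known) behaviour policy, by
# reweighting each logged reward with the probability ratio of the action.

rng = np.random.default_rng(0)
n_actions = 3
logging_probs = np.array([0.5, 0.3, 0.2])          # behaviour policy

actions = rng.choice(n_actions, size=10_000, p=logging_probs)
rewards = (actions == 1).astype(float)             # action 1 always pays 1

def ips_value(target_probs, actions, rewards, logging_probs):
    """Unbiased IPS estimate of the target policy's expected reward."""
    w = target_probs[actions] / logging_probs[actions]
    return float(np.mean(w * rewards))

target = np.array([0.0, 1.0, 0.0])                 # always play action 1
print(ips_value(target, actions, rewards, logging_probs))  # close to 1.0
```

PAC-Bayesian offline bandit bounds of the kind surveyed control how far such an empirical estimate can deviate from the true value of a randomized policy.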
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
While there has been progress in developing non-vacuous generalization bounds
for deep neural networks, these bounds tend to be uninformative about why deep
learning works. In this paper, we develop a compression approach based on
quantizing neural network parameters in a linear subspace, profoundly improving
on previous results to provide state-of-the-art generalization bounds on a
variety of tasks, including transfer learning. We use these tight bounds to
better understand the role of model size, equivariance, and the implicit biases
of optimization, for generalization in deep learning. Notably, we find large
models can be compressed to a much greater extent than previously known,
encapsulating Occam's razor. We also argue for data-independent bounds in
explaining generalization.
Comment: NeurIPS 2022. Code is available at
https://github.com/activatedgeek/tight-pac-baye
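The compression argument can be caricatured with plain uniform weight quantization: fewer distinct values means fewer bits per weight, and that description length is what enters an Occam/PAC-Bayes-style bound. A sketch under that simplification (the paper's actual method uses learned quantization of parameters in a linear subspace):

```python
import numpy as np

# Map each weight to the nearest entry of a small codebook.  With 8
# levels, each weight costs log2(8) = 3 bits instead of a 32-bit float,
# shrinking the model's description length.

def quantize(weights, n_levels=8):
    lo, hi = weights.min(), weights.max()
    codebook = np.linspace(lo, hi, n_levels)
    idx = np.argmin(np.abs(weights[:, None] - codebook[None, :]), axis=1)
    return codebook[idx], idx

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
w_q, idx = quantize(w)

# Reconstruction error is at most half the codebook spacing.
max_err = float(np.abs(w - w_q).max())
print(max_err)
```

The better a trained network tolerates such compression, the shorter its description and the tighter the resulting generalization bound, which is the Occam's-razor reading of the abstract.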
Wasserstein PAC-Bayes Learning: A Bridge Between Generalisation and Optimisation
PAC-Bayes learning is an established framework to assess the generalisation
ability of a learning algorithm during the training phase. However, it remains
challenging to know whether PAC-Bayes is useful to understand, before training,
why the output of well-known algorithms generalise well. We positively answer
this question by expanding the \emph{Wasserstein PAC-Bayes} framework, briefly
introduced in \cite{amit2022ipm}. We provide new generalisation bounds
exploiting geometric assumptions on the loss function. Using our framework, we
prove, before any training, that the output of an algorithm from
\citet{lambert2022variational} has a strong asymptotic generalisation ability.
More precisely, we show that it is possible to incorporate optimisation results
within a generalisation framework, building a bridge between PAC-Bayes and
optimisation algorithms.
- …