Sequential Gibbs Posteriors with Applications to Principal Component Analysis
Gibbs posteriors are proportional to a prior distribution multiplied by an
exponentiated loss function, with a key tuning parameter weighting information
in the loss relative to the prior and providing a control of posterior
uncertainty. Gibbs posteriors provide a principled framework for
likelihood-free Bayesian inference, but in many situations, including a single
tuning parameter inevitably leads to poor uncertainty quantification. In
particular, regardless of the value of the parameter, credible regions have far
from the nominal frequentist coverage even in large samples. We propose a
sequential extension to Gibbs posteriors to address this problem. We prove the
proposed sequential posterior exhibits concentration and a Bernstein-von Mises
theorem, which holds under easy to verify conditions in Euclidean space and on
manifolds. As a byproduct, we obtain the first Bernstein-von Mises theorem for
traditional likelihood-based Bayesian posteriors on manifolds. All methods are
illustrated with an application to principal component analysis.
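For orientation, the standard single-parameter Gibbs posterior described in the opening sentence of this abstract is usually written as below. The notation is an assumption made for illustration and is not taken from the paper: $\pi_0$ is the prior, $R_n$ an empirical loss over $n$ observations, and $\eta > 0$ the tuning parameter.

```latex
\[
  \pi_n^{(\eta)}(\theta \mid x_{1:n})
  \;\propto\;
  \exp\{-\eta\, n\, R_n(\theta)\}\,\pi_0(\theta),
  \qquad
  R_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell(\theta, x_i).
\]
```

A larger $\eta$ puts more weight on the loss relative to the prior and so concentrates the posterior; choosing $\ell$ as a negative log-likelihood with $\eta = 1$ recovers the usual Bayesian posterior.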
Generalization Bounds: Perspectives from Information Theory and PAC-Bayes
A fundamental question in theoretical machine learning is generalization.
Over the past decades, the PAC-Bayesian approach has been established as a
flexible framework to address the generalization capabilities of machine
learning algorithms, and design new ones. Recently, it has garnered increased
interest due to its potential applicability for a variety of learning
algorithms, including deep neural networks. In parallel, an
information-theoretic view of generalization has developed, wherein the
relation between generalization and various information measures has been
established. This framework is intimately connected to the PAC-Bayesian
approach, and a number of results have been independently discovered in both
strands. In this monograph, we highlight this strong connection and present a
unified treatment of generalization. We present techniques and results that the
two perspectives have in common, and discuss the approaches and interpretations
that differ. In particular, we demonstrate how many proofs in the area share a
modular structure, through which the underlying ideas can be intuited. We pay
special attention to the conditional mutual information (CMI) framework;
analytical studies of the information complexity of learning algorithms; and
the application of the proposed methods to deep learning. This monograph is
intended to provide a comprehensive introduction to information-theoretic
generalization bounds and their connection to PAC-Bayes, serving as a
foundation from which the most recent developments are accessible. It is aimed
broadly towards researchers with an interest in generalization and theoretical
machine learning. Comment: 222 pages
PAC-Bayes Generalisation Bounds for Heavy-Tailed Losses through Supermartingales
While PAC-Bayes is now an established learning framework for light-tailed
losses (e.g., subgaussian or subexponential), its extension to the case
of heavy-tailed losses remains largely uncharted and has attracted a growing
interest in recent years. We contribute PAC-Bayes generalisation bounds for
heavy-tailed losses under the sole assumption of bounded variance of the loss
function. Under that assumption, we extend previous results from
Kuzborskij and Szepesvári (2019). Our key technical contribution is exploiting an
extension of Markov's inequality for supermartingales. Our proof technique
unifies and extends different PAC-Bayesian frameworks by providing bounds for
unbounded martingales as well as bounds for batch and online learning with
heavy-tailed losses. Comment: New Section 3 on Online PAC-Bayes
PAC-Bayesian Bandit Algorithms With Guarantees
PAC-Bayes is a mathematical framework that can be used to provide performance guarantees for machine learning algorithms, explain why specific machine learning algorithms work well, and design new machine learning algorithms. Since the first PAC-Bayesian theorems were proven in the late 1990s, several impressive milestones have been achieved. PAC-Bayes generalisation bounds have been used to prove tight error bounds for deep neural networks. In addition, PAC-Bayes bounds have been used to explain why machine learning principles such as large margin classification and preference for flat minima of a loss function work well. However, these milestones were achieved in simple supervised learning problems.
In this thesis, inspired by the success of the PAC-Bayes framework in supervised learning settings, we investigate the potential of the PAC-Bayes framework as a tool for designing and analysing bandit algorithms.
First, we provide a comprehensive overview of PAC-Bayes bounds for bandit problems and an experimental comparison of these bounds. Previous works focused on PAC-Bayes bounds for martingales and their application to importance sampling-based estimates of the reward or regret of a policy. On the one hand, we found that these PAC-Bayes bounds are a useful tool for designing offline policy search algorithms with performance guarantees. In our experiments, a PAC-Bayesian offline policy search algorithm was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees. On the other hand, the PAC-Bayesian online policy search algorithms that we tested had underwhelming performance and loose cumulative regret bounds.
Next, we present novel PAC-Bayes-style algorithms with worst-case regret bounds for linear bandit problems. We combine PAC-Bayes bounds with the "optimism in the face of uncertainty" principle, which reduces a stochastic bandit problem to the construction of a confidence sequence for the unknown reward function. We use a novel PAC-Bayes-style tail bound for adaptive martingale mixtures to construct convex PAC-Bayes-style confidence sequences for (sparse) linear bandits. We show that (sparse) linear bandit algorithms based on our PAC-Bayes-style confidence sequences are guaranteed to achieve competitive worst-case regret. We also show that our confidence sequences yield confidence bounds that are tighter than those of competitors, both empirically and theoretically. Finally, we demonstrate that our tighter PAC-Bayes-style confidence bounds result in bandit algorithms with improved cumulative regret.
PAC-Bayesian Treatment Allocation Under Budget Constraints
This paper considers the estimation of treatment assignment rules when the
policy maker faces a general budget or resource constraint. Utilizing the
PAC-Bayesian framework, we propose new treatment assignment rules that allow
for flexible notions of treatment outcome, treatment cost, and a budget
constraint. For example, the constraint setting allows for cost-savings, when
the costs of non-treatment exceed those of treatment for a subpopulation, to be
factored into the budget. It also accommodates simpler settings, such as
quantity constraints, and doesn't require outcome responses and costs to have
the same unit of measurement. Importantly, the approach accounts for settings
where budget or resource limitations may preclude treating all that can
benefit, where costs may vary with individual characteristics, and where there
may be uncertainty regarding the cost of treatment rules of interest. Despite
the nomenclature, our theoretical analysis examines frequentist properties of
the proposed rules. For stochastic rules that typically approach
budget-penalized empirical welfare maximizing policies in larger samples, we
derive non-asymptotic generalization bounds for the target population costs and
sharp oracle-type inequalities that compare the rules' welfare regret to that
of optimal policies in relevant budget categories. A closely related,
non-stochastic, model aggregation treatment assignment rule is shown to inherit
desirable attributes. Comment: 70 pages, 7 figures
PAC-Bayesian Learning of Optimization Algorithms
We apply the PAC-Bayes theory to the setting of learning-to-optimize. To the
best of our knowledge, we present the first framework to learn optimization
algorithms with provable generalization guarantees (PAC-bounds) and explicit
trade-off between a high probability of convergence and a high convergence
speed. Even in the limit case, where convergence is guaranteed, our learned
optimization algorithms provably outperform related algorithms based on a
(deterministic) worst-case analysis. Our results rely on PAC-Bayes bounds for
general, unbounded loss-functions based on exponential families. By
generalizing existing ideas, we reformulate the learning procedure into a
one-dimensional minimization problem and study the possibility to find a global
minimum, which enables the algorithmic realization of the learning procedure.
As a proof-of-concept, we learn hyperparameters of standard optimization
algorithms to empirically underline our theory. Comment: Accepted to AISTATS 202
A Unified View on PAC-Bayes Bounds for Meta-Learning
Meta-learning automatically infers an inductive bias, which includes the hyperparameter of the base-learning algorithm, by observing data from a finite number of related tasks. This paper studies PAC-Bayes bounds on the meta-generalization gap. The meta-generalization gap comprises two sources of generalization gaps: the environment-level and task-level gaps, resulting from observation of a finite number of tasks and of data samples per task, respectively. In this paper, by upper bounding arbitrary convex functions, which link the expected and empirical losses at the environment and also per-task levels, we obtain new PAC-Bayes bounds. Using these bounds, we develop new PAC-Bayes meta-learning algorithms. Numerical examples demonstrate the merits of the proposed novel bounds and algorithm in comparison to prior PAC-Bayes bounds for meta-learning.
Efficient local search for Pseudo Boolean Optimization
Algorithms and the Foundations of Software technology
Discrete Mathematics and Symmetry
Some of the most beautiful studies in Mathematics are related to Symmetry and Geometry. For this reason, we select here some contributions about such aspects and Discrete Geometry. Symmetry in a system means invariance of its elements under a group of transformations. For network structures, symmetry means invariance of the adjacency of nodes under permutations of the node set. Graph isomorphism is an equivalence relation on the set of graphs, so it partitions the class of all graphs into equivalence classes. The underlying idea of isomorphism is that objects have the same structure if we ignore the individual character of their components. A set of graphs isomorphic to each other is called an isomorphism class of graphs. An automorphism of a graph G is an isomorphism from G onto itself. The family of all automorphisms of a graph G is a permutation group.
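The notion of a graph automorphism in this abstract can be made concrete with a small brute-force sketch. The function name `automorphisms` and the example graphs are illustrative assumptions, not taken from the source, and the enumeration over all permutations is only practical for very small graphs.

```python
from itertools import permutations


def automorphisms(nodes, edges):
    """Return every permutation of `nodes` that preserves adjacency.

    Brute-force illustration for a simple undirected graph: a permutation
    is an automorphism exactly when it maps the edge set onto itself.
    """
    edge_set = {frozenset(e) for e in edges}
    found = []
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        # Image of the edge set under the candidate permutation.
        image = {frozenset((mapping[u], mapping[v])) for u, v in edges}
        if image == edge_set:
            found.append(mapping)
    return found


# Hypothetical example: the 4-cycle 0-1-2-3-0.
cycle_nodes = [0, 1, 2, 3]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle_autos = automorphisms(cycle_nodes, cycle_edges)
print(len(cycle_autos))  # order of the automorphism group of the 4-cycle
```

For the 4-cycle the result is the dihedral group of order 8 (four rotations and four reflections), which illustrates the statement that the automorphisms of a graph form a permutation group.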