PAC-Bayesian High Dimensional Bipartite Ranking
This paper is devoted to the bipartite ranking problem, a classical
statistical learning task, in a high dimensional setting. We propose a scoring
and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear
additive scoring functions, and we derive non-asymptotic risk bounds under a
sparsity assumption. In particular, oracle inequalities in probability holding
under a margin condition assess the performance of our procedure, and prove its
minimax optimality. An MCMC-flavored algorithm is proposed to implement our
method, and its behavior is illustrated on synthetic and real-life datasets.
Local False Discovery Rate Based Methods for Multiple Testing of One-Way Classified Hypotheses
This paper continues the line of research initiated in
\cite{Liu:Sarkar:Zhao:2016} on developing a novel framework for multiple
testing of hypotheses grouped in a one-way classified form using
hypothesis-specific local false discovery rates (Lfdr's). It is built on an
extension of the standard two-class mixture model from single to multiple
groups, defining hypothesis-specific Lfdr as a function of the conditional Lfdr
for the hypothesis given that it is within a significant group and the Lfdr for
the group itself and involving a new parameter that measures grouping effect.
This definition captures the underlying group structure for the hypotheses
belonging to a group more effectively than the standard two-class mixture
model. Two new Lfdr based methods, possessing meaningful optimalities, are
produced in their oracle forms. One, designed to control false discoveries
across the entire collection of hypotheses, is proposed as a powerful
alternative to simply pooling all the hypotheses into a single group and using
the common Lfdr-based method under the standard single-group two-class
mixture model. The other is proposed as an Lfdr analog of the method of
\cite{Benjamini:Bogomolov:2014} for selective inference. It controls an Lfdr-based
measure of false discoveries associated with selecting groups concurrently with
controlling the average of within-group false discovery proportions across the
selected groups. Simulation studies and real-data application show that our
proposed methods are often more powerful than their relevant competitors.
Comment: 26 pages, 17 figures
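
For orientation, the standard single-group two-class mixture model that this abstract takes as its baseline can be sketched in a few lines. This is only an illustration of the textbook local false discovery rate, Lfdr(z) = pi0 * f0(z) / f(z), and not the grouped method of the paper. The Gaussian null and alternative densities, the function names, and the parameters `pi0`, `alt_mean`, and `alt_sd` are assumptions chosen for the example.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution N(mean, sd^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, alt_mean, alt_sd=1.0):
    """Local fdr under an assumed two-class Gaussian mixture.

    pi0      : assumed prior probability that a hypothesis is null
    alt_mean : assumed mean of the test statistic under the alternative
    alt_sd   : assumed standard deviation under the alternative
    The null density is taken to be standard normal.
    """
    f0 = normal_pdf(z)                        # null density
    f1 = normal_pdf(z, alt_mean, alt_sd)      # alternative density
    mixture = pi0 * f0 + (1.0 - pi0) * f1     # marginal density f(z)
    return pi0 * f0 / mixture                 # posterior probability of the null


if __name__ == "__main__":
    # Larger test statistics should receive a smaller local fdr.
    for z in (0.0, 2.0, 4.0):
        print(z, local_fdr(z, pi0=0.9, alt_mean=3.0))
```

In practice the mixture components are unknown and must be estimated; the oracle forms mentioned in the abstract correspond to treating them as known, as this sketch does.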
Bayesian variable selection with shrinking and diffusing priors
We consider a Bayesian approach to variable selection in the presence of high
dimensional covariates based on a hierarchical model that places prior
distributions on the regression coefficients as well as on the model space. We
adopt the well-known spike and slab Gaussian priors with a distinct feature,
that is, the prior variances depend on the sample size through which
appropriate shrinkage can be achieved. We show the strong selection consistency
of the proposed method in the sense that the posterior probability of the true
model converges to one even when the number of covariates grows nearly
exponentially with the sample size. This is arguably the strongest selection
consistency result that has been available in the Bayesian variable selection
literature; yet the proposed method can be carried out through posterior
sampling with a simple Gibbs sampler. Furthermore, we argue that the proposed
method is asymptotically similar to model selection with the $L_0$ penalty. We
also demonstrate through empirical work the fine performance of the proposed
approach relative to some state-of-the-art alternatives.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1207 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
On the properties of variational approximations of Gibbs posteriors
The PAC-Bayesian approach is a powerful set of techniques to derive non-asymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately often intractable. One may sample from it using Markov chain Monte Carlo, but this is usually too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute. We undertake a general study of the properties of such approximations. Our main finding is that such a variational approximation often has the same rate of convergence as the original PAC-Bayesian procedure it approximates. In addition, we show that, when the risk function is convex, a variational approximation can be obtained in polynomial time using a convex solver. We give finite sample oracle inequalities for the corresponding estimator. We specialize our results to several learning tasks (classification, ranking, matrix completion), discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximation on real datasets.
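
To make the term "Gibbs posterior" concrete, here is a minimal sketch for the simplest possible case, a finite set of candidate predictors, where the distribution can be computed exactly. It uses the usual definition in which each candidate's weight is proportional to its prior mass times exp(-lam * empirical risk). The function name, the parameter name `lam`, and the toy risk values are assumptions for illustration; the intractable continuous case studied in the paper, and its variational approximations, are not covered here.

```python
import math


def gibbs_posterior(risks, prior, lam):
    """Gibbs posterior over a finite set of candidate predictors.

    risks : empirical risk of each candidate
    prior : strictly positive prior weight of each candidate
    lam   : inverse temperature (lam = 0 returns the normalized prior)
    """
    # Work in log space and subtract the maximum for numerical stability.
    log_weights = [math.log(p) - lam * r for r, p in zip(risks, prior)]
    shift = max(log_weights)
    weights = [math.exp(v - shift) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    toy_risks = [0.10, 0.30, 0.20]       # hypothetical empirical risks
    uniform_prior = [1.0 / 3.0] * 3
    for lam in (0.0, 5.0, 50.0):
        print(lam, gibbs_posterior(toy_risks, uniform_prior, lam))
```

As `lam` grows, the mass concentrates on the candidate with the smallest empirical risk; as `lam` shrinks to zero, the distribution falls back to the prior.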
PAC-Bayesian Treatment Allocation Under Budget Constraints
This paper considers the estimation of treatment assignment rules when the
policy maker faces a general budget or resource constraint. Utilizing the
PAC-Bayesian framework, we propose new treatment assignment rules that allow
for flexible notions of treatment outcome, treatment cost, and a budget
constraint. For example, the constraint setting allows for cost-savings, when
the costs of non-treatment exceed those of treatment for a subpopulation, to be
factored into the budget. It also accommodates simpler settings, such as
quantity constraints, and doesn't require outcome responses and costs to have
the same unit of measurement. Importantly, the approach accounts for settings
where budget or resource limitations may preclude treating all that can
benefit, where costs may vary with individual characteristics, and where there
may be uncertainty regarding the cost of treatment rules of interest. Despite
the nomenclature, our theoretical analysis examines frequentist properties of
the proposed rules. For stochastic rules that typically approach
budget-penalized empirical welfare maximizing policies in larger samples, we
derive non-asymptotic generalization bounds for the target population costs and
sharp oracle-type inequalities that compare the rules' welfare regret to that
of optimal policies in relevant budget categories. A closely related,
non-stochastic, model aggregation treatment assignment rule is shown to inherit
desirable attributes.
Comment: 70 pages, 7 figures
A reduced-rank approach to predicting multiple binary responses through machine learning
This paper investigates the problem of simultaneously predicting multiple
binary responses by utilizing a shared set of covariates. Our approach
incorporates machine learning techniques for binary classification, without
making assumptions about the underlying observations. Instead, our focus lies
on a group of predictors, aiming to identify the one that minimizes prediction
error. Unlike previous studies that primarily address estimation error, we
directly analyze the prediction error of our method using PAC-Bayesian bound
techniques. In this paper, we introduce a pseudo-Bayesian approach capable of
handling incomplete response data. Our strategy is efficiently implemented
using the Langevin Monte Carlo method. Through simulation studies and a
practical application using real data, we demonstrate the effectiveness of our
proposed method, producing comparable or sometimes superior results compared to
the current state-of-the-art method.
Generalization Bounds: Perspectives from Information Theory and PAC-Bayes
A fundamental question in theoretical machine learning is generalization.
Over the past decades, the PAC-Bayesian approach has been established as a
flexible framework to address the generalization capabilities of machine
learning algorithms, and design new ones. Recently, it has garnered increased
interest due to its potential applicability for a variety of learning
algorithms, including deep neural networks. In parallel, an
information-theoretic view of generalization has developed, wherein the
relation between generalization and various information measures has been
established. This framework is intimately connected to the PAC-Bayesian
approach, and a number of results have been independently discovered in both
strands. In this monograph, we highlight this strong connection and present a
unified treatment of generalization. We present techniques and results that the
two perspectives have in common, and discuss the approaches and interpretations
that differ. In particular, we demonstrate how many proofs in the area share a
modular structure, through which the underlying ideas can be intuited. We pay
special attention to the conditional mutual information (CMI) framework;
analytical studies of the information complexity of learning algorithms; and
the application of the proposed methods to deep learning. This monograph is
intended to provide a comprehensive introduction to information-theoretic
generalization bounds and their connection to PAC-Bayes, serving as a
foundation from which the most recent developments are accessible. It is aimed
broadly towards researchers with an interest in generalization and theoretical
machine learning.
Comment: 222 pages