188 research outputs found

    PAC-Bayesian High Dimensional Bipartite Ranking

    This paper is devoted to the bipartite ranking problem, a classical statistical learning task, in a high dimensional setting. We propose a scoring and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear additive scoring functions, and we derive non-asymptotic risk bounds under a sparsity assumption. In particular, oracle inequalities in probability holding under a margin condition assess the performance of our procedure, and prove its minimax optimality. An MCMC-flavored algorithm is proposed to implement our method, and its behavior is illustrated on synthetic and real-life datasets.

    Local False Discovery Rate Based Methods for Multiple Testing of One-Way Classified Hypotheses

    This paper continues the line of research initiated in Liu, Sarkar, and Zhao (2016) on developing a novel framework for multiple testing of hypotheses grouped in a one-way classified form using hypothesis-specific local false discovery rates (Lfdr's). It is built on an extension of the standard two-class mixture model from single to multiple groups, defining the hypothesis-specific Lfdr as a function of the conditional Lfdr for the hypothesis, given that it is within a significant group, and the Lfdr for the group itself, and involving a new parameter that measures the grouping effect. This definition captures the underlying group structure for the hypotheses belonging to a group more effectively than the standard two-class mixture model. Two new Lfdr-based methods, possessing meaningful optimalities, are produced in their oracle forms. One, designed to control false discoveries across the entire collection of hypotheses, is proposed as a powerful alternative to simply pooling all the hypotheses into a single group and using a commonly used Lfdr-based method under the standard single-group two-class mixture model. The other is proposed as an Lfdr analog of the method of Benjamini and Bogomolov (2014) for selective inference. It controls an Lfdr-based measure of false discoveries associated with selecting groups concurrently with controlling the average of within-group false discovery proportions across the selected groups. Simulation studies and a real-data application show that our proposed methods are often more powerful than their relevant competitors. Comment: 26 pages, 17 figures

    Bayesian variable selection with shrinking and diffusing priors

    We consider a Bayesian approach to variable selection in the presence of high dimensional covariates based on a hierarchical model that places prior distributions on the regression coefficients as well as on the model space. We adopt the well-known spike and slab Gaussian priors with a distinct feature, that is, the prior variances depend on the sample size through which appropriate shrinkage can be achieved. We show the strong selection consistency of the proposed method in the sense that the posterior probability of the true model converges to one even when the number of covariates grows nearly exponentially with the sample size. This is arguably the strongest selection consistency result that has been available in the Bayesian variable selection literature; yet the proposed method can be carried out through posterior sampling with a simple Gibbs sampler. Furthermore, we argue that the proposed method is asymptotically similar to model selection with the $L_0$ penalty. We also demonstrate through empirical work the fine performance of the proposed approach relative to some state of the art alternatives. Comment: Published at http://dx.doi.org/10.1214/14-AOS1207 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    On the properties of variational approximations of Gibbs posteriors

    The PAC-Bayesian approach is a powerful set of techniques to derive non-asymptotic risk bounds for random estimators. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately often intractable. One may sample from it using Markov chain Monte Carlo, but this is usually too slow for big datasets. We consider instead variational approximations of the Gibbs posterior, which are fast to compute. We undertake a general study of the properties of such approximations. Our main finding is that such a variational approximation often has the same rate of convergence as the original PAC-Bayesian procedure it approximates. In addition, we show that, when the risk function is convex, a variational approximation can be obtained in polynomial time using a convex solver. We give finite sample oracle inequalities for the corresponding estimator. We specialize our results to several learning tasks (classification, ranking, matrix completion), discuss how to implement a variational approximation in each case, and illustrate the good properties of said approximation on real datasets.

    PAC-Bayesian Treatment Allocation Under Budget Constraints

    This paper considers the estimation of treatment assignment rules when the policy maker faces a general budget or resource constraint. Utilizing the PAC-Bayesian framework, we propose new treatment assignment rules that allow for flexible notions of treatment outcome, treatment cost, and a budget constraint. For example, the constraint setting allows for cost-savings, when the costs of non-treatment exceed those of treatment for a subpopulation, to be factored into the budget. It also accommodates simpler settings, such as quantity constraints, and doesn't require outcome responses and costs to have the same unit of measurement. Importantly, the approach accounts for settings where budget or resource limitations may preclude treating all that can benefit, where costs may vary with individual characteristics, and where there may be uncertainty regarding the cost of treatment rules of interest. Despite the nomenclature, our theoretical analysis examines frequentist properties of the proposed rules. For stochastic rules that typically approach budget-penalized empirical welfare maximizing policies in larger samples, we derive non-asymptotic generalization bounds for the target population costs and sharp oracle-type inequalities that compare the rules' welfare regret to that of optimal policies in relevant budget categories. A closely related, non-stochastic, model aggregation treatment assignment rule is shown to inherit desirable attributes. Comment: 70 pages, 7 figures

    A reduced-rank approach to predicting multiple binary responses through machine learning

    This paper investigates the problem of simultaneously predicting multiple binary responses by utilizing a shared set of covariates. Our approach incorporates machine learning techniques for binary classification, without making assumptions about the underlying observations. Instead, our focus lies on a group of predictors, aiming to identify the one that minimizes prediction error. Unlike previous studies that primarily address estimation error, we directly analyze the prediction error of our method using PAC-Bayesian bounds techniques. In this paper, we introduce a pseudo-Bayesian approach capable of handling incomplete response data. Our strategy is efficiently implemented using the Langevin Monte Carlo method. Through simulation studies and a practical application using real data, we demonstrate the effectiveness of our proposed method, producing comparable or sometimes superior results compared to the current state-of-the-art method.

    Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

    A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of generalization. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning. Comment: 222 pages