
    Methodological and Computational Advances for High-Dimensional Bayesian Regression with Binary and Categorical Responses

    Probit and logistic regressions are among the most popular and well-established formulations for modeling binary observations, thanks to their simple structure and high interpretability. Despite this simplicity, their use poses non-trivial challenges for inference, particularly from a computational perspective and in high-dimensional scenarios. This continues to motivate active research on probit, logit, and a number of their generalizations, especially within the Bayesian community. Conjugacy results for standard probit regression under normal and unified skew-normal (SUN) priors appeared only recently in the literature. These findings were rapidly extended to several generalizations of probit regression, including multinomial probit, dynamic multivariate probit, and skewed Gaussian processes, among others. Nonetheless, these recent developments focus on specific subclasses of models, which can all be regarded as instances of a potentially broader family of formulations relying on partially or fully discretized Gaussian latent utilities. Accordingly, we develop a unified and comprehensive framework that encompasses all the above constructions and many others, such as tobit regression and its extensions, for which conjugacy results were still missing. We show that the SUN family of distributions is conjugate for all models within the broad class considered, which notably encompasses all formulations whose likelihoods are given by products of multivariate Gaussian densities and cumulative distribution functions evaluated at linear combinations of the parameter of interest. Such a unifying framework is practically and conceptually useful for studying general theoretical properties and developing future extensions. These include new avenues for improved posterior inference that exploit i.i.d. samplers from the exact SUN posteriors, as well as recent accurate and scalable variational Bayes (VB) and expectation-propagation approximations, for which we derive a novel efficient implementation.

    Along a parallel research line, we focus on binary regression under the logit mapping, for which computation in high dimensions still poses open challenges. To overcome these difficulties, several contributions focus on iteratively solving a series of surrogate problems, entailing the sequential refinement of tangent lower bounds for the logistic log-likelihood. For instance, tractable quadratic minorizers can be exploited to obtain maximum likelihood (ML) and maximum a posteriori estimates via minorize-maximize and expectation-maximization schemes with desirable convergence guarantees. Likewise, quadratic surrogates can be used to construct Gaussian approximations of the posterior distribution in mean-field VB routines, which may, however, suffer from low accuracy in high dimensions. This issue can be mitigated by resorting to more flexible but involved piecewise quadratic bounds, which, however, are typically defined implicitly and become less tractable as the number of pieces increases. For this reason, we derive a novel tangent minorizer for logistic log-likelihoods that combines the quadratic term with a single piecewise-linear contribution for each observation, proportional to the absolute value of the corresponding linear predictor. The proposed bound is guaranteed to improve on the sharpest quadratic minorizer, while minimizing the loss of tractability relative to general piecewise quadratic bounds. Unlike the latter, its explicit analytical expression allows computations to be simplified by exploiting a well-known scale-mixture representation of Laplace random variables. We investigate the benefits of the proposed methodology both for penalized ML estimation, where it leads to a faster convergence rate of the optimization procedure, and for VB approximation, where the resulting accuracy improvement over mean-field strategies can be substantial in skewed and high-dimensional scenarios.
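    As a concrete reference point for the quadratic minorizers discussed above, the sketch below implements the classical tangent quadratic (Jaakkola-Jordan-type) bound within a minorize-maximize loop for MAP logistic regression. It is a minimal illustration, not the novel quadratic-plus-piecewise-linear bound introduced in this work; the design matrix X, labels y in {-1, +1}, and prior precision tau are assumed inputs.

```python
# Minimal sketch: MM for MAP logistic regression via the classical
# Jaakkola-Jordan quadratic tangent bound (illustrative, not the
# thesis' novel minorizer). X, y in {-1,+1}, and tau are assumptions.
import numpy as np

def jj_lambda(xi):
    """lambda(xi) = tanh(xi/2) / (4*xi), with the limit 1/8 at xi = 0."""
    xi = np.abs(xi)
    out = np.full_like(xi, 0.125)
    nz = xi > 1e-10
    out[nz] = np.tanh(xi[nz] / 2.0) / (4.0 * xi[nz])
    return out

def logistic_map_mm(X, y, tau=1.0, n_iter=100):
    """MAP estimate under a N(0, tau^{-1} I) prior, obtained by
    maximizing a quadratic tangent lower bound at each iteration."""
    n, p = X.shape
    xi = np.ones(n)                      # tangent points, one per datum
    for _ in range(n_iter):
        lam = jj_lambda(xi)              # per-observation curvature
        A = 2.0 * X.T @ (lam[:, None] * X) + tau * np.eye(p)
        beta = np.linalg.solve(A, X.T @ y / 2.0)
        xi = np.abs(X @ beta)            # tightest tangent point per datum
    return beta
```

    Each iteration maximizes a ridge-type quadratic surrogate, so the update reduces to a single linear solve; refreshing the tangent points at the current linear predictors makes the bound tight again after every step, which is what yields the monotone-ascent guarantee of MM schemes.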

    Adaptive Annealed Importance Sampling with Constant Rate Progress

    Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution given its unnormalized density function. The algorithm relies on a sequence of interpolating distributions bridging an initial tractable distribution to the target, such as the well-known geometric mean path of unnormalized distributions, which is generally assumed to be suboptimal. In this paper, we prove that geometric annealing corresponds to the distribution path that minimizes the KL divergence between the current particle distribution and the desired target when the feasible change in the particle distribution is constrained. Following this observation, we derive a constant-rate discretization schedule for this annealing sequence, which adapts the schedule to the difficulty of moving samples between the initial and target distributions. We further extend our results to f-divergences and present the corresponding dynamics of annealing sequences, based on which we propose the Constant Rate AIS (CR-AIS) algorithm and its efficient implementation for α-divergences. We empirically show that CR-AIS performs well on multiple benchmark distributions while avoiding the computationally expensive tuning loop of existing adaptive AIS methods.
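    For orientation, the sketch below implements vanilla AIS with the geometric path and a fixed linear schedule, i.e., the non-adaptive baseline that CR-AIS improves on by adapting the schedule; the unnormalized log_target (vectorized over particles), the N(0, I) start, and all tuning constants are assumptions.

```python
# Minimal sketch of vanilla AIS with the geometric path and a fixed
# linear beta schedule (the baseline; CR-AIS adapts this schedule).
import numpy as np

def ais(log_target, dim, n_particles=1000, n_steps=50, step=0.3, seed=None):
    rng = np.random.default_rng(seed)
    log_q = lambda x: -0.5 * np.sum(x**2, axis=-1)   # unnormalized N(0, I) start
    betas = np.linspace(0.0, 1.0, n_steps + 1)       # geometric-path schedule
    x = rng.standard_normal((n_particles, dim))
    log_w = np.zeros(n_particles)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # incremental importance weight: log f_k(x) - log f_{k-1}(x)
        log_w += (b1 - b0) * (log_target(x) - log_q(x))
        # one random-walk Metropolis step leaving the k-th bridge invariant
        log_f = lambda z: (1 - b1) * log_q(z) + b1 * log_target(z)
        prop = x + step * rng.standard_normal(x.shape)
        accept = np.log(rng.random(n_particles)) < log_f(prop) - log_f(x)
        x[accept] = prop[accept]
    return x, log_w   # logsumexp(log_w) - log(n_particles) estimates the log Z ratio
```

    With a linear schedule the per-step KL "effort" is uneven; the constant-rate idea in the paper is precisely to redistribute the betas so that each transition moves the particle distribution by a comparable amount.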

    Modeling heterogeneity in random graphs through latent space models: a selective review

    We present a selective review of probabilistic modeling of heterogeneity in random graphs. We focus on latent space models, and in particular on stochastic block models and their extensions, which have undergone major developments in the last five years.
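    For readers new to the model class, here is a minimal generative sketch of the stochastic block model the review centers on; the number of blocks, mixing weights pi, and connectivity matrix B are illustrative parameters, not from the review.

```python
# Minimal generative sketch of a stochastic block model (illustrative).
import numpy as np

def sample_sbm(n, pi, B, seed=None):
    """Node i gets a latent block z_i ~ Cat(pi); an undirected edge (i, j)
    is then drawn as Bernoulli(B[z_i, z_j])."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n, p=pi)        # latent block labels
    P = B[np.ix_(z, z)]                          # pairwise edge probabilities
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1)                            # keep upper triangle only
    return A + A.T, z                            # symmetric, no self-loops
```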

    Approximate inference in graphical models

    Probability theory provides a mathematically rigorous yet conceptually flexible calculus of uncertainty, allowing the construction of complex hierarchical models for real-world inference tasks. Unfortunately, exact inference in probabilistic models is often computationally expensive or even intractable. A close inspection of such situations often reveals that the computational bottlenecks are confined to certain aspects of the model, which can be circumvented by approximations without having to sacrifice the model's interesting aspects. The conceptual framework of graphical models provides an elegant means of representing probabilistic models and deriving both exact and approximate inference algorithms in terms of local computations. This makes graphical models an ideal aid in the development of generalizable approximations. This thesis contains a brief introduction to approximate inference in graphical models (Chapter 2), followed by three extensive case studies in which approximate inference algorithms are developed for challenging applied inference problems. Chapter 3 derives the first probabilistic game tree search algorithm. Chapter 4 provides a novel expressive model for inference in psychometric questionnaires. Chapter 5 develops a model for the topics of large corpora of text documents, conditional on document metadata, with a focus on computational speed. In each case, graphical models help in two important ways: they first provide important structural insight into the problem, and then suggest practical approximations to the exact probabilistic solution. This work was supported by a scholarship from Microsoft Research, Ltd.
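    To make the notion of "local computations" concrete, here is a minimal sketch (not taken from the thesis) of exact sum-product message passing on a discrete chain; the node potentials phi and the shared pairwise potential psi are illustrative names.

```python
# Minimal sketch: sum-product (forward-backward) marginals on a chain
# x_1 - x_2 - ... - x_T of discrete variables with S states each.
import numpy as np

def chain_marginals(phi, psi):
    """phi: (T, S) node potentials; psi: (S, S) shared pairwise potential.
    Returns the (T, S) matrix of normalized single-node marginals."""
    T, S = phi.shape
    fwd = np.ones((T, S))                    # messages passed left -> right
    bwd = np.ones((T, S))                    # messages passed right -> left
    for t in range(1, T):
        m = (fwd[t - 1] * phi[t - 1]) @ psi  # sum out x_{t-1}
        fwd[t] = m / m.sum()                 # normalize for stability
    for t in range(T - 2, -1, -1):
        m = psi @ (bwd[t + 1] * phi[t + 1])  # sum out x_{t+1}
        bwd[t] = m / m.sum()
    marg = fwd * phi * bwd                   # combine incoming messages
    return marg / marg.sum(axis=1, keepdims=True)
```

    Every update touches only a node and its neighbor, which is why the same machinery generalizes from exact inference on trees to the approximate schemes developed in the thesis.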

    Probabilistic models for structured sparsity


    Applications of Approximate Learning and Inference for Probabilistic Models

    We develop approximate inference and learning methods that facilitate the use of probabilistic modeling techniques, motivated by applications in two different areas. First, we consider the ill-posed inverse problem of recovering an image from an underdetermined system of linear measurements corrupted by noise. Second, we consider the problem of inferring user preferences for items from counts, pairwise comparisons, and user activity logs, all instances of implicit feedback. Plausible models for images, and for the noise incurred when recording them, render posterior inference intractable, while the scale of the inference problem makes sampling-based approximations ineffective. We therefore develop deterministic approximate inference algorithms for two different augmentations of a typical sparse linear model: first, for the rectified-linear Poisson likelihood, and second, for tree-structured super-Gaussian mixture models. The rectified-linear Poisson likelihood is an alternative noise model applicable in astronomical and biomedical imaging applications that operate in intensity regimes where quantum effects lead to observations best described by counts of particles arriving at a sensor, as well as in general Poisson regression problems arising in various fields. In this context we show that the model-specific computations for Expectation Propagation can be robustly solved by a simple dynamic program. Next, we develop a scalable approximate inference algorithm for structured mixture models that uses a discrete graphical model to represent dependencies between the latent mixture components of a collection of mixture models. Specifically, we use tree-structured mixtures of super-Gaussians to model the persistence across scales of large coefficients of the wavelet transform of an image for improved reconstruction.

    In the second part, on models of user preference, we consider two settings: the global static and the contextual dynamic setting. In the global static setting, we represent user-item preferences by a latent low-rank matrix. Instead of using numeric ratings, we develop methods to infer this latent representation from two types of implicit feedback: aggregate counts of users interacting with a service, and the binary outcomes of pairwise comparisons. We model count data using a latent Gaussian bilinear model with Poisson likelihoods. For this model, we show that the variational Gaussian approximation can be further relaxed to be available in closed form by adding constraints, leading to an efficient inference algorithm. In the second implicit feedback scenario, we infer the latent preference matrix from pairwise preference statements. We combine a low-rank bilinear model with non-parametric item-feature regression and develop a novel approximate variational expectation-maximization algorithm that mitigates the computational challenges due to the latent couplings induced by the pairwise comparisons. Finally, in the contextual dynamic setting, we model sequences of user activity at the granularity of single interaction events instead of aggregate counts. Routinely gathered in the background at large scale in many applications, such sequences can reveal temporal and contextual aspects of user behavior through recurrent patterns. To describe such data, we propose a generic collaborative sequence model based on recurrent neural networks that combines ideas from collaborative filtering and language modeling.
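    The closed-form relaxations above rest on Gaussian expectations of the Poisson log-likelihood being tractable. As a simple illustration (for the standard log link, not the rectified-linear likelihood treated in the thesis), the expected log-likelihood under a Gaussian variational posterior is available analytically via E[exp(eta)] = exp(mu + s2/2):

```python
# Minimal sketch: the closed-form variational Gaussian ELBO term for a
# Poisson likelihood with log link (illustrative setting and names).
import numpy as np
from scipy.special import gammaln

def expected_poisson_loglik(y, mu, s2):
    """E_{eta ~ N(mu, s2)}[ y*eta - exp(eta) - log(y!) ], elementwise,
    using the Gaussian moment-generating function for E[exp(eta)]."""
    return y * mu - np.exp(mu + 0.5 * s2) - gammaln(y + 1.0)
```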

    Bayesian Learning for Neural Networks: an algorithmic survey

    The last decade witnessed a growing interest in Bayesian learning. Yet the technicality of the topic and the multitude of ingredients involved, besides the complexity of turning theory into practical implementations, limit the use of the Bayesian learning paradigm and prevent its widespread adoption across different fields and applications. This self-contained survey engages and introduces readers to the principles and algorithms of Bayesian learning for neural networks. It provides an introduction to the topic from an accessible, practical-algorithmic perspective. After a general introduction to Bayesian neural networks, we discuss and present both standard and recent approaches for Bayesian inference, with an emphasis on solutions relying on variational inference and the use of natural gradients. We also discuss the use of manifold optimization as a state-of-the-art approach to Bayesian learning. We examine the characteristic properties of all the discussed methods and provide pseudocode for their implementation, paying attention to practical aspects such as the computation of the gradient.
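    As a pointer to what the variational inference recipes in such surveys typically reduce to, here is a minimal sketch (an illustration under assumed names and shapes, not code from the survey) of the two mean-field Gaussian ingredients: reparameterized weight sampling and the closed-form KL term against a standard normal prior.

```python
# Minimal sketch of mean-field Gaussian VI building blocks for BNN
# weights (illustrative; mu and rho are per-weight variational params).
import numpy as np

def sample_weights(mu, rho, rng):
    """Reparameterization trick: w = mu + softplus(rho) * eps."""
    sigma = np.log1p(np.exp(rho))            # softplus keeps sigma > 0
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps, sigma

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights."""
    return np.sum(0.5 * (sigma**2 + mu**2 - 1.0) - np.log(sigma))
```

    The ELBO is then the expected log-likelihood under sampled weights minus this KL term; the natural-gradient and manifold-optimization methods the survey emphasizes differ mainly in how that objective is optimized, not in these ingredients.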

    A Parsimonious Tour of Bayesian Model Uncertainty

    Modern statistical software and machine learning libraries are enabling semi-automated statistical inference. In this context, it becomes ever easier to fit many models to the data at hand, thereby reversing the Fisherian way of conducting science, in which data are collected after the scientific hypothesis (and hence the model) has been determined. The statistician's renewed goal becomes helping the practitioner choose among such large and heterogeneous families of models, a task known as model selection. The Bayesian paradigm offers a systematized way of addressing this problem. This approach, launched by Harold Jeffreys in his 1939 book Theory of Probability, has witnessed a remarkable evolution in recent decades that has brought about several new theoretical and methodological advances. Some of these recent developments are the focus of this survey, which tries to present a unifying perspective on work carried out by different communities. In particular, we focus on the non-asymptotic out-of-sample performance of Bayesian model selection and averaging techniques, and draw connections with penalized maximum likelihood. We also describe recent extensions to wider classes of probabilistic frameworks, including high-dimensional, unidentifiable, or likelihood-free models.
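    As a minimal concrete instance of the Bayesian approach to model selection, the sketch below computes a Bayes factor in a fully conjugate Gaussian setting where both marginal likelihoods are available in closed form; the models, priors, and variances are illustrative choices, not taken from the survey.

```python
# Minimal sketch: Bayes factor for y_i ~ N(mu, s2) with known s2,
# comparing M0: mu = 0 against M1: mu ~ N(0, t2) (illustrative setup).
import numpy as np
from scipy.stats import multivariate_normal

def log_bayes_factor(y, s2=1.0, t2=1.0):
    n = len(y)
    # Under M0 the data are N(0, s2 I); under M1, integrating mu out
    # leaves a Gaussian with an added rank-one covariance term.
    log_m0 = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=s2 * np.eye(n))
    cov1 = s2 * np.eye(n) + t2 * np.ones((n, n))
    log_m1 = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov1)
    return log_m1 - log_m0                   # > 0 favours M1
```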