
    Asymptotically Exact, Embarrassingly Parallel MCMC

    Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition the data onto multiple machines. Then, on each machine, any classical MCMC method (e.g., Gibbs sampling) may be used to draw samples from a posterior distribution given the data subset. Finally, the samples from each machine are combined to form samples from the full posterior. This embarrassingly parallel algorithm allows each machine to act independently on a subset of the data (without communication) until the final combination stage. We prove that our algorithm generates asymptotically exact samples and empirically demonstrate its ability to parallelize burn-in and sampling in several models.
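    The combination stage admits a simple parametric approximation: if each machine's subposterior is roughly Gaussian, the product of the subposterior densities (which recovers the full posterior up to normalization) is again Gaussian, with precision equal to the sum of the subposterior precisions. The sketch below illustrates only this Gaussian combination rule, not the paper's asymptotically exact nonparametric combination; the function name and sampling details are illustrative assumptions.

```python
import numpy as np

def combine_subposteriors(sub_samples, n_draws=1000, seed=0):
    """Combine M sets of subposterior samples into draws from a Gaussian
    approximation of the full posterior.

    Each element of `sub_samples` is an (n_samples, dim) array produced by
    an independent MCMC run on one data shard. The product of Gaussian
    subposterior approximations has precision = sum of precisions and
    mean = combined covariance @ sum(precision_m @ mean_m).
    Illustrative sketch only; the paper's exact combination is nonparametric.
    """
    precisions, weighted_means = [], []
    for s in sub_samples:
        cov = np.cov(s, rowvar=False).reshape(s.shape[1], s.shape[1])
        prec = np.linalg.inv(cov)          # subposterior precision
        precisions.append(prec)
        weighted_means.append(prec @ s.mean(axis=0))
    full_cov = np.linalg.inv(sum(precisions))
    full_mean = full_cov @ sum(weighted_means)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(full_mean, full_cov, size=n_draws)
```

    Note that the shards never communicate during sampling; only the summary combination at the end touches all machines' output.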

    Penalized additive regression for space-time data: a Bayesian perspective

    We propose extensions of penalized spline generalized additive models for analysing space-time regression data and study them from a Bayesian perspective. Non-linear effects of continuous covariates and time trends are modelled through Bayesian versions of penalized splines, while correlated spatial effects follow a Markov random field prior. This allows all functions and effects to be treated within a unified general framework by assigning appropriate priors with different forms and degrees of smoothness. Inference can be performed either with full (FB) or empirical Bayes (EB) posterior analysis. FB inference using MCMC techniques is a slight extension of our own previous work. For EB inference, a computationally efficient solution is developed on the basis of a generalized linear mixed model representation. The second approach can be viewed as posterior mode estimation and is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to smoothing parameters, are then estimated by using marginal likelihood. We carefully compare both inferential procedures in simulation studies and illustrate them through real data applications. The methodology is available in the open domain statistical package BayesX and as an S-plus/R function.
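    The posterior-mode view of penalized splines reduces, for a fixed smoothing parameter, to penalized least squares with a difference penalty on adjacent basis coefficients. The following sketch shows that core computation with a second-order difference penalty; the basis construction and function names are illustrative assumptions, not the BayesX implementation.

```python
import numpy as np

def second_diff_penalty(k):
    """Penalty matrix K = D'D, where D takes second-order differences of
    the k spline coefficients (a standard P-spline penalty)."""
    D = np.diff(np.eye(k), n=2, axis=0)
    return D.T @ D

def penalized_fit(B, y, lam):
    """Posterior-mode / penalized least-squares coefficients:
    solve (B'B + lam * K) beta = B'y for basis matrix B and data y."""
    K = second_diff_penalty(B.shape[1])
    return np.linalg.solve(B.T @ B + lam * K, B.T @ y)
```

    In the mixed-model representation, `lam` is the ratio of error variance to the random-effect variance, so estimating variance components by marginal likelihood amounts to data-driven smoothing-parameter selection.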

    Semiparametric Classification under a Forest Density Assumption

    This dissertation proposes a new semiparametric approach for binary classification that exploits the modeling flexibility of sparse graphical models. This approach is based on non-parametrically estimated densities, which are notoriously difficult to obtain when the number of dimensions is even moderately large. In this work, it is assumed that each class can be well-represented by a family of undirected sparse graphical models, specifically a forest-structured distribution. Under this assumption, non-parametric estimation of only one- and two-dimensional marginal densities is required to transform the data into a space where a linear classifier is optimal. This work proves convergence results for the forest density classifier under certain conditions. Its performance is illustrated by comparing it to several state-of-the-art classifiers on simulated forest-distributed data as well as a panel of real datasets from different domains. These experiments indicate that the proposed method is competitive with popular methods across a wide range of applications.
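    A standard way to learn a tree- or forest-structured distribution is the Chow-Liu procedure: estimate pairwise mutual information and keep a maximum-weight spanning tree (or forest). The sketch below uses a crude histogram plug-in estimate of mutual information in place of the dissertation's kernel estimates; function names and the binning choice are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_info(x, y, bins=10):
    """Plug-in mutual information estimate from a 2-d histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def chow_liu_edges(X, bins=10):
    """Edges of a maximum-mutual-information spanning tree over the
    columns of X (Chow-Liu). Max-weight tree == min spanning tree on
    the negated weight matrix."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            W[i, j] = mutual_info(X[:, i], X[:, j], bins)
    T = minimum_spanning_tree(-W)
    return list(zip(*T.nonzero()))
```

    Only pairwise (and univariate) density estimates enter the procedure, which is exactly why the forest assumption sidesteps the curse of dimensionality mentioned above.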

    Sparse Nonparametric Graphical Models

    We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible graphical models. One allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests. Examples of both methods are presented. We also discuss possible future research directions for nonparametric graphical modeling.

    Comment: Published at http://dx.doi.org/10.1214/12-STS391 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
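    The first approach (the nonparanormal) replaces the Gaussian assumption with the weaker assumption that the data are Gaussian after unknown monotone marginal transformations, which can be estimated from ranks. The sketch below shows only that marginal Gaussianization step; a Gaussian graphical model estimator (e.g. the graphical lasso) would then be fit to the transformed data. The truncation constant follows the form used in the nonparanormal literature, but treat its exact value here as an assumption.

```python
import numpy as np
from scipy.stats import norm

def nonparanormal_transform(X):
    """Map each column of X to approximate normal scores: empirical CDF
    (via ranks), truncated away from 0 and 1, then the Gaussian quantile
    function. Monotone in each column by construction."""
    n, d = X.shape
    # truncation keeps the extreme quantiles finite and stabilizes the tails
    delta = 1.0 / (4 * n**0.25 * np.sqrt(np.pi * np.log(n)))
    Z = np.empty((n, d))
    for j in range(d):
        ranks = np.argsort(np.argsort(X[:, j])) + 1   # 1..n, no ties assumed
        u = np.clip(ranks / (n + 1), delta, 1 - delta)
        Z[:, j] = norm.ppf(u)
    return Z
```

    After this transform, the machinery of the Gaussian graphical model (sparse inverse-covariance estimation) applies unchanged, which is what makes the extension attractive.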