142 research outputs found

    Scalable Bayes via Barycenter in Wasserstein Space

    Divide-and-conquer based methods for Bayesian inference provide a general approach for tractable posterior inference when the sample size is large. These methods divide the data into smaller subsets, sample from the posterior distribution of parameters in parallel on all the subsets, and combine posterior samples from all the subsets to approximate the full data posterior distribution. The smaller size of any subset compared to the full data implies that posterior sampling on any subset is computationally more efficient than sampling from the true posterior distribution. Since the combination step takes negligible time relative to sampling, posterior computations can be scaled to massive data by dividing the full data into a sufficiently large number of data subsets. One such approach relies on the geometry of posterior distributions estimated across different subsets and combines them through their barycenter in a Wasserstein space of probability measures. We provide theoretical guarantees on the accuracy of this approximation that are valid in many applications. We show that the geometric method approximates the full data posterior distribution better than its competitors across diverse simulations and reproduces known results when applied to a movie ratings database. Comment: 43 pages, 7 figures, and 11 tables. The updated revision will appear in JMLR.
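
    The combination step has a convenient closed form for a scalar parameter: the W2 barycenter of equal-size empirical measures on the real line is obtained by averaging order statistics. Below is a minimal sketch of the divide-sample-combine recipe under that simplification. The subset sampler is a conjugate normal-mean stub, with the subset likelihood raised to the number of subsets so each subset posterior has roughly the full-data spread (a standard device in this literature); all names and settings are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def subset_posterior_draws(y, k, n_draws=2000, prior_var=100.0, noise_var=1.0):
    """Conjugate draws from a normal-mean subset posterior whose
    likelihood is raised to the power k (the number of subsets)."""
    n = len(y)
    post_var = 1.0 / (1.0 / prior_var + k * n / noise_var)
    post_mean = post_var * k * y.sum() / noise_var
    return rng.normal(post_mean, np.sqrt(post_var), size=n_draws)

# Simulate "massive" data and split it into k subsets.
theta_true = 1.5
y = rng.normal(theta_true, 1.0, size=100_000)
k = 20
subsets = np.array_split(y, k)

# Subset sampling step (run in parallel in practice); sort draws once.
draws = [np.sort(subset_posterior_draws(s, k)) for s in subsets]

# Combination step: averaging order statistics across subsets gives the
# W2 barycenter of the subset posteriors for a scalar parameter.
barycenter = np.mean(draws, axis=0)
print(barycenter.mean(), barycenter.std())
```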

    Simple, Scalable and Accurate Posterior Interval Estimation

    There is a lack of simple and scalable algorithms for uncertainty quantification. Bayesian methods quantify uncertainty through posterior and predictive distributions, but it is difficult to rapidly estimate summaries of these distributions, such as quantiles and intervals. Variational Bayes approximations are widely used, but may badly underestimate posterior covariance. Typically, the focus of Bayesian inference is on point and interval estimates for one-dimensional functionals of interest. In small-scale problems, Markov chain Monte Carlo algorithms remain the gold standard, but such algorithms face major problems in scaling up to big data. Various modifications have been proposed based on parallelization and approximations based on subsamples, but such approaches are either highly complex or lack theoretical support and/or good performance outside of narrow settings. We propose a very simple and general posterior interval estimation algorithm, which is based on running Markov chain Monte Carlo in parallel for subsets of the data and averaging quantiles estimated from each subset. We provide strong theoretical guarantees and illustrate performance in several applications. Comment: 50 pages, 6 figures, and 11 tables.
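
    The core of the quantile-averaging algorithm fits in a few lines; a hedged sketch follows, with the parallel subset MCMC stubbed out by synthetic chains (the paper also adjusts each subset likelihood before sampling, which is omitted here).

```python
import numpy as np

def combined_interval(subset_draws, lower=0.025, upper=0.975):
    """Average subset-level quantile estimates into one posterior interval.
    subset_draws: list of 1-D arrays of MCMC draws, one per data subset."""
    lo = np.mean([np.quantile(d, lower) for d in subset_draws])
    hi = np.mean([np.quantile(d, upper) for d in subset_draws])
    return lo, hi

# Usage with stubbed subset chains (real ones would come from parallel MCMC).
rng = np.random.default_rng(1)
chains = [rng.normal(0.0, 1.0, 5000) for _ in range(10)]
print(combined_interval(chains))
```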

    A Divide-and-Conquer Bayesian Approach to Large-Scale Kriging

    We propose a three-step divide-and-conquer strategy within the Bayesian paradigm that delivers massive scalability for any spatial process model. We partition the data into a large number of subsets, apply a readily available Bayesian spatial process model on every subset, in parallel, and optimally combine the posterior distributions estimated across all the subsets into a pseudo-posterior distribution that conditions on the entire data. The combined pseudo-posterior distribution replaces the full data posterior distribution for predicting the responses at arbitrary locations and for inference on the model parameters and spatial surface. Based on distributed Bayesian inference, our approach is called "Distributed Kriging" (DISK) and offers significant advantages in massive data applications where the full data are stored across multiple machines. We show theoretically that the Bayes $L_2$-risk of the DISK posterior distribution achieves the near-optimal convergence rate in estimating the true spatial surface with various types of covariance functions, and provide upper bounds for the number of subsets as a function of the full sample size. The model-free feature of DISK is demonstrated by scaling posterior computations in spatial process models with a stationary full-rank and a nonstationary low-rank Gaussian process (GP) prior. A variety of simulations and a geostatistical analysis of the Pacific Ocean sea surface temperature data validate our theoretical results. Comment: 29 pages, including 4 figures and 5 tables.
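
    A skeleton of the three-step pipeline is sketched below under strong simplifications: the subset-level spatial model is a stub (a real run would fit a GP regression by MCMC), the data are one-dimensional, and the combination is done location-by-location by averaging sorted predictive draws, the pointwise W2 barycenter. Everything here is illustrative scaffolding, not the DISK implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_subset_model(x, y, x_test, n_draws=500):
    """Stub for the subset-level Bayesian spatial model: interpolates and
    adds noise just to produce predictive draws of the right shape."""
    order = np.argsort(x)
    mean = np.interp(x_test, x[order], y[order])
    return mean + rng.normal(0.0, 0.1, size=(n_draws, len(x_test)))

# Step 1: partition n observations into k subsets.
n, k = 5000, 10
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
idx = np.array_split(rng.permutation(n), k)
x_test = np.linspace(0.0, 1.0, 50)

# Step 2: fit the model on every subset (in parallel in practice).
draws = [np.sort(fit_subset_model(x[i], y[i], x_test), axis=0) for i in idx]

# Step 3: combine subset predictive draws location-by-location by
# averaging sorted draws, a pointwise W2 barycenter pseudo-posterior.
disk_draws = np.mean(draws, axis=0)
print(disk_draws.mean(axis=0)[:5])
```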

    Scalable Bayes under Informative Sampling

    The United States Bureau of Labor Statistics collects data using survey instruments under informative sampling designs that assign probabilities of inclusion to be correlated with the response. The bureau extensively uses Bayesian hierarchical models and posterior sampling to impute missing items in respondent-level data and to infer population parameters. Posterior sampling for survey data collected under informative designs is computationally expensive and does not support the production schedules of the bureau. Motivated by this problem, we propose a new method to scale Bayesian computations in informative sampling designs. Our method divides the data into smaller subsets, performs posterior sampling in parallel for every subset, and combines the collection of posterior samples from all the subsets through their mean in the Wasserstein space of order 2. Theoretically, we construct conditions on a class of sampling designs where posterior consistency of the proposed method is achieved. Empirically, we demonstrate that our method is competitive with traditional methods while being significantly faster in many simulations and in the Current Employment Statistics survey conducted by the bureau. Comment: 34 pages, 6 figures, 2 tables.
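
    The subset-level step under informative sampling is commonly handled with a survey pseudo-likelihood, weighting each unit's log-likelihood by its normalized inverse inclusion probability. The toy sketch below shows that device inside a bare-bones Metropolis sampler for a normal mean; the design, model, and tuning are all illustrative, and a full run would split the sample into subsets and combine their draws via the W2 mean as above.

```python
import numpy as np

rng = np.random.default_rng(3)

def weighted_log_post(theta, y, w, k):
    """Pseudo-posterior: unit i's normal log-likelihood is weighted by w_i
    (normalized inverse inclusion probability); k powers the likelihood
    when the data have been split into k subsets."""
    loglik = -0.5 * np.sum(w * (y - theta) ** 2)
    logprior = -0.5 * theta ** 2 / 100.0
    return k * loglik + logprior

def metropolis(y, w, k, n_iter=5000, step=0.05):
    theta = y.mean()
    lp = weighted_log_post(theta, y, w, k)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()
        lp_prop = weighted_log_post(prop, y, w, k)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[t] = theta
    return draws

# Toy informative design: inclusion probability increases with the response.
y_pop = rng.normal(1.0, 1.0, 2000)
pi = np.clip(0.1 + 0.2 * (y_pop - y_pop.min()), 0.05, 1.0)
sampled = rng.uniform(size=y_pop.size) < pi
w = 1.0 / pi[sampled]
w /= w.mean()
print(metropolis(y_pop[sampled], w, k=1)[1000:].mean())
```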

    Multilevel Clustering via Wasserstein Means

    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large, hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with Wasserstein distance metrics. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. Consistency properties are established for the estimates of both local and global clusters. Finally, experimental results with both synthetic and real data are presented to demonstrate the flexibility and scalability of the proposed approach. Comment: Proceedings of ICML, 2017.
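
    The "Wasserstein means" idea is easiest to see in one dimension, where the W2 distance between equal-size empirical measures is the mean squared difference of order statistics and the cluster centroid (W2 barycenter) is the average of sorted samples. The sketch below runs k-means over distributions at a single level; the paper's multilevel formulation nests this across groups and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

def w2_sq(a, b):
    """Squared W2 between equal-size, already-sorted 1-D samples."""
    return np.mean((a - b) ** 2)

def wasserstein_kmeans(samples, k, n_iter=20):
    data = [np.sort(s) for s in samples]      # work with order statistics
    centers = [data[i] for i in rng.choice(len(data), k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        labels = np.array([np.argmin([w2_sq(d, c) for c in centers])
                           for d in data])
        for j in range(k):
            members = [d for d, l in zip(data, labels) if l == j]
            if members:                       # keep old center if empty
                centers[j] = np.mean(members, axis=0)  # 1-D W2 barycenter
    return labels, centers

# Two latent cluster shapes, 30 empirical distributions of 200 points each.
groups = [rng.normal(0.0, 1.0, 200) for _ in range(15)] + \
         [rng.normal(3.0, 0.5, 200) for _ in range(15)]
labels, _ = wasserstein_kmeans(groups, k=2)
print(labels)
```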

    An Algorithm for Distributed Bayesian Inference in Generalized Linear Models

    Monte Carlo algorithms, such as Markov chain Monte Carlo (MCMC) and Hamiltonian Monte Carlo (HMC), are routinely used for Bayesian inference in generalized linear models; however, these algorithms are prohibitively slow in massive data settings because they require multiple passes through the full data in every iteration. Addressing this problem, we develop a scalable extension of these algorithms using the divide-and-conquer (D&C) technique that divides the data into a sufficiently large number of subsets, draws parameters in parallel on the subsets using a "powered" likelihood, and produces Monte Carlo draws of the parameter by combining parameter draws obtained from each subset. These combined parameter draws play the role of draws from the original sampling algorithm. Our main contributions are two-fold. First, we demonstrate through diverse simulated and real data analyses that our distributed algorithm is comparable to the current state-of-the-art D&C algorithm in terms of statistical accuracy and computational efficiency. Second, providing theoretical support for our empirical observations, we identify regularity assumptions under which the proposed algorithm leads to asymptotically optimal inference. We illustrate our methodology through normal linear and logistic regressions, where parts of our D&C algorithm are analytically tractable. Comment: 24 pages, 3 tables.
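
    A hedged sketch of the subset step for logistic regression follows: the subset log-likelihood is multiplied by the number of subsets k (the "powered" likelihood), so each subset posterior has the scale of the full-data posterior before combination. The random-walk Metropolis sampler and all tuning values are illustrative stand-ins, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)

def log_post_powered(beta, X, y, k, prior_sd=10.0):
    """Logistic-regression log posterior with the subset log-likelihood
    multiplied by k, the number of subsets ("powered" likelihood)."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    return k * loglik + logprior

def metropolis(X, y, k, n_iter=4000, step=0.05):
    beta = np.zeros(X.shape[1])
    lp = log_post_powered(beta, X, y, k)
    draws = np.empty((n_iter, beta.size))
    for t in range(n_iter):
        prop = beta + step * rng.normal(size=beta.size)
        lp_prop = log_post_powered(prop, X, y, k)
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        draws[t] = beta
    return draws

# One subset's chain; the full method runs this on all k subsets in
# parallel and combines the draws (e.g., by averaging order statistics).
X = rng.normal(size=(1000, 2))
p = 1.0 / (1.0 + np.exp(-(X @ np.array([1.0, -0.5]))))
y = (rng.uniform(size=1000) < p).astype(float)
print(metropolis(X, y, k=10)[2000:].mean(axis=0))
```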

    Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter

    We provide theoretical complexity analysis for new algorithms to compute the optimal transport (OT) distance between two discrete probability distributions, and demonstrate their favorable practical performance over state-of-the-art primal-dual algorithms and their capability in solving other large-scale problems, such as the Wasserstein barycenter problem for multiple probability distributions. First, we introduce the "accelerated primal-dual randomized coordinate descent" (APDRCD) algorithm for computing the OT distance. We provide its complexity upper bound $\widetilde{O}(n^{5/2}/\varepsilon)$, where $n$ stands for the number of atoms of these probability measures and $\varepsilon > 0$ is the desired accuracy. This complexity bound matches the best known complexities of primal-dual algorithms for the OT problem, including the adaptive primal-dual accelerated gradient descent (APDAGD) and the adaptive primal-dual accelerated mirror descent (APDAMD) algorithms. Then, we demonstrate the better performance of the APDRCD algorithm over the APDAGD and APDAMD algorithms through extensive experimental studies, and further improve its practical performance by proposing a greedy version of it, which we refer to as "accelerated primal-dual greedy coordinate descent" (APDGCD). Finally, we generalize the APDRCD and APDGCD algorithms to distributed algorithms for computing the Wasserstein barycenter for multiple probability distributions. Comment: 18 pages, 35 figures.
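
    The APDRCD and APDGCD algorithms are not reproduced here. As a point of reference for the problem they solve, below is the standard Sinkhorn iteration for the same entropically regularized OT objective, a simpler baseline that the paper's accelerated methods compete against; the grid, costs, and regularization strength are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iter=500):
    """a, b: histograms summing to 1; C: cost matrix. Returns the
    regularized transport plan via alternating scaling updates."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy example: transport between two 1-D histograms on a common grid.
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print((P * C).sum())  # estimate of the regularized OT cost
```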

    Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric

    Given a family of probability measures in P(X), the space of probability measures on a Hilbert space X, our goal in this paper is to highlight one or more curves in P(X) that efficiently summarize that family. We propose to study this problem under the optimal transport (Wasserstein) geometry, using curves that are restricted to be geodesic segments under that metric. We show that concepts that play a key role in Euclidean PCA, such as data centering or orthogonality of principal directions, find a natural equivalent in the optimal transport geometry, using Wasserstein means and differential geometry. The implementation of these ideas is, however, computationally challenging. To achieve scalable algorithms that can handle thousands of measures, we propose to use a relaxed definition for geodesics and regularized optimal transport distances. The interest of our approach is demonstrated on images seen either as shapes or color histograms. Comment: 9 pages, 8 figures. To appear in Advances in Neural Information Processing Systems (NIPS) 2015.
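
    A hedged one-dimensional sketch of the underlying idea: on the real line, the quantile map embeds the W2 geometry isometrically into L2, so ordinary PCA on quantile functions acts as an unconstrained proxy for geodesic PCA. The paper works on general spaces with geodesic constraints and regularized transport, none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(6)

# Family of measures: normals with varying means, stored as quantile
# functions evaluated on a common grid of quantile levels.
taus = np.linspace(0.01, 0.99, 99)
quantile_fns = np.array([
    np.quantile(rng.normal(mu, 1.0, 2000), taus)
    for mu in np.linspace(-2.0, 2.0, 40)
])

# Center at the Wasserstein mean (pointwise average of quantile functions),
# then compute principal directions via SVD of the centered matrix.
mean_q = quantile_fns.mean(axis=0)
U, S, Vt = np.linalg.svd(quantile_fns - mean_q, full_matrices=False)
print(S[:3])  # the first component should dominate (pure location shifts)

# A principal "geodesic" sketch: perturb the mean along the top direction.
# This stays a valid quantile function only while it is non-decreasing,
# which is the constraint the full method must enforce.
geodesic_point = mean_q + 1.0 * Vt[0]
```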

    An asymptotic analysis of distributed nonparametric methods

    We investigate and compare the fundamental performance of several distributed learning methods that have been proposed recently. We do this in the context of a distributed version of the classical signal-in-Gaussian-white-noise model, which serves as a benchmark model for studying performance in this setting. The results show how the design and tuning of a distributed method can have a great impact on convergence rates and the validity of uncertainty quantification. Moreover, we highlight the difficulty of designing nonparametric distributed procedures that automatically adapt to smoothness. Comment: 29 pages, 4 figures.
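
    To make the benchmark concrete, here is a sketch of a distributed signal-in-Gaussian-white-noise setup: each of m machines observes the signal coefficients with its own noise, and a naive distributed method averages local estimates after truncating at a frequency cutoff. The cutoff is exactly the kind of tuning choice whose effect on rates the paper analyzes; all values here are illustrative, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(7)

n, m = 10_000, 20                 # total sample size and number of machines
i = np.arange(1, 501)
theta = i ** -1.5                 # a smooth truth (Sobolev-type decay)
sigma = np.sqrt(m / n)            # per-machine noise level per coefficient

local = theta + sigma * rng.normal(size=(m, len(i)))  # machine observations
N = 100                           # frequency cutoff (the tuning choice)
est = local.mean(axis=0)          # average the local coefficient estimates
est[N:] = 0.0                     # truncate high frequencies
print(np.sum((est - theta) ** 2))  # squared-error loss of the estimate
```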

    Continuous Regularized Wasserstein Barycenters

    Wasserstein barycenters provide a geometrically meaningful way to aggregate probability distributions, built on the theory of optimal transport. They are difficult to compute in practice, however, leading previous work to restrict their supports to finite sets of points. Leveraging a new dual formulation for the regularized Wasserstein barycenter problem, we introduce a stochastic algorithm that constructs a continuous approximation of the barycenter. We establish strong duality and use the corresponding primal-dual relationship to parametrize the barycenter implicitly using the dual potentials of regularized transport problems. The resulting problem can be solved with stochastic gradient descent, which yields an efficient online algorithm to approximate the barycenter of continuous distributions given sample access. We demonstrate the effectiveness of our approach and compare against previous work on synthetic examples and real-world applications.
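
    The paper's dual-potential algorithm is not reproduced here. As a much simpler stand-in with the same "sample access only" interface, the sketch below tracks a grid of quantiles for each input distribution by Robbins-Monro stochastic approximation and averages them, which recovers the (unregularized) W2 barycenter for measures on the real line; distributions, grid, and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

# Sample access to two continuous distributions; their W2 barycenter on
# the real line is N(2, 1.5) (quantile functions average pointwise).
samplers = [lambda: rng.normal(0.0, 1.0), lambda: rng.normal(4.0, 2.0)]
taus = np.linspace(0.05, 0.95, 19)
q = np.zeros((len(samplers), len(taus)))  # running quantile estimates

for t in range(1, 50_001):
    lr = 1.0 / t ** 0.75                  # Robbins-Monro step size
    for j, sample in enumerate(samplers):
        x = sample()
        q[j] += lr * (taus - (x <= q[j]))  # stochastic quantile update

barycenter_quantiles = q.mean(axis=0)      # 1-D W2 barycenter
print(barycenter_quantiles[[0, 9, 18]])    # ~ quantiles of N(2, 1.5)
```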