142 research outputs found

    Scalable Bayes via Barycenter in Wasserstein Space

    Divide-and-conquer based methods for Bayesian inference provide a general approach for tractable posterior inference when the sample size is large. These methods divide the data into smaller subsets, sample from the posterior distribution of parameters in parallel on all the subsets, and combine posterior samples from all the subsets to approximate the full data posterior distribution. The smaller size of any subset compared to the full data implies that posterior sampling on any subset is computationally more efficient than sampling from the true posterior distribution. Since the combination step takes negligible time relative to sampling, posterior computations can be scaled to massive data by dividing the full data into a sufficiently large number of data subsets. One such approach relies on the geometry of posterior distributions estimated across different subsets and combines them through their barycenter in a Wasserstein space of probability measures. We provide theoretical guarantees on the accuracy of this approximation that are valid in many applications. We show that the geometric method approximates the full data posterior distribution better than its competitors across diverse simulations and reproduces known results when applied to a movie ratings database. Comment: 43 pages, 7 figures, and 11 tables. The updated revision will appear in JMLR.
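
    The combination step has a convenient closed form for a scalar parameter: the W2 barycenter of equal-size empirical measures on the real line is obtained by averaging order statistics. Below is a minimal sketch of the divide-sample-combine recipe under that simplification. The subset sampler is a conjugate normal-mean stub, with the subset likelihood raised to the number of subsets so each subset posterior has roughly the full-data spread (a standard device in this literature); all names and settings are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def subset_posterior_draws(y, k, n_draws=2000, prior_var=100.0, noise_var=1.0):
    """Conjugate draws from a normal-mean subset posterior whose
    likelihood is raised to the power k (the number of subsets)."""
    n = len(y)
    post_var = 1.0 / (1.0 / prior_var + k * n / noise_var)
    post_mean = post_var * k * y.sum() / noise_var
    return rng.normal(post_mean, np.sqrt(post_var), size=n_draws)

# Simulate "massive" data and split it into k subsets.
theta_true = 1.5
y = rng.normal(theta_true, 1.0, size=100_000)
k = 20
subsets = np.array_split(y, k)

# Subset sampling step (run in parallel in practice); sort draws once.
draws = [np.sort(subset_posterior_draws(s, k)) for s in subsets]

# Combination step: averaging order statistics across subsets gives the
# W2 barycenter of the subset posteriors for a scalar parameter.
barycenter = np.mean(draws, axis=0)
print(barycenter.mean(), barycenter.std())
```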

    Simple, Scalable and Accurate Posterior Interval Estimation

    There is a lack of simple and scalable algorithms for uncertainty quantification. Bayesian methods quantify uncertainty through posterior and predictive distributions, but it is difficult to rapidly estimate summaries of these distributions, such as quantiles and intervals. Variational Bayes approximations are widely used, but may badly underestimate posterior covariance. Typically, the focus of Bayesian inference is on point and interval estimates for one-dimensional functionals of interest. In small-scale problems, Markov chain Monte Carlo algorithms remain the gold standard, but such algorithms face major problems in scaling up to big data. Various modifications have been proposed based on parallelization and approximations based on subsamples, but such approaches are either highly complex or lack theoretical support and/or good performance outside of narrow settings. We propose a very simple and general posterior interval estimation algorithm, which is based on running Markov chain Monte Carlo in parallel for subsets of the data and averaging quantiles estimated from each subset. We provide strong theoretical guarantees and illustrate performance in several applications. Comment: 50 pages, 6 figures, and 11 tables.
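
    The core of the quantile-averaging algorithm fits in a few lines; a hedged sketch follows, with the parallel subset MCMC stubbed out by synthetic chains (the paper also adjusts each subset likelihood before sampling, which is omitted here).

```python
import numpy as np

def combined_interval(subset_draws, lower=0.025, upper=0.975):
    """Average subset-level quantile estimates into one posterior interval.
    subset_draws: list of 1-D arrays of MCMC draws, one per data subset."""
    lo = np.mean([np.quantile(d, lower) for d in subset_draws])
    hi = np.mean([np.quantile(d, upper) for d in subset_draws])
    return lo, hi

# Usage with stubbed subset chains (real ones would come from parallel MCMC).
rng = np.random.default_rng(1)
chains = [rng.normal(0.0, 1.0, 5000) for _ in range(10)]
print(combined_interval(chains))
```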

    A Divide-and-Conquer Bayesian Approach to Large-Scale Kriging

    We propose a three-step divide-and-conquer strategy within the Bayesian paradigm that delivers massive scalability for any spatial process model. We partition the data into a large number of subsets, apply a readily available Bayesian spatial process model on every subset, in parallel, and optimally combine the posterior distributions estimated across all the subsets into a pseudo-posterior distribution that conditions on the entire data. The combined pseudo-posterior distribution replaces the full data posterior distribution for predicting the responses at arbitrary locations and for inference on the model parameters and spatial surface. Based on distributed Bayesian inference, our approach is called "Distributed Kriging" (DISK) and offers significant advantages in massive data applications where the full data are stored across multiple machines. We show theoretically that the Bayes $L_2$-risk of the DISK posterior distribution achieves the near-optimal convergence rate in estimating the true spatial surface with various types of covariance functions, and provide upper bounds for the number of subsets as a function of the full sample size. The model-free feature of DISK is demonstrated by scaling posterior computations in spatial process models with a stationary full-rank and a nonstationary low-rank Gaussian process (GP) prior. A variety of simulations and a geostatistical analysis of the Pacific Ocean sea surface temperature data validate our theoretical results. Comment: 29 pages, including 4 figures and 5 tables.
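
    A skeleton of the three-step pipeline is sketched below under strong simplifications: the subset-level spatial model is a stub (a real run would fit a GP regression by MCMC), the data are one-dimensional, and the combination is done location-by-location by averaging sorted predictive draws, the pointwise W2 barycenter. Everything here is illustrative scaffolding, not the DISK implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_subset_model(x, y, x_test, n_draws=500):
    """Stub for the subset-level Bayesian spatial model: interpolates and
    adds noise just to produce predictive draws of the right shape."""
    order = np.argsort(x)
    mean = np.interp(x_test, x[order], y[order])
    return mean + rng.normal(0.0, 0.1, size=(n_draws, len(x_test)))

# Step 1: partition n observations into k subsets.
n, k = 5000, 10
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
idx = np.array_split(rng.permutation(n), k)
x_test = np.linspace(0.0, 1.0, 50)

# Step 2: fit the model on every subset (in parallel in practice).
draws = [np.sort(fit_subset_model(x[i], y[i], x_test), axis=0) for i in idx]

# Step 3: combine subset predictive draws location-by-location by
# averaging sorted draws, a pointwise W2 barycenter pseudo-posterior.
disk_draws = np.mean(draws, axis=0)
print(disk_draws.mean(axis=0)[:5])
```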

    Scalable Bayes under Informative Sampling

    The United States Bureau of Labor Statistics collects data using survey instruments under informative sampling designs that assign probabilities of inclusion to be correlated with the response. The bureau extensively uses Bayesian hierarchical models and posterior sampling to impute missing items in respondent-level data and to infer population parameters. Posterior sampling for survey data collected under informative designs is computationally expensive and does not support the production schedules of the bureau. Motivated by this problem, we propose a new method to scale Bayesian computations in informative sampling designs. Our method divides the data into smaller subsets, performs posterior sampling in parallel for every subset, and combines the collection of posterior samples from all the subsets through their mean in the Wasserstein space of order 2. Theoretically, we construct conditions on a class of sampling designs where posterior consistency of the proposed method is achieved. Empirically, we demonstrate that our method is competitive with traditional methods while being significantly faster in many simulations and in the Current Employment Statistics survey conducted by the bureau. Comment: 34 pages, 6 figures, 2 tables.
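
    The subset-level step under informative sampling is commonly handled with a survey pseudo-likelihood, weighting each unit's log-likelihood by its normalized inverse inclusion probability. The toy sketch below shows that device inside a bare-bones Metropolis sampler for a normal mean; the design, model, and tuning are all illustrative, and a full run would split the sample into subsets and combine their draws via the W2 mean as above.

```python
import numpy as np

rng = np.random.default_rng(3)

def weighted_log_post(theta, y, w, k):
    """Pseudo-posterior: unit i's normal log-likelihood is weighted by w_i
    (normalized inverse inclusion probability); k powers the likelihood
    when the data have been split into k subsets."""
    loglik = -0.5 * np.sum(w * (y - theta) ** 2)
    logprior = -0.5 * theta ** 2 / 100.0
    return k * loglik + logprior

def metropolis(y, w, k, n_iter=5000, step=0.05):
    theta = y.mean()
    lp = weighted_log_post(theta, y, w, k)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()
        lp_prop = weighted_log_post(prop, y, w, k)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[t] = theta
    return draws

# Toy informative design: inclusion probability increases with the response.
y_pop = rng.normal(1.0, 1.0, 2000)
pi = np.clip(0.1 + 0.2 * (y_pop - y_pop.min()), 0.05, 1.0)
sampled = rng.uniform(size=y_pop.size) < pi
w = 1.0 / pi[sampled]
w /= w.mean()
print(metropolis(y_pop[sampled], w, k=1)[1000:].mean())
```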

    Multilevel Clustering via Wasserstein Means

    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large, hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with Wasserstein distance metrics. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. Consistency properties are established for the estimates of both local and global clusters. Finally, experimental results with both synthetic and real data are presented to demonstrate the flexibility and scalability of the proposed approach. Comment: Proceedings of ICML, 2017.
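
    The "Wasserstein means" idea is easiest to see in one dimension, where the W2 distance between equal-size empirical measures is the mean squared difference of order statistics and the cluster centroid (W2 barycenter) is the average of sorted samples. The sketch below runs k-means over distributions at a single level; the paper's multilevel formulation nests this across groups and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

def w2_sq(a, b):
    """Squared W2 between equal-size, already-sorted 1-D samples."""
    return np.mean((a - b) ** 2)

def wasserstein_kmeans(samples, k, n_iter=20):
    data = [np.sort(s) for s in samples]      # work with order statistics
    centers = [data[i] for i in rng.choice(len(data), k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        labels = np.array([np.argmin([w2_sq(d, c) for c in centers])
                           for d in data])
        for j in range(k):
            members = [d for d, l in zip(data, labels) if l == j]
            if members:                       # keep old center if empty
                centers[j] = np.mean(members, axis=0)  # 1-D W2 barycenter
    return labels, centers

# Two latent cluster shapes, 30 empirical distributions of 200 points each.
groups = [rng.normal(0.0, 1.0, 200) for _ in range(15)] + \
         [rng.normal(3.0, 0.5, 200) for _ in range(15)]
labels, _ = wasserstein_kmeans(groups, k=2)
print(labels)
```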

    An Algorithm for Distributed Bayesian Inference in Generalized Linear Models

    Monte Carlo algorithms, such as Markov chain Monte Carlo (MCMC) and Hamiltonian Monte Carlo (HMC), are routinely used for Bayesian inference in generalized linear models; however, these algorithms are prohibitively slow in massive data settings because they require multiple passes through the full data in every iteration. Addressing this problem, we develop a scalable extension of these algorithms using the divide-and-conquer (D&C) technique that divides the data into a sufficiently large number of subsets, draws parameters in parallel on the subsets using a "powered" likelihood, and produces Monte Carlo draws of the parameter by combining parameter draws obtained from each subset. These combined parameter draws play the role of draws from the original sampling algorithm. Our main contributions are two-fold. First, we demonstrate through diverse simulated and real data analyses that our distributed algorithm is comparable to the current state-of-the-art D&C algorithm in terms of statistical accuracy and computational efficiency. Second, providing theoretical support for our empirical observations, we identify regularity assumptions under which the proposed algorithm leads to asymptotically optimal inference. We illustrate our methodology through normal linear and logistic regressions, where parts of our D&C algorithm are analytically tractable. Comment: 24 pages, 3 tables.
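
    A hedged sketch of the subset step for logistic regression follows: the subset log-likelihood is multiplied by the number of subsets k (the "powered" likelihood), so each subset posterior has the scale of the full-data posterior before combination. The random-walk Metropolis sampler and all tuning values are illustrative stand-ins, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)

def log_post_powered(beta, X, y, k, prior_sd=10.0):
    """Logistic-regression log posterior with the subset log-likelihood
    multiplied by k, the number of subsets ("powered" likelihood)."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    return k * loglik + logprior

def metropolis(X, y, k, n_iter=4000, step=0.05):
    beta = np.zeros(X.shape[1])
    lp = log_post_powered(beta, X, y, k)
    draws = np.empty((n_iter, beta.size))
    for t in range(n_iter):
        prop = beta + step * rng.normal(size=beta.size)
        lp_prop = log_post_powered(prop, X, y, k)
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        draws[t] = beta
    return draws

# One subset's chain; the full method runs this on all k subsets in
# parallel and combines the draws (e.g., by averaging order statistics).
X = rng.normal(size=(1000, 2))
p = 1.0 / (1.0 + np.exp(-(X @ np.array([1.0, -0.5]))))
y = (rng.uniform(size=1000) < p).astype(float)
print(metropolis(X, y, k=10)[2000:].mean(axis=0))
```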

    Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter

    We provide theoretical complexity analysis for new algorithms to compute the optimal transport (OT) distance between two discrete probability distributions, and demonstrate their favorable practical performance over state-of-the-art primal-dual algorithms and their capability in solving other large-scale problems, such as the Wasserstein barycenter problem for multiple probability distributions. First, we introduce the "accelerated primal-dual randomized coordinate descent" (APDRCD) algorithm for computing the OT distance. We provide its complexity upper bound $\widetilde{O}(n^{5/2}/\varepsilon)$, where $n$ stands for the number of atoms of these probability measures and $\varepsilon > 0$ is the desired accuracy. This complexity bound matches the best known complexities of primal-dual algorithms for the OT problem, including the adaptive primal-dual accelerated gradient descent (APDAGD) and the adaptive primal-dual accelerated mirror descent (APDAMD) algorithms. Then, we demonstrate the better performance of the APDRCD algorithm over the APDAGD and APDAMD algorithms through extensive experimental studies, and further improve its practical performance by proposing a greedy version of it, which we refer to as "accelerated primal-dual greedy coordinate descent" (APDGCD). Finally, we generalize the APDRCD and APDGCD algorithms to distributed algorithms for computing the Wasserstein barycenter for multiple probability distributions. Comment: 18 pages, 35 figures.
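
    The APDRCD and APDGCD algorithms are not reproduced here. As a point of reference for the problem they solve, below is the standard Sinkhorn iteration for the same entropically regularized OT objective, a simpler baseline that the paper's accelerated methods compete against; the grid, costs, and regularization strength are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iter=500):
    """a, b: histograms summing to 1; C: cost matrix. Returns the
    regularized transport plan via alternating scaling updates."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy example: transport between two 1-D histograms on a common grid.
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print((P * C).sum())  # estimate of the regularized OT cost
```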

    Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric

    Given a family of probability measures in P(X), the space of probability measures on a Hilbert space X, our goal in this paper is to highlight one or more curves in P(X) that efficiently summarize that family. We propose to study this problem under the optimal transport (Wasserstein) geometry, using curves that are restricted to be geodesic segments under that metric. We show that concepts that play a key role in Euclidean PCA, such as data centering or orthogonality of principal directions, find a natural equivalent in the optimal transport geometry, using Wasserstein means and differential geometry. The implementation of these ideas is, however, computationally challenging. To achieve scalable algorithms that can handle thousands of measures, we propose to use a relaxed definition for geodesics and regularized optimal transport distances. The interest of our approach is demonstrated on images seen either as shapes or color histograms. Comment: 9 pages, 8 figures. To appear in Advances in Neural Information Processing Systems (NIPS) 2015.
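
    A hedged one-dimensional sketch of the underlying idea: on the real line, the quantile map embeds the W2 geometry isometrically into L2, so ordinary PCA on quantile functions acts as an unconstrained proxy for geodesic PCA. The paper works on general spaces with geodesic constraints and regularized transport, none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(6)

# Family of measures: normals with varying means, stored as quantile
# functions evaluated on a common grid of quantile levels.
taus = np.linspace(0.01, 0.99, 99)
quantile_fns = np.array([
    np.quantile(rng.normal(mu, 1.0, 2000), taus)
    for mu in np.linspace(-2.0, 2.0, 40)
])

# Center at the Wasserstein mean (pointwise average of quantile functions),
# then compute principal directions via SVD of the centered matrix.
mean_q = quantile_fns.mean(axis=0)
U, S, Vt = np.linalg.svd(quantile_fns - mean_q, full_matrices=False)
print(S[:3])  # the first component should dominate (pure location shifts)

# A principal "geodesic" sketch: perturb the mean along the top direction.
# This stays a valid quantile function only while it is non-decreasing,
# which is the constraint the full method must enforce.
geodesic_point = mean_q + 1.0 * Vt[0]
```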

    An asymptotic analysis of distributed nonparametric methods

    We investigate and compare the fundamental performance of several distributed learning methods that have been proposed recently. We do this in the context of a distributed version of the classical signal-in-Gaussian-white-noise model, which serves as a benchmark model for studying performance in this setting. The results show how the design and tuning of a distributed method can have a great impact on convergence rates and the validity of uncertainty quantification. Moreover, we highlight the difficulty of designing nonparametric distributed procedures that automatically adapt to smoothness. Comment: 29 pages, 4 figures.
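
    To make the benchmark concrete, here is a sketch of a distributed signal-in-Gaussian-white-noise setup: each of m machines observes the signal coefficients with its own noise, and a naive distributed method averages local estimates after truncating at a frequency cutoff. The cutoff is exactly the kind of tuning choice whose effect on rates the paper analyzes; all values here are illustrative, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(7)

n, m = 10_000, 20                 # total sample size and number of machines
i = np.arange(1, 501)
theta = i ** -1.5                 # a smooth truth (Sobolev-type decay)
sigma = np.sqrt(m / n)            # per-machine noise level per coefficient

local = theta + sigma * rng.normal(size=(m, len(i)))  # machine observations
N = 100                           # frequency cutoff (the tuning choice)
est = local.mean(axis=0)          # average the local coefficient estimates
est[N:] = 0.0                     # truncate high frequencies
print(np.sum((est - theta) ** 2))  # squared-error loss of the estimate
```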

    Continuous Regularized Wasserstein Barycenters

    Wasserstein barycenters provide a geometrically meaningful way to aggregate probability distributions, built on the theory of optimal transport. They are difficult to compute in practice, however, leading previous work to restrict their supports to finite sets of points. Leveraging a new dual formulation for the regularized Wasserstein barycenter problem, we introduce a stochastic algorithm that constructs a continuous approximation of the barycenter. We establish strong duality and use the corresponding primal-dual relationship to parametrize the barycenter implicitly using the dual potentials of regularized transport problems. The resulting problem can be solved with stochastic gradient descent, which yields an efficient online algorithm to approximate the barycenter of continuous distributions given sample access. We demonstrate the effectiveness of our approach and compare against previous work on synthetic examples and real-world applications.
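
    The paper's dual-potential algorithm is not reproduced here. As a much simpler stand-in with the same "sample access only" interface, the sketch below tracks a grid of quantiles for each input distribution by Robbins-Monro stochastic approximation and averages them, which recovers the (unregularized) W2 barycenter for measures on the real line; distributions, grid, and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

# Sample access to two continuous distributions; their W2 barycenter on
# the real line is N(2, 1.5) (quantile functions average pointwise).
samplers = [lambda: rng.normal(0.0, 1.0), lambda: rng.normal(4.0, 2.0)]
taus = np.linspace(0.05, 0.95, 19)
q = np.zeros((len(samplers), len(taus)))  # running quantile estimates

for t in range(1, 50_001):
    lr = 1.0 / t ** 0.75                  # Robbins-Monro step size
    for j, sample in enumerate(samplers):
        x = sample()
        q[j] += lr * (taus - (x <= q[j]))  # stochastic quantile update

barycenter_quantiles = q.mean(axis=0)      # 1-D W2 barycenter
print(barycenter_quantiles[[0, 9, 18]])    # ~ quantiles of N(2, 1.5)
```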