363 research outputs found

    Quantitative contraction rates for Markov chains on general state spaces

    We investigate the problem of quantifying contraction coefficients of Markov transition kernels in Kantorovich ($L^1$ Wasserstein) distances. For diffusion processes, relatively precise quantitative bounds on contraction rates have recently been derived by combining appropriate couplings with carefully designed Kantorovich distances. In this paper, we partially carry over this approach from diffusions to Markov chains. We derive quantitative lower bounds on contraction rates for Markov chains on general state spaces that are powerful if the dynamics is dominated by small local moves. For Markov chains on $\mathbb{R}^d$ with isotropic transition kernels, the general bounds can be used efficiently together with a coupling that combines maximal and reflection coupling. The results are applied to Euler discretizations of stochastic differential equations with non-globally contractive drifts, and to the Metropolis adjusted Langevin algorithm for sampling from a class of probability measures on high dimensional state spaces that are not globally log-concave. Comment: 39 pages
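As a rough illustration of what "a coupling that combines maximal and reflection coupling" can look like, the sketch below implements one standard construction from the coupling literature for two isotropic Gaussian transition steps, N(x, sigma^2 I) and N(y, sigma^2 I). It is not taken from the paper and may differ from the coupling analysed there; the function name `reflection_maximal_coupling` and its parameters are hypothetical.

```python
import numpy as np

def reflection_maximal_coupling(x, y, sigma, rng):
    """Draw (X', Y') with X' ~ N(x, sigma^2 I) and Y' ~ N(y, sigma^2 I).

    The two draws coincide with the largest possible probability; when
    they do not coincide, the noise of Y' is the reflection of the noise
    of X' across the hyperplane orthogonal to x - y.
    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (x - y) / sigma                      # standardized displacement
    xi = rng.standard_normal(x.shape)        # noise driving the first chain
    x_new = x + sigma * xi
    norm_z = np.linalg.norm(z)
    if norm_z == 0.0:
        # Chains already agree: keep them together.
        return x_new, x_new.copy(), True
    # log of phi(xi + z) / phi(xi), with phi the standard normal density
    log_ratio = -0.5 * np.sum((xi + z) ** 2) + 0.5 * np.sum(xi ** 2)
    if np.log(rng.uniform()) <= log_ratio:
        # Maximal-coupling branch: both chains move to the same point.
        return x_new, x_new.copy(), True
    # Reflection branch: mirror the noise along the direction e = z / |z|.
    e = z / norm_z
    eta = xi - 2.0 * np.dot(e, xi) * e
    return x_new, y + sigma * eta, False
```

Under this construction the two draws meet with probability equal to one minus the total variation distance between the two Gaussians, which for |x - y| = sigma is roughly 0.62.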

    Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach

    Bayesian inference typically requires the computation of an approximation to the posterior distribution. An important requirement for an approximate Bayesian inference algorithm is to output high-accuracy posterior mean and uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain Monte Carlo, remain the gold standard for approximate Bayesian inference because they have a robust finite-sample theory and reliable convergence diagnostics. However, alternative methods, which are more scalable or apply to problems where Markov Chain Monte Carlo cannot be used, lack the same finite-data approximation theory and tools for evaluating their accuracy. In this work, we develop a flexible new approach to bounding the error of mean and uncertainty estimates of scalable inference algorithms. Our strategy is to control the estimation errors in terms of Wasserstein distance, then bound the Wasserstein distance via a generalized notion of Fisher distance. Unlike computing the Wasserstein distance, which requires access to the normalized posterior distribution, the Fisher distance is tractable to compute because it requires access only to the gradient of the log posterior density. We demonstrate the usefulness of our Fisher distance approach by deriving bounds on the Wasserstein error of the Laplace approximation and Hilbert coresets. We anticipate that our approach will be applicable to many other approximate inference methods such as the integrated Laplace approximation, variational inference, and approximate Bayesian computation. Comment: 22 pages, 2 figures
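To make the point about gradients concrete, here is a minimal sketch of a Monte Carlo estimate of the classical Fisher distance, the root mean squared difference between two score functions. It assumes the expectation is taken over samples from some reference distribution (here, the approximation), and it is the textbook quantity rather than the generalized version the abstract refers to. The name `fisher_distance` and its arguments are hypothetical.

```python
import numpy as np

def fisher_distance(grad_log_p, grad_log_q, samples):
    """Estimate sqrt(E || grad log p(X) - grad log q(X) ||^2).

    grad_log_p, grad_log_q: callables mapping an (n, d) array of points
        to an (n, d) array of log-density gradients (scores). Normalizing
        constants never appear, since they vanish under differentiation.
    samples: (n, d) array of draws from the reference distribution
        (an assumption of this sketch).
    """
    diff = grad_log_p(samples) - grad_log_q(samples)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

For example, if the target is N(m, I) and the approximation is N(0, I), the scores are -(x - m) and -x, their difference is the constant vector m, and the estimate equals the Euclidean norm of m regardless of the samples used.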

    MEXIT: Maximal un-coupling times for stochastic processes

    Classical coupling constructions arrange for copies of the same Markov process started at two different initial states to become equal as soon as possible. In this paper, we consider an alternative coupling framework in which one seeks to arrange for two different Markov (or other stochastic) processes to remain equal for as long as possible, when started in the same state. We refer to this "un-coupling" or "maximal agreement" construction as MEXIT, standing for "maximal exit". After highlighting the importance of un-coupling arguments in a few key statistical and probabilistic settings, we develop an explicit MEXIT construction for stochastic processes in discrete time with countable state-space. This construction is generalized to random processes on general state-space running in continuous time, and then exemplified by discussion of MEXIT for Brownian motions with two different constant drifts. Comment: 28 pages

    PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference

    Generalized linear models (GLMs) -- such as logistic regression, Poisson regression, and robust regression -- provide interpretable models for diverse data types. Probabilistic approaches, particularly Bayesian ones, allow coherent estimates of uncertainty, incorporation of prior information, and sharing of power across experiments via hierarchical models. In practice, however, the approximate Bayesian methods necessary for inference have either failed to scale to large data sets or failed to provide theoretical guarantees on the quality of inference. We propose a new approach based on constructing polynomial approximate sufficient statistics for GLMs (PASS-GLM). We demonstrate that our method admits a simple algorithm as well as trivial streaming and distributed extensions that do not compound error across computations. We provide theoretical guarantees on the quality of point (MAP) estimates, the approximate posterior, and posterior mean and uncertainty estimates. We validate our approach empirically in the case of logistic regression using a quadratic approximation and show competitive performance with stochastic gradient descent, MCMC, and the Laplace approximation in terms of speed and multiple measures of accuracy -- including on an advertising data set with 40 million data points and 20,000 covariates. Comment: In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017). v3: corrected typos in Appendix
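The idea of polynomial approximate sufficient statistics can be sketched for logistic regression with labels in {-1, +1}. If the per-observation log-likelihood phi(s) = -log(1 + exp(-s)) is replaced by a quadratic b0 + b1 s + b2 s^2, the full log-likelihood depends on the data only through the sums of y_n x_n and x_n x_n^T, which can be accumulated in a stream or merged across workers. The sketch below assumes a plain least-squares quadratic fit on an interval [-R, R], which is a stand-in and may differ from the approximation used in the paper; all names (`fit_quadratic`, `QuadraticSuffStats`, `exact_loglik`) are hypothetical.

```python
import numpy as np

def fit_quadratic(radius=4.0, num_points=401):
    """Least-squares quadratic fit to phi(s) = -log(1 + exp(-s)) on
    [-radius, radius]. Assumption of this sketch, not the paper's method."""
    s = np.linspace(-radius, radius, num_points)
    phi = -np.logaddexp(0.0, -s)
    b2, b1, b0 = np.polyfit(s, phi, 2)   # highest-degree coefficient first
    return b0, b1, b2

class QuadraticSuffStats:
    """Additive summaries: n, sum of y_n x_n, and sum of x_n x_n^T."""

    def __init__(self, dim):
        self.n = 0
        self.t1 = np.zeros(dim)
        self.t2 = np.zeros((dim, dim))

    def update(self, X, y):
        """Streaming step: fold in a batch X of shape (m, dim), y in {-1, +1}."""
        self.n += X.shape[0]
        self.t1 += X.T @ y
        self.t2 += X.T @ X               # y_n^2 = 1, so labels drop out

    def merge(self, other):
        """Distributed step: combine summaries computed on another shard."""
        self.n += other.n
        self.t1 += other.t1
        self.t2 += other.t2

    def approx_loglik(self, theta, coeffs):
        """Quadratic surrogate of the log-likelihood at theta."""
        b0, b1, b2 = coeffs
        return self.n * b0 + b1 * (self.t1 @ theta) + b2 * (theta @ self.t2 @ theta)

def exact_loglik(theta, X, y):
    """Exact logistic log-likelihood, for comparison on small data."""
    return -np.logaddexp(0.0, -y * (X @ theta)).sum()
```

Because the summaries are plain sums, processing the data in batches or on separate machines yields the same statistics as a single pass, so batching adds no error beyond that of the polynomial fit itself.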

    Tensor approximation of generalized correlated diffusions and applications

    This thesis documents my research activity conducted over the past three years at the Department of Statistical Science at University College London. My investigation is focused on functional-analytic methods applied to the characterization of generalized correlated Markov processes. The main objective of the research is to formalize the properties of such a class of stochastic processes when approximated in a tensor space. This led to the development of a new interpretation of the correlation among processes that is exploited for the analysis of copula functions and their statistical properties.