
    Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach

    Bayesian inference typically requires the computation of an approximation to the posterior distribution. An important requirement for an approximate Bayesian inference algorithm is to output high-accuracy posterior mean and uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain Monte Carlo, remain the gold standard for approximate Bayesian inference because they have a robust finite-sample theory and reliable convergence diagnostics. However, alternative methods, which are more scalable or apply to problems where Markov Chain Monte Carlo cannot be used, lack the same finite-data approximation theory and tools for evaluating their accuracy. In this work, we develop a flexible new approach to bounding the error of mean and uncertainty estimates of scalable inference algorithms. Our strategy is to control the estimation errors in terms of Wasserstein distance, then bound the Wasserstein distance via a generalized notion of Fisher distance. Unlike computing the Wasserstein distance, which requires access to the normalized posterior distribution, the Fisher distance is tractable to compute because it requires access only to the gradient of the log posterior density. We demonstrate the usefulness of our Fisher distance approach by deriving bounds on the Wasserstein error of the Laplace approximation and Hilbert coresets. We anticipate that our approach will be applicable to many other approximate inference methods such as the integrated Laplace approximation, variational inference, and approximate Bayesian computation. Comment: 22 pages, 2 figures
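
    As a concrete illustration of why only the gradient of the log posterior is needed, the sketch below estimates a plain Fisher divergence, E_q[ ||grad log q(x) - grad log p(x)||^2 ], between a Gaussian approximation q and a target p by Monte Carlo. This is a simplified stand-in for the paper's generalized Fisher distance, and grad_log_target below is a hypothetical example.

        # A minimal sketch, not the paper's exact construction: Monte Carlo estimate of
        # the Fisher divergence E_q[ ||grad log q(x) - grad log p(x)||^2 ] between a
        # Gaussian approximation q and a target p. Only the gradient of the (possibly
        # unnormalized) log posterior is needed, never its normalizing constant.
        import numpy as np

        def grad_log_target(x):
            # Hypothetical log-posterior gradient; replace with your own model's score.
            return -x**3 / (1.0 + x**2) - 0.1 * x

        def fisher_divergence_gaussian_vs_target(mu, sigma, n_samples=10_000, seed=0):
            rng = np.random.default_rng(seed)
            x = rng.normal(mu, sigma, size=n_samples)   # draws from the approximation q
            grad_log_q = -(x - mu) / sigma**2           # score of the Gaussian q
            diff = grad_log_q - grad_log_target(x)      # pointwise score mismatch
            return np.mean(diff**2)                     # Monte Carlo average

        print(fisher_divergence_gaussian_vs_target(mu=0.0, sigma=1.0))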

    The Brouwer Lecture 2005: Statistical estimation with model selection

    The purpose of this paper is to explain the interest and importance of (approximate) models and model selection in Statistics. Starting from the very elementary example of histograms, we present a general notion of finite-dimensional model for statistical estimation and we explain what type of risk bounds can be expected from the use of one such model. We then give the performance of suitable model selection procedures from a family of such models. We illustrate our point of view by two main examples: the choice of a partition for designing a histogram from an $n$-sample and the problem of variable selection in the context of Gaussian regression.
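
    To make the histogram example concrete, here is a minimal sketch (not the paper's calibrated procedure) that selects the number of bins of a regular histogram on [0, 1] by minimizing a penalized least-squares contrast; the penalty 2D/n is an illustrative placeholder, not a calibrated choice.

        # Minimal sketch of model selection for a regular histogram on [0, 1]:
        # crit(D) = -D * sum_j (N_j / n)^2 + pen(D), with the placeholder penalty
        # pen(D) = 2 * D / n. The constant 2 is illustrative, not a calibrated choice.
        import numpy as np

        def select_histogram_bins(x, d_max=50):
            n = len(x)
            best_d, best_crit = 1, np.inf
            for d in range(1, d_max + 1):
                counts, _ = np.histogram(x, bins=d, range=(0.0, 1.0))
                contrast = -d * np.sum((counts / n) ** 2)  # empirical L2 contrast
                crit = contrast + 2.0 * d / n              # penalized criterion
                if crit < best_crit:
                    best_d, best_crit = d, crit
            return best_d

        rng = np.random.default_rng(1)
        sample = rng.beta(2, 5, size=500)                  # data supported on [0, 1]
        print(select_histogram_bins(sample))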

    Asymptotic Accuracy of Bayesian Estimation for a Single Latent Variable

    In data science and machine learning, hierarchical parametric models, such as mixture models, are often used. They contain two kinds of variables: observable variables, which represent the parts of the data that can be directly measured, and latent variables, which represent the underlying processes that generate the data. Although there has been an increase in research on the estimation accuracy for observable variables, the theoretical analysis of estimating latent variables has not been thoroughly investigated. In a previous study, we determined the accuracy of a Bayes estimation for the joint probability of the latent variables in a dataset, and we proved that the Bayes method is asymptotically more accurate than the maximum-likelihood method. However, the accuracy of the Bayes estimation for a single latent variable remains unknown. In the present paper, we derive the asymptotic expansions of the error functions, which are defined by the Kullback-Leibler divergence, for two types of single-variable estimations when the statistical regularity is satisfied. Our results indicate that the accuracies of the Bayes and maximum-likelihood methods are asymptotically equivalent and clarify that the Bayes method is only advantageous for multivariable estimations. Comment: 28 pages, 3 figures
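
    For intuition about what estimating a single latent variable means, the toy sketch below takes one observation from a two-component Gaussian mixture with hypothetical fitted parameters and contrasts the soft (Bayes-style) posterior responsibility with the hard maximum-likelihood-style assignment; it illustrates the estimation target only, not the paper's asymptotic analysis.

        # Toy sketch: posterior responsibility of the latent label z for one observation
        # x under a two-component Gaussian mixture with hypothetical fitted parameters.
        import numpy as np
        from scipy.stats import norm

        def responsibility(x, w, mu, sd):
            # P(z = k | x, fitted parameters) for each component k
            joint = w * norm.pdf(x, loc=mu, scale=sd)
            return joint / joint.sum()

        w_hat = np.array([0.3, 0.7])     # hypothetical fitted mixture weights
        mu_hat = np.array([-1.0, 2.0])   # hypothetical fitted component means
        sd_hat = np.array([1.0, 1.5])    # hypothetical fitted component sds

        x_obs = 0.4
        post = responsibility(x_obs, w_hat, mu_hat, sd_hat)
        print("posterior over latent label:", post)   # soft, Bayes-style estimate
        print("hard ML-style assignment:", post.argmax())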

    Optimal cross-validation in density estimation with the $L^2$-loss

    We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is given to the so-called leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon $V$-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, the closed-form expressions also enable us to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo with $p=1$, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size $n$, optimality is achieved for $p$ large enough [with $p/n = o(1)$] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is established for Lpo as long as $p/n$ is suitably related to the rate of convergence of the best estimator in the collection: (i) $p/n \to 1$ as $n \to +\infty$ with a parametric rate, and (ii) $p/n = o(1)$ with some nonparametric estimators. These theoretical results are validated by simulation experiments. Comment: Published at http://dx.doi.org/10.1214/14-AOS1240 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
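
    The quantity being estimated can be written down by brute force: average, over every test set E of size p, the L2 risk estimate of a histogram fitted on the remaining points. The sketch below does exactly that for a tiny sample; closed-form expressions of the kind derived in the paper are what make the same quantity cheap to compute for projection estimators. The bin count and the simulated sample are arbitrary choices.

        # Brute-force leave-p-out for the L2 density risk of a regular histogram:
        # average over all test sets E of size p of
        # ||f_train||^2 - (2/p) * sum_{i in E} f_train(x_i).
        # Feasible only for tiny n; closed forms remove the need for this enumeration.
        import numpy as np
        from itertools import combinations

        def histogram_density(train, bins, support=(0.0, 1.0)):
            counts, edges = np.histogram(train, bins=bins, range=support)
            h = edges[1] - edges[0]
            dens = counts / (len(train) * h)
            def f(x):
                idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
                return dens[idx]
            return f, np.sum(dens ** 2) * h            # estimator and its squared L2 norm

        def leave_p_out_risk(x, p, bins=8):
            x = np.asarray(x)
            n = len(x)
            crits = []
            for test_idx in combinations(range(n), p):
                mask = np.ones(n, dtype=bool)
                mask[list(test_idx)] = False
                f, l2 = histogram_density(x[mask], bins)
                crits.append(l2 - 2.0 * np.mean(f(x[~mask])))
            return np.mean(crits)

        rng = np.random.default_rng(0)
        data = rng.beta(2, 3, size=12)                 # tiny sample so enumeration stays cheap
        print(leave_p_out_risk(data, p=2))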

    Maximum $L_q$-likelihood estimation

    In this paper, the maximum $L_q$-likelihood estimator (ML$q$E), a new parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30--35], is introduced. The properties of the ML$q$E are studied via asymptotic analysis and computer simulations. The behavior of the ML$q$E is characterized by the degree of distortion $q$ applied to the assumed model. When $q$ is properly chosen for small and moderate sample sizes, the ML$q$E can successfully trade bias for precision, resulting in a substantial reduction of the mean squared error. When the sample size is large and $q$ tends to 1, a necessary and sufficient condition to ensure asymptotic normality and efficiency of the ML$q$E is established. Comment: Published at http://dx.doi.org/10.1214/09-AOS687 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
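
    A minimal sketch of the estimator for a normal model, assuming the commonly cited q-deformed logarithm L_q(u) = (u^(1-q) - 1)/(1 - q), which recovers log u as q tends to 1; the value q = 0.9 and the simulated data are arbitrary choices.

        # Maximum Lq-likelihood estimation for a normal model: maximize
        # sum_i L_q(f(x_i; mu, sigma)) numerically, with L_q the q-deformed logarithm.
        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import norm

        def lq(u, q):
            return np.log(u) if q == 1.0 else (u ** (1.0 - q) - 1.0) / (1.0 - q)

        def mlqe_normal(x, q):
            def neg_objective(theta):
                mu, log_sigma = theta
                dens = norm.pdf(x, loc=mu, scale=np.exp(log_sigma))
                return -np.sum(lq(dens, q))            # minimize the negative Lq-likelihood
            start = np.array([np.mean(x), np.log(np.std(x))])
            res = minimize(neg_objective, start, method="Nelder-Mead")
            mu_hat, log_sigma_hat = res.x
            return mu_hat, np.exp(log_sigma_hat)

        rng = np.random.default_rng(2)
        sample = rng.normal(1.0, 2.0, size=50)
        print("MLqE (q = 0.9):", mlqe_normal(sample, q=0.9))
        print("MLE  (q = 1.0):", mlqe_normal(sample, q=1.0))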

    Poisson inverse problems

    In this paper we focus on nonparametric estimators in inverse problems for Poisson processes involving the use of wavelet decompositions. Adopting an adaptive wavelet Galerkin discretization, we find that our method combines the well-known theoretical advantages of wavelet--vaguelette decompositions for inverse problems in terms of optimally adapting to the unknown smoothness of the solution, together with the remarkably simple closed-form expressions of Galerkin inversion methods. Adapting the results of Barron and Sheu [Ann. Statist. 19 (1991) 1347--1369] to the context of log-intensity functions approximated by wavelet series with the use of the Kullback--Leibler distance between two point processes, we also present an asymptotic analysis of convergence rates that justifies our approach. In order to shed some light on the theoretical results obtained and to examine the accuracy of our estimates in finite samples, we illustrate our method by the analysis of some simulated examples. Comment: Published at http://dx.doi.org/10.1214/009053606000000687 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
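
    The following simplified sketch illustrates wavelet-based Poisson intensity estimation for a directly observed process on [0, 1], a much simpler setting than the paper's inverse-problem Galerkin inversion: bin the counts, Haar-transform them, hard-threshold the detail coefficients, and invert. The threshold rule is a rough universal-style placeholder, not the paper's adaptive calibration.

        # Simplified wavelet sketch for a directly observed Poisson intensity on [0, 1];
        # not the paper's wavelet-Galerkin inverse-problem estimator.
        import numpy as np

        def haar_transform(v):
            details = []
            while len(v) > 1:
                avg = (v[0::2] + v[1::2]) / np.sqrt(2.0)
                diff = (v[0::2] - v[1::2]) / np.sqrt(2.0)
                details.append(diff)
                v = avg
            return v, details[::-1]                     # coarse average + details, coarse to fine

        def inverse_haar(avg, details):
            v = avg
            for diff in details:
                out = np.empty(2 * len(v))
                out[0::2] = (v + diff) / np.sqrt(2.0)
                out[1::2] = (v - diff) / np.sqrt(2.0)
                v = out
            return v

        rng = np.random.default_rng(3)
        n_bins = 64
        t = (np.arange(n_bins) + 0.5) / n_bins
        lam = 200.0 * (1.0 + np.sin(4.0 * np.pi * t))     # true intensity on [0, 1]
        counts = rng.poisson(lam / n_bins).astype(float)  # bin counts, bin width 1/n_bins

        avg, details = haar_transform(counts)
        thr = np.sqrt(2.0 * np.log(n_bins) * max(counts.mean(), 1.0))  # placeholder threshold
        details = [np.where(np.abs(d) > thr, d, 0.0) for d in details]
        lam_hat = inverse_haar(avg, details) * n_bins     # back to the intensity scale
        print(np.round(lam_hat[:8], 1))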