1,546 research outputs found
Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach
Bayesian inference typically requires the computation of an approximation to
the posterior distribution. An important requirement for an approximate
Bayesian inference algorithm is to output high-accuracy posterior mean and
uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain
Monte Carlo, remain the gold standard for approximate Bayesian inference
because they have a robust finite-sample theory and reliable convergence
diagnostics. However, alternative methods, which are more scalable or apply to
problems where Markov Chain Monte Carlo cannot be used, lack the same
finite-data approximation theory and tools for evaluating their accuracy. In
this work, we develop a flexible new approach to bounding the error of mean and
uncertainty estimates of scalable inference algorithms. Our strategy is to
control the estimation errors in terms of Wasserstein distance, then bound the
Wasserstein distance via a generalized notion of Fisher distance. Unlike
computing the Wasserstein distance, which requires access to the normalized
posterior distribution, the Fisher distance is tractable to compute because it
requires access only to the gradient of the log posterior density. We
demonstrate the usefulness of our Fisher distance approach by deriving bounds
on the Wasserstein error of the Laplace approximation and Hilbert coresets. We
anticipate that our approach will be applicable to many other approximate
inference methods such as the integrated Laplace approximation, variational
inference, and approximate Bayesian computation.
Comment: 22 pages, 2 figures
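To make the key computational point concrete: a score-based (Fisher) discrepancy can be estimated with access only to the gradient of the log posterior density, so the normalizing constant never enters. The one-dimensional Gaussian target and approximation below, and all names in the sketch, are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# A minimal sketch of the idea behind score-based (Fisher) discrepancies:
# comparing an approximation q to a posterior p needs only grad log p,
# which is available even for an unnormalized posterior.

rng = np.random.default_rng(0)

def grad_log_p(x, mu=0.0, sigma=1.0):
    """Score of the (possibly unnormalized) target density."""
    return -(x - mu) / sigma**2

def grad_log_q(x, m=0.2, s=1.1):
    """Score of the Gaussian approximation q = N(m, s^2)."""
    return -(x - m) / s**2

# Monte Carlo estimate of the Fisher divergence
#   F(q, p) = E_q[ (grad log q(X) - grad log p(X))^2 ],
# using samples from q only; no normalizing constant of p is required.
xs = rng.normal(0.2, 1.1, size=100_000)
fisher_div = np.mean((grad_log_q(xs) - grad_log_p(xs)) ** 2)
print(f"estimated Fisher divergence: {fisher_div:.4f}")
```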
The Brouwer Lecture 2005: Statistical estimation with model selection
The purpose of this paper is to explain the interest and importance of
(approximate) models and model selection in Statistics. Starting from the very
elementary example of histograms we present a general notion of finite
dimensional model for statistical estimation and we explain what type of risk
bounds can be expected from the use of one such model. We then give the
performance of suitable model selection procedures from a family of such
models. We illustrate our point of view by two main examples: the choice of a
partition for designing a histogram from an n-sample and the problem of
variable selection in the context of Gaussian regression.
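For the histogram example, a penalized criterion makes the model-selection recipe concrete. The sketch below picks the number of regular bins by maximizing log-likelihood minus a penalty; the Birge-Rozenholc-style penalty D - 1 + (log D)^2.5 is one classical choice for regular histograms, used here as an assumption rather than as the penalty advocated in the lecture.

```python
import numpy as np

# A hedged sketch of model selection over regular histogram partitions:
# choose the number of bins D maximizing (log-likelihood - penalty).

rng = np.random.default_rng(1)
x = rng.beta(2, 5, size=500)          # sample on [0, 1]
n = len(x)

def log_likelihood(x, D):
    """Log-likelihood of the regular D-bin histogram density on [0, 1]."""
    counts, _ = np.histogram(x, bins=D, range=(0.0, 1.0))
    nz = counts[counts > 0]
    return np.sum(nz * np.log(D * nz / n))

def penalty(D):
    # Birge-Rozenholc-style penalty for regular histograms (assumption).
    return D - 1 + np.log(D) ** 2.5

criteria = {D: log_likelihood(x, D) - penalty(D) for D in range(1, 51)}
D_hat = max(criteria, key=criteria.get)
print(f"selected number of bins: {D_hat}")
```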
Asymptotic Accuracy of Bayesian Estimation for a Single Latent Variable
In data science and machine learning, hierarchical parametric models, such as
mixture models, are often used. They contain two kinds of variables: observable
variables, which represent the parts of the data that can be directly measured,
and latent variables, which represent the underlying processes that generate
the data. Although there has been an increase in research on the estimation
accuracy for observable variables, the theoretical analysis of estimating
latent variables has not been thoroughly investigated. In a previous study, we
determined the accuracy of Bayes estimation for the joint probability of the
latent variables in a dataset, and we proved that the Bayes method is
asymptotically more accurate than the maximum-likelihood method. However, the
accuracy of the Bayes estimation for a single latent variable remains unknown.
In the present paper, we derive the asymptotic expansions of the error
functions, which are defined by the Kullback-Leibler divergence, for two types
of single-variable estimations when the statistical regularity is satisfied.
Our results indicate that the accuracies of the Bayes and maximum-likelihood
methods are asymptotically equivalent and clarify that the Bayes method is only
advantageous for multivariable estimations.
Comment: 28 pages, 3 figures
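As a rough illustration of the object under study, the sketch below compares the maximum-likelihood plug-in and the Bayes (posterior-averaged) estimates of a single latent label's posterior, scoring each by Kullback-Leibler error. The two-component Gaussian mixture with known means and an unknown weight is our assumption, not the paper's setting.

```python
import numpy as np
from scipy.stats import norm

# Compare ML plug-in vs. Bayes estimation of P(z = 1 | x_new) in a
# mixture a*N(1, 1) + (1 - a)*N(-1, 1) with unknown weight a.

rng = np.random.default_rng(2)
a_true, n = 0.3, 200
z = rng.random(n) < a_true
x = rng.normal(np.where(z, 1.0, -1.0), 1.0)

grid = np.linspace(0.001, 0.999, 999)            # candidate weights a
loglik = np.array([np.sum(np.log(a * norm.pdf(x, 1) + (1 - a) * norm.pdf(x, -1)))
                   for a in grid])
a_ml = grid[np.argmax(loglik)]                   # MLE of the weight
post = np.exp(loglik - loglik.max()); post /= post.sum()  # flat-prior posterior

def p_z1(x_new, a):
    """P(z = 1 | x_new, a) for the mixture with components N(+1,1), N(-1,1)."""
    num = a * norm.pdf(x_new, 1)
    return num / (num + (1 - a) * norm.pdf(x_new, -1))

x_new = 0.5
truth = p_z1(x_new, a_true)
ml_est = p_z1(x_new, a_ml)                       # plug-in estimate
bayes_est = np.sum(post * p_z1(x_new, grid))     # posterior-averaged estimate

def kl_bernoulli(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

print(f"KL error, ML:    {kl_bernoulli(truth, ml_est):.6f}")
print(f"KL error, Bayes: {kl_bernoulli(truth, bayes_est):.6f}")
```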
Optimal cross-validation in density estimation with the L2-loss
We analyze the performance of cross-validation (CV) in the density estimation
framework with two purposes: (i) risk estimation and (ii) model selection. The
main focus is given to the so-called leave-p-out CV procedure (Lpo), where p
denotes the cardinality of the test set. Closed-form expressions are
settled for the Lpo estimator of the risk of projection estimators. These
expressions provide a great improvement upon V-fold cross-validation in terms
of variability and computational complexity. From a theoretical point of view,
the closed-form expressions also make it possible to study the Lpo performance
in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo
with p = 1,
is proved among CV procedures used for risk estimation. Two model selection
frameworks are also considered: estimation, as opposed to identification. For
estimation with finite sample size n, optimality is achieved for p large
enough to balance the overfitting resulting from the
structure of the model collection. For identification, model selection
consistency is settled for Lpo as long as p/n is conveniently related to the
rate of convergence of the best estimator in the collection: (i) p/n → 1 as
n → +∞ with a parametric rate, and (ii) p/n ≤ 1 − (log n)/n with some
nonparametric estimators. These theoretical results are validated by simulation
experiments.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1240 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Maximum Lq-likelihood estimation
In this paper, the maximum Lq-likelihood estimator (MLqE), a new
parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30--35],
is introduced. The properties of the MLqE are studied via asymptotic analysis
and computer simulations. The behavior of the MLqE is characterized by the
degree of distortion q applied to the assumed model. When q is properly
chosen for small and moderate sample sizes, the MLqE can successfully trade
bias for precision, resulting in a substantial reduction of the mean squared
error. When the sample size is large and q tends to 1, a necessary and
sufficient condition to ensure proper asymptotic normality and efficiency of
the MLqE is established.
Comment: Published at http://dx.doi.org/10.1214/09-AOS687 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
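A minimal sketch of the estimator's mechanics: replace log(u) in the likelihood by the deformed logarithm Lq(u) = (u^(1-q) - 1)/(1-q), which recovers the ordinary log-likelihood as q tends to 1. The Gaussian model and the value q = 0.9 below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=50)         # small sample from N(0, 1)

def Lq(u, q):
    """Deformed logarithm; equals log(u) when q = 1."""
    return np.log(u) if q == 1.0 else (u**(1.0 - q) - 1.0) / (1.0 - q)

def neg_lq_likelihood(theta, q):
    mu, log_sigma = theta
    dens = norm.pdf(x, loc=mu, scale=np.exp(log_sigma))
    return -np.sum(Lq(dens, q))

# q = 1 gives the usual MLE; q < 1 downweights low-density observations,
# trading bias for precision in small samples.
for q in (1.0, 0.9):
    res = minimize(neg_lq_likelihood, x0=np.array([0.0, 0.0]), args=(q,))
    mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
    print(f"q={q}: mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```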
Poisson inverse problems
In this paper we focus on nonparametric estimators in inverse problems for
Poisson processes involving the use of wavelet decompositions. Adopting an
adaptive wavelet Galerkin discretization, we find that our method combines the
well-known theoretical advantages of wavelet--vaguelette decompositions for
inverse problems in terms of optimally adapting to the unknown smoothness of
the solution, together with the remarkably simple closed-form expressions of
Galerkin inversion methods. Adapting the results of Barron and Sheu [Ann.
Statist. 19 (1991) 1347--1369] to the context of log-intensity functions
approximated by wavelet series with the use of the Kullback--Leibler distance
between two point processes, we also present an asymptotic analysis of
convergence rates that justifies our approach. In order to shed some light on
the theoretical results obtained and to examine the accuracy of our estimates
in finite samples, we illustrate our method by the analysis of some simulated
examples.
Comment: Published at http://dx.doi.org/10.1214/009053606000000687 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
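To give a concrete feel for wavelet estimation of a Poisson intensity, the sketch below handles the direct (non-inverse) problem: bin the process, variance-stabilize the counts with the Anscombe transform, soft-threshold the Haar wavelet coefficients, and invert. The paper's wavelet-Galerkin treatment of the inverse problem is more involved; the binning, the universal threshold, and the Haar basis here are all illustrative assumptions.

```python
import numpy as np
import pywt

rng = np.random.default_rng(5)
m = 256                                          # number of bins (dyadic)
t = (np.arange(m) + 0.5) / m
intensity = 20 + 15 * np.sin(4 * np.pi * t)      # true intensity per bin
counts = rng.poisson(intensity)                  # binned Poisson counts

y = 2.0 * np.sqrt(counts + 3.0 / 8.0)            # Anscombe: noise approx N(0, 1)
coeffs = pywt.wavedec(y, "haar")                 # Haar wavelet decomposition
thresh = np.sqrt(2 * np.log(m))                  # universal threshold
denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft")
                          for c in coeffs[1:]]
y_hat = pywt.waverec(denoised, "haar")

intensity_hat = (y_hat / 2.0) ** 2 - 3.0 / 8.0   # invert the Anscombe map
print(f"mean abs error: {np.mean(np.abs(intensity_hat - intensity)):.2f}")
```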