On the Asymptotic Efficiency of Approximate Bayesian Computation Estimators
Many statistical applications involve models for which it is difficult to
evaluate the likelihood, but from which it is relatively easy to sample.
Approximate Bayesian computation is a likelihood-free method for implementing
Bayesian inference in such cases. We present results on the asymptotic variance
of estimators obtained using approximate Bayesian computation in a large-data
limit. Our key assumption is that the data are summarized by a
fixed-dimensional summary statistic that obeys a central limit theorem. We
prove asymptotic normality of the mean of the approximate Bayesian computation
posterior. This result also shows that, in terms of asymptotic variance, we
should use a summary statistic that is the same dimension as the parameter
vector, p; and that any summary statistic of higher dimension can be reduced,
through a linear transformation, to dimension p in a way that can only reduce
the asymptotic variance of the posterior mean. We look at how the Monte Carlo
error of an importance sampling algorithm that samples from the approximate
Bayesian computation posterior affects the accuracy of estimators. We give
conditions on the importance sampling proposal distribution such that the
variance of the estimator will be the same order as that of the maximum
likelihood estimator based on the summary statistics used. This suggests an
iterative importance sampling algorithm, which we evaluate empirically on a
stochastic volatility model.
Comment: Main text shortened and proof revised. To appear in Biometrika.
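The rejection flavour of approximate Bayesian computation can be sketched in a few lines; the toy Gaussian model, the N(0, 10) prior and the tolerance below are illustrative assumptions, not the paper's importance sampling algorithm. The summary statistic (the sample mean) has the same dimension as the parameter, in line with the recommendation above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data from N(theta_true, 1); the summary statistic is the sample
# mean, matching the parameter dimension p = 1 as the result above suggests.
theta_true, n = 2.0, 100
s_obs = rng.normal(theta_true, 1.0, size=n).mean()

# ABC rejection: draw theta from a vague N(0, 10) prior, simulate a full
# data set, keep theta when the simulated summary lands within eps of s_obs.
n_draws, eps = 50_000, 0.1
thetas = rng.normal(0.0, 10.0, size=n_draws)
s_sim = rng.normal(thetas[:, None], 1.0, size=(n_draws, n)).mean(axis=1)
post = thetas[np.abs(s_sim - s_obs) < eps]

print(len(post), post.mean())  # the ABC posterior mean sits near s_obs
```

With a vague prior and a small tolerance, the accepted draws concentrate around the observed summary, which is the behaviour the asymptotic results above quantify.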
Computationally Efficient Nonparametric Importance Sampling
The variance reduction established by importance sampling strongly depends on
the choice of the importance sampling distribution. A good choice is often hard
to achieve especially for high-dimensional integration problems. Nonparametric
estimation of the optimal importance sampling distribution (known as
nonparametric importance sampling) is a reasonable alternative to parametric
approaches. In this article, nonparametric variants of both the self-normalized
and the unnormalized importance sampling estimator are proposed and
investigated. A common critique of nonparametric importance sampling is the
increased computational burden compared to parametric methods. We solve this
problem to a large degree by utilizing the linear blend frequency polygon
estimator instead of a kernel estimator. Mean square error convergence
properties are investigated leading to recommendations for the efficient
application of nonparametric importance sampling. Particularly, we show that
nonparametric importance sampling asymptotically attains optimal importance
sampling variance. The efficiency of nonparametric importance sampling
algorithms heavily relies on the computational efficiency of the employed
nonparametric estimator. The linear blend frequency polygon outperforms kernel
estimators in terms of certain criteria such as efficient sampling and
evaluation. Furthermore, it is compatible with the inversion method for sample
generation. This allows us to combine our algorithms with other variance reduction
techniques such as stratified sampling. Empirical evidence for the usefulness
of the suggested algorithms is obtained by means of three benchmark integration
problems. As an application we estimate the distribution of the queue length of
a spam filter queueing system based on real data.
Comment: 29 pages, 7 figures.
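The self-normalized importance sampling estimator discussed above can be sketched as follows. The parametric Gaussian proposal here stands in for the nonparametric (frequency polygon) proposal the article studies, and the target, integrand and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Target density known only up to a constant (an unnormalized standard
# normal); we estimate E_f[X^2], whose true value is 1.
f_tilde = lambda x: np.exp(-0.5 * x**2)
h = lambda x: x**2

# Parametric proposal for illustration; the article's point is that the
# proposal can instead be estimated nonparametrically, e.g. by a linear
# blend frequency polygon fitted to a pilot sample.
sigma_g = 2.0
x = rng.normal(0.0, sigma_g, size=200_000)
g = np.exp(-0.5 * (x / sigma_g)**2) / (sigma_g * np.sqrt(2 * np.pi))

w = f_tilde(x) / g                         # unnormalized importance weights
self_norm = np.sum(w * h(x)) / np.sum(w)   # self-normalized estimator
print(self_norm)
```

The self-normalization in the last line is what removes the unknown normalizing constant of the target; the unnormalized variant would instead require the fully normalized density.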
Robust Covariance Matrix Estimation with Data-Dependent VAR Prewhitening Order
This paper analyzes the performance of heteroskedasticity-and-autocorrelation-consistent (HAC) covariance matrix estimators in which the residuals are prewhitened using a vector autoregressive (VAR) filter. We highlight the pitfalls of using an arbitrarily fixed lag order for the VAR filter, and we demonstrate the benefits of using a model selection criterion (either AIC or BIC) to determine its lag structure. Furthermore, once data-dependent VAR prewhitening has been utilized, we find negligible or even counter-productive effects of applying standard kernel-based methods to the prewhitened residuals; that is, the performance of the prewhitened kernel estimator is virtually indistinguishable from that of the VARHAC estimator.
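A scalar sketch of the prewhitening idea: the AR (VAR in one dimension) order is chosen by AIC, and no kernel step is applied to the whitened residuals, consistent with the finding above that it adds little. The simulated AR(1) series and the maximum order are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate an AR(1) series; its true long-run variance is
# sigma^2 / (1 - rho)^2 = 1 / 0.25 = 4.
T, rho = 5000, 0.5
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.normal()

def varhac_lrv(u, p_max=8):
    """Scalar VARHAC-style sketch: choose the AR prewhitening order by AIC,
    then recolor the whitened residual variance."""
    T = len(u)
    best = (np.inf, None, None)
    for p in range(1, p_max + 1):
        y = u[p:]
        X = np.column_stack([u[p - j:T - j] for j in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid.var()
        aic = np.log(sigma2) + 2 * p / T
        if aic < best[0]:
            best = (aic, coef, sigma2)
    _, coef, sigma2 = best
    # recoloring: long-run variance = innovation variance / (1 - sum a_i)^2
    return sigma2 / (1.0 - coef.sum())**2

lrv = varhac_lrv(u)
print(lrv)  # should be near the true long-run variance of 4
```

A fixed, arbitrary lag would either underwhiten (leaving serial correlation) or overwhiten (inflating variance); the AIC step is what makes the order data-dependent.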
Integral approximation by kernel smoothing
Let (X_1, ..., X_n) be an i.i.d. sequence of random variables in \mathbb{R}^d,
d \geq 1. We show that, for any function \varphi : \mathbb{R}^d \to \mathbb{R},
under regularity conditions,
n^{1/2} ( n^{-1} \sum_{i=1}^n \varphi(X_i)/\widehat{f}(X_i) - \int \varphi(x) dx ) \to 0
in probability, where \widehat{f} is the classical kernel estimator of the
density of X_1. This result is striking because it speeds up traditional rates,
in root n, derived from the central limit theorem when \widehat{f} = f.
Although this paper highlights some applications, we mainly address theoretical
issues related to the latter result. We derive upper bounds for the rate of
convergence in probability. These bounds depend on the regularity of the
functions \varphi and f, the dimension d and the bandwidth of the kernel
estimator \widehat{f}. Moreover, they are shown to be accurate since they are
used as renormalizing sequences in two central limit theorems each reflecting
different degrees of smoothness of \varphi. As an application to regression modelling
with random design, we provide the asymptotic normality of the estimation of
the linear functionals of a regression function. As a consequence of the above
result, the asymptotic variance does not depend on the regression function.
Finally, we debate the choice of the bandwidth for integral approximation and
we highlight the good behavior of our procedure through simulations.
Comment: Published at http://dx.doi.org/10.3150/15-BEJ725 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm). arXiv admin
note: text overlap with arXiv:1312.449
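The estimator behind this result, the sample average of \varphi(X_i) divided by a kernel density estimate at X_i, can be sketched as follows. The Gaussian integrand, the Silverman bandwidth and the leave-one-out variant are illustrative assumptions, not the paper's tuned choices.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 2000
x = rng.normal(0.0, 1.0, size=n)        # i.i.d. design, density f = N(0, 1)
phi = lambda t: np.exp(-t**2)           # integrand; true integral is sqrt(pi)

# Gaussian kernel density estimate evaluated at the sample points,
# leave-one-out to drop the i = j self term.
h = 1.06 * x.std() * n ** (-1 / 5)      # Silverman's rule, an assumption here
d = (x[:, None] - x[None, :]) / h
K = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)
np.fill_diagonal(K, 0.0)
f_hat = K.sum(axis=1) / ((n - 1) * h)

I_hat = np.mean(phi(x) / f_hat)         # kernel-smoothed integral estimate
print(I_hat, np.sqrt(np.pi))
```

Dividing by the estimated density rather than the true one is exactly what produces the faster-than-root-n behaviour described in the abstract.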
Estimating spatial quantile regression with functional coefficients: A robust semiparametric framework
This paper considers an estimation of semiparametric functional
(varying)-coefficient quantile regression with spatial data. A general robust
framework is developed that treats quantile regression for spatial data in a
natural semiparametric way. The local M-estimators of the unknown
functional-coefficient functions are proposed by using local linear
approximation, and their asymptotic distributions are then established under
weak spatial mixing conditions allowing the data processes to be either
stationary or nonstationary with spatial trends. Application to a soil data set
is demonstrated with interesting findings that go beyond traditional analysis.
Comment: Published at http://dx.doi.org/10.3150/12-BEJ480 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
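A local-constant analogue of the kernel-weighted quantile idea (simpler than the paper's local linear M-estimator, and with illustrative data, bandwidth and evaluation point) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic regression with noise; we estimate the conditional tau-quantile
# at a point by a kernel-weighted sample quantile.
n, tau = 3000, 0.5
x = rng.uniform(0, 1, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=n)

def local_quantile(x0, x, y, tau, h):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
    order = np.argsort(y)
    cw = np.cumsum(w[order])
    # smallest y whose cumulative weight reaches a share tau of the total;
    # this minimizes the weighted check (pinball) loss over constants
    idx = np.searchsorted(cw, tau * cw[-1])
    return y[order][idx]

est = local_quantile(0.25, x, y, tau, h=0.05)
print(est)  # true conditional median at x = 0.25 is sin(pi/2) = 1
```

Replacing the weighted quantile with a weighted linear fit under the same check loss gives the local linear M-estimator the abstract refers to; spatial data would additionally index x by location.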
Nonparametric estimation of scalar diffusions based on low frequency data
We study the problem of estimating the coefficients of a diffusion (X_t,t\geq
0); the estimation is based on discrete data X_{n\Delta},n=0,1,...,N. The
sampling frequency \Delta^{-1} is constant, and asymptotics are taken as the
number N of observations tends to infinity. We prove that the problem of
estimating both the diffusion coefficient (the volatility) and the drift in a
nonparametric setting is ill-posed: the minimax rates of convergence for
Sobolev constraints and squared-error loss coincide with that of a,
respectively, first- and second-order linear inverse problem. To ensure
ergodicity and limit technical difficulties we restrict ourselves to scalar
diffusions living on a compact interval with reflecting boundary conditions.
Our approach is based on the spectral analysis of the associated Markov
semigroup. A rate-optimal estimation of the coefficients is obtained via the
nonparametric estimation of an eigenvalue-eigenfunction pair of the transition
operator of the discrete time Markov chain (X_{n\Delta},n=0,1,...,N) in a
suitable Sobolev norm, together with an estimation of its invariant density.
Comment: Published at http://dx.doi.org/10.1214/009053604000000797 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
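A minimal sketch of the setting, not of the paper's spectral method: a reflected diffusion on [0, 1] simulated by an Euler scheme, observed at low frequency, with its invariant density estimated from the discrete observations. The drift is taken to be zero so the invariant density is known to be uniform.

```python
import numpy as np

rng = np.random.default_rng(5)

# Reflected Brownian motion on [0, 1] (drift 0, volatility 1): its
# invariant density is uniform, which gives a known target to check.
def simulate_reflected(N, Delta, delta, rng):
    """Euler scheme with reflection at 0 and 1; record every Delta."""
    steps = int(Delta / delta)
    x, path = 0.5, []
    for _ in range(N):
        for _ in range(steps):
            x += np.sqrt(delta) * rng.normal()
            x = abs(x)                  # reflect at 0
            x = 2 - x if x > 1 else x   # reflect at 1 (steps are small here)
        path.append(x)
    return np.array(path)

X = simulate_reflected(N=4000, Delta=0.5, delta=0.01, rng=rng)

# Histogram estimate of the invariant density from the low-frequency data.
dens, edges = np.histogram(X, bins=10, range=(0.0, 1.0), density=True)
print(dens)  # all entries should be close to 1
```

Recovering the drift and volatility themselves is the ill-posed part; the invariant density, as above, is comparatively easy, which is why it enters the paper's construction only as one ingredient.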
Indirect likelihood inference
Given a sample from a fully specified parametric model, let Zn be a given finite-dimensional statistic - for example, an initial estimator or a set of sample moments. We propose to (re-)estimate the parameters of the model by maximizing the likelihood of Zn. We call this the maximum indirect likelihood (MIL) estimator. We also propose a computationally tractable Bayesian version of the estimator which we refer to as a Bayesian Indirect Likelihood (BIL) estimator. In most cases, the density of the statistic will be of unknown form, and we develop simulated versions of the MIL and BIL estimators. We show that the indirect likelihood estimators are consistent and asymptotically normally distributed, with the same asymptotic variance as that of the corresponding efficient two-step GMM estimator based on the same statistic. However, our likelihood-based estimators, by taking into account the full finite-sample distribution of the statistic, are higher order efficient relative to GMM-type estimators. Furthermore, in many cases they enjoy a bias reduction property similar to that of the indirect inference estimator. Monte Carlo results for a number of applications including dynamic and nonlinear panel data models, a structural auction model and two DSGE models show that the proposed estimators indeed have attractive finite sample properties.
Keywords: indirect inference; maximum-likelihood; simulation-based
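A toy simulated version of the MIL idea, assuming a Gaussian model in which the statistic Zn is the sample mean and its density is kernel-estimated from simulations at each candidate parameter; the grid search and bandwidth rule are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy model: X_i ~ N(theta, 1), n = 50; the statistic Zn is the sample mean.
theta_true, n = 1.0, 50
z_obs = rng.normal(theta_true, 1.0, size=n).mean()

def sim_log_likelihood(theta, z_obs, n, S, rng):
    """Simulated indirect likelihood of Zn at theta: simulate S copies of
    the statistic and kernel-smooth their density at the observed value."""
    z = rng.normal(theta, 1.0, size=(S, n)).mean(axis=1)
    h = 1.06 * z.std() * S ** (-1 / 5)   # Silverman bandwidth (assumption)
    dens = np.mean(np.exp(-0.5 * ((z_obs - z) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return np.log(dens + 1e-300)

grid = np.linspace(0.0, 2.0, 81)
ll = [sim_log_likelihood(t, z_obs, n, S=2000, rng=rng) for t in grid]
theta_smil = grid[int(np.argmax(ll))]
print(theta_smil, z_obs)  # the simulated MIL estimate tracks z_obs closely
```

In this toy case the statistic's density is actually known in closed form; the simulation-plus-smoothing step is what the MIL and BIL estimators rely on when, as the abstract notes, that density is of unknown form.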
Properties of principal component methods for functional and longitudinal data analysis
The use of principal component methods to analyze functional data is
appropriate in a wide range of different settings. In studies of ``functional
data analysis,'' it has often been assumed that a sample of random functions is
observed precisely, in the continuum and without noise. While this has been the
traditional setting for functional data analysis, in the context of
longitudinal data analysis a random function typically represents a patient, or
subject, who is observed at only a small number of randomly distributed points,
with nonnegligible measurement error. Nevertheless, essentially the same
methods can be used in both these cases, as well as in the vast number of
settings that lie between them. How is performance affected by the sampling
plan? In this paper we answer that question. We show that if there is a sample
of n functions, or subjects, then estimation of eigenvalues is a
semiparametric problem, with root-n consistent estimators, even if only a few
observations are made of each function, and if each observation is encumbered
by noise. However, estimation of eigenfunctions becomes a nonparametric problem
when observations are sparse. The optimal convergence rates in this case are
those which pertain to more familiar function-estimation settings. We also
describe the effects of sampling at regularly spaced points, as opposed to
random points. In particular, it is shown that there are often advantages in
sampling randomly. However, even in the case of noisy data there is a threshold
sampling rate (depending on the number of functions treated) above which the
rate of sampling (either randomly or regularly) has negligible impact on
estimator performance, no matter whether eigenfunctions or eigenvalues are
being estimated.
Comment: Published at http://dx.doi.org/10.1214/009053606000000272 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
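For densely and regularly sampled noisy curves, the eigenvalue estimation step reduces to an eigendecomposition of the sample covariance on the grid. The two-component model below, its eigenvalues 2.0 and 0.5, and the noise level are assumed for illustration; sparse longitudinal designs would instead require pooling observations across subjects to estimate the covariance surface.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate n noisy curves X_i(t) = xi_1 phi_1(t) + xi_2 phi_2(t) + noise
# on a regular grid; the model eigenvalues are 2.0 and 0.5 (assumptions).
n, m = 300, 50
t = np.linspace(0, 1, m)
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
xi = rng.normal(0, 1, size=(n, 2)) * np.sqrt([2.0, 0.5])
Y = xi[:, :1] * phi1 + xi[:, 1:] * phi2 + rng.normal(0, 0.2, size=(n, m))

# Sample covariance on the grid; eigenvalues of the integral operator are
# approximated by the matrix eigenvalues times the grid spacing.
C = np.cov(Y, rowvar=False, bias=True)
evals = np.linalg.eigvalsh(C)[::-1] * (t[1] - t[0])
print(evals[:2])  # should be near the model eigenvalues 2.0 and 0.5
```

The root-n accuracy of these eigenvalue estimates, even under measurement noise, is the semiparametric phenomenon the abstract describes; it is the eigenfunctions that degrade to nonparametric rates under sparse sampling.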