On the Asymptotic Efficiency of Approximate Bayesian Computation Estimators
Many statistical applications involve models for which it is difficult to
evaluate the likelihood, but from which it is relatively easy to sample.
Approximate Bayesian computation is a likelihood-free method for implementing
Bayesian inference in such cases. We present results on the asymptotic variance
of estimators obtained using approximate Bayesian computation in a large-data
limit. Our key assumption is that the data are summarized by a
fixed-dimensional summary statistic that obeys a central limit theorem. We
prove asymptotic normality of the mean of the approximate Bayesian computation
posterior. This result also shows that, in terms of asymptotic variance, we
should use a summary statistic that is the same dimension as the parameter
vector, p; and that any summary statistic of higher dimension can be reduced,
through a linear transformation, to dimension p in a way that can only reduce
the asymptotic variance of the posterior mean. We look at how the Monte Carlo
error of an importance sampling algorithm that samples from the approximate
Bayesian computation posterior affects the accuracy of estimators. We give
conditions on the importance sampling proposal distribution such that the
variance of the estimator will be the same order as that of the maximum
likelihood estimator based on the summary statistics used. This suggests an
iterative importance sampling algorithm, which we evaluate empirically on a
stochastic volatility model. Comment: Main text shortened and proof revised. To appear in Biometrika.
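As a concrete illustration of the setting, a minimal ABC rejection sampler is sketched below in Python: it estimates the posterior mean after reducing the data to a fixed-dimensional summary statistic. The toy Gaussian model, the uniform acceptance kernel and all names are illustrative assumptions, not the authors' importance sampling algorithm.

```python
# Minimal ABC rejection sampler (illustrative sketch; the toy Gaussian model
# and all names are assumptions, not the authors' importance sampling scheme).
import numpy as np

def abc_posterior_mean(y_obs, n_sims=200_000, eps=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n = y_obs.size
    s_obs = y_obs.mean()                     # fixed-dimensional summary statistic
    theta = rng.normal(0.0, 10.0, n_sims)    # draws from a vague N(0, 10^2) prior
    # Toy model: data are i.i.d. N(theta, 1), so the sample mean of n points is
    # exactly N(theta, 1/n) and the summary can be simulated directly; it also
    # obeys a central limit theorem, as the asymptotic analysis assumes.
    s_sim = rng.normal(theta, 1.0 / np.sqrt(n))
    accept = np.abs(s_sim - s_obs) <= eps    # uniform ABC kernel with bandwidth eps
    if not accept.any():
        raise RuntimeError("no acceptances; increase eps or n_sims")
    return theta[accept].mean()              # ABC posterior-mean estimator

if __name__ == "__main__":
    y = np.random.default_rng(1).normal(2.0, 1.0, 500)   # observed data, true theta = 2
    print(abc_posterior_mean(y))
```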
Particle approximations of the score and observed information matrix for parameter estimation in state space models with linear computational cost
Poyiadjis et al. (2011) show how particle methods can be used to estimate
both the score and the observed information matrix for state space models.
These methods either suffer from a computational cost that is quadratic in the
number of particles, or produce estimates whose variance increases
quadratically with the amount of data. This paper introduces an alternative
approach for estimating these terms at a computational cost that is linear in
the number of particles. The method is derived using a combination of kernel
density estimation, to avoid the particle degeneracy that causes the
quadratically increasing variance, and Rao-Blackwellisation. Crucially, we show
the method is robust to the choice of bandwidth within the kernel density
estimation, as it has good asymptotic properties regardless of this choice. Our
estimates of the score and observed information matrix can be used within both
online and batch procedures for estimating parameters for state space models.
Empirical results show improved parameter estimates compared to existing
methods at a significantly reduced computational cost. Supplementary materials
including code are available. Comment: Accepted to Journal of Computational and Graphical Statistics.
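For context, the sketch below implements the simple path-based particle estimate of the score via Fisher's identity for a toy AR(1)-plus-noise model. This is the linear-in-particles baseline whose variance deteriorates with the length of the series through path degeneracy, i.e. the behaviour the kernel-density and Rao-Blackwellisation ideas above are designed to avoid; it is not the paper's method, and the model and names are illustrative assumptions.

```python
# Path-based particle estimate of the score (Fisher's identity) for a toy
# AR(1)-plus-noise model; this is the simple linear-cost baseline whose
# variance degrades with the series length, NOT the paper's kernel-density /
# Rao-Blackwellised estimator. Model and names are illustrative assumptions.
import numpy as np

def particle_score_phi(y, phi, sigma, tau, n_particles=1000, seed=0):
    """Estimate d/d(phi) log p(y_{1:T}) for x_t = phi*x_{t-1} + sigma*eps_t,
    y_t = x_t + tau*eta_t, with a fixed N(0, 1) initial state."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)     # initial state (independent of phi)
    alpha = np.zeros(n_particles)             # running d/d(phi) log p(x_{1:t}, y_{1:t})
    score = 0.0
    for y_t in y:
        x_prev = x
        x = phi * x_prev + sigma * rng.normal(size=n_particles)   # propagate states
        alpha = alpha + (x - phi * x_prev) * x_prev / sigma**2    # transition gradient
        logw = -0.5 * ((y_t - x) / tau) ** 2                      # observation weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        score = np.sum(w * alpha)             # current score estimate (Fisher's identity)
        idx = rng.choice(n_particles, n_particles, p=w)           # multinomial resampling
        x, alpha = x[idx], alpha[idx]         # path degeneracy enters here
    return score

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, phi, sigma, tau = 200, 0.8, 1.0, 0.5
    x_t, y = rng.normal(), np.zeros(T)
    for t in range(T):                        # simulate a synthetic series
        x_t = phi * x_t + sigma * rng.normal()
        y[t] = x_t + tau * rng.normal()
    print(particle_score_phi(y, phi, sigma, tau))   # score of the log-likelihood in phi
```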
Tractable diffusion and coalescent processes for weakly correlated loci
Widely used models in genetics include the Wright-Fisher diffusion and its
moment dual, Kingman's coalescent. Each has a multilocus extension but under
neither extension is the sampling distribution available in closed-form, and
their computation is extremely difficult. In this paper we derive two new
multilocus population genetic models, one a diffusion and the other a
coalescent process, which are much simpler than the standard models, but which
capture their key properties for large recombination rates. The diffusion model
is based on a central limit theorem for density dependent population processes,
and we show that the sampling distribution is a linear combination of moments
of Gaussian distributions and hence available in closed-form. The coalescent
process is based on a probabilistic coupling of the ancestral recombination
graph to a simpler genealogical process which exposes the leading dynamics of
the former. We further demonstrate that when we consider the sampling
distribution as an asymptotic expansion in inverse powers of the recombination
parameter, the sampling distributions of the new models agree with the standard
ones up to the first two orders. Comment: 34 pages, 1 figure.
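For reference, the classical single-locus, neutral forms of the two objects being extended are recalled below; these are standard textbook facts, not the paper's new multilocus models.

```latex
% Classical single-locus, neutral background (not the new models of the paper):
% the Wright-Fisher diffusion for an allele frequency X_t, with time measured in
% units of the population size, and its moment dual, Kingman's coalescent.
\[
  dX_t = \sqrt{X_t\,(1 - X_t)}\; dW_t ,
  \qquad
  (\mathcal{L}f)(x) = \tfrac{1}{2}\, x (1 - x)\, f''(x) .
\]
% Duality: while k ancestral lineages remain, each of the binom(k,2) pairs
% coalesces independently at rate 1, so the total coalescence rate is k(k-1)/2.
```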
On Optimal Multiple Changepoint Algorithms for Large Data
There is an increasing need for algorithms that can accurately detect
changepoints in long time-series, or equivalent, data. Many common approaches
to detecting changepoints, for example based on penalised likelihood or minimum
description length, can be formulated in terms of minimising a cost over
segmentations. Dynamic programming methods exist to solve this minimisation
problem exactly, but these tend to scale at least quadratically in the length
of the time-series. Algorithms, such as Binary Segmentation, exist that have a
computational cost that is close to linear in the length of the time-series,
but these are not guaranteed to find the optimal segmentation. Recently, pruning
ideas have been suggested that can speed up the dynamic programming algorithms,
whilst still being guaranteed to find the true minimum of the cost function. Here
we extend these pruning methods, and introduce two new algorithms for
segmenting data, FPOP and SNIP. Empirical results show that FPOP is
substantially faster than existing dynamic programming methods, and unlike the
existing methods its computational efficiency is robust to the number of
changepoints in the data. We evaluate the method at detecting Copy Number
Variations and observe that FPOP has a computational cost that is competitive
with that of Binary Segmentation. Comment: 20 pages.
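The exact minimisation that such pruning accelerates is the optimal-partitioning recursion F(t) = min_{0 <= s < t} { F(s) + C(y_{(s+1):t}) + beta }. A minimal quadratic-time Python version with a squared-error segment cost is sketched below for orientation; the functional (FPOP) and inequality (SNIP) pruning steps are deliberately omitted, and all names are illustrative assumptions rather than the authors' implementation.

```python
# Exact optimal-partitioning dynamic programme: the quadratic-cost baseline
# that FPOP and SNIP accelerate (the pruning itself is omitted here).
# Segment cost = residual sum of squares about the segment mean.
import numpy as np

def optimal_partitioning(y, beta):
    y = np.asarray(y, dtype=float)
    n = len(y)
    S1 = np.concatenate(([0.0], np.cumsum(y)))       # prefix sums
    S2 = np.concatenate(([0.0], np.cumsum(y ** 2)))  # prefix sums of squares

    def seg_cost(s, t):                              # cost of y[s:t] (half-open)
        return (S2[t] - S2[s]) - (S1[t] - S1[s]) ** 2 / (t - s)

    F = np.full(n + 1, np.inf)
    F[0] = -beta                                     # so the first segment pays one penalty
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        cands = [F[s] + seg_cost(s, t) + beta for s in range(t)]
        s_star = int(np.argmin(cands))
        F[t], last[t] = cands[s_star], s_star
    cps, t = [], n                                   # trace back the changepoint locations
    while last[t] > 0:
        t = last[t]
        cps.append(t)
    return sorted(cps), F[n]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
    print(optimal_partitioning(y, beta=2 * np.log(len(y))))   # expect a change near 100
```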