Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is
well-suited to complex problems for which the likelihood is either
mathematically or computationally intractable. However, methods based on
rejection sampling suffer from the curse of dimensionality as the number of
summary statistics increases. Here we propose a machine-learning approach to the
estimation of the posterior density by introducing two innovations. The new
method fits a nonlinear conditional heteroscedastic regression of the parameter
on the summary statistics, and then adaptively improves estimation using
importance sampling. The new algorithm is compared to the state-of-the-art
approximate Bayesian methods, and achieves considerable reduction of the
computational burden in two examples of inference in statistical genetics and
in a queueing model.
Comment: 4 figures; version 3 minor changes; to appear in Statistics and Computing.
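The regression-adjustment idea can be sketched on a toy problem. The snippet below runs rejection ABC and then applies a local-linear correction, a deliberately simplified homoscedastic stand-in for the paper's nonlinear conditional heteroscedastic fit; the Gaussian model, the prior, and all tuning choices here are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: infer the mean theta of a N(theta, 1) sample of size 100,
# using the sample mean as the (sufficient) summary statistic.
obs = rng.normal(2.0, 1.0, size=100)
s_obs = obs.mean()

# Step 1: rejection ABC -- draw from the prior, simulate, keep the closest.
n_sim, n_keep = 20000, 500
theta = rng.normal(0.0, 5.0, size=n_sim)          # vague Gaussian prior
s_sim = rng.normal(theta, 1.0 / np.sqrt(100))     # summary of simulated data
keep = np.argsort(np.abs(s_sim - s_obs))[:n_keep]
theta_acc, s_acc = theta[keep], s_sim[keep]

# Step 2: regression adjustment -- regress theta on the summary among the
# accepted draws and shift every draw to the observed summary.  This is the
# local-LINEAR version; the paper fits a nonlinear heteroscedastic regression.
X = np.column_stack([np.ones(n_keep), s_acc])
beta, *_ = np.linalg.lstsq(X, theta_acc, rcond=None)
theta_adj = theta_acc + beta[1] * (s_obs - s_acc)
```

The adjusted draws `theta_adj` approximate the posterior sample; because the summary is sufficient and the prior diffuse, their mean lands close to the observed summary.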
Bayesian computation via empirical likelihood
Approximate Bayesian computation (ABC) has become an essential tool for the
analysis of complex stochastic models when the likelihood function is
numerically unavailable. However, the well-established statistical method of
empirical likelihood provides another route to such settings that bypasses
simulations from the model and the choices of the ABC parameters (summary
statistics, distance, tolerance), while being convergent in the number of
observations. Furthermore, bypassing model simulations may lead to significant
time savings in complex models, for instance those found in population
genetics. The BCel algorithm we develop in this paper also provides an
evaluation of its own performance through an associated effective sample size.
The method is illustrated using several examples, including estimation of
standard distributions, time series, and population genetics models.
Comment: 21 pages, 12 figures, revised version of the previous version with a new title.
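The empirical-likelihood route can be illustrated for a scalar mean. The sketch below weights prior draws by the profile empirical likelihood under a single mean constraint and reports an effective sample size, loosely mirroring the BCel idea; the data, the prior, and the choice of constraint are illustrative assumptions, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(1.5, 1.0, size=200)         # observed data, unknown mean

def log_el(theta, x, iters=60):
    """Profile empirical log-likelihood under the single mean constraint
    sum_i p_i (x_i - theta) = 0, solved by bisection on the multiplier."""
    z = x - theta
    if z.min() >= 0.0 or z.max() <= 0.0:
        return -np.inf                     # theta outside the data's convex hull
    lo = -1.0 / z.max() + 1e-10            # keep 1 + lam * z_i > 0 for every i
    hi = -1.0 / z.min() - 1e-10
    for _ in range(iters):                 # the score in lam is monotone decreasing
        mid = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + mid * z)) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -np.sum(np.log1p(lam * z)) - len(x) * np.log(len(x))

# BCel-style step: draw theta from the prior and weight by the EL value.
thetas = rng.normal(0.0, 5.0, size=5000)   # vague Gaussian prior
logw = np.array([log_el(t, x) for t in thetas])
w = np.exp(logw - logw.max())
w /= w.sum()
post_mean = float(np.sum(w * thetas))
ess = 1.0 / float(np.sum(w ** 2))          # self-assessment, as in the abstract
```

The effective sample size `ess` is the built-in performance diagnostic the abstract mentions: near-uniform weights signal a reliable approximation, a handful of dominant weights signal trouble.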
A subsampled double bootstrap for massive data
The bootstrap is a popular and powerful method for assessing precision of
estimators and inferential methods. However, for massive datasets which are
increasingly prevalent, the bootstrap becomes prohibitively costly in
computation and its feasibility is questionable even with modern parallel
computing platforms. Recently, Kleiner, Talwalkar, Sarkar, and Jordan (2014)
proposed a method called BLB (Bag of Little Bootstraps) for massive data which
is more computationally scalable with little sacrifice of statistical accuracy.
Building on BLB and the idea of fast double bootstrap, we propose a new
resampling method, the subsampled double bootstrap, for both independent data
and time series data. We establish consistency of the subsampled double
bootstrap under mild conditions for both independent and dependent cases.
Methodologically, the subsampled double bootstrap is superior to BLB in terms
of running time, greater sample coverage, and automatic implementation with
fewer tuning parameters for a given time budget. Its advantage relative to BLB
and the bootstrap is also demonstrated in numerical simulations and a data
illustration.
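A minimal sketch of the scheme, assuming the estimator of interest is a sample mean: each repeat draws one small random subset of size b and a single full-size resample from it, so only b distinct points are touched per repeat. The choices of b, the number of repeats, and the toy data below are illustrative, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
data = rng.exponential(scale=2.0, size=n)    # stand-in "massive" sample
theta_hat = data.mean()                      # full-data estimate

# Subsampled double bootstrap: per repeat, one subset of size b drawn
# without replacement, then ONE full-size resample drawn from that subset.
b = int(n ** 0.7)                            # subset size (illustrative choice)
reps = 200
roots = np.empty(reps)
for k in range(reps):
    subset = rng.choice(data, size=b, replace=False)
    resample = rng.choice(subset, size=n, replace=True)
    roots[k] = resample.mean() - subset.mean()   # root centred at the subset estimate

# Percentile-type 95% interval for the population mean.
ci_lo = theta_hat - np.quantile(roots, 0.975)
ci_hi = theta_hat - np.quantile(roots, 0.025)
```

Because each root is centred at its own subset estimate, the spread of `roots` mimics the sampling variability of the full-data mean while each repeat costs only O(b) memory for distinct points.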
ABC likelihood-free methods for model choice in Gibbs random fields
Gibbs random fields (GRF) are polymorphous statistical models that can be
used to analyse different types of dependence, in particular for spatially
correlated data. However, when those models are faced with the challenge of
selecting a dependence structure from many, the use of standard model choice
methods is hampered by the unavailability of the normalising constant in the
Gibbs likelihood. In particular, from a Bayesian perspective, the computation
of the posterior probabilities of the models under competition requires special
likelihood-free simulation techniques like the Approximate Bayesian Computation
(ABC) algorithm that is intensively used in population genetics. We show in
this paper how to implement an ABC algorithm geared towards model choice in the
general setting of Gibbs random fields, demonstrating in particular that there
exists a sufficient statistic across models. The accuracy of the approximation
to the posterior probabilities can be further improved by importance sampling
on the distribution of the models. The practical aspects of the method are
detailed through two applications, the test of an iid Bernoulli model versus a
first-order Markov chain, and the choice of a folding structure for two
proteins.
Comment: 19 pages, 5 figures, to appear in Bayesian Analysis.