A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation
Approximate Bayesian computation (ABC) methods make use of comparisons
between simulated and observed summary statistics to overcome the problem of
computationally intractable likelihood functions. As the practical
implementation of ABC requires computations based on vectors of summary
statistics, rather than full data sets, a central question is how to derive
low-dimensional summary statistics from the observed data with minimal loss of
information. In this article we provide a comprehensive review and comparison
of the performance of the principal methods of dimension reduction proposed in
the ABC literature. The methods are grouped into three classes, which are not
mutually exclusive: best subset selection methods, projection techniques and
regularization. In addition, we introduce two new methods of dimension
reduction. The first is a best subset selection method based on Akaike and
Bayesian information criteria, and the second uses ridge regression as a
regularization procedure. We illustrate the performance of these dimension
reduction techniques through the analysis of three challenging models and data
sets.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the
Institute of Mathematical Statistics (http://www.imstat.org);
http://dx.doi.org/10.1214/12-STS406.
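The ridge-regression route to dimension reduction can be sketched in a few lines: regress the simulated parameters on the full summary vector and use the fitted linear combination as a single projected summary. Everything below (the toy model, dimensions, penalty) is a hypothetical illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: theta is 1-D, the raw summaries are 10-D and
# only the first two coordinates carry information about theta.
n, p = 500, 10
theta = rng.normal(size=n)
S = rng.normal(size=(n, p))
S[:, 0] += theta          # informative summary
S[:, 1] += 0.5 * theta    # weakly informative summary

# Ridge regression of theta on the summaries: the fitted linear
# combination S @ beta serves as a single projected summary statistic
# inside a subsequent ABC run.
lam = 1.0
beta = np.linalg.solve(S.T @ S + lam * np.eye(p), S.T @ theta)
s_proj = S @ beta  # one-dimensional projected summary

# The informative coordinates should dominate the projection.
assert np.abs(beta[0]) > np.abs(beta[2:]).max()
```

The penalty `lam` trades off variance against shrinkage of the projection weights; in practice it would be chosen by cross-validation on the simulated reference table.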
Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is
well-suited to complex problems for which the likelihood is either
mathematically or computationally intractable. However, the methods that use
rejection suffer from the curse of dimensionality when the number of summary
statistics is increased. Here we propose a machine-learning approach to the
estimation of the posterior density by introducing two innovations. The new
method fits a nonlinear conditional heteroscedastic regression of the parameter
on the summary statistics, and then adaptively improves estimation using
importance sampling. The new algorithm is compared to the state-of-the-art
approximate Bayesian methods, and achieves considerable reduction of the
computational burden in two examples of inference in statistical genetics and
in a queueing model.
Comment: 4 figures; version 3, minor changes; to appear in Statistics and
Computing.
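A minimal sketch of regression-adjusted ABC in this spirit, substituting a quadratic fit for the paper's nonlinear (neural network) regression and omitting the importance-sampling refinement; the simulator and "observed" summary below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulator: summary s depends nonlinearly on theta and
# the noise scale grows with theta (conditional heteroscedasticity).
n = 2000
theta = rng.uniform(0, 2, size=n)
s = theta**2 + (0.1 + 0.2 * theta) * rng.normal(size=n)
s_obs = 1.0  # "observed" summary

# Fit a nonlinear regression of theta on s (quadratic stand-in for the
# paper's neural network) for the conditional mean m(s).
X = np.column_stack([np.ones(n), s, s**2])
coef_m, *_ = np.linalg.lstsq(X, theta, rcond=None)
m = X @ coef_m

# Model the conditional spread sigma(s) via a regression of the log
# squared residuals on the same design.
coef_v, *_ = np.linalg.lstsq(X, np.log((theta - m) ** 2 + 1e-12),
                             rcond=None)
sigma = np.exp(0.5 * (X @ coef_v))

# Heteroscedastic adjustment: shift and rescale the simulated
# parameters toward the observed summary.
x_obs = np.array([1.0, s_obs, s_obs**2])
theta_adj = (x_obs @ coef_m
             + (theta - m) * np.exp(0.5 * (x_obs @ coef_v)) / sigma)
```

The adjusted draws `theta_adj` approximate the posterior at `s_obs`; the paper's second innovation, adaptive importance sampling, would then refine where new simulations are placed.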
ABC random forests for Bayesian parameter inference
This preprint has been reviewed and recommended by Peer Community In
Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036).
Approximate Bayesian computation (ABC) has grown into a standard methodology
that manages Bayesian inference for models associated with intractable
likelihood functions. Most ABC implementations require the preliminary
selection of a vector of informative statistics summarizing raw data.
Furthermore, in almost all existing implementations, the tolerance level that
separates acceptance from rejection of simulated parameter values needs to be
calibrated. We propose to conduct likelihood-free Bayesian inferences about
parameters with no prior selection of the relevant components of the summary
statistics and bypassing the derivation of the associated tolerance level. The
approach relies on the random forest methodology of Breiman (2001) applied in a
(nonparametric) regression setting. We advocate the derivation of a new random
forest for each component of the parameter vector of interest. When compared
with earlier ABC solutions, this method offers significant gains in terms of
robustness to the choice of the summary statistics, does not depend on any type
of tolerance level, and offers a good trade-off between the precision of
point estimates and the quality of credible intervals for a given computing
time. We illustrate the performance of our methodological proposal and compare
it with earlier ABC methods on a Normal toy example and a population genetics
example dealing with human population evolution. All methods designed here have
been incorporated in the R package abcrf (version 1.7) available on CRAN.
Comment: Main text: 24 pages, 6 figures. Supplementary Information: 14 pages,
5 figures.
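A sketch of the idea using scikit-learn's RandomForestRegressor as a stand-in for the abcrf package; the simulator, summary vector and "true" parameter are invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Reference table: draw parameters from the prior, simulate data, and
# compute a summary vector mixing informative and pure-noise statistics
# (the forest is left to sort out which is which).
n = 5000
theta = rng.normal(0.0, 1.0, size=n)
data = theta[:, None] + rng.normal(size=(n, 20))
S = np.column_stack([data.mean(axis=1), data.var(axis=1),
                     rng.normal(size=(n, 8))])

# One forest per scalar parameter of interest; neither a summary
# selection step nor a tolerance level is required.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(S, theta)

# Pseudo-observed data generated at theta = 0.8 (hypothetical truth).
obs = 0.8 + rng.normal(size=20)
s_obs = np.concatenate([[obs.mean(), obs.var()], rng.normal(size=8)])
theta_hat = rf.predict(s_obs.reshape(1, -1))[0]  # point estimate
```

The forest's feature importances also reveal, post hoc, which summaries actually drove the prediction, which is one source of the robustness to summary choice claimed above.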
Approximate maximum likelihood estimation using data-cloning ABC
A maximum likelihood methodology for a general class of models is presented,
using an approximate Bayesian computation (ABC) approach. The typical target of
ABC methods are models with intractable likelihoods, and we combine an ABC-MCMC
sampler with so-called "data cloning" for maximum likelihood estimation.
Accuracy of ABC methods relies on the use of a small threshold value for
comparing simulations from the model and observed data. The proposed
methodology shows how to use large threshold values, while the number of
data clones is increased to ease convergence towards an approximate maximum
likelihood estimate. We show how to exploit the methodology to reduce the
number of iterations of a standard ABC-MCMC algorithm and therefore reduce the
computational effort, while obtaining reasonable point estimates. Simulation
studies show the good performance of our approach on models with intractable
likelihoods such as g-and-k distributions, stochastic differential equations
and state-space models.
Comment: 25 pages. Minor revision: includes a parametric bootstrap for the
exact MLE in the first example, and mean bias and RMSE calculations for the
third example. Forthcoming in Computational Statistics and Data Analysis.
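The mechanism can be caricatured with a uniform-kernel ABC-MCMC sampler in which a move is accepted only if all K cloned simulations match the observed summary within a deliberately large threshold, which raises the ABC likelihood to the power K. The model, tuning values and initialization below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observed data: 50 draws from N(2, 1); the summary is
# the sample mean and the unknown parameter is the mean mu.
y = rng.normal(2.0, 1.0, size=50)
s_obs = y.mean()

def abc_mcmc_clones(K, eps, n_iter=4000):
    """Uniform-kernel ABC-MCMC with K data clones: a proposal is kept
    only if ALL K independently simulated datasets reproduce the
    observed summary within eps, concentrating the chain near the
    approximate MLE even though eps itself is large."""
    mu, chain = s_obs, []  # initialize at a pilot estimate
    for _ in range(n_iter):
        prop = mu + rng.normal(0.0, 0.5)
        sims = rng.normal(prop, 1.0, size=(K, 50)).mean(axis=1)
        if np.all(np.abs(sims - s_obs) < eps):
            mu = prop  # flat prior + symmetric proposal: always move
        chain.append(mu)
    return np.array(chain[1000:])  # discard burn-in

chain = abc_mcmc_clones(K=10, eps=0.8)
```

With eps far larger than the posterior standard deviation, a single-dataset sampler would barely constrain mu; requiring all ten clones to hit the threshold sharpens the target around the maximum likelihood estimate.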
A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks
An explosion of high-throughput DNA sequencing in the past decade has led to
a surge of interest in population-scale inference with whole-genome data.
Recent work in population genetics has centered on designing inference methods
for relatively simple model classes, and few scalable general-purpose inference
techniques exist for more realistic, complex models. To develop such methods,
two inferential challenges need to be addressed: (1) population data are
exchangeable, calling for methods that efficiently exploit the symmetries of
the data, and (2) computing likelihoods is intractable as it requires
integrating over a set of correlated, extremely high-dimensional latent
variables. These challenges are traditionally tackled by likelihood-free
methods that use scientific simulators to generate datasets and reduce them to
hand-designed, permutation-invariant summary statistics, often leading to
inaccurate inference. In this work, we develop an exchangeable neural network
that performs summary statistic-free, likelihood-free inference. Our framework
can be applied in a black-box fashion across a variety of simulation-based
tasks, both within and outside biology. We demonstrate the power of our
approach on the recombination hotspot testing problem, outperforming the
state-of-the-art.
Comment: 9 pages, 8 figures.
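The exchangeability property can be illustrated with a Deep Sets style forward pass: a per-datum embedding, symmetric pooling across data points, then a decoder. The weights below are random because this sketch only demonstrates permutation invariance, not the paper's trained architecture:

```python
import numpy as np

rng = np.random.default_rng(4)

# Random weights for the embedding (phi) and the decoder (rho);
# dimensions are arbitrary choices for this illustration.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 1))

def exchangeable_net(x):
    h = np.tanh(x @ W1)      # phi: embedding applied to each row
    pooled = h.mean(axis=0)  # symmetric pooling over individuals
    return float(np.tanh(pooled) @ W2)  # rho: decoder

x = rng.normal(size=(30, 8))   # 30 exchangeable "individuals"
perm = rng.permutation(30)
out1, out2 = exchangeable_net(x), exchangeable_net(x[perm])
assert np.isclose(out1, out2)  # output is invariant to row order
```

Because pooling is a symmetric function of the rows, the network's output cannot depend on the ordering of individuals, so no hand-designed permutation-invariant summaries are needed.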
Divide and conquer in ABC: Expectation-Propagation algorithms for likelihood-free inference
ABC algorithms are notoriously expensive in computing time, as they require
simulating many complete artificial datasets from the model. We advocate in
this paper a "divide and conquer" approach to ABC, where we split the
likelihood into n factors, and combine in some way n "local" ABC approximations
of each factor. This has two advantages: (a) such an approach is typically much
faster than standard ABC and (b) it makes it possible to use local summary
statistics (i.e. summary statistics that depend only on the data-points that
correspond to a single factor), rather than global summary statistics (that
depend on the complete dataset). This greatly alleviates the bias introduced by
summary statistics, and even removes it entirely in situations where local
summary statistics are simply the identity function.
We focus on EP (Expectation-Propagation), a convenient and powerful way to
combine n local approximations into a global approximation. Compared to the
EP-ABC approach of Barthelmé and Chopin (2014), we present two variations, one
based on the parallel EP algorithm of Cseke and Heskes (2011), which has the
advantage of being implementable on a parallel architecture, and one version
which bridges the gap between standard EP and parallel EP. We illustrate our
approach with an expensive application of ABC, namely inference on spatial
extremes.
Comment: To appear in the Handbook of Approximate Bayesian Computation (ABC),
edited by S. Sisson, Y. Fan, and M. Beaumont.
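A one-pass caricature of the divide-and-conquer idea for a Gaussian mean (not full iterative EP): fit a Gaussian "site" to each local ABC posterior, divide out the prior, and combine sites by adding natural parameters. All model choices below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy factorization: 20 data blocks, each contributing one factor of
# the likelihood of a common mean mu.
mu_true = 1.5
blocks = rng.normal(mu_true, 1.0, size=(20, 10))
prior_prec = 1.0 / 100.0  # prior N(0, 10^2)

site_prec, site_mp = 0.0, 0.0  # summed precisions, precision*means
for yb in blocks:
    # Local ABC with a local summary (the block mean): keep prior
    # draws whose simulated block mean is within eps of the observed.
    draws = rng.normal(0.0, 10.0, size=20000)
    sims = draws + rng.normal(0.0, 1.0 / np.sqrt(10), size=20000)
    acc = draws[np.abs(sims - yb.mean()) < 0.2]
    p_loc = 1.0 / acc.var()          # local posterior precision
    site_prec += p_loc - prior_prec  # divide the prior back out
    site_mp += p_loc * acc.mean()    # prior mean is 0

# Global Gaussian approximation: prior times the product of sites.
prec = prior_prec + site_prec
mean = site_mp / prec
```

Each local ABC step only has to hit a one-dimensional local summary, which is far cheaper than matching a global summary of all 200 observations at once; full EP would additionally iterate the site fits under cavity distributions until convergence.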
An automatic adaptive method to combine summary statistics in approximate Bayesian computation
To infer the parameters of mechanistic models with intractable likelihoods,
techniques such as approximate Bayesian computation (ABC) are increasingly
being adopted. One of the main disadvantages of ABC in practical situations,
however, is that parameter inference must generally rely on summary statistics
of the data. This is particularly the case for problems involving
high-dimensional data, such as biological imaging experiments. However, some
summary statistics contain more information about parameters of interest than
others, and it is not always clear how to weight their contributions within the
ABC framework. We address this problem by developing an automatic, adaptive
algorithm that chooses weights for each summary statistic. Our algorithm aims
to maximize the distance between the prior and the approximate posterior by
automatically adapting the weights within the ABC distance function.
Computationally, we use a nearest neighbour estimator of the distance between
distributions. We justify the algorithm theoretically based on properties of
the nearest neighbour distance estimator. To demonstrate the effectiveness of
our algorithm, we apply it to a variety of test problems, including several
stochastic models of biochemical reaction networks, and a spatial model of
diffusion, and compare our results with existing algorithms.
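For contrast with the paper's adaptive prior-versus-posterior scheme, here is the simplest common baseline for weighting summaries inside the ABC distance: scale each statistic by the reciprocal of its median absolute deviation under the prior predictive, so that scale alone cannot let one statistic dominate. The toy model is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical simulations: one informative summary on scale ~1 and
# one pure-noise summary on scale ~1000; unweighted, the noise
# statistic would dominate the distance and destroy the inference.
n = 5000
theta = rng.uniform(0, 5, size=n)
S = np.column_stack([
    theta + rng.normal(0, 1, size=n),  # informative, scale ~ 1
    1000.0 * rng.normal(size=n),       # pure noise, scale ~ 1000
])
s_obs = np.array([2.0, 0.0])  # "observed" summaries (invented)

# Weight = 1 / median absolute deviation under the prior predictive.
w = 1.0 / np.median(np.abs(S - np.median(S, axis=0)), axis=0)

# Weighted Euclidean ABC distance; accept the closest 1% of draws.
d = np.sqrt(((w * (S - s_obs)) ** 2).sum(axis=1))
keep = theta[d <= np.quantile(d, 0.01)]
```

MAD scaling only equalizes scales; the algorithm described above goes further by tuning the weights to maximize the estimated distance between prior and approximate posterior, so uninformative statistics are actively downweighted rather than merely rescaled.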