55,166 research outputs found

    A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation

    Get PDF
    Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.Comment: Published in at http://dx.doi.org/10.1214/12-STS406 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Non-linear regression models for Approximate Bayesian Computation

    Full text link
    Approximate Bayesian inference on the basis of summary statistics is well-suited to complex problems for which the likelihood is either mathematically or computationally intractable. However the methods that use rejection suffer from the curse of dimensionality when the number of summary statistics is increased. Here we propose a machine-learning approach to the estimation of the posterior density by introducing two innovations. The new method fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling. The new algorithm is compared to the state-of-the-art approximate Bayesian methods, and achieves considerable reduction of the computational burden in two examples of inference in statistical genetics and in a queueing model.Comment: 4 figures; version 3 minor changes; to appear in Statistics and Computin

    ABC random forests for Bayesian parameter inference

    Get PDF
    This preprint has been reviewed and recommended by Peer Community In Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036). Approximate Bayesian computation (ABC) has grown into a standard methodology that manages Bayesian inference for models associated with intractable likelihood functions. Most ABC implementations require the preliminary selection of a vector of informative statistics summarizing raw data. Furthermore, in almost all existing implementations, the tolerance level that separates acceptance from rejection of simulated parameter values needs to be calibrated. We propose to conduct likelihood-free Bayesian inferences about parameters with no prior selection of the relevant components of the summary statistics and bypassing the derivation of the associated tolerance level. The approach relies on the random forest methodology of Breiman (2001) applied in a (non parametric) regression setting. We advocate the derivation of a new random forest for each component of the parameter vector of interest. When compared with earlier ABC solutions, this method offers significant gains in terms of robustness to the choice of the summary statistics, does not depend on any type of tolerance level, and is a good trade-off in term of quality of point estimator precision and credible interval estimations for a given computing time. We illustrate the performance of our methodological proposal and compare it with earlier ABC methods on a Normal toy example and a population genetics example dealing with human population evolution. All methods designed here have been incorporated in the R package abcrf (version 1.7) available on CRAN.Comment: Main text: 24 pages, 6 figures Supplementary Information: 14 pages, 5 figure

    Approximate maximum likelihood estimation using data-cloning ABC

    Full text link
    A maximum likelihood methodology for a general class of models is presented, using an approximate Bayesian computation (ABC) approach. The typical target of ABC methods are models with intractable likelihoods, and we combine an ABC-MCMC sampler with so-called "data cloning" for maximum likelihood estimation. Accuracy of ABC methods relies on the use of a small threshold value for comparing simulations from the model and observed data. The proposed methodology shows how to use large threshold values, while the number of data-clones is increased to ease convergence towards an approximate maximum likelihood estimate. We show how to exploit the methodology to reduce the number of iterations of a standard ABC-MCMC algorithm and therefore reduce the computational effort, while obtaining reasonable point estimates. Simulation studies show the good performance of our approach on models with intractable likelihoods such as g-and-k distributions, stochastic differential equations and state-space models.Comment: 25 pages. Minor revision. It includes a parametric bootstrap for the exact MLE for the first example; includes mean bias and RMSE calculations for the third example. Forthcoming in Computational Statistics and Data Analysi

    A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

    Full text link
    An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To achieve this, two inferential challenges need to be addressed: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable as it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, outperforming the state-of-the-art.Comment: 9 pages, 8 figure

    Divide and conquer in ABC: Expectation-Progagation algorithms for likelihood-free inference

    Full text link
    ABC algorithms are notoriously expensive in computing time, as they require simulating many complete artificial datasets from the model. We advocate in this paper a "divide and conquer" approach to ABC, where we split the likelihood into n factors, and combine in some way n "local" ABC approximations of each factor. This has two advantages: (a) such an approach is typically much faster than standard ABC and (b) it makes it possible to use local summary statistics (i.e. summary statistics that depend only on the data-points that correspond to a single factor), rather than global summary statistics (that depend on the complete dataset). This greatly alleviates the bias introduced by summary statistics, and even removes it entirely in situations where local summary statistics are simply the identity function. We focus on EP (Expectation-Propagation), a convenient and powerful way to combine n local approximations into a global approximation. Compared to the EP- ABC approach of Barthelm\'e and Chopin (2014), we present two variations, one based on the parallel EP algorithm of Cseke and Heskes (2011), which has the advantage of being implementable on a parallel architecture, and one version which bridges the gap between standard EP and parallel EP. We illustrate our approach with an expensive application of ABC, namely inference on spatial extremes.Comment: To appear in the forthcoming Handbook of Approximate Bayesian Computation (ABC), edited by S. Sisson, L. Fan, and M. Beaumon

    An automatic adaptive method to combine summary statistics in approximate Bayesian computation

    Full text link
    To infer the parameters of mechanistic models with intractable likelihoods, techniques such as approximate Bayesian computation (ABC) are increasingly being adopted. One of the main disadvantages of ABC in practical situations, however, is that parameter inference must generally rely on summary statistics of the data. This is particularly the case for problems involving high-dimensional data, such as biological imaging experiments. However, some summary statistics contain more information about parameters of interest than others, and it is not always clear how to weight their contributions within the ABC framework. We address this problem by developing an automatic, adaptive algorithm that chooses weights for each summary statistic. Our algorithm aims to maximize the distance between the prior and the approximate posterior by automatically adapting the weights within the ABC distance function. Computationally, we use a nearest neighbour estimator of the distance between distributions. We justify the algorithm theoretically based on properties of the nearest neighbour distance estimator. To demonstrate the effectiveness of our algorithm, we apply it to a variety of test problems, including several stochastic models of biochemical reaction networks, and a spatial model of diffusion, and compare our results with existing algorithms
    • …
    corecore