731 research outputs found

    Estimating Bayes factors via thermodynamic integration and population MCMC

    Get PDF
    A Bayesian approach to model comparison based on the integrated or marginal likelihood is considered, and applications to linear regression models and nonlinear ordinary differential equation (ODE) models are used as the setting in which to elucidate and further develop existing statistical methodology. The focus is on two methods of marginal likelihood estimation. First, a statistical failure of the widely employed Posterior Harmonic Mean estimator is highlighted. It is demonstrated that there is a systematic bias capable of significantly skewing Bayes factor estimates, which has not previously been highlighted in the literature. Second, a detailed study of the recently proposed Thermodynamic Integral estimator is presented, which characterises the error associated with its discrete form. An experimental study using analytically tractable linear regression models highlights substantial differences with recently published results regarding optimal discretisation. Finally, with the insights gained, it is demonstrated how Population MCMC and thermodynamic integration methods may be elegantly combined to estimate Bayes factors accurately enough to discriminate between nonlinear models based on systems of ODEs, which has important application in describing the behaviour of complex processes arising in a wide variety of research areas, such as Systems Biology, Computational Ecology and Chemical Engineering. (C) 2009 Elsevier B.V. All rights reserve

    Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods

    Get PDF
    Models with intractable likelihood functions arise in areas including network analysis and spatial statistics, especially those involving Gibbs random fields. Posterior parameter es timation in these settings is termed a doubly-intractable problem because both the likelihood function and the posterior distribution are intractable. The comparison of Bayesian models is often based on the statistical evidence, the integral of the un-normalised posterior distribution over the model parameters which is rarely available in closed form. For doubly-intractable models, estimating the evidence adds another layer of difficulty. Consequently, the selection of the model that best describes an observed network among a collection of exponential random graph models for network analysis is a daunting task. Pseudolikelihoods offer a tractable approximation to the likelihood but should be treated with caution because they can lead to an unreasonable inference. This paper specifies a method to adjust pseudolikelihoods in order to obtain a reasonable, yet tractable, approximation to the likelihood. This allows implementation of widely used computational methods for evidence estimation and pursuit of Bayesian model selection of exponential random graph models for the analysis of social networks. Empirical comparisons to existing methods show that our procedure yields similar evidence estimates, but at a lower computational cost.Comment: Supplementary material attached. To view attachments, please download and extract the gzzipped source file listed under "Other formats

    Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

    Full text link
    Within path sampling framework, we show that probability distribution divergences, such as the Chernoff information, can be estimated via thermodynamic integration. The Boltzmann-Gibbs distribution pertaining to different Hamiltonians is implemented to derive tempered transitions along the path, linking the distributions of interest at the endpoints. Under this perspective, a geometric approach is feasible, which prompts intuition and facilitates tuning the error sources. Additionally, there are direct applications in Bayesian model evaluation. Existing marginal likelihood and Bayes factor estimators are reviewed here along with their stepping-stone sampling analogues. New estimators are presented and the use of compound paths is introduced

    An alternative marginal likelihood estimator for phylogenetic models

    Get PDF
    Bayesian phylogenetic methods are generating noticeable enthusiasm in the field of molecular systematics. Many phylogenetic models are often at stake and different approaches are used to compare them within a Bayesian framework. The Bayes factor, defined as the ratio of the marginal likelihoods of two competing models, plays a key role in Bayesian model selection. We focus on an alternative estimator of the marginal likelihood whose computation is still a challenging problem. Several computational solutions have been proposed none of which can be considered outperforming the others simultaneously in terms of simplicity of implementation, computational burden and precision of the estimates. Practitioners and researchers, often led by available software, have privileged so far the simplicity of the harmonic mean estimator (HM) and the arithmetic mean estimator (AM). However it is known that the resulting estimates of the Bayesian evidence in favor of one model are biased and often inaccurate up to having an infinite variance so that the reliability of the corresponding conclusions is doubtful. Our new implementation of the generalized harmonic mean (GHM) idea recycles MCMC simulations from the posterior, shares the computational simplicity of the original HM estimator, but, unlike it, overcomes the infinite variance issue. The alternative estimator is applied to simulated phylogenetic data and produces fully satisfactory results outperforming those simple estimators currently provided by most of the publicly available software

    Targeting Bayes factors with direct-path non-equilibrium thermodynamic integration

    Get PDF
    Thermodynamic integration (TI) for computing marginal likelihoods is based on an inverse annealing path from the prior to the posterior distribution. In many cases, the resulting estimator suffers from high variability, which particularly stems from the prior regime. When comparing complex models with differences in a comparatively small number of parameters, intrinsic errors from sampling fluctuations may outweigh the differences in the log marginal likelihood estimates. In the present article, we propose a thermodynamic integration scheme that directly targets the log Bayes factor. The method is based on a modified annealing path between the posterior distributions of the two models compared, which systematically avoids the high variance prior regime. We combine this scheme with the concept of non-equilibrium TI to minimise discretisation errors from numerical integration. Results obtained on Bayesian regression models applied to standard benchmark data, and a complex hierarchical model applied to biopathway inference, demonstrate a significant reduction in estimator variance over state-of-the-art TI methods

    Marginal likelihoods in phylogenetics: a review of methods and applications

    Full text link
    By providing a framework of accounting for the shared ancestry inherent to all life, phylogenetics is becoming the statistical foundation of biology. The importance of model choice continues to grow as phylogenetic models continue to increase in complexity to better capture micro and macroevolutionary processes. In a Bayesian framework, the marginal likelihood is how data update our prior beliefs about models, which gives us an intuitive measure of comparing model fit that is grounded in probability theory. Given the rapid increase in the number and complexity of phylogenetic models, methods for approximating marginal likelihoods are increasingly important. Here we try to provide an intuitive description of marginal likelihoods and why they are important in Bayesian model testing. We also categorize and review methods for estimating marginal likelihoods of phylogenetic models, highlighting several recent methods that provide well-behaved estimates. Furthermore, we review some empirical studies that demonstrate how marginal likelihoods can be used to learn about models of evolution from biological data. We discuss promising alternatives that can complement marginal likelihoods for Bayesian model choice, including posterior-predictive methods. Using simulations, we find one alternative method based on approximate-Bayesian computation (ABC) to be biased. We conclude by discussing the challenges of Bayesian model choice and future directions that promise to improve the approximation of marginal likelihoods and Bayesian phylogenetics as a whole.Comment: 33 pages, 3 figure

    A study of Population MCMC for estimating Bayes Factors over nonlinear ODE models

    Get PDF
    Higher resolution biological data is now becoming available in ever greater quantities, allowing the complex behaviour of fundamental biological processes to be studied in much more detail. The area of Systems Biology is in desperate need of methods for inferring the most likely topology of the underlying genetic networks from this oftentimes noisy and poorly sampled data, to support the construction and testing of new model hypotheses. Towards that end, Bayesian methodology provides an ideal framework for tackling such challenges, and in particular offers a means of objectively comparing competing plausible models through the estimation of Bayes factors. There are, however, formidable obstacles which must be overcome to allow model inference using Bayes factors to be of practical use. Many important biological processes may be most accurately represented using nonlinear models based on systems of ordinary differential equations (ODEs), however parameter inference over these models often produces correspondingly nonlinear posterior distributions, which are very challenging to sample from, often resulting in biased marginal likelihood estimates with large variances. Such problems are commonly encountered when modelling circardian rhythms, which exhibit highly nonlinear oscillatory dynamics and play a central role in the overall functioning of most organisms. In this thesis I investigate tools for calculating Bayes factors to distinguish between ODE-based Goodwin oscillator models of varying complexity, which form the basic building blocks for describing this ubiquitous circadian behaviour. The main result in Chapter 3 of this thesis demonstrates how Population Markov Chain Monte Carlo may be employed in conjunction with thermodynamic integration methods to estimate Bayes factors which may accurately distinguish between two nonlinear oscillator models of varying complexity, given noisy experimental data generated from each of the models. In addition, it is shown how alternative methods may fail drastically in this setting, in particular harmonic mean based estimates. Suggestions are given regarding the optimal temperature schedule which should be employed for Population MCMC, and several ideas for future research extending this work are also discussed

    A Bayesian Approach to the Detection Problem in Gravitational Wave Astronomy

    Full text link
    The analysis of data from gravitational wave detectors can be divided into three phases: search, characterization, and evaluation. The evaluation of the detection - determining whether a candidate event is astrophysical in origin or some artifact created by instrument noise - is a crucial step in the analysis. The on-going analyses of data from ground based detectors employ a frequentist approach to the detection problem. A detection statistic is chosen, for which background levels and detection efficiencies are estimated from Monte Carlo studies. This approach frames the detection problem in terms of an infinite collection of trials, with the actual measurement corresponding to some realization of this hypothetical set. Here we explore an alternative, Bayesian approach to the detection problem, that considers prior information and the actual data in hand. Our particular focus is on the computational techniques used to implement the Bayesian analysis. We find that the Parallel Tempered Markov Chain Monte Carlo (PTMCMC) algorithm is able to address all three phases of the anaylsis in a coherent framework. The signals are found by locating the posterior modes, the model parameters are characterized by mapping out the joint posterior distribution, and finally, the model evidence is computed by thermodynamic integration. As a demonstration, we consider the detection problem of selecting between models describing the data as instrument noise, or instrument noise plus the signal from a single compact galactic binary. The evidence ratios, or Bayes factors, computed by the PTMCMC algorithm are found to be in close agreement with those computed using a Reversible Jump Markov Chain Monte Carlo algorithm.Comment: 19 pages, 12 figures, revised to address referee's comment
    corecore