Estimating Bayes factors via thermodynamic integration and population MCMC
A Bayesian approach to model comparison based on the integrated or marginal likelihood is considered, and applications to linear regression models and nonlinear ordinary differential equation (ODE) models are used as the setting in which to elucidate and further develop existing statistical methodology. The focus is on two methods of marginal likelihood estimation. First, a statistical failure of the widely employed Posterior Harmonic Mean estimator is highlighted. It is demonstrated that there is a systematic bias capable of significantly skewing Bayes factor estimates, which has not previously been highlighted in the literature. Second, a detailed study of the recently proposed Thermodynamic Integral estimator is presented, which characterises the error associated with its discrete form. An experimental study using analytically tractable linear regression models highlights substantial differences with recently published results regarding optimal discretisation. Finally, with the insights gained, it is demonstrated how Population MCMC and thermodynamic integration methods may be elegantly combined to estimate Bayes factors accurately enough to discriminate between nonlinear models based on systems of ODEs, which has important applications in describing the behaviour of complex processes arising in a wide variety of research areas, such as Systems Biology, Computational Ecology and Chemical Engineering. (C) 2009 Elsevier B.V. All rights reserved.
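As a rough illustration of the discretised thermodynamic integral mentioned in this abstract, the sketch below estimates a log marginal likelihood for a toy conjugate Gaussian model (unknown mean, known noise variance), not for the regression or ODE models of the paper. The function names, the power-law temperature grid with exponent 5, and the sample sizes are all assumptions made for this example. Because the model is conjugate, the power posteriors can be sampled exactly and the estimate can be checked against the closed-form evidence.

```python
import numpy as np


def log_lik(theta, y, sigma):
    """Gaussian log-likelihood of y, evaluated at each mean in the 1-D array theta."""
    resid = y[None, :] - theta[:, None]
    return (-0.5 * y.size * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(resid**2, axis=1) / sigma**2)


def power_posterior_samples(t, y, sigma, tau, size, rng):
    """Exact draws from p_t(theta), proportional to L(theta)^t * N(theta | 0, tau^2)."""
    prec = 1.0 / tau**2 + t * y.size / sigma**2
    mean = (t * y.sum() / sigma**2) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec), size=size)


def ti_log_evidence(y, sigma=1.0, tau=1.0, n_temps=50, n_samples=5000, seed=0):
    """Trapezoidal thermodynamic integration of E_t[log L] over t in [0, 1]."""
    rng = np.random.default_rng(seed)
    # Assumed schedule: t_i = (i / N)^5 places most temperatures near the prior.
    temps = np.linspace(0.0, 1.0, n_temps + 1) ** 5
    means = np.array([
        log_lik(power_posterior_samples(t, y, sigma, tau, n_samples, rng), y, sigma).mean()
        for t in temps
    ])
    return float(np.sum(0.5 * np.diff(temps) * (means[1:] + means[:-1])))


def exact_log_evidence(y, sigma=1.0, tau=1.0):
    """Closed-form log p(y): y is multivariate normal with covariance sigma^2 I + tau^2 11^T."""
    n, s2, t2 = y.size, sigma**2, tau**2
    logdet = (n - 1) * np.log(s2) + np.log(s2 + n * t2)
    quad = (y @ y - t2 * y.sum() ** 2 / (s2 + n * t2)) / s2
    return float(-0.5 * (n * np.log(2.0 * np.pi) + logdet + quad))


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(1.0, 1.0, size=20)
    print("TI estimate:", ti_log_evidence(data))
    print("exact      :", exact_log_evidence(data))
```

In a realistic setting the exact sampler would be replaced by MCMC draws from each power posterior, which is where population MCMC enters.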
Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods
Models with intractable likelihood functions arise in areas including network
analysis and spatial statistics, especially those involving Gibbs random
fields. Posterior parameter estimation in these settings is termed a
doubly-intractable problem because both the likelihood function and the
posterior distribution are intractable. The comparison of Bayesian models is
often based on the statistical evidence, the integral of the un-normalised
posterior distribution over the model parameters which is rarely available in
closed form. For doubly-intractable models, estimating the evidence adds
another layer of difficulty. Consequently, the selection of the model that best
describes an observed network among a collection of exponential random graph
models for network analysis is a daunting task. Pseudolikelihoods offer a
tractable approximation to the likelihood but should be treated with caution
because they can lead to an unreasonable inference. This paper specifies a
method to adjust pseudolikelihoods in order to obtain a reasonable, yet
tractable, approximation to the likelihood. This allows implementation of
widely used computational methods for evidence estimation and pursuit of
Bayesian model selection of exponential random graph models for the analysis of
social networks. Empirical comparisons to existing methods show that our
procedure yields similar evidence estimates, but at a lower computational cost.
Comment: Supplementary material attached.
Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Within the path sampling framework, we show that probability distribution
divergences, such as the Chernoff information, can be estimated via
thermodynamic integration. The Boltzmann-Gibbs distribution pertaining to
different Hamiltonians is implemented to derive tempered transitions along the
path, linking the distributions of interest at the endpoints. Under this
perspective, a geometric approach is feasible, which prompts intuition and
facilitates tuning the error sources. Additionally, there are direct
applications in Bayesian model evaluation. Existing marginal likelihood and
Bayes factor estimators are reviewed here along with their stepping-stone
sampling analogues. New estimators are presented and the use of compound paths
is introduced.
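The stepping-stone analogue of thermodynamic integration mentioned in this abstract can be sketched as follows. This is a minimal illustration on a toy conjugate Gaussian model, not one of the estimators proposed in the paper; the function names, the temperature grid, and the sample sizes are assumptions of this example. Each "stone" estimates the ratio of normalising constants between adjacent temperatures by importance sampling from the lower one, and the log ratios are summed.

```python
import numpy as np


def log_lik(theta, y, sigma):
    """Gaussian log-likelihood of y, evaluated at each mean in the 1-D array theta."""
    resid = y[None, :] - theta[:, None]
    return (-0.5 * y.size * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(resid**2, axis=1) / sigma**2)


def stepping_stone_log_evidence(y, sigma=1.0, tau=1.0, n_temps=50,
                                n_samples=5000, seed=0):
    """Sum over stones of log E_{t_lo}[ L^(t_hi - t_lo) ], with prior N(0, tau^2)."""
    rng = np.random.default_rng(seed)
    temps = np.linspace(0.0, 1.0, n_temps + 1) ** 5  # assumed schedule
    total = 0.0
    for t_lo, t_hi in zip(temps[:-1], temps[1:]):
        # The toy model is conjugate, so the power posterior at t_lo is Gaussian.
        prec = 1.0 / tau**2 + t_lo * y.size / sigma**2
        mean = (t_lo * y.sum() / sigma**2) / prec
        theta = rng.normal(mean, np.sqrt(1.0 / prec), size=n_samples)
        w = (t_hi - t_lo) * log_lik(theta, y, sigma)
        # Numerically stable log-mean-exp of the importance weights.
        total += w.max() + np.log(np.mean(np.exp(w - w.max())))
    return float(total)


def exact_log_evidence(y, sigma=1.0, tau=1.0):
    """Closed-form log p(y) for the toy model, used only as a check."""
    n, s2, t2 = y.size, sigma**2, tau**2
    logdet = (n - 1) * np.log(s2) + np.log(s2 + n * t2)
    quad = (y @ y - t2 * y.sum() ** 2 / (s2 + n * t2)) / s2
    return float(-0.5 * (n * np.log(2.0 * np.pi) + logdet + quad))
```

Unlike the trapezoidal thermodynamic integral, this estimator has no quadrature error; its error comes from the importance-sampling ratios, which degrade when adjacent temperatures are too far apart.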
An alternative marginal likelihood estimator for phylogenetic models
Bayesian phylogenetic methods are generating noticeable enthusiasm in the
field of molecular systematics. Many phylogenetic models are often at stake and
different approaches are used to compare them within a Bayesian framework. The
Bayes factor, defined as the ratio of the marginal likelihoods of two competing
models, plays a key role in Bayesian model selection. We focus on an
alternative estimator of the marginal likelihood whose computation is still a
challenging problem. Several computational solutions have been proposed, none of
which can be considered to outperform the others simultaneously in terms of
simplicity of implementation, computational burden and precision of the
estimates. Practitioners and researchers, often led by available software, have
so far favoured the simplicity of the harmonic mean estimator (HM) and the
arithmetic mean estimator (AM). However, it is known that the resulting
estimates of the Bayesian evidence in favor of one model are biased and often
inaccurate, to the point of having infinite variance, so that the reliability of the
corresponding conclusions is doubtful. Our new implementation of the
generalized harmonic mean (GHM) idea recycles MCMC simulations from the
posterior, shares the computational simplicity of the original HM estimator,
but, unlike it, overcomes the infinite variance issue. The alternative
estimator is applied to simulated phylogenetic data and produces fully
satisfactory results outperforming those simple estimators currently provided
by most of the publicly available software.
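For reference, the plain harmonic mean (HM) estimator that this abstract criticises can be written in a few lines. The sketch below is the original HM estimator, not the generalized harmonic mean (GHM) implementation proposed by the authors; the function name is hypothetical and the input is assumed to be log-likelihood values evaluated at posterior MCMC draws.

```python
import numpy as np


def harmonic_mean_log_evidence(log_liks):
    """Log of the harmonic mean of likelihoods evaluated at posterior draws.

    Computes -log( mean( exp(-log_lik) ) ) with a log-sum-exp shift so that
    very negative log-likelihoods do not overflow.
    """
    neg = -np.asarray(log_liks, dtype=float)
    shift = neg.max()
    log_mean_inverse = shift + np.log(np.mean(np.exp(neg - shift)))
    return float(-log_mean_inverse)
```

The average is dominated by the draws with the smallest likelihood, which is one way to see why the estimator can be unstable even though it is trivial to compute from existing posterior samples.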
Targeting Bayes factors with direct-path non-equilibrium thermodynamic integration
Thermodynamic integration (TI) for computing marginal likelihoods is based on an inverse annealing path from the prior to the posterior distribution. In many cases, the resulting estimator suffers from high variability, which particularly stems from the prior regime. When comparing complex models with differences in a comparatively small number of parameters, intrinsic errors from sampling fluctuations may outweigh the differences in the log marginal likelihood estimates. In the present article, we propose a thermodynamic integration scheme that directly targets the log Bayes factor. The method is based on a modified annealing path between the posterior distributions of the two models compared, which systematically avoids the high-variance prior regime. We combine this scheme with the concept of non-equilibrium TI to minimise discretisation errors from numerical integration. Results obtained on Bayesian regression models applied to standard benchmark data, and a complex hierarchical model applied to biopathway inference, demonstrate a significant reduction in estimator variance over state-of-the-art TI methods.
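The difference between the usual prior-to-posterior path and a path that targets the Bayes factor directly can be stated with the general path sampling identity. The notation below is an assumption of this note: the geometric path between the two unnormalised posteriors is one simple choice, it presumes both models share a parameter space and have normalised priors, and it is not necessarily the modified path used in the article.

```latex
% General identity for an unnormalised family q_t with normaliser Z_t
\log\frac{Z_1}{Z_0}
  = \int_0^1 \mathbb{E}_{p_t}\!\left[\frac{\partial}{\partial t}\log q_t(\theta)\right]dt,
\qquad p_t(\theta) = \frac{q_t(\theta)}{Z_t}.

% Standard TI: path from prior to posterior, Z_0 = 1, Z_1 = p(y)
q_t(\theta) = L(\theta)^{t}\,\pi(\theta)
\;\Longrightarrow\;
\log p(y) = \int_0^1 \mathbb{E}_{p_t}\!\left[\log L(\theta)\right]dt.

% Assumed geometric path between the two posteriors, Z_i = p(y \mid M_i)
q_t(\theta) = \bigl[L_1(\theta)\pi_1(\theta)\bigr]^{t}\,
              \bigl[L_0(\theta)\pi_0(\theta)\bigr]^{1-t}
\;\Longrightarrow\;
\log B_{10} = \int_0^1 \mathbb{E}_{p_t}\!\left[
   \log\frac{L_1(\theta)\pi_1(\theta)}{L_0(\theta)\pi_0(\theta)}\right]dt.
```

In the second path neither endpoint is the prior, so the integrand never has to be estimated in the regime that the abstract identifies as the main source of variance.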
Marginal likelihoods in phylogenetics: a review of methods and applications
By providing a framework of accounting for the shared ancestry inherent to
all life, phylogenetics is becoming the statistical foundation of biology. The
importance of model choice continues to grow as phylogenetic models continue to
increase in complexity to better capture micro- and macroevolutionary processes.
In a Bayesian framework, the marginal likelihood is how data update our prior
beliefs about models, which gives us an intuitive measure of comparing model
fit that is grounded in probability theory. Given the rapid increase in the
number and complexity of phylogenetic models, methods for approximating
marginal likelihoods are increasingly important. Here we try to provide an
intuitive description of marginal likelihoods and why they are important in
Bayesian model testing. We also categorize and review methods for estimating
marginal likelihoods of phylogenetic models, highlighting several recent
methods that provide well-behaved estimates. Furthermore, we review some
empirical studies that demonstrate how marginal likelihoods can be used to
learn about models of evolution from biological data. We discuss promising
alternatives that can complement marginal likelihoods for Bayesian model
choice, including posterior-predictive methods. Using simulations, we find one
alternative method based on approximate-Bayesian computation (ABC) to be
biased. We conclude by discussing the challenges of Bayesian model choice and
future directions that promise to improve the approximation of marginal
likelihoods and Bayesian phylogenetics as a whole.
Comment: 33 pages, 3 figures
A study of Population MCMC for estimating Bayes Factors over nonlinear ODE models
Higher resolution biological data is now becoming available in ever greater quantities, allowing the complex behaviour of fundamental biological processes to be studied in much more detail. The area of Systems Biology is in desperate need of methods for inferring the most likely topology of the underlying genetic networks from this oftentimes noisy and poorly sampled data, to support the construction and testing of new model hypotheses. Towards that end, Bayesian methodology provides an ideal framework for tackling such challenges, and in particular offers a means of objectively comparing competing plausible models through the estimation of Bayes factors.
There are, however, formidable obstacles which must be overcome to allow model inference using Bayes factors to be of practical use. Many important biological processes may be most accurately represented using nonlinear models based on systems of ordinary differential equations (ODEs); however, parameter inference over these models often produces correspondingly nonlinear posterior distributions, which are very challenging to sample from, often resulting in biased marginal likelihood estimates with large variances. Such problems are commonly encountered when modelling circadian rhythms, which exhibit highly nonlinear oscillatory dynamics and play a central role in the overall functioning of most organisms. In this thesis I investigate tools for calculating Bayes factors to distinguish between ODE-based Goodwin oscillator models of varying complexity, which form the basic building blocks for describing this ubiquitous circadian behaviour.
The main result in Chapter 3 of this thesis demonstrates how Population Markov Chain Monte Carlo may be employed in conjunction with thermodynamic integration methods to estimate Bayes factors which may accurately distinguish between two nonlinear oscillator models of varying complexity, given noisy experimental data generated from each of the models. In addition, it is shown how alternative methods may fail drastically in this setting, in particular harmonic mean based estimates. Suggestions are given regarding the optimal temperature schedule which should be employed for Population MCMC, and several ideas for future research extending this work are also discussed.
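To see why the temperature schedule matters, the sketch below compares the pure discretisation error of the trapezoidal thermodynamic integral under two schedules. It uses a toy conjugate Gaussian model in which the expected log-likelihood at each temperature is available in closed form, so no sampling noise is involved. The model, the function names, and the power-law exponent 5 are assumptions of this example and are not taken from the thesis.

```python
import numpy as np


def expected_log_lik(temps, y, sigma=1.0, tau=1.0):
    """Closed-form E_t[log L] under the power posterior, for each t in temps."""
    n = y.size
    prec = 1.0 / tau**2 + temps * n / sigma**2
    mean = (temps * y.sum() / sigma**2) / prec
    # E[sum_i (y_i - theta)^2] = sum_i (y_i - mean)^2 + n * variance
    sq = np.sum((y[None, :] - mean[:, None]) ** 2, axis=1) + n / prec
    return -0.5 * n * np.log(2.0 * np.pi * sigma**2) - 0.5 * sq / sigma**2


def trapezoid_ti(temps, y, sigma=1.0, tau=1.0):
    """Trapezoidal rule applied to E_t[log L] on the given temperature grid."""
    f = expected_log_lik(temps, y, sigma, tau)
    return float(np.sum(0.5 * np.diff(temps) * (f[1:] + f[:-1])))


def exact_log_evidence(y, sigma=1.0, tau=1.0):
    """Closed-form log p(y) for the toy model."""
    n, s2, t2 = y.size, sigma**2, tau**2
    logdet = (n - 1) * np.log(s2) + np.log(s2 + n * t2)
    quad = (y @ y - t2 * y.sum() ** 2 / (s2 + n * t2)) / s2
    return float(-0.5 * (n * np.log(2.0 * np.pi) + logdet + quad))


def schedule_errors(y, n_temps=20, power=5):
    """Absolute discretisation error for a uniform and a power-law schedule."""
    uniform = np.linspace(0.0, 1.0, n_temps + 1)
    exact = exact_log_evidence(y)
    return (abs(trapezoid_ti(uniform, y) - exact),
            abs(trapezoid_ti(uniform**power, y) - exact))


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(1.0, 1.0, size=20)
    err_uniform, err_power = schedule_errors(data)
    print("uniform schedule error  :", err_uniform)
    print("power-law schedule error:", err_power)
```

In this toy model the integrand changes fastest near t = 0, so a schedule that concentrates temperatures near the prior spends the fixed budget of chains where the curvature is largest.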
A Bayesian Approach to the Detection Problem in Gravitational Wave Astronomy
The analysis of data from gravitational wave detectors can be divided into
three phases: search, characterization, and evaluation. The evaluation of the
detection - determining whether a candidate event is astrophysical in origin or
some artifact created by instrument noise - is a crucial step in the analysis.
The on-going analyses of data from ground based detectors employ a frequentist
approach to the detection problem. A detection statistic is chosen, for which
background levels and detection efficiencies are estimated from Monte Carlo
studies. This approach frames the detection problem in terms of an infinite
collection of trials, with the actual measurement corresponding to some
realization of this hypothetical set. Here we explore an alternative, Bayesian
approach to the detection problem, that considers prior information and the
actual data in hand. Our particular focus is on the computational techniques
used to implement the Bayesian analysis. We find that the Parallel Tempered
Markov Chain Monte Carlo (PTMCMC) algorithm is able to address all three phases
of the analysis in a coherent framework. The signals are found by locating the
posterior modes, the model parameters are characterized by mapping out the
joint posterior distribution, and finally, the model evidence is computed by
thermodynamic integration. As a demonstration, we consider the detection
problem of selecting between models describing the data as instrument noise, or
instrument noise plus the signal from a single compact galactic binary. The
evidence ratios, or Bayes factors, computed by the PTMCMC algorithm are found
to be in close agreement with those computed using a Reversible Jump Markov
Chain Monte Carlo algorithm.
Comment: 19 pages, 12 figures, revised to address referee's comments
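The combination described in this abstract, parallel tempered chains whose per-temperature averages feed a thermodynamic integral, can be sketched on a toy problem. The code below is not the PTMCMC implementation of the paper: it targets a one-parameter conjugate Gaussian model rather than a gravitational wave signal model, and the function names, proposal step, temperature ladder and iteration counts are assumptions chosen for illustration.

```python
import numpy as np


def log_lik(theta, y, sigma):
    """Gaussian log-likelihood of y, evaluated at each mean in the 1-D array theta."""
    resid = y[None, :] - theta[:, None]
    return (-0.5 * y.size * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(resid**2, axis=1) / sigma**2)


def log_prior(theta, tau):
    """Log density of the N(0, tau^2) prior."""
    return -0.5 * np.log(2.0 * np.pi * tau**2) - 0.5 * theta**2 / tau**2


def pt_log_evidence(y, sigma=1.0, tau=1.0, n_temps=20, n_iter=5000,
                    burn=1000, step=0.5, seed=0):
    """Parallel tempering over L^t * prior, then trapezoidal TI of mean log L."""
    rng = np.random.default_rng(seed)
    temps = np.linspace(0.0, 1.0, n_temps) ** 5  # assumed ladder, t = 0 is the prior
    theta = np.zeros(n_temps)                    # one state per temperature
    ll = log_lik(theta, y, sigma)
    ll_sum = np.zeros(n_temps)
    for it in range(n_iter):
        # Random-walk Metropolis update, applied to every chain at once.
        prop = theta + step * rng.normal(size=n_temps)
        ll_prop = log_lik(prop, y, sigma)
        log_alpha = (temps * (ll_prop - ll)
                     + log_prior(prop, tau) - log_prior(theta, tau))
        accept = np.log(rng.uniform(size=n_temps)) < log_alpha
        theta = np.where(accept, prop, theta)
        ll = np.where(accept, ll_prop, ll)
        # Propose exchanging the states of one random pair of adjacent chains.
        k = rng.integers(n_temps - 1)
        log_swap = (temps[k] - temps[k + 1]) * (ll[k + 1] - ll[k])
        if np.log(rng.uniform()) < log_swap:
            theta[[k, k + 1]] = theta[[k + 1, k]]
            ll[[k, k + 1]] = ll[[k + 1, k]]
        if it >= burn:
            ll_sum += ll
    means = ll_sum / (n_iter - burn)
    return float(np.sum(0.5 * np.diff(temps) * (means[1:] + means[:-1])))


def exact_log_evidence(y, sigma=1.0, tau=1.0):
    """Closed-form log p(y) for the toy model, used only as a check."""
    n, s2, t2 = y.size, sigma**2, tau**2
    logdet = (n - 1) * np.log(s2) + np.log(s2 + n * t2)
    quad = (y @ y - t2 * y.sum() ** 2 / (s2 + n * t2)) / s2
    return float(-0.5 * (n * np.log(2.0 * np.pi) + logdet + quad))
```

A log Bayes factor between two models would then be the difference of two such log evidence estimates, one run per model.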