Accelerating MCMC via Parallel Predictive Prefetching
We present a general framework for accelerating a large class of widely used
Markov chain Monte Carlo (MCMC) algorithms. Our approach exploits fast,
iterative approximations to the target density to speculatively evaluate many
potential future steps of the chain in parallel. The approach can accelerate
computation of the target distribution of a Bayesian inference problem, without
compromising exactness, by exploiting subsets of data. It takes advantage of
whatever parallel resources are available, but produces results exactly
equivalent to standard serial execution. In the initial burn-in phase of chain
evaluation, it achieves speedup over serial evaluation that is close to linear
in the number of available cores.
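The speculative-evaluation idea can be sketched in a few lines. Everything below is an illustrative assumption (a toy Gaussian target, a depth-2 speculation tree, Python threads standing in for the paper's parallel workers), not the authors' implementation:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def log_target(x):
    # Toy target: standard normal log-density (stand-in for an expensive posterior).
    return -0.5 * x * x

def prefetch_mh(n_steps, x0=0.0, step=1.0, seed=0):
    """Metropolis sampler that speculatively evaluates the depth-2 tree of
    possible future states in parallel. One of the two depth-2 evaluations
    is always discarded, but the realized chain is identical to a serial
    run that used the same per-node random draws."""
    rng = random.Random(seed)
    x = x0
    lx = log_target(x)
    chain = [x]
    with ThreadPoolExecutor(max_workers=3) as pool:
        while len(chain) <= n_steps:
            # Proposal for step k, plus one proposal for each branch of step k+1.
            y1 = x + rng.gauss(0.0, step)       # root proposal
            y_acc = y1 + rng.gauss(0.0, step)   # next proposal if y1 is accepted
            y_rej = x + rng.gauss(0.0, step)    # next proposal if y1 is rejected
            futs = [pool.submit(log_target, z) for z in (y1, y_acc, y_rej)]
            ly1, lacc, lrej = (f.result() for f in futs)
            # Step k: standard Metropolis decision.
            if math.log(rng.random()) < ly1 - lx:
                x, lx, y2, ly2 = y1, ly1, y_acc, lacc
            else:
                y2, ly2 = y_rej, lrej
            chain.append(x)
            # Step k+1 reuses the prefetched evaluation; no new density call.
            if math.log(rng.random()) < ly2 - lx:
                x, lx = y2, ly2
            chain.append(x)
    return chain[: n_steps + 1]
```

A real implementation would distribute the density evaluations over processes or cluster nodes; threads are used here only to keep the sketch self-contained.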
An alternative marginal likelihood estimator for phylogenetic models
Bayesian phylogenetic methods are generating noticeable enthusiasm in the
field of molecular systematics. Many competing phylogenetic models are typically under consideration, and different approaches are used to compare them within a Bayesian framework. The Bayes factor, defined as the ratio of the marginal likelihoods of two competing models, plays a key role in Bayesian model selection. We focus on an alternative estimator of the marginal likelihood, whose computation remains a challenging problem. Several computational solutions have been proposed, none of which outperforms the others simultaneously in terms of simplicity of implementation, computational burden, and precision of the estimates. Practitioners and researchers, often led by the available software, have so far favored the simplicity of the harmonic mean (HM) and arithmetic mean (AM) estimators. However, the resulting estimates of the Bayesian evidence in favor of one model are known to be biased and often inaccurate, up to having infinite variance, so that the reliability of the corresponding conclusions is doubtful. Our new implementation of the generalized harmonic mean (GHM) idea recycles MCMC simulations from the posterior and shares the computational simplicity of the original HM estimator but, unlike it, overcomes the infinite-variance issue. The alternative estimator is applied to simulated phylogenetic data and produces fully satisfactory results, outperforming the simple estimators currently provided by most publicly available software.
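The GHM identity the abstract alludes to — 1/m = E_posterior[g(θ)/(L(θ)π(θ))] for any proper density g — can be checked on a toy conjugate model where the exact marginal likelihood is available in closed form. The model, the exact posterior draws, and the shrunken-normal choice of g below are illustrative assumptions, not the paper's phylogenetic setup:

```python
import math
import random

def ghm_log_marginal(draws, log_lik, log_prior, g_logpdf):
    """Generalized harmonic mean: since 1/m = E_post[g(t) / (L(t) pi(t))]
    for any proper density g, average that ratio over posterior draws.
    Choosing g with lighter tails than the posterior keeps the variance
    finite, unlike the plain HM estimator (which in effect takes g = prior)."""
    logs = [g_logpdf(t) - log_lik(t) - log_prior(t) for t in draws]
    m = max(logs)  # log-sum-exp for numerical stability
    return -(m + math.log(sum(math.exp(v - m) for v in logs) / len(logs)))

# Toy conjugate model (assumption): y_i ~ N(theta, 1), theta ~ N(0, 1).
def log_norm(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

rng = random.Random(0)
y = [rng.gauss(0.5, 1.0) for _ in range(20)]
n, ybar = len(y), sum(y) / len(y)
post_mean, post_var = n * ybar / (n + 1), 1.0 / (n + 1)

log_lik = lambda t: sum(log_norm(yi, t, 1.0) for yi in y)
log_prior = lambda t: log_norm(t, 0.0, 1.0)
# g: normal with half the posterior variance -> lighter tails than the posterior.
g_logpdf = lambda t: log_norm(t, post_mean, 0.5 * post_var)

# Exact posterior draws are available here; in practice these would be MCMC output.
draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(20000)]
est = ghm_log_marginal(draws, log_lik, log_prior, g_logpdf)

# Exact log marginal: y ~ N(0, I + 11^T), evaluated via Sherman-Morrison.
sum_y, sum_y2 = sum(y), sum(yi * yi for yi in y)
exact = (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1)
         - 0.5 * (sum_y2 - sum_y ** 2 / (n + 1)))
```

With g equal to the exact posterior the ratio would be constant at 1/m; any g close to the posterior but with lighter tails trades a little variance for a finite-variance guarantee.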
Likelihood-Free Parallel Tempering
Approximate Bayesian Computation (ABC) methods (or likelihood-free methods)
have appeared in the past fifteen years as useful tools for performing Bayesian
analyses when the likelihood is analytically or computationally intractable.
Several ABC methods have been proposed: Markov chain Monte Carlo (MCMC)
methods have been developed, for instance, by Marjoram et al. (2003) and by
Bortot et al. (2007), and sequential methods have been proposed, among others,
by Sisson et al. (2007), Beaumont et al. (2009), and Del Moral et al. (2009).
While ABC-MCMC methods remain the reference, sequential ABC methods have
appeared to outperform them (see, for example, McKinley et al. (2009) or Sisson
et al. (2007)). In this paper, a new algorithm combining population-based MCMC
methods with ABC requirements is proposed, using an analogy with the Parallel
Tempering algorithm (Geyer, 1991). Its performance is compared with existing ABC
algorithms on simulations and on a real example.
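A minimal sketch of the idea, assuming a toy Gaussian model and the simplest hard-threshold ABC kernel; the swap rule (a state from a hotter, larger-tolerance chain may move to a colder chain only if its simulated distance also meets the colder tolerance) is one natural reading of the parallel-tempering analogy, not necessarily the paper's exact scheme:

```python
import random

def abc_parallel_tempering(y_obs, epsilons, n_iter, seed=0):
    """Likelihood-free parallel tempering sketch: one ABC-MCMC chain per
    tolerance level (epsilons sorted cold to hot), plus swap moves between
    adjacent chains. Toy model (assumption): y ~ Normal(theta, 1) with
    theta ~ Uniform(-10, 10); distance = |simulated mean - observed mean|."""
    rng = random.Random(seed)
    n = len(y_obs)
    obs_mean = sum(y_obs) / n

    def simulate_distance(theta):
        sim_mean = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        return abs(sim_mean - obs_mean)

    # Initialise each chain at a prior draw that already meets its tolerance.
    states = []
    for eps in epsilons:
        while True:
            t = rng.uniform(-10.0, 10.0)
            d = simulate_distance(t)
            if d < eps:
                states.append((t, d))
                break

    samples = [[] for _ in epsilons]
    for _ in range(n_iter):
        # ABC-MCMC update within each chain (uniform prior, symmetric proposal).
        for i, eps in enumerate(epsilons):
            t, _ = states[i]
            t_new = t + rng.gauss(0.0, 0.5)
            if -10.0 <= t_new <= 10.0:
                d_new = simulate_distance(t_new)
                if d_new < eps:  # likelihood-free acceptance
                    states[i] = (t_new, d_new)
            samples[i].append(states[i][0])
        # Swap attempt between a random adjacent pair. The colder state always
        # satisfies the hotter tolerance, so the MH swap ratio is 1 exactly
        # when the hotter state also meets the colder tolerance, 0 otherwise.
        i = rng.randrange(len(epsilons) - 1)
        (tc, dc), (th, dh) = states[i], states[i + 1]
        if dh < epsilons[i]:
            states[i], states[i + 1] = (th, dh), (tc, dc)
    return samples
```

The hotter (larger-epsilon) chains mix freely over the prior and feed well-separated modes into the cold chain through swaps, mirroring the role of high temperatures in ordinary parallel tempering.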
Bayesian inference on compact binary inspiral gravitational radiation signals in interferometric data
Presented is a description of a Markov chain Monte Carlo (MCMC) parameter
estimation routine for use with interferometric gravitational radiation data
in searches for binary neutron star inspiral signals. Five parameters
associated with the inspiral can be estimated, and summary statistics are
produced. Advanced MCMC methods were implemented, including importance
resampling and prior distributions based on detection probability, in order to
increase the efficiency of the code. An example is presented from an
application using realistic, albeit fictitious, data.
Comment: submitted to Classical and Quantum Gravity. 14 pages, 5 figures.
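The abstract does not spell out the resampling scheme; a generic importance-resampling step, of the kind often used to seed MCMC chains from a cheap trial distribution, might look like the following (the trial and target densities below are illustrative assumptions):

```python
import math
import random

def importance_resample(draws, log_weights, n_out, seed=0):
    """Resample draws with replacement, with probability proportional to
    their normalized importance weights. The output is approximately
    distributed according to the target that the weights point at."""
    rng = random.Random(seed)
    m = max(log_weights)  # shift before exponentiating, for stability
    w = [math.exp(lw - m) for lw in log_weights]
    return rng.choices(draws, weights=w, k=n_out)

# Toy demo (assumption, not the paper's setup): trial N(1, 1), target N(2, 1),
# so the log importance weight is log N(x; 2, 1) - log N(x; 1, 1) = x - 1.5.
rng = random.Random(1)
trial = [rng.gauss(1.0, 1.0) for _ in range(20000)]
logw = [x - 1.5 for x in trial]
resampled = importance_resample(trial, logw, 20000, seed=2)
```

Combined with a prior informed by detection probability, such a step concentrates the chain's starting points in the region the data actually support.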
Analytic Continuation of Quantum Monte Carlo Data by Stochastic Analytical Inference
We present an algorithm for the analytic continuation of imaginary-time
quantum Monte Carlo data which is strictly based on principles of Bayesian
statistical inference. Within this framework we are able to obtain an explicit
expression for the calculation of a weighted average over possible energy
spectra, which can be evaluated by standard Monte Carlo simulations, yielding,
as a by-product, the distribution function of the regularization parameter. Our
algorithm thus avoids the usual ad hoc assumptions introduced in similar
algorithms to fix the regularization parameter. We apply the algorithm to
imaginary-time quantum Monte Carlo data and compare the resulting energy
spectra with those from a standard maximum entropy calculation.
A Bayesian approach to the semi-analytic model of galaxy formation: methodology
We believe that a wide range of physical processes conspire to shape the
observed galaxy population but we remain unsure of their detailed interactions.
The semi-analytic model (SAM) of galaxy formation uses multi-dimensional
parameterisations of the physical processes of galaxy formation and provides a
tool to constrain these underlying physical interactions. Because of the high
dimensionality, the parametric problem of galaxy formation may be profitably
tackled with a Bayesian-inference based approach, which allows one to constrain
theory with data in a statistically rigorous way. In this paper we develop a
SAM in the framework of Bayesian inference. We show that, with a parallel
implementation of an advanced Markov chain Monte Carlo algorithm, it is now
possible to rigorously sample the posterior distribution of the
high-dimensional parameter space of typical SAMs. As an example, we
characterise galaxy formation in the current ΛCDM cosmology using the
stellar mass function of galaxies as an observational constraint. We find that
the posterior probability distribution is both topologically complex and
degenerate in some important model parameters, suggesting that thorough
explorations of the parameter space are needed to understand the models. We
also demonstrate that, because of the model degeneracy, adopting a narrow prior
strongly restricts the model. Therefore, inferences based on SAMs are
conditional on the model adopted. Using synthetic data to mimic systematic
errors in the stellar mass function, we demonstrate that an accurate
observational error model is essential for meaningful inference.
Comment: revised version matching the article published in MNRAS.
Priors for Random Count Matrices Derived from a Family of Negative Binomial Processes
We define a family of probability distributions for random count matrices
with a potentially unbounded number of rows and columns. The three
distributions we consider are derived from the gamma-Poisson, gamma-negative
binomial, and beta-negative binomial processes. Because the models lead to
closed-form Gibbs sampling update equations, they are natural candidates for
nonparametric Bayesian priors over count matrices. A key aspect of our analysis
is the recognition that, although the random count matrices within the family
are defined by a row-wise construction, their columns can be shown to be i.i.d.
This fact is used to derive explicit formulas for drawing all the columns at
once. Moreover, by analyzing these matrices' combinatorial structure, we
describe how to sequentially construct a column-i.i.d. random count matrix one
row at a time, and derive the predictive distribution of a new row count vector
with previously unseen features. We describe the similarities and differences
between the three priors, and argue that the greater flexibility of the gamma-
and beta-negative binomial processes, especially their ability to model
over-dispersed, heavy-tailed count data, makes these priors well suited to a wide
variety of real-world applications. As an example of our framework, we
construct a naive-Bayes text classifier to categorize a count vector to one of
several existing random count matrices of different categories. The classifier
supports an unbounded number of features, and unlike most existing methods, it
does not require a predefined finite vocabulary to be shared by all the
categories, and needs neither feature selection nor parameter tuning. Both the
gamma- and beta-negative binomial processes are shown to significantly
outperform the gamma-Poisson process for document categorization, with
performance comparable to other state-of-the-art supervised text classification
algorithms.
Comment: To appear in Journal of the American Statistical Association (Theory
and Methods). 31 pages + 11-page supplement, 5 figures.
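As a rough illustration of categorizing a count vector against per-category count matrices, here is a plain multinomial naive Bayes with Laplace smoothing and a vocabulary formed on the fly — a deliberately simplified stand-in for the paper's nonparametric priors (which, unlike this sketch, need no smoothing parameter and no shared vocabulary at all):

```python
import math
from collections import Counter

def train_counts(docs_by_class):
    """Aggregate each class's documents into a single per-class count vector
    (one row of a count matrix per document, summed over rows).
    docs_by_class: {label: [list of token lists]}."""
    return {c: Counter(tok for doc in docs for tok in doc)
            for c, docs in docs_by_class.items()}

def classify(tokens, class_counts, alpha=1.0):
    """Multinomial naive Bayes with symmetric Dirichlet (Laplace) smoothing
    and a uniform class prior (both simplifying assumptions). The vocabulary
    is the union of all words seen in training plus the test document, so
    nothing needs to be fixed in advance."""
    vocab = set(tokens)
    for counts in class_counts.values():
        vocab.update(counts)
    V = len(vocab)
    best, best_lp = None, -math.inf
    for c, counts in class_counts.items():
        total = sum(counts.values())
        lp = sum(math.log((counts.get(w, 0) + alpha) / (total + alpha * V))
                 for w in tokens)
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

The negative binomial process priors replace the fixed smoothing constant `alpha` with a predictive distribution that also assigns mass to previously unseen features, which is what lets the paper's classifier grow its feature set without bound.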