Hidden Gibbs random fields model selection using Block Likelihood Information Criterion
Performing model selection between Gibbs random fields is a very challenging
task. Indeed, due to the Markovian dependence structure, the normalizing
constant of the fields cannot be computed using standard analytical or
numerical methods. Furthermore, such unobserved fields cannot be integrated out
and the likelihood evaluation is a doubly intractable problem. This is a
central obstacle to picking the model that best fits the observed data. We introduce a
new approximate version of the Bayesian Information Criterion. We partition the
lattice into contiguous rectangular blocks and we approximate the probability
measure of the hidden Gibbs field by the product of some Gibbs distributions
over the blocks. On that basis, we estimate the likelihood and derive the Block
Likelihood Information Criterion (BLIC) that answers model choice questions
such as the selection of the dependency structure or the number of latent
states. We study the performance of BLIC on those questions. In addition, we
present a comparison with ABC algorithms to show that the novel criterion
offers a better trade-off between time efficiency and reliable results.
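The block idea can be illustrated with a minimal sketch, under simplifying assumptions that are not taken from the paper: the field is fully observed (no hidden layer), it is a two-state Potts field with a single interaction parameter beta, and each block has free boundary conditions. On a small block the normalizing constant can be computed exactly by enumeration, so the product of block distributions gives a tractable likelihood that can be plugged into the usual BIC formula, -2 log L + d log n. All function names below are hypothetical.

```python
import itertools
import math


def block_energy(block, beta):
    """beta times the number of agreeing horizontal and vertical neighbour pairs."""
    rows, cols = len(block), len(block[0])
    agree = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and block[r][c] == block[r][c + 1]:
                agree += 1
            if r + 1 < rows and block[r][c] == block[r + 1][c]:
                agree += 1
    return beta * agree


def block_log_partition(rows, cols, beta, n_states=2):
    """Exact log normalizing constant of one block, by brute-force enumeration."""
    total = 0.0
    for flat in itertools.product(range(n_states), repeat=rows * cols):
        block = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
        total += math.exp(block_energy(block, beta))
    return math.log(total)


def block_log_likelihood(field, beta, block_shape, n_states=2):
    """Log of the product of independent Gibbs distributions over the blocks."""
    bh, bw = block_shape
    rows, cols = len(field), len(field[0])
    if rows % bh or cols % bw:
        raise ValueError("block shape must tile the lattice exactly")
    log_z = block_log_partition(bh, bw, beta, n_states)  # same for every block
    loglik = 0.0
    for r0 in range(0, rows, bh):
        for c0 in range(0, cols, bw):
            block = [row[c0:c0 + bw] for row in field[r0:r0 + bh]]
            loglik += block_energy(block, beta) - log_z
    return loglik


def bic(loglik, n_params, n_obs):
    """Standard BIC: -2 log L + d log n (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Toy 4x4 field split into four 2x2 blocks (illustrative data only).
field = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [1, 1, 0, 0],
         [1, 1, 0, 0]]
loglik = block_log_likelihood(field, beta=0.8, block_shape=(2, 2))
score = bic(loglik, n_params=1, n_obs=16)
```

In the hidden-field setting described in the abstract, the latent states would additionally have to be summed out within each block, which this sketch does not attempt.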
Pre-processing for approximate Bayesian computation in image analysis
Most of the existing algorithms for approximate Bayesian computation (ABC)
assume that it is feasible to simulate pseudo-data from the model at each
iteration. However, the computational cost of these simulations can be
prohibitive for high dimensional data. An important example is the Potts model,
which is commonly used in image analysis. Images encountered in real world
applications can have millions of pixels, so scalability is a major
concern. We apply ABC with a synthetic likelihood to the hidden Potts model
with additive Gaussian noise. Using a pre-processing step, we fit a binding
function to model the relationship between the model parameters and the
synthetic likelihood parameters. Our numerical experiments demonstrate that the
precomputed binding function dramatically improves the scalability of ABC,
reducing the average runtime required for model fitting from 71 hours to only 7
minutes. We also illustrate the method by estimating the smoothing parameter
for remotely sensed satellite imagery. Without precomputation, Bayesian
inference is impractical for datasets of that scale.
Comment: 5th IMS-ISBA joint meeting (MCMSki IV)
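The precomputation idea can be sketched on a toy problem. The simulator below is a hypothetical stand-in, not the Potts model: it returns a count of agreeing pairs whose agreement probability grows with theta. The binding function is fitted once by simulating on a coarse grid and storing the mean and standard deviation of the statistic; afterwards the Gaussian synthetic likelihood is evaluated by interpolation alone, with no further simulation. The grid, the observed value 366 and all names are assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_statistic(theta, rng, n_pairs=500):
    """Toy simulator: number of agreeing pairs out of n_pairs."""
    p_agree = 1.0 / (1.0 + math.exp(-theta))
    return sum(rng.random() < p_agree for _ in range(n_pairs))


def fit_binding_function(grid, rng, n_sims=50):
    """Pre-processing step: tabulate mean and sd of the statistic on a grid."""
    table = []
    for theta in grid:
        draws = [simulate_statistic(theta, rng) for _ in range(n_sims)]
        table.append((theta, statistics.mean(draws), statistics.stdev(draws)))
    return table


def binding(table, theta):
    """Linearly interpolate (mean, sd) between neighbouring grid points."""
    for (t0, m0, s0), (t1, m1, s1) in zip(table, table[1:]):
        if t0 <= theta <= t1:
            w = (theta - t0) / (t1 - t0)
            return m0 + w * (m1 - m0), s0 + w * (s1 - s0)
    raise ValueError("theta outside the precomputed grid")


def synthetic_loglik(table, theta, observed):
    """Gaussian synthetic log-likelihood, up to an additive constant."""
    mean, sd = binding(table, theta)
    return -math.log(sd) - 0.5 * ((observed - mean) / sd) ** 2


rng = random.Random(0)
grid = [i / 10 for i in range(21)]          # theta in [0, 2]
table = fit_binding_function(grid, rng)     # expensive part, done once
observed = 366                              # hypothetical observed statistic
candidates = [i / 100 for i in range(201)]
# With a flat prior on [0, 2], the maximizer is the posterior mode on this grid.
theta_hat = max(candidates, key=lambda t: synthetic_loglik(table, t, observed))
```

Once the table exists, each likelihood evaluation costs a lookup and an interpolation rather than a fresh simulation, which is where a saving of the kind reported in the abstract would come from.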
Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP
for a second stage also relying on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is
drastically reduced (with a gain in computation efficiency by a factor of at least fifty),
and (iv) it includes an approximation of the posterior probability of the
selected model. The call to random forests will undoubtedly extend the range of
size of datasets and complexity of models that ABC can handle. We illustrate
the power of this novel methodology by analyzing controlled experiments as well
as genuine population genetics datasets. The proposed methodologies are
implemented in the R package abcrf, available on CRAN.
Comment: 39 pages, 15 figures, 6 tables
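The classification view of model choice can be sketched in plain Python. This is not the abcrf implementation: it uses bagged decision stumps with random feature subsets as a deliberately small stand-in for a full random forest, and the two candidate models (Poisson versus geometric counts), the prior on the mean and the three summary statistics are all assumptions chosen for illustration. The reference table is simulated from the prior, the classifier is trained on (summaries, model index) pairs, and the observed summaries are then classified.

```python
import math
import random


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_geometric(mean, rng):
    """Geometric on {0, 1, ...} with the given mean, by inversion."""
    p = 1.0 / (1.0 + mean)
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p))


def summaries(data):
    """Summary statistics: mean, sample variance, dispersion ratio."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    return (mean, var, var / mean if mean > 0 else 0.0)


def reference_table(n_sims, n_obs, rng):
    """Simulate (summaries, model index) pairs from the prior."""
    rows, labels = [], []
    for _ in range(n_sims):
        model = int(rng.random() < 0.5)          # 0 = Poisson, 1 = geometric
        lam = rng.uniform(2.0, 5.0)              # assumed prior on the mean
        draw = draw_geometric if model else draw_poisson
        rows.append(summaries([draw(lam, rng) for _ in range(n_obs)]))
        labels.append(model)
    return rows, labels


def fit_stump(rows, labels, features):
    """Best single split (fewest misclassifications) among the given features."""
    best, n, n1 = None, len(rows), sum(labels)
    for f in features:
        order = sorted(range(n), key=lambda i: rows[i][f])
        left1 = 0
        for k in range(1, n):
            left1 += labels[order[k - 1]]
            lo, hi = rows[order[k - 1]][f], rows[order[k]][f]
            if lo == hi:
                continue
            left0, right1 = k - left1, n1 - left1
            right0 = (n - k) - right1
            errors = min(left0, left1) + min(right0, right1)
            if best is None or errors < best[0]:
                best = (errors, f, (lo + hi) / 2,
                        int(left1 > left0), int(right1 > right0))
    return best[1:]


def fit_forest(rows, labels, n_trees, rng):
    """Bagging: each stump sees a bootstrap sample and 2 of the 3 features."""
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(rows)), k=len(rows))
        feats = rng.sample(range(3), 2)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def vote_fraction(forest, x):
    """Fraction of stumps voting for model 1 (geometric)."""
    votes = sum(left if x[f] <= thr else right
                for f, thr, left, right in forest)
    return votes / len(forest)


rng = random.Random(1)
rows, labels = reference_table(n_sims=300, n_obs=50, rng=rng)
forest = fit_forest(rows, labels, n_trees=25, rng=rng)
overdispersed = [0, 0, 0, 1, 1, 2, 3, 5, 8, 12] * 5   # illustrative data
poisson_like = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5] * 5     # illustrative data
pick_a = int(vote_fraction(forest, summaries(overdispersed)) > 0.5)
pick_b = int(vote_fraction(forest, summaries(poisson_like)) > 0.5)
```

As the abstract stresses, the vote fraction is only used to pick a model; it is not an estimate of the posterior probability, which the authors obtain in a second stage that is not reproduced here.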
Likelihood-free model choice
Fan, and Beaumont (2017). Beyond exposing the potential pitfalls of ABC approximations to posterior probabilities, the review mainly emphasizes the solution proposed by [25], which uses random forests to aggregate summary statistics and estimates the posterior probability of the most likely model via a secondary random forest.
Inference with selection, varying population size and evolving population structure: Application of ABC to a forward-backward coalescent process with interactions
Genetic data are often used to infer history and demographic changes or to detect genes under selection. Inferential methods are commonly based on models that make various strong assumptions: demography and population structure are assumed to be known a priori, the evolution of the genetic composition of a population affects neither demography nor population structure, and there is neither selection nor interaction between and within genetic strains. In this paper, we present a stochastic birth-death model with competitive interaction to describe an asexual population, and we develop an inferential procedure for ecological, demographic and genetic parameters. We first show how genetic diversity and genealogies are related to birth and death rates, and to how individuals compete within and between strains. This leads us to propose an original model of phylogenies, with trait structure and interactions, that allows multiple mergings. Second, we develop an Approximate Bayesian Computation framework to use our model for analyzing genetic data. We apply our procedure to simulated and real data. We show that the procedure gives accurate estimates of the parameters of the model. We finally carry out an illustration on real data and analyze the genetic diversity of microsatellites on Y-chromosomes sampled from Central Asian populations in order to test whether different social organizations show significantly different fertility.
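A forward simulation of a birth-death process with competition can be sketched with the Gillespie algorithm. The parameterization below is an assumption for illustration, not the paper's model: each individual of strain i gives birth at rate birth[i] and dies at rate death[i] plus a competition term that sums competition[i][j] times the size of strain j. Genealogies, mutation and the ABC step are not covered.

```python
import random


def simulate_birth_death(n0, birth, death, competition, t_max, rng):
    """Gillespie simulation; returns a list of (time, strain counts)."""
    counts = list(n0)
    k = len(counts)
    t = 0.0
    path = [(t, tuple(counts))]
    while True:
        rates = []
        for i in range(k):
            pressure = sum(competition[i][j] * counts[j] for j in range(k))
            rates.append(birth[i] * counts[i])                # birth in strain i
            rates.append((death[i] + pressure) * counts[i])   # death in strain i
        total = sum(rates)
        if total == 0.0:          # every strain is extinct
            break
        t += rng.expovariate(total)   # waiting time to the next event
        if t > t_max:
            break
        event = rng.choices(range(2 * k), weights=rates)[0]
        strain, is_death = divmod(event, 2)
        counts[strain] += -1 if is_death else 1
        path.append((t, tuple(counts)))
    return path


rng = random.Random(42)
path = simulate_birth_death(
    n0=(20, 20),
    birth=(1.0, 1.2),
    death=(0.1, 0.1),
    competition=((0.01, 0.005), (0.005, 0.01)),  # within / between strains
    t_max=5.0,
    rng=rng,
)
final_time, final_counts = path[-1]
```

In an ABC setting, a simulator of this kind would be run repeatedly with parameters drawn from the prior, and summary statistics of the simulated output would be compared with those of the observed data.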