Adaptive approximate Bayesian computation for complex models
Approximate Bayesian computation (ABC) is a family of computational techniques in Bayesian statistics. These techniques make it possible to fit a model to data without relying on the computation of the model likelihood. Instead, they require the model being fitted to be simulated a large number of times. A number of refinements to the original rejection-based ABC scheme have been proposed, including the sequential improvement of posterior distributions. This technique decreases the number of model simulations required, but it still presents several shortcomings that are particularly problematic for complex models that are costly to simulate. Here we provide a new algorithm to perform adaptive approximate Bayesian computation, which is shown to perform better on both a toy example and a complex social model.
Comment: 14 pages, 5 figures
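For orientation, here is a minimal sketch of the baseline rejection-ABC scheme that such adaptive refinements build on. The Gaussian toy model, uniform prior, mean summary statistic, and fixed tolerance are illustrative assumptions, not the paper's algorithm:

```python
# Minimal rejection ABC on a toy Gaussian-mean problem (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.0, size=50)   # synthetic "data"

def simulate(theta, size=50):
    # The model to be fitted: a Gaussian with unknown mean, unit variance.
    return rng.normal(loc=theta, scale=1.0, size=size)

def distance(x, y):
    # Compare data sets through a summary statistic (the sample mean).
    return abs(x.mean() - y.mean())

def rejection_abc(n_samples, tolerance):
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(-10.0, 10.0)             # draw from the prior
        if distance(simulate(theta), observed) < tolerance:
            accepted.append(theta)                   # keep if simulation is close
    return np.array(accepted)

posterior = rejection_abc(n_samples=500, tolerance=0.1)
print(posterior.mean(), posterior.std())
```

The adaptive schemes discussed above replace the fixed prior draws and fixed tolerance with a sequence of intermediate distributions and shrinking tolerances, reusing information across rounds to cut the number of simulations.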
Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is well-suited to complex problems for which the likelihood is either mathematically or computationally intractable. However, methods that use rejection suffer from the curse of dimensionality as the number of summary statistics increases. Here we propose a machine-learning approach to the estimation of the posterior density that introduces two innovations. The new method fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling. The new algorithm is compared to state-of-the-art approximate Bayesian methods, and achieves a considerable reduction of the computational burden in two examples of inference: in statistical genetics and in a queueing model.
Comment: 4 figures; version 3, minor changes; to appear in Statistics and Computing
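To illustrate the regression-adjustment idea in its simplest form, here is a sketch of the classical linear adjustment; the paper's contribution replaces this linear fit with a nonlinear, heteroscedastic regression and adds importance sampling. The function names and dummy data are illustrative:

```python
# Linear regression adjustment of accepted ABC draws (illustrative sketch).
import numpy as np

def regression_adjust(theta, summaries, observed_summary):
    # Fit theta ~ a + b * s by least squares on accepted ABC draws,
    # then shift each draw towards the observed summary statistic:
    #   theta_adj = theta - b * (s - s_obs)
    X = np.column_stack([np.ones_like(summaries), summaries])
    coef, *_ = np.linalg.lstsq(X, theta, rcond=None)
    b = coef[1]
    return theta - b * (summaries - observed_summary)

# Usage on dummy accepted draws:
rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, size=1000)                 # accepted parameters
summaries = theta + rng.normal(0.0, 0.5, size=1000)     # their summaries
adjusted = regression_adjust(theta, summaries, observed_summary=0.3)
```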
Choosing summary statistics by least angle regression for approximate Bayesian computation
Bayesian statistical inference relies on the posterior distribution. Depending on the model, the posterior can be more or less difficult to derive. In recent years, there has been much interest in complex settings where the likelihood is analytically intractable. In such situations, approximate Bayesian computation (ABC) provides an attractive way of carrying out Bayesian inference. To obtain reliable posterior estimates, however, it is important to keep the approximation errors in ABC small. The choice of an appropriate set of summary statistics plays a crucial role in this effort. Here, we report the development of a new algorithm, based on least angle regression, for choosing summary statistics. In two population genetic examples, the performance of the new algorithm is better than that of a previously proposed approach that uses partial least squares.
Funding: Higher Education Commission (HEC); College Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia, research group project RGP-VPP-280
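As a sketch of how least angle regression can rank candidate summaries, the following uses scikit-learn's Lars on simulated (parameter, statistics) pairs; the simulated data and the use of the LARS entry order as a ranking are illustrative assumptions, not the paper's exact algorithm:

```python
# Ranking candidate summary statistics with least angle regression (sketch).
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(2)
n_sims, n_stats = 2000, 20
stats = rng.normal(size=(n_sims, n_stats))               # candidate summaries
theta = stats[:, 0] + 0.5 * stats[:, 3] + rng.normal(scale=0.1, size=n_sims)

# LARS enters predictors one at a time; the entry order ranks the
# candidate statistics by their relevance to the parameter.
lars = Lars(n_nonzero_coefs=5).fit(stats, theta)
selected = lars.active_   # indices of statistics, in order of entry
print("selected summary statistics:", selected)
```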
Simulation-based model selection for dynamical systems in systems and population biology
Computer simulations have become an important tool across the biomedical
sciences and beyond. For many important problems several different models or
hypotheses exist and choosing which one best describes reality or observed data
is not straightforward. We therefore require suitable statistical tools that
allow us to choose rationally between different mechanistic models of e.g.
signal transduction or gene regulation networks. This is particularly
challenging in systems biology where only a small number of molecular species
can be assayed at any given time and all measurements are subject to
measurement uncertainty. Here we develop such a model selection framework based
on approximate Bayesian computation and employing sequential Monte Carlo
sampling. We show that our approach can be applied across a wide range of
biological scenarios, and we illustrate its use on real data describing
influenza dynamics and the JAK-STAT signalling pathway. Bayesian model
selection strikes a balance between the complexity of the simulation models and
their ability to describe observed data. The present approach enables us to apply this formal apparatus to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
Comment: This article is in press in Bioinformatics, 2009. Advance Access is available on the Bioinformatics webpage.
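The core model-choice idea can be sketched in its simplest rejection form: treat the model index as an extra discrete parameter and estimate posterior model probabilities by acceptance frequencies. The paper embeds this within sequential Monte Carlo; the two toy models below are illustrative assumptions:

```python
# Rejection-ABC model choice between two toy models (illustrative sketch).
import numpy as np

rng = np.random.default_rng(3)
observed = rng.poisson(lam=4.0, size=30)

def simulate(model, size=30):
    if model == 0:
        return rng.poisson(lam=rng.uniform(0, 10), size=size)    # Poisson model
    return rng.geometric(p=rng.uniform(0.05, 0.95), size=size)   # geometric model

def distance(x, y):
    return abs(x.mean() - y.mean()) + abs(x.var() - y.var())

accepted = []
while len(accepted) < 1000:
    m = rng.integers(2)                     # model index from a uniform prior
    if distance(simulate(m), observed) < 1.0:
        accepted.append(m)

# Posterior model probabilities are acceptance frequencies per model.
print(np.bincount(accepted, minlength=2) / len(accepted))
```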
Bayesian Parameter Estimation for Latent Markov Random Fields and Social Networks
Undirected graphical models are widely used in statistics, physics and
machine vision. However, Bayesian parameter estimation for undirected models is
extremely challenging, since evaluation of the posterior typically involves the
calculation of an intractable normalising constant. This problem has received
much attention, but very little of this has focussed on the important practical
case where the data consists of noisy or incomplete observations of the
underlying hidden structure. This paper specifically addresses this problem,
comparing two alternative methodologies. In the first of these approaches
particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently
explore the parameter space, combined with the exchange algorithm (Murray et
al., 2006) for avoiding the calculation of the intractable normalising constant
(a proof showing that this combination targets the correct distribution is found in a supplementary appendix online). This approach is compared with
approximate Bayesian computation (Pritchard et al., 1999). Applications to
estimating the parameters of Ising models and exponential random graphs from
noisy data are presented. Each algorithm used in the paper targets an
approximation to the true posterior due to the use of MCMC to simulate from the
latent graphical model, in lieu of being able to do this exactly in general.
The supplementary appendix also describes the nature of the resulting
approximation.
Comment: 26 pages, 2 figures; accepted in the Journal of Computational and Graphical Statistics (http://www.amstat.org/publications/jcgs.cfm)
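For concreteness, here is a sketch of the exchange algorithm (Murray et al., 2006) on a toy exponential model whose normalising constant is treated as unknown, so that exact auxiliary simulation is possible; the model, flat prior, and random-walk proposal are illustrative assumptions:

```python
# Exchange algorithm on a toy model with a "pretend-unknown" normalising
# constant (illustrative sketch).
import numpy as np

rng = np.random.default_rng(4)
y_obs = rng.exponential(scale=1.0 / 2.5, size=40)    # "observed" data, rate 2.5

def log_q(y, theta):
    # Unnormalised log-likelihood: q(y | theta) = exp(-theta * sum(y)).
    return -theta * y.sum()

def sample_model(theta, size):
    # Exact simulation from the model, as the exchange algorithm requires.
    return rng.exponential(scale=1.0 / theta, size=size)

theta, chain = 1.0, []
for _ in range(5000):
    prop = abs(theta + rng.normal(scale=0.3))        # reflected random walk
    y_aux = sample_model(prop, y_obs.size)           # auxiliary data at proposal
    # Normalising constants cancel in this ratio; that is the key trick.
    log_a = (log_q(y_obs, prop) - log_q(y_obs, theta)
             + log_q(y_aux, theta) - log_q(y_aux, prop))
    if np.log(rng.uniform()) < log_a:                # flat prior on theta > 0 assumed
        theta = prop
    chain.append(theta)

print(np.mean(chain[1000:]))                         # roughly 2.5 after burn-in
```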
Global parameter identification of stochastic reaction networks from single trajectories
We consider the problem of inferring the unknown parameters of a stochastic
biochemical network model from a single measured time-course of the
concentration of some of the involved species. Such measurements are available,
e.g., from live-cell fluorescence microscopy in image-based systems biology. In
addition, fluctuation time-courses from, e.g., fluorescence correlation
spectroscopy provide additional information about the system dynamics that can
be used to more robustly infer parameters than when considering only mean
concentrations. Estimating model parameters from a single experimental
trajectory enables single-cell measurements and quantification of cell-to-cell
variability. We propose a novel combination of an adaptive Monte Carlo sampler,
called Gaussian Adaptation, and efficient exact stochastic simulation
algorithms that allows parameter identification from single stochastic
trajectories. We benchmark the proposed method on a linear and a non-linear
reaction network at steady state and during transient phases. In addition, we
demonstrate that the present method also provides an ellipsoidal volume
estimate of the viable part of parameter space and is able to estimate the
physical volume of the compartment in which the observed reactions take place.
Comment: Article in press as a book chapter in Springer's "Advances in Systems Biology".
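The exact stochastic simulation component can be sketched with Gillespie's algorithm on a toy birth-death network; the network and rate constants are illustrative assumptions, not the reaction systems benchmarked in the paper:

```python
# Gillespie (exact stochastic simulation) for a toy birth-death process.
import numpy as np

def gillespie_birth_death(k_birth, k_death, x0=10, t_end=10.0, seed=5):
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while t < t_end:
        rates = np.array([k_birth, k_death * x])     # reaction propensities
        total = rates.sum()
        if total == 0:
            break
        t += rng.exponential(1.0 / total)            # time to next reaction
        # Pick which reaction fires, proportional to its propensity.
        x += 1 if rng.uniform() < rates[0] / total else -1
        times.append(t)
        counts.append(x)
    return np.array(times), np.array(counts)

times, counts = gillespie_birth_death(k_birth=5.0, k_death=0.5)
```

The proposed method treats such a simulator as a black box, using Gaussian Adaptation to search parameter space for values whose simulated trajectories match the single observed one.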
Piecewise Approximate Bayesian Computation: fast inference for discretely observed Markov models using a factorised posterior distribution
Many modern statistical applications involve inference for complicated stochastic models for which the likelihood function is difficult or even impossible to calculate, so conventional likelihood-based inferential techniques cannot be used. In such settings, Bayesian inference can be performed using Approximate Bayesian Computation (ABC). However, in spite of many recent developments in ABC methodology, in many applications the computational cost of ABC necessitates choices of summary statistics and tolerances that can severely bias the estimate of the posterior.
We propose a new “piecewise” ABC approach suitable for discretely observed Markov models that involves writing the posterior density of the parameters as a product of factors, each a function of only a subset of the data, and then using ABC within each factor. The approach has the advantage of sidestepping the need to choose a summary statistic, and it enables a stringent tolerance to be set, making the posterior “less approximate”. We investigate two methods for estimating the posterior density based on the ABC samples for each of the factors: the first uses a Gaussian approximation for each factor, and the second uses a kernel density estimate. Both methods have their merits: the Gaussian approximation is simple, fast, and probably adequate for many applications, while the kernel density estimate has the benefit of consistently estimating the true piecewise ABC posterior as the number of ABC samples tends to infinity. We illustrate the piecewise ABC approach with four examples; in each case, the approach offers fast and accurate inference.
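A sketch of the piecewise idea with the Gaussian option, on an assumed AR(1) stand-in for a discretely observed Markov model: run ABC on each consecutive-observation factor, fit a Gaussian to each factor's accepted draws, and combine the Gaussians by adding precisions. This glosses over details the paper treats carefully, such as the correction for the repeated prior:

```python
# Piecewise ABC with per-factor Gaussian approximations (illustrative sketch).
import numpy as np

rng = np.random.default_rng(6)
y = [0.0]
for _ in range(20):                                   # simulated observations
    y.append(0.7 * y[-1] + rng.normal(scale=0.5))

def abc_factor(y_prev, y_next, n=300, tol=0.05):
    accepted = []
    while len(accepted) < n:
        theta = rng.uniform(-1, 1)                    # prior on AR coefficient
        sim = theta * y_prev + rng.normal(scale=0.5)  # one-step simulation
        if abs(sim - y_next) < tol:                   # raw-data tolerance, no summary
            accepted.append(theta)
    return np.array(accepted)

# Gaussian approximation per factor, combined by precision weighting.
prec_total, mean_total = 0.0, 0.0
for t in range(1, len(y)):
    draws = abc_factor(y[t - 1], y[t])
    prec = 1.0 / draws.var()
    prec_total += prec
    mean_total += prec * draws.mean()

print("combined posterior mean:", mean_total / prec_total)
```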
Bayesian model comparison with un-normalised likelihoods
Models for which the likelihood function can be evaluated only up to a parameter-dependent unknown normalizing constant, such as Markov random field models, are used widely in computer science, statistical physics, spatial statistics, and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible, due to the intractability of their likelihood functions. Several methods that permit exact, or close to exact, simulation from the posterior distribution have recently been developed. However, estimating the evidence and Bayes factors for these models remains challenging in general. This paper describes new random-weight importance sampling and sequential Monte Carlo methods for estimating Bayes factors that use simulation to circumvent the evaluation of the intractable likelihood, and compares them to existing methods. In some cases we observe an advantage in the use of biased weight estimates. An initial investigation into the theoretical and empirical properties of this class of methods is presented. Some support for the use of biased estimates is found, but we advocate caution in the use of such estimates.
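The random-weight idea can be sketched for the evidence of a toy Gaussian latent-variable model, where each importance weight is an unbiased Monte Carlo estimate of the likelihood rather than an exact evaluation; the model and proposal are illustrative assumptions:

```python
# Evidence estimation by importance sampling with estimated (random) weights.
import numpy as np

rng = np.random.default_rng(7)
y_obs = 1.3                                           # single observation

def likelihood_estimate(theta, n_inner=50):
    # Unbiased estimate of N(y_obs | theta, 2), obtained by averaging
    # N(y_obs | theta + z, 1) over latent draws z ~ N(0, 1).
    z = rng.normal(size=n_inner)
    return np.mean(np.exp(-0.5 * (y_obs - theta - z) ** 2) / np.sqrt(2 * np.pi))

# Importance sampling from the prior N(0, 1); the evidence estimate remains
# unbiased even though each weight is itself random.
thetas = rng.normal(size=2000)
weights = np.array([likelihood_estimate(t) for t in thetas])
print("evidence estimate:", weights.mean())
# Exact evidence: N(y_obs | 0, 3), since y = theta + z + noise, all unit-variance.
print("exact:", np.exp(-y_obs**2 / 6) / np.sqrt(6 * np.pi))
```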
Methods for detecting associations between phenotype and aggregations of rare variants
Although genome-wide association studies have uncovered variants associated with more than 150 traits, the percentage of phenotypic variation explained by these associations remains small. This has led to the search for the dark matter that explains this missing genetic component of heritability. One potential explanation for dark matter is rare variants, and several statistics have been devised to detect associations resulting from aggregations of rare variants in relatively short regions of interest, such as candidate genes. In this paper we investigate the feasibility of extending this approach in an agnostic way, in which we consider all variants within a much broader region of interest, such as an entire chromosome or even the entire exome. Our method searches for subsets of variant sites using either Markov chain Monte Carlo or genetic algorithms. The analysis was performed with knowledge of the Genetic Analysis Workshop 17 answers.
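A stochastic subset search of this flavour can be sketched as follows, with a simple burden-score objective and a Metropolis-style flip move; the scoring function, simulated data, and temperature are illustrative assumptions, not the paper's method:

```python
# Stochastic search over subsets of variant sites (illustrative sketch).
import numpy as np

rng = np.random.default_rng(8)
n_subjects, n_sites = 500, 40
genotypes = rng.binomial(1, 0.02, size=(n_subjects, n_sites))  # rare variants
phenotype = genotypes[:, :5].sum(axis=1) + rng.normal(size=n_subjects)

def score(subset):
    if not subset.any():
        return 0.0
    burden = genotypes[:, subset].sum(axis=1)   # aggregate rare-variant count
    if burden.std() == 0:
        return 0.0
    return abs(np.corrcoef(burden, phenotype)[0, 1])

subset = rng.uniform(size=n_sites) < 0.5
current = score(subset)
for _ in range(5000):
    j = rng.integers(n_sites)
    subset[j] = ~subset[j]                      # propose flipping one site
    new = score(subset)
    if new >= current or rng.uniform() < np.exp((new - current) / 0.01):
        current = new                           # accept the move
    else:
        subset[j] = ~subset[j]                  # reject: flip back
print("selected sites:", np.flatnonzero(subset))
```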
Probabilistic machine learning and artificial intelligence.
How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery.
The author acknowledges an EPSRC grant EP/I036575/1, the DARPA PPAML programme, a Google Focused Research Award for the Automatic Statistician and support from Microsoft Research. This is the author accepted manuscript; the final version is available from NPG at http://www.nature.com/nature/journal/v521/n7553/full/nature14541.html#abstract
