Notes to Robert et al.: Model criticism informs model choice and model comparison
In their letter to PNAS and a comprehensive set of notes on arXiv
[arXiv:0909.5673v2], Christian Robert, Kerrie Mengersen and Carla Chen (RMC)
represent our approach to model criticism in situations when the likelihood
cannot be computed as a way to "contrast several models with each other". In
addition, RMC argue that model assessment with Approximate Bayesian Computation
under model uncertainty (ABCmu) is unduly challenging and question its Bayesian
foundations. We disagree, and clarify that ABCmu is a probabilistically sound
and powerful tool for criticizing a model against aspects of the observed data,
and discuss further the utility of ABCmu.
Comment: Reply to [arXiv:0909.5673v2]
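To make "criticizing a model against aspects of the observed data" concrete, here is a minimal sketch of plain ABC rejection sampling (not ABCmu itself) that keeps, for every accepted draw, the signed discrepancy on each summary separately. A discrepancy distribution that sits away from zero for one summary points to the aspect of the data the model fails to reproduce. The toy normal model, the two summaries, the Uniform(-5, 5) prior and the tolerance are all hypothetical choices for illustration.

```python
import random
import statistics


def summaries(x):
    """Two summaries of a data set: sample mean and sample standard deviation."""
    return (statistics.fmean(x), statistics.stdev(x))


def simulate(theta, n, rng):
    """Hypothetical toy model: n i.i.d. draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws=5000, tol=0.3, seed=1):
    """Plain ABC rejection; returns accepted (theta, per-summary discrepancy) pairs."""
    rng = random.Random(seed)
    s_obs = summaries(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from a hypothetical Uniform(-5, 5) prior
        s_sim = summaries(simulate(theta, len(observed), rng))
        # Signed discrepancy per summary, kept separately for model criticism
        disc = tuple(s - o for s, o in zip(s_sim, s_obs))
        if max(abs(d) for d in disc) <= tol:
            accepted.append((theta, disc))
    return accepted
```

Averaging the second discrepancy component over the accepted draws, for example, shows whether the model systematically over- or under-predicts the spread of the data.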
Statistical modelling of summary values leads to accurate Approximate Bayesian Computations
Approximate Bayesian Computation (ABC) methods rely on asymptotic arguments,
implying that parameter inference can be systematically biased even when
sufficient statistics are available. We propose to construct the ABC
accept/reject step from decision theoretic arguments on a suitable auxiliary
space. This framework, referred to as ABC*, fully specifies which test
statistics to use, how to combine them, how to set the tolerances and how long
to simulate in order to obtain accuracy properties on the auxiliary space. Akin
to maximum-likelihood indirect inference, regularity conditions establish when
the ABC* approximation to the posterior density is accurate on the original
parameter space in terms of the Kullback-Leibler divergence and the maximum a
posteriori point estimate. Fundamentally, escaping asymptotic arguments
requires knowledge of the distribution of test statistics, which we obtain
through modelling the distribution of summary values, data points on a summary
level. Synthetic examples and an application to time series data of influenza A
(H3N2) infections in the Netherlands illustrate ABC* in action.
Comment: Videos can be played with Acrobat Reader. Manuscript under review and
not accepted
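The abstract replaces the usual distance-plus-tolerance accept step with a decision-theoretic test on summary values. A minimal sketch of that idea follows, assuming (our simplification, for illustration only) that simulated and observed summary values are normal with a known common standard deviation, so that equivalence of their means can be checked with a two one-sided tests (TOST) procedure: a proposal is accepted only if both null hypotheses "difference at most -tau" and "difference at least tau" are rejected at level alpha. The function name, tau and alpha are hypothetical; the paper's own choice of tests, tolerances and simulation length is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist, fmean


def tost_accept(sim_values, obs_values, sigma, tau, alpha=0.05):
    """Accept if the mean difference is shown to lie inside (-tau, tau).

    Assumes both samples of summary values are i.i.d. normal with a known
    common standard deviation sigma (a simplifying assumption).
    """
    diff = fmean(sim_values) - fmean(obs_values)
    se = sigma * sqrt(1 / len(sim_values) + 1 / len(obs_values))
    z = NormalDist()
    # H0a: diff <= -tau  versus  H1a: diff > -tau (reject for large statistics)
    p_lower = 1 - z.cdf((diff + tau) / se)
    # H0b: diff >= tau   versus  H1b: diff < tau (reject for small statistics)
    p_upper = z.cdf((diff - tau) / se)
    return max(p_lower, p_upper) < alpha
```

With a very tight tau, equivalence cannot be established from a handful of summary values even when they match exactly, which illustrates why the number of simulations becomes a design parameter in this kind of accept step.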
Using ABC for model design and inference across biological scales
Goodness of fit for models with intractable likelihood
Routine goodness-of-fit analyses of complex models with intractable likelihoods are
hampered by a lack of computationally tractable diagnostic measures with well-understood
frequency properties, that is, with a known sampling distribution. This
frustrates the ability to assess the extremity of the data relative to fitted simulation
models in terms of pre-specified test statistics, an essential requirement for model
improvement. Given an Approximate Bayesian Computation setting for a posited
model with an intractable likelihood from which it is possible to simulate, we
present a general and computationally inexpensive Monte Carlo framework for obtaining
p-values that are asymptotically uniformly distributed in [0, 1] under the posited
model when assumptions about the asymptotic equivalence between the conditional
statistic and the maximum likelihood estimator hold. The proposed framework follows
almost directly from the conditional predictive p-value proposed in the Bayesian literature.
Numerical investigations demonstrate favorable power properties in detecting
actual model discrepancies relative to other diagnostic approaches. We illustrate the
technique on analytically tractable examples and on a complex tuberculosis transmission
model.
The authors have been funded by MINECO-Spain project PID2019-104790GB-I00 (M.E. Castellanos and
S. Cabras) and Wellcome Trust fellowship WR092311MF (O. Ratmann).
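The proposed framework builds on the conditional predictive p-value. As a simpler point of reference only, the generic Monte Carlo recipe for a posterior predictive p-value is sketched below: for each (approximate) posterior draw, simulate a replicate data set and record whether its test statistic is at least as extreme as the observed one. The toy Poisson model, the variance-to-mean ratio used as test statistic and all function names are hypothetical; the conditional predictive variant additionally conditions on a statistic and is not implemented here.

```python
import math
import random
from statistics import fmean, pvariance


def poisson(lam, rng):
    """Knuth's multiplication method for Poisson draws; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def dispersion(x):
    """Test statistic: variance-to-mean ratio (close to 1 under a Poisson model)."""
    return pvariance(x) / fmean(x)


def predictive_p_value(observed, posterior_draws, seed=0):
    """Share of replicate data sets whose statistic is >= the observed statistic."""
    rng = random.Random(seed)
    t_obs = dispersion(observed)
    hits = 0
    for lam in posterior_draws:
        replicate = [poisson(lam, rng) for _ in observed]
        if dispersion(replicate) >= t_obs:
            hits += 1
    return hits / len(posterior_draws)
```

Posterior draws for the Poisson rate can come from any sampler; in a conjugate toy setting, `random.gammavariate` with shape `a + sum(x)` and scale `1 / (b + n)` gives exact draws under a Gamma(a, rate b) prior. Strongly overdispersed counts then yield a p-value near zero.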
Inference for Nonlinear Epidemiological Models Using Genealogies and Time Series
Phylodynamics, the field aiming to quantitatively integrate the ecological and evolutionary dynamics of rapidly evolving populations like those of RNA viruses, increasingly relies upon coalescent approaches to infer past population dynamics from reconstructed genealogies. As sequence data have become more abundant, these approaches are beginning to be used on populations undergoing rapid and rather complex dynamics. In such cases, the simple demographic models that current phylodynamic methods employ can be limiting. First, these models are not ideal for yielding biological insight into the processes that drive the dynamics of the populations of interest. Second, these models differ in form from the mechanistic and often stochastic population dynamic models that are currently widely used when fitting models to time series data. As such, their use does not allow genealogical data and time series data to be considered in tandem when conducting inference. Here, we present a flexible statistical framework for phylodynamic inference that goes beyond these limitations. The framework employs a recently developed method known as particle MCMC to fit stochastic, nonlinear mechanistic models of complex population dynamics to gene genealogies and time series data in a Bayesian framework. We demonstrate our approach using a nonlinear Susceptible-Infected-Recovered (SIR) model for the transmission dynamics of an infectious disease and show through simulations that it provides accurate estimates of past disease dynamics and key epidemiological parameters from genealogies with or without accompanying time series data.
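Particle MCMC runs a Metropolis-Hastings sampler in which the intractable likelihood is replaced by a particle-filter estimate (unbiased on the natural scale). A minimal sketch of that inner step follows, assuming for illustration only a discrete-time chain-binomial SIR model and Poisson-distributed observed daily cases; the genealogy likelihood that the framework also uses is not included, and all names and parameter values are hypothetical.

```python
import math
import random


def binomial(n, p, rng):
    """Binomial draw as a sum of Bernoulli trials (fine for small n)."""
    return sum(rng.random() < p for _ in range(n))


def sir_step(state, beta, gamma, pop, rng):
    """One day of a chain-binomial SIR model; returns (new state, new infections)."""
    s, i, r = state
    p_inf = 1.0 - math.exp(-beta * i / pop)  # per-susceptible infection probability
    p_rec = 1.0 - math.exp(-gamma)           # per-infective recovery probability
    new_inf = binomial(s, p_inf, rng)
    new_rec = binomial(i, p_rec, rng)
    return (s - new_inf, i + new_inf - new_rec, r + new_rec), new_inf


def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def pf_loglik(cases, beta, gamma, pop, i0, n_particles=200, seed=0):
    """Bootstrap particle filter estimate of log p(cases | beta, gamma)."""
    rng = random.Random(seed)
    particles = [(pop - i0, i0, 0)] * n_particles
    loglik = 0.0
    for y in cases:
        stepped = [sir_step(p, beta, gamma, pop, rng) for p in particles]
        # Observation model: cases ~ Poisson(new infections + 0.5); the 0.5 keeps the rate positive
        logw = [poisson_logpmf(y, inc + 0.5) for _, inc in stepped]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles)  # log of the mean weight
        # Multinomial resampling in proportion to the weights
        particles = rng.choices([st for st, _ in stepped], weights=w, k=n_particles)
    return loglik
```

In a full particle MCMC sampler, `pf_loglik` would be called once per proposed `(beta, gamma)` and the estimate used in the Metropolis-Hastings acceptance ratio in place of the exact log-likelihood.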
Using Likelihood-Free Inference to Compare Evolutionary Dynamics of the Protein Networks of H. pylori and P. falciparum
Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses an MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network (PIN) data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. In modelling the evolutionary history of PIN data, we find that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets.
Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
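The fitted mixture model includes a duplication–divergence component. As an illustration of that component alone, assuming one common textbook form of it (copy a randomly chosen node together with its links, drop each copied link independently with probability delta, and optionally link the copy to its parent), a network can be grown and reduced to summaries as below. The parameter names, the parent-link rule and the two summaries are hypothetical and are not the paper's exact specification; in a likelihood-free analysis, summaries like these would be compared between simulated and observed networks.

```python
import random
from statistics import fmean


def grow_dd_network(n_nodes, delta, p_parent=0.0, seed=0):
    """Grow an undirected network by duplication-divergence.

    Starts from a single edge. At each step a random node is duplicated;
    each of its links is copied independently with probability 1 - delta,
    and the copy is linked to its parent with probability p_parent.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        parent = rng.choice(list(adj))
        child = len(adj)
        kept = {v for v in adj[parent] if rng.random() > delta}
        if rng.random() < p_parent:
            kept.add(parent)
        adj[child] = kept
        for v in kept:  # keep the adjacency structure symmetric
            adj[v].add(child)
    return adj


def summaries(adj):
    """Hypothetical summary statistics: mean degree and fraction of isolated nodes."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    return fmean(degrees), sum(d == 0 for d in degrees) / len(degrees)
```

Low divergence (small delta) lets copied links accumulate, so the mean degree grows with network size, whereas high divergence leaves many sparsely connected or isolated nodes.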
Estimating fine age structure and time trends in human contact patterns from coarse contact data: the Bayesian rate consistency model
Since the emergence of severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2), many contact surveys have been conducted to measure changes in
human interactions in the face of the pandemic and non-pharmaceutical
interventions. These surveys were typically conducted longitudinally, using
protocols that differ from those used in the pre-pandemic era. We present a
model-based statistical approach that can reconstruct contact patterns at
1-year resolution even when the age of the contacts is reported coarsely in 5-
or 10-year age bands. This innovation is rooted in population-level consistency
constraints in how contacts between groups must add up, which prompts us to
call the approach presented here the Bayesian rate consistency model. The model
incorporates computationally efficient Hilbert space Gaussian process priors to
infer the dynamics in age- and gender-structured social contacts and is
designed to adjust for reporting fatigue in longitudinal surveys. We
demonstrate on simulations the ability to reconstruct contact patterns by
gender and 1-year age interval from coarse data with adequate accuracy and
within a fully Bayesian framework to quantify uncertainty. We investigate the
patterns of social contact data collected in Germany from April to June 2020
across five longitudinal survey waves. We reconstruct the fine age structure in
social contacts during the early stages of the pandemic and demonstrate that
social contacts rebounded in a structured, non-homogeneous manner. We also show
that by July 2020, social contact intensities remained well below pre-pandemic
values despite a considerable easing of non-pharmaceutical interventions. This
model-based inference approach is open access, computationally tractable
(enabling full Bayesian uncertainty quantification), and readily applicable to
contemporary survey data as long as the exact age of survey participants is
reported.
Comment: 39 pages, 16 figures
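The consistency constraint can be stated as a reciprocity identity: the total number of contacts that age group a has with age group b must equal the total that b has with a, so that N_a m_ab = N_b m_ba for population sizes N and contact intensities m. The minimal sketch below, with hypothetical function names and toy numbers, builds 1-year intensities that satisfy this identity from a symmetric matrix of total contacts and then sums over contacts' ages within the coarse bands a survey would report. The Gaussian process priors and the reporting-fatigue adjustment of the model are not shown.

```python
def intensities_from_totals(totals, pop):
    """m[a][b] = T[a][b] / N[a]; reciprocal whenever T is symmetric."""
    return [[t_ab / pop[a] for t_ab in row] for a, row in enumerate(totals)]


def coarsen_contact_ages(m, bands):
    """Sum 1-year contact intensities over the contact-age bands reported in a survey.

    bands is a list of (start, stop) index ranges over contact ages;
    participants' ages stay at 1-year resolution.
    """
    return [[sum(row[start:stop]) for start, stop in bands] for row in m]


def is_reciprocal(m, pop, tol=1e-9):
    """Check N[a] * m[a][b] == N[b] * m[b][a] for all pairs (a, b)."""
    n = len(pop)
    return all(abs(pop[a] * m[a][b] - pop[b] * m[b][a]) <= tol
               for a in range(n) for b in range(n))
```

Roughly speaking, reconstruction runs in the opposite direction: the coarse band sums alone do not identify the 1-year intensities, and the reciprocity identity ties the coarsely reported contact-age dimension back to the finely resolved participant-age dimension.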
Bayesian mixture models for phylogenetic source attribution from consensus sequences and time since infection estimates
In stopping the spread of infectious diseases, pathogen genomic data can be
used to reconstruct transmission events and characterize population-level
sources of infection. Most approaches for identifying transmission pairs do not
account for the time that passed since divergence of pathogen variants in
individuals, which is problematic in viruses with high within-host evolutionary
rates. This prompts us to consider possible transmission pairs in terms of
phylogenetic data and additional estimates of time since infection derived from
clinical biomarkers. We develop Bayesian mixture models with an evolutionary
clock as the signal component and additional mixed effects or covariate random
functions describing the mixing weights to classify potential pairs into likely
and unlikely transmission pairs. We demonstrate that although sources cannot be
identified at the individual level with certainty, even with the additional
data on time elapsed, inferences into the population-level sources of
transmission are possible, and more accurate than using only phylogenetic data
without time since infection estimates. We apply the approach to estimate
age-specific sources of HIV infection in transmission networks among men who
have sex with men (MSM) in Amsterdam between 2010 and 2021. This study
demonstrates that infection time estimates
provide informative data to characterize transmission sources, and shows how
phylogenetic source attribution can then be done with multi-dimensional mixture
models.
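The classification into likely and unlikely transmission pairs can be illustrated with Bayes' rule for a two-component mixture: a pair's posterior probability of being a transmission pair is w f_signal(d) / (w f_signal(d) + (1 - w) f_background(d)), where d is the pair's genetic distance and w the mixing weight. The sketch below assumes, purely for illustration, a normal signal component whose mean grows linearly with the time elapsed since infection (a simple molecular clock), a broad normal background component and a fixed mixing weight. The model in the abstract instead lets covariates and random effects drive the mixing weights and uses its own component distributions; all names and numbers here are hypothetical.

```python
from statistics import NormalDist


def transmission_pair_prob(distance, years_elapsed, weight,
                           clock_rate=0.005, signal_sd=0.005,
                           background=NormalDist(0.08, 0.03)):
    """Posterior probability that a pair belongs to the signal component.

    Signal: distance ~ Normal(clock_rate * years_elapsed, signal_sd).
    Background: distance ~ background (a broad distribution for unlinked pairs).
    """
    signal = NormalDist(clock_rate * years_elapsed, signal_sd)
    num = weight * signal.pdf(distance)
    den = num + (1 - weight) * background.pdf(distance)
    return num / den
```

The time since infection matters in this toy version: the same genetic distance can be compatible with transmission when many years have elapsed, yet implausible when infection is recent, because the clock shifts where the signal component expects the distance to lie.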