104 research outputs found

    Notes to Robert et al.: Model criticism informs model choice and model comparison

    In their letter to PNAS and a comprehensive set of notes on arXiv [arXiv:0909.5673v2], Christian Robert, Kerrie Mengersen and Carla Chen (RMC) represent our approach to model criticism in situations where the likelihood cannot be computed as a way to "contrast several models with each other". In addition, RMC argue that model assessment with Approximate Bayesian Computation under model uncertainty (ABCmu) is unduly challenging and question its Bayesian foundations. We disagree, and clarify that ABCmu is a probabilistically sound and powerful tool for criticizing a model against aspects of the observed data, and discuss further the utility of ABCmu. Comment: Reply to [arXiv:0909.5673v2

    Statistical modelling of summary values leads to accurate Approximate Bayesian Computations

    Approximate Bayesian Computation (ABC) methods rely on asymptotic arguments, implying that parameter inference can be systematically biased even when sufficient statistics are available. We propose to construct the ABC accept/reject step from decision theoretic arguments on a suitable auxiliary space. This framework, referred to as ABC*, fully specifies which test statistics to use, how to combine them, how to set the tolerances and how long to simulate in order to obtain accuracy properties on the auxiliary space. Akin to maximum-likelihood indirect inference, regularity conditions establish when the ABC* approximation to the posterior density is accurate on the original parameter space in terms of the Kullback-Leibler divergence and the maximum a posteriori point estimate. Fundamentally, escaping asymptotic arguments requires knowledge of the distribution of test statistics, which we obtain through modelling the distribution of summary values, data points on a summary level. Synthetic examples and an application to time series data of influenza A (H3N2) infections in the Netherlands illustrate ABC* in action. Comment: Videos can be played with Acrobat Reader. Manuscript under review and not accepte
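
    For orientation, the accept/reject step that this abstract refers to can be illustrated with a minimal sketch of plain rejection ABC. This is not the ABC* construction described above: the Normal(mu, 1) model, the uniform prior, the sample mean as summary, and the fixed tolerance are all assumptions chosen for illustration, and the function name `abc_rejection` is hypothetical.

```python
import random
import statistics


def abc_rejection(observed, n_draws=5000, tolerance=0.1, seed=0):
    """Plain rejection ABC for the mean of an assumed Normal(mu, 1) model.

    Assumptions (illustrative only): Uniform(-5, 5) prior on mu, the sample
    mean as the single summary, and a hand-picked tolerance.
    """
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)                    # draw from the prior
        simulated = [rng.gauss(mu, 1.0) for _ in range(n)]
        sim_summary = statistics.fmean(simulated)      # summarise simulated data
        if abs(sim_summary - obs_summary) <= tolerance:
            accepted.append(mu)                        # accept: summaries are close
    return accepted


# Synthetic "observed" data generated with a known mean of 1.5.
data_rng = random.Random(1)
observed = [data_rng.gauss(1.5, 1.0) for _ in range(50)]
posterior_sample = abc_rejection(observed)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

    The accepted draws approximate the posterior only up to the chosen tolerance and summary, which is the kind of ad hoc choice the abstract says ABC* is designed to replace with a fully specified procedure.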

    Using ABC for model design and inference across biological scales


    Goodness of fit for models with intractable likelihood

    Routine goodness-of-fit analyses of complex models with intractable likelihoods are hampered by a lack of computationally tractable diagnostic measures with well-understood frequency properties, that is, with a known sampling distribution. This frustrates the ability to assess the extremity of the data relative to fitted simulation models in terms of pre-specified test statistics, an essential requirement for model improvement. Given an Approximate Bayesian Computation setting for a posited model with an intractable likelihood from which it is possible to simulate, we present a general and computationally inexpensive Monte Carlo framework for obtaining p-values that are asymptotically uniformly distributed in [0, 1] under the posited model when assumptions about the asymptotic equivalence between the conditional statistic and the maximum likelihood estimator hold. The proposed framework follows almost directly from the conditional predictive p-value proposed in the Bayesian literature. Numerical investigations demonstrate favorable power properties in detecting actual model discrepancies relative to other diagnostic approaches. We illustrate the technique on analytically tractable examples and on a complex tuberculosis transmission model. The authors were funded by MINECO-Spain projects PID2019-104790GB-I00 (M.E. Castellanos and S. Cabras) and Wellcome Trust fellowship WR092311MF (O. Ratmann).

    Inference for Nonlinear Epidemiological Models Using Genealogies and Time Series

    Phylodynamics, the field aiming to quantitatively integrate the ecological and evolutionary dynamics of rapidly evolving populations like those of RNA viruses, increasingly relies upon coalescent approaches to infer past population dynamics from reconstructed genealogies. As sequence data have become more abundant, these approaches are beginning to be used on populations undergoing rapid and rather complex dynamics. In such cases, the simple demographic models that current phylodynamic methods employ can be limiting. First, these models are not ideal for yielding biological insight into the processes that drive the dynamics of the populations of interest. Second, these models differ in form from mechanistic and often stochastic population dynamic models that are currently widely used when fitting models to time series data. As such, their use does not allow for both genealogical data and time series data to be considered in tandem when conducting inference. Here, we present a flexible statistical framework for phylodynamic inference that goes beyond these current limitations. The framework we present employs a recently developed method known as particle MCMC to fit stochastic, nonlinear mechanistic models for complex population dynamics to gene genealogies and time series data in a Bayesian framework. We demonstrate our approach using a nonlinear Susceptible-Infected-Recovered (SIR) model for the transmission dynamics of an infectious disease and show through simulations that it provides accurate estimates of past disease dynamics and key epidemiological parameters from genealogies with or without accompanying time series data.
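
    As a rough illustration of the kind of stochastic SIR dynamics mentioned in this abstract, the sketch below simulates a closed-population SIR process with the Gillespie algorithm. It is not the authors' particle MCMC framework and does no inference; the function name `gillespie_sir`, the frequency-dependent transmission term, and the parameter values are assumptions made for the example.

```python
import random


def gillespie_sir(beta, gamma, s0, i0, t_max, seed=0):
    """Simulate a stochastic SIR epidemic in a closed population.

    Assumed rates (illustrative): infection at beta * S * I / N,
    recovery at gamma * I. Returns a list of (time, S, I, R) states.
    """
    rng = random.Random(seed)
    s, i, r = s0, i0, 0
    n = s0 + i0
    t = 0.0
    path = [(t, s, i, r)]
    while i > 0 and t < t_max:
        rate_infection = beta * s * i / n
        rate_recovery = gamma * i
        total_rate = rate_infection + rate_recovery
        t += rng.expovariate(total_rate)      # waiting time to the next event
        if t > t_max:
            break
        if rng.random() < rate_infection / total_rate:
            s, i = s - 1, i + 1               # infection event
        else:
            i, r = i - 1, r + 1               # recovery event
        path.append((t, s, i, r))
    return path


path = gillespie_sir(beta=0.5, gamma=0.2, s0=990, i0=10, t_max=100.0)
final_time, final_s, final_i, final_r = path[-1]
print(final_time, final_s, final_i, final_r)
```

    A simulator of this form is the forward model only; fitting it to genealogies and time series, as the abstract describes, requires the additional inference machinery that is not shown here.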

    Using Likelihood-Free Inference to Compare Evolutionary Dynamics of the Protein Networks of H. pylori and P. falciparum

    Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses an MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network (PIN) data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets.
    Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.

    Estimating fine age structure and time trends in human contact patterns from coarse contact data: the Bayesian rate consistency model

    Since the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), many contact surveys have been conducted to measure changes in human interactions in the face of the pandemic and non-pharmaceutical interventions. These surveys were typically conducted longitudinally, using protocols that differ from those used in the pre-pandemic era. We present a model-based statistical approach that can reconstruct contact patterns at 1-year resolution even when the age of the contacts is reported coarsely by 5- or 10-year age bands. This innovation is rooted in population-level consistency constraints in how contacts between groups must add up, which prompts us to call the approach presented here the Bayesian rate consistency model. The model incorporates computationally efficient Hilbert Space Gaussian process priors to infer the dynamics in age- and gender-structured social contacts and is designed to adjust for reporting fatigue in longitudinal surveys. We demonstrate on simulations the ability to reconstruct contact patterns by gender and 1-year age interval from coarse data with adequate accuracy and within a fully Bayesian framework to quantify uncertainty. We investigate the patterns of social contact data collected in Germany from April to June 2020 across five longitudinal survey waves. We reconstruct the fine age structure in social contacts during the early stages of the pandemic and demonstrate that social contacts rebounded in a structured, non-homogeneous manner. We also show that by July 2020, social contact intensities remained well below pre-pandemic values despite a considerable easing of non-pharmaceutical interventions. This model-based inference approach is open access, computationally tractable while enabling full Bayesian uncertainty quantification, and readily applicable to contemporary survey data as long as the exact age of survey participants is reported. Comment: 39 pages, 16 figure

    Bayesian mixture models for phylogenetic source attribution from consensus sequences and time since infection estimates

    In stopping the spread of infectious diseases, pathogen genomic data can be used to reconstruct transmission events and characterize population-level sources of infection. Most approaches for identifying transmission pairs do not account for the time that passed since divergence of pathogen variants in individuals, which is problematic in viruses with high within-host evolutionary rates. This prompts us to consider possible transmission pairs in terms of phylogenetic data and additional estimates of time since infection derived from clinical biomarkers. We develop Bayesian mixture models with an evolutionary clock as signal component and additional mixed effects or covariate random functions describing the mixing weights to classify potential pairs into likely and unlikely transmission pairs. We demonstrate that although sources cannot be identified at the individual level with certainty, even with the additional data on time elapsed, inferences into the population-level sources of transmission are possible, and more accurate than using only phylogenetic data without time since infection estimates. We apply the approach to estimate age-specific sources of HIV infection in Amsterdam MSM transmission networks between 2010 and 2021. This study demonstrates that infection time estimates provide informative data to characterize transmission sources, and shows how phylogenetic source attribution can then be done with multi-dimensional mixture models.