30 research outputs found

    Assessing biases in phylodynamic inferences in the presence of super-spreaders.

    Phylodynamic analyses using pathogen genetic data have become popular for making epidemiological inferences. However, many methods assume that the underlying host population follows homogeneous mixing patterns, whereas in real disease outbreaks a small number of individuals (super-spreaders) infect a disproportionately large number of others. Our objective was to quantify the degree of bias in estimating the epidemic starting date in the presence of super-spreaders using different sample selection strategies. We simulated 100 epidemics of a hypothetical pathogen (a fast-evolving, foot-and-mouth disease virus-like pathogen) over a real livestock movement network, allowing genetic mutations in the pathogen sequence. Genetic sequences were sampled serially over the epidemic and then used to estimate the epidemic starting date using Extended Bayesian Coalescent Skyline plot (EBSP) and Birth-death skyline plot (BDSKY) models. Our results showed that the degree of bias varies across epidemic situations, with substantial overestimation of the epidemic duration on some occasions. While the accuracy and precision of BDSKY deteriorated when a super-spreader generated a larger proportion of secondary cases, those of EBSP deteriorated when epidemics were shorter. The accuracy of the inference was similar irrespective of whether the analysis used all sampled sequences or only a subset of them, although the former required substantially longer computational times. When phylodynamic analyses need to be performed under a time constraint to inform policy makers, we suggest that multiple phylodynamic models be used simultaneously on a subset of the data to ascertain the robustness of inferences.

    Modeling the growth and decline of pathogen effective population size provides insight into epidemic dynamics and drivers of antimicrobial resistance

    Nonparametric population genetic modeling provides a simple and flexible approach for studying demographic history and epidemic dynamics using pathogen sequence data. Existing Bayesian approaches are premised on stochastic processes with stationary increments, which may provide an unrealistic prior for epidemic histories that feature extended periods of exponential growth or decline. We show that nonparametric models defined in terms of the growth rate of the effective population size can provide a more realistic prior for epidemic history. We propose a nonparametric autoregressive model on the growth rate as a prior for effective population size, which corresponds to the dynamics expected under many epidemic situations. We demonstrate the use of this model within a Bayesian phylodynamic inference framework. Our method correctly reconstructs trends of epidemic growth and decline from pathogen genealogies even when genealogical data are sparse and conventional skyline estimators erroneously predict stable population size. We also propose a regression approach for relating growth rates of pathogen effective population size to time-varying variables that may affect the replicative fitness of a pathogen. The model is applied to real data from rabies virus and Staphylococcus aureus epidemics. We find a close correspondence between the estimated growth rates of a lineage of methicillin-resistant S. aureus and population-level prescription rates of β-lactam antibiotics. The new models are implemented in an open source R package called skygrowth, which is available at https://github.com/mrc-ide/skygrowth

    Phylodynamic inference across epidemic scales

    Within-host genetic diversity and large transmission bottlenecks confound phylodynamic inference of epidemiological dynamics. Conventional phylodynamic approaches assume that nodes in a time-scaled pathogen phylogeny correspond closely to the time of transmission between hosts that are ancestral to the sample. However, when hosts harbour diverse pathogen populations, node times can substantially pre-date infection times. Imperfect bottlenecks can cause lineages sampled in different individuals to coalesce in unexpected patterns. To address realistic violations of standard phylodynamic assumptions we developed a new inference approach based on a multi-scale coalescent model, accounting for nonlinear epidemiological dynamics, heterogeneous sampling through time, non-negligible genetic diversity of pathogens within hosts, and imperfect transmission bottlenecks. We apply this method to HIV-1 and Ebola virus outbreak sequence data, illustrating how and when conventional phylodynamic inference may give misleading results. Within-host diversity of HIV-1 causes substantial upwards bias in the number of infected hosts using conventional coalescent models, but estimates using the multi-scale model have greater consistency with reported number of diagnoses through time. In contrast, we find that within-host diversity of Ebola virus has little influence on estimated numbers of infected hosts or reproduction numbers, and estimates are highly consistent with the reported number of diagnoses through time. The multi-scale coalescent also enables estimation of within-host effective population size using single sequences from a random sample of patients. We find within-host population genetic diversity of HIV-1 p17 to be 2Nμ = 0.012 (95% CI: 0.0066–0.023), which is lower than estimates based on HIV envelope serial sequencing of individual patients.

    Dynamic Population Models with Temporal Preferential Sampling to Infer Phenology

    To study population dynamics, ecologists and wildlife biologists use relative abundance data, which are often subject to temporal preferential sampling. Temporal preferential sampling occurs when sampling effort varies across time. To account for preferential sampling, we specify a Bayesian hierarchical abundance model that considers the dependence between observation times and the ecological process of interest. The proposed model improves abundance estimates during periods of infrequent observation and accounts for temporal preferential sampling in discrete time. Additionally, our model facilitates posterior inference for population growth rates and mechanistic phenometrics. We apply our model to analyze both simulated data and mosquito count data collected by the National Ecological Observatory Network. In the second case study, we characterize the population growth rate and abundance of several mosquito species in the Aedes genus. Comment: 29 pages, 5 figures, 1 table

    The effects of sampling strategy on the quality of reconstruction of viral population dynamics using Bayesian skyline family coalescent methods: A simulation study

    The ongoing large-scale increase in the total amount of genetic data for viruses and other pathogens has led to a situation in which it is often not possible to include every available sequence in a phylogenetic analysis and expect the procedure to complete in reasonable computational time. This raises questions about how a set of sequences should be selected for analysis, particularly if the data are used to infer more than just the phylogenetic tree itself. The design of sampling strategies for molecular epidemiology has been a neglected field of research. This article describes a large-scale simulation exercise that was undertaken to select an appropriate strategy when using the GMRF skygrid, one of the Bayesian skyline family of coalescent methods, in order to reconstruct past population dynamics. The simulated scenarios were intended to represent sampling for the population of an endemic virus across multiple geographical locations. Large phylogenies were simulated under a coalescent or structured coalescent model and sequences simulated from these trees; the resulting datasets were then downsampled for analyses according to a variety of schemes. Variation in results between different replicates of the same scheme was not insignificant, and as a result, we recommend that where possible analyses are repeated with different datasets in order to establish that elements of a reconstruction are not simply the result of the particular set of samples selected. We show that an individual stochastic choice of sequences can introduce spurious behaviour in the median line of the skygrid plot and that even marginal likelihood estimation can suggest complicated dynamics that were not in fact present. We recommend that the median line should not be used to infer historical events on its own. 
    Sampling sequences with uniform probability with respect to both time and spatial location (deme) never performed worse than sampling with probability proportional to the effective population size at that time and in that location, and was frequently superior. As a result, we recommend this approach in the design of future studies. We also confirm that the inclusion of many recent sequences from a single geographical location in an analysis tends to result in a spurious bottleneck effect in the reconstruction, and we caution against interpreting this as genuine.

    Importation of Alpha and Delta variants during the SARS-CoV-2 epidemic in Switzerland: Phylogenetic analysis and intervention scenarios.

    The SARS-CoV-2 pandemic has led to the emergence of various variants of concern (VoCs) that are associated with increased transmissibility, immune evasion, or differences in disease severity. The emergence of VoCs fueled interest in understanding the potential impact of travel restrictions and surveillance strategies to prevent or delay the early spread of VoCs. We performed phylogenetic analyses and mathematical modeling to study the importation and spread of the VoCs Alpha and Delta in Switzerland in 2020 and 2021. Using a phylogenetic approach, we estimated between 383 and 1,038 imports of Alpha and between 455 and 1,347 imports of Delta into Switzerland. We then used the results from the phylogenetic analysis to parameterize a dynamic transmission model that accurately described the subsequent spread of Alpha and Delta. We modeled different counterfactual intervention scenarios to quantify the potential impact of border closures and surveillance of travelers on the spread of Alpha and Delta. We found that implementing border closures after the announcement of VoCs would have had limited impact on mitigating the spread of VoCs. In contrast, increased surveillance of travelers could prove to be an effective measure for delaying the spread of VoCs in situations where their severity remains unclear. Our study shows how phylogenetic analysis in combination with dynamic transmission models can be used to estimate the number of imported SARS-CoV-2 variants and the potential impact of different intervention scenarios to inform the public health response during the pandemic.

    Bayesian methods for source attribution using HIV deep sequence data

    The advent of pathogen deep-sequencing technology provides new opportunities for infectious disease surveillance, especially for fast-evolving viruses like human immunodeficiency virus (HIV). In particular, multiple reads per host contain detailed information on viral within-host diversity. This information allows the reconstruction of partial directed transmission networks, where estimates of who is source and who is recipient are directly available from the phylogenetic ordering of the viruses of any two individuals. This is a new approach for phylodynamics, and the topic of my thesis. In this thesis, I present updates to the bioinformatics pipeline used by the Phylogenetics And Networks for Generalised Epidemics in Africa consortium for processing HIV deep sequence data and running the phyloscanner program. I then present a semi-parametric Bayesian Poisson model for inferring infectious disease transmission flows and the sources of infection at the population level. The framework is computationally scalable in high-dimensional flow spaces thanks to Hilbert Space Gaussian process approximations, allows for sampling bias adjustments, and permits estimation of gender- and age-specific transmission flows at a finer resolution than previously possible. In this sense, the methods that I developed enable us to overcome some problems that conventional phylodynamic approaches have been unable to solve. We apply the approach to densely sampled, population-based HIV deep-sequence data from Rakai, Uganda. I focus on characterising age-specific transmission dynamics, and examining the sources of HIV infections in adolescent and young women in particular. Open Access

    Bayesian Computing with INLA: A Review

    The key operation in Bayesian inference is to compute high-dimensional integrals. An old approximate technique is the Laplace method or approximation, which dates back to Pierre-Simon Laplace (1774). This simple idea approximates the integrand with a second-order Taylor expansion around the mode and computes the integral analytically. By developing a nested version of this classical idea, combined with modern numerical techniques for sparse matrices, we obtain the approach of integrated nested Laplace approximations (INLA) to do approximate Bayesian inference for latent Gaussian models (LGMs). LGMs represent an important model abstraction for Bayesian inference and include a large proportion of the statistical models used today. In this review, we discuss the reasons for the success of the INLA approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute, and why LGMs make such a useful concept for Bayesian computing.
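
    The classical (non-nested) Laplace approximation described in this abstract can be illustrated with a minimal one-dimensional sketch. It is not taken from the review and is not the INLA algorithm or the R-INLA API; the function name `laplace_integral`, the finite-difference step `h`, and the choice of test integrand (the Gamma-function integral, whose Laplace approximation is Stirling's formula) are assumptions made for illustration only.

```python
import math


def laplace_integral(log_f, mode, h=1e-3):
    """Approximate the integral of exp(log_f(x)) dx by the Laplace method.

    log_f : log of the (positive) integrand
    mode  : location of the maximum of log_f (assumed known here)
    h     : step for the central finite difference (illustrative choice)
    """
    # Second derivative of log_f at the mode, by central differences.
    second = (log_f(mode + h) - 2.0 * log_f(mode) + log_f(mode - h)) / h**2
    if second >= 0:
        raise ValueError("log_f must be concave at the mode")
    # A second-order Taylor expansion of log_f around the mode turns the
    # integrand into an unnormalised Gaussian, which integrates analytically.
    return math.exp(log_f(mode)) * math.sqrt(2.0 * math.pi / -second)


# Illustrative integrand: x**n * exp(-x) on (0, inf), whose integral is n!.
# Its log, n*log(x) - x, is maximised at x = n.
n = 10
approx = laplace_integral(lambda x: n * math.log(x) - x, mode=float(n))
exact = math.factorial(n)
rel_error = abs(approx - exact) / exact
```

    For this integrand the sketch reproduces Stirling's formula, so the relative error shrinks as n grows. INLA applies the same idea in a nested fashion to latent Gaussian models, which this toy example does not attempt.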

    Optimising the use of new data streams for making epidemiological inferences in veterinary epidemiology : a thesis presented in partial fulfilment of the requirements for the degree of PhD in Veterinary Epidemiology at Massey University, Manawatu, New Zealand

    Many ‘big data’ streams have recently become available in animal health disciplines. While these data may be able to provide valuable epidemiological information, researchers are at risk of making erroneous inferences if limitations in these data are overlooked. This thesis focused on understanding the better use of two data streams—livestock movement records and genetic sequence data. The first study analysed national dairy cattle movement data in New Zealand to explore whether regionalisation of the country based on bovine tuberculosis risk influenced trade decisions. The results suggested that the observed livestock movement patterns could be explained by the majority of, but not all, farmers avoiding purchasing cattle from high disease risk areas. The second study took an alternative approach—qualitative interviews—to understanding farmers’ livestock purchasing practices. This study suggested that farmers are not necessarily concerned with disease status of source farms and that it may be the reliance on stock agents to facilitate trade that creates the observed livestock movement patterns in New Zealand. The findings from this study also implied that various demographic and production characteristics of animals may influence farmers’ livestock selling practices, which were quantitatively verified in the third study analysing livestock movement data and animal production data. These studies not only showed that analyses based solely on ‘big data’ can be misleading but also provided useful information necessary to predict future livestock movement patterns. The final study evaluated the performance of various genetic sequence sampling strategies in making phylodynamic inferences. We showed that using all available genetic samples can be not only computationally expensive, but also may lead to erroneous inferences. 
    The results also suggested that strategies for sampling genetic sequences for phylodynamic analyses may need to be tailored to the epidemiological characteristics of each epidemic.

    Emerging concepts of data integration in pathogen phylodynamics

    Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights into infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modelling and testing can generate further insights into sequence evolution, trait evolution and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. status: published