37 research outputs found

    skeleSim: an extensible, general framework for population genetic simulation in R

    Simulations are a key tool in molecular ecology for inference and forecasting, as well as for evaluating new methods. Owing to growing computational power and a diversity of software with different capabilities, simulations are becoming increasingly powerful and useful. However, the widespread use of simulations by geneticists and ecologists is hindered by the difficulty of understanding these software packages’ complex capabilities and of composing code and input files, by a daunting bioinformatics barrier, and by a steep conceptual learning curve. skeleSim (an R package) guides users in choosing appropriate simulations, setting parameters, calculating genetic summary statistics, and organizing data output in a reproducible pipeline within the R environment. skeleSim is designed to be an extensible framework that can ‘wrap’ around any simulation software (inside or outside the R environment) and be extended to calculate and graph any genetic summary statistic. Currently, skeleSim implements the coalescent and forward-time models available in the fastsimcoal2 and rmetasim simulation engines to produce null distributions for multiple population genetic statistics and marker types under a variety of demographic conditions. skeleSim is intended to make simulations easier while still allowing full model complexity, so that simulations can play a fundamental role in molecular ecology investigations. skeleSim can also serve as a teaching tool: demonstrating the outcomes of stochastic population genetic processes, teaching general concepts of simulation, and providing an introduction to the R environment through a user-friendly graphical user interface (built with shiny).
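
    The wrapper idea described above (any engine plugged in, any summary statistic added, replicates collected into null distributions) can be sketched in Python. This is a minimal sketch under stated assumptions, not skeleSim's R API: the names `run_pipeline`, `wright_fisher`, `mean_heterozygosity` and `fst` are hypothetical, and a toy Wright-Fisher drift model stands in for fastsimcoal2 or rmetasim. F_ST is computed as (H_T - H_S) / H_T for a single biallelic locus.

```python
import random
from typing import Callable, Dict, List

# Hypothetical interfaces (not skeleSim's API): a simulator returns, for each
# population, the counts of the two alleles at one biallelic locus; a statistic
# reduces that output to a single number.
Counts = List[List[int]]
Simulator = Callable[[dict, random.Random], Counts]
Statistic = Callable[[Counts], float]


def wright_fisher(params: dict, rng: random.Random) -> Counts:
    """Toy stand-in engine: independent Wright-Fisher drift in each population."""
    n_copies = 2 * params["pop_size"]  # diploid: 2N gene copies
    pops = []
    for _ in range(params["n_pops"]):
        count = n_copies // 2  # every population starts at allele frequency 0.5
        for _ in range(params["generations"]):
            p = count / n_copies
            # Binomial sampling of the next generation's gene copies
            count = sum(rng.random() < p for _ in range(n_copies))
        pops.append([count, n_copies - count])
    return pops


def mean_heterozygosity(pops: Counts) -> float:
    """Mean expected heterozygosity 2pq across populations."""
    hs = [2 * (a / (a + b)) * (b / (a + b)) for a, b in pops]
    return sum(hs) / len(hs)


def fst(pops: Counts) -> float:
    """F_ST as (H_T - H_S) / H_T for a biallelic locus."""
    ps = [a / (a + b) for a, b in pops]
    p_bar = sum(ps) / len(ps)
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = sum(2 * p * (1 - p) for p in ps) / len(ps)
    if h_t == 0:
        return 0.0  # allele fixed everywhere: no differentiation to measure
    return max(0.0, (h_t - h_s) / h_t)  # clamp tiny negative rounding error


def run_pipeline(simulator: Simulator, statistics: Dict[str, Statistic],
                 params: dict, n_reps: int, seed: int = 1) -> Dict[str, List[float]]:
    """Run replicate simulations and collect a null distribution per statistic."""
    rng = random.Random(seed)  # one seeded RNG makes the whole run reproducible
    results: Dict[str, List[float]] = {name: [] for name in statistics}
    for _ in range(n_reps):
        data = simulator(params, rng)
        for name, stat in statistics.items():
            results[name].append(stat(data))
    return results


if __name__ == "__main__":
    null = run_pipeline(
        wright_fisher,
        {"H_exp": mean_heterozygosity, "F_ST": fst},
        {"pop_size": 50, "n_pops": 3, "generations": 20},
        n_reps=20,
    )
    for name, values in null.items():
        print(name, round(sum(values) / len(values), 3))
```

    Keeping engines and statistics as plain callables behind a fixed interface is what makes such a pipeline extensible: adding a new summary statistic means adding one entry to the `statistics` dictionary.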

    Assembling evidence for identifying reservoirs of infection

    Many pathogens persist in multihost systems, making the identification of infection reservoirs crucial for devising effective interventions. Here, we present a conceptual framework for classifying patterns of incidence and prevalence, and review recent scientific advances that allow us to study and manage reservoirs simultaneously. We argue that interventions can have a crucial role in enriching our mechanistic understanding of how reservoirs function and should be embedded as quasi-experimental studies in adaptive management frameworks. Single approaches to the study of reservoirs are unlikely to generate conclusive insights, whereas the formal integration of data and methodologies, involving interventions, pathogen genetics, and contemporary surveillance techniques, promises to open up new opportunities to advance understanding of complex multihost systems.

    Integrating serological and genetic data to quantify cross-species transmission: brucellosis as a case study

    Epidemiological data are often fragmented, partial, and/or ambiguous, and are unable to yield the level of understanding of infectious disease dynamics needed to inform control measures adequately. Here, we show how the information contained in widely available serology data can be enhanced by integration with less common type-specific data, to improve the understanding of the transmission dynamics of complex multi-species pathogens and host communities. Using brucellosis in Northern Tanzania as a case study, we developed a latent process model based on serology data obtained from the field to reconstruct Brucella transmission dynamics. We were able to identify sheep and goats as a more likely source of human and animal infection than cattle; however, the highly cross-reactive nature of Brucella spp. meant that it was not possible to determine which Brucella species (B. abortus or B. melitensis) was responsible for human infection. We extended our model to integrate simulated serology and typing data, and show that although serology alone can identify the host source of human infection under certain restrictive conditions, the integration of even small amounts (5%) of typing data can improve understanding of complex epidemiological dynamics. We show that data integration will often be essential when more than one pathogen is present and when the distinction between exposed and infectious individuals is not clear from serology data. With increasing epidemiological complexity, serology data become less informative. However, we show how this weakness can be mitigated by integrating such data with typing data, thereby enhancing the inference from these data and improving understanding of the underlying dynamics.

    Bayesian reconstruction of SARS-CoV-2 transmissions highlights substantial proportion of negative serial intervals

    BACKGROUND: The serial interval is a key epidemiological measure that quantifies the time between the onsets of symptoms in an infector-infectee pair. It indicates how quickly new generations of cases appear, thus informing on the speed of an epidemic. Estimating the serial interval requires identifying pairs of infectors and infectees. Yet most studies fail to assess the direction of transmission between cases and assume that the order of infections, and thus transmissions, strictly follows the order of symptom onsets, thereby forcing serial intervals to be positive. Because the incubation period of SARS-CoV-2 is long and highly variable, this may not always be true (i.e. an infectee may show symptoms before their infector), and negative serial intervals may occur. This study aims to estimate the serial interval of different SARS-CoV-2 variants whilst accounting for negative serial intervals. METHODS: This analysis included 5,842 symptomatic individuals with confirmed SARS-CoV-2 infection in 2,579 households across England and Wales from September 2020 to August 2022. We used a Bayesian framework to infer who infected whom by exploring all transmission trees compatible with the observed dates of symptom onset, based on a wide range of incubation period and generation time distributions compatible with estimates reported in the literature. Serial intervals were derived from the reconstructed transmission pairs, stratified by variant. RESULTS: We estimated that 22% (95% credible interval (CrI) 8-32%) of serial interval values are negative across all variants of concern (VOCs). The mean serial interval was shortest for Omicron BA.5 (2.02 days, 1.26-2.84) and longest for Alpha (3.37 days, 2.52-4.04). CONCLUSIONS: This study highlights the large proportion of negative serial intervals across SARS-CoV-2 variants. Because the serial interval is widely used to estimate transmissibility and forecast cases, these results may have critical implications for epidemic control.
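
    Why negative serial intervals arise can be shown with a short calculation. Taking the infector's infection time as 0, the infector's onset is their incubation period and the infectee's onset is the generation time plus the infectee's incubation period, so serial interval = generation time + incubation(infectee) - incubation(infector). The Python sketch below is not the authors' Bayesian transmission-tree method; it assumes hypothetical gamma and lognormal distributions chosen only for illustration, drawn independently of each other.

```python
import random
from statistics import mean


def simulate_serial_intervals(n_pairs: int, seed: int = 42) -> list:
    """Simulate serial intervals (days) for infector-infectee pairs.

    With the infector infected at time 0:
      onset(infector) = incubation_infector
      onset(infectee) = generation_time + incubation_infectee
    so SI = generation_time + incubation_infectee - incubation_infector,
    which is negative whenever the infectee's incubation is short enough.
    """
    rng = random.Random(seed)
    sis = []
    for _ in range(n_pairs):
        # Hypothetical distributions, for illustration only (drawn independently):
        gen_time = rng.gammavariate(2.5, 2.0)        # shape 2.5, scale 2.0: mean 5 days
        inc_infector = rng.lognormvariate(1.6, 0.5)  # median about 5 days
        inc_infectee = rng.lognormvariate(1.6, 0.5)
        sis.append(gen_time + inc_infectee - inc_infector)
    return sis


if __name__ == "__main__":
    sis = simulate_serial_intervals(20_000)
    print(f"mean SI: {mean(sis):.2f} days")
    print(f"share negative: {sum(si < 0 for si in sis) / len(sis):.1%}")
```

    Because both incubation periods are drawn from the same distribution, they cancel in expectation, so the mean serial interval matches the mean generation time even though a sizeable share of individual values falls below zero.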

    Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases

    Pathogen sequence data have been exploited to infer who infected whom, using both empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host populations of pathogens. Thus, these techniques provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight into epidemiological links than a single sequence per host. In general, a mechanistic viewpoint on transmission and micro-evolution has been followed to infer epidemiological links from these data. Here, we investigate an alternative approach grounded in statistical learning. The idea consists of learning the structure of epidemiological links with a pseudo-evolutionary model applied to training data obtained, for example, from contact tracing, and using this initial stage to infer links for the whole dataset. Such an approach has the potential to be particularly valuable when mechanistic assumptions risk being erroneous; it is sufficiently parsimonious to allow the handling of big datasets in the future; and it is versatile enough to be applied to very different contexts in animal, human and plant epidemiology. This article is part of the theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes'. This issue is linked with the subsequent theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control'.
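
    The two-stage idea (learn from contact-traced pairs, then infer links for the whole dataset) can be illustrated with a deliberately simplified Python stand-in. Instead of the authors' pseudo-evolutionary model, this sketch assumes that each host is summarised by a within-host variant-frequency profile and that a single distance cutoff, learned from labelled training pairs, separates linked from unlinked hosts. The names `profile_distance`, `learn_cutoff` and `infer_links`, and all data in the usage example, are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # variant id -> within-host frequency (hypothetical format)


def profile_distance(a: Profile, b: Profile) -> float:
    """L1 distance between two within-host variant-frequency profiles."""
    keys = sorted(set(a) | set(b))  # fixed order keeps float sums reproducible
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def learn_cutoff(profiles: Dict[str, Profile],
                 training: List[Tuple[str, str, bool]]) -> float:
    """Stage 1: pick the distance cutoff that best separates training pairs.

    training: (host_a, host_b, is_linked) triples, e.g. from contact tracing.
    """
    scored = [(profile_distance(profiles[a], profiles[b]), linked)
              for a, b, linked in training]
    best_cut, best_acc = 0.0, -1.0
    for cut in sorted({d for d, _ in scored}):
        # Training accuracy of the rule "linked if distance <= cut"
        acc = sum((d <= cut) == linked for d, linked in scored) / len(scored)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


def infer_links(profiles: Dict[str, Profile], cutoff: float) -> List[Tuple[str, str]]:
    """Stage 2: apply the learned rule to every host pair in the dataset."""
    return [(a, b) for a, b in combinations(sorted(profiles), 2)
            if profile_distance(profiles[a], profiles[b]) <= cutoff]


if __name__ == "__main__":
    profiles = {
        "h1": {"v1": 0.7, "v2": 0.3},
        "h2": {"v1": 0.6, "v2": 0.4},
        "h3": {"v3": 0.9, "v1": 0.1},
        "h4": {"v3": 0.8, "v4": 0.2},
    }
    training = [("h1", "h2", True), ("h3", "h4", True), ("h1", "h3", False)]
    cutoff = learn_cutoff(profiles, training)
    print(infer_links(profiles, cutoff))
```

    A learned cutoff is the simplest possible "structure" to learn; the point of the sketch is only the separation between a training stage on labelled pairs and an inference stage on all pairs.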

    Assessing the epidemic potential of RNA and DNA viruses

    Many new and emerging RNA and DNA viruses are zoonotic or have zoonotic origins in an animal reservoir that is usually mammalian and sometimes avian. Not all zoonotic viruses are transmissible (directly or by an arthropod vector) between human hosts. Virus genome sequence data provide the best evidence of transmission. Of the human-transmissible viruses, 37 species have so far been restricted to self-limiting outbreaks. These viruses are priorities for surveillance because relatively minor changes in their epidemiology can potentially lead to major changes in the threat they pose to public health. On the basis of comparisons across all recognized human viruses, we consider the characteristics of these priority viruses and assess the likelihood that they will emerge further in human populations. We also assess the likelihood that a virus that can infect humans but is not capable of transmission (directly or by a vector) between human hosts can acquire that capability.

    Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees

    Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show that they can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which are often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology.
    Comment: 28 pages, 11 figures, 3 tables
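
    The correspondence between interior-node host assignments and transmission trees can be made concrete with a small enumeration. The Python sketch below is not the authors' algorithm: it assumes one sampled pathogen per host, that every host is sampled, and no timing or epidemiologic constraints, and it uses the simplifying rule (an assumption) that each interior node takes the host of one of its two children. A branch whose parent and child hosts differ is then read as a transmission from the parent's host to the child's host. The functions `host_assignments` and `transmission_trees` are hypothetical names.

```python
from typing import List, Tuple, Union

# A phylogeny is a nested tuple; each leaf is the name of the host it was sampled from.
Tree = Union[str, Tuple["Tree", "Tree"]]
Edge = Tuple[str, str]  # (infector, infectee)


def host_assignments(tree: Tree) -> List[Tuple[str, List[Edge]]]:
    """Enumerate host labels for the interior nodes of a binary phylogeny.

    Simplifying assumptions (for illustration only): one sampled pathogen per
    host, every host sampled, no timing constraints. Each interior node takes
    the host of one of its two children. Returns a list of
    (host of this subtree's root, transmission edges inside the subtree).
    """
    if isinstance(tree, str):
        return [(tree, [])]  # a leaf's host is known
    left, right = tree
    out = []
    for l_host, l_edges in host_assignments(left):
        for r_host, r_edges in host_assignments(right):
            # Option 1: node belongs to the left child's host, who infects the right child's host.
            out.append((l_host, l_edges + r_edges + [(l_host, r_host)]))
            # Option 2: node belongs to the right child's host, who infects the left child's host.
            out.append((r_host, l_edges + r_edges + [(r_host, l_host)]))
    return out


def transmission_trees(tree: Tree) -> List[List[Edge]]:
    """One transmission tree (sorted list of edges) per interior-node assignment."""
    return [sorted(edges) for _, edges in host_assignments(tree)]


if __name__ == "__main__":
    trees = transmission_trees((("A", "B"), "C"))
    print(len(trees))  # 4
    for t in trees:
        print(t)
```

    Under these assumptions a phylogeny with n leaves yields 2^(n-1) assignments, each giving a distinct transmission tree; real data would prune this set further using the epidemiologic information on person, place, and time.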