Phylogenetic patterns recover known HIV epidemiological relationships and reveal common transmission of multiple variants.
The growth of human immunodeficiency virus (HIV) sequence databases resulting from drug resistance testing has motivated efforts using phylogenetic methods to assess how HIV spreads [1-4]. Such inference is potentially both powerful and useful for tracking the epidemiology of HIV and the allocation of resources to prevention campaigns. We recently used simulation and a small number of illustrative cases to show that certain phylogenetic patterns are associated with different types of epidemiological linkage [5]. Our original approach was later generalized for large next-generation sequencing datasets and implemented as a free computational pipeline [6]. Previous work has claimed that direction and directness of transmission could not be established from phylogeny because one could not be sure that there were no intervening or missing links involved [7-9]. Here, we address this issue by investigating phylogenetic patterns from 272 previously identified HIV transmission chains with 955 transmission pairs representing diverse geography, risk groups, subtypes, and genomic regions. These HIV transmissions had known linkage based on epidemiological information such as partner studies, mother-to-child transmission, pairs identified by contact tracing, and criminal cases. We show that the resulting phylogeny inferred from real HIV genetic sequences indeed reveals distinct patterns associated with direct transmission versus transmission from a common source. Thus, our results establish how to interpret phylogenetic trees based on HIV sequences when tracking who infected whom, and when and how genetic information can be used for improved tracking of HIV spread. We also investigate limitations that stem from limited sampling and genetic time-trends in the donor and recipient HIV populations.
skeleSim: an extensible, general framework for population genetic simulation in R
Simulations are a key tool in molecular ecology for inference and forecasting, as well as for evaluating new methods. Due to growing computational power and a diversity of software with different capabilities, simulations are becoming increasingly powerful and useful. However, the widespread use of simulations by geneticists and ecologists is hindered by difficulties in understanding these software packages' complex capabilities, composing code and input files, a daunting bioinformatics barrier, and a steep conceptual learning curve. skeleSim (an R package) guides users in choosing appropriate simulations, setting parameters, calculating genetic summary statistics, and organizing data output, in a reproducible pipeline within the R environment. skeleSim is designed to be an extensible framework that can "wrap" around any simulation software (inside or outside the R environment) and be extended to calculate and graph any genetic summary statistics. Currently, skeleSim implements coalescent and forward-time models available in the fastsimcoal2 and rmetasim simulation engines to produce null distributions for multiple population genetic statistics and marker types, under a variety of demographic conditions. skeleSim is intended to make simulations easier while still allowing full model complexity to ensure that simulations play a fundamental role in molecular ecology investigations. skeleSim can also serve as a teaching tool: demonstrating the outcomes of stochastic population genetic processes; teaching general concepts of simulations; and providing an introduction to the R environment with a user-friendly graphical user interface (using shiny).
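As a rough illustration of what "producing a null distribution for a population genetic statistic" can mean, the sketch below simulates pure genetic drift under a textbook Wright-Fisher model and collects expected heterozygosity across replicates. It is written in plain Python and is not skeleSim, fastsimcoal2, or rmetasim code; the function names, parameters, and the choice of statistic are assumptions made only for this example.

```python
import random


def wright_fisher(n_diploid, p0, generations, rng):
    """Allele frequency after `generations` of pure drift (Wright-Fisher model)."""
    copies = 2 * n_diploid  # gene copies in a diploid population
    p = p0
    for _ in range(generations):
        # Each gene copy of the next generation is sampled independently
        # from the current allele frequency (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
    return p


def null_heterozygosity(n_diploid, p0, generations, replicates, seed=1):
    """Null distribution of expected heterozygosity 2p(1-p) across replicates."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    values = []
    for _ in range(replicates):
        p = wright_fisher(n_diploid, p0, generations, rng)
        values.append(2 * p * (1 - p))
    return values
```

Calling `null_heterozygosity(n_diploid=50, p0=0.5, generations=20, replicates=200)` returns 200 simulated values. An observed heterozygosity could then be compared against their spread, which is the general idea of a simulation-based null distribution.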
Assembling evidence for identifying reservoirs of infection
Many pathogens persist in multihost systems, making the identification of infection reservoirs crucial for devising effective interventions. Here, we present a conceptual framework for classifying patterns of incidence and prevalence, and review recent scientific advances that allow us to study and manage reservoirs simultaneously. We argue that interventions can have a crucial role in enriching our mechanistic understanding of how reservoirs function and should be embedded as quasi-experimental studies in adaptive management frameworks. Single approaches to the study of reservoirs are unlikely to generate conclusive insights whereas the formal integration of data and methodologies, involving interventions, pathogen genetics, and contemporary surveillance techniques, promises to open up new opportunities to advance understanding of complex multihost systems
Integrating serological and genetic data to quantify cross-species transmission: brucellosis as a case study
Epidemiological data are often fragmented, partial, and/or ambiguous and unable to yield the desired level of understanding
of infectious disease dynamics to adequately inform control measures. Here, we show how the information contained in
widely available serology data can be enhanced by integration with less common type-specific data, to improve the understanding
of the transmission dynamics of complex multi-species pathogens and host communities. Using brucellosis in
Northern Tanzania as a case-study, we developed a latent process model based on serology data obtained from the
field, to reconstruct Brucella transmission dynamics. We were able to identify sheep and goats as a more likely source
of human and animal infection than cattle; however, the highly cross-reactive nature of Brucella spp. meant that it was
not possible to determine which Brucella species (B. abortus or B. melitensis) is responsible for human infection. We
extended our model to integrate simulated serology and typing data, and show that although serology alone can identify
the host source of human infection under certain restrictive conditions, the integration of even small amounts (5%) of
typing data can improve understanding of complex epidemiological dynamics. We show that data integration will often
be essential when more than one pathogen is present and when the distinction between exposed and infectious individuals
is not clear from serology data. With increasing epidemiological complexity, serology data become less informative.
However, we show how this weakness can be mitigated by integrating such data with typing data, thereby enhancing
the inference from these data and improving understanding of the underlying dynamics.
Bayesian reconstruction of SARS-CoV-2 transmissions highlights substantial proportion of negative serial intervals
BACKGROUND: The serial interval is a key epidemiological measure that quantifies the time between the onset of symptoms in an infector-infectee pair. It indicates how quickly new generations of cases appear, thus informing on the speed of an epidemic. Estimating the serial interval requires identifying pairs of infectors and infectees. Yet, most studies fail to assess the direction of transmission between cases and assume that the order of infections - and thus transmissions - strictly follows the order of symptom onsets, thereby forcing serial intervals to be positive. Because of the long and highly variable incubation period of SARS-CoV-2, this may not always be true (i.e., an infectee may show symptoms before their infector) and negative serial intervals may occur. This study aims to estimate the serial interval of different SARS-CoV-2 variants whilst accounting for negative serial intervals. METHODS: This analysis included 5 842 symptomatic individuals with confirmed SARS-CoV-2 infection amongst 2 579 households from September 2020 to August 2022 across England & Wales. We used a Bayesian framework to infer who infected whom by exploring all transmission trees compatible with the observed dates of symptoms, based on a wide range of incubation period and generation time distributions compatible with estimates reported in the literature. Serial intervals were derived from the reconstructed transmission pairs, stratified by variants. RESULTS: We estimated that 22% (95% credible interval (CrI) 8-32%) of serial interval values are negative across all variants of concern (VOC). The mean serial interval was shortest for Omicron BA.5 (2.02 days, 1.26-2.84) and longest for Alpha (3.37 days, 2.52-4.04). CONCLUSIONS: This study highlights the large proportion of negative serial intervals across SARS-CoV-2 variants. Because the serial interval is widely used to estimate transmissibility and forecast cases, these results may have critical implications for epidemic control.
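To make the definition concrete, the following minimal sketch computes serial intervals as infectee onset minus infector onset and reports the share that are negative. It assumes the infector-infectee pairs are already known, so it does not reproduce the Bayesian reconstruction described in the abstract. The case labels, dates, and function names are hypothetical.

```python
from datetime import date


def serial_intervals(onsets, pairs):
    """Serial interval in days: infectee symptom onset minus infector onset."""
    return [(onsets[infectee] - onsets[infector]).days
            for infector, infectee in pairs]


def proportion_negative(intervals):
    """Share of pairs in which the infectee showed symptoms first."""
    return sum(1 for s in intervals if s < 0) / len(intervals)


# Hypothetical household: symptom onset date per case.
onsets = {
    "A": date(2021, 1, 10),
    "B": date(2021, 1, 13),
    "C": date(2021, 1, 9),   # infected by A, but symptomatic before A
    "D": date(2021, 1, 15),
}
# Assumed (infector, infectee) pairs; in practice these must be inferred.
pairs = [("A", "B"), ("A", "C"), ("B", "D")]

intervals = serial_intervals(onsets, pairs)
print(intervals)                       # [3, -1, 2]
print(proportion_negative(intervals))  # one of three pairs is negative
```

Ordering cases by onset date and assuming the earlier case is the infector would have reversed the A-C pair and hidden the negative interval, which is the bias the abstract describes.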
Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases
Pathogen sequence data have been exploited to infer who infected whom, by using empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host populations of pathogens. Thus, these techniques provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight on epidemiological links than a single sequence per host. In general, a mechanistic viewpoint to transmission and micro-evolution has been followed to infer epidemiological links from these data. Here, we investigate an alternative approach grounded on statistical learning. The idea consists of learning the structure of epidemiological links with a pseudo-evolutionary model applied to training data obtained from contact tracing, for example, and using this initial stage to infer links for the whole dataset. Such an approach has the potential to be particularly valuable in the case of a risk of erroneous mechanistic assumptions, it is sufficiently parsimonious to allow the handling of big datasets in the future, and it is versatile enough to be applied to very different contexts from animal, human and plant epidemiology. This article is part of the theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes'. This issue is linked with the subsequent theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control'
Assessing the epidemic potential of RNA and DNA viruses
Many new and emerging RNA and DNA viruses are zoonotic or have zoonotic origins in an animal reservoir that is usually mammalian and sometimes avian. Not all zoonotic viruses are transmissible (directly or by an arthropod vector) between human hosts. Virus genome sequence data provide the best evidence of transmission. Of the human-transmissible viruses, 37 species have so far been restricted to self-limiting outbreaks. These viruses are priorities for surveillance because relatively minor changes in their epidemiologies can potentially lead to major changes in the threat they pose to public health. On the basis of comparisons across all recognized human viruses, we consider the characteristics of these priority viruses and assess the likelihood that they will further emerge in human populations. We also assess the likelihood that a virus that can infect humans but is not capable of transmission (directly or by a vector) between human hosts can acquire that capability.
Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees
Recent work has attempted to use whole-genome sequence data from pathogens to
reconstruct the transmission trees linking infectors and infectees in
outbreaks. However, transmission trees from one outbreak do not generalize to
future outbreaks. Reconstruction of transmission trees is most useful to public
health if it leads to generalizable scientific insights about disease
transmission. In a survival analysis framework, estimation of transmission
parameters is based on sums or averages over the possible transmission trees. A
phylogeny can increase the precision of these estimates by providing partial
information about who infected whom. The leaves of the phylogeny represent
sampled pathogens, which have known hosts. The interior nodes represent common
ancestors of sampled pathogens, which have unknown hosts. Starting from
assumptions about disease biology and epidemiologic study design, we prove that
there is a one-to-one correspondence between the possible assignments of
interior node hosts and the transmission trees simultaneously consistent with
the phylogeny and the epidemiologic data on person, place, and time. We develop
algorithms to enumerate these transmission trees and show these can be used to
calculate likelihoods that incorporate both epidemiologic data and a phylogeny.
A simulation study confirms that this leads to more efficient estimates of
hazard ratios for infectiousness and baseline hazards of infectious contact,
and we use these methods to analyze data from a foot-and-mouth disease virus
outbreak in the United Kingdom in 2001. These results demonstrate the
importance of data on individuals who escape infection, which is often
overlooked. The combination of survival analysis and algorithms linking
phylogenies to transmission trees is a rigorous but flexible statistical
foundation for molecular infectious disease epidemiology.
Comment: 28 pages, 11 figures, 3 tables
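The idea of summing over possible transmission trees can be illustrated with a small enumeration. The sketch below lists every transmission tree consistent with epidemiologic timing data alone: each non-index case may have been infected by any host who was infectious at that moment. It does not use a phylogeny and is not the paper's algorithm; per the abstract, a phylogeny would narrow this set further. The data layout, field names, and function names are assumptions for illustration.

```python
from itertools import product


def candidate_infectors(cases, target):
    """Hosts who were infectious at the time `target` became infected."""
    t = cases[target]["infected"]
    return [host for host, c in cases.items()
            if host != target and c["infectious_from"] <= t < c["removed"]]


def enumerate_transmission_trees(cases, index_case):
    """Yield each tree as a dict mapping infectee -> infector.

    Assumes a strictly positive latent period (infectious_from > infected),
    so every infector was infected earlier and no cycles can arise.
    """
    others = sorted(h for h in cases if h != index_case)
    options = [candidate_infectors(cases, h) for h in others]
    for choice in product(*options):
        yield dict(zip(others, choice))


# Hypothetical outbreak; times are in days since the index infection.
cases = {
    "A": {"infected": 0, "infectious_from": 1, "removed": 10},
    "B": {"infected": 2, "infectious_from": 3, "removed": 8},
    "C": {"infected": 4, "infectious_from": 5, "removed": 9},
    "D": {"infected": 6, "infectious_from": 7, "removed": 12},
}

trees = list(enumerate_transmission_trees(cases, index_case="A"))
print(len(trees))  # 6: one option for B, two for C, three for D
```

A likelihood that sums or averages over `trees` treats the infector of each case as unobserved. Additional information about who infected whom would remove some of these six trees and so sharpen the resulting estimates.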