    Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees

    Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show that these can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which is often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology. Comment: 28 pages, 11 figures, 3 tables
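
As a rough illustration of what a sum over the possible transmission trees involves, the sketch below enumerates every who-infected-whom assignment from a set of candidate infectors per case. It ignores the phylogeny entirely and is not the authors' algorithm. The function name, the case labels, and the candidate sets are all hypothetical, and the sketch assumes each candidate infector was infected before the case it might have infected, so that every assignment is a valid tree.

```python
from itertools import product


def enumerate_transmission_trees(candidate_infectors):
    """Yield each transmission tree as a dict mapping infectee -> infector.

    candidate_infectors maps every non-index case to the set of cases that
    could have infected it (assumed to have been infected earlier).
    """
    infectees = sorted(candidate_infectors)
    choices = [sorted(candidate_infectors[j]) for j in infectees]
    # One infector is chosen independently for each infectee.
    for combo in product(*choices):
        yield dict(zip(infectees, combo))


# Hypothetical toy outbreak: "A" is the index case and has no infector.
candidates = {"B": {"A"}, "C": {"A", "B"}, "D": {"B", "C"}}
trees = list(enumerate_transmission_trees(candidates))
print(len(trees))  # 1 * 2 * 2 candidate choices
```

Without genetic data the number of trees is the product of the candidate-set sizes. Partial information about who infected whom, such as that from a phylogeny, would shrink this set by excluding some assignments.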

    Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases

    Pathogen sequence data have been exploited to infer who infected whom, using empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host populations of pathogens. Thus, these techniques provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight into epidemiological links than a single sequence per host. In general, a mechanistic view of transmission and micro-evolution has been followed to infer epidemiological links from these data. Here, we investigate an alternative approach grounded in statistical learning. The idea consists of learning the structure of epidemiological links with a pseudo-evolutionary model applied to training data obtained from contact tracing, for example, and using this initial stage to infer links for the whole dataset. Such an approach has the potential to be particularly valuable where mechanistic assumptions risk being erroneous; it is sufficiently parsimonious to allow the handling of big datasets in the future, and it is versatile enough to be applied to very different contexts from animal, human and plant epidemiology. This article is part of the theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes'. This issue is linked with the subsequent theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control'.

    Integrating epidemiological and genetic data with different sampling intensities into a dynamic model of respiratory syncytial virus transmission.

    Respiratory syncytial virus (RSV) is responsible for a significant burden of severe acute lower respiratory tract illness in children under 5 years old, particularly infants. Prior to rolling out any vaccination program, identification of the source of infant infections could further guide vaccination strategies. We extended a dynamic model calibrated at the individual host level, initially fit to social-temporal data on shedding patterns, to include whole genome sequencing data available at a lower sampling intensity. The study population was 493 individuals (55 aged < 1 year) distributed across 47 households, observed through one RSV season in coastal Kenya. We found that 58/97 (60%) of RSV-A and 65/125 (52%) of RSV-B cases arose from infection probably occurring within the household. Nineteen (45%) infant infections appeared to be the result of infection by other household members, of which 13 (68%) were a result of transmission from a household co-occupant aged between 2 and 13 years. The applicability of genomic data in studies of transmission dynamics is highly context specific, influenced by the question, data collection protocols and pathogen under investigation. The results further highlight the importance of pre-school and school-aged children in RSV transmission, particularly the role they play in directly infecting the household infant. These age groups are a potential RSV vaccination target group.

    Integrating viral RNA sequence and epidemiological data to define transmission patterns for respiratory syncytial virus

    The analyses contained herein focus on making comparisons between model inferences obtained using different scales of pathogen identification, with a particular focus on respiratory syncytial virus (RSV). A significant proportion of lower respiratory tract infections in children has been attributed to infection by RSV and as such, there has been global interest in understanding its transmission characteristics in order to plan for effective control. Mathematical models have often been used to explore potential mechanisms that drive the patterns observed in data collected at different scales. Several models have been used to explore how immunity to RSV is acquired and maintained, vaccination strategies and potential drivers of seasonality. However, most of these models do not make a distinction between the two antigenically and genetically distinct RSV groups (RSV A and RSV B), nor do they consider its ecological environment, in particular, potential interactions between RSV and other viral pathogens. This thesis therefore presents work aimed at understanding the transmission characteristics of viral respiratory pathogens spreading in a group of households using a dynamic model of transmission. The data analysed is cohort data collected between December 2009 and June 2010 from 493 individuals distributed across 47 households from a rural coastal community in Kenya. Individuals in the study had nasopharyngeal swab samples collected twice weekly irrespective of symptom status. Infecting viral pathogens were identified using RT-PCR, resulting in the identification of 4 main pathogens: RSV, human coronavirus, rhinovirus and adenovirus. RSV and coronavirus were further classified according to genetically distinct subgroups. Some of the RSV samples were sequenced to obtain whole genome sequences (WGS) and further classified into genetic clades/clusters. I first conducted a review of methods to identify the best way to integrate social-temporal data and WGS genetic data into a single modelling framework for RSV. Given that the social-temporal data and genetic data were available at different sampling densities, I decided to use a model that focused on the data with the highest density. The results in this thesis are thus presented in three main chapters: the first focuses on analysing social-temporal shedding patterns of RSV identified at the group level (i.e. distinguishing between RSV A and RSV B); the second incorporates the available genetic data into the model used to analyse the social-temporal data (i.e. separating RSV-A into 5 clusters, and RSV-B into 7 clusters); the third is an analysis of the interaction of two pathogens, RSV and coronavirus, identified at two different scales. One of the main findings in this thesis is that the household setting plays an important role in the spread of RSV, a finding that is made clearer with added detail on pathogen type. In the case of the data analysed here, and the social structuring from which it was collected, RSV clades appeared to mimic household structure; as such, identification at this level did not drastically change the transmission characteristics observed with identification at the group level. However, the combination of epidemiological and genetic data elucidated transmission chains within the household, enabling the identification of the sources of infant RSV infections. For this particular study, it was inferred that the sources of infant RSV infections were both in the same household as the infant and from external sources. Where infant infections occurred in the household, the source of infection was often a child between the ages of 2 and 13 years. It was inferred that previous infection with one RSV group type reduced susceptibility to re-infection by the heterologous group type within the same epidemic. Interactions were also observed between RSV and human coronavirus groups. In particular, previous infection with RSV B was estimated to increase susceptibility to coronavirus OC43 by 81% (95% CrI: 40%, 134%). Detailed data of infection events in individual hosts can provide a wealth of knowledge. The inferences made from this study should be explored at larger spatial and temporal scales to determine the population level impact, and hence public-health significance, of pathogen interactions, whether these interactions are between strains of the same pathogen or between different pathogens. In planning for, and assessing the impact of, an intervention against a particular pathogen, investigators should not ignore the preexisting ecological balance and should make efforts to understand how this will be disrupted by an intervention against one or more pathogens.

    o2geosocial: Reconstructing who-infected-whom from routinely collected surveillance data

    Reconstructing the history of individual transmission events between cases is key to understanding what factors facilitate the spread of an infectious disease. Since conducting extended contact-tracing investigations can be logistically challenging and costly, statistical inference methods have been developed to reconstruct transmission trees from onset dates and genetic sequences. However, these methods are not as effective if the mutation rate of the virus is very slow, or if sequencing data is sparse. We developed the package o2geosocial to combine variables from routinely collected surveillance data with a simple transmission process model. The model reconstructs transmission trees when full genetic sequences are not available or are uninformative. Our model incorporates the reported age-group, onset date, location and genotype of infected cases to infer probabilistic transmission trees. The package also includes functions to summarise and visualise the inferred cluster size distribution. The results generated by o2geosocial can highlight regions where importations repeatedly caused large outbreaks, which may indicate a higher regional susceptibility to infections. It can also be used to generate the individual number of secondary transmissions, and show the features associated with individuals involved in high transmission events. The package is available for download from the Comprehensive R Archive Network (CRAN) and GitHub.

    Inference of Infectious Disease Dynamics from Genetic Data via Sequential Monte Carlo

    When an epidemic moves through a population of hosts, the process of transmission may leave a signature in the genetic sequences of the pathogen. Patterns in pathogen sequences may therefore be a rich source of information on disease dynamics. Genetic sequences may replace or supplement other epidemiological observations. Furthermore, sequences may contain information not present in other datatypes, opening the possibility of inferences inaccessible by other means. The field of phylodynamic inference aims to reconstruct disease dynamics from pathogen genetic sequences. Although a wide variety of phylodynamic inference methods have been proposed, most methods for fitting mechanistic models of disease operate in two disjoint steps, first estimating the phylogeny of the pathogen and then fitting models of disease dynamics to properties of the estimated phylogeny. Logical inconsistency in demographic assumptions underlying the two stages of inference may create bias in resulting parameter estimates. Joint inference of disease dynamics and phylogeny ensures consistent assumptions, but few methods for joint inference are currently available. The central work of this thesis is a new method for joint inference of disease dynamics and phylogeny from pathogen genetic sequences. This likelihood-based method, which we call genPomp, allows for fitting mechanistic models of arbitrary complexity to genetic sequences. The organization of this thesis is as follows. In Chapter I, we present background on the field of phylodynamic inference. In Chapter II, we use simulation to study a two-stage inference approach proposed by Rasmussen et al. (2011). We find that errors in phylogenetic reconstruction may drive bias in two-stage phylodynamic inference. This result underscores the need for methodology for joint inference of the transmission model and the pathogen phylogeny. In Chapter III, we propose a flexible method for joint inference and demonstrate the feasibility of this method through simulation and a study on stage-specific infectiousness of HIV in Detroit, MI. This method comprises a class of algorithms that use sequential Monte Carlo to estimate and maximize likelihoods. In Appendix A we show theoretical support for our algorithms. In Chapter IV, we demonstrate the flexibility of our approach by developing a model of transmission of vancomycin-resistant enterococcus in a hospital setting. To allow for fitting this model to patient-level data we developed a targeted proposal, detailed in Appendix B. We present exploratory analysis of a hospital outbreak at NIH that motivates the form of the model, and carry out a study on simulated data. Although some assumptions of the simulated example are unrealistic, these initial results will inform future efforts at fitting real data. In Chapter V, we summarize the progress represented in this thesis and consider possibilities for future work. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/146063/1/alxsmth_1.pd
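
To make the phrase "use sequential Monte Carlo to estimate ... likelihoods" concrete, here is a minimal sketch of a generic bootstrap particle filter. It is not genPomp and involves no genetic sequences or phylogenies. It assumes a hypothetical toy model with a latent AR(1) state and Gaussian observation noise, and the function name and all parameters are invented for the example.

```python
import math
import random


def particle_filter_loglik(observations, n_particles=500, rho=0.9,
                           process_sd=1.0, obs_sd=1.0, seed=0):
    """Estimate the log-likelihood of observations under a toy AR(1) model.

    Latent state: x_t = rho * x_{t-1} + Normal(0, process_sd) noise.
    Observation:  y_t = x_t + Normal(0, obs_sd) noise.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, process_sd) for _ in range(n_particles)]
    norm_const = obs_sd * math.sqrt(2.0 * math.pi)
    loglik = 0.0
    for y in observations:
        # Propagate each particle through the latent process model.
        particles = [rho * x + rng.gauss(0.0, process_sd) for x in particles]
        # Weight each particle by the density of the observation given it.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) / norm_const
                   for x in particles]
        mean_weight = sum(weights) / n_particles
        if mean_weight == 0.0:
            return float("-inf")  # every particle is incompatible with y
        # The mean weight estimates the conditional likelihood of y.
        loglik += math.log(mean_weight)
        # Resample in proportion to the weights to avoid degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik


print(particle_filter_loglik([0.1, -0.3, 0.5, 0.2]))
```

Maximizing such an estimate over model parameters is the general idea behind likelihood-based fitting with particle filters. The estimate is noisy, so in practice the number of particles trades accuracy against computation.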

    Integrating host population contact structure and pathogen whole-genome sequence data to understand the epidemiology of infectious diseases : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy, Massey University, Manawatū, New Zealand

    With advances in high-throughput sequencing technologies, computational biology, and evolutionary modelling, pathogen sequence data is increasingly being used to inform infectious disease outbreak investigations, supporting inferences on the timing and directionality of transmission as well as providing insights into pathogen evolutionary dynamics and the development of antimicrobial resistance. This thesis focuses on the application of pathogen whole-genome sequence data in conjunction with social network analysis to investigate the transmission dynamics of two important pathogens: Campylobacter jejuni and Staphylococcus aureus. The first four studies centre around the recent emergence of an antimicrobial-resistant C. jejuni strain that was found to have rapidly spread throughout the New Zealand commercial poultry industry. All four studies build on the results of an industry survey that were used not only to determine the basic farm demographics and biosecurity practices of all poultry producers, but also to construct five contact networks representing the on- and off-farm movement patterns of goods and services. Contact networks were used in study one to investigate the relationship between farm-level contact risk pathways and the reported level of biosecurity. However, despite many farms having a number of contact risk pathways, no relationship was found due to the high level of variability in biosecurity practices between producers. In study two the contact risk between commercial poultry, backyard poultry, and wild birds was investigated by examining the spatial overlap between the commercial contact networks and (i) all poultry transactions made through the online auction website TradeMe® and (ii) all wild bird observations made through the online citizen science bird monitoring project, eBird, with study results suggesting that the greatest risk is due to the growing number of online trades made over increasingly long distances and shorter timespans. Study three further uses the commercial contact networks to investigate the role of multiple transmission pathways on the genetic relatedness of 167 C. jejuni isolates sampled from across 30 commercial poultry farms. Permutational multivariate analysis of variance and distance-based linear models were used to explore the relative importance of network distances as potential determinants of the pairwise genetic relatedness between the C. jejuni isolates, with study results highlighting the importance of transporting feed vehicles in addition to the geographical proximity of farms and the parent company in the spread of disease. In the last of the four C. jejuni studies, a compartmental disease transmission model was developed to simulate both the spread and sequence mutations across an outbreak within the commercial poultry industry. Simulated sequences were used in an analysis mirroring the methods used in study three in order to validate the approaches examining the contribution of local contacts and network contacts towards disease transmission. An additional analysis is also performed in which the simulated sequence data is used to infer a transmission tree and explore the use of pathogen phylogenies in determining who-infected-whom across different model systems. A further study, motivated by the application of whole-genome sequence data to infer transmission, investigated the spread of S. aureus within the New Zealand dairy industry. This study demonstrated how whole-genome sequence data can be used to investigate pathogen population and evolutionary dynamics at multiple scales: from local to national and international. For this study, the genetic relatedness between 57 bovine-derived S. aureus isolates sampled from across 17 New Zealand dairy herds was compared with 59 S. aureus isolates that had been previously sampled and characterised from humans and domestic pets from across New Zealand and 103 S. aureus isolates extracted from GenBank that included both human and livestock isolates sampled from across 19 countries. Results from this study not only support evidence showing that the movement of live animals is an important risk factor for the spread of S. aureus, but also show that using cattle-tracing data alone may not be enough to fully capture the between-farm transmission dynamics of S. aureus. Overall, by using these two pathogen examples, this thesis demonstrates the potential use of pathogen whole-genome sequence data alongside contact network data in an epidemiological investigation, whilst highlighting the limitations and future challenges that must be considered in order to continue to develop robust methods that can be used to reliably infer the transmission and evolutionary dynamics across a range of infectious diseases.