60 research outputs found
Automatic differentiation is no panacea for phylogenetic gradient computation
Gradients of probabilistic model likelihoods with respect to their parameters
are essential for modern computational statistics and machine learning. These
calculations are readily available for arbitrary models via automatic
differentiation implemented in general-purpose machine-learning libraries such
as TensorFlow and PyTorch. Although these libraries are highly optimized, it is
not clear if their general-purpose nature will limit their algorithmic
complexity or implementation speed for the phylogenetic case compared to
phylogenetics-specific code. In this paper, we compare six gradient
implementations of the phylogenetic likelihood functions, in isolation and also
as part of a variational inference procedure. We find that although automatic
differentiation can scale approximately linearly in tree size, it is much
slower than the carefully-implemented gradient calculation for tree likelihood
and ratio transformation operations. We conclude that a mixed approach
combining phylogenetic libraries with machine learning libraries will provide
the optimal combination of speed and model flexibility moving forward.
Comment: 15 pages and 2 figures in main text, plus supplementary material
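To make concrete what "automatic differentiation of a phylogenetic likelihood" involves, here is a minimal sketch in pure Python. It is not taken from the paper: it assumes the JC69 substitution model, a single alignment site, a hypothetical nested-list tree encoding, and forward-mode differentiation with dual numbers (TensorFlow and PyTorch use reverse-mode differentiation for gradients instead). The names `Dual`, `partials` and `log_lik_and_grad` are illustrative only.

```python
import math

class Dual:
    """A value together with its derivative w.r.t. one chosen input."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def d_exp(x):
    e = math.exp(x.val)
    return Dual(e, e * x.der)

def d_log(x):
    return Dual(math.log(x.val), x.der / x.val)

def jc69(t):
    """JC69 transition matrix for a Dual branch length t (assumed model)."""
    e = d_exp(t * (-4.0 / 3.0))
    same = 0.25 + 0.75 * e
    diff = 0.25 + (-0.25) * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partials(node, t):
    """Felsenstein pruning. A node is a leaf state (int 0..3) or a list of
    (child, branch_index) pairs; this encoding is hypothetical."""
    if isinstance(node, int):
        return [Dual(1.0 if s == node else 0.0) for s in range(4)]
    out = [Dual(1.0) for _ in range(4)]
    for child, b in node:
        P, cp = jc69(t[b]), partials(child, t)
        for s in range(4):
            out[s] = out[s] * sum((P[s][x] * cp[x] for x in range(4)), Dual(0.0))
    return out

def log_lik_and_grad(tree, lengths):
    """Log-likelihood and gradient, using one forward-mode pass per branch.
    Assumes at least one branch length is given."""
    grad = []
    for k in range(len(lengths)):
        # Seed derivative 1 on branch k and 0 on all other branches.
        t = [Dual(v, 1.0 if i == k else 0.0) for i, v in enumerate(lengths)]
        ll = d_log(sum((0.25 * p for p in partials(tree, t)), Dual(0.0)))
        grad.append(ll.der)
    return ll.val, grad

# Example: leaf A joined to the root, and leaves B, C joined via an inner node.
tree = [(0, 0), ([(0, 1), (1, 2)], 3)]
log_lik, grad = log_lik_and_grad(tree, [0.1, 0.2, 0.3, 0.05])
```

Because forward mode needs one full likelihood pass per branch, this sketch costs time quadratic in the number of branches; it is meant only to show the mechanics, not to reflect the performance of any implementation compared in the paper.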
The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2
Understanding the circumstances that lead to pandemics is important for their prevention. Here, we analyze the genomic diversity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) early in the coronavirus disease 2019 (COVID-19) pandemic. We show that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted A and B. Phylodynamic rooting methods, coupled with epidemic simulations, reveal that these lineages were the result of at least two separate cross-species transmission events into humans. The first zoonotic transmission likely involved lineage B viruses around 18 November 2019 (23 October–8 December), while the separate introduction of lineage A likely occurred within weeks of this event. These findings indicate that it is unlikely that SARS-CoV-2 circulated widely in humans prior to November 2019 and define the narrow window between when SARS-CoV-2 first jumped into humans and when the first cases of COVID-19 were reported. As with other coronaviruses, SARS-CoV-2 emergence likely resulted from multiple zoonotic events
Ebola virus transmission initiated by systemic ebola virus disease relapse
During the 2018-2020 Ebola virus disease (EVD) outbreak in North Kivu province in the Democratic Republic of Congo, EVD was diagnosed in a patient who had received the recombinant vesicular stomatitis virus-based vaccine expressing a ZEBOV glycoprotein (rVSV-ZEBOV) (Merck). His treatment included an Ebola virus (EBOV)-specific monoclonal antibody (mAb114), and he recovered within 14 days. However, 6 months later, he presented again with severe EVD-like illness and EBOV viremia, and he died. We initiated epidemiologic and genomic investigations that showed that the patient had had a relapse of acute EVD that led to a transmission chain resulting in 91 cases across six health zones over 4 months. (Funded by the Bill and Melinda Gates Foundation and others.)
Genomic epidemiology reveals multiple introductions of Zika virus into the United States
Zika virus (ZIKV) is causing an unprecedented epidemic linked to severe congenital abnormalities. In July 2016, mosquito-borne ZIKV transmission was reported in the continental United States; since then, hundreds of locally acquired infections have been reported in Florida. To gain insights into the timing, source, and likely route(s) of ZIKV introduction, we tracked the virus from its first detection in Florida by sequencing ZIKV genomes from infected patients and Aedes aegypti mosquitoes. We show that at least 4 introductions, but potentially as many as 40, contributed to the outbreak in Florida and that local transmission is likely to have started in the spring of 2016, several months before its initial detection. By analysing surveillance and genetic data, we show that ZIKV moved among transmission zones in Miami. Our analyses show that most introductions were linked to the Caribbean, a finding corroborated by the high incidence rates and traffic volumes from the region into the Miami area. Our study provides an understanding of how ZIKV initiates transmission in new regions.
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
Genome sequencing has become a powerful tool for studying emerging infectious diseases; however, genome sequencing directly from clinical samples (i.e., without isolation and culture) remains challenging for viruses such as Zika, for which metagenomic sequencing methods may generate insufficient numbers of viral reads. Here we present a protocol for generating coding-sequence-complete genomes, comprising an online primer design tool, a novel multiplex PCR enrichment protocol, optimized library preparation methods for the portable MinION sequencer (Oxford Nanopore Technologies) and the Illumina range of instruments, and a bioinformatics pipeline for generating consensus sequences. The MinION protocol does not require an Internet connection for analysis, making it suitable for field applications with limited connectivity. Our method relies on multiplex PCR for targeted enrichment of viral genomes from samples containing as few as 50 genome copies per reaction. Viral consensus sequences can be achieved in 1-2 d by starting with clinical samples and following a simple laboratory workflow. This method has been successfully used by several groups studying Zika virus evolution and is facilitating an understanding of the spread of the virus in the Americas. The protocol can be used to sequence other viral genomes using the online Primal Scheme primer designer software. It is suitable for sequencing either RNA or DNA viruses in the field during outbreaks or as an inexpensive, convenient method for use in the lab
Genome sequencing reveals Zika virus diversity and spread in the Americas
Although the recent Zika virus (ZIKV) epidemic in the Americas and its link to birth defects have attracted a great deal of attention, much remains unknown about ZIKV disease epidemiology and ZIKV evolution, in part owing to a lack of genomic data. Here we address this gap in knowledge by using multiple sequencing approaches to generate 110 ZIKV genomes from clinical and mosquito samples from 10 countries and territories, greatly expanding the observed viral genetic diversity from this outbreak. We analysed the timing and patterns of introductions into distinct geographic regions; our phylogenetic evidence suggests rapid expansion of the outbreak in Brazil and multiple introductions of outbreak strains into Puerto Rico, Honduras, Colombia, other Caribbean islands, and the continental United States. We find that ZIKV circulated undetected in multiple regions for many months before the first locally transmitted cases were confirmed, highlighting the importance of surveillance of viral infections. We identify mutations with possible functional implications for ZIKV biology and pathogenesis, as well as those that might be relevant to the effectiveness of diagnostic tests
Inferring the risk factors behind the geographical spread and transmission of Zika in the Americas
Background: An unprecedented Zika virus epidemic occurred in the Americas during 2015-2016. The size of the epidemic in conjunction with newly recognized health risks associated with the virus attracted significant attention across the research community. Our study complements several recent studies which have mapped epidemiological elements of Zika, by introducing a newly proposed methodology to simultaneously estimate the contribution of various risk factors for geographic spread resulting in local transmission and to compute the risk of spread (or re-introductions) between each pair of regions. The focus of our analysis is on the Americas, where the set of regions includes all countries, overseas territories, and the states of the US.
Methodology/Principal findings: We present a novel application of the Generalized Inverse Infection Model (GIIM). The GIIM uses real observations from the outbreak and seeks to estimate the risk factors driving transmission. The observations are derived from the dates of reported local transmission of Zika virus in each region, the network structure is defined by the passenger air travel movements between all pairs of regions, and the risk factors considered include regional socioeconomic factors, vector habitat suitability, travel volumes, and epidemiological data. The GIIM relies on a multi-agent-based optimization method to estimate the parameters, and utilizes a data-driven stochastic-dynamic epidemic model for evaluation. As expected, we found that mosquito abundance, incidence rate at the origin region, and human population density are risk factors for Zika virus transmission and spread. Surprisingly, air passenger volume was less impactful, and the most significant factor was (a negative relationship with) the regional gross domestic product (GDP) per capita.
Conclusions/Significance: Our model generates country-level exportation and importation risk profiles over the course of the epidemic and provides quantitative estimates for the likelihood of introduced Zika virus resulting in local transmission, between all origin-destination travel pairs in the Americas. Our findings indicate that local vector control, rather than travel restrictions, will be more effective at reducing the risks of Zika virus transmission and establishment. Moreover, the inverse relationship between Zika virus transmission and GDP suggests that Zika cases are more likely to occur in regions where people cannot afford to protect themselves from mosquitoes. The modeling framework is not specific for Zika virus, and could easily be employed for other vector-borne pathogens with sufficient epidemiological and entomological data.
Many-core algorithms for high-dimensional gradients on phylogenetic trees.
MOTIVATION: Advancements in high-throughput genomic sequencing are delivering genomic pathogen data at an unprecedented rate, positioning statistical phylogenetics as a critical tool to monitor infectious diseases globally. This rapid growth spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters that traditionally takes O(N^2) operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in O(N), enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for larger state-space size models, such as Markov-modulated and codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in many-fold speedups over previous CPU implementations.
RESULTS: We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples exploring complete genomes from 997 dengue viruses, 62 carnivore mitochondria and 49 yeasts, and observe a >128-fold speedup over the CPU implementation for codon-based models and >8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable.
AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our GPU algorithms in BEAGLE v4.0.0 (https://github.com/beagle-dev/beagle-lib), an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (https://github.com/beast-dev/beast-mcmc)
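The contrast between quadratic and linear cost for the branch-length gradient can be illustrated with a short sketch. The code below is not the BEAGLE implementation and is not confirmed to match the cited linear-time approach in detail; it shows one standard way to obtain all branch derivatives from a single post-order pass plus a single pre-order pass. It assumes the JC69 model, one site, a binary tree, and a hypothetical nested-list tree encoding; all function names are illustrative.

```python
import math

def jc69(t):
    """JC69 transition matrix P(t) and its derivative dP/dt (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    P = [[0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e for j in range(4)]
         for i in range(4)]
    dP = [[-e if i == j else e / 3.0 for j in range(4)] for i in range(4)]
    return P, dP

def post_order(node, t, post):
    """Pruning pass. A node is a leaf state (int 0..3) or a list of
    (child, branch_index) pairs. Stores in post[b] the partial likelihood
    vector at the child end of branch b and returns this node's vector."""
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in range(4)]
    out = [1.0] * 4
    for child, b in node:
        post[b] = post_order(child, t, post)
        P, _ = jc69(t[b])
        for s in range(4):
            out[s] *= sum(P[s][x] * post[b][x] for x in range(4))
    return out

def pre_order(node, upper, t, post, dlik):
    """upper[s] is the likelihood of all data outside this node's subtree
    jointly with state s at this node. Fills dlik[b] = d(likelihood)/d t[b]."""
    if isinstance(node, int):
        return
    # Message each child subtree sends up to this node.
    msgs = {}
    for _, b in node:
        P, _ = jc69(t[b])
        msgs[b] = [sum(P[s][x] * post[b][x] for x in range(4)) for s in range(4)]
    for child, b in node:
        P, dP = jc69(t[b])
        # q[s]: everything except the subtree below branch b, with state s here.
        q = list(upper)
        for _, b2 in node:
            if b2 != b:
                q = [q[s] * msgs[b2][s] for s in range(4)]
        dlik[b] = sum(q[s] * dP[s][x] * post[b][x]
                      for s in range(4) for x in range(4))
        child_upper = [sum(q[s] * P[s][x] for s in range(4)) for x in range(4)]
        pre_order(child, child_upper, t, post, dlik)

def log_lik_and_grad(tree, t):
    """Log-likelihood and its gradient w.r.t. every branch length."""
    post = [None] * len(t)
    root = post_order(tree, t, post)
    lik = sum(0.25 * p for p in root)  # uniform root frequencies under JC69
    dlik = [0.0] * len(t)
    pre_order(tree, [0.25] * 4, t, post, dlik)
    return math.log(lik), [d / lik for d in dlik]

# Example: four leaves, six branches.
tree = [([(0, 0), (1, 1)], 4), ([(0, 2), (2, 3)], 5)]
log_lik, grad = log_lik_and_grad(tree, [0.1, 0.2, 0.15, 0.3, 0.05, 0.07])
```

Each node is visited once in each pass, so for a binary tree the work grows linearly with the number of branches, whereas recomputing the pruning pass separately for every branch derivative would grow quadratically.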