33 research outputs found

    Genomic epidemiology of SARS-CoV-2: from outbreak investigations, to national and international surveillance efforts

    The response of the global genomics community to the SARS-CoV-2 pandemic has been unprecedented. At the time of writing, there are more than 3.7 million SARS-CoV-2 genome sequences shared publicly on GISAID (www.gisaid.org). Data on this scale presents novel opportunities and challenges for the field of genomic epidemiology. This thesis describes the development, validation and implementation of novel tools to facilitate different aspects of genomic epidemiology, from outbreak investigations to surveillance efforts. The Pango nomenclature lineage system is a set of rules that defines epidemiological lineages of SARS-CoV-2. Pango defines lineages from whole genome sequences, which 195 nations around the world have been producing for SARS-CoV-2. In chapter 1, I discuss the development and validation of pangolin, a software tool developed to assign the most likely Pango lineage to novel SARS-CoV-2 genomes. Initially, pangolin used a classic phylogenetic approach to assign lineages, although further methods were trialled and implemented as the pandemic progressed to cope with the scale of, and analytical challenges associated with, SARS-CoV-2 data. Since it was first implemented, millions of SARS-CoV-2 genomes have been assigned lineages with the pangolin tool by users across the world. For a number of reasons, labs may not be in a position to produce full genome sequences. Chapter 2 investigates how the lineage system can be used if only spike nucleotide sequences are available and defines ‘lineage sets’ that summarise what lineage information exists within a given spike haplotype. We find that for many lineages, including the main lineages corresponding to the WHO-defined variants of concern (VOCs), the spike nucleotide sequence is sufficient to distinguish Pango lineages, and I describe the development of hedgehog, a software tool that wraps pangolin and both defines and assigns these spike-based lineage sets. 
Pango lineage assignments with pangolin have been used almost ubiquitously across the globe and provide a simple, quick piece of information to classify SARS-CoV-2 genomes. However, for both outbreak investigations and routine surveillance, a more in-depth analysis is needed to give more than just this one piece of information. In chapter 3, I present civet, a software tool that addresses the challenge of a global SARS-CoV-2 dataset on the order of 3.7 million sequences and performs robust phylogenetic analyses on query sequences of interest, whilst contextualising them in the background data. Using civet, the user can produce an interactive report that summarises genomic, phylogenetic and epidemiological information, enabling routine analyses and investigations to be carried out in a single command. The suite of tools in this thesis has been developed to enable researchers to rapidly get robust and actionable information from SARS-CoV-2 genomes for genomic epidemiology efforts worldwide.

    Comparison of eleven RNA extraction methods for poliovirus direct molecular detection in stool samples

    Direct detection by PCR of poliovirus RNA in stool samples provides a rapid diagnostic and surveillance tool that can replace virus isolation by cell culture in global polio surveillance. The sensitivity of direct detection methods is likely to depend on the choice of RNA extraction method and sample volume. We report a comparative analysis of 11 nucleic acid extraction methods (7 manual and 4 semiautomated) for poliovirus molecular detection using stool samples (n = 59) that had been previously identified as poliovirus positive by cell culture. To assess the effect of RNA recovery methods, RNA extracted using each of the 11 methods was tested with a poliovirus-specific reverse transcription-quantitative PCR (RT-qPCR), a pan-poliovirus RT-PCR (near-whole-genome amplification), a pan-enterovirus RT-PCR (entire capsid region), and a nested VP1 PCR that is the basis of a direct detection method based on nanopore sequencing. We also assessed extracted RNA integrity and quantity. The overall effect of extraction method on poliovirus PCR amplification assays tested in this study was found to be statistically significant (P < 0.001), thus indicating that the choice of RNA extraction method is an important component that needs to be carefully considered for any diagnostic based on nucleic acid amplification. Performance of the methods was generally consistent across the different assays used. Of the 11 extraction methods tested, the MagMAX viral RNA isolation kit used manually or automatically was found to be the preferable method for poliovirus molecular direct detection considering performance, cost, and processing time.

    Genome annotation improvements from cross-phyla proteogenomics and time-of-day differences in malaria mosquito proteins using untargeted quantitative proteomics

    The malaria mosquito, Anopheles stephensi, and other mosquitoes modulate their biology to match the time of day. In the present work, we used a non-hypothesis-driven approach (untargeted proteomics) to identify proteins in mosquito tissue, and then quantified the relative abundance of the identified proteins from An. stephensi bodies. Using these quantified protein levels, we then analyzed the data for proteins that were only detectable at certain times of the day, highlighting the need to consider time of day in experimental design. Further, we extended our time-of-day analysis to look for proteins which cycle in a rhythmic 24-hour ("circadian") manner, identifying 31 rhythmic proteins. Finally, to maximize the utility of our data, we performed a proteogenomic analysis to improve the genome annotation of An. stephensi. We compared peptides that were detected using mass spectrometry but are 'missing' from the An. stephensi predicted proteome to reference proteomes from 38 other primarily human disease vector species. We found 239 such peptide matches and show that genome annotation can be improved using proteogenomic analysis from taxonomically diverse reference proteomes. Examination of 'missing' peptides revealed reading frame errors, errors in gene-calling, overlapping gene models, and suspected gaps in the genome assembly.

    Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool.

    Funder: Oxford Martin School, University of Oxford. The response of the global virus genomics community to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been unprecedented, with significant advances made towards the 'real-time' generation and sharing of SARS-CoV-2 genomic data. The rapid growth in virus genome data production has necessitated the development of new analytical methods that can deal with orders of magnitude more genomes than previously available. Here, we present and describe Phylogenetic Assignment of Named Global Outbreak Lineages (pangolin), a computational tool that has been developed to assign the most likely lineage to a given SARS-CoV-2 genome sequence according to the Pango dynamic lineage nomenclature scheme. To date, nearly two million virus genomes have been submitted to the web-application implementation of pangolin, which has facilitated SARS-CoV-2 genomic epidemiology and provided researchers with access to actionable information about the pandemic's transmission lineages.

    Rapid and sensitive direct detection and identification of poliovirus from stool and environmental surveillance samples using nanopore sequencing

    Global poliovirus surveillance involves virus isolation from stool and environmental samples, intratypic differentiation (ITD) by PCR, and sequencing of the VP1 region to distinguish vaccine (Sabin), vaccine-derived, and wild-type polioviruses and to ensure an appropriate response. This cell culture algorithm takes 2 to 3 weeks on average between sample receipt and sequencing. Direct detection of viral RNA using PCR allows faster detection but has traditionally faced challenges related to poor sensitivity and difficulties in sequencing common samples containing poliovirus and enterovirus mixtures. We present a nested PCR and nanopore sequencing protocol that allows rapid (99.9%. This novel method shows promise as a faster and safer alternative to cell culture for the detection and real-time sequencing of polioviruses in stool and environmental samples.

    Evaluating the effects of SARS-CoV-2 Spike mutation D614G on transmissibility and pathogenicity

    Summary: Global dispersal and increasing frequency of the SARS-CoV-2 Spike protein variant D614G are suggestive of a selective advantage but may also be due to a random founder effect. We investigate the hypothesis for positive selection of Spike D614G in the United Kingdom using more than 25,000 whole genome SARS-CoV-2 sequences. Despite the availability of a large data set, well represented by both Spike 614 variants, not all approaches showed a conclusive signal of positive selection. Population genetic analysis indicates that 614G increases in frequency relative to 614D in a manner consistent with a selective advantage. We do not find any indication that patients infected with the Spike 614G variant have higher COVID-19 mortality or clinical severity, but 614G is associated with higher viral load and younger age of patients. Significant differences in growth and size of 614G phylogenetic clusters indicate a need for continued study of this variant.

    COVID-19 in health-care workers in three hospitals in the south of the Netherlands: A cross-sectional study

    Background: 10 days after the first reported case of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in the Netherlands (on Feb 27, 2020), 55 (4%) of 1497 health-care workers in nine hospitals located in the south of the Netherlands had tested positive for SARS-CoV-2 RNA. We aimed to gain insight into possible sources of infection in health-care workers. Methods: We did a cross-sectional study at three of the nine hospitals located in the south of the Netherlands. We screened health-care workers at the participating hospitals for SARS-CoV-2 infection, based on clinical symptoms (fever or mild respiratory symptoms) in the 10 days before screening. We obtained epidemiological data through structured interviews with health-care workers and combined this information with data from whole-genome sequencing of SARS-CoV-2 in clinical samples taken from health-care workers and patients. We did an in-depth analysis of sources and modes of transmission of SARS-CoV-2 in health-care workers and patients. Findings: Between March 2 and March 12, 2020, 1796 (15%) of 12 022 health-care workers were screened, of whom 96 (5%) tested positive for SARS-CoV-2. We obtained complete and near-complete genome sequences from 50 health-care workers and ten patients. Most sequences were grouped in three clusters, with two clusters showing local circulation within the region. The noted patterns were consistent with multiple introductions into the hospitals through community-acquired infections and local amplification in the community. Interpretation: Although direct transmission in the hospitals cannot be ruled out, our data do not support widespread nosocomial transmission as the source of infection in patients or health-care workers. Funding: EU Horizon 2020 (RECoVer, VEO, and the European Joint Programme One Health METASTAVA), and the National Institute of Allergy and Infectious Diseases, National Institutes of Health.