    Discovering new viral lineages and estimating their abundance in wastewater

    Wastewater surveillance of SARS-CoV-2 has emerged as a critical tool for tracking the spread of COVID-19. In addition to estimating relative case numbers using qPCR, SARS-CoV-2 genomic RNA can be extracted from wastewater and sequenced. The sequenced genomes provide information about which lineages, in particular which variants of concern (VOCs), are present in a community. Wastewater RNA sequencing data poses two distinct challenges. First, the genomes are highly fragmented and the alignments often have poor genome coverage. Second, the samples are composed of a mixture of genomes, so mutations cannot be directly attributed to a single lineage. In this thesis, I explore methods to overcome these two challenges and extract useful information from the samples. First, I look at the problem of determining the relative abundance of VOCs. Most existing techniques consider only mutations that are unique to a particular VOC, which massively reduces the amount of usable data. I introduce a new technique that extends mean and median frequencies over shared mutations in order to make use of the huge pool of shared mutations. Next, I investigate strategies for designing single-amplicon sequencing methods. I look at selecting single amplicons that are well conserved and rich in information. I also design a single amplicon that is capable of amplifying multiple coronaviruses. I conclude the SARS-CoV-2 work by providing a technique that can identify novel lineages and sublineages from wastewater sequencing runs. Finally, I show that the techniques for analyzing SARS-CoV-2 in wastewater can also be applied to an important plant pathogen, the Tomato Brown Rugose Fruit Virus.
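
To make the unique-mutation baseline mentioned in this abstract concrete, here is a minimal Python sketch. It illustrates only the baseline that the abstract says most existing techniques use (the median observed frequency of mutations unique to one lineage), not the thesis's extension to shared mutations, whose details are not given here. The function name, lineage labels, mutation labels, and frequencies are all hypothetical and chosen purely for illustration.

```python
from statistics import median


def unique_mutation_abundance(signatures, observed_freqs):
    """Estimate each lineage's abundance as the median observed frequency
    of the mutations that belong to that lineage alone.

    signatures: dict mapping lineage name -> set of mutation labels
    observed_freqs: dict mapping mutation label -> frequency in the sample
    Returns a dict mapping lineage name -> estimate (None if the lineage
    has no unique mutation observed in the sample).
    """
    # Count how many lineages carry each mutation.
    carriers = {}
    for mutations in signatures.values():
        for mutation in mutations:
            carriers[mutation] = carriers.get(mutation, 0) + 1

    estimates = {}
    for lineage, mutations in signatures.items():
        # Keep only mutations carried by this lineage and no other.
        unique = [m for m in mutations
                  if carriers[m] == 1 and m in observed_freqs]
        if unique:
            estimates[lineage] = median([observed_freqs[m] for m in unique])
        else:
            estimates[lineage] = None
    return estimates


# Hypothetical two-lineage sample; "shared" is carried by both lineages
# and is therefore ignored by this baseline.
signatures = {
    "lineage_A": {"m1", "m2", "m3", "shared"},
    "lineage_B": {"m4", "m5", "shared"},
}
observed_freqs = {
    "m1": 0.30, "m2": 0.28, "m3": 0.35,
    "m4": 0.70, "m5": 0.66, "shared": 0.98,
}
estimates = unique_mutation_abundance(signatures, observed_freqs)
print(estimates)
```

The sketch discards the "shared" mutation entirely, which shows on a small scale the loss of usable data that the abstract describes.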

    Tracking SARS-CoV-2 variants of concern in wastewater: an assessment of nine computational tools using simulated genomic data

    Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study, we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic 'novel' lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances, and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1% frequency, results were more reliable above a 5% threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increased the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and appreciate the commonalities and differences across methods.
    Pattern Recognition and Bioinformatic
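
As a rough illustration of how a tool's output might be scored against a simulated mixture like those described in this abstract, the Python sketch below computes a mean absolute error over lineage abundances and lists the lineages called at a chosen frequency threshold. The metric, the function names, and the example numbers are assumptions made for illustration; the abstract does not state which error measures the study actually used.

```python
def abundance_error(true_mix, estimated_mix):
    """Mean absolute difference between true and estimated abundances,
    taken over every lineage named in either mixture. A lineage missing
    from one side is treated as having abundance 0.0 there."""
    lineages = set(true_mix) | set(estimated_mix)
    total = sum(abs(true_mix.get(name, 0.0) - estimated_mix.get(name, 0.0))
                for name in lineages)
    return total / len(lineages)


def detected(estimated_mix, threshold=0.05):
    """Lineages whose estimated abundance is at or above the threshold."""
    return {name for name, freq in estimated_mix.items() if freq >= threshold}


# Hypothetical simulated sample and hypothetical tool output.
true_mix = {"BA.1": 0.60, "BA.2": 0.35, "Delta": 0.05}
estimated_mix = {"BA.1": 0.58, "BA.2": 0.36, "Delta": 0.03, "BA.5": 0.03}

print(abundance_error(true_mix, estimated_mix))
print(sorted(detected(estimated_mix, threshold=0.05)))
print(sorted(detected(estimated_mix, threshold=0.01)))
```

With these made-up numbers, a 1% threshold reports a spurious lineage that a 5% threshold filters out, at the cost of also dropping the true low-abundance Delta component.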
