2,978 research outputs found

    Highly Sensitive and Specific Detection of Rare Variants in Mixed Viral Populations from Massively Parallel Sequence Data

    Viruses diversify over time within hosts, often undercutting the effectiveness of host defenses and therapeutic interventions. To design successful vaccines and therapeutics, it is critical to better understand viral diversification, including comprehensively characterizing the genetic variants in viral intra-host populations and modeling changes from transmission through the course of infection. Massively parallel sequencing technologies can overcome the cost constraints of older sequencing methods and obtain the high sequence coverage needed to detect rare genetic variants (<1%) within an infected host, and to assay variants without prior knowledge. Critical to interpreting deep sequence data sets is the ability to distinguish biological variants from process errors with high sensitivity and specificity. To address this challenge, we describe V-Phaser, an algorithm able to recognize rare biological variants in mixed populations. V-Phaser uses covariation (i.e. phasing) between observed variants to increase sensitivity and an expectation maximization algorithm that iteratively recalibrates base quality scores to increase specificity. Overall, V-Phaser achieved >97% sensitivity and >97% specificity on control read sets. On data derived from a patient after four years of HIV-1 infection, V-Phaser detected 2,015 variants across the ∼10 kb genome, including 603 rare variants (<1% frequency) detected only using phase information. V-Phaser identified variants at frequencies down to 0.2%, comparable to the detection threshold of allele-specific PCR, a method that requires prior knowledge of the variants. The high sensitivity and specificity of V-Phaser enables identifying and tracking changes in low frequency variants in mixed populations such as RNA viruses

    Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data

    There is a high prevalence of coronary artery disease (CAD) in patients with left bundle branch block (LBBB); however there are many other causes for this electrocardiographic abnormality. Non-invasive assessment of these patients remains difficult, and all commonly used modalities exhibit several drawbacks. This often leads to these patients undergoing invasive coronary angiography which may not have been necessary. In this review, we examine the uses and limitations of commonly performed non-invasive tests for diagnosis of CAD in patients with LBBB

    Low-frequency variant detection in viral populations using massively parallel sequencing data

    Application of the New Generation of Sequencing Technologies for Evaluation of Genetic Consistency of Influenza A Vaccine Viruses

    For almost half a century, Sanger sequencing has been the conventional method for sequencing DNA. However, its utility for sequencing heterogeneous viral populations is limited because it can only detect mutations that are present in a significant portion of the DNA molecules. Several molecular methods that quantify mutations present at low levels in viral populations were proposed for evaluation of genetic consistency of viral vaccines; however, these methods are only suitable for single site polymorphisms, and cannot be used to screen for unknown mutations

    ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering

    Background: Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Results: Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. Conclusions: ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. 
They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently a lot of information is already contained in the quality scores enabling the model based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection
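The role of quality scores in separating sequencing errors from true low-frequency variants can be illustrated with a minimal sketch. It is not ViVaMBC's model-based clustering: it only converts Phred scores to error probabilities and asks how surprising an observed alternate-base count would be if errors were the sole source. The function names, the uniform split of errors across the three alternate bases, and the use of a single mean error rate per site are all assumptions made for illustration.

```python
import math

def phred_to_error_prob(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for large n."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def variant_p_value(qualities, alt_count):
    """Probability of seeing >= alt_count copies of one alternate base by error alone.

    Assumption (illustrative): each error is equally likely to produce any of
    the three other bases, and the site is summarized by its mean error rate.
    """
    n = len(qualities)
    mean_err = sum(phred_to_error_prob(q) for q in qualities) / n
    return binomial_tail(n, alt_count, mean_err / 3)

if __name__ == "__main__":
    site_qualities = [30] * 1000          # hypothetical site: 1,000 reads at Q30
    for alt in (1, 3, 10):
        print(alt, variant_p_value(site_qualities, alt))
```

A small p-value suggests the alternate base is unlikely to be explained by sequencing error alone; it says nothing about errors introduced earlier, such as during PCR, which the abstract identifies as the remaining bottleneck.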

    Applications of Next-Generation Sequencing Technologies to Diagnostic Virology

    Novel DNA sequencing techniques, referred to as “next-generation” sequencing (NGS), provide high speed and throughput that can produce an enormous volume of sequences with many possible applications in research and diagnostic settings. In this article, we provide an overview of the many applications of NGS in diagnostic virology. NGS techniques have been used for high-throughput whole viral genome sequencing, such as sequencing of new influenza viruses, for detection of viral genome variability and evolution within the host, such as investigation of human immunodeficiency virus and human hepatitis C virus quasispecies, and monitoring of low-abundance antiviral drug-resistance mutations. NGS techniques have been applied to metagenomics-based strategies for the detection of unexpected disease-associated viruses and for the discovery of novel human viruses, including cancer-related viruses. Finally, the human virome in healthy and disease conditions has been described by NGS-based metagenomics

    Novel molecular techniques for diagnostics and cancer biology

    Molecular biology is reliant on a large set of increasingly complex methods. The development of high-throughput DNA sequencing almost 20 years ago kicked off a revolution in method development due to its incredible versatility. Besides determining the genomic DNA sequence itself, sequencing has been used to profile gene expression, investigate binding of proteins to DNA and RNA, trace cell lineages, screen for genes involved in biological processes, assay 3D organization of chromatin, and much more. Most of these methods have been immensely useful in cancer biology, helping us to understand the mechanisms of this complex disease and find new ways to battle it. But sequencing is not necessary if the mere presence or absence of a nucleic acid is important. In order to be able to rapidly diagnose viral diseases, crucial during pandemics such as the recent COVID-19, simpler methods are more useful. Various nucleic acid detection methods have been developed for molecular diagnostics, which can provide an answer within minutes. In this thesis, the fields of high-throughput sequencing, cancer biology, and molecular viral diagnostics are reviewed, since the work presented here consists of three projects dealing with these different topics. In Paper I, we present a novel method for detecting low frequency variants in DNA. Such variants are important in applications such as genetic heterogeneity or minimal residual disease in cancer. However, their detection is hampered by the errors in sequencing data. To circumvent this, one approach is to attach double-stranded unique molecular identifier sequences (dsUMIs) to the ends of each DNA fragment before sequencing. This allows reads originating from the same original molecule to be compared and combined into consensus sequences, removing most errors in the process. However, protocols that achieve this are challenging to perform. 
We developed a novel, simplified library preparation approach called one pot double-stranded UMI sequencing (OPUSeq) that adds dsUMIs to DNA in the same reaction as the PCR. We demonstrate that OPUSeq efficiently removes errors in sequencing data and can be used to detect variants down to 0.01% variant allele frequency. Using OPUSeq, we also found a novel type of artifact that arises when fragmentase enzyme mix is used in library preparation. In Paper II, we investigated the existence of genetic factors that regulate cell state plasticity in cancer. Cancer cells are known to be capable of phenotypic cell state transitions that help them evade treatment. In certain cancer cell line models, such as the chronic myeloid leukemia (CML) K562, the cells are observed to adopt and switch between different states even in the absence of any specific stimuli. As our model system, we used the heterogeneous expression of CD24 protein in K562 as a marker for differential cell states. We designed two orthogonal genome-wide CRISPR-Cas9 knockout screening approaches to look for genes which regulate the spontaneous transitions between CD24-positive and CD24-negative states. We performed both screens and combined the data to produce a list of 49 plasticity regulator candidate genes. We further showed that seven of these genes are differentially expressed between CML patients exhibiting early molecular response to imatinib and those who do not, indicating a connection between plasticity and drug resistance. Finally, we validate one of the plasticity impeding candidates, ALDOB, by generating a single knockout model and demonstrating the increased ability of these cells to undergo state transitions. In Paper III, we present a protocol for detection of SARS-CoV-2 RNA in unextracted patient samples using reverse transcription loop mediated isothermal amplification (RT-LAMP) with non-commercial enzymes. 
This protocol provides an alternative diagnostic method for situations where RT-LAMP and RNA extraction reagents are scarce. First, we showed how reverse transcriptases (RT) and strand-displacing polymerases necessary for RT-LAMP can be expressed and purified in-house. We tested different enzymes and LAMP primer sets and optimized the reaction conditions. Benchmarking showed that our in-house mix performs similarly to or even better than commercial alternatives. Finally, we tested our protocol on heat-inactivated, unextracted nasopharyngeal samples from patients and found that it exhibited good specificity as well as good sensitivity in samples with moderate to high viral load
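The consensus idea described for Paper I above (comparing reads that share a double-stranded unique molecular identifier) can be sketched as follows. This is a simplified illustration, not the OPUSeq pipeline: the read representation `(umi, strand, sequence)`, the function names, and the assumptions that all reads in a group are already aligned, of equal length, and in the same orientation are hypothetical.

```python
from collections import Counter, defaultdict

def strand_consensus(seqs):
    """Per-position majority base across reads from one strand of one molecule.

    Ties are broken by first occurrence, which is arbitrary.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

def duplex_consensus(reads):
    """Build one consensus per UMI, keeping a base only if both strands agree.

    reads: iterable of (umi, strand, sequence) with strand "+" or "-".
    UMIs seen on only one strand are dropped; disagreements become "N".
    """
    groups = defaultdict(list)
    for umi, strand, seq in reads:
        groups[(umi, strand)].append(seq)

    result = {}
    for umi in {u for u, _ in groups}:
        if (umi, "+") in groups and (umi, "-") in groups:
            top = strand_consensus(groups[(umi, "+")])
            bottom = strand_consensus(groups[(umi, "-")])
            result[umi] = "".join(a if a == b else "N" for a, b in zip(top, bottom))
    return result

if __name__ == "__main__":
    # Hypothetical reads: one "+" read carries an error at the third position.
    reads = [
        ("AAC", "+", "ACGT"), ("AAC", "+", "ACGT"), ("AAC", "+", "ACTT"),
        ("AAC", "-", "ACGT"), ("AAC", "-", "ACGT"),
        ("GGT", "+", "ACGA"),  # single-strand family, not reported
    ]
    print(duplex_consensus(reads))
```

Requiring agreement between the two strands is what lets this kind of scheme discard errors that arise on only one strand, at the cost of discarding molecules for which only one strand was recovered.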

    Evaluating the performance of tools used to call minority variants from whole genome short-read data.

    Background: High-throughput whole genome sequencing facilitates investigation of minority virus sub-populations from virus positive samples. Minority variants are useful in understanding within and between host diversity, population dynamics and can potentially assist in elucidating person-person transmission pathways. Several minority variant callers have been developed to describe low frequency sub-populations from whole genome sequence data. These callers differ based on bioinformatics and statistical methods used to discriminate sequencing errors from low-frequency variants. Methods: We evaluated the diagnostic performance and concordance between published minority variant callers used in identifying minority variants from whole-genome sequence data from virus samples. We used the ART-Illumina read simulation tool to generate three artificial short-read datasets of varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, Vardict, and VarScan2. The variant callers' agreement in identifying known variants was quantified using two measures; concordance accuracy and the inter-caller concordance. Results: The variant callers reported differences in identifying minority variants from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of variants although it was characterised by variable sensitivity and precision in addition to a high false positive rate relative to the other minority variant callers and which varied with sample coverage. LoFreq was the most conservative caller. Conclusions: We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low frequency variants. 
Inconsistency in the quality of sequenced samples impacts on sensitivity and accuracy of minority variant callers. Our study suggests that combining at least three tools when identifying minority variants is useful in filtering errors when calling low frequency variants
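One way to read the recommendation above is as a voting filter: keep only variants reported by several independent callers. The sketch below assumes that interpretation. The function name, the `(position, ref, alt)` representation, and the example calls are hypothetical, and the study's own concordance measures are not reproduced here.

```python
from collections import Counter

def consensus_calls(calls_by_tool, min_tools=3):
    """Return variants reported by at least `min_tools` callers.

    calls_by_tool: dict mapping a tool name to an iterable of variants,
    each variant being a hashable key such as (position, ref, alt).
    """
    counts = Counter(
        variant
        for calls in calls_by_tool.values()
        for variant in set(calls)   # count each tool at most once per variant
    )
    return {variant for variant, n in counts.items() if n >= min_tools}

if __name__ == "__main__":
    # Hypothetical call sets, for illustration only.
    calls = {
        "FreeBayes": [(101, "A", "G"), (250, "C", "T"), (777, "G", "A")],
        "LoFreq":    [(101, "A", "G")],
        "VarDict":   [(101, "A", "G"), (250, "C", "T")],
        "VarScan2":  [(101, "A", "G"), (250, "C", "T")],
    }
    print(sorted(consensus_calls(calls, min_tools=3)))
```

Raising `min_tools` trades sensitivity for specificity: a conservative caller in the set can veto true variants that the others detect.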

    Recent advances in inferring viral diversity from high-throughput sequencing data

    Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments. ISSN:0168-170

    Application of next-generation sequencing technologies in virology

    The progress of science is punctuated by the advent of revolutionary technologies that provide new ways and scales to formulate scientific questions and advance knowledge. Following on from electron microscopy, cell culture and PCR, next-generation sequencing is one of these methodologies that is now changing the way that we understand viruses, particularly in the areas of genome sequencing, evolution, ecology, discovery and transcriptomics. Possibilities for these methodologies are only limited by our scientific imagination and, to some extent, by their cost, which has restricted their use to relatively small numbers of samples. Challenges remain, including the storage and analysis of the large amounts of data generated. As the chemistries employed mature, costs will decrease. In addition, improved methods for analysis will become available, opening yet further applications in virology including routine diagnostic work on individuals, and new understanding of the interaction between viral and host transcriptomes. An exciting era of viral exploration has begun, and will set us new challenges to understand the role of newly discovered viral diversity in both disease and health