3,252 research outputs found

    Efficient error correction for next-generation sequencing of viral amplicons

    Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results: In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm

    Algorithms for Viral Population Analysis

    The genetic structure of an intra-host viral population has an effect on many clinically important phenotypic traits such as escape from vaccine-induced immunity, virulence, and response to antiviral therapies. Next-generation sequencing provides read coverage sufficient for genomic reconstruction of a heterogeneous, yet highly similar, viral population and, more specifically, for the detection of rare variants. Although depth is less of an issue for modern sequencers, the short length of generated reads complicates viral population assembly. The task is made harder by the presence of both random and systematic sequencing errors in huge amounts of data. In this dissertation, I present completed work on reconstructing a viral population from next-generation sequencing data. Several algorithms are described for solving this problem under the error-free amplicon (or sliding-window) model. So that these methods can handle real-world data, an error-correction method is proposed. A formal derivation of its likelihood model, along with optimization steps for an EM algorithm, is presented. Although these methods perform well, they cannot take paired-end sequencing data into account. To address this, a new method is detailed that works under the error-free paired-end case, along with maximum a posteriori estimation of the model parameters.

    Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads

    The genomic diversity of viral quasispecies is a subject of great interest, especially for chronic infections. Characterization of viral diversity can be addressed by high-throughput sequencing technology (454 Life Sciences, Illumina, SOLiD, Ion Torrent, etc.). Standard assembly software was originally designed for single genome assembly and cannot be used to assemble and estimate the frequency of closely related quasispecies sequences. This work focuses on parsimonious and maximum likelihood models for assembling viral quasispecies and estimating their frequencies from 454 sequencing data. Our methods have been applied to several RNA viruses (HCV, IBV) as well as DNA viruses (HBV), genotyped using 454 Life Sciences amplicon and shotgun methods.

    BMC Bioinformatics

    Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results: In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses.

    Viral population estimation using pyrosequencing

    The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an EM algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies. Comment: 23 pages, 13 figures
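The last step described in this abstract, inferring haplotype frequencies with an EM algorithm, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes error-free reads given as (start, sequence) pairs, a fixed set of candidate haplotypes of equal length, and exact-match compatibility. The function name `em_haplotype_frequencies` and all parameters are hypothetical.

```python
def em_haplotype_frequencies(reads, haplotypes, iterations=100):
    """Estimate haplotype frequencies from error-free reads (illustrative sketch).

    reads      -- list of (start, sequence) pairs; an assumed input format
    haplotypes -- list of candidate haplotype strings of equal length
    """
    # compat[r][h] is 1.0 when read r matches haplotype h exactly at its position.
    compat = [
        [1.0 if hap[start:start + len(seq)] == seq else 0.0 for hap in haplotypes]
        for start, seq in reads
    ]
    # Reads that match no candidate carry no information in this model; drop them.
    compat = [row for row in compat if any(row)]
    if not compat:
        raise ValueError("no read is compatible with any haplotype")

    n = len(haplotypes)
    freqs = [1.0 / n] * n  # start from a uniform guess
    for _ in range(iterations):
        counts = [0.0] * n
        # E-step: split each read among compatible haplotypes,
        # in proportion to the current frequency estimates.
        for row in compat:
            weights = [f * c for f, c in zip(freqs, row)]
            total = sum(weights)
            for h in range(n):
                counts[h] += weights[h] / total
        # M-step: new frequency = expected share of reads per haplotype.
        freqs = [c / len(compat) for c in counts]
    return freqs


if __name__ == "__main__":
    haps = ["ACGT", "ACTT"]
    # Three reads support the first haplotype, one the second, one is ambiguous.
    example_reads = [(0, "ACG")] * 3 + [(0, "ACT"), (0, "AC")]
    print(em_haplotype_frequencies(example_reads, haps))
```

Ambiguous reads, such as the final one in the usage example, are shared between haplotypes according to the current estimates, which is why the procedure iterates rather than counting matches once.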

    Viral Diversity by Deep Sequencing: Approaches to Analyzing Effects of Anti-HIV Treatments

    HIV is a deadly virus responsible for the AIDS pandemic, which has claimed countless lives since its origins in the early 1980s. A cure for HIV is still elusive: HIV can exist as a diverse and dynamic population that adapts quickly to immune and drug pressures, making elimination of infection difficult. Advances in antiretroviral (ARV) therapy have resulted in effective control of HIV for some but not all patients. This dissertation reports case studies of the response of viral populations to selection pressures exerted by emerging anti-HIV therapies. Deep sequencing technology was used to probe viral swarms at high resolution, which helped to draw clinically relevant conclusions. Further, novel computational approaches were implemented to control procedural noise and carefully interpret signal. In one study, we examine HIV integrase inhibitors (INIs), which are among the latest ARV drugs. INIs act at a pre-integration level by aborting viral integration, which would normally lead to lasting infection. Raltegravir (RAL) is the only FDA-approved INI to date. Investigating drug resistance is crucial to informing the future course of ARV therapy. We describe evolving HIV swarms in patients exhibiting a switch in RAL-resistance profiles. To understand the implications of RAL administration, we analyzed the pre-therapy or treatment-naïve context for the viral populations in depth. Our findings suggest that predominant mutations arise only in the presence of RAL; in its absence, they do not constitute fit polymorphisms. For all their effectiveness, drugs have not eradicated HIV. A recent clinical case, however, involving transfer of HIV-resistant cells to an infected patient, resulted for the first time in a possible cure. This emphasized the importance of gene-modification and cell-based therapies to treat HIV. One such strategy showing promise uses an antisense to target HIV. The approach has been safe, although clinical efficacy has not been fully determined. In support of one such study, we deep-sequenced viral swarms in the presence of antisense-modified cells. Encouragingly, we observed minority strains harboring evidence of antisense pressure in vivo, demonstrating the potential of alternative therapy. Finally, this dissertation underscores the significance of rare signatures in HIV populations, and outlines methods to investigate them.

    Advanced sequencing approaches detected insertions of viral and human origin in the viral genome of chronic hepatitis E virus patients

    Awareness of hepatitis E virus (HEV) has increased significantly in the last decade due to its unexpectedly high prevalence in high-income countries. There, infections with HEV genotype 3 (HEV-3) are predominant and can progress to chronicity in immunocompromised individuals. Persistent infection and antiviral therapy can select HEV-3 variants; however, the spectrum and occurrence of HEV-3 variants are underreported. To gain in-depth insights into the viral population and to perform detailed characterization of viral genomes, we used a new approach combining long-range PCR with next-generation and third-generation sequencing, which allowed near full-length sequencing of HEV-3 genomes. Furthermore, we developed a targeted ultra-deep sequencing approach to assess the dynamics of clinically relevant mutations in the RdRp-region and to detect insertions in the HVR-domain in the HEV genomes. Using this new approach, we not only identified several insertions of human (AHNAK, RPL18) and viral origin (RdRp-derived) in the HVR-region isolated from an exemplary sample but also detected a variant containing two different insertions simultaneously (AHNAK- and RdRp-derived). This finding is the first HEV variant recognized as such showing various insertions in the HVR-domain. Thus, this molecular approach will add incrementally to our current knowledge of the HEV-genome organization and pathogenesis in chronic hepatitis E. Peer Reviewed

    SeekDeep: single-base resolution de novo clustering for amplicon deep sequencing

    PCR amplicon deep sequencing continues to transform the investigation of genetic diversity in viral, bacterial, and eukaryotic populations. In eukaryotic populations such as Plasmodium falciparum infections, it is important to discriminate sequences differing by a single nucleotide polymorphism. In bacterial populations, single-base resolution can provide improved resolution towards species and strains. Here, we introduce the SeekDeep suite built around the qluster algorithm, which is capable of accurately building de novo clusters representing true, biological local haplotypes differing by just a single base. It outperforms current software, particularly at low frequencies and at low input read depths, whether resolving single-base differences or traditional OTUs. SeekDeep is open source and works with all major sequencing technologies, making it broadly useful in a wide variety of applications of amplicon deep sequencing to extract accurate and maximal biologic information.