
    Highly Sensitive and Specific Detection of Rare Variants in Mixed Viral Populations from Massively Parallel Sequence Data

    Viruses diversify over time within hosts, often undercutting the effectiveness of host defenses and therapeutic interventions. To design successful vaccines and therapeutics, it is critical to better understand viral diversification, including comprehensively characterizing the genetic variants in viral intra-host populations and modeling changes from transmission through the course of infection. Massively parallel sequencing technologies can overcome the cost constraints of older sequencing methods and obtain the high sequence coverage needed to detect rare genetic variants (<1%) within an infected host, and to assay variants without prior knowledge. Critical to interpreting deep sequence data sets is the ability to distinguish biological variants from process errors with high sensitivity and specificity. To address this challenge, we describe V-Phaser, an algorithm able to recognize rare biological variants in mixed populations. V-Phaser uses covariation (i.e. phasing) between observed variants to increase sensitivity and an expectation maximization algorithm that iteratively recalibrates base quality scores to increase specificity. Overall, V-Phaser achieved >97% sensitivity and >97% specificity on control read sets. On data derived from a patient after four years of HIV-1 infection, V-Phaser detected 2,015 variants across the ∼10 kb genome, including 603 rare variants (<1% frequency) detected only using phase information. V-Phaser identified variants at frequencies down to 0.2%, comparable to the detection threshold of allele-specific PCR, a method that requires prior knowledge of the variants. The high sensitivity and specificity of V-Phaser enable identifying and tracking changes in low-frequency variants in mixed populations such as RNA viruses.
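To make "covariation (i.e. phasing)" concrete, here is a minimal Python sketch, assuming reads are already aligned and represented as hypothetical `{locus: base}` dictionaries with a matching consensus dictionary. For each pair of candidate loci it counts the reads that span both loci and the subset that carry a non-consensus base at both. Under a simplifying independence assumption, two unrelated errors at a per-base rate e land on the same read at roughly e * e, so a pair of variants that repeatedly co-occur is much harder to explain as error than either variant alone. This only illustrates the phasing evidence; it is not V-Phaser's statistical model.

```python
from collections import Counter
from itertools import combinations


def phased_pair_counts(reads, consensus, loci):
    """Count, for every pair of candidate loci, the reads spanning both loci
    and the reads carrying a non-consensus base at both (linked variants).

    reads: list of hypothetical {locus: base} dicts, one per aligned read.
    consensus: {locus: base} dict of consensus bases.
    loci: candidate variant loci to examine.
    """
    spanning = Counter()
    co_variant = Counter()
    for read in reads:
        covered = sorted(locus for locus in loci if locus in read)
        for a, b in combinations(covered, 2):
            spanning[(a, b)] += 1
            if read[a] != consensus[a] and read[b] != consensus[b]:
                co_variant[(a, b)] += 1
    return spanning, co_variant


# Usage with toy data: three reads span loci 1 and 5, two of them carry
# non-consensus bases at both loci, and one read covers locus 1 only.
consensus = {1: "A", 5: "G"}
reads = [{1: "A", 5: "G"}, {1: "T", 5: "C"}, {1: "T", 5: "C"}, {1: "A"}]
spanning, co_variant = phased_pair_counts(reads, consensus, [1, 5])
```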

    Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection

    Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
    These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained regions of the virus in order to ensure the maintenance of immunodominant CD8 responses and the sustained decline of early viremia.

    Phase information increased sensitivity, and base quality scores increased specificity.

    <p>We compared <i>V-Phaser</i> to alternate versions of <i>V-Phaser</i> with specific components disabled. In the No Phase version, <i>V-Phaser</i> called variants without phase information. In the Uniform Errors version, <i>V-Phaser</i> estimated uniform error rates within homopolymer and nonhomopolymer regions without regard to assigned base qualities. In the No Filtering version, <i>V-Phaser</i> did not filter out low-quality bases. (<b>A</b>) Phase information increased sensitivity. The version without phase information attained a sensitivity of 90%, whereas all other versions of <i>V-Phaser</i> used phase information and attained a sensitivity of 97% or more. We calculated sensitivity as the percentage of known variants correctly identified. Data are from the WNV mixed population control dataset. (<b>B</b>) Individual base quality scores increased specificity. Among loci with mismatches, the Uniform Errors version had only 91% specificity, whereas all other versions incorporated base quality scores in their probability model and attained 97% specificity or more. We calculated specificity as the percentage of loci in the control sample correctly identified as having no variants among loci that had at least one candidate variant. Data are from the infectious clone (HIV NL4-3) control dataset.</p>
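The two metrics defined in the caption can be written down directly. The sketch below assumes variants are represented as hypothetical `(position, allele)` tuples and loci as plain positions; the function names are illustrative only, not part of <i>V-Phaser</i>.

```python
def sensitivity(known_variants: set, called_variants: set) -> float:
    """Percentage of known variants (e.g. in a mixed-population control)
    that the caller correctly identified."""
    if not known_variants:
        raise ValueError("sensitivity is undefined without known variants")
    found = known_variants & called_variants
    return 100.0 * len(found) / len(known_variants)


def specificity(candidate_loci: set, called_loci: set) -> float:
    """Percentage of loci with at least one candidate variant (mismatch) in a
    control assumed to have no true variants that were correctly called
    invariant, i.e. not reported as variant loci."""
    if not candidate_loci:
        raise ValueError("specificity is undefined without candidate loci")
    correctly_invariant = candidate_loci - called_loci
    return 100.0 * len(correctly_invariant) / len(candidate_loci)


# Usage: 2 of 3 known variants recovered; 1 of 4 mismatch loci falsely called.
known = {(10, "A"), (20, "G"), (30, "T")}
called = {(10, "A"), (20, "G"), (99, "C")}
sens = sensitivity(known, called)
spec = specificity({1, 2, 3, 4}, {2})
```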

    Error rates were not uniformly distributed.

    <p>Error rates varied by (<b>A</b>) read position, (<b>B</b>) base transition, and (<b>C</b>) base quality score. We counted as errors any mismatches to the consensus assembly for each of the two runs in the control read set, under the assumption that the NL4-3 infectious clone had no diversity. We defined the read position relative to the beginning or end of the read, whichever was closer. We defined a base transition as a dinucleotide representing the transition from the preceding base to the current base, and we scored a transition as an error if the current base was a mismatch. Base quality scores came from the sequencing process.</p>
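The tabulation described in this caption can be sketched as follows. This is a minimal illustration, assuming each read is a hypothetical ungapped list of `(read_base, consensus_base, quality)` tuples in read order (indels are ignored) and that positions are counted from 0 at either read end. Every mismatch to the consensus is counted as an error, as in the caption.

```python
from collections import defaultdict


def error_rates_by_covariate(reads):
    """Mismatch rate keyed by distance from the nearer read end, by
    preceding-base/current-base dinucleotide, and by base quality score.

    reads: list of reads; each read is a hypothetical ungapped list of
    (read_base, consensus_base, quality) tuples in read order.
    """
    tallies = {name: defaultdict(lambda: [0, 0])  # [errors, observations]
               for name in ("position", "transition", "quality")}
    for read in reads:
        n = len(read)
        for i, (base, cons, qual) in enumerate(read):
            is_error = base != cons
            # 0 means the base sits at the start or end of the read.
            keys = {"position": min(i, n - 1 - i), "quality": qual}
            if i > 0:
                # Dinucleotide: preceding read base followed by current base;
                # it is scored as an error when the current base mismatches.
                keys["transition"] = read[i - 1][0] + base
            for name, key in keys.items():
                counts = tallies[name][key]
                counts[0] += is_error
                counts[1] += 1
    return {name: {key: err / total for key, (err, total) in table.items()}
            for name, table in tallies.items()}


# Usage: one three-base read whose last base (quality 10) mismatches.
rates = error_rates_by_covariate([[("A", "A", 30), ("C", "C", 30), ("T", "G", 10)]])
```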

    NQS filtering improves fit of probability model to data.

    <p>(<b>A</b>) Quantile-quantile (q-q) plots under NQS filtering show good fit of the probability model to the observed distribution of errors. Since the probability model is discrete, p values are projected onto a uniform distribution, and the distribution of projected p values is compared with the expected null distribution. See <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4" target="_blank">Materials and Methods</a> section for details. (<b>B</b>) In contrast, q-q plots without filtering show that omitting NQS filtering skews the calibration of the probability model used by <i>V-Phaser</i>. Q-q plots of models based on subsets of the reads demonstrate that this effect becomes more pronounced with increasing coverage (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#pcbi.1002417.s001" target="_blank">Figure S1</a>). Q-q plots are scaled to fit each curve, so the y = x line is not drawn at a 45-degree angle.</p>
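The exact projection used in the paper is given in its Materials and Methods and is not reproduced here. One standard way to project discrete p values onto a uniform distribution, assumed for this sketch, is the randomized p value P(X > k) + U * P(X = k) with U ~ Uniform(0, 1), which is exactly uniform under the null. The sketch assumes a simple binomial error model with a single hypothetical error rate `p` per locus and pairs sorted p values with uniform quantiles for a q-q plot.

```python
import math
import random


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p) with 0 < p < 1, computed in log space
    so that large coverages n do not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def randomized_pvalue(k: int, n: int, p: float, rng: random.Random) -> float:
    """Upper-tail p value P(X > k) plus a uniform share of P(X = k).
    Spreading the point mass this way makes the result Uniform(0, 1)
    under the null even though the binomial model is discrete."""
    tail_above = sum(binom_pmf(i, n, p) for i in range(k + 1, n + 1))
    return tail_above + rng.random() * binom_pmf(k, n, p)


def qq_points(pvalues):
    """Pair each sorted observed p value with its expected uniform quantile."""
    m = len(pvalues)
    return [((i + 0.5) / m, obs) for i, obs in enumerate(sorted(pvalues))]


# Usage: simulate error counts under the null model and build q-q points.
rng = random.Random(0)
n, p = 100, 0.01
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(1000)]
points = qq_points([randomized_pvalue(k, n, p, rng) for k in counts])
```

If the model is well calibrated, the points should lie close to the y = x line; the caption reports that skipping NQS filtering breaks this fit.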

    Phase information increased sensitivity to detect minor variants.

    <p>Phase information increased sensitivity to detect low-frequency variants, as shown by these histograms of variants under 2.5%. All versions of <i>V-Phaser</i> detected 100% of the variants above 2.5% frequency, so these variants are not shown here. All versions of <i>V-Phaser</i> with phase information (<b>A</b>), (<b>C</b>), and (<b>D</b>) detected most variants below 1% in frequency, but the No Phase version (<b>B</b>) missed many variants below 1% and some variants as high as 2.5%. Data are from the WNV mixed population control dataset.</p>

    Comparison of <i>V-Phaser</i> to other viral variant callers.

    <p>Sensitivities and specificities are reported across residues interrogated by all programs. Sensitivity is measured as the fraction of the known variants found by each program in the WNV mixed population control data set. Specificity is the fraction of sites not containing known variants that were called as invariant in the HIV NL4-3 control data set; values reported in parentheses include inserted and deleted bases (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4" target="_blank">Materials and Methods</a>).</p>

    Phase increased sensitivity to detect variants.

    <p>Phase increased sensitivity to detect variants, as seen over a range of error rates at coverages of 100-fold, 250-fold, and 500-fold. The <i>phased variant detection threshold frequency (VDTF)</i> is the lowest frequency of reads with variants at two specific loci that <i>V-Phaser</i> can distinguish from error among reads that span both loci. The <i>unphased VDTF</i> is the lowest frequency of one variant that <i>V-Phaser</i> can distinguish from error among reads that cover that locus. 100-fold <i>phased</i> sequence coverage achieves detection thresholds comparable to those of 500-fold <i>unphased</i> coverage. We use Equation 7 to calculate the <i>phased</i> and <i>unphased VDTFs</i>. (See the <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4" target="_blank">Materials and Methods</a> section for Equation 7 and its derivation.)</p>
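Equation 7 itself is given in the paper's Materials and Methods and is not reproduced here. As a rough stand-in (an assumption, not the paper's formula), the sketch below defines a detection threshold as the smallest read frequency k/n whose upper binomial tail under pure error falls below a significance level `alpha`. It also assumes that errors at the two loci are independent, so a spurious linked pair occurs at rate e * e among spanning reads. The error rate and `alpha` are arbitrary illustrative values.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    def log_pmf(i: int) -> float:
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))
    return sum(math.exp(log_pmf(i)) for i in range(k, n + 1))


def detection_threshold(n: int, error_rate: float, alpha: float = 0.001):
    """Smallest frequency k/n that pure error would reach with probability
    below alpha at coverage n; None if no count out of n is significant."""
    for k in range(n + 1):
        if binom_upper_tail(k, n, error_rate) < alpha:
            return k / n
    return None


e = 0.01  # hypothetical per-base error rate
unphased_500 = detection_threshold(500, e)
# Assumption: independent errors at the two loci, so a spurious linked
# pair appears at rate e * e among reads spanning both loci.
phased_100 = detection_threshold(100, e * e)
```

Under these assumptions, the phased threshold at 100-fold coverage comes out lower than the unphased threshold at 500-fold coverage. This is qualitatively in line with the figure, but the numbers are not the paper's.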