
    NQS filtering improves fit of probability model to data.

    <p>(<b>A</b>) Quantile-quantile (q-q) plots under NQS filtering show a good fit of the probability model to the observed distribution of errors. Because the probability model is discrete, p values are projected onto a uniform distribution, and the distribution of projected p values is compared with the expected null distribution. See the <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4" target="_blank">Materials and Methods</a> section for details. (<b>B</b>) In contrast, q-q plots without filtering show that omitting the filter skews the calibration of the probability model used by <i>V-Phaser</i>. Q-q plots of models based on subsets of the reads demonstrate that this effect becomes more pronounced with increasing coverage (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#pcbi.1002417.s001" target="_blank">Figure S1</a>). Q-q plots are scaled to fit the curve, so the y = x line is not at a 45-degree angle.</p>
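The caption does not give the projection itself (it is in the Materials and Methods), so the following is only a sketch of one common way to project discrete p values onto a uniform distribution: the randomized p value. The binomial error model, the function names, and the parameters `k`, `n`, and `p` are assumptions made for illustration and are not taken from V-Phaser.

```python
import math
import random


def binom_pmf(k, n, p):
    """Probability of exactly k errors in n bases at per-base error rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_p_value(k, n, p, rng):
    """Upper-tail p value for k errors, spread uniformly across its discrete step.

    Assumption: a binomial null model. Under that null, the returned value
    is uniform on [0, 1], which is what a q-q plot against the uniform needs.
    """
    tail_above = sum(binom_pmf(j, n, p) for j in range(k + 1, n + 1))
    return tail_above + rng.random() * binom_pmf(k, n, p)


def qq_points(p_values):
    """Pair each sorted observed p value with its expected uniform quantile."""
    m = len(p_values)
    observed = sorted(p_values)
    expected = [(i + 0.5) / m for i in range(m)]
    return list(zip(expected, observed))
```

Points from `qq_points` that fall along the y = x line would indicate a well-calibrated model; systematic departures would indicate the kind of skew described for panel B.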

    Phase information increased sensitivity, and base quality scores increased specificity.

    <p>We compared <i>V-Phaser</i> to alternate versions of <i>V-Phaser</i> with specific components disabled. In the No Phase version, <i>V-Phaser</i> called variants without phase information. In the Uniform Errors version, <i>V-Phaser</i> estimated uniform error rates within homopolymer and nonhomopolymer regions without regard to assigned base qualities. In the No Filtering version, <i>V-Phaser</i> did not filter out low-quality bases. (<b>A</b>) Phase information increased sensitivity. The version without phase information attained a sensitivity of 90%, but all other versions of <i>V-Phaser</i> used phase information and attained a sensitivity of 97% or more. We calculated sensitivity as the percentage of known variants correctly identified. Data are from the WNV mixed population control dataset. (<b>B</b>) Individual base quality scores increased specificity. Among loci with mismatches, the Uniform Errors version had only 91% specificity, but all other versions incorporated base quality scores in their probability model and attained 97% specificity or more. We calculated specificity as the percentage of loci in the control sample correctly identified as having no variants among loci that had at least one candidate variant. Data are from the infectious clone (HIV NL4-3) control dataset.</p>
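The two definitions in this caption can be written out as a small sketch. The function names and the representation of variants and loci as hashable items (for example, position and base tuples) are assumptions for illustration; because the control sample is a clone, every call at a candidate locus is treated here as a false positive.

```python
def sensitivity(known_variants, called_variants):
    """Percentage of known variants that were correctly called."""
    known = set(known_variants)
    return 100.0 * len(known & set(called_variants)) / len(known)


def specificity(candidate_loci, called_loci):
    """Percentage of candidate loci in the clonal control with no variant called.

    candidate_loci: loci that had at least one candidate variant (a mismatch).
    called_loci: loci where a variant was called (false positives in a clone).
    """
    candidates = set(candidate_loci)
    false_calls = candidates & set(called_loci)
    return 100.0 * (len(candidates) - len(false_calls)) / len(candidates)
```

For example, calling one of two known variants gives a sensitivity of 50%, and one false call among four candidate loci gives a specificity of 75%.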

    Error rates were not uniformly distributed.

    <p>Error rates varied by (<b>A</b>) read position, (<b>B</b>) base transition, and (<b>C</b>) base quality score. We counted as errors any mismatches to the consensus assembly for each of the two runs in the control read set, under the assumption that the NL4-3 infectious clone had no diversity. We defined the read position relative to the beginning or end of the read, whichever was closer. We defined a base transition as a dinucleotide representing the transition from the preceding base to the current base, and we scored a transition as an error if the current base was a mismatch. Base quality scores came from the sequencing process.</p>
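The position and transition definitions above can be sketched as follows. This is an illustration only: it assumes an ungapped read already aligned to a consensus string of the same length, takes the preceding base from the read rather than from the consensus, and uses hypothetical function names.

```python
from collections import Counter


def end_relative_position(index, read_length):
    """0-based distance from the nearer end of the read."""
    return min(index, read_length - 1 - index)


def tally_errors(read, consensus):
    """Count mismatches by end-relative position and by dinucleotide transition.

    A transition is the preceding read base followed by the current read base;
    it is scored as an error only when the current base mismatches the consensus.
    """
    by_position, by_transition = Counter(), Counter()
    for i, (base, ref) in enumerate(zip(read, consensus)):
        if base != ref:
            by_position[end_relative_position(i, len(read))] += 1
            if i > 0:  # the first base has no preceding base
                by_transition[read[i - 1] + base] += 1
    return by_position, by_transition
```

Dividing each count by the number of times that position or transition was observed, across all reads, would give the corresponding error rate.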

    Phase increased sensitivity to detect variants.

    <p>Phase increased sensitivity to detect variants, as seen over a range of error rates at coverages of 100-fold, 250-fold, and 500-fold. The <i>phased variant detection threshold frequency (VDTF)</i> is the lowest frequency of reads with variants at two specific loci that <i>V-Phaser</i> can distinguish from error among reads that span both loci. The <i>unphased VDTF</i> is the lowest frequency of one variant that <i>V-Phaser</i> can distinguish from error among reads that cover that locus. At 100-fold coverage, <i>phased</i> detection achieves thresholds comparable to those of <i>unphased</i> detection at 500-fold coverage. We use Equation 7 to calculate the <i>phased</i> and <i>unphased VDTFs</i>. (See the <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4" target="_blank">Materials and Methods</a> section for Equation 7 and its derivation.)</p>

    Phase information increased sensitivity to detect minor variants.

    <p>Phase information increased sensitivity to detect low-frequency variants, as shown by these histograms of variants under 2.5%. All versions of <i>V-Phaser</i> detected 100% of the variants above 2.5% frequency, so these variants are not shown here. All versions of <i>V-Phaser</i> with phase information (<b>A</b>), (<b>C</b>), and (<b>D</b>) detected most variants below 1% in frequency, but the No Phase version (<b>B</b>) missed many variants below 1% and some variants as high as 2.5%. Data are from the WNV mixed population control dataset.</p>

    Comparison of <i>V-Phaser</i> to other viral variant callers.

    <p>Sensitivities and specificities are reported across residues interrogated by all programs. Sensitivity is measured as the fraction of the known variants found by each program in the WNV mixed population control dataset. Specificity is the fraction of sites not containing known variants that were called as invariant in the HIV NL4-3 control dataset; values reported in parentheses include inserted and deleted bases (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4" target="_blank">Materials and Methods</a>).</p>

    Mean average pairwise Hamming distance (APHD) of HIV-1 Env SGA/S sequences distinguishes between single and multiple founder viruses.

    <p><b>(A)</b> A training set of SGA/S Env sequences derived from 127 previously published subjects with acute HIV-1 infection, illustrating a wide range of <i>env</i> diversity. The APHD is calculated using a sliding window of 120 bp with a step size of 21 bp. The mean APHD is plotted according to Fiebig stages as defined by HIV-1 clinical laboratory test results. <b>(B)</b> A classifier based on logistic regression segregated the 127 subjects into single or multiple infections and correctly assigned 97% of subjects to their respective groups. Each point corresponds to an individual subject, with the number of subjects denoted on the x-axis in parentheses under each Fiebig stage.</p>
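The windowed APHD calculation described in panel A can be sketched as below. Only the 120 bp window and 21 bp step come from the caption. The function names, the use of raw mismatch counts rather than per-site proportions, and the treatment of gap characters as ordinary symbols are assumptions; the sketch also assumes at least two aligned sequences of equal length no shorter than the window.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def aphd(seqs):
    """Average pairwise Hamming distance over all unordered sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_windowed_aphd(seqs, window=120, step=21):
    """Mean of the APHD values computed in each sliding window of the alignment."""
    length = len(seqs[0])
    starts = range(0, length - window + 1, step)
    values = [aphd([s[i:i + window] for s in seqs]) for i in starts]
    return sum(values) / len(values)
```

The resulting mean APHD per subject would then serve as the input feature for a classifier such as the logistic regression mentioned in panel B.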