
NQS filtering improves fit of probability model to data.

Author: Alexander R. Macalalad (177162)
Bruce W. Birren (147656)
Christian L. Boutwell (177170)
Christine M. Malboeuf (177167)
Doug E. Brackney (177174)
Elizabeth M. Ryan (177168)
Gregory D. Ebel (177185)
Joshua Z. Levin (177182)
Karen A. Power (177172)
Kendra N. Pesko (177178)
Matthew R. Henn (103220)
Michael C. Zody (155402)
Niall J. Lennon (177164)
Patrick Charlebois (177163)
Ruchi M. Newman (177165)
Todd M. Allen (177189)

(A) Quantile-quantile (q-q) plots under NQS filtering show a good fit of the probability model to the observed distribution of errors. Because the probability model is discrete, p values are projected onto a uniform distribution, and the distribution of projected p values is compared with the expected null distribution. See the Materials and Methods section (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4) for details. (B) In contrast, q-q plots without filtering show that omitting the filter skews the calibration of the probability model used by V-Phaser. Q-q plots of models based on subsets of the reads show that this effect becomes more pronounced with increasing coverage (see Figure S1, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#pcbi.1002417.s001). The q-q plots are scaled to fit the curve, so the y = x line is not at a 45-degree angle.
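The caption leaves the projection of discrete p values to Materials and Methods. One standard way to do this, assumed here rather than taken from the paper, is the randomized p value P(X > k) + U·P(X = k) with U drawn from Uniform(0, 1), which is exactly uniform when the model is correct. The sketch uses a plain binomial error model with a single hypothetical per-base error rate (V-Phaser's own model uses base quality scores) and pairs the sorted p values with uniform quantiles, which are the points a q-q plot draws.

```python
import math
import random


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k errors among n bases at per-base error rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_p_value(k: int, n: int, p: float, rng: random.Random) -> float:
    """Project a discrete upper-tail p value onto a continuous [0, 1] scale.

    Assumed projection: P(X > k) + U * P(X = k), with U ~ Uniform(0, 1).
    """
    upper = sum(binom_pmf(j, n, p) for j in range(k + 1, n + 1))
    return upper + rng.random() * binom_pmf(k, n, p)


def qq_points(p_values):
    """Pair each expected Uniform(0, 1) quantile with the sorted observed p value."""
    m = len(p_values)
    return [((i + 0.5) / m, obs) for i, obs in enumerate(sorted(p_values))]


# Simulate error counts at 1,000 loci that follow the null model exactly
# (hypothetical coverage of 50 reads and error rate of 1%).
rng = random.Random(0)
n, e = 50, 0.01
counts = [sum(rng.random() < e for _ in range(n)) for _ in range(1000)]
pvals = [randomized_p_value(k, n, e, rng) for k in counts]
points = qq_points(pvals)

# Largest vertical gap between the q-q curve and the y = x line.
max_dev = max(abs(obs - exp) for exp, obs in points)
print(f"max deviation from y = x: {max_dev:.3f}")
```

A well-calibrated model keeps these points near the y = x line; miscalibration, such as the skew seen without filtering, pulls them away from it.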

FigShare

Phase information increased sensitivity, and base quality scores increased specificity.

Author: Alexander R. Macalalad (177162)
Bruce W. Birren (147656)
Christian L. Boutwell (177170)
Christine M. Malboeuf (177167)
Doug E. Brackney (177174)
Elizabeth M. Ryan (177168)
Gregory D. Ebel (177185)
Joshua Z. Levin (177182)
Karen A. Power (177172)
Kendra N. Pesko (177178)
Matthew R. Henn (103220)
Michael C. Zody (155402)
Niall J. Lennon (177164)
Patrick Charlebois (177163)
Ruchi M. Newman (177165)
Todd M. Allen (177189)

We compared V-Phaser to alternative versions of V-Phaser with specific components disabled. In the No Phase version, V-Phaser called variants without phase information. In the Uniform Errors version, V-Phaser estimated uniform error rates within homopolymer and non-homopolymer regions without regard to assigned base qualities. In the No Filtering version, V-Phaser did not filter out low-quality bases. (A) Phase information increased sensitivity. The version without phase information attained a sensitivity of 90%, whereas all other versions of V-Phaser used phase information and attained a sensitivity of 97% or more. We calculated sensitivity as the percentage of known variants correctly identified. Data are from the WNV mixed-population control dataset. (B) Individual base quality scores increased specificity. Among loci with mismatches, the Uniform Errors version had only 91% specificity, whereas all other versions incorporated base quality scores in their probability model and attained 97% specificity or more. We calculated specificity as the percentage of loci in the control sample correctly identified as having no variants, among loci that had at least one candidate variant. Data are from the infectious clone (HIV NL4-3) control dataset.
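The two definitions above can be written out directly. This is a minimal sketch; representing variants as hypothetical (position, base) pairs and loci as integer positions is an assumption, not V-Phaser's data model, and the example values are made up.

```python
def sensitivity(known_variants: set, called_variants: set) -> float:
    """Percentage of known variants that the caller identified."""
    if not known_variants:
        raise ValueError("sensitivity is undefined without known variants")
    return 100.0 * len(known_variants & called_variants) / len(known_variants)


def specificity(candidate_loci: set, called_loci: set) -> float:
    """Percentage of candidate loci correctly called as having no variant.

    In a clonal control such as an infectious clone, no locus truly varies,
    so every candidate locus should be called invariant.
    candidate_loci: loci with at least one candidate variant (e.g. a mismatch).
    """
    if not candidate_loci:
        raise ValueError("specificity is undefined without candidate loci")
    correctly_invariant = candidate_loci - called_loci
    return 100.0 * len(correctly_invariant) / len(candidate_loci)


# Hypothetical mixed-population control: four known variants, three recovered.
known = {(101, "A"), (250, "G"), (733, "T"), (900, "C")}
called = {(101, "A"), (250, "G"), (733, "T"), (1200, "A")}
print(sensitivity(known, called))  # 75.0

# Hypothetical clonal control: ten loci with mismatches, one called as variant.
candidates = set(range(10))
print(specificity(candidates, {3}))  # 90.0
```

Note that the spurious call at position 1200 lowers neither number here: sensitivity ignores false positives, and specificity is measured on a separate clonal dataset.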

FigShare

Error rates were not uniformly distributed.

Author: Alexander R. Macalalad (177162)
Bruce W. Birren (147656)
Christian L. Boutwell (177170)
Christine M. Malboeuf (177167)
Doug E. Brackney (177174)
Elizabeth M. Ryan (177168)
Gregory D. Ebel (177185)
Joshua Z. Levin (177182)
Karen A. Power (177172)
Kendra N. Pesko (177178)
Matthew R. Henn (103220)
Michael C. Zody (155402)
Niall J. Lennon (177164)
Patrick Charlebois (177163)
Ruchi M. Newman (177165)
Todd M. Allen (177189)

Error rates varied by (A) read position, (B) base transition, and (C) base quality score. We counted as errors any mismatches to the consensus assembly for each of the two runs in the control read set, under the assumption that the NL4-3 infectious clone had no diversity. We defined read position relative to the beginning or end of the read, whichever was closer. We defined a base transition as the dinucleotide formed by the preceding base and the current base, and we scored a transition as an error if the current base was a mismatch. Base quality scores came from the sequencing process.
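The tallies behind panels (A) through (C) can be sketched as below. The sketch assumes each read has already been aligned to the consensus without insertions or deletions, so read and consensus segments have equal length; that gap-free simplification and the tuple input layout are assumptions for illustration, and real reads may contain indels that this sketch ignores.

```python
from collections import defaultdict


def tally_errors(reads):
    """Tally mismatches to the consensus by read position, base transition, and quality.

    reads: iterable of (read_bases, consensus_bases, qualities), all the same length
    (assumed gap-free alignment). Each returned dict maps a category to
    [error_count, base_count].
    """
    by_pos = defaultdict(lambda: [0, 0])
    by_transition = defaultdict(lambda: [0, 0])
    by_quality = defaultdict(lambda: [0, 0])
    for bases, consensus, quals in reads:
        n = len(bases)
        for i, (b, c, q) in enumerate(zip(bases, consensus, quals)):
            is_error = int(b != c)
            # Read position measured from whichever end of the read is closer.
            pos = min(i, n - 1 - i)
            by_pos[pos][0] += is_error
            by_pos[pos][1] += 1
            # Transition = preceding base + current base; the first base has none.
            if i > 0:
                t = bases[i - 1] + b
                by_transition[t][0] += is_error
                by_transition[t][1] += 1
            by_quality[q][0] += is_error
            by_quality[q][1] += 1
    return by_pos, by_transition, by_quality


def error_rates(table):
    """Convert [errors, total] counts into per-category error rates."""
    return {key: errors / total for key, (errors, total) in sorted(table.items())}


# One hypothetical read with a single mismatch (T vs. consensus A) at index 3.
reads = [("ACGTA", "ACGAA", [30, 30, 30, 10, 30])]
by_pos, by_transition, by_quality = tally_errors(reads)
print(error_rates(by_pos))      # {0: 0.0, 1: 0.5, 2: 0.0}
print(error_rates(by_quality))  # {10: 1.0, 30: 0.0}
```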

FigShare

Phase increased sensitivity to detect variants.

Author: Alexander R. Macalalad (177162)
Bruce W. Birren (147656)
Christian L. Boutwell (177170)
Christine M. Malboeuf (177167)
Doug E. Brackney (177174)
Elizabeth M. Ryan (177168)
Gregory D. Ebel (177185)
Joshua Z. Levin (177182)
Karen A. Power (177172)
Kendra N. Pesko (177178)
Matthew R. Henn (103220)
Michael C. Zody (155402)
Niall J. Lennon (177164)
Patrick Charlebois (177163)
Ruchi M. Newman (177165)
Todd M. Allen (177189)

Phase increased sensitivity to detect variants across a range of error rates at coverages of 100-fold, 250-fold, and 500-fold. The phased variant detection threshold frequency (VDTF) is the lowest frequency of reads carrying variants at two specific loci that V-Phaser can distinguish from error among reads spanning both loci. The unphased VDTF is the lowest frequency of a single variant that V-Phaser can distinguish from error among reads covering that locus. Phased detection at 100-fold coverage achieves thresholds comparable to those of unphased detection at 500-fold coverage. We used Equation 7 to calculate the phased and unphased VDTFs (see the Materials and Methods section, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4, for Equation 7 and its derivation).
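Equation 7 is given in the paper's Materials and Methods and is not reproduced here. As a rough illustration of why phase helps, the sketch below assumes a much simpler model: errors occur independently at a single rate e per base, so a spurious variant pair on one read requires two errors (rate e squared), and the threshold is the smallest read fraction whose count has a binomial upper-tail probability below a hypothetical alpha of 0.05, with no multiple-testing correction.

```python
import math


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def detection_threshold(coverage: int, error_rate: float, alpha: float = 0.05):
    """Smallest read fraction k / coverage with P(X >= k) < alpha under errors alone.

    Returns None if even a count equal to the full coverage is not significant.
    """
    for k in range(coverage + 1):
        if upper_tail(k, coverage, error_rate) < alpha:
            return k / coverage
    return None


e = 0.01  # hypothetical per-base error rate
unphased = detection_threshold(100, e)    # one locus: null rate e
phased = detection_threshold(100, e * e)  # two linked loci, independent errors: e**2
print(unphased, phased)  # 0.04 0.01
```

Under these toy assumptions, requiring the same read to carry evidence at two linked loci lowers the null rate from e to e squared, which is what lowers the phased threshold.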

FigShare

Phase information increased sensitivity to detect minor variants.

Author: Alexander R. Macalalad (177162)
Bruce W. Birren (147656)
Christian L. Boutwell (177170)
Christine M. Malboeuf (177167)
Doug E. Brackney (177174)
Elizabeth M. Ryan (177168)
Gregory D. Ebel (177185)
Joshua Z. Levin (177182)
Karen A. Power (177172)
Kendra N. Pesko (177178)
Matthew R. Henn (103220)
Michael C. Zody (155402)
Niall J. Lennon (177164)
Patrick Charlebois (177163)
Ruchi M. Newman (177165)
Todd M. Allen (177189)

Phase information increased sensitivity to detect low-frequency variants, as shown by these histograms of variants below 2.5% frequency. All versions of V-Phaser detected 100% of the variants above 2.5% frequency, so those variants are not shown. All versions of V-Phaser with phase information (A, C, and D) detected most variants below 1% frequency, but the No Phase version (B) missed many variants below 1% and some variants as high as 2.5%. Data are from the WNV mixed-population control dataset.
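Histograms like these can be built by binning each known low-frequency variant by frequency and recording whether a given caller detected it. The 0.5% bin width, the (frequency, detected) input format, and the example variants below are assumptions for illustration.

```python
def detection_histogram(variants, bin_width=0.005, max_freq=0.025):
    """Count detected and missed variants per frequency bin below max_freq.

    variants: iterable of (frequency, detected) pairs, frequency as a fraction.
    Returns a list of (bin_start, detected_count, missed_count).
    """
    n_bins = round(max_freq / bin_width)
    detected = [0] * n_bins
    missed = [0] * n_bins
    for freq, was_detected in variants:
        if not 0 <= freq < max_freq:
            continue  # variants at or above max_freq are left out, as in the figure
        b = min(int(freq / bin_width), n_bins - 1)
        if was_detected:
            detected[b] += 1
        else:
            missed[b] += 1
    return [(i * bin_width, detected[i], missed[i]) for i in range(n_bins)]


# Hypothetical calls from one caller on a mixed-population control.
variants = [(0.004, True), (0.006, False), (0.008, True), (0.022, False), (0.03, True)]
hist = detection_histogram(variants)
for start, d, m in hist:
    print(f"{start:.1%} to {start + 0.005:.1%}: detected {d}, missed {m}")
```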

FigShare

Comparison of V-Phaser to other viral variant callers.

Author: Alexander R. Macalalad (177162)
Bruce W. Birren (147656)
Christian L. Boutwell (177170)
Christine M. Malboeuf (177167)
Doug E. Brackney (177174)
Elizabeth M. Ryan (177168)
Gregory D. Ebel (177185)
Joshua Z. Levin (177182)
Karen A. Power (177172)
Kendra N. Pesko (177178)
Matthew R. Henn (103220)
Michael C. Zody (155402)
Niall J. Lennon (177164)
Patrick Charlebois (177163)
Ruchi M. Newman (177165)
Todd M. Allen (177189)

Sensitivities and specificities are reported across residues interrogated by all programs. Sensitivity is the fraction of the known variants found by each program in the WNV mixed-population control dataset. Specificity is the fraction of sites not containing known variants that were called invariant in the HIV NL4-3 control dataset; values in parentheses include inserted and deleted bases (see the Materials and Methods section, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002417#s4).
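Restricting the comparison to residues interrogated by every program can be sketched with set intersections, so that a site one program never evaluated counts against no program. The program names, positions, and calls below are hypothetical, and the indel-inclusive values in parentheses are not modeled.

```python
def common_sites(interrogated: dict) -> set:
    """Positions evaluated by every program."""
    return set.intersection(*(set(s) for s in interrogated.values()))


def sensitivity_by_program(known_sites: set, calls: dict, interrogated: dict) -> dict:
    """Fraction of known variant sites, among common sites, that each program called."""
    known = known_sites & common_sites(interrogated)
    return {prog: len(known & called) / len(known) for prog, called in calls.items()}


def specificity_by_program(calls: dict, interrogated: dict, known_sites=frozenset()) -> dict:
    """Fraction of common sites without known variants that each program called invariant."""
    negatives = common_sites(interrogated) - known_sites
    return {prog: len(negatives - called) / len(negatives) for prog, called in calls.items()}


# Hypothetical mixed-population control: site 5 is dropped because ProgramB skipped it.
interrogated_wnv = {"ProgramA": set(range(0, 100)), "ProgramB": set(range(10, 100))}
known = {5, 20, 40, 60}
calls_wnv = {"ProgramA": {5, 20, 40}, "ProgramB": {20, 40, 60}}
sens = sensitivity_by_program(known, calls_wnv, interrogated_wnv)

# Hypothetical clonal control: no true variants, so every call is a false positive.
interrogated_hiv = {"ProgramA": set(range(0, 100)), "ProgramB": set(range(10, 100))}
calls_hiv = {"ProgramA": {15, 200}, "ProgramB": set()}
spec = specificity_by_program(calls_hiv, interrogated_hiv)
print(sens, spec)
```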

FigShare

Multiplicity of HIV-1 infection in HSX, MSM and IDU subjects.


VU Research Portal

Institutional Repository Universiteit Antwerpen

FigShare

Mean average pairwise Hamming distance (APHD) of HIV-1 Env SGA/S sequences distinguishes between single and multiple founder viruses.

(A) A training set of SGA/S Env sequences derived from 127 previously published subjects with acute HIV-1 infection, illustrating a wide range of env diversity. The APHD is calculated using a sliding window of 120 bp with a step size of 21 bp. The mean APHD is plotted by Fiebig stage, as defined by HIV-1 clinical laboratory test results. (B) A logistic regression classifier segregated the 127 subjects into single or multiple infections and correctly assigned 97% of subjects to their respective groups. Each point corresponds to an individual subject; the number of subjects in each Fiebig stage is given in parentheses on the x-axis.
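The windowed APHD can be sketched as follows. The caption gives the window (120 bp) and step (21 bp) but not the other details assumed here: sequences are already aligned to equal length, gap characters are compared like any other character, distances are raw mismatch counts rather than per-site proportions, and the per-subject value is the unweighted mean over windows. The logistic regression classifier in panel (B) is not reproduced, since its fitted coefficients are not given.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_aphd(seqs, window=120, step=21):
    """Mean over sliding windows of the average pairwise Hamming distance.

    seqs: aligned sequences of equal length (gaps treated as ordinary characters).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    if length < window:
        raise ValueError("alignment is shorter than one window")
    window_means = []
    for start in range(0, length - window + 1, step):
        chunks = [s[start:start + window] for s in seqs]
        pairs = list(combinations(chunks, 2))
        window_means.append(sum(hamming(a, b) for a, b in pairs) / len(pairs))
    return sum(window_means) / len(window_means)


# Toy alignment with a tiny window so the arithmetic is easy to check by hand.
print(round(mean_aphd(["AAAA", "AAAT", "AATT"], window=2, step=1), 3))  # 0.667
```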

FigShare

Mapping of signature sites on the three-dimensional structure of gp120 shows clustering around the CD4-binding site.

A ribbon representation of the crystal structure of the JRFL gp120 molecule (grey) bound to the CD4 molecule (green) (PDB ID: 2B4C). The CD4-binding site is highlighted in transparent green, and signature sites 283, 343, 362, 389, 429, 465, and 471 are depicted as red space-filling residues.

FigShare

Signature sites identified between MSM and HSX founder viruses in Env using a phylogenetically corrected method.


FigShare