
    Inferring Short-Range Linkage Information from Sequencing Chromatograms

    Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. Deriving linkage information from sequencing chromatograms is not straightforward, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach identified the mixture components with an accuracy of 97.4% on a test set in which the positions to be analyzed are only one base apart, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants when one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip

    Fraction estimates for dilution series.

    All subplots show nominal mixture fractions versus estimated mixture fractions for three dilution series. A–C use all peak heights and provide proportion estimates with low error and low variance. D–E ignore the peak heights at the ambiguous positions and estimate the mixture proportions from the unambiguous positions only. The resulting fraction estimates show higher error and higher variance.

    Sequencing chromatogram.

    The sequencing chromatogram shows two nearby ambiguous sequence positions, 610 and 612. At position 610, adenine and guanine are present. At position 612, adenine and thymine are present. Positions are numbered with respect to the reverse transcriptase of the hepatitis B virus genome. This chromatogram raises the question of which bases at positions 610 and 612 are present on the same clonal variant.

    Prediction accuracy on in vitro test sets.

    The figure shows the prediction accuracy on test sets TS1 (subplots A and B) and TS2 (subplots C and D). Prediction accuracy was evaluated both at the clone and at the model level. Each test sample is either predicted correctly, predicted incorrectly or unassigned. The latter happens when the marginal likelihood of the best model divided by the marginal likelihoods of all other models falls below the uncertainty cutoff displayed on the x-axis.
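
The "unassigned" rule in the caption above can be illustrated with a small sketch. It assumes that "divided by the marginal likelihoods of all other models" means division by their sum, and the model labels and numbers are hypothetical; this is not the authors' implementation.

```python
import math

def classify_with_cutoff(log_marginals, cutoff):
    """Return the best model label, or None if the sample stays unassigned.

    log_marginals: dict mapping a (hypothetical) model label to its log
    marginal likelihood; at least two models are required.
    cutoff: positive uncertainty cutoff on the evidence ratio.
    """
    best = max(log_marginals, key=log_marginals.get)
    others = [v for k, v in log_marginals.items() if k != best]
    # Assumption: the denominator is the SUM of the competing marginal
    # likelihoods, computed with log-sum-exp for numerical stability.
    m = max(others)
    log_sum_others = m + math.log(sum(math.exp(v - m) for v in others))
    log_ratio = log_marginals[best] - log_sum_others
    # Compare in log space so very large ratios cannot overflow.
    return best if log_ratio >= math.log(cutoff) else None

# Hypothetical example: one model clearly dominates two alternatives.
example = {"A": math.log(0.9), "B": math.log(0.05), "C": math.log(0.05)}
lenient = classify_with_cutoff(example, cutoff=5.0)
strict = classify_with_cutoff(example, cutoff=20.0)
```

Raising the cutoff trades coverage for reliability: more samples remain unassigned, but the assigned ones rest on stronger evidence.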

    In silico prediction results.

    1771 in silico test chromatograms were created by computing the mixture profiles on a grid of values with precision 0.05. Test chromatograms were classified by the mixture model with . The subplots show all falsely classified samples separately for each falsely predicted label. Six major cases of misclassification can be observed. Subplots A–D show test samples that consist of four haplotypes with at least one haplotype having low frequency. Subplots E and F show test samples that were predicted as mixtures of haplotypes 1 and 4 or of haplotypes 2 and 3, respectively. The data points of subplot E satisfy the linear constraints , and . The data points of subplot F satisfy , and .
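
The figure of 1771 test chromatograms is consistent with enumerating every frequency vector over four haplotypes whose entries are multiples of 0.05 and sum to 1. The sketch below checks that count; that the grid was built this way is an assumption, and the function name is hypothetical.

```python
from itertools import product

def simplex_grid(n_components=4, steps=20):
    """List all frequency vectors with entries k/steps that sum to 1.

    steps=20 corresponds to a grid precision of 0.05.
    """
    grid = []
    # Choose integer counts for the first n-1 components freely;
    # the last component receives whatever is left over.
    for counts in product(range(steps + 1), repeat=n_components - 1):
        remainder = steps - sum(counts)
        if remainder >= 0:
            grid.append(tuple(c / steps for c in counts + (remainder,)))
    return grid

grid = simplex_grid()
print(len(grid))  # 1771
```

The count equals the binomial coefficient C(23, 3), the number of ways to split 20 units of 0.05 among four haplotypes.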

    Peak heights of dilution series.

    The figure shows the median normalized peak heights of the chromatograms of a dilution series sorted by nominal mixture proportion. The normalized peak heights before the ambiguous sequence position are almost identical for all nominal mixture proportions. At the ambiguous sequence position and at up to five bases downstream of the ambiguous sequence position a smooth and apparently linear transition between the peak heights of the samples with nominal mixture proportions 10:0 and 0:10 can be observed.
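
If the transition between the two pure samples is indeed linear, a mixture fraction can be estimated from peak heights by least squares. The following is a minimal sketch under that linearity assumption, with hypothetical peak-height values; it is not the model described in the paper.

```python
def estimate_fraction(h_mix, h_a, h_b):
    """Least-squares estimate of f in h_mix = f * h_a + (1 - f) * h_b.

    h_a, h_b: normalized peak heights of the two pure (10:0 and 0:10)
    samples; h_mix: peak heights of the mixture at the same positions.
    """
    diffs = [a - b for a, b in zip(h_a, h_b)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("pure samples are identical at all positions")
    numer = sum((m - b) * d for m, b, d in zip(h_mix, h_b, diffs))
    # Clip to the valid range of a mixture fraction.
    return min(1.0, max(0.0, numer / denom))

# Hypothetical peak heights at three informative positions.
pure_a = [1.0, 0.2, 0.8]
pure_b = [0.2, 1.0, 0.4]
mixture = [0.3 * a + 0.7 * b for a, b in zip(pure_a, pure_b)]
f_hat = estimate_fraction(mixture, pure_a, pure_b)
```

Only positions where the two pure samples differ contribute to the estimate, which matches the observation that peak heights before the ambiguous position carry almost no information about the mixture proportion.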

    Results of the first international HIV-1 coreceptor proficiency panel test

    Background: Quality Assurance (QA) programs are essential to evaluate performance in diagnostics laboratories. Objectives: We present the results from the first QA for HIV-1 genotypic tropism testing, organized and coordinated by the Institute of Virology at the University of Cologne. Study design: 12 cell culture-derived viral strains of different HIV-1 clades from the NIH AIDS Reagent Program were sent to the participants to be processed with their standard diagnostic methods. FASTA files containing the V3 region sequence were centrally analyzed at the Institute of Virology, Cologne. All samples were sent in parallel for phenotypic testing. Results: 36 laboratories from 16 countries reported genotypic results. The sequence-generation efficacy was 95.1%, while the phenotypic assays ESTA (R) and PhenXR only achieved results for 58.3% of the samples. All four X4 samples were identified by 31/36 laboratories, two laboratories amplified 3/4 X4 samples, and three detected 2/4 X4 samples. There was high concordance among the genotypic and phenotypic results, although differences in FPR values were detected. Most deficiencies in sequence editing led to an overestimation of CXCR4 use rather than to a wrong classification of X4 viruses as R5; the exception was sample NRZ05 by laboratory 38. Conclusions: This demonstrates that genotypic tropism prediction is a safe procedure for clinical purposes. As we used homogeneous cell culture samples and all sequence FASTA files were centrally analyzed, variations in FPR values can only be attributed to sample preparation, RT-PCR or sequence editing protocols.