“Deep” Sequencing Accuracy and Reproducibility Using Roche/454 Technology for Inferring Co-Receptor Usage in HIV-1
<div><p>Next generation, “deep”, sequencing has increasing applications both clinically and in disparate fields of research. This study investigates the accuracy and reproducibility of “deep” sequencing as applied to co-receptor prediction using the V3 loop of Human Immunodeficiency Virus-1. Despite increasing use in HIV co-receptor prediction, the accuracy and reproducibility of deep sequencing technology, and the factors which can affect it, have received only limited investigation. To address this, repeated deep sequencing results were generated using the Roche GS-FLX (454) from a number of sources including a non-homogeneous clinical sample (N = 47 replicates over 18 deep sequencing runs), and a large clinical cohort from the MOTIVATE and A4001029 studies (N = 1521). For repeated measurements of a non-homogeneous clinical sample, increasing input copy number both decreased variance in the measured proportion of non-R5-using virus (p<<0.001 and 0.02 for single replicates and triplicates respectively) and increased measured viral diversity (p<0.001; multiple measures). Detection of sequences with a mean abundance below 1% showed a 2-fold increase in median coefficient of variation (CV) in repeated measurements of a non-homogeneous clinical sample, and a 2.7-fold increase in CV in the MOTIVATE/A4001029 dataset, compared to sequences with ≥1% abundance. An unexpected source of error was read position, with low accuracy reads occurring more frequently towards the edge of sequencing regions (p<<0.001). Overall, the primary source of variability was sampling error caused by low input copy number/minority species prevalence, though other sources of error including sequence intrinsic, temporal, and read-position related errors were detected.</p></div>
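<p>The two quantities the abstract leans on, the coefficient of variation of repeated measurements and sampling error at low input copy number, can be illustrated with a minimal sketch. The function names, the use of the sample standard deviation, and the simple binomial model of template sampling are assumptions made for illustration; the abstract does not state how the authors computed either quantity.</p>

```python
import math
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean.

    Assumption: sample (n - 1) standard deviation; the source does not
    say which estimator was used.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


def binomial_sampling_sd(true_proportion, input_copies):
    """Standard deviation of a proportion estimated from `input_copies`
    independently sampled templates (idealised binomial model)."""
    if input_copies <= 0:
        raise ValueError("input_copies must be positive")
    return math.sqrt(true_proportion * (1 - true_proportion) / input_copies)


# Hypothetical true non-R5 proportion of 20%, evaluated at a low and a
# high input copy number.
sd_low_input = binomial_sampling_sd(0.20, 18)
sd_high_input = binomial_sampling_sd(0.20, 286)
print(f"SD at 18 copies:  {sd_low_input:.3f}")
print(f"SD at 286 copies: {sd_high_input:.3f}")
```

<p>Under this idealised model the spread of the measured proportion shrinks with the square root of the input copy number, which is in line with the reported decrease in variance at higher input.</p>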
Histogram of percent non-R5 by Trofile call of patients in the MOTIVATE and A4001029 trials for which both population sequencing test results and original Trofile assay results were available (N = 1383).
<p>Patients are grouped according to the percent non-R5 measured by deep sequencing and Trofile call. R5 Trofile calls are shown in white. Dual Mixed Trofile calls are shown in grey. X4 Trofile calls are shown in black.</p>
Sequence differences by viral load and PCR replication.
<p>A) Percent non-R5 virus detected at varying input copy numbers for a non-homogeneous clinical sample. The dashed line represents the approximate true value for percent of non-R5 virus. B) Measures of viral diversity for 18 and 286 input copies on the non-homogeneous clinical sample amplified with a single PCR and with triplicate PCR. Nucleotide entropies are shown on the left, mean phylogenetic branch length is shown on the right. Estimates tended to be more accurate when the input copy numbers were higher.</p>
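<p>As a rough illustration of the nucleotide entropy measure in panel B, the sketch below computes Shannon entropy per alignment column and averages it across positions. This is an assumption about the definition; the caption does not say which entropy formula, logarithm base, or gap handling the authors used, and the function names are hypothetical.</p>

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_nucleotide_entropy(aligned_reads):
    """Average per-column entropy over equal-length aligned reads."""
    if not aligned_reads:
        raise ValueError("at least one read is required")
    length = len(aligned_reads[0])
    if length == 0 or any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be non-empty and of equal length")
    # zip(*reads) walks the alignment column by column.
    columns = zip(*aligned_reads)
    return sum(column_entropy(col) for col in columns) / length


# Hypothetical toy alignment: the first column is invariant, the second
# is split evenly between two bases.
reads = ["AC", "AG", "AC", "AG"]
print(mean_nucleotide_entropy(reads))
```

<p>A sample in which more distinct variants are captured has more variable columns and therefore a higher mean entropy.</p>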
Spatial distribution of control bead accuracy.
<p>A) The density distributions for 111 runs of a single control bead (TF2LonG) are shown as an example plate, with lighter colours representing higher read density. Regular areas of low density show areas covered by the gasket which physically separates sequencing regions. The division of reads for later analyses is shown as coloured boxes, with reads inside the green box being counted in the inner area, reads between the blue and green boxes in the middle area, and anything outside of the blue box in the outer area. B) The proportion of reads with 95% or lower accuracy by area on the <b>plate</b> is shown for each bead in the inner (green), middle (blue) and outer (red) areas respectively. Proportion in percent is shown on the y axis. The total number of reads with an accuracy less than or equal to 95% is shown above each bar, while the total number of reads in the region is shown underneath the bar. In general, the number of reads with an accuracy less than or equal to 95% increases towards the outer regions. TF120LonG, however, was an exception, showing no difference between the middle and outer areas (both with a prevalence of 0.0036%), though both still had more low-accuracy reads than the inner area.</p>
Sequence reproducibility by read direction.
<p>A) Histograms of the most common sequence detected in the reverse direction but missing in the forward direction for the MOTIVATE/A4001029 dataset (N = 1521). Less abundant sequences were more likely to be detected only in a single direction compared to sequences with higher abundance. B) Correlation of measured proportion non-R5 for the MOTIVATE/A4001029 dataset between the forward and reverse direction (N = 1521). A linear regression is shown as a solid line. C) Histograms of the most common sequence detected in the forward direction but missing in the reverse direction for the MOTIVATE/A4001029 dataset (N = 1521). D) The lower panel contains a Bland-Altman comparison. Mean difference is plotted as a solid black line and limits of agreement are shown as dotted lines.</p>
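<p>The Bland-Altman comparison in panel D can be sketched as follows. This is a generic version of the method, assuming the conventional limits of agreement at the mean difference plus or minus 1.96 standard deviations; the caption does not state the multiplier the authors used, and the function name and input values are hypothetical.</p>

```python
import statistics


def bland_altman(forward, reverse):
    """Return per-sample means and differences plus the mean difference
    (bias) and limits of agreement for two paired measurement series."""
    if len(forward) != len(reverse) or len(forward) < 2:
        raise ValueError("need two paired series with at least two samples")
    diffs = [f - r for f, r in zip(forward, reverse)]
    means = [(f + r) / 2 for f, r in zip(forward, reverse)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return {
        "means": means,             # x axis of the plot
        "diffs": diffs,             # y axis of the plot
        "bias": bias,               # solid line
        "lower": bias - 1.96 * sd,  # lower dotted line
        "upper": bias + 1.96 * sd,  # upper dotted line
    }


# Hypothetical percent non-R5 values for three samples, measured in the
# forward and reverse read directions.
result = bland_altman([10.0, 20.0, 30.0], [8.0, 21.0, 29.0])
print(result["bias"], result["lower"], result["upper"])
```

<p>Plotting <code>diffs</code> against <code>means</code> with horizontal lines at <code>bias</code>, <code>lower</code> and <code>upper</code> gives the layout the caption describes.</p>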
Variability as a function of sequence abundance.
<p>The forward direction is indicated in black, the reverse direction in grey. A) Variability in the 100 most common sequences from a non-homogeneous clinical sample run multiple times (N = 47 replicates over 18 deep sequencing runs). B) Variability between the forward and reverse direction for samples in the MOTIVATE/A4001029 dataset (N = 1521 samples; 41030 unique sequences, median 26 [IQR 19–34] per sample). Less abundant sequences tended to have more variable prevalence estimates in comparison to more abundant sequences.</p>
Reproducibility over time by sequence prevalence for a non-homogeneous clinical sample (N = 47 replicates over 18 deep sequencing runs).
<p>A) The proportion of non-R5 virus (defined as PSSM≥−4.75) measured by sequencing run. Sequences in the “forward” direction are shown in black. Sequences in the “reverse” direction are shown in grey. Thick lines represent the overall mean percentage non-R5 virus measured from all sequences for that run. B) The number of times each of the 10 most common unique sequences was observed for the non-homogeneous clinical sample is plotted against the sequencing run. Each sequence has a unique shade-shape combination.</p>
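<p>The non-R5 definition in panel A amounts to thresholding a per-read score. The sketch below assumes each read already carries a PSSM score and that every read counts equally; only the −4.75 cutoff comes from the caption, while the function name, the scores, and the equal weighting are hypothetical.</p>

```python
NON_R5_CUTOFF = -4.75  # cutoff stated in the caption: PSSM >= -4.75 is non-R5


def percent_non_r5(pssm_scores):
    """Percentage of reads whose PSSM score meets or exceeds the cutoff."""
    if not pssm_scores:
        raise ValueError("at least one score is required")
    non_r5 = sum(1 for score in pssm_scores if score >= NON_R5_CUTOFF)
    return 100.0 * non_r5 / len(pssm_scores)


# Hypothetical scores for four reads; two of them reach the cutoff.
print(percent_non_r5([-10.0, -4.75, -2.0, -8.0]))
```

<p>Applying this per sequencing run, separately for forward and reverse reads, would give the kind of per-run percentages plotted in panel A.</p>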