Calling accuracy decreases with homopolymer length.
<p>Lines show the mean calling accuracy for each kit by reference homopolymer length, computed over read bases 10–100 and over read bases 10–200; the latter range applies only to the two 200 bp kits.</p>
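A minimal sketch of how per-length accuracy could be tabulated, assuming each reference homopolymer has already been paired with the length called in the read and with the read position where it starts. The function name `accuracy_by_length`, the tuple layout and the inclusive window bounds are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict

def accuracy_by_length(calls, start=10, end=100):
    """Fraction of reference homopolymers called at exactly their true length.

    calls: iterable of (read_position, reference_length, called_length), one
    entry per reference homopolymer covered by a read. Only homopolymers whose
    read position lies in [start, end] (inclusive, an assumption) are counted.
    Returns {reference_length: accuracy}.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for pos, ref_len, called_len in calls:
        if not start <= pos <= end:
            continue  # outside the read-base window being summarised
        total[ref_len] += 1
        if called_len == ref_len:
            correct[ref_len] += 1
    return {n: correct[n] / total[n] for n in sorted(total)}

# Toy input, not data from the study.
calls = [(5, 1, 2), (12, 1, 1), (20, 1, 2), (50, 4, 4), (150, 4, 3)]
print(accuracy_by_length(calls))           # {1: 0.5, 4: 1.0}
print(accuracy_by_length(calls, end=200))  # {1: 0.5, 4: 0.5}
```

Widening the window only changes which homopolymers enter the denominator, which is why the 10–200 range is meaningful only for reads long enough to reach it.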
Relationship between base position and error rate for homopolymer (over-call/under-call) versus substitution errors.
<p>Panel (a) shows the homopolymer error rate (insertion + deletion) by read base position, and panel (b) shows the substitution error rate by read base position. Each line is the raw mean error rate for a single data set, coloured by kit and species as given in the key.</p>
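The split into the two panels can be sketched as below, assuming every insertion and deletion is counted as a homopolymer error (as the caption's "insertion + deletion" suggests) and that the denominator is the number of read bases aligned at each position. Both the event encoding and the function name are hypothetical.

```python
from collections import Counter

def error_rates_by_position(events, bases_per_position):
    """Raw per-position rates of homopolymer (ins + del) and substitution errors.

    events: iterable of (read_position, error_type), error_type in {"ins", "del", "sub"}.
    bases_per_position: mapping read_position -> number of aligned read bases there.
    Returns {read_position: (homopolymer_rate, substitution_rate)}.
    """
    homopolymer = Counter()
    substitution = Counter()
    for pos, kind in events:
        if kind in ("ins", "del"):
            homopolymer[pos] += 1  # over-/under-calls appear as insertions/deletions
        elif kind == "sub":
            substitution[pos] += 1
        else:
            raise ValueError(f"unknown error type: {kind}")
    return {
        pos: (homopolymer[pos] / n, substitution[pos] / n)
        for pos, n in sorted(bases_per_position.items())
        if n > 0  # skip positions no read reached
    }
```

Computing this once per data set gives one line per run, matching how the panels are drawn.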
Relationship between G+C% and the observed mean coverage for 100 bp windows in the reference genome.
<p>Panel (a) is a boxplot of the distribution of the square-root normalised mean read depth across the 100 bp windows for each reference genome, broken down further by sequencing kit and G+C% bin. The coverage for each run was normalised by its mean coverage: the boxplots show the square-root fold-change from the mean genomic coverage for each combination of species, kit and G+C% bin. Thus a value of 2 means the coverage was four times the mean for that sequencing run. The boxes span the central 50% of the values in each treatment, with the median shown as a solid black horizontal bar. The whiskers extend up to 1.5× the inter-quartile range beyond the box, and the black dots are extreme individual observations falling outside this range. The variability observed in the high G+C bins is likely due to the small number of windows in these G+C ranges, shown in panel (b). The outliers may be due to repetitive content in the genome that was not masked by our perfect-match repeat approach.</p>
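The normalisation described above can be sketched as follows, assuming each window is given as its G+C% and mean depth, that the run's mean coverage is the mean over the supplied (unmasked) windows, and that G+C% bins are 10 percentage points wide. The bin width and the function name are assumptions for illustration.

```python
import math
from collections import defaultdict

def sqrt_fold_change_by_gc(windows, bin_width=10):
    """Group square-root fold-change coverage values by G+C% bin for one run.

    windows: list of (gc_percent, mean_depth), one per 100 bp reference window.
    Returns {gc_bin_lower_edge: [sqrt(depth / run_mean_depth), ...]}.
    """
    if not windows:
        raise ValueError("no windows supplied")
    run_mean = sum(depth for _, depth in windows) / len(windows)
    bins = defaultdict(list)
    for gc, depth in windows:
        gc_bin = int(gc // bin_width) * bin_width
        # sqrt(fold-change): a value of 2 corresponds to 4x the run mean
        bins[gc_bin].append(math.sqrt(depth / run_mean))
    return dict(bins)

# Toy run with mean depth 25: the first window has 4x coverage, so its value is 2.0.
print(sqrt_fold_change_by_gc([(32.0, 100.0), (41.0, 0.0), (45.0, 0.0), (55.0, 0.0)]))
```

Each list in the returned dictionary is what one box in panel (a) would summarise, and its length is the per-bin sample size shown in panel (b).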
Ion Torrent quality scores versus empirically estimated quality scores per base.
<p>The grey band around the LOESS smoother indicates the 95% confidence interval for the conditional mean. Individual observations for each quality score are plotted as black points.</p>
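One common way to obtain an empirical quality, assumed here rather than stated in the caption, is to group aligned bases by reported Phred score and convert the observed error rate p back to the Phred scale, Q = −10 log10(p). The sketch below does that; qualities with no observed errors are left out because the logarithm is undefined at zero.

```python
import math
from collections import defaultdict

def empirical_quality(bases):
    """Empirical Phred quality for each reported quality score.

    bases: iterable of (reported_quality, is_error) for each aligned base.
    Returns {reported_quality: -10 * log10(observed error rate)}.
    """
    total = defaultdict(int)
    errors = defaultdict(int)
    for q, is_error in bases:
        total[q] += 1
        errors[q] += int(is_error)
    return {
        q: -10 * math.log10(errors[q] / total[q])
        for q in sorted(total)
        if errors[q] > 0  # log10(0) is undefined
    }
```

Plotting these values against the reported scores, with a smoother through the points, gives the kind of calibration plot the caption describes; points below the diagonal indicate over-confident reported scores.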
Mean rates of insertion, deletion and substitution errors across the three sequencing kits.
<p>Each boxplot shows the distribution of error rates of the specified type across the runs for the specified kit (species are pooled).</p>
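A sketch of the aggregation, assuming each run is summarised by its error counts and the number of aligned bases used as the denominator. The dictionary keys are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

ERROR_TYPES = ("ins", "del", "sub")

def error_rates_by_kit(runs):
    """Per-run error rates of each type, grouped by kit (species pooled).

    runs: iterable of dicts with keys 'kit', 'aligned_bases', 'ins', 'del', 'sub',
    where the last three are error counts for that run.
    Returns {kit: {error_type: [rate for each run]}}.
    """
    by_kit = defaultdict(lambda: {t: [] for t in ERROR_TYPES})
    for run in runs:
        for t in ERROR_TYPES:
            by_kit[run["kit"]][t].append(run[t] / run["aligned_bases"])
    return dict(by_kit)

def mean_rates(by_kit):
    """Mean of the per-run rates for each kit and error type."""
    return {
        kit: {t: mean(rates) for t, rates in types.items()}
        for kit, types in by_kit.items()
    }
```

Each per-run list is the distribution one box summarises; `mean_rates` gives the kit-level means named in the title.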
Examples of over-call/under-call errors in homopolymers of length less than 2.
<p>By aligning the read (derived from the rounded flow values) and its corresponding reference sequence (considered the ‘true’ sequence) at the flow level, we can identify examples of over-calling a zero-length homopolymer (Flow Cycle #2) and under-calling a one-length homopolymer (Flow Cycle #6). Flow Cycle #5 shows a zero-length homopolymer correctly called as zero.</p>
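The flow-level comparison can be sketched by projecting the reference onto a cyclic flow order (one homopolymer length per flow) and comparing it flow by flow with the read's rounded flow values. The flow order `"TACG"` is a placeholder assumption, and this simple one-to-one flow comparison is an illustration, not the authors' alignment procedure.

```python
from itertools import cycle

def sequence_to_flows(seq, flow_order):
    """Per-flow homopolymer lengths obtained by flowing seq through a cyclic flow order."""
    if not set(seq) <= set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    flows = []
    i = 0
    for nuc in cycle(flow_order):
        if i >= len(seq):
            break
        run = 0
        while i < len(seq) and seq[i] == nuc:
            run += 1
            i += 1
        flows.append(run)  # 0 means nothing incorporated on this flow
    return flows

def classify_flows(read_flows, ref_flows):
    """Label each flow as 'correct', 'over-call' or 'under-call' (truncates to the shorter input)."""
    labels = []
    for read_val, ref_val in zip(read_flows, ref_flows):
        if read_val > ref_val:
            labels.append("over-call")
        elif read_val < ref_val:
            labels.append("under-call")
        else:
            labels.append("correct")
    return labels

ref = sequence_to_flows("TAGGT", "TACG")        # [1, 1, 0, 2, 1]
print(classify_flows([1, 1, 1, 2, 0], ref))
# ['correct', 'correct', 'over-call', 'correct', 'under-call']
```

Flow 3 mirrors a zero-length homopolymer over-called as one base, and flow 5 a one-length homopolymer under-called as zero.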
Breakdown of substitution type as a proportion of all substitutions for each sequencing kit.
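A sketch of the breakdown, assuming substitutions are available as (reference base, read base) pairs for one kit. Whether complementary types (for example A>G and T>C) were pooled is not stated, so this version keeps all directional types separate.

```python
from collections import Counter

def substitution_proportions(substitutions):
    """Proportion of each substitution type among all substitutions.

    substitutions: iterable of (reference_base, read_base) pairs.
    Returns {"A>G": proportion, ...}, or {} if there are no substitutions.
    """
    counts = Counter(f"{ref}>{alt}" for ref, alt in substitutions if ref != alt)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in sorted(counts.items())}

print(substitution_proportions([("A", "G"), ("A", "G"), ("C", "T"), ("G", "A")]))
# {'A>G': 0.5, 'C>T': 0.25, 'G>A': 0.25}
```

Running it separately on each kit's substitutions gives one breakdown per kit.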
Effect of quality and flow trimming on dataset metrics, aggregated by kit used.
<p>AT = Analysis trim, QT = Quality trim, HRI = High-residual ionogram trim (1-mers and 2-mers), HRI3 = High-residual ionogram trim (1-mers, 2-mers and 3-mers). The ‘comparison homopolymer rates’ are taken from other studies using the same kit and level of quality assurance (both used Torrent Server version 1.5.0).</p>
Sequencing runs generated for this study.
<p>The name of each run is composed of the chip (314, 316), species (B – <i>Bacillus amyloliquefaciens</i>, S – <i>Sulfolobus tokodaii</i>), machine (a, b) and kit (100 – Ion OneTouch Template Kit, 200M – Ion Xpress Template 200 kit, 200 – Ion OneTouch 200 Template kit). Runs are listed in chronological order. ‘% Wells with ISPs’ is the percentage of wells on the chip that contained a bead. Mean Length AT denotes the mean read length after 3′ adapter trimming.</p>
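The naming scheme can be decoded mechanically. The sketch below assumes, hypothetically, that the four codes are concatenated in the order the caption lists them (for example `316Sb200M`); the actual run names may use a different layout, so the pattern is illustrative only.

```python
import re

# Hypothetical layout: chip + species + machine + kit, e.g. "316Sb200M".
RUN_NAME = re.compile(r"^(314|316)([BS])([ab])(100|200M|200)$")

SPECIES = {"B": "Bacillus amyloliquefaciens", "S": "Sulfolobus tokodaii"}
KITS = {
    "100": "Ion OneTouch Template Kit",
    "200M": "Ion Xpress Template 200 kit",
    "200": "Ion OneTouch 200 Template kit",
}

def decode_run_name(name):
    """Split a run name into chip, species, machine and kit (assumed layout)."""
    match = RUN_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised run name: {name!r}")
    chip, species, machine, kit = match.groups()
    return {"chip": chip, "species": SPECIES[species], "machine": machine, "kit": KITS[kit]}
```

Because the pattern is anchored at both ends, `200M` and `200` are distinguished correctly whichever alternative the regex tries first.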
Estimated main and deviance effects for each explanatory variable in the double-generalised linear model.
<p>Position-in-cycle (PIC) effects are given in <b><a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003031#pcbi.1003031.s011" target="_blank">Table S1</a></b>. The intercept is the mean effect (or dispersion effect) for an observation with all factors at baseline (the baseline levels in this model are <i>B. amyloliquefaciens</i>, the 100 bp OneTouch Kit and Chip 314). Each other coefficient is the change from the intercept when its factor is moved from the baseline level to the level shown.</p>
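The baseline (treatment) coding described above means a linear predictor is the intercept plus the coefficient of every factor not at its baseline level. The sketch below shows this for the factors named in the caption; the non-baseline level labels are hypothetical, and the same function would be applied separately to the mean and dispersion submodels, each with its own coefficients. Link functions are not given in the caption (a log link is a common choice for the dispersion submodel), so the sketch stops at the linear-predictor scale.

```python
BASELINE = {
    "species": "B. amyloliquefaciens",
    "kit": "100 bp OneTouch",
    "chip": "314",
}

def linear_predictor(intercept, coefficients, setting):
    """Treatment-coded linear predictor for one submodel (mean or dispersion).

    coefficients: {(factor, level): effect} for non-baseline levels only.
    setting: {factor: level} describing the observation.
    """
    eta = intercept
    for factor, level in setting.items():
        if level != BASELINE[factor]:
            eta += coefficients[(factor, level)]  # difference from baseline
    return eta
```

An observation at baseline on every factor returns the intercept unchanged, which is exactly the interpretation the caption gives for the intercept.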