Calling accuracy decreases with homopolymer length.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

Lines show the mean accuracy for each kit by reference homopolymer length, across bases 10–100 and bases 10–200; the latter range is relevant only for the two 200 bp kits.
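A minimal sketch of how per-length accuracy over a base window could be tabulated. It assumes each observed homopolymer is available as a hypothetical `(read_position, ref_length, called_length)` tuple taken from a read-to-reference alignment. It also takes "accuracy" to mean the fraction of homopolymers called at exactly the reference length, which may not be the paper's precise definition.

```python
from collections import defaultdict


def accuracy_by_length(calls, start=10, end=100):
    """Fraction of homopolymers called at the correct length, per reference length.

    calls: iterable of (read_position, ref_length, called_length) tuples
           (a hypothetical input format). Only homopolymers at read positions
           start..end (inclusive) are counted, mirroring the base windows.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pos, ref_len, called_len in calls:
        if start <= pos <= end:
            total[ref_len] += 1
            correct[ref_len] += int(called_len == ref_len)
    # Report reference lengths in increasing order.
    return {n: correct[n] / total[n] for n in sorted(total)}


# Toy data, not from the study.
calls = [(12, 1, 1), (15, 1, 1), (40, 2, 2), (55, 2, 1),
         (80, 3, 3), (90, 3, 2), (95, 3, 4), (150, 1, 0)]
print(accuracy_by_length(calls))           # bases 10-100
print(accuracy_by_length(calls, end=200))  # bases 10-200
```

Widening the window to bases 10–200 only changes the result when a read actually extends that far, which is why that range matters only for the 200 bp kits.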

The Francis Crick Institute

Relationship between base position and error rate for homopolymer (over-call/under-call) versus substitution errors.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

Panel (a) shows the homopolymer error rate (insertion + deletion) by read base position, and panel (b) shows the substitution error rate by base position. Each line is the raw mean error rate for a single data set, with kit and species as given by the colour key.
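The two panels separate insertions and deletions (homopolymer errors) from substitutions. Below is a sketch of how both per-position rates could be computed for one data set. It assumes a hypothetical list of `(read_position, kind)` error events and a per-position count of observed read bases as the denominator. Attributing a deletion to a single read position is itself an assumption, because a deleted base has no read coordinate.

```python
from collections import Counter


def error_rate_by_position(events, bases_at_position):
    """Per-position (homopolymer, substitution) error rates for one data set.

    events: iterable of (read_position, kind), kind in {"ins", "del", "sub"}.
    bases_at_position: dict mapping read position -> number of read bases
                       observed at that position (the denominator).
    Insertions and deletions are pooled as homopolymer errors.
    """
    homopolymer = Counter()
    substitution = Counter()
    for pos, kind in events:
        if kind in ("ins", "del"):
            homopolymer[pos] += 1
        elif kind == "sub":
            substitution[pos] += 1
    return {
        pos: (homopolymer[pos] / n, substitution[pos] / n)
        for pos, n in sorted(bases_at_position.items())
        if n > 0  # skip positions with no coverage
    }


events = [(1, "ins"), (1, "sub"), (2, "del"), (2, "del"), (3, "sub")]
print(error_rate_by_position(events, {1: 10, 2: 10, 3: 5}))
```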

Relationship between G+C% and the observed mean coverage for 100 bp bins in the reference genome.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

Panel (a) is a boxplot of the distribution of square-root normalised mean read depth across the 100 bp windows for each reference genome, further broken down by sequencing kit and G+C% bin. The coverage for each run was normalised by that run's mean coverage, so the boxplots show the square-root fold-change from the mean genomic coverage for each combination of species, kit and G+C% bin. Thus a value of 2 means the coverage was four times the mean for that sequencing run. The boxes display the central 50% of the values in each treatment, with the median shown as a solid black horizontal bar. Each whisker extends up to 1.5× the interquartile range beyond the box, and the black dots represent extreme individual observations that fall outside this range. The variability observed in the high G+C bins is likely due to the small sample size for these G+C regions, shown in panel (b). The outliers may be due to repetitive content in the genome that was not masked by our perfect-match repeat approach.
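The normalisation and box summaries described above can be sketched as follows. This is illustrative only. It assumes the input is the list of 100 bp window depths for one run. It also assumes the usual Tukey whisker convention, where each whisker ends at the most extreme observation within 1.5 × IQR. Python's default quantile method may differ slightly from the one used by the software that drew the original figure.

```python
import math
import statistics


def sqrt_fold_change(window_depths):
    """Square-root fold-change of each window's depth from the run's mean depth."""
    run_mean = statistics.fmean(window_depths)
    return [math.sqrt(d / run_mean) for d in window_depths]


def box_stats(values):
    """Summary used to draw one box: quartiles, whisker ends and outliers.

    Whiskers end at the most extreme values lying within 1.5 x IQR of the box
    (Tukey convention, assumed here); anything beyond is reported as an outlier.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low <= v <= high]
    return {
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low or v > high],
    }


# A window at four times the run mean maps to 2.0 on the square-root scale.
print(sqrt_fold_change([1, 1, 1, 1, 16]))  # [0.5, 0.5, 0.5, 0.5, 2.0]
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```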

Ion Torrent quality scores versus empirically estimated quality scores for each base.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

The grey band surrounding the LOESS smoother indicates the 95% confidence interval for the conditional mean. Individual observations for each quality score are plotted as black points.
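An empirical quality score for each reported score can be derived from the standard Phred relation Q = −10·log10(P(error)), applied to the observed error fraction among bases that received that reported score. The sketch below assumes a hypothetical list of `(reported_quality, is_error)` pairs. It adds a pseudocount so that bins with no observed errors still receive a finite value; that smoothing is an assumption, not necessarily what the study did. The LOESS fit and its confidence band are not reproduced.

```python
import math
from collections import defaultdict


def empirical_quality(observations, pseudocount=1):
    """Empirical Phred quality for each reported quality score.

    observations: iterable of (reported_quality, is_error) pairs, one per base.
    pseudocount: added to the error count (and twice to the total) so a bin
                 with zero observed errors still gets a finite score; with
                 pseudocount=0 such a bin raises ValueError from log10(0).
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for q, is_error in observations:
        totals[q] += 1
        errors[q] += int(is_error)
    return {
        q: -10 * math.log10((errors[q] + pseudocount) / (totals[q] + 2 * pseudocount))
        for q in sorted(totals)
    }


# 1 error in 100 bases reported at Q20 gives an empirical score of about 20.
obs = [(20, False)] * 99 + [(20, True)] + [(10, False)] * 9 + [(10, True)]
print(empirical_quality(obs, pseudocount=0))
```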

Mean rates of insertion, deletion and substitution errors across the three sequencing kits.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

Each boxplot shows the distribution of error rates of the specified type across all runs for the specified kit (species are aggregated).

Examples of over-call/under-call errors in homopolymers of length less than 2.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

By aligning the read (derived from the rounded flow values) and its corresponding reference sequence (considered the ‘true’ sequence) at the flow level, we can identify examples of over-calling a zero-length homopolymer (Flow Cycle #2) and under-calling a one-length homopolymer (Flow Cycle #6). Flow Cycle #5 shows a zero-length homopolymer correctly called as zero.
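The flow-level comparison can be sketched by converting the reference into the homopolymer length expected at each nucleotide flow and comparing that with the rounded flow values. This is a sketch under stated assumptions. The repeating `"TACG"` flow order is a hypothetical simplification, since real runs use the flow order recorded for the run. Flows are numbered individually here, which may not match the figure's "Flow Cycle" numbering.

```python
import math


def sequence_to_flows(seq, flow_order="TACG"):
    """Homopolymer length incorporated at each successive nucleotide flow.

    The repeating "TACG" flow order is a hypothetical default.
    """
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    flows, i, f = [], 0, 0
    while i < len(seq):
        nuc = flow_order[f % len(flow_order)]
        run = 0
        while i < len(seq) and seq[i] == nuc:  # consume the homopolymer
            run += 1
            i += 1
        flows.append(run)  # 0 when this flow's nucleotide is not next
        f += 1
    return flows


def compare_flows(flow_values, reference, flow_order="TACG"):
    """Label each flow as correct, over-call or under-call.

    Flow values are rounded half-up to integer homopolymer calls and compared
    with the reference flowgram; extra flows beyond the shorter one are ignored.
    """
    called = [math.floor(v + 0.5) for v in flow_values]
    expected = sequence_to_flows(reference, flow_order)
    labels = []
    for k, (c, e) in enumerate(zip(called, expected), start=1):
        status = "correct" if c == e else ("over-call" if c > e else "under-call")
        labels.append((k, flow_order[(k - 1) % len(flow_order)], e, c, status))
    return labels


# Toy example: the reference "TCG" needs no A, but the A flow rounds to 1.
for row in compare_flows([0.95, 0.7, 1.05, 1.2], "TCG"):
    print(row)
```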

Breakdown of substitution type as a proportion of all substitutions for each sequencing kit.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

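A per-kit breakdown like this can be sketched by counting each reference-to-read substitution type and dividing by that kit's total number of substitutions. The input format, hypothetical `(kit, reference_base, read_base)` records, is an assumption. Substitution types are kept directional (for example, A>G is distinct from T>C) rather than collapsed by strand complement, and that choice is also an assumption.

```python
from collections import Counter, defaultdict


def substitution_proportions(records):
    """Proportion of each substitution type among all substitutions, per kit.

    records: iterable of (kit, reference_base, read_base); pairs where the
             read base matches the reference are ignored.
    """
    by_kit = defaultdict(Counter)
    for kit, ref, read in records:
        if ref != read:
            by_kit[kit][f"{ref}>{read}"] += 1
    result = {}
    for kit, counts in by_kit.items():
        total = sum(counts.values())
        result[kit] = {sub: n / total for sub, n in sorted(counts.items())}
    return result


records = [("100", "A", "G"), ("100", "A", "G"), ("100", "C", "T"),
           ("100", "G", "A"), ("200", "T", "C"), ("200", "A", "A")]
print(substitution_proportions(records))
```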

Effect of quality and flow trimming on dataset metrics, aggregated by kit used.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

AT = Analysis trim, QT = Quality trim, HRI = High-residual ionogram trim (1-mers and 2-mers), HRI3 = High-residual ionogram trim (1-mers, 2-mers and 3-mers). The ‘comparison homopolymer rates’ are taken from other published studies that used the same kit and level of quality assurance (in both cases, Torrent Server version 1.5.0 was used).

Sequencing runs generated for this study.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

The name of each run is composed of the chip (314, 316), species (B – Bacillus amyloliquefaciens, S – Sulfolobus tokodaii), machine (a, b) and kit (100 – Ion OneTouch Template Kit, 200M – Ion Xpress Template 200 kit, 200 – Ion OneTouch 200 Template kit). Runs are listed in chronological order. ‘% Wells with ISPs’ is the percentage of wells on the chip that contained a bead. Mean Length AT denotes the mean read length after 3′ adapter trimming.
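The naming scheme above can be decoded mechanically. The caption does not say how the four components are joined. This sketch therefore assumes they are concatenated in the order listed with no separator, as in the hypothetical name `"316Sb200M"`. Real run names in the table may differ or carry extra suffixes.

```python
import re

SPECIES = {"B": "Bacillus amyloliquefaciens", "S": "Sulfolobus tokodaii"}
KITS = {
    "100": "Ion OneTouch Template Kit",
    "200M": "Ion Xpress Template 200 kit",
    "200": "Ion OneTouch 200 Template kit",
}
# Assumed layout: chip, species, machine, kit, with no separators.
# "200M" is listed before "200" so the longer kit code is tried first.
RUN_NAME = re.compile(r"(314|316)([BS])([ab])(200M|200|100)")


def parse_run_name(name):
    """Decode a run name (in the assumed layout) into its four components."""
    match = RUN_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised run name: {name!r}")
    chip, species, machine, kit = match.groups()
    return {"chip": chip, "species": SPECIES[species],
            "machine": machine, "kit": KITS[kit]}


print(parse_run_name("316Sb200M"))
```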

Estimated main and deviance effects for each explanatory variable in the double-generalised linear model.

Author: Gene W. Tyson (220802)
Glenn Stone (85608)
Lauren M. Bragg (401714)
Margaret K. Butler (401715)
Philip Hugenholtz (13833)

Position-in-cycle (PIC) effects are given in Table S1 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003031#pcbi.1003031.s011). The intercept represents the mean effect (or dispersion effect) for an observation with all settings at baseline (the baseline factor levels in this model are B. amyloliquefaciens, the 100 bp OneTouch Kit and Chip 314). Each other coefficient is the difference from the intercept when its factor is changed from the baseline level.
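Under this baseline (treatment) coding, the predicted effect for any combination of settings is the intercept plus the coefficient of every factor that is not at its baseline level. The sketch below applies that rule to one submodel. Its coefficient values are placeholders for illustration, not the fitted estimates. The link functions of the mean and dispersion submodels are not stated in this caption, so the result stays on the linear-predictor scale. The PIC terms are omitted.

```python
def linear_predictor(intercept, coefficients, settings, baseline):
    """Treatment-coded linear predictor for one observation.

    intercept: effect with every factor at its baseline level.
    coefficients: dict mapping (factor, level) -> difference from baseline.
    settings, baseline: dicts mapping factor -> level.
    The result is on the link scale of whichever submodel (mean or
    dispersion) the coefficients come from.
    """
    eta = intercept
    for factor, level in settings.items():
        if level != baseline[factor]:
            eta += coefficients[(factor, level)]
    return eta


baseline = {"species": "B. amyloliquefaciens", "kit": "100 bp OneTouch", "chip": "314"}
# Placeholder numbers for illustration only, not the study's estimates.
coefficients = {
    ("species", "S. tokodaii"): 0.25,
    ("kit", "200 bp OneTouch"): 0.125,
    ("chip", "316"): -0.5,
}
run = {"species": "S. tokodaii", "kit": "100 bp OneTouch", "chip": "316"}
print(linear_predictor(1.0, coefficients, run, baseline))
```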
