Results of <i>Solanum</i> Sanger sequencing.
<p>We chose 27 predicted SNPs to test with Sanger sequencing. These were chosen from the Bubbleparse ranked list according to four sets of criteria, with all SNPs placed within the top 22,000 of the 68,000 linked SNPs at the expected heterozygosity. The largest group, 10 SNPs, was chosen from the top of the list, which contains bubbles that are close to the expected allele ratio and also of high coverage. A second group was very close to the expected allele ratio, but had poorer coverage. A third group had very good coverage, but was not so close to the expected allele ratio. Finally, a fourth group had high quality scores, but not necessarily as close a ratio or as good a coverage as the previous groups. Overall, we found a high rate of true SNPs, with 23 out of 27 sequences containing a SNP in the position predicted by Bubbleparse, of which 14 displayed the predicted heterozygous alleles. The group chosen from the top of the ranked list showed high accuracy, with 9 out of 10 SNPs confirmed, but high rates were also shown for the other groups, though sample sizes were small.</p>
Efficacy of five different methods for ranking bubbles.
<p>In the top graph, moving down the ranked tables, groups of 100 bubbles were taken and compared with the canonical set to calculate the percentage of ‘true’ bubbles. In the bottom graph, groups of 1000 bubbles were taken, allowing the majority of the bubbles to be included. Ranking by the Bubbleparse heuristic produces a much higher true positive rate than any of the alternative methods over the top 50,000 SNPs. From around the 100,000 mark, the Bubbleparse line exhibits a saw shape, the peaks of which are caused by the individual constituents of the ranking heuristic. Note, in the top graph, the blue trace (total coverage) is obscured by the green trace, as both are almost 0.</p>
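<p>The binning described in this caption can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: the function name <code>binned_true_rate</code>, the bubble identifiers and the toy bin size of 2 are all assumptions made for the example.</p>

```python
def binned_true_rate(ranked_ids, canonical, bin_size):
    """Walk down a ranked list in consecutive groups of bin_size and return,
    for each group, the percentage of bubbles found in the canonical set."""
    rates = []
    for start in range(0, len(ranked_ids), bin_size):
        group = ranked_ids[start:start + bin_size]  # last group may be shorter
        hits = sum(1 for bubble in group if bubble in canonical)
        rates.append(100.0 * hits / len(group))
    return rates


# Hypothetical data: six ranked bubbles, three of which are 'true'.
ranked = ["b1", "b2", "b3", "b4", "b5", "b6"]
canonical = {"b1", "b2", "b4"}
rates = binned_true_rate(ranked, canonical, bin_size=2)
print(rates)  # [100.0, 50.0, 0.0]
```

<p>With real data, a bin size of 100 or 1000 would correspond to the top and bottom graphs respectively.</p>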
Identification of <i>Arabidopsis thaliana</i> SNPs.
<p>Percentages of canonical SNPs found (solid lines) and percentage of Bubbleparse identified SNPs that were found in the canonical set (dotted lines) for Bur-0 and Tsu-1 with search depth set to 0, 1, 2 and 3 at constant read coverage (top) and for Ler-1 at varied read coverage (bottom).</p>
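<p>The two percentages plotted here are both ratios of the same overlap, taken against different totals. A minimal sketch, assuming each SNP can be represented as a (contig, position) pair; the function name and the example coordinates are hypothetical.</p>

```python
def overlap_percentages(predicted, canonical):
    """Return (percentage of canonical SNPs found,
               percentage of predicted SNPs present in the canonical set)."""
    shared = predicted & canonical
    pct_canonical_found = 100.0 * len(shared) / len(canonical)
    pct_predicted_in_canonical = 100.0 * len(shared) / len(predicted)
    return pct_canonical_found, pct_predicted_in_canonical


# Hypothetical SNP sets: 4 predicted, 5 canonical, 2 in common.
predicted = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr2", 90)}
canonical = {("chr1", 100), ("chr2", 40), ("chr3", 7), ("chr4", 9), ("chr5", 12)}
found, confirmed = overlap_percentages(predicted, canonical)
print(found, confirmed)  # 40.0 50.0
```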
Results of <i>Arabidopsis thaliana</i> Sanger sequencing.
<p>From the ranked list of all SNPs predicted by Bubbleparse in contigs of over 200 nt, the top 48, as well as 16 from 25%, 50% and 75% down the list were tested with Sanger sequencing. This confirmed all but 5 as being real SNPs between Col-0 and Ler-1. The remaining five all had sequencing problems, such as the sequence ending before the SNP was reached, so they are not confirmed as false positives.</p>
Types of <i>Solanum</i> SNPs discovered by Bubbleparse.
<p>Graph showing the types of SNPs discovered by Bubbleparse for the cross between <i>Phytophthora infestans</i> resistant <i>Solanum berthaultii</i> and susceptible <i>Solanum stenotomum</i>. Because of the nature of the cross, we expect to find heterozygous resistance-linked SNPs, and Bubbleparse produced a list of 68,084 of these, from which we selected 27 for sequencing.</p>
Effect of depth of search on number of bubbles found by Bubbleparse.
<p>Graphs showing numbers of bubbles found for Bur-0 and Tsu-1 with search depth set to 0, 1, 2 and 3 at constant read coverage (top) and for Ler-1 at varied read coverage (bottom).</p>
Bubbles in the de Bruijn graph.
<p>(A) Representation of a simple 11 nt sequence as a de Bruijn graph (top) and then with a SNP (bottom). Nodes represent <i>k</i>mers (sequences of <i>k</i> nucleotides) and edges join together <i>k</i>mers that overlap by <i>k</i>−1 nucleotides. A SNP causes a bifurcation in the graph and the new path joins up with the original path after <i>k</i> nodes. (B) de Bruijn graph representations of a single heterozygous SNP (top) and a SNP followed by a second SNP within <i>k</i> nt (bottom). (C) Our bubble classification system assigns a type according to the number of colours present on each path through the bubble. Thus a bubble corresponding to a heterozygous SNP from an organism in which the resistant variant contains 2 alleles and the susceptible contains 1 allele would produce a bubble with 2 colours on one path and 1 colour on a second path and would be classified as a type “2,1”. Similarly, a bubble corresponding to a heterozygous SNP from an organism in which both the resistant and susceptible variants contain 2 alleles would have 2 colours on both paths and would be classified as a type “2,2”. Finally, a less common example where 2 alleles are present in one variant and 3 in another would appear as a type “2,2,1”.</p>
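<p>The ideas in panels (A) and (C) can be illustrated with a toy example. This is a sketch under stated assumptions, not Bubbleparse's implementation: the 11 nt sequences, <code>K = 3</code>, the sample names and all function names are made up for illustration, and listing the per-path colour counts in descending order is an assumption based on the type labels quoted in the caption.</p>

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the k-mers of seq in order; consecutive k-mers overlap by k-1 nt."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def colour_kmers(samples, k):
    """Map each k-mer to the set of samples (colours) whose sequences contain it."""
    colours = defaultdict(set)
    for name, seqs in samples.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                colours[kmer].add(name)
    return colours


def bubble_type(paths, colours):
    """Count the colours present along each path (a colour must cover every
    k-mer of the path) and join the counts in descending order."""
    counts = [len(set.intersection(*(colours[kmer] for kmer in path)))
              for path in paths]
    return ",".join(str(c) for c in sorted(counts, reverse=True))


K = 3
REF = "ACGTAGCTTGA"  # hypothetical 11 nt sequence
ALT = "ACGTACCTTGA"  # same sequence with a G>C SNP at the sixth base

# The k-mers unique to each allele form the two paths of the bubble.
ref_path = [km for km in kmers(REF, K) if km not in kmers(ALT, K)]
alt_path = [km for km in kmers(ALT, K) if km not in kmers(REF, K)]
print(len(ref_path), len(alt_path))  # 3 3, i.e. k nodes on each path

# Resistant sample carries both alleles; susceptible carries one or both.
het_vs_hom = colour_kmers({"resistant": [REF, ALT], "susceptible": [REF]}, K)
het_vs_het = colour_kmers({"resistant": [REF, ALT], "susceptible": [REF, ALT]}, K)
print(bubble_type([ref_path, alt_path], het_vs_hom))  # 2,1
print(bubble_type([ref_path, alt_path], het_vs_het))  # 2,2
```

<p>A real tool would find the paths by walking the graph from a branching node rather than by comparing two known sequences, which is only possible here because both alleles are given in advance.</p>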