
    Datasets.

    <p>*The 48-bp reads in the NCBI SRA set have a 2 bp initial barcode that was trimmed, resulting in 46 bp reads.</p><p>Datasets used for benchmark tests. For <i>H. sapiens</i> and <i>P. falciparum</i>, two times are given for TopHat. For <i>H. sapiens</i>, the longer time is with more sensitive settings, but the shorter time resulted in less than 5% fewer junctions at a similar specificity. For <i>P. falciparum</i>, the longer time is with more sensitive but less stringent settings whereas the shorter time is for the more stringent settings that resulted in significantly fewer junctions but with a much higher specificity.</p>

    HMM Parameter Values.

    <p>The initial and trained values for the HMM. The first two columns (“1→2” and “2→1”) show the probability of transitioning from State 1 to State 2 and the reverse. The probability of transitioning from State 2 to State 1 is fixed at 0 (indicating a 100% probability of remaining in State 2). For each state, the probability of a match at each quality bin is reported. The initial values were used to validate the HMM. HMMSplicer uses Initial Value Set 2, though the initial values do not impact the final trained values (see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013875#pone-0013875-g002" target="_blank">Figure 2b</a>). The trained values are shown for each dataset analyzed. The Human values are the same as those shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013875#pone-0013875-g001" target="_blank">Figure 1</a>, though in more detail.</p>
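
    A minimal sketch of how a two-state model of this shape could locate the point where a read stops matching the genome. It is not HMMSplicer's code. It assumes the read starts in State 1, that State 2 is never left (as the caption states), and that each base emits only "match" or "mismatch" with a probability that depends on its quality bin. The function name, argument names, and all numeric values are hypothetical.

```python
import math


def log_emission(is_match, p_match):
    """Log-probability of seeing a match (or mismatch) at one base."""
    return math.log(p_match if is_match else 1.0 - p_match)


def best_switch_point(matches, quality_bins, p_match_s1, p_match_s2, p_1_to_2):
    """Return k, the number of leading bases best explained by State 1.

    matches      -- list of bools, True where the read base matches the genome
    quality_bins -- list of ints, the quality bin index of each base
    p_match_s1   -- match probability per quality bin in State 1 (each in (0, 1))
    p_match_s2   -- match probability per quality bin in State 2 (each in (0, 1))
    p_1_to_2     -- probability of moving from State 1 to State 2 (in (0, 1))

    k == len(matches) means the most likely path never leaves State 1.
    """
    n = len(matches)
    if n == 0 or len(quality_bins) != n:
        raise ValueError("matches and quality_bins must be non-empty and equal length")

    # Cumulative log-emissions under each state, so any split is O(1) to score.
    cum1, cum2 = [0.0], [0.0]
    for is_match, q in zip(matches, quality_bins):
        cum1.append(cum1[-1] + log_emission(is_match, p_match_s1[q]))
        cum2.append(cum2[-1] + log_emission(is_match, p_match_s2[q]))

    log_stay = math.log(1.0 - p_1_to_2)
    log_switch = math.log(p_1_to_2)

    best_k, best_lp = n, -math.inf
    for k in range(1, n + 1):
        # k bases in State 1: (k - 1) self-transitions, then one switch if k < n.
        # Staying in State 2 has probability 1, so it adds nothing in log space.
        lp = cum1[k] + (k - 1) * log_stay + (cum2[n] - cum2[k])
        if k < n:
            lp += log_switch
        if lp > best_lp:
            best_k, best_lp = k, lp
    return best_k


if __name__ == "__main__":
    # Hypothetical read: ten matching bases followed by mostly mismatches.
    matches = [True] * 10 + [False, True, False, False, True, False]
    bins = [0] * len(matches)
    print(best_switch_point(matches, bins, [0.95], [0.25], 0.05))
```

    Because State 2 cannot return to State 1, every valid path is fixed by a single switch position, so scoring each position directly gives the same answer as a full Viterbi pass.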

    Experimental confirmation of predicted <i>Plasmodium falciparum</i> splice junctions.

    <p>Schematics of the predicted splice junctions and sequenced RT-PCR products for <b>a</b>) PFC0285c, <b>b</b>) PF07_0101, and <b>c</b>) PFD0185c. For PFC0285c, the verified junction likely splices an additional exon in the 5′UTR to the coding region of the gene. The confirmed junction in PF07_0101 splices out 291 nt (97 aa) from the first exon, which could represent an alternative protein-coding isoform, or an error in the gene model. The demonstrated junctions in PFD0185c excise 85 bp near the 3′ end of the gene, causing a frameshift, and appear to splice two exons within the 3′UTR of the gene together. Again, the junction within the gene model may represent an alternative splicing event or an error in the gene model. ESTs near all three areas are included to provide the direction of the genes.</p>

    XBP1 non-canonical intron.

    <p>HMMSplicer discovers the non-canonical <i>XBP1</i> intron. HMMSplicer identifies three reads containing the non-canonical CA-AG splice site in <i>XBP1</i>. Because the reads are fairly evenly split, both read-halves aligned to the genome. The edges identified by HMMSplicer are 2 and 4 bp off from the actual splice site because the sequence at the beginning of the intron repeats the sequence at the beginning of the subsequent exon. When identical junctions are collapsed, there are two junctions, one with a score of 1024 and one with a score of 1030, which puts them in the top 0.5% of the collapsed non-canonical junctions.</p>

    Alternative 5′ and 3′ splice sites.

    <p>HMMSplicer results within 15 bp of RefSeq introns were analyzed to measure the number of bases added or removed from the spliced transcript. There were 997 instances where the intron had an alternate 5′ splice site (5′SS, shown in grey) and 2,577 instances of an alternate 3′ splice site (3′SS, shown in black). The most common alternative splice was 3 bases removed or added to the exon at the 3′SS. TopHat results showed a similar pattern, though only 875 alternates (262 5′SS alternates and 613 3′SS alternates) were found, fewer than a quarter of the HMMSplicer results. WebLogos were constructed from the sequences at the 1,099 alternate 3′SS with three bases removed from the transcript and the 460 alternate 3′SS with three bases added to the transcript. For these, the green dashed line shows the alternate splice site while the red dashed line shows the canonical splice site. In both cases, a repetition of the YAG splice motif is evident.</p>

    Algorithm parameters.

    <p><b>a</b>) Percent of oligos able to map within a genome as a function of oligo size. The solid lines show the percentages if oligos are able to map up to 50 times within the genome (the value used in HMMSplicer seeding). The dashed lines show the percentages if a unique match is required. <b>b</b>) HMM training. The values for the two most variable parameters of the HMM are shown here, with the x-axis representing different training set sizes and initial HMM parameters. The error bars show the standard deviation of ten repetitions of training. HMMSplicer uses a training subset size of 10,000. <b>c</b>) Effect of size, in bases, for the second piece of the read. The percent of second pieces uniquely mapping within 80 kbp of the first piece increases as the size of the second piece increases, while the percent of second pieces mapping to multiple locations decreases.</p>

    Simulation results.

    <p>(<b>a</b>) Results for HMMSplicer and TopHat for 50 and 75 bp reads. Although values are similar at higher coverage levels, HMMSplicer exhibits substantial increases in sensitivity at lower coverage levels. (<b>b</b>) ROC curve for the 50 bp simulation results at 1×, 10×, and 50× coverage demonstrates that HMMSplicer's scoring algorithm accurately discriminates between true and false junctions. The number in parentheses is the area under the curve for each coverage level.</p>
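
    As an illustration of what the area under a ROC curve measures for scored junctions, the sketch below computes it as the fraction of (true, false) junction pairs in which the true junction has the higher score, counting ties as half. This is the standard rank-based definition of AUC, not code from the paper, and the function name and example scores are made up.

```python
def roc_auc(scores, is_true):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    scores  -- list of numeric junction scores (higher = more confident)
    is_true -- list of bools, True where the junction is a real one
    """
    pos = [s for s, t in zip(scores, is_true) if t]
    neg = [s for s, t in zip(scores, is_true) if not t]
    if not pos or not neg:
        raise ValueError("need at least one true and one false junction")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores: three true junctions, two false ones.
    example_scores = [950, 900, 800, 700, 600]
    example_truth = [True, True, False, True, False]
    print(roc_auc(example_scores, example_truth))
```

    An AUC of 1.0 means every true junction outscores every false one, and 0.5 means the score carries no information. The quadratic loop is fine for small sets; a sort-based rank sum gives the same value for large ones.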

    Simulation Results.

    <p>HMMSplicer and TopHat were run on read sets from 40 to 75 bp long at coverage levels from 1× to 50× on 503 non-overlapping gene transcripts from Human Chr20.</p>

    Human results compared by transcript abundance.

    <p>Transcript abundance was measured as Reads Per Kilobase per Million reads mapped (RPKM) and the genes were binned by RPKM to show the number of RefSeq junctions found at different levels of transcript abundance. For genes with an RPKM less than 10, HMMSplicer found 76.2% more junctions, whereas for genes with an RPKM above 50, HMMSplicer found only 6.7% more junctions. While a smaller number of highly expressed genes dominate the mRNA population, 74.8% of genes have RPKM values less than 10.</p>
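
    For reference, RPKM can be computed as sketched below, using the usual definition: reads mapped to the gene, divided by the gene length in kilobases and by the total mapped reads in millions. The cut-offs of 10 and 50 come from the caption; the function names, the bin labels, and the treatment of values between the cut-offs as a single middle bin are assumptions made for illustration.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_bp and total_mapped_reads must be positive")
    # reads / (length_bp / 1e3) / (total_mapped_reads / 1e6)
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def abundance_bin(value, low=10.0, high=50.0):
    """Assign an RPKM value to a coarse abundance bin (labels are hypothetical)."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "mid"


if __name__ == "__main__":
    # Hypothetical gene: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
    value = rpkm(500, 2000, 10_000_000)
    print(value, abundance_bin(value))
```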

    HMMSplicer pipeline.

    <p>After removing reads that have full-length alignments to the genome, reads are divided in half and aligned to the genome (step 1 as defined in the <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013875#s4" target="_blank">Materials and Methods</a>). The HMM is trained using a subset of the read-half alignments (step 2a). The HMM bins quality scores into five levels. Although only three levels are shown in this overview for simplification, the values for all five levels can be found in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0013875#pone-0013875-t001" target="_blank">Table 1</a>. The trained HMM is then used to determine the splice position within each read-half alignment (step 2b). The remaining second piece of the read is then matched downstream to find the other intron edge (step 3). The initial set of splice junctions then proceeds to rescue (step 4) and filter and collapse (step 5) to generate the final set of splice junctions.</p>