33 research outputs found

    Additional file 3: of Finding RNA structure in the unstructured RBPome

    No full text
    Figure S3. There is no improvement in binding prediction from amino acid sequence by utilizing RNA structure with random structure probabilities. A) When we add RNA structural features to the sequence k-mer space of AffinityRegression, but assign structure probabilities randomly, we do not predict binding any better than using sequence features alone. B) When we add RNA structural features to the sequence k-mer space of AffinityRegression, but assign structure probabilities randomly, we do not predict the top-bound probes as compared to unbound probes any better than using sequence features alone. (PNG 68 kb)

    Effects of the number of aligned mammalian species on the TFBS detection accuracy

    No full text
    Copyright information: Taken from "Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools", http://genomebiology.com/2007/8/10/R225. Genome Biology 2007;8(10):R225. Published online 24 Oct 2007. PMCID: PMC2246299.
    Each panel shows the performance of a tool in aligning a different number of species. Human and baboon were used for the two-species alignment, mouse was added for the three-species alignment, and all five species except cow were used for the four-species alignment. While all tools perform almost identically when aligning the two closely related species human and baboon, MUSCLE and DIALIGN performed better than the other tools at maintaining or improving performance as more species were added to the alignment.

    Additional file 2: of Finding RNA structure in the unstructured RBPome

    No full text
    Figure S2. A) RNA structural binding preferences do not improve in vitro binding prediction when random structure probabilities are assigned. Correlation results over 488 paired experiments reveal that RNA structure does not improve binding prediction when structure probabilities are assigned randomly. B) RNA structural binding preferences do not improve in vivo binding prediction when random structure probabilities are assigned. AUC results of 96 paired eCLIP and RNAcompete experiments over 21 joint proteins demonstrate that RNA structural binding preferences learned from in vitro data do not correlate well with protein-RNA interactions measured in vivo when structure probabilities are assigned randomly. (PNG 85 kb)

    Detection accuracy of individual TFBSs on five-way mammalian alignments

    No full text
    Copyright information: Taken from "Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools", http://genomebiology.com/2007/8/10/R225. Genome Biology 2007;8(10):R225. Published online 24 Oct 2007. PMCID: PMC2246299.
    All five tools perform better at detecting YY1E2F and PAX6, which have low RTRs and short restricted distances for translocation, than IRF2 and ROAZ, which have high RTRs and long restricted distances for translocation. MUSCLE shows better overall performance than the other four tools. MLAGAN performs better than DIALIGN on YY1E2F, PAX6, PPARG and ROAZ, while DIALIGN performs better than MLAGAN on TP53 and PPARG, which have a long restricted distance for translocation but a relatively low RTR.

    Experimental Validation of UNCOVER Predictions

    No full text
    (A) RT-PCR validation of newly identified alternative exons with no prior EST evidence. Lane numbers are given in Arabic numerals below the gel; sample numbers of new verifications and negative controls are in Roman numerals above. Lanes 2–5 were verified using flanking primers and therefore show two bands each, the larger one corresponding to the event including the newly identified ACE. Lanes 6–9 used a primer internal to the newly identified exon and therefore only show one band each. Lanes 10–13 are typical examples of ten randomly selected introns in the ENCODE target regions that were not predicted to harbor AS events. Lane 14 shows a blank reaction control without adding template. Lanes 1 and 15 contain size markers spaced at 100 nt intervals, with the strong bands corresponding to 1,000 and 500 nt. Ensembl ID pairs for the known exon upstream of the validated new one and the corresponding gene are as follows: internal exons, lanes 2–6: ENSE00000881911.1:ENSG00000004866.5, ENSE00000862512.1:ENSG00000126217.3, ENSE00001201432.1:ENSG00000168781.5, ENSE00001146476.1:ENSG00000168781.5, and ENSE00001084095.4:ENSG00000164402.2; terminal exons, lanes 7–9: ENSE00001379673.1:ENSG00000159140.5, ENSE00001046164.1:ENSG00000067369.1, and ENSE00000952769.2:ENSG00000142183.3 (a known case as positive control); random negative controls, lanes 10–13: ENSE00001321652.4:ENSG00000161980.2, ENSE00000868377.2:ENSG00000102125.4, ENSE00001239587.1:ENSG00000100220.2, and ENSE00001307891.1:ENSG00000185721.1.
    (B) Example UNCOVER alignment of a newly detected ACE. Aligned nucleotides are connected with a vertical dash in case of identity, a colon in case of a transition, and a dot in case of a transversion. The alignment is labeled with the types of the states that lead to the most likely alignment: C, conserved noncoding sequence; F, 5′ splice site; I, nonconserved intronic sequence; T, 3′ splice site; 1, 2, and 3, coding sequence, with the number giving the position in a codon. The detected ACE is flanked by highly conserved noncoding sequence, a characteristic of true ACEs. The sequence shown corresponds to the event in sample i in (A).
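The connector convention described in (B) (vertical dash for identity, colon for transition, dot for transversion) can be illustrated with a minimal sketch. The function names below are hypothetical and this is not UNCOVER's own rendering code; rendering gap columns as a space and treating any non-ACGT mismatch as a dot are assumptions made only for this example.

```python
# Illustrative sketch only: helper names are hypothetical, not part of UNCOVER.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def match_symbol(a: str, b: str) -> str:
    """Return the connector for one aligned column."""
    a, b = a.upper(), b.upper()
    if "-" in (a, b):
        return " "  # assumption: gap columns get no connector
    if a == b:
        return "|"  # identity
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return ":"  # transition (purine<->purine or pyrimidine<->pyrimidine)
    return "."  # transversion (assumption: also used for ambiguous bases)


def match_line(human: str, mouse: str) -> str:
    """Build the connector line for two equal-length aligned sequences."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    return "".join(match_symbol(a, b) for a, b in zip(human, mouse))


if __name__ == "__main__":
    human, mouse = "ACGT-A", "ATGAGA"
    print(human)
    print(match_line(human, mouse))
    print(mouse)
```

Printing the three lines stacked gives the usual pairwise-alignment view, with the connector line sitting between the two sequences.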

    The average TFBS sensitivity of five tools in aligning TFBS in five mammalian species

    No full text
    Copyright information: Taken from "Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools", http://genomebiology.com/2007/8/10/R225. Genome Biology 2007;8(10):R225. Published online 24 Oct 2007. PMCID: PMC2246299.
    The average TFBS sensitivity of all functional TFBSs. The average TFBS sensitivity with the subset of non-turnover sites among all TFBSs. The relative order of TFBS sensitivity for the five tools is almost the same as the order of their TFBS detection accuracy (Figure 10d).

    Recognition of Unknown Conserved Alternatively Spliced Exons

    No full text
    The split structure of most mammalian protein-coding genes allows for the potential to produce multiple different mRNA and protein isoforms from a single gene locus through the process of alternative splicing (AS). We propose a computational approach called UNCOVER based on a pair hidden Markov model to discover conserved coding exonic sequences subject to AS that have so far gone undetected. Applying UNCOVER to orthologous introns of known human and mouse genes predicts skipped exons or retained introns present in both species, while discriminating them from conserved noncoding sequences. The accuracy of the model is evaluated on a curated set of genes with known conserved AS events. The prediction of skipped exons in the ~1% of the human genome represented by the ENCODE regions leads to more than 50 new exon candidates. Five novel predicted AS exons were validated by RT-PCR and sequencing analysis of 15 introns with strong UNCOVER predictions and lacking EST evidence. These results imply that a considerable number of conserved exonic sequences and associated isoforms are still completely missing from the current annotation of known genes. UNCOVER also identifies a small number of candidates for conserved intron retention.

    Structure of the UNCOVER pHMM

    No full text
    The model is used to globally align a pair of orthologous human/mouse introns and detect conserved coding AS events.
    (A) A schematic overview of the model architecture, with circles indicating groups of functionally related states. For accurate splicing, the two ends of an intron must be precisely determined by the splicing machinery. The prominent sites for this process are the 5′ splice site (5′ss) at the junction between the upstream exon and the intron and the 3′ splice site (3′ss) at the junction of the intron and the downstream exon. As reference, pictograms of the mammalian 5′ splice site and 3′ splice site are depicted, in which the letters at individual positions are scaled according to their frequency. We restrict ourselves to U2-type splice sites with perfectly conserved GT–AG dinucleotides. The alignment always starts with the conserved 5′ splice site after the initial GT dinucleotide. The transitions of the model then allow it to pursue several paths, corresponding to different types of AS, indicated by small icons. (1) The “default” is to observe conserved or nonconserved noncoding sequence, possibly alternating between these two. (2) Transitions to an ACE sequence of conserved {3′ splice site, skipped exon, 5′ splice site} are possible at any time, and can also occur more than once. (3) An IRE is modeled by going from a 5′ splice site to a 3′ splice site by only passing through a coding submodel. (4) and (5) An early exit from this codon model through another 5′ splice site leads to an alternative 5′ exon at the beginning of the sequence, or correspondingly to an alternative 3′ exon at the end. The alignment is fixed on the right side by the 3′ splice site at the end of the intron. All splice site states are first-order pair states not allowing for insertions or deletions. The 5′ splice site part of the model covers 9 nt (3 nt in the exon, plus the conserved GT and the following 4 nt in the intron); the 3′ splice site is 23 nt long (18 nt and the conserved AG in the intron plus 3 nt in the exon).
    (B and C) A detailed view of the noncoding intronic submodel (B) and a close-up of the coding submodel (C), with closed circles representing pair states and dashed circles representing single states. Thick straight arrows indicate the allowed start and end states of the submodels. The noncoding conservation (B) is modeled by a first-order pair state, allowing insertions and deletions of individual nucleotides. The null model contains single first-order states representing nonconserved human and mouse intronic sequences. The coding states (C) comprise three second-order pair states for nucleotides in the three codon positions, as well as three second-order single states each for human and mouse to capture species-specific codon insertion/deletion events. The transition matrix ensures that only those insertion/deletion events covering complete codons are admissible.
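The splice-site window sizes given in (A) can be checked with a short sketch: 3 + 2 + 4 = 9 nt for the 5′ splice site and 18 + 2 + 3 = 23 nt for the 3′ splice site. The function name, argument layout, and toy sequences are hypothetical and serve only to make the coordinates concrete; they are not taken from the UNCOVER implementation.

```python
# Illustrative sketch only: names and toy sequences are hypothetical.
def splice_site_windows(upstream_exon: str, intron: str, downstream_exon: str):
    """Return the (5' splice site, 3' splice site) windows around one intron.

    5'ss: last 3 nt of the upstream exon + GT + next 4 intronic nt  -> 9 nt
    3'ss: 18 intronic nt + AG + first 3 nt of the downstream exon   -> 23 nt
    """
    intron = intron.upper()
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("only GT-AG introns are handled in this sketch")
    if len(upstream_exon) < 3 or len(downstream_exon) < 3 or len(intron) < 26:
        raise ValueError("sequences too short for non-overlapping windows")
    five_ss = upstream_exon[-3:].upper() + intron[:6]
    three_ss = intron[-20:] + downstream_exon[:3].upper()
    return five_ss, three_ss


if __name__ == "__main__":
    toy_intron = "GTAAGT" + "C" * 10 + "T" * 18 + "AG"
    five_ss, three_ss = splice_site_windows("AAACAG", toy_intron, "GTCAAA")
    print(five_ss, len(five_ss))
    print(three_ss, len(three_ss))
```

The 26 nt minimum intron length is a choice made for this sketch so that the two windows never overlap; the source does not state such a limit.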

    Dimorphic expression of multiple male- and female-enriched genes in B6 is delayed compared to 129S1 mice.

    No full text
    (A) Expression of male- (top panel) and female-enriched (bottom panel) genes. Dimorphic expression for these genes is delayed by ∼5 hours in B6 compared to 129S1. (B) Heatmap showing dimorphic expression at E11.6 in 129S1 and comparison of same genes in B6. While a few genes show earlier dimorphic expression in B6 mice compared to 129S1, the dominant pattern shows a ∼5 hr delay between B6 and 129S1 mice. (C) Matrix showing the time of onset of dimorphism in 129S1 and B6 mice for male-enriched (top panel) and female-enriched (bottom panel) genes. For example, out of the 32 male-enriched probes that became dimorphically expressed at E11.4 in 129S1, 9 probes were also dimorphically expressed starting at E11.4 in B6 while 18 showed dimorphic expression starting at E11.6 in B6 mice. However, 4 genes are dimorphic in B6 XY gonads at E11.4, but not in 129S1 until E11.6. * indicates significant overlap with p<0.001 evaluated by a hypergeometric test. The highlighted diagonals show the number of genes showing similar onset of dimorphism in 129S1 and B6 mice. Note that some genes that are male- or female-enriched in one strain do not show dimorphism in the other strain.
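The overlap significance mentioned in (C) is evaluated by a hypergeometric test. A minimal sketch of how such an upper-tail p-value can be computed follows. The function name is hypothetical and the numbers in the usage lines are toy values, not the study's data: the caption does not report the background probe count or the full set sizes, so they cannot be filled in here.

```python
# Illustrative sketch only: function name and example numbers are hypothetical.
from math import comb


def hypergeom_overlap_pvalue(population: int, set_a: int, set_b: int, overlap: int) -> float:
    """P(X >= overlap) when set_b items are drawn without replacement from
    `population` items, of which set_a are "marked" (e.g. dimorphic in strain A)."""
    if not (0 <= set_a <= population and 0 <= set_b <= population):
        raise ValueError("set sizes must lie between 0 and the population size")
    max_overlap = min(set_a, set_b)
    if not (0 <= overlap <= max_overlap):
        raise ValueError("overlap cannot exceed the smaller set")
    total = comb(population, set_b)
    # math.comb returns 0 when k > n, so impossible terms drop out of the sum.
    tail = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy values: 10 probes in total, 5 in each set, all 5 shared.
    print(hypergeom_overlap_pvalue(10, 5, 5, 5))
```

With real data, `population` would be the number of probes eligible to be called dimorphic in both strains; choosing that background is the main judgment call in this kind of test.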