
    Experimental Validation of UNCOVER Predictions

    <div><p>(A) RT-PCR validation of newly identified alternative exons with no prior EST evidence. Lane numbers are given in Arabic numerals below the gel; sample numbers of new verifications and negative controls are in Roman numerals above. Lanes 2–5 were verified using flanking primers and therefore show two bands each, the larger one corresponding to the event including the newly identified ACE. Lanes 6–9 used a primer internal to the newly identified exon and therefore only show one band each. Lanes 10–13 are typical examples of ten randomly selected introns in the ENCODE target regions that were not predicted to harbor AS events. Lane 14 shows a blank reaction control without adding template. Lanes 1 and 15 contain size markers spaced at 100 nt intervals, with the strong bands corresponding to 1,000 and 500 nt. Ensembl ID pairs for the known exon upstream of the validated new one and the corresponding gene are as follows: internal exons, lanes 2–6: ENSE00000881911.1:ENSG00000004866.5, ENSE00000862512.1:ENSG00000126217.3, ENSE00001201432.1:ENSG00000168781.5, ENSE00001146476.1:ENSG00000168781.5, and ENSE00001084095.4:ENSG00000164402.2; terminal exons, lanes 7–9: ENSE00001379673.1:ENSG00000159140.5, ENSE00001046164.1:ENSG00000067369.1, and ENSE00000952769.2:ENSG00000142183.3 (a known case as positive control); random negative controls, lanes 10–13: ENSE00001321652.4:ENSG00000161980.2, ENSE00000868377.2:ENSG00000102125.4, ENSE00001239587.1:ENSG00000100220.2, and ENSE00001307891.1:ENSG00000185721.1.</p><p>(B) Example UNCOVER alignment of a newly detected ACE. Aligned nucleotides are connected with a vertical dash in case of identity, a colon in case of a transition, and a dot in case of a transversion. 
The alignment is labeled with the types of the states that lead to the most likely alignment: C, conserved noncoding sequence; F, 5′ splice site; I, nonconserved intronic sequence; T, 3′ splice site; 1, 2, and 3, coding sequence, with the number giving the position in a codon. The detected ACE is flanked by highly conserved noncoding sequence, a characteristic of true ACEs. The sequence shown corresponds to the event in sample i in (A).</p></div>
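
The match-line notation described in (B) can be sketched in a few lines of Python. This is only an illustration of the identity/transition/transversion symbols, not the UNCOVER code; the function name `match_line` and the choice to print a space opposite gap characters are assumptions, since the caption does not say how gaps are drawn.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def match_line(human: str, mouse: str) -> str:
    """Build the symbol row drawn between two aligned sequences.

    '|' marks identity, ':' a transition (purine<->purine or
    pyrimidine<->pyrimidine), '.' a transversion. A space is used
    opposite a gap ('-'); that gap convention is an assumption.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for h, m in zip(human.upper(), mouse.upper()):
        if h == "-" or m == "-":
            symbols.append(" ")
        elif h == m:
            symbols.append("|")
        elif {h, m} <= PURINES or {h, m} <= PYRIMIDINES:
            symbols.append(":")
        else:
            symbols.append(".")
    return "".join(symbols)


print("ACGT-A")
print(match_line("ACGT-A", "ATGATA"))
print("ATGATA")
```

Any non-ACGT character other than a gap (for example `N`) falls through to the transversion symbol in this sketch.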

    Recognition of Unknown Conserved Alternatively Spliced Exons

    <div><p>The split structure of most mammalian protein-coding genes allows for the potential to produce multiple different mRNA and protein isoforms from a single gene locus through the process of alternative splicing (AS). We propose a computational approach called UNCOVER based on a pair hidden Markov model to discover conserved coding exonic sequences subject to AS that have so far gone undetected. Applying UNCOVER to orthologous introns of known human and mouse genes predicts skipped exons or retained introns present in both species, while discriminating them from conserved noncoding sequences. The accuracy of the model is evaluated on a curated set of genes with known conserved AS events. The prediction of skipped exons in the ~1% of the human genome represented by the ENCODE regions leads to more than 50 new exon candidates. Five novel predicted AS exons were validated by RT-PCR and sequencing analysis of 15 introns with strong UNCOVER predictions and lacking EST evidence. These results imply that a considerable number of conserved exonic sequences and associated isoforms are still completely missing from the current annotation of known genes. UNCOVER also identifies a small number of candidates for conserved intron retention.</p></div>

    Structure of the UNCOVER pHMM

    <div><p>The model is used to globally align a pair of orthologous human/mouse introns and detect conserved coding AS events.</p><p>(A) A schematic overview of the model architecture, with circles indicating groups of functionally related states. For accurate splicing, the two ends of an intron must be precisely determined by the splicing machinery. The prominent sites for this process are the 5′ splice site (5′ss) at the junction between the upstream exon and the intron and the 3′ splice site (3′ss) at the junction of the intron and the downstream exon. As reference, pictograms of the mammalian 5′ splice site and 3′ splice site are depicted, in which the letters at individual positions are scaled according to their frequency. We restrict ourselves to U2-type splice sites with perfectly conserved GT–AG dinucleotides. The alignment always starts with the conserved 5′ splice site after the initial GT dinucleotide. The transitions of the model then allow it to pursue several paths, corresponding to different types of AS, indicated by small icons. (1) The “default” is to observe conserved or nonconserved noncoding sequence, possibly alternating between these two. (2) Transitions to an ACE sequence of conserved {3′ splice site, skipped exon, 5′ splice site} are possible at any time, and can also occur more than once. (3) An IRE is modeled by going from a 5′ splice site to a 3′ splice site by only passing through a coding submodel. (4) and (5) An early exit from this codon model through another 5′ splice site leads to an alternative 5′ exon at the beginning of the sequence, or correspondingly to an alternative 3′ exon at the end. The alignment is fixed on the right side by the 3′ splice site at the end of the intron. All splice site states are first-order pair states not allowing for insertions or deletions. 
The 5′ splice site part of the model covers 9 nt (3 nt in the exon, plus the conserved GT and the following 4 nt in the intron); the 3′ splice site is 23 nt long (18 nt and the conserved AG in the intron plus 3 nt in the exon).</p><p>(B and C) A detailed view of the noncoding intronic submodel (B) and a close-up of the coding submodel (C), with closed circles representing pair states and dashed circles representing single states. Thick straight arrows indicate the allowed start and end states of the submodels. The noncoding conservation (B) is modeled by a first-order pair state, allowing insertions and deletions of individual nucleotides. The null model contains single first-order states representing nonconserved human and mouse intronic sequences. The coding states (C) comprise three second-order pair states for nucleotides in the three codon positions, as well as three second-order single states each for human and mouse to capture species-specific codon insertion/deletion events. The transition matrix ensures that only those insertion/deletion events covering complete codons are admissible.</p></div>
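
The splice-site window sizes given in (A) can be checked with a small Python sketch. It only slices the 9 nt and 23 nt windows described in the caption and enforces the GT–AG restriction; the function names and the plain-string sequence representation are hypothetical and are not taken from UNCOVER.

```python
def five_prime_ss_window(upstream_exon: str, intron: str) -> str:
    """Return the 9 nt 5' splice site window:
    3 nt of exon + conserved GT + the following 4 nt of intron."""
    if len(upstream_exon) < 3 or len(intron) < 6:
        raise ValueError("sequence too short for a 5' splice site window")
    if intron[:2].upper() != "GT":
        raise ValueError("only GT-AG (U2-type) introns are handled")
    return upstream_exon[-3:] + intron[:6]


def three_prime_ss_window(intron: str, downstream_exon: str) -> str:
    """Return the 23 nt 3' splice site window:
    18 nt of intron + conserved AG + 3 nt of exon."""
    if len(intron) < 20 or len(downstream_exon) < 3:
        raise ValueError("sequence too short for a 3' splice site window")
    if intron[-2:].upper() != "AG":
        raise ValueError("only GT-AG (U2-type) introns are handled")
    return intron[-20:] + downstream_exon[:3]


# Toy sequences, invented for illustration only.
exon_up = "AAACAG"
intron = "GTAAGT" + "C" * 10 + "T" * 18 + "AG"
exon_down = "GCCATG"

print(five_prime_ss_window(exon_up, intron))     # 9 nt window
print(three_prime_ss_window(intron, exon_down))  # 23 nt window
```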

    Comparison of alternative mRNA isoforms across 25 human tissues

    <p><b>Copyright information:</b></p><p>Taken from "Variation in alternative splicing across human tissues"</p><p>Genome Biology 2004;5(10):R74-R74.</p><p>Published online 13 Sep 2004</p><p>PMCID:PMC545594.</p><p>Copyright © 2004 Yeo et al.; licensee BioMed Central Ltd.</p> Color-coded representation of SJD values between pairs of tissues (see Figure 4 and Materials and methods for definition of SJD). Hierarchical clustering of SJD values using average-linkage clustering. Groups of tissues in clusters with short branch lengths (for example, thyroid/ovary, B-cell/bone) have highly similar patterns of AS. Mean SJD values (versus other 24 tissues) for each tissue.

    Selection against Disruption of Predicted ESEs in Different Exon Regions

    <p>Summary ORs were calculated for mutations that disrupt RESCUE-ESEs as in <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0020268#pbio-0020268-g002" target="_blank">Figure 2</a>, for each of four regions spanning the length of a typical human internal exon. The heights of the blue bars represent the odds that an ESE will be disrupted by a mutation in the set of 2,561 validated SNPs (selected mutations) relative to the odds of disruption in the set of 100,000 simulated (unselected) mutations. Error bars extend one standard deviation on either side of the calculated value.</p>
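
The quantity behind each bar is an odds ratio, which the following Python sketch computes for a single 2×2 table. The set sizes (2,561 SNPs and 100,000 simulated mutations) come from the caption, but the disruption counts are invented purely for illustration, and the caption's "summary" ORs presumably pool several such tables, which this sketch does not attempt.

```python
def odds_ratio(disrupt_selected: int, total_selected: int,
               disrupt_unselected: int, total_unselected: int) -> float:
    """Odds of ESE disruption among selected mutations (SNPs) divided by
    the odds of disruption among unselected (simulated) mutations."""
    intact_selected = total_selected - disrupt_selected
    intact_unselected = total_unselected - disrupt_unselected
    if intact_selected <= 0 or intact_unselected <= 0 or disrupt_unselected <= 0:
        raise ValueError("odds ratio undefined for these counts")
    odds_selected = disrupt_selected / intact_selected
    odds_unselected = disrupt_unselected / intact_unselected
    return odds_selected / odds_unselected


# Hypothetical counts: 400 of 2,561 SNPs and 20,000 of 100,000
# simulated mutations disrupt an ESE. These numbers are NOT from the source.
example_or = odds_ratio(400, 2_561, 20_000, 100_000)
print(f"OR = {example_or:.3f}")
```

An OR below 1 in such a table would mean that ESE-disrupting changes are rarer among observed SNPs than among simulated mutations, which is the direction expected under selection against ESE disruption.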

    Measuring Selective Pressure on Each RESCUE-ESE Hexamer

    <div><p>Any point mutation alters six overlapping hexamers, and so a database of 8,408 SNP mutations alters a total of approximately 50,000 hexamers in the wild-type (ancestral) allele. In considering all 238 RESCUE-ESE hexamers, the frequency of each ESE hexamer in the total set of ancestral alleles was recorded for the database of SNPs and simulated mutations (8,408 SNP mutations and 100,000 simulated mutations). The ESE frequency in the SNP set was divided by the ESE frequency in the simulated set to calculate the RR for each of the 238 hexamers.</p> <p>(A) The distribution of RR for all 238 ESE hexamers is plotted on a logarithmic scale. A resampling strategy was used to identify 57 ESE hexamers that were significantly conserved (pink bars have an RR less than 1; <i>p</i> < 0.05) and also six ESE hexamers that were not conserved (blue bars have an RR greater than 1; <i>p</i> < 0.05).</p> <p>(B) The output of RESCUE-ESE was compared for several vertebrate genomes (human, mouse, pufferfish, and zebrafish). The set of 238 human RESCUE-ESE hexamers was divided into nonoverlapping subsets based on their conservation in the RESCUE-ESE output generated from other vertebrates. The proportion of ESEs that were significantly conserved in the SNP analysis (as described above in [A]) were recorded for each subset of RESCUE-ESE hexamers and are represented as pink sectors in the pie chart.</p></div>
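
The counting argument in the first paragraph (one point mutation touches six overlapping hexamers, so 8,408 SNPs touch roughly 50,000) and the RR ratio can be sketched in Python as follows. The helper names are hypothetical, and using the total number of altered hexamers as the frequency denominator is an assumption; the caption does not spell out the normalization.

```python
def overlapping_hexamers(seq: str, pos: int, k: int = 6) -> list[str]:
    """Return every k-mer of seq that covers 0-based position pos.

    Away from the sequence ends there are exactly k of them.
    """
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first_start, last_start + 1)]


def rr(count_snp: int, total_snp: int, count_sim: int, total_sim: int) -> float:
    """RR for one hexamer: its frequency among hexamers altered by SNPs
    divided by its frequency among hexamers altered by simulated mutations."""
    return (count_snp / total_snp) / (count_sim / total_sim)


ancestral = "ACGTACGTACGTA"          # toy ancestral allele
hexamers = overlapping_hexamers(ancestral, pos=6)
print(hexamers)                       # six hexamers cover an interior position

total_snp_hexamers = 8_408 * 6        # 50,448, i.e. approximately 50,000
print(total_snp_hexamers)
```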

    Density of Predicted ESEs and SNPs along Human Exons

    <p>RESCUE-ESE hexamers were searched against a database of 121,000 internal human exons. ESE density (blue curve) was determined as the fraction of hexamers beginning at the given exon position in this dataset that were contained in the RESCUE-ESE set. SNP density (red curve) was determined analogously using SNPs from dbSNP mapped to the exon database. Both curves were smoothed by averaging the densities over a leading (3′ss) or lagging (5′ss) window of ten nucleotides.</p>
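
A minimal Python sketch of the position-wise density and the window smoothing follows. It measures positions from the 3′ss end of each exon only, and it reads "leading window" as the average over the current position and the positions ahead of it; both choices, like the function names and toy data, are assumptions rather than the authors' procedure.

```python
def ese_density(exons: list[str], ese_set: set[str],
                max_pos: int, k: int = 6) -> list[float]:
    """Fraction of k-mers starting at each exon position (0-based, counted
    from the exon's first base) that belong to ese_set."""
    hits = [0] * max_pos
    totals = [0] * max_pos
    for exon in exons:
        for pos in range(min(max_pos, len(exon) - k + 1)):
            totals[pos] += 1
            if exon[pos:pos + k] in ese_set:
                hits[pos] += 1
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


def smooth_leading(values: list[float], window: int = 10) -> list[float]:
    """Average each value with the values ahead of it (window truncated
    at the end of the list)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[i:i + window]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


toy_exons = ["GAAGAAT", "GAAGAAC"]      # invented sequences
toy_eses = {"GAAGAA"}                   # stand-in for the RESCUE-ESE set
density = ese_density(toy_exons, toy_eses, max_pos=2)
print(density)
print(smooth_leading([1.0, 0.0, 2.0], window=2))
```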

    Intron Conservation in the PRPP Synthetase Gene

    <div><p>(A) Alignment of PRPP synthetase putative orthologs MG07148, NCU06970, FG09299, and AN1965. A black-edged rectangle indicates an intron position passing our quality filters, whereas an unedged gray rectangle indicates an intron position that was removed by our filter. Blue boxes mark raw intron gains, red boxes indicate raw intron losses, and gray boxes within black-edged rectangles highlight all other introns. We manually corrected an annotation error in the first intron of the last row of the alignment.</p> <p>(B) Phylogenetic conservation pattern of introns in the PRPP synthetase gene. Each passing intron position was categorized as being present in A. nidulans (A), F. graminearum (F), M. grisea (M), N. crassa (N), A. nidulans and N. crassa (AN), F. graminearum and M. grisea (FM), or all four organisms (AFMN). There are no passing cases of conservation in three or four species. The number of introns in each category is shown with a purple line. The black error bar plot shows the mean and standard deviation for each category for all 2,008 ortholog sets after fitting to a Poisson distribution (<a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.0020422#s4" target="_blank">see Materials and Methods</a>). The number of introns in M. grisea and N. crassa is significantly higher, at the <i>p</i> < 1 × 10<sup>−9</sup> level.</p></div>

    Genome-Wide Identification of Exons with UAGG and GGGG Silencing Motifs

    <p>A database of 96,089 orthologous human and mouse exon pairs was searched for TAGG located anywhere in the exon and GGGG in bases 3–10 of the intron. Venn diagrams indicate the number of exons containing either or both sequence motifs in the human subset and the mouse subset of the database. The number of exons (19) in which UAGG and GGGG silencer motifs are conserved in orthologous human and mouse exons is also shown (intersection). The motif patterns are shown in the context of the exon (uppercase) and 5′ splice site region (lowercase) for 12 examples from the intersection dataset (human sequences are shown). Colon indicates 5′ splice site. The conserved TAGG and GGGG motifs are highlighted in red to illustrate variations in their positions. Gene name (HUGO ID) and exon number within the gene are indicated at far right. For one uncharacterized transcript, the GenBank accession is given instead (NM_018469_8).</p>
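
The motif search described above can be sketched in Python as below. The sketch assumes that "bases 3–10 of the intron" means the GGGG match must lie entirely within 1-based positions 3 to 10 of the downstream intron, and that RNA-style `U` should be treated as `T`; the function name and example sequences are hypothetical.

```python
def has_silencer_motifs(exon: str, downstream_intron: str) -> tuple[bool, bool]:
    """Return (exon contains TAGG, intron bases 3-10 contain GGGG).

    Intron positions are 1-based from the 5' splice site, so bases 3-10
    correspond to the Python slice [2:10].
    """
    exon = exon.upper().replace("U", "T")
    intron = downstream_intron.upper().replace("U", "T")
    has_tagg = "TAGG" in exon
    has_gggg = "GGGG" in intron[2:10]
    return has_tagg, has_gggg


# Invented example: exon in uppercase, 5' splice site region in lowercase,
# following the display convention described in the caption.
print(has_silencer_motifs("ACTAGGCA", "gtggggatcc"))

# GGGG starting at intron base 8 would run past base 10 and is not counted.
print(has_silencer_motifs("ACGT", "gtaagtagggg"))
```

An exon would count toward the conserved intersection only if both flags are true for the human sequence and for its mouse ortholog.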

    Alignment Filtering Protocol

    <div><p>(A) Schematic of filtering protocol applied to a ten-residue window on each side of every intron position. If either side failed the filter, the position was discarded.</p> <p>(B) Distributions of minimum percent identity and similarity in ten-residue windows around 181 randomly selected intron positions, for three manual classifications. The minima were taken between the left and right windows. The yellow lines indicate the chosen thresholds of at least 50% similarity and 30% identity, and bars are colored yellow if they fall above the thresholds (pass) or orange if they fall below the thresholds (fail). Parentheses indicate the number of introns in each class that pass the cutoff and the total number of introns in that class. The five lowest-percent identity and similarity bars, containing 77 positions, in the “non-homologous” plot are omitted so as to not obscure the rest of the histogram.</p></div>
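
The two-sided window filter can be sketched in Python under some loudly stated simplifications. The thresholds (at least 30% identity and 50% similarity in each ten-residue window) come from the caption; everything else is assumed: the sketch compares only two aligned sequences rather than four, and `SIMILAR_GROUPS` is a hypothetical residue grouping standing in for whatever similarity definition the authors used.

```python
# Hypothetical similarity classes; NOT the scoring scheme from the source.
SIMILAR_GROUPS = ("AVLIM", "FWY", "KRH", "DE", "STNQ")
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def window_scores(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two aligned windows."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and of equal length")
    identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gapped columns count against both scores
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    return 100 * identical / len(a), 100 * similar / len(a)


def intron_position_passes(left: tuple[str, str], right: tuple[str, str],
                           min_identity: float = 30.0,
                           min_similarity: float = 50.0) -> bool:
    """Keep the intron position only if both flanking windows pass."""
    for a, b in (left, right):
        identity, similarity = window_scores(a, b)
        if identity < min_identity or similarity < min_similarity:
            return False
    return True


good = ("MKVLAAGDEF", "MKILSAGNQF")   # toy ten-residue windows
bad = ("MKVLAAGDEF", "WPCGTHRYNQ")
print(window_scores(*good))
print(intron_position_passes(good, good), intron_position_passes(good, bad))
```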