28 research outputs found
Strategies for Identifying RNA Splicing Regulatory Motifs and Predicting Alternative Splicing Events
Hollywood: a comparative relational database of alternative splicing
RNA splicing is an essential step in gene expression, and is often variable, giving rise to multiple alternatively spliced mRNA and protein isoforms from a single gene locus. The design of effective databases to support experimental and computational investigations of alternative splicing (AS) is a significant challenge. In an effort to integrate accurate exon and splice site annotation with current knowledge about splicing regulatory elements and predicted AS events, and to link information about the splicing of orthologous genes in different species, we have developed the Hollywood system. This database was built upon genomic annotation of splicing patterns of known genes derived from spliced alignment of complementary DNAs (cDNAs) and expressed sequence tags, and links features such as splice site sequence and strength, exonic splicing enhancers and silencers, conserved and non-conserved patterns of splicing, and cDNA library information for inferred alternative exons. Hollywood was implemented as a relational database and currently contains comprehensive information for human and mouse. It is accompanied by a web query tool that allows searches for sets of exons with specific splicing characteristics or splicing regulatory element composition, or gives a graphical or sequence-level summary of splicing patterns for a specific gene. A streamlined graphical representation of gene splicing patterns is provided, and these patterns can alternatively be layered onto existing information in the UCSC Genome Browser. The database is accessible at
Comparative analysis of sequence features involved in the recognition of tandem splice sites
Background: The splicing of pre-mRNAs is often variable and produces multiple alternatively spliced (AS) isoforms that encode different messages from one gene locus. Computational studies uncovered a class of highly similar isoforms related to tandem 5'-splice sites (5'ss) and 3'-splice sites (3'ss), for which experimental studies offer only sparse, anecdotal evidence. To compare the types and levels of alternative tandem splice site exons occurring in different human organ systems and cell types, and to study known sequence features involved in the recognition and distinction of neighboring splice sites, we performed large-scale, stringent alignments of cDNA sequences and ESTs to the human and mouse genomes, followed by experimental validation. Results: We analyzed alternative 5'ss exons (A5Es) and alternative 3'ss exons (A3Es), derived from transcript sequences that were aligned to assembled genome sequences to infer patterns of AS occurring in several thousand genes. Comparing the levels of overlapping (tandem) and non-overlapping (competitive) A5Es and A3Es, isoforms showed a clear preference for tandem acceptors and donors, with exon extensions of four nucleotides and of three to six nucleotides, respectively. A subset of inferred A5E tandem exons was selected and experimentally validated. Focusing on A5Es, we investigated their transcript coverage, sequence conservation and base-pairing to U1 snRNA, proximal and distal splice site classification, and candidate motifs for cis-regulatory activity, and compared A5Es with A3Es, constitutive exons and pseudo-exons in H. sapiens and M. musculus. The results reveal a small but authentic, enriched set of tandem splice sites with preferred distances between proximal and distal 5'ss (3'ss); they show a marked dichotomy between the levels of in- and out-of-frame splicing for A5Es and A3Es, respectively, identify a number of candidate nonsense-mediated decay (NMD) targets, and allow a rough estimate of the number of undetected tandem donors based on splice site information. Conclusion: This comparative study distinguishes tandem 5'ss and 3'ss, with extensions three to six nucleotides long, as having unusually high proportions of AS, experimentally validates tandem donors in a panel of different human tissues, highlights the dichotomy in the types of AS occurring at tandem splice sites, and shows that human alternative exons spliced at overlapping 5'ss possess features of typical splice variants that could well be beneficial for the cell.
Single Nucleotide Polymorphism–Based Validation of Exonic Splicing Enhancers
Because deleterious alleles arising from mutation are filtered by natural selection, mutations that create such alleles will be underrepresented in the set of common genetic variation existing in a population at any given time. Here, we describe an approach based on this idea called VERIFY (variant elimination reinforces functionality), which can be used to assess the extent of natural selection acting on an oligonucleotide motif or set of motifs predicted to have biological activity. As an application of this approach, we analyzed a set of 238 hexanucleotides previously predicted to have exonic splicing enhancer (ESE) activity in human exons using the relative enhancer and silencer classification by unanimous enrichment (RESCUE)-ESE method. Aligning the single nucleotide polymorphisms (SNPs) from the public human SNP database to the chimpanzee genome allowed inference of the direction of the mutations that created present-day SNPs. Analyzing the set of SNPs that overlap RESCUE-ESE hexamers, we conclude that nearly one-fifth of the mutations that disrupt predicted ESEs have been eliminated by natural selection (odds ratio = 0.82 ± 0.05). This selection is strongest for the predicted ESEs that are located near splice sites. Our results demonstrate a novel approach for quantifying the extent of natural selection acting on candidate functional motifs and also suggest certain features of mutations/SNPs, such as proximity to the splice site and disruption or alteration of predicted ESEs, that should be useful in identifying variants that might cause a biological phenotype
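To make the reported odds ratio easier to read, here is a minimal sketch of how an odds ratio and an approximate standard error can be computed from a 2x2 table. It is a generic textbook calculation, not the VERIFY procedure itself: the table layout, the function name `odds_ratio`, and all counts are hypothetical and were chosen only so that the result lands on 0.82.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Returns (odds ratio, approximate standard error on the odds-ratio scale).
    The standard error uses the usual log-scale formula sqrt(1/a + 1/b + 1/c + 1/d),
    carried back to the odds-ratio scale with the delta method.
    """
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio, ratio * se_log


# Hypothetical counts (NOT from the paper):
# row 1: mutations that disrupt a predicted ESE, seen as SNPs vs. a reference class
# row 2: control mutations, seen as SNPs vs. the same reference class
ratio, se = odds_ratio(820, 1000, 1000, 1000)
depletion = 1 - ratio  # fraction of disrupting mutations apparently missing

print(f"odds ratio = {ratio:.2f}, approx. SE = {se:.2f}, depletion = {depletion:.0%}")
```

Under these assumed counts, an odds ratio of 0.82 corresponds to a depletion of about 18%, which is how a value of that size can be read as "nearly one-fifth".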
An Unusual 500,000 Bases Long Oscillation of Guanine and Cytosine Content in Human Chromosome 21
An oscillation with a period of around 500 kb in guanine and cytosine content
(GC%) is observed in the DNA sequence of human chromosome 21. This oscillation
is localized in the rightmost one-eighth region of the chromosome, from 43.5 Mb
to 46.5 Mb. Five cycles of oscillation are observed in this region with six
GC-rich peaks and five GC-poor valleys. The GC-poor valleys comprise regions
with low density of CpG islands and, alternating between the two DNA strands,
low gene density regions. Consequently, the long-range oscillation of GC%
results in spacing patterns of both CpG island density and, to a lesser extent,
gene density. Comment: 15 pages (figures included), 5 figures
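A GC% profile of the kind described above is typically obtained by sliding a window along the sequence. The sketch below shows one simple way to do this. The function name `gc_percent_windows`, the window and step sizes, and the toy sequence are all assumptions for illustration; the abstract does not state which window size the authors used.

```python
def gc_percent_windows(sequence, window, step):
    """Return GC% for each window of length `window`, advancing by `step` bases.

    Simplifying assumption: every character in a window counts toward the
    denominator, so ambiguous bases such as N lower the reported GC%.
    """
    seq = sequence.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_count = chunk.count("G") + chunk.count("C")
        profile.append(100.0 * gc_count / window)
    return profile


# Toy example: a GC-rich stretch followed by an AT-rich stretch.
print(gc_percent_windows("GGCCAATT", window=4, step=4))
```

For a chromosome-scale sequence, the window would need to be much shorter than the roughly 500 kb period for the peaks and valleys to be resolved.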
Spectral Analysis of Guanine and Cytosine Fluctuations of Mouse Genomic DNA
We study global fluctuations of the guanine and cytosine base content (GC%)
in mouse genomic DNA using spectral analyses. Power spectra S(f) of GC%
fluctuations in all nineteen autosomal and two sex chromosomes are observed to
have the universal functional form S(f) \sim 1/f^alpha (alpha \approx 1) over
several orders of magnitude in the frequency range 10^-7 < f < 10^-5 cycle/base,
corresponding to long-ranging GC% correlations at distances between 100 kb and
10 Mb. S(f) for higher frequencies (f > 10^-5 cycle/base) shows a flattened
power-law function with alpha < 1 across all twenty-one chromosomes. The
substitution of about 38% interspersed repeats does not affect the functional
form of S(f), indicating that these are not predominantly responsible for the
long-ranged multi-scale GC% fluctuations in mammalian genomes. Several
biological implications of the large-scale GC% fluctuation are discussed,
including neutral evolutionary history by DNA duplication, chromosomal bands,
spatial distribution of transcription units (genes), replication timing, and
recombination hot spots. Comment: 15 pages (figures included), 2 figures
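The spectral analysis summarized above can be illustrated with a small sketch: compute the power spectrum of a GC% series, then estimate alpha as the negative slope of log S(f) against log f. This is a generic, standard-library implementation under stated assumptions, not the authors' pipeline. The function names `power_spectrum` and `fit_alpha` are hypothetical, and the naive O(n^2) transform is only practical for short series.

```python
import cmath
import math


def power_spectrum(values):
    """Naive discrete Fourier power spectrum of a mean-subtracted series.

    Returns (frequencies, power) for k = 1 .. n // 2, with frequencies
    in cycles per sample.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def fit_alpha(freqs, power):
    """Least-squares fit of log S = const - alpha * log f; returns alpha."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(p) for p in power]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


# Synthetic check: a spectrum that is exactly 1/f should give alpha close to 1.
toy_freqs = [k / 64 for k in range(1, 33)]
toy_power = [1 / f for f in toy_freqs]
print(round(fit_alpha(toy_freqs, toy_power), 6))
```

In practice the fit would be restricted to the frequency range of interest (for example, the lower range quoted in the abstract), since the exponent is reported to differ at higher frequencies.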
Variation in alternative splicing across human tissues
Background: Alternative pre-mRNA splicing (AS) is widely used by higher eukaryotes to generate different protein isoforms in specific cell or tissue types. To compare AS events across human tissues, we analyzed the splicing patterns of genomically aligned expressed sequence tags (ESTs) derived from libraries of cDNAs from different tissues. Results: Controlling for differences in EST coverage among tissues, we found that the brain and testis had the highest levels of exon skipping. The most pronounced differences between tissues were seen for the frequencies of alternative 3' splice site and alternative 5' splice site usage, which were about 50 to 100% higher in the liver than in any other human tissue studied. Quantifying differences in splice junction usage, the brain, pancreas, liver and the peripheral nervous system had the most distinctive patterns of AS. Analysis of available microarray expression data showed that the liver had the most divergent pattern of expression of serine-arginine protein and heterogeneous ribonucleoprotein genes compared to the other human tissues studied, possibly contributing to the unusually high frequency of alternative splice site usage seen in liver. Sequence motifs enriched in alternative exons in genes expressed in the brain, testis and liver suggest specific splicing factors that may be important in AS regulation in these tissues. Conclusions: This study distinguishes the human brain, testis and liver as having unusually high levels of AS, highlights differences in the types of AS occurring commonly in different tissues, and identifies candidate cis-regulatory elements and trans-acting factors likely to have important roles in tissue-specific AS in human cells