
    Evolution, structure and function of metazoan splicing factor PRPF39

    In the yeast U1 snRNP, the Prp39/Prp42 heterodimer is essential for early steps of spliceosome assembly. In metazoans no Prp42 ortholog exists, raising the question of how the heterodimer is functionally substituted. Here we present the crystal structure of murine PRPF39, which forms a homodimer. Structure-guided point mutations disrupt dimer formation and inhibit splicing, establishing the homodimer as the functional unit. PRPF39 expression is controlled by NMD-inducing alternative splicing in mice and humans, suggesting a role in adapting splicing efficiency to cell-type-specific requirements. A phylogenetic analysis reveals coevolution of shortened U1 snRNA and the absence of Prp42, which correlates with overall splicing complexity in different fungi. While current models correlate the diversity of spliceosomal proteins with splicing complexity, our study highlights a contrary case. We find that organisms with higher splicing complexity have substituted the Prp39/Prp42 heterodimer with a PRPF39 homodimer.

    Regulation of splicing factors by alternative splicing and NMD is conserved between kingdoms yet evolutionarily flexible.

    Ultraconserved elements, unusually long regions of perfect sequence identity, are found in genes encoding numerous RNA-binding proteins including arginine-serine rich (SR) splicing factors. Expression of these genes is regulated via alternative splicing of the ultraconserved regions to yield mRNAs that are degraded by nonsense-mediated mRNA decay (NMD), a process termed unproductive splicing (Lareau et al. 2007; Ni et al. 2007). As all human SR genes are affected by alternative splicing and NMD, one might expect this regulation to have originated in an early SR gene and persisted as duplications expanded the SR family. But in fact, unproductive splicing of most human SR genes arose independently (Lareau et al. 2007). This paradox led us to investigate the origin and proliferation of unproductive splicing in SR genes. We demonstrate that unproductive splicing of the splicing factor SRSF5 (SRp40) is conserved among all animals and even observed in fungi; this is a rare example of alternative splicing conserved between kingdoms, yet its effect is to trigger mRNA degradation. As the gene duplicated, the ancient unproductive splicing was lost in paralogs, and distinct unproductive splicing evolved rapidly and repeatedly to take its place. SR genes have consistently employed unproductive splicing, and while it is exceptionally conserved in some of these genes, turnover in specific events among paralogs shows flexible means to the same regulatory end.

    Flexible RNA design under structure and sequence constraints using formal languages

    The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, that structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence or absence of regulatory motifs, either at a specific location or anywhere in the sequence. In this study, we investigate the design of RNA sequences from their targeted secondary structure, given these additional sequence constraints. To this end, we develop a general framework based on concepts of language theory, namely context-free grammars and finite automata. We efficiently combine a comprehensive set of constraints into a unifying context-free grammar of moderate size. From there, we use generic algorithms to perform a (weighted) random generation, or an exhaustive enumeration, of candidate sequences. The resulting method, whose complexity scales linearly with the length of the RNA, was implemented as a standalone program. The resulting software was embedded into a publicly available dedicated web server. The applicability of the method is demonstrated on a concrete case study dedicated to Exon Splicing Enhancers, in which our approach was successfully used in the design of in vitro experiments.
    Comment: ACM BCB 2013 - ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics (2013)
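
    To make the design problem above concrete, here is a minimal Python sketch, not the authors' grammar-based method. It assumes a dot-bracket encoding of the target structure, an A-U/G-C/G-U pairing rule, and a "forbidden motif anywhere" constraint handled by plain rejection sampling. The names pair_table, count_compatible, is_compatible and sample_compatible are hypothetical helpers introduced only for illustration.

```python
import random

# Assumed pairing rule for this sketch: Watson-Crick pairs plus G-U wobble.
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]
BASES = "ACGU"


def pair_table(structure):
    """Map every paired position to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def count_compatible(structure):
    """Number of sequences whose paired positions can all form a base pair."""
    n_pairs = len(pair_table(structure)) // 2
    n_unpaired = len(structure) - 2 * n_pairs
    return len(PAIRS) ** n_pairs * len(BASES) ** n_unpaired


def is_compatible(seq, structure):
    """True if seq has the right length and every paired position can pair."""
    partner = pair_table(structure)
    return len(seq) == len(structure) and all(
        (seq[i], seq[j]) in PAIRS for i, j in partner.items() if i < j
    )


def sample_compatible(structure, rng, forbidden=(), max_tries=1000):
    """Draw a compatible sequence uniformly, rejecting forbidden motifs."""
    partner = pair_table(structure)
    for _ in range(max_tries):
        seq = [None] * len(structure)
        for i in range(len(structure)):
            if i not in partner:
                seq[i] = rng.choice(BASES)
            elif i < partner[i]:
                # Choose both ends of the pair together.
                seq[i], seq[partner[i]] = rng.choice(PAIRS)
        candidate = "".join(seq)
        if not any(motif in candidate for motif in forbidden):
            return candidate
    raise RuntimeError("no sequence satisfying the constraints was found")
```

    Rejection sampling is used here only because it is short. It can become slow when constraints are tight, which is the kind of situation a combined grammar, as described in the abstract, is meant to avoid.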

    Bridging the synaptic gap: neuroligins and neurexin I in Apis mellifera

    Vertebrate studies show neuroligins and neurexins are binding partners in a trans-synaptic cell adhesion complex, implicated in human autism and mental retardation disorders. Here we report a genetic analysis of homologous proteins in the honey bee. As in humans, the honeybee has five large (31-246 kb, up to 12 exons each) neuroligin genes, three of which are tightly clustered. RNA analysis of the neuroligin-3 gene reveals five alternatively spliced transcripts, generated through alternative use of exons encoding the cholinesterase-like domain. Whereas vertebrates have three neurexins, the bee has just one gene, named neurexin I (400 kb, 28 exons). However, alternative isoforms of bee neurexin I are generated by differential use of 12 splice sites, mostly located in regions encoding LNS subdomains. Some of the splice variants of bee neurexin I resemble the vertebrate alpha- and beta-neurexins, although in vertebrates these forms are generated by alternative promoters. Novel splicing variations in the 3' region generate transcripts encoding alternative trans-membrane and PDZ domains. Another 3' splicing variation predicts soluble neurexin I isoforms. Neurexin I and neuroligin expression was found in brain tissue, with expression present throughout development, and in most cases significantly up-regulated in adults. Transcripts of neurexin I and one neuroligin tested were abundant in mushroom bodies, a higher order processing centre in the bee brain. We show neuroligins and neurexins comprise a highly conserved molecular system with likely similar functional roles in insects as in vertebrates, and with scope in the honeybee to generate substantial functional diversity through alternative splicing. Our study provides important prerequisite data for using the bee as a model for vertebrate synaptic development.
    Australian National University PhD Scholarship Award to Sunita Biswas

    Top-Down Skiplists

    We describe todolists (top-down skiplists), a variant of skiplists (Pugh 1990) that can execute searches using at most $\log_{2-\varepsilon} n + O(1)$ binary comparisons per search and that have amortized update time $O(\varepsilon^{-1}\log n)$. A variant of todolists, called working-todolists, can execute a search for any element $x$ using $\log_{2-\varepsilon} w(x) + o(\log w(x))$ binary comparisons and have amortized search time $O(\varepsilon^{-1}\log w(x))$. Here, $w(x)$ is the "working-set number" of $x$. No previous data structure is known to achieve a bound better than $4\log_2 w(x)$ comparisons. We show through experiments that, if implemented carefully, todolists are comparable to other common dictionary implementations in terms of insertion times and outperform them in terms of search times.
    Comment: 18 pages, 5 figures
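
    For readers unfamiliar with the baseline, the following Python sketch shows a classic randomized skiplist in the style of Pugh (1990), not the todolist variant described above. It counts the "less than" comparisons made during a search, which is the quantity the bounds in the abstract refer to. The class names, the promotion probability of 1/2 and the max_height cap are assumptions of this sketch.

```python
import random


class Node:
    __slots__ = ("key", "next")

    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] is the successor on level i


class SkipList:
    def __init__(self, seed=0, max_height=32):
        self.rng = random.Random(seed)
        self.max_height = max_height
        self.head = Node(None, max_height)  # sentinel, holds no key
        self.height = 1

    def _random_height(self):
        # Promote a node to the next level with probability 1/2.
        h = 1
        while h < self.max_height and self.rng.random() < 0.5:
            h += 1
        return h

    def search(self, key):
        """Return (found, number of '<' comparisons performed)."""
        node, comparisons = self.head, 0
        for level in range(self.height - 1, -1, -1):
            while node.next[level] is not None:
                comparisons += 1
                if node.next[level].key < key:
                    node = node.next[level]
                else:
                    break
        candidate = node.next[0]
        return (candidate is not None and candidate.key == key), comparisons

    def insert(self, key):
        """Insert key; return False if it was already present."""
        update = [self.head] * self.max_height
        node = self.head
        for level in range(self.height - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            update[level] = node  # last node before key on this level
        if node.next[0] is not None and node.next[0].key == key:
            return False
        h = self._random_height()
        self.height = max(self.height, h)
        new = Node(key, h)
        for level in range(h):
            new.next[level] = update[level].next[level]
            update[level].next[level] = new
        return True
```

    A search starts at the top level and drops down one level whenever the next key is not smaller than the target, so the comparison count depends on the random heights drawn at insertion time.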

    Computational prediction of splicing regulatory elements shared by Tetrapoda organisms

    Background: Auxiliary splicing sequences play an important role in ensuring accurate and efficient splicing by promoting or repressing recognition of authentic splice sites. These cis-acting motifs have been termed splicing enhancers and silencers and are located both in introns and exons. They co-evolved into an intricate splicing code together with additional functional constraints, such as tissue-specific and alternative splicing patterns. We used orthologous exons extracted from the University of California Santa Cruz multiple genome alignments of human and 22 Tetrapoda organisms to predict candidate enhancers and silencers that have reproducible and statistically significant bias towards annotated exonic boundaries. Results: A total of 2,546 Tetrapoda enhancers and silencers were clustered into 15 putative core motifs based on their Markov properties. Most of these elements have been identified previously, but 118 putative silencers and 260 enhancers (~15%) were novel. Examination of previously published experimental data for the presence of predicted elements showed that their mutations in 21/23 (91.3%) cases altered the splicing pattern as expected. Predicted intronic motifs flanking 3' and 5' splice sites had higher evolutionary conservation than other sequences within intronic flanks, and the intronic enhancers differed markedly between 3' and 5' intronic flanks. Conclusion: The difference in intronic enhancers supporting 5' and 3' splice sites suggests an independent splicing commitment for neighboring exons. Increased evolutionary conservation for ISEs/ISSs within intronic flanks and the effect of modulation of predicted elements on splicing suggest functional significance of the identified elements in splicing regulation. Most of the elements identified were shown to have direct implications in human splicing and therefore could be useful for building computational splicing models in biomedical research.

    Navigating in a sea of repeats in RNA-seq without drowning

    The main challenge in de novo assembly of NGS data is certainly to deal with repeats that are longer than the reads. This is particularly true for RNA-seq data, since coverage information cannot be used to flag repeated sequences, of which transposable elements are one of the main examples. Most transcriptome assemblers are based on de Bruijn graphs and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are twofold. First, we introduce a formal model for representing high copy number repeats in RNA-seq data and exploit its properties for inferring a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying in a de Bruijn graph a subgraph with this characteristic is NP-complete. In a second step, we show that in the specific case of a local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs. In particular, we designed and implemented an algorithm to efficiently identify AS events that are not included in repeated regions. Finally, we validate our results using synthetic data. We also give an indication of the usefulness of our method on real data.
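
    As background for the de Bruijn graphs mentioned above, the following Python sketch builds such a graph from a set of reads and lists its branching nodes. It is an illustration under simple assumptions (error-free reads, one strand, nodes are (k-1)-mers), not the repeat model or the algorithm of the paper. The function names de_bruijn and branching_nodes are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Nodes with more than one successor or more than one predecessor."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)
    return sorted(
        n for n in nodes if len(graph.get(n, ())) > 1 or indegree[n] > 1
    )
```

    In such a graph, both alternative splicing events and repeats show up as branching structure, which is why telling them apart needs more than a degree count like the one above.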

    Extraction of Transcript Diversity from Scientific Literature

    Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term “alternative splicing” to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/