9 research outputs found

    TassDB2 - A comprehensive database of subtle alternative splicing events

    Background: Subtle alternative splicing events involving tandem splice sites separated by a short distance (2-12 nucleotides) are frequent and evolutionarily widespread in eukaryotes, and are a major contributor to the complexity of transcriptomes and proteomes. However, databases on alternative splicing have either omitted these events altogether or reported only the experimentally confirmed cases. A database that covers all confirmed cases of subtle alternative splicing as well as the numerous putative tandem splice sites (which might be confirmed once more transcript data become available), and that allows users to search for tandem splice sites with specific features and download the results, is therefore a valuable resource for targeted experimental studies and large-scale bioinformatics analyses of tandem splice sites. Towards this goal we recently set up TassDB (Tandem Splice Site DataBase, version 1), which stores data about alternative splicing events at tandem splice sites separated by 3 nt in eight species. Description: We have substantially revised and extended TassDB. The currently available version 2 contains extensive information about tandem splice sites separated by 2-12 nt for the human and mouse transcriptomes, including data on the conservation of the tandem motifs in five vertebrates. TassDB2 offers a user-friendly interface to search for specific genes or for genes containing tandem splice sites with specific features, as well as the possibility to download result datasets. For example, users can search for cases of alternative splicing where the proportion of EST/mRNA evidence supporting the minor isoform exceeds a specific threshold, or where the difference in splice site scores is specified by the user. The predicted impact of each event on the protein is also reported, along with information on whether the event is a putative target for the nonsense-mediated decay (NMD) pathway. Links are provided to the UCSC genome browser and other external resources. Conclusion: TassDB2, available via http://www.tassdb.info, provides comprehensive resources for researchers interested in both targeted experimental studies and large-scale bioinformatics analyses of short distance tandem splice sites. doi: 10.1186/1471-2105-11-216
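The kind of search described in this abstract (events whose minor isoform is supported by more than a chosen proportion of EST/mRNA evidence) can be illustrated with a minimal Python sketch. This is not the TassDB2 interface or schema: the `TandemEvent` record, its field names, the gene labels and the counts are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TandemEvent:
    gene: str            # hypothetical gene label, not a real TassDB2 entry
    distance_nt: int     # distance between the two tandem splice sites
    major_support: int   # EST/mRNA count supporting the major isoform
    minor_support: int   # EST/mRNA count supporting the minor isoform


def minor_fraction(event: TandemEvent) -> float:
    """Proportion of transcript evidence that supports the minor isoform."""
    total = event.major_support + event.minor_support
    return event.minor_support / total if total else 0.0


def filter_events(events, min_minor_fraction, min_distance=2, max_distance=12):
    """Keep events in the distance range whose minor fraction exceeds the threshold."""
    return [
        e for e in events
        if min_distance <= e.distance_nt <= max_distance
        and minor_fraction(e) > min_minor_fraction
    ]


# Made-up example records (assumed values, not database content).
events = [
    TandemEvent("GENE_A", 3, 90, 10),
    TandemEvent("GENE_B", 6, 60, 40),
    TandemEvent("GENE_C", 15, 50, 50),
]
selected = filter_events(events, min_minor_fraction=0.2)
print([e.gene for e in selected])
```

The default distance range of 2-12 nt mirrors the range the abstract gives for version 2; the threshold itself is left to the caller, as in the described search form.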

    Bioinformatics Analyses of Alternative Splicing: Prediction of alternative splicing events in animals and plants using Machine Learning and analysis of the extent and conservation of subtle alternative splicing

    Alternative splicing (AS) is a mechanism by which a multi-exon gene can express different transcripts and thus different proteins. AS contributes substantially to the complexity and diversity of eukaryotic transcriptomes and proteomes. Over the past ten years, bioinformatics has made decisive contributions to our understanding of AS with respect to the prevalence, extent and conservation of its different classes, as well as its evolution, regulation and biological function. Large-scale detection of AS has mostly relied on genome- and transcriptome-wide alignment of EST and mRNA data and on microarray analyses, all of which are largely based on bioinformatics methods. These have been complemented by computational methods for characterizing and predicting AS, which show how constitutive and alternative splice sites and exons differ. This dissertation deals with bioinformatics analyses of selected aspects of AS. In the first part, I developed methods for predicting AS without relying on expressed sequence data. In particular, I extended approaches for predicting cassette exons using Bayesian networks (BN) and established new discriminative features. These markedly improved the true positive rate from a published 50% to 61%, at a stringent false positive rate of only 0.5%. I was able to show that exons that were labeled as constitutive, but to which the BN assigned a high probability of being alternative, were indeed confirmed as alternative by the latest expression data. Given the same datasets and features, the performance of a BN matches that of a published support vector machine (SVM), which indicates that reliable classification results depend more on the features than on the choice of classifier. In the second part, I extended the BN approach to a large and evolutionarily widespread class of AS events known as NAGNAG tandem splice sites, in which the alternative splice sites are only 3 nucleotides (nt) apart. Careful compilation of the training and test datasets for predicting NAGNAG AS contributed to a balanced sensitivity and specificity of 92%. Predictions of a BN trained on the combined dataset were experimentally confirmed in 81% (38/47) of cases. This study thereby generated one of the most extensive datasets currently available for the experimental verification of AS predictions. A BN trained on human data achieves similarly good results on four other vertebrate genomes. Only slight losses in prediction performance for Drosophila melanogaster and Caenorhabditis elegans indicate that the underlying splicing mechanism appears to be conserved over large evolutionary distances. Finally, I used the prediction accuracy from the experimental validation to estimate the number of alternative NAGNAGs that are still undiscovered. The results suggest that the mechanism of NAGNAG AS is simple, stochastic and conserved, among vertebrates and beyond. Furthermore, I applied the BN approach to characterize and predict NAGNAG AS in Physcomitrella patens, a moss. This is one of the first studies to predict AS in plants without relying on expressed sequence data. We achieved results similar to those of our other work on predicting NAGNAG AS. An independent validation using 454 next-generation sequencing data showed true positive rates of 64%-79% for well-supported cases of NAGNAG AS. The mechanism of NAGNAG AS in plants thus appears to resemble that in animals.
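To make the NAGNAG motif concrete, the following Python sketch scans a DNA string for every occurrence of the pattern N-A-G-N-A-G, including overlapping ones. It only locates the sequence motif; it does not check whether a hit lies at an annotated intron-exon boundary, and it is not the Bayesian network classifier described in the thesis. The function name and the example sequences are assumptions made for illustration.

```python
import re

# A zero-width lookahead lets the regex engine report overlapping hits.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(sequence):
    """Return (0-based start, motif) for each NAGNAG occurrence in the sequence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in NAGNAG.finditer(seq)]


# Made-up sequences, not taken from any genome.
print(find_nagnag("ttcagcaggt"))
print(find_nagnag("AAGAAGAAG"))
```

In a real analysis, candidate positions would be restricted to annotated acceptor sites before any features are computed for classification.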

    SpliceDisease database: linking RNA splicing and disease

    RNA splicing is an important aspect of gene regulation in many organisms. Splicing of RNA is regulated by complicated mechanisms involving numerous RNA-binding proteins and the intricate network of interactions among them. Mutations in cis-acting splicing elements or their regulatory proteins have been shown to be involved in human diseases, and defects in the pre-mRNA splicing process have emerged as a common disease-causing mechanism. Therefore, a database integrating RNA splicing and disease associations would be helpful for understanding not only RNA splicing but also its contribution to disease. In the SpliceDisease database, we manually curated 2337 splicing mutation disease entries involving 303 genes and 370 diseases, which are supported experimentally in 898 publications. The SpliceDisease database provides information including the nucleotide change in the sequence, the location of the mutation on the gene, the reference PubMed ID and a detailed description of the relationship among gene mutations, splicing defects and diseases. We standardized the names of the diseases and genes and provided links for these genes to NCBI and the UCSC genome browser for further annotation and genomic sequences. For the location of the mutation, we give direct links from each entry to the respective position/region in the genome browser. Users can freely browse, search and download the data in SpliceDisease at http://cmbi.bjmu.edu.cn/sdisease

    Increased complexity of Tmem16a/Anoctamin 1 transcript alternative splicing

    Background: TMEM16A (Anoctamin 1; ANO1) is an eight-transmembrane protein that functions as a calcium-activated chloride channel. Human TMEM16A exhibits alternatively spliced exons (6b, 13 and 15), which play important roles in the regulation of channel function. Mouse Tmem16a is reported to consist of 25 exons that code for a 956 amino acid protein. In this study our aim was to provide details of the mouse Tmem16a genomic structure and to investigate whether the Tmem16a transcript undergoes alternative splicing to generate channel diversity. Results: We identified Tmem16a transcript variants consisting of alternative exons 6b, 10, 13, 14, 15 and 18. Our findings indicate that many of these exons are expressed in various combinations and that these splicing events are mostly conserved between mouse and human. In addition, we confirmed the expression of these exon variants in other mouse tissues. Additional splicing events were identified, including a novel conserved exon 13b, tandem splice sites of exons 1 and 21, and two intron retention events. Conclusion: Our results suggest that the Tmem16a gene is significantly more complex than previously described. The complexity is especially evident in the region spanning exons 6 through 16, where a number of the alternative splicing events are thought to affect calcium sensitivity, voltage dependence and the kinetics of activation and deactivation of this calcium-activated chloride channel. The identification of multiple Tmem16a splice variants suggests that alternative splicing is an exquisite mechanism that operates to diversify TMEM16A channel function in both physiological and pathophysiological conditions.

    Alternative splicing and trans-splicing events revealed by analysis of the Bombyx mori transcriptome

    Alternative splicing and trans-splicing events have not been systematically studied in the silkworm Bombyx mori. Here, the silkworm transcriptome was analyzed by RNA-seq. We identified 320 novel genes, modified 1140 gene models, and found thousands of alternative splicing events and 58 trans-splicing events. Studies of three SR proteins show that both their alternative splicing patterns and mRNA products are conserved from insect to human, and one isoform of Srsf6 with a retained intron is expressed sex-specifically in silkworm gonads. Trans-splicing of mod(mdg4) in silkworm was experimentally confirmed. We identified integrations from a common 5′-gene with 46 newly identified alternative 3′-exons that are located on both DNA strands over a 500-kb region. Other trans-splicing events in B. mori were predicted by bioinformatic analysis; of these, 12 events were confirmed by RT-PCR, six were further validated by chimeric SNPs, and two were confirmed by allele-specific RT-PCR in F1 hybrids from the distinct silkworm lines JS and L10, indicating that trans-splicing is more widespread in insects than previously thought. Analysis of the B. mori transcriptome by RNA-seq provides valuable information on regulatory alternative splicing events. The conservation of splicing events across species and the newly identified trans-splicing events suggest that B. mori is a good model for future studies. Published by Cold Spring Harbor Laboratory Press.

    Coupling and Coordination in Gene Expression Processes with Pre-mRNA Splicing

    RNA processing is a tightly regulated and highly complex pathway which includes transcription, splicing, editing, transportation, translation and degradation. It has been well documented that splicing of RNA polymerase II-mediated nascent transcripts occurs co-transcriptionally and is functionally coupled to other RNA processing steps. Recently, increasing experimental evidence has indicated that pre-mRNA splicing influences RNA degradation and vice versa. In this review, we summarize the recent findings demonstrating the coupling of these two processes. In addition, we highlight the importance of splicing in the production of intronic miRNAs and circular RNAs, and hence the discovery of novel mechanisms in the regulation of gene expression.

    Combinatorial biological complexity: a study of amino acid side chains and alternative splicing

    Both laymen and experts have always been intrigued by nature’s vast complexity and variety. Often, these phenomena arise from the combination of parts, as for example in the cell types of the human body or the diverse proteins of a cell. In this thesis I investigate three instances of combinatorial complexity: combinations of aliphatic amino acid side chains, alternative mRNA splicing in fungi, and mutually exclusively spliced exons in human and mouse. In the first part, the number of aliphatic amino acid side chains is studied. Structural combinations yield a vast theoretical number, yet we find that only a fraction of them is realized in nature. Possible reasons, especially restrictions imposed by the genetic code, are discussed. Moreover, strategies for meeting the need for increased diversity are examined. In the second part, the extent of alternative splicing (AS) in fungi is investigated in a genome-wide, comparative multi-species study. I find that AS is common in fungi, but less frequent than in plants and animals. AS is more common in more complex fungi and is over-represented in pathogens. It is hypothesized that AS contributes to multi-cellular complexity in fungi. In the third part, mutually exclusive exons (MXEs) of mouse and human are detected and characterized. Rather unexpected patterns emerged: the majority of MXEs originate from non-adjacent exons and frequently appear in clusters. Known regulatory mechanisms of MXE splicing do not fit these MXEs, and thus new mechanisms have to be sought. In summary, it is hypothesized that complexity from combinations constitutes a universal principle in biology. However, there seems to be a need to restrict the combinatorial potential. This is highlighted by the interdependence of MXEs and the low number of realized amino acids in the genetic code. Combinatorial complexity and its restriction are discussed with respect to other biological systems to further substantiate the hypotheses.

    Combination of novel and public RNA-seq datasets to generate an mRNA expression atlas for the domestic chicken

    Background: The domestic chicken (Gallus gallus) is widely used as a model in developmental biology and is also an important livestock species. We describe a novel approach to data integration to generate an mRNA expression atlas for the chicken spanning major tissue types and developmental stages, using a diverse range of publicly archived RNA-seq datasets and new data derived from immune cells and tissues. Results: Randomly down-sampling RNA-seq datasets to a common depth and quantifying expression against a reference transcriptome using the mRNA quantitation tool Kallisto ensured that disparate datasets explored comparable transcriptomic space. The network analysis tool Graphia was used to extract clusters of co-expressed genes from the resulting expression atlas, many of which were tissue- or cell-type-restricted, contained transcription factors that have previously been implicated in their regulation, or were otherwise associated with biological processes, such as the cell cycle. The atlas provides a resource for the functional annotation of genes that currently have only a locus ID. We cross-referenced the RNA-seq atlas to a publicly available embryonic Cap Analysis of Gene Expression (CAGE) dataset to infer the developmental time course of organ systems, and to identify a signature of the expansion of tissue macrophage populations during development. Conclusion: Expression profiles obtained from public RNA-seq datasets, despite being generated by different laboratories using different methodologies, can be made comparable to each other. This meta-analytic approach to RNA-seq can be extended with new datasets from novel tissues, and is applicable to any species.
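The down-sampling step mentioned in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' workflow: reads are held in memory as plain lists, the common depth is assumed to be the size of the smallest dataset, and the function names, dataset labels and seed are hypothetical. The abstract does not say which tool or target depth was used.

```python
import random


def downsample(reads, depth, seed=0):
    """Randomly keep `depth` reads without replacement, preserving input order."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if len(reads) <= depth:
        return list(reads)
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    keep = sorted(rng.sample(range(len(reads)), depth))
    return [reads[i] for i in keep]


def downsample_to_common_depth(datasets, seed=0):
    """Down-sample every dataset to the size of the smallest one (an assumption)."""
    depth = min(len(reads) for reads in datasets.values())
    return {name: downsample(reads, depth, seed) for name, reads in datasets.items()}


# Made-up read identifiers standing in for sequencing reads.
datasets = {
    "liver": [f"liver_read_{i}" for i in range(100)],
    "spleen": [f"spleen_read_{i}" for i in range(40)],
}
balanced = downsample_to_common_depth(datasets, seed=42)
print({name: len(reads) for name, reads in balanced.items()})
```

Real RNA-seq data would be sampled from FASTQ files, keeping read pairs together, before quantification.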

    Relative Timing of Intron Gain and a New Marker for Phylogenetic Analyses

    Despite decades of effort by molecular systematists, the trees of life of eukaryotic organisms remain partly unresolved or in conflict with each other. An ever-increasing number of fully sequenced genomes of various eukaryotes allows gene and species phylogenies to be considered at genome scale. However, such phylogenomics-based approaches have also revealed that more taxa and more gene sequences are not the ultimate solution for fully resolving these conflicts, and that there is a need for sequence-independent phylogenetic meta-characters derived from genome sequences. Spliceosomal introns are characteristic features of eukaryotic nuclear genomes. The relatively rare changes of spliceosomal intron positions have already been used as genome-level markers, both for estimating intron evolution and for inferring phylogenies, although with variable success. In this thesis, a specific subset of these changes is introduced and established as a novel phylogenetic marker, termed near intron pair (NIP). These characters are inferred from homologous genes that contain mutually exclusive intron presences at pairs of coding sequence (CDS) positions in close proximity. The idea that NIPs are powerful characters is based on the assumption that both very small exons and multiple intron gains at the same position are rare. To obtain sufficient numbers of NIP characters from genomic and alignment datasets in a consistent and flexible way, implementing a computational pipeline was a main goal of this work. The pipeline starts from orthologous (or, more generally, homologous) gene datasets comprising genomic sequences and corresponding CDS transcript annotations, and multiple alignment generation is an integral part of it. The alignment can be calculated at the amino acid level using external tools (e.g. transAlign) and yields a codon alignment via back-translation. Guided by the multiple alignment, positionally homologous intron positions become apparent when mapped individually for each transcript. At this stage the pipeline outputs those portions of the intron-annotated alignment that contain at least one candidate NIP character. In a subsequent pipeline script, these collected NIP region files are converted to binary state characters representing valid NIPs, subject to quality filter constraints concerning, for example, the amino acid alignment conservation around intron loci and splice sites. The pipeline tools enable the researcher to build NIP character matrices that can be used for tree inference, e.g. using the maximum parsimony approach. In a first NIP-based application, the phylogenetic position of major orders of holometabolic insects (more specifically, the Coleoptera-Hymenoptera-Mecopterida trifurcation) was evaluated in a cladistic sense. As already suggested by a study on the eIF2gamma gene based on two NIP cases (Krauss et al. 2005), the genome-scale evaluation supported Hymenoptera as sister group to an assemblage of Coleoptera and Mecopterida, in agreement with other studies but contradicting the previously established view. As part of the genome paper describing a new species of twisted-wing parasites (Strepsiptera), the NIP method was employed to help resolve their phylogenetic position within (holometabolic) insects. Together with analyses of sequence patterns and a further meta-character, it revealed twisted-wing parasites to be the closest relatives of the mega-diverse beetles. NIP-based reconstructions of the metazoan tree covering a broad selection of representative animal species also identified some weaknesses of the NIP approach, which may suffer from, e.g., alignment/ortholog prediction artifacts (depending on the depth of the range of taxa) and systematic biases (long branch attraction artifacts, due to unequal evolutionary rates of intron gain/loss and the use of the maximum parsimony method). In a further study, the identification of NIPs within the recently diverged genus Drosophila was used to characterize recent intron gain events that apparently involved several cases of intron sliding and tandem exon duplication, although the mechanisms of gain for the majority of cases could not be elucidated. Finally, the NIP marker was established as a novel phylogenetic marker, dedicated in particular to complementing the exploration of the wealth of genome data for phylogenetic purposes and to addressing open questions of intron evolution.
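The near intron pair (NIP) idea from this abstract can be illustrated with a small Python sketch. It is not the thesis pipeline: intron positions are assumed to be already mapped to shared alignment coordinates, the 50-position distance cut-off and the '0'/'1'/'?' state encoding are illustrative assumptions, the species labels are hypothetical, and the quality filters mentioned in the abstract are omitted.

```python
from itertools import combinations


def find_nip_candidates(intron_positions, max_distance=50):
    """Return position pairs whose intron presences are mutually exclusive.

    intron_positions maps a gene/species label to a set of intron positions
    given in shared alignment coordinates.
    """
    all_positions = sorted(set().union(*intron_positions.values()))
    candidates = []
    for p, q in combinations(all_positions, 2):
        if q - p > max_distance:
            continue  # the two positions are not in close proximity
        has_p = {g for g, pos in intron_positions.items() if p in pos}
        has_q = {g for g, pos in intron_positions.items() if q in pos}
        # Each position occurs in some gene, but never both in the same gene.
        if has_p and has_q and not (has_p & has_q):
            candidates.append((p, q))
    return candidates


def encode_character(intron_positions, p, q):
    """Binary state per gene: '0' = intron at p, '1' = intron at q, '?' = neither."""
    return {
        g: "0" if p in pos else "1" if q in pos else "?"
        for g, pos in intron_positions.items()
    }


# Made-up intron positions for four hypothetical species.
introns = {
    "sp_a": {120, 400},
    "sp_b": {135, 400},
    "sp_c": {120},
    "sp_d": {400},
}
pairs = find_nip_candidates(introns)
print(pairs)
print(encode_character(introns, *pairs[0]))
```

Characters encoded this way could be collected column by column into a matrix for parsimony analysis, with '?' treated as missing data.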