
    An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs.

    Reconstructing full-length transcript isoforms from sequence fragments (such as ESTs) is a major interest and challenge for bioinformatic analysis of pre-mRNA alternative splicing. This problem has been formulated as finding traversals across the splice graph, which is a directed acyclic graph (DAG) representation of gene structure and alternative splicing. In this manuscript we introduce a probabilistic formulation of the isoform reconstruction problem, and provide an expectation-maximization (EM) algorithm for its maximum likelihood solution. Using a series of simulated data sets and expressed sequences from real human genes, we demonstrate that our EM algorithm can correctly handle various situations of fragmentation and coupling in the input data. Our work establishes a general probabilistic framework for splice graph-based reconstructions of full-length isoforms.
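
    The abstract above names an EM algorithm but gives no details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simplified mixture model in which each fragment is compatible with one or more candidate isoforms (for example, paths through a splice graph) and every compatible fragment is equally likely under an isoform. The function name `em_isoform_proportions` and the 0/1 compatibility matrix are hypothetical, introduced here for illustration.

```python
def em_isoform_proportions(compat, n_iter=200, tol=1e-9):
    """Estimate isoform proportions by EM under a simplified mixture model.

    compat[i][j] is 1 if fragment i is compatible with isoform j, else 0.
    Assumption: fragment length and position effects are ignored.
    """
    n_iso = len(compat[0])
    theta = [1.0 / n_iso] * n_iso  # start from uniform proportions

    for _ in range(n_iter):
        # E-step: split each fragment across its compatible isoforms
        # in proportion to the current estimates.
        counts = [0.0] * n_iso
        for row in compat:
            weights = [t * c for t, c in zip(theta, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # fragment matches no isoform; it is skipped
            for j, w in enumerate(weights):
                counts[j] += w / total

        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no fragment is compatible with any isoform")

        # M-step: new proportions are the normalized expected counts.
        new_theta = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Four fragments, two isoforms; the third fragment is ambiguous.
fragments = [[1, 0], [1, 0], [1, 1], [0, 1]]
proportions = em_isoform_proportions(fragments)
```

    In this toy input the ambiguous fragment is shared between the two isoforms according to the evidence from the unambiguous ones, which is the kind of coupling a probabilistic formulation is meant to resolve.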

    Data structures and algorithms for analysis of alternative splicing with RNA-Seq data

    Activity-regulated RNA editing in select neuronal subfields in hippocampus

    RNA editing by adenosine deaminases is a widespread mechanism to alter genetic information in metazoa. In addition to modifications in non-coding regions, editing contributes to diversification of protein function, in analogy to alternative splicing. However, although splicing programs respond to external signals, facilitating fine tuning and homeostasis of cellular functions, a similar regulation has not been described for RNA editing. Here, we show that the AMPA receptor R/G editing site is dynamically regulated in the hippocampus in response to activity. These changes are bi-directional, reversible and correlate with levels of the editase Adar2. This regulation is observed in the CA1 hippocampal subfield but not in CA3 and is thus subfield/cell-type-specific. Moreover, alternative splicing of the flip/flop cassette downstream of the R/G site is closely linked to the editing state, which is regulated by Ca(2+). Our data show that A-to-I RNA editing has the capacity to tune protein function in response to external stimuli.

    Genome-wide analysis of alternative splicing in cow: implications in bovine as a model for human diseases

    Background: Alternative splicing (AS) is a primary mechanism of functional regulation in the human genome, with 60% to 80% of human genes being alternatively spliced. As part of the bovine genome annotation team, we have analysed 4,567 bovine AS genes, compared to 16,715 human and 16,491 mouse AS genes, along with Gene Ontology (GO) analysis. We also analysed the two most important events, cassette exons and intron retention, in 94 human disease genes and mapped them to the bovine orthologous genes. Of the 94 human inherited disease genes, a protein domain analysis was carried out for the transcript sequences of 12 human genes that have orthologous genes and have been characterised in cow. Results: Of the 21,755 bovine genes, 4,567 genes (21%) are alternatively spliced, compared to 16,715 (68%) in human and 16,491 (57%) in mouse. Gene-level analysis of the orthologous set suggested that bovine genes show fewer AS events compared to human and mouse genes. A detailed examination of cassette exons across human and cow for 94 human disease genes suggested that a majority of cassette exons in human were present and constitutive in bovine, as opposed to intron retention, which exhibited 50% of the exons as present and 50% as absent in cow. We observed that AS plays a major role in disease implications in human through manipulations of essential/functional protein domains. It was also evident that the majority of these 12 genes had conservation of all essential domains in their bovine orthologous counterpart, for these human diseases. Conclusion: While alternative splicing has the potential to create many mRNA isoforms from a single gene, in cow the majority of genes generate two to three isoforms, compared to six in human and four in mouse. Our analyses demonstrated that a smaller number of bovine genes show greater transcript diversity. GO definitions for bovine AS genes provided 38% more functional information than currently available in the sequence database. Our protein domain analysis helped us verify the suitability of using bovine as a model for human diseases and also recognize the contribution of AS towards the disease phenotypes.

    ECgene: an alternative splicing database update

    ECgene was developed to provide functional annotation for alternatively spliced genes. The applications encompass the genome-based transcript modeling for alternative splicing (AS), domain analysis with Gene Ontology (GO) annotation and expression analysis based on the EST and SAGE data. We have expanded ECgene's AS modeling and EST clustering to nine organisms for which sufficient EST data are available in GenBank. As for the human genome, we have also introduced several new applications to analyze differential expression. ECprofiler is an ontology-based candidate gene search system that allows users to select an arbitrary combination of gene expression pattern and GO functional categories. DEGEST is a database of differentially expressed genes and isoforms based on the EST information. Importantly, gene expression is analyzed at three distinct levels: gene, isoform and exon. The user interfaces for functional and expression analyses have been substantially improved. ASviewer is a dedicated Java application that visualizes the transcript structure and functional features of alternatively spliced variants. The SAGE part of the expression module provides many additional features including SNP, differential expression and alternative tag positions.

    A General Definition and Nomenclature for Alternative Splicing Events

    Understanding the molecular mechanisms responsible for the regulation of the transcriptome present in eukaryotic cells is one of the most challenging tasks in the postgenomic era. In this regard, alternative splicing (AS) is a key phenomenon contributing to the production of different mature transcripts from the same primary RNA sequence. As a plethora of different transcript forms is available in databases, a first step to uncover the biology that drives AS is to identify the different types of reflected splicing variation. In this work, we present a general definition of the AS event along with a notation system that involves the relative positions of the splice sites. This nomenclature univocally and dynamically assigns a specific “AS code” to every possible pattern of splicing variation. On the basis of this definition and the corresponding codes, we have developed a computational tool (AStalavista) that automatically characterizes the complete landscape of AS events in a given transcript annotation of a genome, thus providing a platform to investigate the transcriptome diversity across genes, chromosomes, and species. Our analysis reveals that a substantial part (in human, more than a quarter) of the observed splicing variations are ignored in common classification pipelines. We have used AStalavista to investigate and to compare the AS landscape of different reference annotation sets in human and in other metazoan species and found that proportions of AS events change substantially depending on the annotation protocol, species-specific attributes, and coding constraints acting on the transcripts. The AStalavista system therefore provides a general framework to conduct specific studies investigating the occurrence, impact, and regulation of AS.

    Genome-Wide Data-Mining of Candidate Human Splice Translational Efficiency Polymorphisms (STEPs) and an Online Database

    Variation in pre-mRNA splicing is common and in some cases caused by genetic variants in intronic splicing motifs. Recent studies into the insulin gene (INS) discovered a polymorphism in a 5' non-coding intron that influences the likelihood of intron retention in the final mRNA, extending the 5' untranslated region and maintaining protein quality. Retention was also associated with increased insulin levels, suggesting that such variants, termed splice translational efficiency polymorphisms (STEPs), may relate to disease phenotypes through differential protein expression. We set out to explore the prevalence of STEPs in the human genome and validate this new category of protein quantitative trait loci (pQTL) using publicly available data. Gene transcript and variant data were collected and mined for candidate STEPs in motif regions. Sequences from transcripts containing potential STEPs were analysed for evidence of splice site recognition and an effect in expressed sequence tags (ESTs). 16 publicly released genome-wide association data sets of common diseases were searched for association to candidate polymorphisms with HapMap frequency data. Our study found 3324 candidate STEPs lying in motif sequences of 5' non-coding introns and further mining revealed 170 with transcript evidence of intron retention. 21 potential STEPs had EST evidence of intron retention or exon extension, as well as population frequency data for comparison. Results suggest that the insulin STEP was not a unique example and that many STEPs may occur genome-wide with potentially causal effects in complex disease. An online database of STEPs is freely accessible at http://dbstep.genes.org.uk/

    Alternative Splicing and Protein Structure Evolution

    In recent years, the amount of available experimental data has risen dramatically across many areas of biology. For the first time, these data allow a detailed analysis of how cellular components such as genes and proteins function, of how they are linked in cellular networks, and of their evolutionary history. Bioinformatics in particular plays an important role here in preparing the data and interpreting them biologically. This doctoral thesis examines two important areas of current bioinformatics research: the analysis of protein structure evolution and of similarities between protein structures, and the analysis of alternative splicing, an integral process in eukaryotic cells that contributes to functional diversity. In particular, this work introduces the idea of a combined analysis of the two mechanisms (structure evolution and splicing). We show that a combined view yields new insights into how structure evolution and alternative splicing, as well as a coupling of the two mechanisms, contribute to functional and structural complexity in higher organisms. The methods, hypotheses and results presented in the thesis can contribute to our understanding of how structure evolution and alternative splicing operate in the emergence of complex organisms, so that these two traditionally separate areas of bioinformatics can benefit from each other in the future.

    Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments

    Alternative splicing, polyadenylation of pre-messenger RNA molecules and differential promoter usage can produce a variety of transcript isoforms whose respective expression levels are regulated in time and space, thus contributing specific biological functions. However, the repertoire of mammalian alternative transcripts and their regulation are still poorly understood. Second-generation sequencing is now opening unprecedented routes to address the analysis of entire transcriptomes. Here, we developed methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data. These are based on an explicit statistical model and enable the prediction of alternative isoforms within or between conditions using any known gene annotation, as well as the relative quantification of known transcript structures. Applying these methods to a human RNA-Seq dataset, we validated a significant fraction of the predictions by RT-PCR. The data further showed that these predictions correlated well with information originating from junction reads. A direct comparison with exon arrays indicated improved performance of RNA-Seq over microarrays in the prediction of skipped exons. Altogether, the set of methods presented here comprehensively addresses multiple aspects of alternative isoform analysis. The software is available as an open-source R package called Solas at http://cmb.molgen.mpg.de/2ndGenerationSequencing/Solas/