2,853 research outputs found

    Large-scale analysis of human alternative protein isoforms: pattern classification and correlation with subcellular localization signals

    We investigated human alternative protein isoforms of >2600 genes based on full-length cDNA clones and SwissProt. We classified the isoforms and examined their co-occurrence for each gene. Further, we investigated potential relationships between these changes and differential subcellular localization. The two most abundant patterns were the one with different C-terminal regions and the one with an internal insertion, which together account for 43% of the total. Although changes of the N-terminal region are less common than those of the C-terminal region, extension of the C-terminal region is much less common than that of the N-terminal region, probably because of the difficulty of removing stop codons in one isoform. We also found that there are some frequently used combinations of co-occurrence in alternative isoforms. We interpret this as evidence that there is some structural relationship which produces a repertoire of isoformal patterns. Finally, many terminal changes are predicted to cause differential subcellular localization, especially in targeting either peroxisomes or mitochondria. Our study sheds new light on the enrichment of the human proteome through alternative splicing and related events. Our database of alternative protein isoforms is available through the internet

    AltAnalyze and DomainGraph: analyzing and visualizing exon expression data

    Alternative splicing is an important mechanism for increasing protein diversity. However, its functional effects are largely unknown. Here, we present our new software workflow composed of the open-source application AltAnalyze and the Cytoscape plugin DomainGraph. Both programs provide an intuitive and comprehensive end-to-end solution for the analysis and visualization of alternative splicing data from Affymetrix Exon and Gene Arrays at the level of proteins, domains, microRNA binding sites, molecular interactions and pathways. Our software tools include easy-to-use graphical user interfaces, rigorous statistical methods (FIRMA, MiDAS and DABG filtering) and do not require prior knowledge of exon array analysis or programming. They provide new methods for automatic interpretation and visualization of the effects of alternative exon inclusion on protein domain composition and microRNA binding sites. These data can be visualized together with affected pathways and gene or protein interaction networks, allowing a straightforward identification of potential biological effects due to alternative splicing at different levels of granularity. Our programs are available at http://www.altanalyze.org and http://www.domaingraph.de. These websites also include extensive documentation, tutorials and sample data

    Identifying and characterising key alternative splicing events in Drosophila development

    In complex metazoans, a given gene frequently codes for multiple protein isoforms through processes such as alternative splicing. Large-scale functional annotation of these isoforms is a key challenge for functional genomics. This annotation gap is widening as technologies such as RNASeq identify large numbers of multi-transcript genes. Furthermore, attempts to characterise the functions of splicing in an organism are complicated by the difficulty of distinguishing functional isoforms from those produced by splicing errors or transcription noise. Tools to help prioritise candidate isoforms for testing are largely absent

    Evolution of alternative splice variation and exon usage following whole genome duplication

    Whole genome duplication and alternative splicing are two mechanisms that contribute to protein diversity. Whole genome duplication doubles the genetic material in an organism, providing raw material for adaptation and evolution of novel traits. Alternative splicing contributes to the proteome complexity, as it gives a gene the ability to produce several mRNA isoforms by alternatively splicing gene transcripts. While both processes are important factors in increasing protein diversity, their relationship is not well understood. The two primary aims of this thesis were to use Oxford Nanopore long-read RNA sequencing to better characterize the isoform diversity in Atlantic salmon and look for patterns of alternative splicing evolution following the salmonid-specific whole genome duplication event. With the long-read RNA sequences, we found that the majority (75%) of isoforms that mapped to known genes in the Atlantic salmon reference genome were previously unannotated; however, the annotated isoforms were more highly expressed. The diversity of isoforms was then used to test the models of alternative splicing evolution following whole genome duplication: the independent model, the function-sharing model, and the accelerated alternative splicing model. Our results did not support either the accelerated alternative splicing or function-sharing model, indicating no strong relationship between alternative splicing evolution and genes duplicated in the salmonid-specific whole genome duplication event.

    Deep sequencing of pre-translational mRNPs reveals hidden flux through evolutionarily conserved AS-NMD pathways

    Deep sequencing of mRNAs (RNA-Seq) is now the preferred method for transcriptome-wide quantification of gene expression. Yet many mRNA isoforms, such as those eliminated by nonsense-mediated decay (NMD), are inherently unstable. Thus a significant drawback of steady-state RNA-Seq is that it provides only marginal information on the flux through alternative splicing pathways. Measuring such flux requires capturing newly made species prior to mRNA decay. One means of capturing nascent mRNAs is to affinity purify either the exon junction complex (EJC) or activated spliceosomes. Late-stage spliceosomes deposit the EJC upstream of exon-exon junctions, where it remains associated until the first round of translation. As most mRNA decay pathways are translation-dependent, these EJC- or spliceosome-associated, pre-translational mRNAs should provide an accurate record of the initial population of alternate mRNA isoforms. Previous work has analyzed the protein composition and structure of pre-translational mRNPs in detail. While in the Moore lab, my project has focused on exploring the diversity of mRNA isoforms contained within these complexes. As expected, known NMD isoforms are more highly represented in pre-translational mRNPs than in RNA-Seq libraries. To investigate whether pre-translational mRNPs contain novel mRNA isoforms, we created a bioinformatics pipeline that identified thousands of previously unannotated splicing events. Though many can be attributed to “splicing noise”, others are evolutionarily-conserved events that produce new AS-NMD isoforms likely involved in maintenance of protein homeostasis. Several of these occur in genes whose overexpression has been linked to poor cancer prognosis

    Splice-mediated Variants of Proteins (SpliVaP) – data and characterization of changes in signatures among protein isoforms due to alternative splicing

    Background: It is often the case that mammalian genes are alternatively spliced; the resulting alternate transcripts often encode protein isoforms that differ in amino acid sequences. Changes among the protein isoforms can alter the cellular properties of proteins. The effect can range from a subtle modulation to a complete loss of function.
    Results: (i) We examined human splice-mediated protein isoforms (as extracted from a manually curated data set, and from a computationally predicted data set) for differences in the annotation for protein signatures (Pfam domains and PRINTS fingerprints) and we characterized the differences and their effects on protein functionalities. An important question addressed relates to the extent of protein isoforms that may lack any known function in the cell. (ii) We present a database that reports differences in protein signatures among human splice-mediated protein isoform sequences.
    Conclusion: (i) Characterization: The work points to distinct sets of alternatively spliced genes with varying degrees of annotation for the splice-mediated protein isoforms. Protein molecular functions seen to be often affected are those that relate to: binding, catalytic, transcription regulation, structural molecule, transporter, motor, and antioxidant; and the processes that are often affected are nucleic acid binding, signal transduction, and protein-protein interactions. Signatures are often included/excluded and truncated in length among protein isoforms; truncation is seen as the predominant type of change. Analysis points to the following novel aspects: (a) Analysis using data from the manually curated Vega indicates that one in 8.9 genes can lead to a protein isoform of no "known" function; and one in 18 expressed protein isoforms can be such an "orphan" isoform; the corresponding numbers as seen with computationally predicted ASD data set are: one in 4.9 genes and one in 9.8 isoforms. (b) When swapping of signatures occurs, it is often between those of same functional classifications. (c) Pfam domains can occur in varying lengths, and PRINTS fingerprints can occur with varying number of constituent motifs among isoforms – since such a variation is seen in large number of genes, it could be a general mechanism to modulate protein function. (ii) Data: The reported resource (at http://www.bioinformatica.crs4.org/tools/dbs/splivap/) provides the community ability to access data on splice-mediated protein isoforms (with value-added annotation such as association with diseases) through changes in protein signatures.

    Revealing missing human protein isoforms based on Ab initio prediction, RNA-seq and proteomics

    Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to known transcripts, the expression of these novel transcripts was highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected from shotgun proteomics data of 41 breast samples. We also showed that L1 retrotransposons have a more significant impact on the origin of new transcripts/genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread for genes involved in specific biological functions like protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. In the end, the total number of human transcripts with protein-coding potential was estimated to be at least 204,950.

    Cross-species network and transcript transfer

    Metabolic processes, signal transduction, gene regulation, as well as gene and protein expression are largely controlled by biological networks. High-throughput experiments allow the measurement of a wide range of cellular states and interactions. However, networks are often not known in detail for specific biological systems and conditions. Gene and protein annotations are often transferred from model organisms to the species of interest. Therefore, the question arises whether biological networks can be transferred between species or whether they are specific for individual contexts. In this thesis, the following aspects are investigated: (i) the conservation and (ii) the cross-species transfer of eukaryotic protein-interaction and gene regulatory (transcription factor-target) networks, as well as (iii) the conservation of alternatively spliced variants. In the simplest case, interactions can be transferred between species, based solely on the sequence similarity of the orthologous genes. However, such a transfer often results either in the transfer of only a few interactions (medium/high sequence similarity threshold) or in the transfer of many speculative interactions (low sequence similarity threshold). Thus, advanced network transfer approaches also consider the annotations of orthologous genes involved in the interaction transfer, as well as features derived from the network structure, in order to enable a reliable interaction transfer, even between phylogenetically very distant species. In this work, such an approach for the transfer of protein interactions is presented (COIN). COIN uses a sophisticated machine-learning model in order to label transferred interactions as either correctly transferred (conserved) or as incorrectly transferred (not conserved).
The comparison and the cross-species transfer of regulatory networks is more difficult than the transfer of protein interaction networks, as a huge fraction of the known regulations is only described in the (not machine-readable) scientific literature. In addition, compared to protein interactions, only a few conserved regulations are known, and regulatory elements appear to be strongly context-specific. In this work, the cross-species analysis of regulatory interaction networks is enabled with software tools and databases for global (ConReg) and thousands of context-specific (CroCo) regulatory interactions that are derived and integrated from the scientific literature, binding site predictions and experimental data. Genes and their protein products are the main players in biological networks. However, to date, the aspect is neglected that a gene can encode different proteins. These alternative proteins can differ strongly from each other with respect to their molecular structure, function and their role in networks. The identification of conserved and species-specific splice variants and the integration of variants in network models will allow a more complete cross-species transfer and comparison of biological networks. With ISAR we support the cross-species transfer and comparison of alternative variants by introducing a gene-structure aware (i.e. exon-intron structure aware) multiple sequence alignment approach for variants from orthologous and paralogous genes. The methods presented here and the appropriate databases allow the cross-species transfer of biological networks, the comparison of thousands of context-specific networks, and the cross-species comparison of alternatively spliced variants. 
Thus, they can be used as a starting point for the understanding of regulatory and signaling mechanisms in many biological systems.