32 research outputs found

    Bioinformatics Analyses of Alternative Splicing: Prediction of alternative splicing events in animals and plants using Machine Learning and analysis of the extent and conservation of subtle alternative splicing

    Alternative splicing (AS) is a mechanism by which a multi-exon gene can express different transcripts and thus different proteins. AS contributes substantially to the complexity and diversity of eukaryotic transcriptomes and proteomes. Over the past ten years, bioinformatics has made decisive contributions to our understanding of AS with respect to the prevalence, extent, and conservation of its different classes, as well as its evolution, regulation, and biological function. Large-scale detection of AS has mostly relied on genome- and transcriptome-wide alignment of EST and mRNA data and on microarray analyses, both of which depend largely on bioinformatics methods. These have been complemented by computational approaches for characterizing and predicting AS, which show how constitutive and alternative splice sites and exons differ. This dissertation presents bioinformatics analyses of selected aspects of AS. In the first part, I developed methods for predicting AS without relying on datasets of expressed sequences. In particular, I refined approaches for predicting cassette exons using Bayesian networks (BNs) and established new discriminative features. These clearly improved the true positive rate from the published 50% to 61% at a stringent false positive rate of only 0.5%. I could show that exons labelled as constitutive, but assigned a high probability of being alternative by the BN, were in fact confirmed as alternative by the latest expression data. With the same datasets and features, the performance of a BN matches that of a published support vector machine (SVM), which indicates that reliable classification results depend more on the features than on the choice of classifier.
In the second part, I extended the BN approach to a large and evolutionarily widespread class of AS events, known as NAGNAG tandem splice sites, in which the alternative splice sites are separated by only 3 nucleotides (nt). Careful compilation of the training and test datasets for predicting NAGNAG AS contributed to a balanced sensitivity and specificity of 92%. Predictions from a BN trained on the combined dataset were experimentally confirmed in 81% (38/47) of cases. This study thereby generated one of the largest current datasets for the experimental verification of AS predictions. A BN trained on human data achieves similarly good results on four other vertebrate genomes. Only slight losses in the predictions for Drosophila melanogaster and Caenorhabditis elegans indicate that the underlying splicing mechanism appears to be conserved across large evolutionary distances. Finally, I used the prediction accuracy from the experimental validation to estimate the number of as yet undiscovered alternative NAGNAGs. The results suggest that the mechanism of NAGNAG AS is simple, stochastic, and conserved among vertebrates and beyond. Furthermore, I applied the BN approach to characterize and predict NAGNAG AS in Physcomitrella patens, a moss. This is one of the first studies to predict AS in plants without relying on datasets of expressed sequences. We obtained results similar to those in our other work on predicting NAGNAG AS. Independent validation using 454 next-generation sequencing data showed true positive rates of 64%-79% for well-supported cases of NAGNAG AS. The mechanism of NAGNAG AS in plants thus appears to resemble that in animals

    Improved identification of conserved cassette exons using Bayesian networks

    Background: Alternative splicing is a major contributor to the diversity of eukaryotic transcriptomes and proteomes. Currently, large-scale detection of alternative splicing using expressed sequence tags (ESTs) or microarrays does not capture all alternative splicing events. Moreover, for many species genomic data is being produced at a far greater rate than corresponding transcript data, hence in silico methods of predicting alternative splicing have to be improved. Results: Here, we show that the use of Bayesian networks (BNs) allows accurate prediction of evolutionarily conserved exon skipping events. At a stringent false positive rate of 0.5%, our BN achieves an improved true positive rate of 61%, compared to a previously reported 50% on the same dataset using support vector machines (SVMs). The improvement comes from incorporating several novel discriminative features, such as intronic splicing regulatory elements. Features related to mRNA secondary structure increase the prediction performance, corroborating previous findings that secondary structures are important for exon recognition. Random labelling tests rule out overfitting. Cross-validation on another dataset confirms the increased performance. When using the same dataset and the same set of features, the BN matches the performance of an SVM in earlier literature. Remarkably, we could show that about half of the exons which are labelled constitutive but receive a high probability of being alternative by the BN are in fact alternative exons according to the latest EST data. Finally, we predict exon skipping without using conservation-based features, and achieve a true positive rate of 29% at a false positive rate of 0.5%. Conclusion: BNs can be used to achieve accurate identification of alternative exons and provide clues about possible dependencies between relevant features. The near-identical performance of the BN and SVM when using the same features shows that good classification depends more on features than on the choice of classifier. Conservation-based features continue to be the most informative, and hence distinguishing alternative exons from constitutive ones without using conservation-based features remains a challenging problem.
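    The figures above are true positive rates measured at a fixed false positive rate of 0.5%. As a minimal sketch of how such an operating point can be read off a set of classifier scores, the following assumes a list of scores (higher meaning "more likely alternative") and binary labels; the function name `tpr_at_fpr` and the toy data are illustrative assumptions and are not taken from the paper.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.005):
    """Highest true positive rate reachable with a false positive rate <= max_fpr.

    scores: classifier outputs, higher means "more likely alternative"
    labels: 1 for alternative exons, 0 for constitutive exons
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    best_tpr = 0.0
    # Try each distinct score as a threshold, from strictest to most lenient.
    # An exon is predicted alternative when its score is >= the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / n_neg > max_fpr:
            break  # lowering the threshold further can only add false positives
        best_tpr = tp / n_pos
    return best_tpr


# Toy data (made up for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0))
```

    Fixing the false positive rate first and then reporting the true positive rate makes two classifiers comparable at the same stringency, which is how the 61% versus 50% comparison is framed.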

    TassDB2 - A comprehensive database of subtle alternative splicing events

    Background: Subtle alternative splicing events involving tandem splice sites separated by a short (2-12 nucleotides) distance are frequent and evolutionarily widespread in eukaryotes, and a major contributor to the complexity of transcriptomes and proteomes. However, these events have either been omitted altogether from databases on alternative splicing, or only the cases of experimentally confirmed alternative splicing have been reported. Thus, a database which covers all confirmed cases of subtle alternative splicing as well as the numerous putative tandem splice sites (which might be confirmed once more transcript data becomes available), and which allows users to search for tandem splice sites with specific features and download the results, is a valuable resource for targeted experimental studies and large-scale bioinformatics analyses of tandem splice sites. Towards this goal we recently set up TassDB (Tandem Splice Site DataBase, version 1), which stores data about alternative splicing events at tandem splice sites separated by 3 nt in eight species. Description: We have substantially revised and extended TassDB. The currently available version 2 contains extensive information about tandem splice sites separated by 2-12 nt for the human and mouse transcriptomes, including data on the conservation of the tandem motifs in five vertebrates. TassDB2 offers a user-friendly interface to search for specific genes or for genes containing tandem splice sites with specific features, as well as the possibility to download result datasets. For example, users can search for cases of alternative splicing where the proportion of EST/mRNA evidence supporting the minor isoform exceeds a specific threshold, or where the difference in splice site scores is specified by the user. The predicted impact of each event on the protein is also reported, along with information about being a putative target for the nonsense-mediated decay (NMD) pathway. Links are provided to the UCSC genome browser and other external resources. Conclusion: TassDB2, available via http://www.tassdb.info, provides comprehensive resources for researchers interested in both targeted experimental studies and large-scale bioinformatics analyses of short distance tandem splice sites. doi: 10.1186/1471-2105-11-216
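    The search described above (minor-isoform support exceeding a threshold) can be sketched on local records as follows. This is a hedged illustration only: the `TandemSite` record, its field names, the gene labels, and the helper functions are hypothetical and do not reflect the actual TassDB2 schema or interface. The frame check relies only on arithmetic: a tandem distance that is a multiple of 3 nt leaves the reading frame unchanged.

```python
from dataclasses import dataclass


@dataclass
class TandemSite:
    # All field names are hypothetical, chosen for this sketch.
    gene: str
    distance_nt: int      # distance between the two tandem splice sites (2-12 nt)
    est_upstream: int     # EST/mRNA count supporting the upstream site
    est_downstream: int   # EST/mRNA count supporting the downstream site


def minor_fraction(site):
    """Fraction of transcript evidence supporting the less-used isoform."""
    total = site.est_upstream + site.est_downstream
    if total == 0:
        return 0.0  # no transcript evidence at all
    return min(site.est_upstream, site.est_downstream) / total


def preserves_frame(site):
    """True when the two isoforms differ by a multiple of 3 nt."""
    return site.distance_nt % 3 == 0


def select_sites(sites, min_minor=0.1):
    """Keep sites whose minor-isoform fraction exceeds min_minor."""
    return [s for s in sites if minor_fraction(s) > min_minor]


# Made-up records for illustration.
sites = [
    TandemSite("GENE_A", 3, 90, 10),
    TandemSite("GENE_B", 4, 50, 50),
    TandemSite("GENE_C", 6, 100, 0),
]
for s in select_sites(sites, min_minor=0.05):
    print(s.gene, round(minor_fraction(s), 2), preserves_frame(s))
```

    Sites with no minor-isoform evidence, such as the third record, correspond to the putative tandem splice sites that the database lists alongside confirmed cases.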

    The molecular diversity of Luminal A breast tumors

    Breast cancer is a collection of diseases with distinct molecular traits, prognosis, and therapeutic options. Luminal A breast cancer is the most heterogeneous, both molecularly and clinically. Using genomic data from over 1,000 Luminal A tumors from multiple studies, we analyzed the copy number and mutational landscape of this tumor subtype. This integrated analysis revealed four major subtypes defined by distinct copy-number and mutation profiles. We identified an atypical Luminal A subtype characterized by high genomic instability, TP53 mutations, and increased Aurora kinase signaling; these genomic alterations lead to a worse clinical prognosis. Aberrations of chromosomes 1, 8, and 16, together with PIK3CA, GATA3, AKT1, and MAP3K1 mutations drive the other subtypes. Finally, an unbiased pathway analysis revealed multiple rare, but mutually exclusive, alterations linked to loss of activity of co-repressor complexes N-Cor and SMRT. These rare alterations were the most prevalent in Luminal A tumors and may predict resistance to endocrine therapy. Our work provides for a further molecular stratification of Luminal A breast tumors, with potential direct clinical implications. Electronic supplementary material: The online version of this article (doi:10.1007/s10549-013-2699-3) contains supplementary material, which is available to authorized users

    Accurate prediction of NAGNAG alternative splicing

    Alternative splicing (AS) involving NAGNAG tandem acceptors is an evolutionarily widespread class of AS. Recent predictions of alternative acceptor usage reported better results for acceptors separated by larger distances than for NAGNAGs. To improve the latter, we aimed at the use of Bayesian networks (BN), and extensive experimental validation of the predictions. Using carefully constructed training and test datasets, a balanced sensitivity and specificity of ≥92% was achieved. A BN trained on the combined dataset was then used to make predictions, and 81% (38/47) of the experimentally tested predictions were verified. Using a BN learned on human data on six other genomes, we show that while the performance for the vertebrate genomes matches that achieved on human data, there is a slight drop for Drosophila and worm. Lastly, using the prediction accuracy according to experimental validation, we estimate the number of yet undiscovered alternative NAGNAGs. State of the art classifiers can produce highly accurate prediction of AS at NAGNAGs, indicating that we have identified the major features of the ‘NAGNAG-splicing code’ within the splice site and its immediate neighborhood. Our results suggest that the mechanism behind NAGNAG AS is simple, stochastic, and conserved among vertebrates and beyond
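    For readers unfamiliar with the motif, a NAGNAG tandem acceptor consists of two AG dinucleotides, each preceded by an arbitrary nucleotide N, so the two candidate acceptor sites lie 3 nt apart. The sketch below only locates the sequence pattern in a DNA string; it is an illustrative assumption and not the Bayesian network classifier from the paper. The function name `find_nagnag` is made up here, and a real analysis would restrict the search to annotated intron 3' ends.

```python
import re

# Lookahead so that overlapping motifs (e.g. CAGCAGCAG) are all reported.
NAGNAG_PATTERN = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(sequence):
    """Return (start_index, motif) for every NAGNAG occurrence in a DNA string."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in NAGNAG_PATTERN.finditer(sequence)]


# Toy sequence (made up for illustration).
print(find_nagnag("ttCAGCAGgt"))

# The validation rate quoted in the abstract is plain arithmetic:
print(round(38 / 47 * 100))  # 81
```

    Finding the motif is the easy part; the abstract's contribution is predicting which of these sites are actually used alternatively, which the pattern match alone cannot tell.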

    Comprehensive molecular profiling of lung adenocarcinoma: The cancer genome atlas research network

    Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis

    Multiplatform Analysis of 12 Cancer Types Reveals Molecular Classification within and across Tissues of Origin

    Recent genomic analyses of pathologically-defined tumor types identify “within-a-tissue” disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head & neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multi-platform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All datasets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies

    doi:10.1093/nar/gkp220 Accurate prediction of NAGNAG alternative splicing

    Alternative splicing (AS) involving NAGNAG tandem acceptors is an evolutionarily widespread class of AS. Recent predictions of alternative acceptor usage reported better results for acceptors separated by larger distances than for NAGNAGs. To improve the latter, we aimed at the use of Bayesian networks (BN), and extensive experimental validation of the predictions. Using carefully constructed training and test datasets, a balanced sensitivity and specificity of 92% was achieved. A BN trained on the combined dataset was then used to make predictions, and 81% (38/47) of the experimentally tested predictions were verified. Using a BN learned on human data on six other genomes, we show that while the performance for the vertebrate genomes matches that achieved on human data, there is a slight drop for Drosophila and worm. Lastly, using the prediction accuracy according to experimental validation, we estimate the number of yet undiscovered alternative NAGNAGs. State of the art classifiers can produce highly accurate prediction of AS at NAGNAGs, indicating that we have identified the major features of the ‘NAGNAG-splicing code’ within the splice site and its immediate neighborhood. Our results suggest that the mechanism behind NAGNAG AS is simple, stochastic, and conserved among vertebrates and beyond