    Latent rank change detection for analysis of splice-junction microarrays with nonlinear effects

    Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over- or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data are from different cell lines of glioblastoma tumors assayed with Agilent microarrays. Comment: Published at http://dx.doi.org/10.1214/10-AOAS389 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Methods to study splicing from high-throughput RNA Sequencing data

    The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

    Tools and strategies for RNA-sequencing data analysis

    RNA-Sequencing (RNA-seq) has enabled the in-depth study of the transcriptome, becoming the primary research method in the field of molecular biology. The typical aim of RNA-seq is to quantify and detect differentially expressed (DE) and differentially spliced (DS) genes. Numerous methodologies and tools have been developed in recent years to assist in analyzing RNA-seq data. However, it is difficult for researchers to decide which methods or strategies they should adopt to optimize the analysis of their datasets. In this thesis, in Study I, we applied the gene-level DE analysis approach to detect the androgen-regulated genes between cancerous and benign samples in 48 primary prostate cancer patients. Combined with other measurements from the same samples, our analysis indicated that patients having TMPRSS-ERG gene fusion had distinct intratumoral androgen profiles compared to TMPRSS-ERG negative tumors. However, DE can remain undetected when the expression varies across the gene, for example because of alternative splicing. To account for this problem, an alternate analysis approach has been suggested in which statistical testing is first performed on lower-level features (e.g. transcripts, transcript compatibility counts, or exons), and the results are then aggregated to the gene level. In Study II, we tested this alternate approach on these lower-level features and compared the results to those from the conventional gene-level approach. In the alternate approach, two methods (the Lancaster method and the empirical Brown method (ebm)) were tested for aggregating the feature-level results to gene-level results. Our results suggest that the exon-level estimates improve the detection of DE genes when the ebm method is used for aggregating the results. Accordingly, the R/Bioconductor package EBSEA was developed using the winning approach. RNA-seq data can also be used to find DS events between conditions. However, the detection of DS is more challenging than the detection of DE. In Study III, a comprehensive comparison of ten DS tools was performed. We concluded that exon-based and event-based methods (rMATS and MAJIQ) performed best overall across the different evaluation metrics considered. Furthermore, we observed low overall concordance between the results reported by the different tools, so it is advisable to use more than one tool when performing DS analysis and to concentrate on the overlapping results.
    Tools and strategies for RNA-sequencing data analysis (abstract translated from Finnish). RNA sequencing (RNA-seq) has enabled detailed examination of the transcriptome and has become a very popular tool in molecular biology research. The typical purpose of RNA-seq studies is to identify genes that are differentially expressed and differentially spliced between sample groups. Numerous tools have been developed for analyzing RNA-seq data, and it is often challenging to choose from among them the optimal ones for a given dataset. In this thesis, Study I identified androgen-regulated genes differentially expressed between cancerous and healthy tissue in 48 prostate cancer patients. When these results were combined with other measurements available from the same patients, it was observed that the androgen gene expression profile of the cancer tissue in patients carrying the TMPRSS-ERG gene fusion differed from that of patients in whom the fusion was not found. However, with this approach differential expression may go undetected for some genes if the expression level varies across different parts of the gene, for example because of alternative splicing. As a solution, a new approach has been proposed in which statistical testing between sample groups is performed at a finer level of gene structure (for example at the level of transcripts, transcript compatibility counts, or exons), and only then are the resulting partial results combined into an overall gene-level result. Study II compared this approach to conventional gene-level analysis by testing two methods for aggregating results back to the gene level: 1) the Lancaster method and 2) the empirical Brown method (ebm). According to the results, using exon-level measurements combined with the ebm method improved the detection of differentially expressed genes. This approach has been incorporated into EBSEA, an R/Bioconductor package for differential gene expression analysis developed in the thesis. RNA-seq data can also be used to identify differential splicing events between sample groups. This is, however, more challenging than differential expression analysis. Study III compared ten tools developed for detecting differential splicing events. Of these, exon-based and event-based tools (especially rMATS and MAJIQ) produced the best overall results on the evaluation criteria used. Substantial differences were nevertheless observed between the results produced by the tools, so in follow-up analysis it is useful to focus on results that can be found with more than one tool.
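
    The idea of testing at the feature level and then aggregating to the gene level can be illustrated with a minimal sketch. It does not reproduce the thesis or the EBSEA package: it uses Fisher's method, the unweighted special case of the Lancaster method, and it assumes the feature-level p-values are independent, which exons of the same gene generally are not (that dependence is what the empirical Brown method is designed to handle). The function names and the input layout are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method (illustrative only)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p); we work with X / 2.
    half_stat = -sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom has the
    # closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def aggregate_to_genes(feature_pvalues):
    """Group (gene, p-value) pairs by gene and combine each group."""
    by_gene = {}
    for gene, p in feature_pvalues:
        by_gene.setdefault(gene, []).append(p)
    return {gene: fisher_combine(ps) for gene, ps in by_gene.items()}
```

    With a single feature the combined value equals the input p-value, so a gene with one exon is treated exactly as in a gene-level test; with several features, consistent weak signals accumulate into a smaller gene-level p-value.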

    The PSI-U1 snRNP interaction regulates male mating behavior in Drosophila

    Alternative pre-mRNA splicing (AS) is a critical regulatory mechanism that operates extensively in the nervous system to produce diverse protein isoforms. Fruitless AS isoforms have been shown to influence male courtship behavior, but the underlying mechanisms are unknown. Using genome-wide approaches and quantitative behavioral assays, we show that the P-element somatic inhibitor (PSI) and its interaction with the U1 small nuclear ribonucleoprotein complex (snRNP) control male courtship behavior. PSI mutants lacking the U1 snRNP-interacting domain (PSIΔAB mutant) exhibit extended but futile mating attempts. The PSIΔAB mutant results in significant changes in the AS patterns of ∼1,200 genes in the Drosophila brain, many of which have been implicated in the regulation of male courtship behavior. PSI directly regulates the AS of at least one-third of these transcripts, suggesting that PSI-U1 snRNP interactions coordinate the behavioral network underlying courtship behavior. Importantly, one of these direct targets is fruitless, the master regulator of courtship. Thus, PSI imposes a specific mode of regulatory control within the neuronal circuit controlling courtship, even though it is broadly expressed in the fly nervous system. This study reinforces the importance of AS in the control of gene activity in neurons and integrated neuronal circuits, and provides a surprising link between a pleiotropic pre-mRNA splicing pathway and the precise control of successful male mating behavior

    Probing Isoform Switching Events in Various Cancer Types: Lessons From Pan-Cancer Studies

    Alternative splicing is an essential regulatory mechanism for gene expression in mammalian cells, contributing to protein, cellular, and species diversity. In cancer, alternative splicing is frequently disturbed, leading to changes in the expression of alternatively spliced protein isoforms. Advances in sequencing technologies and analysis methods have led to new insights into the extent and functional impact of disturbed alternative splicing events. In this review, we give a brief overview of the molecular mechanisms driving alternative splicing, highlight the function of alternative splicing in healthy tissues, and describe how alternative splicing is disrupted in cancer. We summarize currently available computational tools for analyzing differential transcript usage, isoform switching events, and the pathogenic impact of cancer-specific splicing events. Finally, the strategies of three recent pan-cancer studies on isoform switching events are compared. Their methodological similarities and discrepancies are highlighted, and lessons learned from the comparison are listed. We hope that our assessment will lead to new and more robust methods for cancer-specific transcript detection and help to produce more accurate functional impact predictions of isoform switching events.

    Bayesian Modeling Approaches for Temporal Dynamics in RNA-seq Data

    Differential expression analysis has played a central role over recent decades in addressing a variety of biological questions by characterizing abnormal patterns of cellular and molecular function. To date, the identification of differentially expressed genes and isoforms has focused increasingly on temporal dynamics across a series of time points. Bayesian strategies have been applied successfully, from both methodological and analytical perspectives, to various high-throughput data platforms: for instance, methods for differential expression analysis and network modules in transcriptome data, peak callers for ChIP-Seq data, target prediction for microRNA data, and meta-methods that span different platforms. In this chapter, we discuss how our methodological work based on Bayesian models addresses important questions arising in the architecture of temporal dynamics in RNA-seq data.