Latent rank change detection for analysis of splice-junction microarrays with nonlinear effects
Alternative splicing of gene transcripts greatly expands the functional
capacity of the genome, and certain splice isoforms may indicate specific
disease states such as cancer. Splice junction microarrays interrogate
thousands of splice junctions, but data analysis is difficult and error prone
because of the increased complexity compared to differential gene expression
analysis. We present Rank Change Detection (RCD) as a method to identify
differential splicing events based upon a straightforward probabilistic model
comparing the over- or underrepresentation of two or more competing isoforms.
RCD has advantages over commonly used methods because it is robust to false
positive errors due to nonlinear trends in microarray measurements. Further,
RCD does not depend on prior knowledge of splice isoforms, yet it takes
advantage of the inherent structure of mutually exclusive junctions, and it is
conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD
specifically identifies the biologically important cases when a splice junction
becomes more or less prevalent compared to other mutually exclusive junctions.
The example data is from different cell lines of glioblastoma tumors assayed
with Agilent microarrays. Comment: Published at http://dx.doi.org/10.1214/10-AOAS389 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
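The abstract above says that RCD looks for junctions that become more or less prevalent relative to competing, mutually exclusive junctions, and that it is robust to nonlinear trends in the measurements. The following is only a minimal sketch of that intuition, not the probabilistic model of the paper: it ranks hypothetical junction intensities within each condition and reports which ranks changed. The junction names, the intensity values and the `saturate` transform are assumptions made for illustration.

```python
from typing import Dict


def rank_junctions(intensities: Dict[str, float]) -> Dict[str, int]:
    """Rank competing junctions by intensity; rank 1 is the most abundant."""
    ordered = sorted(intensities, key=intensities.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def rank_changes(cond_a: Dict[str, float], cond_b: Dict[str, float]) -> Dict[str, int]:
    """Return rank(cond_b) - rank(cond_a) for every junction whose rank differs."""
    ranks_a = rank_junctions(cond_a)
    ranks_b = rank_junctions(cond_b)
    return {
        name: ranks_b[name] - ranks_a[name]
        for name in ranks_a
        if ranks_a[name] != ranks_b[name]
    }


def saturate(x: float, k: float = 500.0) -> float:
    """Hypothetical monotone, nonlinear distortion of a measured intensity."""
    return x / (x + k)


# Hypothetical intensities for three mutually exclusive junctions of one gene.
condition_a = {"j1": 900.0, "j2": 300.0, "j3": 50.0}
condition_b = {"j1": 250.0, "j2": 800.0, "j3": 40.0}

# j1 and j2 swap places between the two conditions; j3 keeps its rank.
changes = rank_changes(condition_a, condition_b)

# A monotone distortion applied within one array leaves its ranks untouched.
distorted_a = {name: saturate(value) for name, value in condition_a.items()}
same_ranks = rank_junctions(distorted_a) == rank_junctions(condition_a)
```

Because ranks are invariant under any strictly increasing transformation, a rank-based comparison of this kind is unaffected by a saturating response within an array, which is one way to read the robustness claim in the abstract.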
Methods to study splicing from high-throughput RNA Sequencing data
The development of novel high-throughput sequencing (HTS) methods for RNA
(RNA-Seq) has provided a very powerful means to study splicing under multiple
conditions at unprecedented depth. However, the complexity of the information
to be analyzed has turned this into a challenging task. In the last few years,
a plethora of tools have been developed, allowing researchers to process
RNA-Seq data to study the expression of isoforms and splicing events, and their
relative changes under different conditions. We provide an overview of the
methods available to study splicing from short RNA-Seq data. We group the
methods according to the different questions they address: 1) Assignment of the
sequencing reads to their likely gene of origin. This is addressed by methods
that map reads to the genome and/or to the available gene annotations. 2)
Recovering the sequence of splicing events and isoforms. This is addressed by
transcript reconstruction and de novo assembly methods. 3) Quantification of
events and isoforms. Either after reconstructing transcripts or using an
annotation, many methods estimate the expression level or the relative usage of
isoforms and/or events. 4) Providing an isoform or event view of differential
splicing or expression. These include methods that compare relative
event/isoform abundance or isoform expression across two or more conditions. 5)
Visualizing splicing regulation. Various tools facilitate the visualization of
the RNA-Seq data in the context of alternative splicing. In this review, we do
not describe the specific mathematical models behind each method. Our aim is
rather to provide an overview that could serve as an entry point for users who
need to decide on a suitable tool for a specific analysis. We also attempt to
propose a classification of the tools according to the operations they do, to
facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections adde
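Points 3 and 4 of the review above concern estimating the relative usage of events and comparing it across conditions. A common summary for this is the "percent spliced in" (PSI) value, the fraction of event-supporting reads that support inclusion. The sketch below is a simplified illustration under stated assumptions: the read counts are hypothetical, and it ignores the length normalisation and statistical modelling that dedicated tools apply.

```python
from typing import Tuple


def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Fraction of event-supporting reads that support the inclusion form."""
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads support this event")
    return inclusion_reads / total


def delta_psi(cond_a: Tuple[int, int], cond_b: Tuple[int, int]) -> float:
    """Change in PSI from condition A to condition B for one event."""
    return psi(*cond_b) - psi(*cond_a)


# Hypothetical (inclusion, exclusion) read counts for one exon-skipping event.
control = (30, 70)
treated = (60, 40)
change = delta_psi(control, treated)
```

A positive `change` means the inclusion form is relatively more used in the second condition. Whether such a difference is significant is a separate question that this sketch does not address.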
The Expanding Landscape of Alternative Splicing Variation in Human Populations.
Alternative splicing is a tightly regulated biological process by which the number of gene products for any given gene can be greatly expanded. Genomic variants in splicing regulatory sequences can disrupt splicing and cause disease. Recent developments in sequencing technologies and computational biology have allowed researchers to investigate alternative splicing at an unprecedented scale and resolution. Population-scale transcriptome studies have revealed many naturally occurring genetic variants that modulate alternative splicing and consequently influence phenotypic variability and disease susceptibility in human populations. Innovations in experimental and computational tools such as massively parallel reporter assays and deep learning have enabled the rapid screening of genomic variants for their causal impacts on splicing. In this review, we describe technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing. We summarize major findings from population transcriptomic studies of alternative splicing and discuss the implications of these findings for human genetics and medicine
Tools and strategies for RNA-sequencing data analysis
RNA-Sequencing (RNA-seq) has enabled the in-depth study of the transcriptome, becoming the primary research method in the field of molecular biology. The typical aim of RNA-seq is to quantify and detect differentially expressed (DE) and differentially spliced (DS) genes. Numerous methodologies and tools have been developed in recent years to assist in analyzing RNA-seq data. However, it is difficult for researchers to decide which methods or strategies they should adopt to optimize the analysis of their datasets.
In this thesis, in Study I, we applied a gene-level DE analysis to detect androgen-regulated genes between cancerous and benign samples from 48 primary prostate cancer patients. Combined with other measurements from the same samples, our analysis indicated that patients with the TMPRSS2-ERG gene fusion had distinct intratumoral androgen profiles compared to TMPRSS2-ERG-negative tumors. However, DE can remain undetected when expression varies across the gene, for example because of alternative splicing. To address this problem, an alternative analysis approach has been suggested in which statistical testing is first performed on lower-level features (e.g. transcripts, transcript compatibility counts, or exons) and the results are then aggregated to the gene level. In Study II, we tested this alternative approach on these lower-level features and compared the results to those from the conventional gene-level approach. Two methods, the Lancaster method and the empirical Brown's method (EBM), were tested for aggregating the feature-level results to gene-level results. Our results suggest that exon-level estimates improve the detection of DE genes when the EBM is used for aggregation. Accordingly, the R/Bioconductor package EBSEA was developed using the best-performing approach.
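The aggregation step described above, combining feature-level p-values into one gene-level p-value, can be illustrated with Fisher's method, which is the special case of the Lancaster method in which every feature has a weight of 2 degrees of freedom. This is a sketch under assumptions, not the procedure implemented in EBSEA: it treats the p-values as independent, whereas exon-level tests within one gene are typically correlated, which is the situation Brown's method is designed for. The example p-values are hypothetical.

```python
import math
from typing import Iterable


def chi2_sf_even_df(x: float, df: int) -> float:
    """Survival function of a chi-square variable with an even number of
    degrees of freedom, using the closed-form finite sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values: Iterable[float]) -> float:
    """Combine independent p-values with Fisher's method
    (Lancaster's method with all weights equal to 2)."""
    ps = list(p_values)
    if not ps:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in ps):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in ps)
    return chi2_sf_even_df(statistic, 2 * len(ps))


# Hypothetical exon-level p-values for a single gene.
exon_p_values = [0.01, 0.02, 0.5]
gene_p_value = fisher_combine(exon_p_values)
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the implementation. Unequal weights, as allowed by the Lancaster method, would need a general chi-square quantile function and are left out here to keep the sketch within the standard library.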
RNA-seq data can also be used to find DS events between conditions. However, the detection of DS is more challenging than the detection of DE. In Study III, a comprehensive comparison of ten DS tools was performed. We concluded that the exon-based and event-based methods (rMATS and MAJIQ) performed best overall across the evaluation metrics considered. Furthermore, we observed low overall concordance between the results reported by the different tools, so it is advisable to use more than one tool when performing DS analysis and to concentrate on the overlapping results.
Tools and strategies for the analysis of RNA-sequencing data
RNA sequencing (RNA-seq) has enabled detailed examination of the transcriptome and has become a very popular tool in molecular biology research. The typical purpose of RNA-seq studies is to identify genes that are differentially expressed or differentially spliced between sample groups. Numerous tools have been developed for analyzing RNA-seq data, and it is often challenging to choose among them the optimal ones for processing a given dataset.
In this dissertation, Publication I identified androgen-regulated genes that are differentially expressed between cancerous and healthy tissue in 48 prostate cancer patients. When these results were combined with other measurements available from the same patients, the expression profile of androgen-related genes in the cancer tissue of patients carrying the TMPRSS2-ERG gene fusion was found to differ from that of patients in whom no such fusion was found. With this approach, however, differential expression may go undetected for some genes if the expression level varies across different parts of the gene, for example because of alternative splicing. As a solution, a new approach has been proposed in which statistical testing between sample groups is performed at a finer level of gene structure (for example at the level of transcripts, transcript compatibility counts, or exons), and only then are the partial results combined into an overall gene-level result. Publication II compared this approach with conventional gene-level analysis by testing two methods for aggregating the results back to the gene level: 1) the Lancaster method and 2) the empirical Brown's method (EBM). Based on the results, using exon-level measurements combined with the EBM improved the detection of differentially expressed genes. This approach has been incorporated into EBSEA, an R/Bioconductor package for differential gene expression analysis developed in the dissertation.
RNA-seq data can also be used to identify differential splicing events between sample groups. This is, however, more challenging than the analysis of differential gene expression. Publication III compared ten tools developed for detecting differential splicing events. Of these, the exon-based and event-based tools (particularly rMATS and MAJIQ) produced the best overall results under the evaluation criteria used. However, considerable differences were observed between the results produced by the tools, so in downstream analysis it is useful to focus on the results that are found by more than one tool
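The recommendation above, to run more than one differential splicing tool and concentrate on the overlapping results, amounts to a simple vote over the gene or event lists. The sketch below assumes that each tool's output has already been reduced to a set of identifiers; the tool labels and gene names are hypothetical placeholders, and no real tool's output format is parsed here.

```python
from collections import Counter
from typing import Dict, Set


def consensus_hits(results: Dict[str, Set[str]], min_tools: int = 2) -> Set[str]:
    """Return the identifiers reported by at least `min_tools` tools."""
    if min_tools < 1:
        raise ValueError("min_tools must be at least 1")
    counts = Counter(
        identifier for hits in results.values() for identifier in hits
    )
    return {identifier for identifier, n in counts.items() if n >= min_tools}


# Hypothetical significant genes reported by three tools.
tool_results = {
    "tool_a": {"GENE1", "GENE2", "GENE3"},
    "tool_b": {"GENE2", "GENE3", "GENE4"},
    "tool_c": {"GENE3", "GENE5"},
}

reported_by_two_or_more = consensus_hits(tool_results, min_tools=2)
reported_by_all = consensus_hits(tool_results, min_tools=len(tool_results))
```

Raising `min_tools` trades sensitivity for agreement. With low concordance between tools, as reported in the abstract, the strict intersection can become very small.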
The PSI-U1 snRNP interaction regulates male mating behavior in Drosophila
Alternative pre-mRNA splicing (AS) is a critical regulatory mechanism that operates extensively in the nervous system to produce diverse protein isoforms. Fruitless AS isoforms have been shown to influence male courtship behavior, but the underlying mechanisms are unknown. Using genome-wide approaches and quantitative behavioral assays, we show that the P-element somatic inhibitor (PSI) and its interaction with the U1 small nuclear ribonucleoprotein complex (snRNP) control male courtship behavior. PSI mutants lacking the U1 snRNP-interacting domain (PSIΔAB mutant) exhibit extended but futile mating attempts. The PSIΔAB mutant results in significant changes in the AS patterns of ∼1,200 genes in the Drosophila brain, many of which have been implicated in the regulation of male courtship behavior. PSI directly regulates the AS of at least one-third of these transcripts, suggesting that PSI-U1 snRNP interactions coordinate the behavioral network underlying courtship behavior. Importantly, one of these direct targets is fruitless, the master regulator of courtship. Thus, PSI imposes a specific mode of regulatory control within the neuronal circuit controlling courtship, even though it is broadly expressed in the fly nervous system. This study reinforces the importance of AS in the control of gene activity in neurons and integrated neuronal circuits, and provides a surprising link between a pleiotropic pre-mRNA splicing pathway and the precise control of successful male mating behavior
Probing Isoform Switching Events in Various Cancer Types: Lessons From Pan-Cancer Studies
Alternative splicing is an essential regulatory mechanism for gene expression in mammalian cells, contributing to protein, cellular, and species diversity. In cancer, alternative splicing is frequently disturbed, leading to changes in the expression of alternatively spliced protein isoforms. Advances in sequencing technologies and analysis methods have led to new insights into the extent and functional impact of disturbed alternative splicing events. In this review, we give a brief overview of the molecular mechanisms driving alternative splicing, highlight the function of alternative splicing in healthy tissues and describe how alternative splicing is disrupted in cancer. We summarize currently available computational tools for analyzing differential transcript usage, isoform switching events, and the pathogenic impact of cancer-specific splicing events. Finally, the strategies of three recent pan-cancer studies on isoform switching events are compared. Their methodological similarities and discrepancies are highlighted and lessons learned from the comparison are listed. We hope that our assessment will lead to new and more robust methods for cancer-specific transcript detection and help to produce more accurate functional impact predictions of isoform switching events
Genome-Wide Profiling of Circular RNAs in the Rapidly Growing Shoots of Moso Bamboo (Phyllostachys edulis).
Circular RNAs, including circular exonic RNAs (circRNA), circular intronic RNAs (ciRNA) and exon-intron circRNAs (EIciRNAs), are a new type of noncoding RNAs. Growing shoots of moso bamboo (Phyllostachys edulis) represent an excellent model of fast growth, but their circular RNAs have not yet been studied. To understand the potential regulation of circular RNAs, we systematically characterized circular RNAs from eight different developmental stages of rapidly growing shoots. We identified 895 circular RNAs, including a subset of mutually inclusive circRNAs. These circular RNAs were generated from 759 corresponding parental coding genes involved in cellulose, hemicellulose and lignin biosynthetic processes. Gene co-expression analysis revealed that hub genes, such as DEFECTIVE IN RNA-DIRECTED DNA METHYLATION 1 (DRD1), MAINTENANCE OF METHYLATION (MOM), dicer-like 3 (DCL3) and ARGONAUTE 1 (AGO1), were significantly enriched among the genes giving rise to circular RNAs. According to transcriptome sequencing, the expression levels of these circular RNAs correlated with those of their linear counterparts. Further protoplast transformation experiments indicated that overexpressing circ-bHLH93, which is generated from a transcription factor gene, decreased its linear transcript. Finally, the expression profiles suggested that circular RNAs may interact with miRNAs to regulate their cognate linear mRNAs, which was further supported by the finding that overexpressing miRNA156 decreased the transcripts of circ-TRF-1 and the linear transcripts of TRF-1. Taken together, the overall profile of circular RNAs provides new insight into an unexplored category of long noncoding RNA regulation in moso bamboo
Bayesian Modeling Approaches for Temporal Dynamics in RNA-seq Data
Over the last decades, differential expression analysis has played a central role in addressing a variety of biological questions by characterizing abnormal patterns of cellular and molecular function. More recently, the identification of differentially expressed genes and isoforms has focused increasingly on temporal dynamics over a series of time points. Bayesian strategies have been successfully employed to uncover complex biological signals across various high-throughput data platforms, for instance in methods for differential expression analysis and network modules in transcriptome data, peak callers for ChIP-seq data, target prediction for microRNA data, and meta-methods across different platforms. In this chapter, we discuss how our methodological work based on Bayesian models addresses important questions arising in the architecture of temporal dynamics in RNA-seq data