Identification and visualization of differential isoform expression in RNA-seq time series
Motivation: As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data. Results: Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as of genes with a shift in major expressed isoform. Availability and implementation: The package is freely available under the LGPL license from the Bioconductor website. This work was supported by EU FP7 STATegra project agreement [306000] and the Spanish Ministry of Economy and Competitiveness [BIO2012-40244 and BIO2015-71658-R].
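To make "a shift in major expressed isoform" concrete, here is a minimal Python sketch. It assumes per-isoform expression values for one gene are already summarized at each time point. The function names are hypothetical, and this simple per-time-point argmax comparison is not Iso-maSigPro's regression-based procedure; it only illustrates what a major-isoform switch means.

```python
from typing import Dict, List, Optional, Tuple


def major_isoform_per_time(expr: Dict[str, List[float]]) -> List[str]:
    """Return the most highly expressed isoform at each time point (hypothetical helper)."""
    isoforms = list(expr)
    n_times = len(expr[isoforms[0]])
    majors = []
    for t in range(n_times):
        # max() returns the first isoform (in dict order) when there are ties
        majors.append(max(isoforms, key=lambda iso: expr[iso][t]))
    return majors


def major_isoform_shift(expr: Dict[str, List[float]]) -> Optional[Tuple[int, str, str]]:
    """Return (time index, previous major, new major) for the first switch, or None."""
    majors = major_isoform_per_time(expr)
    for t in range(1, len(majors)):
        if majors[t] != majors[t - 1]:
            return (t, majors[t - 1], majors[t])
    return None
```

For example, with `{"tx1": [10, 8, 3, 2], "tx2": [2, 5, 9, 12]}` the major isoform is `tx1` at the first two time points and `tx2` afterwards, so the first switch is reported at time index 2.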
Methods to study splicing from high-throughput RNA Sequencing data
The development of novel high-throughput sequencing (HTS) methods for RNA
(RNA-Seq) has provided a very powerful means to study splicing under multiple
conditions at unprecedented depth. However, the complexity of the information
to be analyzed has turned this into a challenging task. In the last few years,
a plethora of tools have been developed, allowing researchers to process
RNA-Seq data to study the expression of isoforms and splicing events, and their
relative changes under different conditions. We provide an overview of the
methods available to study splicing from short RNA-Seq data. We group the
methods according to the different questions they address: 1) Assignment of the
sequencing reads to their likely gene of origin. This is addressed by methods
that map reads to the genome and/or to the available gene annotations. 2)
Recovering the sequence of splicing events and isoforms. This is addressed by
transcript reconstruction and de novo assembly methods. 3) Quantification of
events and isoforms. Either after reconstructing transcripts or using an
annotation, many methods estimate the expression level or the relative usage of
isoforms and/or events. 4) Providing an isoform or event view of differential
splicing or expression. These include methods that compare relative
event/isoform abundance or isoform expression across two or more conditions. 5)
Visualizing splicing regulation. Various tools facilitate the visualization of
the RNA-Seq data in the context of alternative splicing. In this review, we do
not describe the specific mathematical models behind each method. Our aim is
rather to provide an overview that could serve as an entry point for users who
need to decide on a suitable tool for a specific analysis. We also attempt to
classify the tools according to the operations they perform, to facilitate the
comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections added
Striking circadian neuron diversity and cycling of Drosophila alternative splicing.
Although alternative pre-mRNA splicing (AS) significantly diversifies the neuronal proteome, the extent of AS is still unknown, due in part to the large number of diverse cell types in the brain. To address this complexity issue, we used an annotation-free computational method to analyze and compare the AS profiles between small specific groups of Drosophila circadian neurons. The method, the Junction Usage Model (JUM), allows the comprehensive profiling of both known and novel AS events from specific RNA-seq libraries. The results show that many diverse and novel pre-mRNA isoforms are preferentially expressed in one class of clock neuron and are also absent from the more standard Drosophila head RNA preparation. These AS events are enriched in potassium channels important for neuronal firing, and there are also cycling isoforms with no detectable underlying transcriptional oscillations. The results suggest massive AS regulation in the brain that is also likely important for circadian regulation.
3D RNA-seq: A powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists
RNA-sequencing (RNA-seq) analysis of gene expression and alternative splicing should be routine and robust but is often a bottleneck for biologists because of different and complex analysis programs and reliance on specialized bioinformatics skills. We have developed the ‘3D RNA-seq’ App, an R shiny App and web-based pipeline for the comprehensive analysis of RNA-seq data from any organism. It represents an easy-to-use, flexible and powerful tool for analysis of both gene- and transcript-level expression to identify differential gene/transcript expression, differential alternative splicing and differential transcript usage (3D) as well as isoform switching from RNA-seq data. 3D RNA-seq integrates state-of-the-art differential expression analysis tools and adopts best practice for RNA-seq analysis. The program is designed to be run by biologists with minimal bioinformatics experience (or by bioinformaticians), allowing lab scientists to analyse their RNA-seq data. It achieves this by operating through a user-friendly graphical interface which automates the data flow through the programs in the pipeline. The comprehensive analysis performed by 3D RNA-seq is extremely rapid and accurate, can handle complex experimental designs, allows user setting of statistical parameters, visualizes the results through graphics and tables, and generates publication-quality figures such as heat-maps, expression profiles and GO enrichment plots. The utility of 3D RNA-seq is illustrated by analysis of data from a time-series of cold-treated Arabidopsis plants and from dexamethasone-treated male and female mouse cortex and hypothalamus data, identifying dexamethasone-induced sex- and brain region-specific differential gene expression and alternative splicing.
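Differential transcript usage (DTU) compares the share each transcript contributes to its gene's total expression across conditions. The sketch below only computes that quantity, under stated assumptions: the function names are hypothetical, counts are already summarized per condition for one gene (with the same transcript IDs in both conditions), and a fixed `min_delta` cutoff stands in for the statistical testing a real DTU analysis performs.

```python
from typing import Dict, List


def usage(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert one gene's per-transcript counts into relative usage (fractions summing to 1)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("gene has zero total expression")
    return {tx: c / total for tx, c in counts.items()}


def delta_usage(cond_a: Dict[str, float], cond_b: Dict[str, float]) -> Dict[str, float]:
    """Change in relative usage (condition B minus condition A) per transcript.

    Assumes both conditions contain the same transcript IDs.
    """
    ua, ub = usage(cond_a), usage(cond_b)
    return {tx: ub[tx] - ua[tx] for tx in ua}


def flag_dtu(cond_a: Dict[str, float], cond_b: Dict[str, float], min_delta: float = 0.1) -> List[str]:
    """Transcripts whose usage changes by at least min_delta (hypothetical fixed threshold)."""
    return sorted(tx for tx, d in delta_usage(cond_a, cond_b).items() if abs(d) >= min_delta)
```

With counts `{"T1": 80, "T2": 20}` in one condition and `{"T1": 30, "T2": 70}` in the other, usage moves from 0.8/0.2 to 0.3/0.7. Both transcripts are flagged, and because the dominant transcript changes this is also an isoform switch, even though total gene expression is unchanged.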
A survey of best practices for RNA-seq data analysis.
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics. This is the final published version. It first appeared at http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8
Modeling and analysis of RNA-seq data: a review from a statistical perspective
Background: Since the invention of next-generation RNA sequencing (RNA-seq)
technologies, they have become a powerful tool to study the presence and
quantity of RNA molecules in biological samples and have revolutionized
transcriptomic studies. The analysis of RNA-seq data at four different levels
(samples, genes, transcripts, and exons) involves multiple statistical and
computational questions, some of which remain challenging to date.
Results: We review RNA-seq analysis tools at the sample, gene, transcript,
and exon levels from a statistical perspective. We also highlight the
biological and statistical questions of greatest practical importance.
Conclusion: The development of statistical and computational methods for
analyzing RNA-seq data has made significant advances in the past decade.
However, methods developed to answer the same biological question often rely on
diverse statistical models and exhibit different performance under different
scenarios. This review discusses and compares multiple commonly used
statistical models regarding their assumptions, in the hope of helping users
select appropriate methods as needed, as well as assisting developers in
future method development.
Bayesian Modeling Approaches for Temporal Dynamics in RNA-seq Data
Analysis of differential expression has played a central role over the last decades in addressing a variety of biological questions, such as characterizing abnormal patterns of cellular and molecular function. To date, the identification of differentially expressed genes and isoforms has focused increasingly on temporal dynamics over a series of time points. Bayesian strategies have been successfully employed, from both methodological and analytical perspectives, to uncover biologically interesting complexity across various high-throughput platforms: for instance, methods for differential expression analysis and network modules in transcriptome data, peak callers in ChIP-seq data, target prediction in microRNA data, and meta-methods across different platforms. In this chapter, we discuss how our methodological work based on Bayesian models addresses important questions that arise in the architecture of temporal dynamics in RNA-seq data.
RNA CoMPASS: RNA Comprehensive Multi-Processor Analysis System for Sequencing
The main theme of this dissertation is the development of a distributed computational pipeline for processing next-generation RNA sequencing (RNA-seq) data. RNA-seq experiments generate hundreds of millions of short reads for each DNA/RNA sample. Many bioinformatics tools exist for the analysis and visualization of these data, but very large studies present computational and organizational challenges that are difficult to overcome manually. We designed a comprehensive RNA-seq analysis pipeline that leverages many existing tools and parallel computing technology to facilitate the analysis of extremely large studies. RNA CoMPASS provides a web-based graphical user interface and a distributed computational pipeline that includes endogenous transcriptome quantification as well as the investigation of exogenous sequences.
Transcript assembly, quantification and differential alternative splicing detection from RNA-Seq
This dissertation is focused on improving RNA-Seq processing in terms of
transcript assembly, transcript quantification and detection of differential alternative splicing.
There are two major challenges in solving these three problems.
The first is accurately deriving transcript-level expression values from RNA-Seq reads that often align ambiguously to a set of overlapping isoforms.
To make matters worse, gene annotation tends to misguide transcript quantification, as new transcripts are often discovered in new RNA-Seq experiments.
The second challenge is accounting for intrinsic uncertainty and variability in RNA-Seq measurements when calling differential alternative splicing from multiple samples across two conditions.
Those uncertainties include coverage bias and biological variation.
Failing to account for this variability can lead to higher false positive rates.
To address these challenges, I developed a series of novel algorithms, which are implemented in a software package called Strawberry.
To tackle the read assignment uncertainty challenge, Strawberry assembles aligned RNA-Seq reads into transcripts using a constrained flow network algorithm.
After the assembly, Strawberry uses a latent class model to assign reads to transcripts.
These two steps use different optimization frameworks but utilize the same graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data.
To infer differential alternative splicing, Strawberry extends the single sample quantification model by imposing a generalized linear model on the relative transcript proportions.
To account for count overdispersion, Strawberry uses an empirical Bayesian hierarchical model.
For coverage bias, Strawberry performs a bias correction step which borrows information across samples and genes before fitting the differential analysis model.
A series of simulated and real datasets is used to evaluate and benchmark Strawberry's results.
Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies.
In terms of detecting differential alternative splicing, Strawberry also outperforms several state-of-the-art methods including DEXSeq, Cuffdiff 2 and DSGseq.
Strawberry and its supporting code, e.g., simulation and validation, are freely available at my GitHub (https://github.com/ruolin).
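The abstract describes assigning ambiguously aligned reads to transcripts with a latent class model, where each read's transcript of origin is the hidden class. A common way to fit such a model is expectation-maximization (EM). The sketch below is a generic textbook EM under simplifying assumptions: each read's set of compatible transcripts is given, a read is equally likely to start anywhere along its transcript, and there is no bias correction. It is not Strawberry's implementation, and the function name is hypothetical.

```python
from typing import Dict, List


def em_read_assignment(
    compat: List[List[str]],
    lengths: Dict[str, float],
    n_iter: int = 200,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Estimate the fraction of reads originating from each transcript by EM (hypothetical helper).

    compat[r] lists the transcripts that read r is compatible with.
    P(read | transcript) = 1 / length is assumed (uniform read start positions).
    """
    txs = sorted(lengths)
    theta = {t: 1.0 / len(txs) for t in txs}  # start from uniform abundances
    n_reads = len(compat)
    for _ in range(n_iter):
        expected = {t: 0.0 for t in txs}
        # E-step: split each read among its compatible transcripts
        # in proportion to theta_t / length_t
        for tset in compat:
            weights = {t: theta[t] / lengths[t] for t in tset}
            z = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / z
        # M-step: new abundance = expected share of all reads
        new_theta = {t: expected[t] / n_reads for t in txs}
        delta = max(abs(new_theta[t] - theta[t]) for t in txs)
        theta = new_theta
        if delta < tol:
            break
    return theta
```

As an example, take two transcripts of equal length, with 30 reads unique to A, 10 unique to B and 40 shared. EM converges to A = 0.75 and B = 0.25, because at the fixed point the shared reads are split 3:1, in the same ratio as the unique ones.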