Search CORE

2,347 research outputs found

Estimation of alternative splicing isoform frequencies from RNA-Seq data

Author: A Mortazavi
A Oshlack
A Roberts
Alex Zelikovsky
B Jackson
B Langmead
B Li
B Paşaniuc
BE Howard
C Trapnell
C Trapnell
CP Ponting
D Hiller
E Wang
H Jiang
H Richard
I Birol
Ion I Măndoiu
J Bloom
J Clarke
J Eid
J Feng
KD Hansen
M Anton
M Griffith
M Guttman
M Sultan
Marius Nicolae
P Carninci
Serghei Mangul
Team MGC Project
V Lacroix
Y She
Y Surget-Groba
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
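
For readers unfamiliar with the general idea, the following is a minimal sketch of expectation-maximization for isoform frequency estimation. It is not the IsoEM implementation: it assumes a simplified model in which each read is only known to be compatible with a set of isoforms and fragments are drawn uniformly along each isoform, and it ignores insert sizes, base qualities, strand and pairing information. The function name, the input layout and the toy data are all hypothetical.

```python
def em_isoform_frequencies(read_compat, lengths, n_iter=500, tol=1e-12):
    """Estimate relative isoform frequencies with a basic EM loop.

    read_compat: list of lists; each inner list names the isoforms a read
                 is compatible with (hypothetical input layout).
    lengths:     dict mapping isoform name -> length in bases.
    Assumes at least one read is compatible with at least one isoform.
    """
    isoforms = sorted(lengths)
    # theta[i] is the fraction of reads generated by isoform i.
    theta = {i: 1.0 / len(isoforms) for i in isoforms}

    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms in
        # proportion to theta[i] / length[i] (uniform start position).
        expected = dict.fromkeys(isoforms, 0.0)
        for compat in read_compat:
            weights = {i: theta[i] / lengths[i] for i in compat}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read carries no usable information
            for i, w in weights.items():
                expected[i] += w / total

        # M-step: the new read fractions are the normalized expected counts.
        n = sum(expected.values())
        new_theta = {i: expected[i] / n for i in isoforms}
        delta = max(abs(new_theta[i] - theta[i]) for i in isoforms)
        theta = new_theta
        if delta < tol:
            break

    # Convert read fractions to length-normalized isoform frequencies.
    norm = sum(theta[i] / lengths[i] for i in isoforms)
    return {i: (theta[i] / lengths[i]) / norm for i in isoforms}


# Toy data: 60 reads unique to A, 20 unique to B, 20 shared by both.
reads = [["A"]] * 60 + [["B"]] * 20 + [["A", "B"]] * 20
freqs = em_isoform_frequencies(reads, {"A": 1000, "B": 1000})
print(freqs)
```

In this toy case the shared reads are gradually reassigned toward the isoform with more unique support, which is the disambiguation step that the richer signals listed in the abstract are meant to sharpen.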

Crossref

ScholarWorks @ Georgia State University

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Dagstuhl Research Online Publication Server

Methods to study splicing from high-throughput RNA Sequencing data

Author: A Ameur
A Bhasi
A Dobin
A Mortazavi
A Oshlack
A Roberts
A Roberts
AM Mezlini
AN Brooks
B Jackson
B Kakaradov
B Langmead
B Li
B Li
BJ Haas
BJ Haas
C Trapnell
C Trapnell
C Trapnell
D Hiller
D Singh
DL Wood
DW Bryant
E Eyras
E Lee
E Turro
ET Wang
F Birzele
F Bona De
F Denoeud
F Tang
G Robertson
G Xu
GA Sacomoto
GR Grant
GS Slater
H Bao
H Jiang
H Jiang
H Kim
H Richard
J Behr
J Du
J Feng
J Hu
J Lovén
J Martin
J Salzman
J Seok
J Seok
J Wu
J Wu
JE Allen
JJ Li
JP Venables
K Schneeberger
K Wang
KD Hansen
KF Au
KL Howe
KM Borgwardt
L Chen
L Chen
L Wang
L Wang
LY Chen
M Aschoff
M Fiume
M Garber
M Griffith
M Guttman
M Stanke
M Stanke
M Sultan
MC Ryan
MF Rogers
MG Grabherr
MH Schulz
MT Dimon
N Cloonan
N Cloonan
N Deng
N Leng
N Nicolae
N Philippe
N Vijay
NA Fonseca
O Stegle
P Drewe
P Glaus
PL Martelli
PP Labaj
Q Liu
Q Liu
Q Pan
QY Zhao
R Bohnert
R Guigó
R Li
S Anders
S Djebali
S Filichkin
S Heber
S Huang
S Lee
S Mangul
S Marco-Sola
S Shen
S Sonnenburg
S Srivastava
S Tang
S Zheng
SB Montgomery
SH Nagaraj
SK Lou
T Bonfert
TA Clark
TD Wu
TD Wu
W Li
W Li
W Wang
WJ Kent
Y Hu
Y Katz
Y Li
Y Liao
Y Surget-Groba
Y Xing
Y Xing
Y Zhang
Z Xia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/07/2015

The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also attempt to propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods. Comment: 31 pages, 1 figure, 9 tables. Small corrections adde

arXiv.org e-Print Archive

Crossref

Models for transcript quantification from RNA-Seq

Author: Pachter Lior
Publication date: 12/05/2011

RNA-Seq is rapidly becoming the standard technology for transcriptome analysis. Fundamental to many of the applications of RNA-Seq is the quantification problem, which is the accurate measurement of relative transcript abundances from the sequenced reads. We focus on this problem, and review many recently published models that are used to estimate the relative abundances. In addition to describing the models and the different approaches to inference, we also explain how methods are related to each other. A key result is that we show how inference with many of the models results in identical estimates of relative abundances, even though model formulations can be very different. In fact, we are able to show how a single general model captures many of the elements of previously published methods. We also review the applications of RNA-Seq models to differential analysis, and explain why accurate relative transcript abundance estimates are crucial for downstream analyses

arXiv.org e-Print Archive

CiteSeerX

Predominant contribution of cis-regulatory divergence in the evolution of mouse alternative splicing

Author: Ballegeer Marlies
Chen Wei
Gao Qingsong
Libert Claude
Sun Wei
Publication venue: 'EMBO'
Publication date: 01/01/2015

Divergence of alternative splicing represents one of the major driving forces to shape phenotypic diversity during evolution. However, the extent to which these divergences could be explained by the evolving cis-regulatory versus trans-acting factors remains unresolved. To globally investigate the relative contributions of the two factors for the first time in mammals, we measured splicing difference between C57BL/6J and SPRET/EiJ mouse strains and allele-specific splicing pattern in their F1 hybrid. Out of 11,818 alternative splicing events expressed in the cultured fibroblast cells, we identified 796 with significant difference between the parental strains. After integrating allele-specific data from F1 hybrid, we demonstrated that these events could be predominately attributed to cis-regulatory variants, including those residing at and beyond canonical splicing sites. Contrary to previous observations in Drosophila, such predominant contribution was consistently observed across different types of alternative splicing. Further analysis of liver tissues from the same mouse strains and reanalysis of published datasets on other strains showed similar trends, implying in general the predominant contribution of cis-regulatory changes in the evolution of mouse alternative splicing

Crossref

Ghent University Academic Bibliography

PubMed Central

MDC Repository

Latent rank change detection for analysis of splice-junction microarrays with nonlinear effects

Author: Burns Suzanne
Burton Tarea
Gelfond Jonathan
Penalva Luiz O. F.
Sogayar Mari
Zarzabal Lee Ann
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2011

Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over- or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data is from different cell lines of glioblastoma tumors assayed with Agilent microarrays. Comment: Published at http://dx.doi.org/10.1214/10-AOAS389 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Algorithms for Transcriptome Quantification and Reconstruction from RNA-Seq Data

Author: Mangul Serghei
Publication venue: ScholarWorks @ Georgia State University
Publication date: 16/11/2012

Massively parallel whole transcriptome sequencing and its ability to generate full transcriptome data at the single transcript level provides a powerful tool with multiple interrelated applications, including transcriptome reconstruction and gene/isoform expression estimation, the latter also known as transcriptome quantification. As a result, whole transcriptome sequencing has become the technology of choice for performing transcriptome analysis, rapidly replacing array-based technologies. The most commonly used transcriptome sequencing protocol, referred to as RNA-Seq, generates short (single or paired) sequencing tags from the ends of randomly generated cDNA fragments. The RNA-Seq protocol reduces the sequencing cost and significantly increases data throughput, but reconstructing full-length transcripts and accurately estimating their abundances across all cell types remains computationally challenging. We focus on two main problems in transcriptome data analysis, namely, transcriptome reconstruction and quantification. Transcriptome reconstruction, also referred to as novel isoform discovery, is the problem of reconstructing the transcript sequences from the sequencing data. Reconstruction can be done de novo or it can be assisted by existing genome and transcriptome annotations. Transcriptome quantification refers to the problem of estimating the expression level of each transcript. We present genome-guided and annotation-guided transcriptome reconstruction methods as well as methods for transcript and gene expression level estimation. Empirical results on both synthetic and real RNA-seq datasets show that the proposed methods improve transcriptome quantification and reconstruction accuracy compared to previous methods.

ScholarWorks @ Georgia State University

Optimization Techniques For Next-Generation Sequencing Data Analysis

Author: Caciula Adrian
Publication venue: ScholarWorks @ Georgia State University
Publication date: 12/08/2014

High-throughput RNA sequencing (RNA-Seq) is a popular cost-efficient technology with many medical and biological applications. This technology, however, presents a number of computational challenges in reconstructing full-length transcripts and accurately estimating their abundances across all cell types. Our contributions include (1) transcript and gene expression level estimation methods, (2) methods for genome-guided and annotation-guided transcriptome reconstruction, and (3) de novo assembly and annotation of real data sets. Transcript expression level estimation, also referred to as transcriptome quantification, tackles the problem of estimating the expression level of each transcript. Transcriptome quantification analysis is crucial for determining similar transcripts or unraveling gene functions and transcription regulation mechanisms. We propose a novel simulated regression based method for transcriptome frequency estimation from RNA-Seq reads. Transcriptome reconstruction refers to the problem of reconstructing the transcript sequences from the RNA-Seq data. We present genome-guided and annotation-guided transcriptome reconstruction methods. Empirical results on both synthetic and real RNA-seq datasets show that the proposed methods improve transcriptome quantification and reconstruction accuracy compared to current state-of-the-art methods. We further present the assembly and annotation of Bugula neritina transcriptome (a marine colonial animal), and Tallapoosa darter genome (a species-rich radiation freshwater fish).

ScholarWorks @ Georgia State University


Striking circadian neuron diversity and cycling of Drosophila alternative splicing.

Author: Abruzzi Katharine C
Rio Donald C
Rosbash Michael
Wang Qingqing
Publication venue: eScholarship, University of California
Publication date: 01/06/2018

Although alternative pre-mRNA splicing (AS) significantly diversifies the neuronal proteome, the extent of AS is still unknown due in part to the large number of diverse cell types in the brain. To address this complexity issue, we used an annotation-free computational method to analyze and compare the AS profiles between small specific groups of Drosophila circadian neurons. The method, the Junction Usage Model (JUM), allows the comprehensive profiling of both known and novel AS events from specific RNA-seq libraries. The results show that many diverse and novel pre-mRNA isoforms are preferentially expressed in one class of clock neuron and also absent from the more standard Drosophila head RNA preparation. These AS events are enriched in potassium channels important for neuronal firing, and there are also cycling isoforms with no detectable underlying transcriptional oscillations. The results suggest massive AS regulation in the brain that is also likely important for circadian regulation

eScholarship - University of California

Decoding a cancer-relevant splicing decision in the RON proto-oncogene using high-throughput mutagenesis

Author: A Decorsiere
A Subramanian
A Sveen
AB Rosenberg
AM Bolger
B Singh
C Collesi
C Ghigna
CV Lefave
D Wang
E Dardenne
E Sebestyen
ET Wang
F Supek
FE Baralle
FJ Sedlazeck
FX Sutandy
G Consortium
G Yeo
GA Auwera Van der
H Han
H Jung
H Moon
H Moon
HP Yao
HY Xiong
J Chakedis
J Rauch
J Vivian
JD Ellis
JF Fisette
JJ Gartner
JM O’Toole
JT Witten
K Zhang
KS Pollard
M Llorian
M Nazim
MC Wahl
MC Whitlock
P Julien
P Papasaikas
PJ Uren
Q Pan
R Savisaar
S Bonomi
S Gueroussov
S Ke
S Mayer
SA Shabalina
T Sterne-Weiler
V Gotea
VK Mootha
WF Mueller
X Xiao
X Yang
XD Fu
Y Barash
Y Barash
Y Katz
Y Lu
Y Xing
Y Xing
YQ Zhou
Z Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Mutations causing aberrant splicing are frequently implicated in human diseases including cancer. Here, we establish a high-throughput screen of randomly mutated minigenes to decode the cis-regulatory landscape that determines alternative splicing of exon 11 in the proto-oncogene MST1R (RON). Mathematical modelling of splicing kinetics enables us to identify more than 1000 mutations affecting RON exon 11 skipping, which corresponds to the pathological isoform RON Delta 165. Importantly, the effects correlate with RON alternative splicing in cancer patients bearing the same mutations. Moreover, we highlight heterogeneous nuclear ribonucleoprotein H (HNRNPH) as a key regulator of RON splicing in healthy tissues and cancer. Using iCLIP and synergy analysis, we pinpoint the functionally most relevant HNRNPH binding sites and demonstrate how cooperative HNRNPH binding facilitates a splicing switch of RON exon 11. Our results thereby offer insights into splicing regulation and the impact of mutations on alternative splicing in cancer. Institute of Molecular Biology Core Facilities; DFG [ZA 881/2-1, KO 4566/4-1, LE 3473/2-1]; LOEWE program Ubiquitin Networks (Ub-Net) of the State of Hesse (Germany); Deutsche Forschungsgemeinschaft [SFB902 B13]; EMBO [3057]; Fundacao para a Ciencia e a Tecnologia, Portugal (FCT Investigator Starting Grant) [IF/00595/2014]; German Federal Ministry of Research (BMBF; e:bio junior group program) [FKZ: 0316196]; Boehringer Ingelheim Foundation; [INST 47/870-1 FUGG

Crossref

Directory of Open Access Journals

Universidade de Lisboa: Repositório.UL

Sapientia

Hochschulschriftenserver - Universität Frankfurt am Main