
Online Research @ Cardiff

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

Oxford University Research Archive

TRAPLINE: a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation

Authors: Robert David, Julia Jeannine Jung, Stefan Krebs, Christian Rimmbach, Ulf Schmitz, Gustav Steinhoff, Markus Wolfien, Olaf Wolkenhauer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Background: Technical advances in Next Generation Sequencing (NGS) provide a means to acquire deeper insights into cellular functions. The lack of standardized and automated methodologies poses a challenge for the analysis and interpretation of RNA sequencing data. We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best-performing data analysis, data evaluation and annotation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data processing (suitable for Illumina, SOLiD and Solexa). Results: Comparative transcriptomics analyses with TRAPLINE yield a set of differentially expressed genes, their corresponding protein-protein interactions, splice variants, promoter activity, predicted miRNA-target interactions and files for single nucleotide polymorphism (SNP) calling. The results are combined into a single file for downstream analysis such as network construction. We demonstrate the value of the proposed pipeline by characterizing the transcriptome of our recently described stem cell-derived, antibiotic-selected cardiac bodies ('aCaBs'). Conclusion: TRAPLINE supports NGS-based research by providing a workflow that requires no bioinformatics skills, decreases the processing time of the analysis and works in the cloud. The pipeline is implemented in the biomedical research platform Galaxy and is freely accessible via www.sbi.uni-rostock.de/RNAseqTRAPLINE or the specific Galaxy manual page (https://usegalaxy.org/u/mwolfien/p/trapline-manual).

ResearchOnline@JCU

Springer - Publisher Connector

ResearchOnline at James Cook University

Open Access LMU

Stellenbosch University SUNScholar Repository

FigShare

Public Library of Science (PLOS)

RNA-Seq Mapping and Detection of Gene Fusions with a Suffix Array Algorithm

Authors: Asim S. Siddiqui, Benjamin S. Kong, Catalin Barbacioru, Chieh-Yuan Li, Fiona C. Hyland, Heinz Breu, Jeffrey K. Ichikawa, Jian Gu, Joel P. Brockman, John P. Bodeau, Kelli S. Bramlett, Liviu Popescu, Matthew W. Muller, Milan Radovich, Nriti Garg, Onur Sakarya, Paolo Vatta, Penn P. Whitley, Robert C. Nutter, Sowmi Utiramerur, Vidya Kudlingar, Weixiong Zhang, Yongzhi Chen, Yulei N. Wang, Zheng Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2012

High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.
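
To give a feel for the suffix-array idea that the SASR aligner's name refers to, here is a minimal sketch of exact read lookup in a toy reference. It is not the LifeScope or SASR implementation: the function names, the naive construction, and the toy sequences are all assumptions made for illustration, and real aligners also handle mismatches, splicing and far larger references.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive construction (sort full suffixes); acceptable only for toy inputs.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions where pattern occurs exactly in text."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix at the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


# Hypothetical toy reference and read, chosen only for illustration.
reference = "ACGTACGTGACG"
suffix_array = build_suffix_array(reference)
hits = find_occurrences(reference, suffix_array, "ACG")
```

Because all suffixes sharing a prefix sit next to each other in the sorted order, two binary searches are enough to locate every exact occurrence of a read.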

CiteSeerX

Public Library of Science (PLOS)

deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data

Gene fusions created by somatic genomic rearrangements are known to play an important role in the onset and development of some cancers, such as lymphomas and sarcomas. RNA-Seq (whole transcriptome shotgun sequencing) is proving to be a useful tool for the discovery of novel gene fusions in cancer transcriptomes. However, algorithmic methods for the discovery of gene fusions using RNA-Seq data remain underdeveloped. We have developed deFuse, a novel computational method for fusion discovery in tumor RNA-Seq data. Unlike existing methods that use only unique best-hit alignments and consider only fusion boundaries at the ends of known exons, deFuse considers all alignments and all possible locations for fusion boundaries. As a result, deFuse is able to identify fusion sequences with demonstrably better sensitivity than previous approaches. To increase the specificity of our approach, we curated a list of 60 true positive and 61 true negative fusion sequences (as confirmed by RT-PCR), and trained an AdaBoost classifier on 11 novel features of the sequence data. The resulting classifier has an estimated value of 0.91 for the area under the ROC curve. We have used deFuse to discover gene fusions in 40 ovarian tumor samples, one ovarian cancer cell line, and three sarcoma samples. We report herein the first gene fusions discovered in ovarian cancer. We conclude that gene fusions are not infrequent events in ovarian cancer and that these events have the potential to substantially alter the expression patterns of the genes involved; gene fusions should therefore be considered in efforts to comprehensively characterize the mutational profiles of ovarian cancer transcriptomes.
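
The area under the ROC curve quoted above can be read as the probability that a randomly chosen true fusion receives a higher classifier score than a randomly chosen false one. The sketch below computes that quantity directly from pairwise comparisons. The labels and scores are invented for illustration and have nothing to do with the deFuse training set; `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 1 (true fusion) or 0 (false fusion)
    scores: classifier scores, higher meaning "more likely a true fusion"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example: three confirmed and three rejected candidates.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of true and false candidates.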

A Robust Method for Transcript Quantification with RNA-Seq Data

Authors: Derek Y. Chiang, Yin Hu, Yan Huang, Corbin D. Jones, Jinze Liu, Yufeng Liu, James N. MacLeod, Jan F. Prins
Publication date: 01/01/2013

The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation or transcript quantification of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells) but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it lacks a unique transcript solution even if the read maps uniquely to the reference genome. In this article, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to simultaneously learn bias parameters during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By resolving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, whereas existing methods tend to assign positive expression to every candidate isoform. Using simulated RNA-seq datasets, our method demonstrated better quantification accuracy and better inference of the dominant set of transcripts than existing methods. Application of our method to real data demonstrated experimentally that transcript quantification is effective for differential analysis of transcriptomes.
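
The combination of a linear model and a LASSO penalty described above can be illustrated with a small sketch. This is not the authors' implementation: the design matrix, the coverage values, the penalty weight `lam`, the non-negativity constraint and the coordinate-descent solver are all assumptions chosen to keep the example self-contained. Each row is an exon or splice junction, each column is a candidate isoform, and an entry of 1 means the isoform contains that feature.

```python
def lasso_nonneg(design, observed, lam, sweeps=200):
    """Minimize 0.5 * ||observed - design * x||^2 + lam * sum(x) with x >= 0.

    Solved by cyclic coordinate descent (an illustrative choice of solver).
    """
    n_features, n_isoforms = len(design), len(design[0])
    x = [0.0] * n_isoforms
    for _ in range(sweeps):
        for i in range(n_isoforms):
            numerator = 0.0
            denominator = 0.0
            for j in range(n_features):
                # Coverage explained by all isoforms except i.
                others = sum(design[j][k] * x[k]
                             for k in range(n_isoforms) if k != i)
                numerator += design[j][i] * (observed[j] - others)
                denominator += design[j][i] ** 2
            # Soft-threshold, then clip at zero (abundances cannot be negative).
            x[i] = max(0.0, (numerator - lam) / denominator) if denominator > 0 else 0.0
    return x


# Hypothetical gene with three exons and three candidate isoforms:
#   isoform 0: exons 1-2-3, isoform 1: exons 1-3, isoform 2: exons 1-2.
design = [
    [1, 1, 1],  # exon 1
    [1, 0, 1],  # exon 2
    [1, 1, 0],  # exon 3
    [1, 0, 1],  # junction 1-2
    [1, 0, 0],  # junction 2-3
    [0, 1, 0],  # junction 1-3
]
# Toy coverage generated from abundances (10, 5, 0), i.e. isoform 2 is not expressed.
observed = [15, 10, 15, 10, 10, 5]
abundances = lasso_nonneg(design, observed, lam=0.5)
```

In this toy case the junction rows are what separate the isoforms, and the penalty drives the unexpressed third candidate to exactly zero while shrinking the other two estimates only slightly.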

Carolina Digital Repository

Detecting Cancer Outlier Genes with Potential Rearrangement Using Gene Expression Data and Biological Networks

Authors: Mohammed Alshalalfa, Reda Alhajj, Tarek A. Bismar
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2012

Gene alterations are a major component of the landscape of tumor genomes. To assess the significance of these alterations in the development of prostate cancer, it is necessary to identify them and analyze them from a systems biology perspective. Here, we present a new method (EigFusion) for predicting outlier genes with potential gene rearrangement. EigFusion demonstrated excellent performance in identifying such genes when tested on synthetic and real data. EigFusion identified previously unrecognized genes such as FABP5 and KCNH8 and confirmed their association with primary and metastatic prostate samples, while confirming the metastatic specificity of other genes such as PAH, TOP2A, and SPINK1. We used protein network-based approaches to analyze the network context of potentially rearranged genes. Functional gene rearrangement modules are constructed by integrating functional protein networks. Rearranged genes were shown to be highly connected to well-known altered genes in cancer such as AR, RB1, MYC, and BRCA1. Finally, using clinical outcome data of prostate cancer patients, potentially rearranged genes demonstrated significant association with prostate cancer-specific death.

PRISM: University of Calgary Digital Repository

A NOVEL COMPUTATIONAL FRAMEWORK FOR TRANSCRIPTOME ANALYSIS WITH RNA-SEQ DATA

Author: Yin Hu
Publication venue: UKnowledge
Publication date: 01/01/2013

Advances in high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. To address current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected alternative splicing variants, which circumvents the need for full transcript reconstruction and quantification. Beyond the scope of classical group-wise analysis, a clustering scheme is further described for mining prominent consistency among samples in transcription, breaking the restriction of presumed grouping. The performance of the framework has been demonstrated by a series of simulation studies and real datasets, including the Cancer Genome Atlas (TCGA) breast cancer analysis. These successful applications suggest an unprecedented opportunity to use differential transcription analysis to reveal variations in the mRNA transcriptome in response to cellular differentiation or the effects of disease.

University of Kentucky

A probabilistic framework for aligning paired-end RNA-seq data

Authors: Derek Y. Chiang, Jan F. Prins, Jinze Liu, Kai Wang, Xiaping He, Yin Hu
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment