367 research outputs found

Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments

Author: A Lee
A Mortazavi
A Oshlack
B Ewing
B Langmead
DR Bentley
DY Chiang
Elizabeth Purdom
ET Wang
H Li
Illumina
J Lu
James H Bullard
JC Dohm
JC Marioni
Kasper D Hansen
MA Taub
MAQC Consortium
MD Robinson
PAC Hoen
RA Irizarry
RD Canales
S Durinck
Sandrine Dudoit
U Nagalakshmi
Publication venue: BioMed Central
Publication date: 21/04/2009

Background: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data. Results: We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, base-calling calibration method (with and without phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection. Conclusions: Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
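To make the normalization point in this abstract concrete, here is a minimal sketch contrasting total-count scaling (RPKM) with one simple quantile-based alternative. The choice of the upper quartile of nonzero counts as the scaling quantile, the function names, and the toy lane counts are all assumptions made for illustration; the abstract itself only says "quantile-based normalization procedures".

```python
from statistics import quantiles


def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of gene length per million reads in the lane."""
    total = sum(counts)
    return [c * 1e9 / (total * length)
            for c, length in zip(counts, gene_lengths_bp)]


def upper_quartile_scale(counts):
    """Divide each count by the lane's upper quartile of nonzero counts.

    Assumption: the upper quartile stands in for the 'quantile-based'
    procedures mentioned in the abstract.
    """
    nonzero = sorted(c for c in counts if c > 0)
    q3 = quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / q3 for c in counts]


# Hypothetical toy lanes: identical except for one very highly expressed gene.
lane_a = [10, 20, 30, 40, 100]
lane_b = [10, 20, 30, 40, 10000]
lengths = [1000] * 5

# Total-count scaling makes the first four genes look lower in lane B,
# because the single dominant gene inflates the lane total.
rpkm_a = rpkm(lane_a, lengths)
rpkm_b = rpkm(lane_b, lengths)

# Upper-quartile scaling is not affected by the dominant gene here.
uq_a = upper_quartile_scale(lane_a)
uq_b = upper_quartile_scale(lane_b)
```

In this toy case the four unchanged genes receive identical upper-quartile-scaled values in both lanes, while their RPKM values differ by a large factor, which is the kind of bias the abstract attributes to scaling by total lane counts.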

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Collection Of Biostatistics Research Archive

flexiMAP: a regression-based method for discovering differential alternative polyadenylation events in standard RNA-seq data

Author: Anders
Anvar
Arefeen
Bullard
Cribari-Neto
Elkon
Frazee
Garalde
Grassi
Ha
Love
Moll
Robinson
Szkop
Szkop
Wang
Wang
Xia
Ye
Publication venue: Oxford Journals
Publication date: 14/10/2020

Motivation: We present flexible Modeling of Alternative PolyAdenylation (flexiMAP), a new beta-regression-based method implemented in R, for discovering differential alternative polyadenylation events in standard RNA-seq data. Results: We show, using both simulated and real data, that flexiMAP exhibits a good balance between specificity and sensitivity and compares favourably to existing methods, especially at low fold changes. In addition, the tests on simulated data reveal some hitherto unrecognized caveats of existing methods. Importantly, flexiMAP allows modeling of multiple known covariates that often confound the results of RNA-seq data analysis. Availability and implementation: The flexiMAP R package is available at: https://github.com/kszkop/flexiMAP. Scripts and data to reproduce the analysis in this paper are available at: https://doi.org/10.5281/zenodo.3689788

Crossref

PubMed Central

UCL Discovery

Birkbeck Institutional Research Online

NBLDA: Negative Binomial Linear Discriminant Analysis for RNA-Seq Data

Author: Dong Kai
Tong Tiejun
Wan Xiang
Zhao Hongyu
Publication date: 27/01/2015

RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated. In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze four real RNA-Seq data sets to demonstrate the advantage of our method in real-world applications.
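The idea of a Bayes-rule classifier built on a negative binomial model can be sketched as follows. This is an illustration only, not the paper's plug-in estimator: it assumes the class means and per-gene dispersions are already known, treats genes as independent, ignores sample size factors, and uses the mean/dispersion parameterization with variance mu + phi * mu^2. All names and numbers are hypothetical.

```python
import math


def nb_log_pmf(x, mu, phi):
    """Log-probability of count x under a negative binomial with
    mean mu and dispersion phi (variance = mu + phi * mu**2)."""
    r = 1.0 / phi  # NB "size" parameter
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu))
            + x * math.log(mu / (r + mu)))


def poisson_log_pmf(x, mu):
    """Log-probability of count x under a Poisson with mean mu."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def classify(sample, class_means, dispersions, priors):
    """Assign the sample to the class with the largest
    log prior + summed per-gene NB log-likelihood (genes assumed independent)."""
    scores = {}
    for label, means in class_means.items():
        log_lik = sum(nb_log_pmf(x, mu, phi)
                      for x, mu, phi in zip(sample, means, dispersions))
        scores[label] = math.log(priors[label]) + log_lik
    return max(scores, key=scores.get)


# Hypothetical two-gene, two-class example.
class_means = {"A": [5.0, 50.0], "B": [50.0, 5.0]}
dispersions = [0.2, 0.2]
priors = {"A": 0.5, "B": 0.5}
predicted = classify([4, 60], class_means, dispersions, priors)
```

As the dispersion shrinks toward zero, `nb_log_pmf` approaches `poisson_log_pmf`, which mirrors the relationship between the negative binomial and Poisson classifiers that the abstract mentions.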

arXiv.org e-Print Archive

Springer - Publisher Connector

RNA sequencing reveals two major classes of gene expression levels in metazoan cells

Author: Alexander van Oudenaarden
Casella G
Daniel Hebenstreit
Miaoqing Fang
Muxin Gu
Sarah A Teichmann
Varodom Charoensawan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

The expression level of a gene is often used as a proxy for determining whether the protein or RNA product is functional in a cell or tissue. Therefore, it is of fundamental importance to understand the global distribution of gene expression levels, and to be able to interpret it mechanistically and functionally. Here we use RNA sequencing of mouse Th2 cells, coupled with a range of other techniques, to show that all genes can be separated, based on their expression abundance, into two distinct groups: one group comprising lowly expressed and putatively non-functional mRNAs, and the other comprising highly expressed mRNAs with active chromatin marks at their promoters.

DSpace@MIT

Crossref

PubMed Central

Warwick Research Archives Portal Repository

ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets

Author: Frazee Alyssa C
Langmead Ben
Leek Jeffrey T
Publication venue: BioMed Central
Publication date: 01/11/2011

Background: RNA sequencing is a flexible and powerful new approach for measuring gene, exon, or isoform expression. To maximize the utility of RNA sequencing data, new statistical methods are needed for clustering, differential expression, and other analyses. A major barrier to the development of new statistical methods is the lack of RNA sequencing datasets that can be easily obtained and analyzed in common statistical software packages such as R. To speed up the development process, we have created a resource of analysis-ready RNA-sequencing datasets. Description: ReCount is an online resource of RNA-seq gene count tables and auxiliary data. Tables were built from raw RNA sequencing data from 18 different published studies comprising 475 samples and over 8 billion reads. Using the Myrna package, reads were aligned, overlapped with gene models and tabulated into gene-by-sample count tables that are ready for statistical analysis. Count tables and phenotype data were combined into Bioconductor ExpressionSet objects for ease of analysis. ReCount also contains the Myrna manifest files and R source code used to process the samples, allowing statistical and computational scientists to consider alternative parameter values. Conclusions: By combining datasets from many studies and providing data that has already been processed from .fastq format into ready-to-use .RData and .txt files, ReCount facilitates analysis and methods development for RNA-seq count data. We anticipate that ReCount will also be useful for investigators who wish to consider cross-study comparisons and alternative normalization strategies for RNA-seq.

Directory of Open Access Journals

PubMed Central

Single-cell transcriptomes reveal the mechanism for a breast cancer prognostic gene panel.

Author: Cai Jin
Chen Xuelian
Kabeer Mustafa H
Li Shengwen Calvin
Loudon William G
Nangia Chaitali S
Plant Ashley S
Stucky Andres
Torno Lilibeth
Zhang Gang
Zhong Jiang F
Publication venue: eScholarship, University of California
Publication date: 01/09/2018

The clinical benefits of the MammaPrint® signature for breast cancer are well documented; however, how these genes are related to cell cycle perturbation has not been well determined. Our single-cell transcriptome mapping (algorithm) provides details on the fine perturbation of all individual genes during a cell cycle, providing a view of the cell-cycle-phase-specific landscape of any given human gene. Specifically, we identified that 38 out of the 70 (54%) MammaPrint® signature genes are perturbed in a specific phase of the cell cycle. The MammaPrint® signature panel derives its clinical prognostic power from measuring the cell cycle activity of specific breast cancer samples. Such a cell cycle phase index of the MammaPrint® signature suggests that measurement of the cell cycle index from tumors could be developed into a prognostic tool for various types of cancer beyond breast cancer, potentially improving therapy through targeting a specific phase of the cell cycle of cancer cells.

eScholarship - University of California