Bayesian estimation of Differential Transcript Usage from RNA-seq data

Author: Papastamoulis Panagiotis
Rattray Magnus
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2017

Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, to the DTU context and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit performance similar to DRIMSeq in terms of precision/recall but offer better calibration of False Discovery Rate. Comment: Revised version, accepted to Statistical Applications in Genetics and Molecular Biolog

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

Statistical Tests for Detecting Differential RNA-Transcript Expression from Read Counts

Author: Gunnar Rätsch
Karsten Borgwardt
Oliver Stegle
Philipp Drewe
Regina Bohnert
Publication date: 01/01/2010

As a fruit of the current revolution in sequencing technology, transcriptomes can now be analyzed at an unprecedented level of detail. These advances have been exploited for detecting differentially expressed genes across biological samples and for quantifying the abundances of various RNA transcripts within one gene. However, explicit strategies for detecting the hidden differential abundances of RNA transcripts in biological samples have not been defined. In this work, we present two novel statistical tests to address this issue: a 'gene structure sensitive' Poisson test for detecting differential expression when the transcript structure of the gene is known, and a kernel-based test called Maximum Mean Discrepancy (MMD) when it is unknown. We analyzed the proposed approaches on simulated read data for two artificial samples as well as on factual reads generated by the Illumina Genome Analyzer for two _C. elegans_ samples. Our analysis shows that the Poisson test identifies genes with differential transcript expression considerably better than previously proposed RNA transcript quantification approaches for this task. The MMD test is able to detect a large fraction (75%) of such differential cases without the knowledge of the annotated transcripts. It is therefore well-suited to analyze RNA-Seq experiments when the genome annotations are incomplete or not available, where other approaches have to fail.
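
For orientation, the kernel statistic named in this abstract can be sketched generically. The code below is a minimal illustration of a biased squared-MMD estimate with a Gaussian (RBF) kernel, not the authors' implementation. The function names, the `bandwidth` parameter, and the use of plain numeric tuples as per-sample features are all assumptions; the abstract does not say how reads are turned into features or how significance is assessed.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared Maximum Mean Discrepancy.

    xs and ys are non-empty lists of feature vectors (hypothetical inputs;
    the feature construction from RNA-seq reads is not specified here).
    """
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

A value near zero suggests the two samples look alike under the chosen kernel, while larger values suggest a difference. Turning the statistic into a test would additionally need a null distribution, for example from permuting sample labels, which this sketch omits.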

Crossref

Nature Precedings

MPG.PuRe

Bayesian Methods for Gene Expression Analysis from High-Throughput Sequencing data

Author: Glaus Peter
Publication date: 01/08/2014

The University of Manchester - Institutional Repository

Perplexity: Evaluating Transcript Abundance Estimation in the Absence of Ground Truth

Author: Chan Skylar
Fan Jason
Patro Rob
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)
Publication date: 01/01/2021

There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpins gene expression based analysis routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best. Thus, we derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. To our knowledge, our study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth.
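
As background for the metric this abstract adapts, perplexity in language and topic modeling is the exponential of the negative mean log-probability that a model assigns to held-out items. The sketch below shows only that generic definition. The function name and the choice to return infinity for zero-probability items are assumptions; the paper's own handling of RNA-seq corner cases is not reproduced here.

```python
import math


def perplexity(probabilities):
    """Generic perplexity: exp of the negative mean log-probability.

    `probabilities` holds the probability a fitted model assigns to each
    held-out item (hypothetically, each held-out fragment).
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    # Assumption: an item the model deems impossible yields infinite
    # perplexity. The paper treats such corner cases with more care.
    if any(p <= 0.0 for p in probabilities):
        return math.inf
    mean_log = sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(-mean_log)
```

Lower values mean the model assigned higher probability to the held-out items. For instance, a model that gives every held-out item probability 0.25 has a perplexity of about 4.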

PubMed Central

Dagstuhl Research Online Publication Server

Digital Repository at the University of Maryland

Quantifying alternative splicing from paired-end RNA-sequencing data

Author: Attolini Camille Stephan-Otto
Kroiss Manuel
Rossell David
Stöcker Almond
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/03/2014

RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing may be involved in malfunctions at the cellular level and multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence suboptimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a nonparametric, highly flexible manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows the study of alternative splicing patterns for individual samples and can also be the basis for downstream analyses. We found a severalfold improvement in estimation mean square error compared to popular approaches in simulations, and substantially higher consistency between replicates in experimental data. Our findings indicate the need for adjusting the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper. Comment: Published at http://dx.doi.org/10.1214/13-AOAS687 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org). With correction

arXiv.org e-Print Archive

PubMed Central

Warwick Research Archives Portal Repository

Genomic Profiling of Childhood Tumor Patient-Derived Xenograft Models to Enable Rational Clinical Trial Design.

Author: Baxter Patricia
Bendak Katerina
Berko Esther R
Bowen Jay
Braun Frank K
Bryan Anthony C
Böhm Julia W
Cardenas Maria F
Chen Yidong
Coppens Sara E
Cross Katherine L
Diskin Sharon J
Doddapaneni HarshaVardhan
Du Yunchen
Egolf Laura E
Evans Kathryn
Farrel Alvin
Gaonkar Krutika S
Gastier-Foster Julie M
Gatto Gregory J
Gorlick Richard
Greene Casey S
Haber Michelle
Hart Lori S
Haussler David
Houghton Peter J
Jayaseelan Joy
Kalletla Karthik
Kendsersky Nathan M
Kolb E Anders
Krytska Kateryna
Kurmashev Dias
Kurmasheva Raushan T
Leraas Kristen M
Li Xiao-Nan
Lindsay Holly B
Lock Richard B
Lopez Gonzalo
Maris John M
Marshall Glenn M
Mayoh Chelsea
McCalmont Hannah
McCoy Kristyn
Modi Apexa
Momin Zeineen
Morton Christopher
Mosse Yael P
Nance Jonas
Patel Khushbu
Pfeil Jacob
Qi Lin
Raman Pichai
Rathi Komal S
Reynolds C Patrick
Rokita Jo Lynne
Sacks Gregory I
Sanchez Yolanda
Shu Jack
Smith Malcolm A
Tyrrell Vanessa
Upton Kristen A
Vaksman Zalman
Vaske Olena Morozova
Way Gregory P
Wheeler David A
Zhang Huiyuan
Zhang Wendong
Zhao Sibo
Zheng Siyuan
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

Accelerating cures for children with cancer remains an immediate challenge as a result of extensive oncogenic heterogeneity between and within histologies, distinct molecular mechanisms evolving between diagnosis and relapsed disease, and limited therapeutic options. To systematically prioritize and rationally test novel agents in preclinical murine models, researchers within the Pediatric Preclinical Testing Consortium are continuously developing patient-derived xenografts (PDXs), many of which are refractory to current standard-of-care treatments, from high-risk childhood cancers. Here, we genomically characterize 261 PDX models from 37 unique pediatric cancers; demonstrate faithful recapitulation of histologies and subtypes; and refine our understanding of relapsed disease. In addition, we use expression signatures to classify tumors for TP53 and NF1 pathway inactivation. We anticipate that these data will serve as a resource for pediatric oncology drug development and will guide rational clinical trial design for children with cancer.

eScholarship - University of California

Deep generative modeling for single-cell transcriptomics.

Author: A Regev
A Tanay
A Wagner
A Zeisel
AP Patel
B Wang
BK Tusi
CA Vallejos
D DeTomaso
D Grün
D Risso
DM Blei
E Pierson
FA Wolf
G Finak
G Görgün
GXY Zheng
HI Nakaya
J Ding
J Fan
Jeffrey Regier
JT Gaublomme
K Shekhar
L Haghverdi
L Held
M Stoeckius
MD Robinson
MI Love
Michael B. Cole
Michael I. Jordan
Nir Yosef
PV Kharchenko
Q Li
RE Kass
Romain Lopez
S Prabhakaran
S Semrau
U Shaham
WE Johnson
Publication venue: eScholarship, University of California
Publication date: 01/12/2018

Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells (https://github.com/YosefLab/scVI). scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task.

Crossref

eScholarship - University of California

MapSplice: Accurate Mapping of RNA-Seq Reads for Splice Junction Discovery

Author: Chiang Derek Y.
Coleman Stephen J
Grimm Sara A.
He Xiaping
Huang Yan
Liu Jinze
Macleod James N
Mieczkowski Piotr
Perou Charles M.
Prins Jan F.
Savich Gleb L.
Singh Darshan
Wang Kai
Zeng Zheng
Publication venue: UKnowledge
Publication date: 01/01/2010

The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long reads (≥75 bp). MapSplice is not dependent on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of read alignments of a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data. Experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extensiveness of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. These combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice

PubMed Central

University of Kentucky

Carolina Digital Repository