3,344 research outputs found

BNP-Seq: Bayesian Nonparametric Differential Expression Analysis of Sequencing Count Data

Author: Dadaneh Siamak Zamani
Qian Xiaoning
Zhou Mingyuan
Publication date: 02/05/2017

We perform differential expression analysis of high-throughput sequencing count data under a Bayesian nonparametric framework, removing sophisticated ad-hoc pre-processing steps commonly required in existing algorithms. We propose to use the gamma (beta) negative binomial process, which takes into account different sequencing depths using sample-specific negative binomial probability (dispersion) parameters, to detect differentially expressed genes by comparing the posterior distributions of gene-specific negative binomial dispersion (probability) parameters. These model parameters are inferred by borrowing statistical strength across both the genes and samples. Extensive experiments on both simulated and real-world RNA sequencing count data show that the proposed differential expression analysis algorithms clearly outperform previously proposed ones in terms of the areas under both the receiver operating characteristic and precision-recall curves. Comment: To appear in Journal of the American Statistical Association.
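As a rough illustration of the count model this abstract describes, the sketch below draws counts n[k, j] ~ NB(r_k, p_j) with a gene-specific dispersion r_k and a sample-specific probability p_j, so that a larger p_j stands in for a more deeply sequenced sample. The parameterization (mean r * p / (1 - p)), the function names, and the example values are assumptions made for illustration. This is not the authors' implementation, and no posterior inference is attempted.

```python
import numpy as np


def nb_mean(r, p):
    """Mean of NB(r, p) under the assumed parameterization: r * p / (1 - p)."""
    return r * p / (1.0 - p)


def simulate_counts(r_genes, p_samples, seed=0):
    """Draw a genes-by-samples matrix with n[k, j] ~ NB(r_genes[k], p_samples[j])."""
    rng = np.random.default_rng(seed)
    r = np.asarray(r_genes, dtype=float)[:, None]    # shape (genes, 1)
    p = np.asarray(p_samples, dtype=float)[None, :]  # shape (1, samples)
    # NumPy's negative_binomial(n, q) has mean n * (1 - q) / q, so pass q = 1 - p
    # to match the assumed parameterization above.
    return rng.negative_binomial(r, 1.0 - p)


if __name__ == "__main__":
    # Hypothetical values: three genes, three samples of increasing depth.
    counts = simulate_counts(r_genes=[0.5, 2.0, 10.0], p_samples=[0.3, 0.6, 0.9])
    print(counts)
    print(nb_mean(2.0, 0.6))
```

Under this assumed parameterization, a change in p_j rescales the expected count of every gene in sample j by the same factor p_j / (1 - p_j), which is one way to read the abstract's remark that the sample-specific parameter accounts for sequencing depth.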

arXiv.org e-Print Archive

FigShare

Simulating multiple faceted variability in single cell RNA sequencing.

Author: Xu Chenling
Yosef Nir
Zhang Xiuwei
Publication venue: eScholarship, University of California
Publication date: 01/06/2019

The abundance of new computational methods for processing and interpreting transcriptomes at a single cell level raises the need for in silico platforms for evaluation and validation. Here, we present SymSim, a simulator that explicitly models the processes that give rise to data observed in single cell RNA-Seq experiments. The components of the SymSim pipeline pertain to the three primary sources of variation in single cell RNA-Seq data: noise intrinsic to the process of transcription, extrinsic variation indicative of different cell states (both discrete and continuous), and technical variation due to low sensitivity and measurement noise and bias. We demonstrate how SymSim can be used for benchmarking methods for clustering, differential expression and trajectory inference, and for examining the effects of various parameters on their performance. We also show how SymSim can be used to evaluate the number of cells required to detect a rare population under various scenarios.

eScholarship - University of California

Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data

Author: Haydon Daniel T.
King David J.
King Donald
Morelli Marco J.
Orton Richard J.
Paton David
Wright Caroline F.
Publication venue: Springer Science and Business Media LLC
Publication date: 24/03/2015

There is a high prevalence of coronary artery disease (CAD) in patients with left bundle branch block (LBBB); however there are many other causes for this electrocardiographic abnormality. Non-invasive assessment of these patients remains difficult, and all commonly used modalities exhibit several drawbacks. This often leads to these patients undergoing invasive coronary angiography which may not have been necessary. In this review, we examine the uses and limitations of commonly performed non-invasive tests for diagnosis of CAD in patients with LBBB.

Crossref

Springer - Publisher Connector

PubMed Central

Enlighten

MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification

Author: Li Jingyi Jessica
Li Wei Vivian
Zhang Shihua
Zhao Anqi
Publication date: 02/12/2017

Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challenging due to the information loss in sequencing experiments. A recent accumulation of multiple RNA-seq data sets from the same tissue or cell type provides new opportunities to improve the accuracy of isoform quantification. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore the possible heterogeneity in the quality of different samples and could result in biased and unrobust estimates. In this article, we develop a method, which we call "joint modeling of multiple RNA-seq samples for accurate isoform quantification" (MSIQ), for more accurate and robust isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples by allowing for higher weights on the consistent group. We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy and effectiveness of MSIQ compared with alternative methods through simulation studies on D. melanogaster genes. We justify MSIQ's advantages over existing approaches via application studies on real RNA-seq data from human embryonic stem cells, brain tissues, and the HepG2 immortalized cell line.

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Discrete distributional differential expression (D3E)--a tool for gene expression analysis of single-cell RNA-seq data.

Author: Delmans Mihails
Hemberg Martin
Publication venue: BMC Bioinformatics
Publication date: 29/02/2016

BACKGROUND: The advent of high throughput RNA-seq at the single-cell level has opened up new opportunities to elucidate the heterogeneity of gene expression. One of the most widespread applications of RNA-seq is to identify genes which are differentially expressed between two experimental conditions. RESULTS: We present a discrete, distributional method for differential gene expression (D3E), a novel algorithm specifically designed for single-cell RNA-seq data. We use synthetic data to evaluate D3E, demonstrating that it can detect changes in expression, even when the mean level remains unchanged. Since D3E is based on an analytically tractable stochastic model, it provides additional biological insights by quantifying biologically meaningful properties, such as the average burst size and frequency. We use D3E to investigate experimental data, and with the help of the underlying model, we directly test hypotheses about the driving mechanism behind changes in gene expression. CONCLUSION: Evaluation using synthetic data shows that D3E performs better than other methods for identifying differentially expressed genes since it is designed to take full advantage of the information available from single-cell RNA-seq experiments. Moreover, the analytical model underlying D3E makes it possible to gain additional biological insights.
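The abstract's point that expression can change while the mean stays the same can be illustrated with a generic distributional statistic. The sketch below computes a two-sample Kolmogorov-Smirnov statistic in plain Python and applies it to two made-up count vectors with identical means. The abstract does not say which statistic the tool uses, so the choice of statistic, the function name, and the data are assumptions for illustration only.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at a data point.
    for x in sorted(set(a_sorted) | set(b_sorted)):
        fa = bisect_right(a_sorted, x) / len(a_sorted)  # fraction of a that is <= x
        fb = bisect_right(b_sorted, x) / len(b_sorted)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


if __name__ == "__main__":
    # Hypothetical counts for one gene in two conditions, both with mean 5.
    steady = [5] * 10             # every cell expresses at a similar level
    bursty = [0] * 5 + [10] * 5   # half the cells are off, half are high
    print(sum(steady) / len(steady), sum(bursty) / len(bursty))
    print(ks_statistic(steady, bursty))
```

A comparison of means sees no difference between these two vectors, while the distributional statistic separates them clearly.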

Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics

Author: Jørgensen Andreas Christ Sølvsten
Marguerat Samuel
Shahrezaei Vahid
Tang Wenhao
Thomas Philipp
Publication venue: Oxford University Press
Publication date: 01/07/2023

Motivation: Gene expression is characterized by stochastic bursts of transcription that occur at brief and random periods of promoter activity. The kinetics of gene expression burstiness differs across the genome and is dependent on the promoter sequence, among other factors. Single-cell RNA sequencing (scRNA-seq) has made it possible to quantify the cell-to-cell variability in transcription at a global genome-wide level. However, scRNA-seq data are prone to technical variability, including low and variable capture efficiency of transcripts from individual cells. Results: Here, we propose a novel mathematical theory for the observed variability in scRNA-seq data. Our method captures burst kinetics and variability in both the cell size and capture efficiency, which allows us to propose several likelihood-based and simulation-based methods for the inference of burst kinetics from scRNA-seq data. Using both synthetic and real data, we show that the simulation-based methods provide an accurate, robust and flexible tool for inferring burst kinetics from scRNA-seq data. In particular, in a supervised manner, a simulation-based inference method based on neural networks proves to be accurate and useful when applied to both allele and non-allele-specific scRNA-seq data. Availability and implementation: The code for Neural Network and Approximate Bayesian Computation inference is available at https://github.com/WT215/nnRNA and https://github.com/WT215/Julia_ABC, respectively.
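One common way to think about capture efficiency is binomial thinning: each transcript in a cell is observed independently with probability beta. The sketch below assumes that model with a single fixed beta, which is a simplification of the variable capture efficiency and cell size the abstract mentions, and it is not taken from the linked repositories. The function names and values are hypothetical. Under this assumption the observed mean is beta * m, the observed variance is beta^2 * v + beta * (1 - beta) * m, and the Fano factor shrinks toward 1 as 1 + beta * (F - 1).

```python
import numpy as np


def thinned_moments(mean, var, beta):
    """Mean and variance of counts after binomial thinning with capture probability beta."""
    return beta * mean, beta**2 * var + beta * (1.0 - beta) * mean


def fano_after_capture(fano, beta):
    """Fano factor (variance / mean) after thinning: 1 + beta * (fano - 1)."""
    return 1.0 + beta * (fano - 1.0)


def capture(true_counts, beta, seed=0):
    """Simulate observed counts: each transcript is kept with probability beta."""
    rng = np.random.default_rng(seed)
    return rng.binomial(np.asarray(true_counts), beta)


if __name__ == "__main__":
    # Hypothetical true transcript counts for four cells and a 30% capture rate.
    true_counts = np.array([0, 5, 20, 100])
    print(capture(true_counts, beta=0.3))
    print(thinned_moments(mean=10.0, var=50.0, beta=0.3))
    print(fano_after_capture(fano=5.0, beta=0.3))
```

The Fano relation shows why ignoring capture efficiency can bias burst estimates: a strongly over-dispersed gene looks closer to Poisson after low-efficiency capture.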

UCL Discovery

lncRNA-dependent mechanisms of androgen-receptor-regulated gene activation programs.

Author: Evans Christopher P
Jin Chunyu
Li Wenbo
Lin Chunru
Meng Da
Merkurjev Daria
Ohgi Kenneth A
Rosenfeld Michael G
Tanasa Bogdan
Yang Joy C
Yang Liuqing
Zhang Jie
Publication venue: eScholarship, University of California
Publication date: 01/08/2013

Although recent studies have indicated roles of long non-coding RNAs (lncRNAs) in physiological aspects of cell-type determination and tissue homeostasis, their potential involvement in regulated gene transcription programs remains rather poorly understood. The androgen receptor regulates a large repertoire of genes central to the identity and behaviour of prostate cancer cells, and functions in a ligand-independent fashion in many prostate cancers when they become hormone refractory after initial androgen deprivation therapy. Here we report that two lncRNAs highly overexpressed in aggressive prostate cancer, PRNCR1 (also known as PCAT8) and PCGEM1, bind successively to the androgen receptor and strongly enhance both ligand-dependent and ligand-independent androgen-receptor-mediated gene activation programs and proliferation in prostate cancer cells. Binding of PRNCR1 to the carboxy-terminally acetylated androgen receptor on enhancers and its association with DOT1L appear to be required for recruitment of the second lncRNA, PCGEM1, to the androgen receptor amino terminus that is methylated by DOT1L. Unexpectedly, recognition of specific protein marks by PCGEM1-recruited pygopus 2 PHD domain enhances selective looping of androgen-receptor-bound enhancers to target gene promoters in these cells. In 'resistant' prostate cancer cells, these overexpressed lncRNAs can interact with, and are required for, the robust activation of both truncated and full-length androgen receptor, causing ligand-independent activation of the androgen receptor transcriptional program and cell proliferation. Conditionally expressed short hairpin RNA targeting these lncRNAs in castration-resistant prostate cancer cell lines strongly suppressed tumour xenograft growth in vivo. Together, these results indicate that these overexpressed lncRNAs can potentially serve as a required component of castration-resistance in prostatic tumours.

eScholarship - University of California

Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics

Author: Jørgensen ACS
Marguerat S
Shahrezaei V
Tang W
Thomas P
Publication venue: Oxford University Press (OUP)
Publication date: 22/06/2023

MOTIVATION: Gene expression is characterised by stochastic bursts of transcription that occur at brief and random periods of promoter activity. The kinetics of gene expression burstiness differs across the genome and is dependent on the promoter sequence, among other factors. Single-cell RNA sequencing (scRNA-seq) has made it possible to quantify the cell-to-cell variability in transcription at a global genome-wide level. However, scRNA-seq data is prone to technical variability, including low and variable capture efficiency of transcripts from individual cells. RESULTS: Here, we propose a novel mathematical theory for the observed variability in scRNA-seq data. Our method captures burst kinetics and variability in both the cell size and capture efficiency, which allows us to propose several likelihood-based and simulation-based methods for the inference of burst kinetics from scRNA-seq data. Using both synthetic and real data, we show that the simulation-based methods provide an accurate, robust and flexible tool for inferring burst kinetics from scRNA-seq data. In particular, in a supervised manner, a simulation-based inference method based on neural networks proves to be accurate and useful when applied to both allele and non-allele-specific scRNA-seq data. AVAILABILITY: The code for Neural Network and Approximate Bayesian Computation inference is available at https://github.com/WT215/nnRNA and https://github.com/WT215/Julia_ABC, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Spiral - Imperial College Digital Repository