MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification
Next-generation RNA sequencing (RNA-seq) technology has been widely used to
assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq
data offer insight into gene expression levels and transcriptome structures,
enabling us to better understand the regulation of gene expression and
fundamental biological processes. Accurate isoform quantification from RNA-seq
data is challenging due to the information loss in sequencing experiments. A
recent accumulation of multiple RNA-seq data sets from the same tissue or cell
type provides new opportunities to improve the accuracy of isoform
quantification. However, existing statistical or computational methods for
multiple RNA-seq samples either pool the samples into one sample or assign
equal weights to the samples when estimating isoform abundance. These methods
ignore the possible heterogeneity in the quality of different samples and could
result in biased and unrobust estimates. In this article, we develop a method,
which we call "joint modeling of multiple RNA-seq samples for accurate isoform
quantification" (MSIQ), for more accurate and robust isoform quantification by
integrating multiple RNA-seq samples under a Bayesian framework. Our method
aims to (1) identify a consistent group of samples with homogeneous quality and
(2) improve isoform quantification accuracy by jointly modeling multiple
RNA-seq samples while allowing higher weights on the consistent group. We show
that MSIQ provides a consistent estimator of isoform abundance, and we
demonstrate the accuracy and effectiveness of MSIQ compared with alternative
methods through simulation studies on D. melanogaster genes. We justify MSIQ's
advantages over existing approaches via application studies on real RNA-seq
data from human embryonic stem cells, brain tissues, and the HepG2 immortalized
cell line.
TROM: A Testing-based Method for Finding Transcriptomic Similarity of Biological Samples
Comparative transcriptomics has gained increasing popularity in genomic
research thanks to the development of high-throughput technologies including
microarray and next-generation RNA sequencing, which have generated a wealth of
transcriptomic data. An important question is how to understand the conservation
and differentiation of biological processes in different species. We propose a
testing-based method TROM (Transcriptome Overlap Measure) for comparing
transcriptomes within or between different species, and provide a different
perspective to interpret transcriptomic similarity in contrast to traditional
correlation analyses. Specifically, the TROM method focuses on identifying
associated genes that capture molecular characteristics of biological samples,
and subsequently comparing the biological samples by testing the overlap of
their associated genes. We use simulation and real data studies to demonstrate
that TROM is more powerful in identifying similar transcriptomes and more
robust to stochastic gene expression noise than Pearson and Spearman
correlations. We apply TROM to compare the developmental stages of six
Drosophila species, C. elegans, S. purpuratus, D. rerio and mouse liver, and
find interesting correspondence patterns that imply conserved gene expression
programs in the development of these species. The TROM method is available as
an R package on CRAN (http://cran.r-project.org/), with manuals and source code
available at http://www.stat.ucla.edu/~jingyi.li/software-and-data/trom.html
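The overlap-testing idea in the TROM abstract can be illustrated with a small sketch. It assumes the overlap of two sets of associated genes is assessed with an upper-tail hypergeometric test; the function name `overlap_pvalue` and the toy gene identifiers are hypothetical, and this is not the TROM package's own code or interface.

```python
from math import comb


def overlap_pvalue(population, set_a, set_b):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    Assumption: both sets are drawn from the same background `population`.
    """
    a = set_a & population
    b = set_b & population
    n_total, n_a, n_b = len(population), len(a), len(b)
    observed = len(a & b)
    # P(X >= observed), where X is the overlap if b were drawn at random.
    tail = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(observed, min(n_a, n_b) + 1)
    )
    return tail / comb(n_total, n_b)


# Toy example with made-up gene identifiers (not real data).
genes = {f"g{i}" for i in range(20)}
sample_1 = {"g0", "g1", "g2", "g3", "g4"}
sample_2 = {"g0", "g1", "g2", "g3", "g10"}
p_value = overlap_pvalue(genes, sample_1, sample_2)
```

A small p-value here means the two samples share more associated genes than expected by chance, which is the sense in which two transcriptomes would be called similar under this sketch.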
EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences.
The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm whose novelty lies in incorporating the varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.
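To make the phrase "local alignment by dynamic programming" concrete, here is a minimal sketch of a generic Smith-Waterman-style score applied to sequences of chromatin state labels. The scoring values, the function name `local_align_score`, and the state labels are assumptions for illustration; EpiAlign's own scoring, which accounts for state lengths and frequencies, is not reproduced here.

```python
def local_align_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences of state labels."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # Flooring at 0 lets an alignment restart anywhere (local, not global).
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


# Hypothetical chromatin state sequences (labels are placeholders).
region_1 = ["E1", "E2", "E3", "E9"]
region_2 = ["E7", "E1", "E2", "E3"]
best_score = local_align_score(region_1, region_2)
```

The two toy regions share the run E1, E2, E3, so the best local score comes from aligning those three states and ignoring the flanking ones.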
Modeling and analysis of RNA-seq data: a review from a statistical perspective
Background: Since the invention of next-generation RNA sequencing (RNA-seq)
technologies, they have become a powerful tool to study the presence and
quantity of RNA molecules in biological samples and have revolutionized
transcriptomic studies. The analysis of RNA-seq data at four different levels
(samples, genes, transcripts, and exons) involves multiple statistical and
computational questions, some of which remain challenging to date.
Results: We review RNA-seq analysis tools at the sample, gene, transcript,
and exon levels from a statistical perspective. We also highlight the
biological and statistical questions of greatest practical concern.
Conclusion: The development of statistical and computational methods for
analyzing RNA-seq data has advanced significantly in the past decade.
However, methods developed to answer the same biological question often rely on
diverse statistical models and exhibit different performance under different
scenarios. This review discusses and compares multiple commonly used
statistical models regarding their assumptions, in the hope of helping users
select appropriate methods as needed, as well as assisting developers in
future method development.
Issues arising from benchmarking single-cell RNA sequencing imputation methods
On June 25th, 2018, Huang et al. published a computational method, SAVER, in
Nature Methods for imputing dropout gene expression levels in single-cell RNA
sequencing (scRNA-seq) data. Huang et al. performed a set of comprehensive
benchmarking analyses, including comparison with the data from RNA fluorescence
in situ hybridization, to demonstrate that SAVER outperformed two existing
scRNA-seq imputation methods, scImpute and MAGIC. However, their computational
analyses were based on semi-synthetic data that the authors had generated
following the Poisson-Gamma model used in the SAVER method. We have therefore
re-examined Huang et al.'s study. We find that the semi-synthetic data have
very different properties from those of real scRNA-seq data and that the cell
clusters used for benchmarking are inconsistent with the cell types labeled by
biologists. We show that a reanalysis based on real scRNA-seq data and grounded
in biological knowledge of cell types leads to different results and
conclusions from those of Huang et al.
Predicting and comparing transcription start sites in single cell populations
The advent of 5' single-cell RNA sequencing (scRNA-seq) technologies offers unique opportunities to identify and analyze transcription start sites (TSSs) at single-cell resolution. These technologies have the potential to uncover the complexities of transcription initiation and alternative TSS usage across different cell types and conditions. Despite the emergence of computational methods designed to analyze 5' RNA sequencing data, current methods often lack comparative evaluations in single-cell contexts and are predominantly tailored for paired-end data, neglecting the potential of single-end data. This study introduces scTSS, a computational pipeline developed to bridge this gap by accommodating both paired-end and single-end 5' scRNA-seq data. scTSS enables joint analysis of multiple single-cell samples, starting with TSS cluster prediction and quantification, followed by differential TSS usage analysis. It employs a binomial generalized linear mixed model to accurately and efficiently detect differential TSS usage. We demonstrate the utility of scTSS through its application in analyzing transcriptional initiation from single-cell data of two distinct diseases. The results illustrate scTSS's ability to discern alternative TSS usage between different cell types or biological conditions and to identify cell subpopulations characterized by unique TSS-level expression profiles.
Analyzing RNA-Seq data from Chlamydia with super broad transcriptomic activation: challenges, solutions, and implications for other systems.
BACKGROUND: RNA sequencing (RNA-Seq) offers profound insights into the complex transcriptomes of diverse biological systems. However, standard differential expression analysis pipelines based on DESeq2 and edgeR encounter challenges when applied to the immediate early transcriptomes of Chlamydia spp., obligate intracellular bacteria. These challenges arise from their reliance on assumptions that do not hold in scenarios characterized by extensive transcriptomic activation and limited repression. RESULTS: Standard analyses using unique chlamydial RNA-Seq reads alone identify nearly 300 upregulated and about 300 downregulated genes, significantly deviating from actual RNA-Seq read trends. By incorporating both chlamydial and host reads or adjusting for total sequencing depth, the revised normalization methods each detect over 700 upregulated genes and 30 or fewer downregulated genes, closely aligning with the observed RNA-Seq data. Further validation through qRT-PCR analysis confirmed the effectiveness of these adjusted approaches in capturing the true extent of transcriptomic activation during the immediate early phase of chlamydial infection. CONCLUSIONS: This study highlights the limitations of standard RNA-Seq analysis tools in scenarios with extensive transcriptomic activation, such as in Chlamydia spp. during early infection. Our revised normalization methods, incorporating host reads or total sequencing depth, provide a more accurate representation of gene expression dynamics. These approaches may inform similar adjustments in other systems with unbalanced gene expression dynamics, enhancing the accuracy of transcriptomic analysis.
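The effect of the normalization denominator described in this abstract can be seen in a toy calculation. The sketch below uses plain library-size scaling, which is a simplification and not the normalization performed by DESeq2 or edgeR; all counts, gene names, and the helper names `scale_counts` and `fold_changes` are invented for illustration and are not data from the study.

```python
def scale_counts(counts, library_size, per=1_000_000):
    """Scale raw counts to reads per `per` reads of the chosen library size."""
    return {gene: c * per / library_size for gene, c in counts.items()}


def fold_changes(early, late):
    """Late-over-early ratio for each gene."""
    return {gene: late[gene] / early[gene] for gene in early}


# Invented bacterial counts: every gene rises between the two time points.
early = {"geneA": 10, "geneB": 20, "geneC": 70}
late = {"geneA": 40, "geneB": 120, "geneC": 140}
# Invented total depth (bacterial + host reads), equal at both time points.
total_depth = 1_000_000

# Option 1: library size = bacterial reads only.
fc_bacterial = fold_changes(
    scale_counts(early, sum(early.values())),
    scale_counts(late, sum(late.values())),
)
# Option 2: library size = total sequencing depth, host reads included.
fc_total = fold_changes(
    scale_counts(early, total_depth),
    scale_counts(late, total_depth),
)
```

With bacterial reads as the denominator, geneC appears to decrease even though its raw count doubled, because the denominator itself grew threefold. Scaling by total depth keeps all three genes upregulated, matching the raw counts.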
