Multivariate approach to the analysis of correlated RNA-seq data
Thesis (Master's), Seoul National University Graduate School, Department of Statistics, February 2017. Taesung Park (박태성).
High-throughput RNA-seq technology has emerged as a powerful tool for understanding the molecular basis of phenotypic variation in biology, including disease. Recently, correlated RNA-seq datasets have begun to be generated. While several approaches have been proposed for identifying differentially expressed genes (DEGs), few methods can analyze correlated RNA-seq data. We expect the simultaneous analysis of correlated RNA-seq data to increase the power to detect DEGs. In this paper, we propose a multivariate method for finding DEGs in correlated RNA-seq data based on the Generalized Estimating Equations (GEE) approach. The advantage of the proposed method is that it analyzes correlated RNA-seq datasets simultaneously while accounting for the correlations among them. Through real data analysis and simulation studies, we show that our multivariate approach has higher power to detect DEGs than existing methods.
1 Introduction
1.1 Background
1.2 Purpose
2 Material and Methods
2.1 Real RNA-seq datasets
2.1.1 Diet data
2.1.2 Toxicity data
2.2 Review of commonly used approach
2.2.1 edgeR
2.2.2 DESeq
2.2.3 limma+voom
2.3 Proposed approach : GEE method
3 Simulations
3.1 Simulation Settings
3.1.1 Different number of DEGs
3.1.2 Different value of φ
3.1.3 Different number of correlated datasets
3.2 Results of Simulation
4 Application to Real Data
5 Discussion
Bibliography
Abstract (in Korean)
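For orientation, the GEE approach named in the abstract above can be written in its standard generic form. This is a sketch of the textbook estimating equation only; the thesis's actual mean model, link function, and working correlation are not given in this listing, so every symbol below is an assumption. In particular, φ here is the usual GEE scale parameter and may not be the same quantity as the φ in the thesis outline.

```latex
% Generic GEE estimating equation for clusters i = 1, ..., n
% (here a "cluster" would be one gene's measurements across correlated datasets).
\sum_{i=1}^{n} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2}
```

Here y_i is the response vector for cluster i, μ_i(β) its modeled mean, A_i a diagonal matrix of variance-function values, and R(α) the working correlation matrix that carries the dependence between the correlated measurements.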
Identification and characterization of dysregulated P-element induced wimpy testis-interacting RNAs in head and neck squamous cell carcinoma.
It is clear that alcohol consumption is a major risk factor in the pathogenesis of head and neck squamous cell carcinoma (HNSCC); however, the molecular mechanism underlying the pathogenesis of alcohol-associated HNSCC remains poorly understood. The aim of the present study was to identify and characterize P-element-induced wimpy testis (PIWI)-interacting RNAs (piRNAs) and PIWI proteins dysregulated in alcohol-associated HNSCC to elucidate their function in the development of this cancer. Using next generation RNA-sequencing (RNA-seq) data obtained from 40 HNSCC patients, the piRNA and PIWI protein expression of HNSCC samples was compared between alcohol drinkers and non-drinkers. A separate piRNA expression RNA-seq analysis of 18 non-smoker HNSCC patients was also conducted. To verify piRNA expression, reverse transcription-quantitative polymerase chain reaction (RT-qPCR) was performed on the most differentially expressed alcohol-associated piRNAs in ethanol and acetaldehyde-treated normal oral keratinocytes. The correlation between piRNA expression and patient survival was analyzed using Kaplan-Meier estimators and multivariate Cox proportional hazard models. A comparison between alcohol drinking and non-drinking HNSCC patients demonstrated that a panel of 3,223 piRNA transcripts were consistently detected and differentially expressed. RNA-seq analysis and in vitro RT-qPCR verification revealed that 4 of these piRNAs, piR-35373, piR-266308, piR-58510 and piR-38034, were significantly dysregulated between drinking and non-drinking cohorts. Of these four piRNAs, low expression of piR-58510 and piR-35373 significantly correlated with improved patient survival. Furthermore, human PIWI-like protein 4 was consistently upregulated in ethanol and acetaldehyde-treated normal oral keratinocytes. 
These results demonstrate that alcohol consumption may cause dysregulation of piRNA expression in HNSCC, and in vitro verification identified 4 piRNAs that may be involved in the pathogenesis of alcohol-associated HNSCC.
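The survival analysis mentioned in this abstract relies on Kaplan-Meier estimators. As a minimal sketch of how that estimator is computed, the function below implements the product-limit formula in plain Python. The function name and the toy follow-up times are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  -- follow-up duration for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (months of follow-up), not taken from the study.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Comparing two such curves (for example, low versus high expression of one piRNA) would additionally need a log-rank test or a Cox model, which this sketch does not cover.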
Inferring evolutionary histories of pathway regulation from transcriptional profiling data
One of the outstanding challenges in comparative genomics is to interpret the
evolutionary importance of regulatory variation between species. Rigorous
molecular evolution-based methods to infer evidence for natural selection from
expression data are at a premium in the field, and to date, phylogenetic
approaches have not been well-suited to address the question in the small sets
of taxa profiled in standard surveys of gene expression. We have developed a
strategy to infer evolutionary histories from expression profiles by analyzing
suites of genes of common function. In a manner conceptually similar to
molecular evolution models in which the evolutionary rates of DNA sequence at
multiple loci follow a gamma distribution, we modeled expression of the genes
of an a priori-defined pathway with rates drawn from an inverse gamma
distribution. We then developed a fitting strategy to infer the parameters of
this distribution from expression measurements, and to identify gene groups
whose expression patterns were consistent with evolutionary constraint or rapid
evolution in particular species. Simulations confirmed the power and accuracy
of our inference method. As an experimental testbed for our approach, we
generated and analyzed transcriptional profiles of four Saccharomyces
yeasts. The results revealed pathways with signatures of constrained and
accelerated regulatory evolution in individual yeasts and across the phylogeny,
highlighting the prevalence of pathway-level expression change during the
divergence of yeast species. We anticipate that our pathway-based phylogenetic
approach will be of broad utility in the search to understand the evolutionary
relevance of regulatory change.
Comment: 30 pages, 12 figures, 2 tables, contact authors for supplementary table
Variance component score test for time-course gene set analysis of longitudinal RNA-seq data
As gene expression measurement technology is shifting from microarrays to
sequencing, the statistical tools available for their analysis must be adapted
since RNA-seq data are measured as counts. Recently, it has been proposed to
tackle the count nature of these data by modeling log-count reads per million
as continuous variables, using nonparametric regression to account for their
inherent heteroscedasticity. Adopting such a framework, we propose tcgsaseq, a
principled, model-free and efficient top-down method for detecting longitudinal
changes in RNA-seq gene sets. Considering gene sets defined a priori, tcgsaseq
identifies those whose expression varies over time, based on an original variance
component score test accounting for both covariates and heteroscedasticity
without assuming any specific parametric distribution for the transformed
counts. We demonstrate that despite the presence of a nonparametric component,
our test statistic has a simple form and limiting distribution, and both may be
computed quickly. A permutation version of the test is additionally proposed
for very small sample sizes. Applied to both simulated data and two real
datasets, the proposed method is shown to exhibit very good statistical
properties, with an increase in stability and power when compared to the
state-of-the-art methods ROAST, edgeR and DESeq2, which can fail to control the type I
error under certain realistic settings. We have made the method available for
the community in the R package tcgsaseq.
Comment: 23 pages, 6 figures, typo corrections & acceptance acknowledgement
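The abstract above builds on modeling log-counts per million as continuous variables. The sketch below shows one common way to compute such values. The offsets (0.5 added to each count, 1 added to the library size) follow the convention used by limma's voom and are an assumption here; the listing does not say which exact transformation tcgsaseq applies. The function name and the counts are hypothetical.

```python
import math


def log2_cpm(counts, prior=0.5):
    """Transform raw counts to log2 counts per million, sample by sample.

    counts -- list of samples, each a list of per-gene read counts
    prior  -- small offset that keeps zero counts finite (assumed value)
    """
    result = []
    for sample in counts:
        lib_size = sum(sample)  # total reads in this sample
        result.append(
            [math.log2((c + prior) / (lib_size + 1.0) * 1e6) for c in sample]
        )
    return result


# Hypothetical counts for two samples and three genes.
raw = [[10, 0, 990], [200, 5, 1795]]
transformed = log2_cpm(raw)
for row in transformed:
    print([round(v, 3) for v in row])
```

Because the variance of these values still depends on the mean, methods in this family pair the transformation with a separate estimate of that mean-variance trend, which is the heteroscedasticity the abstract refers to.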
Network-based approaches to explore complex biological systems towards network medicine
Network medicine relies on different types of networks, from the molecular level of protein–protein interactions to gene regulatory networks and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbations within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting connectivity significance instead of connection density. The past few years have witnessed increasing interest in understanding the molecular mechanisms of post-transcriptional regulation, with a special emphasis on non-coding RNAs, since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs, including long non-coding RNAs (lncRNAs), competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. In addition, a very promising example of a co-expression network approach is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes, called switch genes, critically associated with drastic changes in cell phenotype. Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.
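The contrast between connectivity significance and connection density in the abstract above can be illustrated with a hypergeometric tail probability: given a node's degree, how surprising is its number of links to known disease (seed) genes? The sketch below is a simplified illustration of that idea, not the DIAMOnD implementation; the function name, argument names, and network sizes are hypothetical.

```python
from math import comb


def connectivity_pvalue(n_nodes, n_seeds, degree, seed_links):
    """P(X >= seed_links) when X is hypergeometric.

    n_nodes    -- number of nodes in the network
    n_seeds    -- number of seed (known disease) nodes
    degree     -- number of links of the candidate node
    seed_links -- how many of those links go to seed nodes
    """
    total = comb(n_nodes, degree)
    upper = min(degree, n_seeds)
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(seed_links, upper + 1)
    )
    return tail / total


# Hypothetical network: 1000 nodes, 20 of them seeds.
hub = connectivity_pvalue(1000, 20, degree=100, seed_links=5)
small = connectivity_pvalue(1000, 20, degree=5, seed_links=4)
print(f"hub:   5 seed links out of 100, p = {hub:.3g}")
print(f"small: 4 seed links out of 5,   p = {small:.3g}")
```

In this toy case the hub has more seed links in absolute terms, yet the low-degree node gets the smaller p-value, because four seed neighbors out of five is far less likely by chance. Ranking candidates by this p-value instead of by raw link counts is the design choice the abstract describes.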