8,616 research outputs found

    Multivariate approach to the analysis of correlated RNA-seq data

    Thesis (Master's) -- Seoul National University Graduate School: Department of Statistics, February 2017. Taesung Park (박태성).

    High-throughput RNA-seq technology has emerged as a powerful tool for understanding the molecular basis of phenotype variation in biology, including disease. Recently, correlated RNA-seq datasets have begun to be generated. While several approaches have been proposed for identifying differentially expressed genes (DEGs), few methods can analyze correlated RNA-seq data. We expect the simultaneous analysis of correlated RNA-seq data to increase the power to detect DEGs. In this paper, we propose a multivariate method to find DEGs in correlated RNA-seq data based on the generalized estimating equations (GEE) approach. The advantage of the proposed method is that it analyzes correlated RNA-seq datasets simultaneously while accounting for the correlations among them. Through real data analysis and simulation studies, we show that our multivariate approach has higher power to detect DEGs than existing methods.

    Contents:
      1 Introduction
        1.1 Background
        1.2 Purpose
      2 Material and Methods
        2.1 Real RNA-seq datasets
          2.1.1 Diet data
          2.1.2 Toxicity data
        2.2 Review of commonly used approach
          2.2.1 edgeR
          2.2.2 DESeq
          2.2.3 limma+voom
        2.3 Proposed approach: GEE method
      3 Simulations
        3.1 Simulation Settings
          3.1.1 Different number of DEGs
          3.1.2 Different value of φ
          3.1.3 Different number of correlated datasets
        3.2 Results of Simulation
      4 Application to Real Data
      5 Discussion
      Bibliography
      Abstract (in Korean)

    Inferring evolutionary histories of pathway regulation from transcriptional profiling data

    One of the outstanding challenges in comparative genomics is to interpret the evolutionary importance of regulatory variation between species. Rigorous molecular evolution-based methods to infer evidence for natural selection from expression data are at a premium in the field, and to date, phylogenetic approaches have not been well-suited to address the question in the small sets of taxa profiled in standard surveys of gene expression. We have developed a strategy to infer evolutionary histories from expression profiles by analyzing suites of genes of common function. In a manner conceptually similar to molecular evolution models in which the evolutionary rates of DNA sequence at multiple loci follow a gamma distribution, we modeled expression of the genes of an a priori-defined pathway with rates drawn from an inverse gamma distribution. We then developed a fitting strategy to infer the parameters of this distribution from expression measurements, and to identify gene groups whose expression patterns were consistent with evolutionary constraint or rapid evolution in particular species. Simulations confirmed the power and accuracy of our inference method. As an experimental testbed for our approach, we generated and analyzed transcriptional profiles of four Saccharomyces yeasts. The results revealed pathways with signatures of constrained and accelerated regulatory evolution in individual yeasts and across the phylogeny, highlighting the prevalence of pathway-level expression change during the divergence of yeast species. We anticipate that our pathway-based phylogenetic approach will be of broad utility in the search to understand the evolutionary relevance of regulatory change.

    Comment: 30 pages, 12 figures, 2 tables, contact authors for supplementary table

    Variance component score test for time-course gene set analysis of longitudinal RNA-seq data

    As gene expression measurement technology is shifting from microarrays to sequencing, the statistical tools available for their analysis must be adapted since RNA-seq data are measured as counts. Recently, it has been proposed to tackle the count nature of these data by modeling log-count reads per million as continuous variables, using nonparametric regression to account for their inherent heteroscedasticity. Adopting such a framework, we propose tcgsaseq, a principled, model-free and efficient top-down method for detecting longitudinal changes in RNA-seq gene sets. Considering gene sets defined a priori, tcgsaseq identifies those whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the transformed counts. We demonstrate that despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, the proposed method is shown to exhibit very good statistical properties, with an increase in stability and power when compared to the state-of-the-art methods ROAST, edgeR and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available for the community in the R package tcgsaseq.

    Comment: 23 pages, 6 figures, typo corrections & acceptance acknowledgemen
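
    The log-counts-per-million transformation mentioned in the abstract above can be sketched as follows. This is only a minimal illustration, not the tcgsaseq implementation: the function name `log2_cpm` is hypothetical, and the offsets 0.5 and 1 follow the convention used by limma's voom, which is an assumption here rather than something the abstract states.

```python
import math


def log2_cpm(counts):
    """Return log2 counts-per-million for a genes x samples count matrix.

    `counts` is a list of rows (one per gene), each a list of raw read
    counts (one per sample). The 0.5 and 1.0 offsets are an assumed,
    voom-style convention that keeps zero counts finite.
    """
    n_samples = len(counts[0])
    # Library size of a sample = total reads over all genes in that sample.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [
            math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for row in counts
    ]


# Toy example: 2 genes, 2 samples, each sample with 100 reads in total.
toy = [[10, 0], [90, 100]]
transformed = log2_cpm(toy)
```

    The transformed values are continuous, so they can be passed to regression-style models; the heteroscedasticity that the abstract refers to still has to be handled separately, for example through precision weights.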

    Network-based approaches to explore complex biological systems towards network medicine

    Network medicine relies on different types of networks: from the molecular level of protein–protein interactions to gene regulatory networks and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbations within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting connectivity significance instead of connection density. The past few years have witnessed an increasing interest in understanding the molecular mechanism of post-transcriptional regulation, with a special emphasis on non-coding RNAs, since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs, including long non-coding RNAs (lncRNAs), competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of a co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes, called switch genes, critically associated with drastic changes in cell phenotype. Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.
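
    The "connectivity significance" that the abstract above contrasts with connection density can be illustrated with a hypergeometric tail probability: for a candidate gene with a given degree, how likely is it to have at least the observed number of links to known disease (seed) genes by chance? The sketch below assumes that standard hypergeometric form; the function name, argument names and toy numbers are hypothetical and are not taken from the DIAMOnD software.

```python
from math import comb


def connectivity_pvalue(n_nodes, n_seeds, degree, seed_links):
    """Probability that a node of the given degree has >= seed_links
    neighbours among the seed genes if its links were placed at random.

    Hypergeometric tail: sum over i of C(s, i) * C(N - s, k - i) / C(N, k).
    math.comb returns 0 when i exceeds n_seeds, so summing up to `degree`
    is safe.
    """
    total = comb(n_nodes, degree)
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(seed_links, degree + 1)
    )
    return tail / total


# Toy network: 10 nodes, 3 of them seed genes (assumed values).
# Each candidate maps to (degree, number of links to seed genes).
candidates = {"geneA": (2, 1), "geneB": (5, 3), "geneC": (4, 0)}

# Rank candidates from most to least significantly connected to the seeds.
ranked = sorted(
    candidates,
    key=lambda g: connectivity_pvalue(10, 3, *candidates[g]),
)
```

    Ranking by this p-value rather than by the raw number of seed links avoids favouring high-degree hub genes, which collect many seed links by chance alone.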