7 research outputs found

    Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments

    This Excel file compares the resulting sample size and power between Li et al.’s method [18] and our proposed method for simulation 1, using the parameter settings from Table 1 in [18]. The results are obtained under m=200; for each parameter setting, Li’s result is in the first row and our result is in the second row. (XLS 49.2 kb)

    Dedicated transcriptomics combined with power analysis lead to functional understanding of genes with weak phenotypic changes in knockout lines

    Author summary: Knockout mice benefit the understanding of gene functions in mammals. However, for many genes it has proven difficult to identify clear phenotypes, due to a lack of sufficient assays. As Lewis Wolpert put it in a famous quote, “But did you take them to the opera?”, metaphorically alluding to the need to extend phenotyping efforts. This insight led to the establishment of phenotyping pipelines that are nowadays routinely used to characterize knockout lines. However, transcriptomic approaches based on RNA-Seq have been much less explored for such deep-level studies. Here we conducted both a theoretical power analysis and practical RNA-Seq experiments on two knockout lines with small phenotypic effects to investigate parameters including sample size, sequencing depth, fold change, and dispersion. Our dedicated RNA-Seq studies discovered thousands of genes with small transcriptional changes that are enriched in specific functions in both knockout lines. We find that it is more important to increase the number of samples than to increase the sequencing depth. Our work shows that a deep RNA-Seq study on knockouts is powerful for understanding gene functions in cases of weak phenotypic effects, and provides a guideline for the experimental design of such studies.

    Sample size calculations and normalization methods for RNA-seq data.

    High-throughput RNA sequencing (RNA-seq) has become the preferred choice for transcriptomics and gene expression studies. With the rapid growth of RNA-seq applications, sample size calculation methods for RNA-seq experiment design and data normalization methods for DEG analysis are important issues to be explored and discussed. The underlying theme of this dissertation is to develop novel sample size calculation methods in RNA-seq experiment design using test statistics. I have also proposed two novel normalization methods for analysis of RNA-seq data. In chapter one, I present test statistics, including Wald’s test, log-transformed Wald’s test and likelihood ratio test statistics, for RNA-seq data with a negative binomial distribution. Following the test statistics, I present five sample size calculation methods based on a one-sided test. I compare my five methods with an existing method by calculating the sample sizes and the simulated power in different scenarios. Due to the limitations of these methods, in chapter two, I further derive two explicit sample size calculation methods based on a generalized linear model with a negative binomial distribution for RNA-seq data. These two sample size methods, based on a two-sided Wald’s test, are presented under a wide range of settings, including imbalanced design and unequal read depth, which makes them applicable in many situations. In chapter three, I review the existing normalization methods and describe the challenge of choosing an optimal normalization method, given that multiple factors contribute to read count variability and affect the overall sensitivity and specificity. Then, I present two proposed normalization methods. I evaluate the performance of the commonly used methods (DESeq, TMM-edgeR, FPKM-CuffDiff, TC, Med, UQ and FQ) and the two new methods I propose: Med-pgQ2 and UQ-pgQ2. The results from the MAQC2 data show that my proposed Med-pgQ2 and UQ-pgQ2 methods may be better choices for the differential gene analysis of RNA-seq data, improving specificity while maintaining good detection power at a nominal FDR level. Finally, in chapter four, I focus on the analysis of RNA-seq data using three normalization methods and two test statistical methods with the aid of the DESeq2 and edgeR packages. Through within-group analysis of these real RNA-seq data, I have found that my normalization method, UQ-pgQ2, performs best, with a lower false positive rate while maintaining good detection power. Thus, in my work, I have derived explicit sample size calculation methods, which are a very useful tool for researchers to quickly estimate sample sizes in experiment design. Furthermore, my two normalization methods can improve the performance of differential gene analysis of RNA-seq data by controlling false positives for high read count genes.