120 research outputs found
Recommended from our members
The Level of Residual Dispersion Variation and the Power of Differential Expression Tests for RNA-Seq Data
RNA-Sequencing (RNA-Seq) has been widely adopted for quantifying gene expression changes in comparative transcriptome analysis. For detecting differentially expressed genes, a variety of statistical methods based on the negative binomial (NB) distribution have been proposed. These methods differ in the ways they handle the NB nuisance parameters (i.e., the dispersion parameters associated with each gene) to save power, such as by using a dispersion model to exploit an apparent relationship between the dispersion parameter and the NB mean. Presumably, dispersion models with fewer parameters will result in greater power if the models are correct, but will produce misleading conclusions if not. This paper investigates this power and robustness trade-off by assessing rates of identifying true differential expression using the various methods under realistic assumptions about NB dispersion parameters. Our results indicate that the relative performances of the different methods are closely related to the level of dispersion variation unexplained by the dispersion model. We propose a simple statistic to quantify the level of residual dispersion variation from a fitted dispersion model and show that the magnitude of this statistic gives hints about whether and how much we can gain statistical power by a dispersion-modeling approach.
Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
Differential Expression of Genes Involved in Host Recognition, Attachment, and Degradation in the Mycoparasite Tolypocladium ophioglossoides
This is the publisher’s final pdf. The published article is copyrighted by the author(s) and published by the Genetics Society of America. The published article can be found at: https://doi.org/10.1534/g3.116.027045
The ability of a fungus to infect novel hosts is dependent on changes in gene content, expression, or regulation. Examining gene expression under simulated host conditions can explore which genes may contribute to host jumping. Insect pathogenesis is the inferred ancestral character state for species of Tolypocladium; however, several species are parasites of truffles, including Tolypocladium ophioglossoides. To identify potentially crucial genes in this interkingdom host switch, T. ophioglossoides was grown on four media conditions: media containing the inner and outer portions of its natural host (truffles of Elaphomyces), cuticles from an ancestral host (beetle), and a rich medium (Yeast Malt). Through high-throughput RNASeq of mRNA from these conditions, many differentially expressed genes were identified in the experiment. These included PTH11-related G-protein-coupled receptors (GPCRs) hypothesized to be involved in host recognition, and also found to be upregulated in insect pathogens. A divergent chitinase with a signal peptide was also found to be highly upregulated on media containing truffle tissue, suggesting an exogenous degradative activity in the presence of the truffle host. The adhesin gene, Mad1, was highly expressed on truffle media as well. A BiNGO analysis of overrepresented GO terms from genes expressed during each growth condition found that genes involved in redox reactions and transmembrane transport were the most overrepresented during T. ophioglossoides growth on truffle media, suggesting their importance in growth on fungal tissue as compared to other hosts and environments. Genes involved in secondary metabolism were most highly expressed during growth on insect tissue, suggesting that their products may not be necessary during parasitism of Elaphomyces. This study provides clues into understanding genetic mechanisms underlying the transition from insect to truffle parasitism.
Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
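The confounding argument in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes Gene Ontology membership is the binary response, uses a binary differential-expression call as the significance measure, and simulates standardized log transcript lengths. All names, effect sizes, and the simulated data are hypothetical.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def sigmoid(eta):
    """Logistic function, with eta clamped to avoid overflow in exp."""
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(rows, y, n_iter=15, ridge=1e-8):
    """Fit logistic regression by Newton-Raphson; rows include an intercept."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        # A tiny ridge term keeps the Hessian numerically invertible.
        hess = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
        for x, yi in zip(rows, y):
            mu = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            w = mu * (1.0 - mu)
            for i in range(p):
                grad[i] += (yi - mu) * x[i]
                for j in range(p):
                    hess[i][j] += w * x[i] * x[j]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical simulation: length drives BOTH the DE call and GO membership,
# but the DE call has no direct effect on GO membership.
rng = random.Random(0)
n_genes = 4000
length = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]  # standardized log length
de = [1 if rng.random() < sigmoid(-1.0 + 1.5 * l) else 0 for l in length]
go = [1 if rng.random() < sigmoid(-1.0 + 1.0 * l) else 0 for l in length]

# Model 1: GO membership ~ DE call (length ignored).
unadjusted = fit_logistic([[1.0, d] for d in de], go)[1]
# Model 2: GO membership ~ DE call + length (length as a covariate).
adjusted = fit_logistic([[1.0, d, l] for d, l in zip(de, length)], go)[1]

print("DE coefficient, unadjusted:", round(unadjusted, 3))
print("DE coefficient, length-adjusted:", round(adjusted, 3))
```

In this simulated setting the unadjusted coefficient on the differential-expression call is clearly positive even though the association arises only through length, while the length-adjusted coefficient should be close to zero, which is the conditional association the abstract describes.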
Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data
RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and quantifying gene expression changes. This next-generation sequencing-based method provides unprecedented depth and resolution. The negative binomial (NB) probability distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis of gene expression. Negative binomial exact tests are available for two-group comparisons but do not extend to negative binomial regression analysis, which is important for examining gene expression as a function of explanatory variables and for adjusted group comparisons accounting for other factors. We address the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be preferable even when the exact test is available because it does not require ad hoc library size adjustments.
Keywords: Regression, RNA-Seq, Overdispersion, Extra-Poisson variation, Negative binomial, Higher-order asymptotic
Methylome reorganization during in vitro dedifferentiation and regeneration of Populus trichocarpa
Background: Cytosine DNA methylation (5mC) is an epigenetic modification that is important to genome stability and regulation of gene expression. Perturbations of 5mC have been implicated as a cause of phenotypic variation among plants regenerated through in vitro culture systems. However, the pattern of change in 5mC and its functional role with respect to gene expression are poorly understood at the genome scale. A fuller understanding of how 5mC changes during in vitro manipulation may aid the development of methods for reducing or amplifying the mutagenic and epigenetic effects of in vitro culture and plant transformation.
Results: We investigated the in vitro methylome of the model tree species Populus trichocarpa in a system that mimics routine methods for regeneration and plant transformation in the genus Populus (poplar). Using methylated DNA immunoprecipitation followed by high-throughput sequencing (MeDIP-seq), we compared the methylomes of internode stem segments from micropropagated explants, dedifferentiated calli, and internodes from regenerated plants. We found that more than half (56%) of the methylated portion of the genome appeared to be differentially methylated among the three tissue types. Surprisingly, gene promoter methylation varied little among tissues; however, the percentage of body-methylated genes increased from 9% to 14% between explants and callus tissue, then decreased to 8% in regenerated internodes. Forty-five percent of differentially-methylated genes underwent transient methylation, becoming methylated in calli, and demethylated in regenerants. These genes were more frequent in chromosomal regions with higher gene density. Comparisons with an expression microarray dataset showed that genes methylated at both promoters and gene bodies had lower expression than genes that were unmethylated or only promoter-methylated in all three tissues. Four types of abundant transposable elements showed their highest levels of 5mC in regenerated internodes.
Conclusions: DNA methylation varies in a highly gene- and chromosome-differential manner during in vitro differentiation and regeneration. 5mC in redifferentiated tissues was not reset to that in original explants during the study period. Hypermethylation of gene bodies in dedifferentiated cells did not interfere with transcription, and may serve a protective role against activation of abundant transposable elements.
Keywords: Jacq, Palm Elaeis guineensis, Expression analysis, Genome, Cytosine methylation, Arabidopsis cells, Somaclonal variation, Plants, Tissue culture, Gene