
    Nonparametric false discovery rate control for identifying simultaneous signals

    It is frequently of interest to jointly analyze multiple sequences of multiple tests in order to identify simultaneous signals, defined as features tested in multiple studies whose test statistics are non-null in each. In many problems, however, the null distributions of the test statistics may be complicated or even unknown, and no existing procedures can be applied in these cases. This paper proposes a new nonparametric procedure that can identify simultaneous signals across multiple studies even without knowledge of the null distributions of the test statistics. The method is shown to asymptotically control the false discovery rate, and in simulations it had excellent power and error control. In an analysis of gene expression and histone acetylation patterns in the brains of mice exposed to a conspecific intruder, it identified genes that were both differentially expressed and adjacent to differentially accessible chromatin. The proposed method is available as an R package at github.com/sdzhao/ssa.
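
    A simple point of reference, and not the nonparametric procedure proposed in the paper: when valid per-study p-values are available (that is, when the null distributions are known), a feature can be declared a simultaneous signal by applying the Benjamini-Hochberg (BH) procedure to the per-feature maximum p-value across studies. The maximum p-value is a valid, though often conservative, p-value for the null hypothesis that the feature is null in at least one study. This minimal sketch assumes one p-value per feature per study; the function names are hypothetical. It is exactly the kind of approach that cannot be used when the null distributions are unknown.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return one rejection flag per p-value using the BH step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


def simultaneous_signals_maxp(
    pvals_by_study: Sequence[Sequence[float]], alpha: float = 0.05
) -> List[bool]:
    """Hypothetical baseline: flag features that look non-null in every study."""
    n_features = len(pvals_by_study[0])
    if any(len(study) != n_features for study in pvals_by_study):
        raise ValueError("every study must report one p-value per feature")
    # A feature is only as significant as its weakest study.
    max_p = [max(study[j] for study in pvals_by_study) for j in range(n_features)]
    return benjamini_hochberg(max_p, alpha)
```

    For example, `simultaneous_signals_maxp([[0.001, 0.002, 0.8, 0.003], [0.002, 0.9, 0.001, 0.004]])` returns `[True, False, False, True]`: only the first and fourth features have small p-values in both studies.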

    Multiple hypothesis testing and RNA-seq differential expression analysis accounting for dependence and relevant covariates

    This dissertation is a collection of four papers on the development of statistical methods for the analysis of high-dimensional data, mostly RNA-seq gene expression data. The first two papers introduce two covariate-selection strategies for RNA-seq analysis. As in any experiment or observational study, covariates may carry information about heterogeneity among the experimental or observational units used in the investigation, and either ignoring relevant covariates or accounting for irrelevant ones may be detrimental to RNA-seq analysis. We show through simulation that our methods outperform methods that do not take covariate selection into account. The third paper develops a parametric bootstrap algorithm to analyze RNA-seq datasets from repeated measures designs. In such designs, RNA samples are extracted from each experimental unit at multiple time points, and the read counts obtained by sequencing samples from the same experimental unit tend to be temporally correlated. Simulation studies show the advantages of our method over alternatives that do not account for correlation among observations within experimental units. Finally, the fourth paper develops a new method to estimate and control the false discovery rate (FDR) when identifying simultaneous signals in two independent experiments. Our FDR estimation and control procedure generalizes the histogram-based FDR estimation and control procedure for a single experiment proposed by Liang and Nettleton (2012) and Nettleton et al. (2016). We show, both in theory and in simulation, that our method performs well and outperforms other existing methods.
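
    As background for the histogram-based idea that the fourth paper generalizes, the sketch below estimates the number of true null hypotheses m0 in a single experiment from a histogram of p-values. Null p-values are uniform, so the right-hand bins should sit at a flat height of about m0 divided by the number of bins. The exact iteration, the bin count, and the function name are assumptions made for illustration. This is not a transcription of the cited procedures or of the two-experiment generalization.

```python
import math
from typing import Sequence


def estimate_m0_histogram(
    pvals: Sequence[float], n_bins: int = 20, max_iter: int = 100
) -> float:
    """Hypothetical sketch: estimate the number of true nulls from a p-value histogram.

    Assumes null p-values are Uniform(0, 1), so right-hand bins are populated
    mostly by nulls at a flat height of m0 / n_bins.
    """
    m = len(pvals)
    counts = [0] * n_bins
    for p in pvals:
        # Equal-width bins on [0, 1]; p == 1.0 is placed in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1

    m0 = float(m)  # start by assuming every hypothesis is null
    for _ in range(max_iter):
        # Leftmost bin whose count does not exceed the flat null height m0 / n_bins.
        j = next((i for i, c in enumerate(counts) if c <= m0 / n_bins), n_bins - 1)
        # Bins j..n_bins-1 cover a fraction (n_bins - j) / n_bins of [0, 1];
        # rescale their total count to the whole interval.
        new_m0 = sum(counts[j:]) * n_bins / (n_bins - j)
        if math.isclose(new_m0, m0):
            break
        m0 = new_m0
    return m0
```

    With 50 p-values equal to 0.001 and 50 spread evenly over (0, 1), `estimate_m0_histogram(pvals, n_bins=10)` settles at 50.0 after two passes. An estimate of m0 (or of m0/m) is the ingredient that turns a rejection threshold into an estimated FDR.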

    Nonparametric False Discovery Rate Control for Identifying Simultaneous Signals

    It is frequently of interest to identify simultaneous signals, defined as features that exhibit statistical significance in each of several independent experiments. For example, genes that are consistently differentially expressed across experiments in different animal species can reveal evolutionarily conserved biological mechanisms. However, in some problems the test statistics corresponding to these features can have complicated or unknown null distributions. This paper proposes a novel nonparametric false discovery rate control procedure that can identify simultaneous signals even without knowledge of these null distributions. The method is shown, theoretically and in simulations, to asymptotically control the false discovery rate. It was also used to identify genes that were both differentially expressed and proximal to differentially accessible chromatin in the brains of mice exposed to a conspecific intruder. The proposed method is available as an R package at github.com/sdzhao/ssa.

    Adjusting for Gene-Specific Covariates to Improve RNA-seq Analysis

    This paper proposes a novel method for controlling the positive false discovery rate (pFDR) when testing gene-specific hypotheses using a gene-specific covariate, such as gene length. We suppose that the null probability depends on the covariate. In this context, we propose a rejection rule that accounts for heterogeneity among tests by employing two distinct types of null probabilities. We construct a pFDR estimator for a given rejection rule following Storey's q-value framework. We also provide a condition on the posterior probability of a type I error that equivalently characterizes our rejection rule. In addition, we present a procedure for selecting a tuning parameter through cross-validation so as to maximize the expected number of hypotheses declared significant. A simulation study demonstrates that our method is comparable to or better than existing methods across realistic scenarios. In the data analysis, we find support for our method's premise that the null probability varies with a gene-specific covariate.
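
    For orientation, the sketch below shows the familiar covariate-free form of Storey's framework: a single null proportion pi0 is estimated from the p-values above a threshold lambda, and each p-value is then converted to a q-value. It uses the FDR form of the estimate, without the pFDR correction for the probability of at least one rejection, and it does not implement the paper's covariate-dependent null probabilities or its cross-validated tuning parameter. The default `lam = 0.5` is an assumption.

```python
from typing import List, Sequence


def storey_pi0(pvals: Sequence[float], lam: float = 0.5) -> float:
    """Estimate the proportion of true nulls from the p-values above lam."""
    m = len(pvals)
    # Null p-values are Uniform(0, 1), so roughly pi0 * m * (1 - lam) of them exceed lam.
    above = sum(1 for p in pvals if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def storey_qvalues(pvals: Sequence[float], lam: float = 0.5) -> List[float]:
    """q-value of each p-value: the smallest estimated FDR at which it would be rejected."""
    m = len(pvals)
    pi0 = storey_pi0(pvals, lam)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value to the smallest so q-values are monotone in p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[idx] / rank)
        q[idx] = running_min
    return q
```

    For `[0.01, 0.02, 0.6, 0.8]`, half the p-values exceed 0.5, so pi0 is estimated as 1.0 and the q-values are approximately `[0.04, 0.04, 0.8, 0.8]`. The paper's method replaces the single pi0 with null probabilities that vary with the gene-specific covariate.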

    Detecting Differentially Expressed Genes with RNA-seq Data Using Backward Selection to Account for the Effects of Relevant Covariates

    A common challenge in the analysis of transcriptomic data is to identify differentially expressed genes, i.e., genes whose mean transcript abundance levels differ across the levels of a factor of scientific interest. Transcript abundance levels can be measured simultaneously for thousands of genes in multiple biological samples using RNA sequencing (RNA-seq) technology. Part of the variation in RNA-seq measures of transcript abundance may be associated with variation in continuous and/or categorical covariates measured for each experimental unit or RNA sample. Ignoring relevant covariates or modeling the effects of irrelevant covariates can be detrimental to identifying differentially expressed genes. We propose a backward selection strategy for selecting a set of covariates whose effects are accounted for when searching for differentially expressed genes. We illustrate our approach through the analysis of an RNA-seq study intended to identify genes differentially expressed between two lines of pigs divergently selected for residual feed intake. We use simulation to show the advantages of our backward selection procedure over alternative strategies that either ignore or adjust for all measured covariates.
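
    Backward selection can be outlined as a generic greedy loop: start from all measured covariates, repeatedly drop the covariate whose removal most improves a selection criterion, and stop when no single removal helps. The abstract does not state the criterion, so `score` below is a hypothetical, user-supplied function; in practice it would be computed from gene-wise model fits to the RNA-seq counts. The covariate names and `toy_score` are illustrative only.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_select(
    covariates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination: drop covariates while doing so raises `score`."""
    current = frozenset(covariates)
    best = score(current)
    while current:
        # Score every model that drops exactly one of the remaining covariates.
        candidates = [(score(current - {c}), c) for c in sorted(current)]
        cand_score, drop = max(candidates)
        if cand_score <= best:
            break  # no single removal improves the criterion
        current = current - {drop}
        best = cand_score
    return current, best


# Hypothetical criterion for illustration only: reward covariates assumed to be
# relevant and penalize the rest.
RELEVANT = {"sex", "batch"}


def toy_score(subset: FrozenSet[str]) -> float:
    return 2 * len(subset & RELEVANT) - len(subset - RELEVANT)
```

    With this toy criterion, `backward_select(["sex", "batch", "age", "weight"], toy_score)` removes `weight`, then `age`, and stops at `{"sex", "batch"}` with a score of 4. A real criterion is more expensive, since each candidate set requires refitting the gene-wise models.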

    Differentially Expressed Genes in Blood from Young Pigs between Two Swine Lines Divergently Selected for Feed Efficiency: Potential Biomarkers for Improving Feed Efficiency

    The goal of this study was to find potential gene expression biomarkers in the blood of piglets that can be used to predict pigs’ future feed efficiency. Using RNA-seq technology, we found that 453 genes were differentially expressed (false discovery rate (FDR) ≤ 0.05) in the blood of two Yorkshire lines of pigs divergently selected for feed efficiency (FE) based on residual feed intake (RFI). Genes involved in several biosynthetic processes were overrepresented among genes more highly expressed in the low RFI line than in the high RFI line. Weighted gene co-expression network analysis (WGCNA) also revealed that genes involved in some of these biosynthetic processes and having similar expression patterns formed clusters. The average expression in these clusters was highly associated with line (p < 3.9 × 10⁻⁷, R² > 0.59). Current findings implied these biosynthesis pathways might be more active in the high RFI line. After further stringent validation, some of the differentially expressed genes (DEGs) will be selected for validation as biomarkers for feed efficiency.

    The Effect of PRRS Viral Level and Isolate on Tonsil Gene Expression

    Porcine Reproductive and Respiratory Syndrome virus (PRRSV) can persist in tonsil tissue for >150 days post infection (dpi) without clinical signs. This can occur even when PRRSV has been cleared from serum, and it can result in secondary outbreaks. Tonsil tissue from commercial crossbred pigs that were experimentally infected with one of two PRRSV isolates, NVSL-97-7985 (NVSL) or KS-2006-72109 (KS06), was used to identify genes that were differentially expressed in pigs with extremely high or low tonsil PRRS viremia at 42 dpi. The results provide insight into the mechanisms of PRRSV persistence in tonsils and help to identify biomarkers for PRRSV persistence in tonsil tissue. This may lead to the development of more effective strategies to reduce the chance of PRRS re-breaks.

    Acute Systemic Inflammatory Response to Lipopolysaccharide Stimulation in Pigs Divergently Selected for Residual Feed Intake

    Background: It is unclear whether improving feed efficiency by selection for low residual feed intake (RFI) compromises pigs’ immunocompetence. Here, we investigated whether pig lines divergently selected for RFI had different inflammatory responses to lipopolysaccharide (LPS) exposure, with regard to clinical presentation and transcriptomic changes in peripheral blood cells. Results: LPS injection induced acute systemic inflammation in both the low-RFI and high-RFI lines (n = 8 per line). At 4 h post injection (hpi), the low-RFI line had a significantly lower (p = 0.0075) mean rectal temperature than the high-RFI line. However, no significant differences in complete blood count or in levels of several plasma cytokines were detected between the two lines. Profiling blood transcriptomes at 0, 2, 6, and 24 hpi by RNA sequencing revealed that LPS induced dramatic transcriptional changes, with 6296 genes differentially expressed relative to baseline at one or more post-injection time points in at least one line (n = 4 per line) (|log2(fold change)| ≥ log2(1.2); q < 0.05). Furthermore, applying the same cutoffs, we detected 334 genes differentially expressed between the two lines at one or more time points, including 33 genes differentially expressed between the two lines at baseline. No significant line-by-time interaction effects were detected, however. Genes involved in protein translation, defense response, immune response, and signaling were enriched in different co-expression clusters of genes responsive to LPS stimulation. The two lines were largely similar in their peripheral blood transcriptomic responses to LPS stimulation at the pathway level, although the low-RFI line had a slightly lower level of inflammatory response than the high-RFI line from 2 to 6 hpi and a slightly higher level at 24 hpi.
    Conclusions: The pig lines divergently selected for RFI had largely similar responses to LPS stimulation. However, the low-RFI line had a relatively lower-level, but longer-lasting, inflammatory response than the high-RFI line. Our results suggest that selection for feed-efficient pigs does not significantly compromise a pig’s acute systemic inflammatory response to LPS, although slight differences in intensity and duration may occur.
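
    The dual cutoff used in this study can be made concrete. A gene is called differentially expressed only if its absolute log2 fold change is at least log2(1.2) ≈ 0.263, meaning a fold change of at least 1.2 up or down, and its q-value is below 0.05. This sketch assumes the per-gene log2 fold changes and q-values have already been computed. The function name, gene identifiers, and values in the example are made up.

```python
import math
from typing import Dict, List, Tuple


def call_de_genes(
    results: Dict[str, Tuple[float, float]],
    fold_change: float = 1.2,
    q_cutoff: float = 0.05,
) -> List[str]:
    """Return genes passing |log2 FC| >= log2(fold_change) and q < q_cutoff.

    `results` maps a gene id to (log2 fold change, q-value).
    """
    lfc_min = math.log2(fold_change)  # log2(1.2) is about 0.263
    return sorted(
        gene
        for gene, (lfc, q) in results.items()
        if abs(lfc) >= lfc_min and q < q_cutoff
    )
```

    For example, `call_de_genes({"geneA": (0.5, 0.01), "geneB": (-0.3, 0.04), "geneC": (0.1, 0.001), "geneD": (1.0, 0.2)})` returns `["geneA", "geneB"]`. Here geneC fails the fold-change cutoff and geneD fails the q-value cutoff.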