A nonparametric empirical Bayes approach to covariance matrix estimation
We propose an empirical Bayes method to estimate high-dimensional covariance
matrices. Our procedure centers on vectorizing the covariance matrix and
treating matrix estimation as a vector estimation problem. Drawing from the
compound decision theory literature, we introduce a new class of decision rules
that generalizes several existing procedures. We then use a nonparametric
empirical Bayes g-modeling approach to estimate the oracle optimal rule in that
class. This allows us to let the data itself determine how best to shrink the
estimator, rather than shrinking in a pre-determined direction such as toward a
diagonal matrix. Simulation results and a gene expression network analysis
show that our approach can outperform a number of state-of-the-art proposals
in a wide range of settings, sometimes substantially.
Comment: 20 pages, 4 figures
Nonparametric false discovery rate control for identifying simultaneous signals
It is frequently of interest to jointly analyze multiple sequences of
multiple tests in order to identify simultaneous signals, defined as features
tested in multiple studies whose test statistics are non-null in each. In many
problems, however, the null distributions of the test statistics may be
complicated or even unknown, and there do not currently exist any procedures
that can be employed in these cases. This paper proposes a new nonparametric
procedure that can identify simultaneous signals across multiple studies even
without knowing the null distributions of the test statistics. The method is
shown to asymptotically control the false discovery rate, and in simulations
had excellent power and error control. In an analysis of gene expression and
histone acetylation patterns in the brains of mice exposed to a conspecific
intruder, it identified genes that were both differentially expressed and next
to differentially accessible chromatin. The proposed method is available as an
R package at github.com/sdzhao/ssa
Nonparametric False Discovery Rate Control for Identifying Simultaneous Signals
It is frequently of interest to identify simultaneous signals, defined as features that exhibit statistical significance across each of several independent experiments. For example, genes that are consistently differentially expressed across experiments in different animal species can reveal evolutionarily conserved biological mechanisms. However, in some problems the test statistics corresponding to these features can have complicated or unknown null distributions. This paper proposes a novel nonparametric false discovery rate control procedure that can identify simultaneous signals even without knowing these null distributions. The method is shown, theoretically and in simulations, to asymptotically control the false discovery rate. It was also used to identify genes that were both differentially expressed and proximal to differentially accessible chromatin in the brains of mice exposed to a conspecific intruder. The proposed method is available as an R package at github.com/sdzhao/ssa
A New Class of Dantzig Selectors for Censored Linear Regression Models
The Dantzig variable selector has recently emerged as a powerful tool for fitting regularized regression models. A key advantage is that it does not pertain to a particular likelihood or objective function, as opposed to the existing penalized likelihood methods, and hence has the potential for wide applicability. To our knowledge, limited work has been done for the Dantzig selector when the outcome is subject to censoring. This paper proposes a new class of Dantzig variable selectors for linear regression models for right-censored outcomes. We first establish the finite sample error bound for the estimator and show the proposed selector is nearly optimal in the $\ell_2$ sense. To improve model selection performance, we further propose an adaptive Dantzig variable selector and discuss its large sample properties, namely, consistency in model selection and asymptotic normality of the estimator. The practical utility of the proposed adaptive Dantzig selectors is verified via extensive simulations. We apply the proposed methods to a myeloma clinical trial and identify important predictive genes for patients' survival
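For orientation, the classical Dantzig selector for an uncensored linear model $y = X\beta + \varepsilon$ can be written as the program below. This is the standard textbook form, not the censored-outcome variant proposed in the paper, which the abstract does not spell out; the symbols $X$, $y$, $\beta$, and the tuning parameter $\lambda$ are notation assumed here for illustration.

```latex
\hat{\beta}
  = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}} \; \lVert \beta \rVert_{1}
  \quad \text{subject to} \quad
  \bigl\lVert X^{\top} (y - X\beta) \bigr\rVert_{\infty} \le \lambda
```

The constraint bounds the correlation between each covariate and the residual rather than penalizing a likelihood, which is consistent with the abstract's remark that the selector is not tied to a particular objective function.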
Transcriptional regulatory dynamics drive coordinated metabolic and neural response to social challenge in mice
Agonistic encounters are powerful effectors of future behavior, and the ability to learn from this type of social challenge is an essential adaptive trait. We recently identified a conserved transcriptional program defining the response to social challenge across animal species, highly enriched in transcription factor (TF), energy metabolism, and developmental signaling genes. To understand the trajectory of this program and to uncover the most important regulatory influences controlling this response, we integrated gene expression data with the chromatin landscape in the hypothalamus, frontal cortex, and amygdala of socially challenged mice over time. The expression data revealed a complex spatiotemporal patterning of events starting with neural signaling molecules in the frontal cortex and ending in the modulation of developmental factors in the amygdala and hypothalamus, underpinned by a systems-wide shift in expression of energy metabolism-related genes. The transcriptional signals were correlated with significant shifts in chromatin accessibility and a network of challenge-associated TFs. Among these, the conserved metabolic and developmental regulator ESRRA was highlighted for an especially early and important regulatory role. Cell-type deconvolution analysis attributed the differential metabolic and developmental signals in this social context primarily to oligodendrocytes and neurons, respectively, and we show that ESRRA is expressed in both cell types. Localizing ESRRA binding sites in cortical chromatin, we show that this nuclear receptor binds both differentially expressed energy-related and neurodevelopmental TF genes. These data link metabolic and neurodevelopmental signaling to social challenge, and identify key regulatory drivers of this process with unprecedented tissue and temporal resolution
A nonparametric regression approach to asymptotically optimal estimation of normal means
Simultaneous estimation of multiple parameters has received a great deal of
recent interest, with applications in multiple testing, causal inference, and
large-scale data analysis. Most approaches to simultaneous estimation use
empirical Bayes methodology. Here we propose an alternative, completely
frequentist approach based on nonparametric regression. We show that
simultaneous estimation can be viewed as a constrained and penalized
least-squares regression problem, so that empirical risk minimization can be
used to estimate the optimal estimator within a certain class. We show that
under mild conditions, our data-driven decision rules have asymptotically
optimal risk that can match the best known convergence rates for this compound
estimation problem. Our approach provides another perspective to understand
sufficient conditions for asymptotic optimality of simultaneous estimation. Our
proposed estimators demonstrate comparable performance to state-of-the-art
empirical Bayes methods in a variety of simulation settings and our methodology
can be extended to apply to many practically interesting settings
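
As a point of reference for the simultaneous estimation problem described above, the following minimal sketch applies the classical positive-part James-Stein estimator to simulated normal means. It is not the regression-based method of the paper, whose details the abstract does not give. The function names, the simulated data, and the unit noise variance are all assumptions made for illustration.

```python
import random


def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate of normal means, shrinking toward zero.

    x      : observations, assumed x_i ~ N(theta_i, sigma2) independently
    sigma2 : known noise variance (an assumption of this sketch)
    """
    n = len(x)
    norm2 = sum(v * v for v in x)
    # Shrinkage only helps in dimension 3 or more; also avoid dividing by zero.
    if n < 3 or norm2 == 0.0:
        return list(x)
    shrink = max(0.0, 1.0 - (n - 2) * sigma2 / norm2)
    return [shrink * v for v in x]


def mean_squared_error(est, truth):
    """Average squared error between estimates and the true means."""
    return sum((e - t) ** 2 for e, t in zip(est, truth)) / len(truth)


# Simulated example: 500 means drawn from N(0, 1), each observed with N(0, 1) noise.
random.seed(0)
theta = [random.gauss(0.0, 1.0) for _ in range(500)]
x = [t + random.gauss(0.0, 1.0) for t in theta]

mse_raw = mean_squared_error(x, theta)              # using each x_i on its own
mse_js = mean_squared_error(james_stein(x), theta)  # borrowing strength across all x_i

print(f"raw MSE: {mse_raw:.3f}, James-Stein MSE: {mse_js:.3f}")
```

The comparison illustrates why estimating many means jointly can beat estimating each one separately, which is the setting in which the abstract's data-driven decision rules operate.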