27 research outputs found

A nonparametric empirical Bayes approach to covariance matrix estimation

Author: Xin Huiqin
Zhao Sihai Dave
Publication venue
Publication date: 10/04/2021
Field of study

We propose an empirical Bayes method to estimate high-dimensional covariance matrices. Our procedure centers on vectorizing the covariance matrix and treating matrix estimation as a vector estimation problem. Drawing from the compound decision theory literature, we introduce a new class of decision rules that generalizes several existing procedures. We then use a nonparametric empirical Bayes g-modeling approach to estimate the oracle optimal rule in that class. This allows us to let the data itself determine how best to shrink the estimator, rather than shrinking in a pre-determined direction such as toward a diagonal matrix. Simulation results and a gene expression network analysis show that our approach can outperform a number of state-of-the-art proposals in a wide range of settings, sometimes substantially. Comment: 20 pages, 4 figures

arXiv.org e-Print Archive
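
For context, the "pre-determined direction" that the abstract above contrasts itself with can be sketched as linear shrinkage of the sample covariance toward its own diagonal. This is only an illustration of that baseline, not the authors' empirical Bayes procedure. The function name `shrink_toward_diagonal` and the hand-picked `weight` are assumptions introduced here.

```python
import numpy as np

def shrink_toward_diagonal(X, weight):
    """Linear shrinkage of the sample covariance toward its own diagonal.

    X      : (n, p) data matrix, rows are observations
    weight : shrinkage weight in [0, 1], chosen by hand here (an assumption;
             the abstract's method instead lets the data decide how to shrink)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    S = np.cov(X, rowvar=False)       # (p, p) sample covariance
    target = np.diag(np.diag(S))      # keep variances, zero out covariances
    return (1.0 - weight) * S + weight * target

# Toy data: 20 observations of 5 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
S = np.cov(X, rowvar=False)
S_shrunk = shrink_toward_diagonal(X, weight=0.5)
```

With a fixed target, every off-diagonal entry is scaled by the same factor `1 - weight` while the variances are left unchanged, which is the rigidity a data-driven rule is meant to avoid.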

Nonparametric false discovery rate control for identifying simultaneous signals

Author: Nguyen Yet Tien
Zhao Sihai Dave
Publication venue
Publication date: 15/01/2019
Field of study

It is frequently of interest to jointly analyze multiple sequences of multiple tests in order to identify simultaneous signals, defined as features tested in multiple studies whose test statistics are non-null in each. In many problems, however, the null distributions of the test statistics may be complicated or even unknown, and no existing procedures can be employed in these cases. This paper proposes a new nonparametric procedure that can identify simultaneous signals across multiple studies even without knowing the null distributions of the test statistics. The method is shown to asymptotically control the false discovery rate, and in simulations it had excellent power and error control. In an analysis of gene expression and histone acetylation patterns in the brains of mice exposed to a conspecific intruder, it identified genes that were both differentially expressed and next to differentially accessible chromatin. The proposed method is available as an R package at github.com/sdzhao/ssa

arXiv.org e-Print Archive

Old Dominion University
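
To make the notion of a "simultaneous signal" concrete, a common conservative baseline takes the larger of the two per-study p-values for each feature and applies the Benjamini-Hochberg step-up rule to those maxima. The sketch below shows that baseline only. It requires valid p-values, which is exactly the known-null assumption the abstract above removes, and it is not the method implemented in the `ssa` package. The function names and toy p-values are assumptions made for illustration.

```python
import numpy as np

def benjamini_hochberg(pvals, q):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m    # k * q / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest rank meeting its threshold
        reject[order[:k + 1]] = True            # reject everything up to that rank
    return reject

def simultaneous_signals_maxp(p_study1, p_study2, q=0.05):
    """Flag features that appear non-null in both studies (max-P baseline)."""
    p_max = np.maximum(p_study1, p_study2)      # small only if both are small
    return benjamini_hochberg(p_max, q)

# Toy p-values for four features in two studies (made up for illustration)
p1 = np.array([0.001, 0.002, 0.40, 0.0005])
p2 = np.array([0.003, 0.60, 0.001, 0.0008])
found = simultaneous_signals_maxp(p1, p2, q=0.05)
```

Only the first and fourth features are flagged here, because the second and third are significant in just one of the two studies.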

Principled Sure Independence Screening for Cox Models with Ultra-high-dimensional Covariates

Author: Li Yi
Zhao Sihai Dave
Publication venue: Collection of Biostatistics Research Archive
Publication date: 19/07/2010
Field of study

Collection Of Biostatistics Research Archive

Nonparametric False Discovery Rate Control for Identifying Simultaneous Signals

Author: Nguyen Yet Tien
Zhao Sihai Dave
Publication venue: ODU Digital Commons
Publication date: 01/01/2020
Field of study

It is frequently of interest to identify simultaneous signals, defined as features that exhibit statistical significance across each of several independent experiments. For example, genes that are consistently differentially expressed across experiments in different animal species can reveal evolutionarily conserved biological mechanisms. However, in some problems the test statistics corresponding to these features can have complicated or unknown null distributions. This paper proposes a novel nonparametric false discovery rate control procedure that can identify simultaneous signals even without knowing these null distributions. The method is shown, theoretically and in simulations, to asymptotically control the false discovery rate. It was also used to identify genes that were both differentially expressed and proximal to differentially accessible chromatin in the brains of mice exposed to a conspecific intruder. The proposed method is available as an R package at github.com/sdzhao/ssa

Old Dominion University

A New Class of Dantzig Selectors for Censored Linear Regression Models

Author: Dicker Lee
Li Yi
Zhao Sihai Dave
Publication venue: Collection of Biostatistics Research Archive
Publication date: 01/01/2010
Field of study

The Dantzig variable selector has recently emerged as a powerful tool for fitting regularized regression models. A key advantage is that it does not pertain to a particular likelihood or objective function, as opposed to the existing penalized likelihood methods, and hence has the potential for wide applicability. To our knowledge, limited work has been done for the Dantzig selector when the outcome is subject to censoring. This paper proposes a new class of Dantzig variable selectors for linear regression models for right-censored outcomes. We first establish the finite sample error bound for the estimator and show the proposed selector is nearly optimal in the ℓ2 sense. To improve model selection performance, we further propose an adaptive Dantzig variable selector and discuss its large sample properties, namely, consistency in model selection and asymptotic normality of the estimator. The practical utility of the proposed adaptive Dantzig selectors is verified via extensive simulations. We apply the proposed methods to a myeloma clinical trial and identify important predictive genes for patients' survival

CiteSeerX

Collection Of Biostatistics Research Archive
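
As background for the abstract above, the standard Dantzig selector for an uncensored linear model can be written as the following constrained problem. This is the textbook form, not the censored-outcome variant the paper proposes. The symbols are assumptions introduced here: $X$ is the $n \times p$ design matrix, $y$ the fully observed response vector, and $\lambda \ge 0$ a tuning parameter.

```latex
\hat{\beta}
  = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}} \; \lVert \beta \rVert_{1}
  \quad \text{subject to} \quad
  \bigl\lVert X^{\top} (y - X\beta) \bigr\rVert_{\infty} \le \lambda
```

The constraint bounds the correlation between each covariate and the residual rather than referring to a likelihood, which is the property the abstract points to when it says the selector does not pertain to a particular objective function.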

Transcriptional regulatory dynamics drive coordinated metabolic and neural response to social challenge in mice

Author: Caetano-Anolles Derek
Chandrasekaran Sriram
Lu Xiaochen
Saul Michael C.
Seward Christopher H.
Sinha Saurabh
Sloofman Laura G.
Stubbs Lisa
Sun Hao
Troy Joseph M.
Weisner Patricia A.
Zhang Huimin
Zhao Sihai Dave
Publication venue: Cold Spring Harbor Laboratory
Publication date: 29/03/2017
Field of study

Agonistic encounters are powerful effectors of future behavior, and the ability to learn from this type of social challenge is an essential adaptive trait. We recently identified a conserved transcriptional program defining the response to social challenge across animal species, highly enriched in transcription factor (TF), energy metabolism, and developmental signaling genes. To understand the trajectory of this program and to uncover the most important regulatory influences controlling this response, we integrated gene expression data with the chromatin landscape in the hypothalamus, frontal cortex, and amygdala of socially challenged mice over time. The expression data revealed a complex spatiotemporal patterning of events starting with neural signaling molecules in the frontal cortex and ending in the modulation of developmental factors in the amygdala and hypothalamus, underpinned by a systems-wide shift in expression of energy metabolism-related genes. The transcriptional signals were correlated with significant shifts in chromatin accessibility and a network of challenge-associated TFs. Among these, the conserved metabolic and developmental regulator ESRRA was highlighted for an especially early and important regulatory role. Cell-type deconvolution analysis attributed the differential metabolic and developmental signals in this social context primarily to oligodendrocytes and neurons, respectively, and we show that ESRRA is expressed in both cell types. Localizing ESRRA binding sites in cortical chromatin, we show that this nuclear receptor binds both differentially expressed energy-related and neurodevelopmental TF genes. These data link metabolic and neurodevelopmental signaling to social challenge, and identify key regulatory drivers of this process with unprecedented tissue and temporal resolution

DSpace@MIT

Crossref

A nonparametric regression approach to asymptotically optimal estimation of normal means

Author: Barbehenn Alton
Zhao Sihai Dave
Publication venue
Publication date: 30/04/2022
Field of study

Simultaneous estimation of multiple parameters has received a great deal of recent interest, with applications in multiple testing, causal inference, and large-scale data analysis. Most approaches to simultaneous estimation use empirical Bayes methodology. Here we propose an alternative, completely frequentist approach based on nonparametric regression. We show that simultaneous estimation can be viewed as a constrained and penalized least-squares regression problem, so that empirical risk minimization can be used to estimate the optimal estimator within a certain class. We show that under mild conditions, our data-driven decision rules have asymptotically optimal risk that can match the best known convergence rates for this compound estimation problem. Our approach provides another perspective to understand sufficient conditions for asymptotic optimality of simultaneous estimation. Our proposed estimators demonstrate comparable performance to state-of-the-art empirical Bayes methods in a variety of simulation settings and our methodology can be extended to apply to many practically interesting settings

arXiv.org e-Print Archive
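
The compound estimation problem that the abstract above refers to is usually stated as follows. This is a sketch of the standard setup under the simplifying assumption of unit variances, not a statement of the paper's estimator. The notation ($X_i$, $\theta_i$, $t$, $f$, $\varphi$) is introduced here for illustration.

```latex
% Model: independent normal observations with unknown means
X_i \mid \theta_i \sim N(\theta_i, 1), \qquad i = 1, \dots, n

% Compound loss of a separable rule that estimates \theta_i by t(X_i)
L(\theta, t) = \frac{1}{n} \sum_{i=1}^{n} \bigl( t(X_i) - \theta_i \bigr)^{2}

% Oracle separable rule (Tweedie's formula), with \varphi the standard
% normal density and f the average marginal density of the observations
t^{\star}(x) = x + \frac{f'(x)}{f(x)}, \qquad
f(x) = \frac{1}{n} \sum_{i=1}^{n} \varphi(x - \theta_i)
```

Empirical Bayes methods typically approximate $t^{\star}$ by estimating $f$ or the distribution of the means, whereas the abstract describes estimating the optimal rule directly through a constrained, penalized least-squares regression.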