Extracting Functional Modules from Biological Pathways
It has been proposed that functional modules are the fundamental units of cellular function. Methods to identify these modules have thus far relied on gene expression data or protein-protein interaction (PPI) data, but these approaches have limitations. We propose a new method that uses biological pathway data to identify functional modules and can potentially overcome these limitations. We also construct a network of these modules using functionally relevant PPI data. This network displays the flow and integration of information between modules and can be used to map cellular function.
A nonparametric empirical Bayes approach to covariance matrix estimation
We propose an empirical Bayes method to estimate high-dimensional covariance
matrices. Our procedure centers on vectorizing the covariance matrix and
treating matrix estimation as a vector estimation problem. Drawing from the
compound decision theory literature, we introduce a new class of decision rules
that generalizes several existing procedures. We then use a nonparametric
empirical Bayes g-modeling approach to estimate the oracle optimal rule in that
class. This allows us to let the data itself determine how best to shrink the
estimator, rather than shrinking in a pre-determined direction such as toward a
diagonal matrix. Simulation results and a gene expression network analysis
show that our approach can outperform a number of state-of-the-art proposals
in a wide range of settings, sometimes substantially.
Comment: 20 pages, 4 figures
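For readers unfamiliar with the baseline this abstract contrasts itself with, the following is a minimal sketch of shrinking a sample covariance matrix in a pre-determined direction, here toward its own diagonal, followed by one possible way to vectorize the result. It is not the authors' empirical Bayes procedure. The function names, the hand-picked `weight` parameter, and the choice to vectorize the upper triangle are all assumptions made for illustration.

```python
import numpy as np


def shrink_toward_diagonal(X, weight):
    """Linear shrinkage of the sample covariance toward its diagonal.

    `weight` in [0, 1] is chosen by hand here (hypothetical parameter);
    the abstract's method instead lets the data decide how to shrink.
    """
    S = np.cov(X, rowvar=False)        # sample covariance, features in columns
    target = np.diag(np.diag(S))       # diagonal matrix with the same variances
    return (1.0 - weight) * S + weight * target


def vectorize_covariance(S):
    """Stack the upper triangle (including the diagonal) into a vector.

    This is one simple vectorization, assumed for illustration only.
    """
    rows, cols = np.triu_indices(S.shape[0])
    return S[rows, cols]


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))           # 20 observations of 5 features
S_sample = np.cov(X, rowvar=False)
S_shrunk = shrink_toward_diagonal(X, weight=0.5)
vec = vectorize_covariance(S_shrunk)   # length p * (p + 1) / 2 = 15
```

With a fixed `weight`, every off-diagonal entry is scaled by the same factor while the variances are left unchanged, which is the kind of pre-determined shrinkage direction the abstract describes moving away from.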
Nonparametric false discovery rate control for identifying simultaneous signals
It is frequently of interest to jointly analyze multiple sequences of
multiple tests in order to identify simultaneous signals, defined as features
tested in multiple studies whose test statistics are non-null in each. In many
problems, however, the null distributions of the test statistics may be
complicated or even unknown, and there do not currently exist any procedures
that can be employed in these cases. This paper proposes a new nonparametric
procedure that can identify simultaneous signals across multiple studies even
without knowing the null distributions of the test statistics. The method is
shown to asymptotically control the false discovery rate, and in simulations
had excellent power and error control. In an analysis of gene expression and
histone acetylation patterns in the brains of mice exposed to a conspecific
intruder, it identified genes that were both differentially expressed and next
to differentially accessible chromatin. The proposed method is available as an
R package at github.com/sdzhao/ssa.
Nonparametric False Discovery Rate Control for Identifying Simultaneous Signals
It is frequently of interest to identify simultaneous signals, defined as features that exhibit statistical significance across each of several independent experiments. For example, genes that are consistently differentially expressed across experiments in different animal species can reveal evolutionarily conserved biological mechanisms. However, in some problems the test statistics corresponding to these features can have complicated or unknown null distributions. This paper proposes a novel nonparametric false discovery rate control procedure that can identify simultaneous signals even without knowing these null distributions. The method is shown, theoretically and in simulations, to asymptotically control the false discovery rate. It was also used to identify genes that were both differentially expressed and proximal to differentially accessible chromatin in the brains of mice exposed to a conspecific intruder. The proposed method is available as an R package at github.com/sdzhao/ssa.