Detection of Sparse Positive Dependence
In a bivariate setting, we consider the problem of detecting a sparse
contamination or mixture component, where the effect manifests itself as a
positive dependence between the variables, which are otherwise independent in
the main component. We first look at this problem in the context of a normal
mixture model. In essence, the situation reduces to a univariate setting where
the effect is a decrease in variance. In particular, a higher criticism test
based on the pairwise differences is shown to achieve the detection boundary
defined by the (oracle) likelihood ratio test. We then turn to a Gaussian
copula model where the marginal distributions are unknown. Standard invariance
considerations lead us to consider rank tests. In fact, a higher criticism test
based on the pairwise rank differences achieves the detection boundary in the
normal mixture model, although not in the very sparse regime. We do not know of
any rank test that has any power in that regime.
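As a rough illustration of the normal-mixture case described above, the sketch below computes a higher criticism statistic from the pairwise differences. It is a minimal sketch under stated assumptions, not the authors' implementation: the marginals are assumed to be standard normal, the function names (`normal_cdf`, `higher_criticism_small_diff`, `simulate_mixture`) and the parameters `eps` and `rho` are hypothetical, and the maximum is taken over all non-degenerate p-values rather than over a restricted range as some higher criticism variants do.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def higher_criticism_small_diff(x, y):
    """Higher criticism statistic built from pairwise differences.

    Assumption: standard normal marginals, so that under independence
    u_i = (x_i - y_i) / sqrt(2) is N(0, 1). Positive dependence shrinks
    the variance of u_i, so an unusually small |u_i| counts as evidence.
    """
    n = len(x)
    # p_i = P(|Z| <= |u_i|): small when the difference is close to zero.
    pvals = sorted(
        2.0 * normal_cdf(abs(a - b) / math.sqrt(2.0)) - 1.0
        for a, b in zip(x, y)
    )
    best = -math.inf
    for i, p in enumerate(pvals, start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid dividing by zero
        hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, hc)
    return best


def simulate_mixture(n, eps, rho, rng):
    """Draw n pairs; a fraction eps (on average) has correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        if rng.random() < eps:
            # Contaminated component: positively dependent pair.
            y = rho * x + math.sqrt(1.0 - rho * rho) * z
        else:
            # Main component: independent pair.
            y = z
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    x0, y0 = simulate_mixture(2000, 0.0, 0.0, rng)
    x1, y1 = simulate_mixture(2000, 0.5, 0.99, rng)
    print("null:", higher_criticism_small_diff(x0, y0))
    print("contaminated:", higher_criticism_small_diff(x1, y1))
```

Large values of the statistic point toward contamination. A rejection threshold would have to be calibrated separately, for example by simulating the statistic under independence.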
Nonparametric false discovery rate control for identifying simultaneous signals
It is frequently of interest to jointly analyze multiple sequences of
multiple tests in order to identify simultaneous signals, defined as features
tested in multiple studies whose test statistics are non-null in each. In many
problems, however, the null distributions of the test statistics may be
complicated or even unknown, and there do not currently exist any procedures
that can be employed in these cases. This paper proposes a new nonparametric
procedure that can identify simultaneous signals across multiple studies even
without knowing the null distributions of the test statistics. The method is
shown to asymptotically control the false discovery rate, and in simulations it
had excellent power and error control. In an analysis of gene expression and
histone acetylation patterns in the brains of mice exposed to a conspecific
intruder, it identified genes that were both differentially expressed and next
to differentially accessible chromatin. The proposed method is available as an
R package at github.com/sdzhao/ssa.
Nonparametric False Discovery Rate Control for Identifying Simultaneous Signals
It is frequently of interest to identify simultaneous signals, defined as features that exhibit statistical significance across each of several independent experiments. For example, genes that are consistently differentially expressed across experiments in different animal species can reveal evolutionarily conserved biological mechanisms. However, in some problems the test statistics corresponding to these features can have complicated or unknown null distributions. This paper proposes a novel nonparametric false discovery rate control procedure that can identify simultaneous signals even without knowing these null distributions. The method is shown, theoretically and in simulations, to asymptotically control the false discovery rate. It was also used to identify genes that were both differentially expressed and proximal to differentially accessible chromatin in the brains of mice exposed to a conspecific intruder. The proposed method is available as an R package at github.com/sdzhao/ssa.