
### UPS delivers optimal phase diagram in high-dimensional variable selection

Consider a linear model $Y=X\beta+z$, $z\sim N(0,I_n)$. Here, $X=X_{n,p}$,
where both $p$ and $n$ are large, but $p>n$. We model the rows of $X$ as i.i.d.
samples from $N(0,\frac{1}{n}\Omega)$, where $\Omega$ is a $p\times p$
correlation matrix, which is unknown to us but is presumably sparse. The vector
$\beta$ is also unknown but has relatively few nonzero coordinates, and we are
interested in identifying these nonzeros. We propose the Univariate Penalization Screening (UPS) procedure for variable selection. This is a screen-and-clean method: we screen with univariate thresholding and clean with the penalized MLE. It has two important properties: sure screening and separability after screening. These properties enable us to reduce the original large-scale regression problem to many small-size regression problems that can be fitted separately. The UPS is effective both in theory and in computation.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-AOS947
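The screen-and-clean idea can be sketched in a few lines. The following is a minimal illustration, not the authors' procedure: it screens with univariate thresholding and "cleans" with a plain least-squares refit on the surviving coordinates (UPS instead cleans with a penalized MLE over small separable groups), and the threshold and simulated data are hypothetical.

```python
import numpy as np

def screen_and_clean(X, y, t):
    """Screen coordinates j with |X_j' y| > t, then refit those jointly."""
    n, p = X.shape
    scores = X.T @ y                         # univariate screening statistics
    kept = np.flatnonzero(np.abs(scores) > t)
    beta = np.zeros(p)
    if kept.size:
        # "Clean" step: least-squares refit on the retained coordinates.
        # (UPS uses a penalized MLE on small separable groups instead.)
        beta[kept], *_ = np.linalg.lstsq(X[:, kept], y, rcond=None)
    return beta, kept

# Toy data matching the model: rows of X are i.i.d. N(0, (1/n) I_p).
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = 10.0                         # three strong nonzeros
y = X @ beta_true + rng.normal(size=n)
beta_hat, kept = screen_and_clean(X, y, t=3.0)
```

Since each column of `X` has norm roughly 1, the null screening statistics behave like N(0,1) variables, so a threshold of about 3 keeps false discoveries rare while retaining strong signals.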

### Higher criticism for detecting sparse heterogeneous mixtures

Higher criticism, or second-level significance testing, is a
multiple-comparisons concept mentioned in passing by Tukey. It concerns a
situation where there are many independent tests of significance and one is
interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given $\alpha$-level to the expected fraction under the joint null. In fact, he suggested standardizing the difference of the two quantities and forming a z-score; the resulting z-score tests the significance of the body of significance tests. We consider a generalization, where we maximize this z-score over a range of significance levels $0<\alpha\leq\alpha_0$.
We are able to show that the resulting higher criticism statistic is
effective at resolving a very subtle testing problem: testing whether n normal
means are all zero versus the alternative that a small fraction is nonzero. The
subtlety of this "sparse normal means" testing problem can be seen from the work
of Ingster and Jin, who studied such problems in great detail. In their
studies, they identified an interesting range of cases where the small fraction
of nonzero means is so small that the alternative hypothesis exhibits little
noticeable effect on the distribution of the p-values either for the bulk of
the tests or for the few most highly significant tests.
In this range, when the amplitude of nonzero means is calibrated with the fraction of nonzero means, the likelihood ratio test for a precisely specified alternative would still succeed in separating the two hypotheses.

Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000026
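The maximized z-score described above can be written down directly. A minimal sketch: sort the $n$ p-values, compare the empirical fraction of significances $i/n$ at level $p_{(i)}$ to its null expectation $p_{(i)}$, standardize, and maximize over levels $p_{(i)}\leq\alpha_0$. The clipping guard is an implementation convenience, not part of the statistic.

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """Higher criticism statistic: max standardized excess of significances."""
    p = np.sort(np.clip(np.asarray(pvals, dtype=float), 1e-12, 1 - 1e-12))
    n = p.size
    i = np.arange(1, n + 1)
    # z-score comparing the observed fraction i/n of p-values below p_(i)
    # to its expectation p_(i) under the joint null
    z = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    return z[p <= alpha0].max()

rng = np.random.default_rng(0)
p_null = rng.uniform(size=1000)              # pure-noise p-values
p_sig = p_null.copy()
p_sig[:10] = 1e-6                            # a sparse handful of signals
hc_null = higher_criticism(p_null)
hc_sig = higher_criticism(p_sig)
```

Under the joint null the statistic stays modest, while even ten very small p-values out of a thousand push it far above typical null levels.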

### Asymptotic minimaxity of False Discovery Rate thresholding for sparse exponential data

We apply FDR thresholding to a non-Gaussian vector whose coordinates $X_i$, $i=1,\ldots,n$, are independent exponential with individual means $\mu_i$. The
vector $\mu =(\mu_i)$ is thought to be sparse, with most coordinates 1 but a
small fraction significantly larger than 1; roughly, most coordinates are simply "noise," but a small fraction contain "signal." We measure risk by
per-coordinate mean-squared error in recovering $\log(\mu_i)$, and study
minimax estimation over parameter spaces defined by constraints on the
per-coordinate p-norm of $\log(\mu_i)$:
$\frac{1}{n}\sum_{i=1}^n\log^p(\mu_i)\leq \eta^p$. We show for large n and
small $\eta$ that FDR thresholding can be nearly minimax. The FDR control parameter $0<q<1$ plays an important role: when $q\leq 1/2$, the FDR estimator is nearly minimax, while choosing a fixed $q>1/2$ prevents near minimaxity. These conclusions mirror those found in the Gaussian case by Abramovich et al. [Ann. Statist. 34 (2006) 584--653]. The techniques developed here seem applicable to a wide range of other distributional assumptions, other loss measures and non-i.i.d. dependency structures.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/009053606000000920
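A hedged illustration of FDR thresholding in this exponential setting: treat $p_i=e^{-X_i}$ as the p-value for testing $\mu_i=1$ against $\mu_i>1$, run the Benjamini-Hochberg step-up rule at level $q$, and estimate $\log(\mu_i)$ by the crude plug-in $\log(X_i)$ on rejected coordinates and by 0 elsewhere. The estimator actually analyzed in the paper may differ in its details.

```python
import numpy as np

def fdr_threshold_estimate(x, q=0.25):
    """BH step-up at level q, then plug-in estimates of log(mu_i)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    p = np.exp(-x)                      # survival p-values under mu_i = 1
    order = np.argsort(p)
    k = np.arange(1, n + 1)
    below = p[order] <= q * k / n       # Benjamini-Hochberg comparison
    n_rej = k[below].max() if below.any() else 0
    est = np.zeros(n)
    rej = order[:n_rej]
    est[rej] = np.log(x[rej])           # crude plug-in for log(mu_i)
    return est, rej

# Toy data: mostly mean-1 "noise", a few coordinates with mean 50.
rng = np.random.default_rng(1)
n = 1000
x = rng.exponential(1.0, size=n)
x[:20] = rng.exponential(50.0, size=20)
est, rej = fdr_threshold_estimate(x, q=0.1)
```

Rejected coordinates have large $X_i$ (tiny $p_i$), so the plug-in $\log(X_i)$ is well defined and positive there.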

### Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing

An important estimation problem that is closely related to large-scale
multiple testing is that of estimating the null density and the proportion of
nonnull effects. A few estimators have been introduced in the literature;
however, several important problems, including the evaluation of the minimax
rate of convergence and the construction of rate-optimal estimators, remain
open. In this paper, we consider optimal estimation of the null density and the
proportion of nonnull effects. Both minimax lower and upper bounds are derived.
The lower bound is established by a two-point testing argument, where at the
core is the novel construction of two least favorable marginal densities $f_1$
and $f_2$. The density $f_1$ is heavy tailed both in the spatial and frequency
domains and $f_2$ is a perturbation of $f_1$ such that the characteristic
functions associated with $f_1$ and $f_2$ match each other in low frequencies.
The minimax upper bound is obtained by constructing estimators which rely on
the empirical characteristic function and Fourier analysis. The estimator is
shown to be minimax rate optimal. Compared to existing methods in the
literature, the proposed procedure not only provides more precise estimates of
the null density and the proportion of the nonnull effects, but also yields
more accurate results when used inside some multiple testing procedures which
aim at controlling the False Discovery Rate (FDR). The procedure is easy to implement, and numerical results are given.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOS696

### Estimating the null and the proportion of non-null effects in large-scale multiple comparisons

An important issue raised by Efron in the context of large-scale multiple
comparisons is that in many applications the usual assumption that the null
distribution is known is incorrect, and seemingly negligible differences in the
null may result in large differences in subsequent studies. This suggests that
a careful study of estimation of the null is indispensable.
In this paper, we consider the problem of estimating a null normal
distribution, and a closely related problem, estimation of the proportion of
non-null effects. We develop an approach based on the empirical characteristic
function and Fourier analysis. The estimators are shown to be uniformly
consistent over a wide class of parameters. Numerical performance of the
estimators is investigated using both simulated and real data. In particular,
we apply our procedure to the analysis of breast cancer and HIV microarray data
sets. The estimators perform favorably in comparison to existing methods.

Comment: 42 pages, 6 figures
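A toy version of the characteristic-function idea, under the hypothetical two-group model $X_j\sim(1-\epsilon)N(0,1)+\epsilon N(\mu_j,1)$ with a known $N(0,1)$ null: the real part of the empirical characteristic function satisfies $e^{t^2/2}\psi(t)\approx(1-\epsilon)+\epsilon\,\overline{\cos(t\mu_j)}$, so $1-e^{t^2/2}\psi(t)\approx\epsilon\,(1-\overline{\cos(t\mu_j)})$, and averaging over a grid of $t$ damps the oscillating cosine term. This is a simplified illustration, not the rate-optimal estimator constructed in these papers.

```python
import numpy as np

def proportion_nonnull(x, t_max=2.0, grid=50):
    """Toy CF-based estimate of the proportion of non-null effects."""
    x = np.asarray(x, dtype=float)
    t = np.linspace(0.1, t_max, grid)
    # real part of the empirical characteristic function at each t
    psi = np.cos(np.outer(t, x)).mean(axis=1)
    # 1 - e^{t^2/2} psi(t) ~ eps * (1 - average of cos(t * mu_j))
    eps_t = 1.0 - np.exp(t ** 2 / 2.0) * psi
    return float(np.clip(eps_t, 0.0, 1.0).mean())

# Toy data: eps = 0.1, all non-null means equal to 3 (hypothetical).
rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
x[:10_000] += 3.0
est = proportion_nonnull(x)
```

The factor $e^{t^2/2}$ amplifies sampling noise, which is why the grid is kept to moderate $t$; larger $t$ would require larger $n$.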

### Optimal classification in sparse Gaussian graphic model

Consider a two-class classification problem where the number of features is
much larger than the sample size. The features are masked by Gaussian noise
with mean zero and covariance matrix $\Sigma$, where the precision matrix
$\Omega=\Sigma^{-1}$ is unknown but is presumably sparse. The useful features,
also unknown, are sparse and each contributes weakly (i.e., rare and weak) to
the classification decision. By obtaining a reasonably good estimate of
$\Omega$, we formulate the setting as a linear regression model. We propose a
two-stage classification method where we first select features by the method of
Innovated Thresholding (IT), and then use the retained features and Fisher's
LDA for classification. In this approach, a crucial problem is how to set the
threshold of IT. We approach this problem by adapting the recent innovation of
Higher Criticism Thresholding (HCT). We find that when useful features are rare and weak, the limiting behavior of HCT is essentially as good as that of the ideal threshold, the threshold one would choose if the underlying distribution of the signals were known. Somewhat
surprisingly, when $\Omega$ is sufficiently sparse, its off-diagonal
coordinates usually do not have a major influence over the classification
decision. Compared to recent work in the case where $\Omega$ is the identity
matrix [Proc. Natl. Acad. Sci. USA 105 (2008) 14790-14795; Philos. Trans. R.
Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 (2009) 4449-4470], the current
setting is much more general, which needs a new approach and much more
sophisticated analysis. One key component of the analysis is the intimate
relationship between HCT and Fisher's separation. Another key component is the
tight large-deviation bounds for empirical processes for data with
unconventional correlation structures, where graph theory on vertex coloring
plays an important role.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOS1163
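A hedged sketch of Higher Criticism Thresholding for feature selection, in the simplified case $\Omega=I$ (the paper's innovated transform and precision-matrix estimation are omitted): compute the HC objective along the sorted feature z-scores and take the $|z|$-value where it peaks as the selection threshold. The restriction $p_{(i)}\geq 1/n$ is the common "HC+" convention, adopted here as an assumption.

```python
import math
import numpy as np

def hc_threshold(z, alpha0=0.5):
    """Choose a feature-selection threshold by maximizing the HC objective."""
    za = np.sort(np.abs(z))[::-1]                 # |z| in decreasing order
    # two-sided p-values P(|N(0,1)| > |z|), increasing along za
    p = np.array([math.erfc(v / math.sqrt(2.0)) for v in za])
    n = p.size
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    # "HC+" convention: restrict to 1/n <= p_(i) <= alpha0
    idx = np.flatnonzero((p <= alpha0) & (p >= 1.0 / n))
    return za[idx[np.argmax(hc[idx])]]

# Toy z-scores: 20 rare-and-weak signals of strength 4 among 1000 features.
rng = np.random.default_rng(3)
z = rng.normal(size=1000)
z[:20] += 4.0
t = hc_threshold(z)
sel = np.abs(z) >= t                              # retained features
```

The retained features would then feed into Fisher's LDA; the point of the sketch is only how the data-driven threshold is chosen.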
