61 research outputs found

    UPS delivers optimal phase diagram in high-dimensional variable selection

    Consider a linear model $Y=X\beta+z$, $z\sim N(0,I_n)$. Here, $X=X_{n,p}$, where both $p$ and $n$ are large but $p>n$. We model the rows of $X$ as i.i.d. samples from $N(0,\frac{1}{n}\Omega)$, where $\Omega$ is a $p\times p$ correlation matrix, which is unknown to us but is presumably sparse. The vector $\beta$ is also unknown but has relatively few nonzero coordinates, and we are interested in identifying these nonzeros. We propose Univariate Penalization Screening (UPS) for variable selection. This is a screen-and-clean method where we screen with univariate thresholding and clean with the penalized MLE. It has two important properties: sure screening and separable after screening. These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. UPS is effective both in theory and in computation. Comment: Published at http://dx.doi.org/10.1214/11-AOS947 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
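    The screen-and-clean recipe lends itself to a compact illustration. Below is a minimal Python/NumPy sketch, assuming roughly standardized columns of $X$, a hand-picked screening threshold, and a plain ridge-penalized refit on each connected group of retained variables in place of the paper's penalized MLE; the name `ups_screen_and_clean`, the adjacency cutoff, and the default tuning values are illustrative, not from the paper.

```python
import numpy as np

def ups_screen_and_clean(X, y, t_screen=2.0, ridge=0.1):
    """Illustrative screen-and-clean: univariate screening, then small refits."""
    n, p = X.shape
    # Screening: univariate scores X_j' y; keep coordinates above the threshold.
    z = X.T @ y
    kept = np.flatnonzero(np.abs(z) > t_screen)

    beta = np.zeros(p)
    if kept.size == 0:
        return beta

    # Group retained variables whose empirical correlations are non-negligible,
    # so each small regression can be fitted separately (the "separable" step).
    G = np.abs(X[:, kept].T @ X[:, kept]) > 0.25   # crude adjacency proxy
    unvisited = set(range(kept.size))
    while unvisited:
        comp, stack = [], [unvisited.pop()]
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in [j for j in list(unvisited) if G[i, j]]:
                unvisited.remove(j)
                stack.append(j)
        idx = kept[comp]
        # Clean: small ridge-penalized least squares on this group
        # (a stand-in for the penalized MLE used in the paper).
        A = X[:, idx]
        beta[idx] = np.linalg.solve(A.T @ A + ridge * np.eye(len(idx)), A.T @ y)
    return beta
```

    In the paper the screening and cleaning tuning parameters are chosen to attain the optimal phase diagram; here they are left as free inputs.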

    Higher criticism for detecting sparse heterogeneous mixtures

    Higher criticism, or second-level significance testing, is a multiple-comparisons concept mentioned in passing by Tukey. It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given $\alpha$-level to the expected fraction under the joint null. In fact, he suggested standardizing the difference of the two quantities and forming a z-score; the resulting z-score tests the significance of the body of significance tests. We consider a generalization, where we maximize this z-score over a range of significance levels $0<\alpha\leq\alpha_0$. We are able to show that the resulting higher criticism statistic is effective at resolving a very subtle testing problem: testing whether $n$ normal means are all zero versus the alternative that a small fraction is nonzero. The subtlety of this "sparse normal means" testing problem can be seen from work of Ingster and Jin, who studied such problems in great detail. In their studies, they identified an interesting range of cases where the fraction of nonzero means is so small that the alternative hypothesis exhibits little noticeable effect on the distribution of the p-values, either for the bulk of the tests or for the few most highly significant tests. In this range, when the amplitude of the nonzero means is calibrated with their fraction, the likelihood ratio test for a precisely specified alternative would still succeed in separating the two hypotheses; higher criticism succeeds throughout essentially the same range of amplitude and sparsity, even though it requires no specification of the alternative. Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000026
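    The statistic described above is straightforward to compute from a vector of p-values. The sketch below is a minimal NumPy version of one common form of the higher criticism statistic, maximizing the standardized comparison of observed and expected fractions of significances over $0<\alpha\leq\alpha_0$; the function name `higher_criticism`, the clipping guard, and the default $\alpha_0=1/2$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """Standardized comparison of observed vs. expected significance fractions,
    maximized over significance levels alpha = i/n with i/n <= alpha0."""
    p = np.sort(np.asarray(pvals, dtype=float))
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against p-values of exactly 0 or 1
    n = p.size
    i = np.arange(1, n + 1)
    # Observed fraction of p-values at or below p_(i) is i/n; the expected
    # fraction under the joint null is p_(i).  Standardize and take a z-score.
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    keep = i / n <= alpha0
    return hc[keep].max()

# Example: many null tests -- the statistic stays moderate with high probability.
rng = np.random.default_rng(0)
print(higher_criticism(rng.uniform(size=10000)))
```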

    Asymptotic minimaxity of False Discovery Rate thresholding for sparse exponential data

    We apply FDR thresholding to a non-Gaussian vector whose coordinates $X_i$, $i=1,\ldots,n$, are independent exponential with individual means $\mu_i$. The vector $\mu=(\mu_i)$ is thought to be sparse, with most coordinates 1 but a small fraction significantly larger than 1; roughly, most coordinates are simply 'noise,' but a small fraction contain 'signal.' We measure risk by per-coordinate mean-squared error in recovering $\log(\mu_i)$, and study minimax estimation over parameter spaces defined by constraints on the per-coordinate $p$-norm of $\log(\mu_i)$: $\frac{1}{n}\sum_{i=1}^n\log^p(\mu_i)\leq \eta^p$. We show for large $n$ and small $\eta$ that FDR thresholding can be nearly minimax. The FDR control parameter $0<q<1$ plays an important role: when $q\leq 1/2$, the FDR estimator is nearly minimax, while choosing a fixed $q>1/2$ prevents near minimaxity. These conclusions mirror those found in the Gaussian case in Abramovich et al. [Ann. Statist. 34 (2006) 584-653]. The techniques developed here seem applicable to a wide range of other distributional assumptions, other loss measures and non-i.i.d. dependency structures. Comment: Published at http://dx.doi.org/10.1214/009053606000000920 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
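    As a rough illustration of the kind of procedure being analyzed, the sketch below applies a Benjamini-Hochberg step-up rule to exponential data: under the null $\mu_i=1$ the p-value of an observation $x$ is $e^{-x}$, coordinates surviving the data-driven cutoff are estimated by $\log(x_i)$, and the rest are set to zero. This is a hedged reading of FDR thresholding in this setting, not the paper's exact estimator; the choice $q=0.25$ is an arbitrary value inside the $q\leq 1/2$ range that the result favors.

```python
import numpy as np

def fdr_threshold_exponential(x, q=0.25):
    """Hard-threshold estimate of log(mu_i) from exponential data x_i,
    with the cutoff chosen by a Benjamini-Hochberg step-up rule."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Null hypothesis mu_i = 1: the survival function gives p_i = exp(-x_i).
    pvals = np.exp(-x)
    ranked = np.sort(pvals)
    # Largest k with p_(k) <= q*k/n defines the data-driven p-value cutoff.
    below = np.flatnonzero(ranked <= q * np.arange(1, n + 1) / n)
    est = np.zeros(n)
    if below.size:
        cutoff = ranked[below[-1]]
        selected = pvals <= cutoff
        est[selected] = np.log(x[selected])  # crude per-coordinate estimate of log(mu_i)
    return est
```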

    Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing

    An important estimation problem that is closely related to large-scale multiple testing is that of estimating the null density and the proportion of nonnull effects. A few estimators have been introduced in the literature; however, several important problems, including the evaluation of the minimax rate of convergence and the construction of rate-optimal estimators, remain open. In this paper, we consider optimal estimation of the null density and the proportion of nonnull effects. Both minimax lower and upper bounds are derived. The lower bound is established by a two-point testing argument, at the core of which is the novel construction of two least favorable marginal densities $f_1$ and $f_2$. The density $f_1$ is heavy-tailed both in the spatial and frequency domains, and $f_2$ is a perturbation of $f_1$ such that the characteristic functions associated with $f_1$ and $f_2$ match each other at low frequencies. The minimax upper bound is obtained by constructing estimators which rely on the empirical characteristic function and Fourier analysis. The estimators are shown to be minimax rate optimal. Compared to existing methods in the literature, the proposed procedure not only provides more precise estimates of the null density and the proportion of nonnull effects, but also yields more accurate results when used inside multiple testing procedures that aim to control the False Discovery Rate (FDR). The procedure is easy to implement, and numerical results are given. Comment: Published at http://dx.doi.org/10.1214/09-AOS696 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
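    The upper-bound construction is Fourier-analytic, so a small sketch of the empirical characteristic function is useful, even though the paper's rate-optimal estimator is more involved. The code below computes $\varphi_n(t)=\frac{1}{n}\sum_j e^{itX_j}$ and a simple plug-in for the proportion of nonnull effects under a standard normal null; the frequency choice $t=\sqrt{2\gamma\log n}$, the default $\gamma=0.25$, and the name `estimate_proportion` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def empirical_cf(x, t):
    """phi_n(t) = (1/n) * sum_j exp(i t x_j), evaluated on an array of t's."""
    x = np.asarray(x, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

def estimate_proportion(x, gamma=0.25):
    """Plug-in proportion of nonnull effects under an N(0,1) null.

    If x_j ~ (1-eps) N(0,1) + eps N(mu_j,1), then
    exp(t^2/2) * E exp(i t x) = (1-eps) + eps * E exp(i t mu), so
    1 - Re(exp(t^2/2) * phi_n(t)) = eps * (1 - E cos(t mu)); this never
    exceeds eps and approaches it when the nonnull means are not too small
    relative to 1/t."""
    n = len(x)
    t = np.sqrt(2.0 * gamma * np.log(n))
    phi = empirical_cf(x, t)[0]
    return float(np.clip(1.0 - np.real(np.exp(t * t / 2.0) * phi), 0.0, 1.0))
```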

    Estimating the Null and the Proportion of non-Null effects in Large-scale Multiple Comparisons

    An important issue raised by Efron in the context of large-scale multiple comparisons is that in many applications the usual assumption that the null distribution is known is incorrect, and seemingly negligible differences in the null may result in large differences in subsequent studies. This suggests that a careful study of estimation of the null is indispensable. In this paper, we consider the problem of estimating a null normal distribution, and a closely related problem, estimation of the proportion of non-null effects. We develop an approach based on the empirical characteristic function and Fourier analysis. The estimators are shown to be uniformly consistent over a wide class of parameters. Numerical performance of the estimators is investigated using both simulated and real data. In particular, we apply our procedure to the analysis of breast cancer and HIV microarray data sets. The estimators perform favorably in comparison to existing methods. Comment: 42 pages, 6 figures
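    In the same Fourier spirit, the pure-null characteristic function $\exp(iu_0t-\sigma_0^2t^2/2)$ suggests reading the null parameters off the modulus and phase of the empirical characteristic function at a moderate frequency. The sketch below is only a caricature of the authors' procedure (which selects the frequency adaptively and comes with uniform consistency guarantees); the fixed frequency `t=1.0` and the helper name `estimate_null_normal` are assumptions for illustration.

```python
import numpy as np

def estimate_null_normal(x, t=1.0):
    """Read off (u0, sigma0) of an approximate N(u0, sigma0^2) null from the
    empirical characteristic function phi_n(t) = (1/n) sum_j exp(i t x_j).

    For a pure null, phi(t) = exp(i*u0*t - sigma0^2*t^2/2), so
    sigma0^2 = -2*log|phi(t)|/t^2 and u0 = arg(phi(t))/t; with a small
    proportion of nonnull effects these identities hold approximately
    at a well-chosen frequency t."""
    x = np.asarray(x, dtype=float)
    phi = np.mean(np.exp(1j * t * x))
    sigma0 = np.sqrt(max(-2.0 * np.log(np.abs(phi)) / t**2, 0.0))
    u0 = np.angle(phi) / t
    return u0, sigma0

# Example: a contaminated normal sample; the null parameters come back approximately.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-0.1, 1.2, 9800), rng.normal(3.0, 1.2, 200)])
print(estimate_null_normal(x))
```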

    Optimal classification in sparse Gaussian graphic model

    Consider a two-class classification problem where the number of features is much larger than the sample size. The features are masked by Gaussian noise with mean zero and covariance matrix $\Sigma$, where the precision matrix $\Omega=\Sigma^{-1}$ is unknown but is presumably sparse. The useful features, also unknown, are sparse and each contributes weakly (i.e., rare and weak) to the classification decision. By obtaining a reasonably good estimate of $\Omega$, we formulate the setting as a linear regression model. We propose a two-stage classification method where we first select features by the method of Innovated Thresholding (IT), and then use the retained features and Fisher's LDA for classification. In this approach, a crucial problem is how to set the threshold of IT. We approach this problem by adapting the recent innovation of Higher Criticism Thresholding (HCT). We find that when useful features are rare and weak, the limiting behavior of HCT is essentially just as good as the limiting behavior of the ideal threshold, the threshold one would choose if the underlying distribution of the signals were known (if only). Somewhat surprisingly, when $\Omega$ is sufficiently sparse, its off-diagonal coordinates usually do not have a major influence over the classification decision. Compared to recent work in the case where $\Omega$ is the identity matrix [Proc. Natl. Acad. Sci. USA 105 (2008) 14790-14795; Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 367 (2009) 4449-4470], the current setting is much more general and requires a new approach and much more sophisticated analysis. One key component of the analysis is the intimate relationship between HCT and Fisher's separation. Another key component is the tight large-deviation bounds for empirical processes for data with unconventional correlation structures, where graph theory on vertex coloring plays an important role. Comment: Published at http://dx.doi.org/10.1214/13-AOS1163 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
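    To make the two-stage recipe concrete, here is a compressed NumPy/SciPy sketch: innovate the feature-wise mean-difference z-scores with a plug-in precision matrix, pick the threshold by a higher-criticism-style scan, and classify with a linear rule on the retained features. This is a sketch under strong simplifications (a supplied precision estimate, balanced classes, roughly standardized features, a plain HC objective), not the procedure analyzed in the paper; names such as `it_hct_classifier` are illustrative.

```python
import numpy as np
from scipy.stats import norm

def it_hct_classifier(X, y, omega_hat):
    """Two-stage classifier sketch: Innovated Thresholding scores, an HC-style
    threshold, and a Fisher/LDA-type linear rule on the retained features.

    X: (n, p) training matrix, y: labels in {-1, +1}, omega_hat: (p, p)
    plug-in estimate of the precision matrix (assumed given here)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n, p = X.shape
    # Feature-wise z-scores of the class mean difference (balanced classes and
    # unit feature variances assumed), then the innovated transform Omega_hat @ z.
    z = (X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)) * np.sqrt(n) / 2.0
    z_tilde = omega_hat @ z

    # HC-style scan over candidate thresholds: pick t maximizing the
    # standardized exceedance count against its null expectation.
    t_grid = np.linspace(0.5, np.sqrt(2 * np.log(p)), 50)
    surv = 2 * norm.sf(t_grid)                         # P(|N(0,1)| > t)
    counts = (np.abs(z_tilde)[None, :] > t_grid[:, None]).sum(axis=1)
    hc = (counts - p * surv) / np.sqrt(p * surv * (1 - surv))
    t_hc = t_grid[np.argmax(hc)]

    # Retained features define a simple linear decision rule sign(w'x).
    w = np.where(np.abs(z_tilde) > t_hc, z_tilde, 0.0)

    def predict(X_new):
        return np.sign(np.asarray(X_new, dtype=float) @ w)
    return predict
```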