
    Covariance regularization by thresholding

    This paper considers regularizing a covariance matrix of $p$ variables estimated from $n$ observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and $(\log p)/n \to 0$, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data.
    Comment: Published at http://dx.doi.org/10.1214/08-AOS600 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
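As a concrete companion to this abstract, the NumPy sketch below implements hard thresholding of a sample covariance matrix, together with a simple random-split scheme in the spirit of the resampling idea the abstract mentions. The grid, split fraction, and function names are illustrative assumptions, not the paper's exact constants or procedure.

```python
import numpy as np

def threshold_covariance(X, t):
    """Hard-threshold the sample covariance of X (n observations by p variables):
    off-diagonal entries with absolute value below t are set to zero; the
    diagonal is always kept."""
    S = np.cov(X, rowvar=False)        # p x p sample covariance
    mask = np.abs(S) >= t              # keep entries at least t in magnitude
    np.fill_diagonal(mask, True)       # never threshold the variances
    return S * mask

def select_threshold(X, grid, n_splits=20, frac=0.5, seed=0):
    """Sketch of a resampling rule for picking t (an assumption, not the
    paper's exact scheme): threshold the covariance of one random half of
    the rows, compare it in Frobenius norm to the raw covariance of the
    other half, and average the loss over splits."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n1 = int(frac * n)
    scores = np.zeros(len(grid))
    for _ in range(n_splits):
        idx = rng.permutation(n)
        X1, X2 = X[idx[:n1]], X[idx[n1:]]
        S2 = np.cov(X2, rowvar=False)
        for j, t in enumerate(grid):
            scores[j] += np.linalg.norm(threshold_covariance(X1, t) - S2, "fro")
    return grid[int(np.argmin(scores))]

# Usage: thresholds on the scale sqrt(log(p)/n) suggested by the rates.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
grid = np.sqrt(np.log(50) / 200) * np.linspace(0.0, 3.0, 13)
Sigma_hat = threshold_covariance(X, select_threshold(X, grid))
```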

    Minimax bounds for sparse PCA with noisy high-dimensional data

    We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the $\ell_2$ loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.
    Comment: 1 figure
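The abstract names a two-stage coordinate selection scheme without spelling it out; the sketch below is in the spirit of such schemes, using a diagonal-thresholding selection rule in the first stage (an assumption on my part, not the paper's exact procedure) followed by a restricted eigendecomposition.

```python
import numpy as np

def two_stage_leading_eigvec(X, alpha=2.0):
    """Sketch of a two-stage coordinate-selection estimator of the leading
    eigenvector. Stage 1 (an assumed diagonal-thresholding rule): keep the
    coordinates whose sample variance exceeds 1 + alpha*sqrt(log(p)/n),
    which presumes unit noise variance. Stage 2: leading eigenvector of the
    sample covariance restricted to the kept coordinates, embedded in R^p."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    cutoff = 1.0 + alpha * np.sqrt(np.log(p) / n)
    kept = np.flatnonzero(np.diag(S) > cutoff)
    if kept.size == 0:                    # fall back to the single largest variance
        kept = np.array([np.argmax(np.diag(S))])
    _, vecs = np.linalg.eigh(S[np.ix_(kept, kept)])
    v = np.zeros(p)
    v[kept] = vecs[:, -1]                 # eigh sorts eigenvalues in ascending order
    return v
```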

    Sparse PCA: Optimal rates and adaptive estimation

    Principal component analysis (PCA) is one of the most commonly used statistical procedures, with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high-dimensional setting. Under mild technical conditions, we first establish the optimal rates of convergence for estimating the principal subspace, which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy and an application of Fano's lemma. The rate-optimal estimator is constructed using aggregation, which, however, might not be computationally feasible. We then introduce an adaptive procedure for estimating the principal subspace which is fully data-driven and can be computed efficiently. It is shown that the estimator attains the optimal rates of convergence simultaneously over a large collection of parameter spaces. A key idea in our construction is a reduction scheme which reduces the sparse PCA problem to a high-dimensional multivariate regression problem. This method is potentially also useful for other related problems.
    Comment: Published at http://dx.doi.org/10.1214/13-AOS1178 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
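The abstract's estimators (aggregation, reduction to regression) are not described in enough detail here to implement; as a small companion, here is the standard loss under which minimax rates for principal subspaces are typically stated, measuring the distance between column spans through their projection matrices. A minimal sketch, assuming orthonormal $p \times r$ bases:

```python
import numpy as np

def subspace_loss(U_hat, U):
    """Squared Frobenius distance between the projections onto the column
    spans of U_hat and U (both p x r with orthonormal columns) -- the usual
    loss for principal-subspace estimation."""
    P_hat = U_hat @ U_hat.T
    P = U @ U.T
    return np.linalg.norm(P_hat - P, "fro") ** 2

# Example: compare the span of a perturbed basis against the true one.
rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((30, 2)))[0]     # true orthonormal basis
U_hat = np.linalg.qr(U + 0.1 * rng.standard_normal((30, 2)))[0]
print(subspace_loss(U_hat, U))
```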