3 research outputs found

    A Randomized Rounding Algorithm for Sparse PCA

    Full text link
    We present and analyze a simple, two-step algorithm to approximate the optimal solution of the sparse PCA problem. Our approach first solves a L1 penalized version of the NP-hard sparse PCA optimization problem and then uses a randomized rounding strategy to sparsify the resulting dense solution. Our main theoretical result guarantees an additive error approximation and provides a tradeoff between sparsity and accuracy. Our experimental evaluation indicates that our approach is competitive in practice, even compared to state-of-the-art toolboxes such as Spasm.Comment: 28 pages, 11 figures, 2 table

    Sparse principal component analysis and its l1l_1-relaxation

    Full text link
    Principal component analysis (PCA) is one of the most widely used dimensionality reduction methods in scientific data analysis. In many applications, for additional interpretability, it is desirable for the factor loadings to be sparse, that is, we solve PCA with an additional cardinality (l0) constraint. The resulting optimization problem is called the sparse principal component analysis (SPCA). One popular approach to achieve sparsity is to replace the l0 constraint by an l1 constraint. In this paper, we prove that, independent of the data, the optimal objective function value of the problem with l0 constraint is within a constant factor of the the optimal objective function value of the problem with l1 constraint. To the best of our knowledge, this is the first formal relationship established between the l0 and the l1 constraint version of the problem

    Approximation Algorithms for Sparse Principal Component Analysis

    Full text link
    Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and multivariate statistics. To improve the interpretability of PCA, various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA). In this paper, we present thresholding as a provably accurate, polynomial time, approximation algorithm for the SPCA problem, without imposing any restrictive assumptions on the input covariance matrix. Our first thresholding algorithm using the Singular Value Decomposition is conceptually simple; is faster than current state-of-the-art; and performs well in practice. On the negative side, our (novel) theoretical bounds do not accurately predict the strong practical performance of this approach. The second algorithm solves a well-known semidefinite programming relaxation and then uses a novel, two step, deterministic thresholding scheme to compute a sparse principal vector. It works very well in practice and, remarkably, this solid practical performance is accurately predicted by our theoretical bounds, which bridge the theory-practice gap better than current state-of-the-art
    corecore