A Randomized Rounding Algorithm for Sparse PCA
We present and analyze a simple, two-step algorithm to approximate the
optimal solution of the sparse PCA problem. Our approach first solves an
L1-penalized version of the NP-hard sparse PCA optimization problem and then uses
a randomized rounding strategy to sparsify the resulting dense solution. Our
main theoretical result guarantees an additive error approximation and provides
a tradeoff between sparsity and accuracy. Our experimental evaluation indicates
that our approach is competitive in practice, even compared to state-of-the-art
toolboxes such as Spasm.
Comment: 28 pages, 11 figures, 2 tables
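The two-step approach described above can be sketched as follows. This is an illustrative version only: the function name, the keep-probability p_i = min(1, s|x_i|/||x||_1), and the final renormalization are assumptions for the sketch, not the paper's exact rounding scheme.

```python
import numpy as np

def randomized_round(x, s, rng=None):
    """Sparsify a dense unit vector x by randomized rounding.

    Each coordinate i is kept independently with probability
    p_i = min(1, s * |x_i| / ||x||_1), so roughly s entries survive
    in expectation; kept entries retain their value and the result
    is renormalized. (Illustrative scheme; the paper's exact
    probabilities and scaling may differ.)
    """
    rng = np.random.default_rng(rng)
    p = np.minimum(1.0, s * np.abs(x) / np.abs(x).sum())
    keep = rng.random(x.shape) < p
    y = np.where(keep, x, 0.0)
    norm = np.linalg.norm(y)
    return y / norm if norm > 0 else y

# usage: round a dense solution of the penalized problem to ~2 nonzeros
x = np.array([0.9, 0.3, 0.3, 0.1])   # stand-in for a dense L1-penalized solution
y = randomized_round(x, s=2, rng=0)
```

Note that large coordinates (with s|x_i| >= ||x||_1) are kept with probability 1, so the dominant loadings always survive the rounding.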
Sparse principal component analysis and its l1-relaxation
Principal component analysis (PCA) is one of the most widely used
dimensionality reduction methods in scientific data analysis. In many
applications, for additional interpretability, it is desirable for the factor
loadings to be sparse, that is, we solve PCA with an additional cardinality
(l0) constraint. The resulting optimization problem is called the sparse
principal component analysis (SPCA). One popular approach to achieve sparsity
is to replace the l0 constraint by an l1 constraint. In this paper, we prove
that, independent of the data, the optimal objective function value of the
problem with the l0 constraint is within a constant factor of the optimal
objective function value of the problem with the l1 constraint. To the best of
our knowledge, this is the first formal relationship established between the
l0- and l1-constrained versions of the problem.
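In symbols, for a covariance matrix A the two problems being compared are the cardinality-constrained SPCA and its l1-constrained counterpart; the budget sqrt(k) below is the standard pairing of the two constraints (since ||x||_0 <= k and ||x||_2 = 1 imply ||x||_1 <= sqrt(k)), and is an assumption of this sketch rather than a quote of the paper's exact normalization:

```latex
\max_{x \in \mathbb{R}^n} \; x^{\top} A x
\quad \text{s.t.} \quad \|x\|_2 = 1,\ \|x\|_0 \le k
\qquad \text{vs.} \qquad
\max_{x \in \mathbb{R}^n} \; x^{\top} A x
\quad \text{s.t.} \quad \|x\|_2 = 1,\ \|x\|_1 \le \sqrt{k}.
```

The abstract's claim is that the two optimal values differ by at most a constant factor, for every A.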
Approximation Algorithms for Sparse Principal Component Analysis
Principal component analysis (PCA) is a widely used dimension reduction
technique in machine learning and multivariate statistics. To improve the
interpretability of PCA, various approaches to obtain sparse principal
direction loadings have been proposed, which are termed Sparse Principal
Component Analysis (SPCA). In this paper, we present thresholding as a provably
accurate, polynomial time, approximation algorithm for the SPCA problem,
without imposing any restrictive assumptions on the input covariance matrix.
Our first thresholding algorithm, based on the Singular Value Decomposition, is
conceptually simple, faster than the current state-of-the-art, and performs well
in practice. On the negative side, our (novel) theoretical bounds do not
accurately predict the strong practical performance of this approach. The
second algorithm solves a well-known semidefinite programming relaxation and
then uses a novel, two-step, deterministic thresholding scheme to compute a
sparse principal vector. It works very well in practice and, remarkably, this
solid practical performance is accurately predicted by our theoretical bounds,
which bridge the theory-practice gap better than the current state-of-the-art.
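A minimal sketch of the eigenvector-thresholding idea behind the first algorithm: compute the top eigenvector of the covariance matrix, keep its k largest-magnitude coordinates, and renormalize. The function name and this exact procedure are illustrative assumptions; the paper's algorithms and their accompanying guarantees are more refined.

```python
import numpy as np

def spca_threshold(A, k):
    """Thresholding sketch for sparse PCA on a symmetric covariance matrix A.

    Keeps the k largest-magnitude coordinates of the top eigenvector of A
    and renormalizes. (Illustrative only, not the paper's full algorithm.)
    """
    # eigh returns eigenvalues in ascending order; take the top eigenvector
    _, V = np.linalg.eigh(A)
    v = V[:, -1]
    # zero out all but the k largest-magnitude entries
    idx = np.argsort(np.abs(v))[:-k]
    y = v.copy()
    y[idx] = 0.0
    return y / np.linalg.norm(y)

# usage: a diagonal covariance whose top eigenvector is the first basis vector
A = np.diag([3.0, 2.0, 1.0])
y = spca_threshold(A, k=1)   # a 1-sparse unit vector supported on coordinate 0
```

The SDP-based second algorithm replaces the eigenvector step with the solution of a semidefinite relaxation before a two-step deterministic thresholding, which is where the stronger theoretical bounds come from.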