1,795 research outputs found
Robust sparse principal component analysis.
A method for principal component analysis is proposed that is sparse and robust at the same time. The sparsity delivers principal components that have loadings on a small number of variables, making them easier to interpret. The robustness makes the analysis resistant to outlying observations. The principal components correspond to directions that maximize a robust measure of the variance, with an additional penalty term to take sparseness into account. We propose an algorithm to compute the sparse and robust principal components. The method is applied on several real data examples, and diagnostic plots for detecting outliers and for selecting the degree of sparsity are provided. A simulation experiment studies the loss in statistical efficiency by requiring both robustness and sparsity.Dispersion measure; Projection-pursuit; Outliers; Variable selection;
Relaxed 2-D Principal Component Analysis by Norm for Face Recognition
A relaxed two dimensional principal component analysis (R2DPCA) approach is
proposed for face recognition. Different to the 2DPCA, 2DPCA- and G2DPCA,
the R2DPCA utilizes the label information (if known) of training samples to
calculate a relaxation vector and presents a weight to each subset of training
data. A new relaxed scatter matrix is defined and the computed projection axes
are able to increase the accuracy of face recognition. The optimal -norms
are selected in a reasonable range. Numerical experiments on practical face
databased indicate that the R2DPCA has high generalization ability and can
achieve a higher recognition rate than state-of-the-art methods.Comment: 19 pages, 11 figure
Alternating Maximization: Unifying Framework for 8 Sparse PCA Formulations and Efficient Parallel Codes
Given a multivariate data set, sparse principal component analysis (SPCA)
aims to extract several linear combinations of the variables that together
explain the variance in the data as much as possible, while controlling the
number of nonzero loadings in these combinations. In this paper we consider 8
different optimization formulations for computing a single sparse loading
vector; these are obtained by combining the following factors: we employ two
norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1),
which are used in two different ways (constraint, penalty). Three of our
formulations, notably the one with L0 constraint and L1 variance, have not been
considered in the literature. We give a unifying reformulation which we propose
to solve via a natural alternating maximization (AM) method. We show the the AM
method is nontrivially equivalent to GPower (Journ\'{e}e et al; JMLR
11:517--553, 2010) for all our formulations. Besides this, we provide 24
efficient parallel SPCA implementations: 3 codes (multi-core, GPU and cluster)
for each of the 8 problems. Parallelism in the methods is aimed at i) speeding
up computations (our GPU code can be 100 times faster than an efficient serial
code written in C++), ii) obtaining solutions explaining more variance and iii)
dealing with big data problems (our cluster code is able to solve a 357 GB
problem in about a minute).Comment: 29 pages, 9 tables, 7 figures (the paper is accompanied by a release
of the open-source code '24am'
- …