Search CORE


Augmented sparse principal component analysis for high dimensional data

Author: Johnstone Iain M.
Paul Debashis
Publication date: 06/02/2012

We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish lower bounds on the rates of convergence of the estimators of the leading eigenvectors under $\ell^q$-sparsity constraints when an $\ell^2$ loss function is used. We also propose an estimator of the leading eigenvectors based on a coordinate selection scheme combined with PCA and show that the proposed estimator achieves the optimal rate of convergence under a sparsity regime. Moreover, we establish that under certain scenarios, the usual PCA achieves the minimax convergence rate.
Comment: This manuscript was written in 2007, and a version has been available on the first author's website, but it is posted to arXiv now in its 2007 form. Revisions incorporating later work will be posted separately.
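The coordinate-selection-then-PCA idea in the abstract can be illustrated with a simple two-stage sketch. It uses plain diagonal thresholding (keep coordinates whose sample variance clearly exceeds the noise level) in place of the augmented selection scheme the paper actually analyzes. It assumes centered data, a spiked covariance Sigma = I + h v v^T with unit noise variance, and a hypothetical threshold constant `alpha`; the function name `select_then_pca` is also illustrative.

```python
import numpy as np

def select_then_pca(X, alpha=3.0):
    """Two-stage estimate of the leading eigenvector of cov(X), X of shape (n, p).

    Assumes mean-zero rows and a spiked model Sigma = I + h v v^T (unit noise
    variance). `alpha` is a hypothetical tuning constant, not from the paper.
    """
    n, p = X.shape
    variances = (X ** 2).mean(axis=0)                 # per-coordinate sample variance
    threshold = 1.0 + alpha * np.sqrt(np.log(p) / n)
    selected = np.flatnonzero(variances > threshold)  # stage 1: coordinate selection
    v_hat = np.zeros(p)
    if selected.size == 0:
        return v_hat, selected
    Xs = X[:, selected]
    eigvals, eigvecs = np.linalg.eigh(Xs.T @ Xs / n)  # stage 2: PCA on selected block
    v_hat[selected] = eigvecs[:, -1]                  # eigh sorts eigenvalues ascending
    return v_hat, selected

# Simulated spiked model: p = 500 coordinates, only the first 10 carry signal.
rng = np.random.default_rng(0)
n, p, h = 200, 500, 20.0
v_true = np.zeros(p)
v_true[:10] = 1 / np.sqrt(10)
z = rng.standard_normal(n)
X = np.sqrt(h) * np.outer(z, v_true) + rng.standard_normal((n, p))

v_hat, selected = select_then_pca(X)
print("selected:", selected, "|<v_hat, v>|:", abs(v_hat @ v_true))
```

Because selection looks only at the diagonal of the sample covariance, the eigendecomposition runs on a small |selected| x |selected| matrix rather than the full p x p one.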

arXiv.org e-Print Archive

eScholarship - University of California

Sparse Representation of High Dimensional Data for Classification

Author: Siddiqui Salman
Publication venue: Montclair State University Digital Commons
Publication date: 01/08/2008

In this thesis we propose the use of sparse Principal Component Analysis (PCA) for representing high-dimensional data for classification. A sparse transformation reduces the data volume/dimensionality without loss of critical information, so that the data can be processed efficiently and assimilated by a human. We obtained sparse representations of high-dimensional datasets using Sparse Principal Component Analysis (SPCA) and the Direct formulation of Sparse Principal Component Analysis (DSPCA). We then performed classification using the k-Nearest Neighbor (kNN) method and compared the results with those of regular PCA. The experiments were performed on hyperspectral data and on various datasets obtained from the University of California, Irvine (UCI) machine learning repository. The results suggest that a sparse data representation is desirable because it enhances interpretation. It also improves classification performance for certain numbers of features, and in most cases classification performance is similar to that of regular PCA.
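The thesis pairs a sparse PCA representation with kNN classification. As a rough, self-contained illustration of that pipeline, the sketch below substitutes a much cruder technique for SPCA and DSPCA: it computes ordinary PCA loadings and keeps only the `keep` largest-magnitude entries of each one. The helper names, the synthetic two-class data, and the parameter values are all hypothetical.

```python
import numpy as np

# Hypothetical helper: ordinary PCA loadings, then hard-thresholded to
# `keep` nonzero entries each (a crude stand-in, not SPCA or DSPCA).
def thresholded_pca_loadings(X, k, keep):
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T.copy()                                # p x k loading matrix
    for j in range(k):
        small = np.argsort(np.abs(W[:, j]))[:-keep]    # all but the `keep` largest
        W[small, j] = 0.0
        W[:, j] /= np.linalg.norm(W[:, j])             # back to unit length
    return W, mu

# Hypothetical helper: k-nearest-neighbor majority vote, Euclidean distance.
def knn_predict(train_Z, train_y, test_Z, n_neighbors=3):
    dist = np.linalg.norm(test_Z[:, None, :] - train_Z[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    return np.array([np.bincount(votes).argmax() for votes in train_y[nearest]])

# Synthetic two-class data: classes differ only in the first 5 of 50 features.
rng = np.random.default_rng(1)
n, p = 200, 50
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, p))
X[:, :5] += 3.0 * y[:, None]

W, mu = thresholded_pca_loadings(X[:150], k=2, keep=5)
pred = knn_predict((X[:150] - mu) @ W, y[:150], (X[150:] - mu) @ W)
accuracy = (pred == y[150:]).mean()
print("nonzeros per loading:", np.count_nonzero(W, axis=0), "accuracy:", accuracy)
```

The training mean `mu` is reused to center the test rows, so both sets are projected onto exactly the same sparse loadings.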

Montclair State University Digital Commons

Sparse principal component analysis for natural language processing

Author: CC Aggarwal
D Olson
DM Witten
E Haddi
H Trevor
IT Jolliffe
J Camacho
K Rao
N Japkowicz
R Drikvandi
S Ning-min
T Robert
T Sirimongkolkasem
W Zhang
Y Shi
Z Hui
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

High-dimensional data are rapidly growing in many different disciplines, particularly in natural language processing. Natural language processing requires working with high-dimensional matrices of word embeddings obtained from text data. These matrices are often sparse in the sense that they contain many zero elements. Sparse principal component analysis is an advanced mathematical tool for the analysis of high-dimensional data. In this paper, we study sparse principal component analysis and apply it to natural language processing, where it can effectively handle large sparse matrices. We study several formulations of sparse principal component analysis, together with algorithms for implementing those formulations. Our work is motivated and illustrated by a real text dataset. We find that sparse principal component analysis performs as well as ordinary principal component analysis in terms of accuracy and precision, while offering two major advantages: faster computation and easier interpretation of the principal components. These advantages are especially helpful in big data settings.
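The abstract refers to several sparse PCA formulations and algorithms without naming them in this listing. One simple algorithm for a cardinality-constrained leading loading is the truncated power method (Yuan and Zhang): take a power-iteration step, keep only the `card` largest-magnitude entries, and renormalize. The sketch below uses that swapped-in method, not necessarily one studied in the paper; the toy Poisson count matrix stands in for a sparse text matrix and is entirely synthetic.

```python
import numpy as np

def truncated_power_method(S, card, n_iter=200, tol=1e-10):
    """Leading eigenvector of symmetric PSD matrix S restricted to at most `card` nonzeros."""
    p = S.shape[0]
    x = np.zeros(p)
    x[np.argsort(np.diag(S))[-card:]] = 1.0    # start on the largest-variance coordinates
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        y = S @ x                               # ordinary power-iteration step
        keep = np.argsort(np.abs(y))[-card:]    # indices of the `card` largest |y_i|
        x_new = np.zeros(p)
        x_new[keep] = y[keep]                   # truncate everything else to zero
        x_new /= np.linalg.norm(x_new)
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x

# Synthetic sparse count matrix (300 "documents" x 40 "terms"): most entries are 0,
# and one hypothetical topic makes terms 0-3 co-occur in about 30% of documents.
rng = np.random.default_rng(3)
n, p = 300, 40
topic = rng.random(n) < 0.3
rates = np.full((n, p), 0.1)
rates[topic, :4] = 3.0
X = rng.poisson(rates).astype(float)

S = np.cov(X, rowvar=False)
v = truncated_power_method(S, card=4)
print("fraction of zero entries in X:", np.mean(X == 0))
print("support of sparse loading:", np.flatnonzero(v))
```

Starting from the largest diagonal entries is one common initialization; with a poor start the iteration can settle on a suboptimal support, so in practice several starts may be worth comparing.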

Durham Research Online

Crossref

E-space: Manchester Metropolitan University's Research Repository

ECA: High Dimensional Elliptical Component Analysis in non-Gaussian Distributions

Author: Han Fang
Liu Han
Publication date: 03/10/2016

We present a robust alternative to principal component analysis (PCA), called elliptical component analysis (ECA), for analyzing high-dimensional, elliptically distributed data. ECA estimates the eigenspace of the covariance matrix of the elliptical data. To cope with heavy-tailed elliptical distributions, a multivariate rank statistic is exploited. At the model level, we consider two settings: either the leading eigenvectors of the covariance matrix are non-sparse, or they are sparse. Methodologically, we propose ECA procedures for both the non-sparse and the sparse settings. Theoretically, we provide both non-asymptotic and asymptotic analyses quantifying the theoretical performance of ECA. In the non-sparse setting, we show that ECA's performance is closely related to the effective rank of the covariance matrix. In the sparse setting, the results are twofold: (i) we show that the sparse ECA estimator based on a combinatoric program attains the optimal rate of convergence; (ii) based on some recent developments in estimating sparse leading eigenvectors, we show that a computationally efficient sparse ECA estimator attains the optimal rate of convergence under a suboptimal scaling.
Comment: to appear in JASA (T&M)
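The abstract mentions only "a multivariate rank statistic". Assuming that statistic is the multivariate Kendall's tau, whose population version shares its eigenvectors with the covariance (scatter) matrix for elliptical distributions, a minimal non-sparse ECA-style estimate averages normalized outer products of pairwise differences and takes the leading eigenvector. The sparse variants described in the abstract are not shown; the function names and the multivariate-t demo data are hypothetical.

```python
import numpy as np

# Hypothetical helper: sample multivariate Kendall's tau, the average of
# (x_i - x_j)(x_i - x_j)^T / ||x_i - x_j||^2 over all pairs i < j.
def multivariate_kendall_tau(X):
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)
    D = X[i] - X[j]                                 # pairwise differences
    D /= np.linalg.norm(D, axis=1, keepdims=True)   # project each onto the unit sphere
    return D.T @ D / D.shape[0]

# Hypothetical helper: leading eigenvector of the rank statistic (non-sparse case only).
def eca_leading_eigenvector(X):
    K = multivariate_kendall_tau(X)
    eigvals, eigvecs = np.linalg.eigh(K)            # ascending eigenvalues
    return eigvecs[:, -1], K

# Heavy-tailed elliptical data: multivariate t with 3 degrees of freedom
# and scatter matrix diag(4, 1, ..., 1), so the true leading direction is e_1.
rng = np.random.default_rng(2)
n, p, df = 300, 10, 3
scales = np.sqrt(np.r_[4.0, np.ones(p - 1)])
Z = rng.standard_normal((n, p)) * scales
X = Z * np.sqrt(df / rng.chisquare(df, size=n))[:, None]

v_lead, K = eca_leading_eigenvector(X)
print(np.round(np.abs(v_lead), 2))
```

Each normalized difference has unit length, so every term contributes a matrix of trace 1; this bounded contribution keeps a few extreme observations from dominating the estimate, unlike the sample covariance.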

arXiv.org e-Print Archive

FigShare