High Dimensional Semiparametric Scale-Invariant Principal Component Analysis
We propose a new high dimensional semiparametric principal component analysis
(PCA) method, named Copula Component Analysis (COCA). The semiparametric model
assumes that, after unspecified marginally monotone transformations, the
distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA
in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust
to outliers and data contamination; (iii) It is scale-invariant and yields more
interpretable results. We prove that the COCA estimators obtain fast estimation
rates and are feature selection consistent when the dimension is nearly
exponentially large relative to the sample size. Careful experiments confirm
that COCA outperforms sparse PCA on both synthetic and real-world datasets.
Comment: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
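A minimal sketch, not the authors' implementation, of the rank-based idea behind this kind of copula PCA: under the model in which unspecified marginal monotone transformations yield a Gaussian, the latent correlation matrix can be recovered from pairwise Kendall's tau via the transform sin(pi * tau / 2), after which ordinary PCA is applied. The function name and the use of plain (non-sparse) PCA are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

def copula_pca_leading_eigvec(X):
    """Illustrative sketch: rank-based latent correlation + leading eigenvector.

    Assumes the copula (nonparanormal) model: monotone marginal transforms
    of a Gaussian. Kendall's tau is invariant to those transforms, so
    sin(pi/2 * tau) estimates the latent Gaussian correlation.
    """
    n, d = X.shape
    S = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])
            S[j, k] = S[k, j] = np.sin(np.pi / 2 * tau)
    # Leading eigenvector of the estimated latent correlation matrix.
    _, vecs = np.linalg.eigh(S)  # eigh: ascending eigenvalues
    return vecs[:, -1]
```

Because the estimator depends on the data only through ranks, it is unchanged under any marginal monotone transformation (hence scale-invariant) and is less sensitive to outliers than sample-covariance PCA.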
ECA: High Dimensional Elliptical Component Analysis in non-Gaussian Distributions
We present a robust alternative to principal component analysis (PCA) ---
called elliptical component analysis (ECA) --- for analyzing high dimensional,
elliptically distributed data. ECA estimates the eigenspace of the covariance
matrix of the elliptical data. To cope with heavy-tailed elliptical
distributions, a multivariate rank statistic is exploited. At the model-level,
we consider two settings: either that the leading eigenvectors of the
covariance matrix are non-sparse or that they are sparse. Methodologically, we
propose ECA procedures for both non-sparse and sparse settings. Theoretically,
we provide both non-asymptotic and asymptotic analyses quantifying the
theoretical performances of ECA. In the non-sparse setting, we show that ECA's
performance is highly related to the effective rank of the covariance matrix.
In the sparse setting, the results are twofold: (i) We show that the sparse ECA
estimator based on a combinatoric program attains the optimal rate of
convergence; (ii) Based on some recent developments in estimating sparse
leading eigenvectors, we show that a computationally efficient sparse ECA
estimator attains the optimal rate of convergence under a suboptimal scaling.
Comment: to appear in JASA (T&M).
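A hedged sketch of the kind of multivariate rank statistic the abstract refers to; the multivariate Kendall's tau matrix is a standard such statistic for elliptical data, whose eigenvectors coincide with those of the covariance matrix under ellipticity. The function names are illustrative, and this sketch covers only the non-sparse setting.

```python
import numpy as np

def multivariate_kendalls_tau(X):
    """Average of outer(u, u) over all sample pairs, u = (Xi - Xj)/||Xi - Xj||.

    For elliptically distributed data this rank-type statistic shares the
    eigenvectors of the covariance matrix while downweighting heavy tails,
    since each pair contributes only a unit direction.
    """
    n, d = X.shape
    K = np.zeros((d, d))
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            norm = np.linalg.norm(diff)
            if norm > 0:
                u = diff / norm
                K += np.outer(u, u)
                count += 1
    return K / count

def eca_top_eigvecs(X, k=1):
    """Illustrative non-sparse estimator: top-k eigenvectors of the tau matrix."""
    K = multivariate_kendalls_tau(X)
    _, vecs = np.linalg.eigh(K)  # ascending order
    return vecs[:, -k:]
```

Each pairwise term has unit trace, so the estimated matrix has trace one regardless of the tails of the distribution; the eigenvalue ordering, not the eigenvalues themselves, carries the eigenspace information.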