Principal Component Analysis (PCA) is a method for estimating a subspace
given noisy samples. It is useful in a variety of problems ranging from
dimensionality reduction to anomaly detection and the visualization of high
dimensional data. PCA performs well in the presence of moderate noise and even
with missing data, but is also sensitive to outliers. PCA is also known to have
a phase transition when noise is independent and identically distributed;
recovery of the subspace sharply declines at a threshold noise variance.
Effective use of PCA requires a rigorous understanding of these behaviors. This
paper provides a step towards an analysis of PCA for samples with
heteroscedastic noise, that is, samples that have non-uniform noise variances
and so are no longer identically distributed. In particular, we provide a
simple asymptotic prediction of the recovery of a one-dimensional subspace from
noisy heteroscedastic samples. The prediction enables: a) easy and efficient
calculation of the asymptotic performance, and b) qualitative reasoning to
understand how PCA is impacted by heteroscedasticity (such as outliers).Comment: Presented at 54th Annual Allerton Conference on Communication,
Control, and Computing (Allerton