Towards a Theoretical Analysis of PCA for Heteroscedastic Data
Principal Component Analysis (PCA) is a method for estimating a subspace
given noisy samples. It is useful in a variety of problems ranging from
dimensionality reduction to anomaly detection and the visualization of high
dimensional data. PCA performs well in the presence of moderate noise and even
with missing data, but is also sensitive to outliers. PCA is also known to have
a phase transition when noise is independent and identically distributed;
recovery of the subspace sharply declines at a threshold noise variance.
Effective use of PCA requires a rigorous understanding of these behaviors. This
paper provides a step towards an analysis of PCA for samples with
heteroscedastic noise, that is, samples that have non-uniform noise variances
and so are no longer identically distributed. In particular, we provide a
simple asymptotic prediction of the recovery of a one-dimensional subspace from
noisy heteroscedastic samples. The prediction enables: a) easy and efficient
calculation of the asymptotic performance, and b) qualitative reasoning to
understand how PCA is impacted by heteroscedasticity (such as outliers).
Comment: Presented at 54th Annual Allerton Conference on Communication,
Control, and Computing (Allerton
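The setting described in this abstract can be explored numerically. The following is a minimal simulation sketch, not the paper's asymptotic prediction: it draws samples from a one-dimensional subspace, adds Gaussian noise whose variance differs per sample, and measures how well the leading principal component recovers the true direction. The function name `pca_recovery`, the Gaussian model, and the choice of squared inner product as the recovery measure are assumptions made for illustration.

```python
import numpy as np


def pca_recovery(n, d, variances, rng):
    """Simulate PCA recovery of a 1-D subspace under per-sample noise variances.

    Hypothetical helper for illustration only. Returns the squared inner
    product between the true direction and the estimated one (1 = perfect).
    """
    variances = np.asarray(variances, dtype=float)
    if variances.shape != (n,):
        raise ValueError("variances must have one entry per sample")

    # True direction: the first coordinate axis (an arbitrary choice).
    u = np.zeros(d)
    u[0] = 1.0

    # Random coefficients along the subspace, one per sample.
    z = rng.standard_normal(n)

    # Heteroscedastic noise: column j is scaled by the std of sample j.
    noise = rng.standard_normal((d, n)) * np.sqrt(variances)

    samples = np.outer(u, z) + noise

    # The leading left singular vector is the first principal component
    # (no mean-centering here, since the model is zero-mean by construction).
    left_vectors, _, _ = np.linalg.svd(samples, full_matrices=False)
    u_hat = left_vectors[:, 0]

    return float(np.dot(u_hat, u) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 2000, 100
    # Half the samples are clean, half are very noisy (outlier-like).
    mixed = np.concatenate([np.full(n // 2, 0.01), np.full(n // 2, 100.0)])
    print("low noise  :", pca_recovery(n, d, np.full(n, 0.01), rng))
    print("high noise :", pca_recovery(n, d, np.full(n, 100.0), rng))
    print("mixed noise:", pca_recovery(n, d, mixed, rng))
```

Comparing the uniform low-noise, uniform high-noise, and mixed cases gives a rough empirical feel for how non-uniform noise variances affect recovery; the sample size, dimension, and variances above are arbitrary.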
Average Characteristic Polynomials of Determinantal Point Processes
We investigate the average characteristic polynomial $\mathbb{E}\big[\prod_{i=1}^N (z - x_i)\big]$ where the $x_i$'s are real random variables
which form a determinantal point process associated to a bounded projection
operator. For a subclass of point processes, which contains Orthogonal
Polynomial Ensembles and Multiple Orthogonal Polynomial Ensembles, we provide a
sufficient condition for its limiting zero distribution to match with the
limiting distribution of the random variables, almost surely, as $N$ goes to
infinity. Moreover, such a condition turns out to be sufficient to strengthen
the mean convergence to the almost sure one for the moments of the empirical
measure associated to the determinantal point process, a fact of independent
interest. As an application, we obtain from a theorem of Kuijlaars and Van
Assche a unified way to describe the almost sure convergence for classical
Orthogonal Polynomial Ensembles. As another application, we obtain from
Voiculescu's theorems the limiting zero distribution for multiple Hermite and
multiple Laguerre polynomials, expressed in terms of free convolutions of
classical distributions with atomic measures.
Comment: 26 page
Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference
Random matrix theory is now a big subject with applications in many disciplines of science, engineering and finance. This talk is a survey specifically oriented towards the needs and interests of a computationally inclined audience. We include the important mathematics (free probability) that permits the characterization of a large class of random matrices. We discuss how computational software is transforming this theory into practice by highlighting its use in the context of a stochastic eigen-inference application.
Singapore-MIT Alliance (SMA