47 research outputs found

    Towards a Theoretical Analysis of PCA for Heteroscedastic Data

    Principal Component Analysis (PCA) is a method for estimating a subspace given noisy samples. It is useful in a variety of problems ranging from dimensionality reduction to anomaly detection and the visualization of high-dimensional data. PCA performs well in the presence of moderate noise and even with missing data, but is sensitive to outliers. PCA is also known to have a phase transition when noise is independent and identically distributed; recovery of the subspace sharply declines at a threshold noise variance. Effective use of PCA requires a rigorous understanding of these behaviors. This paper provides a step towards an analysis of PCA for samples with heteroscedastic noise, that is, samples that have non-uniform noise variances and so are no longer identically distributed. In particular, we provide a simple asymptotic prediction of the recovery of a one-dimensional subspace from noisy heteroscedastic samples. The prediction enables: a) easy and efficient calculation of the asymptotic performance, and b) qualitative reasoning to understand how PCA is impacted by heteroscedasticity (such as outliers). Comment: Presented at 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton
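
The setting described in this abstract can be explored numerically. The following is a minimal simulation sketch, not the paper's asymptotic prediction: it assumes a rank-one model in which each sample is `theta * u * z_i` plus Gaussian noise with its own variance, and it measures recovery as the squared inner product between the true direction and the top principal component. The function name `simulate_recovery`, the parameter names, and the chosen dimensions and variances are all hypothetical.

```python
import numpy as np


def simulate_recovery(d, theta, noise_vars, seed=0):
    """Empirical recovery |<u_hat, u>|^2 of a one-dimensional subspace.

    Assumed model (illustrative only): sample i is theta * u * z_i + e_i,
    where z_i ~ N(0, 1) and e_i ~ N(0, noise_vars[i] * I_d).
    """
    rng = np.random.default_rng(seed)
    noise_vars = np.asarray(noise_vars, dtype=float)
    n = noise_vars.size

    # True unit-norm direction; any unit vector would do by rotation invariance.
    u = np.zeros(d)
    u[0] = 1.0

    z = rng.standard_normal(n)
    # Column i of the noise matrix is scaled by the standard deviation of sample i.
    noise = rng.standard_normal((d, n)) * np.sqrt(noise_vars)
    samples = theta * np.outer(u, z) + noise

    # The top left singular vector of the data matrix is the first principal
    # component (no mean-centering here, since the model has zero mean).
    left_vectors, _, _ = np.linalg.svd(samples, full_matrices=False)
    u_hat = left_vectors[:, 0]
    return float(np.dot(u_hat, u) ** 2)


if __name__ == "__main__":
    n = 1000
    homoscedastic = np.full(n, 0.1)
    # 80% of samples are clean, 20% are much noisier (outlier-like).
    heteroscedastic = np.concatenate([np.full(800, 0.1), np.full(200, 4.0)])
    print("homoscedastic:", simulate_recovery(100, 1.0, homoscedastic))
    print("heteroscedastic:", simulate_recovery(100, 1.0, heteroscedastic))
```

Comparing the two printed values for different mixes of noise variances gives a rough empirical feel for how non-uniform noise affects recovery; a single finite-size run like this is noisy and is no substitute for the asymptotic analysis the abstract refers to.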

    Average Characteristic Polynomials of Determinantal Point Processes

    We investigate the average characteristic polynomial $\mathbb E\big[\prod_{i=1}^N(z-x_i)\big]$ where the $x_i$'s are real random variables which form a determinantal point process associated to a bounded projection operator. For a subclass of point processes, which contains Orthogonal Polynomial Ensembles and Multiple Orthogonal Polynomial Ensembles, we provide a sufficient condition for its limiting zero distribution to match with the limiting distribution of the random variables, almost surely, as $N$ goes to infinity. Moreover, such a condition turns out to be sufficient to strengthen the mean convergence to the almost sure one for the moments of the empirical measure associated to the determinantal point process, a fact of independent interest. As an application, we obtain from a theorem of Kuijlaars and Van Assche a unified way to describe the almost sure convergence for classical Orthogonal Polynomial Ensembles. As another application, we obtain from Voiculescu's theorems the limiting zero distribution for multiple Hermite and multiple Laguerre polynomials, expressed in terms of free convolutions of classical distributions with atomic measures. Comment: 26 page

    Free Probability, Sample Covariance Matrices and Stochastic Eigen-Inference

    Random matrix theory is now a big subject with applications in many disciplines of science, engineering and finance. This talk is a survey specifically oriented towards the needs and interests of a computationally inclined audience. We include the important mathematics (free probability) that permits the characterization of a large class of random matrices. We discuss how computational software is transforming this theory into practice by highlighting its use in the context of a stochastic eigen-inference application. Singapore-MIT Alliance (SMA