
Optimal Principal Component Analysis in Distributed and Streaming Models

Abstract

We study the Principal Component Analysis (PCA) problem in the distributed and streaming models of computation. Given a matrix $A \in \mathbb{R}^{m \times n}$, a rank parameter $k < \mathrm{rank}(A)$, and an accuracy parameter $0 < \epsilon < 1$, we want to output an $m \times k$ matrix $U$ with orthonormal columns for which

$$\|A - UU^{T}A\|_F^2 \le (1 + \epsilon) \cdot \|A - A_k\|_F^2,$$

where $A_k \in \mathbb{R}^{m \times n}$ is the best rank-$k$ approximation to $A$. This paper provides improved algorithms for distributed PCA and streaming PCA.

Comment: STOC 2016 full version
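The relative-error guarantee in the problem statement can be checked directly on small inputs. The sketch below is not the paper's distributed or streaming algorithm. It is a hypothetical, centralized checker, and all helper names (`gram`, `symmetric_eigenvalues`, `best_rank_k_error`, `projection_error`, `meets_guarantee`) are illustrative. It computes $\|A - A_k\|_F^2$ as the sum of the squared singular values of $A$ beyond the $k$ largest (these are the eigenvalues of $AA^T$, found here with the classical cyclic Jacobi eigenvalue method). It computes $\|A - UU^TA\|_F^2$ through the identity $\|A\|_F^2 - \|U^TA\|_F^2$, which holds only under the assumption that $U$ has orthonormal columns.

```python
import math


def gram(A):
    """Return A A^T (an m x m symmetric matrix) for A given as a list of m rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]


def frob_sq(A):
    """Squared Frobenius norm: the sum of squares of all entries."""
    return sum(x * x for row in A for x in row)


def symmetric_eigenvalues(S, max_sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(S)
    M = [row[:] for row in S]
    for _ in range(max_sweeps):
        off = sum(M[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if M[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the updated M[p][q] becomes zero.
                theta = (M[q][q] - M[p][p]) / (2.0 * M[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # M <- J^T M J: first rotate columns p and q ...
                for r in range(n):
                    mrp, mrq = M[r][p], M[r][q]
                    M[r][p] = c * mrp - s * mrq
                    M[r][q] = s * mrp + c * mrq
                # ... then rotate rows p and q.
                for r in range(n):
                    mpr, mqr = M[p][r], M[q][r]
                    M[p][r] = c * mpr - s * mqr
                    M[q][r] = s * mpr + c * mqr
    return [M[i][i] for i in range(n)]


def best_rank_k_error(A, k):
    """||A - A_k||_F^2: sum of the squared singular values beyond the k largest."""
    eig = sorted(symmetric_eigenvalues(gram(A)), reverse=True)
    return sum(max(ev, 0.0) for ev in eig[k:])  # clamp tiny negative round-off


def projection_error(A, U):
    """||A - U U^T A||_F^2, assuming U (m x k) has orthonormal columns.

    U U^T is then an orthogonal projection, so the error equals
    ||A||_F^2 - ||U^T A||_F^2 by the Pythagorean theorem.
    """
    m, n, k = len(A), len(A[0]), len(U[0])
    UtA = [[sum(U[i][c] * A[i][j] for i in range(m)) for j in range(n)] for c in range(k)]
    return frob_sq(A) - frob_sq(UtA)


def meets_guarantee(A, U, eps, slack=1e-9):
    """True if U achieves the (1 + eps) relative-error bound, up to a float slack."""
    k = len(U[0])
    return projection_error(A, U) <= (1.0 + eps) * best_rank_k_error(A, k) + slack


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]  # singular values 3 and 1, so ||A - A_1||_F^2 = 1
    r = 0.5 ** 0.5
    print(best_rank_k_error(A, 1))
    print(meets_guarantee(A, [[r], [r]], eps=0.1))     # top singular direction
    print(meets_guarantee(A, [[1.0], [0.0]], eps=0.1))  # e_1 has error 5 > 1.1
```

The Jacobi routine is used only to keep the sketch dependency-free; for real inputs one would call a library SVD routine such as `numpy.linalg.svd`. The small `slack` term absorbs floating-point round-off when $U$ is exactly optimal.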
