7 research outputs found
Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices
We study the streaming model for approximate matrix multiplication (AMM). We
are interested in the scenario that the algorithm can only take one pass over
the data with limited memory. The state-of-the-art deterministic sketching
algorithm for streaming AMM is the co-occurring directions (COD), which has
much smaller approximation errors than randomized algorithms and outperforms
other deterministic sketching methods empirically. In this paper, we provide a
tighter error bound for COD whose leading term considers the potential
approximate low-rank structure and the correlation of input matrices. We prove
COD is space optimal with respect to our improved error bound. We also propose
a variant of COD for sparse matrices with theoretical guarantees. The
experiments on real-world sparse datasets show that the proposed algorithm is
more efficient than baseline methods
Communication-efficient distributed covariance sketch, with application to distributed PCA
A sketch of a large data set captures vital properties of the original data while typically occupying much less space. In this paper, we consider the problem of computing a sketch of a massive data matrix A ∈ Rn×d that is distributed across s machines. Our goal is to output a matrix B ∈ Rℓ×d which is significantly smaller than but still approximates A well in terms of covariance error, i.e., kAT A - BT Bk2. Such a matrix B is called a covariance sketch of A. We are mainly focused on minimizing the communication cost, which is arguably the most valuable resource in distributed computations. We show that there is a nontrivial gap between deterministic and randomized communication complexity for computing a covariance sketch. More specifically, we first prove an almost tight deterministic communication lower bound, then provide a new randomized algorithm with communication cost smaller than the deterministic lower bound. Based on a well-known connection between covariance sketch and approximate principle component analysis, we obtain better communication bounds for the distributed PCA problem. Moreover, we also give an improved distributed PCA algorithm for sparse input matrices, which uses our distributed sketching algorithm as a key building block