7 research outputs found

Revisiting Co-Occurring Directions: Sharper Analysis and Efficient Algorithm for Sparse Matrices

Authors: Chen Cheng, Luo Luo, Xie Guangzeng, Ye Haishan
Publication date: 17/12/2020

We study the streaming model for approximate matrix multiplication (AMM). We are interested in the scenario in which the algorithm can take only one pass over the data with limited memory. The state-of-the-art deterministic sketching algorithm for streaming AMM is co-occurring directions (COD), which has much smaller approximation errors than randomized algorithms and outperforms other deterministic sketching methods empirically. In this paper, we provide a tighter error bound for COD whose leading term accounts for the potential approximate low-rank structure and the correlation of the input matrices. We prove that COD is space optimal with respect to our improved error bound. We also propose a variant of COD for sparse matrices with theoretical guarantees. Experiments on real-world sparse datasets show that the proposed algorithm is more efficient than baseline methods.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
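
To make the streaming AMM setting from the abstract above concrete, here is a minimal Python sketch. It is not COD: it uses a plain Gaussian random projection, which is one of the randomized baselines that setting is usually compared against. The function name `streaming_gaussian_amm`, the sketch size `ell`, and the synthetic data are all assumptions introduced for illustration. The routine reads each pair of rows once and keeps only two small sketches in memory.

```python
import numpy as np

def streaming_gaussian_amm(rows, ell, dx, dy, seed=0):
    """One-pass randomized sketch for approximating X^T Y (illustrative baseline, not COD).

    rows yields pairs (x_t, y_t), the t-th rows of X and Y.
    Memory use is O(ell * (dx + dy)), independent of the stream length.
    """
    rng = np.random.default_rng(seed)
    BX = np.zeros((ell, dx))
    BY = np.zeros((ell, dy))
    for x, y in rows:
        # Column t of a Gaussian sketching matrix S, scaled so that E[S^T S] = I.
        s = rng.standard_normal(ell) / np.sqrt(ell)
        BX += np.outer(s, x)  # accumulates S @ X
        BY += np.outer(s, y)  # accumulates S @ Y
    return BX, BY

# Synthetic inputs (assumed for the demo): n = 2000 rows, 8 and 6 columns.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((2000, 8))
Y = data_rng.standard_normal((2000, 6))

BX, BY = streaming_gaussian_amm(zip(X, Y), ell=200, dx=8, dy=6)
approx = BX.T @ BY
# Spectral-norm error of the approximate product.
err = np.linalg.norm(X.T @ Y - approx, ord=2)
print("AMM spectral error:", err)
```

The error of this baseline shrinks roughly as one over the square root of `ell`. A deterministic method such as COD would replace the random accumulation step with a shrinkage step on the two sketches.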

A Simple Proof of a New Set Disjointness with Applications to Data Streams

Authors: Kamath Akshay, Price Eric, Woodruff David P.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 36th Computational Complexity Conference (CCC 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

Communication-efficient distributed covariance sketch, with application to distributed PCA

Authors: Huang Z, Lin X, Zhang W, Zhang Y
Publication venue: Microtome Publishing
Publication date: 05/07/2022

A sketch of a large data set captures vital properties of the original data while typically occupying much less space. In this paper, we consider the problem of computing a sketch of a massive data matrix $A \in \mathbb{R}^{n \times d}$ that is distributed across $s$ machines. Our goal is to output a matrix $B \in \mathbb{R}^{\ell \times d}$ which is significantly smaller than $A$ but still approximates it well in terms of covariance error, i.e., $\|A^T A - B^T B\|_2$. Such a matrix $B$ is called a covariance sketch of $A$. We focus mainly on minimizing the communication cost, which is arguably the most valuable resource in distributed computations. We show that there is a nontrivial gap between deterministic and randomized communication complexity for computing a covariance sketch. More specifically, we first prove an almost tight deterministic communication lower bound, then provide a new randomized algorithm with communication cost smaller than the deterministic lower bound. Based on a well-known connection between covariance sketch and approximate principal component analysis, we obtain better communication bounds for the distributed PCA problem. Moreover, we also give an improved distributed PCA algorithm for sparse input matrices, which uses our distributed sketching algorithm as a key building block.

OPUS - University of Technology Sydney
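
The covariance error defined in the abstract above can be illustrated on a single machine. The following Python sketch uses the basic Frequent Directions procedure, a standard deterministic covariance sketch. It is not the distributed algorithm proposed in the paper. The names `frequent_directions` and `covariance_error`, the sketch size `ell`, and the synthetic matrix are assumptions made for this example.

```python
import numpy as np

def frequent_directions(A, ell):
    """Basic Frequent Directions: returns B (ell x d) with B^T B close to A^T A.

    Single-machine illustration only; assumes ell <= d.
    """
    n, d = A.shape
    if ell > d:
        raise ValueError("this sketch assumes ell <= d")
    B = np.zeros((ell, d))
    for row in A:
        # The last row of B is zero here (initially, and after every shrink step).
        B[-1] = row
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        # Subtract the smallest squared singular value from all of them,
        # which zeroes out the last row again.
        shrunk = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
        B = shrunk[:, None] * Vt
    return B

def covariance_error(A, B):
    """Spectral norm of A^T A - B^T B."""
    return np.linalg.norm(A.T @ A - B.T @ B, ord=2)

# Synthetic input (assumed for the demo): n = 500 rows, d = 20 columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
ell = 10

B = frequent_directions(A, ell)
err = covariance_error(A, B)
# Known guarantee for this variant: error <= ||A||_F^2 / ell.
bound = np.linalg.norm(A, "fro") ** 2 / ell
print("covariance error:", err, "bound:", bound)
```

Here `B` has 10 rows instead of 500, and the measured covariance error stays within the Frequent Directions bound of the squared Frobenius norm of `A` divided by `ell`.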