9 research outputs found
Deterministic algorithms for skewed matrix products
Recently, Pagh presented a randomized approximation algorithm for the
multiplication of real-valued matrices building upon work for detecting the
most frequent items in data streams. We continue this line of research and
present new deterministic matrix multiplication algorithms.
Motivated by applications in data mining, we first consider the case of
real-valued, nonnegative -by- input matrices and , and show how to
obtain a deterministic approximation of the weights of individual entries, as
well as the entrywise -norm, of the product . The algorithm is simple,
space efficient and runs in one pass over the input matrices. For a user
defined the algorithm runs in time and space and returns an approximation of the
entries of within an additive factor of , where is the entrywise 1-norm of a matrix and
is the time required to sort real numbers in linear space.
Building upon a result by Berinde et al. we show that for skewed matrix
products (a common situation in many real-life applications) the algorithm is
more efficient and achieves better approximation guarantees than previously
known randomized algorithms.
When the input matrices are not restricted to nonnegative entries, we present
a new deterministic group testing algorithm detecting nonzero entries in the
matrix product with large absolute value. The algorithm is clearly outperformed
by randomized matrix multiplication algorithms, but as a byproduct we obtain
the first -time deterministic algorithm for matrix
products with nonzero entries.
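The frequent-items idea the abstract builds on can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes nonnegative integer matrices given as lists of lists, enumerates every term of every outer product naively (cubic time, so it does not reflect the paper's running time or space bounds), and feeds the terms into a weighted Misra-Gries summary. The function name `misra_gries_product` and the parameter `b` (number of counters, at least 1) are hypothetical. The standard Misra-Gries guarantee applies: each reported entry underestimates the true entry by at most the total weight of the product divided by `b + 1`.

```python
def misra_gries_product(A, B, b):
    """Approximate the entries of A @ B using at most b counters (b >= 1).

    A and B are nonnegative matrices given as lists of rows.
    Returns a dict mapping (i, j) to an underestimate of (A @ B)[i][j].
    """
    counters = {}
    for k in range(len(B)):
        # The product is the sum over k of (column k of A) x (row k of B).
        for i, row_a in enumerate(A):
            a = row_a[k]
            if a == 0:
                continue
            for j, bkj in enumerate(B[k]):
                w = a * bkj
                if w == 0:
                    continue
                key = (i, j)
                if key in counters:
                    counters[key] += w
                elif len(counters) < b:
                    counters[key] = w
                else:
                    # Summary is full: subtract the same amount from every
                    # counter and from the incoming weight.
                    m = min(w, min(counters.values()))
                    counters = {c: v - m for c, v in counters.items() if v > m}
                    if w > m:
                        counters[key] = w - m
    return counters
```

Entries missing from the returned dictionary are estimated as zero. Larger values of `b` tighten the additive error at the cost of more counters.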
Simple Set Sketching
Imagine handling collisions in a hash table by storing, in each cell, the
bit-wise exclusive-or of the set of keys hashing there. This appears to be a
terrible idea: For keys and buckets, where is constant,
we expect that a constant fraction of the keys will be unrecoverable due to
collisions.
We show that if this collision resolution strategy is repeated three times
independently the situation reverses: If is below a threshold of
then we can recover the set of all inserted keys in linear time
with high probability.
Even though the description of our data structure is simple, its analysis is
nontrivial. Our approach can be seen as a variant of the Invertible Bloom
Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum
per bucket to decide whether the bucket stores a single key, we exploit the
idea of quotienting, namely that some bits of the key are implicit in the
location where it is stored. We let those serve as an implicit checksum. These
bits are not quite enough to ensure that no errors occur and the main technical
challenge is to show that decoding can recover from these errors.
Comment: To be published at SIAM Symposium on Simplicity in Algorithms (SOSA23)
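The collision strategy described in this abstract can be sketched roughly as below. This is a hedged illustration, not the authors' data structure: the class name `XorSetSketch`, the salted BLAKE2b hashing, and the step cap are all assumptions. In particular, the paper derives its implicit checksum from quotienting, whereas this sketch simply rehashes a cell's content and checks that it maps back to the same cell. Keys are assumed to be nonzero integers below 2^64. As in the abstract, the check can produce false positives, so recovered keys are toggled rather than appended, which lets a later step undo a wrong guess.

```python
import hashlib


class XorSetSketch:
    """Three hash tables whose cells hold the XOR of the keys hashing there."""

    def __init__(self, buckets_per_table, tables=3):
        self.m = buckets_per_table
        self.cells = [[0] * self.m for _ in range(tables)]

    def _bucket(self, table, key):
        # One independent-looking hash per table, obtained by salting.
        digest = hashlib.blake2b(
            key.to_bytes(8, "little"), digest_size=8, salt=bytes([table])
        ).digest()
        return int.from_bytes(digest, "little") % self.m

    def toggle(self, key):
        """Insert the key, or delete it if it is already present."""
        for t in range(len(self.cells)):
            self.cells[t][self._bucket(t, key)] ^= key

    def decode(self):
        """Return the recovered key set, or None if peeling gets stuck."""
        cells = [row[:] for row in self.cells]
        recovered = set()
        steps_left = 10 * len(cells) * self.m  # guard against cycling
        progress = True
        while progress and steps_left > 0:
            progress = False
            for t, row in enumerate(cells):
                for j in range(self.m):
                    value = row[j]
                    # Implicit check: a lone key must hash to its own cell.
                    if value != 0 and self._bucket(t, value) == j:
                        for u in range(len(cells)):
                            cells[u][self._bucket(u, value)] ^= value
                        recovered ^= {value}
                        progress = True
                        steps_left -= 1
        if any(v for row in cells for v in row):
            return None
        return recovered
```

A caller would construct the sketch with enough buckets for the expected number of keys, call `toggle` for each key, and call `decode` once at the end. Decoding succeeds only with high probability, so a `None` result has to be handled.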
Improved Algorithms for White-Box Adversarial Streams
We study streaming algorithms in the white-box adversarial stream model,
where the internal state of the streaming algorithm is revealed to an adversary
who adaptively generates the stream updates, but the algorithm obtains fresh
randomness unknown to the adversary at each time step. We incorporate
cryptographic assumptions to construct robust algorithms against such
adversaries. We propose efficient algorithms for sparse recovery of vectors,
low rank recovery of matrices and tensors, as well as low rank plus sparse
recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our
algorithms can report when the input is not sparse or low rank even in the
presence of such an adversary. We use these recovery algorithms to improve upon
and solve new problems in numerical linear algebra and combinatorial
optimization on white-box adversarial streams. For example, we give the first
efficient algorithm for outputting a matching in a graph with insertions and
deletions to its edges provided the matching size is small, and otherwise we
declare the matching size is large. We also improve the approximation versus
memory tradeoff of previous work for estimating the number of non-zero elements
in a vector and computing the matrix rank.
Comment: ICML 202
Deterministic K-set structure
A k-set structure over data streams is a bounded-space data structure that supports stream insertion and deletion operations and returns the set of (item, frequency) pairs in the stream, provided that the number of distinct items in the stream does not exceed k; otherwise it returns nil. This is a fundamental problem with applications in data streaming [24], data reconciliation in distributed systems [22], and mobile computing [28]. In this paper, we study the problem of obtaining deterministic algorithms for the k-set problem.
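To make the interface concrete, here is a minimal sketch of the special case k = 1. It is not the construction studied in the paper. It assumes integer items and a stream in which no item's net frequency ever becomes negative, and it keeps only three integer counters. The class name `OneSetStructure` and its method names are hypothetical. The test for "at most one distinct item" follows from the Cauchy-Schwarz inequality: with nonnegative frequencies, `s0 * s2 == s1 * s1` holds exactly when all of the weight sits on a single item.

```python
class OneSetStructure:
    """Deterministic 1-set structure for streams with nonnegative net frequencies."""

    def __init__(self):
        self.s0 = 0  # sum of frequencies
        self.s1 = 0  # sum of item * frequency
        self.s2 = 0  # sum of item^2 * frequency

    def update(self, item, delta):
        """Insert (delta > 0) or delete (delta < 0) copies of an integer item."""
        self.s0 += delta
        self.s1 += item * delta
        self.s2 += item * item * delta

    def query(self):
        """Return the set of (item, frequency) pairs, or None if more than one item."""
        if self.s0 == 0:
            return set()  # empty stream
        if self.s0 * self.s2 != self.s1 * self.s1:
            return None  # more than one distinct item is present
        return {(self.s1 // self.s0, self.s0)}
```

Returning `None` plays the role of nil in the definition above. Handling general k deterministically requires more machinery than this three-counter test.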