9 research outputs found

    Deterministic algorithms for skewed matrix products

    Recently, Pagh presented a randomized approximation algorithm for the multiplication of real-valued matrices, building upon work for detecting the most frequent items in data streams. We continue this line of research and present new deterministic matrix multiplication algorithms. Motivated by applications in data mining, we first consider the case of real-valued, nonnegative n-by-n input matrices A and B, and show how to obtain a deterministic approximation of the weights of individual entries, as well as the entrywise p-norm, of the product AB. The algorithm is simple, space efficient and runs in one pass over the input matrices. For a user-defined b ∈ (0, n²) the algorithm runs in time O(nb + n·Sort(n)) and space O(n + b) and returns an approximation of the entries of AB within an additive factor of ‖AB‖_{E1}/b, where ‖C‖_{E1} = Σ_{i,j} |C_{ij}| is the entrywise 1-norm of a matrix C and Sort(n) is the time required to sort n real numbers in linear space. Building upon a result by Berinde et al. we show that for skewed matrix products (a common situation in many real-life applications) the algorithm is more efficient and achieves better approximation guarantees than previously known randomized algorithms. When the input matrices are not restricted to nonnegative entries, we present a new deterministic group testing algorithm that detects nonzero entries in the matrix product with large absolute value. The algorithm is clearly outperformed by randomized matrix multiplication algorithms, but as a byproduct we obtain the first O(n^{2+ε})-time deterministic algorithm for matrix products with O(√n) nonzero entries.
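    The nonnegative case can be illustrated with a frequent-items-style pass of the kind the abstract alludes to. The following is a toy Python sketch (an assumption-laden simplification, not the paper's algorithm): it streams the outer-product contributions A[i,k]·B[k,j] through a weighted Misra-Gries summary with b counters, so every reported entry undershoots the corresponding entry of AB by at most ‖AB‖_{E1}/(b+1).

```python
import numpy as np

def approx_product_entries(A, B, b):
    """Stream the contributions A[i,k] * B[k,j] of each outer product
    through a weighted Misra-Gries summary holding at most b counters.
    For nonnegative A, B every estimate underestimates the true entry
    of A @ B by at most ||AB||_{E1} / (b + 1)."""
    counters = {}
    rows, cols = A.shape[0], B.shape[1]
    for k in range(A.shape[1]):
        for i in range(rows):
            if A[i, k] == 0:
                continue
            for j in range(cols):
                w = A[i, k] * B[k, j]
                if w == 0:
                    continue
                counters[(i, j)] = counters.get((i, j), 0.0) + w
                if len(counters) > b:
                    # Evict: subtract the smallest counter from all of
                    # them and drop the ones that reach zero.
                    m = min(counters.values())
                    counters = {key: v - m
                                for key, v in counters.items() if v > m}
    return counters
```

    If b is at least the number of nonzero entries of AB, no eviction ever fires and the estimates are exact; smaller b trades space for additive error, which is the regime where skew in the product helps.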

    Simple Set Sketching

    Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: for αn keys and n buckets, where α is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions. We show that if this collision resolution strategy is repeated three times independently the situation reverses: if α is below a threshold of ≈ 0.81, then we can recover the set of all inserted keys in linear time with high probability. Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of quotienting, namely that some bits of the key are implicit in the location where it is stored. We let those serve as an implicit checksum. These bits are not quite enough to ensure that no errors occur, and the main technical challenge is to show that decoding can recover from these errors. Comment: To be published at SIAM Symposium on Simplicity in Algorithms (SOSA 2023).
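    A rough illustration of the idea (a simplification, not the paper's construction): three tables where each nonzero integer key is XOR-ed into one cell per table, and decoding repeatedly peels any cell whose stored value hashes back to that very cell, which plays the role of the implicit checksum. The hash functions and sizes below are invented for the sketch; a cell holding the XOR of several keys can spuriously pass the check, which mirrors the decoding errors the paper has to handle.

```python
def _hashes(x, n, seeds=(0x9e3779b9, 0xc2b2ae3d, 0x16566719)):
    # Three hash functions mapping a key to one cell in each table.
    return [hash((s, x)) % n for s in seeds]

class XorSketch:
    """Toy three-table XOR sketch: insertion and deletion are both a
    XOR into one cell per table; decoding peels cells whose content
    hashes back to the cell's own index."""

    def __init__(self, n):
        self.n = n
        self.tables = [[0] * n for _ in range(3)]

    def toggle(self, x):
        # Same operation inserts or deletes a (nonzero) key.
        for t, h in enumerate(_hashes(x, self.n)):
            self.tables[t][h] ^= x

    def decode(self):
        recovered = set()
        progress = True
        while progress:
            progress = False
            for t in range(3):
                for c in range(self.n):
                    x = self.tables[t][c]
                    if x and _hashes(x, self.n)[t] == c:
                        # Looks like a lone key: peel it everywhere.
                        self.toggle(x)
                        recovered.add(x)
                        progress = True
        return recovered
```

    Inserting a key and later deleting it cancels out, so the sketch also supports set reconciliation in the IBF style.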

    Improved Algorithms for White-Box Adversarial Streams

    We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank. Comment: ICML 2023.

    Deterministic K-set structure

    A k-set structure over data streams is a bounded-space data structure that supports stream insertion and deletion operations and returns the set of (item, frequency) pairs in the stream, provided the number of distinct items in the stream does not exceed k, and returns nil otherwise. This is a fundamental problem with applications in data streaming [24], data reconciliation in distributed systems [22], mobile computing [28], etc. In this paper, we study the problem of obtaining deterministic algorithms for the k-set problem.
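    Purely to pin down the interface described above (and not the bounded-space construction, which is the paper's actual contribution), here is a hypothetical unbounded-space reference model in Python; every name in it is invented for illustration.

```python
class KSetReference:
    """Reference model of the k-set contract: after any sequence of
    insertions and deletions, report the set of (item, frequency)
    pairs if at most k distinct items remain, and nil (None)
    otherwise.  Unlike a real k-set structure, this model's space
    grows with the number of distinct items ever seen."""

    def __init__(self, k):
        self.k = k
        self.freq = {}

    def insert(self, item):
        self.freq[item] = self.freq.get(item, 0) + 1

    def delete(self, item):
        c = self.freq.get(item, 0) - 1
        if c == 0:
            self.freq.pop(item, None)
        else:
            self.freq[item] = c

    def report(self):
        live = {x: c for x, c in self.freq.items() if c != 0}
        return set(live.items()) if len(live) <= self.k else None
```

    The deterministic algorithms studied in the paper meet the same contract while keeping space bounded in terms of k rather than the stream's support size.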