7 research outputs found

    Revisiting the Direct Sum Theorem and Space Lower Bounds in Random Order Streams

    Estimating frequency moments and $L_p$ distances are well studied problems in the adversarial data stream model and tight space bounds are known for these two problems. There has been growing interest in revisiting these problems in the framework of random-order streams. The best space lower bound known for computing the $k^{th}$ frequency moment in random-order streams is $\Omega(n^{1-2.5/k})$ by Andoni et al., and it is conjectured that the real lower bound shall be $\Omega(n^{1-2/k})$. In this paper, we resolve this conjecture. In our approach, we revisit the direct sum theorem developed by Bar-Yossef et al. in a random-partition private messages model and provide a tight $\Omega(n^{1-2/k}/\ell)$ space lower bound for any $\ell$-pass algorithm that approximates the frequency moment in random-order stream model to a constant factor. Finally, we also introduce the notion of space-entropy tradeoffs in random order streams, as a means of studying intermediate models between adversarial and fully random order streams. We show an almost tight space-entropy tradeoff for $L_\infty$ distance and a non-trivial tradeoff for $L_p$ distances.

    Space-Efficient Estimation of Statistics Over Sub-Sampled Streams

    In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to subsample the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, in many cases, the estimation of aggregates on the original stream cannot be accomplished through simply estimating them on the sampled stream, followed by a normalization. We present algorithms for estimating frequency moments, support size, entropy, and heavy hitters of the original stream, through a single pass over the sampled stream.
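The claim that normalizing a sampled-stream estimate is not enough can be illustrated with the second frequency moment. The sketch below is not the paper's algorithm: it assumes Bernoulli sampling (each element kept independently with probability `q`, a model the abstract does not specify), and the function names are hypothetical. Under that assumption a sampled frequency `g` is Binomial(`f`, `q`), so `E[g^2] = f*q*(1-q) + (f*q)^2`, and dividing the sampled F2 by `q^2` overshoots the true value by `F1 * (1-q)/q`.

```python
from fractions import Fraction


def expected_naive_f2(freqs, q):
    """Exact expectation of F2(sample) / q^2 under the assumed Bernoulli sampling."""
    # For g ~ Binomial(f, q): E[g^2] = f*q*(1-q) + (f*q)^2.
    return sum(f * q * (1 - q) + (f * q) ** 2 for f in freqs) / q ** 2


def expected_corrected_f2(freqs, q):
    """Exact expectation of sum(g^2 - (1-q)*g) / q^2, which removes the bias."""
    # E[g^2 - (1-q)*g] = f*q*(1-q) + (f*q)^2 - (1-q)*f*q = (f*q)^2.
    return sum(
        f * q * (1 - q) + (f * q) ** 2 - (1 - q) * f * q for f in freqs
    ) / q ** 2


freqs = [3, 2, 1]            # illustrative frequencies in the original stream
q = Fraction(1, 2)           # assumed sampling rate
true_f2 = sum(f * f for f in freqs)

print(true_f2)                          # 14
print(expected_naive_f2(freqs, q))      # 20
print(expected_corrected_f2(freqs, q))  # 14
```

Exact rational arithmetic is used so the bias shows up without simulation noise; the corrected estimator here is only one simple way to debias F2 and says nothing about space usage.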

    Communication Steps for Parallel Query Processing

    We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls replication. We examine how many global communication steps are needed to compute $q$. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires $\varepsilon \geq 1-1/\tau^*$, where $\tau^*$ is the fractional vertex cover of the hypergraph of $q$. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent $\varepsilon$. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in $O(1)$ rounds of communication.

    Tight bounds for distributed functional monitoring

    We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there are $k$ sites each tracking their input streams and communicating with a central coordinator. The coordinator's task is to continuously maintain an approximate output to a function computed over the union of the $k$ streams. The goal is to minimize the number of bits communicated. Let the $p$-th frequency moment be defined as $F_p = \sum_i f_i^p$, where $f_i$ is the frequency of element $i$. We show the randomized communication complexity of estimating the number of distinct elements (that is, $F_0$) up to a $1 + \varepsilon$ factor is $\Omega(k/\varepsilon^2)$, improving upon the previous $\Omega(k + 1/\varepsilon^2)$ bound and matching known upper bounds. For $F_p$, $p > 1$, we improve the previous $\Omega(k + 1/\varepsilon^2)$ communication bound to $\Omega(k^{p-1}/\varepsilon^2)$. We obtain similar improvements for heavy hitters, empirical entropy, and other problems.
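As a concrete reading of the frequency-moment definition in the abstract above, the following minimal sketch computes the p-th moment exactly over the union of several site streams. It is a brute-force illustration of the quantity being approximated, not a monitoring protocol from the paper; the function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import chain


def frequency_moment(stream, p):
    """Exact p-th frequency moment: the sum over elements of frequency**p."""
    counts = Counter(stream)
    if p == 0:
        # The zeroth moment counts the distinct elements.
        return len(counts)
    return sum(f ** p for f in counts.values())


# Three hypothetical sites; the coordinator's target is defined on their union.
sites = [["a", "b", "a"], ["c", "a"], ["b"]]
union = list(chain.from_iterable(sites))

print(frequency_moment(union, 0))  # 3 distinct elements
print(frequency_moment(union, 1))  # 6 elements in total
print(frequency_moment(union, 2))  # 3**2 + 2**2 + 1**2 = 14
```

Computing the value this way requires seeing every element in one place, which is exactly the communication that the monitoring model tries to avoid.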