    Tight Lower Bound for Linear Sketches of Moments

    The problem of estimating frequency moments of a data stream has attracted a lot of attention since the onset of streaming algorithms [AMS99]. While the space complexity for approximately computing the $p^{\rm th}$ moment, for $p\in(0,2]$, has been settled [KNW10], for $p>2$ the exact complexity remains open. For $p>2$ the current best algorithm uses $O(n^{1-2/p}\log n)$ words of space [AKO11,BO10], whereas the lower bound is $\Omega(n^{1-2/p})$ [BJKS04]. In this paper, we show a tight lower bound of $\Omega(n^{1-2/p}\log n)$ words for the class of algorithms based on linear sketches, which store only a sketch $Ax$ of the input vector $x$ for some (possibly randomized) matrix $A$. We note that all known algorithms for this problem are linear sketches. Comment: In Proceedings of the 40th International Colloquium on Automata, Languages and Programming (ICALP), Riga, Latvia, July 201
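
    As a rough illustration of what "linear sketch" means in this abstract, the following Python sketch keeps only $Ax$ for a random sign matrix $A$ and handles each stream update by adding a scaled column of $A$. The sign matrix, the function names, and the AMS-style estimator for the second moment are illustrative assumptions; they are not the constructions analyzed in the paper, which concerns $p>2$.

```python
import random

def make_sign_matrix(rows, n, seed=0):
    """Hypothetical helper: a rows-by-n matrix with independent +/-1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]

def sketch(A, x):
    """Compute the linear sketch Ax of the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def apply_update(A, s, i, delta):
    """Process the stream update x[i] += delta using only the sketch s."""
    for r, row in enumerate(A):
        s[r] += row[i] * delta

def estimate_f2(s):
    """AMS-style estimate of sum_i x[i]^2: the mean of the squared sketch entries."""
    return sum(v * v for v in s) / len(s)

if __name__ == "__main__":
    A = make_sign_matrix(rows=8, n=5, seed=1)
    s = [0] * 8
    for i, delta in [(0, 1), (2, 2), (4, -3), (2, -1)]:
        apply_update(A, s, i, delta)
    # The final vector is [1, 0, 1, 0, -3]; the sketch never stored it explicitly.
    print(s == sketch(A, [1, 0, 1, 0, -3]))
```

    Because the map from $x$ to $Ax$ is linear, the sketch of a sum of two vectors is the sum of their sketches, which is why positive and negative updates can be absorbed one at a time.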

    Approximate F_2-Sketching of Valuation Functions

    We study the problem of constructing a linear sketch of minimum dimension that allows approximation of a given real-valued function f : F_2^n -> R with small expected squared error. We develop a general theory of linear sketching for such functions through which we analyze their dimension for most commonly studied types of valuation functions: additive, budget-additive, coverage, alpha-Lipschitz submodular and matroid rank functions. This gives a characterization of how many bits of information have to be stored about the input x so that one can compute f under additive updates to its coordinates. Our results are tight in most cases and we also give extensions to the distributional version of the problem where the input x in F_2^n is generated uniformly at random. Using known connections with dynamic streaming algorithms, both upper and lower bounds on dimension obtained in our work extend to the space complexity of algorithms evaluating f(x) under long sequences of additive updates to the input x presented as a stream. Similar results hold for simultaneous communication in a distributed setting.
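
    To make the setting concrete, here is a minimal Python sketch of a linear sketch over F_2: each stored bit is the parity of a subset of the coordinates of x, and an additive update (flipping one coordinate) flips exactly the stored bits whose subsets contain that coordinate. The random choice of subsets and all function names are assumptions for illustration; the sketch does not show how the paper selects parities for a particular valuation function.

```python
import random

def parity(v):
    """Parity (sum mod 2) of the set bits of the integer v."""
    return bin(v).count("1") % 2

def make_rows(k, n, seed=0):
    """Hypothetical helper: k random subsets of the n coordinates, stored as bitmasks."""
    rng = random.Random(seed)
    return [rng.getrandbits(n) for _ in range(k)]

def sketch(rows, x):
    """k-bit linear sketch of x over F_2: the inner product <a, x> mod 2 for each row a."""
    return [parity(a & x) for a in rows]

def flip(rows, s, i):
    """Update the sketch s after coordinate i of x is flipped (x += e_i over F_2)."""
    return [bit ^ ((a >> i) & 1) for a, bit in zip(rows, s)]

if __name__ == "__main__":
    rows = make_rows(k=4, n=8, seed=3)
    x = 0b10110010
    s = sketch(rows, x)
    # Flipping coordinate 5 through the sketch matches re-sketching the new input.
    print(flip(rows, s, 5) == sketch(rows, x ^ (1 << 5)))
```

    The dimension discussed in the abstract corresponds to the number of rows k, that is, the number of bits retained about x.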

    On Estimating the First Frequency Moment of Data Streams

    Estimating the first moment of a data stream, defined as $F_1 = \sum_{i \in \{1, 2, \ldots, n\}} |f_i|$, to within $1 \pm \epsilon$ relative error with high probability is a basic and influential problem in data stream processing. A tight space bound of $O(\epsilon^{-2} \log (mM))$ is known from the work of [Kane-Nelson-Woodruff-SODA10]. However, all known algorithms for this problem require per-update stream processing time of $\Omega(\epsilon^{-2})$, with the only exception being the algorithm of [Ganguly-Cormode-RANDOM07], which requires per-update processing time of $O(\log^2(mM)(\log n))$, albeit with sub-optimal space $O(\epsilon^{-3}\log^2(mM))$. In this paper, we present an algorithm for estimating $F_1$ that achieves near-optimality in both space and update processing time. The space requirement is $O(\epsilon^{-2}(\log n + (\log \epsilon^{-1})\log(mM)))$ and the per-update processing time is $O((\log n)\log (\epsilon^{-1}))$. Comment: 12 page
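
    For orientation, the following Python sketch computes $F_1$ exactly and also estimates it from a stream of updates using the classic Cauchy-projection (1-stable) median estimator due to Indyk. It is not the algorithm of this paper: it stores the random matrix explicitly and touches every counter on each update, which is the kind of per-update cost the abstract aims to avoid. All names and parameters are illustrative assumptions.

```python
import math
import random
import statistics

def cauchy_matrix(rows, n, seed=0):
    """Hypothetical helper: rows-by-n matrix of independent standard Cauchy entries."""
    rng = random.Random(seed)
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
            for _ in range(rows)]

def exact_f1(f):
    """F_1 = sum_i |f_i| for an explicitly stored frequency vector f."""
    return sum(abs(v) for v in f)

def estimate_f1(C, updates):
    """Estimate F_1 from (index, delta) updates using one counter per row of C."""
    s = [0.0] * len(C)
    for i, delta in updates:
        for r, row in enumerate(C):
            s[r] += row[i] * delta  # every counter is touched on every update
    # Each counter is distributed as F_1 times a standard Cauchy variable,
    # and the median of the absolute value of a standard Cauchy is 1.
    return statistics.median(abs(v) for v in s)

if __name__ == "__main__":
    updates = [(0, 3), (2, -5), (0, -1)]  # final vector f = [2, 0, -5, 0]
    C = cauchy_matrix(rows=401, n=4, seed=7)
    print(exact_f1([2, 0, -5, 0]), estimate_f1(C, updates))
```

    The number of rows controls the accuracy of the median estimate; the value 401 above is arbitrary and chosen only so the example runs quickly.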

    Towards Optimal Moment Estimation in Streaming and Distributed Models

    One of the oldest problems in the data stream model is to approximate the p-th moment ||X||_p^p = sum_{i=1}^n X_i^p of an underlying non-negative vector X in R^n, which is presented as a sequence of poly(n) updates to its coordinates. Of particular interest is when p in (0,2]. Although a tight space bound of Theta(epsilon^-2 log n) bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity of this problem when all updates are positive. Specifically, the upper bound is O(epsilon^-2 log n) bits, while the lower bound is only Omega(epsilon^-2 + log n) bits. Recently, an upper bound of O~(epsilon^-2 + log n) bits was obtained under the assumption that the updates arrive in a random order. We show that for p in (0, 1], the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of O~(epsilon^-2 + log n) bits for estimating ||X||_p^p. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for p in (1,2], in the natural coordinator and blackboard distributed communication topologies, there is an O~(epsilon^-2) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies G, obtaining an O~(epsilon^-2 log d) max-communication upper bound, where d is the diameter of G. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an Omega(epsilon^-2 log n) bit lower bound for p in (1,2] for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.