10,657 research outputs found
Tight Lower Bound for Linear Sketches of Moments
The problem of estimating frequency moments of a data stream has attracted a
lot of attention since the onset of streaming algorithms [AMS99]. While the
space complexity for approximately computing the moment, for
has been settled [KNW10], for the exact complexity remains
open. For the current best algorithm uses words of
space [AKO11,BO10], whereas the lower bound is of [BJKS04].
In this paper, we show a tight lower bound of words
for the class of algorithms based on linear sketches, which store only a sketch
of input vector and some (possibly randomized) matrix . We note
that all known algorithms for this problem are linear sketches.Comment: In Proceedings of the 40th International Colloquium on Automata,
Languages and Programming (ICALP), Riga, Latvia, July 201
Approximate F_2-Sketching of Valuation Functions
We study the problem of constructing a linear sketch of minimum dimension that allows approximation of a given real-valued function f : F_2^n - > R with small expected squared error. We develop a general theory of linear sketching for such functions through which we analyze their dimension for most commonly studied types of valuation functions: additive, budget-additive, coverage, alpha-Lipschitz submodular and matroid rank functions. This gives a characterization of how many bits of information have to be stored about the input x so that one can compute f under additive updates to its coordinates.
Our results are tight in most cases and we also give extensions to the distributional version of the problem where the input x in F_2^n is generated uniformly at random. Using known connections with dynamic streaming algorithms, both upper and lower bounds on dimension obtained in our work extend to the space complexity of algorithms evaluating f(x) under long sequences of additive updates to the input x presented as a stream. Similar results hold for simultaneous communication in a distributed setting
On Estimating the First Frequency Moment of Data Streams
Estimating the first moment of a data stream defined as F_1 = \sum_{i \in
\{1, 2, \ldots, n\}} \abs{f_i} to within -relative error with
high probability is a basic and influential problem in data stream processing.
A tight space bound of is known from the work of
[Kane-Nelson-Woodruff-SODA10]. However, all known algorithms for this problem
require per-update stream processing time of , with the
only exception being the algorithm of [Ganguly-Cormode-RANDOM07] that requires
per-update processing time of albeit with sub-optimal
space . In this paper, we present an algorithm for
estimating that achieves near-optimality in both space and update
processing time. The space requirement is and the per-update processing time is .Comment: 12 page
Towards Optimal Moment Estimation in Streaming and Distributed Models
One of the oldest problems in the data stream model is to approximate the p-th moment ||X||_p^p = sum_{i=1}^n X_i^p of an underlying non-negative vector X in R^n, which is presented as a sequence of poly(n) updates to its coordinates. Of particular interest is when p in (0,2]. Although a tight space bound of Theta(epsilon^-2 log n) bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity of this problem when all updates are positive. Specifically, the upper bound is O(epsilon^-2 log n) bits, while the lower bound is only Omega(epsilon^-2 + log n) bits. Recently, an upper bound of O~(epsilon^-2 + log n) bits was obtained under the assumption that the updates arrive in a random order.
We show that for p in (0, 1], the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of O~(epsilon^-2 + log n) bits for estimating |X |_p^p. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for p in (1,2], in the natural coordinator and blackboard distributed communication topologies, there is an O~(epsilon^-2) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies G, obtaining an O~(epsilon^2 log d) max-communication upper bound, where d is the diameter of G. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an Omega(epsilon^-2 log n) bit lower bound for p in (1,2] for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter
- …