20 research outputs found
Recursive Sketching For Frequency Moments
In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to
compute (for ) in space complexity O(\mbox{\em poly-log}(n,m)\cdot
n^{1-\frac2k}), which is optimal up to (large) poly-logarithmic factors in
and , where is the length of the stream and is the upper bound on
the number of distinct elements in a stream. The best known lower bound for
large moments is . A follow-up work of
Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic
factors of Indyk and Woodruff to . Further reduction of poly-log factors has been an elusive
goal since 2006, when Indyk and Woodruff method seemed to hit a natural
"barrier." Using our simple recursive sketch, we provide a different yet simple
approach to obtain a algorithm for constant (our bound is, in fact, somewhat
stronger, where the term can be replaced by any constant number
of iterations instead of just two or three, thus approaching .
Our bound also works for non-constant (for details see the body of
the paper). Further, our algorithm requires only -wise independence, in
contrast to existing methods that use pseudo-random generators for computing
large frequency moments
Differentially Private Fractional Frequency Moments Estimation with Polylogarithmic Space
We prove that Fp sketch, a well-celebrated streaming algorithm for frequency moments estimation, is differentially private as is when p ∈ (0, 1]. Fp sketch uses only polylogarithmic space, exponentially better than existing DP baselines and only worse than the optimal non-private baseline by a logarithmic factor. The evaluation shows that Fp sketch can achieve reasonable accuracy with differential privacy guarantee. The evaluation code is included in the supplementary material
Continuous Monitoring of l_p Norms in Data Streams
In insertion-only streaming, one sees a sequence of indices a_1, a_2, ..., a_m in [n]. The stream defines a sequence of m frequency vectors x(1), ..., x(m) each in R^n, where x(t) is the frequency vector of items after seeing the first t indices in the stream. Much work in the streaming literature focuses on estimating some function f(x(m)). Many applications though require obtaining estimates at time t of f(x(t)), for every t in [m]. Naively this guarantee is obtained by devising an algorithm with failure probability less than 1/m, then performing a union bound over all stream updates to guarantee that all m estimates are simultaneously accurate with good probability. When f(x) is some l_p norm of x, recent works have shown that this union bound is wasteful and better space complexity is possible for the continuous monitoring problem, with the strongest known results being for p=2. In this work, we improve the state of the art for all 0<p<2, which we obtain via a novel analysis of Indyk\u27s p-stable sketch
An Optimal Algorithm for Large Frequency Moments Using O(n^(1-2/k)) Bits
In this paper, we provide the first optimal algorithm for the remaining open question from the seminal paper of Alon, Matias, and Szegedy: approximating large frequency moments. We give an upper bound on the space required to find a k-th frequency moment of O(n^(1-2/k)) bits that matches, up to a constant factor, the lower bound of Woodruff et. al for constant epsilon and constant k.
Our algorithm makes a single pass over the stream and works for any constant k > 3. It is based upon two major technical accomplishments: first, we provide an optimal algorithm for finding the heavy elements in a stream; and second, we provide a technique using Martingale Sketches which gives an optimal reduction of the large frequency moment problem to the all heavy elements problem. We also provide a polylogarithmic improvement for frequency moments, frequency based functions, spatial data streams, and measuring independence of data sets
Tight Lower Bound for Linear Sketches of Moments
The problem of estimating frequency moments of a data stream has attracted a
lot of attention since the onset of streaming algorithms [AMS99]. While the
space complexity for approximately computing the moment, for
has been settled [KNW10], for the exact complexity remains
open. For the current best algorithm uses words of
space [AKO11,BO10], whereas the lower bound is of [BJKS04].
In this paper, we show a tight lower bound of words
for the class of algorithms based on linear sketches, which store only a sketch
of input vector and some (possibly randomized) matrix . We note
that all known algorithms for this problem are linear sketches.Comment: In Proceedings of the 40th International Colloquium on Automata,
Languages and Programming (ICALP), Riga, Latvia, July 201
Recommended from our members
Fast Moment Estimation in Data Streams in Optimal Space
We give a space-optimal streaming algorithm with update time for approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream up to a factor of . This provides a nearly exponential improvement over the previous space optimal algorithm of [Kane-Nelson-Woodruff, SODA 2010], which had update time . When combined with the work of [Harvey-Nelson-Onak, FOCS 2008], we also obtain the first algorithm for entropy estimation in turnstile streams which simultaneously achieves near-optimal space and fast update time.Engineering and Applied Science