    Recursive Sketching For Frequency Moments

    In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute $F_k$ (for $k>2$) in space complexity $O(\mathrm{polylog}(n,m)\cdot n^{1-\frac{2}{k}})$, which is optimal up to (large) poly-logarithmic factors in $n$ and $m$, where $m$ is the length of the stream and $n$ is the upper bound on the number of distinct elements in the stream. The best known lower bound for large moments is $\Omega(\log(n)\, n^{1-\frac{2}{k}})$. A follow-up work of Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic factors of Indyk and Woodruff to $O(\log^2(m)\cdot(\log n+\log m)\cdot n^{1-\frac{2}{k}})$. Further reduction of the poly-log factors has been an elusive goal since 2006, when the Indyk and Woodruff method seemed to hit a natural "barrier." Using our simple recursive sketch, we provide a different yet simple approach to obtain an $O(\log(m)\log(nm)\cdot(\log\log n)^4\cdot n^{1-\frac{2}{k}})$ algorithm for constant $\epsilon$. Our bound is, in fact, somewhat stronger: the $(\log\log n)$ term can be replaced by any constant number of $\log$ iterations instead of just two or three, thus approaching $\log^* n$. Our bound also works for non-constant $\epsilon$ (for details see the body of the paper). Further, our algorithm requires only $4$-wise independence, in contrast to existing methods that use pseudo-random generators for computing large frequency moments.
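
    For orientation, the quantity being approximated can be written out directly. The sketch below assumes the standard definition $F_k = \sum_i f_i^k$, where $f_i$ is the number of occurrences of item $i$ in the stream (the abstract does not restate it). It is an exact brute-force baseline that stores one counter per distinct item, not the recursive sketch from the paper; the function name `frequency_moment` is chosen here for illustration only.

```python
from collections import Counter

def frequency_moment(stream, k):
    """Exact F_k: sum over distinct items of (frequency ** k).

    Brute-force baseline that keeps one counter per distinct item,
    so it uses space linear in the number of distinct elements.
    This is NOT the sublinear-space recursive sketch of the paper.
    """
    counts = Counter(stream)
    return sum(f ** k for f in counts.values())

# Hypothetical toy stream: frequencies are a=3, b=2, c=1.
stream = ["a", "b", "a", "c", "a", "b"]
print(frequency_moment(stream, 3))  # 3**3 + 2**3 + 1**3 = 36
```

    A streaming algorithm of the kind described above aims to approximate this value while using far less memory than the per-item counters kept here.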

    Approximate Hamming distance in a stream

    We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(\epsilon^{-4}\log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(\epsilon^{-2}\sqrt{n}\log n)$ bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for $(1+\epsilon)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(\epsilon^{-3}\sqrt{n}\log^{2} n)$ space and $O(\epsilon^{-2}\log n)$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(\epsilon^{-5}\sqrt{n}\log^{4} n)$ space and $O(\epsilon^{-4}\log^3 n)$ time streaming $(1+\epsilon)$-approximate Hamming distance algorithm. Comment: Submitted to ICALP' 201
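
    To make the problem statement concrete, the following is a minimal exact baseline that computes the Hamming distance between the pattern and every length-$n$ window of the text. It assumes the whole text is available in memory and spends time proportional to $n$ per window, so it is neither small-space nor approximate; it only illustrates the values that the streaming algorithms above approximate. The name `sliding_hamming` is hypothetical.

```python
def sliding_hamming(pattern, text):
    """Exact Hamming distance between `pattern` and each window of `text`.

    Returns a list whose i-th entry is the number of mismatching
    positions between pattern and text[i:i + len(pattern)].
    Naive baseline: needs the whole text and O(n) time per window.
    """
    n = len(pattern)
    return [
        sum(p != t for p, t in zip(pattern, text[i:i + n]))
        for i in range(len(text) - n + 1)
    ]

# Hypothetical toy input: the pattern occurs exactly at offsets 0 and 3.
print(sliding_hamming("abba", "abbabba"))  # [0, 3, 3, 0]
```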