701 research outputs found

    Separating k-Player from t-Player One-Way Communication, with Applications to Data Streams

    Get PDF
    In a k-party communication problem, the k players with inputs x_1, x_2, ..., x_k, respectively, want to evaluate a function f(x_1, x_2, ..., x_k) using as little communication as possible. We consider the message-passing model, in which the inputs are partitioned in an arbitrary, possibly worst-case manner, among a smaller number t of players (t<k). The t-player communication cost of computing f can only be smaller than the k-player communication cost, since the t players can trivially simulate the k-player protocol. But how much smaller can it be? We study deterministic and randomized protocols in the one-way model, and provide separations for product input distributions, which are optimal for low error probability protocols. We also provide much stronger separations when the input distribution is non-product. A key application of our results is in proving lower bounds for data stream algorithms. In particular, we give an optimal Omega(epsilon^{-2}log(N) log log(mM)) bits of space lower bound for the fundamental problem of (1 +/-{epsilon})-approximating the number |x |_0 of non-zero entries of an n-dimensional vector x after m updates each of magnitude M, and with success probability >= 2/3, in a strict turnstile stream. Our result matches the best known upper bound when epsilon >= 1/polylog(mM). It also improves on the prior Omega({epsilon}^{-2}log(mM)) lower bound and separates the complexity of approximating L_0 from approximating the p-norm L_p for p bounded away from 0, since the latter has an O(epsilon^{-2}log(mM)) bit upper bound

    Approximate Hamming distance in a stream

    Get PDF
    We consider the problem of computing a (1+ϵ)(1+\epsilon)-approximation of the Hamming distance between a pattern of length nn and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an O(ϵ4log2n)O(\epsilon^{-4} \log^2 n) bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an O(ϵ2nlogn)O(\epsilon^{-2}\sqrt{n}\log n) bit randomised one-way communication protocol. We then go on to develop small space streaming algorithms for (1+ϵ)(1+\epsilon)-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an O(ϵ3nlog2n)O(\epsilon^{-3} \sqrt{n} \log^{2} n) space and O(ϵ2logn)O(\epsilon^{-2} \log{n}) time streaming (1+ϵ)(1+\epsilon)-approximate Hamming distance algorithm. (2) For general input alphabets there is an O(ϵ5nlog4n)O(\epsilon^{-5} \sqrt{n} \log^{4} n) space and O(ϵ4log3n)O(\epsilon^{-4} \log^3 {n}) time streaming (1+ϵ)(1+\epsilon)-approximate Hamming distance algorithm.Comment: Submitted to ICALP' 201

    Recursive Sketching For Frequency Moments

    Full text link
    In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute FkF_k (for k>2k>2) in space complexity O(\mbox{\em poly-log}(n,m)\cdot n^{1-\frac2k}), which is optimal up to (large) poly-logarithmic factors in nn and mm, where mm is the length of the stream and nn is the upper bound on the number of distinct elements in a stream. The best known lower bound for large moments is Ω(log(n)n12k)\Omega(\log(n)n^{1-\frac2k}). A follow-up work of Bhuvanagiri, Ganguly, Kesh and Saha (SODA 2006) reduced the poly-logarithmic factors of Indyk and Woodruff to O(log2(m)(logn+logm)n12k)O(\log^2(m)\cdot (\log n+ \log m)\cdot n^{1-{2\over k}}). Further reduction of poly-log factors has been an elusive goal since 2006, when Indyk and Woodruff method seemed to hit a natural "barrier." Using our simple recursive sketch, we provide a different yet simple approach to obtain a O(log(m)log(nm)(loglogn)4n12k)O(\log(m)\log(nm)\cdot (\log\log n)^4\cdot n^{1-{2\over k}}) algorithm for constant ϵ\epsilon (our bound is, in fact, somewhat stronger, where the (loglogn)(\log\log n) term can be replaced by any constant number of log\log iterations instead of just two or three, thus approaching lognlog^*n. Our bound also works for non-constant ϵ\epsilon (for details see the body of the paper). Further, our algorithm requires only 44-wise independence, in contrast to existing methods that use pseudo-random generators for computing large frequency moments

    Algorithmic Techniques for Processing Data Streams

    Get PDF
    We give a survey at some algorithmic techniques for processing data streams. After covering the basic methods of sampling and sketching, we present more evolved procedures that resort on those basic ones. In particular, we examine algorithmic schemes for similarity mining, the concept of group testing, and techniques for clustering and summarizing data streams
    corecore