10 research outputs found

    An Optimal Lower Bound on the Communication Complexity of Gap-Hamming-Distance

    We prove an optimal $\Omega(n)$ lower bound on the randomized communication complexity of the much-studied Gap-Hamming-Distance problem. As a consequence, we obtain essentially optimal multi-pass space lower bounds in the data stream model for a number of fundamental problems, including the estimation of frequency moments. The Gap-Hamming-Distance problem is a communication problem, wherein Alice and Bob receive $n$-bit strings $x$ and $y$, respectively. They are promised that the Hamming distance between $x$ and $y$ is either at least $n/2+\sqrt{n}$ or at most $n/2-\sqrt{n}$, and their goal is to decide which of these is the case. Since the formal presentation of the problem by Indyk and Woodruff (FOCS, 2003), it had been conjectured that the naive protocol, which uses $n$ bits of communication, is asymptotically optimal. The conjecture was shown to be true in several special cases, e.g., when the communication is deterministic, or when the number of rounds of communication is limited. The proof of our aforementioned result, which settles this conjecture fully, is based on a new geometric statement regarding correlations in Gaussian space, related to a result of C. Borell (1985). To prove this geometric statement, we show that random projections of not-too-small sets in Gaussian space are close to a mixture of translated normal variables.
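
    The promise problem and the naive protocol described in this abstract can be illustrated with a minimal Python sketch. The function names and the string labels "far" and "close" are hypothetical, chosen here for illustration; nothing below reflects the paper's lower-bound proof.

```python
import math


def hamming_distance(x, y):
    """Number of positions where two equal-length bit sequences differ."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


def gap_hamming(x, y):
    """Classify an instance; return None if it violates the promise."""
    n = len(x)
    d = hamming_distance(x, y)
    gap = math.sqrt(n)
    if d >= n / 2 + gap:
        return "far"      # hypothetical label for distance >= n/2 + sqrt(n)
    if d <= n / 2 - gap:
        return "close"    # hypothetical label for distance <= n/2 - sqrt(n)
    return None           # inside the gap: the promise does not hold


def naive_protocol(x, y):
    """Alice sends all n bits of x; Bob decides locally.

    Returns (answer, number_of_bits_communicated).
    """
    message = list(x)     # Alice's message is her entire input
    return gap_hamming(message, y), len(message)
```

    For example, `naive_protocol([0] * 16, [1] * 16)` classifies the pair as "far" while sending 16 bits, which is the $n$-bit cost that the abstract shows to be asymptotically optimal.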

    Stochastic Streams: Sample Complexity vs. Space Complexity

    We address the trade-off between the computational resources needed to process a large data set and the number of samples available from the data set. Specifically, we consider the following abstraction: we receive a potentially infinite stream of IID samples from some unknown distribution D, and are tasked with computing some function f(D). If the stream is observed for time t, how much memory, s, is required to estimate f(D)? We refer to t as the sample complexity and s as the space complexity. The main focus of this paper is investigating the trade-offs between the space and sample complexity. We study these trade-offs for several canonical problems studied in the data stream model: estimating the collision probability, i.e., the second moment of a distribution, deciding if a graph is connected, and approximating the dimension of an unknown subspace. Our results are based on techniques for simulating different classical sampling procedures in this model, emulating random walks given a sequence of IID samples, as well as leveraging a characterization between communication-bounded protocols and statistical query algorithms.
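
    To make the collision-probability quantity concrete, here is a minimal Python sketch of the textbook unbiased estimator computed from t IID samples. It is an assumption-laden illustration rather than the paper's algorithm: it stores a full table of counts, so it ignores the space constraint s that the abstract studies, and the function name is hypothetical.

```python
from collections import Counter


def collision_probability_estimate(samples):
    """Estimate sum_i p_i^2 as the fraction of ordered sample pairs that collide.

    Not space-efficient: keeps one counter per distinct value seen.
    """
    samples = list(samples)
    t = len(samples)
    if t < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    # Each value seen c times contributes c * (c - 1) ordered colliding pairs.
    colliding_pairs = sum(c * (c - 1) for c in counts.values())
    return colliding_pairs / (t * (t - 1))
```

    Each ordered pair of distinct sample positions collides with probability equal to the second moment of D, so the average over all such pairs is unbiased.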

    Communication Complexity of Inner Product in Symmetric Normed Spaces

    We introduce and study the communication complexity of computing the inner product of two vectors, where the input is restricted w.r.t. a norm $N$ on the space $\mathbb{R}^n$. Here, Alice and Bob hold two vectors $v,u$ such that $\|v\|_N\le 1$ and $\|u\|_{N^*}\le 1$, where $N^*$ is the dual norm. They want to compute their inner product $\langle v,u \rangle$ up to an $\varepsilon$ additive term. The problem is denoted by $\mathrm{IP}_N$. We systematically study $\mathrm{IP}_N$, showing the following results:
    - For any symmetric norm $N$, given $\|v\|_N\le 1$ and $\|u\|_{N^*}\le 1$ there is a randomized protocol for $\mathrm{IP}_N$ using $\tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$ bits; we will denote this by $\mathcal{R}_{\varepsilon,1/3}(\mathrm{IP}_{N}) \leq \tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$.
    - One-way communication complexity $\overrightarrow{\mathcal{R}}(\mathrm{IP}_{\ell_p})\leq\mathcal{O}(\varepsilon^{-\max(2,p)}\cdot \log\frac n\varepsilon)$, and a nearly matching lower bound $\overrightarrow{\mathcal{R}}(\mathrm{IP}_{\ell_p}) \geq \Omega(\varepsilon^{-\max(2,p)})$ for $\varepsilon^{-\max(2,p)} \ll n$.
    - One-way communication complexity $\overrightarrow{\mathcal{R}}(N)$ for a symmetric norm $N$ is governed by embeddings of $\ell_\infty^k$ into $N$. Specifically, while a small distortion embedding easily implies a lower bound $\Omega(k)$, we show that, conversely, non-existence of such an embedding implies a protocol with communication $k^{\mathcal{O}(\log \log k)} \log^2 n$.
    - For an arbitrary origin-symmetric convex polytope $P$, we show $\mathcal{R}(\mathrm{IP}_{N}) \le\mathcal{O}(\varepsilon^{-2} \log \mathrm{xc}(P))$, where $N$ is the unique norm for which $P$ is a unit ball, and $\mathrm{xc}(P)$ is the extension complexity of $P$.
    Comment: Accepted to ITCS 202
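
    As a brief aside on why an additive $\varepsilon$ error is a natural target here: the norm constraints on the two inputs bound the inner product in absolute value by 1. This follows from the standard definition of the dual norm (the symbol $w$ below is introduced only for this illustration) and reduces to Hölder's inequality when $N = \ell_p$:

```latex
\[
  \|u\|_{N^*} \;=\; \sup_{\|w\|_N \le 1} \langle w, u \rangle
  \quad\Longrightarrow\quad
  |\langle v, u \rangle| \;\le\; \|v\|_N \,\|u\|_{N^*} \;\le\; 1 .
\]
```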

    Robust lower bounds for communication and stream computation

    We study the communication complexity of evaluating functions when the input data is randomly allocated (according to some known distribution) amongst two or more players, possibly with information overlap. This naturally extends previously studied variable partition models such as the best-case and worst-case partition models. We aim to understand whether the hardness of a communication problem holds for almost every allocation of the input, as opposed to holding for perhaps just a few atypical partitions. A key application is to the heavily studied data stream model. There is a strong connection between our communication lower bounds and lower bounds in the data stream model that are “robust” to the ordering of the data. That is, we prove lower bounds for when the order of the items in the stream is chosen not adversarially but rather uniformly (or near-uniformly) from the set of all permutations. This random-order data stream model has attracted recent interest, since lower bounds here give stronger evidence for the inherent hardness of streaming problems. Our results include the first random-partition communication lower bounds for problems including multi-party set disjointness and gap-Hamming-distance. Both are tight. We also extend and improve previous results for a form of pointer jumping that is relevant to the problem of selection (in particular, median finding). Collectively, these results yield lower bounds for a variety of problems in the random-order data stream model, including estimating the number of distinct elements, approximating frequency moments, and quantile estimation. A short version of this article is available in the Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC'08), ACM, pp. 641-650. Compared to the conference presentation, this version considerably expands the detail of the discussion and of the proofs, and substantially changes some of the proof techniques.

    Some Communication Complexity Results and their Applications

    Communication complexity represents one of the premier techniques for proving lower bounds in theoretical computer science. Lower bounds on communication problems can be leveraged to prove lower bounds in several different areas. In this work, we study three different communication complexity problems. The lower bounds for these problems have applications in circuit complexity, wireless sensor networks, and streaming algorithms. First, we study the multiparty pointer jumping problem. We present the first nontrivial upper bound for this problem. We also provide a suite of strong lower bounds under several restricted classes of protocols. Next, we initiate the study of several non-monotone functions in the distributed functional monitoring setting and provide several lower bounds. In particular, we give a generic adversarial technique and show that when deletions are allowed, no nontrivial protocol is possible. Finally, we study the Gap-Hamming-Distance problem and give tight lower bounds for protocols that use a constant number of messages. As a result, we take a well-known lower bound for one-pass streaming algorithms for a host of problems and extend it so it applies to streaming algorithms that use a constant number of passes.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    The average-case complexity of counting distinct elements

    We continue the study of approximating the number of distinct elements in a data stream of length n to within a $(1\pm\varepsilon)$ factor. It is known that if the stream may consist of arbitrary data arriving in an arbitrary order, then any 1-pass algorithm requires $\Omega(1/\varepsilon^2)$ bits of space to perform this task. To try to bypass this lower bound, the problem was recently studied in a model in which the stream may consist of arbitrary data, but it arrives to the algorithm in a random order. However, even in this model an $\Omega(1/\varepsilon^2)$ lower bound was established. This is because the adversary can still choose the data arbitrarily. This leaves open the possibility that the problem is only hard under a pathological choice of data, which would be of little practical relevance. We study the average-case complexity of this problem under certain distributions. Namely, we study the case when each successive stream item is drawn independently and uniformly at random from an unknown subset of d items for an unknown value of d. This captures the notion of random uncorrelated data. For a wide range of values of d and n, we design a 1-pass algorithm that bypasses the $\Omega(1/\varepsilon^2)$ lower bound that holds in the adversarial and random-order models, thereby showing that this model admits more space-efficient algorithms. Moreover, the update time of our algorithm is optimal. Despite these positive results, for a certain range of values of d and n we show that estimating the number of distinct elements requires $\Omega(1/\varepsilon^2)$ bits of space even in this model. Our lower bound subsumes previous bounds, showing that even for natural choices of data the problem is hard.