10 research outputs found
An Optimal Lower Bound on the Communication Complexity of Gap-Hamming-Distance
We prove an optimal lower bound on the randomized communication
complexity of the much-studied Gap-Hamming-Distance problem. As a consequence,
we obtain essentially optimal multi-pass space lower bounds in the data stream
model for a number of fundamental problems, including the estimation of
frequency moments.
The Gap-Hamming-Distance problem is a communication problem, wherein Alice
and Bob receive $n$-bit strings $x$ and $y$, respectively. They are promised
that the Hamming distance between $x$ and $y$ is either at least
$n/2+\sqrt{n}$ or at most $n/2-\sqrt{n}$, and their goal is to decide which of
these is the case. Since the formal presentation of the problem by Indyk and
Woodruff (FOCS, 2003), it had been conjectured that the naive protocol, which
uses $n$ bits of
communication, is asymptotically optimal. The conjecture was shown to be true
in several special cases, e.g., when the communication is deterministic, or
when the number of rounds of communication is limited.
The proof of our aforementioned result, which settles this conjecture fully,
is based on a new geometric statement regarding correlations in Gaussian space,
related to a result of C. Borell (1985). To prove this geometric statement, we
show that random projections of not-too-small sets in Gaussian space are close
to a mixture of translated normal variables.
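To make the problem statement concrete, here is a minimal sketch of the naive protocol, in which Alice simply sends her whole input and Bob computes the distance locally. The function names and the `threshold` parameter are hypothetical and chosen only for illustration; the sketch assumes inputs are equal-length lists of bits and is not taken from the paper.

```python
def hamming_distance(x, y):
    """Number of positions where the equal-length bit lists x and y differ."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


def naive_protocol(x, y, threshold):
    """Alice sends all of x; Bob compares the distance with a threshold.

    `threshold` is an illustrative parameter (an assumption): under the
    promise, any value strictly inside the gap separates the two cases.
    Returns (distance_is_large, bits_sent).
    """
    message = list(x)                        # Alice's single message: her entire input
    distance = hamming_distance(message, y)  # Bob evaluates the distance on his side
    return distance > threshold, len(message)
```

The communication cost of this sketch equals the input length, which is the baseline that the lower bound shows cannot be improved asymptotically.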
Stochastic Streams: Sample Complexity vs. Space Complexity
We address the trade-off between the computational resources needed to process a large data set and the number of samples available from the data set. Specifically, we consider the following abstraction: we receive a potentially infinite stream of IID samples from some unknown distribution D, and are tasked with computing some function f(D). If the stream is observed for time t, how much memory, s, is required to estimate f(D)? We refer to t as the sample complexity and s as the space complexity. The main focus of this paper is investigating the trade-offs between the space and sample complexity. We study these trade-offs for several canonical problems studied in the data stream model: estimating the collision probability, i.e., the second moment of a distribution, deciding if a graph is connected, and approximating the dimension of an unknown subspace. Our results are based on techniques for simulating different classical sampling procedures in this model, emulating random walks given a sequence of IID samples, as well as leveraging a characterization between communication bounded protocols and statistical query algorithms.
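As an illustration of the collision-probability problem, the following sketch computes the standard unbiased estimate: the fraction of sample pairs that coincide. It is a memory-hungry baseline that stores a count per distinct value, not the space-bounded procedure studied in the paper; the function name is hypothetical.

```python
from collections import Counter


def collision_probability_estimate(samples):
    """Estimate sum_i p_i^2 as the fraction of unordered sample pairs that collide."""
    t = len(samples)
    if t < 2:
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    # A value seen c times contributes c*(c-1)/2 colliding pairs.
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = t * (t - 1) // 2
    return colliding_pairs / total_pairs
```

Here `len(samples)` plays the role of the sample complexity t, while the size of `counts` is the memory that a space-efficient algorithm would try to avoid.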
Communication Complexity of Inner Product in Symmetric Normed Spaces
We introduce and study the communication complexity of computing the inner
product of two vectors, where the input is restricted w.r.t. a norm $N$ on the
space $\mathbb{R}^n$. Here, Alice and Bob hold two vectors $v,u$ such that
$\|v\|_N\le 1$ and $\|u\|_{N^*}\le 1$, where $N^*$ is the dual norm. They want
to compute their inner product $\langle v,u \rangle$ up to an $\varepsilon$
additive term. The problem is denoted by $IP_N$.
We systematically study $IP_N$, showing the following results:
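- For any symmetric norm $N$, given $\|v\|_N\le 1$ and $\|u\|_{N^*}\le 1$
there is a randomized protocol for $IP_N$ using
$\tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$ bits -- we will denote this by
$\mathcal{R}_{\varepsilon,1/3}(IP_{N}) \leq \tilde{\mathcal{O}}(\varepsilon^{-6} \log n)$.
- One way communication complexity
$\overrightarrow{\mathcal{R}}(IP_{\ell_p})\leq\mathcal{O}(\varepsilon^{-\max(2,p)}\cdot \log\frac{n}{\varepsilon})$,
and a nearly matching lower bound
$\overrightarrow{\mathcal{R}}(IP_{\ell_p}) \geq \Omega(\varepsilon^{-\max(2,p)})$
for $\varepsilon^{-\max(2,p)} \ll n$.
- One way communication complexity $\overrightarrow{\mathcal{R}}(N)$ for a
symmetric norm $N$ is governed by embeddings $\ell_\infty^k$ into $N$.
Specifically, while a small distortion embedding easily implies a lower bound
$\Omega(k)$, we show that, conversely, non-existence of such an embedding
implies a protocol with communication $k^{\mathcal{O}(\log \log k)} \log^2 n$.
- For an arbitrary origin-symmetric convex polytope $P$, we show
$\mathcal{R}(IP_{N}) \le\mathcal{O}(\varepsilon^{-2} \log \mathrm{xc}(P))$,
where $N$ is the unique norm for which $P$ is a unit ball,
and $\mathrm{xc}(P)$ is the extension complexity of $P$.
Comment: Accepted to ITCS 202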
Robust lower bounds for communication and stream computation
We study the communication complexity of evaluating functions when the input data is randomly allocated (according to some known distribution) amongst two or more players, possibly with information overlap. This naturally extends previously studied variable partition models such as the best-case and worst-case partition models. We aim to understand whether the hardness of a communication problem holds for almost every allocation of the input, as opposed to holding for perhaps just a few atypical partitions.
A key application is to the heavily studied data stream model. There is a strong connection between our communication lower bounds and lower bounds in the data stream model that are "robust" to the ordering of the data. That is, we prove lower bounds for when the order of the items in the stream is chosen not adversarially but rather uniformly (or near-uniformly) from the set of all permutations. This random-order data stream model has attracted recent interest, since lower bounds here give stronger evidence for the inherent hardness of streaming problems.
Our results include the first random-partition communication lower bounds for problems including multi-party set disjointness and gap-Hamming-distance. Both are tight. We also extend and improve previous results for a form of pointer jumping that is relevant to the problem of selection (in particular, median finding). Collectively, these results yield lower bounds for a variety of problems in the random-order data stream model, including estimating the number of distinct elements, approximating frequency moments, and quantile estimation.
A short version of this article is available in the Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC'08), ACM, pp. 641-650. Compared to the conference presentation, this version considerably expands the detail of the discussion and of the proofs, and substantially changes some of the proof techniques.
Some Communication Complexity Results and their Applications
Communication complexity represents one of the premier techniques for proving lower bounds in theoretical computer science. Lower bounds on communication problems can be leveraged to prove lower bounds in several different areas. In this work, we study three different communication complexity problems. The lower bounds for these problems have applications in circuit complexity, wireless sensor networks, and streaming algorithms. First, we study the multiparty pointer jumping problem. We present the first nontrivial upper bound for this problem. We also provide a suite of strong lower bounds under several restricted classes of protocols. Next, we initiate the study of several non-monotone functions in the distributed functional monitoring setting and provide several lower bounds. In particular, we give a generic adversarial technique and show that when deletions are allowed, no nontrivial protocol is possible. Finally, we study the Gap-Hamming-Distance problem and give tight lower bounds for protocols that use a constant number of messages. As a result, we take a well-known lower bound for one-pass streaming algorithms for a host of problems and extend it so that it applies to streaming algorithms that use a constant number of passes.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
The average-case complexity of counting distinct elements
We continue the study of approximating the number of distinct elements in a data stream of length n to within a (1±ε) factor. It is known that if the stream may consist of arbitrary data arriving in an arbitrary order, then any 1-pass algorithm requires Ω(1/ε²) bits of space to perform this task. To try to bypass this lower bound, the problem was recently studied in a model in which the stream may consist of arbitrary data, but it arrives to the algorithm in a random order. However, even in this model an Ω(1/ε²) lower bound was established. This is because the adversary can still choose the data arbitrarily. This leaves open the possibility that the problem is only hard under a pathological choice of data, which would be of little practical relevance. We study the average-case complexity of this problem under certain distributions. Namely, we study the case when each successive stream item is drawn independently and uniformly at random from an unknown subset of d items for an unknown value of d. This captures the notion of random uncorrelated data. For a wide range of values of d and n, we design a 1-pass algorithm that bypasses the Ω(1/ε²) lower bound that holds in the adversarial and random-order models, thereby showing that this model admits more space-efficient algorithms. Moreover, the update time of our algorithm is optimal. Despite these positive results, for a certain range of values of d and n we show that estimating the number of distinct elements requires Ω(1/ε²) bits of space even in this model. Our lower bound subsumes previous bounds, showing that even for natural choices of data the problem is hard.
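For readers unfamiliar with one-pass distinct-element estimation, the sketch below shows a generic k-minimum-values estimator. This is a swapped-in textbook technique, not the algorithm designed in the paper; the function names, the SHA-256 hash, and the default `k` are assumptions made for illustration.

```python
import hashlib
import heapq


def _unit_hash(item):
    """Map an item to a pseudo-random fraction of 2**64 using SHA-256."""
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_distinct_estimate(stream, k=256):
    """One-pass estimate of the number of distinct items from the k smallest hashes."""
    if k < 2:
        raise ValueError("k must be at least 2")
    heap = []        # negated hashes, so heap[0] is the largest hash currently kept
    members = set()  # hashes currently kept, used to skip repeated items
    for item in stream:
        h = _unit_hash(item)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heapreplace(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))   # fewer than k distinct hashes seen: count is exact
    return (k - 1) / (-heap[0])   # k-th smallest hash is roughly k / (number of distinct items)
```

The memory used is proportional to `k` regardless of stream length, and the relative error of this kind of estimator shrinks roughly like 1/√k, which mirrors the 1/ε² dependence discussed in the abstract.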