Tight bounds for distributed functional monitoring
We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there are k sites each tracking their input streams and communicating with a central coordinator. The coordinator’s task is to continuously maintain an approximate output to a function computed over the union of the k streams. The goal is to minimize the number of bits communicated. Let the p-th frequency moment be defined as F_p = ∑_i f_i^p, where f_i is the frequency of element i. We show the randomized communication complexity of estimating the number of distinct elements (that is, F_0) up to a 1 + ε factor is Ω(k/ε^2), improving upon the previous Ω(k + 1/ε^2) bound and matching known upper bounds. For F_p, p > 1, we improve the previous Ω(k + 1/ε^2) communication bound to Ω(k^{p−1}/ε^2). We obtain similar improvements for heavy hitters, empirical entropy, and other problems.
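As a small illustration of the definition F_p = ∑_i f_i^p used in this abstract, the sketch below computes frequency moments exactly over the union of a few site streams. It is not a monitoring protocol and does not come from the paper: the function name `frequency_moment` and the three toy site streams are hypothetical, and the whole union is materialized at once, which is exactly the communication cost the paper's model tries to avoid.

```python
from collections import Counter


def frequency_moment(stream, p):
    """Exact F_p = sum_i f_i^p, where f_i is the frequency of item i.

    F_0 is taken to be the number of distinct items in the stream.
    """
    freqs = Counter(stream)
    if p == 0:
        return len(freqs)
    return sum(f ** p for f in freqs.values())


# Hypothetical input: k = 3 sites, each holding a short local stream.
sites = [["a", "b", "a"], ["b", "c"], ["a"]]

# The coordinator's target function is defined over the union of the streams.
union = [item for site in sites for item in site]

f0 = frequency_moment(union, 0)  # number of distinct elements
f1 = frequency_moment(union, 1)  # total stream length
f2 = frequency_moment(union, 2)  # sum of squared frequencies
```

Here the frequencies are a: 3, b: 2, c: 1, so the three moments can be checked by hand.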
Some Communication Complexity Results and their Applications
Communication Complexity represents one of the premier techniques for proving lower bounds in theoretical computer science. Lower bounds on communication problems can be leveraged to prove lower bounds in several different areas. In this work, we study three different communication complexity problems. The lower bounds for these problems have applications in circuit complexity, wireless sensor networks, and streaming algorithms. First, we study the multiparty pointer jumping problem. We present the first nontrivial upper bound for this problem. We also provide a suite of strong lower bounds under several restricted classes of protocols. Next, we initiate the study of several non-monotone functions in the distributed functional monitoring setting and provide several lower bounds. In particular, we give a generic adversarial technique and show that when deletions are allowed, no nontrivial protocol is possible. Finally, we study the Gap-Hamming-Distance problem and give tight lower bounds for protocols that use a constant number of messages. As a result, we take a well-known lower bound for one-pass streaming algorithms for a host of problems and extend it so it applies to streaming algorithms that use a constant number of passes
Towards Optimal Moment Estimation in Streaming and Distributed Models
One of the oldest problems in the data stream model is to approximate the p-th moment ||X||_p^p = sum_{i=1}^n X_i^p of an underlying non-negative vector X in R^n, which is presented as a sequence of poly(n) updates to its coordinates. Of particular interest is when p in (0,2]. Although a tight space bound of Theta(epsilon^-2 log n) bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity of this problem when all updates are positive. Specifically, the upper bound is O(epsilon^-2 log n) bits, while the lower bound is only Omega(epsilon^-2 + log n) bits. Recently, an upper bound of O~(epsilon^-2 + log n) bits was obtained under the assumption that the updates arrive in a random order.
We show that for p in (0, 1], the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of O~(epsilon^-2 + log n) bits for estimating ||X||_p^p. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for p in (1,2], in the natural coordinator and blackboard distributed communication topologies, there is an O~(epsilon^-2) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies G, obtaining an O~(epsilon^-2 log d) max-communication upper bound, where d is the diameter of G. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an Omega(epsilon^-2 log n) bit lower bound for p in (1,2] for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.
Weighted Reservoir Sampling from Distributed Streams
We consider message-efficient continuous random sampling from a distributed
stream, where the probability of inclusion of an item in the sample is
proportional to a weight associated with the item. The unweighted version,
where all weights are equal, is well studied, and admits tight upper and lower
bounds on message complexity. For weighted sampling with replacement, there is
a simple reduction to unweighted sampling with replacement. However, in many
applications the stream has only a few heavy items which may dominate a random
sample when chosen with replacement. Weighted sampling without
replacement (weighted SWOR) avoids this issue, since such heavy items can be
sampled at most once.
In this work, we present the first message-optimal algorithm for weighted
SWOR from a distributed stream. Our algorithm also has optimal space and time
complexity. As an application of our algorithm for weighted SWOR, we derive the
first distributed streaming algorithms for tracking heavy hitters with
residual error. Here the goal is to identify stream items that contribute
significantly to the residual stream, once the heaviest items are removed.
Residual heavy hitters generalize the notion of heavy hitters and are
important in streams that have a skewed distribution of weights. In addition to
the upper bound, we also provide a lower bound on the message complexity that
is nearly tight up to a factor. Finally, we use our weighted
sampling algorithm to improve the message complexity of distributed L1
tracking, also known as count tracking, which is a widely studied problem in
distributed streaming. We also derive a tight message lower bound, which closes
the message complexity of this fundamental problem.
Comment: To appear in PODS 201
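To make the notion of weighted sampling without replacement concrete, here is a minimal single-stream sketch. It uses the classical key-based approach usually attributed to Efraimidis and Spirakis (each item draws a key u^(1/w) and the s largest keys are kept), which is a swapped-in textbook technique and not the message-optimal distributed algorithm this abstract announces. The function name `weighted_swor` and the toy stream are hypothetical.

```python
import heapq
import random


def weighted_swor(stream, s, rng=None):
    """Weighted sample without replacement of size at most s.

    `stream` yields (item, weight) pairs. Each item with positive weight w
    gets the key u ** (1 / w) for u uniform in [0, 1), and the s items with
    the largest keys are retained. Every item enters the sample at most once.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival index, item); root is the weakest key
    for idx, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items with non-positive weight can never be sampled
        key = rng.random() ** (1.0 / weight)
        if len(heap) < s:
            heapq.heappush(heap, (key, idx, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, idx, item))
    return [item for _, _, item in heap]


# Hypothetical skewed stream: one heavy item and three light ones.
stream = [("heavy", 1000.0), ("a", 1.0), ("b", 1.0), ("c", 1.0)]
sample = weighted_swor(stream, 2, random.Random(0))
```

Because a single key is drawn per item, the heavy item can occupy at most one slot of the sample, which is the property the abstract contrasts with sampling with replacement.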
An Optimal Lower Bound on the Communication Complexity of Gap-Hamming-Distance
We prove an optimal lower bound on the randomized communication
complexity of the much-studied Gap-Hamming-Distance problem. As a consequence,
we obtain essentially optimal multi-pass space lower bounds in the data stream
model for a number of fundamental problems, including the estimation of
frequency moments.
The Gap-Hamming-Distance problem is a communication problem, wherein Alice
and Bob receive n-bit strings x and y, respectively. They are promised
that the Hamming distance between x and y is either at least n/2 + sqrt(n)
or at most n/2 - sqrt(n), and their goal is to decide which of these is the
case. Since the formal presentation of the problem by Indyk and Woodruff (FOCS,
2003), it had been conjectured that the naive protocol, which uses n bits of
communication, is asymptotically optimal. The conjecture was shown to be true
in several special cases, e.g., when the communication is deterministic, or
when the number of rounds of communication is limited.
The proof of our aforementioned result, which settles this conjecture fully,
is based on a new geometric statement regarding correlations in Gaussian space,
related to a result of C. Borell (1985). To prove this geometric statement, we
show that random projections of not-too-small sets in Gaussian space are close
to a mixture of translated normal variables
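The promise problem described in this abstract can be stated as a short function. The sketch below assumes the standard formulation of Gap-Hamming-Distance, with thresholds n/2 + sqrt(n) and n/2 - sqrt(n) on n-bit inputs; the function names and the string encoding of the inputs are hypothetical. It evaluates the function with both inputs in one place and says nothing about communication cost, which is what the lower bound is about.

```python
import math


def hamming_distance(x, y):
    """Number of positions where the equal-length strings x and y differ."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(a != b for a, b in zip(x, y))


def gap_hamming(x, y):
    """Decide the Gap-Hamming-Distance promise problem on n-bit strings.

    Returns "far" if the distance is at least n/2 + sqrt(n), "close" if it
    is at most n/2 - sqrt(n), and None when the promise is violated.
    """
    n = len(x)
    d = hamming_distance(x, y)
    if d >= n / 2 + math.sqrt(n):
        return "far"
    if d <= n / 2 - math.sqrt(n):
        return "close"
    return None


# Hypothetical 16-bit inputs: thresholds are 8 + 4 = 12 and 8 - 4 = 4.
x = "0" * 16
far_case = gap_hamming(x, "1" * 16)            # distance 16
close_case = gap_hamming(x, "0" * 16)          # distance 0
gap_case = gap_hamming(x, "1" * 8 + "0" * 8)   # distance 8, inside the gap
```

In the naive protocol mentioned in the abstract, one party sends its entire string so the other can run this computation locally.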
On The Communication Complexity of Linear Algebraic Problems in the Message Passing Model
We study the communication complexity of linear algebraic problems over
finite fields in the multi-player message passing model, proving a number of
tight lower bounds. Specifically, for a matrix which is distributed among a
number of players, we consider the problem of determining its rank, of
computing entries in its inverse, and of solving linear equations. We also
consider related problems such as computing the generalized inner product of
vectors held on different servers. We give a general framework for reducing
these multi-player problems to their two-player counterparts, showing that the
randomized s-player communication complexity of these problems is at least
s times the randomized two-player communication complexity. Provided the
problem has a certain amount of algebraic symmetry, which we formally define,
we can show the hardest input distribution is a symmetric distribution, and
therefore apply a recent multi-player lower bound technique of Phillips et al.
Further, we give new two-player lower bounds for a number of these problems. In
particular, our optimal lower bound for the two-player version of the matrix
rank problem resolves an open question of Sun and Wang.
A common feature of our lower bounds is that they apply even to the special
"threshold promise" versions of these problems, wherein the underlying
quantity, e.g., rank, is promised to be one of just two values, one on each
side of some critical threshold. These kinds of promise problems are
commonplace in the literature on data streaming as sources of hardness for
reductions giving space lower bounds