On the Distributed Complexity of Large-Scale Graph Computations
Motivated by the increasing need to understand the distributed algorithmic
foundations of large-scale graph computations, we study some fundamental graph
problems in a message-passing model for distributed computing where
k >= 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among
the machines, a common implementation in many real-world systems.
Communication is point-to-point, and the goal is to minimize the number of
communication rounds of the computation.
Our main contribution is the General Lower Bound Theorem, a theorem
that can be used to show non-trivial lower bounds on the round complexity of
distributed large-scale data computations. The General Lower Bound Theorem is
established via an information-theoretic approach that relates the round
complexity to the minimal amount of information required by machines to solve
the problem. Our approach is generic and this theorem can be used in a
"cookbook" fashion to show distributed lower bounds in the context of several
problems, including non-graph problems. We present two applications by showing
(almost) tight lower bounds for the round complexity of two fundamental graph
problems, namely PageRank computation and triangle enumeration. Our
approach, as demonstrated in the case of PageRank, can yield tight lower bounds
for problems (including, and especially, under a stochastic partition of the
input) where communication complexity techniques are not obvious.
Our approach, as demonstrated in the case of triangle enumeration, can yield
stronger round lower bounds as well as message-round tradeoffs compared to
approaches that use communication complexity techniques.
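The random input partition mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the partition is by vertices: each vertex is sent to a uniformly random machine, and each machine stores the edges incident to its vertices. The function name `random_vertex_partition`, the `seed` parameter, and the toy edge list are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict

def random_vertex_partition(edges, k, seed=0):
    """Assign every vertex to one of k machines uniformly at random.

    Assumption (not from the abstract): a machine stores every edge
    incident to one of its home vertices.
    """
    rng = random.Random(seed)
    home = {}
    for u, v in edges:
        for w in (u, v):
            if w not in home:
                home[w] = rng.randrange(k)  # uniform choice among machines 0..k-1

    local_edges = defaultdict(set)
    for u, v in edges:
        # An edge is known to the home machines of both endpoints.
        local_edges[home[u]].add((u, v))
        local_edges[home[v]].add((u, v))
    return home, dict(local_edges)

# Toy graph: a triangle 0-1-2 with a pendant vertex 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
home, local_edges = random_vertex_partition(edges, k=3)
```

Under this assumption, an edge whose endpoints land on different machines is stored twice, so a triangle may be split across machines and listing it requires communication.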
Distributed Data Summarization in Well-Connected Networks
We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph G of n nodes each of which may hold a value initially, we focus on computing sum_{i=1}^N g(f_i), where f_i is the number of occurrences of value i and g is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data.
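As a centralized illustration of the quantity sum_{i=1}^N g(f_i) (not the distributed algorithm itself), the sketch below evaluates it for three choices of g. The helper name `summarize` and the sample data are hypothetical, and the sketch assumes g(0) = 0 so that values that never occur contribute nothing.

```python
import math
from collections import Counter

def summarize(values, g):
    """Return the sum of g(f_i) over all values i that occur at least once.

    Assumption: g(0) = 0, so absent values can be skipped.
    """
    freqs = Counter(values)  # f_i = number of occurrences of value i
    return sum(g(f) for f in freqs.values())

values = ["a", "b", "a", "c", "a", "b"]  # frequencies: a=3, b=2, c=1
n = len(values)

distinct = summarize(values, lambda f: 1)        # F_0, number of distinct elements
f2 = summarize(values, lambda f: f ** 2)         # F_2, second frequency moment
entropy = summarize(values, lambda f: -(f / n) * math.log2(f / n))  # empirical entropy
```

Here `distinct` is 3 and `f2` is 3^2 + 2^2 + 1^2 = 14; the statistic changes only through the choice of g.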
In the CONGEST model, a simple adaptation from streaming lower bounds shows that it requires Omega~(D + n) rounds, where D is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes sum_{i=1}^N g(f_i) exactly in tau_G * 2^{O(sqrt{log n})} rounds, where tau_G is the mixing time of G. This also has applications in computing the top k most frequent elements.
We demonstrate that there is a high similarity between the GOSSIP model and the CONGEST model in well-connected graphs. In particular, we show that each round of the GOSSIP model can be simulated almost perfectly in O~(tau_G) rounds of the CONGEST model. To this end, we develop a new algorithm for the GOSSIP model that (1 +/- epsilon)-approximates the p-th frequency moment F_p = sum_{i=1}^N f_i^p in O~(epsilon^{-2} n^{1-k/p}) rounds for p >= 2, when the number of distinct elements F_0 is at most O(n^{1/(k-1)}). This result can be translated back to the CONGEST model with a factor O~(tau_G) blow-up in the number of rounds.
Tight Bounds for the Cover Times of Random Walks with Heterogeneous Step Lengths
Search patterns of randomly oriented steps of different lengths have been observed on all scales of the biological world, ranging from the microscopic to the ecological, including in protein motors, bacteria, T-cells, honeybees, marine predators, and more. Through different models, it has been demonstrated that adopting a variety in the magnitude of the step lengths can greatly improve the search efficiency. However, the precise connection between the search efficiency and the number of step lengths in the repertoire of the searcher has not been identified. Motivated by biological examples in one-dimensional terrains, a recent paper studied the best cover time on an n-node cycle that can be achieved by a random walk process that uses k step lengths. By tuning the lengths and corresponding probabilities, the authors therein showed that the best cover time is roughly n^{1+Θ(1/k)}. While this bound is useful for large values of k, it is hardly informative for small k values, which are of interest in biology. In this paper, we provide a tight bound for the cover time of such a walk, for every integer k > 1. Specifically, up to lower order polylogarithmic factors, the upper bound on the cover time is a polynomial in n of exponent 1 + 1/(2k−1). For k = 2, 3, 4 and 5 the exponent is thus 4/3, 6/5, 8/7, and 10/9, respectively. Informally, our result implies that, as long as the number of step lengths k is not too large, adding a step length to the repertoire of the process improves the cover time by a polynomial factor, but the extent of the improvement gradually decreases with k.
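The exponents quoted above follow directly from the formula 1 + 1/(2k−1). A short check, using exact fractions, is sketched below; the function name `cover_time_exponent` is hypothetical.

```python
from fractions import Fraction

def cover_time_exponent(k):
    """Exponent 1 + 1/(2k - 1) of n in the cover-time bound, for integer k > 1."""
    if k < 2:
        raise ValueError("the bound is stated for integers k > 1")
    return 1 + Fraction(1, 2 * k - 1)

for k in range(2, 6):
    print(k, cover_time_exponent(k))  # 2 4/3, 3 6/5, 4 8/7, 5 10/9
```

The gap between consecutive exponents shrinks as k grows (from 4/3 to 6/5 is 2/15, from 8/7 to 10/9 is 2/63), which matches the statement that each additional step length helps less than the previous one.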
Gossip vs. Markov Chains, and Randomness-Efficient Rumor Spreading
We study gossip algorithms for the rumor spreading problem which asks one
node to deliver a rumor to all nodes in an unknown network. We present the
first protocol for any expander graph G with n nodes such that the
protocol informs every node in O(log n) rounds with high probability, and
uses O~(log n) random bits in total. The runtime of our protocol is
tight, and the randomness requirement of O~(log n) random bits almost
matches the lower bound of Omega(log n) random bits for dense graphs. We
further show that, for many graph families, a polylogarithmic number of random
bits in total suffices to spread the rumor in O(poly log n) rounds.
These results together give us an almost complete understanding of the
randomness requirement of this fundamental gossip process.
Our analysis relies on unexpectedly tight connections among gossip processes,
Markov chains, and branching programs. First, we establish a connection between
rumor spreading processes and Markov chains, which is used to approximate the
rumor spreading time by the mixing time of Markov chains. Second, we show a
reduction from rumor spreading processes to branching programs, and this
reduction provides a general framework to derandomize gossip processes. In
addition to designing rumor spreading protocols, these novel techniques may
have applications in studying parallel and multiple random walks, and
randomness complexity of distributed algorithms.
Comment: 41 pages, 1 figure. arXiv admin note: substantial text overlap with
arXiv:1304.135