
    On the Distributed Complexity of Large-Scale Graph Computations

    Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where $k \geq 2$ machines jointly perform computations on graphs with $n$ nodes (typically, $n \gg k$). The input graph is assumed to be initially randomly partitioned among the $k$ machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. The General Lower Bound Theorem is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic and this theorem can be used in a "cookbook" fashion to show distributed lower bounds in the context of several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds for the round complexity of two fundamental graph problems, namely PageRank computation and triangle enumeration. Our approach, as demonstrated in the case of PageRank, can yield tight lower bounds for problems (including, and especially, under a stochastic partition of the input) where communication complexity techniques are not obvious. Our approach, as demonstrated in the case of triangle enumeration, can yield stronger round lower bounds as well as message-round tradeoffs compared to approaches that use communication complexity techniques.
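
The random input partition mentioned in the abstract can be illustrated with a minimal sketch. It assumes a random vertex partition in which each node is assigned independently and uniformly at random to one of the k machines; the abstract does not spell out the partitioning scheme, so this is an assumption, and the function names `random_vertex_partition` and `cross_machine_edges` are hypothetical.

```python
import random


def random_vertex_partition(n, k, seed=0):
    """Assign each of n nodes to one of k machines uniformly at random.

    Assumption: a vertex partition with independent uniform choices.
    Returns a list `home` where home[v] is the machine holding node v.
    """
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in range(n)]


def cross_machine_edges(edges, home):
    """Count edges whose two endpoints are held by different machines."""
    return sum(1 for u, v in edges if home[u] != home[v])


# Example: a path on 6 nodes split across 3 machines.
path_edges = [(i, i + 1) for i in range(5)]
home = random_vertex_partition(6, 3, seed=1)
crossing = cross_machine_edges(path_edges, home)
```

Edges that cross machines are the ones whose information has to travel over the point-to-point links, which is why the round complexity depends on how the input is spread out.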

    Distributed Data Summarization in Well-Connected Networks

    We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph $G$ of $n$ nodes, each of which may hold a value initially, we focus on computing $\sum_{i=1}^{N} g(f_i)$, where $f_i$ is the number of occurrences of value $i$ and $g$ is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data. In the CONGEST model, a simple adaptation from streaming lower bounds shows that it requires $\tilde{\Omega}(D + n)$ rounds, where $D$ is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes $\sum_{i=1}^{N} g(f_i)$ exactly in $\tau_G \cdot 2^{O(\sqrt{\log n})}$ rounds, where $\tau_G$ is the mixing time of $G$. This also has applications in computing the top $k$ most frequent elements. We demonstrate that there is a high similarity between the GOSSIP model and the CONGEST model in well-connected graphs. In particular, we show that each round of the GOSSIP model can be simulated almost perfectly in $\tilde{O}(\tau_G)$ rounds of the CONGEST model. To this end, we develop a new algorithm for the GOSSIP model that $(1 \pm \epsilon)$-approximates the $p$-th frequency moment $F_p = \sum_{i=1}^{N} f_i^p$ in $\tilde{O}(\epsilon^{-2} n^{1-k/p})$ rounds for $p \geq 2$, when the number of distinct elements $F_0$ is at most $O(n^{1/(k-1)})$. This result can be translated back to the CONGEST model with a factor $\tilde{O}(\tau_G)$ blow-up in the number of rounds.
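
To make the target quantity concrete, the following is a minimal centralized sketch of the statistic sum_i g(f_i) for three choices of g. It is not the distributed algorithm from the abstract. It assumes g(0) = 0, so values that never occur contribute nothing, and the function names are hypothetical.

```python
import math
from collections import Counter


def frequency_statistic(values, g):
    """Return the sum of g(f_i) over all values i that occur, f_i = count of i.

    Assumption: g(0) = 0, so absent values can be skipped.
    """
    freqs = Counter(values)
    return sum(g(f) for f in freqs.values())


def distinct_elements(values):
    """F_0: g(f) = 1 for every occurring value."""
    return frequency_statistic(values, lambda f: 1)


def frequency_moment(values, p):
    """F_p: g(f) = f ** p."""
    return frequency_statistic(values, lambda f: f ** p)


def empirical_entropy(values):
    """Empirical entropy in bits: g(f) = -(f/m) * log2(f/m), m = len(values)."""
    m = len(values)
    return frequency_statistic(values, lambda f: -(f / m) * math.log2(f / m))


data = ["a", "b", "a", "c", "a", "b"]  # frequencies: a=3, b=2, c=1
f0 = distinct_elements(data)           # 3
f2 = frequency_moment(data, 2)         # 9 + 4 + 1 = 14
h = empirical_entropy(data)
```

All three statistics share one code path and differ only in g, which mirrors how the abstract treats them as instances of a single problem.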

    Tight Bounds for the Cover Times of Random Walks with Heterogeneous Step Lengths

    Search patterns of randomly oriented steps of different lengths have been observed on all scales of the biological world, ranging from the microscopic to the ecological, including in protein motors, bacteria, T-cells, honeybees, marine predators, and more. Through different models, it has been demonstrated that adopting a variety in the magnitude of the step lengths can greatly improve the search efficiency. However, the precise connection between the search efficiency and the number of step lengths in the repertoire of the searcher has not been identified. Motivated by biological examples in one-dimensional terrains, a recent paper studied the best cover time on an $n$-node cycle that can be achieved by a random walk process that uses $k$ step lengths. By tuning the lengths and corresponding probabilities, the authors therein showed that the best cover time is roughly $n^{1+\Theta(1/k)}$. While this bound is useful for large values of $k$, it is hardly informative for small $k$ values, which are of interest in biology. In this paper, we provide a tight bound for the cover time of such a walk, for every integer $k > 1$. Specifically, up to lower order polylogarithmic factors, the upper bound on the cover time is a polynomial in $n$ of exponent $1 + 1/(2k-1)$. For $k = 2, 3, 4$ and $5$ the exponent is thus $4/3$, $6/5$, $8/7$, and $10/9$, respectively. Informally, our result implies that, as long as the number of step lengths $k$ is not too large, adding a step length to the repertoire of the process improves the cover time by a polynomial factor, but the extent of the improvement gradually decreases with $k$.
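
The exponents quoted in the abstract follow directly from the formula 1 + 1/(2k-1). A short sketch with exact rational arithmetic reproduces them and shows how the gain from one extra step length shrinks as k grows; the helper names are hypothetical.

```python
from fractions import Fraction


def cover_time_exponent(k):
    """Exponent 1 + 1/(2k - 1) of n in the cover-time bound, for integer k > 1."""
    if k < 2:
        raise ValueError("the bound is stated for integers k > 1")
    return 1 + Fraction(1, 2 * k - 1)


def exponent_gain(k):
    """Drop in the exponent when going from k to k + 1 step lengths."""
    return cover_time_exponent(k) - cover_time_exponent(k + 1)


exponents = [cover_time_exponent(k) for k in (2, 3, 4, 5)]  # 4/3, 6/5, 8/7, 10/9
gains = [exponent_gain(k) for k in (2, 3, 4)]               # 2/15, 2/35, 2/63
```

Ignoring polylogarithmic factors, the improvement from a third step length is a factor of about n^(2/15), while a fifth one gains only about n^(2/63).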

    Gossip vs. Markov Chains, and Randomness-Efficient Rumor Spreading

    We study gossip algorithms for the rumor spreading problem, which asks one node to deliver a rumor to all nodes in an unknown network. We present the first protocol for any expander graph $G$ with $n$ nodes such that the protocol informs every node in $O(\log n)$ rounds with high probability, and uses $\tilde{O}(\log n)$ random bits in total. The runtime of our protocol is tight, and the randomness requirement of $\tilde{O}(\log n)$ random bits almost matches the lower bound of $\Omega(\log n)$ random bits for dense graphs. We further show that, for many graph families, a polylogarithmic number of random bits in total suffices to spread the rumor in $O(\mathrm{poly}\log n)$ rounds. These results together give us an almost complete understanding of the randomness requirement of this fundamental gossip process. Our analysis relies on unexpectedly tight connections among gossip processes, Markov chains, and branching programs. First, we establish a connection between rumor spreading processes and Markov chains, which is used to approximate the rumor spreading time by the mixing time of Markov chains. Second, we show a reduction from rumor spreading processes to branching programs, and this reduction provides a general framework to derandomize gossip processes. In addition to designing rumor spreading protocols, these novel techniques may have applications in studying parallel and multiple random walks, and randomness complexity of distributed algorithms. Comment: 41 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1304.135
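
For orientation, here is a minimal simulation of the standard fully random push protocol, in which every informed node calls one uniformly random other node per round. It is not the randomness-efficient protocol from the abstract: it runs on the complete graph (chosen here as a simple well-connected example) and draws fresh random bits for every call. The function name `push_rounds` is hypothetical.

```python
import random


def push_rounds(n, seed=0):
    """Simulate push rumor spreading on the complete graph on n nodes.

    Node 0 starts with the rumor. In each round every informed node calls
    one uniformly random other node and informs it. Returns the number of
    rounds until all n nodes are informed.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        newly_informed = set()
        for u in informed:
            # Pick a uniform node different from u.
            v = rng.randrange(n - 1)
            if v >= u:
                v += 1
            newly_informed.add(v)
        informed |= newly_informed
        rounds += 1
    return rounds


rounds_needed = push_rounds(256, seed=1)
```

The informed set can at most double per round, so at least log2(n) rounds are always needed, while this baseline spends on the order of log n random bits per call. That gap is what makes the abstract's total budget of roughly log n random bits notable.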