    Topology Dependent Bounds For FAQs

    In this paper, we prove topology dependent bounds on the number of rounds needed to compute Functional Aggregate Queries (FAQs) studied by Abo Khamis et al. [PODS 2016] in a synchronous distributed network under the model considered by Chattopadhyay et al. [FOCS 2014, SODA 2017]. Unlike the recent work on computing database queries in the Massively Parallel Computation model, in the model of Chattopadhyay et al., nodes can communicate only via private point-to-point channels and we are interested in bounds that work over an {\em arbitrary} communication topology. This is the first work to consider more practically motivated problems in this distributed model. For the sake of exposition, we focus on two special problems in this paper: Boolean Conjunctive Query (BCQ) and computing variable/factor marginals in Probabilistic Graphical Models (PGMs). We obtain tight bounds on the number of rounds needed to compute such queries as long as the underlying hypergraph of the query is O(1)-degenerate and has O(1)-arity. In particular, the O(1)-degeneracy condition covers most well-studied queries that are efficiently computable in the centralized computation model like queries with constant treewidth. These tight bounds depend on a new notion of `width' (namely internal-node-width) for Generalized Hypertree Decompositions (GHDs) of acyclic hypergraphs, which minimizes the number of internal nodes in a sub-class of GHDs. To the best of our knowledge, this width has not been studied explicitly in the theoretical database literature. Finally, we consider the problem of computing the product of a vector with a chain of matrices and prove tight bounds on its round complexity (over the finite field of two elements) using a novel min-entropy based argument. Comment: A conference version was presented at PODS 201
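
    As a concrete illustration of the last problem, the sketch below evaluates a vector-matrix chain over the field of two elements on a single machine. The row-vector convention x·A_1·…·A_k, the helper names, and the sample matrices are assumptions made for illustration; the abstract does not fix them, and the code says nothing about round complexity.

```python
def vec_mat_gf2(x, A):
    """Row vector x times matrix A, with all arithmetic modulo 2."""
    n_cols = len(A[0])
    return [sum(x[i] & A[i][j] for i in range(len(x))) % 2
            for j in range(n_cols)]


def chain_product_gf2(x, mats):
    """Compute x * A_1 * ... * A_k over GF(2), one matrix at a time."""
    for A in mats:
        x = vec_mat_gf2(x, A)
    return x


# Hypothetical sample input: a length-3 vector and two 3x3 matrices.
x = [1, 0, 1]
A1 = [[1, 1, 0],
      [0, 1, 1],
      [1, 0, 1]]
A2 = [[0, 1, 0],
      [1, 0, 1],
      [1, 1, 1]]
result = chain_product_gf2(x, [A1, A2])
```

    Evaluating left to right keeps only one running vector in memory at any step, which is why the sketch never multiplies two matrices together.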

    Tight Bounds for Asymptotic and Approximate Consensus

    We study the performance of asymptotic and approximate consensus algorithms under harsh environmental conditions. The asymptotic consensus problem requires a set of agents to repeatedly set their outputs such that the outputs converge to a common value within the convex hull of initial values. This problem, and the related approximate consensus problem, are fundamental building blocks in distributed systems where exact consensus among agents is not required or possible, e.g., man-made distributed control systems, and have applications in the analysis of natural distributed systems, such as flocking and opinion dynamics. We prove tight lower bounds on the contraction rates of asymptotic consensus algorithms in dynamic networks, from which we deduce bounds on the time complexity of approximate consensus algorithms. In particular, the obtained bounds show optimality of asymptotic and approximate consensus algorithms presented in [Charron-Bost et al., ICALP'16] for certain dynamic networks, including the weakest dynamic network model in which asymptotic and approximate consensus are solvable. As a corollary we also obtain asymptotically tight bounds for asymptotic consensus in the classical asynchronous model with crashes. Central to our lower bound proofs is an extended notion of valency, the set of reachable limits of an asymptotic consensus algorithm starting from a given configuration. We further relate topological properties of valencies to the solvability of exact consensus, shedding some light on the relation of these three fundamental problems in dynamic networks.
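
    To make the problem statement concrete, the sketch below runs a simple averaging rule on a fixed ring of four agents. The static ring, the equal-weight update, and the initial values are assumptions chosen for illustration; this is not the algorithm of Charron-Bost et al., and it does not model dynamic networks or crashes.

```python
def averaging_round(values):
    """Each agent replaces its value by the mean of its own value and
    the values of its two ring neighbors (hypothetical update rule)."""
    n = len(values)
    return [(values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3.0
            for i in range(n)]


def spread(values):
    """Diameter of the set of current outputs."""
    return max(values) - min(values)


initial = [0.0, 1.0, 4.0, 9.0]
values = list(initial)
spreads = [spread(values)]
for _ in range(50):
    values = averaging_round(values)
    spreads.append(spread(values))
```

    The outputs never leave the interval spanned by the initial values, and the spread shrinks by a constant factor per round on this ring. A contraction rate in the sense of the abstract measures how fast that spread can be driven to zero.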

    On the Distributed Complexity of Large-Scale Graph Computations

    Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n ≫ k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication {\em rounds} of the computation. Our main contribution is the {\em General Lower Bound Theorem}, a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. The General Lower Bound Theorem is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic and this theorem can be used in a "cookbook" fashion to show distributed lower bounds in the context of several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds for the round complexity of two fundamental graph problems, namely {\em PageRank computation} and {\em triangle enumeration}. Our approach, as demonstrated in the case of PageRank, can yield tight lower bounds for problems (including, and especially, under a stochastic partition of the input) where communication complexity techniques are not obvious. Our approach, as demonstrated in the case of triangle enumeration, can yield stronger round lower bounds as well as message-round tradeoffs compared to approaches that use communication complexity techniques.

    Towards Optimal Moment Estimation in Streaming and Distributed Models

    One of the oldest problems in the data stream model is to approximate the p-th moment ||X||_p^p = sum_{i=1}^n X_i^p of an underlying non-negative vector X in R^n, which is presented as a sequence of poly(n) updates to its coordinates. Of particular interest is when p in (0,2]. Although a tight space bound of Theta(epsilon^-2 log n) bits is known for this problem when both positive and negative updates are allowed, surprisingly there is still a gap in the space complexity of this problem when all updates are positive. Specifically, the upper bound is O(epsilon^-2 log n) bits, while the lower bound is only Omega(epsilon^-2 + log n) bits. Recently, an upper bound of O~(epsilon^-2 + log n) bits was obtained under the assumption that the updates arrive in a random order. We show that for p in (0, 1], the random order assumption is not needed. Namely, we give an upper bound for worst-case streams of O~(epsilon^-2 + log n) bits for estimating ||X||_p^p. Our techniques also give new upper bounds for estimating the empirical entropy in a stream. On the other hand, we show that for p in (1,2], in the natural coordinator and blackboard distributed communication topologies, there is an O~(epsilon^-2) bit max-communication upper bound based on a randomized rounding scheme. Our protocols also give rise to protocols for heavy hitters and approximate matrix product. We generalize our results to arbitrary communication topologies G, obtaining an O~(epsilon^-2 log d) max-communication upper bound, where d is the diameter of G. Interestingly, our upper bound rules out natural communication complexity-based approaches for proving an Omega(epsilon^-2 log n) bit lower bound for p in (1,2] for streaming algorithms. In particular, any such lower bound must come from a topology with large diameter.
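
    For reference, the quantity being approximated can be computed exactly when space is not a concern. The sketch below is such a baseline: it stores the whole vector, so it uses space linear in the number of touched coordinates and is not one of the small-space algorithms the abstract describes. The function name, the (index, increment) update format, and the sample stream are assumptions for illustration.

```python
from collections import defaultdict


def exact_moment(updates, p):
    """Exact ||X||_p^p for an insertion-only stream of (index, delta) pairs.

    Every delta is assumed positive, matching the all-positive-updates
    setting discussed in the abstract.
    """
    x = defaultdict(float)
    for index, delta in updates:
        if delta <= 0:
            raise ValueError("this sketch assumes positive updates only")
        x[index] += delta
    return sum(value ** p for value in x.values())


# Hypothetical stream: the final vector has X_0 = 3, X_1 = 1, X_3 = 4.
stream = [(0, 2), (1, 1), (0, 1), (3, 4)]
f1 = exact_moment(stream, 1.0)
f2 = exact_moment(stream, 2.0)
```

    A streaming algorithm in the sense of the abstract would return a (1 ± epsilon)-approximation of the same values while storing far fewer bits than this dictionary does.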

    Towards a complexity theory for the congested clique

    The congested clique model of distributed computing has been receiving attention as a model for densely connected distributed systems. While there has been significant progress on the side of upper bounds, we have very little in terms of lower bounds for the congested clique; indeed, it is now known that proving explicit congested clique lower bounds is as difficult as proving circuit lower bounds. In this work, we use various more traditional complexity-theoretic tools to build a clearer picture of the complexity landscape of the congested clique: -- Nondeterminism and beyond: We introduce the nondeterministic congested clique model (analogous to NP) and show that there is a natural canonical problem family that captures all problems solvable in constant time with nondeterministic algorithms. We further generalise these notions by introducing the constant-round decision hierarchy (analogous to the polynomial hierarchy). -- Non-constructive lower bounds: We lift the prior non-uniform counting arguments to a general technique for proving non-constructive uniform lower bounds for the congested clique. In particular, we prove a time hierarchy theorem for the congested clique, showing that there are decision problems of essentially all complexities, both in the deterministic and nondeterministic settings. -- Fine-grained complexity: We map out relationships between various natural problems in the congested clique model, arguing that a reduction-based complexity theory currently gives us a fairly good picture of the complexity landscape of the congested clique.

    Optimal Dynamic Distributed MIS

    Finding a maximal independent set (MIS) in a graph is a cornerstone task in distributed computing. The local nature of an MIS allows for fast solutions in a static distributed setting, which are logarithmic in the number of nodes or in their degrees. The result trivially applies to the dynamic distributed model, in which edges or nodes may be inserted or deleted. In this paper, we take a different approach which exploits locality to the extreme, and show how to update an MIS in a dynamic distributed setting, either \emph{synchronous} or \emph{asynchronous}, with only \emph{a single adjustment} and in a single round, in expectation. These strong guarantees hold for the \emph{complete fully dynamic} setting: insertions and deletions, of edges as well as nodes, whether graceful or abrupt. This strongly separates the static and dynamic distributed models, as super-constant lower bounds exist for computing an MIS in the former. Our results are obtained by a novel analysis of the surprisingly simple solution of carefully simulating the greedy \emph{sequential} MIS algorithm with a random ordering of the nodes. As such, our algorithm has a direct application as a 3-approximation algorithm for correlation clustering. This adds to the important toolbox of distributed graph decompositions, which are widely used as crucial building blocks in distributed computing. Finally, our algorithm enjoys a useful \emph{history-independence} property, meaning the output is independent of the history of topology changes that constructed that graph. This means the output cannot be chosen, or even biased, by the adversary in case its goal is to prevent us from optimizing some objective function. Comment: 19 pages including appendix and reference
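
    The greedy sequential MIS algorithm with a random node ordering, which the abstract says the dynamic algorithm simulates, can be sketched in a few lines. This is only the centralized, static procedure; the distributed simulation and the update rules for topology changes are the paper's contribution and are not reproduced here. The function names, the adjacency-dictionary input format, and the 5-cycle example are assumptions for illustration.

```python
import random


def greedy_mis(adj, rng):
    """Greedy sequential MIS over a uniformly random ordering of the nodes."""
    order = list(adj)
    rng.shuffle(order)
    mis = set()
    for v in order:
        # v joins exactly when no neighbor was added earlier in the order.
        if not any(u in mis for u in adj[v]):
            mis.add(v)
    return mis


def is_mis(adj, candidate):
    """Check that candidate is independent and cannot be extended."""
    independent = all(u not in candidate for v in candidate for u in adj[v])
    maximal = all(v in candidate or any(u in candidate for u in adj[v])
                  for v in adj)
    return independent and maximal


# Hypothetical input: a cycle on five nodes.
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
mis = greedy_mis(adj, random.Random(0))
```

    For a fixed ordering the output depends only on the current graph, not on how the graph was built, which is the intuition behind the history-independence property mentioned above.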
