606 research outputs found
Communication Steps for Parallel Query Processing
We consider the problem of computing a relational query on a large input
database of size , using a large number of servers. The computation is
performed in rounds, and each server can receive only
bits of data, where is a parameter that controls
replication. We examine how many global communication steps are needed to
compute . We establish both lower and upper bounds, in two settings. For a
single round of communication, we give lower bounds in the strongest possible
model, where arbitrary bits may be exchanged; we show that any algorithm
requires , where is the fractional vertex
cover of the hypergraph of . We also give an algorithm that matches the
lower bound for a specific class of databases. For multiple rounds of
communication, we present lower bounds in a model where routing decisions for a
tuple are tuple-based. We show that for the class of tree-like queries there
exists a tradeoff between the number of rounds and the space exponent
. The lower bounds for multiple rounds are the first of their
kind. Our results also imply that transitive closure cannot be computed in O(1)
rounds of communication
Equivalence Classes and Conditional Hardness in Massively Parallel Computations
The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale data processing frameworks, and has been receiving increasingly more attention over the past few years, especially in the context of classical graph problems. So far, the only way to argue lower bounds for this model is to condition on conjectures about the hardness of some specific problems, such as graph connectivity on promise graphs that are either one cycle or two cycles, usually called the one cycle vs. two cycles problem. This is unlike the traditional arguments based on conjectures about complexity classes (e.g., P ? NP), which are often more robust in the sense that refuting them would lead to groundbreaking algorithms for a whole bunch of problems.
In this paper we present connections between problems and classes of problems that allow the latter type of arguments. These connections concern the class of problems solvable in a sublogarithmic amount of rounds in the MPC model, denoted by MPC(o(log N)), and some standard classes concerning space complexity, namely L and NL, and suggest conjectures that are robust in the sense that refuting them would lead to many surprisingly fast new algorithms in the MPC model. We also obtain new conditional lower bounds, and prove new reductions and equivalences between problems in the MPC model
Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation
We develop a novel parallel decomposition strategy for unweighted, undirected
graphs, based on growing disjoint connected clusters from batches of centers
progressively selected from yet uncovered nodes. With respect to similar
previous decompositions, our strategy exercises a tighter control on both the
number of clusters and their maximum radius.
We present two important applications of our parallel graph decomposition:
(1) -center clustering approximation; and (2) diameter approximation. In
both cases, we obtain algorithms which feature a polylogarithmic approximation
factor and are amenable to a distributed implementation that is geared for
massive (long-diameter) graphs. The total space needed for the computation is
linear in the problem size, and the parallel depth is substantially sublinear
in the diameter for graphs with low doubling dimension. To the best of our
knowledge, ours are the first parallel approximations for these problems which
achieve sub-diameter parallel time, for a relevant class of graphs, using only
linear space. Besides the theoretical guarantees, our algorithms allow for a
very simple implementation on clustered architectures: we report on extensive
experiments which demonstrate their effectiveness and efficiency on large
graphs as compared to alternative known approaches.Comment: 14 page
- …