Worst-Case Optimal Algorithms for Parallel Query Processing
In this paper, we study the communication complexity for the problem of
computing a conjunctive query on a large database in a parallel setting with
$p$ servers. In contrast to previous work, where upper and lower bounds on the
communication were specified for particular structures of data (either data
without skew, or data with specific types of skew), in this work we focus on
worst-case analysis of the communication cost. The goal is to find worst-case
optimal parallel algorithms, similar to the work of [18] for sequential
algorithms.
We first show that for a single round we can obtain an optimal worst-case
algorithm. The optimal load for a conjunctive query $q$ when all relations have
size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new query-related
quantity called the edge quasi-packing number, which is different from both the
edge packing number and edge cover number of the query hypergraph. For multiple
rounds, we present algorithms that are optimal for several classes of queries.
Finally, we show a surprising connection to the external memory model, which
allows us to translate parallel algorithms to external memory algorithms. This
technique allows us to recover (within a polylogarithmic factor) several recent
results on the I/O complexity for computing join queries, and also obtain
optimal algorithms for other classes of queries.
Parallel Batch-Dynamic Graph Connectivity
In this paper, we study batch parallel algorithms for the dynamic
connectivity problem, a fundamental problem that has received considerable
attention in the sequential setting. The most well known sequential algorithm
for dynamic connectivity is the elegant level-set algorithm of Holm, de
Lichtenberg and Thorup (HDT), which achieves $O(\log^2 n)$ amortized time per
edge insertion or deletion, and $O(\log n / \log \log n)$ time per query. We
design a parallel batch-dynamic connectivity algorithm that is work-efficient
with respect to the HDT algorithm for small batch sizes, and is asymptotically
faster when the average batch size is sufficiently large. Given a sequence of
batched updates, where $\Delta$ is the average batch size of all deletions, our
algorithm achieves $O(\log n \log(1 + n/\Delta))$ expected amortized work per
edge insertion and deletion and $O(\log^3 n)$ depth w.h.p. Our algorithm
answers a batch of $k$ connectivity queries in $O(k \log(1 + n/k))$ expected
work and $O(\log n)$ depth w.h.p. To the best of our knowledge, our algorithm
is the first parallel batch-dynamic algorithm for connectivity.
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Instance and Output Optimal Parallel Algorithms for Acyclic Joins
Massively parallel join algorithms have received much attention in recent
years, and most prior work has focused on worst-case optimal algorithms. However,
the worst-case optimality of these join algorithms relies on hard instances
having very large output sizes, which rarely appear in practice. A stronger
notion of optimality is output-optimality, which requires an algorithm to be
optimal within the class of all instances sharing the same input and output
size. An even stronger notion is instance-optimality, i.e., the
algorithm is optimal on every single instance, but this may not always be
achievable.
In the traditional RAM model of computation, the classical Yannakakis
algorithm is instance-optimal on any acyclic join. But in the massively
parallel computation (MPC) model, the situation becomes much more complicated.
We first show that for the class of r-hierarchical joins, instance-optimality
can still be achieved in the MPC model. Then, we give a new MPC algorithm for
an arbitrary acyclic join with load
$O\left(\frac{\mathrm{IN}}{p} + \frac{\sqrt{\mathrm{IN} \cdot \mathrm{OUT}}}{p}\right)$,
where $\mathrm{IN}$ and $\mathrm{OUT}$ are the input and output sizes of the join, and
$p$ is the number of servers in the MPC model. This improves the MPC version of
the Yannakakis algorithm by an $O(\sqrt{\mathrm{OUT}/\mathrm{IN}})$ factor.
Furthermore, we show that this is output-optimal when $\mathrm{OUT} = O(p \cdot \mathrm{IN})$,
for every acyclic but non-r-hierarchical join. Finally, we give the first
output-sensitive lower bound for the triangle join in the MPC model, showing
that it is inherently more difficult than acyclic joins.
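To get a feel for the load bound quoted in this abstract, the following Python sketch compares it with a baseline load for the MPC version of the Yannakakis algorithm. The baseline form IN/p + OUT/p is an assumption inferred from the stated improvement factor, not a formula given in the abstract. All constant factors are dropped, and the function names and sample sizes are hypothetical.

```python
import math


def baseline_load(n_in: int, n_out: int, p: int) -> float:
    """Assumed baseline load IN/p + OUT/p (inferred, constants dropped)."""
    return n_in / p + n_out / p


def acyclic_join_load(n_in: int, n_out: int, p: int) -> float:
    """Load IN/p + sqrt(IN * OUT)/p from the abstract (constants dropped)."""
    return n_in / p + math.sqrt(n_in * n_out) / p


def improvement_factor(n_in: int, n_out: int, p: int) -> float:
    """Ratio of the assumed baseline load to the load from the abstract."""
    return baseline_load(n_in, n_out, p) / acyclic_join_load(n_in, n_out, p)


if __name__ == "__main__":
    # Hypothetical sizes: output 100 times larger than the input, 100 servers.
    n_in, n_out, p = 10**6, 10**8, 100
    print("baseline load:", baseline_load(n_in, n_out, p))
    print("new load:", acyclic_join_load(n_in, n_out, p))
    print("ratio:", improvement_factor(n_in, n_out, p))
    print("sqrt(OUT/IN):", math.sqrt(n_out / n_in))
```

On these sample sizes the ratio stays below sqrt(OUT/IN), which is consistent with the abstract describing the improvement as a factor of that order when the output is larger than the input.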
Time lower bounds for nonadaptive turnstile streaming algorithms
We say a turnstile streaming algorithm is "non-adaptive" if, during updates,
the memory cells written and read depend only on the index being updated and
random coins tossed at the beginning of the stream (and not on the memory
contents of the algorithm). Memory cells read during queries may be decided
upon adaptively. All known turnstile streaming algorithms in the literature are
non-adaptive.
We prove the first non-trivial update time lower bounds for both randomized
and deterministic turnstile streaming algorithms, which hold when the
algorithms are non-adaptive. While there has been abundant success in proving
space lower bounds, there have been no non-trivial update time lower bounds in
the turnstile model. Our lower bounds hold against classically studied problems
such as heavy hitters, point query, entropy estimation, and moment estimation.
In some cases of deterministic algorithms, our lower bounds nearly match known
upper bounds.
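As an illustration of the non-adaptivity definition in this abstract, here is a minimal Python sketch of a Count-Min style structure for point queries under turnstile updates. It is not taken from the paper: the class name, the hash family, and the parameters are assumptions chosen for illustration. The point it shows is that the cells touched by an update are a function of the updated index and of random coins drawn once at construction, never of the table contents.

```python
import random


class CountMinSketch:
    """Illustrative Count-Min style sketch (hypothetical, not from the paper)."""

    PRIME = (1 << 61) - 1  # modulus for the assumed hash family

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.width = width
        # Random coins are tossed once, before the stream starts.
        self.coeffs = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def cells_for(self, index: int) -> list:
        """Cells for an index: depends only on the index and the initial coins."""
        return [
            (row, ((a * index + b) % self.PRIME) % self.width)
            for row, (a, b) in enumerate(self.coeffs)
        ]

    def update(self, index: int, delta: int) -> None:
        """Turnstile update: delta may be positive or negative."""
        for row, col in self.cells_for(index):
            self.table[row][col] += delta

    def point_query(self, index: int) -> int:
        """Estimate of the frequency of index (minimum over its cells)."""
        return min(self.table[row][col] for row, col in self.cells_for(index))


if __name__ == "__main__":
    sketch = CountMinSketch(width=64, depth=4, seed=1)
    for index, delta in [(3, 5), (7, 2), (3, -1)]:
        sketch.update(index, delta)
    print(sketch.point_query(3), sketch.point_query(7))
```

When all true frequencies stay nonnegative, the minimum over the cells never underestimates the true frequency, because every cell holds the sum of the frequencies of all indices hashed to it.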