Worst-Case Optimal Algorithms for Parallel Query Processing
In this paper, we study the communication complexity for the problem of
computing a conjunctive query on a large database in a parallel setting with
$p$ servers. In contrast to previous work, where upper and lower bounds on the
communication were specified for particular structures of data (either data
without skew, or data with specific types of skew), in this work we focus on
worst-case analysis of the communication cost. The goal is to find worst-case
optimal parallel algorithms, similar to the work of [18] for sequential
algorithms.
We first show that for a single round we can obtain an optimal worst-case
algorithm. The optimal load for a conjunctive query $q$ when all relations have
size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new query-related
quantity called the edge quasi-packing number, which is different from both the
edge packing number and edge cover number of the query hypergraph. For multiple
rounds, we present algorithms that are optimal for several classes of queries.
Finally, we show a surprising connection to the external memory model, which
allows us to translate parallel algorithms to external memory algorithms. This
technique allows us to recover (within a polylogarithmic factor) several recent
results on the I/O complexity for computing join queries, and also obtain
optimal algorithms for other classes of queries.
Distributed Connectivity Decomposition
We present time-efficient distributed algorithms for decomposing graphs with
large edge or vertex connectivity into multiple spanning or dominating trees,
respectively. As their primary applications, these decompositions allow us to
achieve information flow with size close to the connectivity by parallelizing
it along the trees. More specifically, our distributed decomposition algorithms
are as follows:
(I) A decomposition of each undirected graph with vertex-connectivity $k$
into (fractionally) vertex-disjoint weighted dominating trees with total weight
$\Omega(k/\log n)$, in $\tilde{O}(D+\sqrt{n})$ rounds.
(II) A decomposition of each undirected graph with edge-connectivity $\lambda$
into (fractionally) edge-disjoint weighted spanning trees with total
weight $\lceil\frac{\lambda-1}{2}\rceil(1-\varepsilon)$, in
$\tilde{O}(D+\sqrt{n\lambda})$ rounds.
We also show round complexity lower bounds of $\tilde{\Omega}(D+\sqrt{n/k})$
and $\tilde{\Omega}(D+\sqrt{n/\lambda})$
for the above two decompositions,
using techniques of [Das Sarma et al., STOC'11]. Moreover, our
vertex-connectivity decomposition extends to centralized algorithms and
improves the time complexity of [Censor-Hillel et al., SODA'14] from
$O(n^3)$ to near-optimal $\tilde{O}(m)$.
As corollaries, we also get distributed oblivious routing broadcast with
$O(1)$-competitive edge-congestion and $O(\log n)$-competitive
vertex-congestion. Furthermore, the vertex connectivity decomposition leads to
near-time-optimal $O(\log n)$-approximation of vertex connectivity: centralized
$\tilde{O}(m)$ and distributed $\tilde{O}(D+\sqrt{n})$. The former moves
toward the 1974 conjecture of Aho, Hopcroft, and Ullman postulating an
$O(m)$ centralized exact algorithm, while the latter is the first distributed
vertex connectivity approximation.
Communication Steps for Parallel Query Processing
We consider the problem of computing a relational query $q$ on a large input
database of size $n$, using a large number $p$ of servers. The computation is
performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$
bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls
replication. We examine how many global communication steps are needed to
compute $q$. We establish both lower and upper bounds, in two settings. For a
single round of communication, we give lower bounds in the strongest possible
model, where arbitrary bits may be exchanged; we show that any algorithm
requires $\varepsilon \geq 1-1/\tau^*$, where $\tau^*$ is the fractional vertex
cover of the hypergraph of $q$. We also give an algorithm that matches the
lower bound for a specific class of databases. For multiple rounds of
communication, we present lower bounds in a model where routing decisions for a
tuple are tuple-based. We show that for the class of tree-like queries there
exists a tradeoff between the number of rounds and the space exponent
$\varepsilon$. The lower bounds for multiple rounds are the first of their
kind. Our results also imply that transitive closure cannot be computed in O(1)
rounds of communication.
Instance and Output Optimal Parallel Algorithms for Acyclic Joins
Massively parallel join algorithms have received much attention in recent
years, and most prior work has focused on worst-case optimal algorithms. However,
the worst-case optimality of these join algorithms relies on hard instances
having very large output sizes, which rarely appear in practice. A stronger
notion of optimality is output-optimal, which requires an algorithm to be
optimal within the class of all instances sharing the same input and output
size. An even stronger optimality is instance-optimal, i.e., the
algorithm is optimal on every single instance, but this may not always be
achievable.
In the traditional RAM model of computation, the classical Yannakakis
algorithm is instance-optimal on any acyclic join. But in the massively
parallel computation (MPC) model, the situation becomes much more complicated.
We first show that for the class of r-hierarchical joins, instance-optimality
can still be achieved in the MPC model. Then, we give a new MPC algorithm for
an arbitrary acyclic join with load
$O\left(\frac{\mathrm{IN}}{p} + \frac{\sqrt{\mathrm{IN} \cdot \mathrm{OUT}}}{p}\right)$,
where $\mathrm{IN}$ and $\mathrm{OUT}$ are the input and output sizes of the join, and
$p$ is the number of servers in the MPC model. This improves the MPC version of
the Yannakakis algorithm by an $O(\sqrt{\mathrm{OUT}/\mathrm{IN}})$ factor.
Furthermore, we show that this is output-optimal when $\mathrm{OUT} = O(p \cdot \mathrm{IN})$,
for every acyclic but non-r-hierarchical join. Finally, we give the first
output-sensitive lower bound for the triangle join in the MPC model, showing
that it is inherently more difficult than acyclic joins.
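The load bound stated in the abstract above can be made concrete with a small numeric sketch. It is only an illustration: the constants hidden by the O-notation are set to 1, and the baseline load of IN/p + OUT/p for the MPC version of the Yannakakis algorithm is an assumption inferred from the stated improvement factor, not a formula quoted in the abstract. The function names are hypothetical.

```python
import math


def yannakakis_mpc_load(in_size, out_size, p):
    # Assumed baseline: IN/p + OUT/p, inferred from the sqrt(OUT/IN)
    # improvement factor; hidden constants are taken to be 1.
    return in_size / p + out_size / p


def new_acyclic_load(in_size, out_size, p):
    # Bound from the abstract: IN/p + sqrt(IN * OUT)/p, constants set to 1.
    return in_size / p + math.sqrt(in_size * out_size) / p


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    IN, OUT, P = 10**6, 10**8, 100
    old = yannakakis_mpc_load(IN, OUT, P)
    new = new_acyclic_load(IN, OUT, P)
    print("baseline load:", old)
    print("new load:     ", new)
    print("ratio:        ", old / new)
    print("sqrt(OUT/IN): ", math.sqrt(OUT / IN))
```

When OUT is much larger than IN, the ratio of the two loads approaches sqrt(OUT/IN), which is the improvement factor the abstract reports.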
Optimal Distributed Covering Algorithms
We present a time-optimal deterministic distributed algorithm for approximating a minimum weight vertex cover in hypergraphs of rank f. This problem is equivalent to the Minimum Weight Set Cover problem in which the frequency of every element is bounded by f. The approximation factor of our algorithm is (f+epsilon). Let Delta denote the maximum degree in the hypergraph. Our algorithm runs in the CONGEST model and requires O(log Delta / log log Delta) rounds, for constants epsilon in (0,1] and f in N^+. This is the first distributed algorithm for this problem whose running time depends on neither the vertex weights nor the number of vertices. It thus adds another member to the exclusive family of provably optimal distributed algorithms.
For constant values of f and epsilon, our algorithm improves over the (f+epsilon)-approximation algorithm of [Fabian Kuhn et al., 2006] whose running time is O(log Delta + log W), where W is the ratio between the largest and smallest vertex weights in the graph. Our algorithm also achieves an f-approximation for the problem in O(f log n) rounds, improving over the classical result of [Samir Khuller et al., 1994] that achieves a running time of O(f log^2 n). Finally, for weighted vertex cover (f=2) our algorithm achieves a deterministic running time of O(log n), matching the previously best randomized result of [Koufogiannakis and Young, 2011].
We also show that integer covering-programs can be reduced to the Minimum Weight Set Cover problem in the distributed setting. This allows us to achieve an (f+epsilon)-approximate integral solution in O((1+f/log n)* ((log Delta)/(log log Delta) + (f * log M)^{1.01}* log epsilon^{-1}* (log Delta)^{0.01})) rounds, where f bounds the number of variables in a constraint, Delta bounds the number of constraints a variable appears in, and M=max {1, ceil[1/a_{min}]}, where a_{min} is the smallest normalized constraint coefficient. This improves over the results of [Fabian Kuhn et al., 2006] for the integral case, which combined with rounding achieves the same guarantees in O(epsilon^{-4}* f^4 * log f * log(M * Delta)) rounds.
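To make the problem statement above concrete, here is a minimal sequential sketch of the classical primal-dual (local-ratio) f-approximation for minimum weight vertex cover in a hypergraph of rank f. It is not the distributed algorithm of the abstract; it only illustrates the covering problem and the role of f. The function name and the input encoding are hypothetical, and each hyperedge is assumed to list distinct vertices.

```python
def primal_dual_vertex_cover(weights, hyperedges):
    """Classical sequential f-approximation (not the paper's algorithm).

    weights: dict mapping vertex -> positive weight.
    hyperedges: list of tuples of distinct vertices, each of size <= f.
    Returns a set of vertices that touches every hyperedge.
    """
    residual = dict(weights)  # weight not yet "paid for" by edge charges
    cover = set()
    for edge in hyperedges:
        if any(v in cover for v in edge):
            continue  # this hyperedge is already covered
        # Charge the edge as much as its cheapest vertex can still absorb.
        delta = min(residual[v] for v in edge)
        for v in edge:
            residual[v] -= delta
            if residual[v] == 0:
                cover.add(v)  # vertex is fully paid for, so take it
    return cover


if __name__ == "__main__":
    # Illustrative instance with rank f = 3; not taken from the paper.
    weights = {"a": 3, "b": 1, "c": 2, "d": 2}
    hyperedges = [("a", "b", "c"), ("b", "d"), ("c", "d")]
    cover = primal_dual_vertex_cover(weights, hyperedges)
    print(sorted(cover), sum(weights[v] for v in cover))
```

Every chosen vertex is paid for in full by the charges on its hyperedges, and each charge is counted for at most f vertices, so the cover weighs at most f times the total charge, which in turn is at most the optimum.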
A Near-Optimal Parallel Algorithm for Joining Binary Relations
We present a constant-round algorithm in the massively parallel computation
(MPC) model for evaluating a natural join where every input relation has two
attributes. Our algorithm achieves a load of $\tilde{O}(m/p^{1/\rho})$ where
$m$ is the total size of the input relations, $p$ is the number of machines,
$\rho$ is the join's fractional edge covering number, and $\tilde{O}(\cdot)$ hides
a polylogarithmic factor. The load matches a known lower bound up to a
polylogarithmic factor. At the core of the proposed algorithm is a new theorem
(which we name the isolated Cartesian product theorem) that provides
fresh insight into the problem's mathematical structure. Our result implies
that the subgraph enumeration problem, where the goal is to report all
the occurrences of a constant-sized subgraph pattern, can be settled optimally
(up to a polylogarithmic factor) in the MPC model.
Comment: Short versions of this article appeared in PODS'17 and ICDT'20. The
article is under submission to a journal. The red sentences are highlighted
for the journal's reviewer.
Melting and freezing of argon in a granular packing of linear mesopore arrays
Freezing and melting of Ar condensed in a granular packing of template-grown
arrays of linear mesopores (SBA-15, mean pore diameter 8 nm) have been
studied by measuring the specific heat C as a function of the fractional filling of
the pores. While interfacial melting leads to a single melting peak in C,
homogeneous and heterogeneous freezing along with a delayering transition for
partial fillings of the pores result in a complex freezing mechanism
explainable only by a consideration of regular adsorption sites (in the
cylindrical mesopores) and irregular adsorption sites (in niches of the rough
external surfaces of the grains, and at points of mutual contact of the powder
grains). The tensile pressure release upon reaching bulk liquid/vapor
coexistence quantitatively accounts for an upward shift of the
melting/freezing temperature observed while overfilling the mesopores.
Comment: 4 pages, 4 figures, to appear as a Letter in Physical Review Letters.