Worst-Case Optimal Algorithms for Parallel Query Processing

Authors: Paul Beame, Paraschos Koutris, Dan Suciu
Publication date: 01/01/2016

In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of [18] for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query $q$ when all relations have size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several classes of queries. Finally, we show a surprising connection to the external memory model, which allows us to translate parallel algorithms to external memory algorithms. This technique allows us to recover (within a polylogarithmic factor) several recent results on the I/O complexity for computing join queries, and also obtain optimal algorithms for other classes of queries.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A Simple Parallel Algorithm for Natural Joins on Binary Relations

Author: Yufei Tao
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Database Theory (ICDT 2020)
Publication date: 01/01/2020

Dagstuhl Research Online Publication Server

Distributed Connectivity Decomposition

Authors: Keren Censor-Hillel, Mohsen Ghaffari, Fabian Kuhn
Publication date: 21/11/2013

We present time-efficient distributed algorithms for decomposing graphs with large edge or vertex connectivity into multiple spanning or dominating trees, respectively. As their primary applications, these decompositions allow us to achieve information flow with size close to the connectivity by parallelizing it along the trees. More specifically, our distributed decomposition algorithms are as follows: (I) A decomposition of each undirected graph with vertex-connectivity $k$ into (fractionally) vertex-disjoint weighted dominating trees with total weight $\Omega(\frac{k}{\log n})$, in $\widetilde{O}(D+\sqrt{n})$ rounds. (II) A decomposition of each undirected graph with edge-connectivity $\lambda$ into (fractionally) edge-disjoint weighted spanning trees with total weight $\lceil\frac{\lambda-1}{2}\rceil(1-\varepsilon)$, in $\widetilde{O}(D+\sqrt{n\lambda})$ rounds. We also show round complexity lower bounds of $\tilde{\Omega}(D+\sqrt{\frac{n}{k}})$ and $\tilde{\Omega}(D+\sqrt{\frac{n}{\lambda}})$ for the above two decompositions, using techniques of [Das Sarma et al., STOC'11]. Moreover, our vertex-connectivity decomposition extends to centralized algorithms and improves the time complexity of [Censor-Hillel et al., SODA'14] from $O(n^3)$ to near-optimal $\tilde{O}(m)$. As corollaries, we also get distributed oblivious routing broadcast with $O(1)$-competitive edge-congestion and $O(\log n)$-competitive vertex-congestion. Furthermore, the vertex connectivity decomposition leads to near-time-optimal $O(\log n)$-approximation of vertex connectivity: centralized $\widetilde{O}(m)$ and distributed $\tilde{O}(D+\sqrt{n})$. The former moves toward the 1974 conjecture of Aho, Hopcroft, and Ullman postulating an $O(m)$ centralized exact algorithm, while the latter is the first distributed vertex connectivity approximation.

arXiv.org e-Print Archive

CiteSeerX

Communication Steps for Parallel Query Processing

Authors: Paul Beame, Paraschos Koutris, Dan Suciu
Publication date: 01/01/2013

We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls replication. We examine how many global communication steps are needed to compute $q$. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires $\varepsilon \geq 1-1/\tau^*$, where $\tau^*$ is the fractional vertex cover of the hypergraph of $q$. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent $\varepsilon$. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in $O(1)$ rounds of communication.
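As a rough illustration of the single-round bound quoted in this abstract, the sketch below evaluates 1 - 1/tau* for two small queries. The function name `min_space_exponent` is hypothetical, and the tau* values used (3/2 for the triangle query, 1 for a two-atom chain) are standard textbook values supplied here as assumptions, not figures taken from the abstract.

```python
from fractions import Fraction


def min_space_exponent(tau_star: Fraction) -> Fraction:
    """Return 1 - 1/tau*, the single-round lower bound on the space exponent.

    tau_star is the fractional vertex cover number of the query hypergraph;
    it is at least 1 for any query with at least one non-empty atom.
    """
    if tau_star < 1:
        raise ValueError("tau* must be at least 1")
    return 1 - 1 / tau_star


# Assumed tau* values for two illustrative queries (not from the abstract):
# triangle R(x,y), S(y,z), T(z,x): weight 1/2 on each variable gives 3/2.
# chain R(x,y), S(y,z): covering y alone gives 1.
triangle_bound = min_space_exponent(Fraction(3, 2))
chain_bound = min_space_exponent(Fraction(1))
print(triangle_bound, chain_bound)
```

Under these assumptions the triangle query needs a space exponent of at least 1/3 in one round, while the chain query gives the trivial bound 0.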

arXiv.org e-Print Archive

CiteSeerX

Crossref

Instance and Output Optimal Parallel Algorithms for Acyclic Joins

Authors: Xiao Hu, Ke Yi
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019

Massively parallel join algorithms have received much attention in recent years, while most prior work has focused on worst-case optimal algorithms. However, the worst-case optimality of these join algorithms relies on hard instances having very large output sizes, which rarely appear in practice. A stronger notion of optimality is output-optimal, which requires an algorithm to be optimal within the class of all instances sharing the same input and output size. An even stronger optimality is instance-optimal, i.e., the algorithm is optimal on every single instance, but this may not always be achievable. In the traditional RAM model of computation, the classical Yannakakis algorithm is instance-optimal on any acyclic join. But in the massively parallel computation (MPC) model, the situation becomes much more complicated. We first show that for the class of r-hierarchical joins, instance-optimality can still be achieved in the MPC model. Then, we give a new MPC algorithm for an arbitrary acyclic join with load $O\left(\frac{\mathrm{IN}}{p} + \frac{\sqrt{\mathrm{IN} \cdot \mathrm{OUT}}}{p}\right)$, where $\mathrm{IN}$ and $\mathrm{OUT}$ are the input and output sizes of the join, and $p$ is the number of servers in the MPC model. This improves the MPC version of the Yannakakis algorithm by an $O\left(\sqrt{\mathrm{OUT}/\mathrm{IN}}\right)$ factor. Furthermore, we show that this is output-optimal when $\mathrm{OUT} = O(p \cdot \mathrm{IN})$, for every acyclic but non-r-hierarchical join. Finally, we give the first output-sensitive lower bound for the triangle join in the MPC model, showing that it is inherently more difficult than acyclic joins.
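To make the stated improvement factor concrete, here is a minimal numeric sketch that evaluates the load expression from this abstract with all hidden constants set to 1. The function names are hypothetical, and the baseline `yannakakis_mpc_load` assumes a load of (IN + OUT)/p for the MPC version of the Yannakakis algorithm; the abstract only gives the improvement factor, so that baseline is an assumption chosen to be consistent with it.

```python
import math


def acyclic_join_load(in_size: float, out_size: float, p: int) -> float:
    """Load IN/p + sqrt(IN * OUT)/p from the abstract, constants dropped."""
    return in_size / p + math.sqrt(in_size * out_size) / p


def yannakakis_mpc_load(in_size: float, out_size: float, p: int) -> float:
    """Assumed baseline load (IN + OUT)/p, constants dropped."""
    return (in_size + out_size) / p


# Illustrative sizes only; none of these numbers come from the paper.
in_size, out_size, p = 10**6, 10**8, 100
new_load = acyclic_join_load(in_size, out_size, p)
old_load = yannakakis_mpc_load(in_size, out_size, p)
print(new_load, old_load, old_load / new_load)
```

For these illustrative sizes the ratio of the two loads stays below sqrt(OUT/IN) = 10, in line with the factor quoted above.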

arXiv.org e-Print Archive

Crossref

Optimal Distributed Covering Algorithms

Authors: Ran Ben-Basat, Guy Even, Ken-ichi Kawarabayashi, Gregory Schwartzman
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Distributed Computing (DISC 2019)
Publication date: 01/01/2019

We present a time-optimal deterministic distributed algorithm for approximating a minimum weight vertex cover in hypergraphs of rank f. This problem is equivalent to the Minimum Weight Set Cover problem in which the frequency of every element is bounded by f. The approximation factor of our algorithm is (f+epsilon). Let Delta denote the maximum degree in the hypergraph. Our algorithm runs in the CONGEST model and requires O(log Delta / log log Delta) rounds, for constants epsilon in (0,1] and f in N^+. This is the first distributed algorithm for this problem whose running time depends on neither the vertex weights nor the number of vertices, thus adding another member to the exclusive family of provably optimal distributed algorithms. For constant values of f and epsilon, our algorithm improves over the (f+epsilon)-approximation algorithm of [Fabian Kuhn et al., 2006] whose running time is O(log Delta + log W), where W is the ratio between the largest and smallest vertex weights in the graph. Our algorithm also achieves an f-approximation for the problem in O(f log n) rounds, improving over the classical result of [Samir Khuller et al., 1994] that achieves a running time of O(f log^2 n). Finally, for weighted vertex cover (f=2) our algorithm achieves a deterministic running time of O(log n), matching the previously best randomized result of [Koufogiannakis and Young, 2011]. We also show that integer covering-programs can be reduced to the Minimum Weight Set Cover problem in the distributed setting. This allows us to achieve an (f+epsilon)-approximate integral solution in O((1+f/log n) * ((log Delta)/(log log Delta) + (f * log M)^{1.01} * log epsilon^{-1} * (log Delta)^{0.01})) rounds, where f bounds the number of variables in a constraint, Delta bounds the number of constraints a variable appears in, and M = max {1, ceil[1/a_{min}]}, where a_{min} is the smallest normalized constraint coefficient. This improves over the results of [Fabian Kuhn et al., 2006] for the integral case, which combined with rounding achieves the same guarantees in O(epsilon^{-4} * f^4 * log f * log(M * Delta)) rounds.

arXiv.org e-Print Archive

Crossref

Dagstuhl Research Online Publication Server

A Near-Optimal Parallel Algorithm for Joining Binary Relations

Authors: Bas Ketsman, Dan Suciu, Yufei Tao
Publication date: 19/10/2021

We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of $\tilde{O}(m/p^{1/\rho})$, where $m$ is the total size of the input relations, $p$ is the number of machines, $\rho$ is the join's fractional edge covering number, and $\tilde{O}(\cdot)$ hides a polylogarithmic factor. The load matches a known lower bound up to a polylogarithmic factor. At the core of the proposed algorithm is a new theorem (which we name the isolated cartesian product theorem) that provides fresh insight into the problem's mathematical structure. Our result implies that the subgraph enumeration problem, where the goal is to report all the occurrences of a constant-sized subgraph pattern, can be settled optimally (up to a polylogarithmic factor) in the MPC model.

Comment: Short versions of this article appeared in PODS'17 and ICDT'20. The article is under submission to a journal. The red sentences are highlighted for the journal's reviewer.
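The load expression m/p^{1/rho} from this abstract can be evaluated directly, as in the sketch below, which drops the polylogarithmic factor and all constants. The function name `binary_join_load` is hypothetical, and the value rho = 3/2 for the triangle join is the standard fractional edge covering number, supplied here as an assumption and not taken from the abstract.

```python
def binary_join_load(m: float, p: int, rho: float) -> float:
    """Evaluate m / p**(1/rho), ignoring polylog factors and constants.

    m   : total size of the input relations
    p   : number of machines
    rho : fractional edge covering number of the join (must be >= 1)
    """
    if rho < 1:
        raise ValueError("rho must be at least 1")
    return m / p ** (1.0 / rho)


# Assumed example: triangle join with rho = 3/2, so the load is m / p**(2/3).
# The sizes below are illustrative only.
m, p = 1_600_000, 64
print(binary_join_load(m, p, rho=1.5))
```

With rho = 1 the expression reduces to the ideal m/p; larger rho means less benefit from each added machine.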

arXiv.org e-Print Archive

Episciences.org

Melting and freezing of argon in a granular packing of linear mesopore arrays

Authors: Christof Schaefer, Dirk Wallacher, Klaus Knorr, Patrick Huber, Tommy Hofmann, Y. C. Yortsos
Publication venue: American Physical Society (APS)
Publication date: 29/03/2008

Freezing and melting of Ar condensed in a granular packing of template-grown arrays of linear mesopores (SBA-15, mean pore diameter 8 nm) have been studied by measuring the specific heat C as a function of the fractional filling of the pores. While interfacial melting leads to a single melting peak in C, homogeneous and heterogeneous freezing along with a delayering transition for partial fillings of the pores result in a complex freezing mechanism explainable only by a consideration of regular adsorption sites (in the cylindrical mesopores) and irregular adsorption sites (in niches of the rough external surfaces of the grains, and at points of mutual contact of the powder grains). The tensile pressure release upon reaching bulk liquid/vapor coexistence quantitatively accounts for an upward shift of the melting/freezing temperature observed while overfilling the mesopores.

Comment: 4 pages, 4 figures, to appear as a Letter in Physical Review Letters.

arXiv.org e-Print Archive

Crossref