2,004 research outputs found

    Worst-Case Optimal Algorithms for Parallel Query Processing

    Get PDF
    In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with pp servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of [18] for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query qq when all relations have size equal to MM is O(M/p1/ψ)O(M/p^{1/\psi^*}), where ψ\psi^* is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several classes of queries. Finally, we show a surprising connection to the external memory model, which allows us to translate parallel algorithms to external memory algorithms. This technique allows us to recover (within a polylogarithmic factor) several recent results on the I/O complexity for computing join queries, and also obtain optimal algorithms for other classes of queries

    A Simple Parallel Algorithm for Natural Joins on Binary Relations

    Get PDF

    Distributed Connectivity Decomposition

    Full text link
    We present time-efficient distributed algorithms for decomposing graphs with large edge or vertex connectivity into multiple spanning or dominating trees, respectively. As their primary applications, these decompositions allow us to achieve information flow with size close to the connectivity by parallelizing it along the trees. More specifically, our distributed decomposition algorithms are as follows: (I) A decomposition of each undirected graph with vertex-connectivity kk into (fractionally) vertex-disjoint weighted dominating trees with total weight Ω(klogn)\Omega(\frac{k}{\log n}), in O~(D+n)\widetilde{O}(D+\sqrt{n}) rounds. (II) A decomposition of each undirected graph with edge-connectivity λ\lambda into (fractionally) edge-disjoint weighted spanning trees with total weight λ12(1ε)\lceil\frac{\lambda-1}{2}\rceil(1-\varepsilon), in O~(D+nλ)\widetilde{O}(D+\sqrt{n\lambda}) rounds. We also show round complexity lower bounds of Ω~(D+nk)\tilde{\Omega}(D+\sqrt{\frac{n}{k}}) and Ω~(D+nλ)\tilde{\Omega}(D+\sqrt{\frac{n}{\lambda}}) for the above two decompositions, using techniques of [Das Sarma et al., STOC'11]. Moreover, our vertex-connectivity decomposition extends to centralized algorithms and improves the time complexity of [Censor-Hillel et al., SODA'14] from O(n3)O(n^3) to near-optimal O~(m)\tilde{O}(m). As corollaries, we also get distributed oblivious routing broadcast with O(1)O(1)-competitive edge-congestion and O(logn)O(\log n)-competitive vertex-congestion. Furthermore, the vertex connectivity decomposition leads to near-time-optimal O(logn)O(\log n)-approximation of vertex connectivity: centralized O~(m)\widetilde{O}(m) and distributed O~(D+n)\tilde{O}(D+\sqrt{n}). The former moves toward the 1974 conjecture of Aho, Hopcroft, and Ullman postulating an O(m)O(m) centralized exact algorithm while the latter is the first distributed vertex connectivity approximation

    Communication Steps for Parallel Query Processing

    Full text link
    We consider the problem of computing a relational query qq on a large input database of size nn, using a large number pp of servers. The computation is performed in rounds, and each server can receive only O(n/p1ε)O(n/p^{1-\varepsilon}) bits of data, where ε[0,1]\varepsilon \in [0,1] is a parameter that controls replication. We examine how many global communication steps are needed to compute qq. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires ε11/τ\varepsilon \geq 1-1/\tau^*, where τ\tau^* is the fractional vertex cover of the hypergraph of qq. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent ε\varepsilon. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in O(1) rounds of communication

    Instance and Output Optimal Parallel Algorithms for Acyclic Joins

    Full text link
    Massively parallel join algorithms have received much attention in recent years, while most prior work has focused on worst-optimal algorithms. However, the worst-case optimality of these join algorithms relies on hard instances having very large output sizes, which rarely appear in practice. A stronger notion of optimality is {\em output-optimal}, which requires an algorithm to be optimal within the class of all instances sharing the same input and output size. An even stronger optimality is {\em instance-optimal}, i.e., the algorithm is optimal on every single instance, but this may not always be achievable. In the traditional RAM model of computation, the classical Yannakakis algorithm is instance-optimal on any acyclic join. But in the massively parallel computation (MPC) model, the situation becomes much more complicated. We first show that for the class of r-hierarchical joins, instance-optimality can still be achieved in the MPC model. Then, we give a new MPC algorithm for an arbitrary acyclic join with load O ({\IN \over p} + {\sqrt{\IN \cdot \OUT} \over p}), where \IN,\OUT are the input and output sizes of the join, and pp is the number of servers in the MPC model. This improves the MPC version of the Yannakakis algorithm by an O (\sqrt{\OUT \over \IN} ) factor. Furthermore, we show that this is output-optimal when \OUT = O(p \cdot \IN), for every acyclic but non-r-hierarchical join. Finally, we give the first output-sensitive lower bound for the triangle join in the MPC model, showing that it is inherently more difficult than acyclic joins

    Optimal Distributed Covering Algorithms

    Get PDF
    We present a time-optimal deterministic distributed algorithm for approximating a minimum weight vertex cover in hypergraphs of rank f. This problem is equivalent to the Minimum Weight Set Cover problem in which the frequency of every element is bounded by f. The approximation factor of our algorithm is (f+epsilon). Let Delta denote the maximum degree in the hypergraph. Our algorithm runs in the congest model and requires O(log{Delta} / log log Delta) rounds, for constants epsilon in (0,1] and f in N^+. This is the first distributed algorithm for this problem whose running time does not depend on the vertex weights nor the number of vertices. Thus adding another member to the exclusive family of provably optimal distributed algorithms. For constant values of f and epsilon, our algorithm improves over the (f+epsilon)-approximation algorithm of [Fabian Kuhn et al., 2006] whose running time is O(log Delta + log W), where W is the ratio between the largest and smallest vertex weights in the graph. Our algorithm also achieves an f-approximation for the problem in O(f log n) rounds, improving over the classical result of [Samir Khuller et al., 1994] that achieves a running time of O(f log^2 n). Finally, for weighted vertex cover (f=2) our algorithm achieves a deterministic running time of O(log n), matching the randomized previously best result of [Koufogiannakis and Young, 2011]. We also show that integer covering-programs can be reduced to the Minimum Weight Set Cover problem in the distributed setting. This allows us to achieve an (f+epsilon)-approximate integral solution in O((1+f/log n)* ((log Delta)/(log log Delta) + (f * log M)^{1.01}* log epsilon^{-1}* (log Delta)^{0.01})) rounds, where f bounds the number of variables in a constraint, Delta bounds the number of constraints a variable appears in, and M=max {1, ceil[1/a_{min}]}, where a_{min} is the smallest normalized constraint coefficient. This improves over the results of [Fabian Kuhn et al., 2006] for the integral case, which combined with rounding achieves the same guarantees in O(epsilon^{-4}* f^4 * log f * log(M * Delta)) rounds

    A Near-Optimal Parallel Algorithm for Joining Binary Relations

    Full text link
    We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of O~(m/p1/ρ)\tilde{O}(m/p^{1/\rho}) where mm is the total size of the input relations, pp is the number of machines, ρ\rho is the join's fractional edge covering number, and O~(.)\tilde{O}(.) hides a polylogarithmic factor. The load matches a known lower bound up to a polylogarithmic factor. At the core of the proposed algorithm is a new theorem (which we name {\em the isolated cartesian product theorem}) that provides fresh insight into the problem's mathematical structure. Our result implies that the {\em subgraph enumeration problem}, where the goal is to report all the occurrences of a constant-sized subgraph pattern, can be settled optimally (up to a polylogarithmic factor) in the MPC model.Comment: Short versions of this article appeared in PODS'17 and ICDT'20. The article is under submission to a journal. The red sentences are highlighted for the journal's reviewer

    Melting and freezing of argon in a granular packing of linear mesopore arrays

    Full text link
    Freezing and melting of Ar condensed in a granular packing of template-grown arrays of linear mesopores (SBA-15, mean pore diameter 8 nanometer) has been studied by specific heat measurements C as a function of fractional filling of the pores. While interfacial melting leads to a single melting peak in C, homogeneous and heterogeneous freezing along with a delayering transition for partial fillings of the pores result in a complex freezing mechanism explainable only by a consideration of regular adsorption sites (in the cylindrical mesopores) and irregular adsorption sites (in niches of the rough external surfaces of the grains, and at points of mutual contact of the powder grains). The tensile pressure release upon reaching bulk liquid/vapor coexistence quantitatively accounts for an upward shift of the melting/freeezing temperature observed while overfilling the mesopores.Comment: 4 pages, 4 figures, to appear as a Letter in Physical Review Letter
    corecore