470 research outputs found

    Communication Steps for Parallel Query Processing

    We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits of data, where $\varepsilon \in [0,1]$ is a parameter that controls replication. We examine how many global communication steps are needed to compute $q$. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires $\varepsilon \geq 1-1/\tau^*$, where $\tau^*$ is the fractional vertex cover of the hypergraph of $q$. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent $\varepsilon$. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in $O(1)$ rounds of communication.
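To make the one-round bound $\varepsilon \geq 1-1/\tau^*$ concrete, here is a minimal sketch for the triangle query $q(x,y,z) = R(x,y), S(y,z), T(z,x)$, which is an illustrative choice and is not taken from the abstract. The helper names are hypothetical. The sketch checks a fractional vertex cover and a fractional edge packing of equal total weight; by LP duality that shared value is $\tau^*$.

```python
from fractions import Fraction

# Hypergraph of the triangle query: one hyperedge per relation (illustrative example).
EDGES = {"R": ("x", "y"), "S": ("y", "z"), "T": ("z", "x")}


def is_vertex_cover(weights, edges):
    """Every hyperedge must carry total vertex weight of at least 1."""
    return all(w >= 0 for w in weights.values()) and all(
        sum(weights[v] for v in vs) >= 1 for vs in edges.values()
    )


def is_edge_packing(weights, edges):
    """Every vertex must carry total edge weight of at most 1."""
    vertices = {v for vs in edges.values() for v in vs}
    return all(w >= 0 for w in weights.values()) and all(
        sum(weights[e] for e, vs in edges.items() if v in vs) <= 1
        for v in vertices
    )


def min_space_exponent(tau_star):
    """Lower bound on epsilon for one round: 1 - 1/tau*."""
    return 1 - 1 / Fraction(tau_star)


half = Fraction(1, 2)
cover = {"x": half, "y": half, "z": half}
packing = {"R": half, "S": half, "T": half}

# A feasible cover and a feasible packing with equal weight are both optimal.
assert is_vertex_cover(cover, EDGES) and is_edge_packing(packing, EDGES)
assert sum(cover.values()) == sum(packing.values())

tau_star = sum(cover.values())
eps = min_space_exponent(tau_star)
print(f"tau* = {tau_star}, epsilon >= {eps}")
```

Under these assumptions the triangle query has $\tau^* = 3/2$, so a one-round algorithm needs $\varepsilon \geq 1/3$, that is, each server receives on the order of $n/p^{2/3}$ bits.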

    Worst-Case Optimal Algorithms for Parallel Query Processing

    In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis of the communication cost. The goal is to find worst-case optimal parallel algorithms, similar to the work of [18] for sequential algorithms. We first show that for a single round we can obtain an optimal worst-case algorithm. The optimal load for a conjunctive query $q$ when all relations have size equal to $M$ is $O(M/p^{1/\psi^*})$, where $\psi^*$ is a new query-related quantity called the edge quasi-packing number, which is different from both the edge packing number and edge cover number of the query hypergraph. For multiple rounds, we present algorithms that are optimal for several classes of queries. Finally, we show a surprising connection to the external memory model, which allows us to translate parallel algorithms to external memory algorithms. This technique allows us to recover (within a polylogarithmic factor) several recent results on the I/O complexity for computing join queries, and also obtain optimal algorithms for other classes of queries.

    Parallel Query Processing on 2D Mesh and Linear Array Architectures

    As the size of the web grows, it is necessary to parallelize the process of retrieving information from the web. Incorporating parallelism in search engines is one of the approaches towards achieving this aim. This paper presents an algorithm for query processing on the 2D mesh architecture and two algorithms for linear array architectures. We attempt to exploit the arrangement of processors and the communication pattern in both 2D mesh and linear array architectures to attain high speedup and efficiency for query-keyword comparisons. A cost model is presented for each algorithm based on both processing and communication cost. The proposed algorithms are evaluated using speedup and efficiency performance metrics. For the same number of processors, 2D Mesh_QP outperforms both linear array algorithms (LA_QPAKP and LA_QPKE). Keywords: 2D Mesh, Linear Arrays, Parallel computing, Query processing
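The speedup and efficiency metrics named in this abstract are conventionally defined as $S = T_1 / T_p$ and $E = S / p$. A minimal sketch of those standard definitions follows; the function names and the timings are hypothetical and are not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p: how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Efficiency E = S / p: speedup per processor, 1.0 being ideal."""
    return speedup(t_serial, t_parallel) / processors


# Made-up timings for illustration only: 120 s serially, 10 s on 16 processors.
s = speedup(120.0, 10.0)
e = efficiency(120.0, 10.0, 16)
print(f"speedup = {s}, efficiency = {e}")
```

With these made-up numbers the speedup is 12.0 and the efficiency is 0.75, meaning a quarter of the aggregate processor time goes to communication or idling.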

    Controlling Disk Contention for Parallel Query Processing in Shared Disk Database Systems

    Shared Disk database systems offer high flexibility for parallel transaction and query processing, because each node has access to the entire database and can therefore process any transaction, query, or subquery. Compared to Shared Nothing, this is particularly advantageous for scan queries, for which the degree of intra-query parallelism as well as the scan processors themselves can be chosen dynamically. On the other hand, there is the danger of disk contention between subqueries, in particular for index scans. We present a detailed simulation study to analyze the effectiveness of parallel scan processing in Shared Disk database systems. In particular, we investigate the relationship between the degree of declustering and the degree of scan parallelism for relation scans, clustered index scans, and non-clustered index scans. Furthermore, we study the usefulness of disk caches and prefetching for limiting disk contention. Finally, we show the importance of dynamically choosing the degree of scan parallelism to control disk contention in multi-user mode.

    Join Execution Using Fragmented Columnar Indices on GPU and MIC

    The paper describes an approach to parallel natural join execution on computing clusters with GPU and MIC coprocessors. This approach is based on a decomposition of the natural join relational operator using column indices and domain-interval fragmentation. This decomposition allows resource-intensive relational operators to be executed in parallel without data transfers. All column index fragments are stored in main memory. To process the join of two relations, each pair of index fragments corresponding to a particular domain interval is joined on a separate processor core. The described approach allows efficient parallel query processing for very large databases on modern computing cluster systems with many-core accelerators. A prototype of the DBMS coprocessor system was implemented using this technique. The results of computational experiments for GPU and Xeon Phi are presented. These results confirm the efficiency of the proposed approach.
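The idea of joining fragment pairs per domain interval can be sketched as follows. This is only an illustration under simplifying assumptions: it fragments plain tuples rather than the paper's column indices, it runs the fragment pairs serially instead of on separate cores, and all names and the sample data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def fragment(relation, key_index, boundaries):
    """Group rows by the domain interval that contains their join key."""
    fragments = defaultdict(list)
    for row in relation:
        fragments[bisect_right(boundaries, row[key_index])].append(row)
    return fragments


def join_fragment_pair(left_rows, right_rows, left_key, right_key):
    """Hash-join one pair of fragments; the right key column is not repeated."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[right_key]].append(row)
    return [
        l + r[:right_key] + r[right_key + 1:]
        for l in left_rows
        for r in index.get(l[left_key], [])
    ]


def fragmented_join(left, right, left_key, right_key, boundaries):
    """Join each interval's fragment pair independently and concatenate results."""
    lf = fragment(left, left_key, boundaries)
    rf = fragment(right, right_key, boundaries)
    result = []
    # Pairs for different intervals share no data, so each could run on its own core.
    for interval in sorted(lf.keys() & rf.keys()):
        result.extend(
            join_fragment_pair(lf[interval], rf[interval], left_key, right_key)
        )
    return result


left = [(1, "a"), (5, "b"), (9, "c")]
right = [(1, "x"), (9, "y"), (9, "z")]
joined = fragmented_join(left, right, 0, 0, boundaries=[4, 8])
print(joined)
```

Because matching keys always fall in the same interval, no fragment pair needs rows from another interval, which is what makes the per-interval joins independent.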

    Handling Non-deterministic Data Availability in Parallel Query Execution

    Non-deterministic data availability, where it is not known a priori which of two or more processes will respond first, cannot be handled with standard techniques. The consequence is sub-optimal processing because of inefficient resource allocation and unnecessary delays. In this paper we develop an effective solution to the problem by extending the demand-driven evaluation paradigm to support operators with more than one output stream. We show how inter-process communication and non-deterministic data availability in parallel query processing reduce to cases that can be executed efficiently with the new evaluation paradigm.