Search CORE

70,857 research outputs found

Parallel Query Processing on 2D Mesh and Linear Array Architectures

Author: Aboutabl Amal Elsayed
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 29/12/2013
Field of study

As the size of the web grows, it is necessary to parallelize the process of retrieving information from the web. Incorporating parallelism in search engines is one of the approaches towards achieving this aim. This paper presents an algorithm for query processing on the 2D mesh architecture and two algorithms for linear array architectures. We attempt to exploit the arrangement of processors and the communication pattern in both 2D mesh and linear array architectures to attain high speedup and efficiency for queries-keywords comparisons. A cost model is presented for each algorithm based on both processing and communication cost. Proposed algorithms are evaluated using speedup and efficiency performance metrics. For the same number of processors, 2D Mesh_QP outperforms both linear array algorithms (LA_QPAKP and LA_QPKE). Keywords: 2D Mesh, Linear Arrays, Parallel computing, Query processin

International Institute for Science, Technology and Education (IISTE): E-Journals

Locality-Aware Distribution Schemes

Author: Koutris Paraschos
Naughton Jeffrey
Sundarmurthy Bruhathi
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Database Theory (ICDT 2021)
Publication date: 01/01/2021
Field of study

One of the bottlenecks in parallel query processing is the cost of shuffling data across nodes in a cluster. Ideally, given a distribution of the data across the nodes and a query, we want to execute the query by performing only local computation and no communication: in this case, the query is called parallel-correct with respect to the data distribution. Previous work studied this problem for Conjunctive Queries in the case where the distribution scheme is oblivious, i.e., the location of each tuple depends only on the tuple and is independent of the instance. In this work, we show that oblivious schemes have a fundamental theoretical limitation, and initiate the formal study of distribution schemes that are locality-aware. In particular, we focus on a class of distribution schemes called co-hash distribution schemes, which are widely used in parallel systems. In co-hash partitioning, some tables are initially hashed, and the remaining tables are co-located so that a join condition is always satisfied. Given a co-hash distribution scheme, we formally study the complexity of deciding various desirable properties, including obliviousness and redundancy. Then, for a given Conjunctive Query and co-hash scheme, we determine the computational complexity of deciding whether the query is parallel-correct. We also explore a stronger notion of correctness, called parallel disjoint correctness, which guarantees that the query result will be disjointly partitioned across nodes, i.e., there is no duplication of results

Dagstuhl Research Online Publication Server

Recommended from our members

An effective data placement strategy for XML documents

Author: Lü K
Zhu Y
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2001
Field of study

As XML is increasingly being used in Web applications, new technologies need to be investigated for processing XML documents with high performance. Parallelism is a promising solution for structured document processing and data placement is a major factor for system performance improvement in parallel processing. This paper describes an effective XML document data placement strategy. The new strategy is based on a multilevel graph partitioning algorithm with the consideration of the unique features of XML documents and query distributions. A new algorithm, which is based on XML query schemas to derive the weighted graph from the labelled directed graph presentation of XML documents, is also proposed. Performance analysis on the algorithm presented in the paper shows that the new data placement strategy exhibits low workload skew and a high degree of parallelism

Brunel University Research Archive

Distributed top-k aggregation queries at large

Author: A. Marian
Gerhard Weikum
H. David
I.F. Ilyas
K. Church
K. Schnaitter
Matthias Bender
N. Bruno
Peter Triantafillou
R. Akbarinia
R. Fagin
Ralf Schenkel
S. Chaudhuri
S. Madden
Sebastian Michel
T. Cormen
Thomas Neumann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Springer - Publisher Connector

Enlighten

MPG.PuRe

Control versus Data Flow in Parallel Database Machines

Author: Blanken Henk M.
Teeuw Wouter B.
Publication venue: IEEE Computer Society Press
Publication date: 01/01/1993
Field of study

The execution of a query in a parallel database machine can be controlled in either a control flow way, or in a data flow way. In the former case a single system node controls the entire query execution. In the latter case the processes that execute the query, although possibly running on different nodes of the system, trigger each other. Lately, many database research projects focus on data flow control since it should enhance response times and throughput. The authors study control versus data flow with regard to controlling the execution of database queries. An analytical model is used to compare control and data flow in order to gain insights into the question which mechanism is better under which circumstances. Also, some systems using data flow techniques are described, and the authors investigate to which degree they are really data flow. The results show that for particular types of queries data flow is very attractive, since it reduces the number of control messages and balances these messages over the node

University of Twente Research Information

Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1

Author: Gurajada Sairam
Theobald Martin
Publication venue
Publication date: 01/01/2016
Field of study

We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For the scalable, i.e., distributed, processing of this kind of queries over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior works in this field, we specifically aim at the unified optimization and distributed processing of queries consisting of both relational joins and graph-reachability predicates. All communication among the compute nodes is established via a proprietary, asynchronous communication protocol based on the Message Passing Interface

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

MPG.PuRe