Search CORE

28 research outputs found

A model for equi-join query processing in distributed relational databases

Author
Publication venue: Laboratory for Information and Decision Systems
Publication date: 01/01/1981
Field of study

"December 1981"Bibliography: leaf [1]"Contract ONR/N00014-77-C-0532"Kuan-Tsae Huang, Wilbur B. Davenport, Jr

DSpace@MIT

Implementation of composite semijoins using a variation of Bloom filters.

Author: Zhu Yongmei
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2004
Field of study

Different from a centralized database system, distributed query processing involves data transmission among different sites and this communication cost is a dominant factor compared to local processing cost. So, the objective of distributed query optimization is to find strategies to minimize the amount of data transmitted over the network. Since optimal query processing in distributed database systems has been shown to be an NP-hard problem, heuristics are applied to find a near-optimal processing strategy. Previous research has mainly focused on the use of joins, semijoins, and hash semijoins (Bloom filters). The semijoin is a commonly recognized operator, which provides efficient query results. As a variation of semijoin, the composite semijoin is beneficial to do semijoins as one composite rather than as multiple single column semijoins. The Hash semijoin (which uses a Bloom filter) is used to minimize the cost of a semijoin operation. This thesis report provides a summary of each category of query processing techniques and optimization algorithms. Also in this thesis, we propose a new algorithm called Composite Semijoin Filter by combining the idea of composite semijoins, Bloom filters and PERF joins. One of the advantages of this algorithm is to avoid collisions. The algorithm is evaluated and compared with initial feasible solution (IFS) and another filter-based algorithm. It has been shown that the algorithm gives substantial reduction on relations and the total cost.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z58. Source: Masters Abstracts International, Volume: 43-01, page: 0249. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2004

Scholarship at UWindsor

Recommended from our members

Better Semijoins Using Tuple Bit-Vectors

Author: Li Zhe
Ross Kenneth A.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1994
Field of study

This paper presents the idea of "tuple-bit-vectors" for distributed query processing. Using tuple bit-vectors, a new two-way semijoin operator called 2SJ++ that enhances the semijoin with an essentially "free" backward reduction capability is proposed. We explore in detail the benefits and costs of 2SJ++ compared with other semijoin variants, and its effect on distributed query processing performance. We then focus on one particular distributed query processing algorithm, called the "one-shot" algorithm. We modify the one-shot algorithm by using 2SJ++ and demonstrate the improvements achieved in network transmission cost compared with the original one-shot technique. We use this improvement to demonstrate that equipped with the 2SJ++ technique, one can improve the performance of distributed query processing algorithms significantly without adding much complexity to the algorithms

Columbia University Academic Commons

An evaluation between Bloom Filter join and PERF join in Distributed Query Processing

Author: Pei Ming
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2008
Field of study

Nowadays, with the explosion of information and the telecommunication era\u27s coming, more and more huge applications encourage decentralization of data while accessing data from different sites [HFB00]. The process of retrieving data from different sites called Distributed Query Processing. The objective of distributed query optimization is to find the most cost-effective of executing query across the network [OV99]. Semijoin [BC81] [BG+81] is known as an effective operator to eliminate the tuples of a relation which are not contributive to a query. 2-way semijoin [KR87] is an extended version of semijoin which not only performs forward reduction like traditional semijoin does, but also provides backward reduction always in cost-effective way. Bloom Filter[B70] and PERF [LR95] are 2 filter based techniques which use a bit vector to represent of the original join attributes projection during the data transmission. Compare with generating a bit array with hash function in bloom filter, Perf join is based on the tuples scan order to avoid losing information caused by hash collision. In the thesis, we will apply both bloom filter and pert on 2-way semijoin algorithms to reduce transmission cost of distributed queries. Performance of propose algorithms will compare against each others and IFS (Initial Feasible Solution) through amount of experiments. \u27Keywords:\u27 Distributed Query Processing, Semijoin, Bloom Filter, Perf Join

Scholarship at UWindsor

Recommended from our members

A New Client-Server Architecture for Distributed Query Processing

Author: Li Zhe
Ross Kenneth A.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1994
Field of study

This paper presents the idea of "tuple bit-vectors" for distributed query processing. Using tuple bit-vectors, a new two-way semijoin operator called 2SJ++ that enhances the semijoin with an essentially "free" backward reduction capability is proposed. We explore in detail the benefits and costs of 2SJ++ compared with other semijoin variants, and its effect on distributed query processing performance. We then focus on one particular distributed query processing algorithm, called the "one-shot" algorithm. We modify the one-shot algorithm by using 2SJ++ and demonstrate the improvements achieved in network transmission cost compared with the original one-shot technique. We use this improvement to demonstrate that equipped with the 2SJ++ technique, one can improve the performance of distributed query processing algorithms significantly without adding much complexity to the algorithms

Columbia University Academic Commons

A Genetic Programming Approach for Distributed Queries

Author: Cheung Karen S.K
Kamel Nabil
Publication venue: AIS Electronic Library (AISeL)
Publication date: 15/08/1997
Field of study

With the emergence of relatively inexpensive and advanced communication technology, Distributed Database Management Systems (DDBMS) have become an integral part of many computer applications. Efficient query processing is one of the most important issues in distributed database systems. In a distributed environment, it is common that queries extract data from different sites. It is important to limit the amount of data transfer across different sites. Semijoin is a way to reduce the cost of expensive joins between various sites. A key issue in query optimization based on semijoin reduction is to find a good sequence of semijoins that reduce the relations referenced in a given query before the joins are performed. This paper proposes a new approach, based on Genetic Programming (GP), to improve the process of database query in Distributed Database Systems. A longer version of this paper is available

AIS Electronic Library (AISeL)

An evaluation of a 2-way semijoin distributed query processing algorithm

Author: Chen Chen
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2001
Field of study

The 2-way semijoin is proposed as an important extended version of the semijoin, which adds a backward reduction to maximize the reduction capability of traditional semijoin operations used as an effective operator for minimizing transmission cost in distributed query processing. In this thesis, we evaluate the 2-way semijoin algorithm objectively against a full reducer which is the algorithm that fully reduces all relations involved in a query by eliminating all non-participating tuples from relations. Instead of using filter-based algorithm, our algorithm is implemented so that it avoids hash collisions and also allows for composite semijoins. A series of experiments with various queries are carried out to study the above issues. It has been show that using our 2-way semijoin algorithm to reduce the query relations achieves significantly reduction effect. It performs well with respectable results on both the average percentage reduction of query relations and the percentage of queries that achieve full reduction in terms of total cost. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2000 .C344. Source: Masters Abstracts International, Volume: 40-03, page: 0721. Adviser: Joan M. Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2001

Scholarship at UWindsor

Reduction of collisions in Bloom filters during distributed query optimization.

Author: Liang Yan
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/1999
Field of study

The goal of distributed query optimization is to find the optimal strategy for the execution of a given query. The approaches in distributed query processing have mainly focused on the use of joins, semijoins, and filters. Semijoins have the advantage over joins in that there are no increases in data sizes. However, a semijoin needs more local processing such as projection and higher data transmission. To improve the distributed query processing, the filter-based approach is utilized. One of the limitations of this approach is collisions. We investigate how collisions affect the performance of the algorithm and how performance can be improved given those collisions. Our proposed algorithm utilizes two sets of filters to reduce the collisions, so the performance has been improved when collisions exist. Our proposed algorithm is evaluated objectively by comparison to a full reducer which is the algorithm that fully reduces all relations involved in a query by eliminating all non-participating tuples from the relations. The results of the evaluation show that: (1) With a perfect hash function, on average, our algorithm eliminates 97.41% of the unneeded data and fully reduces the relations of over 70% of the queries. (2) Using a single set of filters with specific percentages of collisions, on average, less than half of a queries are fully reduced by the algorithm. Therefore, the collisions substantially affects the performance. (3) Using two sets of filters, On average, our algorithm eliminates 95% of noncontributive tuples and achieves over 60% full reduction. In conclusion, our improved algorithm utilizes the two sets of filters to reduce the effects of collisions substantially. Therefore, we improve the performance of our algorithm under the assumption of collisions which is the major problem in using Bloom filters during distributed query optimization.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1999 .L53. Source: Masters Abstracts International, Volume: 39-02, page: 0528. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 1999

Scholarship at UWindsor