An evaluation of improvement algorithms for query processing in a distributed database system.

Author: Chu Nelson Chung Ngok.
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2002

In distributed database query processing, the database management system (DBMS) may consider all alternative queries and choose the one with the least cost. However, the number of alternative queries increases exponentially with the number of relations, attributes, and distributed sites. Therefore, the problem of finding an optimal query is a well-recognized NP-hard problem [CL84, OV99]. Applying heuristic algorithms to this kind of problem is a commonly used strategy. There are two basic steps in query processing. First, enumerate alternative plans for evaluating a query. Second, estimate the cost of each enumerated plan and choose the plan with the least estimated cost. In the area of distributed query processing, the semijoin is a well-recognized operator, which provides efficient query results. Several heuristic algorithms have been proposed to solve query-processing problems in distributed database systems. Unfortunately, most of these algorithms do not guarantee the optimality of the result. Therefore, some researchers have been motivated to identify optimality properties for semijoin programs and have proposed a set of algorithms that improve a non-optimal semijoin program so that it satisfies those properties. The performance and limitations of this set of algorithms are evaluated in this thesis. The thesis also presents modifications that can improve the performance of the improvement algorithms. The research work includes the study of modification of the essential operations, such as semijoin, and the interrelationship between the sub-procedures in the algorithms. Different implementation approaches to the algorithms have also been explored. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2002 .C48. Source: Masters Abstracts International, Volume: 41-04, page: 1103. Adviser: Joan Morrissey.
Thesis (M.Sc.)--University of Windsor (Canada), 2002
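
As an illustration of the semijoin operator this abstract relies on, here is a minimal sketch, not the thesis's implementation. The relation names and contents are hypothetical, and each relation is modelled as a list of dictionaries. The point is that only the projection of one relation on the join attribute has to be shipped to the other site, after which non-matching tuples can be dropped locally.

```python
def semijoin(r, s, attr):
    """Return the tuples of r whose attr value also appears in s."""
    # Projection of s on the join attribute: the only data that
    # would have to be transmitted to the site holding r.
    s_values = {row[attr] for row in s}
    return [row for row in r if row[attr] in s_values]


# Hypothetical relations, imagined to live at two different sites.
employees = [
    {"emp": "ann", "dept": 1},
    {"emp": "bob", "dept": 2},
    {"emp": "cy", "dept": 3},
]
projects = [
    {"proj": "x", "dept": 1},
    {"proj": "y", "dept": 1},
    {"proj": "z", "dept": 3},
]

# Employees in departments that have at least one project.
reduced = semijoin(employees, projects, "dept")
```

Here `reduced` keeps the tuples for departments 1 and 3 and drops the one for department 2, which cannot contribute to the join.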

Scholarship at UWindsor

An evaluation of a 2-way semijoin distributed query processing algorithm

Author: Chen Chen
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2001

The 2-way semijoin is proposed as an important extended version of the semijoin. It adds a backward reduction to maximize the reduction capability of the traditional semijoin, an effective operator for minimizing transmission cost in distributed query processing. In this thesis, we evaluate the 2-way semijoin algorithm objectively against a full reducer, the algorithm that fully reduces all relations involved in a query by eliminating all non-participating tuples from them. Instead of using a filter-based algorithm, our algorithm is implemented so that it avoids hash collisions and also allows for composite semijoins. A series of experiments with various queries is carried out to study the above issues. It has been shown that using our 2-way semijoin algorithm to reduce the query relations achieves a significant reduction effect. It performs well, with respectable results on both the average percentage reduction of query relations and the percentage of queries that achieve full reduction in terms of total cost. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2000 .C344. Source: Masters Abstracts International, Volume: 40-03, page: 0721. Adviser: Joan M. Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2001
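
The forward and backward reduction described in this abstract can be sketched roughly as follows. This is a simplified illustration under assumed data structures (relations as lists of dictionaries, a single join attribute), not the algorithm evaluated in the thesis. The function name and sample relations are hypothetical.

```python
def two_way_semijoin(r, s, attr):
    """Reduce both r and s on attr; returns (r_reduced, s_reduced)."""
    # Forward reduction: r's site ships its distinct join values,
    # and s drops every tuple that has no partner in r.
    r_values = {row[attr] for row in r}
    s_reduced = [row for row in s if row[attr] in r_values]

    # Backward reduction: s's site ships back the values that
    # actually matched, and r drops every tuple without a partner.
    matched = {row[attr] for row in s_reduced}
    r_reduced = [row for row in r if row[attr] in matched]
    return r_reduced, s_reduced


# Hypothetical relations at two sites.
r = [{"a": 1}, {"a": 2}, {"a": 3}]
s = [{"a": 2, "b": "p"}, {"a": 3, "b": "q"}, {"a": 4, "b": "r"}]

r_reduced, s_reduced = two_way_semijoin(r, s, "a")
```

After the call, both relations contain only tuples with join values 2 and 3, so for this pair of relations neither side holds a tuple that would be discarded by the final join.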

Scholarship at UWindsor

Implementation of composite semijoins using a variation of Bloom filters.

Author: Zhu Yongmei
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2004

Unlike query processing in a centralized database system, distributed query processing involves data transmission among different sites, and this communication cost is the dominant factor compared to local processing cost. So, the objective of distributed query optimization is to find strategies to minimize the amount of data transmitted over the network. Since optimal query processing in distributed database systems has been shown to be an NP-hard problem, heuristics are applied to find a near-optimal processing strategy. Previous research has mainly focused on the use of joins, semijoins, and hash semijoins (Bloom filters). The semijoin is a commonly recognized operator, which provides efficient query results. As a variation of the semijoin, the composite semijoin performs semijoins on several columns as one composite operation rather than as multiple single-column semijoins, which can be beneficial. The hash semijoin (which uses a Bloom filter) is used to minimize the cost of a semijoin operation. This thesis report provides a summary of each category of query processing techniques and optimization algorithms. Also in this thesis, we propose a new algorithm called Composite Semijoin Filter by combining the ideas of composite semijoins, Bloom filters, and PERF joins. One of the advantages of this algorithm is that it avoids collisions. The algorithm is evaluated and compared with the initial feasible solution (IFS) and another filter-based algorithm. It has been shown that the algorithm substantially reduces the relations and the total cost. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z58. Source: Masters Abstracts International, Volume: 43-01, page: 0249. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2004
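
To make the collision issue concrete, the following is a minimal sketch of a hash semijoin built on a small Bloom filter. It is not the Composite Semijoin Filter proposed in the thesis. The helper names, filter size, and hash construction (SHA-256 with a per-hash prefix) are all assumptions chosen for illustration.

```python
import hashlib


def _positions(value, num_bits, num_hashes):
    """Yield the bit positions that a value maps to."""
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % num_bits


def build_filter(values, num_bits=64, num_hashes=2):
    """Build the bit array that is transmitted instead of the values."""
    bits = [False] * num_bits
    for value in values:
        for pos in _positions(value, num_bits, num_hashes):
            bits[pos] = True
    return bits


def might_contain(bits, value, num_hashes=2):
    """True if the value may be present; false positives are possible."""
    return all(bits[p] for p in _positions(value, len(bits), num_hashes))


def bloom_semijoin(r, s_values, attr, num_bits=64, num_hashes=2):
    """Reduce r using only a Bloom filter of the other site's join values."""
    bits = build_filter(s_values, num_bits, num_hashes)
    return [row for row in r if might_contain(bits, row[attr], num_hashes)]


r = [{"k": i} for i in range(20)]
s_values = {2, 5, 7}
kept = {row["k"] for row in bloom_semijoin(r, s_values, "k")}
```

The filter never drops a tuple that truly matches, so `kept` always includes 2, 5, and 7. It may also retain non-matching tuples when their bit positions collide with those of real values, which is the loss of reduction that collision-free approaches aim to avoid. Shrinking `num_bits` makes such collisions more likely.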

Scholarship at UWindsor

Algebraic optimization of recursive queries

Author: Apers Peter M.G.
Houtsma Maurice A.W.
Publication venue: North Holland
Publication date: 01/01/1992

Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface, and allow formulation of complex recursive queries. However, they do not offer appropriate update facilities, and do not support existing applications. To overcome these problems an SQL-like interface is required besides a logic-based interface. In the PRISMA project we have developed a tightly-coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations. The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems.

CiteSeerX

University of Twente Research Information

An evaluation of PERF joins for a two-way semijoin based algorithm.

Author: Yang Li
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2005

Distributed database systems are increasingly used in place of centralized database systems in the business world, driven by business expansion and advances in network technology. Query optimization provides a strategy for executing each query over the network in the most cost-effective way, with the aim of minimizing transmission cost. Many techniques and algorithms have been proposed to optimize queries, such as the semijoin [BC81][BGW+81], 2-way semijoin [KR87], composite semijoin [PC90], hash semijoin [TC92], PERF join [LR95], etc. In distributed query processing, the semijoin has been used as an effective operator to reduce the total amount of data transmission. The 2-way semijoin is an extended version of the semijoin for more cost-effective distributed query processing. PERF joins are 2-way semijoins that use a bit vector during the backward phase. PERF [LR95] is designed to minimize the cost of the backward reduction. It is based on the tuple scan order instead of hashing, so it does not suffer any loss of join information incurred by hash collisions. Algorithm UPSJ and Algorithm CPSJ are proposed based on a 2-way semijoin algorithm. Two variants of PERF joins are applied to the 2-way semijoin algorithm. In Algorithm UPSJ, uncompressed PERF joins and 2-way semijoin techniques are combined. In Algorithm CPSJ, compressed PERF joins are applied during the backward processing. Programs are designed to implement both the original and the enhanced algorithms. Several experiments were conducted, and the results showed a considerable enhancement obtained by applying the PERF join concept. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .Y36. Source: Masters Abstracts International, Volume: 44-03, page: 1419. Thesis (M.Sc.)--University of Windsor (Canada), 2005
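
The idea of a bit vector ordered by tuple scan order, as described for PERF joins, might be sketched as below. This is an illustrative simplification and not Algorithm UPSJ or CPSJ. The function name, the sample relations, and the choice to send distinct join values in first-seen order are assumptions.

```python
def perf_join(r, s, attr):
    """2-way semijoin whose backward phase sends one bit per value."""
    # Forward phase: r's site sends its distinct join values in a
    # fixed scan order that both sites can rely on.
    sent = list(dict.fromkeys(row[attr] for row in r))

    # At s's site: reduce s, then record for each received value,
    # in the order received, whether it found a match.
    sent_set = set(sent)
    s_reduced = [row for row in s if row[attr] in sent_set]
    s_values = {row[attr] for row in s}
    perf = [value in s_values for value in sent]

    # Backward phase: only the bit vector travels. r's site pairs
    # each bit with the value it sent at that position, so no
    # hashing is involved and no collision can occur.
    matched = {value for value, bit in zip(sent, perf) if bit}
    r_reduced = [row for row in r if row[attr] in matched]
    return r_reduced, s_reduced, perf


# Hypothetical relations at two sites.
r = [{"a": 3}, {"a": 1}, {"a": 3}, {"a": 2}]
s = [{"a": 1}, {"a": 3}, {"a": 9}]

r_reduced, s_reduced, perf = perf_join(r, s, "a")
```

In this example three distinct values (3, 1, 2) are sent forward, and the backward phase costs three bits rather than a list of matched values. The reduction of `r` is exact because each bit refers to a known position.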

Scholarship at UWindsor

An evaluation between Bloom Filter join and PERF join in Distributed Query Processing

Author: Pei Ming
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2008

Nowadays, with the explosion of information and the coming of the telecommunication era, more and more large applications encourage decentralization of data while accessing data from different sites [HFB00]. The process of retrieving data from different sites is called Distributed Query Processing. The objective of distributed query optimization is to find the most cost-effective way of executing a query across the network [OV99]. The semijoin [BC81] [BG+81] is known as an effective operator for eliminating the tuples of a relation that do not contribute to a query. The 2-way semijoin [KR87] is an extended version of the semijoin that not only performs forward reduction like the traditional semijoin, but also provides backward reduction, always in a cost-effective way. Bloom filter [B70] and PERF [LR95] are two filter-based techniques that use a bit vector to represent the projection of the original join attributes during data transmission. In contrast to the Bloom filter, which generates a bit array with a hash function, the PERF join is based on the tuple scan order, which avoids the information loss caused by hash collisions. In this thesis, we apply both the Bloom filter and PERF to 2-way semijoin algorithms to reduce the transmission cost of distributed queries. The performance of the proposed algorithms is compared against each other and against IFS (Initial Feasible Solution) through a number of experiments. Keywords: Distributed Query Processing, Semijoin, Bloom Filter, PERF Join

Scholarship at UWindsor

The use of reduction filters in distributed query optimization

Author: Osborn Wendy Kathleen
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/1998

A major issue that affects the performance of a distributed database management system is the optimal processing of a query involving data from several sites. The problem of distributed query processing is to determine a sequence of operations, called an execution strategy, with the minimum cost. This has been shown to be an NP-Hard problem [Hen80, WC96]. Therefore, most proposed algorithms for processing distributed queries are heuristic, and focus on producing efficient (but suboptimal) strategies that minimize some particular cost of the query. Many proposed solutions use joins, semijoins, a combination of joins and semijoins, and dynamic methods. Solutions that use a filter-based approach have also been proposed. However, the limitations of such approaches include the assumption of a perfect hash function, the restriction of the algorithm to specific query types, and the restriction of the algorithm to a specific number of relations and joining attributes. Therefore, we propose a new filter-based algorithm that can process general queries consisting of an arbitrary number of relations and joining attributes. Also, it does not assume the use of a perfect hash function. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1998 .O87. Source: Masters Abstracts International, Volume: 39-02, page: 0531. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 1998

Scholarship at UWindsor

A New Client-Server Architecture for Distributed Query Processing

Author: Li Zhe
Ross Kenneth A.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1994

This paper presents the idea of "tuple bit-vectors" for distributed query processing. Using tuple bit-vectors, a new two-way semijoin operator called 2SJ++ that enhances the semijoin with an essentially "free" backward reduction capability is proposed. We explore in detail the benefits and costs of 2SJ++ compared with other semijoin variants, and its effect on distributed query processing performance. We then focus on one particular distributed query processing algorithm, called the "one-shot" algorithm. We modify the one-shot algorithm by using 2SJ++ and demonstrate the improvements achieved in network transmission cost compared with the original one-shot technique. We use this improvement to demonstrate that equipped with the 2SJ++ technique, one can improve the performance of distributed query processing algorithms significantly without adding much complexity to the algorithms

Columbia University Academic Commons

Compressed positionally encoded record filters in distributed query processing.

Author: Zhou Ying (Joy)
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2004

Different from a centralized database system, distributed query processing involves data transmission among distributed sites, which makes reducing transmission cost a major goal for distributed query optimization. A Positionally Encoded Record Filter (PERF) has attracted research attention as a cost-effective operator to reduce transmission cost. A PERF is a bit array generated by relation tuple scan order instead of hashing, so that it inherits the same compact size benefit as a Bloom filter while suffering no loss of join information caused by hash collisions. Our proposed algorithm PERF_C (Compressed PERF) further reduces the transmission cost in algorithm PERF by compressing both the join attributes and the corresponding PERF filters using arithmetic coding. We prove by time complexity analysis that compression is more efficient than sorting, which was proposed by earlier research to remove duplicates in algorithm PERF. Through the experiments on our synthetic testbed with 36 types of distributed queries, algorithm PERF_C effectively reduces the transmission cost with a cost reduction ratio of 62%--77% over IFS. And PERF_C outperforms PERF with a gain of 16%--36% in cost reduction ratio. A new metric to measure the compression speed in bits per second, compression bps , is defined as a guideline to decide when compression is beneficial. When compression overhead is considered, compression is beneficial only if compression bps is faster than data transfer speed. Tested on both randomly generated and specially designed distributed queries, number of join attributes, size of join attributes and relations, level of duplications are identified to be critical database factors affecting compression. Tested under three typical real computing platforms, compression bps is measured over a wide range of data size and falls in the range from 4M b/s to 9M b/s. 
Compared to the present relatively slow data transfer rate over Internet, compression is found to be an effective means of reducing transmission cost in distributed query processing. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z565. Source: Masters Abstracts International, Volume: 43-01, page: 0249. Adviser: J. Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2004
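
The decision rule stated in this abstract, that compression pays off only when the compression speed exceeds the data transfer speed, can be written as a one-line check. The function name and the sample transfer rates below are hypothetical. The 4 Mb/s figure is the lower end of the compression speed range reported in the abstract.

```python
def compression_is_beneficial(compression_bps, transfer_bps):
    """Apply the abstract's guideline: compress only if compressing
    is faster (in bits per second) than sending the data."""
    return compression_bps > transfer_bps


# Lower end of the measured compression speed range (4 Mb/s).
measured_compression_bps = 4_000_000

# Hypothetical links: a slow wide-area link and a fast local network.
slow_link = compression_is_beneficial(measured_compression_bps, 1_000_000)
fast_link = compression_is_beneficial(measured_compression_bps, 100_000_000)
```

Under these assumed rates the rule favours compression on the slow link and not on the fast one, which matches the abstract's remark that compression helps when transfer rates are relatively slow.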

Scholarship at UWindsor

A bloom-filter strategy for response time reduction in distributed query processing.

Author: Gao Wanxin
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2003

In distributed database systems, query optimization aims to find strategies that minimize the amount of data transmitted over the network. Optimization algorithms have an important impact on the performance of distributed query processing. Since optimal query processing in distributed database systems has been shown to be NP-Hard [WC96], heuristics are applied to find a cost-effective and efficient (but suboptimal) processing strategy. Many query optimization strategies have been proposed to minimize either the total cost or the response time. The approaches in distributed query processing have mainly focused on the use of joins, semijoins, and filters. In this thesis, we propose a new reduction strategy based on Bloom filters to significantly reduce the response time of a distributed query. This algorithm can process general queries consisting of an arbitrary number of relations and join attributes. The performance of the algorithm with respect to response time is compared against the Initial Feasible Solution (IFS). A number of experiments have been used to evaluate the performance of our algorithm. Compared to the IFS, our algorithm provides a significantly improved query solution. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .G36. Source: Masters Abstracts International, Volume: 43-05, page: 1749. Thesis (M.Sc.)--University of Windsor (Canada), 2003

Scholarship at UWindsor