    A bloom-filter strategy for response time reduction in distributed query processing.

    In distributed database systems, query optimization aims to find strategies that minimize the amount of data transmitted over the network. Optimization algorithms therefore have an important impact on the performance of distributed query processing. Since optimal query processing in distributed database systems has been shown to be NP-hard [WC96], heuristics are applied to find a cost-effective and efficient (but suboptimal) processing strategy. Many query optimization strategies have been proposed to minimize either the total cost or the response time, and approaches to distributed query processing have mainly focused on the use of joins, semijoins, and filters. In this thesis, we propose a new reduction strategy based on Bloom filters to significantly reduce the response time of a distributed query. The algorithm can process general queries consisting of an arbitrary number of relations and join attributes. The performance of the algorithm with respect to response time is compared against the Initial Feasible Solution (IFS). Extensive experimental results have been used to evaluate the performance of our algorithm; compared to the IFS, it provides a significantly improved query solution. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .G36. Source: Masters Abstracts International, Volume: 43-05, page: 1749. Thesis (M.Sc.)--University of Windsor (Canada), 2003
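    The abstract does not spell out the reduction itself, so the following is a minimal sketch of the general Bloom-filter semijoin idea it builds on: one site hashes its join-attribute values into a compact bit array, ships the array instead of the values, and the other site forwards only tuples whose join values match the filter. All names, sizes, and hash choices here are illustrative assumptions, not the thesis's actual algorithm.

    ```python
    import hashlib

    def bloom_filter(values, m=1024, k=3):
        """Build an m-bit Bloom filter over join-attribute values with k hash functions."""
        bits = [0] * m
        for v in values:
            for seed in range(k):
                h = int(hashlib.sha256(f"{seed}:{v}".encode()).hexdigest(), 16) % m
                bits[h] = 1
        return bits

    def might_contain(bits, v, k=3):
        """True if v may be in the hashed set: false positives possible, no false negatives."""
        m = len(bits)
        return all(
            bits[int(hashlib.sha256(f"{seed}:{v}".encode()).hexdigest(), 16) % m]
            for seed in range(k)
        )

    # Site A: only the m-bit filter over R's join column crosses the network.
    r_join_values = [10, 42, 77, 100]
    f = bloom_filter(r_join_values)

    # Site B: ship only the tuples of S that pass the filter.
    s_tuples = [(42, "x"), (55, "y"), (77, "z")]
    reduced_s = [t for t in s_tuples if might_contain(f, t[0])]
    # reduced_s may include a few false positives, but never drops a joining
    # tuple, so the final join still produces the correct answer.
    ```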

    Compressed positionally encoded record filters in distributed query processing.

    Unlike a centralized database system, distributed query processing involves data transmission among distributed sites, which makes reducing transmission cost a major goal of distributed query optimization. The Positionally Encoded Record Filter (PERF) has attracted research attention as a cost-effective operator for reducing transmission cost. A PERF is a bit array generated in relation tuple scan order instead of by hashing, so it inherits the same compact size as a Bloom filter while suffering no loss of join information from hash collisions. Our proposed algorithm PERF_C (Compressed PERF) further reduces the transmission cost of algorithm PERF by compressing both the join attributes and the corresponding PERF filters using arithmetic coding. We prove by time complexity analysis that compression is more efficient than sorting, which was proposed by earlier research to remove duplicates in algorithm PERF. In experiments on our synthetic testbed with 36 types of distributed queries, algorithm PERF_C effectively reduces the transmission cost, with a cost reduction ratio of 62%–77% over IFS, and outperforms PERF with a gain of 16%–36% in cost reduction ratio. A new metric measuring compression speed in bits per second, compression bps, is defined as a guideline for deciding when compression is beneficial: once compression overhead is considered, compression pays off only if compression bps exceeds the data transfer speed. Testing on both randomly generated and specially designed distributed queries identifies the number of join attributes, the size of join attributes and relations, and the level of duplication as the critical database factors affecting compression. Measured on three typical real computing platforms over a wide range of data sizes, compression bps falls in the range from 4 Mb/s to 9 Mb/s. Compared to the relatively slow data transfer rates over the Internet at present, compression is found to be an effective means of reducing transmission cost in distributed query processing. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z565. Source: Masters Abstracts International, Volume: 43-01, page: 0249. Adviser: J. Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2004
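    The abstract does not give PERF's exact encoding, so here is a minimal sketch under assumed conventions: the answering site returns one bit per tuple it received, in the original scan order, making the filter positional rather than hashed. The compression-benefit rule from the abstract is included as a one-line check; the speeds are placeholders, not measured values.

    ```python
    # Site B earlier received R's join-attribute values in a fixed scan order.
    r_values_in_scan_order = [10, 42, 42, 77, 100]   # duplicates keep their positions

    # One bit per received value, in the same order: does it join with S?
    s_join_values = {42, 77, 55}
    perf = [1 if v in s_join_values else 0 for v in r_values_in_scan_order]
    # perf == [0, 1, 1, 1, 0]: as compact as a Bloom filter, but positional,
    # so no join information is lost to hash collisions.

    # Back at the sending site, the filter is applied positionally.
    reduced_r = [v for v, bit in zip(r_values_in_scan_order, perf) if bit]
    # reduced_r == [42, 42, 77]

    # The abstract's guideline for when compressing the filter pays off:
    def compression_beneficial(compression_bps, link_bps):
        # Compressing helps only if we compress faster than we could transmit.
        return compression_bps > link_bps

    compression_beneficial(6e6, 1e6)   # True on a 1 Mb/s link (placeholder speeds)
    ```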

    The use of reduction filters in distributed query optimization

    A major issue affecting the performance of a distributed database management system is the optimal processing of a query involving data from several sites. The problem of distributed query processing is to determine a sequence of operations, called an execution strategy, with the minimum cost. This has been shown to be an NP-hard problem [Hen80, WC96]. Therefore, most proposed algorithms for processing distributed queries are heuristic and focus on producing efficient (but suboptimal) strategies that minimize some particular cost of the query. Many proposed solutions use joins, semijoins, a combination of joins and semijoins, or dynamic methods. Solutions that use a filter-based approach have also been proposed. However, the limitations of such approaches include the assumption of a perfect hash function, the restriction of the algorithm to specific query types, and the restriction of the algorithm to a specific number of relations and joining attributes. Therefore, we propose a new filter-based algorithm that can process general queries consisting of an arbitrary number of relations and joining attributes, and that does not assume the use of a perfect hash function. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1998 .O87. Source: Masters Abstracts International, Volume: 39-02, page: 0531. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 1998
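    The claim that no perfect hash function need be assumed rests on a standard property of hash-based filters, illustrated below with a deliberately tiny, hypothetical filter: a collision can only let a non-joining tuple through as a false positive, which wastes some transmission but is removed by the final join, so the answer is still correct.

    ```python
    m = 8  # deliberately tiny so collisions actually happen

    def build_filter(values):
        # h(v) = v % m stands in for an imperfect hash function.
        bits = [0] * m
        for v in values:
            bits[v % m] = 1
        return bits

    r_values = [3, 19]                            # site A; 3 % 8 == 19 % 8 == 3
    s_tuples = [(19, "a"), (12, "b"), (11, "c")]  # site B; 11 % 8 == 3 collides

    f = build_filter(r_values)
    shipped = [t for t in s_tuples if f[t[0] % m]]
    # shipped == [(19, "a"), (11, "c")]: one true match plus one false positive.
    final = [t for t in shipped if t[0] in set(r_values)]
    # final == [(19, "a")]: the join discards the false positive, so
    # correctness never depends on the hash being perfect.
    ```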

    Execution strategies for SQL subqueries

    Optimizing SQL subqueries has been an active area in database research and the database industry over the last decades. Previous work has already identified some approaches to efficiently execute relational subqueries. For satisfactory performance, the proper choice of subquery execution strategy becomes even more essential today with the increase in decision support systems and automatically generated SQL, e.g., from ad-hoc reporting tools. This goes hand in hand with increasing query complexity and growing data volumes, all of which pose challenges for an industrial-strength query optimizer. This paper explores the basic building blocks that Microsoft SQL Server uses to optimize and execute relational subqueries. We start with indispensable prerequisites such as the detection and removal of correlations in subqueries. We identify a full spectrum of fundamental subquery execution strategies, such as forward and reverse lookup as well as set-based approaches, explain the different execution strategies for subqueries implemented in SQL Server, and relate them to the current state of the art. To the best of our knowledge, several strategies discussed in this paper have not been published before. An experimental evaluation complements the paper. It quantifies the performance characteristics of the different approaches and shows that alternative execution strategies are indeed needed in different circumstances, which makes a cost-based query optimizer indispensable for adequate query performance.
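    The abstract names forward lookup and set-based strategies without defining them; the sketch below shows one common reading of that contrast for an IN subquery: probe the inner query once per outer row versus materialize the inner result once and probe a hash set. This illustrates the general technique only and is not SQL Server's implementation.

    ```python
    orders = [(1, 101), (2, 102), (3, 101)]   # (order_id, customer_id)

    def vip_ids():
        # Stands in for evaluating the (uncorrelated) subquery once.
        return [101, 103]

    # Forward lookup: re-evaluate (or index-probe) the inner query per outer row.
    # Attractive when the outer side is small or an index makes each probe cheap.
    forward = [o for o in orders if o[1] in vip_ids()]

    # Set-based: materialize the inner result once, then probe a hash set.
    # Attractive when the outer side is large relative to the inner result.
    vip_set = set(vip_ids())
    set_based = [o for o in orders if o[1] in vip_set]

    assert forward == set_based   # same answer, different cost profile;
    # a cost-based optimizer chooses between them from cardinality estimates.
    ```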