24 research outputs found

    An evaluation of a 2-way semijoin distributed query processing algorithm

    The 2-way semijoin has been proposed as an important extension of the semijoin: it adds a backward reduction to maximize the reduction capability of the traditional semijoin, an effective operator for minimizing transmission cost in distributed query processing. In this thesis, we evaluate the 2-way semijoin algorithm objectively against a full reducer, the algorithm that fully reduces all relations involved in a query by eliminating all non-participating tuples. Instead of using a filter-based algorithm, our algorithm is implemented so that it avoids hash collisions and also allows for composite semijoins. A series of experiments with various queries is carried out to study these issues. The results show that using our 2-way semijoin algorithm to reduce the query relations achieves a significant reduction effect. It performs well on both the average percentage reduction of query relations and the percentage of queries that achieve full reduction in terms of total cost. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2000 .C344. Source: Masters Abstracts International, Volume: 40-03, page: 0721. Adviser: Joan M. Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2001
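
The forward and backward reduction described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `two_way_semijoin`, the representation of relations as lists of dictionaries, and the use of exact value sets (rather than hashed filters) are assumptions made purely for illustration.

```python
def two_way_semijoin(r, s, attr):
    """Reduce relations r and s on a shared join attribute (illustrative sketch).

    r and s are lists of dicts; attr is the name of the join attribute.
    Exact value sets are used, so no hash collisions can occur.
    """
    # Forward reduction: ship r's join values to s and keep only matching s tuples.
    r_values = {row[attr] for row in r}
    s_reduced = [row for row in s if row[attr] in r_values]

    # Backward reduction: ship the values that actually matched back to r.
    matched = {row[attr] for row in s_reduced}
    r_reduced = [row for row in r if row[attr] in matched]

    return r_reduced, s_reduced


if __name__ == "__main__":
    r = [{"a": 1}, {"a": 2}, {"a": 3}]
    s = [{"a": 2, "b": "x"}, {"a": 4, "b": "y"}]
    print(two_way_semijoin(r, s, "a"))
```

After the call, both relations contain only tuples that participate in the join on `attr`; a plain semijoin would have stopped after the forward step and left `r` unreduced.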

    A Genetic Programming Approach for Distributed Queries

    With the emergence of relatively inexpensive and advanced communication technology, Distributed Database Management Systems (DDBMS) have become an integral part of many computer applications. Efficient query processing is one of the most important issues in distributed database systems. In a distributed environment, queries commonly extract data from different sites, so it is important to limit the amount of data transferred between sites. The semijoin is a way to reduce the cost of expensive joins between sites. A key issue in query optimization based on semijoin reduction is finding a good sequence of semijoins that reduces the relations referenced in a given query before the joins are performed. This paper proposes a new approach, based on Genetic Programming (GP), to improve query processing in distributed database systems. A longer version of this paper is available

    Flattening an object algebra to provide performance

    Algebraic transformation and optimization techniques have been the method of choice in relational query execution, but applying them in object-oriented (OO) DBMSs is difficult due to the complexity of OO query languages. This paper demonstrates that the problem can be simplified by mapping an OO data model to the binary relational model implemented by Monet, a state-of-the-art database kernel. We present a generic mapping scheme to flatten data models and study the case of a straightforward OO model. We show how flattening enabled us to implement a query algebra using only a very limited set of simple operations. The required primitives and query execution strategies are discussed, and their performance is evaluated on the 1-GByte TPC-D (Transaction-processing Performance Council's Benchmark D), showing that our divide-and-conquer approach yields excellent results

    A bloom-filter strategy for response time reduction in distributed query processing.

    In distributed database systems, query optimization seeks strategies that minimize the amount of data transmitted over the network. Optimization algorithms have an important impact on the performance of distributed query processing. Since optimal query processing in distributed database systems has been shown to be NP-Hard [WC96], heuristics are applied to find a cost-effective and efficient (but suboptimal) processing strategy. Many query optimization strategies have been proposed to minimize either the total cost or the response time. Approaches to distributed query processing have mainly focused on the use of joins, semijoins, and filters. In this thesis, we propose a new reduction strategy based on Bloom filters to significantly reduce the response time of a distributed query. This algorithm can process general queries consisting of an arbitrary number of relations and join attributes. The performance of the algorithm with respect to response time is compared against the Initial Feasible Solution (IFS), using a set of experimental results. Compared to the IFS, our algorithm provides a significantly improved query solution. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .G36. Source: Masters Abstracts International, Volume: 43-05, page: 0721. Thesis (M.Sc.)--University of Windsor (Canada), 2003
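
To make the idea of filter-based reduction concrete, here is a small sketch of a Bloom filter used to reduce a relation before it is shipped to another site. It does not reproduce the thesis algorithm: the class `BloomFilter`, the function `bloom_reduce`, the filter size, and the SHA-256-based hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """A simple Bloom filter (illustrative; parameters are arbitrary assumptions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity over compactness

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(value))


def bloom_reduce(relation, attr, bloom):
    """Keep only tuples whose join value may appear at the other site."""
    return [row for row in relation if bloom.might_contain(row[attr])]


if __name__ == "__main__":
    r = [{"a": 1}, {"a": 2}, {"a": 3}]
    s = [{"a": 2}, {"a": 3}, {"a": 100}]
    bloom = BloomFilter()
    for row in r:
        bloom.add(row["a"])  # built at r's site, then shipped instead of r's values
    print(bloom_reduce(s, "a", bloom))
```

Because a Bloom filter never yields false negatives, every tuple that participates in the join survives the reduction; hash collisions can only let some non-participating tuples through, which is the trade-off against shipping exact join values.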

    The use of reduction filters in distributed query optimization

    A major issue that affects the performance of a distributed database management system is the optimal processing of a query involving data from several sites. The problem of distributed query processing is to determine a sequence of operations, called an execution strategy, with the minimum cost. This has been shown to be an NP-Hard problem [Hen80, WC96]. Therefore, most proposed algorithms for processing distributed queries are heuristic, and focus on producing efficient (but suboptimal) strategies that minimize some particular cost of the query. Many proposed solutions use joins, semijoins, a combination of joins and semijoins, and dynamic methods. Solutions that use a filter-based approach have also been proposed. However, the limitations of such approaches include the assumption of a perfect hash function, the restriction of the algorithm to specific query types, and the restriction of the algorithm to a specific number of relations and joining attributes. Therefore, we propose a new filter-based algorithm that can process general queries consisting of an arbitrary number of relations and joining attributes. Also, it does not assume the use of a perfect hash function. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1998 .O87. Source: Masters Abstracts International, Volume: 39-02, page: 0531. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 1998

    A Decision Support Tool for Distributed Database Design

    The efficiency and effectiveness of a distributed database depend primarily on solving two interrelated design problems: data allocation, specifying what data to replicate and where to store it, and operating strategies, specifying where and how retrieval and update processes are performed. We develop a distributed database design approach that comprehensively addresses these problems, explicitly modeling their interdependencies for both retrieval and update processing. We extend earlier distributed database design models to include join order and data reduction by semijoin, in addition to data replication, copy identification, and join node selection. We demonstrate that join ordering and data reduction by semijoin are important distributed database design decisions that must be included in a distributed database design algorithm if it is to determine an overall optimal distributed database design

    A Comparison of Distributed Database Design Models

    Although numerous database design models and solution algorithms have been developed, there has been little work that compares and evaluates these models. Lack of such work has left us with several questions: Do the more comprehensive models actually result in better solutions than the simpler models? If so, what makes them better? Are they better under all conditions or only under certain conditions? Are there trade-offs between data redundancy and sophisticated operation allocation strategies? In this paper, we systematically compare and evaluate several distributed database design models in terms of total operating cost and average response time under various conditions. We vary the relative frequency of update queries and selectivities of queries. The results demonstrate that replication, join node selection, join order, and reduction by semijoin, all have significant impact on the efficiency of a distributed database system. Replication was most effective for retrieval intensive and high selectivity situations. Join node selection, join order, and reduction by semijoin were most effective for balanced retrieval/update and low selectivity situations. The results also suggest that there are trade-offs between total operating cost and average response time design criteria

    Ontology-based Search Algorithms over Large-Scale Unstructured Peer-to-Peer Networks

    Peer-to-Peer (P2P) systems have emerged as a promising paradigm for structuring large-scale distributed systems. They provide a robust, scalable, and decentralized way to share and publish data. Unstructured P2P systems have gained much popularity in recent years for their wide applicability and simplicity. However, efficient resource discovery remains a fundamental challenge for unstructured P2P networks due to the lack of a network structure. To effectively harness the power of unstructured P2P systems, the challenges in distributed knowledge management and information search need to be overcome. Current attempts to solve the problems pertaining to knowledge management and search have focused on simple term-based routing indices and keyword search queries. Many P2P resource discovery applications will require more complex query functionality, as users will publish semantically rich data and need efficient content location algorithms that find target content at moderate cost. Therefore, effective knowledge and data management techniques and search tools for information retrieval are imperative and lasting. In my dissertation, I present a suite of protocols that assist in efficient content location and knowledge management in unstructured Peer-to-Peer overlays. The basis of these schemes is their ability to learn from past peer interactions and increase their performance over time. My work aims to provide effective and bandwidth-efficient searching and data sharing in unstructured P2P environments. A suite of algorithms is presented that provides peers in unstructured P2P overlays with the state necessary to efficiently locate, disseminate, and replicate objects. Also, existing approaches to federated search are adapted and new methods are developed for semantic knowledge representation, resource selection, and knowledge evolution for efficient search in dynamic and distributed P2P network environments. Furthermore, autonomous and decentralized algorithms that reorganize an unstructured network topology into one with desired search-enhancing properties are proposed in a network evolution model to facilitate effective and efficient semantic search in dynamic environments

    Optimization of object query languages
