1,697 research outputs found

    Control versus Data Flow in Parallel Database Machines

    Get PDF
    The execution of a query in a parallel database machine can be controlled in either a control flow way, or in a data flow way. In the former case a single system node controls the entire query execution. In the latter case the processes that execute the query, although possibly running on different nodes of the system, trigger each other. Lately, many database research projects focus on data flow control since it should enhance response times and throughput. The authors study control versus data flow with regard to controlling the execution of database queries. An analytical model is used to compare control and data flow in order to gain insights into the question which mechanism is better under which circumstances. Also, some systems using data flow techniques are described, and the authors investigate to which degree they are really data flow. The results show that for particular types of queries data flow is very attractive, since it reduces the number of control messages and balances these messages over the node

    Extending a multi-set relational algebra to a parallel environment

    Get PDF
    Parallel database systems will very probably be the future for high-performance data-intensive applications. In the past decade, many parallel database systems have been developed, together with many languages and approaches to specify operations in these systems. A common background is still missing, however. This paper proposes an extended relational algebra for this purpose, based on the well-known standard relational algebra. The extended algebra provides both complete database manipulation language features, and data distribution and process allocation primitives to describe parallelism. It is defined in terms of multi-sets of tuples to allow handling of duplicates and to obtain a close connection to the world of high-performance data processing. Due to its algebraic nature, the language is well suited for optimization and parallelization through expression rewriting. The proposed language can be used as a database manipulation language on its own, as has been done in the PRISMA parallel database project, or as a formal basis for other languages, like SQL

    Parallel Database Architectures: A Simulation Study.

    Get PDF
    Parallel database systems are gaining popularity as a solution that provides scalability in large and growing databases. A parallel database system is a DBS which exploits multiprocessing systems to improve performance. Parallel database computers can be classified into three categories: shared memory, shared disk, and shared nothing. In shared memory, all resources, including main memory and disk units, are shared among several processors. In shared disk, a group of processors share a common pool of disks, but each processor has its own private main memory. In the shared-nothing system, every processor has its own memory and disk unit, that is, except for communication links, no resources are shared among the processors. In this work, we· compare the performance of the three architecture classes. Simulation models for the various architectures are introduced. Using these models, a number of experiments were conducted to compare the system performance of these architectures under different workloads and transaction models. The aim of this work is to provide a tool for evaluating the different architectures and their appropriateness for a specific database application

    Development of a parallel database environment

    Get PDF

    A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

    Get PDF
    This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization

    Dynamic File Migration to Support Parallel Database Systems

    No full text

    Dynamic Action Scheduling in a Parallel Database System

    Get PDF
    This paper describes a scheduling technique for parallel database systems to obtain high performance, both in terms of response time and throughput. The technique enables both intra- and inter-transaction parallelism while controlling concurrency between transactions correctly. Scheduling is performed dynamically at transaction execution time, taking into account dynamic aspects of the execution and allowing parallelism between the scheduling and transaction execution processes. The technique has a solid conceptual background, based on a simple graph-based approach. The usability and effectiveness of the technique are demonstrated by implementation in and measurements on the parallel PRISMA database system
    corecore