
    A taxonomy of parallel sorting

    TR 84-601
    In this paper, we propose a taxonomy of parallel sorting that includes a broad range of array and file sorting algorithms. We analyze the evolution of research on parallel sorting, from the earliest sorting networks to the shared-memory algorithms and the VLSI sorters. In the context of sorting networks, we describe two fundamental parallel merging schemes: the odd-even and the bitonic merge. Sorting algorithms have been derived from these merging algorithms for parallel computers where processors communicate through interconnection networks such as the perfect shuffle, the mesh, and a number of other sparse networks. After describing the network sorting algorithms, we show that, with a shared-memory model of parallel computation, faster algorithms have been derived from parallel enumeration sorting schemes, where keys are first ranked and then rearranged according to their rank.
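
    The two merging schemes named above are easiest to see in code. Below is a minimal sequential Python sketch of bitonic sort, for illustration only: the recursive structure mirrors the comparator network, and every compare-exchange inside one call of `bitonic_merge` could run in parallel on a sorting network. The function names and the power-of-two length restriction are ours, not the paper's.

    ```python
    def bitonic_merge(a, ascending=True):
        """Merge a bitonic sequence into sorted order (length a power of 2)."""
        n = len(a)
        if n == 1:
            return a
        half = n // 2
        # One stage of compare-exchanges; on a network these run in parallel.
        for i in range(half):
            if (a[i] > a[i + half]) == ascending:
                a[i], a[i + half] = a[i + half], a[i]
        return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)

    def bitonic_sort(a, ascending=True):
        """Sort by building a bitonic sequence from two oppositely sorted halves."""
        if len(a) <= 1:
            return a
        half = len(a) // 2
        left = bitonic_sort(a[:half], True)
        right = bitonic_sort(a[half:], False)
        return bitonic_merge(left + right, ascending)
    ```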

    Design and Analysis of Multi-User Benchmarks for Database Systems


    Performance Evaluation of Main Memory Database Systems

    In this paper we present the results of a comprehensive benchmark of the relational Main Memory Database System (MMDBS) that is the foundation of the interactive office system Office-By-Example (OBE). Based on this case study, we identify issues that must be considered in the design and implementation of MMDBSs. We determine relevant performance metrics and describe techniques for benchmarking MMDBSs.

    A Better Tool for Query Optimization

    When evaluating the performance of a query strategy, one must often estimate the number of distinct values of an attribute in a randomly selected subset of a relation. Most query optimizers compute this estimate based on the assumption that, prior to the selection, the attribute values are uniformly distributed in the relation. In this paper we depart from this assumption and instead consider Zipf distributions, which are known to accurately model text and name distributions. Given a relation of cardinality n where a non-key attribute A has a Zipf distribution, we derive both an exact formula and an approximate non-iterative formula for the expected number of distinct A-values contained in a sample of k randomly selected tuples. The approximation is accurate, and it is very easy to compute. Thus it provides a practical tool to deal with non-uniform distributions in query optimization.
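
    As a rough illustration of the quantity being estimated, the sketch below computes the exact expectation under the uniformity assumption the paper departs from, and checks a Zipf-distributed column by Monte Carlo simulation. This is not the paper's closed-form Zipf formula; the parameters (D distinct values, skew s) and the truncated-Zipf column construction are our own illustrative assumptions.

    ```python
    import math
    import random

    def expected_distinct_uniform(n, D, k):
        # Exact expectation under the uniformity assumption: each of the
        # D distinct values occupies n/D of the n tuples.
        m = n // D
        return D * (1 - math.comb(n - m, k) / math.comb(n, k))

    def expected_distinct_zipf_mc(n, D, k, s=1.0, trials=2000, seed=0):
        # Monte Carlo estimate when the column follows a truncated Zipf law:
        # the value of rank r appears with frequency proportional to 1/r**s.
        rng = random.Random(seed)
        weights = [1 / r**s for r in range(1, D + 1)]
        total = sum(weights)
        # Column of roughly n attribute values with Zipf-shaped frequencies.
        column = [r for r, w in enumerate(weights) for _ in range(round(n * w / total))]
        return sum(len(set(rng.sample(column, k))) for _ in range(trials)) / trials

    print(expected_distinct_uniform(10_000, 100, 50))   # ~39.4 distinct values
    print(expected_distinct_zipf_mc(10_000, 100, 50))   # noticeably smaller under skew
    ```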

    Benchmarking Database Systems - A Systematic Approach

    This paper describes a customized database and a comprehensive set of queries that can be used for systematic benchmarking of relational database systems. Designing this database and a set of carefully tuned benchmarks represents a first attempt at developing a scientific methodology for performance evaluation of database management systems. We have used this database to perform a comparative evaluation of the database machine DIRECT, the "university" and "commercial" versions of the INGRES database system, the relational database system ORACLE, and the IDM 500 database machine. We present a subset of our measurements (for the single-user case only) that constitutes a preliminary performance evaluation of these systems.

    Note to the reader: it is important to recognize that the results presented in this paper represent the performance of the various database systems at one point in time, and new releases of these systems will undoubtedly perform differently. The objective of this research was not to make a definitive statement as to which is the best relational database system on the market today. Rather, our goal was to develop a standard set of benchmarks that could be used by database system designers for evaluating changes to their systems and by users for selecting the system that best suits their needs. It is also imperative that the reader understand that the results in no way measure the performance of the various systems in a multiuser environment. We are currently developing a methodology for benchmarking database systems in that environment.
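
    To make the flavor of such a benchmark concrete, here is a minimal single-user sketch in Python over SQLite: a synthetic relation whose attributes give controlled selectivities, plus a small timed query mix. The table, attribute names (`unique1`, `unique2`, `ten`), queries, and selectivities are our illustrative choices in the spirit of the paper's approach, not its actual benchmark definition.

    ```python
    import random
    import sqlite3
    import time

    # A synthetic relation whose attributes yield predictable selectivities.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rel (unique1 INTEGER, unique2 INTEGER, ten INTEGER)")
    rows = [(i, random.randrange(10_000), i % 10) for i in range(10_000)]
    conn.executemany("INSERT INTO rel VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_u1 ON rel(unique1)")

    # A tiny query mix probing indexed selection, scans, and aggregation.
    queries = {
        "1% selection, indexed":    "SELECT * FROM rel WHERE unique1 < 100",
        "10% selection, no index":  "SELECT * FROM rel WHERE unique2 < 1000",
        "aggregate over 10 groups": "SELECT ten, COUNT(*) FROM rel GROUP BY ten",
    }
    for name, sql in queries.items():
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        print(f"{name}: {time.perf_counter() - start:.4f}s")
    ```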

    Design-By-Example: A Design Tool for Relational Databases

    In recent years, research in relational design theory and in query optimization has established a firm ground for designing well-structured logical and physical database schemes. However, the design process requires mastering a considerable amount of theoretical results. Furthermore, even for the initiated database designer, many of the known algorithms for logical design do not provide constructive guidelines for generating a database scheme that would prevent update anomalies and data inconsistencies. Nor do the algorithms and evaluation methods for file structures and query processing provide constructive physical design rules. We propose an expert tool that would make knowledge in relational design theory and query optimization automatically and transparently available to the database designer. This tool is a system with an interactive, graphical interface that uses examples to guide the designer through several phases of logical and physical database design. Logical design is based on example relations, and physical design on example queries. The example relations are automatically generated by the system; they contain sample data and satisfy the data dependencies that the designer specifies with the assistance of the expert tool. The example queries and their expected frequencies are specified by the designer, using graphically displayed skeleton queries. The system generates a physical design scheme that optimizes the mix of queries expected by the designer, and computes a performance forecast. Both example relations and example queries can be modified by the designer until the expert tool generates a satisfactory design.

    Sorting Large Files on a Backend Multiprocessor

    A fundamental measure of processing power in a database management system is the performance of the sort utility it provides. When sorting a large data file on a serial computer, performance is limited by processor speed, memory capacity, and I/O bandwidth. In this paper, we investigate the feasibility and efficiency of a parallel sort-merge algorithm through implementation on the JASMIN prototype, a backend multiprocessor built around a fast packet bus. We describe the design and implementation of a parallel sort utility that may become a building block for query processing in a database system that runs on JASMIN. We present and analyze the results of measurements corresponding to a range of file sizes and processor configurations. Our results show that, using current off-the-shelf technology coupled with a streamlined distributed operating system, three- and five-microprocessor configurations provide a very cost-effective sort of large files. The three-processor configuration sorts a 100-megabyte file in one hour, which compares well with commercial sort packages available on high-performance mainframes. In additional experiments, we investigate a model to tune our sort software and scale our results to higher processor and network capabilities.
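
    The sort-merge pattern the paper implements can be sketched in a few lines of present-day Python: partition the input into runs, sort the runs in parallel, then perform one multiway merge of the sorted streams. This toy version works in memory with `multiprocessing` (the `workers=3` default echoes the three-processor configuration); it illustrates only the dataflow, not JASMIN's disk-based utility or its packet-bus communication.

    ```python
    import heapq
    import random
    from multiprocessing import Pool

    def parallel_sort(data, workers=3):
        # Split the input into runs, sort each run in its own process,
        # then do a single multiway merge of the sorted runs.
        chunk = (len(data) + workers - 1) // workers
        runs = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        with Pool(workers) as pool:
            sorted_runs = pool.map(sorted, runs)
        return list(heapq.merge(*sorted_runs))

    if __name__ == "__main__":
        data = [random.randrange(1_000_000) for _ in range(100_000)]
        assert parallel_sort(data) == sorted(data)
    ```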

    Panel: The effect of large main memory on database systems


    Duplicate record elimination in large data files


    A General Framework for Computing Block Accesses

    A physical database system design should take account of skewed block-access distributions, nonuniformly distributed attribute domains, and dependent attributes. In this paper we derive general formulas for the number of blocks accessed under these assumptions by considering a class of related occupancy problems. We then proceed to develop robust and accurate approximations for these formulas. We investigate three classes of approximation methods, based respectively on generating functions, Taylor series expansions, and majorization. These approximations are as simple to use as, and far more accurate than, the cost-estimate formulas obtained by making independence and uniformity assumptions.
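
    For reference, the uniform-and-independent baseline that these formulas generalize is the classical block-access estimate commonly attributed to Yao (1977); a direct Python transcription follows. The paper's contribution is replacing this baseline's assumptions with skewed, nonuniform, and dependent distributions, which this snippet deliberately does not model.

    ```python
    import math

    def blocks_accessed(n, b, k):
        """Expected number of blocks touched when k of n records are selected,
        assuming records are packed n/b per block, uniformly and independently."""
        m = n // b  # records per block
        return b * (1 - math.comb(n - m, k) / math.comb(n, k))

    # 10,000 records in 500 blocks; selecting 100 records touches ~91 blocks:
    print(blocks_accessed(10_000, 500, 100))
    ```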