36,446 research outputs found

    Parallel Database Architectures: A Simulation Study.

    Get PDF
    Parallel database systems are gaining popularity as a solution that provides scalability in large and growing databases. A parallel database system is a DBS which exploits multiprocessing systems to improve performance. Parallel database computers can be classified into three categories: shared memory, shared disk, and shared nothing. In shared memory, all resources, including main memory and disk units, are shared among several processors. In shared disk, a group of processors share a common pool of disks, but each processor has its own private main memory. In the shared-nothing system, every processor has its own memory and disk unit, that is, except for communication links, no resources are shared among the processors. In this work, we· compare the performance of the three architecture classes. Simulation models for the various architectures are introduced. Using these models, a number of experiments were conducted to compare the system performance of these architectures under different workloads and transaction models. The aim of this work is to provide a tool for evaluating the different architectures and their appropriateness for a specific database application

    Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

    Full text link
    Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1

    Enabling On-Demand Database Computing with MIT SuperCloud Database Management System

    Full text link
    The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss if the database becomes unstable.Comment: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2015. arXiv admin note: text overlap with arXiv:1406.492

    A Survey on Array Storage, Query Languages, and Systems

    Full text link
    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention.Comment: 44 page

    Object Database Scalability for Scientific Workloads

    Get PDF
    We describe the PetaByte-scale computing challenges posed by the next generation of particle physics experiments, due to start operation in 2005. The computing models adopted by the experiments call for systems capable of handling sustained data acquisition rates of at least 100 MBytes/second into an Object Database, which will have to handle several PetaBytes of accumulated data per year. The systems will be used to schedule CPU intensive reconstruction and analysis tasks on the highly complex physics Object data which need then be served to clients located at universities and laboratories worldwide. We report on measurements with a prototype system that makes use of a 256 CPU HP Exemplar X Class machine running the Objectivity/DB database. Our results show excellent scalability for up to 240 simultaneous database clients, and aggregate I/O rates exceeding 150 Mbytes/second, indicating the viability of the computing models
    • …
    corecore