18 research outputs found

    Multi-Dimensional Database Allocation for Parallel Data Warehouses

    Data allocation is a key performance factor for parallel database systems (PDBS). This holds especially for data warehousing environments, where huge amounts of data and complex analytical queries have to be dealt with. While there are several studies on data allocation for relational PDBS, the specific requirements of data warehouses have not yet been sufficiently addressed. In this study, we consider the allocation of relational data warehouses based on a star schema and utilizing bitmap index structures. We investigate how a multi-dimensional hierarchical data fragmentation of the fact table supports queries referencing different subsets of the schema dimensions. Our analysis is based on realistic parameters derived from a decision support benchmark. The performance implications of different allocation choices are evaluated by means of a detailed simulation model.

    On Parallel Join Processing in Object-Relational Database Systems

    So far, only a few performance studies on parallel object-relational database systems are available. In particular, the relative performance of relational vs. reference-based join processing in a parallel environment has not been investigated sufficiently. We present a performance study based on the BUCKY benchmark to compare parallel join processing using reference attributes with relational hash- and merge-join algorithms. In addition, we propose a data allocation scheme especially suited for object hierarchies and set-valued attributes.

    Dynamic Query Scheduling in Parallel Data Warehouses

    Data warehouse queries pose challenging performance problems that often necessitate the use of parallel database systems (PDBS). Although dynamic load balancing is of key importance in PDBS, to our knowledge it has not yet been investigated thoroughly for parallel data warehouses. In this study, we propose a scheduling strategy that simultaneously considers both processors and disks while utilizing the load balancing potential of a Shared Disk architecture. We compare the performance of this new method to several other approaches in a comprehensive simulation study, incorporating skew aspects and typical data warehouse features such as star schemas.

    Workload-Aware Database Monitoring and Consolidation

    In most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization of less than 4%. This unused capacity can potentially be harnessed to consolidate multiple databases on fewer machines, reducing hardware and operational costs. Virtual machine (VM) technology is one popular way to approach this problem. However, as we demonstrate in this paper, VMs fail to adequately support database consolidation, because databases place a unique and challenging set of demands on hardware resources, which are not well-suited to the assumptions made by VM-based consolidation. Instead, our system for database consolidation, named Kairos, uses novel techniques to measure the hardware requirements of database workloads, as well as models to predict the combined resource utilization of those workloads. We formalize the consolidation problem as a non-linear optimization program, aiming to minimize the number of servers and balance load, while achieving near-zero performance degradation. We compare Kairos against virtual machines, showing up to 12× higher throughput on a TPC-C-like benchmark. We also tested the effectiveness of our approach on real-world data collected from production servers at Wikia.com, Wikipedia, Second Life, and MIT CSAIL, showing absolute consolidation ratios ranging between 5.5:1 and 17:1.

    Analytische Bestimmung einer Datenallokation für Parallele Data Warehouses (Analytical Determination of a Data Allocation for Parallel Data Warehouses)

    The rapidly growing importance of analyzing data warehouse contents, together with more convenient query interfaces for end users, significantly increases the volume of OLAP queries. In reducing the amount of work and achieving short response times for these complex queries, an adequate data allocation is, alongside the use of processing and I/O parallelism, the key to good performance. However, determining a suitable fragmentation and allocation for large data volumes, such as those found in fact tables or index structures in relational star schemas, is a difficult problem. At present, there is practically no tool support for this. We therefore present an approach for analytically determining a suitable multi-dimensional, hierarchical data allocation. Our approach should be fairly easy to integrate into a tool that automatically supports the allocation problem.