
    Learning a Partitioning Advisor with Deep Reinforcement Learning

    Commercial data analytics products such as Microsoft Azure SQL Data Warehouse or Amazon Redshift provide ready-to-use scale-out database solutions for OLAP-style workloads in the cloud. While the provisioning of a database cluster is usually fully automated by cloud providers, customers typically still have to make important design decisions that were traditionally made by the database administrator, such as selecting the partitioning schemes. In this paper we introduce a learned partitioning advisor for analytical OLAP-style workloads based on Deep Reinforcement Learning (DRL). The main idea is that a DRL agent learns its decisions from experience by monitoring the rewards for different workloads and partitioning schemes. We evaluate our learned partitioning advisor experimentally with different database schemata and workloads of varying complexity. In the evaluation, we show that our advisor is not only able to find partitionings that outperform existing approaches for automated partitioning design, but that it can also easily adjust to different deployments. This is especially important in cloud setups where customers can easily migrate their cluster to a new set of (virtual) machines.

    Memory-aware sizing for in-memory databases

    In-memory database systems are among the technological drivers of big data processing. In this paper we apply analytical modeling to enable efficient sizing of in-memory databases. We present novel response time approximations under online analytical processing workloads to model thread-level fork-join and per-class memory occupation. We combine these approximations with a non-linear optimization program to minimize memory swapping in in-memory database clusters. We compare our approach with state-of-the-art response time approximations and trace-driven simulation using real data from an SAP HANA in-memory system and show that our optimization model is significantly more accurate than existing approaches at similar computational costs.

    Design of efficient and elastic storage in the cloud

    Ph.D., Doctor of Philosophy

    SAP HANA distributed in-memory database system: Transaction, session, and metadata management

    One of the core principles of the SAP HANA database system is comprehensive support for a distributed query facility. Supporting scale-out scenarios was one of the major design principles of the system from the very beginning. Within this paper, we first give an overview of the overall functionality with respect to data allocation, metadata caching and query routing. We then dive into some level of detail for specific topics and explain features and methods not common in traditional disk-based database systems. In summary, the paper provides a comprehensive overview of distributed query processing in the SAP HANA database, which achieves the scalability needed to handle large databases and heterogeneous types of workloads.

    On-line analytical processing in distributed data warehouses

    The concepts of 'data warehousing' and 'on-line analytical processing' have seen a growing interest in the research and commercial product community. Today, the trend moves away from complex centralized data warehouses to distributed data marts integrated in a common conceptual schema. However, as the first part of this paper demonstrates, there are many problems and few solutions for large distributed decision support systems in worldwide operating corporations. After showing the benefits and problems of the distributed approach, this paper outlines possibilities for achieving performance in distributed on-line analytical processing. Finally, the architectural framework of the prototypical distributed OLAP system CUBESTAR is outlined.