51 research outputs found

    Automated Storage Layout for Database Systems

    Modern storage systems are complex. Simple direct-attached storage devices are giving way to storage systems that are flexible, network-attached, consolidated and virtualized. Today, storage systems have their own administrators, who use specialized tools and expertise to configure and manage storage resources. As a result, database administrators are no longer in direct control of the design and configuration of their database systems' underlying storage resources. This introduces problems because database physical design and storage configuration are closely related tasks, and the separation makes it more difficult to achieve a good end-to-end design. For instance, the performance of a database system depends strongly on the storage layout of database objects, such as tables and indexes, and the separation makes it hard to design a storage layout that is tuned to the I/O workload generated by the database system. In this thesis we address this problem and attempt to close the information gap between database and storage tiers by addressing the problem of predicting the storage (I/O) workload that will be generated by a database management system. Specifically, we show how to translate a database workload description, together with a database physical design, into a characterization of the I/O workload that will result. Such a characterization can directly be used by a storage configuration tool and thus enables effective end-to-end design and configuration spanning both the database and storage tiers. We then introduce our storage layout optimization tool, which leverages such workload characterizations to generate an optimized layout for a given set of database objects. We formulate the layout problem as a non-linear programming (NLP) problem and use the I/O characterization as input to an NLP solver. 
We have incorporated our I/O estimation technique into the PostgreSQL database management system and our layout optimization technique into a database layout advisor. We present an empirical assessment of the cost of both tools as well as the efficacy and accuracy of their results.

    Dynamic Clustering in Object-Oriented Databases: An Advocacy for Simplicity

    We present in this paper three dynamic clustering techniques for Object-Oriented Databases (OODBs). The first two, Dynamic, Statistical & Tunable Clustering (DSTC) and StatClust, exploit both comprehensive usage statistics and the inter-object reference graph. They are quite elaborate. However, they are also complex to implement and induce a high overhead. The third clustering technique, called Detection & Reclustering of Objects (DRO), is based on the same principles, but is much simpler to implement. These three clustering algorithms have been implemented in the Texas persistent object store and compared in terms of clustering efficiency (i.e., overall performance increase) and overhead using the Object Clustering Benchmark (OCB). The results obtained showed that DRO induced a lighter overhead while still achieving better overall performance.

    SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns

    The emergence of database-as-a-service platforms has made deploying database applications easier than before. Now, developers can quickly create scalable applications. However, designing performant, maintainable, and accurate applications is challenging. Developers may unknowingly introduce anti-patterns in the application's SQL statements. These anti-patterns are design decisions that are intended to solve a problem, but often lead to other problems by violating fundamental design principles. In this paper, we present SQLCheck, a holistic toolchain for automatically finding and fixing anti-patterns in database applications. We introduce techniques for automatically (1) detecting anti-patterns with high precision and recall, (2) ranking the anti-patterns based on their impact on performance, maintainability, and accuracy of applications, and (3) suggesting alternative queries and changes to the database design to fix these anti-patterns. We demonstrate the prevalence of these anti-patterns in a large collection of queries and databases collected from open-source repositories. We introduce an anti-pattern detection algorithm that augments query analysis with data analysis. We present a ranking model for characterizing the impact of frequently occurring anti-patterns. We discuss how SQLCheck suggests fixes for high-impact anti-patterns using rule-based query refactoring techniques. Our experiments demonstrate that SQLCheck enables developers to create more performant, maintainable, and accurate applications. Comment: 18 pages (14 page paper, 1 page references, 2 page Appendix), 12 figures, Conference: SIGMOD'2
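
    The idea of rule-based detection over query text can be illustrated with a minimal sketch. The three rules below are commonly cited SQL anti-patterns chosen for illustration; they are assumptions, not SQLCheck's actual rule set, and the names `RULES` and `find_anti_patterns` are hypothetical. The sketch covers only query analysis with regular expressions, not the data analysis, ranking, or fix suggestion described in the abstract.

```python
import re
from typing import List

# Illustrative rules only (assumed, not taken from SQLCheck):
# each entry is (rule name, pattern, short explanation).
RULES = [
    ("select-star",
     re.compile(r"\bSELECT\s+\*", re.IGNORECASE),
     "fetches every column, including ones the caller may not need"),
    ("leading-wildcard-like",
     re.compile(r"\bLIKE\s+'%", re.IGNORECASE),
     "a pattern starting with % usually cannot use an ordinary index"),
    ("insert-without-columns",
     re.compile(r"\bINSERT\s+INTO\s+\w+\s+VALUES\b", re.IGNORECASE),
     "relies on the table's column order staying the same"),
]


def find_anti_patterns(sql: str) -> List[str]:
    """Return the names of all rules whose pattern occurs in the statement."""
    return [name for name, pattern, _ in RULES if pattern.search(sql)]
```

    Regular expressions are a deliberately crude stand-in for a real SQL parser: they can misfire on string literals and comments, which is one reason a production tool would analyze a parsed query instead.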

    Cracking KD-Tree: The first multidimensional adaptive indexing

    Workload-aware physical data access structures are crucial to achieve short response time with (exploratory) data analysis tasks as commonly required for Big Data and Data Science applications. Recently proposed techniques such as automatic index advisers (for a priori known static workloads) and query-driven adaptive incremental indexing (for a priori unknown dynamic workloads) form the state-of-the-art to build single-dimensional indexes for single-attribute query predicates. However, similar techniques for more demanding multi-attribute query predicates, which are vital for any data analysis task, have not yet been proposed. In this paper, we present our ongoing work on a new set of workload-adaptive indexing techniques that focus on creating multidimensional indexes. We present our proof-of-concept, the Cracking KD-Tree, an adaptive indexing approach that generates a KD-Tree based on multidimensional range query predicates. It works by incrementally creating partial multidimensional indexes as a by-product of query processing. The indexes are produced only on those parts of the data that are accessed, and their creation cost is effectively distributed across a stream of queries. Experimental results show that the Cracking KD-Tree is three times faster than creating a full KD-Tree, one order of magnitude faster than executing full scans and two orders of magnitude faster than using uni-dimensional full or adaptive indexes on multiple columns.
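
    The notion of building a KD-Tree as a by-product of range queries can be sketched as follows. This is a simplified interpretation written for illustration, not the authors' implementation: the class name `CrackingKDTree`, the `min_leaf` threshold, the half-open query semantics, and the use of Python lists instead of in-place partitioning of a column store are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    points: Optional[List[Point]] = None  # set only while the node is a leaf
    dim: int = 0
    pivot: float = 0.0
    left: Optional["Node"] = None   # points with p[dim] < pivot
    right: Optional["Node"] = None  # points with p[dim] >= pivot


class CrackingKDTree:
    def __init__(self, points: Sequence[Point], min_leaf: int = 4):
        self.root = Node(points=list(points))  # starts as one unindexed leaf
        self.min_leaf = min_leaf               # assumed cut-off: small leaves are not split

    def _try_crack(self, node: Node, low: Point, high: Point) -> bool:
        """Split a leaf on the first query bound that separates its points."""
        if len(node.points) <= self.min_leaf:
            return False
        for dim in range(len(low)):
            for pivot in (low[dim], high[dim]):
                lo = [p for p in node.points if p[dim] < pivot]
                hi = [p for p in node.points if p[dim] >= pivot]
                if lo and hi:  # a non-trivial split: turn the leaf into an inner node
                    node.points = None
                    node.dim, node.pivot = dim, pivot
                    node.left, node.right = Node(points=lo), Node(points=hi)
                    return True
        return False

    def range_query(self, low: Point, high: Point) -> List[Point]:
        """Return points p with low[d] <= p[d] < high[d] in every dimension d."""
        out: List[Point] = []
        self._search(self.root, low, high, out)
        return out

    def _search(self, node: Node, low: Point, high: Point, out: List[Point]) -> None:
        if node.points is not None and not self._try_crack(node, low, high):
            # Leaf that cannot be refined further by this query: scan it.
            out.extend(p for p in node.points
                       if all(l <= x < h for l, x, h in zip(low, p, high)))
            return
        # Inner node (possibly created just now): visit only overlapping children.
        if low[node.dim] < node.pivot:
            self._search(node.left, low, high, out)
        if high[node.dim] > node.pivot:
            self._search(node.right, low, high, out)

    def num_leaves(self, node: Optional[Node] = None) -> int:
        node = self.root if node is None else node
        if node.points is not None:
            return 1
        return self.num_leaves(node.left) + self.num_leaves(node.right)
```

    In this sketch only the regions a query touches are ever split, so the tree stays coarse over data that is never queried, and the partitioning work is spread over the query stream rather than paid up front.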

    A Framework for the Automatic Physical Configuration and Tuning of a MySQL Community Server

    Manual physical configuration and tuning of database servers is a complicated task requiring a high level of expertise. Database administrators must consider numerous possibilities to determine a candidate configuration for implementation. In recent times, database vendors have responded to this problem by providing solutions that can automatically configure and tune their products. Poor configuration choices that result in performance degradation, commonplace in manual configurations, have been significantly reduced in these solutions. However, no such solution exists for MySQL Community Server. This thesis proposes a novel framework for automatically tuning a MySQL Community Server. A first iteration of the framework has been built and is presented in this paper together with its performance measurements.

    A solution to the materialized view selection problem in data warehousing

    One of the most important decisions in the physical design of a data warehouse is the selection of materialized views and indexes to be created. The problem is to select an appropriate set of views and indexes to store that minimizes the total query response time while keeping the cost of maintaining them as low as possible, given a constraint on some resource such as storage space. In this work, we have developed a new algorithm for the general problem of selecting views together with indexes, as an extension to a well-known algorithm. We present a heuristic for selecting views and indexes to optimize total query response time under a materialization time constraint. Finally, we present an experimental comparison of our proposal with the considered state-of-the-art approach. XI Workshop Bases de Datos y Minería de Datos. Red de Universidades con Carreras de Informática (RedUNCI).

    On the use of query-driven XML auto-indexing
