45 research outputs found
Automated Storage Layout for Database Systems
Modern storage systems are complex. Simple direct-attached storage devices are giving way to storage systems that are flexible, network-attached, consolidated and virtualized. Today, storage systems have their own administrators, who use specialized tools and expertise to configure
and manage storage resources. As a result, database administrators are no longer in direct control of the design and configuration of their database systems' underlying storage resources.
This introduces problems because database physical design and storage configuration are closely related tasks, and the separation
makes it more difficult to achieve a good end-to-end design. For instance, the performance of a database system depends strongly on the storage layout of database objects, such as tables and indexes, and the separation makes it hard to design a storage layout that is tuned to the I/O workload generated by the database system. In this thesis we address this problem and attempt to close the information gap between database and storage tiers by addressing the problem of predicting the storage (I/O) workload that will be generated by a database management system. Specifically, we show how to translate a database workload description, together with a database physical design, into a characterization of the I/O workload that will result. Such a characterization can directly be used by a storage configuration tool and thus enables effective end-to-end design and configuration
spanning both the database and storage tiers.
We then introduce our storage layout optimization tool, which leverages such workload characterizations to generate an optimized layout for a given set of database objects. We formulate the layout problem as a non-linear programming (NLP) problem and
use the I/O characterization as input to an NLP solver. We have incorporated our I/O estimation technique into the PostgreSQL database management system and our layout optimization technique into a database layout advisor. We present an empirical assessment of the cost of both tools as well as the efficacy and accuracy of their results
Dynamic Clustering in Object-Oriented Databases: An Advocacy for Simplicity
International audienceWe present in this paper three dynamic clustering techniques for Object-Oriented Databases (OODBs). The first two, Dynamic, Statistical & Tunable Clustering (DSTC) and StatClust, exploit both comprehensive usage statistics and the inter-object reference graph. They are quite elaborate. However, they are also complex to implement and induce a high overhead. The third clustering technique, called Detection & Reclustering of Objects (DRO), is based on the same principles, but is much simpler to implement. These three clustering algorithm have been implemented in the Texas persistent object store and compared in terms of clustering efficiency (i.e., overall performance increase) and overhead using the Object Clustering Benchmark (OCB). The results obtained showed that DRO induced a lighter overhead while still achieving better overall performance
Cracking KD-Tree: The first multidimensional adaptive indexing
Workload-aware physical data access structures are crucial to achieve short response time with (exploratory) data analysis tasks as commonly required for Big Data and Data Science applications. Recently proposed techniques such as automatic index advisers (for a priori known static workloads) and query-driven adaptive incremental indexing (for a priori unknown dynamic workloads) form the state-of-the-art to build single-dimensional indexes for single-attribute query predicates. However, similar techniques for more demanding multi-attribute query predicates, which are vital for any data analysis task, have not been proposed, yet. In this paper, we present our on-going work on a new set of workload-adaptive indexing techniques that focus on creating multidimensional indexes. We present our proof-of-concept, the Cracking KD-Tree, an adaptive indexing approach that generates a KD-Tree based on multidimensional range query predicates. It works by incrementally creating partial multidimensional indexes as a by-product of query processing. The indexes are produced only on those parts of the data that are accessed, and their creation cost is effectively distributed across a stream of queries. Experimental results show that the Cracking KD-Tree is three times faster than creating a full KD-Tree, one order of magnitude faster than executing full scans and two orders of magnitude faster than using uni-dimensional full or adaptive indexes on multiple columns
SQLCheck: Automated Detection and Diagnosis of SQL Anti-Patterns
The emergence of database-as-a-service platforms has made deploying database
applications easier than before. Now, developers can quickly create scalable
applications. However, designing performant, maintainable, and accurate
applications is challenging. Developers may unknowingly introduce anti-patterns
in the application's SQL statements. These anti-patterns are design decisions
that are intended to solve a problem, but often lead to other problems by
violating fundamental design principles.
In this paper, we present SQLCheck, a holistic toolchain for automatically
finding and fixing anti-patterns in database applications. We introduce
techniques for automatically (1) detecting anti-patterns with high precision
and recall, (2) ranking the anti-patterns based on their impact on performance,
maintainability, and accuracy of applications, and (3) suggesting alternative
queries and changes to the database design to fix these anti-patterns. We
demonstrate the prevalence of these anti-patterns in a large collection of
queries and databases collected from open-source repositories. We introduce an
anti-pattern detection algorithm that augments query analysis with data
analysis. We present a ranking model for characterizing the impact of
frequently occurring anti-patterns. We discuss how SQLCheck suggests fixes for
high-impact anti-patterns using rule-based query refactoring techniques. Our
experiments demonstrate that SQLCheck enables developers to create more
performant, maintainable, and accurate applications.Comment: 18 pages (14 page paper, 1 page references, 2 page Appendix), 12
figures, Conference: SIGMOD'2
A solution to the materialized view selection problem in data warehousing
One of the most important decisions in the physical designing of a data warehouse is the selection of materialized views and indexes to be created. The problem is to select an appropriate set of views and indexes to storage that minimizes the total query response time, as long as the cost of maintaining them, given a constraint of some resource like storage space, is kept as low as possible.In this work, we have developed a new algorithm for the general problem of se-lection of views considering indexes, as an extension to a well-known algorithm.
We present a heuristic for selection of views and indexes to optimize total que-ry response under a materialization time constraint. Finally, we present an ex-perimental comparison of our proposal with the considered state-of-art ap-proach.XI Workshop Bases de Datos y MinerĂa de DatosRed de Universidades con Carreras de Informática (RedUNCI
A solution to the materialized view selection problem in data warehousing
One of the most important decisions in the physical designing of a data warehouse is the selection of materialized views and indexes to be created. The problem is to select an appropriate set of views and indexes to storage that minimizes the total query response time, as long as the cost of maintaining them, given a constraint of some resource like storage space, is kept as low as possible.In this work, we have developed a new algorithm for the general problem of se-lection of views considering indexes, as an extension to a well-known algorithm.
We present a heuristic for selection of views and indexes to optimize total que-ry response under a materialization time constraint. Finally, we present an ex-perimental comparison of our proposal with the considered state-of-art ap-proach.XI Workshop Bases de Datos y MinerĂa de DatosRed de Universidades con Carreras de Informática (RedUNCI
A Framework for the Automatic Physical Configuration and Tuning of a Mysql Community Server
Manual physical configuration and tuning of database servers, is a complicated task requiring a high level of expertise. Database administrators must consider numerous possibilities, to determine a candidate configuration for implementation. In recent times database vendors have responded to this problem, providing solutions which can automatically configure and tune their products. Poor configuration choices, resulting in performance degradation commonplace in manual configurations, have been significantly reduced in these solutions. However, no such solution exists for MySQL Community Server. This thesis, proposes a novel framework for automatically tuning a MySQL Community Server. A first iteration of the framework has been built and is presented in this paper together with its performance measurements