1,594 research outputs found

    Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores

    Get PDF
    Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: (a) they have little idle system time to devote on physical design; and (b) there is little, if any, a priori workload knowledge, while the query and data workload keeps changing dynamically. In such environments, traditional approaches to index building and maintenance cannot apply. Database cracking has been proposed as a solution that allows on-the-fly physical data reorganization, as a collateral effect of query processing. Cracking aims to continuously and automatically adapt indexes to the workload at hand, without human intervention. Indexes are built incrementally, adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing methods fail to deliver workload-robustness; they perform much better with random workloads than with others. This frailty derives from the inelasticity with which these approaches interpret each query as a hint on how data should be stored. Current cracking schemes blindly reorganize the data within each query's range, even if that results into successive expensive operations with minimal indexing benefit. In this paper, we introduce stochastic cracking, a significantly more resilient approach to adaptive indexing. Stochastic cracking also uses each query as a hint on how to reorganize data, but not blindly so; it gains resilience and avoids performance bottlenecks by deliberately applying certain arbitrary choices in its decision-making. Thereby, we bring adaptive indexing forward to a mature formulation that confers the workload-robustness previous approaches lacked. Our extensive experimental study verifies that stochastic cracking maintains the desired properties of original database cracking while at the same time it performs well with diverse realistic workloads.Comment: VLDB201

    Clustering-Based Materialized View Selection in Data Warehouses

    Full text link
    Materialized view selection is a non-trivial task. Hence, its complexity must be reduced. A judicious choice of views must be cost-driven and influenced by the workload experienced by the system. In this paper, we propose a framework for materialized view selection that exploits a data mining technique (clustering), in order to determine clusters of similar queries. We also propose a view merging algorithm that builds a set of candidate views, as well as a greedy process for selecting a set of views to materialize. This selection is based on cost models that evaluate the cost of accessing data using views and the cost of storing these views. To validate our strategy, we executed a workload of decision-support queries on a test data warehouse, with and without using our strategy. Our experimental results demonstrate its efficiency, even when storage space is limited

    Write-limited sorts and joins for persistent memory

    Get PDF
    To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byteaddressable, but exhibits asymmetric I/O: writes are typically one order of magnitude more expensive than reads. Byte addressability combined with I/O asymmetry render the performance profile of persistent memory unique. Thus, it becomes imperative to find new ways to seamlessly incorporate it into database systems. We do so in the context of query processing. We focus on the fundamental operations of sort and join processing. We introduce the notion of write-limited algorithms that effectively minimize the I/O cost. We give a high-level API that enables the system to dynamically optimize the workflow of the algorithms; or, alternatively, allows the developer to tune the write profile of the algorithms. We present four different techniques to incorporate persistent memory into the database processing stack in light of this API. We have implemented and extensively evaluated all our proposals. Our results show that the algorithms deliver on their promise of I/O-minimality and tunable performance. We showcase the merits and deficiencies of each implementation technique, thus taking a solid first step towards incorporating persistent memory into query processing. 1

    Integrating OLAP and Ranking: The Ranking-Cube Methodology

    Get PDF
    Recent years have witnessed an enormous growth of data in business, industry, and Web applications. Database search often returns a large collection of results, which poses challenges to both efficient query processing and effective digest of the query results. To address this problem, ranked search has been introduced to database systems. We study the problem of On-Line Analytical Processing (OLAP) of ranked queries, where ranked queries are conducted in the arbitrary subset of data defined by multi-dimensional selections. While pre-computation and multi-dimensional aggregation is the standard solution for OLAP, materializing dynamic ranking results is unrealistic because the ranking criteria are not known until the query time. To overcome such difficulty, we develop a new ranking cube method that performs semi on-line materialization and semi online computation in this thesis. Its complete life cycle, including cube construction, incremental maintenance, and query processing, is also discussed. We further extend the ranking cube in three dimensions. First, how to answer queries in high-dimensional data. Second, how to answer queries which involves joins over multiple relations. Third, how to answer general preference queries (besides ranked queries, such as skyline queries). Our performance studies show that ranking-cube is orders of magnitude faster than previous approaches

    An Efficient Query Optimizer with Materialized Intermediate Views in Distributed and Cloud Environment

    Get PDF
    In cloud computing environment hardware resources required for the execution of query using distributed relational database system are scaled up or scaled down according to the query workload performance. Complex queries require large scale of resources in order to complete their execution efficiently. The large scale of resource requirements can be reduced by minimizing query execution time that maximizes resource utilization and decreases payment overhead of customers. Complex queries or batch queries contain some common subexpressions. If these common subexpressions evaluated once and their results are cached, they can be used for execution of further queries. In this research, we have come up with an algorithm for query optimization, which aims at storing intermediate results of the queries and use these by-products for execution of future queries. Extensive experiments have been carried out with the help of simulation model to test the algorithm efficiency
    corecore