1,595 research outputs found
Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores
Modern business applications and scientific databases call for inherently
dynamic data storage environments. Such environments are characterized by two
challenging features: (a) they have little idle system time to devote on
physical design; and (b) there is little, if any, a priori workload knowledge,
while the query and data workload keeps changing dynamically. In such
environments, traditional approaches to index building and maintenance cannot
apply. Database cracking has been proposed as a solution that allows on-the-fly
physical data reorganization, as a collateral effect of query processing.
Cracking aims to continuously and automatically adapt indexes to the workload
at hand, without human intervention. Indexes are built incrementally,
adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing
methods fail to deliver workload-robustness; they perform much better with
random workloads than with others. This frailty derives from the inelasticity
with which these approaches interpret each query as a hint on how data should
be stored. Current cracking schemes blindly reorganize the data within each
query's range, even if that results into successive expensive operations with
minimal indexing benefit. In this paper, we introduce stochastic cracking, a
significantly more resilient approach to adaptive indexing. Stochastic cracking
also uses each query as a hint on how to reorganize data, but not blindly so;
it gains resilience and avoids performance bottlenecks by deliberately applying
certain arbitrary choices in its decision-making. Thereby, we bring adaptive
indexing forward to a mature formulation that confers the workload-robustness
previous approaches lacked. Our extensive experimental study verifies that
stochastic cracking maintains the desired properties of original database
cracking while at the same time it performs well with diverse realistic
workloads.Comment: VLDB201
Clustering-Based Materialized View Selection in Data Warehouses
Materialized view selection is a non-trivial task. Hence, its complexity must
be reduced. A judicious choice of views must be cost-driven and influenced by
the workload experienced by the system. In this paper, we propose a framework
for materialized view selection that exploits a data mining technique
(clustering), in order to determine clusters of similar queries. We also
propose a view merging algorithm that builds a set of candidate views, as well
as a greedy process for selecting a set of views to materialize. This selection
is based on cost models that evaluate the cost of accessing data using views
and the cost of storing these views. To validate our strategy, we executed a
workload of decision-support queries on a test data warehouse, with and without
using our strategy. Our experimental results demonstrate its efficiency, even
when storage space is limited
Write-limited sorts and joins for persistent memory
To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byteaddressable, but exhibits asymmetric I/O: writes are typically one order of magnitude more expensive than reads. Byte addressability combined with I/O asymmetry render the performance profile of persistent memory unique. Thus, it becomes imperative to find new ways to seamlessly incorporate it into database systems. We do so in the context of query processing. We focus on the fundamental operations of sort and join processing. We introduce the notion of write-limited algorithms that effectively minimize the I/O cost. We give a high-level API that enables the system to dynamically optimize the workflow of the algorithms; or, alternatively, allows the developer to tune the write profile of the algorithms. We present four different techniques to incorporate persistent memory into the database processing stack in light of this API. We have implemented and extensively evaluated all our proposals. Our results show that the algorithms deliver on their promise of I/O-minimality and tunable performance. We showcase the merits and deficiencies of each implementation technique, thus taking a solid first step towards incorporating persistent memory into query processing. 1
Integrating OLAP and Ranking: The Ranking-Cube Methodology
Recent years have witnessed an enormous growth of data in business, industry, and Web applications. Database search often returns a large collection of results, which poses challenges to both efficient query processing and effective digest of the query results. To address this problem, ranked search has been introduced to database systems. We study the problem of On-Line Analytical Processing (OLAP) of ranked queries, where ranked queries are conducted in the arbitrary subset of data defined by multi-dimensional selections. While pre-computation and multi-dimensional aggregation is the standard solution for OLAP, materializing dynamic ranking results is unrealistic because the ranking criteria are not known until the query time. To overcome such difficulty, we develop a new ranking cube method that performs semi on-line materialization and semi online computation in this thesis. Its complete life cycle, including cube construction, incremental maintenance, and query processing, is also discussed. We further extend the ranking cube in three dimensions. First, how to answer queries in high-dimensional data. Second, how to answer queries which involves joins over multiple relations. Third, how to answer general preference queries (besides ranked queries, such as skyline queries). Our performance studies show that ranking-cube is orders of magnitude faster than previous approaches
An Efficient Query Optimizer with Materialized Intermediate Views in Distributed and Cloud Environment
In cloud computing environment hardware resources required for the execution of query using distributed relational database system are scaled up or scaled down according to the query workload performance. Complex queries require large scale of resources in order to complete their execution efficiently. The large scale of resource requirements can be reduced by minimizing query execution time that maximizes resource utilization and decreases payment overhead of customers. Complex queries or batch queries contain some common subexpressions. If these common subexpressions evaluated once and their results are cached, they can be used for execution of further queries. In this research, we have come up with an algorithm for query optimization, which aims at storing intermediate results of the queries and use these by-products for execution of future queries. Extensive experiments have been carried out with the help of simulation model to test the algorithm efficiency
- …