131 research outputs found
Automatic physical database design : recommending materialized views
This work discusses physical database design while focusing on the problem of selecting materialized views for improving the performance of a database system. We first address the satisfiability and implication problems for mixed arithmetic constraints. The results are used to support the construction of a search space for view selection problems. We proposed an approach for constructing a search space based on identifying maximum commonalities among queries and on rewriting queries using views. These commonalities are used to define candidate views for materialization from which an optimal or near-optimal set can be chosen as a solution to the view selection problem. Using a search space constructed this way, we address a specific instance of the view selection problem that aims at minimizing the view maintenance cost of multiple materialized views using multi-query optimization techniques. Further, we study this same problem in the context of a commercial database management system in the presence of memory and time restrictions. We also suggest a heuristic approach for maintaining the views while guaranteeing that the restrictions are satisfied. Finally, we consider a dynamic version of the view selection problem where the workload is a sequence of query and update statements. In this case, the views can be created (materialized) and dropped during the execution of the workload. We have implemented our approaches to the dynamic view selection problem and performed extensive experimental testing. Our experiments show that our approaches perform in most cases better than previous ones in terms of effectiveness and efficiency
Maintenance-cost view-selection in large data warehouse systems: algorithms, implementations and evaluations.
Choi Chi Hon.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 120-126).Abstracts in English and Chinese.Abstract --- p.iAbstract (Chinese) --- p.iiAcknowledgement --- p.iiiContents --- p.ivList of Figures --- p.viiiList of Tables --- p.xChapter 1 --- Introduction --- p.1Chapter 1.1 --- Maintenance Cost View Selection Problem --- p.2Chapter 1.2 --- Previous Research Works --- p.3Chapter 1.3 --- Major Contributions --- p.4Chapter 1.4 --- Thesis Organization --- p.6Chapter 2 --- Literature Review --- p.7Chapter 2.1 --- Data Warehouse and OLAP Systems --- p.8Chapter 2.1.1 --- What Is Data Warehouse? --- p.8Chapter 2.1.2 --- What Is OLAP? --- p.10Chapter 2.1.3 --- Difference Between Operational Database Systems and OLAP --- p.10Chapter 2.1.4 --- Data Warehouse Architecture --- p.12Chapter 2.1.5 --- Multidimensional Data Model --- p.13Chapter 2.1.6 --- Star Schema and Snowflake Schema --- p.15Chapter 2.1.7 --- Data Cube --- p.17Chapter 2.1.8 --- ROLAP and MOLAP --- p.19Chapter 2.1.9 --- Query Optimization --- p.20Chapter 2.2 --- Materialized View --- p.22Chapter 2.2.1 --- What Is A Materialized View --- p.23Chapter 2.2.2 --- The Role of Materialized View in OLAP --- p.23Chapter 2.2.3 --- The Challenges in Exploiting Materialized View --- p.24Chapter 2.2.4 --- What Is View Maintenance --- p.25Chapter 2.3 --- View Selection --- p.27Chapter 2.3.1 --- Selection Strategy --- p.27Chapter 2.4 --- Summary --- p.32Chapter 3 --- Problem Definition --- p.33Chapter 3.1 --- View Selection Under Constraint --- p.33Chapter 3.2 --- The Lattice Framework for Maintenance Cost View Selection Prob- lem --- p.35Chapter 3.3 --- The Difficulties of Maintenance Cost View Selection Problem --- p.39Chapter 3.4 --- Summary --- p.41Chapter 4 --- What Difference Heuristics Make --- p.43Chapter 4.1 --- Motivation --- p.44Chapter 4.2 --- Example --- p.46Chapter 4.3 --- Existing Algorithms --- p.49Chapter 4.3.1 --- A*-Heuristic --- p.51Chapter 4.3.2 --- Inverted-Tree Greedy --- p.52Chapter 4.3.3 --- Two-Phase Greedy --- p.54Chapter 4.3.4 --- Integrated Greedy --- p.57Chapter 4.4 --- A Performance Study --- p.60Chapter 4.5 --- Summary --- p.68Chapter 5 --- Materialized View Selection as Constrained Evolutionary Opti- mization --- p.71Chapter 5.1 --- Motivation --- p.72Chapter 5.2 --- Evolutionary Algorithms --- p.73Chapter 5.2.1 --- Constraint Handling: Penalty v.s. Stochastic Ranking --- p.74Chapter 5.2.2 --- The New Stochastic Ranking Evolutionary Algorithm --- p.78Chapter 5.3 --- Experimental Studies --- p.81Chapter 5.3.1 --- Experimental Setup --- p.82Chapter 5.3.2 --- Experimental Results --- p.82Chapter 5.4 --- Summary --- p.89Chapter 6 --- Dynamic Materialized View Management Based On Predicates --- p.90Chapter 6.1 --- Motivation --- p.91Chapter 6.2 --- Examples --- p.93Chapter 6.3 --- Related Work: Static Prepartitioning-Based Materialized View Management --- p.96Chapter 6.4 --- A New Dynamic Predicate-based Partitioning Approach --- p.99Chapter 6.4.1 --- System Overview --- p.102Chapter 6.4.2 --- Partition Advisor --- p.103Chapter 6.4.3 --- View Manager --- p.104Chapter 6.5 --- A Performance Study --- p.108Chapter 6.5.1 --- Performance Metrics --- p.110Chapter 6.5.2 --- Feasibility Studies --- p.110Chapter 6.5.3 --- Query Locality --- p.112Chapter 6.5.4 --- The Effectiveness of Disk Size --- p.115Chapter 6.5.5 --- Scalability --- p.115Chapter 6.6 --- Summary --- p.116Chapter 7 --- Conclusions and Future Work --- p.118Bibliography --- p.12
Efficient OLAP Operations For RDF Analytics
International audienceRDF is the leading data model for the Semantic Web, and dedicated query languages such as SPARQL 1.1, featuring in particular aggregation, allow extracting information from RDF graphs. A framework for analytical processing of RDF data was introduced in [1], where analytical schemas and analytical queries (cubes) are fully redesigned for heterogeneous, semantic-rich RDF graphs. In this novel analytical setting, we consider the following optimization problem: how to reuse the materialized result of a given RDF analytical query (cube) in order to compute the answer to another cube. We provide view-based rewriting algorithms for these cube transformations, and demonstrate experimentally their practical interest
Optimizing Queries Using a Materialized View in a Data Warehoue
A data warehouse is a user-centered environment for data analysis and decision support. To support decision maker in making decisions quickly and accurately, using materialized views can provide significant improvements in query processing time. The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the database relations. The known algorithms, the bucket algorithm, the inverse-rules algorithm have been used to rewrite queries using views before executing the queries. The bucket algorithm, predominantly used to rewrite queries, generates a candidate rewriting to a query using views then checks that the rewriting is contained in the original query. However, we show same deficiencies in the bucket algorithm then describe the containment bucket algorithm and give an optimal method to solve this problem. We present an experiment comparing the performance of both algorithms.Computer Science Departmen
The Data Lakehouse: Data Warehousing and More
Relational Database Management Systems designed for Online Analytical
Processing (RDBMS-OLAP) have been foundational to democratizing data and
enabling analytical use cases such as business intelligence and reporting for
many years. However, RDBMS-OLAP systems present some well-known challenges.
They are primarily optimized only for relational workloads, lead to
proliferation of data copies which can become unmanageable, and since the data
is stored in proprietary formats, it can lead to vendor lock-in, restricting
access to engines, tools, and capabilities beyond what the vendor offers. As
the demand for data-driven decision making surges, the need for a more robust
data architecture to address these challenges becomes ever more critical. Cloud
data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but
they present their own set of challenges. More recently, organizations have
often followed a two-tier architectural approach to take advantage of both
these platforms, leveraging both cloud data lakes and RDBMS-OLAP systems.
However, this approach brings additional challenges, complexities, and
overhead. This paper discusses how a data lakehouse, a new architectural
approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake
combined, while also providing additional advantages. We take today's data
warehousing and break it down into implementation independent components,
capabilities, and practices. We then take these aspects and show how a
lakehouse architecture satisfies them. Then, we go a step further and discuss
what additional capabilities and benefits a lakehouse architecture provides
over an RDBMS-OLAP
A solution to the materialized view selection problem in data warehousing
One of the most important decisions in the physical designing of a data warehouse is the selection of materialized views and indexes to be created. The problem is to select an appropriate set of views and indexes to storage that minimizes the total query response time, as long as the cost of maintaining them, given a constraint of some resource like storage space, is kept as low as possible.In this work, we have developed a new algorithm for the general problem of se-lection of views considering indexes, as an extension to a well-known algorithm.
We present a heuristic for selection of views and indexes to optimize total que-ry response under a materialization time constraint. Finally, we present an ex-perimental comparison of our proposal with the considered state-of-art ap-proach.XI Workshop Bases de Datos y MinerĂa de DatosRed de Universidades con Carreras de Informática (RedUNCI
A solution to the materialized view selection problem in data warehousing
One of the most important decisions in the physical designing of a data warehouse is the selection of materialized views and indexes to be created. The problem is to select an appropriate set of views and indexes to storage that minimizes the total query response time, as long as the cost of maintaining them, given a constraint of some resource like storage space, is kept as low as possible.In this work, we have developed a new algorithm for the general problem of se-lection of views considering indexes, as an extension to a well-known algorithm.
We present a heuristic for selection of views and indexes to optimize total que-ry response under a materialization time constraint. Finally, we present an ex-perimental comparison of our proposal with the considered state-of-art ap-proach.XI Workshop Bases de Datos y MinerĂa de DatosRed de Universidades con Carreras de Informática (RedUNCI
- …