1,471 research outputs found

    Maintenance-cost view-selection in large data warehouse systems: algorithms, implementations and evaluations.

    Choi Chi Hon. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 120-126). Abstracts in English and Chinese.

    Contents:
    Abstract
    Abstract (Chinese)
    Acknowledgement
    Contents
    List of Figures
    List of Tables
    Chapter 1: Introduction
        1.1 Maintenance Cost View Selection Problem
        1.2 Previous Research Works
        1.3 Major Contributions
        1.4 Thesis Organization
    Chapter 2: Literature Review
        2.1 Data Warehouse and OLAP Systems
            2.1.1 What Is Data Warehouse?
            2.1.2 What Is OLAP?
            2.1.3 Difference Between Operational Database Systems and OLAP
            2.1.4 Data Warehouse Architecture
            2.1.5 Multidimensional Data Model
            2.1.6 Star Schema and Snowflake Schema
            2.1.7 Data Cube
            2.1.8 ROLAP and MOLAP
            2.1.9 Query Optimization
        2.2 Materialized View
            2.2.1 What Is A Materialized View
            2.2.2 The Role of Materialized View in OLAP
            2.2.3 The Challenges in Exploiting Materialized View
            2.2.4 What Is View Maintenance
        2.3 View Selection
            2.3.1 Selection Strategy
        2.4 Summary
    Chapter 3: Problem Definition
        3.1 View Selection Under Constraint
        3.2 The Lattice Framework for Maintenance Cost View Selection Problem
        3.3 The Difficulties of Maintenance Cost View Selection Problem
        3.4 Summary
    Chapter 4: What Difference Heuristics Make
        4.1 Motivation
        4.2 Example
        4.3 Existing Algorithms
            4.3.1 A*-Heuristic
            4.3.2 Inverted-Tree Greedy
            4.3.3 Two-Phase Greedy
            4.3.4 Integrated Greedy
        4.4 A Performance Study
        4.5 Summary
    Chapter 5: Materialized View Selection as Constrained Evolutionary Optimization
        5.1 Motivation
        5.2 Evolutionary Algorithms
            5.2.1 Constraint Handling: Penalty vs. Stochastic Ranking
            5.2.2 The New Stochastic Ranking Evolutionary Algorithm
        5.3 Experimental Studies
            5.3.1 Experimental Setup
            5.3.2 Experimental Results
        5.4 Summary
    Chapter 6: Dynamic Materialized View Management Based On Predicates
        6.1 Motivation
        6.2 Examples
        6.3 Related Work: Static Prepartitioning-Based Materialized View Management
        6.4 A New Dynamic Predicate-based Partitioning Approach
            6.4.1 System Overview
            6.4.2 Partition Advisor
            6.4.3 View Manager
        6.5 A Performance Study
            6.5.1 Performance Metrics
            6.5.2 Feasibility Studies
            6.5.3 Query Locality
            6.5.4 The Effectiveness of Disk Size
            6.5.5 Scalability
        6.6 Summary
    Chapter 7: Conclusions and Future Work
    Bibliography

    Clustering-Based Materialized View Selection in Data Warehouses

    Materialized view selection is a non-trivial task, so its complexity must be reduced. A judicious choice of views must be cost-driven and influenced by the workload experienced by the system. In this paper, we propose a framework for materialized view selection that exploits a data mining technique (clustering) to determine clusters of similar queries. We also propose a view merging algorithm that builds a set of candidate views, as well as a greedy process for selecting a set of views to materialize. This selection is based on cost models that evaluate the cost of accessing data using views and the cost of storing these views. To validate our strategy, we executed a workload of decision-support queries on a test data warehouse, with and without using our strategy. Our experimental results demonstrate its efficiency, even when storage space is limited.

    A Novel Hybrid Optimization With Ensemble Constraint Handling Approach for the Optimal Materialized Views

    Working with a data warehouse is challenging because it demands a significant investment of both time and space. Rapid data processing is therefore essential to reduce the response time of queries sent to the warehouse. One significant approach to this problem is view materialization. It is extremely unlikely that all of the views that can be derived from the data will ever be materialized, so subsets of views need to be selected intelligently to enable rapid processing of queries coming from a variety of locations. The proposed model addresses the materialized view selection problem. It is based on ensemble constraint handling techniques (ECHT): to optimize the problem, the constraints are taken into account through the self-adaptive penalty, the epsilon (ε) parameter, and stochastic ranking. To select queries from the data warehouse more quickly and accurately, the model implements a new algorithm, the constrained hybrid Ebola with COATI optimization (CHECO) algorithm. Three objectives are defined for computing the best fitness: "processing cost of the query," "response cost," and "maintenance cost." The CHECO algorithm selects the top views according to whether the defined fitness requirements are met. Finally, the proposed model is compared with existing models to validate the performance improvement across a variety of performance metrics.

    A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing

    The rapidly increasing amount of stored data has spurred researchers to seek methods for taking optimal advantage of it, most of which face a response-time problem because of the enormous size of the data. Most solutions suggest materialization as the preferred approach. However, such a solution cannot deliver Real-Time answers. In this paper we propose a framework that illustrates the barriers to, and suggested solutions for, achieving Real-Time OLAP answers, which are widely used in decision support systems and data warehouses.

    RDFViewS: A Storage Tuning Wizard for RDF Applications

    In recent years, the significant growth of RDF data used in numerous applications has made its efficient and scalable manipulation an important issue. In this paper, we present RDFViewS, a system capable of choosing the most suitable views to materialize, in order to minimize the query response time for a specific SPARQL query workload, while taking into account the view maintenance cost and storage space constraints. Our system employs practical algorithms and heuristics to navigate through the search space of potential view configurations, and exploits the semantic information that may be available, expressed via an RDF Schema, to ensure the completeness of the query evaluation.

    XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

    Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improve query response time is to use reconstruction views - materialized XML subtrees of an XML document, whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree $T$, in which every node $i$ has a size $c_i$ and profit $p_i$, and the size limitation $C$. The target is to find a subset of subtrees rooted at nodes $i_1, \cdots, i_k$ respectively such that $c_{i_1}+\cdots+c_{i_k}\le C$, and $p_{i_1}+\cdots+p_{i_k}$ is maximal. Furthermore, there is no overlap between any two subtrees selected in the solution. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution.
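
    To make the problem statement above concrete, the following is a minimal sketch of an exact dynamic program for small instances. It is not the paper's FPTAS: it assumes positive integer sizes and runs in time pseudo-polynomial in the size limit. The function name `best_profit` and the dictionary-based tree encoding are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List


def best_profit(children: Dict[int, List[int]], size: Dict[int, int],
                profit: Dict[int, int], root: int, capacity: int) -> int:
    """Maximum total profit of non-overlapping subtrees with total size <= capacity.

    Illustrative encoding (an assumption): children[v] lists the child nodes of v,
    size[v] and profit[v] are the cost and gain of selecting the subtree rooted at v.
    """

    def solve(v: int) -> List[int]:
        # table[c] = best profit obtainable inside v's subtree with budget c.
        table = [0] * (capacity + 1)

        # Option A: do not select v itself; split the budget among its children.
        for child in children.get(v, []):
            child_table = solve(child)
            merged = [0] * (capacity + 1)
            for c in range(capacity + 1):
                merged[c] = max(table[c - k] + child_table[k] for k in range(c + 1))
            table = merged

        # Option B: select the whole subtree rooted at v. This rules out every
        # descendant, so the profit is p_v alone, which enforces the no-overlap rule.
        for c in range(size[v], capacity + 1):
            table[c] = max(table[c], profit[v])
        return table

    return solve(root)[capacity]


# Tiny example: a root (node 0) with two leaf children (nodes 1 and 2).
tree = {0: [1, 2]}
sizes = {0: 5, 1: 2, 2: 2}
profits = {0: 10, 1: 4, 2: 3}
print(best_profit(tree, sizes, profits, root=0, capacity=4))
```

    With a budget of 4 the root's subtree does not fit, so the sketch falls back to the two leaf subtrees. The recursion is only meant for small trees; a deep tree would need an iterative traversal.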