6 research outputs found

    An architecture for recycling intermediates in a column-store

    Get PDF
    Automatically recycling (intermediate) results is a grand challenge for state-of-the-art databases to improve both query response time and throughput. Tuples are loaded and streamed through a tuple-at-a-time processing pipeline avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully materialized results in each step of the query plan. To avoid resource contention, these intermediates are evicted as soon as possible. In this paper we study an architecture that harvests the by-products of the operator-at-a-time paradigm in a column store system using a lightweight mechanism, the recycler. The key challenge then becomes selection of the policies to admit intermediates to the resource pool, their retention period, and the eviction strategy when facing resource limitations. The proposed recycling architecture has been implemented in an open-source system. An experimental analysis against the TPC-H ad-hoc decision support benchmark and a complex, real-world application (SkyServer) demonstrates its effectiveness in terms of self-organizing behavior and its significant performance gains. The results indicate the potentials of recycling intermediates and charters a route for further development of database kernels

    Maintenance-cost view-selection in large data warehouse systems: algorithms, implementations and evaluations.

    Get PDF
    Choi Chi Hon.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 120-126).Abstracts in English and Chinese.Abstract --- p.iAbstract (Chinese) --- p.iiAcknowledgement --- p.iiiContents --- p.ivList of Figures --- p.viiiList of Tables --- p.xChapter 1 --- Introduction --- p.1Chapter 1.1 --- Maintenance Cost View Selection Problem --- p.2Chapter 1.2 --- Previous Research Works --- p.3Chapter 1.3 --- Major Contributions --- p.4Chapter 1.4 --- Thesis Organization --- p.6Chapter 2 --- Literature Review --- p.7Chapter 2.1 --- Data Warehouse and OLAP Systems --- p.8Chapter 2.1.1 --- What Is Data Warehouse? --- p.8Chapter 2.1.2 --- What Is OLAP? --- p.10Chapter 2.1.3 --- Difference Between Operational Database Systems and OLAP --- p.10Chapter 2.1.4 --- Data Warehouse Architecture --- p.12Chapter 2.1.5 --- Multidimensional Data Model --- p.13Chapter 2.1.6 --- Star Schema and Snowflake Schema --- p.15Chapter 2.1.7 --- Data Cube --- p.17Chapter 2.1.8 --- ROLAP and MOLAP --- p.19Chapter 2.1.9 --- Query Optimization --- p.20Chapter 2.2 --- Materialized View --- p.22Chapter 2.2.1 --- What Is A Materialized View --- p.23Chapter 2.2.2 --- The Role of Materialized View in OLAP --- p.23Chapter 2.2.3 --- The Challenges in Exploiting Materialized View --- p.24Chapter 2.2.4 --- What Is View Maintenance --- p.25Chapter 2.3 --- View Selection --- p.27Chapter 2.3.1 --- Selection Strategy --- p.27Chapter 2.4 --- Summary --- p.32Chapter 3 --- Problem Definition --- p.33Chapter 3.1 --- View Selection Under Constraint --- p.33Chapter 3.2 --- The Lattice Framework for Maintenance Cost View Selection Prob- lem --- p.35Chapter 3.3 --- The Difficulties of Maintenance Cost View Selection Problem --- p.39Chapter 3.4 --- Summary --- p.41Chapter 4 --- What Difference Heuristics Make --- p.43Chapter 4.1 --- Motivation --- p.44Chapter 4.2 --- Example --- p.46Chapter 4.3 --- Existing Algorithms --- p.49Chapter 4.3.1 --- A*-Heuristic --- p.51Chapter 4.3.2 --- Inverted-Tree Greedy --- p.52Chapter 4.3.3 --- Two-Phase Greedy --- p.54Chapter 4.3.4 --- Integrated Greedy --- p.57Chapter 4.4 --- A Performance Study --- p.60Chapter 4.5 --- Summary --- p.68Chapter 5 --- Materialized View Selection as Constrained Evolutionary Opti- mization --- p.71Chapter 5.1 --- Motivation --- p.72Chapter 5.2 --- Evolutionary Algorithms --- p.73Chapter 5.2.1 --- Constraint Handling: Penalty v.s. Stochastic Ranking --- p.74Chapter 5.2.2 --- The New Stochastic Ranking Evolutionary Algorithm --- p.78Chapter 5.3 --- Experimental Studies --- p.81Chapter 5.3.1 --- Experimental Setup --- p.82Chapter 5.3.2 --- Experimental Results --- p.82Chapter 5.4 --- Summary --- p.89Chapter 6 --- Dynamic Materialized View Management Based On Predicates --- p.90Chapter 6.1 --- Motivation --- p.91Chapter 6.2 --- Examples --- p.93Chapter 6.3 --- Related Work: Static Prepartitioning-Based Materialized View Management --- p.96Chapter 6.4 --- A New Dynamic Predicate-based Partitioning Approach --- p.99Chapter 6.4.1 --- System Overview --- p.102Chapter 6.4.2 --- Partition Advisor --- p.103Chapter 6.4.3 --- View Manager --- p.104Chapter 6.5 --- A Performance Study --- p.108Chapter 6.5.1 --- Performance Metrics --- p.110Chapter 6.5.2 --- Feasibility Studies --- p.110Chapter 6.5.3 --- Query Locality --- p.112Chapter 6.5.4 --- The Effectiveness of Disk Size --- p.115Chapter 6.5.5 --- Scalability --- p.115Chapter 6.6 --- Summary --- p.116Chapter 7 --- Conclusions and Future Work --- p.118Bibliography --- p.12

    An architecture for recycling intermediates in a column-store

    Get PDF
    Automatic recycling intermediate results to improve both query response time and throughput is a grand c

    An architecture for recycling intermediates in a column-store

    Full text link
    textabstractAutomatically recycling (intermediate) results is a grand challenge for state-of-the-art databases to improve both query response time and throughput. Tuples are loaded and streamed through a tuple-at-a-time processing pipeline avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully materialized results in each step of the query plan. To avoid resource contention, these intermediates are evicted as soon as possible. In this paper we study an architecture that harvests the by-products of the operator-at-a-time paradigm in a column store system using a lightweight mechanism, the recycler. The key challenge then becomes selection of the policies to admit intermediates to the resource pool, their retention period, and the eviction strategy when facing resource limitations. The proposed recycling architecture has been implemented in an open-source system. An experimental analysis against the TPC-H ad-hoc decision support benchmark and a complex, real-world application (SkyServer) demonstrates its effectiveness in terms of self-organizing behavior and its significant performance gains. The results indicate the potentials of recycling intermediates and charters a route for further development of database kernels

    Research on Materialized View Selection

    Get PDF
    定义了数据仓库领域的视图选择问题,并讨论了与该问题相关的代价模型、收益函数、代价计算、约束条件和视图索引等内容;介绍了3大类视图选择方法,即静态方法、动态方法和混合方法,以及各类方法的代表性研究成果;最后展望未来的研究方向.Definition of view selection issue in the field of data warehouses is presented, followed by the discussion of related problems, such as cost model, benefit function, cost computation, restriction condition, view index, etc. Then three categories of view selection methods, namely, static, dynamic and hybrid methods are discussed. For each method, some representative work is introduced. Finally some future trends in this area are discussed.Supported by the National Natural Science Foundation of China under Grant No.60473051 (国家自然科学基金); the National High-Tech Research and Development Plan of China under Grant Nos.2007AA01Z191, 2006AA01Z230 (国家高技术研究发展计划(863)
    corecore