146 research outputs found

    Recycling in pipelined query evaluation

    Database systems typically execute queries in isolation. Sharing of recurring intermediate and final results between successive query invocations is either ignored or exploited only by caching final query results. The DBA is kept in the loop to make explicit sharing decisions by identifying and/or defining materialized views; such decisions therefore come late, and sharing opportunities may be missed. Recycling intermediate results has been proposed as a way for database query engines to profit from opportunities to reuse fine-grained partial query results; it is fully autonomous and continuously adapts to changes in the workload. The technique was recently revisited in the context of MonetDB, a system that by default materializes all intermediate results. Since materializing intermediate results can consume significant system resources, most other database systems avoid it where possible, following a pipelined query architecture instead. The novelty of this paper is to show how recycling can be applied successfully in pipelined query executors, by tracking the benefit of materializing candidate intermediate results and then choosing the ones that make best use of a limited intermediate result cache. We present ways to maximize the potential of recycling by leveraging subsumption and proactive query rewriting. We have implemented our approach in the Vectorwise database engine and experimentally evaluated its potential using both synthetic and real-world datasets. Our results show that intermediate result recycling significantly improves performance.
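
    To make the "tracking the benefit" idea concrete, below is a minimal sketch of a benefit-driven intermediate-result cache, assuming a hypothetical RecyclerCache keyed by plan fingerprints; the names and the benefit-per-byte heuristic are illustrative, not Vectorwise internals.

```python
# Hypothetical sketch of a benefit-driven intermediate-result cache in the
# spirit of the recycling approach above. All names (RecyclerCache, plan
# fingerprints, cost estimates) are illustrative, not Vectorwise internals.

class CacheEntry:
    def __init__(self, result, size_bytes, cost_saved):
        self.result = result          # materialized intermediate result
        self.size_bytes = size_bytes  # memory occupied in the cache
        self.cost_saved = cost_saved  # estimated execution cost avoided per reuse
        self.hits = 0                 # observed reuse count

class RecyclerCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = {}  # plan fingerprint -> CacheEntry

    def lookup(self, fingerprint):
        entry = self.entries.get(fingerprint)
        if entry:
            entry.hits += 1
            return entry.result
        return None

    def admit(self, fingerprint, result, size_bytes, cost_saved):
        """Admit a new intermediate result, evicting low-benefit entries."""
        def density(e):
            # Benefit per byte, weighted by observed reuse.
            return (e.hits + 1) * e.cost_saved / max(e.size_bytes, 1)

        new_density = cost_saved / max(size_bytes, 1)
        while self.used + size_bytes > self.capacity and self.entries:
            victim = min(self.entries, key=lambda k: density(self.entries[k]))
            # Never evict entries more valuable than the candidate.
            if density(self.entries[victim]) >= new_density:
                return False
            self.used -= self.entries.pop(victim).size_bytes
        if size_bytes > self.capacity:
            return False
        self.entries[fingerprint] = CacheEntry(result, size_bytes, cost_saved)
        self.used += size_bytes
        return True
```

    The admission test refuses results that would displace cached entries with higher benefit per byte, which is one plausible reading of choosing the results that make best use of a limited cache.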

    Vectorwise: Beyond Column Stores

    This paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product, and its future research and development roadmap. One take-away from this story is that the novelty in Vectorwise is much more than just column storage: it boasts many query processing innovations in its vectorized execution model, and an adaptive mixed row/column data storage model with indexing support tailored to analytical workloads. Another is that the road from research prototype to commercial product is long, though database research continues to exert a strong innovative influence on product development.
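
    As a flavor of what vectorized execution means, the toy sketch below processes a column a fixed-size vector at a time rather than a tuple at a time, so per-call interpretation overhead is amortized over many values. It is a Python/NumPy illustration only; Vectorwise's actual primitives are compiled, type-specialized kernels.

```python
# Toy illustration of vector-at-a-time execution: each operator call
# processes a whole vector of values, amortizing interpretation overhead.

import numpy as np

VECTOR_SIZE = 1024  # tuples processed per operator call

def scan(column):
    """Yield the column in fixed-size vectors (chunks)."""
    for start in range(0, len(column), VECTOR_SIZE):
        yield column[start:start + VECTOR_SIZE]

def select(vectors, predicate):
    """Filter each vector with one boolean-mask operation."""
    for vec in vectors:
        yield vec[predicate(vec)]

def aggregate_sum(vectors):
    """Sum each vector in one tight loop per call."""
    return sum(float(np.sum(vec)) for vec in vectors)

prices = np.random.rand(1_000_000)
total = aggregate_sum(select(scan(prices), lambda v: v > 0.5))
print(total)
```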

    Practical cross-engine transactions in dual-engine database systems

    With the growing DRAM capacity and core count in modern servers, database systems are becoming increasingly multi-engine, featuring a heterogeneous set of engines. In particular, a memory-optimized engine and a conventional storage-centric engine may coexist to satisfy various application needs. However, handling cross-engine transactions that access more than one engine remains challenging in terms of correctness, performance, and programmability. This thesis describes Skeena, an approach to cross-engine transactions with proper isolation guarantees and low overhead. Skeena adapts and integrates past concurrency control theory to provide a complete solution for supporting various isolation levels in dual-engine systems, and proposes a lightweight transaction tracking structure that captures the information necessary to guarantee correctness with low overhead. Evaluation on a 40-core server shows that Skeena incurs only minuscule overhead for cross-engine transactions, without penalizing single-engine transactions.
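
    The sketch below shows one plausible shape of such a design: a wrapper records which engines a transaction touches and pays cross-engine coordination only when more than one engine is involved, echoing the goal of not penalizing single-engine transactions. The engine API and the two-phase commit used here are stand-in assumptions, not Skeena's actual snapshot-based tracking structure.

```python
# Hypothetical cross-engine transaction wrapper. Engine handles are assumed
# to expose read/write/prepare/commit/abort; this is illustrative only and
# substitutes generic two-phase commit for Skeena's actual mechanism.

class CrossEngineTxn:
    def __init__(self, engines):
        self.engines = engines  # name -> engine handle
        self.touched = set()    # engines this transaction has accessed

    def read(self, engine_name, key):
        self.touched.add(engine_name)
        return self.engines[engine_name].read(key)

    def write(self, engine_name, key, value):
        self.touched.add(engine_name)
        self.engines[engine_name].write(key, value)

    def commit(self):
        if len(self.touched) <= 1:
            # Single-engine: commit locally, no cross-engine coordination.
            for name in self.touched:
                self.engines[name].commit()
            return
        # Cross-engine: prepare on every touched engine, then commit all.
        for name in self.touched:
            if not self.engines[name].prepare():
                for n in self.touched:
                    self.engines[n].abort()
                raise RuntimeError("cross-engine transaction aborted")
        for name in self.touched:
            self.engines[name].commit()
```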

    Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution

    Cloud-based data analysis is nowadays common practice because of the lower system management overhead and the pay-as-you-go pricing model. The pricing model, however, is not always suitable for query processing, as heavy use results in high costs. For example, in query-as-a-service systems, where users are charged per processed byte, collections of queries that frequently access the same data can become expensive. The problem is compounded by the limited options users have to optimize query execution when using declarative interfaces such as SQL. In this paper, we show how, without modifying existing systems and without the involvement of the cloud provider, it is possible to significantly reduce the overhead, and hence the cost, of query-as-a-service systems. Our approach is based on query rewriting: multiple concurrent queries are combined into a single query. Our experiments show that the aggregated amount of work done by the shared execution is smaller than in a query-at-a-time approach. Since queries are charged per byte processed, the cost of executing a group of queries is often the same as executing a single one of them. As an example, we demonstrate how shared execution of the TPC-H benchmark is up to 100x cheaper in Amazon Athena and up to 16x cheaper in Google BigQuery than a query-at-a-time approach, while achieving higher throughput.
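
    The sketch below illustrates the rewriting idea on a toy example: several aggregate queries over the same table are merged into one query using conditional aggregation, so a per-byte-billed service scans the data once instead of once per query. Table and column names are invented for the example.

```python
# Merge several single-table aggregate queries into one shared query using
# conditional aggregation, so the data is scanned (and billed) only once.

queries = [
    ("q1", "l_tax > 0.05"),
    ("q2", "l_discount < 0.02"),
    ("q3", "l_quantity >= 30"),
]

# Query-at-a-time: N queries, N scans, N x bytes billed.
for name, pred in queries:
    print(f"-- {name}: SELECT SUM(l_extendedprice) FROM lineitem WHERE {pred};")

# Shared execution: one scan answers all three queries at once.
branches = ",\n  ".join(
    f"SUM(CASE WHEN {pred} THEN l_extendedprice END) AS {name}"
    for name, pred in queries
)
print(f"SELECT\n  {branches}\nFROM lineitem;")
```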

    Multi-flow Optimization via Horizontal Message Queue Partitioning

    Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. Application areas include near-real-time ETL and data synchronization between operational systems. Owing to growing data volumes, highly distributed IT infrastructures, and high requirements for the up-to-dateness of analytical query results and for data consistency, many instances of integration flows are executed over time. Under this high load, the performance of the central integration platform is crucial for an IT infrastructure. With the aim of maximizing throughput, we propose the concept of multi-flow optimization (MFO). In this approach, messages are collected during a waiting time and executed in batches to optimize sequences of plan instances of a single integration flow. We introduce a horizontal (value-based) partitioning approach for message batch creation and show how to compute the optimal waiting time. This approach significantly reduces the total execution time of a message sequence and hence maximizes throughput, while accepting moderate latency.
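
    A minimal sketch of the batching idea, with invented names: messages are buffered during a waiting time, horizontally partitioned by an attribute value, and each partition is executed as one batched plan instance. The paper's cost-based computation of the optimal waiting time is replaced here by a fixed parameter.

```python
# Sketch of multi-flow optimization: buffer messages for a waiting time,
# partition them horizontally by an attribute value, and run one batched
# plan instance per partition instead of one per message.

import time
from collections import defaultdict

def run_batched(flow, messages, partition_key, waiting_time):
    buffer, deadline = [], time.monotonic() + waiting_time
    for msg in messages:  # messages: an iterator of dicts
        buffer.append(msg)
        if time.monotonic() >= deadline:
            flush(flow, buffer, partition_key)
            buffer, deadline = [], time.monotonic() + waiting_time
    if buffer:
        flush(flow, buffer, partition_key)

def flush(flow, batch, partition_key):
    # Horizontal (value-based) partitioning of the message batch.
    partitions = defaultdict(list)
    for msg in batch:
        partitions[msg[partition_key]].append(msg)
    for key, part in partitions.items():
        flow(key, part)  # execute the flow once for the whole partition
```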