Search CORE

2,719 research outputs found

An architecture for recycling intermediates in a column-store

Author: Ivanova M.G. (Milena)
Kersten M.L. (Martin)
Nes N.J. (Niels)
Pereira Goncalves R.A. (Romulo Antonio)
Publication venue: A.C.M.
Publication date: 01/12/2010

Automatic recycling of intermediate results to improve both query response time and throughput is a grand challenge …

CWI's Institutional Repository

An architecture for recycling intermediates in a column-store

Author: Ivanova M.G. (Milena)
Kersten M.L. (Martin)
Nes N.J. (Niels)
Pereira Goncalves R.A. (Romulo Antonio)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2009

Automatically recycling (intermediate) results is a grand challenge for state-of-the-art databases to improve both query response time and throughput. In such systems, tuples are loaded and streamed through a tuple-at-a-time processing pipeline that avoids materializing intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully materialized results in each step of the query plan. To avoid resource contention, these intermediates are evicted as soon as possible. In this paper we study an architecture that harvests the by-products of the operator-at-a-time paradigm in a column-store system using a lightweight mechanism, the recycler. The key challenge then becomes selecting the policies that admit intermediates to the resource pool, determine their retention period, and choose an eviction strategy under resource limitations. The proposed recycling architecture has been implemented in an open-source system. An experimental analysis against the TPC-H ad-hoc decision support benchmark and a complex, real-world application (SkyServer) demonstrates its effectiveness in terms of self-organizing behavior and its significant performance gains. The results indicate the potential of recycling intermediates and chart a route for further development of database kernels.
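The abstract frames recycling as three policy decisions: which intermediates to admit to the pool, how long to retain them, and what to evict under resource limits. As a minimal Python sketch, assuming a hypothetical `Recycler` class keyed on (operator, arguments), a cost threshold as the admission policy, and LRU as the eviction policy (stand-ins chosen for brevity, not the policies the paper evaluates), the code below shows how fully materialized operator-at-a-time results can be reused when the same operator call recurs.

```python
from collections import OrderedDict
from typing import Callable, List


class Recycler:
    """Hypothetical pool of materialized operator results (illustrative only).

    Admission: a result is kept only if producing it cost at least min_cost.
    Eviction: least-recently-used entries are dropped until a new result fits
    within capacity, measured here in number of result values.
    """

    def __init__(self, capacity: int, min_cost: float = 0.0) -> None:
        self.capacity = capacity
        self.min_cost = min_cost
        self.pool: OrderedDict = OrderedDict()  # (op, args) -> materialized result
        self.used = 0  # values currently held in the pool
        self.hits = 0  # requests answered from the pool

    def run(self, op: str, args: tuple, compute: Callable[[], List[int]], cost: float) -> List[int]:
        key = (op, args)
        if key in self.pool:
            self.hits += 1
            self.pool.move_to_end(key)  # mark as most recently used
            return self.pool[key]
        result = compute()  # operator-at-a-time: the full intermediate is materialized
        if cost >= self.min_cost and len(result) <= self.capacity:
            while self.used + len(result) > self.capacity:
                _, evicted = self.pool.popitem(last=False)  # evict the LRU entry
                self.used -= len(evicted)
            self.pool[key] = result
            self.used += len(result)
        return result


# Usage: the second, identical selection is served from the pool.
prices = [5, 12, 7, 30, 18]


def select_10_20() -> List[int]:
    # Positions of prices in the half-open range [10, 20)
    return [i for i, p in enumerate(prices) if 10 <= p < 20]


rec = Recycler(capacity=10)
first = rec.run("select", ("price", 10, 20), select_10_20, cost=1.0)
again = rec.run("select", ("price", 10, 20), select_10_20, cost=1.0)
print(first, rec.hits)  # [1, 4] 1
```

Keying on the operator and its arguments keeps matching cheap; a real recycler would also have to detect subsumption between overlapping selections and invalidate entries on updates, which this sketch ignores.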

CWI's Institutional Repository

An architecture for recycling intermediates in a column-store

Author: Ivanova Milena
Kersten Martin
Nes Niels
Pereira Goncalves Romulo Antonio
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Automatically recycling (intermediate) results is a grand challenge for state-of-the-art databases to improve both query response time and throughput. In such systems, tuples are loaded and streamed through a tuple-at-a-time processing pipeline that avoids materializing intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully materialized results in each step of the query plan. To avoid resource contention, these intermediates are evicted as soon as possible. In this paper we study an architecture that harvests the by-products of the operator-at-a-time paradigm in a column-store system using a lightweight mechanism, the recycler. The key challenge then becomes selecting the policies that admit intermediates to the resource pool, determine their retention period, and choose an eviction strategy under resource limitations. The proposed recycling architecture has been implemented in an open-source system. An experimental analysis against the TPC-H ad-hoc decision support benchmark and a complex, real-world application (SkyServer) demonstrates its effectiveness in terms of self-organizing behavior and its significant performance gains. The results indicate the potential of recycling intermediates and chart a route for further development of database kernels.

Crossref

An architecture for recycling intermediates in a column-store

Author: Milena G. Ivanova
Martin L. Kersten
Niels J. Nes
Romulo A.P. Gonçalves
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Vectorwise: Beyond Column Stores

Author: Boncz P.A.
Zukowski M.
Publication date: 01/01/2012

This paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product, and its future research and development roadmap. One take-away from this story is that the novelty in Vectorwise is much more than just column storage: it boasts many query processing innovations in its vectorized execution model, and an adaptive mixed row/column data storage model with indexing support tailored to analytical workloads. Another is that there is a long road from research prototype to commercial product, though database research continues to exert a strong innovative influence on product development.
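The vectorized execution model mentioned in the abstract passes vectors of column values between operators instead of single tuples. The sketch below only illustrates that control flow in Python; the names `scan`, `select_gt`, `sum_selected` and `VECTOR_SIZE` are hypothetical and not Vectorwise's API, and the performance benefit, which in a real engine comes from tight compiled loops over each vector, is not reproduced here.

```python
from typing import Iterator, List

VECTOR_SIZE = 4  # illustrative only; real engines choose a cache-friendly size


def scan(column: List[int], size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Yield the column in fixed-size vectors instead of one tuple at a time."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_gt(vector: List[int], bound: int) -> List[int]:
    """Primitive: return a selection vector of positions whose value exceeds bound."""
    return [i for i, v in enumerate(vector) if v > bound]


def sum_selected(vector: List[int], sel: List[int]) -> int:
    """Primitive: aggregate only the positions listed in the selection vector."""
    return sum(vector[i] for i in sel)


def query(column: List[int], bound: int) -> int:
    # SELECT SUM(x) FROM t WHERE x > bound, evaluated one vector per iteration
    total = 0
    for vec in scan(column):
        total += sum_selected(vec, select_gt(vec, bound))
    return total


print(query([3, 9, 1, 12, 7, 5, 20, 2, 8], 6))  # 56
```

The selection vector lets the filter avoid copying qualifying values; the aggregation primitive simply reads the positions it is handed.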

VU Research Portal

CWI's Institutional Repository

MonetDB: Two Decades of Research in Column-oriented Database Architectures

Author: Groffen F.E. (Fabian)
Idreos S. (Stratos)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Mullender K.S. (Sjoerd)
Nes N.J. (Niels)
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2012

MonetDB is a state-of-the-art open-source column-store database management system targeting applications in need of analytics over large collections of data. MonetDB is actively used nowadays in health care, in telecommunications, in scientific databases, and in data management research, averaging more than 10,000 downloads per month. This paper gives a brief overview of the MonetDB technology as it developed over the past two decades and the main research highlights that drive the current MonetDB design and form the basis for its future evolution.

CWI's Institutional Repository

Just-in-time Data Distribution for Analytical Query Processing

Author: Groffen F.E. (Fabian)
Ivanova M.G. (Milena)
Kersten M.L. (Martin)
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/09/2012

Distributed processing commonly requires data to be spread across machines using a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes using a need-to-know policy and is reused, if possible, in subsequent queries. A bidding mechanism among the workers yields a schedule with the most efficient reuse of previously shipped data, minimizing data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
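The abstract does not define the bid itself. Assuming, for illustration, that a worker's bid is simply the number of bytes the master would still have to ship to it, which matches the stated goal of minimizing data transfer, a scheduler can award each query to the lowest bidder and remember what was shipped for later reuse. `Worker`, `schedule` and the column names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Worker:
    name: str
    cached: Set[str] = field(default_factory=set)  # data items already shipped here

    def bid(self, needed: Set[str], sizes: Dict[str, int]) -> int:
        """Assumed bid: bytes the master would still have to ship to this worker."""
        return sum(sizes[item] for item in needed - self.cached)


def schedule(needed: Set[str], workers: List[Worker], sizes: Dict[str, int]) -> Worker:
    """Award the query to the lowest bidder (first one on ties), then ship only missing items."""
    winner = min(workers, key=lambda w: w.bid(needed, sizes))
    winner.cached |= needed  # just-in-time shipment; the data stays for reuse
    return winner


sizes = {"l_price": 800, "l_date": 400, "o_date": 300}
w1 = Worker("w1")
w2 = Worker("w2", {"l_price"})
winner = schedule({"l_price", "l_date"}, [w1, w2], sizes)
print(winner.name)  # w2 (bid 400 bytes versus 1200 for w1)
```

A fuller version would also weigh each worker's current load against its transfer cost; this sketch considers transfer alone.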

CWI's Institutional Repository

Database architecture evolution: Mammals flourished long before dinosaurs became extinct

Author: Boncz P.A. (Peter)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Publication venue: 'VLDB Endowment'
Publication date: 01/08/2009

The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters; Stable & Secure, to service a broad user community; Small & Simple, to be comprehensible to a small team of programmers; and Self-managing, to let it run out-of-the-box without hassle. In this paper, we provide a trip report on this quest, covering past experiences, ongoing research on hardware-conscious algorithms, and novel ways towards self-management specifically focused on column-store solutions.

CWI's Institutional Repository

The Database Architectures Research Group at CWI

Author: Kersten M.L. (Martin)
Manegold S. (Stefan)
Mullender K.S. (Sjoerd)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2011

The database research group at CWI was established in 1985. It has steadily grown from two PhD students to a group of 17 people by the end of 2011. The group is supported by a scientific programmer and a system engineer who keep our machines running. In this short note, we look back at our past and highlight the multitude of topics being addressed.

CWI's Institutional Repository

Enhanced Stream Processing in a DBMS Kernel

Author: Idreos S. (Stratos)
Kersten M.L. (Martin)
Liarou E. (Erietta)
Manegold S. (Stefan)
Publication date: 01/03/2013

Continuous query processing has emerged as a promising query processing paradigm with numerous applications. A recent development is the need to handle both streaming queries and typical one-time queries in the same application. For example, data warehousing can greatly benefit from the integration of stream semantics, i.e., online analysis of incoming data and its combination with existing data. This is especially useful for providing low latency in data-intensive analysis in big data warehouses that are augmented with new data on a daily basis. However, state-of-the-art database technology cannot handle streams efficiently due to their "continuous" nature. At the same time, state-of-the-art stream technology is purely focused on stream applications. Research efforts are mostly geared towards creating specialized stream management systems built with a different philosophy than a DBMS. The drawback of this approach is the limited opportunity to exploit successful past data processing technology, e.g., query optimization techniques. For this new problem we need to combine the best of both worlds. Here we take a completely different route by designing a stream engine on top of an existing relational database kernel. This includes reuse of both its storage/execution engine and its optimizer infrastructure. The major challenge then becomes the efficient support for specialized stream features. This paper focuses on incremental window-based processing, arguably the most crucial stream-specific requirement. In order to maintain and reuse the generic storage and execution model of the DBMS, we elevate the problem to the query plan level. Proper op
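The abstract is truncated before it describes the mechanism, so the following is only a generic illustration of incremental window processing, not the paper's plan-level approach. Assuming a count-based sliding window whose size is a multiple of its slide, the window is split into basic windows of `step` tuples; each is summarized once and its partial sum reused until it expires, instead of re-aggregating the whole window on every slide. `sliding_sums` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


def sliding_sums(stream: Iterable[int], window: int, step: int) -> Iterator[int]:
    """Sum over the last `window` tuples, emitted every `step` tuples.

    Assumes window is a multiple of step, so the window consists of whole
    basic windows of `step` tuples each.
    """
    if window % step != 0:
        raise ValueError("window must be a multiple of step")
    per_window = window // step
    partials: Deque[int] = deque()  # one partial sum per live basic window
    batch: List[int] = []
    total = 0
    for value in stream:
        batch.append(value)
        if len(batch) == step:
            partial = sum(batch)  # summarize the new basic window exactly once
            partials.append(partial)
            total += partial
            batch = []
            if len(partials) > per_window:
                total -= partials.popleft()  # expire the oldest basic window
            if len(partials) == per_window:
                yield total


print(list(sliding_sums([1, 2, 3, 4, 5, 6, 7, 8], window=4, step=2)))  # [10, 18, 26]
```

Each slide costs work proportional to `step` rather than `window`, which is the saving incremental processing is after.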

CWI's Institutional Repository