14 research outputs found

Cache-Conscious Radix-Decluster Projections

Author: M KERSTEN
N NES
P BONCZ
S MANEGOLD
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Database architecture evolution: Mammals flourished long before dinosaurs became extinct

Author: Boncz P.A. (Peter)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Publication venue: 'VLDB Endowment'
Publication date: 01/08/2009

The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters; Stable & Secure, to service a broad user community; Small & Simple, to be comprehensible to a small team of programmers; and Self-managing, to let it run out-of-the-box without hassle. In this paper, we provide a trip report on this quest, covering past experiences, ongoing research on hardware-conscious algorithms, and novel ways towards self-management specifically focused on column store solutions

CWI's Institutional Repository

The Database Architectures Research Group at CWI

Author: Kersten M.L. (Martin)
Manegold S. (Stefan)
Mullender K.S. (Sjoerd)
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/12/2011

The Database research group at CWI was established in 1985. It has steadily grown from two PhD students to a group of 17 people at the end of 2011. The group is supported by a scientific programmer and a system engineer to keep our machines running. In this short note, we look back at our past and highlight the multitude of topics being addressed

CWI's Institutional Repository

Analytical Query Execution Optimized for all Layers of Modern Hardware

Author: Polychroniou Orestis
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

Analytical database queries are at the core of business intelligence and decision support. To analyze the vast amounts of data available today, query execution needs to be orders of magnitude faster. Hardware advances have made a profound impact on database design and implementation. The large main memory capacity allows queries to execute exclusively in memory and shifts the bottleneck from disk access to memory bandwidth. In the new setting, to optimize query performance, databases must be aware of an unprecedented multitude of complicated hardware features. This thesis focuses on the design and implementation of highly efficient database systems by optimizing analytical query execution for all layers of modern hardware. The hardware layers include the network across multiple machines, main memory and the NUMA interconnection across multiple processors, the multiple levels of caches across multiple processor cores, and the execution pipeline within each core. For the network layer, we introduce a distributed join algorithm that minimizes the network traffic. For the memory hierarchy, we describe partitioning variants aware of the dynamics of the CPU caches and the NUMA interconnection. To improve the memory access rate of linear scans, we optimize lightweight compression variants and evaluate their trade-offs. To accelerate query execution within the core pipeline, we introduce advanced SIMD vectorization techniques generalizable across multiple operators. We evaluate our algorithms and techniques on both mainstream hardware and on many-integrated-core platforms, and combine our techniques in a new query engine design that can better utilize the features of many-core CPUs. In the era of hardware becoming increasingly parallel and datasets consistently growing in size, this thesis can serve as a compass for developing hardware-conscious databases with truly high-performance analytical query execution

Columbia University Academic Commons

Self-organizing tuple reconstruction in column-stores

Author: Idreos S. (Stratos)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/06/2009

Column-stores gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates. In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple rec

CWI's Institutional Repository

Cache-Efficient Aggregation: Hashing Is Sorting

Author: Färber Franz
Lacurie Arnaud
Lehner Wolfgang
Müller Ingo
Sanders Peter
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2022

For decades researchers have studied the duality of hashing and sorting for the implementation of the relational operators, especially for efficient aggregation. Depending on the underlying hardware and software architecture, the specifically implemented algorithms, and the data sets used in the experiments, different authors came to different conclusions about which is the better approach. In this paper we argue that in terms of cache efficiency, the two paradigms are actually the same. We support our claim by showing that the complexity of hashing is the same as the complexity of sorting in the external memory model. Furthermore we make the similarity of the two approaches obvious by designing an algorithmic framework that allows switching seamlessly between hashing and sorting during execution. The fact that we mix hashing and sorting routines in the same algorithmic framework allows us to leverage the advantages of both approaches and makes their similarity obvious. On a more practical note, we also show how to achieve very low constant factors by tuning both the hashing and the sorting routines to modern hardware. Since we observe a complementary dependency of the constant factors of the two routines on the locality of the input, we exploit our framework to switch to the faster routine where appropriate. The result is a novel relational aggregation algorithm that is cache-efficient (independently and without prior knowledge of input skew and output cardinality), highly parallelizable on modern multi-core systems, and operating at a speed close to the memory bandwidth, thus outperforming the state-of-the-art by up to 3.7x

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

The Design and Implementation of Modern Column-Oriented Database Systems

Author: Abadi D.
Boncz P.A. (Peter)
Harizopoulos S.
Idreos S. (Stratos)
Madden S. (Samuel)
Publication venue: 'Now Publishers'
Publication date: 01/12/2013

CWI's Institutional Repository

Self-organizing tuple reconstruction in column-stores

Author: Martin L. Kersten
Stefan Manegold
Stratos Idreos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Column-stores gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates. In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries

CiteSeerX

Crossref

CWI's Institutional Repository

Space-Economical Partial Gram Indices for Exact Substring Matching

Author: Boncz P.A. (Peter)
Sidirourgos E. (Eleftherios)
Tang N. (Nan)
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/11/2009

CWI's Institutional Repository