MorphStore — In-Memory Query Processing based on Morphing Compressed Intermediates LIVE

Author: Damme Patrick
Habich Dirk
Hildebrandt Juliana
Krause Alexander
Lehner Wolfgang
Pietrzyk Johannes
Ungethüm Annett
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/09/2022

In this demo, we present MorphStore, an in-memory column store with a novel compression-aware query processing concept. Basically, compression using lightweight integer compression algorithms already plays an important role in existing in-memory column stores, but mainly for base data. The continuous handling of compression from the base data to the intermediate results during query processing has already been discussed, but not investigated in detail since the computational effort for compression as well as decompression is often assumed to exceed the benefits of a reduced transfer cost between CPU and main memory. However, this argument increasingly loses its validity as we are going to show in our demo. Generally, our novel compression-aware query processing concept is characterized by the fact that we are able to speed up the query execution by morphing compressed intermediate results from one scheme to another scheme to dynamically adapt to the changing data characteristics during query processing. Our morphing decisions are made using a cost-based approach

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Vectorwise: Beyond Column Stores

Author: Boncz P.A.
Zukowski M.
Publication date: 01/01/2012

This paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product and its future research and development roadmap. One take-away from this story is that the novelty in Vectorwise is much more than just column-storage: it boasts many query processing innovations in its vectorized execution model, and an adaptive mixed row/column data storage model with indexing support tailored to analytical workloads. Another one is that there is a long road from research prototype to commercial product, though database research continues to achieve a strong innovative influence on product development

VU Research Portal

CWI's Institutional Repository

Compression-Aware In-Memory Query Processing: Vision, System Design and Beyond

Author: Damme Patrick
Habich Dirk
Hildebrandt Juliana
Lehner Wolfgang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/02/2023

In-memory database systems have to keep base data as well as intermediate results generated during query processing in main memory. In addition, the effort to access intermediate results is equivalent to the effort to access the base data. Therefore, the optimization of intermediate results is interesting and has a high impact on the performance of the query execution. For this domain, we propose the continuous use of lightweight compression methods for intermediate results and have the aim of developing a balanced query processing approach based on compressed intermediate results. To minimize the overall query execution time, it is important to find a balance between the reduced transfer times and the increased computational effort. This paper provides an overview and presents a system design for our vision. Our system design addresses the challenge of integrating a large and evolving corpus of lightweight data compression algorithms in an in-memory column store. In detail, we present our model-driven approach and describe ongoing research topics to realize our compression-aware query processing vision

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Analytical Query Processing Using Heterogeneous SIMD Instruction Sets

Author: Ungethüm Annett
Publication date: 30/10/2020

Numerous applications gather increasing amounts of data, which have to be managed and queried. Different hardware developments help to meet this challenge. The growing capacity of main memory enables database systems to keep all their data in memory. Additionally, the hardware landscape is becoming more diverse. A plethora of homogeneous and heterogeneous co-processors is available, where heterogeneity refers not only to a different computing power, but also to different instruction set architectures. For instance, modern Intel® CPUs offer different instruction sets supporting the Single Instruction Multiple Data (SIMD) paradigm, e.g. SSE, AVX, and AVX512. Database systems have started to exploit SIMD to increase performance. However, this is still a challenging task, because existing algorithms were mainly developed for scalar processing and because there is a huge variety of different instruction sets, which were never standardized and have no unified interface. This requires completely rewriting the source code for porting a system to another hardware architecture, even if those architectures are not fundamentally different and designed by the same company. Moreover, operations on large registers, which are the core principle of SIMD processing, behave counter-intuitively in several cases. This is especially true for analytical query processing, where different memory access patterns and data dependencies caused by the compression of data challenge the limits of the SIMD principle. Finally, there are physical constraints to the use of such instructions affecting the CPU frequency scaling, which is further influenced by the use of multiple cores. This is because the supply power of a CPU is limited, such that not all transistors can be powered at the same time. Hence, there is a complex relationship between performance and power, and therefore also between performance and energy consumption.
This thesis addresses the specific challenges, which are introduced by the application of SIMD in general, and the heterogeneity of SIMD ISAs in particular. Hence, the goal of this thesis is to exploit the potential of heterogeneous SIMD ISAs for increasing the performance as well as the energy-efficiency

Technische Universität Dresden: Qucosa

Data Vaults: a Database Welcome to Scientific File Repositories

Author: Datcu M. (Mihai)
Espinoza Molina D.
Ivanova M.G. (Milena)
Kargin Y. (Yagiz)
Kersten M.L. (Martin)
Manegold S. (Stefan)
Zhang Y. (Ying)
Publication date: 01/01/2013

Efficient management and exploration of high-volume scientific file repositories have become pivotal for advancement in science. We propose to demonstrate the Data Vault, an extension of the database system architecture that transparently opens scientific file repositories for efficient in-database processing and exploration. The Data Vault facilitates science data analysis using high-level declarative languages, such as the traditional SQL and the novel array-oriented SciQL. Data of interest are loaded from the attached repository in a just-in-time manner without need for up-front data ingestion. The demo is built around concrete implementations of the Data Vault for two scientific use cases: seismic time series and Earth observation images. The seismic Data Vault uses the queries submitted by the audience to illustrate the internals of Data Vault functioning by revealing the mechanisms of dynamic query plan generation and on-demand external data ingestion. The image Data Vault shows an application view from the perspective of data mining researchers

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

A column-store meets the point clouds

Author: Ivanova M.G. (Milena)
Kersten M.L. (Martin)
Martinez-Rubi O. (Oscar)
Pereira Goncalves R.A. (Romulo Antonio)
Publication date: 01/07/2014

Dealing with LIDAR data in the context of database management systems calls for a re-assessment of their functionality, performance, and storage/processing limitations. The territory for efficient and scalable processing of LIDAR repositories using GIS enabled database systems is still largely unexplored. Bringing together hard core database management experts and GIS application developers is a sine qua non to advance the state of the art. In particular to assess the relative merits of both traditional row-based database engines and the modern column-oriented database engines

CWI's Institutional Repository

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

Integrated data analysis (IDA) pipelines—that combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring—become increasingly common in practice. Interestingly, systems of these areas share many compilation and runtime techniques, and the used—increasingly heterogeneous—hardware infrastructure converges as well. Yet, the programming paradigms, cluster resource management, data formats and representations, as well as execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines, describe the overall DAPHNE system architecture, its key components, and the design of a vectorized execution engine for computational storage, HW accelerators, as well as local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results

Institute of Transport Research:Publications

The IT University of Copenhagen's Repository