Search CORE

17 research outputs found

Limitations of Intra-operator Parallelism Using Heterogeneous Computing Resources

Author: Habich Dirk
Karnagel Tomas
Lehner Wolfgang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/03/2023
Field of study

The hardware landscape is changing from homogeneous multi-core systems towards wildly heterogeneous systems combining different computing units, like CPUs and GPUs. To utilize these heterogeneous environments, database query execution has to adapt to cope with different architectures and computing behaviors. In this paper, we investigate the simple idea of partitioning an operator’s input data and processing all data partitions in parallel, one partition per computing unit. For heterogeneous systems, data has to be partitioned according to the performance of the computing units. We define a way to calculate the partition sizes, analyze the parallel execution exemplarily for two database operators, and present limitations that could hinder significant performance improvements. The findings in this paper can help system developers to assess the possibilities and limitations of intra-operator parallelism in heterogeneous environments, leading to more informed decisions if this approach is beneficial for a given workload and hardware environment

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Heterogeneity-Aware Placement Strategies for Query Optimization

Author: Karnagel Tomas
Publication venue
Publication date: 23/05/2017
Field of study

Computing hardware is changing from systems with homogeneous CPUs to systems with heterogeneous computing units like GPUs, Many Integrated Cores, or FPGAs. This trend is caused by scaling problems of homogeneous systems, where heat dissipation and energy consumption is limiting further growths in compute-performance. Heterogeneous systems provide differently optimized computing hardware, which allows different operations to be computed on the most appropriate computing unit, resulting in faster execution and less energy consumption. For database systems, this is a new opportunity to accelerate query processing, allowing faster and more interactive querying of large amounts of data. However, the current hardware trend is also a challenge as most database systems do not support heterogeneous computing resources and it is not clear how to support these systems best. In the past, mainly single operators were ported to different computing units showing great results, while missing a system wide application. To efficiently support heterogeneous systems, a systems approach for query processing and query optimization is needed. In this thesis, we tackle the optimization challenge in detail. As a starting point, we evaluate three different approaches on isolated use-cases to assess their advantages and limitations. First, we evaluate a fork-join approach of intra-operator parallelism, where the same operator is executed on multiple computing units at the same time, each execution with different data partitions. Second, we evaluate using one computing unit statically to accelerate one operator, which provides high code-optimization potential, due to this static and pre-known usage of hardware and software. Third, we evaluate dynamically placing operators onto computing units, depending on the operator, the available computing hardware, and the given data sizes. We argue that the first and second approach suffer from multiple overheads or high implementation costs. The third approach, dynamic placement, shows good performance, while being highly extensible to different computing units and different operator implementations. To automate this dynamic approach, we first propose general placement optimization for query processing. This general approach includes runtime estimation of operators on different computing units as well as two approaches for defining the actual operator placement according to the estimated runtimes. The two placement approaches are local optimization, which decides the placement locally at run-time, and global optimization, where the placement is decided at compile-time, while allowing a global view for enhanced data sharing. The main limitation of the latter is the high dependency on cardinality estimation of intermediate results, as estimation errors for the cardinalities propagate to the operator runtime estimation and placement optimization. Therefore, we propose adaptive placement optimization, allowing the placement optimization to become fully independent of cardinalities estimation, effectively eliminating the main source of inaccuracy for runtime estimation and placement optimization. Finally, we define an adaptive placement sequence, incorporating all our proposed techniques of placement optimization. We implement this sequence as a virtualization layer between the database system and the heterogeneous hardware. Our implementation approach bases on preexisting interfaces to the database system and the hardware, allowing non-intrusive integration into existing database systems. We evaluate our techniques using two different database systems and two different OLAP benchmarks, accelerating the query processing through heterogeneous execution

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Heterogeneity-Aware Operator Placement in Column-Store DBMS

Author: Habich Dirk
Karnagel Tomas
Lehner Wolfgang
Schlegel Benjamin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/02/2023
Field of study

Due to the tremendous increase in the amount of data efficiently managed by current database systems, optimization is still one of the most challenging issues in database research. Today’s query optimizer determine the most efficient composition of physical operators to execute a given SQL query, whereas the underlying hardware consists of a multi-core CPU. However, hardware systems are more and more shifting towards heterogeneity, combining a multi-core CPU with various computing units, e.g., GPU or FPGA cores. In order to efficiently utilize the provided performance capability of such heterogeneous hardware, the assignment of physical operators to computing units gains importance. In this paper, we propose a heterogeneity-aware physical operator placement strategy (HOP) for in-memory columnar database systems in a heterogeneous environment. Our placement approach takes operators from the physical query execution plan as an input and assigns them to computing units using a cost model at runtime. To enable this runtime decision, our cost model uses the characteristics of the computing units, execution properties of the operators, as well as runtime data to estimate execution costs for each unit. We evaluated our approach on full TPC-H queries within a prototype database engine. As we are going to show, the placement in a heterogeneous hardware system has a high influence on query performance

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

The HELLS-Join: A Heterogeneous Stream join for ExtremeLy Large windows

Author: Habich Dirk
Karnagel Tomas
Lehner Wolfgang
Schlegel Benjamin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/09/2022
Field of study

Upcoming processors are combining different computing units in a tightly-coupled approach using a unified shared memory hierarchy. This tightly-coupled combination leads to novel properties with regard to cooperation and interaction. This paper demonstrates the advantages of those processors for a stream-join operator as an important data-intensive example. In detail, we propose our HELLS-Join approach employing all heterogeneous devices by outsourcing parts of the algorithm on the appropriate device. Our HELLS-Join performs better than CPU stream joins, allowing wider time windows, higher stream frequencies, and more streams to be joined as before

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Big Data causing Big (TLB) Problems: Taming Random Memory Accesses on the GPU

Author: Ben-Nun Tal
Habich Dirk
Karnagel Tomas
Lehner Wolfgang
Werner Matthias
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/06/2022
Field of study

GPUs are increasingly adopted for large-scale database processing, where data accesses represent the major part of the computation. If the data accesses are irregular, like hash table accesses or random sampling, the GPU performance can suffer. Especially when scaling such accesses beyond 2GB of data, a performance decrease of an order of magnitude is encountered. This paper analyzes the source of the slowdown through extensive micro-benchmarking, attributing the root cause to the Translation Lookaside Buffer (TLB). Using the micro-benchmarks, the TLB hierarchy and structure are fully analyzed on two different GPU architectures, identifying never-before-published TLB sizes that can be used for efficient large-scale application tuning. Based on the gained knowledge, we propose a TLB-conscious approach to mitigate the slowdown for algorithms with irregular memory access. The proposed approach is applied to two fundamental database operations - random sampling and hash-based grouping - showing that the slowdown can be dramatically reduced, and resulting in a performance increase of up to 13×

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

HW/SW-database-codesign for compressed bitmap index processing

Author: Arnold Oliver
Fettweis Gerhard
Haas Sebastian
Karnagel Tomas
Laux Erik
Lehner Wolfgang
Schlegel Benjamin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/01/2023
Field of study

Compressed bitmap indices are heavily used in scientific and commercial database systems because they largely improve query performance for various workloads. Early research focused on finding tailor-made index compression schemes that are amenable for modern processors. Improving performance further typically comes at the expense of a lower compression rate, which is in many applications not acceptable because of memory limitations. Alternatively, tailor-made hardware allows to achieve a performance that can only hardly be reached with software running on general-purpose CPUs. In this paper, we will show how to create a custom instruction set framework for compressed bitmap processing that is generic enough to implement most of the major compressed bitmap indices. For evaluation, we implemented WAH, PLWAH, and COMPAX operations using our framework and compared the resulting implementation to multiple state-of-the-art processors. We show that the custom-made bitmap processor achieves speedups of up to one order of magnitude by also using two orders of magnitude less energy compared to a modern energy-efficient Intel processor. Finally, we discuss how to embed our processor with database-specific instruction sets into database system environments

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

Query processing on low-energy many-core processors

Author: Asmussen Nils
Fettweis Gerhard
Habich Dirk
Karnagel Tomas
Lehner Wolfgang
Nöthen Benedikt
Ungethüm Annett
Völp Marcus
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/01/2023
Field of study

Aside from performance, energy efficiency is an increasing challenge in database systems. To tackle both aspects in an integrated fashion, we pursue a hardware/software co-design approach. To fulfill the energy requirement from the hardware perspective, we utilize a low-energy processor design offering the possibility to us to place hundreds to millions of chips on a single board without any thermal restrictions. Furthermore, we address the performance requirement by the development of several database-specific instruction set extensions to customize each core, whereas each core does not have all extensions. Therefore, our hardware foundation is a low-energy processor consisting of a high number of heterogeneous cores. In this paper, we introduce our hardware setup on a system level and present several challenges for query processing. Based on these challenges, we describe two implementation concepts and a comparison between these concepts. Finally, we conclude the paper with some lessons learned and an outlook on our upcoming research directions

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa

The Orchestration Stack: The Impossible Task of Designing Software for Unknown Future Post-CMOS Hardware

Future systems based on post-CMOS technologies will be wildly heterogeneous, with properties largely unknown today. This paper presents our design of a new hardware/software stack to address the challenge of preparing software development for such systems. It combines well-understood technologies from different areas, e.g., network-on-chips, capability operating systems, flexible programming models and model checking. We describe our approach and provide details on key technologies

Open Repository and Bibliography - Luxembourg

Heterogeneity-Aware Placement Strategies for Query Optimization

Author: Karnagel Tomas
Publication venue
Publication date
Field of study

HSSS - Hochschulschriftenserver der SLUB

Heterogeneity-Aware Operator Placement in Column-Store DBMS

Author: Habich Dirk
Karnagel Tomas
Lehner Wolfgang
Schlegel Benjamin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/02/2023
Field of study

HSSS - Hochschulschriftenserver der SLUB