6 research outputs found

    Adaptive Lightweight Compression Acceleration on Hybrid CPU-FPGA System

    With an increasingly large amount of data being collected in numerous application areas, the importance of online analytical processing (OLAP) workloads increases constantly. OLAP queries typically access only a small number of columns but a high number of rows and are, thus, most efficiently executed by column-stores. With the significant developments in the main memory domain, even large datasets can be held entirely in main memory. Thus, main memory column-stores have been established as the state-of-the-art for OLAP scenarios. In these systems, all values of every column are encoded as a sequence of integer values and, thus, query processing is done completely on these integer sequences. To improve query processing, vectorization based on the Single Instruction Multiple Data (SIMD) parallel paradigm is a state-of-the-art technique. Aside from vectorization, lightweight integer compression algorithms also play an important role in reducing the necessary memory space. Unfortunately, there is no single best lightweight integer compression algorithm, and the selection decision depends most importantly on the data characteristics. Nevertheless, vectorization and integer compression complement each other, and their combined usage improves query performance. Unfortunately, the benefits of vectorization are limited on modern x86 processors due to the predefined and fixed SIMD instruction set extensions. Nowadays, the Field Programmable Gate Array (FPGA) offers a novel opportunity owing to its hardware reconfigurability. For example, we can use an arbitrary processor word length on an FPGA, leading to higher performance; we can build pipeline-based, custom-made database accelerators; and we can develop embedded systems utilizing such accelerators. Moreover, modern hybrid CPU-FPGA systems have a direct data communication channel between the main memory and the FPGA, which is useful for throughput acceleration.
Based on these advantages, this thesis examines the utilization of FPGAs for main memory column-stores. This examination is twofold: first, we investigate the column scan on compressed data as an important operation, and second, we systematically look at lightweight integer compression. Both aspects are considered from the hardware perspective to guarantee a certain level of query performance acceleration. In particular, this thesis explores different embedded design options and proposes an adaptive lightweight integer compression system. Based on a comprehensive evaluation, we identify the optimal design constraints for each implementation mechanism for the column scan and for lightweight integer compression. Finally, we conclude this thesis by mentioning our upcoming research activities.

CONTENTS
1 INTRODUCTION
  1.1 Analytical Data Systems
  1.2 Query Acceleration
  1.3 Thesis Contributions
2 BACKGROUND AND PROBLEM DEFINITION
  2.1 Main Memory Column-Store Database Systems
  2.2 State-of-the-art Optimization of Query Processing
    2.2.1 Optimization using SIMD-Vectorization
    2.2.2 Optimization using GPU-Accelerator
    2.2.3 Summary
  2.3 Opportunities and Challenges of FPGA-based Acceleration
    2.3.1 Hybrid CPU-FPGA Architecture
    2.3.2 Related Works on FPGA-based Acceleration
    2.3.3 Research Challenges
3 COLUMN SCAN ON COMPRESSED DATA
  3.1 Column Scan
    3.1.1 Naïve
    3.1.2 BitWeaving
    3.1.3 SIMD Implementation
  3.2 FPGA Implementation
    3.2.1 Processing Element
    3.2.2 Basic Architecture
    3.2.3 Hybrid Architecture
  3.3 Comparative Evaluation
    3.3.1 SIMD Evaluation
    3.3.2 FPGA Evaluation
  3.4 Lessons Learned and Summary
4 ADAPTIVE LIGHTWEIGHT COMPRESSION SYSTEM
  4.1 Lightweight Integer Compression
    4.1.1 Overview and Classification
    4.1.2 State-of-the-art Implementation Concepts
    4.1.3 Discussion
  4.2 FPGA-based Implementation of Lightweight Integer Compression Algorithms
    4.2.1 Recap FPGA-based Architecture
    4.2.2 Custom-made Compression HW Implementation
    4.2.3 Lightweight Integer Compression System Implementation
    4.2.4 Discussion
  4.3 Adaptive Compression Systems
    4.3.1 User-Specified Adaptive System
    4.3.2 HW-Specified Adaptive Systems
  4.4 Experimental Evaluation
    4.4.1 Data Properties Definition
    4.4.2 Physical-Level Compression
    4.4.3 Logical-Level Compression
    4.4.4 Cascaded Compression
    4.4.5 Adaptive Compression
  4.5 Lessons Learned and Summary
5 CONCLUSION AND FUTURE WORK
  5.1 Conclusion
  5.2 Future Work
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
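The abstract's point that no single lightweight integer compression algorithm is best for all data can be illustrated with a small selection heuristic. The sketch below is our own illustrative assumption: the scheme names, the run-length threshold of 4, and the 16-bit width cutoff are hypothetical, not the decision logic of the thesis's adaptive system.

```python
# Hypothetical heuristic for picking a lightweight integer compression
# scheme from simple data characteristics. All thresholds are illustrative.

def pick_scheme(column):
    """Return a scheme name based on run lengths and value range."""
    if not column:
        return "none"
    # Average run length: long runs of equal values favour run-length encoding.
    runs = 1
    for prev, cur in zip(column, column[1:]):
        if cur != prev:
            runs += 1
    avg_run = len(column) / runs
    if avg_run >= 4:
        return "RLE"
    # A small value range favours fixed-width bit packing.
    if max(column).bit_length() <= 16:
        return "BitPacking"
    return "uncompressed"

print(pick_scheme([7, 7, 7, 7, 8, 8, 8, 8]))  # long runs -> RLE
print(pick_scheme(list(range(1000))))          # small values -> BitPacking
```

A real adaptive system would derive such characteristics online and reconfigure the accelerator accordingly; this snippet only shows why the data, not the algorithm, drives the choice.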


    High-Throughput BitPacking Compression

    To efficiently support analytical applications from a data management perspective, in-memory column store database systems are state-of-the-art. In this kind of database system, lossless lightweight integer compression schemes are crucial to keep the memory footprint as low as possible and to speed up query processing. In this specific compression domain, BitPacking is one of the most frequently applied compression schemes. However, (de)compression should not come with any additional cost at run time, but should be provided transparently without compromising the overall system performance. To achieve that, we focus on the acceleration of BitPacking using Field Programmable Gate Arrays (FPGAs). Therefore, we outline several FPGA designs for BitPacking in this paper. As we show in our evaluation, our specific designs provide high throughput for the BitPacking compression scheme.
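The core of BitPacking can be sketched in a few lines of software: store each integer in a fixed number of bits instead of a full machine word. This is a minimal software model (a horizontal layout with one fixed bit width for the whole column), not one of the FPGA designs evaluated in the paper.

```python
def pack(values, bits):
    """Pack non-negative integers into a byte string, using `bits` bits
    per value, concatenated little-endian within one big bitstream."""
    buf, nbits = 0, 0
    for v in values:
        assert 0 <= v < (1 << bits), "value does not fit in the bit width"
        buf |= v << nbits
        nbits += bits
    return buf.to_bytes((nbits + 7) // 8, "little")

def unpack(data, bits, count):
    """Recover `count` values from a bitstream produced by pack()."""
    buf = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(buf >> (i * bits)) & mask for i in range(count)]

vals = [3, 1, 4, 1, 5, 9, 2, 6]
packed = pack(vals, 4)                    # 8 values * 4 bits = 4 bytes
assert unpack(packed, 4, len(vals)) == vals
```

The appeal for hardware is visible even here: packing and unpacking are pure shift-and-mask operations with no data-dependent control flow, which maps naturally onto a fixed pipeline.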

    FPGA vs. SIMD: Comparison for Main Memory-Based Fast Column Scan

    The ever-increasing growth of data demands reliable database systems with high throughput and low latency. Main memory-based column store database systems are state-of-the-art from this perspective, whereby the data (values) in relational tables are organized by columns rather than by rows. In such systems, a full column scan is a fundamental key operation, and its optimization is therefore crucial. This calls for fast column scan techniques based on a compact storage layout that exploit intra-value parallelism. For this reason, we investigated different well-known fast column scan techniques using SIMD (Single Instruction Multiple Data) vectorization as well as Field Programmable Gate Arrays (FPGAs). Moreover, we present selected results of our exhaustive evaluation. Based on this evaluation, we identify the best column scan technique for each implementation mechanism, FPGA and SIMD. Finally, we conclude this paper by mentioning some lessons learned for our ongoing research activities.
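A column scan of the kind compared here reduces to testing a predicate over every value of one column. The sketch below shows a naive baseline next to a word-parallel variant in the spirit of BitWeaving ("SIMD within a register"): codes are packed into one wide word with a guard bit per field, and a single subtraction evaluates the less-than predicate on all fields at once. This is our own simplified software model, not the paper's SIMD or FPGA implementations.

```python
def scan_naive(column, c):
    """Naive column scan: return the row ids whose value is < c."""
    return [row for row, v in enumerate(column) if v < c]

def scan_swar(column, c, w):
    """Word-parallel scan: pack all codes into one big integer with w bits
    per field and test every field against c with a single subtraction.
    Requires each code and c to fit in w-1 bits (one guard bit per field)."""
    n = len(column)
    limit = 1 << (w - 1)
    assert 0 <= c < limit and all(0 <= v < limit for v in column)
    packed = sum(v << (i * w) for i, v in enumerate(column))
    guards = sum(1 << (i * w + w - 1) for i in range(n))  # one guard bit per field
    consts = sum(c << (i * w) for i in range(n))          # c replicated n times
    diff = (packed | guards) - consts
    # The guard bit of field i survives the subtraction iff column[i] >= c.
    return [i for i in range(n) if not (diff >> (i * w + w - 1)) & 1]

codes = [5, 12, 7, 3, 12, 0, 9]
assert scan_naive(codes, 8) == scan_swar(codes, 8, 5) == [0, 2, 3, 5]
```

On an FPGA the field width w need not be padded to a power of two as it must be with fixed SIMD lanes, which is exactly the flexibility argument the comparison rests on.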