Decomposing and re-composing lightweight compression schemes - and why it matters
We argue for a richer view of the space of lightweight compression schemes for columnar DBMSes: we demonstrate how even simple schemes used in DBMSes decompose into constituent schemes when their decompression is viewed from a columnar perspective. With concrete examples, we touch briefly on what follows from these and other decompositions: the composition of alternative compression schemes, as well as other practical and analytical implications.
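To make the idea of composing simple columnar schemes concrete, here is a minimal illustrative sketch (not the paper's actual decomposition) of two textbook schemes, DELTA and RLE, and how chaining them collapses an arithmetic column into a pair of runs:

```python
# Illustrative sketch only: two simple columnar schemes and one composition.

def delta_encode(col):
    # DELTA: store each value as its difference from the predecessor.
    return [col[0]] + [b - a for a, b in zip(col, col[1:])]

def rle_encode(col):
    # RLE: store (value, run_length) pairs.
    runs = []
    for v in col:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

# Composing DELTA then RLE: an arithmetic sequence collapses to two runs.
col = [10, 12, 14, 16, 18, 20]
print(rle_encode(delta_encode(col)))  # [(10, 1), (2, 5)]
```

The point of the composition view is exactly this kind of reuse: each constituent scheme stays trivial, and new schemes arise by chaining them.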
The Bitonic Sorting Method on the GPU
The shift in computer architecture toward multiprocessors allows more processes to be executed simultaneously, but it does not significantly increase the speed of each individual process. Speeding up an individual process can instead be achieved by speeding up the software, and software speed is largely determined by its algorithm. Finding a faster algorithm is not easy, but with multiprocessor computers, faster algorithms can be designed by parallelizing the computation. One example of a multiprocessor implementation in graphics is the GPU (graphical processing unit), pioneered by NVIDIA. GPUs run parallel-computing algorithms; one such problem is sorting.
Sorting is one of the fundamental problems frequently raised in parallel processing. A common solution strategy is Divide and Conquer: a large problem is recursively divided into smaller parts until each part can be solved directly, and the partial solutions are then combined into a complete solution. A sorting method of this kind is bitonic sort.
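The divide-and-conquer structure described above can be sketched as a short sequential program. In each merge step the compare-and-swap operations are independent of one another, which is what makes the network map well onto GPU threads. A minimal sketch, assuming a power-of-two input length:

```python
def bitonic_sort(a, ascending=True):
    # Divide: sort one half ascending and the other descending,
    # producing a bitonic sequence; then merge it.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    first = bitonic_sort(a[:mid], True)
    second = bitonic_sort(a[mid:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(a, ascending):
    # Conquer: compare-and-swap across the two halves. These
    # comparisons are mutually independent, so on a GPU each one
    # can be assigned to its own thread.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    for i in range(mid):
        if (a[i] > a[i + mid]) == ascending:
            a[i], a[i + mid] = a[i + mid], a[i]
    return (bitonic_merge(a[:mid], ascending)
            + bitonic_merge(a[mid:], ascending))

print(bitonic_sort([3, 1, 2, 4]))  # [1, 2, 3, 4]
```

This sequential version only illustrates the data flow; a GPU implementation would replace each recursion level with one parallel kernel launch.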
Efficient Cross-Device Query Processing
The increasing diversity of hardware within a single system promises large performance gains but also poses a challenge for data management systems. Strategies for the efficient use of hardware with large performance differences are still lacking. For example, existing research on GPU-supported data management largely handles the GPU in isolation from the system's CPU: the GPU is considered the central processor, and the CPU is used only to mitigate the GPU's weaknesses where necessary. To make efficient use of all available devices, we developed a processing strategy that lets unequal devices like GPU and CPU combine their strengths rather than work in isolation. To this end, we decompose relational data into individual bits and place the resulting partitions on the appropriate devices. Operations are processed in phases, each phase executed on one device. This way, we achieve significant performance gains and good load distribution among the available devices in a limited real-life use case. To grow this idea into a generic system, we identify challenges as well as potential hardware configurations and applications that can benefit from this approach.
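The core data layout, decomposing a column into per-bit partitions that can live on different devices, can be illustrated with a minimal sketch (the function names here are our own, not the paper's):

```python
def bit_partitions(values, width):
    # One partition per bit position: partition b holds bit b of every
    # value. Each partition can then be placed on a different device.
    return [[(v >> b) & 1 for v in values] for b in range(width)]

def recompose(parts):
    # Reassemble the original values from the bit-partitions.
    return [sum(bit << b for b, bit in enumerate(col))
            for col in zip(*parts)]

col = [5, 3, 0, 7]
parts = bit_partitions(col, 3)
print(parts)       # [[1, 1, 0, 1], [0, 1, 0, 1], [1, 0, 0, 1]]
print(recompose(parts))  # [5, 3, 0, 7]
```

A phase of processing then touches only the partitions resident on one device, which is what keeps PCIe traffic low.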
Faster across the PCIe bus: A GPU library for lightweight decompression including support for patched compression schemes
This short paper presents a collection of GPU implementations of lightweight decompression algorithms within Giddy, a FOSS library and the first published to offer such functionality. As the use of compression is important in ameliorating PCIe data transfer bottlenecks, we believe this library and its constituent implementations can serve as useful building blocks in GPU-accelerated DBMSes, as well as in other data-intensive systems.
The paper also includes an initial exploration of GPU-oriented patched compression schemes. Patching makes compression ratio robust against outliers, and is important with real-life data, which (in contrast to many synthetic benchmark datasets) exhibits non-uniform data distributions and noise.
An experimental evaluation of both the unpatched and the patched schemes in Giddy is included.
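The patching idea, storing outliers as exceptions so they cannot blow up the bit width of the packed data, can be sketched in a PFOR-style toy encoder (a simplified illustration, not Giddy's actual format):

```python
def pfor_encode(values, base, width):
    # Frame-of-reference with patching: values that fit in `width` bits
    # above `base` are stored packed; outliers become (position, value)
    # exception "patches" stored on the side.
    limit = 1 << width
    packed, patches = [], []
    for i, v in enumerate(values):
        d = v - base
        if 0 <= d < limit:
            packed.append(d)
        else:
            packed.append(0)          # placeholder, repaired on decode
            patches.append((i, v))
    return packed, patches

def pfor_decode(packed, patches, base):
    # Bulk-decode everything, then apply the (few) patches last; on a
    # GPU the bulk phase is perfectly data-parallel.
    out = [base + d for d in packed]
    for i, v in patches:
        out[i] = v
    return out

values = [100, 101, 103, 9999, 102]
packed, patches = pfor_encode(values, base=100, width=2)
print(packed, patches)   # [0, 1, 3, 0, 2] [(3, 9999)]
print(pfor_decode(packed, patches, base=100) == values)  # True
```

Without the patch list, the single outlier 9999 would force every value to be stored in 14 bits instead of 2, which is why patching keeps the compression ratio robust on noisy real-life data.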
Query processing on low-energy many-core processors
Aside from performance, energy efficiency is an increasing challenge in database systems. To tackle both aspects in an integrated fashion, we pursue a hardware/software co-design approach. To meet the energy requirement from the hardware perspective, we utilize a low-energy processor design that allows us to place hundreds to millions of chips on a single board without any thermal restrictions. Furthermore, we address the performance requirement by developing several database-specific instruction set extensions to customize each core, though no single core carries all extensions. Our hardware foundation is therefore a low-energy processor consisting of a high number of heterogeneous cores. In this paper, we introduce our hardware setup at the system level and present several challenges for query processing. Based on these challenges, we describe two implementation concepts and compare them. Finally, we conclude the paper with some lessons learned and an outlook on our upcoming research directions.
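One consequence of cores carrying different extension subsets is that the query processor must place each operator on a core that actually supports it. A hypothetical placement sketch (the core names and extension names below are invented for illustration, not the paper's actual extensions):

```python
# Hypothetical: each heterogeneous core advertises the database-specific
# instruction set extensions it carries; no core carries all of them.
CORES = {
    "core0": {"scan", "filter"},
    "core1": {"hash_join"},
    "core2": {"aggregate", "filter"},
}

def place(operator):
    # Pick the first core whose extension set covers the operator.
    for core, exts in CORES.items():
        if operator in exts:
            return core
    raise ValueError(f"no core supports {operator}")

plan = ["scan", "filter", "hash_join", "aggregate"]
print([place(op) for op in plan])  # ['core0', 'core0', 'core1', 'core2']
```

In a real system this placement would also weigh load balancing and data movement between cores, which is precisely where the challenges discussed in the paper arise.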
ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU
This paper presents implementations of a few selected SQL operations using the CUDA programming framework on the GPU platform. Nowadays, the GPU's parallel architectures give a high speed-up on certain problems, so the number of non-graphical problems that can be run and sped up on the GPU keeps increasing. In particular, there has been a lot of research in data mining on GPUs, and in many cases it proves the advantage of offloading processing from the CPU to the GPU. At the beginning of our project we chose the set of SELECT WHERE and SELECT JOIN instructions as the most common operations used in databases. We parallelized these SQL operations using three main mechanisms in CUDA: thread group hierarchy, shared memory, and barrier synchronization. Our results show that the implemented highly parallel SELECT WHERE and SELECT JOIN operations on the GPU platform can be significantly faster than their sequential counterparts in a database system run on the CPU.
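The data-parallel pattern behind a GPU SELECT ... WHERE can be sketched sequentially: each "thread" evaluates the predicate on one row, and a prefix sum over the match flags gives every matching row a contention-free output slot. A minimal sketch (the rows and predicate are invented examples, and real CUDA code would run the two phases as kernels):

```python
rows = [(1, "alice", 34), (2, "bob", 17), (3, "carol", 52)]

def predicate(row):
    return row[2] >= 18        # WHERE age >= 18

# Phase 1: each thread writes a 0/1 flag for its row, independently.
flags = [int(predicate(r)) for r in rows]

# Phase 2: an exclusive prefix sum over the flags yields each match's
# output slot, letting threads scatter results without contention.
slots = [sum(flags[:i]) for i in range(len(flags))]
result = [None] * sum(flags)
for r, f, s in zip(rows, flags, slots):
    if f:
        result[s] = r

print(result)  # [(1, 'alice', 34), (3, 'carol', 52)]
```

The prefix-sum step is the part that benefits from CUDA's barrier synchronization and shared memory: threads in a block cooperate on partial sums before writing their compacted output.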
X-Device Query Processing by Bitwise Distribution
The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the "most appropriate" device. While pleasantly simple, this strategy has a number of problems: it may leave the "inappropriate" devices idle while overloading the "appropriate" device and putting high pressure on the PCI bus. To address these issues we distribute data among the devices by partially decomposing relations at the granularity of individual bits. Each of the resulting bit-partitions is stored and processed on one of the available devices. Using this strategy, we implemented a processor for spatial range queries that makes efficient use of all available devices. The performance gains achieved indicate that bitwise distribution makes a good cross-device processing strategy.
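To see how a range predicate can be evaluated directly over bit-partitions, here is a bit-sliced comparison sketch: scanning the partitions from the most significant bit down decides `value < c` for every row, and each per-bit pass could run as one phase on whichever device holds that partition. A simplified illustration, not the paper's implementation:

```python
def less_than_bitsliced(values, c, width):
    # Decompose into bit-partitions (partition b holds bit b of each value).
    parts = [[(v >> b) & 1 for v in values] for b in range(width)]
    n = len(values)
    lt = [0] * n   # lt[i] = 1 once values[i] < c is decided
    eq = [1] * n   # eq[i] = 1 while values[i] matches c on all bits so far
    # Scan from the most significant bit down; each pass touches exactly
    # one partition, i.e. one per-device phase in the paper's terms.
    for b in reversed(range(width)):
        cb = (c >> b) & 1
        for i in range(n):
            lt[i] |= eq[i] & (1 - parts[b][i]) & cb
            eq[i] &= 1 - (parts[b][i] ^ cb)
    return lt

print(less_than_bitsliced([3, 7, 5, 1], c=5, width=3))  # [1, 0, 0, 1]
```

A two-sided range predicate combines a `<` and a `>=` pass; crucially, an answer is often decided after the few high-order partitions, so the low-order partitions (and the devices holding them) may never need to be touched.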