CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs
Data compression and decompression have become vital components of big-data
applications to manage the exponential growth in the amount of data collected
and stored. Furthermore, big-data applications have increasingly adopted GPUs
due to their high compute throughput and memory bandwidth. Prior works presume
that decompression is memory-bound; they dedicate most of the GPU's threads to
data movement and adopt complex software techniques to hide the latency of
reading compressed data and writing uncompressed data. This paper shows
that these techniques lead to poor GPU resource utilization as most threads end
up waiting for the few decoding threads, exposing compute and synchronization
latencies.
Based on this observation, we propose CODAG, a novel and simple kernel
architecture for high-throughput decompression on GPUs. CODAG eliminates the
use of specialized groups of threads, frees up compute resources to increase
the number of parallel decompression streams, and leverages the ample compute
activities and the GPU's hardware scheduler to tolerate synchronization,
compute, and memory latencies. Furthermore, CODAG provides a framework for
users to easily incorporate new decompression algorithms without being burdened
with implementing complex optimizations to hide memory latency. We validate our
proposed architecture with three encoding techniques (RLE v1, RLE v2, and
Deflate) and a wide range of large datasets from different domains. We show
that CODAG provides 13.46x, 5.69x, and 1.18x speedups for RLE v1, RLE v2, and
Deflate, respectively, compared to the state-of-the-art decompressors from
NVIDIA RAPIDS.
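
As a rough illustration of the warp-per-stream idea, below is a minimal CUDA
sketch. It assumes a simplified RLE v1 layout (each chunk is a sequence of
(count, value) byte pairs) and hypothetical names and offset arrays; the
abstract does not specify CODAG's actual chunk format or interface, so this is
a sketch of the general technique, not the paper's implementation. Each warp
owns one independently compressed chunk end to end, with no specialized reader
or writer thread groups, and the decoder is a plug-in functor in the spirit of
the framework described above.

    #include <cstdint>
    #include <cuda_runtime.h>

    // Hypothetical plug-in decoder: decodes one compressed chunk
    // cooperatively with all 32 lanes of a warp. A simplified RLE v1
    // layout is assumed: a sequence of (count, value) byte pairs.
    struct RleV1Decoder {
        __device__ static void decode(const uint8_t* src, const uint8_t* end,
                                      uint8_t* dst, int lane) {
            while (src < end) {
                const int count = src[0];      // run length
                const uint8_t value = src[1];  // repeated byte
                for (int i = lane; i < count; i += 32)
                    dst[i] = value;            // lanes fill the run in parallel
                dst += count;
                src += 2;
            }
        }
    };

    // One warp per independently compressed chunk, no dedicated
    // data-movement threads. With many chunks resident, the hardware
    // scheduler overlaps decode compute with memory traffic across warps.
    template <typename Decoder>
    __global__ void warp_per_chunk_decompress(
            const uint8_t* __restrict__ in,
            const uint64_t* __restrict__ in_off,   // n+1 chunk start offsets
            const uint64_t* __restrict__ out_off,  // n+1 output start offsets
            uint8_t* __restrict__ out,
            int num_chunks) {
        const int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
        const int lane = threadIdx.x % 32;
        if (warp >= num_chunks) return;

        Decoder::decode(in + in_off[warp], in + in_off[warp + 1],
                        out + out_off[warp], lane);
    }

Launched with, for example,
warp_per_chunk_decompress<RleV1Decoder><<<(num_chunks * 32 + 255) / 256, 256>>>(...),
many resident warps give the hardware scheduler enough independent work to
tolerate synchronization, compute, and memory latencies, which is the
latency-tolerance argument the abstract makes.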