LeCo: Lightweight Compression via Learning Serial Correlations
Lightweight data compression is a key technique that allows column stores to
exhibit superior performance for analytical queries. Despite comprehensive
studies of dictionary-based encodings that approach Shannon's entropy, few prior
works have systematically exploited the serial correlation in a column for
compression. In this paper, we propose LeCo (i.e., Learned Compression), a
framework that uses machine learning to remove the serial redundancy in a value
sequence automatically to achieve an outstanding compression ratio and
decompression performance simultaneously. LeCo presents a general approach to
this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR),
Delta Encoding, and Run-Length Encoding (RLE) special cases under our
framework. Our microbenchmark with three synthetic and six real-world data sets
shows that a prototype of LeCo achieves a Pareto improvement on both
compression ratio and random access speed over the existing solutions. When
integrating LeCo into widely-used applications, we observe up to a 3.9x speedup
in filter-scanning a Parquet file and a 16% increase in RocksDB's throughput.
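The general idea of removing serial redundancy with a learned model, with Frame-of-Reference (FOR) as a special case, can be illustrated with a minimal Python sketch. This is not LeCo's actual implementation. It assumes one least-squares linear model per non-empty partition. Each value's residual from the rounded model prediction is stored at a fixed bit width, so any single value can be decoded from its index alone. Fixing the slope at zero and the intercept at the partition minimum reduces the scheme to FOR. The names `Partition`, `fit_linear`, `encode`, and `access` are hypothetical, and Delta Encoding and RLE are not covered here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    """One compressed partition (hypothetical layout): a model plus fixed-width residuals."""
    theta0: float  # model intercept
    theta1: float  # model slope; 0.0 with theta0 = min(values) is plain FOR
    base: int      # minimum residual, subtracted so stored residuals are >= 0
    width: int     # bits per stored residual
    packed: int    # all residuals bit-packed into one Python integer
    n: int         # number of values in the partition


def _predict(theta0: float, theta1: float, i: int) -> int:
    # Encoder and decoder share this function, so rounding is always consistent.
    return round(theta0 + theta1 * i)


def fit_linear(values: List[int]) -> Tuple[float, float]:
    """Ordinary least-squares fit of value ~ theta0 + theta1 * index."""
    n = len(values)
    if n == 1:
        return float(values[0]), 0.0
    mean_i = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((i - mean_i) * (v - mean_v) for i, v in enumerate(values))
    var = sum((i - mean_i) ** 2 for i in range(n))
    theta1 = cov / var
    return mean_v - theta1 * mean_i, theta1


def encode(values: List[int], linear: bool = True) -> Partition:
    """Compress a non-empty partition with a linear model or, if linear=False, with FOR."""
    if linear:
        theta0, theta1 = fit_linear(values)
    else:
        # Frame-of-Reference as a special case: a constant model at the minimum.
        theta0, theta1 = float(min(values)), 0.0
    residuals = [v - _predict(theta0, theta1, i) for i, v in enumerate(values)]
    base = min(residuals)
    width = max(r - base for r in residuals).bit_length()
    packed = 0
    for i, r in enumerate(residuals):
        packed |= (r - base) << (i * width)
    return Partition(theta0, theta1, base, width, packed, len(values))


def access(p: Partition, i: int) -> int:
    """Decode value i from the model and its own residual, without decoding other values."""
    mask = (1 << p.width) - 1
    stored = (p.packed >> (i * p.width)) & mask
    return _predict(p.theta0, p.theta1, i) + p.base + stored


if __name__ == "__main__":
    # Values with a steady upward trend plus a little noise.
    values = [1000 + 7 * i + (i % 3) for i in range(64)]
    learned = encode(values, linear=True)
    for_ = encode(values, linear=False)
    assert all(access(learned, i) == v for i, v in enumerate(values))
    assert all(access(for_, i) == v for i, v in enumerate(values))
    print("linear model bits/value:", learned.width)
    print("FOR bits/value:", for_.width)
```

On data with a steady trend, the linear model leaves only small residuals, so it needs fewer bits per value than the constant FOR model. Both variants still support random access, because decoding a value needs only the model parameters and that value's packed residual.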