Compressing High-Frequency Time Series Through Multiple Models and Stealing from Residuals

Abstract

Wind turbines are equipped with high-quality sensors that generate vast volumes of high-frequency time series. The time series are ingested on the edge and transferred to the cloud for later analytics. This process is complicated by challenges like low network bandwidth and high cloud storage costs. ModelarDB was proposed as a solution to efficiently manage time series across the entire pipeline by using so-called models for lossless or error-bounded lossy compression of time series. However, ModelarDB’s compression can be further improved through: 1) avoiding models that represent only a few values by storing residuals (i.e., values that models fail to compress) explicitly with them; 2) exploiting error bounds even more through preprocessing; and 3) timestamp compression specialized for regular and irregular time series. We propose the multi-model compression method Fauna, which uses 1) the novel model fitting method Platypus; 2) PMC and Swing for compressing values; and 3) the novel Macaque for compressing residuals and timestamps. Platypus is a model fitting method that uses different models for specialized compression of values and residuals. We then evaluate state-of-the-art lossless compression methods for 32-bit floats and propose preprocessing methods to add support for error-bounded compression. We present Macaque, which includes MacaqueV and MacaqueTS. MacaqueV modifies Facebook Gorilla’s lossless compression method for 32-bit floats (GorillaV) and combines it with our novel preprocessing methods to also enable error-bounded lossy compression. MacaqueTS is a lossless compression method for timestamps. Using only Platypus reduces ModelarDB’s storage use by up to 1.8x and significantly simplifies using the system. While also up to 7x better for lossless compression, ModelarDB with Fauna uses up to 2.5x less storage than ModelarDB and up to 14.5x, 7.2x, 17.5x and 14.2x less storage than ClickHouse, Apache IoTDB, Apache Parquet and TimescaleDB, respectively, with a realistic 1% error bound.
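
To make the notion of error-bounded lossy compression with a constant model more concrete, the following is a minimal Python sketch in the spirit of PMC. It is an illustration under stated assumptions, not the implementation used by Fauna or ModelarDB: the function names `pmc_style_compress` and `decompress` are hypothetical, the error bound is assumed to be relative and applied per value, and each segment stores the midpoint of the feasible interval rather than whatever representative value the actual system chooses.

```python
from typing import List, Tuple


def pmc_style_compress(values: List[float], rel_error: float) -> List[Tuple[float, int]]:
    """Greedy piecewise-constant compression under a per-value relative error bound.

    Hypothetical sketch: returns a list of (constant, run_length) segments such that
    every original value v is reconstructed within abs(v) * rel_error.
    """
    segments: List[Tuple[float, int]] = []
    # [lo, hi] is the interval of constants that satisfies the bound for
    # every value in the current segment.
    lo, hi = float("-inf"), float("inf")
    length = 0
    for v in values:
        slack = abs(v) * rel_error
        new_lo, new_hi = max(lo, v - slack), min(hi, v + slack)
        if new_lo <= new_hi:
            # The value fits: shrink the feasible interval and extend the segment.
            lo, hi = new_lo, new_hi
            length += 1
        else:
            # The value does not fit: emit the segment and start a new one.
            segments.append(((lo + hi) / 2.0, length))
            lo, hi, length = v - slack, v + slack, 1
    if length:
        segments.append(((lo + hi) / 2.0, length))
    return segments


def decompress(segments: List[Tuple[float, int]]) -> List[float]:
    """Expand (constant, run_length) segments back into a list of values."""
    out: List[float] = []
    for constant, run_length in segments:
        out.extend([constant] * run_length)
    return out
```

With a 1% bound (`rel_error=0.01`), a run of slowly varying values collapses into a single stored constant, while a jump in the signal starts a new segment. Values that end up in very short segments are the kind of case the abstract describes handling as residuals instead of as separate models.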

VBN (Videnbasen) Aalborg Universitets forskningsportal

Last time updated on 30/12/2025