
    Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+

    To monitor critical infrastructure, high-quality sensors sampled at a high frequency are increasingly used. However, as they produce huge amounts of data, only simple aggregates are stored. This removes the outliers and fluctuations that could indicate problems. As a remedy, we present a model-based approach for managing time series with dimensions that exploits correlation within and among time series. Specifically, we propose compressing groups of correlated time series using an extensible set of model types within a user-defined error bound (possibly zero). We name this new category of model-based compression methods for time series Multi-Model Group Compression (MMGC). We present GOLEMM, the first MMGC method, and extend model types to compress time series groups. We propose primitives for users to effectively define groups for differently sized data sets and, based on these, an automated grouping method that uses only the time series dimensions. We propose algorithms for executing simple and multi-dimensional aggregate queries on models. Last, we implement our methods in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our evaluation shows that, compared to widely used formats, ModelarDB+ provides up to 13.7 times faster ingestion due to high compression, 113 times better compression due to the adaptivity of GOLEMM, 630 times faster aggregates by using models, and close to linear scalability. It is also extensible and supports online query processing. Comment: 12 pages, 28 figures, and 1 table.
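
    To make the model-based idea above concrete, the following is a minimal sketch of a single constant-value model type fitted within a user-defined error bound, in the spirit of the Poor Man's Compression family; the function names and the greedy segmentation loop are illustrative assumptions, not GOLEMM or ModelarDB+'s actual implementation.

        def compress_constant_segments(values, error_bound):
            """Greedily split a series into segments, each replaced by one constant
            whose absolute error stays within error_bound (a toy single-model sketch;
            MMGC methods choose among several model types per segment)."""
            segments = []                      # (segment_length, representative_value)
            lo = hi = values[0]
            count = 1
            for v in values[1:]:
                new_lo, new_hi = min(lo, v), max(hi, v)
                if new_hi - new_lo <= 2 * error_bound:
                    lo, hi, count = new_lo, new_hi, count + 1   # v fits the current segment
                else:
                    segments.append((count, (lo + hi) / 2))     # close the segment
                    lo = hi = v
                    count = 1
            segments.append((count, (lo + hi) / 2))
            return segments

        def decompress(segments):
            return [rep for length, rep in segments for _ in range(length)]

        if __name__ == "__main__":
            series = [10.0, 10.2, 9.9, 10.1, 20.0, 20.3, 19.8]
            model = compress_constant_segments(series, error_bound=0.5)
            print(model)                       # [(4, 10.05), (3, 20.05)]
            print(decompress(model))

    Because each segment's representative is the midpoint of the observed range, every reconstructed value deviates by at most error_bound; with an error bound of zero the scheme degenerates to run-length encoding of identical values.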

    Model-Based Time Series Management at Scale


    Scalable Model-Based Management of Massive High Frequency Wind Turbine Data with ModelarDB

    Modern wind turbines are monitored by sensors that generate massive amounts of time series data, which is ingested on the edge before it is transferred to the cloud where it is stored and queried. This results in at least four challenges: 1) High-frequency time series data must be ingested on limited hardware fast enough to keep up with its generation; 2) Limited bandwidth makes it impossible to transfer the data without compression; 3) Storage costs are high once the data is stored; and 4) Data quality is low due to the unbounded lossy compression methods commonly used by practitioners. Practitioners currently use solutions that only solve some of these challenges. In this paper, we evaluate a solution for the entire pipeline based on the Time Series Management System ModelarDB that addresses all four challenges efficiently. With ModelarDB, the user can exploit both lossless and error-bounded lossy compression. We evaluate the solution in a realistic edge-to-cloud scenario with real-world data and from several angles. For lossless compression, ModelarDB achieves up to 1.5x better compression and 1.2x better transfer efficiency than lossless solutions commonly used by practitioners. For lossy compression, ModelarDB offers significant compression comparable to a lossy method commonly used by practitioners today; however, ModelarDB has orders of magnitude smaller errors.
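
    As an illustration of the error-bounded lossy option mentioned above, the sketch below fits straight-line segments under a per-point error bound, a simplified swing-filter-style model; it is an assumption about one representative model type, not ModelarDB's exact algorithm.

        def swing_segments(times, values, eps):
            """Approximate a series with straight-line segments so that every point lies
            within eps of its segment's line; the first point anchors each line
            (a simplification of the swing filter)."""
            segments = []                              # (t_start, v_start, t_end, slope)
            i, n = 0, len(values)
            while i < n:
                t0, v0 = times[i], values[i]
                low, high = float("-inf"), float("inf")
                j = i + 1
                while j < n:
                    dt = times[j] - t0
                    new_low = max(low, (values[j] - eps - v0) / dt)
                    new_high = min(high, (values[j] + eps - v0) / dt)
                    if new_low > new_high:             # point j cannot share the line
                        break
                    low, high, j = new_low, new_high, j + 1
                slope = (low + high) / 2 if j > i + 1 else 0.0
                segments.append((t0, v0, times[j - 1], slope))
                i = j
            return segments

        if __name__ == "__main__":
            ts = list(range(8))
            vs = [0.0, 1.1, 1.9, 3.2, 3.9, 2.0, 0.1, -1.8]
            print(swing_segments(ts, vs, eps=0.3))     # two line segments cover the series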

    Time Series Management Systems: A 2022 Survey

    Enormous amounts of time series are being collected in many different domains, including, but not limited to, aviation, computing, energy, finance, logistics, and medicine. However, general-purpose Database Management Systems (DBMSs) are not optimized for time series management and thus significantly limit the amount of time series data that can be efficiently stored and analyzed. As a remedy, specialized Time Series Management Systems (TSMSs) have been developed. This chapter provides a thorough survey and classification of TSMSs that were developed through academic or industrial research and documented through peer-reviewed papers. To document their design and novel contributions, a summary of each system is provided. The systems are primarily classified based on their architecture. In addition, the systems are classified based on: when and why each system was developed, how it can be deployed, how mature its implementation is, how scalable it is, how it processes time series, what interfaces it provides, the type of approximation it supports, how low a latency it can achieve, how it stores time series, and the types of queries it supports. The chapter concludes with a collection of open research problems based on the limitations of the surveyed systems.

    MDZ: An Efficient Error-Bounded Lossy Compressor for Molecular Dynamics

    Molecular dynamics (MD) has been widely used in today's scientific research across multiple domains, including materials science, biochemistry, biophysics, and structural biology. MD simulations can produce extremely large amounts of data, as each simulation may involve a large number of atoms (up to trillions) over a large number of timesteps (up to hundreds of millions). In this paper, we perform an in-depth analysis of a number of MD simulation datasets and then develop an efficient error-bounded lossy compressor that significantly improves compression ratios. The contributions are fourfold. (1) We characterize a number of MD datasets and summarize two commonly used execution models. (2) We develop an adaptive error-bounded lossy compression framework (called MDZ), which optimizes the compression for both execution models adaptively by taking advantage of their specific characteristics. (3) We compare our solution with six other state-of-the-art related works using three MD simulation packages, each with multiple configurations. Experiments show that our solution achieves up to 233% higher compression ratios than the second-best lossy compressor in most cases. (4) We demonstrate that MDZ is fully capable of handling particle data beyond MD simulations.
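
    The core mechanism of such error-bounded lossy compressors can be sketched as prediction followed by residual quantization; the previous-timestep predictor and the quantizer below are a simplified illustration of that general idea, not MDZ's adaptive framework.

        import numpy as np

        def encode_with_previous_timestep(frames, abs_error):
            """Toy error-bounded encoder: predict each value from the reconstructed
            previous timestep and quantize the residual so that reconstruction stays
            within abs_error; the integer codes are what an entropy coder would shrink."""
            frames = np.asarray(frames, dtype=np.float64)   # shape: (timesteps, atoms)
            codes = np.zeros(frames.shape, dtype=np.int64)
            recon = frames[0].copy()                        # mirror of the decoder state
            for t in range(1, len(frames)):
                residual = frames[t] - recon
                codes[t] = np.round(residual / (2 * abs_error)).astype(np.int64)
                recon = recon + codes[t] * 2 * abs_error
            return frames[0], codes

        def decode(first_frame, codes, abs_error):
            recon = first_frame.copy()
            out = [recon.copy()]
            for t in range(1, len(codes)):
                recon = recon + codes[t] * 2 * abs_error
                out.append(recon.copy())
            return np.array(out)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            traj = np.cumsum(rng.normal(scale=0.01, size=(100, 5)), axis=0)
            first, codes = encode_with_previous_timestep(traj, abs_error=1e-3)
            print(np.max(np.abs(decode(first, codes, abs_error=1e-3) - traj)))  # <= 1e-3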

    Distributed Data Management for IoT Applications

    In a traditional Internet of Things (IoT) system, the data collected by sensor and actuator nodes is sent to the cloud for storage and analysis, while the nodes receive in response commands or control instructions that change the state of their actuators. This approach leads to high communication latency, a heavy upstream data flow, and higher costs in cloud data centers. In addition, many IoT systems experience connectivity problems that cause data loss if there is no coordinated local storage. This project proposes developing a distributed system that combines storage at the edge of the network with cloud services to mitigate these drawbacks. This approach reduces latency, optimizes bandwidth usage, and ensures operational continuity in scenarios with limited connectivity. The expected results include a functional prototype that demonstrates the feasibility of this architecture in real applications. Red de Universidades con Carreras en Informática
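
    The store-and-forward pattern described above (buffer readings at the edge, upload to the cloud when connectivity allows) can be sketched as follows; the table layout, class name, and upload callback are illustrative assumptions, not the project's prototype.

        import sqlite3, time

        class EdgeBuffer:
            """Minimal store-and-forward buffer: readings are persisted locally so a
            connectivity loss does not drop data, and are flushed to the cloud whenever
            the uplink is available."""

            def __init__(self, path="edge_buffer.db"):
                self.db = sqlite3.connect(path)
                self.db.execute(
                    "CREATE TABLE IF NOT EXISTS readings (ts REAL, sensor TEXT, value REAL)")

            def store(self, sensor, value):
                self.db.execute("INSERT INTO readings VALUES (?, ?, ?)",
                                (time.time(), sensor, value))
                self.db.commit()

            def flush(self, upload):
                """upload is any callable that sends one reading to the cloud and raises
                ConnectionError when the uplink is down."""
                rows = self.db.execute(
                    "SELECT rowid, ts, sensor, value FROM readings").fetchall()
                for rowid, ts, sensor, value in rows:
                    try:
                        upload({"ts": ts, "sensor": sensor, "value": value})
                    except ConnectionError:
                        break                      # keep unsent readings and retry later
                    self.db.execute("DELETE FROM readings WHERE rowid = ?", (rowid,))
                self.db.commit()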

    Adaptive Encoding Strategies for Erasing-Based Lossless Floating-Point Compression

    Lossless floating-point time series compression is crucial for a wide range of critical scenarios. Nevertheless, compressing time series losslessly is a big challenge due to the complex underlying layouts of floating-point values. The state-of-the-art erasing-based compression algorithm Elf demonstrates rather impressive performance. We give an in-depth exploration of the encoding strategies of Elf and find that there is still much room for improvement. In this paper, we propose Elf*, which employs a set of optimizations for leading zeros, center bits, and the sharing condition. Specifically, we develop a dynamic programming algorithm with a set of pruning strategies to compute the adaptive approximation rules efficiently. We theoretically prove that the adaptive approximation rules are globally optimal. We further extend Elf* to Streaming Elf*, i.e., SElf*, which achieves almost the same compression ratio as Elf* while enjoying even higher efficiency in streaming scenarios. We compare Elf* and SElf* with 8 competitors using 22 datasets. The results demonstrate that SElf* achieves a 9.2% relative compression ratio improvement over the best streaming competitor while maintaining similar efficiency, and that Elf* ranks among the most competitive batch compressors. All source code is publicly released.
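
    The "erasing" idea behind this family of compressors can be illustrated as follows: zero out mantissa bits that fall below a chosen decimal precision so that the XOR of consecutive values gains long runs of trailing zeros. The bit-budget formula below is a rough illustration only, and it omits the metadata that Elf-style compressors keep in order to restore the erased bits exactly.

        import math
        import struct

        def float_bits(x):
            return struct.unpack(">Q", struct.pack(">d", x))[0]

        def erase_insignificant_bits(x, decimal_places):
            """Zero the mantissa bits whose combined weight is below 10**-decimal_places;
            real erasing-based compressors also record how to undo this exactly."""
            if x == 0.0:
                return 0
            bits = float_bits(x)
            exponent = ((bits >> 52) & 0x7FF) - 1023
            needed = max(0, min(52, math.ceil(exponent + decimal_places * math.log2(10))))
            mask = ~((1 << (52 - needed)) - 1) & 0xFFFFFFFFFFFFFFFF
            return bits & mask

        def xor_trailing_zeros(values, decimal_places=3):
            """Count trailing zeros in the XOR of consecutive (erased) values; more
            trailing zeros means fewer bits for the subsequent bit-packing stage."""
            erased = [erase_insignificant_bits(v, decimal_places) for v in values]
            counts = []
            for prev, curr in zip(erased, erased[1:]):
                x = prev ^ curr
                counts.append(64 if x == 0 else (x & -x).bit_length() - 1)
            return counts

        if __name__ == "__main__":
            series = [3.141, 3.142, 3.140, 3.143]
            print(xor_trailing_zeros(series, decimal_places=3))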

    Evaluating the Impact of Error-Bounded Lossy Compression on Time Series Forecasting

    Time series data is widely used for decision-making and advanced analytics such as forecasting. However, the vast data volumes make storage challenging. Lossy compression can save more space than lossless methods, but it can affect forecasting accuracy. Understanding the impact of lossy compression on forecasting accuracy is a multifaceted challenge, necessitating experimental evaluation across various forecasting models, compression methods, and time series. This paper conducts such an experimental evaluation by combining seven forecasting models, three lossy compression algorithms, and six datasets. By simulating a real-life scenario where forecasting models use lossy compressed data for prediction, we address three main research questions related to the compression error and its effects on the time series characteristics and the forecasting models. The results show that the Poor Man's Compression (PMC) and Swing Filter (SWING) lossy compression algorithms add less error than the Squeeze (SZ) method as the error bound increases. Poor Man's Compression provides the best balance between compression ratio and forecasting accuracy. Specifically, we obtained average compression ratios of 13.65, 5.56, and 14.97 for PMC, SWING, and SZ, with an average impact on forecasting accuracy of 5.56%, 3.3%, and 8.5%, respectively. An analysis of several time series characteristics shows that the maximum Kullback-Leibler divergence between consecutive windows in the time series is the best indicator of the impact of lossy compression on forecasting accuracy. Finally, our results indicate that simple models, such as ARIMA, are more resilient to lossy compression than complex deep learning models. The source code and data are available at https://github.com/cmcuza/EvalImpLSTS.
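
    The time series characteristic highlighted above can be computed as in the sketch below; the window length, bin count, and smoothing constant are illustrative choices rather than the paper's exact setup.

        import numpy as np

        def max_kl_between_consecutive_windows(series, window=100, bins=20):
            """Split the series into consecutive non-overlapping windows, estimate a
            histogram per window, and return the maximum Kullback-Leibler divergence
            between neighbouring windows."""
            series = np.asarray(series, dtype=float)
            edges = np.histogram_bin_edges(series, bins=bins)
            windows = [series[i:i + window]
                       for i in range(0, len(series) - window + 1, window)]
            eps = 1e-9                                   # smoothing to avoid log(0)
            divergences = []
            for prev, curr in zip(windows, windows[1:]):
                p = np.histogram(prev, bins=edges)[0] + eps
                q = np.histogram(curr, bins=edges)[0] + eps
                p, q = p / p.sum(), q / q.sum()
                divergences.append(float(np.sum(p * np.log(p / q))))
            return max(divergences) if divergences else 0.0

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            ts = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])
            print(max_kl_between_consecutive_windows(ts))   # spikes at the distribution shift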

    DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup

    Storing tabular data so as to balance storage and query efficiency is a long-standing research question in the database community. In this work, we argue and show that a novel DeepMapping abstraction, which relies on the impressive memorization capabilities of deep neural networks, can provide better storage cost, better latency, and a better run-time memory footprint, all at the same time. Such unique properties may benefit a broad class of use cases on capacity-limited devices. Our proposed DeepMapping abstraction transforms a dataset into multiple key-value mappings and constructs a multi-tasking neural network model that outputs the corresponding values for a given input key. To deal with memorization errors, DeepMapping couples the learned neural network with a lightweight auxiliary data structure capable of correcting mistakes. The auxiliary structure design further enables DeepMapping to efficiently deal with insertions, deletions, and updates, even without retraining the mapping. We propose a multi-task search strategy for selecting the hybrid DeepMapping structures (including the model architecture and auxiliary structure) with a desirable trade-off among memorization capacity, size, and efficiency. Extensive experiments with real-world, synthetic, and benchmark datasets, including TPC-H and TPC-DS, demonstrate that the DeepMapping approach balances retrieval speed and compression ratio better than several cutting-edge competitors.
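
    The abstraction can be sketched with a stand-in predictor: a compact model answers most lookups, and a small auxiliary structure stores only the keys the model memorizes incorrectly, so every lookup remains exact. Nothing below is DeepMapping's actual multi-tasking network; the example rule and exceptions are hypothetical.

        class CorrectedMapping:
            """Learned-mapping-plus-auxiliary-structure idea in miniature: the model
            predicts a value for each key, and a dictionary patches the keys it gets
            wrong, so lookups are exact while storage is dominated by the model."""

            def __init__(self, model, keys, values):
                self.model = model
                self.corrections = {k: v for k, v in zip(keys, values) if model(k) != v}

            def __getitem__(self, key):
                if key in self.corrections:
                    return self.corrections[key]
                return self.model(key)

        if __name__ == "__main__":
            keys = list(range(1000))
            values = [3 * k for k in keys]             # mostly follows a simple rule...
            values[17], values[404] = 0, 99            # ...with a couple of exceptions
            mapping = CorrectedMapping(lambda k: 3 * k, keys, values)
            assert all(mapping[k] == v for k, v in zip(keys, values))
            print("keys needing correction:", len(mapping.corrections))   # 2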