Baler -- Machine Learning Based Compression of Scientific Data
Storing and sharing increasingly large datasets is a challenge across
scientific research and industry. In this paper, we document the development
and applications of Baler - a Machine Learning based data compression tool for
use across scientific disciplines and industry. Here, we present Baler's
performance for the compression of High Energy Physics (HEP) data, as well as
its application to Computational Fluid Dynamics (CFD) toy data as a
proof-of-principle. We also present suggestions for cross-disciplinary
guidelines to enable feasibility studies for machine learning based compression
for scientific data.
Comment: 10 pages and 6 figures, excluding appendi
ROOT I/O compression improvements for HEP analysis
We overview recent changes in the ROOT I/O system that improve its performance and its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve experiments' software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are significant trade-offs between the increased CPU cost for reading and writing files and the reduced storage space.