4 research outputs found
ROOT I/O improvements for HEP analysis
We overview recent changes in the ROOT I/O system that increase performance and enhance its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques all have the potential to significantly improve experiments' software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are significant trade-offs between the increased CPU cost for reading and writing files and the reduced storage space. ROOT I/O is responsible for serializing complex C++ objects, such as those used in physics reconstruction, whereas analysis workflows typically involve simpler objects. The performance of this latter case can be improved by using the ROOT "bulk I/O" interface, which allows multiple events to be returned per library call.
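The CPU-versus-storage trade-off described above can be illustrated outside of ROOT with Python's standard-library codecs (zlib, i.e. the DEFLATE family, and LZMA, both of which ROOT also exposes as compression settings); the payload and numbers below are illustrative only, not ROOT measurements:

```python
import time
import zlib
import lzma

# Illustrative payload: repetitive structured records, which compress well,
# loosely standing in for the columnar branch data of a HEP file.
payload = b"event_px=0.173;event_py=-0.244;event_pz=1.902;" * 20000

for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    start = time.perf_counter()
    packed = compress(payload)
    elapsed = time.perf_counter() - start
    print(f"{name}: ratio={len(payload) / len(packed):.1f}x, "
          f"time={elapsed * 1000:.1f} ms")
```

LZMA typically achieves a noticeably better ratio at a markedly higher CPU cost, which is exactly the trade-off the abstract cautions should not be ignored when choosing a compression setting.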
Extreme compression for Large Scale Data store
For the last 5 years, Accelogic has pioneered and perfected a radically new theory of numerical computing codenamed "Compressive Computing", which has an extremely profound impact on real-world computer science. At the core of this new theory is the discovery of one of its fundamental theorems, which states that, under very general conditions, the vast majority (typically between 70% and 80%) of the bits used in modern large-scale numerical computations are absolutely irrelevant for the accuracy of the end result. This theory of Compressive Computing provides mechanisms able to identify (with high intelligence and surgical accuracy) the number of bits (i.e., the precision) that can be used to represent numbers without affecting the substance of the end results, as they are computed and vary in real time. The bottom-line outcome would be a state-of-the-art compression algorithm that surpasses those currently available in the ROOT framework, with the purpose of enabling substantial economic and operational gains (including speedup) for High Energy and Nuclear Physics data storage and analysis. In our initial studies, a factor of nearly x4 (3.9) compression was achieved with RHIC/STAR data, where ROOT compression managed only x1.4. In this contribution, we will present our concept of "functionally lossless compression", glance at examples and achievements in other communities, present the results and outcome of our current R&D, and present a high-level view of our plan to move forward with a ROOT implementation that would deliver a basic solution readily integrated into HENP applications. As a collaboration of experimental scientists, private industry, and the ROOT Team, our aim is to capitalize on the substantial success delivered by the initial effort and produce a robust technology, properly packaged as an open-source tool, that could be used by virtually every experiment around the world as a means for improving data management and accessibility.
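The idea behind "functionally lossless compression" — that many low-order mantissa bits carry no physics and can be discarded to make data more compressible — can be sketched in a few lines of stdlib Python. The 10-bit truncation below is an arbitrary illustrative choice, not Accelogic's actual algorithm:

```python
import struct
import zlib

def truncate_mantissa(value: float, drop_bits: int = 10) -> float:
    """Zero the lowest drop_bits of a float32 mantissa (23 mantissa bits total)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = ~((1 << drop_bits) - 1) & 0xFFFFFFFF
    (result,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return result

values = [0.1 + 0.001 * i for i in range(10000)]
raw = struct.pack(f"<{len(values)}f", *values)
truncated = struct.pack(f"<{len(values)}f",
                        *(truncate_mantissa(v) for v in values))

# Truncation leaves runs of zero bits in every word, so a generic
# compressor such as zlib achieves a better ratio on the truncated stream.
print(len(zlib.compress(raw)), len(zlib.compress(truncated)))
```

Dropping 10 of 23 mantissa bits bounds the relative error near 2^-13 (about 10^-4), which for many stored quantities is far below detector resolution; choosing that precision per quantity, automatically, is the hard part the abstract addresses.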
TMPIFile: A New Parallel I/O Solution in ROOT
Communication among processes is generating considerable interest in the scientific computing community due to the increasing use of distributed memory systems. In the field of high energy physics (HEP), however, little research has addressed this topic. More precisely, in ROOT I/O, the de facto standard for data persistence in HEP applications, no such feature is provided. In order to perform efficient and robust cross-node communication, we introduce the TMPIFile functionality into ROOT, where the Message Passing Interface (MPI) is used to pass data across the entire distributed system. In the case of ATLAS workflows, instead of writing to file, the compressed data for each node is now stored in the sender side of TMPIFile. After a certain amount of data is collected, the TMPIFile sender can automatically package the data into a memory buffer and send it via MPI. On the other end of the communication, a collector receives buffers from multiple senders, merges them into one file, and finally writes the file to disk. Multiple collectors can be created to avoid I/O contention. Test results will be shown from runs at NERSC and ALCF.
ROOT - An Object-Oriented Data Analysis Framework. root-project/root: v6.10/04
Patch release of the v6-10 series. See https://root.cern
