117 research outputs found
ROOT Status and Future Developments
In this talk we will review the major additions and improvements made to the
ROOT system in the last 18 months and present our plans for future
developments. The additons and improvements range from modifications to the I/O
sub-system to allow users to save and restore objects of classes that have not
been instrumented by special ROOT macros, to the addition of a geometry package
designed for building, browsing, tracking and visualizing detector geometries.
Other improvements include enhancements to the quick analysis sub-system
(TTree::Draw()), the addition of classes that allow inter-file object
references (TRef, TRefArray), better support for templated and STL classes,
amelioration of the Automatic Script Compiler and the incorporation of new
fitting and mathematical tools. Efforts have also been made to increase the
modularity of the ROOT system with the introduction of more abstract interfaces
and the development of a plug-in manager. In the near future, we intend to
continue the development of PROOF and its interfacing with GRID environments.
We plan on providing an interface between Geant3, Geant4 and Fluka and the new
geometry package. The ROOT GUI classes will finally be available on Windows and
we plan to release a GUI inspector and builder. In the last year, ROOT has
drawn the endorsement of additional experiments and institutions. It is now
officially supported by CERN and used as key I/O component by the LCG project.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, Ca, USA, March 2003, 5 pages, MSWord, pSN MOJT00
Increasing Parallelism in the ROOT I/O Subsystem
When processing large amounts of data, the rate at which reading and writing
can take place is a critical factor. High energy physics data processing
relying on ROOT is no exception. The recent parallelisation of LHC experiments'
software frameworks and the analysis of the ever increasing amount of collision
data collected by experiments further emphasized this issue underlying the need
of increasing the implicit parallelism expressed within the ROOT I/O. In this
contribution we highlight the improvements of the ROOT I/O subsystem which
targeted a satisfactory scaling behaviour in a multithreaded context. The
effect of parallelism on the individual steps which are chained by ROOT to read
and write data, namely (de)compression, (de)serialisation, access to storage
backend, are discussed. Performance measurements are discussed through real
life examples coming from CMS production workflows on traditional server
platforms and highly parallel architectures such as Intel Xeon Phi
Bitmap indices for fast end-user physics analysis in root.
Most physics analysis jobs involve multiple selection steps on the input data. These selection steps are called cuts or queries. A common strategy to implement these queries is to read all input data from files and then process the queries in memory. In many applications the number of variables used to define these queries is a relative small portion of the overall data set therefore reading all variables into memory takes unnecessarily long time. In this paper we describe an integration effort that can significantly reduce this unnecessary reading by using an efficient compressed bitmap index technology. The primary advantage of this index is that it can process arbitrary combinations of queries very efficiently, while most other indexing technologies suffer from the "curse of dimensionality" as the number of queries increases. By integrating this index technology with the ROOT analysis framework, the end-users can benefit from the added efficiency without having to modify their analysis programs. Our performance results show that for multi-dimensional queries, bitmap indices outperform the traditional analysis method up to a factor of 10
Boosting RDataFrame performance with transparent bulk event processing
RDataFrame is ROOT’s high-level interface for Python and C++ data analysis. Since it first became available, RDataFrame adoption has grown steadily and it is now poised to be a major component of analysis software pipelines for LHC Run 3 and beyond. Thanks to its design inspired by declarative programming principles, RDataFrame enables the development of highperformance, highly parallel analyses without requiring expert knowledge of multi-threading and I/O: user logic is expressed in terms of self-contained, small computation kernels tied together by a high-level API. This design completely decouples analysis logic from its actual execution, and opens several interesting avenues for workflow optimization. In particular, in this work we explore the benefits of moving internal data processing from an event-by-event to a bulkby-bulk loop. This refactoring dramatically reduces the framework’s runtime overheads; in collaboration with the I/O layer it improves data access patterns; it exposes information that optimizing compilers might use to auto-vectorize the invocation of user-defined computations; finally, while existing user-facing interfaces remain unaffected, it becomes possible to additionally offer interfaces that explicitly expose bulks of events, useful e.g. for the injection of GPU kernels into the analysis workflow. In order to inform similar future R&D, design challenges will be presented, as well as an investigation of the relevant timememory trade-off backed by novel performance benchmarks
Software Challenges For HL-LHC Data Analysis
The high energy physics community is discussing where investment is needed to
prepare software for the HL-LHC and its unprecedented challenges. The ROOT
project is one of the central software players in high energy physics since
decades. From its experience and expectations, the ROOT team has distilled a
comprehensive set of areas that should see research and development in the
context of data analysis software, for making best use of HL-LHC's physics
potential. This work shows what these areas could be, why the ROOT team
believes investing in them is needed, which gains are expected, and where
related work is ongoing. It can serve as an indication for future research
proposals and cooperations
ROOT - A C++ Framework for Petabyte Data Storage, Statistical Analysis and Visualization
ROOT is an object-oriented C++ framework conceived in the high-energy physics
(HEP) community, designed for storing and analyzing petabytes of data in an
efficient way. Any instance of a C++ class can be stored into a ROOT file in a
machine-independent compressed binary format. In ROOT the TTree object
container is optimized for statistical data analysis over very large data sets
by using vertical data storage techniques. These containers can span a large
number of files on local disks, the web, or a number of different shared file
systems. In order to analyze this data, the user can chose out of a wide set of
mathematical and statistical functions, including linear algebra classes,
numerical algorithms such as integration and minimization, and various methods
for performing regression analysis (fitting). In particular, ROOT offers
packages for complex data modeling and fitting, as well as multivariate
classification based on machine learning techniques. A central piece in these
analysis tools are the histogram classes which provide binning of one- and
multi-dimensional data. Results can be saved in high-quality graphical formats
like Postscript and PDF or in bitmap formats like JPG or GIF. The result can
also be stored into ROOT macros that allow a full recreation and rework of the
graphics. Users typically create their analysis macros step by step, making use
of the interactive C++ interpreter CINT, while running over small data samples.
Once the development is finished, they can run these macros at full compiled
speed over large data sets, using on-the-fly compilation, or by creating a
stand-alone batch program. Finally, if processing farms are available, the user
can reduce the execution time of intrinsically parallel tasks - e.g. data
mining in HEP - by using PROOF, which will take care of optimally distributing
the work over the available resources in a transparent way
Contract Aware Components, 10 years after
The notion of contract aware components has been published roughly ten years
ago and is now becoming mainstream in several fields where the usage of
software components is seen as critical. The goal of this paper is to survey
domains such as Embedded Systems or Service Oriented Architecture where the
notion of contract aware components has been influential. For each of these
domains we briefly describe what has been done with this idea and we discuss
the remaining challenges.Comment: In Proceedings WCSI 2010, arXiv:1010.233
ROOT for the HL-LHC: data format
This document discusses the state, roadmap, and risks of the foundational
components of ROOT with respect to the experiments at the HL-LHC (Run 4 and
beyond). As foundational components, the document considers in particular the
ROOT input/output (I/O) subsystem. The current HEP I/O is based on the TFile
container file format and the TTree binary event data format. The work going
into the new RNTuple event data format aims at superseding TTree, to make
RNTuple the production ROOT event data I/O that meets the requirements of Run 4
and beyond
ROOT’s RNTuple I/O Subsystem: The Path to Production
The RNTuple I/O subsystem is ROOT’s future event data file format and access API. It is driven by the expected data volume increase at upcoming HEP experiments, e.g. at the HL-LHC, and recent opportunities in the storage hardware and software landscape such as NVMe drives and distributed object stores. RNTuple is a redesign of the TTree binary format and API and has shown to deliver substantially faster data throughput and better data compression both compared to TTree and to industry standard formats. In order to let HENP computing workflows benefit from RNTuple’s superior performance, however, the I/O stack needs to connect efficiently to the rest of the ecosystem, from grid storage to (distributed) analysis frameworks to (multithreaded) experiment frameworks for reconstruction and ntuple derivation. With the RNTuple binary format soon arriving at its first production release, we present RNTuple’s feature set, integration efforts, and its performance impact on the time-to-solution. We show the latest performance figures of RDataFrame analysis code of realistic complexity, comparing RNTuple and TTree as data sources. We discuss RNTuple’s approach to functionality critical to the HENP I/O (such as multithreaded writes, fast data merging, schema evolution) and we provide an outlook on the road to its use in production
I/O performance studies of analysis workloads on production and dedicated resources at CERN
The recent evolutions of the analysis frameworks and physics data formats of the LHC experiments provide the opportunity of using central analysis facilities with a strong focus on interactivity and short turnaround times, to complement the more common distributed analysis on the Grid. In order to plan for such facilities, it is essential to know in detail the performance of the combination of a given analysis framework, of a specific analysis and of the installed computing and storage resources. This contribution describes performance studies performed at CERN, using the EOS disk-based storage, either directly or through an XCache instance, from both batch resources and highperformance compute nodes which could be used to build an analysis facility. A variety of benchmarks, both synthetic and based on real-world physics analyses and their corresponding input datasets, are utilized. In particular, the RNTuple format from the ROOT project is put to the test and compared to the latest version of the TTree format, and the impact of caches is assessed. In addition, we assessed the difference in performance between the use of storage system specific protocols, like XRootd, and FUSE. The results of this study are intended to be a valuable input in the design of analysis facilities, at CERN and elsewhere
- …