12 research outputs found

    ROOT for the HL-LHC: data format

    This document discusses the state, roadmap, and risks of the foundational components of ROOT with respect to the experiments at the HL-LHC (Run 4 and beyond). As foundational components, the document considers in particular the ROOT input/output (I/O) subsystem. The current HEP I/O is based on the TFile container file format and the TTree binary event data format. The work going into the new RNTuple event data format aims at superseding TTree, making RNTuple the production ROOT event data I/O that meets the requirements of Run 4 and beyond.

    25th International Conference on Computing in High Energy & Nuclear Physics

    High energy physics has a constant demand for random number generators (RNGs) with high statistical quality. In this paper, we present ROOT's implementation of the RANLUX++ generator. We discuss the choice of relying only on standard C++ for portability reasons. Building on an initial implementation, we describe a set of optimizations to increase generator speed. This allows the implementation to reach performance very close to that of the original assembler version. We test our implementation on an Apple M1 and Nvidia GPUs to demonstrate the advantages of portable code.

    A Portable Implementation of RANLUX++

    High energy physics has a constant demand for random number generators (RNGs) with high statistical quality. In this paper, we present ROOT’s implementation of the RANLUX++ generator. We discuss the choice of relying only on standard C++ for portability reasons. Building on an initial implementation, we describe a set of optimizations to increase generator speed. This allows the implementation to reach performance very close to that of the original assembler version. We test our implementation on an Apple M1 and Nvidia GPUs to demonstrate the advantages of portable code.

    Measurement data for paper "Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices"

    Accelerator devices are increasingly used to build large supercomputers, and current installations usually include more than one accelerator per system node. To keep all devices busy, kernels have to be executed concurrently, which can be achieved via asynchronous kernel launches. Our work compares the performance of an implementation of the Conjugate Gradient method with CUDA, OpenCL, and OpenACC on NVIDIA Pascal GPUs. Furthermore, it takes a look at Intel Xeon Phi coprocessors when programmed with OpenCL and OpenMP. In doing so, it tries to answer the question of whether the higher abstraction level of directive-based models is inferior to lower-level paradigms in terms of performance. This archive contains the modifications to liboffload, all binaries and libraries including their respective commit ids, and the raw data of our measurements.

    Full Simulation of CMS for Run-3 and Phase-2

    In this contribution we report the status of the CMS Geant4 simulation and the prospects for Run-3 and Phase-2. Firstly, we report on our experience during the start of Run-3 with Geant4 10.7.2, the common software package DD4hep for geometry description, and the VecGeom run-time geometry library. In addition, the FTFP_BERT_EMM Physics List and the CMS configuration for tracking in magnetic field have been utilized. This is the first time this combination of components has been used for the Grid mass production of Monte Carlo. Further simulation improvements targeting Run-3 are under development, such as the switch to the new Geant4 11.1 in production, which provides several features important for the optimization of simulation, for example the new transportation process with built-in multiple scattering, the neutron general process, the custom tracking manager, the G4HepEm sub-library, and others. We will present an evaluation of various options, validation results, and the final choice of simulation configuration for 2023 production and beyond. The performance of the CMS full simulation for Run-2 and Run-3 will also be discussed. The CMS development plan for the Phase-2 Geant4-based simulation is very ambitious, and it includes a new geometry description, physics, and simulation configurations. The progress on new detector descriptions and full simulation will be presented, as well as the research and development in progress to reduce compute capacity needs. Finally, the status of the R&D for using the Celeritas and AdePT GPU prototypes in CMSSW will be presented.

    Approaches for Task Affinity in OpenMP

    OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extended tasking to increase functionality and to support optimizations, for instance with the taskloop construct. However, task scheduling remains opaque, which leads to inconsistent performance on NUMA architectures. We assess design issues for task affinity and explore several approaches to enable it. We evaluate these proposals with implementations in the Nanos++ and LLVM OpenMP runtimes that improve performance by up to 40% and significantly reduce execution time variation. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. This work has been developed with the support of the grant SEV-2011-00067 of the Severo Ochoa Program, awarded by the Spanish Government, by the Spanish Ministry of Science and Innovation (TIN2015-65316-P, Computacion de Altas Prestaciones VII) and by the Intel-BSC Exascale Lab collaboration project. Some of the experiments were performed with computing resources granted by JARA-HPC from RWTH Aachen University under project jara0001. Parts of this work were funded by the German Federal Ministry of Research and Education (BMBF) under grant numbers 01IH13008A (ELP). Peer reviewed.