12 research outputs found

ROOT for the HL-LHC: data format

Author: Bellenot Bertrand
Blomer Jakob
Canal Philippe
Couet Olivier
Gomez Javier Lopez
Gruber Bernhard Manfred
Guiraud Enrico
Hahnfeld Jonas
Linev Sergey
Moneta Lorenzo
Naumann Axel
Padulano Vincenzo Eduardo
Rembser Jonas
Tadel Alja Mrak
Tadel Matevz
Tejedor Enric
Vassilev Vassil
Publication date: 09/04/2022

This document discusses the state, roadmap, and risks of the foundational components of ROOT with respect to the experiments at the HL-LHC (Run 4 and beyond). As foundational components, the document considers in particular the ROOT input/output (I/O) subsystem. The current HEP I/O is based on the TFile container file format and the TTree binary event data format. The work going into the new RNTuple event data format aims at superseding TTree, to make RNTuple the production ROOT event data I/O that meets the requirements of Run 4 and beyond.

arXiv.org e-Print Archive

CERN Document Server

A workflow for analyzing the effect of compiler optimizations on HPC applications

Author: Hahnfeld Jonas
Publication date: 01/01/2020

Publikationsserver der RWTH Aachen University

25th International Conference on Computing in High Energy & Nuclear Physics

Author: Hahnfeld Jonas
Publication date: 01/01/2021

High energy physics has a constant demand for random number generators (RNGs) with high statistical quality. In this paper, we present ROOT's implementation of the RANLUX++ generator. We discuss the choice of relying only on standard C++ for portability reasons. Building on an initial implementation, we describe a set of optimizations to increase generator speed. This makes it possible to reach performance very close to that of the original assembler version. We test our implementation on an Apple M1 and Nvidia GPUs to demonstrate the advantages of portable code.

CERN Document Server

A Portable Implementation of RANLUX++

Author: Jonas Hahnfeld
Lorenzo Moneta
Publication venue: EDP Sciences
Publication date: 01/01/2021

High energy physics has a constant demand for random number generators (RNGs) with high statistical quality. In this paper, we present ROOT’s implementation of the RANLUX++ generator. We discuss the choice of relying only on standard C++ for portability reasons. Building on an initial implementation, we describe a set of optimizations to increase generator speed. This makes it possible to reach performance very close to that of the original assembler version. We test our implementation on an Apple M1 and Nvidia GPUs to demonstrate the advantages of portable code.

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

CERN Document Server

Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices

Author: Hahnfeld Jonas
Müller Matthias S.
Pflug Hans Joachim
Price James
Terboven Christian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

A Pattern for Overlapping Communication and Computation with OpenMP Target Directives

Author: Cramer Tim
Hahnfeld Jonas
Klemm Michael
Müller Matthias S.
Terboven Christian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Crossref

Publikationsserver der RWTH Aachen University

OpenMP Tools Interface: Synchronization Information for Data Race Detection

Author: Ahn Dong H.
Hahnfeld Jonas
Müller Matthias S.
Protze Joachim
Schulz Martin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Crossref

Publikationsserver der RWTH Aachen University

Measurement data for paper "Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices"

Author: Hahnfeld Jonas
Müller Matthias S.
Pflug Hans Joachim
Price James
Terboven Christian
Publication venue: RWTH Aachen
Publication date: 01/01/2018

Accelerator devices are increasingly used to build large supercomputers, and current installations usually include more than one accelerator per system node. To keep all devices busy, kernels have to be executed concurrently, which can be achieved via asynchronous kernel launches. Our work compares the performance of an implementation of the Conjugate Gradient method with CUDA, OpenCL, and OpenACC on NVIDIA Pascal GPUs. Furthermore, it takes a look at Intel Xeon Phi coprocessors when programmed with OpenCL and OpenMP. In doing so, it tries to answer the question of whether the higher abstraction level of directive-based models is inferior to lower-level paradigms in terms of performance. This archive contains the modifications to liboffload, all binaries and libraries including their respective commit ids, and the raw data of our measurements.

Publikationsserver der RWTH Aachen University

Full Simulation of CMS for Run-3 and Phase-2

Author: Banerjee Sunanda
Ivantchenko Vladimir
Hahnfeld Jonas
Krammer Natascha
Muzaffar Shahzad
Pedro Kevin Jerome
Piparo Danilo
Srimanobhas Norraphat
Publication date: 08/09/2023

In this contribution we report the status of the CMS Geant4 simulation and the prospects for Run-3 and Phase-2. First, we report on our experience during the start of Run-3 with Geant4 10.7.2, the common software package DD4hep for geometry description, and the VecGeom run-time geometry library. In addition, the FTFP_BERT_EMM physics list and the CMS configuration for tracking in a magnetic field have been used. This is the first time this combination of components has been used for Grid mass production of Monte Carlo. Further simulation improvements targeting Run-3 are under development, such as the switch to the new Geant4 11.1 in production, which provides several features important for optimizing the simulation, for example the new transportation process with built-in multiple scattering, the neutron general process, the custom tracking manager, the G4HepEm sub-library, and others. We will present an evaluation of various options, validation results, and the final choice of simulation configuration for 2023 production and beyond. The performance of the CMS full simulation for Run-2 and Run-3 will also be discussed. The CMS development plan for the Phase-2 Geant4-based simulation is very ambitious, and it includes a new geometry description, physics, and simulation configurations. The progress on new detector descriptions and full simulation will be presented, as well as the research and development in progress to reduce compute capacity needs. Finally, the status of the R&D for using the Celeritas and AdePT GPU prototypes in CMSSW will be presented.

CERN Document Server

Approaches for Task Affinity in OpenMP

Author: de Supinski Bronis R.
Duran Alejandro
Hahnfeld Jonas
Klemm Michael
Mateo Sergi
Olivier Stephen L.
Terboven Christian
Teruel Xavier
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

OpenMP tasking supports parallelization of irregular algorithms. Recent OpenMP specifications extended tasking to increase functionality and to support optimizations, for instance with the taskloop construct. However, task scheduling remains opaque, which leads to inconsistent performance on NUMA architectures. We assess design issues for task affinity and explore several approaches to enable it. We evaluate these proposals with implementations in the Nanos++ and LLVM OpenMP runtimes that improve performance by up to 40% and significantly reduce execution time variation.
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. This work has been developed with the support of the grant SEV-2011-00067 of the Severo Ochoa Program, awarded by the Spanish Government, by the Spanish Ministry of Science and Innovation (TIN2015-65316-P, Computacion de Altas Prestaciones VII) and by the Intel-BSC Exascale Lab collaboration project. Some of the experiments were performed with computing resources granted by JARA-HPC from RWTH Aachen University under project jara0001. Parts of this work were funded by the German Federal Ministry of Research and Education (BMBF) under grant number 01IH13008A (ELP). Peer reviewed.

UPCommons. Portal del coneixement obert de la UPC

Publikationsserver der RWTH Aachen University