
    Event processing time prediction at the CMS experiment of the Large Hadron Collider

    The physics event reconstruction is one of the biggest challenges for the computing of the LHC experiments. Among the different tasks that the computing systems of the CMS experiment perform, reconstruction takes most of the available CPU resources. The reconstruction time of a single collision varies with event complexity. Measurements were made to determine this correlation quantitatively and to provide a means of predicting it from the data-taking conditions of the input samples. Currently the data processing system splits tasks into groups with the same number of collisions and does not account for variations in the processing time. These variations can be large and can considerably increase the time it takes for CMS workflows to finish. The goal of this study was to use estimates of processing time to split workflows into jobs more efficiently. By considering the CPU time needed for each job, the spread of the job-length distribution within a workflow is reduced.
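    As a rough illustration of the approach, the following Python sketch groups luminosity sections into jobs of similar predicted CPU cost instead of a fixed number of collisions; the linear pileup-based time model, function names, and numbers are illustrative assumptions, not the CMS implementation.

        # Hypothetical sketch: split a workflow into jobs by estimated CPU time
        # rather than by a fixed event count. The time model is an assumption.
        def estimate_event_time(pileup, base=2.0, slope=0.5):
            """Estimated reconstruction time (s) per event at a given pileup."""
            return base + slope * pileup

        def split_into_jobs(lumi_sections, target_job_seconds=8 * 3600):
            """Group (lumi_id, n_events, pileup) blocks into jobs of similar CPU cost."""
            jobs, current, current_cost = [], [], 0.0
            for lumi_id, n_events, pileup in lumi_sections:
                cost = n_events * estimate_event_time(pileup)
                if current and current_cost + cost > target_job_seconds:
                    jobs.append(current)
                    current, current_cost = [], 0.0
                current.append(lumi_id)
                current_cost += cost
            if current:
                jobs.append(current)
            return jobs

        # Lumi sections are grouped by predicted CPU cost, not by a fixed count,
        # which narrows the spread of the job-length distribution.
        print(split_into_jobs([(1, 500, 20), (2, 500, 60), (3, 500, 35)]))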

    Big Data in HEP: A comprehensive use case study

    Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems, collectively called Big Data technologies, have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies take different approaches, promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication-quality physics plots. We will discuss the advantages and disadvantages of each approach and give an outlook on further studies needed. Comment: Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016)
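    To make the comparison concrete, a minimal PySpark sketch of the Big Data side of such an analysis could look as follows; the file path, column names, and selection cuts are placeholders, not the actual CMS dark-matter analysis.

        # Minimal sketch of a filter-and-transform analysis step in Apache Spark.
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("hep-spark-sketch").getOrCreate()

        # Events previously converted from the experiment format into a columnar layout.
        events = spark.read.parquet("events.parquet")  # placeholder path

        # Select events with large missing transverse energy and at least two jets,
        # then derive the quantity to histogram.
        selected = (events
                    .filter((F.col("met") > 200.0) & (F.col("n_jets") >= 2))
                    .withColumn("ht", F.col("jet1_pt") + F.col("jet2_pt")))

        # Bin on the cluster; only the small histogram is collected back to the driver.
        hist = (selected
                .withColumn("bin", F.floor(F.col("ht") / 50.0))
                .groupBy("bin").count()
                .orderBy("bin"))
        hist.show()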

    A Ceph S3 Object Data Store for HEP

    We present a novel data format design that obviates the need for data tiers by storing individual event data products in column objects. The objects are stored and retrieved through Ceph S3 technology, with a layout designed to minimize metadata volume and maximize data processing parallelism. Performance benchmarks of data storage and retrieval are presented. Comment: CHEP2023 proceedings, to be published in EPJ Web of Conferences
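    As a hedged sketch of the general idea, the snippet below stores and fetches one column object per event data product through an S3 interface (here via boto3); the endpoint, bucket name, key layout, and serialization are assumptions for illustration and do not reproduce the paper's actual layout or metadata scheme.

        import io
        import boto3
        import numpy as np

        # Placeholder Ceph RADOS Gateway endpoint and bucket.
        s3 = boto3.client("s3", endpoint_url="https://ceph-gateway.example.org")
        BUCKET = "event-columns"

        def put_column(dataset, column, values):
            """Store one column object; one key per (dataset, column) keeps metadata small."""
            buf = io.BytesIO()
            np.save(buf, values)
            s3.put_object(Bucket=BUCKET, Key=f"{dataset}/{column}.npy", Body=buf.getvalue())

        def get_column(dataset, column):
            """Fetch a single column; readers retrieve only the products they need, in parallel."""
            obj = s3.get_object(Bucket=BUCKET, Key=f"{dataset}/{column}.npy")
            return np.load(io.BytesIO(obj["Body"].read()))

        put_column("run2023/dataA", "muon_pt", np.array([31.2, 54.7, 12.9]))
        print(get_column("run2023/dataA", "muon_pt"))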

    High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)

    Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. In order to help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers -- 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document, and are presented along with introductory material. Comment: 72 pages

    Using Big Data Technologies for HEP Analysis

    The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results timely and efficiently. Recently, new technologies and new approaches have been developed in industry to address the need to retrieve information as quickly as possible when analyzing PB and EB datasets. Providing scientists with these modern computing tools will lead to rethinking the principles of data analysis in HEP, making the overall scientific process faster and smoother. In this paper, we present the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of the new tools both quantitatively, by measuring their performance, and qualitatively, by focusing on the user experience. The first goal is achieved by developing a data reduction facility: working together with CERN Openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data collected by the CMS experiment to 1 TB of data, in a format suitable for physics analysis, within 5 hours. The second goal is achieved by implementing multiple physics use cases in Apache Spark, using as input preprocessed datasets derived from official CMS data and simulation. By performing different end analyses, up to the publication plots, on different hardware, the feasibility, usability, and portability are compared to those of a traditional ROOT-based workflow.
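    The data reduction step can be illustrated with a short Spark sketch: read a large columnar input, keep only the columns and events an analysis needs, and write a much smaller output; the paths, column names, and cuts are assumptions, not the actual CERN Openlab/Intel setup.

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("data-reduction-sketch").getOrCreate()

        full = spark.read.parquet("hdfs:///cms/open-data/")  # placeholder for the PB-scale input

        reduced = (full
                   .select("run", "event", "met", "jet_pt", "jet_eta")  # column pruning
                   .filter(F.col("met") > 150.0))                       # event skimming

        # The reduced sample is written in a columnar format ready for end analysis.
        reduced.write.mode("overwrite").parquet("hdfs:///cms/reduced/skim/")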

    The Future of High Energy Physics Software and Computing

    Software and Computing (S&C) are essential to all High Energy Physics (HEP) experiments and many theoretical studies. The size and complexity of S&C are now commensurate with those of experimental instruments, playing a critical role in experimental design, data acquisition/instrumental control, reconstruction, and analysis. Furthermore, S&C often plays a leading role in driving the precision of theoretical calculations and simulations. Within this central role in HEP, S&C has been immensely successful over the last decade. This report looks forward to the next decade and beyond, in the context of the 2021 Particle Physics Community Planning Exercise ("Snowmass") organized by the Division of Particles and Fields (DPF) of the American Physical Society. Comment: Computational Frontier Report Contribution to Snowmass 2021; 41 pages, 1 figure. v2: missing ref and added missing topical group conveners. v3: fixed typo

    CMS distributed computing workflow experience

    The vast majority of the CMS computing capacity, which is organized in a tiered hierarchy, is located away from CERN. The 7 Tier-1 sites archive the LHC proton-proton collision data that is initially processed at CERN. These sites provide access to all recorded and simulated data for the Tier-2 sites via wide-area network (WAN) transfers. All central data processing workflows are executed at the Tier-1 level; these include re-reconstruction and skimming workflows of collision data as well as reprocessing of simulated data to adapt to changing detector conditions. This paper describes the operation of the CMS processing infrastructure at the Tier-1 level. The Tier-1 workflows are described in detail, and the operational optimization of resource usage is described. In particular, the variation of the different workflows during the data-taking period of 2010, their efficiencies and latencies, and their impact on the delivery of physics results are discussed, and lessons are drawn from this experience. The simulation of proton-proton collisions for the CMS experiment is primarily carried out at the second tier of the CMS computing infrastructure. Half of the Tier-2 sites of CMS are reserved for central Monte Carlo (MC) production while the other half is available for user analysis. This paper summarizes the large throughput of the MC production operation during the data-taking period of 2010 and discusses the latencies and efficiencies of the various types of MC production workflows. We present the operational procedures used to optimize the usage of available resources, and we discuss the operational model of CMS for including opportunistic resources, such as the larger Tier-3 sites, in the central production operation.

    Search for supersymmetry in events with b-quark jets and missing transverse energy in pp collisions at 7 TeV

    Results are presented from a search for physics beyond the standard model based on events with large missing transverse energy, at least three jets, and at least one, two, or three b-quark jets. The study is performed using a sample of proton-proton collision data collected at sqrt(s) = 7 TeV with the CMS detector at the LHC in 2011. The integrated luminosity of the sample is 4.98 inverse femtobarns. The observed number of events is found to be consistent with the standard model expectation, which is evaluated using control samples in the data. The results are used to constrain cross sections for the production of supersymmetric particles decaying to b-quark-enriched final states in the context of simplified model spectra. Comment: Submitted to Physical Review

    A Roadmap for HEP Software and Computing R&D for the 2020s

    Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade. Peer reviewed