Validation of Software Releases for CMS
The CMS software stack currently consists of more than 2 million lines of code developed by over 250 authors, with a new version being released every week. CMS has set up a validation process for quality assurance which enables the developers to compare the performance of a release to previous releases and references. The validation process provides the developers with reconstructed datasets of real data and MC samples. The samples span the whole range of detector effects and important physics signatures to benchmark the performance of the software. They are used to investigate interdependency effects of all CMS software components and to find and fix bugs. The release validation process described here is an integral part of CMS software development and contributes significantly to ensuring stable production and analysis. It represents a sizable contribution to the overall MC production of CMS. Its success emphasizes the importance of a streamlined release validation process for projects with a large code base and a significant number of developers, and it can serve as a model for future projects.
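The abstract gives no implementation details; purely as an illustration of the kind of automated release-to-reference comparison such a validation campaign relies on, here is a minimal Python sketch that tests whether a distribution produced by a candidate release is statistically compatible with the reference. The helper `load_pt_spectrum` and the release labels are hypothetical, not CMS code.

```python
# Minimal sketch (not CMS code): flag a distribution that drifted between releases.
# `load_pt_spectrum` is a hypothetical helper returning the values of one validation
# quantity obtained by reconstructing the same sample with a given release.
import numpy as np
from scipy import stats

def load_pt_spectrum(release: str) -> np.ndarray:
    # Placeholder: in reality this would read histograms from the validation output files.
    rng = np.random.default_rng(abs(hash(release)) % 2**32)
    return rng.exponential(scale=30.0, size=50_000)

def compare_releases(candidate: str, reference: str, p_threshold: float = 0.01) -> bool:
    """Return True if the candidate release is statistically compatible with the reference."""
    cand, ref = load_pt_spectrum(candidate), load_pt_spectrum(reference)
    statistic, p_value = stats.ks_2samp(cand, ref)
    print(f"{candidate} vs {reference}: KS statistic = {statistic:.4f}, p = {p_value:.3g}")
    return p_value > p_threshold

if __name__ == "__main__":
    ok = compare_releases("CMSSW_candidate", "CMSSW_reference")
    print("compatible" if ok else "flag for expert review")
```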
Event processing time prediction at the CMS experiment of the Large Hadron Collider
Physics event reconstruction is one of the biggest challenges for the computing of the LHC experiments. Among the different tasks that the computing systems of the CMS experiment perform, reconstruction takes most of the available CPU resources. The reconstruction time of single collisions varies according to event complexity. Measurements were made to quantify this correlation and to build a means of predicting it from the data-taking conditions of the input samples. Currently the data processing system splits tasks into groups with the same number of collisions and does not account for variations in the processing time. These variations can be large and can lead to a considerable increase in the time it takes for CMS workflows to finish. The goal of this study was to use estimates of processing time to split workflows into jobs more efficiently. By considering the CPU time needed for each job, the spread of the job-length distribution in a workflow is reduced.
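The splitting strategy described above lends itself to a short illustration: instead of grouping a fixed number of collisions per job, group work until a predicted CPU-time budget is reached. The sketch below is a minimal greedy version of that idea; the predictor and the data structures are assumptions, not the CMS production system.

```python
# Minimal sketch (not the CMS workflow system): group luminosity sections into jobs
# of roughly equal predicted CPU time instead of an equal number of collisions.
from typing import Callable, Dict, Iterable, List

def split_by_cpu_time(
    lumisections: Iterable[Dict],
    predict_seconds: Callable[[Dict], float],
    target_job_seconds: float = 8 * 3600,
) -> List[List[Dict]]:
    jobs: List[List[Dict]] = [[]]
    used = 0.0
    for ls in lumisections:
        cost = predict_seconds(ls)
        if jobs[-1] and used + cost > target_job_seconds:
            jobs.append([])   # start a new job once the time budget would be exceeded
            used = 0.0
        jobs[-1].append(ls)
        used += cost
    return jobs

# Hypothetical predictor: per-event time grows with pile-up, a stand-in for the
# data-taking conditions mentioned in the abstract.
predict = lambda ls: ls["n_events"] * (0.5 + 0.08 * ls["pileup"])
work = [{"n_events": 500, "pileup": pu} for pu in (20, 35, 50, 60, 45, 30)]
for i, job in enumerate(split_by_cpu_time(work, predict, target_job_seconds=20_000)):
    print(f"job {i}: {sum(predict(ls) for ls in job):.0f} s predicted")
```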
Big Data in HEP: A comprehensive use case study
Experimental Particle Physics has been at the forefront of analyzing the
world's largest datasets for decades. The HEP community was the first to develop
suitable software and computing tools for this task. In recent times, new
toolkits and systems collectively called Big Data technologies have emerged to
support the analysis of Petabyte and Exabyte datasets in industry. While the
principles of data analysis in HEP have not changed (filtering and transforming
experiment-specific data formats), these new technologies use different
approaches and promise a fresh look at analysis of very large datasets and
could potentially reduce the time-to-physics with increased interactivity. In
this talk, we present an active LHC Run 2 analysis, searching for dark matter
with the CMS detector, as a testbed for Big Data technologies. We directly
compare the traditional NTuple-based analysis with an equivalent analysis using
Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the
analysis with the official experiment data formats and produce publication
physics plots. We will discuss advantages and disadvantages of each approach
and give an outlook on further studies needed.
Comment: Proceedings for the 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2016).
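The abstract does not reproduce the analysis code; as a rough sketch of what the Spark side of such a comparison looks like, the snippet below runs a filter-and-aggregate pass with PySpark DataFrame operations. The file path, column names and cuts are invented for illustration and are not the actual dark-matter selection.

```python
# Rough sketch (assumed schema, path and cuts, not the actual CMS analysis):
# an event selection and a coarse histogram expressed as Spark DataFrame operations.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("toy-hep-analysis").getOrCreate()

# Hypothetical flat ntuple already converted to Parquet, one row per event.
events = spark.read.parquet("hdfs:///user/analysis/events.parquet")

selected = (
    events
    .filter((F.col("met_pt") > 200.0) & (F.col("n_jets") >= 1))   # event selection
    .withColumn("ht", F.col("jet1_pt") + F.col("jet2_pt"))        # derived quantity
)

# Aggregate into a coarse histogram of the missing transverse energy.
hist = (
    selected
    .withColumn("met_bin", (F.col("met_pt") / 50).cast("int") * 50)
    .groupBy("met_bin").count()
    .orderBy("met_bin")
)
hist.show()
```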
A Ceph S3 Object Data Store for HEP
We present a novel data format design that obviates the need for data tiers
by storing individual event data products in column objects. The objects are
stored and retrieved through Ceph S3 technology, with a layout designed to
minimize metadata volume and maximize data processing parallelism. Performance
benchmarks of data storage and retrieval are presented.
Comment: CHEP2023 proceedings, to be published in EPJ Web of Conferences.
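The paper's exact layout is not given in the abstract; the sketch below only illustrates the general pattern of keeping each column chunk as its own object behind an S3-compatible endpoint such as Ceph RadosGW. The endpoint, bucket name, key scheme and NumPy serialization are all assumptions, not the design described in the paper.

```python
# Illustrative only: per-column event data stored as individual objects behind an
# S3-compatible endpoint. Endpoint, bucket, key layout and serialization are assumed.
import io
import numpy as np
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ceph-rgw.example.org",   # hypothetical Ceph RadosGW endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
BUCKET = "event-columns"

def put_column(dataset: str, column: str, chunk: int, values: np.ndarray) -> None:
    """Store one column chunk as a single object; the key encodes dataset/column/chunk."""
    buf = io.BytesIO()
    np.save(buf, values)
    s3.put_object(Bucket=BUCKET, Key=f"{dataset}/{column}/{chunk:06d}.npy", Body=buf.getvalue())

def get_column(dataset: str, column: str, chunk: int) -> np.ndarray:
    """Fetch one column chunk; jobs read only the columns they actually need."""
    obj = s3.get_object(Bucket=BUCKET, Key=f"{dataset}/{column}/{chunk:06d}.npy")
    return np.load(io.BytesIO(obj["Body"].read()))

put_column("run2023/muons", "pt", 0, np.random.exponential(25.0, 10_000).astype("float32"))
print(get_column("run2023/muons", "pt", 0).mean())
```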
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
Historically, high energy physics computing has been performed on large
purpose-built computing systems. These began as single-site compute facilities,
but have evolved into the distributed computing grids used today. Recently,
there has been an exponential increase in the capacity and capability of
commercial clouds. Cloud resources are highly virtualized and intended to be
able to be flexibly deployed for a variety of computing tasks. There is a
growing interest among the cloud providers in demonstrating the capability to
perform large-scale scientific computing. In this paper, we discuss results
from the CMS experiment using the Fermilab HEPCloud facility, which utilized
both local Fermilab resources and virtual machines in the Amazon Web Services
Elastic Compute Cloud. We discuss the planning, technical challenges, and
lessons learned involved in performing physics workflows on a large-scale set
of virtualized resources. In addition, we will discuss the economics and
operational efficiencies when executing workflows both in the cloud and on
dedicated resources.
Comment: 15 pages, 9 figures.
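The abstract stays at the level of planning, economics and lessons learned; purely to make "virtual machines in the Amazon Web Services Elastic Compute Cloud" concrete, here is a boto3 sketch that requests a batch of spot-priced worker VMs. The AMI, instance type and bootstrap script are placeholders, and HEPCloud's actual provisioning is handled by the facility itself rather than by a script like this.

```python
# Purely illustrative (not HEPCloud code): request a batch of spot-priced worker VMs
# in EC2 with boto3. The AMI ID, instance type and user-data bootstrap are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

BOOTSTRAP = """#!/bin/bash
# Placeholder user data: configure the VM to join the experiment's batch pool on boot.
/opt/worker/join_pool.sh
"""

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",               # hypothetical worker image
    InstanceType="c4.2xlarge",
    MinCount=1,
    MaxCount=100,                                  # scale out elastically when demand peaks
    UserData=BOOTSTRAP,
    InstanceMarketOptions={"MarketType": "spot"},  # spot pricing to control cost
)
print("launched:", [i["InstanceId"] for i in response["Instances"]])
```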
High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)
Computing plays an essential role in all aspects of high energy physics. As
computational technology evolves rapidly in new directions, and data throughput
and volume continue to follow a steep trend-line, it is important for the HEP
community to develop an effective response to a series of expected challenges.
In order to help shape the desired response, the HEP Forum for Computational
Excellence (HEP-FCE) initiated a roadmap planning activity with two key
overlapping drivers -- 1) software effectiveness, and 2) infrastructure and
expertise advancement. The HEP-FCE formed three working groups, 1) Applications
Software, 2) Software Libraries and Tools, and 3) Systems (including systems
software), to provide an overview of the current status of HEP computing and to
present findings and opportunities for the desired HEP computational roadmap.
The final versions of the reports are combined in this document, and are
presented along with introductory material.
Comment: 72 pages.
The U.S. CMS HL-LHC R&D Strategic Plan
The HL-LHC run is anticipated to start at the end of this decade and will
pose a significant challenge for the scale of the HEP software and computing
infrastructure. The mission of the U.S. CMS Software & Computing Operations
Program is to develop and operate the software and computing resources
necessary to process CMS data expeditiously and to enable U.S. physicists to
fully participate in the physics of CMS. We have developed a strategic plan to
prioritize R&D efforts to reach this goal for the HL-LHC. This plan includes
four grand challenges: modernizing physics software and improving algorithms,
building infrastructure for exabyte-scale datasets, transforming the scientific
data analysis process and transitioning from R&D to operations. We are involved
in a variety of R&D projects that fall within these grand challenges. In this
talk, we will introduce our four grand challenges and outline the R&D program
of the U.S. CMS Software & Computing Operations Program.
Comment: CHEP2023 proceedings, to be published in EPJ Web of Conferences.
Using Big Data Technologies for HEP Analysis
The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially become a limiting factor in the capability to produce scientific results in a timely and efficient manner. Recently, new technologies and new approaches have been developed in industry to meet the need to retrieve information as quickly as possible when analyzing PB and EB datasets.
Providing the scientists with these modern computing tools will lead to
rethinking the principles of data analysis in HEP, making the overall
scientific process faster and smoother.
In this paper, we present the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of applying the new tools both quantitatively, by measuring performance, and qualitatively, by focusing on the user experience. The first goal is achieved by developing a data reduction facility: working together with CERN Openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data collected by the CMS experiment to 1 TB of data in a format suitable for physics analysis within 5 hours.
The second goal is achieved by implementing multiple physics use-cases in
Apache Spark using as input preprocessed datasets derived from official CMS
data and simulation. By performing different end-to-end analyses, up to the publication plots, on different hardware, the feasibility, usability and portability are compared to those of a traditional ROOT-based workflow.
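The data reduction facility described above is, at its core, a very large filter-and-project ("skim and slim") step; the sketch below shows what that step looks like when expressed with PySpark. The paths, trigger flag and column names are placeholders and do not correspond to the CERN Openlab/Intel setup.

```python
# Sketch of a skimming/slimming pass in Spark (placeholder paths, columns and cuts):
# keep only events passing a trigger flag and a handful of analysis-level columns.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("toy-data-reduction").getOrCreate()

full = spark.read.parquet("hdfs:///data/cms_open_data/")        # large input dataset

reduced = (
    full
    .filter(F.col("HLT_PFMET170"))                              # hypothetical trigger flag
    .filter(F.col("nMuon") >= 2)
    .select("run", "lumi", "event", "Muon_pt", "Muon_eta", "Muon_phi", "MET_pt")
)

# Write a much smaller, analysis-ready dataset back out for interactive work.
reduced.write.mode("overwrite").parquet("hdfs:///data/reduced/dimuon_skim/")
```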
Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics
High-energy physics (HEP) experiments have developed millions of lines of
code over decades that are optimized to run on traditional x86 CPU systems.
However, we are seeing a rapidly increasing fraction of floating point
computing power in leadership-class computing facilities and traditional data
centers coming from new accelerator architectures, such as GPUs. HEP
experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code for the increasingly dominant architectures found in
these computational accelerators. This task is made more challenging by the
architecture-specific languages and APIs promoted by manufacturers such as
NVIDIA, Intel and AMD. Producing multiple, architecture-specific
implementations is not a viable scenario, given the available person power and
code maintenance issues.
The Portable Parallelization Strategies team of the HEP Center for
Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP,
std::execution::parallel and alpaka as potential portability solutions that
promise to execute on multiple architectures from the same source code, using
representative use cases from major HEP experiments, including the DUNE
experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS
experiments of the Large Hadron Collider. This cross-cutting evaluation of
portability solutions using real applications will help inform and guide the
HEP community when choosing their software and hardware suites for the next
generation of experimental frameworks. We present the outcomes of our studies,
including performance metrics, porting challenges, API evaluations, and build
system integration.
Comment: 18 pages, 9 figures, 2 tables.
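The portability layers evaluated in the study are C++ technologies (Kokkos, SYCL, OpenMP, std::execution::parallel, alpaka); as a deliberately loose Python analogue of the underlying "single source, multiple backends" idea, the sketch below writes one kernel against a generic array module and dispatches it to NumPy on the CPU or, if available, CuPy on a GPU. This is an analogy for illustration, not one of the evaluated tools.

```python
# Loose analogue of "write the kernel once, run it on several backends":
# the kernel takes the array module (xp) as a parameter instead of hard-coding NumPy.
import numpy as np

def transverse_momentum(xp, px, py):
    """Backend-agnostic kernel: xp is whichever array module the caller selects."""
    return xp.sqrt(px * px + py * py)

# CPU backend.
px = np.random.normal(size=1_000_000)
py = np.random.normal(size=1_000_000)
print("CPU:", transverse_momentum(np, px, py)[:3])

# Optional GPU backend, assuming CuPy and a CUDA device are available.
try:
    import cupy as cp
    print("GPU:", transverse_momentum(cp, cp.asarray(px), cp.asarray(py))[:3])
except ImportError:
    pass
```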
The Future of High Energy Physics Software and Computing
Software and Computing (S&C) are essential to all High Energy Physics (HEP)
experiments and many theoretical studies. The size and complexity of S&C are
now commensurate with that of experimental instruments, playing a critical role
in experimental design, data acquisition/instrumental control, reconstruction,
and analysis. Furthermore, S&C often plays a leading role in driving the
precision of theoretical calculations and simulations. Within this central role
in HEP, S&C has been immensely successful over the last decade. This report
looks forward to the next decade and beyond, in the context of the 2021
Particle Physics Community Planning Exercise ("Snowmass") organized by the
Division of Particles and Fields (DPF) of the American Physical Society.
Comment: Computational Frontier Report Contribution to Snowmass 2021; 41 pages, 1 figure. v2: added missing reference and missing topical group conveners. v3: fixed typo.