Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster
The NEMO High Performance Computing Cluster at the University of Freiburg has
been made available to researchers of the ATLAS and CMS experiments. Users
access the cluster from external machines connected to the Worldwide LHC
Computing Grid (WLCG). This paper describes how the full software environment
of the WLCG is provided in a virtual machine image. The interplay between the
schedulers for NEMO and for the external clusters is coordinated through the
ROCED service. A cloud computing infrastructure is deployed at NEMO to
orchestrate the simultaneous usage of the cluster by bare-metal and virtualized jobs. Through this setup, resources are provided to users in a transparent, automated, and on-demand way. The performance of the virtualized environment has been evaluated for particle physics applications.
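To illustrate the kind of coordination such a service performs, the following is a minimal Python sketch of a demand-driven provisioning loop that boots WLCG worker VMs on the HPC cluster whenever the external scheduler reports idle jobs. The function names, the image name and the polling interval are illustrative assumptions and do not reproduce the actual ROCED implementation.

```python
# Hypothetical sketch of demand-driven VM provisioning, loosely modelled on the
# role ROCED plays between an external scheduler and an HPC cluster.
# All function names, thresholds and the VM image name are illustrative
# assumptions, not the actual ROCED API.
import time


def idle_jobs_in_external_queue() -> int:
    """Placeholder: query the external (WLCG-facing) scheduler for idle jobs."""
    return 0  # replace with a real query, e.g. of the batch system


def running_virtual_machines() -> int:
    """Placeholder: count VMs currently booted on the HPC cluster."""
    return 0


def boot_vm(image: str) -> None:
    """Placeholder: submit a VM-start job to the HPC batch system."""
    print(f"requesting one VM with image {image}")


def provisioning_loop(image: str = "wlcg-worker.img", jobs_per_vm: int = 4) -> None:
    """Scale the number of VMs with the demand seen by the external scheduler."""
    while True:
        demand = idle_jobs_in_external_queue()
        wanted = -(-demand // jobs_per_vm)          # ceiling division
        missing = wanted - running_virtual_machines()
        for _ in range(max(missing, 0)):
            boot_vm(image)
        time.sleep(60)                              # re-evaluate once per minute


if __name__ == "__main__":
    provisioning_loop()
```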
Dynamic Resource Extension for Data Intensive Computing with Specialized Software Environments on HPC Systems
Modern High Energy Physics (HEP) requires large-scale processing of extensive
amounts of scientific data. The needed computing resources are currently
provided statically by HEP-specific computing centers. To increase the number
of available resources, for example to cover peak loads, the HEP computing development
team at KIT concentrates on the dynamic integration of additional
computing resources into the HEP infrastructure. To this end, we developed ROCED, a tool to dynamically request and integrate computing resources, including resources at HPC centers and commercial cloud providers. Since these resources usually do not support HEP software natively, we rely on virtualization and container technologies, which allow us to run HEP workflows on these so-called
opportunistic resources. Additionally, we study the efficient processing of huge
amounts of data on a distributed infrastructure, where the data is usually stored
at HEP-specific data centers and is accessed remotely over the WAN. To optimize the overall data throughput and to increase the CPU efficiency, we are currently developing an automated caching system for frequently used data that is transparently integrated into the distributed HEP computing infrastructure.
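As an illustration of the caching idea sketched above, the following minimal Python example caches a file locally once it has been read remotely a configurable number of times. The threshold, the paths and the helper functions are assumptions made for illustration and are not taken from the KIT caching system.

```python
# Minimal sketch of a frequency-based caching decision: files read repeatedly
# over the WAN are copied to local storage so that subsequent jobs read them
# locally. Thresholds, paths and helpers are illustrative assumptions.
from collections import Counter
from pathlib import Path

CACHE_DIR = Path("/local/cache")      # assumed local cache location
CACHE_THRESHOLD = 3                   # cache after the third remote access
access_counts: Counter[str] = Counter()


def fetch_remote(logical_name: str) -> bytes:
    """Placeholder for a WAN read, e.g. from a remote storage element."""
    return b""


def read(logical_name: str) -> bytes:
    """Serve a file from the local cache if present; otherwise read it
    remotely and cache it once it has been requested often enough."""
    cached = CACHE_DIR / logical_name.replace("/", "_")
    if cached.exists():
        return cached.read_bytes()

    access_counts[logical_name] += 1
    data = fetch_remote(logical_name)
    if access_counts[logical_name] >= CACHE_THRESHOLD:
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)
    return data
```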
Modeling Distributed Computing Infrastructures for HEP Applications
Predicting the performance of various infrastructure design options in
complex federated infrastructures with computing sites distributed over a wide
area network that support a plethora of users and workflows, such as the
Worldwide LHC Computing Grid (WLCG), is not trivial. Due to the complexity and
size of these infrastructures, it is not feasible to deploy experimental
test-beds at large scales merely for the purpose of comparing and evaluating
alternate designs. An alternative is to study the behaviours of these systems
using simulation. This approach has been used successfully in the past to
identify efficient and practical infrastructure designs for High Energy Physics
(HEP). A prominent example is the MONARC simulation framework, which was used
to study the initial structure of the WLCG. New simulation capabilities are
needed to simulate large-scale heterogeneous computing systems with complex
networks, data access and caching patterns. A modern tool, based on the SimGrid and WRENCH simulation frameworks, for simulating HEP workloads that execute on distributed computing infrastructures is outlined. Studies of its accuracy
and scalability are presented using HEP as a case-study. Hypothetical
adjustments to prevailing computing architectures in HEP are studied, providing
insights into the dynamics of a part of the WLCG and candidates for
improvements.
Comment: 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023)
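To make the modelling idea concrete, the following toy Python example estimates the makespan of a workload on a single site, charging each job its network transfer plus its CPU time. It is a deliberately simplified stand-in for illustration, not the SimGrid/WRENCH-based tool described in the abstract, and all numbers are invented.

```python
# Toy illustration of the kind of model such a simulator evaluates: each job
# transfers its input over the site's network and then occupies a core for its
# CPU work. This is not the SimGrid/WRENCH-based tool; all numbers are made up.
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    cpu_seconds: float     # pure processing time on one core
    input_gb: float        # input data read over the network


def simulate(jobs: list[Job], cores: int, bandwidth_gb_s: float) -> float:
    """Return the makespan (seconds) for running all jobs on one site."""
    core_free_at = [0.0] * cores          # min-heap of core availability times
    heapq.heapify(core_free_at)
    makespan = 0.0
    for job in jobs:
        start = heapq.heappop(core_free_at)
        walltime = job.input_gb / bandwidth_gb_s + job.cpu_seconds
        end = start + walltime
        heapq.heappush(core_free_at, end)
        makespan = max(makespan, end)
    return makespan


if __name__ == "__main__":
    workload = [Job(cpu_seconds=3600, input_gb=2.0) for _ in range(1000)]
    print(f"makespan: {simulate(workload, cores=128, bandwidth_gb_s=1.25):.0f} s")
```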
Transparent Integration of Opportunistic Resources into the WLCG Compute Infrastructure
The inclusion of opportunistic resources, for example from High Performance Computing (HPC) centers or cloud providers, is an important contribution to bridging the gap between existing resources and the future needs of the LHC collaborations, especially for the HL-LHC era. However, the integration of these resources poses new challenges and often needs to happen in a highly dynamic manner. To enable an effective and lightweight integration of these resources, the tools COBalD and TARDIS are being developed at KIT.
In this contribution we report on the infrastructure we use to dynamically offer opportunistic resources to collaborations in the Worldwide LHC Computing Grid (WLCG). The core components are COBalD/TARDIS, HTCondor, CVMFS and modern virtualization technology. The challenging task of managing the opportunistic resources is performed by COBalD/TARDIS. We showcase the challenges, employed solutions and experiences gained with the provisioning of opportunistic resources from several resource providers such as university clusters, HPC centers and cloud setups in a multi-VO environment. This work can serve as a blueprint for approaching the provisioning of resources from other resource providers.
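The following hedged Python sketch illustrates the feedback principle behind COBalD: grow the pool of opportunistic resources while the acquired resources are well used, and shrink it when they run idle. The class, attribute names and thresholds are illustrative and do not reproduce the real COBalD/TARDIS interfaces.

```python
# Hedged sketch of the feedback idea behind COBalD: a pool of opportunistic
# resources reports how heavily its acquired resources are used, and a
# controller grows or shrinks the pool accordingly. Names and thresholds are
# illustrative assumptions, not the real COBalD/TARDIS interfaces.
from dataclasses import dataclass


@dataclass
class OpportunisticPool:
    demand: float        # resources the pool should try to acquire
    utilisation: float   # fraction of acquired resources doing useful work
    allocation: float    # fraction of acquired resources handed out to jobs


def adjust_demand(pool: OpportunisticPool,
                  low: float = 0.5, high: float = 0.9, step: float = 1.1) -> None:
    """Grow the pool while it is well used, shrink it when it runs idle."""
    if pool.allocation >= high:
        pool.demand *= step          # resources are saturated: ask for more
    elif pool.utilisation <= low:
        pool.demand /= step          # resources sit idle: release some


if __name__ == "__main__":
    pool = OpportunisticPool(demand=100, utilisation=0.95, allocation=0.95)
    adjust_demand(pool)
    print(f"new demand: {pool.demand:.0f}")
```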
AUDITOR – Accounting data handling toolbox for opportunistic resources
Increasing computing demands and concerns about energy efficiency in high-performance and high-throughput computing are driving forces in the search for more efficient ways to use available resources. Sharing the resources of an underutilised cluster with a highly loaded cluster increases the efficiency of the underutilised cluster. The software COBalD/TARDIS can dynamically and transparently integrate and disintegrate such resources. However, sharing resources also requires accounting. AUDITOR (Accounting Data Handling Toolbox for Opportunistic Resources) is a modular accounting ecosystem that is able to cover a wide range of use cases and infrastructures. Accounting data are gathered via so-called collectors, which are designed to monitor batch systems, COBalD/TARDIS, cloud schedulers, or other sources of information. The data are stored in a database, and access to the data is handled by the core component of AUDITOR, which provides a REST API as well as Rust and Python client libraries. So-called plugins can take actions based on accounting records. Depending on the use case, one simply selects a suitable collector and plugin from a growing ecosystem of collectors and plugins. To facilitate the development of collectors and plugins by the community for use cases not yet covered, libraries for interacting with AUDITOR are provided.
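As a rough illustration of how a collector might hand a record to the AUDITOR core, the following Python snippet posts a JSON accounting record over HTTP. The endpoint path and the record fields are assumptions made for illustration; in practice one would use the provided Rust or Python client libraries.

```python
# Minimal sketch of a collector handing an accounting record to the AUDITOR
# core over its REST API. The endpoint path and the record fields shown here
# are assumptions for illustration; the real client libraries should be
# preferred over raw HTTP.
import requests

AUDITOR_URL = "http://localhost:8000"   # assumed address of the AUDITOR core

record = {
    "record_id": "job-12345",                   # unique identifier of the job
    "start_time": "2024-01-01T12:00:00Z",
    "stop_time": "2024-01-01T14:30:00Z",
    "components": [                             # resources consumed by the job
        {"name": "cpu", "amount": 8},
        {"name": "memory", "amount": 16384},
    ],
}

response = requests.post(f"{AUDITOR_URL}/record", json=record, timeout=10)
response.raise_for_status()
```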
Federated Heterogeneous Compute and Storage Infrastructure for the PUNCH4NFDI Consortium
PUNCH4NFDI, funded by the German Research Foundation initially for five years, is a diverse consortium of particle, astro-, astroparticle, hadron and nuclear physics embedded in the National Research Data Infrastructure initiative. In order to provide seamless and federated access to the huge variety of compute and storage systems provided by the participating communities, covering their very diverse needs, the Compute4PUNCH and Storage4PUNCH concepts have been developed. Both concepts comprise state-of-the-art technologies such as a token-based AAI for standardized access to compute and storage resources. The community-supplied heterogeneous HPC, HTC and Cloud compute resources are dynamically and transparently integrated into one federated HTCondor-based overlay batch system using the COBalD/TARDIS resource meta-scheduler. Traditional login nodes and a JupyterHub provide entry points into the entire landscape of available compute resources, while container technologies and the CERN Virtual Machine File System (CVMFS) ensure a scalable provisioning of community-specific software environments. In Storage4PUNCH, community-supplied storage systems, mainly based on dCache or XRootD technology, are being federated in a common infrastructure employing methods that are well established in the wider HEP community. Furthermore, existing technologies for caching as well as metadata handling are being evaluated with the aim of deeper integration. The combined Compute4PUNCH and Storage4PUNCH environment will allow a large variety of researchers to carry out resource-demanding analysis tasks. In this contribution we will present the Compute4PUNCH and Storage4PUNCH concepts, the current status of the developments, as well as first experiences with scientific applications being executed on the available prototypes.
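A minimal sketch of what job submission to such an HTCondor-based overlay batch system could look like, using the HTCondor Python bindings, is shown below. The executable, the resource requests and the container image path on CVMFS are illustrative assumptions, and token-based authentication is assumed to already be configured on the submit node.

```python
# Hedged sketch of submitting a job to an HTCondor-based overlay batch system
# such as the one described for Compute4PUNCH. The executable, resource
# requests and container image path are illustrative assumptions;
# authentication (e.g. via tokens) is assumed to be configured already.
import htcondor

job = htcondor.Submit({
    "executable": "run_analysis.sh",            # user payload (assumed name)
    "arguments": "input.dat",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "request_cpus": "4",
    "request_memory": "8GB",
    # software environment provided via a container image on CVMFS (example path)
    "+SingularityImage": '"/cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/python:3.11"',
})

schedd = htcondor.Schedd()                      # contact the local scheduler
result = schedd.submit(job)
print(f"submitted cluster {result.cluster()}")
```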
Extending the distributed computing infrastructure of the CMS experiment with HPC resources
Particle accelerators are an important tool to study the fundamental properties of elementary particles. Currently, the highest-energy accelerator is the LHC at CERN in Geneva, Switzerland. Each of its four major detectors, among them the CMS detector, produces dozens of petabytes of data per year to be analyzed by a large international collaboration. The processing is carried out on the Worldwide LHC Computing Grid, which spans more than 170 compute centers around the world and is used by a number of particle physics experiments. Recently, the LHC experiments were encouraged to make increasing use of HPC resources. While Grid resources are homogeneous with respect to the Grid middleware used, HPC installations can be very different in their setup. In order to integrate HPC resources into the highly automated processing setups of the CMS experiment, a number of challenges need to be addressed. For processing, access to primary data and metadata as well as access to the software is required. At Grid sites, all of this is achieved via a number of services that are provided by each center. At HPC sites, however, many of these capabilities cannot be provided easily and have to be enabled in user space or by other means. HPC centers also often impose restrictions on network access to remote services, which is a further severe limitation. The paper discusses a number of solutions and recent experiences gained by the CMS experiment in including HPC resources in its processing campaigns.
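The following hedged Python sketch illustrates the kind of environment probing a pilot could perform on an HPC worker node before starting a workflow: checking whether the experiment software is reachable via CVMFS and whether outbound network access to a remote service is possible. The paths and host names are examples, not the actual CMS tooling.

```python
# Illustrative sketch of environment probing on an HPC worker node before
# starting a workflow: is the software available via CVMFS, and is outbound
# network access to remote services possible? Paths and hosts are examples,
# not the actual CMS tooling.
import os
import socket


def cvmfs_available(repo: str = "/cvmfs/cms.cern.ch") -> bool:
    """True if the CVMFS repository with the experiment software is mounted."""
    return os.path.isdir(repo) and bool(os.listdir(repo))


def outbound_network(host: str = "cmsweb.cern.ch", port: int = 443) -> bool:
    """True if the node can open a connection to a remote experiment service."""
    try:
        with socket.create_connection((host, port), timeout=5):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if not cvmfs_available():
        print("CVMFS missing: software must be provided, e.g. inside a container")
    if not outbound_network():
        print("no outbound network: data and conditions must be staged locally")
```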