    HPC resources for CMS offline computing: An integration and scalability challenge for the Submission Infrastructure

    The computing resource needs of LHC experiments are expected to continue growing significantly during Run 3 and into the HL-LHC era. The landscape of available resources will also evolve, as High Performance Computing (HPC) and Cloud resources will provide a comparable, or even dominant, fraction of the total compute capacity. The coming years therefore present a challenge for the experiments’ resource provisioning models, both in terms of scalability and of increasing complexity. The CMS Submission Infrastructure (SI) provisions computing resources for CMS workflows. This infrastructure is built on a set of federated HTCondor pools, currently aggregating 400k CPU cores distributed worldwide and supporting the simultaneous execution of over 200k computing tasks. Incorporating HPC resources into CMS computing represents, firstly, an integration challenge, as HPC centers are much more diverse than Grid sites. Secondly, evolving the present SI, dimensioned to harness the current CMS computing capacity, to reach the resource scales required for the HL-LHC phase, while maintaining global flexibility and efficiency, will represent an additional challenge. To preemptively address potential future scalability limits, the SI team regularly runs tests to explore the maximum reach of our infrastructure. In this note, the integration of HPC resources into CMS offline computing is summarized, the potential concerns for the SI arising from the increased scale of operations are described, and the most recent results of scalability tests on the CMS SI are reported.
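    As a rough illustration of how such a federated pool can be monitored during scalability tests, the sketch below (not the actual SI tooling) queries an HTCondor collector for its aggregate CPU capacity using the HTCondor Python bindings; the collector hostname is a placeholder.

        import htcondor

        # Placeholder collector address; the real SI federates several pools.
        collector = htcondor.Collector("collector.example.cern.ch")

        # Fetch one ClassAd per execute slot, keeping only the attributes we need.
        slots = collector.query(
            htcondor.AdTypes.Startd,
            projection=["Machine", "Cpus", "State"],
        )

        total_cores = sum(int(ad.get("Cpus", 0)) for ad in slots)
        claimed_cores = sum(int(ad.get("Cpus", 0)) for ad in slots
                            if ad.get("State") == "Claimed")
        print(f"{total_cores} cores aggregated, {claimed_cores} currently claimed")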

    Improving efficiency of analysis jobs in CMS

    Hundreds of physicists analyze data collected by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider using the CMS Remote Analysis Builder and the CMS Global Pool to exploit the resources of the Worldwide LHC Computing Grid. Efficient use of such an extensive and expensive resource is crucial. At the same time, the CMS collaboration is committed to minimizing time to insight for every scientist, by keeping access restrictions to the full data sample to a minimum and by supporting the free choice of applications to run on the computing resources. Supporting such a variety of workflows while preserving efficient resource usage poses special challenges. In this paper we report on three complementary approaches adopted in CMS to improve the scheduling efficiency of user analysis jobs: automatic job splitting, automated runtime estimates and automated site selection for jobs.
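    A purely illustrative sketch of the runtime-driven splitting idea (not the actual CRAB algorithm): given a measured per-event processing time, choose how many jobs to create so that each one targets a chosen wall-clock length, and derive the runtime estimate to hand to the scheduler. All numbers and the safety factor below are assumptions.

        import math

        def split_by_runtime(n_events, secs_per_event, target_job_hours=8.0, safety=1.2):
            """Return (number of jobs, events per job, estimated minutes per job)."""
            events_per_job = int(target_job_hours * 3600 / (secs_per_event * safety))
            n_jobs = math.ceil(n_events / events_per_job)
            est_minutes = math.ceil(events_per_job * secs_per_event * safety / 60)
            return n_jobs, events_per_job, est_minutes

        # Example: 5M events at 0.8 s/event split into roughly 8-hour jobs.
        print(split_by_runtime(5_000_000, 0.8))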

    The integration of heterogeneous resources in the CMS Submission Infrastructure for the LHC Run 3 and beyond

    While the computing landscape supporting LHC experiments is currently dominated by x86 processors at WLCG sites, this configuration will evolve in the coming years. LHC collaborations will be increasingly employing HPC and Cloud facilities to process the vast amounts of data expected during LHC Run 3 and the future HL-LHC phase. These facilities often feature diverse compute resources, including alternative CPU architectures like ARM and IBM Power, as well as a variety of GPU specifications. Using these heterogeneous resources efficiently is thus essential for the LHC collaborations to reach their future scientific goals. The Submission Infrastructure (SI) is a central element in CMS Computing, enabling resource acquisition and exploitation by CMS data processing, simulation and analysis tasks. The SI must therefore be adapted to ensure access to, and optimal utilization of, this heterogeneous compute capacity. Some steps in this evolution have already been taken, as CMS is currently making opportunistic use of a small pool of GPU slots provided mainly at CMS WLCG sites. Additionally, Power9 processors have been validated for CMS production at the Marconi-100 cluster at CINECA. This note will describe the updated capabilities of the SI to continue ensuring the efficient allocation and use of computing resources by CMS, despite their increasing diversity. The next steps towards full integration and support of heterogeneous resources according to CMS needs will also be reported.
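    To make the resource-request side concrete, here is a minimal sketch (not CMS production configuration) of a job description asking for a GPU slot through the HTCondor Python bindings; the executable and file names are placeholders.

        import htcondor

        sub = htcondor.Submit({
            "executable": "run_inference.sh",   # placeholder payload
            "request_cpus": "1",
            "request_gpus": "1",                # ask the negotiator for a GPU slot
            "request_memory": "4GB",
            "output": "job.out",
            "error": "job.err",
            "log": "job.log",
        })

        schedd = htcondor.Schedd()
        result = schedd.submit(sub)             # SubmitResult in recent HTCondor versions
        print("submitted cluster", result.cluster())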

    Adoption of a token-based authentication model for the CMS Submission Infrastructure

    The CMS Submission Infrastructure (SI) is the main computing resource provisioning system for CMS workloads. A number of HTCondor pools are employed to manage this infrastructure, which aggregates geographically distributed resources from the WLCG and other providers. Historically, the model of authentication among the diverse components of this infrastructure has relied on the Grid Security Infrastructure (GSI), based on identities and X.509 certificates. In contrast, commonly used modern authentication standards are based on capabilities and tokens. The WLCG has identified this trend and aims at a transparent replacement of GSI for all its workload management, data transfer and storage access operations, to be completed during the current LHC Run 3. As part of this effort, and within the context of CMS computing, the Submission Infrastructure group is in the process of phasing out the GSI part of its authentication layers in favor of IDTokens and SciTokens. The use of tokens is already well integrated into the HTCondor Software Suite, which has allowed us to fully migrate the authentication between internal components of the SI. Additionally, recent versions of the HTCondor-CE support tokens as well, enabling CMS resource requests to Grid sites employing this CE technology to be granted by means of token exchange. After a rollout campaign to sites, successfully completed by the third quarter of 2022, all HTCondor CEs in use by CMS are already receiving SciToken-based pilot jobs. On the ARC CE side, a parallel campaign was launched to foster the adoption of the REST interface at CMS sites (required to enable token-based job submission via HTCondor-G), which is nearing completion as well. In this contribution, the newly adopted authentication model will be described. We will then report on the migration status and the final steps towards a complete GSI phase-out in the CMS SI.
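    As a small illustration of the capability-based model, the sketch below (an assumption, not SI code) inspects the claims carried by a token using the PyJWT library; signature verification is deliberately skipped here, whereas a real service would validate the token against the issuer's published keys.

        import jwt  # PyJWT

        def summarize_token(raw_token: str) -> dict:
            # Decode without verifying the signature, for inspection only.
            claims = jwt.decode(raw_token, options={"verify_signature": False})
            return {
                "issuer": claims.get("iss"),    # who granted the capability
                "subject": claims.get("sub"),
                "scope": claims.get("scope"),   # e.g. compute.create, compute.read
                "expires": claims.get("exp"),
            }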

    Repurposing of the Run 2 CMS High Level Trigger Infrastructure as a Cloud Resource for Offline Computing

    The former CMS Run 2 High Level Trigger (HLT) farm is one of the largest contributors to CMS compute resources, providing about 25k job slots for offline computing. This CPU farm was initially employed as an opportunistic resource, exploited during inter-fill periods, in LHC Run 2. Since then, it has become a nearly transparent extension of the CMS capacity at CERN, being located on-site at the LHC interaction point 5 (P5), where the CMS detector is installed. This resource has been configured to support the execution of critical CMS tasks, such as prompt detector data reconstruction. It can therefore be used in combination with the dedicated Tier 0 capacity at CERN, in order to process and absorb peaks in the stream of data coming from the CMS detector. The initial configuration for this resource, based on statically configured VMs, provided the required level of functionality. However, regular operations of this cluster revealed certain limitations compared to the resource provisioning and usage model employed at WLCG sites. A new configuration, based on a vacuum-like model, has been implemented for this resource in order to solve the detected shortcomings. This paper reports on this redeployment work on the permanent cloud, providing enhanced support to CMS offline computing, and compares the former and new models’ respective functionalities, along with the commissioning effort for the new setup.
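    For readers unfamiliar with the vacuum concept, the following conceptual sketch (not the P5 deployment) shows the kind of autonomous decision loop a vacuum-like hypervisor runs: it spawns pilot VMs on its own initiative while local capacity is free and recent VMs have found useful work, and backs off otherwise. All function names are hypothetical.

        import time

        def vacuum_loop(free_slots, recent_vms_found_work, start_vm, backoff_s=600):
            """Autonomously create pilot VMs 'out of the vacuum' on a hypervisor."""
            while True:
                if free_slots() > 0 and recent_vms_found_work():
                    start_vm()             # VM boots, joins the HTCondor pool, pulls work
                    time.sleep(60)         # brief pause before considering another VM
                else:
                    time.sleep(backoff_s)  # back off when full or when VMs find no work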

    The Common Analysis Framework Project

    ATLAS, CERN-IT, and CMS embarked on a project to develop a common system for analysis workflow management, resource provisioning and job scheduling in the distributed computing infrastructure, based on elements of PanDA. After an extensive feasibility study and the development of a proof-of-concept prototype, the project now has a basic infrastructure that can be used to support the analysis use case of both experiments with common services. In this paper we discuss the state of the art of the current solution, giving an overview of all the components of the system.

    Implementation of New Security Features in CMSWEB Kubernetes Cluster at CERN

    The CMSWEB cluster is pivotal to the activities of the Compact Muon Solenoid (CMS) experiment, as it hosts critical services required for its operational needs. The security of these services and of the corresponding data is crucial to CMS, as any malicious attack could compromise their availability. It is therefore important to construct a robust security infrastructure. In this work, we discuss new security features introduced to the CMSWEB Kubernetes (“k8s”) cluster: the implementation of network policies, the deployment of Open Policy Agent (OPA) and the enforcement of OPA policies, and the integration of Vault. The network policies act as an inside-the-cluster firewall, limiting network communication between the pods to the minimum necessary, and their dynamic nature allows us to work with microservices. OPA validates objects against custom-defined policies during create, update, and delete operations to further enhance security. Without recompiling or changing the configuration of the Kubernetes API server, it can apply customized policies to Kubernetes objects, and its audit functionality enables us to detect pre-existing conflicts and issues. Although Kubernetes incorporates the concept of secrets, these are only base64-encoded and are not dynamically configured. This is where Vault comes into play: Vault dynamically secures, stores, and tightly controls access to sensitive data. In this way, secret information is encrypted, secured, and centralized, making it more scalable and easier to manage. The implementation of these three security features thus strengthens the security and reliability of the CMSWEB Kubernetes infrastructure.
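    As one concrete example of the network-policy layer described above, a minimal sketch (not the actual CMSWEB manifests) using the official Kubernetes Python client is shown below; it installs a default-deny ingress policy for a namespace, on top of which selective allow rules would then be added. The namespace name is a placeholder.

        from kubernetes import client, config

        config.load_kube_config()  # or config.load_incluster_config() inside a pod

        # An empty pod selector matches every pod in the namespace; listing
        # "Ingress" with no ingress rules denies all incoming pod-to-pod traffic.
        policy = client.V1NetworkPolicy(
            metadata=client.V1ObjectMeta(name="default-deny-ingress"),
            spec=client.V1NetworkPolicySpec(
                pod_selector=client.V1LabelSelector(),
                policy_types=["Ingress"],
            ),
        )

        client.NetworkingV1Api().create_namespaced_network_policy(
            namespace="cmsweb", body=policy  # placeholder namespace
        )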