
    SDN next generation integrated architecture for HEP and global science

    I describe a software-defined global system under development by Caltech and partner network teams in support of the LHC and other major science programs. The system coordinates workflows among hundreds of multi-petabyte data stores and petascale computing facilities interlinked by 100 Gbps networks, and among the exascale systems expected in the next decade.

    CMS Connect

    The CMS experiment collects and analyzes large amounts of data produced by high-energy particle collisions at the Large Hadron Collider (LHC) at CERN. This involves a huge amount of real and simulated data processing that needs to be handled on batch-oriented platforms. The CMS Global Pool of computing resources provides more than 100,000 dedicated CPU cores, plus another 50,000 to 100,000 CPU cores from opportunistic resources, for these kinds of tasks. Although production and event-processing analysis workflows are already managed by existing tools, there is still no support for submitting the final-stage, Condor-like analysis jobs familiar to users of Tier-3 or local computing facilities to these distributed resources in a way that is both user friendly and integrated with other CMS services. CMS Connect is a set of computing tools and services designed to augment existing services in the CMS physics community, focusing on this kind of Condor analysis job. It is based on the CI-Connect platform developed by the Open Science Grid and uses the CMS GlideinWMS infrastructure to transparently plug CMS global grid resources into a virtual pool accessed via a single submission machine. This paper describes the developments and deployment of CMS Connect beyond the CI-Connect platform to integrate the service with CMS-specific needs, including submission to specific sites, job accounting, and automated reporting to standard CMS monitoring resources, in a way that is effortless for its users.
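
    To make the submission model concrete, the following is a minimal, hypothetical Python sketch of queuing a batch of Condor-style analysis jobs from a CMS Connect-like submission host. The script name, memory request, site list, and the +DESIRED_Sites attribute (a common GlideinWMS convention for steering jobs to sites) are illustrative assumptions, not CMS Connect specifics.

        # Hypothetical sketch: write an HTCondor submit description and hand it
        # to condor_submit on the submission host.  All values are placeholders.
        import subprocess
        from pathlib import Path

        submit_description = """\
        universe       = vanilla
        executable     = run_analysis.sh
        arguments      = $(ProcId)
        output         = logs/job.$(ClusterId).$(ProcId).out
        error          = logs/job.$(ClusterId).$(ProcId).err
        log            = logs/job.$(ClusterId).log
        request_cpus   = 1
        request_memory = 2000
        +DESIRED_Sites = "T2_US_Caltech,T2_US_Nebraska"
        queue 10
        """

        Path("logs").mkdir(exist_ok=True)
        Path("analysis.sub").write_text(submit_description)

        # condor_submit expands the description into ten queued jobs on the local schedd.
        subprocess.run(["condor_submit", "analysis.sub"], check=True)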

    MonALISA, an agent-based monitoring and control system for the LHC experiments

    No full text
    MonALISA, which stands for Monitoring Agents using a Large Integrated Services Architecture, has been developed over the last fifteen years by the California Institute of Technology (Caltech) and its partners with the support of the software and computing programs of the CMS and ALICE experiments at the Large Hadron Collider (LHC). The framework is based on the Dynamic Distributed Service Architecture and is able to provide complete system monitoring, performance metrics of applications, jobs, or services, system control, and global optimization services for complex systems. A short overview and the current status of MonALISA are given in this paper.
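
    The agent-based "push" model such a framework implements can be illustrated with a toy sketch: a small agent periodically samples host metrics and sends them to a central collector. This assumes a hypothetical JSON-over-UDP collector and invented metric names; it is only an illustration of the pattern, not the MonALISA/ApMon wire protocol.

        # Toy monitoring agent: sample a few host metrics and push them over UDP.
        # The collector endpoint and metric names are assumptions for illustration.
        import json
        import os
        import socket
        import time

        COLLECTOR = ("monitor.example.org", 8884)  # hypothetical collector

        def collect_metrics() -> dict:
            """Gather a few cheap host-level metrics (Unix only)."""
            load1, load5, _ = os.getloadavg()
            return {
                "host": socket.gethostname(),
                "timestamp": int(time.time()),
                "load1": load1,
                "load5": load5,
            }

        def run_agent(interval: float = 30.0, iterations: int = 3) -> None:
            """Periodically push metrics to the collector as JSON datagrams."""
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            for _ in range(iterations):
                sock.sendto(json.dumps(collect_metrics()).encode(), COLLECTOR)
                time.sleep(interval)

        if __name__ == "__main__":
            run_agent()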

    The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    No full text

    Next-Generation Exascale Network Integrated Architecture for Global Science [Invited]

    No full text
    The Next-Generation Exascale Network Integrated Architecture (NGENIA-ES) is a project specifically designed to reach new levels of network and computing capability in support of global science collaborations through the development of a new class of intelligent, agile networked systems. Its path to success is built upon our ongoing developments in multiple areas, strong ties among our high energy physics, computer and network science, and engineering teams, and our close collaboration with key technology developers and providers deeply engaged in the National Strategic Computing Initiative (NSCI). This paper describes the building of a new class of distributed systems, our work with the Leadership Computing Facilities (LCFs), the use of software-defined networking (SDN) methods, and the use of data-driven methods for the scheduling and optimization of network resources. Sections I-III present the challenges of data-intensive research and the important ingredients of this ecosystem. Sections IV-VI describe some crucial elements of the foreseen solution and some of the progress so far. Sections VII-IX go into the details of orchestration, software-defined networking, and scheduling optimization. Finally, Section X discusses engagement and partnerships, and Section XI gives a summary. References are given at the end.
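
    The flavor of the data-driven scheduling discussed here can be sketched with a toy example: assign pending bulk transfers to candidate network paths according to the bandwidth currently reported as available. The sites, paths, and numbers below are invented for illustration; the actual NGENIA-ES orchestration (SDN path provisioning, priorities, deadlines) is far richer.

        # Toy data-driven transfer scheduler: greedily place the largest pending
        # transfer on the path with the most measured headroom.  All values are
        # illustrative assumptions.
        from dataclasses import dataclass

        @dataclass
        class NetworkPath:
            name: str
            available_gbps: float  # e.g. taken from live monitoring

        @dataclass
        class Transfer:
            dataset: str
            size_tb: float

        def schedule(transfers: list[Transfer], paths: list[NetworkPath]) -> dict[str, str]:
            assignment = {}
            for t in sorted(transfers, key=lambda t: t.size_tb, reverse=True):
                best = max(paths, key=lambda p: p.available_gbps)
                assignment[t.dataset] = best.name
                # Assume each transfer occupies ~10 Gbps of the chosen path
                # until it completes (an arbitrary simplification).
                best.available_gbps = max(0.0, best.available_gbps - 10.0)
            return assignment

        if __name__ == "__main__":
            paths = [NetworkPath("Caltech-CERN via ESnet", 60.0),
                     NetworkPath("Caltech-CERN via Internet2", 40.0)]
            transfers = [Transfer("/Run2016/AOD", 300.0),
                         Transfer("/MC2016/MINIAOD", 120.0)]
            print(schedule(transfers, paths))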

    CMS readiness for multi-core workload scheduling

    No full text
    In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of Run 2 events requires parallelization of the code to reduce the memory-per-core footprint that constrains serial programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on native HTCondor and GlideinWMS features, to enable the simultaneous scheduling of single-core and multi-core jobs. This provides a uniform solution to the scheduling problem across grid sites running a diversity of gateways to compute resources and batch-system technologies. This paper presents this strategy and the tools with which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment to Tier-2 sites during early 2016, is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.
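
    The partitionable-pilot idea can be made concrete with a toy sketch: a pilot advertises one large slot and carves dynamic sub-slots out of it as single-core and multi-core jobs match. This mimics the HTCondor partitionable-slot behaviour in a few lines and is not the actual GlideinWMS implementation; job names and core counts are assumptions.

        # Toy partitionable pilot: dynamic sub-slots are carved from one big slot.
        from dataclasses import dataclass, field

        @dataclass
        class PartitionablePilot:
            total_cores: int
            free_cores: int = field(init=False)

            def __post_init__(self) -> None:
                self.free_cores = self.total_cores

            def try_start(self, job_name: str, cores: int) -> bool:
                """Start the job in a dynamic slot if enough cores remain."""
                if cores <= self.free_cores:
                    self.free_cores -= cores
                    print(f"started {job_name} ({cores} cores), {self.free_cores} free")
                    return True
                print(f"deferred {job_name} ({cores} cores), only {self.free_cores} free")
                return False

        if __name__ == "__main__":
            pilot = PartitionablePilot(total_cores=8)
            pilot.try_start("reco-multicore", 4)        # 4-core dynamic slot
            for i in range(4):
                pilot.try_start(f"analysis-{i}", 1)     # single-core slots alongside
            pilot.try_start("sim-multicore", 4)         # deferred: the pilot is full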

    HTTP as a Data Access Protocol: Trials with XrootD in CMS’s AAA Project

    The main goal of the project is to demonstrate the ability to use HTTP data federations in a manner analogous to the existing AAA infrastructure of the CMS experiment. An initial testbed at Caltech has been built, and changes in the CMS software (CMSSW) are being implemented in order to improve HTTP support. The testbed consists of a set of machines at the Caltech Tier-2 that improve the support infrastructure for data federations at CMS. As a first step, we are building systems that produce and ingest network data transfers of up to 80 Gbps. In collaboration with AAA, HTTP support has been enabled at the US redirector and the Caltech testbed. A plugin for CMSSW is being developed for HTTP access based on the DaviX software; it will replace the present fork/exec or curl approaches for HTTP access. In addition, extensions to the XRootD HTTP implementation are being developed to add functionality such as client-based monitoring identifiers. In the future, patches will be developed to better integrate HTTP-over-XRootD with the Open Science Grid (OSG) distribution. First results of the transfer tests using HTTP are presented in this paper, together with details about the initial setup.
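
    The core operation such an HTTP access layer performs, reading an arbitrary byte range of a remote file, can be sketched in a few lines of Python. The URL below is a placeholder and authentication (X.509 proxies or tokens, as used by AAA) is omitted; this illustrates HTTP range reads in general, not the CMSSW/DaviX plugin itself.

        # Read a byte range of a remote file with an HTTP Range request.
        # The endpoint below is a placeholder, not a real AAA/XRootD-HTTP server.
        import requests

        def read_byte_range(url: str, start: int, length: int) -> bytes:
            """Fetch `length` bytes starting at `start` from `url`."""
            headers = {"Range": f"bytes={start}-{start + length - 1}"}
            resp = requests.get(url, headers=headers, timeout=30)
            resp.raise_for_status()  # 206 Partial Content on success
            return resp.content

        if __name__ == "__main__":
            url = "https://xrootd-http.example.org/store/data/file.root"  # placeholder
            chunk = read_byte_range(url, start=0, length=1024)
            print(f"read {len(chunk)} bytes")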
