
    SDN next generation integrated architecture for HEP and global science

    I describe a software-defined global system under development by Caltech and partner network teams in support of the LHC and other major science programs. The system coordinates workflows among hundreds of multi-petabyte data stores and petascale computing facilities interlinked by 100 Gbps networks, and among the exascale systems expected in the next decade.

    CMS Connect

    The CMS experiment collects and analyzes large amounts of data produced by high-energy particle collisions at the Large Hadron Collider (LHC) at CERN. This involves a huge amount of real and simulated data processing that needs to be handled on batch-oriented platforms. The CMS Global Pool of computing resources provides more than 100,000 dedicated CPU cores, plus another 50,000 to 100,000 CPU cores from opportunistic resources, for these kinds of tasks. Although production and event-processing analysis workflows are already managed by existing tools, there is still no support for submitting the final-stage, Condor-like analysis jobs familiar to users of Tier-3 or local computing facilities to these distributed resources in a way that is both user friendly and integrated with other CMS services. CMS Connect is a set of computing tools and services designed to augment existing services in the CMS physics community, focusing on this kind of Condor analysis job. It is based on the CI-Connect platform developed by the Open Science Grid and uses the CMS GlideinWMS infrastructure to transparently plug CMS global grid resources into a virtual pool accessed via a single submission machine. This paper describes the developments and deployment of CMS Connect beyond the CI-Connect platform to integrate the service with CMS-specific needs, including submission to specific sites, job accounting, and automated reporting to standard CMS monitoring resources, in a way that is effortless for its users.
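
    To make the submission model concrete, the following is a minimal, hypothetical Python sketch of queuing a batch of Condor-style analysis jobs from a CMS Connect-like submission host. The script name, memory request, site list, and the +DESIRED_Sites attribute (a common GlideinWMS convention for steering jobs to sites) are illustrative assumptions, not CMS Connect specifics.

        # Hypothetical sketch: write an HTCondor submit description and hand it
        # to condor_submit on the submission host.  All values are placeholders.
        import subprocess
        from pathlib import Path

        submit_description = """\
        universe       = vanilla
        executable     = run_analysis.sh
        arguments      = $(ProcId)
        output         = logs/job.$(ClusterId).$(ProcId).out
        error          = logs/job.$(ClusterId).$(ProcId).err
        log            = logs/job.$(ClusterId).log
        request_cpus   = 1
        request_memory = 2000
        +DESIRED_Sites = "T2_US_Caltech,T2_US_Nebraska"
        queue 10
        """

        Path("logs").mkdir(exist_ok=True)
        Path("analysis.sub").write_text(submit_description)

        # condor_submit expands the description into ten queued jobs on the local schedd.
        subprocess.run(["condor_submit", "analysis.sub"], check=True)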

    MonALISA, an agent-based monitoring and control system for the LHC experiments

    No full text
    MonALISA, which stands for Monitoring Agents using a Large Integrated Services Architecture, has been developed over the last fifteen years by the California Institute of Technology (Caltech) and its partners with the support of the software and computing programs of the CMS and ALICE experiments at the Large Hadron Collider (LHC). The framework is based on the Dynamic Distributed Service Architecture and is able to provide complete system monitoring, performance metrics of applications, jobs, or services, system control, and global optimization services for complex systems. A short overview and the current status of MonALISA are given in this paper.
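
    The agent-based "push" model such a framework implements can be illustrated with a toy sketch: a small agent periodically samples host metrics and sends them to a central collector. This assumes a hypothetical JSON-over-UDP collector and invented metric names; it is only an illustration of the pattern, not the MonALISA/ApMon wire protocol.

        # Toy monitoring agent: sample a few host metrics and push them over UDP.
        # The collector endpoint and metric names are assumptions for illustration.
        import json
        import os
        import socket
        import time

        COLLECTOR = ("monitor.example.org", 8884)  # hypothetical collector

        def collect_metrics() -> dict:
            """Gather a few cheap host-level metrics (Unix only)."""
            load1, load5, _ = os.getloadavg()
            return {
                "host": socket.gethostname(),
                "timestamp": int(time.time()),
                "load1": load1,
                "load5": load5,
            }

        def run_agent(interval: float = 30.0, iterations: int = 3) -> None:
            """Periodically push metrics to the collector as JSON datagrams."""
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            for _ in range(iterations):
                sock.sendto(json.dumps(collect_metrics()).encode(), COLLECTOR)
                time.sleep(interval)

        if __name__ == "__main__":
            run_agent()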

    The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    No full text

    Next-Generation Exascale Network Integrated Architecture for Global Science [Invited]

    No full text
    The Next-Generation Exascale Network Integrated Architecture (NGENIA-ES) is a project specifically designed to reach new levels of network and computing capability in support of global science collaborations through the development of a new class of intelligent, agile networked systems. Its path to success is built upon our ongoing developments in multiple areas, strong ties among our high energy physics, computer and network science, and engineering teams, and our close collaboration with key technology developers and providers deeply engaged in the National Strategic Computing Initiative (NSCI). This paper describes the building of a new class of distributed systems, our work with the Leadership Computing Facilities (LCFs), the use of software-defined networking (SDN) methods, and the use of data-driven methods for the scheduling and optimization of network resources. Sections I-III present the challenges of data-intensive research and the important ingredients of this ecosystem. Sections IV-VI describe some crucial elements of the foreseen solution and some of the progress so far. Sections VII-IX go into the details of orchestration, software-defined networking, and scheduling optimization. Finally, Section X discusses engagement and partnerships, and Section XI gives a summary. References are given at the end.
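
    The flavor of the data-driven scheduling discussed here can be sketched with a toy example: assign pending bulk transfers to candidate network paths according to the bandwidth currently reported as available. The sites, paths, and numbers below are invented for illustration; the actual NGENIA-ES orchestration (SDN path provisioning, priorities, deadlines) is far richer.

        # Toy data-driven transfer scheduler: greedily place the largest pending
        # transfer on the path with the most measured headroom.  All values are
        # illustrative assumptions.
        from dataclasses import dataclass

        @dataclass
        class NetworkPath:
            name: str
            available_gbps: float  # e.g. taken from live monitoring

        @dataclass
        class Transfer:
            dataset: str
            size_tb: float

        def schedule(transfers: list[Transfer], paths: list[NetworkPath]) -> dict[str, str]:
            assignment = {}
            for t in sorted(transfers, key=lambda t: t.size_tb, reverse=True):
                best = max(paths, key=lambda p: p.available_gbps)
                assignment[t.dataset] = best.name
                # Assume each transfer occupies ~10 Gbps of the chosen path
                # until it completes (an arbitrary simplification).
                best.available_gbps = max(0.0, best.available_gbps - 10.0)
            return assignment

        if __name__ == "__main__":
            paths = [NetworkPath("Caltech-CERN via ESnet", 60.0),
                     NetworkPath("Caltech-CERN via Internet2", 40.0)]
            transfers = [Transfer("/Run2016/AOD", 300.0),
                         Transfer("/MC2016/MINIAOD", 120.0)]
            print(schedule(transfers, paths))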

    CMS readiness for multi-core workload scheduling

    No full text
    In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of Run 2 events requires parallelization of the code to reduce the memory-per-core footprint that constrains serial programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on native HTCondor and GlideinWMS features, to enable the simultaneous scheduling of single-core and multi-core jobs. This provides a uniform solution to the scheduling problem across grid sites running a diversity of gateways to compute resources and batch-system technologies. This paper presents this strategy and the tools with which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment to Tier-2 sites during early 2016, is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.
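
    The partitionable-pilot idea can be made concrete with a toy sketch: a pilot advertises one large slot and carves dynamic sub-slots out of it as single-core and multi-core jobs match. This mimics the HTCondor partitionable-slot behaviour in a few lines and is not the actual GlideinWMS implementation; job names and core counts are assumptions.

        # Toy partitionable pilot: dynamic sub-slots are carved from one big slot.
        from dataclasses import dataclass, field

        @dataclass
        class PartitionablePilot:
            total_cores: int
            free_cores: int = field(init=False)

            def __post_init__(self) -> None:
                self.free_cores = self.total_cores

            def try_start(self, job_name: str, cores: int) -> bool:
                """Start the job in a dynamic slot if enough cores remain."""
                if cores <= self.free_cores:
                    self.free_cores -= cores
                    print(f"started {job_name} ({cores} cores), {self.free_cores} free")
                    return True
                print(f"deferred {job_name} ({cores} cores), only {self.free_cores} free")
                return False

        if __name__ == "__main__":
            pilot = PartitionablePilot(total_cores=8)
            pilot.try_start("reco-multicore", 4)        # 4-core dynamic slot
            for i in range(4):
                pilot.try_start(f"analysis-{i}", 1)     # single-core slots alongside
            pilot.try_start("sim-multicore", 4)         # deferred: the pilot is full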

    HTTP as a Data Access Protocol: Trials with XrootD in CMS’s AAA Project

    The main goal of the project is to demonstrate the ability to use HTTP data federations in a manner analogous to the existing AAA infrastructure of the CMS experiment. An initial testbed at Caltech has been built, and changes in the CMS software (CMSSW) are being implemented in order to improve HTTP support. The testbed consists of a set of machines at the Caltech Tier-2 that improve the support infrastructure for data federations at CMS. As a first step, we are building systems that produce and ingest network data transfers of up to 80 Gbps. In collaboration with AAA, HTTP support has been enabled at the US redirector and the Caltech testbed. A plugin for CMSSW is being developed for HTTP access based on the DaviX software; it will replace the present fork/exec or curl approaches for HTTP access. In addition, extensions to the XRootD HTTP implementation are being developed to add functionality such as client-based monitoring identifiers. In the future, patches will be developed to better integrate HTTP-over-XRootD with the Open Science Grid (OSG) distribution. First results of the transfer tests using HTTP are presented in this paper, together with details about the initial setup.
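
    The core operation such an HTTP access layer performs, reading an arbitrary byte range of a remote file, can be sketched in a few lines of Python. The URL below is a placeholder and authentication (X.509 proxies or tokens, as used by AAA) is omitted; this illustrates HTTP range reads in general, not the CMSSW/DaviX plugin itself.

        # Read a byte range of a remote file with an HTTP Range request.
        # The endpoint below is a placeholder, not a real AAA/XRootD-HTTP server.
        import requests

        def read_byte_range(url: str, start: int, length: int) -> bytes:
            """Fetch `length` bytes starting at `start` from `url`."""
            headers = {"Range": f"bytes={start}-{start + length - 1}"}
            resp = requests.get(url, headers=headers, timeout=30)
            resp.raise_for_status()  # 206 Partial Content on success
            return resp.content

        if __name__ == "__main__":
            url = "https://xrootd-http.example.org/store/data/file.root"  # placeholder
            chunk = read_byte_range(url, start=0, length=1024)
            print(f"read {len(chunk)} bytes")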
