How to keep the Grid full and working with ATLAS production and physics jobs

Author: A Di Girolamo
A Filipcic
A Pacheco Pagés
ATLAS Collaboration
D Cameron
F Fassi
F H Barreiro Megino
Filipcic A
Forti A
I Glushkov
Maeno T
Megino F H Barreiro
R Walker
S González de la Hoz
T Maeno
W Yang
Publication venue: 'IOP Publishing'

PanDA for ATLAS Distributed Computing in the Next Decade

Author: A Klimentov
D Oleynik
F H Barreiro Megino
K De
P Nilsson
S Padolski
S Panitkin
T Maeno
T Wenaus
Publication date: 25/09/2016

The Production and Distributed Analysis (PanDA) system has been developed to meet ATLAS production and analysis requirements for a data-driven workload management system capable of operating at the Large Hadron Collider (LHC) data processing scale. Heterogeneous resources used by the ATLAS experiment are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, dozens of scientific applications are supported, while data processing requires more than a few billion hours of computing usage per year. PanDA performed very well over the last decade, including the LHC Run 1 data-taking period. However, it was decided to upgrade the whole system concurrently with the LHC’s first long shutdown in order to cope with rapidly changing computing infrastructure. After two years of reengineering efforts, PanDA has embedded capabilities for fully dynamic and flexible workload management. The static batch job paradigm was discarded in favor of a more automated and scalable model. Workloads are dynamically tailored for optimal usage of resources, with the brokerage taking network traffic and forecasts into account. Computing resources are partitioned based on dynamic knowledge of their status and characteristics. The pilot has been refactored around a plugin structure for easier development and deployment. Bookkeeping is handled with both coarse and fine granularities for efficient utilization of pledged or opportunistic resources. Leveraging direct remote data access and federated storage relaxes the geographical coupling between processing and data. An in-house security mechanism authenticates the pilot and data management services in off-grid environments such as volunteer computing and private local clusters.
The PanDA monitor has been extensively optimized for performance and extended with analytics to provide aggregated summaries of the system as well as drill-down to operational details. There are also many other developments, planned or recently implemented, including adoption by non-LHC experiments such as bioinformatics groups successfully running Paleomix (microbial genome and metagenome) payloads on supercomputers. In this talk we will focus on the new and planned features that are most important to the next decade of distributed computing workload management.

Crossref

CERN Document Server

PanDA for ATLAS distributed computing in the next decade

Author: A Klimentov
ATLAS Collaboration
Borodin M
D Oleynik
De K
Evans L
F H Barreiro Megino
K De
Lassnig M
Mashinistov R
Megino F H Barreiro
Nilsson P
P Nilsson
Panitkin S
S Padolski
S Panitkin
T Maeno
T Wenaus
Tsulaia V
Wenaus T
Publication venue: 'IOP Publishing'

Crossref

Global heterogeneous resource harvesting: the next-generation PanDA Pilot for ATLAS

Author: A. Anisenkov
Alef Manfred
ATLAS Collaboration
Barreiro Megino F. H.
Calafiura P.
D. Drizhuk
D. Oleynik
M. Lassnig
P. Nilsson
W. Guan
Publication venue: 'IOP Publishing'

Crossref

Challenging data and workload management in CMS Computing with network-aware systems

Author: Barreiro Megino F H
Bloom K
Campana S et al.
D Bonacorsi
Demar P
Egeland R
Grandi C
McKee S et al.
Melo A et al.
T Wildish
Publication venue: 'IOP Publishing'

Crossref

ATLAS Cloud Computing R&D project

Author: Barreiro Megino F
Benjamin D
Caballero Bejar J
DiGirolamo A
Gable I
Hendrix V
Hover J
Kucharczuk K
Medrano LLamas R
Ohman H
Panitkin S
Paterson M
Sobie R
Taylor R
Walker R
Zaytsev A
Publication date: 10/10/2013

The computing model of the ATLAS experiment was designed around the concept of grid computing and, since the start of data taking, this model has proven very successful. However, new cloud computing technologies bring attractive features to improve the operations and elasticity of scientific distributed computing. ATLAS sees grid and cloud computing as complementary technologies that will coexist at different levels of resource abstraction, and two years ago created an R&D working group to investigate the different integration scenarios. The ATLAS Cloud Computing R&D has been able to demonstrate the feasibility of offloading work from grid to cloud sites and, as of today, is able to integrate transparently various cloud resources into the PanDA workload management system. The ATLAS Cloud Computing R&D is operating various PanDA queues on private and public resources and has provided several hundred thousand CPU days to the experiment. As a result, the ATLAS Cloud Computing R&D group has gained a significant insight into the cloud computing landscape and has identified points that still need to be addressed in order to fully utilize this technology. This contribution will explain the cloud integration models that are being evaluated and will discuss ATLAS’ learning during the collaboration with leading commercial and academic cloud providers.

CERN Document Server

ATLAS Cloud R&D

Author: Barreiro Megino F
Benjamin D
Caballero Bejar J
DiGirolamo A
Gable I
Hendrix V
Hover J
Kucharczuk K
Love P
Medrano LLamas R
Ohman H
Panitkin S
Paterson M
Sobie R
Taylor R
Walker R
Zaytsev A
Publication venue: 'IOP Publishing'
Publication date: 29/10/2013

The computing model of the ATLAS experiment was designed around the concept of grid computing and, since the start of data taking, this model has proven very successful. However, new cloud computing technologies bring attractive features to improve the operations and elasticity of scientific distributed computing. ATLAS sees grid and cloud computing as complementary technologies that will coexist at different levels of resource abstraction, and two years ago created an R&D working group to investigate the different integration scenarios. The ATLAS Cloud Computing R&D has been able to demonstrate the feasibility of offloading work from grid to cloud sites and, as of today, is able to integrate transparently various cloud resources into the PanDA workload management system. The ATLAS Cloud Computing R&D is operating various PanDA queues on private and public resources and has provided several hundred thousand CPU days to the experiment. As a result, the ATLAS Cloud Computing R&D group has gained a significant insight into the cloud computing landscape and has identified points that still need to be addressed in order to fully utilize this technology. This contribution will explain the cloud integration models that are being evaluated and will discuss ATLAS’ learning during the collaboration with leading commercial and academic cloud providers

CERN Document Server

ATLAS Distributed Computing Automation

Author: Barreiro Megino F H
Borrego C
Campana S
Di Girolamo A
Elmsheuser J
Hejbal J
Kouba T
Legger F
Magradze E
Medrano Llamas R
Negri G
Rinaldi L
Schovancova J
Sciacca G
Serfon C
Van Der Ster D C
Publication date: 11/07/2012

The ATLAS Experiment benefits from computing resources distributed worldwide at more than 100 WLCG sites. The ATLAS Grid sites provide over 100k CPU job slots and over 100 PB of storage space on disk or tape. Monitoring the status of such a complex infrastructure is essential. The ATLAS Grid infrastructure is monitored 24/7 by two teams of shifters distributed worldwide, by the ATLAS Distributed Computing experts, and by site administrators. In this paper we summarize automation efforts performed within the ATLAS Distributed Computing team in order to reduce manpower costs and improve the reliability of the system. Different aspects of the automation process are described: from the ATLAS Grid site topology provided by the ATLAS Grid Information System, via automatic site testing by HammerCloud, to automatic exclusion from production or analysis activities.

CERN Document Server

Predictive analytics tools to adjust and monitor performance metrics for the ATLAS Production System

Author: A Klimentov
Aad G
Borodin M
Caruana R
D Golubkov
De K
F H Barreiro Megino
Fujiwara K
Klimentov A
M Borodin
M Grigorieva
M Gubin
M Titov
Maeno T
Malhotra R
Meng X
S Padolski
T Korchuganova
T Maeno
Publication venue: 'IOP Publishing'

Crossref

Implementing data placement strategies for the CMS experiment based on a popularity model

Author: Aderholz M
Andreeva J
ATLAS Collaboration
Buchmuller O
CMS Collaboration
Codispoti G
D Giordano
D Spiga
E Karavakis
Egeland R
F H Barreiro Megino
Grandi C
M Cinquilli
M Girone
Molfetas A
N Magini
Peters A J
TLS Group
V Mancinelli
Publication venue: 'IOP Publishing'

Crossref