    A job monitoring system for the LCG computing grid

    Experience with generating simulation data for high energy physics experiments has shown that a job monitoring system (JMS) is essential for understanding job failures within the Grid. Such a system can give information about the status of the user job as well as the worker node while the job is running. It should support the user directly by allowing interaction with the running job and should be able to perform automatic error correction. Furthermore, such a system can be extended to classify errors automatically, which can improve the stability and performance of the Grid environment. To increase the acceptance of the Grid, a graphical user interface (GUI) has been developed and integrated with the job monitoring system. Both components are currently integrated in the computing environment for generating data for the DØ Experiment. In this paper we describe the basic components of the job monitoring software.
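    The abstract describes a monitor that tracks the job and the worker node in parallel while the job runs and classifies failures. The sketch below is a minimal, hypothetical illustration of that polling-and-classification pattern in Python; the error categories, patterns and polling interval are assumptions for illustration and are not taken from the DØ/LCG system itself.

```python
import re
import subprocess
import time

# Hypothetical error categories; the real JMS classification scheme is not
# specified in the abstract, so these patterns are illustrative only.
ERROR_PATTERNS = {
    "missing_input": re.compile(r"no such file|input file not found", re.I),
    "out_of_memory": re.compile(r"MemoryError|cannot allocate memory", re.I),
    "network": re.compile(r"connection refused|timed out", re.I),
}

def classify(log_text):
    """Return the first matching error category, or 'unknown'."""
    for category, pattern in ERROR_PATTERNS.items():
        if pattern.search(log_text):
            return category
    return "unknown"

def monitor(job, logfile, interval=30):
    """Poll a running job and the worker node in parallel with the job.

    `job` is a subprocess.Popen handle; `logfile` is the job's log path.
    """
    while job.poll() is None:
        # Worker-node status (Linux load average) reported alongside the job.
        load = open("/proc/loadavg").read().split()[0]
        print(f"job running, node load average: {load}")
        time.sleep(interval)
    if job.returncode != 0:
        log_text = open(logfile, errors="replace").read()
        print(f"job failed (rc={job.returncode}), error class: {classify(log_text)}")

if __name__ == "__main__":
    # Example: run and monitor a trivial failing job.
    with open("job.log", "w") as log:
        proc = subprocess.Popen(["sh", "-c", "echo 'input file not found'; exit 1"],
                                stdout=log, stderr=subprocess.STDOUT)
    monitor(proc, "job.log", interval=1)
```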

    Running a Production Grid Site at the London e-Science Centre

    This paper describes how the London e-Science Centre cluster MARS, a production 400+ Opteron CPU cluster, was integrated into the production Large Hadron Collider Compute Grid. It describes the practical issues that we encountered when deploying and maintaining this system, and details the techniques that were applied to resolve them. Finally, we provide a set of recommendations based on our experiences for grid software development in general that we believe would make the technology more accessible. © 2006 IEEE

    Distributed Computing Grid Experiences in CMS

    The CMS experiment is currently developing a computing system capable of serving, processing and archiving the large number of events that will be generated when the CMS detector starts taking data. During 2004 CMS undertook a large scale data challenge to demonstrate the ability of the CMS computing system to cope with a sustained data-taking rate equivalent to 25% of the startup rate. Its goals were: to run CMS event reconstruction at CERN for a sustained period at a 25 Hz input rate; to distribute the data to several regional centers; and to enable data access at those centers for analysis. Grid middleware was utilized to help complete all aspects of the challenge. To continue to provide scalable access from anywhere in the world to the data, CMS is developing a layer of software that uses Grid tools to gain access to data and resources, and that aims to provide physicists with a user friendly interface for submitting their analysis jobs. This paper describes the data challenge experience with Grid infrastructure and the current development of the CMS analysis system.
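    As a rough sense of the scale implied by the sustained 25 Hz reconstruction rate, the short calculation below converts that rate into events per day and an approximate daily data volume. The per-event size used here is an illustrative assumption, not a figure from the paper.

```python
# Back-of-the-envelope scale of the 2004 CMS data challenge.
rate_hz = 25                      # sustained reconstruction input rate (from the abstract)
seconds_per_day = 24 * 3600
events_per_day = rate_hz * seconds_per_day
print(f"events/day at 25 Hz: {events_per_day:,}")          # 2,160,000

# Assumed raw-event size of ~1.5 MB, purely illustrative.
event_size_mb = 1.5
print(f"approx. data volume/day: {events_per_day * event_size_mb / 1e6:.1f} TB")
```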

    HEP Applications Evaluation of the EDG Testbed and Middleware

    Workpackage 8 of the European Datagrid project was formed in January 2001 with representatives from the four LHC experiments, and with experiment-independent people from five of the six main EDG partners. In September 2002 WP8 was strengthened by the addition of effort from BaBar and D0. The original mandate of WP8 was, following the definition of short- and long-term requirements, to port experiment software to the EDG middleware and testbed environment. A major additional activity has been testing the basic functionality and performance of this environment. This paper reviews experiences and evaluations in the areas of job submission, data management, mass storage handling, information systems and monitoring. It also comments on the problems of remote debugging, the portability of code, and scaling problems with increasing numbers of jobs, sites and nodes. Reference is made to the pioneering work of Atlas and CMS in integrating the use of the EDG Testbed into their data challenges. A forward look is made to essential software developments within EDG and to the necessary cooperation between EDG and LCG for the LCG prototype due in mid-2003. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics Conference (CHEP03), La Jolla, CA, USA, March 2003, 7 pages. PSN THCT00

    CMS Monte Carlo production in the WLCG computing Grid

    Monte Carlo production in CMS has received a major boost in performance and scale since the previous CHEP06 conference. The production system has been re-engineered in order to incorporate the experience gained in running the previous system and to integrate production with the new CMS event data model, data management system and data processing framework. The system is interfaced to the two major computing Grids used by CMS, the LHC Computing Grid (LCG) and the Open Science Grid (OSG). Operational experience and integration aspects of the new CMS Monte Carlo production system are presented together with an analysis of production statistics. The new system automatically handles job submission, resource monitoring, job queuing, job distribution according to the available resources, data merging, and registration of data into the data bookkeeping, data location, and data transfer and placement systems. Compared to the previous production system, automation, reliability and performance have been considerably improved. A more efficient use of computing resources and better handling of the inherent Grid unreliability have resulted in an increase of production scale by about an order of magnitude, with the system capable of running on the order of ten thousand jobs in parallel and yielding more than two million events per day.
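    The abstract mentions distributing jobs according to the available resources. The sketch below shows one simple way such a scheduling step could look: a greedy assignment of pending production jobs to the sites reporting the most free slots. The site names, job identifiers and the greedy policy are assumptions for illustration, not the actual CMS production system logic.

```python
from collections import defaultdict

def distribute(jobs, free_slots):
    """Assign pending jobs to sites according to their free CPU slots.

    `jobs` is a list of job identifiers; `free_slots` maps a site name to
    the number of idle slots reported by site monitoring.
    """
    assignment = defaultdict(list)
    capacity = dict(free_slots)
    for job in jobs:
        site = max(capacity, key=capacity.get)   # site with most free slots
        if capacity[site] <= 0:
            break                                # no free resources: jobs stay queued
        assignment[site].append(job)
        capacity[site] -= 1
    return assignment

if __name__ == "__main__":
    # Hypothetical site capacities and a batch of pending production jobs.
    sites = {"T2_Example_A": 120, "T2_Example_B": 40, "T1_Example_C": 300}
    pending = [f"mc_prod_{i:05d}" for i in range(400)]
    for site, assigned in distribute(pending, sites).items():
        print(site, len(assigned))
```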

    Grid applications for the BaBar experiment

    This paper discusses the use of the e-Science Grid in providing computational resources for modern international High Energy Physics (HEP) experiments. We investigate the suitability of the current generation of Grid software to provide the necessary resources to perform large-scale simulation of the experiment and analysis of data in the context of a multinational collaboration.

    CMS Software Distribution on the LCG and OSG Grids

    The efficient exploitation of worldwide distributed storage and computing resources available in the grids requires a robust, transparent and fast deployment of experiment-specific software. The approach followed by the CMS experiment at CERN in order to enable Monte Carlo simulations, data analysis and software development in an international collaboration is presented. The current status and future improvement plans are described. Comment: 4 pages, 1 figure, LaTeX with hyperref

    Performance of R-GMA for monitoring grid jobs for CMS data production

    High energy physics experiments, such as the Compact Muon Solenoid (CMS) at the CERN laboratory in Geneva, have large-scale data processing requirements, with data accumulating at a rate of 1 Gbyte/s. This load comfortably exceeds any previous processing requirements and we believe it may be most efficiently satisfied through grid computing. Furthermore, the production of large quantities of Monte Carlo simulated data provides an ideal test bed for grid technologies and will drive their development. One important challenge when using the grid for data analysis is the ability to transparently monitor the large number of jobs that are being executed simultaneously at multiple remote sites. R-GMA is a monitoring and information management service for distributed resources based on the grid monitoring architecture of the Global Grid Forum. We have previously developed a system allowing us to test its performance under a heavy load while using few real grid resources. We present the latest results on this system running on the LCG 2 grid test bed using the LCG 2.6.0 middleware release. For a sustained load equivalent to 7 generations of 1000 simultaneous jobs, R-GMA was able to transfer all published messages and store them in a database for 98% of the individual jobs. The failures experienced were at the remote sites, rather than at the archiver's MON box as had been expected.
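    R-GMA follows the producer/consumer model of the Grid Monitoring Architecture: jobs publish monitoring tuples and an archiver consumer stores them in a database. The mock below illustrates that flow with an in-memory queue standing in for the transport and SQLite standing in for the archiver's database; it does not use the real R-GMA API, and all names here are assumptions for illustration.

```python
import queue
import sqlite3

# In-memory stand-in for the monitoring transport: producers publish tuples,
# an archiver consumer drains them into a database (the "MON box" role).
bus = queue.Queue()

def publish(job_id, status):
    """Producer side: a grid job publishes a monitoring tuple."""
    bus.put((job_id, status))

def archive(db):
    """Consumer/archiver side: drain all published tuples into SQLite."""
    cur = db.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS job_status (job_id TEXT, status TEXT)")
    while not bus.empty():
        cur.execute("INSERT INTO job_status VALUES (?, ?)", bus.get())
    db.commit()

if __name__ == "__main__":
    # Simulate 1000 simultaneous jobs each publishing a start and an end message,
    # loosely mirroring the load pattern described above.
    for i in range(1000):
        publish(f"job_{i:04d}", "RUNNING")
        publish(f"job_{i:04d}", "DONE")
    db = sqlite3.connect(":memory:")
    archive(db)
    n = db.execute("SELECT COUNT(*) FROM job_status").fetchone()[0]
    print(f"archived {n} monitoring tuples")   # expect 2000
```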

    Installing, Running and Maintaining Large Linux Clusters at CERN

    Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC scale computing challenges: scalability, automation, hardware diversity, security, and rolling OS upgrades. This paper describes the tools and processes we have implemented, working in close collaboration with the EDG project [1], especially with the WP4 subtask, to improve the manageability of our clusters, in particular in the areas of system installation, configuration, and monitoring. In addition to the purely technical issues, providing shared interactive and batch services which can adapt to meet the diverse and changing requirements of our users is a significant challenge. We describe the developments and tuning that we have introduced on our LSF based systems to maximise both responsiveness to users and overall system utilisation. Finally, this paper describes the problems we are facing in enlarging our heterogeneous Linux clusters, the progress we have made in dealing with the current issues and the steps we are taking to gridify the clusters. Comment: 5 pages, Proceedings for the CHEP 2003 conference, La Jolla, California, March 24-28, 2003
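    One of the challenges named above is performing rolling OS upgrades while keeping most of a 1000+ node cluster in production. The sketch below shows the general batch-by-batch pattern such an upgrade follows: drain a slice of nodes, upgrade them, re-enable them, and move on. The node names and the drain/upgrade/enable hooks are placeholders, not CERN's actual tooling.

```python
def rolling_upgrade(nodes, batch_size, drain, upgrade, enable):
    """Upgrade a cluster in batches so most nodes stay in production.

    `drain`, `upgrade` and `enable` are callables taking a node name; in a
    real setup they would close the node's batch queues, reinstall or patch
    the OS, and reopen the queues. They are injected here so the example
    stays runnable without cluster access.
    """
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            drain(node)
        for node in batch:
            upgrade(node)
        for node in batch:
            enable(node)

if __name__ == "__main__":
    cluster = [f"lxb{i:04d}" for i in range(1, 13)]   # a dozen hypothetical nodes
    rolling_upgrade(
        cluster, batch_size=4,
        drain=lambda n: print(f"drain  {n}"),
        upgrade=lambda n: print(f"patch  {n}"),
        enable=lambda n: print(f"enable {n}"),
    )
```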