    A job monitoring system for the LCG computing grid

    Experience with generating simulation data for high energy physics experiments has shown that a job monitoring system (JMS) is essential for understanding job failures within the Grid. Such a system can give information about the status of the user job as well as the worker node while the job is running. It should support the user directly by allowing interaction with the running job and should be able to perform automatic error correction. Furthermore, such a system can be extended to classify errors automatically, which can improve the stability and performance of the Grid environment. To increase the acceptance of the Grid, a graphical user interface (GUI) has been developed and integrated with the job monitoring system. Both components are currently integrated in the computing environment for generating data for the DØ Experiment. In this paper we describe the basic components of the job monitoring software.
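    The abstract describes a monitor that tracks the job and the worker node in parallel while the job runs and classifies failures. The sketch below is a minimal, hypothetical illustration of that polling-and-classification pattern in Python; the error categories, patterns and polling interval are assumptions for illustration and are not taken from the DØ/LCG system itself.

```python
import re
import subprocess
import time

# Hypothetical error categories; the real JMS classification scheme is not
# specified in the abstract, so these patterns are illustrative only.
ERROR_PATTERNS = {
    "missing_input": re.compile(r"no such file|input file not found", re.I),
    "out_of_memory": re.compile(r"MemoryError|cannot allocate memory", re.I),
    "network": re.compile(r"connection refused|timed out", re.I),
}

def classify(log_text):
    """Return the first matching error category, or 'unknown'."""
    for category, pattern in ERROR_PATTERNS.items():
        if pattern.search(log_text):
            return category
    return "unknown"

def monitor(job, logfile, interval=30):
    """Poll a running job and the worker node in parallel with the job.

    `job` is a subprocess.Popen handle; `logfile` is the job's log path.
    """
    while job.poll() is None:
        # Worker-node status (Linux load average) reported alongside the job.
        load = open("/proc/loadavg").read().split()[0]
        print(f"job running, node load average: {load}")
        time.sleep(interval)
    if job.returncode != 0:
        log_text = open(logfile, errors="replace").read()
        print(f"job failed (rc={job.returncode}), error class: {classify(log_text)}")

if __name__ == "__main__":
    # Example: run and monitor a trivial failing job.
    with open("job.log", "w") as log:
        proc = subprocess.Popen(["sh", "-c", "echo 'input file not found'; exit 1"],
                                stdout=log, stderr=subprocess.STDOUT)
    monitor(proc, "job.log", interval=1)
```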

    Running a Production Grid Site at the London e-Science Centre

    This paper describes how the London e-Science Centre cluster MARS, a production 400+ Opteron CPU cluster, was integrated into the production Large Hadron Collider Compute Grid. It describes the practical issues that we encountered when deploying and maintaining this system, and details the techniques that were applied to resolve them. Finally, we provide a set of recommendations based on our experiences for grid software development in general that we believe would make the technology more accessible. © 2006 IEEE

    Distributed Computing Grid Experiences in CMS

    The CMS experiment is currently developing a computing system capable of serving, processing and archiving the large number of events that will be generated when the CMS detector starts taking data. During 2004 CMS undertook a large scale data challenge to demonstrate the ability of the CMS computing system to cope with a sustained data-taking rate equivalent to 25% of the startup rate. Its goals were: to run CMS event reconstruction at CERN for a sustained period at a 25 Hz input rate; to distribute the data to several regional centers; and to enable data access at those centers for analysis. Grid middleware was utilized to help complete all aspects of the challenge. To continue to provide scalable access from anywhere in the world to the data, CMS is developing a layer of software that uses Grid tools to gain access to data and resources, and that aims to provide physicists with a user friendly interface for submitting their analysis jobs. This paper describes the data challenge experience with Grid infrastructure and the current development of the CMS analysis system.
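    As a rough sense of the scale implied by the sustained 25 Hz reconstruction rate, the short calculation below converts that rate into events per day and an approximate daily data volume. The per-event size used here is an illustrative assumption, not a figure from the paper.

```python
# Back-of-the-envelope scale of the 2004 CMS data challenge.
rate_hz = 25                      # sustained reconstruction input rate (from the abstract)
seconds_per_day = 24 * 3600
events_per_day = rate_hz * seconds_per_day
print(f"events/day at 25 Hz: {events_per_day:,}")          # 2,160,000

# Assumed raw-event size of ~1.5 MB, purely illustrative.
event_size_mb = 1.5
print(f"approx. data volume/day: {events_per_day * event_size_mb / 1e6:.1f} TB")
```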

    HEP Applications Evaluation of the EDG Testbed and Middleware

    Workpackage 8 of the European Datagrid project was formed in January 2001 with representatives from the four LHC experiments, and with experiment-independent people from five of the six main EDG partners. In September 2002 WP8 was strengthened by the addition of effort from BaBar and D0. The original mandate of WP8 was, following the definition of short- and long-term requirements, to port experiment software to the EDG middleware and testbed environment. A major additional activity has been testing the basic functionality and performance of this environment. This paper reviews experiences and evaluations in the areas of job submission, data management, mass storage handling, information systems and monitoring. It also comments on the problems of remote debugging, the portability of code, and scaling problems with increasing numbers of jobs, sites and nodes. Reference is made to the pioneering work of Atlas and CMS in integrating the use of the EDG Testbed into their data challenges. A forward look is made to essential software developments within EDG and to the necessary cooperation between EDG and LCG for the LCG prototype due in mid-2003. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics Conference (CHEP03), La Jolla, CA, USA, March 2003, 7 pages. PSN THCT00

    CMS Monte Carlo production in the WLCG computing Grid

    Monte Carlo production in CMS has received a major boost in performance and scale since the previous CHEP06 conference. The production system has been re-engineered in order to incorporate the experience gained in running the previous system and to integrate production with the new CMS event data model, data management system and data processing framework. The system is interfaced to the two major computing Grids used by CMS, the LHC Computing Grid (LCG) and the Open Science Grid (OSG). Operational experience and integration aspects of the new CMS Monte Carlo production system are presented together with an analysis of production statistics. The new system automatically handles job submission, resource monitoring, job queuing, job distribution according to the available resources, data merging, and registration of data into the data bookkeeping, data location, and data transfer and placement systems. Compared to the previous production system, automation, reliability and performance have been considerably improved. A more efficient use of computing resources and better handling of the inherent Grid unreliability have resulted in an increase of production scale by about an order of magnitude, with the system capable of running on the order of ten thousand jobs in parallel and yielding more than two million events per day.
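    The abstract mentions distributing jobs according to the available resources. The sketch below shows one simple way such a scheduling step could look: a greedy assignment of pending production jobs to the sites reporting the most free slots. The site names, job identifiers and the greedy policy are assumptions for illustration, not the actual CMS production system logic.

```python
from collections import defaultdict

def distribute(jobs, free_slots):
    """Assign pending jobs to sites according to their free CPU slots.

    `jobs` is a list of job identifiers; `free_slots` maps a site name to
    the number of idle slots reported by site monitoring.
    """
    assignment = defaultdict(list)
    capacity = dict(free_slots)
    for job in jobs:
        site = max(capacity, key=capacity.get)   # site with most free slots
        if capacity[site] <= 0:
            break                                # no free resources: jobs stay queued
        assignment[site].append(job)
        capacity[site] -= 1
    return assignment

if __name__ == "__main__":
    # Hypothetical site capacities and a batch of pending production jobs.
    sites = {"T2_Example_A": 120, "T2_Example_B": 40, "T1_Example_C": 300}
    pending = [f"mc_prod_{i:05d}" for i in range(400)]
    for site, assigned in distribute(pending, sites).items():
        print(site, len(assigned))
```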

    Grid applications for the BaBar experiment

    This paper discusses the use of the e-Science Grid in providing computational resources for modern international High Energy Physics (HEP) experiments. We investigate the suitability of the current generation of Grid software to provide the necessary resources to perform large-scale simulation of the experiment and analysis of data in the context of a multinational collaboration.

    CMS Software Distribution on the LCG and OSG Grids

    The efficient exploitation of worldwide distributed storage and computing resources available in the grids requires a robust, transparent and fast deployment of experiment-specific software. The approach followed by the CMS experiment at CERN in order to enable Monte Carlo simulations, data analysis and software development in an international collaboration is presented. The current status and future improvement plans are described. Comment: 4 pages, 1 figure, LaTeX with hyperref

    Performance of R-GMA for monitoring grid jobs for CMS data production

    High energy physics experiments, such as the Compact Muon Solenoid (CMS) at the CERN laboratory in Geneva, have large-scale data processing requirements, with data accumulating at a rate of 1 Gbyte/s. This load comfortably exceeds any previous processing requirements and we believe it may be most efficiently satisfied through grid computing. Furthermore, the production of large quantities of Monte Carlo simulated data provides an ideal test bed for grid technologies and will drive their development. One important challenge when using the grid for data analysis is the ability to transparently monitor the large number of jobs that are being executed simultaneously at multiple remote sites. R-GMA is a monitoring and information management service for distributed resources based on the grid monitoring architecture of the Global Grid Forum. We have previously developed a system allowing us to test its performance under a heavy load while using few real grid resources. We present the latest results on this system running on the LCG 2 grid test bed using the LCG 2.6.0 middleware release. For a sustained load equivalent to 7 generations of 1000 simultaneous jobs, R-GMA was able to transfer all published messages and store them in a database for 98% of the individual jobs. The failures experienced were at the remote sites, rather than at the archiver's MON box as had been expected.
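    R-GMA follows the producer/consumer model of the Grid Monitoring Architecture: jobs publish monitoring tuples and an archiver consumer stores them in a database. The mock below illustrates that flow with an in-memory queue standing in for the transport and SQLite standing in for the archiver's database; it does not use the real R-GMA API, and all names here are assumptions for illustration.

```python
import queue
import sqlite3

# In-memory stand-in for the monitoring transport: producers publish tuples,
# an archiver consumer drains them into a database (the "MON box" role).
bus = queue.Queue()

def publish(job_id, status):
    """Producer side: a grid job publishes a monitoring tuple."""
    bus.put((job_id, status))

def archive(db):
    """Consumer/archiver side: drain all published tuples into SQLite."""
    cur = db.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS job_status (job_id TEXT, status TEXT)")
    while not bus.empty():
        cur.execute("INSERT INTO job_status VALUES (?, ?)", bus.get())
    db.commit()

if __name__ == "__main__":
    # Simulate 1000 simultaneous jobs each publishing a start and an end message,
    # loosely mirroring the load pattern described above.
    for i in range(1000):
        publish(f"job_{i:04d}", "RUNNING")
        publish(f"job_{i:04d}", "DONE")
    db = sqlite3.connect(":memory:")
    archive(db)
    n = db.execute("SELECT COUNT(*) FROM job_status").fetchone()[0]
    print(f"archived {n} monitoring tuples")   # expect 2000
```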

    Installing, Running and Maintaining Large Linux Clusters at CERN

    Having built up Linux clusters to more than 1000 nodes over the past five years, we already have practical experience confronting some of the LHC scale computing challenges: scalability, automation, hardware diversity, security, and rolling OS upgrades. This paper describes the tools and processes we have implemented, working in close collaboration with the EDG project [1], especially with the WP4 subtask, to improve the manageability of our clusters, in particular in the areas of system installation, configuration, and monitoring. In addition to the purely technical issues, providing shared interactive and batch services which can adapt to meet the diverse and changing requirements of our users is a significant challenge. We describe the developments and tuning that we have introduced on our LSF based systems to maximise both responsiveness to users and overall system utilisation. Finally, this paper describes the problems we are facing in enlarging our heterogeneous Linux clusters, the progress we have made in dealing with the current issues and the steps we are taking to gridify the clusters. Comment: 5 pages, Proceedings for the CHEP 2003 conference, La Jolla, California, March 24-28, 2003
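    One of the challenges named above is performing rolling OS upgrades while keeping most of a 1000+ node cluster in production. The sketch below shows the general batch-by-batch pattern such an upgrade follows: drain a slice of nodes, upgrade them, re-enable them, and move on. The node names and the drain/upgrade/enable hooks are placeholders, not CERN's actual tooling.

```python
def rolling_upgrade(nodes, batch_size, drain, upgrade, enable):
    """Upgrade a cluster in batches so most nodes stay in production.

    `drain`, `upgrade` and `enable` are callables taking a node name; in a
    real setup they would close the node's batch queues, reinstall or patch
    the OS, and reopen the queues. They are injected here so the example
    stays runnable without cluster access.
    """
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            drain(node)
        for node in batch:
            upgrade(node)
        for node in batch:
            enable(node)

if __name__ == "__main__":
    cluster = [f"lxb{i:04d}" for i in range(1, 13)]   # a dozen hypothetical nodes
    rolling_upgrade(
        cluster, batch_size=4,
        drain=lambda n: print(f"drain  {n}"),
        upgrade=lambda n: print(f"patch  {n}"),
        enable=lambda n: print(f"enable {n}"),
    )
```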