Scalability tests of R-GMA-based grid job monitoring system for CMS Monte Carlo data production
Copyright © 2004 IEEE. High-energy physics experiments, such as the Compact Muon Solenoid (CMS) at the Large Hadron Collider (LHC), have large-scale data-processing requirements, and the grid has been chosen to meet them. One important challenge when using the grid for large-scale data processing is monitoring the large number of jobs executing simultaneously at multiple remote sites. The Relational Grid Monitoring Architecture (R-GMA) is a monitoring and information management service for distributed resources based on the Grid Monitoring Architecture (GMA) of the Global Grid Forum. We report on the first measurements of R-GMA as part of a monitoring architecture for batch submission of multiple Monte Carlo simulation jobs running on a CMS-specific LHC Computing Grid test bed. Monitoring information was transferred in real time from remote execution nodes back to the submitting host and stored in a database. In scalability tests, the job submission rates supported by successive releases of R-GMA improved significantly, approaching the rate expected in full-scale production.
A study of publish/subscribe systems for real-time grid monitoring
Monitoring and controlling a large number of geographically distributed scientific instruments is a challenging task. Some operations on these instruments require a real-time (or quasi-real-time) response, which makes the task even more difficult. In this paper, we describe the requirements of distributed monitoring for a possible future electrical power grid based on real-time extensions to grid computing. We examine several standards and publish/subscribe middleware candidates, some of which were specifically designed and developed for grid monitoring. We analyze their architecture and functionality and discuss their advantages and disadvantages. We report on a series of tests measuring their real-time performance and scalability.
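The publish/subscribe pattern evaluated in this study can be illustrated with a minimal in-process, topic-based broker. This is a hypothetical sketch, not any of the middleware candidates examined in the paper: the `Broker` class, the topic names, and the latency measurement via `time.perf_counter` are assumptions for illustration. Real middleware adds network transport, delivery guarantees, and content filtering.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the topic name and the message payload.
Handler = Callable[[str, dict], None]


class Broker:
    """Minimal in-process, topic-based publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        # Map each topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register handler to receive every message published on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> int:
        """Deliver payload synchronously to each subscriber; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


# Usage: one subscriber records the publish-to-delivery latency of each reading.
latencies: List[float] = []
readings: List[dict] = []


def on_reading(topic: str, payload: dict) -> None:
    latencies.append(time.perf_counter() - payload["sent_at"])
    readings.append(payload)


broker = Broker()
broker.subscribe("instrument/voltage", on_reading)
delivered = broker.publish(
    "instrument/voltage",
    {"sensor": "s1", "value": 229.8, "sent_at": time.perf_counter()},
)
```

Because delivery here is synchronous, the measured latency is only the in-process dispatch cost; a real-time evaluation would measure it across the network between separate hosts.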
Performance of R-GMA for monitoring grid jobs for CMS data production
High-energy physics experiments, such as the Compact Muon Solenoid (CMS) at the CERN laboratory in Geneva, have large-scale data-processing requirements, with data accumulating at a rate of 1 Gbyte/s. This load comfortably exceeds any previous processing requirement, and we believe it may be most efficiently satisfied through grid computing. Furthermore, the production of large quantities of Monte Carlo simulated data provides an ideal test bed for grid technologies and will drive their development. One important challenge when using the grid for data analysis is the ability to transparently monitor the large number of jobs executing simultaneously at multiple remote sites. R-GMA is a monitoring and information management service for distributed resources based on the Grid Monitoring Architecture of the Global Grid Forum. We have previously developed a system that allows us to test its performance under heavy load while using few real grid resources. We present the latest results for this system running on the LCG 2 grid test bed with the LCG 2.6.0 middleware release. For a sustained load equivalent to 7 generations of 1000 simultaneous jobs, R-GMA transferred all published messages and stored them in a database for 98% of the individual jobs. The failures occurred at the remote sites rather than, as had been expected, at the archiver's MON box.
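The completeness figure quoted above (all published messages stored for 98% of jobs) is a per-job metric. A minimal sketch of how such a metric might be computed, assuming a hypothetical archive table `job_messages` holding a job identifier and a per-job message sequence number; this is not R-GMA's API or the authors' test harness, and Python's built-in `sqlite3` stands in for the archiver's database.

```python
import sqlite3
from typing import Dict

# Hypothetical archive table: one row per monitoring message stored by the archiver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_messages (job_id TEXT, seq INTEGER)")


def archive(job_id: str, seq: int) -> None:
    """Store one monitoring message, identified by job and per-job sequence number."""
    conn.execute("INSERT INTO job_messages (job_id, seq) VALUES (?, ?)", (job_id, seq))


def complete_fraction(published: Dict[str, int]) -> float:
    """Fraction of jobs whose every published message reached the database.

    `published` maps each job_id to the number of messages that job published.
    """
    if not published:
        return 0.0
    rows = conn.execute(
        "SELECT job_id, COUNT(DISTINCT seq) FROM job_messages GROUP BY job_id"
    ).fetchall()
    stored = dict(rows)  # job_id -> number of distinct messages actually archived
    complete = sum(1 for job, n in published.items() if stored.get(job, 0) == n)
    return complete / len(published)


# Usage: four jobs each publish 3 messages; one message from job "D" is lost.
published = {"A": 3, "B": 3, "C": 3, "D": 3}
for job, n in published.items():
    for seq in range(n):
        if (job, seq) != ("D", 2):
            archive(job, seq)
print(complete_fraction(published))  # 0.75
```

Counting distinct sequence numbers means that a message delivered twice does not mask a lost one from the same job.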
Sharing a conceptual model of grid resources and services
Grid technologies aim to enable coordinated resource sharing and problem solving over local and wide area networks, spanning locations, organizations, machine architectures and software boundaries. The heterogeneity of the resources involved and the need for interoperability among different grid middlewares require a shared common information model. Abstractions of different kinds of resources and services, and conceptual schemas of domain-specific entities, require a collaborative effort to enable coherent cooperation among information services.
In this paper, we present the results of our experience in modelling grid resources and services within the Grid Laboratory Uniform Environment (GLUE) effort, a collaboration of US and EU High Energy Physics projects towards grid interoperability. We present the first implementation-neutral agreement on services, such as batch computing and the storage manager, and on resources, such as the cluster, sub-cluster and host hierarchy and the storage library. Design guidelines and operational results are described, together with open issues and future developments.
Comment: 4 pages, 0 figures, CHEP 200
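The resource hierarchy named in this abstract (cluster, sub-cluster, host) can be illustrated as nested records. A minimal sketch, assuming Python dataclasses: the class names mirror the hierarchy, but the `logical_cpus` attribute, the host names, and the bottom-up capacity sum are hypothetical additions for illustration and are not the GLUE schema's actual attribute set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str
    logical_cpus: int  # illustrative attribute, not a GLUE attribute name


@dataclass
class SubCluster:
    """A group of hosts, assumed here to be homogeneous (illustrative)."""
    name: str
    hosts: List[Host] = field(default_factory=list)

    def total_cpus(self) -> int:
        return sum(h.logical_cpus for h in self.hosts)


@dataclass
class Cluster:
    """Top of the hierarchy: a cluster contains one or more sub-clusters."""
    name: str
    subclusters: List[SubCluster] = field(default_factory=list)

    def total_cpus(self) -> int:
        # Capacity is aggregated bottom-up through the hierarchy.
        return sum(sc.total_cpus() for sc in self.subclusters)


cluster = Cluster(
    "cluster.example.org",
    [
        SubCluster("sc-a", [Host("wn01", 2), Host("wn02", 2)]),
        SubCluster("sc-b", [Host("wn03", 4)]),
    ],
)
print(cluster.total_cpus())  # 8
```

Keeping the model implementation-neutral in this sense means any middleware can populate these records from its own information system, while consumers only depend on the shared structure.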
Developing Resource Usage Service in WLCG
According to the Memorandum of Understanding (MoU) of the Worldwide LHC Computing Grid (WLCG) project, participating sites are required to provide resource usage or accounting data to the Grid Operational Centre (GOC), both to improve understanding of how shared resources are used and to inform more effective resource allocation. As a multi-grid environment, WLCG currently relies on four accounting systems, each developed independently by a constituent grid project. These systems were designed and implemented according to each project's local understanding of the requirements and therefore lack interoperability. To automate the accounting process in WLCG, three transport methods are being introduced for streaming accounting data metered by the heterogeneous accounting systems into the GOC at the Rutherford Appleton Laboratory (RAL) in the UK, where the data are aggregated and accumulated throughout the year. These transport methods, however, were introduced on a per-accounting-system basis, i.e. each targets a particular accounting system, which makes them hard to reuse and to customize for new requirements. This paper presents the design of WLCG-RUS, a standards-compatible system providing a consistent process for streaming resource usage data across various accounting systems while ensuring interoperability, portability and customizability.
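A consistent process across heterogeneous accounting systems amounts to translating each system's native records into one common format before aggregation. A minimal sketch, assuming two hypothetical source formats ("A" and "B") and a hypothetical three-field `UsageRecord`; it reproduces neither the WLCG-RUS interface nor any standard usage-record schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """Common record format (hypothetical fields)."""
    site: str
    user: str
    cpu_hours: float


# Hypothetical adapters: each maps one accounting system's native record to UsageRecord.
def from_system_a(raw: dict) -> UsageRecord:
    # System A is assumed to report CPU time in seconds.
    return UsageRecord(raw["SiteName"], raw["UserDN"], raw["CpuSeconds"] / 3600.0)


def from_system_b(raw: dict) -> UsageRecord:
    # System B is assumed to report CPU time in minutes.
    return UsageRecord(raw["site"], raw["owner"], raw["cpu_min"] / 60.0)


ADAPTERS: Dict[str, Callable[[dict], UsageRecord]] = {
    "A": from_system_a,
    "B": from_system_b,
}


def aggregate(batches: Iterable[Tuple[str, dict]]) -> Dict[str, float]:
    """Sum CPU hours per site across (system_name, raw_record) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    for system, raw in batches:
        record = ADAPTERS[system](raw)  # normalize before aggregating
        totals[record.site] += record.cpu_hours
    return dict(totals)


totals = aggregate([
    ("A", {"SiteName": "RAL", "UserDN": "/CN=alice", "CpuSeconds": 7200}),
    ("B", {"site": "RAL", "owner": "bob", "cpu_min": 90}),
    ("B", {"site": "CERN", "owner": "carol", "cpu_min": 30}),
])
print(totals)  # {'RAL': 3.5, 'CERN': 0.5}
```

In this design, supporting a new accounting system means registering one more adapter, leaving the aggregation step untouched; that is the kind of reuse the per-system transport methods made difficult.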
Job Monitoring in an Interactive Grid Analysis Environment
The grid is emerging as a powerful computational resource, but its dynamic behavior makes the grid environment unpredictable: systems and networks can fail, and the arrival of more users can lead to resource starvation. Once a job has been submitted for execution on the grid, monitoring becomes essential for users to verify that the job completes efficiently and to detect any problems that occur while it is running. In current environments, once users submit a job they lose direct control over it, and the system behaves like a batch system: the user submits the job and later gets a result back. The only information a user can obtain about a job is whether it is scheduled, running, cancelled or finished. Users are increasingly interested in interactive analysis environments in which they can check the progress of a job, obtain intermediate results, terminate the job based on its progress or intermediate results, steer it to other nodes for better performance, and check the resources it has consumed. To meet these interactivity requirements, a mechanism is needed that gives users real-time access to information about the attributes of a job. In this paper we present the design of a Job Monitoring Service, a web service that will provide interactive remote job monitoring by allowing users to access the attributes of a job once it has been submitted to the interactive Grid Analysis Environment.
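The mechanism described in this abstract, in which execution-side updates are made available to the user on demand, can be sketched as a small thread-safe attribute store. This is a minimal in-process sketch under stated assumptions: the `JobMonitor` and `JobInfo` names and the chosen attributes are hypothetical, and a real deployment would expose `report` and `query` as web-service operations rather than Python methods. Job control (termination, steering) is out of scope here.

```python
import threading
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class JobInfo:
    """Attributes a user may query for one job (hypothetical selection)."""
    status: str = "scheduled"  # scheduled, running, cancelled or finished
    progress: float = 0.0  # fraction complete, from 0.0 to 1.0
    cpu_seconds: float = 0.0  # resources consumed so far
    intermediate_result: Optional[str] = None  # location of latest partial output


ALLOWED = {f.name for f in fields(JobInfo)}


class JobMonitor:
    """In-process stand-in for a job monitoring web service (illustrative only)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobInfo] = {}
        self._lock = threading.Lock()  # execution nodes may report concurrently

    def report(self, job_id: str, **attrs: object) -> None:
        """Execution side: update one or more attributes of a job."""
        unknown = set(attrs) - ALLOWED
        if unknown:
            raise ValueError(f"unknown job attributes: {sorted(unknown)}")
        with self._lock:
            info = self._jobs.setdefault(job_id, JobInfo())
            for name, value in attrs.items():
                setattr(info, name, value)

    def query(self, job_id: str, attribute: str) -> object:
        """User side: read one attribute of a submitted job."""
        if attribute not in ALLOWED:
            raise ValueError(f"unknown job attribute: {attribute}")
        with self._lock:
            return getattr(self._jobs[job_id], attribute)


monitor = JobMonitor()
monitor.report("job-42", status="running", progress=0.25, cpu_seconds=120.0)
print(monitor.query("job-42", "status"), monitor.query("job-42", "progress"))
# running 0.25
```

Validating attribute names against the record's declared fields keeps the queryable surface explicit, which matters once the same operations are exposed remotely.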
Polish grid infrastructure for science and research
The structure, functionality, parameters and organization of the computing Grid in Poland are described, mainly from the perspective of the high-energy particle physics community, currently its largest consumer and developer. It constitutes a distributed Tier-2 site within the worldwide Grid infrastructure and also provides services and resources for data-intensive applications in other sciences.
Comment: Proceedings of IEEE Eurocon 2007, Warsaw, Poland, 9-12 Sep. 2007, p.44