
    Libra: An Economy driven Job Scheduling System for Clusters

    Clusters of computers have emerged as mainstream parallel and distributed platforms for high-performance, high-throughput and high-availability computing. To enable effective resource management on clusters, numerous cluster management systems and schedulers have been designed. However, their focus has essentially been on maximizing CPU performance rather than on improving the value of utility delivered to the user and the quality of service. This paper presents a new computational-economy-driven scheduling system called Libra, which has been designed to support allocation of resources based on the users' quality of service (QoS) requirements. It is intended to work as an add-on to an existing queuing and resource management system. The first version has been implemented as a plugin scheduler to the PBS (Portable Batch System). The scheduler offers a market-based, economy-driven service for managing batch jobs on clusters by scheduling CPU time according to user utility, as determined by budget and deadline, rather than by system performance considerations. The Libra scheduler ensures that both constraints are met within an O(n) run-time. The Libra scheduler has been simulated using the GridSim toolkit to carry out a detailed performance analysis. Results show that the deadline- and budget-based proportional resource allocation strategy improves the utility of the system and user satisfaction as compared to system-centric scheduling strategies. Comment: 13 pages
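
    The deadline-and-budget-driven allocation the abstract describes can be pictured with a minimal sketch: a job is admitted only if the CPU share needed to meet its deadline is free and the estimated cost fits the budget. The names, cost model and admission test below are assumptions for illustration, not the actual Libra/PBS plugin code.

```python
# Minimal, hypothetical sketch of a Libra-style admission test. Each check is
# constant time, so scheduling n jobs stays O(n) overall. Not the real Libra code.

from dataclasses import dataclass


@dataclass
class Job:
    runtime_estimate: float  # CPU-seconds the job needs on one full CPU
    deadline: float          # seconds from now by which it must complete
    budget: float            # currency units the user is willing to pay


def required_share(job: Job) -> float:
    """Fraction of one CPU needed to finish exactly at the deadline."""
    return job.runtime_estimate / job.deadline


def admit(job: Job, free_share: float, price_per_cpu_second: float) -> bool:
    """Admit only if both the deadline and the budget constraints can be met."""
    cost = job.runtime_estimate * price_per_cpu_second
    return required_share(job) <= free_share and cost <= job.budget


# A 600 s job with a 1200 s deadline needs a 0.5 CPU share.
job = Job(runtime_estimate=600, deadline=1200, budget=10.0)
print(admit(job, free_share=0.7, price_per_cpu_second=0.01))  # True: share 0.5, cost 6.0
```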

    Dependable Distributed Computing for the International Telecommunication Union Regional Radio Conference RRC06

    The International Telecommunication Union (ITU) Regional Radio Conference (RRC06) established in 2006 a new frequency plan for the introduction of digital broadcasting in European, African, Arab, CIS countries and Iran. The preparation of the plan involved complex calculations under a short deadline and required dependable and efficient computing capability. The ITU designed and deployed in situ a dedicated PC farm; in parallel, the European Organization for Nuclear Research (CERN) provided and supported a system based on the EGEE Grid. The planning cycle at the RRC06 required the periodic execution of on the order of 200,000 short jobs, consuming several hundred CPU hours, within a period of less than 12 hours. The nature of the problem required dynamic workload balancing and low-latency access to the computing resources. We present the strategy and key technical choices that delivered a reliable service to the RRC06.
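
    The dynamic workload balancing mentioned here is, in essence, a pull model: idle workers fetch the next short job from a shared queue, so load evens out and no single resource becomes a bottleneck. A toy sketch of that pattern follows; it is illustrative only and not the RRC06 production setup.

```python
# Toy pull-model dispatcher: workers pull short jobs from a shared queue as
# soon as they are free, which balances load dynamically. Purely illustrative;
# the RRC06 service used a dedicated PC farm and the EGEE Grid, not this code.

import queue
import threading

NUM_WORKERS = 8
NUM_JOBS = 1000  # stand-in for one batch of short planning jobs

jobs: "queue.Queue[int]" = queue.Queue()
results: list = []
lock = threading.Lock()


def worker() -> None:
    while True:
        try:
            job_id = jobs.get_nowait()
        except queue.Empty:
            return
        outcome = job_id * job_id  # placeholder for the real calculation
        with lock:
            results.append((job_id, outcome))
        jobs.task_done()


for job_id in range(NUM_JOBS):
    jobs.put(job_id)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"completed {len(results)} of {NUM_JOBS} jobs")
```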

    Diet-ethic: Fair Scheduling of Optional Computations in GridRPC Middleware

    Most HPC platforms require users to submit a pre-determined number of computation requests (also called jobs). Unfortunately, this is cumbersome when some of the computations are optional, i.e., they are not critical, but their completion would improve results. For example, given a deadline, the number of requests to submit for a Monte Carlo experiment is difficult to choose: the more requests are completed, the better the results are, but submitting too many might overload the platform. Conversely, submitting too few requests may leave resources unused and miss an opportunity to improve the results. This paper introduces and solves the problem of scheduling optional computations. An architecture which auto-tunes the number of requests is proposed, then implemented in the DIET GridRPC middleware. Experiments on real platforms, such as Grid'5000, show that several metrics are improved, such as user satisfaction, fairness and the number of completed requests. Moreover, the solution is shown to be scalable.
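
    The auto-tuning idea can be sketched as a loop that always submits the mandatory requests and then keeps adding optional ones while the deadline and spare platform capacity allow it. The functions below (submit_request, platform_load) are hypothetical stand-ins introduced for illustration, not DIET APIs.

```python
# Hypothetical sketch of auto-tuning the number of optional requests:
# mandatory requests are always submitted; optional ones are added only while
# the deadline has not passed and the platform still has spare capacity.
# submit_request() and platform_load() are illustrative stand-ins, not DIET calls.

import random
import time


def platform_load() -> float:
    """Pretend load probe returning utilisation in [0, 1]."""
    return random.uniform(0.0, 1.0)


def submit_request(request_id: int) -> None:
    """Pretend submission of one computation request."""
    time.sleep(0.005)


def run_experiment(deadline_s: float, mandatory: int, load_threshold: float = 0.8) -> int:
    start = time.monotonic()
    submitted = 0
    for _ in range(mandatory):                 # critical requests: always submitted
        submit_request(submitted)
        submitted += 1
    while (time.monotonic() - start) < deadline_s and platform_load() < load_threshold:
        submit_request(submitted)              # optional request: improves the result
        submitted += 1
    return submitted


print(run_experiment(deadline_s=0.25, mandatory=10))
```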

    Bulk Scheduling with the DIANA Scheduler

    Results from the research and development of a Data Intensive and Network Aware (DIANA) scheduling engine, to be used primarily for data-intensive sciences such as physics analysis, are described. In Grid analyses, tasks can involve thousands of computing, data handling, and network resources. The central problem in the scheduling of these resources is the coordinated management of computation and data at multiple locations, not just data replication or movement. However, this can prove to be a rather costly operation, and efficient scheduling can be a challenge if compute and data resources are mapped without considering network costs. We have implemented an adaptive algorithm within the so-called DIANA Scheduler which takes into account data location and size, network performance and computation capability in order to enable efficient global scheduling. DIANA is a performance-aware and economy-guided meta-scheduler. It iteratively allocates each job to the site that is most likely to produce the best performance, while also optimizing the global queue for any remaining jobs. It is therefore equally suitable whether a single job is being submitted or bulk scheduling is being performed. Results indicate that considerable performance improvements can be gained by adopting the DIANA scheduling approach. Comment: 12 pages, 11 figures. To be published in the IEEE Transactions on Nuclear Science, IEEE Press. 200
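
    One way to picture the data-, network- and computation-aware allocation described here is as a per-job cost estimate over candidate sites, choosing the minimum. The fields, weights and cost terms below are assumptions for illustration, not the published DIANA algorithm.

```python
# Illustrative cost model in the spirit of data/network/compute-aware scheduling:
# pick the site with the lowest estimated completion time. Field names and the
# cost formula are assumptions, not the actual DIANA implementation.

from dataclasses import dataclass


@dataclass
class Site:
    name: str
    local_data_bytes: int        # input data already present at the site
    bandwidth_bytes_per_s: float # transfer rate from the data source
    compute_rate: float          # abstract work units processed per second
    queue_wait_s: float          # current expected queueing delay


def estimated_cost(site: Site, total_data_bytes: int, work_units: float) -> float:
    transfer_s = (total_data_bytes - site.local_data_bytes) / site.bandwidth_bytes_per_s
    compute_s = work_units / site.compute_rate
    return transfer_s + compute_s + site.queue_wait_s


def best_site(sites: list, total_data_bytes: int, work_units: float) -> Site:
    return min(sites, key=lambda s: estimated_cost(s, total_data_bytes, work_units))


sites = [
    Site("A", local_data_bytes=8_000_000_000, bandwidth_bytes_per_s=1e9,
         compute_rate=50, queue_wait_s=120),
    Site("B", local_data_bytes=0, bandwidth_bytes_per_s=1e10,
         compute_rate=80, queue_wait_s=10),
]
# Site A: 2 s transfer + 80 s compute + 120 s wait = 202 s; Site B: 1 + 50 + 10 = 61 s.
print(best_site(sites, total_data_bytes=10_000_000_000, work_units=4000).name)  # "B"
```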