Libra: An Economy driven Job Scheduling System for Clusters
Clusters of computers have emerged as mainstream parallel and distributed
platforms for high-performance, high-throughput and high-availability
computing. To enable effective resource management on clusters, numerous
cluster management systems and schedulers have been designed. However, their
focus has essentially been on maximizing CPU performance rather than on
improving the utility delivered to users and their quality of service. This
paper presents a new computational economy driven scheduling system called
Libra, which has been designed to support allocation of resources based on the
users' quality of service (QoS) requirements. It is intended to work as an
add-on to the existing queuing and resource management system. The first
version has been implemented as a plugin scheduler for the Portable Batch System (PBS).
The scheduler offers market-based economy driven service for managing batch
jobs on clusters by scheduling CPU time according to user utility as determined
by their budget and deadline rather than system performance considerations. The
Libra scheduler ensures that both these constraints are met within an O(n)
run-time. The Libra scheduler has been simulated using the GridSim toolkit to
carry out a detailed performance analysis. Results show that the deadline and
budget based proportional resource allocation strategy improves the utility of
the system and user satisfaction as compared to system-centric scheduling
strategies.
Comment: 13 pages
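A minimal sketch of the deadline- and budget-based proportional-share idea, in Python. It assumes that a job's CPU share is its estimated runtime divided by its deadline, that pricing is a hypothetical flat rate per CPU-second, and that the scheduler picks the node with the most spare share. Libra's actual pricing and node-selection rules may differ. A single pass over the nodes keeps admission O(n) in the number of nodes.

```python
from __future__ import annotations

from dataclasses import dataclass

PRICE_PER_CPU_SECOND = 1.0  # hypothetical flat price, not Libra's pricing model


@dataclass
class Node:
    name: str
    used: float = 0.0  # fraction of this node's CPU already promised to admitted jobs


@dataclass
class Job:
    runtime: float   # estimated CPU time needed (seconds)
    deadline: float  # wall-clock time allowed from submission (seconds)
    budget: float    # maximum the user is willing to pay


def schedule(job: Job, nodes: list[Node]) -> Node | None:
    """Admit the job onto a node only if both its deadline and budget can be met."""
    share = job.runtime / job.deadline  # CPU fraction needed to finish by the deadline
    cost = job.runtime * PRICE_PER_CPU_SECOND
    if share > 1.0 or cost > job.budget:
        return None  # deadline or budget cannot be met: reject the job

    best = None
    for node in nodes:  # one pass over the nodes: O(n)
        free = 1.0 - node.used
        if free >= share and (best is None or free > 1.0 - best.used):
            best = node  # hypothetical rule: most spare capacity wins

    if best is not None:
        best.used += share  # reserve the proportional CPU share on the chosen node
    return best
```

For example, a job needing 30 s of CPU within 100 s requires a 0.3 share, so it is admitted only on a node with at least 30% of its CPU still unreserved.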
Dependable Distributed Computing for the International Telecommunication Union Regional Radio Conference RRC06
The International Telecommunication Union (ITU) Regional Radio Conference
(RRC06) established a new frequency plan in 2006 for the introduction of
digital broadcasting in European, African, Arab, CIS countries and Iran. The
preparation of the plan involved complex calculations under short deadline and
required dependable and efficient computing capability. The ITU designed and
deployed a dedicated PC farm on site, while the European Organization for
Nuclear Research (CERN) provided and supported a parallel system based on the
EGEE Grid. The planning cycle at the RRC06 required the periodic execution of
on the order of 200,000 short jobs, using several hundred CPU hours, within a
period of less than 12 hours. The nature of the problem required dynamic
workload-balancing and low-latency access to the computing resources. We
present the strategy and key technical choices that delivered a reliable
service to the RRC06.
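The abstract does not detail the mechanism, but one common way to obtain dynamic workload balancing for many short jobs is a pull model: idle workers fetch the next job from a shared queue, so faster or less-loaded workers naturally take more jobs. The sketch below uses Python threads as stand-ins for farm or Grid worker nodes; `run_pull_farm` and its parameters are hypothetical and do not describe the ITU or CERN systems.

```python
import queue
import threading


def run_pull_farm(jobs, n_workers, work):
    """Idle workers pull the next short job from a shared queue (dynamic load balancing)."""
    todo = queue.Queue()
    for job in jobs:
        todo.put(job)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = todo.get_nowait()  # pull work only when idle
            except queue.Empty:
                return  # no work left: this worker exits
            out = work(job)
            with lock:
                results[job] = out

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because no job is pre-assigned to a worker, a slow worker only delays the job it is currently running rather than a whole pre-allocated batch.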
Diet-ethic: Fair Scheduling of Optional Computations in GridRPC Middleware
Most HPC platforms require users to submit a pre-determined number of computation requests (also called jobs). Unfortunately, this is cumbersome when some of the computations are optional, i.e., they are not critical, but their completion would improve the results. For example, given a deadline, the number of requests to submit for a Monte Carlo experiment is difficult to choose. The more requests are completed, the better the results; however, submitting too many might overload the platform. Conversely, submitting too few requests may leave resources unused and miss an opportunity to improve the results. This paper introduces and solves the problem of scheduling optional computations. An architecture which auto-tunes the number of requests is proposed, then implemented in the DIET GridRPC middleware. Real-life experiments on platforms such as Grid'5000 show that several metrics are improved, such as user satisfaction, fairness and the number of completed requests. Moreover, the solution is shown to be scalable.
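To illustrate sharing idle capacity fairly among optional requests, the sketch below splits a number of idle slots among users round-robin (max-min fairness), never granting a user more optional requests than they asked for. The function name `allocate_optional` and the round-robin policy are assumptions for illustration; the abstract does not specify DIET's actual auto-tuning algorithm.

```python
from __future__ import annotations


def allocate_optional(idle_slots: int, wanted: dict[str, int]) -> dict[str, int]:
    """Max-min fair split of idle slots among users' optional requests (hypothetical policy)."""
    granted = {user: 0 for user in wanted}
    remaining = idle_slots
    # Users that still want more optional requests, in a deterministic order.
    active = [u for u in sorted(wanted) if wanted[u] > 0]

    while remaining > 0 and active:
        still_active = []
        for user in active:  # one round: each active user gets at most one slot
            if remaining == 0:
                break
            granted[user] += 1
            remaining -= 1
            if granted[user] < wanted[user]:
                still_active.append(user)
        active = still_active

    return granted
```

With 5 idle slots and demands of 10, 1 and 3, the user asking for only one request is fully served and the remaining slots are split evenly between the other two; unused slots stay idle once every demand is met.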
Bulk Scheduling with the DIANA Scheduler
Results from the research and development of a Data Intensive and Network
Aware (DIANA) scheduling engine, to be used primarily for data intensive
sciences such as physics analysis, are described. In Grid analyses, tasks can
involve thousands of computing, data handling, and network resources. The
central problem in the scheduling of these resources is the coordinated
management of computation and data at multiple locations and not just data
replication or movement. However, this coordination can prove costly, and
efficient scheduling can be a challenge if compute and data resources
are mapped without considering network costs. We have implemented an adaptive
algorithm within the so-called DIANA Scheduler which takes into account data
location and size, network performance and computation capability in order to
enable efficient global scheduling. DIANA is a performance-aware and
economy-guided Meta Scheduler. It iteratively allocates each job to the site
that is most likely to produce the best performance, while also optimizing the
global queue for any remaining jobs. Therefore, it is equally suitable whether a
single job is being submitted or bulk scheduling is being performed. Results
indicate that considerable performance improvements can be gained by adopting
the DIANA scheduling approach.
Comment: 12 pages, 11 figures. To be published in the IEEE Transactions on
Nuclear Science, IEEE Press. 200
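A minimal sketch of network-aware site selection in the spirit of DIANA: a task's estimated cost at a site is the queue wait plus compute time plus the transfer time for its input data (zero if the site already holds a replica), and tasks are assigned one at a time so that later tasks see the updated queues. The cost model, the largest-task-first ordering, and all names here are assumptions; the published DIANA cost function and its economic terms are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    cpu_power: float       # work units processed per second
    queue_wait: float      # current estimated queue wait (s)
    bandwidth: float       # MB/s for fetching input data to this site
    holds_data: frozenset  # names of datasets already replicated here


@dataclass
class Task:
    name: str
    work: float      # work units
    dataset: str     # input dataset name
    data_mb: float   # input size (MB)


def estimated_cost(task: Task, site: Site) -> float:
    """Hypothetical cost: queue wait + data transfer (if not local) + compute time."""
    transfer = 0.0 if task.dataset in site.holds_data else task.data_mb / site.bandwidth
    compute = task.work / site.cpu_power
    return site.queue_wait + transfer + compute


def bulk_schedule(tasks, sites):
    """Assign tasks one by one to the cheapest site, updating that site's queue."""
    plan = []
    for task in sorted(tasks, key=lambda t: t.work, reverse=True):  # largest first (assumed)
        best = min(sites, key=lambda s: estimated_cost(task, s))
        plan.append((task.name, best.name))
        best.queue_wait += task.work / best.cpu_power  # later tasks see the longer queue
    return plan
```

Updating the chosen site's queue after each assignment is what lets the same routine handle a single job or a bulk submission: a site that holds the data stops looking attractive once its queue grows longer than the cost of moving the data elsewhere.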