
    Libra: An Economy driven Job Scheduling System for Clusters

    Clusters of computers have emerged as mainstream parallel and distributed platforms for high-performance, high-throughput and high-availability computing. To enable effective resource management on clusters, numerous cluster management systems and schedulers have been designed. However, their focus has essentially been on maximizing CPU performance rather than on improving the value of utility delivered to the user and the quality of service. This paper presents a new computational-economy-driven scheduling system called Libra, which has been designed to support allocation of resources based on the users' quality of service (QoS) requirements. It is intended to work as an add-on to an existing queuing and resource management system. The first version has been implemented as a plugin scheduler to the PBS (Portable Batch System). The scheduler offers a market-based, economy-driven service for managing batch jobs on clusters by scheduling CPU time according to user utility, as determined by budget and deadline, rather than by system performance considerations. The Libra scheduler ensures that both constraints are met within an O(n) run-time. The Libra scheduler has been simulated using the GridSim toolkit to carry out a detailed performance analysis. Results show that the deadline- and budget-based proportional resource allocation strategy improves the utility of the system and user satisfaction as compared to system-centric scheduling strategies. Comment: 13 pages
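
    The deadline-and-budget-driven allocation the abstract describes can be pictured with a minimal sketch: a job is admitted only if the CPU share needed to meet its deadline is free and the estimated cost fits the budget. The names, cost model and admission test below are assumptions for illustration, not the actual Libra/PBS plugin code.

```python
# Minimal, hypothetical sketch of a Libra-style admission test. Each check is
# constant time, so scheduling n jobs stays O(n) overall. Not the real Libra code.

from dataclasses import dataclass


@dataclass
class Job:
    runtime_estimate: float  # CPU-seconds the job needs on one full CPU
    deadline: float          # seconds from now by which it must complete
    budget: float            # currency units the user is willing to pay


def required_share(job: Job) -> float:
    """Fraction of one CPU needed to finish exactly at the deadline."""
    return job.runtime_estimate / job.deadline


def admit(job: Job, free_share: float, price_per_cpu_second: float) -> bool:
    """Admit only if both the deadline and the budget constraints can be met."""
    cost = job.runtime_estimate * price_per_cpu_second
    return required_share(job) <= free_share and cost <= job.budget


# A 600 s job with a 1200 s deadline needs a 0.5 CPU share.
job = Job(runtime_estimate=600, deadline=1200, budget=10.0)
print(admit(job, free_share=0.7, price_per_cpu_second=0.01))  # True: share 0.5, cost 6.0
```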

    Dependable Distributed Computing for the International Telecommunication Union Regional Radio Conference RRC06

    The International Telecommunication Union (ITU) Regional Radio Conference (RRC06) established in 2006 a new frequency plan for the introduction of digital broadcasting in European, African, Arab, CIS countries and Iran. The preparation of the plan involved complex calculations under a short deadline and required dependable and efficient computing capability. The ITU designed and deployed in situ a dedicated PC farm; in parallel, the European Organization for Nuclear Research (CERN) provided and supported a system based on the EGEE Grid. The planning cycle at the RRC06 required the periodic execution of on the order of 200,000 short jobs, consuming several hundred CPU hours, within a period of less than 12 hours. The nature of the problem required dynamic workload balancing and low-latency access to the computing resources. We present the strategy and key technical choices that delivered a reliable service to the RRC06.
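
    The dynamic workload balancing mentioned here is, in essence, a pull model: idle workers fetch the next short job from a shared queue, so load evens out and no single resource becomes a bottleneck. A toy sketch of that pattern follows; it is illustrative only and not the RRC06 production setup.

```python
# Toy pull-model dispatcher: workers pull short jobs from a shared queue as
# soon as they are free, which balances load dynamically. Purely illustrative;
# the RRC06 service used a dedicated PC farm and the EGEE Grid, not this code.

import queue
import threading

NUM_WORKERS = 8
NUM_JOBS = 1000  # stand-in for one batch of short planning jobs

jobs: "queue.Queue[int]" = queue.Queue()
results: list = []
lock = threading.Lock()


def worker() -> None:
    while True:
        try:
            job_id = jobs.get_nowait()
        except queue.Empty:
            return
        outcome = job_id * job_id  # placeholder for the real calculation
        with lock:
            results.append((job_id, outcome))
        jobs.task_done()


for job_id in range(NUM_JOBS):
    jobs.put(job_id)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"completed {len(results)} of {NUM_JOBS} jobs")
```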

    Diet-ethic: Fair Scheduling of Optional Computations in GridRPC Middleware

    Most HPC platforms require users to submit a pre-determined number of computation requests (also called jobs). Unfortunately, this is cumbersome when some of the computations are optional, i.e., they are not critical, but their completion would improve results. For example, given a deadline, the number of requests to submit for a Monte Carlo experiment is difficult to choose: the more requests are completed, the better the results are, but submitting too many might overload the platform. Conversely, submitting too few requests may leave resources unused and miss an opportunity to improve the results. This paper introduces and solves the problem of scheduling optional computations. An architecture which auto-tunes the number of requests is proposed, then implemented in the DIET GridRPC middleware. Experiments on real platforms, such as Grid'5000, show that several metrics are improved, such as user satisfaction, fairness and the number of completed requests. Moreover, the solution is shown to be scalable.
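
    The auto-tuning idea can be sketched as a loop that always submits the mandatory requests and then keeps adding optional ones while the deadline and spare platform capacity allow it. The functions below (submit_request, platform_load) are hypothetical stand-ins introduced for illustration, not DIET APIs.

```python
# Hypothetical sketch of auto-tuning the number of optional requests:
# mandatory requests are always submitted; optional ones are added only while
# the deadline has not passed and the platform still has spare capacity.
# submit_request() and platform_load() are illustrative stand-ins, not DIET calls.

import random
import time


def platform_load() -> float:
    """Pretend load probe returning utilisation in [0, 1]."""
    return random.uniform(0.0, 1.0)


def submit_request(request_id: int) -> None:
    """Pretend submission of one computation request."""
    time.sleep(0.005)


def run_experiment(deadline_s: float, mandatory: int, load_threshold: float = 0.8) -> int:
    start = time.monotonic()
    submitted = 0
    for _ in range(mandatory):                 # critical requests: always submitted
        submit_request(submitted)
        submitted += 1
    while (time.monotonic() - start) < deadline_s and platform_load() < load_threshold:
        submit_request(submitted)              # optional request: improves the result
        submitted += 1
    return submitted


print(run_experiment(deadline_s=0.25, mandatory=10))
```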

    Bulk Scheduling with the DIANA Scheduler

    Results from the research and development of a Data Intensive and Network Aware (DIANA) scheduling engine, to be used primarily for data-intensive sciences such as physics analysis, are described. In Grid analyses, tasks can involve thousands of computing, data handling, and network resources. The central problem in the scheduling of these resources is the coordinated management of computation and data at multiple locations, not just data replication or movement. However, this can prove to be a rather costly operation, and efficient scheduling can be a challenge if compute and data resources are mapped without considering network costs. We have implemented an adaptive algorithm within the so-called DIANA Scheduler which takes into account data location and size, network performance and computation capability in order to enable efficient global scheduling. DIANA is a performance-aware and economy-guided meta-scheduler. It iteratively allocates each job to the site that is most likely to produce the best performance, while also optimizing the global queue for any remaining jobs. It is therefore equally suitable whether a single job is being submitted or bulk scheduling is being performed. Results indicate that considerable performance improvements can be gained by adopting the DIANA scheduling approach. Comment: 12 pages, 11 figures. To be published in the IEEE Transactions on Nuclear Science, IEEE Press. 200
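
    One way to picture the data-, network- and computation-aware allocation described here is as a per-job cost estimate over candidate sites, choosing the minimum. The fields, weights and cost terms below are assumptions for illustration, not the published DIANA algorithm.

```python
# Illustrative cost model in the spirit of data/network/compute-aware scheduling:
# pick the site with the lowest estimated completion time. Field names and the
# cost formula are assumptions, not the actual DIANA implementation.

from dataclasses import dataclass


@dataclass
class Site:
    name: str
    local_data_bytes: int        # input data already present at the site
    bandwidth_bytes_per_s: float # transfer rate from the data source
    compute_rate: float          # abstract work units processed per second
    queue_wait_s: float          # current expected queueing delay


def estimated_cost(site: Site, total_data_bytes: int, work_units: float) -> float:
    transfer_s = (total_data_bytes - site.local_data_bytes) / site.bandwidth_bytes_per_s
    compute_s = work_units / site.compute_rate
    return transfer_s + compute_s + site.queue_wait_s


def best_site(sites: list, total_data_bytes: int, work_units: float) -> Site:
    return min(sites, key=lambda s: estimated_cost(s, total_data_bytes, work_units))


sites = [
    Site("A", local_data_bytes=8_000_000_000, bandwidth_bytes_per_s=1e9,
         compute_rate=50, queue_wait_s=120),
    Site("B", local_data_bytes=0, bandwidth_bytes_per_s=1e10,
         compute_rate=80, queue_wait_s=10),
]
# Site A: 2 s transfer + 80 s compute + 120 s wait = 202 s; Site B: 1 + 50 + 10 = 61 s.
print(best_site(sites, total_data_bytes=10_000_000_000, work_units=4000).name)  # "B"
```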