    Libra: An Economy driven Job Scheduling System for Clusters

    Clusters of computers have emerged as mainstream parallel and distributed platforms for high-performance, high-throughput and high-availability computing. To enable effective resource management on clusters, numerous cluster management systems and schedulers have been designed. However, their focus has essentially been on maximizing CPU performance rather than on improving the utility delivered to the user and the quality of service. This paper presents a new computational economy driven scheduling system called Libra, which has been designed to support allocation of resources based on the users' quality of service (QoS) requirements. It is intended to work as an add-on to the existing queuing and resource management system. The first version has been implemented as a plugin scheduler to the PBS (Portable Batch System) system. The scheduler offers a market-based, economy-driven service for managing batch jobs on clusters by scheduling CPU time according to user utility, as determined by their budget and deadline, rather than system performance considerations. The Libra scheduler ensures that both these constraints are met within an O(n) run-time. The Libra scheduler has been simulated using the GridSim toolkit to carry out a detailed performance analysis. Results show that the deadline and budget based proportional resource allocation strategy improves the utility of the system and user satisfaction as compared to system-centric scheduling strategies.
    Comment: 13 pages
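    The deadline and budget based proportional allocation described above can be illustrated with a minimal sketch. It is not Libra's actual implementation: the `Job` fields, the flat `rate` pricing, and the rule that a job needs a CPU share of estimated runtime divided by deadline are all assumptions made for illustration. Jobs are admitted in a single pass, which is one way to stay within O(n).

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    est_runtime: float  # assumed: estimated CPU seconds the job needs
    deadline: float     # assumed: seconds from now by which it must finish
    budget: float       # assumed: maximum the user is willing to pay


def required_share(job: Job) -> float:
    # Assumed rule: to finish est_runtime CPU seconds within the deadline,
    # the job needs at least this fraction of one CPU.
    return job.est_runtime / job.deadline


def cost(job: Job, rate: float = 1.0) -> float:
    # Hypothetical pricing: a flat rate per CPU second.
    return rate * job.est_runtime


def admit(jobs, rate: float = 1.0):
    """Admit jobs on one CPU in a single O(n) pass.

    A job is accepted only if its cost fits its budget and the sum of
    CPU shares promised so far stays at or below 1.0.
    """
    accepted = []
    used = 0.0
    for job in jobs:
        share = required_share(job)
        if cost(job, rate) <= job.budget and used + share <= 1.0:
            accepted.append((job.name, share))
            used += share
    return accepted, used
```

    In this sketch a rejected job is simply skipped. A real scheduler would more likely try another node or report the rejection to the user.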

    Workload Schedulers - Genesis, Algorithms and Comparisons

    In this article we provide brief descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Systems Jobs Schedulers and Big Data Schedulers. We describe their evolution from early adoptions to modern implementations, considering both the use and features of algorithms. In summary, we discuss differences between all presented classes of schedulers and discuss their chronological development. In conclusion, we highlight similarities in the focus of scheduling strategies design, applicable to both local and distributed systems.

    Enhancing Job Scheduling of an Atmospheric Intensive Data Application

    Nowadays, e-Science applications involve a great deal of data in order to achieve more accurate analysis. One such application domain is Radio Occultation, which manages satellite data. Grid Processing Management is a geographically distributed physical infrastructure based on Grid Computing, implemented for the overall processing of Radio Occultation analysis. After a brief description of the algorithms adopted to characterize atmospheric profiles, the paper presents an improvement of job scheduling in order to decrease processing time and optimize resource utilization. Grid computing capacity is extended by adding virtual machines to the existing physical Grid in order to satisfy temporary job requests. Scheduling also plays an important role in the infrastructure, which is handled by a pair of schedulers developed to manage data automatically.

    A new job migration algorithm to improve data center efficiency

    The under-exploitation of available resources risks becoming one of the main problems for a computing center. The growing demand for computational power necessarily entails more complex approaches to the management of computing resources, with particular attention to the batch queue system scheduler. In a heterogeneous batch queue system, available for both serial single-core processes and parallel multi-core jobs, it may happen that one or more computational nodes composing the cluster are not fully occupied, running fewer jobs than their actual capability allows. A typical case is that of several single-core jobs each running on a different multi-core server, while parallel jobs, each requiring all the available cores of a host, are queued. A job rearrangement executed at runtime is able to free extra resources in order to host new processes. We present an efficient method to improve the exploitation of computing resources.
    Comment: 7 pages
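    The rearrangement scenario above (single-core jobs scattered over multi-core hosts while whole-node jobs wait) can be sketched as a simple consolidation step. This is a greedy illustration, not the method the paper presents: the function name `consolidate`, the `load` mapping, and the assumption that every host has the same number of cores and runs only single-core jobs are all hypothetical.

```python
import math


def consolidate(load, cores):
    """Pack single-core jobs onto as few hosts as possible.

    load  -- dict mapping host name to its number of running single-core
             jobs (assumed to be at most `cores` on every host)
    cores -- cores per host (assumed identical on all hosts)

    Returns the new load and a list of (source, destination) migrations,
    one entry per moved job. Each job is moved at most once.
    """
    load = dict(load)
    total = sum(load.values())
    keep = math.ceil(total / cores)  # fewest hosts that can hold all jobs
    ranked = sorted(load, key=lambda host: load[host], reverse=True)
    receivers, donors = ranked[:keep], ranked[keep:]

    migrations = []
    r = 0  # index of the receiver currently being filled
    for src in donors:
        while load[src] > 0:
            # Receivers jointly have room for every job, so r stays in range.
            while load[receivers[r]] == cores:
                r += 1
            dst = receivers[r]
            load[src] -= 1
            load[dst] += 1
            migrations.append((src, dst))
    return load, migrations
```

    For example, four 4-core hosts each running one single-core job are packed onto one host with three migrations, leaving three hosts entirely free for queued whole-node jobs. The sketch ignores migration cost and memory limits, which a production scheduler would have to weigh.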

    A method of evaluation of high-performance computing batch schedulers

    According to Sterling et al., a batch scheduler, also called workload management, is an application or set of services that provide a method to monitor and manage the flow of work through the system [Sterling01]. The purpose of this research was to develop a method to assess the execution speed of workloads that are submitted to a batch scheduler. While previous research exists, this research is different in that more complex jobs were devised that fully exercised the scheduler with established benchmarks. This research is important because the reduction of latency, even if it is minuscule, can lead to massive savings of electricity, time, and money over the long term. This is especially important in the era of green computing [Reuther18]. The methodology used to assess these schedulers involved the execution of custom automation scripts. These custom scripts were developed as part of this research to automatically submit custom jobs to the schedulers, take measurements, and record the results. There were multiple experiments conducted throughout the course of the research. These experiments were designed to apply the methodology and assess the execution speed of a small selection of batch schedulers. Due to time constraints, the research was limited to four schedulers. The measurements that were taken during the experiments were wall time, RAM usage, and CPU usage. These measurements captured the utilization of system resources of each of the schedulers. The custom scripts were executed using 1, 2, and 4 servers to determine how well a scheduler scales with network growth. The experiments were conducted on local school resources. All hardware was similar and was co-located within the same data center. While the schedulers that were investigated as part of the experiments are agnostic to whether the system is a grid, cluster, or supercomputer, the investigation was limited to a cluster architecture.

    Towards Peer-to-Peer Scheduling Architecture for the Czech National Grid

    The Czech National Grid Infrastructure MetaCentrum has been using a central scheduler infrastructure for approximately the past 10 years. This facilitated simple administration and direct support for large jobs running across several geographical sites. The knowledge of complete state allowed the scheduler to provide high quality decision making incorporating features like fairshare. On the other hand, this central setup created a single point of failure and also reached its scalability limits. In this paper we describe our work towards a new distributed architecture that maintains high scheduling quality while solving most of the single server issues. Our new distributed architecture provides both local autonomy and high scheduling quality. Users can still submit jobs locally even when cross-site connectivity is lost. Individual schedulers work primarily with their local server but still maintain global state, which allows them to mimic centralised scheduling features. The architecture still supports central accounting and fairshare across the entire grid. Implementation is based on the open-source Torque batch system, which replaced the previous commercial PBSPro central server installation. Torque provides a similar codebase as it has a common ancestor with PBSPro in OpenPBS. Torque therefore provides a familiar interface for both users and developers.

    Practical Experiences With Torque Meta-Scheduling In The Czech National Grid

    The Czech National Grid Infrastructure went through a complex transition in the last year. The production environment has been switched from a commercial batch system PBSPro, which was replaced by an open source alternative Torque batch system. This paper concentrates on two aspects of this transition. First, we will present our practical experience with Torque being used as a production ready batch system. Our modified version of Torque, with all the necessary PBSPro exclusive features re-implemented and further extended with new features like cloud-like behaviour, was deployed across the entire production environment, covering the entire Czech Republic for almost a full year. In the second part, we will present our work on meta-scheduling. This involves our work on distributed architecture and cloud-grid convergence. The distributed architecture was designed to overcome the limitations of a central server setup, which was originally used and presented stability and performance issues. While this paper does not discuss the inclusion of cloud interfaces into grids, it does present the dynamic infrastructure, which is a requirement for sharing the grid infrastructure between a batch system and a cloud gateway. We are also inviting everyone to try out our fork of the Torque batch system, which is now publicly available.