185 research outputs found
Libra: An Economy driven Job Scheduling System for Clusters
Clusters of computers have emerged as mainstream parallel and distributed
platforms for high-performance, high-throughput and high-availability
computing. To enable effective resource management on clusters, numerous
cluster managements systems and schedulers have been designed. However, their
focus has essentially been on maximizing CPU performance, but not on improving
the value of utility delivered to the user and quality of services. This paper
presents a new computational economy driven scheduling system called Libra,
which has been designed to support allocation of resources based on the users?
quality of service (QoS) requirements. It is intended to work as an add-on to
the existing queuing and resource management system. The first version has been
implemented as a plugin scheduler to the PBS (Portable Batch System) system.
The scheduler offers market-based economy driven service for managing batch
jobs on clusters by scheduling CPU time according to user utility as determined
by their budget and deadline rather than system performance considerations. The
Libra scheduler ensures that both these constraints are met within an O(n)
run-time. The Libra scheduler has been simulated using the GridSim toolkit to
carry out a detailed performance analysis. Results show that the deadline and
budget based proportional resource allocation strategy improves the utility of
the system and user satisfaction as compared to system-centric scheduling
strategies.Comment: 13 page
Workload Schedulers - Genesis, Algorithms and Comparisons
In this article we provide brief descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Systems, Jobs Schedulers and Big Data Schedulers. We describe their evolution from early adoptions to modern implementations, considering both the use and features of algorithms. In summary, we discuss differences between all presented classes of schedulers and discuss their chronological development. In conclusion, we highlight similarities in the focus of scheduling strategies design, applicable to both local and distributed systems
Enhancing Job Scheduling of an Atmospheric Intensive Data Application
Nowadays, e-Science applications involve great deal of data to have more accurate analysis. One of its application domains is the Radio Occultation which manages satellite data. Grid Processing Management is a physical infrastructure geographically distributed based on Grid Computing, that is implemented for the overall processing Radio Occultation analysis. After a brief description of algorithms adopted to characterize atmospheric profiles, the paper presents an improvement of job scheduling in order to decrease processing time and optimize resource utilization. Extension of grid computing capacity is implemented by virtual machines in existing physical Grid in order to satisfy temporary job requests. Also scheduling plays an important role in the infrastructure that is handled by a couple of schedulers which are developed to manage data automaticall
A new job migration algorithm to improve data center efficiency
The under exploitation of the available resources risks to be one of the main
problems for a computing center. The growing demand of computational power
necessarily entails more complex approaches in the management of the computing
resources, with particular attention to the batch queue system scheduler. In a
heterogeneous batch queue system, available for both serial single core
processes and parallel multi core jobs, it may happen that one or more
computational nodes composing the cluster are not fully occupied, running a
number of jobs lower than their actual capability. A typical case is
represented by more single core jobs running each one over a different multi
core server, while more parallel jobs - requiring all the available cores of a
host - are queued. A job rearrangement executed at runtime is able to free
extra resources, in order to host new processes. We present an efficient method
to improve the computing resources exploitation.Comment: 7 page
A method of evaluation of high-performance computing batch schedulers
According to Sterling et al., a batch scheduler, also called workload management, is an application or set of services that provide a method to monitor and manage the flow of work through the system [Sterling01]. The purpose of this research was to develop a method to assess the execution speed of workloads that are submitted to a batch scheduler. While previous research exists, this research is different in that more complex jobs were devised that fully exercised the scheduler with established benchmarks. This research is important because the reduction of latency even if it is miniscule can lead to massive savings of electricity, time, and money over the long term. This is especially important in the era of green computing [Reuther18]. The methodology used to assess these schedulers involved the execution of custom automation scripts. These custom scripts were developed as part of this research to automatically submit custom jobs to the schedulers, take measurements, and record the results. There were multiple experiments conducted throughout the course of the research. These experiments were designed to apply the methodology and assess the execution speed of a small selection of batch schedulers. Due to time constraints, the research was limited to four schedulers. x The measurements that were taken during the experiments were wall time, RAM usage, and CPU usage. These measurements captured the utilization of system resources of each of the schedulers. The custom scripts were executed using, 1, 2, and 4 servers to determine how well a scheduler scales with network growth. The experiments were conducted on local school resources. All hardware was similar and was co-located within the same data-center. While the schedulers that were investigated as part of the experiments are agnostic to whether the system is grid, cluster, or super-computer; the investigation was limited to a cluster architecture
Towards Peer-to-Peer Scheduling Architecture for the Czech National Grid
The Czech National Grid Infrastructure MetaCentrum has been using a central scheduler infrastructure for approximately the past 10 years. This facilitated simple administration and direct support for large jobs running across several geographical sites. The knowledge of complete state allowed the scheduler to provide high quality decision making incorporating features like fairshare. On the other hand, this central setup created a single point of failure issue and also reached its scalability limits. In this paper we describe our work towards a new distributed architecture that maintains high scheduling quality while solving most of the single server issues. Our new distributed architecture provides both local autonomy and high scheduling quality. Users can still submit jobs locally even when cross-site connectivity is lost. Individual schedulers work primarily with their local server but still maintain global state, that allows them to mimic centralised scheduling features. The architecture still supports central accounting and fairshare across the entire grid. Implementation is based on the open-source Torque batch system, which replaced the previous commercial PBSPro central server installation. Torque provides a similar codebase as it has a common ancestor with PBSPro in OpenPBS. Torque therefore provides familiar interface for both users and developers
Practical Experiences With Torque Meta-Scheduling In The Czech National Grid
The Czech National Grid Infrastructure went through a complex transition inthe last year. The production environment has been switched from a commercialbatch system PBSPro, which was replaced by an open source alternative Torquebatch system.This paper concentrates on two aspects of this transition. First, we will presentour practical experience with Torque being used as a production ready batchsystem. Our modified version of Torque, with all the necessary PBSPro ex-clusive features re-implemented and further extended with new features likecloud-like behaviour, was deployed across the entire production environment,covering the entire Czech Republic for almost a full year.In the second part, we will present our work on meta-scheduling. This in-volves our work on distributed architecture and cloud-grid convergence. Thedistributed architecture was designed to overcome the limitations of a centralserver setup, which was originally used and presented stability and performanceissues. While this paper does not discuss the inclusion of cloud interfaces intogrids, it does present the dynamic infrastructure, which is a requirement forsharing the grid infrastructure between a batch system and a cloud gateway.We are also inviting everyone to try out our fork of the Torque batch system,which is now publicly available
- …