170 research outputs found
Exploring the Relation between Two Levels of scheduling Using a Novel Simulation Approach
Modern high performance computing (HPC) systems exhibit a rapid growth in size, both “horizontally” in the number of nodes, as well as “vertically” in the number of cores per node. As such, they offer additional levels of hardware parallelism. Each level requires and employs algorithms for appropriately scheduling the computational work at the respective level. The present work explores the relation between two scheduling levels: batch and application. To understand and explore this relation, a novel simulation approach is presented that bridges two existing simulators from the two scheduling levels. A novel two-level simulator that implements the proposed
approach is introduced. The two-level simulator is used to simulate all combinations of three batch scheduling and four application scheduling algorithms from the literature. These combinations are considered for allocating resources and executing the parallel jobs from a workload of a production HPC system. The results of the scheduling experiments reveal the strong relation between decisions taken at the two scheduling levels and their mutual influence. Complementing the simulations, the two-level simulator produces abstract parallel execution traces, which can visually be examined and illustrate the execution of different jobs and, for each job, the execution of its tasks at node and core levels, respectively
Meta-scheduling Issues in Interoperable HPCs, Grids and Clouds
Over the last years, interoperability among resources has been emerged as one of the most challenging research topics. However, the commonality of the complexity of the architectures (e.g., heterogeneity) and the targets that each computational paradigm including HPC, grids and clouds aims to achieve (e.g., flexibility) remain the same. This is to efficiently orchestrate resources in a distributed computing fashion by bridging the gap among local and remote participants. Initially, this is closely related with the scheduling concept which is one of the most important issues for designing a cooperative resource management system, especially in large scale settings such as in grids and clouds. Within this context, meta-scheduling offers additional functionalities in the area of interoperable resource management, this is because of its great agility to handle sudden variations and dynamic situations in user demands. Accordingly, the case of inter-infrastructures, including InterCloud, entitle that the decentralised meta-scheduling scheme overcome issues like consolidated administration management, bottleneck and local information exposition. In this work, we detail the fundamental issues for developing an effective interoperable meta-scheduler for e-infrastructures in general and InterCloud in particular. Finally, we describe a simulation and experimental configuration based on real grid workload traces to demonstrate the interoperable setting as well as provide experimental results as part of a strategic plan for integrating future meta-schedulers
Scale Ratio Tuning of Group Based Job Scheduling in HPC Systems
During the initialization of a supercomputer job, no useful calculations are
performed. A high proportion of initialization time results in idle computing
resources and less computational efficiency. Certain methods and algorithms
combining jobs into groups are used to optimize scheduling of jobs with high
initialization proportion. The article considers the influence of the scale
ratio setting in algorithm for the job groups formation, on the performance
metrics of the workload manager. The study was carried out on the developed by
authors Aleabased workload manager model. The model makes it possible to
conduct a large number of experiments in reasonable time without losing the
accuracy of the simulation. We performed a series of experiments involving
various characteristics of the workload. The article represents the results of
a study of the scale ratio influence on efficiency metrics for different
initialization time proportions and input workflows with varying intensity and
homogeneity. The presented results allow the workload managers administrators
to set a scale ratio that provides an appropriate balance with contradictory
efficiency metrics
Simulating Batch and Application Level Scheduling Using GridSim and SimGrid
Modern high performance computing (HPC) sys- tems are increasing in the complexity of their design and in the levels of parallelism they offer. Studying and enhancing scheduling in HPC became very interesting for two main as- pects. First, scheduling decisions are taken by different types of schedulers such as batch, application, process, and thread schedulers. Second, simulation has become an important tool to examine the design of HPC systems. Therefore, in this work, we study the simulation of different scheduling levels. We used two well-known simulation toolkits, SimGrid and GridSim, in order to support two different scheduling levels, batch and application level scheduling. Each toolkit is extended to support both levels. Moreover, three different scheduling algorithms for each level are implemented and their performance is examined through a real workload dataset. Finally, a comparison for the extension challenges of the two simulators is conducted
Batsim: a Realistic Language-Independent Resources and Jobs Management Systems Simulator
International audienceAs large scale computation systems are growing to exascale, Resources and Jobs Management Systems (RJMS) need to evolve to manage this scale modification. However, their study is problematic since they are critical production systems, where experimenting is extremely costly due to downtime and energy costs. Meanwhile, many scheduling algorithms emerging from theoretical studies have not been transferred to production tools for lack of realistic experimental validation. To tackle these problems we propose Batsim, an extendable, language-independent and scalable RJMS simulator. It allows researchers and engineers to test and compare any scheduling algorithm, using a simple event-based communication interface, which allows different levels of realism. In this paper we show that Batsim's behaviour matches the one of the real RJMS OAR. Our evaluation process was made with reproducibility in mind and all the experiment material is freely available
- …