170 research outputs found

Exploring the Relation between Two Levels of Scheduling Using a Novel Simulation Approach

Author: Ciorba Florina M.
Eleliemy Ahmed
Mohammed Ali
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

Modern high performance computing (HPC) systems exhibit a rapid growth in size, both “horizontally” in the number of nodes and “vertically” in the number of cores per node. As such, they offer additional levels of hardware parallelism. Each level requires and employs algorithms for appropriately scheduling the computational work at the respective level. The present work explores the relation between two scheduling levels: batch and application. To understand and explore this relation, a novel simulation approach is presented that bridges two existing simulators from the two scheduling levels. A novel two-level simulator that implements the proposed approach is introduced. The two-level simulator is used to simulate all combinations of three batch scheduling and four application scheduling algorithms from the literature. These combinations are considered for allocating resources and executing the parallel jobs from a workload of a production HPC system. The results of the scheduling experiments reveal the strong relation between decisions taken at the two scheduling levels and their mutual influence. Complementing the simulations, the two-level simulator produces abstract parallel execution traces, which can be examined visually and illustrate the execution of different jobs and, for each job, the execution of its tasks at node and core levels, respectively.
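The interplay between the two scheduling levels described in the abstract above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' simulator: the batch level is plain first-come-first-served (FCFS), every job occupies exactly one node, and the application level is either static chunking or self-scheduling. The abstract does not name its three batch and four application algorithms, so these choices and all function names are hypothetical.

```python
import heapq


def static_makespan(task_times, cores):
    """Application level, static chunking (illustrative choice):
    tasks are split into contiguous blocks, one block per core."""
    if not task_times:
        return 0
    chunk = -(-len(task_times) // cores)  # ceiling division
    return max(sum(task_times[i:i + chunk])
               for i in range(0, len(task_times), chunk))


def self_scheduling_makespan(task_times, cores):
    """Application level, self-scheduling (illustrative choice):
    whichever core becomes free first takes the next task."""
    free_at = [0] * cores
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    return max(free_at)


def fcfs_batch(jobs, nodes, app_scheduler, cores_per_node):
    """Batch level, FCFS: jobs start in arrival order on the earliest
    free node. Assumption: each job uses exactly one node.
    `jobs` is a list of (arrival_time, task_times) pairs.
    Returns one (start, finish) pair per job, in arrival order."""
    node_free = [0] * nodes
    heapq.heapify(node_free)
    schedule = []
    for arrival, tasks in sorted(jobs, key=lambda job: job[0]):
        start = max(arrival, heapq.heappop(node_free))
        # The job's run time depends on the application-level scheduler.
        finish = start + app_scheduler(tasks, cores_per_node)
        heapq.heappush(node_free, finish)
        schedule.append((start, finish))
    return schedule


if __name__ == "__main__":
    tasks = [4, 4, 1, 1, 1, 1]          # imbalanced task lengths
    jobs = [(0, tasks), (1, tasks)]     # two jobs competing for one node
    print(fcfs_batch(jobs, 1, static_makespan, 2))
    print(fcfs_batch(jobs, 1, self_scheduling_makespan, 2))
```

In this toy setting, swapping the application-level scheduler changes when the second job can start at the batch level, which is the kind of mutual influence the abstract reports.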

arXiv.org e-Print Archive

Crossref

edoc

Meta-scheduling Issues in Interoperable HPCs, Grids and Clouds

Author: Bessis N.
Cristea V.
Pop F.
Sotiriadis Stelios
Xhafa F.
Publication venue: Inderscience Publishers
Publication date: 01/01/2012

Over the last years, interoperability among resources has emerged as one of the most challenging research topics. However, the common architectural complexity (e.g., heterogeneity) and the targets that each computational paradigm, including HPC, grids and clouds, aims to achieve (e.g., flexibility) remain the same: to efficiently orchestrate resources in a distributed computing fashion by bridging the gap between local and remote participants. Initially, this is closely related to the scheduling concept, which is one of the most important issues in designing a cooperative resource management system, especially in large-scale settings such as grids and clouds. Within this context, meta-scheduling offers additional functionality in the area of interoperable resource management because of its agility in handling sudden variations and dynamic situations in user demands. Accordingly, the case of inter-infrastructures, including InterCloud, entails that the decentralised meta-scheduling scheme overcome issues such as consolidated administration management, bottlenecks and local information exposure. In this work, we detail the fundamental issues for developing an effective interoperable meta-scheduler for e-infrastructures in general and InterCloud in particular. Finally, we describe a simulation and experimental configuration based on real grid workload traces to demonstrate the interoperable setting, and provide experimental results as part of a strategic plan for integrating future meta-schedulers.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

Edge Hill University Research Information Repository

Birkbeck Institutional Research Online

Scale Ratio Tuning of Group Based Job Scheduling in HPC Systems

Author: Telegin P. N.
Lyakhovets D. S.
Baranov A. V.
Publication date: 29/11/2023

During the initialization of a supercomputer job, no useful calculations are performed. A high proportion of initialization time results in idle computing resources and lower computational efficiency. Methods and algorithms that combine jobs into groups are used to optimize the scheduling of jobs with a high initialization proportion. The article considers the influence of the scale ratio setting in the job group formation algorithm on the performance metrics of the workload manager. The study was carried out on an Alea-based workload manager model developed by the authors. The model makes it possible to conduct a large number of experiments in reasonable time without losing simulation accuracy. We performed a series of experiments involving various workload characteristics. The article presents the results of a study of the scale ratio's influence on efficiency metrics for different initialization time proportions and input workflows of varying intensity and homogeneity. The presented results allow workload manager administrators to set a scale ratio that provides an appropriate balance between contradictory efficiency metrics.
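The effect of initialization overhead mentioned in the abstract above can be made concrete with a small calculation. This is only a sketch under a strong assumption that the abstract does not confirm: all jobs in a group share a single initialization phase. It does not model the paper's scale ratio or its grouping algorithm, and the function name and parameters are hypothetical.

```python
def useful_fraction(compute_times, init_time, group_size):
    """Share of occupied time spent on useful computation when every
    `group_size` consecutive jobs pay the initialization cost once.
    Assumption (not from the source): a group needs one initialization."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if not compute_times:
        return 0.0
    groups = -(-len(compute_times) // group_size)  # ceiling division
    useful = sum(compute_times)
    return useful / (useful + groups * init_time)


if __name__ == "__main__":
    jobs = [10, 10, 10, 10]  # compute time per job, arbitrary units
    for size in (1, 2, 4):
        print(size, useful_fraction(jobs, init_time=10, group_size=size))
```

With initialization as long as a single job's computation, larger groups raise the useful fraction in this model. Any cost of grouping, such as jobs waiting for a group to fill, is left out here; the abstract only says that the efficiency metrics are contradictory.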

arXiv.org e-Print Archive

Simulating Batch and Application Level Scheduling Using GridSim and SimGrid

Author: Ciorba Florina M.
Eleliemy Ahmed
Mohammed Ali
Publication venue: 29th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2016)
Publication date: 01/01/2016

Modern high performance computing (HPC) systems are increasing in the complexity of their design and in the levels of parallelism they offer. Studying and enhancing scheduling in HPC has become particularly interesting for two main reasons. First, scheduling decisions are taken by different types of schedulers, such as batch, application, process, and thread schedulers. Second, simulation has become an important tool to examine the design of HPC systems. Therefore, in this work, we study the simulation of different scheduling levels. We used two well-known simulation toolkits, SimGrid and GridSim, in order to support two different scheduling levels: batch and application level scheduling. Each toolkit is extended to support both levels. Moreover, three different scheduling algorithms for each level are implemented and their performance is examined using a real workload dataset. Finally, the challenges of extending the two simulators are compared.

edoc

Batsim: a Realistic Language-Independent Resources and Jobs Management Systems Simulator

Author: Dutot Pierre-François
Mercier Michael
Poquet Millian
Richard Olivier
Publication venue: HAL CCSD
Publication date: 27/05/2016

As large scale computation systems are growing to exascale, Resources and Jobs Management Systems (RJMS) need to evolve to manage this change of scale. However, their study is problematic since they are critical production systems, where experimenting is extremely costly due to downtime and energy costs. Meanwhile, many scheduling algorithms emerging from theoretical studies have not been transferred to production tools for lack of realistic experimental validation. To tackle these problems we propose Batsim, an extendable, language-independent and scalable RJMS simulator. It allows researchers and engineers to test and compare any scheduling algorithm, using a simple event-based communication interface, which allows different levels of realism. In this paper we show that Batsim's behaviour matches that of the real RJMS OAR. Our evaluation process was made with reproducibility in mind and all the experiment material is freely available.

Hal - Université Grenoble Alpes

Exploring Scheduling for On-demand File Systems and Data Management within HPC Environments

Author: Soysal Mehmet
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 15/03/2021

KITopen