    Exploring the Relation between Two Levels of scheduling Using a Novel Simulation Approach

    Modern high performance computing (HPC) systems are growing rapidly in size, both “horizontally” in the number of nodes and “vertically” in the number of cores per node. As such, they offer additional levels of hardware parallelism. Each level requires its own algorithms for scheduling the computational work at that level. The present work explores the relation between two scheduling levels: batch and application. To understand and explore this relation, a novel simulation approach is presented that bridges two existing simulators from the two scheduling levels. A novel two-level simulator that implements the proposed approach is introduced. The two-level simulator is used to simulate all combinations of three batch scheduling and four application scheduling algorithms from the literature. These combinations are considered for allocating resources and executing the parallel jobs from a workload of a production HPC system. The results of the scheduling experiments reveal the strong relation between decisions taken at the two scheduling levels and their mutual influence. Complementing the simulations, the two-level simulator produces abstract parallel execution traces, which can be examined visually and which illustrate the execution of different jobs and, for each job, the execution of its tasks at the node and core levels, respectively.
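
    The mutual influence between the two levels can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the two-level simulator described above: the abstract does not name its algorithms, so FCFS and SJF (batch level) and STATIC and self-scheduling, SS (application level), are assumed here as common examples, and the workload JOBS and all function names are hypothetical. The application-level technique sets each job's runtime, and SJF orders the batch queue by that runtime.

```python
import heapq
from itertools import product

# Hypothetical toy model (not the authors' simulator). Every job is
# submitted at time 0 and is (nodes requested, list of task durations).
JOBS = [
    (2, [4, 4, 1, 1]),
    (1, [3, 3]),
    (2, [1, 1, 1, 5]),
]

def static_runtime(tasks, nodes):
    """Application level, STATIC: contiguous blocks of tasks per node."""
    size = -(-len(tasks) // nodes)  # ceiling division
    return max(sum(tasks[i:i + size]) for i in range(0, len(tasks), size))

def self_scheduling_runtime(tasks, nodes):
    """Application level, SS: each task goes to the first node to become free."""
    finish = [0.0] * nodes  # a list of equal values is already a valid heap
    for t in tasks:
        heapq.heappush(finish, heapq.heappop(finish) + t)
    return max(finish)

def simulate(jobs, total_nodes, order, runtime_fn):
    """Batch level: start jobs in FCFS or SJF order, never overtaking.

    Returns (mean waiting time, makespan). SJF sorts by the runtime that
    the application-level technique produces, so the two levels interact.
    """
    runtimes = [runtime_fn(tasks, nodes) for nodes, tasks in jobs]
    queue = list(range(len(jobs)))
    if order == "SJF":
        queue.sort(key=lambda i: runtimes[i])
    free_at = [0.0] * total_nodes  # time at which each node becomes idle
    last_start, waits = 0.0, []
    for i in queue:
        nodes = jobs[i][0]
        free_at.sort()
        # Needs `nodes` idle nodes and may not start before its predecessor.
        start = max(free_at[nodes - 1], last_start)
        for k in range(nodes):
            free_at[k] = start + runtimes[i]
        last_start = start
        waits.append(start)  # all jobs were submitted at time 0
    return sum(waits) / len(waits), max(free_at)

if __name__ == "__main__":
    app_levels = [("STATIC", static_runtime), ("SS", self_scheduling_runtime)]
    for order, (name, fn) in product(["FCFS", "SJF"], app_levels):
        wait, makespan = simulate(JOBS, 2, order, fn)
        print(f"{order} + {name}: mean wait {wait:.2f}, makespan {makespan:.1f}")
```

    On this toy workload SJF lowers the mean wait relative to FCFS under STATIC (6.00 vs 7.33), but changes nothing under SS, because the shorter SS runtime of the first job leaves the SJF order identical to FCFS.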

    Meta-scheduling Issues in Interoperable HPCs, Grids and Clouds

    Over the last few years, interoperability among resources has emerged as one of the most challenging research topics. Nevertheless, the computational paradigms of HPC, grids, and clouds share both the complexity of their architectures (e.g., heterogeneity) and the goals they aim to achieve (e.g., flexibility): to orchestrate resources efficiently in a distributed fashion by bridging the gap between local and remote participants. This is closely related to scheduling, one of the most important issues in designing a cooperative resource management system, especially in large-scale settings such as grids and clouds. Within this context, meta-scheduling offers additional functionality for interoperable resource management because of its agility in handling sudden variations and dynamic changes in user demand. Accordingly, inter-infrastructure settings, including InterCloud, call for a decentralised meta-scheduling scheme that overcomes issues such as consolidated administration management, bottlenecks, and exposure of local information. In this work, we detail the fundamental issues in developing an effective interoperable meta-scheduler for e-infrastructures in general and InterCloud in particular. Finally, we describe a simulation and experimental configuration based on real grid workload traces to demonstrate the interoperable setting, and we provide experimental results as part of a strategic plan for integrating future meta-schedulers.

    Scale Ratio Tuning of Group Based Job Scheduling in HPC Systems

    During the initialization of a supercomputer job, no useful calculations are performed. A high proportion of initialization time leaves computing resources idle and lowers computational efficiency. Methods and algorithms that combine jobs into groups are used to optimize the scheduling of jobs with a high initialization proportion. The article considers how the scale ratio setting in the job-group formation algorithm influences the performance metrics of the workload manager. The study was carried out using an Alea-based workload manager model developed by the authors. The model makes it possible to conduct a large number of experiments in reasonable time without losing simulation accuracy. We performed a series of experiments covering various workload characteristics. The article presents the results of a study of the influence of the scale ratio on efficiency metrics for different initialization time proportions and for input workflows of varying intensity and homogeneity. The results allow workload manager administrators to set a scale ratio that provides an appropriate balance between contradictory efficiency metrics.
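
    The overhead argument in the first sentences can be made concrete with a small worked formula. This sketch assumes jobs of equal width that share an identical initialization step, so each group pays that step once: efficiency = useful / (useful + t_init · ⌈n / g⌉) for n jobs in groups of at most g. The function name and the group_size parameter are hypothetical; group_size is not the scale ratio studied in the article, which the abstract does not define.

```python
import math

def grouped_efficiency(init_time, compute_times, group_size):
    """Useful fraction of occupied time when jobs are run in groups."""
    # Hypothetical model: each group of up to `group_size` jobs performs
    # the shared initialization once, then runs its jobs' computations.
    n_groups = math.ceil(len(compute_times) / group_size)
    useful = sum(compute_times)
    return useful / (useful + init_time * n_groups)

if __name__ == "__main__":
    for g in (1, 2, 4):
        print(g, round(grouped_efficiency(30, [60] * 4, g), 3))
```

    With a 30 s initialization and four 60 s jobs, efficiency rises from about 0.67 ungrouped to 0.8 with pairs and about 0.89 with a single group of four. The sketch models only this one metric and ignores, for example, the time a job may wait for its group to form.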

    Simulating Batch and Application Level Scheduling Using GridSim and SimGrid

    Modern high performance computing (HPC) systems are increasing in the complexity of their design and in the levels of parallelism they offer. Studying and enhancing scheduling in HPC has become of significant interest for two main reasons. First, scheduling decisions are taken by different types of schedulers, such as batch, application, process, and thread schedulers. Second, simulation has become an important tool for examining the design of HPC systems. Therefore, in this work, we study the simulation of different scheduling levels. We used two well-known simulation toolkits, SimGrid and GridSim, to support two different scheduling levels: batch and application level scheduling. Each toolkit is extended to support both levels. Moreover, three different scheduling algorithms for each level are implemented, and their performance is examined using a real workload dataset. Finally, the challenges of extending the two simulators are compared.
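
    The abstract does not name the application-level algorithms. As a hedged illustration, the sketch below computes the chunk sizes produced by three common loop-scheduling techniques from the literature: static chunking (STATIC), self-scheduling (SS, one task per request), and guided self-scheduling (GSS, ⌈R / P⌉ of the R remaining tasks for P workers). These are assumptions, not necessarily the three algorithms the authors implemented, and the function name is hypothetical.

```python
import math

def chunk_sizes(n_tasks, n_workers, technique):
    """Sequence of chunk sizes handed out until all tasks are assigned."""
    remaining, chunks = n_tasks, []
    while remaining > 0:
        if technique == "STATIC":
            size = math.ceil(n_tasks / n_workers)  # one block per worker
        elif technique == "SS":
            size = 1  # every request receives a single task
        elif technique == "GSS":
            size = math.ceil(remaining / n_workers)  # shrinks as work runs out
        else:
            raise ValueError(f"unknown technique: {technique}")
        size = min(size, remaining)
        chunks.append(size)
        remaining -= size
    return chunks

if __name__ == "__main__":
    for technique in ("STATIC", "SS", "GSS"):
        print(technique, chunk_sizes(10, 4, technique))
```

    For 10 tasks on 4 workers, STATIC yields [3, 3, 3, 1], SS ten chunks of 1, and GSS [3, 2, 2, 1, 1, 1]. GSS hands out large chunks early to keep scheduling overhead low and small chunks near the end to improve load balance.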

    Batsim: a Realistic Language-Independent Resources and Jobs Management Systems Simulator

    As large-scale computation systems grow towards exascale, Resources and Jobs Management Systems (RJMS) need to evolve to manage this change of scale. However, studying them is problematic because they are critical production systems, where experimentation is extremely costly due to downtime and energy costs. Meanwhile, many scheduling algorithms emerging from theoretical studies have not been transferred to production tools for lack of realistic experimental validation. To tackle these problems, we propose Batsim, an extendable, language-independent, and scalable RJMS simulator. It allows researchers and engineers to test and compare any scheduling algorithm through a simple event-based communication interface that supports different levels of realism. In this paper, we show that Batsim's behaviour matches that of the real RJMS OAR. Our evaluation process was designed with reproducibility in mind, and all the experiment material is freely available.
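
    An event-based interface of this kind means the scheduler is a separate process that receives simulation events and replies with decisions. The sketch below is a minimal FCFS decision process whose message shape is only loosely modelled on Batsim's JSON protocol: the event names and the fields now, events, nb_resources, res, and alloc are assumptions to be checked against the Batsim protocol documentation, the class name is hypothetical, and the ZeroMQ transport that would carry the messages is not shown.

```python
import json
from collections import deque

class FcfsDecisionProcess:
    """Hypothetical FCFS scheduler for a Batsim-like event protocol."""

    def __init__(self):
        self.free = []         # ids of idle resources
        self.queue = deque()   # (job_id, requested resources), FIFO
        self.alloc = {}        # job_id -> list of resource ids in use

    def handle(self, message: str) -> str:
        """Consume one JSON message of events; return a JSON reply."""
        msg = json.loads(message)
        now = msg["now"]
        for event in msg["events"]:
            data = event.get("data", {})
            if event["type"] == "SIMULATION_BEGINS":
                self.free = list(range(data["nb_resources"]))  # assumed field
            elif event["type"] == "JOB_SUBMITTED":
                self.queue.append((data["job_id"], data["job"]["res"]))
            elif event["type"] == "JOB_COMPLETED":
                self.free.extend(self.alloc.pop(data["job_id"]))
                self.free.sort()
        decisions = []
        # Strict FCFS: start jobs from the head of the queue while they fit.
        while self.queue and self.queue[0][1] <= len(self.free):
            job_id, res = self.queue.popleft()
            ids, self.free = self.free[:res], self.free[res:]
            self.alloc[job_id] = ids
            decisions.append({"timestamp": now, "type": "EXECUTE_JOB",
                              "data": {"job_id": job_id,
                                       "alloc": " ".join(map(str, ids))}})
        return json.dumps({"now": now, "events": decisions})
```

    Keeping the policy behind a single message-in, message-out function is what makes such a scheduler language-independent: any program that can parse and emit the same messages could replace it.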

    Exploring Scheduling for On-demand File Systems and Data Management within HPC Environments
