2 research outputs found
Game theoretic analysis of the Slurm scheduler model
In the context of High Performance Computing, scheduling
is a necessary tool to ensure that there exists acceptable
quality of service for the many users of the processing power
available. The scheduling process can vary from a simple First
Come, First Served model to a wide variety of more complex
implementations that tend to satisfy specific requirements
from each group of users. Slurm is an open source, fault-tolerant,
and highly scalable cluster management system for
large and small Linux clusters [1]. MareNostrum 4, a High
Performance Computer, implements it to manage the execution
of jobs sent to it by a variety of users [2]. Previous work
has taken an algorithmic approach that attempts
to directly reduce queuing times, among other costs [3][4].
We consider that there is utility in looking at the problem
also from a Game Theoretic perspective to define clearly the
mechanics involved in the system, and also those that define
the influx of tasks that the scheduler manages. We model the
Slurm scheduling mechanism using Game Theoretic concepts,
tools, and reasonable simplifications in an attempt to formally
characterize and study it. We identify variables that play a
significant role in the scheduling process and also experiment
with changes in the model that could make users behave
in a way that would improve overall quality of service. We
recognize that the complexity of the models may make them
difficult to analyze theoretically, so we use
real usage data from BSC-CNS users to
measure performance. The real usage data is extracted from
Autosubmit [5], a workflow manager developed at the Earth
Science Department at BSC-CNS. This is a convenient choice,
given that we also attempt to measure the influence that an
external agent (e.g. a workflow manager) could have on the
overall quality of service if it imposes restrictions, and the
nature of these restrictions.
The High Performance Scheduler Game: A Characterization of Slurm, Metrics, and the Viability of Cooperation
The Slurm Scheduler is a widely used tool for scheduling in High
Performance Computing platforms around the world. Several studies have
been conducted to find ways to improve specific performance metrics,
mainly from an algorithmic perspective. Scheduling has also been studied
from the viewpoint of Game Theory, where models that attempt to
capture the main characteristics of the problem are developed and
analyzed. In this study, we have used the tools that Algorithmic Game
Theory provides to develop and study a model that captures some of the
main characteristics of the Slurm Scheduler. We developed the necessary
software to test these models. We performed a thorough data analysis
process to build a reliable data source based on real usage information. Then,
through experimentation, we analyzed how our model and its variants
behave; furthermore, we compared these results with the results from an
existing Slurm Simulator, developed by Barcelona Supercomputing Center
members. Using these results, we calculated an approximate value for
the Price of Anarchy, and we discuss the Viability of Cooperation in the
context of the Slurm Scheduler.
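
For readers unfamiliar with the term, the Price of Anarchy of a cost-minimization game is the ratio between the social cost of the worst Nash equilibrium and the optimal social cost. The sketch below illustrates the idea on a toy load-balancing game, which is an assumption made here for illustration and not the model used in the study: each user picks one machine for a job, pays the total load of that machine, and the social cost is the makespan. The function names and the job lengths are hypothetical.

```python
from itertools import product


def loads(assignment, weights, n_machines):
    """Total load placed on each machine by an assignment (job -> machine)."""
    totals = [0.0] * n_machines
    for job, machine in enumerate(assignment):
        totals[machine] += weights[job]
    return totals


def is_pure_nash(assignment, weights, n_machines):
    """True if no single job can strictly lower its cost by switching machines."""
    totals = loads(assignment, weights, n_machines)
    for job, machine in enumerate(assignment):
        for other in range(n_machines):
            if other != machine and totals[other] + weights[job] < totals[machine]:
                return False
    return True


def price_of_anarchy(weights, n_machines):
    """Worst pure-equilibrium makespan divided by the optimal makespan.

    Brute force over all assignments, so only suitable for tiny instances.
    """
    profiles = list(product(range(n_machines), repeat=len(weights)))

    def makespan(assignment):
        return max(loads(assignment, weights, n_machines))

    optimum = min(makespan(a) for a in profiles)
    worst_equilibrium = max(
        makespan(a) for a in profiles if is_pure_nash(a, weights, n_machines)
    )
    return worst_equilibrium / optimum


if __name__ == "__main__":
    # Hypothetical job lengths: two long jobs and two short ones on two machines.
    print(price_of_anarchy([2, 2, 1, 1], n_machines=2))
```

In this instance the optimal schedule pairs a long job with a short one on each machine, while placing both long jobs together is still an equilibrium, so selfish behaviour costs more than coordination. An approximate value computed from real usage data, as in the abstract, would replace the exhaustive enumeration with simulation.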