6 research outputs found

    Data-driven job dispatching in HPC systems

    As High Performance Computing (HPC) systems get closer to exascale performance, job dispatching strategies become critical for keeping system utilization high and waiting times low for jobs competing for HPC system resources. In this paper, we take a data-driven approach and investigate whether better dispatching decisions can be made by transforming the log data produced by an HPC system into useful knowledge about its workload. In particular, we focus on job duration, develop a data-driven approach to job duration prediction, and analyze the effect of different prediction approaches on dispatching decisions using a real workload dataset collected from Eurora, a hybrid HPC system. Experiments on various dispatching methods show promising results.
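To make the idea of data-driven job duration prediction concrete, here is a minimal sketch. It is not the method from the paper, which the abstract does not detail. It assumes a simple heuristic: estimate a new job's duration as the mean of the same user's most recent observed durations, capped by the requested walltime. The class name, the `window` parameter, and the fallback rule are all hypothetical.

```python
class RecentHistoryPredictor:
    """Illustrative per-user duration predictor (an assumed heuristic)."""

    def __init__(self, window=2):
        # Number of most recent completed jobs per user to average over.
        self.window = window
        self.history = {}

    def observe(self, user, actual_duration):
        # Record the real duration (seconds) of a job that has finished.
        self.history.setdefault(user, []).append(actual_duration)

    def predict(self, user, requested_walltime):
        # With no history for this user, fall back to the requested walltime.
        past = self.history.get(user, [])[-self.window:]
        if not past:
            return requested_walltime
        estimate = sum(past) / len(past)
        # A job cannot run longer than the walltime it requested.
        return min(estimate, requested_walltime)
```

A dispatcher could call `predict` when a job is submitted and `observe` when it completes, so the estimate adapts as the log grows.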

    Power-Aware Job Dispatching in High Performance Computing Systems

    This work deals with the power-aware job dispatching problem in supercomputers. Broadly speaking, dispatching consists of assigning finite-capacity resources to a set of activities, here with a special concern for power- and energy-efficient solutions. We introduce novel optimization approaches to address its multiple aspects. The proposed techniques have a broad application range but are aimed at High Performance Computing (HPC) systems. Devising a power-aware HPC job dispatcher is a complex task in which contrasting goals must be satisfied. Furthermore, the online nature of the problem requires that solutions be computed in real time, respecting stringent limits. This aspect has historically discouraged the use of exact methods and favoured the adoption of heuristic techniques. The application of optimization approaches to the dispatching task is still an unexplored area of research and can drastically improve the performance of HPC systems. In this work we tackle the job dispatching problem on a real HPC machine, the Eurora supercomputer hosted at the Cineca research center, Bologna. We propose a Constraint Programming (CP) model that outperforms the dispatching software currently in use. An essential element for making power-aware decisions during the job dispatching phase is the ability to estimate the power consumption of jobs before their execution. To this end, we applied Machine Learning techniques to create a prediction model that was trained and tested on the Eurora supercomputer, showing high prediction accuracy. Finally, we develop a power-aware solution for the same target machine, and we devise different approaches to solve the dispatching problem while keeping the power consumption of the whole system under a given threshold. We propose a heuristic technique and a CP/heuristic hybrid method, both able to solve practical-size instances and outperform the current state-of-the-art techniques.
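The power-capping idea can be illustrated with a small greedy sketch. This is not the heuristic or the CP/heuristic hybrid proposed in the thesis; it only assumes that each queued job carries a predicted power draw and is started if it fits both the free nodes and the remaining power budget. The function name and the dictionary keys (`id`, `nodes`, `predicted_power`) are hypothetical.

```python
def dispatch_under_power_cap(queue, free_nodes, power_in_use, power_cap):
    """Greedily start queued jobs without exceeding the system power cap.

    queue: list of dicts with assumed keys "id", "nodes", "predicted_power".
    Returns the ids of the jobs started, in queue order.
    """
    started = []
    for job in queue:
        fits_nodes = job["nodes"] <= free_nodes
        fits_power = power_in_use + job["predicted_power"] <= power_cap
        if fits_nodes and fits_power:
            started.append(job["id"])
            free_nodes -= job["nodes"]
            power_in_use += job["predicted_power"]
        # Jobs that do not fit are skipped and stay queued for a later pass.
    return started
```

Skipping a job that does not fit lets later, smaller jobs start first. That helps utilization but can delay large jobs, which is one of the contrasting goals a real dispatcher has to balance.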

    Scalable optimization-based Scheduling approaches for HPC facilities

    This thesis deals with the problem of scheduling applications on High-Performance Computing (HPC) machines. The goal is to create a scheduler that improves on state-of-the-art solutions under different metrics. However, improving solution quality is not enough: creating a scheduler for future HPC machines also requires taking overheads and scalability into account. In this thesis we present a comprehensive, scalable scheduling approach that features both an off-line and an on-line component. The off-line component is based on Constraint Programming (CP), an optimization technique that is well suited to scheduling problems and allows for great flexibility. We leverage this flexibility to first present an optimization method designed to optimize job waiting times, which is then extended via heuristics and search strategies to deal with more complex objective functions. Unfortunately, such a complex objective function cannot be handled by a solver in an acceptable amount of time for on-line operation on an HPC machine in production. We deal with this difficulty by using a second, distributed, on-line scheduler. This second scheduler is designed to dramatically decrease the computational overhead and achieve scalability adequate for future ExaFlops HPC machines. The distributed scheduler is proactive, and it takes decisions so as to follow a desirable, pre-specified utilization profile. This feature makes it possible to connect the two schedulers into a hybrid system: the CP component computes a schedule on a trace of forecasted jobs one day ahead, machine learning techniques extract a near-optimal and desirable utilization profile from the solution, and the on-line scheduler takes care of the actual scheduling decisions in a scalable fashion.
The resulting architecture improves the HPC machine profit by 8.6% on average, while decreasing the computational overhead and, under normal conditions, without any side effects.

    Development of Data-Driven Dispatching Heuristics for Heterogeneous HPC Systems

    In High-Performance Computing systems, effective dispatching heuristics for scheduling and allocating incoming jobs are essential for achieving good Quality of Service levels. In this thesis we focus on the design and analysis of resource allocation heuristics for heterogeneous HPC systems, in which nodes can be equipped with different types of processing units. We then employ data-driven heuristics to predict job duration, and evaluate the whole in terms of system throughput. In particular, we consider Eurora, a heterogeneous HPC system built by CINECA, together with a workload captured from its system log, containing real jobs submitted by users. All of this was made possible by AccaSim, an HPC system simulator developed at the Department of Computer Science and Engineering (Dipartimento di Informatica - Scienza e Ingegneria, DISI) of the University of Bologna, to which we contributed substantially. This thesis shows that the impact of different allocation heuristics on the throughput of a heterogeneous HPC system is not negligible, with variations that can reach peaks of an order of magnitude and that are more pronounced over short time intervals, on the order of months. We also observed that using heuristics for job duration prediction greatly benefits throughput across all allocation heuristics, especially those that integrate such data-driven elements more deeply. Finally, the analysis allowed us to fully characterize the Eurora system and its workload, helping us to better understand the effects of the different dispatching methods on it and to extend our considerations to other classes of systems.

    Constraint Programming-based Job Dispatching for Modern HPC Applications

    A High-Performance Computing job dispatcher is a critical piece of software that assigns finite computing resources to submitted jobs. This resource assignment over time is known as the on-line job dispatching problem in HPC systems. Because the problem is on-line, solutions must be computed in real time, and the time required cannot exceed a given threshold, so as not to affect normal system operation. In addition, a job dispatcher must deal with considerable uncertainty in submission times, the number of requested resources, and job durations. Heuristic-based techniques have been widely used in HPC systems; they produce solutions in a short time, at the cost of (sub-)optimality. Moreover, their scheduling and resource allocation components are separated, which yields decoupled decisions that may cause a performance loss. Optimization-based techniques are less commonly used for this problem, although they can significantly improve the performance of HPC systems at the expense of higher computation time. Nowadays, HPC systems are used for modern applications, such as big data analytics and predictive model building, that generally employ many short jobs. However, this information is unknown at dispatching time, and job dispatchers need to process large numbers of such jobs quickly while ensuring high Quality-of-Service (QoS) levels. Constraint Programming (CP) has been shown to be an effective approach to job dispatching problems. However, state-of-the-art CP-based job dispatchers are unable to meet the challenges of on-line dispatching, such as generating dispatching decisions in a brief period and integrating current and past information about the hosting system.
For these reasons, we propose CP-based dispatchers that are better suited to HPC systems running modern applications: they generate on-line dispatching decisions in an acceptable time and make effective use of job duration predictions to improve QoS levels, especially for workloads dominated by short jobs.