An study of the effect of process malleability in the energy efficiency on GPU‑based clusters

Abstract

The adoption of graphic processor units (GPU) in high-performance computing (HPC) infrastructures determines, in many cases, the energy consumption of those facilities. For this reason, an efficient management and administration of the GPU-enabled clusters is crucial for the optimum operation of the cluster. The main aim of this work is to study and design efficient mechanisms of job scheduling across GPU-enabled clusters by leveraging process malleability techniques, able to reconfigure running jobs, depending on the cluster status. This paper presents a model that improves the energy efficiency when processing a batch of jobs in an HPC cluster. The model is validated through the MPDATA algorithm, as a representative example of stencil computation used in numerical weather prediction. The proposed solution applies the efficiency metrics obtained in a new reconfiguration policy aimed at job arrays. This solution allows the reduction in the processing time of workloads up to 4.8 times and reduction in the energy consumption up to 2.4 times the cluster compared to the traditional job management, where jobs are not reconfigured during their execution

    Similar works