Scheduling Independent Moldable Tasks on Multi-Cores with GPUs

Bleuse, Raphaël; Hunold, Sascha; Kedad-Sidhoum, Safia; Monna, Florence; Mounié, Grégory; Trystram, Denis

research

Authors: Raphaël Bleuse
Sascha Hunold
Safia Kedad-Sidhoum
Florence Monna
Grégory Mounié
Denis Trystram
Publication date: 1 January 2016
Publisher: HAL CCSD

Abstract

The number of parallel systems using accelerators is growing. The technology is now mature enough to allow sustained petaflop/s. However, reaching this performance scale requires efficient scheduling algorithms to manage the heterogeneous computing resources. We present a new approach for scheduling independent tasks on multiple CPUs and multiple GPUs. The tasks are assumed to be parallelizable on CPUs using the moldable model: the final number of cores allotted to a task can be decided and set by the scheduler. More precisely, we design an algorithm aiming at minimizing the makespan (the maximum completion time of all tasks) for this scheduling problem. The proposed algorithm combines a dual approximation scheme with a fast integer linear program (ILP). It determines both the partitioning of the tasks, i.e., whether a task should be mapped to CPUs or a GPU, and the number of CPUs allotted to a moldable task if mapped to the CPUs. A worst-case analysis shows that the algorithm has an approximation ratio of $\frac{3}{2} + \epsilon$. However, since the complexity of the ILP-based algorithm could be non-polynomial, we also present a provably polynomial-time algorithm with an approximation ratio of $2 + \epsilon$. We complement the theoretical analysis of our two novel algorithms with an experimental study. In these experiments, we compare our algorithms to a modified version of the classical HEFT algorithm, adapted to handle moldable tasks. The experimental results show that our algorithm with the $\frac{3}{2} + \epsilon$ approximation ratio produces significantly shorter schedules than the modified HEFT for most of the instances. In addition, the experiments provide evidence that this ILP-based algorithm is also practically able to solve larger problem instances in a reasonable amount of time.
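
The dual approximation scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the oracle below is a deliberately simple stand-in for sequential tasks on identical machines, with ratio 2, chosen only to show the general pattern. Given a makespan guess, the oracle either returns a schedule no longer than ratio times the guess, or rejects the guess because no schedule of that length can exist, and a binary search then tightens the guess. All names (`list_schedule_oracle`, `dual_approximation_search`, `rel_tol`) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Schedule = List[List[float]]  # one list of task durations per machine


def list_schedule_oracle(
    tasks: Sequence[float], machines: int, guess: float
) -> Optional[Schedule]:
    """Toy dual-approximation step (ratio 2) for identical machines.

    Assumption: sequential tasks and identical machines, which is much
    simpler than the CPU/GPU moldable setting of the paper.
    """
    # Reject only when no schedule of length <= guess can exist.
    if max(tasks) > guess or sum(tasks) > machines * guess:
        return None
    # Greedy list scheduling: each task goes to the least loaded machine.
    # The resulting makespan is at most sum/m + max <= 2 * guess.
    loads = [0.0] * machines
    schedule: Schedule = [[] for _ in range(machines)]
    for duration in tasks:
        target = loads.index(min(loads))
        schedule[target].append(duration)
        loads[target] += duration
    return schedule


def makespan(schedule: Schedule) -> float:
    """Maximum completion time over all machines."""
    return max(sum(machine) for machine in schedule)


def dual_approximation_search(
    oracle: Callable[[float], Optional[Schedule]],
    lower: float,
    upper: float,
    rel_tol: float = 1e-3,
) -> Tuple[float, Schedule]:
    """Binary search on the makespan guess.

    `lower` must be a valid lower bound on the optimum and `upper`
    a guess that the oracle accepts.
    """
    best = oracle(upper)
    if best is None:
        raise ValueError("the initial upper bound must be accepted")
    while upper - lower > rel_tol * upper:
        mid = (lower + upper) / 2
        candidate = oracle(mid)
        if candidate is None:
            lower = mid  # the optimum is larger than mid
        else:
            upper, best = mid, candidate
    return upper, best


if __name__ == "__main__":
    durations = [3.0, 3.0, 2.0, 2.0, 2.0]
    guess, plan = dual_approximation_search(
        lambda g: list_schedule_oracle(durations, 2, g),
        lower=0.0,
        upper=sum(durations),
    )
    print(guess, makespan(plan), plan)
```

In this pattern the relative tolerance of the search plays the role of the $\epsilon$ term in the approximation ratio: the returned schedule is no longer than the oracle's ratio times the final accepted guess, which is within a factor of roughly $1 + \epsilon$ of the optimum. An oracle for the paper's setting would additionally have to decide the CPU/GPU partition and the number of cores per task.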