
205 research outputs found

A study of scheduling problems with preemptions on multi-core computers with GPU accelerators

Author: Błażewicz Jacek
Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: 'Elsevier BV'
Publication date: 01/12/2015

International audience. For many years, scheduling problems have been concerned either with parallel processor systems or with dedicated processor (job shop type) systems. With the development of new computing architectures, this partition is no longer so obvious. Multi-core (processor) computers equipped with GPU co-processors require new scheduling strategies. This paper is devoted to a characterization of this new type of scheduling problem. After a thorough introduction of the new model of a computing system, an extension of the classical notation of scheduling problems is proposed. Special attention is paid to preemptions, since this feature of the new architecture differs the most from the classical model. Several scheduling algorithms are presented, both new ones and refinements of classical approaches. Possible extensions of the model are also discussed.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Scheduling independent tasks on multi-cores with GPU accelerators

Author: Bleuse Raphaël
Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: 'Wiley'
Publication date: 25/04/2015

International audience. More and more computers use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Processing Units). We present in this paper a new method for efficiently scheduling parallel applications with m CPUs and k GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the maximum completion time (makespan). The corresponding scheduling problem is NP-hard; we propose an efficient approximation algorithm which achieves an approximation ratio of $\frac{4}{3} + \frac{1}{3k}$. We first detail and analyze the method, based on a dual approximation scheme, that uses dynamic programming to balance the load evenly between the heterogeneous resources. Then, we present a faster approximation algorithm for a special case of the previous problem, where all the tasks are accelerated when assigned to a GPU, with a performance guarantee of $\frac{3}{2}$ for any number of GPUs. We run some simulations based on realistic benchmarks and compare the solutions obtained by a relaxed version of the generic method to those provided by a classical scheduling algorithm (HEFT). Finally, we present an implementation of the 4/3-approximation and its relaxed version on a classical linear algebra kernel in the scheduler of the xKaapi runtime system.
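To make the objective in this abstract concrete, the following is a minimal sketch of how the makespan of a given CPU/GPU partition could be evaluated. It is not the paper's dual approximation algorithm: the function names `lpt_makespan` and `hybrid_makespan`, the task representation, and the use of longest-processing-time-first list scheduling within each resource type are all assumptions made for illustration.

```python
import heapq


def lpt_makespan(durations, machines):
    """Makespan of longest-processing-time-first list scheduling
    on `machines` identical machines (illustrative heuristic)."""
    if not durations:
        return 0.0
    loads = [0.0] * machines
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        least = heapq.heappop(loads)      # currently least-loaded machine
        heapq.heappush(loads, least + d)  # place the task there
    return max(loads)


def hybrid_makespan(tasks, on_gpu, m, k):
    """tasks: list of (cpu_time, gpu_time) pairs (hypothetical format).
    on_gpu[i] is True if task i is assigned to the GPUs.
    Returns the larger of the CPU-side and GPU-side makespans."""
    cpu = [p for (p, _), g in zip(tasks, on_gpu) if not g]
    gpu = [q for (_, q), g in zip(tasks, on_gpu) if g]
    return max(lpt_makespan(cpu, m), lpt_makespan(gpu, k))


tasks = [(4, 1), (3, 3), (2, 5), (6, 2)]
on_gpu = [True, False, False, True]
result = hybrid_makespan(tasks, on_gpu, m=2, k=1)
```

In this toy instance the two GPU tasks run back to back on the single GPU while the two CPU tasks run in parallel, so both sides finish at time 3.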

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Scheduling on Hybrid Platforms: Improved Approximability Window

Author: Fagnon Vincent
Kacem Imed
Lucarelli Giorgio
Simon Bertrand
Publication date: 09/02/2020

Modern platforms use accelerators in conjunction with standard processing units in order to reduce the running time of specific operations, such as matrix operations, and improve their performance. Scheduling on such hybrid platforms is a challenging problem since the algorithms used for homogeneous resources do not adapt well. In this paper we consider the problem of scheduling a set of tasks subject to precedence constraints on hybrid platforms composed of two types of processing units. We propose a $(3+2\sqrt{2})$-approximation algorithm and a conditional lower bound of 3 on the approximation ratio. These results improve upon the 6-approximation algorithm proposed by Kedad-Sidhoum et al. as well as the lower bound of 2 due to Svensson for identical machines. Our algorithm is inspired by the former and distinguishes the allocation and the scheduling phases. However, we propose a different allocation procedure which, although less efficient for the allocation sub-problem, leads to an improved approximation ratio for the whole scheduling problem. This approximation ratio actually decreases when the numbers of processing units of each type are close, and it matches the conditional lower bound when they are equal.
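As a quick check of the claimed improvement over the earlier 6-approximation, the new ratio can be evaluated numerically (plain arithmetic, not taken from the paper):

```latex
3 + 2\sqrt{2} \approx 3 + 2 \times 1.41421 = 5.82843 < 6
```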

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot

Energy-aware scheduling in heterogeneous computing systems

Author: Iturriaga Santiago
Publication venue: Udelar. FI.
Publication date: 01/01/2013

In the last decade, grid computing systems have emerged as useful providers of the computing power required for solving complex problems. The classic formulation of the scheduling problem in heterogeneous computing systems is NP-hard, thus approximation techniques are required for solving real-world scenarios of this problem. This thesis tackles the problem of scheduling tasks in a heterogeneous computing environment in reduced execution times, considering the schedule length and the total energy consumption as the optimization objectives. An efficient multithreaded local search algorithm for solving the multi-objective scheduling problem in heterogeneous computing systems, named ME-MLS, is presented. The proposed method follows a fully multi-objective approach, applying a Pareto-based dominance search that is executed in parallel using several threads. The experimental analysis demonstrates that the new multithreaded algorithm outperforms a set of fast and accurate two-phase deterministic heuristics based on the traditional MinMin. The ME-MLS method achieves significant improvements in both the makespan and energy consumption objectives in reduced execution times for a large set of testbed instances, while exhibiting very good scalability. ME-MLS was evaluated on instances of up to 2048 tasks and 64 machines. In order to scale the dimension of the problem instances even further and tackle large problem instances, the Graphical Processing Unit (GPU) architecture is considered. This line of future work has been initially tackled with gPALS, a hybrid CPU/GPU local search algorithm for efficiently tackling a single-objective heterogeneous computing scheduling problem. gPALS shows very promising results, being able to tackle instances of up to 32768 tasks and 1024 machines in reasonable execution times.
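The Pareto-based dominance test mentioned in this abstract can be illustrated with a short sketch. This is a generic textbook formulation for two minimization objectives (makespan, energy), not the ME-MLS implementation; the function names and the sample values are hypothetical.

```python
def dominates(a, b):
    """True if solution a Pareto-dominates b when every objective is
    minimized: a is no worse everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (makespan, energy) pairs for four candidate schedules.
candidates = [(10, 5), (8, 7), (9, 9), (12, 4)]
front = pareto_front(candidates)
```

Here `(9, 9)` is dominated by `(8, 7)`, so it drops out, while the remaining three schedules are mutually incomparable trade-offs.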

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Scheduling Independent Tasks on Multi-cores with GPU Accelerators

Author: Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/08/2013

Best Paper. International audience. More and more computers use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Processing Units). We present in this paper a new method for efficiently scheduling parallel applications with $m$ CPUs and $k$ GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the makespan. The corresponding scheduling problem is NP-hard; we propose an efficient approximation algorithm which achieves an approximation ratio of $\frac{4}{3} + \frac{1}{3k}$. We first detail and analyze the method, based on a dual approximation scheme, that uses a dynamic programming scheme to balance the load evenly between the heterogeneous resources. Finally, we run some simulations based on realistic benchmarks and compare the solution obtained by a relaxed version of this method to the one provided by a classical greedy algorithm and to lower bounds on the value of the optimal makespan.
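To see how the stated guarantee depends on the number of GPUs $k$, the ratio can be evaluated for a few values (plain arithmetic on the formula above):

```latex
\frac{4}{3} + \frac{1}{3k} =
\begin{cases}
\frac{5}{3} \approx 1.667 & k = 1 \\
\frac{3}{2} = 1.5 & k = 2 \\
\frac{17}{12} \approx 1.417 & k = 4
\end{cases}
\qquad
\lim_{k \to \infty} \left( \frac{4}{3} + \frac{1}{3k} \right) = \frac{4}{3}
```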

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Scheduling Independent Moldable Tasks on Multi-Cores with GPUs

Author: Bleuse Raphaël
Hunold Sascha
Kedad-Sidhoum Safia
Monna Florence
Mounié Grégory
Trystram Denis
Publication venue: HAL CCSD
Publication date: 01/01/2016

The number of parallel systems using accelerators is growing. The technology is now mature enough to allow sustained petaflop/s. However, reaching this performance scale requires efficient scheduling algorithms to manage the heterogeneous computing resources. We present a new approach for scheduling independent tasks on multiple CPUs and multiple GPUs. The tasks are assumed to be parallelizable on CPUs using the moldable model: the final number of cores allotted to a task can be decided and set by the scheduler. More precisely, we design an algorithm aiming at minimizing the makespan (the maximum completion time of all tasks) for this scheduling problem. The proposed algorithm combines a dual approximation scheme with a fast integer linear program (ILP). It determines both the partitioning of the tasks, i.e. whether a task should be mapped to CPUs or a GPU, and the number of CPUs allotted to a moldable task if mapped to the CPUs. A worst-case analysis shows that the algorithm has an approximation ratio of $\frac{3}{2} + \epsilon$. However, since the complexity of the ILP-based algorithm could be non-polynomial, we also present a provably polynomial-time algorithm with an approximation ratio of $2+\epsilon$. We complement the theoretical analysis of our two novel algorithms with an experimental study. In these experiments, we compare our algorithms to a modified version of the classical HEFT algorithm, adapted to handle moldable tasks. The experimental results show that our algorithm with the $\frac{3}{2} + \epsilon$ approximation ratio produces significantly shorter schedules than the modified HEFT for most of the instances. In addition, the experiments provide evidence that this ILP-based algorithm is also able in practice to solve larger problem instances in a reasonable amount of time.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Impact of communication times on mixed CPU/GPU applications scheduling using KAAPI

Author: Beniamine David
Publication venue: HAL CCSD
Publication date: 20/06/2013

National audience. High Performance Computing machines increasingly use Graphical Processing Units, as they are very efficient for homogeneous computations such as matrix operations. However, before these accelerators can be used, data must be transferred to them from the processor, and such a transfer can be slow. In this report, our aim is to study the impact of communication times on the makespan of a schedule. Indeed, with better anticipation of these communications, we could use the GPUs even more efficiently. More precisely, we focus on machines with one or more GPUs and on applications with a low ratio of computation to communication. During this study, we implemented two offline scheduling algorithms within XKAAPI's runtime. We then conducted an experimental study combining these algorithms to highlight the impact of communication times. Our study shows that communication-aware scheduling algorithms can substantially reduce the makespan of an application. Our experiments show a makespan reduction of up to 64% on a machine with several GPUs executing homogeneous computations.

Hal - Université Grenoble Alpes

An EPTAS for Scheduling on Unrelated Machines of Few Different Types

Author: A Asadpour
C Imreh
DS Hochbaum
E Horowitz
F Eisenbrand
GJ Woeginger
HW Lenstra Jr
I Bezáková
JC Gehrke
JK Lenstra
K Jansen
L Chen
R Bleuse
R Kannan
Publication date: 06/12/2017

In the classical problem of scheduling on unrelated parallel machines, a set of jobs has to be assigned to a set of machines. The jobs have a processing time depending on the machine, and the goal is to minimize the makespan, that is, the maximum machine load. It is well known that this problem is NP-hard and does not allow polynomial time approximation algorithms with approximation guarantees smaller than $1.5$ unless P $=$ NP. We consider the case that there are only a constant number $K$ of machine types. Two machines have the same type if all jobs have the same processing time for them. This variant of the problem is strongly NP-hard already for $K=1$. We present an efficient polynomial time approximation scheme (EPTAS) for the problem, that is, for any $\varepsilon > 0$ an assignment with makespan of length at most $(1+\varepsilon)$ times the optimum can be found in polynomial time in the input length, and the exponent is independent of $1/\varepsilon$. In particular, we achieve a running time of $2^{\mathcal{O}(K\log(K) \frac{1}{\varepsilon}\log^4 \frac{1}{\varepsilon})}+\mathrm{poly}(|I|)$, where $|I|$ denotes the input length. Furthermore, we study three other problem variants and present an EPTAS for each of them: the Santa Claus problem, where the minimum machine load has to be maximized; the case of scheduling on unrelated parallel machines with a constant number of uniform types, where machines of the same type behave like uniformly related machines; and the multidimensional vector scheduling variant of the problem, where both the dimension and the number of machine types are constant. For the Santa Claus problem we achieve the same running time. The results are achieved using mixed integer linear programming and rounding techniques.
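The problem setting described in this abstract (jobs whose processing time depends only on the machine's type) can be made concrete with a small sketch. This is an exponential-time brute force for tiny instances, shown only to illustrate the objective; it is not the EPTAS, and the names `makespan`, `brute_force_optimum`, and the input layout are assumptions.

```python
from itertools import product


def makespan(proc_time, machine_type, assignment):
    """proc_time[t][j]: processing time of job j on a machine of type t.
    machine_type[i]: type of machine i.
    assignment[j]: machine chosen for job j.
    Returns the maximum machine load."""
    loads = [0] * len(machine_type)
    for job, machine in enumerate(assignment):
        loads[machine] += proc_time[machine_type[machine]][job]
    return max(loads)


def brute_force_optimum(proc_time, machine_type):
    """Try every assignment of jobs to machines (tiny instances only)."""
    n_jobs = len(proc_time[0])
    n_machines = len(machine_type)
    return min(
        makespan(proc_time, machine_type, a)
        for a in product(range(n_machines), repeat=n_jobs)
    )


# Hypothetical instance: K = 2 machine types, 3 jobs, 3 machines.
proc_time = [[2, 3, 4],   # times on type-0 machines
             [1, 6, 2]]   # times on type-1 machines
machine_type = [0, 0, 1]
best = brute_force_optimum(proc_time, machine_type)
```

In this instance job 1 needs at least 3 time units on any machine, and placing each job on its own machine achieves exactly that, so the optimal makespan is 3.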

arXiv.org e-Print Archive

Crossref