205 research outputs found
A study of scheduling problems with preemptions on multi-core computers with GPU accelerators
International audienceFor many years, scheduling problems have been concerned either with parallel processor systems or with dedicated processors-job shop type systems. With a development of new computing architectures this partition is no longer so obvious. Multi-core (processor) computers equipped with GPU co-processors require new scheduling strategies. This paper is devoted to a characterization of this new type of scheduling problems. After a thorough introduction of the new model of a computing system, an extension of the classical notation of scheduling problems is proposed. A special attention is paid to preemptions, since this feature of the new architecture differs the most as compared with the classical model. In the paper, several scheduling algorithms, new ones and those refining classical approaches, are presented. Possible extensions of the model are also discussed
Scheduling independent tasks on multi-cores with GPU accelerators
International audienceMore and more computers use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Process-ing Units). We present in this paper a new method for scheduling efficiently parallel applications with m CPUs and k GPUs, where each task of the appli-cation can be processed either on a core (CPU) or on a GPU. The objective is to minimize the maximum completion time (makespan). The corresponding scheduling problem is NP-hard, we propose an efficient approximation algo-rithm which achieves an approximation ratio of 4 3 + 1 3k . We first detail and analyze the method, based on a dual approximation scheme, that uses dynamic programming to balance evenly the load between the heterogeneous resources. Then, we present a faster approximation algorithm for a special case of the previous problem, where all the tasks are accelerated when affected to GPU, with a performance guarantee of 3 2 for any number of GPUs. We run some simulations based on realistic benchmarks and compare the solutions obtained by a relaxed version of the generic method to the one provided by a classical scheduling algorithm (HEFT). Finally, we present an implementation of the 4/3-approximation and its relaxed version on a classical linear algebra kernel into the scheduler of the xKaapi runtime system
Scheduling on Hybrid Platforms: Improved Approximability Window
Modern platforms are using accelerators in conjunction with standard
processing units in order to reduce the running time of specific operations,
such as matrix operations, and improve their performance. Scheduling on such
hybrid platforms is a challenging problem since the algorithms used for the
case of homogeneous resources do not adapt well. In this paper we consider the
problem of scheduling a set of tasks subject to precedence constraints on
hybrid platforms, composed of two types of processing units. We propose a
-approximation algorithm and a conditional lower bound of 3 on
the approximation ratio. These results improve upon the 6-approximation
algorithm proposed by Kedad-Sidhoum et al. as well as the lower bound of 2 due
to Svensson for identical machines. Our algorithm is inspired by the former one
and distinguishes the allocation and the scheduling phases. However, we propose
a different allocation procedure which, although is less efficient for the
allocation sub-problem, leads to an improved approximation ratio for the whole
scheduling problem. This approximation ratio actually decreases when the number
of processing units of each type is close and matches the conditional lower
bound when they are equal
Energy-aware scheduling in heterogeneous computing systems
In the last decade, the grid computing systems emerged as useful provider of the computing power required for solving complex problems.
The classic formulation of the scheduling problem in heterogeneous computing systems is NP-hard, thus approximation techniques are required for solving real-world scenarios of this problem. This thesis tackles the
problem of scheduling tasks in a heterogeneous computing environment in reduced execution times, considering the schedule length and the total energy consumption as the optimization objectives. An efficient multithreading local search algorithm for solving the multi-objective scheduling problem in heterogeneous computing systems, named MEMLS, is presented. The proposed method follows a fully multi-objective approach, applying a Pareto-based dominance search that is executed in parallel by using several threads. The experimental analysis demonstrates that the new multithreading algorithm outperforms a set of fast and accurate two-phase deterministic heuristics based on the traditional MinMin. The new ME-MLS method is able to achieve significant improvements in both makespan and energy consumption objectives in reduced execution times for a large set of testbed instances, while exhibiting very good scalability. The ME-MLS was evaluated solving instances
comprised of up to 2048 tasks and 64 machines. In order to scale the dimension of the problem instances even further and tackle large-sized problem instances, the Graphical Processing Unit (GPU) architecture is considered. This line of future work has been initially tackled with the gPALS: a hybrid CPU/GPU local search algorithm for
efficiently tackling a single-objective heterogeneous computing scheduling problem. The gPALS shows very promising results, being able to tackle instances of up to 32768 tasks and 1024 machines in reasonable
execution times.En la última década, los sistemas de computación grid se han convertido en útiles proveedores de la capacidad de cálculo necesaria para la resolución de problemas complejos. En su formulación clásica, el problema de
la planificación de tareas en sistemas heterogéneos es un problema NP difícil, por lo que se requieren técnicas de resolución aproximadas para atacar instancias de tamaño realista de este problema. Esta tesis aborda
el problema de la planificación de tareas en sistemas heterogéneos, considerando el largo de la planificación y el consumo energético como objetivos a optimizar. Para la resolución de este problema se propone un algoritmo de búsqueda local eficiente y multihilo. El método propuesto se trata de un enfoque plenamente multiobjetivo que consiste en la aplicación de una búsqueda basada en dominancia de Pareto que se ejecuta en paralelo mediante el uso de varios hilos de ejecución. El análisis experimental demuestra que el algoritmo multithilado propuesto supera a un conjunto de heurísticas deterministas rápidas y e caces basadas en el algoritmo MinMin tradicional. El nuevo método, ME-MLS, es capaz de lograr mejoras significativas tanto en el largo de la planificación y
como en consumo energético, en tiempos de ejecución reducidos para un gran número de casos de prueba, mientras que exhibe una escalabilidad muy promisoria. El ME-MLS fue evaluado abordando instancias de
hasta 2048 tareas y 64 máquinas. Con el n de aumentar la dimensión de las instancias abordadas y hacer frente a instancias de gran tamaño, se consideró la utilización de la arquitectura provista por las unidades de procesamiento gráfico (GPU). Esta línea de trabajo futuro ha sido abordada inicialmente con el algoritmo gPALS: un algoritmo híbrido CPU/GPU de búsqueda local para la planificación de tareas en en sistemas
heterogéneos considerando el largo de la planificación como único objetivo. La evaluación del algoritmo gPALS ha mostrado resultados muy prometedores, siendo capaz de abordar instancias de hasta 32768
tareas y 1024 máquinas en tiempos de ejecución razonables
Scheduling Independent Tasks on Multi-cores with GPU Accelerators
Best PaperInternational audienceMore and more computers use hybrid architectures combin-ing multi-core processors and hardware accelerators like GPUs (Graphics Processing Units). We present in this paper a new method for scheduling efficiently parallel applications with CPUs and GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the makespan. The corresponding scheduling problem is NP-hard, we propose an efficient approximation algorithm which achieves an approximation ratio of . We first detail and analyze the method, based on a dual approximation scheme, that uses a dynamic programming scheme to balance evenly the load between the heterogeneous resources. Finally, we run some simulations based on realistic benchmarks and compare the solution obtained by a relaxed version of this method to the one provided by a classical greedy algorithm and to lower bounds on the value of the optimal makespan
Scheduling Independent Moldable Tasks on Multi-Cores with GPUs
The number of parallel systems using accelerators is growing up.The technology is now mature enough to allow sustainedpetaflop/s. However, reaching this performance scale requiresefficient scheduling algorithms to manage the heterogeneouscomputing resources.We present a new approach for scheduling independent tasks onmultiple CPUs and multiple GPUs. The tasks are assumed to beparallelizable on CPUs using the moldable model: the final numberof cores allotted to a task can be decided and set by thescheduler. More precisely, we design an algorithm aiming atminimizing the makespan---the maximum completion time of alltasks---for this scheduling problem. The proposed algorithmcombines a dual approximation scheme with a fast integer linearprogram (ILP). It determines both the partitioning of the tasks,ie whether a task should be mapped to CPUs or a GPU, and thenumber of CPUs allotted to a moldable task if mapped to the CPUs.A worst case analysis shows that the algorithm has anapproximation ratio of . However, sincethe complexity of the ILP-based algorithm could benon-polynomial, we also present a proved polynomial-timealgorithm with an approximation ratio of .We complement the theoretical analysis of our two novelalgorithms with an experimental study. In these experiments, wecompare our algorithms to a modified version of the classical\heft algorithm, adapted to handle moldable tasks. Theexperimental results show that our algorithm with the approximation ratio producessignificantly shorter schedules than the modified \heft for mostof the instances. In addition, the experiments provide evidencethat this ILP-based algorithm is also practically able to solvelarger problem instances in a reasonable amount of time
Impact of communication times on mixed CPU/GPU applications scheduling using KAAPI
National audienceHigh Performance Computing machines use more and more Graphical Processing Units as they are very efficient for homogeneous computation such as matrix operations. However before using these accelerators, one has to transfer data from the processor to them. Such a transfer can be slow. In this report, our aim is to study the impact of communication times on the makespan of a scheduling. Indeed, with a better anticipation of these communications, we could use the GPUs even more efficiently. More precisely, we will focus on machines with one or more GPUs and on applications with a low ratio of computations over communications. During this study, we have implemented two offline scheduling algorithms within XKAAPI's runtime. Then we have led an experimental study, combining these algorithms to highlight the impact of communication times. Finally our study has shown that, by using communication aware scheduling algorithms, we can reduce substantially the makespan of an application. Our experiments have shown a reduction of this makespan up to on a machine with several GPUs executing homogeneous computations
An EPTAS for Scheduling on Unrelated Machines of Few Different Types
In the classical problem of scheduling on unrelated parallel machines, a set
of jobs has to be assigned to a set of machines. The jobs have a processing
time depending on the machine and the goal is to minimize the makespan, that is
the maximum machine load. It is well known that this problem is NP-hard and
does not allow polynomial time approximation algorithms with approximation
guarantees smaller than unless PNP. We consider the case that there
are only a constant number of machine types. Two machines have the same
type if all jobs have the same processing time for them. This variant of the
problem is strongly NP-hard already for . We present an efficient
polynomial time approximation scheme (EPTAS) for the problem, that is, for any
an assignment with makespan of length at most
times the optimum can be found in polynomial time in the
input length and the exponent is independent of . In particular
we achieve a running time of , where
denotes the input length. Furthermore, we study three other problem
variants and present an EPTAS for each of them: The Santa Claus problem, where
the minimum machine load has to be maximized; the case of scheduling on
unrelated parallel machines with a constant number of uniform types, where
machines of the same type behave like uniformly related machines; and the
multidimensional vector scheduling variant of the problem where both the
dimension and the number of machine types are constant. For the Santa Claus
problem we achieve the same running time. The results are achieved, using mixed
integer linear programming and rounding techniques
- …