279 research outputs found

    Parallel scheduling of task trees with limited memory

    This paper investigates the execution of tree-shaped task graphs using multiple processors. Each edge of such a tree represents some large data. A task can only be executed if all of its input and output data fit into memory, and a piece of data can only be removed from memory after the completion of the task that uses it as an input. Such trees arise, for instance, in the multifrontal method of sparse matrix factorization. The peak memory needed for the processing of the entire tree depends on the execution order of the tasks. With one processor, the objective of the tree traversal is to minimize the required memory. This problem has been well studied, and optimal polynomial algorithms have been proposed. Here, we extend the problem by considering multiple processors, which is of obvious interest in the application area of matrix factorization. With multiple processors comes the additional objective of minimizing the time needed to traverse the tree, i.e., the makespan. Not surprisingly, this problem proves to be much harder than the sequential one. We study its computational complexity and provide inapproximability results, even for unit-weight trees. We design a series of practical heuristics achieving different trade-offs between the minimization of peak memory usage and makespan. Some of these heuristics are able to process a tree while keeping the memory usage under a given limit. The heuristics are compared in an extensive experimental evaluation using realistic trees.
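    A minimal sketch of the memory model described above, assuming each node's output file (the edge to its parent) has a known size and a task needs all of its children's files plus its own output resident simultaneously; the function and variable names are ours, not the paper's:

```python
# Hedged sketch of the sequential memory model: executing `node` requires
# its children's output files (already resident) plus its own output file
# in memory at once; the children's files are freed when `node` completes.

def peak_memory(children, out, order):
    """Peak memory of a sequential postorder traversal `order`.
    children[v]: list of v's children; out[v]: size of v's output file."""
    resident = 0      # total size of files currently held in memory
    peak = 0
    for node in order:
        need = sum(out[c] for c in children[node])  # inputs, already resident
        resident += out[node]                       # allocate our own output
        peak = max(peak, resident)
        resident -= need                            # free the inputs
    return peak

# Tiny example: root 0 with two leaf children 1 and 2.
children = {0: [1, 2], 1: [], 2: []}
out = {0: 1, 1: 5, 2: 3}
print(peak_memory(children, out, [1, 2, 0]))  # 5 + 3 + 1 = 9
```

The traversal order matters: with unbalanced child subtrees, processing the memory-hungry child first generally lowers the peak, which is the lever the sequential algorithms cited above optimize.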

    Mapping tree-shaped workflows on systems with different memory sizes and processor speeds

    Directed acyclic graphs are commonly used to model scientific workflows, by expressing dependencies between tasks, as well as the resource requirements of the workflow. As a special case, rooted directed trees occur in several applications, for instance in sparse matrix computations. Since typical workflows are modeled by large trees, it is crucial to schedule them efficiently, so that their execution time (or makespan) is minimized. Furthermore, it is usually beneficial to distribute the execution on several compute nodes, hence increasing the available memory, and allowing us to parallelize parts of the execution. To exploit the heterogeneity of modern clusters in this context, we investigate the partitioning and mapping of tree-shaped workflows on two types of target architecture models: in AM1, each processor can have a different memory size, and in AM2, each processor can also have a different speed (in addition to a different memory size). We design a three-step heuristic for AM1, which adapts and extends previous work for homogeneous clusters [Gou C, Benoit A, Marchal L. Partitioning tree-shaped task graphs for distributed platforms with limited memory. IEEE Trans Parallel Dist Syst 2020; 31(7): 1533–1544]. The changes we propose concern the assignment to processors (accounting for the different memory sizes) and the availability of suitable processors when splitting or merging subtrees. For AM2, we extend the heuristic for AM1 with a two-phase local search approach. Phase A is a swap-based hill climber, while (the optional) Phase B is inspired by iterated local search. We evaluate our heuristics for AM1 and AM2 with extensive simulations, and we demonstrate that exploiting the heterogeneity in the cluster significantly reduces the makespan, compared to the state of the art for homogeneous processors. Peer reviewed.
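    One plausible building block of the AM1-style assignment step described above (our illustrative sketch, not the authors' code): a best-fit-decreasing placement of subtree memory demands onto processors with heterogeneous memory sizes, where each processor hosts one subtree; when no processor fits, a real heuristic would fall back to splitting or merging subtrees.

```python
# Hedged sketch: assign subtrees (largest memory demand first) to the
# tightest-fitting processor that still has enough memory. All names and
# the one-subtree-per-processor simplification are our assumptions.

def best_fit_assign(demands, memories):
    """demands: {subtree: memory needed}; memories: per-processor sizes.
    Returns {subtree: processor index}, or None if some subtree fits nowhere."""
    free = {p: m for p, m in enumerate(memories)}
    assignment = {}
    for tree, need in sorted(demands.items(), key=lambda kv: -kv[1]):
        fits = [(m, p) for p, m in free.items() if m >= need]
        if not fits:
            return None          # no processor fits: split the subtree instead
        _, proc = min(fits)      # tightest fit (smallest sufficient memory)
        assignment[tree] = proc
        del free[proc]           # this processor now hosts one subtree
    return assignment

demands = {"T1": 8, "T2": 3, "T3": 5}
memories = [4, 16, 6]            # heterogeneous memory sizes
print(best_fit_assign(demands, memories))  # {'T1': 1, 'T3': 2, 'T2': 0}
```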

    Scheduling malleable task trees

    Solving sparse linear systems can lead to processing tree workflows on a platform of processors. In this study, we use the model of malleable tasks motivated in [Prasanna96, Beaumont07] to study tree workflow schedules under two contradictory objectives: makespan minimization and memory minimization. First, we give a simpler proof of the result of [Prasanna96], which allows us to compute a makespan-optimal schedule for tree workflows. Then, we study a more realistic speed-up function and show that the previous schedules are not optimal in this context. Finally, we give complexity results concerning the objective of minimizing both makespan and memory.
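    The malleable-task model of [Prasanna96] is commonly stated as: a task of work w executed on p processors runs in time w / p**alpha, for some 0 < alpha <= 1. Under this assumption (our sketch below, with illustrative names), the makespan-optimal way to share processors among parallel subtrees gives each branch a share proportional to w**(1/alpha), so that all branches finish simultaneously:

```python
# Hedged sketch of the malleable-task speed-up model: work w on a share p
# of processors takes w / p**alpha. Equalizing finish times across parallel
# branches yields shares proportional to w**(1/alpha).

def optimal_shares(works, p, alpha):
    """Processor shares that make parallel branches finish together."""
    weights = [w ** (1.0 / alpha) for w in works]
    total = sum(weights)
    return [p * w / total for w in weights]

def makespan(works, p, alpha):
    shares = optimal_shares(works, p, alpha)
    # all branches finish at the same time, so any branch gives the makespan
    return works[0] / shares[0] ** alpha

works = [8.0, 1.0]
print(optimal_shares(works, p=9.0, alpha=1.0))  # [8.0, 1.0]: proportional to work
print(makespan(works, 9.0, 1.0))                # 8.0 / 8.0 = 1.0
```

With alpha = 1 (perfect linear speed-up) the shares degenerate to plain work-proportional allocation; the abstract's point is precisely that more realistic speed-up functions break the optimality of such schedules.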

    Independent and Divisible Task Scheduling on Heterogeneous Star-shaped Platforms with Limited Memory

    In this paper, we consider the problem of allocating and scheduling a collection of independent, equal-sized tasks on heterogeneous star-shaped platforms. We also address the same problem for divisible tasks. In both cases, we take memory constraints into account. We prove strong NP-completeness results for different objective functions, namely makespan minimization and throughput maximization, on simple star-shaped platforms. We propose an approximation algorithm based on the unconstrained version (with unlimited memory) of the problem. We introduce several heuristics, which are evaluated and compared through extensive simulations. An unexpected conclusion drawn from these experiments is that classical scheduling heuristics that greedily minimize the completion time of each task are outperformed by the simple heuristic of assigning each task to the available processor with the smallest communication time, regardless of computation power (hence a "bandwidth-centric" distribution).
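    The winning "bandwidth-centric" rule can be sketched with a minimal event-driven simulation of a star platform (our own simplification: a single shared master link, serialized sends, and no memory limits, which the paper does model):

```python
# Hedged sketch: the master sends equal-sized tasks over one shared link;
# among workers that are idle when the link frees up, the next task goes to
# the one with the smallest communication time, ignoring compute speed.

def bandwidth_centric_makespan(n_tasks, comm, comp):
    """comm[i] / comp[i]: per-task send / compute time of worker i."""
    free_at = [0.0] * len(comm)   # when each worker becomes idle
    t_link = 0.0                  # when the master's link frees up
    for _ in range(n_tasks):
        avail = [i for i in range(len(comm)) if free_at[i] <= t_link]
        if not avail:             # nobody idle yet: wait for the first one
            avail = [min(range(len(comm)), key=lambda i: free_at[i])]
        i = min(avail, key=lambda w: comm[w])   # bandwidth-centric choice
        start = max(t_link, free_at[i])
        t_link = start + comm[i]                # link busy during the send
        free_at[i] = t_link + comp[i]           # worker computes afterwards
    return max(free_at)

# Worker 0: fast link, slow CPU; worker 1: slower link, fast CPU.
print(bandwidth_centric_makespan(3, comm=[1.0, 2.0], comp=[10.0, 1.0]))
```

The first task still goes to worker 0 despite its slow CPU, which is exactly the counter-intuitive behavior the experiments reward: with a saturated master link, freeing the link quickly matters more than per-worker compute power.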

    Memory-aware list scheduling for hybrid platforms

    This report provides memory-aware heuristics to schedule task graphs onto heterogeneous resources, such as a dual-memory cluster equipped with multicores and a dedicated accelerator (FPGA or GPU). Each task has a different processing time on each resource. The optimization objective is to schedule the graph so as to minimize execution time, given the available memory of each resource type. In addition to ordering the tasks, we must also decide on which resource to execute them, given their computation requirements and the memory currently available on each resource. The major contributions of this report are twofold: (i) the derivation of an intricate integer linear program formulation of this scheduling problem; and (ii) the design of memory-aware heuristics, which outperform the reference heuristics HEFT and MinMin on a wide variety of problem instances. The absolute performance of these heuristics is assessed for small graphs, with up to 30 tasks, thanks to the linear program.
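    The core decision described above, choosing both an order and a resource under per-resource memory, can be sketched as a memory-aware variant of earliest-finish-time placement (our simplification: dependencies and memory release are ignored, and all names are illustrative, not the report's code):

```python
# Hedged sketch: place each ready task on the resource (e.g. "cpu" vs "gpu")
# that finishes it earliest among those with enough free memory.

def schedule(ready_order, time_on, mem_need, mem_free):
    """time_on[r][t]: processing time of task t on resource r;
    mem_need[t]: footprint of t; mem_free[r]: memory left on resource r.
    Returns (placement, finish time per resource); mutates mem_free."""
    avail = {r: 0.0 for r in time_on}        # when each resource frees up
    placement = {}
    for t in ready_order:
        candidates = [r for r in time_on if mem_free[r] >= mem_need[t]]
        if not candidates:
            raise RuntimeError("no resource has enough memory for " + str(t))
        r = min(candidates, key=lambda c: avail[c] + time_on[c][t])
        placement[t] = r
        avail[r] += time_on[r][t]
        mem_free[r] -= mem_need[t]           # memory held for simplicity
    return placement, avail

time_on = {"cpu": {"a": 3.0, "b": 2.0}, "gpu": {"a": 1.0, "b": 1.0}}
placement, _ = schedule(["a", "b"], time_on, {"a": 4, "b": 1},
                        {"cpu": 8, "gpu": 2})
print(placement)  # {'a': 'cpu', 'b': 'gpu'}: memory forces 'a' off the GPU
```

Note how task "a" lands on the slower CPU because the accelerator's memory cannot hold it, which is the kind of trade-off a pure earliest-finish-time heuristic like HEFT does not see.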

    Partitionnement d’arbres de tĂąches pour des plates-formes distribuĂ©es avec limitation de mĂ©moire

    Scientific applications are commonly modeled as the processing of directed acyclic graphs of tasks, and for some of them, the graph takes the special form of a rooted tree. This tree expresses both the computational dependencies between tasks and their storage requirements. The problem of scheduling/traversing such a tree on a single processor to minimize its memory footprint has already been widely studied. Hence, we move to parallel processing and study how to partition the tree for a homogeneous multiprocessor platform, where each processor is equipped with its own memory. We formally state the problem of partitioning the tree into subtrees such that each subtree can be processed on a single processor and the total resulting processing time is minimized. We prove that the problem is NP-complete, and we design polynomial-time heuristics to address it. An extensive set of simulations demonstrates the usefulness of these heuristics.
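    The partitioning problem stated above can be illustrated by a greedy bottom-up rule (a deliberate simplification of the paper's heuristics, with names of our choosing): accumulate each subtree's memory footprint and cut a child edge whenever merging it would exceed the per-processor limit.

```python
# Hedged sketch: cut tree edges so every resulting part's total weight
# fits under `limit`. Assumes each single node's weight already fits.

def partition(children, weight, root, limit):
    """children[v]: list of v's children (leaves may be absent from the dict);
    weight[v]: memory footprint of v. Returns the cut edges (child side)."""
    cuts = []

    def visit(v):
        total = weight[v]
        for c in children.get(v, []):
            sub = visit(c)
            if total + sub > limit:
                cuts.append(c)       # c's subtree becomes its own part
            else:
                total += sub         # keep c's subtree in v's part
        return total                 # weight of the part rooted at v

    visit(root)
    return cuts

# Complete binary tree of 7 unit-weight nodes, limit 3 per processor:
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
weight = {v: 1 for v in range(7)}
print(partition(children, weight, 0, 3))  # [1, 2]: three parts of sizes 3, 3, 1
```

Such a greedy cut ignores the resulting parts' processing times, which is exactly where the paper's makespan-oriented heuristics go further.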

    Placement de workflows de type arbre sur des architectures à mémoire hétérogÚne

    Directed acyclic graphs are commonly used to model scientific workflows, by expressing dependencies between tasks, as well as the resource requirements of the workflow. As a special case, rooted directed trees occur in several applications, for instance in sparse matrix computations. Since typical workflows are modeled by huge trees, it is crucial to schedule them efficiently, so that their execution time (or makespan) is minimized. Furthermore, it might be beneficial to distribute the execution on several compute nodes, hence increasing the available memory, and allowing us to parallelize parts of the execution. To exploit the heterogeneity of modern clusters in this context, we investigate the partitioning and mapping of tree-shaped workflows on target architectures where each processor can have a different memory size. Our three-step heuristic adapts and extends previous work for homogeneous clusters [Gou et al., TPDS 2020]. The changes we propose concern the assignment to processors (which considers the different memory sizes) and the availability of suitable processors when splitting or merging subtrees. We evaluate our approach with extensive simulations and demonstrate that exploiting the heterogeneity in the cluster reduces the makespan significantly compared to the state of the art for homogeneous memory.