    Mapping tree-shaped workflows on systems with different memory sizes and processor speeds

    Directed acyclic graphs are commonly used to model scientific workflows, by expressing dependencies between tasks, as well as the resource requirements of the workflow. As a special case, rooted directed trees occur in several applications, for instance in sparse matrix computations. Since typical workflows are modeled by large trees, it is crucial to schedule them efficiently, so that their execution time (or makespan) is minimized. Furthermore, it is usually beneficial to distribute the execution on several compute nodes, hence increasing the available memory, and allowing us to parallelize parts of the execution. To exploit the heterogeneity of modern clusters in this context, we investigate the partitioning and mapping of tree‐shaped workflows on two types of target architecture models: in AM1, each processor can have a different memory size, and in AM2, each processor can also have a different speed (in addition to a different memory size). We design a three‐step heuristic for AM1, which adapts and extends previous work for homogeneous clusters [Gou C, Benoit A, Marchal L. Partitioning tree‐shaped task graphs for distributed platforms with limited memory. IEEE Trans Parallel Dist Syst 2020; 31(7): 1533–1544]. The changes we propose concern the assignment to processors (accounting for the different memory sizes) and the availability of suitable processors when splitting or merging subtrees. For AM2, we extend the heuristic for AM1 with a two‐phase local search approach. Phase A is a swap‐based hill climber, while (the optional) Phase B is inspired by iterated local search. We evaluate our heuristics for AM1 and AM2 with extensive simulations, and we demonstrate that exploiting the heterogeneity in the cluster significantly reduces the makespan, compared to the state of the art for homogeneous processors.
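    As an illustration of what a swap-based hill climber (Phase A) can look like, the sketch below uses a deliberately simplified model that is our assumption, not the paper's exact formulation: the tree is already partitioned into subtrees forming a quotient tree (`parent`), subtree i has work `work[i]` and peak memory `mem[i]`, each subtree runs on its own processor j with speed `speed[j]` and memory `memory[j]`, and communication is ignored. A candidate mapping is a permutation of all processors (positions beyond the number of subtrees are idle), so one swap either exchanges the processors of two subtrees or moves a subtree to an idle processor. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def makespan(parent: List[Optional[int]], work: List[float],
             speed: List[float], assign: List[int]) -> float:
    """Finish time of the root subtree in the quotient tree.

    A subtree starts once all its child subtrees have finished and runs for
    work[i] / speed[assign[i]]; communication costs are ignored (assumption).
    """
    children: List[List[int]] = [[] for _ in parent]
    root = -1
    for i, p in enumerate(parent):
        if p is None:
            root = i
        else:
            children[p].append(i)

    def finish(i: int) -> float:
        start = max((finish(c) for c in children[i]), default=0.0)
        return start + work[i] / speed[assign[i]]

    return finish(root)


def feasible(mem: List[float], memory: List[float], assign: List[int]) -> bool:
    """Every subtree's peak memory must fit on its processor."""
    return all(mem[i] <= memory[assign[i]] for i in range(len(mem)))


def swap_hill_climb(parent, work, mem, speed, memory, perm) -> Tuple[List[int], float]:
    """Best-improvement hill climbing over pairwise swaps of a processor permutation.

    perm lists all processors; subtree i uses perm[i], and perm[n:] are idle.
    Stops at a local optimum where no feasible swap lowers the makespan.
    """
    n = len(parent)
    perm = list(perm)
    if not feasible(mem, memory, perm[:n]):
        raise ValueError("initial mapping violates a memory constraint")
    best = makespan(parent, work, speed, perm[:n])
    while True:
        best_move = None
        for a in range(n):  # position a always holds a subtree
            for b in range(a + 1, len(perm)):
                perm[a], perm[b] = perm[b], perm[a]
                if feasible(mem, memory, perm[:n]):
                    t = makespan(parent, work, speed, perm[:n])
                    if t < best - 1e-12:  # strict improvement only
                        best, best_move = t, (a, b)
                perm[a], perm[b] = perm[b], perm[a]  # undo the trial swap
        if best_move is None:
            return perm[:n], best
        a, b = best_move
        perm[a], perm[b] = perm[b], perm[a]
```

    For a root subtree (work 2) with two children (work 8 and 4), three slow processors and one idle processor that is four times faster, the climber moves the heaviest child onto the fast processor, lowering the makespan from 10 to 6. An iterated-local-search phase would then perturb such a local optimum and restart the climb.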

    Partitioning task trees for distributed platforms with limited memory

    Scientific applications are commonly modeled as the processing of directed acyclic graphs of tasks, and for some of them, the graph takes the special form of a rooted tree. This tree expresses both the computational dependencies between tasks and their storage requirements. The problem of scheduling/traversing such a tree on a single processor to minimize its memory footprint has already been widely studied. Hence, we move to parallel processing and study how to partition the tree for a homogeneous multiprocessor platform, where each processor is equipped with its own memory. We formally state the problem of partitioning the tree into subtrees such that each subtree can be processed on a single processor and the total resulting processing time is minimized. We prove that the problem is NP-complete, and we design polynomial-time heuristics to address it. An extensive set of simulations demonstrates the usefulness of these heuristics.
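    Whether a subtree can be processed on a single processor depends on its peak memory, which in turn depends on the traversal order. The sketch below computes the peak memory of the best postorder traversal using Liu's classical rule (process children in non-increasing order of peak minus output size). The memory model is an assumption in the spirit of this line of work: node i needs the output files `f[c]` of all its children, an execution file `n[i]`, and its own output `f[i]` in memory at once. The paper's own model and algorithms may differ, and the best postorder is not always the optimal traversal.

```python
from typing import List


def best_postorder_peak(children: List[List[int]], f: List[float],
                        n: List[float], root: int = 0) -> float:
    """Peak memory of the best postorder traversal of the subtree at root.

    Assumed model: processing node i holds the outputs f[c] of all its
    children, its execution file n[i], and its own output f[i].
    """
    def peak(i: int) -> float:
        kids = [(peak(c), f[c]) for c in children[i]]
        # Liu's rule: larger (peak - output) first minimizes the postorder peak.
        kids.sort(key=lambda pc: pc[0] - pc[1], reverse=True)
        best, stored = 0.0, 0.0
        for p_c, f_c in kids:
            best = max(best, stored + p_c)  # earlier outputs stay resident
            stored += f_c
        return max(best, stored + n[i] + f[i])  # the node itself runs last

    return peak(root)


def fits(children, f, n, root, capacity) -> bool:
    """Hypothetical helper: can this subtree run on a processor of this memory?"""
    return best_postorder_peak(children, f, n, root) <= capacity
```

    For a root with two leaf children, one needing 10 + 1 units and the other 5, processing the larger leaf first yields a peak of 11 instead of 16, so the whole tree fits on a processor with 11 units of memory.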

    Data stream mining techniques: a review

    A vast amount of potentially unbounded data is generated by the Internet and other information sources. Analyzing this massive data in real time and extracting valuable knowledge with different mining applications and platforms has become an active area for both research and industry. However, data stream mining faces challenges that distinguish it from traditional data mining. Recently, many studies have addressed massive data mining problems and proposed several techniques that produce impressive results. In this paper, we review real-time clustering and classification techniques for data streams. We analyze the characteristics of data stream mining and discuss its challenges and open research issues. Finally, we present some of the platforms for data stream mining.
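    To make the single-pass constraint of stream clustering concrete, here is a minimal sketch of sequential (online) k-means in the style of MacQueen, one classic streaming technique; it is our own illustration, not a method taken from the review. Each arriving point is assigned to its nearest center, and that center moves to the running mean of the points it has absorbed, so memory stays proportional to the number of centers rather than to the length of the stream. The class name and the choice of initial centers are assumptions.

```python
import math
from typing import List, Sequence


class SequentialKMeans:
    """Single-pass k-means: each center is the running mean of its points."""

    def __init__(self, initial_centers: Sequence[Sequence[float]]):
        self.centers: List[List[float]] = [list(c) for c in initial_centers]
        self.counts = [0] * len(self.centers)  # points absorbed by each center

    def update(self, x: Sequence[float]) -> int:
        """Absorb one stream element and return the index of its cluster."""
        j = min(range(len(self.centers)),
                key=lambda k: math.dist(x, self.centers[k]))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step size 1/n gives an exact running mean
        self.centers[j] = [c + eta * (xi - c) for c, xi in zip(self.centers[j], x)]
        return j
```

    Feeding the stream `(1, 1), (3, 3), (9, 9), (11, 11)` into a model seeded with centers `(0, 0)` and `(10, 10)` labels the points `0, 0, 1, 1` and leaves the centers at `(2, 2)` and `(10, 10)`, with each point inspected exactly once.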

    Mapping tree-shaped workflows on memory-heterogeneous architectures

    Directed acyclic graphs are commonly used to model scientific workflows, by expressing dependencies between tasks, as well as the resource requirements of the workflow. As a special case, rooted directed trees occur in several applications, for instance in sparse matrix computations. Since typical workflows are modeled by huge trees, it is crucial to schedule them efficiently, so that their execution time (or makespan) is minimized. Furthermore, it might be beneficial to distribute the execution on several compute nodes, hence increasing the available memory, and allowing us to parallelize parts of the execution. To exploit the heterogeneity of modern clusters in this context, we investigate the partitioning and mapping of tree-shaped workflows on target architectures where each processor can have a different memory size. Our three-step heuristic adapts and extends previous work for homogeneous clusters [Gou et al., TPDS 2020]. The changes we propose concern the assignment to processors (which considers the different memory sizes) and the availability of suitable processors when splitting or merging subtrees. We evaluate our approach with extensive simulations and demonstrate that exploiting the heterogeneity in the cluster reduces the makespan significantly compared to the state of the art for homogeneous memory.
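    The assignment step has to respect each processor's memory size. As a hypothetical illustration (not the authors' algorithm), the sketch below assumes one subtree per processor and a known peak memory requirement per subtree, and uses a largest-first, best-fit rule: each subtree, in decreasing order of requirement, takes the smallest free processor whose memory suffices, which keeps large memories available for the subtrees that need them.

```python
import bisect
from typing import List, Optional


def best_fit_assign(req: List[float], mem: List[float]) -> Optional[List[int]]:
    """Map each subtree to a distinct processor whose memory is large enough.

    req[i] is the peak memory of subtree i, mem[j] the memory of processor j.
    Returns assign with assign[i] = processor of subtree i, or None if the
    greedy finds no feasible mapping.
    """
    free = sorted((m, j) for j, m in enumerate(mem))  # (memory, processor id)
    assign: List[int] = [-1] * len(req)
    for i in sorted(range(len(req)), key=lambda s: req[s], reverse=True):
        k = bisect.bisect_left(free, (req[i], -1))  # smallest memory >= req[i]
        if k == len(free):
            return None
        _, assign[i] = free.pop(k)
    return assign
```

    For example, `best_fit_assign([3, 8, 5], [4, 10, 6, 9])` maps the 8-unit subtree to the 9-unit processor, the 5-unit subtree to the 6-unit processor and the 3-unit subtree to the 4-unit processor, leaving the largest memory free for a later split or merge.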

    Parallel scheduling of task trees with limited memory

    This paper investigates the execution of tree-shaped task graphs using multiple processors. Each edge of such a tree represents some large data. A task can only be executed if all its input and output data fit into memory, and a piece of data can only be removed from memory after the completion of the task that uses it as input. Such trees arise, for instance, in the multifrontal method of sparse matrix factorization. The peak memory needed for the processing of the entire tree depends on the execution order of the tasks. With one processor, the objective of the tree traversal is to minimize the required memory. This problem has been well studied, and optimal polynomial algorithms have been proposed. Here, we extend the problem by considering multiple processors, which is of obvious interest in the application area of matrix factorization. With multiple processors comes the additional objective of minimizing the time needed to traverse the tree, i.e., the makespan. Not surprisingly, this problem proves to be much harder than the sequential one. We study the computational complexity of this problem and provide inapproximability results, even for unit-weight trees. We design a series of practical heuristics achieving different trade-offs between the minimization of peak memory usage and makespan. Some of these heuristics are able to process a tree while keeping the memory usage under a given memory limit. The heuristics are assessed in an extensive experimental evaluation using realistic trees.
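    The makespan side of this trade-off can be illustrated with a plain list scheduler for a task tree on `p` identical processors: a task becomes ready once all its children have completed, and ready tasks are started by decreasing bottom level (longest weighted path to the root). This sketch ignores memory entirely, which is precisely what memory-aware heuristics must additionally control; the priority rule and the function name are our assumptions.

```python
import heapq
from typing import List, Optional, Tuple


def list_schedule(parent: List[Optional[int]], w: List[float], p: int) -> float:
    """Makespan of greedy list scheduling of a task tree on p identical processors.

    parent[i] is the parent of task i (None for the root); a task is ready once
    all its children have finished. Memory usage is not modeled.
    """
    if p < 1:
        raise ValueError("need at least one processor")
    n = len(parent)
    pending = [0] * n  # number of unfinished children per task
    for q in parent:
        if q is not None:
            pending[q] += 1

    bottom = [-1.0] * n  # weighted path length from task to root, inclusive

    def bottom_level(i: int) -> float:
        if bottom[i] < 0:
            q = parent[i]
            bottom[i] = w[i] + (0.0 if q is None else bottom_level(q))
        return bottom[i]

    ready = [(-bottom_level(i), i) for i in range(n) if pending[i] == 0]
    heapq.heapify(ready)
    running: List[Tuple[float, int]] = []  # (finish time, task)
    time, idle, done = 0.0, p, 0
    while done < n:
        while idle > 0 and ready:  # start as many ready tasks as possible
            _, i = heapq.heappop(ready)
            heapq.heappush(running, (time + w[i], i))
            idle -= 1
        time, i = heapq.heappop(running)  # advance to the next completion
        idle += 1
        done += 1
        q = parent[i]
        if q is not None:
            pending[q] -= 1
            if pending[q] == 0:
                heapq.heappush(ready, (-bottom_level(q), q))
    return time
```

    With a root and three unit-weight leaves, two processors give a makespan of 3 and three processors give 2; more parallel leaves in flight also means more output data resident at once, which is where the memory limit starts to bind.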

    Mapping Tree-shaped Workflows on Memory-heterogeneous Architectures

    Directed acyclic graphs are commonly used to model scientific workflows, by expressing dependencies between tasks, as well as the resource requirements of the workflow. As a special case, rooted directed trees occur in several applications. Since typical workflows are modeled by huge trees, it is crucial to schedule them efficiently. We investigate the partitioning and mapping of tree-shaped workflows on target architectures where each processor can have a different memory size. Our three-step heuristic adapts and extends previous work for homogeneous clusters. In particular, we design a novel algorithm to assign subtrees to processors with different memory sizes, and we show how to select appropriate processors when splitting or merging subtrees. The experiments demonstrate that exploiting the heterogeneity reduces the makespan significantly compared to the state of the art for homogeneous memories.

    Streaming Graph Challenge: Stochastic Block Partition

    An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled to large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge, including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open-source C++ code, and (7) considerations for partitioning the graph in a streaming fashion. Data sets and source code for the algorithm and metrics, with detailed documentation, are available at GraphChallenge.org. Comment: To be published in the 2017 IEEE High Performance Extreme Computing Conference (HPEC).
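    For the evaluation of partition correctness, one common family of metrics compares a computed partition with the ground truth over pairs of vertices. The sketch below computes pairwise precision and recall from block labels: a pair counts as a true positive when both partitions put its two vertices in the same block. It is a generic illustration under our own naming; the exact metrics and definitions used by the challenge should be taken from its documentation at GraphChallenge.org.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence, Tuple


def pairwise_precision_recall(truth: Sequence[Hashable],
                              pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Pairwise precision and recall of a predicted vertex partition.

    truth[v] and pred[v] are the block labels of vertex v; labels need not match
    between the two partitions, only the grouping matters.
    """
    joint = Counter(zip(truth, pred))  # vertices per (true block, predicted block)
    tp = sum(comb(c, 2) for c in joint.values())  # pairs grouped together in both
    pred_pairs = sum(comb(c, 2) for c in Counter(pred).values())
    true_pairs = sum(comb(c, 2) for c in Counter(truth).values())
    precision = tp / pred_pairs if pred_pairs else 1.0
    recall = tp / true_pairs if true_pairs else 1.0
    return precision, recall
```

    Counting pairs through the contingency table keeps the cost linear in the number of vertices instead of enumerating all vertex pairs, which matters for the graph sizes targeted by the challenge.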

    Deliverable JRA1.1: Evaluation of current network control and management planes for multi-domain network infrastructure

    This deliverable includes a compilation and evaluation of available control and management architectures and protocols applicable to a multilayer infrastructure in a multi-domain Virtual Network environment. The scope of this deliverable is mainly focused on the virtualization of resources within a network and at processing nodes. The virtualization of the FEDERICA infrastructure allows its available resources to be provisioned to users by means of FEDERICA slices. A slice is seen by the user as a real physical network under his/her domain; however, it maps to a logical partition (a virtual instance) of the physical FEDERICA resources. A slice is built to exhibit, to the highest degree, all the principles applicable to a physical network (isolation, reproducibility, manageability, ...). Currently, there are no standard definitions available for network virtualization or its associated architectures. Therefore, this deliverable proposes the Virtual Network layer architecture and evaluates a set of management and control planes that can be used for the partitioning and virtualization of the FEDERICA network resources. This evaluation has been performed taking into account an initial set of FEDERICA requirements; a possible extension of the selected tools will be evaluated in future deliverables. The studies described in this deliverable define the virtual architecture of the FEDERICA infrastructure. During this activity, the need was recognized to establish a new set of basic definitions (taxonomy) for the building blocks that compose the so-called slice, i.e., the virtual network instantiation (which is virtual with regard to the abstracted view of the building blocks of the FEDERICA infrastructure), and its architectural plane representation. These definitions will be established as a common nomenclature for the FEDERICA project. Other important aspects when defining a new architecture are the user requirements. 
It is crucial that the resulting architecture fits the demands that users may have. Since this deliverable was produced in parallel with the process of contacting users, carried out by the project activities related to the Use Case definitions, JRA1 has proposed a set of basic Use Cases to be considered as a starting point for its internal studies. When researchers want to experiment with their developments, they need not only network resources in their slices, but also a slice of the processing resources. These processing slice resources are understood as virtual machine instances that users can configure to behave as software routers or end nodes, onto which they can download the protocols or applications they have produced and want to assess in a realistic environment. Hence, this deliverable also studies the APIs of several virtual machine management software products in order to identify which best suits FEDERICA's needs.
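    The idea that a slice maps to a logical partition of shared physical resources can be sketched as simple capacity bookkeeping: a slice is admitted only if every resource it requests (link bandwidth, virtual CPUs, and so on) still has enough unreserved capacity, which is one basic ingredient of isolation between slices. The class, method, and resource names below are hypothetical and are not part of any FEDERICA tool or API.

```python
from typing import Dict


class Substrate:
    """Hypothetical physical substrate whose capacity is partitioned among slices."""

    def __init__(self, capacity: Dict[str, float]):
        self.free = dict(capacity)                     # unreserved capacity per resource
        self.slices: Dict[str, Dict[str, float]] = {}  # reservations per slice

    def allocate(self, slice_id: str, demand: Dict[str, float]) -> bool:
        """Reserve all requested resources for a slice, or nothing (all-or-nothing)."""
        if slice_id in self.slices:
            raise ValueError(f"slice {slice_id!r} already exists")
        if any(r not in self.free or amt > self.free[r] for r, amt in demand.items()):
            return False
        for r, amt in demand.items():
            self.free[r] -= amt
        self.slices[slice_id] = dict(demand)
        return True

    def release(self, slice_id: str) -> None:
        """Return a slice's reservations to the shared pool."""
        for r, amt in self.slices.pop(slice_id).items():
            self.free[r] += amt
```

    Because a slice never receives more than the unreserved capacity, one slice cannot consume resources promised to another; real control and management planes add much more (topology mapping, monitoring, VM lifecycle), which this sketch deliberately leaves out.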