63 research outputs found

    Scheduling Rigid, Evolving Applications on Homogeneous Resources

    Classical applications executed on clusters or grids are either rigid/moldable or workflow-based. However, growing compute and storage capabilities have enabled more complex applications. For example, some code-coupling applications exhibit changing resource requirements without being workflows. Executing them on current batch schedulers leads to inefficient resource usage, as a block of resources has to be reserved for the whole duration of the application. This paper studies the problem of offline scheduling of rigid and evolving applications on homogeneous resources. It proposes several scheduling algorithms and evaluates them through simulations. Results show that significant improvements in makespan and resource usage can be achieved with short scheduling computation times.

    Untying RMS from Application Scheduling

    As both resources and applications become more complex, resource management also becomes a more challenging task. For example, scheduling code-coupling applications on federations of clusters such as Grids requires complex resource selection algorithms. The abstractions provided by current Resource Management Systems (RMS), usually rigid jobs or advance reservations, are insufficient to let such applications select resources efficiently. This paper studies an RMS architecture that delegates resource selection to applications while the RMS keeps control over the resources. The proposed architecture is evaluated using a simulator, which is then validated with a proof-of-concept implementation. Results show that such a system is feasible and performs well with respect to fairness and scalability.

    Scheduling Independent Moldable Tasks on Multi-Cores with GPUs

    The number of parallel systems using accelerators is growing. The technology is now mature enough to allow sustained petaflop/s. However, reaching this performance scale requires efficient scheduling algorithms to manage the heterogeneous computing resources. We present a new approach for scheduling independent tasks on multiple CPUs and multiple GPUs. The tasks are assumed to be parallelizable on CPUs using the moldable model: the final number of cores allotted to a task can be decided and set by the scheduler. More precisely, we design an algorithm aiming at minimizing the makespan (the maximum completion time of all tasks) for this scheduling problem. The proposed algorithm combines a dual approximation scheme with a fast integer linear program (ILP). It determines both the partitioning of the tasks, i.e., whether a task should be mapped to CPUs or a GPU, and the number of CPUs allotted to a moldable task if mapped to the CPUs. A worst-case analysis shows that the algorithm has an approximation ratio of 3/2 + ε. However, since the complexity of the ILP-based algorithm could be non-polynomial, we also present a proved polynomial-time algorithm with an approximation ratio of 2 + ε. We complement the theoretical analysis of our two novel algorithms with an experimental study. In these experiments, we compare our algorithms to a modified version of the classical HEFT algorithm, adapted to handle moldable tasks. The experimental results show that our algorithm with the 3/2 + ε approximation ratio produces significantly shorter schedules than the modified HEFT for most of the instances. In addition, the experiments provide evidence that this ILP-based algorithm is also practically able to solve larger problem instances in a reasonable amount of time.

    Theory and Engineering of Scheduling Parallel Jobs

    Scheduling is essential for the efficient utilization of modern parallel computing systems. In this thesis, four main research areas for scheduling are investigated: the interplay and distribution of decision makers, efficient schedule computation, efficient scheduling for the memory hierarchy, and energy efficiency. The main result is a provably fast and efficient scheduling algorithm for malleable jobs. Experiments show the importance and possibilities of scheduling that takes the memory hierarchy into account.

    De l'ordonnancement des applications multi-niveaux

    Driven by application needs, computing resources are becoming ever more powerful. This evolution is made possible in particular by increasingly parallel architectures. In this context, the performance portability of highly optimized HPC applications is problematic. This article is motivated by the example of an HPC application called HLW (High-Level Waste). We present a model of the structure of HLW that is independent of the execution architecture. We then generalize this model by introducing multi-level applications: applications made up of a set of independent tasks, each of which has some probability of triggering a new task when it terminates. We then study the scheduling of such tasks in the case where they are moldable, follow Amdahl's law, and run on a homogeneous architecture. We propose a family of algorithms and evaluate its performance through simulations. Finally, we select the best-performing algorithm. This algorithm improves on HLW's default scheduling both in performance and in independence from the architecture parameters.
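
    As a reading aid for the abstract above, the following minimal sketch shows what it means for a moldable task to follow Amdahl's law: the scheduler chooses the number of cores, and only the parallel part of the work shrinks with it. The function names and the `serial_fraction` parameter are illustrative assumptions and are not taken from the paper.

```python
def amdahl_time(t1: float, serial_fraction: float, cores: int) -> float:
    """Execution time of a moldable task on `cores` cores under Amdahl's law.

    t1              -- run time on a single core
    serial_fraction -- share of the work that cannot be parallelized (0..1)
    cores           -- number of cores allotted by the scheduler (>= 1)
    """
    if cores < 1:
        raise ValueError("a task needs at least one core")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    # The serial part is unaffected; the parallel part is divided among cores.
    return t1 * (serial_fraction + (1.0 - serial_fraction) / cores)


def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Speedup relative to one core; bounded above by 1 / serial_fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    # Hypothetical task: 100 time units on one core, half of it serial.
    for p in (1, 2, 4, 8):
        print(p, amdahl_time(100.0, 0.5, p), round(amdahl_speedup(0.5, p), 3))
```

    Because the speedup saturates, giving a task more cores yields diminishing returns, which is why the allotment decision matters for the scheduler.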

    Bridging a Gap Between Research and Production: Contributions to Scheduling and Simulation

    Large-scale distributed computing infrastructures (e.g., data centers, grids, or clouds) are used by scientists from various domains to produce outstanding research results, such as the discovery of the Higgs Boson in High Energy Physics. These infrastructures are also studied by Computer Scientists to produce their own set of scientific results. Ideally, a virtuous circle should exist between Domain and Computer Scientists, the former raising challenges that could be addressed by the latter. Unfortunately, on many occasions, a gap exists that prevents such an ideal, mutually fostering collaboration. This habilitation covers research conducted in the fields of scheduling and simulation that contributes to filling this gap. It discusses the necessary conditions to achieve this goal and details concrete initiatives in this endeavor.

    Nützliche Strukturen und wie sie zu finden sind: Nicht Approximierbarkeit und Approximationen für diverse Varianten des Parallel Task Scheduling Problems

    In this thesis, we consider the Parallel Task Scheduling problem and several variants. This problem and its variations have diverse applications in theory and practice; for example, they appear as sub-problems in higher-dimensional problems. In the Parallel Task Scheduling problem, we are given a set of jobs and a set of identical machines. Each job is a parallel task; i.e., it needs a fixed number of identical machines to be processed. A schedule assigns to each job a starting time and the set of machines it is processed on. It is feasible if at each point in time each machine processes at most one job. In a variant of this problem, called Strip Packing, the identical machines are arranged in a total order, and jobs can only allocate machines that are neighbors with regard to this order. In this case, we also speak of Contiguous Parallel Task Scheduling. In another variant, called Single Resource Constraint Scheduling, we are given an additional constraint on how many jobs can be processed at the same time. For these variants of the Parallel Task Scheduling problem, we consider an extension where the set of machines is grouped into identical clusters. When scheduling a job, we are allowed to allocate machines from only one cluster to process it. For all these problems, we close some gaps between inapproximability or hardness results and the best possible algorithm. For Parallel Task Scheduling, we prove that the problem is strongly NP-hard if we are given precisely 4 machines. Previously, it was known to be strongly NP-hard for at least 5 machines, and there was an (exact) pseudo-polynomial time algorithm for up to 3 machines. For Strip Packing, we present a pseudo-polynomial algorithm with approximation ratio (5/4 + ε) and prove that no pseudo-polynomial algorithm achieves a ratio less than 5/4 unless P = NP.
    Concerning Single Resource Constraint Scheduling, it is not possible to find an algorithm with a ratio smaller than 3/2 unless P = NP, and we present an algorithm with ratio (3/2 + ε). For the extensions to identical clusters, there can be no approximation algorithm with a ratio smaller than 2 unless P = NP. For the extensions of Strip Packing and Parallel Task Scheduling, 2-approximations already exist, but they have a huge worst-case running time. We present 2-approximations with linear running time for the extensions of Strip Packing, Parallel Task Scheduling, and Single Resource Constraint Scheduling for the case that at least three clusters are present, and greatly improve the running time for two clusters. Finally, we consider three variants of Scheduling on Identical Machines with setup times. We present EPTAS results for all of them, which is the best one can hope for since these problems are strongly NP-complete.
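
    The feasibility condition stated in the abstract above (each machine processes at most one job at any point in time, with an optional contiguity requirement for the Strip Packing variant) can be illustrated with a small checker. This is only a sketch under stated assumptions: the `ScheduledJob` type, its field names, and the use of half-open time intervals are hypothetical choices made for the example and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ScheduledJob:
    """A parallel task together with its placement (hypothetical type)."""
    name: str
    required: int             # fixed number of machines the job needs
    duration: float           # processing time
    start: float              # starting time assigned by the schedule
    machines: FrozenSet[int]  # machine indices assigned by the schedule


def is_feasible(jobs: List[ScheduledJob], num_machines: int,
                contiguous: bool = False) -> bool:
    """Return True if no machine runs two jobs at the same time.

    With contiguous=True, each job must also occupy neighboring machines,
    as in the Strip Packing / contiguous variant.
    """
    busy: Dict[int, List[Tuple[float, float]]] = {}
    for job in jobs:
        if job.start < 0 or job.duration <= 0:
            return False
        if len(job.machines) != job.required:
            return False
        if any(m < 0 or m >= num_machines for m in job.machines):
            return False
        if contiguous and job.machines:
            if max(job.machines) - min(job.machines) + 1 != len(job.machines):
                return False
        for m in job.machines:
            # Assumption: a job occupies the half-open interval [start, end).
            busy.setdefault(m, []).append((job.start, job.start + job.duration))
    for intervals in busy.values():
        intervals.sort()
        # After sorting by start time, any overlap shows up between neighbors.
        for (_, end_prev), (start_next, _) in zip(intervals, intervals[1:]):
            if start_next < end_prev:
                return False
    return True


def makespan(jobs: List[ScheduledJob]) -> float:
    """Maximum completion time over all jobs (0 for an empty schedule)."""
    return max((j.start + j.duration for j in jobs), default=0.0)
```

    For instance, with three machines, a job on machines {0, 1} during [0, 2) followed by a job on machines {1, 2} starting at time 2 is feasible, while starting the second job at time 1 is not, because machine 1 would then process two jobs at once.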