59 research outputs found
Efficient Algorithms for Scheduling Moldable Tasks
We study the problem of scheduling n independent moldable tasks on m
processors that arises in large-scale parallel computations. When tasks are
monotonic, the best known result is a (3/2+ε)-approximation
algorithm for makespan minimization with a complexity linear in n and
polynomial in log(m) and 1/ε, where ε is
arbitrarily small. We propose a new perspective of the existing speedup models:
the speedup of a task is linear when the number of assigned
processors is small (up to a threshold), while it is
monotonic when that number ranges beyond the threshold up to a bound; this bound
indicates an unacceptable overhead when parallelizing on too many processors.
For a given integer [...], let [...]. In this paper, we propose a [...]-approximation algorithm for makespan minimization with a
complexity [...] where
[...] ([...]). As
a by-product, we also propose a [...]-approximation algorithm for
throughput maximization with a common deadline with a complexity [...]
Scheduling Monotone Moldable Jobs in Linear Time
A moldable job is a job that can be executed on an arbitrary number of
processors, and whose processing time depends on the number of processors
allotted to it. A moldable job is monotone if its work doesn't decrease for an
increasing number of allotted processors. We consider the problem of scheduling
monotone moldable jobs to minimize the makespan.
We argue that for certain compact input encodings a polynomial algorithm has
a running time polynomial in n and log(m), where n is the number of jobs and m
is the number of machines. We describe how monotony of jobs can be used to
counteract the increased problem complexity that arises from compact encodings,
and give tight bounds on the approximability of the problem with compact
encoding: it is NP-hard to solve optimally, but admits a PTAS.
The main focus of this work is efficient approximation algorithms. We
describe different techniques to exploit the monotony of the jobs for better
running times, and present a (3/2+ε)-approximate algorithm whose
running time is polynomial in log(m) and 1/ε, and only linear in the
number n of jobs
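As an illustration of the monotony condition defined in this abstract, the following minimal sketch checks whether a job's work never decreases as more processors are allotted. It assumes the usual definition of work as the number of processors times the processing time; the names `work` and `is_monotone`, the list representation of processing times, and the sample values are hypothetical and not taken from the paper.

```python
from typing import Sequence


def work(times: Sequence[float], p: int) -> float:
    """Work on p processors: p * t(p), where times[p - 1] holds t(p).

    Assumption: work is defined as processors times processing time.
    """
    return p * times[p - 1]


def is_monotone(times: Sequence[float]) -> bool:
    """Return True if the work p * t(p) never decreases as p grows."""
    works = [work(times, p) for p in range(1, len(times) + 1)]
    return all(a <= b for a, b in zip(works, works[1:]))


# Hypothetical processing times t(1), t(2), t(3), t(4) for two jobs.
sublinear_job = [12.0, 7.0, 5.0, 4.0]    # works: 12, 14, 15, 16
superlinear_job = [12.0, 5.0, 3.0, 2.0]  # works: 12, 10, 9, 8
```

Here `is_monotone(sublinear_job)` holds because each added processor increases the total work, while `superlinear_job` violates the condition because its work shrinks as processors are added.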
Malleable Scheduling Beyond Identical Machines
In malleable job scheduling, jobs can be executed simultaneously on multiple machines with the processing time depending on the number of allocated machines. Jobs are required to be executed non-preemptively and in unison, in the sense that they occupy, during their execution, the same time interval over all the machines of the allocated set. In this work, we study generalizations of malleable job scheduling inspired by standard scheduling on unrelated machines. Specifically, we introduce a general model of malleable job scheduling, where each machine has a (possibly different) speed for each job, and the processing time of a job j on a set of allocated machines S depends on the total speed of S for j. For machines with unrelated speeds, we show that the optimal makespan cannot be approximated within a factor less than e/(e-1), unless P = NP. On the positive side, we present polynomial-time algorithms with approximation ratios 2e/(e-1) for machines with unrelated speeds, 3 for machines with uniform speeds, and 7/3 for restricted assignments on identical machines. Our algorithms are based on deterministic LP rounding and result in sparse schedules, in the sense that each machine shares at most one job with other machines. We also prove lower bounds on the integrality gap of 1+phi for unrelated speeds (phi is the golden ratio) and 2 for uniform speeds and restricted assignments. To indicate the generality of our approach, we show that it also yields constant factor approximation algorithms (i) for minimizing the sum of weighted completion times; and (ii) a variant where we determine the effective speed of a set of allocated machines based on the L_p norm of their speeds
Multi-Resource List Scheduling of Moldable Parallel Jobs under Precedence Constraints
The scheduling literature has traditionally focused on a single type of
resource (e.g., computing nodes). However, scientific applications in modern
High-Performance Computing (HPC) systems process large amounts of data, hence
have diverse requirements on different types of resources (e.g., cores, cache,
memory, I/O). All of these resources could potentially be exploited by the
runtime scheduler to improve the application performance. In this paper, we
study multi-resource scheduling to minimize the makespan of computational
workflows comprised of parallel jobs subject to precedence constraints. The
jobs are assumed to be moldable, allowing the scheduler to flexibly select a
variable set of resources before execution. We propose a multi-resource,
list-based scheduling algorithm, and prove that, on a system with multiple types of
schedulable resources, our algorithm achieves an approximation ratio of
[...] for any number of resource types, and a ratio of [...] for a
large number of resource types. We also present improved results for independent jobs and for jobs
with special precedence constraints (e.g., series-parallel graphs and trees).
Finally, we prove a lower bound of [...] on the approximation ratio of any list
scheduling scheme with local priority considerations. To the best of our
knowledge, these are the first approximation results for moldable workflows
with multiple resource requirements
Closing the Gap for Pseudo-Polynomial Strip Packing
Two-dimensional packing problems are a fundamental class of optimization problems and Strip Packing is one of the most natural and famous among them. Indeed it can be defined in just one sentence: Given a set of rectangular axis parallel items and a strip with bounded width and infinite height, the objective is to find a packing of the items into the strip minimizing the packing height. We speak of pseudo-polynomial Strip Packing if we consider algorithms with pseudo-polynomial running time with respect to the width of the strip. It is known that there is no pseudo-polynomial time algorithm for Strip Packing with a ratio better than 5/4 unless P = NP. The best algorithm so far has a ratio of 4/3 + epsilon. In this paper, we close the gap between inapproximability result and currently known algorithms by presenting an algorithm with approximation ratio 5/4 + epsilon. The algorithm relies on a new structural result which is the main accomplishment of this paper. It states that each optimal solution can be transformed with bounded loss in the objective such that it has one of a polynomial number of different forms thus making the problem tractable by standard techniques, i.e., dynamic programming. To show the conceptual strength of the approach, we extend our result to other problems as well, e.g., Strip Packing with 90 degree rotations and Contiguous Moldable Task Scheduling, and present algorithms with approximation ratio 5/4 + epsilon for these problems as well
Ordonnancement avec tolérance aux pannes pour des tâches parallèles à nombre de processeurs programmable
We study the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms. Moldable jobs allow for choosing a processor allocation before execution, and their execution time obeys various speedup models. The objective is to minimize the overall completion time of the jobs, or the makespan, when jobs can fail due to silent errors and hence may need to be re-executed after each failure until successful completion. Our work generalizes the classical scheduling framework for failure-free jobs. To cope with silent errors, we introduce two resilient scheduling algorithms, LPA-List and Batch-List, both of which use the List strategy to schedule the jobs. Without knowing a priori how many times each job will fail, LPA-List relies on a local strategy to allocate processors to the jobs, while Batch-List schedules the jobs in batches and allows only a restricted number of failures per job in each batch. We prove new approximation ratios for the two algorithms under several prominent speedup models (e.g., roofline, communication, Amdahl, power, monotonic, and a mixed model). An extensive set of simulations is conducted to evaluate different variants of the two algorithms, and the results show that they consistently outperform some baseline heuristics. Overall, our best algorithm is within a factor of 1.6 of a lower bound on average over the entire set of experiments, and within a factor of 4.2 in the worst case.
This report studies the resilient scheduling of tasks on high-performance computing platforms. In the problem studied, the constant number of processors executing each task can be chosen, which determines the execution time of the task according to different speedup models. We describe algorithms whose objective is to minimize the total execution time, given that tasks may fail and must then be re-executed after each error. This problem is therefore a generalization of the classical framework in which all tasks are known a priori and do not fail. We describe a priority-list scheduling algorithm and prove new approximation bounds for several classical speedup models (roofline, communication, Amdahl, power, monotonic, and a model that mixes them). We also describe a batch scheduling algorithm, in which tasks may fail a limited number of times within each batch, and prove new approximation bounds for arbitrary speedups. Finally, we run experiments on a comprehensive set of examples to compare the performance of different variants of our algorithms, which are significantly better than the usual simple algorithms. Our best heuristic is on average within a factor of 1.6 of a lower bound on the optimal solution, and within a factor of 4.2 in the worst case
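The speedup models named in this abstract are not spelled out here. As a rough sketch, the functions below use forms commonly found in the moldable-scheduling literature for four of them; these formulas, and the parameter names `p_max`, `c`, `seq_fraction`, and `exponent`, are assumptions for illustration and may differ from the definitions used in the report.

```python
def roofline_time(w: float, p: int, p_max: int) -> float:
    """Assumed roofline form: linear speedup up to p_max processors, flat after."""
    return w / min(p, p_max)


def communication_time(w: float, p: int, c: float) -> float:
    """Assumed communication form: perfect parallelism plus overhead c per extra processor."""
    return w / p + (p - 1) * c


def amdahl_time(w: float, p: int, seq_fraction: float) -> float:
    """Assumed Amdahl form: a sequential fraction that does not parallelize."""
    return w * (seq_fraction + (1.0 - seq_fraction) / p)


def power_time(w: float, p: int, exponent: float) -> float:
    """Assumed power form: speedup p ** exponent with 0 < exponent <= 1."""
    return w / p ** exponent


# Hypothetical job with sequential time w = 100 on p = 4 processors.
w, p = 100.0, 4
times = {
    "roofline": roofline_time(w, p, p_max=8),
    "communication": communication_time(w, p, c=1.0),
    "amdahl": amdahl_time(w, p, seq_fraction=0.2),
    "power": power_time(w, p, exponent=0.5),
}
```

Under these assumed forms, the same job on the same allocation takes noticeably different times depending on the model, which is why approximation ratios are stated per speedup model.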
04231 Abstracts Collection -- Scheduling in Computer and Manufacturing Systems
During 31.05.-04.06.04, the Dagstuhl Seminar 04231 "Scheduling in Computer and Manufacturing Systems" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available
Theory and Engineering of Scheduling Parallel Jobs
Scheduling is very important for an efficient utilization of modern parallel computing systems. In this thesis, four main research areas for scheduling are investigated: the interplay and distribution of decision makers, the efficient schedule computation, efficient scheduling for the memory hierarchy and energy-efficiency. The main result is a provably fast and efficient scheduling algorithm for malleable jobs. Experiments show the importance and possibilities of scheduling considering the memory hierarchy
Scheduling moldable BSP tasks
Our main goal in this paper is to study the scheduling of parallel BSP tasks on clusters of computers. We focus on a special characteristic of BSP tasks: they can use fewer processors than originally required, but with a particular cost model. We discuss the problem of scheduling a batch of BSP tasks on a fixed number of computers. The objective is to minimize the completion time of the last task (makespan). We show that the problem is difficult and present approximation algorithms and heuristics. We conclude the paper by presenting the results of extensive simulations under different workloads
An Empirical Evaluation of Multi-Resource Scheduling for Moldable Workflows
Resource scheduling plays a vital role in High-Performance Computing (HPC) systems. However, most scheduling research in HPC has focused on only a single type of resource (e.g., computing cores or I/O resources). With the advancement in hardware architectures and the increase in data-intensive HPC applications, there is a need to simultaneously embrace a diverse set of resources (e.g., computing cores, cache, memory, I/O, and network resources) in the design of run-time schedulers for improving the overall application performance. This thesis performs an empirical evaluation of a recently proposed multi-resource scheduling algorithm for minimizing the overall completion time (or makespan) of computational workflows comprised of moldable parallel jobs. Moldable parallel jobs allow the scheduler to select the resource allocations at launch time and thus can adapt to the available system resources (as compared to rigid jobs) while staying easy to design and implement (as compared to malleable jobs). The algorithm was proven to have a worst-case approximation ratio that grows linearly with the number of resource types for moldable workflows. In this thesis, a comprehensive set of simulations is conducted to empirically evaluate the performance of the algorithm using synthetic workflows generated by DAGGEN and moldable jobs that exhibit different speedup profiles. The results show that the algorithm fares better than the theoretical bound predicts, and it consistently outperforms two baseline heuristics under a variety of parameter settings, illustrating its robust practical performance