
    Efficient Algorithms for Scheduling Moldable Tasks

    We study the problem of scheduling $n$ independent moldable tasks on $m$ processors, a problem that arises in large-scale parallel computations. When tasks are monotonic, the best known result is a $(\frac{3}{2}+\epsilon)$-approximation algorithm for makespan minimization, with a complexity linear in $n$ and polynomial in $\log m$ and $\frac{1}{\epsilon}$, where $\epsilon$ is arbitrarily small. We propose a new perspective on the existing speedup models: the speedup of a task $T_{j}$ is linear when the number $p$ of assigned processors is small (up to a threshold $\delta_{j}$), while it remains monotonic when $p$ ranges in $[\delta_{j}, k_{j}]$; the bound $k_{j}$ indicates an unacceptable overhead when parallelizing on too many processors. For a given integer $\delta \geq 5$, let $u = \left\lceil \sqrt{\delta} \right\rceil - 1$. In this paper, we propose a $\frac{1}{\theta(\delta)}(1+\epsilon)$-approximation algorithm for makespan minimization with a complexity $\mathcal{O}(n \log\frac{n}{\epsilon} \log m)$, where $\theta(\delta) = \frac{u+1}{u+2}\left(1 - \frac{k}{m}\right)$ (with $m \gg k$). As a by-product, we also propose a $\theta(\delta)$-approximation algorithm for throughput maximization with a common deadline, with a complexity $\mathcal{O}(n^{2} \log m)$.
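    As a rough illustration of the bound stated above, the sketch below (plain Python; the example values of delta, k, and m are made up and not taken from the paper) evaluates u = ⌈√ή⌉ − 1 and Ξ(ÎŽ) = (u+1)/(u+2)·(1 − k/m), and the resulting makespan guarantee (1/Ξ(ÎŽ))·(1+Δ) of the proposed algorithm.

```python
import math

def theta(delta: int, k: int, m: int) -> float:
    """Evaluate theta(delta) = (u+1)/(u+2) * (1 - k/m) as defined in the abstract,
    where u = ceil(sqrt(delta)) - 1, delta >= 5 is the linear-speedup threshold,
    and k is the processor bound of the speedup model (with m >> k)."""
    assert delta >= 5 and m > k, "the abstract assumes delta >= 5 and m >> k"
    u = math.ceil(math.sqrt(delta)) - 1
    return (u + 1) / (u + 2) * (1 - k / m)

# Illustrative values only: delta = 9 gives u = 2, so with k = 16 and m = 1024
# the algorithm's makespan guarantee is (1/theta)(1 + eps) ~ 1.35 * (1 + eps).
print(theta(9, 16, 1024))          # ~0.7383
print(1 / theta(9, 16, 1024))      # ~1.3545
```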

    Scheduling Monotone Moldable Jobs in Linear Time

    A moldable job is a job that can be executed on an arbitrary number of processors, and whose processing time depends on the number of processors allotted to it. A moldable job is monotone if its work does not decrease for an increasing number of allotted processors. We consider the problem of scheduling monotone moldable jobs to minimize the makespan. We argue that for certain compact input encodings a polynomial algorithm has a running time polynomial in n and log(m), where n is the number of jobs and m is the number of machines. We describe how the monotony of jobs can be used to counteract the increased problem complexity that arises from compact encodings, and give tight bounds on the approximability of the problem with compact encoding: it is NP-hard to solve optimally, but admits a PTAS. The main focus of this work is efficient approximation algorithms. We describe different techniques to exploit the monotony of the jobs for better running times, and present a (3/2 + \epsilon)-approximation algorithm whose running time is polynomial in log(m) and 1/\epsilon, and only linear in the number n of jobs.
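    A minimal sketch of the monotony condition described above, assuming the job is given as a table of processing times t(p) for p = 1, ..., m (the function name and input format are illustrative, not from the paper): the work p * t(p) must not decrease as the number of allotted processors grows.

```python
def is_monotone(times: list) -> bool:
    """Check the monotony condition from the abstract.

    times[p-1] is the processing time t(p) of the job on p processors.
    The job is monotone if its work p * t(p) does not decrease when the
    number of allotted processors p increases.
    """
    works = [p * t for p, t in enumerate(times, start=1)]
    return all(w_next >= w for w, w_next in zip(works, works[1:]))

# Made-up processing-time tables for illustration:
print(is_monotone([100.0, 60.0, 45.0, 40.0]))   # works 100, 120, 135, 160 -> True
print(is_monotone([100.0, 40.0, 30.0, 20.0]))   # work drops from 100 to 80 -> False
```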

    Malleable Scheduling Beyond Identical Machines

    In malleable job scheduling, jobs can be executed simultaneously on multiple machines with the processing time depending on the number of allocated machines. Jobs are required to be executed non-preemptively and in unison, in the sense that they occupy, during their execution, the same time interval over all the machines of the allocated set. In this work, we study generalizations of malleable job scheduling inspired by standard scheduling on unrelated machines. Specifically, we introduce a general model of malleable job scheduling, where each machine has a (possibly different) speed for each job, and the processing time of a job j on a set of allocated machines S depends on the total speed of S for j. For machines with unrelated speeds, we show that the optimal makespan cannot be approximated within a factor less than e/(e-1), unless P = NP. On the positive side, we present polynomial-time algorithms with approximation ratios 2e/(e-1) for machines with unrelated speeds, 3 for machines with uniform speeds, and 7/3 for restricted assignments on identical machines. Our algorithms are based on deterministic LP rounding and result in sparse schedules, in the sense that each machine shares at most one job with other machines. We also prove lower bounds on the integrality gap of 1+phi for unrelated speeds (phi is the golden ratio) and 2 for uniform speeds and restricted assignments. To indicate the generality of our approach, we show that it also yields constant-factor approximation algorithms (i) for minimizing the sum of weighted completion times, and (ii) for a variant where the effective speed of a set of allocated machines is determined by the L_p norm of their speeds.
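    One natural way to instantiate the model sketched above, shown only for illustration (the abstract states that the processing time depends on the total speed of the allocated set, but the concrete formula and names below are assumptions): divide a job's processing requirement by the aggregate speed, where the aggregate is either the plain sum or, for the variant mentioned at the end, an L_p norm of the per-machine speeds.

```python
def processing_time(work: float, speeds: list, p_norm: float = 1.0) -> float:
    """Illustrative malleable processing time: the job's work divided by the
    effective speed of the allocated machine set. speeds[i] is the speed of
    allocated machine i for this job; p_norm = 1 gives the plain total speed,
    larger p_norm gives the L_p-norm variant mentioned in the abstract."""
    effective_speed = sum(s ** p_norm for s in speeds) ** (1.0 / p_norm)
    return work / effective_speed

# Hypothetical job of 100 units of work on three machines with unrelated speeds.
print(processing_time(100.0, [1.0, 2.0, 4.0]))            # total speed 7   -> ~14.29
print(processing_time(100.0, [1.0, 2.0, 4.0], p_norm=2))  # L2 norm ~4.58   -> ~21.82
```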

    Multi-Resource List Scheduling of Moldable Parallel Jobs under Precedence Constraints

    The scheduling literature has traditionally focused on a single type of resource (e.g., computing nodes). However, scientific applications in modern High-Performance Computing (HPC) systems process large amounts of data, hence have diverse requirements on different types of resources (e.g., cores, cache, memory, I/O). All of these resources could potentially be exploited by the runtime scheduler to improve the application performance. In this paper, we study multi-resource scheduling to minimize the makespan of computational workflows comprised of parallel jobs subject to precedence constraints. The jobs are assumed to be moldable, allowing the scheduler to flexibly select a variable set of resources before execution. We propose a multi-resource, list-based scheduling algorithm, and prove that, on a system with $d$ types of schedulable resources, our algorithm achieves an approximation ratio of $1.619d + 2.545\sqrt{d} + 1$ for any $d$, and a ratio of $d + O(\sqrt[3]{d^{2}})$ for large $d$. We also present improved results for independent jobs and for jobs with special precedence constraints (e.g., series-parallel graphs and trees). Finally, we prove a lower bound of $d$ on the approximation ratio of any list scheduling scheme with local priority considerations. To the best of our knowledge, these are the first approximation results for moldable workflows with multiple resource requirements.
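    Since the paper's algorithm is list-based, a bare-bones sketch of multi-resource list scheduling may help fix ideas. The skeleton below is an illustration only: the FIFO priority rule, the assumption that each job's d-dimensional demand has already been fixed by a separate allocation phase, and all names are assumptions of this sketch, not the paper's algorithm. It starts any ready job whose demand fits into the currently free resources and advances time to the next completion.

```python
import heapq

def list_schedule(jobs, total, preds):
    """Greedy multi-resource list scheduler (illustrative skeleton).

    jobs[j]  = (runtime, demand), demand being a length-d tuple of resource needs.
    total    = length-d tuple with the total amount of each resource type.
    preds[j] = set of jobs that must finish before job j may start.
    Returns the makespan of the greedy schedule.
    """
    n = len(jobs)
    assert all(d <= t for _, demand in jobs for d, t in zip(demand, total)), \
        "each job's demand must fit within the total resources"
    unfinished_preds = [len(preds[j]) for j in range(n)]
    ready = [j for j in range(n) if unfinished_preds[j] == 0]
    free = list(total)
    running = []                       # min-heap of (finish_time, job index)
    now = 0.0

    while ready or running:
        # Start every ready job whose d-dimensional demand fits (FIFO priority).
        started = []
        for j in ready:
            runtime, demand = jobs[j]
            if all(demand[r] <= free[r] for r in range(len(total))):
                for r in range(len(total)):
                    free[r] -= demand[r]
                heapq.heappush(running, (now + runtime, j))
                started.append(j)
        ready = [j for j in ready if j not in started]
        # Advance time to the next completion, release its resources, update successors.
        finish, j = heapq.heappop(running)
        now = finish
        for r in range(len(total)):
            free[r] += jobs[j][1][r]
        for k in range(n):
            if j in preds[k]:
                unfinished_preds[k] -= 1
                if unfinished_preds[k] == 0:
                    ready.append(k)
    return now

# Three hypothetical jobs on d = 2 resource types (e.g., cores and memory units);
# job 1 depends on job 0. Demands and runtimes are made up for illustration.
jobs = [(4.0, (2, 1)), (3.0, (2, 2)), (2.0, (1, 1))]
print(list_schedule(jobs, total=(3, 2), preds=[set(), {0}, set()]))   # -> 7.0
```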

    Closing the Gap for Pseudo-Polynomial Strip Packing

    Two-dimensional packing problems are a fundamental class of optimization problems, and Strip Packing is one of the most natural and famous among them. Indeed, it can be defined in a single sentence: given a set of rectangular axis-parallel items and a strip with bounded width and infinite height, the objective is to find a packing of the items into the strip minimizing the packing height. We speak of pseudo-polynomial Strip Packing if we consider algorithms with pseudo-polynomial running time with respect to the width of the strip. It is known that there is no pseudo-polynomial time algorithm for Strip Packing with a ratio better than 5/4 unless P = NP. The best algorithm so far has a ratio of 4/3 + epsilon. In this paper, we close the gap between the inapproximability result and the best known algorithms by presenting an algorithm with approximation ratio 5/4 + epsilon. The algorithm relies on a new structural result, which is the main accomplishment of this paper. It states that each optimal solution can be transformed, with bounded loss in the objective, such that it has one of a polynomial number of different forms, thus making the problem tractable by standard techniques, i.e., dynamic programming. To show the conceptual strength of the approach, we extend our result to other problems as well, e.g., Strip Packing with 90-degree rotations and Contiguous Moldable Task Scheduling, and present algorithms with approximation ratio 5/4 + epsilon for these problems as well.
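    For readers unfamiliar with the problem, the toy heuristic below illustrates what a strip packing looks like. It is only the classical Next-Fit Decreasing-Height shelf heuristic from the textbook literature, shown to make the problem concrete; it is not the 5/4 + epsilon algorithm of the paper and is not optimal in general.

```python
def nfdh_height(items, strip_width):
    """Next-Fit Decreasing-Height shelf packing (classical textbook heuristic,
    shown only to illustrate the Strip Packing problem).

    items is a list of (width, height) rectangles with width <= strip_width.
    Returns the total height of the shelf packing."""
    items = sorted(items, key=lambda wh: wh[1], reverse=True)   # sort by decreasing height
    shelves = []                     # each shelf is [shelf_height, used_width]
    for w, h in items:
        if shelves and shelves[-1][1] + w <= strip_width:
            shelves[-1][1] += w      # the item fits on the current shelf
        else:
            shelves.append([h, w])   # open a new shelf; its height is this item's height
    return sum(height for height, _ in shelves)

# Illustrative instance: five rectangles packed into a strip of width 10.
print(nfdh_height([(4, 5), (5, 4), (6, 3), (3, 3), (4, 2)], strip_width=10))   # -> 10
```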

    Ordonnancement avec tolérance aux pannes pour des tùches parallÚles à nombre de processeurs programmable

    We study the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms. Moldable jobs allow for choosing a processor allocation before execution, and their execution time obeys various speedup models. The objective is to minimize the overall completion time of the jobs, or the makespan, when jobs can fail due to silent errors and hence may need to be re-executed after each failure until successful completion. Our work generalizes the classical scheduling framework for failure-free jobs. To cope with silent errors, we introduce two resilient scheduling algorithms, LPA-List and Batch-List, both of which use the List strategy to schedule the jobs. Without knowing a priori how many times each job will fail, LPA-List relies on a local strategy to allocate processors to the jobs, while Batch-List schedules the jobs in batches and allows only a restricted number of failures per job in each batch. We prove new approximation ratios for the two algorithms under several prominent speedup models (e.g., roofline, communication, Amdahl, power, monotonic, and a mixed model). An extensive set of simulations is conducted to evaluate different variants of the two algorithms, and the results show that they consistently outperform some baseline heuristics. Overall, our best algorithm is within a factor of 1.6 of a lower bound on average over the entire set of experiments, and within a factor of 4.2 in the worst case.

    This report studies the resilient scheduling of tasks on high-performance computing platforms. In the problem studied, the (constant) number of processors executing each task can be chosen before execution, which determines the execution time of the task according to various speedup models. We describe algorithms whose objective is to minimize the total execution time, given that tasks may fail and must be re-executed after each error. This problem is therefore a generalization of the classical setting in which all tasks are known a priori and never fail. We describe a priority-list scheduling algorithm and prove new approximation bounds for several classical speedup models (roofline, communication, Amdahl, power, monotonic, and a model mixing these). We also describe a batch scheduling algorithm, within which tasks may fail a limited number of times per batch, and prove new approximation bounds for arbitrary speedup models. Finally, we run experiments on a comprehensive set of instances to compare the performance of different variants of our algorithms, which are significantly better than the usual simple heuristics. Our best heuristic is on average within a factor of 1.6 of a lower bound on the optimal solution, and within a factor of 4.2 in the worst case.
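    The speedup models named above have standard forms in this line of work. The sketch below writes them out under common parameterizations (w is the sequential work; the exact definitions and parameter names used in the report may differ), giving the execution time t(p) of a job on p processors.

```python
def roofline(w, p, p_bar):
    """Roofline model: perfect speedup up to a bounded degree of parallelism p_bar."""
    return w / min(p, p_bar)

def communication(w, p, c):
    """Communication model: parallel work plus a per-processor communication overhead c."""
    return w / p + (p - 1) * c

def amdahl(w, p, seq_fraction):
    """Amdahl model: a fixed sequential fraction limits the achievable speedup."""
    return w * (seq_fraction + (1 - seq_fraction) / p)

def power(w, p, alpha):
    """Power model: sublinear speedup p**alpha with 0 < alpha < 1."""
    return w / p ** alpha

# Hypothetical job of w = 100 on p = 16 processors (parameter values are illustrative).
print(roofline(100, 16, p_bar=8), communication(100, 16, c=0.5),
      amdahl(100, 16, seq_fraction=0.1), power(100, 16, alpha=0.8))
```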

    04231 Abstracts Collection -- Scheduling in Computer and Manufacturing Systems

    From 31 May to 4 June 2004, the Dagstuhl Seminar 04231 "Scheduling in Computer and Manufacturing Systems" was held at the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided where available.

    Theory and Engineering of Scheduling Parallel Jobs

    Scheduling is essential for the efficient utilization of modern parallel computing systems. In this thesis, four main research areas of scheduling are investigated: the interplay and distribution of decision makers, efficient schedule computation, efficient scheduling for the memory hierarchy, and energy efficiency. The main result is a provably fast and efficient scheduling algorithm for malleable jobs. Experiments show the importance and the potential of scheduling that takes the memory hierarchy into account.

    Scheduling Moldable BSP Tasks

    Our main goal in this paper is to study the scheduling of parallel BSP tasks on clusters of computers. We focus our attention on a special characteristic of BSP tasks: they can run on fewer processors than originally requested, but under a particular cost model. We discuss the problem of scheduling a batch of BSP tasks on a fixed number of computers. The objective is to minimize the completion time of the last task (makespan). We show that the problem is difficult and present approximation algorithms and heuristics. We conclude the paper by presenting the results of extensive simulations under different workloads.
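    For context, BSP programs proceed in supersteps whose cost is usually estimated with Valiant's standard formula. The sketch below shows that textbook cost (the parameters g and L are the machine's bandwidth and barrier-latency constants); it is background only and not necessarily the exact moldable cost model used in the paper.

```python
def superstep_cost(w_max, h_max, g, L):
    """Standard BSP superstep cost (Valiant's model): maximum local work, plus an
    h-relation exchanged at bandwidth parameter g, plus the barrier latency L.
    Textbook formula, not necessarily the paper's exact moldable cost model."""
    return w_max + g * h_max + L

# Illustrative superstep: 1e6 units of local work, 1e4 words exchanged, g = 4, L = 1e5.
print(superstep_cost(1e6, 1e4, g=4.0, L=1e5))
```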

    An Empirical Evaluation of Multi-Resource Scheduling for Moldable Workflows

    Resource scheduling plays a vital role in High-Performance Computing (HPC) systems. However, most scheduling research in HPC has focused on only a single type of resource (e.g., computing cores or I/O resources). With the advancement in hardware architectures and the increase in data-intensive HPC applications, there is a need to simultaneously embrace a diverse set of resources (e.g., computing cores, cache, memory, I/O, and network resources) in the design of run-time schedulers for improving the overall application performance. This thesis performs an empirical evaluation of a recently proposed multi-resource scheduling algorithm for minimizing the overall completion time (or makespan) of computational workflows comprised of moldable parallel jobs. Moldable parallel jobs allow the scheduler to select the resource allocations at launch time and thus can adapt to the available system resources (as compared to rigid jobs) while staying easy to design and implement (as compared to malleable jobs). The algorithm was proven to have a worst-case approximation ratio that grows linearly with the number of resource types for moldable workflows. In this thesis, a comprehensive set of simulations is conducted to empirically evaluate the performance of the algorithm using synthetic workflows generated by DAGGEN and moldable jobs that exhibit different speedup profiles. The results show that the algorithm fares better than the theoretical bound predicts, and it consistently outperforms two baseline heuristics under a variety of parameter settings, illustrating its robust practical performance
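    Empirical studies of this kind typically gauge makespan quality against a simple lower bound. The sketch below shows one common bound for moldable workflows, the maximum of a per-resource area bound and a critical-path bound; it is an assumption for illustration (the thesis may rely on a refined bound), and all names and inputs are hypothetical.

```python
def makespan_lower_bound(t_min, area_min, total, preds):
    """A common makespan lower bound for moldable workflows (illustrative).

    t_min[j]       : fastest possible runtime of job j over all allocations
    area_min[j][r] : smallest possible (usage of resource r) x (runtime) of job j
    total[r]       : available amount of resource type r
    preds[j]       : predecessors of job j; jobs are assumed indexed in topological order
    """
    n, d = len(t_min), len(total)
    # Area bound: resource r cannot process more than total[r] units per unit of time.
    area_bound = max(sum(area_min[j][r] for j in range(n)) / total[r] for r in range(d))
    # Critical-path bound: dependent jobs cannot overlap, even at their fastest allocation.
    longest = [0.0] * n
    for j in range(n):
        longest[j] = t_min[j] + max((longest[i] for i in preds[j]), default=0.0)
    return max(area_bound, max(longest))

# Three hypothetical jobs on two resource types; job 2 depends on jobs 0 and 1.
print(makespan_lower_bound(t_min=[2.0, 3.0, 4.0],
                           area_min=[(4.0, 2.0), (6.0, 3.0), (8.0, 4.0)],
                           total=(4.0, 4.0),
                           preds=[set(), set(), {0, 1}]))   # -> 7.0
```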
    • 

    corecore