142 research outputs found

    Efficient Algorithms for Scheduling Moldable Tasks

    Full text link
    We study the problem of scheduling nn independent moldable tasks on mm processors that arises in large-scale parallel computations. When tasks are monotonic, the best known result is a (32+Ï”)(\frac{3}{2}+\epsilon)-approximation algorithm for makespan minimization with a complexity linear in nn and polynomial in log⁥m\log{m} and 1Ï”\frac{1}{\epsilon} where Ï”\epsilon is arbitrarily small. We propose a new perspective of the existing speedup models: the speedup of a task TjT_{j} is linear when the number pp of assigned processors is small (up to a threshold ÎŽj\delta_{j}) while it presents monotonicity when pp ranges in [ÎŽj,kj][\delta_{j}, k_{j}]; the bound kjk_{j} indicates an unacceptable overhead when parallelizing on too many processors. For a given integer Ύ≄5\delta\geq 5, let u=⌈ή2⌉−1u=\left\lceil \sqrt[2]{\delta} \right\rceil-1. In this paper, we propose a 1Ξ(ÎŽ)(1+Ï”)\frac{1}{\theta(\delta)} (1+\epsilon)-approximation algorithm for makespan minimization with a complexity O(nlog⁥nÏ”log⁥m)\mathcal{O}(n\log{\frac{n}{\epsilon}}\log{m}) where Ξ(ÎŽ)=u+1u+2(1−km)\theta(\delta) = \frac{u+1}{u+2}\left( 1- \frac{k}{m} \right) (m≫km\gg k). As a by-product, we also propose a Ξ(ÎŽ)\theta(\delta)-approximation algorithm for throughput maximization with a common deadline with a complexity O(n2log⁥m)\mathcal{O}(n^{2}\log{m})

    Scheduling Monotone Moldable Jobs in Linear Time

    Full text link
    A moldable job is a job that can be executed on an arbitrary number of processors, and whose processing time depends on the number of processors allotted to it. A moldable job is monotone if its work doesn't decrease for an increasing number of allotted processors. We consider the problem of scheduling monotone moldable jobs to minimize the makespan. We argue that for certain compact input encodings a polynomial algorithm has a running time polynomial in n and log(m), where n is the number of jobs and m is the number of machines. We describe how monotony of jobs can be used to counteract the increased problem complexity that arises from compact encodings, and give tight bounds on the approximability of the problem with compact encoding: it is NP-hard to solve optimally, but admits a PTAS. The main focus of this work are efficient approximation algorithms. We describe different techniques to exploit the monotony of the jobs for better running times, and present a (3/2+{\epsilon})-approximate algorithm whose running time is polynomial in log(m) and 1/{\epsilon}, and only linear in the number n of jobs

    Malleable Scheduling Beyond Identical Machines

    Get PDF
    In malleable job scheduling, jobs can be executed simultaneously on multiple machines with the processing time depending on the number of allocated machines. Jobs are required to be executed non-preemptively and in unison, in the sense that they occupy, during their execution, the same time interval over all the machines of the allocated set. In this work, we study generalizations of malleable job scheduling inspired by standard scheduling on unrelated machines. Specifically, we introduce a general model of malleable job scheduling, where each machine has a (possibly different) speed for each job, and the processing time of a job j on a set of allocated machines S depends on the total speed of S for j. For machines with unrelated speeds, we show that the optimal makespan cannot be approximated within a factor less than e/(e-1), unless P = NP. On the positive side, we present polynomial-time algorithms with approximation ratios 2e/(e-1) for machines with unrelated speeds, 3 for machines with uniform speeds, and 7/3 for restricted assignments on identical machines. Our algorithms are based on deterministic LP rounding and result in sparse schedules, in the sense that each machine shares at most one job with other machines. We also prove lower bounds on the integrality gap of 1+phi for unrelated speeds (phi is the golden ratio) and 2 for uniform speeds and restricted assignments. To indicate the generality of our approach, we show that it also yields constant factor approximation algorithms (i) for minimizing the sum of weighted completion times; and (ii) a variant where we determine the effective speed of a set of allocated machines based on the L_p norm of their speeds

    Scheduling malleable task trees

    Get PDF
    Solving sparse linear systems can lead to processing tree workflows on a platform of processors. In this study, we use the model of malleable tasks motivated in [Prasanna96,Beaumont07] in order to study tree workflow schedules under two contradictory objectives: makespan minimization and memory minization. First, we give a simpler proof of the result of [Prasanna96] which allows to compute a makespan-optimal schedule for tree workflows. Then, we study a more realistic speed-up function and show that the previous schedules are not optimal in this context. Finally, we give complexity results concerning the objective of minimizing both makespan and memory

    Provably Efficient Adaptive Scheduling for Parallel Jobs

    Get PDF
    Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level of fairness to user jobs. Various degrees of successes have been achieved over the years. However, few existing schemes address both efficiency and fairness over a wide range of work loads. Moreover, in order to obtain analytical results, most of them require prior information about jobs, which may be difficult to obtain in real applications. This paper presents two novel adaptive scheduling algorithms -- GRAD for centralized scheduling, and WRAD for distributed scheduling. Both GRAD and WRAD ensure fair allocation under all levels of workload, and they offer provable efficiency without requiring prior information of job's parallelism. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. To the best of our knowledge, they are the first non-clairvoyant scheduling algorithms that offer such guarantees. We also believe that our new approach of resource request-allotment protocol deserves further exploration. Specifically, both GRAD and WRAD are O(1)-competitive with respect to mean response time for batched jobs, and O(1)-competitive with respect to makespan for non-batched jobs with arbitrary release times. The simulation results show that, for non-batched jobs, the makespan produced by GRAD is no more than 1.39 times of the optimal on average and it never exceeds 4.5 times. For batched jobs, the mean response time produced by GRAD is no more than 2.37 times of the optimal on average, and it never exceeds 5.5 times.Singapore-MIT Alliance (SMA

    Malleable task-graph scheduling with a practical speed-up model

    Get PDF
    Scientific workloads are often described by Directed Acyclic task Graphs.Indeed, DAGs represent both a model frequently studied in theoretical literature and the structure employed by dynamic runtime schedulers to handle HPC applications. A natural problem is then to compute a makespan-minimizing schedule of a given graph. In this paper, we are motivated by task graphs arising from multifrontal factorizations of sparsematrices and therefore work under the following practical model. We focus on malleable tasks (i.e., a single task can be allotted a time-varying number of processors) and specifically on a simple yet realistic speedup model: each task can be perfectly parallelized, but only up to a limited number of processors. We first prove that the associated decision problem of minimizing the makespan is NP-Complete. Then, we study a widely used algorithm, PropScheduling, under this practical model and propose a new strategy GreedyFilling. Even though both strategies are 2-approximations, experiments on real and synthetic data sets show that GreedyFilling achieves significantly lower makespans

    Efficient Parallel Scheduling of Malleable Tasks

    Get PDF
    • 

    corecore