
    Scheduling Parallel Jobs with Linear Speedup

    We consider a scheduling problem where a set of jobs is distributed over parallel machines. The processing time of any job depends on the usage of a scarce renewable resource, e.g., personnel. An amount of k units of that resource can be allocated to the jobs at any time, and the more of that resource is allocated to a job, the smaller its processing time. The dependence of processing times on the amount of resources is linear for any job. The objective is to find a resource allocation and a schedule that minimizes the makespan. Utilizing an integer quadratic programming relaxation, we show how to obtain a (3+Δ)-approximation algorithm for this problem, for any Δ>0. This generalizes and improves on previous results. Our approach relies on a fully polynomial time approximation scheme to solve the quadratic programming relaxation. This result is interesting in itself, because the underlying quadratic program is NP-hard to solve in general. We also briefly discuss variants of the problem and derive lower bounds.
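
    The linear model is easy to state in code. Below is a minimal sketch, assuming each job's processing time falls linearly with the allocated resource; all identifiers are ours, not the paper's, and the global cap of k resource units in use at any time is not enforced here:

```python
from dataclasses import dataclass

@dataclass
class Job:
    base_time: float   # processing time with no extra resource allocated
    saving: float      # time saved per allocated resource unit (linear model)
    max_units: int     # cap keeping the processing time positive

def processing_time(job: Job, units: int) -> float:
    """Linear dependence of processing time on the allocated resource."""
    if not 0 <= units <= job.max_units:
        raise ValueError("allocation outside the job's feasible range")
    return job.base_time - job.saving * units

def makespan(machines: list[list[tuple[Job, int]]]) -> float:
    """Makespan when each machine runs its (job, units) pairs back to back.
    Only illustrates the objective; resource feasibility over time is the
    hard part the paper's algorithm actually handles."""
    return max(sum(processing_time(j, u) for j, u in batch)
               for batch in machines)
```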

    Feasibility Tests for Recurrent Real-Time Tasks in the Sporadic DAG Model

    A model has been proposed in [Baruah et al., in Proceedings of the IEEE Real-Time Systems Symposium 2012] for representing recurrent precedence-constrained tasks to be executed on multiprocessor platforms, where each recurrent task is modeled by a directed acyclic graph (DAG), a period, and a relative deadline. Each vertex of the DAG represents a sequential job, while the edges of the DAG represent precedence constraints between these jobs. All the jobs of the DAG are released simultaneously and have to be completed within some specified relative deadline. The task may release jobs in this manner an unbounded number of times, with successive releases occurring at least the specified period apart. The feasibility problem is to determine whether such a recurrent task can be scheduled to always meet all deadlines on a specified number of dedicated processors. The case of a single task has been considered in [Baruah et al., 2012]. The main contribution of this paper is to consider the case of multiple tasks. We show that EDF has a speedup bound of 2-1/m, where m is the number of processors. Moreover, we present polynomial and pseudopolynomial schedulability tests, of differing effectiveness, for determining whether a set of sporadic DAG tasks can be scheduled by EDF to meet all deadlines on a specified number of processors.
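
    For intuition, such tests are built from two quantities of the DAG: its work (total worst-case execution time) and its span (critical-path length). A sketch that computes both and applies the classic Graham list-scheduling bound as a sufficient single-DAG test; this is a standard textbook condition, not the paper's own tests, and the identifiers are ours:

```python
from functools import lru_cache

def work_and_span(wcet: dict, succ: dict) -> tuple[float, float]:
    """wcet: vertex -> worst-case execution time; succ: vertex -> successors.
    Returns total work and critical-path length (span) of the DAG."""
    work = sum(wcet.values())

    @lru_cache(maxsize=None)
    def longest(v):
        # Longest vertex-weighted path starting at v (DAG, so this terminates).
        return wcet[v] + max((longest(s) for s in succ.get(v, [])), default=0.0)

    span = max(longest(v) for v in wcet)
    return work, span

def graham_feasible(wcet: dict, succ: dict, m: int, deadline: float) -> bool:
    """Sufficient test: any work-conserving (list) schedule on m processors
    finishes within work/m + (1 - 1/m) * span."""
    work, span = work_and_span(wcet, succ)
    return work / m + (1 - 1 / m) * span <= deadline
```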

    Towards Optimality in Parallel Scheduling

    To keep pace with Moore's law, chip designers have focused on increasing the number of cores per chip rather than single core performance. In turn, modern jobs are often designed to run on any number of cores. However, to effectively leverage these multi-core chips, one must address the question of how many cores to assign to each job. Given that jobs receive sublinear speedups from additional cores, there is an obvious tradeoff: allocating more cores to an individual job reduces the job's runtime, but in turn decreases the efficiency of the overall system. We ask how the system should schedule jobs across cores so as to minimize the mean response time over a stream of incoming jobs. To answer this question, we develop an analytical model of jobs running on a multi-core machine. We prove that EQUI, a policy which continuously divides cores evenly across jobs, is optimal when all jobs follow a single speedup curve and have exponentially distributed sizes. EQUI requires jobs to change their level of parallelization while they run. Since this is not possible for all workloads, we consider a class of "fixed-width" policies, which choose a single level of parallelization, k, to use for all jobs. We prove that, surprisingly, it is possible to achieve EQUI's performance without requiring jobs to change their levels of parallelization by using the optimal fixed level of parallelization, k*. We also show how to analytically derive the optimal k* as a function of the system load, the speedup curve, and the job size distribution. In the case where jobs may follow different speedup curves, finding a good scheduling policy is even more challenging. We find that policies like EQUI which performed well in the case of a single speedup function now perform poorly. We propose a very simple policy, GREEDY*, which performs near-optimally when compared to the numerically-derived optimal policy.
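
    A minimal sketch of the two policy families contrasted in the abstract, assuming an illustrative concave speedup curve s(k) = k**alpha; the curve and all names are ours:

```python
def speedup(k: float, alpha: float = 0.5) -> float:
    """Illustrative sublinear speedup curve s(k) = k**alpha (alpha < 1)."""
    return k ** alpha

def equi_rates(n_jobs: int, m_cores: int) -> list[float]:
    """EQUI: continuously divide the m cores evenly, so every job in the
    system progresses at rate s(m/n)."""
    if n_jobs == 0:
        return []
    return [speedup(m_cores / n_jobs)] * n_jobs

def fixed_width_rates(n_jobs: int, m_cores: int, k: int) -> list[float]:
    """Fixed-width policy: every running job gets exactly k cores, so at
    most floor(m/k) jobs run at once and the remainder queue."""
    running = min(n_jobs, m_cores // k)
    return [speedup(k)] * running
```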

    Priority-enabled Scheduling for Resizable Parallel Applications

    In this paper, we illustrate the impact of dynamic resizability on parallel scheduling. Our ReSHAPE framework includes an application scheduler that supports dynamic resizing of parallel applications. We propose and evaluate new scheduling policies made possible by our ReSHAPE framework. The framework also provides a platform to experiment with more interesting and sophisticated scheduling policies and scenarios for resizable parallel applications. The proposed policies support scheduling of parallel applications with and without user assigned priorities. Experimental results show that these scheduling policies significantly improve individual application turnaround time as well as overall cluster utilization.
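
    As a purely hypothetical illustration of what priority-enabled resizing can look like (not the ReSHAPE policies themselves, which the abstract does not detail), an arriving job might be admitted by shrinking strictly lower-priority running jobs toward their minimum sizes:

```python
def admit(new_prio: int, new_min: int, free: int, running: list[dict]) -> bool:
    """Hypothetical sketch: free enough processors for an arriving job by
    shrinking lower-priority jobs (smaller `prio` = more urgent). Each entry
    in `running` has keys 'prio', 'procs', and 'min_procs'."""
    need = new_min - free
    if need <= 0:
        return True
    # Shrink the least urgent jobs first.
    for job in sorted(running, key=lambda j: j["prio"], reverse=True):
        if job["prio"] <= new_prio:
            break  # never shrink equal- or higher-priority jobs
        take = min(job["procs"] - job["min_procs"], need)
        job["procs"] -= take
        need -= take
        if need <= 0:
            return True
    return False
```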

    Efficient Algorithms for Scheduling Moldable Tasks

    We study the problem of scheduling n independent moldable tasks on m processors that arises in large-scale parallel computations. When tasks are monotonic, the best known result is a (3/2+Δ)-approximation algorithm for makespan minimization with a complexity linear in n and polynomial in log m and 1/Δ, where Δ is arbitrarily small. We propose a new perspective on the existing speedup models: the speedup of a task T_j is linear when the number p of assigned processors is small (up to a threshold ÎŽ_j), while it presents monotonicity when p ranges in [ÎŽ_j, k_j]; the bound k_j indicates an unacceptable overhead when parallelizing on too many processors. For a given integer ÎŽ ≄ 5, let u = ⌈√Ύ⌉ − 1. In this paper, we propose a (1/Ξ(ÎŽ))(1+Δ)-approximation algorithm for makespan minimization with a complexity O(n log(n/Δ) log m), where Ξ(ÎŽ) = (u+1)/(u+2) · (1 − k/m) (m ≫ k). As a by-product, we also propose a Ξ(ÎŽ)-approximation algorithm for throughput maximization with a common deadline with a complexity O(nÂČ log m).
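
    The proposed speedup model is straightforward to encode. A sketch, with all identifiers ours and the monotone region read from a measured profile:

```python
def exec_time(p: int, seq_time: float, delta: int, k: int,
              measured: dict[int, float]) -> float:
    """Speedup model sketched in the abstract: linear speedup for p <= delta,
    a monotone profile on [delta, k] (time t(p) nonincreasing, work p*t(p)
    nondecreasing), and p > k disallowed."""
    if not 1 <= p <= k:
        raise ValueError("parallelizing beyond k incurs unacceptable overhead")
    if p <= delta:
        return seq_time / p   # linear region: perfect speedup
    return measured[p]        # monotone region: looked up from a profile table
```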

    Scheduling Monotone Moldable Jobs in Linear Time

    A moldable job is a job that can be executed on an arbitrary number of processors, and whose processing time depends on the number of processors allotted to it. A moldable job is monotone if its work does not decrease as the number of allotted processors increases. We consider the problem of scheduling monotone moldable jobs to minimize the makespan. We argue that for certain compact input encodings a polynomial algorithm has a running time polynomial in n and log(m), where n is the number of jobs and m is the number of machines. We describe how monotony of jobs can be used to counteract the increased problem complexity that arises from compact encodings, and give tight bounds on the approximability of the problem with compact encoding: it is NP-hard to solve optimally, but admits a PTAS. The main focus of this work is efficient approximation algorithms. We describe different techniques to exploit the monotony of the jobs for better running times, and present a (3/2+Δ)-approximation algorithm whose running time is polynomial in log(m) and 1/Δ, and only linear in the number n of jobs.
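
    Monotony is exactly what makes running times polynomial in log(m) plausible: for a target makespan d, the smallest sufficient allotment can be found by binary search over the number of processors. A sketch of that primitive, a standard building block in dual-approximation algorithms for moldable scheduling, though not necessarily this paper's exact routine:

```python
def min_procs(t, m: int, d: float):
    """Smallest p in [1, m] with t(p) <= d, or None if even m processors are
    too few. Binary search is valid because monotony makes t nonincreasing
    in p. Example: min_procs(lambda p: 12.0 / p, m=64, d=1.0) == 12."""
    if t(m) > d:
        return None
    lo, hi = 1, m
    while lo < hi:
        mid = (lo + hi) // 2
        if t(mid) <= d:
            hi = mid   # mid processors suffice; try fewer
        else:
            lo = mid + 1
    return lo
```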

    Topology-aware GPU scheduling for learning workloads in cloud environments

    Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.
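
    The abstract does not spell out the placement algorithm; as a hypothetical illustration, a topology-aware placer can score candidate GPU sets by pairwise link distance (e.g., hop counts as reported by nvidia-smi topo -m) and pick the cheapest set:

```python
from itertools import combinations

def placement_cost(gpus: tuple[int, ...], dist: list[list[int]]) -> int:
    """Sum of pairwise link distances within a candidate GPU set; lower
    means tighter connectivity (e.g., same NVLink/PCIe domain)."""
    return sum(dist[a][b] for a, b in combinations(gpus, 2))

def best_placement(free: list[int], need: int, dist: list[list[int]]):
    """Cheapest set of `need` GPUs among the free ones. Exhaustive search is
    fine for the handful of GPUs inside a single node."""
    return min(combinations(free, need),
               key=lambda g: placement_cost(g, dist),
               default=None)
```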
    • 

    corecore