11,402 research outputs found

    A scheduling theory framework for GPU tasks efficient execution

    Get PDF
    Concurrent execution of tasks in GPUs can reduce the computation time of a workload by overlapping data transfer and execution commands. However it is difficult to implement an efficient run- time scheduler that minimizes the workload makespan as many execution orderings should be evaluated. In this paper, we employ scheduling theory to build a model that takes into account the device capabili- ties, workload characteristics, constraints and objec- tive functions. In our model, GPU tasks schedul- ing is reformulated as a flow shop scheduling prob- lem, which allow us to apply and compare well known methods already developed in the operations research field. In addition we develop a new heuristic, specif- ically focused on executing GPU commands, that achieves better scheduling results than previous tech- niques. Finally, a comprehensive evaluation, showing the suitability and robustness of this new approach, is conducted in three different NVIDIA architectures (Kepler, Maxwell and Pascal).Proyecto TIN2016- 0920R, Universidad de Málaga (Campus de Excelencia Internacional Andalucía Tech) y programa de donación de NVIDIA Corporation

    An EPTAS for Scheduling on Unrelated Machines of Few Different Types

    Full text link
    In the classical problem of scheduling on unrelated parallel machines, a set of jobs has to be assigned to a set of machines. The jobs have a processing time depending on the machine and the goal is to minimize the makespan, that is the maximum machine load. It is well known that this problem is NP-hard and does not allow polynomial time approximation algorithms with approximation guarantees smaller than 1.51.5 unless P==NP. We consider the case that there are only a constant number KK of machine types. Two machines have the same type if all jobs have the same processing time for them. This variant of the problem is strongly NP-hard already for K=1K=1. We present an efficient polynomial time approximation scheme (EPTAS) for the problem, that is, for any ε>0\varepsilon > 0 an assignment with makespan of length at most (1+ε)(1+\varepsilon) times the optimum can be found in polynomial time in the input length and the exponent is independent of 1/ε1/\varepsilon. In particular we achieve a running time of 2O(Klog(K)1εlog41ε)+poly(I)2^{\mathcal{O}(K\log(K) \frac{1}{\varepsilon}\log^4 \frac{1}{\varepsilon})}+\mathrm{poly}(|I|), where I|I| denotes the input length. Furthermore, we study three other problem variants and present an EPTAS for each of them: The Santa Claus problem, where the minimum machine load has to be maximized; the case of scheduling on unrelated parallel machines with a constant number of uniform types, where machines of the same type behave like uniformly related machines; and the multidimensional vector scheduling variant of the problem where both the dimension and the number of machine types are constant. For the Santa Claus problem we achieve the same running time. The results are achieved, using mixed integer linear programming and rounding techniques

    Learning Scheduling Algorithms for Data Processing Clusters

    Full text link
    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load

    A static scheduling approach to enable safety-critical OpenMP applications

    Get PDF
    Parallel computation is fundamental to satisfy the performance requirements of advanced safety-critical systems. OpenMP is a good candidate to exploit the performance opportunities of parallel platforms. However, safety-critical systems are often based on static allocation strategies, whereas current OpenMP implementations are based on dynamic schedulers. This paper proposes two OpenMP-compliant static allocation approaches: an optimal but costly approach based on an ILP formulation, and a sub-optimal but tractable approach that computes a worst-case makespan bound close to the optimal one.This work is funded by the EU projects P-SOCRATES (FP7-ICT-2013-10) and HERCULES (H2020/ICT/2015/688860), and the Spanish Ministry of Science and Innovation under contract TIN2015-65316-P.Peer ReviewedPostprint (author's final draft
    corecore