11,402 research outputs found
A scheduling theory framework for GPU tasks efficient execution
Concurrent execution of tasks in GPUs can reduce the computation time of a workload by
overlapping data transfer and execution commands.
However it is difficult to implement an efficient run-
time scheduler that minimizes the workload makespan
as many execution orderings should be evaluated. In
this paper, we employ scheduling theory to build a
model that takes into account the device capabili-
ties, workload characteristics, constraints and objec-
tive functions. In our model, GPU tasks schedul-
ing is reformulated as a flow shop scheduling prob-
lem, which allow us to apply and compare well known
methods already developed in the operations research
field. In addition we develop a new heuristic, specif-
ically focused on executing GPU commands, that
achieves better scheduling results than previous tech-
niques. Finally, a comprehensive evaluation, showing
the suitability and robustness of this new approach,
is conducted in three different NVIDIA architectures
(Kepler, Maxwell and Pascal).Proyecto TIN2016- 0920R, Universidad de Málaga (Campus de Excelencia Internacional Andalucía Tech) y programa de donación de NVIDIA Corporation
An EPTAS for Scheduling on Unrelated Machines of Few Different Types
In the classical problem of scheduling on unrelated parallel machines, a set
of jobs has to be assigned to a set of machines. The jobs have a processing
time depending on the machine and the goal is to minimize the makespan, that is
the maximum machine load. It is well known that this problem is NP-hard and
does not allow polynomial time approximation algorithms with approximation
guarantees smaller than unless PNP. We consider the case that there
are only a constant number of machine types. Two machines have the same
type if all jobs have the same processing time for them. This variant of the
problem is strongly NP-hard already for . We present an efficient
polynomial time approximation scheme (EPTAS) for the problem, that is, for any
an assignment with makespan of length at most
times the optimum can be found in polynomial time in the
input length and the exponent is independent of . In particular
we achieve a running time of , where
denotes the input length. Furthermore, we study three other problem
variants and present an EPTAS for each of them: The Santa Claus problem, where
the minimum machine load has to be maximized; the case of scheduling on
unrelated parallel machines with a constant number of uniform types, where
machines of the same type behave like uniformly related machines; and the
multidimensional vector scheduling variant of the problem where both the
dimension and the number of machine types are constant. For the Santa Claus
problem we achieve the same running time. The results are achieved, using mixed
integer linear programming and rounding techniques
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load
A static scheduling approach to enable safety-critical OpenMP applications
Parallel computation is fundamental to satisfy the performance requirements of advanced safety-critical systems. OpenMP is a good candidate to exploit the performance opportunities of parallel platforms. However, safety-critical systems are often based on static allocation strategies, whereas current OpenMP implementations are based on dynamic schedulers. This paper proposes two OpenMP-compliant static allocation approaches: an optimal but costly approach based on an ILP formulation, and a sub-optimal but tractable approach that computes a worst-case makespan bound close to the optimal one.This work is funded by the EU projects P-SOCRATES (FP7-ICT-2013-10) and HERCULES (H2020/ICT/2015/688860), and the Spanish Ministry of Science and Innovation under contract TIN2015-65316-P.Peer ReviewedPostprint (author's final draft
- …