Co-Scheduling Algorithms for High-Throughput Workload Execution
This paper investigates co-scheduling algorithms for processing a set of
parallel applications. Instead of executing each application one by one, using
a maximum degree of parallelism for each of them, we aim at scheduling several
applications concurrently. We partition the original application set into a
series of packs, which are executed one by one. A pack comprises several
applications, each of them with an assigned number of processors, with the
constraint that the total number of processors assigned within a pack does not
exceed the maximum number of available processors. The objective is to
determine a partition into packs, and an assignment of processors to
applications, that minimize the sum of the execution times of the packs. We
thoroughly study the complexity of this optimization problem, and propose
several heuristics that exhibit very good performance on a variety of
workloads, whose application execution times model profiles of parallel
scientific codes. We show that co-scheduling leads to faster workload
completion time and to faster response times on average (hence increasing
system throughput and saving energy), for significant benefits over traditional
scheduling from both the user and system perspectives.
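
The objective described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or cost model: the `App` class, the Amdahl-style execution-time profile, and the names `pack_time` and `schedule_cost` are assumptions made for illustration. The sketch also assumes that a pack's execution time is that of its slowest application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    """Hypothetical application profile (an assumption, not from the paper)."""
    work: float         # sequential execution time
    serial_frac: float  # fraction of the work that cannot be parallelized

    def time(self, procs: int) -> float:
        # Amdahl-style execution time on `procs` processors (assumed model).
        if procs < 1:
            raise ValueError("each application needs at least one processor")
        return self.work * (self.serial_frac + (1.0 - self.serial_frac) / procs)


def pack_time(pack):
    # A pack is a list of (App, procs) pairs; here it is assumed to finish
    # when its slowest application finishes.
    return max(app.time(procs) for app, procs in pack)


def schedule_cost(packs, total_procs):
    # Packs run one after another, so the cost is the sum of pack times.
    for pack in packs:
        if sum(procs for _, procs in pack) > total_procs:
            raise ValueError("pack exceeds the available processors")
    return sum(pack_time(pack) for pack in packs)


a = App(work=100.0, serial_frac=0.5)
b = App(work=100.0, serial_frac=0.5)

# One application per pack, each using all 4 processors.
one_by_one = schedule_cost([[(a, 4)], [(b, 4)]], total_procs=4)
# Both applications in a single pack, 2 processors each.
co_scheduled = schedule_cost([[(a, 2), (b, 2)]], total_procs=4)
```

With these made-up profiles, the co-scheduled pack costs less than running the two applications one after the other, because neither application uses four processors efficiently.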
Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems
This article introduces a highly parallel algorithm for molecular dynamics
simulations with short-range forces on single-node multi- and many-core
systems. The algorithm is designed to achieve high parallel speedups for
strongly inhomogeneous systems like nanodevices or nanostructured materials. In
the proposed scheme the calculation of the forces and the generation of
neighbor lists are divided into small tasks. The tasks are then executed by a
thread pool according to a dependent task schedule. This schedule is
constructed in such a way that a particle is never accessed by two threads at
the same time. Benchmark simulations on a typical 12-core machine show that the
described algorithm achieves excellent parallel efficiencies above 80 % for
different kinds of systems and all numbers of cores. For inhomogeneous systems
the speedups are strongly superior to those obtained with spatial
decomposition. Further benchmarks were performed on an Intel Xeon Phi
coprocessor. These simulations demonstrate that the algorithm scales well to
large numbers of cores.
Comment: 12 pages, 8 figures
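
The idea of running small tasks on a thread pool without two threads touching the same particle at once can be sketched as follows. This is a simplified stand-in, not the paper's dependent task schedule: it greedily groups tasks into rounds whose data regions do not overlap and places a barrier between rounds. The names `conflict_free_rounds` and `run_rounds`, and the representation of each task as a set of cell ids, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def conflict_free_rounds(tasks):
    """Greedily group tasks into rounds in which no two tasks share a cell.

    `tasks` maps a task name to the set of cell ids it reads or writes
    (a hypothetical representation).
    """
    rounds = []
    for name, cells in tasks.items():
        for rnd in rounds:
            if all(cells.isdisjoint(tasks[other]) for other in rnd):
                rnd.append(name)
                break
        else:
            # No existing round is free of conflicts: open a new one.
            rounds.append([name])
    return rounds


def run_rounds(tasks, work, workers=4):
    # Tasks within a round run concurrently; consuming the map results
    # acts as a barrier before the next round starts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for rnd in conflict_free_rounds(tasks):
            list(pool.map(work, rnd))


tasks = {"a": {0, 1}, "b": {1, 2}, "c": {2, 3}, "d": {4}}
rounds = conflict_free_rounds(tasks)

executed = []
run_rounds(tasks, executed.append)
```

A barrier per round is coarser than a true dependency-driven schedule, which can start a task as soon as the specific tasks it conflicts with have finished.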
A C-DAG task model for scheduling complex real-time tasks on heterogeneous platforms: preemption matters
Recent commercial hardware platforms for embedded real-time systems feature
heterogeneous processing units and computing accelerators on the same
System-on-Chip. When designing complex real-time applications for such
architectures, the designer needs to make a number of difficult choices: on
which processor should a certain task be implemented? Should a component be
implemented in parallel or sequentially? These choices may have a great impact
on feasibility, as differences in the processors' internal architectures
affect the tasks' execution time and preemption cost. To help the designer
explore the wide space of design choices and tune the scheduling parameters, in
this paper we propose a novel real-time application model, called C-DAG,
specifically conceived for heterogeneous platforms. A C-DAG allows specifying
alternative implementations of the same component of an application for
different processing engines to be selected off-line, as well as conditional
branches to model if-then-else statements to be selected at run-time. We also
propose a schedulability analysis for the C-DAG model and a heuristic
allocation algorithm so that all deadlines are respected. Our analysis takes
into account the cost of preempting a task, which can be non-negligible on
certain processors. We demonstrate the effectiveness of our approach on a large
set of synthetic experiments by comparing with state-of-the-art algorithms in
the literature.
Integrating Job Parallelism in Real-Time Scheduling Theory
We investigate the global scheduling of sporadic, implicit deadline,
real-time task systems on multiprocessor platforms. We provide a task model
which integrates job parallelism. We prove that the time-complexity of the
feasibility problem of these systems is linear in the number of
(sporadic) tasks for a fixed number of processors. We propose a scheduling
algorithm that is theoretically optimal (i.e., when preemptions and migrations are neglected).
Moreover, we provide an exact feasibility utilization bound. Lastly, we propose
a technique to limit the number of migrations and preemptions.