A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures
To improve performance efficiently, many systems are equipped with multi-core
and many-core processors (such as GPUs). Because of their discrete memories,
these heterogeneous architectures form a distributed system within a single
computer. A data-flow programming model is attractive in this setting because
it makes concurrency easy to express: programmers only need to define task
dependencies, without considering how to schedule the tasks on the hardware.
However, mapping the resulting task graph onto the hardware efficiently remains
a challenge. In this paper, we propose a graph-partition-based scheduling policy
for mapping data-flow workloads onto heterogeneous hardware. In our experiments,
graph-partition-based scheduling achieves performance comparable to
conventional queue-based approaches.
Comment: Presented at DATE Friday Workshop on Heterogeneous Architectures and
Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241)
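The abstract does not say which partitioner the policy uses. As a minimal sketch of the general idea, assume a machine with two devices ("cpu" and "gpu"), a per-device cost estimate for every task, and edge weights that stand for the cost of moving data between devices. The code starts each task on its individually faster device, then greedily moves single tasks while a combined objective (heaviest device load plus cut weight) improves. The names `objective`, `partition`, `example_costs`, and `example_edges` are hypothetical, and this local search is a swapped-in heuristic in the spirit of Kernighan-Lin refinement, not the authors' method.

```python
def objective(assign, costs, edges):
    """Heaviest device load plus total weight of cross-device edges."""
    load = {"cpu": 0.0, "gpu": 0.0}
    for task, dev in assign.items():
        cpu_cost, gpu_cost = costs[task]
        load[dev] += cpu_cost if dev == "cpu" else gpu_cost
    # Edges whose endpoints sit on different devices imply a data transfer.
    cut = sum(w for (u, v), w in edges.items() if assign[u] != assign[v])
    return max(load.values()) + cut


def partition(costs, edges, max_passes=10):
    """Two-way CPU/GPU partition by greedy single-task moves.

    costs[t] = (cpu_cost, gpu_cost); edges[(u, v)] = transfer cost.
    """
    # Start with every task on the device where it runs fastest.
    assign = {t: ("cpu" if c <= g else "gpu") for t, (c, g) in costs.items()}
    best = objective(assign, costs, edges)
    for _ in range(max_passes):
        improved = False
        for t in sorted(costs):
            old = assign[t]
            assign[t] = "gpu" if old == "cpu" else "cpu"  # tentative move
            score = objective(assign, costs, edges)
            if score < best:
                best, improved = score, True  # keep the move
            else:
                assign[t] = old  # undo the move
        if not improved:  # local optimum reached
            break
    return assign, best


# Hypothetical chain of four tasks that all run faster on the GPU.
example_costs = {"a": (3, 1), "b": (3, 1), "c": (3, 1), "d": (3, 1)}
example_edges = {("a", "b"): 0.5, ("b", "c"): 0.5, ("c", "d"): 0.5}

if __name__ == "__main__":
    print(partition(example_costs, example_edges))
```

On the example chain, the sketch moves task `a` to the CPU. The GPU load drops from 4 to 3 while only one edge (weight 0.5) is cut, so the objective falls from 4 to 3.5. The objective ignores dependency order. It trades load balance against transfer volume rather than computing a true makespan.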
Scheduling data flow program in xkaapi: A new affinity based Algorithm for Heterogeneous Architectures
Efficient implementations of parallel applications on heterogeneous hybrid
architectures require a careful balance between computation and communication
with accelerator devices. Even if most of the communication time can be
overlapped with computation, it is essential to reduce the total volume of
communicated data. The literature therefore abounds with ad hoc methods for
reaching that balance, but these methods are architecture- and
application-dependent. We propose here a generic mechanism to automatically
optimize the scheduling between CPUs and GPUs, and compare two strategies within
this mechanism: the classical Heterogeneous Earliest Finish Time (HEFT)
algorithm and our new, parametrized Distributed Affinity Dual Approximation
algorithm (DADA), which consists of grouping the tasks by affinity before
running a fast dual approximation. We ran experiments on a heterogeneous
parallel machine with six CPU cores and eight NVIDIA Fermi GPUs. Three standard
dense linear algebra kernels from the PLASMA library have been ported on top of
the Xkaapi runtime, and we report their performance. The results show that both
HEFT and DADA perform well under various experimental conditions, but that DADA
performs better for larger systems and larger numbers of GPUs and, in most
cases, generates far less data transfer than HEFT to achieve the same
performance.
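HEFT ranks each task by its upward rank: its average computation cost plus the most expensive chain of communication and computation leading to an exit task. It then places tasks in decreasing rank order, each on the processor that gives the earliest finish time. The sketch below simplifies this under stated assumptions. It uses the non-insertion variant, so a task is appended after the last task on a processor, whereas the original HEFT can fill idle gaps. It treats each edge's transfer time as its average communication cost, and a transfer between tasks on the same processor costs nothing. It is not the Xkaapi implementation, and it does not model DADA's affinity grouping or dual approximation. The names `heft`, `example_comp`, and `example_comm` are hypothetical.

```python
def heft(comp, comm, procs):
    """Simplified (non-insertion) HEFT.

    comp[t][p]: execution time of task t on processor p (assumed > 0).
    comm[(u, v)]: transfer time on edge u -> v when u and v run on
    different processors; zero when they share a processor.
    Returns {task: (processor, start, finish)}.
    """
    succ = {t: [] for t in comp}
    pred = {t: [] for t in comp}
    for u, v in comm:
        succ[u].append(v)
        pred[v].append(u)

    # Upward rank: mean cost over processors plus the most expensive
    # (communication + rank) path through any successor.
    rank = {}

    def upward_rank(t):
        if t not in rank:
            mean_cost = sum(comp[t][p] for p in procs) / len(procs)
            tail = max((comm[(t, s)] + upward_rank(s) for s in succ[t]), default=0.0)
            rank[t] = mean_cost + tail
        return rank[t]

    for t in comp:
        upward_rank(t)

    # With positive costs, rank strictly decreases along every edge, so
    # this order visits every predecessor before its successors.
    order = sorted(comp, key=lambda t: rank[t], reverse=True)
    avail = {p: 0.0 for p in procs}  # time each processor becomes free
    placed = {}
    for t in order:
        best = None
        for p in procs:
            # Data from a predecessor on another processor arrives later.
            ready = max(
                (placed[u][2] + (0.0 if placed[u][0] == p else comm[(u, t)]) for u in pred[t]),
                default=0.0,
            )
            start = max(avail[p], ready)
            finish = start + comp[t][p]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placed[t] = best
        avail[best[0]] = best[2]
    return placed


# Hypothetical diamond-shaped task graph on one CPU and one GPU.
example_comp = {
    "A": {"cpu": 4, "gpu": 2},
    "B": {"cpu": 3, "gpu": 1},
    "C": {"cpu": 2, "gpu": 4},
    "D": {"cpu": 3, "gpu": 1},
}
example_comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}

if __name__ == "__main__":
    for task, slot in heft(example_comp, example_comm, ["cpu", "gpu"]).items():
        print(task, slot)
```

On the example graph, A, B, and D land on the GPU and C on the CPU, and the schedule finishes at time 7. C goes to the CPU despite the one-unit transfer from A because it runs slower on the GPU. Because HEFT decides greedily per task, it does not account for the total transfer volume. The abstract reports that DADA targets this volume more directly.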