11,673 research outputs found
Scheduling data flow program in xkaapi: A new affinity based Algorithm for Heterogeneous Architectures
Efficient implementations of parallel applications on heterogeneous hybrid
architectures require a careful balance between computations and communications
with accelerator devices. Even if most of the communication time can be
overlapped by computations, it is essential to reduce the total volume of
communicated data. The literature therefore abounds with ad-hoc methods to
reach that balance, but that are architecture and application dependent. We
propose here a generic mechanism to automatically optimize the scheduling
between CPUs and GPUs, and compare two strategies within this mechanism: the
classical Heterogeneous Earliest Finish Time (HEFT) algorithm and our new,
parametrized, Distributed Affinity Dual Approximation algorithm (DADA), which
consists in grouping the tasks by affinity before running a fast dual
approximation. We ran experiments on a heterogeneous parallel machine with six
CPU cores and eight NVIDIA Fermi GPUs. Three standard dense linear algebra
kernels from the PLASMA library have been ported on top of the Xkaapi runtime.
We report their performances. It results that HEFT and DADA perform well for
various experimental conditions, but that DADA performs better for larger
systems and number of GPUs, and, in most cases, generates much lower data
transfers than HEFT to achieve the same performance
Task scheduling techniques for asymmetric multi-core systems
As performance and energy efficiency have become the main challenges for next-generation high-performance computing, asymmetric multi-core architectures can provide solutions to tackle these issues. Parallel programming models need to be able to suit the needs of such systems and keep on increasing the application’s portability and efficiency. This paper proposes two task scheduling approaches that target asymmetric systems. These dynamic scheduling policies reduce total execution time either by detecting the longest or the critical path of the dynamic task dependency graph of the application, or by finding the earliest executor of a task. They use dynamic scheduling and information discoverable during execution, fact that makes them implementable and functional without the need of off-line profiling. In our evaluation we compare these scheduling approaches with two existing state-of the art heterogeneous schedulers and we track their improvement over a FIFO baseline scheduler. We show that the heterogeneous schedulers improve the baseline by up to 1.45 in a real 8-core asymmetric system and up to 2.1 in a simulated 32-core asymmetric chip.This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de
Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the
European HiPEAC Network of Excellence. The Mont-Blanc project receives funding from the EU’s Seventh Framework Programme (FP7/2007-2013) under grant agreement
no 610402 and from the EU’s H2020 Framework Programme (H2020/2014-2020) under grant agreement no 671697. M.
Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas
is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie
Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243).Peer ReviewedPostprint (author's final draft
Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes
This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), by
the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund
programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). This work was also partially performed
under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-689878).
Finally, the authors are grateful to the reviewers for their valuable comments, to the RoMoL team, to Xavier Teruel and Kallia Chronaki from the Programming Models group
of BSC and the Computation Department of LLNL for their technical support and useful feedback.Peer ReviewedPostprint (published version
- …