Search CORE

2 research outputs found

A framework for efficient execution of data parallel irregular applications on heterogeneous systems

Author: Barbosa João
Ribeiro Roberto
Santos Luís Paulo
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2015
Field of study

Exploiting the computing power of the diversity of resources available on heterogeneous systems is mandatory but a very challenging task. The diversity of architectures, execution models and programming tools, together with disjoint address spaces and di erent computing capabilities, raise a number of challenges that severely impact on application performance and programming productivity. This problem is further compounded in the presence of data parallel irregular applications. This paper presents a framework that addresses development and execution of data parallel irregular applications in heterogeneous systems. A uni ed task-based programming and execution model is proposed, together with inter and intra-device scheduling, which, coupled with a data management system, aim to achieve performance scalability across multiple devices, while maintaining high programming productivity. Intradevice scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels, which, by allowing dynamic generation and rescheduling of new work units, enable balancing irregular workloads and increase resource utilization. Results show that regular and irregular applications scale well with the number of devices, while requiring minimal programming e ort. Consumer-producer kernels are able to sustain signi cant performance gains as long as the workload per basic work unit is enough to compensate overheads associated with intra-device scheduling. This not being the case, consumer kernels can still be used for the irregular application. Comparisons with an alternative framework, StarPU, which targets regular workloads, consistently demonstrate signi cant speedups. This is, to the best of our knowledge, the rst published integrated approach that successfully handles irregular workloads over heterogeneous systems.This work is funded by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) and by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) within projects PEst-OE/EEI/UI0752/2014 and FCOMP-01-0124-FEDER-010067. Also by the School of Engineering, Universidade do Minho within project P2SHOCS - Performance Portability on Scalable Heterogeneous Computing Systems

Universidade do Minho: RepositoriUM

Crossref

Recommended from our members

Fine-grain acceleration of graph algorithms on a heterogeneous chip

Author: Eryilmaz Cagri
Publication venue
Publication date: 14/07/2017
Field of study

With the rise of heterogeneous chips available in the market, where integrated GPU cores and CPU cores reside in the same chip and share a unified memory, it is possible to have better execution schemes for many graph algorithms. Graph algorithms can exhibit producer-consumer behavior, a varying amount of parallelism during execution, and irregularity which results in inefficiency. The inefficiency problem could be solved by exploiting heterogeneity between cores. In this work, I provide an understanding of the executions of some graph algorithms in heterogeneous chips and accelerate their executions by using fine-grain software optimization techniques. To achieve this, I introduce two different fine-grain execution techniques to accelerate the Maximal Independent Set and Preflow-push graph algorithms, and present an evaluation of the techniques on a heterogeneous chip. My techniques, namely Overlapping Threads with Hot-Vertices and Task Switcher, provide 1.3x to 16x speedup over CPU-only execution depending on the input and the algorithm.Electrical and Computer Engineerin

Texas ScholarWorks