Efficient OpenCL-based concurrent tasks offloading on accelerators

Abstract

Current heterogeneous platforms with CPUs and accelerators have the ability to launch several independent tasks simultaneously, in order to exploit concurrency among them. These tasks typically consist of data transfer commands and kernel computation commands. In this paper we develop a runtime approach to optimize the concurrency between data transfers and kernel computation commands in a multithreaded scenario where each CPU thread offloads tasks to the accelerator. It deploys a heuristic based on a temporal execution model for concurrent tasks. It is able to establish a near-optimal task execution order that significantly reduces the total execution time, including data transfers. Our approach has been evaluated employing five different benchmarks composed of dominant kernel and dominant transfer real tasks. In these experiments our heuristic achieves speedups up to 1.5x in AMD R9 and NVIDIA K20c accelerators and 1.3x in an Intel Xeon Phi (KNC) device.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Similar works

Full text

thumbnail-image

Repositorio Institucional Universidad de Málaga

redirect
Last time updated on 06/12/2017

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.