    OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

    The actor model of computation has been designed for seamless support of concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident on the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to differ considerably between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API. Comment: 28 pages

    Design and Evaluation of the GeMTC Framework for GPU-enabled Many-Task Computing

    We present the design and first performance and usability evaluation of GeMTC, a novel execution model and runtime system that enables accelerators to be programmed with many concurrent and independent tasks of potentially short or variable duration. With GeMTC, a broad class of such “many-task” applications can leverage the increasing number of accelerated and hybrid high-end computing systems. GeMTC overcomes the obstacles to using GPUs in a many-task manner by scheduling and launching independent tasks on hardware designed for SIMD-style vector processing. We demonstrate the use of a high-level MTC programming model (the Swift parallel dataflow language) to run tasks on many accelerators and thus provide a high-productivity programming model for the growing number of supercomputers that are accelerator-enabled. While still in an experimental stage, GeMTC can already support tasks of fine (subsecond) granularity and execute concurrent heterogeneous tasks on 86,000 independent GPU warps spanning 2.7M GPU threads on the Blue Waters supercomputer.