Algorithmic skeleton framework for the orchestration of GPU computations
Dissertation submitted for the degree of Master in Computer Science and Engineering
The Graphics Processing Unit (GPU) is gaining popularity as a co-processor to the Central Processing Unit (CPU) because it can outperform the CPU in certain application fields. Nonetheless, harnessing the GPU’s capabilities is a non-trivial exercise that requires a good knowledge of parallel programming. Thus, providing ways to extract this computational power has become an emerging research topic.
In this context, there have been several proposals in the field of GPGPU (General-Purpose computation on Graphics Processing Units) development. However, most of these still offer a low-level abstraction of the GPU computing model, forcing the developer to adapt application computations to the SPMD (Single Program, Multiple Data) model, as well as
to orchestrate the low-level details of the execution. On the other hand, the higher-level approaches have limitations that prevent the full exploitation of GPUs when the purpose goes beyond simply offloading a kernel.
To this end, our proposal builds on the recent trend of applying the notion of algorithmic patterns (skeletons) to GPU computing. We propose Marrow, a high-level algorithmic skeleton framework that expands the set of skeletons currently available in
this field. Marrow’s skeletons orchestrate the execution of OpenCL computations and
introduce optimizations that overlap communication and computation, thus pairing programming simplicity with performance gains in many application scenarios. Additionally, these skeletons can be combined (nested) to create more complex applications.
We evaluated the proposed constructs by comparing them with similar
skeleton libraries for GPGPU, as well as with hand-tuned OpenCL programs. The
results are favourable, indicating that Marrow’s skeletons are both flexible and efficient in the context of GPU computing.
FCT-MCTES - financing the equipment
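The nesting property described in the Marrow abstract can be illustrated with a minimal Python sketch. It assumes hypothetical `Map`, `Pipeline`, and `Loop` classes that share a single `run` method; these names and behaviours are illustrative only and are not Marrow's actual C++/OpenCL API, and each `Map` runs sequentially on the host where a GPU framework would launch a kernel.

```python
from __future__ import annotations
from typing import Callable


class Skeleton:
    """Hypothetical base class: every skeleton turns an input list into an output list."""

    def run(self, data: list) -> list:
        raise NotImplementedError


class Map(Skeleton):
    """Applies an element-wise function; stands in for a data-parallel GPU kernel."""

    def __init__(self, fn: Callable[[float], float]):
        self.fn = fn

    def run(self, data):
        return [self.fn(x) for x in data]


class Pipeline(Skeleton):
    """Feeds each stage's output into the next stage; a stage may be any skeleton."""

    def __init__(self, *stages: Skeleton):
        self.stages = stages

    def run(self, data):
        for stage in self.stages:
            data = stage.run(data)
        return data


class Loop(Skeleton):
    """Runs a nested skeleton a fixed number of times, feeding results back in."""

    def __init__(self, body: Skeleton, iterations: int):
        self.body = body
        self.iterations = iterations

    def run(self, data):
        for _ in range(self.iterations):
            data = self.body.run(data)
        return data


# Nesting: a Loop whose body is itself a Pipeline of two Maps.
app = Pipeline(
    Map(lambda x: x * 2),
    Loop(Pipeline(Map(lambda x: x + 1), Map(lambda x: x * x)), iterations=2),
)
print(app.run([1, 2]))  # prints [100, 676]
```

Because every skeleton exposes the same interface, any skeleton can appear wherever a stage is expected, which is what makes nesting possible. A real framework can also exploit this known structure to schedule data transfers, for instance overlapping them with computation; the sketch does not model that.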
Contract-Based General-Purpose GPU Programming
Using GPUs as general-purpose processors has revolutionized parallel
computing by offering, for a large and growing set of algorithms, massive
data-parallelization on desktop machines. An obstacle to widespread adoption,
however, is the difficulty of programming them and the low-level control of the
hardware required to achieve good performance. This paper suggests a
programming library, SafeGPU, that aims at striking a balance between
programmer productivity and performance, by making GPU data-parallel operations
accessible from within a classical object-oriented programming language. The
solution is integrated with the design-by-contract approach, which increases
confidence in functional program correctness by embedding executable program
specifications into the program text. We show that our library leads to modular
and maintainable code that is accessible to GPGPU non-experts, while providing
performance that is comparable with hand-written CUDA code. Furthermore,
runtime contract checking turns out to be feasible, as the contracts can be
executed on the GPU.
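The design-by-contract idea in the SafeGPU abstract can be sketched in Python with a hypothetical `contract` decorator that checks an executable precondition (`require`) and postcondition (`ensure`) around a data-parallel operation. The decorator, `ContractViolation`, and `vector_add` are assumptions made for illustration and are not SafeGPU's API; also, whereas the abstract reports that SafeGPU can execute contracts on the GPU, this sketch evaluates them sequentially on the host.

```python
from __future__ import annotations
from functools import wraps
from typing import Callable


class ContractViolation(Exception):
    """Raised when a precondition or postcondition does not hold."""


def contract(require: Callable[..., bool], ensure: Callable[..., bool]):
    """Wrap an operation with an executable precondition and postcondition."""

    def decorate(op):
        @wraps(op)
        def checked(*args):
            if not require(*args):
                raise ContractViolation(f"precondition of {op.__name__} violated")
            result = op(*args)
            if not ensure(result, *args):
                raise ContractViolation(f"postcondition of {op.__name__} violated")
            return result

        return checked

    return decorate


@contract(
    require=lambda xs, ys: len(xs) == len(ys),
    ensure=lambda result, xs, ys: (
        len(result) == len(xs)
        and all(r == x + y for r, x, y in zip(result, xs, ys))
    ),
)
def vector_add(xs: list[float], ys: list[float]) -> list[float]:
    """Element-wise addition: the kind of data-parallel operation a GPU would run."""
    return [x + y for x, y in zip(xs, ys)]


print(vector_add([1.0, 2.0], [3.0, 4.0]))  # prints [4.0, 6.0]
```

Calling `vector_add([1.0], [1.0, 2.0])` raises `ContractViolation` before any work is done, which is the kind of early, specification-level error report that contracts are meant to provide.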
Multi-GPU support on the Marrow algorithmic skeleton framework
Dissertation submitted for the degree of Master in Computer Science and Engineering
With the proliferation of general-purpose GPUs, workload parallelization and data-transfer optimization have become an increasing concern. The natural evolution from using a single GPU is to multiply the number of available processors, which presents new challenges, such as tuning workload decompositions and balancing the load when dealing with heterogeneous systems.
Higher-level programming is a very important asset in a multi-GPU environment, given the complexity of the GPGPU APIs currently in use (OpenCL and CUDA), which are low-level and carry considerable code overhead. It can be obtained by introducing an abstraction layer, which has the advantage of enabling implicit optimizations and orchestration
such as transparent load balancing, while reducing explicit code overhead.
Algorithmic skeletons, previously used in cluster environments, have recently been
adapted to the GPGPU context. Skeletons abstract away most sources of code overhead by
defining computation patterns of commonly used algorithms. The Marrow algorithmic
skeleton library is one of these, taking advantage of the abstractions to automate the
orchestration needed for an efficient GPU execution.
This thesis proposes extending Marrow to leverage algorithmic skeletons
in the modular and efficient programming of multiple heterogeneous GPUs within a single machine.
We achieved a good balance between the simplicity of the programming model and performance, obtaining good scalability with multiple GPUs and an efficient load distribution, albeit at the price of some overhead when using a single GPU.
projects PTDC/EIA-EIA/102579/2008 and PTDC/EIA-EIA/111518/200
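Distributing a data-parallel workload over several GPUs usually comes down to splitting the index range in proportion to each device's speed. The sketch below assumes hypothetical per-device `weights` (for example, relative throughputs obtained by profiling) and uses largest-remainder rounding; both are illustrative choices, not necessarily the balancing mechanism used in the thesis.

```python
from __future__ import annotations


def partition(n: int, weights: list[float]) -> list[tuple[int, int]]:
    """Split the index range [0, n) into one contiguous chunk per device, with
    chunk sizes proportional to each device's weight (e.g. relative throughput)."""
    if n < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("n must be non-negative and all weights positive")
    total = sum(weights)
    ideal = [n * w / total for w in weights]
    sizes = [int(share) for share in ideal]  # round every ideal share down
    # Give the elements lost to rounding to the devices with the largest remainders.
    leftover = n - sum(sizes)
    by_remainder = sorted(
        range(len(weights)), key=lambda i: ideal[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    chunks, start = [], 0
    for size in sizes:
        chunks.append((start, start + size))
        start += size
    return chunks


# One device twice as fast as the other two (hypothetical weights).
print(partition(10, [2, 1, 1]))  # prints [(0, 5), (5, 8), (8, 10)]
```

Contiguous chunks keep each device's input in a single buffer, which keeps host-to-device transfers simple; the weights can be refreshed from measurements between runs if the devices' relative speeds change.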
Towards an algorithmic skeleton framework for programming the Intel® Xeon Phi™ processor
The Intel® Xeon Phi™ is the first processor based on Intel’s MIC (Many Integrated Core) architecture. It is a co-processor specially tailored for data-parallel computations, whose basic architectural design is similar to that of GPUs (Graphics Processing Units): it relies on many integrated cores of modest computational power to perform parallel
computations. The main novelty of the MIC architecture, relative to GPUs, is its
compatibility with the Intel x86 architecture. This enables the use of many of the tools commonly available for the parallel programming of x86-based architectures, which may shorten the learning curve. However, programming the Xeon Phi still entails aspects intrinsic to accelerator-based computing in general, and to the MIC architecture in particular.
In this thesis we advocate the use of algorithmic skeletons for programming the Xeon Phi. Algorithmic skeletons abstract the complexity inherent to parallel programming,
hiding details such as resource management, parallel decomposition, and communication
between execution flows, thus removing these concerns from the programmer’s mind. In
this context, the goal of the thesis is to lay the foundations for the development of a
simple but powerful and efficient skeleton framework for the programming of the Xeon
Phi processor. For this purpose, we build upon Marrow, an existing framework for the
orchestration of OpenCL™ computations in multi-GPU and CPU environments. We extend
Marrow to execute both OpenCL and C++ parallel computations on the Xeon Phi.
To evaluate the newly developed framework, we use several well-known benchmarks, such as
Saxpy and N-Body, both to compare its performance with that of the existing
framework when executing on the co-processor, and to assess the performance of the Xeon Phi versus a multi-GPU environment.
projects PTDC/EIA-EIA/113613/2009 (Synergy-VM) and PTDC/EEI-CTP/1837/2012 (SwiftComp) for financing the purchase of the Intel® Xeon Phi™
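Saxpy, one of the benchmarks named in the Xeon Phi abstract, computes `a * x + y` element-wise and is a natural instance of a map skeleton. The sketch below splits the index range into independent chunks and hands them to a thread pool; the function names and the `workers` parameter are hypothetical, and because of CPython's global interpreter lock these threads will not speed up this pure-Python loop. The point is only the decomposition, which a framework could instead map onto co-processor cores or GPUs.

```python
from __future__ import annotations
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a: float, x: list[float], y: list[float], start: int, end: int) -> list[float]:
    """Compute a * x[i] + y[i] for the indices in [start, end)."""
    return [a * x[i] + y[i] for i in range(start, end)]


def saxpy(a: float, x: list[float], y: list[float], workers: int = 4) -> list[float]:
    """Saxpy (a * x + y) as a map skeleton over contiguous index chunks.

    Each chunk is an independent task, so a framework is free to hand chunks
    to different cores or devices; here they go to a thread pool."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    n = len(x)
    step = max(1, -(-n // workers))  # ceiling division, at least one element
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so concatenation restores index order.
        parts = pool.map(lambda b: saxpy_chunk(a, x, y, b[0], b[1]), bounds)
        return [v for part in parts for v in part]


print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], workers=2))  # prints [12.0, 24.0, 36.0]
```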
Heterogeneous computing with an algorithmic skeleton framework
The Graphics Processing Unit (GPU) is present in almost every modern-day personal
computer. Despite their special-purpose design, GPUs have been increasingly used for general
computations with very good results. Hence, there is a growing effort from the community
to seamlessly integrate this kind of device into everyday computing. However, to
fully exploit the potential of a system comprising GPUs and CPUs, these devices should
be presented to the programmer as a single platform.
The efficient combination of the power of CPU and GPU devices is highly dependent
on each device’s characteristics, resulting in platform-specific applications that cannot
be ported to different systems. Also, the most efficient work balance among devices is
highly dependent on the computations to be performed and their respective data sizes.
In this work, we propose a solution for heterogeneous environments based on the
abstraction level provided by algorithmic skeletons. Our goal is to take full advantage of
the power of all CPU and GPU devices present in a system, without the need for different
kernel implementations or explicit work distribution. To that end, we extended Marrow,
an algorithmic skeleton framework for multi-GPU environments, to support CPU computations and
efficiently balance the workload between devices. Our approach is based on an offline
training execution that identifies the ideal work balance and platform configurations for
a given application and input data size.
The evaluation of this work shows that the combination of CPU and GPU devices can
significantly boost the performance of our benchmarks in the tested environments, when
compared to GPU-only executions.
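The offline training step described above can be sketched as a search over candidate CPU/GPU work splits, run once per input size. In a real system `measure` would time an actual execution; here `simulated_run` is a hypothetical cost model (made-up device rates and a fixed GPU launch/transfer overhead), and the grid search is an illustrative stand-in for the thesis's own training procedure.

```python
from __future__ import annotations
from typing import Callable


def train_split(measure: Callable[[float], float], steps: int = 20) -> tuple[float, float]:
    """Offline training: try CPU work fractions 0, 1/steps, ..., 1 and keep the fastest.

    measure(f) returns the execution time when a fraction f of the work runs on
    the CPU and the remaining 1 - f runs on the GPU."""
    best_f, best_t = 0.0, float("inf")
    for k in range(steps + 1):
        f = k / steps
        t = measure(f)
        if t < best_t:
            best_f, best_t = f, t
    return best_f, best_t


def simulated_run(f: float, n: int, cpu_rate: float = 2e8, gpu_rate: float = 8e8,
                  gpu_overhead: float = 1e-4) -> float:
    """Hypothetical cost model (elements per second, seconds). CPU and GPU run
    concurrently, so the slower side determines the total time; using the GPU
    at all costs a fixed launch/transfer overhead."""
    cpu_time = f * n / cpu_rate
    gpu_time = (gpu_overhead + (1 - f) * n / gpu_rate) if f < 1 else 0.0
    return max(cpu_time, gpu_time)


# One training run per input size; the chosen split is stored for later executions.
config = {n: train_split(lambda f, n=n: simulated_run(f, n))[0] for n in (10_000, 1_000_000)}
print(config)  # prints {10000: 1.0, 1000000: 0.2}
```

Under this model the small input runs entirely on the CPU, because the GPU overhead outweighs its speed, while the large input sends most of the work to the GPU; this is one reason the best balance depends on the data size.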
Programming Heterogeneous Parallel Machines Using Refactoring and Monte-Carlo Tree Search
Funding: This work was supported by the EU Horizon 2020 project TeamPlay, Grant Number 779882, and UK EPSRC Discovery, Grant Number EP/P020631/1.
This paper presents a new technique for introducing and tuning parallelism for heterogeneous shared-memory systems (comprising a mixture of CPUs and GPUs), using a combination of algorithmic skeletons (such as farms and pipelines), Monte-Carlo tree search (MCTS) for deriving mappings of tasks to available hardware resources, and refactoring tool support for applying the patterns and mappings easily and effectively. Using our approach, we demonstrate easily obtainable, significant, and scalable speedups on a number of case studies, of up to 41× over the sequential code on a 24-core machine with one GPU. We also demonstrate that the speedups obtained with the mappings derived by the MCTS algorithm are within 5–15% of the best manual parallelisation obtained.
Publisher PDF. Peer reviewed
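The use of Monte-Carlo tree search to map tasks onto hardware can be illustrated with a small UCT (Upper Confidence bounds applied to Trees) search that assigns each stage of a pipeline to either the CPU or the GPU. The cost model, the stage timings, and all function names below are hypothetical and are not taken from the paper or its refactoring tool; the sketch only shows the four MCTS phases (selection, expansion, simulation, backpropagation) applied to a mapping problem, with an exhaustive search included to check small instances.

```python
from __future__ import annotations
import math
import random
from itertools import product

DEVICES = ("cpu", "gpu")


def mapping_cost(mapping, stage_costs, transfer_cost):
    """Hypothetical cost of one pipeline period: the busiest device's total stage
    time plus a fixed penalty for every hand-off between different devices."""
    load = {d: 0.0 for d in DEVICES}
    for stage, device in enumerate(mapping):
        load[device] += stage_costs[stage][device]
    hand_offs = sum(1 for a, b in zip(mapping, mapping[1:]) if a != b)
    return max(load.values()) + transfer_cost * hand_offs


class Node:
    """Search-tree node holding the devices chosen for the first len(mapping) stages."""

    def __init__(self, mapping, parent=None):
        self.mapping = mapping
        self.parent = parent
        self.children = {}  # device -> Node
        self.visits = 0
        self.total_reward = 0.0


def uct(parent, child, c):
    """UCT score: average reward (exploitation) plus an exploration bonus."""
    return child.total_reward / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts_mapping(stage_costs, transfer_cost, iterations=500, c=1.4, seed=0):
    rng = random.Random(seed)
    n = len(stage_costs)
    root = Node(())
    best_mapping, best_cost = None, math.inf
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCT score.
        while len(node.mapping) < n and len(node.children) == len(DEVICES):
            parent = node
            node = max(parent.children.values(), key=lambda ch: uct(parent, ch, c))
        # 2. Expansion: try one untried device for the next stage.
        if len(node.mapping) < n:
            device = rng.choice([d for d in DEVICES if d not in node.children])
            node.children[device] = Node(node.mapping + (device,), node)
            node = node.children[device]
        # 3. Simulation: complete the mapping with random device choices.
        rest = tuple(rng.choice(DEVICES) for _ in range(n - len(node.mapping)))
        rollout = node.mapping + rest
        cost = mapping_cost(rollout, stage_costs, transfer_cost)
        if cost < best_cost:
            best_mapping, best_cost = rollout, cost
        # 4. Backpropagation: cheaper mappings yield larger rewards in (0, 1].
        reward = 1.0 / (1.0 + cost)
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_mapping, best_cost


def exhaustive_best(stage_costs, transfer_cost):
    """Brute-force baseline for small pipelines: evaluates every mapping."""
    candidates = product(DEVICES, repeat=len(stage_costs))
    best = min(candidates, key=lambda m: mapping_cost(m, stage_costs, transfer_cost))
    return best, mapping_cost(best, stage_costs, transfer_cost)


# Hypothetical per-stage execution times (arbitrary units) on each device.
EXAMPLE_STAGES = [
    {"cpu": 4.0, "gpu": 1.0},
    {"cpu": 2.0, "gpu": 3.0},
    {"cpu": 6.0, "gpu": 1.5},
    {"cpu": 1.0, "gpu": 2.0},
]
print(mcts_mapping(EXAMPLE_STAGES, transfer_cost=0.5))
print(exhaustive_best(EXAMPLE_STAGES, transfer_cost=0.5))
```

Exhaustive search grows as 2^n in the number of stages, whereas MCTS spends a fixed iteration budget and concentrates it on promising branches; for realistic programs the cost function would come from measurements rather than a fixed table.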