7 research outputs found
Algorithmic skeleton framework for the orchestration of GPU computations
Dissertação para obtenção do Grau de Mestre em
Engenharia InformáticaThe Graphics Processing Unit (GPU) is gaining popularity as a co-processor to the
Central Processing Unit (CPU), due to its ability to surpass the latter’s performance in certain application fields. Nonetheless, harnessing the GPU’s capabilities is a non-trivial exercise that requires good knowledge of parallel programming. Thus, providing ways to extract such computational power has become an emerging research topic.
In this context, there have been several proposals in the field of GPGPU (Generalpurpose Computation on Graphics Processing Unit) development. However, most of these still offer a low-level abstraction of the GPU computing model, forcing the developer to adapt application computations in accordance with the SPMD model, as well as
to orchestrate the low-level details of the execution. On the other hand, the higher-level approaches have limitations that prevent the full exploitation of GPUs when the purpose goes beyond the simple offloading of a kernel.
To this extent, our proposal builds on the recent trend of applying the notion of algorithmic patterns (skeletons) to GPU computing. We propose Marrow, a high-level algorithmic skeleton framework that expands the set of skeletons currently available in
this field. Marrow’s skeletons orchestrate the execution of OpenCL computations and
introduce optimizations that overlap communication and computation, thus conjoining programming simplicity with performance gains in many application scenarios. Additionally, these skeletons can be combined (nested) to create more complex applications.
We evaluated the proposed constructs by confronting them against the comparable
skeleton libraries for GPGPU, as well as against hand-tuned OpenCL programs. The
results are favourable, indicating that Marrow’s skeletons are both flexible and efficient in the context of GPU computing.FCT-MCTES - financing the equipmen
Contract-Based General-Purpose GPU Programming
Using GPUs as general-purpose processors has revolutionized parallel
computing by offering, for a large and growing set of algorithms, massive
data-parallelization on desktop machines. An obstacle to widespread adoption,
however, is the difficulty of programming them and the low-level control of the
hardware required to achieve good performance. This paper suggests a
programming library, SafeGPU, that aims at striking a balance between
programmer productivity and performance, by making GPU data-parallel operations
accessible from within a classical object-oriented programming language. The
solution is integrated with the design-by-contract approach, which increases
confidence in functional program correctness by embedding executable program
specifications into the program text. We show that our library leads to modular
and maintainable code that is accessible to GPGPU non-experts, while providing
performance that is comparable with hand-written CUDA code. Furthermore,
runtime contract checking turns out to be feasible, as the contracts can be
executed on the GPU
Contract-Based General-Purpose GPU Programming
Abstract Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the difficulty of programming them and the low-level control of the hardware required to achieve good performance. This paper suggests a programming library, SafeGPU, that aims at striking a balance between programmer productivity and performance, by making GPU data-parallel operations accessible from within a classical object-oriented programming language. The solution is integrated with the design-bycontract approach, which increases confidence in functional program correctness by embedding executable program specifications into the program text. We show that our library leads to modular and maintainable code that is accessible to GPGPU non-experts, while providing performance that is comparable with hand-written CUDA code. Furthermore, runtime contract checking turns out to be feasible, as the contracts can be executed on the GPU
Energy models in data parallel CPU/GPU computations
La tesi affronta il problema della modellazione dei consumi energetici in computazioni data parallel (map) in architetture con GPU. I modelli sviluppati sono utilizzati per valutare il compromesso tra performance e consumi. La tesi include risultati sperimentali che validano sia i modelli sviluppati che la metodologia di ricerca di un compromesso tra prestazioni e consumi
Enhancing productivity and performance portability of opencl applications on heterogeneous systems using runtime optimizations
Initially driven by a strong need for increased computational performance in science and
engineering, heterogeneous systems have become ubiquitous and they are getting increasingly
complex. The single processor era has been replaced with multi-core processors,
which have quickly been surrounded by satellite devices aiming to increase the throughput
of the entire system. These auxiliary devices, such as Graphics Processing Units, Field Programmable
Gate Arrays or other specialized processors have very different architectures.
This puts an enormous strain on programming models and software developers to take full
advantage of the computing power at hand. Because of this diversity and the unachievable
flexibility and portability necessary to optimize for each target individually, heterogeneous
systems remain typically vastly under-utilized.
In this thesis, we explore two distinct ways to tackle this problem. Providing automated,
non intrusive methods in the form of compiler tools and implementing efficient abstractions
to automatically tune parameters for a restricted domain are two complementary
approaches investigated to better utilize compute resources in heterogeneous systems.
First, we explore a fully automated compiler based approach, where a runtime system
analyzes the computation flow of an OpenCL application and optimizes it across multiple
compute kernels. This method can be deployed on any existing application transparently
and replaces significant software engineering effort spent to tune application for a particular
system. We show that this technique achieves speedups of up to 3x over unoptimized
code and an average of 1.4x over manually optimized code for highly dynamic applications.
Second, a library based approach is designed to provide a high level abstraction for
complex problems in a specific domain, stencil computation. Using domain specific techniques,
the underlying framework optimizes the code aggressively. We show that even in
a restricted domain, automatic tuning mechanisms and robust architectural abstraction are
necessary to improve performance. Using the abstraction layer, we demonstrate strong scaling
of various applications to multiple GPUs with a speedup of up to 1.9x on two GPUs
and 3.6x on four
FSCL: Homogeneous programming, scheduling and execution on heterogeneous platforms
The last few years has seen activity towards programming models, languages and frameworks to address the increasingly wide range and broad availability of heterogeneous computing resources through raised programming abstraction and portability across different platforms.
The effort spent in simplifying parallel programming across heterogeneous platforms is often outweighed by the need for low-level control over computation setup and execution and by performance opportunities that are missed due to the overhead introduced by the additional abstraction. Moreover, despite the ability to port parallel code across devices, each device is generally characterised by a restricted set of computations that it can execute outperforming the other devices in the system. The problem is therefore to schedule computations on increasingly popular multi-device heterogeneous platforms, helping to choose the best device among the available ones each time a computation has to execute.
Our Ph.D. research investigates the possibilities to address the problem of programming and execution abstraction on heterogeneous platforms while helping to dynamically and transparently exploit the computing power of such platforms in a device-aware fashion