PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.
Comment: Submitted to Parallel Computing, Elsevier
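The run-time code generation (RTCG) idea described above can be sketched without any GPU at all: the kernel source is assembled as a string at run time, so element types and scalar constants are baked into the code before compilation. This is a minimal illustrative sketch, not PyCUDA's actual API surface; with PyCUDA the generated source would be handed to `pycuda.compiler.SourceModule` for compilation, while here we only build and inspect the source, so the example runs on any machine. The template and function names are invented for illustration.

```python
# Sketch of GPU run-time code generation (RTCG): specialize a CUDA
# kernel's source text at run time. With PyCUDA, the resulting string
# would be compiled via pycuda.compiler.SourceModule; here we stop at
# the code-generation step so no GPU or CUDA toolkit is needed.

KERNEL_TEMPLATE = """
__global__ void scale(%(dtype)s *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= %(factor)s;
}
"""

def generate_kernel(dtype="float", factor=2.0):
    """Bake the element type and a scalar constant into the kernel source."""
    return KERNEL_TEMPLATE % {"dtype": dtype, "factor": repr(factor)}

source = generate_kernel(dtype="double", factor=0.5)
assert "double *data" in source  # the element type was baked into the source
```

Because specialization happens at run time, the same template can serve many element types and constants, letting the compiler optimize each variant — the core productivity argument the paper makes for the scripting-plus-GPU platform.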
A Modeling Approach based on UML/MARTE for GPU Architecture
High-performance computing is now part of the embedded-systems landscape, and Graphics Processing Units (GPUs) are increasingly used to accelerate a wide range of algorithms and applications. In recent years, however, little effort has been devoted to describing abstractions of applications in relation to their target architectures. As a result, when developers need to map applications onto GPUs, for example, they find this difficult and instead resort to architecture-specific APIs. This paper presents a metamodel extension for the MARTE profile and a model for GPU architectures. The main goal is to specify task and data allocation in the memory hierarchy of these architectures. The results show that this approach can help generate code for GPUs through model transformations using Model Driven Engineering (MDE).
Comment: Symposium en Architectures nouvelles de machines (SympA'14) (2011)
TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation
The paper is concerned with how software systems actually use Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power consumption on these resources. It argues for novel methods and tools to support software developers who aim to optimize the power consumption that results from designing, developing, deploying and running software on HPAs, while maintaining other quality aspects of the software at adequate, agreed levels. To this end, a reference architecture supporting energy efficiency at application construction, deployment, and operation is discussed, along with its implementation and evaluation plans.
Comment: Part of the Program Transformation for Programmability in Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March 2016, 7 pages, LaTeX, 3 PNG figures
Accelerated Neural Networks on OpenCL Devices Using SYCL-DNN
Over the past few years machine learning has seen a renewed explosion of interest, following a number of studies showing the effectiveness of neural networks in a range of tasks that had previously been considered incredibly hard. Neural networks' effectiveness in the fields of image recognition and natural language processing stems primarily from the vast amounts of data available to companies and researchers, coupled with the huge amounts of compute power available in modern accelerators such as GPUs, FPGAs and ASICs. A number of approaches are available to developers for utilizing GPGPU technologies, such as SYCL, OpenCL and CUDA; however, many applications require the same low-level mathematical routines. Libraries dedicated to accelerating these common routines allow developers to make full use of the available hardware without requiring low-level knowledge of the hardware themselves. Such libraries, however, are often provided by hardware manufacturers for specific hardware, such as cuDNN for Nvidia hardware or MIOpen for AMD hardware. SYCL-DNN is a new open-source library dedicated to providing accelerated routines for neural network operations that are hardware and vendor agnostic. Built on top of the SYCL open standard and written entirely in standard C++, SYCL-DNN allows a user to easily accelerate neural network code for a wide range of hardware using a modern C++ interface. The library is tested on AMD's OpenCL for GPUs, Intel's OpenCL for CPUs and GPUs, ARM's OpenCL for Mali GPUs, as well as ComputeAorta's OpenCL for the R-Car CV engine and host CPU. In this talk we present performance figures for SYCL-DNN on this range of hardware, and discuss how high performance was achieved on such a varied set of accelerators with such different hardware features.
Comment: 4 pages, 3 figures. In International Workshop on OpenCL (IWOCL '19), May 13-15, 2019, Boston