12 research outputs found
High Performance Code Generation for Stencil Computation on Heterogeneous Multi-device Architectures
Heterogeneous architectures are widely used in high performance computing. On one hand, they allow a designer to combine multiple types of computing units, each executing the tasks it is best suited for, to increase performance; on the other hand, they pose many programming challenges for novice users, especially on heterogeneous systems with multiple devices. In this paper, we propose STEPOCL, a code generator that produces the OpenCL host program for heterogeneous multi-device architectures. To simplify the analysis, the user provides a description of the input data and kernel parameters in an XML file; our generator then analyzes this description and automatically generates the host program. Thanks to its data partitioning and data exchange strategies, the generated host program runs on multiple devices without any change to the kernel code. Experiments with iterative stencil loop (ISL) code show that our tool is effective: it guarantees minimal data exchange and achieves high performance on heterogeneous multi-device architectures.
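The abstract does not give STEPOCL's XML schema, so the following is only a hedged sketch of what such a kernel/parameter description file might look like; every element and attribute name here is hypothetical:

```xml
<!-- Hypothetical STEPOCL-style description file; the real schema is not
     given in the abstract. It declares the kernel file, its arguments
     (so the generator can allocate buffers and partition data across
     devices), and the target devices for the generated host program. -->
<application kernel="stencil.cl" entry="jacobi2d">
  <argument id="grid_in"  type="float" dim="2" size="4096 4096" direction="in"/>
  <argument id="grid_out" type="float" dim="2" size="4096 4096" direction="out"/>
  <workgroup size="16 16"/>
  <devices>
    <device type="GPU"/>
    <device type="CPU"/>
  </devices>
</application>
```

Given such a description, a generator in this style can emit the boilerplate host code (context setup, buffer creation, data partitioning, kernel enqueue, and inter-device halo exchange) while the user-supplied kernel source stays unchanged.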
AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs
Stencil computation is one of the most widely-used compute patterns in high
performance computing applications. Spatial and temporal blocking have been
proposed to overcome the memory-bound nature of this type of computation by
moving memory pressure from external memory to on-chip memory on GPUs. However,
correctly implementing those optimizations while considering the complexity of
the architecture and memory hierarchy of GPUs to achieve high performance is
difficult. We propose AN5D, an automated stencil framework which is capable of
automatically transforming and optimizing stencil patterns in a given C source
code, and generating corresponding CUDA code. Parameter tuning in our framework
is guided by our performance model. Our novel optimization strategy reduces
shared memory and register pressure in comparison to existing implementations,
allowing performance scaling up to a temporal blocking degree of 10. We achieve
the highest performance reported so far for all evaluated stencil benchmarks on
the state-of-the-art Tesla V100 GPU.
Libra.Net: Single Task Scheduling in a CPU-GPU Heterogeneous Environment
In this thesis we developed a single task scheduler in a CPU-GPU heterogeneous environment.
We formulated a GPGPU performance model, identifying a ground model common to any GPGPU platform that must then be refined for a specific platform, and we proposed such a refinement for the Nvidia CUDA platform.
Moreover, we formulated a CPU performance model for the Common Language Infrastructure virtual execution environment.
Finally, we developed Libra.Net, an implementation of the scheduler for the Microsoft Common Language Runtime, and evaluated its efficiency.