Search CORE

3 research outputs found

Performance and portability of accelerated lattice Boltzmann applications with OpenACC

Author: Calore Enrico
Gabbana Alessandro
Kraus Jiri
Schifano Sebastiano Fabio
Tripiccione Raffaele
Publication venue: 'Wiley'
Publication date: 01/01/2016
Field of study

An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems have been troublesome in the past as accelerators could usually be programmed using specific programming languages threatening maintainability, portability, and correctness. Several new programming environments try to tackle this problem. Among them, OpenACC offers a high-level approach based on compiler directives to mark regions of existing C, C++, or Fortran codes to run on accelerators. This approach directly addresses code portability, leaving to compilers the support of each different accelerator, but one has to carefully assess the relative costs of portable approaches versus computing efficiency. In this paper, we address precisely this issue, using as a test-bench a massively parallel lattice Boltzmann algorithm. We first describe our multi-node implementation and optimization of the algorithm, using OpenACC and MPI. We then benchmark the code on a variety of processors, including traditional CPUs and GPUs, and make accurate performance comparisons with other GPU implementations of the same algorithm using CUDA and OpenCL. We also asses the performance impact associated with portable programming, and the actual portability and performance-portability of OpenACC-based applications across several state-of-the-art architectures

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures

Author: E. Calore
R. Tripiccione
S. F. Schifano
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

Abstract The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task, that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one must carefully assess the relative costs of portability versus computing efficiency, and find a reasonable tradeoff point. In this paper we address precisely this issue, using as test-bench a Lattice Boltzmann code implemented in OpenCL. We analyze its performance on several different state-of-the-art processors: NVIDIA GPUs and Intel Xeon-Phi many-core accelerators, as well as more traditional Ivy Bridge and Opteron multi-core commodity CPUs. We also compare with results obtained with codes specifically optimized for each of these systems. Our work shows that a properly structured OpenCL code runs on many different systems reaching performance levels close to those obtained by architecture-tuned CUDA or C codes

Elsevier - Publisher Connector

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

Open Access Repository