136 research outputs found

Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators

Author: Fridman Yehonatan
Oren Gal
Tamir Guy
Publication date: 09/04/2023

Over the last decade, most of the increase in computing power has come from advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performance in various computing tasks, using them requires code adaptations and transformations. Thus, OpenMP, the most common standard for multi-threading in scientific computing applications, has offered offloading capabilities between hosts (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs, the Intel Ponte Vecchio Max 1100 and the NVIDIA A100, were released to the market, with the oneAPI and GNU LLVM-backed compilation for offloading, respectively. In this work, we present early performance results of OpenMP offloading capabilities to these devices while specifically analyzing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware in a representative scientific mini-app (the LULESH benchmark). Our results show that the vast majority of the offloading directives in v4.5 and v5.0 are supported in the latest oneAPI and GNU compilers; however, the support in v5.1 and v5.2 is still lacking. From the performance perspective, we found that the Ponte Vecchio (PVC) is up to 37% better than the A100 on the LULESH benchmark, presenting better performance in computing and data movements.

Comment: 13 page

arXiv.org e-Print Archive

UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models

Author: Wang Anjia
Yan Yonghong
Yi Xinyao
Publication date: 28/10/2022

The complexity of heterogeneous computing architectures, as well as the demand for productive and portable parallel application development, has driven the evolution of parallel programming models to become more comprehensive and complex than before. Enhancing the conventional compilation technologies and software infrastructure to be parallelism-aware has become one of the main goals of recent compiler development. In this paper, we propose the design of a unified parallel intermediate representation (UPIR) for multiple parallel programming models and for enabling unified compiler transformation for the models. UPIR specifies three commonly used parallelism patterns (SPMD, data, and task parallelism), data attributes, explicit data movement and memory management, and synchronization operations used in parallel programming. We demonstrate UPIR via a prototype implementation in the ROSE compiler for unifying IR for both OpenMP and OpenACC and in both C/C++ and Fortran, for unifying the transformation that lowers both OpenMP and OpenACC code to the LLVM runtime, and for exporting UPIR to an LLVM MLIR dialect.

Comment: Typos corrected. Format update

arXiv.org e-Print Archive

Automatic Selection of Software Code Regions for Migrating to GPUs

Author: Fábio Daniel Reis Gaspar
Publication date: 04/03/2022

Repositório Aberto da Universidade do Porto

Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics

Author: Atif Mohammad
Battacharya Meghna
Calafiura Paolo
Childers Taylor
Dewing Mark
Dong Zhihua
Gutsche Oliver
Habib Salman
Knoepfel Kyle
Kortelainen Matti
Kwok Ka Hei Martin
Leggett Charles
Lin Meifeng
Pascuzzi Vincent
Strelchenko Alexei
Tsulaia Vakhtang
Viren Brett
Wang Tianle
Yeo Beomki
Yu Haiwang
Publication date: 27/06/2023

High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues. The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration.

Comment: 18 pages, 9 Figures, 2 Table

arXiv.org e-Print Archive

On the Porting and Optimisation of Physics Simulations for Heterogeneous Parallel Processors

Author: Martineau Matt J
Publication date: 25/06/2019

Explore Bristol Research

Evaluating ISO C++ Parallel Algorithms on Heterogeneous HPC Systems

Author: Deakin Tom
Lin Tom
McIntosh-Smith Simon N
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/01/2023

Explore Bristol Research