Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
Over the last decade, most of the increase in computing power has been gained
by advances in accelerated many-core architectures, mainly in the form of
GPGPUs. While accelerators achieve phenomenal performance in various computing
tasks, their use requires code adaptations and transformations. Thus,
OpenMP, the most common standard for multi-threading in scientific computing
applications, has offered offloading capabilities between hosts (CPUs) and
accelerators since v4.0, with increasing support in the successive v4.5, v5.0,
v5.1, and latest v5.2 versions. Recently, two state-of-the-art GPUs - the
Intel Ponte Vecchio Max 1100 and the NVIDIA A100 GPUs - were released to the
market, with oneAPI and GNU LLVM-backed compilation for offloading,
respectively. In this work, we present early performance results of OpenMP
offloading capabilities to these devices while specifically analyzing the
portability of advanced directives (using SOLLVE's OMPVV test suite) and the
scalability of the hardware on a representative scientific mini-app (the LULESH
benchmark). Our results show that the vast majority of the offloading
directives in v4.5 and 5.0 are supported in the latest oneAPI and GNU
compilers; however, support for v5.1 and v5.2 is still lacking. From the
performance perspective, we found that the PVC is up to 37% faster than the
A100 on the LULESH benchmark, showing better performance in both computation
and data movement.
UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models
The complexity of heterogeneous computing architectures, as well as the
demand for productive and portable parallel application development, have
driven the evolution of parallel programming models to become more
comprehensive and complex than before. Enhancing the conventional compilation
technologies and software infrastructure to be parallelism-aware has become one
of the main goals of recent compiler development. In this paper, we propose
the design of a unified parallel intermediate representation (UPIR) for
multiple parallel programming models, enabling unified compiler transformations
across these models. UPIR specifies three commonly used parallelism patterns (SPMD,
data and task parallelism), data attributes and explicit data movement and
memory management, and synchronization operations used in parallel programming.
We demonstrate UPIR via a prototype implementation in the ROSE compiler:
unifying the IR for both OpenMP and OpenACC in both C/C++ and Fortran, unifying
the transformation that lowers both OpenMP and OpenACC code to the LLVM
runtime, and exporting UPIR to an LLVM MLIR dialect.
Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics
High-energy physics (HEP) experiments have developed millions of lines of
code over decades that are optimized to run on traditional x86 CPU systems.
However, we are seeing a rapidly increasing fraction of floating point
computing power in leadership-class computing facilities and traditional data
centers coming from new accelerator architectures, such as GPUs. HEP
experiments are now faced with the untenable prospect of rewriting millions of
lines of x86 CPU code, for the increasingly dominant architectures found in
these computational accelerators. This task is made more challenging by the
architecture-specific languages and APIs promoted by manufacturers such as
NVIDIA, Intel and AMD. Producing multiple, architecture-specific
implementations is not a viable scenario, given the available person power and
code maintenance issues.
The Portable Parallelization Strategies team of the HEP Center for
Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP,
std::execution::parallel and alpaka as potential portability solutions that
promise to execute on multiple architectures from the same source code, using
representative use cases from major HEP experiments, including the DUNE
experiment at the Long-Baseline Neutrino Facility, and the ATLAS and CMS
experiments of the Large Hadron Collider. This cross-cutting evaluation of
portability solutions using real applications will help inform and guide the
HEP community when choosing their software and hardware suites for the next
generation of experimental frameworks. We present the outcomes of our studies,
including performance metrics, porting challenges, API evaluations, and build
system integration.