Toward Performance-Portable PETSc for GPU-based Exascale Systems
The Portable Extensible Toolkit for Scientific Computation (PETSc) library
delivers scalable solvers for nonlinear time-dependent differential and
algebraic equations and for numerical optimization. The PETSc design for
performance portability addresses fundamental GPU accelerator challenges and
stresses flexibility and extensibility by separating the programming model used
by the application from the one used by the library. This separation lets
application developers use their preferred programming model, such as Kokkos,
RAJA, SYCL, HIP, CUDA, or OpenCL, on upcoming exascale systems. A blueprint for
using GPUs from PETSc-based codes is provided, and case studies emphasize the
flexibility and high performance achieved on current GPU-based systems.
Comment: 15 pages, 10 figures, 2 tables
SYCL compute kernels for ExaHyPE
We discuss three SYCL realisations of a simple Finite Volume scheme over
multiple Cartesian patches. The realisation flavours differ in how they map
the compute steps onto loops and tasks: we compare an implementation that
uses exclusively a cascade of for-loops with a version that uses nested
parallelism, and finally benchmark both against a version that models the
calculations as a task graph. Our work proposes idioms for realising these
flavours within SYCL. To some degree, the idioms also translate to other GPGPU
programming techniques. Our preliminary results suggest that SYCL's
capability to model calculations via tasks or nested parallelism does not yet
allow such realisations to outperform their counterparts that use exclusively
data parallelism.
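The abstract contrasts mapping the same Finite Volume step onto a cascade of loops with mapping it onto a task graph. The following is a minimal sketch in plain Python rather than SYCL. It assumes 1D linear advection with a first-order upwind scheme on periodic patches, each with one ghost cell per side; the abstract does not specify the scheme. The helper names (`make_patches`, `fill_ghosts`, `step_loops`, `step_tasks`) are hypothetical. The nested-parallelism flavour is omitted. The threads only model the dependency structure: the halo exchange must finish before any patch task runs, and all tasks must join before the next step. They give no speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative parameters (assumed): advection speed, cell width, time step.
A, DX, DT = 1.0, 1.0, 0.5   # Courant number A*DT/DX = 0.5


def make_patches(values, m):
    """Split a flat periodic field into patches of m interior cells plus one ghost cell per side."""
    return [[0.0] + values[k:k + m] + [0.0] for k in range(0, len(values), m)]


def interior(patches):
    """Concatenate the interior cells of all patches."""
    return [x for p in patches for x in p[1:-1]]


def fill_ghosts(patches):
    """Periodic halo exchange: copy neighbouring interior cells into each patch's ghosts."""
    n = len(patches)
    for k, p in enumerate(patches):
        p[0] = patches[k - 1][-2]          # last interior cell of the left neighbour
        p[-1] = patches[(k + 1) % n][1]    # first interior cell of the right neighbour


def update_patch(p):
    """First-order upwind update (A > 0) of one patch; reads only this patch's cells."""
    flux = [A * p[k] for k in range(len(p) - 1)]   # flux[k] sits on the face between cells k and k+1
    for i in range(1, len(p) - 1):
        p[i] -= DT / DX * (flux[i] - flux[i - 1])


def step_loops(patches):
    """Flavour 1: a cascade of loops, each sweeping over all patches."""
    fill_ghosts(patches)
    fluxes = [[A * p[k] for k in range(len(p) - 1)] for p in patches]   # loop over all fluxes
    for p, flux in zip(patches, fluxes):                                # loop over all updates
        for i in range(1, len(p) - 1):
            p[i] -= DT / DX * (flux[i] - flux[i - 1])


def step_tasks(patches, pool):
    """Flavour 3: one task per patch; the halo exchange must finish before any task starts."""
    fill_ghosts(patches)
    futures = [pool.submit(update_patch, p) for p in patches]
    for f in futures:
        f.result()   # join: every patch task completes before the next time step


if __name__ == "__main__":
    u0 = [0.0, 0.0, 1.0, 1.0] + [0.0] * 8   # 12 cells -> 3 patches of 4 cells
    a, b = make_patches(u0, 4), make_patches(u0, 4)
    with ThreadPoolExecutor() as pool:
        for _ in range(10):
            step_loops(a)
            step_tasks(b, pool)
    print(interior(a) == interior(b))  # prints True: both flavours give identical cell values
```

Both flavours perform the same arithmetic per cell. They differ only in how the work is grouped and ordered, which is the design axis the abstract benchmarks.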
Austrian High-Performance-Computing meeting (AHPC2020)
This booklet is a collection of the abstracts presented at the AHPC conference.
Productivity, performance, and portability for computational fluid dynamics applications
Hardware trends over the last decade show increasing complexity and heterogeneity in high performance computing architectures, which presents developers of CFD applications with three key challenges: achieving good performance, utilising current and future hardware by being portable, and doing so productively. These three goals appear to contradict each other when using traditional programming approaches, but in recent years several strategies, such as template libraries and Domain Specific Languages, have emerged as a potential solution: by giving up generality and focusing on a narrower domain of problems, all three can be achieved. This paper gives an overview of the state of the art for delivering performance, portability, and productivity to CFD applications, ranging from high-level libraries that allow the symbolic description of PDEs to low-level techniques that target individual algorithmic patterns. We discuss the advantages and challenges of each approach, and review the performance benchmarking literature that compares implementations across hardware architectures and programming methods, giving an overview of key applications and their comparative performance.
Under the hood of SYCL - an initial performance analysis with an unstructured-mesh CFD application
As the computing hardware landscape becomes more diverse and hardware grows more complex, a general-purpose parallel programming model capable of producing (performance-)portable code has become highly attractive. Intel's OneAPI suite, which is based on the SYCL standard, aims to fill this gap using a modern C++ API. In this paper, we use SYCL to parallelize MGCFD, an unstructured-mesh computational fluid dynamics (CFD) code, to explore the current performance of SYCL. The code is benchmarked on several modern processor systems from Intel (including CPUs and the latest Xe LP GPU), AMD, ARM and Nvidia, using a variety of current SYCL compilers, with a particular focus on OneAPI and how it maps to Intel's CPU and GPU architectures. We compare performance with other parallelisations available in OP2, including SIMD, OpenMP, MPI and CUDA. The results are mixed: the performance of this class of applications, when parallelized with SYCL, depends heavily on the target architecture and the compiler, but in many cases comes close to that of currently prevalent parallel programming models. However, obtaining the best performance still requires different parallelization strategies or code paths to be written for different hardware.
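The closing point, that different hardware still needs different parallelization strategies, is easiest to see in the edge-based loops typical of unstructured-mesh CFD. There, each edge increments data on both of its end nodes through an indirection, so edges that share a node conflict. The following is a minimal sketch in plain Python with a hypothetical toy mesh and edge kernel (`EDGES`, `Q`, and `edge_flux` are illustrative, not MGCFD's actual data or kernels). It shows how unsynchronised concurrent increments lose updates. It also shows how greedy edge colouring, one common remedy in unstructured-mesh frameworks (atomic updates are another), restores the sequential result.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy mesh: 4 nodes, 5 edges (a quad with one diagonal); Q holds one value per node.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
Q = [4, 1, 7, 2]


def edge_flux(q, a, b):
    """Hypothetical edge kernel: an antisymmetric 'flux' between the two end nodes."""
    return q[a] - q[b]


def residual_sequential(edges, q):
    """Reference: sequential edge loop with indirect increments to both end nodes."""
    res = [0] * len(q)
    for a, b in edges:
        f = edge_flux(q, a, b)
        res[a] -= f
        res[b] += f
    return res


def residual_naive_parallel(edges, q):
    """Models unsynchronised parallel writes: every edge reads the old value, last writer wins."""
    old = [0] * len(q)
    res = old[:]
    for a, b in edges:
        f = edge_flux(q, a, b)
        res[a] = old[a] - f   # increments from other edges touching node a are lost
        res[b] = old[b] + f
    return res


def color_edges(edges):
    """Greedy colouring: no two edges of the same colour share a node."""
    colors = []   # list of (edges in this colour, nodes touched by this colour)
    for e in edges:
        for members, touched in colors:
            if e[0] not in touched and e[1] not in touched:
                members.append(e)
                touched.update(e)
                break
        else:
            colors.append(([e], set(e)))
    return [members for members, _ in colors]


def residual_colored(edges, q, pool):
    """Colours run one after another; edges within a colour are conflict-free and run concurrently."""
    res = [0] * len(q)

    def apply(edge):
        a, b = edge
        f = edge_flux(q, a, b)
        res[a] -= f
        res[b] += f

    for members in color_edges(edges):
        list(pool.map(apply, members))   # consume the iterator so the whole colour finishes
    return res


if __name__ == "__main__":
    print(residual_sequential(EDGES, Q))       # [-2, 9, -14, 7]
    print(residual_naive_parallel(EDGES, Q))   # [3, 6, -3, 2]: updates lost, sum no longer 0
    with ThreadPoolExecutor() as pool:
        print(residual_colored(EDGES, Q, pool))  # [-2, 9, -14, 7]
    print(color_edges(EDGES))                  # [[(0, 1), (2, 3)], [(1, 2), (3, 0)], [(0, 2)]]
```

Integer data keeps the comparison exact. With floating-point data, colouring changes the summation order, so results agree only up to rounding. Which remedy performs best (colouring, atomics, or something else) depends on the target hardware, which is the kind of per-architecture code path the abstract refers to.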
Lecture 12: Recent Advances in Time Integration Methods and How They Can Enable Exascale Simulations
To prepare for exascale systems, scientific simulations are growing in physical realism and thus complexity. This increase often results in additional and changing time scales. Time integration methods are critical to the efficient solution of these multiphysics systems. Yet many large-scale applications have fully embraced neither modern time integration methods nor efficient software implementations. Hence, achieving temporal accuracy with new and complex simulations has proved challenging. We will overview recent advances in time integration methods, including additive implicit-explicit (IMEX) methods, multirate methods, and parallel-in-time approaches, that are expected to help realize the potential of exascale systems for multiphysics simulations. Efficient execution of these methods relies, in turn, on efficient algebraic solvers, and we will discuss the relationships between integrators and solvers. In addition, an effective time integration approach is not complete without efficient software, and we will discuss effective software design approaches for time integrators and their use in application codes. Lastly, examples demonstrating some of these new methods and their implementations will be presented.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-819501
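Additive IMEX methods split the right-hand side into a stiff part that is integrated implicitly and a non-stiff part that is integrated explicitly. The following is a minimal sketch of the simplest member of that family, first-order IMEX Euler. It uses an assumed scalar test problem y' = cos(t) + LAM*y with a stiff LAM = -1000; the problem, `LAM`, and the function names are illustrative, not taken from the lecture. Because the stiff part here is linear and scalar, the implicit solve reduces to a division. For a large or nonlinear stiff part it becomes a genuine algebraic solve (for example, a Newton iteration with a linear solver inside), which is where the coupling between integrators and solvers mentioned above comes in.

```python
import math

# Illustrative split (assumed): y' = f_E(t, y) + LAM * y, with a stiff linear part LAM * y.
LAM = -1000.0


def f_explicit(t, y):
    """Non-stiff part, treated explicitly (assumed forcing term)."""
    return math.cos(t)


def imex_euler(y0, h, n_steps):
    """IMEX (forward/backward) Euler: explicit in f_E, implicit in LAM * y."""
    y = y0
    for n in range(n_steps):
        t = n * h
        # Solve y_new = y + h*f_E(t, y) + h*LAM*y_new; the implicit part is linear, so a division suffices.
        y = (y + h * f_explicit(t, y)) / (1.0 - h * LAM)
    return y


def explicit_euler(y0, h, n_steps):
    """Fully explicit Euler on both parts, for comparison."""
    y = y0
    for n in range(n_steps):
        t = n * h
        y = y + h * (LAM * y + f_explicit(t, y))
    return y


if __name__ == "__main__":
    h, n = 0.01, 100   # |h * LAM| = 10, far outside forward Euler's stability region
    print(imex_euler(1.0, h, n))      # close to cos(1)/1000, the slow quasi-steady solution
    print(explicit_euler(1.0, h, n))  # amplification factor 1 + h*LAM = -9 per step: blows up
```

With the same step size, the IMEX step tracks the slow solution while the fully explicit step diverges. This is the motivation for treating only the stiff terms implicitly.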
HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC
This paper presents HALO 1.0, an open-ended, extensible multi-agent software
framework that implements a set of proposed hardware-agnostic accelerator
orchestration (HALO) principles. HALO implements a novel compute-centric
message passing interface (C^2MPI) specification for enabling the
performance-portable execution of a hardware-agnostic host application across
heterogeneous accelerators. Experimental results from evaluating eight widely
used HPC subroutines on Intel Xeon E5-2620 CPUs, Intel Arria 10 GX FPGAs,
and NVIDIA GeForce RTX 2080 Ti GPUs show that HALO 1.0 allows a unified
control flow for host programs to run across all the computing devices with a
consistently top-ranked performance portability score, up to five orders of
magnitude higher than that of the OpenCL-based solution.
Comment: 21 pages
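The abstract reports a "performance portability score" without defining it. A widely used definition is the harmonic-mean metric of Pennycook et al. For an application a, problem p, and platform set H, PP = |H| / Σ_{i∈H} 1/e_i(a, p) if a runs on every platform in H, and 0 otherwise, where e_i is an architectural or application efficiency. The following is a minimal sketch that assumes, without confirmation from the abstract, that HALO's score follows this definition. The platform names and efficiencies are made up for illustration.

```python
def performance_portability(efficiencies):
    """Harmonic-mean performance portability over a set of platforms (assumed definition).

    efficiencies maps platform name -> efficiency in (0, 1]; use None (or 0) for a platform
    the application cannot run on, in which case the score is 0 by definition.
    """
    if not efficiencies:
        raise ValueError("at least one platform is required")
    values = list(efficiencies.values())
    if any(e is None or e <= 0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


if __name__ == "__main__":
    # Made-up efficiencies for illustration only; not results from the paper.
    print(performance_portability({"cpu": 0.8, "fpga": 0.5, "gpu": 0.4}))   # 3 / 5.75, about 0.522
    print(performance_portability({"cpu": 0.8, "fpga": None, "gpu": 0.4}))  # 0.0: one platform unsupported
    print(performance_portability({"cpu": 0.8, "fpga": 1e-5, "gpu": 0.4}))  # one poor device dominates
```

Because the harmonic mean is dominated by the least efficient platform, a single device with very low efficiency pulls the score toward zero. This is one way scores for different solutions can differ by orders of magnitude.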
CompF2: Theoretical Calculations and Simulation Topical Group Report
This report summarizes the work of the Computational Frontier topical group
on theoretical calculations and simulation for Snowmass 2021. We discuss the
challenges, potential solutions, and needs facing six diverse but related
topical areas that span the subject of theoretical calculations and simulation
in high energy physics (HEP): cosmic calculations, particle accelerator
modeling, detector simulation, event generators, perturbative calculations, and
lattice QCD (quantum chromodynamics). The challenges arise from the next
generations of HEP experiments, which will include more complex instruments,
provide larger data volumes, and perform more precise measurements.
Calculations and simulations will need to keep up with these increased
requirements. The other aspect of the challenge is the evolution of the computing
landscape away from general-purpose computing on CPUs and toward
special-purpose accelerators and coprocessors such as GPUs and FPGAs. These
newer devices can provide substantial improvements for certain categories of
algorithms, at the expense of more specialized programming and more specialized
memory and data access patterns.
Comment: Report of the Computational Frontier Topical Group on Theoretical
Calculations and Simulation for Snowmass 2021