119 research outputs found

    Acceleration of a Full-scale Industrial CFD Application with OP2

    Get PDF

    OP2-Clang : a source-to-source translator using Clang/LLVM LibTooling

    Get PDF
    Domain Specific Languages or Active Library frameworks have recently emerged as an important method for gaining performance portability, where an application can be efficiently executed on a wide range of HPC architectures without significant manual modifications. Embedded DSLs such as OP2, provides an API embedded in general purpose languages such as C/C++/Fortran. They rely on source-to-source translation and code refactorization to translate the higher-level API calls to platform specific parallel implementations. OP2 targets the solution of unstructured-mesh computations, where it can generate a variety of parallel implementations for execution on architectures such as CPUs, GPUs, distributed memory clusters and heterogeneous processors making use of a wide range of platform specific optimizations. Compiler tool-chains supporting source-to-source translation of code written in mainstream languages currently lack the capabilities to carry out such wide-ranging code transformations. Clang/LLVM’s Tooling library (LibTooling) has long been touted as having such capabilities but have only demonstrated its use in simple source refactoring tasks. In this paper we introduce OP2-Clang, a source-to-source translator based on LibTooling, for OP2’s C/C++ API, capable of generating target parallel code based on SIMD, OpenMP, CUDA and their combinations with MPI. OP2-Clang is designed to significantly reduce maintenance, particularly making it easy to be extended to generate new parallelizations and optimizations for hardware platforms. In this research, we demonstrate its capabilities including (1) the use of LibTooling’s AST matchers together with a simple strategy that use parallelization templates or skeletons to significantly reduce the complexity of generating radically different and transformed target code and (2) chart the challenges and solution to generating optimized parallelizations for OpenMP, SIMD and CUDA. Results indicate that OP2-Clang produces near-identical parallel code to that of OP2’s current source-to-source translator. We believe that the lessons learnt in OP2-Clang can be readily applied to developing other similar source-to-source translators, particularly for DSLs

    OP2-Clang : a source-to-source translator using Clang/LLVM LibTooling

    Get PDF
    Domain Specific Languages or Active Library frameworks have recently emerged as an important method for gaining performance portability, where an application can be efficiently executed on a wide range of HPC architectures without significant manual modifications. Embedded DSLs such as OP2, provides an API embedded in general purpose languages such as C/C++/Fortran. They rely on source-to-source translation and code refactorization to translate the higher-level API calls to platform specific parallel implementations. OP2 targets the solution of unstructured-mesh computations, where it can generate a variety of parallel implementations for execution on architectures such as CPUs, GPUs, distributed memory clusters and heterogeneous processors making use of a wide range of platform specific optimizations. Compiler tool-chains supporting source-to-source translation of code written in mainstream languages currently lack the capabilities to carry out such wide-ranging code transformations. Clang/LLVM’s Tooling library (LibTooling) has long been touted as having such capabilities but have only demonstrated its use in simple source refactoring tasks. In this paper we introduce OP2-Clang, a source-to-source translator based on LibTooling, for OP2’s C/C++ API, capable of generating target parallel code based on SIMD, OpenMP, CUDA and their combinations with MPI. OP2-Clang is designed to significantly reduce maintenance, particularly making it easy to be extended to generate new parallelizations and optimizations for hardware platforms. In this research, we demonstrate its capabilities including (1) the use of LibTooling’s AST matchers together with a simple strategy that use parallelization templates or skeletons to significantly reduce the complexity of generating radically different and transformed target code and (2) chart the challenges and solution to generating optimized parallelizations for OpenMP, SIMD and CUDA. Results indicate that OP2-Clang produces near-identical parallel code to that of OP2’s current source-to-source translator. We believe that the lessons learnt in OP2-Clang can be readily applied to developing other similar source-to-source translators, particularly for DSLs

    Under the hood of SYCL - an initial performance analysis with an unstructured-mesh CFD application

    Get PDF
    As the computing hardware landscape gets more diverse, and the complexity of hardware grows, the need for a general purpose parallel programming model capable of developing (performance) portable codes have become highly attractive. Intel’s OneAPI suite, which is based on the SYCL standard aims to fill this gap using a modern C++ API. In this paper, we use SYCL to parallelize MGCFD, an unstructured-mesh computational fluid dynamics (CFD) code, to explore current performance of SYCL. The code is benchmarked on several modern processor systems from Intel (including CPUs and the latest Xe LP GPU), AMD, ARM and Nvidia, making use of a variety of current SYCL compilers, with a particular focus on OneAPI and how it maps to Intel’s CPU and GPU architectures. We compare performance with other parallelisations available in OP2, including SIMD, OpenMP, MPI and CUDA. The results are mixed; the performance of this class of applications, when parallelized with SYCL, highly depends on the target architecture and the compiler, but in many cases comes close to the performance of currently prevalent parallel programming models. However, it still requires different parallelization strategies or code-paths be written for different hardware to obtain the best performanc

    An Unstructured CFD Mini-Application for the Performance Prediction of a Production CFD Code

    Get PDF
    Maintaining the performance of large scientific codes is a difficult task. To aid in this task, a number of mini-applications have been developed that are more tractable to analyze than large-scale production codes while retaining the performance characteristics of them. These “mini-apps” also enable faster hardware evaluation and, for sensitive commercial codes, allow evaluation of code and system changes outside of access approval processes. In this paper, we develop MG-CFD, a mini-application that represents a geometric multigrid, unstructured computational fluid dynamics (CFD) code, designed to exhibit similar performance characteristics without sharing commercially sensitive code. We detail our experiences of developing this application using guidelines detailed in existing research and contributing further to these. Our application is validated against the inviscid flux routine of HYDRA, a CFD code developed by Rolls-Royce plc for turbomachinery design. This paper (1) documents the development of MG-CFD, (2) introduces an associated performance model with which it is possible to assess the performance of HYDRA on new HPC architectures, and (3) demonstrates that it is possible to use MG-CFD and the performance models to predict the performance of HYDRA with a mean error of 9.2% for strong-scaling studies
    • …
    corecore