
    Automatic Code Generation for Massively Parallel Applications in Computational Fluid Dynamics

    Solving partial differential equations (PDEs) is a fundamental challenge in many application domains in industry and academia alike. With increasingly large problems, efficient and highly scalable implementations become more and more crucial. Today, facing this challenge is more difficult than ever due to the increasingly heterogeneous hardware landscape. One promising approach is developing domain‐specific languages (DSLs) for a set of applications. Using code generation techniques then allows targeting a range of hardware platforms while concurrently applying domain‐specific optimizations in an automated fashion. The present work aims to further the state of the art in this field. As domain, we choose PDE solvers and, in particular, those from the group of geometric multigrid methods. To avoid too broad a focus, we restrict ourselves to methods working on structured and patch‐structured grids. We face the challenge of handling a domain as complex as ours, while providing different abstractions for diverse user groups, by splitting our external DSL ExaSlang into multiple layers, each specifying different aspects of the final application. Layer 1 is designed to resemble LaTeX and allows inputting continuous equations and functions. Their discretization is expressed on layer 2. It is complemented by algorithmic components which can be implemented in a Matlab‐like syntax on layer 3. All information provided to this point is summarized on layer 4, enriched with particulars about data structures and the employed parallelization. Additionally, we support automated progression between the different layers. All ExaSlang input is processed by our jointly developed Scala code generation framework to ultimately emit C++ code. We particularly focus on how to generate applications parallelized with, e.g., MPI and OpenMP that are able to run on workstations and large‐scale clusters alike.
    We showcase the applicability of our approach by implementing simple test problems, like Poisson’s equation, as well as relevant applications from the field of computational fluid dynamics (CFD). In particular, we implement scalable solvers for the Stokes, Navier‐Stokes and shallow water equations (SWE) discretized using finite differences (FD) and finite volumes (FV). For the case of Navier‐Stokes, we also extend our implementation towards non‐uniform grids, thereby enabling static mesh refinement, and advanced effects such as the simulated fluid being non‐Newtonian and non‐isothermal.
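The solvers described above are generated rather than handwritten, but the algorithm class they belong to can be sketched directly. Below is a minimal geometric multigrid V-cycle for the 1D Poisson problem on a structured grid, using weighted Jacobi smoothing, full-weighting restriction, and linear prolongation. This is an illustrative sketch of the method family, not the ExaStencils-generated code; grid sizes and smoothing parameters are assumptions.

```python
import numpy as np

def smooth(u, f, h, iters=2):
    """Weighted Jacobi smoothing for -u'' = f with zero Dirichlet boundaries."""
    omega = 2.0 / 3.0
    for _ in range(iters):
        u[1:-1] += omega * (0.5 * (u[:-2] + u[2:] + h * h * f[1:-1]) - u[1:-1])

def residual(u, f, h):
    """Residual r = f - A u for the standard 3-point Laplacian."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def restrict(r):
    """Full-weighting restriction to the next coarser grid."""
    n_c = (r.size + 1) // 2
    r_c = np.zeros(n_c)
    for i in range(1, n_c - 1):
        r_c[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return r_c

def prolong(e_c, n_f):
    """Linear interpolation of the coarse-grid correction to the fine grid."""
    e = np.zeros(n_f)
    e[::2] = e_c
    e[1::2] = 0.5 * (e_c[:-1] + e_c[1:])
    return e

def v_cycle(u, f, h):
    smooth(u, f, h)                      # pre-smoothing
    if u.size > 3:
        r_c = restrict(residual(u, f, h))
        e_c = np.zeros_like(r_c)
        v_cycle(e_c, r_c, 2.0 * h)       # recurse on the error equation
        u += prolong(e_c, u.size)
    smooth(u, f, h)                      # post-smoothing
    return u

# Solve -u'' = pi^2 sin(pi x) on (0, 1); exact solution is u = sin(pi x).
n = 65
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)
u = np.zeros(n)
for _ in range(25):
    v_cycle(u, f, h)
```

Each V-cycle reduces the algebraic error by a grid-size-independent factor, which is what makes the method class attractive for the large-scale problems targeted here.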

    p-adaptive discontinuous Galerkin method for the shallow water equations on heterogeneous computing architectures

    Heterogeneous computing and exploiting integrated CPU-GPU architectures has become a clear current trend since the flattening of Moore's Law. In this work, we propose a numerical and algorithmic re-design of a p-adaptive quadrature-free discontinuous Galerkin method (DG) for the shallow water equations (SWE). Our new approach separates the computations of the non-adaptive (lower-order) and adaptive (higher-order) parts of the discretization from each other. Thereby, we can overlap computations of the lower-order and the higher-order DG solution components. Furthermore, we investigate execution times of main computational kernels and use automatic code generation to optimize their distribution between the CPU and GPU. Several setups, including a prototype of a tsunami simulation in a tide-driven flow scenario, are investigated, and the results show that significant performance improvements can be achieved in suitable setups.
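The key algorithmic idea, separating the non-adaptive lower-order part of the discretization from the adaptive higher-order part so the two can be computed concurrently, can be illustrated with a toy sketch. The arrays, the adaptivity mask, and the damping factor below are made-up illustrations, not the paper's actual DG operators; in the paper's setting one of the two parts would run on the GPU, whereas here two host threads merely demonstrate the decoupling.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Toy data: per-element DG coefficients; row 0 is the lowest-order (mean) mode,
# rows 1.. are higher-order modes that only "adaptive" elements carry.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((4, 1000))   # 4 modes, 1000 elements
adaptive = rng.random(1000) < 0.3         # ~30% of elements flagged higher-order

def lower_order_update(c):
    # Stand-in for the non-adaptive part of the operator (touches all elements).
    return 0.9 * c[0]

def higher_order_update(c, mask):
    # Stand-in for the adaptive part (touches only flagged elements).
    out = np.zeros_like(c[1:])
    out[:, mask] = 0.9 * c[1:, mask]
    return out

# Overlap the two independent parts; because they read disjoint data and write
# disjoint results, they can proceed concurrently and be combined afterwards.
with ThreadPoolExecutor(max_workers=2) as pool:
    lo = pool.submit(lower_order_update, coeffs)
    hi = pool.submit(higher_order_update, coeffs, adaptive)
    coeffs = np.vstack([lo.result()[None, :], hi.result()])
```

The separation pays off exactly because the lower-order part has uniform, regular work (well suited to one device) while the adaptive part is sparse and irregular.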

    A Framework for Interactive Physical Simulations on Remote HPC Clusters

    In this work, we introduce VIPER, a framework for visualization and interactivity for physics engines in real time. It is able to execute various physical simulations, visualize the simulation results in real-time and offer computational steering. Especially interesting in this context are simulations running on remotely accessible HPC clusters. As an example, we present a particulate flow simulation consisting of a coupled rigid body and CFD simulation, the chosen visualization strategy and steering possibilities. Additionally, performance evaluations and a performance prediction model concerning the update rate for remote simulations in the context of the VIPER framework are given.
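An update-rate prediction of the kind mentioned typically combines per-frame simulation cost with network transfer cost. The function below is an illustrative latency/bandwidth model under assumed parameters, not VIPER's actual prediction model.

```python
def update_rate(t_step_s, steps_per_frame, frame_bytes,
                bandwidth_bps, latency_s, pipelined=True):
    """Illustrative estimate of the achievable update rate (frames/s)
    for a remotely running, interactively visualized simulation."""
    t_sim = steps_per_frame * t_step_s            # time to advance one frame
    t_net = latency_s + frame_bytes / bandwidth_bps  # time to ship the frame
    # If transfer overlaps the next frame's computation, the slower of the
    # two stages dominates; otherwise the costs add up.
    t_frame = max(t_sim, t_net) if pipelined else t_sim + t_net
    return 1.0 / t_frame

# Assumed numbers: 5 ms per step, 4 steps per frame, 8 MB frames over
# a 1 GB/s link with 1 ms latency.
rate = update_rate(0.005, 4, 8e6, 1e9, 0.001)
```

Such a model makes explicit when a remote setup is compute-bound versus network-bound, which is the practical question for interactive steering.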

    Automatic Generation of Massively Parallel Codes from ExaSlang

    Domain-specific languages (DSLs) have the potential to provide an intuitive interface for specifying problems and solutions for domain experts. Based on this, code generation frameworks can produce compilable source code. However, apart from optimizing execution performance, parallelization is key for pushing the limits in problem size and an essential ingredient for exascale performance. We discuss necessary concepts for the introduction of such capabilities in code generators. In particular, those for partitioning the problem to be solved and accessing the partitioned data are elaborated. Furthermore, possible approaches to expose parallelism to users through a given DSL are discussed. Moreover, we present the implementation of these concepts in the ExaStencils framework. In its scope, a code generation framework for highly optimized and massively parallel geometric multigrid solvers is developed. It uses specifications from its multi-layered external DSL ExaSlang as input. Based on a general version for generating parallel code, we develop and implement widely applicable extensions and optimizations. Finally, a performance study of generated applications is conducted on the JuQueen supercomputer.
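One of the concepts discussed, partitioning the problem across processes, can be sketched as a block decomposition that maps an MPI-style rank to the index range it owns on a structured grid. The function below is an illustrative sketch under assumed conventions (row-major rank ordering, 2D grid), not ExaStencils' actual partitioning code.

```python
def block_partition(global_n, blocks, rank):
    """Map a rank to its axis-aligned block of a structured 2D grid.

    global_n: (nx, ny) global grid points; blocks: (bx, by) process grid.
    Returns ((x0, x1), (y0, y1)) half-open index ranges owned by `rank`.
    """
    bx, by = blocks
    ix, iy = rank % bx, rank // bx          # rank -> 2D block coordinates
    ranges = []
    for n, b, i in ((global_n[0], bx, ix), (global_n[1], by, iy)):
        base, rem = divmod(n, b)
        start = i * base + min(i, rem)      # spread the remainder over low blocks
        ranges.append((start, start + base + (1 if i < rem else 0)))
    return tuple(ranges)

# Example: a 10x10 grid split over a 4x2 process grid.
owned = block_partition((10, 10), (4, 2), rank=1)
```

In a generated solver, each rank would allocate its block plus a layer of ghost cells and exchange those layers with its neighbors after every smoothing sweep; the decomposition above is the piece that decides who owns what.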

    Towards Virtual Hardware Prototyping for Generated Geometric Multigrid Solvers

    Many applications in scientific computing require solving one or more partial differential equations (PDEs). For this task, solvers from the class of multigrid methods are known to be amongst the most efficient. An optimal implementation, however, is highly dependent on the specific problem as well as the target hardware. As energy efficiency is a major concern in today's computing centers, energy-efficient platforms such as ARM-based clusters are actively researched. In this work, we present a domain-specific approach, starting with the problem formulation in a domain-specific language (DSL), down to code generation targeting a variety of systems including embedded architectures. Furthermore, we present an approach to simulate embedded architectures to achieve an optimal hardware/software co-design, i.e., an optimal composition of software and hardware modifications. In this context, we use a virtual environment (OVP) that enables the adaptation of multicore models and their simulation in an efficient way. Our approach shows that execution time prediction for ARM-based platforms is feasible but has to be enhanced with more detailed cache and memory models. We substantiate our claims by providing results for the performance prediction of geometric multigrid solvers generated by the ExaStencils framework.
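The observation that prediction must be enhanced with cache and memory models reflects the fact that stencil codes are usually bandwidth-bound, so instruction-level simulation alone misses the dominant cost. A roofline-style lower bound for one stencil sweep, stated here as an illustrative assumption rather than the paper's model, can be written as:

```python
def stencil_sweep_time(n_cells, flops_per_cell, bytes_per_cell,
                       peak_flops, mem_bandwidth):
    """Roofline-style lower bound on the runtime of one stencil sweep:
    whichever resource (compute or memory traffic) is slower dominates."""
    t_compute = n_cells * flops_per_cell / peak_flops
    t_memory = n_cells * bytes_per_cell / mem_bandwidth
    return max(t_compute, t_memory)

# Assumed numbers: 1M cells, 10 flops and 16 bytes of traffic per cell,
# 1 GFlop/s peak and 1 GB/s memory bandwidth -> memory-bound.
t = stencil_sweep_time(1e6, 10, 16, 1e9, 1e9)
```

A bound of this shape explains why cache behavior matters so much for the prediction: caches change the effective `bytes_per_cell`, and with it which branch of the `max` applies.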