Automatische Codegenerierung für Massiv Parallele Applikationen in der Numerischen Strömungsmechanik (Automatic Code Generation for Massively Parallel Applications in Computational Fluid Dynamics)
Solving partial differential equations (PDEs) is a fundamental challenge in many application domains in industry and academia alike. With increasingly large problems, efficient and highly scalable implementations become more and more crucial. Today, facing this challenge is more difficult than ever due to the increasingly heterogeneous hardware landscape. One promising approach is developing domain-specific languages (DSLs) for a set of applications. Using code generation techniques then allows targeting a range of hardware platforms while concurrently applying domain-specific optimizations in an automated fashion. The present work aims to further the state of the art in this field. As our domain, we choose PDE solvers and, in particular, those from the group of geometric multigrid methods. To avoid too broad a focus, we restrict ourselves to methods working on structured and patch-structured grids.
We face the challenge of handling a domain as complex as ours, while providing different abstractions for diverse user groups, by splitting our external DSL ExaSlang into multiple layers, each specifying different aspects of the final application. Layer 1 is designed to resemble LaTeX and allows inputting continuous equations and functions. Their discretization is expressed on layer 2. It is complemented by algorithmic components which can be implemented in a Matlab-like syntax on layer 3. All information provided to this point is summarized on layer 4, enriched with particulars about data structures and the employed parallelization. Additionally, we support automated progression between the different layers. All ExaSlang input is processed by our jointly developed Scala code generation framework to ultimately emit C++ code. We particularly focus on how to generate applications parallelized with, e.g., MPI and OpenMP that are able to run on workstations and large-scale clusters alike.
We showcase the applicability of our approach by implementing simple test problems, like Poisson's equation, as well as relevant applications from the field of computational fluid dynamics (CFD). In particular, we implement scalable solvers for the Stokes, Navier-Stokes and shallow water equations (SWE) discretized using finite differences (FD) and finite volumes (FV). For the case of Navier-Stokes, we also extend our implementation towards non-uniform grids, thereby enabling static mesh refinement, and advanced effects such as the simulated fluid being non-Newtonian and non-isothermal.
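As a rough illustration of the solver class named in this abstract, the following is a minimal hand-written sketch of a geometric multigrid V-cycle for the 1D Poisson problem -u'' = f on (0, 1) with homogeneous Dirichlet boundaries, discretized with finite differences. It is not ExaSlang input and not output of the code generator; the function names, the weighted Jacobi smoother, the full-weighting restriction and the test right-hand side are all assumptions chosen for brevity, and the grid size is assumed to be a power of two.

```python
import math


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f using the 3-point stencil."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def residual(u, f, h):
    """r = f - A u on interior points; boundary entries stay zero."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full weighting from n + 1 fine points to n / 2 + 1 coarse points."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec, n_fine):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    e = [0.0] * (n_fine + 1)
    for i in range(len(ec)):
        e[2 * i] = ec[i]
    for i in range(1, n_fine, 2):
        e[i] = 0.5 * (e[i - 1] + e[i + 1])
    return e


def v_cycle(u, f, h):
    """One V-cycle; the number of intervals must be a power of two."""
    n = len(u) - 1
    if n == 2:
        # Coarsest grid has a single unknown, so solve it exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h)                                   # pre-smoothing
    rc = restrict(residual(u, f, h))                  # coarse right-hand side
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)        # coarse-grid correction
    e = prolong(ec, n)
    for i in range(1, n):
        u[i] += e[i]
    return smooth(u, f, h)                            # post-smoothing


def solve_poisson(n=64, cycles=10):
    """Solve with f = pi^2 sin(pi x); the exact solution is sin(pi x)."""
    h = 1.0 / n
    x = [i * h for i in range(n + 1)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * (n + 1)
    for _ in range(cycles):
        v_cycle(u, f, h)
    err = max(abs(u[i] - math.sin(math.pi * x[i])) for i in range(n + 1))
    res = max(abs(v) for v in residual(u, f, h))
    return err, res
```

Calling `solve_poisson()` returns the maximum error against the analytic solution and the maximum residual after the requested number of cycles. The error should settle at the level of the discretization error, while the residual keeps shrinking with each additional cycle.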
p-adaptive discontinuous Galerkin method for the shallow water equations on heterogeneous computing architectures
Heterogeneous computing and the exploitation of integrated CPU-GPU architectures have become a clear trend since the flattening of Moore's Law. In this work, we propose a numerical and algorithmic re-design of a p-adaptive quadrature-free discontinuous Galerkin method (DG) for the shallow water equations (SWE). Our new approach separates the computations of the non-adaptive (lower-order) and adaptive (higher-order) parts of the discretization from each other. Thereby, we can overlap computations of the lower-order and the higher-order DG solution components. Furthermore, we investigate execution times of main computational kernels and use automatic code generation to optimize their distribution between the CPU and GPU. Several setups, including a prototype of a tsunami simulation in a tide-driven flow scenario, are investigated, and the results show that significant performance improvements can be achieved in suitable setups.
A Framework for Interactive Physical Simulations on Remote HPC Clusters
In this work, we introduce VIPER, a framework for visualization and interactivity for physics engines in real time. It is able to execute various physical simulations, visualize the simulation results in real time, and offer computational steering. Simulations running on remotely accessible HPC clusters are especially interesting in this context. As an example, we present a particulate flow simulation consisting of a coupled rigid body and CFD simulation, the chosen visualization strategy, and the steering possibilities. Additionally, performance evaluations and a performance prediction model concerning the update rate for remote simulations in the context of the VIPER framework are given.
Automatic Generation of Massively Parallel Codes from ExaSlang
Domain-specific languages (DSLs) have the potential to provide an intuitive interface for specifying problems and solutions for domain experts. Based on this, code generation frameworks can produce compilable source code. However, apart from optimizing execution performance, parallelization is key for pushing the limits in problem size and an essential ingredient for exascale performance. We discuss necessary concepts for the introduction of such capabilities in code generators. In particular, those for partitioning the problem to be solved and accessing the partitioned data are elaborated. Furthermore, possible approaches to expose parallelism to users through a given DSL are discussed. Moreover, we present the implementation of these concepts in the ExaStencils framework. In its scope, a code generation framework for highly optimized and massively parallel geometric multigrid solvers is developed. It uses specifications from its multi-layered external DSL ExaSlang as input. Based on a general version for generating parallel code, we develop and implement widely applicable extensions and optimizations. Finally, a performance study of generated applications is conducted on the JuQueen supercomputer.
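The idea of partitioning a problem and accessing the partitioned data can be sketched with a 1D block decomposition that uses one ghost cell per side. The example below is a single-process stand-in: the list copies in `exchange_ghosts` take the place of the messages that an MPI-parallel code would send between neighboring blocks. All names and the simple averaging stencil are hypothetical and are not taken from ExaStencils; the sketch assumes the number of blocks does not exceed the number of values.

```python
def partition(values, parts):
    """Split values into contiguous blocks, each padded with two ghost cells."""
    base, extra = divmod(len(values), parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append([0.0] + values[start:start + size] + [0.0])
        start += size
    return blocks


def exchange_ghosts(blocks, left_bc=0.0, right_bc=0.0):
    """Fill ghost cells from neighboring blocks (stand-in for message passing)."""
    for p, block in enumerate(blocks):
        # Left ghost: last owned cell of the left neighbor, or the boundary value.
        block[0] = blocks[p - 1][-2] if p > 0 else left_bc
        # Right ghost: first owned cell of the right neighbor, or the boundary value.
        block[-1] = blocks[p + 1][1] if p < len(blocks) - 1 else right_bc


def jacobi_sweep_local(block):
    """Apply the averaging stencil to the owned cells of one block."""
    inner = [0.5 * (block[i - 1] + block[i + 1]) for i in range(1, len(block) - 1)]
    return [block[0]] + inner + [block[-1]]


def gather(blocks):
    """Concatenate the owned cells of all blocks, dropping ghost cells."""
    return [v for block in blocks for v in block[1:-1]]


def run_partitioned(values, parts, sweeps):
    """Run several sweeps on partitioned data with a ghost exchange before each."""
    blocks = partition(values, parts)
    for _ in range(sweeps):
        exchange_ghosts(blocks)
        blocks = [jacobi_sweep_local(b) for b in blocks]
    return gather(blocks)


def jacobi_sweep_global(values, left_bc=0.0, right_bc=0.0):
    """Reference sweep on the unpartitioned data."""
    padded = [left_bc] + values + [right_bc]
    return [0.5 * (padded[i - 1] + padded[i + 1]) for i in range(1, len(padded) - 1)]
```

Because every block sees up-to-date neighbor values in its ghost cells before each sweep, `run_partitioned` should reproduce the result of repeatedly applying `jacobi_sweep_global` to the undivided data, regardless of how many blocks are used.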
Towards Virtual Hardware Prototyping for Generated Geometric Multigrid Solvers
Many applications in scientific computing require solving one or more partial differential equations (PDEs). For this task, solvers from the class of multigrid methods are known to be amongst the most efficient. An optimal implementation, however, is highly dependent on the specific problem as well as the target hardware. As energy efficiency is a big topic in today's computing centers, energy-efficient platforms such as ARM-based clusters are actively researched. In this work, we present a domain-specific approach, starting with the problem formulation in a domain-specific language (DSL), down to code generation targeting a variety of systems including embedded architectures. Furthermore, we present an approach to simulate embedded architectures to achieve an optimal hardware/software co-design, i.e., an optimal composition of software and hardware modifications. In this context, we use a virtual environment (OVP) that enables the adaptation of multicore models and their simulation in an efficient way. Our approach shows that execution time prediction for ARM-based platforms is possible and feasible but has to be enhanced with more detailed cache and memory models. We substantiate our claims by providing results for the performance prediction of geometric multigrid solvers generated by the ExaStencils framework.