Performance Enhancements for the Lattice-Boltzmann Solver in the LAVA Framework
Performance enhancements in NASA's recently developed Lattice Boltzmann solver within the Launch Ascent and Vehicle Aerodynamics (LAVA) framework are presented. Two key algorithmic developments are highlighted. A coarse-fine interface treatment that discretely conserves mass and momentum has been implemented and successfully verified and validated. Code optimizations targeting improved serial and parallel performance are also presented. For a simple turbulent Taylor-Green Vortex problem, we were able to demonstrate a 2.3 times speedup over the baseline code for a single Skylake-SP node containing 40 physical cores, and a 2.14 times speedup for 64 nodes containing 2560 physical cores. In addition, we were able to show that the optimizations enabled us to scale the code almost perfectly to 20480 physical cores where, including ghost cells, the problem size was 10 billion cells.
A Scalable and Modular Software Architecture for Finite Elements on Hierarchical Hybrid Grids
In this article, a new generic higher-order finite-element framework for
massively parallel simulations is presented. The modular software architecture
is carefully designed to exploit the resources of modern and future
supercomputers. Combining an unstructured topology with structured grid
refinement facilitates high geometric adaptability and matrix-free multigrid
implementations with excellent performance. Different abstraction levels and
fully distributed data structures additionally ensure high flexibility,
extensibility, and scalability. The software concepts support sophisticated
load balancing and flexibly combining finite element spaces. Example scenarios
with coupled systems of PDEs show the applicability of the concepts to
performing geophysical simulations.
Comment: Preprint of an article submitted to International Journal of
Parallel, Emergent and Distributed Systems (Taylor & Francis)
Dynamic Load Balancing Techniques for Particulate Flow Simulations
Parallel multiphysics simulations often suffer from load imbalances
originating from the applied coupling of algorithms with spatially and
temporally varying workloads. It is thus desirable to minimize these imbalances
to reduce the time to solution and to better utilize the available hardware
resources. Taking particulate flows as an illustrating example application, we
present and evaluate load balancing techniques that tackle this challenging
task. This involves a load estimation step in which the currently generated
workload is predicted. We describe in detail how such a workload estimator can
be developed. In a second step, load distribution strategies like space-filling
curves or graph partitioning are applied to dynamically distribute the load
among the available processes. To compare and analyze their performance, we
apply these techniques to a benchmark scenario and observe a reduction of the
load imbalances by almost a factor of four. This results in a decrease of the
overall runtime by 14% for space-filling curves.
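
The second step described in this abstract, distributing estimated load along a space-filling curve, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a 2D grid of blocks, a Morton (Z-order) curve, and per-block loads that are simply given as input in place of a fitted workload estimator. The names `morton_key`, `partition_along_curve`, and `imbalance` are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = Tuple[Tuple[int, int], float]  # ((x, y) block index, estimated load)


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y to obtain the Z-order (Morton) index."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def partition_along_curve(blocks: Sequence[Block], num_procs: int) -> List[int]:
    """Return the owning process rank for each block.

    Blocks are sorted along the Morton curve, and the curve is cut into
    num_procs contiguous pieces of roughly equal total load.
    """
    total = sum(load for _, load in blocks)
    if total <= 0 or num_procs < 1:
        raise ValueError("need positive total load and at least one process")

    order = sorted(range(len(blocks)), key=lambda i: morton_key(*blocks[i][0]))
    owner = [0] * len(blocks)
    running = 0.0
    for i in order:
        load = blocks[i][1]
        # Place the block by the midpoint of its interval on the cumulative-load axis.
        midpoint = running + 0.5 * load
        owner[i] = min(int(midpoint * num_procs / total), num_procs - 1)
        running += load
    return owner


def imbalance(blocks: Sequence[Block], owner: Sequence[int], num_procs: int) -> float:
    """Maximum process load divided by the average process load (1.0 is perfect)."""
    per_proc = [0.0] * num_procs
    for (_, load), rank in zip(blocks, owner):
        per_proc[rank] += load
    return max(per_proc) / (sum(per_proc) / num_procs)


# Usage: a 4x4 grid of blocks with uniform load, distributed over 4 processes.
grid = [((x, y), 1.0) for y in range(4) for x in range(4)]
ranks = partition_along_curve(grid, 4)
ratio = imbalance(grid, ranks, 4)
```

Cutting a sorted curve into contiguous pieces keeps spatially close blocks on the same process, which is the usual motivation for space-filling curves; graph partitioning, the other strategy named in the abstract, is not shown here.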