Data-integrated methods for performance improvement of massively parallel coupled simulations
This thesis presents data-integrated methods to improve the computational performance of partitioned multi-physics simulations, particularly on highly parallel systems. Partitioned methods allow reusing available single-physics solvers and their well-validated numerical methods for multi-physics simulations by decomposing the domain into smaller sub-domains. Each sub-domain is solved by a separate solver, and an external library couples the solvers. This significantly reduces the software development cost and enhances flexibility, but it introduces new challenges that must be addressed carefully. These challenges include, but are not limited to, efficient data communication between sub-domains, data mapping between non-matching meshes, inter-solver load balancing, and equation coupling.
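To make the partitioned approach concrete, the sketch below shows the time loop of one coupled participant, written against the preCICE v3 Python bindings. The participant, mesh, and data names ("FluidSolver", "Fluid-Mesh", "Force", "Displacement"), the configuration file, and the two-vertex interface are illustrative assumptions, and the single-physics solver update is elided; this is a minimal sketch of the pattern, not code from the thesis.

    import numpy as np
    import precice

    # One participant of a partitioned simulation; its peer solver runs as a
    # separate executable and is coupled through the same XML configuration.
    participant = precice.Participant("FluidSolver", "precice-config.xml", 0, 1)

    coords = np.array([[0.0, 0.0], [1.0, 0.0]])  # toy coupling-interface mesh
    vertex_ids = participant.set_mesh_vertices("Fluid-Mesh", coords)
    participant.initialize()

    while participant.is_coupling_ongoing():
        dt = participant.get_max_time_step_size()
        forces = participant.read_data("Fluid-Mesh", "Force", vertex_ids, dt)
        # ... advance the single-physics solver by dt using 'forces' ...
        displacements = np.zeros_like(coords)  # placeholder solver output
        participant.write_data("Fluid-Mesh", "Displacement", vertex_ids,
                               displacements)
        participant.advance(dt)

    participant.finalize()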
In the current work, inter-solver communication is improved by introducing a two-level communication initialization scheme to the coupling library preCICE. The new method significantly speeds up the initialization and removes memory bottlenecks of the previous implementation. In addition, a data-driven inter-solver load balancing method is developed to efficiently distribute the available computational resources among the coupled single-physics solvers. This method employs both regression models and deep neural networks (DNNs) to model the performance of the solvers, and it derives and solves an optimization problem to distribute the available CPU and GPU cores among the solvers. To accelerate the equation coupling between strongly coupled solvers, a hybrid framework is developed that integrates DNNs and classical solvers. The DNN computes a solution estimate for each time step, which the classical solvers use as a first guess when computing the final solution. To preserve the DNN's effectiveness during the simulation, a dynamic re-training strategy is introduced that updates the DNN's weights on the fly. The cheap but accurate solution estimate provided by the DNN surrogate significantly reduces the number of subsequent classical iterations required for convergence. Finally, a highly scalable simulation environment is introduced for fluid-structure interaction problems. The environment consists of highly parallel numerical solvers and an efficient, scalable coupling library, and it can efficiently exploit both CPU-only and hybrid CPU-GPU machines. Numerical performance investigations using a complex test case demonstrate a very high parallel efficiency on a large number of CPUs and a significant speed-up due to GPU acceleration.
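The hybrid surrogate idea can be illustrated on a toy fixed-point problem. In the sketch below, a small least-squares model stands in for the DNN: it predicts the solution of each new time step from recent history, the prediction seeds the classical (here, plain fixed-point) iteration, and the model is re-fit on the fly after every step, mirroring the dynamic re-training strategy. The toy problem and all names are assumptions for illustration, not the thesis implementation.

    import numpy as np

    def fixed_point_solve(g, x0, tol=1e-10, max_iter=500):
        """Classical solver stand-in; returns solution and iteration count."""
        x = x0
        for k in range(1, max_iter + 1):
            x_new = g(x)
            if np.linalg.norm(x_new - x) < tol:
                return x_new, k
            x = x_new
        return x, max_iter

    class LinearSurrogate:
        """Toy stand-in for the DNN: least-squares map from time to solution."""
        def __init__(self, window=20):
            self.times, self.solutions, self.window = [], [], window
            self.coef = None

        def observe(self, t, y):
            self.times.append(t)
            self.solutions.append(y)
            # dynamic re-training: re-fit on a sliding window after every step
            ts = np.array(self.times[-self.window:])
            A = np.column_stack([ts, np.ones_like(ts)])
            Y = np.array(self.solutions[-self.window:])
            self.coef, *_ = np.linalg.lstsq(A, Y, rcond=None)

        def predict(self, t):
            return None if self.coef is None else np.array([t, 1.0]) @ self.coef

    surrogate, x_prev = LinearSurrogate(), np.zeros(3)
    for step in range(1, 50):
        t = 0.01 * step
        g = lambda x, t=t: 0.5 * x + np.array([np.sin(t), np.cos(t), t])
        guess = surrogate.predict(t)
        x0 = guess if guess is not None else x_prev   # surrogate first guess
        x_prev, iters = fixed_point_solve(g, x0)
        surrogate.observe(t, x_prev)
        print(f"step {step:2d}: {iters} classical iterations")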
Efficient and scalable initialization of partitioned coupled simulations with preCICE
preCICE is an open-source library that provides comprehensive functionality for coupling independent parallelized solver codes into a partitioned multi-physics, multi-code simulation environment. For data communication between the respective executables at runtime, it implements a peer-to-peer concept, which renders the computational cost of the coupling per time step negligible compared to the typical run time of the coupled codes. To initialize the peer-to-peer coupling, the mesh partitions of the respective solvers need to be compared to determine the point-to-point communication channels between the processes of both codes. This initialization effort can become a limiting factor if we either reach memory limits or have to re-initialize communication relations in every time step. In this contribution, we remove two remaining bottlenecks: (i) we base the neighborhood search between mesh entities of the two solvers on a tree data structure to avoid quadratic complexity, and (ii) we replace the sequential gather-scatter comparison of both mesh partitions by a two-level approach that first compares bounding boxes around mesh partitions sequentially, then establishes pairwise communication between processes of the two solvers, and finally compares mesh partitions between connected processes in parallel. We show that the two-level initialization method is five times faster than the old one-level scheme on 24,567 CPU cores using a mesh with 628,898 vertices. In addition, the two-level scheme can handle much larger computational meshes, since the central mesh communication of the one-level scheme is replaced with a fully point-to-point mesh communication scheme.
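The two levels can be illustrated with a toy sketch (this is not preCICE's internal code): rank-level bounding boxes first prune which partition pairs can interact at all, and only the paired partitions then run a tree-based neighborhood search, avoiding the quadratic all-to-all vertex comparison. The partitions, their sizes, and the use of SciPy's k-d tree are assumptions for illustration.

    import numpy as np
    from scipy.spatial import cKDTree

    def bounding_box(points):
        return points.min(axis=0), points.max(axis=0)

    def boxes_overlap(a, b, tol=1e-12):
        (amin, amax), (bmin, bmax) = a, b
        return np.all(amin <= bmax + tol) and np.all(bmin <= amax + tol)

    rng = np.random.default_rng(0)
    # toy "partitions": vertex blocks owned by the ranks of solvers A and B
    parts_a = [rng.uniform(i, i + 1.2, size=(500, 2)) for i in range(4)]
    parts_b = [rng.uniform(i + 0.5, i + 1.7, size=(500, 2)) for i in range(4)]

    # level 1: cheap bounding-box comparison selects candidate rank pairs
    pairs = [(i, j)
             for i, pa in enumerate(parts_a)
             for j, pb in enumerate(parts_b)
             if boxes_overlap(bounding_box(pa), bounding_box(pb))]

    # level 2: tree-based neighborhood search only between paired partitions
    for i, j in pairs:
        dists, _ = cKDTree(parts_b[j]).query(parts_a[i])
        print(f"ranks A{i}/B{j}: max nearest-neighbor distance {dists.max():.3f}")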
A GPU Accelerated Framework for Partitioned Solution of Fluid-Structure Interaction Problems
We present a GPU-accelerated solver for the partitioned solution of fluid-structure interaction (FSI) problems. Independent, scalable fluid and structure solvers are coupled by a library that handles the inter-code data communication, mapping, and equation coupling. A coupling strategy is incorporated that allows accelerating expensive components of the coupled framework by offloading them to GPUs. To demonstrate the efficiency of the proposed coupling strategy in conjunction with the offloading scheme, we present a numerical performance analysis for a complex test case in the field of biomedical engineering. The numerical experiments demonstrate an excellent speed-up in the accelerated kernels (up to 133 times), which results in 6 to 8 times faster overall simulations. In addition, we observed a very good reduction in total simulation time as the number of compute nodes was increased up to 8 (the complete machine capacity).
We thank the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) for supporting this work by funding EXC 2075 - 390740016 under Germany's Excellence Strategy, and we acknowledge the support of the Stuttgart Center for Simulation Science (SimTech). This work was also financially supported by the priority program 1648 - Software for Exascale Computing 214 (ExaFSA - Exascale Simulation of Fluid-Structure-Acoustics Interactions) of the DFG, and by the Ministerio de Economía y Competitividad, Secretaría de Estado de Investigación, Desarrollo e Innovación, Spain (ENE2017-88697-R). The performance measurements were carried out on the Vulcan cluster at the High-Performance Computing Center Stuttgart (HLRS). The authors wish to thank HLRS for the compute time and technical support.
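The offloading pattern can be sketched as follows: an expensive dense kernel (here a stand-in linear solve) is dispatched to the GPU when CuPy and a CUDA device are available, and falls back to the CPU otherwise, with results transferred back to the host only when needed. This is a hedged illustration of accelerating selected kernels, not the solver code from the paper.

    import numpy as np

    try:
        import cupy as cp
        xp = cp if cp.cuda.runtime.getDeviceCount() > 0 else np
    except Exception:
        xp = np  # CPU fallback when no GPU stack is present

    def expensive_kernel(a, b):
        """Placeholder for an offloadable kernel of the coupled framework."""
        return xp.linalg.solve(a, b)

    n = 2048
    rng = np.random.default_rng(1)
    a = xp.asarray(rng.normal(size=(n, n)) + n * np.eye(n))  # well-conditioned
    b = xp.asarray(np.ones(n))
    x = expensive_kernel(a, b)
    # copy back to the host only at the end, keeping device-host transfers rare
    x_host = cp.asnumpy(x) if xp is not np else x
    print(type(x).__module__, float(x_host[0]))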
A scalable framework for the partitioned solution of fluid-structure interaction problems
In this work, we present a scalable and efficient parallel solver for the partitioned solution of fluid-structure interaction problems through multi-code coupling. Two instances of an in-house parallel code, TermoFluids, are used to solve the fluid and the structural sub-problems, coupled on the interface via the preCICE coupling library. For the fluid flow, the Arbitrary Lagrangian-Eulerian form of the Navier-Stokes equations is solved on an unstructured conforming grid using a second-order finite-volume discretization. A parallel dynamic mesh method for unstructured meshes is used to track the moving boundary. For the structural problem, the nonlinear elastodynamics equations are solved on an unstructured grid using a second-order finite-volume method. A semi-implicit FSI coupling method is used, which segregates the fluid pressure term and couples it strongly to the structure, while the remaining fluid terms and the geometrical nonlinearities are only loosely coupled. A robust multi-vector quasi-Newton method is used for the coupling iterations between the solvers. Both the fluid and the structural solver use distributed-memory parallelism. The intra-solver communication required for data updates in the solution process is carried out using non-blocking point-to-point communicators. The inter-code communication is fully parallel and point-to-point, avoiding any central communication unit. Inside each single-physics solver, the load is balanced by dividing the computational domain into fairly equal blocks for each process. Additionally, a load balancing model is used at the inter-code level to minimize the overall idle time of the processes. Two practical test cases in the context of hemodynamics are studied, demonstrating the accuracy and computational efficiency of the coupled solver. Strong scalability tests show a parallel efficiency of 83% on 10,080 CPU cores.
This work was financially supported by the Ministerio de Economía y Competitividad, Secretaría de Estado de Investigación, Desarrollo e Innovación, Spain (ENE2017-88697-R); the priority program 1648 - Software for Exascale Computing 214 (ExaFSA - Exascale Simulation of Fluid-Structure-Acoustics Interactions) of the German Research Foundation; and a FI Ph.D. scholarship from the Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR) of the Generalitat de Catalunya (Spain). The performance measurements were carried out on the SuperMUC supercomputer at the Leibniz-Rechenzentrum (LRZ) der Bayerischen Akademie der Wissenschaften. The authors wish to thank LRZ for the computing time and technical support.
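The quasi-Newton coupling iterations can be sketched for a generic interface fixed-point problem x = H(x). The sketch below implements the basic IQN-ILS update, in which a least-squares problem over residual differences yields the new interface values; the multi-vector variant used in this work additionally reuses Jacobian information from previous time steps. The toy operator H and its dimensions are assumptions for illustration.

    import numpy as np

    def H(x):
        """Toy stand-in for one fluid -> structure -> fluid evaluation."""
        A = np.array([[0.6, 0.2], [0.1, 0.7]])
        return A @ x + np.array([1.0, -0.5])

    x = np.zeros(2)
    x_tilde = H(x)
    r = x_tilde - x                    # interface residual
    r_prev, x_tilde_prev = r, x_tilde
    V, W = [], []                      # difference histories
    x = x + 0.25 * r                   # first iteration: constant relaxation

    for k in range(1, 21):
        x_tilde = H(x)
        r = x_tilde - x
        if np.linalg.norm(r) < 1e-12:
            break
        V.append(r - r_prev)           # residual differences
        W.append(x_tilde - x_tilde_prev)
        r_prev, x_tilde_prev = r, x_tilde
        alpha, *_ = np.linalg.lstsq(np.column_stack(V), -r, rcond=None)
        x = x + np.column_stack(W) @ alpha + r   # quasi-Newton update
    print(f"converged after {k} quasi-Newton iterations, x = {x}")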