A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters
Sustaining a large fraction of single GPU performance in parallel
computations is considered to be the major problem of GPU-based clusters. In
this article, this topic is addressed in the context of a lattice Boltzmann
flow solver that is integrated in the WaLBerla software framework. We propose a
multi-GPU implementation using a block-structured MPI parallelization, suitable
for load balancing and heterogeneous computations on CPUs and GPUs. The
overhead required for multi-GPU simulations is discussed in detail and it is
demonstrated that the kernel performance can be sustained to a large extent.
With our GPU implementation, we achieve nearly perfect weak scalability on
InfiniBand clusters. However, in strong scaling scenarios, multi-GPU setups use
the hardware less efficiently than IBM BG/P and x86 clusters. Hence, a cost
analysis must determine the best course of action for a particular simulation
task. Additionally, weak scaling results of heterogeneous simulations conducted
on CPUs and GPUs simultaneously are presented using clusters equipped with
varying node configurations.
Comment: 20 pages, 12 figures
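The abstract above concerns parallelizing a lattice Boltzmann solver; as background, the collide-and-stream update at the heart of such codes fits in a few lines. The following is a minimal single-node NumPy sketch of one D2Q9 BGK step on a periodic grid, not WaLBerla's actual implementation; all names, the relaxation time, and the grid size are illustrative.

```python
import numpy as np

# D2Q9 lattice: 9 discrete velocities and their quadrature weights.
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium distribution for all 9 directions."""
    cu = c[:, 0, None, None]*ux + c[:, 1, None, None]*uy
    usq = ux**2 + uy**2
    return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.6):
    """One collide-and-stream update with periodic boundaries."""
    rho = f.sum(axis=0)                               # density
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho  # momentum / density
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f = f + (equilibrium(rho, ux, uy) - f) / tau      # BGK collision
    for i in range(9):                                # streaming step
        f[i] = np.roll(f[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
    return f

# Start from a uniform fluid at rest; the update conserves mass.
f = equilibrium(np.ones((16, 16)), np.zeros((16, 16)), np.zeros((16, 16)))
f = lbm_step(f)
```

In a block-structured MPI parallelization as described above, the `np.roll` periodic shift would be replaced by ghost-layer exchanges between neighboring blocks, which is where the multi-GPU communication overhead arises.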
Dynamic Load Balancing Techniques for Particulate Flow Simulations
Parallel multiphysics simulations often suffer from load imbalances
originating from the applied coupling of algorithms with spatially and
temporally varying workloads. It is thus desirable to minimize these imbalances
to reduce the time to solution and to better utilize the available hardware
resources. Taking particulate flows as an illustrating example application, we
present and evaluate load balancing techniques that tackle this challenging
task. This involves a load estimation step in which the currently generated
workload is predicted. We describe in detail how such a workload estimator can
be developed. In a second step, load distribution strategies like space-filling
curves or graph partitioning are applied to dynamically distribute the load
among the available processes. To compare and analyze their performance, we
employ these techniques to a benchmark scenario and observe a reduction of the
load imbalances by almost a factor of four. This results in a decrease of the
overall runtime by 14% for space-filling curves.
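The abstract above mentions space-filling curves as one load distribution strategy. A common instance is to sort blocks along a Morton (Z-order) curve and then cut the curve into contiguous chunks of roughly equal estimated workload. The sketch below illustrates this idea under simple assumptions (2D integer block coordinates, precomputed per-block weights); it is not the method of the paper, whose estimator and balancer are more elaborate.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of (x, y) to obtain a Z-order (Morton) index,
    so that blocks close on the curve tend to be close in space."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2*i) | ((y >> i) & 1) << (2*i + 1)
    return key

def partition_blocks(blocks, weights, nprocs):
    """Assign weighted blocks to processes by cutting the Morton curve
    into contiguous pieces of roughly equal total workload."""
    order = sorted(range(len(blocks)), key=lambda i: morton_key(*blocks[i]))
    target = sum(weights) / nprocs          # ideal load per process
    assignment, rank, acc = {}, 0, 0.0
    for i in order:
        if acc >= target * (rank + 1) and rank < nprocs - 1:
            rank += 1                       # current chunk is full; advance
        assignment[blocks[i]] = rank
        acc += weights[i]
    return assignment
```

With spatially varying weights (e.g. the predicted particle workload per block), the same cut rule yields unequal block counts but near-equal workloads per process, which is the point of the load estimation step described above.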
Zynq SoC based acceleration of the lattice Boltzmann method
Cerebral aneurysm is a life-threatening condition. It is a weakness in a blood vessel that may enlarge and bleed into the surrounding area. In order to understand the surrounding environmental conditions during interventions or surgical procedures, a simulation of blood flow in cerebral arteries is needed. One effective simulation approach is the lattice Boltzmann (LB) method. Due to the computational complexity of the algorithm, the simulation is usually performed on high performance computers. In this paper, efficient hardware architectures of the LB method on a Zynq system-on-chip (SoC) are designed and implemented. The proposed architectures have first been simulated in the Vivado HLS environment and later implemented on a ZedBoard using the software-defined SoC (SDSoC) development environment. In addition, a set of evaluations of different hardware architectures of the LB implementation is discussed in this paper. The experimental results show that the proposed implementation accelerates the processing speed by a factor of 52 compared to a dual-core ARM processor-based software implementation.
Lattice-Boltzmann LES modelling of a full-scale, biogas-mixed anaerobic digester
An Euler-Lagrange multicomponent, non-Newtonian Lattice-Boltzmann method is applied for the first time to model a full-scale gas-mixed anaerobic digester for wastewater treatment. Rheology is modelled through a power-law model and, for the first time in gas-mixed anaerobic digestion modelling, turbulence is modelled through a Smagorinsky Large Eddy Simulation model. The hydrodynamics of the digester is studied by analysing flow and viscosity patterns and by assessing the degree of mixing through the Uniformity Index method. Results show independence from the grid size and from the number of Lagrangian substeps employed in the Lagrangian sub-grid simulation model. Flow patterns are shown to depend mildly on the choice of bubble size, but the asymptotic degree of mixing does not. Numerical runs of the model are compared to previous results in the literature from a second-order Finite-Volume Method approach and demonstrate a roughly 1000-fold gain in computational efficiency, massive parallelizability and much finer attainable spatial resolution. Whilst previous research concluded that applying LES to full-scale anaerobic digestion mixing is unfeasible because of the high computational expense, the increase in computational efficiency demonstrated here now makes LES a feasible option for industries and consultancies.
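The abstract above combines power-law rheology with a Smagorinsky LES closure. A common way to couple the two is to add a molecular power-law viscosity and a Smagorinsky eddy viscosity into one effective viscosity per lattice cell. The sketch below shows that combination in simplified form; the constants (consistency index K, flow index n, Smagorinsky constant C_s, filter width delta) are illustrative placeholders, not the paper's calibrated values, and the strain-rate magnitude is reduced to a single scalar shear rate.

```python
import numpy as np

def effective_viscosity(shear_rate, K=0.05, n=0.6, C_s=0.1, delta=1.0):
    """Effective kinematic viscosity as the sum of a power-law molecular
    contribution and a Smagorinsky eddy-viscosity contribution:

        nu_mol = K * |gamma_dot|**(n - 1)        (power-law rheology)
        nu_t   = (C_s * delta)**2 * |gamma_dot|  (Smagorinsky closure)
    """
    # Clamp the shear rate away from zero: for n < 1 the power-law
    # viscosity diverges as the shear rate vanishes.
    gdot = np.maximum(np.abs(shear_rate), 1e-12)
    nu_mol = K * gdot**(n - 1)
    nu_t = (C_s * delta)**2 * gdot
    return nu_mol + nu_t
```

With n < 1 the molecular part is shear-thinning (it decreases with shear rate), while the eddy part grows with it; in an LB solver the resulting effective viscosity would be mapped to a local relaxation time each time step.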
Proceedings of the 5th bwHPC Symposium
In modern science, the demand for more powerful and integrated research
infrastructures is growing constantly to address computational challenges
in data analysis, modeling and simulation. The bwHPC initiative, funded
by the Ministry of Science, Research and the Arts and the universities in
Baden-Württemberg, is a state-wide federated approach aimed at assisting
scientists with mastering these challenges. At the 5th bwHPC Symposium
in September 2018, scientific users, technical operators and government
representatives came together for two days at the University of Freiburg. The
symposium provided an opportunity to present scientific results that were
obtained with the help of bwHPC resources. Additionally, the symposium served
as a platform for discussing and exchanging ideas concerning the use of these
large scientific infrastructures as well as their further development.