A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters
Sustaining a large fraction of single GPU performance in parallel
computations is considered to be the major problem of GPU-based clusters. In
this article, this topic is addressed in the context of a lattice Boltzmann
flow solver that is integrated in the WaLBerla software framework. We propose a
multi-GPU implementation using a block-structured MPI parallelization, suitable
for load balancing and heterogeneous computations on CPUs and GPUs. The
overhead required for multi-GPU simulations is discussed in detail and it is
demonstrated that the kernel performance can be sustained to a large extent.
With our GPU implementation, we achieve nearly perfect weak scalability on
InfiniBand clusters. However, in strong scaling scenarios multi-GPU clusters make less
efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost
analysis must determine the best course of action for a particular simulation
task. Additionally, weak scaling results of heterogeneous simulations conducted
on CPUs and GPUs simultaneously are presented using clusters equipped with
varying node configurations.
Comment: 20 pages, 12 figures
Real Time Wake Computations using Lattice Boltzmann Method on Many Integrated Core Processors
This paper puts forward an efficient Lattice Boltzmann method for use as a wake simulator suitable for
real-time environments. The method is limited to low-speed incompressible flow but is very efficient and
can be used to compute flows “on the fly”. In particular, many-core machines allow the method to be
used without the need for very expensive parallel clusters. Results are shown here for flows around
cylinders and simple ship shapes.