
    Real Time Wake Computations using Lattice Boltzmann Method on Many Integrated Core Processors

    This paper puts forward an efficient Lattice Boltzmann method for use as a wake simulator suitable for real-time environments. The method is limited to low-speed incompressible flow but is very efficient and can be used to compute flows “on the fly”. In particular, many-core machines allow the method to be used without the need for very expensive parallel clusters. Results are shown here for flows around cylinders and simple ship shapes.
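The core kernel behind such a simulator is the stream-and-collide update of the lattice Boltzmann method. As a minimal sketch (not the paper's implementation; the lattice size, relaxation time, and initial state below are illustrative assumptions), a D2Q9 BGK step looks like:

```python
import numpy as np

# D2Q9 lattice velocities and weights.
c = np.array([(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uy):
    """Second-order truncated Maxwell-Boltzmann equilibrium for D2Q9."""
    cu = 3.0 * (c[:, 0, None, None] * ux + c[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return w[:, None, None] * rho * (1.0 + cu + 0.5 * cu**2 - usq)

def lbm_step(f, tau=0.6):
    """One BGK time step: streaming followed by collision (periodic domain)."""
    # Streaming: shift each population along its lattice velocity.
    for i, (cx, cy) in enumerate(c):
        f[i] = np.roll(np.roll(f[i], cx, axis=0), cy, axis=1)
    # Macroscopic moments.
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    # BGK relaxation toward local equilibrium.
    f += (equilibrium(rho, ux, uy) - f) / tau
    return f, rho

# Sanity check: a uniform fluid at rest is a fixed point of the update.
nx = ny = 16
rho0 = np.ones((nx, ny))
f = equilibrium(rho0, np.zeros((nx, ny)), np.zeros((nx, ny)))
f, rho = lbm_step(f)
```

Because the update touches only nearest neighbours and is identical at every node, it maps naturally onto the wide SIMD units of many-core processors.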


    Link-wise Artificial Compressibility Method

    The Artificial Compressibility Method (ACM) for the incompressible Navier-Stokes equations is (link-wise) reformulated (referred to as LW-ACM) by a finite set of discrete directions (links) on a regular Cartesian mesh, in analogy with the Lattice Boltzmann Method (LBM). The main advantage is the possibility of exploiting well established technologies originally developed for LBM and classical computational fluid dynamics, with special emphasis on finite differences (at least in the present paper), at the cost of minor changes. For instance, wall boundaries not aligned with the background Cartesian mesh can be taken into account by tracing the intersections of each link with the wall (analogously to LBM technology). LW-ACM requires no high-order moments beyond hydrodynamics (often referred to as ghost moments) and no kinetic expansion. Like finite difference schemes, only standard Taylor expansion is needed for analyzing consistency. Preliminary efforts towards optimal implementations have shown that LW-ACM achieves computational speed similar to optimized (BGK-) LBM. In addition, the memory demand is significantly smaller than for (BGK-) LBM. Importantly, with an efficient implementation, this algorithm may be one of the few which is compute-bound rather than memory-bound. Two- and three-dimensional benchmarks are investigated, and an extensive comparative study between the present approach and state-of-the-art methods from the literature is carried out. Numerical evidence suggests that LW-ACM represents an excellent alternative in terms of simplicity, stability and accuracy. Comment: 62 pages, 20 figures.
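The classical artificial compressibility idea being reformulated is easy to state: pressure evolves in pseudo-time as dp/dt = -β ∇·u, relaxing the velocity field toward a divergence-free state. A minimal central-difference sketch of that pressure update (the paper's link-wise discretization is not reproduced here; β, dt, and the grid are illustrative assumptions):

```python
import numpy as np

def acm_pressure_step(p, ux, uy, beta=1.0, dt=0.1, dx=1.0):
    """One explicit pseudo-time step of dp/dt = -beta * div(u), periodic grid,
    with second-order central differences for the divergence."""
    div_u = ((np.roll(ux, -1, axis=0) - np.roll(ux, 1, axis=0))
             + (np.roll(uy, -1, axis=1) - np.roll(uy, 1, axis=1))) / (2.0 * dx)
    return p - dt * beta * div_u

# Sanity check: for a divergence-free field the pressure is stationary.
n = 8
ux = np.ones((n, n))   # uniform flow, div(u) = 0
uy = np.zeros((n, n))
p = np.zeros((n, n))
p_new = acm_pressure_step(p, ux, uy)
```

The link-wise reformulation replaces such stencils with contributions along discrete lattice directions, which is what lets LBM machinery (e.g. link-wall intersection tracing) carry over.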

    Lattice Boltzmann modeling for shallow water equations using high performance computing

    The aim of this dissertation project is to extend the standard Lattice Boltzmann method (LBM) for shallow water flows in order to deal with three-dimensional flow fields. The shallow water and mass transport equations have wide applications in ocean, coastal, and hydraulic engineering, which can benefit from the advantages of the LBM. The LBM has recently become an attractive numerical method for solving various fluid dynamics phenomena; however, it has not been extensively applied to modeling shallow water flow and mass transport. Only a few works can be found on improving the LBM for mass transport in shallow water flows, and even fewer on extending it to model three-dimensional shallow water flow fields. The application of the LBM to the shallow water and mass transport equations has been limited because it is not clearly understood how the LBM solves these equations. The project first focuses on the importance of choosing enhanced collision operators, such as multiple-relaxation-time (MRT) and two-relaxation-time (TRT), over the standard single-relaxation-time (SRT) in LBM. An MRT collision operator is chosen for the shallow water equations, while a TRT method is used for the advection-dispersion equation. Furthermore, two speed-of-sound techniques are introduced to account for heterogeneous and anisotropic dispersion coefficients. By selecting appropriate equilibrium distribution functions, the standard LBM is extended to solve three-dimensional wind-driven and density-driven circulation by introducing a multi-layer LB model. An MRT-LBM model is used to solve each layer, coupled by the vertical viscosity forcing term. To increase solution stability, an implicit step is suggested to obtain stratified flow velocities. Numerical examples are presented to verify the multi-layer LB model against analytical solutions. The model’s capability of calculating lateral and vertical distributions of the horizontal velocities is demonstrated for wind- and density-driven circulation over non-uniform bathymetry. The parallel performance of the LBM on central processing unit (CPU) based and graphics processing unit (GPU) based high performance computing (HPC) architectures is investigated, showing attractive performance in terms of speedup and scalability.
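The TRT operator mentioned above splits each population into parts that are symmetric and antisymmetric under reversal of the lattice link, relaxing each with its own rate. A minimal sketch on a toy D1Q3-style velocity set (the lattice, rates, and equilibrium are illustrative assumptions, not the dissertation's model):

```python
import numpy as np

# Opposite-link index map for the 1D three-velocity set {0, +1, -1}.
opp = np.array([0, 2, 1])

def trt_collide(f, feq, omega_plus=1.0, omega_minus=1.2):
    """TRT collision: relax even (symmetric) and odd (antisymmetric)
    parts of the populations with separate rates."""
    f_plus = 0.5 * (f + f[opp])           # symmetric part
    f_minus = 0.5 * (f - f[opp])          # antisymmetric part
    feq_plus = 0.5 * (feq + feq[opp])
    feq_minus = 0.5 * (feq - feq[opp])
    return (f - omega_plus * (f_plus - feq_plus)
              - omega_minus * (f_minus - feq_minus))

# At equilibrium the collision is the identity.
feq = np.array([4/6, 1/6, 1/6])
f_post = trt_collide(feq.copy(), feq)

# Mass is conserved for any populations with the same zeroth moment.
f2 = np.array([0.5, 0.2, 0.3])
mass_after = trt_collide(f2, feq).sum()
```

The extra relaxation rate is what gives TRT its stability and accuracy advantages over SRT for advection-dispersion problems.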

    GPU-accelerated simulation of colloidal suspensions with direct hydrodynamic interactions

    Solvent-mediated hydrodynamic interactions between colloidal particles can significantly alter their dynamics. We discuss the implementation of Stokesian dynamics in leading approximation for streaming processors as provided by the compute unified device architecture (CUDA) of recent graphics processors (GPUs). Thereby, the simulation of explicit solvent particles is avoided and hydrodynamic interactions can easily be accounted for in already available, highly accelerated molecular dynamics simulations. Special emphasis is put on efficient memory access and numerical stability. The algorithm is applied to the periodic sedimentation of a cluster of four suspended particles. Finally, we investigate the runtime performance of generic memory access patterns of complexity O(N^2) for various GPU algorithms relying on either hardware cache or shared memory. Comment: to appear in a special issue of Eur. Phys. J. Special Topics on "Computer Simulations on GPUs".
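The O(N^2) structure being mapped onto the GPU is an all-pairs sum of hydrodynamic couplings. As a hedged sketch of that leading-order coupling (here via the free-space Oseen tensor; positions, forces, and viscosity are illustrative assumptions, and the paper's periodic corrections are not reproduced):

```python
import numpy as np

def oseen_velocities(pos, forces, eta=1.0):
    """Velocity induced on each particle by the Stokes flow of every other,
    summed with the free-space Oseen tensor: an O(N^2) double loop."""
    n = len(pos)
    vel = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = pos[i] - pos[j]
            d = np.linalg.norm(r)
            pref = 1.0 / (8.0 * np.pi * eta * d)
            # Oseen tensor contraction: (I + r r^T / d^2) . F_j
            vel[i] += pref * (forces[j] + r * np.dot(r, forces[j]) / d**2)
    return vel

# Two particles under equal forces see identical induced velocities.
pos = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
F = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, -1.0]])
v = oseen_velocities(pos, F)
```

On a GPU, the inner loop over j is where cache versus shared-memory tiling decisions (the access patterns benchmarked in the paper) come into play.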

    The TheLMA project: a thermal lattice Boltzmann solver for the GPU

    In this paper, we consider the implementation of a thermal flow solver based on the lattice Boltzmann method (LBM) for graphics processing units (GPUs). We first describe the hybrid thermal LBM model implemented and give a concise review of the CUDA technology. The specific issues that arise with LBM on GPUs are outlined. We propose an approach for efficient handling of the thermal part. Performance is close to optimum and is significantly better than that of comparable CPU solvers. We validate our code by simulating the differentially heated cubic cavity (DHC). The computed results for steady flow patterns are in good agreement with previously published ones. Finally, we use our solver to study the phenomenology of transitional flows in the DHC.
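In a hybrid thermal LBM of this kind, the temperature field is advanced by a finite-difference scheme rather than a second set of LBM populations, and feeds back on the flow through a Boussinesq buoyancy force. A minimal sketch of such a coupling (the grid, diffusivity, and explicit scheme below are illustrative assumptions, not the TheLMA implementation):

```python
import numpy as np

def temperature_step(T, ux, uy, kappa=0.1, dt=0.1, dx=1.0):
    """Explicit advection-diffusion step for the temperature field
    (periodic grid, central differences for both terms)."""
    lap = (np.roll(T, 1, 0) + np.roll(T, -1, 0)
           + np.roll(T, 1, 1) + np.roll(T, -1, 1) - 4.0 * T) / dx**2
    adv = (ux * (np.roll(T, -1, 0) - np.roll(T, 1, 0))
           + uy * (np.roll(T, -1, 1) - np.roll(T, 1, 1))) / (2.0 * dx)
    return T + dt * (kappa * lap - adv)

def boussinesq_force(T, T0=0.0, g_beta=0.001):
    """Buoyancy force density (vertical), proportional to the temperature
    deviation from a reference value: Boussinesq approximation."""
    return g_beta * (T - T0)

# Sanity check: a uniform temperature field at rest is a steady state.
n = 8
T = np.full((n, n), 0.5)
T_new = temperature_step(T, np.zeros((n, n)), np.zeros((n, n)))
```

Handling this extra field efficiently alongside the LBM populations, without wrecking the memory access pattern of the flow kernel, is the GPU-specific difficulty the paper addresses.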

    Scalable Lattice Boltzmann Solvers for CUDA GPU Clusters

    The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint it reduces to a regular data-parallel procedure and is therefore well suited to high performance computations. Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions and even fewer GPU cluster implementations. Yet, to be of practical interest, GPU LBM solvers need to be able to perform large-scale simulations. In the present contribution, we describe an efficient LBM implementation for CUDA GPU clusters. Our solver consists of a set of MPI communication routines and a CUDA kernel specifically designed to handle three-dimensional partitioning of the computation domain. Performance measurements were carried out on a small cluster. We show that the results are satisfying, both in terms of data throughput and parallelisation efficiency.
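The three-dimensional partitioning such a solver relies on cuts the global lattice into a Px x Py x Pz grid of subdomains, with each MPI rank exchanging halo layers with its six face neighbours. A hedged sketch of the rank-to-subdomain bookkeeping (function names and the periodic layout are illustrative assumptions, not the paper's code):

```python
def rank_to_coords(rank, px, py, pz):
    """Map a linear MPI rank to 3D partition coordinates (x varies fastest)."""
    return (rank % px, (rank // px) % py, rank // (px * py))

def coords_to_rank(cx, cy, cz, px, py, pz):
    """Inverse map, with periodic wrap-around on the partition grid."""
    return (cx % px) + (cy % py) * px + (cz % pz) * px * py

def face_neighbors(rank, px, py, pz):
    """Ranks of the six subdomains sharing a face with this one
    (periodic topology), i.e. the halo-exchange partners."""
    cx, cy, cz = rank_to_coords(rank, px, py, pz)
    return {
        "west":   coords_to_rank(cx - 1, cy, cz, px, py, pz),
        "east":   coords_to_rank(cx + 1, cy, cz, px, py, pz),
        "south":  coords_to_rank(cx, cy - 1, cz, px, py, pz),
        "north":  coords_to_rank(cx, cy + 1, cz, px, py, pz),
        "bottom": coords_to_rank(cx, cy, cz - 1, px, py, pz),
        "top":    coords_to_rank(cx, cy, cz + 1, px, py, pz),
    }

# Example: 2 x 2 x 2 partition, neighbours of rank 0.
nbrs = face_neighbors(0, 2, 2, 2)
```

Each time step, populations leaving a subdomain through a face are packed, sent to the corresponding neighbour rank, and unpacked into that rank's halo before the next streaming step.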