370 research outputs found

    Accelerating fluid-solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures

    We propose a numerical approach based on the Lattice-Boltzmann (LBM) and Immersed Boundary (IB) methods to tackle the problem of the interaction of solids with an incompressible fluid flow, and its implementation on heterogeneous platforms based on data-parallel accelerators such as NVIDIA GPUs and the Intel Xeon Phi. We explain the parallelization of these methods in detail and describe a number of optimizations, mainly focused on improving memory management and reducing the cost of host-accelerator communication. As previous research has consistently shown, pure LBM simulations achieve good performance on heterogeneous systems thanks to the high parallel efficiency of the method. Unfortunately, when coupling the LBM and IB methods, the overheads of IB degrade the overall performance. As an alternative, we have explored different hybrid implementations that effectively hide such overheads and allow us to exploit both the multi-core CPU and the hardware accelerator in a cooperative way, with excellent performance results.
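
    As a concrete illustration of the cooperative scheme described above, the following CUDA sketch (our own; names such as stepGpu/stepCpu and the D2Q9 sizes are assumptions, not the paper's code) splits the lattice row-wise, runs the GPU sub-domain asynchronously while the CPU sub-domain runs under OpenMP, and exchanges only the one-row halo at the interface:

```cuda
// Hypothetical sketch of one hybrid CPU+GPU time step; all names and
// sizes are illustrative, not taken from the paper.
#include <cuda_runtime.h>
#include <omp.h>

constexpr int NX = 1024, NY = 1024, Q = 9;          // D2Q9 example sizes

__global__ void stepGpu(const float *fSrc, float *fDst, int gpuRows) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= NX || y >= gpuRows) return;
    size_t n = (size_t)y * NX + x;
    for (int i = 0; i < Q; ++i)                     // placeholder: real code
        fDst[i * (size_t)NX * gpuRows + n] =        // would collide and stream
            fSrc[i * (size_t)NX * gpuRows + n];
}

void stepCpu(const float *fSrc, float *fDst, long cpuVals) {
    #pragma omp parallel for                        // placeholder CPU kernel
    for (long n = 0; n < cpuVals; ++n)
        fDst[n] = fSrc[n];
}

void hybridStep(const float *dSrc, float *dDst,     // device sub-lattice
                const float *hSrc, float *hDst,     // host sub-lattice
                float *hHalo, int gpuRows, cudaStream_t s) {
    dim3 blk(16, 16), grd((NX + 15) / 16, (gpuRows + 15) / 16);
    stepGpu<<<grd, blk, 0, s>>>(dSrc, dDst, gpuRows);      // async on GPU...
    stepCpu(hSrc, hDst, (long)Q * NX * (NY - gpuRows));    // ...overlapped on CPU
    // exchange only the one-row interface halo, not the whole lattice
    for (int i = 0; i < Q; ++i)
        cudaMemcpyAsync(hHalo + (size_t)i * NX,
                        dDst + ((size_t)i * gpuRows + gpuRows - 1) * NX,
                        NX * sizeof(float), cudaMemcpyDeviceToHost, s);
    cudaStreamSynchronize(s);
}
```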

    Reducing memory requirements for large size LBM simulations on GPUs

    The scientific community, on its never-ending road toward larger and more efficient computational resources, needs implementations that can adapt efficiently to current parallel platforms. Graphics processing units are an appropriate platform to cover some of these demands: they offer high performance at reduced cost and with efficient power consumption. However, the memory capacity of these devices is limited, so expensive memory transfers become necessary when dealing with big problems. Today, the lattice-Boltzmann method (LBM) has established itself as an efficient approach for Computational Fluid Dynamics simulations. Although this method is particularly amenable to efficient parallelization, it requires considerable memory capacity, which causes a dramatic fall in performance when dealing with large simulations. In this work, we propose several strategies to minimize this memory demand, allowing us to execute bigger simulations on the same platform without additional memory transfers while keeping high performance. In particular, we present two new implementations, LBM-Ghost and LBM-Swap, which are analyzed in depth, presenting the pros and cons of each. This project was funded by the Spanish Ministry of Economy and Competitiveness (MINECO): BCAM Severo Ochoa accreditation SEV-2013-0323, MTM2013-40824, Computación de Altas Prestaciones VII TIN2015-65316-P; by the Basque Excellence Research Center (BERC 2014-2017) program of the Basque Government; and by the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051). We also thank the computing facilities of the Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT) and the NVIDIA GPU Research Center program for the provided resources, as well as the support of NVIDIA through the BSC/UPC NVIDIA GPU Center of Excellence.
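
    The abstract does not detail LBM-Swap, but the swap idea behind such single-lattice schemes can be sketched as follows (a minimal CUDA reconstruction under our own assumptions: D2Q9, periodic boundaries, and collision and streaming as separate kernels). Streaming exchanges each post-collision population with the opposite population of its downstream neighbor, so one array replaces the usual source/destination pair; the next collision kernel then reads each direction from its opposite slot:

```cuda
// Swap-style streaming over a single lattice array (our sketch, not the
// authors' LBM-Swap code). D2Q9, periodic boundaries assumed.
__constant__ int CX[9]   = {0, 1, 0,-1, 0, 1,-1,-1, 1};
__constant__ int CY[9]   = {0, 0, 1, 0,-1, 1, 1,-1,-1};
__constant__ int OPP[9]  = {0, 3, 4, 1, 2, 7, 8, 5, 6};
__constant__ int HALF[4] = {1, 2, 5, 6};    // one direction per opposite pair

__global__ void swapStream(float *f, int nx, int ny) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;
    size_t nNodes = (size_t)nx * ny, node = (size_t)y * nx + x;
    for (int k = 0; k < 4; ++k) {
        int i = HALF[k];
        size_t nb = (size_t)((y + CY[i] + ny) % ny) * nx
                  + (size_t)((x + CX[i] + nx) % nx);
        // each lattice slot belongs to exactly one swap pair, so there
        // are no write races between threads
        float tmp = f[i * nNodes + node];
        f[i * nNodes + node]    = f[OPP[i] * nNodes + nb];
        f[OPP[i] * nNodes + nb] = tmp;
        // after the swap, the value streamed into a node along direction
        // j sits in slot OPP[j]; the next collision kernel reads it there
    }
}
```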

    FULL GPU Implementation of Lattice-Boltzmann Methods with Immersed Boundary Conditions for Fast Fluid Simulations

    The Lattice Boltzmann Method (LBM) has shown great potential for fluid simulations, but performance issues and difficulties in managing complex boundary conditions have hindered wider application. The advent of Graphics Processing Unit (GPU) computing offered a possible solution to the performance issue, and methods like the Immersed Boundary (IB) algorithm proved a flexible solution for boundaries. Unfortunately, the implicit IB algorithm makes the LBM implementation on GPU a non-trivial task. This work presents a fully parallel GPU implementation of LBM in combination with IB. The fluid-boundary interaction is implemented via GPU kernels, using execution configurations and data structures specifically designed to accelerate each code execution. Simulations were validated against experimental and analytical data, showing good agreement while improving the computational time. Substantial speedups were achieved, reducing the time required to execute the same model on a CPU by about two orders of magnitude.
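
    One representative piece of such a pipeline is the spreading of Lagrangian marker forces onto the lattice. The kernel below is a generic immersed-boundary sketch, not the authors' kernel: the 4-point cosine delta and all names are our choices, and atomics are used because the stencils of nearby markers overlap:

```cuda
// Spreading marker forces to the Eulerian grid (illustrative IB sketch).
// Assumes every marker's 4x4 stencil lies inside the domain.
__device__ float phi(float r) {             // 4-point cosine delta, |r| <= 2
    r = fabsf(r);
    return (r < 2.0f) ? 0.25f * (1.0f + cosf(1.5707963f * r)) : 0.0f;
}

__global__ void spreadForces(const float2 *markerPos, const float2 *markerF,
                             int nMarkers, float *fx, float *fy, int nx) {
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= nMarkers) return;
    float2 p = markerPos[m], F = markerF[m];
    int x0 = (int)floorf(p.x) - 1, y0 = (int)floorf(p.y) - 1;
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i) {
            int x = x0 + i, y = y0 + j;
            float w = phi(p.x - x) * phi(p.y - y);
            // stencils of different markers overlap: accumulate atomically
            atomicAdd(&fx[y * nx + x], w * F.x);
            atomicAdd(&fy[y * nx + x], w * F.y);
        }
}
```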

    Particle-resolved thermal lattice Boltzmann simulation using OpenACC on multi-GPUs

    We utilize the Open Accelerator (OpenACC) approach for graphics processing unit (GPU) accelerated particle-resolved thermal lattice Boltzmann (LB) simulation. We adopt the momentum-exchange method to calculate fluid-particle interactions to preserve the simplicity of the LB method. To address load-imbalance issues, we extend the indirect addressing method to collect fluid-particle link information at each timestep and store the indices of fluid-particle links in a fixed index array. We simulate the sedimentation of 4,800 hot particles in cold fluid with a domain size of 4000², and the simulation achieves 1750 million lattice updates per second (MLUPS) on a single GPU. Furthermore, we implement a hybrid OpenACC and Message Passing Interface (MPI) approach for multi-GPU accelerated simulation. This approach incorporates four optimization strategies: building domain lists, utilizing request-answer communication, overlapping communication with computation, and executing computation tasks concurrently. By reducing data communication between GPUs, hiding communication latency through overlapping computation, and increasing the utilization of GPU resources, we achieve improved performance, reaching 10846 MLUPS on 8 GPUs. Our results demonstrate that OpenACC-based GPU acceleration is promising for particle-resolved thermal lattice Boltzmann simulation.
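
    The paper works in OpenACC; to make the indirect-addressing idea concrete, here is the same pattern sketched as a CUDA kernel instead (structures and names are our assumptions). Packing every fluid-particle link of the current timestep into one flat array gives threads uniform work regardless of where the particles sit:

```cuda
// Momentum exchange over a packed list of fluid-solid links (our CUDA
// translation of the indirect-addressing idea; not the paper's code).
struct Link { int node; int dir; int particle; };

__constant__ float CXF[9] = {0, 1, 0,-1, 0, 1,-1,-1, 1};   // D2Q9 velocities
__constant__ float CYF[9] = {0, 0, 1, 0,-1, 1, 1,-1,-1};
__constant__ int   OPP[9] = {0, 3, 4, 1, 2, 7, 8, 5, 6};

__global__ void momentumExchange(const Link *links, int nLinks, float *f,
                                 size_t nNodes, float2 *particleForce) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= nLinks) return;
    Link L = links[k];
    int i = L.dir;
    // population about to hit the particle surface along direction i
    float fOut = f[i * nNodes + L.node];
    // simple bounce-back (moving-wall correction omitted for brevity)
    f[OPP[i] * nNodes + L.node] = fOut;
    // for a resting wall, the particle gains 2 * f * c_i from this link
    atomicAdd(&particleForce[L.particle].x, 2.0f * fOut * CXF[i]);
    atomicAdd(&particleForce[L.particle].y, 2.0f * fOut * CYF[i]);
}
```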

    Multi-domain grid refinement for lattice-Boltzmann simulations on heterogeneous platforms

    The main contribution of the present work is a set of parallel approaches for grid refinement, based on a multi-domain decomposition, for lattice-Boltzmann simulations. The proposed method discretizes the fluid with several regular Cartesian grids of non-homogeneous spatial resolution, which need to communicate with each other. Three parallel approaches are proposed: homogeneous multicore, homogeneous GPU, and heterogeneous multicore-GPU. Although the homogeneous implementations exhibit satisfactory results, the heterogeneous approach achieves up to 30% extra efficiency, in terms of Millions of Fluid Lattice Updates per Second (MFLUPS), by overlapping some of the steps on both architectures, multicore and GPU.
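
    The abstract leaves the inter-grid coupling unspecified; a common choice for a factor-2 refinement, sketched below as a hedged CUDA illustration (the paper's exact coupling may differ), is Dupuis-Chopard-style rescaling, where equilibrium parts transfer unchanged and non-equilibrium parts are scaled so the viscosity stays continuous across grids:

```cuda
// Coarse-to-fine transfer at a refinement interface (illustrative).
// Spatial/temporal interpolation at the interface is omitted here.
__global__ void coarseToFine(const float *fCoarse, const float *feqCoarse,
                             float *fFine, const int *ifaceCoarse,
                             const int *ifaceFine, int nIface,
                             size_t nCoarse, size_t nFine,
                             float tauC, float tauF) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= nIface) return;
    // with dx_f = dx_c / 2 the non-equilibrium part is rescaled by
    // (tau_f / tau_c) * (dt_f / dt_c) = tau_f / (2 * tau_c)
    float scale = tauF / (2.0f * tauC);
    for (int i = 0; i < 9; ++i) {                   // D2Q9 assumed
        size_t c  = i * nCoarse + ifaceCoarse[k];
        float feq  = feqCoarse[c];
        float fneq = fCoarse[c] - feq;
        fFine[i * nFine + ifaceFine[k]] = feq + scale * fneq;
    }
}
```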

    Leveraging the performance of LBM-HPC for large sizes on GPUs using ghost cells

    Today, there is a growing demand from the scientific community for larger and more efficient computational resources. The appearance of GPUs for general-purpose computing was an important advance toward covering this demand: these devices offer impressive computational capacity at low cost and with efficient power consumption. However, the memory available on these devices is sometimes not enough, making computationally expensive memory transfers between CPU and GPU necessary and causing a dramatic fall in performance. Recently, the Lattice-Boltzmann Method has established itself as an efficient methodology for fluid simulations. Although this method presents some features particularly amenable to efficient exploitation on parallel computers, it requires considerable memory capacity, which can be an important drawback, in particular on GPUs. In the present paper, we propose a new GPU-based implementation that minimizes such requirements with respect to other state-of-the-art implementations. It allows us to execute almost 2× bigger problems without additional memory transfers, achieving faster execution when dealing with large problems.
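
    The "almost 2×" figure follows directly from the memory accounting: a two-lattice implementation stores the distributions twice, while a single-lattice scheme stores them once (plus a small ghost overhead, ignored here). A hypothetical sizing helper, assuming D3Q19 in single precision:

```cuda
// Back-of-envelope estimate of the largest domain that fits on the GPU
// under a two-lattice versus a single-lattice scheme (our illustration).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t freeB, totalB;
    cudaMemGetInfo(&freeB, &totalB);                 // free device memory
    const size_t perNodeTwoLattice = 2 * 19 * sizeof(float);  // two copies
    const size_t perNodeOneLattice = 1 * 19 * sizeof(float);  // one copy
    std::printf("max nodes, two lattices: %zu\n", freeB / perNodeTwoLattice);
    std::printf("max nodes, one lattice:  %zu\n", freeB / perNodeOneLattice);
    return 0;
}
```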

    A Python implementation in graphic processing unit of a lattice Boltzmann model for unstable three-dimensional flows in immersed permeable media

    The implementation of a lattice Boltzmann model for three-dimensional permeable media with localized drag forces is presented. The model was previously introduced for two-dimensional geometries and follows the basics of the immersed boundary method. Permeable flows are much less stable than their counterparts in porous media and generally produce large coherent flow structures, such as vortex lines, rolls, and wakes. In addition, in permeable media the small-scale geometry often needs to be represented to a high degree of detail in order to capture certain transport phenomena, such as micro-convection or pollination. Hence, both calculation speed and memory requirements are under strain. The present model was implemented on a graphics processing unit, showing excellent performance in the calculation of stable and unstable flows in a rectangular channel partially obstructed by an array of parallel wires. In particular, the model is able to deal with small and medium spatial scales without losing the heterogeneous nature of permeable flows in the homogenization process. The algorithm used to manage memory issues is described in detail, and the results of the test case under stable and unstable conditions show the capability of the method to simulate these types of flows.
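
    The localized drag forces can be pictured as a masked source term applied before the collision step. The sketch below is our illustration only; the drag law and all names are assumptions, not the paper's formulation:

```cuda
// Apply a local drag force at lattice nodes overlapping the permeable
// obstacle (mask[n] != 0); the resulting fx/fy fields would then be
// folded into the collision as a forcing term (illustrative sketch).
__global__ void applyDrag(const unsigned char *mask, const float2 *u,
                          float *fx, float *fy, float cDrag, int nNodes) {
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= nNodes || !mask[n]) return;
    float2 v = u[n];
    float speed = sqrtf(v.x * v.x + v.y * v.y);
    fx[n] -= cDrag * speed * v.x;       // quadratic drag law, illustrative
    fy[n] -= cDrag * speed * v.y;
}
```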

    Optimization of lattice Boltzmann simulations on heterogeneous computers

    High-performance computing systems are more and more often based on accelerators. Computing applications targeting those systems often follow a host-driven approach, in which hosts offload almost all compute-intensive sections of the code onto accelerators; this approach only marginally exploits the computational resources available on the host CPUs, limiting overall performance. The obvious step forward is to run compute-intensive kernels in a concurrent and balanced way on both hosts and accelerators. In this paper, we consider exactly this problem for a class of applications based on lattice Boltzmann methods, widely used in computational fluid dynamics. Our goal is to develop just one program, portable and able to run efficiently on several different combinations of hosts and accelerators. To reach this goal, we define common data layouts enabling the code to exploit the different parallel and vector options of the various accelerators efficiently, and matching the possibly different requirements of the compute-bound and memory-bound kernels of the application. We also define models and metrics that predict the best partitioning of workloads among host and accelerator, and the optimally achievable overall performance level. We test the performance of our codes and their scaling properties using, as testbeds, HPC clusters incorporating different accelerators: Intel Xeon Phi many-core processors, NVIDIA GPUs, and AMD GPUs.
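
    For memory-bound LBM kernels, the kind of partitioning model the abstract alludes to can be reduced to a bandwidth-proportional split; the helper below is a hedged sketch with illustrative numbers, not the authors' model:

```cuda
// Assign each device a share of lattice sites proportional to its
// sustained memory bandwidth so host and accelerator finish each time
// step together (our simplification of such a balancing model).
struct Device { double bandwidthGBs; };      // measured, e.g. with STREAM

// fraction of the lattice assigned to the accelerator
double acceleratorShare(Device host, Device acc) {
    return acc.bandwidthGBs / (acc.bandwidthGBs + host.bandwidthGBs);
}
// Example: with B_host = 100 GB/s and B_acc = 500 GB/s this gives 5/6 of
// the sites to the accelerator and predicts a combined throughput close
// to (B_host + B_acc) / bytesPerSiteUpdate lattice updates per second.
```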