Search CORE

341 research outputs found

Reducing memory requirements for large size LBM simulations on GPUs

Author: Axner
Bernaschi
Bernaschi
Gross
He
Januszewski
Kollmannsberger
Latt
Li
Li
Malaspinas
Marié
Mohamad
Obrecht
Pohl
Qian
Rinaldi
Shet
Succi
Valero-Lara
Valero-Lara
Valero-Lara
Valero-Lara
Valero-Lara
Valero-Lara
Valero-Lara
Wellein
Wendt
Yang
Yang
Ye
Publication venue: 'Wiley'
Publication date: 01/01/2017

In its never-ending pursuit of larger and more efficient computational resources, the scientific community needs implementations that adapt efficiently to current parallel platforms. Graphics processing units (GPUs) are an appropriate platform that covers some of these demands: the architecture offers high performance at a reduced cost and with efficient power consumption. However, the memory capacity of these devices is limited, so expensive memory transfers are necessary to deal with large problems. Today, the lattice-Boltzmann method (LBM) has positioned itself as an efficient approach for Computational Fluid Dynamics simulations. Although this method is particularly amenable to efficient parallelization, it needs considerable memory capacity, which causes a dramatic fall in performance when dealing with large simulations. In this work, we propose some initiatives to minimize this memory demand, which allows us to execute bigger simulations on the same platform without additional memory transfers while keeping high performance. In particular, we present two new implementations, LBM-Ghost and LBM-Swap, which are analyzed in depth, presenting the pros and cons of each.
This project was funded by the Spanish Ministry of Economy and Competitiveness (MINECO): BCAM Severo Ochoa accreditation SEV-2013-0323, MTM2013-40824, Computación de Altas Prestaciones VII TIN2015-65316-P, by the Basque Excellence Research Center (BERC 2014-2017) program by the Basque Government, and by the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051).
We also thank the computing facilities of the Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT) and the NVIDIA GPU Research Center program for the provided resources, as well as NVIDIA for its support through the BSC/UPC NVIDIA GPU Center of Excellence.
Peer Reviewed. Postprint (author's final draft)
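To give a feel for why memory capacity limits LBM problem sizes on GPUs, the following is a minimal sketch of the storage needed for the distribution functions alone. It is not taken from the paper: the D3Q19 stencil, double precision, the 256^3 grid, and the comparison between a conventional two-lattice layout and a single-lattice layout are all assumptions made for illustration, and the function name is hypothetical.

```python
def lbm_memory_bytes(nx, ny, nz, q=19, bytes_per_value=8, lattices=2):
    """Bytes needed to store the distribution functions of an nx*ny*nz grid.

    q               -- velocities per node (19 assumes a D3Q19 stencil)
    bytes_per_value -- 8 assumes double precision
    lattices        -- 2 for a two-lattice layout, 1 for a single-lattice layout
    """
    return nx * ny * nz * q * bytes_per_value * lattices


GIB = 1024 ** 3

# Hypothetical grid size, chosen only for illustration.
two = lbm_memory_bytes(256, 256, 256, lattices=2)
one = lbm_memory_bytes(256, 256, 256, lattices=1)

print(f"two lattices: {two / GIB:.2f} GiB")
print(f"one lattice:  {one / GIB:.2f} GiB")
```

Under these assumptions, halving the number of stored lattices halves the footprint, so a grid with twice as many nodes fits in the same device memory.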

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

Accelerating fluid-solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures

Author: Aidun
Bailey
Bernaschi
Bhatnagar
Calhoun
Dalton
Favier
Feichtinger
Green
Guo
Habich
Huang
Januszewski
Layton
Lima
Peskin
Pinelli
Qian
Rinaldi
Roma
Russell
Schönherr
Shet
Shet
Succi
Taira
Uhlmann
Valero-Lara
Valero-Lara
Valero-Lara
Wellein
Wittmann
Wu
Xu
Zhou
Zhu
Zhu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

We propose a numerical approach based on the Lattice-Boltzmann (LBM) and Immersed Boundary (IB) methods to tackle the problem of the interaction of solids with an incompressible fluid flow, and its implementation on heterogeneous platforms based on data-parallel accelerators such as NVIDIA GPUs and the Intel Xeon Phi. We explain in detail the parallelization of these methods and describe a number of optimizations, mainly focusing on improving memory management and reducing the cost of host-accelerator communication. As previous research has consistently shown, pure LBM simulations achieve good performance on heterogeneous systems thanks to the high parallel efficiency of this method. Unfortunately, when coupling LBM and IB methods, the overheads of IB degrade the overall performance. As an alternative, we have explored different hybrid implementations that effectively hide such overheads and allow us to exploit both the multi-core processor and the hardware accelerator in a cooperative way, with excellent performance results.

City Research Online

Crossref

BCAM's Institutional Repository Data

HAL AMU

Multi-domain grid refinement for lattice-Boltzmann simulations on heterogeneous platforms

Author: Jansson J.
Valero-Lara P.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

The main contribution of the present work consists of several parallel approaches for grid refinement based on a multi-domain decomposition for lattice-Boltzmann simulations. The proposed method for discretizing the fluid incorporates different regular Cartesian grids with non-homogeneous spatial domains, which need to communicate with each other. Three different parallel approaches are proposed: homogeneous Multicore, homogeneous GPU, and heterogeneous Multicore-GPU. Although the homogeneous implementations exhibit satisfactory results, the heterogeneous approach achieves up to 30% extra efficiency, in terms of Millions of Fluid Lattice Updates per Second (MFLUPS), by overlapping some of the steps on both architectures, Multicore and GPU.
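The MFLUPS metric named in this abstract is commonly computed as the number of fluid node updates divided by the wall-clock time, in millions. The sketch below illustrates that usual definition; the function name and the sample numbers are hypothetical and are not results from the paper.

```python
def mflups(fluid_nodes, time_steps, seconds):
    """Millions of Fluid Lattice Updates per Second.

    fluid_nodes -- number of fluid nodes updated in each time step
    time_steps  -- number of time steps executed
    seconds     -- wall-clock time of the run
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return fluid_nodes * time_steps / (seconds * 1e6)


# Hypothetical run: a 512 x 512 grid, 1000 time steps, 2 seconds of wall time.
baseline = mflups(512 * 512, 1000, 2.0)
print(f"baseline: {baseline:.3f} MFLUPS")

# A 30% gain, the upper bound quoted in the abstract, applied to that figure.
print(f"with 30% gain: {1.3 * baseline:.3f} MFLUPS")
```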

Crossref

BCAM's Institutional Repository Data

A holistic scalable implementation approach of the lattice Boltzmann method for CPU/GPU heterogeneous clusters

Author: Bakhtiari AB
Bungartz HJB
Neumann PN
Riesinger CR
Schreiber M
Publication venue: MDPI
Publication date: 01/11/2017

This is the author accepted manuscript. The final version is available from MDPI via the DOI in this record.
Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, providing a huge computational potential. Programming them in a scalable way that exploits their maximal performance introduces numerous challenges, such as optimizations for different computing devices, dealing with multiple levels of parallelism, the application of different programming models, work distribution, and hiding of communication with computation. We utilize the lattice Boltzmann method for fluid flow as a representative scientific computing application and develop a holistic implementation for large-scale CPU/GPU heterogeneous clusters. We review and combine a set of best practices and techniques ranging from optimizations for the particular computing devices to the orchestration of tens of thousands of CPU cores and thousands of GPUs. Ultimately, we arrive at an implementation that uses all the available computational resources for the lattice Boltzmann method operators. Our approach shows excellent scalability behavior, making it future-proof for heterogeneous clusters of the upcoming architectures on the exaFLOPS scale. Parallel efficiencies of more than 90% are achieved, leading to 2,604.72 GLUPS utilizing 24,576 CPU cores and 2,048 GPUs of the CPU/GPU heterogeneous cluster Piz Daint and computing more than 6.8 · 10^9 lattice cells.
This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). In addition, this work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d68. We further thank the Max Planck Computing & Data Facility (MPCDF) and the Global Scientific Information and Computing Center (GSIC) for providing computational resources.
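The figures quoted in this abstract can be related with some simple arithmetic. The sketch below assumes that GLUPS means 10^9 lattice cell updates per second and uses one common definition of parallel efficiency (measured throughput divided by ideal linear scaling from a single device); the abstract does not state which definition the authors use. The function names and the values passed to the efficiency example are hypothetical.

```python
def steps_per_second(glups, lattice_cells):
    """Time steps per second implied by a throughput given in GLUPS."""
    return glups * 1e9 / lattice_cells


def scaling_efficiency(throughput_n, n_devices, throughput_1):
    """Measured throughput on n devices relative to ideal linear scaling."""
    return throughput_n / (n_devices * throughput_1)


# Figures from the abstract. The cell count is a lower bound ("more than"),
# so the resulting step rate is an upper bound.
GLUPS = 2604.72
CELLS = 6.8e9
print(f"at most about {steps_per_second(GLUPS, CELLS):.0f} time steps per second")

# Hypothetical values, only to show how the efficiency formula is applied.
print(scaling_efficiency(throughput_n=90.0, n_devices=100, throughput_1=1.0))
```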

Directory of Open Access Journals

Open Research Exeter

Hybrid GPU / CPU Navier-Stokes lattice Boltzmann method for urban wind flow

Author: Camps Santasmasas Marta
Publication date: 01/08/2021

The University of Manchester - Institutional Repository

Interactive 3D simulation for fluid–structure interactions using dual coupled GPUs

Author: Song Fengguang
Zhu Luoding
Zigon Bob
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

The scope of this work involves the integration of high-speed parallel computation with interactive 3D visualization of the lattice-Boltzmann-based immersed boundary method for fluid–structure interaction (FSI). An NVIDIA Tesla K40c is used for the computations, while an NVIDIA Quadro K5000 is used for 3D vector field visualization. The simulation can be paused at any time step so that the vector field can be explored. The density and placement of streamlines and glyphs are adjustable by the user, while panning and zooming are controlled by the mouse. The simulation can then be resumed. Unlike most scientific applications in computational fluid dynamics, where visualization is performed after the computations, our software allows for real-time visualization of the flow fields while the computations take place. To the best of our knowledge, no comparable GPU-based tool for FSI exists. Our software can facilitate debugging, enable observation of detailed local fields of flow and deformation while computing, and expedite identification of ‘correct’ parameter combinations in parametric studies of new phenomena. Therefore, our software is expected to shorten the ‘time to solution’ process and expedite scientific discoveries via scientific computing.

IUPUIScholarWorks