15 research outputs found

    GPU Communication Performance Engineering for the Lattice Boltzmann Method

    Get PDF
    Orientador : Prof. Dr. Daniel WeingaertnerDissertação (mestrado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa: Curitiba, 10/08/2016Inclui referências : f. 59-62Área de concentração: Ciência da computaçãoResumo: A crescente importância do uso de GPUs para computação de propósito geral em supercomputadores faz com que o bom suporte a GPUs seja uma característica valiosa de frameworks de software para computação de alto desempenho como o waLBerla. waLBerla é um framework de software altamente paralelo que suporta uma ampla gama de fenômenos físicos. Embora apresente um bom desempenho em CPUs, testes demonstraram que as suas soluções de comunicação para GPU têm um desempenho ruim. Neste trabalho são apresentadas soluções para melhorar o desempenho, a eficiência do uso de memória e a usabilidade do waLBerla em supercomputadores baseados em GPU. A infraestrutura de comunicação proposta para GPUs NVIDIA com suporte a CUDA mostrou-se 25 vezes mais rápida do que o mecanismo de comunicação para GPU disponíveis anteriormente no waLBerla. Nossa solução para melhorar a eficiência do uso de memória da GPU permite usar 55% da memória necessária por uma abordagem simplista, o que possibilita executar simulações com domínios maiores ou usar menos GPUs para um determinado tamanho de domínio. Adicionalmente, levando-se em consideração que o desempenho de kernels CUDA se mostrou altamente sensível ao modo como a memória da GPU é acessada e a detalhes de implementação, foi proposto um mecanismo de indexação flexível de domínio que permite configurar as dimensões dos blocos de threads. Além disso, uma aplicação do Lattice Boltzmann Method (LBM) foi desenvolvida com kernels CUDA altamente otimizados a fim de se realizar todos os experimentos e testar todas as soluções propostas para o waLBerla. Palavras-chave: HPC, GPU, CUDA, Comunicação, Memória, Lattice Boltzmann Method, waLBerla.Abstract: The increasing importance of GPUs for general-purpose computation on supercomputers makes a good GPU support by High-Performance Computing (HPC) software frameworks such as waLBerla a valuable feature. waLBerla is a massively parallel software framework that supports a wide range of physical phenomena. Although it presents good performance on CPUs, tests have shown that its available GPU communication solutions perform poorly. In this work, we present solutions for improving waLBerla's performance, memory usage e_ciency and usability on GPUbased supercomputers. The proposed communication infrastructure for CUDA-enabled NVIDIA GPUs executed 25 times faster than the GPU communication mechanism previously available on waLBerla. Our solution for improving GPU memory usage e_ciency allowed for using 55% of the memory required by a naive approach, which makes possible for running simulations with larger domains or using fewer GPUs for a given domain size. In addition, as CUDA kernel performance showed to be very sensitive to the way data is accessed in GPU memory and kernel implementation details, we proposed a flexible domain indexing mechanism that allows for configuring thread block sizes. Finally, a Lattice Boltzmann Method (LBM) application was developed with highly optimized CUDA kernels in order to carry out all experiments and test all proposed solutions for waLBerla. Keywords: HPC, GPU, CUDA, Communication, Memory, Lattice Boltzmann Method, waLBerla

    Comparison of free-surface and conservative Allen-Cahn phase-field lattice Boltzmann method

    Full text link
    This study compares the free-surface lattice Boltzmann method (FSLBM) with the conservative Allen-Cahn phase-field lattice Boltzmann method (PFLBM) in their ability to model two-phase flows in which the behavior of the system is dominated by the heavy phase. Both models are introduced and their individual properties, strengths and weaknesses are thoroughly discussed. Six numerical benchmark cases were simulated with both models, including (i) a standing gravity and (ii) capillary wave, (iii) an unconfined rising gas bubble in liquid, (iv) a Taylor bubble in a cylindrical tube, and (v) the vertical and (vi) oblique impact of a drop into a pool of liquid. Comparing the simulation results with either analytical models or experimental data from the literature, four major observations were made. Firstly, the PFLBM selected was able to simulate flows purely governed by surface tension with reasonable accuracy. Secondly, the FSLBM, a sharp interface model, generally requires a lower resolution than the PFLBM, a diffuse interface model. However, in the limit case of a standing wave, this was not observed. Thirdly, in simulations of a bubble moving in a liquid, the FSLBM accurately predicted the bubble's shape and rise velocity with low computational resolution. Finally, the PFLBM's accuracy is found to be sensitive to the choice of the model's mobility parameter and interface width

    Advanced Automatic Code Generation for Multiple Relaxation-Time Lattice Boltzmann Methods

    Full text link
    The scientific code generation package lbmpy supports the automated design and the efficient implementation of lattice Boltzmann methods (LBMs) through metaprogramming. It is based on a new, concise calculus for describing multiple relaxation-time LBMs, including techniques that enable the numerically advantageous subtraction of the constant background component from the populations. These techniques are generalized to a wide range of collision spaces and equilibrium distributions. The article contains an overview of lbmpy's front-end and its code generation pipeline, which implements the new LBM calculus by means of symbolic formula manipulation tools and object-oriented programming. The generated codes have only a minimal number of arithmetic operations. Their automatic derivation rests on two novel Chimera transforms that have been specifically developed for efficiently computing raw and central moments. Information contained in the symbolic representation of the methods is further exploited in a customized sequence of algebraic simplifications, further reducing computational cost. When combined, these algebraic transformations lead to concise and compact numerical kernels. Specifically, with these optimizations, the advanced central moment- and cumulant-based methods can be realized with only little additional cost as when compared with the simple BGK method. The effectiveness and flexibility of the new lbmpy code generation system is demonstrated in simulating Taylor-Green vortex decay and the automatic derivation of an LBM algorithm to solve the shallow water equations.Comment: 23 pages, 6 figure

    Massively Parallel Stencil Code Solver with Autonomous Adaptive Block Distribution

    Get PDF

    Multiphysics simulations: challenges and opportunities.

    Full text link
    corecore