EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs
Iterative stencil computations are widely used in numerical simulations. They present a high degree of parallelism, high locality and mostly coalesced memory access patterns. Therefore, GPUs are good candidates to speed up their computation. However, developing stencil programs that can work with huge grids in distributed systems with multiple GPUs is not straightforward, since it requires solving problems related to the partition of the grid across nodes and devices, and to the synchronization and data movement across remote GPUs. In this work, we present EPSILOD, a high-productivity parallel programming skeleton for iterative stencil computations on distributed multi-GPU systems, with GPUs of the same or different vendors, that supports any type of n-dimensional geometric stencil of any order. It uses an abstract specification of the stencil pattern (neighbors and weights) to internally derive the data partition, synchronizations and communications. Computation is split to better overlap with communications. This paper describes the underlying architecture of EPSILOD and its main components, and presents an experimental evaluation to show the benefits of our approach, including a comparison with another state-of-the-art solution. The experimental results show that EPSILOD is faster than the compared solution and shows good strong and weak scalability on platforms with both homogeneous and heterogeneous types of GPUs.
Junta de Castilla y León, Ministry of Economy, Industry and Competitiveness, and European Regional Development Fund (FEDER): Project PCAS (TIN2017-88614-R) and Project PROPHET-2 (VA226P20). Ministry of Science and Innovation, State Research Agency and "European Union NextGenerationEU/PRTR" (MCIN/AEI/10.13039/501100011033): grant TED2021-130367B-I00. CTE-POWER and Minotauro and the technical support provided by Barcelona Supercomputing Center (RES-IM-2021-2-0005, RES-IM-2021-3-0024, RES-IM-2022-1-0014). Open-access publication funded by the Consortium of University Libraries of Castilla y León (BUCLE), under Operational Programme 2014ES16RFOP009 FEDER 2014-2020 of Castilla y León, Action: 20007-CL - Apoyo Consorcio BUCL
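The idea of deriving partition and communication from the stencil specification alone can be illustrated with a small sequential sketch. It is not EPSILOD's API or implementation: the names `halo_width`, `apply_cell`, `step_global`, `partition_rows` and `step_partitioned`, the row-wise block partition, and the dictionary-of-offsets stencil format are all hypothetical. The halo depth per dimension is taken as the largest absolute neighbor offset along it, and each sweep is split into inner rows (no remote data needed, so on a GPU they could overlap the exchange) and border rows (computed after halos arrive). Halo rows are filled with NaN until the simulated exchange, so any inner row that wrongly read a halo would poison the result.

```python
from typing import Dict, List, Tuple

Offset = Tuple[int, int]          # (row offset, column offset) of a neighbor
Stencil = Dict[Offset, float]     # neighbor offset -> weight
Grid = List[List[float]]


def halo_width(stencil: Stencil) -> Tuple[int, int]:
    """Halo depth per dimension: the largest absolute neighbor offset along it."""
    return (max(abs(di) for di, _ in stencil),
            max(abs(dj) for _, dj in stencil))


def apply_cell(src: Grid, i: int, j: int, stencil: Stencil) -> float:
    """Weighted sum of the neighbors of cell (i, j)."""
    return sum(w * src[i + di][j + dj] for (di, dj), w in stencil.items())


def step_global(grid: Grid, stencil: Stencil) -> Grid:
    """Reference sweep over the whole grid; a frame of halo width stays fixed."""
    hi, hj = halo_width(stencil)
    out = [row[:] for row in grid]
    for i in range(hi, len(grid) - hi):
        for j in range(hj, len(grid[0]) - hj):
            out[i][j] = apply_cell(grid, i, j, stencil)
    return out


def partition_rows(grid: Grid, parts: int, h: int):
    """Split the interior rows into `parts` blocks (one per hypothetical device)."""
    interior = grid[h:len(grid) - h]
    size, extra = divmod(len(interior), parts)
    if size < max(h, 1):
        raise ValueError("each block must own at least h rows")
    blocks, start = [], 0
    for p in range(parts):
        end = start + size + (1 if p < extra else 0)
        blocks.append([row[:] for row in interior[start:end]])
        start = end
    return grid[:h], blocks, grid[len(grid) - h:]


def step_partitioned(blocks: List[Grid], top: Grid, bottom: Grid,
                     stencil: Stencil) -> List[Grid]:
    """One sweep over all blocks, split into inner and border rows."""
    h, hj = halo_width(stencil)
    nan = float("nan")
    new_blocks = []
    for p, own in enumerate(blocks):
        n, width = len(own), len(own[0])
        # Halo rows hold NaN until the (simulated) exchange completes.
        local = ([[nan] * width for _ in range(h)] + [row[:] for row in own]
                 + [[nan] * width for _ in range(h)])
        out = [row[:] for row in own]
        inner = range(2 * h, n)  # local rows whose neighbors are all owned
        border = [r for r in range(h, h + n) if r not in inner]
        # Inner rows need no remote data: on a GPU they could overlap the exchange.
        for r in inner:
            for j in range(hj, width - hj):
                out[r - h][j] = apply_cell(local, r, j, stencil)
        # "Receive" halos from neighbor blocks (or the fixed boundary rows).
        prev = blocks[p - 1] if p > 0 else None
        above = prev[len(prev) - h:] if prev is not None else top
        below = blocks[p + 1][:h] if p + 1 < len(blocks) else bottom
        local[:h] = [row[:] for row in above]
        local[h + n:] = [row[:] for row in below]
        # Border rows depend on halo data, so they run after the exchange.
        for r in border:
            for j in range(hj, width - hj):
                out[r - h][j] = apply_cell(local, r, j, stencil)
        new_blocks.append(out)
    return new_blocks


if __name__ == "__main__":
    # Hypothetical 5-point Jacobi-style stencil: radius 1 in both dimensions.
    five_point = {(0, 0): 0.2, (-1, 0): 0.2, (1, 0): 0.2, (0, -1): 0.2, (0, 1): 0.2}
    grid = [[float((7 * i + 3 * j) % 5) for j in range(8)] for i in range(10)]
    top, blocks, bottom = partition_rows(grid, parts=3, h=halo_width(five_point)[0])
    ref = grid
    for _ in range(4):
        ref = step_global(ref, five_point)
        blocks = step_partitioned(blocks, top, bottom, five_point)
    print(top + [row for b in blocks for row in b] + bottom == ref)  # True
```

Because both versions evaluate each cell from identical inputs in the same order, the reassembled partitioned grid matches the global reference exactly, which makes the sketch easy to check.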
AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs
Stencil computation is one of the most widely used compute patterns in high
performance computing applications. Spatial and temporal blocking have been
proposed to overcome the memory-bound nature of this type of computation by
moving memory pressure from external memory to on-chip memory on GPUs. However,
correctly implementing those optimizations while considering the complexity of
the architecture and memory hierarchy of GPUs to achieve high performance is
difficult. We propose AN5D, an automated stencil framework which is capable of
automatically transforming and optimizing stencil patterns in a given C source
code, and generating corresponding CUDA code. Parameter tuning in our framework
is guided by our performance model. Our novel optimization strategy reduces
shared memory and register pressure in comparison to existing implementations,
allowing performance scaling up to a temporal blocking degree of 10. We achieve
the highest performance reported so far for all evaluated stencil benchmarks on
the state-of-the-art Tesla V100 GPU.
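The principle behind temporal blocking, fusing several time steps per pass over a tile so that intermediate values stay in fast memory, can be shown with a 1D 3-point stencil in plain Python. This is only a sketch of the generic overlapped-tiling variant, not AN5D's blocking scheme or its generated CUDA; the functions `step`, `naive` and `temporally_blocked` are hypothetical. A tile of width B advanced T steps with a radius-1 stencil is loaded with T extra cells per side, since stale values from a cut buffer edge creep inward by one cell per step.

```python
from typing import List


def step(u: List[float]) -> List[float]:
    """One Jacobi sweep of a 3-point average; the two end cells are fixed boundaries."""
    if len(u) < 3:
        return u[:]
    return [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3.0
                     for i in range(1, len(u) - 1)] + [u[-1]]


def naive(u: List[float], steps: int) -> List[float]:
    """Reference: one full pass over the array ("external memory") per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u: List[float], steps: int, tile: int) -> List[float]:
    """Advance `steps` time steps with a single pass over tiles of width `tile`.

    Each tile is loaded with a halo of `steps` cells per side (radius 1 times
    the number of steps), advanced locally, and only its central part is
    written back. Cells at a cut buffer edge go stale after the first step,
    and the error moves inward one cell per step, so the tile stays exact.
    """
    n = len(u)
    out = u[:]
    for s in range(0, n, tile):
        e = min(n, s + tile)
        a, b = max(0, s - steps), min(n, e + steps)
        local = naive(u[a:b], steps)  # stands in for on-chip (shared memory/register) work
        out[s:e] = local[s - a:e - a]
    return out


if __name__ == "__main__":
    u0 = [float((37 * i) % 11) for i in range(50)]
    print(temporally_blocked(u0, steps=10, tile=5) == naive(u0, 10))  # True
```

In this overlapped variant each tile does roughly (B + 2T)/B times the arithmetic of the naive sweep in exchange for reading and writing the array once instead of T times; that redundancy-versus-traffic trade-off, together with the on-chip memory each tile needs, is what makes the choice of blocking degree a tuning problem.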
Large scale simulations of swirling and particle-laden flows using the Lattice-Boltzmann Method
Since the development of high-performance computers, numerical simulations have evolved into an important scientific tool that uses mathematical modeling to address physical problems that are difficult to handle experimentally. Predicting the behavior of physical systems that are not directly observable helps to design and optimize new technology. Computational fluid dynamics in particular aims to understand natural flow phenomena as well as to design and operate engineering processes in industry. With the continuous increase in computational power every year, the question of how to use computational resources efficiently becomes increasingly important. Improving existing practices involves a better understanding of the underlying physical mechanisms as well as optimizing the algorithms used to solve them with robust and rapid numerical methods. The Lattice-Boltzmann method (LBM) is a mesoscopic approach to approximating the macroscopic mass and momentum balance equations of a fluid flow. The objective of this study is to apply this concept to large-scale problems and to present its capabilities in terms of physical modelling and computing efficiency. As a validation step, the computational models are tested against reference theoretical, numerical and experimental data over a wide range of hydrodynamic conditions, from creeping to turbulent flows and granular media. Turbulent flows are multi-scale flows that require fine meshes and long simulation times for statistics to converge. Special care is taken to verify the fluid-solid interface for dispersed two-phase flows. Two main setups are examined: the non-reacting swirling flow inside an injector and a particle-laden flow around a cylinder. Swirling flows are typical of aeronautical combustion chambers. The selected configuration is used to benchmark three different large eddy simulation solvers regarding their accuracy and computational efficiency.
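To make the "mesoscopic" description concrete, the sketch below performs textbook D2Q9 lattice-BGK collide-and-stream steps on a fully periodic grid in pure Python. It assumes the standard D2Q9 velocity set and weights, the second-order equilibrium, and lattice units with c_s^2 = 1/3, so the kinematic viscosity is nu = (tau - 1/2)/3; density and momentum are recovered as moments of the populations. It is not one of the solvers benchmarked in this work and omits walls, forcing, particles and turbulence modeling; the helper names are hypothetical.

```python
import math
from typing import List, Tuple

# Standard D2Q9 velocity set c_i and weights w_i: rest, 4 axis, 4 diagonal directions.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4

Field = List[List[List[float]]]  # f[x][y][i]: population i at node (x, y)


def equilibrium(rho: float, ux: float, uy: float) -> List[float]:
    """Second-order equilibrium in lattice units (c_s^2 = 1/3)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def moments(fi: List[float]) -> Tuple[float, float, float]:
    """Macroscopic density and velocity recovered from the populations."""
    rho = sum(fi)
    ux = sum(f * cx for f, (cx, _) in zip(fi, C)) / rho
    uy = sum(f * cy for f, (_, cy) in zip(fi, C)) / rho
    return rho, ux, uy


def collide_and_stream(f: Field, tau: float) -> Field:
    """BGK collision followed by streaming on a fully periodic grid."""
    nx, ny = len(f), len(f[0])
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            feq = equilibrium(*moments(f[x][y]))
            for i, (cx, cy) in enumerate(C):
                post = f[x][y][i] - (f[x][y][i] - feq[i]) / tau  # relax toward equilibrium
                new[(x + cx) % nx][(y + cy) % ny][i] = post      # move along c_i
    return new


def kinematic_viscosity(tau: float) -> float:
    """Lattice viscosity of the BGK model: nu = c_s^2 (tau - 1/2) = (tau - 1/2) / 3."""
    return (tau - 0.5) / 3.0


def shear_wave(nx: int, ny: int, amplitude: float) -> Field:
    """Initial condition: u_x = A sin(2 pi y / ny), u_y = 0, rho = 1, at equilibrium."""
    return [[equilibrium(1.0, amplitude * math.sin(2 * math.pi * y / ny), 0.0)
             for y in range(ny)] for _ in range(nx)]
```

A decaying shear wave is a convenient check: collisions conserve local mass and momentum and streaming only moves populations, so the totals stay constant while the velocity amplitude decays at a rate set by `kinematic_viscosity(tau)`.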
The obtained numerical results are compared to experimental results in terms of mean and fluctuating velocity profiles and pressure drop. The scaling, that is, the code performance over a large range of processor counts, is characterized. Differences between several algorithmic approaches and different solvers are evaluated and discussed. Next, we focus on particle-laden flows around a cylinder as a generic configuration for the interaction between a dispersed phase and hydrodynamic flow instabilities. It has been shown that the viscosity of a suspension increases with the particle volume fraction and that, for a certain range of particle materials and concentrations, this is a fairly good model of interphase coupling. This phenomenon only appears in numerical simulations that are able to describe finite-size effects for rigid bodies. Comparing global flow parameters of suspensions at different particle volume fractions and sizes has shown that these flow features can be obtained for an equivalent single-phase fluid with an effective viscosity. Starting from neutrally buoyant particles, the transition to granular flow is investigated. By increasing the relative density of the particles, the influence of particle inertia on the equivalent-fluid prediction is investigated, and the contribution of particle collisions to the drag coefficient for varying relative densities is discussed. Conclusions are drawn regarding the code performance and the physical representativeness of the results.
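The equivalent-fluid idea can be sketched as: pick an effective viscosity for the suspension, then simulate a single-phase fluid with it. The abstract does not say which viscosity law is used, so this is only an illustration under stated assumptions: Einstein's dilute-limit relation mu_eff/mu = 1 + 2.5 phi and the Krieger-Dougherty correlation mu_eff/mu = (1 - phi/phi_max)^(-2.5 phi_max), with a user-chosen maximum packing fraction phi_max. The effective viscosity is then mapped to an LBM-BGK relaxation time via tau = 3 nu + 1/2 (lattice units). All function names and the example values are hypothetical.

```python
def einstein_relative_viscosity(phi: float) -> float:
    """Einstein's dilute-limit law for rigid spheres: mu_eff / mu = 1 + 2.5 phi."""
    return 1.0 + 2.5 * phi


def krieger_dougherty_relative_viscosity(phi: float, phi_max: float,
                                         intrinsic: float = 2.5) -> float:
    """Krieger-Dougherty: mu_eff / mu = (1 - phi / phi_max) ** (-intrinsic * phi_max)."""
    if not 0.0 <= phi < phi_max:
        raise ValueError("phi must lie in [0, phi_max)")
    return (1.0 - phi / phi_max) ** (-intrinsic * phi_max)


def bgk_tau(nu: float) -> float:
    """BGK relaxation time for lattice kinematic viscosity nu (c_s^2 = 1/3)."""
    return 3.0 * nu + 0.5


def equivalent_fluid(nu: float, phi: float, phi_max: float, u: float, d: float) -> dict:
    """Effective single-phase parameters, all in lattice units.

    Assumes neutrally buoyant particles, so the suspension density equals the
    fluid density and nu_eff / nu = mu_eff / mu.
    """
    mu_r = krieger_dougherty_relative_viscosity(phi, phi_max)
    nu_eff = nu * mu_r
    return {"mu_r": mu_r, "nu_eff": nu_eff,
            "Re_eff": u * d / nu_eff, "tau": bgk_tau(nu_eff)}


if __name__ == "__main__":
    # Hypothetical values: nu = 0.02, cylinder diameter 40 nodes, inflow speed 0.05,
    # phi_max = 0.64 (a common random-close-packing choice, assumed here).
    for phi in (0.0, 0.05, 0.1, 0.2):
        print(phi, equivalent_fluid(0.02, phi, 0.64, 0.05, 40.0))
```

For small phi the Krieger-Dougherty form reduces to Einstein's law, while at higher volume fractions it grows faster; the effective Reynolds number drops and tau rises accordingly.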