8 research outputs found

    EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs

    Iterative stencil computations are widely used in numerical simulations. They present a high degree of parallelism, high locality and mostly-coalesced memory access patterns. Therefore, GPUs are good candidates to speed up their computation. However, the development of stencil programs that can work with huge grids in distributed systems with multiple GPUs is not straightforward, since it requires solving problems related to the partition of the grid across nodes and devices, and the synchronization and data movement across remote GPUs. In this work, we present EPSILOD, a high-productivity parallel programming skeleton for iterative stencil computations on distributed multi-GPU systems, with devices from the same or different vendors, that supports any type of n-dimensional geometric stencil of any order. It uses an abstract specification of the stencil pattern (neighbors and weights) to internally derive the data partition, synchronizations and communications. Computation is split to better overlap with communications. This paper describes the underlying architecture of EPSILOD and its main components, and presents an experimental evaluation to show the benefits of our approach, including a comparison with another state-of-the-art solution. The experimental results show that EPSILOD is faster and shows good strong and weak scalability for platforms with both homogeneous and heterogeneous types of GPU.
    Junta de Castilla y León, Ministerio de Economía, Industria y Competitividad, and Fondo Europeo de Desarrollo Regional (FEDER): PCAS project (TIN2017-88614-R) and PROPHET-2 project (VA226P20). Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación and "European Union NextGenerationEU/PRTR" (MCIN/AEI/10.13039/501100011033): grant TED2021-130367B-I00. CTE-POWER and Minotauro and the technical support provided by Barcelona Supercomputing Center (RES-IM-2021-2-0005, RES-IM-2021-3-0024, RES-IM-2022-1-0014). Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under Operational Programme 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, Action: 20007-CL - Apoyo Consorcio BUCL

    AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs

    Stencil computation is one of the most widely-used compute patterns in high performance computing applications. Spatial and temporal blocking have been proposed to overcome the memory-bound nature of this type of computation by moving memory pressure from external memory to on-chip memory on GPUs. However, correctly implementing those optimizations while considering the complexity of the architecture and memory hierarchy of GPUs to achieve high performance is difficult. We propose AN5D, an automated stencil framework which is capable of automatically transforming and optimizing stencil patterns in a given C source code, and generating corresponding CUDA code. Parameter tuning in our framework is guided by our performance model. Our novel optimization strategy reduces shared memory and register pressure in comparison to existing implementations, allowing performance scaling up to a temporal blocking degree of 10. We achieve the highest performance reported so far for all evaluated stencil benchmarks on the state-of-the-art Tesla V100 GPU

    Large scale simulations of swirling and particle-laden flows using the Lattice-Boltzmann Method

    Since the development of high performance computers, numerical simulations have evolved into an important scientific tool that uses mathematical modeling to address physical problems that are complex to handle experimentally. Predicting the behavior of physical systems which are not directly observable helps to design and optimize new technology. Computational fluid dynamics in particular aims to understand natural flow phenomena as well as to design and operate engineering processes in industry. With the continuous increase in computational power every year, the question of how to use computational resources efficiently becomes increasingly important. Improving existing practices involves a better understanding of the underlying physical mechanisms as well as optimizing the algorithms that are used to solve them with robust and rapid numerical methods. The Lattice-Boltzmann method (LBM) is a mesoscopic approach to approximate the macroscopic mass and momentum balance equations for a fluid flow. The objective of this study is to apply this concept to large scale problems and present its capabilities in terms of physical modelling and computing efficiency. As a validation step, computational models are tested against referenced theoretical, numerical and experimental evidence over a wide range of hydrodynamic conditions, from creeping to turbulent flows and granular media. Turbulent flows are multi-scale flows that require fine meshes and long simulation times to converge statistics. Special care is taken to verify the fluid-solid interface for dispersed two-phase flows. Two main setups are examined: the non-reacting, swirling flow inside an injector and a particle-laden flow around a cylinder. Swirling flows are typical of aeronautical combustion chambers. The selected configuration is used to benchmark three different large eddy simulation solvers regarding their accuracy and computational efficiency. The obtained numerical results are compared to experimental results in terms of mean and fluctuating velocity profiles and pressure drop. The scaling, that is, the code performance on a large range of processors, is characterized. Differences between several algorithmic approaches and different solvers are evaluated and discussed. Next, we focus on particle-laden flows around a cylinder as a generic configuration for the interaction of a dispersed phase and flow hydrodynamic instabilities. It has been shown that the viscosity of a suspension increases with the particle volume fraction and, for a certain range of particle material and concentration, this is a fairly good model of interphase coupling. This phenomenon only occurs in numerical simulations that are able to describe finite size effects for rigid bodies. Comparing global flow parameters of suspensions at different particle volume fractions and sizes has shown that these flow features can be obtained for an equivalent single-phase fluid with effective viscosity. Starting from neutrally buoyant particles, the transition to granular flow is investigated. By increasing the relative density of particles, the influence of particle inertia on the equivalent fluid prediction is investigated, and the contribution of particle collisions to the drag coefficient for varying relative densities is discussed. Conclusions are drawn regarding the code performance and physical representativeness of results

    Exploring Scheduling for On-demand File Systems and Data Management within HPC Environments

