172 research outputs found

    Avoiding synchronization to accelerate a CFD solver in GPU

    caffa3d.MBRi is an open source, GPU-aware, general purpose incompressible flow solver that aims to provide a useful tool for the numerical simulation of real-world fluid flow problems requiring both geometrical flexibility and the parallel computation capabilities needed to afford simulations with tens to hundreds of millions of cells. At the core of this tool are a number of linear solvers that can be selected according to the characteristics of the problem to solve. For band matrices, the most efficient linear solver included in caffa3d.MBRi is the Strongly Implicit Procedure (SIP) solver. The parallelization of this solver follows the hyper-planes strategy, where the computations within one hyper-plane bear no dependencies on each other and can be executed in parallel, while the hyper-planes themselves have to be processed sequentially. In this work, we analyze this strategy to reach an efficient GPU implementation of the SIP solver for caffa3d.MBRi. In particular, we design and implement a self-scheduling procedure to avoid the overhead of CPU-GPU synchronization implied by the hyper-planes strategy, outperforming the standard GPU implementation of the SIP by approximately 2x. Agencia Nacional de Investigación e Innovació
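
    The hyper-planes idea can be illustrated with a minimal sketch, assuming a structured 3D grid in which each cell depends only on its three "lower" neighbours, so that all cells with the same index sum i + j + k are mutually independent. The function names, the toy recurrence, and the coefficient 0.5 are hypothetical and are not taken from caffa3d.MBRi; this is not the SIP factorization itself, only its dependency pattern.

```python
from collections import defaultdict


def hyperplanes(nx, ny, nz):
    """Group the cells (i, j, k) of an nx*ny*nz grid by h = i + j + k."""
    planes = defaultdict(list)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                planes[i + j + k].append((i, j, k))
    # h runs from 0 to (nx - 1) + (ny - 1) + (nz - 1)
    return [planes[h] for h in range(nx + ny + nz - 2)]


def forward_sweep(nx, ny, nz, rhs):
    """Toy forward substitution (hypothetical coefficients):
    x[c] = rhs[c] + 0.5 * (sum of x over the lower neighbours of c)."""
    x = {}
    for plane in hyperplanes(nx, ny, nz):  # planes must be visited in order
        for (i, j, k) in plane:            # cells in one plane are independent
            lower = [(i - 1, j, k), (i, j - 1, k), (i, j, k - 1)]
            # Neighbours outside the grid are absent from x and contribute 0.
            x[(i, j, k)] = rhs[(i, j, k)] + 0.5 * sum(x.get(c, 0.0) for c in lower)
    return x
```

    On a GPU, the inner loop would map to one kernel launch per hyper-plane; the self-scheduling procedure described in the abstract targets the synchronization between those launches.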

    Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

    In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation of graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and next-generation machines, they will need to employ some type of GPU sharing model, such as the one presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time

    An open and parallel multiresolution framework using block-based adaptive grids

    A numerical approach for solving evolutionary partial differential equations in two and three space dimensions on block-based adaptive grids is presented. The numerical discretization is based on high-order, central finite differences and explicit time integration. Grid refinement and coarsening are triggered by multiresolution analysis, i.e. thresholding of wavelet coefficients, which allows controlling the precision of the adaptive approximation of the solution with respect to uniform grid computations. The implementation of the scheme is fully parallel using MPI with a hybrid data structure. Load balancing relies on space-filling curve techniques. Validation tests for 2D advection equations allow assessing the precision and performance of the developed code. Computations of the compressible Navier-Stokes equations for a temporally developing 2D mixing layer illustrate the properties of the code for nonlinear multi-scale problems. The code is open source
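
    Load balancing with a space-filling curve can be sketched as follows, assuming a Morton (Z-order) curve over 2D integer block coordinates. The abstract does not say which curve the framework uses, so the choice of curve and the names `morton_key` and `partition` are hypothetical.

```python
def morton_key(ix, iy, bits=16):
    """Interleave the bits of (ix, iy) to get the block's Z-order index."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def partition(blocks, n_ranks):
    """Sort blocks along the curve and cut it into n_ranks contiguous chunks."""
    ordered = sorted(blocks, key=lambda blk: morton_key(*blk))
    base, extra = divmod(len(ordered), n_ranks)
    parts, start = [], 0
    for r in range(n_ranks):
        size = base + (1 if r < extra else 0)  # spread the remainder evenly
        parts.append(ordered[start:start + size])
        start += size
    return parts
```

    Because blocks that are close along the curve tend to be close in space, contiguous chunks give each rank a roughly equal number of blocks while keeping neighbouring blocks together.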

    Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers

    Nowadays, high performance computing (HPC) systems are at a disruptive moment, with a variety of novel architectures and frameworks and no clarity as to which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The proposed strategy consists in representing the whole time-integration algorithm using only three basic algebraic operations: the sparse matrix–vector product (SpMV), the linear combination of vectors, and the dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective consists in understanding the challenges of implementing CFD codes on new architectures. Peer Reviewed. Postprint (author's final draft
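
    The three algebraic kernels named in the abstract can be sketched in a few lines, assuming a matrix stored in compressed sparse row (CSR) form. The function names, the small tridiagonal matrix, and the explicit Euler step are hypothetical illustrations, not the paper's implementation.

```python
def spmv(row_ptr, col_idx, vals, x):
    """Sparse matrix-vector product y = A x for A stored in CSR form."""
    return [
        sum(vals[p] * x[col_idx[p]] for p in range(row_ptr[r], row_ptr[r + 1]))
        for r in range(len(row_ptr) - 1)
    ]


def axpy(a, x, y):
    """Linear combination of vectors: a * x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Dot product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical 3x3 operator tridiag(1, -2, 1) in CSR form.
ROW_PTR = [0, 2, 5, 7]
COL_IDX = [0, 1, 0, 1, 2, 1, 2]
VALS = [-2.0, 1.0, 1.0, -2.0, 1.0, 1.0, -2.0]


def euler_step(u, dt):
    """One explicit step u + dt * (A u), built only from spmv and axpy."""
    return axpy(dt, spmv(ROW_PTR, COL_IDX, VALS, u), u)
```

    Porting such a code to a new architecture then reduces to providing these three kernels for it, which is the source of the modularity the abstract refers to.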