
    A matrix-free high-order discontinuous Galerkin compressible Navier-Stokes solver: A performance comparison of compressible and incompressible formulations for turbulent incompressible flows

    Both compressible and incompressible Navier-Stokes solvers can be, and are, used to solve incompressible turbulent flow problems. In the compressible case, the Mach number is then treated as a solver parameter set to a small value, M ≈ 0.1, in order to mimic incompressible flows. This strategy is widely used for high-order discontinuous Galerkin discretizations of the compressible Navier-Stokes equations. The present work raises the question of the computational efficiency of compressible DG solvers compared to a genuinely incompressible formulation. Our contributions to the state of the art are twofold. Firstly, we present a high-performance discontinuous Galerkin solver for the compressible Navier-Stokes equations based on a highly efficient matrix-free implementation targeting modern cache-based multicore architectures. The performance results presented in this work focus on node-level performance, and they suggest that there is great potential for further performance improvements in current state-of-the-art discontinuous Galerkin implementations of the compressible Navier-Stokes equations. Secondly, this compressible Navier-Stokes solver is put into perspective by comparing it to an incompressible DG solver that uses the same matrix-free implementation. We discuss algorithmic differences between the two solution strategies and present an in-depth numerical investigation of their performance. The benchmark test cases considered are the three-dimensional Taylor-Green vortex problem, as a representative of transitional flows, and the turbulent channel flow problem, as a representative of wall-bounded turbulent flows.
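    A back-of-the-envelope calculation illustrates why the low-Mach strategy is costly for explicit compressible solvers: the acoustic wave speed inflates the CFL limit by roughly a factor 1 + 1/M. The Python sketch below uses illustrative grid and flow parameters (not taken from the paper) to quantify this.

        # Rough estimate of the acoustic time-step penalty when a compressible
        # solver is run in the low-Mach regime (illustrative numbers, not from
        # the paper): an explicit scheme must resolve waves at speed |u| + c,
        # while an incompressible solver is limited only by |u|.
        h = 1.0 / 64          # representative mesh size (assumed)
        u = 1.0               # convective velocity scale (assumed)
        M = 0.1               # low Mach number used to mimic incompressibility
        c = u / M             # speed of sound implied by the Mach number

        dt_compressible = h / (u + c)   # acoustic CFL limit
        dt_incompressible = h / u       # convective CFL limit

        print(f"dt ratio (incompressible / compressible): "
              f"{dt_incompressible / dt_compressible:.1f}x")
        # -> roughly (u + c)/u = 1 + 1/M = 11x more steps for the
        #    compressible solver at the same CFL number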

    STREAmS: a high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flow

    We present STREAmS, an in-house high-fidelity solver for large-scale, massively parallel direct numerical simulations (DNS) of compressible turbulent flows on graphics processing units (GPUs). STREAmS is written in Fortran 90 and is tailored to carry out DNS of canonical compressible wall-bounded flows, namely the turbulent plane channel, the zero-pressure-gradient turbulent boundary layer, and supersonic oblique shock-wave/boundary-layer interactions. The solver incorporates state-of-the-art numerical algorithms, specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows, and can be used across a wide range of Mach numbers, extending from the low subsonic up to the hypersonic regime. The use of CUF automatic kernels allowed an easy and efficient porting to the GPU architecture, minimizing the changes to the original CPU code, which is also maintained. We discuss a memory allocation strategy based on duplicated arrays for host and device which carefully minimizes the memory usage, making the solver suitable for large-scale computations on the latest GPU cards. Comparisons between different CPU and GPU architectures strongly favor the latter, and executing the solver on a single NVIDIA Tesla P100 corresponds to using approximately 330 Intel Knights Landing CPU cores. STREAmS shows very good strong scalability and essentially ideal weak scalability up to 2048 GPUs, paving the way to simulations in the genuine high-Reynolds-number regime, possibly at friction Reynolds numbers Re_τ > 10^4. The solver is released open source under the GPLv3 license and is available at https://github.com/matteobernardini/STREAmS.
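    The duplicated host/device array strategy can be sketched in a few lines. The snippet below uses CuPy purely as a stand-in for the paper's CUDA Fortran (STREAmS itself contains no Python; array names, sizes, and the placeholder kernel are all illustrative assumptions): the field lives on the device for the whole run and the host copy is refreshed only for I/O.

        # Minimal sketch of duplicated host/device arrays, with CuPy standing
        # in for CUDA Fortran (names and the kernel are illustrative, not
        # from STREAmS). Heavy work stays on the GPU; the host mirror is
        # touched only when output is needed.
        import numpy as np
        import cupy as cp

        nx, ny, nz = 256, 128, 128
        w_host = np.zeros((nx, ny, nz), dtype=np.float64)  # host copy (I/O only)
        w_dev = cp.asarray(w_host)                         # device twin

        for step in range(100):
            # all heavy kernels operate on the device array ...
            w_dev += 0.001 * cp.sin(w_dev)                 # placeholder kernel
            if step % 50 == 0:
                # ... and the host mirror is refreshed only for output
                w_host = cp.asnumpy(w_dev)
                print(f"step {step}: mean = {w_host.mean():.3e}")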

    STREAmS: A high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flows

    We present STREAmS, an in-house high-fidelity solver for direct numerical simulations (DNS) of canonical compressible wall-bounded flows, namely the turbulent plane channel, the zero-pressure-gradient turbulent boundary layer, and the supersonic oblique shock-wave/boundary-layer interaction. The solver incorporates state-of-the-art numerical algorithms, specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows, and can be used across a wide range of Mach numbers, extending from the low subsonic up to the hypersonic regime. From the computational viewpoint, STREAmS is oriented to modern HPC platforms thanks to MPI parallelization and the ability to run on multi-GPU architectures. This paper discusses the main implementation strategies, with particular reference to the CUDA paradigm, the management of a single code for traditional and multi-GPU architectures, and the optimization process to take advantage of the latest generation of NVIDIA GPUs. Performance measurements show that single-GPU optimization more than halves the computing time compared to the baseline version. At the same time, the asynchronous patterns implemented in STREAmS for MPI communications guarantee very good parallel performance, especially in the weak-scaling spirit, with efficiency exceeding 97% on 1024 GPUs. For an overall evaluation of STREAmS with respect to other compressible solvers, a comparison with a recent GPU-enabled community solver is presented. It turns out that, although STREAmS is much more limited in terms of the flow configurations it can address, its advantage in terms of accuracy, computing time, and memory occupation is substantial, which makes it an ideal candidate for large-scale simulations of high-Reynolds-number, compressible wall-bounded turbulent flows. The solver is released open source under the GPLv3 license.
    Program summary:
    Program Title: STREAmS
    CPC Library link to program files: https://doi.org/10.17632/hdcgjpzr3y.1
    Developer's repository link: https://github.com/matteobernardini/STREAmS
    Code Ocean capsule: https://codeocean.com/capsule/8931507/tree/v2
    Licensing provisions: GPLv3
    Programming language: Fortran 90, CUDA Fortran, MPI
    Nature of problem: Solving the three-dimensional compressible Navier–Stokes equations for low and high Mach regimes in a Cartesian domain configured for channel, boundary layer, or shock/boundary-layer interaction flows.
    Solution method: The convective terms are discretized using a hybrid energy-conservative shock-capturing scheme in locally conservative form. Shock-capturing capabilities rely on the use of Lax–Friedrichs flux vector splitting and weighted essentially non-oscillatory (WENO) reconstruction. The system is advanced in time using a three-stage, third-order RK scheme. Two-dimensional pencil-distributed MPI parallelization is implemented alongside different patterns of GPU-accelerated (CUDA Fortran) routines.
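    The three-stage, third-order RK update named in the program summary admits a very compact statement. The sketch below shows the classical Shu–Osher SSP form applied to a generic semi-discrete operator (that STREAmS uses exactly these coefficients is an assumption; the spatial operator here is a plain central difference, standing in for the solver's hybrid energy-conservative/WENO fluxes).

        # Three-stage, third-order Runge-Kutta step in the classical
        # Shu-Osher SSP form (the summary names a "three-stage, third-order
        # RK scheme"; the exact coefficients used in STREAmS are an assumption).
        import numpy as np

        def rk3_step(u, rhs, dt):
            """Advance u by one step of SSP-RK3 for du/dt = rhs(u)."""
            u1 = u + dt * rhs(u)
            u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
            return u / 3.0 + 2.0 / 3.0 * (u2 + dt * rhs(u2))

        # Usage: linear advection du/dt = -a du/dx on a periodic grid.
        n, a = 128, 1.0
        x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
        dx = x[1] - x[0]
        u = np.sin(x)

        def rhs(u):
            # central difference in space (placeholder for the hybrid
            # energy-conservative / WENO fluxes of the actual solver)
            return -a * (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)

        for _ in range(200):
            u = rk3_step(u, rhs, dt=0.4 * dx / a)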

    An unstructured parallel least-squares spectral element solver for incompressible flow problems

    The parallelization of the least-squares spectral element formulation of the Stokes problem has recently been discussed for incompressible flow problems on structured grids. In the present work, the extension to unstructured grids is discussed. It will be shown that, to obtain an efficient and scalable method, two different kinds of data distribution are required, involving a rather complicated parallel conversion between them. Once the data conversion has been performed, a large symmetric positive definite algebraic system has to be solved iteratively. It is well known that the Conjugate Gradient method is a good choice for such systems. To improve the convergence rate of the Conjugate Gradient process, both Jacobi and Additive Schwarz preconditioners are applied. The Additive Schwarz preconditioner is based on domain decomposition and can be implemented such that a preconditioning step corresponds to a parallel matrix-vector product. The new results reveal that the Additive Schwarz preconditioner is very suitable for the p-refinement version of the least-squares spectral element method. To obtain portable programs that may run on distributed-memory multiprocessors, networks of workstations, and shared-memory machines, we use MPI (the Message Passing Interface). Numerical simulations have been performed to validate the scalability of the different parts of the proposed method. The experiments entailed simulating several large-scale incompressible flows on a Cray T3E and on an SGI Origin 3800, with the number of processors varying from one to more than one hundred. The results indicate that the present method has very good parallel scaling properties, making it a powerful method for numerical simulations of incompressible flows.
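    The Jacobi-preconditioned Conjugate Gradient loop the abstract refers to is short enough to sketch. Below is a generic serial implementation for an SPD system (an illustration, not the paper's code); the Additive Schwarz variant would replace the diagonal solve in apply_prec with overlapping local subdomain solves.

        # Jacobi-preconditioned Conjugate Gradient for an SPD system A x = b.
        # A serial sketch of the iteration described in the abstract; the
        # Additive Schwarz preconditioner would swap the diagonal solve below
        # for overlapping subdomain solves.
        import numpy as np

        def pcg(A, b, tol=1e-10, max_iter=500):
            d = np.diag(A)                      # Jacobi preconditioner M = diag(A)
            apply_prec = lambda r: r / d
            x = np.zeros_like(b)
            r = b - A @ x
            z = apply_prec(r)
            p = z.copy()
            rz = r @ z
            for k in range(max_iter):
                Ap = A @ p
                alpha = rz / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) < tol * np.linalg.norm(b):
                    return x, k + 1
                z = apply_prec(r)
                rz_new = r @ z
                p = z + (rz_new / rz) * p
                rz = rz_new
            return x, max_iter

        # Usage on a small SPD test matrix (1D Laplacian):
        n = 200
        A = (np.diag(2.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1)
             + np.diag(-np.ones(n - 1), -1))
        b = np.ones(n)
        x, iters = pcg(A, b)
        print(f"converged in {iters} iterations, residual "
              f"{np.linalg.norm(b - A @ x):.2e}")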

    Towards Efficient and Scalable Discontinuous Galerkin Methods for Unsteady Flows

    In recent years, the increasing availability of High Performance Computing (HPC) resources has strongly promoted the spread of high-fidelity simulations, such as Large Eddy Simulation (LES), for industrial research and design. One of the most promising approaches to these kinds of simulations is based on the discontinuous Galerkin (dG) discretization method. The contribution of the thesis to this research area is threefold. First, the work introduces an efficient hybrid MPI/OpenMP parallelization paradigm to fruitfully exploit large HPC facilities. Second, it reports efficient, scalable, and memory-saving solution strategies for stiff dG discretizations. Third, it compares those solution strategies, for the first time within the same numerical framework, to hybridizable discontinuous Galerkin (HDG) methods, including a novel implementation of a p-multigrid preconditioning approach, on unsteady flow problems involving the solution of the Navier–Stokes equations. The improvements in computational efficiency have been evaluated on cases of growing complexity involving large eddy simulations of turbulent flows. First, the Rayleigh–Bénard convection problem and the turbulent channel flow at moderately high Reynolds numbers are presented. The solution strategies proposed proved up to five times faster than standard matrix-based methods while allocating only 7% of the memory. A second family of test cases involves the LES of a rounded-leading-edge flat plate under different levels of free-stream turbulence. Despite the increased stiffness of the iteration matrix due to the use of curved and stretched elements, the solver proved more than three times faster while allocating 15% of the memory compared to standard methods. Finally, the large eddy simulation of the Boeing Rudimentary Landing Gear at Re = 10^6 is reported. In all the cases, a remarkable agreement with experimental data as well as previous numerical simulations is documented.
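    The memory figures quoted above stem from avoiding an assembled iteration matrix. The sketch below contrasts an assembled operator with a matrix-free application for a simple 1D Laplacian (an illustrative stand-in, not the thesis's dG Navier–Stokes operator): the operator is applied on the fly, so only vectors are stored.

        # Why matrix-free solvers save memory: the operator is applied on
        # the fly instead of being assembled and stored. Illustrated on a
        # 1D Dirichlet Laplacian (a stand-in assumed for illustration; the
        # thesis applies the same idea to dG residual operators).
        import numpy as np

        n = 1_000_000
        h = 1.0 / (n + 1)

        def apply_laplacian(u):
            """Matrix-free y = A u for the 1D Laplacian: O(n) memory."""
            y = 2.0 * u
            y[:-1] -= u[1:]
            y[1:] -= u[:-1]
            return y / h**2

        # An assembled matrix would need ~3n stored nonzeros even in sparse
        # format; the matrix-free version stores nothing beyond the vectors,
        # which is what enables memory savings of the kind reported above.
        u = np.random.rand(n)
        y = apply_laplacian(u)
        print(f"vector storage only: {(u.nbytes + y.nbytes) / 1e6:.0f} MB")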

    Development of a high-order parallel solver for direct and large eddy simulations of turbulent flows

    Turbulence is inherent in fluid dynamics, in that laminar flows are the exception rather than the rule, hence the longstanding interest in the subject, both within the academic community and in industrial R&D laboratories. Since Reynolds' experiments of 1883, much progress has been made: statistics applied to turbulence have provided an understanding of the scaling laws peculiar to several model flows, and experiments have given insight into the structure of real-world flows; but, soon enough, numerical approaches to the matter became the most promising ones, since they lay the ground for the solution of the unsteady Navier-Stokes equations at high Reynolds number by means of computer systems. Nevertheless, despite the exponential rise in computational capability over the last few decades, the more computer technology advances, the higher the Reynolds number sought for test cases of industrial interest: there is a natural tendency to perform simulations as large as possible, a habit that leaves no room for wasting resources. Indeed, as the scale separation grows with Re, the reduction of wall-clock time for a high-fidelity solution of desired accuracy becomes increasingly important. To achieve this task, a CFD solver should rely on appropriate physical models, consistent numerical methods to discretize the equations, accurate non-dissipative numerical schemes, efficient algorithms to solve the numerics, and fast routines implementing those algorithms. Two archetypal approaches to CFD are direct and large-eddy simulation (DNS and LES, respectively), which differ profoundly in several aspects but are both "eddy-resolving" methods, meant to resolve the structures of the flow field with the highest possible accuracy while introducing as little spurious dissipation as possible. These two requirements, accurate resolution of scales and energy conservation, should be addressed by any numerical method, since they are essential to many real-world fluid flows of industrial interest. As a consequence, high-order numerical schemes, and compact schemes among them, have received much consideration, since they address both goals, at the cost of less straightforward application of boundary conditions and a higher computational cost. The latter problem is tackled with parallel computing, which also allows the currently available computer power to be exploited to the best possible extent. The research activity conducted by the present author has concerned the development, from scratch, of a three-dimensional, unsteady, incompressible Navier-Stokes parallel solver, which uses an advanced algorithm for the process-wise solution of the linear systems arising from the application of high-order compact finite difference schemes, and hinges upon a three-dimensional decomposition of the Cartesian computational space. The code is written in modern Fortran 2003 (plus a few features unique to the 2008 standard) and is parallelized through the MPI 3.1 standard's advanced routines, as implemented by the Open MPI library project. The coding was carried out with the objective of creating an original high-order parallel CFD solver which is maintainable and extendable, of course within a well-defined range of possibilities.
    With this main priority outlined, particular attention was paid to several key concepts: modularity and readability of the source code and, in turn, its reusability; ease of implementation of virtually any new explicit or implicit finite difference scheme; a modern programming style avoiding deprecated legacy Fortran constructs and features, so that the world wide web remains a reliable and active means to the quick solution of coding problems arising from the implementation of new modules; and, last but not least, thorough comments, especially in critical sections of the code, explaining motives and possible expected weak links. The design, production, and documentation of a program from scratch is almost never complete, and this is certainly true for the present effort. The method and the code are verified against the full three-dimensional lid-driven cavity and Taylor-Green vortex flows; the latter test is also used for the assessment of scalability and parallel efficiency.
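    The compact schemes mentioned above couple the unknown derivatives through a (cyclic) tridiagonal system, which is precisely what makes their parallel solution non-trivial. The sketch below builds the classical sixth-order compact first derivative of Lele (1992) on a periodic grid and verifies it on a sine wave; it is a generic textbook instance, not the thesis code, and the dense solve stands in for the distributed tridiagonal algorithm the thesis develops.

        # Sixth-order compact (Pade) first derivative on a periodic grid,
        # after Lele (1992):
        #   alpha*f'_{i-1} + f'_i + alpha*f'_{i+1}
        #     = a*(f_{i+1}-f_{i-1})/(2h) + b*(f_{i+2}-f_{i-2})/(4h),
        # with alpha = 1/3, a = 14/9, b = 1/9. A textbook instance, not the
        # thesis code; the parallel solver replaces the dense solve below
        # with a distributed tridiagonal algorithm.
        import numpy as np

        def compact_d1(f, h, alpha=1.0/3.0, a=14.0/9.0, b=1.0/9.0):
            n = f.size
            # cyclic tridiagonal left-hand side (dense here for clarity)
            A = np.eye(n) + alpha * (np.eye(n, k=1) + np.eye(n, k=-1))
            A[0, -1] = A[-1, 0] = alpha            # periodic wrap-around
            rhs = (a * (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)
                   + b * (np.roll(f, -2) - np.roll(f, 2)) / (4.0 * h))
            return np.linalg.solve(A, rhs)

        # Verification on f(x) = sin(x), whose exact derivative is cos(x):
        n = 64
        x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
        h = x[1] - x[0]
        err = np.max(np.abs(compact_d1(np.sin(x), h) - np.cos(x)))
        print(f"max error on {n} points: {err:.2e}")
        # sixth-order accurate: the error drops by ~64x when n is doubled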
