
    STREAmS: a high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flows

    We present STREAmS, an in-house high-fidelity solver for large-scale, massively parallel direct numerical simulations (DNS) of compressible turbulent flows on graphics processing units (GPUs). STREAmS is written in Fortran 90 and is tailored to DNS of canonical compressible wall-bounded flows, namely the turbulent plane channel, the zero-pressure-gradient turbulent boundary layer, and supersonic oblique shock-wave/boundary-layer interactions. The solver incorporates state-of-the-art numerical algorithms, specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows, and can be used across a wide range of Mach numbers, extending from the low subsonic up to the hypersonic regime. The use of CUF automatic kernels allowed easy and efficient porting to the GPU architecture while minimizing changes to the original CPU code, which is also maintained. We discuss a memory allocation strategy based on duplicated arrays for host and device that carefully minimizes memory usage, making the solver suitable for large-scale computations on the latest GPU cards. Comparisons between different CPU and GPU architectures strongly favor the latter: executing the solver on a single NVIDIA Tesla P100 corresponds to using approximately 330 Intel Knights Landing CPU cores. STREAmS shows very good strong scalability and essentially ideal weak scalability up to 2048 GPUs, paving the way to simulations in the genuine high-Reynolds-number regime, possibly at friction Reynolds numbers Re_tau > 10^4. The solver is released open source under the GPLv3 license and is available at https://github.com/matteobernardini/STREAmS.
    Comment: 11 pages, 11 figures
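    The CUF automatic-kernel and duplicated host/device array strategies described above can be sketched as follows (a minimal illustration assuming the NVIDIA CUDA Fortran compiler; the array and routine names are hypothetical, not taken from STREAmS):

        module fields
          use cudafor
          implicit none
          ! Duplicated storage: one host array and one device mirror,
          ! so the CPU and GPU code paths can share a single source.
          real, allocatable         :: rho(:,:,:)
          real, allocatable, device :: rho_d(:,:,:)
        contains
          subroutine scale_density(nx, ny, nz, fac)
            integer, intent(in) :: nx, ny, nz
            real, intent(in)    :: fac
            integer :: i, j, k
            ! CUF automatic kernel: the compiler generates the GPU kernel
            ! directly from the loop nest, leaving the CPU version intact.
            !$cuf kernel do(3) <<<*,*>>>
            do k = 1, nz
              do j = 1, ny
                do i = 1, nx
                  rho_d(i,j,k) = fac*rho_d(i,j,k)
                end do
              end do
            end do
          end subroutine scale_density
        end module fields

    Host-device transfers then reduce to plain array assignments (e.g. rho_d = rho), and allocating the device mirror only for fields the kernels actually touch is one way to keep the memory footprint small.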

    Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

    The chemical kinetics ODEs arising from operator-split reactive-flow simulations were solved on GPUs using explicit integration algorithms. Nonstiff chemical kinetics of a hydrogen oxidation mechanism (9 species and 38 irreversible reactions) were computed using the explicit fifth-order Runge-Kutta-Cash-Karp method; the GPU-accelerated version performed faster than single- and six-core CPU versions by factors of 126 and 25, respectively, for 524,288 ODEs. Moderately stiff kinetics, represented by mechanisms for hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane (53 species and 634 irreversible reactions) oxidation, were computed using the stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. For the hydrogen/carbon-monoxide mechanism, the GPU-based RKC implementation ran nearly 59 and 10 times faster than the single- and six-core CPU-based RKC algorithms, respectively, for problem sizes of 262,144 ODEs and larger. With the methane mechanism, RKC-GPU performed more than 65 and 11 times faster than the single- and six-core RKC-CPU versions for problem sizes of 131,072 ODEs and larger, and up to 57 times faster than the six-core CPU-based implicit VODE algorithm on 65,536 ODEs. In the presence of more severe stiffness, such as ethylene oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a larger time step size, RKC-GPU performed at best 2.5 times slower than six-core VODE for 8192 ODEs and larger. The need for new strategies for integrating stiff chemistry on GPUs is therefore discussed.
    Comment: 27 pages, LaTeX; corrected typos in Appendix equations A.10 and A.1
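    The parallelization pattern behind these speedups, one GPU thread per independent ODE system, can be sketched as follows (a toy illustration in CUDA Fortran for consistency with the other sketches in this listing; forward Euler with a placeholder rate function stands in for the RKCK/RKC schemes, and all names are hypothetical):

        module kinetics_gpu
          use cudafor
          implicit none
          integer, parameter :: nsp = 9  ! species per system (H2 mechanism size)
        contains
          attributes(device) subroutine rates(y, f)
            ! Placeholder rate evaluation; a real code would compute species
            ! production rates from the reaction mechanism here.
            real, intent(in)  :: y(nsp)
            real, intent(out) :: f(nsp)
            integer :: k
            do k = 1, nsp
              f(k) = -y(k)   ! toy linear decay
            end do
          end subroutine rates

          attributes(global) subroutine euler_step(nsys, dt, y)
            ! One thread integrates one independent ODE system.
            integer, value :: nsys
            real, value    :: dt
            real           :: y(nsp, nsys)
            real    :: f(nsp)
            integer :: i, k
            i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
            if (i <= nsys) then
              call rates(y(:, i), f)
              do k = 1, nsp
                y(k, i) = y(k, i) + dt*f(k)
              end do
            end if
          end subroutine euler_step
        end module kinetics_gpu

    A host code would launch this as, e.g., call euler_step<<<(nsys + 127)/128, 128>>>(nsys, dt, y_d), with y_d a device array of nsys stacked state vectors.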

    STREAmS: A high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flows

    We present STREAmS, an in-house high-fidelity solver for direct numerical simulations (DNS) of canonical compressible wall-bounded flows, namely the turbulent plane channel, the zero-pressure-gradient turbulent boundary layer, and the supersonic oblique shock-wave/boundary-layer interaction. The solver incorporates state-of-the-art numerical algorithms, specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows, and can be used across a wide range of Mach numbers, extending from the low subsonic up to the hypersonic regime. From the computational viewpoint, STREAmS is oriented to modern HPC platforms thanks to MPI parallelization and the ability to run on multi-GPU architectures. This paper discusses the main implementation strategies, with particular reference to the CUDA paradigm, the management of a single code base for traditional and multi-GPU architectures, and the optimization process to take advantage of the latest generation of NVIDIA GPUs. Performance measurements show that single-GPU optimization more than halves the computing time compared to the baseline version. At the same time, the asynchronous patterns implemented in STREAmS for MPI communications guarantee very good parallel performance, especially in the weak-scaling sense, with efficiency exceeding 97% on 1024 GPUs. For an overall evaluation of STREAmS with respect to other compressible solvers, a comparison with a recent GPU-enabled community solver is presented. Although STREAmS is much more limited in the flow configurations it can address, its advantage in terms of accuracy, computing time, and memory occupation is substantial, which makes it an ideal candidate for large-scale simulations of high-Reynolds-number, compressible wall-bounded turbulent flows. The solver is released open source under the GPLv3 license.
    Program summary:
    Program Title: STREAmS
    CPC Library link to program files: https://doi.org/10.17632/hdcgjpzr3y.1
    Developer's repository link: https://github.com/matteobernardini/STREAmS
    Code Ocean capsule: https://codeocean.com/capsule/8931507/tree/v2
    Licensing provisions: GPLv3
    Programming language: Fortran 90, CUDA Fortran, MPI
    Nature of problem: Solving the three-dimensional compressible Navier–Stokes equations for low and high Mach regimes in a Cartesian domain configured for channel, boundary-layer, or shock/boundary-layer interaction flows.
    Solution method: The convective terms are discretized using a hybrid energy-conservative shock-capturing scheme in locally conservative form. Shock-capturing capabilities rely on Lax–Friedrichs flux vector splitting and weighted essentially non-oscillatory (WENO) reconstruction. The system is advanced in time using a three-stage, third-order Runge-Kutta scheme. Two-dimensional pencil-distributed MPI parallelization is implemented alongside different patterns of GPU-accelerated (CUDA Fortran) routines.
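    As a concrete illustration of the time advancement named in the summary, a generic three-stage, third-order strong-stability-preserving Runge-Kutta step has the following shape (a sketch only; the low-storage coefficients actually used in STREAmS may differ, and rhs() is a hypothetical stand-in for the discretized convective and viscous terms):

        module rk3_mod
          implicit none
        contains
          function rhs(u) result(f)
            ! Hypothetical stand-in for the spatial discretization
            ! (hybrid energy-conservative/WENO fluxes in the actual solver).
            real, intent(in) :: u(:)
            real :: f(size(u))
            f = -u   ! toy linear operator
          end function rhs

          subroutine ssprk3_step(u, dt)
            ! Three-stage, third-order SSP Runge-Kutta update.
            real, intent(inout) :: u(:)
            real, intent(in)    :: dt
            real :: u1(size(u)), u2(size(u))
            u1 = u + dt*rhs(u)
            u2 = 0.75*u + 0.25*(u1 + dt*rhs(u1))
            u  = u/3.0 + (2.0/3.0)*(u2 + dt*rhs(u2))
          end subroutine ssprk3_step
        end module rk3_mod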

    A fast GPU Monte Carlo Radiative Heat Transfer Implementation for Coupling with Direct Numerical Simulation

    We implemented a fast reciprocal Monte Carlo algorithm to accurately solve radiative heat transfer in turbulent flows of non-grey participating media, designed for coupling with fully resolved turbulent flow solvers, namely direct numerical simulation (DNS). The spectrally varying absorption coefficient is treated in a narrow-band fashion with a correlated-k distribution. The implementation is verified against analytical solutions and validated with results from the literature and line-by-line Monte Carlo computations. The method is implemented on the GPU with thorough attention to memory transfers and computational efficiency. The bottlenecks that dominate the computational expense are addressed, and several techniques are proposed to optimize the GPU execution. By implementing the proposed algorithmic accelerations, a speed-up of up to three orders of magnitude can be achieved while maintaining the same accuracy.
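    The core sampling step of such a photon-bundle Monte Carlo method, drawing an absorption distance from the Beer-Lambert law, can be sketched as follows (a grey-medium toy in CUDA Fortran with a minimal per-thread random number generator; the paper's reciprocal, narrow-band correlated-k formulation is considerably more involved, and all names are hypothetical):

        module mc_rad
          use cudafor
          implicit none
        contains
          attributes(global) subroutine absorb_distance(n, kappa, seed, s)
            ! One thread per photon bundle: sample the absorption distance
            ! s = -ln(R)/kappa implied by the Beer-Lambert law (grey medium).
            integer, value :: n, seed
            real,    value :: kappa
            real           :: s(n)
            integer    :: i
            integer(8) :: state
            real       :: r
            i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
            if (i <= n) then
              ! One step of a 31-bit LCG, seeded per thread; a production
              ! code would use a proper generator such as cuRAND.
              state = abs(mod(1103515245_8*(int(seed,8) + int(i,8)) + 12345_8, &
                              2147483648_8))
              r = (real(state) + 0.5) / 2147483648.0
              s(i) = -log(r) / kappa
            end if
          end subroutine absorb_distance
        end module mc_rad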

    GPU-Accelerated Large-Eddy Simulation of Turbulent Channel Flows

    High-performance computing clusters augmented with cost- and power-efficient graphics processing units (GPUs) provide new opportunities to broaden the use of the large-eddy simulation technique to study high-Reynolds-number turbulent flows in fluids engineering applications. In this paper, we extend our earlier work on multi-GPU acceleration of an incompressible Navier-Stokes solver to include a large-eddy simulation (LES) capability. In particular, we implement the Lagrangian dynamic subgrid-scale model and compare our results against existing direct numerical simulation (DNS) data of a turbulent channel flow at Reτ = 180. Overall, our LES results match the DNS data fairly well. Our results show that the Reτ = 180 case can be simulated entirely on a single GPU, whereas higher-Reynolds-number cases can benefit from a GPU cluster.
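    For reference, the eddy-viscosity evaluation at the heart of any Smagorinsky-type subgrid-scale model maps naturally onto the GPU (a constant-coefficient sketch below; the paper's Lagrangian dynamic model instead computes the coefficient on the fly by averaging along fluid pathlines, and the names here are hypothetical):

        module sgs_mod
          use cudafor
          implicit none
        contains
          subroutine eddy_viscosity(nx, ny, nz, delta, cs, smag_d, nut_d)
            ! nu_t = (Cs*Delta)^2 |S|, where |S| = sqrt(2 S_ij S_ij) is
            ! assumed precomputed in smag_d.
            integer, intent(in) :: nx, ny, nz
            real, intent(in)    :: delta, cs
            real, device, intent(in)  :: smag_d(nx,ny,nz)
            real, device, intent(out) :: nut_d(nx,ny,nz)
            integer :: i, j, k
            !$cuf kernel do(3) <<<*,*>>>
            do k = 1, nz
              do j = 1, ny
                do i = 1, nx
                  nut_d(i,j,k) = (cs*delta)**2 * smag_d(i,j,k)
                end do
              end do
            end do
          end subroutine eddy_viscosity
        end module sgs_mod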

    Toward a GPU-Accelerated Immersed Boundary Method for Wind Forecasting Over Complex Terrain

    A short-term wind power forecasting capability can be a valuable tool in the renewable energy industry to address load-balancing issues that arise from intermittent wind fields. Although numerical weather prediction models have been used to forecast winds, their applicability to micro-scale atmospheric boundary-layer flows and their ability to predict wind speeds at turbine hub height with the desired accuracy are not clear. To address this issue, we develop a multi-GPU parallel flow solver to forecast winds over complex terrain at the micro-scale, where the computational domain size can range from meters to several kilometers. In the solver, we adopt the immersed boundary method and the Lagrangian dynamic large-eddy simulation model and extend them to atmospheric flows. The computations are accelerated on GPU clusters with a dual-level parallel implementation that interleaves MPI with CUDA. We evaluate the flow solver components against test problems and obtain preliminary results of flow over Bolund Hill, a coastal hill in Denmark.
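    A dual-level MPI/CUDA pattern of the kind mentioned above typically interleaves device-host staging with neighbor exchanges, roughly as sketched here (hypothetical names; assumes a one-dimensional decomposition and a non-CUDA-aware MPI, so halo values are staged through host buffers):

        subroutine exchange_halo(u_d, n, left, right, comm)
          use cudafor
          use mpi
          implicit none
          real, device, intent(inout) :: u_d(:)   ! field with one-cell halos
          integer, intent(in) :: n, left, right, comm
          real :: sendbuf, recvbuf
          integer :: ierr, stat(MPI_STATUS_SIZE)
          ! 1) stage the boundary value from device to host
          sendbuf = u_d(n)                  ! implicit device-to-host copy
          ! 2) send to the right neighbor, receive from the left
          !    (the opposite direction is handled analogously)
          call MPI_Sendrecv(sendbuf, 1, MPI_REAL, right, 0, &
                            recvbuf, 1, MPI_REAL, left,  0, &
                            comm, stat, ierr)
          ! 3) push the received halo value back to the device
          u_d(1) = recvbuf                  ! implicit host-to-device copy
        end subroutine exchange_halo

    In practice, the staging copies and MPI calls are overlapped with interior computation using CUDA streams and non-blocking MPI, a common way to hide communication cost in such dual-level implementations.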