387 research outputs found

    A new gravitational N-body simulation algorithm for investigation of cosmological chaotic advection

    Full text link
    Recently, alternative approaches in cosmology seek to explain the nature of dark matter as a direct result of the non-linear spacetime curvature due to different types of deformation potentials. In this context, a key test for this hypothesis is to examine the effects of deformation on the evolution of large-scale structures. An important requirement for the fine analysis of this purely gravitational signature (without dark matter elements) is to characterize the position of a galaxy along its trajectory towards the gravitational collapse of superclusters at low redshifts. In this context, each element in a gravitational N-body simulation behaves as a tracer of collapse governed by the process known as chaotic advection (or Lagrangian turbulence). To carry out a detailed study of this new approach, we developed the COsmic LAgrangian TUrbulence Simulator (COLATUS), which performs gravitational N-body simulations based on the Compute Unified Device Architecture (CUDA) for graphics processing units (GPUs). In this paper we report the first robust results obtained from COLATUS. Comment: Proceedings of the Sixth International School on Field Theory and Gravitation (2012), American Institute of Physics.
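The computational core such codes accelerate is the all-pairs gravitational interaction evaluated for every tracer particle. A minimal sketch in Python, assuming a softened direct-sum force (G = 1) and a leapfrog-style update; neither detail is claimed to match COLATUS internals:

```python
import math

def nbody_step(pos, vel, masses, dt, eps=1e-3):
    """Advance tracer particles one step with direct-sum gravity (G = 1).

    The softening length eps (an assumption of this sketch) avoids the
    singularity at zero separation. This O(N^2) loop is the part GPU
    codes parallelize, typically one thread per particle.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps
            f = masses[j] / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += f * dx[k]
    # Kick then drift: update velocities first, then positions
    new_vel = [[vel[i][k] + dt * acc[i][k] for k in range(3)] for i in range(n)]
    new_pos = [[pos[i][k] + dt * new_vel[i][k] for k in range(3)] for i in range(n)]
    return new_pos, new_vel
```

For two equal masses the accelerations are equal and opposite, so total momentum is conserved to round-off, a quick sanity check for any such kernel.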

    Direct numerical simulation of compressible turbulence accelerated by graphics processing unit. Part 1: An open-source high accuracy accelerated computational fluid dynamic software

    Full text link
    This paper introduces open-source computational fluid dynamics software named OpenCFD-SCU (open computational fluid dynamics code for scientific computation with graphics processing unit (GPU) systems), developed by the authors for direct numerical simulation (DNS) of compressible wall-bounded turbulence. The software is based on the finite difference method and is accelerated by GPUs, which provide a speed-up by a factor of more than 200 compared with central processing unit (CPU) software based on the same algorithm and number of message passing interface (MPI) processes; the running speed of OpenCFD-SCU with just 512 GPUs exceeds that of the CPU software with 130,000 CPUs. GPU-Stream technology is used to overlap computation and communication, achieving 98.7% parallel weak scalability on 24,576 GPUs. The software includes a variety of high-precision finite difference schemes and supports a hybrid finite difference scheme, enabling it to provide both robustness and high precision when simulating complex supersonic and hypersonic flows. On the wide range of supercomputers currently available, the software should be able to extend the scale of large simulations by up to two orders of magnitude. OpenCFD-SCU is then applied to a validation and verification case of a Mach 2.9 compression ramp with meshes of up to 31.2 billion points. More challenging cases using hybrid finite difference schemes are presented in Part 2 (Dang, Li et al. 2022). The code is available and supported at http://developer.hpccube.com/codes/danggl/opencfd-scu.git Comment: 23 pages, 25 figures.
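The building block of such finite-difference DNS codes is a high-order derivative stencil evaluated independently at every grid point, which is what makes GPU acceleration so effective. A minimal sketch, assuming a standard fourth-order central stencil on a periodic 1-D grid (OpenCFD-SCU's actual schemes are higher-order and hybrid):

```python
import math

def central_diff4(f, dx):
    """Fourth-order central first derivative on a periodic 1-D grid.

    Stencil: (f[i-2] - 8 f[i-1] + 8 f[i+1] - f[i+2]) / (12 dx).
    Each output point depends only on a small neighborhood, so such
    loops map naturally to one GPU thread per grid point; this serial
    version is an illustration only.
    """
    n = len(f)
    return [(f[(i - 2) % n] - 8.0 * f[(i - 1) % n]
             + 8.0 * f[(i + 1) % n] - f[(i + 2) % n]) / (12.0 * dx)
            for i in range(n)]
```

Applied to sin(x), the result matches cos(x) with an error that shrinks as dx^4, the defining property of a fourth-order scheme.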

    STREAmS: a high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flow

    Full text link
    We present STREAmS, an in-house high-fidelity solver for large-scale, massively parallel direct numerical simulations (DNS) of compressible turbulent flows on graphics processing units (GPUs). STREAmS is written in Fortran 90 and is tailored to carry out DNS of canonical compressible wall-bounded flows, namely turbulent plane channel, zero-pressure-gradient turbulent boundary layer and supersonic oblique shock-wave/boundary-layer interactions. The solver incorporates state-of-the-art numerical algorithms, specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows, and can be used across a wide range of Mach numbers, extending from the low subsonic up to the hypersonic regime. The use of CUF automatic kernels allowed an easy and efficient porting to the GPU architecture while minimizing the changes to the original CPU code, which is also maintained. We discuss a memory allocation strategy based on duplicated arrays for host and device which carefully minimizes the memory usage, making the solver suitable for large-scale computations on the latest GPU cards. Comparisons between different CPU and GPU architectures strongly favor the latter: executing the solver on a single NVIDIA Tesla P100 corresponds to using approximately 330 Intel Knights Landing CPU cores. STREAmS shows very good strong scalability and essentially ideal weak scalability up to 2048 GPUs, paving the way to simulations in the genuine high-Reynolds-number regime, possibly at friction Reynolds numbers Re_tau > 10^4. The solver is released open source under the GPLv3 license and is available at https://github.com/matteobernardini/STREAmS Comment: 11 pages, 11 figures.
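The strong- and weak-scalability claims above reduce to simple ratios of wall-clock timings. A small helper for the strong-scaling case, with illustrative numbers only (not measured STREAmS data):

```python
def scaling_metrics(t1, timings):
    """Strong-scaling speedup and parallel efficiency from timings.

    t1: wall-clock time on one device for the full problem.
    timings: {num_devices: wall-clock time} for the same problem.
    Ideal speedup equals num_devices (efficiency 1.0); weak scaling
    instead fixes the work per device and compares times directly.
    """
    out = {}
    for n, t in sorted(timings.items()):
        speedup = t1 / t
        out[n] = (speedup, speedup / n)
    return out
```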

    STREAmS: A high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flows

    Get PDF
    We present STREAmS, an in-house high-fidelity solver for direct numerical simulations (DNS) of canonical compressible wall-bounded flows, namely turbulent plane channel, zero-pressure-gradient turbulent boundary layer and supersonic oblique shock-wave/boundary-layer interaction. The solver incorporates state-of-the-art numerical algorithms, specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows, and can be used across a wide range of Mach numbers, extending from the low subsonic up to the hypersonic regime. From the computational viewpoint, STREAmS is oriented to modern HPC platforms thanks to MPI parallelization and the ability to run on multi-GPU architectures. This paper discusses the main implementation strategies, with particular reference to the CUDA paradigm, the management of a single code for traditional and multi-GPU architectures, and the optimization process to take advantage of the latest generation of NVIDIA GPUs. Performance measurements show that single-GPU optimization more than halves the computing time compared to the baseline version. At the same time, the asynchronous patterns implemented in STREAmS for MPI communications guarantee very good parallel performance, especially in the weak-scaling sense, with efficiency exceeding 97% on 1024 GPUs. For an overall evaluation of STREAmS with respect to other compressible solvers, a comparison with a recent GPU-enabled community solver is presented. It turns out that, although STREAmS is much more limited in terms of the flow configurations that can be addressed, its advantage in terms of accuracy, computing time and memory occupation is substantial, which makes it an ideal candidate for large-scale simulations of high-Reynolds-number, compressible wall-bounded turbulent flows. The solver is released open source under the GPLv3 license.
Program summary:
Program Title: STREAmS
CPC Library link to program files: https://doi.org/10.17632/hdcgjpzr3y.1
Developer's repository link: https://github.com/matteobernardini/STREAmS
Code Ocean capsule: https://codeocean.com/capsule/8931507/tree/v2
Licensing provisions: GPLv3
Programming language: Fortran 90, CUDA Fortran, MPI
Nature of problem: Solving the three-dimensional compressible Navier–Stokes equations for low and high Mach regimes in a Cartesian domain configured for channel, boundary layer or shock/boundary-layer interaction flows.
Solution method: The convective terms are discretized using a hybrid energy-conservative shock-capturing scheme in locally conservative form. Shock-capturing capabilities rely on Lax–Friedrichs flux vector splitting and weighted essentially non-oscillatory (WENO) reconstruction. The system is advanced in time using a three-stage, third-order Runge–Kutta scheme. Two-dimensional pencil-distributed MPI parallelization is implemented alongside different patterns of GPU-accelerated (CUDA Fortran) routines.
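The three-stage, third-order Runge-Kutta time integrator named in the summary is commonly implemented in the strong-stability-preserving (Shu-Osher) form sketched below; that STREAmS uses exactly this variant is an assumption of the sketch:

```python
def ssp_rk3_step(u, dt, rhs):
    """One step of the three-stage, third-order SSP Runge-Kutta scheme
    (Shu-Osher form). u is the state vector (a list here); rhs(u)
    returns du/dt. Each stage is built from the same two primitives a
    solver needs anyway: an RHS evaluation and a vector combination.
    """
    def axpy(a, x, b, y):  # element-wise a*x + b*y
        return [a * xi + b * yi for xi, yi in zip(x, y)]
    u1 = axpy(1.0, u, dt, rhs(u))                      # stage 1: Euler step
    u2 = axpy(0.75, u, 0.25, axpy(1.0, u1, dt, rhs(u1)))  # stage 2
    return axpy(1.0 / 3.0, u, 2.0 / 3.0, axpy(1.0, u2, dt, rhs(u2)))  # stage 3
```

On the linear test problem du/dt = -u, one step reproduces the third-order Taylor polynomial of exp(-dt), which is the standard accuracy check for this scheme.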

    A hierarchical parallel implementation model for algebra-based CFD simulations on hybrid supercomputers

    Get PDF
    Continuous enhancement in hardware technologies enables scientific computing to advance incessantly and reach ever more ambitious goals. Since the start of the global race for exascale high-performance computing (HPC), massively parallel devices of various architectures have been incorporated into the newest supercomputers, leading to an increasing hybridization of HPC systems. In this context of accelerated innovation, software portability and efficiency become crucial. Traditionally, scientific computing software development is based on calculations in iterative stencil loops (ISL) over a discretized geometry, the mesh. Despite being intuitive and versatile, the interdependency between algorithms and their computational implementations in stencil applications usually results in a large number of subroutines and introduces an inevitable complexity when it comes to portability and sustainability. An alternative is to break the interdependency between algorithm and implementation and cast the calculations into a minimalist set of kernels. The portable implementation model that is the object of this thesis is not restricted to a particular numerical method or problem. However, owing to the CTTC's long tradition in computational fluid dynamics (CFD) and without loss of generality, this work targets transient CFD simulations. By casting discrete operators and mesh functions into (sparse) matrices and vectors, it is shown that all the calculations in a typical CFD algorithm boil down to the following basic linear algebra subroutines: the sparse matrix-vector product, the linear combination of vectors, and the dot product.
The proposed formulation eases the deployment of scientific computing software in massively parallel hybrid computing systems and is demonstrated in the large-scale direct numerical simulation of transient turbulent flows.
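The three kernels the thesis reduces a CFD time step to can be sketched as follows, assuming the usual compressed sparse row (CSR) storage for the matrix (an illustrative Python version, not the thesis implementation):

```python
def spmv(vals, cols, rowptr, x):
    """Sparse matrix-vector product y = A x, with A in CSR format:
    vals and cols hold the nonzeros and their column indices, and
    rowptr[i]:rowptr[i+1] delimits the nonzeros of row i."""
    return [sum(vals[k] * x[cols[k]] for k in range(rowptr[i], rowptr[i + 1]))
            for i in range(len(rowptr) - 1)]

def axpy(a, x, y):
    """Linear combination of vectors: element-wise a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Dot product sum_i x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))
```

Because each routine is a tight loop over flat arrays, a single optimized implementation per architecture (CPU, GPU, or otherwise) is enough to run the whole algorithm, which is the portability argument the thesis makes.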


    RHEA: an open-source Reproducible Hybrid-architecture flow solver Engineered for Academia

    Get PDF
    The study of complex multiscale flows (Groen et al., 2014), like for example the motion of small-scale turbulent eddies over large aerodynamic structures (Jofre & Doostan, 2022), microconfined high-pressure supercritical fluids for enhanced energy transfer (Bernades & Jofre, 2022), or hydrodynamic focusing of microorganisms in wall-bounded flows (Palacios et al., 2022), greatly benefits from the combination of interconnected theoretical, computational and experimental approaches. This manifold methodology provides a robust framework to corroborate the phenomena observed and validate the modeling assumptions utilized, and facilitates the exploration of wider parameter spaces and the extraction of more sophisticated insights. These analyses are typically encompassed within the field of Predictive Science & Engineering (Najm, 2009), which has attracted attention in the Fluid Mechanics community and is expected to grow exponentially as computational studies transition from (mostly) physics simulations to active vectors for scientific discovery and technological innovation with the advent of exascale computing (Alowayyed et al., 2017). In this regard, the computational flow solver presented here aims at bridging the gap between studying complex multiscale flow problems and utilizing present and future state-of-the-art supercomputing systems in academic environments. The solver is named RHEA, which stands for open-source Reproducible Hybrid-architecture flow solver Engineered for Academia, and is available as an open-source Git repository at https://gitlab.com/ProjectRHEA/flowsolverrhea