A new gravitational N-body simulation algorithm for investigation of cosmological chaotic advection
Recently, alternative approaches in cosmology have sought to explain the nature of
dark matter as a direct result of non-linear spacetime curvature due to
different types of deformation potentials. In this context, a key test for this
hypothesis is to examine the effects of deformation on the evolution of
large-scale structures. An important requirement for the fine analysis of this purely
gravitational signature (without dark matter elements) is to characterize the
position of a galaxy along its trajectory toward the gravitational collapse of
superclusters at low redshifts. In this context, each element in a
gravitational N-body simulation behaves as a tracer of collapse governed by the
process known as chaotic advection (or Lagrangian turbulence). To pursue a
detailed study of this new approach, we developed the COsmic LAgrangian
TUrbulence Simulator (COLATUS), which performs gravitational N-body simulations
based on the Compute Unified Device Architecture (CUDA) for graphics processing
units (GPUs). In this paper we report the first robust results obtained from
COLATUS. Comment: Proceedings of the Sixth International School on Field Theory and
Gravitation 2012, American Institute of Physics
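The tracer elements described above are advanced by mutual gravitational attraction. As a minimal illustration of the force evaluation that such an N-body simulator parallelizes on the GPU, the following pure-Python sketch computes direct-summation O(N^2) accelerations with Plummer softening; the function name and softening value are illustrative, not taken from COLATUS.

```python
import math

def accelerations(pos, mass, G=1.0, eps=1e-2):
    """Direct O(N^2) gravitational accelerations with Plummer softening eps.

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    Returns a list of [ax, ay, az] accelerations, one per particle.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Separation vector from particle i to particle j
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + eps * eps  # softened distance squared
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
    return acc
```

On a GPU, the inner loop over source particles j is what each CUDA thread evaluates for its own target particle i; the softening eps prevents the force from diverging at close encounters.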
Direct numerical simulation of compressible turbulence accelerated by graphics processing unit. Part 1: An open-source high accuracy accelerated computational fluid dynamic software
This paper introduces open-source computational fluid dynamics software named
Open Computational Fluid Dynamic code for Scientific Computation with graphics
processing unit (GPU) systems (OpenCFD-SCU), developed by the authors for direct
numerical simulation (DNS) of compressible wall-bounded turbulence. The
software is based on the finite difference method and is accelerated by the use
of GPUs, which provide a speedup of more than 200 compared
with central processing unit (CPU) software based on the same algorithm and
number of message passing interface (MPI) processes; the running speed of
OpenCFD-SCU with just 512 GPUs exceeds that of the CPU software with 130,000 CPUs.
GPU-Stream technology is used to overlap computation and
communication, achieving 98.7% parallel weak scalability with 24,576 GPUs.
The software includes a variety of high-precision finite difference schemes
and supports a hybrid finite difference scheme, enabling it to provide both
robustness and high precision when simulating complex supersonic and hypersonic
flows. On the wide range of supercomputers currently available, the
software should be able to improve the performance of large-scale simulations by
up to two orders of magnitude in computational scale. OpenCFD-SCU is then applied to a
validation and verification case of a Mach 2.9 compression ramp with mesh
sizes of up to 31.2 billion points. More challenging cases using hybrid finite difference schemes
are shown in Part 2 (Dang, Li et al. 2022). The code is available and supported
at http://developer.hpccube.com/codes/danggl/opencfd-scu.git. Comment: 23 pages, 25 figures
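The compute/communication overlap mentioned above follows a standard pattern: update interior points (which need no remote data) while the halo exchange is in flight, then finish the boundary points once the halos arrive. This pure-Python sketch mimics the pattern with a thread standing in for the GPU copy stream / MPI exchange; the function and halo names are illustrative, not OpenCFD-SCU's API.

```python
import threading

def step_overlapped(u, halo_left, halo_right):
    """One explicit averaging-stencil update on a 1D subdomain where the
    (simulated) halo exchange runs concurrently with the interior update --
    the same overlap pattern that GPU streams provide for compute and copy."""
    n = len(u)
    new = [0.0] * n
    received = {}

    def exchange():
        # Stand-in for a nonblocking halo exchange with neighbor ranks.
        received["left"], received["right"] = halo_left, halo_right

    comm = threading.Thread(target=exchange)
    comm.start()                       # launch "communication"
    for i in range(1, n - 1):          # interior points: no halo data needed
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    comm.join()                        # wait for halos, then do boundaries
    new[0] = 0.5 * (received["left"] + u[1])
    new[-1] = 0.5 * (u[-2] + received["right"])
    return new
```

The result is bit-identical to a fully serial update; only the schedule changes, which is why overlap can recover near-ideal weak scaling when the interior work is large enough to hide the exchange.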
STREAmS: a high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flow
We present STREAmS, an in-house high-fidelity solver for large-scale,
massively parallel direct numerical simulations (DNS) of compressible turbulent
flows on graphics processing units (GPUs). STREAmS is written in Fortran
90 and is tailored to carry out DNS of canonical compressible
wall-bounded flows, namely the turbulent plane channel, the zero-pressure-gradient
turbulent boundary layer, and supersonic oblique shock-wave/boundary-layer
interactions. The solver incorporates state-of-the-art numerical algorithms,
specifically designed to cope with the challenging problems associated with the
solution of high-speed turbulent flows, and can be used across a wide range of
Mach numbers, extending from the low subsonic up to the hypersonic regime. The
use of CUF automatic kernels allowed an easy and efficient porting to the GPU
architecture while minimizing the changes to the original CPU code, which is also
maintained. We discuss a memory allocation strategy based on duplicated arrays
for host and device which carefully minimizes the memory usage, making the
solver suitable for large-scale computations on the latest GPU cards.
Comparison between different CPU and GPU architectures strongly favors the
latter: executing the solver on a single NVIDIA Tesla P100 corresponds to
using approximately 330 Intel Knights Landing CPU cores. STREAmS shows very
good strong scalability and essentially ideal weak scalability up to 2048 GPUs,
paving the way to simulations in the genuine high-Reynolds-number regime,
possibly at high friction Reynolds numbers. The solver is released
open source under the GPLv3 license and is available at
https://github.com/matteobernardini/STREAmS. Comment: 11 pages, 11 figures
STREAmS: A high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flows
We present STREAmS, an in-house high-fidelity solver for direct numerical simulations (DNS) of canonical compressible wall-bounded flows, namely turbulent plane channel, zero-pressure gradient turbulent boundary layer and supersonic oblique shock-wave/boundary layer interaction. The solver incorporates state-of-the-art numerical algorithms, specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows and can be used across a wide range of Mach numbers, extending from the low subsonic up to the hypersonic regime. From the computational viewpoint, STREAmS is oriented to modern HPC platforms thanks to MPI parallelization and the ability to run on multi-GPU architectures. This paper discusses the main implementation strategies, with particular reference to the CUDA paradigm, the management of a single code for traditional and multi-GPU architectures, and the optimization process to take advantage of the latest generation of NVIDIA GPUs. Performance measurements show that single-GPU optimization more than halves the computing time as compared to the baseline version. At the same time, the asynchronous patterns implemented in STREAmS for MPI communications guarantee very good parallel performance especially in the weak scaling spirit, with efficiency exceeding 97% on 1024 GPUs. For overall evaluation of STREAmS with respect to other compressible solvers, comparison with a recent GPU-enabled community solver is presented. It turns out that, although STREAmS is much more limited in terms of flow configurations that can be addressed, the advantage in terms of accuracy, computing time and memory occupation is substantial, which makes it an ideal candidate for large-scale simulations of high-Reynolds number, compressible wall-bounded turbulent flows. The solver is released open source under GPLv3 license. 
Program summary:
Program Title: STREAmS
CPC Library link to program files: https://doi.org/10.17632/hdcgjpzr3y.1
Developer's repository link: https://github.com/matteobernardini/STREAmS
Code Ocean capsule: https://codeocean.com/capsule/8931507/tree/v2
Licensing provisions: GPLv3
Programming language: Fortran 90, CUDA Fortran, MPI
Nature of problem: Solving the three-dimensional compressible Navier–Stokes equations for low and high Mach regimes in a Cartesian domain configured for channel, boundary-layer or shock/boundary-layer interaction flows.
Solution method: The convective terms are discretized using a hybrid energy-conservative shock-capturing scheme in locally conservative form. Shock-capturing capabilities rely on the use of Lax–Friedrichs flux vector splitting and weighted essentially non-oscillatory (WENO) reconstruction. The system is advanced in time using a three-stage, third-order Runge–Kutta scheme. Two-dimensional pencil-distributed MPI parallelization is implemented alongside different patterns of GPU-accelerated (CUDA Fortran) routines.
A hierarchical parallel implementation model for algebra-based CFD simulations on hybrid supercomputers
(English) Continuous enhancement in hardware technologies enables scientific computing to advance incessantly and reach further aims. Since the start of the global race for exascale high-performance computing (HPC), massively-parallel devices of various architectures have been incorporated into the newest supercomputers, leading to an increasing hybridization of HPC systems. In this context of accelerated innovation, software portability and efficiency become crucial.
Traditionally, scientific computing software development is based on calculations in iterative stencil loops (ISL) over a discretized geometry—the mesh. Despite being intuitive and versatile, the interdependency between algorithms and their computational implementations in stencil applications usually results in a large number of subroutines and introduces an inevitable complexity when it comes to portability and sustainability. An alternative is to break the interdependency between algorithm and implementation to cast the calculations into a minimalist set of kernels.
The portable implementation model that is the object of this thesis is not restricted to a particular numerical method or problem. However, owing to the CTTC's long tradition in computational fluid dynamics (CFD) and without loss of generality, this work is targeted to solve transient CFD simulations. By casting discrete operators and mesh functions into (sparse) matrices and vectors, it is shown that all the calculations in a typical CFD algorithm boil down to the following basic linear algebra subroutines: the sparse matrix-vector product, the linear combination of vectors, and the dot product.
The proposed formulation eases the deployment of scientific computing software in massively parallel hybrid computing systems and is demonstrated in the large-scale direct numerical simulation of transient turbulent flows.
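The three basic linear algebra subroutines the thesis reduces a CFD algorithm to can be sketched in a few lines of pure Python; the CSR (compressed sparse row) layout used for the sparse matrix is a standard choice and an assumption here, as are the function names.

```python
def spmv(rowptr, colidx, vals, x):
    """Sparse matrix-vector product y = A @ x, with A stored in CSR format:
    row i holds entries vals[rowptr[i]:rowptr[i+1]] at columns
    colidx[rowptr[i]:rowptr[i+1]]."""
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += vals[k] * x[colidx[k]]
    return y

def axpy(a, x, y):
    """Linear combination of vectors: returns a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Dot product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))
```

Because each kernel is a tight data-parallel loop with no mesh-specific logic, vendor-optimized or GPU implementations can be swapped in behind the same three interfaces, which is precisely what makes the algebra-based formulation portable across hybrid architectures.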
RHEA: an open-source Reproducible Hybrid-architecture flow solver Engineered for Academia
The study of complex multiscale flows (Groen et al., 2014), like for example the motion of small-scale turbulent eddies over large aerodynamic structures (Jofre & Doostan, 2022), microconfined high-pressure supercritical fluids for enhanced energy transfer (Bernades & Jofre, 2022), or hydrodynamic focusing of microorganisms in wall-bounded flows (Palacios et al., 2022), greatly benefits from the combination of interconnected theoretical, computational and experimental approaches. This manifold methodology provides a robust framework to corroborate the phenomena observed, validate the modeling assumptions utilized, and facilitate the exploration of wider parameter spaces and the extraction of more sophisticated insights. These analyses are typically encompassed within the field of Predictive Science & Engineering (Najm, 2009), which has attracted attention in the Fluid Mechanics community and is expected to grow exponentially as computational studies transition from (mostly) physics simulations to active vectors for scientific discovery and technological innovation with the advent of Exascale computing (Alowayyed et al., 2017). In this regard, the computational flow solver presented aims at bridging the gap between studying complex multiscale flow problems and utilizing present and future state-of-the-art supercomputing systems in academic environments. The solver presented is named RHEA, which stands for open-source Reproducible Hybrid-architecture flow solver Engineered for Academia, and is available as an open-source Git repository at https://gitlab.com/ProjectRHEA/flowsolverrhea.