Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code
Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computing platforms has recently increased, so it is necessary to optimize application performance and to enhance scalability on massively parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specialized hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance.
In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of code efficiency is addressed through three key activities: optimization, parallelization, and hardware acceleration. First, a profiling analysis of the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, the scalability of the code is studied, and new partitioning algorithms are tested to identify those most suitable for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.
Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs
Many problems in geophysical and atmospheric modelling require the fast
solution of elliptic partial differential equations (PDEs) in "flat" three
dimensional geometries. In particular, an anisotropic elliptic PDE for the
pressure correction has to be solved at every time step in the dynamical core
of many numerical weather prediction models, and equations of a very similar
structure arise in global ocean models, subsurface flow simulations and gas and
oil reservoir modelling. The elliptic solve is often the bottleneck of the
forecast, and an algorithmically optimal method has to be used and implemented
efficiently. Graphics Processing Units have been shown to be highly efficient
for a wide range of applications in scientific computing, and recently
iterative solvers have been parallelised on these architectures. We describe
the GPU implementation and optimisation of a Preconditioned Conjugate Gradient
(PCG) algorithm for the solution of a three dimensional anisotropic elliptic
PDE for the pressure correction in NWP. Our implementation exploits the strong
vertical anisotropy of the elliptic operator in the construction of a suitable
preconditioner. As the algorithm is memory bound, performance can be improved
significantly by reducing the amount of global memory access. We achieve this
by using a matrix-free implementation which does not require explicit storage
of the matrix and instead recalculates the local stencil. Global memory access
can also be reduced by rewriting the algorithm using loop fusion and we show
that this further reduces the runtime on the GPU. We demonstrate the
performance of our matrix-free GPU code by comparing it to a sequential CPU
implementation and to a matrix-explicit GPU code which uses existing libraries.
The absolute performance of the algorithm for different problem sizes is
quantified in terms of floating point throughput and global memory bandwidth.
Comment: 18 pages, 7 figures
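The ideas named in the abstract above (a matrix-free operator, a preconditioner built from the strong vertical coupling, and fused update loops) can be illustrated with a minimal CPU sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 2D model operator (one horizontal index `i`, one vertical index `k`, zero Dirichlet boundaries), the coupling constant `LAM`, the choice of an exact tridiagonal solve per vertical column as the preconditioner, and all function names.

```python
# Minimal sketch of a matrix-free preconditioned conjugate gradient (PCG)
# solver on a small 2D model problem. The operator, LAM and all names are
# illustrative assumptions, not the paper's implementation.

LAM = 0.01  # assumed weak horizontal coupling relative to the vertical one


def apply_operator(u):
    """Compute A*u by re-evaluating the stencil; no matrix is stored."""
    nx, nz = len(u), len(u[0])
    out = [[0.0] * nz for _ in range(nx)]
    for i in range(nx):
        for k in range(nz):
            west = u[i - 1][k] if i > 0 else 0.0
            east = u[i + 1][k] if i < nx - 1 else 0.0
            down = u[i][k - 1] if k > 0 else 0.0
            up = u[i][k + 1] if k < nz - 1 else 0.0
            out[i][k] = ((2.0 * LAM + 2.0) * u[i][k]
                         - LAM * (west + east) - (down + up))
    return out


def solve_vertical(r):
    """Preconditioner: Thomas-algorithm tridiagonal solve per column."""
    diag = 2.0 * LAM + 2.0  # main diagonal; off-diagonals are -1
    z = []
    for col in r:
        n = len(col)
        c = [0.0] * n  # modified upper-diagonal coefficients
        d = [0.0] * n  # modified right-hand side
        c[0] = -1.0 / diag
        d[0] = col[0] / diag
        for k in range(1, n):
            denom = diag + c[k - 1]
            c[k] = -1.0 / denom
            d[k] = (col[k] + d[k - 1]) / denom
        x = [0.0] * n
        x[n - 1] = d[n - 1]
        for k in range(n - 2, -1, -1):
            x[k] = d[k] - c[k] * x[k + 1]
        z.append(x)
    return z


def dot(a, b):
    """Inner product of two grid functions stored as nested lists."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def pcg(b, tol=1e-10, max_iter=200):
    """Solve A*u = b for a nonzero b; returns (solution, iterations)."""
    nx, nz = len(b), len(b[0])
    u = [[0.0] * nz for _ in range(nx)]
    r = [row[:] for row in b]  # residual for the zero initial guess
    z = solve_vertical(r)
    p = [row[:] for row in z]
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        ap = apply_operator(p)
        alpha = rz / dot(p, ap)
        # Solution and residual are updated in one fused loop.
        for i in range(nx):
            for k in range(nz):
                u[i][k] += alpha * p[i][k]
                r[i][k] -= alpha * ap[i][k]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return u, it
        z = solve_vertical(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        rz = rz_new
        for i in range(nx):
            for k in range(nz):
                p[i][k] = z[i][k] + beta * p[i][k]
    return u, max_iter
```

Calling `pcg(b)` with `b` given as a nested list of floats, for example `[[1.0] * 16 for _ in range(8)]`, returns the approximate solution and the iteration count. On a GPU the two grid loops would typically map to kernels, which is where recomputing the stencil instead of loading matrix entries reduces global memory traffic.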
Implicit High-Order Flux Reconstruction Solver for High-Speed Compressible Flows
The present paper addresses the development and implementation of the first
high-order Flux Reconstruction (FR) solver for high-speed flows within the
open-source COOLFluiD (Computational Object-Oriented Libraries for Fluid
Dynamics) platform. The resulting solver is fully implicit and able to simulate
compressible flow problems governed by either the Euler or the Navier-Stokes
equations in two and three dimensions. Furthermore, it can run in parallel on
multiple CPU-cores and is designed to handle unstructured grids consisting of
both straight and curved edged quadrilateral or hexahedral elements. While most
of the implementation relies on state-of-the-art FR algorithms, an improved and
more case-independent shock capturing scheme has been developed in order to
tackle the first viscous hypersonic simulations using the FR method. Extensive
verification of the FR solver has been performed through the use of
reproducible benchmark test cases with flow speeds ranging from subsonic to
hypersonic, up to Mach 17.6. The obtained results have been favorably compared
to those available in literature. Furthermore, so-called super-accuracy is
retrieved for certain cases when solving the Euler equations. The strengths of
the FR solver in terms of computational accuracy per degree of freedom are also
illustrated. Finally, the influence of the characterizing parameters of the FR
method as well as the influence of the novel shock capturing scheme on the
accuracy of the developed solver is discussed.
Thermodynamic Conditions in Quenching Chamber of Low Voltage Circuit Breaker
This work studies the processes that take place when a high-current electric arc is extinguished inside the quenching chamber of a circuit breaker. It focuses on computing the fluid dynamics (CFD) and the temperature field in the vicinity of the arc. The effects of the distance between the metal plates in the chamber and of the plate shapes are described from the point of view of the aerodynamic conditions inside the chamber. A further result of this work is information on how the position of the electric arc influences the thermodynamic properties inside the chamber. This matters especially when the arc is drawn into the chamber by other forces, for example electromagnetic ones, and changes its shape and position during this process. To solve the task as simply and as efficiently as possible, new software was developed specifically for fluid dynamics calculations using the finite volume method (FVM). Compared with the more widespread finite element method (FEM), the work regards this method as better suited to CFD, mainly because the overhead of computing one iteration is smaller than with other numerical methods. Another advantage of the software is its modularity and extensibility: the whole concept is based on plug-ins. As a result, the calculation core can be reused for other numerical analyses, such as structural or electromagnetic analysis. The only requirement is to write a solver plug-in for these analyses, namely a finite element (FEM) solver. Because the software is designed as a multi-threaded application, it can make full use of current multi-core processors. This property becomes even more pronounced when the calculation is moved from the CPU to a GPU (Graphics Processing Unit). Current high-end graphics cards have tens to hundreds of compute cores and work with much faster memory than CPUs, so the computational performance can be several times higher.