1,957 research outputs found

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Today, large-scale parallel simulations are fundamental tools for handling complex problems. The number of processors in current computing platforms has recently increased, so it is necessary to optimize application performance and to enhance scalability on massively parallel systems. In addition, new heterogeneous architectures, which combine conventional processors with specialized hardware such as FPGAs to accelerate the most time-consuming functions, are considered a strong alternative for boosting performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of code efficiency is addressed through three key activities: optimization, parallelization and hardware acceleration. First, the most time-consuming processes of the Reynolds-averaged Navier-Stokes flow solver on a three-dimensional unstructured mesh are profiled. Then, the scalability of the code is studied, and new partitioning algorithms are tested to identify the most suitable ones for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented.

    Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs

    Many problems in geophysical and atmospheric modelling require the fast solution of elliptic partial differential equations (PDEs) in "flat" three-dimensional geometries. In particular, an anisotropic elliptic PDE for the pressure correction has to be solved at every time step in the dynamical core of many numerical weather prediction (NWP) models, and equations of a very similar structure arise in global ocean models, subsurface flow simulations and gas and oil reservoir modelling. The elliptic solve is often the bottleneck of the forecast, so an algorithmically optimal method has to be used and implemented efficiently. Graphics Processing Units (GPUs) have been shown to be highly efficient for a wide range of applications in scientific computing, and recently iterative solvers have been parallelised on these architectures. We describe the GPU implementation and optimisation of a Preconditioned Conjugate Gradient (PCG) algorithm for the solution of a three-dimensional anisotropic elliptic PDE for the pressure correction in NWP. Our implementation exploits the strong vertical anisotropy of the elliptic operator in the construction of a suitable preconditioner. As the algorithm is memory bound, performance can be improved significantly by reducing the amount of global memory access. We achieve this by using a matrix-free implementation which does not require explicit storage of the matrix and instead recalculates the local stencil. Global memory access can also be reduced by rewriting the algorithm using loop fusion, and we show that this further reduces the runtime on the GPU. We demonstrate the performance of our matrix-free GPU code by comparing it to a sequential CPU implementation and to a matrix-explicit GPU code which uses existing libraries. The absolute performance of the algorithm for different problem sizes is quantified in terms of floating point throughput and global memory bandwidth. Comment: 18 pages, 7 figures
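
    The matrix-free idea in the abstract above can be illustrated with a minimal CPU sketch in plain Python. It is not the authors' GPU code: the 2D model operator (a 5-point stencil with vertical coupling `lam`), the zero boundary values, the vertical-line tridiagonal preconditioner, and all function names are assumptions chosen for illustration. The stencil is recomputed on every operator application instead of being stored as a matrix, and the updates of the solution and the residual share one loop, which is a small instance of loop fusion.

```python
# Sketch only: matrix-free PCG for an assumed anisotropic 5-point model operator.
# u[i][k]: i is the horizontal index, k the vertical index; lam >> 1 models
# strong vertical coupling. All names here are hypothetical.

def apply_operator(u, lam):
    """Return A*u, recomputing the stencil on the fly (no stored matrix).
    Values outside the grid are treated as zero."""
    nx, nz = len(u), len(u[0])
    out = [[0.0] * nz for _ in range(nx)]
    for i in range(nx):
        for k in range(nz):
            west = u[i - 1][k] if i > 0 else 0.0
            east = u[i + 1][k] if i < nx - 1 else 0.0
            down = u[i][k - 1] if k > 0 else 0.0
            up = u[i][k + 1] if k < nz - 1 else 0.0
            out[i][k] = (2.0 + 2.0 * lam) * u[i][k] - west - east - lam * (down + up)
    return out


def solve_vertical_lines(r, lam):
    """Assumed preconditioner: solve the vertical tridiagonal part exactly
    in every column with the Thomas algorithm."""
    nz = len(r[0])
    diag, off = 2.0 + 2.0 * lam, -lam
    z = []
    for col in r:
        c = [0.0] * nz  # modified upper diagonal
        d = [0.0] * nz  # modified right-hand side
        c[0] = off / diag
        d[0] = col[0] / diag
        for k in range(1, nz):
            denom = diag - off * c[k - 1]
            c[k] = off / denom
            d[k] = (col[k] - off * d[k - 1]) / denom
        x = [0.0] * nz
        x[-1] = d[-1]
        for k in range(nz - 2, -1, -1):
            x[k] = d[k] - c[k] * x[k + 1]
        z.append(x)
    return z


def dot(a, b):
    """Inner product of two grid fields."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def pcg(b, lam, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    nx, nz = len(b), len(b[0])
    u = [[0.0] * nz for _ in range(nx)]
    r = [row[:] for row in b]  # residual for the zero initial guess
    z = solve_vertical_lines(r, lam)
    p = [row[:] for row in z]
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return u, it
        Ap = apply_operator(p, lam)
        alpha = rz / dot(p, Ap)
        for i in range(nx):  # fused update of solution and residual
            for k in range(nz):
                u[i][k] += alpha * p[i][k]
                r[i][k] -= alpha * Ap[i][k]
        z = solve_vertical_lines(r, lam)
        rz_new = dot(r, z)
        beta = rz_new / rz
        for i in range(nx):
            for k in range(nz):
                p[i][k] = z[i][k] + beta * p[i][k]
        rz = rz_new
    return u, max_iter
```

    For example, `u, iters = pcg([[1.0] * 8 for _ in range(8)], 100.0)` solves the model problem on an 8 x 8 grid. Because the strong vertical coupling is handled exactly by the line solves, the preconditioned system is well conditioned in this toy setting and only a few iterations should be needed.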

    Implicit High-Order Flux Reconstruction Solver for High-Speed Compressible Flows

    The present paper addresses the development and implementation of the first high-order Flux Reconstruction (FR) solver for high-speed flows within the open-source COOLFluiD (Computational Object-Oriented Libraries for Fluid Dynamics) platform. The resulting solver is fully implicit and able to simulate compressible flow problems governed by either the Euler or the Navier-Stokes equations in two and three dimensions. Furthermore, it can run in parallel on multiple CPU cores and is designed to handle unstructured grids consisting of both straight- and curved-edged quadrilateral or hexahedral elements. While most of the implementation relies on state-of-the-art FR algorithms, an improved and more case-independent shock capturing scheme has been developed in order to tackle the first viscous hypersonic simulations using the FR method. Extensive verification of the FR solver has been performed through the use of reproducible benchmark test cases with flow speeds ranging from subsonic to hypersonic, up to Mach 17.6. The obtained results compare favorably with those available in the literature. Furthermore, so-called super-accuracy is retrieved for certain cases when solving the Euler equations. The strengths of the FR solver in terms of computational accuracy per degree of freedom are also illustrated. Finally, the influence of the characterizing parameters of the FR method, as well as the influence of the novel shock capturing scheme, on the accuracy of the developed solver is discussed.

    Thermodynamic Conditions in Quenching Chamber of Low Voltage Circuit Breaker

    This work studies the processes that accompany the extinction of a high-current electric arc inside the quenching chamber of a circuit breaker. It focuses on computing the fluid dynamics (CFD) and the thermal field around the arc. The effects of the spacing and the shape of the metal plates in the chamber are described from an aerodynamic point of view. Another objective is to provide information about how the position of the electric arc influences the thermodynamic properties inside the chamber. This matters especially when the arc is drawn into the chamber by other forces, for example electromagnetic ones, and changes its shape and position as it is drawn in. To keep the solution as simple and as efficient as possible, new software was developed specifically for fluid dynamics computations with the finite volume method (FVM). Compared with the more widespread finite element method (FEM), this method is better suited to CFD, mainly because the overhead of computing one iteration is lower. A further advantage of the software is its modularity and extensibility: the whole concept is based on plug-ins, so the software's calculation core can be reused for other numerical analyses, such as structural or electromagnetic ones. The only requirement is to write a plug-in for these analyses, namely a finite element (FEM) solver. Because the software is designed as a multi-threaded application, it can make full use of current multi-core processors. This property becomes even more pronounced when the calculation is moved from the CPU to a GPU (Graphics Processing Unit). Current high-end graphics cards have tens to hundreds of cores and work with much faster memory than CPUs, so the simulation performance can be several times higher.