1,398 research outputs found

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

    An open and parallel multiresolution framework using block-based adaptive grids

    A numerical approach for solving evolutionary partial differential equations in two and three space dimensions on block-based adaptive grids is presented. The numerical discretization is based on high-order central finite differences and explicit time integration. Grid refinement and coarsening are triggered by multiresolution analysis, i.e. thresholding of wavelet coefficients, which allows the precision of the adaptive approximation of the solution to be controlled with respect to uniform grid computations. The implementation of the scheme is fully parallel using MPI with a hybrid data structure. Load balancing relies on space-filling curve techniques. Validation tests for 2D advection equations make it possible to assess the precision and performance of the developed code. Computations of the compressible Navier-Stokes equations for a temporally developing 2D mixing layer illustrate the properties of the code for nonlinear multi-scale problems. The code is open source.
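
The load-balancing idea in the abstract above can be illustrated with a minimal sketch. It assumes a Z-order (Morton) curve, 2D integer block coordinates, and equal cost per block; the abstract names none of these, and the function names `morton_index` and `partition_blocks` are hypothetical.

```python
def morton_index(ix, iy, bits=16):
    """Interleave the bits of block coordinates (ix, iy) into a Z-order key.

    Assumption: a Morton curve is used here only as one example of a
    space-filling curve; the abstract does not say which curve is used.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)        # x bits go to even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)    # y bits go to odd positions
    return key


def partition_blocks(blocks, n_ranks):
    """Order blocks along the curve and cut the list into contiguous chunks.

    Blocks that are close along the curve tend to be close in space, so
    each rank receives a reasonably compact set of blocks.
    """
    ordered = sorted(blocks, key=lambda b: morton_index(b[0], b[1]))
    base, extra = divmod(len(ordered), n_ranks)
    parts, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)  # spread the remainder
        parts.append(ordered[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    # A 4 x 4 arrangement of blocks distributed over 4 ranks.
    blocks = [(ix, iy) for ix in range(4) for iy in range(4)]
    for rank, part in enumerate(partition_blocks(blocks, 4)):
        print(rank, part)
```

With equal block costs, each rank receives one quadrant of the 4 x 4 arrangement. A production code would weight blocks by their actual cost and repartition after each refinement or coarsening step.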

    CFD modelling of wind turbine airfoil aerodynamics

    This paper reports the first findings of an ongoing research programme on wind turbine computational aerodynamics at the University of Glasgow. Several modeling aspects of wind turbine airfoil aerodynamics based on the solution of the Reynolds-averaged Navier-Stokes (RANS) equations are addressed. One of these is the effect of an a priori method for structured grid adaptation aimed at improving the wake resolution. The results show that the proposed adaptation strategy greatly improves the wake resolution in the far field, whereas the wake is completely diffused by the non-adapted grid with the same number and distribution of grid nodes. A grid refinement analysis carried out with the adapted grid shows that the improvements in flow resolution thus achieved are smaller in magnitude than those obtained by adapting the grid while keeping the number of nodes constant. The proposed adaptation approach can easily be included in the structured grid generation process of both commercial and in-house structured mesh generators. The study also aims at quantifying the solution inaccuracy arising from not modeling the laminar-to-turbulent transition. It is found that the drag forces obtained by considering the flow as transitional or fully turbulent may differ by 50%. The impact of various turbulence models on the predicted aerodynamic forces is also analyzed. All these issues are investigated using a special-purpose hyperbolic grid generator and a multi-block structured finite-volume RANS code. The numerical experiments consider the flow field past a wind turbine airfoil for which an exhaustive campaign of steady and unsteady experimental measurements was conducted. The predictive capabilities of the CFD solver are validated by comparing experimental data and numerical predictions for selected flow regimes. The incompressible analysis and design code XFOIL is also used to support the findings of the comparative analysis of numerical RANS-based results and experimental data.

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

    An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

    Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can accelerate simulation science applications tremendously. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPUs) are now being augmented with multiple GPUs in each compute node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfer and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.
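
The aggregate figures quoted in the abstract above can be broken down per GPU with simple arithmetic. The sketch below uses only the numbers stated in the abstract; the function name `per_gpu_figures` is hypothetical, and the per-GPU values are averages derived here, not measurements reported by the authors.

```python
def per_gpu_figures(sustained_tflops, n_nodes, n_gpus, n_processing_elements):
    """Derive average per-node and per-GPU figures from cluster totals."""
    return {
        "gpus_per_node": n_gpus / n_nodes,
        "processing_elements_per_gpu": n_processing_elements / n_gpus,
        # 1 TeraFLOPS = 1000 GigaFLOPS
        "sustained_gflops_per_gpu": sustained_tflops * 1000.0 / n_gpus,
    }


if __name__ == "__main__":
    # Totals as stated in the abstract: about 2.4 TeraFLOPS on 64 nodes,
    # 128 GPUs, 30,720 processing elements.
    for name, value in per_gpu_figures(2.4, 64, 128, 30720).items():
        print(f"{name}: {value:g}")
```

These totals correspond to 2 GPUs per node, 240 processing elements per GPU, and an average of roughly 18.75 sustained GigaFLOPS per GPU.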

    Segregated Runge–Kutta time integration of convection-stabilized mixed finite element schemes for wall-unresolved LES of incompressible flows

    In this work, we develop a high-performance numerical framework for the large eddy simulation (LES) of incompressible flows. The spatial discretization of the nonlinear system is carried out using mixed finite element (FE) schemes supplemented with symmetric projection stabilization of the convective term and a penalty term for the divergence constraint. These additional terms introduced at the discrete level have been proved to act as implicit LES models. In order to perform meaningful wall-unresolved simulations, we consider a weak imposition of the boundary conditions using a Nitsche’s-type scheme, where the tangential component penalty term is designed to act as a wall law. Next, segregated Runge–Kutta (SRK) schemes (recently proposed by the authors for laminar flow problems) are applied to the LES simulation of turbulent flows. By the introduction of a penalty term on the trace of the acceleration, these methods exhibit excellent stability properties for both implicit and explicit treatment of the convective terms. SRK schemes are excellent for large-scale simulations, since they reduce the computational cost of the linear system solves by splitting velocity and pressure computations at the time integration level, leading to two uncoupled systems. The pressure system is a Darcy-type problem that can easily be preconditioned using a traditional block-preconditioning scheme that only requires a Poisson solver. At the end, only coercive systems have to be solved, which can be effectively preconditioned by multilevel domain decomposition schemes, which are both optimal and scalable. The framework is applied to the Taylor–Green and turbulent channel flow benchmarks in order to prove the accuracy of the convection-stabilized mixed FEs as LES models and SRK time integrators. The scalability of the preconditioning techniques (in space only) has also been proven for one step of the SRK scheme for the Taylor–Green flow using uniform meshes. 
Moreover, a turbulent flow around a NACA profile is solved to show the applicability of the proposed algorithms for a realistic problem.
    Peer reviewed. Postprint (author's final draft).

    Development of a Navier-Stokes algorithm for parallel-processing supercomputers

    An explicit flow solver, applicable to the hierarchy of model equations ranging from Euler to full Navier-Stokes, is combined with several techniques designed to reduce computational expense. The computational domain consists of local grid refinements embedded in a global coarse mesh, where the locations of these refinements are defined by the physics of the flow. Flow characteristics are also used to determine which set of model equations is appropriate for solution in each region, thereby reducing not only the number of grid points at which the solution must be obtained, but also the computational effort required to get that solution. Acceleration to steady state is achieved by applying multigrid on each of the subgrids, regardless of the particular model equations being solved. Since each of these components is explicit, advantage can readily be taken of the vector- and parallel-processing capabilities of machines such as the Cray X-MP and Cray-2.

    A Full-Depth Amalgamated Parallel 3D Geometric Multigrid Solver for GPU Clusters

    Numerical computations of incompressible flow equations with pressure-based algorithms necessitate the solution of an elliptic Poisson equation, for which multigrid methods are known to be very efficient. In our previous work we presented a dual-level (MPI-CUDA) parallel implementation of the Navier-Stokes equations to simulate buoyancy-driven incompressible fluid flows on GPU clusters with simple iterative methods while focusing on the scalability of the overall solver. In the present study we describe the implementation and performance of a multigrid method to solve the pressure Poisson equation within our MPI-CUDA parallel incompressible flow solver. Various design decisions and algorithmic choices for multigrid methods are explored in light of NVIDIA’s recent Fermi architecture. We discuss how unique aspects of an MPI-CUDA implementation for GPU clusters are related to the software choices made to implement the multigrid method. We propose a new coarse grid solution method of embedded multigrid with amalgamation and show that the parallel implementation retains the numerical efficiency of the multigrid method. Performance measurements on the NCSA Lincoln and TACC Longhorn clusters are presented for up to 64 GPUs.
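
To make the role of multigrid for the Poisson equation concrete, here is a minimal serial sketch of a textbook geometric V-cycle for the 1D problem -u'' = f with zero boundary values. It is not the authors' 3D MPI-CUDA solver and does not include their amalgamated coarse-grid method; the weighted Jacobi smoother, full-weighting restriction, linear interpolation, and all function names are assumptions chosen for illustration.

```python
import math


def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f; boundary values stay fixed."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def residual(u, f, h):
    """r = f - A u for the standard 3-point Laplacian."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full weighting onto the grid that keeps every second point."""
    nc = (len(r) - 1) // 2 + 1
    rc = [0.0] * nc
    for j in range(1, nc - 1):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc


def prolong(ec, n_fine):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    e = [0.0] * n_fine
    for j in range(len(ec)):
        e[2 * j] = ec[j]
    for i in range(1, n_fine, 2):
        e[i] = 0.5 * (e[i - 1] + e[i + 1])
    return e


def v_cycle(u, f, h):
    """One V-cycle; grids must have 2**k + 1 points. Updates u in place."""
    if len(u) <= 3:
        # Coarsest grid has a single interior point: solve it exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h)                                   # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)        # coarse-grid correction
    e = prolong(ec, len(u))
    for i in range(1, len(u) - 1):
        u[i] += e[i]
    return smooth(u, f, h)                            # post-smoothing


def poisson_demo(levels=6, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on [0, 1]; exact solution is sin(pi x)."""
    n = 2 ** levels + 1
    h = 1.0 / (n - 1)
    x = [i * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * n
    for _ in range(cycles):
        v_cycle(u, f, h)
    error = max(abs(u[i] - math.sin(math.pi * x[i])) for i in range(n))
    res = max(abs(r) for r in residual(u, f, h))
    return error, res


if __name__ == "__main__":
    print(poisson_demo())
```

The property that makes multigrid attractive here is that the number of V-cycles needed for a given residual reduction stays roughly constant as the grid is refined, whereas the iteration count of a simple iterative method alone grows with the number of grid points.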

    Efficient Simulations of Large Scale Convective Heat Transfer Problems

    We describe an approach for the efficient solution of large-scale convective heat transfer problems, formulated as coupled unsteady heat conduction and incompressible fluid flow equations. The original problem is discretized in time using classical implicit methods, while stabilized finite elements are used for space discretization. The algorithm employed for the discretization of the fluid flow problem uses Picard iterations to solve the arising nonlinear equations. Both problems, the heat transfer and the Navier-Stokes equations, give rise to large sparse systems of linear equations. The systems are solved using an iterative GMRES solver with suitable preconditioning. For the incompressible flow equations we employ a special preconditioner based on the algebraic multigrid (AMG) technique. The paper presents algorithmic and implementation details of the solution procedure, which is suitably tuned, especially for ill-conditioned systems arising from discretizations of the incompressible Navier-Stokes equations. We describe a parallel implementation of the solver using MPI and elements of the PETSc library. The scalability of the solver compares favourably with other methods such as direct solvers and the standard GMRES method with ILU preconditioning.

    STREAmS: a high-fidelity accelerated solver for direct numerical simulation of compressible turbulent flow

    We present STREAmS, an in-house high-fidelity solver for large-scale, massively parallel direct numerical simulations (DNS) of compressible turbulent flows on graphics processing units (GPUs). STREAmS is written in the Fortran 90 language and is tailored to carry out DNS of canonical compressible wall-bounded flows, namely turbulent plane channel, zero-pressure-gradient turbulent boundary layer, and supersonic oblique shock-wave/boundary layer interactions. The solver incorporates state-of-the-art numerical algorithms, specifically designed to cope with the challenging problems associated with the solution of high-speed turbulent flows, and can be used across a wide range of Mach numbers, extending from the low subsonic up to the hypersonic regime. The use of cuf automatic kernels allowed an easy and efficient port to the GPU architecture, minimizing the changes to the original CPU code, which is also maintained. We discuss a memory allocation strategy based on duplicated arrays for host and device which carefully minimizes the memory usage, making the solver suitable for large-scale computations on the latest GPU cards. Comparisons between different CPU and GPU architectures strongly favor the latter, and executing the solver on a single NVIDIA Tesla P100 corresponds to using approximately 330 Intel Knights Landing CPU cores. STREAmS shows very good strong scalability and essentially ideal weak scalability up to 2048 GPUs, paving the way to simulations in the genuine high-Reynolds number regime, possibly at friction Reynolds number $Re_{\tau} > 10^4$. The solver is released open source under the GPLv3 license and is available at https://github.com/matteobernardini/STREAmS.
    Comment: 11 pages, 11 figures