    An unstructured parallel least-squares spectral element solver for incompressible flow problems

    The parallelization of the least-squares spectral element formulation of the Stokes problem has recently been discussed for incompressible flow problems on structured grids. In the present work, the extension to unstructured grids is discussed. It will be shown that, to obtain an efficient and scalable method, two different kinds of distribution of data are required involving a rather complicated parallel conversion between the data. Once the data conversion has been performed, a large symmetric positive definite algebraic system has to be solved iteratively. It is well known that the Conjugate Gradient method is a good choice to solve such systems. To improve the convergence rate of the Conjugate Gradient process, both Jacobi and Additive Schwarz preconditioners are applied. The Additive Schwarz preconditioner is based on domain decomposition and can be implemented such that a preconditioning step corresponds to a parallel matrix-by-vector product. The new results reveal that the Additive Schwarz preconditioner is very suitable for the p-refinement version of the least-squares spectral element method. To obtain good portable programs which may run on distributed-memory multiprocessors, networks of workstations as well as shared-memory machines we use MPI (Message Passing Interface). Numerical simulations have been performed to validate the scalability of the different parts of the proposed method. The experiments entailed simulating several large scale incompressible flows on a Cray T3E and on an SGI Origin 3800 with the number of processors varying from one to more than one hundred. The results indicate that the present method has very good parallel scaling properties making it a powerful method for numerical simulations of incompressible flows
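The iterative solve described in this abstract can be illustrated with a minimal serial sketch of Conjugate Gradient with a Jacobi (diagonal) preconditioner. It is not the authors' parallel MPI implementation: the function names `jacobi_pcg` and `laplacian_1d`, the dense NumPy storage, and the 1D Laplacian test matrix are all assumptions chosen for illustration.

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A using
    Conjugate Gradient with a Jacobi (diagonal) preconditioner.
    Returns the approximate solution and the iteration count."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x, 0                      # trivial right-hand side
    inv_diag = 1.0 / np.diag(A)          # M^{-1} = diag(A)^{-1}
    r = b - A @ x                        # initial residual
    z = inv_diag * r                     # preconditioned residual
    p = z.copy()                         # first search direction
    rz = r @ z
    for iteration in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)            # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, iteration
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p        # next A-conjugate direction
        rz = rz_new
    return x, max_iter

def laplacian_1d(n):
    """Hypothetical test matrix: 1D Laplacian, symmetric positive definite."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

if __name__ == "__main__":
    A = laplacian_1d(20)
    b = np.ones(20)
    x, iterations = jacobi_pcg(A, b)
    print(iterations, np.linalg.norm(A @ x - b))
```

In a distributed setting the matrix-by-vector product and the two inner products per iteration are the steps that require communication, which is why the abstract emphasises a preconditioner that reduces to a parallel matrix-by-vector product.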

    RIACS

    Topics considered include: high-performance computing; cognitive and perceptual prostheses (computational aids designed to leverage human abilities); autonomous systems. Also included: development of a 3D unstructured grid code based on a finite volume formulation and applied to the Navier-Stokes equations; Cartesian grid methods for complex geometry; multigrid methods for solving elliptic problems on unstructured grids; algebraic non-overlapping domain decomposition methods for compressible fluid flow problems on unstructured meshes; numerical methods for the compressible Navier-Stokes equations with application to aerodynamic flows; research in aerodynamic shape optimization; S-HARP: a parallel dynamic spectral partitioner; numerical schemes for the Hamilton-Jacobi and level set equations on triangulated domains; application of high-order shock capturing schemes to direct simulation of turbulence; multicast technology; network testbeds; supercomputer consolidation project

    Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids

    This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically perform at a computational rate well below other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify that the primary factors responsible for this relatively poor performance are: insufficient available memory bandwidth, low ratio of work to data size (good algorithmic efficiency), and nonscaling cost of synchronization and gather/scatter operations (for a fixed problem size scaling). This dissertation also illustrates how to reuse the legacy scientific and engineering software within a library framework. Specifically, a three-dimensional unstructured grid incompressible Euler code from NASA has been parallelized with the Portable Extensible Toolkit for Scientific Computing (PETSc) library for distributed memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs from 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10^5 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates. Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating point instructions per memory reference, is provided through accurate measurements with hardware counters.
The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation) for a fixed-size problem when the number of processors grows to several thousands (the expected level of concurrency on terascale architectures). We also evaluate the hybrid programming model (mixed distributed/shared) from a performance standpoint
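The "interlacing" mentioned in this abstract can be illustrated with a small sketch. It assumes the common meaning of the term, namely storing all unknowns belonging to one grid point contiguously instead of keeping one separate array per field; the function names `interlace` and `deinterlace` and the sample values are hypothetical and are not taken from PETSc-FUN3D.

```python
import numpy as np

def interlace(fields):
    """Combine per-field arrays (each of length n_points) into a single
    point-major array: [f0(p0), f1(p0), ..., f0(p1), f1(p1), ...]."""
    return np.stack(fields, axis=1).ravel()

def deinterlace(interlaced, num_fields):
    """Recover the per-field arrays from a point-major array."""
    return [interlaced[k::num_fields].copy() for k in range(num_fields)]

if __name__ == "__main__":
    # Two hypothetical fields defined on three grid points.
    pressure = np.array([1.0, 2.0, 3.0])
    velocity = np.array([10.0, 20.0, 30.0])
    packed = interlace([pressure, velocity])
    print(packed)
```

With the interlaced layout, a kernel that needs every unknown at a grid point reads one contiguous chunk of memory, which is the kind of improved data locality the abstract credits for the speedup.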

    Parallel simulations of reacting two-phase flows - A DoD Grand Challenge progress report

    Parallel simulation of unsteady turbulent combustion is carried out for a range of precursor test problems leading to the development of a new methodology for reacting two-phase flows. Simulations are carried out using large-eddy simulation (LES), which allows full spatio-temporal resolution of all scales larger than the grid resolution, with the unresolved small scales modeled by a localized dynamic one-equation subgrid model. For two-phase applications, Lagrangian tracking of a range of droplets is carried out and is fully coupled to the Eulerian gas phase flow. An extension of this approach to accurately deal with small-scale scalar mixing and chemical reactions has been carried out using an innovative model that is implemented within each LES cell to account for the effects of small-scale mixing and molecular diffusion on the chemical processes. The first year's effort focused on validating this methodology using both simple and complex test configurations. Highly optimized parallel LES codes are used for these studies. In addition to parallel scaleup data, results discussed in this paper include a stagnation point premixed flame, an opposed jet diffusion flame, a highly swirling premixed flame in a General Electric combustor, and two-phase mixing and vaporization in mixing layers. Comparison with experimental data, wherever possible, clearly demonstrates the unique capabilities of the new subgrid combustion LES model

    Research in Applied Mathematics, Fluid Mechanics and Computer Science

    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period October 1, 1998 through March 31, 1999

    Large Eddy Simulation of separating flows from curved surfaces

    PhD thesis. The capabilities and limitations of LES in predicting separation from curved surfaces at high Reynolds number are at the centre of this Thesis. Issues of particular interest are mesh resolution, subgrid-scale modelling and near-wall approximations aiming to reduce the computational cost. Two cases are examined: a flow separating in a channel with streamwise periodic constrictions (hills), and the flow around a single-element, high-lift aerofoil at a Reynolds number of 2.1 × 10^6. Prior to these studies, fully-developed channel-flow simulations are considered. These show substantial differences among subgrid-scale models in terms of the subgrid-scale viscosity magnitude and its wall-asymptotic variation. Modelling and numerical errors appear to counteract each other, thus reducing the total error. Wall functions are shown to be a cost-effective approach, providing a reasonably accurate approximation in near-equilibrium conditions. Adequate resolution remains critical, however, in achieving successful simulations. In the hill flow, separation occurs downstream of the hill crest, reattachment takes place about half-way between two consecutive hills and partial recovery occurs prior to a re-acceleration on the following hill. A highly-resolved simulation, performed to produce benchmark data, permits an extensive study of the flow properties. Coarser mesh simulations are then compared with the former. These highlight the influence of the streamwise discretisation around the separation point and the role played by the implementation details of the wall treatments, while the influence of the subgrid-scale models is less significant. The aerofoil, which features transition and separation, is extremely challenging and at the edge of current LES capabilities. None of the simulations reproduce the experimental data well. Indications on the sensitivity to various parameters, including the numerical scheme, the mesh resolution and the spanwise extent, are extracted, however.
The studies indicate the need for a structured mesh of about 80 million nodes to achieve the required accuracy. For the present study, this was unaffordable

    On the construction of deflation-based preconditioners

    In this article we introduce new bounds on the effective condition number of deflated and preconditioned-deflated symmetric positive definite linear systems. For the case of a subdomain deflation such as that of Nicolaides [SIAM J. Numer. Anal., 24 (1987), pp. 355-365], these theorems can provide direction in choosing a proper decomposition into subdomains. If grid refinement is performed, keeping the subdomain grid resolution fixed, the condition number is insensitive to the grid size. Subdomain deflation is very easy to implement and has been parallelized on a distributed memory system with only a small amount of additional communication. Numerical experiments for a steady-state convection-diffusion problem are included
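To make the idea of subdomain deflation concrete, the sketch below builds piecewise-constant deflation vectors (one indicator vector per subdomain) and the deflation projector in its commonly written form P = I - A Z (Z^T A Z)^{-1} Z^T. The abstract does not spell out these formulas, so they are assumed here; the helper names, the dense storage, the contiguous 1D partition, and the Laplacian test matrix are likewise hypothetical choices for illustration.

```python
import numpy as np

def subdomain_deflation_vectors(n, num_subdomains):
    """Return Z (n x num_subdomains): column j equals 1 on the unknowns
    of subdomain j and 0 elsewhere (contiguous 1D partition assumed)."""
    Z = np.zeros((n, num_subdomains))
    for j, idx in enumerate(np.array_split(np.arange(n), num_subdomains)):
        Z[idx, j] = 1.0
    return Z

def deflation_projector(A, Z):
    """Return P = I - A Z (Z^T A Z)^{-1} Z^T for symmetric positive definite A."""
    E = Z.T @ A @ Z                      # small coarse matrix, one row per subdomain
    return np.eye(A.shape[0]) - A @ Z @ np.linalg.solve(E, Z.T)

if __name__ == "__main__":
    n = 12
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Laplacian
    Z = subdomain_deflation_vectors(n, 3)
    P = deflation_projector(A, Z)
    # The deflated operator P A annihilates the deflation space.
    print(np.abs(P @ A @ Z).max())
```

The only global object is the small matrix Z^T A Z, whose size equals the number of subdomains, which is consistent with the abstract's remark that parallelization needs only a small amount of additional communication.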