Development, Implementation, and Optimization of a Modern, Subsonic/Supersonic Panel Method
In the early stages of aircraft design, engineers consider many different design concepts, examining the trade-offs between different component arrangements and sizes, thrust and power requirements, etc. Because so many different designs are considered, it is best in these early stages to use simulation tools that are fast; accuracy is secondary. A common simulation tool for early design and analysis is the panel method. Panel methods were first developed in the 1950s and 1960s with the advent of modern computers. Despite being reasonably accurate and very fast, their development was abandoned in the late 1980s in favor of more complex and accurate simulation methods. The panel methods developed in the 1980s are still in use by aircraft designers today because of their accuracy and speed. However, they are cumbersome to use and limited in applicability. The purpose of this work is to reexamine panel methods in a modern context. In particular, this work focuses on the application of panel methods to supersonic aircraft (a supersonic aircraft is one that flies faster than the speed of sound). Various aspects of the panel method, including the distributions of the unknown flow variables on the surface of the aircraft and efficient methods for solving for these unknowns, are discussed. Trade-offs between alternative formulations are examined and recommendations are given. This work also serves to bring together, clarify, and condense much of the literature previously published regarding panel methods, so as to assist future developers of panel methods.
Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs
Sparse linear iterative solvers are essential for many large-scale
simulations. Much of the runtime of these solvers is often spent in the
implicit evaluation of matrix polynomials via a sequence of sparse
matrix-vector products. A variety of approaches have been proposed to make these
polynomial evaluations explicit (i.e., fix the coefficients), e.g., polynomial
preconditioners or s-step Krylov methods. Furthermore, it is nowadays a popular
practice to approximate triangular solves by a matrix polynomial to increase
parallelism. Such algorithms allow the polynomial to be evaluated using a so-called
matrix power kernel (MPK), which computes the product between a power of a
sparse matrix A and a dense vector x, or a related operation. Recently we have
shown that using the level-based formulation of sparse matrix-vector
multiplications in the Recursive Algebraic Coloring Engine (RACE) framework, we
can perform temporal cache blocking of MPK to increase its performance. In this
work, we demonstrate the application of this cache-blocking optimization in
sparse iterative solvers.
By integrating the RACE library into the Trilinos framework, we demonstrate
the speedups achieved in (preconditioned) s-step GMRES, polynomial
preconditioners, and algebraic multigrid (AMG). For MPK-dominated algorithms we
achieve speedups of up to 3x on modern multi-core compute nodes. For algorithms
with moderate contributions from subspace orthogonalization, the gain reduces
significantly, which is often caused by the insufficient quality of the
orthogonalization routines. Finally, we showcase the application of
RACE-accelerated solvers in a real-world wind turbine simulation (Nalu-Wind)
and highlight the new opportunities and perspectives opened up by RACE as a
cache-blocking technique for MPK-enabled sparse solvers.
Comment: 25 pages, 11 figures, 3 tables
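The matrix power kernel at the heart of this abstract can be sketched in a few lines. The naive version below (an illustrative sketch; the function name is ours, not RACE's API) streams the sparse matrix from memory once per power, which is precisely the traffic that RACE's temporal cache blocking reduces by reusing cached rows of A across several powers:

```python
import numpy as np
import scipy.sparse as sp

def matrix_power_kernel(A, x, p):
    """Naive MPK: return [A @ x, A^2 @ x, ..., A^p @ x].

    Each power is one full sweep over A, so A is read from memory
    p times; a temporally blocked MPK would instead reuse cached
    parts of A across consecutive powers (not shown here).
    """
    out = []
    v = x
    for _ in range(p):
        v = A @ v          # one sparse matrix-vector product
        out.append(v)
    return out

# Small demo on a random sparse matrix
rng = np.random.default_rng(0)
A = sp.random(100, 100, density=0.05, format="csr", random_state=0)
x = rng.standard_normal(100)
powers = matrix_power_kernel(A, x, 3)
assert np.allclose(powers[2], A @ (A @ (A @ x)))
```

The output vectors are exactly what an s-step Krylov method or polynomial preconditioner consumes as its Krylov basis.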
Mixed Precision Iterative Refinement with Adaptive Precision Sparse Approximate Inverse Preconditioning
Hardware trends have motivated the development of mixed precision algorithms
in numerical linear algebra, which aim to decrease runtime while maintaining
acceptable accuracy. One recent development is an adaptive
precision sparse matrix-vector product routine, which may be used to accelerate
the solution of sparse linear systems by iterative methods. This approach is
also applicable to the application of inexact preconditioners, such as sparse
approximate inverse preconditioners used in Krylov subspace methods. In this
work, we develop an adaptive precision sparse approximate inverse
preconditioner and demonstrate its use within a five-precision GMRES-based
iterative refinement method. We call this algorithm variant BSPAI-GMRES-IR. We
then analyze the conditions for the convergence of BSPAI-GMRES-IR, and
determine settings under which BSPAI-GMRES-IR will produce backward and
forward errors similar to those of the existing SPAI-GMRES-IR method, which does
not use adaptive precision in preconditioning. Our numerical experiments show
that this approach can potentially lead to a reduction in the cost of storing
and applying sparse approximate inverse preconditioners, although a significant
reduction in cost may come at the expense of an increased number of GMRES
iterations required for convergence.
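The principle behind mixed precision iterative refinement can be illustrated with a simplified two-precision variant: factorize and correct in low precision, accumulate residuals and updates in high precision. This is a sketch only; it uses a dense float32 solve in place of the preconditioned GMRES correction of the five-precision GMRES-IR analyzed above:

```python
import numpy as np

def iterative_refinement(A, b, iters=10):
    """Two-precision iterative refinement sketch: solves in float32,
    residuals and solution updates in float64."""
    A32 = A.astype(np.float32)
    # Low-precision initial solve (a real implementation would reuse
    # one LU factorization rather than re-solving each time)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                  # float64 residual
        d = np.linalg.solve(A32, r.astype(np.float32)) # float32 correction
        x = x + d.astype(np.float64)                   # float64 update
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50)) + 50 * np.eye(50)    # well conditioned
b = rng.standard_normal(50)
x = iterative_refinement(A, b)
assert np.linalg.norm(b - A @ x) <= 1e-10 * np.linalg.norm(b)
```

For a sufficiently well-conditioned matrix, the refinement loop recovers full double-precision backward error despite all solves being performed in single precision.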
LIPIcs, Volume 261, ICALP 2023, Complete Volume
A full approximation scheme multilevel method for nonlinear variational inequalities
We present the full approximation scheme constraint decomposition (FASCD)
multilevel method for solving variational inequalities (VIs). FASCD is a common
extension of both the full approximation scheme (FAS) multigrid technique for
nonlinear partial differential equations, due to A.~Brandt, and the constraint
decomposition (CD) method introduced by X.-C.~Tai for VIs arising in
optimization. We extend the CD idea by exploiting the telescoping nature of
certain function space subset decompositions arising from multilevel mesh
hierarchies. When a reduced-space (active set) Newton method is applied as a
smoother, with work proportional to the number of unknowns on a given mesh
level, FASCD V-cycles exhibit nearly mesh-independent convergence rates, and
full multigrid cycles are optimal solvers. The example problems include
differential operators which are symmetric linear, nonsymmetric linear, and
nonlinear, in unilateral and bilateral VI problems.
Comment: 25 pages, 9 figures
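A single-level smoother for a variational inequality can be illustrated with projected Gauss-Seidel on a 1D obstacle problem. This is a simpler smoother than the reduced-space Newton method used in FASCD, and the loop below omits the multilevel cycle entirely; it is a sketch of the kind of constrained problem the method targets, with all names ours:

```python
import numpy as np

def projected_gauss_seidel(psi, f, h, sweeps=500):
    """Projected Gauss-Seidel for the discretized 1D obstacle problem
    -u'' = f subject to u >= psi, u(0) = u(1) = 0 (a unilateral VI).
    Each update is the unconstrained Gauss-Seidel value projected
    back onto the feasible set {u_i >= psi_i}."""
    n = len(psi)
    u = np.maximum(psi, 0.0)           # feasible starting iterate
    b = h * h * f
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = max(psi[i], 0.5 * (b[i] + left + right))
    return u

n = 63
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
h = x[1] - x[0]
psi = 0.2 - 2.0 * (x - 0.5) ** 2       # parabolic obstacle
u = projected_gauss_seidel(psi, np.zeros(n), h)
assert np.all(u >= psi - 1e-12)        # iterate stays feasible
```

The multilevel idea in FASCD is to apply such smoothing work on every level of a mesh hierarchy, with the constraint decomposed across levels, so that total work stays proportional to the number of unknowns.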
An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator
Realistic reservoir simulation is known to be prohibitively expensive in
terms of computation time when increasing the accuracy of the simulation or
enlarging the model grid size. One method to address this issue is to
parallelize the computation by dividing the model into several partitions and
using multiple CPUs to compute the result using techniques such as MPI and
multi-threading. Alternatively, GPUs are also a good candidate to accelerate
the computation due to their massively parallel architecture that allows many
floating point operations per second to be performed. The numerical iterative
solver thus takes the most computational time and is challenging to parallelize
efficiently due to the dependencies that exist between cells in the model. In
this work, we evaluate the OPM Flow simulator and compare several
state-of-the-art GPU solver libraries as well as custom developed solutions for
a BiCGStab solver using an ILU0 preconditioner and benchmark their performance
against the default DUNE library implementation running on multiple CPU
processors using MPI. The evaluated GPU software libraries include a manual
linear solver in OpenCL and the integration of several third party sparse
linear algebra libraries, such as cuSparse, rocSparse, and amgcl. To perform
our benchmarking, we use small, medium, and large use cases, starting with the
public test case NORNE that includes approximately 50k active cells and ending
with a large model that includes approximately 1 million active cells. We find
that a GPU can accelerate a single dual-threaded MPI process up to 5.6 times,
and that its performance is comparable to that of around 8 dual-threaded MPI
processes.
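The solver configuration benchmarked above, BiCGStab with an incomplete LU preconditioner, can be mimicked on CPU with SciPy. Note the substitutions: `spilu` is a thresholded ILU rather than the ILU0 used in OPM Flow, and a 2D Poisson matrix stands in for a reservoir pressure system; this is an illustrative sketch, not the simulator's code path:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# 2D Poisson matrix as a stand-in for a reservoir pressure system
n = 32
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
b = np.ones(n * n)

# Incomplete LU preconditioner (thresholded ILU, standing in for ILU0)
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, ilu.solve)

x, info = spla.bicgstab(A, b, M=M)
assert info == 0  # converged
```

The dependency problem mentioned in the abstract shows up here in `ilu.solve`: the two triangular solves it performs are inherently sequential, which is exactly what makes ILU-type preconditioners hard to map onto massively parallel GPUs.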
Algebraic, Block and Multiplicative Preconditioners based on Fast Tridiagonal Solves on GPUs
This thesis contributes to the field of sparse linear algebra, graph applications, and preconditioners for Krylov iterative solvers of sparse linear equation systems, by providing a (block) tridiagonal solver library, a generalized sparse matrix-vector implementation, a linear forest extraction, and a multiplicative preconditioner based on tridiagonal solves. The tridiagonal library, which supports (scaled) partial pivoting, outperforms cuSPARSE's tridiagonal solver by a factor of five while completely utilizing the available GPU memory bandwidth. For the performance-optimized solving of multiple right-hand sides, the explicit factorization of the tridiagonal matrix can be computed. The extraction of a weighted linear forest (a union of disjoint paths) from a general graph is used to build algebraic (block) tridiagonal preconditioners and deploys the generalized sparse matrix-vector implementation of this thesis for preconditioner construction. During linear forest extraction, a new parallel bidirectional scan pattern, which can operate on doubly-linked list structures, identifies the path ID and the position of a vertex. The algebraic preconditioner construction is also used to build more advanced preconditioners, which contain multiple tridiagonal factors, based on generalized ILU factorizations. Additionally, other preconditioners based on tridiagonal factors are presented and evaluated in comparison to ILU and ILU incomplete sparse approximate inverse (ILU-ISAI) preconditioners for the solution of large sparse linear equation systems from the Sparse Matrix Collection. For all problems presented in this thesis, an efficient parallel algorithm and its CUDA implementation for single-GPU systems are provided.
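For reference, the classical sequential tridiagonal solve looks as follows (the Thomas algorithm, i.e., LU without pivoting). GPU libraries such as the one described above instead use parallel formulations like cyclic reduction and add (scaled) partial pivoting for stability; this sketch is not the thesis's implementation:

```python
import numpy as np

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system via forward elimination and back
    substitution. `lower` and `upper` hold the n-1 sub- and
    super-diagonal entries; no pivoting, so the matrix should be
    diagonally dominant."""
    n = len(diag)
    c = np.empty(n)
    d = np.empty(n)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward sweep
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = np.empty(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

rng = np.random.default_rng(3)
n = 8
lower = rng.random(n - 1)
upper = rng.random(n - 1)
diag = 4.0 + rng.random(n)      # diagonally dominant, so no pivoting needed
rhs = rng.random(n)
A = np.diag(diag) + np.diag(lower, -1) + np.diag(upper, 1)
assert np.allclose(thomas_solve(lower, diag, upper, rhs), np.linalg.solve(A, rhs))
```

The two loops above form a strict recurrence in i, which is why a GPU implementation must restructure the elimination (e.g., via cyclic reduction) to expose parallelism.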
Recycling Krylov Subspaces for Efficient Partitioned Solution of Aerostructural Adjoint Systems
Robust and efficient solvers for coupled-adjoint linear systems are crucial
to successful aerostructural optimization. Monolithic and partitioned
strategies can be applied. The monolithic approach is expected to offer better
robustness and efficiency for strong fluid-structure interactions. However, it
requires a high implementation cost and convergence may depend on appropriate
scaling and initialization strategies. On the other hand, the modularity of the
partitioned method enables a straightforward implementation while its
convergence may require relaxation. In addition, a partitioned solver requires
a higher number of iterations to reach the same level of convergence as the
monolithic one.
The objective of this paper is to accelerate the fluid-structure
coupled-adjoint partitioned solver by considering techniques borrowed from
approximate invariant subspace recycling strategies adapted to sequences of
linear systems with varying right-hand sides. Indeed, in a partitioned
framework, the structural source term attached to the fluid block of equations
affects the right-hand side with the nice property of quickly converging to a
constant value. We also consider deflation of approximate eigenvectors in
conjunction with advanced inner-outer Krylov solvers for the fluid block
equations. We demonstrate the benefit of these techniques by computing the
coupled derivatives of an aeroelastic configuration of the ONERA-M6 fixed wing
in transonic flow. For this exercise the fluid grid was coupled to a structural
model specifically designed to exhibit high flexibility. All computations are
performed using RANS flow modeling and a fully linearized one-equation
Spalart-Allmaras turbulence model. Numerical simulations show up to 39%
reduction in matrix-vector products for GCRO-DR and up to 19% for the nested
FGCRO-DR solver.
Comment: 42 pages, 21 figures
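The benefit of carrying a subspace between systems can be sketched with a Galerkin projection onto a recycled subspace. For clarity the example below uses an exact invariant subspace of a diagonal matrix, a much-simplified stand-in for the approximate invariant subspaces that GCRO-DR transfers between the systems of a sequence; all names are ours:

```python
import numpy as np
import scipy.sparse as sp

def deflated_initial_guess(A, b, U):
    """Galerkin projection of A x = b onto span(U): solve the small
    projected system and lift the result, giving an initial guess
    whose residual has no component along the recycled directions."""
    AU = A @ U
    y = np.linalg.solve(U.T @ AU, U.T @ b)
    return U @ y

n = 200
A = sp.diags([np.arange(1.0, n + 1)], [0], format="csr")
rng = np.random.default_rng(2)
b = rng.standard_normal(n)

# Recycle the (here exact) invariant subspace of the 10 smallest
# eigenvalues; a Krylov solver started from x0 no longer "sees" them.
U = np.eye(n)[:, :10]
x0 = deflated_initial_guess(A, b, U)
r = b - A @ x0
assert np.allclose(r[:10], 0.0)   # deflated directions solved exactly
```

Because the residual is orthogonalized against the slowest-converging eigendirections, the subsequent Krylov iteration behaves as if the effective condition number were reduced, which is the mechanism behind the iteration savings reported for GCRO-DR and FGCRO-DR.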