A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
Alternating-Direction Line-Relaxation Methods on Multicomputers
We study the multicomputer performance of a three-dimensional Navier–Stokes solver based on alternating-direction line-relaxation methods. We compare several multicomputer implementations, each of which combines a particular line-relaxation method and a particular distributed block-tridiagonal solver. In our experiments, the problem size was determined by resolution requirements of the application. As a result, the granularity of the computations of our study is finer than is customary in the performance analysis of concurrent block-tridiagonal solvers. Our best results were obtained with a modified half-Gauss–Seidel line-relaxation method implemented by means of a new iterative block-tridiagonal solver that is developed here. Most computations were performed on the Intel Touchstone Delta, but we also used the Intel Paragon XP/S, the Parsytec SC-256, and the Fujitsu S-600 for comparison
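The line-relaxation methods above reduce each grid line to a (block-)tridiagonal system. A minimal serial sketch of the scalar Thomas algorithm conveys the core operation; the paper's solver is a distributed iterative block variant, so this illustration is only the scalar, single-processor case:

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c, and right-hand side d (lists of length n;
    a[0] and c[-1] are unused).  Serial Thomas algorithm, O(n)."""
    n = len(d)
    cp = [0.0] * n   # modified super-diagonal
    dp = [0.0] * n   # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # back substitution
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For example, the system with diagonals (-1, 2, -1) and right-hand side [0, 0, 4] has the exact solution [1, 2, 3]. The serial data dependence of the two sweeps is precisely what makes distributing such solves across processors nontrivial, motivating the iterative solver developed in the paper.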
Solving large sparse eigenvalue problems on supercomputers
An important problem in scientific computing is to find a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods for these problems are based on projection techniques onto appropriate subspaces. The main attraction of these methods is that they require the matrix only in the form of matrix-by-vector multiplications. The supercomputer implementations of two such methods for symmetric matrices, namely Lanczos's method and Davidson's method, are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, ways of performing this operation efficiently are discussed. The advantages and disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one-processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed
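The two ingredients the abstract singles out, sparse matrix-vector multiplication and the Lanczos recurrence, fit in a few lines. A minimal sketch assuming CSR storage (a common choice, not necessarily the one in the paper) and omitting the reorthogonalization a production code would need:

```python
import math

def spmv(vals, cols, rowptr, x):
    """y = A x for a sparse matrix in CSR form (vals/cols/rowptr)."""
    n = len(rowptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += vals[k] * x[cols[k]]
    return y

def lanczos(vals, cols, rowptr, v0, m):
    """m steps of the Lanczos recurrence for a symmetric CSR matrix.
    Returns the diagonal (alpha) and off-diagonal (beta) of the
    tridiagonal projection T_m, whose eigenvalues approximate the
    extremal eigenvalues of A.  Illustrative only: no reorthogonalization."""
    n = len(v0)
    nrm = math.sqrt(sum(x * x for x in v0))
    v = [x / nrm for x in v0]
    v_prev = [0.0] * n
    alpha, beta = [], []
    b = 0.0
    for _ in range(m):
        w = spmv(vals, cols, rowptr, v)          # the dominant cost per step
        a = sum(wi * vi for wi, vi in zip(w, v))
        # three-term recurrence: w = A v - a v - b v_prev
        w = [wi - a * vi - b * pi for wi, vi, pi in zip(w, v, v_prev)]
        alpha.append(a)
        b = math.sqrt(sum(x * x for x in w))
        if b == 0.0:                              # invariant subspace found
            break
        beta.append(b)
        v_prev, v = v, [x / b for x in w]
    return alpha, beta
```

Since each step touches the matrix only through `spmv`, vectorizing or parallelizing that one kernel, the focus of the abstract, accelerates the whole iteration.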
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are covered for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed
The use of Lanczos's method to solve the large generalized symmetric definite eigenvalue problem
The generalized eigenvalue problem, Kx = λMx, is of significant practical importance, especially in structural engineering, where it arises as the vibration and buckling problem. A new algorithm, LANZ, based on Lanczos's method, is developed. LANZ uses a technique called dynamic shifting to improve the efficiency and reliability of the Lanczos algorithm. A new algorithm for solving the tridiagonal matrices that arise when using Lanczos's method is described. A modification of Parlett and Scott's selective orthogonalization algorithm is proposed. Results from an implementation of LANZ on a Convex C-220 show it to be superior to a subspace iteration code
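The idea behind shifting strategies such as LANZ's dynamic shifting is the shift-invert spectral transform: running Lanczos on (K - σM)⁻¹M maps a pencil eigenvalue λ to μ = 1/(λ - σ), so eigenvalues near the shift σ become the largest in magnitude and converge first. A small numerical illustration with diagonal K and M (hypothetical data; a real code would factor K - σM rather than invert elementwise):

```python
def shift_invert_eigs(K_diag, M_diag, sigma):
    """For diagonal K and M, the pencil K x = lam M x has eigenvalues
    lam_i = K_ii / M_ii.  The shift-invert operator (K - sigma*M)^{-1} M
    has eigenvalues mu_i = 1 / (lam_i - sigma): pencil eigenvalues
    closest to sigma map to the largest |mu|, which is why Lanczos on
    the transformed operator finds them first."""
    mu = [m / (k - sigma * m) for k, m in zip(K_diag, M_diag)]
    lam_back = [sigma + 1.0 / m for m in mu]   # invert the transform
    return mu, lam_back
```

With K = diag(2, 10, 50) and M = diag(1, 2, 1), the pencil eigenvalues are 2, 5, and 50; choosing σ = 4.5 makes the eigenvalue 5 dominate the transformed spectrum. Dynamic shifting amounts to moving σ as converged eigenvalues are found, so each Lanczos run targets a fresh part of the spectrum.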
Alternating direction implicit time integrations for finite difference acoustic wave propagation: Parallelization and convergence
This work studies the parallelization and empirical convergence of two finite difference acoustic wave propagation methods on 2-D rectangular grids that use the same alternating direction implicit (ADI) time integration. This ADI integration is based on a second-order implicit Crank-Nicolson temporal discretization, factored by a Peaceman-Rachford decomposition of the time and space equation terms. In space, the two methods diverge, applying different fourth-order accurate differentiation techniques. The first method uses compact finite differences (CFD) on nodal meshes, which requires solving tridiagonal linear systems along each grid line, while the second employs staggered-grid mimetic finite differences (MFD). For each method, we implement three parallel versions: (i) a multithreaded code in Octave, (ii) a C++ code that exploits OpenMP loop parallelization, and (iii) a CUDA kernel for an NVIDIA GTX 960 Maxwell card. In these implementations, the main source of parallelism is the simultaneous ADI updating of each wave field matrix, either column-wise or row-wise, according to the differentiation direction. In our numerical applications, the highest performance is displayed by the CFD and MFD CUDA codes, which achieve speedups of 7.21x and 15.81x, respectively, relative to their C++ sequential counterparts with optimal compilation flags. Our test cases also allow us to assess the numerical convergence and accuracy of both methods. In a problem with an exact harmonic solution, both methods exhibit convergence rates close to 4, and the MFD accuracy is higher in practice. However, both convergence rates decay to second order on smooth problems with severe gradients at the boundaries, and the MFD rates degrade on highly resolved grids, leading to larger inaccuracies. This transition of empirical convergence rates agrees with the nominal truncation errors in space and time
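The structure that makes ADI parallelizable, independent tridiagonal solves along one direction per half-step, shows up already in the simplest setting. A minimal serial sketch of one Peaceman-Rachford step for the 2-D heat equation u_t = u_xx + u_yy (a simpler model than the acoustic wave system above; the zero-Dirichlet setup and function names are illustrative assumptions):

```python
def solve_const_tridiag(diag, off, d):
    """Thomas forward/backward sweep for a constant-coefficient
    tridiagonal matrix (diag on the main, off on both off-diagonals)."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = off / diag
    dp[0] = d[0] / diag
    for i in range(1, n):
        den = diag - off * cp[i - 1]
        cp[i] = off / den
        dp[i] = (d[i] - off * dp[i - 1]) / den
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def adi_step(u, r):
    """One Peaceman-Rachford ADI step for u_t = u_xx + u_yy with zero
    Dirichlet boundaries (u holds interior values only), r = dt/(2*h^2).
    Half-step 1 is implicit in x and explicit in y; half-step 2 swaps
    the directions.  The scheme is second-order in time overall."""
    ny, nx = len(u), len(u[0])

    def lap_y(w, i, j):  # explicit second difference in y (row index)
        up = w[i - 1][j] if i > 0 else 0.0
        dn = w[i + 1][j] if i < ny - 1 else 0.0
        return up - 2.0 * w[i][j] + dn

    def lap_x(w, i, j):  # explicit second difference in x (column index)
        lf = w[i][j - 1] if j > 0 else 0.0
        rt = w[i][j + 1] if j < nx - 1 else 0.0
        return lf - 2.0 * w[i][j] + rt

    # half-step 1: every row's tridiagonal solve is independent
    half = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        rhs = [u[i][j] + r * lap_y(u, i, j) for j in range(nx)]
        half[i] = solve_const_tridiag(1.0 + 2.0 * r, -r, rhs)
    # half-step 2: every column's tridiagonal solve is independent
    new = [[0.0] * nx for _ in range(ny)]
    for j in range(nx):
        rhs = [half[i][j] + r * lap_x(half, i, j) for i in range(ny)]
        col = solve_const_tridiag(1.0 + 2.0 * r, -r, rhs)
        for i in range(ny):
            new[i][j] = col[i]
    return new
```

The row loop of half-step 1 and the column loop of half-step 2 carry no dependences between iterations, which is exactly the simultaneous row-wise/column-wise updating that the OpenMP and CUDA versions described above exploit.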
High-Performance Computing: Dos and Don’ts
Computational fluid dynamics (CFD) is the main field of computational mechanics that has historically benefited from advances in high-performance computing. High-performance computing involves several techniques to make a simulation efficient and fast, such as distributed memory parallelism, shared memory parallelism, vectorization, memory access optimizations, etc. As an introduction, we present the anatomy of supercomputers, with special emphasis on HPC aspects relevant to CFD. Then, we develop some of the HPC concepts and numerical techniques applied to the complete CFD simulation framework: from preprocess (meshing) to postprocess (visualization) through the simulation itself (assembly and iterative solvers)
Domain decomposition methods for the parallel computation of reacting flows
Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors, at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and preconditioned iterative methods of conjugate gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate an approximately 10-fold speedup for it on 16 processors
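The coordination-at-interfaces idea can be shown in its oldest and simplest form, the alternating Schwarz method for -u'' = 1 on (0,1) with two overlapping subdomains: each subdomain is solved exactly using the other's latest interface value, and the interface error contracts geometrically. This is a deliberately tiny sketch (the subinterval solutions are quadratics, so each "subdomain solve" is exact); the paper's setting of Newton iteration with preconditioned Krylov solvers is far richer:

```python
def poisson_1d(xl, xr, ul, ur):
    """Exact solve of -u'' = 1 on (xl, xr) with u(xl)=ul, u(xr)=ur.
    The solution is the quadratic u(x) = -x^2/2 + c1*x + c0."""
    c1 = (ur - ul + (xr * xr - xl * xl) / 2.0) / (xr - xl)
    c0 = ul + xl * xl / 2.0 - c1 * xl
    return lambda x: -x * x / 2.0 + c1 * x + c0

def alternating_schwarz(overlap_l=0.4, overlap_r=0.6, iters=20):
    """Alternating Schwarz for -u'' = 1 on (0,1), u(0)=u(1)=0, with
    overlapping subdomains (0, overlap_r) and (overlap_l, 1).  Each
    sweep solves one subdomain with the other's latest interface value;
    the interface error contracts by a fixed factor per sweep."""
    g_right = 0.0   # current guess for u at x = overlap_r
    for _ in range(iters):
        u1 = poisson_1d(0.0, overlap_r, 0.0, g_right)          # left solve
        u2 = poisson_1d(overlap_l, 1.0, u1(overlap_l), 0.0)    # right solve
        g_right = u2(overlap_r)                                # coordinate
    return u1, u2
```

The exact solution is u(x) = x(1-x)/2, and after a few sweeps both subdomain solutions agree with it (and with each other on the overlap) to high accuracy. The per-sweep exchange of a single interface value is the 1-D analogue of the periodic interprocessor coordination the abstract describes.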