123 research outputs found

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    Alternating-Direction Line-Relaxation Methods on Multicomputers

    Get PDF
    We study the multicom.puter performance of a three-dimensional Navier–Stokes solver based on alternating-direction line-relaxation methods. We compare several multicomputer implementations, each of which combines a particular line-relaxation method and a particular distributed block-tridiagonal solver. In our experiments, the problem size was determined by resolution requirements of the application. As a result, the granularity of the computations of our study is finer than is customary in the performance analysis of concurrent block-tridiagonal solvers. Our best results were obtained with a modified half-Gauss–Seidel line-relaxation method implemented by means of a new iterative block-tridiagonal solver that is developed here. Most computations were performed on the Intel Touchstone Delta, but we also used the Intel Paragon XP/S, the Parsytec SC-256, and the Fujitsu S-600 for comparison

    Solving large sparse eigenvalue problems on supercomputers

    Get PDF
    An important problem in scientific computing consists in finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods to solve these problems are based on projection techniques on appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix by vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and the disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed

    Solution of partial differential equations on vector and parallel computers

    Get PDF
    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed

    The use of Lanczos's method to solve the large generalized symmetric definite eigenvalue problem

    Get PDF
    The generalized eigenvalue problem, Kx = Lambda Mx, is of significant practical importance, especially in structural enginering where it arises as the vibration and buckling problem. A new algorithm, LANZ, based on Lanczos's method is developed. LANZ uses a technique called dynamic shifting to improve the efficiency and reliability of the Lanczos algorithm. A new algorithm for solving the tridiagonal matrices that arise when using Lanczos's method is described. A modification of Parlett and Scott's selective orthogonalization algorithm is proposed. Results from an implementation of LANZ on a Convex C-220 show it to be superior to a subspace iteration code

    Alternating direction implicit time integrations for finite difference acoustic wave propagation: Parallelization and convergence

    Full text link
    This work studies the parallelization and empirical convergence of two finite difference acoustic wave propagation methods on 2-D rectangular grids, that use the same alternating direction implicit (ADI) time integration. This ADI integration is based on a second-order implicit Crank-Nicolson temporal discretization that is factored out by a Peaceman-Rachford decomposition of the time and space equation terms. In space, these methods highly diverge and apply different fourth-order accurate differentiation techniques. The first method uses compact finite differences (CFD) on nodal meshes that requires solving tridiagonal linear systems along each grid line, while the second one employs staggered-grid mimetic finite differences (MFD). For each method, we implement three parallel versions: (i) a multithreaded code in Octave, (ii) a C++ code that exploits OpenMP loop parallelization, and (iii) a CUDA kernel for a NVIDIA GTX 960 Maxwell card. In these implementations, the main source of parallelism is the simultaneous ADI updating of each wave field matrix, either column-wise or row-wise, according to the differentiation direction. In our numerical applications, the highest performances are displayed by the CFD and MFD CUDA codes that achieve speedups of 7.21x and 15.81x, respectively, relative to their C++ sequential counterparts with optimal compilation flags. Our test cases also allow to assess the numerical convergence and accuracy of both methods. In a problem with exact harmonic solution, both methods exhibit convergence rates close to 4 and the MDF accuracy is practically higher. Alternatively, both convergences decay to second order on smooth problems with severe gradients at boundaries, and the MDF rates degrade in highly-resolved grids leading to larger inaccuracies. This transition of empirical convergences agrees with the nominal truncation errors in space and time.Comment: 20 pages, 5 figure

    High-Performance Computing: Dos and Don’ts

    Get PDF
    Computational fluid dynamics (CFD) is the main field of computational mechanics that has historically benefited from advances in high-performance computing. High-performance computing involves several techniques to make a simulation efficient and fast, such as distributed memory parallelism, shared memory parallelism, vectorization, memory access optimizations, etc. As an introduction, we present the anatomy of supercomputers, with special emphasis on HPC aspects relevant to CFD. Then, we develop some of the HPC concepts and numerical techniques applied to the complete CFD simulation framework: from preprocess (meshing) to postprocess (visualization) through the simulation itself (assembly and iterative solvers)

    Domain decomposition methods for the parallel computation of reacting flows

    Get PDF
    Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors
    • …
    corecore