200 research outputs found

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    Solution of partial differential equations on vector and parallel computers

    Get PDF
    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed

    Implementation and analysis of a Navier-Stokes algorithm on parallel computers

    Get PDF
    The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented

    Parallel algorithms for boundary value problems

    Get PDF
    A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step where all the P available processors work in parallel, and the global step where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are two fold. First, this suggested approach is very flexible, especially in the local step and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Secondly, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples for using this strategy are discussed

    Turbomachinery CFD on parallel computers

    Get PDF
    The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations

    Parallel solution of high-order numerical schemes for solving incompressible flows

    Get PDF
    A new parallel numerical scheme for solving incompressible steady-state flows is presented. The algorithm uses a finite-difference approach to solving the Navier-Stokes equations. The algorithms are scalable and expandable. They may be used with only two processors or with as many processors as are available. The code is general and expandable. Any size grid may be used. Four processors of the NASA LeRC Hypercluster were used to solve for steady-state flow in a driven square cavity. The Hypercluster was configured in a distributed-memory, hypercube-like architecture. By using a 50-by-50 finite-difference solution grid, an efficiency of 74 percent (a speedup of 2.96) was obtained

    On the impact of communication complexity in the design of parallel numerical algorithms

    Get PDF
    This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation

    Where are the parallel algorithms?

    Get PDF
    Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected

    Concurrent implementation of the Crank-Nicolson method for heat transfer analysis

    Get PDF
    To exploit the significant gains in computing speed provided by Multiple Instruction Multiple Data (MIMD) computers, concurrent methods for practical problems need to be investigated and test problems implemented on actual hardware. One such problem class is heat transfer analysis which is important in many aerospace applications. This paper compares the efficiency of two alternate implementations of heat transfer analysis on an experimental MIMD computer called the Finite Element Machine (FEM). The implicit Crank-Nicolson method is used to solve concurrently the heat transfer equations by both iterative and direct methods. Comparison of actual timing results achieved for the two methods and their significance relative to more complex problems are discussed

    Implementation and Performance Analysis of Numerical Algorithms on the MPP, Flex/32, and Cray/2

    Get PDF
    This dissertation presents the results of the implementation of a number of numerical algorithms on three parallel/vector computers. The object of this research is to determine how well, or poorly, a number of numerical algorithms would map onto three different architectures and to analyze the performance of these architectures using these algorithms. These algorithms are: a relaxation scheme for the solution of the Cauchy-Riemann equations, an ADI method for the solution of the diffusion equation, and a compact difference scheme for the solution of two-dimensional Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and the Cray/2. The machine architectures are briefly described. The implementation of these algorithms is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for these algorithms on these architectures. Finally conclusions are presented
    corecore