191 research outputs found

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    Solution of partial differential equations on vector and parallel computers

    Get PDF
    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed

    On the impact of communication complexity in the design of parallel numerical algorithms

    Get PDF
    This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation

    Massively parallel Poisson and QR factorization solvers

    Get PDF
    AbstractThe paper brings a massively parallel Poisson solver for rectangle domain and parallel algorithms for computation of QR factorization of a dense matrix A by means of Householder reflections and Givens rotations. The computer model under consideration is a SIMD mesh-connected toroidal n × n processor array.The Dirichlet problem is replaced by its finite-difference analog on an M × N (M + 1, N are powers of two) grid. The algorithm is composed of parallel fast sine transform and cyclic odd-even reduction blocks and runs in a fully parallel fashion. Its computational complexity is O(M N log Ln2), where L = max(M + 1, N). A parallel proposal of QR factorization by the Householder method zeros all subdiagonal elements in each column and updates all elements of the given submatrix in parallel. For the second method with Givens rotations, the parallel scheme of the Sameh and Kuck was chosen where the disjoint rotations can be computed simultaneously.The algorithms were coded in MPF and MPL parallel programming languages and results of computational experiments on the MasPar MP-1 system are also presented

    Numerics of High Performance Computers and Benchmark Evaluation of Distributed Memory Computers

    Get PDF
    The internal representation of numerical data, their speed of manipulation to generate the desired result through efficient utilisation of central processing unit, memory, and communication links are essential steps of all high performance scientific computations. Machine parameters, in particular, reveal accuracy and error bounds of computation, required for performance tuning of codes. This paper reports diagnosis of machine parameters, measurement of computing power of several workstations, serial and parallel computers, and a component-wise test procedure for distributed memory computers. Hierarchical memory structure is illustrated by block copying and unrolling techniques. Locality of reference for cache reuse of data is amply demonstrated by fast Fourier transform codes. Cache and register-blocking technique results in their optimum utilisation with consequent gain in throughput during vector-matrix operations. Implementation of these memory management techniques reduces cache inefficiency loss, which is known to be proportional to the number of processors. Of the two Linux clusters-ANUP16, HPC22 and HPC64, it has been found from the measurement of intrinsic parameters and from application benchmark of multi-block Euler code test run that ANUP16 is suitable for problems that exhibit fine-grained parallelism. The delivered performance of ANUP16 is of immense utility for developing high-end PC clusters like HPC64 and customised parallel computers with added advantage of speed and high degree of parallelism

    Adapting a Navier-Stokes code to the ICL-DAP

    Get PDF
    The results of an experiment are reported, i.c., to adapt a Navier-Stokes code, originally developed on a serial computer, to concurrent processing on the CL Distributed Array Processor (DAP). The algorithm used in solving the Navier-Stokes equations is briefly described. The architecture of the DAP and DAP FORTRAN are also described. The modifications of the algorithm so as to fit the DAP are given and discussed. Finally, performance results are given and conclusions are drawn

    A parallel finite element algorithm for 3D incompressible flow in velocity-vorticity form

    Get PDF
    In the last decade, developments and advancement in computer technology, especially the availability of the massively parallel machine, have escalated the numerical treatment of complex fluid flow problems to a new height. Numerical simulation of incompressible viscous fluid flow, often associated with practical industrial and environmental situations, is receiving intense scrutiny to perform in the promising distributed parallel computing environment. On the other hand, the field of computational fluid dynamics continues to explore and exploit unified and versatile formulations, in contention with the notorious divergence-free velocity field constraint, for incompressible Navier-Stokes equations that encompass fluid flow in two- and threedimensions. The velocity-vorticity formulation for the incompressible Navier-Stokes equations is chosen with the full extent to resolve these issues. In the present dissertation, a new finite element implementation for two- and three-dimensional incompressible fluid flow is developed in the velocity-vorticity form. Pressure is eliminated analytically by taking the curl of the momentum equations, and vorticity is introduced as the active variable. The formulation consists of the three derived vorticity transport equations in conjunction with three velocity Poisson equations. Satisfaction of the continuity constraint is cast onto the specific treatment of the kinematic vorticity boundary condition for the no slip wall. A divergence-free solution is guaranteed with equal order finite element interpolation functions for all state variables

    A review of parallel finite element methods on the DAP

    Get PDF
    AbstractThis paper reviews the research work that has been done to implement the finite element method for solving partial differential equations on the ICL distributed array processor (DAP). A brief outline of the principle features of the method is given, followed by details of the novel techniques required for implementation on the highly parallel architecture. Various methods of solution of the finite element equations are discussed; both direct and iterative techniques are included. The current state-of-the-art favours the use of the preconditioned conjugate gradient method. Some suggestions for future research work on parallel finite element methods are made

    Parallel unstructured solvers for linear partial differential equations

    Get PDF
    This thesis presents the development of a parallel algorithm to solve symmetric systems of linear equations and the computational implementation of a parallel partial differential equations solver for unstructured meshes. The proposed method, called distributive conjugate gradient - DCG, is based on a single-level domain decomposition method and the conjugate gradient method to obtain a highly scalable parallel algorithm. An overview on methods for the discretization of domains and partial differential equations is given. The partition and refinement of meshes is discussed and the formulation of the weighted residual method for two- and three-dimensions presented. Some of the methods to solve systems of linear equations are introduced, highlighting the conjugate gradient method and domain decomposition methods. A parallel unstructured PDE solver is proposed and its actual implementation presented. Emphasis is given to the data partition adopted and the scheme used for communication among adjacent subdomains is explained. A series of experiments in processor scalability is also reported. The derivation and parallelization of DCG are presented and the method validated throughout numerical experiments. The method capabilities and limitations were investigated by the solution of the Poisson equation with various source terms. The experimental results obtained using the parallel solver developed as part of this work show that the algorithm presented is accurate and highly scalable, achieving roughly linear parallel speed-up in many of the cases tested
    corecore