191 research outputs found
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
On the impact of communication complexity in the design of parallel numerical algorithms
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation
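The abstract does not spell out its generalized model, so the following is only a sketch of the classic two-parameter characterization usually attributed to Hockney, on which the first model is said to build. It assumes the time for a vector operation of length n is (n + n_half) / r_inf, where r_inf is the asymptotic rate and n_half is the vector length at which half that rate is reached. The function names and sample values are illustrative, not taken from the paper.

```python
def hockney_time(n, r_inf, n_half):
    """Time for a vector operation of length n under the (r_inf, n_half) model."""
    return (n + n_half) / r_inf


def hockney_rate(n, r_inf, n_half):
    """Delivered rate (elements per unit time) for a vector of length n."""
    return n / hockney_time(n, r_inf, n_half)
```

Under this model the delivered rate at n = n_half is exactly half of r_inf, and it approaches r_inf only for vectors much longer than n_half, which is why start-up and data-movement costs dominate short operations.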
Massively parallel Poisson and QR factorization solvers
The paper presents a massively parallel Poisson solver for a rectangular domain and parallel algorithms for computing the QR factorization of a dense matrix A by means of Householder reflections and Givens rotations. The computer model under consideration is a SIMD mesh-connected toroidal n × n processor array. The Dirichlet problem is replaced by its finite-difference analog on an M × N grid (M + 1 and N are powers of two). The algorithm is composed of parallel fast sine transform and cyclic odd-even reduction blocks and runs in a fully parallel fashion. Its computational complexity is O(MN log L / n^2), where L = max(M + 1, N). The parallel QR factorization by the Householder method zeros all subdiagonal elements in each column and updates all elements of the given submatrix in parallel. For the second method, with Givens rotations, the parallel scheme of Sameh and Kuck was chosen, in which disjoint rotations can be computed simultaneously. The algorithms were coded in the MPF and MPL parallel programming languages, and results of computational experiments on the MasPar MP-1 system are also presented.
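As a rough illustration of the Givens-rotation approach, here is a minimal sequential sketch in plain Python. It is not the paper's MPF/MPL code and does not reproduce the Sameh and Kuck ordering; it applies the rotations one at a time, column by column. The function name `givens_qr` is hypothetical.

```python
import math


def givens_qr(a):
    """Return (q, r) with a = q * r, where a is a list of rows (m x n, m >= n)."""
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j below the diagonal, working from the bottom row up.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # both entries already zero, nothing to rotate
            c, s = x / h, y / h
            # Rotate rows i-1 and i of r so that r[i][j] becomes zero.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r
```

Each rotation touches only two rows, so rotations acting on disjoint row pairs do not interfere with one another. That independence is what a parallel ordering such as the one the abstract cites can exploit.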
Numerics of High Performance Computers and Benchmark Evaluation of Distributed Memory Computers
The internal representation of numerical data, their speed of manipulation to generate the desired result through efficient utilisation of central processing unit, memory, and communication links are essential steps of all high performance scientific computations. Machine parameters, in particular, reveal accuracy and error bounds of computation, required for performance tuning of codes. This paper reports diagnosis of machine parameters, measurement of computing power of several workstations, serial and parallel computers, and a component-wise test procedure for distributed memory computers. Hierarchical memory structure is illustrated by block copying and unrolling techniques. Locality of reference for cache reuse of data is amply demonstrated by fast Fourier transform codes. Cache and register-blocking technique results in their optimum utilisation with consequent gain in throughput during vector-matrix operations. Implementation of these memory management techniques reduces cache inefficiency loss, which is known to be proportional to the number of processors. Of the two Linux clusters-ANUP16, HPC22 and HPC64, it has been found from the measurement of intrinsic parameters and from application benchmark of multi-block Euler code test run that ANUP16 is suitable for problems that exhibit fine-grained parallelism. The delivered performance of ANUP16 is of immense utility for developing high-end PC clusters like HPC64 and customised parallel computers with added advantage of speed and high degree of parallelism
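The abstract does not say how the machine parameters were diagnosed. As one hedged example of that kind of probe, the sketch below estimates the machine epsilon of the host's floating-point arithmetic by repeated halving. The function name is illustrative, and the comparison with `sys.float_info.epsilon` assumes Python floats are IEEE 754 doubles with round-to-nearest.

```python
import sys


def machine_epsilon():
    """Smallest power of two eps such that 1.0 + eps is still distinct from 1.0."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


# On IEEE 754 double precision this is expected to match the value Python reports.
matches_reported_value = machine_epsilon() == sys.float_info.epsilon
```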
The simulation of fluid flow processes using vector processors
In this thesis the potential gains in vectorisation of linear and non-linear systems of equations are investigated. Previous studies carried out on the suitability of algorithms for vectorisation have been based on the solution of Poisson's equation. In accordance with this, a range of algorithms are explored and compared using a VA-1 pipeline processor attached to a MASSCOMP MC5400. Analysis shows that almost full vectorisation is possible leading to speed-up factors of up to 90. Based on these results the vectorised conjugate gradient with a Jacobi preconditioner (JCGV) is the best of the algorithms considered.
This work is extended to the development of a two-dimensional fluid flow code which is used to solve the Navier-Stokes equations; SIMPLE is implemented to handle the non-linear nature of the equations. The first two problems are isothermal flows, viz. the 'moving lid cavity' and the 'sudden expansion in a duct' problems. A study of where the greatest computational effort is expended, and subsequent vectorisation, leads to 98% of SIMPLE being modified. This results in speed-up factors of 6 for the cavity problem and 29 for the sudden expansion problem. In both problems the JCGV is marginally faster than the vectorised Jacobi with under-relaxation (JURV). However, the JCGV algorithm is not robust and it is necessary to relax the approximation carefully, otherwise high computation times or divergence are likely.
Two further problems of increasing complexity are considered; these include scalar quantities of temperature and characteristics of k-ε turbulence. One problem is based on 'turbulent L-shaped flow in a duct' and the other on 'natural convection in a square cavity'. A consequence of the higher scalar computation is speed-up factors of 5 for the turbulent L-shaped flow and 11 for the natural convection problem. There is little to choose between the JCGV and JURV algorithms; however, the robustness problems with the JCGV algorithm remain.
A multigrid method (ACM) is used to improve the convergence rate of the algorithms, particularly as the problem size is increased. Although it is more effective in scalar mode, it also provides worthwhile improvements for the vectorised algorithms, with overall factors of 8.5. Convergence difficulties with the JCG algorithm also prevent its combination with the ACM method. Therefore, the vectorised JUR algorithm with the ACM method is not only more efficient and reliable, but also has scope for improvement as the grid is increased.
The potential gains in vectorisation of the SIMPLE family on pipeline architectures have been clearly demonstrated and indicate that such efforts on practical CFD codes should be well rewarded with regard to processor performance
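To make the Jacobi-preconditioned conjugate gradient idea concrete, here is a minimal sketch in plain Python. It is not the thesis code: the 1D Poisson stencil, the function names, and the tolerance are assumptions chosen to keep the example small. The preconditioner simply divides the residual by the matrix diagonal.

```python
import math


def poisson_1d_matvec(p):
    """Apply the 1D finite-difference stencil (-1, 2, -1) with zero boundaries."""
    n = len(p)
    out = []
    for i in range(n):
        left = p[i - 1] if i > 0 else 0.0
        right = p[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * p[i] - left - right)
    return out


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)                                # residual for the zero initial guess
    z = [ri / di for ri, di in zip(r, diag)]   # Jacobi preconditioning step
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            return x, it
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Every step in the loop is either an elementwise update over whole vectors or a dot-product reduction, with no dependence between neighbouring entries inside a step. That structure is consistent with the near-full vectorisation the abstract reports for this family of algorithms.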
Adapting a Navier-Stokes code to the ICL-DAP
The results of an experiment to adapt a Navier-Stokes code, originally developed on a serial computer, to concurrent processing on the ICL Distributed Array Processor (DAP) are reported. The algorithm used in solving the Navier-Stokes equations is briefly described. The architecture of the DAP and DAP FORTRAN are also described. The modifications made to the algorithm to fit the DAP are given and discussed. Finally, performance results are given and conclusions are drawn.
A parallel finite element algorithm for 3D incompressible flow in velocity-vorticity form
In the last decade, developments and advances in computer technology, especially the availability of massively parallel machines, have raised the numerical treatment of complex fluid flow problems to a new height. Numerical simulation of incompressible viscous fluid flow, often associated with practical industrial and environmental situations, is receiving intense scrutiny for its performance in the promising distributed parallel computing environment. At the same time, the field of computational fluid dynamics continues to explore and exploit unified and versatile formulations of the incompressible Navier-Stokes equations in two and three dimensions that contend with the notorious divergence-free velocity field constraint. The velocity-vorticity formulation of the incompressible Navier-Stokes equations is chosen to resolve these issues. In the present dissertation, a new finite element implementation for two- and three-dimensional incompressible fluid flow is developed in the velocity-vorticity form. Pressure is eliminated analytically by taking the curl of the momentum equations, and vorticity is introduced as the active variable. The formulation consists of the three derived vorticity transport equations in conjunction with three velocity Poisson equations. Satisfaction of the continuity constraint is cast onto the specific treatment of the kinematic vorticity boundary condition for the no-slip wall. A divergence-free solution is guaranteed with equal-order finite element interpolation functions for all state variables.
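For orientation, the standard textbook form of the velocity-vorticity system can be written as below. It assumes constant density, constant kinematic viscosity ν, and no non-conservative body forces. The dissertation's own non-dimensionalization and notation are not given in the abstract and may differ.

```latex
\begin{align}
\boldsymbol{\omega} &= \nabla \times \mathbf{u}, \qquad \nabla \cdot \mathbf{u} = 0, \\
\frac{\partial \boldsymbol{\omega}}{\partial t}
  + (\mathbf{u} \cdot \nabla)\,\boldsymbol{\omega}
  - (\boldsymbol{\omega} \cdot \nabla)\,\mathbf{u}
  &= \nu \nabla^{2} \boldsymbol{\omega}, \\
\nabla^{2} \mathbf{u} &= -\nabla \times \boldsymbol{\omega}.
\end{align}
```

The pressure gradient drops out because the curl of a gradient is zero. The velocity Poisson equations follow from the identity ∇ × (∇ × u) = ∇(∇ · u) − ∇²u together with ∇ · u = 0. In three dimensions each vector equation has three components, matching the three transport and three Poisson equations mentioned in the abstract.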
A review of parallel finite element methods on the DAP
This paper reviews the research work that has been done to implement the finite element method for solving partial differential equations on the ICL distributed array processor (DAP). A brief outline of the principal features of the method is given, followed by details of the novel techniques required for implementation on the highly parallel architecture. Various methods of solution of the finite element equations are discussed; both direct and iterative techniques are included. The current state of the art favours the use of the preconditioned conjugate gradient method. Some suggestions for future research work on parallel finite element methods are made.
Parallel unstructured solvers for linear partial differential equations
This thesis presents the development of a parallel algorithm to solve symmetric systems of linear equations and the computational implementation of a parallel partial differential equations solver for unstructured meshes. The proposed method, called distributive conjugate gradient (DCG), is based on a single-level domain decomposition method and the conjugate gradient method to obtain a highly scalable parallel algorithm.
An overview of methods for the discretization of domains and partial differential equations is given. The partition and refinement of meshes is discussed and the formulation of the weighted residual method for two and three dimensions is presented. Some of the methods to solve systems of linear equations are introduced, highlighting the conjugate gradient method and domain decomposition methods. A parallel unstructured PDE solver is proposed and its actual implementation presented. Emphasis is given to the data partition adopted, and the scheme used for communication among adjacent subdomains is explained. A series of experiments in processor scalability is also reported.
The derivation and parallelization of DCG are presented and the method is validated through numerical experiments. The method's capabilities and limitations were investigated by solving the Poisson equation with various source terms. The experimental results obtained using the parallel solver developed as part of this work show that the algorithm presented is accurate and highly scalable, achieving roughly linear parallel speed-up in many of the cases tested.