A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
A Parallel Branch and Bound Algorithm for Integer Linear Programming Models
A parallel branch and bound algorithm is developed for use with MIMD computers to study the efficiency of parallel processors on general integer linear programming problems. The Haldi and IBM test problems and a System Design model are used in the implementation of the algorithm. Initially the algorithm solves the Haldi and IBM test problems on a single-processor computer which simulates a multiple-processor computer. The algorithm is then implemented on the Denelcor HEP multiprocessor using two of the IBM problems to compare the results of the simulation with the results obtained on an MIMD computer. Finally the algorithm is implemented on the HEP using the System Design model to show a case in which the number of pivots decreases as the number of processes is increased from seven to the process limit of sixteen.
In general, it is shown that superlinear efficiency can be achieved using multiple processors.
Parallel pivoting combined with parallel reduction
Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines parallel reduction with a new parallel pivoting technique, control over the generation of fill-in, and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
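The two ingredients named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the usual textbook definitions, namely a Markowitz count of (r_i - 1)(c_j - 1) for a candidate pivot at (i, j), and two pivots (i, j) and (k, l) being compatible when the entries at (i, l) and (k, j) are both zero. The function names and the greedy selection are hypothetical, and the numerical stability check mentioned in the abstract is omitted.

```python
def markowitz_counts(pattern):
    """Return {(i, j): (r_i - 1) * (c_j - 1)} for every nonzero position.

    pattern is a set of (row, col) positions of nonzeros (values are ignored).
    """
    row_nnz, col_nnz = {}, {}
    for i, j in pattern:
        row_nnz[i] = row_nnz.get(i, 0) + 1
        col_nnz[j] = col_nnz.get(j, 0) + 1
    return {(i, j): (row_nnz[i] - 1) * (col_nnz[j] - 1) for i, j in pattern}


def compatible(p, q, pattern):
    """Assumed compatibility test: distinct rows and columns, zero cross entries."""
    (i, j), (k, l) = p, q
    return i != k and j != l and (i, l) not in pattern and (k, j) not in pattern


def greedy_parallel_pivots(pattern):
    """Pick mutually compatible pivots, visiting candidates by Markowitz count."""
    counts = markowitz_counts(pattern)
    chosen = []
    for cand in sorted(counts, key=lambda p: (counts[p], p)):
        if all(compatible(cand, c, pattern) for c in chosen):
            chosen.append(cand)
    return chosen


# Small illustrative sparsity pattern (hypothetical data).
pattern = {(0, 0), (0, 2), (1, 1), (1, 2), (2, 0), (2, 2)}
pivots = greedy_parallel_pivots(pattern)
```

Pivots returned together do not touch each other's rows or columns, so under the stated assumption their elimination steps could be carried out concurrently.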
Parallel algorithms for MIMD parallel computers
This thesis mainly covers the design and analysis of asynchronous
parallel algorithms that can be run on MIMD (Multiple Instruction
Multiple Data) parallel computers, in particular the NEPTUNE system at
Loughborough University. Initially the fundamentals of parallel computer
architectures are introduced with different parallel architectures being
described and compared. The principles of parallel programming and the
design of parallel algorithms are also outlined. The main
characteristics of the 4-processor MIMD NEPTUNE system are also presented,
and performance indicators, i.e. the speed-up and the efficiency factors,
are defined for the measurement of parallelism in a given system.
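The abstract does not reproduce the thesis's definitions of these indicators. As a minimal sketch, assuming the conventional ones (speed-up is serial time divided by parallel time, efficiency is speed-up divided by the number of processors), they can be written as follows; the function names and timings are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Conventional speed-up: serial run time over parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Conventional efficiency: speed-up per processor."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical timings (seconds) on a 4-processor machine.
s = speedup(8.0, 2.5)
e = efficiency(8.0, 2.5, 4)
```

Under these definitions an efficiency above 1 corresponds to superlinear speed-up.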
Both numerical and non-numerical algorithms are covered in the
thesis. In the numerical solution of partial differential equations,
a new parallel 9-point block iterative method is developed. Here, the
organization of the blocks is done in such a way that each process
contains its own group of 9 points on the network; therefore, they can
be run in parallel. The parallel implementations of both the 9-point and
4-point block iterative methods were programmed using natural and red-black
ordering with synchronous and asynchronous approaches. The
results obtained for these different implementations were compared and
analysed.
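Red-black ordering can be illustrated with a minimal Python sketch. It uses a point Gauss-Seidel sweep for the Laplace equation on a 5-point stencil, which is a simpler stand-in and not the 9-point or 4-point block methods of the thesis; the function names and the test grid are hypothetical.

```python
def red_black_sweep(u, colour):
    """Update in place every interior point with (i + j) % 2 == colour.

    Points of one colour depend only on points of the other colour,
    so all updates within a sweep are independent of each other.
    """
    n, m = len(u), len(u[0])
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            if (i + j) % 2 == colour:
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1])


def solve_laplace(u, sweeps=200):
    """Alternate red and black half-sweeps a fixed number of times."""
    for _ in range(sweeps):
        red_black_sweep(u, 0)
        red_black_sweep(u, 1)
    return u


# Hypothetical test problem: boundary held at 1.0, interior started at 0.0.
grid = [[1.0 if i in (0, 4) or j in (0, 4) else 0.0 for j in range(5)]
        for i in range(5)]
solve_laplace(grid)
```

Because each half-sweep touches only one colour, its updates could be divided among processes without ordering constraints, with a synchronization point between the two colours.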
Next the parallel version of the A.G.E. (Alternating Group Explicit)
method is developed in which the explicit nature of the difference
equation is revealed and exploited when applied to derive the solution
of both linear and non-linear 2-point boundary value problems. Two
strategies have been used in the implementation of the parallel A.G.E.
method using the synchronous and asynchronous approaches. The results
from these implementations were compared. The results obtained
from the parallel A.G.E. method were also compared with the
corresponding results obtained from the parallel versions of the Jacobi,
Gauss-Seidel and S.O.R. methods. Finally, a computational complexity
analysis of the parallel A.G.E. algorithms is included.
In the area of non-numeric algorithms, the problems of sorting and
searching were studied. The sorting methods which were investigated
were the Shell sort and the digit sort methods. With each method, different
parallel strategies and approaches were used and compared to find the
best results which can be obtained on the parallel machine.
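The abstract does not say which parallel strategies were tried. One property that makes Shell sort a candidate for parallel execution is that, for a given gap, the gap-spaced chains are disjoint. The following Python sketch runs sequentially and only makes that independence explicit; the function names and the gap sequence (repeated halving) are assumptions.

```python
def sort_chain(a, start, gap):
    """Insertion-sort the chain a[start], a[start + gap], ... in place."""
    for i in range(start + gap, len(a), gap):
        item = a[i]
        j = i
        while j - gap >= start and a[j - gap] > item:
            a[j] = a[j - gap]
            j -= gap
        a[j] = item


def shell_sort(a):
    """Shell sort with gaps n/2, n/4, ..., 1 (assumed gap sequence)."""
    gap = len(a) // 2
    while gap > 0:
        # The chains for different start values share no elements,
        # so each call below could be given to a separate process.
        for start in range(gap):
            sort_chain(a, start, gap)
        gap //= 2
    return a


data = shell_sort([5, 2, 9, 1, 5, 6, 0])
```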
In the searching methods, the sequential search algorithm in an
unordered table and the binary search algorithms were investigated and
implemented in parallel with a presentation of the results. Finally,
a complexity analysis of these methods is presented.
The thesis concludes with a chapter summarizing the main results.
Mapping Signal Processing Algorithms on Parallel Architectures
Electrical Engineering
Run-time parallelization and scheduling of loops
Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution-time preprocessing of the loop. At compile time, these methods set up the framework for performing a loop dependency analysis. At run time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce inspector procedures, which perform execution-time preprocessing, and executors, which are transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.
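The inspector and executor roles can be sketched in Python for one hypothetical loop, x[i] = x[i] + b[i] * x[ia[i]], whose dependencies are known only once the index array ia is available. This is an illustration under that assumption, not the transformation rules of the paper; all names are invented for the example, and ia is assumed non-empty.

```python
def inspector(ia):
    """Assign each iteration a wavefront number from the index array ia.

    Iteration i writes x[i] and reads x[ia[i]]. If ia[i] < i it must run
    after iteration ia[i]; if ia[i] > i, iteration ia[i] must run after i
    so that i still reads the old value.
    """
    wf = [0] * len(ia)
    for i in range(len(ia)):
        src = ia[i]
        if src < i:
            wf[i] = max(wf[i], wf[src] + 1)
        elif src > i:
            wf[src] = max(wf[src], wf[i] + 1)
    return wf


def executor(x, b, ia, wf):
    """Run the loop one wavefront at a time; order inside a wavefront is free."""
    x = list(x)
    for level in range(max(wf) + 1):
        front = [i for i in range(len(ia)) if wf[i] == level]
        for i in reversed(front):  # reversed to show order independence
            x[i] = x[i] + b[i] * x[ia[i]]
    return x


def sequential(x, b, ia):
    """Reference: the original loop in its source order."""
    x = list(x)
    for i in range(len(ia)):
        x[i] = x[i] + b[i] * x[ia[i]]
    return x


ia = [0, 0, 1, 1, 5, 2]
wf = inspector(ia)
result = executor([1, 2, 3, 4, 5, 6], [2, 1, 3, 1, 2, 1], ia, wf)
```

The wavefront numbers depend only on ia, so they can be computed once and reused whenever the loop runs again with the same dependency structure, which is the amortization the abstract refers to.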