Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer

Author: Johnsson S. Lennart
Mathur Kapil K.
Publication date: 06/10/2015

Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms. We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is shown both analytically and experimentally that the optimum shape of the processor array yields square stationary submatrices in each processor, i.e., the ratio between the lengths of the axes of the processing array must be the same as the ratio between the corresponding axes of the stationary matrix. The optimum processor array shape may yield a factor of square matrices. For rectangular matrices a factor of 30 improvement was observed for an optimum processor array shape compared to a poorly chosen processor array shape.

Engineering and Applied Science

Harvard University - DASH
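
The two selection rules stated in the abstract above (keep the largest operand stationary, and shape the processor array so that each processor holds a roughly square piece of that operand) can be illustrated with a small sketch. This is not the CMSSL implementation; the function names `stationary_operand`, `factor_pairs`, and `best_grid` are hypothetical, and the sketch assumes the processor count is factored into a simple two-dimensional grid.

```python
import math


def stationary_operand(m, k, n):
    """For C (m x n) = A (m x k) * B (k x n), name the largest operand.

    Assumption: 'largest' is measured by element count.
    """
    sizes = {"A": m * k, "B": k * n, "C": m * n}
    return max(sizes, key=sizes.get)


def factor_pairs(p):
    """All grid shapes (rows, cols) with rows * cols == p."""
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]


def best_grid(rows, cols, p):
    """Pick the p-processor grid whose local blocks are closest to square.

    A grid (pr, pc) gives each processor a block of about
    (rows / pr) x (cols / pc) elements. The block is square when
    pr / pc equals rows / cols, which is the ratio rule in the abstract.
    """
    def skew(shape):
        pr, pc = shape
        # log of the block aspect ratio: 0 means a perfectly square block
        return abs(math.log((rows / pr) / (cols / pc)))

    return min(factor_pairs(p), key=skew)


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(stationary_operand(1000, 10, 1000))
    print(best_grid(1024, 1024, 64))
    print(best_grid(4096, 256, 64))
```

For a square 1024 x 1024 stationary matrix on 64 processors the sketch selects an 8 x 8 grid, while for a 4096 x 256 matrix it selects 32 x 2, so that every processor holds a 128 x 128 block.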

Efficient algorithms for the optical multi-trees (OMULT) architecture.

Author: Islam Mohammad Rabiul
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2004

In this thesis, we report our investigations into efficiently implementing algorithms on the recently proposed Optical Multi-Trees (OMULT) multiprocessor interconnection architecture, which uses both electronic and optical links among processors. We have investigated algorithms for matrix multiplication of two matrices of size n2 x n2 and two matrices of arbitrary size, the prefix-sum of a series, and some fundamental computational geometry problems. We show that some common computational geometry problems on n data points (finding the convex hull, the smallest enclosing box, the empirical cumulative distribution function, and all nearest neighbors) can be solved on the OMULT network in O(log n) time, compared to O(√n) algorithms on the Optical Transpose Interconnection System (OTIS) mesh for each of these problems. Finally, we have implemented our algorithm for matrix multiplication using the SimJava simulation tool and feel that this is a convenient environment for testing such parallel algorithms.

Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .I85. Source: Masters Abstracts International, Volume: 43-05, page: 1751. Adviser: Subir Bandyopadhyay. Thesis (M.Sc.)--University of Windsor (Canada), 2004

Scholarship at UWindsor

All-to-All Communication on the Connection Machine CM-200

Publication venue: 'Hindawi Limited'
Publication date: 01/01/1995

Crossref

Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures

Author: Andreas Gerstlauer
Ardavan Pedram
Robert A. van de Geijn
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Algorithme innovant pour le traitement parallèle basé sur l'indépendance des tâches et la décomposition des données (Innovative algorithm for parallel processing based on task independence and data decomposition)

Author: Abu Azab Hussam Hussein
Publication date: 01/01/2017

Dépôt numérique de UQTR

Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer

Author: Kapil Mathur
S. Lennart Johnsson
Publication date: 01/01/1994

Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. All algorithms that are presented here are part of the Connection Machine Scientific Software Library, CMSSL. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms. The performance improvement over the blocked nonsystolic algorithms may be as much as a factor of seven, or more than a factor of 20 over the lev..

CiteSeerX
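
To give a feel for what "systolic" means in the abstract above, the following is a minimal sketch of Cannon's algorithm, a textbook systolic matrix multiplication in which operand data shifts between neighbouring processors on every step while each processor accumulates its own entry of the product. It is a swapped-in illustration, not the in-place CMSSL algorithm the paper describes. It assumes square q x q matrices with one element per simulated processor, and the function name `cannon_matmul` is hypothetical.

```python
def cannon_matmul(a, b):
    """Simulate Cannon's algorithm on a q x q torus of processors.

    Processor (i, j) holds one element of A and one of B at a time and
    accumulates C[i][j] locally. Only A and B move: A shifts left along
    rows, B shifts up along columns, one position per step.
    """
    q = len(a)

    # Initial alignment: row i of A is rotated left by i positions,
    # column j of B is rotated up by j positions.
    a_loc = [[a[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_loc = [[b[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c = [[0] * q for _ in range(q)]

    for _ in range(q):
        # Every processor multiplies the pair it currently holds.
        for i in range(q):
            for j in range(q):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # Nearest-neighbour shifts with wraparound.
        a_loc = [[a_loc[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_loc = [[b_loc[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    return c


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # prints [[19, 22], [43, 50]]
```

Each processor only ever exchanges data with its immediate neighbours and stores a constant number of elements, which gives some intuition for why systolic schemes can need less temporary storage than algorithms that replicate or broadcast whole blocks.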