7 research outputs found

    Efficient algorithms for the optical multi-trees (OMULT) architecture.

    In this thesis, we report our investigations into efficiently implementing algorithms on the recently proposed Optical Multi-Trees (OMULT) multiprocessor interconnection architecture, which uses both electronic and optical links among processors. We have investigated algorithms for matrix multiplication of two matrices of size n2 x n2 and of two matrices of arbitrary size, the prefix sum of a series, and some fundamental computational geometry problems. We show that several common computational geometry problems on n data points (finding the convex hull, the smallest enclosing box, the empirical cumulative distribution function, and all nearest neighbors) can be solved on the OMULT network in O(log n) time, compared with O(√n) algorithms on the Optical Transpose Interconnection System (OTIS) mesh for each of these problems. Finally, we have implemented our matrix multiplication algorithm using the SimJava simulation tool and find it a convenient environment for testing such parallel algorithms.
    Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .I85. Source: Masters Abstracts International, Volume: 43-05, page: 1751. Adviser: Subir Bandyopadhyay. Thesis (M.Sc.)--University of Windsor (Canada), 2004.
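
The abstract above mentions computing the prefix sum of a series in logarithmic time. As a rough illustration of why a logarithmic number of parallel steps can suffice, here is a minimal sketch of the generic doubling-stride scan, simulated sequentially in Python. It is not the thesis's OMULT algorithm; the function name `prefix_sum_log_steps` and the round counter are assumptions made for this example.

```python
def prefix_sum_log_steps(values):
    """Inclusive prefix sum by stride doubling (a generic parallel-scan sketch).

    Each round stands in for one parallel step: on a real parallel machine
    every position would be updated simultaneously. Here the simultaneous
    update is simulated by building a new list from the previous one.
    Returns the prefix sums and the number of rounds used.
    """
    result = list(values)
    n = len(result)
    stride = 1
    rounds = 0
    while stride < n:
        # Position i adds the value `stride` places to its left, if any.
        result = [
            result[i] + result[i - stride] if i >= stride else result[i]
            for i in range(n)
        ]
        stride *= 2
        rounds += 1
    return result, rounds


if __name__ == "__main__":
    sums, rounds = prefix_sum_log_steps([1, 2, 3, 4, 5, 6, 7, 8])
    print(sums, rounds)
```

For 8 inputs the loop runs with strides 1, 2 and 4, so three rounds are enough. How such rounds map onto the electronic and optical links of a specific network is what the thesis itself addresses.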

    All-to-All Communication on the Connection Machine CM-200

    Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures

    Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer

    Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. All algorithms that are presented here are part of the Connection Machine Scientific Software Library, CMSSL. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms. The performance improvement over the blocked nonsystolic algorithms may be as much as a factor of seven, or more than a factor of 20 over the lev..
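
The abstract above contrasts level-2 (matrix-vector) and level-3 (matrix-matrix) formulations and credits blocking for part of the gain. The sketch below illustrates that distinction in plain sequential Python. It is not the CMSSL or DBLAS code, it shows neither the data-parallel nor the systolic variants, and the function names and the `block` parameter are assumptions made for this example.

```python
def matmul_by_columns(a, b):
    """Level-2 style: build the product one matrix-vector product at a time."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        # Column j of the result is A times column j of B.
        for i in range(rows):
            c[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return c


def matmul_blocked(a, b, block=2):
    """Level-3 style: accumulate the product tile by tile.

    Working on block x block tiles lets each loaded tile of A and B be
    reused for several multiply-adds before moving on.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            for k0 in range(0, inner, block):
                # Multiply one tile of A by one tile of B into a tile of C.
                for i in range(i0, min(i0 + block, rows)):
                    for j in range(j0, min(j0 + block, cols)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + block, inner)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_by_columns(a, b))
    print(matmul_blocked(a, b))
```

Both functions return the same product for matrices of arbitrary (compatible) shape. Only the order of the arithmetic differs, and that ordering is what blocking exploits on real hardware.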