Matrix Multiplication on Hypercubes Using Full Bandwidth and Constant Storage

Edelman, Alan; Ho, Ching-Tien; Johnsson, S. Lennart


Publication date: 11 March 2016

Abstract

For matrix multiplication on hypercube multiprocessors with the product matrix accumulated in place, a processor must receive about P^2/√N elements of each input operand, with operands of size P×P distributed evenly over N processors. With concurrent communication on all ports, the number of element transfers in sequence can be reduced to P^2/(√N log N) for each input operand. We present a two-level partitioning of the matrices and an algorithm for the matrix multiplication with optimal data motion and constant storage. The algorithm has sequential arithmetic complexity 2P^3 and parallel arithmetic complexity 2P^3/N. The algorithm has been implemented on the Connection Machine model CM-2. On an 8K-processor CM-2 we measured about 1.6 Gflops, which would scale to about 13 Gflops on a full 64K-processor machine (eight times as many processors).

Engineering and Applied Science
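The per-processor counts in the abstract can be checked on a small simulation. This is a minimal sketch using Cannon's algorithm, a standard shift-based block algorithm on a √N×√N processor grid; it is a swapped-in technique, not the authors' two-level hypercube algorithm, and it models neither hypercube routing nor the concurrent use of all log N ports that gives the P^2/(√N log N) figure. Under those assumptions, each simulated processor holds one block of A, B, and C at all times (constant storage of 3P^2/N elements), receives (√N − 1)·P^2/N ≈ P^2/√N elements of each operand, and performs 2P^3/N flops. The helper names `block`, `block_multiply_add`, and `cannon` are hypothetical.

```python
import math
import random


def block(M, bi, bj, b):
    """Return the b x b block (bi, bj) of square matrix M (list of lists)."""
    return [row[bj * b:(bj + 1) * b] for row in M[bi * b:(bi + 1) * b]]


def block_multiply_add(C, A, B):
    """In-place C += A * B for b x b blocks; returns flops counted as 2 b^3."""
    b = len(A)
    for i in range(b):
        for j in range(b):
            C[i][j] += sum(A[i][k] * B[k][j] for k in range(b))
    return 2 * b ** 3


def cannon(A, B, N):
    """Simulate Cannon's algorithm for C = A * B on a sqrt(N) x sqrt(N) grid.

    Each simulated processor holds exactly one block of A, B and C at a time.
    Returns the product C, the [A, B] element counts each processor receives
    during the shift phase, and the flops each processor performs.
    """
    P = len(A)
    q = math.isqrt(N)
    assert q * q == N and P % q == 0, "need a square grid whose side divides P"
    b = P // q
    procs = [(i, j) for i in range(q) for j in range(q)]
    # Initial skew: processor (i, j) starts with A(i, i+j) and B(i+j, j), indices mod q.
    a = {(i, j): block(A, i, (i + j) % q, b) for i, j in procs}
    bb = {(i, j): block(B, (i + j) % q, j, b) for i, j in procs}
    c = {p: [[0] * b for _ in range(b)] for p in procs}
    received = {p: [0, 0] for p in procs}  # [elements of A, elements of B]
    flops = {p: 0 for p in procs}
    for step in range(q):
        # After `step` shifts, processor (i, j) holds A(i, k) and B(k, j) with k = i+j+step mod q.
        for p in procs:
            flops[p] += block_multiply_add(c[p], a[p], bb[p])
        if step < q - 1:
            # Shift A blocks one processor left along rows, B blocks one processor up along columns.
            a = {(i, j): a[(i, (j + 1) % q)] for i, j in procs}
            bb = {(i, j): bb[((i + 1) % q, j)] for i, j in procs}
            for p in procs:
                received[p][0] += b * b
                received[p][1] += b * b
    # Gather the distributed C blocks into one P x P matrix for checking.
    C = [[0] * P for _ in range(P)]
    for (i, j), blk in c.items():
        for r in range(b):
            C[i * b + r][j * b:(j + 1) * b] = blk[r]
    return C, received, flops


if __name__ == "__main__":
    P, N = 8, 16  # 8 x 8 operands on a 4 x 4 grid of simulated processors
    rng = random.Random(0)
    A = [[rng.randint(-3, 3) for _ in range(P)] for _ in range(P)]
    B = [[rng.randint(-3, 3) for _ in range(P)] for _ in range(P)]
    C, received, flops = cannon(A, B, N)
    ref = [[sum(A[i][k] * B[k][j] for k in range(P)) for j in range(P)] for i in range(P)]
    assert C == ref
    q = math.isqrt(N)
    print("A elements received:", received[(0, 0)][0], "vs P^2/sqrt(N) =", P * P // q)
    print("flops per processor:", flops[(0, 0)], "vs 2P^3/N =", 2 * P ** 3 // N)
```

With P = 8 and N = 16, each processor receives 12 elements of A in the shift phase (16 if the initial skew is counted as one more block transfer) and performs exactly 64 = 2P^3/N flops; the exact constant in the communication count depends on how the initial alignment is charged.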

Available Versions

Harvard University - DASH

oai:dash.harvard.edu:1/2581100...

Last time updated on 17/04/2018