Intensive hypercube communication: Prearranged communication in link-bound machines

Author: Baru
Bruce Wagar
Cybenko
Gustafson
Hayes
Ho
Johnsson
Quentin F. Stout
Saad
Valiant
Wagar
Publication venue: 'Elsevier BV'
Publication date: 01/01/1990

Hypercube algorithms are developed for a variety of communication-intensive tasks such as transposing a matrix, histogramming, sending a (long) message from one node to another, broadcasting a message from one node to all others, broadcasting a message from each node to all others, and exchanging messages between nodes via a fixed permutation. The algorithm for exchanging via a fixed permutation can be viewed as a deterministic analog of Valiant's randomized routing. The algorithms are for link-bound hypercubes in which local processing time is ignored, communication time predominates, message headers are not needed because all nodes know the task being performed, and all nodes can use all communication links simultaneously. Through systematic use of techniques such as pipelining, batching, variable packet sizing, symmetrizing, and completing, algorithms are obtained for all of these problems whose running time has an optimal highest-order term.

Peer Reviewed

http://deepblue.lib.umich.edu/bitstream/2027.42/28830/1/0000664.pd
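For orientation, the one-to-all broadcast task mentioned in the abstract can be illustrated with the textbook dimension-by-dimension scheme on a d-dimensional hypercube. This is only a baseline sketch, not the algorithm from the paper: it uses a single link per node per step, whereas the abstract describes pipelined algorithms that use all links simultaneously. The function name and the step-by-step schedule format are assumptions made for this illustration.

```python
def hypercube_broadcast_schedule(dim: int, root: int = 0) -> list[list[tuple[int, int]]]:
    """Return a baseline broadcast schedule for a hypercube with 2**dim nodes.

    steps[k] lists the (sender, receiver) pairs active in step k.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if dim < 0 or not 0 <= root < (1 << dim):
        raise ValueError("root must be a valid node label for the given dimension")

    informed = {root}
    steps = []
    for k in range(dim):
        bit = 1 << k
        # Every informed node forwards the message across dimension k.
        sends = [(node, node ^ bit) for node in sorted(informed)]
        informed.update(receiver for _, receiver in sends)
        steps.append(sends)
    return steps


if __name__ == "__main__":
    for step, sends in enumerate(hypercube_broadcast_schedule(3)):
        print(step, sends)
```

The schedule takes d steps and doubles the number of informed nodes in each one; for long messages it leaves most links idle, which is the inefficiency that pipelining over all links is meant to address.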

CiteSeerX

Crossref

Deep Blue Documents at the University of Michigan

On the design and implementation of broadcast and global combine operations using the postal model

Author: Bruck Jehoshua
De Coster Luc
Dewulf Natalie
Ho Ching-Tien
Lauwereins Rudy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/1996

A number of models have been proposed in recent years for message-passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model, a parameter λ is used to model the communication latency of the message-passing system. During each round, each node can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of λ and will arrive at the receiving node at round r + λ - 1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, 1) techniques for measuring the value of λ on a given machine, 2) creating efficient broadcast algorithms that take the latency λ and the number of nodes n as parameters, and 3) creating efficient global combine algorithms for parallel machines in which λ is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves the known implementation by more than 20%.
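The effect of λ on broadcast time can be illustrated with a small counting sketch. It assumes an integer λ ≥ 1, rounds numbered from 0, and a greedy schedule in which every node that holds the message sends it to a new node in every round; a message sent in round r is received in round r + λ - 1, so its receiver can start forwarding in round r + λ. Under these assumptions the number of informed nodes N(t) satisfies N(t) = N(t-1) + N(t-λ). The function names are hypothetical, and this is not the tuned implementation evaluated in the paper.

```python
def informed_counts(lam: int, rounds: int) -> list[int]:
    """counts[t] = nodes able to send at the start of round t (greedy schedule).

    Assumes an integer latency lam >= 1; illustrative helper only.
    """
    if lam < 1 or rounds < 0:
        raise ValueError("lam must be >= 1 and rounds must be >= 0")
    counts: list[int] = []
    for t in range(rounds + 1):
        if t < lam:
            # No message sent by the root has become usable yet.
            counts.append(1)
        else:
            # Nodes informed one round ago, plus the receivers of the
            # messages sent in round t - lam by every node informed then.
            counts.append(counts[t - 1] + counts[t - lam])
    return counts


def broadcast_rounds(n: int, lam: int) -> int:
    """Smallest t such that at least n nodes are informed at the start of round t."""
    if n < 1 or lam < 1:
        raise ValueError("n and lam must be >= 1")
    counts = [1]
    t = 0
    while counts[-1] < n:
        t += 1
        counts.append(1 if t < lam else counts[t - 1] + counts[t - lam])
    return t


if __name__ == "__main__":
    for lam in (1, 2, 3):
        print(lam, informed_counts(lam, 7), broadcast_rounds(8, lam))
```

With λ = 1 the count doubles every round, as in the classic binomial-tree broadcast; with λ = 2 it follows the Fibonacci sequence, which shows why a broadcast tree tuned to the measured latency can differ from the λ = 1 tree.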

Caltech Authors