The Transportation Primitive

Author: Khaled A. Alsabti
Sanjay Ranka
Ravi V. Shankar
Publication venue: SURFACE at Syracuse University
Publication date: 01/01/1994

This paper presents algorithms for implementing the transportation primitive on a distributed-memory parallel architecture. The transportation primitive performs many-to-many personalized communication with bounded incoming and outgoing traffic. We present a two-stage deterministic algorithm that decomposes communication with possibly high variance in message size into two communication stages with low message-size variance. If the maximum outgoing or incoming traffic at any processor is $t$, transportation can be done in $2t\mu$ time (+ lower-order terms) when $t \ge O(p^2 + p\tau/\mu)$, where $p$ is the number of processors, $\mu$ is the inverse of the data transfer rate and $\tau$ is the startup overhead. If the maximum outgoing and incoming traffic are $r$ and $c$ respectively, transportation can be done in $(r+c)\mu$ time when either $r \ge O(p^2)$ or $c \ge O(p^2)$. Optimality and scalability are thus achieved when the traffic is large, a condition that is usually satisfied in practice. The algorithm was implemented on the Connection Machine CM-5. The implementation used the low-latency communication primitives (active messages) available on the CM-5, but the algorithm itself is architecture-independent. An alternate single-stage algorithm using distributed random scheduling was also implemented on the CM-5, and the performance of the two algorithms was compared.
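The abstract does not spell out how the two stages are formed. Below is a minimal sketch of one simple way to obtain low message-size variance, under assumptions that are ours and not necessarily the paper's: every message m[i][j] is cut into p near-equal pieces, and piece k is routed through processor k as an intermediate. The traffic-matrix representation and the names split_evenly, two_stage_schedule and stage_time are hypothetical; stage_time is a plain linear τ + μ·size cost model, not a CM-5 measurement.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def split_evenly(size: int, parts: int) -> List[int]:
    """Cut `size` items into `parts` pieces whose sizes differ by at most 1."""
    base, extra = divmod(size, parts)
    return [base + (1 if k < extra else 0) for k in range(parts)]


def two_stage_schedule(m: Matrix) -> Tuple[Matrix, Matrix]:
    """Turn a p x p traffic matrix into two low-variance stages (hypothetical scheme).

    m[i][j] = number of items processor i must deliver to processor j.
    Each message m[i][j] is cut into p near-equal pieces; piece k travels
    i -> k in stage 1 and k -> j in stage 2 (k acts as an intermediate).
    """
    p = len(m)
    stage1 = [[0] * p for _ in range(p)]  # stage1[i][k]: items i sends to k
    stage2 = [[0] * p for _ in range(p)]  # stage2[k][j]: items k forwards to j
    for i in range(p):
        for j in range(p):
            for k, piece in enumerate(split_evenly(m[i][j], p)):
                stage1[i][k] += piece
                stage2[k][j] += piece
    return stage1, stage2


def stage_time(s: Matrix, tau: float, mu: float) -> float:
    """Linear cost model (an assumption): each nonzero entry is one message
    costing tau + mu * size at both its sender and its receiver, local
    pieces (i == k) included for simplicity; a stage lasts as long as its
    busiest processor."""
    p = len(s)
    send = [sum(tau + mu * s[i][k] for k in range(p) if s[i][k]) for i in range(p)]
    recv = [sum(tau + mu * s[i][k] for i in range(p) if s[i][k]) for k in range(p)]
    return max(send + recv)
```

For example, with m = [[10, 0, 2], [0, 0, 9], [5, 5, 5]], two_stage_schedule(m) gives stage-1 rows [5, 4, 3], [3, 3, 3] and [6, 6, 3], so processor 0's uneven 10/0/2 traffic becomes nearly even. Because each piece is within one item of m[i][j]/p, each stage-1 message from processor i is within p items of r_i/p and each stage-2 message to processor j is within p items of c_j/p (r_i and c_j being row and column sums). This is one way a p² term can show up among the lower-order costs.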

Syracuse University Research Facility and Collaborative Environment

Array Combining Scatter Functions on Coarse-Grained, Distributed-Memory Parallel Machines

Author: Khaled A. Alsabti
Sanjay Ranka
Seungjo Bae

In High Performance Fortran (HPF), array combining scatter functions are generalized array reduction functions. Elements in a subset of a source array are combined, and the resulting value is sent to a position in a result array. Array combining scatter functions can be defined in terms of random access write (RAW). In this paper we present several local and global combining schemes and their impact on overall performance. First, we present two global combining schemes for array combining scatter functions with arbitrary hot spots (processors) on coarse-grained, distributed-memory parallel machines: a direct (one-stage) algorithm and an extended two-stage algorithm. We also present four local combining (collision resolution) schemes to reduce variance in message size: whole local combining using open address hashing, whole local combining using direct address hashing, partial local combining without compression, and partial local combining with compression. In each global combining scheme..
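The abstract describes the operation but not an implementation. Below is a minimal single-process sketch, assuming a SUM combiner (in the spirit of HPF's SUM_SCATTER), a block-distributed result array, and Python dicts standing in for the local hash tables. It models only whole local combining followed by the direct (one-stage) global scheme; the two-stage scheme and the partial-combining variants are not shown, and combining_scatter_sum is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def combining_scatter_sum(
    base: List[float],
    src_blocks: List[List[float]],
    idx_blocks: List[List[int]],
) -> Tuple[List[float], Dict[Tuple[int, int], int]]:
    """SUM-combining scatter over p simulated processors (hypothetical sketch).

    Processor q holds src_blocks[q] and idx_blocks[q]; each value
    src_blocks[q][e] is added into result[idx_blocks[q][e]]. The result is
    block-distributed, so global index g lives on processor g // block.
    Returns the result and the number of (index, value) pairs per message.
    """
    n, p = len(base), len(src_blocks)
    block = max(1, -(-n // p))  # ceiling division: result elements per owner

    # Whole local combining: a hash table per processor merges all values
    # aimed at the same global index before anything is sent.
    outboxes: List[Dict[int, Dict[int, float]]] = []
    for src, idx in zip(src_blocks, idx_blocks):
        combined: Dict[int, float] = defaultdict(float)
        for value, g in zip(src, idx):
            combined[g] += value
        per_dest: Dict[int, Dict[int, float]] = defaultdict(dict)
        for g, value in combined.items():
            per_dest[g // block][g] = value  # group pairs by owning processor
        outboxes.append(per_dest)

    # Direct (one-stage) global combining: one message from q to each owner;
    # the owner adds the partial sums into its part of the result.
    result = list(base)
    msg_sizes: Dict[Tuple[int, int], int] = {}
    for q, per_dest in enumerate(outboxes):
        for dest, entries in per_dest.items():
            msg_sizes[(q, dest)] = len(entries)
            for g, value in entries.items():
                result[g] += value  # conceptually executed on processor dest
    return result, msg_sizes
```

With base = [0.0] * 4, src_blocks = [[1.0, 2.0, 3.0], [4.0, 5.0]] and idx_blocks = [[0, 0, 3], [3, 1]], the result is [3.0, 5.0, 0.0, 7.0]; processor 0 sends two (index, value) pairs instead of three because its two contributions to index 0 are combined locally before any communication.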

CiteSeerX

The Transportation Primitive

Author: Khaled A. Alsabti
Ravi V. Shankar
Sanjay Ranka

CiteSeerX

Many-to-many Personalized Communication with Bounded Traffic

Author: Khaled A. Alsabti
Ravi V. Shankar
Sanjay Ranka
Publication date: 01/01/1995

This paper presents solutions for the problem of many-to-many personalized communication, with bounded incoming and outgoing traffic, on a distributed-memory parallel machine. We present a two-stage algorithm that decomposes the many-to-many communication, with possibly high variance in message size, into two communications with low message-size variance. The algorithm is deterministic and takes time $2t\mu$ (+ lower-order terms) when $t \ge O(p^2 + p\tau/\mu)$. Here $t$ is the maximum outgoing or incoming traffic at any processor, $\tau$ is the startup overhead and $\mu$ is the inverse of the data transfer rate. Optimality is achieved when the traffic is large, a condition that is usually satisfied in practice on coarse-grained architectures. The algorithm was implemented on the Connection Machine CM-5. The implementation used the low-latency communication primitives (active messages) available on the CM-5, but the algorithm as such is architecture-independent. An alternate single-stage algorithm using distri..
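A short calculation shows why a condition of this form makes the extra terms lower-order. Assume (our assumption, not a cost breakdown stated in the abstract) that each of the two stages costs at most $p$ startups plus the transfer of $t + p^2$ items at any processor, the $p^2$ term standing for rounding when messages are split into $p$ pieces. Then:

```latex
T_{\text{two-stage}} \le 2\bigl(p\tau + (t + p^2)\mu\bigr)
  = 2t\mu + 2\bigl(p\tau + p^2\mu\bigr),
\qquad
\frac{p\tau + p^2\mu}{t\mu} = \frac{p^2 + p\tau/\mu}{t}.
```

So if $t \ge C\,(p^2 + p\tau/\mu)$ for a constant $C$, the extra terms are at most $2t\mu/C$, and their share of the total shrinks as $t$ grows relative to $p^2 + p\tau/\mu$, leaving $2t\mu$ as the leading term.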

CiteSeerX

Syracuse University Research Facility and Collaborative Environment