
Efficient hypercube communications

Author: Bhatt Shreyas R.
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1992

Hypercube algorithms may be developed for a variety of communication-intensive tasks, such as sending a message from one node to another, broadcasting a message from one node to all others, broadcasting a message from each node to all others, all-to-all personalized communication, one-to-all personalized communication, and exchanging messages between nodes via fixed permutations. All these communication patterns are special cases of many-to-many personalized communication, which is the problem investigated here. Two routing algorithms for many-to-many personalized communication are presented. Both yield very high performance with respect to the number of time steps and packet transmissions. The first does so by attempting to equibalance the number of messages at intermediate nodes; this technique tries to avoid creating a bottleneck at any node and thus reduces the total communication time. The second uses one-step time-lookahead equibalancing: from the candidate intermediate nodes, it chooses the one that will probably have the minimum number of messages in the next cycle.
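The equibalancing idea can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption: nodes are integer labels, candidate intermediate nodes are the neighbors that lie on a shortest path to the destination, and `queue_len` is a hypothetical mapping from node label to its current message count.

```python
def next_hop(current, dest, queue_len):
    """Pick the next intermediate node on a hypercube (illustrative only).

    Candidates are the neighbors of `current` that fix one bit in which
    `current` and `dest` differ. Among them, choose the one with the
    fewest queued messages (hypothetical `queue_len` dict; missing = 0).
    """
    diff = current ^ dest
    candidates = [
        current ^ (1 << bit)
        for bit in range(diff.bit_length())
        if (diff >> bit) & 1
    ]
    if not candidates:
        return None  # already at the destination
    # Ties are broken in favor of the lowest dimension.
    return min(candidates, key=lambda node: queue_len.get(node, 0))
```

A lookahead variant would replace `queue_len` with a one-step estimate of the next cycle's queue lengths; how that estimate is formed is not stated in the abstract.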

Digital Commons @ New Jersey Institute of Technology (NJIT)

A new-generation class of parallel architectures and their performance evaluation

Author: Wang Qian
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1999

The development of computers with hundreds or thousands of processors and capability for very high performance is absolutely essential for many computation problems, such as weather modeling, fluid dynamics, and aerodynamics. Several interconnection networks have been proposed for parallel computers. Nevertheless, the majority of them are plagued by rather poor topological properties that result in large memory latencies for DSM (Distributed Shared-Memory) computers. On the other hand, scalable networks with very good topological properties are often impossible to build because of their prohibitively high VLSI (e.g., wiring) complexity. Such a network is the generalized hypercube (GH). The GH supports full-connectivity of its nodes in each dimension and is characterized by outstanding topological properties. In addition, low-dimensional GHs have very large bisection widths. We propose in this dissertation a new class of processor interconnections, namely HOWs (Highly Overlapping Windows), that are more generic than the GH, are highly scalable, and have comparable performance. We analyze the communications capabilities of 2-D HOW systems and demonstrate that in practical cases HOW systems perform much better than binary hypercubes for important communications patterns. These properties are in addition to the good scalability and low hardware complexity of HOW systems. We present algorithms for one-to-one, one-to-all broadcasting, all-to-all broadcasting, one-to-all personalized, and all-to-all personalized communications on HOW systems. These algorithms are developed and evaluated for several communication models. In addition, we develop techniques for the efficient embedding of popular topologies, such as the ring, the torus, and the hypercube, into 1-D and 2-D HOW systems. The objective is to show that 2-D HOW systems are not only scalable and easy to implement, but they also result in good embedding of several classical topologies

Digital Commons @ New Jersey Institute of Technology (NJIT)

Partial multinode broadcast and partial exchange algorithms for d-dimensional meshes

Author: Emmanouel A. Varvarigos
Dimitri P. Bertsekas
Publication venue: Massachusetts Institute of Technology, Laboratory for Information and Decision Systems
Publication date: 01/01/1992

Caption title. "Revision of January 1992." Includes bibliographical references (p. 24-26). Supported by NSF (NSF-ECS-8519058). Supported by ARO (DAAL03-86-K-0171). By Emmanouel A. Varvarigos and Dimitri P. Bertsekas.

DSpace@MIT

Multiphase complete exchange on a circuit switched hypercube

Author: Bokhari Shahid H.

On a distributed memory parallel computer, the complete exchange (all-to-all personalized) communication pattern requires each of $n$ processors to send a different block of data to each of the remaining $n - 1$ processors. This pattern is at the heart of many important algorithms, most notably the matrix transpose. For a circuit switched hypercube of dimension $d$ ($n = 2^d$), two algorithms for achieving complete exchange are known: (1) the Standard Exchange approach, which employs $d$ transmissions of size $2^{d-1}$ blocks each and is useful for small block sizes, and (2) the Optimal Circuit Switched algorithm, which employs $2^d - 1$ transmissions of 1 block each and is best for large block sizes. A unified multiphase algorithm is described that includes these two algorithms as special cases. The complete exchange on a hypercube of dimension $d$ and block size $m$ is achieved by carrying out $k$ partial exchanges on subcubes of dimension $d_i$, where $\sum_{i=1}^{k} d_i = d$, with effective block size $m_i = m 2^{d - d_i}$. When $k = d$ and all $d_i = 1$, this corresponds to algorithm (1) above. For the case of $k = 1$ and $d_i = d$, this becomes the circuit switched algorithm (2). Changing the subcube dimensions $d_i$ varies the effective block size and permits a compromise between the data permutation and block transmission overhead of (1) and the startup overhead of (2). For a hypercube of dimension $d$, the number of possible combinations of subcubes is $p(d)$, the number of partitions of the integer $d$. This is an exponential but very slowly growing function, and it is feasible to search over these partitions to discover the best combination for a given message size. The approach was analyzed for, and implemented on, the Intel iPSC-860 circuit switched hypercube. Measurements show good agreement with predictions and demonstrate that the multiphase approach can substantially improve performance for block sizes in the 0 to 160 byte range. This range, which corresponds to 0 to 40 floating point numbers per processor, is commonly encountered in practical numeric applications. The multiphase technique is applicable to all circuit-switched hypercubes that use the common e-cube routing strategy.
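The search over partitions of the cube dimension can be sketched as follows. This is a rough illustration, not the report's model: it assumes each partial exchange on a subcube of dimension d_i uses 2^(d_i) - 1 transmissions of the effective block size m * 2^(d - d_i), charges each transmission a hypothetical `startup` cost plus `per_byte` times its size, and ignores the data permutation overhead the abstract mentions.

```python
def partitions(d, largest=None):
    """Yield the partitions of the integer d as non-increasing tuples."""
    if largest is None:
        largest = d
    if d == 0:
        yield ()
        return
    for first in range(min(d, largest), 0, -1):
        for rest in partitions(d - first, first):
            yield (first,) + rest


def phase_cost(d, d_i, m, startup, per_byte):
    """Cost of one partial exchange on a subcube of dimension d_i (assumed model)."""
    transmissions = 2 ** d_i - 1
    effective_block = m * 2 ** (d - d_i)
    return transmissions * (startup + per_byte * effective_block)


def total_cost(d, parts, m, startup, per_byte):
    """Sum the assumed phase costs over all subcube dimensions in `parts`."""
    return sum(phase_cost(d, d_i, m, startup, per_byte) for d_i in parts)


def best_partition(d, m, startup, per_byte):
    """Search all p(d) partitions for the cheapest one under the assumed model."""
    return min(partitions(d), key=lambda p: total_cost(d, p, m, startup, per_byte))
```

Under this simplified model, the all-ones partition reproduces the Standard Exchange transmission count and the single-part partition reproduces the Optimal Circuit Switched count; real startup and per-byte values would have to be measured on the target machine.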

NASA Technical Reports Server

Star varietal cube: A New Large Scale Parallel Interconnection Network

Author: Adhikari Nibedita
Nag Binod
Pradhan Debendra
Swain Nirmal Keshari
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 28/07/2020

This paper proposes a new interconnection network topology, called the Star varietal cube SVC(n,m), for large scale multicomputer systems. We take advantage of the hierarchical structure of the Star graph network and the Varietal hypercube to obtain an efficient method for constructing the new topology. The Star graph of dimension n and a Varietal hypercube of dimension m are used as building blocks. The resulting network has most of the desirable properties of the Star graph and the Varietal hypercube, including recursive structure, partitionability, and strong connectivity. The diameter of the Star varietal cube is about two thirds of the diameter of the Star-cube. The average distance of the proposed topology is also smaller than that of the Star-cube.

Interscience Research Network

CCL: a portable and tunable collective communication library for scalable parallel computers

Author: Alex Ho
Ching-tien Ho
Jehoshua Bruck
Marc Snir
Pablo Elustondo
Robert Cypher
Shlomo Kipnis
Vasanth Bala
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/1995

A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model

CiteSeerX

Caltech Authors

General broadcasting algorithms in one-port wormhole routed hypercubes

Author: Lee Myung Hoon
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/1996

Wormhole routing has been accepted as an efficient switching mechanism in point-to-point interconnection networks. Here the network resources, i.e., node buffers and communication channels, are effectively utilized to deliver messages across the network. We consider the problem of broadcasting a message in the hypercube equipped with the wormhole switching mechanism. The model is a generalization of an earlier work and considers a broadcast path-length of $m$ ($1 \leq m \leq n$) in the n-cube with a single-port communication capability. In this thesis, e-cube routing, Gray code path routing, and intermediate reception capability have been adopted in order to solve the problem of broadcasting in one-port wormhole routed hypercubes. Two methods have been suggested: one is based on utilizing the Gray codes (Gray code path-based routing), while the other is based on the recursive partitioning of the cube (cube-based routing). The number of routing steps in both methods is compared to those in the previous results, as well as to the lower bounds derived from the path-length $m$ assumption. The cube-based and path-based algorithms give $T(R) + (k_c + 1)T(m)$ and $k_G + T(m)$ routing steps, respectively. Comparing the routing steps of both algorithms, the path-based algorithm performs better than the cube-based one. The results of this work are significant and can be used for immediate implementation in contemporary machines, most of which are equipped with wormhole routing and serial communication capability.
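The two building blocks named in this abstract, Gray code paths and e-cube routing, can be sketched generically. This is not the thesis's broadcast algorithm; it only shows the standard binary reflected Gray code and dimension-ordered (e-cube) routing on an n-cube, with the lowest-dimension-first order chosen here as an assumption.

```python
def gray_code(n):
    """Binary reflected Gray code: a path visiting every node of the n-cube,
    where consecutive labels differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n)]


def ecube_route(src, dst, n):
    """Dimension-ordered (e-cube) route from src to dst on an n-cube.

    Differing bits are corrected one at a time, from dimension 0 upward
    (the order is an assumption; any fixed order gives an e-cube scheme).
    """
    path = [src]
    node = src
    for dim in range(n):
        if ((node ^ dst) >> dim) & 1:
            node ^= 1 << dim
            path.append(node)
    return path
```

For example, `ecube_route(0b000, 0b101, 3)` visits nodes 0, 1, and 5, flipping bit 0 first and bit 2 second.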

University of Nevada, Las Vegas Repository

I/O embedding and broadcasting in star interconnection networks

Author: Palagummi Kalyanram
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/1999

The issues of communication between a host or central controller and processors in large interconnection networks are very important and have been studied in the past by several researchers. There is a plethora of problems that arise when processors are asked to exchange information on parallel computers on which processors are interconnected according to a specific topology. In robust networks, it is desirable at times to send (receive) data/control information to (from) all the processors in minimal time. This type of communication is commonly referred to as broadcasting. To speed up broadcasting in a given network without modifying its topology, certain processors called stations can be specified to act as relay agents. In this thesis, broadcasting issues in a star-based interconnection network are studied. The model adopted assumes all-port communication and a wormhole switching mechanism. Initially, the problem treated is one of finding the minimum number of stations required to cover all the nodes in the star graph with i-adjacency. We consider 1-, 2-, and 3-adjacencies and determine the upper bound on the number of stations required to cover the nodes for each case. After deriving the number of stations, two algorithms are designed to broadcast the messages first from the host to stations, and then from stations to the remaining nodes. In addition, a Binary-based Algorithm is designed to allow routing in the network by working directly on the binary labels assigned to the star graph. No look-up table is consulted during routing, and a minimum number of bits is used to represent a node label. At the end, the thesis sheds light on another algorithm for routing using parallel paths in the star network.
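As a rough illustration of station coverage, the sketch below builds star graph adjacency (nodes are permutations; a neighbor is obtained by swapping the first symbol with the symbol at another position) and collects the nodes a single station would cover. Reading "i-adjacency" as "within i hops of a station" is an assumption made here, not a definition taken from the thesis, and the function names are hypothetical.

```python
from collections import deque


def star_neighbors(perm):
    """Neighbors of a star graph node: swap the first symbol with position i."""
    result = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        result.append(tuple(p))
    return result


def nodes_within(start, radius):
    """Breadth-first search: all nodes at most `radius` hops from `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the coverage radius
        for nxt in star_neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen
```

Dividing the total node count n! by the size of one such coverage set gives a simple lower bound on the number of stations under this reading; the thesis derives upper bounds, which this sketch does not attempt.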

University of Nevada, Las Vegas Repository

An efficient algorithm for multiple simultaneous broadcasts in the hypercube

Author: George D. Stamoulis
John N. Tsitsiklis
Publication venue: Massachusetts Institute of Technology, Laboratory for Information and Decision Systems
Publication date: 01/01/1990

Includes bibliographical references (p. 9-10). Cover title. Research supported by the NSF (ECS-8552419). Research supported by the ARO (DAAL03-86-K-0171). By George D. Stamoulis and John N. Tsitsiklis.

DSpace@MIT