Investigation of reduced hypercube (RH) networks : embedding and routing capabilities by Sideras, Michalis A.
New Jersey Institute of Technology 
Digital Commons @ NJIT 
Theses Electronic Theses and Dissertations 
Winter 1-31-1994 
Investigation of reduced hypercube (RH) networks : embedding 
and routing capabilities 
Michalis A. Sideras 
New Jersey Institute of Technology 
Follow this and additional works at: https://digitalcommons.njit.edu/theses 
 Part of the Electrical and Electronics Commons 
Recommended Citation 
Sideras, Michalis A., "Investigation of reduced hypercube (RH) networks : embedding and routing 
capabilities" (1994). Theses. 1697. 
https://digitalcommons.njit.edu/theses/1697 
This Thesis is brought to you for free and open access by the Electronic Theses and Dissertations at Digital 
Commons @ NJIT. It has been accepted for inclusion in Theses by an authorized administrator of Digital Commons 
@ NJIT. For more information, please contact digitalcommons@njit.edu. 
 
Copyright Warning & Restrictions 
 
 
The copyright law of the United States (Title 17, United 
States Code) governs the making of photocopies or other 
reproductions of copyrighted material. 
 
Under certain conditions specified in the law, libraries and 
archives are authorized to furnish a photocopy or other 
reproduction. One of these specified conditions is that the 
photocopy or reproduction is not to be “used for any 
purpose other than private study, scholarship, or research.” 
If a user makes a request for, or later uses, a photocopy or
reproduction for purposes in excess of “fair use,” that user
may be liable for copyright infringement.
 
This institution reserves the right to refuse to accept a 
copying order if, in its judgment, fulfillment of the order 
would involve violation of copyright law. 
 
Please Note:  The author retains the copyright while the 
New Jersey Institute of Technology reserves the right to 
distribute this thesis or dissertation 
 
 
The Van Houten library has removed some of the 
personal information and all signatures from the 
approval page and biographical sketches of theses 
and dissertations in order to protect the identity of 
NJIT graduates and faculty.  
 
ABSTRACT 
Investigation of Reduced Hypercube (RH) Networks: 
Embedding and Routing Capabilities 
by 
Michalis A. Sideras 
The choice of a topology for the interconnection of resources in a distributed-
memory parallel computing system is a major design decision. The direct binary 
hypercube has been widely used for this purpose due to its low diameter and its 
ability to efficiently emulate other important structures. The aforementioned strong 
properties of the hypercube come at the cost of high VLSI complexity due to the 
increase in the number of communication ports and channels per node with an 
increase in the total number of nodes. The reduced hypercube (RH) topology, which 
is obtained by a uniform reduction in the number of links for each hypercube node, 
yields lower complexity interconnection networks compared to hypercubes with the 
same number of nodes, thus permitting the construction of larger parallel systems. 
Furthermore, it has been shown that the RH achieves, at a lower cost, performance
comparable to that of a regular hypercube with the same number of nodes. A
very important issue for the viability of the RH is the efficiency with which
frequently used topologies can be embedded into it. This thesis proposes embedding
algorithms for three very important topologies, namely the ring, the torus, and the
binary tree. The performance of the proposed algorithms is analyzed and compared
to that of equivalent embedding algorithms for the regular hypercube, and it is shown
that these topologies are emulated efficiently on the RH. Additionally, two previously
proposed routing algorithms for the RH are evaluated through simulation results.
INVESTIGATION OF REDUCED HYPERCUBE (RH) NETWORKS: 
EMBEDDING AND ROUTING CAPABILITIES 
by 
Michalis A. Sideras 
A Thesis 
Submitted to the Faculty of 
New Jersey Institute of Technology 
in Partial Fulfillment of the Requirements for the Degree of 
Master of Science in Electrical Engineering 
Investigation of Reduced Hypercube (RH) Networks: 
Embedding and Routing Capabilities 
Michalis A. Sideras 
Dr. Sotirios G. Ziavras, Thesis Advisor 	 Date 
Assistant Professor of Electrical and Computer Engineering, NJIT 
Dr. Denis Blackmore, Committee Member 	 Date 
Professor of Mathematics, NJIT 
Dr. John D. Carpinelli, Committee Member 	 Date
Associate Professor of Electrical and Computer Engineering, NJIT  
BIOGRAPHICAL SKETCH 
Author: Michalis A. Sideras 
Degree: Master of Science in Electrical Engineering 
Date:  January 1994 
Undergraduate and Graduate Education: 
• Master of Science in Electrical Engineering, 
New Jersey Institute of Technology, Newark, NJ, 1994 
• Bachelor of Science in Computer Engineering, 
New Jersey Institute of Technology, Newark, NJ, 1992 
Major: Electrical Engineering 
Presentations and Publications: 
Sotirios G. Ziavras and Michalis A. Sideras, "Facilitating High-Performance Image
Analysis on Reduced Hypercube (RH) Parallel Computers," 3rd International
Workshop on Parallel Image Analysis, Submitted for Publication. 
This work is dedicated to 
my parents 
Antonis and Penelope 
ACKNOWLEDGMENT 
The author wishes to express his sincere gratitude to his advisor, Professor 
Sotirios G. Ziavras, for his guidance, friendship, and moral support throughout this 
research. 
Special thanks to Professors Denis Blackmore and John D. Carpinelli for serving 
as members of the committee and for providing valuable comments and suggestions 
for this thesis. 
The author gratefully acknowledges the assistance of Professors Gregory Kriegsmann
and Michael Porter of the Department of Mathematics in the form of a graduate
assistantship.
An additional word of thanks goes to Professor Michael Porter for being so
supportive and understanding toward the author as his supervisor in the computer
lab of the Department of Mathematics, and for making his high-tech computer
equipment available to him; this thesis has greatly benefited from it.
TABLE OF CONTENTS
Chapter
1 INTRODUCTION
1.1 Importance of Parallel Computing Systems
1.2 Topologies
2 HYPERCUBE VARIATIONS
2.1 Existing Variations
2.2 Reduced Hypercube Interconnection Network
2.2.1 Emulating Hypercubes on Reduced Hypercubes
3 ROUTING ALGORITHMS FOR REDUCED HYPERCUBES
3.1 Routing Algorithm I
3.2 Routing Algorithm II
3.3 Simulation Results and Comparison
4 MAPPING ALGORITHMS FOR REDUCED HYPERCUBES
4.1 Mapping of a Ring onto an RH
4.2 Mapping of a Torus onto an RH
4.2.1 Mapping of a 2-D Torus onto an RH
4.2.2 Mapping of a μ-D Torus onto an RH
4.3 Mapping of a Binary Tree onto an RH
5 CONCLUSIONS
REFERENCES
LIST OF TABLES
Table
3.1 Average distances of nodes in RH's according to Algorithms I and II
4.1 The SOE's and SOX's for n = 2
4.2 Efficiency of torus mapping
LIST OF FIGURES
Figure
1.1 Fundamental topologies (a) Ring (b) Torus (c) Binary tree
2.1 Structure of the CCC(4)
2.2 A 2-cube and a 3-cube connected to form an incomplete hypercube
2.3 Producing the RH(2, 1) from the 4-cube by removing the red links
2.4 The RH(3, 1)
2.5 Structure of the RH(k, 2)
2.6 The average dilation of edges for hypercube emulation on the RH(k, n), as a function of k and n for $k + 2^n \le 26$
4.1 Mapping of a 32-node ring onto the RH(3, 1)
4.2 Mapping of the 4 × 8 torus onto the RH(3, 1)
CHAPTER 1
INTRODUCTION
1.1 Importance of Parallel Computing Systems
Historically, the roots of supercomputers can be traced to the military, intelligence, and
scientific communities. During the Cold War, supercomputers were indispensable for
the design of nuclear weapons, radar tracking, and many other applications such as
code breaking. This accounts for the very special treatment that the supercomputer
industry received from the government.
The end of the Cold War, however, did not bring the end of this industry. On the
contrary, in the "economic war" that is now taking shape, the role of supercomputers
is, and will be, as important as it was during the Cold War, and even more so.
Supercomputers are growing into mainstream business, industrial, and design
applications. For instance, they are used in the design of automobiles that better
protect passengers in crashes, internal combustion engines that burn fuel more
efficiently, integrated circuits of unprecedented complexity, and pharmaceuticals
that are safer and more effective.
Originally, efforts were heavily directed toward improving the raw speed of the
electronic components that make up a supercomputer. These efforts produced the
first class of supercomputers; representative examples include the CRAY-2, the
IBM 3081/3089, and the Burroughs D-825. Although substantial, and in some cases
impressive, improvements were achieved, they were not enough to meet the
continually increasing demand for speed. The speed of light imposes an
insurmountable limit on this approach, which has prompted computer engineers to
seek different approaches.
The focus of both academia and industry has often shifted to a philosophy and
approach that has been around for the last 15 years: massive parallelism. It is now
accepted, with remarkably wide consensus, that massive parallelism is the only way
to achieve the target speedups and even to break the teraflop barrier
(1 teraflop = 1 trillion floating-point operations per second).
This has led to the emergence of massively parallel processor (MPP) systems,
which are moving toward domination of the supercomputer market. MPP's
are high-performance architectures based on the interconnection of hundreds or
thousands of microprocessor-based nodes, as opposed to 4, 8, or a few dozen
expensive, exotic processors in conventional supercomputers. The main advantage
of MPP's is their scalability, that is, the ability to design high-performance
architectures that accommodate thousands of processors while overcoming potential
problems such as insufficient memory and communication overhead. The latter is a
real issue for MPP's: it is the time the nodes spend communicating in order to
complete a task, and it directly reduces the speedup gained by parallelizing the task.
Currently there are roughly equal numbers of conventional supercomputers and
powerful MPP's worldwide, about 500 of each. Both design techniques have achieved
speeds approaching 10 gigaflops (1 gigaflop = 1 billion floating-point operations per
second). As noted earlier, the target of both industry and government is to approach
processing rates of a trillion floating-point operations per second (1 teraflop)
sustained on actual applications. This is 100 times the processing power of today's
best machines, which run at about 10 gigaflops. The hope is that teraflop performance
will be reached before the end of this decade. The target of teraflop performance is
not arbitrary: certain useful tasks need this kind of performance, for example
simulating the aerodynamics of an entire aircraft in reasonably fine detail, simulating
the global climate over a period of decades, and nearly accurate weather forecasting.
1.2 Topologies  
A distinctive characteristic of an MPP computer is the way its processing elements
(PE's) are interconnected, i.e., the topology of the interconnection network. Indeed,
one way to classify multiprocessors is by the topology of their interconnection
networks.
Some fundamental topologies for the development of parallel algorithms are 
the following: 
• Ring (or linear array) 
• Torus (or mesh) 
• Binary Tree 
• Pyramid 
The first three topologies are shown in Figure 1.1.
Each of these topologies serves important application domains: the linear array is
important for image coding, the mesh is indispensable for numerical analysis and
intermediate-level image processing, the binary tree is inherently the most suitable
topology for divide-and-conquer techniques, and the pyramid is suitable for multigrid
operations and image processing.
When it comes to the design and manufacture of a commercial MPP, the objective is
usually to build a general-purpose system, unless there is a specific demand for a
specialized one. The term general purpose implies that the topology of the computer
should be able to efficiently emulate fundamental structures like those listed above.
Indeed, during the short history of this new industry, the most successful MPP's were
designed with a general-purpose topology for the interconnection of their PE's.
One very important topology, which has been and still is central to research
efforts in this area, is the direct binary n-cube, otherwise called the n-dimensional
hypercube.
Figure 1.1 Fundamental topologies (a) Ring (b) Torus (c) Binary tree
The direct binary n-cube is a special case of the direct k-ary n-cube
with n dimensions and k nodes in each dimension [11]. The terms n-dimensional 
hypercube and n-cube will be used interchangeably in the rest of this thesis to denote 
the direct binary n-cube. Simply described, an n-dimensional hypercube, or n-cube,
consists of $2^n$ nodes. If unique n-bit binary addresses ($0$ through $2^n - 1$) are
assigned to its nodes, then a link (edge) exists between two nodes exactly when their
addresses differ in a single bit. Consequently, each node has n links attached to it.
The hypercube possesses some
extremely important topological properties which make it a very good choice for the 
interconnection of PE's in MPP systems. The most important of these properties 
are the following: 
1. Small diameter in large systems. The diameter of an interconnection network
is defined as the maximum of the shortest distances between all pairs of nodes.
For an n-cube, which has $N = 2^n$ nodes, the diameter is equal to $n = \log_2 N$.
2. Small average internode distance in large systems. The average internode
distance is defined as the average of the distances of all nodes from a reference
node; it is equal to $n/2$ for the n-cube.
3. General-purpose topology, since it can emulate other important structures very
efficiently. Indeed, a considerable number of algorithms have been proposed
for embedding several fundamental networks into the hypercube. Algorithms
for mapping rectangular meshes have been proposed by Chan and Saad [13] and
Johnsson [14], among others. Binary tree mappings were proposed by Wu [15],
Deshpande and Jenevein [3], Ho and Johnsson [6], and Johnsson [14], among
others. Finally, algorithms for embedding pyramids have been designed by Chan
and Saad [13], Lai and White [16], and Ziavras and Siddiqui [17], among others.
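The two distance properties above can be checked numerically. The following is a minimal Python sketch, not taken from this thesis, that runs a breadth-first search from node 0 of an n-cube using the single-bit-flip link rule; because the n-cube is vertex-symmetric, node 0 is representative of every node. The function names are illustrative only.

```python
from __future__ import annotations

from collections import deque


def hypercube_neighbors(x: int, n: int) -> list[int]:
    """Neighbors of node x in the n-cube: flip each of the n address bits."""
    return [x ^ (1 << d) for d in range(n)]


def bfs_distances(src: int, n: int) -> list[int]:
    """Hop distance from src to every one of the 2**n nodes."""
    dist = [-1] * (1 << n)
    dist[src] = 0
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in hypercube_neighbors(x, n):
            if dist[y] < 0:          # not visited yet
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def diameter_and_average(n: int) -> tuple[int, float]:
    """Diameter and average distance from node 0, averaged over all 2**n nodes
    (node 0 itself included, matching the definition in the text)."""
    dist = bfs_distances(0, n)
    return max(dist), sum(dist) / len(dist)


for n in range(1, 7):
    d, avg = diameter_and_average(n)
    print(f"n={n}: N={1 << n}, diameter={d}, average distance={avg}")
```

For every n the printed diameter equals n and the printed average equals n/2, since the distances from node 0 are simply the numbers of 1 bits in the node addresses.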
The above advantages are a direct consequence of the high degree of connec-
tivity of the nodes in the hypercube and its highly regular structure. 
Several successful commercial products make use of the hypercube; the
Thinking Machines CM-2 (Connection Machine-2), the NCUBE, and the Intel iPSC
are the most important. The CM-2 has up to 65,536 PE's, which are simple 1-bit
processors. In contrast, the other two machines have a relatively small number of
powerful processors (up to 1,024). The relation between the number of processors
and their complexity is not coincidental. The hypercube has a major disadvantage:
its number of channels per processor grows as $\log_2 N$, where N is the total number
of processors. The resulting high VLSI complexity of the hypercube limits the feasible
size of a system. This is exactly why the CM-2 uses simple processors, so that it can
have a large number of them.
The high VLSI complexity of the hypercube has forced several supercomputer
manufacturers to seek different topologies for their new systems. Nevertheless, since
the hypercube remains a good choice due to its powerful properties, researchers
are now directing their efforts toward hypercube-like topologies with lower VLSI
complexity. Such a structure should retain, to a large extent, the powerful properties
of the hypercube while having a considerably lower VLSI complexity, thus permitting
the construction of larger systems that can yield higher throughput. Section 2.1 briefly
describes some important existing hypercube-like topologies, which are also called
hypercube variations. This thesis investigates the routing and embedding capabilities
of one such variation, the reduced hypercube (RH) interconnection network. The
reduced hypercube is a very promising hypercube variation that was proposed
recently [2]; it is described in detail in Section 2.2.
CHAPTER 2 
HYPERCUBE VARIATIONS 
2.1 Existing Variations 
In Chapter 1, the major drawback of the hypercube interconnection network was
identified as its resistance to incremental growth, due to its high VLSI complexity
for large systems. To mitigate this disadvantage while maintaining to a high extent
the very important properties of the hypercube, researchers have developed several
variations of the regular hypercube, with varying degrees of success. The RH
interconnection network, which is central to the research presented in this thesis, is
a recent and promising variation of the regular hypercube. Some important existing
hypercube variations are briefly described below.
The cube-connected cycles CCC(n) interconnection network is presented first
since it has a special relation with the RH, which will be explained in
Section 2.2. The CCC(n) is made up of $2^n$ n-node rings. Each entire ring is assigned
an address just as if it were a node of an n-cube (a $2^n$-node hypercube). Furthermore,
each individual ring node has a unique $\log_2 n$-bit address within its n-node ring.
The overall address of each node in the CCC(n) is obtained by concatenating
the ring address with the node address within the ring. Nodes are connected
as follows. Each node connects to the node that has the same lower part of the
address and whose upper part of the address differs in the bit whose subscript is
equal to the decimal value of the lower part. In other words, each node implements
a connection in the direction it represents: the node with address i in a ring
implements a connection in the ith dimension of the n-cube. In addition, every node
is linked to its two neighbors in its own ring. The CCC(n) has the important property
that its connectivity factor, i.e., the number of edges per node, is equal to 3,
independently of the value of n. The CCC(4) is shown in Figure 2.1.
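The connection rule above can be made concrete with a short Python sketch. It represents a CCC(n) node as a pair (r, i), where r is the n-bit ring address and i is the position within the ring, instead of as a concatenated integer; this representation and the names ccc_edges and degrees are illustrative assumptions. The sketch assumes n ≥ 3, since for smaller n the two ring neighbors of a node coincide.

```python
from __future__ import annotations

from itertools import product


def ccc_edges(n: int) -> set[frozenset]:
    """Undirected edge set of CCC(n): 2**n rings with n nodes each.

    A node is a pair (r, i): r is the ring (hypercube) address and
    i in 0..n-1 is the node's position within its ring.
    """
    edges = set()
    for r, i in product(range(1 << n), range(n)):
        # Ring link to the next node of the same ring (covers both ring neighbors).
        edges.add(frozenset({(r, i), (r, (i + 1) % n)}))
        # Cube link: node i of ring r implements dimension i of the n-cube.
        edges.add(frozenset({(r, i), (r ^ (1 << i), i)}))
    return edges


def degrees(edges: set[frozenset]) -> dict[tuple[int, int], int]:
    """Number of edges incident to each node."""
    deg: dict[tuple[int, int], int] = {}
    for e in edges:
        for node in e:
            deg[node] = deg.get(node, 0) + 1
    return deg


edges = ccc_edges(4)
deg = degrees(edges)
print(len(deg), "nodes,", len(edges), "edges, degrees:", set(deg.values()))
# expected: 64 nodes, 96 edges, degrees: {3}
```

Each node contributes one ring link forward and one cube link, so the edge count is $3 \cdot n 2^n / 2$ and every degree is 3, as stated in the text.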
Figure 2.1 Structure of the CCC(4) 
The incomplete hypercube is another important variation of the hypercube.
An incomplete hypercube is constructed by connecting two complete hypercubes of
different sizes. Figure 2.2 shows the structure of an incomplete hypercube composed
of two maximal-sized complete hypercubes with 2 and 3 dimensions, respectively.
Although the incomplete hypercube attempts to solve the incremental growth problem
of the conventional hypercube, it has the major disadvantage that a considerable
number of communication ports remain totally unused. For example, consider an
incomplete hypercube with 2,560 nodes, obtained by connecting a 2,048-node
hypercube and a 512-node hypercube. Had these been two distinct hypercubes, they
would have had 11 and 9 communication ports per node, respectively. Since they have
to be connected together to form the incomplete hypercube, they now have an
additional port per node, bringing the number of ports per node to 12 and 10,
respectively. Of the 2,048 nodes of the first hypercube, only 512 will be connected to
the second hypercube. Therefore, the remaining nodes of the first hypercube
(2,048 - 512 = 1,536) will each have an unused port, for a total of 1,536 unused
communication ports. It is assumed that the same kind of processor is used for all
nodes in each hypercube; this is primarily dictated by practical design needs. In
general, if an $n_0$-cube and an $n_1$-cube ($n_0 > n_1$) are connected to form an
incomplete hypercube, the total number of unused communication ports is equal to
$2^{n_0} - 2^{n_1}$. This serves as an indication of the cost associated with unused
resources. Moreover, although the incomplete hypercube structure seems to mitigate
the incremental growth problem of the hypercube, it does have, at least locally, a
higher VLSI complexity. This is undesirable because one of the original goals was to
obtain hypercube variations with lower overall VLSI complexity, in order to make the
construction of larger systems feasible.
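The port counts in the 2,560-node example follow a simple formula. The Python sketch below only encodes the counting argument given in the text (one extra port on every node of both cubes, with only $2^{n_1}$ nodes of the larger cube having a partner in the smaller one); the function name and the returned keys are hypothetical.

```python
from __future__ import annotations


def incomplete_hypercube_ports(n0: int, n1: int) -> dict[str, int]:
    """Port counts for an incomplete hypercube built from an n0-cube and an n1-cube.

    Follows the counting argument in the text: every node of both cubes gets one
    extra port for the inter-cube links, but only 2**n1 nodes of the larger cube
    are linked to the smaller cube.
    """
    if not n0 > n1 >= 0:
        raise ValueError("expected n0 > n1 >= 0")
    return {
        "ports_per_node_large": n0 + 1,        # n0 cube links + 1 extra port
        "ports_per_node_small": n1 + 1,        # n1 cube links + 1 extra port
        "unused_ports": 2 ** n0 - 2 ** n1,     # extra ports left without a partner
    }


print(incomplete_hypercube_ports(11, 9))
```

For the 2,048-node and 512-node cubes this reproduces the 12 and 10 ports per node and the 1,536 unused ports quoted in the text.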
Figure 2.2 A 2-cube and a 3-cube connected to form an incomplete hypercube 
2.2 Reduced Hypercube Interconnection Network  
Although [2] introduced reduced hypercubes (RH's) as hierarchical structures with
several levels, only the properties of two-level structures were studied extensively.
This thesis also focuses on RH's with only two levels. RH's are obtained by uniformly
removing, at design time, several edges from each node of a hypercube with the same
number of nodes.
A reduced hypercube RH(k,n) contains a total of N nodes, where $N = 2^{k+2^n}$,
$k \ge n$ and $n > 0$. For simplicity, the exponent $k + 2^n$ will be denoted by v, as it
will be used repeatedly. Each node of the RH(k,n) is attached to k + 1 bidirectional
channels. In a regular hypercube with the same number $N = 2^v$ of nodes, each node
is attached to v bidirectional links. Therefore, each node in the N-node RH has
$v - (k + 1) = 2^n - 1$ fewer links than each node in the N-node regular hypercube.
The N-node RH(k,n) is constructed from the N-node regular hypercube by
uniformly removing $2^n - 1$ links from each of its nodes. To accomplish this, the
v-bit addresses of hypercube nodes are first partitioned into two fields, the 0th and
1st fields, as follows. The 0th field contains the k least significant bits of the v-bit
node address. This field represents the address of the node within a complete k-cube,
which will be referred to as a building block (BB). The 1st field contains the $2^n$ most
significant bits of the v-bit node address; it represents the address of the BB that
contains the node. In addition, a subfield, the 0th subfield, is identified in the 0th field.
It contains the n most significant bits of the k-bit 0th field and represents the address
of a (k - n)-dimensional subcube, which will be referred to as a subblock (SB), within
the k-cube BB that contains the node.
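To conclude, the address of each node in the RH(k,n) is formed as shown in
the following diagram, where the symbol "●" denotes concatenation; it will be
used as such throughout this document.
node address = BB address (1st field, $2^n$ bits) ● SB address (0th subfield, n bits) ● PE address within the SB (k - n bits)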
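The field layout can be illustrated with a small Python helper that splits an integer address into its three parts. This is a sketch under the conventions just described (bit 0 is the least significant bit); the names rh_fields and rh_format are hypothetical, and the dotted output follows the BB.SB.node notation used in Example 2.1.

```python
from __future__ import annotations


def rh_fields(addr: int, k: int, n: int) -> tuple[int, int, int]:
    """Split a v-bit RH(k, n) address (v = k + 2**n) into
    (BB address, SB address, PE address within the SB)."""
    if not 0 < n <= k:
        raise ValueError("RH(k, n) requires k >= n > 0")
    v = k + (1 << n)
    if not 0 <= addr < (1 << v):
        raise ValueError("address does not fit in v bits")
    bb = addr >> k                              # 1st field: 2**n most significant bits
    field0 = addr & ((1 << k) - 1)              # 0th field: k least significant bits
    sb = field0 >> (k - n)                      # 0th subfield: top n bits of the 0th field
    within_sb = field0 & ((1 << (k - n)) - 1)   # remaining k - n bits
    return bb, sb, within_sb


def rh_format(addr: int, k: int, n: int) -> str:
    """Render an address as BB.SB.node with zero-padded binary fields."""
    bb, sb, node = rh_fields(addr, k, n)
    parts = [f"{bb:0{1 << n}b}", f"{sb:0{n}b}"]
    if k > n:                                   # k = n leaves no bits for the PE field
        parts.append(f"{node:0{k - n}b}")
    return ".".join(parts)


print(rh_format(0b00111, 3, 1))   # a node of RH(3,1): prints 00.1.11
```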
To reduce the v-cube to the RH(k,n), the following two sets of links are kept out of
the v bidirectional links of each hypercube node, leaving k + 1 links per node.
Set 1:  The k links of the v-cube that traverse the k lowest dimensions (i.e., 
dimensions 0 through k — 1) and connect the referenced node with k distinct nodes 
are kept. As a result, a complete k-dimensional building block (BB) that includes 
the referenced node is maintained. 
Set 2: This set contains only one link, which is also present in the original
v-cube: the link that directly connects the referenced node with the node whose
address differs only in the mth bit of the 1st field, where m is the decimal value of
the 0th subfield and $0 \le m \le 2^n - 1$.
The resultant RH(k,n) contains $2^{2^n}$ k-cube BB's. A BB address forms the
$2^n$ most significant bits (i.e., the 1st field) of the v-bit addresses of the nodes it
contains. Each BB is divided into $2^n$ subblocks (SB's). Connections between pairs
of SB's in different BB's are as follows: a node in a particular SB of a particular BB is
connected to the node with the same 0th field address that belongs to the BB whose
$2^n$-bit address differs only in the mth bit, where m is the value in the 0th subfield of
the former node.
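Sets 1 and 2 translate directly into a neighbor function. The Python sketch below illustrates the construction under the addressing conventions of this section (the names rh_neighbors and is_valid_rh are hypothetical). It also checks that every node has k + 1 distinct neighbors and that every link is bidirectional; the latter holds for Set 2 because that link leaves the 0th field, and hence m, unchanged.

```python
from __future__ import annotations


def rh_neighbors(addr: int, k: int, n: int) -> list[int]:
    """The k + 1 neighbors of a node in RH(k, n) (addresses have v = k + 2**n bits)."""
    # Set 1: the k links in dimensions 0..k-1, i.e. the node's k-cube building block.
    nbrs = [addr ^ (1 << d) for d in range(k)]
    # Set 2: one link in dimension k + m, where m is the value of the 0th subfield
    # (the n most significant bits of the k-bit 0th field).
    m = (addr >> (k - n)) & ((1 << n) - 1)
    nbrs.append(addr ^ (1 << (k + m)))
    return nbrs


def is_valid_rh(k: int, n: int) -> bool:
    """True if every node has k + 1 distinct neighbors and every link is bidirectional."""
    v = k + (1 << n)
    for a in range(1 << v):
        nbrs = rh_neighbors(a, k, n)
        if len(set(nbrs)) != k + 1:
            return False
        if any(a not in rh_neighbors(b, k, n) for b in nbrs):
            return False
    return True


print(is_valid_rh(2, 1), is_valid_rh(3, 1))    # RH(2,1) and RH(3,1)
# Node 00.1.11 of RH(3,1): its Set 2 link leads to 10.1.11.
print(format(rh_neighbors(0b00111, 3, 1)[-1], "05b"))
```

The first line prints `True True` and the second prints `10111`, i.e. the node 10.1.11 discussed in Example 2.1.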
Figure 2.3 shows the RH(2,1) after the links in red are removed from the 
4-dimensional hypercube. The RH(3,1) is shown in Figure 2.4. The dotted lines 
represent the inter-building block links. It was shown in [2] that the RH can emulate 
simultaneously, with dilation equal to one, several cube-connected cycles networks 
[8]. 
Figure 2.3 Producing the RH(2, 1) from the 4-cube by removing the red links
Figure 2.4 The RH(3,1)
Example 2.1 Figure 2.4 shows the structure of the RH(3,1). Note that there are
$2^n = 2^1 = 2$ SB's per BB. Each BB is a complete 3-cube, since k = 3. BB
addresses appear above each BB; in this case they consist of $2^n = 2$ bits. SB addresses
appear in the upper left corner inside the SB boxes; for this example, the SB addresses
are 1-bit quantities. To better understand how nodes in the RH are interconnected,
a few nodes are examined. Periods are used to separate, from left to right, the
BB address (1st field), the SB address (0th subfield), and the node address within the
subblock (k - n bits). The node that belongs to BB 00 and SB 1 and has address
11 within the SB is denoted by 00.1.11. Through its Set 2 link, this node is connected
to the node with the same 0th field in BB 10, because the 0th subfield (here 1)
gives the subscript of the BB address bit to be complemented. Therefore, the node
00.1.11 connects to the node 10.1.11. Similarly, 01.1.11 connects to 11.1.11.
A more general case is shown in Figure 2.5 [2]: the structure of the RH(k,2),
where the large squares represent the k-cube building blocks. The numbers above the
squares give the BB addresses in decimal, and the numbers within the quadrants
of the large squares are the SB addresses in decimal. For simplicity, the actual nodes
within the squares are not shown. Each curved line represents $2^{k-2}$ bidirectional
communication channels; this is also the number of PE's in each SB. It is implied
that each PE in an SB is connected to the PE with the same 0th field address in the
SB where the curved line leads.
2.2.1 Emulating Hypercubes on Reduced Hypercubes  
Since an RH is equivalent to a regular hypercube with fewer links, as explained
previously, its performance may degrade to some extent. The largest degradation
appears for algorithms designed explicitly for the regular hypercube. Quantifying
this degradation matters not only for theoretical analysis but also for a practical
reason: a plethora of important algorithms have already been developed for the
regular hypercube, so how well the RH emulates the regular hypercube is important.
This was investigated in [2], and the most important findings are presented below.
Figure 2.5 Structure of the RH(k, 2)
To evaluate performance, the dilation of edges associated with the chosen hypercube
mapping must be found. The dilation measures the increase in communication
overhead compared with one-hop data transfers in the regular hypercube. Let the
regular v-dimensional hypercube and the target RH(k,n) contain the same number
of nodes, that is, $2^v$, where $v = k + 2^n$. Assume that each node of the regular
hypercube is mapped to the node of the RH with the same address. The following
theorem [2] gives the resultant dilations of edges.
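Theorem 2.1 For the emulation of the $(k + 2^n)$-dimensional hypercube on the
reduced hypercube RH(k,n) with the same number of nodes, the dilations of the
edges incident to a single node of the hypercube are 1 for k + 1 of them and 2p + 1 for $\binom{n}{p}$ of them, where p = 1, 2, ..., n and $\binom{n}{p}$ represents the number of distinct p-combinations of n items.
The value 2p + 1 can be understood as follows. Let m be the value of the node's 0th subfield. A hypercube edge in dimension k + j, with j ≠ m, is emulated by moving within the BB until the 0th subfield equals j (p hops, where p is the number of bits in which j and m differ), crossing the Set 2 link, and then restoring the 0th subfield to m (p more hops). Exactly $\binom{n}{p}$ values of j differ from m in p bits.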
 
Example 2.2 The dilations of the edges incident to a single node of the RH(5,2)
for the emulation of the 9-dimensional hypercube are 1, 3, and 5 for 6, 2, and 1 edges,
respectively. Similarly, the dilations of the edges incident to a single node for the
emulation of the 16-dimensional hypercube on the RH(8,3) are 1, 3, 5, and 7 for 9,
3, 3, and 1 edges, respectively.
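Theorem 2.1 can be checked empirically on small instances. The Python sketch below is an illustration, not the proof in [2]: it builds the RH(k,n) from the Set 1 and Set 2 rules, computes breadth-first-search distances from a chosen node, and records the RH distance spanned by each of the v hypercube edges at that node under the same-address mapping. All function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, deque


def rh_neighbors(addr: int, k: int, n: int) -> list[int]:
    m = (addr >> (k - n)) & ((1 << n) - 1)              # 0th-subfield value
    return [addr ^ (1 << d) for d in range(k)] + [addr ^ (1 << (k + m))]  # Set 1 + Set 2


def rh_distances(src: int, k: int, n: int) -> list[int]:
    """BFS hop distances from src to every node of RH(k, n)."""
    dist = [-1] * (1 << (k + (1 << n)))
    dist[src] = 0
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in rh_neighbors(x, k, n):
            if dist[y] < 0:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def dilation_profile(src: int, k: int, n: int) -> Counter:
    """Map each dilation value to the number of hypercube edges at src that have it."""
    v = k + (1 << n)
    dist = rh_distances(src, k, n)
    # The hypercube edge in dimension d joins src and src ^ 2**d; its dilation
    # is the RH distance between the two same-address nodes.
    return Counter(dist[src ^ (1 << d)] for d in range(v))


print(sorted(dilation_profile(0, 5, 2).items()))   # RH(5,2), 9-cube
print(sorted(dilation_profile(0, 8, 3).items()))   # RH(8,3), 16-cube
```

For source node 0 the two lines print `[(1, 6), (3, 2), (5, 1)]` and `[(1, 9), (3, 3), (5, 3), (7, 1)]`, which agree with Example 2.2.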
There are two other important metrics: the maximum and average dilations of 
edges for hypercube emulation on the RH. The following two corollaries provide the 
means for their calculation [2]. 
Corollary 2.1 The maximum dilation of edges for hypercube emulation on the 
RH(k,n) is equal to 2n + 1. 
Corollary 2.2 The average dilation of edges for hypercube emulation on the
RH(k,n) is equal to
$$\frac{(k+1) + \sum_{p=1}^{n} (2p+1)\binom{n}{p}}{k + 2^n} = \frac{k + (n+1)\,2^n}{k + 2^n}.$$
The average dilation of edges for the last two examples is approximately 1.89
(exactly 17/9) and 2.5, respectively. Figure 2.6 [2] shows the average dilation of
edges for hypercube emulation as a function of k and n, with 1 ≤ n ≤ 4, 2 ≤ k ≤ 10,
and $k + 2^n \le 26$; therefore, the emulation of hypercubes with up to 26 dimensions
is considered in Figure 2.6. The numbers next to graph points represent the numbers
of dimensions in the emulated hypercubes. The curves for n = 1, 2, and 3 represent
realistic cases with respect to the actual implementation of the respective target
systems with current technology, due to the relatively low complexity of these RH's.
Figure 2.6 shows that the average dilation of edges for hypercube emulation is
relatively small, which indicates only a small performance degradation for the
implementation of hypercube algorithms on RH's. Moreover, the effect of dilation
decreases significantly across four well-known switching techniques, taken in the
order store-and-forward, virtual cut-through, circuit switching, and wormhole routing.
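The maximum and average dilations can be computed directly from the per-node counts of Theorem 2.1. The Python sketch below (function names hypothetical) does this with exact fractions and compares the weighted mean against the closed form $(k + (n+1)2^n)/(k + 2^n)$, which follows from $\sum_{p=1}^{n}\binom{n}{p} = 2^n - 1$ and $\sum_{p=1}^{n} p\binom{n}{p} = n\,2^{n-1}$.

```python
from fractions import Fraction
from math import comb


def theorem_profile(k: int, n: int) -> list[tuple[int, int]]:
    """(dilation, number of edges) pairs per node, as stated in Theorem 2.1."""
    return [(1, k + 1)] + [(2 * p + 1, comb(n, p)) for p in range(1, n + 1)]


def average_dilation(k: int, n: int) -> Fraction:
    """Weighted mean dilation over the v = k + 2**n hypercube edges of a node."""
    profile = theorem_profile(k, n)
    v = k + (1 << n)
    return Fraction(sum(d * c for d, c in profile), v)


def average_dilation_closed_form(k: int, n: int) -> Fraction:
    """Closed form obtained by summing the series: (k + (n + 1) * 2**n) / (k + 2**n)."""
    return Fraction(k + (n + 1) * (1 << n), k + (1 << n))


for k, n in [(5, 2), (8, 3)]:
    avg = average_dilation(k, n)
    max_dil = max(d for d, _ in theorem_profile(k, n))   # equals 2n + 1 (Corollary 2.1)
    print(f"RH({k},{n}): max dilation {max_dil}, average {avg} (about {float(avg):.2f})")
```

For the two examples this prints an average of 17/9 (about 1.89) for the RH(5,2) and 5/2 for the RH(8,3), with maximum dilations 5 and 7.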
Figure 2.6 The average dilation of edges for hypercube emulation on the RH(k,n), as a function of k and n for $k + 2^n \le 26$
CHAPTER 3 
ROUTING ALGORITHMS FOR REDUCED HYPERCUBES 
Two distributed routing algorithms for RH's that were proposed in [2] are discussed
in this chapter, followed by a comparative analysis based on simulation results.
Both algorithms consist of three steps, and their first two steps are identical. The
E-cube routing algorithm for regular hypercubes uses dimension ordering: messages
traverse the dimensions in which the source and destination addresses differ in a
predetermined increasing or decreasing order. This ordering is required to avoid
deadlocks during the transmission of multiple messages. The increasing dimension
order is used throughout this thesis. The description of the routing algorithms below
ignores potential deadlock problems in the transmission of multiple messages;
virtual channels can be introduced to make both routing algorithms deadlock-free [2].
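E-cube routing with increasing dimension order, which both RH algorithms reuse, can be sketched in a few lines of Python. The function name ecube_route is hypothetical; the sketch returns the sequence of visited nodes so that each hop can be inspected.

```python
from __future__ import annotations


def ecube_route(src: int, dst: int) -> list[int]:
    """E-cube routing in a regular hypercube.

    Corrects the bits in which src and dst differ in increasing dimension
    order (dimension 0 first) and returns the visited nodes, src included.
    """
    path = [src]
    cur = src
    diff = src ^ dst            # set bits = dimensions that must be traversed
    d = 0
    while diff >> d:            # stop once no higher differing bit remains
        if (diff >> d) & 1:
            cur ^= 1 << d       # traverse the link in dimension d
            path.append(cur)
        d += 1
    return path


print([format(x, "04b") for x in ecube_route(0b0110, 0b1011)])
```

Here the addresses differ in dimensions 0, 2, and 3, so the message visits 0110, 0111, 0011, and 1011 in that order; the number of hops always equals the number of differing bits.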
3.1 Routing Algorithm I 
The first routing algorithm for the RH(k,n) is presented in detail for clarity.
Its execution steps are as follows [2].
Step 1. If the source and destination addresses differ in any of the k - n least
significant bits, apply the E-cube routing algorithm for regular hypercubes to these
bits, using dimension ordering that starts with dimension 0 and follows the
increasing order. This step implements E-cube routing within the source SB.
Step 2. If the destination has been reached, then stop. Otherwise, let λ be the subscript
of the most significant bit position in which the current and destination addresses
differ (it can be found with an XOR operation). If λ < k, then apply the E-cube
routing algorithm to the 0th subfield of the current address and stop when the
destination PE is reached. Otherwise, let m be the value represented by the 0th
subfield of the current address. If the current and destination addresses differ
in the mth bit of their 1st field, then transmit the message to the PE whose
address differs from the current address only in the bit with subscript k + m (a
direct connection exists for this transfer) and go back
to Step 2. Otherwise, if the two addresses do not differ in the mth bit of their
1st field, then go to Step 3.
Step 3. Among the bits of the 1st field in which the current and destination addresses
differ, find the one whose subscript within this field differs in the smallest number of
bits from the n-bit value in the 0th subfield of the current address. If several
subscripts attain this smallest number, choose one of them at random. To find the
neighbor that then receives the message, carry out E-cube routing within the k-cube
building block toward the nearest PE that can correct the aforementioned bit in the
1st field. Go to Step 2.
Example 3.1 Assume the RH(7,3), with source address 00010010.010.0101 and destination
address 01110110.101.1101. The dots separate, from left to right, the $2^3 = 8$-bit 1st
field, the 3-bit 0th subfield, and the k - n = 4 least significant bits.
The underlined bit was changed in the corresponding step. 
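To make the three steps concrete, here is a minimal Python sketch of Routing Algorithm I. It assumes the address layout read off Example 3.1: a node address is an integer whose k - n least significant bits select the node inside its SB, whose next n bits form the 0th subfield (the SB address), and whose top $2^n$ bits form the 1st field, so that the 0th field is the low k bits. Links follow the rule stated in Section 4.1. The function names are hypothetical, and the random tie-break of Step 3 is replaced by choosing the smallest offset, so the sketch is deterministic.

```python
def rh_linked(a, b, k, n):
    """True if nodes a and b of RH(k, n) are joined by a link (assumed address layout)."""
    x = a ^ b
    if x == 0 or x & (x - 1):                 # must differ in exactly one bit
        return False
    if x < (1 << k):                          # any single bit of the 0th field: BB is a k-cube
        return True
    sb = (a >> (k - n)) & ((1 << n) - 1)      # 0th subfield = SB address
    return x == 1 << (k + sb)                 # only the 1st-field bit with offset sb


def route_alg1(src, dst, k, n):
    """Sketch of Routing Algorithm I; Step 3 ties go to the smallest offset instead of random."""
    low = k - n                               # bits that address a node inside its SB
    path, cur = [src], src
    for d in range(low):                      # Step 1: E-cube inside the source SB
        if (cur ^ dst) >> d & 1:
            cur ^= 1 << d
            path.append(cur)
    while cur != dst:                         # Step 2
        diff = cur ^ dst
        m = (cur >> low) & ((1 << n) - 1)     # value of the 0th subfield
        if diff.bit_length() - 1 < k:         # lambda < k: finish with E-cube on the 0th subfield
            for d in range(low, k):
                if (cur ^ dst) >> d & 1:
                    cur ^= 1 << d
                    path.append(cur)
            break
        if diff >> (k + m) & 1:               # SB m can correct bit m of the 1st field
            cur ^= 1 << (k + m)
            path.append(cur)
            continue
        # Step 3: nearest SB whose address is the offset of a differing 1st-field bit
        offsets = [j for j in range(1 << n) if diff >> (k + j) & 1]
        j = min(offsets, key=lambda o: (bin(o ^ m).count("1"), o))
        step = (j ^ m) & -(j ^ m)             # lowest differing subfield bit: one E-cube hop
        cur ^= step << low
        path.append(cur)
    return path


k, n = 7, 3
src = int("00010010" "010" "0101", 2)         # addresses of Example 3.1
dst = int("01110110" "101" "1101", 2)
route = route_alg1(src, dst, k, n)
for v in route:
    s = format(v, "015b")
    print(s[:8] + "." + s[8:11] + "." + s[11:])
print(len(route) - 1, "hops")
```

For the addresses of Example 3.1 no tie arises in Step 3, and the sketch reaches the destination in 7 hops. This equals the number of bits in which the two addresses differ, so under these assumptions the route is as short as possible.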
3.2 Routing Algorithm II 
This algorithm performs better than the first routing algorithm for source and
destination addresses that differ in a large number of bits in their 1st field. It uses
the binary reflected Gray code.
Definition 3.1 The (cyclic) binary reflected Gray code of n bits, RGC(n), is defined
recursively by $RGC(n) = \{0 \bullet RGC(n-1),\ 1 \bullet RGC^{-1}(n-1)\}$, where "•" denotes
concatenation, $RGC^{-1}(n-1)$ denotes the sequence derived by reversing the order of
the elements in the sequence $RGC(n-1)$, and $RGC(1) = \{0, 1\}$.
Example 3.2 RGC(2) = {00, 01, 11, 10} and RGC(3) = {000, 001, 011, 010, 110,
111, 101, 100}.
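Definition 3.1 translates directly into a short recursive function. The minimal Python sketch below (function names are ours) builds RGC(n) as bit strings and checks the single-bit-change property, including the wraparound from the last code back to the first.

```python
def rgc(n):
    """RGC(n) as a list of bit strings, built exactly as in Definition 3.1."""
    if n == 1:
        return ["0", "1"]
    prev = rgc(n - 1)
    # 0 . RGC(n-1) followed by 1 . RGC^-1(n-1)
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]


def one_bit_apart(a, b):
    return sum(x != y for x, y in zip(a, b)) == 1


codes = rgc(3)
print(codes)   # ['000', '001', '011', '010', '110', '111', '101', '100']
# cyclic property: neighbours, including the last and first codes, differ in one bit
print(all(one_bit_apart(codes[i], codes[(i + 1) % len(codes)]) for i in range(len(codes))))   # True
```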
Any Gray code has the property that any two neighboring codes in the sequence
of all $2^n$ possible n-bit numbers differ in a single bit. The cyclic version of the RGC(n),
in which the last and first codes are also neighbors (they too differ in a single bit),
is used throughout this thesis. The first two steps of the new algorithm are identical
to those of Routing Algorithm I. Its third step follows.
Step 3. Create a sequence of n-bit numbers that starts with the value in the 0th
subfield of the current node, contains the offsets of the bits that differ in the 1st
field of the current and destination addresses, and ends with the value in the
0th subfield of the destination node. Order the offsets as follows.
First, create two candidate sequences that contain all of the offsets in the order
in which they appear in the RGC(n), for forward and backward traversal,
respectively. Of the two sequences, choose the one with the smaller
total number of 1 bits in the XOR results of consecutive pairs of values,
including the current and destination subfields. This choice results in fewer
changes of the 0th subfield and thus reduces the total number of
intermediate nodes in the path. Then apply the E-cube routing algorithm to
the 0th subfield of the current address in order to move closer to the node whose
0th subfield equals the next offset in the chosen sequence. Go to Step 2.
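A minimal Python sketch of this third step follows. It assumes that "forward traversal" means scanning RGC(n) from its first code 00...0 and "backward" the reverse, and it breaks cost ties in favour of the forward order; both are our assumptions, as are the function names. Subfields and offsets are plain integers, and `i ^ (i >> 1)` is the standard closed form of the ith reflected Gray code.

```python
def rgc_values(n):
    """RGC(n) as integers; i ^ (i >> 1) is the standard closed form of the i-th code."""
    return [i ^ (i >> 1) for i in range(1 << n)]


def ones(x):
    return bin(x).count("1")


def order_offsets(cur_sub, dst_sub, offsets, n):
    """Step 3 of Routing Algorithm II (sketch): forward or backward RGC(n) order of the offsets."""
    fwd = [c for c in rgc_values(n) if c in offsets]
    bwd = fwd[::-1]

    def cost(seq):
        # total number of 1 bits in the XORs of consecutive values, endpoints included
        full = [cur_sub] + seq + [dst_sub]
        return sum(ones(a ^ b) for a, b in zip(full, full[1:]))

    return fwd if cost(fwd) <= cost(bwd) else bwd   # tie -> forward (assumption)


def next_subfield(cur_sub, target):
    """One E-cube hop inside the BB: flip the lowest bit in which the subfields differ."""
    x = cur_sub ^ target
    return cur_sub ^ (x & -x)


seq = order_offsets(0b010, 0b101, {0b101, 0b110}, 3)
print(seq, next_subfield(0b010, seq[0]))   # [6, 5] 6
```

With current subfield 010, destination subfield 101 and differing 1st-field offsets 101 and 110, the forward order {110, 101} costs 3 bit changes against 7 for the backward order, so the message first moves toward SB 110.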
3.3 Simulation Results and Comparison 
Besides the diameter, one very important metric for any interconnection network is
its average internode distance. It should be made clear that the average distance in
the RH(k,n) depends on the routing algorithm actually used. The average
distance of the n-dimensional hypercube is n/2. Table 3.1 shows the average
distances in the RH(n,n), with n = 1, 2, 3 and 4, for the first routing algorithm
and a variation of the second algorithm. More specifically, strict dimension ordering
with the reflected Gray code is assumed for the second algorithm (the forward and
backward traversals of the Gray code produce the same result when all possible
source-destination pairs are considered [2]). The simulation for the case n = 4 is very
time consuming when all possible source-destination pairs are considered. Therefore,
for this case our simulation considers all possible destinations for the source node with
address 0 only, and the table contains the average for the forward (FWD) and backward
(BWD) traversal of the Gray code. The fourth column is obtained if, for each
source-destination pair, the shorter of the distances produced by the two algorithms is
chosen. The last column contains the average distance when, for each source-destination
pair, the minimum of the distances obtained with forward and backward traversal of the
reflected Gray code is chosen. It comes as no surprise that the fourth column holds the
shortest distances. It was observed that Algorithm II performs better than Algorithm I
for pairs of source and destination addresses that differ in a large number of bits in
their 1st field.
Table 3.1 Average distances between nodes in the RH(n,n) according to Algorithms I and II
n   Alg. I      Alg. II (FWD or BWD)   Best of I & II   Best of Alg. II
1   2.000000    2.000000               2.000000         2.000000
2   4.746094    4.875000               4.621094         4.625000
3   9.851364    10.492188              9.559372         9.929688
4   19.387329   21.196258              19.148174        20.548462
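Because the values in Table 3.1 depend on the routing algorithm, a useful reference point is the average of true shortest-path distances, which no routing algorithm can beat. The minimal Python sketch below computes it by breadth-first search from a single source node, averaging over all destinations including the source itself (the convention under which the n-cube average is n/2). It assumes the address layout of Example 3.1 (the k - n least significant bits inside the SB, the next n bits for the SB address, the top $2^n$ bits for the 1st field) and the link rule of Section 4.1; it is not the simulation procedure used for Table 3.1, and the names are ours.

```python
from collections import deque


def rh_neighbors(v, k, n):
    """The k + 1 neighbours of node v in RH(k, n) (assumed address layout, see above)."""
    for d in range(k):                        # links inside the building block (a k-cube)
        yield v ^ (1 << d)
    sb = (v >> (k - n)) & ((1 << n) - 1)      # SB address = value of the 0th subfield
    yield v ^ (1 << (k + sb))                 # the single inter-BB link of v


def avg_shortest_distance(k, n, source=0):
    """Mean BFS distance from `source` to every node of RH(k, n), source itself included."""
    total = 1 << (k + (1 << n))
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in rh_neighbors(v, k, n):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    if len(dist) != total:
        raise ValueError("RH(k, n) should be connected")
    return sum(dist.values()) / total


for n in (1, 2, 3):
    print(n, avg_shortest_distance(n, n))
```

For RH(1,1) the shortest-path average from node 0 is 2.0, the value in the first row of Table 3.1.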
CHAPTER 4 
MAPPING ALGORITHMS FOR REDUCED HYPERCUBES 
The most desirable algorithm for the mapping of a source topology onto a target
topology is one which yields an optimal mapping. The following definition is
pertinent.
Definition 4.1 A mapping is optimal if the source topology is a proper subgraph of 
the target topology. In this case there is a one-to-one mapping of source nodes to 
target nodes and the resultant dilations of all source edges are 1. The dilation of a 
source edge is defined as the length of the shortest path that connects the images of 
its two incident nodes in the target topology. 
In this chapter, algorithms are presented for the mapping of three very 
important structures onto the RH topology. More specifically, mapping algorithms 
are proposed for the ring (or linear array), the µ-D torus (or mesh), and the binary 
tree structures. Each mapping is evaluated in terms of the dilation of edges and the 
utilization of resources. They are also compared to existing mappings of the same 
structures onto the regular hypercube. 
Before the formal introduction of the mapping algorithms, a definition 
pertaining to them is in order. 
Definition 4.2 The cyclic binary U Gray code of n bits, UGC(n), is defined recursively
by $UGC(n) = \{RGC(n-1) \bullet 0,\ RGC^{-1}(n-1) \bullet 1\}$, and $UGC(1) = RGC(1) =
\{0, 1\}$.
Example 4.1 UGC(2) = {00, 10, 11, 01} and UGC(3) = {000, 010, 110, 100, 101,
111, 011, 001}.
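Definition 4.2 differs from Definition 3.1 only in where the reflection bit is placed. The minimal Python sketch below (names are ours) builds both codes side by side and reproduces Example 4.1.

```python
def rgc(n):
    """RGC(n) as bit strings (Definition 3.1)."""
    if n == 1:
        return ["0", "1"]
    prev = rgc(n - 1)
    return ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]


def ugc(n):
    """UGC(n) as bit strings (Definition 4.2): the reflection bit is appended, not prepended."""
    if n == 1:
        return ["0", "1"]
    prev = rgc(n - 1)
    return [c + "0" for c in prev] + [c + "1" for c in reversed(prev)]


print(ugc(2))   # ['00', '10', '11', '01']
print(ugc(3))   # ['000', '010', '110', '100', '101', '111', '011', '001']
```

Comparing the two recursions shows that each UGC(n) code is the corresponding RGC(n) code with its leading bit moved to the end, so UGC(n) inherits the cyclic single-bit-change property of the RGC(n).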
4.1 Mapping of a Ring onto an RH 
In the case of the regular hypercube, the mapping of the ring onto it is optimal and
the algorithm is straightforward. It is based on the property of the RGC that any
two successive codes in the RGC differ in a single bit. Since nodes in the hypercube
whose addresses differ in a single bit are directly connected, a ring may be obtained
by assigning the codes of the cyclic RGC(n) sequence, in order, as the hypercube addresses of successive ring nodes; because the last and first codes also differ in a single bit, the ring closes.
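A minimal Python sketch of this hypercube mapping follows. It uses the standard closed form i XOR (i >> 1) for the ith RGC code, which generates the same sequence as the recursive Definition 3.1; the helper names are ours.

```python
def gray(i):
    """i-th code of RGC(n) in closed form (binary reflected Gray code)."""
    return i ^ (i >> 1)


def ring_on_hypercube(n):
    """Hypercube address assigned to ring node i, for i = 0 .. 2**n - 1."""
    return [gray(i) for i in range(1 << n)]


def dilation_one(addrs):
    """True if cyclically consecutive addresses differ in exactly one bit."""
    size = len(addrs)
    return all(bin(addrs[i] ^ addrs[(i + 1) % size]).count("1") == 1 for i in range(size))


ring = ring_on_hypercube(3)
print(ring, dilation_one(ring))   # [0, 1, 3, 2, 6, 7, 5, 4] True
```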
 
The RH has considerably fewer links per node compared to the regular 
hypercube. As explained in Section 2.2, nodes in the RH whose addresses differ in 
a single bit are not necessarily linked together as they are in the regular hypercube. 
In fact, two nodes in the RH(k,n) are linked together if their addresses
differ in any single bit of the 0th field, or if they differ only in that bit of the 1st field whose offset
in this field is equal to the decimal value in their 0th subfield. Therefore, the RGC
alone cannot be used for optimal mapping. The definitions of building block (BB)
and subblock (SB) presented earlier in Section 2.2 are essential for understanding
the proposed mapping algorithm.
For the N-node RH(k,n), the algorithm generates a sequence containing
all N addresses, ensuring that successive addresses represent nodes of the
RH which are directly linked. The sequence of addresses is generated by
traversing the RH(k,n) BB by BB according to the RGC($2^n$) (BB addresses are the
$2^n$-bit values of the 1st field). All nodes in a BB are traversed consecutively; therefore,
each BB is visited only once. The first node
with which the traversal of a particular BB starts will be referred to as the Node of
Entry (NOE) into that BB. Furthermore, the SB that contains this NOE will be
referred to as the Subblock of Entry (SOE). Similarly, the last node traversed in a
BB and its corresponding subblock will be referred to as the Node of Exit (NOX)
and Subblock of Exit (SOX), respectively. The following theorem is the basis of the
proposed algorithm.
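Theorem 4.1 If the BB's in the RH(k,n) are traversed according to the RGC($2^n$),
each BB is entered from SB 0 and exited from SB δ, or vice versa, where the SB
address δ takes values from 1 to $2^n - 1$. This will be denoted by $0 \leftrightarrow \delta$.
Proof. Every other code in the cyclic RGC($2^n$) differs from its succeeding code only in
the least significant bit (bit 0), while the remaining transitions change some bit δ ≥ 1.
A BB is exited through the SB whose address equals the offset of the changed bit and,
because only SB's with the same address are connected together in RH's, the next BB
is entered through the SB with that same address. Therefore, the SOX for half of the
BB's is SB 0, SB 0 is the SOE for the other half of the BB's, and the remaining SOE or
SOX of each BB is an SB δ with $1 \le \delta \le 2^n - 1$. ●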
Given the SOE and SOX of each BB in the RH, the order in which the nodes of a
particular BB are traversed is determined by the algorithm below for the case $0 \to \delta$,
i.e. SOE = 0 and SOX = δ. Let the ordered sequence $T_{0\to\delta} = \{\tau_0, \tau_1, \ldots, \tau_{2^k-1}\}$
contain the node addresses of the BB in the order in which they are traversed, so
that the NOE belongs to SB 0 and the NOX belongs to SB δ.
The following algorithm is valid for k > n, whereas the case k = n is inves-
tigated at the end of this section.  
Table 4.1 The SOE's and SOX's for n = 2
BB address   SOE   SOX   Form
0000         11    00    3→0
0001         00    01    0→1
0011         01    00    1→0
0010         00    10    0→2
0110         10    00    2→0
0111         00    01    0→1
0101         01    00    1→0
0100         00    11    0→3
1100         11    00    3→0
1101         00    01    0→1
1111         01    00    1→0
1110         00    10    0→2
1010         10    00    2→0
1011         00    01    0→1
1001         01    00    1→0
1000         00    11    0→3
For the reverse case δ → 0, nodes are traversed in exactly the opposite
order to that of the 0 → δ case. The next step determines the SOE-SOX pair
for each BB in the RH. Given a BB with address α, its SOE has an address equal
to the offset of the bit in which α differs from its immediately preceding code in the
cyclic RGC($2^n$). Its SOX has an address equal to the offset of the bit in which α differs
from its immediately succeeding code in the cyclic RGC($2^n$). Table 4.1 illustrates
the SOE's and SOX's for n = 2.
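The rule above can be checked mechanically. This minimal Python sketch (function names are ours) walks the cyclic RGC($2^n$), reads off the SOE and SOX of every BB from the bit changed on entry and on exit, and, for n = 2, prints the rows of Table 4.1 in order. The value of k plays no role, since only the 1st field is involved.

```python
def rgc_values(bits):
    """RGC(bits) as integers (closed form of the binary reflected Gray code)."""
    return [i ^ (i >> 1) for i in range(1 << bits)]


def soe_sox_table(n):
    """(BB address, SOE, SOX) for every BB of RH(k, n), visited in cyclic RGC(2**n) order."""
    order = rgc_values(1 << n)                      # BB addresses have 2**n bits
    size = len(order)
    rows = []
    for i, alpha in enumerate(order):
        before, after = order[i - 1], order[(i + 1) % size]   # cyclic neighbours
        soe = (alpha ^ before).bit_length() - 1     # offset of the bit changed on entry
        sox = (alpha ^ after).bit_length() - 1      # offset of the bit changed on exit
        rows.append((alpha, soe, sox))
    return rows


for alpha, soe, sox in soe_sox_table(2):
    print(format(alpha, "04b"), format(soe, "02b"), format(sox, "02b"), f"{soe}->{sox}")
```

In every row exactly one of the SOE and the SOX is SB 0, which is the statement of Theorem 4.1.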
Formally, assuming that the $2^{2^n}$ BB's of the RH(k,n) are visited using the
RGC($2^n$), their SOX's are given by the exit sequence $ES(p) = \{ES'(p), p\}$, where
$ES'(j)$ is found recursively by $ES'(j) = \{ES'(j-1), j, ES'(j-1)\}$ for $1 \le j \le p$,
with $ES'(0) = \{0\}$ and $p = 2^n - 1$.
Example 4.2 For the RH(k,1), we have $p = 2^1 - 1 = 1$,
$ES'(1) = \{ES'(0), 1, ES'(0)\} = \{0, 1, 0\}$, and $ES(1) = \{0, 1, 0, 1\}$.
Similarly, for the RH(k,2), we have $p = 2^2 - 1 = 3$ and $ES'(3) = \{ES'(2), 3, ES'(2)\}$.
But $ES'(2) = \{ES'(1), 2, ES'(1)\} = \{0, 1, 0, 2, 0, 1, 0\}$; therefore
$ES'(3) = \{0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0\}$ and
$ES(3) = \{0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 3\}$.
From the ES(p) sequence, the list of SOE-SOX pairs of the form $0 \leftrightarrow \delta$
can be derived. Let $ES(p) = \{\zeta_0, \zeta_1, \ldots, \zeta_{2^{2^n}-1}\}$. Then the sequence of these pairs
is given by
$SP = \{\zeta_{2^{2^n}-1} \to \zeta_0,\ \zeta_0 \to \zeta_1,\ \ldots,\ \zeta_{2^{2^n}-3} \to \zeta_{2^{2^n}-2},\ \zeta_{2^{2^n}-2} \to \zeta_{2^{2^n}-1}\}$,
since the SOX of any BB is the SOE of the succeeding BB in the RGC($2^n$).
Example 4.3 For the RH(k, 2), as calculated in the previous example 
SP = {3 → 0, 0 → 1, 1 → 0, 0 → 2, 2 → 0, 0 → 1, 1 → 0, 0 → 3, 3 → 0, 0 → 1, 1 → 0, 0 → 2, 2 → 0, 0 → 1, 1 → 0, 0 → 3}, as shown in Table 4.1. 
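A minimal Python sketch of the ES recursion and of SP follows (function names are ours). Running it reproduces ES(1) from Example 4.2 and the SP list of Example 4.3; the same ES(p) can also be read off as the sequence of bit offsets changed between consecutive codes of the cyclic RGC($2^n$).

```python
def es_prime(j):
    """ES'(j) = {ES'(j-1), j, ES'(j-1)} with ES'(0) = {0}."""
    if j == 0:
        return [0]
    half = es_prime(j - 1)
    return half + [j] + half


def exit_sequence(n):
    """ES(p) = {ES'(p), p} with p = 2**n - 1: the SOX of each BB in RGC(2**n) order."""
    p = (1 << n) - 1
    return es_prime(p) + [p]


def soe_sox_pairs(n):
    """SP: BB i is entered through the SB by which BB i-1 (cyclically) was exited."""
    zeta = exit_sequence(n)
    return [(zeta[i - 1], zeta[i]) for i in range(len(zeta))]


print(exit_sequence(1))   # [0, 1, 0, 1]
print(", ".join(f"{a}->{b}" for a, b in soe_sox_pairs(2)))
```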
The sequence $T_{SP[i]}$, which gives the order in which the nodes of the ith BB in
the RGC($2^n$) are traversed, is then produced such that its first node belongs to the
SOE and its last node belongs to the SOX. Finally, each ring node is assigned an
RH address by concatenating a BB address with the address of the node within the
BB as determined earlier. The following C-like program performs the concatenation.
Theorem 4.2 Given an N-node RH(k,n) with k > n, there is an optimal mapping
of the N-node ring onto it.
Proof.        The combination of the Gray codes RGC and UGC allows the traversal 
of all the nodes in every BB. 	 ■  
Example 4.4 Here is a complete example for the mapping of a ring onto the 
RH(3,1). 
The mapping is shown in Figure 4.1. The links which are actually used for
the mapping are represented by thick lines. Following the nodes in Figure 4.1
according to the above mapping sequence (along the thick links), it can be
verified that the mapping is optimal, since all nodes are used and the dilation of
all edges is equal to 1.
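As an independent check of Theorem 4.2 on small cases, the minimal Python sketch below builds a dilation-one ring through all nodes of RH(k,n), k > n, with a different technique from the thesis. BB's are still visited in cyclic RGC($2^n$) order with the SOE/SOX rule above, but the path inside each BB comes from a generic recursive Hamiltonian-path construction for hypercubes (such a path exists between any two nodes of opposite parity), not from the RGC/UGC-based T sequences. The address layout (k - n least significant bits inside the SB, the next n bits for the SB address, the 1st field on top) and all function names are assumptions.

```python
def rh_linked(a, b, k, n):
    """True if nodes a and b of RH(k, n) are joined by a link (assumed address layout)."""
    x = a ^ b
    if x == 0 or x & (x - 1):
        return False
    if x < (1 << k):                            # single-bit change inside the BB (a k-cube)
        return True
    return x == 1 << (k + ((a >> (k - n)) & ((1 << n) - 1)))   # 1st-field bit = SB address


def ham_path(u, v, mask):
    """Hamiltonian path from u to v through the subcube spanned by the bits in `mask`.

    Requires u and v to agree outside `mask` and to have opposite parity inside it.
    """
    if (mask & (mask - 1)) == 0:                # one free dimension: the edge u-v
        return [u, v]
    diff = (u ^ v) & mask
    d = diff & -diff                            # split on a dimension where u and v differ
    rest = mask & ~d
    w = u ^ (rest & -rest)                      # opposite parity to u, on u's side of the split
    return ham_path(u, w, rest) + ham_path(w ^ d, v, rest)


def parity(x):
    return bin(x).count("1") & 1


def ring_on_rh(k, n):
    """Dilation-one ring through all 2**(k + 2**n) nodes of RH(k, n), for k > n."""
    if k <= n:
        raise ValueError("this sketch needs k > n")
    low = k - n                                 # bits that address a node inside an SB
    bbs = [i ^ (i >> 1) for i in range(1 << (1 << n))]       # BB's in cyclic RGC(2**n) order
    size = len(bbs)
    sox = [(bbs[i] ^ bbs[(i + 1) % size]).bit_length() - 1 for i in range(size)]
    start = sox[-1] << low                      # NOE of the first BB lies in its SOE
    entry, ring = start, []
    for i, bb in enumerate(bbs):
        if i == size - 1:
            exit_node = start                   # close the ring where it started
        else:
            exit_node = sox[i] << low           # a node of the SOX ...
            if parity(exit_node) == parity(entry):
                exit_node ^= 1                  # ... with parity opposite to the entry node
        ring += [(bb << k) | node for node in ham_path(entry, exit_node, (1 << k) - 1)]
        entry = exit_node                       # the inter-BB link keeps the 0th field
    return ring


ring = ring_on_rh(3, 1)
print(len(ring), all(rh_linked(ring[i], ring[(i + 1) % len(ring)], 3, 1) for i in range(len(ring))))   # 32 True
```

The parity adjustment only flips a bit inside the SB, which is why the sketch needs k > n; entry parities alternate from BB to BB, and since the number of BB's is even, the last BB can always exit at the node where the first BB was entered.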
Figure 4.1 Mapping of a 32-node ring onto the RH(3,1) 
For n = k, which corresponds to the case where each SB contains only one
node, the mapping is not always optimal. The mapping algorithm presented for the
case k > n also applies to the case n = k, except for the step in which the
$T_{0\to\delta}$ sequences are produced. Keep in mind that, since each SB contains only one
node, the terms SOE and NOE are equivalent, and so are the terms SOX and NOX.
The terms SOE and SOX are used here. To obtain the $T_{0\to\delta}$ sequences, the following
algorithm must be used.
Let $SOX = \delta = \delta_{n-1}\delta_{n-2}\delta_{n-3}\cdots\delta_1\delta_0$, where
$\delta_{n-1}\delta_{n-2}\delta_{n-3}\cdots\delta_1 = RGC_\varphi(n-1)$ and φ represents the index of
the particular code in the RGC(n - 1) sequence.
4. SP = {3 → 0, 0 → 1, 1 → 0, 0 → 2, 2 → 0, 0 → 1, 1 → 0, 0 → 3, 3 → 0, 0 → 1, 1 → 0, 0 → 2, 2 → 0, 0 → 1, 1 → 0, 0 → 3} 
 
It can be seen in the previous example that one node is not used in each of the
cases $T_{0\to3}$ and $T_{3\to0}$. The following theorem is pertinent.
Theorem 4.3 For n = k, in BB's where the address δ of the SOE or the SOX is of one
of the following two types:
• the most significant bit is 0 and the parity of the remaining bits is even
• the most significant bit is 1 and the parity of the remaining bits is odd
one SB will not be utilized in the mapping of the ring. SB's of the form just described
will be referred to as special SB's.
Proof. Assume the mapping of SB's onto a mesh of size $2^{n-1} \times 2$. The row
numbers of the mesh are encoded using the RGC(n - 1), while the column
numbers are 0 and 1. Each point in the mesh then represents the SB whose
address is obtained by concatenating the encoded row and column numbers.
The mapping algorithm traverses SB's in a serpentine-like manner in Step 1,
while in Step 2 it traverses SB's along straight lines. For the special SB's
mentioned in the theorem, the SB which is in the same row as the SOX is
omitted from the traversal.
■
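Corollary 4.1 The mapping of a ring onto the RH(k,1) with the same number of
nodes is optimal.
Proof. Since n = 1, the only possible value is δ = 1, whose most significant bit is 1 and
whose remaining (empty) set of bits has even parity. Hence, according to Theorem 4.3,
there are no special SB's. ■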
If a dilation of two can be tolerated, then all nodes can be used for the mapping of 
the ring. In this case, the previously unused node now precedes the NOX. 
4.2 Mapping of a Torus onto an RH  
4.2.1 Mapping of a 2-D Torus onto an RH
The algorithm for the mapping of a 2-D torus onto the RH(k,n) is an expanded
version of the algorithm for ring mapping with k = n. The $2^{k-n}$ nodes contained
in each SB now form an entire row of the torus; this is accomplished by using the
RGC(k - n) to map a linear array with $2^{k-n}$ nodes onto the SB. Parallel connections
between SB's now form columns of the torus. The address of each node in the torus
is the pair of its Cartesian coordinates, i.e. its row and column addresses.
The encoded column addresses of torus nodes are then the ones assigned
according to the RGC(k - n). The mapping procedure for the ring is now used
to form the sequence of rows. As each SB forms a row, the problem is equivalent
to mapping the ring with $2^{2^n+n}$ nodes (the total number of SB's) onto the
RH(n,n). Figure 4.2 shows the mapping of the 4 × 8 torus onto the RH(3,1); all
edges of the RH are used in this case.
Figure 4.3 shows the mapping of the $2^{k-2} \times 60$ torus onto the RH(k,2). The
arcs illustrate the sequence of inter-BB and inter-SB edges selected by the mapping
algorithm when starting with SB 3 of BB 0. The four encircled SB addresses
illustrate the unused SB's for this mapping.
With the above algorithm, a 2-D torus of size $Y \times 2^{k-n}$ is mapped onto the
RH(k,n), where an upper bound on Y is $2^{2^n+n}$. If Y equals the upper bound, then all
nodes in the RH are used in an optimal mapping. The reason that Y cannot always 
attain its maximum value is the potential presence of some special SB's within the 
BB's, as stated in Theorem 4.3. 
We can take one step further and count, for a given RH(k,n), how many pairs
$0 \leftrightarrow \delta$ of each kind there are. Since the special SB's can then be determined, the
utilization of nodes in the RH(k,n) can be calculated.
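Theorem 4.4 Given the RH(k,n), there exist $2^{2^n-\delta}$ BB's with SOE-SOX pairs of
the form $0 \leftrightarrow \delta$, for $\delta = 1, 2, \ldots, 2^n - 2$ and n > 1, and 4 such BB's for $\delta = 2^n - 1$.
Proof. The proof applies mathematical induction with the RGC(n). Alternatively, the
result follows from a direct count: in one cyclic traversal of the RGC($2^n$), bit δ changes
$2^{2^n-1-\delta}$ times for $\delta < 2^n - 1$ and twice for $\delta = 2^n - 1$; each change is the exit of one BB
and the entry of the next, and by Theorem 4.1 no BB has two transitions with δ ≥ 1. ■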
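Theorem 4.4 can be checked by brute force for small n. The minimal Python sketch below (names are ours) counts the BB's of each kind by walking the cyclic RGC($2^n$) and compares the counts with the closed form; for n = 2 both give 8, 4 and 4 BB's for δ = 1, 2 and 3, as in Table 4.1.

```python
from collections import Counter


def pair_counts(n):
    """Number of BB's whose SOE-SOX pair is 0 <-> delta, found by walking the cyclic RGC(2**n)."""
    codes = [i ^ (i >> 1) for i in range(1 << (1 << n))]
    size = len(codes)
    flips = [(codes[i] ^ codes[(i + 1) % size]).bit_length() - 1 for i in range(size)]
    counts = Counter()
    for i in range(size):
        soe, sox = flips[i - 1], flips[i]        # bit changed on entry and on exit of BB i
        counts[max(soe, sox)] += 1               # the other one is 0 (Theorem 4.1)
    return dict(counts)


def theorem_4_4(n):
    """Counts predicted by Theorem 4.4."""
    m = 1 << n
    return {d: (2 ** (m - d) if d < m - 1 else 4) for d in range(1, m)}


print(pair_counts(2) == theorem_4_4(2), theorem_4_4(2))   # True {1: 8, 2: 4, 3: 4}
```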
Figure 4.2 Mapping of the 4 x 8 torus onto the RH(3,1) 
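Example 4.6 For the RH(k,3) we calculate the occurrences in each case. Special
SB's are in boldface.
The special SB's according to Theorem 4.3 are 3, 5 and 6. By Theorem 4.4,
$2^{8-3} = 32$, $2^{8-5} = 8$ and $2^{8-6} = 4$ BB's have SOE-SOX pairs $0 \leftrightarrow 3$, $0 \leftrightarrow 5$ and
$0 \leftrightarrow 6$, respectively, and each of them leaves one SB unutilized. Hence, the number
of unutilized SB's is 32 + 8 + 4 = 44. The total number of SB's is $2^{11}$ = 2,048.
Therefore, Utilization = (2,048 - 44)/2,048 ≈ 0.978, or 97.8%.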
Corollary 4.2 There exists an optimal mapping of the 2-D torus of size $2^{k-1} \times 2^3$
onto the RH(k,1).
Proof.    For n = 1, according to Theorem 4.3 there are no special SB's, therefore 
all SB's are fully utilized. 	 ■ 
Table 4.2 shows the utilization of SB's for the cases n = 1, 2, 3 and 4, which
correspond to realistic systems. The results in Table 4.2 show that for n > 1
the utilization is close to 1 and increases as n increases.
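The entries of Table 4.2 for n = 1, 2 and 3 can be reproduced from Theorems 4.3 and 4.4 alone. In this minimal Python sketch (names are ours), every BB whose non-zero SOE or SOX is a special SB is charged one lost SB, following Theorem 4.3.

```python
def is_special(delta, n):
    """Theorem 4.3: MSB 0 and even parity of the rest, or MSB 1 and odd parity of the rest."""
    msb = delta >> (n - 1)
    rest_parity = bin(delta & ((1 << (n - 1)) - 1)).count("1") & 1
    return rest_parity == msb


def torus_utilization(n):
    """(lost SB's, total SB's, utilization) for the 2-D torus mapping onto RH(k, n)."""
    m = 1 << n                                      # SB's per BB; BB addresses have m bits
    pairs = {d: (2 ** (m - d) if d < m - 1 else 4) for d in range(1, m)}   # Theorem 4.4
    lost = sum(cnt for d, cnt in pairs.items() if is_special(d, n))       # one SB per such BB
    total = 2 ** (m + n)
    return lost, total, (total - lost) / total


for n in (1, 2, 3):
    print(n, torus_utilization(n))
```

For n = 1, 2 and 3 the sketch yields 0, 4 and 44 lost SB's out of 8, 64 and 2,048, matching the corresponding rows of Table 4.2.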
4.2.2 Mapping of a µ-D Torus onto an RH
The mapping algorithm of Subsection 4.2.1 is modified slightly here for the purpose
of mapping the µ-D torus onto the RH(k,n). The modification of the algorithm
follows. In the mapping of the 2-D torus, all nodes in an SB are used to form one
dimension (i.e. a row) of the 2-D torus, and SB-to-SB parallel connections create the
second dimension. It is evident that this second-dimension mapping cannot change.
Figure 4.3 Mapping of the $2^{k-2} \times 60$ torus onto the RH(k,2)
Table 4.2 Efficiency of torus mapping
n   Total BB's   SB's per BB   Total SB's   Lost SB's   Utilization
1   4            2             8            0           1.000
2   16           4             64           4           0.937
3   256          8             2,048        44          0.978
4   65,536       16            1,048,576    11,474      0.989
Therefore, we must extract more dimensions from each SB. Since the SB is a regular
(k - n)-cube, and since the theory for mapping a µ-D torus onto a hypercube is known,
the problem becomes an easy one. Each SB can supply at most k - n dimensions and the
SB-to-SB connections supply one more; hence, the maximum number of dimensions
that can be attained is k - n + 1.
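One standard way to embed a µ-D torus whose side lengths are powers of two into a hypercube, and hence into a single SB, is to encode each coordinate with its own reflected Gray code and concatenate the codes. The minimal Python sketch below illustrates that standard technique under this power-of-two assumption; it is not necessarily the specific hypercube mapping the thesis relies on, and the names are ours.

```python
from itertools import product


def gray(i):
    """i-th code of the binary reflected Gray code."""
    return i ^ (i >> 1)


def torus_to_cube(coords, bits):
    """Concatenate the Gray codes of the coordinates; dimension t uses bits[t] address bits."""
    addr = 0
    for c, b in zip(coords, bits):
        addr = (addr << b) | gray(c)
    return addr


def dilation_one(bits):
    """Check that every torus edge, wraparound included, maps onto a single hypercube link."""
    sizes = [1 << b for b in bits]
    for coords in product(*(range(s) for s in sizes)):
        here = torus_to_cube(coords, bits)
        for t, s in enumerate(sizes):
            nxt = list(coords)
            nxt[t] = (nxt[t] + 1) % s
            if bin(here ^ torus_to_cube(nxt, bits)).count("1") != 1:
                return False
    return True


print(dilation_one([2, 1, 3]))   # a 4 x 2 x 8 torus in a 6-cube: True
```

Each coordinate occupies its own group of address bits, so a step along one torus dimension changes exactly one bit of that group, and the cyclic property of the reflected Gray code covers the wraparound edge.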
4.3 Mapping of a Binary Tree onto an RH 
The mapping of a complete (balanced) binary tree onto a regular hypercube can be
carried out using either one of two approaches. The first approach uses a one-to-one
mapping, whereas in the second approach some hypercube nodes
simulate more than one binary tree node.
It is impossible to derive a one-to-one mapping with dilation one of an n-level binary tree with
$2^n - 1$ nodes onto an n-cube with $2^n$ nodes, for all n ≥ 3. The largest binary tree that
can be mapped in one-to-one fashion onto an n-cube has n - 1 levels and contains
$2^{n-1} - 1$ nodes [3, 12]. Approximately half of the hypercube nodes are then utilized.
In fact, two disjoint (n - 1)-level binary trees can be mapped simultaneously, leaving
just two hypercube nodes unused ($2^n - 2(2^{n-1} - 1) = 2$). We would like to examine the
situation for the case of the RH(k,n). Since the RH(k,n) contains $2^{2^n}$ complete k-cubes
(BB's), $2^{2^n+1}$ disjoint (k - 1)-level binary trees can be mapped simultaneously onto it,
as each BB can accommodate two such binary trees.
An algorithm in [3] uses a single two-degree node as a child of the root and
stretches it (i.e., creates a double root with a spacer node) to utilize all the nodes
in the hypercube. The extra node is used only for communication between the
root and one of its children. All edges have dilation one, except for the one whose
image contains the spacer node, which therefore has dilation two. This way an n-level
(complete) binary tree with $2^n - 1$ nodes is mapped onto an n-cube, and all $2^n$ nodes
of the n-cube are used, since the spacer node occupies the one remaining node.
The regular hypercube has a symmetric structure, and therefore any node can
be chosen as the root of the binary tree when the algorithm in [3] is applied, as
long as the same transformation is applied to all node addresses. Each BB provides
a k-level binary tree with a spacer node, and pairs of binary trees can be linked
together through their spacer nodes as presented in [3] (one inter-BB edge out of the
$2^{k-n}$ edges that interconnect a pair of BB's is used for this purpose). Therefore, a
(k + 1)-level binary tree with $2^{k+1} - 1$ nodes is obtained from each pair of BB's, for a
total of $2^{2^n-1}$ binary trees; each such binary tree contains a single spacer node. The
roots of these binary trees can then be interconnected in a binary tree structure with
maximum dilation n + 1, because up to n hops are required within a BB in order to
reach an SB that implements connections in a specific dimension. Therefore, all but
$2^{2^n-1}$ (spacer) nodes are utilized. Due to the latter binary tree, $2^{2^n-1}$ nodes are used
twice in this mapping.
The resultant mapping has low dilation because practical values of n are 
normally in the range from 1 to 3. Therefore, RH's are also suitable for the 
emulation of binary trees. 
CHAPTER 5 
CONCLUSIONS  
The main objective of the research presented in this thesis was to develop algorithms
for embedding into the RH topology three topologies very frequently used for the
development of parallel algorithms: the ring, the torus, and the binary tree.
For the ring structure, the proposed mapping algorithm for the RH(k,n), where
k > n, achieves dilation one and all the nodes are utilized. Clearly, this is an optimal
mapping in all respects, just like that for the regular hypercube. Taking into account
the fact that a realistic system will indeed have k > n, this result is very
important. Even for the case where n = k, a mapping with dilation one can be
attained with a relatively very small number of unutilized nodes, as shown in Table 4.2
of Subsection 4.2.1. The case of n = 1 is a special case in which no nodes remain
unutilized. All nodes can be used in the case of k = n if a dilation of two can be
tolerated.
The mapping algorithm for the torus structure turned out to be an
expanded version of the mapping algorithm for the ring structure for the case
of n = k. For a mapping with dilation one, only a small number of nodes remain
unutilized for all n > 1. For the mapping of a square torus with maximum utilization
of nodes, the following condition must be satisfied for the RH(k,n): $2^{k-n} = 2^{2^n+n}$, i.e., $k - n = 2^n + n$.
The problem of mapping the binary tree onto the RH was expected to be
more challenging. The proposed algorithms are based on existing algorithms for
the conventional hypercube. These algorithms were applied to RH building blocks 
which are complete k-cubes. The mapping of trees with maximum size was achieved 
by slightly modifying the algorithms and also connecting building blocks in pairs. 
Therefore, multiple trees are obtained, though smaller in size than those obtained 
from a conventional hypercube with the same number of nodes. Although it seems 
that mapping of binary trees onto RH's is a more difficult problem than that of 
regular hypercubes, the fact that RH's permit the design of considerably larger 
systems is a major advantage. A larger system and a proper choice of n and k will 
result in desired tree sizes. 
It has already been established that an RH has significantly lower VLSI
complexity than a regular hypercube with the same number of nodes. As a
direct consequence, larger systems may be designed, thus achieving significant
speedups. In addition, it was shown that the RH maintains a low diameter and
average distance with respect to its total number of nodes. Since it was also shown
in this thesis that RH's permit the efficient mapping of frequently used topologies,
RH's are viable alternatives to the hypercube for massively parallel processing.
REFERENCES 
1. S.G. Ziavras, "On the Problem of Expanding Hypercube-Based Systems,"
J. Parallel Distrib. Comput. 16, 1 (Sept. 1992), pp. 41-53.
2. S.G. Ziavras, "RH: A Versatile Family of Reduced Hypercube Interconnection
Networks," to appear in IEEE Trans. Parallel Distrib. Syst.
3. S.R. Deshpande and R.M. Jenevein, "Scalability of a Binary Tree on a
Hypercube," Proc. International Conference on Parallel Processing, IEEE
Computer Society, Silver Spring, MD, Aug. 1986, pp. 661-668.
4. K. Ghose and K.R. Desai, "The HCN: A Versatile Interconnection Network
Based on Cubes," Proc. Supercomputing '89, IEEE Computer Society
and ACM SIGARCH, Nov. 1989, pp. 426-435.
5. W.D. Hillis, The Connection Machine, MIT Press, Cambridge, MA, 1985. 
6. C.T. Ho and S.L. Johnsson, "Spanning Balanced Trees in Boolean Cubes,"
SIAM J. Sci. Stat. Comput. 10, 4 (July 1989), pp. 607-630.
7. H.P. Katseff, "Incomplete Hypercubes," IEEE Trans. Comput. C-37, 5 (May
1988), pp. 604-608.
8. F.P. Preparata and J. Vuillemin, "The Cube-Connected Cycles: A Versatile
Network for Parallel Computation," Comm. ACM 24, 5 (May 1981), pp. 300-309.
9. H.P. Katseff, "Incomplete Hypercubes," IEEE Trans. Comput. C-37, 5 (May
1988), pp. 604-608.
10. N.F. Tzeng, H.L. Chen, and P.J. Chuang, "Embeddings in Incomplete
Hypercubes," Proc. International Conference on Parallel Processing, IEEE
Computer Society, Silver Spring, MD, Aug. 1990, pp. 335-339 (Vol. I).
11. C.L. Seitz, "Concurrent VLSI Architectures," IEEE Trans. Comput. C-33, 12
(Dec. 1984), pp. 1247-1265.
12. F.T. Leighton, Introduction to Parallel Algorithms and Architectures, Morgan 
Kaufmann Publ., San Mateo, CA, 1992, pp. 407-410. 
13. T.F. Chan and Y. Saad, "Multigrid Algorithms on the Hypercube Multiprocessor,"
IEEE Trans. Comput. C-35, 11 (Aug. 1988), pp. 969-977.
14. S.L. Johnsson, "Communication Efficient Basic Linear Algebra Computation on
Hypercube Architectures," J. Parallel Distrib. Comput. 4, 2 (Apr. 1987),
pp. 133-172.
15. A.Y. Wu, "Embedding of Tree Networks into Hypercubes," J. Parallel Distrib.
16. T.H. Lai and W. White, "Mapping Pyramid Algorithms into Hypercubes,"
J. Parallel Distrib. Comput. 9 (1990), pp. 42-54.
17. S.G. Ziavras and M.A. Siddiqui, "Pyramid Mappings onto Hypercubes
for Computer Vision: Connection Machine Comparative Study,"
Concurrency: Practice and Experience, Vol. 5, No. 6, Sept. 1993, pp. 471-889.
