Louisiana State University

LSU Digital Commons
LSU Historical Dissertations and Theses

Graduate School

1993

Interconnection Networks Embeddings and Efficient Parallel
Computations.
Emadeddin Mohamed Abuelrub
Louisiana State University and Agricultural & Mechanical College

Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_disstheses

Recommended Citation
Abuelrub, Emadeddin Mohamed, "Interconnection Networks Embeddings and Efficient Parallel
Computations." (1993). LSU Historical Dissertations and Theses. 5554.
https://digitalcommons.lsu.edu/gradschool_disstheses/5554

This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It
has been accepted for inclusion in LSU Historical Dissertations and Theses by an authorized administrator of LSU
Digital Commons. For more information, please contact gradetd@lsu.edu.


Order Number 9405380

Interconnection networks embeddings and efficient parallel
computations
Abuelrub, Emadeddin Mohamed, Ph.D.
The Louisiana State University and Agricultural and Mechanical Col., 1993


INTERCONNECTION NETWORKS EMBEDDINGS
AND EFFICIENT PARALLEL COMPUTATIONS

A Dissertation

Submitted to the Graduate Faculty of the
Louisiana State University and
Agricultural and Mechanical College
in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy

in
The Department of Computer Science

by
Emadeddin Abuelrub
BS in Computer Engineering, Oklahoma State University, 1984
BS in Computer Science, Oklahoma State University, 1985
MS in Computer Science, Alabama A&M University, 1987
August 1993

Acknowledgements
I would like to express my sincere appreciation to Prof. Said Bettayeb for all the
guidance and encouragement he has given me throughout my study. He has been a
good friend and a very supportive advisor.
I would like to thank my committee members, Prof. Ahmad El-Amaway, Prof.
Raymond Fabec, Prof. Sitharama Iyengar, Prof. Bush Jones, and Prof. Si-Qing Zheng,
for all their encouragement and support. Finally, I thank my brothers Ahmad and
Mohamed whose endless support served as the foundation for my achievements.

Table of Contents
Acknowledgements
List of Figures
Abstract
CHAPTER 1 Introduction
1.1 Flynn’s Taxonomy
1.2 A Taxonomy of Topologies
1.3 Data Routing
1.4 Overview
1.5 Preliminaries and Terminology
1.6 Graph Embedding
1.7 Cost Functions
1.8 Fault-Tolerance
1.9 Outline of the Dissertation
CHAPTER 2 Parallel Computation on the Twisted Hypercube
2.1 Data Communication
2.2 Basic Operations
2.2.1 Associative Computations
2.2.2 Parallel Prefix
2.3 Summary
CHAPTER 3 Embedding Trees into Twisted Hypercubes
3.1 Introduction
3.2 Embedding Complete Binary Trees into Twisted Hypercubes
3.2.1 The Recursive Embedding
3.2.2 The Inorder Embedding
3.3 Embedding Complete Quad Trees into Twisted Hypercubes
3.4 Summary
CHAPTER 4 Embedding Rings into Faulty Twisted Hypercubes
4.1 Introduction
4.2 Fault-Free Embeddings
4.2.1 The Embedding Sequence
4.2.2 Divide-Conquer Embeddings
4.3 Fault-Tolerance Embeddings
4.3.1 Embedding in the Presence of a Single Fault
4.3.2 Embedding in the Presence of Multiple Faults
4.4 Summary
CHAPTER 5 Fault-Tolerance Embedding of Rings into Hypercubes
5.1 Introduction
5.2 Fault-Free Embeddings
5.2.1 Divide-Conquer Embeddings
5.3 Fault-Tolerance Embeddings
5.3.1 Embedding in the Presence of a Single Fault
5.3.2 Embedding in the Presence of Multiple Faults
5.4 Summary
CHAPTER 6 Concluding Remarks
Bibliography
Vita

List of Figures
Figure 1.1 SIMD and MIMD paradigms
Figure 1.2 Some popular interconnection networks
Figure 1.3 Hypercubes and twisted hypercubes for n = 1, 2, and 3
Figure 2.1 Broadcasting in a twisted hypercube
Figure 2.2 The addition operation on a twisted hypercube of dimension 3
Figure 2.3 The addition operation via the broadcast tree
Figure 2.4 The prefix operation on a twisted hypercube of dimension 3
Figure 3.1 Embedding CB into TQ for n = 1, 2, 3, and 4
Figure 3.2 Embedding CB into TQ for n = 5
Figure 3.3 The recursive embedding of CB into TQ
Figure 3.4 The inorder embedding of CB into TQ for n = 1, 2, and 3
Figure 3.5 The inorder embedding of CB into TQ
Figure 3.6 Partitioning CQ
Figure 3.7 The recursive embedding of CQ into TQ
Figure 3.8 Embedding CQ into TQ for n = 2
Figure 3.9 Embedding CQ into TQ for n = 3
Figure 4.1 The embedding sequence
Figure 4.2 Fault-free embedding
Figure 4.3 Blocks and cubes
Figure 4.4 All possible locations of a faulty node
Figure 4.5 Single fault embedding
Figure 4.6 Multiple faults embedding
Figure 4.7 All possible cases of two adjacent faulty cubes
Figure 5.1 The embedding sequence
Figure 5.2 The cube
Figure 5.3 Fault-free embedding
Figure 5.4 All possible locations of an upper faulty node within a cube
Figure 5.5 All possible locations of a lower faulty node within a cube
Figure 5.6 Single fault embedding
Figure 5.7 Multiple faults embedding
Figure 5.8 All possible cases of two adjacent faulty cubes

Abstract
To obtain greater performance, many processors are allowed to cooperate to
solve a single problem. These processors communicate via an interconnection network
or a bus. Parallel machines are classified as either message passing machines,
where processors have their own memory, or shared memory machines, where several
processors share the same memory. In this dissertation, we focus on the former. The
most essential function of the underlying interconnection network is the efficient
interchange of messages between processes in different processors. The potential
communication bottleneck has been the main driver in the design of interconnection
networks. Parallel machines based on the hypercube topology have gained wide
acceptance in parallel computation because of their many attractive properties. Many
versions of the hypercube have been introduced, mainly to enhance
communications. The twisted hypercube is one of the most attractive versions of the
hypercube. It preserves the important features of the hypercube and reduces its
diameter by a factor of two. This dissertation investigates relations and transformations
between various interconnection networks and the twisted hypercube and explores its
efficiency in parallel computation. The capability of the twisted hypercube to simulate
complete binary trees, complete quad trees, and rings is demonstrated and compared
with that of the hypercube. Finally, the fault-tolerance of the twisted hypercube is
investigated. We present optimal algorithms to simulate rings in a faulty twisted hypercube
environment and compare them with those for the hypercube.

CHAPTER 1

Introduction

The need for faster computers has not ceased since the beginning of the computer
era. New applications seem to push existing computers to their limit. The computer
industry shows a continuous effort to increase the computational speed of computers.
In the last four decades, dramatic increases in computing speed were achieved. Most
of these were largely due to the use of faster electronic components by computer
manufacturers. As we went from vacuum tubes to transistors and from small to very large
scale integration, we witnessed the growth in size and range of the computational
problems that we could solve. Even state-of-the-art VLSI technology cannot satisfy
the growing computational demands of many scientific and engineering applications.
Without high performance computers, many of these problems cannot be solved within
a reasonable time period.
In the last decade, as progress in VLSI has led to small size, low cost, and high
performance processors, it has become practical to build parallel computers containing
a very large number of processors. In parallel computation, a collection of processors
cooperate to solve a problem by working simultaneously on different parts of the
problem. The two major components of a parallel machine are the processors and the
interconnection network that ties them together. A main concern in the development
of a system with this many processors is fault-tolerance. Since the probability of
one or more processors or links becoming faulty in such complex systems is significant,
it is desirable to build some fault-tolerance features into the architecture.
Although parallel processing is not a new concept, its deviation from the traditional
Von Neumann computational model has introduced many new problems. The
extra complexity required for data communication among the processors can degrade
system performance and make programming on a parallel processing system much
harder than on a uniprocessor system. If each of the processors works autonomously,
the synchronization among different processes will further increase the complexity of
the system. Unless we have a clear understanding of these problems and the efficient
tools to solve them, the full power of parallel processing cannot be achieved.
This dissertation adds to the growing body of work that addresses highly parallel
computing for models of parallel machines. We specifically investigate relations and
transformations between various interconnection networks and explore their efficiency
in parallel computation. Both faulty and fault-free parallel architectures are considered.

1.1 Flynn’s Taxonomy
Parallel machines can be categorized by their interconnection network topologies.
We also classify parallel machines as either shared memory or message passing
machines. We further divide shared memory machines into vector and MIMD machines,
and message passing machines into static and dynamic ones. Message passing designs
offer higher levels of parallelism through the interconnection of thousands of
processors via an interconnection network. In such systems, there is no global memory
or program space. The design of message passing parallel machines places great demand
on communication speed, data partitioning, and routing.
The most widely accepted classification of parallel computation models is the
one proposed by Flynn [F], who viewed the Von Neumann model as a single stream of
instructions controlling a single stream of data (SISD). Flynn viewed parallelism as a
single stream of instructions controlling multiple streams of data (SIMD) or multiple
streams of instructions controlling multiple streams of data (MIMD). Figure 1.1
shows SIMD and MIMD paradigms. Traleaven [T] classified MIMD machines further.
The data mechanism was divided into shared-memory and message-passing
approaches. The terms multiprocessor and multicomputer, respectively, are usually
used to distinguish these two approaches.
In SIMD machines, all processors operate under the control of a single instruction
stream issued by a central control processor. All processors execute the same
instruction, or nothing, each on a different datum. SIMD is the most useful paradigm for
massively parallel scientific computing. Many scientific applications, such as image
processing and particle simulation, fall naturally into the SIMD paradigm. In SIMD
machines, a single instruction stream is acted upon by many processors in lock step.
Only one instruction counter is used to sequence through a single copy of the
program. The data processed by each processing element differs from one
processor to another. Therefore, a single program and a single control unit
simultaneously act on many different collections of data by controlling a collection of
homogeneous processors. SIMD is the basic paradigm of synchronous data parallel
computing. The classic example of a parallel SIMD computer is the ILLIAC-IV, with 64
identical processing elements, each receiving the same stream of instructions to be
executed on its own data item.

[Figure 1.1: SIMD and MIMD paradigms. (a) SIMD, with a control processor (CP); (b) MIMD.
In both panels the processors are joined by an interconnection network.]
In MIMD machines, each processor operates under the control of its own stream of
instructions, which allows great flexibility. Each processor is fully programmable and
capable of executing its own program. MIMD is the most general model of parallelism.
Synchronization is achieved explicitly and locally rather than through a global
synchronization mechanism. This provides a lot of flexibility, but it also means that
the software needed to program the machine is more complex and much harder
to implement. MIMD is useful when the problem allows multiple heterogeneous tasks
to be performed at the same time. This is most likely to occur when the number of
tasks to be performed is not known and the tasks perform different operations from
one another.
MIMD is general enough to contain SIMD, because we can emulate SIMD
behavior by restricting MIMD through careful programming. However, there may be
severe performance penalties inherent in simulating one form on a machine of a
different form.


1.2 A Taxonomy of Topologies
Interconnection networks and their combinatorial properties have been the topic
of much recent research in the area of parallel processing ([AJ], [AK], [CLe], [FS],
[Gou], [II], [K], [Lei], [LE], [Si], [Sn]). An efficient interconnection structure should
have a low number of links per node, a small internode distance, and a large number
of alternate paths between a pair of nodes for fault-tolerance. In a parallel machine,
the average internode distance, message traffic density, and fault-tolerance are very
much dependent on the diameter and the degree of the network. There is a tradeoff
between the diameter and the degree of a network. A network with a low degree usually
has a large diameter, and a network with a low diameter usually has a large degree. A ring
structure and a completely connected structure represent the two extremes. The
diameter multiplied by the degree is usually a good criterion to measure the efficiency of an
interconnection structure [AJ].
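
The diameter times degree criterion can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: the function names (`ring`, `complete`, `hypercube`, `cost`) are hypothetical, networks are modeled as undirected graphs held in adjacency dictionaries, and the diameter is found by breadth-first search from every node. It compares a ring, a completely connected network, and a hypercube on 64 nodes.

```python
from collections import deque


def ring(n):
    """Ring on n nodes: node i is linked to i - 1 and i + 1 (mod n)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def complete(n):
    """Completely connected network: every node is linked to all others."""
    return {i: set(range(n)) - {i} for i in range(n)}


def hypercube(dim):
    """Hypercube of the given dimension: labels differ in exactly one bit."""
    return {v: {v ^ (1 << b) for b in range(dim)} for v in range(1 << dim)}


def eccentricity(graph, source):
    """Largest shortest-path distance from source, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def diameter(graph):
    return max(eccentricity(graph, v) for v in graph)


def degree(graph):
    return max(len(nbrs) for nbrs in graph.values())


def cost(graph):
    """Diameter multiplied by degree, the criterion attributed to [AJ]."""
    return diameter(graph) * degree(graph)


if __name__ == "__main__":
    for name, g in [("ring", ring(64)), ("complete", complete(64)),
                    ("hypercube", hypercube(6))]:
        print(name, degree(g), diameter(g), cost(g))
```

On 64 nodes the two extremes give nearly the same product (2 x 32 for the ring, 63 x 1 for the completely connected network), while the hypercube balances both quantities (6 x 6).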
Most of the communication problems in parallel processing systems come from
the fundamentally different approaches adopted by uniprocessor systems and parallel
processing systems to support interprocess communications. In a uniprocessor system,
all processes reside in a single processor and all interprocess communications are
supported by main memory references. As a result, any process can easily send a
message to any other process with a uniform delay determined principally by the main
memory clock cycle. On the other hand, in a parallel processing system, different
processes usually reside in different processors. Interprocess communications are
supported by an interconnection network. The delay incurred in an interconnection
network is much greater than that in a uniprocessor. The delay time depends on the
number of processors and the communication pattern. We call the extra interprocess
communication time in a parallel processing system the communication delay.
The two main sources of the extra communication overhead in parallel processing
are the time for the messages to go through one or more intermediate processors,
in the absence of a direct link between the two processors communicating, and the
contention for a single link by more than one message at the same time. These delays
result from the mismatch of the communication characteristics of the parallel
programs and those of the parallel processing system.
An interconnection topology of a set of processors is a mapping from the set of
processors onto the same set of processors. The mapping describes how to connect
processors to other processors, with each processor usually connected to a small
number of processors in a regular pattern. For example, a ring topology is a mapping that
connects a processor with label i to processors with labels i - 1 and i + 1. A complete
binary tree topology is a mapping that associates processors with the nodes of a
complete binary tree, where the root processor is connected to two other processors,
interior processors are connected to three other processors, and leaf processors are
connected to only one processor. Most parallel machines are distinguished by their
interconnection topologies. While the speed and capacity of parallel machines may vary,
the most significant difference between them is their interconnection topologies.
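
The view of a topology as a mapping from processor labels to neighboring labels can be sketched in a few lines. The Python below is an illustrative sketch, not the dissertation's notation: the function names are hypothetical, the ring is assumed to wrap around modulo n, and the complete binary tree is assumed to use heap-style labels 1..n (root 1, children of i at 2i and 2i + 1).

```python
def ring_neighbors(i, n):
    """Ring on n processors labeled 0..n-1: i is linked to i - 1 and i + 1.

    The wraparound modulo n is an assumption made for this sketch.
    """
    return [(i - 1) % n, (i + 1) % n]


def tree_neighbors(i, n):
    """Complete binary tree on n = 2**h - 1 processors labeled 1..n.

    Heap-style labeling is assumed: the parent of i is i // 2 and the
    children of i are 2 * i and 2 * i + 1.
    """
    nbrs = []
    if i > 1:                      # every node except the root has a parent
        nbrs.append(i // 2)
    for child in (2 * i, 2 * i + 1):
        if child <= n:             # leaves have no children
            nbrs.append(child)
    return nbrs


if __name__ == "__main__":
    print(ring_neighbors(0, 8))    # neighbors of processor 0 in an 8-ring
    print(tree_neighbors(1, 7))    # root: two neighbors
    print(tree_neighbors(2, 7))    # interior node: three neighbors
    print(tree_neighbors(7, 7))    # leaf: one neighbor
```

With these labelings the root has two neighbors, interior nodes have three, and leaves have one, matching the description above.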
Interconnection networks that provide communication between the processors
have ranged from the simple to the complex, representing the tradeoff between speed
and cost. At one extreme is the ring network, in which each processor is linked to
only two other processors. Messages are passed along the network from one processor
to another by hopping through intermediate processors. At the other extreme in
connectivity is the all-to-all network, in which each processor has its own private link to
every other processor in the network. Between these two extremes, there are a number
of other networks with intermediate numbers of neighbors. Figure 1.2 shows some
popular interconnection networks.
Interconnection networks can be classified into dynamic and static networks.
Dynamic networks create links between processors as the program executes. Static
networks are fixed by design and cannot be changed after the machine is built. Parallel
machines based on the hypercube static interconnection structure are among the most
popular because they possess many attractive properties that are needed in parallel
processing.

1.3 Data Routing
In a parallel processing system, if more than one message must be sent from a
source to a destination at the same time, several messages may contend for the same
link. Since each link can support the communication of only one message at
any instant, this contention introduces extra communication delay into the system. A
good data routing algorithm should support parallel communication in the system with
minimum delay.

[Figure 1.2: Some popular interconnection networks. Panels: ring, hypercube,
cube-connected cycles, butterfly, tree, mesh, systolic array, linear array.]

Circuit switching and packet switching are the two principal kinds of data routing
mechanisms. In circuit switching, a physical path is established between the source
and the destination. In packet switching, data are put in packets and routed through
the interconnection network without establishing a physical connection path. Circuit
switching is generally much more suitable for bulk data transmission, while packet
switching is more efficient for many short messages.
In parallel processing systems for image processing, computer graphics, robot
vision, and scientific computation, communications are heavy and message sizes are
small. For these reasons, packet switching is usually preferred. There are two kinds
of control strategies for packet switching, centralized and distributed. In centralized
control, the decision to route packets is based on global information. In distributed
control, each processor decides how to route the data based on its local information.

1.4 Overview
The Parallel Random Access Machine (PRAM) is used as a standard theoretical
model for parallel computation. A PRAM is a synchronized machine with an
unbounded number of identical processors and a global memory which allows
simultaneous reads and writes from and into the same memory location ([AG], [Q], [U]).
Algorithms will run faster on this model than on real machines. Actual machines cannot
be built without a significant delay in access time. The best that one can hope for is
that access time is proportional to $\log N$, where $N$ is the number of processors [AG].
This led many institutions to design parallel machines based on the message passing
MIMD approach. The classic example of such an architecture is the MARK-II Cosmic
Cube.
Based on Kung’s sorting algorithm for meshes [TK] and Batcher’s merge sort for
cube connected machines, Nassimi and Sahni [NS] proved that a Random Access
Read (RAR) can be accomplished with complexity $O(q^2 n)$ on a $q$-dimensional
$n^q$-processor mesh machine and $O(\log^2 N)$ on an $N$-processor cube connected or
perfect shuffle machine. They also proved that a Random Access Write (RAW) can be
accomplished with complexity $O(q^2 n + dqn)$ on a $q$-dimensional mesh machine and
$O(\log^2 N + d \log N)$ on an $N$-processor cube connected or perfect shuffle machine,
where $d$ is the maximum number of data items written into any processor.
Many researchers have concentrated on finding efficient ways to simulate a PRAM
on other parallel machines. The first reasonable deterministic simulation of a PRAM
was proposed by Upfal and Wigderson [UW]. Their simulation achieved
$O(\log^2 N \log \log N)$ time to simulate one step of a PRAM algorithm on an
$N$-processor network. Alt et al. [AHMP] subsequently improved the time complexity to
$O(\log^2 N)$.

Valiant [V] reported a probabilistic routing algorithm that can perform any
permutation on a hypercube machine of size $N$ in $O(\log N)$ steps. The algorithm consists
of two consecutive phases. In the first phase, it sends each packet $p$ to a randomly
chosen node $v$. For each packet $p$, every node has the same probability of being
chosen, which is $1/N$. The choices for the different packets are independent of each other.
In the second phase, it routes each packet $p$ from the intermediate node $v$ to its
destination node. At each instant, there is exactly one copy of each packet. A packet
might be in transit along an edge, waiting in a queue associated with an edge, or
stored as a loose packet in an intermediate node. For simplicity, the algorithm is
described in a synchronized fashion. It alternates between a transmitting mode and a
bookkeeping mode. In the transmitting mode, the packet at the head of each queue is
transmitted along the edge associated with it and stored as a loose packet at the
recipient node. In the bookkeeping mode, each loose packet is assigned to the queue of one
of the outgoing edges according to some random choice, unless it has nowhere further
to go in the current phase.
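
The two-phase structure of this routing scheme can be sketched as follows. This is a simplified illustration, not Valiant's algorithm as published: it computes only the path each packet would follow, it omits the queues, the transmitting and bookkeeping modes, and all contention handling, and it assumes that within each phase a packet corrects differing address bits from the lowest to the highest (a common "bit-fixing" choice, whereas the description above uses random choices among outgoing edges). The function names are hypothetical.

```python
import random


def bit_fixing_path(src, dst, n):
    """Path from src to dst in an n-dimensional hypercube.

    Assumption for this sketch: differing bits are corrected in order
    from bit 0 up to bit n - 1.
    """
    path = [src]
    cur = src
    for b in range(n):
        if (cur ^ dst) & (1 << b):
            cur ^= 1 << b          # cross the dimension-b edge
            path.append(cur)
    return path


def two_phase_route(src, dst, n, rng=random):
    """Phase 1: go to a uniformly random intermediate node.
    Phase 2: go from the intermediate node to the destination."""
    mid = rng.randrange(1 << n)    # each of the N = 2**n nodes has probability 1/N
    first = bit_fixing_path(src, mid, n)
    second = bit_fixing_path(mid, dst, n)
    return first + second[1:]      # drop the duplicated intermediate node


if __name__ == "__main__":
    n = 4
    rng = random.Random(0)
    # Route the permutation that sends each node to its bitwise complement.
    for s in range(1 << n):
        d = ((1 << n) - 1) ^ s
        print(s, d, two_phase_route(s, d, n, rng))
```

Each phase crosses at most n edges, so a packet travels at most 2n = 2 log N hops; the sketch does not model the queueing delays that the probabilistic analysis bounds.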
Valiant proved that this distributed randomized algorithm can route packets to
their destinations in a hypercube machine, without two packets passing through the
same communication link at the same time, in $O(\log N)$ steps with high probability.
Each packet carries with it $O(\log N)$ bits of information and no other communication
among the nodes is needed. This result implies that a hypercube machine can simulate
a PRAM with an increase in the execution time for each step. Each PRAM step can
be simulated in approximately $O(\log N)$ steps on a network of size $N$. Therefore, we
can develop algorithms for the PRAM, since we know how to translate them into
algorithms for actual machines.


1.5 Preliminaries and Terminology
Several structures have been proposed in the literature for interconnecting a large
network of processors. Many parallel machines that are based on these structures are
now commercially available. The Cosmic Cube [Se] was the first completed experimental
parallel machine based on the hypercube structure. It became the archetype of
early operative parallel machines. Since the Cosmic Cube, many machines based on
the hypercube structure have been built and made commercially available, such as the
Ametek S/14, NCUBE/10, Intel iPSC, and the Connection Machine [H].
Parallel machines based on the hypercube topology have gained a great interest
in parallel computing because of their flexibility and suitability for general purpose
applications. Many of the properties of the hypercube that make it a desirable parallel
machine are a direct consequence of the graph theoretic properties of the hypercube
topology. The hypercube offers a rich interconnection topology with large bandwidth,
logarithmic diameter, simple routing and broadcasting of data, recursive structure that
is suited to divide and conquer applications, homogeneous and symmetric structure,
and the ability to simulate other interconnection networks with minimum overhead.
Also, it has a highly fault-tolerant structure. Fault-tolerance and related issues are
becoming an important topic in the design and analysis of parallel machines.
The hypercube has been the topic of much recent research. Various researchers
have done extensive work in showing the parallel computational power of the hypercube
machine in many directions. In one direction, many researchers have shown the
capability of the hypercube machine to simulate other networks such as rings, trees,
grids, and other interconnection networks with minimum overhead ([BCGS], [BCLR],
[BMS], [BSu], [MS], [SS], [Lei]). In another direction, researchers have shown the
power of the hypercube in solving many computational problems in parallel such as
sorting, merging, matrix multiplication, and parallel prefix ([A], [HB], [LE], [Lei], [P],
[Q], [QD], [St]). In a third direction, researchers have shown the robustness and
fault-tolerance of the hypercube, focusing on the hypercube’s ability to simulate, compute,
route, and reconfigure itself in the presence of faults ([AGr], [BS], [HLNa], [HLNb],
[WCM], [CL]).
Finally, many researchers have proposed modifications to the hypercube structure
to improve its computational power ([BH], [EBSS], [ENS], [EL], [PV], [YN]).
Bhuyan and Agrawal [BA] proposed a generalized hypercube structure. Preparata and
Vuillemin [PV] introduced the cube-connected cycles, in which the degree of each
node is reduced to 3. Latifi and El-Amaway [EL] proposed the folded hypercube to
reduce the diameter and the traffic congestion with little hardware overhead. Youssef
and Narahari [YN] proposed the Banyan-hypercube network, which combines the
advantageous features and properties of Banyans and hypercubes and thus reduces the
communication overhead.
A hypercube of dimension n, denoted by $Q_n$, is an undirected graph consisting of
$2^n$ vertices labeled from 0 to $2^n - 1$, each corresponding to an n-bit binary string,
such that there is an edge between two vertices if and only if the binary representations
of their labels differ in exactly one bit position. Each vertex is adjacent to n
other vertices, one for each bit position. The edges of the hypercube can be naturally
partitioned according to the dimensions that they traverse. An edge is called a
dimension i edge if it links two vertices that differ in the ith bit position.
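
The hypercube definition translates directly into bit operations on vertex labels. The following Python sketch is illustrative only: the helper names `hypercube_neighbors` and `edge_dimension` are hypothetical, and numbering bit positions from 1 at the least significant bit is an assumption about the indexing convention.

```python
def hypercube_neighbors(v, n):
    """Return the n neighbors of vertex v in Q_n, one per bit position."""
    return [v ^ (1 << i) for i in range(n)]


def edge_dimension(u, v):
    """Return i if (u, v) is a dimension i edge of a hypercube, else None.

    Positions are counted from 1 at the least significant bit (assumed).
    """
    diff = u ^ v
    # An edge exists only when the labels differ in exactly one bit.
    if diff == 0 or diff & (diff - 1):
        return None
    return diff.bit_length()


if __name__ == "__main__":
    print(hypercube_neighbors(0b101, 3))
    print(edge_dimension(0b101, 0b001))
```

Flipping one bit per position yields exactly n neighbors, which matches the statement that each vertex has one neighbor for each bit position.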
Another version of the hypercube, called the twisted hypercube, was introduced
by Efe et al. [EBSS]. Twisted hypercubes retain the attractive properties
of the hypercube and offer better communication capabilities. In parallel machines, the
communication cost often dominates the computation cost. The overall performance of the
parallel machine depends heavily on the underlying interconnection network. In
twisted hypercubes, the diameter is reduced by a factor of two over that of the hypercube.
Many of the hypercube’s attractive features such as partitioning, routing, and
embedding are incorporated into the twisted hypercube and new gains are achieved in
diameter, average distance, and embedding efficiency ([ABc], [E], [Z]).
Two binary strings $x = x_1 x_0$ and $y = y_1 y_0$, each of length two, are pair-related if
and only if $(x, y) \in \{(00,00), (10,10), (01,11), (11,01)\}$. Let G be any undirected
labeled graph, then $G^b$ is obtained from G by prefixing every vertex label with b. We
define a twisted hypercube as follows.

A twisted hypercube of dimension n, denoted $TQ_n$, is an undirected graph consisting
of $2^n$ vertices labeled from 0 to $2^n - 1$ and defined recursively as follows [EBSS].
(i) $TQ_1$ is the complete graph of two vertices with labels 0 and 1.
(ii) For n > 1, $TQ_n$ consists of two copies of $TQ_{n-1}$, one prefixed by 0, $TQ^0_{n-1}$, and
the other by 1, $TQ^1_{n-1}$. Two vertices $u = 0u_{n-2} \ldots u_0 \in TQ^0_{n-1}$ and
$v = 1v_{n-2} \ldots v_0 \in TQ^1_{n-1}$ are adjacent if and only if
   1. $u_{n-2} = v_{n-2}$, if n is even, and
   2. for $0 \le i < \lfloor (n-1)/2 \rfloor$, $u_{2i+1}u_{2i}$ and $v_{2i+1}v_{2i}$ are pair-related.
Such an edge (u, v) is referred to as a dimension n edge, for all n > 1.
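
The recursive definition can be checked on small cases by building the graph explicitly. The Python sketch below is a minimal illustration under stated assumptions: labels are stored as bit strings with the most significant bit first, and the names `cross_adjacent`, `twisted_hypercube`, and `diameter` are hypothetical helpers, not part of [EBSS].

```python
from collections import deque

# Pairs (x, y) of two-bit strings that are pair-related.
PAIR_RELATED = {("00", "00"), ("10", "10"), ("01", "11"), ("11", "01")}


def cross_adjacent(u, v, n):
    """Test rule (ii) for u in TQ^0_{n-1} and v in TQ^1_{n-1}.

    u and v are the (n-1)-bit suffixes u_{n-2}...u_0 and v_{n-2}...v_0.
    """
    m = n - 1
    if n % 2 == 0 and u[0] != v[0]:
        return False
    for i in range((n - 1) // 2):
        lo, hi = m - 2 * i - 2, m - 2 * i  # slice holding bits 2i+1 and 2i
        if (u[lo:hi], v[lo:hi]) not in PAIR_RELATED:
            return False
    return True


def twisted_hypercube(n):
    """Build TQ_n as a dict mapping each n-bit label to its neighbor set."""
    if n == 1:
        return {"0": {"1"}, "1": {"0"}}
    sub = twisted_hypercube(n - 1)
    adj = {b + u: {b + w for w in nbrs} for b in "01" for u, nbrs in sub.items()}
    for u in sub:
        for v in sub:
            if cross_adjacent(u, v, n):
                adj["0" + u].add("1" + v)
                adj["1" + v].add("0" + u)
    return adj


def diameter(adj):
    """Largest breadth-first distance between any two vertices."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    tq3 = twisted_hypercube(3)
    print(sorted(tq3["001"]), diameter(tq3))
```

For n = 3 every vertex has degree 3, as in $Q_3$, while the diameter drops from 3 to 2.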
There exists a dilation two and expansion one embedding of the twisted hypercube
into the hypercube and vice versa [E]. Figure 1.3 shows hypercubes and twisted
hypercubes for n = 1, 2, and 3. It is convenient to view both the hypercube and
the twisted hypercube in this way, where the upper part consists of all nodes with even
labels and the lower part consists of all nodes with odd labels. An upper node is a
node that lies in the upper part of the structure, i.e., its least significant bit is a 0. A
lower node is a node that lies in the lower part of the structure, i.e., its least significant
bit is a 1. An upper link is a link that connects two upper nodes and a lower link is a
link that connects two lower nodes.
Trees are a special kind of graph with a wide variety of applications in the
field of computer science. A complete k-ary tree of height n - 1 is an undirected graph
that has $(k^n - 1)/(k - 1)$ vertices and consists of a root of degree k with no parent and
k children, interior nodes of degree k + 1 with one parent and k children, and leaves of
degree one with one parent and no children. Spanning trees are very important in the
context of efficient communications and in the determination of distances between nodes
in a network. Binary trees are important tools in the evaluation of formulas and in the
study of branching of processes.
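
The vertex count of a complete k-ary tree of height n - 1 follows from summing the nodes on each level, since level $d$ holds $k^d$ nodes. As a worked check (the depth index $d$ is notation introduced here for illustration only):

```latex
\[
  \sum_{d=0}^{n-1} k^{d} \;=\; \frac{k^{n}-1}{k-1},
  \qquad\text{so } k=2 \text{ gives } 2^{n}-1
  \text{ and } k=4 \text{ gives } \frac{4^{n}-1}{3}.
\]
```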

Figure 1.3: Hypercubes and twisted hypercubes for n = 1, 2, and 3.

The importance of complete binary trees comes from the fact that this class of
structures is useful in the solution of banded and sparse systems by direct elimination
and captures the essence of divide-and-conquer algorithms ([BI], [Gor], [HS]). A complete
binary tree of height n - 1, denoted by $CB_n$, is an undirected graph consisting of
$2^n - 1$ vertices, such that every vertex of depth less than n - 1 has exactly two children
and every vertex of depth n - 1 is a leaf.
Quad trees are becoming an important representation technique in the domains of
image processing, computer graphics, and robotics [Sa]. This representation is based
on the principle of recursive decomposition. A complete quad tree of height n - 1,
denoted $CQ_n$, is an undirected graph consisting of $(4^n - 1)/3$ vertices, such that every
vertex of depth less than n - 1 has exactly four children and every vertex of depth
n - 1 is a leaf.
Rings are another special kind of graph with many real-world applications;
they are used in the solution of many computer science problems such as the token
passing problem and the Hamiltonian circuit problem [I]. A ring of size n, denoted $R_n$,
is an undirected graph consisting of n vertices labeled from $v_1$ to $v_n$ such that node $v_i$
is a neighbor of node $v_{i+1}$ for $1 \le i < n$, and $v_n$ is a neighbor of $v_1$.

1.6 Graph Embedding
In this dissertation, we use undirected graphs to model interconnection networks.
Each vertex represents a processor and each edge a communication link between processors.
The embedding of a guest graph $G = (V_G, E_G)$ into a host graph $H = (V_H, E_H)$
is an injective mapping $f$ from $V_G$ to $V_H$, where $V_G$, $E_G$ and $V_H$, $E_H$ are the
vertex and edge sets of G and H, respectively.
Many computational problems in parallel processing can be formulated as graph
embedding problems. Embedding one interconnection network into another is very
useful in the area of parallel computing for portability of algorithms across various
architectures, layout of circuits in VLSI, and mapping logical data structures into
computer memories ([BMS], [Len]). Also, the problem of organizing computations on a
network of processors can be formulated as a graph embedding problem [KS]. When
a process can be naturally decomposed into a collection of subprocesses that can be
executed simultaneously with occasional communication between them, a task graph
can be constructed by denoting each subprocess by a node and each communication
between two subprocesses during the computation by an edge.
The problem of simulating one interconnection network by another is a natural
graph embedding problem. Usually, it is assumed that the host network can grow
arbitrarily large. This assumption is not realistic and does not correspond to actual parallel
machines. In the real world, a parallel machine has a fixed number of processors.
Thus, the problem of efficiently simulating a large network on a smaller one is an
important issue. This type of embedding is called many-to-one, where more than one node
in the guest graph is mapped to a single node in the host graph. If the embedding maps a
single node in the guest graph to more than one node in the host graph, then the embedding is
one-to-many. In this dissertation the word embedding refers to one-to-one embedding,
where a single node in the guest graph is mapped to exactly one node in the
host graph. Many variations of embeddings in interconnection networks have been
studied in the literature ([AR], [BCGS], [BCLR], [BI], [BLD], [BMS], [BSu], [DS],
[JLD], [Lei], [LEI], [MS]). These variations differ principally in the optimization
measures used in the embeddings.


1.7 Cost Functions
The quality of an embedding is often guided by some constraints which may differ
from one application to another. The most common measures are dilation, expansion,
edge congestion, and load factor [HMR]. If u and v are two adjacent nodes in G,
denoted u - v, then the distance between their images in H, d(f(u), f(v)), is the length
of the shortest path in H from f(u) to f(v). The dilation D is the maximum distance in H
between the images of adjacent vertices of G

$$D = \max \{ d(f(u), f(v)) : u - v \in E_G \}$$

The expansion E is the ratio of the cardinality of the host vertex set to the cardinality
of the guest vertex set

$$E = |V_H| / |V_G|$$

Minimizing each of these measures has a direct implication on the quality of
the simulation of the guest network by the corresponding host network. The dilation
of an embedding measures how far apart neighboring guest processors are placed
in the host network. Clearly, if adjacent guest processors are placed far apart in the
host network, then there will be a significant degradation in simulation due to the
length of the communication path between them. The expansion of an embedding
measures how much larger the host network is than the guest network during the
simulation. We want to minimize expansion, as we want to use the smallest possible
host network that has at least as many processors as the guest network.
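
As a concrete illustration of dilation and expansion, the following Python sketch computes both for a small example. The graph representation (dictionaries of neighbor sets), the function names, and the example mappings of a 4-node ring into $Q_2$ are assumptions made for illustration and do not come from the dissertation.

```python
from collections import deque


def bfs_distances(adj, src):
    """Shortest-path distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def dilation(guest, host, f):
    """Maximum host distance between images of adjacent guest vertices."""
    return max(bfs_distances(host, f[u])[f[v]] for u in guest for v in guest[u])


def expansion(guest, host):
    """Ratio of host vertex count to guest vertex count."""
    return len(host) / len(guest)


if __name__ == "__main__":
    # Guest: ring on 4 vertices. Host: Q_2 with integer labels 0..3.
    ring = {i: {(i - 1) % 4, (i + 1) % 4} for i in range(4)}
    q2 = {v: {v ^ 1, v ^ 2} for v in range(4)}
    identity = {i: i for i in range(4)}   # places ring edge (1, 2) on labels 01 and 10
    gray = {0: 0, 1: 1, 2: 3, 3: 2}       # Gray-code order
    print(dilation(ring, q2, identity), dilation(ring, q2, gray), expansion(ring, q2))
```

Both mappings have expansion one, but only the Gray-code order keeps every ring edge on a single hypercube edge, which shows that dilation depends on the mapping and not only on the two graphs.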
In reality, we usually have a fixed size host network and we may have to consider
many-to-one embedding for larger guest networks. When the size of the guest
network is not equal to the size of the host network in terms of the number of processors,
then we try to find the smallest host network that has at least as many processors
as the guest network. Such a host network is referred to as the optimal host network.
There is a trade-off between dilation, which measures the communication delay, and
expansion, which measures processor utilization, such that one can achieve lower
expansion at a cost of greater dilation and vice versa.
Another cost measure is the congestion, which is the maximum number of edges
of the guest graph routed through a single edge of the host graph. Edge congestion is
a measurement of possible degradation due to communication delay. If a particular
link in the host network is needed for several different communication messages,
then the messages will suffer some delay since the link cannot pass more than one
message at a time. This will add extra time to the communication cost between processors.
In embeddings that are many-to-one maps, an important measure is the load factor,
which is the maximum number of guest processors to be simulated by a single processor
in the host interconnection network. This has been considered by many
researchers, for instance ([BL], [DS]). Clearly, it is very important to minimize the
load factor in the simulation of one network by another, as the distinct processors in
the guest network assigned to the same processor in the host network will be running
sequentially. An unbalanced processor load will degrade the simulation time as lightly
used processors must wait for heavily used processors to finish their tasks. Thus, the
amount of time needed to simulate one step of the guest network is proportional to the
maximum number of guest processors assigned to the same host processor.
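
Load factor and edge congestion can be computed in the same spirit. This Python sketch is illustrative only: the function names are hypothetical, the routing of each guest edge is supplied by the caller as an explicit host path, and the 8-node ring folded onto $Q_2$ is an example chosen here, not one taken from the cited work.

```python
from collections import Counter


def load_factor(f):
    """Maximum number of guest nodes mapped to one host node."""
    return max(Counter(f.values()).values())


def edge_congestion(routes):
    """Maximum number of guest edges routed through one host edge.

    routes maps each guest edge to the host path (list of nodes) chosen for it.
    """
    use = Counter()
    for path in routes.values():
        for a, b in zip(path, path[1:]):
            use[frozenset((a, b))] += 1
    return max(use.values(), default=0)


if __name__ == "__main__":
    gray = [0, 1, 3, 2]                        # Gray-code order on Q_2
    f = {i: gray[i // 2] for i in range(8)}    # two ring nodes per host node
    routes = {}
    for i in range(8):
        a, b = f[i], f[(i + 1) % 8]
        routes[(i, (i + 1) % 8)] = [a] if a == b else [a, b]
    print(load_factor(f), edge_congestion(routes))
```

In this example each host processor simulates two guest processors, and each host link carries at most one guest edge.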

1.8 Fault-Tolerance
One of the most important issues related to parallel machines is fault-tolerance.
As the number of processors in parallel machines becomes larger, models without
faults are becoming increasingly unrealistic. A fault is a processor or a link that fails.
We use a strong fault model where a faulty node can neither compute nor communicate
with its neighbors. A node fault completely destroys the node and all links
incident to it. We model a faulty link by making one of the nodes incident to the link
faulty. An interconnection network containing faulty components is called a faulty
network and one without faulty components is called a fault-free network.
Fault-tolerant network architectures have emerged as an important area of study
in parallel processing ([AGr], [BS], [CL], [HLNa], [HLNb], [LBT], [PM], [WCM]).
The fault-tolerance of a network is the capability of the network to compute, route,
simulate other networks, and reconfigure itself in the presence of faults. Clearly, if all
the immediate neighbors of a nonfaulty node become faulty, the network will become
disconnected. Many researchers studied the implementation of algorithms that are
designed for fault-free machines on faulty machines. The efficiency of the
implementation is usually measured by its slowdown. The slowdown S is the ratio
between the algorithm’s time requirements on the faulty machine and the algorithm’s
time requirements on the fault-free machine

$$S = \frac{\text{Time of algorithm on a faulty machine}}{\text{Time of algorithm on a fault-free machine}}$$

A significant difference between multicomputers and other parallel
machines is that multicomputers use message passing instead of shared memory for
communication between processors. Each processor has a private local memory. This
type of architecture can be scaled up to a very large number of processors compared to
multiprocessor designs based on globally shared memory. This model has some desirable
characteristics with respect to fault-tolerance and error confinement as well. A
faulty processor can be prevented from corrupting data in other processors if the faults
are detected quickly. Contrast this with a shared memory multiprocessor, where a faulty
processor can potentially write into any location in memory and thereby corrupt an
entire system within a very short time.
One issue that is usually addressed in the design of fault-tolerant systems is the
mechanism for detecting faulty processors. Many researchers suggested the use of off-line
testing of each processor, assuming there is a set of functional tests that can be run by
one processor on another. But it is very difficult to validate the completeness of the
functional testing strategies. Also, off-line testing can only detect permanent faults.
Intermittent and partial faults occur more frequently than permanent faults in parallel
machines. In order to detect these faults, it is necessary to have some kind of concurrent
fault detection features [AG]. This dissertation does not address the detection of
faults. Therefore, the faults are assumed to be known in advance.
Massively parallel message passing machines are receiving increasing attention
to meet the demand for high speed reliable computing. Hypercube interconnection
networks have emerged as one of the most effective and popular network architectures
for fault-free and faulty environments. The hypercube structure is highly fault-tolerant
and can handle a reasonable amount of interprocessor message traffic. When one or
more processors fail, the relatively large number of links often enables the nonfaulty
processors to continue communicating with one another. The ability of hypercube
machines to simulate, route, and reconfigure themselves in the presence of faults has
been addressed by many researchers ([BS], [CL], [HLNa], [PM], [WCM]).

1.9 Outline of the Dissertation
The capabilities of the twisted hypercube as a parallel machine are demonstrated in
Chapter 2. We show the capabilities of the twisted hypercube to provide efficient
broadcasting and routing and to perform basic parallel computations. The communication
time of several computations is reduced by a factor of two over that of the
hypercube. These include sorting, matrix multiplication, and associative computations.
Finally, an implementation of the parallel prefix operation on the twisted hypercube
is presented.
In Chapter 3, we present dilation two and expansion one embeddings of complete
binary trees and complete quad trees into twisted hypercubes. We introduce two different
schemes to embed a complete binary tree into a twisted hypercube of approximately
the same size. The first scheme uses a recursive technique to embed the complete
binary tree $CB_n$ into the twisted hypercube $TQ_n$ based on the embedding of
$CB_{n-1}$ into $TQ_{n-1}$. The second scheme uses the inorder labeling of the complete binary
tree to embed it into the twisted hypercube. Finally, a recursive scheme to embed a
complete quad tree into its optimal twisted hypercube is presented.
In Chapter 4, we present optimal algorithms for embedding a ring into a twisted
hypercube with fault-free nodes, single faulty node, and multiple faults. We show that
a twisted hypercube $TQ_n$ with $2^n$ nodes can simulate a ring with $2^n - f$ nodes in
the presence of $f$ twisted hypercube faults. We use divide-and-conquer techniques and
a new data structure called a cube to achieve our results.
Chapter 5 presents new techniques to embed a ring of size $2^n - 2f$ into a hypercube
of dimension n despite the presence of $f < 2^{n-3}$ faults. The basic idea behind our
technique is to partition the whole structure into cubes, avoid the faults within the
cubes by using unused links, and construct the whole ring by connecting adjacent
cubes. Finally, we conclude with discussion and open problems in Chapter 6.

CHAPTER 2

Parallel Computation
on the Twisted Hypercube

This chapter addresses data communication and basic parallel computations on
the twisted hypercube. It is organized as follows. Section 1 reviews some
of the work that has been done to show the capability of the twisted hypercube to perform
efficient broadcasting and routing of data. Section 2 addresses some of the basic
parallel operations on the twisted hypercube. Section 3 concludes the chapter.

2.1 Data Communication
One of the most important components of an interconnection network is its
communication mechanism. In a parallel machine, communication becomes a bottleneck
because of the large amount of time that is spent interchanging information between
different processors. It is very important to get the right data to the right place within a
reasonable time.
Broadcasting is the most essential communication operation in an interconnection
network. The height of the broadcast tree of a network is at most its diameter.
Since the twisted hypercube reduces the diameter by a factor of two, the height of its
broadcast tree is also reduced by a factor of two. The broadcast tree of any network
can be easily found by running a breadth-first algorithm. The breadth-first spanning
tree constructed by the breadth-first algorithm represents the broadcast tree of the network
[Lei]. Efe [E] and Zheng [Z] independently introduced broadcasting and routing
algorithms for the twisted hypercube. Figure 2.1 shows the broadcast tree of a twisted
hypercube for n = 3.
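
The breadth-first construction of a broadcast tree can be sketched as follows. This Python code is an illustration under stated assumptions: nodes are integers, dimensions are numbered from 1 at the least significant bit, and `twisted_neighbor`, `hypercube_neighbor`, and `broadcast_tree` are hypothetical helper names. It is a plain breadth-first search, not the broadcasting algorithm of [E] or [Z].

```python
from collections import deque


def twisted_neighbor(u, j):
    """Dimension-j neighbor (j >= 1) of node u in a twisted hypercube."""
    v = u ^ (1 << (j - 1))               # flip bit j
    for i in range((j - 1) // 2):        # apply the pair relation below bit j
        if (u >> (2 * i)) & 1:
            v ^= 1 << (2 * i + 1)
    return v


def hypercube_neighbor(u, j):
    """Dimension-j neighbor of node u in an ordinary hypercube."""
    return u ^ (1 << (j - 1))


def broadcast_tree(n, neighbor, root=0):
    """Breadth-first spanning tree as a child -> parent map, plus its height."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for j in range(1, n + 1):
            y = neighbor(x, j)
            if y not in parent:
                parent[y], depth[y] = x, depth[x] + 1
                queue.append(y)
    return parent, max(depth.values())


if __name__ == "__main__":
    print(broadcast_tree(3, hypercube_neighbor)[1])
    print(broadcast_tree(3, twisted_neighbor)[1])
```

For n = 3 and root 0, the tree has height 3 on the hypercube and height 2 on the twisted hypercube.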

Figure 2.1: Broadcasting in a twisted hypercube.

2.2 Basic Operations
This section demonstrates the ability of the twisted hypercube to perform many
of the basic operations that are needed in designing parallel algorithms. These operations
usually appear as subproblems in solving other major problems. Sorting is the
most common subtask activity performed on parallel computers. It is the heart of
many other computations. Many problems involve a sort so that later access of
information can be done efficiently. In [E], the author shows that the twisted hypercube
can reduce the communication steps for the rank sort by a factor of two. The
rank sort is considered to be the fastest sorting algorithm implemented on the hypercube
machine. Like sorting, matrix multiplication is a fundamental operation that
appears in many numerical computations. Efe [E] shows that the twisted hypercube
can reduce the communication time of the matrix multiplication algorithm by a factor
of two.

2.2.1 Associative Computations
Associative operations are used frequently and appear as subproblems in solving
other problems. They include addition, multiplication, finding the smallest, finding the
largest, and others. Let + be the addition operation on some domain X. For a given
tuple $\{x_0, x_1, \ldots, x_{k-1}\} \in X$, the addition operation is to compute the summation
$y_0 = x_0 + x_1 + \cdots + x_{k-1}$.
We assume that each processor $P_i$, $0 \le i \le 2^n - 1$, contains the value $x_i$. The
computation is considered to be complete when the final summation $y_0$ is at processor
0. The symbol $\Leftarrow_j$ denotes a data transfer from a processor to an adjacent processor
by a link through dimension j. The function BIT(j) returns the jth bit of the node’s
label. The addition operation is performed by the following algorithm.

ADDITION (X)
begin
   for all $P_i$, $0 \le i \le 2^n - 1$, do
      $y_i \leftarrow x_i$
   for $j \leftarrow n$ to 1 do
      for all $P_i$, $0 \le i \le 2^j - 1$, do
         if BIT(j) = 1
            then $temp_k \Leftarrow_j y_i$, where $P_k$ is a neighbor through dimension j.
         if BIT(j) = 0
            then $y_i \leftarrow y_i + temp_i$
      end for
   end for
end
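
The algorithm can be simulated sequentially to check the data movement. The Python sketch below is illustrative and rests on assumptions: processors are indexed by integer labels, bit j is counted from 1 at the least significant bit, the helper `twisted_neighbor` encodes the dimension-j edge of the twisted hypercube, and the assignment of the eight input values to processors is inferred here from the intermediate sums of Figure 2.2.

```python
def twisted_neighbor(u, j):
    """Dimension-j neighbor (j >= 1) of node u in a twisted hypercube."""
    v = u ^ (1 << (j - 1))               # flip bit j
    for i in range((j - 1) // 2):        # apply the pair relation below bit j
        if (u >> (2 * i)) & 1:
            v ^= 1 << (2 * i + 1)
    return v


def addition(x):
    """Simulate algorithm ADDITION; x[i] is the value held by processor P_i."""
    n = len(x).bit_length() - 1           # len(x) is assumed to be 2**n
    y = list(x)
    for j in range(n, 0, -1):
        temp = {}
        for i in range(2 ** j):
            if (i >> (j - 1)) & 1:        # BIT(j) = 1: send across dimension j
                temp[twisted_neighbor(i, j)] = y[i]
        for i in range(2 ** j):
            if not (i >> (j - 1)) & 1:    # BIT(j) = 0: accumulate
                y[i] += temp[i]
    return y


if __name__ == "__main__":
    print(addition([3, 8, 7, 2, 6, 5, 4, 9])[0])
```

After the last step the total is held by processor 0, and the number of active processors halves at each step.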

Figure 2.2 shows the addition operation on a twisted hypercube of dimension 3.
The initial value $x_i$ and the current sum $y_i$ of each node are given for each phase.
Algorithm ADDITION takes n communication steps, which is the same time it takes
to run the same procedure on a hypercube machine. However, since the height of the
broadcast tree of the twisted hypercube is reduced by a factor of two, the number of
communication steps of the addition operation can also be reduced by a factor of two
by summing along the paths of the broadcast tree ([E], [Z]). Since some of the nodes in
the broadcast tree might have up to n children, the binary adders must be replaced by
(n + 1)-adders. Figure 2.3 shows the implementation of the addition operation via the
paths of the broadcast tree. The number of communication steps is reduced from 3 to 2.

Figure 2.2: The addition operation on a twisted hypercube of dimension 3.
Node values are given as pairs $(x_i, y_i)$.
(a) Initial values: (3,3) (7,7) (4,4) (6,6) (8,8) (2,2) (9,9) (5,5)
(b) After step 1:   (3,9) (7,11) (4,4) (6,6) (8,17) (2,7) (9,9) (5,5)
(c) After step 2:   (3,20) (7,11) (4,4) (6,6) (8,24) (2,7) (9,9) (5,5)
(d) After step 3:   (3,44) (7,11) (4,4) (6,6) (8,24) (2,7) (9,9) (5,5)

Figure 2.3: The addition operation via the broadcast tree.
(a) Initial values (b) After step 1 (c) After step 2

2.2.2 Parallel Prefix
In this section, we implement the parallel prefix operation on a twisted hypercube.
The prefix operation is a very important operation that appears frequently in
designing parallel algorithms. It was first introduced by Ladner and Fischer [LF] to
solve the carry look-ahead problem for binary addition. The prefix operation was used
by many researchers to solve a variety of problems in the field of computer science. In
[Lei], the prefix operation was used to solve recurrence equations, to find convex hulls
of images, to route packets in interconnection networks, and to solve the problem of
computing carries. In [A], the prefix sum was used to solve the job sequencing problem
with deadlines and the knapsack problem. Plaxton [P] used the prefix operation to
implement a fast sorting algorithm called smooth sort, which was designed to run on
the hypercube.
Let $\oplus$ be a binary associative operation on some domain X. For a given tuple
$\{x_0, x_1, \ldots, x_{k-1}\} \in X$, the prefix problem is to compute each of the partial sums,
assuming $\oplus$ is addition, $y_i = x_0 \oplus x_1 \oplus \cdots \oplus x_i$, $0 \le i \le k - 1$. We assume that each
processor $P_i$, $0 \le i \le 2^n - 1$, contains the value $x_i$. The computation is considered to
be complete when the partial sum $y_i = x_0 \oplus x_1 \oplus \cdots \oplus x_i$ has been completed at
processor i, $0 \le i \le 2^n - 1$. The local variables $y_i$ and $t_i$ accumulate the partial and total
sums, respectively. The symbol $\Leftarrow_j$ denotes a data transfer from a processor to an
adjacent processor by a link through dimension j. The function BIT(j) returns the jth
bit of the node’s label.

PREFIX (X)
begin
   for all $P_i$, $0 \le i \le 2^n - 1$, do
      $y_i \leftarrow x_i$
      $t_i \leftarrow x_i$
   end for
   for $j \leftarrow 1$ to n do
      for all $P_i$, $0 \le i \le 2^n - 1$, do
         $temp_k \Leftarrow_j t_i$, where $P_k$ is a neighbor through dimension j.
         $t_i \leftarrow t_i \oplus temp_i$
         if BIT(j) = 1
            then $y_i \leftarrow y_i \oplus temp_i$
      end for
   end for
end
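
A sequential simulation makes the roles of the partial and total sums concrete. This Python sketch is illustrative and assumes that the operation is integer addition, that bit j is counted from 1 at the least significant bit, and that the input values are assigned to processors in an order inferred here from Figure 2.4; `twisted_neighbor` and `prefix` are hypothetical helper names.

```python
def twisted_neighbor(u, j):
    """Dimension-j neighbor (j >= 1) of node u in a twisted hypercube."""
    v = u ^ (1 << (j - 1))               # flip bit j
    for i in range((j - 1) // 2):        # apply the pair relation below bit j
        if (u >> (2 * i)) & 1:
            v ^= 1 << (2 * i + 1)
    return v


def prefix(x):
    """Simulate algorithm PREFIX with addition; returns (y, t)."""
    size = len(x)                         # assumed to be a power of two
    n = size.bit_length() - 1
    y, t = list(x), list(x)
    for j in range(1, n + 1):
        temp = [0] * size
        for i in range(size):             # every node sends its total sum
            temp[twisted_neighbor(i, j)] = t[i]
        for i in range(size):
            t[i] += temp[i]
            if (i >> (j - 1)) & 1:        # BIT(j) = 1: neighbor block precedes i
                y[i] += temp[i]
    return y, t


if __name__ == "__main__":
    y, t = prefix([3, 8, 7, 2, 6, 5, 4, 9])
    print(y)
    print(t)
```

The simulation works on the twisted hypercube because a dimension-j neighbor always lies in the other half of the same $2^j$-node block of labels, so the received total covers exactly the labels that precede or follow the node's own half.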

The algorithm runs in n time steps, where n is the dimension of
the twisted hypercube. During the jth step, each node sends its current total sum to its
adjacent node through dimension j. The partial and total sums of each node are
updated based on the value of the jth bit of its label. Figure 2.4 shows the prefix
computation on a twisted hypercube of dimension 3. The initial value $x_i$, the current
partial sum $y_i$, and the current total sum $t_i$ of each node are given for each phase.

Figure 2.4: The prefix operation on a twisted hypercube of dimension 3.
Node values are given as triples $(x_i, y_i, t_i)$.
(a) Initial values: (3,3,3) (7,7,7) (4,4,4) (6,6,6) (8,8,8) (2,2,2) (9,9,9) (5,5,5)
(b) After step 1:   (3,3,11) (7,7,9) (8,11,11) (2,9,9) (4,4,13) (9,13,13) (6,6,11) (5,11,11)
(c) After step 2:   (3,3,20) (7,18,20) (4,15,24) (6,6,24) (8,11,20) (2,20,20) (9,24,24) (5,11,24)
(d) After step 3:   (3,3,44) (7,18,44) (4,35,44) (6,26,44) (8,11,44) (2,20,44) (9,44,44) (5,31,44)

2.3 Summary
This chapter demonstrated the capabilities of the twisted hypercube as a parallel
machine to provide efficient broadcasting and routing and to perform basic parallel
computations. The communication time of several computations is reduced by a factor
of two over that of the hypercube. These include sorting, matrix multiplication,
and associative computations. Finally, an implementation of the parallel prefix
operation on the twisted hypercube is presented. At the end of the computation, each
processor will have its initial value, the partial sum, and the total sum.

CHAPTER 3

Embedding Trees
into Twisted Hypercubes

3.1 Introduction
Embedding trees into other interconnection networks has attracted the attention of
many researchers: in [BCLR], [BI], and [MS] embeddings of trees into hypercubes were
considered; in [BLD] the authors have considered embedding complete binary trees
into hypercubes; in [LEI] the authors have considered embedding binary trees into 3-D
mesh arrays; [DS] considered simulation of binary trees and X-trees on pyramid networks;
[HJ] addressed embedding quad trees into hypercubes; and [KHI] considered a
reconfigurable embedding of a complete quad tree into a faulty hypercube environment.
It is well known that, for $n \ge 3$, the complete binary tree $CB_n$ with $2^n - 1$ nodes is not
a subgraph of the hypercube $Q_n$ with $2^n$ nodes. This means that a unit dilation and unit
expansion embedding from $CB_n$ into $Q_n$ is not possible. The proof is straightforward
by the use of bipartite graphs. Both complete binary trees and hypercubes are bipartite
graphs: their nodes can be assigned two colors so that adjacent nodes are not assigned
the same color. Coloring $Q_n$ produces an equal number of nodes in each color class,
whereas coloring $CB_n$ gives an unequal number of nodes in each color class. Therefore,
$CB_n$ cannot be a subgraph of $Q_n$ since it has more nodes in one color class than the
number of nodes of $Q_n$ in the same color class, i.e., $2^{n-1} + 2^{n-3} + \cdots > 2^{n-1}$.
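
The counting argument can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, the tree is colored by depth parity, and the hypercube is colored by the parity of the number of 1 bits in a label.

```python
def color_classes_cb(n):
    """Sizes of the two color classes of CB_n (levels alternate colors)."""
    even = sum(2 ** d for d in range(0, n, 2))   # depths 0, 2, 4, ...
    odd = sum(2 ** d for d in range(1, n, 2))    # depths 1, 3, 5, ...
    return even, odd


def color_classes_q(n):
    """Q_n splits evenly by the parity of the number of 1 bits in a label."""
    return 2 ** (n - 1), 2 ** (n - 1)


def counting_argument_applies(n):
    """True when the larger class of CB_n exceeds a color class of Q_n."""
    return max(color_classes_cb(n)) > color_classes_q(n)[0]


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, color_classes_cb(n), color_classes_q(n), counting_argument_applies(n))
```

The larger class of $CB_n$ always contains the $2^{n-1}$ leaves, so the inequality holds as soon as some other level shares their color, which happens from n = 3 onward.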
The complete binary tree $CB_n$ can be embedded into $Q_n$ such that exactly one of
its edges is assigned to a path of length two in the hypercube and all other edges are
assigned to paths of length one in the hypercube. So, $CB_n$ can be embedded into $Q_n$
with dilation two and expansion one ([BCLR], [BI], [Lei], [W]). Bhatt and Ipsen [BI],
Barasch et al. [BLD], and Wu [W] gave recursive dilation two and expansion one
embeddings of complete binary trees into hypercubes based on a structure called the
two-rooted complete binary tree.
It is an open problem whether all binary trees can be embedded into their optimal
hypercube with dilation two or into their next to optimal hypercube with dilation one.
Bhatt et al. [BCLR] showed that arbitrary binary trees can be embedded into hypercubes
with constant expansion and dilation 10. The constants were subsequently
reduced by Monien and Sudborough [MS], giving a dilation 5 and expansion one
embedding and a dilation 3 and constant expansion embedding.
In this chapter, we present dilation two and expansion one embeddings of complete binary trees
and complete quad trees into twisted hypercubes ([ABa], [ABe]). The remainder of
this chapter is organized as follows. Section 2 describes two different schemes to
embed a complete binary tree into a twisted hypercube of the same size. Section 3
introduces a recursive technique to embed a quad tree into its optimal twisted hypercube.
Section 4 concludes the chapter.

3.2 Embedding Complete Binary Trees into Twisted Hypercubes
This section describes our schemes to embed a complete binary tree $CB_n$ into a
twisted hypercube $TQ_n$ with dilation two and unit expansion. In the first scheme, we
use a recursive algorithm to embed $CB_n$ into $TQ_n$ based on the embedding of $CB_{n-1}$
into $TQ_{n-1}$. In the second scheme, we use the inorder labeling to embed $CB_n$ into
$TQ_n$.

3.2.1 The Recursive Embedding
The complete binary trees $CB_1$, $CB_2$, $CB_3$, and $CB_4$ can be embedded with dilation
one into $TQ_1$, $TQ_2$, $TQ_3$, and $TQ_4$, respectively, as shown in Figure 3.1. The complete
binary tree $CB_5$ can be embedded with dilation two into $TQ_5$ as shown in Figure
3.2. For n > 5, we use a recursive algorithm to embed $CB_n$ into $TQ_n$ based on the
embedding of $CB_{n-1}$ into $TQ_{n-1}$. The base of the recursive algorithm is $CB_5$.
We proceed in four steps. In the first step, $CB_n$ is partitioned into a left complete
binary subtree $LCB_{n-1}$ with root $lr$, a right complete binary subtree $RCB_{n-1}$ with root
$rr$, and a root $r$. In the second step, $TQ_n$ is partitioned into two subcubes, $TQ^0_{n-1}$ and
$TQ^1_{n-1}$. In the third step, $LCB_{n-1}$ is embedded into $TQ^0_{n-1}$, $RCB_{n-1}$ is embedded into
$TQ^1_{n-1}$, and $r$ is embedded into the extra unused node in $TQ^1_{n-1}$. In the fourth step,
we construct $CB_n$ by joining $LCB_{n-1}$, $r$, and $RCB_{n-1}$. This is done by finding the paths
$r \sim lr$ and $r \sim rr$, each of length two. Our embedding is such that all edges in the lowest
four levels in the complete binary tree $CB_n$ are mapped to paths of length one in the
twisted hypercube $TQ_n$ and all other edges in higher levels are mapped to paths of
length two as shown in Figure 3.3. Now we present a formal description of the recursive
algorithm described above.

Figure 3.1: Embedding $CB_n$ into $TQ_n$ for n = 1, 2, 3, and 4.

Figure 3.2: Embedding $CB_n$ into $TQ_n$ for n = 5.

Algorithm 3.1

Let $\delta_i$ be the binary string of length $n$ with a 1 in position $i$ and 0 in all other positions, $\theta_k$ be the binary string of length $k$ with 0 in all positions, and $\oplus$ be the XOR operator.

For n = 1, 2, 3, and 4, a dilation one embedding is shown in Figure 3.1. For n = 5, a dilation two embedding is shown in Figure 3.2. For n > 5, the algorithm is as follows.
Step 1: Partition $CB_n$ into $LCB_{n-1}$, $RCB_{n-1}$, and $r$.
Step 2: Partition $TQ_n$ into $TQ^0_{n-1}$ and $TQ^1_{n-1}$.
Step 3: (i) Embed $LCB_{n-1}$ and $RCB_{n-1}$ into $TQ^0_{n-1}$ and $TQ^1_{n-1}$, respectively. $lr$ and $rr$ will appear at addresses $011\theta_{n-5}10$ and $111\theta_{n-5}10$, respectively.
(ii) Translate the embedding in $TQ^1_{n-1}$ by complementing the $(n-1)$th bit of each node. Formally, if a tree node was embedded at address $x$ then after the translation it will appear at address $x \oplus \delta_{n-1}$. The root $rr$ will appear at address $rr \oplus \delta_{n-1}$, i.e., $rr$ will appear at address $101\theta_{n-5}10$. The extra unused node $u$ will appear at address $u \oplus \delta_{n-1}$, i.e., $u$ will appear at address $110\theta_{n-5}10$.
(iii) Embed the root $r$ into the unused node $110\theta_{n-5}10$ in $TQ^1_{n-1}$.
Step 4: Construct $CB_n$ from $LCB_{n-1}$, $r$, and $RCB_{n-1}$ by finding the two shortest paths r~lr and r~rr, each of length two. Let $x$ and $y$ be the extra nodes that r~lr and r~rr go through, respectively. $x$ will appear at address $r \oplus \delta_n$ and $y$ will appear at address $r \oplus \delta_{n-2}$. The shortest paths from $r$ to $lr$ and from $r$ to $rr$ are $110\theta_{n-5}10 - 010\theta_{n-5}10 - 011\theta_{n-5}10$ and $110\theta_{n-5}10 - 111\theta_{n-5}10 - 101\theta_{n-5}10$, respectively.

Figure 3.3: The recursive embedding of CB into TQ.
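The address bookkeeping in Steps 3 and 4 of Algorithm 3.1 can be checked with a small Python sketch. It is only an illustration: the function names are hypothetical, and it assumes that bit positions are counted from 1 at the rightmost bit to n at the leftmost bit, which is the reading under which the addresses quoted in the algorithm agree with each other. It does not model the adjacency rules of the twisted hypercube; it only reports that consecutive path nodes differ in a single bit.

```python
def delta(i: int) -> int:
    """delta_i as an integer: a single 1 in position i (positions 1..n from the right, assumed)."""
    return 1 << (i - 1)


def root_addresses(n: int) -> dict:
    """Addresses of r, lr, rr and the intermediate nodes x, y after Step 3, for n > 5."""
    if n <= 5:
        raise ValueError("the recursive step is stated only for n > 5")
    theta = "0" * (n - 5)                                # theta_{n-5}
    lr = int("011" + theta + "10", 2)                    # left root, not translated
    rr = int("111" + theta + "10", 2) ^ delta(n - 1)     # right root after Step 3(ii)
    r = int("110" + theta + "10", 2)                     # root placed in the unused node
    x = r ^ delta(n)                                     # intermediate node on r~lr
    y = r ^ delta(n - 2)                                 # intermediate node on r~rr
    nodes = {"r": r, "lr": lr, "rr": rr, "x": x, "y": y}
    return {name: format(addr, f"0{n}b") for name, addr in nodes.items()}


def differ_in_one_bit(a: str, b: str) -> bool:
    """True when two equal-length addresses differ in exactly one position."""
    return sum(p != q for p, q in zip(a, b)) == 1
```

For n = 6 the sketch yields r = 110010, x = 010010, lr = 011010, y = 111010, and rr = 101010, matching the two paths listed in Step 4 with $\theta_1 = 0$.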

Theorem 3.1. For all $n$, Algorithm 3.1 embeds the complete binary tree $CB_n$ within the twisted hypercube $TQ_n$ with dilation two.

Proof: For $n \le 4$, the existence of an embedding with dilation one is shown in Figure 3.1. For n = 5, the existence of an embedding with dilation two is shown in Figure 3.2. For n > 5, we prove this by induction on the height of the binary tree. Our induction basis is $CB_5$. Assume the theorem is true for an embedding of $CB_{n-1}$ in $TQ_{n-1}$. We now prove that the theorem is true for the embedding of $CB_n$ into $TQ_n$. In $TQ_n$, consider the two subcubes $TQ^0_{n-1}$ and $TQ^1_{n-1}$. By induction hypothesis, there exists a dilation two embedding of $CB_{n-1}$ into each of $TQ^0_{n-1}$ and $TQ^1_{n-1}$. We assume that the two embeddings are isomorphic: one is obtained from the other by complementing the $(n-1)$th bit. Since the number of nodes in $CB_{n-1}$ is less than the number of nodes in $TQ_{n-1}$ by one, $TQ^0_{n-1}$ and $TQ^1_{n-1}$ contain two extra unused nodes located at addresses $000\theta_{n-5}10$ and $110\theta_{n-5}10$, respectively. Now we can use the extra unused node in $TQ^1_{n-1}$, the $CB_{n-1}$ of $TQ^0_{n-1}$, and the $CB_{n-1}$ of $TQ^1_{n-1}$ to construct the complete binary tree $CB_n$.

Next we prove that the dilation of this embedding is two. We use the routing algorithm of [EBSS] to show that the shortest path from the root of $CB_n$ to each of its children has length two. Let r~lr be the shortest path from the root $r$ of $CB_n$ to the root $lr$ of the left complete binary subtree $LCB_{n-1}$ and r~rr be the shortest path from the root $r$ of $CB_n$ to the root $rr$ of the right complete binary subtree $RCB_{n-1}$. $r$ will appear at address $110\theta_{n-5}10$, $lr$ at address $011\theta_{n-5}10$, and $rr$ at address $101\theta_{n-5}10$. Notice that if we group the addresses of $r$, $lr$, and $rr$ into pairs of bits, from right to left, then they are pair-related except for the leftmost three bits. By using the routing algorithm of [EBSS], the shortest paths from $110\theta_{n-5}10$ to $011\theta_{n-5}10$ and from $110\theta_{n-5}10$ to $101\theta_{n-5}10$ are $110\theta_{n-5}10 - 010\theta_{n-5}10 - 011\theta_{n-5}10$ and $110\theta_{n-5}10 - 111\theta_{n-5}10 - 101\theta_{n-5}10$, respectively. So, the dilation of this embedding is two. □
It can be proved easily that the edge congestion of this embedding is two. It is obvious that the edge congestion of the lowest four levels of the complete binary tree is one, since the dilation of the embedding there is one. In the next higher level, only two hypercube edges are used twice as shown in Figure 3.2. In all higher levels, each edge from a parent to any of its children is mapped to a path of length two. Consider the shortest paths r~lr and r~rr. The shortest path from the root $110\theta_{n-5}10$ to the left root $011\theta_{n-5}10$ is $110\theta_{n-5}10 - 010\theta_{n-5}10 - 011\theta_{n-5}10$ and from the root $110\theta_{n-5}10$ to the right root $101\theta_{n-5}10$ is $110\theta_{n-5}10 - 111\theta_{n-5}10 - 101\theta_{n-5}10$. Notice that the path r~lr uses an edge through dimension n from the root $r$ to an intermediate node $x$ and an edge through dimension (n-3) from $x$ to the left root $lr$, while the path r~rr uses an edge through dimension (n-3) from the root $r$ to an intermediate node $y$ and an edge through dimension (n-1) from $y$ to the right root $rr$. Therefore, the maximum number of times a hypercube edge is used is two. So, the edge congestion of this embedding is two.

3.2.2 The Inorder Embedding
Another way to embed the complete binary tree $CB_n$ into the twisted hypercube $TQ_n$ is the inorder labeling of the complete binary tree as shown in Figure 3.4. The nodes of the complete binary tree are numbered inorder, and each node of the complete binary tree is mapped to the node in the twisted hypercube with the corresponding address.

Figure 3.4: The inorder embedding of CB into TQ for n = 1, 2, and 3.
As illustrated in Figure 3.5, in the lowest level, each edge from a left child to its
parent is mapped to the corresponding twisted hypercube edge between the images of
the two nodes, while the edge between a right child to its parent is mapped to a path of
length two, from the right child to the left child and from the left child to the parent.
In the next level, each edge from a left child, or a right child, to its parent is mapped to
the corresponding twisted hypercube edge between the images of the two nodes. In all
higher levels, each edge from a left child, or a right child, to its parent is mapped to a
path of length two. Notice that the inorder embedding is simpler, but it is less efficient
in terms of the number of edges in the complete binary tree that are mapped to paths of
length two in the twisted hypercube.
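The inorder labeling itself is easy to sketch in Python. The sketch below is an illustration under stated assumptions: tree nodes are identified by heap indices (root 1, children 2v and 2v+1), which is a convenience not used in the text, and the inorder numbering is assumed to start at 0, which is the numbering under which the root of $CB_n$ receives an address of the form 01 followed by n-2 ones.

```python
def inorder_labels(n: int) -> dict:
    """Map each node of CB_n (heap index 1 .. 2^n - 1) to its inorder number, starting at 0."""
    labels = {}
    counter = 0

    def visit(v: int) -> None:
        nonlocal counter
        if v >= 2 ** n:          # past the lowest level of CB_n
            return
        visit(2 * v)             # left subtree first
        labels[v] = counter      # then the node itself
        counter += 1
        visit(2 * v + 1)         # then the right subtree

    visit(1)
    return labels


def address(label: int, n: int) -> str:
    """n-bit address of the twisted hypercube node that receives this label."""
    return format(label, f"0{n}b")
```

For n = 4, the root, the left root, and the right root receive the addresses 0111, 0011, and 1011, and the label 1111 is left unused.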

Theorem 3.2. For all $n$, the inorder labeling of the complete binary tree $CB_n$ embeds $CB_n$ within the twisted hypercube $TQ_n$ with dilation two.

Proof: Let $\beta_k$ be the binary string of length $k$ with 1 in all positions. For $n \le 3$, the inorder embedding is shown in Figure 3.4. For n > 3, we prove the theorem by induction on the height of the binary tree. Our induction basis is $CB_3$; a dilation two embedding of $CB_3$ into $TQ_3$ is shown in Figure 3.4. Assume the theorem is true for an embedding of $CB_{n-1}$ in $TQ_{n-1}$. We now prove that the theorem is true for the embedding of $CB_n$ in $TQ_n$. In $TQ_n$, consider the two subcubes $TQ^0_{n-1}$ and $TQ^1_{n-1}$. By induction hypothesis, we can embed $CB_{n-1}$ into each of $TQ^0_{n-1}$ and $TQ^1_{n-1}$ with dilation two. Since the number of nodes in $CB_{n-1}$ is less than the number of nodes in $TQ_{n-1}$ by one, $TQ^0_{n-1}$ and $TQ^1_{n-1}$ contain two extra unused nodes located at addresses $01\beta_{n-2}$ and $11\beta_{n-2}$, respectively. Now we can use the extra unused node in $TQ^0_{n-1}$, the $CB_{n-1}$ of $TQ^0_{n-1}$, and the $CB_{n-1}$ of $TQ^1_{n-1}$ to construct the complete binary tree $CB_n$ with $2^n - 1$ nodes.

Next we prove that the dilation of this embedding is two. We again use the routing algorithm of [EBSS] to show that the shortest path from the root of $CB_n$ to each of its children has length two. Let r~lr be the shortest path from the root $r$ of $CB_n$ to the root $lr$ of the left complete binary subtree $LCB_{n-1}$ and r~rr be the shortest path from the root $r$ of $CB_n$ to the root $rr$ of the right complete binary subtree $RCB_{n-1}$. $r$ will appear at address $01\beta_{n-2}$, $lr$ at address $00\beta_{n-2}$, and $rr$ at address $10\beta_{n-2}$. Notice that $r$, $lr$, and $rr$ are identical except for the leftmost two bits. By using the routing algorithm of [EBSS], the shortest paths from $01\beta_{n-2}$ to $00\beta_{n-2}$ and from $01\beta_{n-2}$ to $10\beta_{n-2}$ are of length two. So, the dilation of this embedding is two. □

Figure 3.5: The inorder embedding of CB into TQ.
It is obvious that the edge congestion of this embedding is two. In the lowest level, each edge from a parent to its left child is mapped to the corresponding twisted hypercube edge between the images of the two nodes, while the edge from a parent to its right child is mapped to a path of length two, from the parent to the left child and from the left child to the right child. This means that the only edge in this level that is used twice is the edge from a parent to its left child. In the next higher level, each edge from a child to its parent is mapped to the corresponding twisted hypercube edge between the images of the nodes, i.e., each edge is used exactly once. In all higher levels, each edge from a child to its parent is mapped to a path of length two. Consider the shortest paths r~lr and r~rr. Without loss of generality, consider the case when $n$ is even. The shortest path r~lr from the root $01\beta_{n-2}$ to the left root $00\beta_{n-2}$ is $01\beta_{n-2} - 00(01)^{\frac{n-2}{2}} - 00\beta_{n-2}$ and the shortest path r~rr from the root $01\beta_{n-2}$ to the right child $10\beta_{n-2}$ is $01\beta_{n-2} - 11(01)^{\frac{n-2}{2}} - 10\beta_{n-2}$, where $(01)^{\frac{n-2}{2}}$ means the repetition of the 01 pair $\frac{n-2}{2}$ times. Notice that the path r~lr uses an edge through dimension (n-1) from the root $r$ to an intermediate node $x$ and an edge through dimension (n-1) from $x$ to the left root $lr$, while the path r~rr uses an edge through dimension n from the root $r$ to an intermediate node $y$ and an edge through dimension (n-1) from $y$ to the right root $rr$. Therefore, the maximum number of times a hypercube edge is used is two. So, the edge congestion of this embedding is two.

3.3 Embedding Complete Quad Trees into Twisted Hypercubes
This section describes our scheme to embed a complete quad tree $CQ_n$ into its optimal twisted hypercube $TQ_{2n-1}$ with dilation two and expansion one. We proceed in four steps. In the first step, $CQ_n$ is partitioned into a left left complete quad tree $LLCQ_{n-1}$ with root $llr$, a left complete quad tree $LCQ_{n-1}$ with root $lr$, a right complete quad tree $RCQ_{n-1}$ with root $rr$, a right right complete quad tree $RRCQ_{n-1}$ with root $rrr$, and a root $r$ as shown in Figure 3.6. In the second step, $TQ_{2n-1}$ is partitioned into four subcubes $TQ^{00}_{2n-3}$, $TQ^{01}_{2n-3}$, $TQ^{11}_{2n-3}$, and $TQ^{10}_{2n-3}$. In the third step, $LLCQ_{n-1}$ is embedded into $TQ^{00}_{2n-3}$, $LCQ_{n-1}$ is embedded into $TQ^{01}_{2n-3}$, $RCQ_{n-1}$ is embedded into $TQ^{11}_{2n-3}$, $RRCQ_{n-1}$ is embedded into $TQ^{10}_{2n-3}$, and the root $r$ is embedded into one of the unused nodes in $TQ^{00}_{2n-3}$. In the fourth step, we construct $CQ_n$ by finding the paths r~llr, r~lr, r~rr, and r~rrr, each of length at most two. The resulting embedding is such that only 37.5% of the edges in the lowest level of the complete quad tree and 50% of the edges in higher levels are mapped to paths of length two in the twisted hypercube. The rest of the edges of the complete quad tree are mapped to paths of length one in the twisted hypercube as shown in Figure 3.7. Now we present a formal description of the recursive algorithm described above.

Figure 3.6: Partitioning CQ.
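The choice of a twisted hypercube of dimension 2n-1 can be checked by counting nodes. The following Python sketch is an illustration only, with hypothetical function names: it uses the standard node count $(4^n - 1)/3$ for a complete quad tree with n levels and searches for the smallest cube dimension that offers at least that many nodes.

```python
def quad_tree_nodes(n: int) -> int:
    """Number of nodes in a complete quad tree with n levels: 1 + 4 + ... + 4^(n-1)."""
    return (4 ** n - 1) // 3


def smallest_cube_dimension(nodes: int) -> int:
    """Smallest d such that a cube with 2^d nodes can hold the given number of nodes."""
    d = 0
    while 2 ** d < nodes:
        d += 1
    return d
```

For n = 2, 3, and 4 the tree has 5, 21, and 85 nodes and the smallest sufficient dimension is 3, 5, and 7, that is, 2n-1 in each case. For n = 1 the tree is a single node, so the count alone does not force dimension 2n-1.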

Figure 3.7: The recursive embedding of CQ into TQ.

Figure 3.8: Embedding CQ into TQ for n = 2: (a) Standard, (b) Alternate.

Algorithm 3.2

Let $\delta_i$ be the binary string of length $n$ with a 1 in position $i$ and 0 in all other positions, $\theta_k$ be the binary string of length $k$ with 0 in all positions, and $\oplus$ be the XOR operator.

For n = 1, $CQ_1$ consists of exactly one node and can be embedded into $TQ_1$ with two nodes. For n = 2, a dilation two embedding is shown in Figure 3.8. For n > 2, the algorithm is as follows.

Step 1: Partition $CQ_n$ into $LLCQ_{n-1}$, $LCQ_{n-1}$, $RCQ_{n-1}$, $RRCQ_{n-1}$, and $r$.
Step 2: Partition $TQ_{2n-1}$ into $TQ^{00}_{2n-3}$, $TQ^{01}_{2n-3}$, $TQ^{11}_{2n-3}$, and $TQ^{10}_{2n-3}$.
Step 3: (i) Embed $LLCQ_{n-1}$ into $TQ^{00}_{2n-3}$, $LCQ_{n-1}$ into $TQ^{01}_{2n-3}$, $RCQ_{n-1}$ into $TQ^{11}_{2n-3}$, and $RRCQ_{n-1}$ into $TQ^{10}_{2n-3}$. $llr$, $lr$, $rr$, and $rrr$ will appear at addresses $000\theta_{2n-4}$, $010\theta_{2n-4}$, $110\theta_{2n-4}$, and $100\theta_{2n-4}$, respectively.
(ii) Translate the embeddings in $TQ^{00}_{2n-3}$ and $TQ^{10}_{2n-3}$ by complementing the $(2n-3)$th bit of each node. Formally, if a tree node was embedded at address $x$ then after the translation it will appear at address $x \oplus \delta_{2n-3}$. After the translation the left left root $llr$ and the right right root $rrr$ will appear at addresses $001\theta_{2n-4}$ and $101\theta_{2n-4}$, respectively. Therefore, the final positions of $llr$, $lr$, $rr$, and $rrr$ are $001\theta_{2n-4}$, $010\theta_{2n-4}$, $110\theta_{2n-4}$, and $101\theta_{2n-4}$, respectively.
(iii) Embed the root $r$ into the node with label 0 in $TQ^{00}_{2n-3}$.
Step 4: Construct $CQ_n$ from $LLCQ_{n-1}$, $LCQ_{n-1}$, $RCQ_{n-1}$, $RRCQ_{n-1}$, and $r$ by finding the four paths r~llr, r~lr, r~rr, and r~rrr. The edges $r - llr$ and $r - lr$ of $CQ_n$ are mapped to paths of length one in $TQ_{2n-1}$, while the edges $r - rr$ and $r - rrr$ are mapped to paths of length two. The shortest paths from $r$ to $rr$ and from $r$ to $rrr$ are $000\theta_{2n-4} - 010\theta_{2n-4} - 110\theta_{2n-4}$ and $000\theta_{2n-4} - 100\theta_{2n-4} - 101\theta_{2n-4}$, respectively.
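The final root positions produced by Step 3 of Algorithm 3.2 can be reproduced with a short Python sketch. It is a hedged illustration: the function name is hypothetical, bit positions are assumed to be counted from 1 at the rightmost bit to 2n-1 at the leftmost bit, and the twisted hypercube's adjacency is not modeled.

```python
def quad_root_addresses(n: int) -> dict:
    """Final addresses of r, llr, lr, rr, rrr in TQ_{2n-1} for n > 2."""
    if n <= 2:
        raise ValueError("the recursive step is stated only for n > 2")
    width = 2 * n - 1
    theta = "0" * (2 * n - 4)                    # theta_{2n-4}
    delta = 1 << (2 * n - 3 - 1)                 # delta_{2n-3}, positions counted from the right
    before = {"llr": "000", "lr": "010", "rr": "110", "rrr": "100"}
    translated = {"llr", "rrr"}                  # subtrees placed in TQ^00 and TQ^10
    result = {"r": "0" * width}                  # node with label 0
    for name, prefix in before.items():
        addr = int(prefix + theta, 2)
        if name in translated:
            addr ^= delta                        # Step 3(ii): complement the (2n-3)th bit
        result[name] = format(addr, f"0{width}b")
    return result
```

For n = 3 this gives r = 00000, llr = 00100, lr = 01000, rr = 11000, and rrr = 10100, so llr and lr each differ from r in one bit while rr and rrr each differ from r in two bits.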

Theorem 3.3: For all $n$, Algorithm 3.2 embeds the complete quad tree $CQ_n$ within the twisted hypercube $TQ_{2n-1}$ with dilation two.

Figure 3.9: Embedding CQ into TQ for n = 3.

Proof: For n = 1, $CQ_1$ can be easily embedded into $TQ_1$. For n = 2, the existence of an embedding with dilation two is shown in Figure 3.8. For n > 2, we prove this by induction on the height of the complete quad tree $CQ_n$. Our induction basis is $CQ_3$; a dilation two embedding of $CQ_3$ into $TQ_5$ is shown in Figure 3.9. Assume the theorem is true for an embedding of $CQ_{n-1}$ in $TQ_{2n-3}$. We now prove that the theorem is true for the embedding of $CQ_n$ in $TQ_{2n-1}$. In $TQ_{2n-1}$, consider the four subcubes $TQ^{00}_{2n-3}$, $TQ^{01}_{2n-3}$, $TQ^{11}_{2n-3}$, and $TQ^{10}_{2n-3}$. By induction hypothesis, there exists a dilation two embedding from $CQ_{n-1}$ to each of $TQ^{00}_{2n-3}$, $TQ^{01}_{2n-3}$, $TQ^{11}_{2n-3}$, and $TQ^{10}_{2n-3}$. Since the number of nodes in $CQ_{n-1}$ is less than the number of nodes in $TQ_{2n-3}$, the subcubes $TQ^{00}_{2n-3}$, $TQ^{01}_{2n-3}$, $TQ^{11}_{2n-3}$, and $TQ^{10}_{2n-3}$ contain extra unused nodes. Now we can use the unused node with label 0 in $TQ^{00}_{2n-3}$, the $CQ_{n-1}$ of $TQ^{00}_{2n-3}$, the $CQ_{n-1}$ of $TQ^{01}_{2n-3}$, the $CQ_{n-1}$ of $TQ^{11}_{2n-3}$, and the $CQ_{n-1}$ of $TQ^{10}_{2n-3}$ to construct the complete quad tree $CQ_n$.

Next we prove that the dilation of this embedding is two. Thus, we need to show that the length of the shortest path from the root $r$ to any of its four children is at most two. Clearly, the length of the paths r~llr and r~lr is one since they are mapped to edges in the twisted hypercube. Let r~rr be the shortest path from the root $r$ of $CQ_n$ to the root $rr$ of the right complete quad subtree $RCQ_{n-1}$ and r~rrr be the shortest path from the root $r$ of $CQ_n$ to the root $rrr$ of the right right complete quad subtree $RRCQ_{n-1}$. $r$ will appear at address $000\theta_{2n-4}$, $rr$ will appear at address $110\theta_{2n-4}$, and $rrr$ will appear at address $101\theta_{2n-4}$. Notice that if we group the addresses of $r$, $rr$, and $rrr$ into pairs of bits, from right to left, then they are pair-related except for the leftmost three bits. By using the routing algorithm of [EBSS], the shortest path from $000\theta_{2n-4}$ to $110\theta_{2n-4}$ is $000\theta_{2n-4} - 010\theta_{2n-4} - 110\theta_{2n-4}$ and from $000\theta_{2n-4}$ to $101\theta_{2n-4}$ is $000\theta_{2n-4} - 100\theta_{2n-4} - 101\theta_{2n-4}$. So, the length of each of the paths r~rr and r~rrr is two. Therefore, the dilation of this embedding is two. □
It can be proved easily that the edge congestion of this embedding is two. It is obvious that the edge congestion of the lowest level of the complete quad tree is two, since one of the hypercube edges has to be used twice as shown in Figure 3.8. In all higher levels, each edge from a parent to any of its left children is mapped to a path of length one, while an edge from a parent to any of its right children is mapped to a path of length two. Clearly, the edge congestion of the paths r~llr and r~lr is one since their dilation is one. Now, consider the paths r~rr and r~rrr. The shortest path from the root $000\theta_{2n-4}$ to the right root $110\theta_{2n-4}$ is $000\theta_{2n-4} - 010\theta_{2n-4} - 110\theta_{2n-4}$ and from the root $000\theta_{2n-4}$ to the right right root $101\theta_{2n-4}$ is $000\theta_{2n-4} - 100\theta_{2n-4} - 101\theta_{2n-4}$. Notice that the path r~rr uses an edge through dimension (n-1) from the root $r$ to an intermediate node $x$ and an edge through dimension n from $x$ to the right root $rr$, while the path r~rrr uses an edge through dimension n from the root $r$ to an intermediate node $y$ and an edge through dimension (n-3) from $y$ to the right right root $rrr$. Therefore, the maximum number of times a hypercube edge is used is two. So, the edge congestion of this embedding is two.

3.4 Summary
In this chapter, two different schemes were used to embed the complete binary tree $CB_n$ into the twisted hypercube $TQ_n$. In the first scheme, we used a recursive algorithm to embed $CB_n$ into $TQ_n$ based on the embedding of $CB_{n-1}$ into $TQ_{n-1}$. The resulting embedding is such that all edges in the lowest four levels of the complete binary tree are mapped to paths of length one in the twisted hypercube and all other edges in higher levels of the complete binary tree are mapped to paths of length two in the twisted hypercube. In the second scheme, we used the inorder binary labeling of the complete binary tree $CB_n$ to embed $CB_n$ into the twisted hypercube $TQ_n$. The inorder embedding is simpler and more natural than the recursive embedding, but it is less efficient in terms of the number of edges that are mapped to paths of length two.
For complete quad trees, we used a recursive algorithm that embeds $CQ_n$ into $TQ_{2n-1}$ based on the embedding of $CQ_{n-1}$ into $TQ_{2n-3}$. The resulting embedding is such that 37.5% of the edges in the lowest level and 50% of the edges in higher levels of the complete quad tree are mapped to paths of length two in the twisted hypercube and the rest of the edges are mapped to paths of length one.

CHAPTER 4

Embedding Rings into Faulty Twisted Hypercubes

4.1 Introduction
The ability of a network to simulate, compute, route, and reconfigure itself despite the presence of faults is an important issue in parallel processing. The twisted hypercube was proposed as an alternative to the hypercube. One of the important features of the hypercube is its ability to simulate other networks in the presence of faults. If the twisted hypercube is considered as an alternative, it is necessary to show that its performance in the presence of faults is at least as good as that of the hypercube.
Rosenberg and Snyder [RS] showed that given any ring and any connected graph of the same size, the ring can be embedded into the graph with dilation cost $\le 3$. They also proved that this bound is optimal. It is well known that rings can be embedded into hypercubes with dilation one using cyclic Gray Codes. Saad and Schultz [SS] used Gray Codes to embed a ring of size $l$ into a hypercube of size $2^n$ with dilation one when $l$ is even and $4 \le l \le 2^n$. Latifi and Zheng [LZ] generalized the cyclic Gray
Code method to embed rings into twisted hypercubes. They identified n\ distinct
n\
Hamiltonian paths and — + (n - 2 ) ! distinct Hamiltonian circuits in a twisted hypercube.

Embedding rings into hypercubes in the presence of faults has been addressed by many researchers. Provost and Melhem [PM] have given distributed algorithms that work despite single, double, and multiple faults, wasting up to 50% of the processors in the worst case. Chan and Lee [CL] improved the previous result by wasting only one nonfaulty processor for every faulty processor with some restriction on the number of faults. In this chapter, we consider the problem of embedding rings into twisted hypercubes in the presence of single and multiple faulty processors [ABb].
The remainder of this chapter is organized as follows. In Section 2, we describe our schemes to embed a ring of size $2^n$ into a fault-free twisted hypercube of the same size. Section 3 addresses embeddings in the presence of faulty nodes. Our emphasis will be on the multiple fault case. Section 4 concludes the chapter.

4.2 Fault-Free Embeddings
Given a ring $R_{2^n}$ with $2^n$ nodes, consider the problem of assigning the ring nodes to the nodes of the twisted hypercube such that adjacency is preserved. That is, given any two adjacent nodes in the ring, their images by this embedding should be neighbors in the twisted hypercube through some dimension $i$, where $1 \le i \le n$. We can view such an embedding as a sequence of dimensions crossed by adjacent nodes. Let us call such a sequence the embedding sequence, denoted by $ES = (d_1, d_2, ..., d_{2^n})$, where $d_i \in \{1, ..., n\}$ for all $1 \le i \le 2^n$.

Figure 4.1: The embedding sequence: (a) Type A, (b) Type B.

4.2.1 The Embedding Sequence
Figure 4.1 shows an embedding of the ring $R_{2^3}$ into the twisted hypercube $TQ_3$. It is more convenient to view the embedded ring as well as the twisted hypercube in the way shown in Figure 4.1. All twisted hypercube nodes with even labels are in the upper level and all nodes with odd labels are in the lower level. The embedding sequence of $R_{2^3}$ is ES = (1, 3, 1, 2, 1, 3, 1, 2). For example, in Figure 4.1.a, notice that nodes $v_1$ and $v_2$ are connected by a link through dimension 1, $v_2$ and $v_3$ are connected by a link through dimension 3, $v_3$ and $v_4$ are connected by a link through dimension 1, $v_4$ and $v_5$ are connected by a link through dimension 2, and so on. The embedding sequence ES can be generated using the following algorithm.

Algorithm 4.1
Let n be the dimension of the twisted hypercube and let the vertical bar be the concatenation operator.
Step 1: ES <- 1
Step 2: For i <- 3 to n do
ES <- ES | i | ES
Step 3: ES <- ES | 2 | ES | 2
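Algorithm 4.1 translates almost line for line into Python. The sketch below is a minimal illustration with a hypothetical function name; it represents the embedding sequence as a list of dimensions and assumes n is at least 2, since Step 3 always uses dimension 2.

```python
def embedding_sequence(n: int) -> list:
    """Embedding sequence ES of Algorithm 4.1 for a twisted hypercube of dimension n."""
    if n < 2:
        raise ValueError("dimension 2 is used in Step 3, so n must be at least 2")
    es = [1]                          # Step 1
    for i in range(3, n + 1):         # Step 2: ES <- ES | i | ES
        es = es + [i] + es
    return es + [2] + es + [2]        # Step 3: ES <- ES | 2 | ES | 2
```

Calling `embedding_sequence(3)` returns `[1, 3, 1, 2, 1, 3, 1, 2]`, and the result for any n has $2^n$ entries, one per ring link.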

The embedding sequence is generated by applying Algorithm 4.1 on n, where n is the dimension of the twisted hypercube. The number of nodes in the twisted hypercube is equal to the number of nodes in the embedded ring, which is $2^n$ nodes. Thus, the embedding sequence of the ring $R_{2^4}$ is ES = (1, 3, 1, 4, 1, 3, 1, 2, 1, 3, 1, 4, 1, 3, 1, 2).

Theorem 4.1: For every n, Algorithm 4.1 will generate the embedding sequence to construct a ring of size $2^n$ in a fault-free twisted hypercube of dimension n.

Proof: We prove this by induction on the dimension of the twisted hypercube. Our induction basis is $TQ_2$: a ring of size 4 can be easily constructed in $TQ_2$ using the embedding sequence ES = (1, 2, 1, 2). Assume the theorem is true for the construction of a ring of size $2^{n-1}$ in a twisted hypercube of dimension n-1. We now prove that the theorem is true for the construction of $R_{2^n}$ in $TQ_n$. Consider the two twisted subcubes $TQ^0_{n-1}$ and $TQ^1_{n-1}$. By induction hypothesis, we can construct a ring of size $2^{n-1}$ in both $TQ^0_{n-1}$ and $TQ^1_{n-1}$. Let their embedding sequence be

$ES = S_{n-1} \,|\, 2 \,|\, S_{n-1} \,|\, 2$

where $S_{n-1}$ is a sequence of dimensions recursively defined as follows:

$S_2 = 1$
$S_{n-1} = S_{n-2} \,|\, n-1 \,|\, S_{n-2}$

Now we combine two rings, each of size $2^{n-1}$, to come up with a ring of size $2^n$. This is done by replacing the first link that goes through dimension 2 of the first ring and the second link that goes through dimension 2 of the second ring by two links that go through dimension n. The embedding sequence of the new ring $R_{2^n}$ is

$ES = S_{n-1} \,|\, n \,|\, S_{n-1} \,|\, 2 \,|\, S_{n-1} \,|\, n \,|\, S_{n-1} \,|\, 2$
$= S_n \,|\, 2 \,|\, S_n \,|\, 2$

which is the same embedding sequence generated by Algorithm 4.1. □
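The sequence identity used at the end of the proof can be checked mechanically. The Python sketch below is an illustration with hypothetical function names; it assumes the recursion is indexed so that $S_2 = (1)$ and $S_k = S_{k-1} \,|\, k \,|\, S_{k-1}$ for k > 2, which is the reading that reproduces the sequence (1, 3, 1, 2, 1, 3, 1, 2) for dimension 3.

```python
def S(k: int) -> list:
    """S_2 = (1); S_k = S_{k-1} | k | S_{k-1} for k > 2 (assumed indexing)."""
    if k < 2:
        raise ValueError("S_k is defined for k >= 2")
    if k == 2:
        return [1]
    prev = S(k - 1)
    return prev + [k] + prev


def ring_sequence(n: int) -> list:
    """S_n | 2 | S_n | 2, the sequence of a ring in a cube of dimension n."""
    s = S(n)
    return s + [2] + s + [2]


def spliced_sequence(n: int) -> list:
    """Two rings of dimension n-1 joined by two links through dimension n."""
    s = S(n - 1)
    return s + [n] + s + [2] + s + [n] + s + [2]
```

For every n from 3 upward that is tried, `spliced_sequence(n)` equals `ring_sequence(n)`, which is the equality stated in the last step of the proof.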

Notice that the same embedding sequence may result in different embeddings of $R_{2^n}$ into $TQ_n$ depending on the twisted hypercube node that initiates the ring construction. Among all different embeddings, we are interested in two kinds. The first embedding is when the node that initiates the ring construction in the twisted hypercube is the upper leftmost node, the node with label 0. The second embedding is when the node that initiates the ring construction in the twisted hypercube is the lower leftmost node, the node with label 1. Let us call the first embedding type A embedding and the second embedding type B embedding. Figure 4.2 shows both type A and type B embeddings for the ring $R_{2^4}$ into the twisted hypercube $TQ_4$.

Figure 4.2: Fault-free embedding: (a) Type A embedding, (b) Type B embedding.

4.2.2 Divide-Conquer Embeddings
This section introduces a data structure called a cube that is fundamental to the embeddings given in this chapter. A cube is a twisted subcube of dimension 3 that consists of two adjacent blocks as shown in Figure 4.3.b. A block is a set of four nodes in a twisted hypercube that form a ring of size 4 that has the embedding sequence ES = (1, 2, 1, 2) as shown in Figure 4.3.a. Notice that cubes overlap while blocks do not, and a twisted hypercube of dimension n, $TQ_n$, contains $2^{n-2}$ cubes and $2^{n-2}$ blocks. A ring of size 8, $R_{2^3}$, can be embedded into a cube. In a cube, if we use the twisted lower links that go through dimension 3 to connect the two blocks, after removing the lower two links that go through dimension 2, then the embedding is of type A, and if we use the upper links that go through dimension 3 to connect the two blocks, after removing the upper two links that go through dimension 2, then the embedding is of type B as shown in Figure 4.1. The cube is used in this section to introduce a new technique to embed a ring into a twisted hypercube. In the next section, this technique is generalized to embed a ring into a faulty twisted hypercube.

Figure 4.3: Blocks and cubes: (a) The block, (b) The cube.

Now, given a ring $R_{2^n}$, we can embed it into the twisted hypercube $TQ_n$ by the following algorithm.

Algorithm 4.2
Step 1: Partition $TQ_n$ into $2^{n-3}$ node disjoint cubes.
Step 2: Embed the ring $R_{2^3}$ into each cube using type A, or type B, embedding.
Step 3: Connect the $2^{n-3}$ rings, each of size 8, through the upper links, or the twisted lower links, to come up with type A, or type B, embedding.

Theorem 4.2: For every n, Algorithm 4.2 will embed a ring of size $2^n$ in a fault-free twisted hypercube of dimension n.

Proof: We consider only type A embedding. Type B embedding can be proved in a similar way. We prove this by induction on the dimension of the twisted hypercube. Our induction basis is $TQ_3$; the embedding of a ring of size $2^3$ into a twisted hypercube of dimension 3 is shown in Figure 4.1.a. Assume the theorem is true for the construction of a ring of size $2^{n-1}$ in a twisted hypercube of dimension n-1. We now prove that the theorem is true for the construction of $R_{2^n}$ in $TQ_n$. Consider the two twisted subcubes $TQ^0_{n-1}$ and $TQ^1_{n-1}$. By assumption we can construct a ring of size $2^{n-1}$ in both $TQ^0_{n-1}$ and $TQ^1_{n-1}$. Now we combine two rings, each of size $2^{n-1}$, to come up with a ring of size $2^n$. This is done by replacing the first link that goes through dimension 2 of the first ring and the second link that goes through dimension 2 of the second ring by two upper links that go through dimension n. □

In the next section, we will use the same concept with minor variations to embed a ring into a faulty twisted hypercube without wasting any nonfaulty nodes.

4.3 Fault-Tolerant Embeddings
One of the significant features of the hypercube is its capability to simulate other interconnection networks in the presence of faults. Accordingly, if the twisted hypercube is to be considered as an alternative, it is necessary to show that it is at least as good as the hypercube regarding fault-tolerance. In this section, we are interested in answering the following question. Given that some faults are present, does the twisted hypercube have the ability to simulate rings efficiently? Like the hypercube, the twisted hypercube is maximally fault-tolerant. While even one faulty processor in the twisted hypercube will degrade its overall performance, it is still capable of simulating rings without wasting any nonfaulty nodes. In the hypercube, a nonfaulty node has to be wasted for every faulty node [CL]. In the next section, we extend Algorithm 4.2 to handle a single faulty node.

4.3.1 Embedding in the Presence of a Single Fault
The idea behind our technique to embed a ring into a faulty twisted hypercube is to use some of the unused links to skip a faulty node. As mentioned in the previous section, Figure 4.1 shows two kinds of embeddings of a ring $R_{2^3}$ into a twisted hypercube $TQ_3$. Notice that some of the links are not part of the embedding. As an illustration, in Figure 4.1.a, the links between nodes $v_1$ and $v_5$ through dimension 3, $v_2$ and $v_7$ through dimension 2, $v_3$ and $v_6$ through dimension 2, and $v_4$ and $v_8$ through dimension 3 are unused links. We can use these unused links to avoid a faulty node. Therefore, if node $v_i$ in a twisted hypercube $TQ_n$ is faulty, a ring $R_{2^n-1}$ can be constructed by using some of the unused links to skip the faulty node without disturbing the construction of the rest of the ring. A faulty node is either an upper node or a lower node.
The basic idea behind our technique is to identify the faulty node and the cube that contains it, then avoid the fault by using the unused links. Figure 4.4 shows all possible locations of a faulty node within a cube and the links that need to be used to avoid it in the process of constructing the ring. Part (a) shows how to handle an upper faulty node, while part (b) shows how to handle a lower faulty node. Notice that part (a) simulates type B embedding within a cube: since it does not disturb the construction of the rest of the ring, the twisted lower links can be used to connect it with adjacent rings when type B embedding is used. On the other hand, part (b) simulates type A embedding within a cube: since it does not disturb the construction of the rest of the ring, the upper links can be used to connect it with adjacent rings when type A embedding is used. Figure 4.5.a shows how to handle an upper faulty node, while Figure 4.5.b shows how to handle a lower faulty node. Notice that the upper faulty node is in the second cube, while the lower faulty node is in the first cube. The location of the cube that contains the faulty node might be the first, the last, or somewhere in between. Our technique works for all three cases by using the appropriate links. The following algorithm embeds a ring $R_{2^n-1}$ into a twisted hypercube $TQ_n$ in the presence of a faulty node.

Figure 4.4: All possible locations of a faulty node: (a) Upper faulty node, (b) Lower faulty node.

Algorithm 4.3
Step 1: Partition $TQ_n$ into $2^{n-3}$ node disjoint cubes.
Step 2: Locate the cube that contains the faulty node and identify whether it is an upper or a lower node.
Step 3: (i) If it is an upper node then
a. Choose the appropriate embedding from Figure 4.4.a.
b. Embed the ring $R_{2^3}$ into each of the fault-free cubes using type B embedding.
c. Connect all the rings, one of size 7 and the rest of size 8, using the twisted lower links to come up with the ring $R_{2^n-1}$.
(ii) If it is a lower node then
a. Choose the appropriate embedding from Figure 4.4.b.
b. Embed the ring $R_{2^3}$ into each of the fault-free cubes using type A embedding.
c. Connect all the rings, one of size 7 and the rest of size 8, using the upper links to come up with the ring $R_{2^n-1}$.

Figure 4.5: Single fault embedding: (a) Upper faulty node, (b) Lower faulty node.

Theorem 4.3: For every n, Algorithm 4.3 will embed a ring of size $2^n - 1$ into a twisted hypercube of dimension n in the presence of a faulty node.

The theorem can be proved easily by extending the proof of Theorem 4.2. In the next section, we will use the same concept with minor variations to embed a ring into a faulty twisted hypercube with multiple faults.

4.3.2 Embedding in The Presence of Multiple Faults
In this section, we describe our scheme to embed a ring $R_{2^n-f}$, where $f$ is the number of faults, into a twisted hypercube $TQ_n$ in the presence of $f$ faults such that each cube has at most one faulty node. A cube might be an overlap cube as shown in Figure 4.3.b. The maximum number of faults that can be handled by our technique is $f = 2^{n-3}$. The idea is to generalize Algorithm 4.3 to handle multiple faults. The following algorithm embeds a ring $R_{2^n-f}$ into a twisted hypercube $TQ_n$ in the presence of $f$ faults.

Algorithm 4.4
Step 1: Partition $TQ_n$ into $2^{n-2}$ blocks.
Step 2: Identify the blocks with faulty nodes.
Step 3: Group each faulty block with the adjacent unfaulty block to its left to form a faulty cube.
Step 4: Embed a ring of size 7 into each of the faulty cubes by choosing an appropriate embedding from Figure 4.4 and embed a ring of size 4 into each of the remaining blocks.
Step 5: Construct a ring of size $2^n - f$ by connecting the rings, either $R_7$ or $R_4$, using the appropriate links, either upper links or twisted lower links as shown in Figure 4.6.
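The grouping in Steps 1 to 4 and the resulting ring size can be sketched in Python. This is a simplified illustration, not the embedding itself: the function names are hypothetical, blocks are assumed to be indexed 0 to $2^{n-2} - 1$ from left to right, each faulty block is assumed to hold exactly one faulty node, and the choice of links in Step 5 is not modeled.

```python
def group_blocks(n: int, faulty_blocks) -> list:
    """Group blocks into faulty cubes (ring size 7) and plain blocks (ring size 4)."""
    num_blocks = 2 ** (n - 2)
    faulty = set(faulty_blocks)
    if any(not 0 <= b < num_blocks for b in faulty):
        raise ValueError("block index out of range")
    if 0 in faulty:
        raise ValueError("the leftmost block is assumed to be fault-free")
    if any(b - 1 in faulty for b in faulty):
        raise ValueError("two adjacent faulty blocks would put two faults in one cube")
    pieces = []
    i = 0
    while i < num_blocks:
        if i + 1 in faulty:
            # Step 3: a faulty block joins the unfaulty block to its left.
            pieces.append(("cube", (i, i + 1), 7))
            i += 2
        else:
            pieces.append(("block", (i,), 4))
            i += 1
    return pieces


def ring_size(pieces) -> int:
    """Total number of ring nodes contributed by all pieces."""
    return sum(size for _, _, size in pieces)
```

For n = 5 and a single faulty block with index 2, the pieces contribute 4 + 7 + 5 * 4 = 31 nodes, which is $2^5 - 1$. With faulty blocks 1, 3, 5, and 7 every block belongs to a faulty cube and the ring has 28 nodes, the case $f = 2^{n-3}$.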

Theorem 4.4: For every n, Algorithm 4.4 will embed a ring of size $2^n - f$ into a twisted hypercube of dimension n in the presence of $f$ faulty nodes such that each cube has at most one faulty node.

Figure 4.6: Multiple faults embedding.

Proof: Without loss of generality, we assume that the leftmost block has no faulty node. The existence of an adjacent unfaulty block to the left of any faulty block follows directly from our assumption that each cube has at most one faulty node. In the process of constructing the ring $R_{2^n-f}$, any two adjacent cubes with a fault are one of the following cases:

Case 1: A cube with upper fault followed by a cube with upper fault.
Case 2: A cube with upper fault followed by a cube with lower fault.
Case 3: A cube with lower fault followed by a cube with lower fault.
Case 4: A cube with lower fault followed by a cube with upper fault.

Figure 4.7 shows all four cases in the process of constructing the ring. We use the twisted lower links with an upper faulty cube followed by either an upper or a lower faulty cube and the upper links with a lower faulty cube followed by either a lower or an upper faulty cube. The way we grouped the faulty blocks with unfaulty blocks to form cubes always guarantees the existence of such links. The other cases are an upper or a lower faulty cube followed by a block and a block followed by a block or a faulty cube. We use the twisted lower links with an upper faulty cube followed by a block and the upper links with a lower faulty cube followed by a block. For the case of a block followed by a block or a faulty cube, we use the appropriate links, either upper or twisted lower links, since both are available. □

Figure 4.7: All possible cases of two adjacent faulty cubes: (a) upper followed by
upper, (b) upper followed by lower, (c) lower followed by lower, (d) lower followed by
upper.

4.4 Summary
In this chapter, we presented optimal algorithms for embedding a ring into a
twisted hypercube with fault-free nodes, a single faulty node, and multiple faults. We
showed the capability of the twisted hypercube to simulate rings efficiently in the
presence of faults. While even one faulty processor will degrade its overall performance,
as in any other network, the twisted hypercube is still capable of constructing a
Hamiltonian circuit within the nonfaulty processors.
A twisted hypercube $TQ_n$ with $2^n$ nodes can simulate a ring $R_{2^n-f}$ with $2^n - f$
nodes in the presence of f faulty nodes, with some restrictions on
the location of the faults. In the hypercube, the simulation of rings is achieved by
wasting a nonfaulty processor for every faulty processor. The simulation of rings by the
twisted hypercube is more efficient since it is achieved without wasting any nonfaulty
processors.
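As a small worked illustration of this comparison, take n = 5 and f = 4. These values are chosen here only as an example; they satisfy the bound $f \le 2^{n-3}$ stated for both techniques. The two ring sizes are then:

```latex
\[
\underbrace{2^n - f}_{\text{twisted hypercube } TQ_5} = 32 - 4 = 28,
\qquad
\underbrace{2^n - 2f}_{\text{hypercube } Q_5} = 32 - 8 = 24 .
\]
```

The difference of f = 4 nodes is exactly the number of nonfaulty processors wasted in the hypercube case.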

CHAPTER 5

Fault-Tolerance Embedding
of Rings into Hypercubes

5.1 Introduction
The hypercube has been the focus of many recent research activities. Extensive
work has been done to show that the hypercube is a powerful architecture capable of
simulating other interconnection networks such as rings, meshes, trees, stars, and others
with minimum overhead ([BCGS], [BCLR], [BMS], [BSu], [MS], [SS], [Lei]). It
has also been shown that the hypercube machine is robust and fault-tolerant and has
the ability to simulate, route, and reconfigure itself despite the presence of either faulty
links or nodes ([BS], [CL], [HLNa], [HLNb], [PM], [WCM]).
The problem of embedding rings into other interconnection networks has been
addressed by many researchers. Rosenberg and Snyder [RS] addressed the problem of
embedding rings into general graphs. They showed a dilation 3 embedding of a ring
into a general graph of the same size. In [JLD] and [NSK], the authors considered
embedding cycles, rings, and Hamiltonian cycles into star networks. Saad and Schultz [SS]
used Gray codes to show the existence of a Hamiltonian circuit in a hypercube
structure. Chen and Shin [CS] used Gray codes to identify n! distinct Hamiltonian paths in
a hypercube network.

The problem of embedding rings into hypercubes despite the presence of faults has been
addressed by many researchers. Provost and Melhem [PM] have given distributed
algorithms in the presence of single, double, and multiple faults, wasting up to 50% of
the processors in the worst case. Chan and Lee [CL] improved the result by wasting
only one nonfaulty processor for every faulty processor and allowing up to
$\lfloor (n+1)/2 \rfloor$ faults. This chapter uses a new technique to embed a ring of size
$2^n - 2f$ into a hypercube of dimension n despite the presence of f faults. It wastes only
one nonfaulty processor for every faulty processor and allows up to $2^{n-3}$ faults with
some restriction on the location of the faults [ABd].
The remainder of this chapter is organized as follows. In Section 5.2, we describe
our scheme to embed a ring of size $2^n$ into a fault-free hypercube of the same size.
Section 5.3 addresses embedding in the presence of faulty nodes. Our emphasis will be
on the multiple fault case. Section 5.4 concludes the chapter.

5.2 Fault-Free Embeddings
Given a ring $R_{2^n}$ with $2^n$ nodes, consider the problem of assigning the ring
nodes to the nodes of the hypercube such that adjacency is preserved. In the hypercube,
two nodes are adjacent if the binary representations of their labels differ in
exactly one bit position, say in position i. We call the link that connects the two
adjacent nodes a link through dimension i. The least significant bit in the binary
representation of a label is referred to as position 1 and the most significant bit as position n.
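The adjacency rule can be sketched in a few lines of Python. The function name `link_dimension` is a hypothetical helper introduced here for illustration; it assumes node labels are nonnegative integers and numbers bit positions from 1 at the least significant bit, as in the text.

```python
from typing import Optional


def link_dimension(u: int, v: int) -> Optional[int]:
    """Return the dimension (1 = least significant bit) of the hypercube link
    joining labels u and v, or None if the labels are not adjacent."""
    diff = u ^ v
    # Adjacent labels differ in exactly one bit, so diff must be a power of two.
    if diff == 0 or diff & (diff - 1):
        return None
    return diff.bit_length()


print(link_dimension(0b000, 0b001))  # 1
print(link_dimension(0b011, 0b111))  # 3
print(link_dimension(0b000, 0b011))  # None, the labels differ in two bits
```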

Figure 5.1: The embedding sequence. Node labels in the figure: v1, v8, v7, v6 (first
row) and v2, v3, v4, v5 (second row).

Now given any two adjacent nodes in the ring, their images by this embedding should
be neighbors in the hypercube through some dimension i, where $1 \le i \le n$. We can
view such an embedding as a sequence of dimensions crossed by adjacent nodes. We
call such a sequence the embedding sequence, denoted by $ES = (d_1, d_2, \ldots, d_{2^n})$,
where $d_i \in \{1, \ldots, n\}$ for all $1 \le i \le 2^n$.
Figure 5.1 shows an embedding of the ring $R_{2^3}$ into the hypercube $Q_3$. It is more
convenient to view the embedded ring as well as the hypercube in the way shown in
Figure 5.1. We view the hypercube as two levels where all nodes with even labels are
in the upper level and all nodes with odd labels are in the lower level. The embedding
sequence of $R_{2^3}$ is ES = (1, 2, 3, 2, 1, 2, 3, 2). For example, in Figure 5.1, notice that
nodes v1 and v2 are connected by a link through dimension 1, v2 and v3 are connected
by a link through dimension 2, v3 and v4 are connected by a link through dimension 3,
v4 and v5 are connected by a link through dimension 2, and so on. The embedding
sequence ES can be generated using the following algorithm.

Algorithm 5.1

Let n be the dimension of the hypercube and let the vertical bar be the concatenation
operator.
Step 1: ES ← 2
Step 2: For i ← 3 to n do
            ES ← ES | i | ES
Step 3: ES ← 1 | ES | 1 | ES

The embedding sequence is generated by applying Algorithm 5.1 on n, where n
is the dimension of the hypercube. The number of nodes in the hypercube is equal to
the number of nodes in the embedded ring, which is $2^n$ nodes. Thus, the embedding
sequence of the ring $R_{2^4}$ is ES = (1, 2, 3, 2, 4, 2, 3, 2, 1, 2, 3, 2, 4, 2, 3, 2) and the
embedding sequence of the ring $R_{2^5}$ is ES = (1, 2, 3, 2, 4, 2, 3, 2, 5, 2, 3, 2, 4, 2, 3, 2,
1, 2, 3, 2, 4, 2, 3, 2, 5, 2, 3, 2, 4, 2, 3, 2). Notice that the same embedding sequence
may result in different embeddings of $R_{2^n}$ into $Q_n$ depending on the hypercube node
that initiates the ring construction. Among all different embeddings, we are interested
in the embedding where the node that initiates the ring construction in the hypercube
is the node with label 0.
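Algorithm 5.1 translates almost directly into Python. The sketch below is illustrative only: the function names `embedding_sequence` and `ring_from_sequence` are hypothetical, and the second function assumes that crossing dimension d means flipping bit d of the current label (with d = 1 the least significant bit), starting from node 0 as in the text.

```python
def embedding_sequence(n: int) -> list[int]:
    """Embedding sequence ES of Algorithm 5.1 for a hypercube of dimension n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    es = [2]                              # Step 1
    for i in range(3, n + 1):             # Step 2
        es = es + [i] + es
    return [1] + es + [1] + es            # Step 3


def ring_from_sequence(es: list[int], start: int = 0) -> list[int]:
    """Follow ES from `start`, flipping bit d (1 = least significant) at each step."""
    ring = [start]
    for d in es[:-1]:                     # the last dimension closes the ring
        ring.append(ring[-1] ^ (1 << (d - 1)))
    # The closing link must lead back to the start node.
    assert ring[-1] ^ (1 << (es[-1] - 1)) == start
    return ring


print(embedding_sequence(3))                       # [1, 2, 3, 2, 1, 2, 3, 2]
print(ring_from_sequence(embedding_sequence(3)))   # [0, 1, 3, 7, 5, 4, 6, 2]
```

The second printed list gives the hypercube labels of v1 through v8 under these assumptions.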

Theorem 5.1: For every n, Algorithm 5.1 will generate the embedding sequence to
construct a ring of size $2^n$ in a fault-free hypercube of dimension n.

Proof: We prove this by induction on the dimension of the hypercube. Our induction
basis is $Q_2$; a ring of size 4 can be easily constructed in $Q_2$ using the embedding
sequence ES = (1, 2, 1, 2). Assume the theorem is true for the construction of a ring
of size $2^{n-1}$ in a hypercube of dimension n-1. We now prove that the theorem is true
for the construction of $R_{2^n}$ in $Q_n$. Consider the two subcubes $Q^0_{n-1}$ and $Q^1_{n-1}$. By
induction hypothesis, we can construct a ring of size $2^{n-1}$ in both $Q^0_{n-1}$ and $Q^1_{n-1}$. Let
their embedding sequence be
    $ES = 1 | S_{n-1} | 1 | S_{n-1}$
where $S_n$ is a sequence of dimensions recursively defined as follows:
    $S_2 = 2$
    $S_n = S_{n-1} | n | S_{n-1}$
Now we combine two rings, each of size $2^{n-1}$, to come up with a ring of size $2^n$. This
is done by replacing the second link that goes through dimension 1 of the first ring and
the first link that goes through dimension 1 of the second ring by two links that go
through dimension n. The embedding sequence of the new ring $R_{2^n}$ is
    $ES = 1 | S_{n-1} | n | S_{n-1} | 1 | S_{n-1} | n | S_{n-1}$
    $\;\;\;\; = 1 | S_n | 1 | S_n$
which is the same embedding sequence generated by Algorithm 5.1. □

5.2.1 Divide-Conquer Embeddings
This section introduces a data structure, called a cube, that is fundamental to the
embeddings given in this chapter. A cube is a subcube of dimension 3 that consists
of two adjacent blocks as shown in Figure 5.2. A block is a set of four nodes in a
hypercube that form a ring of size 4 that has the embedding sequence ES = (1, 2, 1, 2).
Notice that cubes overlap and a hypercube of dimension n, $Q_n$, contains $2^{n-2}$ cubes. A
ring of size 8, $R_{2^3}$, can be embedded into a cube by the embedding sequence ES = (1,
2, 3, 2, 1, 2, 3, 2). The cube is used to introduce new techniques to embed a ring into a
twisted hypercube. These new techniques are generalized in later sections to embed a
ring into a faulty twisted hypercube. The next algorithm uses a divide-conquer
technique to embed a ring $R_{2^n}$ into a hypercube $Q_n$.

Figure 5.2: The cube.

Algorithm 5.2
Step 1: Partition $Q_n$ into $2^{n-3}$ node disjoint cubes.
Step 2: Embed the ring $R_{2^3}$ into each cube using the embedding sequence ES = (1, 2,
3, 2, 1, 2, 3, 2).
Step 3: Connect the $2^{n-3}$ rings, each of size 8, through the upper, or lower, links to
come up with the ring $R_{2^n}$.

Theorem 5.2: For every n, Algorithm 5.2 will embed a ring of size $2^n$ in a fault-free
hypercube of dimension n.

Proof: We prove this by induction on the dimension of the hypercube. Our induction
basis is $Q_3$; the embedding of a ring of size $2^3$ into a hypercube of dimension 3 is
shown in Figure 5.1. Assume the theorem is true for the construction of a ring of size
$2^{n-1}$ in a hypercube of dimension n-1. We now prove that the theorem is true for the
construction of $R_{2^n}$ in $Q_n$. Consider the two subcubes $Q^0_{n-1}$ and $Q^1_{n-1}$. By assumption
we can construct a ring of size $2^{n-1}$ in both $Q^0_{n-1}$ and $Q^1_{n-1}$. Now we combine two
rings, each of size $2^{n-1}$, to come up with a ring of size $2^n$. This is done by replacing
the first link that goes through dimension 2 in the upper part of the first ring and the
last link that goes through dimension 2 in the upper part of the second ring by two
upper links that go through dimension n, or by replacing the last link that goes through
dimension 2 in the lower part of the first ring and the first link that goes through
dimension 2 in the lower part of the second ring by two lower links that go through
dimension n, as shown in Figure 5.3. □

Figure 5.3: Fault-free embedding: (a) using upper links, (b) using lower links.
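The induction step can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact construction of Figure 5.3: the function names are hypothetical, the ring in the second subcube is taken to be a mirror copy of the first, and the dimension-2 link that is dropped is simply the first one found on the requested level rather than the specific "first" and "last" links named in the proof.

```python
def base_ring() -> list[int]:
    """Ring of size 8 in Q_3 from ES = (1, 2, 3, 2, 1, 2, 3, 2), started at node 0."""
    ring, node = [], 0
    for d in (1, 2, 3, 2, 1, 2, 3, 2):
        ring.append(node)
        node ^= 1 << (d - 1)
    return ring


def merge_through_dimension(ring: list[int], n: int, upper: bool = True) -> list[int]:
    """Join a ring of Q_{n-1} with its copy in the other subcube of Q_n.

    One dimension-2 link (u, v) on the chosen level (upper = even labels,
    lower = odd labels) is dropped from each copy and replaced by the two
    dimension-n links (u, u') and (v, v').
    """
    parity = 0 if upper else 1
    high = 1 << (n - 1)
    size = len(ring)
    for k in range(size):
        u, v = ring[k], ring[(k + 1) % size]
        if u % 2 == parity and v % 2 == parity and u ^ v == 2:
            # Rotate so the ring reads v ... u; the dropped link would close it.
            first = ring[k + 1:] + ring[:k + 1]
            # Traverse the copy backwards: u' ... v'.
            second = [x ^ high for x in reversed(first)]
            return first + second
    raise ValueError("no dimension-2 link found on the requested level")


def divide_conquer_ring(n: int, upper: bool = True) -> list[int]:
    """Ring of size 2^n in Q_n (n >= 3) built by repeated merging."""
    ring = base_ring()
    for dim in range(4, n + 1):
        ring = merge_through_dimension(ring, dim, upper)
    return ring


print(len(divide_conquer_ring(5)))  # 32
```

Passing `upper=False` uses lower links (odd labels) instead of upper links (even labels) for every merge.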

5.3 Fault-Tolerance Embeddings
One of the significant features of the hypercube is its ability to simulate
other interconnection networks in the presence of faults. In this section, we are
interested in answering the following question: given that some nodes of the hypercube are
faulty, does the hypercube have the ability to simulate rings efficiently? The hypercube
is maximally fault-tolerant. While even one faulty processor will degrade its
overall performance, it is still capable of simulating rings by wasting only one
nonfaulty processor for every faulty processor.

5.3.1 Embedding in the Presence of a Single Fault
The idea behind our technique to embed a ring into a faulty hypercube is to use
some of the unused links to skip a faulty node. As an illustration, in Figure 5.1, the
links between nodes v1 and v6 through dimension 3, v2 and v5 through dimension 3, v3
and v8 through dimension 1, and v4 and v7 through dimension 1 are unused links. We
can use these unused links to avoid a faulty node. But since the hypercube does not
contain odd cycles, we have to waste a nonfaulty processor for every faulty processor.
Therefore, if node $v_i$ in a hypercube $Q_n$ is faulty, a ring $R_{2^n-2}$ can be constructed by
using some of the unused links to skip the faulty node without disturbing the
construction of the rest of the ring.
The basic idea behind our technique is to identify the faulty node and the cube
that contains it, then avoid the fault by using the unused links. Figure 5.4 shows all
possible locations of an upper faulty node within a cube and the links that need to be
used to avoid it in the process of constructing the ring, while Figure 5.5 shows the case
of a lower faulty node. Figure 5.6.a shows how to handle an upper faulty node, while
Figure 5.6.b shows how to handle a lower faulty node. The cube that
contains the faulty node might be the first, the last, or somewhere in the middle. Our
technique works for all three cases by using the appropriate links. The following
algorithm embeds a ring $R_{2^n-2}$ into a hypercube $Q_n$ in the presence of a faulty node.

Algorithm 5.3
Step 1: Partition $Q_n$ into $2^{n-3}$ node disjoint cubes.
Step 2: Locate the cube that contains the faulty node and identify whether it is an
upper or a lower fault.
Step 3: (i) If it is an upper fault then
            a. Choose the appropriate embedding from Figure 5.4.a.
            b. Embed the ring $R_{2^3}$ into each of the fault-free cubes using the
               embedding sequence ES = (1, 2, 3, 2, 1, 2, 3, 2).
            c. Connect all the rings, one of size 6 and the rest of size 8, using the
               lower links to come up with the ring $R_{2^n-2}$.
        (ii) If it is a lower fault then
            a. Choose the appropriate embedding from Figure 5.5.a.
            b. Embed the ring $R_{2^3}$ into each of the fault-free cubes using the
               embedding sequence ES = (1, 2, 3, 2, 1, 2, 3, 2).
            c. Connect all the rings, one of size 6 and the rest of size 8, using the
               upper links to come up with the ring $R_{2^n-2}$.

Figure 5.4: All possible locations of an upper faulty node within a cube: (a) standard,
(b) alternate.

Figure 5.5: All possible locations of a lower faulty node within a cube: (a) standard,
(b) alternate.

Figure 5.6: Single fault embedding: (a) upper faulty node, (b) lower faulty node.

Theorem 5.3: For every n, Algorithm 5.3 will embed a ring of size $2^n - 2$ into a
hypercube of dimension n in the presence of a faulty node.

The theorem can be proven easily by induction by extending the proof of
Theorem 5.2. In the next section, we will use the same concept with minor variations to
embed a ring into a faulty hypercube with multiple faults.

5.3.2 Embedding in The Presence of Multiple Faults
In this section, we describe our scheme to embed a ring $R_{2^n-2f}$, where f is the
number of faults, into a hypercube $Q_n$ in the presence of f faults such that each cube
has at most one faulty node. The maximum number of faults that can be handled by
our technique is $f = 2^{n-3}$. The idea is to generalize Algorithm 5.3 to handle multiple
faults. The following algorithm embeds a ring $R_{2^n-2f}$ into a hypercube $Q_n$ in the
presence of f faults.

Algorithm 5.4
Step 1: Partition $Q_n$ into $2^{n-3}$ node disjoint cubes.
Step 2: Identify the cubes with faulty nodes.
Step 3: Embed a ring of size 6 into each of the faulty cubes by choosing an appropriate
embedding from Figures 5.4 and 5.5 and a ring of size 8 into each of the
unfaulty cubes.
Step 4: Construct a ring of size $2^n - 2f$ by connecting the rings, either $R_6$ or $R_8$,
using the appropriate links, either upper or lower links as shown in Figure
5.7.

Theorem 5.4: For every n, Algorithm 5.4 will embed a ring of size $2^n - 2f$ into a
hypercube of dimension n in the presence of f faulty nodes such that each cube has at
most one faulty node.

Proof: In the process of constructing the ring $R_{2^n-2f}$, any two adjacent cubes with a
fault fall into one of the following cases:
Case 1: A cube with upper fault followed by a cube with upper fault.
Case 2: A cube with upper fault followed by a cube with lower fault.
Case 3: A cube with lower fault followed by a cube with lower fault.
Case 4: A cube with lower fault followed by a cube with upper fault.

Figure 5.7: Multiple faults embedding.

Figure 5.8 shows all four cases in the process of constructing the ring. Notice
that the decision of whether to use a standard or alternate ring depends on the
position of the faulty node within a cube, whether it is in the left or right block and
whether it is an upper or a lower fault. Also, the position of the faults in adjacent
cubes affects the type of ring to be used. We use the lower links with an upper
followed by an upper, the upper links with a lower followed by a lower, and in the case
of an upper followed by a lower or a lower followed by an upper we might use the
upper or the lower links depending on the location of the faults. Since we are wasting
one good processor for every faulty processor, the size of the embedded ring is
$2^n - 2f$. □
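The precondition of Theorem 5.4 and the resulting ring size can be checked with a short Python sketch. It assumes that the node disjoint cubes of Step 1 are the groups of eight labels sharing all bits above position 3 (cube k holds labels 8k through 8k + 7), which matches the use of dimensions 1, 2, and 3 inside a cube. The function name `ring_size_with_faults` is hypothetical.

```python
from collections import Counter


def ring_size_with_faults(n: int, faulty: set[int]) -> int:
    """Size 2^n - 2f of the ring promised by Theorem 5.4, after checking that
    every node disjoint cube of Q_n holds at most one faulty node.

    Assumption: cube k consists of the eight labels 8k .. 8k + 7.
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    if any(not 0 <= node < 2 ** n for node in faulty):
        raise ValueError("fault label outside the hypercube")
    per_cube = Counter(node >> 3 for node in faulty)   # cube index = label // 8
    crowded = sorted(cube for cube, count in per_cube.items() if count > 1)
    if crowded:
        raise ValueError(f"more than one fault in cube(s) {crowded}")
    return 2 ** n - 2 * len(faulty)


print(ring_size_with_faults(5, {3, 12, 17, 30}))   # 24
```

Because there are $2^{n-3}$ cubes and each may hold at most one fault, any fault set that passes the check automatically satisfies $f \le 2^{n-3}$.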

5.4 Summary
This chapter has presented new techniques to embed a ring of size $2^n - 2f$ in a
hypercube of dimension n despite the presence of $f \le 2^{n-3}$ faults. The new
divide-conquer technique uses a new data structure called a cube. The basic idea behind the
technique is to identify faulty nodes and the cubes that contain them, avoid the faults
within each cube by using the unused links, and construct the ring by connecting adjacent
cubes. Our technique has some restrictions on the distribution of the faults. It allows
up to $2^{n-3}$ faults such that each cube has at most one fault.

Figure 5.8: All possible cases of two adjacent faulty cubes: (a) upper followed by
upper, (b) upper followed by lower, (c) lower followed by lower, (d) lower followed by
upper.

CHAPTER 6

Concluding Remarks

One of the most important factors that govern the performance of a parallel
machine is the underlying interconnection network. Many interconnection networks
have been introduced in the literature. The most important features of these
interconnection networks are the diameter and the node degree. Another important feature of a
network is fault-tolerance. Hypercubes have gained widespread acceptance due to
their many attractive properties. The twisted hypercube preserves many of the
properties of the hypercube and reduces the diameter by a factor of two. This dissertation
explored the efficiency and the fault-tolerance of the twisted hypercube in parallel
computation and investigated relations and transformations between the twisted
hypercube and various interconnection networks. These include complete binary trees,
complete quad trees, fault-free rings, faulty rings, and hypercubes.
We have presented different schemes to embed complete binary trees and complete
quad trees into the twisted hypercube. For complete binary trees, we have presented
two different schemes to embed a complete binary tree $CB_n$ into a twisted
hypercube $TQ_n$. In the first scheme, we used a recursive algorithm to embed $CB_n$ into
$TQ_n$ based on the embedding of $CB_{n-1}$ into $TQ_{n-1}$. The resulting embedding is such
that all edges in the lowest four levels of the complete binary tree are mapped to paths
of length one in the twisted hypercube and all other edges in higher levels of the
complete binary tree are mapped to paths of length two in the twisted hypercube. In the
second scheme, we used the inorder binary labeling of the complete binary tree $CB_n$ to
embed $CB_n$ into the twisted hypercube $TQ_n$. The inorder embedding is simpler and
more natural than the recursive embedding, but it is less efficient in terms of the
number of edges that are mapped to paths of length two. For complete quad trees, we have
presented a recursive algorithm that embeds $CQ_n$ into $TQ_n$ based on the embedding of
$CQ_{n-1}$ into $TQ_{n-1}$. The resulting embedding is such that 37.5% of the edges in the
lowest level and 50% of the edges in higher levels of the complete quad tree are
mapped to paths of length two in the twisted hypercube and the rest of the edges are
mapped to paths of length one.
Interesting results have been presented on the fault-tolerance of the twisted
hypercube. We have presented optimal algorithms for embedding a ring into a twisted
hypercube with fault-free nodes, a single faulty node, and multiple faults. We have
shown the capability of the twisted hypercube to simulate rings efficiently in the
presence of faults. While even one faulty processor will degrade its overall performance,
as in any other network, a Hamiltonian circuit can still be constructed on the nonfaulty
processors. We have shown that a twisted hypercube $TQ_n$ with $2^n$ nodes can simulate
a ring $R_{2^n-f}$ with $2^n - f$ nodes in the presence of f faulty nodes. In the
hypercube, the simulation of rings is achieved by wasting a nonfaulty processor for
every faulty processor. The simulation of rings by the twisted hypercube is more efficient
since it is achieved without wasting any nonfaulty processors.

We have presented new techniques to embed a ring of size $2^n - 2f$ in a
hypercube of dimension n despite the presence of $f \le 2^{n-3}$ faults. The new divide-conquer
technique uses a new data structure called a cube. Our algorithm for multiple faults
allows up to $2^{n-3}$ faults such that each cube has at most one fault.
In future work, we intend to study the embedding of other parallel architectures
into the twisted hypercube. It may also be possible to improve on some of our results
such as embedding complete binary trees into twisted hypercubes. It has been
conjectured that the complete binary tree $CB_n$ is a subgraph of the twisted hypercube $TQ_n$.
An interesting open problem is whether the number of faults that can be
tolerated by the twisted hypercube can be improved further. Another interesting
problem will be to adapt our techniques of embedding rings into faulty twisted hypercubes
to other parallel architectures.

Bibliography
[A] S. Akl, The Design and Analysis of Parallel Algorithms, Prentice-Hall, 1989.

[ABa] E. Abuelrub and S. Bettayeb, "Embedding Complete Binary Trees into
Twisted Hypercubes," Proc. ISCA International Conference on Computer
Applications in Design, Simulation and Analysis, pp. 1-4, 1993.

[ABb] E. Abuelrub and S. Bettayeb, "Embedding Rings into Faulty Twisted
Hypercubes," Proc. 31st ACM Southeastern Regional Conference, pp. 48-55, 1993.

[ABc] E. Abuelrub and S. Bettayeb, "Embeddings in the Twisted Hypercube,"
Technical Report, Department of Computer Science, Louisiana State University, 1993.

[ABd] E. Abuelrub and S. Bettayeb, "Fault-Tolerance Embedding of Rings into
Hypercubes," submitted for publication.

[ABe] E. Abuelrub and S. Bettayeb, "Embedding Complete Quad Trees into
Twisted Hypercubes," submitted for publication.

[AG] G. Almasi and A. Gottlieb, Highly Parallel Computing, Benjamin/Cummings, 1989.

[AGr] J. Armstrong and F. Gray, "Fault Diagnosis in a Boolean n-Cube of
Microprocessor," IEEE Transactions on Computers, vol. C-30, no. 8, pp. 587-590,
August 1981.

[AHMP] H. Alt, T. Hagerup, K. Mehlhorn, and F. Preparata, "Deterministic
Simulation of Idealized Parallel Computers on More Realistic Ones," SIAM Journal
on Computing, pp. 808-835, October 1987.

[AJ] G. Anderson and E. Jensen, "Computer Interconnection Structures: Taxonomy,
Characteristics, and Examples," Computing Surveys, vol. 7, pp. 197-213,
December 1975.

[AK] S. Akers and B. Krishnamurthy, "Group Graphs as Interconnection Networks,"
IEEE Transactions on Computers, vol. 38, pp. 555-565, 1989.

[AR] R. Aleliunas and A. Rosenberg, "On Embedding Rectangular Grids in
Square Grids," IEEE Transactions on Computers, vol. C-31, pp. 907-913,
September 1982.

[BA] L. Bhuyan and D. Agrawal, "Generalized Hypercube and Hyperbus Structures
for a Computer Network," IEEE Transactions on Computers, vol. C-33, no. 4,
pp. 323-333, April 1984.

[BCGS] S. Bettayeb, B. Cong, M. Girou, and I. Sudborough, "Embedding
Permutation Networks into Hypercubes," LATIN '92, 1992.

[BCLR] S. Bhatt, F. Chung, F. Leighton, and A. Rosenberg, "Optimal Simulations
of Tree Machines," Proc. 27th Annual IEEE Foundations of Computer Science
Conference, pp. 274-282, 1986.

[BH] R. Beivide and E. Herrada, "Optimal Distance Networks of Low Degree
for Parallel Computing," IEEE Transactions on Computers, vol. C-40, no. 10,
pp. 1109-1123, October 1991.

[BI] S. Bhatt and I. Ipsen, "How to Embed Trees in Hypercubes," Technical
Report, Department of Computer Science, Yale University, 1985.

[BL] H. Bodlaender and J. Leeuwen, "Simulation of Large Networks on Smaller
Networks," Information and Control, vol. 71, pp. 143-180, 1986.

[BLD] L. Barasch, S. Lakshmivarahan, and S. Dhall, "Embedding Arbitrary
Meshes and Complete Binary Trees in Generalized Hypercubes," Proc. 1st
IEEE Symposium on Parallel and Distributed Processing, 1989.

[BMS] S. Bettayeb, Z. Miller, and I. Sudborough, "Embedding Grids into
Hypercubes," Journal of Computer and System Sciences, vol. 45, no. 3,
pp. 340-366, December 1992.

[BS] B. Becker and H. Simon, "How Robust is the n-Cube," Information and
Computation, no. 2, pp. 162-178, May 1988.

[BSu] S. Bettayeb and I. Sudborough, "Grid Embedding into Ternary Hypercubes,"
Proc. 1989 ACM South Central Regional Conference, pp. 62-64, 1989.

[CL] M. Chan and S. Lee, "Distributed Fault-Tolerance Embeddings of Rings
into Hypercubes," Journal of Parallel and Distributed Computing, no. 11,
pp. 63-71, 1991.

[CLe] G. Chartrand and L. Lesniak, Graphs and Digraphs, Wadsworth & Brooks, 1986.

[CS] M. Chen and K. Shin, "Processor Allocation in an n-Cube Multiprocessor
Using Gray Codes," IEEE Transactions on Computers, vol. C-36, no. 12,
pp. 396-407, December 1987.

[DS] A. Dingle and I. Sudborough, "Simulating Binary Trees and X-Trees on
Pyramid Networks," Proc. 1st IEEE Symposium on Parallel and Distributed
Processing, pp. 210-219, 1989.

[E] K. Efe, "The Crossed Cube Architecture for Parallel Computation," IEEE
Transactions on Parallel and Distributed Systems, vol. 3, no. 5, pp. 513-524,
September 1992.

[EBSS] K. Efe, P. Blackwell, T. Shiau, and W. Slough, "A Reduced Diameter
Interconnection Network," Technical Report, Department of Computer Science,
University of Missouri, 1988.

[EL] A. El-Amawy and S. Latifi, "Properties and Performance of Folded
Hypercubes," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 1,
pp. 31-42, January 1991.

[ENS] A. Esfahanian, L. Ni, and B. Sagan, "On Enhancing Hypercube
Multiprocessors," Proc. 1988 International Conference on Parallel Processing,
pp. 86-89, 1988.

[F] M. Flynn, "Some Computer Organizations and Their Effectiveness," IEEE
Transactions on Computers, vol. C-21, no. 9, September 1972.

[FS] R. Finkel and M. Solomon, "Processor Interconnection Strategy," IEEE
Transactions on Computers, vol. C-33, pp. 1180-1194, December 1984.

[Gor] D. Gordon, "Efficient Embeddings of Binary Trees in VLSI Arrays," IEEE
Transactions on Computers, vol. C-36, no. 9, pp. 1009-1018, September 1987.

[Gou] R. Gould, Graph Theory, Benjamin/Cummings, 1988.

[GW] A. Gupta and H. Wang, "Optimal Embeddings of Ternary Trees into
Boolean Hypercubes," Proc. 4th IEEE Symposium on Parallel and Distributed
Processing, pp. 230-235, 1992.

[H] W. Hillis, The Connection Machine, MIT Press, 1985.

[HB] K. Hwang and F. Briggs, Computer Architecture and Parallel Processing,
McGraw-Hill, 1984.

[HJ] C. Ho and S. Johnson, "Dilation d Embedding of a Hyper-Pyramid into a
Hypercube," Proc. Supercomputing 89, pp. 294-303, 1989.

[HLNa] J. Hastad, F. Leighton, and M. Newman, "Reconfiguring a Hypercube in the
Presence of Faults," Proc. 19th Annual ACM STOC, pp. 274-284, 1987.

[HLNb] J. Hastad, F. Leighton, and M. Newman, "Fast Computation Using Faulty
Hypercubes," Proc. 19th Annual ACM STOC, pp. 251-263, 1987.

[HMR] J. Hong, K. Mehlhorn, and A. Rosenberg, "Cost Trade-Offs in Graph
Embeddings, with Applications," Journal of the Association for Computing
Machinery, vol. 30, no. 4, pp. 709-728, October 1983.

[HS] E. Horowitz and S. Sahni, Fundamentals of Computer Algorithms, Computer
Science Press, 1989.

[I] O. Ibe, "Reliability Comparison of Token-Ring Network Schemes," IEEE
Transactions on Reliability, vol. 41, no. 2, pp. 288-283, June 1992.

[II] M. Imase and M. Itoh, "Design to Minimize Diameter on Building Block
Network," IEEE Transactions on Computers, vol. C-30, no. 6, pp. 439-448, 1981.

[JLD] J. Jwo, S. Lakshmivarahan, and S. Dhall, "Embedding of Cycles and Grids
in Star Graphs," Proc. 2nd IEEE Symposium on Parallel and Distributed
Processing, pp. 540-547, 1990.

[K] K. Kwon, "Parallel Computation on the Hypercube-Like Machine," PhD
Thesis, Department of Computer Science, Louisiana State University, 1991.

[KHI] N. Krishnakumar, V. Hegde, and S. Iyengar, "Fault Tolerant Based
Embeddings of Quadtrees into Hypercubes," Proc. International Conference of
Parallel Processing, 1991.

[KS] H. Kung and D. Stevenson, "A Software Technique for Reducing the Routing
Time on a Parallel Computer with a Fixed Interconnection Network,"
High Speed Computer and Algorithm Optimization, Academic Press, pp. 423-433, 1987.

[LBT] C. Liang, S. Battachanya, and W. Tsai, "Distributed Fault-Tolerant Routing
on Hypercubes: Algorithms and Performance Study," Proc. 3rd IEEE Symposium
on Parallel and Distributed Processing, pp. 474-481, 1991.

[LE] T. Lewis and H. El-Rewini, Introduction to Parallel Computing,
Prentice-Hall, 1992.

[LEI] S. Latifi and A. El-Amawy, "Efficient Approach to Embed Binary Trees in
3-D Rectangular Arrays," IEEE Proceedings, vol. 137, no. 2, pp. 159-163,
March 1990.

[Lei] F. Leighton, Introduction to Parallel Algorithms and Architecture: Arrays,
Trees, Hypercubes, Morgan Kaufmann, 1992.

[Len] T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout, John
Wiley & Sons, 1990.

[LF] R. Ladner and M. Fischer, "Parallel Prefix Computation," Journal of the
ACM, vol. 27, pp. 831-838, 1980.

[LZ] S. Latifi and S. Zheng, "Optimal Simulation of Linear Array and Ring
Architectures on Multiply-Twisted Hypercubes," Proc. 11th International
Conference on Computers and Communications, 1992.

[MS] B. Monien and I. Sudborough, "Simulating Binary Trees on Hypercubes,"
Proc. 3rd Aegean Workshop on Computing, Lecture Notes in Computer
Science, pp. 170-180, 1988.

[NS] D. Nassimi and S. Sahni, "Data Broadcasting in SIMD Computers," IEEE
Transactions on Computers, vol. C-30, no. 2, pp. 101-106, February 1981.

[NSK] M. Nigam, S. Sahni, and B. Krishnamurthy, "Embedding Hamiltonian and
Hypercubes in Star Interconnection Graphs," Proc. International Conference
on Parallel Processing, pp. 340-343, 1990.

[P] C. Plaxton, "Efficient Computation on Sparse Interconnection Networks,"
PhD Thesis, Department of Computer Science, Stanford University, 1989.

[PM] F. Provost and R. Melhem, "Distributed Fault-Tolerant Embedding of
Binary Trees and Rings in Hypercubes," Proc. International Workshop on
Defect and Fault-Tolerance in VLSI Systems, 1989.

[PV] F. Preparata and J. Vuillemin, "The Cube-Connected Cycles: A Versatile
Network for Parallel Computation," Communications of the ACM, vol. 24,
no. 5, pp. 300-309, May 1981.

[Q] M. Quinn, Designing Efficient Algorithms for Parallel Computers,
McGraw-Hill, 1987.

[QD] M. Quinn and N. Deo, "Parallel Graph Algorithms," ACM Computing Surveys,
vol. 16, no. 3, pp. 319-348, September 1984.

[U] J. Ullman, Computational Aspects of VLSI, Computer Science Press, 1984.

[UW] E. Upfal and A. Wigderson, "How to Share Memory in a Distributed System,"
Journal of the ACM, pp. 116-127, January 1987.

[RS] A. Rosenberg and L. Snyder, "Bounds on the Costs of Data Encodings,"
Math. Systems Theory, vol. 12, pp. 9-39, 1978.

[Sa] H. Samet, "The Quadtree and Related Hierarchical Data Structures,"
Computing Surveys, vol. 16, no. 2, pp. 187-260, June 1984.

[Se]

C. Seitz, "The Cosmic Cube," Communications o f the ACM, vol. 28, no. 1,
pp. 22-33, January 1985.

[Si]

H. Siegel, Interconnection Networks for Large-Scale Parallel Processing,
Lexington Books, 1985.

[Sn]

L. Snyder, "Introduction to the Configurable Highly Parallel Computer,"
Computer, pp. 47-56, January 1982.

[St]

H. Stone, High-Performance Computer Architecture, Addison-Wesley,
1987.

[SS]

Y. Saad and M. Schultz, "Topological Properties of the Hypercube," IEEE
Transactions on Computers, vol. C-37, no. 7, pp. 867-872, July 1988.

[T]

P. Treleaven, "Control-Driven, Data-Driven, and Demand-Driven Computer
Architecture," Parallel Computing, no. 2, 1985.

[V]

L. Valiant, "A Scheme for Fast Parallel Communications," SIAM J. Computing,
vol. 11, no. 2, pp. 350-361, 1982.

[TK]

C. Thompson and H. Kung, "Sorting in a Mesh-Connected Parallel Computer,"
Communications of the ACM, vol. 20, no. 4, pp. 263-271, April
1977.

[W]

A. Wu, "Embedding of Tree Networks into Hypercubes," Journal of Parallel
and Distributed Computing, vol. 2, no. 3, pp. 238-249, August 1985.

[WCM]

A. Wang, R. Cypher, and E. Mayr, "Embedding Complete Binary Trees in
Faulty Hypercubes," Proc. 3rd IEEE Symposium on Parallel and Distributed
Processing, pp. 112-119, 1991.

[YN]

A. Youssef and B. Narahari, "The Banyan-Hypercube Networks," IEEE
Transactions on Parallel and Distributed Systems, vol. 1, no. 2, pp.
160-169, April 1990.

[Z]

S. Zheng, "SIMD Data Communication Algorithms for Multiply-Twisted
Hypercubes," Proc. 5th International Parallel Processing Symposium, pp.
120-125, 1991.

Vita
Emadeddin Abuelrub received his BS degrees in Computer Engineering and
Computer Science from Oklahoma State University in 1984 and 1985, respectively.
He received his MS degree in Computer Science from Alabama A&M University in
1987.

He joined the PhD program in the Department of Computer Science at
Louisiana State University in 1989, where he worked on parallel algorithms and
mappings on parallel machines. His other research interests include the design and
analysis of algorithms, graph algorithms, and parallel and VLSI computations.

DOCTORAL EXAMINATION AND DISSERTATION REPORT

Candidate: Emadeddin Abuelrub
Major Field: Computer Science

Title of Dissertation: Interconnection Networks Embeddings and Efficient
Parallel Computations

Approved:

Major Professor and Chairman

Dean of the Graduate School

EXAMINING COMMITTEE:

Date of Examination: 05/14/93

