In this paper we consider the Supercube, a new interconnection network derived from the hypercube and introduced by Sen in [10]. The Supercube has the same diameter and connectivity as a hypercube but can be realized for any number of nodes, not only for powers of 2.
Introduction
The hypercube network is widely used as an architecture for parallel machines (Connection Machine, NCube and Intel IPSC). Its great popularity is essentially due to modularity, regularity and low diameter. These characteristics make it easy to design efficient parallel programs and to share the machine among users. Moreover, the logarithmic degree of the nodes makes the architecture technologically feasible with respect to an "ideal" network connecting each processor to all the others, where the degree of a node is linear in the number of nodes.
However, the number of nodes of this network is restricted to be a power of 2, which can be, in some situations, a significant drawback. In fact, it is necessary to double the number of processing elements to upgrade a hypercube, which may be unrealizable because of budget limitations or technical reasons. Several architectures have been proposed having any number of nodes and hypercube-like characteristics. In [3] a network called Generalized Hypercube is presented, but it becomes a complete network when the number of nodes is prime. In [6] the Incomplete Hypercube is proposed, but the minimum degree of the nodes can be 1. This is a strong limitation to the communication capabilities of that node and to the fault-tolerant properties of the architecture.
In [10] the Supercube, a generalization of the hypercube, has been introduced. It can be realized for any number of nodes N and contains the hypercube of dimension $\lfloor \log_2 N \rfloor$ as a subgraph. It is exactly a hypercube when the number of nodes is a power of 2. The addition of a node is easy and has low cost (few edges disappear). Moreover, the Supercube has the same good characteristics of high connectivity and small diameter as the hypercube network [10], and the degree of a node is at least $\lfloor \log_2 N \rfloor - 1$ and at most $2\lfloor \log_2 N \rfloor - 2$. Its topological properties are studied in [13], while its good fault-tolerant characteristics are explored in [11, 1, 2]. In this paper we study the Supercube as an alternative to the hypercube as a basis for an interconnection network and, in particular, we study the capability of the Supercube to efficiently execute typical parallel algorithms. To execute a parallel program, its tasks have to be mapped onto the processors of the parallel machine.
It is possible to model this kind of problem in the graph-theoretical terms of graph embedding. We model both the parallel algorithm and the parallel machine as graphs, and an efficient execution is obtained by mapping the first graph onto the second one in such a way that adjacent tasks are as close as possible. It is known that the optimal mapping problem is NP-complete in most of its formulations, but for simpler cases many results have been obtained in the past years.
We first prove that complete binary trees and bidimensional meshes (with a side length that is a power of 2) are spanning subgraphs of the Supercube.
Then we prove that the Supercube is a Hamiltonian graph and that, when the number of nodes is not a power of 2, it contains all cycles of length greater than 3 as subgraphs, i.e. it is pancyclic. Only a few graphs, such as the De Bruijn graph [12], the X-tree [7] and the Product Shuffle network [8], are known to have this characteristic. These results are also used to prove that the Butterfly can be embedded onto the Supercube with dilation and congestion 2.
Definitions
In the sequel the (Hamming) distance $d(x, y)$ of two binary strings x and y is defined as the number of positions in which x and y differ. The length of a string x is denoted by $|x|$. In [10] it is proved that the Supercube is $(s+1)$-regular whenever $h = 2^{s-1}$. The maximum degree maxd is such that $s \le maxd \le 2s - 2$, while the minimum degree mind is such that $s - 1 \le mind \le s$.
Besides, it is also proved that its connectivity is equal to the minimum degree of the network and that the diameter is $\lfloor \log_2 N \rfloor$. We can consider $V(S_N)$ partitioned into three sets $V_1$, $V_2$ and $V_3$ where, for any sequence v of s bits: $V_1$ is the set of nodes having a label of the form 0v for which $1v \in V(S_N)$; $V_2$ is the set of nodes having a label of the form 0v for which $1v \notin V(S_N)$; $V_3$ is the set of nodes having a label of the form 1v. The oblique edges connect a vertex of $V_3$ with a vertex of $V_2$ corresponding to an "absent node" in the Supercube; that is, when a hypercube edge should join a node in $V_3$ with a node not belonging to $V(S_N)$, it is stretched to the corresponding node in $V_2$. Adding a new node to an existing Supercube means, in a certain way, splitting each oblique edge of the first node in $V_2$ into two edges, a horizontal one and a hypercube one.
Notice that, from the definition, it follows that the subgraph induced by $V_1 \cup V_2$ (with hypercube edges) is a hypercube $H_s$.
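The description of the three vertex sets and of the oblique edges can be turned into a small constructive sketch. The code below is only an illustration under explicit assumptions that are not stated in the text above: nodes are taken to be the integers 0, ..., N-1 read as (s+1)-bit labels with $2^s < N \le 2^{s+1}$, the leading bit distinguishes 0v from 1v, and a horizontal edge is taken to be the edge joining 0v and 1v. The function names are hypothetical.

```python
from itertools import combinations


def supercube_edges(n):
    """Edge set of S_N (n >= 2) under the labelling assumed in the lead-in."""
    s = (n - 1).bit_length() - 1          # 2**s < n <= 2**(s+1)
    top = 1 << s                          # leading bit: separates 0v from 1v
    edges = set()
    for u, v in combinations(range(n), 2):
        if bin(u ^ v).count("1") == 1:    # hypercube and horizontal edges
            edges.add((u, v))
    for u in range(top, n):               # u = 1v belongs to V3
        for b in range(s):
            w = u ^ (1 << b)              # hypercube neighbour 1v' of u
            if w >= n:                    # 1v' is absent, so 0v' is in V2
                edges.add((w ^ top, u))   # oblique edge, stored as (small, large)
    return edges


def degrees(n):
    """Degree of every node of S_N, indexed by node label."""
    deg = [0] * n
    for u, v in supercube_edges(n):
        deg[u] += 1
        deg[v] += 1
    return deg
```

With these assumptions, `degrees(6)` gives a 3-regular graph (s = 2, h = 2), in agreement with the regularity statement quoted from [10], and `supercube_edges(8)` is exactly the edge set of the 3-dimensional hypercube.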
Graph Embedding
We will consider the capabilities of the Supercube to efficiently execute parallel programs. We define the computation graph of a parallel algorithm to be the graph in which nodes are processes and edges connect processes exchanging data; similarly, the host graph of a parallel machine is a graph in which nodes are processors and edges are physical interconnection links. Our goal is to minimize the number of steps needed in the host network to emulate a communication between adjacent processes in the computation graph and, at the same time, to minimize the number of processors of the host network.
More formally, we define the embedding $\langle \varphi, \rho \rangle$ of a graph G = (V(G), E(G)) into a graph H = (V(H), E(H)) as a function $\varphi$ from V(G) to V(H), together with a function $\rho$ that maps each $(u, v) \in E(G)$ into a path $\rho((u, v))$ in H connecting $\varphi(u)$ and $\varphi(v)$. The dilation of the edge $(u, v) \in E(G)$ under $\langle \varphi, \rho \rangle$ is the length of the path $\rho((u, v))$ in H, while the dilation of an embedding $\langle \varphi, \rho \rangle$ is the maximum dilation, over the edges in G, under $\langle \varphi, \rho \rangle$.
The expansion of an embedding $\langle \varphi, \rho \rangle$ of G into H is defined as the ratio of the size of V(H) to the size of V(G). We notice that expansion measures the utilization of the processors.
The congestion of the edge $(u, v) \in E(H)$ under $\langle \varphi, \rho \rangle$ is the number of paths, images under $\rho$ of edges of G, passing through $(u, v) \in E(H)$, while the congestion of an embedding $\langle \varphi, \rho \rangle$ is the maximum congestion, over the edges in H, under $\langle \varphi, \rho \rangle$.
The load of a node $v \in V(H)$ under $\langle \varphi, \rho \rangle$ is the number of nodes of G mapped onto v by $\varphi$, and the load of an embedding $\langle \varphi, \rho \rangle$ is the maximum load over the vertices of H.
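The four measures just defined can be computed directly from an explicit embedding. The following is a minimal sketch, not part of the original development: the node map and the path map are passed as dictionaries named `phi` and `rho`, each path is a list of host nodes, and the function name `embedding_metrics` is hypothetical.

```python
from collections import Counter


def embedding_metrics(g_edges, h_nodes, phi, rho):
    """Return (dilation, congestion, load, expansion) of an embedding of G into H.

    phi maps each node of G to a node of H; rho maps each edge (u, v) of G
    to a list of H-nodes forming a path from phi[u] to phi[v].
    """
    g_nodes = {x for e in g_edges for x in e}
    use = Counter()
    for e in g_edges:
        path = rho[e]
        # the path must connect the images of the two endpoints
        assert path[0] == phi[e[0]] and path[-1] == phi[e[1]]
        for a, b in zip(path, path[1:]):
            use[frozenset((a, b))] += 1   # host links are bidirectional
    dilation = max(len(rho[e]) - 1 for e in g_edges)
    congestion = max(use.values(), default=0)
    load = max(Counter(phi[x] for x in g_nodes).values())
    expansion = len(h_nodes) / len(g_nodes)
    return dilation, congestion, load, expansion


# Example: a triangle a, b, c embedded into the path 0 - 1 - 2.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
phi = {"a": 0, "b": 1, "c": 2}
rho = {("a", "b"): [0, 1], ("b", "c"): [1, 2], ("a", "c"): [0, 1, 2]}
metrics = embedding_metrics(triangle, [0, 1, 2], phi, rho)
```

In this toy example the edge (a, c) is routed through node 1, so the dilation is 2 and both host links carry two paths, while load and expansion are 1.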
We assume that a processor can communicate with each of its neighbors in one step, so that edges serve as bidirectional links. In this way each step of G is simulated by a series of steps of H.
Given an embedding $\langle \varphi, \rho \rangle$ of G into H, each communication across an edge (u, v) of G is effected by transmitting the message along the path $\rho((u, v))$ in H. In such a case, a message from $u \in G$ to $v \in G$ has to wait while processor $\varphi(u)$ executes the other tasks assigned to it (at most as many as the load), has to travel along a number of edges of H that is at most the dilation, and on each edge can be delayed by other messages (at most as many as the congestion) and by the other tasks that the intermediate processor has to execute (again bounded by the load). It follows that the number of steps needed to simulate a step of G on the host architecture H is bounded in terms of the load, dilation and congestion of the embedding [4]. It can be easily verified that $S_N$ can be embedded into $H_{\lfloor \log_2 N \rfloor}$ with dilation, congestion and load O(1). Since $H_{\lfloor \log_2 N \rfloor}$ is a subgraph of $S_N$, we can say that the Supercube and the hypercube are computationally equivalent, that is, T steps on one of the architectures can be simulated in O(T) steps on the other one.
When G is a subgraph of H there is an embedding such that the dilation, congestion and load are 1. In the following we will consider only embeddings with load 1.
Complete Binary Trees
It is possible to embed $T_d$ (a complete binary tree of height d) into its optimal hypercube $H_d$ with dilation 2, using the inorder enumeration of the tree nodes and the binary representation of this number as the hypercube label [4]. Since $T_d$ is not a subgraph of $H_d$ when $d \ge 3$, this embedding is optimal [9]. We will prove that the complete binary tree is a spanning subgraph of the Supercube.
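The inorder embedding into the hypercube can be checked numerically. The sketch below assumes that $T_d$ has d levels, that is $2^d - 1$ nodes, so that $H_d$ is the smallest hypercube able to host it, and that tree nodes are named by binary strings (the empty string for the root, a trailing 0 or 1 for the left or right child). The function names are hypothetical.

```python
def inorder_label(w, d):
    """Inorder number (1-based) of node w in a complete binary tree with d levels.

    w is a string over {'0', '1'}: '' is the root, w + '0' and w + '1'
    are the left and right children of w.
    """
    x = 1 << (d - 1)                      # the root is the middle node
    step = 1 << (d - 1)
    for bit in w:
        step >>= 1
        x += step if bit == "1" else -step
    return x


def inorder_dilation(d):
    """Maximum Hamming distance between the labels of a parent and a child."""
    worst = 0
    level = [""]
    for _ in range(d - 1):                # visit every node that has children
        nxt = []
        for w in level:
            for c in "01":
                diff = inorder_label(w, d) ^ inorder_label(w + c, d)
                worst = max(worst, bin(diff).count("1"))
                nxt.append(w + c)
        level = nxt
    return worst
```

A right child always differs from its parent in one bit, while a left child differs in two adjacent bits, which is where the dilation 2 comes from.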
In the sequel we identify the nodes of $T_d$ in the usual way: the root is identified by $\epsilon$ (the null string); the children of a node w are identified by w0 (left child) and w1 (right child). A (binary) labeling of a complete binary tree $T_d$ assigns to each node w of $T_d$ a distinct d-bit label $\ell(w)$.
Definition 2 Given a labeling $\ell$ of a complete binary tree $T_d$, we call the mirror labeling for any node w the label given by
where $\bar{z}$ is the bit complement of the string z and $z^R$ is the reverse of the string z.
By induction on the height of the tree, we prove that a canonical labeling describes an embedding of $T_h$ into $S_{2^h - 1}$ with dilation 1.
For h = 3, the assertion can be verified by inspection, and we note that the edge between the root and its right child is an oblique one. Let us suppose that a canonical labeling $\ell$ gives an embedding of $T_{h-1}$ in $S_{2^{h-1} - 1}$; then the labeling m for $T_h$ is obtained according to the following scheme:
for $w \in \{0, 1\}^*$ such that $0 \le |w| \le h - 2$. It is easy to verify that m gives us an embedding of $T_h$ into $S_{2^h - 1}$ with dilation 1 and that m is canonical.
We point out that $T_h$ can be embedded in $S_N$ for $N > 2^h - 1$ with dilation 2, using the embedding algorithm into the hypercube [4].
Meshes and Cycles
Given a sequence B of binary labels $B = w_1$ .... The Gray code is such that two adjacent labels have distance 1. Using the Gray code, any $2^r \times 2^s$ mesh can be embedded in $H_{r+s}$ with dilation 1 [5]. In the sequel, if not otherwise specified, labels of the Gray code are on s + 1 bits.
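The mesh embedding obtained from the Gray code can be sketched in a few lines. Since the definition of the code is not fully reproduced above, the sketch assumes the standard binary reflected Gray code, with the i-th label computed as `i ^ (i >> 1)`. The label of a mesh node is the concatenation of the Gray labels of its row and column; the function names are hypothetical.

```python
def gray(i):
    """i-th label of the binary reflected Gray code (assumed variant)."""
    return i ^ (i >> 1)


def mesh_label(i, j, s):
    """Hypercube label of node (i, j) of a 2**r x 2**s mesh: G_r(i) followed by G_s(j)."""
    return (gray(i) << s) | gray(j)


def mesh_dilation(r, s):
    """Maximum Hamming distance between the labels of adjacent mesh nodes."""
    worst = 0
    for i in range(1 << r):
        for j in range(1 << s):
            here = mesh_label(i, j, s)
            if i + 1 < (1 << r):          # neighbour in the next row
                worst = max(worst, bin(here ^ mesh_label(i + 1, j, s)).count("1"))
            if j + 1 < (1 << s):          # neighbour in the next column
                worst = max(worst, bin(here ^ mesh_label(i, j + 1, s)).count("1"))
    return worst
```

Moving along a row changes only the column part of the label by one Gray step, and similarly for columns, so every mesh edge is mapped onto a hypercube edge.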
Given a Gray code Gr(d), we call the first $2^{d-1}$ labels the Left part of the code, LG(d), and the last $2^{d-1}$ labels the Right part, RG(d). We notice that the last node of the Left part is always a neighbor in the Supercube of the first one.
Every label in the Right (Left) part corresponds to a label in the Left (Right) part differing only on the last bit. More formally, the corresponding node of $G(2^s + i) \in RG$ is the node $G(2^s - 1 - i) \in LG$, and the corresponding node of $G(j - 1) \in LG$ is $G(2^{s+1} - j) \in RG$. We may think of folding Gr(s + 1) after the Left part in such a way that the corresponding nodes are in the same "column" and horizontal edges join corresponding nodes (see Fig. 2). .... In the following, we call existing a label for which a node in $S_N$ exists.
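The folding of the Gray code can be checked mechanically. The sketch below again assumes the binary reflected Gray code on s + 1 bits, with labels stored as integers. In this encoding the single bit on which corresponding labels differ is bit s, the leading one; how that position is named depends on the way labels are written. The function names are hypothetical.

```python
def gray(i):
    """i-th label of the binary reflected Gray code (assumed variant)."""
    return i ^ (i >> 1)


def corresponding(i, s):
    """Index of the label facing G(i) once Gr(s + 1) is folded after the Left part."""
    return (1 << (s + 1)) - 1 - i


def check_folding(s):
    """Check the two folding properties for Gr(s + 1), with s >= 1."""
    for i in range(1 << (s + 1)):
        # corresponding labels differ exactly in bit s
        if gray(i) ^ gray(corresponding(i, s)) != 1 << s:
            return False
    # the last label of the Left part is adjacent to the first one
    first, last = gray(0), gray((1 << s) - 1)
    return bin(first ^ last).count("1") == 1
```

The index map `corresponding` covers both formulas of the text: it sends $2^s + i$ to $2^s - 1 - i$ and $j - 1$ to $2^{s+1} - j$.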
Lemma 2 The $1 \times N$ mesh can be embedded in $S_N$ with dilation 1.
In the first three cases, the Lemma is proved. In the last case, we make an inversion of the skipping in such a way that $L_N(N - 1) = G(2^s - 1)$. When the label $G(2^s + 1)$ does not exist, it can be easily obtained using the oblique edge between $G(2^s - 2)$ and $G(2^s)$ (see Fig. 4a).
In general, if $G(2^s + k)$ is the first non-existing label in the Right part, we use the oblique edge between $G(2^s - 1 - k)$ and $G(2^s + k - 1)$ to invert the direction of the skipping (see Fig. 4b). We notice that, if $N = 2^s$, the Supercube is a hypercube and we are always in case 1.
It is well known that in $H_d$ any even-length cycle $C_{2t}$ can be embedded with dilation 1 using the first t labels and the last t labels of Gr(d). Now we can prove that the Supercube is pancyclic when N is not a power of two. .... and include in it a node of the Right part (say $G(2^s + r)$) followed by a non-existing one. It can be easily seen that at least one such node is present in the Right part, since N is not a power of 2.
Depending on how $2^s + r$ compares with $2^{s+1} - 2t$, the node corresponding to $G(2^s + r)$ may belong to $C_{2t}$; in that case $G(2^s + r)$ can be included using the oblique edge between itself and $G(2^s - r - 2)$. Otherwise, we have to shift the cycle by XORing all the nodes with the label $G(2^s - r)$. In this way, the first node of the cycle is $G(2^s - r)$, and $G(2^s + r)$ is included in the cycle by substituting the edge $(G(2^s - r), G(2^s - r + 1))$ with the path $(G(2^s - r), G(2^s + r), G(2^s - r + 1))$. ....
$i = 2^s$: using the Left part of the Gray code.
$i = 2^s + k$ (where $1 \le k < h$): the cycle is embedded using the same strategy as in Lemma 2 and Lemma 4, but using just k nodes of the Right part.
The usual strategy is followed until the $(k-1)$-th node of the Right part is included in the cycle.
Then four cases may occur for the choice of the k-th Right node, and they depend on the position of the $(k-1)$-th node (see Fig. 5). After the k-th node in RG has been selected, all the remaining nodes of the Left part are included. We call a group a set of adjacent existing labels in the Right part.
If the $(k-1)$-th node is the last one of its group, we include just the first one of the next group¹ using an oblique edge (Fig. 5a-b).
When the $(k-1)$-th node is not the last one of its group, if the node preceding the $(k-1)$-th Right node in the cycle is in the Right part, we include just the last one of its group using an oblique edge (Fig. 5c); otherwise we simply include the k-th node following the usual rule (Fig. 5d). The embedding strategy for the cycles can also be used to embed the Butterfly network efficiently onto the Supercube.
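The pancyclicity property discussed in this section can be sanity-checked by exhaustive search on very small instances. The sketch below is not the constructive argument of the proofs; it simply looks for a simple cycle of every length. It assumes that nodes are the integers 0, ..., N-1 read as (s+1)-bit labels with $2^s < N \le 2^{s+1}$, and that a missing neighbor 1v' of a node of $V_3$ is replaced by 0v'. All function names are hypothetical.

```python
def supercube_adj(n):
    """Adjacency sets of S_N (n >= 2) under the labelling assumed in the lead-in."""
    s = (n - 1).bit_length() - 1          # 2**s < n <= 2**(s+1)
    top = 1 << s
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for b in range(s + 1):
            w = u ^ (1 << b)
            if w < n:
                adj[u].add(w)             # hypercube or horizontal edge
            elif b < s:                   # only nodes of V3 reach this branch
                adj[u].add(w ^ top)       # oblique edge towards V2
                adj[w ^ top].add(u)
    return adj


def has_cycle(adj, length):
    """True if the graph contains a simple cycle with exactly `length` nodes."""
    def extend(path, seen):
        if len(path) == length:
            return path[0] in adj[path[-1]]
        # the start node is kept as the smallest node of the cycle
        return any(extend(path + [v], seen | {v})
                   for v in adj[path[-1]]
                   if v not in seen and v > path[0])
    return length >= 3 and any(extend([start], {start}) for start in adj)


def is_pancyclic(n):
    """True if S_N contains cycles of every length from 3 to n."""
    adj = supercube_adj(n)
    return all(has_cycle(adj, k) for k in range(3, n + 1))
```

The search is exponential and is meant only for a handful of nodes. For N = 8 the graph is the hypercube $H_3$, which is bipartite, so no odd cycle is found there.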
Definition 5 A Butterfly $B_h$ is a graph with $h 2^h$ nodes where any node is labelled with $\langle i, w \rangle$, for $0 \le i \le h - 1$ and $w \in \{0, 1\}^h$. Edges are divided into straight edges $(\langle i, w \rangle, \langle (i+1) \bmod h, w \rangle)$ and oblique edges $(\langle i, w \rangle, \langle (i+1) \bmod h, w \oplus e_i \rangle)$, where $e_i$ is the binary vector of length h whose only nonzero component is the i-th and $\oplus$ is the bit-wise XOR operation.
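Definition 5 translates directly into code. In the sketch below a node $\langle i, w \rangle$ is the pair `(i, w)` with w stored as an h-bit integer, and $e_i$ is taken to be `1 << i`; the choice of which bit position counts as the i-th component is an assumption of this illustration. The function name is hypothetical, and h is assumed to be at least 3 so that no two edges coincide.

```python
def butterfly_edges(h):
    """Edges of the Butterfly B_h (h >= 3); a node is a pair (i, w), w an h-bit integer."""
    edges = set()
    for i in range(h):
        j = (i + 1) % h                   # levels wrap around modulo h
        for w in range(1 << h):
            edges.add(frozenset({(i, w), (j, w)}))             # straight edge
            edges.add(frozenset({(i, w), (j, w ^ (1 << i))}))  # oblique edge
    return edges
```

For h = 3 this yields 24 nodes and 48 edges, each node having two edges towards the next level and two coming from the previous one.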
Using a well-known standard technique, it is easy to show the following.
Lemma 6 The Butterfly $B_h$ can be embedded in $S_{h 2^h}$ with dilation 2 and congestion 2.
Proof: We define an embedding $\langle \varphi, \rho \rangle$ in the following way. We map a node $\langle i, w \rangle$ of the Butterfly on the node $\varphi(\langle i, w \rangle) = L_h(i)w$. Notice that the labels assigned to $\langle i, w \rangle$ and $\langle (i+1) \bmod h, w \rangle$ are at distance 1 by Lemma 4 (straight edges), while to an oblique edge $(\langle i, w \rangle, \langle (i+1) \bmod h, w \oplus e_i \rangle)$ we assign the path $\varphi(\langle i, w \rangle), \varphi(\langle (i+1) \bmod h, w \rangle), \varphi(\langle (i+1) \bmod h, w \oplus e_i \rangle)$.
Then the result follows from Lemma 4 and from the definition of the Butterfly.
¹ Note that it exists since k < h.
