Introduction
In the past two decades, a variety of interconnection networks have been proposed for parallel computing architectures, many of which are extensions or variations of the hypercube [15]. Hypercubes possess many appealing properties [6, 10, 17], such as regularity, rich connectivity, high bisection width, logarithmic node degree and diameter, ease of embedding other common structures, and efficient routing. Several hypercube-based machines have been built [1, 18], from Caltech's Cosmic Cube (1983), Mark II (1984), and Intel's iPSC/1 (1985) to the more recent Silicon Graphics Origin 2000 (1996) and Origin 3000 (2000). The Origin 2000 and 3000 series servers use the cc-NUMA (cache-coherent non-uniform memory access) architecture, which links large numbers of routers and nodes by emulating a multidimensional hypercube.
However, a primary disadvantage of the hypercube architecture is the cost of increasing its size. The number of nodes, $2^n$, in an n-dimensional hypercube grows rapidly as n increases, which makes VLSI/WSI implementation difficult for large systems. To overcome this shortcoming, many hypercube-like architectures have been proposed. These include, among others, incomplete hypercubes [8], reduced hypercubes [22], cube-connected cycles [13], hierarchical cube networks [5], folded hypercubes [4], unidirectional hypercubes [3], spanning bus connected hypercubes [12], augmented binary hypercubes [9], balanced hypercubes [21], Fibonacci cubes [7], extended Fibonacci cubes [20], and enhanced Fibonacci cubes [14].
Recently, Li et al. [11] proposed an interconnection network called the dual-cube. An n-dimensional dual-cube is composed of $2^n$ $(n-1)$-dimensional hypercubes as building blocks, also referred to as clusters. The clusters are further grouped into two classes. Each node in a cluster has exactly one external link, which connects it to a node in a cluster of the other class. The dual-cube was found to share many properties of the hypercube while greatly increasing the total number of nodes in the system for a limited node degree. For a fixed node degree of n, a dual-cube contains $2^{n-1}$ times as many nodes as a hypercube ($2^{2n-1}$ versus $2^n$). Since integrated circuit (IC) technology limits the number of links per node, increasing the total number of nodes while keeping the node degree small is a meaningful goal. This property motivated the present study of the dual-cube as a variation and an improvement of the hypercube, which is still used by industry in building high-performance supercomputers, such as the recent SGI Origin 3000 series.
Some basic topological properties of the dual-cube, such as degree, diameter, cost, average node distance, and bisection width, can be found in [11]. This paper focuses on the structural self-similarity of the dual-cube, i.e., its recursive construction and decomposition, and on the embedding of other structures into it. Some preliminary considerations on optimal VLSI layout design of the dual-cube are also presented.
The rest of the paper is organized as follows: Section 2 introduces a formal definition of the dual-cube and the notations used. Section 3 investigates topological self-similarity of the dual-cube. Section 4 studies the Hamiltonian property of the dual-cube. Section 5 discusses some considerations on efficient VLSI layout design. Finally, Section 6 concludes the paper and describes some future work.
Definition of the dual-cube
The most basic, and perhaps the most important, property that the dual-cube inherits from the hypercube is the binary coding of its nodes, which is beneficial for implementing communication primitives. A node in an n-dimensional ($n \ge 2$) dual-cube, abbreviated as n-dual-cube and denoted as $DC_n$ hereafter, can be expressed as $0 \frown C_n^i \frown N_n^j$ or $1 \frown N_n^j \frown C_n^i$, where $\frown$ denotes the concatenation of binary strings. For simplicity, we omit the symbol $\frown$ in what follows. In this notation, the first bit (0 or 1) is the class ID, while $C_n^i$ and $N_n^j$ are the cluster ID and the node ID, respectively, each an $(n-1)$-bit binary string.
The set of nodes of $DC_n$ can be expressed as $DC_n = \{0C_n^iN_n^j,\ 1N_n^jC_n^i \mid 1 \le i, j \le 2^{n-1}\}$. For n = 2, we have $C_2^1 = N_2^1 = 0$ and $C_2^2 = N_2^2 = 1$. Two nodes in $DC_n$ are connected if and only if they differ in exactly one bit and one of the following two conditions is satisfied: (1) the difference occurs in the node ID, or (2) the difference occurs in the class ID. Links satisfying the first condition are intra-cluster links, whereas those satisfying the second condition are inter-cluster links. The 2-dual-cube is shown in Figure 1. The nodes connected by intra-cluster links constitute hypercubes; that is, each cluster in an n-dual-cube is an $(n-1)$-dimensional hypercube, denoted as $HC_{n-1}$.
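The node set and the two linking conditions translate directly into a small adjacency builder for experimentation. The sketch below is illustrative only. The integer encoding (the class ID as the most significant of the $2n-1$ bits, followed by the two $(n-1)$-bit fields in the order used above) and the names `dual_cube` and `cluster_of` are our own assumptions, not part of the definition in [11]. It checks the node count, the node degree n, link symmetry, and that every node has n-1 neighbours inside its own cluster, which is the degree of $HC_{n-1}$.

```python
def dual_cube(n):
    """Adjacency sets of DC_n (n >= 2) on (2n-1)-bit integer labels.

    Assumed encoding: the most significant bit is the class ID; a class-0
    label reads 0|cluster ID|node ID and a class-1 label reads
    1|node ID|cluster ID, each ID being n-1 bits wide.
    """
    m = n - 1
    class_bit = 1 << (2 * m)
    adj = {}
    for u in range(1 << (2 * m + 1)):
        # node-ID bits: the low field for class 0, the middle field for class 1
        node_bits = range(0, m) if u < class_bit else range(m, 2 * m)
        nbrs = {u ^ class_bit}                      # condition (2): flip the class ID
        nbrs |= {u ^ (1 << b) for b in node_bits}   # condition (1): flip one node-ID bit
        adj[u] = nbrs
    return adj


def cluster_of(u, n):
    """(class ID, cluster ID) of label u under the same assumed encoding."""
    m = n - 1
    mask = (1 << m) - 1
    if u >> (2 * m) == 0:
        return 0, (u >> m) & mask      # class 0: cluster ID is the middle field
    return 1, u & mask                 # class 1: cluster ID is the low field


for n in range(2, 6):
    adj = dual_cube(n)
    assert len(adj) == 2 ** (2 * n - 1)                   # 2^(2n-1) nodes
    assert all(len(nbrs) == n for nbrs in adj.values())   # node degree n
    assert all(u in adj[v] for u in adj for v in adj[u])  # links are symmetric
    for u, nbrs in adj.items():
        same = [v for v in nbrs if cluster_of(v, n) == cluster_of(u, n)]
        assert len(same) == n - 1      # n-1 neighbours inside the node's own cluster
```

Flipping a node-ID bit never changes the cluster field, and flipping the class bit always leaves the cluster, so each node has n-1 intra-cluster links and exactly one inter-cluster link.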
Topological self-similarity of the dual-cube
Self-similarity of an interconnection network is desirable when deriving substructures and designing VLSI/WSI layouts. It is also the basis of some algorithms, such as divide-and-conquer. An important property of an n-dual-cube is that it can be constructed recursively from lower-dimensional dual-cubes. The set of nodes of $DC_n$ ($n \ge 3$) can be written as $DC_n = \{00C_{n-1}^i0N_{n-1}^j,\ 00C_{n-1}^i1N_{n-1}^j,\ 01C_{n-1}^i0N_{n-1}^j,\ 01C_{n-1}^i1N_{n-1}^j,\ 10N_{n-1}^j0C_{n-1}^i,\ 10N_{n-1}^j1C_{n-1}^i,\ 11N_{n-1}^j0C_{n-1}^i,\ 11N_{n-1}^j1C_{n-1}^i \mid 1 \le i, j \le 2^{n-2}\}$, with $C_2^1 = N_2^1 = 0$ and $C_2^2 = N_2^2 = 1$. Separating each cluster into two parts, one in which the first bit of the node ID is 0 and one in which it is 1, divides an n-dual-cube into four subgraphs. An example is given in Figure 2 for n = 3, where each subgraph is enclosed by dotted lines.
Obviously, if we remove the links between the four subgraphs and delete the first bits of the cluster ID and the node ID, we obtain four identical, disjoint $(n-1)$-dual-cubes. This property is summarized in the following proposition. Proposition 3.1 Any n-dual-cube ($n \ge 3$) can be divided into four $(n-1)$-dual-cubes.
Proof: An n-dual-cube consists of $2^{2n-1}$ nodes and an $(n-1)$-dual-cube consists of $2^{2n-3}$ nodes, so an n-dual-cube is four times as large as an $(n-1)$-dual-cube. Therefore, we only need to prove that each subgraph obtained by the process described above is an $(n-1)$-dual-cube. The nodes of an n-dual-cube are labeled by the set of $(2n-1)$-bit binary numbers. Removing the first bit of the cluster ID yields two sets of $(2n-2)$-bit binary numbers, and further removing the first bit of the node ID yields four sets of $(2n-3)$-bit binary numbers. Moreover, only the intra-cluster links between nodes whose node IDs differ in the first bit are removed, while all inter-cluster links remain unchanged. Thus the adjacency of nodes in each subgraph still satisfies the two linking conditions, and each subgraph is an $(n-1)$-dual-cube. □
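Proposition 3.1 can also be checked mechanically for small n. The sketch below uses hypothetical helpers of our own, not the authors' procedure. It assumes bit-string labels with the field order of Section 2 (class bit at string index 0). Under this layout, cutting on the ith bit of both ID fields means fixing string indices i and n-1+i. `split_into_four(n, i)` keeps only the links whose endpoints agree at those two positions, deletes the two bits from every label, and compares each of the four pieces with $DC_{n-1}$ built from scratch. With i = 1 this is the division used in the proof. Other values of i correspond to the alternative divisions of Proposition 3.3.

```python
from itertools import product


def dual_cube_edges(n):
    """Edge set of DC_n on (2n-1)-character bit strings (assumed encoding).

    Class-0 labels read 0|cluster ID|node ID and class-1 labels read
    1|node ID|cluster ID, so the node-ID field occupies string indices
    n..2n-2 for class 0 and 1..n-1 for class 1.
    """
    edges = set()
    for bits in product("01", repeat=2 * n - 1):
        u = "".join(bits)
        node_field = range(n, 2 * n - 1) if u[0] == "0" else range(1, n)
        for k in (0, *node_field):                 # class ID or one node-ID bit
            v = u[:k] + ("1" if u[k] == "0" else "0") + u[k + 1:]
            edges.add(frozenset((u, v)))
    return edges


def split_into_four(n, i=1):
    """Cut DC_n along the ith bit (1 <= i <= n-1) of both ID fields.

    The links that join different parts are exactly the intra-cluster links
    flipping the ith node-ID bit; they are dropped, and the two fixed bits
    are deleted from the labels of the surviving links.
    """
    a, b = i, n - 1 + i                            # string indices of the cut bits
    key = lambda u: (u[a], u[b])
    drop = lambda u: u[:a] + u[a + 1:b] + u[b + 1:]
    parts = {}
    for edge in dual_cube_edges(n):
        u, v = tuple(edge)
        if key(u) == key(v):                       # link stays inside one part
            parts.setdefault(key(u), set()).add(frozenset((drop(u), drop(v))))
    return parts


for n in (3, 4, 5):
    reference = dual_cube_edges(n - 1)
    for i in range(1, n):
        parts = split_into_four(n, i)
        assert len(parts) == 4
        assert all(part == reference for part in parts.values())
```

Comparing edge sets is enough here because $DC_{n-1}$ has no isolated nodes, so equal edge sets imply equal graphs with identical labels.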
In the above discussion, we give priority to the first bits of the cluster ID and the node ID. That is, we insert one bit before the first bit of the cluster ID and one before the first bit of the node ID of $DC_{n-1}$ to construct $DC_n$, and we divide $DC_n$ into four copies of $DC_{n-1}$ by removing the intra-cluster links between nodes whose node IDs differ in the first bit. As a matter of fact, this choice is not necessary: we can operate on any bit of the cluster ID or the node ID, which leads to the following two propositions.
Proposition 3.2 There are $(n-1)^2$ different ways in which an n-dual-cube can be constructed from an $(n-1)$-dual-cube.
Proof: Both the cluster ID and the node ID have n-1 bits. We can insert one bit in front of the ith bit ($1 \le i \le n-1$) of the cluster ID and in front of the jth bit ($1 \le j \le n-1$) of the node ID of $DC_{n-1}$, as long as the same rule is followed for all nodes. Altogether, there are $(n-1) \times (n-1) = (n-1)^2$ different ways. The n-dual-cube constructed in each of these ways is unique. □
Proposition 3.3 There are n-1 different ways in which an n-dual-cube can be divided into four $(n-1)$-dual-cubes.
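Proof: Each node ID has n-1 bits. For any i ($1 \le i \le n-1$), we can remove the intra-cluster links between nodes whose node IDs differ in the ith bit; the argument of Proposition 3.1 then applies with the ith bits in place of the first bits. □
As an example, another way of dividing $DC_3$ is illustrated in Figure 3.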
Hamiltonicity of the dual-cube
The Hamiltonian property is one of the major requirements in designing network topologies. It enables a network to embed a ring structure, to implement the popular and relatively new technique of ring routing, and to use ring-based multicast algorithms. Here we first prove that all dual-cubes are Hamiltonian and then show how to emulate rings and linear arrays on the dual-cube. Recall that each cluster in a dual-cube $DC_n$ is a hypercube $HC_{n-1}$. It has been shown that there exist $\lfloor n/2 \rfloor$ edge-disjoint Hamiltonian cycles in a hypercube $HC_n$ [16]. For example, two edge-disjoint Hamiltonian cycles in $HC_4$ are illustrated in Figure 4. From the proof of Theorem 4.1, we can observe that the Hamiltonian cycle of a dual-cube is actually the sequential head-to-tail interconnection of Hamiltonian subcycles of the hypercubic clusters. The claim of Theorem 4.2 then becomes obvious. Proof: It is easy to observe that, as in a hypercube, a ring of odd length cannot be embedded into any dual-cube, since every link joins two nodes whose labels differ in exactly one bit, which makes the dual-cube bipartite.
Since each cluster in $DC_n$ is an $HC_{n-1}$, a ring of any even length k with $4 \le k \le 2^{n-1}$ ($n \ge 3$) can be embedded within any cluster of $DC_n$.
We now prove the case in which the embedding contains inter-cluster links. By the definition of a dual-cube, there are no direct inter-cluster links between nodes of the same class, and there is only one inter-cluster link between any two clusters of different classes. Therefore, a ring that uses inter-cluster links must pass through at least 4 clusters (2 clusters of class 0 and 2 clusters of class 1). It is then easy to verify the exceptional case.
A ring of length $2^{i+2}$, $i = 1, 2, 3, \ldots, n-1$, can be embedded as follows:
For $i < n-1$, where links within each row are intra-cluster links and links between two rows are inter-cluster links.
In the above construction, the ring traverses only one intra-cluster link, joining two adjacent nodes, in each cluster it passes through. In a hypercube $HC_{n-1}$, two adjacent nodes are joined by paths of every odd length from 1 to $2^{n-1}-1$. Therefore, by replacing each such intra-cluster link with a path of up to $2^{n-1}-1$ intra-cluster links, we can expand the above rings to contain up to $2^{n+i}$ links. Obviously, the intervals $[2^{i+2}, 2^{n+i}]$ ($i = 1, 2, 3, \ldots, n-1$) contain all even numbers from 8 to $2^{2n-1}$.
It is further noted that, when $n \ge 3$, consecutive intervals $[2^{i+2}, 2^{n+i}]$ ($i = 1, 2, 3, \ldots, n-1$) intersect, which implies that the embedding of a ring into $DC_n$ is not unique.
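The counting in the last two paragraphs can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, since the base ring construction itself is not reproduced here. We assume the base ring for a given i uses $2^{i+1}$ inter-cluster links and $2^{i+1}$ intra-cluster links, one in each of $2^{i+1}$ distinct clusters. We also assume each intra-cluster link can be replaced by a path of any odd length from 1 to $2^{n-1}-1$ between the same two nodes. The function names are hypothetical.

```python
def ring_lengths(n):
    """Ring lengths reachable by expanding the base rings of DC_n (n >= 2)."""
    lengths = set()
    longest = 2 ** (n - 1) - 1            # longest path between adjacent nodes of HC_{n-1}
    for i in range(1, n):
        k = 2 ** (i + 1)                  # inter-cluster links = intra-cluster segments
        # k odd segment lengths, each in [1, longest], can sum to k, k+2, ..., k*longest
        for intra in range(k, k * longest + 1, 2):
            lengths.add(k + intra)
    return lengths


def intervals(n):
    """The intervals [2^(i+2), 2^(n+i)] for i = 1, ..., n-1."""
    return [(2 ** (i + 2), 2 ** (n + i)) for i in range(1, n)]


for n in range(3, 7):
    # every even length from 8 to 2^(2n-1) is covered ...
    assert ring_lengths(n) == set(range(8, 2 ** (2 * n - 1) + 1, 2))
    # ... and consecutive intervals overlap, so some lengths have several embeddings
    iv = intervals(n)
    assert all(lo2 <= hi1 for (_, hi1), (lo2, _) in zip(iv, iv[1:]))
```

For n = 3 the intervals are [8, 16] and [16, 32], which share the length 16. For n = 2 the only reachable length is 8, the whole 8-node dual-cube.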
VLSI layout of the dual-cube
In wafer-scale integration (WSI), all the processors of a multicomputer are placed on a silicon wafer. They are arranged in a specific pattern on the wafer surface and connected by on-wafer wiring. On-wafer wires are allowed to run only horizontally and vertically. A track is a continuous horizontal or vertical line on which wires are placed without overlapping each other. If all the processors (nodes) of a multiprocessor system are arranged in a linear array, then all the tracks are one-dimensional. To obtain an efficient layout, we need to minimize the number of tracks.
Before discussing the VLSI design of the dual-cube, we first investigate the relation between the dual-cube and the hypercube. As mentioned in Section 2, a cluster in an n-dual-cube is an $(n-1)$-hypercube. Looking into the structure of a dual-cube in more detail, we obtain the following proposition. Proposition: An n-dual-cube $DC_n$ can be embedded into a $(2n-1)$-dimensional hypercube $HC_{2n-1}$. Proof: An $HC_{2n-1}$ has $2^{2n-1}$ nodes, each of which is also represented by a $(2n-1)$-bit binary number, and two nodes are adjacent if and only if they differ in one bit. Obviously, $DC_n$ has the same number of nodes as $HC_{2n-1}$ but fewer links, and every link of $DC_n$ joins two nodes that differ in exactly one bit. Therefore, $DC_n$ can be embedded into $HC_{2n-1}$. □ For a node degree of n, there are a total of $2^{n-1}n$ links in the hypercube $HC_n$ and $2^{2(n-1)}n$ links in the dual-cube $DC_n$. Therefore, with the same number of nodes, $2^{2n-1}$, the dual-cube $DC_n$ contains $2^{2(n-1)}n$ links, while the hypercube $HC_{2n-1}$ has $2^{2(n-1)}(2n-1)$ links. The ratio is $n/(2n-1)$, so the total number of links decreases by nearly 50%. This proposition suggests that an efficient VLSI layout of $DC_n$ requires far fewer tracks than one of the hypercube $HC_{2n-1}$.
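The embedding and the link counts can be confirmed by brute force for small n. The sketch below assumes the same integer encoding as before (class bit on top, then the two $(n-1)$-bit fields in the order of Section 2). The helper names are ours. Besides checking that every link of $DC_n$ is a hypercube link, it checks which hypercube links are absent. These are the links that flip a cluster-ID bit, that is, positions 2 to n (counted from the left) for labels starting with 0 and positions n+1 to 2n-1 for labels starting with 1.

```python
def dual_cube_links(n):
    """Links of DC_n as (u, v) pairs with u < v, on (2n-1)-bit integer labels."""
    m = n - 1
    links = set()
    for u in range(1 << (2 * m + 1)):
        node_bits = range(0, m) if u >> (2 * m) == 0 else range(m, 2 * m)
        for b in (2 * m, *node_bits):             # class bit, then node-ID bits
            v = u ^ (1 << b)
            links.add((min(u, v), max(u, v)))
    return links


def hypercube_links(d):
    """Links of HC_d: pairs (u, v), u < v, that differ in exactly one of d bits."""
    return {(u, u | (1 << b)) for u in range(1 << d) for b in range(d) if not u >> b & 1}


for n in range(2, 6):
    m = n - 1
    dc, hc = dual_cube_links(n), hypercube_links(2 * n - 1)
    assert dc <= hc                                    # every DC_n link is a hypercube link
    assert len(dc) == n * 2 ** (2 * m)                 # 2^{2(n-1)} n links
    assert len(hc) == (2 * n - 1) * 2 ** (2 * m)       # 2^{2(n-1)} (2n-1) links
    for u, v in hc - dc:                               # hypercube links absent from DC_n
        b = (u ^ v).bit_length() - 1                   # differing bit, 0 = rightmost
        # class 0: bits m..2m-1 (positions 2..n from the left);
        # class 1: bits 0..m-1 (positions n+1..2n-1 from the left)
        assert m <= b < 2 * m if u >> (2 * m) == 0 else b < m
    print(n, len(dc), len(hc), f"{len(dc) / len(hc):.3f}")
```

The printed ratio is $n/(2n-1)$, which approaches one half as n grows.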
From the proof of Theorem 4.1, we can see that, in a Hamiltonian cycle of an n-dual-cube, clusters of class 0 and class 1 appear consecutively in an alternating pattern, which divides the Hamiltonian cycle into $2^n$ sections. Each section corresponds to a Hamiltonian path of an $(n-1)$-hypercube. Intuitively, an efficient arrangement of both the intra-cluster links within each section and the inter-cluster links crossing sections will lead to an efficient overall VLSI layout of the dual-cube.
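The alternating structure can be made concrete with one possible construction. The sketch below is our own, and we do not claim it is the construction used in the proof of Theorem 4.1. Labels are (class, first field, second field) tuples following Section 2. The clusters are visited in binary reflected Gray code order g. For each k, the cycle walks class-0 cluster $g_k$ from node $g_{k-1}$ to node $g_k$. It then crosses to class-1 cluster $g_k$ and walks it from node $g_k$ to node $g_{k+1}$. Each walk is a Hamiltonian path between adjacent nodes, obtained by opening a relabeled Gray-code cycle of the cluster.

```python
def gray(m):
    """Binary reflected Gray code on m bits: a Hamiltonian cycle of HC_m."""
    return [t ^ (t >> 1) for t in range(1 << m)]


def cube_path(a, b, m):
    """Hamiltonian path of HC_m from a to b, for labels a, b differing in one bit."""
    d = (a ^ b).bit_length() - 1          # the bit in which a and b differ

    def swap(x):                          # exchange bits 0 and d of x
        t = (x ^ (x >> d)) & 1            # 1 iff the two bits differ
        return x ^ (t | (t << d))

    cyc = [a ^ swap(g) for g in gray(m)]  # Hamiltonian cycle with cyc[0]=a, cyc[1]=b
    return [cyc[0]] + cyc[:0:-1]          # go around the other way: a, ..., b


def dual_cube_cycle(n):
    """One Hamiltonian cycle of DC_n (n >= 2) as (class, field 1, field 2) labels.

    Labels follow Section 2: (0, cluster ID, node ID) or (1, node ID, cluster ID).
    """
    m = n - 1
    g = gray(m)
    cycle = []
    for k, cur in enumerate(g):
        prev, nxt = g[k - 1], g[(k + 1) % len(g)]
        # section in class-0 cluster `cur`: enter at node `prev`, leave at node `cur`
        cycle += [(0, cur, x) for x in cube_path(prev, cur, m)]
        # inter-cluster link (0, cur, cur) -> (1, cur, cur), then the section in
        # class-1 cluster `cur`: enter at node `cur`, leave at node `nxt`
        cycle += [(1, y, cur) for y in cube_path(cur, nxt, m)]
        # the next inter-cluster link is (1, nxt, cur) -> (0, nxt, cur)
    return cycle


def adjacent(u, v):
    """Linking rule of Section 2 on (class, field 1, field 2) labels."""
    (cu, pu, qu), (cv, pv, qv) = u, v
    if cu != cv:
        return (pu, qu) == (pv, qv)                      # inter-cluster link
    node_u, node_v = (qu, qv) if cu == 0 else (pu, pv)   # node-ID fields
    clus_u, clus_v = (pu, pv) if cu == 0 else (qu, qv)   # cluster-ID fields
    diff = node_u ^ node_v
    return clus_u == clus_v and diff != 0 and diff & (diff - 1) == 0


for n in range(2, 6):
    c = dual_cube_cycle(n)
    assert len(c) == len(set(c)) == 2 ** (2 * n - 1)                     # every node once
    assert all(adjacent(c[t - 1], c[t]) for t in range(len(c)))          # closed ring
    assert sum(c[t - 1][0] != c[t][0] for t in range(len(c))) == 2 ** n  # 2^n sections
```

Because each section is one whole cluster and the sections alternate between the two classes, the cycle has exactly $2^n$ class changes, matching the division into $2^n$ sections described above.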
Chen et al. [2] proposed an efficient layout of hypercubes using one-dimensional tracks and calculated the number of one-dimensional tracks required for an N-node hypercube, where $N = 2^n$ and n is the dimension of the hypercube. In their approach, a Hamiltonian path is constructed first, in which the nodes are arranged in a linear order using the binary reflected Gray code. Then, the N-node hypercube is divided into two $N/2$-node subcubes. The addresses of two connected nodes, one from each of these two subcubes, differ in the most significant bit. In the above sequence, the first eight nodes belong to one subcube and the remaining eight nodes belong to the other subcube. The number of tracks required to connect nodes between the two subcubes is given by the bisection width of this division, i.e., $N/2$.
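The linear order and the first division can be regenerated for a small case. The sketch below assumes the standard binary reflected Gray code $g_t = t \oplus \lfloor t/2 \rfloor$. The helper names are ours. For N = 16 it shows that the first eight nodes all have most significant bit 0, and that the node at position p is linked across the division to the node at position N-1-p. The N/2 crossing links are therefore nested around the middle of the path.

```python
def gray_order(n):
    """Binary reflected Gray code order of the 2^n hypercube nodes (a Hamiltonian path)."""
    return [t ^ (t >> 1) for t in range(1 << n)]


def bisection_links(n):
    """Position pairs (p, q) of the links crossing the most-significant-bit division."""
    order = gray_order(n)
    where = {label: p for p, label in enumerate(order)}
    half = 1 << (n - 1)
    return [(p, where[label ^ half]) for p, label in enumerate(order) if p < half]


order = gray_order(4)                               # N = 16
assert all(v < 8 for v in order[:8])                # first subcube: MSB 0
assert all(v >= 8 for v in order[8:])               # second subcube: MSB 1
assert all(bin(order[t] ^ order[t + 1]).count("1") == 1 for t in range(15))
assert bisection_links(4) == [(p, 15 - p) for p in range(8)]   # N/2 nested links
```

The mirror pairing follows from the reflection property $g_{N-1-t} = g_t \oplus N/2$ of the reflected Gray code.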
The above process can be repeated on each $N/2$-node subcube until all the subcubes are divided into 1-node subcubes. The total number of tracks required is the sum of the bisection widths at each division. It was shown that the exact number of tracks used is $N - \log N$.
Later, Wu [19] showed that, for the same layout, the number of tracks used can be reduced to $\lfloor 2N/3 \rfloor$. Proof: Based on the above arguments, half of the connections in bisection (division) $2k-1$ can share the tracks used for connections in bisection $2k$. Therefore, the T tracks consist of $2^{n-1} - 2^{n-2} = 2^{n-2}$ tracks for bisection 1, $2^{n-2}$ tracks for bisection 2, $2^{n-3} - 2^{n-4} = 2^{n-4}$ tracks for bisection 3, $2^{n-4}$ tracks for bisection 4, ..., $2^{n-(2k-1)} - 2^{n-2k} = 2^{n-2k}$ tracks for bisection $2k-1$, $2^{n-2k}$ tracks for bisection $2k$, and so on. We have $T = 2^{n-2} + 2^{n-2} + 2^{n-4} + 2^{n-4} + \cdots$, where, for odd n, the last (unpaired) bisection contributes a single track; in both cases $T = \lfloor 2N/3 \rfloor$. □
Wu's approach can also be used to lay out the intra-cluster links in each hypercubic substructure of a dual-cube. The inter-cluster links can share tracks with each other or with intra-cluster links. Figure 8 shows a possible track sharing for the case where the end nodes of two links are adjacent to each other along the Hamiltonian path. As an example, an efficient VLSI layout of a 3-dual-cube is illustrated in Figure 9, where 9 tracks are used. In contrast, a hypercube with the same number of nodes, 32 in this case, needs 21 tracks according to Theorem 5.1, as shown in Figure 10. Note that the nodes in the Hamiltonian paths of the dual-cube and the hypercube are not in the same order.
Conclusions and future work
An n-dimensional dual-cube can also be viewed as an incomplete $(2n-1)$-dimensional hypercube whose missing (faulty) links join nodes whose binary codes differ in exactly one bit, where the difference occurs at positions 2 to n if the codes start with 0 and at positions n+1 to 2n-1 if the codes start with 1. It was shown that dual-cubes have recursive structures: a higher-dimensional dual-cube can be recursively constructed from, or decomposed into, lower-dimensional dual-cubes. It was also proved that all dual-cubes are Hamiltonian, and that rings and linear arrays can be embedded into a dual-cube. Compared with a hypercube of the same node degree n, the dual-cube contains $2^{n-1}$ times as many nodes; compared with a hypercube with the same number of nodes, it has approximately 50% fewer links. Consequently, an efficient VLSI layout of the dual-cube requires significantly fewer tracks. These topological properties, together with many others, make the dual-cube a promising interconnection network for building large parallel computing systems.
Future work on dual-cubes includes developing a detailed strategy for designing efficient VLSI layouts, evaluating the architectural complexity, and investigating application issues in parallel computing.
