Preparata and Vuillemin proposed the cube-connected cycles (CCC) and its compact layout in 1981 [16]. We give a new layout scheme for the CCC which uses less than half the area of the Preparata-Vuillemin layout. We also give a lower bound on the layout area of the CCC. The area of the new layout deviates from the bound by a small constant factor. We then consider the unfolded CCC, called the cube-connected lines (CCL), for which we give a compact layout. This latter layout is tight when compared with the lower bound above, which is also applicable to the CCL.
Introduction
The interconnection network is an important design element in the construction of parallel computers. Many issues are involved in deciding on a specific topology for connecting a set of processors. With the rapid technological progress in VLSI, it is reasonable to conceive of a huge number of processors being integrated tightly together to solve problems in a cooperative, parallel fashion. Therefore, one of the criteria for judging the suitability of an interconnection network for the implementation of parallel computers is whether the network can be laid out compactly in a VLSI grid.
The cube-connected cycles (CCC) is one of the most popular interconnection networks. It was proposed by Preparata and Vuillemin [16] in 1981 as a substitute for the hypercube. In the same paper, they gave an asymptotically optimal layout scheme for the CCC. Their layout scheme, however, does not produce a minimal layout for the CCC. Our work aims at finding better layout schemes for the CCC. Research in the fields of graph embedding and VLSI layout has developed powerful techniques [2, 5] that can produce embeddings and layouts which are quite efficient, often within a constant factor of the optimal. However, even a modest constant factor may render an asymptotically optimal layout or embedding unacceptable for real implementation, so it is necessary to try to achieve the minimal. This is the motivation behind our work.
Our project has two goals: (1) to give a more compact layout for the CCC than the Preparata-Vuillemin layout, and (2) to reduce the long wires of the layout while keeping the asymptotically optimal area. We have achieved the first goal: a new layout scheme which uses less than half the area of the Preparata-Vuillemin layout. Section 2 reviews the Thompson model and the Preparata-Vuillemin layout. Section 3 presents the new layout and compares it with the Preparata-Vuillemin layout. Section 4 gives a lower bound on the layout area, which is met (save some low-order terms) by our layout of the CCL as given in Section 5.
Preliminaries

Thompson Model
Among the many mathematical models that have been proposed for VLSI computations, the most widely accepted one is due to Thompson [17, 18]. In his model, the chip is presumed to consist of a grid of vertical and horizontal tracks which are spaced apart at unit intervals. Two layers of interconnect are used to route the wires. Vertical wires are routed in the top layer of the interconnect and horizontal wires are routed in the bottom layer. Hence, wires may cross each other but cannot overlap for any distance or cross a node to which they are not incident. To change direction, wires may turn into the other layer by contact cuts or vias that facilitate connections between the two layers. In our discussion, no knock-knees are allowed; that is, two wires cannot turn at the same grid point [14, 15].
Formally, an embedding or layout of a graph $G$ in a Thompson grid is an assignment of the nodes of $G$ to intersection points in the grid and the edges of $G$ to paths along the grid tracks. One of the important measures of a layout is the layout area, which is defined as the product of the number of vertical tracks and the number of horizontal tracks that the layout uses to contain all the nodes and all the path segments.
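The area measure can be illustrated with a minimal sketch. The function name `layout_area` and the input representation (grid points as integer pairs, wires as lists of corner points) are hypothetical choices made for this illustration, and the sketch assumes that every track between the extreme occupied tracks counts as used.

```python
def layout_area(nodes, paths):
    """Area of a layout in a Thompson-style grid (illustrative sketch).

    nodes: iterable of (x, y) grid points that hold nodes.
    paths: iterable of wires, each a list of (x, y) corner points
           joined by axis-parallel segments.
    """
    points = list(nodes) + [p for path in paths for p in path]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs) + 1    # vertical tracks spanned by the layout
    height = max(ys) - min(ys) + 1   # horizontal tracks spanned by the layout
    return width * height


# A 4-cycle drawn on the unit square: two vertical and two horizontal tracks.
square_nodes = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_paths = [[(0, 0), (1, 0)], [(1, 0), (1, 1)],
                [(1, 1), (0, 1)], [(0, 1), (0, 0)]]
print(layout_area(square_nodes, square_paths))
```

Only the bounding rectangle matters for this measure, which is why a layout's aspect ratio and its area can be discussed separately.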
Cube-Connected Cycles
The $s$-dimensional ($s$-d for short) cube-connected cycles (CCC) is constructed from the $s$-dimensional hypercube by replacing each node of the hypercube with a cycle of $s$ nodes [13, 16]. The $i$th-d edge of a node of the hypercube is then connected to the $i$th node of the corresponding cycle of the CCC. For example, see Fig. 1(a,b). The resulting graph has $s 2^s$ nodes, each of degree 3. By extending the labeling scheme of the hypercube, we can represent each node of the CCC by $\langle w, i \rangle$, where $i$ ($1 \le i \le s$) is the position of the node within its cycle and $w$ (an $s$-bit binary string with the 0th-d at the rightmost) is the label of the node in the hypercube that corresponds to the cycle. Two nodes, $\langle w, i \rangle$ and $\langle w', i' \rangle$, are linked by an edge in the CCC if and only if either [...] Edges of kind (1) are cycle-edges and edges of kind (2) are cube-edges.

As shown in Fig. 1(c), the CCC is often drawn in the multi-stage format, which directly gives rise to the Preparata-Vuillemin layout. The first and the last stages, stages 1 and $s$, are called the two end stages; they consist of all the nodes $\langle w, i \rangle$ for $i = 1$ and $i = s$ respectively.

The CCC is closely related to the butterfly network, just as the shuffle-exchange network is to the de Bruijn network. The group-theoretic relations of the four networks are well studied in [1], where the CCC and the butterfly are proved to be Cayley graphs derivable from the shuffle-exchange network and the de Bruijn network respectively and, inversely, the shuffle-exchange network and the de Bruijn network are proved to be coset graphs of the CCC and the butterfly network respectively.
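The node count and degree stated above can be checked with a small sketch. It assumes the usual adjacency rules for the CCC, which are an assumption of this example rather than a quotation of the definition: a cycle-edge joins consecutive positions of the same cycle (wrapping from $s$ back to 1), and a cube-edge at position $i$ joins two cycles whose labels differ only in bit $i-1$. The function names are hypothetical, and the sketch is meant for $s \ge 3$, where all edges are distinct.

```python
def ccc_edges(s):
    """Edge set of the s-d CCC under the assumed adjacency rules (s >= 3)."""
    edges = set()
    for w in range(1 << s):                      # w: label of a cycle
        for i in range(1, s + 1):                # i: position within the cycle
            nxt = i % s + 1                      # next position, wrapping s -> 1
            edges.add(frozenset({(w, i), (w, nxt)}))        # cycle-edge
            flipped = w ^ (1 << (i - 1))         # flip the bit of dimension i-1
            edges.add(frozenset({(w, i), (flipped, i)}))    # cube-edge
    return edges


def degrees(edges):
    """Map each node to the number of edges incident to it."""
    deg = {}
    for edge in edges:
        for node in edge:
            deg[node] = deg.get(node, 0) + 1
    return deg


edges = ccc_edges(3)
deg = degrees(edges)
print(len(deg), len(edges), set(deg.values()))
```

For $s = 3$ this yields $3 \cdot 2^3 = 24$ nodes, all of degree 3, joined by 36 edges (24 cycle-edges and 12 cube-edges).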
We introduce an unfolded version of the CCC. Like the butterfly network, the CCC now has the traditional folded version and the new unfolded version. For the unfolded CCC, condition (1) in the above definition is changed to [...] Each cycle of the CCC is replaced by a line in the unfolded CCC. We therefore call the unfolded CCC the cube-connected lines, denoted by CCL hereafter. The 3-d CCL is shown in Fig. 1(d).
The Preparata-Vuillemin Layout
[...] [16]. Therefore, the Preparata-Vuillemin layout is asymptotically optimal.
More precisely, for an $s$-d CCC with $n = 2^s$ cycles, denoted by CCC($n$) hereafter, let $W(s)$ and $H(s)$ be the numbers of vertical and horizontal tracks respectively, i.e., the width and the height [...]
New Layouts
Although the Preparata-Vuillemin layout for the CCC is asymptotically optimal, it is not the minimal layout. For real implementations, we would prefer using as tight a layout as possible. Here we give a new layout for the CCC. It is more compact than the Preparata-Vuillemin layout; whether it is minimal is an open question. Referring to Fig. 2 again, there are two obvious shortcomings in the Preparata-Vuillemin layout:
(1) the layout does not try to make use of the corner positions of a cycle by putting some nodes there, and
(2) the layout places all the cycles along the same horizontal axis.
In the new layout, these two problems are corrected, and the resulting layout uses less area and has a better aspect ratio. Fig. 3(a,b,c) shows the layouts of the first three CCC's, starting from the second dimension. These layouts use minimal areas. As our interest is in the layout of the general CCC, we omit the proofs of these specific cases here. Like the Preparata-Vuillemin layout, the new layout is based on recursive construction. Unlike the Preparata-Vuillemin layout, for which the recursion begins at the first dimension, the base case for recursion in the new layout is the 4-d CCC (Fig. 3(c)). The reason for this is that the layout of the 4-d CCC is the first one (starting from the first dimension) that puts a node in every corner of a cycle. This layout is correct in the sense that it is indeed a valid CCC that is being laid out. This can be easily verified by examining the connections against the labels of the cycles in Fig. 3(c). The smaller cases can be verified similarly.
Recursive Construction
The procedure is as follows, for $s \ge 5$. [...] these extra tracks: one node is added to every cycle, and its corresponding node in the other copy of the layout of the $(s-1)$-d CCC is placed at the same horizontal position, and the two are joined by a horizontal wire. We label the cycles in the left copy by extending the original labels by a 0 on the left, and the cycles in the right copy by a 1 on the left. The correctness of the $s$-d layout immediately follows from this labeling scheme. In Fig. 3(d), which is recursively constructed from Fig. 3(c), the labels of the bottom row of cycles are shown.
In the same way that a more economical layout can be derived from the Preparata-Vuillemin layout, the new layout has a more economical version. The more economical version for the 6-d CCC is shown in [...] $2n \log n - 5n$.
Comparison
By ignoring the low-order terms in Formulae 1, 2, 3 and 4, the four layout schemes of CCC($n$) discussed above take areas of approximately $2n^2$, $\frac{4}{3}n^2$, $\frac{3}{4}n^2$ and $\frac{1}{2}n^2$ respectively. We compare the new layout with the Preparata-Vuillemin layout, and the more economical version of the new layout with the more economical version of the Preparata-Vuillemin layout. In either case, the new layout scheme uses less than half the area of the Preparata-Vuillemin layout. The other important advantage of the new layout is that it has a more practical aspect ratio ($W(s)/H(s)$), which is close to 1, whereas the aspect ratio of the Preparata-Vuillemin layout could be as large as 3. Because of its better aspect ratio, the new layout has a shorter maximum wire length than the Preparata-Vuillemin layout.
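The "less than half" claim can be checked directly from the four leading coefficients. The sketch below assumes that the coefficients are listed in the order Preparata-Vuillemin, economical Preparata-Vuillemin, new, economical new; that ordering is inferred from the comparison described in the text, and the variable names are illustrative.

```python
from fractions import Fraction

# Leading coefficients of n^2, in the (assumed) order of Formulae 1 to 4.
pv, pv_economical = Fraction(2), Fraction(4, 3)
new, new_economical = Fraction(3, 4), Fraction(1, 2)

ratio_plain = new / pv                              # new vs. Preparata-Vuillemin
ratio_economical = new_economical / pv_economical   # economical vs. economical
print(ratio_plain, ratio_economical)
```

Both ratios come out as 3/8, so under this reading each version of the new layout uses 37.5% of the area of its Preparata-Vuillemin counterpart, up to low-order terms.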
The new layout also shows the superiority of the CCC in layout area over other hypercube substitutes such as the shuffle-exchange network and the butterfly network [13]. The optimal layout of the shuffle-exchange network is due to Leighton [12]. His layout of the shuffle-exchange network, as well as the other related ones, however, is complicated, and neither regular nor recursive. For years, the best known layout of the butterfly network with $n$ inputs or outputs was that by Wise [19], which has area $\approx 2n^2$. Recently, a more compact layout for the butterfly was found with area $\approx \frac{11}{6}n^2$ [9].
The butterfly networks discussed in these two papers, however, are unfolded. To be fair, the folded butterfly network (i.e., the first and the last stages are merged) [13] should be considered when comparing with the CCC. The corresponding areas of the folded butterfly given in [9, 19] would then need to be doubled or quadrupled. On the other hand, as will be seen in Section 5, the unfolded CCC can be laid out with area $\approx \frac{1}{4}n^2$.
Lower Bound on Layout Area
We give below a lower bound of $(\frac{1}{2}n - 1)^2$ for CCC($n$). Our new layout of the CCC as presented in the previous sections deviates from this bound by a factor of 2. The construction of the lower bound does not consider the laying out of the cycles in a CCC (see Fig. 6), and hence the lower bound is also valid for the CCL. As will be shown in the next section, the CCL can be laid out in an area of $(\frac{1}{2}n + o(n))^2$, which is tight when compared with the lower bound.
The lower bound of $(\frac{1}{2}n - 1)^2$ is easily seen from the bounding strategy invented in [18], which is in terms of the bisection width of a graph.
Lemma 1 [18] For any graph $G$ with bisection width $\mathrm{BW}(G)$, $\mathrm{AREA}(G) \ge (\mathrm{BW}(G) - 1)^2$.
The proof of the bisection width, $\frac{1}{2}n$, of CCC($n$), however, is complicated. Alternatively, we can turn to the modified bounding strategy from [2], where a lower bound on the butterfly network layout is proved by the same technique, but in terms of the minimum special bisection width.
Let $G$ be a graph having a designated set of special nodes. The minimum special bisection width [...]
In order to bound the MSBW of CCC($n$), we use the congestion argument that originated in [12, 13], which bounds unknown MSBWs from known ones.
Lemma 3 Let $G$ and $H$ be graphs having equal numbers of special nodes. If there is an embedding of $G$ into $H$ which maps special nodes to special nodes and which has congestion $C$, then
$\mathrm{MSBW}(H) \ge (1/C)\,\mathrm{MSBW}(G)$.
The complete bipartite graph $K_{n,n}$ plays the role of the guest graph $G$ with known $\mathrm{MSBW} = \frac{1}{2}n^2$.
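The value $\frac{1}{2}n^2$ can be confirmed by brute force for small even $n$. The sketch below assumes that, when all $2n$ nodes are special, the minimum special bisection width is the minimum number of edges cut by any split of the nodes into two halves of $n$ nodes each; this reading of the definition is an assumption of the example, and the function name is hypothetical.

```python
from itertools import combinations


def msbw_complete_bipartite(n):
    """Brute-force minimum bisection width of K_{n,n}, all 2n nodes special."""
    inputs = [("in", a) for a in range(n)]
    outputs = [("out", b) for b in range(n)]
    best = None
    for half in combinations(inputs + outputs, n):   # one side of the split
        side = set(half)
        # An edge (u, v) is cut when exactly one endpoint lies in `side`.
        cut = sum(1 for u in inputs for v in outputs
                  if (u in side) != (v in side))
        best = cut if best is None else min(best, cut)
    return best


for n in (2, 4, 6):
    print(n, msbw_complete_bipartite(n), n * n // 2)
```

The search is exponential in $n$ and is only meant as a sanity check on tiny instances.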
Lemma 4 $\mathrm{MSBW}(K_{n,n}) = \frac{1}{2}n^2$ when all nodes of $K_{n,n}$ are special.
Now we give an embedding of the guest graph $K_{n,n}$ into the host graph CCC($n$).
Lemma 5 One can embed $K_{n,n}$ into CCC($n$) with congestion $2^s = n$ in such a way that the inputs and outputs of $K_{n,n}$ map, respectively, to the first stage and the last stage of CCC($n$).
Proof: Consider the embedding of $K_{n,n}$ into CCC($n$) which assigns the inputs of $K_{n,n}$ to the first stage of CCC($n$) and the outputs of $K_{n,n}$ to the last stage of CCC($n$), and which routes the edges of $K_{n,n}$ in increasing order of dimensions, i.e., from right to left.
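The routing just described can be simulated for small $s$ to count how many paths use each edge. The sketch makes several assumptions that are not spelled out in the text: input $u$ sits at node $\langle u, 1 \rangle$, output $v$ at node $\langle v, s \rangle$, the cube-edge at stage $i$ flips bit $i-1$, and a path corrects the differing bit at stage $i$ before stepping to stage $i+1$ without ever using a wraparound edge. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def route(u, v, s):
    """Edges on the path from <u,1> to <v,s>, fixing bits in increasing order."""
    edges = []
    w = u
    for i in range(1, s + 1):
        bit = 1 << (i - 1)
        if (w ^ v) & bit:                                    # bit i-1 still wrong
            edges.append(frozenset({(w, i), (w ^ bit, i)}))  # take the cube-edge
            w ^= bit
        if i < s:
            edges.append(frozenset({(w, i), (w, i + 1)}))    # step to next stage
    return edges


def edge_loads(s):
    """Number of K_{n,n} edges routed over each host edge, with n = 2^s."""
    n = 1 << s
    load = Counter()
    for u, v in product(range(n), repeat=2):
        load.update(route(u, v, s))
    return load


load = edge_loads(3)
print(len(load), set(load.values()))
```

For $s = 3$, all 28 non-wraparound edges are used and each carries exactly $8 = n$ paths, in line with the congestion claimed in Lemma 5.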
With no loss of generality, see Fig. 6 for an embedding of $K_{8,8}$ into CCC(8). Since the long wraparound cycle-edges of the CCC are not used for routing in the embedding, Fig. 6(a) is simplified to Fig. 6(b), i.e., CCL(8). Fig. 6(b) can be isomorphically rearranged to become Fig. 6(c), in which all stages of nodes except the first stage are reordered so that pairs of nodes connected by cube-edges are placed together as in the first stage, while the cycle-edges at each stage appear in the unshuffle-connection pattern [1, 8]. Fig. 6(c) can be transformed into Fig. 6(e) by replacing every pair of nodes with a complex node as shown in Fig. 6(d). Fig. 6(e) is a reverse Omega network (or a flip network [3]). Hence, the original CCC($n$) is transformed into a reverse Omega network with [...]
A further look reveals that the congestion of the cube-edges of CCC($n$), each shown as a thin edge in Fig. 6(d), is also $2^s = n$, since from each input exactly half of the paths will go through the cube-edge of a complex node.
Theorem 1 Any layout of CCC($n$) has area at least ( [...]
Theorem 2 There is a layout of CCL with area ( [...]
We prove Theorem 2 via a sequence of reductions.
First Reduction
We show how to construct the desired layout of CCL($4n$) from four copies of a suitable layout of CCL($n$). In the following, a stage that is placed "along" a side of a grid means that each of the nodes of the stage is either directly on the side or has no other node between it and the side.
Lemma 7 One can construct a layout $L_{4n}$ of CCL($4n$) with the area indicated in Theorem 2 from four copies of a layout $L_n$ of CCL($n$) that has the following properties.
$L_n$ places CCL($n$) in an $(n + o(n)) \times (n + o(n))$ grid.
$L_n$ places one end stage of CCL($n$) along a vertical side of the grid and the other end stage of CCL($n$) along a horizontal side of the grid.
Proof: Assume without loss of generality that the given layout $L_n$ places the end stages along the bottom and the right sides; see Fig. 7(a). Flip $L_n$ around horizontally to produce layout $L$ [...] □
We have thus reduced the layout problem to one of producing $L_{2n}$.
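As a rough check of the area bookkeeping in Lemma 7, suppose the four copies of the $(n + o(n)) \times (n + o(n))$ layout are arranged two by two and that the wiring between them costs only $o(n)$ extra tracks in each direction. Both suppositions are assumptions of this sketch, since the body of the proof is incomplete here. Writing $N = 4n$ for the size of the combined network:

```latex
\[
  \bigl(2\,(n + o(n))\bigr)^2
  \;=\; (2n + o(n))^2
  \;=\; \bigl(\tfrac{1}{2}N + o(N)\bigr)^2 ,
  \qquad N = 4n .
\]
```

This has the same form as the area $(\frac{1}{2}n + o(n))^2$ quoted for the CCL in Section 4.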
Second Reduction
We now show how to construct the layout $L_n$ that is needed by Lemma 7. For convenience, we assume that $s$ is even. The case of odd $s$ can be dealt with similarly, and the result will differ only in the lower-order terms.
Lemma 8 Suppose that we can lay out any CCL($n$) in an $n \times (n-1)$ grid in such a way that the end stages of CCL($n$) are along the two vertical sides of the grid. Then one can construct a layout $L_n$ of CCL($n$) as described in Lemma 7.
Proof: Let the stages of CCL($n$) be numbered $1, 2, \ldots, s-1, s$, where $n = 2^s$, and stages 1 and $s$ are the end stages. We create layout $L_n$ of CCL($n$) as follows.
Let $k = s/2$. By cutting the edges between stage $k$ and stage $k+1$ we decompose CCL($n$) into
CCL($n_1$): the subgraph of CCL($n$) bounded by stage 1 and stage $k$ (nodes and edges), and
CCL($n_2$): the subgraph of CCL($n$) bounded by stage $k+1$ and stage $s$ (nodes and edges).
[...] copies of $L$, one above another, along the right side of the grid for $L_n$. Layout $L^{(1)}$ resides in an $n \times (2^k - 1)$ grid. By basic properties of the hypercube, CCL($n_2$) is isomorphic to CCL($n_1$) by a suitable relabeling of the lines; we do the same for CCL($n_2$) as we just did for CCL($n_1$). The result is rotated 90 degrees to produce $L^{(2)}$, a layout of CCL($n_2$) along the bottom side of the grid for $L_n$.
It can be easily seen that the smallest grid for $L_n$ that can hold $L^{(1)}$ and $L^{(2)}$ is of area $(n + o(n)) \times (n + o(n))$.
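The low-order term can be made concrete. With $k = s/2$ and $n = 2^s$ we have $2^k = \sqrt{n}$, so the strip holding $L^{(1)}$ is $n \times (\sqrt{n} - 1)$ and, after rotation, the strip holding $L^{(2)}$ is $(\sqrt{n} - 1) \times n$. The placement below is one sufficient arrangement, chosen for illustration and not necessarily the one intended by the construction: the vertical strip is put against the right side above the horizontal strip, which lies against the bottom side.

```latex
\[
  \text{side length} \;=\; n + (2^{k} - 1) \;=\; n + \sqrt{n} - 1 \;=\; n + o(n),
  \qquad\text{since}\quad \frac{\sqrt{n} - 1}{n} \to 0 \ \text{as}\ n \to \infty .
\]
```

Under this placement the two strips do not overlap, and an $n \times n$ region remains unpopulated.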
Finally, we must connect $L^{(1)}$ and $L^{(2)}$ to re-create CCL($n$). This can be accomplished by routing a specific bijection between the two subgraphs of CCL($n$) that have been laid out. It is obvious that the unpopulated area that is left behind after placing the layouts $L^{(1)}$ and $L^{(2)}$ is sufficient for any such bijection to be routed. □
Third Reduction
Our last task is to construct the layout $L$ of CCL($n$) as demanded in Lemma 8. This can be easily done by modifying the Preparata-Vuillemin layout: all cycles become lines, and nodes can be placed at the two ends of a line (refer to Fig. 2(b)). In particular, CCL($n$) can be laid out in an $n \times (n-1)$ grid.
Hence, Theorem 2 is proved.
Conclusion
We have given a simple, regular and more compact layout scheme for the CCC, which takes less than half of the area of the Preparata-Vuillemin layout. We have also derived a lower bound on the layout area of the CCC. Our layout deviates from the lower bound by a constant factor of 2. The lower bound, however, is not the tightest possible because its construction does not take into account the laying out of the cycles in a CCC. It is, more appropriately, a lower bound for the CCL. On the other hand, our tight layout of the CCL can give rise to a layout of the CCC, but the area will be four times that of the CCL (consider Fig. 8: the width and the height of the grid need to be doubled to accommodate the cycles of the CCC). We conjecture therefore that the layout of the CCC as we have proposed in this paper is optimal. Further work will be directed to deriving a lower bound for the CCC that takes the cycles into account.
Our layout of the CCL reveals the superiority of the CCL over the unfolded butterfly network, since the former takes only one-fourth of the layout area of the unfolded butterfly network [2]. Another merit of the CCL is that a CCC can be embedded into a CCL with congestion 2 and dilation 2, due to the well-known fact that a cycle can be embedded into a line with congestion 2 and dilation 2. Hence, the CCL can be a good substitute for the CCC and can execute the ascend-descend algorithm with a small constant slowdown.
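The cycle-into-line fact can be illustrated with a short sketch. It uses one common interleaving (an illustrative choice, not taken from the text): the first half of the cycle is laid on the even positions of the line going out, and the second half on the odd positions coming back. The function names are hypothetical.

```python
def cycle_to_line(m):
    """Line position of each node j of an m-cycle under the interleaved map."""
    return [2 * j if 2 * j <= m - 1 else 2 * (m - 1 - j) + 1 for j in range(m)]


def dilation_and_congestion(m):
    """Dilation and congestion of the interleaved embedding of an m-cycle."""
    pos = cycle_to_line(m)
    load = [0] * (m - 1)          # load[p]: cycle edges routed over (p, p + 1)
    dilation = 0
    for j in range(m):
        a, b = sorted((pos[j], pos[(j + 1) % m]))
        dilation = max(dilation, b - a)
        for p in range(a, b):     # the routed path covers line edges a .. b-1
            load[p] += 1
    return dilation, max(load)


print(cycle_to_line(6), dilation_and_congestion(6))
```

For a 6-cycle the positions are `[0, 2, 4, 5, 3, 1]`, and both the dilation and the congestion equal 2. Applying the same map to every cycle of a CCC, while leaving the cube-edges as they are, is one way to picture the embedding mentioned above.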
Another important measure of a layout is the maximum wire length [4, 11]. We have recently succeeded in coming up with a layout of the CCC which has no long wires and yet preserves the asymptotic optimality of the area [7]. Our next task is to consider the tradeoff [4, 6] between area and maximum wire length for the CCC layouts.
