Preparata and Vuillemin proposed the cube-connected cycles (CCC) in 1981, and in the same paper gave an asymptotically-optimal layout scheme for the CCC [16]. While all the known optimal layouts of the CCC, including the Preparata-Vuillemin layout, have long wires, we give a new layout scheme which has no long wires while keeping the asymptotically-optimal area. Hence, we can conclude that the CCC can be laid out optimally (within a constant factor) both in area and in wire length. We also show how large a constant-factor blow-up in area is needed in order not to produce any long wire in the layout.
Introduction
The interconnection network is an important part of the design of parallel computers. There are many criteria that need to be considered when choosing a specific interconnection for processors. With the technological progress in VLSI of electronic circuits that has taken place so far, it is reasonable to conceive of a huge number of processors being integrated tightly together to cooperate on the execution of parallel algorithms. As such, one of the major criteria by which to judge the suitability of an interconnection network is how compactly it can be laid out in a VLSI grid. The two most frequently used measures of a layout are the layout area and the maximum wire length.
The cube-connected cycles (CCC) is one of the most popular interconnection networks. Preparata and Vuillemin proposed the cube-connected cycles as a substitute for the hypercube in 1981 [16].
They also gave an asymptotically-optimal layout scheme for the CCC. However, their layout scheme has two drawbacks: it is not "minimal" in area, and it has long wires. Our research aims at finding better layout schemes for the CCC. In terms of area, we have proposed an improved layout which is more compact than the Preparata-Vuillemin layout [4]. This paper introduces yet another new layout of the CCC which is free of long wires while keeping the asymptotically-optimal area. Based on this result, we can conclude that the CCC can be laid out optimally both in area and in wire length, thus answering a question posed by Beigel and Kruskal [2]. The result in this paper also shows how large a constant-factor blow-up in area is necessary in order to keep wires short in a layout. Note that not all graphs can have a layout that is optimal in both area and wire length [3].
Preliminaries

The Thompson Model
Among the many mathematical models that have been proposed for VLSI computations, the most widely accepted is due to Thompson, which is known as the Thompson grid model [17, 18]. In this model, the chip is presumed to consist of a grid of vertical and horizontal tracks which are spaced apart at unit intervals. Two layers of interconnect are used to route the wires. Vertical wires are routed in the top layer of the interconnect and horizontal wires are routed in the bottom layer. Hence, wires may cross each other but cannot overlap for any distance or cross a node to which they are not incident. To change direction, wires may turn into the other layer by contact cuts or vias which facilitate connections between the two layers. The routing of wires in this fashion is also known as layer per direction routing or Manhattan routing. In our discussion, no knock-knees are allowed; that is, two wires cannot turn at the same grid point [14, 15].
Formally, an embedding or layout of a graph G in a Thompson grid is an assignment of the nodes of G to intersection points in the grid and the edges of G to paths along the grid tracks. The layout area is the product of the number of vertical tracks and the number of horizontal tracks which contain a node or a path segment of the graph. The maximum wire length is the length of the longest wire in the layout.
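The two measures just defined can be made concrete with a small sketch. The function names, the representation of a wire as a list of corner points, and the example layout are all illustrative assumptions, not part of the model's definition. A track is counted as used if it passes through any grid point occupied by a node or a wire, which is one reasonable reading of the definition above. The sketch does not check the overlap or knock-knee rules.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (vertical track x, horizontal track y)


def grid_points(path: List[Point]) -> List[Point]:
    """Expand a wire, given by its corner points, into every grid point it visits."""
    points = [path[0]]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if x0 != x1 and y0 != y1:
            raise ValueError("wire segments must follow a single grid track")
        dx = (x1 > x0) - (x1 < x0)  # step of -1, 0 or +1 along x
        dy = (y1 > y0) - (y1 < y0)  # step of -1, 0 or +1 along y
        x, y = x0, y0
        while (x, y) != (x1, y1):
            x, y = x + dx, y + dy
            points.append((x, y))
    return points


def layout_measures(nodes: Dict[str, Point], wires: List[List[Point]]) -> Tuple[int, int]:
    """Return (layout area, maximum wire length) for a hypothetical layout."""
    used = set(nodes.values())
    longest = 0
    for wire in wires:
        points = grid_points(wire)
        used.update(points)
        longest = max(longest, len(points) - 1)  # unit spacing between grid points
    vertical_tracks = {x for x, _ in used}
    horizontal_tracks = {y for _, y in used}
    return len(vertical_tracks) * len(horizontal_tracks), longest


# Hypothetical three-node layout used only to exercise the definitions.
nodes = {"A": (0, 0), "B": (3, 2), "C": (0, 2)}
wires = [
    [(0, 0), (3, 0), (3, 2)],  # A to B with one turn
    [(0, 0), (0, 2)],          # A to C, straight
]
area, max_wire = layout_measures(nodes, wires)
print(area, max_wire)  # prints: 12 5
```

Here four vertical tracks and three horizontal tracks are used, giving area 12, and the longest wire (A to B) has length 5.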
The Cube-Connected Cycles
The d-dimensional cube-connected cycles (CCC) is constructed from the d-dimensional hypercube by replacing each node of the hypercube with a cycle of d nodes in the CCC [16, 11]. The ith-dimension edge incident to a node of the hypercube is then connected to the ith node of the corresponding cycle of the CCC. For example, see Fig. 1(a,b). The resulting graph has $d \cdot 2^d$ nodes, each of degree 3. By modifying the labeling scheme of the hypercube, we can represent each node by a pair $\langle w, i \rangle$ where $i$ ($0 \le i < d$) is the position of the node within its cycle and $w$ (any d-bit binary string) is the label of the node in the hypercube that corresponds to the cycle. Then, two nodes $\langle w, i \rangle$ and $\langle w', i' \rangle$ are linked by an edge in the CCC if and only if either
1. $w = w'$ and $i - i' \equiv \pm 1 \pmod{d}$, or
2. $i = i'$ and $w$ differs from $w'$ in precisely the ith bit.
Edges due to (1) are cycle-edges and edges due to (2) are cube-edges. As shown in Fig. 1(c), the CCC is often drawn in the multi-stage format. Alternatively, the CCC can be unfolded along its wraparound links (long links inside cycles); see Fig. 1(d), where the first stage (of nodes) and the last stage are identified. It should be noted that the CCC is very similar to the butterfly network; compare (d) and (e) of Fig. 1. One can be embedded into the other with dilation 2 and congestion 2. Because of this similarity, we will show that an optimal layout of the butterfly network without long wires can directly give rise to an optimal layout of the CCC without long wires.
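The definition of the CCC can be checked with a minimal sketch that builds both kinds of edges and confirms the node count and the degree. The function names are illustrative, labels w are represented as integers, and d is assumed to be at least 3 so that each cycle has two distinct neighbours per node.

```python
from itertools import product


def ccc_edges(d):
    """Return (cycle_edges, cube_edges) of the d-dimensional CCC, for d >= 3.

    A node is a pair (w, i): w is a d-bit label stored as an integer,
    i is the position within the cycle, 0 <= i < d.
    """
    cycle_edges = set()
    cube_edges = set()
    for w, i in product(range(2 ** d), range(d)):
        # Cycle-edge: same cycle w, neighbouring positions modulo d.
        cycle_edges.add(frozenset({(w, i), (w, (i + 1) % d)}))
        # Cube-edge: same position i, labels differing in exactly bit i.
        cube_edges.add(frozenset({(w, i), (w ^ (1 << i), i)}))
    return cycle_edges, cube_edges


def degrees(d):
    """Map every node of the d-dimensional CCC to its degree."""
    cycle_edges, cube_edges = ccc_edges(d)
    deg = {}
    for edge in cycle_edges | cube_edges:
        for node in edge:
            deg[node] = deg.get(node, 0) + 1
    return deg


deg = degrees(3)
print(len(deg), set(deg.values()))  # prints: 24 {3}
```

For d = 3 this gives 3 · 2^3 = 24 nodes, all of degree 3 (two cycle-edges and one cube-edge per node).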
The CCC is closely related to the butterfly network just as the shuffle-exchange network is to the de Bruijn network. The group-theoretic relations of the four networks are well studied in [1], where the CCC and the butterfly are proved to be Cayley graphs derivable from the shuffle-exchange network and the de Bruijn network respectively; and inversely, the shuffle-exchange network and the de Bruijn network are proved to be coset graphs of the CCC and the butterfly network respectively.
In general, we say that a network of N nodes has (asymptotically) optimal area if it can be laid out in area $\Theta(N^2/T^2)$, where T is the time to execute an ascend-descend algorithm. We say that a (constant-degree) network has optimal wire length if the longest wire has length $\Theta(N/T^2)$.
Accordingly, for the d-dimensional CCC with $n = 2^d$ cycles, the optimal layout area is $\Theta(n^2)$ and the optimal wire length is $\Theta(n/\lg n)$. While all the known optimal layouts of the CCC have maximum wire length $\Theta(n)$, we give a new optimal layout for the CCC whose maximum wire length is $\Theta(n/\lg n)$.
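The step from the general bounds to the CCC bounds can be spelled out as follows. This short derivation assumes the standard facts that the d-dimensional CCC has $N = d \cdot 2^d = n \lg n$ nodes and executes an ascend-descend algorithm in $T = \Theta(\lg n)$ time; the latter is not stated explicitly in the text above.

```latex
\begin{align*}
N &= d \cdot 2^d = n \lg n, \qquad T = \Theta(\lg n), \\
\frac{N^2}{T^2} &= \Theta\!\left(\frac{n^2 \lg^2 n}{\lg^2 n}\right) = \Theta(n^2), \\
\frac{N}{T^2} &= \Theta\!\left(\frac{n \lg n}{\lg^2 n}\right) = \Theta\!\left(\frac{n}{\lg n}\right).
\end{align*}
```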
Bidelta Networks
There has been a large amount of research on multistage interconnection networks. Kruskal and Snir have found that many of these networks, such as the indirect binary cube (or unfolded butterfly network), the omega network, the SW banyan network, and so on, are isomorphic [8]. That is, one can be produced from the other by simply rearranging the nodes at each stage. These networks are referred to as bidelta interconnection networks. Consider for example 3-stage bidelta networks. These networks begin with the bits $a_2 a_1 a_0$ and end with the bits a (2) a (1) a (0). There are three bits, and hence three stages of connections are needed to correct them all. For the 3-stage bidelta networks shown in Fig. 1(e,f,g), the following are their bit changes, respectively. We use a larger font type for the bit being corrected at a stage.
Beigel and Kruskal [2] gave an (asymptotically) optimal layout of bidelta networks without long wires. Their main idea is to create a particular bidelta network (to which all other bidelta networks are isomorphic) in which the stages with long wires are spread out, thus amortizing the long wire lengths across intermediary stages. Using the same idea, we give an alternate layout of bidelta networks without long wires. The area and maximum wire length of the alternate layout are the same as those of the Beigel-Kruskal layout, but the permutations of the nodes are different. As a result, the new layout can be used to generate a layout for the butterfly network (and hence the CCC) using fewer additional stages than the Beigel-Kruskal layout.
The following lemmas are useful in the construction of the new layout. 
□
We define end-exchange to be the operation that permutes r bits from $a_{r-1} a_{r-2} \cdots a_1 a_0$ to become $a_0 a_{r-2} \cdots a_1 a_{r-1}$ (i.e., a butterfly connection of $2^r$ nodes), and local operation to be one that causes a link to go from a node $i$ to a node $i$ or $i + 1$ or $i - 1$ at the next stage.
Lemma 2 The number of stages required to end-exchange r bits using only local operations is $2^{r-1} - 1$.
Proof: Refer to Fig. 2. Note that the wires do not conflict at any stage; that is, every stage is a permutation. □
Lemma 3 The number of stages required to shuffle (or unshuffle) r bits using only local operations is $2^{r-1} - 1$.
Proof: Refer to Fig. 3. □
Now we show the construction of the bidelta network whose layout has optimal area and wire length. This network is modified from the F network in [2]. As before, let $n = 2^d$, and assume for convenience that $d + 1$ is a power of two. The results can be generalized for arbitrary d. Without loss of generality, we use a 15-stage network to explain the bit changes. The bit changes for this particular network are as given in Table 1.
For the first stage of every batch, the lowest of the upper bits is corrected. That is, $a_{11}$, and then $a_{14}$, and so on, in the example.
Within a batch, an end-exchange is applied to the lower $r - j$ bits of the upper bits, where j is the batch number. That is, in the example, an end-exchange is applied to all 4 ($r - 0$) bits in batch 0.
For all the stages other than the first within a batch, the lower bits are corrected one at a stage in order (from the lowest to the highest).
Based on the table, we have the layout of the 15-stage bidelta network as shown in Fig. 4(a). There are $2^{15} = 32768$ nodes in a stage of nodes. The upper bits divide these into 16 ($2^r$) groups (hence the upper bits represent the number of a group), each consisting of 2048 ($2^{d-r} = n/(d + 1)$) nodes. An oval in the figure represents one such group. Ovals are connected to ovals according to the permutations and corrections of the upper bits. Since these permutations are actually end-exchanges, the pattern of the connections within a batch is the same as the corresponding one in Fig. 2. The detailed connections due to changes of the lower bits are not shown because of their sheer number.
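The group arithmetic in this paragraph can be reproduced with a few lines. The sketch assumes, as the relation $2^{d-r} = n/(d+1)$ implies, that the number of upper bits r satisfies $2^r = d + 1$; the function name is illustrative.

```python
def bidelta_groups(d):
    """Return (r, number of groups, nodes per group) for a d-stage bidelta layout.

    Assumes d + 1 is a power of two and that r is defined by 2**r == d + 1.
    """
    r = (d + 1).bit_length() - 1
    if 2 ** r != d + 1:
        raise ValueError("d + 1 must be a power of two")
    n = 2 ** d                   # nodes per stage
    groups = 2 ** r              # one group (oval) per value of the upper bits
    group_size = 2 ** (d - r)    # one node per value of the lower bits
    assert groups * group_size == n
    assert group_size * (d + 1) == n
    return r, groups, group_size


print(bidelta_groups(15))  # prints: (4, 16, 2048)
print(bidelta_groups(3))   # prints: (2, 4, 2)
```

The first call matches the 15-stage example (16 ovals of 2048 nodes each); the second corresponds to a 3-stage network with 8 nodes per stage.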
We give another example, that of a 3-stage bidelta network, in which all the connections can be clearly shown, as in Fig. 5(a). The bit changes, based on the strategy given above, are as follows.
It is easy to see that the new layout has area $\Theta(n^2)$, and by Lemmas 1 and 2, the longest wire has length $\Theta(n/\lg n)$.
Layout of the Butterfly Network and the CCC
A butterfly network can be obtained from an indirect binary cube (a bidelta network) by identifying the last stage of nodes with the first stage. That means the network needs to be folded over horizontally to enable the wraparound connections. This would increase the area and the longest wire length by at worst a constant factor. In order to let the first stage of nodes coincide with the last stage, the final bit pattern (refer to the bit changes 1), where the first term is for the bidelta layout, and the second term is for the extra stages as just explained. Each column is divided into $2^r$ groups (ovals) as we did in Section 3, but each group has $2^{d-r} = 3n/(2(d + 2))$ contiguous nodes. Fig. 4(b) shows the "extra stages" of the layout of a 22-stage butterfly network without long wires, and Fig. 6(b) shows the detailed layout of a 4-stage, $16 \times 16$ butterfly network (without folding over). The layout consists of a 3-stage layout for the bidelta network (compare this with Fig. 5(a)) and one stage (since r = 2) for the shuffling to achieve the desired permutation of bits. Its bit changes are as follows.
Because of our use of the end-exchange operation to permute the upper r bits instead of the unshuffle pattern in the Beigel-Kruskal layout (see Fig. 8 in [2]), we needed only $(2^{r-1} - 1)$ additional stages to shuffle the r upper bits in our layout for the butterfly network, whereas the Beigel-Kruskal layout would need more stages to reverse the r upper bits.
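The group size $3n/(2(d+2))$ can be sanity-checked against the two examples in this section. The sketch below rests on an assumption that is inferred from those examples and is not stated in the surviving text: that the butterfly has $d = (2^r - 1) + (2^{r-1} - 1)$ stages, the first term for the bidelta part and the second for the extra shuffle stages. The function name is illustrative.

```python
from fractions import Fraction


def butterfly_group_size(r):
    """Return (d, nodes per oval) for r upper bits.

    Assumption (inferred from the 22-stage and 4-stage examples):
    d = (2**r - 1) + (2**(r - 1) - 1).
    """
    d = (2 ** r - 1) + (2 ** (r - 1) - 1)
    n = 2 ** d
    size = 2 ** (d - r)
    # Under the assumption, d + 2 == 3 * 2**(r - 1), so the two forms agree.
    assert Fraction(3 * n, 2 * (d + 2)) == size
    return d, size


print(butterfly_group_size(4))  # prints: (22, 262144)
print(butterfly_group_size(2))  # prints: (4, 4)
```

With r = 4 this gives the 22-stage butterfly, and with r = 2 the 4-stage, $16 \times 16$ butterfly made of three bidelta stages plus one shuffle stage.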
Due to the affinity of the CCC and the butterfly network, the optimal layout of the $n \times n$ butterfly network without long wires can directly translate into the corresponding optimal layout of the d-dimensional CCC without long wires, where $n = 2^d$. For example, the detailed layout for the 4-dimensional, $16 \times 16$ CCC network without folding over is shown in Fig. 7(b), which is a straightforward derivation from Fig. 6(b).
Layout in the Thompson Model
We have given in the last section a layout scheme for the butterfly network and the CCC without long wires. Referring to Fig. 4 again, all wires are locally arranged; i.e., each oval $i$ (representing a group of $3n/(2(d + 2))$ contiguous nodes) is connected to oval $i$ or $i + 1$ or $i - 1$ in the next stage. For the layout of the CCC, oval $i$ is also connected to itself or oval $i + 1$ or oval $i - 1$ in the same stage.
In any case, the wire length is $\Theta(n/\lg n)$ and thus optimal.
Our explanation above only shows that the given layout is without long wires in a logical way. In practice, the wire length and the area are model-dependent. Now we adopt the widely-used Thompson model and show that the given layout can be implemented in a Thompson grid with optimal area and optimal wire length. We will compute the exact upper bound for the area to see by how large a constant factor the area is blown up in order to do away with the long wires. Note that in the Thompson model, wires are limited to Manhattan routing.
Theorem 1 The CCC can be laid out optimally both in area and in wire length.
Optimal Area
Proof: Referring to Fig. 4, there are three types of connection patterns, as shown in Fig. 8(a,b,c). Refer also to Fig. 7(b). For Fig. 8(a), a total of $3n/(d + 2)$ contiguous nodes (in two ovals in the same stage) are connected in pairs by $3n/(2(d + 2))$ cube-edges. These connections are shown in Fig. 8(d) in thin lines, which occupy $3n/(2(d + 2))$ columns in the worst case. Then for Fig. 8(b), $3n/(2(d + 2))$ contiguous nodes within the same oval are connected in pairs by $3n/(4(d + 2))$ cube-edges, occupying $3n/(4(d + 2))$ columns in the worst case. And for Fig. 8(c), $3n/(2(d + 2))$ contiguous nodes within the same oval are connected in pairs by $3n/(4(d + 2))$ cube-edges, occupying $3n/(4(d + 2))$ columns in the worst case. Among the three cases, only the last case has cycle-edges (in thick lines in the figures) that cross. To allow the crossing, an additional $3n/(2(d + 2))$ columns are needed, as shown in Fig. 8(f).
Hence, the width of any stage is $\Theta(n/d)$. There are d stages and thus the total width is $\Theta(n)$. With the height $\Theta(n)$ of the layout, the total area is $\Theta(n^2)$ and thus optimal. Clearly, the length of any wire is also $\Theta(n/d)$. □
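The column counts quoted in the proof can be tabulated to see why the total width is linear in n. The sketch below is a crude bound only: it charges every one of the d stages the width of the widest pattern, so it overestimates the width obtained by the exact analysis. The function names are illustrative.

```python
from fractions import Fraction


def stage_columns(n, d):
    """Worst-case column counts for the three connection patterns of Fig. 8."""
    g = Fraction(3 * n, 2 * (d + 2))  # nodes in one oval
    return {
        "a": g,          # cube-edges between two ovals
        "b": g / 2,      # cube-edges within one oval
        "c": g / 2 + g,  # cube-edges within one oval plus room for crossing cycle-edges
    }


def crude_width_bound(n, d):
    """Upper bound on total width: d stages, each as wide as the widest pattern."""
    return d * max(stage_columns(n, d).values())


n, d = 2 ** 22, 22
print(stage_columns(n, d)["c"])     # prints: 393216
print(crude_width_bound(n, d))      # prints: 8650752
print(crude_width_bound(n, d) / n)  # prints: 33/16
```

Since the widest pattern uses $9n/(4(d + 2))$ columns, the total width is below $9n/4$ for every d, which is consistent with the $\Theta(n)$ width claimed in the proof.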
An Exact Upper Bound on Area
There are r stages, i.e., stages $(i, j)$ for $j = 0$, which only contain pattern (a) in Fig. 8. For each such stage, the width is $3n/(2(d + 2))$, referring to Fig. 8(d). The total height of the layout is equal to n plus the number of extra rows added because of the connections that cross (i.e., Fig. 8(f)). Note that in Fig. 8(f), one extra row (at the very bottom) is added when the wires are laid out. According to Fig. 4, the cross pattern in our layout happens at all rows except the top and the bottom row. Hence, we have the height of the layout equal to $n + (2^r - 3) \simeq n$.
Finally, the layout has to be folded up to produce the wraparound connections. By the standard technique, the original width and height are doubled. Hence, the final area is $6n^2$ (save some lower-order terms).
Conclusion
Wire problems are getting more and more attention from computer designers and chip designers [5]. Formerly, a wire had the magical property of transmitting data instantly from one place to another; a wire did not occupy any space, did not dissipate heat, and did not cost anything, or at least not enough to cause any worry. Now we know that in fact wires do cost: they can take up a lot of space and spend a lot of time in transmitting data [19, 6]. Hence, it is justified that we should look for a layout that is sufficiently small and that uses reasonably short wires. Beigel and Kruskal found a layout of bidelta networks without long wires [2]; Lai and Sprague proposed a layout of the hypercube without long wires [9]; and Lau and Chen showed that some networks can be laid out minimally both in area and in maximum wire length [10]. Our work is motivated by and closely related to Beigel and Kruskal's paper.
It should be noted that the desired layout of the CCC cannot be derived from the layout of the circular shuffle network, although Beigel and Kruskal had expected the opposite based on the similarity between the CCC and the circular shuffle network [2, 7]. In fact, we are doubtful about the existence of an optimal layout of the circular shuffle network without long wires. Even if one existed, the desired layout of the CCC derived from it could not be better than our layout in area, taking into account constant factors.¹ Blum has shown that there exist some graphs whose minimal-area layouts require wires that are longer by more than a constant factor in comparison with some less-than-optimal layouts [3]. On the other hand, there exist graphs, especially those that are based on the mesh topology, that can be laid out in both minimal area and minimal longest wire [10].
¹ The details are beyond the scope of this paper.
It is interesting and practical to try to analyze the tradeoff between the area and the maximum wire length for important graphs such as the CCC. In this paper, we have given a new layout of the CCC whose maximum wire length is reduced to an optimal level; the cost is a constant-factor blow-up in area. Hence, we say that the CCC can be laid out optimally within a constant factor both in area and in maximum wire length. Specifically, the area of the new layout, which is approximately $6n^2$, represents a blow-up of 12 times since the best minimal layout area of the CCC is $n^2/2$ [4].
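The factor of 12 follows directly from the two areas quoted above. As a side calculation (our notation, with $W$ and $H$ the width and height before folding), the doubling of both dimensions by the folding step also fixes the unfolded area:

```latex
\begin{align*}
\frac{6n^2}{n^2/2} &= 12, \\
(2W)(2H) = 6n^2 \;&\Longrightarrow\; WH = \tfrac{3}{2}\,n^2 .
\end{align*}
```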
Considering certain physical limitations in parallel computers such as the speed of propagation of information, small-diameter networks do not necessarily have an advantage over, say, mesh-connected networks which have a large diameter ($\Theta(N^{1/2})$) [13, 20]. The problem of the small-diameter networks is that they are often not scalable: as they grow in size, the wire length must grow, thus degrading communication performance. Hence, seeking an efficient layout without long wires for the small-diameter networks such as the CCC becomes very important in order to demonstrate their suitability as an interconnection network for the implementation of parallel computers.
Our further research is to establish lower bounds on the area and the maximum wire length for the CCC taking into account constant factors [12]. Must all the minimum-area layouts have long wires? What is the minimal constant blow-up when reducing the length of wires to an optimal level?
