Preparata and Vuillemin proposed the cube-connected cycles (CCC) in 1981 [16], and in the same paper gave an asymptotically optimal layout scheme for the CCC. We give a new layout scheme for the
Introduction
The interconnection network is one of the most crucial design issues for parallel computers. There are many criteria to consider in choosing a specific interconnection for a given set of processors. With the rapid technological progress in VLSI, it is now common to connect a huge number of processors together to cooperate in the execution of parallel algorithms. Obviously, in these situations, one of the criteria for packing these processors together is the compactness of the layout in a VLSI grid. In general, the more compact the better.
The cube-connected cycles (CCC) is one of the most popular interconnection networks. Preparata and Vuillemin [16] put forward the CCC as a practical substitute for the hypercube in 1981, and at the same time gave an asymptotically optimal layout scheme for it. Their layout scheme, however, cannot produce the minimal layout for the CCC. Our work addresses this issue. We have two goals: one is to derive a more compact layout for the CCC than the Preparata-Vuillemin layout; the other is to reduce the long wires in the layout while keeping the asymptotically optimal area. This paper reports the results of our effort to achieve the first goal.
Research in graph embedding and VLSI layout has developed many powerful techniques [2, 5] which can produce embeddings and layouts that are quite efficient, often within constant factors of being optimal. However, even a modest constant factor may render an asymptotically optimal layout or embedding unacceptably inefficient in practice. This motivates the current paper.
Preliminaries
The Thompson Model
Among the many mathematical models that have been proposed for VLSI computations, the most widely accepted one is due to Thompson, and is now known as the Thompson grid model [18, 19]. In this model, the chip is presumed to consist of a grid of vertical and horizontal tracks which are spaced at unit intervals. Two layers of interconnect are used to route the wires. Vertical wires are routed in the top layer of the interconnect and horizontal wires are routed in the bottom layer. Hence, wires may cross, but they cannot overlap for any distance or cut across a node to which they are not incident. To change direction, wires may turn into the other layer by contact cuts or vias, which facilitate connections between the two layers. In our discussion, no knock-knees are allowed; that is, two wires cannot turn at the same grid point [14, 15].
Formally, an embedding or layout of a graph $G$ in a Thompson grid is an assignment of the nodes of $G$ to intersection points in the grid and of the edges of $G$ to paths along the grid tracks. One of the important measures of a layout is the layout area, which is defined as the product of the number of vertical tracks and the number of horizontal tracks that contain a node or a path segment of the graph.
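The area measure just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `layout_area`, the dictionary of node positions, and the representation of each wire as a list of grid points joined by axis-parallel segments are all hypothetical choices, and a track is counted as used whenever a node or any grid point of a wire lies on it.

```python
def layout_area(nodes, wires):
    """Area of a layout in a Thompson grid (illustrative sketch).

    nodes: dict mapping a node name to its grid point (x, y).
    wires: list of paths; each path is a list of grid points whose
           consecutive points are joined by a horizontal or vertical segment.
    A track counts as used if a node or any grid point of a wire lies on it
    (an assumption made for this sketch).
    """
    xs, ys = set(), set()  # used vertical tracks (x) and horizontal tracks (y)
    for x, y in nodes.values():
        xs.add(x)
        ys.add(y)
    for path in wires:
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            if x0 != x1 and y0 != y1:
                raise ValueError("wire segments must follow grid tracks")
            for x in range(min(x0, x1), max(x0, x1) + 1):
                xs.add(x)
            for y in range(min(y0, y1), max(y0, y1) + 1):
                ys.add(y)
    return len(xs) * len(ys)


# A 4-cycle drawn as a unit square uses 2 vertical and 2 horizontal tracks.
square_nodes = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
square_wires = [
    [(0, 0), (1, 0)],
    [(1, 0), (1, 1)],
    [(1, 1), (0, 1)],
    [(0, 1), (0, 0)],
]
square_area = layout_area(square_nodes, square_wires)
```

The sketch does not check the routing rules of the model (no overlapping wires, no knock-knees); it only counts tracks.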
Cube-Connected Cycles
The $s$-dimensional cube-connected cycles (CCC) is constructed from the $s$-dimensional hypercube by replacing each node of the hypercube with a cycle of $s$ nodes [13, 16]. The $i$th-dimension edge incident to a node of the hypercube is then connected to the $i$th node of the corresponding cycle in the CCC. For example, see Fig. 1(a,b). The resulting graph has $s2^s$ nodes, each of degree 3. By adopting the labeling scheme of the corresponding hypercube and modifying it slightly to take into account the cycles that are introduced, we can represent each node of the CCC by bit. Edges due to (1) are cyclic edges and edges due to (2) are cubical edges. As shown in Fig. 1(c), the CCC is often drawn in the "multi-stage" format, which directly gives rise to the Preparata-Vuillemin layout.
We say that a network of $N$ nodes has an (asymptotically) optimal layout if it can be laid out in $O(N^2/T^2)$ area, where $T$ is the time to execute an ascend-descend algorithm [4, 19]. The CCC can execute the ascend-descend algorithm in time $O(\log N)$ [16]. Therefore, the Preparata-Vuillemin layout is (asymptotically) optimal.
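The construction of the CCC from the hypercube can be checked with a small Python sketch. The labeling used here is an assumption, namely the commonly used pair $(l, x)$ with cycle position $l$ and hypercube label $x$; it is not necessarily the exact notation of the paper, and the function name `build_ccc` is hypothetical.

```python
def build_ccc(s):
    """Adjacency sets of the s-dimensional CCC, for s >= 3.

    A node is a pair (l, x): l in 0..s-1 is the position on the cycle and
    x in 0..2**s - 1 is the label of the hypercube node that the cycle
    replaces (assumed labeling, chosen for illustration).
    """
    if s < 3:
        raise ValueError("this sketch assumes s >= 3 so that cycles are simple")
    adj = {}
    for x in range(1 << s):
        for l in range(s):
            adj[(l, x)] = {
                ((l + 1) % s, x),   # cyclic edge to the next node on the cycle
                ((l - 1) % s, x),   # cyclic edge to the previous node
                (l, x ^ (1 << l)),  # cubical edge along dimension l
            }
    return adj


ccc3 = build_ccc(3)
num_nodes = len(ccc3)                                   # s * 2**s nodes
degrees = {len(neighbours) for neighbours in ccc3.values()}
```

For $s = 3$ this gives $3 \cdot 2^3 = 24$ nodes, all of degree 3, in agreement with the node count and degree stated above.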
The Preparata-Vuillemin Layout
For an $s$-dimensional CCC with $n = 2^s$ cycles, which we denote by $\mathrm{CCC}_n$, let $W(s)$ and $H(s)$ be the numbers of vertical and horizontal tracks respectively, i.e., the width and the height of a layout. For the Preparata-Vuillemin layout, we have
We get $W(s) = 2^{s+1} = 2n$ and $H(s) = 2^s + 1 = n + 1$.
Hence the area occupied by the Preparata-Vuillemin layout is $W(s) \times H(s) = 2n(n + 1)$.
For the improved Preparata-Vuillemin layout which is shown in Fig. 2(b) , 
New Layouts
Although the Preparata-Vuillemin layout for the CCC is asymptotically optimal (up to a constant), it is not the minimal layout. For real implementations, we are interested in the minimal layout. Here, we give a new layout for the CCC. It is more compact than the Preparata-Vuillemin layout; we conjecture that this layout is optimal (minimal). 
Small CCC's
With the new layout scheme, the layouts for the first few small CCCs are shown in Fig. 3(a,b,c). It is easily verified that these simple cases occupy minimal area. These small CCC layouts are the foundation on which bigger CCC networks are laid out recursively.
Recursive Construction
Starting from the 5th dimension, the construction is inductive. We take two copies of the layout for the $(s-1)$-dimensional CCC and place them side by side. Stretch every cycle vertically by an extra height of to allow for the insertion of the $s$th-dimension nodes and edges. Since there are four rows of cycles from top to bottom, a total of 2"-' extra horizontal tracks are added. Note that at each recursive expansion, all new nodes (i.e., the $s$th-dimension nodes) are inserted into the same cycles, which ensures the correctness of the new layout scheme. We show the 5-dimensional CCC and the 6-dimensional CCC that are laid out using the new scheme in Fig. 3(d) and Fig. 4(a) respectively. For the $s$-dimensional CCC with $n = 2^s$ cycles, it is easy to see that
23-3
W(4) = 12,
We get $W(s) = 12 \times 2^{s-4} = \frac{3}{4}n$ and
Like the improved Preparata-Vuillemin layout, the new layout can also be improved. The improved new layout of the 6-dimensional CCC is shown in Fig. 4(b 
Comparison
By ignoring the low-order terms in Formulae 1, 2, 3 and 4, the four layout schemes of CCC_n discussed above take areas approximately equal to 2n2, $n2, in2 and in2 respectively. We compare the new layout with the Preparata-Vuillemin layout, and the improved new layout with the improved Preparata-Vuillemin layout:
The new layout scheme takes less than half of the area of the Preparata-Vuillemin layout in either case. The crux of the new layout is that the corner points of the cycles (now drawn as rectangles in the grid) are occupied by nodes (processors), so that their corresponding cubical edges need little or no extra space, unlike in the Preparata-Vuillemin layout.
The new compact layout presented here also shows the superiority of the CCC in this aspect over other hypercube substitutes such as the shuffle-exchange network and the butterfly network [13]. Much work had been devoted to the layout of the shuffle-exchange network until Leighton found its optimal layout [12]. However, all these layouts of the shuffle-exchange network are complicated; they are not regular or recursive. The best known layout of the butterfly network with $n$ inputs or outputs was due to Wise [20] with area 2: 2n2. Recently, more compact layouts for the butterfly were found with area 21 yn2 [8], [2]. However, the butterfly networks in [2, 8, 20] are unfolded. To be fair, the folded butterfly network [13] should be used in comparison with the CCC. Generally, the corresponding areas of the folded butterfly using these layout schemes [2, 8, 20] are at least doubled.
Lower Bound on Layout Area
To prove the optimality of the new layout, it is desirable to have a tight lower bound, say, $\frac{1}{2}n^2 - o(n^2)$, so that we can conclude that the deviation of the new layout from optimality is at worst of some lower order than a constant factor. While such a tight lower bound is difficult to derive, we give below a lower bound of $(\frac{1}{2}n - 1)^2$ for $\mathrm{CCC}_n$. Given this bound, the deviation of the new layout from optimality is at worst an additive term of $\frac{1}{4}n^2$ or a multiplicative factor of 2.
The lower bound $(\frac{1}{2}n - 1)^2$ is easily seen from the bounding strategy introduced in [19], which is stated in terms of the bisection width of a graph. We present it below as Lemma 1.
Lemma 1 For any graph G with bisection width
The proof that the bisection width of $\mathrm{CCC}_n$ is $\frac{1}{2}n$, however, is complicated. Alternatively, we can use the modified bounding strategy, Lemma 2, from [2], where a lower bound on the layout area of the butterfly network is proved by the same technique in terms of the minimum special bisection width.
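As a rough numeric illustration of how far the Preparata-Vuillemin layout sits from a bisection-based bound, the following Python sketch tabulates both quantities. It assumes that the bound reads $(\frac{1}{2}n - 1)^2$ and uses the dimensions $W(s) = 2n$ and $H(s) = n + 1$ given earlier; the function names are hypothetical.

```python
def pv_area(s):
    """Area W(s) * H(s) of the Preparata-Vuillemin layout, with n = 2**s."""
    n = 1 << s
    return (2 * n) * (n + 1)


def bisection_lower_bound(s):
    """Lower bound (n/2 - 1)**2 (assumed form of the bound), with n = 2**s."""
    n = 1 << s
    return (n // 2 - 1) ** 2


# The ratio approaches 8 from above as s grows.
for s in range(4, 11):
    ratio = pv_area(s) / bisection_lower_bound(s)
    print(s, pv_area(s), bisection_lower_bound(s), round(ratio, 3))
```

Under these assumptions the Preparata-Vuillemin area exceeds the bound by a factor that tends to 8, which is the room any more compact layout has to work with.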
Let $G$ be a graph having a designated set of $2c > 0$ special nodes. The minimum special bisection width
Consider the embedding of $K_{n,n}$ into $\mathrm{CCC}_n$ which assigns the inputs of $K_{n,n}$ to the first stage of $\mathrm{CCC}_n$ and the outputs of $K_{n,n}$ to the last stage of $\mathrm{CCC}_n$, and which routes the edges of $K_{n,n}$ in increasing order of dimensions, i.e., from right to left.
With no loss of generality, see by squeezing every pair of nodes into one big node shown in Fig. 5(d). Fig. 5(e) is apparently a reverse Omega network (or a flip network [3]). Hence the original $\mathrm{CCC}_n$ is turned into a reverse Omega network with $\frac{1}{2}n$ inputs and $\frac{1}{2}n$ outputs.
Note that the reverse Omega network (with $\frac{1}{2}n$ inputs and $\frac{1}{2}n$ outputs) has the banyan property [10]: each input node $u$ is connected to each output node $v$ by exactly one path of length $s - 1$. Let $e$ be a stage-$k$ edge of the reverse Omega network, where $0 \le k \le s - 2$. One end-point of $e$ reaches precisely $2^{s-2-k}$ distinct output nodes while the other end-point of $e$ reaches precisely $2^k$ distinct input nodes. Hence edge $e$ lies on precisely $2^{s-2}$ input-output paths.
Since each input or output contains two nodes of $K_{n,n}$, edge $e$ actually lies on precisely $4 \times 2^{s-2} = 2^s$ input-output paths, i.e., its congestion is $2^s = n$.
Further investigation shows that the congestion of the cubical edges of $\mathrm{CCC}_n$, which are shown as thin edges in Fig. 5(d), is also $2^s = n$, since from each input, exactly half of the paths go through the cubical edges.
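The path-counting argument above can be checked numerically. The sketch below does not build the reverse Omega network of Fig. 5(e); as an assumption it uses a butterfly-style banyan network with $2^{s-1}$ rows and $s - 1$ edge stages as a stand-in, since the count relies only on the banyan property. The function name `edge_path_counts` and the wiring rule are illustrative choices.

```python
def edge_path_counts(s):
    """Number of input-output paths through each edge of a banyan network.

    Stand-in network (assumed for illustration): rows 0..2**(s-1) - 1,
    node levels 0..s-1, and a stage-j edge from (j, r) to (j + 1, r) or to
    (j + 1, r ^ (1 << j)).
    """
    m = 1 << (s - 1)   # number of inputs, and of outputs
    levels = s - 1     # number of edge stages
    # back[j][r]: number of paths from any input to node (j, r)
    back = [[0] * m for _ in range(levels + 1)]
    back[0] = [1] * m
    for j in range(levels):
        for r in range(m):
            for t in (r, r ^ (1 << j)):
                back[j + 1][t] += back[j][r]
    # fwd[j][r]: number of paths from node (j, r) to any output
    fwd = [[0] * m for _ in range(levels + 1)]
    fwd[levels] = [1] * m
    for j in range(levels - 1, -1, -1):
        for r in range(m):
            fwd[j][r] = fwd[j + 1][r] + fwd[j + 1][r ^ (1 << j)]
    # paths through edge (j, r) -> (j + 1, t)
    return {
        (j, r, t): back[j][r] * fwd[j + 1][t]
        for j in range(levels)
        for r in range(m)
        for t in (r, r ^ (1 << j))
    }


counts_s4 = edge_path_counts(4)
distinct_counts = set(counts_s4.values())          # every edge: 2**(s-2) paths
congestion = 4 * next(iter(distinct_counts))       # two K_{n,n} nodes per end
```

For $s = 4$ every edge carries $2^{s-2} = 4$ paths in the squeezed network, and multiplying by 4 for the two nodes of $K_{n,n}$ at each input and output gives a congestion of $16 = n$.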
Conclusion
The motivation underlying the work presented here is the question of how much we can reduce the layout area of the CCC. We have given a simple, regular and more compact layout scheme for the CCC; the resulting area is $\frac{1}{2}n^2 + o(n^2)$. Some earlier attempts have been made: [17] gave a construction of the CCC with area $n^2$; [6] gave one with area in2.
We also give a lower bound on the layout area of the CCC, by which we can judge how far our new layout may be from optimality. There is still a gap of a constant factor between the new layout and the lower bound. To narrow or close the gap, we need to find either a more compact layout or a tighter lower bound for the CCC. We conjecture that the new layout scheme is minimal. Hence, our future effort will be devoted mainly to finding a tighter lower bound.
Another important measure of a layout is the maximum wire length [4, 11]. A further item of future research is to find layout schemes that do not produce long wires, and to consider the tradeoff between area and maximum wire length for CCC layouts.
