A tight layout of the cube-connected cycles by Lau, FCM & Chen, G
Title A tight layout of the cube-connected cycles
Author(s) Chen, G; Lau, FCM
Citation The 4th International Conference on High Performance Computing, Bangalore, India, 18-21 December 1997, p. 422-427
Issued Date 1997
URL http://hdl.handle.net/10722/45582
Rights
©1997 IEEE. Personal use of this material is permitted. However,
permission to reprint/republish this material for advertising or
promotional purposes or for creating new collective works for
resale or redistribution to servers or lists, or to reuse any
copyrighted component of this work in other works must be
obtained from the IEEE.
A Tight Layout of the Cube-Connected Cycles 
Guihai Chen and Francis C.M. Lau 
Department of Computer Science 
The University of Hong Kong 
Pokfulam Road, Hong Kong 
{gchen, fcmlau}@cs.hku.hk
Abstract 
Preparata and Vuillemin proposed the cube-connected cycles (CCC)
in 1981 [16], and in the same paper, gave an asymptotically-optimal
layout scheme for the CCC. We give a new layout scheme for the CCC
which requires less than half of the area of the
Preparata-Vuillemin layout. We also give a non-trivial lower bound
on the layout area of the CCC. There is a constant factor of 2
between the new layout and the lower bound. We conjecture that the
new layout is optimal (minimal).
Keywords: Interconnection networks, VLSI, cube-connected cycles,
embedding, routing, layout.
1 Introduction 
The interconnection network is one of the most crucial design
issues for parallel computers. There are many criteria to be
considered in choosing a specific interconnection for a given set
of processors. With the rapid technological progress in VLSI, it
is now common to connect a huge number of processors together to
cooperate in the execution of parallel algorithms. In these
situations, one of the criteria for packing these processors
together is the compactness of the layout in a VLSI grid. In
general, the more compact the better.
The cube-connected cycles (CCC) is one of the most popular
interconnection networks. Preparata and Vuillemin [16] put forward
the CCC as a practical substitute for the hypercube in 1981, and at
the same time gave an asymptotically-optimal layout scheme for it.
Their layout scheme, however, does not produce the minimal layout
for the CCC. Our work addresses this issue. We have two goals: one
is to derive a more compact layout for the CCC than the
Preparata-Vuillemin layout; the other is to reduce the long wires
in the layout while keeping the asymptotically-optimal area. This
paper reports the results of our effort to achieve the first goal.
Research in graph embedding and VLSI layout has developed many
powerful techniques [2, 5] which can produce embeddings and layouts
that are quite efficient, often within constant factors of being
optimal. However, even a modest constant factor may render an
asymptotically-optimal layout or embedding unacceptably inefficient
in practice. This motivates the current paper.
2 Preliminaries 
2.1 The Thompson Model 
Among the many mathematical models that have been proposed for
VLSI computations, the most widely accepted one is due to Thompson,
and is now known as the Thompson grid model [18, 19]. In this
model, the chip is presumed to consist of a grid of vertical and
horizontal tracks which are spaced at unit intervals. Two layers of
interconnect are used to route the wires. Vertical wires are routed
in the top layer of the interconnect and horizontal wires are
routed in the bottom layer. Hence, wires may cross but they cannot
overlap for any distance or cut across a node to which they are not
incident. To change direction, wires may turn into the other layer
by contact cuts or vias which facilitate connections between the
two layers. In our discussion, no knock-knees are allowed; that is,
two wires cannot turn at the same grid point [14, 15].
Formally, an embedding or layout of a graph G in a Thompson grid
is an assignment of the nodes of G to intersection points in the
grid and the edges of G to paths along the grid tracks. One of the
important measures of a layout is the layout area, which is defined
as the product of the number of vertical tracks and the number of
horizontal tracks that contain a node or a path segment of the
graph.
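As a minimal sketch of this definition, the following Python function computes the area of a layout given as node coordinates and rectilinear wire paths. The names `layout_area`, `node_positions` and `wire_paths` are hypothetical, and the sketch assumes that a track counts as used whenever any grid point of a node or wire lies on it; other readings of "contain a path segment" are possible.

```python
def layout_area(node_positions, wire_paths):
    """Area = (#vertical tracks used) * (#horizontal tracks used).

    node_positions: dict mapping a node name to its (x, y) grid point.
    wire_paths: list of paths; each path is a list of (x, y) corner points
                joined by axis-parallel segments.
    Assumption: a track is "used" if any grid point of a node or wire lies on it.
    """
    xs, ys = set(), set()
    for x, y in node_positions.values():
        xs.add(x)
        ys.add(y)
    for path in wire_paths:
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            if x0 != x1 and y0 != y1:
                raise ValueError("wires must run along grid tracks")
            xs.update(range(min(x0, x1), max(x0, x1) + 1))
            ys.update(range(min(y0, y1), max(y0, y1) + 1))
    return len(xs) * len(ys)


# A 4-cycle drawn as a unit square: two vertical and two horizontal tracks.
square_nodes = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
square_wires = [
    [(0, 0), (1, 0)],
    [(1, 0), (1, 1)],
    [(1, 1), (0, 1)],
    [(0, 1), (0, 0)],
]
print(layout_area(square_nodes, square_wires))
```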
2.2 Cube-Connected Cycles 
The $s$-dimensional cube-connected cycles (CCC) is constructed from
the $s$-dimensional hypercube by replacing each node of the
hypercube with a cycle of $s$ nodes [13, 16]. The $i$th-dimension
edge incident to a node of the hypercube is then connected to the
$i$th node of the corresponding cycle in the CCC. For example, see
Fig. 1(a,b). The resulting graph has $s 2^s$ nodes, each of
degree 3. By adopting the labeling scheme of the corresponding
hypercube and modifying it slightly to take into account the cycles
that are introduced, we can represent each node of the CCC by the
pair $(w, i)$, where $i$ ($0 \le i \le s - 1$) is the position of
the node within its cycle and $w$ (any $s$-bit binary string with
dimension 0 being at the rightmost bit position) is the label of
the node in the hypercube that corresponds to the cycle. Then two
nodes $(w, i)$ and $(w', i')$ are linked by an edge in the CCC if
and only if either
(1) $w = w'$ and $i - i' \equiv \pm 1 \pmod{s}$, or
(2) $i = i'$ and $w$ differs from $w'$ in precisely the $i$th bit.
Figure 1: (a) 3-dim hypercube. (b) 3-dim CCC. (c) Another drawing of 3-dim CCC; cyclic edges in thick and
cubical edges in thin.
Figure 2: (a) Preparata-Vuillemin layout for 4-dim CCC. (b) Improved Preparata-Vuillemin layout for 4-dim CCC.
Edges due to (1) are cyclic edges and edges due to (2) are cubical
edges. As shown in Fig. 1(c), the CCC is often drawn in the
"multi-stage" format, which can directly give rise to the
Preparata-Vuillemin layout.
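The definition above can be sketched in a few lines of Python. The function names `build_ccc` and `degrees` are hypothetical; labels $w$ are represented as integers whose bit 0 is the rightmost bit, and the sketch assumes $s \ge 3$ so that the two cyclic neighbours of a node are distinct.

```python
def build_ccc(s):
    """Return (nodes, cyclic_edges, cubical_edges) of the s-dimensional CCC.

    A node is a pair (w, i): w is the hypercube label (an int with s bits)
    and i is the position within the cycle. Assumes s >= 3.
    """
    nodes = [(w, i) for w in range(2 ** s) for i in range(s)]
    cyclic, cubical = set(), set()
    for w, i in nodes:
        # Rule (1): same cycle, positions differ by 1 modulo s.
        cyclic.add(frozenset({(w, i), (w, (i + 1) % s)}))
        # Rule (2): same position i, labels differ in exactly bit i.
        cubical.add(frozenset({(w, i), (w ^ (1 << i), i)}))
    return nodes, cyclic, cubical


def degrees(nodes, edges):
    """Count how many of the given edges touch each node."""
    deg = {v: 0 for v in nodes}
    for edge in edges:
        for v in edge:
            deg[v] += 1
    return deg


nodes, cyclic, cubical = build_ccc(3)
deg = degrees(nodes, cyclic | cubical)
print(len(nodes), len(cyclic), len(cubical), set(deg.values()))
```

For $s = 3$ this yields $3 \cdot 2^3 = 24$ nodes, every one of degree 3, which matches the count stated in the text.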
2.3 The Preparata-Vuillemin Layout 
Fig. 2(a) illustrates the base case and the inductive step for
proving that a CCC of $N = s \cdot 2^s$ nodes can be placed on a
$2 \cdot 2^s \times (2^s + 1)$ chip. Since
$s \simeq \log(N/\log N)$, the chip size is about
$O((N/\log N)^2)$. In general, we say that a network of $N$ nodes
has an (asymptotically) optimal layout if it can be laid out in
$O(N^2/T^2)$ area, where $T$ is the time to execute an
ascend-descend algorithm [4, 19]. The CCC can execute the
ascend-descend algorithm in time $O(\log N)$ [16]. Therefore, the
Preparata-Vuillemin layout is (asymptotically) optimal.
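The approximation $s \simeq \log(N/\log N)$ can be checked numerically. The sketch below assumes base-2 logarithms and uses a hypothetical helper `log_ratio_gap`; it shows that the gap between $s$ and $\log_2(N/\log_2 N)$ stays below 1 and shrinks as $s$ grows.

```python
import math


def log_ratio_gap(s):
    """Return s - log2(N / log2 N) for N = s * 2**s (assumes s >= 2)."""
    big_n = s * 2 ** s
    return s - math.log2(big_n / math.log2(big_n))


for s in (4, 8, 16, 32, 64):
    print(s, round(log_ratio_gap(s), 4))
```

Algebraically the gap equals $\log_2(1 + \log_2(s)/s)$, which tends to 0, so replacing $2^s$ by $N/\log N$ changes the chip dimensions only by a bounded constant factor.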
For an $s$-dimensional CCC with $n = 2^s$ cycles, which we denote
by $CCC_n$, let $W(s)$ and $H(s)$ be the numbers of vertical and
horizontal tracks respectively, i.e., the width and the height of
a layout. For the Preparata-Vuillemin layout, we have
$W(1) = 4$,
$H(1) = 3$,
$W(s) = 2W(s-1)$,
$H(s) = H(s-1) + 2^{s-1}$.
We get $W(s) = 2^{s+1} = 2n$ and $H(s) = 2^s + 1 = n + 1$. Hence
the area occupied by the Preparata-Vuillemin layout is
$W(s) \times H(s)$, i.e.,
$2n^2 + 2n$. (1)
For the improved Preparata-Vuillemin layout, which is shown in
Fig. 2(b),
$W(1) = 4$,
$H(1) = 3$,
$W(s) = 2W(s-1)$,
$H(s) = H(s-1) + 2^{s-1}$ if $s$ is odd,
$H(s) = H(s-1) + 2^{s-2} + 1$ if $s$ is even.
We get $W(s) = 2^{s+1} = 2n$ and
$H(s) = 3 + (2 + 4 + 5 + \cdots + 2^{s-2} + (2^{s-2} + 1)) = \frac{2}{3} 2^s + \frac{1}{2} s + \frac{4}{3}$
for even $s$, and $H(s) = \frac{5}{6} 2^s + \frac{1}{2} s + \frac{5}{6}$
for odd $s$. For simplicity we only consider even $s$. Hence the
area is
$\frac{4}{3} n^2 + n \log n + \frac{8}{3} n$. (2)
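The two Preparata-Vuillemin recurrences can be iterated directly and compared with their closed forms. This is a sketch under the reading that the improved scheme adds $2^{s-1}$ tracks for odd $s$ and $2^{s-2} + 1$ tracks for even $s$; the function names `pv_dims` and `improved_pv_dims` are hypothetical. Integer arithmetic is used so that the fractions in the closed forms are checked exactly.

```python
def pv_dims(s):
    """(W, H) of the Preparata-Vuillemin layout, iterated from W(1)=4, H(1)=3."""
    w, h = 4, 3
    for k in range(2, s + 1):
        w, h = 2 * w, h + 2 ** (k - 1)
    return w, h


def improved_pv_dims(s):
    """(W, H) of the improved layout; the height increment depends on parity."""
    w, h = 4, 3
    for k in range(2, s + 1):
        w *= 2
        h += 2 ** (k - 1) if k % 2 == 1 else 2 ** (k - 2) + 1
    return w, h


for s in (2, 4, 6, 8):
    n = 2 ** s
    w, h = pv_dims(s)
    iw, ih = improved_pv_dims(s)
    # Formula (1): W*H = 2n^2 + 2n.
    # Formula (2), times 3: 3*W*H = 4n^2 + 3n*log n + 8n.
    print(s, w * h == 2 * n * n + 2 * n, 3 * iw * ih == 4 * n * n + 3 * n * s + 8 * n)
```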
3 New Layouts 
Although the Preparata-Vuillemin layout for the 
CCC is asymptotically optimal (up to a constant), it is 
not the minimal layout. For real implementations, we 
are interested in the minimal layout. Here, we give a 
new layout for the CCC. It is more compact than the 
Preparata-Vuillemin layout; we conjecture that this 
layout is optimal (minimal). 
Figure 3: New layouts of small CCC's. (a) 2-dim CCC
needs area 4 x 4. (b) 3-dim CCC needs area 8 x 6. (c) 4- 
dim CCC needs area 12 x 12. (d) 5-dim CCC needs area 
24 x 28; note that all 5th-dimension nodes are inserted 
into the same cycles that are in the 4-dim CCC. 
3.1 Small CCC's 
With the new layout scheme, the layouts for the 
several initial small CCC's are as shown in Fig. 3(a,b,c). 
It can be easily verified that these few simple cases do 
occupy minimal area. These small CCC layouts are the 
foundation on which to recursively lay out bigger CCC 
networks. 
3.2 Recursive Construction 
Starting from the 5th dimension, the construction is inductive. We
take two copies of the layout for the $(s-1)$-dimensional CCC and
place them side by side. We stretch every cycle vertically by an
extra height of $2^{s-3}$ to allow for the insertion of the
$s$th-dimension nodes and edges. Since there are four rows of
cycles from top to bottom, a total of $2^{s-1}$ extra horizontal
tracks are added. Note that at each recursive expansion, all new
nodes (i.e., the $s$th-dimension nodes) are inserted into the same
cycles, which ensures the correctness of the new layout scheme. We
show the 5-dimensional CCC and the 6-dimensional CCC that are laid
out using the new scheme in Fig. 3(d) and Fig. 4(a) respectively.
For the $s$-dimensional CCC with $n = 2^s$ cycles, it is easy to
see that
$W(4) = 12$,
$H(4) = 12$,
$W(s) = 2W(s-1)$,
$H(s) = H(s-1) + 2^{s-1}$.
We get $W(s) = 12 \times 2^{s-4} = \frac{3}{4} n$ and
$H(s) = 2^s - 4 = n - 4$. Hence the area $W(s) \times H(s)$ is
$\frac{3}{4} n^2 - 3n$. (3)
Like the improved Preparata-Vuillemin layout, the new layout can
also be improved. The improved new layout of the 6-dimensional CCC
is shown in Fig. 4(b). For the improved new layout scheme,
$W(4) = 12$,
$H(4) = 12$,
$W(s) = 2W(s-1)$,
$H(s) = H(s-1) + 2^{s-1}$ if $s$ is odd,
$H(s) = H(s-1) + 2^{s-2} + 4$ if $s$ is even.
We get $W(s) = 12 \times 2^{s-4} = \frac{3}{4} n$ and
$H(s) = 12 + (16 + 20 + 64 + 68 + \cdots + 2^{s-2} + (2^{s-2} + 4)) = \frac{2}{3} 2^s + 2s - \frac{20}{3}$
for even $s$. Hence the area is
$\frac{1}{2} n^2 + \frac{3}{2} n \log n - 5n$. (4)
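As a sanity check on the new scheme, the sketch below iterates both recurrences from the 4-dimensional base case and compares the results with the dimensions quoted in the captions of Fig. 3(d) and Fig. 4. It assumes the height increments $2^{s-1}$ (odd $s$) and $2^{s-2} + 4$ (even $s$) for the improved scheme; the function names are hypothetical.

```python
def new_dims(s):
    """(W, H) of the new layout for s >= 4, starting from the 12 x 12 base."""
    w, h = 12, 12
    for k in range(5, s + 1):
        w, h = 2 * w, h + 2 ** (k - 1)
    return w, h


def improved_new_dims(s):
    """(W, H) of the improved new layout for s >= 4."""
    w, h = 12, 12
    for k in range(5, s + 1):
        w *= 2
        h += 2 ** (k - 1) if k % 2 == 1 else 2 ** (k - 2) + 4
    return w, h


print(new_dims(5))           # compare with Fig. 3(d): 24 x 28
print(new_dims(6))           # compare with Fig. 4(a): 48 x 60
print(improved_new_dims(6))  # compare with Fig. 4(b): 48 x 48
```

The same loop also confirms Formulae 3 and 4 for larger even $s$, since $W(s) H(s)$ can be compared against the closed forms in exact integer arithmetic.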
3.3 Comparison 
By ignoring the low-order terms in Formulae 1, 2, 3 and 4, the four
layout schemes of $CCC_n$ discussed above take areas approximately
equal to $2n^2$, $\frac{4}{3}n^2$, $\frac{3}{4}n^2$ and
$\frac{1}{2}n^2$ respectively. We compare the new layout with the
Preparata-Vuillemin layout, and the improved new layout with the
improved Preparata-Vuillemin layout: the new layout scheme takes
less than half of the area of the Preparata-Vuillemin layout in
either case. The crux of the new layout is that the corner points
of the cycles (now rectangles in the grid) are occupied by nodes
(processors), so that their corresponding cubical edges need little
or no extra space, unlike in the Preparata-Vuillemin layout.
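The "less than half" claim can be illustrated with exact rational arithmetic. The sketch below evaluates the four area formulas for even $s$ (so that $\log n = s$ is an integer) and prints the two ratios, both of which approach $3/8$. The helper name `areas` is hypothetical.

```python
from fractions import Fraction


def areas(s):
    """Areas given by Formulae 1-4 for n = 2**s (s even)."""
    n = 2 ** s
    pv = 2 * n * n + 2 * n
    improved_pv = Fraction(4, 3) * n * n + n * s + Fraction(8, 3) * n
    new = Fraction(3, 4) * n * n - 3 * n
    improved_new = Fraction(1, 2) * n * n + Fraction(3, 2) * n * s - 5 * n
    return pv, improved_pv, new, improved_new


for s in (4, 8, 16, 32):
    pv, ipv, new, inew = areas(s)
    print(s, float(new / pv), float(inew / ipv))
```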
The new compact layout presented here also shows the superiority
of the CCC in this aspect over other hypercube substitutes such as
the shuffle-exchange network and the butterfly network [13]. Much
work had been devoted to the layout of the shuffle-exchange network
until Leighton found its optimal layout [12]. However, all these
layouts of the shuffle-exchange network are complicated; they are
not regular or recursive. The best known layout of the butterfly
network with $n$ inputs or outputs was due to Wise [20] with area
$\simeq 2n^2$. Recently, more compact layouts for the butterfly
were found with area $\simeq \frac{11}{6}n^2$ [8], or
$n^2 + o(n^2)$ [2]. However, the butterfly networks discussed in
[2, 8, 20] are unfolded. To be fair, the folded butterfly network
[13] should be used in comparison with the CCC. Generally, the
corresponding areas of the folded butterfly using these layout
schemes [2, 8, 20] are at least doubled.
4 Lower Bound on Layout Area
To prove the optimality of the new layout, it is desirable to have
a tight lower bound, say, $\frac{1}{2}n^2 - o(n^2)$, so that we can
conclude that the deviation of the new layout from optimality is at
worst of some lower order than a constant factor. While such a
tight lower bound is difficult to derive, we give below a lower
bound of $(\frac{1}{2}n - 1)^2$ for $CCC_n$. Given this bound, we
can see that the deviation of the new layout is at worst a small
additive factor of $\frac{1}{4}$ or a multiplicative factor of 2
from optimality.
Figure 4: (a) New recursively-structured layout of 6-dim CCC with area 48 x 60. (b) Improved new layout of
6-dim CCC with area 48 x 48.
Figure 5: An embedding of $K_{8,8}$ into $CCC_8$.
The lower bound $(\frac{1}{2}n - 1)^2$ follows readily from the
bounding strategy invented in [19], which is in terms of the
bisection width of a graph. We present it below as Lemma 1.
Lemma 1 For any graph $\mathcal{G}$ with bisection width
$BW(\mathcal{G})$, $AREA(\mathcal{G}) \ge (BW(\mathcal{G}) - 1)^2$.
The proof of the bisection width $\frac{1}{2}n$ of $CCC_n$,
however, is complicated. Alternatively, we can use the modified
bounding strategy, Lemma 2, from [2], where a lower bound of the
butterfly network layout is proved by the same technique in terms
of the minimum special bisection width.
Let $\mathcal{G}$ be a graph having a designated set of $2c > 0$
special nodes. The minimum special bisection width of
$\mathcal{G}$, denoted $MSBW(\mathcal{G})$, is the smallest number
of edges whose removal partitions $\mathcal{G}$ into two disjoint
subgraphs, each containing half of $\mathcal{G}$'s special nodes.
The following three lemmas are due to Avior et al. [2].
Lemma 2 For any graph $\mathcal{G}$,
$AREA(\mathcal{G}) \ge (MSBW(\mathcal{G}) - 1)^2$.
Proof: See [19]. □
In order to bound the $MSBW$ of $CCC_n$, we employ the congestion
argument originated in [12, 13] and refined in [2], which is used
for bounding unknown $MSBW$'s from known ones.
Lemma 3 Let $\mathcal{G}$ and $\mathcal{H}$ be graphs having equal
numbers of special nodes. If there is an embedding of $\mathcal{G}$
into $\mathcal{H}$ which maps special nodes to special nodes and
which has congestion $\le C$, then
$MSBW(\mathcal{H}) \ge (1/C) MSBW(\mathcal{G})$.
The complete bipartite graph $K_{n,n}$ plays the role of the guest
graph $\mathcal{G}$ with known $MSBW = \frac{1}{2}n^2$.
Lemma 4 $MSBW(K_{n,n}) = \frac{1}{2}n^2$ when all nodes of
$K_{n,n}$ are special.
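Lemma 4 can be checked by brute force for small even $n$. The sketch below is an illustration rather than a proof: it enumerates every way of placing half of the $2n$ (all special) nodes of $K_{n,n}$ on one side and counts the edges that cross. The function name `msbw_complete_bipartite` is hypothetical.

```python
from itertools import combinations


def msbw_complete_bipartite(n):
    """Minimum number of cut edges over all balanced splits of K_{n,n}."""
    inputs = [("in", k) for k in range(n)]
    outputs = [("out", k) for k in range(n)]
    best = None
    for chosen in combinations(inputs + outputs, n):
        side = set(chosen)
        # An edge (u, v) is cut when its endpoints fall on different sides.
        cut = sum(1 for u in inputs for v in outputs if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best


for n in (2, 4, 6):
    print(n, msbw_complete_bipartite(n), n * n // 2)
```

If one side holds $a$ inputs and $n - a$ outputs, the cut has $a^2 + (n - a)^2$ edges, which is minimized at $a = n/2$; the enumeration agrees with this count.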
Now we give an embedding of the guest graph $K_{n,n}$ into the
host graph $CCC_n$.
Lemma 5 One can embed $K_{n,n}$ into $CCC_n$ with congestion
$2^s = n$ in such a way that the inputs and outputs of $K_{n,n}$
map, respectively, to the first stage and the last stage of
$CCC_n$.
Proof: Consider the embedding of $K_{n,n}$ into $CCC_n$ which
assigns the inputs of $K_{n,n}$ to the first stage of $CCC_n$ and
the outputs of $K_{n,n}$ to the last stage of $CCC_n$, and which
routes the edges of $K_{n,n}$ in increasing order of dimensions,
i.e., from right to left.
With no loss of generality, see Fig. 5 for an embedding of
$K_{8,8}$ into $CCC_8$. Since the long wrap-around cyclic edges of
the CCC are not used for routing in the embedding, Fig. 5(a) is
simplified to Fig. 5(b). Fig. 5(b) can be isomorphically arranged
as Fig. 5(c), in which all stages of nodes except the first stage
are reordered so that pairs of nodes connected by cubical edges are
put together like the first stage, while the cyclic edges at each
stage appear as the unshuffle-connection pattern [1, 7, 9].
Fig. 5(c) can be contracted to Fig. 5(e) by squeezing every pair of
nodes into one big node, as shown in Fig. 5(d). Fig. 5(e) is
apparently a reverse Omega network (or a flip network [3]). Hence
the original $CCC_n$ is turned into a reverse Omega network with
$\frac{1}{2}n$ inputs and $\frac{1}{2}n$ outputs.
Note that the reverse Omega network (with $\frac{1}{2}n$ inputs
and $\frac{1}{2}n$ outputs) has the banyan property [10]: each
input node $u$ is connected to each output node $v$ by exactly one
path of length $s - 1$. Let $e$ be a stage-$k$ edge of the reverse
Omega network, where $0 \le k \le s - 2$. One end-point of $e$
reaches precisely $2^{s-k-2}$ distinct output nodes while the other
end-point of $e$ reaches precisely $2^k$ distinct input nodes.
Hence edge $e$ lies on precisely $2^{s-2}$ input-output paths.
Since each input or output contains two nodes of $K_{n,n}$, edge
$e$ actually lies on precisely $2^s$ input-output paths, i.e., its
congestion is $2^s = n$.
Further investigation shows that the congestion of the cubical
edges of $CCC_n$, which are shown as thin edges in Fig. 5(d), is
also $2^s = n$, since from each input, exactly half of the paths
will go through the cubical edges. □
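The congestion count in this proof can be reproduced by direct simulation. The sketch below is one reading of the embedding, with the following assumptions: input $w$ of $K_{n,n}$ sits at node $(w, 0)$, output $w'$ sits at node $(w', s-1)$, and each path corrects the differing bits of $w$ and $w'$ in increasing order of dimension while walking forward along its cycle, never using the wrap-around edge. The function name `route_congestion` is hypothetical.

```python
from collections import Counter


def route_congestion(s):
    """Count how many K_{n,n} edges are routed over each CCC edge (n = 2**s)."""
    load = Counter()
    n = 2 ** s
    for w in range(n):            # input placed at (w, 0)
        for target in range(n):   # output placed at (target, s - 1)
            cur = w
            for i in range(s):
                if ((cur ^ target) >> i) & 1:
                    # Bit i differs: cross the cubical edge at stage i.
                    nxt = cur ^ (1 << i)
                    load[frozenset({(cur, i), (nxt, i)})] += 1
                    cur = nxt
                if i < s - 1:
                    # Move to the next stage along a cyclic edge.
                    load[frozenset({(cur, i), (cur, i + 1)})] += 1
    return load


for s in (3, 4, 5):
    print(s, 2 ** s, set(route_congestion(s).values()))
```

Under these assumptions every edge that carries any path carries exactly $2^s = n$ of them, in agreement with Lemma 5.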
Lemma 6 $MSBW(CCC_n) \ge \frac{1}{2}n$.
Proof: Directly from Lemmas 3, 4, and 5. □
Finally, Lemma 6 can be combined with Lemma 2 to yield the desired
lower bound, Theorem 1, on the area of the layout of $CCC_n$.
Theorem 1 Any layout of $CCC_n$ has area at least
$(\frac{1}{2}n - 1)^2$.
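The chain of lemmas leading to Theorem 1 amounts to a short calculation, sketched below with exact fractions. The function names are hypothetical; `improved_new_area` uses the area of the improved new layout, $\frac{1}{2}n^2 + \frac{3}{2}n\log n - 5n$, and the loop shows the ratio between that area and the lower bound approaching 2.

```python
from fractions import Fraction


def area_lower_bound(n):
    """Lower bound on the layout area of CCC_n obtained from Lemmas 2-5."""
    msbw_knn = Fraction(n * n, 2)   # Lemma 4: MSBW(K_{n,n}) = n^2 / 2
    msbw_ccc = msbw_knn / n         # Lemmas 3 and 5 with congestion C = n
    return (msbw_ccc - 1) ** 2      # Lemma 2


def improved_new_area(s):
    """Area of the improved new layout for n = 2**s (s even)."""
    n = 2 ** s
    return Fraction(1, 2) * n * n + Fraction(3, 2) * n * s - 5 * n


for s in (4, 8, 12, 16, 20):
    n = 2 ** s
    print(s, float(improved_new_area(s) / area_lower_bound(n)))
```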
5 Conclusion 
The motivation underlying the work presented here is the question
of how much we can reduce the layout area of the CCC. We have given
a simple, regular and more compact layout scheme for the CCC; the
resulting area is $\frac{1}{2}n^2 + o(n^2)$. Some earlier attempts
have been
made: [17] gave a construction of the CCC with area 
n2; [6] gave one with area in2.  
We also give a lower bound on the layout area of the CCC by which
we can judge how far our new layout may be from optimality. There
is still a gap of a constant additive factor of $\frac{1}{4}$
between the new layout and the lower bound. To narrow or close the
gap, we need to find either a more compact layout or a tighter
lower bound for the CCC. We conjecture that the new layout scheme
is minimal. Hence, our future effort will be devoted mainly to
finding a tighter lower bound.
Another important measure of a layout is the maximum wire length
[4, 11]. Another future research item will be to find layout
schemes that do not produce long wires, and to consider the
tradeoff between area and maximum wire length for CCC layouts.
Acknowledgement
We are grateful to the anonymous referees for their constructive
comments.
References
[1] F. Annexstein, M. Baumslag, and A. L. Rosenberg. Group action graphs and parallel architecture. SIAM J. Computing, 19(3):544-569, June 1990.
[2] A. Avior, T. Calamoneri, S. Even, A. Litman, and A. L. Rosenberg. A tight layout of the butterfly network. In Proceedings of 8th ACM Symposium on Parallel Algorithms and Architectures, pages 170-175, 1996.
[3] K. E. Batcher. The flip network in STARAN. In Proceedings of International Conference on Parallel Processing, pages 65-71, Detroit, MI, 1976.
[4] R. Beigel and C. P. Kruskal. Processor networks and interconnection networks without long wires. In ACM Symposium on Parallel Algorithm and Architecture, pages 42-51, 1989.
[5] S. N. Bhatt and F. T. Leighton. A framework for solving VLSI graph layout problems. Journal of Computer and System Sciences, 28(2):300-343, 1984.
[6] S. Bhattacharya, C. T. Liang, and W. T. Tsai. Cubical bus connected columns: An alternative to hypercube. In Proceedings of the Supercomputing Symposium, pages 409-420, 1991.
[7] G. Chen and F. C. M. Lau. Comments on "a new family of Cayley graph interconnection networks of constant degree four". IEEE Transactions on Parallel and Distributed Systems, to appear.
[8] Y. Dinitz. A compact layout of butterfly on the square grid. Technical Report 873, Technion-Israel Institute of Technology, Haifa 32000, Israel, November 1995.
[9] B. N. Jain. Equivalence between cube-connected cycles networks and circular shuffle networks. In Proceedings of International Conference on Parallel Processing, pages 8-11, 1986.
[10] C. P. Kruskal and M. Snir. A unified theory of interconnection network structure. Theoretical Computer Science, 48(1):75-94, 1986.
[11] F. C. M. Lau and G. Chen. Optimal layouts of midimew networks. IEEE Transactions on Parallel and Distributed Systems, 7(9):954-961, September 1996.
[12] F. T. Leighton. Complexity Issues in VLSI: Optimal Layouts for the Shuffle-Exchange Graph and Other Networks. The MIT Press, 1983.
[13] F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann Publishers, 1992.
[14] C. Mead and L. Conway. Introduction to VLSI Systems. Addison Wesley, 1980.
[15] K. Mehlhorn, F. P. Preparata, and M. Sarrafzadeh. Channel routing in knock-knee mode: Simplified algorithms and proofs. Algorithmica, 1:213-221, 1986.
[16] F. P. Preparata and J. Vuillemin. The cube-connected cycles: A versatile network for parallel computation. CACM, 24(5):300-309, May 1981.
[17] J. J. Shen and I. Koren. Yield enhancement designs for WSI cube connected cycles. In Proceedings of IEEE International Conference on Wafer Scale Integration, pages 289-298, 1989.
[18] C. D. Thompson. Area-time complexity for VLSI. In Proc. 11th Ann. Symp. on Theory of Computing, pages 81-88, Atlanta, GA, May 1979.
[19] C. D. Thompson. A Complexity Theory for VLSI. PhD thesis, CMU Computer Science Department, 1980.
[20] D. S. Wise. Compact layouts of Banyan/FFT networks. In H. T. Kung, R. Sproull, and G. Steele, editors, VLSI Systems and Computations, pages 186-195. Springer-Verlag, 1981.
