Layout placement for sliced architecture by Larmore, Lawrence L. & Gajski, Daniel D.
UC Irvine
ICS Technical Reports
Title
Layout placement for sliced architecture
Permalink
https://escholarship.org/uc/item/0x97m5w1
Authors
Larmore, Lawrence L.
Gajski, Daniel D.
Publication Date
1989-10-19
 
Peer reviewed
Technical Report 89-36 
October 19, 1989 
Layout Placement for Sliced Architecture 
Lawrence L. Larmore 
Dept. of Mathematics and Computer Science 
University of California at Riverside 
Riverside, CA 92521 
Daniel D. Gajski 
Dept. of Information and Computer Science 
University of California at Irvine 
Irvine, CA 92717 
Abstract 
This paper defines a new sliced layout architecture for compilation of arbitrary 
schematics (netlists) into layout for CMOS technology. This sliced architecture uses 
over-the-cell routing on the second metal layer. We define three different architectures 
with simple folding, interleaved folding and unrestricted folding. We present a linear 
time algorithm for placement of components in architectures with simple folding. We 
prove interleaved folding is NP-hard and give an algorithm of complexity $O(nbH/\delta)$
for approximating an optimal module, where n is the number of components, b is the
width of the least-area module, H is the total height of the components, and $\delta > 0$ is
arbitrarily chosen. The error of this algorithm (i.e., the difference between the area of
the resulting module and the optimal one) is $O(nb\delta)$. We conclude the paper with a
proof that the architecture with interleaved folding is as good as the architecture with 
unrestricted folding with respect to area minimization of the total layout. 

Component placement methods fall into two groups: constructive and iterative.
Constructive placement methods produce a complete placement from the netlist or a partial
placement, while iterative methods improve a complete placement by modifying it.
Improvement is measured by some metric such as area, total wire length, or wire density.
The routing task determines the placement of connections between components. It consists
of three subtasks. First, the routing surface must be divided into rectangular routing areas
which meet the restrictions of the routing algorithm to be used. Second, a global router
assigns each net to a subset of routing areas, i.e., the global router defines the areas through
which the net will be routed. Third, a detailed router calculates the exact wiring path in
each routing area.
3 Motivation 
In order to simplify the general placement and routing problem, several layout architectures
have been developed. They restrict the freedom in placing components and wires and thus
simplify the placement and routing algorithms. The best known and most popular
one is the standard cell architecture. The general microarchitectural components, such as
ALUs, registers, counters, encoders and shifters, are further decomposed into gates, such
as NAND, NOR and EXOR, and storage elements, such as latches and flip-flops. Each of
these elements is laid out by hand as a cell. All cells are of the same height but of different
widths, with inputs and outputs on the top and the bottom of the cell. The cells are placed
in rows which are stacked in several levels. Every two adjacent rows are separated by a routing
area called a routing channel (Figure 1). Dummy cells, used only for routing between cells in
non-adjacent rows, are inserted where needed.
One of the weaknesses of the standard cell architecture is that routing occupies more than
50% of the total area. It should be noted that cells use basically diffusion and polysilicon
layers, with some routing in the first metal layer, while channels are routed in two metal
layers. Thus, it would be beneficial to route over the cells and minimize the routing area.
Another weakness is that the standard cell architecture does not take advantage of the
replicability of microarchitectural components, which consist of many identical bit slices. Those
slices can be laid out as one cell instead of as several standard cells, and connected through
diffusion and polysilicon layers, thus drastically reducing the number of wires in the channel.
We now introduce the sliced architecture that avoids the above mentioned weaknesses

second solution to the sliced architecture placement, the possibly smaller component area is
traded off for a larger routing area in comparison to the simply folded architecture.
The third solution to the above problem is the unrestricted architecture, in which
components are not sorted or folded. Components are stacked with respect to the left and the
right edges of the module in such a way that wasted area is minimal, as shown in Figure 6.
This architecture does not place any restriction on the ordering or size of particular components.
It assumes, however, that the track densities of the left and the right parts together are less
than or equal to the track density of the bit slice.
5 Simple-folded sliced architecture 
In this section we give a fast algorithm for minimizing the area of the module, using the
simple-folded sliced architecture.
Sort the components by width. Let bi and hi be the width and height of the ith widest 
component. Let the width of the module be b; for any fixed b, our algorithm minimizes the 
required height of the module. The area of the module can be minimized by running the 
algorithm once for each choice of b, and comparing those best answers. 
We define the component Ci to be wide if $b_i > b/2$, and narrow otherwise. In the
simple-folded sliced architecture, components C1 through Ck (for some k) will be stacked on the
left in sorted order, while components Ck+1 to Cn will be stacked on the right. The routing
strategy requires that all the components on the right be narrow.
We say that components Ci and Cj are compatible if they can be laid side by side in
the module, i.e., if $b_i + b_j \le b$. Otherwise, we say they are incompatible. For any narrow
component Ci, define the critical obstructing component of Ci to be that Cj of maximum
index which is incompatible with Ci; we say that j is the critical index of i, and write
crit(i) = j. If there is no incompatible component, we say that crit(i) = 0.
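
The definitions of wide, narrow, compatible and crit can be illustrated with a short sketch. The function name critical_indices and the list layout are assumptions made for this illustration, not part of the original report; the code assumes the components are already sorted widest first, as described above, and computes every critical index with a single sweep.

```python
def critical_indices(widths, b):
    """widths[i-1] is the width of C_i, sorted widest first (1-based
    component indices as in the text). Returns a list crit such that
    crit[i] is the critical index of the narrow component C_i, and
    None when C_i is wide (crit is only defined for narrow components)."""
    n = len(widths)
    assert all(widths[i] >= widths[i + 1] for i in range(n - 1)), "sort widest first"
    crit = [None] * (n + 1)        # crit[0] is unused
    j = 0                          # C_1..C_j are incompatible with the current C_i
    for i in range(n, 0, -1):      # sweep from narrowest to widest
        bi = widths[i - 1]
        if 2 * bi > b:             # C_i is wide, and so is everything before it
            break
        # As b_i grows, the prefix of incompatible components can only grow.
        while j < n and widths[j] + bi > b:
            j += 1
        crit[i] = j                # 0 means no incompatible component
    return crit
```

Because the pointer j only moves forward, the sweep takes linear time after sorting.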
The algorithm needs only to find the correct value of the parameter k. For each k, define
Yk to be the minimum distance possible from the top of the module to the bottom of Ck+1
if the folding occurs at k. Note that $Y_k \ge \sum_{i=k+1}^{n} h_i$. Equality need not hold, since
obstructions caused by wide components on the right may prevent the folded portion from
abutting the top of the module. All values of Yk can be found in linear time by reverse
iteration, starting with k = n. Since there are then no components on the right side, Yn = 0.
For k < n, we need to compute Yk, knowing Yk+1. Let j = crit(k), and let $X_j = \sum_{i=1}^{j} h_i$.

for it is equivalent to the statement that P = NP. 
We reduce the above partitioning problem to our architecture problem as follows: assume 
that we have n components of equal width (say bi = 1) and heights Wi, as well as a single 
component of width 2 and any height. The minimum area module has width 2, and is 
obtained by partitioning the components into left and right subsets of as nearly equal total 
height as possible; the one long component goes at the top or bottom. This partitioning 
yields an optimal solution to the classic problem. It follows that if we could find a polynomial 
time algorithm for optimizing sliced architecture with interleaved folding, we would have 
proved that P = NP. 
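
The reduction can be made concrete with a small sketch. The helper names below are hypothetical, the width-2 component is given height 1 (an arbitrary choice, since any height works), and the search is a deliberately exponential brute force that only fixes the module width at 2; it is meant to illustrate the correspondence with the partitioning problem, not to serve as a practical algorithm.

```python
from itertools import combinations

def reduction_components(weights):
    """Map a partition instance to (width, height) components: one
    unit-width component per weight, plus one component of width 2."""
    return [(1, w) for w in weights] + [(2, 1)]

def min_height_width2(components):
    """Brute-force the minimum module height for module width 2.
    The width-2 component spans the module; the unit-width components
    are split into a left stack and a right stack."""
    unit = [h for (w, h) in components if w == 1]
    spanning = sum(h for (w, h) in components if w == 2)
    total = sum(unit)
    best = total
    for r in range(len(unit) + 1):
        for left in combinations(range(len(unit)), r):
            left_sum = sum(unit[i] for i in left)
            best = min(best, max(left_sum, total - left_sum))
    return spanning + best
```

Under these assumptions, the weights admit a perfectly balanced partition exactly when the returned height equals half the total weight plus the height of the width-2 component.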
The situation is very far from hopeless, however. In this section we give an algorithm 
for minimizing the area of the module which runs in exponential time in the worst case. 
This algorithm, which we call the List-Merge algorithm, actually takes very little time to 
execute in practical cases, as we will show below. Even in the worst case, the List-Merge 
algorithm can be used to find, very quickly, an approximately optimal solution which differs
from the optimal solution by a provably small amount.
Assumptions. We will fix a module width b. The List-Merge algorithm will find that 
placement of the components which minimizes the module height. The algorithm can then 
be run once for each choice of b. 
We sort the components in the order in which they will be processed by the algorithm.
The widest and narrowest components come first, and those whose widths are closest to
b/2 come last. This sorting is accomplished in two steps. In the first step, we sort the
wide components (those of width greater than b/2) by width, widest first, and we sort the
narrow components (those of width less than or equal to b/2) by width, narrowest first.
In the second step, these two lists are merged, with the rule that if a wide and a narrow
component are compatible (i.e., they can be placed side-by-side in the module) the narrow one
is first, while if they are incompatible, the wide one is first. For example, if there are 9 components
of widths 1, 2, 4, 5, 6, 7, 8, 9, 10 and if b = 12, the sorted list of wide components will have
widths (10, 9, 8, 7) and the sorted list of narrow components will have widths (1, 2, 4, 5, 6).
The merged list of components will have widths (1, 2, 10, 9, 4, 8, 5, 7, 6). Let bi, hi be the
width and height of the ith component using this ordering.
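
The two-step sort can be sketched as follows. The function name list_merge_order is an assumption made for this illustration, and the sketch orders widths only; a full implementation would carry each component's height along with its width.

```python
def list_merge_order(widths, b):
    """Return the widths in List-Merge processing order for module width b."""
    # Step 1: wide components widest first, narrow components narrowest first.
    wide = sorted((w for w in widths if 2 * w > b), reverse=True)
    narrow = sorted(w for w in widths if 2 * w <= b)
    # Step 2: merge. If the two front components are compatible the narrow
    # one goes first; otherwise the wide one goes first.
    order, i, j = [], 0, 0
    while i < len(wide) and j < len(narrow):
        if wide[i] + narrow[j] <= b:
            order.append(narrow[j])
            j += 1
        else:
            order.append(wide[i])
            i += 1
    order.extend(wide[i:])
    order.extend(narrow[j:])
    return order

# The example from the text: b = 12.
print(list_merge_order([1, 2, 4, 5, 6, 7, 8, 9, 10], 12))
# -> [1, 2, 10, 9, 4, 8, 5, 7, 6]
```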
Partial solutions and signatures. The List-Merge algorithm is a dynamic programming 
algorithm which successively builds up lists of partial solutions "up to" k, for k from 0 to 
n. We define a partial solution up to k to be a placement of C1, ..., Ck in the module which



partial solutions one level at a time, i.e., as k iterates from 0 to n. After computing each 
level, we prune the tree by deleting all nodes whose signatures are not minimal for that level. 
If nodes have duplicate minimal signatures, we delete all but one of them. The remaining 
nodes at a level all have distinct signatures which are minimal, and only the children of 
these nodes are considered for constructing the next level. Induction on k, using Remark 
6.4, guarantees that no minimal signature will be lost. Since there is a minimal signature 
which is optimal, there will be at least one optimal solution constructed at the bottom level. 
Symmetry. Since any partial solution can be rotated 180 degrees, if (x, y) is a minimal
signature so is (y, x). A further pruning by eliminating one of every such pair of minimal
signatures saves almost another factor of 2 in the number of partial solutions that need to
be computed. If this symmetry pruning is used, we eliminate any partial solution whose
signature is (x', y') if we keep any other partial solution of signature (x, y) such that
$(x, y) \le (y', x')$.
Implementation. Let Lk be the set of partial solutions up to k which survive the pruning
process. We can assume that they are maintained in a list with strictly increasing values
of x and strictly decreasing values of y. The algorithm can be expressed in terms of list
operations as follows:
Algorithm 2
1. Let L0 be the list consisting of just the empty solution.
2. For each k from 1 to n, construct the list of left children and the list of right children
of Lk-1. Merge these two lists and prune, eliminating items with non-minimal or
duplicate signatures. Call the resulting list Lk.
3. Search Ln for that solution for which max{x, y} is minimized. The height-compression
of that solution is optimal, and its height is max{x, y} plus the sum of the heights of
all wide components.
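
The pruning in step 2 and the search in step 3 can be sketched as below. This is a minimal illustration that assumes only that a signature is a pair (x, y) compared componentwise; the function names are hypothetical, and the construction of left and right children, as well as symmetry pruning, is left out.

```python
def prune(signatures):
    """Keep one copy of each minimal signature. The result has strictly
    increasing x and strictly decreasing y, as required for the lists Lk."""
    kept = []
    for x, y in sorted(set(signatures)):      # ascending x, then ascending y
        # A signature survives only if its y beats every signature with
        # smaller or equal x that was already kept.
        if not kept or y < kept[-1][1]:
            kept.append((x, y))
    return kept

def best_signature(signatures):
    """Step 3: pick a signature minimizing max{x, y}."""
    return min(signatures, key=lambda s: max(s))
```

For instance, prune applied to the signatures (3, 5), (1, 7), (3, 5), (4, 6), (2, 7), (5, 1) keeps (1, 7), (3, 5) and (5, 1), since (2, 7) and (4, 6) are dominated and the second (3, 5) is a duplicate.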
In Figure 11, we show the steps of a single example. Figure 11(a) shows the list of
components, sorted in the order required by Algorithm 2. Figure 11(b) shows the binary tree
of signatures, where all non-minimal and duplicate nodes are pruned. Symmetry pruning 
is also used, as this cuts the size of the tree roughly in half. (The lists Lk are not shown in 
the figure, since the sorting required by the algorithm is not closely related to the binary 
tree structure and a figure showing both would look very tangled.) The path to the optimal 

arrangement, the height of the module can be increased by at most that same amount. A 
decrease of the height of one component may or may not result in a decrease of the height
of the module, but will never cause an increase. 
Now use the arrangement that is optimal for the estimated heights. We can increase
every underestimated height to its true value; this causes an increase of the height of the
module by at most E-, by the same argument. □
Theorem 6.1 For any given positive $\delta$, there exists an algorithm for the sliced architecture
with interleaved folding which takes $O(n \sum_{i=1}^{n} h_i / \delta)$ time and produces a solution which is
within $n\delta/2$ of optimal.
Proof: Approximate each $h_k$ by letting $h'_k$ be the nearest integral multiple of $\delta$. Then apply
Lemmas 6.2 and 6.3. □
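
The rounding step in the proof can be illustrated with a short sketch. The function names are assumptions made for this illustration, and integer heights and an integer step delta are assumed so that the arithmetic is exact. Each height moves by at most delta/2, so the total error over n components is at most n * delta / 2, which is the bound in the theorem.

```python
def round_heights(heights, delta):
    """Round each integer height to the nearest integral multiple of delta
    (ties round up; either choice keeps the error within delta / 2)."""
    return [((2 * h + delta) // (2 * delta)) * delta for h in heights]

def total_rounding_error(heights, delta):
    """Sum of |h'_i - h_i| over all components."""
    rounded = round_heights(heights, delta)
    return sum(abs(r - h) for r, h in zip(rounded, heights))

heights = [103, 250, 598, 421]
delta = 10
print(round_heights(heights, delta))          # -> [100, 250, 600, 420]
print(total_rounding_error(heights, delta))   # -> 6, below the bound 4 * 10 / 2 = 20
```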
Practical cases. In practice, we expect that n will be approximately 30, and that the
sum of the heights of all components will be approximately 15000µ. If the heights of all
components are rounded to the nearest micron, the total error $\sum_i |h'_i - h_i|$, in that case,
cannot exceed 15µ and is expected to be only 7.5µ. In that situation, Algorithm 2 will
execute on the order of a million instructions, and will produce a solution whose deviation
from optimal is on the order of 1%.
7 Sliced architecture with unrestricted folding 
We now consider a still more unrestricted architecture. We allow any component to be 
located anywhere in the module, as long as its base end abuts either the right or left edge 
of the module. Eliminating the restriction on sorting makes routing around the module 
impossible, so we must assume that there are two wires available for each cell. (See Figure 
6.) 
It might be expected that removing the sorting restriction will allow a decrease in the
area of the module in some cases. We shall see that this is false; in fact, the unrestricted
folding architecture always yields the same optimal sized modules as the interleaved folding
architecture in the previous section. There can be savings in the routing area, though.
Theorem 7.1 For any given b, the minimum possible height of any module of width b using
the sliced architecture with unrestricted folding is equal to the minimum possible height of
any module of width b using the sliced architecture with interleaved folding.

Suppose that a left component Ci of width c overlaps a right component Cj, which must
then have width greater than b - c. Let x be the distance from the lower right corner of
Ci to the top of the module, and let y be the distance from the upper left corner of Cj to
the bottom of the module. Since all left moiety components above Ci have width at least
c, $x \le f(c)$. Since all right moiety components below Cj have width greater than b - c,
$y \le g(c)$. By Lemma 7.1, $x + y \le f(c) + g(c) \le h$, which means that the lower right corner
of Ci is above the upper left corner of Cj, contradicting the hypothesis that they overlap.
□
8 Experiments 
Since it is very difficult to obtain real-life examples on the microarchitectural level of design,
we generated more than ten examples at random. We set the number of components to
be a random number between 10 and 50. This is approximately the number of components
on a controller, an I/O interface chip or a medium size processor. The number of bits per
component is a random number between 1 and 32, which covers most designs with the
exception of floating-point arithmetic and number crunching high-performance processors.
Similarly, the height of each component is a random number between 100 and 600 microns.
We developed most of the microarchitectural components using Silicon Compiler System's
Generator Development Tools and found that most components fall into that range. We
used independent uniform distributions in the above mentioned ranges. We measured
the quality of our algorithm by the wasted area in the layout, i.e., the difference between the
area of the module and the sum of all component areas. Table 1 shows for each example the
number of components, the areas of the bounding rectangles for the unfolded and folded
architectures, as well as the wasted areas for both cases. On average our algorithm produced
placements with less than 8.3% wasted area for the folded architecture in comparison with
53.4% for the unfolded architecture.
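
The experimental setup can be sketched as follows. Several details here are assumptions rather than statements from the report: the function names are hypothetical, the ranges are taken as inclusive integers, widths are measured in bit slices, the unfolded module is taken to be a single stack of all components, and the wasted area is reported as a fraction of the module area.

```python
import random

def random_example(rng):
    """One random example: a list of (bits, height_in_microns) pairs drawn
    from independent uniform distributions over the stated ranges."""
    n = rng.randint(10, 50)
    return [(rng.randint(1, 32), rng.randint(100, 600)) for _ in range(n)]

def unfolded_bounding_box(components):
    """Assumed unfolded layout: every component stacked in one column, so the
    width is the widest component and the height is the sum of heights."""
    return max(w for w, _ in components), sum(h for _, h in components)

def wasted_fraction(components, module_width, module_height):
    """Wasted area (module area minus component area) over module area."""
    used = sum(w * h for w, h in components)
    area = module_width * module_height
    return (area - used) / area

example = random_example(random.Random(0))    # seeded for repeatability
width, height = unfolded_bounding_box(example)
print(len(example), width, height, wasted_fraction(example, width, height))
```

The same wasted_fraction helper could be applied to the bounding rectangle of a folded placement to compare the two cases.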
9 Conclusion 
We have presented a new sliced architecture and an algorithm for placement of arbitrary 
microarchitectural schematics. We gave an algorithm for simple folding and interleaved 
folding. We also gave a bound on the algorithm quality. Our experiments show that wasted 
area is very small. Since this new architecture uses second metal for routing between 

Figure 1. Standard-cell architecture (rows of cells 1 to 5 separated by routing channels)

Figure 3. Sliced architecture (components 1 to 7, of 6, 4, 5, 3, 1, 2 and 4 bits, stacked across slices 0 to 5, with a fixed number of tracks per slice)

Figure 5. Sliced architecture with interleaved folding and I/O on top and bottom (components 1 to 9, showing the routing area, the wasted area, and the module inputs and outputs)

Figure 7(a) The Definitions of x_i and y_i (components C1 to C10, with X0 = 0 through X10 and Y10 = 0)

Figure 8(a) Before Height Compression. Figure 8(b) After Height Compression.

Figure 9(b) Illustrating the Proof of Lemma 6.1, Case 2

      Width   Height   Large or Small
C1     14      10      Large
C2      3       8      Small
C3     12       5      Large
C4      5       4      Small
C5      5       8      Small
C6     10       6      Large
C7      7       5      Small
C8      9      10      Large
Figure 11(a) The Widths and Heights of the Components

Figure 11(c) The Optimal Interleaved Architecture, Showing Routing

Figure 13 Illustrating Proof of Lemma 7.1 (showing the width c and the heights f(c) and g(c))

