Algorithmic Layout of Gate Macros by Gajski, Daniel D. et al.
* Algorithmic Layout of Gate Macros 
Daniel D. Gajski 
Avinoam Bilgory 
Joseph Luhukay 
Department of Computer Science 
University of Illinois at Urbana-Champaign 
Urbana, Illinois 61801 
The rapid advancement of VLSI technology necessitates new implementation
methodologies with design automation capabilities. Existing
implementation styles such as master slice, programmable logic arrays
and custom design with a cell library do not achieve the best tradeoffs
between circuit density and chip development cycle time. An
implementation methodology based on register-transfer building blocks
called gate macros can be used to cut down the design time drastically.
Furthermore, gate macros, which generally represent functional entities
such as registers, adders, busses and logic units, are amenable to
algorithmic or totally automatic layout [Verg80], [Joha79].
* This work was supported in part by the NSF under grant
No. US NSF MCS80-01561.
CALTECH CONFERENCE ON VLSI, January 1981

This paper describes the basic modules of a gate-to-silicon compiler
which accepts as its input a high-level description of gate macros and
generates a layout that satisfies particular technology parameters
(NMOS, for example) and environmental parameters (layout area or time
delay, for example). The input to the gate-to-silicon compiler is the
set of macros generated at the register transfer level. High-level
language constructs like DO loops and IF statements are allowed in the
input language. However, the only data types allowed are Boolean
scalars, vectors and strings. For example, a 16-bit binary adder can be
described as follows:
S1: C(0) = CIN
    DO I = 1,16
S2:   C(I) = A(I)*B(I) + (A(I) + B(I))*C(I-1)
S3:   S(I) = A(I) ⊕ B(I) ⊕ C(I-1)
    END
S4: COUT = C(16)
The above description can be used for a variety of implementation
styles. For example, if the specified delay time is long relative to
the speed of the technology used, the 16-bit adder will be implemented
as a ripple-carry adder. If a faster version is required, a
carry-look-ahead adder will be used. The number of bits looked ahead
depends on the specified delay time. Similarly, different layouts will
be produced for different time delays.
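The behaviour specified by statements S1 to S4 can be checked with a short simulation. The following Python sketch is only an illustration of the description's semantics, not part of the compiler; the helper names `adder16`, `to_bits` and `from_bits` are hypothetical.

```python
def adder16(a_bits, b_bits, cin):
    """Evaluate statements S1 to S4 on bit lists indexed 1..16 (index 0 unused)."""
    n = 16
    c = [0] * (n + 1)
    s = [0] * (n + 1)
    c[0] = cin                                   # S1: C(0) = CIN
    for i in range(1, n + 1):                    # DO I = 1,16
        a, b = a_bits[i], b_bits[i]
        c[i] = (a & b) | ((a | b) & c[i - 1])    # S2: carry recurrence
        s[i] = a ^ b ^ c[i - 1]                  # S3: sum bit
    return s, c[n]                               # S4: COUT = C(16)


def to_bits(x, n=16):
    """Hypothetical helper: integer to bit list, bit I stored at index I."""
    return [0] + [(x >> (i - 1)) & 1 for i in range(1, n + 1)]


def from_bits(bits):
    """Hypothetical helper: inverse of to_bits (index 0 is ignored)."""
    return sum(bit << (i - 1) for i, bit in enumerate(bits) if i >= 1)


s, cout = adder16(to_bits(12345), to_bits(54321), 0)
print(from_bits(s), cout)
```

The same description serves both the ripple-carry and the look-ahead implementation; only the structure that evaluates the recurrence S2 changes.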
The compiler consists basically of four modules (Figure 1):
1. Boolean Analyzer partitions the input description into blocks with
easily recognizable structure. For example, statements S1 and S2 will
be recognized as a recurrence system, statement S3 as a vector
operation and statement S4 as a scalar operation. Furthermore, the
Boolean Analyzer generates the dependence graph with statements as
vertices and dependences as edges. The dependence graph represents the
internal structure of the gate macro. It indicates the critical time
delay and cell structure of the future layout alternatives.
[Figure: block diagram. Input: high-level language description of gate
macros. Boxes, in order of appearance: Boolean Analyzer; Cell Generator;
Dependence Graph Refiner; Subcell Generator; Cell Binder; Parameters
Evaluation; Cell Layout; Symbolic Placement; Layout Generator; Timing
Evaluation; Structure Generator.]
Figure 1. Block diagram of a gate-to-silicon compiler.
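The kind of recognition performed by the Boolean Analyzer can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: statements are encoded by hand as dictionaries, and the rule used (a statement inside a loop that reads its own target at a different index is a recurrence) is a simplification introduced here. The sketch classifies each statement on its own, so S1 appears as a scalar; grouping it with S2 into one recurrence system, as the text describes, is left out.

```python
def classify(stmt):
    """Return 'scalar', 'vector' or 'recurrence' for one statement."""
    if not stmt["in_loop"]:
        return "scalar"
    # A loop statement that reads its own target at another index is a recurrence.
    self_refs = [idx for name, idx in stmt["reads"]
                 if name == stmt["target"] and idx != "I"]
    return "recurrence" if self_refs else "vector"


# Hand-written encoding of the adder description (hypothetical data layout).
program = [
    {"label": "S1", "target": "C", "in_loop": False,
     "reads": [("CIN", None)]},
    {"label": "S2", "target": "C", "in_loop": True,
     "reads": [("A", "I"), ("B", "I"), ("C", "I-1")]},
    {"label": "S3", "target": "S", "in_loop": True,
     "reads": [("A", "I"), ("B", "I"), ("C", "I-1")]},
    {"label": "S4", "target": "COUT", "in_loop": False,
     "reads": [("C", "16")]},
]

kinds = {stmt["label"]: classify(stmt) for stmt in program}
print(kinds)
```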
2. Cell Generator modules consist of the Dependence Graph Refiner, the
Subcell Generator and the Cell Binder.
The Dependence Graph Refiner tries to break each of the dependence
graph nodes into as many nodes as possible. The resulting dependence
graph is more detailed, which allows the Cell Binder more flexibility in
optimization. Since statements S1 and S4 are scalar operations without
operators, their layout area and time delay are O(ε), where ε is a small
value, so they are left untouched. Statement S2 is a recurrence with
maximum O(n log n) layout area and minimum O(log n) time delay, where n
is the recurrence length. Since the recurrence node will be broken into
three or more different types of subcells, its decomposition is left to
the Subcell Generator. Statement S3 has an O(n) layout area and O(1)
time delay. Since the EXCLUSIVE-OR operation is associative, statement
S3 can be split into S3a and S3b. Using the above approximation, the
original program is distributed as shown below.
S1:  C(0) = CIN
     DO I = 1,16
S2:    C(I) = A(I)*B(I) + (A(I) + B(I))*C(I-1)
     END
     DO I = 1,16
S3a:   T(I) = A(I) ⊕ B(I)
     END
     DO I = 1,16
S3b:   S(I) = T(I) ⊕ C(I-1)
     END
S4:  COUT = C(16)
The new dependence graph is shown in Figure 2.
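A dependence graph for the distributed program can be derived mechanically from what each statement writes and reads. The sketch below is an assumption-laden illustration: it tracks dependences by variable name only and ignores array indices, so it may report more edges than an index-aware analysis would (for example S1 to S4), and it is not claimed to reproduce Figure 2.

```python
# Each entry: (label, variable written, variables read); encoded by hand.
statements = [
    ("S1",  "C",    {"CIN"}),
    ("S2",  "C",    {"A", "B", "C"}),
    ("S3a", "T",    {"A", "B"}),
    ("S3b", "S",    {"T", "C"}),
    ("S4",  "COUT", {"C"}),
]


def dependence_edges(stmts):
    """Edge (p, q) whenever q reads a variable that p writes."""
    edges = set()
    for p_label, p_write, _ in stmts:
        for q_label, _, q_reads in stmts:
            if p_write in q_reads:
                edges.add((p_label, q_label))
    return edges


edges = dependence_edges(statements)
for edge in sorted(edges):
    print(edge)
```

The self-edge on S2 marks the recurrence, while S3a has no incoming edge from the carry chain, which is why it can be evaluated independently of it.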
Figure 2. Dependence graph of the distributed program.
The Subcell Generator consists of several submodules, one for each
type of block recognized by the Boolean Analyzer. Each submodule
generates the functional description of the basic subcells used to
synthesize the given block. The recurrence statements S1 and S2
generate four types of subcells:
type 2.1 subcell: G = A*B
type 2.2 subcell: P = A + B
type 2.3 subcell: G = G1 + G2*P1, P = P1*P2
type 2.4 subcell: C = G + P*C0
A description of cell generation for recurrence structures is found in
[BiGa80]. Statements S3a and S3b generate one type of subcell each,
called type 3a and 3b subcells, respectively.
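The four recurrence subcells can be checked against the original recurrence S2. The following Python sketch is an illustration only: it assumes that in the type 2.3 subcell the index 1 denotes the more significant group and the index 2 the less significant one, and it chains the type 2.3 subcells sequentially for brevity, whereas a look-ahead structure would arrange them as a tree. The function names are hypothetical.

```python
from itertools import product


def gen(a, b):                    # type 2.1 subcell: G = A*B
    return a & b


def prop(a, b):                   # type 2.2 subcell: P = A + B
    return a | b


def combine(g1, p1, g2, p2):      # type 2.3 subcell: G = G1 + G2*P1, P = P1*P2
    return g1 | (g2 & p1), p1 & p2


def carry(g, p, c0):              # type 2.4 subcell: C = G + P*C0
    return g | (p & c0)


def carries_from_subcells(a, b, c0):
    """Carries C(1..n) built from the subcells; a and b are bit lists, LSB first."""
    out = []
    g_acc, p_acc = 0, 1           # neutral group: generates nothing, propagates
    for ai, bi in zip(a, b):
        g_acc, p_acc = combine(gen(ai, bi), prop(ai, bi), g_acc, p_acc)
        out.append(carry(g_acc, p_acc, c0))
    return out


def carries_from_recurrence(a, b, c0):
    """Carries C(1..n) computed directly by statement S2."""
    out, c = [], c0
    for ai, bi in zip(a, b):
        c = (ai & bi) | ((ai | bi) & c)
        out.append(c)
    return out


agree = all(
    carries_from_subcells(a, b, c0) == carries_from_recurrence(a, b, c0)
    for a in product((0, 1), repeat=4)
    for b in product((0, 1), repeat=4)
    for c0 in (0, 1)
)
print(agree)
```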
The Cell Binder combines subcells to form larger cells. The subcells
to be combined are selected according to the constraints posed by the
dependence graph. Since type 2.1 and 2.2 subcells (generated for the
recurrence) perform vector operations, as does the type 3a subcell, the
three can be combined to form one cell, called the type 1 cell. Type 2.4
and 3b subcells can also be combined into one cell, but this was not
done in this example, so type 2.3, 2.4 and 3b subcells are each assigned
one type of cell and renamed type 2, 3 and 4 cells, respectively. The
layout occupies minimum area when all the cell types have similar
widths. So if the Structure Generator finds, for example, the type 1
cell to be too large, a separate cell type may be dedicated to subcell
3a. Since S3a is not on a critical path, this cell can then be
positioned almost anywhere in the layout.
3. Cell Layout modules consist of Symbolic Placement and the Layout
Generator.
The Symbolic Placement module generates a two-dimensional array of
symbolic transistors and their connections. Compaction is done
automatically when this two-dimensional array is translated by the
Layout Generator into a complete mask description in compliance with
the layout design rules of the chosen technology.
Each cell can be manually designed if so desired, leaving the
placement and routing to be performed automatically by the system.
Manual cell design represents one extreme of the provided layout design
space [MeCo80]. However, the overall aim is to have an automatic layout
system in which a manual cell design or a cell library is replaced by a
library of algorithms, with one or more algorithms for automatic
generation of layout specifications available for each cell model
supplied by the Cell Generator module. Algorithmic layout is thus the
other extreme of the layout design spectrum.
For example, an obvious approach would be to implement each cell with
a small programmable logic array. The MOS and I²L technologies are well
suited to automatic synthesis, as shown in [SOHT80] for one-dimensional
gate arrays. We have chosen a two-dimensional array approach, as
described below.
The Symbolic Placement module is based upon a grid system of tracks
(or channels) on different layers of the integrated circuit structure.
Interaction among the layers is governed by the technology and, as a
result, the geometric relationship among the tracks is determined by the
technology's layout design rules. Figure 3 shows a grid system which is
used for silicon-gate MOS.
[Figure: grid of metal and polysilicon tracks]
Figure 3. Sample grid system for MOS technology.
Here the metal layer is more or less independent of the polysilicon and
diffusion layers, whereas polysilicon and diffusion interact strongly
with each other. Hence polysilicon and diffusion tracks can be "hidden"
underneath metal tracks. Using this grid as a base, synthesis
procedures have been developed. For example, using a metal and
polysilicon grid like the one in Figure 3, two-dimensional arrays can be
formed by manipulating the diffusion to form the necessary devices,
interconnected so as to build the required circuit.
Figure 4 shows the processes implemented by the Cell Layout modules.
Input to the Symbolic Placement module consists of a functional
description of a cell (or a set of cells) in the form of a set of
AND-OR-INVERT Boolean equations. In addition, basic topological
information about the cell is given, which comprises an assignment of
topological attributes to the input-output nodes of the cell. For
example, the cell shown in Figure 5(b) was specified with G1, P1 and T
(ordered from left to right) as top-inputs coming in polysilicon, G2 and
P2 (ordered from top to bottom) as right-inputs coming in metal, G, P
and T (ordered from left to right) as bottom-outputs going out in
polysilicon, and G and P (ordered from top to bottom) as left-outputs
going out in metal. The functional description specified for the cell
is: G = G1*P1 + G1*G2; P = P1 + P2 and T = T.
If the ordering of the I/O nodes along the cell boundaries is fixed,
as in our case, then the Symbolic Placement module starts by ordering
the product-terms within each AND-OR-INVERT function, and the
drive-transistors within each product-term. Otherwise, the module first
generates a symbolic placement of the functions themselves. The goal of
the ordering is to minimize the cell's height by reducing the number of
horizontal tracks needed to lay out the cell. In our example, we need
to place the product-terms of function G (G1*P1 and G1*G2), function P
(P1 and P2) and function T (T) such that G1, P1 and T, which come in
polysilicon, need not traverse any unnecessary vertical diffusion
tracks. This is done by identifying the polysilicon input variable
shared by both functions (here: P1) and ordering the product terms such
that metal crossovers for the polysilicon input variables (to get over
diffusion tracks) are minimized. The following table shows how this
process is done:
[Figure: flow diagram. Inputs: functional description, basic topological
description and transistor sizes. Symbolic Placement: product-term
symbolic placement and drive-transistor symbolic placement. Layout
Generator: layout of diffusion product-term tracks, input nets and drive
transistors, load structures, output nets and inverter structures,
producing the mask description.]
Figure 4. Block diagram of the Cell Layout modules.
      Before ordering               After ordering
               G1  P1  T                     G1  P1  T
G:  G1*P1       1   1  0      G:  G1*G2       1   0  0
    G1*G2       1   0  0          G1*P1       1   1  0
P:  P1          0   1  0      P:  P1          0   1  0
    P2          0   0  0          P2          0   0  0
T:  T           0   0  1      T:  T           0   0  1
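One way to obtain the ordering in the table is a simple greedy rule: within each function, move product-terms that share a polysilicon input with the next function to the end, and terms that share one with the previous function to the front. The Python sketch below implements this rule as an assumption made for illustration; the actual procedure used by the Symbolic Placement module is not specified here, and the names `order_terms` and `incidence` are hypothetical.

```python
POLY_INPUTS = ["G1", "P1", "T"]     # inputs arriving in polysilicon

# (function name, product-terms) in the order given before ordering.
functions = [
    ("G", [("G1", "P1"), ("G1", "G2")]),
    ("P", [("P1",), ("P2",)]),
    ("T", [("T",)]),
]


def poly_vars(terms):
    """Polysilicon inputs used by any of the given product-terms."""
    return {v for term in terms for v in term if v in POLY_INPUTS}


def order_terms(funcs):
    """Greedy reordering of product-terms inside each function."""
    ordered = []
    for k, (name, terms) in enumerate(funcs):
        own = poly_vars(terms)
        prev_shared = own & poly_vars(funcs[k - 1][1]) if k > 0 else set()
        next_shared = (own & poly_vars(funcs[k + 1][1])
                       if k + 1 < len(funcs) else set())

        def rank(term, prev_shared=prev_shared, next_shared=next_shared):
            if set(term) & next_shared:
                return 1            # keep next to the following function
            if set(term) & prev_shared:
                return -1           # keep next to the preceding function
            return 0

        ordered.append((name, sorted(terms, key=rank)))   # stable sort
    return ordered


def incidence(funcs):
    """Rows of the table: which polysilicon inputs each product-term uses."""
    return [(name, "*".join(term), [int(v in term) for v in POLY_INPUTS])
            for name, terms in funcs for term in terms]


after = order_terms(functions)
for row in incidence(after):
    print(row)
```

For the example cell this rule places G1*P1 directly above the P1 term of function P, which matches the "after ordering" column of the table.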
The output of the Symbolic Placement module is a table denoting the
relative placement of transistors on the reference grid system, and
net-lists for the inputs and outputs. For our example, the table will
be as follows:
where columns denote vertical diffusion tracks, and rows denote
horizontal polysilicon tracks.
The Layout Generator uses the symbolic placement data to generate the
masks, described in an intermediate form such as CIF [MeCo80]. It
generates the rectangles necessary to lay out the masks: diffusion
product-term "tracks", input nets and drive transistors, load
structures, output nets, and inverter structures. Figure 5 shows the
simulated layout of the four types of cells used in our example.
Rather than predefining device parameters and then laying them out
using a placement and routing scheme, the circuits are first laid out in
an array-like structure with minimum device sizes. The electrical and
geometrical parameters are passed on to the next module. Iteration of
the process will produce the desired circuit with the device sizes
necessary to meet the design goals.
[Figure: layouts of the four basic cells, panels (a) to (d)]
Figure 5. Layout of adder's basic cells:
(a) Type 2a cell; (b) Type 2b cell;
(c) Type 3 cell; (d) Type 4 cell.
4. Structure Generator attempts to obtain the best possible structure
for the given functional description and environmental parameters. It
specifies the cell types, the position of each cell in the final layout
and the interconnections between the cells.
Figure 6 shows the structure of a 16-bit binary adder. Each cell will
be referred to as C[i,j], where i and j are the row and column where the
cell is located, respectively, and the top rightmost cell is C[1,1].
Data flow only from top to bottom and from right to left. The four
types of cells generated by the Cell Generator are located as follows:
type 1 cells in the first row, type 2 in the second and third rows,
type 3 in the fourth row and type 4 in the fifth row. The second, third
and fourth rows perform the carry-look-ahead.
The input carry C(0) is fed into cells C[4,1] through C[4,4] which,
together with the cells in the second and third rows above them,
function as the carry-look-ahead for carries C(1) through C(4). Then
the output of cell C[4,4] (which is C(4)) is fed into cells C[4,5]
through C[4,10], which similarly produce the carries C(5) through C(10).
Lastly, the output of cell C[4,10] is fed into the cells to its left, so
that C(11) through C(16) are produced.
Let us assume that each type of cell produces its outputs with the
same time delay d after its inputs are stable. For this particular
adder example it was also given that the sum S(I) has to be available
7d after the inputs A(I) and B(I) are stable, and that the input carry
C(0) is available 3d after they are stable. Also, the fanout is limited:
each cell can drive at most 7 other cells. The structure shown in
Figure 6 meets these constraints with a very important feature: it has
the minimum number of rows, and therefore it occupies minimum chip area
(however, this structure is not unique).
Figure 6. 16-bit adder structure and types 1, 2, 3 and 4 cells, in
AND-OR form.
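The timing and fanout constraints can be checked with a coarse model of the fourth and fifth rows. The Python sketch below rests on assumptions that go beyond the text: the group signals produced by the second and third rows are taken to be ready no later than the incoming carry of each group, every sum bit is taken to be ready one cell delay after the carries of its group, and a group-boundary carry is assumed to drive every type 3 cell of the next group plus one type 4 cell. The function names are hypothetical.

```python
D = 1                       # common cell delay d, used as the time unit
C0_ARRIVAL = 3 * D          # C(0) is available 3d after A and B are stable
SUM_DEADLINE = 7 * D        # S(I) has to be available by 7d
MAX_FANOUT = 7              # each cell drives at most 7 other cells
GROUPS = [4, 6, 6]          # fourth-row columns 1-4, 5-10 and 11-16


def sum_arrival_times(groups, c0_arrival, d):
    """Upper bound on the arrival time of each sum bit, group by group."""
    times = []
    carry_in = c0_arrival
    for size in groups:
        carry_out = carry_in + d                # one type 3 cell delay
        times.extend([carry_out + d] * size)    # one type 4 cell delay
        carry_in = carry_out
    return times


def worst_fanout(groups):
    """Assumed load on a group-boundary carry: next group's cells plus one."""
    return max(size + 1 for size in groups[1:])


times = sum_arrival_times(GROUPS, C0_ARRIVAL, D)
print(max(times) <= SUM_DEADLINE, worst_fanout(GROUPS) <= MAX_FANOUT)
```

Under these assumptions the grouping 4, 6, 6 uses the full 7d budget and the full fanout of 7, which suggests why larger groups or an additional group would violate one of the two constraints.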
Several paths through the structure have the maximum specified delay.
They will be called critical paths (e.g. C[1,6] → C[3,6] → C[3,8] →
C[3,10] → C[4,10] → C[4,12] → C[5,14]). The functions that define each
type of cell are evaluated by the Cell Generator in a sum-of-products
form. Since in MOS technology (where this example is implemented)
AND-OR-INVERT logic is implemented more naturally than AND-OR, each
cell produces complemented outputs rather than true ones. Inverting the
outputs again is ruled out, since it almost doubles the delay time of
each cell. For type 1 and 4 cells the double inversion problem is
solved by modifying the functions to fit the complemented outputs.
However, for type 2 and 3 cells this solution does not work, since
these cells drive cells of the same type. Instead, two different
subtypes of the type 2 cell are defined: type 2a, which produces
complemented outputs from its true inputs, and type 2b, which produces
true outputs from its complemented inputs. Cells along the critical
paths are then chosen to be of types 2a and 2b alternately. For type 3
cells, inverting the left output of C[4,4] and C[4,10] (which drive
other type 3 cells) is unavoidable. Inverters must also be added to a
few type 1 and 2 cells in order to adjust their outputs to the driven
cells. For these cells, only the outputs that drive the cells in the
same column are inverted again, while the outputs that drive cells to
the left remain unchanged. Since critical paths have already been taken
care of, these inverters do not degrade the adder speed. In Figure 6,
cells that contain additional inverters have a bar added above their
type number.
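The alternation of type 2a and 2b cells works because both subtypes compute the same look-ahead function up to signal polarity. The Python sketch below checks this exhaustively. It assumes that the type 2b cell applies the AND-OR-INVERT form of G1*P1 + G1*G2 and P1 + P2 to complemented inputs, which follows from De Morgan's law applied to G = G1 + G2*P1 and P = P1*P2; the function names are hypothetical.

```python
from itertools import product


def inv(x):
    return 1 - x


def and_or(g1, p1, g2, p2):
    """Reference AND-OR form of the type 2 cell."""
    return g1 | (g2 & p1), p1 & p2


def type_2a(g1, p1, g2, p2):
    """True inputs, complemented outputs (AND-OR-INVERT)."""
    return inv(g1 | (g2 & p1)), inv(p1 & p2)


def type_2b(ng1, np1, ng2, np2):
    """Complemented inputs, true outputs (AND-OR-INVERT)."""
    return inv((ng1 & np1) | (ng1 & ng2)), inv(np1 | np2)


consistent = True
for g1, p1, g2, p2 in product((0, 1), repeat=4):
    g, p = and_or(g1, p1, g2, p2)
    # Type 2a yields the complement of the reference outputs.
    consistent &= type_2a(g1, p1, g2, p2) == (inv(g), inv(p))
    # Type 2b yields the reference outputs from complemented inputs.
    consistent &= type_2b(inv(g1), inv(p1), inv(g2), inv(p2)) == (g, p)
print(consistent)
```

Since each subtype needs only one inversion, a chain of alternating 2a and 2b cells avoids the extra inverter delay on the critical paths.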
Conclusions 
We have described the basic ideas behind a gate-to-silicon compiler 
by walking through a simple and well-known example. The compiler con-
sists of four modules, each of which performs one step of the transla-
tion toward silicon level. The first translation is a crude approxima-
tion of the final layout, and therefore one or more iterations are 
needed to achieve a "near optimal" solution. 
The novel approach in our compiler is based on (a) the set of
synthesis procedures for decomposition of gate macros into small atomic
cells and for optimization of the obtained cellular structures with
respect to environmental and technological parameters, and (b) the set
of algorithms for automatic layout of the different cell models
obtained through decomposition of gate macros.
References
[BiGa80] Bilgory, A. and Gajski, D. D., "Automatic Cell Generation for
Recurrence Structures," University of Illinois at Urbana-Champaign,
Department of Computer Science, Report UIUCDCS-R-80-1040, November
1980.
[Joha79] Johannsen, D., "Bristle Blocks: A Silicon Compiler," Proc.
16th Design Automation Conf., pp. 310-313, 1979.
[MeCo80] Mead, C. A. and Conway, L. A., Introduction to VLSI Systems,
Addison-Wesley, 1980.
[SOHT80] Shirakawa, I., Okuda, N., Harada, T., Tani, S. and Ozaki, H.,
"A Layout System for the Random Logic Portion of MOS LSI," Proc. 17th
Design Automation Conf., pp. 92-99, 1980.
[Verg80] Vergnieres, B., "Macro Generation Algorithms for LSI Custom
Chip Design," IBM J. Res. Develop., Vol. 24, pp. 612-621, 1980.
COMPUTER-AIDED DESIGN SESSION 
