Method for Sizing MOS Transistors for VLSI by Amatangelo, Matthew J.
University of Central Florida 
STARS 
Retrospective Theses and Dissertations 
1988 
Method for Sizing MOS Transistors for VLSI 
Matthew J. Amatangelo 
University of Central Florida 
 Part of the Systems and Communications Commons 
This Masters Thesis (Open Access) is brought to you for free and open access by STARS. It has been accepted for 
inclusion in Retrospective Theses and Dissertations by an authorized administrator of STARS. For more information, 
please contact STARS@ucf.edu. 
STARS Citation 
Amatangelo, Matthew J., "Method for Sizing MOS Transistors for VLSI" (1988). Retrospective Theses and 
Dissertations. 4254. 
https://stars.library.ucf.edu/rtd/4254 
METHOD FOR SIZING 
MOS TRANSISTORS FOR VLSI 
BY 
MATTHEW J. AMATANGELO
B.S.E.E., University of Pittsburgh, 1978 
THESIS 
Submitted in partial fulfillment of the requirements 
for the degree of Master of Science in Engineering 
in the Graduate Studies Program 
of the College of Engineering 





Determining the device width to length ratios has typically been an 
iterative process for the custom IC digital design engineer. After the 
logic design phase is complete for a particular circuit, the designer 
would make an educated guess at the device sizes. Then by trial and 
error, using SPICE or another circuit simulator, suitable sizes would be 
determined. Unfortunately, this approach is time consuming and the 
resulting sizes are often a good bit larger than they need to be to 
maintain a certain speed because of the lack of a rigorous sizing 
methodology. 
This paper describes a method for reducing the time in obtaining a 
CMOS circuit design by providing the designer with transistor sizes which 
yield consistent gate to gate propagation delays within a delay path. 
The technical justifications are developed and several test cases are 
synthesized to illustrate this method. Switching time accuracy is 
verified using SPICE and the automatically generated sizes. A program 
written in the Ada language to perform device sizing is discussed as 
well. 
The ramifications of area reduction are discussed as it pertains to 
custom and semicustom design methodologies. Algorithms to perform area 
minimization are presented along with other enhancements to the program. 
ACKNOWLEDGMENTS
I am grateful to my thesis advisor, Dr. Brian Petrasko, for his 
patience and support throughout the course of this work. Also, I am 
pleased to acknowledge the contributions of Mark Bluhm who discussed this 
thesis topic with me often and provided significant advice on developing 
application programs. 
I thank my parents for their love and guidance. Their words and
examples will always be part of me.
Finally, I thank my wife for her love and encouragement.









TABLE OF CONTENTS 
INTRODUCTION 
Introduction to the Sizing Design Rules 
Consequences of Area Optimization
THE ALGORITHM
Fanout and the Tpp Equation
Logic Gate Threshold
Key Algorithm Points
Precision Sizing with SPICE
Categorizing Load Capacitances
Gate Types and the Algorithm
THE ADA PROGRAM IMPLEMENTATION
SYNTHESIS USING THE SIZING PROGRAM














CHAPTER 1 
INTRODUCTION 
Circuit design is the phase of a custom design cycle to determine 
what the attributes of the transistors need to be for the circuit to 
perform properly. Once a function's logic is mostly defined, the design 
effort continues into this phase to assure that the circuit's performance 
is as it should be. The bulk of the time of circuit design is spent 
simulating the particular circuit. The most important performance 
parameter is delay time. In order to design a given logic path for 
speed, the designer makes an educated guess at the transistor sizes then 
exercises the circuit using a circuit simulator (e.g., SPICE). If 
expectations are not met, this process is repeated until acceptable 
results are achieved. Circuit design is usually a highly iterative 
procedure which emphasizes timing, often at the expense of area and 
power. 
In the semicustom world, design turnaround time is paramount. In 
order to obtain a short design cycle, area, while still important, is 
sacrificed to pass data through a critical path within a given allotted 
time. Thus, certain considerations in performing transistor sizing are 
more appropriate for semicustom design while others are more appropriate 
for custom circuit design. The central considerations in this report are 
timing and design time. 
The transistor size calculations can be tailored to certain 
influences depending on the overall design methodology. For example, the 
reason for doing a hand packed custom design is to emphasize a minimum 
layout area. The custom designer has the flexibility in terms of 
schematic and layout alterations to achieve a denser circuit realization 
at the expense of his design time. On the other hand, in a semicustom 
environment, density is sacrificed for quick turnaround time. The 
circuit is placed and routed automatically resulting in a significant 
inability to control layout parasitics. The difference boils down to 
this: In a custom design environment there exists the ability to vary a 
driver's transitioning time to reduce the layout area. Because of a 
significantly lesser ability of doing this in a semicustom environment, 
adhering to a consistent transition time in this case is intuitively 
sensible. When one considers that increasing the input transition time
or slew rate of a driver much beyond that of what its output is capable
of slewing causes the transistor's effective drain to source resistance
to increase [1, pp. 529-35], then it becomes obvious that the
uncontrollable variable, layout parasitics, has a serious effect on the
delay path's timing. This phenomenon greatly complicates the problem of
transistor sizing and would render the model introduced in the next 
chapter as an optimistic prediction of the actual response. In this 
light, it even makes sense to adhere to a constant transition time in the 
custom world, at least for a first pass attempt at finding the transistor 
sizes, since minimizing transistor widths involves the time consuming 
effort of performing trade-offs between nodal capacitances versus driver 
slew rates. 
The discussion in the next chapter includes the term "fanout." In 
this report, fanout for CMOS integrated circuits is defined as the ratio 
of the output capacitance which a logic gate drives to its input 
capacitance. The use of the term fanout is illustrated by the following 
description of the slow input transition time problem in critical speed 
path designs. If gate 1 drives gate 2, it is possible that gate 1's
fanout is so large that its slow output transitions cause gate 2's 
switching speed to depend more significantly on its input slew rate 
rather than singly on its fanout. 
Therefore, the objective is to provide a method and a software 
package which sizes transistors such that the resultant total path delay 
conforms to the design timing criterion. Predictability is of utmost 
importance. Thus, the calculated sizes must provide the anticipated 
speed. And non-linear complications, as in the case of input slew rates 
that are too slow, must be avoided. The sizing method discussed herein
is meant to be used in both custom and semicustom designs. For custom 
work, this sizing tool may be used only to find reasonable transistor 
sizes as a starting point for additional analysis. Or, when design time 
becomes more critical, the results from this algorithm can be utilized as 
the final sizes for either custom or semicustom methodologies. In 
Chapter 5, enhancements to this method are considered for area reduction. 
Introduction to the Sizing Design Rules 
The following influences are identified in device sizing: 1) layout 
area, 2) input capacitance of the delay path, and 3) transition time or 
slew rate between logic gates. As implied above, the custom designer has
more control over these items. A custom circuit can be tailored to mesh 
with other custom circuits around it and the custom designer has the 
flexibility, at least more so than the semicustom designer, to decide the 
weighting of these three items in the design of a given circuit. This is
as appropriate a time as any to introduce the design rules used in
developing the sizing algorithm discussed in this manuscript. These
ground rules for solving transistor sizes which address these influences 
are: 1) maintain a constant fanout, and hence a constant delay time, 
between logic gates for a particular delay path and 2) keep the logic 
gate threshold voltage equal to half of the power supply. Both of these 
design rules significantly simplify the problem by reducing the 
computational intensity. 
A constant delay path fanout helps in several ways. First, the
sizing problem can now be a linear one. A varying fanout, on the other
hand, allows one gate's transition time to be different from another's. 
If any transition becomes too slow such that it affects the 
responsiveness of a subsequent gate [1], then predictability is 
compromised. By keeping the gate-to-gate fanouts equal, it becomes 
easier to control all transitioning rates and avoid this undesirable 
effect. In addition to avoiding this undesirable aspect of slow input
slew rates, another effect associated with slow transition rates is 
avoided: excessive DC current resulting from both P and N channels being 
ON simultaneously for an extended time. Thus power is also predictable.
"Constant fanout" is synonymous with "geometric scaling" [2, pp.
12-14]. This "staged buffer" approach to sizing for speed maintains a
linear power distribution and optimizes the overall path's speed-power
product.
Another advantage of the constant fanout design rule is that the 
resulting input capacitance of the delay path is minimized for the drive 
required of that path. That is, the input capacitance value of the first 
gate of the path is a minimum for a particular path delay if the delay 
per gate is kept equal. This will be shown later. The obvious advantage 
is that other paths driving this path will see a minimum path load 
affecting their output. 
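The input capacitance claim can be checked numerically. The following is only a sketch and not the thesis's proof: it assumes, as the next chapter develops, that first-order path delay is proportional to the sum of the gate fanouts, and it compares two fanout assignments with the same sum. The function name `input_capacitance` and the sample values are illustrative assumptions.

```python
from math import prod

def input_capacitance(c_out, fanouts):
    """Path input capacitance: C(1) = C(n+1) / (f(1)*f(2)*...*f(n))."""
    return c_out / prod(fanouts)

# Two three-gate paths with the same fanout sum (same first-order delay).
c_out = 27.0  # output load in arbitrary capacitance units (illustrative)
equal = input_capacitance(c_out, [3, 3, 3])
skewed = input_capacitance(c_out, [2, 3, 4])
print(equal, skewed)  # the equal-fanout path presents the smaller input load
```

For a fixed fanout sum, the product of the fanouts is largest when they are all equal, which is why the equal-fanout path loads its own driver the least.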
Another advantage of constant fanout is the potential for sizing 
multiple paths having common drivers within them (this will later be 
referred to as the "diverging paths" problem). This more complex problem 
can now be addressed because of the reduction in complexity and the focus 
on consistent gate delays: 1) The calculations are kept simple by 
utilizing geometric scaling. 2) The multiple diverging path problem is 
reduced to the single path problem plus a simple comparison of unit 
delays per diverging paths for selecting a value for the common gate(s). 
This will be discussed later in Chapter 5. 
As mentioned before, the second design rule of keeping the logic 
gate threshold constant for all gates also significantly reduces the 
computational intensity of the sizing problem. This is achieved by 
setting a process dependent variable, called WPN, which results in the
pull-up and pull-down transition rates being equal.
Consequences of Area Optimization 
Thus far, the presentation has focused on the use of constant fanout 
to affect the transition times between gates and the input capacitance of 
the delay path. However, this sizing method does not minimize area 
alone. This section investigates the consequences of optimizing with 
area as the priority. 
The designer is faced with these constraints: given a particular 
delay path and a certain speed, size the devices such that the area is a 
minimum, assuming a constant logic gate threshold (for simplicity, design 
rule no. 2, above, will be maintained). An equation, regarding the 
relationship between the summation of a path's fanouts and total path 
delay, will be introduced in the next chapter. But for now, let us refer 
to this relationship as the "Tpp equation" or total path propagation 
delay equation. 
Since the priority of the designer in this case is to achieve a 
minimum area design within the constraint of the allocated delay path 
time, then the following is of utmost importance. 
$$\mathrm{Area} = L \sum_{i=1}^{n} W(i)$$
or, with f(i) = C(i+1)/C(i),
$$\mathrm{Area} = \frac{C(n+1)}{\mathrm{Cox}} \sum_{j=1}^{n} \frac{1}{\prod_{i=j}^{n} f(i)}$$
The following terms need to be defined: L and W(i) are the length and width of
the transistors in the delay path, f(i) is the fanout of logic gate i,
Cox is the gate oxide capacitance per unit area and C(n+1)
is the output node capacitance of a delay path with n drivers, i.e., C(1)
would be the input capacitance of the first driver with f(1) defined as
the ratio of C(2)/C(1). Please note, L and Cox are process dependent
variables and will be considered as constants throughout this paper.
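As a small numerical illustration of the area expression, the sketch below evaluates the summation term for two fanout assignments with the same fanout sum. The function name `relative_area` and the sample fanouts are assumptions made for illustration. The result must still be multiplied by C(n+1)/Cox to obtain an actual gate area.

```python
from math import prod

def relative_area(fanouts):
    """Sum over j of 1 / (f(j)*...*f(n)), the fanout-dependent part of Area."""
    n = len(fanouts)
    return sum(1.0 / prod(fanouts[j:]) for j in range(n))

constant = relative_area([3, 3, 3])
skewed = relative_area([2, 3, 4])  # same fanout sum, largest fanout at the last gate
print(constant, skewed)
```

With these sample values the skewed assignment yields the smaller sum, since a large fanout at the last gate shrinks every term of the summation.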
Since f(n), the fanout of the last driver, is in the denominator of 
each product term in the above equations, it intuitively makes sense that 
Area can be minimized by first making f(n) as large as possible, then 
f(n-1), and so on. Of course, the "Tpp equation" still needs to be 
satisfied to assure that the allocated total path propagation time is 
met. Therefore, f(1), which is only present in the final product term,
would be as small as possible, then f(2), etc. However, how much is "as 
large or as small as possible?" This is going to depend upon the 
distribution of interconnect capacitances (including "non-delay path gate 
capacitance," i.e., input gate capacitances of devices not within the 
serial path of the delay path in question) throughout the delay path. 
This will be discussed in the enhancement section of this paper. But the 
ramifications of minimizing layout area are manifested in the following 
constraints: 
1) By letting f(i) vary within a delay path, slew rates vary between 
gates thus allowing for size estimation errors per the previously 
referenced phenomenon of slow input slew rates affecting output 
responsiveness. 
2) Signals having slow slew rates are dangerous if these signals control 
pass gates or transmission gates because an inherent race condition 
   is present with the data path. * (See the footnote below.)
3) Sizing several delay paths becomes very awkward when the gate to gate
   fanout of a path can vary. Branches within a path to another path
   result in infinite iterative loops unless rules are imposed to stop
   at some "reasonable" answer.
On the other hand, if f(i) is held constant throughout a delay path, 
the transition time between each gate will be equal. If that f(i) is of 
a reasonable tested value (such as will be discussed in the "Precision 
Sizing with SPICE" section of the next chapter regarding fanout curve 
generation), the only problem in predicting the total path propagation 
time is if the input transition time is too slow. (This becomes 
relatively small as n increases.) Hence, if the input capacitance of this 
particular delay path is known by any circuit which would supply the 
input signal, then the Tpp equation would always be reasonably accurate 
to predict that total path delay. 
As mentioned in the design rules as an attribute of keeping f(i), 
the gate to gate fanout, constant, one aid in reducing size prediction 
errors would be to keep the input capacitance of a delay path, C(1), as
small as possible, thus reducing extra load on other paths which would 
drive this particular path. An intuitive illustration of this is in 
Appendix A. 
By keeping f(i) constant throughout a delay path and by maintaining
a constant logic gate threshold voltage throughout the entire design, a
sizing algorithm is considerably simpler to implement and safer regarding
predictability. The implemented program is based upon simple scalable
sizing [3]. In the enhancement section, an area improvement method of
altering f(i) based on the distribution of delay path interconnect
capacitances (and other NON-delay path gate capacitances) is presented.
Also in that section, another area reduction algorithm is included for
multiple input gates. The latter enhancement would be employed
subsequent to the former, but does not require the execution of the
former.
* A paradigm of this problem is that of the master/slave flip flop.
  The clock control signal enables the master while disabling the
  slave, and vice versa. A common situation is one where CLOCK and its
  complement, CLOCKBAR, are used as those control signals. During a
  transition of CLOCK, the transistor thresholds for both master and
  slave pass gates are satisfied and current flows through each. This
  allows the input data node to have a direct and immediate effect on
  the output node (Q). And the slower the transition time, the longer
  this condition is present which raises the probability of "shorting"
  the input data to the output.
The most important issues in circuit design are that the design gets 
done and that it works. The major component of the former is the design 
time while speed is a key element of the latter. It seems most prudent 
to at least establish a reasonable working solution as quickly as 
possible and then optimize the final issue in the design time remaining. 
Thus, this paper presents a method (using constant f(i)) to find a 
working solution as simply as possible. Then, in the enhancements
chapter, optimizations are discussed.
CHAPTER 2 
THE ALGORITHM 
In this section, the sizing algorithm is developed. The total path 
propagation delay equation is derived. Then, the logic gate threshold, 
Vinv, is explored. An explanation of how Vinv is defined and why it is
fixed (i.e., the second design rule) is included. The key algorithm
steps are outlined and an example is presented. Afterwards, a detailed
explanation follows of how precision for automatic sizing is achieved by
using a circuit simulator (SPICE) to establish fanout curves and WPN, a
ratio of the pull-up transistor width to the pull-down width which
implements the second design rule. In this particular section, the use
of "active" and "passive" fanouts is described pertaining to the
development of a set of CMOS process dependent fanout curves. A brief
section is presented categorizing various load capacitances as they 
pertain to the sizing program. Finally, this chapter concludes with a 
section illustrating how different gate types are handled. 
Previously, the sizing design rule of a constant f(i) within a delay 
path was identified. The discussion continues such that a complete 
understanding of the relationship between fanout and speed is 
accomplished. A first order analysis of this relation follows. 
Before these considerations are discussed in detail, the pertinent 
terms of fanout and Total Path Propagation delay, or Tpp, are introduced. 
Fanout and the Tpp Equation 
Fanout or f is defined in this paper as
f(i) = C(i+1)/C(i)                                              <1>
where f(i) is the fanout of the ith gate, modeled as an inverter driver,
C(i+1) is the capacitive load at its output and C(i) is its input
capacitance. C(i) also indicates the current drive capability of the
device. The phrase "gain to load ratio" can also be used to describe
this fanout relationship. The capacitance C(i) is defined as
C(i) = W(i) * L * Cox                                           <2>
Cox is the gate oxide capacitance per unit area while W(i) and L are the
device's width and length. (In this model level, delta L effects such as
Miller capacitance are not considered.) L is normally the minimum drawn
gate feature size. The device length L will be considered as a process
parameter rather than a design variable. The propagation time or delay
time of any driver is expressed as
tp(i) = Tk * k(i) * f(i)                                        <3>
<3> 
The parameter k(i) is used to express the desired output voltage level of 
the driver (i.e., the logic gate threshold of the subsequent logic 
device) in terms of time units necessary to cause the subsequent device
to switch. Note that if this threshold were kept constant, k(i) would 
simply toggle between two values depending on whether i was even or odd. 
In the following discussion the threshold is exactly half of the power 
supply. Thus, this multiplier is a constant, k, and is independent of i, 
the position of the driver in the delay path. 
The other parameter from equation <3>, Tk, is a constant in units of 
seconds for a particular CMOS technology and is derived in the following 
manner: 
tp(i) = k * Ron(i) * C(i+1)                                     <4>
Ron(i) = L/[Cox*u*W(i)*(Vgs-Vt-Vds)]                            <5>
This is the inverse of the transconductance, gm. Let Ron(i) be the
approximate average transistor ON resistance with the gate-source
potential at the maximum current level and the drain-source voltage at
half of the power supply (Vgs = Vdd and Vds = Vdd/2). Substituting
equations <1>, <2> and <5> into equation <4> and utilizing the
approximation for average ON resistance, the following is obtained:
Tk = L**2/[u*((Vdd/2) - Vt)]                                    <6>
where u is the mobility, Vdd is the supply voltage and Vt is the 
transistor's threshold voltage. Assume that a driver does not switch 
until its input signal reaches some threshold (which defines k for 
equation <4>) and slews at an average rate indicative of Ron(i), then the 
total path propagation delay Tpp of a series of n logic gates (modeled as 
drivers) can be expressed as 
$$\mathrm{Tpp} = \mathrm{Tk} \sum_{i=1}^{n} f(i)$$
By following the first design rule where f(i)=f, the Tpp expression becomes
Tpp = Tk * n * f                                                <7>
This is the Tpp or "total path propagation delay equation." Note, the 
assumption regarding driver threshold is investigated in the next 
section. Subsequent discussions on algorithm implementations utilize 
equation <7> to predict transistor sizes given a user allocated Tpp. 
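A minimal sketch of equations <6> and <7> follows. The function names and the sample process values (L = 2 microns, u = 4E10 microns**2/v-s, Vdd = 5v, Vt = 1v) are assumptions made for illustration, and the multiplier k of equation <3> is left out, following equation <7> as written.

```python
def tk(length, mobility, vdd, vt):
    """Technology time constant, equation <6>: Tk = L**2 / [u*((Vdd/2) - Vt)]."""
    return length**2 / (mobility * (vdd / 2.0 - vt))

def tpp(tk_value, fanouts):
    """Total path propagation delay: Tk times the sum of the gate fanouts."""
    return tk_value * sum(fanouts)

# Assumed values: microns, microns**2/v-s and volts give Tk in seconds.
t = tk(2.0, 4e10, 5.0, 1.0)
path_delay = tpp(t, [4, 4, 4])   # constant fanout f = 4 over n = 3 gates
print(t, path_delay)             # path_delay equals Tk * n * f
```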
Logic Gate Threshold 
In the above discussion, the logic gate threshold was fixed at one 
half of the supply voltage, the second design rule. The following 
discussion considers the logic gate threshold definition and this design 
rule. To illustrate the switching considerations of a logic gate, the 
discussion considers a single gate. Rewriting equation <4>, 
tp = k*Ron*Cout                                                 <8>
where Ron is the average drain to source resistance of the transistor,
Cout is the output load capacitance of that gate and k is a multiplier
used to express time with respect to the desired output voltage level;
kn = -ln(Vout/Vdd) for devices pulling low and kp = -ln(1-(Vout/Vdd)) for
devices forcing the output high. Since N and P channel MOSFETs drive a
standard CMOS logic gate's output low and high, respectively, kn and kp
were chosen to represent the appropriate multipliers. Thus, for an
N-channel, this average ON resistance would be
Rn = L/[Cox*u*Wn*((Vdd/2)-Vtn)]                                 <9>
where Rn and Wn represent the N transistor Ron and width, respectively,
Vdd is the supply and Vtn is the N-channel threshold voltage. The
delta L effects are ignored in this model level.
Cin is the input capacitance of the gate and in this case is defined
as
Cin = Wn*L*Cox
Thus, using <9> and this relationship in equation <8>, tp becomes
tp = kn*Cout*(L**2)/[Cin*u*((Vdd/2)-Vtn)]                       <10>
Replacing Cout/Cin with f, the fanout, tp is defined as
tp = f*kn*[L**2/(u*((Vdd/2)-Vtn))]                              <11>
However, what is this "desired Vout" level which would then define kn and
kp? Well, it is called Vinv.
Vinv is the voltage at the inverter's input which yields the same 
value of potential at its output, hence, both of the inverter's 
transistors are in saturation. Based on Idsn = -Idsp, Vinv [4, p. 47]
is expressed as
Vinv = [Vdd+Vtp+Vtn*(Bn/Bp)**0.5]/[1+(Bn/Bp)**0.5]              <12>
where Bn and Bp are equal to Cox*u*W/L for the appropriate transistor
type (recalling that u is the mobility introduced in <5> and <6>), Vtp
and Vtn are the MOSFET threshold voltages and Vdd is the power supply 
voltage (referenced to ground, the negative supply). Conceptually, one 
inverter's output does not change until Vinv is achieved at its input. 
Given a reasonable input slew rate as discussed earlier, this conceptual 
interpretation serves as a good approximation. Thus, the propagation 
delay for one gate is essentially the time required for that gate to 
reach the Vinv of the next gate. 
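Equation <12> is easy to evaluate directly. The sketch below is illustrative only: the function name `vinv` and the sample voltages are assumptions, and Vtp is entered as a negative number for the P-channel device.

```python
from math import sqrt

def vinv(vdd, vtn, vtp, beta_n, beta_p):
    """Inverter threshold per equation <12>; vtp is negative for a P-channel device."""
    r = sqrt(beta_n / beta_p)
    return (vdd + vtp + vtn * r) / (1.0 + r)

# Matched betas with Vtn = -Vtp place the threshold at half the supply.
balanced = vinv(5.0, 1.0, -1.0, 1.0, 1.0)
# A stronger N device (Bn = 4*Bp) pulls the threshold below Vdd/2.
strong_n = vinv(5.0, 1.0, -1.0, 4.0, 1.0)
print(balanced, strong_n)
```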
Determining the most suitable Vinv is crucial for finding the P and
N transistor sizes (Wp and Wn). Appendix B is an approach of utilizing
an inverter pair delay to ascertain a speed-optimum relationship for the
Wp to Wn ratio. This proves to be inappropriate for the sizing problem
because the total capacitive distribution can vary, hence significantly
changing this Wp to Wn ratio.
However, a different approach to calculating a fixed P to N width 
ratio can be used for complementary logic speed considerations. In this 
case, we look at a single inverter and not a pair delay so that the P to 
N width ratio is independent of the node capacitance distribution. In a 
similar fashion as the area of a rectangle is optimized by keeping all 
perimeter edges equal in length, the quickest combination of high and low 
propagation delays (given a constant f(i) for all i) is achieved by 
keeping the rise and fall times equal. This assumes that the signal 
generated by the delay path is not required to produce a faster high 
transition than a low one and vice versa. In other terms, Wp/Wn is set
such that Vinv = Vdd/2 (which also eliminates the need for assuming Vtn =
-Vtp). This value of Vinv is basically at the high end of the range of
equation <B4> in Appendix B and, as an advantage, yields the best
possible noise margin. From here on, WPN shall be defined as that P to N
width ratio which yields this symmetrical condition (Vinv = Vdd/2).
Before proceeding, it should be mentioned that Vinv is another 
source of consideration for area improvement and meshes with the 
discussion in the previous section regarding inconsistent gate to gate 
slew rates within a delay path. The difference between varying Vinv and 
the aforementioned discussion is that not only will the slew rate vary 
from gate to gate, but also within any given gate; that is, the rise and 
fall times will differ. This additional degree of freedom significantly 
complicates developing a sizing method and hence will not again be 
considered in this paper. 
We have identified and defined two items needed for sizing: fanout
and a P to N width ratio. WPN is a product of a constant Vinv.
Key Algorithm Points
In the sizing algorithm, all the capacitance of the driven node will 
be lumped together into one capacitor. Then that capacitance is 
translated into an inverter, termed "load inverter," having the 
appropriate WPN corresponding to the Lp, Ln (i.e., the effective P and N
channel gate lengths, considering delta L) and Cox of the particular
process. The P and N channel lengths do not necessarily have to be the
same as long as they are kept consistent from gate to gate. With this
load inverter (LINV), the sizes of the driving inverter are simply
calculated by dividing the LINV's sizes by the fanout. If the driving
gate is not an inverter, then these sizes are the "effective" sizes of
that gate. The actual sizes are derived by multiplying the effective
ones by the appropriate number of N or P channel transistors that are in
series. Also, parallel transistors add drain load and hence also alter
the results of the size calculation:
W(i) = y * C(i+1)/[(L*Cox*f(i)) - Cdw]                          <13>
where W(i) represents either the P or N width such that, for the P width,
y = WPN/(WPN+1) and Cdw = Cdwp and, for the N width, y = 1/(WPN+1) and
Cdw = Cdwn. Cdw is the drain junction capacitance per unit of width.
This will be expanded later in this chapter under the section "Gate Types 
and the Algorithm." 
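Equation <13> can be written as a single function. This is a sketch under stated assumptions: the function name, the argument names and the sample values (capacitance in fF, dimensions in microns, Cox in fF per square micron) are hypothetical and are not taken from the Ada program.

```python
def device_width(c_load, fanout, wpn, length, cox, cdw=0.0, p_channel=False):
    """Width per equation <13>: W = y*C(i+1) / [(L*Cox*f) - Cdw]."""
    # y splits the load inverter between the P and N devices by the P to N ratio.
    y = wpn / (wpn + 1.0) if p_channel else 1.0 / (wpn + 1.0)
    return y * c_load / (length * cox * fanout - cdw)

# Assumed example: 600fF load, fanout 10, P to N ratio 2, L = 2 microns, Cox = 1fF/u**2.
wn = device_width(600.0, 10.0, 2.0, 2.0, 1.0)
wp = device_width(600.0, 10.0, 2.0, 2.0, 1.0, p_channel=True)
print(wn, wp)
```

With the drain capacitance term left at zero, the two widths differ by exactly the P to N ratio, which matches the simpler "divide the load inverter by the fanout" description.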
In summarizing this discussion about the algorithm, by establishing 
a constant fanout and using WPN as the width ratio, we can then proceed
with the algorithm. The key points to the algorithm are listed below.
o translate Cload into an inverter (LINV)
o divide the sizes of LINV by f, resulting in the effective sizes of
  the driver, and then
o find the actual sizes of the driver by considering series and
  parallel logic gates
This method is worked from the output of a delay path towards the source 
by repeating these steps for each logic gate. 
The following is an example to better illustrate the algorithm: 
[Figure 1. Example Delay Path (input IN; gates U1, U2, U3, ...; 5 gate delays).]
Translated, equation <14> states that the capacitance of node i is 
composed of interconnect and input gate capacitance. Please note, 
however, only the gate capacitance of elements related directly to the 
particular delay path (i.e., drivers of that delay path) are considered 
in the first component of equation <14>; all other capacitances, gate or 
otherwise, are lumped into Cint(i). This is qualified in the section 
entitled "Categorizing Load Capacitances" later in this chapter. 
Let the allocated time through this path be 10ns. In Figure 1, let
Cint(3) = 2pF and Cint(5) = 10pF. Also, assume the following at the
critical temperature: Vdd = 5v, Vtn = -Vtp = 1v, un = 2*up =
400cm**2/v-s (or 4E10 microns**2/v-s), Cox = 1.0fF/u**2, Lp = Ln =
2 microns, and that the propagation delay time will be measured from the
50% level of the delay path's input to the 50% output level (kn = kp =
0.69). Initially, WPN should be found to establish the P to N size

Assume the effective lengths, Leffp and Leffn, are equal to Lp and Ln.
Therefore, WPN = 2.0 for this example.
The next task is to calculate the required fanout, f, such that the 
path's speed is satisfactory. This is derived by first manipulating the 
input gate capacitance. For an inverter, the simplest driver gate, input 
gate capacitance is 
Cin = Cox * (Wp*Lp + Wn*Ln)                                     <16>
Since Lp = Ln,
Cin = Cox*L*Wn*(WPN + 1)                                        <17>
Because of employing WPN, the average ON resistance of the P and N
channel are equal (Rp = Rn). Hence we can calculate the fanout based on
either type of transistor. Combining equations <11> and <17>, the fanout
can be expressed as
f = tp/[kn*Rn*Cin]                                              <18>
or
f = [tp*un*((Vdd/2) - Vtn)]/[kn*(L**2)*(WPN + 1)]               <19>
where tp is the allocated time through the speed path divided by the 
number of gate delays. This expression is an approximation, of course, 
and a circuit simulator like SPICE should be used to develop a precise 
fanout versus propagation delay per gate relationship. This is discussed 
later in the next section with reference to the Ada program. We obtain f
= 14.5 using this expression with the above givens.
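The fanout value can be reproduced from equation <19> with the givens of the example. The function name below is an assumption, and the units follow the text: microns, microns**2/v-s, volts and seconds.

```python
def fanout_estimate(tp, mobility_n, vdd, vtn, kn, length, wpn):
    """First-order fanout per equation <19>."""
    return (tp * mobility_n * (vdd / 2.0 - vtn)) / (kn * length**2 * (wpn + 1.0))

tp = 10e-9 / 5   # 10ns allocated across 5 gate delays
f = fanout_estimate(tp, 4e10, 5.0, 1.0, 0.69, 2.0, 2.0)
print(round(f, 1))   # 14.5
```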
Exercising the algorithm, the following is obtained. (Refer to Figure 1.)
U5: C(5) = Cint(5)
    The N channel size of the equivalent "load inverter" seen by U5 is
    Wn_LINV(5) = C(5)/(Cox*L*(WPN+1))
    Only 1 input to U5; hence actual and effective sizes are the same -
    Wn(5) = Wn_LINV(5)/f
    Wp(5) = WPN * Wn(5)
U4: C(4) = Cint(4) + Cox*L*[ Wp(5) + Wn(5) ] where Cint(4) = 0
    Wn_LINV(4) = C(4)/(Cox*L*(WPN+1))
    Wn(4) = Wn_LINV(4)/f; (i.e., U4 has only one input)
    Wp(4) = Wn(4)*WPN
U3: C(3) = Cint(3) + Cox*L*[ Wp(4) + Wn(4) ]
    Wn_LINV(3) = C(3)/(Cox*L*(WPN+1))
    Wn_eff(3) = Wn_LINV(3)/f
    Wn(3) = 2*Wn_eff(3)
    Wp(3) = WPN*Wn_eff(3)    note: parallel transistor effects not
                             considered here
etc.
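The three stages worked above can be evaluated numerically. The sketch below uses the givens of the example (Cox = 1.0fF/u**2, L = 2 microns, a P to N width ratio of 2 and the fanout from equation <19>). The helper names are hypothetical, widths are in microns and capacitances in fF, and U3 is treated as having two N devices in series, as in the listing.

```python
COX_L = 1.0 * 2.0    # Cox*L in fF per micron of width
WPN = 2.0            # P to N width ratio of the example
F = 120.0 / 8.28     # fanout from equation <19> with the givens, about 14.5

def size_stage(c_load_ff, n_series=1):
    """Return (Wn, Wp) in microns for a gate driving c_load_ff femtofarads."""
    wn_linv = c_load_ff / (COX_L * (WPN + 1.0))  # N width of the load inverter
    wn_eff = wn_linv / F                         # effective N width of the driver
    return n_series * wn_eff, WPN * wn_eff       # series N devices are widened

def gate_cap(wn, wp):
    """Input gate capacitance in fF of a stage with the given widths."""
    return COX_L * (wn + wp)

wn5, wp5 = size_stage(10000.0)                                  # C(5) = Cint(5) = 10pF
wn4, wp4 = size_stage(0.0 + gate_cap(wn5, wp5))                 # Cint(4) = 0
wn3, wp3 = size_stage(2000.0 + gate_cap(wn4, wp4), n_series=2)  # Cint(3) = 2pF
for name, wn, wp in (("U5", wn5, wp5), ("U4", wn4, wp4), ("U3", wn3, wp3)):
    print(f"{name}: Wn = {wn:.2f}u, Wp = {wp:.2f}u")
```

Parallel-transistor drain loading is ignored here, just as it is in the hand calculation.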

















Note that the purpose of this example is purely to familiarize the reader 
with the algorithm. The next section will include other considerations 
beyond what has been illustrated here. 
Precision Sizing with SPICE 
In this section, preferred alternatives to making calculations to
determine f and WPN are presented. Indeed, performing these efforts
serves as preparation for executing the sizing program discussed in the
next chapter. These preparations will result in the generation of a
technology table in the form of a data program which will be accessed by
the main program. A discussion of WPN follows since WPN is vital to
deriving fanout curves.
Recall what WPN is: the P to N width ratio which yields a Vinv of
one half of the power supply (Vdd/2). And since Vdd+Vtp-Vtn is much
greater than zero for all CMOS processes known to this author, this Vinv
reflects equal rise and fall transitions.
The calculation of WPN (equation <15> in the previous section)
serves as a reasonable approximation. However, approximations here lead
to bigger errors in deriving fanout values and eventually in finding the
transistor sizes. Therefore, the preferred approach is an iterative one
which includes SPICE simulations. To obtain WPN, SPICE simulations of an
inverter with several nominal loads and with varying transistor size(s)
are conducted to determine the ratio that results in the most symmetrical
waveforms (i.e., equal rise and fall times). This is a trial and error
task but should not consume much time since it is easy to see whether
the N or the P size needs to be changed (e.g., make the N bigger if the
fall time is longer than the rise time of a particular simulation run). A
high resolution time scale (about 0.1 ns per step) is used to check these
transitions if no routine is available to automatically measure them.
Once the symmetrical condition is found, WPN is the P width
divided by the N width. If WPN appears to be load dependent, then the
user must choose a WPN value which, in his judgment, best represents the
process. Also, the user has to determine the conditions
(temperature and supply voltage) and process parameter set (typical,
worst case, etc.) around which he wants to baseline the sizes. The author
suggests using the slowest of these combinations since speed is likely to
be the critical issue.
Another variable to be defined before starting the fanout curve
simulations is the width; that is, what Wp and Wn should be used as the
driver for determining WPN and deriving these fanout curves? This is
certainly a source of error since smaller widths are affected more by
deltaW and width effects than are larger ones. The author suggests using
a Wn of about three to four times that of the minimum width. Since the
majority of the transistors in delay paths will be considerably bigger
than minimum, a Wn of 10u was selected for running the example in the
Synthesis Chapter. The corresponding error will be within the tolerance
of the overall algorithm.
With an assigned value for WPN, the fanout versus delay-per-gate
relationships can be established. As in the case of WPN, these
relationships need to be as precise as possible; equation <19> is only an
approximation. Accuracy is achieved by using actual circuit simulation
results (i.e., SPICE of one simple inverter driver). Since the actual
capacitance of a MOSFET gate changes with varying gate potential, the
suggested method is to generate graphs for both an active load (that of a
driven MOSFET gate) and a passive load. Depletion and Miller effects are
not significant with interconnect capacitance (i.e., the passive load).
One inverter driving another that is f times larger constitutes the active
load situation and one inverter driving a capacitor of value
f*Cox*(Wp+Wn)*L constitutes the passive simulation. (L is simply the
drawn gate length. If P and N drawn lengths are different, simply
multiply each individually with its corresponding width.) The x-axis
measures the delay per gate while the fanout corresponds to the
y-axis. The delay-per-gate value is measured from the input signal's 50%
level to the 50% level of the output signal. The input transition time
should be as near to zero as possible and slew through the entire supply
voltage range.
With regard to the aforementioned technology table, the desired
result of generating both passive and active fanout graphs is to produce
four parameters, namely m_pass, m_act, b_pass and b_act, to incorporate
into this data program. These parameters are the coefficients for two
(passive and active) simple line equations (y = mx + b) where m and b are
calculated from the fanout data points and, of course, are the slope and
y-axis intercept, respectively. A more precise fanout versus time
relationship can be extracted from the SPICE results if desired.
However, the simple line equation suffices for fanout values above 2
or 3. Below that value, the self-parasitics of the transistor become
significant with respect to the load. Besides, reference (2) illustrates
that utilizing a fanout of less than e (the base of the natural
logarithm) reduces the area versus speed efficiency.
To develop the fanout graphs, different values of fanout are chosen
to yield points along the curve. Both active and passive fanout curves
were derived with the following values of fanout to check the validity of
the resulting relationships: 3, 5, 7, 10, 15 and 20. (Three points
would have been sufficient, e.g. f = 3, 10 and 20.) It is imperative that
the drain diffusion capacitance for both P and N transistors of the
driver inverter is incorporated into the simulation model.
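Extracting the slope and intercept of a fanout line from simulated points can be sketched as an ordinary least-squares fit. The sketch below is an assumption about how the coefficients might be computed, since the text only says they are calculated from the fanout data points. The function name fit_line is hypothetical, the delay values are invented (they are not SPICE results), and delays are expressed in nanoseconds purely for readability.

```python
# Least-squares fit of fanout = m * delay + b from (delay, fanout) points.
# The data points below are invented for illustration; they are not SPICE results.

def fit_line(points):
    """Return (m, b) minimizing the squared error of fanout = m * delay + b."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b


# (delay per gate in ns, fanout) for the fanout values 3, 5, 7, 10, 15 and 20.
passive_points = [
    (0.875, 3), (1.375, 5), (1.875, 7),
    (2.625, 10), (3.875, 15), (5.125, 20),
]
m_pass, b_pass = fit_line(passive_points)
print(f"m_pass = {m_pass:.3f} per ns, b_pass = {b_pass:.3f}")
```

The same routine would be run a second time on the active-load points to obtain m_act and b_act.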
Now that passive and active fanout curves can be generated to
determine a fanout value given a particular time value (delay per gate),
considerations are in order to cover all speed issues for all conditions
of concern. In other words, m_pass, m_act, b_pass and b_act must be
generated for any combination of process, supply voltage and ambient
temperature that may apply for a given project. As will be seen in the
next chapter, the technology table, as it exists now, for example, was
written to allow for any combination of three different process ratings
(BEST, TYPICAL and WORST), three supply values (4.5, 5.0 and 5.5 volts)
and three temperatures (-55, 25 and 125 degrees C); a total of 27
different combinations of conditions and ratings, each characterized
by both an active and a passive fanout curve.
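One way to picture such a table is as a mapping from (process, supply, temperature) to the four line coefficients. The following Python sketch is only an illustration of that idea and does not reproduce the structure of the actual data program; build_table, fanout_for and placeholder_fit are hypothetical names, and the coefficient values are invented.

```python
from itertools import product

PROCESSES = ("BEST", "TYPICAL", "WORST")
SUPPLIES = (4.5, 5.0, 5.5)        # volts
TEMPERATURES = (-55, 25, 125)     # degrees C


def build_table(fit_for):
    """Return {(process, vdd, temp): coefficients} for all 27 combinations."""
    return {key: fit_for(*key)
            for key in product(PROCESSES, SUPPLIES, TEMPERATURES)}


def fanout_for(table, process, vdd, temp, dpg, load="pass"):
    """Evaluate fanout = m * dpg + b for the chosen corner and load type."""
    coeff = table[(process, vdd, temp)]
    return coeff["m_" + load] * dpg + coeff["b_" + load]


def placeholder_fit(process, vdd, temp):
    # Hypothetical coefficients (delay in seconds). A real table would hold
    # one SPICE-derived set per combination instead of a constant.
    return {"m_pass": 4.0e9, "b_pass": -0.5, "m_act": 3.6e9, "b_act": -0.4}


table = build_table(placeholder_fit)
print(len(table))
worst_case_fanout = fanout_for(table, "WORST", 4.5, 125, dpg=3.0e-9)
print(worst_case_fanout)
```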
Categorizing Load Capacitances 
It is probably about time that a section is devoted to explaining
how the various "real" capacitances of an integrated circuit fit into
this sizing program. First of all, the only capacitance the user does
not have to tell the program about is that associated with the gate
capacitance of one of the drivers within the serial delay path in
question. In contrast, capacitances associated with interconnect,
driver devices of other paths and drain junction capacitances of other
paths must all be specified to the sizing program.
The next chapter will illustrate how an input file of the existing
sizing program must look. Only one value of capacitance is requested in
this file, which must represent the total of those capacitances listed
above that the user must input. However, the author believes that the
program should allow a separate field for each of these three types of
capacitance. Consequently, a revision to the program is in order.
(Considerations for multiple path sizing would alter this aspect of the
program further. See chapter 5.) Load capacitance and the user interface
are discussed further in Appendix C. For now, the capacitance included in
the input file should be weighted as far as how much is associated with
devices of branch paths, both gate oxide and diffusion capacitance, and
how much is associated with interconnect. (On a side note: To
accommodate the later revision of allowing three fields of capacitance, a
relation could be established for a diffusion capacitance load component.
Like the passive and active fanout curves, a fanout graph with
diffusion capacitance would improve accuracy. A practical example of
why considering diffusion capacitance is important is
a bus node where outputs of tristate drivers reside.)
Gate Types and the Algorithm 
The basics of calculating transistor sizes have been illustrated.
However, except for the simple example of a fully complemented NAND gate
in the section entitled "Key Algorithm Points," sizing fundamentals have
only been explained employing the inverter. In this section, the
handling of multiple input logic gates (e.g., NANDs, etc.) is discussed
including parallel transistor considerations. Also, transistor sizing
algorithms for other than fully complemented (one PMOS device for each
NMOS device) logic implementations such as domino or NORA logic (5, pp.
261-66) are presented. These topics will be introduced from Figure 2,
the program flow for calculating the transistor sizes.
The first calculation of Figure 2 determines the total capacitive 
load seen by the particular gate. This includes the value entered into 




(The word "adjusted" relates to taking the active value 
pertaining to this specific "subsequent" gate and 
into a passive value. This translation is actually done 
after a gate's sizes are calculated. Then, of course, this value is 
utilized in the next pass through the iteration for the "previous" gate. 
Please note, the "previous" gate is the driver gate of the "subsequent"
or "driven" gate.)
[Figure 2 flowchart. Box labels, in the order they appear: "C Load +
C(Next-Gate)"; "Calculate Full Complementary Effective Sizes"; "Increase
Series Transistor Count by 1"; "Request Precharge Time"; "Calculate
Precharge Transistor Size"; "Calculate Effective Size of Evaluation
Transistors"; "Calculate Actual Sizes Considering the Number of Parallel
and Series Transistors"; "Consider DELTA W and Minimum W"; "Calculate
C(Next-Gate) for Previous Gate"; "continue". Decision branches are
marked Y and N.]
Figure 2. Program Flow for Calculating Transistor Sizes.
After the driver's output node capacitance is determined, Path_IO
checks the gate type against an allowable gate table (discussed later).
If it is a transmission gate, a variable will be incremented such that 
the subsequent iteration will size the corresponding driver to consider 
another series gate. After incrementing this variable, the program 
continues by beginning the next iteration. (Note, if a transmission gate 
is at the beginning of the delay path, it will be ignored and left 
unsized.) Details regarding improvements for sizing transmission gates 
are included in the "enhancements" chapter of this paper. 
If, however, the gate type is not a transmission gate, then the
effective P and N sizes (see "Key Algorithm Points" section) are
calculated. Then, if the gate is of the Precharge type, the program will
request some precharge time, size the single precharging transistor
(following the relationships of equations <8> and <9>), then calculate
the effective size of the evaluation transistor stack. This last
calculation employs an iterative method. Basically, since a precharge
gate will have its switch point offset quite significantly away from half
of the supply voltage (for an N evaluation stack the switch point will be
<Vsupply/2), the speed in pulling that node to the rail through the stack
of evaluation transistors is faster than if the switch point were exactly
Vsupply/2. Therefore, the program will calculate the delay time based
originally on the full complementary effective size compared to the
calculated size of the precharging transistor and the following empirical
relationship:
gatedelay = gatedelay*(1/beta_ratio)**0.25
<20>
where beta_ratio is the ratio of the effective beta of the evaluation
transistors to the beta of the precharge transistor. For example, a
precharged NAND gate (positive logic) would have a stack of N transistors
driven by the input in series with one N device driven by an "enable"
signal while the pull-up capability is handled by a P channel driven by a
"precharge" signal (usually the same "enable" signal). Thus, beta_ratio
is equal to the (Bn/Bp) term in equation <12>. If we had a precharged
NOR gate where the evaluation transistors were P channels, then
beta_ratio would be (Bp/Bn).
The iteration loop for homing in on the evaluation transistor sizes
of a precharge gate after the precharge transistor size is determined
consists of the following steps: (1) calculate beta_ratio (first pass of
iteration: use effective evaluation transistor size equal to full
complementary sizes), (2) calculate gatedelay (first pass of iteration:
use gatedelay = dpg, the allocated "delay per gate"), (3) if this
gatedelay has exceeded the original allocation (dpg), then exit, else
continue iterating, (4) reduce the effective evaluation transistor width
(small incremental reduction; about 2%), increase fanout appropriately
(call it temporary_fanout) then recalculate the gatetime based on:
gatetime = (temporary_fanout - b_pass)/m_pass
<21>
Then return to step (1).
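The four steps above can be sketched as a short loop. This is a hedged reading of the description, not the thesis's Ada routine, and it rests on several assumptions that the text does not settle: beta_ratio is taken as a width ratio scaled by a hypothetical mobility_ratio parameter, "increase fanout appropriately" is taken to mean scaling the fanout inversely with the reduced width, and the function returns the last width whose estimated delay stayed within the allocation. All numeric values are invented.

```python
# Sketch of the precharge evaluation-stack iteration, steps (1) to (4).
# Assumptions (not from the text): beta scales linearly with width,
# fanout scales inversely with width, and all numbers are illustrative.

def size_evaluation_stack(w_full, w_pre, dpg, m_pass, b_pass,
                          mobility_ratio=1.0, step=0.02, max_iter=1000):
    """Shrink the effective evaluation width until the delay allocation is used up.

    w_full: full complementary effective width (starting point)
    w_pre:  width of the precharge transistor
    Returns the last width whose estimated gate delay did not exceed dpg,
    or w_full if even the starting width exceeds it.
    """
    base_fanout = m_pass * dpg + b_pass   # fanout implied by dpg
    width = w_full
    gatetime = dpg                        # first pass uses the allocation
    accepted = None
    for _ in range(max_iter):
        beta_ratio = mobility_ratio * width / w_pre        # step (1)
        gatedelay = gatetime * (1.0 / beta_ratio) ** 0.25  # step (2), eq. <20>
        if gatedelay > dpg:                                # step (3)
            break
        accepted = width
        width *= 1.0 - step                                # step (4)
        temporary_fanout = base_fanout * w_full / width
        gatetime = (temporary_fanout - b_pass) / m_pass    # eq. <21>
    return accepted if accepted is not None else w_full


w_eval = size_evaluation_stack(w_full=20.0, w_pre=5.0, dpg=3.0e-9,
                               m_pass=4.0e9, b_pass=-0.5, mobility_ratio=2.0)
print(f"effective evaluation width = {w_eval:.3f}")
```

The max_iter guard is an addition for safety; the description itself relies on the delay eventually exceeding dpg.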
Now that the effective transistor sizes are known, the actual sizes 
must be calculated considering both the number of transistors in series 
and the number of drains in parallel for a particular gate. These are 
both provided with a look-up table of allowable gates. From this table, 
the program knows both the number of P and N channel transistors in 
parallel and series for a particular gate type. This table will be 
discussed in the next chapter. 
The series consideration is straightforward: simply multiply the
effective transistor width by the number of series transistors. This was
discussed and illustrated in the "Key Algorithm Points" section.
Diffusion capacitance is particularly important when parallel
transistors are used. This is often the case regarding multiple input
gates. Since the fanout curves should have been generated considering
diffusion capacitance of both the single N and single P transistors of
the inverter driver, the program will adjust the original value of fanout
whenever more drain areas are needed to accommodate the number of inputs
to the particular gate. In other words, because two parallel transistors
can share the same drain, diffusion capacitance increases with the number
of parallel inputs, n, as trunc((n+1)/2) where the trunc operation simply
truncates the result to an integer. So whether there are 3 or 4 parallel
transistors, for instance, the diffusion capacitance would increase by
100% in either case. Of course, the value of the capacitance associated
with the drain diffusion varies with potential. In the program, this
effect is implemented by choosing an empirically derived multiplier
(0.63) representing an approximate median value of capacitance pertinent
between 0 and 5 volts.
The program accommodates parallel transistors by changing the fanout
for a particular gate. In other words, since the output capacitance is
actually larger because of drain capacitance, the program reduces the
fanout to assure that a large enough W will result to accommodate the
extra load.
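The drain-sharing rule is easy to check numerically. The short Python sketch below only evaluates trunc((n+1)/2) and the resulting increase relative to a single transistor; the function names are hypothetical, and it does not attempt to reproduce how the program converts the increase into a fanout change.

```python
# Number of drain diffusion regions for n parallel transistors, using the
# trunc((n+1)/2) rule: two parallel transistors can share one drain.

def drain_regions(n_parallel):
    return (n_parallel + 1) // 2


def diffusion_increase_percent(n_parallel):
    """Increase in drain diffusion capacitance relative to one transistor."""
    return 100.0 * (drain_regions(n_parallel) - drain_regions(1)) / drain_regions(1)


for n in (1, 2, 3, 4, 5):
    print(n, drain_regions(n), diffusion_increase_percent(n))
```

For 3 or 4 parallel transistors the sketch reports two drain regions, i.e. the 100% increase mentioned above.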
This model of compensating for parallel transistors is an 
approximation. Certain gates may require further approximations like 
those having the same considerations of precharged logic gates. That is, 
since the transistors within the evaluation block of a precharged circuit 
are not tied to the appropriate supply but rather are in series with the 
enable transistor (to avoid charge share problems), both the drain and 
source diffusion capacitances affect the response. The source's 
diffusion capacitance will not affect the gate's response by as much as 
the drain's because there is one less transistor in series to the supply 
at the source. The program approximates the penalty of this effect in
terms of a required fanout change.
CHAPTER 3 
THE ADA PROGRAM IMPLEMENTATION 
The bases for the sizing algorithm have been discussed. In this 
section, the program developed to perform sizing of a specified speed 
path is introduced and its structure and input requirements are 
disclosed. 
Figure 3 is a Booch diagram (6) illustrating the program's
facilities and interfaces. The main program, called Path_IO, takes an
input file, parses it, checks continuity of connections, then performs
the sizing operations from user information and subsequently writes an
output file indicating the transistor sizes for these gates. Path_List
is a package written to provide the mechanisms to dynamically
allocate memory and retrieve that information as Path_IO works through
the delay path. Techfile is the aforementioned technology table where
the fanout relations reside for CMOS process variations and differing
operating conditions. It is also a package.
The input file is simply a serial path of gates. The first piece of
information needed is some arbitrary name distinguishing this gate from
all others of the path. The next column is an allowable gate name. The
allowable gate names are contained in a table within Path_IO along with
the following attributes of the particular logic gate: 1) the number of
inputs and 2) the number of P and N transistors in parallel and in
series. After the gate name are the input and output node numbers 


















Figure 3. Booch Diagram of Ada Program's Facilities and Interfaces. 
Finally, a value of capacitance representing the total load on the 
particular output node, besides that of any subsequent gate of the delay 
path, is the last item included in each row of the input file. Following 
is an example of an input file. 
U1 inv 1 2 0.0
U2 inv 2 3 0.0 
U3 nand2 3 13 4 2.0e-12 
U4 inv 4 5 0.0 
gates inv 5 6 10.0e-12 
Path_IO will take this input as a record of type "gate type." The
gate name, type, number of inputs, etc., become fields of this record. 
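To make the row layout concrete, the following Python sketch parses rows of the form shown above (name, gate type, input nodes, output node, load capacitance) into records and performs a simple continuity check. It is an illustration only: the thesis's program is written in Ada, the names Gate, parse_gate and check_continuity are hypothetical, and the continuity rule used here (each gate's output must appear among the next gate's inputs) is an assumption about what checking continuity of connections involves.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gate:
    name: str
    gate_type: str
    inputs: List[int]
    output: int
    cload: float  # farads, excluding the next gate of the path


def parse_gate(line):
    """Parse one row: name, type, input nodes..., output node, capacitance."""
    fields = line.split()
    nodes = [int(tok) for tok in fields[2:-1]]
    return Gate(fields[0], fields[1], nodes[:-1], nodes[-1], float(fields[-1]))


def check_continuity(gates):
    """Assumed rule: each gate's output node feeds an input of the next gate."""
    return all(prev.output in nxt.inputs for prev, nxt in zip(gates, gates[1:]))


rows = [
    "U1 inv 1 2 0.0",
    "U2 inv 2 3 0.0",
    "U3 nand2 3 13 4 2.0e-12",
    "U4 inv 4 5 0.0",
    "U5 inv 5 6 10.0e-12",   # gate names are arbitrary
]
gates = [parse_gate(row) for row in rows]
print(gates[2])
print(check_continuity(gates))
```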
A generic linked list package, named Path_List, was created to
dynamically allocate memory for the data structure and perform operations 
such as formatting or initializing an element (getnode), adding (addinfo) 
and inserting elements (insert), retrieving the number of elements in a 
list (numnodes), retrieving a particular element (retrievenode) and 
printing the list (print). "Ptrtype" is an access data type of user 
defined "LLtype" which, together, allow the dynamic creation of objects. 
In addition to Path_List, Path_IO requires a data program to include
information about the particular process being employed. This data
program is named Techfile. It contains the fanout versus delay per gate
equations and all the necessary process parameters to calculate
transistor widths including parameters pertinent to input gate
capacitance, diffusion capacitance and WPN. Path_IO, Path_List and
Techfile are listed in Appendix D.
The pre-algorithm execution effort has been discussed in the section 
entitled "Precision Sizing with SPICE." Once Techfile is prepared as 
defined and compiled and linked along with Path_IO, the user can run 
Path_IO to have a series string of logic gates sized (as long as those
logic gate types have been included in the aforementioned "allowable gate 
table"). The following discussion involves the actual execution of the 
sizing program. An example of this is in the following chapter. 
The user interface is simple. As mentioned before, Path_IO requires 
an input file and will request the following information in order: 
1) the total path delay desired, 2) desired output level (50% or 10/90%), 
3) appropriate temperature, 4) process considerations (e.g., "typical,"
etc.), 5) supply voltage and, if precharged devices are used, 6) the 
allocated precharge time. 
To figure out the fanout per gate, the delay per gate must be known. 
This is done by dividing the allocated delay path time by the number of 
"effective" gate delays in the path. The term "effective" is used here 
because special consideration needs to be given to transmission gates 
(they do not count as having a GATE DELAY) and what the desired output 
level is of the delay path (50% is defined as 1 GATE DELAY while 10/90% 
is 3.3 GATE DELAYS, based on the RC exponent coefficient multiple of a 
10% discharge level compared to a 50% level). 
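The delay-per-gate bookkeeping can be sketched as follows. The sketch assumes that the 3.3 factor replaces the single gate delay of the final gate only; this is an inference, but it reproduces the dpg value printed by the program in the synthesis chapter example (25 ns allocated, seven gates of which one is a transmission gate, 10/90% output level). The function name and the use of the string "xgate" to identify transmission gates are illustrative assumptions.

```python
import math

# ln(10)/ln(2): RC time to reach the 10% level relative to the 50% level.
print(round(math.log(10.0) / math.log(2.0), 2))


def delay_per_gate(total_delay, gate_types, ten_ninety):
    """Divide the allocated path delay by the number of effective gate delays.

    Transmission gates ("xgate") are not counted. For a 10/90% output level
    the last gate is assumed to count as 3.3 gate delays instead of 1.
    """
    n = sum(1 for g in gate_types if g != "xgate")
    effective = (n - 1) + 3.3 if ten_ninety else float(n)
    return total_delay / effective


example_path = ["nand2", "inv", "xgate", "nand3", "nor2", "nprel", "inv"]
dpg = delay_per_gate(25.0e-9, example_path, ten_ninety=True)
print(format(dpg, ".5E"))
```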
The first five aforementioned requests from Path_IO are utilized in
deriving the gate to gate fanout value. If this value is less than 1.0,
then an error is invoked because the allocated time is too small for this
particular delay path and execution is terminated. Otherwise, execution
continues as is illustrated in Figure 2 where, for this section of the
program, each gate is evaluated through the flowcharted sequence of this 




SYNTHESIS USING THE SIZING PROGRAM 
The prior chapter explained the program's operation. This section 
is a discussion of the results after employing the sizing program on 
three test cases: 1) a path with precharged and complementary CMOS logic 
mixed, 2) NP domino CMOS logic path, and 3) true and complementary CMOS 










[Figure 4b schematic. Recoverable labels: CLOCK, CLOCKBAR, 0.1 pF,
0.1 pF, 0.4 pF.]
Figure 4b. NP Domino Logic









In the first example, Figure 4a is a schematic of a delay path with 
fully complemented logic mixed with a transmission gate and a precharge 
gate. The input file to utilize the sizing program would look as follows 
(note that the device names in the first field are arbitrary): 
X1 nand2 1 2 3 1.4E-14
x2 inv 3 4 1.0E-14
x3 xgate 212 4 213 14 0.3e-12
x4 nand3 5 14 6 7 144.2e-15
x5 nor2 40 7 104 4.555e-13
x5a nprel 100 101 102 103 104 8 1.5e-12
x6 inv 8 9 6.32e-12
The program would respond in this manner (with the user's response in 
parentheses): 
inv 8 9 6.32000E-12
nprel 100 101 102 103 104 8 1.50000E-12
nor2 40 7 104 4.55500E-13
nand3 5 14 6 7 1.44200E-13
xgate 212 4 213 14 3.00000E-13
inv 3 4 1.00000E-14
nand2 1 2 3 1.40000E-14
Enter the desired prop delay>> (25.0E-9)
You may measure the propagation time from 50% of the input signal
to either the 10/90% output level or the 50% output level.
Do you want the 10/90% output level? Y or N:
prop = 2.50000E-08
num delays = 6
dpg = 3.01205E-09
Enter desired temperature(-55, 25, 125):
Enter desired process (T=typical, B=best, W=worst):
Enter the supply voltage (4.5, 5.0, 5.5):
The calculated fanout is 1.16978E+01
Enter the precharge time in seconds>>
P:N Width Ratio is 1.80000E+00






Please note, the resulting fanout value corresponds to the particular 
technology file (Techfile.ada) listed in Appendix D. The program writes 
an output file with the final transistor size values as follows: 
The calculated fanout is 1.16978E+01
NAME TYPE   INPUTS               OUTPUT CLOAD     PWIDTH    NWIDTH
X1   nand2  1 2                  3      1.400E-14 2.031E+00 2.213E+00
x2   inv    3                    4      1.000E-14 1.671E+01 9.462E+00
x3   xgate  212 4 213            14     3.000E-13 1.671E+01 9.462E+00
x4   nand3  5 14 6               7      1.442E-13 5.764E+00 9.339E+00
x5   nor2   40 7                 104    4.555E-13 2.776E+01 7.999E+00
x5a  nprel  100 101 102 103 104  8      1.500E-12 1.472E+01 4.167E+01
x6   inv    8                    9      6.320E-12 1.514E+02 8.429E+01
The RELATIVE LAYOUT AREA figure of merit is 7.47641E+02
Also, note that a "relative layout area figure of merit" is presented.
This parameter is an indication of the area needed to realize this 
circuit with the listed sizes. The key word here is "relative," that is, 
this value should be compared to the same of other implementations of the 
particular function. If another implementation resulted in a "relative 
layout figure of merit" which was half of the above, it would be safe to 
say the resulting layout could be half the area. 
With the allocated propagation time of 25ns and an output level of
10/90% as entered above, SPICE revealed the actual delay for this circuit
using the sizes generated by the sizing program to be 27ns, an 8% error
from the allocated time.
Figure 4b is a diagram of NP Domino logic. The allocated time for
this path was 15ns for an output level of 10/90%. The subsequent
simulated SPICE delay time was 15.5ns, an error of 3.3%.
Figure 4c illustrates the schematic of a True and Complement Clock
Driver. The purpose of this circuit is to have the CLOCK and CLOCKBAR
signals switch simultaneously and, optimally, each reach the 50% level at
precisely the same instant. To achieve this, the following relationship
should be considered:
(Cc/Ccin)**(1/Sc) = (Ccb/Ccbin)**(1/Scb)
<22>
where Cc, Ccb, Ccin and Ccbin are the capacitances associated with CLOCK,
CLOCKBAR and the input signals to each, respectively, while Sc and Scb
are the number of stages for CLOCK and CLOCKBAR from the input. Note
that U1 is the common driver and, as far as equation <22> is concerned,
its value of input capacitance is considered for each case, CLOCK and
CLOCKBAR. That is, the actual input capacitance of U1 is Ccin + Ccbin.
To utilize the above relationship, one must assume a value for Ccin,
Ccbin or their sum.
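Equation <22> says that the per-stage fanout of the CLOCK path equals that of the CLOCKBAR path. If the sum Ccin + Ccbin is the assumed quantity, the split between the two can be found numerically. The Python sketch below uses bisection for this; the function name, the choice of bisection, and all capacitance and stage values are assumptions for illustration and do not come from the example circuit of Figure 4c.

```python
# Solve (Cc/Ccin)**(1/Sc) = (Ccb/Ccbin)**(1/Scb) with Ccin + Ccbin fixed.
# Bisection works because the left side falls and the right side rises as
# Ccin grows. All numeric values are illustrative.

def split_input_capacitance(cc, ccb, sc, scb, cin_total):
    """Return (Ccin, Ccbin) that equalize the per-stage fanout of both paths."""
    def mismatch(ccin):
        clock = (cc / ccin) ** (1.0 / sc)
        clockbar = (ccb / (cin_total - ccin)) ** (1.0 / scb)
        return clock - clockbar

    lo = cin_total * 1.0e-9
    hi = cin_total * (1.0 - 1.0e-9)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0.0:
            lo = mid   # CLOCK fanout still larger: give CLOCK more input capacitance
        else:
            hi = mid
    ccin = 0.5 * (lo + hi)
    return ccin, cin_total - ccin


# Hypothetical loads: 1 pF on CLOCK through 3 stages, 4 pF on CLOCKBAR through 4.
ccin, ccbin = split_input_capacitance(1.0e-12, 4.0e-12, 3, 4, 0.1e-12)
print(ccin, ccbin)
print((1.0e-12 / ccin) ** (1 / 3), (4.0e-12 / ccbin) ** (1 / 4))
```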
Some practical design considerations of this circuit are 1) the path 
with the greater number of stages (some odd number more than the other 
path) should drive the larger of the two loads and 2) the transition time 
of the signal with the lower number of stages from the input will 
naturally be slower than the other signal because the fanout must be 
larger to satisfy the relationship of equation <22>. Hence, the designer 
must provide enough stages to support the total fanout without slowing 
the transition time significantly (3 stages is about the minimum). The 
slower a clock's transition time, the more probable are race conditions 
(especially if transmission gates are employed as inputs to latches).
To size this clock driver, the designer considers one path at a 
time, CLOCK or CLOCKBAR. Once sizes are achieved using the same 50% to 
50% propagation delay for each path, the designer then puts the paths 
together. The best way of doing this is to consider the path with the 
lower value of fanout, consider the total driven gate load where the 
paths branch, then divide this load by the lower fanout value to 
determine the size of the "common" driver. In this example circuit, 
however, the author simply added the sizes of the first gate of each path 
together to determine the final sizes of this clock driver. The 
allocated time from 50% input to 50% output was 9.0 nanoseconds. 
Utilizing the widths calculated by the sizing program then running SPICE, 
the resulting overall propagation delay was 9.3 nanoseconds, a 3.3% 
error. The clock skew was 0.3 nanoseconds, thus illustrating the 
usefulness of this approach for true and complement clock drivers. Using 
a faster parameter set for P or N or both kinds of transistors with this
circuit, sized as it is, results in negligible changes to this value of 
skew. 
Since the SPICE results were consistently slower than the allocated 
time for each of these three examples, the user may want to simply 
allocate 10 to 20% more time for the sizing program effort. Or, when 
compiling the fanout relationships, burden the data by 10% for 
safeguarding. The author believes this slight error in measurement is 
caused by 1) the inaccuracy of switch point determination and 2) process 
dependencies. 
CHAPTER 5 
ENHANCEMENTS TO THE SIZING IMPLEMENTATION 
This chapter introduces ideas to improve the sizing program. From 
the opening discussions of this paper, it was established that there are 
several variables which can be used as influences in a circuit design. 
Minimizing the area of a delay path is one of these influences. However, 
providing a more universal method for sizing transistors for general 
usage is achieved by keeping a constant f(i) (resulting in a constant 
delay per gate) within a path. Indeed, this is the basis for the sizing 
algorithm described herein. In this chapter, layout area reduction
enhancements are the primary topic; first for the general case, then
for fully complemented multiple input gates. Then,
improvement efforts will be focused on a broader means of sizing 
transmission gates. Finally, a brief discussion about automatic multiple 
delay path sizing is included. 
The size reduction method offered here is brief and simple. In the 
introduction of this paper, varying f(i) within a delay path was 
identified as the means for reducing layout area from a circuit design 
which was generated from utilizing a constant f(i). The important 
entities for performing this improvement activity are the distribution of 
interconnect capacitances (i.e., Cint, which, as mentioned earlier,
includes all parasitic capacity in a delay path except for gate
capacitance of those gates within the series delay path) and fsum, where
fsum = SUM(i = 1 to n) f(i)
<23>
The front end to this sizing program enhancement would be similar to the 
present version where the inputs, such as allocated time and fanout 
relationships, etc., are identical. Thus, favg is known by Tpp/(Tk*n); 
see equation <7>. By parsing through the delay path, the sum of Cint at 
each node is totaled. 
Cintsum = SUM(i = 1 to n) Cint(i)
<24>
The following relationship defines a particular node's fanout, f(i), for 
a first pass through area reduction. 
f(i) = [ fsum - (n*fmin) ] * [ Cint(i)/Cintsum] + fmin 
<25> 
The minimum allowed fanout is fmin. Earlier, reference [2] was 
identified with regards to keeping the fanout above 2 or 3 to avoid 
linearity problems with the way the data relations in Techfile were 
implemented. 
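Equation <25> distributes the fanout in excess of the minimum in proportion to each node's share of the interconnect capacitance, so the individual values still add up to fsum. A short Python sketch of this first pass follows; the function name and the numeric values are illustrative assumptions, and the later adjustments for fmax are not included.

```python
# First-pass fanout distribution per equation <25>.
# Capacitances, fsum and fmin below are illustrative values.

def first_pass_fanouts(cint, fsum, fmin):
    """f(i) = (fsum - n*fmin) * Cint(i)/Cintsum + fmin for each node i."""
    n = len(cint)
    cintsum = sum(cint)
    return [(fsum - n * fmin) * (c / cintsum) + fmin for c in cint]


cint = [0.1e-12, 0.1e-12, 0.4e-12, 0.2e-12]   # hypothetical Cint(i), farads
fanouts = first_pass_fanouts(cint, fsum=40.0, fmin=3.0)
print(fanouts)
print(sum(fanouts))
```

Nodes with larger Cint(i) receive the larger fanout, which is where the size reduction is concentrated.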
To maintain the speed requirement of the allocated delay, Tpp, 
equation <7> must still be satisfied. Also, a maximum f(i) was 
introduced earlier regarding the slowest slew rate the designer deems to 
be acceptable for that particular delay path; call this boundary fmax. 
So then, f(i) of equation <25> will be altered under two conditions: 
1) if fsum is greater than favg*n or 2) if f(i) is greater than fmax. To 
go about performing such alterations, first, it seems that simply 
decrementing all fanouts equally such that no f(i) is less than fmin 
until fsum = favg*n may be as good as any iterative solution for 
satisfying the first condition. Then, any f(i) greater than fmax would 
be reduced to fmax and the difference would be spread to all other nodes 
with those having the greater Cint(i) receiving more of this difference, 
as long as fmax is not violated. 
In this way, the emphasis of size reduction is placed upon those 
nodes with larger values of Cint(i). This is a "quick and dirty" 
implementation which does significantly reduce the area and has merit, 
especially whenever Cint is randomly spread throughout a delay path. 
Perhaps a better but more involved approach would be desirable where the 
reduction emphasis is focused on large values of the total of Cint(i) 
plus C(i). Although this implementation is more CPU intensive because 
changes to C(i), the ith driver's input capacitance, would result in 
changing every C(i), greater area reduction should be achieved. 
Some area can be saved by altering the gate to gate fanouts in order 
to reduce some burden from logic gates composed of stacked transistors, 
especially when such a gate must drive much of a load. The following 
routine would be exercised on an already sized delay path (whether or not 
f(i) is constant does not matter) to change the fanout of certain 
multiple input gates and have other gates pick up the slack. Although 
such an effort should probably be preceded by a rule checker that 
considers when adding buffers into a delay path is more prudent than 
altering fanouts for these multiple input logic gates (1), the discussion 
herein will assume a fixed logic schematic. 
The gist of the routine to handle these multiple-input gates focuses 
on finding gates which are inefficiently sized and reducing their widths 
(i.e., increasing their fanout). Then, it is necessary per equation <7> 
to decrease the fanout from a NON-inefficient gate to maintain total path 
propagation delay. These alterations must be done iteratively and will 
continue until a minimum total W or best layout figure of merit is
achieved. 
The first detail of the multiple-input gate enhancement is defining 
some criterion to identify the inefficient gates. An "inefficient" gate 
of a delay path is one that has too many stacked transistors to drive a 
load relative to that of other gates within the same delay path. 
Therefore, if the number of series transistors of a particular gate (P or 
N) exceeds the average of the delay path, one could say this gate is 
"relatively" inefficient. To alleviate the need to separately consider P 
and N transistors (which basically means that one would want to avoid 
changing a gate's P and N transistors by different proportions for fear 
of changing the switch point while the fanout was being altered) the 
criterion could be changed to comparing the number of INPUTS of a gate to 
that of the average input count per gate of a delay path, providing this 
discussion is limited to fully complemented logic gates and not precharge 
gates. 
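The input-count criterion above can be sketched in a few lines of Python. This is only an illustration: the function name and the example path are assumptions, not part of the author's Ada program.

```python
def inefficient_gates(input_counts):
    """Return the indices of gates whose input count exceeds the path average.

    input_counts: number of inputs of each fully complemented gate in one
    delay path (precharge gates are assumed to have been left out).
    """
    average = sum(input_counts) / len(input_counts)
    return [i for i, n in enumerate(input_counts) if n > average]


# Hypothetical path: inverter, 3-input NAND, inverter, 2-input NOR.
# The average is 1.75 inputs per gate, so gates 1 and 3 are flagged.
print(inefficient_gates([1, 3, 1, 2]))
```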
The next step is to decide which inefficient gate should have its 
fanout altered next. Improving the gates with the largest width would 
certainly have the greatest effect on area savings. So the gates with
the largest widths would be operated on first. The actual operation 
performed on these gates will be an iterative reduction of transistor 
width. This reduction is called the fanout increment rate, or FIR. A 
smaller FIR (i.e., smaller granularity or finer resolution) will cause 
more iterations to be executed before the "optimum" value of width is 
attained. On the other hand, a large FIR may not even yield one 
successful pass through the algorithm, hence leaving the inefficient gate 
sizes unchanged. The author had thought a good starting point for a 
fanout increment rate would simply reflect the "inefficient gate 
criterion." That is, the NEW fanout of the inefficient gate could be
defined by multiplying its original fanout by the FIR which is the 
particular gate's number of inputs divided by the average number of 
inputs per gate of the delay path. For finer granularity, however, it 
seems more appropriate to set 
FIR= ((no. inputs)/(avg. no. inputs of path))**0.5 
<26> 
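As a rough numerical illustration of equation <26>, the sketch below computes the FIR and applies it once to a gate's fanout. The function names and the example values are hypothetical.

```python
import math


def fanout_increment_rate(num_inputs, avg_inputs):
    """Equation <26>: square root of (gate inputs / path-average inputs)."""
    return math.sqrt(num_inputs / avg_inputs)


def new_fanout(old_fanout, num_inputs, avg_inputs):
    """One width-reduction step: multiply the gate's fanout by its FIR."""
    return old_fanout * fanout_increment_rate(num_inputs, avg_inputs)


# Hypothetical example: a 4-input gate in a path averaging 2 inputs per
# gate, currently sized for a fanout of 3.0.
print(new_fanout(3.0, 4, 2.0))
```

Taking the square root halves the step (on a logarithmic scale) compared with using the input ratio directly, which is the finer granularity the text asks for.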
Once the width reduction has been executed on one inefficient gate, 
a NON-inefficient gate must "pick up the slack" such that the total path 
propagation delay is maintained. The first thing that must happen now is 
to determine how much slack needs to be accommodated. The fact that an 
inefficient gate's sizes were reduced means that the DRIVER of this gate 
has had a DECREASE in fanout which consequently reduces the amount of
"slack" that needs to be burdened on the NON-inefficient gates. The NEW
fanout of the driver gate is
f(i) = [ fo(i+1)/f(i+1) ] * fo(i)
<27>
where fo(i+1) and f(i+1) represent the previous and present fanout,
respectively, of the DRIVEN gate, which is the inefficient gate in this
case. The previous value of the fanout for the DRIVER is fo(i). So, the
NET FANOUT REDUCTION (f-net) that needs to be burdened on the
NON-inefficient gates is
f-net = [ f(i) + f(i+1) ] - [ fo(i) + fo(i+1) ]
<28>
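Equations <27> and <28> can be traced with a small Python sketch. The function names and the sample fanouts are assumptions made for illustration only.

```python
def driver_fanout(fo_driver, fo_driven, f_driven):
    """Equation <27>: the driver's fanout drops by the same ratio by which
    the driven (inefficient) gate's fanout was raised."""
    return (fo_driven / f_driven) * fo_driver


def net_fanout(fo_driver, fo_driven, f_driven):
    """Equation <28>: change in the summed fanout of the driver/driven pair,
    which the NON-inefficient gates must absorb."""
    f_driver = driver_fanout(fo_driver, fo_driven, f_driven)
    return (f_driver + f_driven) - (fo_driver + fo_driven)


# Hypothetical case: both gates start at a fanout of 3.0 and the
# inefficient (driven) gate is raised to 4.0.
print(driver_fanout(3.0, 3.0, 4.0))  # driver's new fanout
print(net_fanout(3.0, 3.0, 4.0))     # amount to remove elsewhere in the path
```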
The next step once f-net is determined is to distribute this amount among
all of the NON-inefficient gates. More effort is needed in this area to
establish a workable approach to this problem. A simple solution would
be to divide f-net by the number of NON-inefficient gates and burden each
of these gates with that average. The problem with this means of 
distributing the burden is that no consideration is given to the larger 
sized gates; that is, the total added transistor width could be very 
different from one gate to the next and thus would considerably reduce 
the overall area savings of the enhancement algorithm. A better approach 
might be to find the smallest width gates and reduce fanouts to the 
limit, one by one, until no more enhancement can occur. 
After redistributing the fanout among the NON-inefficient gates, the 
transistor sizes of the delay path will be recalculated using the new 
fanout values. If the path's total W sum is less than its previous value
before this latest iteration, then these sizes and fanouts become the 
record values of the gates and a subsequent iteration (involving reducing 
the width of the same inefficient gate) begins as described. If the 
total W is NOT smaller, then the designated inefficient gate will be
tagged so that the program does not operate on it anymore. Hence, in 
this case, the next highest priority inefficient gate, if any, will be 
scrutinized by the enhancement routine. 
The following is a synopsis of the previous discussion regarding the 
functions that must be performed to execute the multiple-input gate 
enhancement. 
STEP 1: Identify all inefficient gates. 
STEP 2: Select which inefficient gate to start the width reduction 
procedure (which has not yet been "tagged"). 
STEP 3: Perform alteration. 
STEP 4: Update driver gate's fanout per this change. 
STEP 5: Find quantity of fanout to be reduced from path to maintain 
the total path propagation delay (per equation <7>). 
STEP 6: Distribute this fanout reduction onto NON-inefficient gates. 
Note, this fanout must never be less than 1.0 for any gate, 
and possibly not less than e (refer to the discussion
regarding the nonlinear relation between low fanout values and 
delay time in the Ada program implementation section herein). 
STEP 7: Recalculate the path's widths. 
STEP 8: Check that the newly calculated total width sum is less than 
the previous iteration's value. If so, continue iterating (go 
to STEP 2); else, discard new values of widths and fanouts,
"tag" this particular gate, then go to STEP 2 and perform an 
iteration for the next highest priority inefficient gate. If 
this was the last gate, then exit. 
Note that "tagging" is used to identify the inefficient gates 
already optimized by the multiple-input gate enhancement routine so that 
they will no longer be considered for width reductions. Also, regarding 
STEP 6, a fanout of less than 1.0 represents a loss of stage to stage 
gain and is not acceptable because the delay corresponding to such a 
fanout does not represent the actual delay. 
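The control flow of STEPS 1, 2 and 8 (selection, acceptance and tagging) can be sketched as below. This is a minimal outline under stated assumptions, not the author's implementation: STEPS 3 through 7 are hidden behind a caller-supplied `trial` function, the priority order is computed once from the starting widths, and `toy_trial` is a purely illustrative stand-in for real resizing.

```python
def total_width(gates):
    return sum(g["width"] for g in gates)


def enhance(gates, trial, max_iter=1000):
    """gates: list of dicts with 'inputs' and 'width' keys.
    trial(gates, i): returns a NEW gate list after STEPS 3-7 for gate i."""
    avg_inputs = sum(g["inputs"] for g in gates) / len(gates)
    # STEP 1: inefficient gates have more inputs than the path average.
    candidates = [i for i, g in enumerate(gates) if g["inputs"] > avg_inputs]
    # STEP 2: operate on the widest inefficient gates first.
    candidates.sort(key=lambda i: gates[i]["width"], reverse=True)
    for i in candidates:
        for _ in range(max_iter):
            new_gates = trial(gates, i)
            # STEP 8: keep the result only if the total width went down.
            if total_width(new_gates) < total_width(gates):
                gates = new_gates
            else:
                break  # "tag" gate i and move to the next candidate
    return gates


def toy_trial(gates, i):
    """Stand-in for STEPS 3-7: shrink gate i by 10% and charge a fixed
    3.0 units of extra width to gate 0 to 'pick up the slack'."""
    new = [dict(g) for g in gates]
    new[i]["width"] *= 0.9
    new[0]["width"] += 3.0
    return new


path = [{"inputs": 1, "width": 10.0},
        {"inputs": 3, "width": 40.0},
        {"inputs": 1, "width": 10.0}]
result = enhance(path, toy_trial)
print(total_width(path), total_width(result))
```

With the toy numbers, three iterations are accepted and the fourth is rejected because the fixed penalty then outweighs the shrinking savings, which mirrors the stopping rule in STEP 8.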
Other random bits of information regarding the aforementioned 
enhancement are (1) precharged gates are ignored because they are already 
optimized by definition; and (2) transmission gates will be treated as 
they were in the sizing routine (i.e., considered only as an extra series 
gate). This leads us into the discussion of improving the program as far 
as transmission gates are concerned. The present method of simply 
treating a transmission gate as an extra series transistor to the driver 
has merit only in the case where the load at the output of the 
transmission gate is significant compared to the load at its input (the 
driver's output). However, when this kind of load distribution is not 
the case (i.e., input pass transmission gate of a latch or master/slave 
flip flop), an alternative means of considering transmission gates is 




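[Figure 5: driver resistance Rd charges node capacitance C1; the
transmission gate resistance Rx connects that node to load capacitance C2.]
Figure 5. RC Network of Device Driving Transmission Gate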
As C1 increases compared with C2, the driver's size (the driver
being represented by Rd) becomes more dependent on C1 and less dependent
on the delay through the transmission gate. Hence, the series resistance
associated with the transmission gate (Rx) plays a smaller role in
determining the total delay. One method of handling this case would be
to add another type of transmission gate to the allowable gate table of
PATH_IO which would cause some routine to evaluate the ratio of C1 to C2
in the size calculations of the devices. For example, the effective
width of the driver, Wd, could be defined as
Wd = Weff * { 1 + [ C2/(C1+C2) ]**0.5 }
<29>
where Weff is the effective size of the inverter equivalent (of the
driver of Figure 5 in series with the transmission gate) necessary in
driving a load (C1 + C2) within the allocated delay per gate. The size
of the transmission gate, Wx, could be
Wx = Wd * [ C2/(C1+C2) ]**0.5
<30>
These relationships are empirical. 
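A short Python sketch of the two empirical relations, equations <29> and <30>, follows. The function name and the sample capacitances are hypothetical; only the formulas come from the text.

```python
import math


def xgate_sizes(w_eff, c1, c2):
    """Return (driver width, transmission gate width) per <29> and <30>.

    w_eff: effective width of the inverter equivalent driving C1 + C2
    c1:    load at the transmission gate input (the driver's output)
    c2:    load at the transmission gate output
    """
    ratio = math.sqrt(c2 / (c1 + c2))
    w_driver = w_eff * (1.0 + ratio)   # equation <29>
    w_xgate = w_driver * ratio         # equation <30>
    return w_driver, w_xgate


# Hypothetical loads: C1 is three times C2, so the ratio term is 0.5.
print(xgate_sizes(10.0, 3.0, 1.0))
```

As C2 shrinks relative to C1, the ratio term tends to zero, so the driver width tends to Weff and the transmission gate width tends to zero under these relations.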
Another means of handling differing load distributions with 
transmission gates is to completely overhaul the way the program counts 
gate delays. Rather than not counting transmission gates as a gate delay 
and considering them as series transistors, transmission gates would be 
counted as a partial delay depending on a suitable approach of utilizing 
the capacitance ratios of Figure 5. As the ratio of C2 to C1 approaches
zero, the value of the gate delay would tend towards one for the
transmission gate. Then, the delay per unit gate can be determined which
defines how much time to allow for a given transmission gate. On the
assumption of utilizing a time constant of 1 (t = 1RC), the following
relation can be used:
td + tx = Rd*(C1+C2) + (Rx*C2)
<31>
where td and Rd relate to the driver's delay per gate and average Ron,
respectively, while tx and Rx apply to the transmission gate. Thus,
Rx = tx/C2
<32> 
Then, equation <9> enables the transmission gate's widths to be 
calculated. 
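Equations <31> and <32> can be checked numerically with the sketch below. The split of <31> into a driver term td = Rd*(C1+C2) and a transmission gate term tx = Rx*C2 is the reading implied by <32>; the function name and the sample values are assumptions.

```python
def xgate_resistances(td, tx, c1, c2):
    """Return (Rd, Rx) for a driver delay td and transmission gate delay tx."""
    rd = td / (c1 + c2)   # driver term of <31>
    rx = tx / c2          # equation <32>
    return rd, rx


# Hypothetical values: 1 ns for the driver, 0.2 ns for the transmission
# gate, C1 = 0.3 pF and C2 = 0.1 pF.
rd, rx = xgate_resistances(1.0e-9, 0.2e-9, 0.3e-12, 0.1e-12)
print(rd, rx)
```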
C1 and C2 would need to include the drain and source capacitance of
the transmission gate. This requirement would likely render the latter
enhancement to utilize iteration in obtaining the solution since these
parasitics are not based upon the driver's sizes.
The present implementation of the program considered xgates in 
applications such as drivers of moderately loaded buses and pass gate
logic and not inputs to latches and flip flops. The author recommends 
utilizing tristate inverters as inputs to latches. This approach costs a 
little more as far as area but eliminates a race condition inherent with 
transmission gates and associated clock skew and transition time. 
The most difficult enhancement to implement would be one which would 
allow automatic sizing of multiple delay paths (i.e., an entire IC, a 
block of logic, etc.). Items to be addressed in this sort of effort 
include the user interface (e.g., how does the user define the start and
stop points for delay paths?) and precedence (Which branch should
initially be the target of the algorithm? What should be assumed about
the gates at the branch junctions? And how should recursion from path to
path be handled?). The term "branch" refers to any appendage of a delay 
path which forms at least one other delay path; sometimes called 
"diverging paths." A branch in a particular delay path can be located 
anywhere within that path such that certain devices may be part of more 
than one delay path. 
Matson (7) describes a system's data structure for sizing 
transistors of a delay path. Although diverging paths were not 
considered in this reference, the notion of recursion was introduced in 
some capacity. Recursion is imperative in optimizing sizes for area. 
The following briefly describes ideas envisaged for the development of a 
global automatic transistor sizer. 
A means must exist to extract timing information to attach to each 
gate. One possible implementation would be to utilize a particular 
control signal, or signals (e.g., clocks), as a guide in determining the 
beginning and end of delay paths in addition to the timing specification 
(i.e., to incorporate input/output timing). It would be necessary, in 
the case of these control signals, to attach an attribute to a pin of a 
device to indicate to the sizer that this particular pin is a start or 
stop point for a path (e.g., the clock input pin of a flip flop or the 
enable pins of a tristate inverter). For calculation purposes, the sizer 
translates the devices at such pins into capacitors for use in sizing the 
driving path. Chip inputs and outputs would automatically be deciphered 
as termination points. 
To start, outputs of all the delay paths of interest would be marked 
by the user. The sizer would trace these paths to their origins, or as 
far as the user wished. When a diverging path is encountered, the sizer
could trace forward to the next output (or all the way to an output pin), 
then retrace back to the originally encountered junction. In this manner
the sizer 1) is able to attach a timing requirement to each path,
2) knows the beginning and end of each path, and 3) remembers the number
and type (type being pertinent if "macro" logic gates are used which are 
composed of more than one gate delay) of each delay element per path. 
Once the jth path is completely traced from its output and its 
timing requirement established, Tpp(j), an initial delay per gate would 
be assigned to each of the i logic elements in that path: 
tp(i,j) = Tpp(j)/n 
<33> 
where n is the number of gate delays in path j. To junction gates, those
logic elements which drive devices of more than one path, the sizer will 
assign the minimum tp(i,j) associated with the appropriate diverging 
paths. As described in Chapter 2's section entitled "Key Algorithm 
Points," a gate delay value is translated to a fanout value and sizes are 
calculated as before except junctions can only be considered after all 
the appropriate branches are sized. 
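The initial delay assignment of equation <33>, together with the minimum rule for junction gates, might look like the following Python sketch. The data layout, the gate names and the simplification that every listed gate counts as exactly one gate delay are assumptions for illustration.

```python
def initial_gate_delays(paths):
    """paths: {path name: (Tpp, [gate names in the path])}.
    Returns {gate name: assigned delay per gate}."""
    delays = {}
    for tpp, gate_names in paths.values():
        tp = tpp / len(gate_names)            # equation <33>
        for name in gate_names:
            # A gate shared by several paths keeps the smallest tp.
            delays[name] = min(tp, delays.get(name, tp))
    return delays


# Hypothetical network (times in ns): gate "j" belongs to both paths.
paths = {
    "A": (10.0, ["a1", "j", "a2", "a3"]),   # 2.5 ns per gate
    "B": (6.0, ["j", "b1", "b2"]),          # 2.0 ns per gate
}
print(initial_gate_delays(paths))
```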
These sizes can be optimized for area, if desired. Optimization is 
very CPU intensive. Decisions of determining the allowable degradation 
of output transition time for a given gate (i.e., reference discussion of 
Ron(i) versus slew rate [1]) will require recursion at path junctions. 
Besides all of the fanout alteration computations discussed earlier, the 
sizer must also be able to look at more than one path, determine if one 
takes precedence over another regarding area optimization, and assign new 
values of f(i) (to appropriate values of fo(i)).
Such an undertaking would certainly require considerable resources. 
The necessity of tying several levels of abstraction together for use 
with this tool is a task in itself. Given a logic schematic, for 
instance, it intuitively seems that sizing and performing the layout 
routing are alternating, trial and error, tasks. At this time, there
does not appear to be one best nor any easy means of implementing sizing 
and area optimization on a global basis. The author will investigate 
alternatives for bringing this sizing algorithm to the most usable state. 
SUMMARY 
This paper has described a method which ensures the IC designer a 
short order means of deriving device sizes for CMOS logic speed paths. 
Not only will this routine handle fully complemented logic but also 
precharge (e.g., Domino, etc.) and pass gate logic as well. The end 
result in utilizing this sizing methodology is the ability of achieving a 
quick circuit design for a given delay path with an allocated propagation 
time while expending minimum resources in performing circuit simulation. 
Indeed, not utilizing this sizing facility translates into a penalty in 
the design schedule and possibly in area efficiency. In addition, this 
paper suggests methods of how to incorporate area reduction. 
The author plans to employ the area reduction improvements discussed 
earlier and hopes to build these methods into a semicustom CAD tool. It 
is expected that a similar sizing method could be used for any other 
technology (e.g., bipolar, GaAs-MESFET, etc.) by redefining fanout and 




APPENDIX A 
MINIMUM INPUT CAPACITANCE OF A DELAY PATH 
Below is an intuitive illustration of how a minimum path input 
capacitance is obtained by keeping the gate-to-gate fanout, f(i), 
constant. The following is defined: 
favg = [ SUM(i=1..n) f(i) ] / n
and
dev(i) = f(i) - favg
Let the area of the driver associated with C(1) be A(1).
A(1) = L * W(1)
L*W(i) = C(i)/Cox
C(i) = C(i+1)/f(i)




If just two of the fanout ratios are altered from the average, one to 
offset the other to maintain the requirement of keeping the total path 
propagation delay constant (the Tpp equation), then the non-constant 




delta A(1) = [C(n+1)/Cox]*{ [ 1/((favg+dev(i))*(favg-dev(i))*favg**(n-2)) ]
- [ 1/(favg**n) ] }
or
delta A(1) = [C(n+1)/Cox]*[ 1/(favg**n - (dev(i)**2)*favg**(n-2))
- 1/(favg**n) ]
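The claim that any deviation from a constant fanout raises the path input capacitance can be checked numerically. The sketch below assumes C(1) = C(n+1) divided by the product of the f(i), which follows from C(i) = C(i+1)/f(i), and uses hypothetical fanout values with Cox normalized to 1.

```python
import math


def input_capacitance(c_load, fanouts):
    """C(1) obtained by applying C(i) = C(i+1)/f(i) back from the load."""
    return c_load / math.prod(fanouts)


def delta_area(c_load, favg, dev, n, cox=1.0):
    """Second form of the delta A(1) expression above."""
    return (c_load / cox) * (
        1.0 / (favg ** n - dev ** 2 * favg ** (n - 2)) - 1.0 / favg ** n
    )


# Four stages with favg = 3; two of them deviate by +0.5 and -0.5.
even = input_capacitance(1.0, [3.0, 3.0, 3.0, 3.0])
uneven = input_capacitance(1.0, [3.5, 2.5, 3.0, 3.0])
print(uneven - even, delta_area(1.0, 3.0, 0.5, 4))
```

Both printed values agree and are positive, consistent with the constant-fanout path having the smaller input capacitance.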




APPENDIX B 
PAIR DELAY ANALYSIS TO SOLVE FOR WPN
Solving for an optimum WPN using an inverter pair delay model is the
topic of this appendix. It is shown that the constraints surrounding
this analysis render this approach inappropriate for deriving the WPN
used for sizing calculations. 
For fully complemented CMOS, we can calculate an optimum ratio of 
the widths for speed assuming constant fanout and assuming that all of 
the load capacitance is composed only of the subsequent device's input 
gate capacitance. So, for a pair of inverters, the first inverter 
driving the second and the second driving a load equivalent to the 
first's, the delay through them can be expressed as simply the addition 
of the RC delays of each inverter as they pull high or low to reach the 
Vinv level of the next inverter so that kn = -ln(Vinv/Vdd) and kp =
-ln(1-(Vinv/Vdd)).
tp = [(kp*Rp) + (kn*Rn)]*f*Cin
<B1>
where Rp is the approximate average ON resistance of a P channel
transistor, Cin = (Wp + Wn)*L*Cox and the load seen at the output of the
pair delay is Cin. To be precise, kp and kn would be expressed in terms
of the switch point of the inverter, Vinv. And Vinv is expressed in
terms of widths, among other things.
At this time, let's assume that the switch point is set to a 
constant and forget about its dependence on Vinv. After the 
differentiation of <B1>, this dependency will be considered. Also, let's
assume that Vtn = -Vtp. Differentiating, we find 
dtp/dWp = (kn/(un*(Wtot-Wp)**2)) - (kp/(up*Wp**2))
<B2>
Note: Wtot = Wp + Wn
Setting this to 0: dtp/dWp = 0
And the optimum width ratio for speed is
opt(Wp/Wn) = [ (kp*un)/(kn*up) ]**0.5
<B3>
Note that Vinv < Vdd/2 in this case and, hence, the actual optimum (had 
Vinv's dependence on the width been considered) would be 
(un/up)**0.5 <= opt(Wp/Wn) < un/up
<B4> 
However, remember an assumption underlying this analysis: Cload is
composed only of the active load of one of these inverters. If we let C1
and C2 be interconnect loads with C1 being driven by the first inverter
(pulling high in this example) while C2 is driven by the second, we find
the optimum width ratio for speed is
opt(Wp/Wn) = [ (C1*un*kp)/(C2*up*kn) ]**0.5
<B5>
And since the interconnect capacitance ratio in a pair delay is random
(i.e., not related to any other parameters), the absolute optimum value
won't be consistent from node to node, rendering the relationship of
equation <B5> unusable in a general case circuit analysis tool.
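The sensitivity of the optimum ratio to the interconnect loads can be seen with a small Python sketch of the two optimum-ratio expressions in this appendix. The mobility ratio, the switch point of Vdd/2 and the capacitance values are illustrative assumptions, not data from the thesis.

```python
import math


def optimum_wp_wn(un, up, vinv_over_vdd, c1=1.0, c2=1.0):
    """Optimum P:N width ratio with interconnect loads C1 and C2.
    With c1 == c2 the expression reduces to the gate-load-only result."""
    kn = -math.log(vinv_over_vdd)
    kp = -math.log(1.0 - vinv_over_vdd)
    return math.sqrt((c1 * un * kp) / (c2 * up * kn))


# Hypothetical process with un/up = 2.5 and a switch point at Vdd/2.
print(optimum_wp_wn(2.5, 1.0, 0.5))                   # equal loads
print(optimum_wp_wn(2.5, 1.0, 0.5, c1=4.0, c2=1.0))   # C1 four times C2
```

Quadrupling C1 relative to C2 doubles the computed optimum, which illustrates why a node-dependent interconnect ratio prevents a single value from serving the whole path.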
APPENDIX C 
OTHER INTERCONNECT CONSIDERATIONS 
This appendix briefly interjects ideas of how to improve the user
interface by enabling the sizing program to utilize actual layout
parameters and translate these into capacitances. Also, considerations
for interconnect resistance are referenced.
Enhancing the sizing program's versatility can be accomplished by 
allowing the user an option of whether to simply use this "lumped" value 
of load capacitance or enter the particulars of what composes the 
capacitance of that node. For instance, the user might be able to enter 
the dimensions and type of interconnect polygons, source or drain size, 
other gate capacitance or dimensions. The ramification of using poly or 
diffusion interconnect on calculating sizes for speed is that the 
resistance more quickly adds in these cases than for metal interconnect. 
Hence, the fanout required for a certain delay per gate would end up 
having its low boundary of feasibility be somewhat greater than what it 
would've been had no interconnect resistance been present. This, of 
course, is dictated by the amount of interconnect resistance associated 
with a node resulting in a faster logic gate delay requirement. So the 
fanout calculation would be iterative and based on the following 
sequence [8, pp. 5-11]: 1- determine fanout excluding interconnect
resistance, 2- calculate the transistor widths which in turn determines
Ron(i), 3- add the interconnect resistance to Ron(i), 4- calculate the 
subsequent delay, 5- reduce the fanout by the factor that this new delay 
overestimates the original delay per gate, 6- continue iterating until 
the delay associated with including the interconnect resistance is 
acceptably close to the original desired delay. Note that it would be 
wise to burden all gates of a delay path with the reduced delay time 
caused by the presence of interconnect resistance rather than only the 
gate driving that resistance. 
APPENDIX D 




Gate Sizing Program 
copyright 1987, MJ Amatangelo 
PATH_IO.ADA
with text_io;with path_list,techfile,float_math_lib;
use text_io;
procedure path_io is
package int_io is new text_io.integer_io(integer);
package float_io is new text_io.float_io(float);
type input_type is array (integer range 1 .. 99) of integer;
my_in_file,my_out_file:file_type;
type gate_type is record
dev_type:string(1 .. 6);

















output_node,input_node:integer;





beta_pn,beta_np:float; -- P:N and N:P beta ratios, respectively
lfm:float:=0.0; -- relative layout area figure of merit
--procedure to print gate type information:
procedure printgate(info:in gate_type);
--Instantiate package path_list:
package order is new path_list(gate_type,printgate);
gate_list:order.ptrtype;
--Define procedure printgate:
procedure printgate(info:in gate_type) is
begin
put(info.dev_type);









open(my_in_file,in_file,"network.dat");
create(my_out_file,out_file,"foo.foo");
while not end_of_file(my_in_file) loop
gate.dev_type:=(others=>' ');
gate.dev_name:=(others=>' ');
get(my_in_file,char);
while char=' ' loop
get(my_in_file,char);
end loop;
gate.dev_name(1):=char;
for i in 1 .. 5 loop
get(my_in_file,char);




get(my_in_file,char);
while char=' ' loop
get(my_in_file,char);
end loop;
gate.dev_type(1):=char;
for i in 1 .. 5 loop
get(my_in_file,char);




if gate.dev type="inv " then gate.numins:=1; 
gate.pser:=l;gate.ppar:=l;gate.nser:=l;gate.npar:=1; 
elsif gate.dev type="nand2" then gate.numins:=2; 
gate.pser:=l;gate.ppar:=2;gate.nser:=2;gate.npar:=l; 
elsif gate.dev type="nor2 "then gate.numins:=2; 
gate.pser:=2;gate.ppar:=l;gate.nser:=l;gate.npar:=2; 
elsif gate.dev type="nand3" then gate.numins:=3; 
gate.pser:=l;gate.ppar:=3;gate.nser:=3;gate.npar:=1; 
elsif gate.dev type="nor3 "then gate.numins:=3; 
gate.pser:=3;gate.ppar:=l;gate.nser:=l;gate.npar:=3; 
elsif gate.dev type="nand4" then gate.numins:=4; 
gate.pser:=l;gate.ppar:=4;gate.nser:=4;gate~npar:=1; 
elsif gate.dev type="nor4 11 then gate.numins:=4; 
gate.pser:=4;gate.ppar:=l;gate.nser:=l;gate.npar:=4; 
elsif gate.dev type="xgate II then gate.numins:=3; 
gate.pser:=l;gate.ppar:=l;gate.nser:=l;gate.npar:=1; 
elsif gate.dev_type="triinv" then gate.numins:=3; 
68 
gate.pser:=2;gate.ppar:=l;gate.nser:=2;gate.npar:=1; 
elsif gate.dev type="pprel" then gate.numins:=5;--4p-nand 
gate.pser:=2;gate.ppar:=4;gate.nser:=l;gate.npar:=1; 
elsif gate.dev type="nprel" then gate.numins:=5;--4n-nor 
gate.pser:=l;gate.ppar:=l;gate.nser:=2;gate.npar:=4; 





for i in 1 .. gate.numins loop
int_io.get(my_in_file,gate.input(i));






--check for continuity
loop_count:=order.num_nodes(gate_list);
order.retreivenode(gate_list,loop_count,gate,error);
output_node:=gate.output;
for i in 1 .. (loop_count-1) loop
order.retreivenode(gate_list,loop_count-i,gate,error);
for j in 1 .. gate.numins loop
input_node:=gate.input(j);











--Request allocated time
loop
begin --Exception block
new_line;put("Enter the desired prop delay>>");
float_io.get(prop);new_line;
if prop<1.0e-11 then
put("Error: specified prop time too small or negative.");
elsif prop>1.0e-5 then





when DATA_ERROR=>new_line(2);
put("*** Enter a float type, e.g. 12.0E-9 ");
skip_line;new_line;
end; --end block
end loop;
for i in 1 .. loop_count loop
order.retreivenode(gate_list,i,gate,error);
if not(gate.dev_type="xgate ") then




--inquire the output measurement level (10 & 90% OR 50%)
put_line("You may measure the propagation time from 50% of the input signal");
put_line("to either the 10/90% output level or the 50% output level.");
put("Do you want the 10/90% output level? Y or N: ");
get(response); skip_line;
if ((response='Y') or (response='y')) then
put_line(" (yes)");
dpg:=prop/(float(num_delays)+2.3); --delay per gate
else
put_line(" (no) ");
dpg:=prop/float(num_delays); --delay per gate
end if;
put("prop =");float_io.put(prop);new_line;
put("num_delays =");int_io.put(num_delays);new_line;
put("dpg =");float_io.put(dpg);new_line;
loop
begin --Exception block
loop
put("Enter desired temperature(-55, 25, 125): ");
int_io.get(temperature);new_line;
put("Enter desired process (T=typical, B=best, W=worst): ");
get(process);skip_line;new_line;
put("Enter the supply voltage (4.5, 5.0, 5.5): ");
float_io.get(supply);
new_line;
techfile.calc_fanout(temperature,supply,process,m_pass,m_act,b_pass,
b_act,error);








put_line("*** Please enter data properly:");
put_line(" For Temperature, Supply or Process, enter one of the");
put_line(" given choices exactly; e.g. 125 or 4.5 or T ");
skip_line;new_line;
end; --end block
end loop;
--fanout calculations
fanout:=(dpg*m_pass)+b_pass;
fanoutact:=(dpg*m_act)+b_act;
if fanout<1.0 then
raise fanout_error;
end if;
new_line;
put("The calculated fanout is ");float_io.put(fanout);
new_line(2);
-- Calculate the Widths




when data_error=> new_line(2);
put_line("Execution terminated ... ");
put_line("Check TECHFILE. All of the following parameters are");
put_line("of type float (e.g., use 4.0, not 4):");
put_line("ldrawn,cox,wpn,cdwp,cdwn,minwidth,vtn,vtp,kn,kp,delw");
raise end_error;
end; --block
for i in 1 .. loop_count loop
--calculate equivalent inverter of driver
order.retreivenode(gate_list,i,gate,error);
cprev:=cprev+gate.cload;






-- x is an adjustment to the fanout for parallel transistors.
-- note: if ldrawn_p .NE. ldrawn_n, simply multiply each of the
-- n and p product terms above by 1/ldrawn_n and 1/ldrawn_p,
-- respectively rather than multiplying both by 1/ldrawn






-- note: for ldrawn_p .NE. ldrawn_n, replace
-- "(ldrawn*cox*(wpn+1.0)*x)" with
-- "(cox*(ldrawn_p*wpn + ldrawn_n)**x)"
-- in both expressions for pwid and nwid
--sizing allowance for precharge logic gates
if gate.dev_type(1..4)="ppre" or gate.dev_type(1..4)="npre" then
new_line;
if pctime<0.0 then --inquire the precharging time
loop
begin --Exception block
new_line;put("Enter the precharge time in seconds>>");
float_io.get(pctime);new_line;
if pctime<1.0e-9 then
put("Error: specified prop time too small or negative.");
elsif pctime>1.0e-6 then





when DATA_ERROR=>new_line(2);
put("*** Enter a float type, e.g. 25.0E-9 ");




--calculate the precharge transistor size 
tempf:=fanout; 
gatetime:=dpg; 
ron:=pctime/(2.3*cprev); --accounts for 90% transition [-ln(.1)]




loop --iteratively find the effective function block transistor size
--by comparing the adjusted time (gatetime) with dpg
beta_pn:=pwid/(nwid*wpn);
gatetime:=float_math_lib.sqrt(float_math_lib.sqrt(1.0/beta_pn))
*gatetime;




















gatetime:=(tempf-b_pass)/m_pass;




--accommodate parallel transistors for precharge devices
if gate.dev_type(1..4)="npre" then
x:=fanout-(float(gate.npar/2)*0.625*cdwn
*(1.0+((float(gate.nser-1))/(float(gate.nser)))));




--calculate actual sizes based on transistor configuration 
gate.pwidth:=float(extra+gate.pser)*pwid; 
gate.nwidth:=float(extra+gate.nser)*nwid; 










if delw<0.0 then
put_line(" Techfile parameter DELW must be positive.");
put_line(" Execution terminated.");
raise end_error;
end if;
--update gate sizes into linked list
order.add_info(gate_list,i,gate,error);
--go back and update sizes for "driven" xgate(s)
pwid:=gate.pwidth;
nwid:=gate.nwidth;




order.add_info(gate_list,i-j,gate,error);
end loop;
extra:=0;





new_line;put("P:N Width Ratio is ");float_io.put(wpn);
y:=fanout/fanoutact;
new_line;put("fanout/fanoutact ratio is ");float_io.put(y);
-- Calculate relative layout area figure of merit
for i in 1 .. loop_count loop
order.retreivenode(gate_list,i,gate,error);
if not (gate.dev_name(1..4)="ppre" or gate.dev_name(1..4)="npre" or
gate.dev_name="xgate ") then
lfm:=lfm+(float(gate.numins)*(gate.pwidth+gate.nwidth));
elsif gate.dev_name(1..4)="ppre" then
lfm:=lfm+((gate.pwidth*float(gate.numins))+gate.nwidth);






-- Write records to output file
put(my_out_file,"The calculated fanout is ");float_io.put(my_out_file,fanout);
new_line(my_out_file,2);
new_line(my_out_file);
set_col(my_out_file,1);
put(my_out_file,"NAME");
set_col(my_out_file,9);
put(my_out_file,"TYPE");
set_col(my_out_file,22);
put(my_out_file,"INPUTS");
set_col(my_out_file,37);
put(my_out_file,"OUTPUT");
set_col(my_out_file,47);
put(my_out_file,"CLOAD");
set_col(my_out_file,59);
put(my_out_file,"PWIDTH");
set_col(my_out_file,71);
put(my_out_file,"NWIDTH");
new_line(my_out_file,2);
for i in 1..(loop_count) loop
order.retreivenode(gate_list,loop_count-i+1,gate,error);
set_col(my_out_file,1);
put(my_out_file,gate.dev_name);
set_col(my_out_file,9);
put(my_out_file,gate.dev_type);
k:=4;
set_col(my_out_file,17);
for i in 1..gate.numins loop
int_io.put(my_out_file,gate.input(i),4);
if k=0 then
new_line;






set_col(my_out_file,37);
int_io.put(my_out_file,gate.output,4);
set_col(my_out_file,45);
float_io.put(my_out_file,gate.cload,1,3,3);
set_col(my_out_file,58);
float_io.put(my_out_file,gate.pwidth,1,3,3);
set_col(my_out_file,70);
float_io.put(my_out_file,gate.nwidth,1,3,3);
new_line(my_out_file,2);
end loop;
put(my_out_file,"The RELATIVE LAYOUT AREA figure of merit is");
float_io.put(my_out_file,lfm);
new_line(my_out_file);
close(my_in_file);
close(my_out_file);
exception
when END_ERROR=>new_line(2);
put_line("END OF FILE");--don't need with "end_of_file()"
when dev_type_error=>put_line("Program Terminated. Unknown device type.");
put(gate.dev_type);put_line(" is not a recognized element.");
put_line("Note: All device types are in lower case only.");
when disconnect=>new_line(2);
put_line("Sorry, Termination in main program caused by DISCONNECT!");
put(" Device ");put(gate.dev_name);
put("has no input matching the previous gate's output.");
when fanout_error=>new_line(2);
new_line(2);
put_line("Program terminated because of fanout error. ");
put_line(" Error in time allotment (too small) OR");
put_line(" Path length has too many gate delays ");
if error then
put_line(" OR");
put(gate.dev_name);put_line(" has too many inputs for given fanout.");
end if;
when DATA_ERROR=>new_line(2);
put_line("Error in input data file. Program Terminated.");









new_line(2);
(--up to 6 --"); 
characters) (--integers--) (float)"); 
Device Device Inputs Output CLoad "); 
Name Type "); 




put_line(" Check Tech Table for particular DEVICE TYPE's number of inputs
when numeric_error=>new_line(2);





Temp= 125; Vdd = 4.5; Process= T (typical)"); 
Gate Sizing Program 
copyright 1987, MJ Amatangelo
PATH_LIST.ADA
--THE FOLLOWING IS A PACKAGE TO IMPLEMENT A LINKED LIST (ORDERED)
--OF OBJECTS WHOSE TYPE IS DEFINED BY THE INSTANTIATING PACKAGE.
with text_io;use text_io; 
generic 
--types,functions to be instantiated by user program 
type itemtype is private; 
with procedure printlist(data:in itemtype) is<>; 
--start of linked list package 
package path_list is 
package bool io is new enumeration_io(boolean);use bool_io; 
type lltype is private; 
type ptrtype is access lltype; 
--procedure for initializing a node for a list of type lltype: 
procedure getnode(data:in itemtype;node:in out ptrtype); 
--procedure for inserting a node into the list; input is the particular 
--data and a pointer to the beginning of the list while the link is returned 
procedure insert(head:in out ptrtype;information:in itemtype); 
--procedure for printing a list 
procedure print(node:in ptrtype); 
--function for retrieving the number of nodes in the list
function num_nodes(name:in ptrtype) return integer;
--procedure for retrieving information of a particular node within the list
--given the index
procedure retreivenode(name:in out ptrtype;
index:in integer;data:out itemtype;error:out boolean);
--procedure for adding information to a particular node within the list
procedure add_info(name:in out ptrtype;index:in integer;data:in itemtype;
error:out boolean);
over_range:exception; --flag for index being too big
--private types 
private 






--start of package body for path_list
package body path_list is









if node=null then
put_line("no records available");
else
p:=node; 





procedure insert(head:in out ptrtype;information:in itemtype) is 
current,temp:ptrtype; 
begin 
if head=null then --ie, start of new list 
getnode(information,temp); 
head:=temp; 










if name=null then 




while p/=null loop 







procedure retreivenode(name:in out ptrtype; 




if index<=0 then
error:=true;put_line("Error in index value");
else 
p:=name; 
for i in 2 .. index loop 
p:=p.next; 






when over_range=>put_line("Index exceeds list size.");
new_line;
end retreivenode; 
procedure add_info(name:in out ptrtype;index:in integer;data:in itemtype;




if index<=0 then
error:=true;put_line("Error in index value"); 
else 
p:=name; 
for i in 2 .. index loop 
p:=p.next; 






when over_range=>put_line("Index exceeds list size.");
new_line;
end add_info;
end path_list;
Gate Sizing Program 
copyright 1987, MJ Amatangelo 
TECHFILE.ADA 
--This package contains the information to enable computation of the fanout 
--and transistor widths. 
with text_io;
use text_io;
package bool_io is new enumeration_io(boolean);
with text_io,float_io,int_io,bool_io;
use text_io,float_io,int_io,bool_io;
package techfile is













--slope and offset params for passive fanout
--slope and offset params for active fanout
--drain capacitance per unit width (F/um)
--L effective (um)
--Oxide capacitance per unit area (F/um**2)
--cox*u/2, gain factor
--device zero bias thresholds
--p:n width ratio for symmetrical transitions
--minimum device width
--loss of width because of processing
--supply voltage provided by user
--Procedure that given a prop delay per gate in seconds, temperature in 
--degrees centigrade (valid temperature values are -55, 25, 125), supply 
--voltage (valid supply voltages are 4.5,5.0,5.5) and process integrity 
--identifier (w:worst case, t:typical, b:best case) will return parameters 
--to enable fanout calculation. 
procedure calc_fanout(temperature:in integer;supply:in float;
process:in character;m_pass:out float;m_act:out float;b_pass:out float;
b_act:out float;error:in out boolean);
--Procedure to pass parameters for calculating widths and capacitance;
procedure get_params(ldrawn:out float;cox:in out float;wpn:in out float;
cdwp:out float;cdwn:out float;minwidth:out float;vdd:in out float; 
vtn:in out float;vtp:in out float;kn:in out float; 
kp:in out float;delw:out float); 
end techfile; 
package body techfile is 
-- SPICE FILE 
--* 1.25 < EPI < 2.0 OHM-CM; 1.5K < PWELL < 1.9K OHMS/SQUARE
--.MODEL PCH PMOS(VT=-1 UB=240 FRC=0.2 TOX=300 DNB=68.54E14 XJ=0.4
--+ LATD=.1 OXETCH=-.4 DEL=0.2 CJA=0.28E-3 PHA=0.50 LEVEL=4
--+ EXA=0.50 EXP=0.20 PHP=0.20 CJP=0.55E-9 TCV=0.003 BEX=1.9 PTC=9E-5
--+ FSS=2.2E11 FSB=1.9E-4 SCM=0.5 ECV=2.5 VST=4E7 NVM=0.85)
--.MODEL NCH NMOS(VT=1 UB=590 FRC=0.03 TOX=300 DNB=1094E14 XJ=0.4
--+ LATD=.1 OXETCH=-.4 DEL=0.1 CJA=1.12E-3 PHA=0.85 LEVEL=4
--+ EXA=0.50 EXP=0.20 PHP=0.40 CJP=1.3E-9 TCV=0.003 BEX=2.2 PTC=9E-5
--+ FSS=1.2E11 FSB=2E-5 SCM=0.3 ECV=2.5 VST=4E7 NVM=1.0)
--* 
procedure get_params(ldrawn:out float;cox:in out float;wpn:in out float; 
cdwp:out float;cdwn:out float;minwidth:out float;vdd:in out float; 
vtn:in out float;vtp:in out float;kn:in out float; 















--F/um (for bias of 0 volts across junction)
--F/um ( " )
--um (used to determine gain ratio)
--um ( " )









wpn:=1.80; --P to N width ratio at min L to yield vinv=vdd/2; if 0.00,
--then calculate wpn using leff's, vt's and k's.




procedure calc_fanout(temperature:in integer;supply:in float;
process:in character;m_pass:out float;m_act:out float;b_pass:out float;
b_act:out float;error:in out boolean) is
begin
case process is
when 'b'|'B' => error:=false;
put_line("Best Case parameters aren't available yet.");raise end_error;
case temperature is
when -55 =>
if (supply<4.6 and supply>4.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;






when 25 =>
if (supply<4.6 and supply>4.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.6 and supply>5.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
else
error:=true;
end if;
when 125 =>
if (supply<4.6 and supply>4.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.6 and supply>5.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
else
error:=true;
end if;
when others => error:=true;
end case;
when 'w'|'W' => error:=false;
put_line("Worst Case parameters aren't available yet.");raise end_error;
case temperature is
when -55 =>
if (supply<4.6 and supply>4.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.6 and supply>5.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
else
error:=true;
end if;
when 25 =>
if (supply<4.6 and supply>4.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;





when 125 =>
if (supply<4.6 and supply>4.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.6 and supply>5.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
else
error:=true;
end if;
when others => error:=true;
end case;
when 't'|'T' => error:=false;
case temperature is
when -55 =>
if (supply<4.6 and supply>4.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.6 and supply>5.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
else
error:=true;
end if;
when 25 =>
if (supply<4.6 and supply>4.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.6 and supply>5.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
else
error:=true;
end if;
when 125 =>
if (supply<4.6 and supply>4.4) then
m_pass:=4.80E9;m_act:=5.21E9;b_pass:=-2.76;b_act:=-2.72;
elsif (supply<5.1 and supply>4.9) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
elsif (supply<5.6 and supply>5.4) then
m_pass:=-1.0;m_act:=-1.0;b_pass:=-1.0;b_act:=-1.0;
else
error:=true;
end if;
when others => error:=true;
end case;
when others=> error:=true;
end case;
if error then
new_line;
put_line("*** Enter valid inputs ... ");
put_line("Valid temperature values are -55, 25, 125 ");
put_line("Valid supply voltages are 4.5, 5.0, 5.5 ");
put_line("Valid process types are W B T (worst,best,typical). ");
m_pass:=0.0;m_act:=0.0;b_pass:=0.0;b_act:=0.0;
end if;
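
The slope and offset values returned by calc_fanout can be tried out in a small stand-alone program. The sketch below is illustrative only and is not part of the original listing. It assumes that fanout is a linear function of the per-gate propagation delay (fanout = slope * delay + offset), which the comments in the listing suggest but do not state outright. The names fanout_sketch, linear_fanout and gate_delay, and the 2 ns delay target, are hypothetical. The four coefficients are the ones the listing gives for the typical process at 125 degrees C and Vdd = 4.5 V.

```ada
-- Sketch only. ASSUMPTION: fanout = slope * delay + offset.
-- The listing supplies the coefficients but not this formula.
with text_io; use text_io;

procedure fanout_sketch is
   package flt_io is new float_io(float);

   -- Coefficients listed in calc_fanout for process 'T', 125 C, Vdd = 4.5 V.
   m_pass : constant float := 4.80E9;  -- slope, passive fanout
   b_pass : constant float := -2.76;   -- offset, passive fanout
   m_act  : constant float := 5.21E9;  -- slope, active fanout
   b_act  : constant float := -2.72;   -- offset, active fanout

   -- Hypothetical per-gate propagation delay target, in seconds.
   gate_delay : constant float := 2.0E-9;

   -- Evaluate the assumed linear relation for one slope/offset pair.
   function linear_fanout(slope, offset, t : float) return float is
   begin
      return slope * t + offset;
   end linear_fanout;

begin
   put("Passive fanout estimate: ");
   flt_io.put(linear_fanout(m_pass, b_pass, gate_delay),
              fore => 2, aft => 2, exp => 0);
   new_line;
   put("Active fanout estimate:  ");
   flt_io.put(linear_fanout(m_act, b_act, gate_delay),
              fore => 2, aft => 2, exp => 0);
   new_line;
end fanout_sketch;
```

The placeholder value -1.0 used in the other table entries would give meaningless results under this assumption, so a caller would presumably need to check for it before applying the relation.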






[1] Glasser, L. and L. Hoyte. Delay and Power Optimization in VLSI
Circuits. Proceedings of 21st Design Automation Conference,
June 1984.
[2] Mead, C. and L. Conway. Introduction to VLSI Systems. Reading,
Mass.: Addison-Wesley, 1980.
[3] Glasser, L. and D. Dobberpuhl. The Design and Analysis of VLSI
Circuits. Reading, Mass.: Addison-Wesley, 1985.
[4] Weste, N. and K. Eshraghian. Principles of CMOS VLSI Design: A
Systems Perspective. Reading, Mass.: Addison-Wesley, 1985.
[5] Goncalves, N. and H. De Man. NORA: A Racefree Dynamic CMOS
Technique for Pipelined Logic Structures. IEEE J. Solid State
Circuits, vol. SC-18, No. 3, June 1983.
[6] Booch, G. Software Engineering with Ada. Menlo Park, Cal.:
Benjamin/Cummings, 1983.
[7] Matson, M. "Macromodeling and Optimization of Digital MOS VLSI
Circuits." Ph.D. dissertation, Massachusetts Institute of
Technology, January 1985.
[8] Amatangelo, M. and P. Curtis. The Common Sense Approach to
Successful Foundry Utilization. Luton, Beds, UK: J. of
Semicustom ICs, vol. 5 no. 1, September 1987.
