Delay estimation for CMOS functional cells by Madsen, Jan
Published in: Proceedings of the 2nd European Design Automation Conference
DOI: 10.1109/EDAC.1991.206369
Publication date: 1991
Document version: Publisher's PDF, also known as Version of record
Citation (APA): Madsen, J. (1991). Delay estimation for CMOS functional cells. In Proceedings of the 2nd European Design Automation Conference. IEEE. DOI: 10.1109/EDAC.1991.206369
Delay Estimation for CMOS Functional Cells * 
Jan Madsen 
Microelectronics Center 
Technical University of Denmark 
DK2800 Lyngby, Denmark 
Email: jan@dc.dth.dk
Abstract 
This paper presents a new RC tree network model for delay
estimation of CMOS functional cells. The model is able to reflect
topological changes within a cell, which is of particular interest
when doing performance driven layout synthesis. Further, a set of
algorithms to perform worst case analysis on arbitrary CMOS
functional cells using the proposed delay model is presented. Both
model and algorithms have been implemented as a part of a cell
compiler (CELLO) working in an experimental silicon compiler
environment.
1 Introduction 
Using cell compilers, which translate a transistor netlist (or
boolean function) description into mask layout, makes it possible
to replace many primitive gates, such as NAND and NOR gates, with
a single complex gate tuned to the circuit requirements. This
methodology leads to a much larger solution space which enables
both area and performance driven layout synthesis.
Mapping an optimized function into an appropriate transistor
netlist is a one-to-many mapping. The chosen topology will
influence both area and performance. Usually performance is
optimized at a high level by optimizing boolean expressions or at
a low level by transistor sizing. However, choosing the right
topology may add yet another performance optimizing step to
performance driven layout synthesis. For this purpose we need a
simple and efficient model of a general CMOS functional cell which
is able to reflect performance properties.
Using the linear RC model for modelling digital MOS circuits has
become a well accepted practice for estimating circuit delays.
This modelling scheme was pioneered by Elmore [1], whose notion of
signal delay as the first-order moment of the impulse response has
been used widely to approximate the time taken for a signal to
reach half of its final value.
In 1983, Penfield, Rubinstein and Horowitz [2] proposed a method
to calculate upper and lower bounds for the delay of an RC tree
network. This method was extended by Lin and Mead [3] to general
RC networks and to cover the effect of parallel connections and
stored charge, i.e., arbitrary initial charge distribution. Other
research on estimating signal delay has been carried out [4], [5].
Though all these methods give good estimates, they are not able to
reflect the effect of topological changes within a CMOS functional
cell.
In this paper a new RC model, called the M model, is proposed. The
M model is able to reflect topological changes within a CMOS
functional cell and is based upon a detailed study of the impact
on circuit delay when changing the order of the transistors in a
circuit [6]. Further, we propose an algorithm which uses the M
model to estimate worst case delay.
*This project has partly been sponsored by the ESPRIT Basic
Research Action 3281.
Figure 1: Simple CMOS gate showing "drive" and "load" network
when charging output.
2 Model and Problem Formulation
A CMOS transistor circuit at the cell level is composed of a
pull-up and a pull-down network. Each of these two networks can be
represented by an undirected two-terminal multigraph G, in which
an edge (e) represents the drain/source connection of a transistor
and a vertex (v) the net connecting several transistors. Each edge
is indexed by the gate net (signal) of the corresponding
transistor. The two terminals of the multigraph represent the
power (source) and the output (drain) (see Figure 1).
2130/91/0000/0101$01.00 © 1991 IEEE
Authorized licensed use limited to: Danmarks Tekniske Informationscenter. Downloaded on July 06,2010 at 06:59:35 UTC from IEEE Xplore.  Restrictions apply.
In order to estimate worst case delay and switching condition, it
is assumed that only one transistor is switching, i.e., all other
transistors are either turned ON, in which case they are kept in
the graph, or turned OFF and removed from the graph. The graph
through which the drain is charged or discharged is denoted the
"driving" network. In a CMOS network, a conducting path from
source to drain in the driving network (e.g. transistor b1 in
Figure 1) corresponds to a cutset in the complementary network
(e.g. transistor b2 in Figure 1), denoted the "loading" network.
This cut may still leave transistors in the loading network which
have a path to the drain (e.g. transistor a2 in Figure 1) and
therefore contribute to the charge or discharge of the drain as
extra load. Thus, the relation between the two networks depends
highly on their topology, i.e., how transistors are arranged, and
the model has to handle both networks.
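The graph view described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `Transistor`, `conducting_edges` and `has_path`, and the two-input example gate, are hypothetical and are not taken from the paper or from CELLO. The sketch keeps the transistors that are ON, drops those that are OFF, and checks which nets remain connected to the output.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Transistor(NamedTuple):
    """One multigraph edge: a drain/source connection indexed by its gate net."""
    u: str     # net at one end of the channel
    v: str     # net at the other end of the channel
    gate: str  # gate signal controlling the transistor


def conducting_edges(network, on_gates):
    """Keep the transistors that are turned ON; OFF transistors are removed."""
    return [t for t in network if t.gate in on_gates]


def has_path(edges, start, goal):
    """Breadth-first search over an undirected multigraph."""
    adjacency = defaultdict(list)
    for t in edges:
        adjacency[t.u].append(t.v)
        adjacency[t.v].append(t.u)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical two-input gate: parallel pull-up, series pull-down.
pull_up = [Transistor("vdd", "out", "a"), Transistor("vdd", "out", "b")]
pull_down = [Transistor("out", "x", "a"), Transistor("x", "gnd", "b")]

# Assumed input state: a is high, b is low. A p-type pull-up transistor
# conducts when its gate is low, an n-type pull-down when its gate is high.
drive = conducting_edges(pull_up, {"b"})
load = conducting_edges(pull_down, {"a"})

print(has_path(drive, "vdd", "out"))  # conducting path in the driving network
print(has_path(load, "out", "gnd"))   # the loading network is cut
print(has_path(load, "out", "x"))     # internal net x still loads the output
```

In this example the conducting pull-up transistor cuts the pull-down network, yet the internal net `x` stays connected to the output and therefore acts as extra load, which is the situation the paragraph describes.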
Thus, the problem of finding the switching condition which leads
to the worst case delay may be formulated as:
Find the longest path from source to drain in the driving network,
where the transistor next to the source is the one switching,
which leads to the highest influence of the loading network.
It is further assumed that the circuit has been stabilized before
switching, i.e., that all vertices in the driving network have the
same potential, and likewise for the vertices in the loading
network.
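The first half of this formulation, finding the longest source-to-drain path, can be sketched as follows. This is a minimal sketch under assumptions, not the paper's algorithm: path length is taken to be the plain Elmore time constant of the series path, the influence of the loading network is ignored, and the function names, edge tuples, resistances and capacitances are hypothetical example values.

```python
def simple_paths(edges, source, drain):
    """Yield every loop-free source-to-drain path.

    `edges` is a list of (u, v, name, resistance) tuples describing an
    undirected multigraph; parallel transistors are separate tuples.
    Each path is a list of (edge, reached_node) pairs.
    """
    def walk(node, visited, path):
        if node == drain:
            yield list(path)
            return
        for edge in edges:
            u, v, _name, _r = edge
            if node not in (u, v) or u == v:
                continue
            nxt = v if node == u else u
            if nxt in visited:
                continue  # never revisit a node, so no loops
            path.append((edge, nxt))
            yield from walk(nxt, visited | {nxt}, path)
            path.pop()

    yield from walk(source, {source}, [])


def elmore_length(path, node_cap):
    """Elmore time constant of a series path: each node capacitance times
    the total resistance between the source and that node."""
    total, r_upstream = 0.0, 0.0
    for (_u, _v, _name, r), node in path:
        r_upstream += r
        total += r_upstream * node_cap[node]
    return total


# Hypothetical driving network: transistors a and b in series,
# in parallel with transistor c (arbitrary units).
edges = [
    ("src", "n1", "a", 1.0),
    ("n1", "out", "b", 1.0),
    ("src", "out", "c", 1.0),
]
node_cap = {"n1": 1.0, "out": 2.0}

paths = list(simple_paths(edges, "src", "out"))
longest = max(paths, key=lambda p: elmore_length(p, node_cap))
print([edge[2] for edge, _node in longest], elmore_length(longest, node_cap))
```

Enumerating every simple path is exponential in the worst case, which is acceptable for the small graphs of a single functional cell but would not scale to large networks.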
3 The M Model 
Because of symmetry, the following discussion has been restricted
to the case of charging the output through the pull-up network
(i.e., the driving network), while the pull-down network acts as
the loading network.
Figure 2: a) Example circuit and b) voltage of internal nodes
during a switch which charges the output node.
Driving Network 
The driving network contributes two time components, $T_{Drive}$
and $T_{Branch}$.
- Driving Path
$T_{Drive}$ accounts for the time used to charge or discharge the
output through a series connection of transistors. The calculation
is based upon the Elmore time constant. A correction factor of 2
on the first resistance accounts for the fact that the switching
transistor is not turned on immediately. $T_{Drive}$ can be
expressed as:
[Equation (1) is not legible in the source.]
where $i$ is the distance from the source, $n$ is the output node
and $a_i$ is a correction factor accounting for the different
initial charge distribution and the different amount of influence.
$a_i$ is defined as:
$$a_i = \frac{V_{DD} - V_{TD} - i \cdot \Delta V}{V_{DD}}$$
[The source distinguishes the cases $i < n$ and $i = n$, but only
this one expression is legible.]
where $V_{TD}$ is the threshold voltage and, referring to Figure 2,
$\Delta V$ is the voltage difference between two neighbouring
nodes in the period where the voltages tend to saturate.
$\Delta V$ can be expressed as:
[Equation not legible in the source.]
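As a rough illustration of the driving-path calculation, the sketch below computes the ordinary Elmore time constant of a series chain and applies the factor 2 to the first resistance as the prose describes. It is an assumption-laden sketch rather than the paper's equation (1): the per-node correction factors $a_i$ are not applied, the factor is assumed to scale the first resistance wherever it appears in the Elmore sum, and the function name and example values are hypothetical.

```python
def drive_time_constant(resistances, capacitances, first_factor=2.0):
    """Elmore-style time constant of a series chain of n transistors.

    resistances[k] is the transistor between node k and node k+1, where
    node 0 is the source; capacitances[k] is the capacitance at node k+1.
    The first resistance is scaled by `first_factor` (assumed reading of
    the "correction factor of 2 on the first resistance").
    The M model correction factors a_i are NOT applied here.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per transistor in the chain")
    total = 0.0
    r_upstream = 0.0
    for index, (r, c) in enumerate(zip(resistances, capacitances)):
        r_upstream += first_factor * r if index == 0 else r
        total += r_upstream * c  # node capacitance times upstream resistance
    return total


# Hypothetical three-transistor chain in arbitrary units.
r_chain = [1.0, 1.0, 1.0]
c_nodes = [1.0, 1.0, 2.0]
print(drive_time_constant(r_chain, c_nodes))                    # factor 2 applied
print(drive_time_constant(r_chain, c_nodes, first_factor=1.0))  # plain Elmore
```

Comparing the two calls shows how much of the time constant is attributed to the slow turn-on of the switching transistor under this reading.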
- Branches in the Driving Network 
Consider Figure 3, which shows a path with a branch placed on node
$i$. The influence of the branch is not felt until the next node
$i+1$ has to be charged. Therefore the influence of the branch is
brought to node $i+1$ rather than node $i$, which is the case for
the Elmore model.
Figure 3: Driving path from source to drain with a branch placed
on node $i$.
The contribution to the delay of the driving network by having
branches at node $i$ can be expressed as:
$$T_{Branch,i} = c_i \cdot \left[ a_{i+1} \cdot \left( \sum_{j=1}^{i+1} R_{D_j} \right) \cdot C_{B_i} \right] \qquad (2)$$
where $c_i$ accounts for the effect of different initial
conditions for the output and the nodes in the loading network.
$c_i$ is defined by cases: $c_i = 0$ if there is no branch on node
$i$; the values of the remaining cases, given in the source for
$i < n-1$ and for $i = n$, are not legible.
I.e., if the branch is added at node $n-1$, the effect is shown at
the output node $n$, but since $n$ initially is at 0 voltage while
$n-1$ is at $V_{TD}$, the branch capacitance will not be charged
until the output reaches $V_{TD}$. Similarly, for a branch added
at the output, the branch capacitance will not be charged until
the output reaches $2V_{TD}$. In this case $a_{n+1}$ is given a
fixed value (not legible in the source), which reflects the larger
sensitivity when adding a capacitance to a loading node.
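The difference between the branch treatment described here and the plain Elmore treatment can be made concrete with a small sketch. It assumes that equation (2) reads $T_{Branch,i} = c_i \cdot a_{i+1} \cdot (\sum_{j=1}^{i+1} R_{D_j}) \cdot C_{B_i}$; the correction factors $c_i$ and $a_{i+1}$ are passed in by the caller because their values are not reproduced here, and the function names and example numbers are hypothetical.

```python
def branch_contribution(r_drive, i, c_branch, c_i=1.0, a_next=1.0):
    """Branch term with the branch capacitance moved to node i+1.

    r_drive[0] is R_D1, the transistor next to the source, so the sum over
    j = 1 .. i+1 is sum(r_drive[:i + 1]). `c_i` and `a_next` (a_{i+1}) are
    caller-supplied correction factors; 1.0 means "no correction".
    Only branches on internal nodes (i + 1 <= n) are handled in this sketch.
    """
    if not 1 <= i < len(r_drive):
        raise ValueError("i must satisfy 1 <= i <= n - 1")
    return c_i * a_next * sum(r_drive[: i + 1]) * c_branch


def elmore_branch_contribution(r_drive, i, c_branch):
    """Plain Elmore term: the branch capacitance is charged through the
    resistance from the source up to node i only."""
    return sum(r_drive[:i]) * c_branch


# Hypothetical chain of three unit resistances with a branch on node 1.
r_drive = [1.0, 1.0, 1.0]
print(branch_contribution(r_drive, 1, 2.0))         # resistance up to node 2
print(elmore_branch_contribution(r_drive, 1, 2.0))  # resistance up to node 1
```

With all correction factors set to 1, the only difference between the two results is the extra resistance $R_{D_{i+1}}$, which is how the shift of the branch influence from node $i$ to node $i+1$ shows up numerically.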
The branch capacitance $C_{B_i}$ is the capacitance at the first
node in the branch counted from node $i$. If the branch consists
of a large RC tree, all node capacitances within this tree are
pushed to this first node $b$, and the value of each capacitor is
corrected according to its smaller influence when placed far from
node $i$. This situation differs from the loading network in the
sense that all nodes in the branch are initially at $V_{TD}$. The
branch capacitance can be expressed as:
[Equation not legible in the source: a sum over the nodes of the
branch.]
where the sum is taken over all nodes in the branch and the two
fractions account for the different driving resistances and the
initial node voltage at $V_{TD}$, respectively:
$$\Delta R = \frac{R_{Path_i} + R_{D_b}}{R_{Path_i} + R_{D_b} + R_{D_k}}$$
$$\Delta V = \frac{V_{DD} - (j-1) V_{TD}}{V_{DD}}$$
where $j$ is the distance from node $i$.
In order to calculate the total contribution from all branches, we
add the contribution from all branches along the driving path,
i.e., the contribution can be expressed as:
$$T_{Branch} = \sum_{i=1}^{n} c_i \cdot a_{i+1} \cdot \left( \sum_{j=1}^{i+1} R_{D_j} \right) \cdot C_{B_i} \qquad (3)$$
Loading Network
The loading network is a passive RC tree network which increases
the load capacitance at the output; thus branches are already
included in the model. Since the influence of a capacitance in the
loading network depends on how far it is from the output node, a
correction factor $\delta_i$ is introduced. The contribution to
the delay from the loading network can then be expressed as:
[Equation (4) is not legible in the source: a sum over $i$ up to
$n$.]
where $i$ is the distance from the output node (i.e. the source)
and $\delta_i$ is the correction factor defined as:
$$\delta_i = \frac{V_{DD} - V_{TL}}{V_{DD}} \cdot \frac{R_{Drive}}{R_{Drive} + \sum_{k=1}^{i} R_{L_k}}$$
The first fraction accounts for the fact that nodes in the loading
network cannot reach $V_{DD}$, while the last is the fraction
between the total resistance driving the output ($R_{Drive}$) and
the total resistance driving node $i$ in the loading network.
Delay Estimate
When the three contributions to the time constant have been found,
the delay estimate is calculated as:
$$T_{Estimate} = (T_{Drive} + T_{Branch} + T_{Load}) \cdot \ln 2 \qquad (5)$$
where $\ln 2$ is found from the capacitance charge equation:
$$v_{out}(t) = V_{DD} \left( 1 - e^{-t/RC} \right)$$
when the time $t$ is set to the time at which $v_{out}$ has
reached $V_{DD}/2$.
4 Algorithm to Perform Worst Case Analysis
The objective of this algorithm is to produce an RC tree network
from a given CMOS circuit. Both the drive and the load graph
(Figure 4a) have to be converted into trees. Figure 4b shows the
initial drive and load tree for a
complex gate and Figure 4c shows the resulting drive and load
tree.
Figure 4: a) Drive and load graphs, b) initial trees, and c) final
drive and load tree.
In order to evaluate the time complexity of the algorithms the
following properties are defined: Let $|V|$ and $|E|$ denote the
number of vertices and the number of edges in the graph $G(V, E)$;
let $|P|$ represent the number of different paths from source to
drain in $G$ (not including loops), and let $|L|$ denote the
number of edges in the longest path from the set $P$ of all paths.
LongestPathTree
This algorithm builds a tree of all possible paths from source to
drain node in a graph (see Figure 4b). It uses a breadth-first
search algorithm which takes $O(|V| + |E|)$ time to construct the
tree, using the source node as root. The algorithm terminates when
either the drain node has been reached or when the edges which can
be reached from the present node are already in the path, i.e.,
avoiding loops. However, loop branches may still be in the tree
and have to be removed.
The algorithm is performed on both the drive and the load graph.
RemoveLoopBranches
This algorithm removes loop branches from a tree. For each leaf
node which is not a drain node, the algorithm traces from this
leaf toward the root. In each step the node and its connected edge
are removed. The trace ends when the current node has more than
one edge connected to it. The algorithm takes time
$O(|P| \cdot |L|)$.
FindLongestPath
This algorithm creates a list of all paths, from source to drain,
in the drive tree, sorted by their length, which is calculated
using Elmore's time constant. The algorithm takes time
$O(|P| \cdot |L|)$.
ReduceByLongestPath
This algorithm is performed for each of the longest paths $L_i$ in
the list of paths produced by the FindLongestPath algorithm (e.g.,
the drive tree of Figure 4b has 3 longest paths). A path $L_i$
from the drive tree corresponds to a cutset in the load tree. The
algorithm removes, from the load tree, all edges belonging to
$L_i$ together with all nodes and edges which do not have a path
to the root (drain) after the cut, i.e., all nodes and edges which
cannot be felt by the circuit output node.
For each leaf node the algorithm traces towards the root, removing
the node and connected edge until either the current node is
connected to more than one edge, in which case a later trace from
another leaf will continue this path, or the current edge belongs
to the cutset, in which case the node and edge are removed and the
trace is moved to the next leaf node not yet visited. The longest
path is then selected as the path $L_i$ from the path list which
leads to the highest number of remaining edges in the load tree.
Thus the time taken by this algorithm is
$O(|P_L| \cdot |P| \cdot |L|)$, where $|P_L|$ ($< |P|$) is the
number of longest paths in $P$.
EvaluateBranches
When the longest path has been found, all other paths have to be
cut open. In order to give the highest influence on the delay,
these paths are cut as close to the drain as possible. Branches
are not placed at the output node (i.e., the drain), as
experiments [6] have shown that branches have the least influence
when placed at the output (and of course at the power supply).
For all drain nodes in the drive tree which do not belong to the
longest path $L$, the branch is traced until it intersects with
$L$. The first node (drain) and connected edge are always removed.
While either the edge and/or the node has the same identifier as
an edge and/or a node in $L$, the node and connected edge are
removed.
Edges which are not removed are included in a cutset, which is
used to remove nonconducting edges from the load tree (i.e., open
transistors).
As was the case for the previous algorithm, this takes time
$O(|P| \cdot |L|)$.
RemoveCutSet
The final algorithm removes all edges belonging to the cutset
found in EvaluateBranches. The algorithm also removes all edges
(i.e., sub-trees) which are floating, i.e., edges from which a
path to the root cannot be found. This results in the final drive
and load tree as shown in Figure 4c.
5 Implementation and Results
The M model and the algorithm set to perform worst case analysis
have been implemented in C++ under UNIX as part of the cell
compiler CELLO [7]. They have been applied to a large set of
benchmarks aimed at showing different aspects of the model and
algorithms. Table 1 lists some of the results from the benchmark
sets and compares them with SPICE simulation results. All results
are the time taken to charge the output node from 0 voltage to
half of the power supply. The CPU time taken to analyse and
estimate the worst case delays is, for the current implementation,
in the range of 0.1 - 2.1 sec.
Table 1: Comparison of the computed delay using the M model and
the SPICE (version 3b1) simulations.
From the table it is seen that the M model compares well with the
simulations, except for a few cases (e.g., ch9, ch13, ch14) in
which the load is much larger than the drive; in these cases the
effect of the load is too high. The benchmarks or1 - 014 show the
results of different topologies of the same circuit; the M model
is able to reflect these topological changes.
6 Conclusion and Future Work
A new RC-tree network model for estimating signal delay in CMOS
functional cells has been presented. The model is able to reflect
topological changes within a cell and compares well with SPICE
simulations. Further, a set of algorithms using this model to
estimate worst case delay based upon the cell topology has been
presented. Both model and algorithm set have been implemented as a
part of a cell compiler.
Future work involves refinement of the M model and extension to
handle the influence of more than one switching transistor and the
influence of different arrival times for the gate signals. These
are "global" constraints necessary for choosing the right topology
in connection with the surrounding cells.
7 Acknowledgement 
The author would like to express special thanks to Dr. Paul Six
from IMEC, Belgium, under whose supervision the author, as a guest
researcher at IMEC, performed the initial studies leading to the
work described in this paper.
References 
[1] W. C. Elmore, "The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers," Journal of Applied Physics, 19(1), January 1948, pp. 55-63.
[2] Jorge Rubinstein, Paul Penfield, Jr. and Mark A. Horowitz, "Signal Delay in RC Tree Networks," IEEE Trans. on CAD, vol. CAD-2, no. 3, July 1983, pp. 202-211.
[3] Tzu-Mu Lin and Carver A. Mead, "Signal Delay in General RC Networks," IEEE Trans. on CAD, vol. CAD-3, no. 4, October 1984, pp. 331-349.
[4] Pak K. Chan and Kevin Karplus, "Computing Signal Delay in General RC Networks by Tree/Link Partitioning," Proceedings of the 26th ACM/IEEE Design Automation Conference, 1989, pp. 485-490.
[5] Serge Gaiotti, Michel R. Dagenais and Nicholas C. Rumin, "Worst-case Delay Estimation of Transistor Groups," Proceedings of the 26th ACM/IEEE Design Automation Conference, 1989, pp. 491-496.
[6] Jan Madsen, "The Impact of Gate Ordering on Circuit Delay," internal paper, Designcenter of Electronics Institute, Technical University of Denmark, EI-LHT148, 27 pages, 1988.
[7] Jan Madsen, "A New Approach to Optimal Cell Synthesis," Proceedings of IEEE International Conference on Computer-Aided Design, Santa Clara, California, 1989, pp. 336-339.
