Introduction
Using cell compilers, which translate a transistor netlist (or boolean function) description into mask layout, makes it possible to replace many primitive gates, such as NAND and NOR gates, with a single complex gate tuned to the circuit requirements. This methodology leads to a much larger solution space, which enables both area-driven and performance-driven layout synthesis.
Mapping an optimized function into an appropriate transistor netlist is a one-to-many mapping. The chosen topology influences both area and performance. Usually performance is optimized at a high level by optimizing boolean expressions or at a low level by transistor sizing. However, choosing the right topology may add yet another optimization step to performance-driven layout synthesis. For this purpose we need a simple and efficient model of a general CMOS functional cell which is able to reflect performance properties.
Using the linear RC model for modelling digital MOS circuits has become a well accepted practice for estimating circuit delays. This modelling scheme was pioneered by Elmore [1], whose notion of signal delay as the first-order moment of the impulse response has been used widely to approximate the time taken for a signal to reach half of its final value.
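As a point of reference for the rest of the discussion, the plain Elmore time constant of a simple RC chain can be computed as in the following minimal sketch. The function name and the list-based representation are illustrative assumptions; this is the textbook Elmore sum for the far end of a chain, not the M model developed below.

```python
def elmore_delay(resistances, capacitances):
    """Elmore time constant at the far end of an RC chain.

    resistances[k] is the resistor feeding node k and capacitances[k] is the
    capacitor from node k to ground; node 0 is the node nearest the source.
    (Representation chosen for illustration only.)
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need exactly one resistor per node")
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance between the source and this node
        total += upstream_r * c  # each capacitor is charged through all of it
    return total


# Two identical stages: 1*1 + (1+1)*1 = 3
print(elmore_delay([1.0, 1.0], [1.0, 1.0]))
```

Capacitors far from the source are weighted by more resistance, which is why the position of a capacitance in the network matters for the delay.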
In 1983, Penfield, Rubinstein and Horowitz [2] proposed a method to calculate upper and lower bounds for the delay of an RC tree network. This method was extended by Lin and Mead [3] to general RC networks and to cover the effect of parallel connections and stored charge, i.e., arbitrary initial charge distribution. Other research on estimating signal delay has been carried out [4], [5].
* This project has partly been sponsored by the ESPRIT Basic Research Action 3281.
Though all these methods give good estimates, they are not able to reflect the effect of topological changes within a CMOS functional cell.
(See Figure 1.) In order to estimate worst case delay and switching condition, it is assumed that only one transistor is switching, i.e., all other transistors are either turned ON, in which case they are kept in the graph, or turned OFF and removed from the graph. The graph through which the drain is charged or discharged is denoted the "driving" network. In a CMOS network, a conducting path from source to drain in the driving network (e.g., transistor b1 in Figure 1) corresponds to a cutset in the complementary network (e.g., transistor b2 in Figure 1), denoted the "loading" network. This cut may still leave transistors in the loading network which have a path to the drain (e.g., transistor a2 in Figure 1) and therefore contribute to the charge or discharge of the drain as extra load. Thus, the relation between the two networks depends highly on their topology, i.e., how the transistors are arranged, and the model has to handle both networks.
Thus, the problem of finding the switching condition which leads to the worst case delay may be formulated as:
Find the longest path from source to drain in the driving network, where the transistor next to the source is the one switching, which leads to the highest influence of the loading network.
It is further assumed that the circuit has been stabilized before switching, i.e., that all vertices in the driving network have the same potential, and likewise for the vertices in the loading network.
The M Model
Because of symmetry, the following discussion has been restricted to the case of charging the output through the pull-up network (i.e., the driving network), while the pull-down network acts as the loading network.
Driving Network
The driving network consists of two time components, $T_{Drive}$ and $T_{Branch}$.
Driving Path
$T_{Drive}$ accounts for the time used to charge or discharge the output through a series connection of transistors. The calculation is based upon the Elmore time constant. A correction factor of 2 on the first resistance accounts for the fact that the switching transistor is not turned on immediately. $T_{Drive}$ can be expressed as:
where i is the distance from the source, n is the output node and $a_i$ is a correction factor accounting for the different initial charge distribution and the different amount of influence. $a_i$ is defined as:
Figure 3: Driving path from source to drain with a branch placed on node i.
Consider Figure 3, which shows a path with a branch placed on node i. The influence of the branch is not felt until the next node i + 1 has to be charged. Therefore the influence of the branch is moved to node i + 1 rather than node i, which is the case for the Elmore model. The contribution to the delay of the driving network by having branches at node i can be expressed as:
where $c_i$ accounts for the effect of different initial conditions for the output and the nodes in the loading network. $c_i$ is defined as:
0 if no branch on node n;
for $i < n - 1$
That is, if the branch is added at node n - 1, the effect is shown at the output node n, but since n is initially at 0 V while n - 1 is at $V_{TD}$, the branch capacitance will not be charged until the output reaches $V_{TD}$. Similarly, for a branch added at the output, the branch capacitance will not be charged until the output reaches $2V_{TD}$. In this case a n + i = f , which reflects the larger sensitivity when adding a capacitance to a loading node.
The branch capacitance $C_{B_i}$ is the capacitance at the first node in the branch counted from node i. If the branch consists of a large RC tree, all node capacitances within this tree are pushed to this first node b, and the value of the capacitor is corrected to reflect the smaller influence of capacitances placed far from node i. This situation differs from the loading network in the sense that all nodes in the branch are initially at $V_{TD}$. The branch capacitance can be expressed as:
where j is the distance from node i. In order to calculate the total contribution from all branches, we add the contributions from all branches along the driving path, i.e., the contribution can be expressed as:
Loading Network
The loading network is a passive RC tree network which increases the load capacitance at the output; thus branches are already included in the model. Since the influence of a capacitance in the loading network depends on how far it is from the output node, a correction factor $b_i$ is introduced. The contribution to the delay from the loading network can then be expressed as:
where i is the distance from the output node (i.e. the source) and $b_i$ is the correction factor defined as:
$$b_i = \cdots \, \frac{R_{Drive}}{R_{Drive} + \sum_{k=1}^{i} R_{L_k}}$$
The first fraction accounts for the fact that nodes in the loading network cannot reach $V_{DD}$, while the last is the fraction between the total resistance driving the output ($R_{Drive}$) and the total resistance driving node i in the loading network.
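The resistance-dependent part of the correction factor can be illustrated with a short sketch. It computes only the last fraction described above, the ratio between the resistance driving the output and the resistance driving node i of the loading network; the voltage-dependent first fraction is not modelled here. The function name, the argument layout and the summation from the output node up to node i are assumptions made for illustration.

```python
def resistance_ratios(r_drive, r_load_path):
    """Resistance ratio R_drive / (R_drive + R_L1 + ... + R_Li) for each node i.

    r_load_path[k - 1] is the resistance of the k-th edge counted from the
    output node into the loading network (hypothetical representation).
    Only the resistance fraction of the correction factor is computed.
    """
    ratios = []
    r_to_node = 0.0
    for r in r_load_path:
        r_to_node += r  # accumulated load resistance between output and node i
        ratios.append(r_drive / (r_drive + r_to_node))
    return ratios


# A node one edge away weighs more than a node two edges away.
print(resistance_ratios(2.0, [2.0, 4.0]))  # [0.5, 0.25]
```

The ratio shrinks as a node sits deeper in the loading network, so capacitances far from the output contribute less to the estimated delay.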
Delay Estimate
When the three contributions to the time constant have been found, the delay estimate is calculated as:
$$T_{Estimate} = (T_{Drive} + T_{Branch} + T_{Load}) \cdot \ln 2 \qquad (5)$$
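where $\ln 2$ is found from the capacitance charge equation:
$$v_{out}(t) = V_{DD}\left(1 - e^{-t/T}\right), \qquad T = T_{Drive} + T_{Branch} + T_{Load}$$
when the time t is set to the time at which $v_{out}$ has reached $V_{DD}/2$.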
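The factor ln 2 can be checked numerically. The sketch below assumes a single-pole charging curve, v_out(t) = V_DD (1 - exp(-t / T)), with T the sum of the three time constants; the function names and the numeric values are illustrative only.

```python
import math


def delay_estimate(t_drive, t_branch, t_load):
    """Delay estimate: the sum of the three time constants times ln 2."""
    return (t_drive + t_branch + t_load) * math.log(2)


def v_out(t, time_constant, v_dd):
    """Assumed single-pole charging curve of the output node."""
    return v_dd * (1.0 - math.exp(-t / time_constant))


# Hypothetical time constants (arbitrary units) and a 5 V supply.
t_half = delay_estimate(1.0, 0.5, 0.5)
print(v_out(t_half, 2.0, 5.0))  # approximately 2.5, i.e. half the supply
```

Evaluating the charging curve at the estimated delay returns half of the supply voltage, which is the reference point used for all results in this paper.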
Algorithm to Perform Worst Case Analysis

ReduceByLongestPath
This algorithm is performed for each of the longest paths $L_i$ in the list of paths produced by the FindLongestPath algorithm (e.g., the drive tree of Figure 4b has 3 longest paths). A path $L_i$ from the drive tree corresponds to a cutset in the load tree. The algorithm removes, from the load tree, all edges belonging to $L_i$ together with all nodes and edges which do not have a path to the root (drain) after the cut, i.e., all nodes and edges which cannot be felt by the circuit output node.
For each leaf node the algorithm traces towards the root, removing the node and connected edge until either the current node is connected to more than one edge, in which complex gate and Figure 4c shows the resulting drive and load tree.
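The leaf-tracing step described above can be sketched as follows. The tree is represented by a child-to-parent dictionary, which is an assumption made for illustration, as are the function name and the node names. A trace stops when the current node still has another child (i.e., it is connected to more than one edge); stopping at drain nodes and at the root is an additional assumption.

```python
def prune_dead_leaves(parent, drains):
    """Remove branches that end in a leaf which is not a drain node.

    parent: dict mapping every non-root tree node to its parent node.
    drains: set of tree nodes that are drain leaves and must be kept.
    Returns a new child-to-parent dict (hypothetical representation).
    """
    parent = dict(parent)
    child_count = {}
    for child, par in parent.items():
        child_count[par] = child_count.get(par, 0) + 1
    leaves = [n for n in parent
              if child_count.get(n, 0) == 0 and n not in drains]
    for node in leaves:
        # Trace toward the root; stop at the root, at a drain, or at a node
        # that still has another child.
        while (node in parent and child_count.get(node, 0) == 0
               and node not in drains):
            par = parent.pop(node)   # remove the node and its connecting edge
            child_count[par] -= 1
            node = par
    return parent


# Root r has child a; a leads to the drain d1 and to a dead end x -> y.
tree = {"a": "r", "d1": "a", "x": "a", "y": "x"}
print(prune_dead_leaves(tree, {"d1"}))  # {'a': 'r', 'd1': 'a'}
```

In the example the dead-end branch x, y is removed, while node a survives because it still leads to a drain.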
In order to evaluate the time complexity of the algorithms, the following properties are defined: let |V| and |E| denote the number of vertices and the number of edges in the graph G(V, E); let |P| represent the number of different paths from source to drain in G (not including loops); and let |L| denote the number of edges in the longest path from the set P of all paths.
LongestPathTree
This algorithm builds a tree of all possible paths from the source to the drain node in a graph (see Figure 4b). It uses a breadth-first search algorithm which takes O(|V| + |E|) time to construct the tree, using the source node as root. The algorithm terminates when either the drain node has been reached or when the edges which can be reached from the present node are already in the path, i.e., avoiding loops. However, loop branches may still be in the tree and have to be removed. The algorithm is performed on both the drive and the load graph.
For all drain nodes in the drive tree which do not belong to the longest path L, the branch is traced until it intersects with L. The first node (drain) and connected edge are always removed. While the edge and/or the node has the same identifier as an edge and/or a node in L, the node and connected edge are removed.
Edges which are not removed are included in a cutset, which is used to remove nonconducting edges from the load tree (i.e., open transistors).
As was the case for the previous algorithm, this takes
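The path enumeration behind the longest-path search can be sketched with a breadth-first expansion of partial paths. This is a simplified illustration, not the CELLO implementation: the edge-list representation, the function name and the node and gate names are assumptions, and loops are avoided by tracking visited nodes rather than visited edges.

```python
from collections import deque


def longest_paths(edges, source, drain):
    """Return all longest loop-free source-to-drain paths as lists of gates.

    edges: list of (node_a, node_b, gate) tuples, one per transistor
    (hypothetical representation of the drive graph).
    """
    adjacency = {}
    for a, b, gate in edges:
        adjacency.setdefault(a, []).append((b, gate))
        adjacency.setdefault(b, []).append((a, gate))
    complete = []
    queue = deque([(source, (source,), ())])
    while queue:
        node, visited, gates = queue.popleft()
        if node == drain:
            complete.append(list(gates))  # this path is finished
            continue
        for neighbour, gate in adjacency.get(node, []):
            if neighbour not in visited:  # skip nodes already on the path
                queue.append((neighbour, visited + (neighbour,),
                              gates + (gate,)))
    if not complete:
        return []
    longest = max(len(p) for p in complete)
    return [p for p in complete if len(p) == longest]


# Transistors a and b in series, in parallel with transistor c.
drive = [("vdd", "n1", "a"), ("n1", "out", "b"), ("vdd", "out", "c")]
print(longest_paths(drive, "vdd", "out"))  # [['a', 'b']]
```

When several paths share the maximum length, all of them are returned, matching the case where a drive tree has more than one longest path.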
RemoveLoopBranches
This algorithm removes loop branches from a tree. For each leaf node which is not a drain node, the algorithm traces from this leaf toward the root. In each step the node and its connected edge are removed. The trace ends when the
RemoveCutSet
The final algorithm removes all edges belonging to the cutset found in EvaluateBranches. The algorithm also removes all edges (i.e., sub-trees) which are floating, i.e., edges from which a path to the root cannot be found. This results in the final drive and load tree as shown in Figure 4c.
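The cutset removal followed by the removal of floating edges can be sketched as below. Identifying cut edges by a gate label, the undirected edge-list representation and all names are assumptions made for illustration; the sketch works on a general load graph rather than on the tree structure used by the algorithms above.

```python
from collections import defaultdict, deque


def remove_cut_set(load_edges, cut_gates, drain):
    """Drop cut edges, then drop every edge with no remaining path to the drain.

    load_edges: list of (node_a, node_b, gate) tuples (hypothetical layout).
    cut_gates: set of gate identifiers whose transistors are non-conducting.
    """
    kept = [e for e in load_edges if e[2] not in cut_gates]
    adjacency = defaultdict(list)
    for a, b, _gate in kept:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Breadth-first search from the drain marks every node still felt by it.
    reachable = {drain}
    queue = deque([drain])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in reachable:
                reachable.add(neighbour)
                queue.append(neighbour)
    return [e for e in kept if e[0] in reachable and e[1] in reachable]


# Two series transistors between the output and ground.
load = [("out", "x", "a"), ("x", "gnd", "b")]
print(remove_cut_set(load, {"b"}, "out"))  # [('out', 'x', 'a')]
print(remove_cut_set(load, {"a"}, "out"))  # []
```

Cutting the transistor next to ground leaves the upper transistor attached to the output as extra load, whereas cutting the transistor next to the output leaves the rest of the network floating, so it is removed.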
Implementation and Results
The M model and the algorithm set to perform worst case analysis have been implemented in C++ under UNIX as part of the cell compiler CELLO [7]. They have been applied to a large set of benchmarks aimed at showing different aspects of the model and algorithms. Table 1 lists some of the results from the benchmark sets and compares them with SPICE simulation results. All results are the time taken to charge the output node from 0 V to half of the power supply. The CPU time taken to analyse and estimate the worst case delays is, for the current implementation, in the range of 0.1 - 2.1 sec.
From the table it is seen that the M model compares well with the simulations, except for a few cases (e.g., ch9, ch13, ch14) in which the load is much larger than the drive. In these cases the estimated effect of the load is too high.
The benchmarks or1 -014 show the results of different topologies of the same circuit; the M model is able to reflect these topological changes.
Conclusion and Future Work
A new RC-tree network model for estimating signal delay in CMOS functional cells has been presented. The model is able to reflect topological changes within a cell and compares well with SPICE simulations. Further, a set of algorithms using this model to estimate worst case delay based upon the cell topology has been presented. Both the model and the algorithm set have been implemented as part of a cell compiler.
Future work involves refinement of the M model and extension to handle the influence of more than one switching transistor and the influence of different arrival times for the gate signals. These are "global" constraints necessary for choosing the right topology in connection with the surrounding cells.
