Both long and short path delays are used to determine valid clocking for various CMOS circuit styles, such as single-phase latching, asynchronous, and wave-pipelined designs. Accurate estimation of both long and short path delays is therefore crucial in designing and testing high-speed CMOS circuits. Most previous approaches to detecting long and short sensitizable paths assume that the rising and falling gate delays are either fixed or bounded. In fact, the gate delay of a CMOS circuit may also depend on how many and which inputs are rising or falling, and on the arrival times of those rising or falling inputs. For instance, the delay of a two-input CMOS NAND gate may vary by as much as a factor of two depending on whether one input or both inputs are changing.
Introduction
More accurate timing analysis tools are needed in order to design faster digital systems. For example, minimization of the clock period is a key point for improving the performance.
It has been shown that the accuracy of the longest/shortest path delay can affect the calculation of the optimal clock period. Long and short path delays were also used to determine the clock period for wave-pipelined circuits. Earlier work [13] developed methods to compute the longest sensitizable path without explicitly tracing and enumerating paths. The delay of the shortest topological path may be a poor estimate of how soon the outputs can become stable. Therefore, Cheng et al. proposed the shortest destabilizing path, which was combined with the longest sensitizable path to determine the valid clock period [3]. Lam et al. proposed a method to calculate the clock period based on the minimum two-vector or multi-vector transition delay [9, 14]. A symbolic simulation approach was used in [4] to deal with the exact long transition delay. A unified approach called Time Boolean Functions was proposed in [9] to compute the longest and shortest transition delays.
Most of the gate delay models previously used in these studies are assumed to be either fixed or bounded. The maximum possible delay is used to compute the longest path delay, while the minimum possible delay is used to compute the shortest path delay. In reality the circuit does not operate in the floating mode, and the transition delay between two input vectors should be considered. In fact, the gate delay of a CMOS circuit may also depend on how many and which inputs are rising or falling, and on the arrival times of those rising or falling inputs. For instance, based on circuit-level simulation using 2 μm model parameters with all transistor widths and lengths of minimum size, the gate delay of a two-input NAND gate is shown in Table 1 (taken from [1]). In this table, for any literal $x$, $x{\uparrow}$ denotes $x$ rising, $x{\downarrow}$ denotes $x$ falling, $x$ denotes that $x$ is "1", and $\bar{x}$ denotes that $x$ is "0".
To obtain more accurate long and short path delays, sensitization of multiple paths must be considered in order to establish whether several inputs of a particular gate can actually change simultaneously. Sensitizing a single path is already NP-hard, and sensitizing multiple paths is more difficult still. The previously proposed symbolic approach [4] and Time Boolean Functions [9] also become more complicated when simultaneous input changes must be handled.
In this paper we study efficient ways to compute the longest sensitizable path and the shortest destabilizing path while considering data dependent delays. The rest of the paper is organized as follows. In Section 2, we discuss the data dependent delay model that we assume. In Section 3, we propose an improved topological long/short path algorithm based on the data dependent delay model. In Section 4, we assume that the gate delays follow a bounded delay model, so a modified version of the algorithm originally proposed in [2] is used to obtain the longest sensitizable path; the shortest sensitizable path delay, however, cannot be used to estimate the minimum transition delay. In Section 5, we propose an algorithm that combines the first two algorithms. The experimental results are shown in Section 6. We offer some conclusions in Section 7.
Data Dependent Delay Model
To obtain more accurate long and short path delays, we need a more accurate delay model. For a static CMOS gate, considering only the resistance and capacitance in the gate does not estimate its delay accurately enough. Since different relative arrival times of input transitions can produce different delays for the same gate, simulation is needed to obtain the exact delay of the circuit. However, simulation is very time consuming. Gary, Liu and Cavin have proposed a way to reduce the simulation time by an event pruning technique [1]. In this paper we take a different approach: we first simplify the data dependent gate delay model, so that efficient algorithms can be designed and developed on top of the simplified model.
We shall use a NAND gate G with two inputs a, b and an output c to illustrate the simplified data dependent delay model. When a lead (either an input or an output) changes its value from 0 to 1 (rising) or from 1 to 0 (falling), the time at which the change occurs is referred to as its transition time. We use $t_i$ (where $i$ is a, b, or c) to denote the transition time of lead $i$. Since G is a NAND gate, "0" is its controlling (C) value and "1" is its non-controlling (NC) value. If all input leads currently hold NC (i.e., "1"), the output lead holds "0". When several inputs make transitions from NC to C (i.e., fall), the output lead has a rising transition, and its transition time equals the earliest input transition time plus the gate delay. If the output lead has a falling transition, each input lead either holds a constant NC ("1") or makes a transition from C to NC; the output transition time is then the latest input transition time plus the gate delay. Note that in the data dependent gate delay model, the gate delay depends on how many transitions take place and on the transition times of these inputs.
From Table 1 we know that the gate delays for $a{\uparrow}b$, $ab{\uparrow}$ and $a{\uparrow}b{\uparrow}$ differ even if $t_a = t_b$. Consider the special case of $a{\uparrow}b{\uparrow}$ with $t_a < t_b$. When $t_b$ is sufficiently larger than $t_a$, the gate delay of $a{\uparrow}b{\uparrow}$ should be the same as (or very close to) the delay of $ab{\uparrow}$, since by the time the rising transition occurs on b, lead a has already stabilized at 1. Let $\Delta_1$ denote the minimum time period such that this effect takes place whenever $t_b > t_a + \Delta_1$. Similarly, let $\Delta_2$ denote the minimum time period such that the delay of $a{\uparrow}b{\uparrow}$ is the same as (or very close to) the delay of $a{\uparrow}b$ whenever $t_a > t_b + \Delta_2$.
Now consider the case of $a{\downarrow}b{\downarrow}$ with $t_a < t_b$. When $t_b$ is sufficiently larger than $t_a$, the gate delay of $a{\downarrow}b{\downarrow}$ should be the same as (or very close to) the delay of $a{\downarrow}b$, since at the time of the falling transition on a, lead b still holds a constant 1 (the falling transition on b occurs much later). Let $\Delta_3$ denote the minimum time period such that this effect takes place whenever $t_b > t_a + \Delta_3$. Let $\Delta_4$ denote the minimum time period such that the delay of $a{\downarrow}b{\downarrow}$ is the same as (or very close to) the delay of $ab{\downarrow}$ whenever $t_a > t_b + \Delta_4$.
For convenience, we use $\Delta = \max(\Delta_1, \Delta_2, \Delta_3, \Delta_4)$ to cover all cases. We further assume that when the difference between $t_a$ and $t_b$ is no greater than $\Delta$, the gate delays of $a{\uparrow}b{\uparrow}$ and $a{\downarrow}b{\downarrow}$ lie in bounded ranges, written for example as $\mathrm{delay}(a{\uparrow}b{\uparrow}) = [\mathrm{dmin}(a{\uparrow}b{\uparrow}), \mathrm{dmax}(a{\uparrow}b{\uparrow})]$. The actual values of $\mathrm{dmin}(a{\uparrow}b{\uparrow})$ and $\mathrm{dmax}(a{\uparrow}b{\uparrow})$ can be obtained from simulation.
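Consider the cases in which the output has a falling transition.

1. $|t_a - t_b| \le \Delta$. The gate delay is $\mathrm{delay}(a{\uparrow}b{\uparrow}) = [\mathrm{dmin}(a{\uparrow}b{\uparrow}), \mathrm{dmax}(a{\uparrow}b{\uparrow})]$. Here $\mathrm{delay}(a{\uparrow}b{\uparrow})$ denotes the delay when the difference between the rising (C to NC) times of a and b is smaller than or equal to $\Delta$; $\mathrm{dmin}(a{\uparrow}b{\uparrow})$ and $\mathrm{dmax}(a{\uparrow}b{\uparrow})$ are its minimum and maximum possible values respectively. Since a falling output is determined by the latest input transition, $t_c = \max(t_a, t_b) + \mathrm{delay}(a{\uparrow}b{\uparrow})$.

2. $t_b > t_a + \Delta$, or b has a rising transition and a is constant "1" (NC). The gate delay is $\mathrm{delay}(ab{\uparrow}) = [\mathrm{dmin}(ab{\uparrow}), \mathrm{dmax}(ab{\uparrow})]$. Here $\mathrm{delay}(ab{\uparrow})$ represents the delay when input a is already stable at "1" (the non-controlling value) at the time the transition on input b occurs; $\mathrm{dmin}(ab{\uparrow})$ and $\mathrm{dmax}(ab{\uparrow})$ are its minimum and maximum possible values respectively. $t_c = t_b + \mathrm{delay}(ab{\uparrow})$.

3. $t_a > t_b + \Delta$, or a has a rising transition and b is constant "1" (NC). Symmetrically, the gate delay is $\mathrm{delay}(a{\uparrow}b) = [\mathrm{dmin}(a{\uparrow}b), \mathrm{dmax}(a{\uparrow}b)]$ and $t_c = t_a + \mathrm{delay}(a{\uparrow}b)$.

Consider the cases in which the output has a rising transition.

1. $|t_a - t_b| \le \Delta$. The gate delay is $\mathrm{delay}(a{\downarrow}b{\downarrow}) = [\mathrm{dmin}(a{\downarrow}b{\downarrow}), \mathrm{dmax}(a{\downarrow}b{\downarrow})]$. Here $\mathrm{delay}(a{\downarrow}b{\downarrow})$ is the delay when the difference between the falling times of a and b is smaller than or equal to $\Delta$. Since a and b both have falling transitions (NC to C), $t_c = \min(t_a, t_b) + \mathrm{delay}(a{\downarrow}b{\downarrow})$.

2. $t_a > t_b + \Delta$, or b has a falling transition and a is constant "1" (NC). The gate delay is $\mathrm{delay}(ab{\downarrow}) = [\mathrm{dmin}(ab{\downarrow}), \mathrm{dmax}(ab{\downarrow})]$. Here $\mathrm{delay}(ab{\downarrow})$ denotes the delay when a remains stable at "1" for at least $\Delta$ after the falling transition on b. $t_c = t_b + \mathrm{delay}(ab{\downarrow})$.

3. $t_b > t_a + \Delta$, or a has a falling transition and b is constant "1" (NC). The gate delay is $\mathrm{delay}(a{\downarrow}b) = [\mathrm{dmin}(a{\downarrow}b), \mathrm{dmax}(a{\downarrow}b)]$. Here $\mathrm{delay}(a{\downarrow}b)$ denotes the delay when b remains stable at "1" for at least $\Delta$ after the falling transition on a. $t_c = t_a + \mathrm{delay}(a{\downarrow}b)$.

Remark 1: The maximum/minimum delay value in each case may depend on $\Delta$.

Remark 2: Gate delays with one input rising and another falling can also be handled by the above formulas.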
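The case analysis above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `nand_output_window`, the dictionary keys `"both"`, `"a_only"` and `"b_only"`, and every numeric delay range below are hypothetical; real ranges would come from simulation.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (dmin, dmax) for one input condition


def nand_output_window(t_a: float, t_b: float, delta: float,
                       delays: Dict[str, Range], rising_inputs: bool) -> Range:
    """Return (earliest, latest) output transition time of a two-input NAND.

    rising_inputs=True  : both inputs rise (C to NC), so the output falls.
    rising_inputs=False : both inputs fall (NC to C), so the output rises.
    """
    if abs(t_a - t_b) <= delta:
        # Case 1: inputs switch within delta of each other, use the joint range.
        base = max(t_a, t_b) if rising_inputs else min(t_a, t_b)
        dmin, dmax = delays["both"]
    elif rising_inputs:
        # Output falls: the later rising input decides; the other is stable at 1.
        base, key = (t_b, "b_only") if t_b > t_a else (t_a, "a_only")
        dmin, dmax = delays[key]
    else:
        # Output rises: the earlier falling input decides; the other is still 1.
        base, key = (t_a, "a_only") if t_a < t_b else (t_b, "b_only")
        dmin, dmax = delays[key]
    return base + dmin, base + dmax


# Hypothetical delay ranges, for illustration only.
example_delays = {"both": (300.0, 350.0),
                  "a_only": (450.0, 470.0),
                  "b_only": (440.0, 460.0)}
print(nand_output_window(100.0, 200.0, 50.0, example_delays, rising_inputs=True))
```

In the usage line the two rising inputs are 100 time units apart, which exceeds the assumed $\Delta = 50$, so the later input b alone determines the falling output.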
Let $t_{ar}$, $t_{af}$, $t_{br}$, $t_{bf}$, $t_{cr}$ and $t_{cf}$ denote the latest possible rising and falling times of leads a, b and c respectively. Based on the cases discussed above, the rising and falling times of lead c simplify to the following expressions.

$$t_{cf} = \max\bigl(t_{ar} + \mathrm{dmax}(a{\uparrow}b),\; t_{br} + \mathrm{dmax}(ab{\uparrow}),\; \min(\max(t_{ar}, t_{br}),\, \min(t_{ar}, t_{br}) + \Delta) + \mathrm{dmax}(a{\uparrow}b{\uparrow})\bigr)$$

This expression covers all possible cases in which leads a and b have rising (C to NC) transitions. Using $\mathrm{dmax}(a{\uparrow}b{\uparrow})$ means that the difference between the rising times of a and b is not bigger than $\Delta$, so the later rising transition cannot occur after $\min(t_{ar}, t_{br}) + \Delta$, nor after $\max(t_{ar}, t_{br})$; this gives the last term in the formula. The maximum over the possible cases can be used to compute the longest path delay.

$$t_{cr} = \max\bigl(t_{af} + \mathrm{dmax}(a{\downarrow}b),\; t_{bf} + \mathrm{dmax}(ab{\downarrow}),\; \min(t_{af}, t_{bf}) + \mathrm{dmax}(a{\downarrow}b{\downarrow})\bigr)$$

This also covers all possible cases. Since leads a and b have falling transitions (NC to C), the earlier of the two determines the output, so the smaller time is chosen when $\mathrm{dmax}(a{\downarrow}b{\downarrow})$ is used.
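Let $e_{ar}$, $e_{af}$, $e_{br}$, $e_{bf}$, $e_{cr}$ and $e_{cf}$ denote the earliest possible rising and falling times of a, b and c respectively.

$$e_{cr} = \min\bigl(e_{af} + \mathrm{dmin}(a{\downarrow}b),\; e_{bf} + \mathrm{dmin}(ab{\downarrow}),\; \max(\max(e_{af}, e_{bf}) - \Delta,\, \min(e_{af}, e_{bf})) + \mathrm{dmin}(a{\downarrow}b{\downarrow})\bigr)$$

When $\mathrm{dmin}(a{\downarrow}b{\downarrow})$ is used, the earlier transition (from NC to C) cannot happen before $\min(e_{af}, e_{bf})$, nor before $\max(e_{af}, e_{bf}) - \Delta$. This gives the last term in the above expression.

$$e_{cf} = \min\bigl(e_{ar} + \mathrm{dmin}(a{\uparrow}b),\; e_{br} + \mathrm{dmin}(ab{\uparrow}),\; \max(e_{ar}, e_{br}) + \mathrm{dmin}(a{\uparrow}b{\uparrow})\bigr)$$

When $\mathrm{dmin}(a{\uparrow}b{\uparrow})$ is used, the later transition (from C to NC) cannot happen earlier than $\max(e_{ar}, e_{br})$.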
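The four bounds for a two-input NAND gate can be written as small Python functions. This is a sketch under assumptions: the function names, the dictionary keys (`"a_rise"` for $a{\uparrow}b$, `"b_rise"` for $ab{\uparrow}$, `"both_rise"` for $a{\uparrow}b{\uparrow}$, and likewise for falling), and all numeric values are hypothetical and are not taken from Table 1.

```python
from typing import Dict

Delays = Dict[str, float]


def latest_fall(t_ar: float, t_br: float, delta: float, dmax: Delays) -> float:
    """Latest falling time of c, given latest rising times of a and b."""
    joint = min(max(t_ar, t_br), min(t_ar, t_br) + delta)
    return max(t_ar + dmax["a_rise"], t_br + dmax["b_rise"],
               joint + dmax["both_rise"])


def latest_rise(t_af: float, t_bf: float, dmax: Delays) -> float:
    """Latest rising time of c, given latest falling times of a and b."""
    return max(t_af + dmax["a_fall"], t_bf + dmax["b_fall"],
               min(t_af, t_bf) + dmax["both_fall"])


def earliest_rise(e_af: float, e_bf: float, delta: float, dmin: Delays) -> float:
    """Earliest rising time of c, given earliest falling times of a and b."""
    joint = max(max(e_af, e_bf) - delta, min(e_af, e_bf))
    return min(e_af + dmin["a_fall"], e_bf + dmin["b_fall"],
               joint + dmin["both_fall"])


def earliest_fall(e_ar: float, e_br: float, dmin: Delays) -> float:
    """Earliest falling time of c, given earliest rising times of a and b."""
    return min(e_ar + dmin["a_rise"], e_br + dmin["b_rise"],
               max(e_ar, e_br) + dmin["both_rise"])


# Hypothetical delay tables, for illustration only.
DMAX = {"a_rise": 470, "b_rise": 460, "both_rise": 500,
        "a_fall": 590, "b_fall": 580, "both_fall": 400}
DMIN = {"a_rise": 450, "b_rise": 440, "both_rise": 480,
        "a_fall": 570, "b_fall": 560, "both_fall": 380}
```

Only the two expressions with a joint-switching window depend on $\Delta$, which is why `latest_rise` and `earliest_fall` take no `delta` argument.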
In the following we use the circuit shown in Figure 1 to illustrate how to compute the transition delay for a pair of input vectors under the data dependent delay model. Each gate in Figure 1 (i.e., G1, G2 and G3) is a two-input NAND gate. We assume that the upper and lower input leads correspond to lead a and lead b in Table 1 respectively, and that the gate delays are those shown in Table 1. The number on a lead represents both the rising and the falling transition delay of that lead (i.e., the connection delay).
For example, both the rising and falling delays from input a to the input of G1 are 580. To simplify the computation, we assume that $\Delta$ is 0 and that the second input vector a=1, b=1, c=1, d=1 is applied after all gates and leads in the circuit have stabilized to their values under the first input vector a=0, b=1, c=0, d=1. We want to know when the transition propagates to the primary output g.
The falling transition time of the output of G1 is $580 + \mathrm{delay}(a{\uparrow}b) = 580 + 470 = 1050$. This transition reaches the input of gate G3 at $1050 + 440 = 1490$.
The falling transition time of the output of G2 is $600 + \mathrm{delay}(c{\uparrow}d) = 600 + 470 = 1070$. This transition reaches the input of gate G3 at $1070 + 960 = 2030$. With e and f denoting the two inputs of G3, the earlier falling input determines the output, so the rising transition of G3 occurs at $1490 + \mathrm{delay}(e{\downarrow}f) = 1490 + 590 = 2080$.
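The arithmetic of this example can be replayed in a few lines of Python. The numbers are the ones quoted in the text; the variable names are assumptions introduced here for readability.

```python
DELTA = 0  # the example assumes delta = 0

# Connection (lead) delays and gate delays quoted in the worked example.
lead_a_to_g1 = 580
lead_c_to_g2 = 600
lead_g1_to_g3 = 440
lead_g2_to_g3 = 960
delay_one_input_rising = 470    # delay(a↑b), also used for delay(c↑d)
delay_one_input_falling = 590   # delay(e↓f) as used for G3

g1_fall = lead_a_to_g1 + delay_one_input_rising   # output of G1 falls
g2_fall = lead_c_to_g2 + delay_one_input_rising   # output of G2 falls
e_fall = g1_fall + lead_g1_to_g3                  # arrival at upper input of G3
f_fall = g2_fall + lead_g2_to_g3                  # arrival at lower input of G3

# The two falling inputs of G3 are more than DELTA apart, so the earlier one
# determines the rising output.
assert abs(e_fall - f_fall) > DELTA
g3_rise = min(e_fall, f_fall) + delay_one_input_falling

print(g1_fall, e_fall, g2_fall, f_fall, g3_rise)  # 1050 1490 1070 2030 2080
```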
For gates with more than two input leads, similar formulas can be derived. For simplicity of presentation, we do not include them here. Based on these formulas, we propose the following algorithm.
Topological Long Path Algorithm
Step 1: Levelize the gates in the combinational circuit.
Step 2: Put gates into a queue by increasing order of their levels.
Step 3: Do while the queue is not empty
Step 4: Take a gate from the queue
Step 5: Use the latest possible rising (falling) time of the output of the predecessor gate plus the rising (falling) delay of the lead to get the maximum possible rising (falling) time of each input lead of the gate.
Step 6: Use the data dependent delay formula to compute the maximum possible rising and falling times of the output of the gate.
Step 7: end do
Step 8: Find the maximum value among all of the maximum rising and falling times of the primary output gates.
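The steps above can be sketched in Python for a circuit made only of two-input NAND gates. This is a minimal illustration, not the authors' implementation: the circuit encoding, the function names, the assumption that primary inputs switch at time 0, and all numeric delays in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

# gate name -> (input a, input b, lead delay on a, lead delay on b)
Gates = Dict[str, Tuple[str, str, float, float]]


def levelize(gates: Gates) -> List[str]:
    """Steps 1-2: order gates by level (primary inputs are level 0)."""
    level: Dict[str, int] = {}

    def lvl(node: str) -> int:
        if node not in gates:          # primary input
            return 0
        if node not in level:
            a, b, _, _ = gates[node]
            level[node] = 1 + max(lvl(a), lvl(b))
        return level[node]

    return sorted(gates, key=lvl)


def longest_topological_delay(gates: Gates, outputs: List[str],
                              dmax: Dict[str, float], delta: float = 0.0) -> float:
    latest: Dict[str, Tuple[float, float]] = {}  # node -> (latest rise, latest fall)
    for g in levelize(gates):                     # Steps 3-4
        a, b, lead_a, lead_b = gates[g]
        a_rise, a_fall = latest.get(a, (0.0, 0.0))  # assumed: inputs switch at 0
        b_rise, b_fall = latest.get(b, (0.0, 0.0))
        # Step 5: add the lead delay to the predecessor's latest times.
        t_ar, t_af = a_rise + lead_a, a_fall + lead_a
        t_br, t_bf = b_rise + lead_b, b_fall + lead_b
        # Step 6: data dependent formulas for a two-input NAND.
        joint = min(max(t_ar, t_br), min(t_ar, t_br) + delta)
        t_cf = max(t_ar + dmax["a_rise"], t_br + dmax["b_rise"],
                   joint + dmax["both_rise"])
        t_cr = max(t_af + dmax["a_fall"], t_bf + dmax["b_fall"],
                   min(t_af, t_bf) + dmax["both_fall"])
        latest[g] = (t_cr, t_cf)
    return max(max(latest[o]) for o in outputs)   # Step 8


# Hypothetical three-gate circuit and delay table, for illustration only.
circuit = {"G3": ("G1", "G2", 100.0, 100.0),
           "G1": ("a", "b", 100.0, 100.0),
           "G2": ("c", "d", 100.0, 100.0)}
table = {"a_rise": 470.0, "b_rise": 470.0, "both_rise": 500.0,
         "a_fall": 590.0, "b_fall": 590.0, "both_fall": 400.0}
print(longest_topological_delay(circuit, ["G3"], table))
```

Sorting by level rather than walking an explicit queue is a design shortcut; both visit every gate only after all of its predecessors.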
Time Complexity
We assume that a circuit has n gates and m leads, and that the maximum number of input leads to a gate is K. The time complexity of Steps 3 to 6 for one gate is $O(2^K)$; therefore, the time complexity of this algorithm is $O(n 2^K)$. We can also compute the longest topological path delay by using the bounded delay formula in Step 6, and the time complexity in this case is $O(m+n)$.
Theorem 1
The above algorithm obtains an upper bound for the long path transition delay.
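Proof. We prove this by induction on the number of levels in the circuit.
For a gate at level 1 the claim is trivial. Assume that it holds for every gate whose level is smaller than k; we need to prove that it also holds for any gate at level k. Let G be one of the gates at level k. Without loss of generality, assume that G is an AND gate with two inputs a, b and one output c. Let $t_1$ and $t_2$ be the latest possible rising times of a and b computed by the algorithm, let the real rising times of a and b lie in $[X_1, Y_1]$ and $[X_2, Y_2]$ respectively, and let

$$Y = \max\bigl(t_1 + \mathrm{dmax}(A{\uparrow}B),\; t_2 + \mathrm{dmax}(AB{\uparrow}),\; \min(\min(t_1, t_2) + \Delta,\, \max(t_1, t_2)) + \mathrm{dmax}(A{\uparrow}B{\uparrow})\bigr).$$

By the induction hypothesis $t_1$ and $t_2$ are correct upper bounds, which means that $Y_1 \le t_1$ and $Y_2 \le t_2$. We need to prove that the real rising time of c is at most $Y$. The real rising time of c comes from one of the following cases.

1. $A{\uparrow}B$: the rising time is $x + \mathrm{delay}(A{\uparrow}B)$ with $x \in [X_1, Y_1]$, which is at most $t_1 + \mathrm{dmax}(A{\uparrow}B)$.
2. $AB{\uparrow}$: the rising time is $y + \mathrm{delay}(AB{\uparrow})$ with $y \in [X_2, Y_2]$, which is at most $t_2 + \mathrm{dmax}(AB{\uparrow})$.
3. $A{\uparrow}B{\uparrow}$: the rising time is $z + \mathrm{delay}(A{\uparrow}B{\uparrow})$ with $z \in [\max(X_1, X_2),\, \min(\min(Y_1, Y_2) + \Delta,\, \max(Y_1, Y_2))]$. Since $Y_1$ and $Y_2$ are smaller than or equal to $t_1$ and $t_2$ respectively, $z$ is not bigger than $\min(\min(t_1, t_2) + \Delta, \max(t_1, t_2))$, so $z + \mathrm{delay}(A{\uparrow}B{\uparrow}) \le \min(\min(t_1, t_2) + \Delta, \max(t_1, t_2)) + \mathrm{dmax}(A{\uparrow}B{\uparrow})$.

[...] Under the data dependent delay model:

$$e_{cf} = \min\bigl(e_{ar} + \mathrm{dmin}(a{\uparrow}b),\; e_{br} + \mathrm{dmin}(ab{\uparrow}),\; \max(\max(e_{ar}, e_{br}) - \Delta,\, \min(e_{ar}, e_{br})) + \mathrm{dmin}(a{\uparrow}b{\uparrow})\bigr)$$

$$e_{cr} = \min\bigl(e_{af} + \mathrm{dmin}(a{\downarrow}b),\; e_{bf} + \mathrm{dmin}(ab{\downarrow}),\; \max(e_{af}, e_{bf}) + \mathrm{dmin}(a{\downarrow}b{\downarrow})\bigr)$$

Under the bounded delay model:

$$e_{cf} = \min\bigl(e_{ar} + \mathrm{dmin}(a{\uparrow}b),\; e_{br} + \mathrm{dmin}(ab{\uparrow}),\; \max(e_{ar}, e_{br}) + \mathrm{dmin}(a{\uparrow}b{\uparrow})\bigr)$$

$$e_{cr} = \min\bigl(e_{af} + \mathrm{dmin}(a{\downarrow}b),\; e_{bf} + \mathrm{dmin}(ab{\downarrow}),\; \max(e_{af}, e_{bf}) + \mathrm{dmin}(a{\downarrow}b{\downarrow})\bigr)$$

If G is an inverter with input a and output c, the formulas for both the data dependent delay model and the bounded delay model are:

$$e_{cr} = e_{af} + \mathrm{dmin}(a{\downarrow}), \qquad e_{cf} = e_{ar} + \mathrm{dmin}(a{\uparrow})$$

Similar formulas can be obtained for a gate with more than two input leads.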
Topological Short Path Algorithm
Step 1: Levelize the gates in the combinational circuit.
Step 2: Put gates into a queue in increasing order of their levels.
Step 3: Do while the queue is not empty
Step 4: Take a gate from the queue
Step 5: Use the earliest possible rising (falling) time of the output of the predecessor gate plus the rising (falling) delay of the lead to get the earliest possible rising (falling) time of each input lead of the gate.
Step 6: Use the data dependent delay formula to compute the minimum possible rising and falling times of the output of the gate.
Step 7: end do
Step 8: Find the minimum value among all of the minimum possible rising and falling times of the primary output gates.
The time complexity of this algorithm is also $O(n 2^K)$. We can also obtain the shortest topological path delay by using the formulas for the bounded delay model in Step 6.
Theorem 2 If each primary output in a circuit is stuck-at-1 and stuck-at-0 testable, then the above algorithm derives a correct estimate for the shortest path transition delay.
Proof. Under the given condition, each primary output has a rising and a falling transition.
We prove by induction that if gate G has a rising (falling) transition at its output, the rising (falling) time of G computed by the algorithm is a lower bound on the minimum rising (falling) time.
For a level-1 gate with a rising (or falling) transition the claim is trivial. Assume that it holds for any gate whose level is smaller than k; we prove that it also holds for a gate at level k. Suppose gate G is at level k and has a rising (or falling) transition. Without loss of generality, assume that G is an AND gate with two inputs a, b and one output c, and that c has a rising transition.
Suppose that the earliest possible rising times of a and b computed by the above algorithm are $t_1$ and $t_2$ respectively, and that the real rising times of a and b lie in $[X_1, Y_1]$ and $[X_2, Y_2]$ respectively. [...] Second, suppose that only one of a and b has a rising transition. Without loss of generality, we assume that a has the rising transition. Thus $X_1 \ge t_1$ by the above assumption. The actual rising time of G comes from $a{\uparrow}b$ and, for $x \in [X_1, Y_1]$, satisfies

$$x + \mathrm{delay}(a{\uparrow}b) \;\ge\; t_1 + \mathrm{dmin}(a{\uparrow}b) \;\ge\; \min\bigl(t_1 + \mathrm{dmin}(a{\uparrow}b),\; t_2 + \mathrm{dmin}(ab{\uparrow}),\; \max(t_1, t_2) + \mathrm{dmin}(a{\uparrow}b{\uparrow})\bigr).$$

In any case, the computed earliest rising time of G is never later than the actual one. □
Lemma 2 The result obtained by the above algorithm, under the same condition as Theorem 2, has the monotone speedup property.
Proof. This can be proved similarly to Theorem 2. □
The primary outputs of real circuits are usually stuck-at-1 and stuck-at-0 testable. Therefore, the above algorithm can be used to estimate the shortest transition path delay.
In the following we use the above algorithms to estimate the longest and shortest transition path delays for the circuit shown in Figure 1. Here $\Delta$ is still assumed to be 0.
The latest possible rising and falling times of the output of G1 are [...]. A pair of input vectors a=0, b=1, c=0, d=1 and a=0, b=1, c=1, d=1 can cause the transition delay to be 2820. Similarly, the shortest transition path delay obtained is 1700.
Path Sensitization Algorithm
In the previous section, we proposed an improved topological long/short path algorithm that does not consider the sensitizability of a path. We now briefly present the longest sensitizable path algorithm proposed in [2]. This algorithm can also be used to estimate the longest path delay under the bounded delay model.
We assume that the reader is familiar with concepts such as the stable value and stable time of a gate, and the controlling and non-controlling inputs of a gate. The following definitions are introduced.
Definition 4.1 (Exact Path Sensitization Criterion [8])
A path is an exact sensitizable path if there is at least one primary input vector such that each on-input of the path is either the earliest controlling input, or the latest non-controlling input with all its side inputs being non-controlling inputs too.
Definition 4.2 (Loose Path Sensitization Criterion [8])
A path is a loose sensitizable path if there is at least one primary input vector such that each on-input of the path is either the earliest controlling input, or a non-controlling input with all its side inputs being non-controlling inputs too. Chen [8] proved that these two criteria yield the same longest sensitizable path delay.
Theorem 3 The longest sensitizable path delay computed by using the maximum possible delay is a valid estimate for the longest transition delay.
Proof. This follows from the monotone speedup property [8] and from the fact that the delay under the transition model is always less than or equal to the delay under the floating model [8]. □
We will not discuss the details of the path sensitization algorithm based on the loose path sensitization criterion. It is clear, however, that this algorithm (or any other path sensitization algorithm) based on the bounded delay model can still be used here. Note that the shortest sensitizable path delay may not be used to estimate the minimum transition delay.
A Combination Algorithm
In this section we present a way to combine the previous two algorithms to obtain a more accurate long path transition delay.
We use the Loose Path Sensitization Criterion to find the next longest sensitizable path using the maximum possible delay; a more accurate delay along this path is then computed using the data dependent delay model. We repeat this process until the delay of the next longest sensitizable path is not bigger than the maximum data dependent delay found so far among all the longest sensitizable paths that were obtained using the maximum possible delay.
We compute the delay of a path P under the data dependent delay model as follows. Without loss of generality, assume that gate G is one of the gates along path P and that G has two inputs a, b and one output c, where a is the on-path input and b is the side input. Suppose that the latest possible rising time of a along path P is $t_{\mathrm{tempar}}$, that the latest possible rising times of a, b and c are $t_{ar}$, $t_{br}$ and $t_{cr}$ respectively, and that the latest possible rising time of c along P is $t_{\mathrm{tempcr}}$. Then we have

$$t_{\mathrm{tempcr}} = \max\bigl(t_{\mathrm{tempar}} + \mathrm{dmax}(a{\uparrow}b),\; \min(t_{br} + \Delta,\, t_{\mathrm{tempar}}) + \mathrm{dmax}(a{\uparrow}b{\uparrow})\bigr)$$

$$t_{cr} = \max\bigl(t_{ar} + \mathrm{dmax}(a{\uparrow}b),\; t_{br} + \mathrm{dmax}(ab{\uparrow}),\; \min(\min(t_{ar}, t_{br}) + \Delta,\, \max(t_{ar}, t_{br})) + \mathrm{dmax}(a{\uparrow}b{\uparrow})\bigr)$$

since, when $\mathrm{delay}(a{\uparrow}b{\uparrow})$ is used, the actual rising time of a cannot be later than $\min(t_{br} + \Delta, t_{\mathrm{tempar}})$. Using mathematical induction we can easily prove that $t_{\mathrm{tempcr}} \le t_{cr}$.
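The relation between the two bounds can be checked numerically for a single gate. The Python sketch below encodes both formulas; the function names, the dictionary keys and the numeric values are assumptions made for illustration, and a single example is of course not a proof.

```python
from typing import Dict


def path_latest_rise(t_temp_ar: float, t_br: float, delta: float,
                     dmax: Dict[str, float]) -> float:
    """Latest rising time of c along path P (a is on-path, b is the side input)."""
    return max(t_temp_ar + dmax["a_rise"],
               min(t_br + delta, t_temp_ar) + dmax["both_rise"])


def gate_latest_rise(t_ar: float, t_br: float, delta: float,
                     dmax: Dict[str, float]) -> float:
    """Latest rising time of c over all paths through the gate."""
    joint = min(min(t_ar, t_br) + delta, max(t_ar, t_br))
    return max(t_ar + dmax["a_rise"], t_br + dmax["b_rise"],
               joint + dmax["both_rise"])


# Hypothetical values: the on-path arrival (90) is no later than t_ar (100).
DMAX_EXAMPLE = {"a_rise": 10.0, "b_rise": 12.0, "both_rise": 15.0}
along_path = path_latest_rise(90.0, 120.0, 5.0, DMAX_EXAMPLE)
whole_gate = gate_latest_rise(100.0, 120.0, 5.0, DMAX_EXAMPLE)
print(along_path, whole_gate, along_path <= whole_gate)
```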
Similar formulas can be derived for other types of gates and for gates with more than two inputs.
Lemma 3 The delay along a path computed by the above algorithm is an upper bound on the transition delay of this path.
Proof. The proof is similar to that of Theorem 1 in Section 3. □
Algorithm Description:
Step 1: Use the algorithm described in Section 3 to compute the latest possible rising and falling times of each gate.
Step 2: T=0;
Step 3: Use the Loose Path Sensitization algorithm to get the next longest sensitizable path, using the maximum possible delay of each gate. If the delay along this path under the maximum possible delay is not larger than T, then stop; otherwise go to Step 4.
Step 4: Use the data dependent gate delay model to get a more accurate delay along the path obtained in Step 3 (side inputs use the rising/falling times obtained in Step 1). Then take the maximum of the path delay computed by the data dependent delay model and the previous T as the current value of T. Go to Step 3.
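The control flow of Steps 2 to 4 can be sketched in Python. The sketch assumes that some path sensitization routine already yields sensitizable paths in non-increasing order of their maximum-possible delay, each paired with its data dependent delay; that interface, the function name `combination_bound`, and the numbers are hypothetical.

```python
from typing import Iterable, Tuple


def combination_bound(paths: Iterable[Tuple[float, float]]) -> float:
    """Return the final T of the combination algorithm.

    `paths` yields (max_delay, data_dependent_delay) pairs for the next
    longest sensitizable path, in non-increasing order of max_delay.
    """
    T = 0.0                                   # Step 2
    for max_delay, dd_delay in paths:         # Step 3: next longest path
        if max_delay <= T:
            break                             # no remaining path can exceed T
        T = max(T, dd_delay)                  # Step 4: refine with the data dependent delay
    return T


# Hypothetical path list, for illustration only.
example_paths = [(100.0, 80.0), (90.0, 85.0), (84.0, 70.0), (60.0, 60.0)]
print(combination_bound(example_paths))
```

The loop stops at the third path in this example, because its maximum-possible delay (84) no longer exceeds the best data dependent delay found so far (85).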
We will prove that the last T value is an upper bound on the circuit delay. To simplify the proof, we assume that the circuit has only one primary output.
Given a primary input vector v for a circuit, we can find a subset of paths such that the longest path in the subset cannot underestimate the delay of the circuit under vector v for any delay assignment. The subset of paths can be constructed as follows.
We first derive the stable value of each lead in the circuit under v. Then, proceeding from the primary output toward the primary inputs, we choose input leads by the following criterion: if some inputs of a gate have the controlling value, choose one of them arbitrarily; if all inputs have the non-controlling value, choose all of them. This generates a subset of leads, which in turn forms a subset of the paths of the original circuit, referred to below as $SPath_v$.

Proof. For any delay assignment, the longest sensitizable path under v is either in $SPath_v$, or there is at least one path with delay larger than that of the longest sensitizable path under v. This is because the longest sensitizable path is the longest path satisfying the condition that each on-input of the path is either the earliest controlling input, or a non-controlling input with all its side inputs being non-controlling inputs under v, whereas a path in $SPath_v$ is a path such that each on-input is either a controlling input (not necessarily the earliest), or a non-controlling input with all its side inputs being non-controlling inputs under v.

Table 2: Longest path of 10 ISCAS Benchmarks

[...] the transition, whether from NC to C or C to NC, we created the data dependent delays for all of the ISCAS benchmarks.
In Table 2, TL denotes the topological long path algorithm based on the bounded delay model, and OurTL denotes the topological long path algorithm based on the simplified data dependent delay model. ChenDu denotes the algorithm for computing the longest sensitizable path [2]. Combination is the algorithm described in the previous section. The OurTL algorithm is more accurate than the longest path sensitization algorithm [2] (except for C1908) and takes less computation time. The combination algorithm gets the most accurate results but takes the most computation time.
In Table 3, TS and OurTS denote the topological short path algorithms based on the bounded delay model and the simplified data dependent delay model respectively. Both algorithms have the same accuracy for the shortest delays of circuits C880, C2670, C5315, C6288 and C7552. However, the OurTS algorithm gives more accurate estimates for circuits C432, C499, C1355 and C3540. In general, the OurTS algorithm takes more CPU time.
Conclusion
In order to avoid simulation in the timing analysis of CMOS circuits, we first proposed a simplified data dependent delay model. This allows us to develop several efficient algorithms
