Abstract-As the minimum feature sizes of VLSI fabrication processes continue to shrink, the impact of process variations is becoming increasingly significant. This has prompted research into extending traditional static timing analysis so that it can be performed statistically. However, statistical static timing analysis (SSTA) tends to be quite pessimistic. In this paper we present a sensitizable statistical timing analysis (StatSense) technique to overcome the pessimism of SSTA. Our StatSense approach implicitly eliminates false paths, and also uses different delay distributions for different input transitions for any gate. These features enable our StatSense approach to perform less conservative timing analysis than the SSTA approach. Our results show that on average, the worst case (µ + 3σ) circuit delay reported by StatSense is about 20% lower than that reported by SSTA.
I. INTRODUCTION
In recent times, statistical timing analysis has received significant attention in both academe and industry. This has been primarily due to the fact that process variation control has not kept pace with the rapidly diminishing feature sizes. While a lot of research has suggested that statistical timing analysis is essential for timing closure in VLSI design today, the use of this new method of timing analysis has not been readily welcomed by all chip designers. It is not just the reticence of designers towards adopting a new design methodology that is preventing/slowing the adoption of this new timing approach. There is also a legitimate concern that the results of statistical timing analysis tend to be overly pessimistic. Besides, statistical timing analysis takes longer to run. It also requires a greater effort during the gate library characterization phase. Designers are hence skeptical about the benefits of this new timing analysis methodology.
There are many sources of pessimism in statistical timing analysis, and many of them depend on the method used for the analysis. Some of the common sources are:
1) Spatial correlations
2) Path correlations
3) Approximation of PDFs (Probability Density Functions) to Gaussian distributions (usually done during calculation of the MAX of two PDFs)
4) False paths
5) The assumption that gate delays are Normally distributed
In this paper we deal with the last two sources of pessimism. The approach discussed in this paper also implicitly considers path correlations and does not approximate PDFs to Gaussian distributions. In particular, the gate delay for each input transition (which results in an output change) is assumed to have a Normal distribution. Since there may be several such input transitions that cause some output transition, the resulting delay at the output consists of several Normal distributions (one for every input transition that causes an output change).
II. PREVIOUS WORK
Most techniques for statistical timing analysis are essentially based on the principles of Static Timing Analysis (STA). Hence statistical timing analysis is often called statistical static timing analysis (SSTA). The fundamental operations in an SSTA tool are the SUM and the MAX operations. Most SSTA algorithms rely on smart ways to implement these SUM and MAX operations on delay distributions rather than on single discrete delay values.
In [1], the authors use PCA (Principal Component Analysis) to handle spatial correlations. They assume all delay distributions to be Gaussian and approximate the MAX of two or more Gaussian distributions to be Gaussian as well. In [2], a canonical first-order delay model is proposed and an incremental block-based timing analyzer is used to propagate arrival times and required times through a timing graph in this canonical form. One of the major contributions of the algorithm proposed in [2] is that it allows the statistical timing engine to be used incrementally. In [3], [4], [5], the authors note that the complexity of accurate statistical timing analysis can become exponential. Hence, they propose faster algorithms that compute bounds on the exact result rather than the exact result itself. In [6], the authors propose representing the arrival times as CDFs (Cumulative Distribution Functions) and the gate delays as PDFs to help perform the SUM and MAX operations efficiently. In [7], the authors propagate delay distributions (PDFs) through a circuit. The PDFs are discretized to help make the operation more efficient.
The common theme in all the above works is that they are based on the static timing analysis framework. Hence only the structurally long paths are identified through these algorithms. The authors of [8] identify this deficiency and come up with a statistical timing analysis flow that considers false paths. While the authors of [8] reduce pessimism by considering false paths, they do not address the pessimism that arises from considering the gate delay distribution to be a single Gaussian.
Our approach eliminates false paths implicitly. In the statistical timing analysis flow discussed in [8], a traditional SSTA is done, followed by an attempt to find sensitizable paths. In our approach, this order is reversed. We first find the primary input vector transitions that result in the sensitizable longest delays for the circuit, and then do a statistical analysis on these vector transitions. This statistical analysis utilizes, for each gate, the particular Normal distribution which corresponds to the input transition that the gate undergoes for the longest delay to be sensitized.
The main contributions of this paper are two-fold:
• By utilizing a sensitizable timing analysis tool, our approach implicitly eliminates false paths. SSTA does not eliminate false paths, leading to pessimistic results.
• For each input transition on the gate (which causes an output change), our approach utilizes a separate Normal distribution. Hence the statistical delays reported at the circuit level are more representative of the true circuit behavior. In SSTA, a single Normal distribution is used for any gate, regardless of the input transitions that the gate undergoes.
III. OUR APPROACH
Our approach eliminates false paths and also accounts for the fact that the delay of a gate has different Normal distributions for different input transitions (which cause an output transition). Our approach consists of two phases. In the first phase we find a set of logically sensitizable vector transitions that result in the largest delays for the circuit. In the second phase, we use Monte-Carlo based techniques to propagate the arrival times for these delay-critical sensitizable vector transitions, and come up with a delay distribution at the outputs. The input transitions at any gate are known after the first phase, and so the gate delay distribution corresponding to this input transition is utilized in the second phase. The second phase therefore performs SSTA, using the appropriate gate delay distribution corresponding to the input transition for each gate. In the remainder of this section, these two phases are described, along with a discussion on how input arrival times are propagated for any gate.
A. Phase 1: Finding Sensitizable Delay-critical Vector Transitions
To make sure that we do not spend time performing statistical analysis on false paths, we first find a user-specified number of sensitizable vector transitions that result in the largest delays for the circuit. This is done using the sense [9] package in SIS [10]. Sense uses a SAT solver to verify whether a particular delay (initially set to the delay found from a static timing analysis) is sensitizable. If there is no satisfying input vector that produces this delay, then the delay value is reduced in steps until we reach a delay D that has a satisfying vector (a vector on the primary inputs that has a delay D). In its original implementation, sense returns only the critical delay of the circuit. We augmented the sense routine to return the vector (final vector) at the primary inputs, as well as all the possible previous vectors at the primary inputs that cause this delay. A change from any previous vector to a final vector is referred to as a vector transition. The set of input transitions is stored in an array for use in the second phase of our statistical timing flow. We then insert the complement of this largest sensitizable delay vector as a clause in sense's SAT routine and run sense again to get the next critical vector. We continue this until we get a large enough set of delay-critical vector transitions. The number of vector transitions collected before we move on to the second phase of the flow is decided based on the desired accuracy and the available time for computation. In the second phase of the flow, we propagate arrival times in a manner that exploits the fact that we know the input transition at each gate. This is explained in the following section.
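The enumeration loop described above can be sketched as follows. This is only an illustration under stated assumptions: a brute-force search over all vector pairs stands in for the SAT query performed by sense, a set of blocked final vectors stands in for the added blocking clauses, and the function name `critical_transitions`, the table `TOY_DELAYS`, and its integer delay values are hypothetical.

```python
from itertools import product

def critical_transitions(delay_of, n_inputs, sta_delay, step, k):
    """Collect up to k final vectors (with their previous vectors), largest delay first."""
    vectors = list(product((0, 1), repeat=n_inputs))
    blocked_finals = set()   # plays the role of the blocking clauses
    found = []               # entries are (previous_vector, final_vector, delay)
    finals_found = 0
    target = sta_delay       # start from the (pessimistic) STA delay
    while finals_found < k and target > 0:
        hits = []
        for prev, final in product(vectors, repeat=2):
            if final in blocked_finals:
                continue
            d = delay_of(prev, final)
            if d is not None and d >= target:
                hits.append((prev, final, d))
        if not hits:
            target -= step   # no satisfying vector: lower the delay and retry
            continue
        final = hits[0][1]
        # keep every previous vector that reaches this delay for the chosen final vector
        found.extend(h for h in hits if h[1] == final)
        blocked_finals.add(final)  # exclude this final vector from later queries
        finals_found += 1
    return found

# Hypothetical delays for a single NAND2 gate with inputs (a, b); not measured data.
TOY_DELAYS = {
    ((1, 1), (0, 0)): 3, ((1, 1), (0, 1)): 5, ((1, 1), (1, 0)): 5,
    ((0, 0), (1, 1)): 6, ((0, 1), (1, 1)): 4, ((1, 0), (1, 1)): 4,
}

result = critical_transitions(lambda p, f: TOY_DELAYS.get((p, f)), 2, 8, 1, 2)
print(result)
```

A real flow would replace the exhaustive inner loop with a satisfiability query, since enumerating all vector pairs is exponential in the number of primary inputs.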
B. Propagating Arrival Times
In a regular static timing analysis, we find the structurally worst case delay. In our timing analysis we take advantage of the fact that we know exactly what transitions cause a node to switch. The details of how we do this are explained with the example of a NAND2 gate. Let us first consider just the nominal delay of a NAND2 gate. Table I is a list of input transitions that cause the output of the NAND gate to change its logic value. Let $AT^{fall}_i$ denote the arrival time of a falling signal at node i and $AT^{rise}_i$ denote the arrival time of a rising signal at node i.
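In the case of a regular STA, the rising time (delay) at the output c of a NAND2 gate is calculated as
$$AT^{rise}_c = \mathrm{MAX}[AT^{fall}_a + \mathrm{MAX}(D_{11\to 00}, D_{11\to 01}),\ AT^{fall}_b + \mathrm{MAX}(D_{11\to 00}, D_{11\to 10})]$$
where $\mathrm{MAX}(D_{11\to 00}, D_{11\to 01})$ is often referred to as the pin-to-pin rising output delay from the input a, while $\mathrm{MAX}(D_{11\to 00}, D_{11\to 10})$ is referred to as the pin-to-pin rising output delay from the input b.
Similarly, in STA the falling time (delay) at the output c of a NAND2 gate is given by
$$AT^{fall}_c = \mathrm{MAX}[AT^{rise}_a + \mathrm{MAX}(D_{00\to 11}, D_{01\to 11}),\ AT^{rise}_b + \mathrm{MAX}(D_{00\to 11}, D_{10\to 11})]$$
where $\mathrm{MAX}(D_{00\to 11}, D_{01\to 11})$ is often referred to as the pin-to-pin falling output delay from the input a, while $\mathrm{MAX}(D_{00\to 11}, D_{10\to 11})$ is referred to as the pin-to-pin falling output delay from the input b.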
For example, if the worst case falling or rising arrival times at inputs a and b were 10ps and 35ps respectively, then the rise delay at c would be calculated to be MAX(10+50.5, 35+53.0) = 88.0ps. Similarly, for a falling c output, the delay would be MAX(10+55.3, 35+55.3) = 90.3ps. However, this is a pessimistic method of calculating the delay. In our approach we attempt to remove some of this pessimism.
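The conventional STA calculation in this example can be reproduced with a few lines of Python. The function name `sta_nand2` is hypothetical; the arrival times and pin-to-pin delays are the ones quoted in the example above.

```python
def sta_nand2(at_a, at_b, pin_delay_a, pin_delay_b):
    """Worst-case output arrival time: each input arrival plus its pin-to-pin delay, then MAX."""
    return max(at_a + pin_delay_a, at_b + pin_delay_b)

# Arrival times of 10 ps (input a) and 35 ps (input b), as in the text.
rise = sta_nand2(10.0, 35.0, 50.5, 53.0)  # pin-to-pin rising delays from a and b
fall = sta_nand2(10.0, 35.0, 55.3, 55.3)  # pin-to-pin falling delays from a and b
print(f"rise = {rise:.1f} ps, fall = {fall:.1f} ps")
```

The calculation never looks at which input transition actually occurs, which is the source of the pessimism discussed next.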
Let us first consider the rising output. The output of the NAND2 gate switches high when either of the two inputs switches low. From the output of sense we can find the actual vector transition that causes the largest delay for a given circuit. This primary input vector transition induces a transition on the gate inputs. Let us assume that this input transition was 11 → 00 for the NAND2 gate. A naive way of calculating the delay would be to state that the delay would be given by
$$AT^{rise}_c = \mathrm{MAX}(AT^{fall}_a, AT^{fall}_b) + D_{11\to 00}$$
Assuming again that the arrival times at inputs a and b were 10ps and 35ps respectively, the delay would then be calculated as MAX(10,35)+30.5 = 65.5ps. However, we do know that the output would start switching before 65.5ps since signal a arrives earlier than signal b. As a result, we can say that the gate effectively goes through the transition 11 → 01 → 00 rather than 11 → 00 directly. Note that the output of the NAND2 gate is high for the vector 01 as well. Hence, we calculate the delay to be
$$AT^{rise}_c = \mathrm{MIN}[AT^{fall}_a + D_{11\to 01},\ AT^{fall}_b + D_{11\to 00}]$$
In our example, the delay is hence MIN(10+50.5, 35+30.5) = 60.5ps. Note that we used the minimum of two delays in this case since any one input falling causes the output to switch. Also note that the delay calculated (60.5ps) is much smaller than the worst case delay calculated using regular STA (88.0ps). The reduction in pessimism in our approach occurs due to the fact that we have information about the input transition for the gate.
Now consider the case of the falling output. The output of the NAND2 gate switches low only when both the inputs switch high. Again, we exploit the fact that sense provides the actual vector transition that caused the critical delay. Let us assume that the induced input transition for the NAND2 gate was 00 → 11. A naive way of calculating the delay would be to state that the delay is
$$AT^{fall}_c = \mathrm{MAX}(AT^{rise}_a, AT^{rise}_b) + D_{00\to 11}$$
Assuming again that the arrival times at inputs a and b were 10ps and 35ps respectively, the delay would be calculated as MAX(10, 35) + $D_{00\to 11}$. However, signal a arrives earlier than signal b.
As a result, we can say that the gate effectively goes through the transition 00 → 10 → 11 rather than 00 → 11 directly. Hence, in our approach, we calculate the delay to be
$$AT^{fall}_c = \mathrm{MAX}[AT^{rise}_a + D_{00\to 11},\ AT^{rise}_b + D_{10\to 11}]$$
In our example, the delay is hence MAX(10+55.3, 35+42.7) = 77.7ps. Note that we used the maximum of two delays in this case since both inputs need to switch to cause the output to switch. Also note that the delay calculated (77.7ps) is smaller than the worst case delay calculated using regular STA (90.3ps).
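The two transition-aware calculations for the NAND2 gate can be sketched in Python as below. The function and parameter names are hypothetical. Writing the rules in terms of the "first" and "second" arriving input is an assumption made here so that one function covers either arrival order; the text only works through the case where input a arrives first.

```python
def nand2_rise(at_first, at_second, d_first_alone, d_both):
    """Output rise for 11 -> 00: either input falling is enough, so take the MIN."""
    return min(at_first + d_first_alone, at_second + d_both)

def nand2_fall(at_first, at_second, d_both, d_second_alone):
    """Output fall for 00 -> 11: both inputs must rise, so take the MAX."""
    return max(at_first + d_both, at_second + d_second_alone)

# Input a arrives at 10 ps and input b at 35 ps, as in the worked example.
# Rise: effective transition 11 -> 01 -> 00, delays of 50.5 ps and 30.5 ps.
rise = nand2_rise(10.0, 35.0, d_first_alone=50.5, d_both=30.5)
# Fall: effective transition 00 -> 10 -> 11, delays of 55.3 ps and 42.7 ps.
fall = nand2_fall(10.0, 35.0, d_both=55.3, d_second_alone=42.7)
print(f"rise = {rise:.1f} ps, fall = {fall:.1f} ps")
```

Compared with the 88.0ps and 90.3ps obtained from regular STA for the same arrival times, both results are smaller because the delay of the transition that actually occurs is used.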
These results are shown graphically in Figures 2 and 3. These plots show the arrival time of the output c of a NAND2 gate, for the 00 → 11 and 11 → 00 transitions respectively. The arrival time of one of the inputs, a, is fixed at zero and the arrival time of the other input, b, is swept from -150ps to 150ps. The propagated delays are shown for STA and our method, along with the delay found by SPICE [11]. As can be seen from these plots, our method of calculating the arrival times for multiple switching inputs matches SPICE quite accurately and is significantly better (less pessimistic) than a traditional STA method for computing arrival times.
We can similarly derive the equations to calculate the arrival times for any arbitrary gate, depending on the input transitions at that gate. Let us consider a NAND3 gate with inputs {a, b, c}. Let us first consider the inputs to the NAND3 gate changing as follows:
The output of the NAND3 gate switches low only when the inputs are 111. Hence the delay of the gate would be calculated as follows:
Now let us consider a NAND3 gate with its output rising. Let the inputs change as below
In this case, the output of the NAND3 gate starts switching high when at least one of the inputs is logic 0. Hence the delay of the gate would be calculated as:
An extension to handling delay distributions is easily done by simply considering the distribution to be made of several distinct delay values, obtained from the PDF of the gate delay.
C. Phase 2: Computing the Output Delay Distribution
In the second phase of the computation, we perform Monte Carlo analysis on the sensitizable vector transitions that result in the largest delays for the circuit (which were computed in the first phase, described in Section III-A). In each of the STA runs for Monte Carlo analysis, we perform arrival time propagation as described in Section III-B. Since the primary input vector transitions may induce transitions on the input of each gate, the delay distribution of the gate for the corresponding gate input transition is used. A random value of the gate delay is computed from this distribution. This is done for each gate in the circuit. Finally, STA is performed, using these delay values. The resulting maximum delay over all the outputs is used to compute the worst case delay distribution of the circuit.
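A minimal Monte Carlo sketch of this phase for a single NAND2 gate is given below. It is not the authors' implementation. The mean delays are the NAND2 values used in the worked example of Section III-B, while the 5% delay sigma, the function names, and the fixed seed are assumptions made only for illustration. One normalized deviate per gate is shared by all of its transition delays, reflecting the assumption that variations within a gate are correlated.

```python
import random
import statistics

# Mean delays (ps) from the NAND2 example; the 5% sigma is an assumption for this sketch.
MEAN = {"11->01": 50.5, "11->00": 30.5}
SIGMA = {name: 0.05 * mu for name, mu in MEAN.items()}

def sample_output_arrival(rng, at_a=10.0, at_b=35.0):
    """One STA run for a gate undergoing 11 -> 00 with input a arriving first."""
    n = rng.gauss(0.0, 1.0)  # one deviate shared by every transition of this gate
    d = {name: MEAN[name] + n * SIGMA[name] for name in MEAN}
    # Effective transition 11 -> 01 -> 00: either falling input raises the output.
    return min(at_a + d["11->01"], at_b + d["11->00"])

def monte_carlo(iterations=1000, seed=0):
    """Return the mean, standard deviation, and mu + 3*sigma of the output arrival time."""
    rng = random.Random(seed)
    samples = [sample_output_arrival(rng) for _ in range(iterations)]
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return mu, sigma, mu + 3.0 * sigma

mu, sigma, worst_case = monte_carlo()
print(f"mu = {mu:.2f} ps, sigma = {sigma:.2f} ps, mu + 3 sigma = {worst_case:.2f} ps")
```

For a full circuit, the same sampling would be repeated for every gate before each arrival time propagation, and the maximum over all outputs would be recorded per iteration.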
IV. EXPERIMENTAL RESULTS
In order to demonstrate the effectiveness of our technique, we tested it on several benchmark circuits from the ISCAS89 and MCNC91 benchmark suites. For all simulations, we assumed a 0.1µm process and used the BPTM 0.1µm process [12]. We first characterized each of the standard cells in our library to come up with a table of values for the mean and standard deviation of the delay of each transition (that causes a change in the output). This pre-characterization was done using SPICE, for a set of different load capacitance values. The parameters considered to be varying, along with their variations, are given in Table II. In this table, all parameters are modeled such that their σ is 5% of their µ.
The characterization results for a NAND2 gate (with a load capacitance of 6fF) are shown in Figures 4 and 5. Figure 4 shows the delay histogram for the three vector transitions which result in a rising output. These vector transitions are 11 → 00, 11 → 01 and 11 → 10. Note that each of these vector transitions exhibits a different output delay distribution. Similarly, Figure 5 shows the delay histogram for the three vector transitions which result in a falling output. These vector transitions are 00 → 11, 01 → 11 and 10 → 11. Note that each of these vector transitions also exhibits a different output delay distribution. The mean and standard deviation of all these distributions are computed and used in the second phase of our algorithm.
During the timing analysis phase of our approach, we interpolate between these capacitance values to find the mean and standard deviation of the delay for the given load capacitance value.
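The table lookup can be sketched as follows. The text only says that values are interpolated between the characterized capacitances, so the use of linear interpolation, the clamping outside the characterized range, the function name, and the table entries are all assumptions made for illustration.

```python
from bisect import bisect_right

def interpolate(table, cap):
    """Return (mu, sigma) at load `cap` from rows (cap_fF, mu_ps, sigma_ps) sorted by cap."""
    caps = [row[0] for row in table]
    if cap <= caps[0]:
        return table[0][1:]    # assumption: clamp below the characterized range
    if cap >= caps[-1]:
        return table[-1][1:]   # assumption: clamp above the characterized range
    hi = bisect_right(caps, cap)
    lo = hi - 1
    (c0, m0, s0), (c1, m1, s1) = table[lo], table[hi]
    t = (cap - c0) / (c1 - c0)
    return (m0 + t * (m1 - m0), s0 + t * (s1 - s0))

# Hypothetical characterization rows for one gate transition; not measured data.
TABLE = [(2.0, 30.0, 1.5), (6.0, 50.0, 2.5), (10.0, 70.0, 3.5)]

print(interpolate(TABLE, 4.0))
```

One such table would be kept per gate and per input transition, so the lookup is keyed by the transition found in the first phase.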
Next we carry out the first phase of our flow. We use sense to find the top few sensitizable critical delays and their corresponding input vector transitions. The result of the first phase of our approach is a set of vector transitions on the primary inputs of the circuit. In our experiments, we utilize the top 50 (or 25) primary input vector transitions that result in the largest circuit delay.
For the second phase of our approach, we propagate these transitions throughout the circuit. Since we have the knowledge of the input transitions at each gate, we use the arrival time propagation methodology explained in Section III-B to compute the arrival time at the gate output. This step of propagating circuit delays is done 1000 times in our experiments (or as many times as is required to get a reasonably stable and accurate estimate of the mean and standard deviation of the maximum delay of the circuit). For each of these 1000 iterations, a random value of delay is chosen for each gate. This random value is chosen from a Gaussian distribution with a µ and σ derived from the precharacterized table of values for each gate. Note that the µ and σ used for any gate correspond to the vector transitions that appear at that gate, for the primary input vector transition being simulated. We assume that the variations of process parameters within a gate are correlated (i.e. the threshold voltages and channel lengths of all the devices within the gate vary in the same manner).
To enforce this assumption, we must choose the random delay value carefully. For example, in a NAND2 gate we have 3 different input rising transitions that cause an output falling transition (these transitions are shown in the bottom half of Table I). For any iteration of the timing analysis, if we choose the value of delay for one of the 3 transitions (say 00 → 11) to be $\mu_{00\to 11} + n\sigma_{00\to 11}$, we must choose the values of the other two transitions (01 → 11, 10 → 11) to be $\mu_{01\to 11} + n\sigma_{01\to 11}$ and $\mu_{10\to 11} + n\sigma_{10\to 11}$ as well, with the same value of n.
Columns 12 through 17 have the same information as Columns 6 through 11, except that the StatSense simulations for these columns were performed using 25 input vector transitions (which result in the largest sensitizable circuit delay). The purpose of this experiment was to verify whether the StatSense runtime can be reduced by simulating fewer input vector transitions. By comparing Columns 8 and 14, we note that there is no appreciable loss of fidelity when 25 input vector transitions are used instead of 50. The worst case circuit delay (the µ + 3σ delay), averaged over all designs, is almost identical in both cases. The benefit of using 25 input vector transitions is indicated in Column 17, which shows that on average, StatSense (with 25 input vector transitions) requires only about 50% more runtime than SSTA. For most of the circuits used in our experiments, the difference between the nominal delays for the first and fiftieth vector chosen was large. In cases where this delay difference is small, a larger number of vectors will need to be used.
In spite of the fact that SSTA conducts 10000 STA iterations, and StatSense conducts 50000 (or 25000) iterations, the runtime of StatSense is not 5× (or 2.5×) that of SSTA. This is because StatSense performs an event driven delay simulation. Whenever there is no transition at the output of a gate g, delay computations for gates in the fanout of g may be avoided. This pruning is not possible in SSTA.
V. FUTURE WORK AND CONCLUSIONS
In response to the growing impact of process variations, there has been much research in extending traditional static timing analysis so that it can be performed statistically. The resulting statistical static timing analysis (SSTA) approaches are, however, quite pessimistic. This pessimism arises from the fact that most static timing analysis tools and their statistical counterparts do not consider false paths. The second major source of pessimism is that most statistical timing analyzers assume delay distributions at all gates in a design to be Gaussian. However, the delay distribution of a gate is not necessarily Gaussian. In fact the delay distribution for a multi-input gate is Gaussian for each input vector transition that causes a change on the gate output. In this paper we present a sensitizable statistical timing analysis (which we call StatSense) technique to overcome the pessimism of SSTA. Our StatSense approach implicitly eliminates false paths, and also uses different delay distributions for different input transitions for any gate. These features enable our StatSense approach to perform less conservative timing analysis than the SSTA approach. Our results show that on average, the worst case (µ + 3σ) circuit delay reported by StatSense is about 20% lower than that reported by SSTA. In the future, we plan to work on techniques to reduce the runtime of the statistical timer. This would allow us to use more input vector transitions for the statistical analysis.
We also plan to investigate methods to find out the minimum number of vector transitions required to get a realistic statistical timing result.
