CMOS scaling has led to increasingly high variability in device and circuit performance. To improve design robustness, it is important to consider variation in the design flow. In this paper, a closed-form solution is proposed to predict the variability in gate timing, which significantly reduces computation cost in statistical analysis. The proposed model covers both nominal delay and its variability across a wide range of device sizes, load capacitances and input transition times. The stack effect, such as that in NAND and NOR gates, is taken into account, thereby making the model sensitive to the switching patterns. For ISCAS'85 benchmark circuits, implemented using a 45nm library, the model demonstrates high accuracy with less than 3.5% error for nominal delay and within 5ps variation of the critical path. Finally, use of the proposed model in the design flow is demonstrated for setup time violations.
Introduction
As CMOS technology nodes shift towards 45nm and below, process variations increase significantly, causing high variability in circuit performance and thus reducing manufacturing yield. To improve design robustness, the effect of variability should be accounted for during the design flow. Conservative design for an approximated worst-case performance is not recommended because high variability in scaled technologies leads to over-design. Analyzing variability in complex circuits through SPICE simulations consumes enormous run time. What is needed is an accurate analytical model that predicts variability quickly enough to be integrated into the design flow.
There are many analytical models [1] [2] [3] [4] [5] [6] to predict delay under nominal conditions. However, these models do not take into account the effect of variability, which is critical in future technology nodes [7]. Also, most of the work on variability analysis has been confined to the inverter; other gates are simplified into equivalent inverters through logical effort [8] [9]. This approach, though simple, is not accurate and cannot be easily extended to complex gates.
In this paper, a closed-form solution to estimate nominal delay and delay variability due to variations in threshold voltage is proposed. The model is verified extensively against SPICE Monte Carlo simulation results at 45nm with Predictive Technology Models (PTM) [10]. The contributions of this paper are as follows:
• An accurate analytical delay model is derived for the CMOS inverter as a function of gate width, load capacitance and input transition time (Section 2.1). The analysis is extended to NAND and NOR gates to account for the stacking effect (Section 2.2).
• An analytical model for estimating delay variability due to threshold voltage variations is developed for inverter and NAND2 gates (Section 3). The model shows good correlation with SPICE simulations.
• The accuracy of the model is demonstrated through small circuits like XOR2 as well as complex ISCAS'85 benchmark circuits. For the benchmark circuits, nominal delay for critical paths is predicted within 5% error compared to the value estimated by Synopsys PrimeTime using a 45nm technology library [11] (Section 4.2).
• Use of the proposed model in the design flow is demonstrated. It is shown how possible timing errors due to variability can be easily identified (Section 4.3).
Nominal Delay Model

Timing Model for Inverter
To derive the nominal delay model for a basic inverter at scaled technologies, the standard current equations using Shockley's MOSFET model or the Alpha-Power law MOSFET model cannot be used. This is because the saturation current is not constant for technology nodes below 50nm due to channel length modulation. The following current equation from [12], which considers channel length modulation, is used to develop the analytical delay model.
where α is the velocity saturation index and is considered to be 1 for the technology nodes considered. λ is the empirical channel length modulation factor. V th is the threshold voltage of the NMOS device. I D0 is the drain current at V GS = V DS = V DD . V DSAT is the drain saturation voltage at V GS = V DD . Since the operating range of V DD is very small for technologies under 45nm and V DSAT does not vary much in this range, V DSAT is also considered to be the saturation voltage for all V GS as in [12] .
The delay model is developed for the output high-to-low (HL) delay, T phl , of an inverter; the same model is applicable to the output low-to-high (LH) delay as well. The input of the inverter, V in , is considered to be a linear rising ramp driven by an active driver, with transition time t r . So at time t, V in (t) = V DD (t/t r ). Similar to [12], the discharging behavior of the output node can be divided into different regions:
Region 1. V in < V th : Here the NMOS is in the cut-off region and the output is at V DD .
Region 2. V th < V in ≤ V DD : Here the NMOS is in saturation.
Unlike [12], coupling capacitance between gate and drain is not considered because its effect is significant only for sharp input edges and can be ignored for practical circuits. Short circuit current is also ignored since, in an inverter, both PMOS and NMOS conduct only when V th < V in < V DD - V th(p) , where V th(p) is the threshold voltage of the PMOS, and for scaled technologies, V DD - V th(p) - V th is very small. If C L is the load capacitance and I n is the current through the NMOS device, then the output voltage V out is given by,
By substituting saturation current equation from (1), we get 
For scaled devices, transition times can no longer be ignored in the delay equations. Propagation delay is defined by the difference of times when V in = 0.5V DD (that is 0.5t r ) and when, V out = 0.5V DD . V out reaches half of V DD in either Region 2 or Region 3 depending on the value of input transition time and output load capacitance.
• For slow input or small load capacitance, V out reaches half V DD in Region 2. T phl is obtained from equation (3) and is given by equation (5).
• For fast input or large load capacitance, V out reaches half V DD in Region 3. T phl is obtained from equation (4) and is given by equation (6).
Model Validation:
The model is validated against SPICE simulations for a wide range of transistor widths, load capacitances and transition times. First, width is varied from twice the minimum length to 20 times the minimum length; for this case, fanin and fanout are fixed at FO4. Next, load capacitance is varied by sweeping from FO4 to FO20 while keeping fanin at FO4. Here the width of the NMOS transistor is fixed at 4 times the minimum length. Then input transition time is varied by sweeping the fanin of the gate with fanout fixed at 10. Here the NMOS width is again set to 4 times the minimum length. Figure 1 shows HL delay values predicted by the model and by SPICE simulations for 45nm technology. As seen from the figure, delay is almost constant with varying width. Delay is proportional to load capacitance; it is also proportional to transition time for small transition times but saturates for large transition times. The figure also shows that the model is continuous between Region 2 and Region 3. At the 45nm node, the analytical model for nominal delay matches the SPICE values with an average error of 1.08% when width is varied, 2.95% when load capacitance is varied and 1.83% when input transition time is varied. At the 32nm node, the average errors are 0.71%, 4.58% and 3.15% with varying width, load capacitance and input transition time, respectively. Thus the model is also accurate for smaller technology nodes.
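The delay definition used in this section (the time from V in = 0.5V DD to V out = 0.5V DD , with a ramp input and an NMOS discharging C L ) can be cross-checked numerically. The following is a minimal sketch and not the closed-form model of this paper: the current expression in `nmos_current` is an assumed alpha-power-style form with a channel length modulation term, normalized so that the current equals I D0 at V GS = V DS = V DD , and every value in `PARAMS` is a hypothetical placeholder rather than an extracted 45nm parameter.

```python
# Hypothetical device parameters (placeholders, not extracted PTM values).
PARAMS = {
    "vdd": 1.0,     # supply voltage [V]
    "vth": 0.4,     # NMOS threshold voltage [V]
    "alpha": 1.0,   # velocity saturation index
    "lam": 0.1,     # channel length modulation factor [1/V]
    "id0": 100e-6,  # drain current at VGS = VDS = VDD [A]
    "vdsat": 0.3,   # drain saturation voltage [V]
}


def nmos_current(vgs, vds, p):
    """Assumed drain current model (illustrative, not equation (1) itself)."""
    if vgs <= p["vth"] or vds <= 0.0:
        return 0.0  # cut-off, or output fully discharged
    drive = ((vgs - p["vth"]) / (p["vdd"] - p["vth"])) ** p["alpha"]
    # In the linear region, evaluate channel length modulation at VDSAT so
    # that the current stays continuous at VDS = VDSAT.
    v_eff = max(vds, p["vdsat"])
    i_sat = p["id0"] * drive * (1 + p["lam"] * v_eff) / (1 + p["lam"] * p["vdd"])
    if vds >= p["vdsat"]:
        return i_sat
    x = vds / p["vdsat"]
    return i_sat * x * (2.0 - x)  # simple parabolic linear-region roll-off


def tphl(p, c_load, t_r, dt=1e-14, t_max=1e-9):
    """HL delay by forward-Euler integration of C_L * dVout/dt = -I_n."""
    t, vout = 0.0, p["vdd"]
    while vout > 0.5 * p["vdd"]:
        if t > t_max:
            raise RuntimeError("output never reached VDD/2")
        vin = p["vdd"] * min(t / t_r, 1.0)  # linear ramp, then held at VDD
        vout -= nmos_current(vin, vout, p) * dt / c_load
        t += dt
    return t - 0.5 * t_r  # measured from the instant Vin = 0.5 * VDD


if __name__ == "__main__":
    for c_load in (2e-15, 4e-15, 8e-15):
        d = tphl(PARAMS, c_load, t_r=20e-12)
        print(f"C_L = {c_load * 1e15:.0f} fF -> T_phl = {d * 1e12:.2f} ps")
```

Sweeping `c_load` or `t_r` in this sketch gives a quick sanity check of the trends described above; it is far slower than a closed-form expression and is only meant as a reference for the delay definition.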
Timing Model for NAND and NOR gates
The delay model derived for an inverter is extended to handle stacked transistors in NAND and NOR gates. Here the output voltage discharge characteristics depend on the transistor stack placed between the switching input and the output. Transistors placed between the switching input and the supply nodes do not affect the output and hence the delay. For instance, in Figure 2, when input is given to A1, the output depends only on transistor M1, whereas when the input is given to A2, the output depends on both transistors M1 and M2. The two cases are considered separately as follows.
Case 1. Input given to bottom transistor in the stack: Here V th(M1) is the threshold voltage of M1. According to the Elmore delay model, delay is proportional to R 2 (C L + C X ) + R 1 C L (7)
where C L is the load capacitance and C X is the capacitance at node X. The first term in equation (7) is t vx = R 2 (C L + C X ), which is the time to discharge C L and C X through M2. The second term in equation (7), t vout = R 1 C L , is the time to discharge the load capacitance through M1. So, the total HL propagation delay of the NAND2 gate when input is given to the bottom transistor is given by equation (8). Depending on the input transition time, t sat and t vx can be less than or more than t r .
t vx < t r : In this case the input is still rising when V x reaches V xf . M2 is in the linear region with rising input, and V out , solved with I n in the linear region, is given by
The constant C is found using the boundary condition when V out is equal to V DSAT at t= t sat . The time when V out reaches V xf is ( )
where ( )
t sat < t r , t vx ≥ t r : During the time from t sat to t r , M2 is in the linear region with rising input and the voltage V x is given by equation (10). Let V x reach V x,tr at t = t r . Then the time taken to discharge V x from V x,tr to V xf is R 2 C X ln(V x,tr / V xf ).
Here V t is the threshold voltage of M1 or M2 depending on whether the input has a fast or slow transition time. When the input has a fast transition edge (t r < t sat ), the current through M2 is large and the current through M1 is limited by M1 itself, so V t = V th(M1) . If the input has a slow transition edge, the current through M2 is small and the current through M1 is limited by M2, so V t = V th . The time to discharge C L from V DD to half V DD is given by
Case 2. Input given to top transistor in the stack: When input is given to the top transistor, V x is already discharged. So only V out has to discharge from V DD to half V DD through the stack. This is equivalent to an inverter where M1 and M2 together are considered to be a single transistor of approximately half the width. The delay is given by equations (5) and (6).
Model Validation: Figure 3 shows the characteristic behavior of the NAND2 gate using the proposed model and SPICE simulations when the input is given to the M2 (bottom) transistor. Delay values with respect to varying transistor widths, load capacitances and transition times are shown. Similar to the INV delay, the NAND2 delay is almost invariant to width, varies linearly with C L and t r for fast transitions and saturates for slow transitions. The average error when input is given to M2 is -0.16% when varying the width, -0.27% when varying C L and 1.17% when varying t r .
The parameters required in the model are given in Table I. The parameters α, λ, V th , I D0 and V DSAT are extracted from device characteristics. The load capacitance and the final voltage V xf that node X reaches are circuit-level parameters. All other parameters, such as Kx, Ky, Kz, Klog, K2 and C, are derived from the parameters in Table I.
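The two Elmore terms used for Case 1, t vx = R 2 (C L + C X ) and t vout = R 1 C L , can be checked against a generic RC-ladder Elmore sum. The sketch below is illustrative only: the function names are hypothetical, the resistances are treated as fixed effective values (an assumption, since the paper's model follows the actual transistor current through the regions described above), and the numeric values are placeholders.

```python
import math


def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder discharging to ground.

    resistances[i] is the resistance between node i and the node below it
    (index 0 is closest to ground); capacitances[i] is the capacitance at
    node i. Each capacitor is weighted by the total resistance on its
    path to ground.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistor")
    delay, r_path = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        r_path += r
        delay += r_path * c
    return delay


def nand2_elmore_terms(r1, r2, c_load, c_x):
    """The two terms named in the text: t_vx and t_vout."""
    t_vx = r2 * (c_load + c_x)  # discharge C_L and C_X through M2
    t_vout = r1 * c_load        # discharge C_L through M1
    return t_vx, t_vout


if __name__ == "__main__":
    # Hypothetical values: R1 (M1, top), R2 (M2, bottom), C_L, C_X.
    r1, r2, c_load, c_x = 5e3, 5e3, 4e-15, 1e-15
    t_vx, t_vout = nand2_elmore_terms(r1, r2, c_load, c_x)
    ladder = elmore_delay([r2, r1], [c_x, c_load])
    print("terms agree with ladder sum:", math.isclose(t_vx + t_vout, ladder))
```

Writing the sum as a ladder makes it straightforward to extend the same bookkeeping to a three-transistor stack by adding one more resistor and internal node capacitance.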
Delay variability model
Variability in inverter
Variation in delay due to threshold voltage is analyzed in this section. Threshold voltage variation is assumed to follow a Gaussian distribution; hence, the delay variation should also follow a Gaussian distribution, with standard deviation σ TP [13].
For an inverter with slow rising input or small load capacitance, the inverter delay follows equation (5) and the variability in such a case is given by equation (17).
The maximum difference between SPICE and model-estimated σ/µ is 0.68% when width is varied, 0.47% when C L is varied and 1.09% when t r is varied for HL delays at 45nm technology. Here, σ/µ simulated by SPICE is 4.5%.
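The reasoning of this section, that a Gaussian threshold voltage spread maps to a Gaussian delay spread, corresponds to a first-order (linearized) propagation in which the delay standard deviation is the magnitude of the delay sensitivity to V th times the V th standard deviation. The sketch below illustrates that idea numerically under stated assumptions: `delay_sigma` uses a central finite difference instead of an analytical derivative, and `toy_delay` is a hypothetical stand-in for the closed-form delay, not equation (5) or (6).

```python
def delay_sigma(delay_fn, vth_nom, sigma_vth, rel_step=1e-4):
    """First-order estimate: sigma_T ~= |dT/dVth| * sigma_Vth."""
    h = rel_step * abs(vth_nom) if vth_nom != 0 else rel_step
    slope = (delay_fn(vth_nom + h) - delay_fn(vth_nom - h)) / (2.0 * h)
    return abs(slope) * sigma_vth


def toy_delay(vth, vdd=1.0, c_load=4e-15, i_ref=150e-6):
    """Hypothetical delay expression (an assumption for illustration).

    The charge 0.5 * C_L * VDD is removed by a current that scales
    linearly with gate overdrive (alpha = 1).
    """
    current = i_ref * (vdd - vth) / vdd
    return 0.5 * c_load * vdd / current


if __name__ == "__main__":
    vth_nom, sigma_vth = 0.4, 0.012  # hypothetical: sigma is 3% of nominal
    mu = toy_delay(vth_nom)
    sigma = delay_sigma(toy_delay, vth_nom, sigma_vth)
    print(f"mu = {mu * 1e12:.2f} ps, sigma/mu = {100 * sigma / mu:.2f} %")
```

Because the estimate is linear in the V th deviation, it is accurate only while the V th spread is small compared with the gate overdrive; a Monte Carlo run remains the reference for larger spreads.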
Variability in NAND and NOR gates
Delay variations in NAND and NOR gates are also derived using equation (16) . Delay variation for NAND2 depends on whether input is given to top transistor or bottom transistor of the stack.
Case 1. Input given to bottom transistor in the stack:
When input is given to the bottom transistor (M2 of Figure 2), V th variation in either the top or the bottom transistor affects delay variation. The delay of NAND2 is given by equation (8) and it is a function of both V th(M1) and V th(M2) (V th(M2) = V th in the NAND delay equations). So the partial derivative of (8) with respect to V th(M1) or V th gives the delay variation due to threshold voltage variation in M1 or M2, respectively.
Case 2. Input given to top transistor in the stack:
When input is given to the top transistor, delay depends only on V th of the top transistor. So the variation in delay is given by the inverter delay variation, as in equations (17) or (18), depending on the region in which V out reaches half V DD . Variation in the bottom transistor should therefore have almost no effect on the delay in this case.
Table II shows the variation when input is given to the top (M1) and bottom (M2) transistors. In each case V th of only M1 or M2 is varied. The NAND2 gate is loaded with FO10 and fanin is set at FO4. SPICE and model results show that, when input is given to M1, V th of M1 has a strong effect on delay variability while V th of M2 has very little effect. But when input is given to M2, V th of both M1 and M2 affects delay variability. SPICE results closely match the model-estimated values.
For a NAND3 gate, delay variability depends on V th variation of the top and bottom transistors when input is given to the bottom transistor. It does not depend on the middle transistor because variations in the middle transistor are easily compensated by either the top or the bottom transistor. When input is given to the middle transistor, variations in the top and middle transistors affect delay variability. When input is given to the top transistor, variations in the top transistor alone affect the delay variations.
Validation with benchmark circuits
XOR gate
The analytical model is applied to XOR2 gate. The estimated delay values are compared with SPICE values. The nominal delays of all the stages are added to compute the delay of the circuit. Variability is also found for each stage and since variations in each transistor are considered to be independent, equation (18) is used to estimate total circuit variability.
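The stage-by-stage accumulation described above can be sketched as follows. This is an illustration under an explicit assumption: because the per-transistor variations are taken to be independent, the stage standard deviations are combined as a root-sum-square while the nominal delays add linearly. The function name and the per-stage numbers are hypothetical and are not taken from Table III.

```python
import math


def path_delay_stats(stages):
    """Combine per-stage (nominal_delay, sigma) pairs along a path.

    Returns (total nominal delay, total sigma), assuming the stage delay
    variations are statistically independent.
    """
    stages = list(stages)
    total_mu = sum(mu for mu, _ in stages)
    total_sigma = math.sqrt(sum(sigma ** 2 for _, sigma in stages))
    return total_mu, total_sigma


if __name__ == "__main__":
    # Hypothetical per-stage values in picoseconds.
    stages_ps = [(30.0, 1.5), (42.0, 2.0), (25.0, 1.0)]
    mu, sigma = path_delay_stats(stages_ps)
    print(f"path delay = {mu:.1f} ps, sigma = {sigma:.2f} ps, "
          f"sigma/mu = {100 * sigma / mu:.2f} %")
```

Since the nominal delay grows linearly with the number of stages while the combined sigma grows more slowly, σ/µ shrinks on long paths, which is the averaging effect on long critical paths.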
The circuit of the XOR2 gate is shown in Figure 5. The HL delay is considered when input A is set to 0 and B switches from V DD to 0, and the LH delay when B switches back from 0 to V DD . The other pattern considered is when B is 0 and A is switching, since these patterns activate the critical paths. Results are summarized in Table III. The maximum difference between SPICE and model-predicted σ/µ percentage is less than 1%. Such an accurate prediction of nominal delay and delay variation has been possible because the model considers load capacitance, transition times and the stacking effect.
Model validation for ISCAS'85 benchmark circuits
Comparison of nominal delay at 45nm: For the ISCAS benchmark circuits, the nominal delays estimated for critical paths using the model and SPICE simulations are shown in Table IV. The model predicts nominal delay with a maximum error of 3.5% compared to SPICE simulation. While the results here are for the Nangate library [11], the model can easily be applied to other standard libraries.
Variation prediction with the model: The delay variability of critical paths for five of the ISCAS benchmark circuits is also estimated for a ±3% V th variation. Table V summarizes the delay variation for the benchmark circuits. The model-predicted average variation is within 1ps to 5ps of SPICE simulation, depending on the circuit topology, internal node capacitance and the transition time. The time taken to estimate variability with the model is a very small fraction of the time taken to run SPICE simulations.
Though channel length variation contributes a significant portion of process variation, as a first-order approximation the Vth variation model can quickly help identify critical path delay variation in the early stages of the design flow. The run time of the proposed analytical model is several orders of magnitude less than the time taken by HSPICE simulation to estimate the variation. The run time for 100 Monte Carlo simulations in HSPICE is about 2 hours for a circuit with 202 gates and about 4 hours for a circuit with 1113 gates. Thus, for designs with a larger number of gates, the run time can be too large for practical applications. In the analytical model, the major contributors to run time are extracting the capacitance of the nodes in the critical path and identifying the critical path itself. Yet, for a circuit of about 2500 gates, the run time is less than approximately 10s. The calculation of mean delay and delay variation takes only a few milliseconds. Hence the run time is negligible and, as the analytical model predicts variation with reasonable accuracy, it can be integrated into the design flow for robust design.
Analytical model usage in design flow
The proposed analytical model can be used to identify possible timing violations in the early stages of the design flow. Setup violations are caused by delay variations in critical paths. Variability is low in these paths because of the averaging effect. However, paths with slightly smaller delay can have larger variability and can become critical [15] [16]. Figure 6 shows the delay distribution graph for the C880 benchmark circuit at nominal conditions and with Vth variation. As can be seen from the graph, the distribution widens because of variation and the number of critical paths increases. Some of the paths that are noncritical at nominal conditions have now become critical. The number of shortest paths also increases. The minimum delay decreases, and this can cause a hold violation. A similar trend is seen in another circuit, ISCAS C7552. Here the critical path has a nominal delay of 1885.4ps and paths with delay within 5% of the critical path delay are also considered critical paths. Consider Path-2 with a delay of 1787.3ps, which is 5.2% smaller than the critical path and so is not considered critical. Now with a slight Vth variation of 
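The screening step described in this section can be sketched in a few lines. The sketch below is a hedged illustration, not the flow used in the paper: the function names, the fixed threshold relative to the nominal critical delay and the k-sigma slow corner are assumptions, and the sigma values are hypothetical placeholders. Only the two delay values (1885.4ps and 1787.3ps) and the 5% window come from the text.

```python
def critical_threshold(critical_delay, window=0.05):
    """Delay above which a path is treated as critical (within `window`)."""
    return critical_delay * (1.0 - window)


def percent_below(delay, critical_delay):
    """How far a path delay sits below the critical path delay, in percent."""
    return 100.0 * (critical_delay - delay) / critical_delay


def may_become_critical(delay, sigma, critical_delay, window=0.05, k=3.0):
    """True if the path's k-sigma slow corner crosses the critical threshold."""
    return delay + k * sigma >= critical_threshold(critical_delay, window)


if __name__ == "__main__":
    critical, path2 = 1885.4, 1787.3  # ps, values quoted in the text
    print(f"Path-2 is {percent_below(path2, critical):.1f}% below critical")
    print("critical at nominal:", path2 >= critical_threshold(critical))
    # The sigma below is a hypothetical placeholder, not a reported result.
    print("critical with variation:",
          may_become_critical(path2, sigma=5.0, critical_delay=critical))
```

In a flow, the per-path sigma would come from the analytical variability model, so every near-critical path can be screened without a Monte Carlo run.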
Conclusion
An analytical model for predicting nominal delay and delay variability of CMOS gates has been proposed. The results closely match those generated by time consuming Monte Carlo simulations for a wide range of gate size, load capacitance and transition times. The model is simple and can be used to accurately predict timing variability during design phase. Hence it can be easily integrated into the design flow to identify timing violations due to variations and helps enable robust design implementation during early stages of the design.
