Abstract
Introduction
Process technology and environment-induced variability of gates and wires in VLSI circuits makes timing analysis of such circuits a challenging task [1]. More precisely, advanced analysis tools must be developed that are capable of verifying the changes in circuit timing that stem from various sources of variation. These sources may be broadly classified as follows: imperfect CMOS manufacturing processes (e.g., variations in L, T_OX, V_T, or ILD thickness), environmental factors such as drops in V_dd (resistive drop and ground bounce), substrate temperature changes (due to movement of local hot spots over the chip area), and device fatigue phenomena (e.g., electromigration, hot electron effects, and negative bias temperature instability) [5]. Static timing analysis (STA) is corner-based: as the number of sources of variation increases, the number of required STA runs rises exponentially. Since it is impossible to analyze all corners, some of the missed corners may result in failures after the chip is manufactured [5]. Moreover, identifying the corner points is itself a complicated task that depends on the precise interconnect and gate structure [6]. Statistical timing analysis (denoted by TA) provides an effective solution to this important problem [1] [3] [4] [5].
TA approaches can be classified into two major groups: 1) path-based TA and 2) block-based TA. Because of the shortcomings of path-based TA, block-based TA has received a lot of attention. In block-based TA, every timing quantity of interest (e.g., delay and slew, arrival time and required arrival time) is represented as a function of global sources of variation (denoted by X_i) and independent random sources of variation (denoted by S_i) in the canonical first-order (denoted by CFO) form [5]. The advantages of such a formulation are that (a) it can capture all correlations and (b) it can produce delay sensitivities due to changes in various environmental and process-related parameters. As a result, designers are able to precisely quantify the sensitivity of a timing parameter to different sources of variation and use this information for timing diagnostics. Block-based TA breaks its analysis into two parts: 1) variational interconnect timing analysis and 2) variational gate timing analysis. Variation-aware interconnect timing analysis is studied in [2]. The authors express the resistance and capacitance of a line as linear functions of random variables and then use these random variables to compute circuit moments. These variability-aware moments are used in known closed-form delay metrics [8] [9] to compute interconnect delay PDFs. The authors of [3] propose a modeling technique for gate delay variability that considers multiple input switching. In [4], a model for calculating the statistical gate-delay variation caused by intra-chip and inter-chip variability is presented.
Unfortunately, block-based TA is lacking in variation-aware gate timing analysis. Recent works do not provide an efficient means of analyzing the gate propagation delay and output slew as a function of a variational input transition, a variation-aware gate timing library, and a variational gate load. In this paper a new framework is proposed for finding variational gate timing behavior. This is achieved by using VGTA (for Variation-Aware Gate Timing Analysis):
1) Given the variational resistive-capacitive load (where all resistances and capacitances are represented in the CFO form), an efficient and accurate algorithm is proposed to calculate the variation-aware RC-load. To perform the analysis, we calculate the variation-aware admittance moments (cf. section 3), and as a result, the resistance and capacitances in the RC-load can be written in the CFO form.
2) Given the variational input transition, statistical gate timing library, and variational RC-load, the objective is to find the variational gate delay and output slew in the CFO form. To achieve this goal, a "variation-aware effective capacitance" technique is proposed (cf. section 4). This method is based on an efficient and reasonably accurate single-iteration C_eff approach.
The remainder of this paper is as follows. In section 2, we review the background of block-based TA.
Variation in the physical dimensions of a wire changes its resistance and capacitance, which in turn causes the gate delay and slew as well as the wire delay and slew to vary. Therefore, we need to capture the effect of geometric variations on the electrical parameters. Classifying the sources of variation into global and independent random sources of variation, we represent the electrical parameters of the wire (i.e., R and C) in the CFO form. For instance, R and C in the CFO form are calculated as follows [7]:
$$R = R_{nom} + \sum_{i=1}^{m} r_i X_i + r_{m+1} S_r$$
$$C = C_{nom} + \sum_{i=1}^{m} c_i X_i + c_{m+1} S_c$$
where R_nom and C_nom represent the nominal resistance and capacitance values, computed when the wire dimensions are at their nominal or typical values. The X_i are the global sources of variation, and S_r and S_c represent the independent random sources of variation for the resistance and capacitance, respectively. r_i and c_i are the sensitivity coefficients of the resistance and capacitance with respect to the sources of variation. With appropriate scaling of the sensitivity coefficients, we can assume that X_i, S_r, and S_c are unit normal random variables, N(0,1).
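As a minimal sketch of how a CFO quantity can be held in software, the following Python class stores the nominal value, the global sensitivities, and the independent coefficient. The class name `CFO`, its field names, and the numeric values for the wire resistance are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CFO:
    """Canonical first-order form: nom + sum_i sens[i] * X_i + indep * S."""
    nom: float         # nominal value
    sens: List[float]  # sensitivities to the global sources X_1..X_m
    indep: float       # coefficient of the independent unit-normal source S

    def mean(self) -> float:
        # X_i and S are zero-mean, so the mean equals the nominal value.
        return self.nom

    def variance(self) -> float:
        # All sources are independent N(0,1), so their variances add.
        return sum(a * a for a in self.sens) + self.indep ** 2

    def sample(self, x: List[float], s: float) -> float:
        # Evaluate the form for one realization of the sources.
        return self.nom + sum(a * xi for a, xi in zip(self.sens, x)) + self.indep * s


# Hypothetical wire resistance: 100 ohm nominal, two global sources.
R = CFO(nom=100.0, sens=[3.0, 4.0], indep=12.0)
```

Because every source is a unit normal, the mean and variance follow directly from the coefficients, which is what makes the form cheap to propagate.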
Converting into CFO form
As mentioned in sections 2.1 and 2.2, it is important to represent timing and electrical quantities in the CFO form. This in turn enables one to propagate first-order sensitivities to the different sources of variation through the timing graph [5] [7]. In addition, it makes statistical computation efficient and practical and provides timing diagnostics at a very small cost in run time. The remaining question is how to convert a quantity of interest (which itself is a function of different CFO variables) into the CFO form.
The following subsection presents a method to answer the above question. We use an example to show the procedure. The problem we address is how to convert the gate propagation delay into the CFO form. However, this method can be easily applied to any quantity of interest.
2.3.1 Gate timing analysis for lumped capacitive load in block-based TA
Problem Statement I: Given is a variational CMOS driver (due to process and environmental variations) whose input rise time, T_in, is in the CFO form and which drives an output capacitive load that is also in the CFO form. The objective is to find the gate propagation delay and output slew in the CFO form.
The gate propagation delay is a function of the input transition time, the logic gate characteristics (e.g., the W/L ratio, threshold voltage of the transistors, V_dd, and temperature), and the output load. In commercial ASIC cell libraries, it is possible to characterize various output transition times (e.g., 10%, 50%, and 90%) as a function of the above variables, i.e.,
$$t_{\alpha} = f_{\alpha}\left([W/L,\ V_T,\ V_{dd},\ Temp],\ T_{in},\ C_L\right)$$
where α denotes the percentage of the output transition, t_α is the output delay with respect to the 50% point of the input signal, and f_α is the corresponding delay function. The terms in the bracket capture the gate characteristics and environmental factors, T_in is the input transition time, and C_L is the output capacitive load. In block-based TA, T_in, C_L, the gate characteristics, and the environmental factors are represented in the CFO form as functions of the global and independent random sources of variation. Hence, to represent t_α in the CFO form, we substitute them with their corresponding CFO models. Differentiating with respect to the global and independent random sources of variation, t_α as a function of m global sources of variation and p independent random sources of variation can be written to first order as:
$$t_{\alpha} = t_{\alpha,nom} + \sum_{i=1}^{m} \frac{\partial f_{\alpha}}{\partial X_i} X_i + \sum_{j=1}^{p} \frac{\partial f_{\alpha}}{\partial S_j} S_j \tag{4}$$
where the partial derivatives are evaluated at the nominal point.
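Considering the S_j as independent unit normal sources of variation, Eqn. (4) can be rewritten in the CFO form as:
$$t_{\alpha} = t_{\alpha,nom} + \sum_{i=1}^{m} t_{\alpha,i} X_i + t_{\alpha,m+1} S_t, \qquad t_{\alpha,i} = \frac{\partial f_{\alpha}}{\partial X_i}, \qquad t_{\alpha,m+1} = \sqrt{\sum_{j=1}^{p} \left(\frac{\partial f_{\alpha}}{\partial S_j}\right)^2}$$
where S_t is a single unit normal source that lumps the p independent random contributions.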
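The conversion described above can be sketched numerically. The snippet below linearizes a function of m global and p independent sources around the nominal point using central differences and lumps the independent terms into one root-sum-square coefficient. The function `to_cfo`, the closed-form `delay` model, and all of its coefficients are hypothetical stand-ins for a characterized gate library, chosen only for illustration.

```python
import math
from typing import Callable, List, Tuple


def to_cfo(f: Callable[[List[float], List[float]], float],
           m: int, p: int, h: float = 1e-4) -> Tuple[float, List[float], float]:
    """Linearize f(x, s) around x = 0, s = 0; return (nom, global_sens, indep)."""
    x0, s0 = [0.0] * m, [0.0] * p
    nom = f(x0, s0)

    def central_diff(which: int, k: int) -> float:
        # which = 0 perturbs a global source, which = 1 an independent one.
        hi = [list(x0), list(s0)]
        lo = [list(x0), list(s0)]
        hi[which][k] += h
        lo[which][k] -= h
        return (f(hi[0], hi[1]) - f(lo[0], lo[1])) / (2.0 * h)

    global_sens = [central_diff(0, i) for i in range(m)]
    indep_sens = [central_diff(1, j) for j in range(p)]
    # Independent unit-normal terms collapse into one root-sum-square coefficient.
    indep = math.sqrt(sum(d * d for d in indep_sens))
    return nom, global_sens, indep


def delay(x: List[float], s: List[float]) -> float:
    """Hypothetical gate delay (ps) with T_in (ps) and C_L (fF) given in CFO form."""
    t_in = 50.0 + 5.0 * x[0] + 2.0 * s[0]
    c_l = 20.0 + 1.0 * x[0] + 2.0 * x[1] + 1.5 * s[1]
    return 10.0 + 0.2 * t_in + 1.5 * c_l + 0.001 * t_in * c_l


nom, sens, indep = to_cfo(delay, m=2, p=2)
```

Finite differences are used here only because the example delay function is a black box; with an analytical library equation the derivatives could be taken symbolically instead.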
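As an example, suppose A and B are two given CFO random variables as shown below:
$$A = a_0 + \sum_{i=1}^{m} a_i X_i + a_{m+1} S_a, \qquad B = b_0 + \sum_{i=1}^{m} b_i X_i + b_{m+1} S_b$$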
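Therefore, for addition, subtraction, multiplication, and division of A and B, where a_0 and b_0 denote the nominal values, a_i and b_i the global sensitivities, and a_{m+1} and b_{m+1} the independent coefficients, we have:
a) Addition and subtraction:
$$A \pm B = (a_0 \pm b_0) + \sum_{i=1}^{m} (a_i \pm b_i) X_i + \sqrt{a_{m+1}^2 + b_{m+1}^2}\; S$$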
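The four arithmetic operations can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulas: multiplication and division are linearized around the nominal values (second-order terms are dropped), and the independent sources of A and B are assumed to be mutually independent. The names `CFO`, `add`, `sub`, `mul`, `div`, and `linear_combine` are hypothetical.

```python
import math
from typing import List, NamedTuple


class CFO(NamedTuple):
    nom: float         # nominal value
    sens: List[float]  # global sensitivities
    indep: float       # independent coefficient (non-negative)


def linear_combine(nom: float, da: float, db: float, a: CFO, b: CFO) -> CFO:
    """First-order CFO of g(A, B), given g at nominal and dg/dA, dg/dB there."""
    sens = [da * ai + db * bi for ai, bi in zip(a.sens, b.sens)]
    # Independent parts are assumed uncorrelated, so they combine as an RSS.
    indep = math.hypot(da * a.indep, db * b.indep)
    return CFO(nom, sens, indep)


def add(a: CFO, b: CFO) -> CFO:
    return linear_combine(a.nom + b.nom, 1.0, 1.0, a, b)


def sub(a: CFO, b: CFO) -> CFO:
    return linear_combine(a.nom - b.nom, 1.0, -1.0, a, b)


def mul(a: CFO, b: CFO) -> CFO:
    # d(AB)/dA = B, d(AB)/dB = A, both evaluated at the nominal point.
    return linear_combine(a.nom * b.nom, b.nom, a.nom, a, b)


def div(a: CFO, b: CFO) -> CFO:
    # d(A/B)/dA = 1/B, d(A/B)/dB = -A/B^2, both evaluated at the nominal point.
    return linear_combine(a.nom / b.nom, 1.0 / b.nom, -a.nom / b.nom ** 2, a, b)


A = CFO(10.0, [1.0, 2.0], 3.0)
B = CFO(5.0, [0.5, -1.0], 4.0)
```

Routing every operation through one linearization helper keeps the result in CFO form after each step, so the representation does not grow as operations are chained.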
3. RC-Load Calculation in CFO form
The previous section discussed the situation in which the load is purely capacitive. However, in VDSM technologies, one cannot neglect the effect of the interconnect resistance of the load on the gate delay and output slew. In STA, a more accurate approximation for an n-th order load seen by the gate (i.e., a load with n distributed capacitances to ground) is a second-order RC-π model (cf. Figure 1(b)). Equating the first, second, and third moments of the admittance of the real load with the first, second, and third moments of the RC-π load, we can find C_n, R_π, and C_f as:
$$C_n = Y_{1,in} - \frac{Y_{2,in}^2}{Y_{3,in}}, \qquad R_{\pi} = -\frac{Y_{3,in}^2}{Y_{2,in}^3}, \qquad C_f = \frac{Y_{2,in}^2}{Y_{3,in}} \tag{6}$$
where Y_k,in is the k-th moment of the admittance of the real load. In TA, it is required to consider the effect of the variability of the load on the gate timing analysis [10]. Thus:
Problem Statement II: Given is an RC network representation in a design as shown in Figure 1(a), where each R and C is in the CFO form. The objective is to find an equivalent variational RC-load (i.e., C_n, R_π, and C_f of Figure 1(b)) in the CFO form.
C_n, R_π, and C_f are functions of the admittance moments as in Eqn. (6). Hence, once the variational admittance moments are known, the CFO forms of the RC-load parameters can be obtained by differentiating the expressions in Eqn. (6) with respect to the sources of variation (cf. section 2.3). Because the variational admittance moments are computed recursively, we keep the admittance moments in the CFO form at every step of the recursion, so that the complexity of representing the moments does not grow as the recursion proceeds. As a result, we propose the following recursive approach to obtain the admittance moments in the CFO form.
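For nominal (non-variational) values, the moment-matching step can be sketched in a few lines of Python. The function names `pi_model` and `pi_moments` are illustrative; the formulas are the standard three-moment match for a π load with near capacitance C_n, resistance R_π, and far capacitance C_f, and `pi_moments` is included only to check the round trip.

```python
from typing import Tuple


def pi_model(y1: float, y2: float, y3: float) -> Tuple[float, float, float]:
    """Return (C_n, R_pi, C_f) matching the first three admittance moments."""
    c_f = y2 * y2 / y3
    c_n = y1 - c_f
    r_pi = -(y3 * y3) / (y2 ** 3)
    return c_n, r_pi, c_f


def pi_moments(c_n: float, r_pi: float, c_f: float) -> Tuple[float, float, float]:
    """First three moments of Y(s) = s*C_n + s*C_f / (1 + s*R_pi*C_f)."""
    return c_n + c_f, -r_pi * c_f ** 2, r_pi ** 2 * c_f ** 3


# Example values (assumed): 20 fF near cap, 200 ohm, 50 fF far cap.
y1, y2, y3 = pi_moments(20e-15, 200.0, 50e-15)
c_n, r_pi, c_f = pi_model(y1, y2, y3)
```

In the variational setting each of these expressions would be evaluated on CFO quantities and linearized, rather than on plain numbers as done here.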
Consider the RCY segment shown in Figure 2. Assume that the admittances at nodes i and j are represented by infinite series in terms of their admittance moments, as in Eqns. (7) and (8):
$$Y_i(s) = \sum_{k=1}^{\infty} Y_{k,i}\, s^k \tag{7}$$
$$Y_j(s) = \sum_{k=1}^{\infty} Y_{k,j}\, s^k \tag{8}$$
where Y_k,i, the coefficient of s^k, is the k-th moment of the admittance at node i. The admittance at node i is then computed recursively in terms of the admittance at node j as follows [11]:
Assume the admittance moments of node j are written in the CFO form. Then, by differentiating Y_k,i with respect to the sources of variation, the moments Y_k,i can also be represented in the CFO form (cf. section 2.3). Doing so at every step keeps the complexity of representing the moments from growing as the recursion proceeds.
By using the above statistical recursive operations, we compute the moments of Y_in = Y_1 in the CFO form.
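To make the recursion concrete, here is a nominal-value sketch. Since Figure 2 is not reproduced in this text, the segment topology is an assumption: node i is connected through a series resistance R to node j, and node j carries a grounded capacitance C in parallel with the downstream admittance Y_j. Under that assumption Y_i = Y' / (1 + R Y') with Y' = sC + Y_j, and matching powers of s gives the update below. The function name `upstream_moments` is hypothetical.

```python
from typing import List


def upstream_moments(r: float, c: float, y_j: List[float], order: int = 3) -> List[float]:
    """Moments [Y_1, ..., Y_order] at node i, given the moments y_j at node j.

    Assumed segment: series resistance r from node i to node j, with a grounded
    capacitance c at node j in parallel with the downstream admittance.
    """
    # Admittance seen at node j including the local capacitor: Y' = s*c + Y_j.
    yp = list(y_j[:order])
    yp[0] += c
    y_i: List[float] = []
    for k in range(order):
        # From Y_i * (1 + r*Y') = Y', matching the coefficient of s^(k+1).
        conv = sum(y_i[l] * yp[k - 1 - l] for l in range(k))
        y_i.append(yp[k] - r * conv)
    return y_i


# Single segment with nothing downstream: 200 ohm driving 50 fF (assumed values).
single = upstream_moments(200.0, 50e-15, [0.0, 0.0, 0.0])

# Two-segment ladder, processed from the far end toward the driver.
far = upstream_moments(300.0, 30e-15, [0.0, 0.0, 0.0])
ladder = upstream_moments(100.0, 20e-15, far)
```

In the variational version, each multiplication and addition inside the loop would be replaced by its CFO counterpart so that every intermediate moment stays in CFO form.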
4. Gate Timing Analysis for the RC-Load in block-based TA
Problem Statement III: Given is a variational CMOS driver whose input rise time, T_in, is in the CFO form and which drives a variational RC-load whose resistance and capacitances are also in the CFO form. The objective is to find the gate propagation delay and output slew in the CFO form. Section 2.3.1 solves the same problem for the case in which the gate drives a variational purely capacitive load in the CFO form. Therefore, if we substitute the RC-load with its equivalent variational C_eff, then the solution to Problem Statement I is an acceptable solution to Problem Statement III.
To perform accurate gate delay and output slew calculation, an iterative calculation of C_eff is inevitable [12] [13] [14]. However, as the number of sources of variation increases, the number of required C_eff runs rises exponentially (it is proportional to the number of corners), which can be quite CPU-intensive. We propose an efficient technique to find C_eff in the CFO form. Suppose the actual C_eff in the CFO form can be represented as:
$$C_{eff} = c_{eff,nom} + \sum_{i=1}^{m} c_{eff,i} X_i + c_{eff,m+1} S_{eff}$$
Since the C_eff calculation is iterative, we define C_eff^k (in the CFO form) as an approximation of the actual C_eff (in the CFO form) that results from the first k iterations of the iterative C_eff algorithm:
$$C^{k}_{eff} = c^{k}_{eff,nom} + \sum_{i=1}^{m} c^{k}_{eff,i} X_i + c^{k}_{eff,m+1} S_{eff}$$
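C_eff^0 denotes C_eff approximated by the total capacitance (i.e., C_n + C_f), C_eff^1 denotes the effective capacitance obtained after a single iteration, and so on. We define c_eff^k,i / c_eff^k,nom and c_eff,i / c_eff,nom as the iterative and actual normalized sensitivity coefficients (denoted by NSCs), respectively. The NSCs capture the effect of the load variation on the C_eff value. It can be shown that the iterative NSCs change only slightly from one iteration to the next (for k ≥ 1) and converge to their actual NSC values, i.e.,
$$\frac{c^{k}_{eff,i}}{c^{k}_{eff,nom}} \approx \frac{c_{eff,i}}{c_{eff,nom}}, \qquad 1 \le i \le m+1 \tag{14}$$
This leads to the following four steps:
Step 1: Compute C_eff^k in the CFO form for a small k (k = 0 or k = 1) and extract the iterative NSCs.
Step 2: Compute the nominal value c_eff,nom.
Step 3: Multiplying c_eff,nom by the iterative NSCs to obtain c_eff,i for 1 ≤ i ≤ m+1, we can write C_eff in the CFO form.
Step 4: Using the method presented in section 2.3, we obtain the gate delay and output slew in the CFO form.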
Step 2 is performed by using the well-known STA-based (non-variational) C_eff algorithm [12] [13] [14]. Step 3 is a simple algebraic equation, while step 4 is performed as per section 2.3. For step 1, the following sections show how to calculate C_eff^0 and C_eff^1 in the CFO form.
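The algebraic scaling of step 3 can be sketched as follows, assuming the approximation of Eqn. (14) holds. The function name `scale_by_nsc` and the numeric values are hypothetical; the CFO approximation passed in could be either C_eff^0 or C_eff^1.

```python
from typing import List, Tuple


def scale_by_nsc(c_eff_nom: float,
                 approx_nom: float,
                 approx_sens: List[float],
                 approx_indep: float) -> Tuple[float, List[float], float]:
    """Build the CFO form of C_eff from its nominal value and approximate NSCs.

    c_eff_nom    : nominal C_eff from the iterative, non-variational algorithm
    approx_*     : CFO coefficients of the approximation C_eff^k (k = 0 or 1)
    """
    # Normalized sensitivity coefficients of the approximation.
    nsc = [a / approx_nom for a in approx_sens]
    nsc_indep = approx_indep / approx_nom
    # Assume the actual NSCs are close to the approximate ones and rescale.
    return c_eff_nom, [c_eff_nom * n for n in nsc], c_eff_nom * nsc_indep


# Assumed example: C_eff^0 = 100 fF nominal, nominal iterative C_eff = 80 fF.
nom, sens, indep = scale_by_nsc(80.0, 100.0, [4.0, 6.0], 2.0)
```

The appeal of this step is that the costly iterative C_eff computation is run only once, at the nominal point, while the sensitivities come from a cheaper approximation.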
Finding Variational C_eff using C_eff^0
As mentioned before, C_eff^0 approximates C_eff by the total capacitance (i.e., C_n + C_f). Therefore, C_eff^0 in the CFO form is equal to the sum of C_n in the CFO form and C_f in the CFO form, i.e., if
$$C_n = c_{n,nom} + \sum_{i=1}^{m} c_{n,i} X_i + c_{n,m+1} S_n, \qquad C_f = c_{f,nom} + \sum_{i=1}^{m} c_{f,i} X_i + c_{f,m+1} S_f$$
then, by the addition rule of section 2.3,
$$C^{0}_{eff} = (c_{n,nom} + c_{f,nom}) + \sum_{i=1}^{m} (c_{n,i} + c_{f,i}) X_i + \sqrt{c_{n,m+1}^2 + c_{f,m+1}^2}\; S_{eff}$$
Finding Variational C_eff using C_eff^1
In this section we find the nominal value of the effective capacitance by performing the iterative C_eff calculation for the nominal conditions of the circuit. Next we find the NSCs by applying a single-iteration effective capacitance method. First, we present an efficient single-iteration technique for a reasonably accurate C_eff calculation in STA, and we then use it to calculate the NSCs. By definition, the effective capacitance, C_eff, is a pure capacitance that can replace an RC-load such that both the RC-load and the C_eff load store the same amount of charge up to a certain point of the output voltage transition (e.g., the 50% point of the output transition).
To perform the C_eff calculation, we need to assume a reasonable output waveform for the CMOS driver (cf. Figure 3(a)). The actual output voltage waveform behaves as a combination of ramp and exponential waveforms, as shown in Figure 3(b). We assume that the actual C_eff is calculated as a simple average of the C_eff obtained for the ramp output waveform and the C_eff obtained for the exponential output waveform.
We have shown that the iterative effective capacitance equation for matching any α% point of the gate output transition time can be written as (proof is omitted for brevity):
Furthermore, we have derived that if the output voltage of a gate is approximated by a ramp voltage waveform with rise time T_R measured between two given percentage points of the transition, then the iterative C_eff equation for any α% output transition point is written as (proof is omitted for brevity):
Thus, based on the simple average assumption, the iterative equation for the actual C_eff calculation for any α% point of the output transition time is:
where the linear combination factor of the exponential and ramp waveforms lies between 0 and 1. C_eff^1 denotes the result of a single iteration of this equation. Subsequently, using the same approach as in section 4.1, we can find C_eff in the CFO form, with the NSCs calculated using the above single-iteration C_eff technique. Experimental results confirm that evaluating the variational C_eff using the above approach gives an average error of 7% in the final delay and output slew calculation with respect to Monte Carlo simulation.
Experimental Results
Our experiments use 90nm CMOS process parameters to model gates and interconnect parasitics. We use standard CMOS gates of various sizes to determine the accuracy of our gate timing analysis. We assumed two different configurations for the experimental setup. The first one consists of two inverters connected in series, whereas the second one is a CMOS inverter followed by a 2-input NAND gate. For both configurations, we apply a ramp input to the first inverter, with its nominal transition time chosen from the set (t_in)_nom = {10ps, 80ps, 150ps, 220ps, 300ps}. For the first configuration, the size of the first inverter is fixed at W_p/W_n = 30/15 µm, whereas the size of the second inverter is chosen to be one of W_p/W_n = {20/10, 50/25, 70/35, 100/50} µm. For the second configuration, the size of the first inverter is again fixed at W_p/W_n = 30/15 µm, whereas the size of the succeeding 2-input NAND gate is chosen to be one of W_p/W_n = {40/40, 50/50, 100/100} µm.
To characterize the timing behavior of the gate, a k-factor equation based library is employed which represents the gate delay and output slew as a function of input rise time and output capacitive load, V dd , and temperature.
We apply different loading scenarios for the second-stage gate, as explained in the following subsections, i.e., a purely capacitive load and a general RC load. We have also considered four different global sources of variation (V_dd, temperature, Metal layer 1 width, and ILD) and one independent random source of variation for each electrical parameter (i.e., r and c) and timing parameter (for instance, t_in) in the circuit. The sensitivity of each given quantity to the sources of variation is chosen randomly, while the total variation of each quantity is chosen to be 10% or 15% of its nominal value. The mean and variance of the effective capacitance, the gate 50% propagation delay, and the 10%-90% output transition time (slew) are calculated using the approaches presented in this paper.
To compare the results, we ran HSPICE Monte Carlo simulations on each test scenario and derived the mean and variance of the effective capacitance, the gate 50% propagation delay, and the 10%-90% output transition time. We report the average percentage errors between the HSPICE results and the results calculated with the VGTA algorithm for the mean and variance of each of these three quantities.
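The comparison methodology can be illustrated with a small sketch in which a closed-form, mildly nonlinear delay function stands in for the circuit simulator. Everything here is hypothetical (the `delay` function, its coefficients, the sample count, and the seed); it only shows how a percentage error between first-order CFO statistics and Monte Carlo statistics could be computed.

```python
import random
import statistics


def delay(x: float, s: float) -> float:
    """Hypothetical delay (ps): linear in the sources plus a small quadratic term."""
    return 51.0 + 2.65 * x + 0.44 * s + 0.05 * x * x


def percent_error(estimate: float, reference: float) -> float:
    return 100.0 * abs(estimate - reference) / abs(reference)


# Monte Carlo reference: sample one global and one independent unit-normal source.
rng = random.Random(0)
samples = [delay(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
mc_mean = statistics.mean(samples)
mc_var = statistics.variance(samples)

# First-order (CFO) estimate: the quadratic term is ignored by the linearization.
cfo_mean = 51.0
cfo_var = 2.65 ** 2 + 0.44 ** 2

mean_err = percent_error(cfo_mean, mc_mean)
var_err = percent_error(cfo_var, mc_var)
```

The residual error in such a comparison mixes sampling noise with the linearization error, which is why a large sample count is needed before small percentage differences become meaningful.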
A. Capacitive Load:
The load in this section is considered to be purely capacitive. Its nominal value is chosen to be (C) nom = {400, 500, 800, 1400}fF.
We performed our experiments on both circuit configurations explained above. The results for the first configuration (where the second gate is an inverter) are presented in Table 1. The results for the second configuration are provided in Table 2. Experimental results indicate an average error of about 3% for the two total-variation settings, i.e., 10% and 15%. As we increase the total variation of each quantity (e.g., the variation of t_in and c_l) from 10% to 15%, the errors in the calculated mean and variance of the delay and slew increase, but only slightly. The sources of error can be mainly classified into two groups: 1) the inaccuracy of the gate library table lookup and 2) the linear first-order approximation of the timing and electrical parameters with respect to the sources of variation. Note that, on average, the proposed algorithm runs 165 times faster than the Monte Carlo based approach.
B. General RC Load:
For this section, the load is considered to be an RC tree of varying topology. The nominal value of the total resistance of the load is chosen from the set (R)_nom = {150, 260, 300, 710, 1000} Ω, and the nominal value of the total capacitance of the load is chosen from the set (C)_nom = {400, 500, 800, 1400} fF.
Again, we performed the experiment on both circuit configurations as explained before. The results for the first configuration (where the second gate is an inverter) are presented in Table 3 (when C_total is used for calculating the NSCs) and Table 4 (when the single-iteration C_eff is used for calculating the NSCs). The results for the second configuration are provided in Table 5 (when C_total is used for calculating the NSCs) and Table 6 (when the single-iteration C_eff is used for calculating the NSCs). Experimental results indicate an average error of about 19% across the total-variation settings when C_total is used for calculating the NSCs, and an average error of about 7% when the single-iteration C_eff is used. As we increase the total variation of each quantity (e.g., the variation of t_in, c_n, r_π, and c_f) from 10% to 15%, the errors in the calculated mean and variance of C_eff, the gate delay, and the output transition time increase, but only slightly. The sources of error can be mainly classified into five groups: 1) the inaccuracy of the gate library table lookup, 2) the linear first-order approximation of the timing and electrical parameters with respect to the sources of variation, 3) the error in calculating the variational RC-load, 4) the error in the effective capacitance iterative equation, and 5) the error in the NSC approximation (Eqn. (14)). Note that, on average, the proposed algorithm runs 145 times faster than the Monte Carlo based approach.
Conclusion
In this paper we presented a framework to handle variation-aware gate timing analysis in block-based TA. First, we proposed an approach to calculate the variational RC-load, which can be utilized instead of the actual variational RC load for gate timing analysis purposes.
