Abstract: For CMOS feature sizes of 65 nm and below, local (or intra-die or within-die) variations in transistor Vt contribute stochastic variation in logic delay that is a large percentage of the nominal delay. Moreover, when circuits are operated at low voltage (Vdd ≤ 0.5V), the standard deviation of gate delay becomes comparable to the nominal delay, and the Probability Density Function (PDF) of the gate delay is highly non-Gaussian. This paper presents a computationally efficient algorithm for computing the PDF of logic Timing Path (TP) delay that results from local variations. This approach is called Non-linear Operating Point Analysis for Local Variations (NLOPALV). The approach is implemented using commercial STA tools and integrated into the standard CAD flow using custom scripts. Timing paths from a 28nm commercial DSP are analyzed using the proposed technique, and the results are within 5% of those from SPICE-based Monte-Carlo analysis.
I. INTRODUCTION
There are three categories of process variations that are important in the design of modern CMOS logic [1]:
1. Global random variation in gate length, gate width, flatband voltage, oxide thickness and channel doping.
2. Systematic or predictable variations, such as variations in litho or etch or CMP.
3. Local random variations in transistor parameters. Local random variations are assumed to be random from one transistor to another within a die.
This paper deals with the effect that local variations in CMOS transistor parameters have on logic timing at low voltage (Vdd ≤ 0.5V). Local variations are primarily the result of variations in the number of dopant atoms in the channel of CMOS transistors [2, 3]. Local variations have long been known in analog design and in SRAM design. In analog design, local variations are called "mismatch" because of the mismatch in the Vt of adjacent transistors. But they have not generally been a problem for logic. However, shrinking of transistor geometries and low voltage design, for ultra-low power applications, make local variations increasingly important for logic.
Some approaches for handling local or intra-die variations have been proposed [4] [5] [6] [7] [8]. However, this paper extends previous work in several important ways. It describes an approach to calculate the PDF of logic TP delay that results from local transistor variation. We call this approach the Non-linear Operating Point Analysis for Local Variations (NLOPALV). At Vdd ≤ 0.5V, circuit delay is highly nonlinear in Vt variation. The result is that the PDF of the delay is highly non-Gaussian. As shown later in this paper, the Gaussian assumption results in substantial error at Vdd = 0.5V. This approach deals only with local variations and needs to be used in conjunction with conventional approaches to generate global fast and slow corners.
The effect of local variations is very different from the effect of global variations. Whereas global variations in delay add linearly, local variations do not. Let us assume, for a moment, that the PDF of each cell delay is Gaussian. If we have a number of such cells with local stochastic delay characterized by standard deviations $\sigma_i$, the delays add in "quadrature", meaning that the variance of the TP delay is given by $\sigma_{TP}^2 = \sum_i \sigma_i^2$. If we instead added the $\sigma_i$ linearly, we would get an overly pessimistic result. In this paper, we employ the concept of "operating point", which has the advantage that the stochastic delay of a TP can be computed by linearly adding the operating point delays of the respective cells.
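The difference between adding in quadrature and adding linearly can be illustrated with a short sketch. The three cell sigmas below are hypothetical values chosen only for illustration, and the cell delays are assumed to be independent and Gaussian as in the paragraph above.

```python
import math

def path_sigma_quadrature(sigmas):
    """Sigma of a sum of independent Gaussian cell delays (root-sum-square)."""
    return math.sqrt(sum(s * s for s in sigmas))

def path_sigma_linear(sigmas):
    """Pessimistic estimate obtained by adding the cell sigmas directly."""
    return sum(sigmas)

# Hypothetical per-cell standard deviations in ns (not taken from the paper).
cell_sigmas = [0.3, 0.4, 1.2]

sigma_rss = path_sigma_quadrature(cell_sigmas)
sigma_lin = path_sigma_linear(cell_sigmas)

if __name__ == "__main__":
    print(f"quadrature: {sigma_rss:.3f} ns, linear: {sigma_lin:.3f} ns")
```

For these values the root-sum-square result is about 1.3 ns while the linear sum is 1.9 ns, and the gap widens as more cells of similar sigma are added to the path.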
The first step in calculating the effect of local variations on logic timing is the characterization of standard logic cells for local variation. Characterization needs to be done for each arc for each cell. An arc is defined by 1) input rise or fall, 2) output load capacitance and 3) input slew. Characterization needs to be done for each global corner of importance.
The analysis of this paper starts with pre-characterized logic cells. For every arc of every cell (every arc-cell) there exists a PDF of the delay. Referring to Fig. 1, any arbitrary (in general non-Gaussian) PDF can be mapped onto a unit-variance Gaussian through the Cell-Arc Delay Function (CADF) $D(\xi)$. Note that, for a Gaussian PDF, $D(\xi)$ is a straight line: $D(\xi) = \sigma_i \xi$. Characterization also provides the Cell-Arc Slew Function (CASF) $S(\xi)$, which is the most probable output slew for each value of $\xi$. The computational efficiency of the NLOPALV approach results from the fact that the entire PDF of the TP is not usually required. In TP analysis, we are usually interested in the f-sigma delay, where f ~ +3 for setup time and f ~ -3 for hold time.
In the space allowed for this paper, the complete theory underlying the NLOPALV approach cannot be presented. In section II, the NLOPALV approach will be presented without derivation. The validity of the approach will be demonstrated by the accuracy of the results on actual circuits presented in sections II and III. The theoretical justification will be presented in a future publication.
II. MULTI-STAGE LOGIC PATH SSTA ANALYSIS
The goal of multi-stage SSTA is to determine the 3-sigma TP delay (or in general the f-sigma TP delay) without incurring the computational expense of computing the entire PDF of the TP delay. In general, the PDFs of the logic cell delays are highly non-Gaussian. However, even in the non-Gaussian case, the concept of f-sigma delay has meaning. As shown in Fig. 1, the 1-sigma point is the 84.1345% quantile, the 3-sigma point is the 99.8650% quantile, etc. However, in the non-Gaussian case, the σ defined by quantile is unrelated to the standard deviation of the non-Gaussian variable. Using the NLOPALV approach, the TP operating point is determined, and the f-sigma operating point delay is approximated by a linear combination of operating point delays for the individual logic cells. This is done without the need for Monte-Carlo simulations or additional SPICE simulations.
A. Single Logic Path
We start with a timing path of N logic cells, each of which is characterized by $D_i(\xi_i)$, and we want to determine the f-sigma TP delay. For simplicity, we assume that the stochastic delays $D_i$ are statistically independent. (We will see below that this simplifying assumption is not true, and we will address this complication.) We define the f-sigma operating point as:
(1)
where
Equation (1) needs to be solved iteratively, but once the operating point of the TP is determined, we can approximate the f-sigma timing path delay as a linear sum of operating point delays of the constituent logic cells:
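To make the f-sigma convention and the role of the CADF concrete, the following sketch computes the quantiles quoted above and evaluates a made-up non-linear CADF. The function `example_cadf` and its parameters `scale` and `s` are hypothetical and stand in for real characterization data; only the unit-Gaussian quantiles come from the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero-mean, unit-variance Gaussian for xi

def sigma_to_quantile(f):
    """Cumulative probability at the f-sigma point of a unit Gaussian."""
    return STD_NORMAL.cdf(f)

def example_cadf(xi, scale=1.0, s=0.5):
    """Hypothetical CADF (an assumption, not characterization data):
    D(xi) = scale * (exp(s * xi) - 1), so that D(0) = 0 at the nominal point."""
    return scale * math.expm1(s * xi)

if __name__ == "__main__":
    for f in (1, 3):
        print(f"{f}-sigma quantile: {100 * sigma_to_quantile(f):.4f}%")
    # For a non-linear CADF the 3-sigma delay is not three times the 1-sigma delay,
    # and the -3-sigma delay is not the mirror image of the +3-sigma delay.
    print("D(+3) =", example_cadf(3.0), " 3*D(+1) =", 3 * example_cadf(1.0))
    print("D(-3) =", example_cadf(-3.0))
```

With a straight-line CADF the three printed delays would be related by simple scaling; the exponential shape used here is only one plausible way to mimic the skew that the paper reports at low voltage.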
1) Algorithm for Non-linear Operating Point Analysis for Local Variations (NLOPALV)
This approach leads us to an algorithm for determining the operating point, which can be summarized as follows:
4. Compute a new estimate of the operating point using the expressions given by (4).
5. Repeat steps 3 and 4 until the operating point converges to a constant value.
6. The TP delay can then be obtained as:
where $D_i(\xi_i)$ is directly obtained from the cell characterization data and:
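The iterate-until-convergence structure of steps 3 to 5 can be sketched as follows. Because the expressions (1) to (4) are not reproduced in this text, the update rule used here is an assumption made purely for illustration: the operating point is taken to be the point on the sphere $\sum_i \xi_i^2 = f^2$ at which the summed delay is stationary, which gives the fixed-point update $\xi_i \leftarrow f \, D_i'(\xi_i) / \sqrt{\sum_j D_j'(\xi_j)^2}$. The helper names and the exponential cell model are likewise hypothetical, and the sketch is exercised only for f > 0.

```python
import math

def operating_point(derivs, f, tol=1e-9, max_iter=200):
    """Fixed-point iteration for an ASSUMED operating-point definition:
    xi_i = f * D_i'(xi_i) / sqrt(sum_j D_j'(xi_j)^2).
    derivs: one callable per cell returning dD_i/dxi (must not all be zero)."""
    n = len(derivs)
    xi = [f / math.sqrt(n)] * n  # initial guess: equal share for every cell
    for _ in range(max_iter):
        g = [d(x) for d, x in zip(derivs, xi)]
        norm = math.sqrt(sum(v * v for v in g))
        new_xi = [f * v / norm for v in g]
        if max(abs(a - b) for a, b in zip(new_xi, xi)) < tol:
            return new_xi
        xi = new_xi
    return xi

def path_delay(cadfs, xi):
    """Linear sum of the cell delays evaluated at the operating point."""
    return sum(d(x) for d, x in zip(cadfs, xi))

def make_cell(scale, s):
    """Hypothetical exponential CADF and its derivative (illustration only)."""
    cadf = lambda x: scale * math.expm1(s * x)
    deriv = lambda x: scale * s * math.exp(s * x)
    return cadf, deriv

cells = [make_cell(1.0, 0.3), make_cell(1.0, 0.5), make_cell(2.0, 0.4)]
cadfs = [c for c, _ in cells]
derivs = [d for _, d in cells]

xi_op = operating_point(derivs, f=3.0)

if __name__ == "__main__":
    print("operating point:", [round(x, 4) for x in xi_op])
    print("3-sigma stochastic path delay:", path_delay(cadfs, xi_op))
```

Under this assumed rule, straight-line CADFs $D_i = \sigma_i \xi_i$ reproduce the quadrature result exactly, since the summed operating point delay equals $f \sqrt{\sum_i \sigma_i^2}$. That consistency with the Gaussian case is the reason this particular stand-in was chosen.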
B. Integration with the CAD flow
In order to use SSTA effectively in the design process, it is important to integrate the NLOPALV algorithm with the existing CAD flow. The NLOPALV algorithm presented above is integrated with a commercial STA timer. Fig. 3 describes the process in the form of a flow-chart.
C. Results
To validate the NLOPALV approach developed here, we tested it on logic paths taken from a commercial Digital Signal Processor implemented in 28nm technology, operating at 0.5V. We used the NLOPALV algorithm and the corresponding CAD flow to determine the 3-sigma delay for each logic timing path and compared the results with those obtained from SPICE-based Monte-Carlo analysis. Fig. 4 shows the comparison for different timing paths. The 3-sigma delay obtained from NLOPALV analysis is within 5% of the value obtained from a 10,000-point SPICE-based Monte-Carlo analysis. Theoretical analysis shows that the NLOPALV approach always underestimates the stochastic delay compared to Monte-Carlo analysis, and this is validated by Fig. 4.
Figure 5. TP delay PDF: The zero-sigma delay is the nominal delay. The Gaussian approximation is chosen such that the standard deviation of the Gaussian is the same as the 1σ delay for NLOPALV.
Fig. 5 shows a comparison between the NLOPALV approach and Monte-Carlo for a typical timing path at 0.5V. It shows excellent agreement between the NLOPALV approach and Monte-Carlo and also illustrates the inadequacy of the Gaussian approximation at low voltage. It is informative to note that:
1. The PDF of the TP delay peaks at a point in time that is less than the nominal delay (zero-sigma delay).
2. The mean of the non-Gaussian PDF lies 1.6ns to the right of the nominal delay, whereas in the Gaussian approximation, the stochastic delay has zero mean.
3. The 3σ stochastic delay is 13.41ns, compared to a nominal delay of 9.17ns. This shows that the variation at 0.5V can be much higher than the nominal delay itself.
4. The 3-sigma stochastic delay calculated using the Gaussian approximation is 8.53ns, compared to the actual 3-sigma stochastic delay of 13.41ns. This shows that the Gaussian approximation is highly optimistic.
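As a quick check on the magnitudes in items 3 and 4, the quoted figures can be combined as follows. This is simple arithmetic on the numbers stated above, rounded to three significant figures:

\[
\frac{13.41\,\text{ns}}{9.17\,\text{ns}} \approx 1.46,
\qquad
\frac{13.41\,\text{ns} - 8.53\,\text{ns}}{13.41\,\text{ns}} = \frac{4.88}{13.41} \approx 0.364 .
\]

That is, the 3σ stochastic delay is roughly 1.46 times the nominal delay, and the Gaussian approximation falls short of the actual 3σ stochastic delay by about 36%.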
The NLOPALV analysis was performed on different paths of different lengths, taken from the 28nm Digital Signal Processor. Table 1 summarizes additional results for a few of these paths. The 3-sigma delay (nominal + 3σ stochastic delay) computed using NLOPALV shows excellent agreement with Monte-Carlo. This contrasts with the large errors that result when the delay is assumed to be Gaussian. 
III. TIMING PATH SETUP AND HOLD ANALYSIS
After verifying the approach on single logic paths, we now extend it to perform setup and hold time analysis on timing paths including clock paths. 
where $T_{CLK}$ is the clock period for the design. The PDFs for setup/hold time for the registers are obtained from cell characterization. The NLOPALV approach described in section II can be used to compute
by considering the paths D1 and D2 together and computing the ±3σ operating point for the cells in both these paths taken together. This approach is verified by considering timing paths along with the corresponding clock paths from a Digital Signal Processor. The 3σ setup/hold slack is computed using the NLOPALV approach described above, and the results are compared with those obtained from SPICE-based Monte-Carlo simulations. Timing paths taken from the same DSP are used, and the results are summarized in Fig. 7 in the form of % error compared to Monte-Carlo analysis.
The NLOPALV computation runs in linear time with respect to the number of stages, whereas the Monte-Carlo analysis has exponential run time. In this approach, no expensive Monte-Carlo simulations are required during timing closure. Our approach does not assume the delay PDF to be Gaussian and can handle the case where delay is a highly non-linear function of random variables with non-Gaussian PDFs. The concept of operating point greatly simplifies computations despite nonlinearities, without sacrificing accuracy. Furthermore, this approach could be extended to optimize designs to reduce the stochastic delays.
The NLOPALV approach has proved very useful for performing cell characterization as well as timing closure. Future publications will cover:
1. Details of NLOPALV theory.
2. Cell characterization using NLOPALV.
3. MAX operation for convergent paths.
