Abstract-A generalized methodology for modeling the effects of process variations on circuit delay performance is proposed by directly relating the variations in process parameters to variations in delay metric of a digital circuit. The 2-input NAND gate is used as a library element for 65 nm gate length technology, whose delay is extensively characterized by mixedmode simulations. This information is then used in a general-purpose circuit simulator SEQUEL, by incorporating appropriate templates for the NAND gate library. A 4-bit × 4-bit Wallace tree multiplier circuit, consisting of about 300 2-input NAND gates, is used as a representative combinational circuit to demonstrate the proposed methodology. The variation in the multiplier delay is characterized by an extensive Monte Carlo analysis. To extend this methodology for a generic technology library with a variety of library elements, modeling of NAND gate delays by response surface methodology (RSM), in terms of process parameters, is carried out using design of experiments (DOE). A simple piecewise quadratic model, based on the least squares method (LSM), is proposed for one-parameter variation to address significant cubic effects observed in the delay response function. Then, a hybrid model for gate delays is generated by superimposing the interaction terms of DOE-RSM model upon the quadratic model of one-parameter variation to address the generalized case of simultaneous variations in multiple process parameters. The proposed methodology has been demonstrated for NAND gate library with 266 gates, and the simplicity and generality of the approach make it equally applicable to a large library of cells for both statistical timing analysis and statistical circuit simulation at the gate level.
I. INTRODUCTION
The design of high-performance digital application-specific integrated circuits (ASICs), in the deep-submicrometer (DSM) regime calls for incorporating the performance fluctuations caused by processinduced parameter variations. Variability is fast emerging as a major challenge that threatens to impact the yield at its best and the circuit functionality, at its worst. Even as device dimensions are being aggressively scaled to achieve faster and more complex integrated circuits, the process tolerances are not becoming tight enough, resulting in increased effects of process variations on device and circuit characteristics. The problem is not the amount of variability, but the variability turned into uncertainty if it is not modeled and cancelled out by design techniques, as uncertainty can only be handled by vastly over guard-banding of designs. It is expected that performance variances caused by this mismatch in short-channel MOS circuits may ultimately introduce a limitation for device scaling in integrated circuits [1] . The factors that cause mismatch can be broadly classified as systematic variations and random variations. Processing gradients across the wafer introduce systematic variations related to wafer maps during manufacturing, which are independent of device size. The random variations include extrinsic and intrinsic variations and are dependent on device size. The extrinsic variations are caused by variations in implant dose, implant energy, oxidation, and annealing temperature, all of which are equipment related. The intrinsic variations are due to random fluctuations in channel dopant number, gate oxide thickness, interface charge, oxide charge, and interlevel dielectric permittivity. The corner SPICE parameters of the transistors are derived taking into account both extrinsic and intrinsic variations, which encompass the complete fluctuations of the processes, under worst case. In other words, they correspond to variations in transistor parameters across the wafer lots in the fabrication environment. However, the variations in parameters of transistors located within a die have some correlation, as enunciated by Pelgrom's classical mismatch model [2] . This is because the variations in individual processes have short-range and long-range orders; in addition, some of the processes have nonzero correlations.
The typical timing closure issue in digital ASIC design is illustrated in Fig. 1 . The data from a register REG1 of pipeline stage goes through combinational logic block and is latched in REG2. The typical design closure criterion of the slowest arrival time of inputs at REG2 due to worst case logic delay and the fastest arrival time of CLK2 due to worst case clock skew can result in a very pessimistic design. The notion of probabilistic design has been introduced recently to address this issue [3] . Furthermore, algorithms for statistical timing analysis have been suggested for efficient timing sign-off of digital integrated circuits [4] .
In this context, it would be very desirable to have a direct relationship between the gate delay of various library elements and the underlying manufacturing process parameters. This will facilitate the realistic evaluation of circuit delay variability considering the actual short-range and long-range orders of individual processes, as well as the correlations. The design will then become more robust and less conservative.
Response surface modeling has been used to identify and relate significant process parameters to device parameters that are taken as response variables, by a quadratic model, to optimize device design [5] . A device design methodology using the second order response surface modeling by multivariable optimization using process sensitivity considerations has been demonstrated [6] . The design of experiments-response surface methodology (DOE-RSM) has been used for optimizing semiconductor process to help reduce design/ analysis cycle time [7] . An integrated system called DOE/Opt has 0278-0070/$25.00 © 2007 IEEE been prototyped for performing DOE, response surface modeling, and optimization, using coupled process and device simulations for process control modeling and statistical process optimization [8] . Similarly, there has been prior work using RSM to relate circuit parameters to variations in SPICE parameters [9] , [10] . Monte Carlo simulation based second order polynomial modeling approach has been proposed to correlate statistical variations in device parameters to random variations in process parameters [11] . Recently, the mixed-mode simulation approach was presented to correlate the inverter delay variations to implant dose variations [12] . In this paper, we present a methodology to evaluate the effect of process variations on the delay variations of a complex digital circuit. An optimal second order "hybrid model" is obtained using DOE-RSM modeling and least squares method (LSM) technique for gate delays directly in terms of multiple process parameters. We demonstrate that the worst case design approach is overly pessimistic, and the hybrid model based statistical design approach results in robust circuit design.
We perform mixed-mode simulations, which bring the processsimulated devices directly into the netlist of the circuit, wherein both circuit and device equations are solved simultaneously. Process/device simulation is considered appropriate to the study of process sensitivity as it enables the precise control of process variations that are difficult to achieve experimentally. A commercial technology computer-aided design (TCAD) tool suite from Integrated Systems Engineering (ISE) has been used for process and device simulations [13] . The generalpurpose circuit simulator, i.e., A Solver for circuit EQuations with User-defined ELements (SEQUEL), has been used for circuit simulations [14] .
Section II discusses the design of 65 nm gate length transistors and the process sensitivity at the device/circuit level and delay characterization of NAND gate. Section III describes the principles of statistical modeling methodology. Section IV presents the SEQUEL simulations for delay distributions of 4-bit × 4-bit multiplier circuit for a set of process parameters, with individual variation and simultaneous variations. Section V concludes with a summary of results.
II. PROCESS SENSITIVITY AND DELAY CHARACTERIZATION OF NAND GATE
The overall flow of events that transform the process variations to relevant delay distributions using various simulation tools and models is illustrated in Fig. 2 . The nominal NMOS and PMOS devices with 65 nm physical gate length are designed and optimized for an off-state leakage current constraint of 10 nA/µm at V dd = 1.2 V. The disposable spacer process sequence [15] , with pocket halo and super steep retrograde channel (SSRC) implants, has been used for source/drain and channel engineering. Fig. 3 shows the 65 nm gate length NMOS/PMOS devices generated using process and device simulation based design approach. The contours of doping concentration are shown along with their values in atoms per cubic centimeter. A set of process parameters, whose variability has a significant impact on device parameters, is identified based on our simulations and published literature [1] , [11] . They include the gate length (L g ), gate oxide thickness (T ox ), halo dose, SSRC dose, halo tilt angle, and source/drain anneal temperature. The process parameter variations are assumed to have a Gaussian distribution with a ±3σ variation of ±10% of the nominal value, except for the anneal temperature for which it is taken as ±10
• C. These sigma levels are in accordance with [1] and [11] . A set of NMOS/PMOS devices with these assumed variations in each of the six process parameters, taken one at a time, is generated by process simulations using the DIOS process simulator. All the devices are simulated with drift-diffusion transport model to obtain I d -V g and I d -V d characteristics, and their respective saturation currents I on are measured. For device simulations, QCVandort channel quantization model, band-to-band recombination model, mobility models for doping, normal-field dependence, and high-field saturation (velocity saturation) are included.
The percentage variation in the saturation current I on at the device level, with variations in process parameters considered, is presented in Table I . The relative deviation of any parameter x about its nominal value x nom is calculated as ∆x = (x − x nom )/x nom . It can be seen that the variations in L g , T ox , and halo dose have the maximum impact on drive current variations for both NMOS and PMOS, which is in accordance with the published literature.
Using these devices, a two-stage 2-input NAND gate (Fig. 4) is simulated to evaluate its transient behavior. The mixed-mode simulation approach is used with the DESSIS device simulator. Both NMOS and PMOS are simulated at full device level. An area factor for PMOS devices is considered to be twice that of NMOS, to account for the difference in carrier mobility, as reflected in their drive currents. An input pulse V in with a rise and fall time of 1 ps is applied, and the stage delay of the first stage at its output Y is monitored, when loaded by an identical second stage. Delay values for rising and falling edge Table V . Similar lookup tables are generated for variations in other process parameters such as T ox , halo dose, SSRC dose, halo tilt angle, and annealing temperature. This completes the delay characterization of NAND gate.
III. MODELING METHODOLOGY
An analytical model proposed in [16] to relate NAND gate delays and device saturation currents based on CV /I metric, although efficient in tracking variations in one process parameter at a time, fails to capture the effects of interaction between process parameters when multiple parameters are simultaneously varied. However, these interactions are too significant to be ignored [2] . Hence, this paper attempts to model the dependence of gate delays upon process parameter variations by statistical methods.
To model the relationship between the gate delay with simultaneous variations in multiple process parameters, the statistical technique of DOE is used. The DOE is performed, and second order models are built for rising and falling edge gate delays using RSM [17] - [19] .
A 3-level face centered central composite (FCCC) design of resolution VI [20] for six process parameters is designed with 52 experimental runs. This design is a highly fractionated 3-level DOE for fitting second order response surfaces. In FCCC design, the star points are at the center of each face of the factorial space, with α = ±2.38. The parameter α indicates the distance of the axial point from the center point in the normalized parameter space. To ensure that the design is rotatable, the value of α is selected as α = [2 6−1 ] 1/4 , where 2 6−1 is the number of factorial portion of runs [20] . For six factors, a fraction of 52 experiments have been chosen out of a large set of 3 6 (= 729) full factorial design. The 2 6−1 fractional factorial design, which results in 32 factorial runs, is augmented with 12 axial star points and 8 replicated center points to yield a total of 52 experiments to be conducted in the FCCC design for 6 factors. The process parameters under consideration are varied by ±10% except for anneal temperature, which is varied by ±10
• C about their nominal values. It is assumed that ±10% or ±10
• C corresponds to ±3σ variation in the process under study.
The second order models that have been obtained by the regression technique using simulation results are long polynomials of the form
where β 0 is a constant, x i are the normalized process parameters varying between −1 and +1, and β i are the corresponding regression coefficients determined by the data obtained from the response surface DOE, for i = 1, . . . , 6.
An optimum second order model is obtained by removing all insignificant effects and recomputing the regression. The NAND gate delays that have been modeled are the rising and falling edge delays caused due to A input, B input, and when A and B inputs are tied together.
The percentage variation of gate delays as a function of −10% to +10% variation of process parameters is shown in Fig. 5 . The delay variation is calculated at each X value as a percentage of its value for the nominal design. To detect and fit the cubic effect seen in the falling edge delay response due to variation in gate oxide thickness T ox in Fig. 5(a) , the minimum number of levels needed for factor settings is 4. Thus, a 3-level FCCC design will not fit these response functions. However, a 4-level DOE-RSM modeling is prohibitive in terms of number of experimental runs and, hence, in terms of cost and computational effort. Under these circumstances, to achieve this response fit with only three levels for factor settings and a resultant quadratic model, the −10% to +10% range is split into two regions: −10% to 0% and 0% to +10%. This piecewise modeling is justified because the nominal device in sub-100 nm technologies is typically designed very aggressively. As a result, from the device perspective, we expect that the roll-off in any given device response below the nominal design would be significantly different from the one above the nominal design. A simple quadratic model is obtained for the gate delay response as a function of every process parameter in each of these regions. The piecewise quadratic model considered is of the form
where β 0 is a constant, x i are the normalized process parameters, and β i are the corresponding regression coefficients determined by performing nonlinear regression analysis using LSM. The LSM model does not contain any interaction terms between process variables as it is carried out taking one process variable at a time and for all process variables independently. The two-way and three-way interaction terms obtained from the DOE-RSM modeling are then superimposed on the quadratic models obtained by the LSM method to generate a hybrid model equation of the form set out in (1), whose constant, linear, and quadratic terms come from the LSM model, and the interaction terms come from the DOE-RSM model. This hybrid modeling approach ensures that one-parameter variations are tracked closely and, at the same time, interaction effects between process parameters are effectively captured. The significant computational benefits of this approach come from the selection of minimum number of levels needed for factor settings as 3 instead of 4. Also, the limitation of quadratic model in fitting a cubic or higher order response observed is overcome by fitting the response with piecewise quadratic models to the two split regions. Although the model is somewhat heuristic, this novel idea proves to be computationally efficient and yet very effective, as will be demonstrated in Section IV.
The hybrid models for delay response variables have been tested for their validity to predict the response values by various residual plots, which are found to be satisfactory [20] . The plot of residuals and predicted response does not exhibit any pattern to the residuals. Table VI . This model accuracy is taken to be adequate considering that we have fitted cubic or higher order response effects with piecewise quadratic models using a 3-level FCCC design with some improvisation. The correlation plots for the gate delays along with their correlation coefficients r are shown in Fig. 6 .
IV. DELAY DISTRIBUTIONS OF A DIGITAL CIRCUIT
A 4-bit × 4-bit Wallace tree multiplier circuit is designed using 2-input NAND gates as a library element. The input combination that results in worst case circuit delay is identified by full coverage of the input vector space, i.e., by applying all possible (= 256) input combinations to the multiplier circuit, and then used in all subsequent simulations. The transient analysis of the circuit is carried out to obtain the circuit delay using the SEQUEL circuit simulator. The event-driven simulation capability of SEQUEL for gate-level simulation is used. The systematic variations are modeled by assuming that the process parameters vary as per Gaussian distribution with varying mean of 0%, +5%, and −5% of the nominal value. The random variations are modeled by assuming a ±3σ variation of ±5% of the nominal value around respective mean values. The ±5% variation is only for the short-range process variation, and we treat it to be the subset of the worst case of ±10% variation. In other words, the delay variation of the NAND gates of this multiplier on any given chip is a subset of the overall NAND gate delay range, which is determined by the global variations in process parameters.
A probability distribution for the circuit delay in generating the multiplier output is obtained using the rigorous Monte Carlo simulations by randomly varying different process parameters individually and then simultaneously. A custom Monte Carlo code is written that treats each of the process parameters as an uncorrelated random input variable and is integrated with SEQUEL simulator. As a result, every NAND gate in the circuit obtains various process parameter values, as per the assumed Gaussian distribution of respective process parameters. It is presumed that the two NMOS and two PMOS devices constituting the NAND gate are closely spaced as to suffer identical process variations. The delay values for different transitions of different gates that take place in the circuit are assigned from the lookup table by applying linear interpolation. To guarantee accurate results at minimum computational cost, 10 000 Monte Carlo trials are performed. Figs. 7-9 show the delay distribution for variations in gate length L g , oxide thickness T ox , and halo dose, respectively, with each process parameter acting individually. The distribution obtained using rigorous mixed-mode simulation delay values is overlaid with the distribution obtained using hybrid model approach. We observe a fairly good match for these two distributions. The statistics for variations in L g , obtained by analyzing the resulting distributions, are presented in Table VII . Nominal delay is the circuit delay obtained when all the devices in the circuit have the nominal process parameter values. Similarly, best and worst delays are obtained when all the devices in the circuit have the best or worst process parameter values, respectively. The model statistics track the actual statistics extremely well, thus validating the hybrid model approach.
The methodology has been generalized to consider simultaneous variations in multiple process parameters. To begin with, for simplicity, simultaneous variations in two dominant process parameters are considered, and a large lookup table is generated that contains 25 delay values, corresponding to 25 device/circuit splits with nominal, ±5%, and ±10% variations for two parameters. This is realized by generating all 25 pairs of NMOS/PMOS devices and performing mixed-mode simulations of NAND gates using these devices. Then, Monte Carlo simulations are performed by generating two uncorrelated random numbers for every NAND gate in the circuit, one for each parameter, as per the assumed statistics of process parameters. The delay values for different transitions of different gates that take place in the circuit are assigned from the lookup table by applying two-dimensional interpolation. Then, delay distribution plots are obtained by the previous method and by using the hybrid model equations to generate gate delay values for simultaneous variations in L g and halo dose. This parameter combination is selected as they are the significant process parameters from the perspective of variability. Delay distribution plots for simultaneous variations in L g and halo dose are shown in Fig. 10 , and their statistics are given in Table VIII . The results of hybrid model differ from that of mixed-mode simulation by less than 0.3% in terms of mean and by less than 0.6% in terms of standard deviation at their worst. This demonstrates that the hybrid model yields reasonably accurate results with less computational requirements, apart from being scalable to variations in multiple process parameters. The model statistics track the actual statistics extremely well, thus validating the hybrid model approach for simultaneous variations in two process parameters.
A rigorous verification for simultaneous variations in more than two process parameters requires 5 n device/circuit splits, where n is the number of process parameters that are varying simultaneously. Hence, the number of device/circuit splits required increases in a power series fashion, as n increases. With simultaneous variations in two (n = 2) process parameters, the predictive ability of the hybrid model has been demonstrated. In other words, the hybrid model has adequately captured the interaction effects between process parameters. Since all interaction terms have come from the same single step of DOE-RSM modeling, it stands to reason to extend this methodology to multiple process variables. The hybrid model can be used to gain some useful insight in timing analysis for design closure. Suppose that the worst case design methodology is used to obtain the delay spread of the multiplier circuit. Then, for simultaneous variations in six process parameters, the delay spread will be from 243.2 to 337.7 ps. Furthermore, the delay distribution obtained using Monte Carlo analysis is as shown in Fig. 11 . The process parameters are assumed to vary by ±10%. In the worst case methodology, all the NAND gates in the multiplier take identical set of process parameters for any given trial in the Monte Carlo loop. On the other hand, if we take the statistical design approach, each of the NAND gate can take a random set of process parameters in every trial. The distribution obtained using this methodology is overlaid in Fig. 11 . We see that the worst case approach gives a standard deviation of 7.3 ps, whereas the statistical approach gives 1.5 ps. For simultaneous six-parameter variations, normalized delay variation is (337.7 − 243.2)/6 = 15.75 ps with traditional worst case design using the hybrid model. However, the corresponding value with statistical design using hybrid model would be (284.9 − 274.0)/6 = 1.82 ps. Clearly, the delay distribution is tighter by almost an order of magnitude, demonstrating the significance of statistical design with the hybrid model. The statistics are summarized in Table IX . It should be noted that ±10% variation in process parameters includes systematic and random variations. Thus, using such a wide variation in itself is very pessimistic for the timing closure problem illustrated in Fig. 1 . It is more realistic to consider that the process could be centered around either −5%, 0%, or +5% points, and there would be a random variation of ±5% around these points. We have again performed Monte Carlo analysis for this scenario to obtain delay distribution. Fig. 12 shows the distribution using worst case analysis and the statistical methodology. The worst case methodology results in very pessimistic design with a significantly higher standard deviation compared to statistical approach. Table X summarizes these results.
V. CONCLUSION
In the DSM regime, the deterministic circuit design approach may not be adequate to produce robust designs in the presence of severe process variations, and it becomes imperative that the circuit design adopts statistical approach. This paper presents one such statistical circuit design approach that takes into account variability in any number of process parameters, if their statistics are known. Two-input NAND gate has been used as a library element, and its delay is extensively characterized through mixed-mode simulations. An optimal second order hybrid model is obtained for gate delays directly in terms of process parameters through response surface modeling using DOE and LSM. The delay of a large digital circuit is characterized in statistical terms by taking a 4-bit × 4-bit multiplier as a representative circuit. We demonstrate that the worst case design approach is very pessimistic, whereas the hybrid model based statistical design approach can result in robust design. The proposed methodology has been demonstrated for NAND gate library with 266 gates, and the simplicity and generality of the approach make it equally applicable to a large library of cells for both statistical timing analysis and statistical circuit simulation at the gate level. This paper attempts to efficiently bridge the gap between the TCAD and design CAD through process simulations, mixed-mode device simulations, RSM, and a general-purpose circuit simulator.
