The impact of gate-to-source/drain overlap length on the performance and variability of 65 nm CMOS is presented. Device and circuit variability is investigated as a function of three significant process parameters, namely gate length, gate oxide thickness, and halo dose. The comparison is made for three values of gate-to-source/drain overlap length, namely 5 nm, 0 nm, and -5 nm, and at two leakage currents of 10 nA and 100 nA. The Worst-Case-Analysis approach is used to study the inverter delay fluctuations at the process corners. The drive current of the device, for device robustness, and the stage delay of an inverter, for circuit robustness, are taken as performance metrics. The design trade-off between performance and variability is demonstrated at both the device level and the circuit level. It is shown that a larger overlap length leads to better performance, while a smaller overlap length results in better variability, so performance trades with variability as the overlap length is varied. An optimal overlap length of 0 nm is recommended at 65 nm gate length, for a reasonable combination of performance and variability.
INTRODUCTION
Parametric mismatch due to process variations is emerging as a significant barrier to realizing acceptable levels of performance and yield in nanoscale CMOS technologies. Design for Manufacturability and Design for Yield (DFM and DFY) have received much attention in nanoscale technologies. Parametric fluctuations have evolved from a typical high-precision analog circuit design issue into a serious performance and yield limiter in the pursuit of giga-scale integration. One of the most important and difficult challenges confronting the semiconductor industry is the loss of predictability in the functional correctness and performance of nanometer-scale integrated circuits. It is expected that the performance variances caused by this mismatch in short-channel MOS circuits may ultimately limit device scaling in integrated circuits [1, 2]. It has been shown that parametric mismatch forms a fundamental limit to realizing acceptable levels of performance and yield in nanoscale CMOS technologies, and tighter control of conventional processes and the development of improved device architectures have been suggested [3]. For the technology to continue to advance along Moore's curve, it is imperative to develop techniques to predict and optimize the performance of ICs in the circuit design domain and to identify device structural parameters in the device design domain, in the presence of process variations. Thus, there is an urgent need to tighten the performance distribution of chips, at both the circuit design level and the device design level, to achieve robust device/circuit performance. Several works in the circuit design domain have addressed the accurate prediction of delay and power in the presence of process variations [4] [5] [6] [7] [8] [9]. To mitigate the effects of parametric mismatch, improved device architectures need to be developed.
This paper attempts to address the variability issue at the device design level, by identifying a variability-aware device design parameter.
There exists a critical gate-to-source/drain overlap length below which the device hot electron reliability suffers, and a maximum value above which the gate-to-source/drain capacitance becomes large. An optimal trade-off between device performance and characteristics is achieved within this narrow margin, which is shrinking with scaling [10]. The interaction of overlap length with lateral doping abruptness, and the consequent impact on device performance, has been shown, indicating the use of an overlap spacer for device optimization [11]. Traditionally, a minimum gate-to-source/drain overlap length of about 20 nm was recommended for the 0.25 µm process, from the source/drain series resistance consideration, to prevent drive current degradation [12]. More recently, however, it has been demonstrated that a gate-to-source/drain overlap length of 0 nm is preferred in the sub-100 nm regime from the perspective of digital and analog circuit performance [13]. Devices with 0 nm overlap length have also been shown to exhibit better hot carrier and gate oxide reliabilities, in terms of reduced peak electric fields near the drain and reduced gate leakage currents, respectively. The characteristics of MOS transistors with a non-overlapped gate-to-source/drain region have been investigated, and such devices have been shown to have better subthreshold slope and drain induced barrier lowering (DIBL) than overlapped structures [14]. In this work, we explore the efficacy of gate-to-source/drain overlap length as a variability-aware device design parameter.
We perform mixed-mode simulations, which bring the process-simulated devices directly into the netlist of the circuit, wherein both circuit and device equations are solved simultaneously. This technique is accurate as it does not involve SPICE parameter extraction, given that SPICE parameters may not capture the device behavior very accurately in the nanoscale regime [15]. Process/device simulation is considered appropriate to the study of process sensitivity as it enables the precise control of process variations that are difficult to achieve experimentally. A commercial Technology Computer Aided Design (TCAD) tool suite, Sentaurus from Synopsys, has been used for this study [16]. Section 2 describes the simulation methodology. While Section 3 presents the process sensitivity at the device level, Section 4 discusses the process sensitivity at the circuit level. Section 5 concludes with a summary of results.
SIMULATION METHODOLOGY
A set of nominal NMOS and PMOS devices with a physical gate length of 65 nm is designed and optimized for two leakage currents (I off) of 10 nA and 100 nA, using the disposable spacer technique [17]. Devices with gate-to-source/drain overlap lengths (L ov) of 5 nm, 0 nm, and -5 nm are generated by process simulations. A negative overlap denotes an underlap between the gate and the source/drain extension. Figure 1 illustrates the gate-to-source/drain overlap length for an NMOS device. To achieve the desired overlap length and leakage current, the overlap spacer thickness and the halo dose are appropriately varied. For a fair comparison, devices with different overlap lengths are designed with the leakage current constraint matched for the sub-nominal or best corner devices D- (defined in Table 1). It has been demonstrated that the gate length (L g) and gate oxide thickness (T ox) are the most significant parameters which impact the device variability at 65 nm [5], as at successive process generations [1, 18, 19]. To demonstrate the relevance of overlap length as a variability-aware device design parameter, the most significant process parameters, namely L g, T ox, and the halo dose, are considered for the process sensitivity study. It is assumed that gate length varies by ±5 nm, gate oxide thickness by ±3 Å, and halo dose by ±10%, so as to produce best and worst corner devices for the complete set of nominal NMOS/PMOS devices. To explore the performance at the nominal, best, and worst process corners, the traditional worst case analysis is used: extreme values of each chosen parameter are selected and combined so that they result in extreme values of device performance in terms of drive current I on.
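The corner construction described above can be sketched as follows. This is a minimal illustration, not the authors' TCAD flow: the deviation magnitudes come from the text, but the function names and the `toy_ion` metric are hypothetical stand-ins. In the paper, the drive current of each corner comes from process/device simulation. The toy metric simply assumes that shorter gate length, thinner oxide, and lower halo dose each raise I on.

```python
from itertools import product

# Process deviations quoted in the text: Lg +/-5 nm, Tox +/-3 angstrom, halo dose +/-10 %.
DEVIATIONS = {"Lg_nm": 5.0, "Tox_A": 3.0, "halo_pct": 10.0}


def enumerate_corners(deviations):
    """Return every combination of low/high extremes of the given parameters."""
    names = list(deviations)
    corners = []
    for signs in product((-1, +1), repeat=len(names)):
        corners.append({n: s * deviations[n] for n, s in zip(names, signs)})
    return corners


def pick_extremes(corners, metric):
    """Return (worst, best): the corners that minimise and maximise the metric."""
    worst = min(corners, key=metric)
    best = max(corners, key=metric)
    return worst, best


def toy_ion(corner):
    """Hypothetical normalised I_on trend (ASSUMPTION, not simulated data):
    each parameter at its low extreme adds +1, at its high extreme adds -1."""
    return -(corner["Lg_nm"] / 5.0 + corner["Tox_A"] / 3.0 + corner["halo_pct"] / 10.0)


corners = enumerate_corners(DEVIATIONS)
worst, best = pick_extremes(corners, toy_ion)
print(len(corners), "corner combinations")
print("best corner deviations:", best)
print("worst corner deviations:", worst)
```

With three parameters there are eight extreme combinations. Worst case analysis keeps only the two that bound the performance metric, which is why only a best and a worst corner device appear alongside each nominal device.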
All the devices are simulated with the drift-diffusion transport model to obtain I d -V gs and C gg -V gs characteristics, from which the drive current I on and the total device gate capacitance C gg are extracted. For device simulations, physical effects such as doping dependence of mobility, field dependence of mobility, velocity saturation, channel carrier quantization, Band-to-Band-Tunneling (BTBT), and silicon band gap narrowing have been considered. Using these devices, a two-stage inverter gate, as shown in Figure 2, is simulated using the mixed-mode simulation approach to evaluate its transient behavior. Both NMOS and PMOS are simulated at full device level. Transient analysis using mixed-mode simulations is used for the estimation of inverter delay because of its accuracy. An input pulse V in with 1 ps rise and fall times is applied, and the stage delay of the first stage is monitored at its output Y when it is loaded by an identical second stage.
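As a rough illustration of how a stage delay can be read off transient waveforms such as V in and Y, the sketch below measures the time between the 50% crossings of the input and output. The 50% threshold, the supply value `vdd`, and the sample waveforms are assumptions made for this example only. The text does not state the measurement threshold or the supply voltage.

```python
def crossing_time(t, v, level, rising):
    """First time the sampled waveform v(t) crosses `level` in the given
    direction, using linear interpolation between adjacent samples."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed_up = rising and (v0 < level <= v1)
        crossed_down = (not rising) and (v0 > level >= v1)
        if crossed_up or crossed_down:
            frac = (level - v0) / (v1 - v0)
            return t[i - 1] + frac * (t[i] - t[i - 1])
    raise ValueError("waveform never crosses the requested level")


def stage_delay(t, v_in, v_out, vdd, output_falling):
    """Delay between the 50% points of an inverter's input and output.
    For a falling output edge the input edge is rising, and vice versa."""
    level = 0.5 * vdd
    t_in = crossing_time(t, v_in, level, rising=output_falling)
    t_out = crossing_time(t, v_out, level, rising=not output_falling)
    return t_out - t_in


# Hypothetical waveforms (time in ps, voltage normalised to vdd = 1.0):
# the input rises in 1 ps, the output falls linearly between 4 ps and 8 ps.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
v_in = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
v_out = [1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]

falling_delay = stage_delay(t, v_in, v_out, vdd=1.0, output_falling=True)
print("falling edge delay (ps):", falling_delay)
```

The rising edge delay follows from the same function with `output_falling=False` and the complementary waveforms.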
PROCESS SENSITIVITY AT THE DEVICE LEVEL
The drive current I on and gate capacitance C gg of nominal and corner NMOS/PMOS devices are tabulated in Tables 2 and 3, at 10 nA and 100 nA leakage, respectively. The percentage variation in I on and C gg of best and worst corner devices, with respect to the nominal, at 10 nA and 100 nA leakage, is shown in Tables 4 and 5. It is observed that as L ov reduces, the percentage variation in I on reduces for both NMOS and PMOS at both leakages. This may be attributed to increased area in the channel region and decreased impact of random dopant fluctuations. Hence, a smaller L ov produces better variability performance. As L ov is reduced from 5 to 0 nm, the variation in I on reduces from 25.7% to 1.7% for the best corner device in NMOS and from 22.3% to 15.5% for the worst corner device in PMOS, at a leakage of 10 nA. At 100 nA leakage, the respective values are from 21.1% to 1.1% in NMOS and from 18.7% to 14.8% in PMOS. However, as L ov is reduced from 0 to -5 nm, the reduction in variability of I on is negligible for NMOS/PMOS, at both leakages. Also, the reduction in variability for worst corner NMOS and best corner PMOS is small. The process for NMOS can be biased toward the best corner and for PMOS toward the worst corner for an improved variability performance, but at the cost of process complexity.
It can also be seen that as L ov reduces, the device drive current I on reduces, degrading the performance of both NMOS and PMOS devices at both leakages. For nominal devices at both leakages, I on reduces by about 10% for NMOS and 20% for PMOS as L ov is reduced from 5 nm to 0 nm, and by about 40% for NMOS and 50% for PMOS as L ov is reduced from 5 to -5 nm. With variation in L ov, the total gate capacitance C gg is more or less constant for NMOS/PMOS at both leakages, as seen in Tables 2 and 3. As L ov is reduced from 5 to 0 nm, the percentage variability in I on reduces from 25.7% to 1.7% in NMOS and from 22.3% to 15.5% in PMOS, and as L ov is reduced from 5 to -5 nm, from 25.7% to 1.0% in NMOS and from 22.3% to 12.5% in PMOS. A similar trend in the percentage variability in I on can be seen at 100 nA leakage as well. Thus, it is demonstrated that the overlap should be made as large as possible for better drive current performance and as small as possible for better drive current variability. Hence, there exists a trade-off between performance and variability at the device level with L ov as the design parameter. The design trade-off between the drive current I on and its variability is illustrated in Figures 5 and 6, for 10 nA and 100 nA leakage, respectively. Note that the percentage variation in I on is expressed with respect to the I on at +5 nm overlap, whereas the percentage I on variability at a process corner for a given overlap is expressed with respect to the respective nominal device.
Considering that the reduction in variability is insignificant and the reduction in I on is significant when L ov is reduced from 0 to -5 nm, at both leakages and for both NMOS/PMOS, L ov = 0 nm may be recommended for an optimal combination of performance and variability at the device level. The reduction in drive current variability is expressed in terms of the worst to best corner spread in drive current. Thus, by reducing L ov from 5 to 0 nm, a significant reduction in variability of 1 - (579.51 - 463.44)/(809.99 - 505.06) = 62% in NMOS and 1 - (309.51 - 212.9)/(392.66 - 244.14) = 35% in PMOS can be achieved, at the cost of a degradation in I on of 10% in NMOS and 20% in PMOS.
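The spread-based figures quoted above can be checked with a few lines of Python. The function name and argument order are illustrative choices, and the larger drive current of each pair is taken to be the best corner value. The numbers are the corner drive currents quoted in the text, in whatever units the source tables use.

```python
def spread_reduction(best_ref, worst_ref, best_new, worst_new):
    """Fractional reduction in the worst-to-best corner spread of a metric
    when moving from a reference design to a new design."""
    ref_spread = best_ref - worst_ref
    new_spread = best_new - worst_new
    return 1.0 - new_spread / ref_spread


# Reference: L_ov = 5 nm; new design: L_ov = 0 nm (corner I_on values from the text).
nmos = spread_reduction(best_ref=809.99, worst_ref=505.06,
                        best_new=579.51, worst_new=463.44)
pmos = spread_reduction(best_ref=392.66, worst_ref=244.14,
                        best_new=309.51, worst_new=212.9)

print(f"NMOS spread reduction: {nmos:.1%}")
print(f"PMOS spread reduction: {pmos:.1%}")
```

The results round to the 62% and 35% stated in the text.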
PROCESS SENSITIVITY AT THE CIRCUIT LEVEL
The nominal values of the falling edge and rising edge delays of the inverter circuit with 10 nA and 100 nA leakage devices are tabulated in Tables 6 and 7. The percentage variation in falling edge and rising edge delays of the inverter circuit with best and worst corner devices, with respect to the nominal, at leakage values of 10 nA and 100 nA, is shown in Figure 7, and the corresponding data are presented in Tables 8 and 9, respectively.
The variation in falling edge delay reduces from 11.9% to 6% for the best corner device and the variation in rising edge delay reduces from 20.3% to 13.8% for the worst corner device, as L ov is reduced from 5 to 0 nm, at 10 nA leakage. At 100 nA leakage, the respective values are from 9.9% to 5.3% for falling edge delay and from 15.5% to 9.8% for rising edge delay. Similarly, as L ov is reduced from 0 to -5 nm, the variation in falling edge delay increases from 6.1% to 7.3% for the best corner device and the variation in rising edge delay reduces from 13.8% to 5.4% for the worst corner device, at a leakage of 10 nA. The respective values for 100 nA leakage are from 5.3% to 5.7% and from 9.9% to 6.6%. For all cases of rising and falling edge delays at the various overlap lengths, at both leakages and for best corner and worst corner devices, I on variations dominate over C gg variations, except for the falling edge delay at L ov = 0 nm and -5 nm at both leakages, where C gg variations dominate over I on variations. This explains the rise in delay and its variability seen for this case.
It can be observed that, for the inverter circuit with nominal devices at both leakages, falling edge delay increases by about 7% and rising edge delay by about 25% as L ov is reduced from 5 to 0 nm, and falling edge delay increases by about 50% and rising edge delay by about 100% as L ov is reduced from 5 to -5 nm. As L ov is reduced from 5 to 0 nm, the percentage delay variability reduces from 17.5% to 16.6% for the falling edge and from 20.3% to 14% for the rising edge, and as L ov is reduced from 5 to -5 nm, from 17.5% to 13.4% for the falling edge and from 20.3% to 5.5% for the rising edge. A similar trend in the percentage variability in delay can be seen at 100 nA leakage as well. Hence, the reduction in variability in delay trades with delay performance. The design trade-off between delay performance and delay variability is illustrated in Figures 8 and 9, for 10 nA and 100 nA leakage, respectively. Note that the percentage delay is expressed with respect to the delay at +5 nm overlap, whereas the percentage delay variability at a process corner for a given overlap is expressed with respect to the respective nominal device. Thus, it is demonstrated that for better delay performance the overlap should be made as large as possible, and for better delay variability the overlap should be made as small as possible. The reduction in variability in delay is expressed in terms of the worst to best corner delay spread. As L ov is reduced from 5 to 0 nm, the reduction in variability in delay is 1 - (6.59 - 5.99)/(6.19 - 4.64) = 61%, and as L ov is reduced from 5 to -5 nm, it is 1 - (8.96 - 8.47)/(6.19 - 4.64) = 68%. Hence, as L ov is reduced from 5 to 0 nm, the reduction in variability in delay is significant while the corresponding increase in delay is limited. Conversely, as L ov is reduced from 0 to -5 nm, the increase in delay is significant while the further reduction in variability in delay is negligible.
Hence, for an optimal combination of delay performance and delay variability, an overlap of 0 nm is recommended at 65 nm gate length.
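For reference, the delay spread reductions used in this section can be written out as a worked equation. The symbols R, t worst, and t best are notation introduced here for illustration and do not appear in the source. The numerical values are the corner delays quoted in the text, with the spread at L ov = 5 nm serving as the reference.

```latex
\[
R(L_{ov}) = 1 - \frac{t_{\mathrm{worst}}(L_{ov}) - t_{\mathrm{best}}(L_{ov})}
                     {t_{\mathrm{worst}}(5\,\mathrm{nm}) - t_{\mathrm{best}}(5\,\mathrm{nm})}
\]
\[
R(0\,\mathrm{nm}) = 1 - \frac{6.59 - 5.99}{6.19 - 4.64}
                  = 1 - \frac{0.60}{1.55} \approx 0.61
\]
\[
R(-5\,\mathrm{nm}) = 1 - \frac{8.96 - 8.47}{6.19 - 4.64}
                   = 1 - \frac{0.49}{1.55} \approx 0.68
\]
```

Going from 0 nm to -5 nm therefore adds only about seven percentage points of spread reduction, while the nominal delay penalty grows from roughly 7% to 50% on the falling edge and from roughly 25% to 100% on the rising edge.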
CONCLUSIONS
The impact of gate-to-source/drain overlap length on performance and variability, at two different leakage currents, is evaluated by considering variations in significant process parameters. The Worst-Case-Analysis (WCA) approach is used to study the inverter delay variations at the process corners. The drive current of the device, for device robustness, and the stage delay of an inverter, for circuit robustness, are selected as performance metrics. A smaller overlap length leads to better variability at the cost of performance, whereas a larger overlap length results in better performance at the cost of increased variability. The design trade-off between performance and variability is demonstrated at both the device level and the circuit level. An optimal overlap length of 0 nm is recommended at 65 nm gate length for a reasonable combination of performance and variability. The device design can be optimized for performance and variability with respect to overlap length for any CMOS process node.
