Abstract-This paper presents an on-chip characterization method for random variation in minimum-sized devices in nanometer technologies, using a sense amplifier-based test circuit. Instead of the analog current measurements required in conventional techniques, the presented circuit operates using digital voltage measurements. Simulations of the test structure using predictive 70 nm and hardware-based 0.13 µm CMOS technologies show good accuracy (error of 5%-10%) in the prediction of random variation, even in the presence of systematic variations. A test chip is fabricated in 0.13 µm bulk CMOS technology and measured to demonstrate the operation of the test structure.
I. INTRODUCTION
LOCAL random variation in transistor parameters, particularly threshold voltage ($V_t$), increases with technology scaling and can degrade circuit robustness [1] - [3]. For small transistors in nanometer technologies, intrinsic fluctuation in $V_t$ due to effects such as random dopant fluctuations (RDF) or line edge roughness (LER) can dominate the mismatch in neighboring devices [4]. The effect of this local randomness is most pronounced in area-constrained circuits, such as Static Random Access Memory (SRAM) cells, and limits density scaling [3], [4]. Hence, measurement, characterization, and estimation of local random variability in a process are crucial for yield learning and enhancement in nanoscaled technologies, particularly for SRAM design.
Conventionally, differential current measurement between identical neighboring devices is used to characterize local random mismatch [4] - [7]. However, measuring small currents through minimum-size transistors requires sophisticated analog measurement techniques. Moreover, complex data manipulation and analysis are required to extract $V_t$ differences from current differences. Hence, this method is unwieldy for on-chip characterization of local mismatch. On-chip characterization can significantly reduce the time and cost of collecting a large volume of variability data. This paper demonstrates a sense-amplifier-based test circuit and measurement method to characterize local random variation in a process. In this method, the offset voltage of the sense amplifier is used to measure device mismatch. Further, a built-in self-test scheme for on-chip measurement of device mismatch is proposed. The primary advantages of the presented test structure and measurement scheme over conventional methods are that:
• it provides a direct measurement of the complete probability distribution of local mismatch;
• it provides a simple digital measurement technique instead of complex analog voltage-current measurements;
• the possibility of digital measurement suggests a fast, on-chip self-characterization scheme to measure random variability.
The test structure is designed and simulated in predictive 70 nm technology [8], hardware-based 0.13 µm bulk CMOS, and sub-90 nm silicon-on-insulator (SOI) technologies to show its accuracy. A test chip was designed in 0.13 µm bulk CMOS technology and fabricated through MOSIS services. The measurement of the test chip successfully demonstrates the operation of the test structure in measuring local random variation in a process.
The rest of the paper is organized as follows. Section II presents the theoretical analysis of the test circuit. Section III describes the test structure and the measurement methods. Section IV presents the statistical simulation results to verify the operation of the test circuit. Section V presents the test chip design and measurement results. Section VI draws the conclusions.
II. MISMATCH MEASUREMENT METHOD
This mismatch characterization scheme uses a current latch-type sense amplifier (CLSA) [9] based test circuit to measure the local random variability of a process. Fig. 1(a) shows the circuit schematic and basic operation of the sense-amplifier circuit. When the sense amplifier enable signal (SAE) is low, the output nodes OUT and $\overline{\mathrm{OUT}}$ are pre-charged high. When SAE is raised high, the output node on the side of the driver with the higher input voltage discharges faster than the other. Once it falls below the trip-point of the inverter driving the opposite node, the opposite node switches back to "1" and the discharging node goes to "0". However, if there is a mismatch in the threshold voltage of the transistors, the outputs can resolve in the opposite direction even when the input difference has the correct polarity, resulting in an incorrect operation. A larger input voltage difference is required to avoid this incorrect operation. The offset voltage of this circuit is defined as the minimum voltage difference between the two inputs required for correct sensing.
Let us now investigate the effect of process variation on the offset voltage. First, random $V_t$ variations were applied independently to all the transistors in the circuit, and Monte Carlo simulation was performed using predictive 70 nm devices [8] to extract the offset voltage. Next, a correlated component was added to the variations. Fig. 1(b) shows that the offset voltage is a strong function of random variation, while correlation does not significantly impact its distribution. An increase in the random mismatch increases the spread of the offset voltage. This shows that the offset voltage of the CLSA eliminates the effect of systematic variation and depends only on the random components.
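The insensitivity to correlated variation can be illustrated with a small numerical sketch. It is not the SPICE Monte Carlo used in the paper: it assumes, as a simplification, that the offset equals the $V_t$ difference of the two driver transistors and that the correlated component shifts both drivers equally. The function names and the sigma values are hypothetical.

```python
import random
import statistics

def offset_samples(sigma_random, sigma_correlated, n=5000, seed=1):
    """Simplified model (assumption): offset = Vt(driver A) - Vt(driver B)."""
    rng = random.Random(seed)
    offsets = []
    for _ in range(n):
        shared = rng.gauss(0.0, sigma_correlated)   # common to both drivers
        vt_a = shared + rng.gauss(0.0, sigma_random)
        vt_b = shared + rng.gauss(0.0, sigma_random)
        offsets.append(vt_a - vt_b)                 # shared part cancels
    return offsets

def offset_sigma(sigma_random, sigma_correlated, n=5000, seed=1):
    """Spread of the offset for the given random and correlated sigmas (volts)."""
    return statistics.pstdev(offset_samples(sigma_random, sigma_correlated, n, seed))

if __name__ == "__main__":
    print("random only      :", offset_sigma(0.03, 0.00))
    print("random+correlated:", offset_sigma(0.03, 0.05))
    print("larger random    :", offset_sigma(0.06, 0.00))
```

In this model the spread follows the random component (about $\sqrt{2}$ times the per-device sigma) and is unchanged by the correlated term, which mirrors the trend reported for Fig. 1(b).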
A. Analysis of Offset Voltage
To understand how the CLSA can be used for measurement of mismatch, consider the origin of the offset voltage. In this analysis, we will assume that all the different sources of random local variation is lumped into a single parameter, i.e., threshold voltage . This is a reasonable assumption for narrow-width devices in nanometer technologies (such as the ones used in SRAM) as the local variation is dominated by intrinsic fluctuations in due to effects such as random dopant fluctuations, line edge roughness, etc. If all devices in CLSA on the two sides of the symmetry line " " are identical, any voltage difference between the inputs can be sensed correctly (i.e., ). Assume a difference between the driver transistors such that . This suggests that, although , it can be possible that as follows: (1) where is the discharging current for node OUT and is the discharging current for node . If , node discharges at a rate faster than resulting in an incorrect sensing. Hence, for proper sensing, (2) From (2) we can observe that mismatch in the driver transistors results in a non-zero offset voltage. Similarly, difference between the trip-points of the two cross-coupled inverters in latch can also increase the offset voltage. The total offset voltage is linear combination of the offset due to mismatch only in the driver FETs and that due to mismatch only in the latch FETs . To verify this, worst-case mismatch was applied only to latch FETs, then only to driver FETs, and finally to all the devices. Simulation using predictive 70 nm devices shows that the total offset is a linear combination of and [ Fig. 2(a) ]. The offset voltage due to the driver FETs is given by the input voltage difference required to make the current through and equal. From (2) it can be concluded that is the same as the mismatch between the driver FETs. Hence, we obtain (3)
The offset voltage due to the latch is more difficult to estimate. To understand the latch offset, consider that . Hence, the time required for node to reach the trip-point of the inverter (say, ) should be less than the time required for node to reach the trip-point of the inverter (say, ). Assuming a constant discharging current until this time and a step input to SAE, we can obtain (4) where is the load capacitance, is the trip-point of the inverter associated with , and is the trip-point associated with
. For correct sensing, , and for incorrect sensing, . Hence, latch offset is given by the input voltage required to have . Considering mismatch in latch ( and ) we get (5) where is the current difference between the two paths. Since we are interested in the latch offset here, we assume that the driver offset is zero. In other words, mismatch in latch FETs is considered and no mismatch is considered for driver FETs (i.e., as and ). Therefore, for correct sensing, (6) The current difference between the two branches can be obtained by solving the differential stage formed by , , and and is given by [9] , where is the current through the clock transistors. Hence, we get
The above analysis shows that the driver offset is a direct measure of the local $V_t$ mismatch, while the latch offset introduces estimation error. Moreover, the driver offset does not depend on the size of the other transistors. Hence, we propose to use the driver transistors as the devices under test (DUTs) and to measure the offset voltage of the CLSA to obtain their $V_t$ mismatch. The statistics of the offset voltage then directly measure the statistics of local random variations. To improve the accuracy of this method, the latch offset needs to be minimized. This can be achieved by reducing the size of the clock transistor (i.e., reducing the current through it) and increasing the NFET-to-PFET beta ratio, as demonstrated in Fig. 2(b) and (c) [9]. The increase in sensing delay due to a smaller clock transistor (which would make the design unsuitable for SRAM application) is not a major concern for this application. Further, a slower rise of the SAE also helps to reduce the latch offset.
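The link between driver offset and $V_t$ mismatch can be sketched in a few lines. The notation below is illustrative and is not taken from the paper's own equations: it assumes that the two drivers share a source node, differ only in threshold voltage ($V_{t1}$, $V_{t2}$), and obey the same strictly increasing current law $f$ of the gate overdrive, with drain-voltage effects neglected.

```latex
\begin{align*}
I_1 &= f\left(V_{\mathrm{in},1} - V_{t1}\right), \qquad
I_2 = f\left(V_{\mathrm{in},2} - V_{t2}\right), \\
I_1 = I_2 \;&\Longrightarrow\; V_{\mathrm{in},1} - V_{t1} = V_{\mathrm{in},2} - V_{t2}, \\
&\Longrightarrow\; V_{\mathrm{in},1} - V_{\mathrm{in},2} = V_{t1} - V_{t2}.
\end{align*}
```

Under these assumptions, the input difference that balances the two discharge currents equals the threshold-voltage difference of the drivers, independent of the particular form of $f$.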
III. TEST STRUCTURE AND TESTING METHOD

A. Organization of the Test Structure
The basic element of the test structure is the CLSA circuit optimized to reduce the latch offset [ Fig. 3(a) ]. To minimize the latch offset, the latch FETs are designed to be large, since random variation decreases with size, with large NFET-to-PFET width ratio ( 8) and a small clock transistor is used (same as driver FETs). Driver FETs (i.e., DUTs) are placed in closest possible proximity and inputs all unselected CLSAs reduces power dissipation and spurious transitions during testing/characterization.
B. Characterization Method
First, a decoder circuit selects one CLSA at a time. A voltage difference $\Delta V$ is applied between the two inputs of the selected CLSA, where $\Delta V$ is initially set to zero. SAE is low at this point, which pre-charges both output nodes high. Next, SAE is raised high and kept high for a reasonably long period of time, since the clock transistor is small and the delay is expected to be large. If there is no mismatch, one output is expected to be high and its complement low. Hence, after the input difference is applied, the output that is expected to be low is compared to "0". If it is observed to be "1", $\Delta V$ is increased by a small step and the measurement is repeated. This process is repeated until the correct offset is reached (i.e., the observed output changes to "0"). The final $\Delta V$ value for the sense amplifier is stored, and the measurement for the next CLSA is started with $\Delta V$ reset to zero.
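The stepping procedure can be sketched as a short search loop. This is a behavioural illustration under stated assumptions, not the authors' test software: `sense` is a hypothetical callable standing in for one pre-charge/evaluate cycle, `make_clsa` is a toy model that senses correctly once the applied difference reaches its offset, and the 10 mV step and 0.5 V sweep limit are placeholder values.

```python
def measure_offset(sense, step=0.01, max_dv=0.5):
    """Raise the input difference from zero until the output reads correctly.

    `sense(dv)` (hypothetical) returns the bit observed on the output that
    should be "0" when sensing is correct. Returns the first passing input
    difference in volts, or None if the sweep range is exhausted.
    """
    k = 0
    while k * step <= max_dv + 1e-12:
        dv = k * step
        if sense(dv) == 0:       # correct sensing reached: dv is the offset
            return dv
        k += 1                   # otherwise increase by one small step
    return None

def make_clsa(true_offset):
    """Toy CLSA model (assumption): correct sensing once dv >= true_offset."""
    return lambda dv: 0 if dv >= true_offset else 1

if __name__ == "__main__":
    for off in (-0.010, 0.023, 0.047):
        print(off, "->", measure_offset(make_clsa(off)))
```

The measured value is the true offset rounded up to the next step, so the step size sets the measurement resolution.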
C. On-Chip Variability Measurement System
The above discussion shows that, although the CLSA-based test structure operates on the differential current between two devices, it does not require analog measurement of the current difference. It only needs the application of a voltage difference and the measurement of a digital output (a digital signature of the local variation). Hence, this scheme can be used to design an in-line, on-chip built-in self-test (BIST) circuit for random variability measurement, which is described in Fig. 4. An on-chip voltage divider network can be used to generate the different $\Delta V$ values. The on-chip test controller selects a sense amplifier, applies $\Delta V$ (starting from zero), and compares its output to determine if there is a failure. In case of a failure, the controller advances its state and selects the next $\Delta V$. As soon as a success is detected, the digitized $\Delta V$ value is stored in an on-chip memory. A self-test option makes the characterization simpler and faster compared to the conventional methods.
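A software model of such a controller might look as follows. It is a sketch only: the ideal resistor-string divider, the 32-tap count, and the callable interface for each CLSA are assumptions made for illustration and are not details given for Fig. 4.

```python
def divider_taps(v_ref, n_taps):
    """Ideal resistor-string taps (assumption): tap k gives k * v_ref / n_taps."""
    return [k * v_ref / n_taps for k in range(n_taps)]

def run_bist(clsa_array, n_taps=32):
    """Return one stored tap index per CLSA (None if no tap passes).

    Each entry of `clsa_array` is a hypothetical callable that takes a tap
    index and returns True when the CLSA senses correctly at that tap.
    """
    memory = []
    for sense_ok in clsa_array:        # controller selects one CLSA at a time
        code = None
        for tap in range(n_taps):      # advance state after each failure
            if sense_ok(tap):
                code = tap             # success: store the digitized value
                break
        memory.append(code)
    return memory

if __name__ == "__main__":
    # Toy array: each CLSA passes once the tap index reaches its offset code.
    array = [lambda tap, need=need: tap >= need for need in (0, 3, 7, 40)]
    codes = run_bist(array)
    taps = divider_taps(0.32, 32)
    print(codes)
    print([None if c is None else taps[c] for c in codes])
```

Only the integer tap index needs to be stored per CLSA; conversion to volts can be done off-chip from the known divider ratio.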
IV. STATISTICAL SIMULATION RESULTS
The effectiveness of the test structure is evaluated through Monte Carlo simulations. Random and correlated and shifts were applied to all the transistors in the circuit where area-dependent variations for were assumed. The simulated distribution of the offset voltage was compared to the distribution of the applied shift. Prediction errors for standard deviation and entire distributions are given by (8) (9) where the "applied mismatch" refers to the mismatch applied to the different devices in the CLSA while performing the SPICE simulations, and the "estimated mismatch" refers to the offset voltage values (which is expected to be same as the applied mismatch) the obtained from the simulations. The difference between the true (computed from applied variation values) and estimated (computed from offset values obtained from the simulations) values of (for different ) is used to quantify the estimation error in the distribution.
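The two error measures can be sketched in code. Since the exact forms of (8) and (9) are not legible here, the definitions below are assumptions: the standard-deviation error is taken as the relative difference between estimated and applied sigma, and the distribution error as the largest gap between the two empirical cumulative distributions over a chosen set of evaluation points.

```python
import statistics

def sigma_error(applied, estimated):
    """Relative error of the estimated standard deviation (assumed form)."""
    s_app = statistics.pstdev(applied)
    s_est = statistics.pstdev(estimated)
    return abs(s_est - s_app) / s_app

def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)

def distribution_error(applied, estimated, points):
    """Largest CDF difference over the evaluation points (assumed form)."""
    return max(abs(empirical_cdf(estimated, x) - empirical_cdf(applied, x))
               for x in points)

if __name__ == "__main__":
    applied = [-2.0, -1.0, 0.0, 1.0, 2.0]
    estimated = [1.1 * v for v in applied]   # offsets overestimate by 10%
    print(sigma_error(applied, estimated))
    print(distribution_error(applied, estimated, [-1.05, 0.0, 1.05]))
```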
A. Estimation of Mismatch
First, we have performed Monte Carlo simulations of the test circuit considering random variation in each transistor in the test circuit. In the Monte Carlo simulation, a set of seven random values (one for each transistor in CLSA) represents one random instance of the test circuit. A large number ( 1000) of such random instances of the test circuit were simulated and the offset voltage for each case was estimated from the simulation. The offset voltage distribution thus obtained is referred to as the "estimated mismatch" in the following analysis. As mentioned before, the variation applied to the devices while performing the simulation, is referred to as the "applied mismatch" in the following analysis. Simulation using predictive 70 nm devices shows that the estimated mismatch (i.e., offset voltage) distribution closely follows the applied shifts (Fig. 5) . The estimation error in standard deviation was observed to be within 8% [ Fig. 5(b) ]. It can also be observed that simulated offset values tend to overestimate the distribution, due to non-zero latch offset. It was observed that reducing the clock transistor size improves the estimation accuracy in both standard deviation and distribution (Fig. 5) . On the other hand, increasing the size of the latch transistors helps to reduce estimation error as the mismatch inversely depends on device width. However, increasing the PFET size beyond a certain point only has a small impact on error, even assuming only area-dependent variation. Thus, the latch PFETs were to have large ( 8) which helps reduce latch offset due to area-independent components of mismatch. The test circuit can also obtain a good estimate of the mismatch distribution even if the distribution is non-normal in nature [ Fig. 5(c) ]. Estimation accuracy improves as the current through clock transistor reduces due to lower latch offset. This can be achieved by using and and using lower than (Fig. 6) . Fig. 
7(a) shows that even for correlated distribution, the test circuit can correctly estimate and . The error in the prediction of complete cumulative distribution is also small [ Fig. 7(b) ]. For on-chip measurement it is necessary that inter-die variation should have minimal impact on test circuit operation. To evaluate this, both inter-die shift (same for all the transistors) and local (random and correlated) variation (same at all inter-die corners) were applied to the transistors. The test circuit can correctly predict the mismatch at all inter-die corners (Fig. 8) . 
B. Application of Test Circuit
Along with the intrinsic fluctuations, neighboring devices are also expected to have geometric mismatches, e.g., channel length variation. Let us analyze the effectiveness of the test circuit in predicting the total random variation in process. This is useful to analyze whether estimated distribution can be used for process optimization and/or circuit simulation.
Estimation of Device Variation:
We have studied the effectiveness of the proposed circuit in measuring total device variation when both local geometry and threshold mismatches are present in a technology. We performed Monte Carlo simulations of the test circuit considering random $V_t$ and $L$ variation in each transistor. In the Monte Carlo simulation, a set of seven random $V_t$ and $L$ values (one for each transistor in the CLSA) represents one random instance of the test circuit. A large number (on the order of 1000) of such random instances were simulated, and the offset voltage for each case was estimated from the simulation. Hence, the Monte Carlo simulation provides a distribution of the offset voltage considering mismatch in both $V_t$ and $L$. Next, the estimated offset distribution was applied as $V_t$ variation to two identical transistors to obtain their current mismatch. The current mismatch thus obtained is referred to as the "estimated mismatch" in Fig. 9. We also directly applied the random $V_t$ and $L$ variation (with the same standard deviation as in the Monte Carlo simulation of the test circuit) to these two identical transistors and obtained their current mismatch. The current mismatch thus obtained is referred to as the "true mismatch" in Fig. 9. The estimated current mismatch is observed to closely follow the true current mismatch (Fig. 9). This is because the offset voltage depends not only on the $V_t$ mismatch but also on other local mismatches. Hence, the offset distribution can closely predict the total random mismatch in a process and is useful at the initial phase of technology development. Moreover, the obtained offset distribution can be used as the "$V_t$ mismatch" for circuit design and simulation.
Estimation of SRAM Variability: The application of the offset distribution is considered in predicting the characteristics of an SRAM cell under process variation. As in the previous case, $V_t$ and $L$ variations were applied to the transistors in the test circuit to estimate the offset distribution of DUTs of different widths, and this distribution was used as the $V_t$ distribution to obtain the cell characteristics. The distributions of the cell characteristics (namely, read current, read voltage, and trip-point) thus obtained are referred to as the "estimated distributions" in Fig. 10. Next, we applied the $V_t$ and $L$ variations directly to the SRAM transistors and re-obtained the distributions of these cell characteristics. The distributions thus obtained are referred to as the "true distributions" in Fig. 10. The estimated distribution of read current closely follows the true read current distribution obtained by applying both $V_t$ and $L$ variations directly to the cell transistors [Fig. 10(a)]. The variation in read voltage (i.e., the voltage to which the node storing "0" rises while reading) and trip voltage (the trip-point of the inverter associated with the node storing "1") can also be predicted with good accuracy [Fig. 10(b)]. This suggests that the measured offset voltage can be used for simulation and estimation of random variation effects in SRAM cell characteristics.
C. Verification Using Hardware Based Models
The functionality and effectiveness of the test circuit are also verified using industry-standard, well-characterized hardware-based models. First, the test circuit is optimized in 0.13 µm bulk CMOS technology. Monte Carlo simulations of the test circuit are performed using the process variation parameters internal to the technology model, which are calibrated against hardware. Along with RDF, other sources of mismatch (e.g., geometric mismatch, orientation-dependent mismatch, etc.) were also included in the simulation. The simulated offset voltage is then used as the $V_t$ distribution of the devices (as explained in Section IV-B) to estimate the mismatch in saturation current between two minimum-sized identical devices. The estimated mismatch closely follows its true value obtained by direct Monte Carlo simulation using the process variations internal to the technology [Fig. 11(a)].
We also verified the test circuit in a sub-90 nm SOI process through simulations using hardware-based models. In this case, intentional Gaussian variations in $V_t$ were applied. The test circuit successfully estimated the applied variation [Fig. 11(b)]. The verification using hardware-based models shows that the test circuit can be very useful in predicting local variation in both bulk CMOS and SOI technologies.
D. Analysis and Discussions
The effectiveness of the proposed design strongly depends on the following factors: 1) the number of test structures required; 2) the characterization time; and 3) the resolution of the offset voltage.
Number of Test Structures:
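Increasing the number of test structures will reduce the estimation error at the expense of higher test cost (larger area) and test time. To estimate the number of sense amplifiers required, assume that the estimated standard deviation of the $V_t$ mismatch is $\hat{\sigma}$ and its true value is $\sigma$. The confidence interval for $\sigma$ is given by [10]

$$\hat{\sigma}\sqrt{\frac{N-1}{\chi^{2}_{(1+c)/2,\,N-1}}} \;\le\; \sigma \;\le\; \hat{\sigma}\sqrt{\frac{N-1}{\chi^{2}_{(1-c)/2,\,N-1}}} \qquad (10)$$

where $c$ is the confidence level, $N$ is the total number of test structures, and $\chi^{2}_{p,\,N-1}$ is the inverse function (the $p$-quantile) of the chi-square distribution with $N-1$ degrees of freedom. From (10), we obtain

$$\sqrt{\frac{N-1}{\chi^{2}_{(1+c)/2,\,N-1}}} - 1 \;\le\; \frac{\sigma-\hat{\sigma}}{\hat{\sigma}} \;\le\; \sqrt{\frac{N-1}{\chi^{2}_{(1-c)/2,\,N-1}}} - 1. \qquad (11)$$

Fig. 12(a) shows the maximum percentage error in the estimated value of $\sigma$ for different numbers of test structures. From Fig. 12(a), it is estimated that 200 test structures are sufficient to measure mismatch within 10% error with a 95% confidence level.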
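The dependence of the worst-case sigma error on the number of test structures can be explored with a short script. This is a sketch under assumptions: it uses the standard chi-square confidence interval for a normal standard deviation, and it replaces the exact chi-square inverse with the Wilson-Hilferty approximation so that only the Python standard library is needed. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square inverse CDF."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * sqrt(a)) ** 3

def max_sigma_error(n_structures, confidence=0.95):
    """Largest relative deviation of true sigma from the estimate at the
    given confidence level (fraction, not percent)."""
    dof = n_structures - 1
    q_lo = chi2_quantile((1.0 - confidence) / 2.0, dof)
    q_hi = chi2_quantile((1.0 + confidence) / 2.0, dof)
    under = 1.0 - sqrt(dof / q_hi)   # true sigma below the estimate
    over = sqrt(dof / q_lo) - 1.0    # true sigma above the estimate
    return max(under, over)

if __name__ == "__main__":
    for n in (50, 100, 200, 500, 1000):
        print(n, round(100.0 * max_sigma_error(n), 1), "%")
```

With these assumptions, the bound for 200 structures at 95% confidence comes out on the order of 10%, and it tightens slowly (roughly as the inverse square root of the sample count) as more structures are added.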
Characterization Time: The time required to test CLSA is a major design/analysis parameter for the test structure. The expected value of the characterization time is given by (12) where is the time required for measurement of single step, is the number of steps required to measure a single CLSA, and is the number of test structures. It is obvious that total characterization will increase with an increase in the number of structures. It is interesting to note that the characterization time in the proposed circuit also depends on the variability in process. To understand this property, let us evaluate the expected number of steps required to characterize a test structure using the proposed method. Since the offset voltages of all the CLSAs are identical independent variables, their expected values are equal and can be obtained as (13) where is the cumulative distribution function for Normal distribution. Fig. 12(b) shows the variation of characterization time for different process variation and measurement resolution . A higher process variation and smaller increases characterization time. For reasonable values of process variations, the test time is calculated to be less than 100 +s (significantly smaller than conventional methods). The dependence of the characterization time on process variability is an important property of the proposed design. Since in this method we modify input step until we observe a change of state at the output, a larger mismatch between driver devices will require a larger number of voltage steps. Higher process variability implies a larger number of test structures will have driver devices with high mismatch and will require a higher number of voltage steps. Therefore, the total characterization time will increase with an increase in process variability. This is in contrast with the conventional mismatch characterization technique using measurements. The number of steps require for characterization is independent of the process variation. 
Therefore, the characterization time for conventional measurement is independent of the variability in process.
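The dependence of characterization time on process variability can be made concrete with a small calculation. The sketch assumes a zero-mean normal offset, a sweep that starts at zero and only moves upward, and one measurement per step, so the expected number of measurements is $1 + \sum_{k \ge 0} P(\text{offset} > k\,\delta)$ for step size $\delta$. This form and the numeric values are assumptions for illustration and are not claimed to match (13) term by term.

```python
from statistics import NormalDist

def expected_steps(sigma, step, max_terms=10000):
    """Expected measurements per CLSA: 1 + sum over k of P(offset > k*step)."""
    cdf = NormalDist(0.0, sigma).cdf
    total = 1.0                       # the measurement at zero always happens
    for k in range(max_terms):
        tail = 1.0 - cdf(k * step)    # probability that step k still fails
        total += tail
        if tail < 1e-12:
            break
    return total

def characterization_time(n_structures, sigma, step, t_step):
    """Total expected time: structures x expected steps x time per step."""
    return n_structures * expected_steps(sigma, step) * t_step

if __name__ == "__main__":
    for sigma in (0.01, 0.02, 0.04):
        print(sigma, expected_steps(sigma, step=0.01))
```

Both a larger sigma and a finer step increase the expected step count, which is the trend described for Fig. 12(b).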
Minimum Resolution of Input Voltage: Using a larger step size for increasing the input voltage difference $\Delta V$ (i.e., a coarser minimum resolution) is expected to increase the measurement error. By simulating the test circuit to estimate the offset voltage with different minimum resolutions, it was observed that a resolution of 10 mV can provide good estimation accuracy (10% error) (Fig. 13).
V. TEST CHIP AND MEASUREMENT RESULTS
A test chip is fabricated in 130 nm triple-well bulk CMOS technology through MOSIS services and measured to demonstrate the operation of the test structure. Fig. 14 shows the partial die photo of the test chip with the test structure. Fig. 15(a) shows the layout the local variability sensor, which contains two arrays each with 256 (16 16) CLSAs. Individual CLSAs in the structure are accessed using a 5-bit row and 4-bit column decoder. The 512 CLSAs are divided into eight groups each with 64 CLSAs. The groups are designed to have DUTs of different widths , different channel lengths , different s (regular and high ), and with rotated layout. All NMOS devices, except DUTs, are designed in the isolated p-well of the triple-well process. Digital nature of the test structure allowed software controlled automated measurement of local mismatch. Measurements are performed at , and clock period s. Fig. 15(b) shows the measured offset voltage for different CLSAs for a particular die are random and local in nature. As expected from the discussion in Sections II and IV, the spatial correlation was observed to be negligible. The randomness in the measured data is clearly larger for the groups with width (row 0-3) compared to the groups with higher widths (e.g., group with , row 12-15). Fig. 16 shows the spatial variation of mismatch values for minimum size devices from two dies. It can be observed that there is minimal die-to-die correlation between mismatch values at a given spatial location. Further, as expected from discussions in Sections II and IV, the within-die spatial correlation was also observed to negligible. Fig. 17(a) shows the mismatch distribution for minimum size devices. The distribution was observed to be close to Normal. Moreover, the difference in the offset, i.e., mismatch, distribution obtained from different dies was observed to be small. 
This suggests that the die-to-die variation has a weak impact on the measurement accuracy of the test structure, as predicted in Fig. 8. Fig. 17(b) shows the measured offset voltage values for a single die for DUTs with different widths. It can be clearly observed that the spread in the mismatch reduces for devices with larger width. Fig. 18 shows the measured offset voltages obtained from three different dies for DUTs with higher $V_t$ and longer channel lengths. The spread is larger for higher-$V_t$ devices and lower for longer-channel devices. This is because higher doping in the higher-$V_t$ devices tends to increase the random dopant fluctuation effect, resulting in higher mismatch [11]. On the other hand, a longer channel increases the channel area and reduces the short-channel effect [11]. Both of these effects reduce the random variability due to RDF. Further, a longer channel means that the highly doped "halo" regions near the junctions are shifted further away from each other, which is expected to reduce the channel doping. This could also result in a lower variation. Due to all these effects, $\sigma_{V_t}$ tends to reduce with channel length at a faster than square-root rate (the rate predicted from first-order analysis of RDF in [11]). Fig. 19 shows the measured standard deviation of mismatch for devices with different geometry and $V_t$. The measured standard deviation values from five different dies are close to each other, which re-emphasizes that the impact of chip-to-chip variation on the effectiveness and accuracy of the test circuit is very low. As observed in Fig. 17, the standard deviation of the mismatch is lower for larger widths. Moreover, the values for different widths tend to follow the characteristic trend expected for mismatch due to RDF [11]. However, the presence of area-independent mismatch is also observed; $\sigma_{V_t}$ reduces at a rate slower than $1/\sqrt{W}$. As expected from Fig. 18, the $\sigma$ of the mismatch is higher for the high-$V_t$ devices compared to the regular-$V_t$ devices, and lower for longer-channel devices. The devices with a different orientation (i.e., with 90° rotated layout) were observed to have similar variation to those with the same orientation.
Note that in the implementation of the test circuit, a separate set of latch transistors was used with each of the DUT pairs. This is a simple design, but it has a higher area overhead, as the latch FETs [and associated NAND gates in Fig. 3(a)] are repeated for each DUT pair. This can be avoided by using only one set of latch FETs (and associated NAND gates) and multiplexing the DUT pairs as illustrated in Fig. 20. The clock FET can also be distributed with each DUT pair. The DUT selector will select only one pair and its SAE signal. For the unselected DUT pairs, SAE will be turned off along with the two inputs (both set to "0"). This results in a two-transistor stack in the unselected path, which substantially reduces leakage through these paths. The leakage can be further reduced by using a small negative voltage for the SAE and the inputs of unselected DUTs. The number of DUTs that can be multiplexed will be determined by the leakage through the unselected paths. We think that as long as the current through the selected path is 100 times larger than the total leakage current through the unselected paths, the circuit will provide a good indication of the mismatch between the selected DUTs. For a technology with an $I_{on}/I_{off}$ ratio of 1000, and using the fact that a two-transistor stack has lower leakage compared to a single off device, 100 DUT pairs can be multiplexed. We believe multiplexing 64 DUT pairs is a good choice. Note that using a single set of latch FETs not only reduces the area, it also completely eliminates the effect of latch offset in the measurement. The latch offset adds equally to each measured offset value and provides only a shift in the mean of the measured offset distribution. It has a negligible effect on the standard deviation of the measured offset distribution.
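The multiplexing limit follows from simple arithmetic, sketched below. The on/off current ratio of 1000 and the 100x margin come from the text; the stack factor of 10 (how much a two-transistor stack reduces leakage relative to a single off device) is an assumed value, chosen here because it reproduces the quoted figure of 100 DUT pairs.

```python
def max_multiplexed_pairs(on_off_ratio, stack_factor, required_margin=100.0):
    """Largest N such that the selected-path current is at least
    `required_margin` times the total leakage of N unselected paths.

    Leakage per unselected path is modelled as I_off / stack_factor
    (assumption), so N <= on_off_ratio * stack_factor / required_margin.
    """
    return int(on_off_ratio * stack_factor / required_margin)

if __name__ == "__main__":
    print(max_multiplexed_pairs(1000, 10))   # with the assumed stack factor
    print(max_multiplexed_pairs(1000, 1))    # without any stack benefit
```

Choosing 64 pairs, as suggested in the text, leaves headroom below this limit.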
VI. CONCLUSION
Measurement and characterization of local variation are very important for robust circuit design and better manufacturing yield. In this paper, a sense-amplifier-based test structure for fast and accurate characterization of local random variation has been demonstrated. The presented test circuit essentially measures a digital signature of local variation, thereby eliminating the need for the analog measurements and complex data analysis involved in conventional mismatch characterization methods. The digital measurement technique makes the design of an on-chip built-in self-characterization scheme feasible. The effectiveness and accuracy of the test circuit are demonstrated through statistical simulations and measurement of a test chip. The simulation and measurement results show that the proposed test structure can extract local random mismatch in a process with very low test time and cost. The digital nature of the testing scheme is very useful for fast and accurate characterization of a process, which will facilitate technology development and help make pre-silicon design decisions to improve circuit robustness, resulting in better manufacturing yield in nanometer technologies.
ACKNOWLEDGMENT
Thanks to Dr. Keejong Kim, Purdue University, for helping with the integration of the design in a multi-project test chip and the preparation of the test board. Thanks also to MOSIS services for the fabrication of the test chip.
