Abstract-Recently, devices optimized for subthreshold operation have been introduced as potential building blocks for digital subthreshold logic circuits. However, these devices are expected to be strongly sensitive to process variations because the subthreshold drive current depends exponentially on the threshold voltage. In this paper, a yield optimization technique is proposed to suppress the variability of a device optimized for subthreshold operation. By using the technique, a process/device designer can optimize a transistor for subthreshold operation in terms of the total leakage current and intrinsic delay bounds. Sample devices are optimized for 90nm and 65nm technologies, and Monte Carlo simulations verify the accuracy of the technique.
I. INTRODUCTION
Subthreshold logic has emerged as a compelling approach to designing energy-efficient systems. It operates in the weak inversion region ($V_{dd} < V_{th}$), such that load capacitances are charged by the subthreshold leakage current. It is well known that the subthreshold current depends exponentially on the threshold voltage, $V_{th}$, which is strongly related to various device parameters; these in turn vary considerably in the Deep Sub-Micron (DSM) regime. Therefore, subthreshold designs are expected to be prone to process variations [1].
In the subthreshold design domain, recent analyses have addressed process variations. Zhai et al. [1] have reduced the impact of Random Dopant Fluctuation (RDF). Kwong and Chandrakasan [2] have determined how sizing affects the variability in the output logic swing, and have proposed a design framework for minimum energy subthreshold circuits. Melek et al. [3] have equalized mismatched pMOS and nMOS currents caused by $V_{th}$ variations. However, this previous research involves circuit and architecture level techniques, whereas strategies for optimizing a transistor for subthreshold operation under process variations are not considered.
Researchers have implemented subthreshold systems by using standard CMOS transistors [4], [5]. However, a device should be specifically optimized for subthreshold operation to improve its Power-Delay Product (PDP), as shown by Paul et al. [6]. Those authors have neither followed nor proposed an automatic framework to perform the device optimization process, and have not accounted for process variations. Consequently, this work provides an automatic device optimization technique that includes process variations and device yield in subthreshold transistor design.
II. DESIGN PARAMETERS AND VARIATIONS
This section covers the proposed device structure, constraints, and device parameters. It should be noted that the equations provide an understanding of why certain parameters are or are not included in the set of design variables for the device optimization problem. However, the technique itself is carried out with the MEDICI device simulator [7], which takes into account all device parameter dependencies, such as the $V_{th}$ roll-off and the dependence of $V_{th}$ on the flat-band voltage, $V_{fb}$.
A. Device Structure and Constraints
Scaled standard MOS transistors incorporate halo and retrograde doping profiles to mitigate Short-Channel Effects (SCE). However, in the subthreshold regime, the power supply is reduced, and SCE such as Drain-Induced Barrier Lowering (DIBL) and body punchthrough are minimal. As a result, to simplify the fabrication process and reduce junction capacitances, the subthreshold device in this work uses a uniform doping profile [6] in conjunction with a symmetrical bulk nMOS structure.
Since circuits in subthreshold logic are driven by the subthreshold leakage current, it is important to examine the major leakage components in DSM technologies: the subthreshold leakage, $I_{sub}$, the gate leakage, $I_{gate}$, and the reverse-biased junction Band-To-Band Tunneling, $I_{BTBT}$ [8].
$I_{sub}$ is the drive current in the subthreshold regime, $I_{on} = I_{sub}(V_{gs} = V_{ds} = V_{dd} < V_{th})$, and in the transistor off-state, $I_{off_{sub}} = I_{sub}(V_{gs} = 0, V_{ds} = V_{dd} < V_{th})$. This off-state current is the dominant contributor to static power consumption. Thus, $I_{off_{sub}}$ is taken into account when computing the total leakage current. $I_{sub}$ depends exponentially on $V_{th}$, as given in (1) [9], where $W/L$ is the width-over-length ratio of the device, and $n$ is the subthreshold swing coefficient.
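The exponential sensitivity described above can be illustrated numerically. The sketch below assumes only the standard exponential term of the subthreshold current, $I_{sub} \propto e^{(V_{gs} - V_{th})/(n v_T)}$ with thermal voltage $v_T = kT/q$; the function names and the default $n = 1.5$ are illustrative assumptions, not values taken from this work.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant divided by electron charge, V/K


def thermal_voltage(temp_k=300.0):
    """Thermal voltage v_T = kT/q in volts."""
    return K_B_OVER_Q * temp_k


def isub_ratio(delta_vth, n=1.5, temp_k=300.0):
    """Factor by which I_sub changes when V_th shifts by delta_vth (volts).

    Only the exponential term exp((V_gs - V_th) / (n * v_T)) is modelled;
    the prefactor is assumed to be unaffected by the shift.
    """
    return math.exp(-delta_vth / (n * thermal_voltage(temp_k)))


def swing_mv_per_decade(n=1.5, temp_k=300.0):
    """Gate-voltage change (mV) that changes I_sub by one decade."""
    return 1000.0 * n * thermal_voltage(temp_k) * math.log(10.0)


if __name__ == "__main__":
    # Hypothetical V_th shifts, in millivolts.
    for shift_mv in (-30, -10, 10, 30):
        print(f"dVth = {shift_mv:+d} mV -> I_sub x {isub_ratio(shift_mv / 1000.0):.3f}")
    print(f"swing = {swing_mv_per_decade():.1f} mV/decade")
```

A positive $V_{th}$ shift reduces both $I_{on}$ and $I_{off_{sub}}$ by the same factor, which is why the same variation tightens one constraint while loosening the other.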
$I_{gate}$ is a byproduct of scaling the oxide thickness, $T_{ox}$, which is required to overcome the $V_{th}$ roll-off in scaled technologies. This current is an exponential function of $T_{ox}$, $I_{gate} \propto e^{-T_{ox}}$. Because the tunneling current increases exponentially as $T_{ox}$ decreases, the gate tunneling leakage cannot be neglected, and $I_{gate}$ is included in the total leakage current.
The BTBT leakage is mainly present in scaled CMOS devices with halo and retrograde doping profiles [8], which are not incorporated in the subthreshold structure. Moreover, as $V_{dd}$ is reduced, the BTBT leakage becomes negligible [6]; it is therefore ignored.
Therefore, the total leakage, $TL$, is the power constraint, and is computed as

$$TL = I_{off_{sub}} + I_{gate} \quad (2)$$

The primary metric for transistor speed is the transistor intrinsic delay, $\tau$, defined as

$$\tau = \frac{C_g V_{dd}}{I_{on}} \quad (3)$$

where $C_g$ is the gate capacitance per micron of transistor width, and $I_{on}$ is the drive current per micron. $\tau$ is the time required for a MOSFET to charge or discharge the gate of another identical MOSFET, and constitutes the performance constraint in the optimization problem.
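As a minimal sketch of how the two constraints can be evaluated, the code below assumes that the total leakage is the sum of the off-state subthreshold and gate leakage components, and that the intrinsic delay is $C_g V_{dd} / I_{on}$. The function names and all numerical values are hypothetical and serve only to illustrate orders of magnitude.

```python
def total_leakage(i_off_sub, i_gate):
    """Total leakage TL (A/um): off-state subthreshold plus gate leakage."""
    return i_off_sub + i_gate


def intrinsic_delay(c_g, v_dd, i_on):
    """Intrinsic delay tau (s) for C_g in F/um, V_dd in V, I_on in A/um."""
    return c_g * v_dd / i_on


def meets_constraints(tl, tau, tl_max, tau_max):
    """True if a device satisfies both the power and the performance bound."""
    return tl <= tl_max and tau <= tau_max


if __name__ == "__main__":
    # Hypothetical device values, not taken from the optimized devices.
    c_g = 1.0e-15        # F/um
    v_dd = 0.3           # V
    i_on = 2.0e-6        # A/um
    i_off_sub = 2.0e-9   # A/um
    i_gate = 0.1e-9      # A/um

    tl = total_leakage(i_off_sub, i_gate)
    tau = intrinsic_delay(c_g, v_dd, i_on)
    print(f"TL  = {tl * 1e9:.2f} nA/um")
    print(f"tau = {tau * 1e12:.1f} ps")
    print("feasible:", meets_constraints(tl, tau, tl_max=5.0e-9, tau_max=300e-12))
```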
B. Device Parameters and Variations
To establish a set of device design parameters for the optimization problem, a starting point is the exponential dependence between $I_{sub}$ and $V_{th}$ in (1). Variations in $V_{th}$ impact both $I_{off_{sub}}$ and the drive current $I_{on}$. In turn, any variation in $I_{on}$ is reflected in $\tau$ as well. Thus, the impact of device parameters on $V_{th}$ should be considered.
1) Physical Gate Length (Lg):
The $V_{th}$ of a short-channel device decreases as $L_g$ is reduced, due to the closer proximity of the source and drain areas, whose surrounding depletion regions penetrate into the channel, as illustrated in Figure 1 [9]. $W_{dep}$ is the depletion-layer width, $L$ is the channel length, and $L'$ is the reduced channel region. Consequently, less charge, $Q_B \propto W_{dep} \times (L + L')/2$, must be inverted to reach the $V_{th}$, which is defined as follows [9]:
$\psi_s$ is the surface potential, and $Q_B$ is the total gate depletion (trapezoidal) charge. The shift in $V_{th}$ caused by channel length scaling, $\Delta V_{th}$, is approximated as [10]
$V_{bi}$ is the built-in potential, and $l$ is the characteristic length defined as

$$l = \sqrt{\frac{\varepsilon_{Si} T_{ox} W_{dep}}{\varepsilon_{ox} \eta}} \quad (6)$$

where $\varepsilon_{Si}$ and $\varepsilon_{ox}$ are the silicon and oxide permittivities, respectively, and $W_{dep}/\eta$ is the average depletion layer width along the channel. This approximation defines an SCE known as the $V_{th}$ roll-off. As a result, any variation in $L_g$ impacts $I_{sub}$, and hence, the performance and power consumption [9]. Thus, $L_g$ is considered in the design problem.
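The dependence of the characteristic length on the oxide thickness can be sketched as follows, assuming the square-root form $l = \sqrt{\varepsilon_{Si} T_{ox} W_{dep} / (\varepsilon_{ox} \eta)}$. The relative permittivities (11.7 for silicon, 3.9 for silicon dioxide) are standard values; the depletion width and the fitting parameter $\eta$ used in the example are hypothetical.

```python
import math

EPS_0 = 8.854e-12        # vacuum permittivity, F/m
EPS_SI = 11.7 * EPS_0    # silicon permittivity
EPS_OX = 3.9 * EPS_0     # silicon dioxide permittivity


def characteristic_length(t_ox, w_dep, eta=1.0):
    """Characteristic length l (m) for T_ox and W_dep given in metres."""
    return math.sqrt(EPS_SI * t_ox * w_dep / (EPS_OX * eta))


if __name__ == "__main__":
    w_dep = 30e-9  # hypothetical depletion width, m
    for t_ox_nm in (1.4, 2.3):
        l = characteristic_length(t_ox_nm * 1e-9, w_dep)
        print(f"Tox = {t_ox_nm} nm -> l = {l * 1e9:.1f} nm")
```

A thinner oxide gives a shorter characteristic length, which is the mechanism by which reducing $T_{ox}$ counteracts the $V_{th}$ roll-off.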
2) Oxide Thickness (Tox):
Variations in $T_{ox}$ impact the oxide capacitance per unit area, $C_{ox}$, as given by $C_{ox} = \varepsilon_{ox}/T_{ox}$. Any variation in $C_{ox}$ affects $V_{th}$ in (4) and $I_{sub}$ in (1). Moreover, the SCE is affected by $T_{ox}$, as given in (6); thus, a thinner oxide is needed to overcome the $V_{th}$ roll-off. However, this $T_{ox}$ reduction increases the gate leakage current exponentially. Hence, careful engineering of the $T_{ox}$ dimension is crucial to meet the constraints. As a consequence, $T_{ox}$ is considered in the design problem.
3) Channel Doping ($N_{ch}$):
As CMOS devices are scaled further into the DSM regime, variations in the number and placement of dopant atoms in the channel region cause random variations in $V_{th}$. A simple first-order model of the $V_{th}$ standard deviation, $\sigma_{V_{th}}$, due to RDF is given by [9]
It is observed that reducing $N_{ch}$ also reduces the $V_{th}$ variation. However, $V_{th}$ itself drops when $N_{ch}$ is reduced, which increases the $I_{off_{sub}}$ current. For these reasons, $N_{ch}$ is also included in the problem.
4) Junction Depth (Yj):
Aggressive device scaling has necessitated shallow junction depths to reduce the SCE and suppress the depletion layer penetration into the channel. The result is an increase in parasitic device resistance and a more complex fabrication process. Consequently, any adjustment of $Y_j$ is limited by the sheet resistance, which should be kept low to achieve sufficient current drivability. As a result, $Y_j$ is not included as a design parameter in the optimization problem.
5) Transistor Width (W ):
The transistor width is the principal parameter that circuit designers can change to meet the required specifications of circuits and systems. For this reason, W is not included as part of the device optimization problem.
In summary, the design variables of the problem are $T_{ox}$, $L_g$, and $N_{ch}$, since their variations directly impact the constraints, $TL$ and $\tau$.
III. PROBLEM STATEMENT AND YIELD MAXIMIZATION TECHNIQUE
This section begins with a general explanation of the idea behind the yield maximization process, and subsequently, the problem is formalized. Finally, the implementation of the technique is explained.
A. Qualitative Approach
For clarity, a problem with only two design variables is shown in Figure 2. Any point in this plane represents a device whose dimensions correspond to the respective pair ($T_{ox}$, $L_g$). Any device $x_i$ between the $TL_{max}$ and $\tau_{max}$ curves, inside the shaded region (the feasible region, $F_c$) in Figure 2, satisfies both constraints. The yield maximization problem is reduced to inscribing a rectangle, formed by four corner devices, in $F_c$. The center of the maximum yield rectangle ($x_c$) represents a device with the set of design values most immune to process variations. Finally, Monte Carlo (MC) simulations verify the yield of the optimal design ($x_c$), which is defined as the percentage of the total devices (scattered points) whose $\tau$ and $TL$ values fall within $F_c$.
B. Formal Problem Statement
To formalize this idea, the feasible region is identified from (8) as a 3-D space in which any device $x_i = (L_{g_i}, T_{ox_i}, N_{ch_i})$ satisfies the constraints. The feasible region, $F_c$, is formulated by

$$F_c = \{\, x_i \mid TL(x_i) \le TL_{max},\ \tau(x_i) \le \tau_{max} \,\} \quad (8)$$

In addition to $F_c$, a cube should be formed and inscribed in (8). The cube is defined as follows:

$$Cube = \{\, x \mid x_l \le x \le x_u \,\} \quad (9)$$

where $x_l$ and $x_u$ are the coordinates of the extreme corners.
In this work, the variation of each design parameter is considered to be independent, and its distribution is assumed to be Gaussian. Since the Gaussian distribution does not have a closed-form integral (Cumulative Distribution Function, CDF), which is needed for the yield evaluation, Kumaraswamy's double-bounded density function is adopted instead. This model has a simple closed form for both the PDF and the CDF, and is described in [11].
Based on these definitions, the yield problem is defined as

$$\max_{x_l,\, x_u} \ Yield \quad \text{subject to} \quad Cube \subseteq F_c \quad (10)$$

Note that the symmetry assumed for the design variable distributions makes it easy to locate the final device, which lies at the center of the cube, by computing $x_c = (x_l + x_u)/2$.
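The closed-form yield evaluation can be sketched as follows. Kumaraswamy's distribution on $[0, 1]$ has the CDF $F(z) = 1 - (1 - z^a)^b$, so for independent parameters the probability mass inside an axis-aligned cube is a product of CDF differences. The shape parameters `a` and `b`, the support intervals, and the function names below are hypothetical placeholders; in practice the shapes would have to be fitted to the spread of each design parameter.

```python
def kumaraswamy_cdf(x, a, b, lo=0.0, hi=1.0):
    """Kumaraswamy CDF on the support [lo, hi]: F = 1 - (1 - z**a)**b."""
    z = (x - lo) / (hi - lo)
    z = min(max(z, 0.0), 1.0)  # clamp values outside the support
    return 1.0 - (1.0 - z ** a) ** b


def box_yield(x_l, x_u, supports, a=2.0, b=2.0):
    """Probability that every independent parameter falls inside [x_l, x_u].

    supports holds one (lo, hi) interval per parameter; a and b are
    placeholder shape values shared by all parameters.
    """
    total = 1.0
    for low, high, (s_lo, s_hi) in zip(x_l, x_u, supports):
        total *= (kumaraswamy_cdf(high, a, b, s_lo, s_hi)
                  - kumaraswamy_cdf(low, a, b, s_lo, s_hi))
    return total


if __name__ == "__main__":
    # Hypothetical supports for (Lg in nm, Tox in nm, Nch in 1e18 cm^-3).
    supports = [(75.0, 95.0), (2.2, 2.6), (0.8, 1.2)]
    x_l = (80.0, 2.3, 0.9)
    x_u = (90.0, 2.5, 1.1)
    x_c = tuple((low + high) / 2.0 for low, high in zip(x_l, x_u))
    print("center device:", x_c)
    print("box yield:", box_yield(x_l, x_u, supports))
```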
C. Implementation
To solve the constrained non-linear optimization problem effectively, the Sequential Quadratic Programming (SQP) algorithm of MATLAB is used [11]. The variances of the design parameters and the two constraints are given to the optimization engine. The engine attempts to find a cube in $F_c$ that maximizes the yield. Consequently, in each iteration, a set of design values is determined by the engine and directly evaluated by MEDICI. The containment of the cube in $F_c$ is verified by checking the worst cases, in which each element of $x$ attains its extreme values. These cases are formed by the corners of the cube.
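The worst-case containment check can be sketched as an enumeration of the cube corners. The function `toy_device` below is a hypothetical placeholder response surface that stands in for a device simulator call; it is not a physical model and not the MEDICI interface. Checking only the corners is sufficient when $TL$ and $\tau$ vary monotonically across the cube, which is assumed here.

```python
from itertools import product


def toy_device(x):
    """Placeholder (non-physical) model: returns (TL in nA/um, tau in ps)."""
    lg_nm, tox_nm, nch = x
    tl = 2.2 * (85.0 / lg_nm) * (2.4 / tox_nm) / nch
    tau = 161.0 * (lg_nm / 85.0) * (tox_nm / 2.4) * nch
    return tl, tau


def cube_corners(x_l, x_u):
    """All 2**n corners of the axis-aligned cube spanned by x_l and x_u."""
    return list(product(*zip(x_l, x_u)))


def cube_is_feasible(x_l, x_u, evaluate, tl_max, tau_max):
    """True if every corner device satisfies both constraints."""
    for corner in cube_corners(x_l, x_u):
        tl, tau = evaluate(corner)
        if tl > tl_max or tau > tau_max:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical cube in (Lg in nm, Tox in nm, Nch in 1e18 cm^-3).
    x_l = (80.0, 2.3, 0.9)
    x_u = (90.0, 2.5, 1.1)
    print("corners:", len(cube_corners(x_l, x_u)))
    print("feasible:", cube_is_feasible(x_l, x_u, toy_device, 5.0, 300.0))
```

In an optimization loop, a check of this kind would act as the constraint function, while the objective would be the yield of the candidate cube.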
IV. RESULTS AND DISCUSSION
Sample devices for 90nm technology are designed to balance speed and power. $3\sigma_{T_{ox}}$ and $3\sigma_{L_g}$ are chosen as 4% × 1.5nm and 12% × 90nm, respectively [13]. In addition, a 65nm transistor is designed to see how the design parameters scale. In this case, the $3\sigma$ values are 4% × 1.2nm and 12% × 65nm for $T_{ox}$ and $L_g$, respectively. $3\sigma_{N_{ch}}$ is equal to 10% of its center value at each iteration for both technologies.

Table I lists the defined bounds of the sample devices. The $TL$ values are selected to be close to that of the low operating power (LOP) device for 90nm technology, defined by the ITRS [13]. A realistic $I_{on}/I_{off}$ ratio for subthreshold devices is approximately 1,000 [6], whereas for superthreshold transistors this ratio is approximately 100,000, with a $\tau$ value of about 1.5ps (again, for LOP devices and 90nm technology). Since $\tau$ is a function of $I_{on}$ (3), the intrinsic delay of subthreshold devices is expected to be around one hundred times the $\tau$ of LOP 90nm superthreshold transistors. For the 65nm technology, the constraints are estimated according to the expected five-fold $I_{off}$ increase for each generation, and to achieve at least a 30% delay improvement.

Table II summarizes the optimum device parameters ($x_c$) and specifications obtained from the technique. Unfortunately, there are no industrial subthreshold devices with which to compare the results; instead, the standard CMOS 90nm and 65nm technologies in the literature and the guidelines provided by the ITRS are used as references. To maintain adequate control over the gate leakage current, the $T_{ox}$ values exhibit a constant trend of 2.3nm for the 90nm technology devices. This value is similar to the $T_{ox}$ offered for low standby power 90nm technology devices, proposed in [14] ($T_{ox}$ = 2.2nm). The $T_{ox}$ value of the $65_{150,12.5}$ device corresponds to that of the general purpose 65nm technology device suggested in [15] ($T_{ox}$ = 1.4nm).
The $I_{off}$ values are in good agreement with those proposed for LOP devices by the ITRS (3nA/μm and 5nA/μm for 90nm and 65nm technologies, respectively). The $\tau$ values are about one hundred times the intrinsic delay of the LOP superthreshold transistors, as expected. It is noteworthy that process precision does not allow the $T_{ox}$ and $L_g$ parameters to be optimized continuously. To cope with this issue, once the optimum device parameters are obtained, they are rounded to the achievable process values, and the remaining parameter ($N_{ch}$ here) is then re-optimized with $T_{ox}$ and $L_g$ held fixed. For example, consider the designed $90_{300,5.0}$ device, assuming a precision level of 0.1nm for $T_{ox}$ and 1nm for $L_g$. $T_{ox}$ then remains at 2.4nm and $L_g$ at 85nm in the re-optimization process. The optimized $N_{ch}$ changes from $1.0 \times 10^{18}$ cm$^{-3}$ to $0.90 \times 10^{18}$ cm$^{-3}$, leading to a negligible yield reduction from 98.9% to 98.1%. Table III lists the worst cases of the leakage components from Section II-A. As can be seen, $I_{off_{sub}}$ contributes the most current to the static power consumption in the optimized subthreshold devices.
To verify the variability robustness and constraint satisfaction of the newly optimized subthreshold transistors, MC simulations (5,000 points) are performed based on the constraints in Table I. Figures 3 and 4 show the mean and standard deviation of the $TL$ and $\tau$ of the optimum devices, respectively.
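A Monte Carlo yield estimate of this kind can be sketched as follows. The sample count and the $3\sigma$ spreads for the 90nm case follow the values stated in this section, but `toy_device` is a hypothetical placeholder response surface used in place of a device simulator, so the printed yield has no physical meaning.

```python
import random


def toy_device(lg_nm, tox_nm, nch):
    """Placeholder (non-physical) model: returns (TL in nA/um, tau in ps)."""
    tl = 2.2 * (85.0 / lg_nm) * (2.4 / tox_nm) / nch
    tau = 161.0 * (lg_nm / 85.0) * (tox_nm / 2.4) * nch
    return tl, tau


def monte_carlo_yield(center, sigmas, tl_max, tau_max, n_points=5000, seed=0):
    """Fraction of Gaussian-perturbed devices that meet both constraints."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_points):
        sample = [rng.gauss(mu, sd) for mu, sd in zip(center, sigmas)]
        tl, tau = toy_device(*sample)
        if tl <= tl_max and tau <= tau_max:
            passed += 1
    return passed / n_points


if __name__ == "__main__":
    # Center device in (Lg in nm, Tox in nm, Nch in 1e18 cm^-3).
    center = (85.0, 2.4, 1.0)
    sigmas = (
        0.12 * 90.0 / 3.0,       # 3-sigma of Lg is 12% of 90 nm
        0.04 * 1.5 / 3.0,        # 3-sigma of Tox is 4% of 1.5 nm
        0.10 * center[2] / 3.0,  # 3-sigma of Nch is 10% of its center value
    )
    y = monte_carlo_yield(center, sigmas, tl_max=5.0, tau_max=300.0)
    print(f"estimated yield: {100.0 * y:.1f}%")
```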
In Figure 3 , it is noted that the T Lmean of the 90nm design with the lowest delay bound (90200,5.0) is greater than the mean of 90nm designs with constraint τmax = 300ps. This is evident since the fulfillment of the lower delay bounds requires more drive current, and is thus, intrinsically, an increment of I of f . For the 65nm device, there is an approximate five-fold increment of leakage current. A reduction of τmean is expected, since the devices are bounded with higher T L limits as depicted in Figure 4 . For the 65nm design, performance is improved in scaling technologies. For a closer picture of the effect of the constraints, devices 90300,5.0 and 90300,2.5 are selected to compare the delay and offstate leakage current dispersion. The spread of T L vs. τ is depicted in Figures 5 and 6 , respectively. In these figures, all the devices that are inside the quadrant τmax and T Lmax represent the success of the transistors in meeting both constraints. It can be seen in Figure 6 that as the T Lmax constraint is reduced, many devices violate this constraint as expected. Also there is an increase of the devices which violate τmax, even though the performance constraint is constant for both devices. This occurs since the optimization process finds a center device, x c , to meet a lower value of T Lmax, and therefore, its intrinsic delay τ is increased. Since the optimum device has a greater value for τ , when the variations are incorporated, the devices also start to violate the 
τmax.
V. CONCLUSION
This paper shows a strategy for process/device designers to optimize transistors for digital subthreshold operation. By finding appropriate values of $L_g$, $T_{ox}$, and $N_{ch}$, an optimized MOS device can satisfy the constraints while taking into account variations in these device parameters. Devices for 90nm are optimized for specific applications. The $90_{300,5.0}$ device is appropriate for general applications, to construct subthreshold circuits with a balance between the total leakage power and delay: $I_{off}$ = 2.2nA/μm, $\tau$ = 161ps. The $90_{300,2.5}$ device is a good fit for low-power subthreshold designs, with $I_{off}$ = 1.6nA/μm at the cost of speed, $\tau$ = 210ps. Finally, for high-speed cases, the $90_{200,5.0}$ device provides an intrinsic delay $\tau$ = 129ps with an increase in $I_{off}$ to 2.9nA/μm.
