Abstract-The supply voltage ( dd ) and threshold voltage ( th ) are two significant design variables that directly impact the performance and power consumption of circuits. The scaling of these voltages has become a popular option to satisfy performance and low power requirements. Subthreshold operation is a compelling approach for energy-constrained applications where processor speed is less important. However, subthreshold designs show dramatically increased sensitivity to process variations due to the exponential relationship of subthreshold drive current with th variation and drastically growing leakage power. If there is uncertainty in the value of the threshold or supply voltage, the power advantages of this very low-voltage operation diminishes. This paper presents a statistical methodology for choosing the optimum dd and th under manufacturing uncertainties and different operating conditions to minimize energy for a given frequency in subthreshold operation while ensuring yield maximality. Unlike the traditional energy optimization, to find the optimal values for the voltages, we have considered the following factors to make the optimization technique more acceptable: the application-dependent design constraints, variations in the design variables due to manufacturing uncertainty, device sizing, activity factor of the circuit, and power reduction techniques. To maximize the yield, a two-level optimization is employed. First, the design metric is carefully chosen and deterministically optimized to the optimum point in the feasible region. At the second level, a tolerance box is moved over the design space to find the best location in order to maximize the yield. The feasible region, which is application dependent, is constrained by the minimum performance and the maximum ratio of leakage to total power in the dd -th plane. The center of the tolerance box provides the nominal design values for dd and th such that the design has a maximum immunity to the variations and maximizes the yield. The yield is estimated directly using the joint cumulative distribution function over the tolerance box requiring no numerical integration and saving considerable computational complexity for multidimensional problems. The optimal designs, verified by Monte Carlo and SPECTRE simulations, demonstrate significant increase in yield. By using this methodology, yield is found to be strongly dependent on the design metrics, circuit switching activity, transistor sizing, and the given constraints.
I. INTRODUCTION
A S very large-scale integration (VLSI) technology is scaled down toward the nanoscale regime, drastically growing leakage power and variations in device characteristics have motivated a significant investigation into the optimum supply and threshold voltage design for minimizing energy or power for a given performance constraint. Although energy dissipation has improved with each new technology node, as systems on chip are integrating tens of millions of devices on-chip following Moore's law, the energy expended per operation has become a critical issue in analog and digital integrated circuit design. Subthreshold operation is emerging as an energy-saving approach that involves scaling voltages below the device thresholds [1] . In this region, the energy per operation can be reduced by an order of magnitude compared to conventional operation, at the cost of circuit performance. In order to satisfy the performance requirements demanded by the applications, the threshold voltage of the device should also be lowered to have both low power operation and high performance. However, there is a cost of higher static power dissipation due to large leakage currents [2] .
It is well known that subthreshold designs have dramatically increased sensitivity to process variations since drive current becomes exponentially dependent on threshold voltage [3] . Process variations and manufacturing uncertainties cause design variables to deviate from their nominal values. Moreover, the manufacturing process of nanoscale transistors and structures has introduced several new sources of variation that have made the control of process variation more difficult [4] . Thus the robustness of the design has come up with enormous challenges that must be resolved by the designers [5] .
The variations in may arise from nonuniformity in the distribution of the power supply and for different switching activity of the circuit. With leakage power becoming a significant contributor to total power dissipation, leakage current flowing through the grid can result in significant supply-voltage drops. The increase in the current density with the scaling in the technology and increase in rate of switching make it more challenging to retain the traditional bound on the supply-voltage noise [6] . Device threshold voltage is dependent on a number of process parameters such as channel doping concentration and gate length. As the critical dimension is scaled down, the number of dopant atoms becomes less and hence small variations in their number and position result in a large variation in device performance. Moreover, variations in the doping level cause variations in the base transit time in bipolar junction transistors [7] , further increasing the variability in mixed signal design. Increased variability and lower values can result in functional failures in dynamic logic design. A number of successful subthreshold designs have been presented in the literature [1] , [3] . Using subthreshold design, it is expected that energy efficiency in the range of 1 pJ/instruction can be achieved [8] . In addition, wide-range dynamic voltage scaling has been proposed where the processor can scale from high-performance superthreshold operation to ultra-low-power subthreshold operation depending on workload [9] .
In previous work, a minimum energy voltage for complementary metal-oxide-semiconductor (CMOS) subthreshold region was demonstrated [9] , [10] . However, the proposed analysis did not account for the impact of process variation. It has been shown that minimum-size devices are theoretically optimal for minimizing energy in subthreshold operation [11] . But minimum-size devices have increased sensitivity to variations. Therefore, variability must be considered when analyzing the minimum energy operating point. Zahi et al. addressed intradie variation by providing statistical models for energy and delay of an inverter chain in subthreshold [12] . Calhoun et al. derived analytical expressions for optimum to minimize energy in subthreshold operations and its dependence on design characteristics and operating conditions for a given technology [13] . However, functional yield was not considered until [14] .
Gonzalez et al. considered process variation for minimizing energy-delay product (EDP) [15] . Analytical expressions for the optimum point to minimize power at a given performance are shown for transregional models based on fitted [16] or physical [17] parameters. Measurements of a test chip with adaptive and adaptive body bias demonstrate the minimum power point for a given performance, but they also show forward-bias diode currents can make the theoretical optimum unreachable [18] . Expressions of the sensitivities of energy and delay to different parameters give a guideline for optimum energy circuits [19] . Optimizing subthreshold circuits has not been studied enough. Inclusions of uncertainty in subthreshold region of operation do provide a guideline for designing under variations. Nevertheless, each design needs to satisfy constraints that are application-dependent. Therefore, an unconstrained optimum point, suggested by this literature, does not satisfy the constraint and thus seems to be futile. This paper describes a novel simultaneous optimization of supply and threshold voltage scaling for subthreshold circuits in the leakage dominant era. Design optimization will be done in two different steps: finding the design variable that optimizes the objective function of the product and finding a tolerance region that provides the most yields.
To accomplish this, a design-specific feasible region is defined by constraints on minimum performance, maximum achievable performance, and numerically solving the ratio of leakage to total power in the -plane. However, due to manufacturing imperfections and technology shifting, it may not be possible to realize the nominal design value exactly. Therefore, design variables and are assumed to be random whose probability distribution may be known. A tolerance box that represents the variations in the voltages is moved over the design space. It attempts to place the tolerance box in such a way that the portions of the box with higher yield lie in the feasible region. The final location of the box and its center provides the nominal design values for and such that the design has a maximum immunity to the variations. This center maximizes parametric yield and optimizes energy for application-specific designs.
II. APPLICATION AREAS
Swanson et al. predicted ultralow voltage CMOS logic operating at a supply voltage of mV at room temperature and derived the fundamental limits of voltage scaling [20] . This was a key result for low-voltage logic design, which acts as a catalyst for many low-power applications. Subthreshold circuits have been used quite extensively in analog design [21] but not in digital domain. Subthreshold operations are suitable for specific applications, which do not need high performance but require extremely low power consumption such as hearing aids, pacemakers [22] , wearable computing [23] , and self-powered devices [24] . Emerging ultra-low-power application such as distributed sensor network is a natural fit with subthreshold circuits [13] . Subthreshold circuits can also be applied to applications where circuits remain idle for an extended period of time. This type of application appears in almost every design, including the high-performance microprocessor. So the recent explosion in applications that benefit from low-energy operation has created two classes of applications for which subthreshold circuits are well suited. The first is severely energy-constrained systems. The second class is called burst-mode applications that require high performance for part of the time but spend significant fractions of their operation doing non-performance-critical tasks.
III. SUBTHRESHOLD OPERATION
There are three different sources of power dissipation in CMOS circuits: the dynamic power, the static power, and the short-circuit power. The dynamic power results from switching capacitive loads between different voltage levels, whereas the static power is mainly due to subthreshold leakage between power supply and ground. The short-circuit power is dissipated during a switching event when there is a direct path from supply to the ground. In sub-100 nm technologies, there is also tunneling through the gate oxide due to reduced oxide thickness [25] . The contribution of power from gate leakage is significant when circuits remain idle for an extended period of time, i.e., at standby operation. For this, it can be included in the static power. Similarly, the short-circuit power can be modeled by the dynamic power equation, as it occurs only when the circuit is switching. Therefore, the total power can be written as a summation of these two powers as
For a CMOS gate, the dynamic power is [15] ( 2) where is the activity factor of the output node, is the total capacitance of the output node, is the supply voltage, and is the operating frequency. If the circuit performs one operation per cycle, then the energy per operation is [15] (3)
For a complex chip, the total dynamic power is simply the sum of the dynamic power of all gates. To determine the delay of an inverter, we use the alpha power model [26] (4) and the maximum operating frequency of the chip is given by (5) where is the logic depth and is the drain current. The subthreshold current for in the subthreshold region is [15] ( 6) where is the subthreshold slope factor, is , and is the zero threshold leakage current. The thin-oxide leakage of a single gate can be modeled as [27] ( 7) where (8) where is the current density, is the area of the gate, and are the technology constants, and is the oxide thickness. The power consumption of a chip can be represented as (9) where is the total number of gate of the circuit. Gate tunneling current has a strong dependence on the voltage across the gate, so it decreases with supply voltage much more quickly than subthreshold current. As a result, gate leakage become negligible in subthreshold region except when is high and is very low, which is not a good choice from subthreshold design perspective [28] . So from a realistic assumption, we have considered subthreshold leakage only in the leakage power term while plotting Low-power consumption in high-performance circuits is highly desirable. Scaling of is the most effective way to decrease power consumption since CMOS dynamic power quadratically depends on but degrades the performance of the circuit as well. This performance degradation can be compensated by decreasing , but then the subthreshold leakage power increases exponentially. Therefore, there is an optimum set of and that ensures the low-power and high-performance circuit design [29] , [30] . Given that there is a tradeoff between power and performance, we investigate their combined effect with recent trend in CMOS scaling in the -plane. Different metrics have been used to study supply and threshold voltage scaling such as EDP, energy or power-delay product, and power-energy product (PEP) [25] . The EDP metric gives a higher priority to performance than power. It is more suitable when performance is the primary concern. To prioritize power, PEP is a useful metric to be used as an objective function in optimizing power consumption and quality of a design. In this metric, power has a higher geometric weighting than delay and thus produces a lower power solution than the other two metrics. The energy metric gives balanced weighting to both. In this yield optimization for subthreshold design, to have both low-power operation and satisfactory performance, we choose PDP as our design metric with and to be two design variables to make the optimization technique more acceptable.
IV. PROBLEM FORMULATION
The computation of parametric yield is based on constructing a feasible region of operation where yield is defined as the percentage of chips that satisfy the constraints of the design metrics. The method uses the aspects of advanced first-order second-moment (AFOSM) reliability method in probabilistic design to find a linearized feasible region. The frequency lines determine two boundaries of the feasible region, while the power ratio curve determines the other boundary. Fig. 1 shows the variable activity factor characterization circuit used in this analysis. The circuit consists of an 11-stage two-input NAND ring oscillator with nine additional 11-stage NAND delay chains driven by the ring oscillator. This circuit is also used to model the pipeline delays for current microprocessor design. Fig. 2 shows the minimum energy point and contribution of active and leakage power to total energy in a ring oscillator circuit. Most circuits must meet specific performance targets. [13] . This figure also indicates that the minimum energy point occurs in subthreshold region. and leakage energy in the characterization circuit as a function of with originally fixed at 500 mV and an activity of one. Mathematically, the minimum energy point occurs where the slopes of the leakage energy and the active energy are equal in magnitude and opposite in sign. The figure shows that the minimum energy point occurs in the subthreshold region at mV. Markers indicate SPECTRE simulation results in 90 nm technology, and lines indicate analytical results. To choose different for different simulations, the device model was modified and new models were generated. As decreases, the subthreshold current increases exponentially, which is shown by the rise in leakage energy above . When decreases too far, then , and so subthreshold operation becomes invalid. From the figure, we see that leakage energy exceeds the dynamic energy for extremely low in this ring oscillator example. Of course, the advantage of lowering is increased performance in the subthreshold region for the same energy per operation, but it also increases the leakage energy dissipation, which puts a limit on the scaling of . Fig. 3 shows the effect of activity factor on the optimal and for minimum energy operation. Markers indicate SPECTRE simulation results in 90 nm technology, and lines indicate analytical results. If we vary the workload, the active energy decreases in proportion to the decrease in workload, but the leakage energy remains constant. So with the increase in workload, the minimum energy of operation increases and optimal supply voltage moves to the lower supply voltage, which leads to increase in variability and decrease in yield. The reduced gate performance is one factor that is keeping supply voltages from scaling down too quickly. Therefore, submicrometer technologies with low threshold voltages should be in demand for low-power applications. By simply moving to a low process, a designer could reduce the supply voltage and power without requiring a major change of the design, as the performance remains constant. But the irony arises from increasing leakage power in presence of process and operating point variation. Fig. 4 is obtained by numerically solving the ratio of leakage power to total power. The feasible region is constructed by the following constraints: the maximum achievable performance within the subthreshold design, the performance at the minimum energy point of operation, and the contour of leakage to total power ratio. Performance constraints are normalized by the performance contour that runs approximately through the optimal point. Any point in satisfies the constraints and is expressed as (11) where is the design variable vector and is the performance functions and index (here it is three) represents the constraint. These constraints can assume different values from one application to another. Here it is assumed that the minimum frequency is that corresponding to the minimum energy point, and the ratio of leakage power to total power is about 0.5. The first constraint ensures subthreshold operation with maximum achievable performance (12) The circuit clock frequency must exceed a given minimum value . This constraint ensures the minimum frequency of operation (13) Finally, the following constraint comes from the power budget: (14) The methods construct a polyhedral approximation of the feasible region by taking first-order approximation of each at the expansion point (15) where is the gradient vector of . The point is on the surface and has the minimal distance from the center of the initial tolerance box . The superscript stands for the center of the initial tolerance box. The shortest distance to the constraint is found by solving the minimization problem subject to (16) This minimization problem can be solved by the Newton or the gradient method. We use an iterative formula based on 
The superscripts and 1 refer to the index of iteration and to the gradient at the iteration. This formula attempts to solve indirectly. Convergence of (14) depends on continuity and convexity of , and the solution may not be unique. The method finds a linear approximation for each constraint in the form of (13) . Such approximations, together with some extra bounds on the design variables, form a polyhedral feasible region. The polytope is defined by (18) The th row of is , where all the partial derivatives are evaluated at , found by (15) and . The lower and upper bounds on the design variable are denoted by and , where index represents the th design variable. In this case, the design variables are and of the device. So far, the design space has been constructed. For the purpose of yield optimization, distribution of design variables is modeled.
Simulations results indicate that and variations can be modeled as normal distributions, which is a standard statistical model [32] , [33] . But the normal distribution does not have a closed-form cumulative distribution function (cdf), which is necessary for the yield estimation. In our method, yield is estimated directly using the joint cdf over the tolerance box requiring no numerical integration and thus saving considerable complexity for multidimensional problems. For this reason, instead of the standard model, we have considered Kumaraswamy's distribution [34] , double-bounded probability density function (DB-pdf), with the following form: (19) Fig. 4 . Contours of ratio of leakage to total power in subthreshold region. This is an extension of plot presented in [15] for above threshold operation. (20) Depending on the choice of parameters and , DB-pdf can take various shapes. It can be used to approximate uniform, triangular, Gaussian, highly kurtic, and many other single modal distributions. Another advantage of DB-pdf is that its integral, needed for yield evaluation, is available in closed form (21) V. TWO-LEVEL OPTIMIZATION To start the two-level optimization problem, a design metric (PDP, EDP, PT , etc.) is initially selected, based on the priority that needs to be given to the power as opposed to the performance in a given application. In the first level, the respective metric is optimized, subject to the constraints, and the solution of the constrained optimization is adopted as the initial solution for the yield optimization in the second level. At the second level, a tolerance box, with its center initially located on the deterministic optimum point obtained in the first level, is moved over the design space to find the best location in order to maximize the yield. The location of the box must not only satisfy the maximum yield objective but also be as close as possible to the deterministic optimum point obtained in the first level. The final location and the center of the box indicate a design that has the highest immunity against the variations in and . In addition to the design robustness, this center must provide the best possible tradeoff between power and performance. The size and location of the tolerance box depend on the allowed percentage of tolerance, the yield, the shape of the feasible region, and the distribution of the design variables. By minimizing the objective function in (16) , the tolerance box is moved over the feasible region to maximize the yield and minimize the distance between the deterministic optimum point and the center of the tolerance box in its final location. Fig. 5 depicts the reversed normalized energy contours at an average of 42 C for 90 nm CMOS technology. The optimal value shown here is the nominal PDP value of unconstrained minimization of PDP equation. Note that the contours are normalized by dividing the minimum energy by the calculated energy for any pair of and . Fig. 6 depicts the final location of the optimum tolerance box for 90 nm technology and all three constraints of the design. The outer box represents the tolerance range, and the inner box is the tolerance box with the maximum yield in the feasible region. If the variations in the design variable can be controlled so that the size of the outer box is reduced to that of the inner box, the yield becomes 100%. The yield is expressed as a function of the lower and upper limit of the design variables. The problem reduces to the search for a box over which the yield is maximized and is contained in the polytope defined by (13) (22) where is the inner optimum box contained in the feasible region and and are lower and upper bounds of the box, respectively. The containment requirement is equivalent to (23) Recall that is the transpose of the gradient vector , obtained by linearization of the performance constraint at a given . and are the upper and lower bounds of the same performance constraint, and refers to the constant terms in the linearization. We choose the reference point to be greater than or equal to to define the location of the larger tolerance box. The left bottom corner is , the top right corner is , and the range is given for the th width by . The variables and together place the location and the size of the optimal box that corresponds to the maximum yield. The The yield model (22) estimates the yield for given values of and . The objective of optimization, formulated below, is to move the tolerance box in such a way that the yield is maximized. The optimization model is given by Yield subject to (25) VI. RESULT AND DISCUSSION We used the sequential quadratic programming in MATLAB to solve the optimization problem. The MOSFET model for 90 nm technology is adapted from the BSIM 4 model. Monte Carlo simulations have been done to investigate an optimal operating region within which a circuit could function optimally and to verify its yield maximality. The result of yield optimization for different tolerances and different DB-PDFs is presented Fig. 7 . The final location of the tolerance box for Gaussian distribution of design variables over which the yield is maximized considering 10% tolerance for variations. Fig. 8 . The final location of the tolerance box for Gaussian distribution of design variables over which the yield is maximized considering 15% tolerance for variations.
in Table I . By using the methodology, optimized yield (Final M-C) is found to be significantly higher than the yield of initial nominal values (Initial M-C). For simple cases, the designer can estimate the initial design values close to the optimal ones, but for a subthreshold circuit when the slightest change in design variables can cause considerable change in performance functions, the proposed yield optimization method can give better yield. If the distribution of design variables (DB-PDF) changes, the location of the tolerance box also changes. Figs. 7-10 show the result of yield optimization for Gaussian and highly kurtic distribution of design variables and for different values of tolerances. Any pair of and in the feasible region satisfies the constraints. As seen from the figures, the distribution of variables in a specific part of feasible region causes the tolerance box to move in such a way that it crosses the performance functions, but the design variables are still inside the feasible region. The advantage of first-order approximation for feasible region is to find a tolerance box that corresponds to the yield Fig. 9 . The final location of the tolerance box for highly kurtic distribution of design variables over which the yield is maximized considering 10% tolerance for variations. Fig. 10 . The final location of the tolerance box for highly kurtic distribution of design variables over which the yield is maximized considering 15% tolerance for variations.
equal to one and then modify the process in a way that provides the expected tolerances for design variables to produce 100% acceptable products. As in our case, all performance functions are convex, so the estimated feasible region is always inside the exact feasible region. Yield increases when tolerance decreases, so there is a good tradeoff between increase in yield and the cost of design and manufacturing. So the financial data due to tolerances (manufacturing) plus cost due to yield (wastage cost) need to be evaluated. Table II lists the yield, Mean value, and standard deviation of the design variable for tight constraints considering Gaussian distribution. From Table II , we see that due to the variations in the design variables, design metric PDP has statistical measures. The mean and standard deviation of PDP, at the maximum yield point, are also calculated for two different standard deviations in design variables, which is mentioned in Table II. To set tight constraints, the maximum allowed frequency can be lowered or the acceptable ratio of leakage to total power can be reduced. In this paper, to see the effect of tight constraints, we take THE maximum allowed frequency to be equal to 80% of maximum frequency achievable with equal and . For minimum frequency of operation, we assumed . Again, the constraint on leakage to total power ratio was set to 0.4. This causes the area of the feasible region to shrink, which results in increase in the yield loss. It is very natural that due to tight constraints, many designs fail to satisfy either constraint.
From Tables I and II , it is clear that, with tight constraints, if we consider Gaussian distribution of design variables, yield decreases considerably from 98% to 84% for 10% tolerance (i.e., %) of variations of the design variables. To increase the yield, the process and environmental variations must be controlled so that the tolerance box (outer box) is resized to match the 100% yield box (inner box). To achieve this, we need to increase the precision of the equipment in the fabrication process at a higher cost. We can reduce the voltage noise by controlling the voltage drop within the chip. The variations of the design variables can be systematic or random in nature, and they fall into the following categories: fab-to-fab, lot-to-lot, interwafer, die-to-die, and even within-die. Systematic variations depend on the position of the devices on a die and the layout environment surrounding the devices [35] . It is possible to compensate the systematic variations from lithographic, etching, and layout information [36] . But the random uncertainties such as dopant number and location, however, cannot be predicted and cause all of the devices in close proximity to exhibit different characteristics. Therefore, to increase the yield, the constraints must be relaxed. This occurs when the feasible region is increased with a new set of constraints and the yield loss approaches zero. The effect of activity factor on yield is also analyzed in this paper. The circuit in Fig. 1 consists of ten levels, with 11 stages in each level. The first level is a NAND ring oscillator, and other levels are NAND chains. An activity factor of "0.1" is controlled by assigning "1" to select zero input and "0" to the subsequent select inputs. If more select inputs are set to "1," activity increases. To verify the methodology, Cadence SPECTRE simulations are performed for this circuit using 90 nm CMOS technology. As the activity of the circuit increases, the switching of the internal nodes increases and, consequently, the power consumption increases. Simulation results demonstrate that with the increase in workload, the minimum energy of operation increases to higher energy, as seen from Fig. 3 , and optimal supply voltage moves to the lower supply voltage. For this, variability increases and yield decreases. Fig. 11 shows the effect of activity factor of the circuit on parametric yield. We also investigate the effect of the transistor sizing on the yield, which is shown in Fig. 11 . The variations in the threshold voltage are slightly reduced by the increase in the device size, which in turn improves the yield if low activity of the circuit is considered. However, in an application for which activity of the circuit is high, the increase in the size of the transistors reduces the yield because the transistors' parasitic capacitance and, therefore, their power consumption are increased by increasing their sizes. Consequently, similar to those of the higher activity, optimal supply voltage moves to the lower supply voltage, where it results in higher yield loss. Table III lists the values of the center of the boxes at its final location obtained from the methodology. Table IV shows the process and circuit parameters from [15] used in this paper. An application-dependent design approach is presented for selecting the most appropriate pair of values for the supply voltage and threshold voltage, for which the yield is maximized and a near optimal tradeoff between power and performance is achieved. Parameter variations pose a major challenge in the design optimization of subthreshold VLSI circuits, especially for sub-100 nm technology. As the technology scales, design variables are more sensitive to process and environmental variations and different operating conditions of the circuit. The robustness and reliability of the design of integrated circuit is now emerging as a critical challenge in the variability-aware design especially for subthreshold circuits. Unlike the traditional deterministic PDP optimization, to find the optimal values for and under uncertainty, we have considered the following factors to make the optimization technique more reliable, efficient, and complete: the application-dependent power and performance constraints, variations in the design variables due to manufacturing uncertainty, device sizing, activity factor of the circuit, and power reduction techniques. For an efficient design, a metric that is application-dependent must be chosen carefully. The chosen metric is first deterministically optimized to find the optimum point in the feasible region. Then, a design center, which is the most immune to the variations, must be identified to maximize the yield. In addition to the design robustness, this center provides the best possible tradeoff between power and performance. The design-specific feasible region is constrained by minimum performance and maximum leakage to total power ratio in the -plane. In order to maximize the yield for double-bounded probability distribution functions, the AFOSM method was employed. We obtained the tolerance box corresponding to 100% yield and, for the worst case design, the best location for an existing tolerance box to maximize the yield. The sensitivity of the parametric yield to the activity factor was investigated, and we see that yield decreases as activity factor increases. We showed that with the increase in device size, yield is increased with lower activity of the circuit, but for the same device size, yield is decreased with higher activity of the circuit. The yield is found to be different for different values of tolerances and for different types of distribution of design variables.
