Abstract: Ultra-low-voltage operation can greatly reduce the power consumption of circuits. However, there is no fast, effective, and comprehensive technique for designers to estimate the power, delay, or effects of process variation of a design operating in the ultra-low-voltage region. This paper presents a simulation framework that can quickly and accurately characterize a circuit from nominal voltage to the subthreshold region. The framework uses the nominal frequency and power of a target circuit, obtained using gate-level or transistor-level simulation tools, and normalized ring oscillator curves to predict delay and power characteristics at lower operating voltages. Specific contributions of this article include a weighted average method, which improves on a previously published form of this framework, as well as a methodology to estimate the effects of process variation based on the same framework. The weighted averages framework takes into account the types of gates that are used in the circuit and critical path to give a more accurate power and timing characterization. Although the results vary by several orders of magnitude, the errors of the weighted averages technique are no greater than 20.01%, 15.30%, and 8.870% for circuit delay, active energy, and leakage power, respectively. To validate the framework, a detailed analysis is given in the presence of a variety of design parameters as well as a range of benchmark circuits.
I. INTRODUCTION

IN A VAST ARRAY of applications, such as battery-operated smart phones with increasingly sophisticated features, wearable computers, sensor network processors, and tiny implantable chips that monitor or otherwise aid patients, the power budget is very restricted because the system runs either on a battery or on limited scavenged power. The two components of power consumption, active and leakage power, can both be reduced by reducing the supply voltage, a technique called voltage scaling. Traditional voltage scaling is limited to about half the nominal voltage, where nominal voltage is the typical voltage at which a circuit is designed to operate. The reason is that some analog-like components of the circuit, such as sense amplifiers and phase-locked loops, do not operate properly below the transistor threshold voltage. Moreover, conventional CMOS logic faces extra challenges when operating in the Ultra-Low-Voltage (ULV) region, such as increased sensitivity to noise and process variation. However, in recent years, several research groups have shown that with a careful and innovative approach, it is possible to have a complete design that operates well below the threshold voltage. Among the reported prototypes are a CDR circuit that can run at 50 kbps at 300 mV [1] and a decoder that can run at 1 MHz at 240 mV [2]. The great power reduction of ULV operation comes at the cost of performance. The reason is that the transistors are not actually switching as usual in the ULV region; instead they operate by modulating the leakage current that passes through them, which is much less than the usual "on" current. Because of the reduced speed of the circuit and the high accuracy required to measure the very small current levels, transistor level simulators such as SPICE take a long time to simulate ULV circuits. In addition, the larger range of potential operating voltages greatly expands the design-space, which requires time-consuming analysis of the circuit across a range of voltages.
This motivates a fast numerical approach to estimating delay and energy in the ULV region without having to resort to SPICE simulations.
When deriving values for power or timing with respect to voltage, closed-form equations such as the Shockley model are quick but inaccurate and generally only work in certain operating regions. Conversely, simulating a circuit in SPICE yields accurate results but is time consuming and makes design-space exploration difficult. In this paper, a simulation framework is presented that provides both a fast and accurate analysis of a gate-level design across a wide range of supply voltages, including the ULV region. The simulation framework is based on the initial framework of [3], [4], [5], [6] but includes a weighted-average methodology that greatly increases the accuracy.
The baseline framework discussed in this article and in [4] works on the assumption that when delay/energy vs. supply voltage curves are normalized, i.e., the entire curve is divided by the value at nominal voltage, the resulting curve is independent of the underlying circuit. This means that characterizing any circuit across a range of supply voltages simply requires a normalized ring oscillator curve and the frequency or power value at nominal voltage of a target circuit. In this article, we introduce a weighted-average method, a major improvement that yields large drops in error. With this method, a variety of normalized ring oscillator curves are constructed and are combined using a weighted average to form a circuit-specific scaling curve. This method reduces the error to no more than 20.01% for circuit delay, 15.30% for active energy, and 8.870% for leakage power. To serve as a baseline, we also compare the framework to the EKV (Enz, Krummenacher, and Vittoz) model [7], a single set of closed-form equations that models digital circuits all the way from nominal voltage to the ULV region, and demonstrate that the error of the proposed method is significantly lower.
The contributions of this article are as follows:
• A detailed description of the methods is presented. First an overview of the original framework as well as the proposed weighted average technique is introduced. The potential pitfalls of the two methods are further discussed.
• An analysis of the framework is shown in the presence of varying parameters such as fanout, transistor width, and critical path.
• The framework is validated by applying it to some ISCAS benchmark circuits as well as a 256-bit multiplier. The techniques presented in this paper are compared to the EKV model to further illustrate the versatility of the method proposed.
• The framework is used to estimate global and local process variation effects on the timing of circuits.
• Finally, it is shown that the measured results collected from an accurate transistor-level tool, which take several hours to obtain, are very close to the estimated results, which take only a few seconds to obtain.
II. BACKGROUND AND RELATED WORK
A transistor has three main modes of operation, depending on the voltages at the terminals and the threshold voltage of the transistor. Cutoff Mode, or Subthreshold Mode, is one in which $V_{GS} < V_{th}$. In theory, the device would not conduct any current in this mode. In actuality, some electrons travel from the source to the drain despite the device being in the "off" mode, a phenomenon known as subthreshold leakage. In the Triode Mode and Saturation Mode, $V_{GS} > V_{th}$ and the transistor is on. This means that a channel has been created so that current can flow between the source and drain [8], [9].
Some attempts at fast simulation frameworks have been made through mathematical models. One of the first models used was the Shockley diode equation [8] . With improvements in transistor sizing as well as the increased application of ULV devices, more specific equations have been used. However, these models only apply to ULV operations under specific conditions. For instance, [10] is a model for leakage current in ULV regions when transistor stacking is seen. It can be seen that the results from [10] are comparable to the results of the methodology presented in this paper. Another mathematical model can be found in [11] , which presents a model for delay in ULV regions even in variable conditions, such as process variations and transistor stacking. However, these are both mathematical models that only work in ULV regions.
Other methodologies that address the need for versatile power and timing modeling frameworks have been proposed in [12] and [13]. In [12], a modeling framework that can be used to estimate power, area, and timing for multiple core architectures is described. [13] addresses power estimations made without considering potential glitching of a circuit, which can drastically change the power estimation results of a circuit. Despite the versatility of these frameworks, an advantage that this paper has over frameworks such as [12] and [13] is that it addresses ULV operation of a circuit. Also, the methodology works with existing simulators, such as PrimeTime and SPICE, that can simulate events such as glitching.
Traditionally, a separate set of equations has been used to model the transistors in each mode of operation. However, the EKV model presents a single set of equations that can cover any region of operation, including the ULV region, without loss of accuracy [7]. A complete derivation of the model itself can be found in [7], while [14] applies the model to estimating the timing and energy of low power digital circuits. The details of the EKV approach are covered in Section V, where it serves as a baseline to evaluate the effectiveness of the proposed methodology.
The method presented in this paper introduces an estimation framework that is not only comprehensive with regard to the supply voltage applied, which ranges from superthreshold to subthreshold, but also with regard to the applications to active energy, leakage power, and timing delay analysis. Furthermore, this method improves the technique presented in [15] significantly.
III. FRAMEWORK
The fundamental justification of this framework is the assumption that timing and power curves for all circuits within a given technology are the same after being normalized with respect to a single voltage. Using this assumption, a smaller circuit can be quickly characterized in SPICE, normalized, then used to estimate a larger circuit's power, energy, and timing characteristics using only values at a single voltage. In this article, a ring oscillator is used as the characterization circuit. Section III-A uses a single ring oscillator, whereas Section III-B uses ring oscillators that are composed of the gates in the target circuit.
Fig. 1 presents the simulation framework. The idea is to characterize a ring oscillator in an accurate circuit simulation program such as SPICE across a range of supply voltages, then normalize the resulting curves so that they can be applied to other circuits. A ring oscillator is composed of an odd-numbered chain of complementary gates with the output connected back to the input. The frequency of oscillation is related to the delay of the gates as well as the length of the chain. As the supply voltage of the ring oscillator is reduced, the frequency of oscillation and the power consumption are also reduced.
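The relation between oscillation frequency, gate delay, and chain length can be made concrete with the textbook first-order expression f = 1 / (2 N t_d), where N is the number of stages and t_d is the average delay of one stage. The sketch below is only an illustration of that idealized relation; the function name and the 10 ps stage delay are hypothetical and do not come from the characterization in this paper.

```python
def ring_oscillator_frequency(num_stages: int, stage_delay_s: float) -> float:
    """Ideal oscillation frequency of a ring oscillator.

    A transition has to travel around the ring twice to complete one
    period, so the period is 2 * num_stages * stage_delay_s.
    """
    if num_stages < 3 or num_stages % 2 == 0:
        raise ValueError("num_stages must be an odd number of at least 3")
    if stage_delay_s <= 0:
        raise ValueError("stage_delay_s must be positive")
    return 1.0 / (2 * num_stages * stage_delay_s)


# Hypothetical example: 11 stages with an assumed 10 ps delay per stage.
frequency_hz = ring_oscillator_frequency(11, 10e-12)
```

Because the stage delay grows as the supply voltage is lowered, the same expression also explains why the oscillation frequency falls with voltage.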
A. Basic Framework
To construct the normalized curves, the supply voltage of a ring oscillator is varied from nominal voltage all the way down to the minimum operating voltage, which is in the ULV region. The resulting frequency and power curves are then normalized by dividing the curve by the value at nominal voltage. These normalized curves can then be used to determine how other circuits respond to supply voltage reduction. All that is needed to characterize an arbitrary design is a fast gate-level design analyzer such as PrimeTime that can collect power and delay characteristics of a circuit at nominal voltage. As shown in Fig. 1 , these numbers can then be multiplied by the normalized ring oscillator curves, producing curves that model the frequency and power consumption of a design at any voltage. 
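The normalize-then-scale procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function names, the 1.0 V nominal voltage, and all numeric values are hypothetical placeholders for data that would come from SPICE (ring oscillator) and a gate-level analyzer (target circuit).

```python
def normalize_curve(vdd_points, values, v_nominal):
    """Divide a measured curve by its value at the nominal supply voltage."""
    nominal_index = vdd_points.index(v_nominal)  # ValueError if not present
    nominal_value = values[nominal_index]
    return [value / nominal_value for value in values]


def estimate_target_curve(normalized_curve, target_nominal_value):
    """Scale a normalized ring oscillator curve by the target's nominal value."""
    return [target_nominal_value * scale for scale in normalized_curve]


# Hypothetical ring oscillator frequencies across supply voltages.
vdd_points = [1.0, 0.8, 0.6, 0.4]
ro_frequency_hz = [2.0e9, 1.2e9, 4.0e8, 2.0e7]

normalized = normalize_curve(vdd_points, ro_frequency_hz, v_nominal=1.0)

# Hypothetical target circuit that runs at 500 MHz at nominal voltage.
target_frequency_hz = estimate_target_curve(normalized, target_nominal_value=5.0e8)
```

The same two functions apply unchanged to energy and leakage power curves, since only the measured quantity differs.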
B. Weighted Average of Gates and Flip Flops
One of the benefits of the method introduced in Section III-A is that it does not require any information about the circuit aside from the energy, power, and timing characteristics at nominal voltage. However, this also introduces error from varying circuit parameters, as can be seen in Section IV. One source of error is the fact that the characteristics of the target, such as the number of combinational and sequential elements of the circuit itself, are not considered. Knowing such characteristics allows for a more specific analysis of the target circuit.
The weighted average methodology considers the types of components in the entire circuit when evaluating energy and power, and the types of components in the critical path when evaluating timing. While the original technique chooses an arbitrary ring oscillator to normalize, this method specifically chooses the components involved in the target circuit's standard cell library. After normalizing and scaling each component's curves, the ratio of each component found in the circuit is multiplied by the scaled ring oscillator values to yield the new estimate. This can be seen in Fig. 2.
Meanwhile, (1) shows a mathematical model of such an implementation. $X_{est}(V_{dd})$ is the scaled estimation for the target circuit's energy, power, or timing values with respect to the different components' ratios. $N_i$ is the number of components of type $i$ in either the entire circuit for active energy and leakage power or the critical path for timing delay, $N_{total}$ is the total number of components in the target circuit, $S_i(V_{dd})$ is the scaling curve of component type $i$, and $L$ is the standard cell library.

$$X_{est}(V_{dd}) = \sum_{i \in L} \frac{N_i}{N_{total}} S_i(V_{dd}) \qquad (1)$$
The caveat to this proposed method is that a ring oscillator of every gate in the library, or at least those being used in the target circuit, must be characterized in order for this technique to be used. However, this can be done in an automated manner, and once the characterized curves are acquired, any target circuit using that particular technology and library can be evaluated.
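The weighted-average combination described in this section can be sketched as follows. This is a hedged illustration under stated assumptions: the gate names, counts, and curve values are hypothetical, and each per-gate curve is assumed to be already normalized to its own nominal-voltage value before being scaled by the target circuit's nominal value.

```python
def weighted_average_estimate(gate_counts, normalized_curves, target_nominal_value):
    """Combine per-gate normalized curves into one circuit-specific estimate.

    gate_counts: mapping from gate type to number of occurrences, counted in
        the whole circuit (energy, leakage) or in the critical path (timing).
    normalized_curves: mapping from gate type to a list of normalized values,
        one per supply voltage point, all lists of equal length.
    target_nominal_value: measured value of the target circuit at nominal voltage.
    """
    total_count = sum(gate_counts.values())
    num_points = len(next(iter(normalized_curves.values())))
    estimate = [0.0] * num_points
    for gate_type, count in gate_counts.items():
        weight = count / total_count
        curve = normalized_curves[gate_type]
        for k in range(num_points):
            # Scale the normalized curve, then weight by the gate's share.
            estimate[k] += weight * target_nominal_value * curve[k]
    return estimate


# Hypothetical library with two cell types and two voltage points.
gate_counts = {"NAND2": 3, "INV": 1}
normalized_curves = {"NAND2": [1.0, 0.5], "INV": [1.0, 0.3]}
estimate = weighted_average_estimate(gate_counts, normalized_curves, 2.0)
```

Since the weights sum to one and every normalized curve equals one at nominal voltage, the estimate reproduces the target's measured value at nominal voltage by construction.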
C. Process Variation Effects
Like the base framework, this process uses normalization and scaling, in this case to estimate the amount of process variation that the circuit may have, as shown in Fig. 3. This is done by evaluating the coefficient of variation of a ring oscillator across different voltages. The coefficient of variation is the ratio of the standard deviation to the mean. The coefficient of variation is used instead of the standard deviation because the mean changes with every $V_{dd}$. Therefore, the standard deviation alone does not provide an accurate representation of the data.
The motivation for using this technique rather than simply simulating the circuit using Monte Carlo simulations is the time that is saved. In addition to the amount of time it takes to simulate a circuit in the ULV regions, trying to find the process variation of the same circuit would require a Monte Carlo analysis, which would increase the total simulation time by the number of Monte Carlo simulations required. Therefore, this framework has a significant advantage with regard to the time that is saved for a designer.
IV. ANALYTICAL VALIDATION
This section provides an analytical validation for the framework presented in Section III-A. Later, in Section VI, we provide experimental results considering a specific technology with specific circuits. However, analytical equations allow us to consider the broader case. Additionally, by showing the analytical validation, the sources of error can be more easily identified. To do so, equations that can be applied at any supply voltage and technology are provided. By normalizing the equations, it can be shown how normalization affects the circuit's parameters.
The coefficients of the equations can be broken up into two categories: circuit dependent parameters and circuit independent parameters. Circuit dependent parameters, such as transistor width and load capacitance, vary from circuit to circuit, while circuit independent parameters depend on the process. The simulation framework assumes that the normalized curves of two circuits scale the same way with respect to $V_{dd}$. This means that after normalizing timing and power, circuit dependent parameters get canceled out and only circuit independent parameters remain (i.e., a normalized ring oscillator curve is identical to a normalized multiplier curve). It will be shown that certain assumptions have to be made in order for this to be true. When these assumptions do not hold, error is introduced into the estimate. The extent of the error will be shown in the experimental validation in Section VI.
A. Active Energy
In this section, energy is considered rather than power so that frequency no longer has to be considered, thus reducing the complexity of this analysis. (2) gives a basic equation for the active energy dissipation of a circuit. (3) shows (2) after normalization to nominal voltage.

$$E_{active} = \alpha C_L V_{dd}^2 \qquad (2)$$

$$\frac{E_{active}(V_{dd})}{E_{active}(V_{nom})} = \frac{V_{dd}^2}{V_{nom}^2} \qquad (3)$$

(3) shows that normalization cancels out circuit dependent parameters such as activity factor, $\alpha$, and load capacitance, $C_L$, meaning that, with respect to operating voltage, the active energy of a very large circuit with a low activity factor scales the same as that of a small circuit that is very active.
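The cancellation argument can be checked numerically. The sketch below assumes the standard switching-energy form, energy equal to activity factor times load capacitance times the square of the supply voltage; the two parameter sets and the 1.0 V nominal voltage are hypothetical and chosen only to contrast a large, mostly idle circuit with a small, busy one.

```python
def active_energy(activity_factor, load_capacitance_f, vdd):
    """Switching energy under the assumed alpha * C * Vdd^2 form."""
    return activity_factor * load_capacitance_f * vdd ** 2


def normalized_active_energy(activity_factor, load_capacitance_f, vdd, v_nominal):
    """Active energy at vdd divided by active energy at nominal voltage."""
    return (active_energy(activity_factor, load_capacitance_f, vdd)
            / active_energy(activity_factor, load_capacitance_f, v_nominal))


# Hypothetical circuits: large and mostly idle versus small and very active.
large_idle = normalized_active_energy(0.01, 1.0e-9, vdd=0.5, v_nominal=1.0)
small_busy = normalized_active_energy(0.50, 1.0e-13, vdd=0.5, v_nominal=1.0)
```

Both normalized values reduce to the ratio of the squared voltages, so the circuit dependent parameters have no influence on the normalized curve under this model.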
B. Leakage Power
The EKV equations for digital CMOS are based on a circuit dependent parameter known as specific current, $I_S$, given in (4), which is the current when $V_{GS} = V_{th}$. The parameters of $I_S$ are subthreshold slope, $n$, mobility, $\mu$, oxide capacitance, $C_{ox}$, and thermal voltage, $\phi_t$ [14]. (5) shows the EKV equation for leakage power, where $\eta$ is the Drain-Induced Barrier Lowering (DIBL) coefficient and the specific current is the total specific current of the entire circuit.
It is important to note that the parameters in are assumed to be circuit independent, which means that they will be cancelled out during normalization as shown in (6) . (6) shows the equation for leakage current after normalization. DIBL, which is represented as , is a potential source of error because the framework assumes that is constant, which is not always the case due to transistor stacking. Also, (5) assumes that is constant. In reality, changes with , and , which is another possible source of error.
C. Timing
The general equation used to find the timing of a critical path, shown in (7), can be generalized to the EKV model by defining the drain source current as (8) .
is the average specific current of the critical path, is a fitting parameter, and is the inversion coefficient, which is defined in (9). The inversion coefficient represents the inversion of a transistor in both subthreshold and superthreshold regions [14] .
(10) shows the EKV equation for delay. Here, represents the total capacitance of the critical path and represents the specific current of the critical path, both of which are circuit dependent parameters. (6) already showed how normalization cancels out circuit dependent parameters when the transistor is in the cutoff region. (10) and (11) show the effect of normalization on timing. Again, DIBL may lead to some error due to the fact that there may be transistor stacking. Also, the framework once again assumes that is a circuit-independent parameter, whereas in reality, can change with different transistor sizes and supply voltages.
One thing to note is that this section considers an analysis from a transistor level. This in turn makes the assumption that the width, length, , stacking, et cetera are the same for every gate, which is an assumption made in Section III-A. In actuality, some of these values will vary and that introduces additional error into the analysis.
V. EXPERIMENTAL METHODOLOGY
This section further discusses the specifics of how the results in Section VI were obtained. To start, variations of the baseline ring oscillator are compared. These variations highlight the potential error sources of this methodology. Then, larger circuits are used to further verify the validity of this framework by demonstrating its accuracy. The characterization of flip flops is discussed and, finally, the effects of global and local process variation on the timing of circuits are discussed.
A. Modified Ring Oscillator Analysis
First, to validate the simulation framework, baseline normalized ring oscillator curves were obtained from an 11-stage 2-input NAND ring oscillator with a fanout of one and with transistor lengths and widths. The number of stages in the ring oscillator, as seen in Section VI, ultimately made very little difference in the evaluation of the methodology. The ring oscillator was simulated in HSpice, and the frequency, energy, and power consumption were measured from nominal voltage down to subthreshold voltage, which in this case is 400 mV. These baseline curves serve as the foundation of the simulation framework and are used to determine the energy, power, and timing of other circuits. Therefore, the way these curves change relative to the baseline when parameters such as load capacitance and path depth are varied needs to be evaluated.
To do so, the baseline's characterized curves are compared to the ring oscillators with varied parameters. These experiments were performed with a 32 nm predictive model library.
1) Increased Fanout:
To evaluate the effects of an increased fanout in a ring oscillator, one can make incremental modifications to the baseline ring oscillator. If the baseline ring oscillator has a fanout of one, then the modified ring oscillator will have a fanout of four. This is achieved by adding three 2-input NAND gates with the outputs grounded between each stage of the baseline ring oscillator. A fanout of four was chosen because a synthesized circuit will usually not have a fanout greater than four.
2) Varying Critical Path: To simulate a varying critical path depth, a ring oscillator made up of 2-input NAND gates with 31 stages was created. This will be compared to the baseline ring oscillator, which has 11 stages.
3) Increased Transistor Width: Another parameter that can be varied is the width of a transistor. An 11-stage ring oscillator with 2-input NAND gates is created wherein the transistor width is increased by five times the original width of the baseline ring oscillator's transistors.
B. Circuit Analysis
ISCAS benchmark circuits as well as a 256-bit multiplier were analyzed using the proposed framework as well as the EKV model. Pass transistor logic and gates with three or more inputs were excluded because of their poor reliability in the ULV region [5] . The circuits were simulated in SPICE from nominal voltage down to a subthreshold voltage level.
1) Leakage Power:
To form the EKV baseline, the target circuit's leakage current is measured at $V_{dd}$ values between nominal voltage and 400 mV in addition to varying voltages using HSpice. The resulting current values are fitted to a surface plot in MATLAB where and are treated as variables as shown in (14). (14) is derived by dividing from (5) and taking the natural logarithm of both sides. This will yield (12), which can be rewritten as (13). Finally, (14) is simply (13) rewritten to make the multivariable nature of the equation more evident. The fitted equation will give the fitting parameters , and which can be used to find values and as seen in (15) and (16), respectively. (or ), , and are then put into (12) where the exponential of both sides is taken to yield the final leakage current value estimated by the EKV model. Finally, this value is multiplied by $V_{dd}$ to produce the leakage power estimated by the EKV model. To obtain values of and , (15) and (16) are used with the parameters derived from (14). (15) (16) The values for , and are derived by using the surface fitting tool in MATLAB. and will be used in (15) and (16) and the exponential of is used to derive the value of . Using these derived values, (5) can be used.
To evaluate leakage power for weighted averages, a chain of gates is made out of the particular type of gates found in the circuit. Leakage power values are collected at multiple $V_{dd}$ values. After determining the power for each gate, each set of values is normalized and scaled by the power at nominal voltage of the target circuit. Finally, the ratio of each gate's occurrence in the circuit is multiplied by its respective power curve to yield the leakage estimate.
2) Active Energy: With regard to the EKV model, using (2), the activity factor and load capacitance are grouped together into one constant. At nominal voltage, the active energy of a particular circuit is measured and set equal to the right-hand side of (2). With the measured energy known and the supply voltage set to its nominal value, the constant can be derived. Now that the constant is known, the active energy at any supply voltage can be estimated using the EKV model. To evaluate active energy in combinational components using the weighted averages technique, a chain of each of the gates found in the standard library is made. Total energy values are collected at $V_{dd}$ values between nominal voltage and 400 mV. After determining the total energy for each gate, the leakage energy of the circuit is subtracted from the total to yield the active energy value. Then, each set of values is normalized and scaled to the energy at nominal voltage of the target circuit. Finally, the ratio of each gate's occurrence in the circuit is multiplied by its respective energy curve to yield the final energy estimate.
3) Timing: Using the parameters and that were generated through Section V-B1, these can be put into (9) . In the case of this paper, is evaluated at only one . The result of (9) is then put into (10) where the parameters , and will be treated as a constant, . To solve for , the delay characteristic of a particular circuit is measured with HSpice at varying and levels as shown in (17). (17) will generate every value of for each so the geometric mean of is taken to produce a constant. From this, is found at nominal voltage using HSpice. Now, any value of can be estimated using the EKV model.
(17)
For the combinational elements in the weighted averages estimation, an 11-gate ring oscillator is used to derive each gate's natural frequency. These frequencies are collected at $V_{dd}$ values between nominal voltage and 400 mV. These values are normalized and scaled to the frequency of the target circuit at nominal voltage. Finally, the ratio of each gate's occurrence in the critical path of the circuit is multiplied by its respective frequency curve to yield the timing estimate.
C. Flip Flops
To characterize the leakage power of a flip flop, a single flip flop's leakage power values are collected at $V_{dd}$ values from nominal voltage to 400 mV. The data is then normalized for use in the framework. To determine the setup time of a flip flop, the period between the data edge changing and the clock asserting was made larger and larger until the data latched. Similarly, to determine the hold time of a flip flop, the period between the clock asserting and the data edge was made smaller and smaller until the data no longer latched. The critical delay of a sequential circuit is the sum of the combinational critical delay and the flip flop setup requirement. The timing experiments shown in Section VI-B only look at clock to Q delay rather than evaluating setup time.
D. Process Variation
To determine the effects of global and local process variation on the timing of a circuit, an 11-gate ring oscillator is simulated using Monte Carlo analysis to derive the ring oscillator's natural frequency for varying instances. These frequencies are collected at $V_{dd}$ values between nominal voltage and 400 mV. The mean and standard deviation of these results are taken at each measured $V_{dd}$ and used to derive the coefficient of variation for each $V_{dd}$. This curve is then normalized and scaled to the coefficient of variation of the target circuit at nominal voltage, which yields the estimated coefficient of variation for the target circuit.
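The coefficient-of-variation steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' scripts: the sample values stand in for Monte Carlo frequencies, the target's nominal coefficient of variation is a hypothetical input, and the sample standard deviation (`statistics.stdev`) is assumed, since the text does not say whether the sample or population form was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Ratio of the sample standard deviation to the mean."""
    return stdev(samples) / mean(samples)


def estimate_target_cv(ro_samples_by_vdd, v_nominal, target_cv_nominal):
    """Normalize the ring oscillator CV curve and scale it to the target.

    ro_samples_by_vdd: mapping from supply voltage to a list of Monte Carlo
        frequency samples of the ring oscillator at that voltage.
    """
    ro_cv = {vdd: coefficient_of_variation(samples)
             for vdd, samples in ro_samples_by_vdd.items()}
    nominal_cv = ro_cv[v_nominal]
    return {vdd: cv / nominal_cv * target_cv_nominal for vdd, cv in ro_cv.items()}


# Hypothetical Monte Carlo frequencies (arbitrary units) at two voltages.
ro_samples_by_vdd = {
    1.0: [9.0, 10.0, 11.0],
    0.4: [0.5, 1.0, 1.5],
}
target_cv = estimate_target_cv(ro_samples_by_vdd, v_nominal=1.0,
                               target_cv_nominal=0.2)
```

Only the target's coefficient of variation at nominal voltage has to be obtained by Monte Carlo simulation of the full circuit; every lower-voltage point comes from the much smaller ring oscillator.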
VI. RESULTS
A. Modified Circuit Parameter Analysis
To demonstrate that this technique is accurate even with varying circuit parameters, this section shows three examples of the baseline ring oscillator being modified through circuit dependent parameters (fanout, critical path length, and transistor width) and compared to the original baseline ring oscillator for leakage power, active energy, and timing delay. Fig. 4 shows the baseline curves compared against normalized curves obtained from ring oscillators that were created to test the effects of fanout, transistor width, and critical path length. Fig. 4 also shows the error between the baseline curves and the modified curves. Only nominal numbers are used in the framework, so as the supply voltage gets farther from nominal, the error increases. This can also be due to second order effects being more pronounced at lower $V_{dd}$ levels.
B. Benchmark Circuit Test
To illustrate the accuracy of this methodology, the estimation curves are shown in comparison to the original measurement in Fig. 5 for a large circuit. The error between the benchmark and the estimations derived through the methodologies illustrated in this paper is small, considering that the framework only uses nominal values and that, at the lowest voltage, the circuits are more than 1000 times slower. Another consideration is that it took several hours to simulate each circuit across the range of voltages in SPICE, while the simulation framework took only seconds to obtain an estimate. The circuits used have approximate transistor counts ranging from just over 100 to almost 35 K transistors. Fig. 5(b) shows the error amongst the various methodologies for a 256-bit multiplier, which has about 35 K transistors. Table II provides a more comprehensive look by showing the absolute value of the maximum error with the estimation methodologies for various circuits.
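For reference, a maximum absolute error of the kind reported per circuit can be computed as sketched below. The exact error definition used for the tables is not spelled out in the text, so the relative error with respect to the SPICE measurement is an assumption here, and the numeric values are hypothetical.

```python
def max_abs_percent_error(measured, estimated):
    """Largest absolute relative error, in percent, across voltage points.

    The error at each point is taken relative to the measured (SPICE) value.
    """
    if len(measured) != len(estimated):
        raise ValueError("measured and estimated must have the same length")
    return max(abs(est - meas) / abs(meas) * 100.0
               for meas, est in zip(measured, estimated))


# Hypothetical delays (arbitrary units) from nominal voltage down to ULV.
measured_delay = [10.0, 1.0, 0.1]
estimated_delay = [10.0, 1.1, 0.12]
worst_case_error = max_abs_percent_error(measured_delay, estimated_delay)
```

Using a relative error keeps the points at the lowest voltages, where the absolute values differ by orders of magnitude, comparable with those near nominal voltage.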
The general trend observed is that the weighted average technique becomes a much more accurate alternative to the EKV model or the original framework proposed. In the case of active energy, the original framework proposed and the EKV model are comparable. Because the weighted averages framework takes into account the cells that are in each circuit, it will be more accurate in comparison to the other frameworks shown here.
The main drawback to the technique proposed in this paper is that an additional step is required in determining the characterized curve. While the original framework requires only one gate's characterized curves, the weighted averages technique requires that all cells in the target circuit's library be characterized. However, once this is done, any target circuit using that library can be characterized.
C. Sequential Circuit Considerations
The simulation framework can be used with sequential circuits. Fig. 6 shows the baseline timing curve along with the normalized setup requirement of a flip flop. Because both of these normalized curves scale identically with a decreasing supply voltage, the framework can be used to estimate the critical delay of a sequential circuit.
D. Process Variation Estimation
The simulation framework can also be used to estimate the effects of global and local process variation on timing at any given $V_{dd}$. Table I shows the maximum coefficient of variation error when evaluating the effects of process variation on the timing of a circuit.
VII. CONCLUSION
This paper presented a framework to quickly characterize the frequency and power consumption of a circuit across a wide range of voltages, including the ULV region. One problem with simulating circuits with low supply voltages is that transistor level simulations can take a long time due to the increased simulation time and accuracy requirements for the ULV region. The simulation framework addresses this issue by using only the nominal frequency and power of a circuit and multiplying them by normalized ring oscillator curves. Additionally, the framework presented is not a mathematical model, so many parameters that may be unknown in the target circuit are not needed. Despite frequency and power differing by orders of magnitude across operating voltages, the estimates were accurate and took only a couple of seconds to obtain, while the SPICE simulations took many hours to complete. This technique proved to be even more accurate than the EKV model, which is currently the standard methodology for estimating the frequency and power characteristics of a given circuit. The speed and accuracy of this framework make it useful for many applications, including architectural comparisons, minimum energy point determination, and design space exploration.
