Abstract-Subthreshold circuit designs have been demonstrated to be a successful alternative when ultra-low power consumption is paramount. However, the characteristics of MOS transistors in the subthreshold region are significantly different from those in strong inversion. This presents new challenges in design optimization, particularly in complex gates with stacks of transistors. In this paper, we present a framework for choosing the optimal transistor stack sizing factors in terms of current drivability for subthreshold designs. We derive a closed-form solution for the correct sizing of transistors in a stack, both in relation to other transistors in the stack, and to a single device with equivalent current drivability. Simulation results show that our framework provides a performance benefit ranging up to more than 10% in certain critical paths.
I. INTRODUCTION
Due to the robust nature of static CMOS logic, circuits in this technology family can operate with supply voltages below the transistor threshold voltage (V_th), while consuming orders of magnitude less power than in the normal strong-inversion region. The operating frequency of subthreshold logic is much lower than that of regular strong-inversion circuits (V_dd > V_th) due to the small transistor current, which consists entirely of leakage current. The low operating frequency and low supply voltage combine to reduce both dynamic and leakage power, leading to the significant power savings seen in subthreshold designs.
Subthreshold logic holds promise for the growing number of applications in which minimal power consumption is the primary design constraint. Such circuits have received much attention in recent research, and a number of successful designs have been demonstrated. A multiplexer-based SRAM was proposed for subthreshold operation by Wang and Chandrakasan [1]. They also introduced new tiny-XOR circuits and demonstrated their performance in a fast Fourier transform (FFT) processor running at a supply voltage of 180 mV. Kim et al. [2] presented a new high-density SRAM system operating down to 200 mV at ISSCC'07. In [3], Kim et al. built an ultra-low-power adaptive filter for hearing aid applications using subthreshold logic. Subthreshold-friendly logic styles and massively parallel digital signal processing (DSP) architectures were used in that work to achieve low-voltage operation.
The characteristics of MOS transistors in the subthreshold region are significantly different from those in the strong inversion region. The saturation current, which was a near-linear function of the gate and threshold voltages in the strong inversion region, becomes an exponential function of those values in the subthreshold regime [4]. In this paper, we show that the sizing methods used to obtain maximum performance must be reformulated for use in subthreshold designs due to these different characteristics. In particular, we present a framework for choosing the optimal transistor stack sizing factors in terms of current drivability for subthreshold circuits. A closed-form solution for the optimal sizing of stacked transistors is derived and shown to match simulation results. Our theoretical sizing values closely match those found in simulations with predictive technology model (PTM) [5], [6] devices ranging from 130-nm technology down to the 45-nm node. This sizing method is shown to provide a clear benefit in logic paths containing a large number of stacks where the nodal capacitance is not dominated by the increased device sizes used in our method.
II. OPTIMAL TWO-STACK SIZING

A. Optimal Ratio Between Two Stacked Devices
The first step we take in developing the subthreshold stack sizing framework is finding the optimal width ratio between transistors in a stack for maximum drive current. Here, we will present a closed-form expression for the relative sizing of two transistors in a stack, showing that it is beneficial to size up the transistor nearest to the supply rail (V_dd for PMOS, ground for NMOS). The starting point is the following pair of current equations for upper and lower transistors as situated in an NMOS stack (so the lower device is connected to ground), excluding the common factors that will cancel out when they are equated:
as well as the fact that m = 1 + , to further simplify calculations.
Rewriting the two current equations and equating them yields the following relationship:
Solving for V_X and using the definition V_T = kT/q gives us
We then define W_T = W_U + W_L to eliminate W_L, which results in the following current equation:
We find the optimal size for W_U by setting ∂I_U/∂W_U equal to zero.
Again, using our definition of W_T, we then find the optimal size for W_L. This derivation results in the following equations:
According to these results, we expect to drive a higher current through the two-transistor stack when the lower device is larger than the upper transistor by a factor of √α. For example, with an NMOS stack in 90-nm PTM technology, when using a W_U of 1 µm, the optimal W_L would be 1.23 µm at V_dd = 0.2 V, and 1.30 µm at V_dd = 0.3 V. As shown in (3), α is a function of V_dd, resulting in the different optimal width ratios for different V_dd values.
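The optimum can be sketched in a few lines under a simplified, normalized model. The forms below are assumptions made for illustration, not a transcription of the numbered equations: the upper device is taken to carry a current proportional to α W_U v and the lower device a current proportional to W_L (1 - v), where v = e^{-V_X/V_T} and α is the V_dd-dependent factor referred to in (3), treated here as a constant at a given supply voltage.

```latex
\begin{align*}
\alpha W_U v = W_L (1 - v)
  \quad&\Rightarrow\quad v = \frac{W_L}{W_L + \alpha W_U}, \\
I = \alpha W_U v = \frac{\alpha W_U W_L}{W_L + \alpha W_U}
  \quad&\Rightarrow\quad \frac{1}{I} = \frac{1}{\alpha W_U} + \frac{1}{W_L}, \\
W_L = W_T - W_U,\quad
\frac{\partial}{\partial W_U}\!\left(\frac{1}{I}\right)
  = -\frac{1}{\alpha W_U^{2}} + \frac{1}{W_L^{2}} = 0
  \quad&\Rightarrow\quad W_L = \sqrt{\alpha}\, W_U, \\
W_U = \frac{W_T}{1 + \sqrt{\alpha}},\quad
W_L = \frac{\sqrt{\alpha}\, W_T}{1 + \sqrt{\alpha}}
  \quad&\Rightarrow\quad I_{\max} = \frac{\alpha W_T}{\left(1 + \sqrt{\alpha}\right)^{2}}.
\end{align*}
```

Under these assumptions, a width ratio of 1.23 corresponds to α ≈ 1.51 and a ratio of 1.30 to α ≈ 1.69.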
HSPICE simulations using 45- to 130-nm PTM technology files closely match the results of our derivation and verify that the benefit of using the √α sizing ratio is more pronounced for larger α values (i.e., when the supply voltage is larger). PMOS transistor stacks exhibited the same sizing trends: optimal sizing requires the upper transistor (adjacent to the power supply) to be sized up by a factor of √α. Results for 90-nm technology are displayed in Fig. 1, and indicate optimal ratios that are roughly 4% to 6.5% smaller than the theoretical √α factors stated earlier. Because the skewed sizing changes the current only slightly (0.5% to 1.5% improvement), we will use a 1:1 width ratio in stacks. This reduces the design complexity for a negligibly small performance penalty.
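The size of the gain from skewed sizing can be estimated numerically. The following is a minimal sketch, assuming a normalized two-stack current of the form I = α W_U W_L / (W_L + α W_U); this form and the function names are illustrative assumptions, and α is back-computed from the 1.23 and 1.30 width ratios quoted for the 90-nm NMOS example.

```python
import math


def two_stack_current(w_upper, w_lower, alpha):
    # Assumed normalized model (not the paper's exact equation):
    # I = alpha * W_U * W_L / (W_L + alpha * W_U)
    return alpha * w_upper * w_lower / (w_lower + alpha * w_upper)


def best_split(alpha, w_total=2.0, steps=20000):
    # Brute-force search over W_U on a uniform grid, with W_L = W_T - W_U.
    best_wu = max(
        (w_total * k / steps for k in range(1, steps)),
        key=lambda wu: two_stack_current(wu, w_total - wu, alpha),
    )
    return best_wu, w_total - best_wu


def skew_gain(alpha, w_total=2.0):
    # Relative current gain of the sqrt(alpha) split over a 1:1 split
    # at the same total width.
    w_upper = w_total / (1.0 + math.sqrt(alpha))
    skewed = two_stack_current(w_upper, w_total - w_upper, alpha)
    equal = two_stack_current(w_total / 2, w_total / 2, alpha)
    return skewed / equal - 1.0


for ratio in (1.23, 1.30):
    alpha = ratio ** 2
    w_u, w_l = best_split(alpha)
    print(f"alpha={alpha:.3f}  W_L/W_U={w_l / w_u:.3f}  "
          f"gain={100 * skew_gain(alpha):.2f}%")
```

In this simplified model the grid search recovers the √α ratio, and the gain over equal sizing is on the order of 1% to 2%, which is the same order as the small improvement that motivates the 1:1 choice.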
B. Optimal Two-Stack Sizing Factor
After deciding to use a 1:1 ratio for the two devices in a stack, we must find the amount by which they should be sized up to drive the same current as a single transistor. Defining W = W_U = W_L as the size of each transistor in the stack, we can modify (6) as follows:
For a single transistor, the current is given by (10), where W_e stands for the effective width of this device. From (9) and (10), we have the following relationship:
According to this equation, two stacked transistors should be sized up by a factor of (1 + α) in relation to a single device for the same current drivability. Tables I and II display (1 + α) stack sizing values from this theory and from simulation results, demonstrating the validity of (11). DC simulations were performed to find the correct sizing for transistors in a stack which is capable of conducting the same amount of current as a single unit-sized device. Sizing factors found in simulations were slightly smaller than those predicted by the theory derived above due to effects not captured by the current equation (1), but the trend with technology scaling is nearly identical in both cases.
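The origin of this factor can be sketched with a normalized model. The forms are assumptions for illustration rather than the numbered equations themselves: a two-stack with widths W_U and W_L is taken to carry I = α W_U W_L / (W_L + α W_U), and a single device of width W_e is taken to carry I = α W_e, where α is the V_dd-dependent factor of (3).

```latex
\begin{align*}
W_U = W_L = W
  \quad&\Rightarrow\quad I_{\text{stack}} = \frac{\alpha W^{2}}{W + \alpha W}
  = \frac{\alpha W}{1 + \alpha}, \\
I_{\text{stack}} = I_{\text{single}} = \alpha W_e
  \quad&\Rightarrow\quad W = (1 + \alpha)\, W_e .
\end{align*}
```

As α approaches 1 the factor approaches 2, and it grows with α, so a larger supply voltage within the subthreshold range calls for a larger stack sizing factor in this model.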
Results indicate that stacks need to be sized up by a larger amount in the subthreshold region compared to the strong inversion region. Also note that NMOS stack sizing factors are significantly smaller in strong inversion due to velocity saturation.
III. ARBITRARY STACK SIZES
A. Proof of the Symmetry of the Lowest n - 1 Device Widths in an n-Stack
Building an extensive cell library based on this stack sizing framework requires an extension of our work to stacks of three or more devices. The derivation for the current equation of a three-stack, which follows a method similar to the derivation in Section II-A, gives us the following result:
W1 and W2 stand for the widths of the two lower transistors in the stack of NMOS devices (see notation in Fig. 2 
The vi variables are shorthand for e 0V i=V and stands for e (V 0V )=V .
Step 1) By setting (16) equal to (17), we can show that
Step 2) Next, by setting (15) where W_{n-2∥1} is the parallel combination of transistors 1 through n - 2.
Step 3) Finally, setting (13) equal to (14), we can solve for v_{n-1}:
v_{n-1} = W_{n-1∥1} / (α W_n + W_{n-1∥1}).
We now have the following current equation:
Defining W_T = Σ_{i=1}^{n} W_i and substituting for W_n in (22), we get
An examination of (23) shows that the variables W_1 through W_{n-1} appear symmetrically in the expression. Therefore, when I_n is optimized, W_1 through W_{n-1} must have identical values, since setting the partial derivative of I_n with respect to each W_i, for i = 1 to n - 1, will result in a symmetric set of n - 1 equations.
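The symmetry can be made visible with a simplified normalized model, used here as a hypothetical stand-in for the numbered equations. It assumes that the lowest device carries W_1 (1 - v_1), each intermediate device carries W_i (v_{i-1} - v_i), and the top device carries α W_n v_{n-1}, with v_i = e^{-V_i/V_T} and α treated as a constant.

```latex
\begin{align*}
I_n &= W_1 (1 - v_1) = W_i \,(v_{i-1} - v_i) = \alpha W_n v_{n-1},
  \qquad 1 < i < n, \\
\sum_{i=1}^{n-1} \frac{I_n}{W_i} &= 1 - v_{n-1},
  \qquad \frac{I_n}{\alpha W_n} = v_{n-1}, \\
I_n &= \left[\, \sum_{i=1}^{n-1} \frac{1}{W_i} + \frac{1}{\alpha W_n} \right]^{-1}.
\end{align*}
```

In this form the lower n - 1 widths enter only through the sum of their reciprocals, which plays the role of a parallel combination, so exchanging any two of them leaves the current unchanged.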
B. Optimal n-Stack Sizing Factor
Given the symmetry of the lower n - 1 device sizes, i.e., W_X = W_1 = W_2 = ... = W_{n-1}, we have the following general form for I_n in an n-stack:
Thus, we have proven that the √α sizing ratio holds for the general n-stack case.
As in the two-transistor stack case, the scaling factor of √α leads to a trivial performance benefit (e.g., a 0.3% increase in current through a PMOS or NMOS stack in 90-nm technology with a total stack width of 1 µm), so sizing all stacked transistors equally is the best choice in terms of overall design complexity. Using (24) and following the example of (11), we find that each device in an n-stack should then be scaled up by a factor of [1 + α(n - 1)] to set the effective width of the stack equal to that of a single unit transistor (see Fig. 2). Note that all work done here again applies to PMOS stacks in a similar manner. The discrepancies between the larger sizing factors predicted by this theory and those found with simulations become slightly more pronounced as the stack size grows. For PMOS three-stacks, the difference stays within the 4%-7% range, while for large α values, NMOS sizing factors are overestimated by up to 15% due to second-order effects not captured in (1) and (2).
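The n-stack sizing factor can be checked numerically. This is a minimal sketch under an assumed normalized model, in which the stack current is the reciprocal of the sum of 1/W_i over the lower n - 1 devices plus 1/(α W_n) for the top device, and a single device carries α W. The model, the function names, and the α value (back-computed from the 1.23 width ratio quoted for the 90-nm NMOS stack at 0.2 V) are illustrative assumptions.

```python
def n_stack_current(widths, alpha):
    # widths are ordered from the rail-side device to the output-side device.
    # Assumed model: I = 1 / (sum(1/W_i for lower devices) + 1/(alpha * W_n)).
    lower = sum(1.0 / w for w in widths[:-1])
    return 1.0 / (lower + 1.0 / (alpha * widths[-1]))


def single_device_current(width, alpha):
    # Assumed normalized current of one transistor of the given width.
    return alpha * width


def stack_sizing_factor(n, alpha):
    # Upsizing applied to every device in an equally sized n-stack.
    return 1.0 + alpha * (n - 1)


alpha = 1.23 ** 2  # hypothetical example value
for n in (2, 3, 4):
    factor = stack_sizing_factor(n, alpha)
    i_stack = n_stack_current([factor] * n, alpha)
    i_unit = single_device_current(1.0, alpha)
    print(f"n={n}  factor={factor:.3f}  "
          f"I_stack={i_stack:.4f}  I_unit={i_unit:.4f}")
```

For each n, the equally sized stack scaled by 1 + α(n - 1) carries the same normalized current as a unit-width device in this model.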
IV. SIMULATION RESULTS
A. Critical Path: A Chain of Stacks
We tested our sizing with 130-, 90-, 65-, and 45-nm PTM simulations using simple chains of logic gates that are representative of those that may be found in the critical path(s) of ultra-low-power circuits. In order to isolate the benefits of using the larger stack sizing in subthreshold operation, a consistent beta ratio (PMOS to NMOS width ratio) of 1.5 was employed across all simulations. This nominal value is close to that used in advanced CMOS processes. Stack sizing factors found with DC simulations as described in Section II-B were used. These experimentally determined numbers closely match our theoretical results, as stated earlier.
The logical effort sizing method was used as a straightforward means of quickly optimizing the delay through a logic path [7]. Logical effort is defined as the ratio of the input capacitance of a gate to that of an inverter driving the same amount of output current. Fig. 3 displays logical effort values based on our stack sizing parameters, as well as the corresponding parasitic delay values. Parasitic delay represents the delay of a gate driving no load, and is set by the parasitic junction capacitance.
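The effect of the stack sizing factor on logical effort can be sketched as follows, using the definition above with gate capacitance taken as proportional to transistor width. The beta ratio of 1.5 is the value used in the simulations; the stack sizing factors of 2.0 and 2.5 are hypothetical examples and are not the values from Tables I and II or Fig. 3.

```python
BETA = 1.5  # PMOS-to-NMOS width ratio used across the simulations


def inverter_input_cap(beta=BETA):
    # Unit inverter: NMOS width 1, PMOS width beta.
    return 1.0 + beta


def nand_logical_effort(nmos_stack_factor, beta=BETA):
    # Each input drives one stacked NMOS (upsized by the stack factor)
    # and one parallel PMOS of width beta.
    return (nmos_stack_factor + beta) / inverter_input_cap(beta)


def nor_logical_effort(pmos_stack_factor, beta=BETA):
    # Each input drives one parallel NMOS of width 1 and one stacked PMOS
    # (width beta upsized by the stack factor).
    return (1.0 + beta * pmos_stack_factor) / inverter_input_cap(beta)


# Hypothetical two-stack sizing factors for illustration only.
for label, factor in (("smaller factor", 2.0), ("larger factor", 2.5)):
    print(f"{label}: NAND2 g={nand_logical_effort(factor):.2f}  "
          f"NOR2 g={nor_logical_effort(factor):.2f}")
```

A larger stack sizing factor raises the logical effort of the gate, which is the loading penalty on the previous stage that must be weighed against the higher drive current of the stack.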
While the additional loading on previous stages created by the larger stack sizes here can degrade the performance of some logic chains, critical paths driving substantial fan-out capacitance, and particularly those containing paths dominated by stacks, do benefit from this sizing. The simple circuit illustrated in Fig. 4 is an example of a critical path whose delay is improved with our stack sizing framework. The fan-out inverter widths were kept constant across all experiments and their loading effect was taken into account through the branching factor [7]. The minimum width (i.e., the NMOS width in the unit-sized inverter) was held at 1 µm. The gate capacitance of the inverters indicated in Fig. 4 served as the input and output capacitance parameters for the logical effort calculations (C_in and C_out, respectively).
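The path-level calculation in the logical effort method [7] can be sketched as follows. This is a minimal illustration of the standard formulas, path effort F = G * B * H with H = C_out / C_in, and minimum path delay D = N * F^(1/N) + P in normalized delay units. All numeric values in the example are hypothetical and are not taken from Fig. 3 or Fig. 4.

```python
import math


def min_path_delay(logical_efforts, parasitic_delays, branching, c_in, c_out):
    # logical_efforts and parasitic_delays hold one entry per stage.
    n_stages = len(logical_efforts)
    path_logical_effort = math.prod(logical_efforts)   # G
    electrical_effort = c_out / c_in                   # H
    path_effort = path_logical_effort * branching * electrical_effort  # F
    # Delay is minimized when every stage bears the same effort F^(1/N).
    return n_stages * path_effort ** (1.0 / n_stages) + sum(parasitic_delays)


# Hypothetical four-stage path of alternating NAND2/NOR2 gates.
efforts = [1.6, 1.9, 1.6, 1.9]
parasitics = [2.0, 2.0, 2.0, 2.0]
delay = min_path_delay(efforts, parasitics, branching=4.0, c_in=1.0, c_out=8.0)
print(f"normalized minimum path delay: {delay:.2f}")
```

Because the stack sizing factor changes both the logical effort and the drive current of each gate, comparing sizing schemes requires recomputing the path delay with the matching effort values for each scheme.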
Delays were found for both the path through this circuit consisting entirely of stacks (the "Stacks" path), and that containing no stacks (the "Fast" path), using the worst-case input pattern for each. Critical path delay results for V_dd = 0.3 V and V_dd = 0.2 V are shown in Tables III and IV, respectively. As indicated here, the critical path shifts from the "Stacks" path to the "Fast" path when using the optimized subthreshold sizing, and the critical delay is consistently reduced. Also, note that the 1.2-V sizing scheme was optimal when operating in strong inversion, with improvements over subthreshold sizing performance ranging from less than 1% to 12.3%.
In logic paths where there are not chains of stacks driving each other in sequence, the larger subthreshold stack sizing becomes less beneficial, or even detrimental in terms of performance, due to its loading effect on the previous stage. For instance, if inverters are inserted between each NAND/NOR pair in the circuit in Fig. 4 , improvements in subthreshold with our larger stack sizes are reduced to 1%. In a chain of just NAND gates, the smaller stack sizes used in superthreshold were generally better choices across all supply levels. In detailed optimization schemes, care must be taken to account for transient effects, including the variance of load capacitances as operating conditions change. DC sizing schemes such as the one presented here provide us with intuition about the devices we are constructing circuits with, and a starting point for thorough optimization procedures.
V. CONCLUSION
We have presented a new stack sizing framework for circuits operating in the subthreshold region. A closed-form solution for the optimal width ratio between different devices within a stack, as well as the sizing factor for stacked transistors, was presented and shown to closely match experimental results. Our optimization scheme resulted in performance gains of more than 10% in simulations of critical paths where internal node capacitance is not dominated by the increased stack sizing factors.
