Technology scaling demands a decrease in both V dd and V t to sustain historical delay reduction while restraining active power dissipation. Scaling of V t, however, leads to a substantial increase in sub-threshold leakage power, which is expected to become a considerable constituent of the total dissipated power. It has been observed that a stack of two off devices has smaller leakage current than a single off device. In this paper we present a model that predicts the scaling nature of this leakage reduction effect. Device measurements are presented to verify the model's accuracy. Use of the stack effect for leakage reduction and other implications of this effect are discussed.
INTRODUCTION
To limit the energy and power increase in future CMOS technology generations, the supply voltage (V dd) will have to continually scale [1]. The amount of energy reduction depends on the magnitude of V dd scaling [2]. Along with V dd scaling, the threshold voltage (V t) of MOS devices will have to scale to sustain the traditional 30% gate delay reduction. These V dd and V t scaling requirements pose several technology and circuit design challenges [3] [4] [5].
One such challenge is the rapid increase in sub-threshold leakage power due to V t scaling. Should the present scaling trend continue, sub-threshold leakage power is expected to become a considerable constituent of the total dissipated power [6]. In such a system it becomes crucial to identify techniques to reduce this leakage power component. It has been shown previously that a stack of two off devices has significantly lower sub-threshold leakage than a single off device [7] [8] [9]. This concept of stack effect is illustrated in Figure 1. In the rest of the paper the term leakage refers to sub-threshold leakage.
In this paper we present a model that predicts the stack effect factor, which is defined as the ratio of the leakage current in one off device to the leakage current in a stack of two off devices. Model derivation based on device fundamentals and verification of the model through statistical device measurements from 0.18µm and 0.13µm technology generations are presented in Section 2. The scaling nature of the stack effect leakage reduction factor is also discussed in the next section.
One solution to the problem of ever-increasing leakage is to force a non-stack device to a stack of two devices without affecting the input load, as shown in Figure 2. By ensuring iso-input load, the previous gate's delay and the switching power will remain unchanged. Logic gates after stack forcing will have reduced leakage power but will incur a delay penalty, similar to replacing a low-V t device with a high-V t device in a dual-V t design [10]. In a dual-V t design the low-V t devices are used in performance-critical paths and the high-V t devices in the rest [11]. Usually a significant fraction of the devices can be high-V t or forced-stack, since a large number of the paths are non-critical. This will reduce the overall leakage power of the chip without impacting operating clock frequency. In Section 3 we discuss the stack forcing method to reduce leakage in paths that are not performance critical. This stack forcing technique can be used either in conjunction with dual-V t or to reduce the leakage in a single-V t design. Differences between achieving leakage reduction through forced-stacks and through channel length increase are discussed in Section 4. Conclusions and future work are described in Section 5.
MODEL FOR STACK EFFECT FACTOR
Let I 1 be the leakage of a single device of unit width in the off state with V gs = V bs = 0 V and V ds = V dd. If the gate drive, body bias, and drain-to-source voltage are reduced by ∆V g, ∆V b, and ∆V d respectively from these conditions, the leakage will reduce to

$$I_{device} = I_1 \cdot 10^{-\frac{1}{S}\left[\Delta V_g + \lambda_d \Delta V_d + k_\gamma \Delta V_b\right]}$$

where S is the sub-threshold swing, λ d is the drain-induced barrier lowering (DIBL) factor, and k γ is the body effect coefficient. The above equation assumes that the resulting V ds > 3kT/q [6] . For a two-device stack shown in Figure 3 , a steady state condition will be reached when the intermediate node voltage V int approaches V x such that the leakage currents in the upper and lower devices are equal.
Under this condition, the leakage currents in the upper and lower devices can each be expressed in terms of V x using the above equation, and equating them gives the leakage of the two-stack. The resulting stack effect factor is governed by U, the universal two-stack exponent, which depends only on the process parameters λ d and S and the design parameter V dd. Once these parameters are known, the reduction in leakage due to a two-stack can be determined from the above model. It is essential to point out that the model assumes the intermediate node voltage to be greater than 3kT/q.

To confirm the model's accuracy we performed device measurements on test structures fabricated in 0.18µm and 0.13µm process technologies. Results discussed in the rest of the section are from NMOS device measurements, but similar results hold true for PMOS devices as well. The stack effect factor depends strongly on λ d, as suggested by the model. Also, a decrease in the channel length (L) will increase λ d in a given technology [12]. So, any increase in the leakage of a single device due to a decrease in L will not increase the leakage of a two-stack at the same rate. This is illustrated in Figure 5, where the increase in two-stack leakage is at a slower rate than that of a single device. Figure 6 illustrates the average stack effect factor for the nominal channel devices in both 0.18µm and 0.13µm technology generations, obtained from both the measurements and the model. The increase in stack effect factor at a given V dd with technology scaling is attributed to the increase in λ d, which is predicted by the analytical model. The higher stack effect factor for the low-V t device in the 0.13µm technology generation is due to the same effect.
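The steady-state condition described above can be worked through explicitly. The following is a sketch under stated assumptions rather than a reproduction of the paper's own equations: it assumes the single-device leakage takes the form I 1 · 10^(−[∆V g + λ d ∆V d + k γ ∆V b]/S), uses w u and w l for the upper and lower device widths, and introduces α and X (the stack effect factor) as illustrative symbols. In the last step k γ is neglected relative to 1 + 2λ d, so that U depends only on λ d, S, and V dd as stated in the text.

```latex
\begin{align}
% Upper device: gate and body at 0 V, source at V_x, drain at V_dd,
% so \Delta V_g = \Delta V_b = \Delta V_d = V_x
I_{u} &= w_u I_1 \cdot 10^{-\frac{(1 + \lambda_d + k_\gamma)\,V_x}{S}} \\
% Lower device: V_gs = V_bs = 0 and V_ds = V_x, so \Delta V_d = V_{dd} - V_x
I_{l} &= w_l I_1 \cdot 10^{-\frac{\lambda_d\,(V_{dd} - V_x)}{S}} \\
% Steady state: I_u = I_l
V_x &= \frac{\lambda_d V_{dd} + S \log_{10}(w_u / w_l)}{1 + 2\lambda_d + k_\gamma} \\
% Equal widths (w_u = w_l) and k_\gamma neglected
V_x &\approx \frac{\lambda_d V_{dd}}{1 + 2\lambda_d}, \qquad
\alpha = \frac{\lambda_d}{1 + 2\lambda_d} \\
X &= \frac{I_{device}}{I_{stack}} = 10^{U}, \qquad
U = \frac{\lambda_d V_{dd}}{S}\,(1 - \alpha)
\end{align}
```

Under these assumptions a larger λ d raises both V x and U, which is consistent with the observation that the stack effect factor grows as DIBL worsens.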
In the 0.13µm generation, the low-V t device will dominate chip leakage. Figure 7 shows the scaling of stack effect from a 0.18µm device to a 0.13µm low-V t device based on device measurements under different V dd scaling scenarios. Since λ d is expected to increase due to worsening device aspect ratio, and since V dd scaling will slow down due to related challenges [13], the stack effect leakage reduction factor is expected to increase with technology scaling. The predicted scaling of stack effect factor from 0.18µm to 0.06µm is depicted in Figure 8.
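The dependence of the stack effect factor on λ d and V dd can be explored numerically. The sketch below assumes equal-width devices, a negligible body effect coefficient, and a stack effect factor of the form 10^U with U = (λ d V dd / S)(1 − α) and α = λ d / (1 + 2λ d). The function names and all parameter values are illustrative assumptions, not measured data from the test structures.

```python
# Illustrative two-stack model: equal widths, body effect neglected (assumptions).

KT_OVER_Q_300K = 0.02585  # thermal voltage in volts at about 300 K


def intermediate_voltage(lambda_d, vdd):
    """Steady-state intermediate node voltage V_x of the two-stack, in volts."""
    return lambda_d * vdd / (1.0 + 2.0 * lambda_d)


def two_stack_exponent(lambda_d, vdd, swing):
    """Exponent U; `swing` is the sub-threshold swing in volts per decade."""
    alpha = lambda_d / (1.0 + 2.0 * lambda_d)
    return lambda_d * vdd / swing * (1.0 - alpha)


def stack_effect_factor(lambda_d, vdd, swing):
    """Ratio of single-device leakage to two-stack leakage under this model."""
    return 10.0 ** two_stack_exponent(lambda_d, vdd, swing)


def model_is_valid(lambda_d, vdd, kt_over_q=KT_OVER_Q_300K):
    """The model assumes the intermediate node voltage exceeds 3kT/q."""
    return intermediate_voltage(lambda_d, vdd) > 3.0 * kt_over_q


if __name__ == "__main__":
    swing = 0.100  # hypothetical 100 mV/decade
    for lambda_d in (0.05, 0.10, 0.15):  # hypothetical DIBL factors (V/V)
        for vdd in (1.0, 1.5):  # hypothetical supply voltages
            x = stack_effect_factor(lambda_d, vdd, swing)
            ok = model_is_valid(lambda_d, vdd)
            print(f"lambda_d={lambda_d:.2f} Vdd={vdd:.1f} V  X={x:7.1f}  valid={ok}")
```

Sweeping the hypothetical values shows the trend described in the text: at a fixed V dd the factor grows with λ d, and for small λ d V dd products the 3kT/q validity condition is no longer met.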
This scaling nature of stack effect factor makes it a powerful technique for leakage reduction in future technologies. In the next section we describe a circuit technique for taking advantage of stack effect to reduce leakage at a functional block level.
LEAKAGE REDUCTION USING FORCED-STACKS
As shown earlier, a stack of two devices that are off has significantly lower leakage than a single off device. However, due to the iso-input load requirement and the stacking of devices, the drive current of a forced-stack gate will be lower, resulting in increased delay. So, stack forcing can be used only for paths that are non-critical, just like using high-V t devices in a dual-V t design [10] [11]. Forced-stack gates will have a slower output edge rate, similar to gates with high-V t devices. Figure 9 illustrates the use of techniques that provide delay-leakage trade-off. As demonstrated in the figure, paths that are faster than required can be slowed down, which will result in leakage savings. Such trade-offs are valid only if the resulting path still meets the target delay. Figure 10 shows the delay-leakage trade-off due to n-stack forcing of an inverter with fan-out of 1 under iso-input load conditions in a dual-V t 0.13µm technology [14].
By properly employing forced-stacks one can reduce standby and active leakage of non-critical paths even if a dual-V t process is not available. This method can also be used in conjunction with dual-V t. Stack forcing provides wider coverage in the delay-leakage trade-off space, as illustrated in Figure 10.
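The idea of slowing down paths that are faster than required can be sketched as a simple slack-driven assignment. The code below is a hypothetical illustration, not the assignment method of this paper: it treats a single path in isolation, uses a greedy heuristic that ranks gates by leakage saved per unit of added delay, and all gate names, delays, and leakage values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    delay: float            # delay without stack forcing
    leakage: float          # leakage without stack forcing
    stacked_delay: float    # delay after stack forcing (iso-input load)
    stacked_leakage: float  # leakage after stack forcing


def force_stacks(gates, target_delay):
    """Greedily stack-force gates on one path while it still meets target_delay.

    Returns (names of forced gates, resulting path delay, resulting leakage).
    """
    path_delay = sum(g.delay for g in gates)
    path_leakage = sum(g.leakage for g in gates)
    forced = []

    def saving_per_delay(g):
        penalty = g.stacked_delay - g.delay
        saving = g.leakage - g.stacked_leakage
        return saving / penalty if penalty > 0 else float("inf")

    for g in sorted(gates, key=saving_per_delay, reverse=True):
        penalty = g.stacked_delay - g.delay
        saving = g.leakage - g.stacked_leakage
        # Force the stack only if leakage drops and timing is still met.
        if saving > 0 and path_delay + penalty <= target_delay:
            forced.append(g.name)
            path_delay += penalty
            path_leakage -= saving
    return forced, path_delay, path_leakage


if __name__ == "__main__":
    # Hypothetical path of three gates, arbitrary units.
    path = [
        Gate("g1", 10.0, 100.0, 14.0, 10.0),
        Gate("g2", 12.0, 80.0, 18.0, 8.0),
        Gate("g3", 8.0, 50.0, 11.0, 5.0),
    ]
    print(force_stacks(path, target_delay=38.0))
```

A real design would have to account for gates shared between paths and for edge-rate effects on downstream gates, which this single-path sketch ignores.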
Functional blocks have naturally stacked gates such as NAND, NOR, or other complex gates. By setting proper input vectors to maximize the number of natural stacks in the off state during standby, the standby leakage of a functional block can be reduced. Since it is not possible to force all natural stacks in the functional block to be in the off state, the overall leakage reduction at the block level will be far less than the stack effect leakage reduction possible at the level of a single logic gate [7]. With stack forcing the potential for leakage reduction will be higher. Figures 11(a) and 11(b) illustrate such an example.
Forcing a stack in both the n- and p-networks of a gate will guarantee leakage reduction due to stacking, independent of the input logic level. Such an example is shown in Figure 11(c). To reiterate, stack forcing can be applied to a path only if the increase in delay due to stacking does not violate timing requirements. Gates that can force stack effect independent of their input vectors will automatically go into leakage reduction mode when the intermediate node of the stack reaches the steady-state voltage. This will boost standby and active leakage reduction since no specific input vector needs to be applied.
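The input-vector approach for natural stacks can be illustrated with a small exhaustive search. This is a minimal sketch under simplifying assumptions: the three-gate netlist is hypothetical, only two-input NAND and NOR gates are modeled, and a natural stack is counted as off only when both series devices are off (both inputs 0 for the NAND n-stack, both inputs 1 for the NOR p-stack).

```python
from itertools import product

# Hypothetical netlist in topological order: (gate type, input A, input B, output).
NETLIST = [
    ("nand", "a", "b", "n1"),
    ("nor", "b", "c", "n2"),
    ("nand", "n1", "n2", "y"),
]
PRIMARY_INPUTS = ["a", "b", "c"]


def count_off_stacks(netlist, assignment):
    """Count gates whose two-device natural stack has both devices off."""
    values = dict(assignment)
    off_stacks = 0
    for kind, in_a, in_b, out in netlist:
        a, b = values[in_a], values[in_b]
        if kind == "nand":
            values[out] = 1 - (a & b)
            off_stacks += int(a == 0 and b == 0)  # series NMOS devices both off
        elif kind == "nor":
            values[out] = 1 - (a | b)
            off_stacks += int(a == 1 and b == 1)  # series PMOS devices both off
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    return off_stacks


def best_standby_vector(netlist, inputs):
    """Try every input vector and return the one with the most off stacks."""
    best_bits = max(
        product((0, 1), repeat=len(inputs)),
        key=lambda bits: count_off_stacks(netlist, zip(inputs, bits)),
    )
    best = dict(zip(inputs, best_bits))
    return best, count_off_stacks(netlist, best)


if __name__ == "__main__":
    print(best_standby_vector(NETLIST, PRIMARY_INPUTS))
```

For this hypothetical block no input vector turns off all three natural stacks at once, which mirrors the limitation noted above. Exhaustive search grows exponentially with the number of inputs, so it is only practical for small blocks.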
STACK EFFECT VS. CHANNEL LENGTH INCREASE
It is possible to facilitate delay-leakage trade-off by increasing the channel length of devices [15] that are in non-critical paths. To maintain iso-input load, the channel width has to be reduced as the channel length is increased. Figure 12 shows the mean leakage reduction achievable by increasing the channel length. Mean leakage is defined as the geometric mean of the leakages with and without variation in critical dimension around the channel length of interest. This mean leakage is expected to model the leakage of a chip that has within-die variation in critical dimension. In Figure 12 the channel length of interest is given by η × 0.18µm, and the stack leakage is for a stack of two devices with η of 1 and w u = w l = ½w. As is clear from Figure 12, the channel length has to be increased to 3 times the nominal channel length to match the mean leakage of a two-stack of 0.18µm devices. Such a large increase is attributed mainly to the reverse short channel effect that is present due to halo doping [13], where V t reduces with increase in channel length.

Figure 13 shows the energy-delay trade-off of an inverter under different configurations with fan-out of 1 and iso-input load. The simulation-based comparison clearly shows that the two-stack configuration's delay is less than the delay due to increasing channel length, especially when compared to the iso-standby-leakage (η≈3) configuration. As summarized in Figure 14, η of 2 has about the same delay as the two-stack with η of 1 but with a 2.3X higher mean leakage. On the other hand, η of 3 provides about the same mean leakage as the two-stack but with 60% higher delay.
CONCLUSIONS
We presented a model based on device fundamentals that predicts the scaling nature of stack effect based leakage reduction. Device measurements verified the model's accuracy across different temperatures, channel lengths, body biases, supply voltages, and process technologies. Modes for using stack forcing to reduce standby and active leakage components were discussed, and the advantage of stack forcing over channel length increase for delay-leakage trade-off was demonstrated. Stack forcing assignment for standby and active leakage reduction at a functional block level, with and without dual-V t, will be explored in the future.
Fig. 11(c). If a gate can have its input as either "0" or "1" and still force stack effect then that gate will have reduced active leakage. The more the number of inputs that can be either "0" or "1" the higher the probability that stack effect will reduce active leakage.
