INTRODUCTION
MOSFET scaling into the deep sub-100nm regime has resulted in a significant increase in leakage power consumption. In particular, in 45nm technology and beyond, leakage power consumption will catch up with, and may even dominate, dynamic power consumption [1] . This makes leakage power control an indispensable component of low-power design in the nanometer era. Sub-threshold leakage, gate leakage and band-to-band tunneling (BTBT) leakage are the three main components of the total leakage power.
Currently, most runtime leakage control (RTLC) techniques are used only for standby leakage control: they are applied when circuits are idle for long periods. During runtime, however, some circuits carry continuous workloads and have only short idle periods. In this case, conventional RTLC techniques cannot achieve any leakage saving. Unfortunately, leakage power consumption is particularly severe in this case, because the die temperature is high when circuits are active. The positive feedback loop between active-mode leakage and high die temperature can even cause a phenomenon called "thermal runaway" and lead to catastrophic thermal failure [2] . Thus, more aggressive leakage control is desirable, especially for active circuits with only short idleness. Figure 1 illustrates the operation of conventional and aggressive RTLC.
Although aggressive leakage control is desirable, its main obstacle is that most current RTLC techniques incur large energy and delay overheads when performing a mode transition. To demonstrate this, we first introduce two design parameters, energy breakeven time (EBT) and wakeup time (WUT) [3] , which quantify the energy overhead and the delay overhead, respectively. EBT is defined as the minimum time a circuit must stay in low-leakage mode for the energy saving to offset the energy overhead of the mode transition. WUT is defined as the minimum time required for a circuit to fully recover from low-leakage mode to normal mode. Because of EBT and WUT, leakage control techniques can only be applied when the idleness (T idle ) in the circuit workload satisfies:
$$T_{\mathrm{idle}} \geq \mathrm{EBT} + \mathrm{WUT} \tag{1}$$
We call the sum EBT + WUT the minimum idle time (MIT). If leakage control is applied to any idle period shorter than MIT, it results in either negative net energy saving or circuit malfunction. Tschanz et al. [3] have shown that, in 130nm technology for a 4 GHz ALU, the MIT of power gating and body biasing is roughly 82 and 104 cycles, respectively. Such a long MIT prevents the conventional techniques from exploiting short idleness. To enable aggressive RTLC, MIT has to be minimized. EBT accounts for the larger part of MIT, which indicates that the energy overhead is the more pressing problem.
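The condition in Equation 1 can be sketched as a simple check. This is a minimal illustration only: the function names are hypothetical, and the EBT and WUT values in the usage example are made-up numbers, not measurements from [3].

```python
def minimum_idle_time(ebt_cycles: float, wut_cycles: float) -> float:
    """MIT = EBT + WUT, the right-hand side of Equation 1 (in clock cycles)."""
    return ebt_cycles + wut_cycles


def can_apply_leakage_control(t_idle_cycles: float,
                              ebt_cycles: float,
                              wut_cycles: float) -> bool:
    """Return True when the idle period is long enough to justify a mode transition."""
    return t_idle_cycles >= minimum_idle_time(ebt_cycles, wut_cycles)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    ebt, wut = 60, 4
    for t_idle in (10, 63, 64, 200):
        print(t_idle, can_apply_leakage_control(t_idle, ebt, wut))
```

An idle period shorter than MIT is rejected even if it exceeds EBT alone, because the circuit must also have time to wake up.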
Several studies have proposed methods to reduce the delay overhead [4] . For energy overhead reduction, Pakbaznia et al. [5] proposed "charge recycling", which recycles the charging energy between the virtual ground and the virtual supply of power gating. However, their scheme requires both ground gating and supply gating, as well as a large number of switches inserted between the virtual ground and the virtual supply. This incurs extra area and power consumption and complicates implementation. One existing runtime technique for aggressive leakage reduction is dynamic V th scaling (DVTS) [6] . However, DVTS requires a complicated control circuit and sacrifices circuit performance for leakage saving.
Our study in [7] first demonstrated that, by using small biasing voltages, the energy overhead of body biasing can be dramatically reduced while its leakage reduction remains significant. In [8] , we utilized this phenomenon to enable joint dynamic and leakage power control. In this paper, we utilize the phenomenon demonstrated in Figure 2 and propose two novel V th hopping techniques: workload-adaptive V th hopping (WAVTH) and hierarchical V th hopping (HIVTH). These two techniques can dramatically reduce the energy overhead of conventional V th hopping, and thus enable aggressive runtime leakage control. Compared with DVTS, they are easy to implement and do not affect circuit performance. Compared with our previous studies in [7, 8] , this paper performs a comprehensive study of overall leakage saving in a runtime environment, considering the impact of input and temperature variations. In addition, we derive a more accurate model for V th hopping, and we extend the study to both reverse body biasing and forward body biasing implementations.
This paper is organized as follows. Section 2 presents the ideas of WAVTH and HIVTH. Section 3 derives an accurate model for general V th hopping. Based on this model, Section 4 presents an algorithm to determine the optimum design points of WAVTH and HIVTH. Section 5 shows detailed experimental results. Section 6 concludes the paper.
NOVEL V th HOPPING TECHNIQUES
Both WAVTH and HIVTH are based on the V th hopping technique [9] . The basic concept of V th hopping is to adjust the body bias to generate a low V th for active circuits to guarantee performance, and a high V th for idle circuits to reduce leakage. As shown in Figure 3 , V P L and V NL (V P H and V NH ) are the biasing voltages for active (idle) circuits. They can either be provided by off-chip voltage sources or generated by on-chip voltage converters. Narendra et al. [10] have shown that 24 sets of voltage converters and virtual body routing consume 2% of the total chip area and 1% of the total chip power consumption. S 1 to S 4 are control transistors. When the target circuit is working, the control signal "HOPPING" is logic '0', so S 2 and S 4 are on, and S 1 and S 3 are off. When V th hopping is applied, "HOPPING" is set to '1'. S 2 and S 4 are turned off, and S 1 and S 3 are turned on. The body of the target circuit is then connected to V P H and V NH to increase V th , and thus reduce leakage.
V th hopping applies to either reverse or forward body biasing (RBB or FBB) schemes. From the standpoint of power analysis, the only difference is the value assignment of V P L , V NL , V P H and V NH in Figure 3 :
In the following study, we do not specify the RBB or FBB scheme. The model derived from Figure 3 is a general model applicable to both RBB and FBB. Also, we only demonstrate the modeling process for PMOS V th hopping. The NMOS case can be studied in a similar way. The experiments in Section 5 were conducted on both PMOS and NMOS with the RBB scheme. Finally, in this paper we focus on tuning the control circuits of V th hopping, and do not discuss the generation of control signals such as the "HOPPING" signal. These control signals can be generated by idleness monitoring or prediction [11] .
Workload-adaptive V th Hopping (WAVTH)
The main energy overhead for V th hopping mode transition is due to the charging energy for body capacitance (C B ). To fully charge up C B , the energy overhead is:
To compensate for this energy overhead, the target circuit needs to stay in the high-V th mode for a long period, resulting in long EBT and MIT. For short idleness (T idle ), if we charge C B only to a small voltage (V f ) proportional to the length of the idleness, the energy overhead can be reduced to:
Although leakage reduction degrades at the same time, the reduction of energy overhead is much more significant, as demonstrated in Figure 2 . This is the basic idea of WAVTH. Figure 4 illustrates the actual implementation of WAVTH (PMOS only). S 1 and S 2 still control the mode switching from low-V th to high-V th mode. The workload-adaptation feature is implemented by a PMOS transistor S 0 . The gate voltage of S 0 is fixed, so S 0 functions as a current source (regulator). The biasing voltage source V P H is designed to enable 20× leakage reduction.
The working mechanism of WAVTH is as follows. Once the target circuit is switched to the high-V th mode, S 1 is turned on and S 2 is turned off. S 0 then starts to charge up the body voltage (V B ) gradually. The charging process stops when the circuit is switched back to low-V th mode. For short idleness, V B will be small since the charging period is short. This reduces the energy overhead and thus MIT. For long idleness, V B will be fully charged up to V P H to achieve 20× leakage reduction. Therefore, WAVTH enables aggressive leakage reduction and maintains 20× standby leakage reduction. Figure 5 illustrates the body voltage charging process for different idleness.
The design goal of WAVTH is to minimize its MIT, and thus maximize aggressive leakage reduction. We should identify the minimum-MIT point by controlling the charging speed of S 0 . Because S 0 charges slowly, its width is very small, so its extra area cost is almost negligible. The optimization of S 0 will be studied in Section 4.
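[Figure 5: Body voltage charging process for different idleness. Traces: workload (STANDBY/ACTIVE) and body voltage, with levels V PL , V PH , V f1 , V f2 and V f3 .]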
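The workload-adaptive behavior described above can be sketched numerically. This is a rough illustration under stated assumptions, not the model of Section 3: S 0 is treated as an ideal constant-current source, the body as a fixed capacitance C B , and all names and numeric values are hypothetical.

```python
def wavth_body_voltage(t_idle_s: float,
                       v_pl: float,
                       v_ph: float,
                       i_charge_a: float,
                       c_body_f: float) -> float:
    """Body voltage V_f reached after charging for t_idle_s seconds.

    Assumption: ideal constant current i_charge_a into capacitance c_body_f,
    so V_B ramps linearly from v_pl and saturates at v_ph.
    """
    ramp = v_pl + i_charge_a * t_idle_s / c_body_f
    return min(ramp, v_ph)


if __name__ == "__main__":
    # Hypothetical values: 0.9 V active bias, 1.4 V standby bias,
    # 100 uA charging current, 1 pF body capacitance.
    for t_idle in (0.0, 1e-9, 2.5e-9, 1e-6):
        v_f = wavth_body_voltage(t_idle, 0.9, 1.4, 1e-4, 1e-12)
        print(f"idle {t_idle:.1e} s -> V_f = {v_f:.3f} V")
```

Short idle periods end with a small V f (small energy overhead), while long ones saturate at V P H , which mirrors the continuous trade-off that WAVTH exploits.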
Hierarchical V th Hopping (HIVTH)
The idea of hierarchical V th hopping (HIVTH) is to have two different biasing voltage sources: a large one (V P H ) and a small one (V P HS ). For short idleness, V P HS will be applied to reduce energy overhead:
(5)
For long idleness, V P H can be applied to ensure 20× leakage reduction. As we will show in Section 4, adopting V P HS can reduce the energy overhead by up to 9×. As such, HIVTH achieves aggressive leakage reduction and maintains 20× standby leakage reduction. In essence, HIVTH reduces energy overhead in a discrete fashion, while WAVTH does so in a continuous manner.
Figure 6 shows the actual implementation of HIVTH. S 1 and S 2 control the mode switching from low-V th mode to high-V th mode. S A and S B control the mode switching between aggressive leakage reduction and standby leakage reduction. When the target circuit is in active mode, S A is off and S B is on. The biasing voltage is connected to V P HS to reduce energy overhead. In standby mode, S A is on and S B is off. The biasing voltage is connected to V P H to provide 20× leakage reduction. Figure 7 demonstrates the change of body voltage for different idleness when HIVTH is applied. To maximize aggressive leakage reduction, the value of V P HS should be optimized. This will be studied in Section 4.
HIVTH can be applied in a hierarchical manner to reduce leakage with finer granularity. As shown in Figure 8 , S A , S B , S C and S D are global switches, controlled by the system-level power management unit. A 1 (A 1 ) to A 3 (A 3 ) are three sets of local switches, controlled by local idleness monitors or predictors. When the system determines that the entire circuit is in standby mode or reduced-performance mode, S A and S C are turned on to apply standby leakage reduction (V P H ) or DVTS to the entire circuit. If the system determines that the entire circuit is in full-performance mode, S B and S D are turned on to enable aggressive leakage reduction (V P HS ). Thus, if one of the sub-circuits has short idleness, the corresponding local switches are set to apply V P HS to reduce its leakage, while the other sub-circuits continue to work at full performance. The very low overhead of applying V P HS makes this fine-grained leakage reduction scheme feasible.
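The switch settings described for Figure 6 can be summarized as a small truth-table sketch. The function names and the string labels are hypothetical; only the on/off relationships stated in the text are encoded, for the PMOS side only.

```python
from typing import Dict


def hivth_switch_states(hopping: bool, standby: bool) -> Dict[str, bool]:
    """On/off state of each switch (True = on) for the PMOS side of HIVTH.

    hopping: the "HOPPING" control signal (True = high-Vth mode requested).
    standby: True when the circuit is in standby mode, False in active mode.
    """
    return {
        "S1": hopping,        # connects the body to the high-Vth bias rail
        "S2": not hopping,    # connects the body to V_PL (low-Vth mode)
        "SA": standby,        # selects the large source V_PH
        "SB": not standby,    # selects the small source V_PHS
    }


def hivth_bias_source(hopping: bool, standby: bool) -> str:
    """Name of the voltage source that drives the body."""
    if not hopping:
        return "V_PL"
    return "V_PH" if standby else "V_PHS"


if __name__ == "__main__":
    for hopping in (False, True):
        for standby in (False, True):
            print(hopping, standby, hivth_bias_source(hopping, standby))
```

A short idle period in active mode therefore sees only V P HS , while standby mode falls back to V P H for the full standby leakage reduction.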
MODELING
A model characterizing the energy and delay overhead of WAVTH and HIVTH is needed, to determine their optimum design points. Since both techniques are intended for short idleness exploitation, the accuracy of this model is critical. For this purpose, we precisely analyze the complete mode transition process of V th hopping.
A General Model for V th hopping
The models of WAVTH and HIVTH can be derived from the general V th hopping model by changing a few parameters. Subthreshold leakage and BTBT tunneling leakage (I t , including body-to-VDD and body-to-GND tunneling leakage) are modeled as two V B -controlled exponential current sources [12] :
where I s and I t are the original subthreshold leakage and BTBT leakage of the circuit without V th hopping. B s and B t are the reduction exponents for subthreshold and BTBT leakage, respectively. The switch transistors (S 0 , S 1 and S 2 ) are modeled as two V B -controlled linear current sources:
I 1 is the charging current to the body when "HOPPING" is logic '1'. I 2 is the discharging current from the body when "HOPPING" is '0'. I 1 is controlled by transistor S 0 (Figure 4) in WAVTH, and by S 1 (Figure 6) in HIVTH.
Three Phases of V th hopping Mode Transition
Figure 10 shows a complete mode transition of V th hopping. It consists of the following three phases.
Phase One: Body Charging Period
Once V th hopping is applied, the biasing voltage source V P X starts to charge up the body voltage V B . As V B increases, subthreshold leakage decreases, while BTBT tunneling leakage increases according to Equation 6 . Meanwhile, the charging current I 1 becomes weaker according to Equation 7 . Assume that at the end of phase one (time T 1 ), the body voltage has been charged to V f . (For WAVTH, V f is a floating value, determined by W 1 . For HIVTH, V f equals V P HS .) Consider I 1 and V B as functions of time t. The charging process can be characterized by:
Solving Equation 8 gives:
With V B (t), we can estimate the leakage currents as functions of time using Equation 6. Furthermore, we can estimate the leakage energy consumption (E 1 ) in phase one by integrating the leakage current from time zero to T 1 :
Phase Two: Stabilized Period
If V B is fully charged up to V P X , the target circuit enters the second phase. In this phase, V B is stable, so the leakage current remains static at its value for V B = V P X ; its subthreshold term depends exponentially on B s (V P X − V P L ).
Assuming that the end point of phase two is T 2 , we can estimate the leakage energy consumption (E 2 ) in phase two:
Phase two does not exist when WAVTH is applied to short idleness, because V B will not be fully charged up in that case, as shown in Figure 5 . WAVTH then enters phase three directly.
Phase Three: Body Discharging Period
When the target circuit needs to go back to normal mode, the "HOPPING" signal is set back to logic '0'. This triggers the third phase. The body voltage (V B ) starts to discharge through current source I 2 . As a result, the subthreshold and BTBT tunneling leakage return to their normal values. This process can be characterized by:
where T 3 is the end time of phase three, or the time point at which the target circuit fully wakes up. Most designs have a constraint on the maximum WUT (T W MAX ) to avoid performance penalty:
Solving Equation 13 gives:
The leakage energy consumption (E 3 ) in phase three is:
In summary, the total leakage saving over all three phases is:
The energy overhead for charging the body capacitance is:
(18)
The energy overhead for switching the switch transistors is:
(19)
where g is a technology parameter denoting the gate capacitance per unit width. For each mode transition, we need to ensure that the net energy saving, that is, the total leakage saving minus these two energy overheads, is positive.
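The net-saving constraint can be explored with a rough numerical sketch. Everything below is an assumption made for illustration and is not the model derived in this section: leakage is taken as exponential in (V B − V P L ) with subthreshold falling and BTBT rising, the body charges like an RC circuit with time constant `tau`, the charging overhead is taken as C B (V f − V P L )², and the switch-gate overhead and the discharge phase are ignored. All default values are hypothetical.

```python
import math


def leakage_current(v_b, v_pl, i_s, i_t, b_s, b_t):
    """Assumed leakage model: subthreshold falls and BTBT rises exponentially with V_B."""
    dv = v_b - v_pl
    return i_s * math.exp(-b_s * dv) + i_t * math.exp(b_t * dv)


def net_energy_saving(t_idle, v_px, v_pl=0.9, vdd=0.9, i_s=1e-4, i_t=1e-7,
                      b_s=6.0, b_t=4.0, tau=1e-9, c_body=1e-11, steps=2000):
    """Leakage energy saved during t_idle minus the assumed body-charging overhead (joules)."""
    baseline = leakage_current(v_pl, v_pl, i_s, i_t, b_s, b_t)
    dt = t_idle / steps
    saved = 0.0
    for k in range(steps):
        t_mid = (k + 0.5) * dt  # midpoint rule
        # Assumed RC-like charging of the body toward the bias source v_px.
        v_b = v_px - (v_px - v_pl) * math.exp(-t_mid / tau)
        saved += vdd * (baseline - leakage_current(v_b, v_pl, i_s, i_t, b_s, b_t)) * dt
    v_f = v_px - (v_px - v_pl) * math.exp(-t_idle / tau)
    overhead = c_body * (v_f - v_pl) ** 2  # assumed form of the charging overhead
    return saved - overhead


if __name__ == "__main__":
    for v_px in (1.1, 1.4):  # small vs. large bias source (hypothetical values)
        for t_idle in (1e-9, 15e-9, 1e-6):
            print(v_px, t_idle, net_energy_saving(t_idle, v_px))
```

With these made-up parameters, a 15 ns idle period yields a positive net saving for the small bias source but a negative one for the large source, which is the qualitative effect that both WAVTH and HIVTH rely on.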
OPTIMIZATION
Both WAVTH and HIVTH have the same optimization goal: to minimize their MIT, and thus maximize aggressive leakage reduction. Based on previous modeling, we can define the minimum MIT problem as:
where the minimized T 3 is essentially MIT, since it is the minimum time satisfying both the energy and delay constraints. Equation 21 is a three-variable optimization problem with two inequality constraints. We decompose this problem into two steps to reach a fast, near-optimum solution.
Step One: Optimum Body-Voltage Slope
For a given T 3 and V P HS , the body-voltage waveform can take different shapes, as illustrated in Figure 11 . The first shape yields the best leakage saving, since V B stays at V P HS for the longest time. However, its sharp slope requires large W 1 and W 2 . This incurs a large energy overhead that may even overwhelm the leakage saving. The fourth shape requires the smallest W 1 and W 2 , but it also yields the minimum leakage saving. Hence, given T 3 and V P HS , there should exist optimum values of W 1 and W 2 that maximize the net energy saving. We call this the "optimum body-voltage slope" problem, defined as:
Equation 22 is a maximization problem of a two-variable function in a bounded region. Two types of points can potentially be the maxima: 1) relative extrema within the boundary; 2) relative extrema on the boundary (T 3 -T 2 =T W MAX ). The first type of points can be identified by solving:
There is only one boundary since there is only one constraint on the minimum size of W 2 . Based on Equation 15, this boundary ( W 2 ) can be identified by solving:
Then the second type of points can be identified by solving:
Step Two: Minimization of MIT
It can be proven that the optimum body-voltage slope problem (Equation 22 ) is a necessary condition of the original problem (Equation 21). So the decomposition in Step One simplifies, but does not alter, the original problem. Based on Step One, Step Two solves the minimum MIT problem using an exhaustive search over T 3 and V P HS , as described in Algorithm 1. We only show the algorithm for HIVTH. WAVTH can be solved similarly.
In Algorithm 1, M is the minimum number of idle cycles (MIT). "SolveOptBodySlope" is a sub-routine for solving the optimum body-voltage slope problem. W 1 and W 2 are the optimum values obtained by "SolveOptBodySlope". The algorithm starts from 1 clock cycle and increases the number (M ) of idle cycles by one in each iteration. In each iteration, it finds the maximum biasing voltage that satisfies both the energy and delay constraints. (The delay constraint is satisfied in "SolveOptBodySlope" and is not explicitly shown in Algorithm 1.) The algorithm stops when the minimum number of idle cycles satisfying the two constraints is found.
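The search described above might be sketched as follows. This is an interpretation of the prose description rather than a reproduction of Algorithm 1: the function name `minimize_mit`, the callback signature, the candidate voltage list and the toy solver in the usage example are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-in for "SolveOptBodySlope": given T3 and V_PHS, it returns
# (W1, W2, net_energy_saving), or None when the delay constraint cannot be met.
SlopeSolver = Callable[[float, float], Optional[Tuple[float, float, float]]]


def minimize_mit(solve_opt_body_slope: SlopeSolver,
                 v_phs_candidates: Sequence[float],
                 t_clk: float,
                 max_cycles: int) -> Optional[Tuple[int, float, float, float]]:
    """Exhaustive search for the smallest idle-cycle count M with positive net saving.

    Returns (M, V_PHS, W1, W2), or None if no feasible point exists up to max_cycles.
    """
    for m in range(1, max_cycles + 1):          # start at 1 cycle, grow by one
        t3 = m * t_clk
        # Try the largest biasing voltage first, as the text describes.
        for v_phs in sorted(v_phs_candidates, reverse=True):
            result = solve_opt_body_slope(t3, v_phs)
            if result is None:
                continue                        # delay constraint not satisfiable
            w1, w2, net_saving = result
            if net_saving > 0:                  # energy constraint satisfied
                return m, v_phs, w1, w2
    return None


if __name__ == "__main__":
    # Toy solver (illustration only): saving grows with v * t3, overhead with v^2.
    def toy_solver(t3: float, v: float):
        return 1.0, 1.0, v * t3 - 3.0 * v * v

    print(minimize_mit(toy_solver, [0.25, 0.5, 1.0], t_clk=1.0, max_cycles=10))
```

In the toy setting, allowing smaller bias voltages lets the search terminate at fewer idle cycles, which is consistent with the motivation for V P HS .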
EXPERIMENTAL RESULTS
We conducted HSPICE experiments to verify the effectiveness of WAVTH and HIVTH. Experiments one and two were conducted with 32nm predictive technology [13] at 80°C. Experiment three was conducted with 22nm, 32nm and 45nm technologies at different temperatures. ISCAS85 benchmark circuits were used as target circuits. For each circuit, 1024 random input vectors were generated with different activity ratios as typical inputs. V DD is 0.9V. The clock frequency is 2 GHz. The WUT constraint is 1/2 clock cycle.
MIT Reduction
The first experiment verified the effectiveness of the two techniques for MIT reduction. The minimum MIT of each benchmark circuit was obtained by Algorithm 1 and verified by HSPICE simulation. The results are shown in Table 1 . The MIT values are the average MIT over all 1024 vectors, in units of clock cycles. All voltages are in V. In Table 1 , V P H and V NH are the common biasing voltage sources used for standby leakage reduction by normal V th hopping, WAVTH and HIVTH. With WAVTH, however, the body voltages are only charged to V P f and V Nf at the end of MIT. This workload-adaptation feature allows WAVTH to reduce its MIT by about 2× compared with basic V th hopping. HIVTH, on the other hand, is able to reduce its MIT by about 9× by using the small biasing sources V P HS and V NHS . Note that V P HS and V NHS are optimized for each benchmark for demonstration purposes only. In practice, only a limited number of voltage sources is available. On top of basic V th hopping, the extra area overheads caused by WAVTH (the introduction of S 0 in Figure 4 ) and HIVTH (the introduction of S A and S B in Figure 6 ) are 0.1% and 3%, respectively.
Enhancement on Overall Leakage Reduction
The second experiment verified the enhancement of overall leakage reduction in a runtime environment. In other words, it verified how "aggressive" our techniques are. The 1024 input vectors were fed to each circuit consecutively. Leakage reduction techniques (basic V th hopping, WAVTH or HIVTH) were applied to each circuit whenever the idleness in the inputs was larger than its MIT. We simulated the total energy consumption over 1024 cycles in HSPICE with and without leakage reduction techniques. We then calculated the total leakage saving percentage (R total ) over 1024 cycles for each of the three techniques. R total is a comprehensive indicator of the effectiveness of our techniques. As shown in Table 2 , for each circuit we generated three sets of 1024 input vectors with different activity ratios (30%, 20% and 10%). "M.I." refers to the maximum improvement of our techniques over basic V th hopping across all three input sets. In Table 2 , the average and maximal improvement of WAVTH over basic V th hopping is 4.3% and 7.7%, respectively. The average and maximal improvement of HIVTH is 19.2% and 23.4%, respectively. The improvement depends on input activity. To demonstrate this, we conducted more experiments on C432 to observe how R total varies with input activity. As shown in Figure 12 , the improvement is significant when the activity is between 0.05 and 0.3. Higher activity (> 0.3) produces little idleness, and thus yields less improvement. Lower activity (< 0.05) creates long idleness. In this case, standby leakage reduction is already able to control most of the leakage energy consumption, so the improvement of our techniques is also limited.
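The bookkeeping behind this experiment can be sketched as follows. This is only an illustration of how idle periods might be extracted from an activity trace and how a saving percentage is computed; the function names are hypothetical, the threshold uses the "≥ MIT" form of Equation 1, and the energies in a real evaluation come from HSPICE rather than from this script.

```python
from itertools import groupby
from typing import List, Sequence


def idle_runs(active: Sequence[bool]) -> List[int]:
    """Lengths of consecutive idle stretches in a per-cycle activity trace."""
    return [sum(1 for _ in group)
            for is_active, group in groupby(active) if not is_active]


def exploitable_idle_cycles(active: Sequence[bool], mit_cycles: int) -> int:
    """Idle cycles that fall in stretches long enough (>= MIT) for leakage control."""
    return sum(run for run in idle_runs(active) if run >= mit_cycles)


def total_saving_percent(energy_without: float, energy_with: float) -> float:
    """Saving percentage, given total energy without and with the technique."""
    return 100.0 * (energy_without - energy_with) / energy_without


if __name__ == "__main__":
    trace = [True, False, False, True, False, False, False, False, False, True, False]
    for mit in (2, 4, 9):
        print(mit, exploitable_idle_cycles(trace, mit))
```

A technique with a smaller MIT can exploit more of the short idle stretches in the same trace, which is why reducing MIT translates into a higher R total .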
Technology and Temperature Impact
The third experiment examined the impact of technology and temperature on our techniques. The results are shown in Table 3 . We can observe that: 1) At newer technology nodes and higher temperatures, the MIT of all three techniques is smaller, due to the higher leakage-to-dynamic ratio. 2) In all settings, WAVTH is able to reduce MIT by 2×, while HIVTH reduces it by 6-9×. 3) The improvement in R total is larger in low-leakage settings and slightly smaller in high-leakage ones. This is because, in high-leakage settings, basic V th hopping is already capable of removing a significant amount of leakage, so the room for improvement is limited. However, as we have shown, the improvement also depends on the input activity. Even in a high-leakage setting, our techniques can be especially effective for certain types of input activity.
CONCLUSION
This paper emphasizes that the key to enhancing aggressive leakage reduction is to reduce the overhead (MIT) that leakage reduction techniques incur when performing a mode transition. Two techniques based on V th hopping, WAVTH and HIVTH, are proposed to reduce the energy overhead and enhance aggressive leakage reduction. To determine the optimum design points of these two techniques, an accurate model has been derived to characterize the whole mode transition process of V th hopping for both RBB and FBB schemes. Experimental results show average improvements in overall leakage saving over basic V th hopping of 4.3% for WAVTH and 19.2% for HIVTH in a runtime environment.
