Abstract-The design of a suitable power gating (e.g., multithreshold or super cutoff CMOS) structure is an important and challenging task in sub-90-nm very large scale integration (VLSI) circuits where leakage currents are significant. In designs where the mode transitions are frequent, a significant amount of energy is consumed to turn on or off the power gating structure. It is thus desirable to develop a power gating solution that minimizes the energy consumed during mode transitions. This paper presents such a solution by recycling charge between the virtual power and ground rails immediately after entering the sleep mode and just before wakeup. The proposed method can save up to 43% of the dynamic energy wasted during mode transition while maintaining the wakeup time of the original circuit. It also reduces the peak negative voltage value and the settling time of the ground bounce.
Charge Recycling in Power-Gated CMOS Circuits
Ehsan Pakbaznia, Student Member, IEEE, Farzan Fallah, Senior Member, IEEE, and Massoud Pedram, Fellow, IEEE
Index Terms-Charge recycling, leakage, low power, multithreshold CMOS (MTCMOS), power gating, very large scale integration (VLSI).
I. INTRODUCTION
As CMOS technology scales down, supply voltage is reduced to avoid device failure due to high electric fields in the gate oxide and the conducting channel under the gate. This supply voltage scaling reduces the dynamic component of circuit power dissipation, but, unfortunately, also decreases the switching speed of transistors. To compensate for this performance loss, the transistor threshold voltages are decreased, which, in turn, causes an exponential increase in the subthreshold leakage current. Furthermore, to maintain the gate voltage control over the active region of the transistor, the thickness of the dielectric between the gate and the channel region is reduced, which, in turn, results in an exponential increase in the gate leakage current. Please refer to [1] for a more detailed discussion.
Power gating, also known as multithreshold CMOS [2] or MTCMOS for short, is used to cut off the power to some functional blocks in a design. MTCMOS provides low leakage and high-performance operation by utilizing high-speed low-V t (LVT) transistors for logic cell implementation and low-leakage high-V t (HVT) transistors for power gating switch implementation. The power gating switch itself is typically realized as a single (footer) NMOS or (header) PMOS transistor, which disconnects logic cells from ground or V DD rails to reduce the leakage when the circuit is in sleep mode. Some of the design challenges that must be considered when using the power gating technique are the following: 1) placement and sizing of the sleep transistors; 2) automatic generation of sleep signal; 3) sleep signal scheduling for wakeup noise reduction; 4) mode transition energy minimization; 5) state retention; and 6) support for multiple levels of sleep. In this paper, we focus on the problem of mode-transition energy minimization.
The remainder of this paper is organized as follows. In Section II, we review prior work in the area of power gating. Section III introduces the concept of charge recycling in power-gated circuits. In Section IV, we derive conditions for achieving the maximum energy savings with charge-recycling MTCMOS (CR-MTCMOS). The impact of the proposed charge-recycling technique on the leakage and ground bounce (GB) of the circuit is discussed in Section V. A few important variants of the charge-recycling technique in MTCMOS circuits and the application of the technique to super cutoff CMOS (SCCMOS) circuits are discussed in Section VI. Sections VII and VIII present our simulation results and conclusions, respectively.
II. PRIOR WORK
MTCMOS is a leakage-power saving solution that provides high active mode performance and low standby leakage power [3], [4]. Sleep transistors slow down logic cells during the active mode operation of the circuit. This is due to the voltage drop across the functionally redundant sleep transistors and the increase in the threshold voltage of logic cell transistors as a result of the body effect. The performance penalty of using a sleep transistor depends on its size and the amount of the current that flows through this transistor due to logic transitions in the active mode. A number of researchers have proposed methods for the optimal sizing of sleep transistors in a given circuit to meet a performance constraint [5]-[9].
A large amount of current can flow during the sleep-to-active mode transition in an MTCMOS circuit. A high peak rush current in the circuit can cause electromigration problems in the power/ground rails. This rush current can also result in supply/ground bounces due to the L di/dt effect. In [10], the authors propose a wakeup strategy and a partitioning technique to limit the rush-through current. The authors of [11] tackle the problem of minimizing the wakeup time while limiting the current that flows to ground during the sleep-to-active mode transition. Their approach consists of first obtaining the discharge patterns of all logic cells and then grouping the circuit into a minimum number of clusters in such a way that the total discharge current of each cluster is below a given threshold. In [12], the authors introduce two power mode transition strategies to reduce the GB while turning on the circuit. The first strategy uses a single sleep transistor and gradually turns it on, while the second technique employs parallel-connected sleep transistors with increasing widths and turns them on one after the other, starting from the transistor with the smallest width.
Due to the large amount of mode-transition energy overhead and large wakeup latency for the circuits, sometimes, for short standby periods, it is better to put the circuit in a drowsy mode instead of the sleep mode. The reason is that the wakeup latency of the drowsy circuit is much less than that of the circuit in sleep mode. The work in [13] presents multiple power modes for the circuit, but it needs multiple supply voltages (stable reference voltages to drive the gate terminal of the sleep transistor which will be operating in different points of the subthreshold conduction region during the sleep mode). In [14] , the authors propose a power gating structure to support an intermediate (drowsy) power-saving mode and the traditional power cutoff mode. The idea is to add a PMOS transistor in parallel with each NMOS sleep transistor. By applying zero voltage to the gate of the PMOS transistor, the circuit can be put in an intermediate power saving mode whereby leakage reduction and data retention are both realized. Furthermore, by transitioning through this intermediate mode while changing between sleep and active modes, the magnitude of the voltage fluctuation of the power supply or ground during power-mode transitions is reduced. In the cutoff mode, the gate of the PMOS transistor is connected to V DD .
None of these works attempt to minimize the power consumption during the sleep-to-active and active-to-sleep transitions or reduce wakeup time and the noise generated by the power gating structure while maintaining the low standby leakage current. In this paper, we apply a charge-recycling technique to minimize the power consumption during the mode transition in a power gating structure while maintaining the wakeup time. Through simulations, we show how the proposed technique also helps reduce the GB in the sleep-to-active transition. A preliminary version of this paper was published in [15] .
III. CHARGE-RECYCLING TECHNIQUE
Consider the coarse-grain MTCMOS configuration shown in Fig. 1 . There are two different blocks in the circuit; one is power gated by an NMOS sleep transistor which connects the virtual ground (VGND), i.e., node G in the figure, to the ground, whereas the other is power gated by a PMOS sleep transistor which connects the virtual V DD (VV DD ), i.e., node P in the figure, to the supply. In the active mode, sleep transistors S N and S P are in the linear region, and the voltage values of the VGND and virtual V DD are equal to zero and V DD , respectively. In the sleep mode, sleep transistors S N and S P are turned off; since they are HVT devices, very little subthreshold leakage current flows through them.
In practice (see below for precise conditions), all internal nodes of the gates in block C 1 and the VGND node G will be charged up to a voltage value very close to V DD . This happens because G is floating and leakage current causes its voltage level to rise toward V DD . Similarly, if the sleep period is long enough, all internal nodes of C 2 and the virtual supply node P will be discharged to a voltage very close to zero. We discuss this in more detail in the following section.
A. Virtual Node Voltage Values in the Sleep Mode
Consider subcircuit C 1 in Fig. 1 . We show that the assumption of the VGND node being charged to a value close to V DD is invalid only when outputs of all logic cells in C 1 are forced to logic one (i.e., the pull-down sections of all cells are off) immediately before the active-to-sleep transition occurs. However, this case rarely happens in practice, because if there is at least one cell in C 1 with output value set to logic zero (i.e., its pull-down section is on) before the active-to-sleep transition and if the sleep period is sufficiently long, then the steady-state value for the VGND voltage after entering the sleep mode will be close to V DD . Considering that a subcircuit will typically contain tens of logic cells, the probability of at least one of them having a logic zero at its output (before entering the sleep mode) is almost one; therefore, the voltage of the VGND of subcircuit C 1 will rise and reach close to V DD after sufficient time is spent in the sleep mode.
To empirically confirm the aforementioned claim, we show the voltage waveforms of the VGND node for four different cases in Fig. 2. In each case, we have used an NMOS sleep transistor (the case with a PMOS sleep transistor is similar except that the corresponding output states are reversed). The first case is when there is only a single inverter cell in subcircuit C 1 and the output of the inverter is logic one before entering the sleep mode. As the figure shows, after entering the sleep mode, the VGND voltage of the inverter cell rises to about 200 mV, which is much less than V DD = 1.2 V. The next case corresponds to the same subcircuit C 1 , this time with the output of the inverter forced to logic zero. Here, the VGND voltage rises to 0.95 V, which is close to V DD = 1.2 V and a suitable level for the charge-recycling purpose (cf. Section III-B). The next two cases correspond to C 1 comprising four inverter cells, each driven by an input to C 1 . In one case, three of the inverter outputs are one and only one inverter output is zero. In this case, the VGND voltage rises to an even higher level than in case 2, resulting in a final steady-state voltage level of 1 V, which is again suitable for the charge-recycling purpose (cf. Section III-B). In the last case, two inverter outputs are set to logic one while the others are set to logic zero. Clearly, in this case, after entering the sleep mode, the VGND node is expected to rise and achieve a level that is even closer to V DD than before. This is confirmed by the top waveform in the figure, which shows the VGND of subcircuit C 1 reaching a voltage close to 1.2 V.
In summary, as long as there is a reasonably large number of logic cells in a subcircuit (this is usually the case in practice) that use an NMOS sleep transistor, the probability that at least one of these cells will have a logic zero output value before entering the sleep mode is close to one; therefore, the VGND voltage of such a subcircuit will gradually rise and stabilize to a voltage close to V DD . This occurs in a relatively short period of sleep time (on the order of microseconds), which provides us with the opportunity for charge recycling between this subcircuit and another one that uses a PMOS sleep transistor. The case where a PMOS sleep transistor is used instead of an NMOS transistor is similar, and it can be shown that the VV DD node is discharged to some value close to zero during the sleep mode.
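The probabilistic argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cell outputs are statistically independent and that each output is logic zero with probability `p_zero` just before the sleep transition; neither the independence assumption nor the function name comes from the paper.

```python
def prob_at_least_one_zero(num_cells: int, p_zero: float = 0.5) -> float:
    """Probability that at least one of num_cells outputs is logic zero.

    Assumes independent cell outputs, each zero with probability p_zero
    (an illustrative assumption, not a model taken from the paper).
    """
    if num_cells < 0 or not 0.0 <= p_zero <= 1.0:
        raise ValueError("num_cells must be >= 0 and p_zero in [0, 1]")
    # Complement of the event "every output is logic one".
    return 1.0 - (1.0 - p_zero) ** num_cells


if __name__ == "__main__":
    for n in (1, 4, 10, 20):
        print(n, prob_at_least_one_zero(n))
```

Under these assumptions the probability approaches one quickly as the number of cells grows, which is the behavior the argument relies on for subcircuits containing tens of cells.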
In the next section, we use this observation to propose a charge-recycling technique to achieve energy savings during mode transitions.
B. Charge Recycling for Mode-Transition Energy Saving
When the sleep-to-active transition edge arrives at the gates of the sleep transistors in an MTCMOS circuit, the voltage of VGND node G starts to fall toward zero, whereas the voltage of VV DD node P starts to rise toward V DD . If we denote the total effective capacitance in the VGND and VV DD nodes by C G and C P , respectively, we observe that during the active-to-sleep transition, C G is charged up from zero to V DD , while C P is discharged from V DD to zero. The situation is reversed for the sleep-to-active transition, i.e., in this case, C G is discharged from V DD to zero, while C P is charged to V DD from its initial value of zero. These charge and discharge events on the VGND and VV DD nodes are wasteful from the energy dissipation point of view. Our goal is to reduce the energy consumed as we switch between the active and sleep modes of the circuit. More precisely, we propose to use a charge-recycling technique to reduce the switching power consumption during the active-to-sleep and sleep-to-active transitions by adding a charge-sharing switch between the VGND and VV DD nodes as shown in Fig. 3. The proposed charge-recycling strategy works as follows. We turn on the charge-sharing switch 1) immediately before turning on the sleep transistors while going from the sleep to the active mode and 2) just after turning off the sleep transistors while going from the active to the sleep mode. By turning on the switch at the end of the sleep mode as the circuit is about to go from sleep to active mode, we allow charge sharing between the completely charged up capacitance C G and the completely discharged capacitance C P . After the charge recycling is completed, the common voltage of the VGND and virtual supply is αV DD , where α is a positive real number less than one. The value of α depends on the relative sizes of C G and C P . As a result of this step, the mode-transition energy is reduced.
The reason is that, in this case, the voltage of VGND changes from αV DD to zero and the voltage of the VV DD changes from αV DD to V DD , whereas in the conventional MTCMOS circuit, the transitions are from V DD to zero and from zero to V DD at the VGND and VV DD nodes, respectively. A similar analysis proves that the charge-recycling technique helps reduce the energy dissipated for transition from the active mode to the sleep mode as well.
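As a minimal sketch of the charge-sharing step, assuming an ideal switch and two floating nodes, the common voltage follows from charge conservation. The function name and the capacitance values below are hypothetical and chosen only for illustration.

```python
def shared_voltage(c_g: float, c_p: float, v_g0: float, v_p0: float) -> float:
    """Common voltage after ideal charge sharing between two floating nodes.

    Total charge is conserved: c_g*v_g0 + c_p*v_p0 = (c_g + c_p)*v_f.
    """
    if c_g <= 0 or c_p <= 0:
        raise ValueError("capacitances must be positive")
    return (c_g * v_g0 + c_p * v_p0) / (c_g + c_p)


if __name__ == "__main__":
    V_DD = 1.2                 # supply voltage used in the paper's examples
    C_G, C_P = 2e-12, 1e-12    # hypothetical virtual-rail capacitances (F)

    # Wakeup: G starts near V_DD, P starts near zero.
    v_wake = shared_voltage(C_G, C_P, V_DD, 0.0)
    # Sleep entry: G starts at zero, P starts at V_DD.
    v_sleep = shared_voltage(C_G, C_P, 0.0, V_DD)
    print(v_wake / V_DD, v_sleep / V_DD)
```

In the wakeup case the returned value divided by V DD plays the role of α; the two normalized values printed above sum to one because the initial conditions of the two transitions are mirror images of each other.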
In practice, we use a transmission gate (TG) to realize a switch (cf. Fig. 4 ). One may instead use other circuit realizations of a switch, such as pass transistors. Note that, with a TG, it is easier to achieve full charge sharing between the floating VGND and virtual V DD nodes. We will use a TG in the rest of this paper.
The proposed charge-recycling technique is used for mode-transition energy saving in a coarse-grain MTCMOS design where each sleep transistor is used to disconnect ground/supply from multiple logic cells. In contrast, in fine-grain MTCMOS design, the standard cell library comprises logic cells with integrated sleep transistors, i.e., each logic cell has its own built-in sleep transistor. Typically, in such a library, virtual ground/supply nodes are considered as internal logic cell nodes. This means that the charge-recycling technique cannot be applied. Of course, if the logic cell library is designed such that we have access to the virtual nodes, the charge-recycling technique can be used. In general, however, we do not recommend applying the charge-recycling technique at the individual cell level (fine grain) since our basic requirement for energy saving due to charge recycling, i.e., the condition that virtual nodes change to the opposite rail values during sleep, may be frequently violated under this scenario.
Another MTCMOS design configuration is the cluster-based style, which we consider as a midgrain MTCMOS technique [6], [7]. To implement the cluster-based charge recycling, we start by putting a group of, for example, n logic cells that use NMOS sleep transistors together and connecting their VGND nodes to create a single VGND node. Similarly, we put together a group of, for example, m logic cells that use PMOS sleep transistors and connect their virtual supply nodes to make a single virtual supply node that is shared among the cells. Charge recycling can subsequently be performed between the VGND of one group and the virtual supply of the other group. Although cell clustering is an important optimization step, it falls outside the scope of this paper.
In the next section, we will analyze the energy saving achieved by applying the charge-recycling technique for a coarse-grain MTCMOS design.
IV. ENERGY SAVING ANALYSIS
In this section, we first calculate the maximum achievable energy saving and discuss the conditions under which we can achieve this maximum saving. Then, we quantitatively analyze the effect of the threshold voltages and sizes of the transistors in the TG realizing the charge-sharing switch.
A. Energy Saving Due to Charge Recycling
It is worth stating at the outset that, for the purpose of analyzing energy consumption in CMOS circuits, energy is taken out of the V DD rail only when a capacitive node is charged up through a direct connection to the V DD rail. The energy that is dumped to the ground rail is the energy which was stored in that capacitive node and need not be accounted for again. The charge recycling between "floating" capacitive nodes (with possibly different initial voltage levels) does not extract any energy from the V DD rail or dump any into the ground rail; instead, some of the energy that was stored in the capacitors is consumed in the resistance of the switch that short circuits the two capacitive nodes while the remainder of the energy is appropriately distributed between the nodes.
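The energy accounting for charge sharing between floating nodes can be checked with a short sketch. It assumes ideal capacitors and a resistive switch whose only role is to dissipate the difference in stored energy; the function name and numeric values are hypothetical.

```python
def charge_sharing_energy(c1: float, c2: float, v1: float, v2: float):
    """Return (e_before, e_after, e_lost) for ideal sharing between floating nodes.

    No charge enters or leaves the pair, so nothing is drawn from V_DD;
    e_lost is the part of the stored energy dissipated in the switch.
    """
    v_f = (c1 * v1 + c2 * v2) / (c1 + c2)          # charge conservation
    e_before = 0.5 * c1 * v1 ** 2 + 0.5 * c2 * v2 ** 2
    e_after = 0.5 * (c1 + c2) * v_f ** 2
    return e_before, e_after, e_before - e_after


if __name__ == "__main__":
    c1, c2, v1, v2 = 1e-12, 1e-12, 1.2, 0.0        # hypothetical values
    e_before, e_after, e_lost = charge_sharing_energy(c1, c2, v1, v2)
    # Closed form for the loss: 0.5 * (c1*c2/(c1+c2)) * (v1 - v2)^2
    closed_form = 0.5 * (c1 * c2 / (c1 + c2)) * (v1 - v2) ** 2
    print(e_before, e_after, e_lost, closed_form)
```

The loss depends only on the capacitances and the initial voltage difference, not on the switch resistance, which only sets how quickly the sharing completes.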
To calculate the energy saving of the charge-recycling technique, we consider two different transitions: wakeup transition, which is sleep to active, and sleep transition, which is active to sleep.
Case 1-Wakeup Transition: Let C G and C P represent the total capacitance in VGND and VV DD nodes, respectively. We assume that the sleep period is long enough so that C G is charged up to a voltage close to V DD , while C P is discharged to a voltage close to zero. This is a good assumption in most circuits. Otherwise, the voltages of C G and C P will be a function of the length of the sleep period. As stated earlier, to go from the sleep mode to the active mode, instead of simply turning on sleep transistors, we first allow charge recycling between C G and C P . This is done by closing switch M at time t = t a0 . Assuming ideal charge sharing between C G and C P , the common voltage value of nodes G and P after charge sharing is calculated by equating the total charge in both capacitances right before and after charge sharing:
$$C_G V_{DD} = (C_G + C_P)\,V_f \;\Rightarrow\; V_f = \alpha V_{DD}, \qquad \alpha = \frac{C_G}{C_G + C_P}. \tag{1}$$
The common voltage value V f of VGND and VV DD nodes at the end of the charge sharing is αV DD . After the charge sharing is complete, i.e., at time t = t a1 , we open switch M and turn on the sleep transistors S N and S P . As a result, there will be a path from the VGND to the (actual) ground going through S N which discharges C G to zero. There will also be a path from the virtual V DD to the (actual) V DD going through S P which charges C P to V DD . For now, we neglect the energy consumption in turning on and off the switch itself; therefore, the total energy drawn from the power supply is due to the process of charging capacitance C P from V f to V DD , which can be obtained as follows:
$$E_{wakeup} = C_P V_{DD}\,(V_{DD} - V_f). \tag{2}$$
Substituting V f from (1) into (2), we obtain the energy consumed during the sleep-to-active transition
$$E_{wakeup} = (1 - \alpha)\,C_P V_{DD}^2. \tag{3}$$
Next, we consider the active-to-sleep transition.
Case 2-Sleep Transition: As mentioned earlier, to go from the active mode to the sleep mode, instead of simply turning off the sleep transistors, we do charge recycling between C G and C P as soon as the circuit enters the sleep mode. In other words, we close switch M at t = t s0 , which is the time when the sleep transistors are turned off. The voltage values of the VGND and VV DD nodes at t = t s0 are zero and V DD , respectively. Assuming ideal charge sharing between C G and C P , the common voltage value of nodes G and P after charge sharing is calculated by equating the total charge in both capacitances right before and after charge sharing:
$$C_P V_{DD} = (C_G + C_P)\,V_f \;\Rightarrow\; V_f = \beta V_{DD}, \qquad \beta = \frac{C_P}{C_G + C_P}. \tag{4}$$
Based on the aforementioned equation, the common voltage value V f of VGND and VV DD at the end of charge sharing is βV DD . The charge recycling is complete at t = t s1 ; therefore, we open the switch. After opening the switch, there is a leakage path from the power supply to the VGND going through logic block C 1 which eventually causes C G to be charged up to V DD . There is also a leakage path from the virtual supply to the ground going through logic block C 2 which eventually causes C P to be completely discharged to the ground. Again, if we neglect the energy consumption for turning on and off the switch, the total energy consumed is due to charging up the capacitance C G from V f to V DD ; the energy consumption can be calculated as follows:
$$E_{sleep} = C_G V_{DD}\,(V_{DD} - V_f). \tag{5}$$
Substituting V f from (4) into (5), we obtain
$$E_{sleep} = (1 - \beta)\,C_G V_{DD}^2. \tag{6}$$
Since α + β = 1, the total energy consumption will be
$$E_{CR-MTCMOS} = (\alpha\,C_G + \beta\,C_P)\,V_{DD}^2 \tag{7}$$
where E CR−MTCMOS is the dynamic energy consumption during mode transition in the charge-recycling circuit. We can calculate the total energy consumption of the corresponding conventional MTCMOS circuit, i.e., when no charge recycling is used, using the following:
$$E_{MTCMOS} = (C_G + C_P)\,V_{DD}^2. \tag{8}$$
From (7) and (8) and after substituting for α and β from (1) and (4), the energy saving ratio (ESR) would be
$$ESR = \frac{E_{MTCMOS} - E_{CR-MTCMOS}}{E_{MTCMOS}} = \frac{2\,C_G C_P}{(C_G + C_P)^2} = \frac{2X}{(1 + X)^2} \tag{9}$$
where X = C G /C P is the ratio of the VGND capacitance to the VV DD capacitance. The optimum value of X which maximizes ESR(X) is obtained by equating the derivative of ESR(X) to zero, which results in X = 1 or C G = C P . In other words, in order to obtain the maximum energy saving, we need to have equal capacitances in VGND and VV DD . The maximum energy saving is
$$ESR_{max} = ESR(X = 1) = 50\%. \tag{10}$$
This means that a maximum energy saving of 50% can be achieved by using the charge-recycling method. However, considering the power needed to turn the TG on and off, the total saving ratio would be less than 50%. Fig. 5 shows HSPICE waveforms when charge recycling is performed before transitioning from the sleep to the active mode for an inverter chain implemented in 70-nm CMOS technology. Note that, in the circuit, C G = C P . The figure shows the VGND voltage V G , the virtual V DD voltage V P , and the charge-recycling signal V CR .
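The optimum can be verified with a short derivation. The sketch below assumes ideal charge sharing and neglects the switch energy, so that the supply delivers C P V DD (V DD − αV DD ) on wakeup and C G V DD (V DD − βV DD ) after sleep entry, with α = C G /(C G + C P ) and β = C P /(C G + C P ); the conventional circuit draws (C G + C P )V DD ² per cycle.

```latex
\[
\mathrm{ESR}(X)
  = 1 - \frac{C_G^2 + C_P^2}{(C_G + C_P)^2}
  = \frac{2\,C_G C_P}{(C_G + C_P)^2}
  = \frac{2X}{(1+X)^2},
  \qquad X = \frac{C_G}{C_P},
\]
\[
\frac{d\,\mathrm{ESR}}{dX}
  = \frac{2(1+X)^2 - 4X(1+X)}{(1+X)^4}
  = \frac{2(1-X)}{(1+X)^3}.
\]
```

The derivative is positive for X < 1 and negative for X > 1, so X = 1 is the unique maximum, where ESR(1) = 1/2. The expression is also symmetric under X → 1/X, so interchanging the roles of the two capacitances leaves the saving unchanged.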
We denote the VGND and virtual supply capacitances as C G and C P , respectively. The total effective capacitance in the VGND (supply) comprises the following components.
a) Diffusion capacitance (C diff ). This component is calculated as the summation of the diffusion capacitances of transistors in logic gates connected to the VGND (supply).
b) Interconnect capacitance (C wire ). This component is the total rail capacitance in the VGND (supply) due to interconnect.
c) Internal node capacitance (C inte ). This component is calculated as the total internal node capacitance of logic gates connected to VGND (supply) whose voltage values transition from V DD to zero or vice versa during mode transitions.
The total virtual node capacitances can thus be written as
$$C_{G\,(P)} = C_{diff} + C_{wire} + C_{inte}. \tag{11}$$
Now suppose each block C 1 and C 2 in Fig. 1 consists of a simple inverter. When charge recycling is performed, after the active-to-sleep transition, the value of C G depends on the state of the inverter in C 1 . To be more precise, we consider two cases as follows.
Case 1: When the input of the inverter is at logic zero, the NMOS transistor of the inverter is OFF; therefore, the total capacitance C G is the sum of the first two components in (11) (no internal node capacitance). Case 2: When the input of the inverter is at logic one, the NMOS transistor of the inverter is ON, and the internal node capacitance contributes to C G . Similar discussion holds for the C P capacitance and the state of the inverter in the C 2 block. This makes C G and C P values input-pattern dependent for a general circuit, meaning that different input patterns applied to the circuit result in different logic values for the inputs of the circuit's gates which change the contribution of the internal node capacitances to the total rail capacitance resulting in different C G and C P values. Fortunately, our simulations for circuit blocks containing a reasonable number of logic cells (e.g., more than 20 gates per block) show that the maximum change in the shared voltage value after charge-recycling operation is less than 5% for different input patterns. In other words, the impact of the input vector on unbalancing the total VGND and virtual supply capacitance values is small and can be neglected.
Finally, we point out that the ESR is only a weak function of the ratio between C G and C P . From (9), the maximum ESR is achieved when C G = C P . However, even when this condition is not satisfied, the ESR does not decrease dramatically. For example, for C P = 2 × C G , which means X = 1/2 in (9), the ESR becomes 44%, and for C P = 3 × C G , i.e., X = 1/3, the ESR becomes 38%. Therefore, even in cases where the C G and C P values differ by as much as a factor of two or three, the ESR is still large.
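The weak dependence on the capacitance ratio can be reproduced with a small numeric sketch. It assumes ideal charge sharing and neglects the TG switching energy; the function names are hypothetical, and the energies are computed from first principles (supply energy equals V DD times the charge delivered) rather than from a closed-form expression.

```python
def mode_transition_energies(c_g: float, c_p: float, v_dd: float):
    """Supply energy for one sleep/wakeup cycle with and without recycling.

    Assumes ideal charge sharing and ignores the TG switching energy.
    Returns (e_cr, e_conventional).
    """
    alpha = c_g / (c_g + c_p)   # shared voltage / V_DD before wakeup
    beta = c_p / (c_g + c_p)    # shared voltage / V_DD after sleep entry
    # Wakeup: the supply charges C_P from alpha*V_DD up to V_DD.
    e_wakeup = c_p * v_dd * (v_dd - alpha * v_dd)
    # Sleep: leakage from the supply charges C_G from beta*V_DD up to V_DD.
    e_sleep = c_g * v_dd * (v_dd - beta * v_dd)
    e_conventional = (c_g + c_p) * v_dd ** 2
    return e_wakeup + e_sleep, e_conventional


def energy_saving_ratio(x: float, v_dd: float = 1.2) -> float:
    """ESR for a capacitance ratio x = C_G / C_P (C_P normalized to 1)."""
    e_cr, e_conv = mode_transition_energies(x, 1.0, v_dd)
    return 1.0 - e_cr / e_conv


if __name__ == "__main__":
    for x in (1.0, 0.5, 1.0 / 3.0, 2.0, 3.0):
        print(f"X = {x:.3f}  ESR = {energy_saving_ratio(x):.3f}")
```

The values for X = 1, 1/2, and 1/3 match the 50%, 44%, and 38% figures quoted above, and X and 1/X give the same ratio.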
Note that all the equations derived so far assume ideal charge recycling between C G and C P . Under this scenario, we assume that no energy is consumed to switch the TG on and off. We also assume that the TG stays on while the charge recycling is in progress. However, because of the dynamic power consumption in the TG and the possibility of incomplete charge sharing, these assumptions do not hold exactly in practice. In the following, we study the effects of the TG threshold voltage and sizing on the ESR and the wakeup time of the charge-recycling configuration.
Fig. 6. Charge sharing between C 1 and C 2 when using a TG to realize the charge-sharing switch.
B. Effect of the Threshold Voltages of the TG
We first discuss the effect of the threshold voltages of the NMOS and PMOS transistors of the TG on the energy saving and the delay of the circuit.
Consider the charge-sharing configuration shown in Fig. 6 where V 1 and V 2 are set to V DD and zero levels initially. After the TG is closed, the common node voltage is referred to as V f . To have a complete charge sharing, the TG has to stay on for the whole duration of the charge-sharing process. In order to have this property, the absolute values of the threshold voltages of the N and P transistors of the TG have to be small enough. To guarantee this, the common final voltage value of VGND and virtual supply, which is V f , has to satisfy at least one of the following two inequalities:
$$V_f \le V_{DD} - V_{t,n} \qquad \text{or} \qquad V_f \ge |V_{t,p}| \tag{12}$$
where V t,n and V t,p denote the threshold voltages of the NMOS and PMOS transistors in the TG accounting for the body effect. Notice that V f can be obtained from (1) for the sleep-to-active case and from (4) for the active-to-sleep case. The inequalities in (12) guarantee that at least one of the transistors in the TG remains on for the complete duration of charge sharing. In the case of equal virtual node capacitances, C G = C P , a complete charge sharing in both active-to-sleep and sleep-to-active cases results in a common final voltage value of V f = V DD /2, and (12) translates into Min{V t,n , |V t,p |} ≤ V DD /2. When V t,n ≤ V DD /2 or |V t,p | ≤ V DD /2, a TG may be replaced with a pass transistor of the corresponding type while still achieving full charge sharing. Note that, in current CMOS technologies, this condition is easily satisfied for both LVT and HVT devices so as to have acceptable static dc noise margins. In the future CMOS technologies that use sub-1-V power supply level, as will be discussed in Section VI, turning on the HVT devices will be difficult, and that is why super cutoff CMOS (which uses voltage over or under drive) was developed in [16]. Therefore, for sub-1-V technologies, we recommend using charge-recycling SCCMOS (CR-SCCMOS) instead of CR-MTCMOS (cf. Section VI). In this case, the transistors of the TG will be LVT, and V t,n , |V t,p | ≤ V DD /2 will be automatically satisfied (otherwise, even the CMOS logic cells inside the logic blocks, which all use LVT transistors, would fail).
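The turn-on requirement can be sketched as a simple check. This is a first-order illustration that assumes the NMOS gate of the TG is held at V DD and the PMOS gate at ground while the TG is closed, and that a device conducts whenever its gate overdrive is non-negative; the function name and the threshold values are hypothetical.

```python
def tg_stays_on(v_f: float, v_dd: float, v_tn: float, v_tp_abs: float) -> bool:
    """True if at least one TG transistor conducts until sharing completes.

    First-order model (an assumption of this sketch):
      NMOS, gate at V_DD, conducts while V_DD - v_f >= v_tn;
      PMOS, gate at 0 V, conducts while v_f >= |v_tp|.
    The worst case for both devices is the final shared voltage v_f.
    """
    nmos_on = (v_dd - v_f) >= v_tn
    pmos_on = v_f >= v_tp_abs
    return nmos_on or pmos_on


if __name__ == "__main__":
    V_DD = 1.2
    v_f = V_DD / 2                      # equal capacitances, C_G = C_P
    print(tg_stays_on(v_f, V_DD, 0.4, 0.4))   # both thresholds below V_DD/2
    print(tg_stays_on(v_f, V_DD, 0.7, 0.7))   # both thresholds above V_DD/2
```

With equal capacitances the check reduces to requiring that the smaller of the two threshold magnitudes not exceed V DD /2.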
C. Effect of the Transistor Sizes of the TG
The sizing of the TG is another factor that affects the ESR as well as the wakeup time of the circuit. In the case of the original configuration where there is not any charge recycling, the wakeup time is typically defined as the time that it takes for the voltage of the VGND or virtual V DD to reach within 10% of their final values after we turn on the sleep transistor. In the proposed charge-recycling solution, we first turn on the TG in order to perform charge sharing between C G and C P , and next, we switch on the sleep transistors to complete the mode transition. Therefore, in the charge-recycling circuit, the wakeup time is defined as the time that it takes for the voltage of the VGND or virtual V DD to reach within 10% of their final values after we turn on the TG. In the following discussion, we consider the effect of the dynamic power consumption of the TG on the ideal ESR, which we previously calculated.
Consider TG with its control signal (the complement of the control signal is produced by a CMOS inverter). Assume a total input capacitance of C tg for the NMOS and PMOS transistors of the TG. In each active-sleep-active cycle, we need to turn on the TG twice, once before turning the sleep transistors on and once after turning them off. Every time we turn the TG on and off, we charge and discharge C tg . We have to turn off the TG after the charge sharing is complete.
Therefore, we can calculate the dynamic energy consumption of the TG for one complete active-sleep-active cycle as follows:
$$E_{TG} = 2\,C_{tg} V_{DD}^2. \tag{13}$$
Therefore, the actual ESR can be calculated by subtracting the correction ratio E TG /E MTCMOS from the ideal ESR in (9). The correction ratio can be calculated as
$$\frac{E_{TG}}{E_{MTCMOS}} = \frac{2\,C_{tg}}{C_G + C_P}. \tag{14}$$
This correction ratio is proportional to the sizes of the TG's transistors since C tg is proportional to the size of the TG. Because many gates are usually connected to the VGND and the virtual V DD , C G + C P is usually much larger than C tg . Thus, the correction ratio is usually a few percent, making the actual ESR less than the ideal ESR, i.e., 50%, by only a few percentage points. Fig. 7 shows the ESR versus the total transistor width used in the TG. As seen, the ESR is reduced as we increase the TG size. By changing the TG size, we can change the speed of the charge-sharing operation and, as a result, reduce the wakeup time; however, the charge-sharing operation only changes the virtual node voltages from their initial values to V DD /2. The rest of the wakeup operation is performed by the sleep transistors, and its duration depends on the sizes of the sleep transistors. Clearly, increasing the TG size does not affect the speed by which the sleep transistors can change the virtual node voltages from V DD /2 to V DD or zero as the case may require. Therefore, the total wakeup time of the circuit is expected to decrease when we increase the TG size, but then, it saturates at some point. Fig. 8 shows the circuit wakeup time versus the total transistor width used in the TG. Finally, note that, although increasing the TG size reduces the wakeup time, it also increases the correction ratio given in (14), thereby reducing the ESR of the circuit. In other words, there is a tradeoff between the wakeup time and the ESR.
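The size of the TG overhead can be estimated with a short sketch. It assumes that each on/off event of the TG draws C tg V DD ² from the supply, that this happens twice per active-sleep-active cycle, and that the conventional circuit draws (C G + C P )V DD ² per cycle; the function name and the capacitance values are hypothetical.

```python
def actual_esr(c_g: float, c_p: float, c_tg: float) -> float:
    """Ideal energy saving ratio minus the TG switching overhead.

    ideal      = 2X / (1 + X)^2 with X = c_g / c_p (ideal charge sharing)
    correction = 2 * c_tg / (c_g + c_p)  (two TG on/off events per cycle)
    """
    if c_g <= 0 or c_p <= 0 or c_tg < 0:
        raise ValueError("c_g, c_p must be positive and c_tg non-negative")
    x = c_g / c_p
    ideal = 2.0 * x / (1.0 + x) ** 2
    correction = 2.0 * c_tg / (c_g + c_p)
    return ideal - correction


if __name__ == "__main__":
    # Hypothetical values: 1 pF on each virtual rail, 20 fF of TG gate capacitance.
    print(actual_esr(1e-12, 1e-12, 20e-15))
```

With these illustrative numbers the overhead is two percentage points, consistent with the observation that the correction is small whenever C G + C P is much larger than C tg .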
V. LEAKAGE CURRENT AND GB ANALYSIS
We analyze two important issues for the proposed charge-recycling MTCMOS configuration, namely, the leakage current and the GB.
A. Leakage Current
In what follows, we derive the subthreshold leakage current equations for both MTCMOS and CR-MTCMOS circuits. The leakage current of a MOS transistor can be written as follows [16]:
where V gs and V ds are the gate-source and drain-source voltages of the transistor and W/L is the width-to-length ratio of the transistor. In the sleep mode, all sleep and charge-recycling transistors are off, i.e., they all have V gs = 0. Here, the V ds for each charge-recycling transistor is the absolute voltage difference between the VGND and VV DD nodes in the sleep mode, which is approximately equal to V DD based on the discussion in Section III. From (15), the leakage currents of the NMOS and PMOS sleep transistors, I Ln and I Lp , can be written as
where V tH is the threshold voltage of the sleep transistors. The total leakage current of the MTCMOS circuit is the sum of I Ln and I Lp
For the CR-MTCMOS, however, there is an additional leakage component due to the charge-recycling switch (I Lcr ). For the purpose of this section, assume that, instead of a TG, a single NMOS transistor with the width W cr is used for charge recycling. Using (15) , I Lcr can be written as
Using (17) and (18), the ratio of the leakage current for MTCMOS and CR-MTCMOS can be written as
Assuming μ n = 2μ p and W n = 0.5W p , the ratio of the CR-MTCMOS leakage current to the MTCMOS leakage current simplifies to
Since the charge-recycling transistor is usually much smaller than the sleep transistors, the leakage-increase ratio given in (20) is usually negligible compared with the power saving achieved by using the charge-recycling technique.
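The size of this leakage penalty can be estimated with a short sketch. It assumes a common textbook subthreshold model, I = μ C ox (W/L) V T ^2 exp((V gs − V t )/(n V T ))(1 − exp(−V ds /V T )), which may differ in constant factors from (15). It also assumes that the charge-recycling transistor has the same threshold voltage and channel length as the sleep transistors and that every off transistor sees V ds = V DD . The function names, the slope factor n, and all device values are hypothetical.

```python
import math


def subthreshold_current(mu, c_ox, w, l, v_gs, v_ds, v_t,
                         n=1.5, v_thermal=0.0259):
    """Textbook subthreshold current (assumed form, not the paper's (15))."""
    return (mu * c_ox * (w / l) * v_thermal ** 2
            * math.exp((v_gs - abs(v_t)) / (n * v_thermal))
            * (1.0 - math.exp(-v_ds / v_thermal)))


def leakage_increase_ratio(w_n, w_p, w_cr, mu_n, mu_p,
                           vdd=1.2, v_th=0.55, l=90e-9, c_ox=1.0):
    """Ratio of CR-MTCMOS to MTCMOS sleep-mode leakage.

    All three devices are off (v_gs = 0) with v_ds = vdd. c_ox cancels
    in the ratio, so its value here is arbitrary.
    """
    i_ln = subthreshold_current(mu_n, c_ox, w_n, l, 0.0, vdd, v_th)
    i_lp = subthreshold_current(mu_p, c_ox, w_p, l, 0.0, vdd, v_th)
    i_lcr = subthreshold_current(mu_n, c_ox, w_cr, l, 0.0, vdd, v_th)
    return (i_ln + i_lp + i_lcr) / (i_ln + i_lp)


if __name__ == "__main__":
    # mu_n = 2 * mu_p and W_n = 0.5 * W_p, as assumed in the text.
    ratio = leakage_increase_ratio(w_n=10e-6, w_p=20e-6, w_cr=1e-6,
                                   mu_n=2.0, mu_p=1.0)
    print(f"leakage ratio = {ratio:.3f}")
```

With μ n = 2μ p and W n = 0.5W p , the two sleep transistors leak equally in this model, so the ratio reduces to 1 + W cr /(2W n ); a charge-recycling transistor one-tenth the width of the NMOS sleep transistor adds about 5% leakage.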
B. GB
Ground and power line bounces are among the most important design concerns when power gating is used [12]. GB or power bounce may occur in power gating structures at the sleep-to-active transition edge. In this section, we discuss how the charge-recycling technique affects the GB. Consider the circuit in Fig. 9. A large current flows into the ground after the sleep transistor is turned on at the end of the sleep period. We adopt a simple RL model for the purpose of GB analysis. Because of the large di/dt at the turn-on time, a large voltage, i.e., L di/dt, appears across the inductance. Fig. 9 shows the VGND capacitance C G connected to the RL circuit (modeling the pin-package parasitics of the IC) via the sleep transistor S N . The sleep transistor is turned on at t = 0 when the initial voltage of C G is V 0 , i.e., V G (t = 0) = V 0 . Based on the results of [19], the positive peak of the GB occurs during the time when S N operates in the saturation region. If we neglect the channel length modulation effect, the saturation current of S N does not depend on V 0 . Therefore, we expect that the proposed charge-recycling technique, which changes V 0 from V DD to V DD /2, would not change the GB's positive peak. However, due to the channel length modulation effect, the saturation current of the sleep transistor S N is somewhat smaller for the CR-MTCMOS compared to the MTCMOS circuit. This results in a slightly smaller positive GB peak for the CR-MTCMOS circuit. In addition, the negative peak and the settling time of the GB are functions of V 0 , i.e., they both decrease when V 0 decreases [19]. Therefore, both the negative peak value and the settling time of the GB voltage are expected to decrease for the CR-MTCMOS circuit.
The amounts of improvement in the negative peak and settling time depend on the relative values of L, C G , R, V DD , and the sleep transistor parameters. Fig. 10 shows GB waveforms for the conventional and the charge-recycling power gating structures used for an inverter chain implemented in 70-nm CMOS technology. As expected, the positive peak value is almost the same in both cases; however, the negative peak value and the settling time are smaller for the charge-recycling MTCMOS structure.
VI. VARIANTS OF THE CHARGE-RECYCLING TECHNIQUE
Previously, we presented a certain type of charge-recycling technique that uses both NMOS and PMOS sleep transistors. Charge recycling was then applied between VGND and VV DD nodes. In this section, we discuss three variations of the proposed charge-recycling technique for the MTCMOS circuits.
A. Charge Recycling Between the Same Type of Virtual Rails
Consider Fig. 11(a) where two circuit blocks C 1 and C 2 are using the same type of sleep transistors, e.g., NMOS transistors. Suppose C 1 and C 2 work in "orthogonal" modes, i.e., when C 1 is in active mode, C 2 is in sleep mode and vice versa. For example, C 1 and C 2 can be integer and floating-point arithmetic blocks of a processor, respectively. When the integer arithmetic block is in use, the floating-point block is idle, and vice versa. We show that charge recycling can be performed between the VGND nodes of blocks C 1 and C 2 , denoted by VGND 1 and VGND 2 , respectively.
First, assume that C 1 is in the active mode and C 2 is in the sleep mode. The voltages of VGND 1 and VGND 2 are zero and V DD , respectively. When C 1 is switched to the sleep mode, C 2 is switched to the active mode and the voltages of VGND 1 and VGND 2 change to V DD and zero, respectively. Therefore, the charge recycling can be done between the VGND 1 and VGND 2 nodes to save the mode transition energy. The energy consumptions for the MTCMOS and CR-MTCMOS circuits in a full active-sleep-active cycle are
where ΔV 1 and ΔV 2 are the voltage differences between the final charge-recycling voltage value and the supply voltage values of the two blocks, respectively, and are calculated as follows:
Substituting ΔV 1 and ΔV 2 from (22) into (21), we can calculate the ESR as
which is similar to the regular charge-recycling case. The maximum energy saving of 50% is achieved when C G1 = C G2 . Similarly, the charge-recycling technique may be applied between the VV DD nodes of two blocks that use PMOS sleep transistors.
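The 50% bound can be checked with a small charge-sharing calculation. The sketch below is an illustration under stated assumptions rather than a reproduction of (21)-(23): it assumes ideal charge sharing between the two VGND capacitances, and it assumes that raising a capacitance C by ΔV toward V DD draws C V DD ΔV from the supply while discharging a node to ground draws nothing. The function names and capacitance values are hypothetical.

```python
def shared_voltage(c_a, v_a, c_b, v_b):
    """Common voltage after ideal charge sharing between two capacitors."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def cycle_energies(c_g1, c_g2, vdd):
    """Supply energy over one full cycle, without and with recycling.

    Assumed model: without recycling each VGND node is charged from 0
    to vdd once per cycle; with recycling each node starts its charging
    phase from the shared voltage instead of from 0.
    """
    e_mtcmos = (c_g1 + c_g2) * vdd ** 2

    # C1 enters sleep (VGND1: 0 -> vdd) while C2 wakes up (VGND2: vdd -> 0).
    v_mid = shared_voltage(c_g1, 0.0, c_g2, vdd)
    e_cr = c_g1 * vdd * (vdd - v_mid)

    # C2 enters sleep (VGND2: 0 -> vdd) while C1 wakes up (VGND1: vdd -> 0).
    v_mid = shared_voltage(c_g1, vdd, c_g2, 0.0)
    e_cr += c_g2 * vdd * (vdd - v_mid)

    return e_mtcmos, e_cr


def esr_same_rail(c_g1, c_g2, vdd=1.2):
    """Energy saving ratio 1 - E_CR / E_MTCMOS for the assumed model."""
    e_mtcmos, e_cr = cycle_energies(c_g1, c_g2, vdd)
    return 1.0 - e_cr / e_mtcmos


if __name__ == "__main__":
    for c_g2_pf in (0.25, 0.5, 1.0, 2.0, 4.0):
        esr = esr_same_rail(1.0e-12, c_g2_pf * 1e-12)
        print(f"C_G2 / C_G1 = {c_g2_pf:4.2f} -> ESR = {esr:.3f}")
```

In this model the ESR equals 2C G1 C G2 /(C G1 + C G2 )^2, which is symmetric in the two capacitances and peaks at 50% when they are equal.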
B. Charge Recycling for Blocks With Different Power Supply Levels
Consider Fig. 11(b) where two circuit blocks C 1 and C 2 use two different power supply levels V DD1 and V DD2 , respectively. If C 1 and C 2 use different types of sleep transistors, for example, C 1 uses an NMOS while C 2 uses a PMOS sleep transistor, and if C 1 and C 2 are always in the same mode of operation (i.e., they are both in the sleep mode or they are both in the active mode), then the charge-recycling technique may be applied between the VGND of C 1 , which is VGND 1 , and the virtual supply of C 2 , which is VV DD2 .
In this case, the energy consumptions for the MTCMOS and CR-MTCMOS circuits can be written as follows:
Substituting ΔV 1 and ΔV 2 from (25) into (24), we can calculate the ESR as
One can see from (26) that the ESR in this case depends not only on the capacitance values in the virtual rails but also on both supply voltage values. Notice that, if V DD1 = V DD2 , then (26) reduces to (9).
Table I shows the energy saving results for the two variants of the charge-recycling technique discussed in Sections VI-A and B. This table includes three example cases of charge recycling for the same type of virtual rail. In each case, we have used two blocks of the same circuit, both of which employ NMOS sleep transistors. Table I also includes a charge-recycling case for blocks with different supply levels. In this case, we put together two circuit blocks 9sym and C880 where 9sym employed a PMOS sleep transistor and a supply voltage of V DD1 = 1.3 V, whereas C880 used an NMOS sleep transistor and a supply voltage of V DD2 = 1.0 V. The results show that the energy consumption during mode transition for CR-MTCMOS is less than that for MTCMOS by an average of 36%.
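The dependence of the ESR on both supply levels can be sketched as follows. The closed form used here is derived under the same assumptions as an ideal charge-sharing model (charging a capacitance C by ΔV toward its supply V DD draws C V DD ΔV, and the shared voltage stays below each rail's own supply); it is an assumed form and may differ from (26). Following the convention of Fig. 11(b), block C 1 contributes its VGND capacitance and V DD1 , and block C 2 contributes its virtual V DD capacitance and V DD2 . The function name and the capacitance values are hypothetical.

```python
def esr_two_supplies(c_g1, vdd1, c_p2, vdd2):
    """Assumed ESR for charge recycling between VGND1 and VVDD2.

    Baseline energy per cycle:  c_g1 * vdd1**2 + c_p2 * vdd2**2
    Energy saved by recycling:  2 * c_g1 * c_p2 * vdd1 * vdd2 / (c_g1 + c_p2)
    Valid only while the shared voltage is below each rail's own supply.
    """
    saved = 2.0 * c_g1 * c_p2 * vdd1 * vdd2 / (c_g1 + c_p2)
    baseline = c_g1 * vdd1 ** 2 + c_p2 * vdd2 ** 2
    return saved / baseline


if __name__ == "__main__":
    c = 1.0e-12  # illustrative equal virtual-rail capacitances
    print(f"equal supplies (1.2 V, 1.2 V): {esr_two_supplies(c, 1.2, c, 1.2):.3f}")
    print(f"mixed supplies (1.0 V, 1.3 V): {esr_two_supplies(c, 1.0, c, 1.3):.3f}")
```

For equal supplies the expression collapses to 2C G1 C P2 /(C G1 + C P2 )^2, matching the equal-supply limit noted in the text, and a supply mismatch lowers the achievable saving below 50% even for equal capacitances.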
C. Charge Recycling for SCCMOS
Turning on HVT devices is difficult in sub-1-V CMOS technologies [16] , [18] . In 45-nm technology, the best corner V DD is 0.9 V while the standard threshold voltage is about 0.5 V. For acceptable leakage saving, the HVT must be at least 0.65 V. This leaves only a 0.25-V margin for the gate-source voltage (0.65 < V GS < 0.9 V) of a turned-on NMOS sleep transistor when MTCMOS is used. Therefore, HVT sleep transistors are too slow and hard to turn on in sub-1-V technologies. SCCMOS circuits solve this problem by using an LVT device for cutting off ground or V DD [16] . Instead of using HVT devices for leakage reduction, SCCMOS circuits overdrive the LVT PMOS sleep transistors by applying a positive overdrive voltage of ΔV DD in excess of V DD to their gate terminals. Similarly, they underdrive the LVT NMOS sleep transistors by applying a negative voltage of −ΔV DD to their gate terminals. It has been shown that the SCCMOS circuits achieve the same leakage reduction as the corresponding MTCMOS circuits with shorter wakeup times due to the use of LVT transistors.
Similar to MTCMOS, conventional SCCMOS circuits suffer from wasteful mode transition energy consumption. Both NMOS and PMOS sleep transistors may be used to cut off power or ground from the gates inside a circuit. During the standby mode, due to leakage, the VGND node will be charged to a value close to V DD while the VV DD node will be discharged to a voltage close to zero [18]. The opposite situation occurs in the active mode. Consequently, charge recycling may be applied to SCCMOS circuits to save the mode transition energy in the same fashion as it is applied to MTCMOS circuits. Fig. 12 shows the configuration of the circuit used for CR-SCCMOS. Table III reports the results of applying the charge-recycling technique to SCCMOS circuits. In order to have a fair comparison between each MTCMOS circuit and its SCCMOS counterpart, the value of the overdrive voltage for a PMOS sleep transistor in the SCCMOS circuit, i.e., ΔV DD , is set to the threshold voltage difference between the HVT and LVT PMOS devices in the MTCMOS circuit. Similarly, the value of the underdrive voltage for an NMOS sleep transistor in the SCCMOS circuit, i.e., −ΔV DD , is set to the threshold voltage difference between the LVT and HVT NMOS devices in the MTCMOS circuit.
VII. SIMULATION RESULTS
We used the ISCAS-85 circuit benchmark suite to generate our experimental results. All benchmark circuits are first optimized using "script.rugged" in SIS (a system for sequential circuit synthesis). We used a 90-nm cell library to perform timing-driven technology mapping. The LVT value is 0.25 V, whereas the HVT value is 0.55 V for NMOS transistors. Similarly, for PMOS transistors, the LVT value is −0.22 V, whereas the HVT value is −0.52 V. The supply voltage's value is V DD = 1.2 V.
Starting with an optimized and technology-mapped ISCAS-85 circuit, we first generate the MTCMOS version of the same circuit as follows. We use a single NMOS sleep transistor to cut off the ground from the VGND node during the sleep time. The size of this sleep transistor is set to ensure a voltage drop of no more than 5% of V DD across its R DS (ON) when the circuit is active. This limits the performance penalty of the power gating structure. The exact solution to this problem requires an optimization that falls outside the scope of this paper. Interested readers may refer to [6] , [7] , and [20] for different ways in which the problem can be formulated and solved. Let N denote the number of logic gates in the circuit. In our experiments, we assumed that at most 10% of logic gates in the circuit exhibit a simultaneous high-to-low output transition in any given cycle, each transition contributing an average of ΔI avg current to the total current flowing through the on sleep transistor, and therefore
This simple derivation produces reasonably good results for the size of the MTCMOS sleep transistor in our benchmark suite. However, in general, a more sophisticated sizing technique is needed to guarantee that the worst case path delay increase is below some prespecified target level. In the table of results, we use the notation ST-MTCMOS to refer to the standard MTCMOS version of circuits. Next, we generate a version of the circuit benchmarks that uses both NMOS and PMOS sleep transistors. In particular, we partition circuit C into two blocks C 1 and C 2 , where C 1 uses an NMOS sleep transistor, while C 2 uses a PMOS one. Furthermore, the partitioning is done such that the total capacitance of the VGND node of C 1 is equal to the total capacitance of the virtual V DD node of C 2 . The sizing of the NMOS and PMOS sleep transistors for each circuit block is done similarly to the ST-MTCMOS case (accounting for the difference between hole and electron mobility). We refer to this version as the NP-MTCMOS because it uses both types of sleep transistors, yet it does not perform any charge recycling.
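The sleep transistor sizing rule described above (at most 10% of the N gates switching simultaneously, and at most a 5% V DD drop across the on sleep transistor) can be sketched numerically. The on-resistance bound follows directly from those two statements; the conversion to a W/L ratio additionally assumes a first-order linear-region model, R on ≈ 1/(μ n C ox (W/L)(V DD − V tH )), which is an assumption of this sketch and not necessarily the expression used in the paper. The function names and the values of N, ΔI avg , and μ n C ox are hypothetical.

```python
def max_on_resistance(n_gates, delta_i_avg, vdd,
                      switching_fraction=0.10, max_drop_fraction=0.05):
    """Largest R_DS(ON) that keeps the drop within max_drop_fraction * vdd."""
    i_total = switching_fraction * n_gates * delta_i_avg
    return max_drop_fraction * vdd / i_total


def min_width_to_length(r_on_max, mu_cox, vdd, v_th):
    """Smallest W/L meeting r_on_max under an assumed linear-region model:
    R_on ~ 1 / (mu_cox * (W/L) * (vdd - v_th)).
    """
    return 1.0 / (r_on_max * mu_cox * (vdd - v_th))


if __name__ == "__main__":
    # Illustrative values: 1000 gates, 10 uA average per-transition current,
    # VDD = 1.2 V, HVT = 0.55 V, and an assumed mu_n * C_ox of 300 uA/V^2.
    r_max = max_on_resistance(n_gates=1000, delta_i_avg=10e-6, vdd=1.2)
    w_over_l = min_width_to_length(r_max, mu_cox=300e-6, vdd=1.2, v_th=0.55)
    print(f"R_DS(ON) <= {r_max:.1f} ohm, W/L >= {w_over_l:.1f}")
```

For a PMOS sleep transistor, the same calculation would use the hole mobility and the PMOS threshold magnitude, which is why the PMOS device ends up wider for the same voltage drop.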
We incorporate the charge-recycling technique into NP-MTCMOS by using an appropriately sized TG as the switch between the VGND of C 1 and VV DD of C 2 . The size of this TG is selected such that the wakeup times of the NP-MTCMOS and the CR-MTCMOS are approximately equal. The optimization is performed by measuring the wakeup time of the NP-MTCMOS and sweeping the TG size (using SPICE) while monitoring the wakeup time of the CR-MTCMOS circuit.
NMOS transistors have higher drive strength compared to PMOS transistors; thus, from a layout area point of view, it is better to use NMOS sleep transistors. However, the sleep transistor size is not the only factor determining whether NMOS or PMOS sleep transistors should be used. Other factors such as leakage and noise on power/ground rails are also important. For example, PMOS transistors have lower leakage. In any case, since the total area overhead of the sleep transistors is relatively small (it is typically less than 5% of the total logic cell area), using NMOS versus PMOS sleep transistors does not make a big difference in terms of the total area. An important issue is the cost of implementing PMOS or NMOS sleep transistors in the given process technology. If NMOS sleep transistors are used, body connections of the NMOS transistors of logic cells have to be tied to the VGND node in order to minimize the body effect. On the other hand, the body connection of the NMOS sleep transistor has to be tied to the actual ground. Thus, a three-well CMOS process is required, which is more expensive than a typical two-well CMOS process. In contrast, if PMOS sleep transistors are used, the p-substrate easily separates the n-well of these transistors from other n-wells which contain PMOS transistors used in the normal cells.
We generate NP-SCCMOS circuits by taking the NP-MTCMOS and scaling both the NMOS and PMOS sleep transistors by the following factor:
where V tH, * and V tL, * denote the HVT and LVT values of NMOS or PMOS devices, respectively.
Finally, we generate CR-SCCMOS by enabling charge sharing with an appropriately sized TG. Similar to the CR-MTCMOS case, the size of this TG is determined through SPICE simulation with the goal of equating the wakeup times of NP-SCCMOS and CR-SCCMOS.
The control signal for the TG needs to be synchronized with the sleep signal generated by the power management unit. The pulse duration has to be long enough to enable charge sharing but not overly long since it adds to the wakeup time. Typically, 20%-30% of the total cycle time is sufficient for the charge-recycling operation to finish. For example, in a 90-nm CMOS technology with a clock frequency of 2.5 GHz, the cycle time is 400 ps. Thus, a 100-ps pulsewidth is a good choice for the charge-recycling operation. The task of synchronizing this pulse with the clock and power management control signal is similar to meeting other timing constraints in nanoscale CMOS designs.
Table II shows the energy saving results for various ST-MTCMOS circuits and their corresponding NP-MTCMOS and CR-MTCMOS ones. As one can see, the energy consumption during mode transition for CR-MTCMOS is less than that for ST-MTCMOS and NP-MTCMOS by an average of about 25% and 40%, respectively. Note that, in all reported cases, the wakeup times are equal. We have observed that the total sleep transistor area overhead in the NP-/CR-MTCMOS is 50% more than that in the ST-MTCMOS. Since this area overhead is only a small percentage of the total chip area (less than 5%), the actual sleep transistor area overhead due to using CR-MTCMOS compared to ST-MTCMOS is small.
Table III shows the energy saving results for various NP-SCCMOS and corresponding CR-SCCMOS circuits. As explained in Section VI, in order to have a fair comparison between MTCMOS and SCCMOS circuits, the value of the overdrive voltage for a PMOS super cutoff switch in the SCCMOS circuit is set to the threshold voltage difference between the HVT and LVT PMOS devices in the MTCMOS circuit. Similarly, the value of the underdrive voltage for an NMOS switch in the SCCMOS circuit is set to the threshold voltage difference between the HVT and LVT NMOS devices in the MTCMOS circuit.
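The pulsewidth rule of thumb given earlier in this section (20%-30% of the cycle time) amounts to a short calculation, sketched below. The function names are hypothetical, and the 20% and 30% bounds are simply the typical range quoted in the text rather than hard limits.

```python
def cycle_time_ps(f_clk_hz):
    """Clock period in picoseconds for a clock frequency in hertz."""
    return 1.0e12 / f_clk_hz


def pulse_window_ps(f_clk_hz, low_fraction=0.20, high_fraction=0.30):
    """Typical charge-recycling pulsewidth range as a fraction of the period."""
    period = cycle_time_ps(f_clk_hz)
    return low_fraction * period, high_fraction * period


if __name__ == "__main__":
    f_clk = 2.5e9  # the 2.5-GHz example from the text
    low, high = pulse_window_ps(f_clk)
    print(f"period = {cycle_time_ps(f_clk):.0f} ps, "
          f"pulsewidth window = {low:.0f}-{high:.0f} ps")
```

At 2.5 GHz, the window spans 80 to 120 ps, so the 100-ps pulsewidth chosen in the example sits at its midpoint.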
As one can see, the energy savings of CR-SCCMOS over NP-SCCMOS are about 36% on average for the same wakeup time.
Reducing ground and power rail bounces is among the important issues in designing MTCMOS circuits. As discussed in Section V, the proposed charge-recycling technique reduces the ground (power) bounce of the MTCMOS circuits. Table IV validates this expectation by reporting the positive and negative peaks of the GB for various NP-MTCMOS circuits and the corresponding CR-MTCMOS circuits. As one can see, the negative peak GB value of the CR-MTCMOS has decreased by an average of 33% compared to that of the NP-MTCMOS.
Next, we compare ST-MTCMOS and CR-MTCMOS circuits in terms of their total energy consumptions. The total energy consumptions in the ST-MTCMOS and CR-MTCMOS circuits may be written as the summation of their corresponding active and sleep mode energy consumptions plus the energy consumption due to the mode transition in these circuits
The active-mode energy consumption for both cases consists of two parts: dynamic and static (leakage) components. The active-mode energy components in ST-MTCMOS and CR-MTCMOS circuits can be written as
where c sw denotes the average switched capacitance for the circuit in each clock cycle, f clk is the clock frequency, I la denotes the average active leakage current in the circuit, and t active is the total time the circuit is active. Let N clk denote the number of the clock cycles over which energy calculations are performed. We can write
where T clk = 1/f clk is the clock period and α denotes the (active) duty factor which is defined as the percentage of the total time during which the circuit is in the active mode. The sleep-mode energy consumptions for the two circuits can be written as 
where c slp ST and c slp CR denote the total sleep transistor input capacitances in the ST-MTCMOS and CR-MTCMOS circuits, respectively, c G ST denotes the total VGND capacitance in the ST-MTCMOS circuit, and c G CR and c P CR denote the total VGND and virtual V DD capacitances in the CR-MTCMOS circuit, respectively. Finally, β is the mode transition frequency, i.e., the average number of mode transitions per clock cycle. We also define the mode transition factor in some time window T as the β value times the number of clock cycles in T. From (29), the active mode energy consumption is the same for both circuits, which means that the charge-recycling technique does not have any influence on the active mode energy consumption; therefore, we do not consider the active mode energy consumption component of (28).
Fig. 13 shows the percentage of the total energy savings of CR-MTCMOS over ST-MTCMOS as a function of the mode-transition frequency for three different duty factor values for one of the benchmark circuits, namely 9sym. As we increase the mode-transition factor, the percentage of energy savings increases for each case. This is because the charge-recycling technique can save energy during mode transition only. As we increase the duty factor α, the total sleep time will decrease and the total saving will consequently increase. This can be seen in Fig. 13 by comparing the energy saving plots for different duty factors. For large values of α (e.g., 0.9) and β, the sleep plus mode-transition ESR will be approximately equal to the mode-transition ESR (as was reported in Table II).
Table V shows active mode and sleep mode leakage current and mode transition energy consumption values for an LVT inverter in the library. The table contains two sets of data corresponding 
VIII. CONCLUSION
In this paper, we introduced the concept of charge recycling in MTCMOS and SCCMOS circuits. We showed that, by applying charge recycling to MTCMOS or SCCMOS circuits, we can save up to 43% of the energy wasted during mode transition while maintaining the wakeup time of the original MTCMOS or SCCMOS circuit. We also showed that, by using the proposed technique, we can reduce the negative peak voltage and the settling time of the GB that occurs while waking up the circuit. Since the charge-recycling transistors are much smaller than the sleep transistors, the leakage increase due to the additional sneak path in the proposed technique is usually quite small.
