Static power has become the most important factor in the fabrication of integrated circuits. Power gating techniques minimize leakage currents and help to develop ultra-low-power and high-performance digital circuits. In this paper, a power gating approach is proposed to minimize leakage in nanoscale technologies. Simulation results reveal that the proposed technique reduces leakage power by up to 96%, dynamic power by up to 33%, drowsy power by up to 49%, and energy by up to 16% compared to conventional techniques. The proposed technique offers good leakage reduction even under variation of the operating parameters.
Introduction
The usage of battery-powered portable devices such as laptops, mobile phones, and personal digital assistants (PDAs) has been increasing rapidly. The operating time of these devices is restricted by their battery lifetime, which has become a major motivation for low-power VLSI design [1] [2] [3]. Previously, the major concern of chip designers was dynamic power consumption, as it accounted for about 99% of the total chip power. As transistors in modern CMOS technology have been scaled down, leakage power has grown immensely and is approaching dynamic power.
Power gating is the most commonly used circuit technique for leakage power reduction in digital integrated circuits [4]. The power gating technique cuts off power to the circuit blocks when they are idle. Transistor-based power gating is implemented by placing sleep transistors in-line between the circuit and the power network or the ground network. Mutoh et al. proposed a power gating structure [5] that supports active and sleep modes; it is a state-destructive technique where the current output value of the circuit block might be lost. To preserve the data in the circuit block during idle periods, an intermediate data retention mode is required. Many power gating techniques [6] [7] [8] [9] have been proposed for leakage reduction and data retention. In all these techniques, the charge gets stored at the gate of the sleep transistor during active mode. The stored charge is dumped to ground and is wasted during the transition from active to sleep mode. No attempt is made to reuse the charge at the gate of the sleep transistor.
A new power gating structure is proposed in this paper to suppress power consumption and provide data retention by charge recycling. In the proposed low-power multimodal switch (LPMS) power gating structure, the drowsy and sleep transistors form a stack pair to minimize power consumption. The virtual ground node voltage is boosted by charge recycling to support a data retention mode. The resulting reduction in power consumption shows that the proposed structure is effective in minimizing energy. The rest of the paper is organized as follows: Section 2 discusses previous power gating approaches. Section 3 describes the proposed technique, and Section 4 presents the simulation methodology. Section 5 explains the results, and Section 6 concludes the paper.
Previous work
Various power gating techniques with low leakage and data retention capability exist in the literature. In [10], Tada et al. presented the sleep buffer (SB) approach shown in Figure 1, which retains the data in drowsy mode by reusing the charge stored at the sleep transistor gate during active mode. The sleep buffer approach does not support sleep mode; it is suitable only when the circuit block switches frequently between active and drowsy modes. The charge recycling (CR) technique [11] shown in Figure 2 supports active, drowsy, and sleep modes. The layout area of the charge recycling technique is small, but its drowsy mode power consumption is very high. The trimodal switch (TMS) technique [12, 13] shown in Figure 3 puts the circuit block in active, sleep, or drowsy mode depending on the control signals sleep and drowsy. The area requirement of the trimodal switch technique is high, and its leakage is also high, as there exists a sneak path from supply to ground through the drowsy transistors.
A low-power multimodal switch is proposed in this paper to support sleep mode and minimize power with minimum area overhead. By utilizing a drowsy transistor, the LPMS technique supports sleep mode. By connecting the sleep buffer (MS1 and MS2) to real ground through stacked pair transistors, the sneak path is eliminated in the LPMS technique. The power consumption, delay, power delay product (PDP), and area overhead of different power gating techniques are characterized in this paper. The impacts of sleep transistor width, sleep transistor threshold voltage, and temperature on power consumption are evaluated. Simulation results for the existing techniques and the proposed technique show the potential benefits of LPMS in terms of power and performance.
Proposed technique
This section presents the circuit configuration and functionality of the proposed power gating technique. The circuit configuration of the low-power multimodal switch is shown in Figure 4, and its functionality is provided in the accompanying table. The LPMS technique combines the concepts of power gating and the stack effect. The LPMS structure connects the circuit block to the ground rail during active mode and disconnects it from the ground rail in sleep mode. Transistors MD and MS are connected such that they form a stack pair to minimize power. Considering MD and MS together with the pulldown network of the circuit block, the increased stack length results in higher power reduction.
Experimental methodology
The LPMS technique is applied to various generic logic circuits to show that it is applicable to generic logic design. Three benchmark circuits were considered for experimentation: (i) a 4-bit adder, (ii) a 4:1 multiplexer, and (iii) a chain of 3 inverters. In order to compare the results of the proposed approach with existing leakage reduction approaches, the experiments include the techniques discussed in section 2, namely, the sleep buffer, charge recycling, and trimodal switch approaches.
Schematics and layouts are designed in 90 nm technology for all considered techniques using the digital schematic editor and simulator (DSCH) and the Microwind tool. The netlists generated from the layouts are modified to fit all targeted silicon technologies using the predictive technology models (PTM) [14] for 32, 45, 65, 90, and 130 nm processes. Synopsys HSPICE simulation is used to estimate delay and power consumption. The simulation procedure is summarized in Figure 5.
Experimental results
Leakage, dynamic, and drowsy power analyses are presented in sections 5.1, 5.2, and 5.3, respectively. Propagation delay analysis is done in section 5.4. Energy consumption of different power gating techniques is discussed in section 5.5. Impact of sleep transistor width, threshold voltage, and temperature on leakage and drowsy power is estimated for the 4-bit adder designed in 65-nm technology, and the results are analyzed in sections 5.6, 5.7, and 5.8, respectively. Layout of generic circuits and area comparison are presented in section 5.9.
Leakage power analysis
As leakage power varies according to input state, a subset of the possible input combinations is considered to estimate static power. All 8 possible input vectors of a full adder are considered for leakage power measurement. Out of 128 possible input combinations, 8 random input vectors are considered for the 4:1 multiplexer, as shown in Table 2. For the chain of 3 inverters, 2 input vectors, '1' and '0', are considered. When an input vector is asserted, power consumption is measured after the signal becomes stable (e.g., after 50 ns). The leakage power consumption of each circuit is derived by averaging the power consumption over all input combinations.
Table 2. Input sets for a 4:1 multiplexer leakage power measurement.
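The averaging procedure described above can be sketched as follows. This is a minimal illustration, not the authors' script: the trace format (lists of time and power pairs per input vector), the function names, and the sample readings are all hypothetical; only the 50 ns settling time comes from the text.

```python
from statistics import mean

SETTLE_NS = 50.0  # settling time quoted in the text as an example


def settled_power(samples, settle_ns=SETTLE_NS):
    """Mean power of the samples taken at or after the settling time.

    `samples` is a list of (time_ns, power) pairs for one input vector.
    """
    settled = [p for t, p in samples if t >= settle_ns]
    if not settled:
        raise ValueError("no samples at or after the settling time")
    return mean(settled)


def average_leakage(traces):
    """Average the settled power over every applied input vector."""
    if not traces:
        raise ValueError("at least one input vector is required")
    return mean(settled_power(samples) for samples in traces.values())


# Hypothetical readings (nW) for the two inverter-chain vectors '0' and '1'.
inverter_chain = {
    "0": [(0.0, 90.0), (50.0, 4.0), (60.0, 4.0)],
    "1": [(0.0, 80.0), (50.0, 6.0), (60.0, 6.0)],
}
print(average_leakage(inverter_chain))
```

Discarding the samples before the settling time keeps switching transients out of the static power estimate.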
Leakage power consumption of the 4:1 multiplexer is shown in Figure 6. The leakage of TMS and CR is high compared to that of the LPMS technique. The proposed LPMS technique offers high leakage reduction compared to the other power gating techniques; this is due to the stacked pair of transistors, MD and MS. The stack effect refers to the observation that 2 series-connected MOS devices that are off leak significantly less than a single off device [1]. The leakage reduction due to the stack effect can be illustrated mathematically by solving for the stack effect factor, which is defined as the ratio of the leakage in a single off transistor to the leakage in a stack of 2 off transistors [15].
where S is the subthreshold swing, λ_d is the drain induced barrier lowering (DIBL) factor, and k_γ is the body effect coefficient. In Figure 7, the intermediate node voltage V_x attains a steady state condition when the leakage currents in the transistors MD and MS are equal. Under these conditions, the leakage currents in MD and MS are given by
and the intermediate node voltage is
For short channel devices,
Substituting V x in either Eq. (3) or Eq. (4) will yield leakage in 2 stacked transistors.
where
The leakage reduction that can be obtained in a 2-transistor stack with widths W_MD and W_MS compared to a single transistor with width W is given by
When W_MD = W_MS = W, the stack factor X can be rewritten as
or
where U is the universal 2-stack exponent, which depends on the parameters λ_d, S, and V_dd. From Eq. (9), it can be observed that leakage is highly alleviated in a stacked pair of MOS devices in comparison to a single device. No sneak path exists in the LPMS structure as the source of MS2 is connected to ground through the stacked transistors, resulting in further leakage reduction. The sleep buffer approach is not included in the leakage power estimation, as sleep mode is not supported by it.
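For reference, the 2-transistor stack derivation described above can be sketched in the following standard form, using the symbols S, λ_d, k_γ, V_x, W_MD, W_MS, W, X, and U defined in the text. Several points are assumptions made for this sketch: MD is taken as the upper device and MS as the lower device with its source at ground, both gates and bodies are at 0 V, the full V_dd appears across the stack, I_0 (leakage per unit width at zero bias) and the exponent α are symbols introduced here, and the equations are left unnumbered because the original numbering cannot be confirmed.

```latex
\begin{align*}
I_{MD} &= W_{MD}\, I_0 \cdot 10^{\,[\lambda_d V_{dd} - (1 + k_\gamma + \lambda_d) V_x]/S} \\
I_{MS} &= W_{MS}\, I_0 \cdot 10^{\,\lambda_d V_x / S} \\
V_x &= \frac{\lambda_d V_{dd} + S \log_{10}(W_{MD}/W_{MS})}{1 + k_\gamma + 2\lambda_d}
      && \text{(from } I_{MD} = I_{MS}\text{)} \\
V_x &\approx \frac{\lambda_d V_{dd} + S \log_{10}(W_{MD}/W_{MS})}{1 + 2\lambda_d}
      && \text{(short channel, } k_\gamma \approx 0\text{)} \\
I_{stack} &= W_{MD}^{\alpha}\, W_{MS}^{1-\alpha}\, I_0 \cdot 10^{\,\alpha \lambda_d V_{dd}/S},
      \qquad \alpha = \frac{\lambda_d}{1 + 2\lambda_d} \\
X &= \frac{I_{single}}{I_{stack}}
   = \frac{W}{W_{MD}^{\alpha}\, W_{MS}^{1-\alpha}} \cdot 10^{\,(\lambda_d V_{dd}/S)(1-\alpha)},
      \qquad I_{single} = W I_0 \cdot 10^{\,\lambda_d V_{dd}/S} \\
X &= 10^{\,U}, \qquad
U = \frac{\lambda_d V_{dd}}{S}\,(1-\alpha)
  = \frac{\lambda_d V_{dd}\,(1 + \lambda_d)}{S\,(1 + 2\lambda_d)}
      && \text{(for } W_{MD} = W_{MS} = W\text{)}
\end{align*}
```

As a purely illustrative check with assumed values λ_d = 0.1, S = 100 mV/decade, and V_dd = 1 V, the exponent is U ≈ 0.92, so the stack leaks roughly 8 times less than a single device of the same width.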
Dynamic power analysis
Active power is estimated by asserting semirandom input signals and calculating the average power dissipation during this time. Inputs are chosen so that a large number of possible input combinations are included in the set. The average power dissipation reported by HSPICE is taken as the estimate of active power consumption.
For the 4-bit adder, the input vectors are asserted so that every possible input is covered. For the 4:1 multiplexer, the input vectors are chosen to represent a sample of the possible inputs, with at least 4 of the 7 input bits changing at every clock cycle. The active power of the chain of 3 inverters is measured by asserting a pulse signal of 25-MHz frequency.
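The average power figure used here is conceptually the supply voltage multiplied by the time-averaged supply current over the stimulus window. The sketch below computes that quantity from sampled data by trapezoidal integration. It is an illustration of the definition only, not a description of how HSPICE computes its reported value; the function name and the sample waveform are hypothetical.

```python
def average_power(times_s, currents_a, vdd):
    """Time-averaged supply power, P = Vdd * (1/T) * integral of i(t) dt.

    The integral is approximated with the trapezoidal rule over the samples.
    """
    if len(times_s) != len(currents_a) or len(times_s) < 2:
        raise ValueError("need at least two matching time/current samples")
    charge = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        charge += 0.5 * (currents_a[k] + currents_a[k - 1]) * dt
    window = times_s[-1] - times_s[0]
    return vdd * charge / window


# Hypothetical triangular current pulse: peak 2 A at t = 1 s, Vdd = 1 V.
print(average_power([0.0, 1.0, 2.0], [0.0, 2.0, 0.0], 1.0))
```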
The dynamic power consumption of the different power gating techniques for a 4:1 multiplexer is shown in Figure 8, which reveals that LPMS consumes less dynamic power than the other power gating techniques. The dynamic power of the sleep buffer, charge recycling, and TMS approaches is the same. The LPMS virtual ground voltage is influenced by the on resistance of 2 transistors (MD and MS), whereas in the other power gating techniques, the virtual ground voltage is influenced by the on resistance of a single sleep transistor. Hence, the LPMS technique offers higher dynamic power reduction, as the voltage across the logic circuit blocks is lower due to the higher virtual ground voltage.
Drowsy power analysis
Drowsy mode is an intermediate power-saving mode introduced to retain data if the circuit block remains idle for a short duration. A subset of possible input combinations that are used for estimating leakage power is considered for measuring drowsy power. The circuit block is made to switch from active to drowsy mode, and the drowsy power is estimated for 50 ns.
When the circuit switches to drowsy mode, charge recycling takes place between the V_GND and V_G nodes, turning the sleep transistor partially on, so that current flows from the virtual ground node to the real ground node. The V_GND voltage settles at the point where the leakage current of the internal circuits balances the current through the transistors between the virtual ground and real ground nodes. Due to the stack effect, the equilibrium virtual ground voltage of the LPMS technique is higher than that of the other techniques; hence, drowsy power is decreased to a greater extent. Figure 9 shows the results for 4:1 multiplexer drowsy power as the technology scales from 130 nm to 32 nm. The drowsy power of the CR technique is higher than that of the other techniques, and the drowsy power of the trimodal switch is higher than that of the sleep buffer and LPMS techniques. It is evident from Figure 9 that the LPMS technique consumes less drowsy power than the sleep buffer, charge recycling, and trimodal switch approaches.
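The balance point described above can be illustrated with a toy model solved by bisection. Everything numerical here is an assumption made for illustration: the linear current models, the conductance values, and the function names are hypothetical and are not taken from the simulated circuits. The only property borrowed from the text is that two stacked devices conduct less than a single device, which pushes the equilibrium virtual ground voltage higher.

```python
def equilibrium_voltage(i_logic, i_switch, vdd, tol=1e-9):
    """Find the virtual ground voltage where i_logic(v) == i_switch(v).

    Assumes i_logic falls and i_switch rises as v goes from 0 to vdd,
    so the two curves cross exactly once in that interval.
    """
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if i_logic(mid) > i_switch(mid):
            lo = mid  # logic still sources more current: node keeps rising
        else:
            hi = mid
    return 0.5 * (lo + hi)


VDD = 1.0  # hypothetical supply voltage (V)


def logic_leakage(v):
    """Toy model: block leakage shrinks as the voltage across it shrinks."""
    return 1e-6 * (VDD - v) / VDD


def single_switch(v):
    """Toy model: one partially-on transistor, conductance 4 uS."""
    return 4e-6 * v


def stacked_switch(v):
    """Toy model: two such transistors in series, half the conductance."""
    return 2e-6 * v


v_single = equilibrium_voltage(logic_leakage, single_switch, VDD)
v_stack = equilibrium_voltage(logic_leakage, stacked_switch, VDD)
print(round(v_single, 4), round(v_stack, 4))
```

In this toy setting the stacked path settles at a higher virtual ground voltage, which leaves less voltage across the logic block and therefore less drowsy-mode current.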
Delay analysis
The propagation delay is estimated as the time from the input edge reaching 50% of the supply voltage to the circuit output edge reaching 50% of the supply voltage. Figure 10 reveals that the delay of the 4:1 multiplexer decreases as technology scales down. The delay of SB, CR, and TMS is less than that of the LPMS technique.
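The 50%-to-50% definition above can be sketched on sampled waveforms as follows. This is a minimal illustration with linear interpolation between samples; the function names and the sample waveforms are hypothetical, and a circuit simulator would normally report this measurement directly.

```python
def crossing_time(times, volts, level):
    """First time the waveform crosses `level`, by linear interpolation."""
    for k in range(1, len(times)):
        v0, v1 = volts[k - 1], volts[k]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            frac = (level - v0) / (v1 - v0)
            return times[k - 1] + frac * (times[k] - times[k - 1])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd):
    """Delay between the 50% points of the input and output edges."""
    half = 0.5 * vdd
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


# Hypothetical inverting stage: input rises near t = 0.5, output falls near 2.5.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [0.0, 1.0, 1.0, 1.0, 1.0]
v_out = [1.0, 1.0, 1.0, 0.0, 0.0]
print(propagation_delay(t, v_in, v_out, vdd=1.0))
```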
The propagation delay of the LPMS technique is higher than that of the other techniques because it employs a stacked pair of transistors. The analytical delay model of a stacked inverter can be compared with that of the conventional inverter to show that the stack effect increases delay. Figure 11 shows the conventional inverter and its RC equivalent circuit, where C_L is the load capacitance, R_t is the transistor resistance, and C_in is the input capacitance. Generally, the transistor delay of the conventional inverter [16] is given by
The stacked inverter can be realized by replacing the single MOS device with 2 MOS devices. Figure 12 shows the stacked inverter and its RC equivalent circuit. C_x represents the internal node capacitance between the 2 pulldown transistors. Using the Elmore equation, the delay of the stacked inverter is
It can be observed from Eqs. (10) and (12) that the delay of a circuit increases due to the stack effect.
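The comparison can be made concrete with a first-order RC sketch. The expressions used below are the textbook forms and are assumptions here, since the exact equations of the paper are not reproduced: the conventional inverter delay is taken as ln(2)·R_t·C_L, and the stacked inverter delay as the Elmore sum ln(2)·(R_t·C_x + 2·R_t·C_L), with both stacked devices assumed to have the same on resistance R_t. The component values are hypothetical.

```python
import math

LN2 = math.log(2)  # factor relating an RC time constant to the 50% point


def inverter_delay(r_t, c_l):
    """First-order delay of a single transistor discharging C_L."""
    return LN2 * r_t * c_l


def stacked_inverter_delay(r_t, c_x, c_l):
    """Elmore delay of two series transistors, each with resistance r_t.

    C_x discharges through the lower device only (r_t), while C_L
    discharges through both devices in series (2 * r_t).
    """
    return LN2 * (r_t * c_x + 2.0 * r_t * c_l)


# Hypothetical values: 1 kOhm on resistance, 1 fF internal node, 10 fF load.
R_T, C_X, C_L = 1e3, 1e-15, 10e-15
single = inverter_delay(R_T, C_L)
stacked = stacked_inverter_delay(R_T, C_X, C_L)
print(single, stacked, stacked / single)
```

This is only a first-order model; it shows the direction of the effect and is not expected to reproduce the simulated delay percentages.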
Power delay product analysis
The power delay product (PDP) defines the energy consumed by the circuit block, and it is necessary to minimize the energy overhead. The PDP is estimated by multiplying the propagation delay by the dynamic power. Figure 13 shows the impact of technology scaling on the energy consumption of the 4:1 multiplexer. The energy overhead decreases as the device size shrinks. The PDP of the TMS, sleep buffer, and charge recycling techniques is the same; the power delay product of the LPMS technique is lower due to its lower power consumption compared to the other power gating techniques.
Table 3 lists the results for the 32 nm technology implementation of each benchmark circuit. It is evident from Table 3 that the proposed LPMS technique reduces dynamic, drowsy, and leakage power significantly compared to the other power gating techniques for all of the generic logic circuits, namely the chain of 3 inverters, the 4-bit adder, and the 4:1 multiplexer. For instance, considering the results of the chain of 3 inverters implemented in 32 nm technology, the LPMS approach provides a reduction of up to 2% in drowsy power, 33% in dynamic power, and 15% in energy compared to the sleep buffer technique. Compared to the CR technique, LPMS provides a 9% reduction in leakage, a 49% reduction in drowsy power, a 33% reduction in dynamic power, and a 16% reduction in energy. Compared to the TMS technique, LPMS achieves a reduction of about 96% in leakage, 47% in drowsy power, 33% in dynamic power, and 15% in energy. The delay of the LPMS approach is 26.8%, 25.7%, and 26.1% higher than that of the TMS, CR, and SB techniques, respectively.
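The energy figures can be cross-checked from the quoted power and delay percentages, since the PDP is the product of the two. The sketch below uses normalized values: the baseline technique is set to 1.0 for both power and delay, and the LPMS values follow from the 33% dynamic power reduction and the 26.1% delay increase quoted relative to the sleep buffer technique. The function names are hypothetical.

```python
def power_delay_product(dynamic_power, delay):
    """PDP (energy per switching event) as dynamic power times delay."""
    return dynamic_power * delay


def percent_reduction(baseline, proposed):
    """Reduction of `proposed` relative to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Normalized sleep buffer baseline versus LPMS (33% less power, 26.1% slower).
pdp_sb = power_delay_product(1.0, 1.0)
pdp_lpms = power_delay_product(1.0 - 0.33, 1.0 + 0.261)
print(round(percent_reduction(pdp_sb, pdp_lpms), 1))
```

The result is an energy reduction of roughly 15.5%, in line with the reported 15% to 16% range: the lower dynamic power outweighs the longer delay.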
Impact of sleep transistor width scaling
Increasing the sleep transistor width increases power consumption. In this section, the leakage and drowsy power of the different power gating techniques are characterized with varying sleep transistor width. Figure 14 shows that leakage power increases as the sleep transistor width is varied from 1 µm to 10 µm. The leakage power of the LPMS technique is significantly less than that of the charge recycling and trimodal switch approaches for all sleep transistor sizes. As the sleep transistor width changes from 1 µm to 10 µm, the leakage power reduction of LPMS varies from 87% to 90% and from 78.5% to 88% compared to the TMS and CR techniques, respectively. Figure 15 shows the simulation results of 4-bit adder drowsy power with varying sleep transistor width. Due to the stack effect of MD and MS, the LPMS technique consumes less drowsy power than the sleep buffer, charge recycling, and trimodal switch techniques. The LPMS technique reduces drowsy power by approximately 49% relative to the TMS technique and by approximately 10% relative to the sleep buffer and charge recycling techniques for all sleep transistor sizes ranging from 1 µm to 10 µm.
Impact of sleep transistor threshold voltage
Choosing the right threshold value for the sleep transistor is very important in terms of power consumption. In this section, the leakage and drowsy power of the different power gating techniques are compared by varying the threshold voltage of the sleep transistors. Figure 16 shows the leakage power results of the 4-bit adder while varying V_th. As the sleep transistor threshold voltage is increased from 0.3 to 0.5 V, leakage power decreases. The leakage power of the charge recycling and TMS approaches reduces drastically as the sleep transistor threshold voltage increases, but their leakage is still higher than that of the LPMS technique. At a threshold value of 0.3 V, LPMS leakage consumption is approximately 95% less than that of the CR and TMS approaches. At a V_th of 0.5 V, the leakage of LPMS is 49% and 80% less than that of the CR and TMS techniques, respectively. Figure 17 shows the estimated drowsy power results with varying sleep transistor threshold voltage. The drowsy power of the charge recycling technique is higher than that of the other power gating techniques. The drowsy power of the sleep buffer and trimodal switch is the same. When the transistor threshold voltage is 0.5 V, the LPMS technique offers a maximum drowsy power reduction of about 13% and 16% compared to the SB and TMS techniques, respectively.
Impact of temperature
The impact of temperature on leakage and drowsy power of the 4-bit adder is presented in this section.
Temperature affects the leakage power of digital circuits adversely, as subthreshold leakage increases with temperature. Figure 18 shows the effect of temperature on leakage power. The leakage power consumption of TMS is approximately 80% more than that of the LPMS technique at 100 °C. The leakage power reduction of LPMS varies from 32% to 79.6% compared to the charge recycling technique as the temperature changes from 0 °C to 100 °C. Figure 19 shows the impact of temperature on 4-bit adder drowsy power. Results and analysis in sections 5.6, 5.7, and 5.8 show that the proposed LPMS technique performs better in terms of leakage and drowsy power reduction than the other power gating techniques, even if the operating parameters change.
Layout area comparison
A comparison of the layout area of the logic circuits with the different power gating techniques is presented in this section. The layouts are drawn with the Microwind layout tool using 90 nm technology. Figure 20 shows the layouts of the 4:1 multiplexer and the 4-bit adder using the LPMS technique. The layout areas of the chain of 3 inverters, 4-bit adder, and 4:1 multiplexer with the different power gating techniques are listed in Table 4. Considering the chain of 3 inverters, the charge recycling technique has the smallest layout area among all the power gating techniques evaluated in this paper. The sleep buffer technique has an area overhead of about 20.6% compared to the charge recycling technique. As the TMS approach uses 4 transistors in its drowsy section, its area is 48.6% and 23% higher than that of the charge recycling and sleep buffer techniques, respectively. Although the LPMS technique has an area overhead of about 36% and 13% relative to the CR and SB techniques, respectively, the area requirement of LPMS is 8% less than that of the TMS technique.
Conclusion
In this paper, an effective power gating technique to reduce leakage, drowsy, and dynamic power in generic logic circuits is proposed. It is apparent from the power characteristics that the LPMS approach is better than conventional techniques. The LPMS technique also reduces energy consumption by up to about 16% relative to the other approaches. The proposed LPMS technique provides excellent leakage and drowsy power reduction even when the sleep transistor width, threshold voltage, and temperature change. The LPMS technique can be used in applications such as wireless sensor nodes, SoCs, and microprocessors to extend the battery life of these devices.
