Abstract-Power gating is one of the most effective techniques for reducing leakage power, which increases exponentially with device scaling. However, large ground bounces during abrupt changes of power mode may cause unwanted transitions in neighboring circuits, which should still be operating normally. We analyzed this ground-bounce noise and reduced it with novel power-gating structures that utilize holistic integrated device-circuit-architecture approaches. We control the amount of charge in the intermediate nodes of the circuit that passes through the sleep transistors during the wake-up transition and stabilize the minimum virtual power supply voltage required for data retention. These techniques have been proven in silicon using 65-nm bulk CMOS technology.
I. INTRODUCTION

SHORTENING the gate length of a transistor increases its power consumption due to the increased leakage current between the transistor's source and drain when no signal voltage is applied at the gate. This can occur, for example, when a mobile phone is on standby awaiting calls and no data processing is underway.
A tremendous increase in transistor leakage current is the primary disadvantage of technology scaling. Leakage affects not only the standby and active power consumption of a CMOS system but also circuit reliability since leakage is strongly correlated to process variations. The influence of leakage current on circuit performance depends on the operating conditions (e.g., standby or active), the circuit style (e.g., logic or memory), and the environmental conditions (e.g., the supply voltage).
There are several different techniques that can be used to tackle the leakage from various angles. Power gating is one well-known way of reducing leakage, and it continues to be applied to very-deep submicrometer CMOS technologies. There has been a lot of work [3]-[5] on the multithreshold voltage CMOS (MTCMOS) technique, which uses a MOSFET switch to gate, or cut off, a circuit from its power rail(s) during standby mode. However, without a clear understanding of the technique, the negative effects of power gating and the range of device options may overwhelm the potential benefits. The power-gating switch is typically positioned between the circuit and the power supply rail or between the circuit and the ground rail. During active operation, the power-gating switch remains on, supplying the current that the circuit uses to operate. During standby mode, turning off the power-gating structure reduces the current dissipated through the circuit. Since the switch gates the power when the circuit is in standby, it is also commonly called a sleep transistor [6]-[9].
Many vendors of low-power embedded products now include a power-gating capability in the form of "sleep" modes, which typically operate under software control [10], [11]. When the operating system detects a long idle loop, one of the several processor cores continues to run at its maximum operating frequency, while the other cores are power-gated off [12].
By turning off the sleep transistor during the sleep period, however, all the internal capacitive nodes of the logic blocks and virtual VDD (VVDD) nodes are discharged to a steady-state value near ground (GND). During a power-mode transition, an instantaneous charge current passes through the sleep transistor, which is operating in its saturation region, and creates current surges elsewhere. Because of the self-inductance of the off-chip bonding wires and the parasitic inductance inherent to the on-chip power rails, these surges result in voltage fluctuations in the power rails. If the magnitude of the voltage surge or drop is greater than the noise margin of a circuit, that circuit may erroneously latch to the wrong value or switch at the wrong time.
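The relationship between the current surge and the rail fluctuation can be illustrated with the standard inductor relation V = L · di/dt. The following is a minimal sketch only: the inductance, current step, ramp times, and noise margin are hypothetical values chosen for illustration and are not taken from the test chips described in this paper.

```python
def ground_bounce_volts(inductance_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Magnitude of the inductive voltage L * di/dt for a linear current ramp."""
    if delta_t_s <= 0:
        raise ValueError("delta_t_s must be positive")
    return inductance_h * abs(delta_i_a) / delta_t_s


def violates_noise_margin(bounce_v: float, noise_margin_v: float) -> bool:
    """True if the bounce exceeds the noise margin of a neighboring circuit."""
    return bounce_v > noise_margin_v


# Hypothetical numbers (assumptions, not measurements from the paper).
L_RAIL_H = 1e-9        # 1 nH of bond-wire plus on-chip rail inductance
DELTA_I_A = 0.1        # 100 mA wake-up current step
NOISE_MARGIN_V = 0.3   # assumed noise margin of a nearby gate

fast_bounce = ground_bounce_volts(L_RAIL_H, DELTA_I_A, 0.2e-9)  # abrupt turn-on
slow_bounce = ground_bounce_volts(L_RAIL_H, DELTA_I_A, 2e-9)    # 10x slower ramp

print(f"abrupt:  {fast_bounce:.3f} V, violates margin: "
      f"{violates_noise_margin(fast_bounce, NOISE_MARGIN_V)}")
print(f"gradual: {slow_bounce:.3f} V, violates margin: "
      f"{violates_noise_margin(slow_bounce, NOISE_MARGIN_V)}")
```

Under these assumed values, slowing the current ramp by a factor of ten reduces the bounce by the same factor, which is the intuition behind turning the sleep transistor on gradually.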
Inductive noise, also known as simultaneous switching noise, is a phenomenon that has been traditionally associated with input/output buffers and internal circuitry [13]-[15]. In the past, inductive noise originating from power-mode transitions between the active and standby modes of a power-gating structure was not considered serious, but it is likely to become an important issue in the design of a system-on-a-chip (SOC) that employs multiple power-gating domains to control leakage power [16]-[18]. As shown in Fig. 1, inductive noise can induce ground bounce in nearby circuits, which should still be operating normally. The noise immunity of a circuit decreases as its supply voltage is reduced. It is therefore essential to consider using a technique such as power gating to address the problem of ground bounce in low-voltage CMOS circuits.
Fig. 1. Ground bounce in a SOC employing multiple power-gating structures to control leakage power.
0018-9383/$25.00 © 2008 IEEE
In this paper, we will introduce and analyze the ground bounce induced by an instantaneous power mode transition of a sleep transistor in a power-gating structure. We will also present test chip measurements that indicate the extent of the inductive noise caused by quick turn-on of the sleep transistor in a conventional power-gating structure.
We will go on to propose novel power-gating structures to reduce ground bounce by turning the sleep transistors on in a stepwise manner. These structures reduce the magnitude of voltage fluctuations in the power distribution network, as well as the time required to stabilize them. Stepwise switching of the sleep transistors can be implemented either by dynamically controlling the gate-to-source voltage V GS of a sleep transistor, by turning on only a proportion of the sleep transistors at one time, or by gradually releasing the trapped charge that causes the inductive noise. This stepwise switching technique consists of a relaxation stage, followed by a full turning-on stage. During the relaxation stage, the gate voltage of the sleep transistor is charged to a fraction of the rail voltage, and only a small portion of the sleep transistor is switched to full-rail, or stacked sleep transistors are switched in a nonoverlapping pulse manner. This stage significantly cuts the V DS of the sleep transistor with only a small peak current. During the full turning-on stage, V GS is charged to VDD, the remaining portion of the sleep transistor is completely switched on, or the stacked sleep transistors are turned on simultaneously.
We also present a power-gating structure that digitally suppresses ground-bounce noise and stabilizes data retention in ultradeep submicrometer technologies with a VDD below 1.2 V. Unlike previously published power-gating structures, ours reduces ground-bounce noise by precisely controlling the amount of charge supplied to a functional logic unit at a particular time while assuring the minimum "virtual power supply" voltage VVDD required for data retention by monitoring and feedback.
We have evaluated our new power-gating structures by designing and fabricating a test structure in 65-nm CMOS bulk technology using single-threshold devices for both logic and sleep transistors. At the end of this paper, we present measured results from this structure that show the potential benefits of our approach.
II. KEY OBSERVATIONS
A. Understanding Ground Bounce
In active mode, a sleep transistor in a power-gating structure operates in its linear region, in which it may be modeled by a resistor R active . This generates a small voltage drop V VGND equal to I active × R active , where I active is the total current demand of the logic block operating in active mode. The voltage drop reduces the gate's drive capability from VDD to VDD-V VGND and increases the threshold voltage of NMOS pull-down devices due to body effect. Both effects degrade the speed of the circuit, and so, the sleep transistor should not be too small.
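The active-mode penalty described above is a simple IR-drop calculation. The sketch below works through it with hypothetical current and resistance values, and assumes that the on-resistance scales inversely with the sleep-transistor width (a first-order approximation that is not stated in the text).

```python
def active_mode_drop(i_active_a: float, r_active_ohm: float, vdd_v: float):
    """Return (V_VGND, effective drive voltage) for a linear-region sleep transistor."""
    v_vgnd = i_active_a * r_active_ohm   # V_VGND = I_active x R_active
    return v_vgnd, vdd_v - v_vgnd        # drive falls from VDD to VDD - V_VGND


# Hypothetical operating point (assumed values for illustration only).
VDD_V = 1.2
I_ACTIVE_A = 0.05     # 50 mA total active current
R_ACTIVE_OHM = 1.0    # on-resistance of the full-width sleep transistor

drop_full, drive_full = active_mode_drop(I_ACTIVE_A, R_ACTIVE_OHM, VDD_V)

# Assumption: halving the transistor width doubles its on-resistance.
drop_half, drive_half = active_mode_drop(I_ACTIVE_A, 2 * R_ACTIVE_OHM, VDD_V)

print(f"full width: drop {drop_full:.3f} V, drive {drive_full:.3f} V")
print(f"half width: drop {drop_half:.3f} V, drive {drive_half:.3f} V")
```

The calculation covers only the reduced drive voltage; the additional slowdown from the body effect on the NMOS pull-down devices is not modeled here.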
In standby mode, the sleep transistor operates in the cutoff region and may be modeled by an open switch. In this mode, the sleep transistor limits the leakage current, but all internal capacitive loads connected to the VGND node through NMOS pulldown devices are charged up to a steady-state value near VDD.
If the sleep transistor is abruptly turned fully on, all the charge trapped in the internal capacitive nodes and the VGND node is discharged rapidly through the switched NMOS pull-down paths of the logic blocks and the sleep transistor. For a time, the sleep transistor operates in its saturation region and may be modeled by a current source. The current that can flow through the sleep transistor in this situation is much larger than the active-mode current I active , and this current surge induces voltage fluctuations in the power distribution network.
B. Experimental Evaluation of a Conventional Power-Gating Structure
In this section, we present measurements obtained from a test chip which was specifically designed and implemented to evaluate conventional power gating. In particular, we will show the seriousness of the inductive noise induced by instant turn-on of the sleep transistor in a conventional power-gating structure. Fig. 2 shows a microphotograph and block diagram of the test chip, which was designed and implemented in 0.13-µm CMOS technology. It includes two identical 40-bit DSP arithmetic logic units (ALUs): one is directly powered by the VDDL 2 grid, while the other draws power from VVDD 1 , which is serially connected to the VDDL 1 grid through a sleep transistor. This sleep transistor is sized at less than 1% of the total ALU PMOS and NMOS width and is composed of a parallel instantiation of standard cells. The critical path through each ALU includes a saturating adder with data inputs supplied by two 40-bit linear feedback shift registers (LFSRs) that generate pseudorandom patterns with a switching factor of approximately 50%. Each element in the pipeline includes a data transition barrier to prevent unwanted switching of elements when they are clock-gated. Results are transferred from the output register to a multiple-input signature register (MISR).
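The LFSR-driven, MISR-checked datapath can be sketched in software as follows. This is an illustrative model only: it uses 16-bit registers rather than the 40-bit ones on the test chip, an unsigned saturating add as a stand-in for the ALU operation, and a commonly cited maximal-length polynomial (x^16 + x^14 + x^13 + x^11 + 1). The seeds, the polynomial, and the function names are assumptions and are not taken from the chip.

```python
MASK16 = 0xFFFF


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return ((state >> 1) | (bit << 15)) & MASK16


def misr16_step(signature: int, data_word: int) -> int:
    """Compact one result word into the signature (shift with feedback, then XOR)."""
    return lfsr16_step(signature) ^ (data_word & MASK16)


def run_signature(cycles: int, seed_a: int = 0xACE1, seed_b: int = 0x1D2C,
                  fault_cycle=None) -> int:
    """Drive a saturating adder with two LFSRs and return the final MISR signature.

    If fault_cycle is given, one result bit is flipped in that cycle to mimic
    a timing failure on the critical path.
    """
    a, b, signature = seed_a, seed_b, 0
    for cycle in range(cycles):
        result = min(a + b, MASK16)      # unsigned saturating add
        if cycle == fault_cycle:
            result ^= 1                  # inject a single-bit error
        signature = misr16_step(signature, result)
        a, b = lfsr16_step(a), lfsr16_step(b)
    return signature


golden = run_signature(1000)
faulty = run_signature(1000, fault_cycle=500)
print(f"golden signature: {golden:#06x}")
print(f"faulty signature: {faulty:#06x}")
print("error detected:", golden != faulty)
```

Because the signature update is a one-to-one mapping of the previous state, a single corrupted result word always changes the final signature in this model, which is why comparing one signature against a known-good value is sufficient to flag a failing run.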
Repeated tests were made over a range of supply voltages between 0.9 and 1.5 V. In each test, the clock frequency was increased until an error signature was detected by the MISR. The highest nonfailing frequency at each supply voltage was recorded. The standby leakage power was measured by stopping the clocks and recording the supply current.
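The measurement procedure above amounts to a linear frequency sweep at each supply voltage. A minimal sketch of that loop is given below; the pass/fail callback stands in for a real MISR signature comparison, and the frequency limits, step size, and the toy pass/fail model are hypothetical.

```python
from typing import Callable, Optional


def highest_passing_frequency(passes_at: Callable[[int], bool],
                              f_start_mhz: int,
                              f_stop_mhz: int,
                              f_step_mhz: int) -> Optional[int]:
    """Raise the clock until the first failure; return the last passing frequency.

    Returns None if the circuit already fails at the starting frequency.
    """
    last_good = None
    f = f_start_mhz
    while f <= f_stop_mhz:
        if not passes_at(f):   # e.g. MISR signature differs from the golden value
            break
        last_good = f
        f += f_step_mhz
    return last_good


def toy_passes(vdd_mv: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for the chip: the frequency limit grows with supply voltage."""
    limit_mhz = vdd_mv // 2    # assumed model, for illustration only
    return lambda f_mhz: f_mhz <= limit_mhz


# Sweep supply voltages from 0.9 V to 1.5 V in 100-mV steps.
results = {
    vdd_mv: highest_passing_frequency(toy_passes(vdd_mv), 100, 1000, 10)
    for vdd_mv in range(900, 1501, 100)
}
for vdd_mv, fmax in results.items():
    print(f"{vdd_mv / 1000:.1f} V -> {fmax} MHz")
```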
The test results in Figs. 3 and 4 compare the performance and leakage power consumption of the two ALUs. Fig. 3 shows that the small sleep transistor incurs a performance penalty that ranges from 23% at the lowest operating frequency to 13% at the highest. Nevertheless, a sleep transistor of this size is effective in reducing the leakage power. The differential in power consumption is nearly three orders of magnitude at the lowest supply voltage and nearly four orders of magnitude at the highest, as shown in Fig. 4 .
We also measured the "wake-up latency," which is the time that elapses in bringing the circuit out of sleep mode, until it is operating at 95% of the maximum operating frequency for a given supply voltage. The inductive noise due to clock gating is effectively excluded from this measurement since the performance degradation due to the clock gating itself is around 5%. Initially, we turned off the sleep transistor by setting |V GS | = 0 and waited until all internal nodes and the VVDD 1 node were completely discharged. Then, we turned on the sleep transistor by setting |V GS | = VDD and measured the shortest wake-up latency that did not lead to failure. This test was repeated for a range of supply voltages. The resulting wake-up times, ranging from 498 to 807 ns, as shown in Fig. 5 , demonstrate the serious effect on performance of the inductive noise due to the power-gating structure. To make matters worse, the other circuits sharing the same power rails are similarly disturbed.
III. PROPOSED POWER-GATING STRUCTURES
In previously published power-gating structures, the sleep transistor is implemented as a single transistor or a set of transistors. As shown in Fig. 6 , a sleep transistor implemented as a set of individual transistors wired in parallel is effectively a single transistor because the transistors share both a VVDD node and a VDDL rail and are turned on simultaneously. During a mode transition, the large instantaneous current flowing through the sleep transistor of a conventional power-gating structure causes large voltage fluctuations in the on-chip power distribution network.
We propose three different approaches to minimize the instantaneous current flow through the sleep transistor. The first is to control V GS and, hence, V DS dynamically, as shown in Fig. 7(a) . During the relaxation stage, the sleep transistor is weakly turned on with V GS = V X (0 < V X < VDD), until its V DS is significantly reduced. Later, the sleep transistor is completely turned on with V GS = GND. When the V DS of the sleep transistor is small enough, the instantaneous current is less sensitive to variations in the V GS of the sleep transistor, which allows V GS to be increased in nonuniform steps without increasing the instantaneous peak current.
Our second approach is to change the effective size of the sleep transistor dynamically, as shown in Fig. 7(b) . Initially, the sleep transistor is only partially turned on, with its V GS equal to GND until its V DS is significantly reduced. Then, the sleep transistor is completely turned on, with its V GS at GND. When V DS is small enough, the instantaneous current is less dependent on the extent to which the sleep transistor is turned on, which is why we turn it on in a nonuniform stepwise manner.
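The benefit of enabling only part of the sleep transistor first can be illustrated with a first-order behavioral model. In the sketch below the sleep transistor is treated as a current source while it is saturated and as a resistor once its V DS is small, and both the saturation current and the on-conductance are assumed to scale with the fraction of the width that is enabled. All component values, the 10% initial fraction, and the 0.1-V switch-over threshold are hypothetical, and rail inductance is ignored, so the peak current serves only as a proxy for ground bounce.

```python
def simulate_wakeup(schedule, vdd=1.2, c_vvdd=1e-9, i_sat_full=0.1,
                    r_on_full=2.0, dt=1e-11, t_end=300e-9):
    """Charge the VVDD node through the sleep transistor using forward Euler.

    schedule(v_ds) returns the enabled fraction (0..1] of the sleep-transistor
    width for the present drain-to-source voltage.
    Returns (final VVDD in volts, peak current in amperes).
    """
    vvdd, peak = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v_ds = vdd - vvdd
        frac = schedule(v_ds)
        # Saturation limit versus linear-region (resistive) limit.
        current = min(frac * i_sat_full, v_ds * frac / r_on_full)
        peak = max(peak, current)
        vvdd += current * dt / c_vvdd
    return vvdd, peak


def abrupt(v_ds):
    """Conventional wake-up: the whole sleep transistor is enabled at once."""
    return 1.0


def stepwise(v_ds):
    """Relaxation stage at 10% width until V_DS drops below 0.1 V, then full width."""
    return 0.1 if v_ds > 0.1 else 1.0


v_abrupt, peak_abrupt = simulate_wakeup(abrupt)
v_step, peak_step = simulate_wakeup(stepwise)
print(f"abrupt:   peak {peak_abrupt * 1e3:.1f} mA, final VVDD {v_abrupt:.3f} V")
print(f"stepwise: peak {peak_step * 1e3:.1f} mA, final VVDD {v_step:.3f} V")
```

In this model the stepwise schedule lowers the peak current at the cost of a longer wake-up time, since most of the charge is delivered through the weaker relaxation stage. The numbers it produces depend entirely on the assumed parameters and are not intended to reproduce the measured results.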
Our third approach is to control the amount of charge trapped in the internal parasitic capacitive loads precisely by means of the charge-sharing effect, as shown in Fig. 7(c). Two PMOS sleep transistors (M1 and M2) are stacked between VDD and VVDD, with a metal-to-metal capacitor (C M2M ) between them. To reduce the ground-bounce noise, either M1 or M2 is turned on and off by nonoverlapping or pseudorandom pulses, while the presence of C M2M allows us to digitally control the amount of charge supplied to the logic during the change from sleep to active modes. In detail, a charge passes from VDD to the metal-to-metal capacitor C M2M via M1 and then to VVDD via M2. By repeating this process, VVDD eventually reaches the level of VDD. At this stage, both M1 and M2 are turned on, connecting VVDD to VDD.
Fig. 8 shows how our power-gating structure combines the aforementioned three approaches to minimize the instantaneous current flow through the sleep transistor. We can also insert an NMOS data-retention device in parallel with the power-gating device, which allows us to support an intermediate power-saving and data-retention mode in addition to the power cutoff mode. The minimum voltage that is guaranteed not to violate the static noise margin of the storage elements in a sequential circuit is known to be about 0.7 V. Considering the low supply voltage of 65-nm CMOS (V DD = 1.2 V), the threshold voltage of NMOS devices (V TN = 0.3 V + α), and their process, voltage, and temperature (PVT) variations, reliable data retention is hard to ensure using only NMOS. To stabilize VVDD by monitoring and feedback, we add a PMOS charge pumping device (M4) in parallel with the conventional power-gating devices (M1, M2, and M3). We monitor VVDD using external circuitry and feed the output back to the pumping device. The resulting stabilized VVDD allows us to retain stored data reliably and to make a further reduction in data-retention voltage (DRV).
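The charge-sharing wake-up can be sketched as an ideal two-phase switched-capacitor transfer. In each cycle, C M2M is first charged to VDD through M1 and then shares its charge with the VVDD node through M2, so that charge conservation gives VVDD' = (C M2M · VDD + C VVDD · VVDD) / (C M2M + C VVDD). The model below assumes ideal switches, complete settling in each phase, and no current drawn by the logic. The VVDD node capacitance and the stopping tolerance are hypothetical, and the value used for C M2M is the total capacitance reported for the 65-nm test chip.

```python
def charge_share_wakeup(vdd: float, c_m2m: float, c_vvdd: float,
                        tolerance_v: float, max_cycles: int = 10_000):
    """Return the VVDD voltage after each M1/M2 cycle, starting from 0 V."""
    vvdd = 0.0
    trace = [vvdd]
    while vdd - vvdd > tolerance_v and len(trace) <= max_cycles:
        # Phase 1 (M1 on, M2 off): C_M2M is charged to VDD.
        # Phase 2 (M1 off, M2 on): C_M2M shares its charge with the VVDD node.
        vvdd = (c_m2m * vdd + c_vvdd * vvdd) / (c_m2m + c_vvdd)
        trace.append(vvdd)
    return trace


VDD_V = 1.2
C_M2M_F = 0.98e-12    # total metal-to-metal capacitance (28 cells x 35 fF)
C_VVDD_F = 10e-12     # assumed capacitance of the VVDD node (hypothetical)
TOLERANCE_V = 0.05    # assumed point at which M1 and M2 are both turned on

trace = charge_share_wakeup(VDD_V, C_M2M_F, C_VVDD_F, TOLERANCE_V)
first_step = trace[1] - trace[0]
last_step = trace[-1] - trace[-2]
print(f"cycles needed: {len(trace) - 1}")
print(f"first step: {first_step * 1e3:.1f} mV, last step: {last_step * 1e3:.1f} mV")
```

Each cycle closes a fixed fraction C M2M / (C M2M + C VVDD) of the remaining gap to VDD, so the charge delivered per cycle is bounded by C M2M · VDD and shrinks as VVDD rises. This bound on the charge per pulse is what makes the wake-up current controllable in this idealized picture.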
To compensate for the performance degradation caused by the voltage drop across the sleep transistors, the body of the PMOS devices used to implement a functional unit is connected to VVDD instead of VDD.
IV. TEST CHIP DESIGN AND EXPERIMENTAL RESULTS
To demonstrate the effectiveness of our power-gating structures in 65-nm CMOS bulk technology using single-threshold devices, we designed the test circuitry shown in Fig. 9. It consists of a 16-bit arithmetic and logic unit (ALU) and 28 power-gating cells (PGs). Each power-gating cell includes two stacked PMOS sleep transistors (M1 and M2) with a metal-to-metal capacitor (C M2M ), the NMOS data-retention device (M3), and a PMOS charge pumping device (M4). No additional processing is required to implement C M2M as a metal-to-metal parasitic capacitance.
The ALU is powered by the VVDD grid through a sleep transistor, which is sized for an IR drop of less than 5%, so as to minimize the sacrifice in maximum operating frequency. The ALU includes add and subtract units, a shifter, and a logic unit, and operates at 714 MHz and 1.2 V. Its critical path is through a 16-bit adder, with data inputs supplied by two 16-bit linear feedback shift registers (LFSRs) that generate pseudorandom patterns. Results are transferred to a multiple-input signature register (MISR).
There are several benefits of combining stacked sleep transistors with capacitors. First, the magnitude of power supply voltage fluctuations during power mode transitions will be reduced because these transitions are gradual. If we can predict the amount of charge flowing out of VVDD, we can easily control the transition time. Second, the standby leakage current will be further reduced because of the body effect during idle mode. Ideally, the intermediate node between the stacked sleep transistors can be discharged to near GND. In this case, the body of the sleep transistor, which is connected to VVDD, is induced to reverse bias, and the effective threshold voltage is increased. Third, while conventional power gating uses a high-threshold device as a sleep transistor to minimize leakage, a stacked sleep structure can achieve the same effect with a normal threshold device and can be implemented in a standard process.
Fig. 8. Power-gating structure in which a proportion of the sleep transistors is switched on in a nonoverlapping or pseudorandom manner, with data-retention devices.
Fig. 9. Block diagram of the test chip used to evaluate our proposed power-gating structure.
Fig. 10 shows a die photograph of the test chip fabricated in a 65-nm standard digital CMOS process. The 28 new power-gating cells are inserted between the power supply node of the ALU and the supply voltage line VDD. Sizing the sleep transistors is one of the major challenges in designing the power-gating cells. If we overestimate their size, then we waste silicon area, but if we make them too small, the required performance may not be achieved due to increased resistance between the circuit and VDD. The sleep transistor of our power-gating cells is oversized to handle the worst-case current through the ALU. As a result, the sleep transistors and retention devices require slightly more than 10% extra layout area.
The total capacitance of the metal-to-metal capacitors inside the power-gating cells is 0.980 pF (28 PG cells × 35 fF/C M2M ). The area penalty induced by the capacitors is about 6%. Fig. 11 shows that the virtual VDD (VVDD) increases gradually whenever charge sharing occurs between C M2M and VVDD. In data-retention mode, VVDD is maintained uniformly by M3 and M4 when the target DRVs are 1.1 or 1.05 V.
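The monitoring-and-feedback loop that holds VVDD at the target DRV can be pictured as a simple comparator-driven (bang-bang) controller. The sketch below is one possible behavioral model and not the circuit used on the test chip: the per-step leakage droop, the voltage added by each pump pulse, and the discrete-time loop itself are assumptions introduced for illustration.

```python
def hold_retention_voltage(vdd: float, target_v: float, leak_drop_v: float,
                           pump_step_v: float, steps: int):
    """Simulate VVDD in data-retention mode under comparator feedback.

    Each step, leakage lowers VVDD by leak_drop_v. If the monitored VVDD is
    below target_v, the pumping device (M4 in the text) is pulsed once and
    raises VVDD by pump_step_v, never above VDD.
    Returns (list of VVDD samples, number of pump pulses).
    """
    vvdd = vdd
    trace = []
    pump_pulses = 0
    for _ in range(steps):
        vvdd -= leak_drop_v                        # droop due to leakage
        if vvdd < target_v:                        # monitor output: too low
            vvdd = min(vdd, vvdd + pump_step_v)    # one charge-pump pulse
            pump_pulses += 1
        trace.append(vvdd)
    return trace, pump_pulses


# Hypothetical loop parameters (assumptions, not measured values).
VDD_V, TARGET_V = 1.2, 1.05
LEAK_DROP_V, PUMP_STEP_V = 0.01, 0.03

trace, pulses = hold_retention_voltage(VDD_V, TARGET_V, LEAK_DROP_V, PUMP_STEP_V, 200)
settled = trace[100:]
print(f"settled VVDD range: {min(settled):.3f} V to {max(settled):.3f} V")
print(f"pump pulses issued: {pulses}")
```

With a pump step larger than the per-step droop, VVDD in this model stays within a band of roughly one droop below and one pump step above the target, so a tighter band is obtained by using smaller, more frequent pulses.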
Due to the relatively large on-board parasitic capacitance, off-chip measurement of ground noise may not be sufficiently accurate to allow us to understand the impact of on-chip ground-bounce noise induced by an instantaneous power mode transition of a sleep transistor in a power-gating structure. To investigate the effect of ground bounce on the neighboring internal circuitry, the peak value of the supply current is measured for power-mode transitions of the power-gating cells, as well as the off-chip VVDD. Fig. 12 shows the measured VVDD and peak current during the three different types of wake-up transition. Compared to the conventional abrupt wake-up, our stepwise and randomized turn-on mechanisms reduce the instantaneous peak current by 38% and 27%, respectively. Fig. 13 shows the relationship between the supply voltage and the normalized leakage power in sleep mode with and without power gating and in data-retention mode. These measurements indicate a reduction in leakage by a factor of between 37 and 45 for supply voltages between 0.8 and 1.3 V. When the device enters data-retention mode, VVDD falls below the supply voltage. For example, VVDD ends up around 0.7 V at a supply voltage of 1.2 V. Fig. 13 also shows how power consumption increases when VVDD is raised to 1.0 or 1.1 V to ensure reliable data retention with a supply voltage of 1.2 V. The dynamic power consumption caused by the charge pumping device is only a slight addition to the power consumed by leakage in data-retention mode. At a power supply voltage of 1.2 V, the cost of stabilizing VVDD at 1.0 V, so as to retain stored data reliably, rather than at 0.8 V, is an increase in leakage. However, in data-retention mode, with a VVDD of 1.0 V, the overall power consumption is around 52% less than without power gating.
V. CONCLUSION
We have investigated the ground bounce caused by large charge and discharge currents through a sleep transistor during the mode transition of a power-gating structure. Several novel power-gating structures utilizing holistic integrated device-circuit-architecture approaches have been proposed to reduce the magnitude of voltage glitches in the power distribution network, as well as the time required for the network to stabilize. In addition, techniques have been presented to stabilize the minimum virtual power supply voltage required for data retention. The feasibility of our structures has been proved in silicon using very-deep submicrometer bulk CMOS technology. Experimental results show that the ground bounce is reduced by switching power modes in a stepwise or pseudorandom manner and that reliable data retention can be achieved by compensating for the effect of changes to the data-retention voltage caused by PVT variations.
