Abstract: Power gating is an effective way to reduce leakage power. This technique uses high-V th transistors, called sleep transistors, to turn off the power supply. However, sleep transistors suffer from the bias temperature instability (BTI) effect, which increases V th and reduces reliability. This paper proposes two BTI-aware sleep transistor sizing algorithms that reduce the total width of sleep transistors based on the distributed sleep transistor network structure. The proposed algorithms reduce total width by more than 16.08%. More area can be saved if the BTI effect on both sleep and cluster transistors is considered.
I. INTRODUCTION
As CMOS technology is scaled down, leakage power increases exponentially and has thus become a critical issue. An effective way to reduce leakage power is power gating [1], [2], [4]-[6]. This technique uses high-V th transistors, called sleep transistors, to turn off the power supply, reducing leakage power in standby mode.
There are two types of power gating designs: header- and footer-based designs, which use pMOS and nMOS transistors as sleep transistors, respectively. Because the sleep transistor behaves as a resistor, its width should be sufficiently large to avoid excessive IR-drop. Fig. 1(a) shows a cluster-based design [1]. The circuit is divided into several smaller clusters, each with its own sleep transistor. However, the maximum instantaneous current (MIC) of a cluster may be large, resulting in a large sleep transistor width. Long and He [6] proposed a distributed sleep transistor network (DSTN), as shown in Fig. 1(b), and sleep transistor sizing algorithms were proposed in [2]-[4] to reduce area overhead. Compared to the cluster-based design, this design connects all virtual ground (V GND ) lines together. Therefore, the current can flow from one cluster to all sleep transistors, and a discharging current can be shared among the sleep transistors, reducing sleep transistor sizes. Note that the virtual ground also has wire resistance that cannot be ignored; therefore, an empirical parameter was used to replace the effect of the virtual ground resistance on a discharging current. However, this parameter cannot accurately model the effect of the virtual ground resistance.
To obtain accurate MIC/IR-drop profiles with the virtual ground resistance in the DSTN structure, a discharging matrix method was proposed in [2], which uses either uniform time frame partitions (UTPs) or variable-length time frame partitions (VTPs). When the number of UTPs is increased, a more accurate MIC and a smaller total sleep transistor width can be obtained; however, the runtime also increases. VTPs significantly reduce the runtime but slightly increase the total width. Details on time frame partitioning can be found in [2].
Negative bias temperature instability (NBTI) is a major reliability issue in nanoscale technologies [7]. It is caused by interface traps generated at the Si/SiO 2 interface of pMOS transistors under a negative bias voltage (V GS = −V DD ), which increase V th and transistor delay. The accumulation of interface traps while a pMOS transistor is under a negative bias voltage is referred to as the stress phase. When the bias voltage is removed, the effect is reduced; this is referred to as the recovery phase. The corresponding effect on nMOS transistors is called positive bias temperature instability (PBTI). While PBTI is normally insignificant in the SiON process, it becomes comparable to NBTI and cannot be ignored in high-k metal gate technologies [10]. Therefore, it is necessary to consider both NBTI and PBTI effects in advanced high-k metal gate technologies.
The motivation to address the BTI issue in sleep transistors is the fact that sleep transistors are always turned on and stressed in functional mode, and thus they suffer a more significant BTI effect than other functional gates do. Since sleep transistors are on the critical path of the current flowing from the power rail to the circuit, the V th degradation in a sleep transistor will reduce the circuit speed as a whole and cause long-term reliability issues. In addition, both sleep transistors and cluster transistors are subject to the BTI effect, whose impact thus needs to be addressed in order to design reliable power-gated circuits. Hence, this paper proposes two sleep transistor sizing algorithms that consider the BTI effect based on the DSTN structure. The contributions of this paper are summarized as follows.
1) Two sleep transistor sizing algorithms are proposed to address the BTI-aware sleep transistor sizing problem. A trade-off between runtime and total sleep transistor width can be made. The first algorithm reduces more area at the cost of a longer runtime, while the second algorithm reduces runtime but yields a slightly larger total width.
2) Experimental results show that when only the BTI effect on pMOS sleep transistors in 90 nm technology is considered, the proposed algorithms reduce total width by 17%∼31% as compared to that obtained using the method in [2]. In 32 nm technology, the proposed algorithms reduce area by 16%∼28% when nMOS transistors are used as sleep transistors. When the BTI effect on both sleep and cluster transistors is considered, more area reduction can be achieved.
II. PRELIMINARIES
0278-0070 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
A. BTI Model
NBTI (PBTI) occurs when a pMOS (nMOS) transistor is under a negative (positive) bias voltage. The V th drift of a pMOS (nMOS) transistor due to the static NBTI (PBTI) effect can be described by a direct-current (DC) reaction-diffusion (RD) framework. If a transistor is under alternating stress and recovery phases, the DC RD model should be modified into an alternating-current (AC) RD model [8]
where a is a function of stress frequency ( f ) and signal probability (S). Since the impact of frequency is relatively insignificant, the effect of the signal frequency is ignored. K DC is a technology-dependent constant. More details about this model can be found in [9] .
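For reference, a form of the AC RD model that is consistent with the symbols defined above can be sketched as follows. The stress time t and the time exponent n are assumptions introduced here for illustration (they are not defined in the surrounding text), and the exact expression should be checked against [8] and [9].

```latex
% Sketch only: t (stress time) and n (RD time exponent) are assumed symbols.
\Delta V_{th}(t) \approx a(S, f)\, K_{DC}\, t^{n}
```

Under this form, ignoring the frequency dependence amounts to treating a as a function of the signal probability S alone.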
B. Determination of Sleep Transistor Width
To maintain circuit operation, the IR-drop should be maintained in a certain range; hence, the sleep transistor width, W ST , can be represented as
where W ST is the minimum sleep transistor width that meets the IR-drop constraint, V ST is the maximum IR-drop across the sleep transistor, MIC(ST) is the MIC flowing through the sleep transistor, and k is equal to L/(μC OX (V DD − V th )). More details can be found in [2].
The DSTN structure can be represented as a resistance network, as shown in Fig. 2. Sleep transistors are replaced with resistors; thus, the resistance of sleep transistor i is represented as R(ST i ). Each cluster is modeled as a current source I i , and the resistance of the virtual ground line is modeled as resistor R V . Because the discharging current among all sleep transistors can be balanced through the DSTN structure, an accurate MIC(ST i ) cannot be obtained easily using (2).
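The width relation referred to as (2) can be written out from the definitions of k, MIC(ST), and V ST given above. The following is a reconstruction under the assumption that the sleep transistor operates in the linear region and therefore behaves as a resistor R(ST); it is a sketch, not a verbatim copy of the equation in [2].

```latex
W_{ST} = k \cdot \frac{MIC(ST)}{V_{ST}},
\qquad
k = \frac{L}{\mu C_{OX}\,(V_{DD} - V_{th})}
% Equivalent resistor view (assumed): R(ST) = k / W_{ST}, so V_{ST} = MIC(ST) \cdot R(ST).
```

Because BTI raises V th , the term V DD − V th shrinks and k grows over time, so a larger W ST is needed to hold the same IR-drop.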
Chiou et al. [2], [4] used Kirchhoff's current law and Ohm's law to obtain a discharging matrix in order to estimate MIC(ST i ). The discharging matrix for Fig. 2 is shown in (3). ϕ is a 3 × 3 discharging matrix constructed from the resistance network of the DSTN structure. Each entry is denoted ψ ij , which is the percentage of the current from current source I j that is shared by sleep transistor ST i . Every entry is positive and can be calculated from the resistance values. The upper bound of MIC(ST i ) can be calculated using the discharging matrix and MIC(C i ), as shown in (4). After evaluating MIC(ST i ), the required sleep transistor width can be directly calculated using (2).
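The construction of the discharging matrix can be illustrated with a small nodal-analysis sketch. It assumes a simple chain topology in which adjacent virtual-ground nodes are joined by the same resistance R V and each node is tied to ground through R(ST i ); this topology, the function names, and the numeric values are illustrative assumptions, not the formulation of [2]. The upper bound of MIC(ST i ) is then taken as the sum over j of ψ ij · MIC(C j ).

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def discharging_matrix(r_st, r_v):
    """Return psi, where psi[i][j] is the share of I_j flowing through ST_i.

    Assumed topology: a chain of virtual-ground nodes, node i grounded
    through r_st[i], neighbouring nodes joined by r_v.
    """
    n = len(r_st)
    g = [[0.0] * n for _ in range(n)]  # nodal conductance matrix
    for i in range(n):
        g[i][i] += 1.0 / r_st[i]
        if i + 1 < n:
            gv = 1.0 / r_v
            g[i][i] += gv
            g[i + 1][i + 1] += gv
            g[i][i + 1] -= gv
            g[i + 1][i] -= gv
    psi = [[0.0] * n for _ in range(n)]
    for j in range(n):
        unit = [1.0 if k == j else 0.0 for k in range(n)]
        v = solve(g, unit)  # node voltages for a unit current at node j
        for i in range(n):
            psi[i][j] = v[i] / r_st[i]  # Ohm's law: current through ST_i
    return psi


def mic_st_upper_bound(psi, mic_c):
    """Upper bound of MIC(ST_i) as sum_j psi[i][j] * MIC(C_j)."""
    return [sum(p * c for p, c in zip(row, mic_c)) for row in psi]


# Illustrative values only: three 10-ohm sleep transistors, R_V = 1 ohm.
psi = discharging_matrix([10.0, 10.0, 10.0], 1.0)
bound = mic_st_upper_bound(psi, [0.030, 0.010, 0.020])
```

In this model each column of the matrix sums to one, since all of the current injected by a cluster must leave through some sleep transistor.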
III. PROPOSED SLEEP TRANSISTOR SIZING ALGORITHM
The algorithm in [2] used the worst IR-drop constraint among all the time frames under consideration to determine sleep transistor sizes. However, not all of the sleep transistors have their worst IR-drop in the same time frame, and the sizing results in [2] were usually overly pessimistic. Furthermore, the BTI effect was not considered in [2] , and therefore a power-gated design could potentially fail to operate correctly after a period of time. This section proposes a modified sleep transistor sizing algorithm based on the DSTN structure to address the over-sizing problem due to the pessimistic scenario, with the BTI effect modeled for long-term reliability.
A. Modified BTI-Aware Sleep Transistor Sizing Algorithm
The proposed algorithm, increase and decrease sizing (IDS), contains three steps, as shown in Fig. 3. V DEG is the maximum allowable IR-drop under the BTI effect. In step 1 (lines 1-4), the minimum initial width of the sleep transistors is determined. This is done by finding the minimum current of each cluster (line 2) and using (2).
In step 3 (lines 10-16), the sleep transistors are reduced one at a time. The sleep transistor with the largest positive slack is adjusted first (line 11) since more reduction may be obtained. Then, this sleep transistor is reduced by the decrease ratio, which is defined as one minus the ratio of the worst IR-drop of ST i across all time frames to V DEG . This ratio is used because the difference between V DEG and the worst IR-drop of ST i across all time frames is the smallest. Hence, reducing the sleep transistor width using this ratio results in the lowest chance of violating the V DEG constraint. After this decrease ratio is used to adjust ST i , the new discharging matrix ϕ, MIC(ST i , T j ), R(ST i ), and Slack(ST i , T j ) are updated (line 13). Then, it is determined whether all Slack(ST i , T j ) values are greater than or equal to zero (line 14). If all slacks are greater than or equal to zero, the current total width is better than the previous solution; otherwise, the previous solution is considered to be better and is therefore restored. This loop is repeated until no sleep transistor width can be further reduced, and all Slack(ST i , T j ) values are equal to or greater than zero.
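The decrease step described above can be sketched on a toy network. The example below assumes only two sleep transistors joined by one virtual-ground resistor, models R(ST i ) as k / W i , and takes the IR-drop of ST i as its node voltage; the minimum width w_min, the stopping threshold min_ratio, and the rule that a rejected transistor is not retried are simplifications added here, not part of the algorithm in Fig. 3.

```python
def worst_drops(widths, frames, k, r_v):
    """Worst IR-drop of each of two sleep transistors over all time frames."""
    g1, g2, gv = widths[0] / k, widths[1] / k, 1.0 / r_v
    det = g1 * g2 + gv * (g1 + g2)
    worst = [0.0, 0.0]
    for i1, i2 in frames:  # cluster currents in one time frame
        v1 = ((g2 + gv) * i1 + gv * i2) / det
        v2 = (gv * i1 + (g1 + gv) * i2) / det
        worst = [max(worst[0], v1), max(worst[1], v2)]
    return worst


def decrease_step(widths, drops_fn, v_deg, w_min=1.0, min_ratio=1e-3):
    """Shrink one sleep transistor at a time while every slack stays >= 0."""
    widths = list(widths)
    frozen = set()  # transistors whose last trial was rejected (assumption)
    while True:
        drops = drops_fn(widths)
        cands = [(v_deg - d, i) for i, d in enumerate(drops)
                 if i not in frozen and 1.0 - d / v_deg >= min_ratio]
        if not cands:
            return widths
        _, i = max(cands)  # largest positive slack first
        trial = list(widths)
        # Reduce by the decrease ratio (1 - worst drop / V_DEG).
        trial[i] *= drops[i] / v_deg
        ok = all(d <= v_deg + 1e-12 for d in drops_fn(trial))
        if trial[i] >= w_min and ok:
            widths = trial  # keep the smaller solution
        else:
            frozen.add(i)  # restore the previous solution


# Illustrative values only (ohm*um, ohm, volt, ampere).
k, r_v, v_deg = 1000.0, 2.0, 0.1
frames = [(0.010, 0.002), (0.003, 0.008)]
start = [k * max(f[i] for f in frames) / v_deg for i in range(2)]
final = decrease_step(start, lambda w: worst_drops(w, frames, k, r_v), v_deg)
```

The starting widths size each transistor for its own cluster's peak current, which already satisfies the constraint; because the two clusters peak in different time frames, both transistors begin with positive slack and can be reduced.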
B. Enhanced BTI-Aware Sleep Transistor Sizing Algorithm
Although IDS uses the decrease step to obtain a smaller total sleep transistor width than that obtained in [3], both algorithms have a long runtime because they adjust one sleep transistor at a time. If the number of sleep transistors is large, the algorithms require more iterations to recalculate the width of each sleep transistor. In addition, as mentioned in Section I, cluster transistors are also subject to the BTI effect, which reduces their current. Hence, sleep transistor sizes can be further reduced if this current degradation is considered. Based on the DSTN structure, we propose an enhanced algorithm that adjusts all sleep transistors at the same time, while considering current degradation on cluster transistors.
The enhanced algorithm, dual decrease sizing (DDS), improves the total runtime while obtaining a comparable total width. It consists of three steps. Initially, the maximum initial sleep transistor width is chosen. Then, all sleep transistors are reduced simultaneously. Finally, sleep transistors are reduced one at a time for further improvement. Fig. 4 shows the enhanced algorithm in detail. MIC(C i , T j ) and V DEG , defined above, are inputs of the algorithm. CD i is the ratio of the . If all values meet the constraint, the total width is considered to be the best current width. Otherwise, the previous total width is determined to be the best width.
Step 3 is repeated until no sleep transistors can be further reduced with the decrease ratio. The last sleep transistor widths are the final output of the proposed algorithm. 
IV. EXPERIMENT SETUP AND RESULTS
The algorithms are implemented in C/C++, and the BTI model from Section II is used to obtain the V th of the sleep transistors. The benchmarks are from ISCAS 85 and 89, as shown in Table I. Fig. 5 shows the simulation framework. RTL netlists are synthesized to gate-level netlists, and the SDF file is generated using Design Compiler. The netlists are then simulated with 10 000 random patterns to obtain the VCD file, and placement and routing are done using SoC Encounter to obtain the location of each gate and the virtual ground resistance. Based on the gate locations, the gates in a given row are grouped as a cluster. The MIC of each cluster is estimated using PrimeTime and the VCD file, and an output file containing time frames and current information is generated. Then, the time frames are partitioned based on the information in the output file and are used as the inputs for the experiments. Note that V DEG is set to 10% of the supply voltage. The probability of sleep transistors being turned on is set to 0.5, and V th is estimated for 10 years. The number of variable-length time frames is equal to the number of clusters based on [2], and the number of uniform time frames is set to ten times the number of clusters. TSMC 90 nm and the PTM 32 nm high-k metal-gate technology model [11] are used. For TSMC 90 nm technology, only NBTI is considered because PBTI is relatively insignificant; for 32 nm technology, both NBTI and PBTI are considered to demonstrate that the proposed sizing algorithms can be applied to both header-based and footer-based designs. The current inputs and cluster numbers of the 90 nm experiment are reused in the 32 nm experiment. Both the UTP and VTP [2] algorithms are compared with the IDS and DDS algorithms. Since the DSTN, UTP, and VTP algorithms do not consider the BTI effect, the maximum allowable IR-drop with the BTI effect, that is, the V DEG constraint, replaces their worst IR-drop constraint so that they can be compared with the proposed algorithms.
A. Width and Runtime Comparisons When Considering the BTI Effect on Sleep Transistors
Table II compares the UTP, IDS, and DDS algorithms with uniform time frames in 90 nm technology when only the NBTI on pMOS sleep transistors is considered. It can be seen that DDS and IDS reduce transistor width by 31.57% and 19.3%, on average, respectively. The runtimes of the IDS and DDS algorithms are 1.91X and 0.36X of the runtime required by UTP, respectively. This is because UTP and IDS adjust one sleep transistor at a time, whereas the DDS algorithm adjusts all transistors simultaneously. Similarly, Table III compares the VTP, IDS, and DDS algorithms with variable-length time frames in 90 nm technology when only the NBTI on pMOS sleep transistors is considered. The proposed algorithms reduce sleep transistor width by 27.07% and 17.65%, on average, respectively. The runtimes of IDS and DDS are 3.26X and 0.24X of the runtime required by VTP, respectively. Compared to UTP and VTP, the proposed algorithms reduce the sleep transistor width more in 90 nm technology header-based designs.
TABLE VI: COMPARISONS OF TRANSISTOR WIDTHS CONSIDERING THE BTI EFFECT ON PMOS SLEEP AND CLUSTER TRANSISTORS IN 90 NM TECHNOLOGY
Since PBTI becomes significant in 32 nm process technology, experiments are done for footer-based designs with the PTM 32 nm technology model. Table IV compares UTP, IDS, and DDS with uniform time frames. Compared to UTP, DDS and IDS reduce sleep transistor width by 28.98% and 16.08%, on average, respectively. The runtimes of the IDS and DDS algorithms are 2.24X and 0.46X of the runtime required by UTP, respectively. Table V compares the VTP, IDS, and DDS algorithms with variable-length time frames. Compared to VTP, IDS and DDS reduce sleep transistor width by 27.95% and 17.7%, on average, respectively. The runtimes of IDS and DDS are 3.74X and 0.14X of the runtime required by VTP, respectively. It can be seen that the proposed algorithms reduce sleep transistor width more in both header-based and footer-based designs.
B. Width Comparisons When Considering the BTI Effect on Both Sleep Transistors and Cluster Transistors
Since the BTI effect occurs on sleep and cluster transistors, its influence on both needs to be considered. Table VI compares the UTP, VTP, and DDS algorithms in 90 nm when the BTI effect occurs on both pMOS sleep and cluster transistors. Five current degradation ratios are used in the cluster transistors: 10%, 15%, 20%, 25%, and 30%. The circuits are assumed to remain reliable when these current degradation ratios occur on the cluster transistors. It can be seen that when current degradation on the cluster transistors is not considered in uniform time frames, the sleep transistor width obtained by DDS is 72% of that obtained by UTP. However, when 10%, 15%, 20%, 25%, and 30% of current degradation on the cluster transistors are considered, the sleep transistor width is 63%, 59%, 56%, 52%, and 47% of that obtained by UTP. Similar results are obtained when variable time frames are used. As the current degradation ratio increases, DDS can reduce more sleep transistor width. This is because the current flowing through sleep transistors decreases due to the current degradation of the cluster transistors, and thus, sleep transistors need less width to satisfy the IR-drop constraints.
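The reason a larger current degradation ratio permits narrower sleep transistors can be seen from a short calculation. The sketch below assumes that a degradation ratio CD scales the cluster MIC by (1 − CD) and sizes a single sleep transistor in isolation with W = k · MIC / V DEG ; the values of k, the MIC, and V DEG are made up for illustration, and because DSTN current sharing is ignored the resulting numbers do not correspond to Table VI.

```python
def degraded_mic(mic_c, cd):
    """Scale each cluster MIC by (1 - CD_i); assumed form of the degradation."""
    return [m * (1.0 - c) for m, c in zip(mic_c, cd)]


def isolated_width(mic, v_deg, k):
    """Width of a single sleep transistor sized in isolation."""
    return k * mic / v_deg


# Illustrative values only: k in ohm*um, MIC in ampere, V_DEG in volt.
k, v_deg, mic = 1000.0, 0.1, 0.010
base = isolated_width(mic, v_deg, k)
for cd in (0.10, 0.15, 0.20, 0.25, 0.30):
    width = isolated_width(degraded_mic([mic], [cd])[0], v_deg, k)
    print(f"CD = {cd:.0%}: width is {width / base:.0%} of the undegraded width")
```

In this isolated model the width scales linearly with the degraded current; in the DSTN structure the reduction also depends on how the current is shared among sleep transistors.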
V. CONCLUSION
This paper proposed two sleep transistor sizing algorithms to reduce the total sleep transistor width under the BTI effect. A trade-off between runtime and sizing results can be made by choosing the proper algorithm. When only the BTI effect on sleep transistors is considered, the proposed algorithms reduce the total sleep transistor width by 17%∼31% in 90 nm technology and by 16%∼28% in the PTM 32 nm technology model, on average. When the BTI effect on both sleep and cluster transistors is considered, greater sleep transistor width reduction can be achieved.
