Abstract-Power switches are used as part of the power-gating technique to reduce the leakage power of a design. To the best of our knowledge, this is the first study that analyzes recently proposed DFT solutions for testing power switches through SPICE simulations on a number of ISCAS benchmarks. It makes the following contributions. First, it provides evidence of long discharge time when power switches are turned off during test with available DFT solutions, which may lead either to a false test (false-fail or false-pass) or to long test time. Second, it addresses this problem through a simple and effective DFT solution that reduces the discharge time. The proposed DFT solution has been validated through SPICE simulation and improves discharge time by at least 28 times on a number of ISCAS benchmarks synthesized with a 90-nm gate library.
I. INTRODUCTION
Power gating is a low-power design technique to reduce leakage power. It has gained popularity in sub-100-nm CMOS designs, where leakage power is a major contributor to the overall power consumption [1]. It uses power switches (also called sleep transistors) to power down logic blocks during idle mode [2]. Power switches are implemented as header switches or footer switches. This paper analyzes headers in detail, but the results are equally applicable to footers. Power switches are usually implemented in either a "fine-grain" or a "coarse-grain" design style. The fine-grain style incorporates a power switch within each standard logic cell, with a control signal to switch the power supply of the cell on or off. In the coarse-grain style, a number of power switches are combined to feed a block of logic. Comparing the two, fine-grain design simplifies the incorporation of power gating through existing EDA tools, but it has higher area overhead and is more vulnerable to voltage-drop fluctuations due to process, voltage and temperature variations [2]. The coarse-grain style is therefore the more popular design choice in practice and is the focus of this work.
Recent research has reported a number of DFT solutions to test power switches for the two possible types of faults: stuck-open and stuck-short [3], [4]. The first DFT solution, reported in [3], is used to test power switches in both fine-grain and coarse-grain designs (Fig. 1). More recently, [4] highlighted the problem of long discharge time in available DFT solutions, but without a detailed analysis of its implications for test quality. The first aim of this paper is to provide a detailed analysis of available power-gating DFT solutions; the second is to propose a simple and effective DFT solution (together with test vectors) for testing power switches that reduces the discharge time when they are turned off.
The paper is organized as follows: Section II presents the shortcomings of available DFT methods for testing power switches leading to long test time. These shortcomings can be addressed by modifying the available DFT and associated test vectors, as discussed in Section III. Experimental results are reported in Section IV, and finally Section V concludes the paper.
II. ANALYSIS OF AVAILABLE DFT SOLUTIONS
In this section, we first analyze the imbalance in charging and discharging time of a power switch using HSPICE simulation, and then using a coarse-grain design, we demonstrate how it may either lead to false test (wrong identification of a switch as faulty or non-faulty) or long test time.
To quantify the charge (rise) and discharge (fall) time of a power switch, we constructed a test circuit (Fig. 1a) for testing a power switch. The DFT of the circuit consists of a control register for controlling the test sequence, a multiplexer to enable the test mode, an AND gate, and a comparator to detect a fault. The logic block in this example test circuit consists of five gates (two 2-input NAND, two 2-input NOR and an inverter) from a 90-nm standard-V_th CMOS gate library. The power switch is a high-V_th PMOS transistor. An IR-drop target of ≤ 5% [5] is achieved by using a power switch with a width of 1.1-µm and a length of 150-nm. The logic block is powered by the virtual supply (V_VDD) and the operating V_dd is 1-V. The signal "TE" (Test Enable) is set to 1 (Fig. 1a) and the power switch is first turned on (Test=0). The charge time is defined as the time it takes the voltage at V_VDD to reach 90% of V_dd; at that point the power switch is switched off (Test=1) to observe the discharge time. The discharge time is the time it takes the voltage at V_VDD to fall to 10% of V_dd. This experiment shows that the discharge time is significantly higher (5,068 times) than the charge time at room temperature (25 °C; first row of Table I). Temperature varies during test due to the switching activity of the logic block, and it also affects the leakage current of the logic block. To observe this effect on the charge and discharge time of the power switch, the temperature is varied from 25 °C to 125 °C in five discrete steps. The results are shown in Table I. The discharge time is 1,723-ns at room temperature and reduces to 91-ns as the temperature increases to 125 °C. This is because the virtual supply discharges through the sub-threshold leakage current of the CMOS transistors, which flows when the gate voltage is below V_th. This current is proportional to the square of the thermal voltage (v_T = kT/q, where T is temperature, k is Boltzmann's constant and q is the electron charge) [1].
As temperature increases, the leakage of the logic block increases, resulting in reduced discharge time. The last column of Table I shows the ratio between discharge time and charge time. The discharge time is 5,068 times higher than the charge time at 25 °C, and the ratio reduces to 233 at 125 °C. These results (Table I) clearly demonstrate the imbalance in charge and discharge time of the power switch, as implemented in available DFT methods.
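The figures quoted from Table I can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it recomputes the charge time implied by each reported discharge time and ratio (about 0.34 ns at 25 °C and 0.39 ns at 125 °C), and evaluates the thermal voltage kT/q at both temperatures. The function and variable names are assumptions introduced here, not taken from the paper.

```python
# Physical constants (exact SI values).
K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def thermal_voltage(temp_celsius):
    """Thermal voltage kT/q in volts at the given temperature in degrees Celsius."""
    return K_BOLTZMANN * (temp_celsius + 273.15) / Q_ELECTRON


def implied_charge_time_ns(discharge_ns, ratio):
    """Charge time implied by a discharge time and a discharge/charge ratio."""
    return discharge_ns / ratio


# (discharge time in ns, discharge/charge ratio) as quoted in the text for Table I.
REPORTED = {25: (1723.0, 5068.0), 125: (91.0, 233.0)}

for temp_c, (discharge_ns, ratio) in REPORTED.items():
    v_t_mv = thermal_voltage(temp_c) * 1e3
    charge_ns = implied_charge_time_ns(discharge_ns, ratio)
    print(f"{temp_c} C: v_T = {v_t_mv:.2f} mV, implied charge time = {charge_ns:.2f} ns")

# Growth of the quadratic thermal-voltage term alone between the two temperatures.
prefactor_growth = (thermal_voltage(125) / thermal_voltage(25)) ** 2
print(f"v_T^2 grows by {prefactor_growth:.2f}x from 25 C to 125 C")
```

The quadratic term alone grows by only about 1.8 times over this range, whereas the reported discharge time falls by roughly 19 times, so the remaining temperature dependence presumably comes from the exponential term of the sub-threshold current, which this sketch does not model.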
Next, we use a coarse-grain power-gating design to demonstrate how this imbalance may lead either to a false test or to long test time. In coarse-grain designs, power switches are divided into segments so that a number of power switches are tested collectively (Fig. 1b). The number of power switches per segment is a trade-off between area overhead and precision in identifying faulty transistors; see [3] for more details. The design shown in Fig. 1b, with 2 segments, is tested using three test vectors (Table II). The first two test vectors test the transistors in segment 1 and segment 2, respectively, for a stuck-open fault. The third test vector tests for a stuck-short fault in either of the two segments. We analyzed the effect on test time through HSPICE simulation using these test vectors on the design shown in Fig. 1b. In this setup, the logic block consists of ten gates (four 2-input NAND, four 2-input NOR and two inverters), with a single power switch per segment to achieve the targeted IR-drop of ≤ 5%. For this experiment, we inserted a stuck-open fault in the segment 2 transistor, which is tested using the second test vector (Table II). Because of the faulty transistor in segment 2, the logic value at the comparator output "Out" should be "1". The SPICE simulation results are shown in Fig. 2. As can be seen, identifying the correct state of the switch requires sufficient waiting time: in this experiment, it takes more than 700-ns (1.2-µs − 0.5-µs) to observe the correct logic value (Out=1) at "Out". Sampling the data earlier results in a false test (Out=0), i.e., a faulty power switch is identified as fault-free. Therefore, to permit sufficient discharge time, the test clock frequency must be less than 1.4 MHz (1/(700 × 10^−9 s) in this example) to avoid a false test (false-fail or false-pass).
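The bound on the test clock frequency follows directly from the waiting time. As a minimal sketch, assuming (as the example above does) that one test clock period must cover the full discharge wait, the calculation can be written as follows. The function name and the optional `margin` parameter are hypothetical additions for illustration.

```python
def max_test_clock_hz(discharge_time_s, margin=1.0):
    """Upper bound on the test clock frequency in Hz.

    Assumes one clock period must be at least `margin` times the discharge
    time. `margin` is a hypothetical guard-band factor; 1.0 means none.
    """
    if discharge_time_s <= 0 or margin <= 0:
        raise ValueError("discharge time and margin must be positive")
    return 1.0 / (discharge_time_s * margin)


# Waiting time read from the example: the output settles at 1.2 us after the
# switch is turned off at 0.5 us.
wait_s = 1.2e-6 - 0.5e-6
print(f"max test clock: {max_test_clock_hz(wait_s) / 1e6:.2f} MHz")
```

With a 700-ns wait this gives roughly 1.43 MHz, in line with the 1.4 MHz limit stated above.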
From a test cost point of view, it is desirable to save test time; in this case, however, the test clock frequency is limited by the discharge time of the power switch. To use a higher test clock frequency, a simple and effective solution is to place an NMOS transistor, referred to as the discharge transistor, between the virtual supply (V_VDD) and ground, to quickly discharge the voltage at V_VDD after the power switch is turned off. In practice, it is also difficult to estimate the discharge time accurately, as it varies across logic blocks. A properly designed discharge transistor also allows the discharge time to be quantified accurately, which not only permits a higher test clock frequency but also eliminates the possibility of a false test (see Sec. IV-A for more details).
Fig. 2: Discharge time overhead (700-ns) due to unavailability of a direct discharge path.
III. PROPOSED DFT
Fig. 3 shows the proposed DFT for testing power switches; it adds a discharge transistor and an AND gate to the available DFT. The discharge transistor is enabled during test mode through the TE (Test Enable) signal. During test, the discharge transistor is controlled through a dedicated signal, which is an additional bit in the control register. In this section, we first describe the details of designing an efficient discharge transistor to ensure a balanced charge/discharge time at the virtual supply (V_VDD). This is followed by a discussion of the test vectors needed for testing power switches using the proposed DFT.
A. High Efficiency Discharge Transistor Design
A proper discharge transistor design is important not only for a balanced charge/discharge time; it also affects the performance of the design during normal operation (active mode or sleep mode). The discharge transistor is switched off during normal operation, so high-performance but leaky (low-V_th or standard-V_th) transistors are unnecessary. A high-V_th (lower performance and less leaky) NMOS transistor is therefore a better choice for the discharge transistor. As an example, consider the design shown in Fig. 3. A balanced charge/discharge time at the virtual supply (V_VDD) depends on the power switch and the logic block. The power switch used throughout this paper has a gate length of 150-nm and a width of 1.1-µm. The logic block consists of five logic gates (two 2-input NAND, two 2-input NOR and an inverter). A methodology for high-efficiency power switch design is discussed in [2], and it is used in this work to design the discharge transistor. The discharge transistor is designed to achieve high efficiency, i.e., a high ratio I_on/I_off, where I_on is the current when the transistor is switched on and I_off is the leakage current when it is switched off. For the I_on simulation, V_ds of the transistor is set to 10-mV for a < 5% IR-drop target when operating at a V_dd of 1-V, and for illustration the temperature is set to 125 °C to model the operating mode. For the I_off simulation, V_ds is set to V_dd and the temperature is set to 25 °C to model the sleep mode.
The discharge transistor is designed in two steps. First, the transistor length is determined by sweeping the NMOS transistor width from 0.2-µm to 5-µm in increments of 5-nm, and repeating this for a range of transistor lengths from 90-nm to 300-nm in increments of 5-nm. The efficiency curve is shown in Fig. 4. As can be seen, the highest efficiency is achieved at a gate length of 150-nm. Second, at this gate length (L = 150-nm), the transistor gate width is varied to achieve a balanced charge/discharge time at the virtual supply (V_VDD). The power switch is turned on to simulate the charge time and then switched off to simulate the discharge time. The simulation uses the design shown in Fig. 3 with a logic block of five gates. We also analyzed the effect of temperature variation on the charge/discharge time at V_VDD, where the temperature is varied from 25 °C to 125 °C in five discrete steps. Fig. 6 shows the simulation results at three temperature settings; the difference between charge and discharge time is less than 10-ps (representing < ±5% difference).
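The two-step sizing procedure can be sketched as a pair of grid searches. The code below is a minimal illustration, not the authors' flow: in practice each evaluation would be a SPICE run, whereas here `toy_efficiency` and `toy_discharge_time_ns` are made-up stand-ins shaped only so that the example has a known answer. The choice to rank each length by its best efficiency over all widths is also an assumption, since the text does not state how the width sweep is aggregated.

```python
def best_length(lengths_nm, widths_nm, efficiency):
    """Step 1: pick the gate length whose best efficiency over all widths is highest."""
    return max(lengths_nm, key=lambda l: max(efficiency(l, w) for w in widths_nm))


def balanced_width(widths_nm, charge_time_ns, discharge_time_ns):
    """Step 2: pick the width whose discharge time is closest to the charge time."""
    return min(widths_nm, key=lambda w: abs(discharge_time_ns(w) - charge_time_ns))


# Sweep ranges as described in the text: 5-nm steps for both dimensions.
LENGTHS_NM = range(90, 301, 5)
WIDTHS_NM = range(200, 5001, 5)


def toy_efficiency(length_nm, width_nm):
    """TOY stand-in for a simulated I_on/I_off figure (hypothetical shape).

    Peaks at 150 nm purely so the example is checkable; it ignores width.
    """
    return 1.0 / (1.0 + ((length_nm - 150) / 50.0) ** 2)


def toy_discharge_time_ns(width_nm):
    """TOY stand-in: discharge time inversely proportional to width (hypothetical)."""
    return 1000.0 / width_nm


length = best_length(LENGTHS_NM, WIDTHS_NM, toy_efficiency)
width = balanced_width(WIDTHS_NM, charge_time_ns=0.5, discharge_time_ns=toy_discharge_time_ns)
print(f"toy result: L = {length} nm, W = {width} nm")
```

Passing the models in as callables keeps the search logic separate from the simulator, so the toy functions could be replaced by wrappers around real simulation runs without changing the two search routines.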
The proposed DFT fits well in the standard EDA power-gating design flow. Fig. 7 shows the proposed DFT flow, including the insertion of the discharge transistors. The power switches are usually placed after floor planning, followed by logic cell placement, clock tree synthesis and routing. The power switches are then optimized to meet the IR-drop target in every part of the design. Discharge transistor placement follows (highlighted with a grey box); placing them at this point preserves the performance of the design by giving priority to standard cell placement, clock tree synthesis and routing. Once the final IR-drop and power switch network are fixed, there is sufficient information to place the discharge transistors while ensuring balanced charge and discharge time. This is followed by power rail optimization, which is usually the last step in the flow.
B. Test set for power gated design
For the proposed DFT, it is necessary to test both the discharge transistors and the power switches for two possible faults: stuck-open and stuck-short. A stuck-open fault (transistor drain-source open) in a discharge transistor results in a long discharge time of the power switch, leading to a false test, while a stuck-short fault (transistor drain-source short) leads to a stuck-at-0 fault at the virtual supply (V_VDD). It was shown in [3], [4] that, for testing power switches in a coarse-grain design, the power switches are divided into segments, where each segment contains m power switches. Each segment is tested separately for stuck-open and stuck-short faults. The number of segments and the number of transistors per segment exhibit a trade-off between test time and test quality. A larger number of segments requires more test cycles to test all power switches, but it improves test quality by isolating faulty power switches from the fault-free ones. Fig. 8 shows the proposed DFT for coarse-grain power switches divided into segments. The power switches in a segment are tested together, and usually a single segment is activated during test [4]. The discharge transistors are designed to achieve a balanced charge/discharge time at V_VDD, assuming a single active power switch segment. Sec. IV-A discusses the implementation details of coarse-grain design using a number of ISCAS benchmarks. Table IV shows the test vectors to test a design using the DFT shown in Fig. 8, assuming two segments. The first test cycle turns off both power switch segments (Segment 1 and Segment 2) and turns on the discharge transistors to quickly discharge the voltage at V_VDD. The second test cycle turns off the discharge transistors to test for stuck-short at [...] test cycles are needed to test a design with [...] power switch segments and a discharge segment using the proposed DFT (Fig. 8).
For designs with ≥ 2 power switch segments, the fourth test cycle (Table IV) should be repeated after applying the stuck-open test at each segment other than the last, to discharge the voltage at the virtual supply (V_VDD) and prepare for the next test cycle.
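The ordering of test cycles described above can be sketched as a small generator. This is an illustration under stated assumptions, not a reproduction of Table IV: the `Cycle` record, its field names and the `purpose` labels are hypothetical, the expected comparator outputs are omitted, and any cycles dedicated to testing the discharge transistors themselves are not modelled. The cycle count it produces is therefore a consequence of these assumptions rather than a figure from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    """One test cycle (hypothetical encoding of the control register state)."""
    segments_on: tuple   # one bool per power switch segment
    discharge_on: bool   # state of the discharge transistors
    purpose: str


def build_test_sequence(num_segments):
    """Order the cycles as described in the text for a given number of segments."""
    if num_segments < 1:
        raise ValueError("at least one power switch segment is required")
    all_off = (False,) * num_segments
    cycles = [
        # Cycle 1: every segment off, discharge transistors on.
        Cycle(all_off, True, "discharge virtual supply"),
        # Cycle 2: discharge transistors off for the stuck-short test.
        Cycle(all_off, False, "stuck-short test"),
    ]
    for seg in range(num_segments):
        one_on = tuple(i == seg for i in range(num_segments))
        cycles.append(Cycle(one_on, False, f"stuck-open test, segment {seg + 1}"))
        # Discharge again before the next segment, but not after the last one.
        if seg != num_segments - 1:
            cycles.append(Cycle(all_off, True, "discharge virtual supply"))
    return cycles


for number, cycle in enumerate(build_test_sequence(2), start=1):
    print(number, cycle.segments_on, cycle.discharge_on, cycle.purpose)
```

For two segments the fourth cycle generated is the discharge cycle, matching the role the text assigns to the fourth test cycle.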
IV. EXPERIMENTAL RESULTS
A. Coarse-grain power gating designs
To demonstrate the improvement in test time using the proposed DFT, we conducted an experiment using coarse-grain power-gating designs. The designs are synthesized using a 90-nm STMicroelectronics gate library and the netlist is converted to SPICE format using Synopsys STAR-RCXT. The operating voltage is 1-V and an IR-drop of 5% or less is targeted for all designs. IR-drop is determined in active mode by simulating the voltage at the virtual supply (V_VDD) while feeding transition pulses (high-to-low and low-to-high) to the primary inputs of each design. Table III lists some of the ISCAS benchmarks used for this experiment. The total number of power switches per design needed to achieve the targeted IR-drop of ≤ 5% is shown in the second column. As expected, the number of power switches increases with the size of the design. For the 90-nm STMicroelectronics gate library used in this work, the ratio of power switches to logic cells is on average approximately 1:5 across many different ISCAS benchmarks; due to space limitations, we show results for only five benchmarks. The power switches are divided into a number of segments, and the number of discharge transistors is chosen to achieve a balanced charge/discharge time at V_VDD when only the transistors of one segment are turned on. We also validated the charge/discharge time across a range of temperature settings (25 °C to 125 °C in five steps); the worst-case difference is within ±5% of the rise and fall time at V_VDD (see Sec. III-A for more details). Since the voltage and the rise/fall time at V_VDD depend on the number of transistors switched on at a given time, we assumed three segment sizes m per design (5, 15 and 25), each referred to as a design configuration.
TABLE IV: Test patterns for testing power-switches and discharge transistors using the proposed DFT (Fig. 8)
For each of the three design configurations, the number of power switches per segment is shown in the third column, followed by the number of discharge transistors per design configuration in the fourth column. The width of each discharge transistor is shown in the fifth column; as discussed in Sec. III-A, the length of each discharge transistor is kept at 150-nm and the width is varied to achieve a balanced charge/discharge time at V_VDD. The charge time at V_VDD with discharge transistors (proposed) and without them (using [3]) is shown in the columns marked "Charge Time". As expected, the charge time reduces with a higher number of power switches per segment and is approximately the same with and without discharge transistors. For example, for C432, as the number of power switches per segment increases from 5 to 25, the charge time reduces from 1.5-ns to 0.35-ns. The next column, marked "Discharge Time", shows the difference in discharge time at V_VDD between the two DFT methods (proposed and [3]). As can be seen, the proposed DFT method reduces discharge time significantly. Averaged over all designs, the reduction in discharge time is more than 346 times; in the best case it is more than 1496 times (third row of C432) and in the worst case more than 28 times (first row of C3540), in comparison to a design without discharge transistors. As discussed in Sec. II, the discharge time has a direct effect on the test clock frequency, and for designs without discharge transistors the test clock frequency is limited by the discharge time of the power switch. The last two columns of Table III marked as "Max.

