Introduction
For low-leakage, high-performance operation of a VLSI circuit, Power Gating is regarded as the most effective technique, as it can substantially reduce the leakage current in standby mode. Among the previously proposed circuit-level approaches, the use of sleep transistors for Power Gating is the most popular [1] [2] [3] [4] [5]. When the circuit is in active mode, these sleep transistors are 'ON'. In standby mode they are turned 'OFF', which disconnects the logic cells from the V_dd (or Ground) rail. In the conventional Power Gating architecture, a 'header' switch and a 'footer' switch are connected in series with the PUN (Pull-Up Network) and the PDN (Pull-Down Network) of the logic circuit, respectively. As illustrated in Fig. 1, the virtual-V_dd rail (virtual-Ground rail) can be disconnected from the actual V_dd (Ground) by turning off the 'header' ('footer') sleep transistor, thereby reducing the leakage power. In active mode, the sleep transistors must be turned 'ON' so that the logic circuit functions correctly. Instead of using both 'header' and 'footer' sleep transistors, the same leakage power reduction can be achieved with either one of the two switches. Considering the required area, the effective conductance, etc., it is better to use NMOS sleep transistors as footer switches [2, 3]. For an effective implementation of Power Gating, it is essential to size the sleep transistors properly. For a specific placement technique, the performance degradation of the circuit depends on the size of the sleep transistors [1]. Larger sleep transistors cause less performance degradation [1], but they require more area and a significant amount of driving energy [5]. Smaller sleep transistors, on the other hand, increase the performance degradation, which is also not acceptable [1]. There is therefore a trade-off between the power consumption and the performance of the circuit.
To find an effective Power Gating strategy, a rigorous analysis has been carried out in this work. We start with conventional Power Gating, where a single large transistor gates the entire logic circuit [6, 7]. We then consider another popular and widely practiced technique, Cluster Based Sleep Transistor Design (CBSTD), and evaluate its effectiveness in reducing the leakage power while maintaining the performance of a logic circuit. After that, the Distributed Sleep Transistor Network (DSTN) is employed for the same purpose. Lastly, a modified architecture of the tunable sleep transistor cell is introduced to reduce the standby leakage of a logic circuit without degrading the overall performance much. In all cases, the Power Gating strategies mentioned above have been implemented on the same basic circuit, the 4×4 multiplier design described in [8].
Leakage Power & McCMOS Technique
Although the reduction of device dimensions with each technology node has increased the integration density and substantially improved the speed [9, 10], it has also made leakage power a major contributor to the total power consumption. In deep-submicron devices and nano-devices, where v_th is quite low, the leakage power dissipation in a circuit is mainly due to the sub-threshold current and the gate leakage current. Gate Induced Drain Leakage (GIDL), Band To Band Tunneling (BTBT), etc. are other contributors that have become a concern in advanced MOS devices [10]. Owing to the non-zero minority carrier concentration in the 'weak-inversion' region, a current flows between the source and the drain of the MOS device even when the applied gate voltage is below v_th. This is the sub-threshold current [9]. Considering weak inversion, the DIBL (Drain Induced Barrier Lowering) effect and the body effect, the sub-threshold current can be modeled as [9, 11],

$$I_{sub} = I_0 \exp\left(\frac{V_{GS} - v_{th0} - \gamma' V_{SB} + \eta V_{DS}}{m\, v_T}\right)\left[1 - \exp\left(-\frac{V_{DS}}{v_T}\right)\right]$$

where I_0 is a pre-exponential factor set by the device geometry and the process, v_T = kT/q is the thermal voltage, v_th0 is the zero-bias threshold voltage, m is the sub-threshold swing coefficient, and γ' and η are the linearized body effect coefficient and the DIBL coefficient, respectively. Because the sub-threshold current depends exponentially on v_th, assigning a higher v_th to the transistors in a circuit is very useful for reducing the leakage current and hence the leakage power [12]. The problem is that a higher v_th increases the equivalent ON-resistance (R_ON) of the transistors, which in turn increases the delay [12]. The propagation delay through a transistor is generally expressed as,

$$t_{pd} = \frac{K\, V_{dd}}{(V_{dd} - v_{th})^{\alpha}}$$

where K is a factor that depends on the gate size as well as on the process, and α takes a value between 1 and 2. From this expression, we can see that reducing v_th improves the overall performance at low supply voltages [14]. However, as v_th is reduced, the leakage current starts to play a dominant role [11, 14]. Thus, maintaining the performance of the circuit while reducing the leakage power dissipation becomes a key challenge in designing any low-voltage, low-power digital circuit. For conventional CMOS technology, the multiple channel length CMOS (McCMOS) technique is one of the popular means of reducing the leakage power [15]. In this technique, the channel length of the transistors is increased wherever the leakage current needs to be controlled. On the other hand, wherever the performance must be maintained (especially for the transistors in the critical path), the width of the transistors is increased [15].
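As an illustration, the leakage versus delay trade-off discussed above can be sketched numerically. The Python fragment below assumes a simplified exponential sub-threshold term (body effect and DIBL are ignored) and an alpha-power-law delay of the form K·V_dd/(V_dd − v_th)^α. All numerical values (supply voltage, the two threshold voltages, m, α, K) are hypothetical and are not parameters taken from this work.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q near 300 K, in volts


def subthreshold_leakage(v_th, v_gs=0.0, m=1.5, i0=1.0):
    """Relative sub-threshold current, proportional to exp((v_gs - v_th) / (m * v_T)).

    Simplified on purpose: body effect and DIBL terms are left out.
    m and i0 are assumed, illustrative values.
    """
    return i0 * math.exp((v_gs - v_th) / (m * THERMAL_VOLTAGE))


def alpha_power_delay(v_th, v_dd=1.0, alpha=1.3, k=1.0):
    """Relative delay K * V_dd / (V_dd - v_th)**alpha (assumed alpha-power-law form)."""
    if v_dd <= v_th:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * v_dd / (v_dd - v_th) ** alpha


# Hypothetical low and high threshold voltages, in volts.
low_vth, high_vth = 0.25, 0.35

leak_ratio = subthreshold_leakage(high_vth) / subthreshold_leakage(low_vth)
delay_ratio = alpha_power_delay(high_vth) / alpha_power_delay(low_vth)

print(f"leakage(high v_th) / leakage(low v_th) = {leak_ratio:.3f}")
print(f"delay(high v_th)   / delay(low v_th)   = {delay_ratio:.3f}")
```

Under these assumed values, raising v_th lowers the leakage term by a large factor while increasing the delay only moderately, which is the qualitative behaviour that motivates selective use of higher-v_th or longer-channel devices.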
Power Gating Strategies

Conventional Power Gating
In conventional Power Gating, a single large transistor (of width W) generally gates the entire logic circuit [6, 7]. In active mode, when the sleep transistor (ST) is 'ON', it presents a resistance R_ON, which is essentially the channel resistance of the transistor. Let d be the 50% propagation delay of a logic block residing in a typical row of a CMOS circuit, and let the load capacitance of the logic block and the supply voltage of the entire CMOS circuit be denoted by C_L and V_dd, respectively. In the absence of the sleep transistor,

$$d \propto \frac{C_L V_{dd}}{(V_{dd} - v_{th})^{\alpha}}$$

where α is the velocity saturation index [1]. After the insertion of the ST, the propagation delay changes to d_ST,

$$d_{ST} \propto \frac{C_L V_{dd}}{(V_{dd} - v_{ST} - v_{th})^{\alpha}}$$

where v_ST is the voltage drop across the sleep transistor. Therefore, the d_ST/d ratio (which denotes the delay degradation of the logic block) grows with v_ST; to first order, the degradation is proportional to v_ST [1].
As per one of the most widely practiced methods, a constraint guaranteeing that v_ST does not exceed 10 % of V_dd is applied when sizing the sleep transistor.
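The dependence of the delay degradation on the sleep transistor voltage drop can be illustrated with a short Python sketch. It assumes the alpha-power-law form of the delay, so that the ratio d_ST/d equals ((V_dd − v_th)/(V_dd − v_ST − v_th))^α, and it checks the 10 % limit on v_ST that is used later when sizing the sleep transistor. The supply voltage, threshold voltage and α below are hypothetical values chosen only for illustration.

```python
def delay_degradation(v_st, v_dd, v_th, alpha):
    """Ratio d_ST / d for a voltage drop v_st across the sleep transistor.

    Assumes d is proportional to C_L * V_dd / (V_dd - v_th)**alpha, so the
    load capacitance and the supply voltage cancel in the ratio.
    """
    headroom = v_dd - v_th
    if not 0.0 <= v_st < headroom:
        raise ValueError("v_st must satisfy 0 <= v_st < V_dd - v_th")
    return (headroom / (headroom - v_st)) ** alpha


def meets_ir_drop_constraint(v_st, v_dd, limit_fraction=0.10):
    """True when v_st does not exceed the given fraction of V_dd."""
    return v_st <= limit_fraction * v_dd


# Hypothetical operating point (not taken from the paper).
V_DD, V_TH, ALPHA = 1.0, 0.3, 1.3

for v_st in (0.02, 0.05, 0.089, 0.12):
    ratio = delay_degradation(v_st, V_DD, V_TH, ALPHA)
    ok = meets_ir_drop_constraint(v_st, V_DD)
    print(f"v_ST = {v_st * 1e3:5.1f} mV  d_ST/d = {ratio:.3f}  within 10% limit: {ok}")
```

For small v_ST the ratio is close to 1 + α·v_ST/(V_dd − v_th), which is why the degradation can be treated as proportional to v_ST.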
Cluster Based Sleep Transistor Design
A large sleep transistor with a greater W generally causes a significant area overhead, which in turn results in excess power consumption. Furthermore, a larger sleep transistor may nullify the leakage power savings, as the sleep transistor itself contributes a considerable amount of leakage in standby mode [4]. To mitigate this problem, we may adopt a Cluster Based Sleep Transistor Design (CBSTD), in which the logic gates inside a circuit module are grouped into more than one cluster, and the gates belonging to the same cluster are placed together [1]. Each cluster is gated by a separate sleep transistor, whose size is generally determined by the amount of current flowing through the cluster [2]. In one of the traditional approaches to grouping the logic gates, the critical path of the circuit is determined first; the logic gates residing in the critical path are then grouped together to form a cluster (C_cluster), which is generally power gated by a larger sleep transistor. The rest of the logic gates are grouped into one or more non-critical clusters. A non-critical cluster (NC_cluster) is generally power gated by a regular-size sleep transistor [4].
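The grouping step described above can be sketched in Python as follows. This is only an illustrative model: the gate names, the per-gate peak currents, the allowed voltage drop and the R_ON × W constant are all hypothetical, and the sizing rule (R_ON inversely proportional to W, with the cluster current times R_ON kept below an allowed drop) is a common first-order assumption rather than the sizing procedure of this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    peak_current_ua: float  # hypothetical peak current in microamperes


def build_clusters(gates, critical_path_names):
    """Split gates into a critical cluster and a single non-critical cluster."""
    critical = [g for g in gates if g.name in critical_path_names]
    non_critical = [g for g in gates if g.name not in critical_path_names]
    return {"C_cluster": critical, "NC_cluster": non_critical}


def cluster_current_ua(cluster):
    """Pessimistic cluster current: all gates draw their peak current at once."""
    return sum(g.peak_current_ua for g in cluster)


def min_sleep_width_nm(cluster, v_st_max, r_on_times_width):
    """Smallest width keeping I * R_ON <= v_st_max, assuming R_ON = r_on_times_width / W.

    r_on_times_width is an assumed process constant in ohm * nm.
    """
    current_a = cluster_current_ua(cluster) * 1e-6
    return current_a * r_on_times_width / v_st_max


# Hypothetical gate list and critical path.
gates = [
    Gate("AND0", 2.0),
    Gate("HA0", 4.0),
    Gate("FA0", 6.0),
    Gate("FA1", 6.0),
    Gate("AND1", 2.0),
]
clusters = build_clusters(gates, {"HA0", "FA0", "FA1"})

for name, members in clusters.items():
    width = min_sleep_width_nm(members, v_st_max=0.1, r_on_times_width=1.0e6)
    print(f"{name}: {[g.name for g in members]} -> minimum W of about {width:.0f} nm")
```

With these assumed numbers the critical cluster carries more current and therefore receives the wider sleep transistor, in line with the description above.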
Distributed Sleep Transistor Network
The Distributed Sleep Transistor Network (DSTN) is one of the popular means of Power Gating, where the area requirement is much smaller than that of CBSTD. Conventionally, in DSTN, a regular-sized sleep transistor is placed locally for each of the clusters. Owing to the proximity of the sleep transistors, the routing area overhead and the wire size become much smaller than those of a cluster based design structure [2]. Moreover, 'timing-driven' placement requires that gates with logic connections be placed close to each other so that the overall interconnect delay is minimized [2]. DSTN, as described above, is further advantageous because of its compatibility with 'timing-driven' placement.
Cluster Based Tunable Sleep Transistor Cell Power Gating
As reported in [4], the architecture of the tunable sleep transistor cell consists of four parallel sleep transistors of different sizes, which are driven by dedicated control NAND gates. The outputs of the NAND gates are distributed to the 'Gate' terminals of the sleep transistors through an inverter chain. In this work, we have modified the architecture of the tunable sleep transistor cell of [4] to a simpler structure, governed by (9). In addition, we have used AND gates instead of NAND gates, thereby eliminating the separate inverting buffer chain. As shown in Fig. 2, the AND gates receive a 4-bit pattern (B3, B2, B1, B0) and, depending upon the SLPBAR1 signal, the values of those four bits are used to switch 'ON' or 'OFF' any of the four sleep transistors. With W = 135 nm being the regular width of the sleep transistors used in our design, the sizes of the other three transistors forming the tunable cell can be found from the equation,
where, 
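The control scheme of the tunable cell can be illustrated with a small behavioural sketch in Python. It assumes that each AND gate combines one bit of the pattern (B3, B2, B1, B0) with SLPBAR1, that SLPBAR1 = 0 corresponds to standby, and that B0 drives the regular-width transistor. Only the regular width W = 135 nm comes from the text; the other three widths below are placeholder values (a binary-weighted guess) and must not be read as the sizes used in this design.

```python
# Placeholder widths in nm. Only the 135 nm entry is stated in the text;
# the remaining three are hypothetical and used purely for illustration.
ASSUMED_WIDTHS_NM = {"B0": 135, "B1": 270, "B2": 540, "B3": 1080}


def gate_signals(b3, b2, b1, b0, slpbar1):
    """Output of each AND gate: the pattern bit ANDed with SLPBAR1."""
    bits = {"B3": b3, "B2": b2, "B1": b1, "B0": b0}
    return {name: bit & slpbar1 for name, bit in bits.items()}


def effective_width_nm(b3, b2, b1, b0, slpbar1, widths=ASSUMED_WIDTHS_NM):
    """Total width of the parallel NMOS sleep transistors that are switched 'ON'."""
    signals = gate_signals(b3, b2, b1, b0, slpbar1)
    return sum(widths[name] for name, on in signals.items() if on)


# Active mode (SLPBAR1 = 1): the bit pattern selects the effective width.
for pattern in ((0, 0, 0, 1), (0, 1, 0, 1), (1, 1, 1, 1)):
    print(pattern, "->", effective_width_nm(*pattern, slpbar1=1), "nm")

# Standby mode (SLPBAR1 = 0): every sleep transistor is 'OFF' regardless of the pattern.
print("standby ->", effective_width_nm(1, 1, 1, 1, slpbar1=0), "nm")
```

Because the transistors are in parallel, a pattern with more bits set gives a larger effective width, which lowers R_ON and hence the delay at the cost of more leakage-prone width.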
Architecture of the 4×4 multiplier
An extensive analysis has been carried out in this work with the aim of finding a suitable Power Gating strategy that can effectively reduce the standby mode leakage power of a digital circuit. For that purpose, we have considered the conventional 4×4 multiplier circuit [8] and applied various Power Gating techniques to gate it. Multipliers, which are widely used in microprocessors, DSP and communication applications [10, 16], can be viewed simply as a collection of adders [8]. The circuit of the 4×4 multiplier, shown in Fig. 3, uses a straightforward approach to accumulate the partial products with the help of an array formed by a number of adders [8]. For performance optimization, it is essential to identify the critical path of the circuit. The dotted line highlighted in Fig. 3 shows the critical path considered in our work [8]. Moreover, while designing the two-input AND gates as well as the adder circuits (both full adder and half adder), we have utilized the McCMOS technique (as described in Section 2). To optimize the power consumption as well as to maintain the performance of the circuit, the L and W values of the transistors used in those basic building blocks need to be modified.
Figure 3. Circuit design of the 4×4 multiplier [8]
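The idea of building a multiplier from AND gates and an array of adders can be illustrated with a behavioural Python model. This is a generic array-multiplier sketch written for illustration; it is not claimed to reproduce the exact gate arrangement or the critical path of Fig. 3.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Return (sum, carry) for three input bits, built from two half adders."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def multiply_4x4(x, y):
    """Multiply two 4-bit unsigned integers using AND gates and an adder array."""
    if not (0 <= x < 16 and 0 <= y < 16):
        raise ValueError("operands must be 4-bit unsigned integers")
    xb = [(x >> i) & 1 for i in range(4)]
    yb = [(y >> j) & 1 for j in range(4)]

    # Partial products: pp[j][i] = x_i AND y_j, with weight 2**(i + j).
    pp = [[xb[i] & yb[j] for i in range(4)] for j in range(4)]

    # acc holds the running sum; its lowest bit is final after each row.
    acc = pp[0] + [0]
    product_bits = [acc[0]]
    for j in range(1, 4):
        upper = acc[1:]  # four bits aligned with row j of partial products
        row = []
        carry = 0
        for i in range(4):
            if i == 0:
                s, carry = half_adder(upper[i], pp[j][i])
            else:
                s, carry = full_adder(upper[i], pp[j][i], carry)
            row.append(s)
        acc = row + [carry]
        product_bits.append(acc[0])
    product_bits.extend(acc[1:])  # remaining high-order bits

    return sum(bit << k for k, bit in enumerate(product_bits))


# Exhaustive check against integer multiplication.
assert all(multiply_4x4(a, b) == a * b for a in range(16) for b in range(16))
print(multiply_4x4(13, 11))
```

Each row of adders in the model corresponds to accumulating one row of partial products, so the carry chain through the last rows is where the longest delay path of such an array typically lies.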
Results and Discussions
Table I provides quantitative information on the effect of sleep transistor sizing (for the conventional Power Gating scheme) on the performance of the 4×4 multiplier circuit. As illustrated in Table I, the gate length has been kept the same in all cases, whereas the width of the sleep transistor has been varied from a nominal value of W = 135 nm to higher values. In order to limit the IR drop across the sleep transistor (as per the constraint mentioned before, i.e., v_ST should not exceed 10 % of V_dd), we have considered the case where W = 700 nm, v_ST = 89 mV, and the corresponding delay at the output is 2.6052×10^-10 second. For the rest of this article, we refer to this delay value as the best case delay (d_BC). A similar analysis for the 4×4 multiplier design Power Gated with CBSTD is shown in Table II. However, one more constraint has been included in this case: a 10 % increase in delay over the d_BC value is taken as the maximum tolerance [4]. As shown in Table II, for W = 400 nm, the maximum delay at the output is 2.8091×10^-10 second, which is less than the critical value of 2.8657×10^-10 second (i.e., 1.10 times d_BC).
Watt, as the bit pattern varies from "0001" to "1111"; however, at the same time, the maximum delay at the output decreases from 2.8174×10^-10 second to 2.5054×10^-10 second. Compared to the 4×4 multiplier with DSTN (as shown in Table III), the multiplier with cluster based tunable sleep transistor cell Power Gating consumes almost the same power but, in other respects, provides much better performance. For the sake of comparison, if we consider the 4×4 multiplier circuit of Fig. 3 without any Power Gating scheme, the Average Power and the delay are 1.3862×10^-5 Watt and 2.3836×10^-10 second, respectively.
Therefore, this modified tunable sleep transistor cell achieves a 1.61 % reduction in the Average Power consumption at the cost of a 6.79 % increase in delay. Nevertheless, compared with the other Power Gating schemes (conventional Power Gating, CBSTD, DSTN), the delay of the multiplier circuit with tunable sleep transistor cell Power Gating is found to be much smaller.
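The percentages quoted in this section can be cross-checked with a few lines of Python. The delay and power constants below are the values reported in the text; the tunable-cell delay and power are not quoted directly, so they are inferred here from the stated 6.79 % and 1.61 % figures, which is an assumption of this sketch.

```python
D_BC = 2.6052e-10          # best case delay (conventional Power Gating, W = 700 nm), in seconds
D_CBSTD_W400 = 2.8091e-10  # maximum output delay with CBSTD at W = 400 nm, in seconds
D_NO_PG = 2.3836e-10       # delay without any Power Gating, in seconds
P_NO_PG = 1.3862e-5        # Average Power without any Power Gating, in watts

DELAY_INCREASE_PCT = 6.79   # stated delay penalty of the tunable cell
POWER_REDUCTION_PCT = 1.61  # stated Average Power saving of the tunable cell

# 10 % tolerance on top of the best case delay.
delay_limit = 1.10 * D_BC
cbstd_within_limit = D_CBSTD_W400 < delay_limit

# Inferred (not quoted) figures for the tunable sleep transistor cell.
tunable_delay = D_NO_PG * (1 + DELAY_INCREASE_PCT / 100)
tunable_power = P_NO_PG * (1 - POWER_REDUCTION_PCT / 100)

# Improvement of the inferred tunable-cell delay relative to d_BC.
improvement_vs_dbc_pct = (D_BC - tunable_delay) / D_BC * 100

print(f"delay limit (1.10 x d_BC): {delay_limit:.4e} s")
print(f"CBSTD at W = 400 nm within limit: {cbstd_within_limit}")
print(f"inferred tunable-cell delay: {tunable_delay:.4e} s")
print(f"inferred tunable-cell power: {tunable_power:.4e} W")
print(f"improvement with respect to d_BC: {improvement_vs_dbc_pct:.2f} %")
```

The inferred delay corresponds to an improvement of roughly 2.29 % over d_BC, which agrees with the figure given in the Conclusion.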
Conclusion
In this work, we have focused on the impact of several Power Gating strategies that significantly reduce the standby mode leakage power in a CMOS circuit while maintaining a desirable performance, or speed. A fair comparison of the performance of the 4×4 multiplier circuit under different Power Gating schemes, namely conventional Power Gating, CBSTD, DSTN, and cluster based tunable sleep transistor cell Power Gating, has been presented. Compared to DSTN and the other Power Gating schemes discussed, the cluster based tunable sleep transistor cell Power Gating provides the best performance, with a 2.29 % improvement in delay with respect to d_BC. Moreover, the tunable sleep transistor cell has the inherent advantage of a programmable parallel connection of transistors, which gives it the greatest flexibility.
