Abstract
Introduction
With technology stepping into the submicron region, power issues have become a bottleneck in the design of portable and wireless electronic systems. The total power dissipation consists of dynamic power, short-circuit power, and leakage power, and can thus be expressed in terms of the transition probability, load capacitance, leakage current, and short-circuit charge of the i-th gate. Short-circuit power dissipation remains at around 10% of the total power dissipation [2]. With the development of fabrication technology, leakage power dissipation has become comparable to switching power dissipation [3]. At the 90nm technology node, leakage power may make up 42% of total power [4].
Techniques for reducing the growing leakage power are therefore necessary. These leakage control methods can be broadly divided into two categories: process-level and circuit-level techniques [5]. At the process level, leakage reduction can be achieved by controlling the dimensions (length, oxide thickness, junction depth, etc.) and the doping profile of transistors.
Here we discuss circuit design techniques, namely adaptive body bias [6], DVTS [7], input vector control [8], dual-$V_t$ assignment [9] [10], and Multi-Threshold CMOS (sleep transistor, ST, insertion). Among these, Multi-Threshold CMOS (MTCMOS) is a valuable technique for reducing leakage power in the circuit standby mode. The MTCMOS technique essentially places a sleep transistor between the gates and the power/ground (P/G) net in order to put the gates into sleep mode when the circuit is in standby. The most popular MTCMOS technique gates the power of sizable blocks using large sleep transistors, assuming that all gates have a fixed slowdown [11] (Figure 1(a)). The fine-grain approach, which has some advantages over the block-level design (Figure 1(b)), is attracting growing attention.
The existing literature on MTCMOS circuits [11] [12] [13] [14] [15] presents cluster-based methods for sleep transistor insertion and sizing. [11] first proposes a mutual exclusion method to reduce the area penalty. [12] [13] present several heuristic techniques for efficient gate clustering and try to mitigate the ground bounce problem at the cost of an additional power penalty. In [14] [15], a Distributed Sleep Transistor Network (DSTN) approach is proposed, which connects all the sleep devices to reduce the area penalty.
Although cluster-based methods reduce the area penalty, they induce large ground bounce in the P/G network, which has adverse effects on circuit speed and noise immunity [16]. Moreover, the sleep transistor size is determined by the worst-case current of the clustered block, and identifying the worst case is quite difficult without comprehensive simulation [11]. It is therefore harder to guarantee circuit functionality for large blocks with only one sleep transistor [1].
The fine-grain MTCMOS design methodology is discussed in [1] [16]. In [1], a fine-grain MTCMOS design methodology and several design rules are proposed, and the authors compare local and global devices. [16] presents a selective sleep transistor insertion methodology that makes better use of circuit slack. It first selects where to put the sleep transistors with a heuristic method and then solves an LP model to obtain the optimal sleep transistor sizes. Although the second step yields an optimal sizing result, the first step may lead to a local optimum. Furthermore, the second step assumes that the sleep transistor size is continuous, which is not the case in practice.
This paper presents three contributions to leakage reduction through fine-grain sleep transistor insertion. First, we present our newly developed leakage current model and delay model of a single gate, which are much simpler and more accurate than the models used in traditional fine-grain sleep transistor insertion strategies. Second, a formal mixed-integer linear model of the leakage current reduction problem gives designers the relations between leakage current and circuit constraints, and makes it possible to decide where to put the sleep transistors and how to size them simultaneously and optimally. The model can be solved even when the allowed circuit slowdown is too small for conventional fixed-slowdown sleep transistor insertion. Even if the circuit performance is not degraded, our model achieves a substantial leakage saving. Furthermore, when the conventional fixed-slowdown method can be applied, our method still leads to a larger leakage saving and a much smaller total sleep transistor size. Finally, we show that the model can be solved with a discrete sleep transistor size constraint, which is much more practical.
The paper is organized as follows. In Section 2, we present our leakage current and delay models for a single gate. The details of the MLP model construction are given in Section 3. The implementation and experimental results are presented and analyzed in Section 4. Section 5 concludes the paper.
Preliminaries
First we define our leakage current and delay models. A cell-based design flow with a given cell library is used. We assume that sleep transistors with variable size, whose range is determined by the process technology, are used in our fine-grain sleep transistor insertion design. A combinational circuit is represented by a directed acyclic graph (DAG) G = (V, E). A vertex v ∈ V represents a CMOS gate from the given library, while an edge (i, j) ∈ E, i, j ∈ V, represents a connection from vertex i to vertex j. We define $I_l(v)$ and $D(v)$ as the leakage current and delay of gate v, respectively.
Leakage current model
The average leakage power dissipation $P_{leakage}(G)$ of the circuit can be expressed as the product of the average leakage current and the power supply voltage (equation (1)).
The circuit average leakage current can be calculated as the sum of the individual gates' average leakage currents. The leakage current of a CMOS gate is determined by its structure and input pattern. We define the probability of a gate v being under input pattern IN as PB(v, IN). Thus the leakage current of a gate v in the circuit can be expressed as:
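As a sketch of the relations just described, under assumed notation (the symbols $I_l(G)$ for the circuit average leakage current and $I_l(v, IN)$ for the leakage current of gate v under input pattern IN are introduced here for illustration and may differ from the original equations), the leakage power and current can be written as:

```latex
\begin{align*}
P_{leakage}(G) &= V_{DD} \cdot I_l(G) \\
I_l(G) &= \sum_{v \in V} I_l(v) \\
I_l(v) &= \sum_{IN} PB(v, IN) \cdot I_l(v, IN)
\end{align*}
```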
In our fine-grain sleep transistor insertion design, the leakage of a gate in the circuit is also determined by whether a sleep transistor is inserted into this gate or not.
On the other hand, the subthreshold leakage current $I_v$ is still sensitive to the input patterns; the data shown in Table 1 are the average leakage currents, for which we assume that all input patterns have the same probability. As shown in Table 1, the error is less than 0.39% and the original leakage current without a sleep transistor is at least 15X larger than
$I_v$ is about 15% of A(v).
Thus we use a lookup table to model the leakage current of gates with no sleep transistor, and linear equations to model the leakage current of gates with sleep transistors. Our leakage current model for a single gate is therefore both simple and accurate.
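The two-part model can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the table entries are placeholders rather than data from Table 1, and the linear form `intercept + slope * wl` with its coefficient names is hypothetical, since the text only states that the model for gates with sleep transistors is linear.

```python
from typing import Dict, Tuple

# Hypothetical lookup table: leakage current per (gate type, input pattern).
# The numbers are placeholders, not measurements from Table 1.
LEAKAGE_LUT: Dict[Tuple[str, str], float] = {
    ("NAND2", "00"): 1.0,
    ("NAND2", "01"): 2.0,
    ("NAND2", "10"): 3.0,
    ("NAND2", "11"): 6.0,
}


def avg_leakage_no_st(gate_type: str, pattern_prob: Dict[str, float]) -> float:
    """Average leakage of a gate without a sleep transistor:
    the sum over input patterns IN of PB(v, IN) * I(v, IN)."""
    return sum(p * LEAKAGE_LUT[(gate_type, pat)] for pat, p in pattern_prob.items())


def leakage_with_st(slope: float, intercept: float, wl: float) -> float:
    """Assumed linear model in the sleep transistor size W/L (hypothetical form)."""
    return intercept + slope * wl


def gate_leakage(has_st: bool, gate_type: str, pattern_prob: Dict[str, float],
                 slope: float, intercept: float, wl: float) -> float:
    """Select the model according to ST(v): linear if a sleep transistor
    is inserted, lookup table otherwise."""
    if has_st:
        return leakage_with_st(slope, intercept, wl)
    return avg_leakage_no_st(gate_type, pattern_prob)


# Equal input-pattern probabilities, as assumed for the averages in Table 1.
uniform = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
```

Keeping the sleep transistor case linear in W/L is what lets the gate leakage enter a linear objective once ST(v) and the size are decision variables.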
Delay model
In our fine-grain sleep transistor insertion design, we have to insert sleep transistors into the original gates of the given library. As shown in [18], the delay of a gate is influenced by the sleep transistor insertion.
where $C_L$, $V_{THlow}$, $\alpha$, and $K$ are the load capacitance at the gate output, the low threshold voltage, the velocity saturation index, and the proportionality constant, respectively. The propagation delay
where $V_x$ is the $V_{ds}$ of the sleep transistor, that is, the voltage drop from $V_{DD}$ to the virtual $V_{DD}$ as shown in Figure 1. Taking the difference of the delays in equations (6), (7), and (8), we can get an approximate D(v) with negligible error using a Taylor series expansion:
We use the constant $2\alpha/(V_{DD} - V_{THlow})$ to simplify equation (9), since $V_{THlow}$, $\alpha$, and $V_{DD}$ are all technology-dependent constants. We suppose $I_{ON}(v)$ is the current flowing through the sleep transistor in gate v during the active mode, which can be expressed as [16]:
Thus the voltage drop $V_x$ in gate v due to sleep transistor insertion can be expressed as:
Here we use $\Psi(v)$ to simplify the equation. From the above we can get D(v) as:
From equation (10), we can see that $V_x$ is slightly larger than the actual value, so D(v) is slightly larger than the actual value, which makes it safer for our model to maintain the timing constraints of the circuit.
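To illustrate why a first-order Taylor expansion is adequate for small $V_x$, the following Python sketch compares an alpha-power-law delay with its linearization. It assumes that the sleep transistor reduces the gate overdrive voltage by exactly $V_x$; the prefactor in the model above may differ. The function names and the numeric values of the threshold voltage, $\alpha$, and $V_x$ are illustrative assumptions, not parameters taken from the experiments.

```python
def gate_delay(c_load: float, v_dd: float, v_th: float, alpha: float,
               k: float, v_x: float = 0.0) -> float:
    """Alpha-power-law delay C_L * V_DD / (K * (V_DD - V_x - V_THlow)^alpha).
    Assumption: the sleep transistor lowers the overdrive voltage by v_x."""
    return c_load * v_dd / (k * (v_dd - v_x - v_th) ** alpha)


def gate_delay_taylor(c_load: float, v_dd: float, v_th: float, alpha: float,
                      k: float, v_x: float) -> float:
    """First-order Taylor expansion in v_x around v_x = 0:
    D(v_x) ~ D(0) * (1 + alpha * v_x / (V_DD - V_THlow))."""
    d0 = gate_delay(c_load, v_dd, v_th, alpha, k)
    return d0 * (1.0 + alpha * v_x / (v_dd - v_th))


# Illustrative values only (1.8 V supply; threshold, alpha and v_x are assumed).
exact = gate_delay(1.0, 1.8, 0.4, 1.3, 1.0, v_x=0.05)
approx = gate_delay_taylor(1.0, 1.8, 0.4, 1.3, 1.0, v_x=0.05)
rel_error = abs(exact - approx) / exact
```

Because the delay becomes linear in $V_x$ after the expansion, it can be used directly in linear timing constraints.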
MLP model construction

Objective function
Proceedings of the 7th International Symposium on Quality Electronic Design (ISQED'06)
We use equation (3) 
Referring to equations (3) and (5), we can hence replace equation (13) with:
where ST(v) and $(W/L)_v$ are the variables that decide where to put a sleep transistor and how to size it, respectively.
Timing constraints
First we consider the primary input (PI) and primary output (PO) gates of the circuit. The arrival time $t_a$ of every PI is set to zero, while the required time of every PO must not exceed the overall circuit delay $T_{req}$.
Then we note that the sum of gate v's arrival time and its delay must not exceed the arrival time of any of gate v's fanout gates. That is to say, $\forall (i, j) \in E$, $i, j \in V$, we can derive the constraint as:
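The timing constraints can be illustrated by a small Python sketch that computes the smallest arrival times satisfying $t_a(j) \geq t_a(i) + D(i)$ for every edge (i, j) and then checks the primary outputs against $T_{req}$. In the MLP model the arrival times are decision variables; here they are simply propagated for fixed delays. The function names are hypothetical, and treating the PO condition as $t_a(v) + D(v) \leq T_{req}$ is an assumed reading of the text.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def arrival_times(edges: List[Edge], delay: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Smallest t_a with t_a(PI) = 0 and t_a(j) >= t_a(i) + D(i) on every edge,
    computed in topological order (Kahn's algorithm)."""
    succ = {v: [] for v in delay}
    indeg = {v: 0 for v in delay}
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
    t_a = {v: 0.0 for v in delay}  # primary inputs keep arrival time zero
    queue = deque(v for v in delay if indeg[v] == 0)
    while queue:
        i = queue.popleft()
        for j in succ[i]:
            t_a[j] = max(t_a[j], t_a[i] + delay[i])
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return t_a


def meets_timing(edges: List[Edge], delay: Dict[Hashable, float], t_req: float) -> bool:
    """Check every primary output (a gate with no fanout) against T_req.
    Assumed reading: the output of a PO gate must settle by T_req."""
    t_a = arrival_times(edges, delay)
    has_fanout = {i for i, _ in edges}
    return all(t_a[v] + delay[v] <= t_req for v in delay if v not in has_fanout)


# Tiny example: gates a and b drive c, which drives d.
example_edges = [("a", "c"), ("b", "c"), ("c", "d")]
example_delay = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 1.0}
```

For the example circuit the critical path is b, c, d, so the check passes for a required time of 6.0 and fails for anything smaller.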
Linearization constraints
First we define variable W(v) for each gate, where
We use a piecewise linear approximation technique similar to that in [19] to linearize these exponential expressions with inequalities:
Second, in equations (15) and (19), the set of terms to be linearized is:
where WL(v), WLN(v), and WLN2(v) are real variables while ST(v) is binary. As in [19], $C = A \times B$, where A is a binary variable and M is an upper bound of B, is linearized as follows:
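For illustration, one standard way to linearize such a product is the big-M formulation $C \leq M A$, $C \leq B$, $C \geq B - M(1 - A)$, $C \geq 0$, assuming $0 \leq B \leq M$; the exact inequalities used in [19] may be written differently. The Python sketch below (with a hypothetical helper name) computes the interval of C values allowed by these four inequalities and shows that it collapses to the single point $A \times B$.

```python
from typing import Tuple


def product_bounds(a: int, b: float, m: float) -> Tuple[float, float]:
    """Feasible interval [lo, hi] for C under the big-M inequalities
        C <= M * A,  C <= B,  C >= B - M * (1 - A),  C >= 0,
    assuming A is binary and 0 <= B <= M."""
    lo = max(0.0, b - m * (1 - a))
    hi = min(m * a, b)
    return lo, hi


# For every binary A and every B in [0, M], the interval is the point A * B.
checks = [
    product_bounds(a, b, 10.0) == (a * b, a * b)
    for a in (0, 1)
    for b in (0.0, 2.5, 10.0)
]
```

When A = 0 the first inequality forces C to zero, and when A = 1 the second and third inequalities pin C to B, so no nonlinear term remains in the model.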
Implementation and experiment results
We use ISCAS85 benchmark circuits to evaluate our MLP model. The netlists are synthesized using Synopsys Design Compiler and a TSMC 0.18μm standard cell library. The leakage current look up table is generated by HSPICE with TSMC 0.18μm CMOS process and a 1.8v supply condition. The values of various transistor parame-
ST(v) are binary variables
Table 3. Comparison between MLP-C and Fixed-slowdown
≤ , corresponding to a minimum delay variation of 6%. Thus for 0%, 3%, and 5% circuit slowdown, we cannot get a valid solution through the conventional fixed-slowdown method. Our MLP model, on the other hand, still achieves a substantial leakage current saving. When the performance slowdown is 7% or 9%, the conventional fixed-slowdown method can be applied, but with a larger area penalty and a smaller leakage current saving than our MLP-C model.
As shown in Table 2, the MLP-C model can achieve 79.75% leakage saving even if the circuit performance is not degraded. When the circuit slowdown is 3% and 5%, the leakage saving through our MLP-C model is 93.56% and 94.99%, respectively. Our MLP-C model thus achieves more leakage saving at 5% circuit slowdown than the fixed-slowdown method does at 7% or 9% circuit slowdown. However, the difference in leakage saving between our model and the conventional fixed-slowdown method is not as large as that reported in [16]. In our experimental results, the difference in leakage saving between our MLP-C model and the fixed-slowdown method under the same circuit slowdown is within 11%. This is caused by the different leakage current models. When the performance slowdown is larger than 6%, our MLP-C model reaches an optimal result with all ST(v)=1, which is the same result as optimal sizing with sleep transistors placed everywhere [16].
In Table 3, we compare the area penalty of the MLP-C model and the fixed-slowdown method. As mentioned above, the difference in leakage saving is not very large. However, our MLP-C model achieves a much smaller sleep transistor area penalty. With 7% circuit slowdown, our MLP-C model leads to 74.79% sleep transistor area saving compared to the fixed-slowdown method.
Table 4 shows the comparison of MLP-C, MLP-CtoD, and MLP-D. The MLP-C and MLP-D models yield the same number of ST gates under the same circuit slowdown. Therefore the difference in leakage saving between MLP-C and MLP-D is determined by the ST area difference. Referring to our leakage current model in Section 2 and the data in Table 4, the difference in leakage saving is very small, about 0.1%. Meanwhile, as shown in Table 4, our MLP-CtoD method is a very good approximation to the MLP-D model: the difference in leakage reduction is within 0.01%, and the MLP-CtoD method is about 30X faster than the MLP-D model.
From Table 4, when the circuit slowdown is below 6%, not all the gates in the circuit can use the sleep transistor scheme; thus an MTCMOS gate may drive a traditional CMOS gate, which can leave the output of the MTCMOS gate in a floating state. We use a leakage feedback gate structure [21] in order to avoid floating states. Meanwhile, the results for the area penalty imposed by the fine-grain sleep transistors in [16] show that the area penalty is just around 5% with a standard cell placement methodology.
Conclusions
We have presented a mixed-integer linear programming method that simultaneously places and sizes the sleep transistors in our fine-grain sleep transistor design to minimize the leakage current. A novel leakage current and delay model of the fine-grain sleep transistor design is presented in order to build the MLP model. Our MLP model can reduce the leakage current by about 79.75% even when the circuit performance is not degraded. Two MLP models, MLP-C and MLP-D, with different sleep transistor size constraints are presented and compared. The MLP-D model uses a discrete sleep transistor size constraint, which is more practical. An MLP-CtoD method is introduced to speed up the MLP-D model, and it approximates the MLP-D model very well. Our experimental results show that the MLP-C model can achieve 93.56% and 94.99% leakage saving when the circuit slowdown is 3% and 5%, respectively. The MLP-C model also achieves 74.79% less area penalty than the conventional fixed-slowdown method when the circuit slowdown is 7%. The MLP-D model achieves just 0.1% less leakage saving than the MLP-C model. The MLP-CtoD method speeds up the MLP-D model by about 30X with almost no difference in leakage reduction.
