In today's sub-100nm CMOS technologies, leakage current has become an important part of the total power consumption, affecting both the yield and the lifetime of digital circuits. Dual V th assignment, which has proven to be an effective method of reducing leakage power in the past, remains effective in today's technologies with certain modifications. In this paper, we present a dual V th assignment method based on a statistical timing analysis (SSTA) framework, which can effectively reduce the leakage power even in the presence of large V th variation. In addition, we use a statistical DAG pruning method that takes the correlation between gates into account to speed up the dual V th assignment algorithm. Experimental results show that statistical dual V th assignment reduces leakage current by 40% more on average than the conventional static method without violating the performance constraints. Our DAG pruning method removes 30% of the gates in the circuit on average and saves up to 50% of the total run time.
Introduction
As IC technologies scale down, leakage power has become a dominant contributor to the total power consumption [1] . With the fast development of dual V th processes, dual or multiple threshold voltage assignment techniques are becoming more popular because they reduce leakage power while maintaining high overall performance, which is achieved by giving different transistors in the circuit different threshold voltages. The delays of high and low V th transistors differ by roughly a factor of two, while their leakage currents differ by a factor of almost thirty [2] ; in contrast, dual V dd may lead to only a 2 to 3 times leakage difference and requires many additional level converters. Therefore, dual V th assignment has proven to be a potent way of reducing leakage power.
Deterministic dual V th assignment approaches were proposed first: some heuristic algorithms for dual V th assignment [3] - [5] provided locally optimal solutions, while in [6] , [7] global optimizations were derived from a linear programming formulation. As technology scales, process variation can severely affect both power and timing yield. Hence, some researchers have turned to probabilistic dual V th assignment approaches [8] - [10] , which statistically optimize the leakage power and circuit performance. However, little work has been done to improve the speed of circuit optimization based on statistical timing analysis. This paper presents a novel SSTA-based dual V th assignment method that considers V th variation and distinguishes itself in the following aspects:
(1) V th correlation is taken into account in the delay model of the statistical timing analysis framework. An SSTA-based dual V th assignment technique which can effectively reduce the leakage power in the presence of large V th variation is proposed.
† This work was supported by grant from National Natural Science
(2) A statistical DAG pruning method which takes the correlation between gates into account is used to reduce the number of gates in the circuit. The problem size is therefore reduced and our dual V th assignment algorithm is sped up considerably.
Experimental results show that statistical dual V th assignment reduces leakage current by 40% more on average than the conventional static method. Our DAG pruning method removes 30% of the gates in the circuit on average. These improvements suggest that statistical optimization is vital for high-performance designs.
This paper is organized as follows. Section 2 presents the V th variation model considering correlation, together with the gate delay and leakage models, as the preliminaries for our SSTA-based dual V th assignment. Section 3 first compares the problem definitions of dual V th assignment under different timing constraints and then proposes our SSTA-based DAG pruning method. The implementation and experimental results are given and discussed in Section 4. Section 5 concludes this paper.
Preliminaries
Given a digital circuit, timing analysis tools translate it into a directed acyclic graph (DAG) G = (V, E), where each node v i ∈ V denotes a primary input, primary output or internal gate, and each edge e i = (v m , v n ) ∈ E denotes the interconnect between gates v m and v n . In addition, a source/sink node is conceptually added before/after the primary inputs/outputs so that the graph can be analyzed as a single-input single-output network.
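The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_timing_dag` and `topological_order`, the node labels `"SOURCE"` and `"SINK"`, and the toy netlist are hypothetical and are not taken from the timing tool used in this work.

```python
from collections import defaultdict

def build_timing_dag(gates, wires, primary_inputs, primary_outputs):
    """Return (nodes, successors) for a gate-level netlist.

    wires is an iterable of (driver, receiver) pairs, one per interconnect.
    A virtual "SOURCE" feeds every primary input and every primary output
    feeds a virtual "SINK" (both labels are illustrative assumptions).
    """
    succ = defaultdict(list)
    for driver, receiver in wires:
        succ[driver].append(receiver)
    for pi in primary_inputs:
        succ["SOURCE"].append(pi)
    for po in primary_outputs:
        succ[po].append("SINK")
    nodes = ["SOURCE", *primary_inputs, *gates, *primary_outputs, "SINK"]
    return nodes, dict(succ)

def topological_order(nodes, succ):
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    indeg = {v: 0 for v in nodes}
    for v in nodes:
        for w in succ.get(v, []):
            indeg[w] += 1
    ready = [v for v in nodes if indeg[v] == 0]
    order = []
    while ready:
        v = ready.pop()
        order.append(v)
        for w in succ.get(v, []):
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if len(order) != len(nodes):
        raise ValueError("netlist contains a combinational loop")
    return order

# Toy netlist: two gates, two primary inputs, one primary output.
nodes, succ = build_timing_dag(
    gates=["g1", "g2"],
    wires=[("a", "g1"), ("b", "g1"), ("g1", "g2"), ("b", "g2"), ("g2", "y")],
    primary_inputs=["a", "b"],
    primary_outputs=["y"],
)
order = topological_order(nodes, succ)
```

With the virtual source and sink in place, every timing path runs from `"SOURCE"` to `"SINK"`, so a single forward and a single backward traversal in topological order cover the whole circuit.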
V th variation model considering correlation
In gate-level circuit optimization, for a path from source to sink with n nodes v 1 , v 2 ,…, v n , we assume that logically close nodes are also spatially close. We further assume that the threshold voltage of node v i is correlated with that of nodes v i-2 , v i-1 , v i+1 and v i+2 (which can be generalized to more nodes), and each correlation coefficient is:
(1)
The threshold voltage of each node v i is expressed as a linear combination of the mean value and random variables:
Note that physical-level information can also be used to obtain the real spatial and correlation information for each gate. Therefore, the method proposed in this paper can also be applied in physical-level design.
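One simple way to obtain a correlation structure of this kind is to build each threshold voltage from a few shared independent random variables. The sketch below is an assumption-laden illustration, not the model of this paper: the weights in `A`, the function names, and the moving-average construction are all hypothetical, chosen only so that nodes up to two positions apart are correlated and nodes further apart are not.

```python
import math
import random

# Hypothetical weights. Node i uses X[i], X[i+1], X[i+2], so two nodes share
# random variables only when they are at most two positions apart.
A = (1.0, 0.6, 0.3)

def correlation(lag, weights=A):
    """Correlation coefficient between V_th of nodes `lag` positions apart."""
    lag = abs(lag)
    total = sum(w * w for w in weights)
    shared = sum(weights[j] * weights[j + lag]
                 for j in range(len(weights) - lag))
    return shared / total

def sample_vth(n, mean, sigma, rng, weights=A):
    """Draw V_th for n consecutive nodes: mean plus a weighted sum of
    independent standard normal variables, scaled to standard deviation sigma."""
    norm = math.sqrt(sum(w * w for w in weights))
    x = [rng.gauss(0.0, 1.0) for _ in range(n + len(weights) - 1)]
    return [mean + sigma * sum(w * x[i + k] for k, w in enumerate(weights)) / norm
            for i in range(n)]

rng = random.Random(0)
vth = sample_vth(5, mean=0.3, sigma=0.03, rng=rng)
rho1, rho2, rho3 = correlation(1), correlation(2), correlation(3)
```

With these illustrative weights the correlation decays with distance and vanishes beyond two positions, which mirrors the assumption that only v i-2 through v i+2 are correlated with v i .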
Delay model
The load dependent delay of gate v is given by [11] :
where C L , V th , α and K are the load capacitance at the gate output, the threshold voltage, the velocity saturation index and the proportionality constant, respectively. Using a first-order approximation, the delay can be expressed as
Since the delay can be expressed as a linear function of the threshold voltage, as shown in equation (5), the correlation between the gate delay and the threshold voltage variation can also be easily expressed.
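To make the linearization concrete, the sketch below uses a commonly cited form of the alpha-power-law delay, K·C L ·V dd /(V dd − V th )^α, and its first-order Taylor expansion around the mean threshold voltage. The exact expressions of this paper are not reproduced here, so the formula, the function names and the numeric values (α = 1.3, K = 1) should be read as assumptions for illustration.

```python
def gate_delay(c_load, vth, vdd=1.0, alpha=1.3, k=1.0):
    """Assumed alpha-power-law delay: K * C_L * V_dd / (V_dd - V_th)**alpha."""
    return k * c_load * vdd / (vdd - vth) ** alpha

def delay_sensitivity(c_load, vth_mean, vdd=1.0, alpha=1.3, k=1.0):
    """Derivative of the assumed delay with respect to V_th at vth_mean."""
    return alpha * k * c_load * vdd / (vdd - vth_mean) ** (alpha + 1)

def linear_delay(c_load, vth_mean, dvth, **params):
    """First-order approximation: nominal delay plus sensitivity * dvth."""
    return (gate_delay(c_load, vth_mean, **params)
            + delay_sensitivity(c_load, vth_mean, **params) * dvth)

# Compare the exact and the linearized delay for a one-sigma V_th shift
# (30 mV on a 300 mV mean, matching a 10% standard deviation).
exact = gate_delay(1.0, 0.3 + 0.03)
approx = linear_delay(1.0, 0.3, 0.03)
relative_error = abs(exact - approx) / exact
```

Because the linearized delay is an affine function of the threshold voltage shift, a Gaussian V th maps to a Gaussian delay, and the correlation between gate delays follows directly from the correlation between threshold voltages.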
Leakage model
For the gates with low/high V th , a leakage lookup table is created by simulating all the gates in the standard cell library under all possible input patterns. Thus the leakage current I leak (v) can be expressed as:
The sub-threshold leakage current of each gate depends exponentially on V th :
where V T are constants, and C is the capacitance of the depletion region under the gate area. If we assume that V th is a Gaussian random variable, Equation (7) becomes
where V th,mean is the mean value of the threshold voltage and ΔV th ~ N(0, σ²).
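The effect of a Gaussian threshold voltage on an exponential leakage term can be checked numerically. The sketch below assumes a leakage factor of the form exp(−ΔV th / s) with a hypothetical slope parameter `slope` (standing in for the constants of the sub-threshold expression, which are not reproduced here); for ΔV th ~ N(0, σ²) the expected value of such a term is exp(σ² / (2 s²)), the mean of a lognormal variable.

```python
import math
import random

def mean_leakage_factor(sigma_vth, slope):
    """E[exp(-dV / slope)] for dV ~ N(0, sigma_vth**2) (lognormal mean)."""
    return math.exp(sigma_vth ** 2 / (2.0 * slope ** 2))

def monte_carlo_factor(sigma_vth, slope, samples, rng):
    """Sample-based estimate of the same expectation, for cross-checking."""
    total = 0.0
    for _ in range(samples):
        total += math.exp(-rng.gauss(0.0, sigma_vth) / slope)
    return total / samples

# Illustrative numbers only: 30 mV standard deviation, 39 mV slope parameter.
analytic = mean_leakage_factor(0.03, 0.039)
estimate = monte_carlo_factor(0.03, 0.039, 100000, random.Random(1))
```

The factor is always greater than one for σ > 0, which is why the mean leakage under V th variation exceeds the leakage evaluated at the mean threshold voltage.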
SSTA based Dual Vth Assignment

Problem definition
In STA, the optimization objective function is:
where I leak,l and I leak,h are the leakage currents of a gate with low and high threshold voltage, respectively; H(v) equals one if the V th of node v is high and zero otherwise. The timing constraints for the circuits are:
where t a is the arrival time (a random variable); D(i) is the delay (a random variable) of gate i; P(y) is the probability that y is true; and p is a given value between 0 and 1 representing the timing yield requirement of the circuit. If p → 1, the above condition represents the worst-case scenario for SSTA.
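The two ingredients of the problem definition, a leakage objective driven by the indicator H(v) and a timing yield requirement p, can be sketched as follows. The objective is written as the sum over gates of the high-V th leakage when H(v) = 1 and the low-V th leakage otherwise, which is consistent with the definitions above; the yield check additionally assumes that the arrival time at the sink is Gaussian. All function names and numbers are hypothetical.

```python
import math

def total_leakage(assignment, leak_low, leak_high):
    """Sum of per-gate leakage; assignment[v] is 1 for high V_th, 0 for low."""
    return sum(leak_high[v] if h else leak_low[v]
               for v, h in assignment.items())

def timing_yield(mean_arrival, sigma_arrival, d_req):
    """P(t_a <= D_req), assuming a Gaussian arrival time at the sink."""
    if sigma_arrival == 0.0:
        return 1.0 if mean_arrival <= d_req else 0.0
    z = (d_req - mean_arrival) / (sigma_arrival * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def meets_constraint(mean_arrival, sigma_arrival, d_req, p):
    """True if the timing yield reaches the required probability p."""
    return timing_yield(mean_arrival, sigma_arrival, d_req) >= p

# Two gates, one at high V_th; leakage values are in arbitrary units.
leakage = total_leakage({"g1": 1, "g2": 0},
                        leak_low={"g1": 30, "g2": 30},
                        leak_high={"g1": 1, "g2": 1})
ok_95 = meets_constraint(0.8, 0.1, 1.0, 0.95)   # two sigma of margin
tight = meets_constraint(0.9, 0.1, 1.0, 0.95)   # one sigma of margin
```

Raising p toward 1 demands more sigma of margin between the mean arrival time and D req , which is how the moderate-constraint setting leaves more room for high-V th gates than the worst-case one.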
DAG pruning method
For path-based SSTA, the main problem is that there are too many paths in the graph, while only a part of these paths actually needs to be inspected as far as the timing requirement is concerned. If the number of paths that need to be considered can be reduced, the SSTA-based optimization becomes more efficient. A DAG pruning method is therefore used here to reduce the problem size and speed up the algorithm.
The DAG pruning method is based on the following observation: if the maximum delay (whether static or statistical) of all paths through a specific node v i , with all gates in the high V th condition, satisfies the timing requirement, then node v i and the edges connected to it can be removed from the graph without affecting the timing analysis.
Denote the source and sink as v source and v sink , respectively. For node v i , the delay of the paths passing through v i can be expressed as:
Here maxD(v x , v y ) can be represented using the arrival time from v x to v y , which can be easily calculated in an STA tool in linear time. However, in an SSTA-based framework, the above equation holds only for the mean value. With our V th variation assumption, we can nevertheless derive an upper bound for the variance of maxD(v source ,v i ,v sink ), which helps us to prune the DAG in a statistical manner.
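For the static (or mean-value) case, the pruning rule can be sketched with one forward and one backward pass over the DAG. This is a minimal illustration, assuming that the worst delay through a node is the longest source-to-node delay plus the longest node-to-sink delay; `prune_dag` and the toy graph are hypothetical, and the statistical variance bound is not included.

```python
def prune_dag(order, succ, delay_high, d_req):
    """Return the nodes that must stay in the graph.

    order: nodes in topological order (source first, sink last)
    succ: dict mapping a node to the list of its fan-out nodes
    delay_high: dict mapping a node to its delay at high V_th
    """
    # Longest delay from the source up to and including each node.
    arrive = {v: delay_high[v] for v in order}
    for v in order:
        for w in succ.get(v, []):
            arrive[w] = max(arrive[w], arrive[v] + delay_high[w])
    # Longest delay from each node (inclusive) down to the sink.
    depart = {v: delay_high[v] for v in order}
    for v in reversed(order):
        for w in succ.get(v, []):
            depart[v] = max(depart[v], delay_high[v] + depart[w])
    keep = []
    for v in order:
        # The node's own delay is counted in both passes, so subtract it once.
        worst_through = arrive[v] + depart[v] - delay_high[v]
        if worst_through > d_req:
            keep.append(v)
    return keep

# Toy graph: S -> a -> c -> T is slow, S -> b -> T is fast.
order = ["S", "a", "b", "c", "T"]
succ = {"S": ["a", "b"], "a": ["c"], "b": ["T"], "c": ["T"]}
delay_high = {"S": 0.0, "a": 3.0, "b": 2.0, "c": 4.0, "T": 0.0}
kept = prune_dag(order, succ, delay_high, d_req=6.0)
```

In this toy graph node `b` is removed because even at high V th every path through it meets the requirement, so it can be assigned high V th without further analysis.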
First, consider a path with m (m>2) nodes v 1 ,v 2 ,…,v m . According to Section 2, the path delay can be expressed as
which can also be expressed as
Next, we derive an upper bound of the variance from (14) and (15).
Assume that the path from the source to node i has n (n>2) nodes and the path from node i to the sink has m (m>2) nodes. Then:
Using a procedure similar to the derivation of (17), the following holds for all paths from the source through node i to the sink:
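The role of a variance bound in statistical pruning can be illustrated without the specific bound derived here. The sketch below computes the exact variance of a path delay when only gates up to a given distance apart are correlated, and compares it with the generic bound (Σσ i )², which holds for any correlation. The function names, the margin factor `k` and the numbers are assumptions for illustration and do not reproduce bound (17).

```python
import math

def path_delay_variance(sigmas, rho):
    """Variance of the sum of gate delays along one path.

    sigmas: per-gate delay standard deviations, in path order
    rho: dict mapping a positive lag to its correlation coefficient;
         lags that are not listed are treated as uncorrelated
    """
    var = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for lag, r in rho.items():
            j = i + lag
            if j < len(sigmas):
                var += 2.0 * r * sigmas[i] * sigmas[j]
    return var

def variance_upper_bound(sigmas):
    """Generic bound valid for any correlation: (sum of sigmas) squared."""
    return sum(sigmas) ** 2

def passes_statistical_check(mean, variance_bound, d_req, k=3.0):
    """Hypothetical pruning test: mean plus k bounded sigmas within d_req."""
    return mean + k * math.sqrt(variance_bound) <= d_req

sigmas = [0.1, 0.1, 0.1, 0.1]
exact_var = path_delay_variance(sigmas, {1: 0.5, 2: 0.2})
independent_var = path_delay_variance(sigmas, {})
loose_bound = variance_upper_bound(sigmas)
```

A tighter bound lets more nodes pass the check and be pruned, which is why exploiting the limited correlation range matters more than falling back on the generic bound.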
Dual V th assignment
There is a large body of existing literature [2] - [10] , [12] on this topic. The basic idea is to identify the gates that should be assigned high V th by queuing all the gates in a certain order, and then to assign these gates high V th until the queue is empty. In this paper, a dual V th assignment scheme modified from our previous work [12] is applied to the reduced DAG after our DAG pruning phase.
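The queue-based idea can be sketched with a simple greedy loop. This is not the scheme of [12]: the ordering by leakage saving, the static longest-path check and all names below are assumptions chosen to keep the example short.

```python
def longest_path(order, succ, delay):
    """Largest arrival time in a DAG whose nodes are in topological order."""
    arrive = {v: delay[v] for v in order}
    for v in order:
        for w in succ.get(v, []):
            arrive[w] = max(arrive[w], arrive[v] + delay[w])
    return max(arrive.values())

def assign_dual_vth(order, succ, d_low, d_high, leak_low, leak_high, d_req):
    """Greedy sketch: visit gates by decreasing leakage saving and keep a
    swap to high V_th only if the circuit still meets d_req."""
    high = {v: False for v in order}
    delay = dict(d_low)
    queue = sorted(order, key=lambda v: leak_low[v] - leak_high[v],
                   reverse=True)
    for v in queue:
        delay[v] = d_high[v]
        if longest_path(order, succ, delay) <= d_req:
            high[v] = True
        else:
            delay[v] = d_low[v]  # revert: this gate stays at low V_th
    return high

# Toy circuit: a -> c is the critical path, b is off the critical path.
order = ["a", "b", "c"]
succ = {"a": ["c"]}
d_low = {"a": 1.0, "b": 1.0, "c": 1.0}
d_high = {"a": 2.0, "b": 2.0, "c": 2.0}
leak_low = {"a": 30.0, "b": 30.0, "c": 40.0}
leak_high = {"a": 1.0, "b": 1.0, "c": 1.0}
result = assign_dual_vth(order, succ, d_low, d_high,
                         leak_low, leak_high, d_req=3.0)
```

Running the loop only on the gates that survive pruning shortens both the queue and each timing check, since pruned gates can be set to high V th directly.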
Experimental results and discussions
This section describes the implementation and shows the experimental results for the ISCAS85 benchmark circuits. The simulation results are based on a standard cell library constructed using the PTM 90nm bulk CMOS model [20] . V dd = 1.0V, |V thl,mean | = 300mV, |V thh,mean | = 500mV and σ vth = 10% V th,mean are set for all the transistors in the circuits. The leakage current lookup tables are created using HSPICE and the PTM 90nm bulk CMOS model. All ISCAS85 benchmark circuit netlists are synthesized using a commercial synthesis tool and mapped to the 90nm standard cell library. A static timing analysis (STA) tool [12] is modified in the statistical manner described above to analyze the timing constraints of each circuit. The experiments were performed on a computer with an Intel 2GHz CPU and 2GB of RAM.
We compared three different conditions: corner-based STA, worst-case SSTA (p → 1) and moderate-constraint SSTA (p = 95%), under 5% and 0% relaxation of the timing performance constraint (D req ), respectively.
Table 1 shows that both our worst-case SSTA and moderate-constraint SSTA achieve substantial leakage reduction and graph reduction when the performance constraint relaxation is 5%. N r is the number of gates remaining after the DAG pruning procedure. N H is the number of gates that are assigned high V th after the optimization. Normalized I leak is the leakage current normalized to the maximum leakage current, i.e. the leakage current when all gates have low V th . First, after DAG pruning, only about 60% of the gates remain with worst-case SSTA and moderate-constraint SSTA, whereas more than 70% of the gates remain with STA. Second, about 90% of the gates can be assigned high V th in SSTA, while only about 80% can be assigned high V th in STA. Finally, worst-case SSTA and moderate-constraint SSTA achieve 58.0% and 61.8% more leakage saving, respectively, than STA.
The SSTA-based optimization is better than the STA-based optimization for the following two reasons: (1) STA does not consider the correlation of delay between gates, i.e. it treats the gate delays as independent of each other, while in actual circuits the delays of spatially close gates tend to be correlated; (2) STA requires that the timing constraints be satisfied under the worst-case delay, which leads to a stricter timing requirement at the expense of less leakage reduction. The same reasons also explain why the DAG pruning procedure is more effective in the two SSTA conditions than in the STA condition.
Table 2 shows the results when the performance constraint relaxation is 0%. The worst-case SSTA-based and moderate-constraint SSTA-based optimizations still achieve on average 38.9% and 42.1% more leakage saving than the STA-based optimization. After the DAG pruning procedure, fewer than 70% of the gates remain in the worst-case SSTA and moderate-constraint SSTA conditions, whereas more than 75% of the gates remain in the STA condition.
We further compare the runtime of the optimization with and without DAG pruning in Table 3. Step 1 is the DAG pruning phase and Step 2 is the dual V th assignment on the remaining DAG. If the DAG pruning procedure is not used, the run time can be up to 2X of that with DAG pruning for the larger circuits in the ISCAS85 benchmark set. Furthermore, the runtime shown in Table 3 is consistent with the number of remaining gates in Table 1 and 3: the more gates remain after graph reduction, the longer the total runtime of the algorithm.
Summary
We present an approach to statistically assign dual threshold voltages for low-power optimization. The DAG pruning procedure of our algorithm makes it especially efficient in SSTA without loss of accuracy. The effectiveness and efficiency of our approach have been demonstrated by the experimental results, which show that leakage power can be reduced by at least 40% compared with the STA-based method. In addition, by using the statistical DAG pruning procedure, approximately 30% to 40% of the gates can be removed and up to 50% of the runtime can be saved without affecting the timing yield of the circuit. Furthermore, the correlation between gates can be easily adapted to more complex situations according to the actual circuit layout information.
