Dynamic power consumed in CMOS gates decreases quadratically with the supply voltage. By keeping a high supply voltage for gates on the critical path and using a low supply voltage for gates off the critical path, it is possible to dramatically reduce the power consumption of CMOS VLSI circuits without degrading performance. Interfacing gates that operate at different supply voltages, however, requires level converters, which makes the problem difficult to model. In this paper we develop a formal model, and an efficient heuristic derived from it, for using two supply voltages in low-power CMOS VLSI circuits without performance degradation. Power savings of up to 25% over and above the best known existing heuristics are demonstrated for the combinational circuits of the ISCAS85 benchmark suite.
INTRODUCTION
Supply voltage reduction is one of the most effective techniques for reducing the power consumption of CMOS circuits. Most of the power consumed is dynamic power, which decreases quadratically with the supply voltage $V_{DD}$. Reducing $V_{DD}$, unfortunately, increases gate delay, which degrades the performance of the entire circuit. Many recent papers have proposed techniques to reduce $V_{DD}$ without degrading performance [1], [2], [3], [4].
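The quadratic dependence can be made concrete with the standard first-order switching-power model $P = \alpha C V_{DD}^2 f$. This is a textbook model rather than something taken from this paper, and the activity, capacitance and frequency values below are made up for illustration. With $\alpha$, $C$ and $f$ held fixed, moving a gate from 5V to 3V removes $1 - (3/5)^2 = 64\%$ of its dynamic power, before any level-converter overhead.

```python
def dynamic_power(activity: float, cap: float, vdd: float, freq: float) -> float:
    """First-order CMOS switching power: alpha * C * Vdd^2 * f."""
    return activity * cap * vdd ** 2 * freq


def relative_saving(v_high: float, v_low: float) -> float:
    """Fraction of a gate's dynamic power saved by moving it from v_high to v_low,
    with activity, capacitance and frequency unchanged."""
    return 1.0 - (v_low / v_high) ** 2


if __name__ == "__main__":
    # Illustrative (hypothetical) operating point: alpha = 0.5, C = 1 fF, f = 100 MHz.
    p5 = dynamic_power(0.5, 1e-15, 5.0, 100e6)
    p3 = dynamic_power(0.5, 1e-15, 3.0, 100e6)
    print(f"5V: {p5:.3e} W, 3V: {p3:.3e} W")
    print(f"saving 5V->3V:   {relative_saving(5.0, 3.0):.0%}")  # 64%
    print(f"saving 5V->2.4V: {relative_saving(5.0, 2.4):.0%}")
```

The per-gate saving only materializes if the slower low-voltage gate still fits in the available slack, which is what the rest of the formulation has to guarantee.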
Scaling down the threshold voltage, $V_T$, together with $V_{DD}$ was employed in [1]. This approach, however, suffers from a significant increase in standby leakage current because of the low $V_T$. … slight area penalty.
In the design of high-performance circuits, shortening the critical path takes up most of the design time. As a result, many structural paths off the critical path are left with excess slack. It is therefore highly desirable for CAD tools to find such slack automatically and exploit it for power reduction. In [4], simple greedy, sub-optimal heuristics are proposed that use this slack in a dual supply voltage scheme to obtain a significant reduction in power consumption.
In this paper we develop a formal model for using two or more supply voltages to reduce power consumption in CMOS circuits without degrading performance, and derive an efficient heuristic from this model. Our technique uses an iterative method based on linear programming (LP) and solves the problem near-optimally in a reasonable amount of CPU time.
PRELIMINARIES AND RELATED WORK
If a gate operating at a lower supply voltage directly drives a gate operating at a higher supply voltage, a large static current is likely to flow through the PMOS of the higher-voltage gate. This is because, when the output of the low-voltage gate is at "1", its voltage level may not be high enough to turn off the PMOS of the succeeding high-voltage gate. This problem can be avoided by using the level conversion circuit shown in Fig. 1. Although the level converter eliminates the static power dissipation, it dissipates substantial power while toggling. In addition, introducing a level converter may degrade performance because of its propagation delay, which may or may not be substantial. Any formal technique for designing circuits with multiple supply voltages must therefore take into account both the delay of, and the power dissipated in, the level converters.
Previous work on this problem has concentrated on the function level [5], [6], [7], [8], where it is addressed as finding an optimal schedule for a data-flow description of an algorithm. At the function level, two factors make the problem relatively easy to model:
1) The problem sizes are relatively small, so slow techniques such as Integer Linear Programming (ILP) can be employed fruitfully.
2) The level-converter delay is insignificant compared with functional-unit delays and can therefore be ignored.
Both assumptions can be completely unrealistic for gate-level circuits. Our technique tackles these difficulties by:
1) Using linear programming (LP), which can be solved in polynomial time.
2) Explicitly modeling the delay of, and the power consumed by, the level converters.
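The rule for where level converters are needed can be stated directly on a netlist. The sketch below is illustrative: the gate names and the per-converter parameters `lc_delay` and `lc_power` are hypothetical, and the model simply charges one converter's power and delay to every wire on which a lower-voltage gate drives a higher-voltage gate.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # wire from driving gate to driven gate


def wires_needing_converters(edges: List[Edge], vdd: Dict[str, float]) -> List[Edge]:
    """Wires on which a lower-voltage gate drives a higher-voltage gate.

    Without a level converter, the '1' level of the driver may not turn off
    the PMOS of the receiver, so static current would flow.
    """
    return [(u, v) for (u, v) in edges if vdd[u] < vdd[v]]


def converter_overhead(edges: List[Edge], vdd: Dict[str, float],
                       lc_delay: float, lc_power: float) -> Tuple[float, Dict[Edge, float]]:
    """Hypothetical model: total converter power, and the extra delay on each wire."""
    needed = set(wires_needing_converters(edges, vdd))
    extra_delay = {e: (lc_delay if e in needed else 0.0) for e in edges}
    return len(needed) * lc_power, extra_delay


if __name__ == "__main__":
    edges = [("a", "g1"), ("g1", "g2"), ("g2", "g3")]
    vdd = {"a": 5.0, "g1": 3.0, "g2": 5.0, "g3": 3.0}
    print(wires_needing_converters(edges, vdd))  # [('g1', 'g2')]
    power, delay = converter_overhead(edges, vdd, lc_delay=1.0, lc_power=2.0)
    print(power, delay[("g1", "g2")])  # 2.0 1.0
```

High-to-low wires need no converter, which is why a low-voltage region is cheapest when it feeds only other low-voltage gates or primary outputs.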
NOTATION
We will now define all the attributes mentioned previously.
We call a circuit safe when all nodes in V have … and all wires have … .
Delay Balancing
A given circuit-graph G can be transformed into a functionally equivalent circuit-graph G' by introducing an appropriate number of unit delay buffers into the circuit in such a manner that for every … . This process is known as delay balancing. For our purposes we use delay balancing as a tool to capture all the slack in the circuit. The delay buffers used for delay balancing are fictitious entities whose only purpose is to model the slack present in the circuit; we refer to them as UDFs (Unit Delay Fictitious-Buffers). Fig. 2 shows a gate-level circuit and Fig. 3 shows its delay balanced counterpart; the boxed numbers on the wires in Fig. 3 give the number of UDFs on each wire. Starting from a given circuit-graph there are several possible ways to produce a delay balanced graph; any such graph will from now on be referred to as a delay balanced configuration.
Figure 3. The circuit in Fig. 2 after delay balancing (critical path = 8; nodes annotated RT(j)/SL(j)/AT(j)). The boxed integers on wires represent the number of UDFs added to the wires for delay balancing.
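One delay balanced configuration, together with the RT/SL/AT annotations of Fig. 3, can be computed as sketched below. This is an as-soon-as-possible (ASAP) balancing chosen for simplicity, not the depth-first UDF insertion heuristic used in the paper. It assumes integer gate delays, primary inputs with delay 0, and primary outputs tied to a dummy sink (hypothetically named `"O"`). Each wire (u, v) receives AT(v) - d(v) - AT(u) UDFs, which is never negative, and each output wire receives enough UDFs to reach the critical path.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def delay_balance(delay: Dict[str, int], edges: List[Edge],
                  outputs: List[str], sink: str = "O"):
    """ASAP delay balancing with UDFs.

    Returns (udf, at, rt, slack, cp): UDF count per wire (including dummy wires
    po -> sink), arrival time AT, required time RT, slack SL = RT - AT per node,
    and the critical-path length cp.
    """
    preds: Dict[str, List[str]] = {v: [] for v in delay}
    succs: Dict[str, List[str]] = {v: [] for v in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    order = list(TopologicalSorter(preds).static_order())

    # Arrival time: latest input arrival plus the gate's own delay.
    at: Dict[str, int] = {}
    for v in order:
        at[v] = max((at[u] for u in preds[v]), default=0) + delay[v]
    cp = max(at[po] for po in outputs)

    # UDFs pad each wire so that every PI-to-sink path has length cp.
    udf: Dict[Edge, int] = {(u, v): at[v] - delay[v] - at[u] for u, v in edges}
    for po in outputs:
        udf[(po, sink)] = cp - at[po]

    # Required time, propagated backwards from the critical path.
    rt: Dict[str, int] = {}
    for v in reversed(order):
        cands = [rt[w] - delay[w] for w in succs[v]]
        if v in outputs:
            cands.append(cp)
        rt[v] = min(cands, default=cp)
    slack = {v: rt[v] - at[v] for v in order}
    return udf, at, rt, slack, cp


if __name__ == "__main__":
    delay = {"a": 0, "b": 0, "c": 0, "g1": 1, "g2": 1, "g3": 1}
    edges = [("a", "g1"), ("b", "g1"), ("g1", "g2"), ("c", "g2"), ("c", "g3")]
    udf, at, rt, slack, cp = delay_balance(delay, edges, outputs=["g2", "g3"])
    print(cp, udf, slack)
```

In the example, gate g3 ends up with slack 1 and one UDF on its output wire. That UDF is exactly the room a slower low-voltage g3 could use.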
UDF-Displacement
We define UDF-Displacement, a circuit-graph transformation technique, as a mapping $r: V \rightarrow \mathbb{Z}$ ($\mathbb{Z}$: the set of integers) such that, for all wires $(u, v) \in E$, the number of UDFs on $(u, v)$ after UDF-Displacement is related to the number of UDFs before UDF-Displacement through $r(u)$ and $r(v)$. We state the following without proof.
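A minimal sketch of applying a UDF-Displacement, assuming the retiming-style update $w_r(u,v) = w(u,v) + r(v) - r(u)$. This convention is an assumption borrowed from standard retiming, and the sign convention intended here may differ. A displaced configuration is only legal if no wire ends up with a negative number of UDFs. The example UDF counts are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def displace(udf: Dict[Edge, int], r: Dict[str, int]) -> Dict[Edge, int]:
    """Apply a UDF-Displacement r: V -> Z to per-wire UDF counts.

    ASSUMPTION: retiming-style update w_r(u, v) = w(u, v) + r(v) - r(u);
    nodes missing from r are treated as r = 0.
    """
    return {(u, v): w + r.get(v, 0) - r.get(u, 0) for (u, v), w in udf.items()}


def is_legal(udf: Dict[Edge, int]) -> bool:
    """A configuration is legal only if every wire carries a non-negative UDF count."""
    return all(w >= 0 for w in udf.values())


# Hypothetical delay balanced configuration (gate g3 has one UDF on its output wire).
example_udf: Dict[Edge, int] = {
    ("a", "g1"): 0, ("b", "g1"): 0, ("g1", "g2"): 0,
    ("c", "g2"): 1, ("c", "g3"): 0, ("g2", "O"): 0, ("g3", "O"): 1,
}

if __name__ == "__main__":
    moved = displace(example_udf, {"g3": 1})  # shift g3's UDF to its input side
    print(moved[("c", "g3")], moved[("g3", "O")], is_legal(moved))  # 1 0 True
    print(is_legal(displace(example_udf, {"g3": -1})))  # False
```

Under this convention, r(v) counts UDFs moved from v's output wires to its input wires.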
Theorem 1 All legal delay balanced configurations for a given circuit-graph G are UDF-Displaced versions of each other.

Theorem 2 The net change in the number of UDFs in any structural path from a node/gate to another node/gate
The above theorem gives rise to the following corollary.
Corollary 1 If we connect all the primary output (PO) gates of a given combinational circuit to a common dummy node O through dummy wires, and restrict r(O) to be exactly 0 and r(I) to be exactly 0 for every primary input node I ∈ PI, then the critical path of the circuit-graph after UDF-Displacement remains unaltered.
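The corollary can be seen with a telescoping sum, assuming the retiming-style update $w_r(u,v) = w(u,v) + r(v) - r(u)$. This convention is an assumption. Under it, the net change in UDFs along a path from u to v is $r(v) - r(u)$, in the spirit of Theorem 2. For any path $p = v_0 \to v_1 \to \dots \to v_k$ with $v_0 = I \in PI$ and $v_k = O$:

```latex
\sum_{i=1}^{k} w_r(v_{i-1}, v_i)
  = \sum_{i=1}^{k} \bigl( w(v_{i-1}, v_i) + r(v_i) - r(v_{i-1}) \bigr)
  = \sum_{i=1}^{k} w(v_{i-1}, v_i) + r(O) - r(I)
  = \sum_{i=1}^{k} w(v_{i-1}, v_i)
  \qquad \text{since } r(O) = r(I) = 0.
```

UDF-Displacement does not touch gate delays, so every PI-to-O path keeps its length (gate delays plus UDFs), and hence so does the critical path.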
PROBLEM FORMULATION
… and then:
1. Produce a random delay balanced configuration of the given circuit-graph. We use a depth-first UDF insertion heuristic for this.
2. Transform the circuit-graph using the transformation illustrated in Fig. 4 as follows: i) introduce an auxiliary node/gate with a delay of 0 units between gate Y and each of its fanout nodes, as shown in Fig. 4; …
[Equations (3) to (6).]
Set up the linear power benefit function, COST,
[Equation (7).]
Solve PROUD using the heuristic PRHEUDENT (Power Reducing Heuristic with UDF Displacement) and output the power gain.
The inequalities, (4), (5), (7), ensure non-negativity of number of UDFs on edges of the transformed circuit-graph.
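To illustrate the kind of trade-off a linear power benefit function captures, here is a hypothetical stand-in. It is not a reconstruction of Equation (7). Gate names, capacitances and the per-converter cost `lc_cost` are made-up parameters, and all gates are assumed to share the same activity and frequency, so power is normalized to $C V^2$. Each low-voltage gate contributes a saving of $C(V_H^2 - V_L^2)$, and each gate-to-gate wire from a low-voltage gate into a high-voltage gate is charged one level converter.

```python
from typing import AbstractSet, Dict, List, Tuple

Edge = Tuple[str, str]  # gate-to-gate wire


def power_gain(cap: Dict[str, float], low: AbstractSet[str], edges: List[Edge],
               v_high: float, v_low: float, lc_cost: float) -> float:
    """Hypothetical normalized power benefit of running the gates in `low` at v_low.

    Saving per low-voltage gate: cap * (v_high^2 - v_low^2).
    Penalty: lc_cost for every wire where a low-voltage gate drives a high-voltage gate.
    """
    saved = sum(cap[g] * (v_high ** 2 - v_low ** 2) for g in low)
    converters = sum(1 for u, v in edges if u in low and v not in low)
    return saved - converters * lc_cost


if __name__ == "__main__":
    cap = {"g1": 2.0, "g2": 1.0, "g3": 1.0}
    edges = [("g1", "g2")]
    print(power_gain(cap, {"g3"}, edges, 5.0, 3.0, 4.0))  # 16.0: no converter needed
    print(power_gain(cap, {"g1"}, edges, 5.0, 3.0, 4.0))  # 28.0: one converter on g1 -> g2
```

The saving term is linear in per-gate 0/1 choices. The converter term is not, because it depends on a pair of choices. An LP formulation would have to linearize it, for example with an auxiliary variable $c_{uv} \ge x_u - x_v$. We make no claim about how Equation (7) itself models this.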
… then use the heuristic in [4] to determine the supply voltages of these nodes. Finally, terminate PRHEUDENT and output the power gain and the supply voltages used for all the gates in the netlist.
In our simulations we iterated steps 1., 2. and 3. i) only once and then used step 3. ii) to determine the supply voltages of all the remaining nodes.
The Non-Integral Delay Difference Model
The obvious way to deal with the case where a gate has a nonintegral … , unit delay gates with delay(5V) = 1 unit and delay(3V) = 2 units, i.e., …
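One possible way to handle non-integral delays is to pick a finer time unit so that every gate delay becomes a whole number of UDFs. This is offered as an assumption, since the text above is truncated. The helper below is hypothetical. It scales all delays by the least common multiple of their denominators, so delays of 1 and 1.5 units become 2 and 3 units.

```python
from fractions import Fraction
from math import lcm
from typing import Dict, Union

Number = Union[int, Fraction, str]


def integerize_delays(delays: Dict[str, Number]) -> Dict[str, int]:
    """Rescale gate delays by the smallest integer factor that makes them all integral.

    Hypothetical illustration: a finer time unit lets non-integral delays be
    expressed as whole numbers of unit delay fictitious buffers.
    """
    fracs = {g: Fraction(d) for g, d in delays.items()}
    scale = lcm(*(f.denominator for f in fracs.values()))  # lcm() of nothing is 1
    return {g: int(f * scale) for g, f in fracs.items()}


if __name__ == "__main__":
    print(integerize_delays({"delay_5V": 1, "delay_3V": "1.5"}))  # {'delay_5V': 2, 'delay_3V': 3}
```

The cost of this approach is that UDF counts, and with them the size of the formulation, grow with the scale factor.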
EXPERIMENTAL RESULTS
We used the PROUD formulation to explore the use of two supply voltages for all the combinational benchmark circuits in the ISCAS85 benchmark suite. Each gate was assumed to have a capacitance equal to the maximum number of transistors in a worst-case charging/discharging path; e.g., a 5-input NAND/NOR gate is assumed to have a capacitance of … , was fixed at the critical path of the benchmark circuit under simulation.
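The capacitance rule used in the experiments can be written as a small helper. This hypothetical sketch covers only inverters and static CMOS NAND/NOR gates. An n-input NAND has n series NMOS transistors on its discharge path, and an n-input NOR has n series PMOS transistors on its charge path. Other ISCAS85 gate types, such as AND, OR, XOR and BUFF, would need their own transistor-level structure, which the text does not specify.

```python
def gate_capacitance(kind: str, n_inputs: int) -> int:
    """Capacitance model: number of transistors on the worst-case (longest series)
    charging or discharging path of a static CMOS gate.

    Hypothetical helper covering only NOT/INV, NAND and NOR.
    """
    if n_inputs < 1:
        raise ValueError("a gate needs at least one input")
    kind = kind.upper()
    if kind in ("NOT", "INV"):
        return 1  # one NMOS down, one PMOS up
    if kind == "NAND":
        return n_inputs  # n series NMOS vs. a single parallel PMOS branch
    if kind == "NOR":
        return n_inputs  # n series PMOS vs. a single parallel NMOS branch
    raise ValueError(f"no transistor-level model for gate type {kind!r}")


if __name__ == "__main__":
    print(gate_capacitance("NAND", 5), gate_capacitance("NOR", 5))  # 5 5
```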
As Table 1 shows, three cases were simulated, differing in the values of $V_{DD}$ and/or the delay of the level converter. Using PRHEUDENT substantially increases the power savings over the greedy technique in [4]. For Case I, with the voltage pair (5V, 3V), PRHEUDENT increases the power savings by 2.5% to 12.7%. For Case II, also with the voltage pair (5V, 3V), the increase goes up marginally for each benchmark circuit and lies in the range 3% to 14.7%. The real benefit of PRHEUDENT, however, is evident in Case III, with the voltage pair (5V, 2.4V), where the increase in power savings lies in the range 7.1% to 25.8%.
The CPU time for solving PRHEUDENT (on an Ultra-Sparc10), however, increases considerably for more complex situations. This is evident from the large increase in CPU time from the simplistic model of Case I to the more complicated model of Case III in Table 1.
We believe that better CPU times may be possible with a better LP package. The LP solver currently used was lp_solve, available at ftp://ftp.es.ele.tue.nl/pub/lp_solve.
In summary, we have developed a useful formal mathematical model and an effective heuristic for using dual supply voltages to reduce power consumption at the gate level. Future research will be directed towards extending the scheme to more than two supply voltages. Also important is finding ways to reduce the execution time of PRHEUDENT.
