Abstract: We propose a technique that uses dual supply voltages in digital designs to reduce energy consumption, with new algorithms for finding and assigning the lower voltage in a dual-voltage design. Given a circuit, a supply voltage, and an upper bound on the critical path delay, the first algorithm finds an optimal lower supply voltage and the second algorithm assigns that lower voltage to selected gates. A linear-time algorithm described in the literature is used to compute the slacks of all gates of the circuit for a given supply voltage. Using the computed gate slacks and the lower supply voltage, all gates are divided into three groups: no gate in the first group can be assigned the lower supply; all gates in the second group can be simultaneously set to the lower supply while maintaining positive slack for every gate; and gates in the third group are assigned the low voltage iteratively, a selected subset at a time. The gate slacks are recalculated after each such voltage assignment. Since each recalculation takes linear time and the number of iterations grows at most linearly with the number of gates n, the overall complexity of this dual voltage assignment procedure is O(n^2). SPICE simulations of ISCAS'85 benchmark circuits in 90-nm bulk CMOS technology show up to 60% energy savings.
I. INTRODUCTION
Power and performance are two main constraints in modern VLSI design, and an effective design must optimize both. The demand for higher performance at lower power is driven largely by the growth of portable electronic devices. Because decreasing the supply voltage also decreases performance, a trade-off between power consumption and circuit delay is necessary. The use of multiple supply voltages is a commonly used technique for reducing energy consumption in CMOS circuits, because the dynamic power of a CMOS circuit is directly proportional to the square of its supply voltage [6], [17]. The underlying idea of this technique is to trade timing slack for power reduction under given timing requirements. Generally, the gates on the critical path are kept at the high supply voltage and the gates on non-critical paths are assigned a lower supply voltage, thus avoiding timing violations.
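To make the quadratic dependence concrete, here is a minimal Python sketch of the standard switching-energy model, α·C·V² per cycle. The activity `alpha` and capacitance `cap` defaults are placeholders, not values from this work. They cancel in the ratio, so the per-gate saving from moving a gate from V_H to V_L is 1 − (V_L/V_H)². Leakage and any level-conversion overhead are ignored.

```python
def dynamic_energy(alpha: float, cap: float, vdd: float) -> float:
    """Switching energy per cycle in the standard CMOS model: alpha * C * Vdd^2."""
    return alpha * cap * vdd ** 2


def fractional_saving(v_high: float, v_low: float,
                      alpha: float = 0.1, cap: float = 1e-15) -> float:
    """Fraction of one gate's dynamic energy saved by moving it from v_high to v_low.

    alpha and cap are placeholder values; they cancel, leaving 1 - (v_low / v_high)^2.
    """
    e_high = dynamic_energy(alpha, cap, v_high)
    e_low = dynamic_energy(alpha, cap, v_low)
    return 1.0 - e_low / e_high


if __name__ == "__main__":
    v_h = 1.2  # example high supply voltage in volts
    for ratio in (0.7, 0.5):  # two example V_L / V_H ratios
        saving = fractional_saving(v_h, ratio * v_h)
        print(f"V_L = {ratio:.1f} x V_H -> {saving:.0%} dynamic energy saved per gate")
```

Because only the voltage ratio matters, halving the supply of a gate removes three quarters of its dynamic energy, which is why even modest slack can pay off.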
Usami and Horowitz [19] describe the clustered voltage scaling (CVS) technique, in which the cells driven by each power supply are grouped together and the use of level converters is avoided by not allowing a V_L gate to feed a V_H gate. The extended clustered voltage scaling (ECVS) technique proposed in [20] removes this restriction by allowing level converters [15] at such boundaries. This is referred to as asynchronous level conversion. Both CVS and ECVS aim to use the surplus timing of the non-critical paths in a circuit by applying a lower supply voltage to gates on these paths. This reduces dynamic power dissipation and hence lowers system-level power dissipation. In our work, we do not use level converters because of the overheads associated with them [4], [8], [11], [14].
Several other algorithms that modify CVS and ECVS have been proposed in the literature for dual/multiple-voltage assignment. A greedy algorithm (GECVS) is used in [14] to group gates for V_L assignment based on a sensitivity measure derived from the gate slacks, the change in total power, and the delay. The authors report power savings up to 28% and 13% higher than those of CVS and ECVS, respectively.
In [12], the authors use a mixed integer linear programming (MILP) technique to find the V_L value for circuits operating in the subthreshold region. An ECVS-like method is then applied, using multiple logic gates between the V_L and V_H boundaries instead of level converters.
There are many algorithms for assigning a fixed low voltage value to the gates of a circuit, but relatively few for finding the lower voltage itself. Many of the works on voltage assignment have used a V_L value of 70% of V_H [5], [9], [14], [16], [17]. The work described in [7], [14], [18] claims that the optimal value of V_L for minimizing total power is 50% of V_H. The authors of [21] describe an algorithm that finds the lowest feasible supply voltages for gates, according to their slacks, from a set of given voltages. An additional 19.55% power savings over the CVS method [19] was obtained by this technique. An algorithm to find the optimum V_L value is described in [11]. The authors assign a low voltage value to a group of gates based on a modification of the CVS algorithm and then calculate the energy over a set of low voltages. The V_L value resulting in minimum energy is chosen as the lower voltage. This algorithm requires the voltage assignment to be repeated for each voltage value and is exhaustive in nature.
In Section II, we propose an algorithm to find the lower supply voltage that maximizes the power savings obtained from the gates in a specific group. As determined by their slacks, these gates can be assigned the lower supply voltage without violating the critical path constraint, irrespective of the voltage assignment of other gates. Section III describes an algorithm that assigns this V_L to the gates of the circuit using gate slacks. All algorithms are written in the Perl programming language. The results are discussed in Section IV, and Section V concludes the paper.
II. SLACK-BASED ALGORITHM FOR FINDING AN OPTIMAL LOWER SUPPLY VOLTAGE
In our work, the slack of a gate is defined as the difference between the critical path delay of the circuit and the delay of the longest path through that gate [11], [13]. Thus, every gate has its own slack, and gates with the same slack lie on the same path unless two paths have equal delays. We propose three theorems that categorize the gates of any given circuit based on their slacks.
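The slack definition above can be computed in linear time with one forward and one backward longest-path pass over the gate graph in topological order. The Python sketch below shows this standard approach; it is not necessarily the exact procedure of [13], [10]. It assumes each gate has a single delay value, and the `fanin` map (gate name to the list of driving gates, with primary inputs omitted) is a hypothetical input format. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def gate_slacks(delay: Dict[str, float], fanin: Dict[str, List[str]]) -> Dict[str, float]:
    """Slack of each gate = T_c - (delay of the longest path through the gate).

    delay: gate name -> gate delay
    fanin: gate name -> gates driving it (gates fed only by primary inputs may be omitted)
    """
    graph = {g: fanin.get(g, []) for g in delay}
    order = list(TopologicalSorter(graph).static_order())  # drivers come first
    fanout: Dict[str, List[str]] = {g: [] for g in delay}
    for g, drivers in graph.items():
        for d in drivers:
            fanout[d].append(g)
    # Forward pass: longest delay from any primary input up to and including g.
    arrival: Dict[str, float] = {}
    for g in order:
        arrival[g] = delay[g] + max((arrival[p] for p in graph[g]), default=0.0)
    # Backward pass: longest delay from the output of g to any primary output.
    tail: Dict[str, float] = {}
    for g in reversed(order):
        tail[g] = max((delay[s] + tail[s] for s in fanout[g]), default=0.0)
    t_c = max(arrival.values())  # critical path delay
    return {g: t_c - (arrival[g] + tail[g]) for g in delay}
```

For a chain a → b with delays 1 and 2 and an isolated gate c with delay 1, the critical path delay is 3, gates a and b have slack 0, and c has slack 2. Each gate and each connection is visited a constant number of times, so the cost is linear in circuit size.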
A. Theorem I
Initially, consider all gates connected to the high supply voltage. Figure 1 plots, for each gate, the increase in its delay caused by lowering its supply voltage against its slack in the all-high-voltage circuit.
Statement: All gates that fall above the 45° line in the 'Delay increment versus slack' plot, i.e., gates whose delay increment d_l − d_h exceeds their slack, cannot be assigned the lower supply voltage without violating the positive slack constraint. These gates belong to group 1.
B. Theorem II
We use two variables, β and S_u, introduced in recent work [10], [13]. S_u, the upper slack time, is the lower bound on the slacks of gates that can be unconditionally assigned the low voltage without affecting the critical timing of the circuit. These are gates with large slack, i.e., gates on shorter paths that do not affect the critical path delay even when their delays increase as a result of assigning all the gates on these paths a lower supply voltage.
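Statement: All gates whose slacks are greater than or equal to the upper slack time

$$S_u = \left(1 - \frac{1}{\beta}\right) T_c$$

can be assigned the lower supply voltage simultaneously without increasing the critical path delay, where β is the ratio of the low voltage delay to the high voltage delay of each gate and T_c is the critical path delay:

$$\beta = \frac{d_{li}}{d_{hi}}$$

where d_li is the low voltage delay and d_hi is the high voltage delay of gate i. Using the maximum value of β, β_max, gives the lower bound S_u on the gate slacks. The number of gates with slacks greater than S_u in a given circuit is denoted G, and these gates form group 2.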
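A short derivation shows why a bound of this form is sufficient. It is a sketch that assumes every gate's delay grows by at most a factor β_max when moved to V_L and that the timing bound T_c stays fixed. Let p be any path containing at least one gate g with slack_g ≥ S_u, and let D(p) be its delay with all gates at V_H, so that D(p) ≤ T_c − slack_g. If every gate with slack at least S_u is moved to V_L, the new delay D'(p) satisfies

$$
D'(p) \le \beta_{\max}\, D(p) \le \beta_{\max}\left(T_c - \mathit{slack}_g\right) \le \beta_{\max}\left(T_c - S_u\right) = T_c
\quad\text{for}\quad S_u = \left(1 - \frac{1}{\beta_{\max}}\right) T_c .
$$

Paths that contain no such gate keep their original delay, so no path exceeds T_c.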
C. Theorem III
The set of gates whose slacks fall below the 45° line in the 'Delay increment versus slack' plot of Figure 1 and are also less than S_u forms group 3. The cardinality of this set is P.
Statement: There exists a group of gates within P that can be assigned the lower supply voltage simultaneously without violating the positive slack constraint, namely a group satisfying the condition

$$\sum_{i} y_i \le \min_{i} \left(\mathit{slack}_{hi}\right)$$

where the sum and the minimum are taken over the gates i of the group, and y_i is the difference between the low voltage delay and the high voltage delay of each gate:

$$y_i = d_{li} - d_{hi}$$

where d_li is the low voltage delay of gate i, d_hi is the high voltage delay of gate i, and slack_hi is the slack of gate i when it is at high voltage.
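A minimal Python sketch of checking, and greedily growing, such a group follows. It assumes the condition has the form Σ y_i ≤ min slack_hi over the selected gates. That form is sufficient: any path through a selected gate i starts with at least slack_hi of gate i, and therefore at least the group minimum, of spare time, and it gains at most Σ y_i of extra delay. The function names and the greedy visiting order (the order of `candidates`) are hypothetical.

```python
from typing import Dict, Iterable, List


def satisfies_theorem3(group: Iterable[str], y: Dict[str, float],
                       slack_hi: Dict[str, float]) -> bool:
    """True if the summed delay increments fit within the smallest slack in the group."""
    members = list(group)
    if not members:
        return True
    return sum(y[g] for g in members) <= min(slack_hi[g] for g in members)


def greedy_subset(candidates: List[str], y: Dict[str, float],
                  slack_hi: Dict[str, float]) -> List[str]:
    """Add group-3 gates in the given order while the condition still holds."""
    chosen: List[str] = []
    total, smallest = 0.0, float("inf")  # running sum of y_i, running min of slack_hi
    for g in candidates:
        if total + y[g] <= min(smallest, slack_hi[g]):
            chosen.append(g)
            total += y[g]
            smallest = min(smallest, slack_hi[g])
    return chosen
```

With y = {a: 1, b: 2, c: 3} and slack_hi = {a: 5, b: 4, c: 10}, visiting a, b, c selects a and b (total 3 ≤ 4) and rejects c, since 6 would exceed the group minimum slack of 4.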
Proofs of the above theorems are given in [3]. Having described the relevant theorems, we now describe our algorithms.
D. Algorithm 1
Step 1: We use the O(n) slack calculation algorithm proposed in [13], [10] to compute the gate slacks for a given circuit.
Step 2: The gates are divided into groups 1, 2 and 3 as described by Theorems I, II and III.
Step 3: We then estimate the dynamic energy savings of the gates in groups 2 and 3 together.
The dynamic energy when all gates in groups 2 and 3 together are assigned a high supply voltage will be proportional to

$$E_{high} \propto (G + P)\, V_H^2$$

Similarly, the dynamic energy when all gates in groups 2 and 3 together are assigned a low supply voltage will be proportional to

$$E_{low} \propto (G + P)\, V_L^2$$

Then the energy savings for groups 2 and 3 together are estimated as

$$E_{save1} = (G + P)\left(V_H^2 - V_L^2\right) \qquad (1)$$

Step 4: Repeat Steps 2 and 3 for each value of V_L within the specified range of voltages.
Step 5: Select as the lower voltage V_L1 the value of V_L for which E_save1 is maximum.
Empirically, the optimum lower voltage when no level converters are allowed in the circuit is found to be
Thus, we obtain the value of the lower supply voltage that gives the maximum energy savings. For each available value of V_L, the gate slacks are compared with the d_l − d_h value of each gate, and the gates are then divided into their respective groups based on their slacks. The energy savings are calculated for each available V_L, and the V_L giving the maximum savings is chosen as the lower voltage. Since the number of available V_L values is negligible compared with the number of gates in the circuit, the running time of this algorithm is proportional to the number of gates. Thus, its complexity is linear, or O(n) for n gates.
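A Python sketch of Algorithm 1 follows, under stated assumptions. Per-gate delays at each candidate V_L come from a caller-supplied function `delay_at` (hypothetical; the paper uses SPICE-characterized tables). The saving estimate is taken as (G + P)(V_H² − V_L²), i.e., uniform switched capacitance per gate. β_max is taken over all gates, and the slacks are the all-V_H slacks computed once. The names `Gate`, `classify`, and `find_lower_voltage` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Gate:
    name: str
    d_h: float    # gate delay at V_H
    slack: float  # gate slack with every gate at V_H


def classify(gates: Sequence[Gate], d_low: Dict[str, float],
             t_c: float) -> Tuple[Dict[int, List[str]], float]:
    """Split gates into groups 1, 2, 3 for one candidate V_L (Theorems I-III)."""
    beta_max = max(d_low[g.name] / g.d_h for g in gates)
    s_u = (1.0 - 1.0 / beta_max) * t_c
    groups: Dict[int, List[str]] = {1: [], 2: [], 3: []}
    for g in gates:
        increment = d_low[g.name] - g.d_h
        if increment > g.slack:   # above the 45-degree line
            groups[1].append(g.name)
        elif g.slack >= s_u:      # unconditionally safe at V_L
            groups[2].append(g.name)
        else:                     # below the line with slack < S_u
            groups[3].append(g.name)
    return groups, s_u


def find_lower_voltage(gates: Sequence[Gate], delay_at: Callable[[Gate, float], float],
                       t_c: float, v_h: float, candidates: Sequence[float]) -> Tuple[float, float]:
    """Sweep candidate V_L values; return (V_L, estimated saving) with maximum saving."""
    best_v, best_save = candidates[0], float("-inf")
    for v_l in candidates:
        d_low = {g.name: delay_at(g, v_l) for g in gates}
        groups, _ = classify(gates, d_low, t_c)
        e_save = (len(groups[2]) + len(groups[3])) * (v_h ** 2 - v_l ** 2)
        if e_save > best_save:
            best_v, best_save = v_l, e_save
    return best_v, best_save


if __name__ == "__main__":
    def toy_delay(g: Gate, v: float) -> float:
        return g.d_h * 1.2 / v  # toy delay model, not from the paper

    gates = [Gate("A", 2.0, 0.0), Gate("B", 2.0, 1.5), Gate("C", 1.0, 8.0)]
    print(find_lower_voltage(gates, toy_delay, t_c=10.0, v_h=1.2, candidates=[0.6, 0.8, 1.0]))
```

In the toy run, 0.6 V pushes gate B above the 45° line and shrinks groups 2 and 3, so the sweep settles on 0.8 V rather than the lowest candidate. This illustrates the trade-off the sweep resolves. The cost is one linear pass per candidate voltage.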
III. SLACK-BASED ALGORITHM FOR DUAL VOLTAGE ASSIGNMENT
A. An example of a chain of inverters
Consider the chain of inverters shown in Figure 2. We simulated this circuit using the Synopsys HSPICE program [2], with each of the voltages V_1 and V_2 set to 0.4 V, 0.6 V, 0.8 V, 1.0 V, and 1.2 V. A 1 GHz clock with a 50% duty cycle is applied at the input, and a 6 fF capacitance is used as the load at the output. The results for 90 nm technology are presented in Figure 3, which reports the total energy consumption and delay of the circuit for the various values of V_1 and V_2.
The energy values in the green squares correspond to V_1 = V_2, i.e., single-voltage operation. The values in the blue squares below the V_1 = V_2 diagonal are for V_1 greater than V_2, i.e., a high voltage gate feeding a low voltage gate. The squares above this diagonal represent operation with V_2 greater than V_1, i.e., a low voltage gate feeding a high voltage gate. We observe that the delay measurement in the top cells fails, shown as infinite delay, when the voltage difference is large. For all cases above the diagonal, although the logic 1 level matched V_DD, the logic 0 levels of the five inverters near the output were higher than ground, which produced significantly higher leakage. This indicates the need for level conversion at the voltage boundary. Level converters have been studied in the literature [4], [8]; however, their design is still evolving, and problems with their performance have been reported. In particular, their power and delay overheads grow as the difference between the two voltages increases, further limiting the achievable power reduction.
Constraints on circuit topology that prevent a low voltage gate from driving a high voltage gate are called topological constraints in our work. In all cases where a high voltage gate feeds a low voltage gate, energy savings are seen. These results demonstrate the effectiveness of using topological constraints in dual-V_DD designs. In the remainder of this section, we describe a new algorithm that assigns low voltage to the gates of a given circuit under topological constraints.
B. Algorithm 2
Step 1: Initially, we assume that all gates are at high voltage, i.e., all gates are connected to the V_H supply voltage.
Step 2: Once the V_L value is obtained from Algorithm 1, we assign the lower supply voltage to the gates in group 2.
Step 3: We then recalculate the slacks. Theorem II guarantees that no negative slack occurs during this voltage assignment.
Step 4: The gates are again divided into new groups 1, 2 and 3.
Step 5: The circuit is levelized and, starting from the primary outputs, we take a small group of high voltage gates out of group 3 that satisfies the condition stated in Theorem III and assign them the low voltage.
Step 6: Recalculate slacks.
Step 7: Once we assign low voltage to this group of gates, we check whether there are any low voltage gates driving high voltage gates anywhere in the circuit.
Step 8: If this occurs, the supply of that low voltage gate is changed back to the high voltage.
Step 9: Gate slacks are calculated again.
Step 10: The gates are redivided into groups 1, 2 and 3.
Step 11: Steps 5 to 10 are repeated on the remaining gates in reverse levelized order until the primary inputs are reached.
Thus, this algorithm assigns low voltage to gates while restricting the circuit structure so that no low voltage gate drives a high voltage gate.
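The steps above can be sketched compactly in Python. This is a hypothetical illustration, not the authors' Perl implementation. Gate delays are single numbers per voltage (`d_h`, `d_l`), and the circuit is given as a hypothetical `fanin` map. The Theorem III condition is taken as Σ y_i ≤ min slack_hi over each chosen subset, candidates at a level are tried greedily in topological order, and the slack recalculations of Steps 6, 9 and 10 are folded into the start of each level. Slacks are always measured against the fixed all-V_H critical delay, so a violation cannot be hidden by a longer new critical path.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def path_through(delay: Dict[str, float], fanin: Dict[str, List[str]],
                 fanout: Dict[str, List[str]], order: List[str]) -> Dict[str, float]:
    """Delay of the longest path through each gate (one forward and one backward pass)."""
    arrival: Dict[str, float] = {}
    tail: Dict[str, float] = {}
    for g in order:
        arrival[g] = delay[g] + max((arrival[p] for p in fanin[g]), default=0.0)
    for g in reversed(order):
        tail[g] = max((delay[s] + tail[s] for s in fanout[g]), default=0.0)
    return {g: arrival[g] + tail[g] for g in order}


def assign_dual_voltage(d_h: Dict[str, float], d_l: Dict[str, float],
                        fanin: Dict[str, List[str]]) -> Set[str]:
    """Hypothetical sketch of Algorithm 2; returns the set of gates placed at V_L."""
    fanin = {g: list(fanin.get(g, [])) for g in d_h}
    order = list(TopologicalSorter(fanin).static_order())
    fanout: Dict[str, List[str]] = {g: [] for g in d_h}
    for g in order:
        for p in fanin[g]:
            fanout[p].append(g)

    low: Set[str] = set()  # Step 1: every gate starts at V_H
    t_c = max(path_through(d_h, fanin, fanout, order).values())  # fixed timing bound
    beta_max = max(d_l[g] / d_h[g] for g in order)
    s_u = (1.0 - 1.0 / beta_max) * t_c

    def slacks() -> Dict[str, float]:
        delay = {g: d_l[g] if g in low else d_h[g] for g in order}
        through = path_through(delay, fanin, fanout, order)
        return {g: t_c - through[g] for g in order}

    # Steps 2-3: group 2 (slack >= S_u) goes to V_L unconditionally.
    low |= {g for g, s in slacks().items() if s >= s_u}

    # Reverse levelization: level 0 holds gates with no fanout (primary outputs).
    level: Dict[str, int] = {}
    for g in reversed(order):
        level[g] = 1 + max((level[s] for s in fanout[g]), default=-1)

    for lv in range(max(level.values()) + 1):
        s = slacks()  # current slacks define groups 1, 2, 3
        # Step 5: group-3 gates at this level (V_H, below the 45-degree line, slack < S_u).
        cand = [g for g in order if level[g] == lv and g not in low
                and d_l[g] - d_h[g] <= s[g] < s_u]
        total, smallest = 0.0, float("inf")
        for g in cand:  # greedy subset with sum(y_i) <= min(slack_hi)
            y = d_l[g] - d_h[g]
            if total + y <= min(smallest, s[g]):
                low.add(g)
                total += y
                smallest = min(smallest, s[g])
        # Steps 7-8: move any V_L gate that drives a V_H gate back to V_H, until none remain.
        changed = True
        while changed:
            changed = False
            for g in list(low):
                if any(f not in low for f in fanout[g]):
                    low.discard(g)
                    changed = True
    return low
```

Reverting a gate to V_H only shortens paths, so the reversal loop can never create a timing violation. It can, however, expose a new low-to-high boundary at that gate's drivers, which is why it repeats until nothing changes.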
IV. RESULTS
We ran our algorithms on the ISCAS'85 benchmark circuits. These benchmark circuits are synthesized using a small set of 90 nm standard cells, consisting of an inverter (INV), a two-input NAND gate (NAND2), a three-input NAND gate (NAND3), and a two-input NOR gate (NOR2), using bulk-CMOS models [1]. Each circuit is simulated with a logic simulator using randomly generated input vectors to determine its average activity. The gate delays, average activity of each node, and node capacitances of each circuit are obtained from SPICE simulations for supply voltages ranging from 0.4 V to 1.2 V in 0.01 V steps. These values are also tabulated for each standard cell with fanouts of one to four [3]. For all SPICE simulations, 90 nm bulk CMOS predictive technology models with a 0.3 V threshold voltage are used at room temperature, and the higher supply voltage is V_H = 1.2 V. One hundred random input vectors are used for the simulations, and the energy per vector is computed.
Algorithms 1 and 2 are applied to each ISCAS'85 benchmark circuit to obtain the lower supply voltage V_L, the number of gates that can be assigned the low voltage, the final energy savings, and the CPU time needed to run the algorithms. E_save for the ISCAS'85 benchmark circuits, using the V_L calculated by Algorithm 1, is reported in Table I. We also compare these results with those obtained when V_L = 0.7 × V_H and when V_L = 0.5 × V_H, as in previous work. The expected energy savings are larger when V_L is given by Algorithm 1; for circuit c2670, the savings are as much as 20% greater than those obtained with V_L = 0.7 × V_H.
In Table II, E_singleVDD and E_dualVDD are the SPICE results for the energy per vector consumed by the single-voltage and dual-voltage designs, respectively. E_savg.-expc. is the energy savings estimated using the lower supply voltage given by Algorithm 1, and E_savg.-obs. is the actual energy savings reported by SPICE for the dual-voltage design; the observed values are very close to the estimated ones. We also compare these results with those reported in [11]. For circuit c2670, the observed savings using our algorithms are as much as 14% higher than those reported in [11], with 20 times less CPU time. In general, our algorithms give higher energy savings and run in less CPU time. Figures 1 and 4 show the delay increment versus slack plots for the initial and final slacks, respectively, of the c880 circuit. Brown markers indicate gates at high voltage and blue markers indicate gates at low voltage. Initially, all gates are at high voltage. After Algorithms 1 and 2 are applied and the slacks are recalculated, all gate slacks are reduced. High voltage gates tend to concentrate at lower slack values, and many have moved above the 45° line.
The low voltage gates that are still below the 45° line are gates with very large slacks, and the high voltage gates still lying below the line are gates that cannot be put at low voltage because of the topological constraints imposed by Algorithm 2. A few brown dots also appear to the right of the S_u line. These are again gates that cannot be put at low voltage because of topological constraints, even though their slacks are greater than S_u, as discussed in the previous section.
V. CONCLUSION
An algorithm that finds a lower supply voltage maximizing the energy savings from a specific group of gates in a given circuit is proposed. Its running time is proportional to the number of gates in the circuit, since each gate slack is compared with the difference between the low voltage delay and the high voltage delay of that gate. Thus, its complexity is linear, i.e., O(n), where n is the number of gates. A second algorithm assigns this lower supply voltage to gates based on their slacks, recalculating the gate slacks after each iteration of voltage assignment. Both the voltage assignment and the slack calculation are linear in time, and the number of iterations grows at most linearly with n, so the overall complexity of the method is quadratic, i.e., O(n^2). In practice, however, it is found to take less time than other algorithms proposed in the literature [11]. Energy savings of up to 60% are seen for ISCAS'85 benchmark circuits; such high savings have not been reported in earlier work. These results are supported by both the theorems of Section II and the SPICE simulations of Section IV.
