We present an algorithm called MOVER (Multiple Operating Voltage Energy Reduction) to minimize datapath energy dissipation through use of multiple supply voltages. In a single voltage design, the critical path length, clock period, and number of control steps limit minimization of voltage and power. Multiple supply voltages permit localized voltage reductions to take up remaining schedule slack. MOVER initially finds one minimum voltage for an entire datapath. It then determines a second voltage for operations where there is still schedule slack. New voltages can be introduced and minimized until no schedule slack remains. MOVER was exercised for a variety of DSP datapath examples. Energy savings ranged from 0% to 50% when comparing dual to single voltage results. The benefit of going from two to three voltages never exceeded 15%. Power supply costs are not reflected in these savings, but a simple analysis shows that energy savings can be achieved even with relatively inefficient DC-DC converters. Datapath resource requirements were found to vary greatly with respect to number of supplies. Area penalties ranged from 0% to 170%. Implications of multiple voltage design for IC layout and power supply requirements are discussed.
INTRODUCTION
A great deal of current research is motivated by the need for decreased power dissipation while satisfying requirements for increased computing capacity. In portable systems, battery life is a primary constraint on power. However, even in nonportable systems such as scientific workstations, power is still a serious constraint due to limits on heat dissipation.
One design technique that promises substantial power reduction is "voltage scaling," which refers to the tradeoff of supply voltage against circuit area and other CMOS device parameters to achieve reduced power dissipation while maintaining circuit performance. The dominant source of power dissipation in a conventional CMOS circuit is the charging and discharging of circuit capacitances during switching. For static CMOS, the switching power is proportional to $V_{dd}^2$ [Rabaey 1996]. This relationship provides a strong incentive to lower supply voltage, especially since changes to any other design parameter can only achieve linear savings with respect to the parameter change. The penalty of voltage reduction is a loss of circuit performance. The propagation delay of CMOS is approximately proportional to $V_{dd} / (V_{dd} - V_T)^2$ [Rabaey 1996], where $V_T$ is the transistor threshold voltage.
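As a rough illustration of these two proportionalities, consider lowering the supply from 5 V to 3.3 V. The threshold voltage of 0.8 V used below is an assumed value chosen only for the arithmetic; it is not a parameter taken from this paper.

```latex
\begin{align*}
\frac{E(3.3)}{E(5)} &= \left( \frac{3.3}{5} \right)^2 \approx 0.44, \\
\frac{t_p(3.3)}{t_p(5)} &= \frac{3.3 / (3.3 - 0.8)^2}{5 / (5 - 0.8)^2}
  \approx \frac{0.528}{0.283} \approx 1.86 .
\end{align*}
```

Under these assumptions, switching energy drops by a little more than half while propagation delay nearly doubles, which is the tradeoff that schedule slack must absorb.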
A variety of techniques are applied to compensate for the loss of performance with respect to V dd , including reduction of threshold voltages, increasing transistor widths, optimizing the device technology for a lower supply voltage, and shortening critical paths in the datapath by means of parallel architectures and pipelining.
Datapath designs can benefit from voltage scaling even without changes in device technologies. Algorithm transformations and scheduling techniques can be used to increase the latency available for some or all datapath operations. The increased latency allows an operation to execute at a lower supply voltage without violating schedule constraints. This approach is called "architecture-driven voltage scaling."
A number of researchers have developed systems or proposed methods that incorporate architecture-driven voltage scaling [Chandrakasan et al. 1995; Raghunathan and Jha 1994; Raghunathan and Jha 1995; Goodby et al. 1994; Kumar et al. 1995; SanMartin and Knight 1995; Raje and Sarrafzadeh 1995; Gebotys 1995]. HYPER-LP [Chandrakasan et al. 1995] is a system that applies transformations to the dataflow graph of an algorithm to optimize it for low power. Other systems accept the algorithm as given and apply a variety of techniques during scheduling, module selection, resource binding, etc., to minimize power dissipation. All of the systems mentioned above try to exploit parallelism in the algorithm to shorten critical paths so that reduced supply voltages can be used. Most systems [Chandrakasan et al. 1995; Raghunathan and Jha 1994; Raghunathan and Jha 1995; Goodby et al. 1994; Kumar et al. 1995; Gebotys 1995] also minimize switched capacitance in the datapath.
Most voltage-scaling approaches require that the IC operate at a single supply voltage. Although substantial energy savings can be realized with a single minimum supply voltage, one cannot always take full advantage of available schedule slack to reduce the voltage. Nonuniform path lengths, a fixed clock period, and a fixed number of control steps can all result in schedule slack that is not fully exploited. Figure 1 provides examples of each type of bottleneck. When there are nonuniform path lengths, the critical (longest) path determines the minimum supply voltage, even though the shorter path could execute at a still lower voltage and meet timing constraints. When the clock period is a bottleneck, some operations only use part of a clock period. The slack within these clock periods goes to waste. Additional voltages would permit such operations to use the entire clock period. Finally, a fixed number of control steps (resulting from a fixed clock period and latency constraint) may lead to unused clock cycles if the sequence of operations does not match the number of available clock cycles.
Literature on multiple voltage synthesis is limited, but this is changing. Publications that address the topic include Raje and Sarrafzadeh [1995]; Gebotys [1995]; and Johnson and Roy [1996]. Raje and Sarrafzadeh [1995] schedule the datapath and assign voltages to datapath operators so as to minimize power given a predetermined set of supply voltages. Logic level conversions are not explicitly modeled in their formulation. Gebotys [1995] used an integer programming approach to scheduling and partitioning a VLSI system across multiple chips operating at different supply voltages. Johnson and Roy [1996] used an integer program to choose voltages from a list of candidates, schedule datapath operations, model logic level conversions, and assign voltages to each operation. Chang and Pedram [1996] address nearly the same problem, applying a dynamic programming approach to optimize nonpipelined datapaths and a modified list scheduler to handle functionally pipelined datapaths.
DATAPATH SPECIFICATIONS
A datapath is specified in the form of a dataflow graph (DFG) where each vertex represents an operation and each arc represents a dataflow or latency constraint. This DFG representation is similar to the "sequencing graph" representation described by DeMicheli [1994], except that hierarchical and conditional graph entities are not supported.
The DFG is a directed acyclic graph, $G(V, E)$, with vertex set $V$ and edge set $E$. Each vertex corresponds one-to-one with an operator in the datapath. Each edge corresponds one-to-one with a dependency between two operators: a dataflow, a latency constraint, or both. Associated with each vertex is an attribute that specifies the operator type such as adder, multiplier, or null operation (NO-OP). Associated with each edge is an attribute that indicates a latency constraint between the start times of the source and destination operations. A positive value indicates a minimum delay between operation start times. The magnitude of a negative value specifies a maximum allowable delay from the destination to the source. Figure 2 provides a simple example of a datapath specification and defines elements of the DFG notation.
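The DFG conventions described here can be sketched in a few lines of Python. This is an illustrative data structure only, not MOVER's internal representation; the class name `DFG`, its methods, and the example operations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DFG:
    """Dataflow graph: vertices carry an operator type, arcs carry a latency."""
    op_type: dict = field(default_factory=dict)  # vertex name -> "add", "mul", "noop"
    arcs: list = field(default_factory=list)     # (source, destination, latency)

    def add_op(self, name, kind):
        self.op_type[name] = kind

    def add_arc(self, src, dst, latency):
        self.arcs.append((src, dst, latency))

    def violations(self, start):
        """Return the arcs whose latency constraint is broken by `start` (name -> cycle)."""
        bad = []
        for src, dst, lat in self.arcs:
            if lat >= 0:
                # Minimum delay: dst starts at least `lat` cycles after src.
                ok = start[dst] - start[src] >= lat
            else:
                # Maximum delay: src starts at most |lat| cycles after dst.
                ok = start[src] - start[dst] <= -lat
            if not ok:
                bad.append((src, dst, lat))
        return bad


g = DFG()
for name, kind in [("in", "noop"), ("m1", "mul"), ("a1", "add"), ("out", "noop")]:
    g.add_op(name, kind)
g.add_arc("in", "m1", 0)    # m1 may start together with the input
g.add_arc("m1", "a1", 2)    # a1 starts at least 2 cycles after m1
g.add_arc("a1", "out", 1)   # out follows a1 by at least 1 cycle
g.add_arc("out", "in", -4)  # backward arc: out at most 4 cycles after in

on_time = {"in": 0, "m1": 0, "a1": 2, "out": 3}
too_late = {"in": 0, "m1": 0, "a1": 4, "out": 5}
print(g.violations(on_time))   # no arcs violated
print(g.violations(too_late))  # only the backward (maximum latency) arc is violated
```

Note that both branches of `violations` reduce to the single test `start[dst] - start[src] >= lat`; they are written separately here only to mirror the two readings of the latency attribute given in the text.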
Two types of NO-OPs are used, which we refer to as "transitive" and "nontransitive" NO-OPs. The term "transitive" is used to indicate that a NO-OP propagates signals without any delay or cost. Neither type of NO-OP introduces delay or power dissipation. Both serve as vertices in the DFG to which latency constraints can be attached. The transitive NO-OP is treated as if signals and their logic levels are propagated through the NO-OP.
MOVER SCHEDULING ALGORITHM
MOVER will generate a schedule, select a user-specified number of supply voltage levels, and assign voltages to each operation. MOVER uses an ILP method to evaluate the feasibility of candidate supply voltage selections, to partition operations among different power supplies, and to produce a minimum area schedule under latency constraints once voltages have been selected. The algorithm proceeds in several phases. First, MOVER determines maximum and minimum bounds on the time window in which each operation must execute. It then searches for a minimum single supply voltage. Next, MOVER partitions datapath operations into two groups: those that will be assigned to a higher supply voltage and those that will be assigned to a lower supply voltage. The high voltage group is initially fixed to a voltage somewhat above the minimum single voltage. MOVER then searches for a minimum voltage for the lower group. The voltage of the lower group is fixed. A new minimum voltage for the upper group is sought. 
To find a three-supply schedule, MOVER partitions the lower voltage group and searches for new minimum voltages for the bottom, middle, and upper groups.
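The first phase, bounding the time window of each operation, is not spelled out in the text. A conventional way to obtain such bounds is an as-soon-as-possible / as-late-as-possible pass over the DFG, sketched below. The function name and interface are hypothetical, and the sketch makes simplifying assumptions: it handles only minimum-latency arcs, it expects the operations in topological order, and it lets any operation without successors start as late as the final control step.

```python
from collections import defaultdict


def time_windows(ops, arcs, num_steps):
    """Return {op: (earliest, latest)} start cycles for a DAG of minimum-latency arcs.

    ops       -- operation names in topological order (an assumption of this sketch)
    arcs      -- (src, dst, lat) tuples with lat >= 0
    num_steps -- number of control steps; start cycles range over 0..num_steps-1
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst, lat in arcs:
        preds[dst].append((src, lat))
        succs[src].append((dst, lat))

    earliest = {}
    for op in ops:  # forward pass: as soon as possible
        earliest[op] = max((earliest[p] + lat for p, lat in preds[op]), default=0)

    latest = {}
    for op in reversed(ops):  # backward pass: as late as possible
        latest[op] = min((latest[s] - lat for s, lat in succs[op]),
                         default=num_steps - 1)

    for op in ops:
        if earliest[op] > latest[op]:
            raise ValueError(f"no feasible start time for {op}")
    return {op: (earliest[op], latest[op]) for op in ops}


# Hypothetical example: a multiplier and an adder both feed a second adder.
ops = ["m1", "a0", "a1"]
arcs = [("m1", "a1", 2), ("a0", "a1", 1)]
windows = time_windows(ops, arcs, num_steps=4)
print(windows)
```

In this example the shorter path through `a0` ends up with a wider window than the path through `m1`, which is the kind of slack that a second supply voltage is meant to exploit.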
ILP Formulation
At the core of MOVER is an integer linear program (ILP) that is used repeatedly to evaluate possible supply voltages, partition operations between different power supplies, and produce a schedule that minimizes resource usage. In each case, MOVER analyzes the DFG and generates a collection of linear inequalities that represent precedence constraints, timing constraints, and resource constraints for the datapath to be scheduled. A weighted sum of the energy dissipation for each operation is used as the optimization objective when partitioning operations or evaluating the feasibility of a supply voltage. A weighted sum of resource usage serves as the optimization objective when minimizing resources. The inequalities and the objective function are packed into a matrix of coefficients that is fed into an ILP solver (CPLEX). MOVER interprets the results from CPLEX and annotates the DFG to indicate schedule times and voltage assignments.
The architectural model assumed by MOVER is depicted in Figure 3. All operator outputs have registers. Each operator output feeds only one register. That register operates at the same voltage as the operator supplying its input. All level conversions, when needed, are performed at operator inputs.
MOVER's ILP formulation works on a DFG where voltage assignments for some operations may already be fixed. For operations not already fixed to a voltage, the formulation chooses between two closely spaced voltages so as to minimize energy. The voltages are chosen to be close enough together that level conversions from one to the other can be ignored. Consequently, level conversions only need to be accounted for between operations fixed to different voltages and on interfaces between fixed and unfixed operations.
ILP Decision Variables
Three categories of decision variables are used in the MOVER ILP formulation. One set of variables of the form $x_{i,l,s}$ indicates the start time and supply voltage assignment for each operator that has not already been fixed to a particular supply voltage. $x_{i,l,s} = 1$ indicates that operation $i$ begins execution on clock cycle $l$ using supply voltage $s$. Under any other condition, $x_{i,l,s}$ will equal zero. The supply voltage selection is limited to two values, where $s = 1$ selects the lower and $s = 2$ selects the higher candidate voltage. Another set of variables, $x_{i,l}$, indicates the start time of operations for which the supply voltage has been fixed. $x_{i,l} = 1$ indicates that operation $i$ starts at clock cycle $l$. Under any other condition, $x_{i,l}$ will equal zero. The last group of variables, $a_{m,s}$, indicates the allocation of operator resources to each possible supply voltage. $a_{m,s}$ will be greater than or equal to the number of resources of type $m$ that are allocated to supply voltage $s$. In this case, $s$ can be an integer in the range from 1 to (number of fixed supplies + 2). $s \in \{1, 2\}$ corresponds to the new candidate supply voltages. $s > 2$ corresponds to supply voltages that have already been fixed.
Objective Functions
The objective function (Eq. (1)) estimates the energy required for one execution of the datapath as a function of the voltage assigned to each operation. Consider the energy expression split into two parts. The first nested summation counts the total energy contribution associated with operations not already fixed to a supply voltage. The second nested summation counts the total energy contribution of operations that are already fixed to a particular supply voltage.
For each operation $j$ that has not been fixed to a supply voltage (i.e., $j \in V_{free}$), the first nested summation accumulates the energy of operation $j$ ($\mathit{onrg}(j, s_j, c_{reg})$), the register at the output of operation $j$ ($\mathit{rnrg}(s_j, \mathit{fanout}_j)$), and any level conversions required at the input to $j$ ($\mathit{cnrg}_{free}(j, s)$). $s_j$ is the index of the supply voltage assigned to operation $j$. $\mathit{fanout}_j$ is the fanout capacitive load on operation $j$. $c_{reg}$ is the input capacitance of a register to which the operation output is connected. The decision variables $x_{j,l,s}$ are used to select which lookup table values for operator, register, and level conversion energy are added into the total energy. We must sum over both candidate supply voltages $s_j$ and all clock cycles $l$ in the possible execution time window $R_j$ of operation $j$. $E_{conv}$ is the set of DFG arcs that may require a level conversion, depending on voltage assignments. $V_{oper}$ is the set of DFG vertices that are not NO-OPs. $V_{fix}$ is the set of DFG vertices (operations) that have been fixed to a particular voltage. $V_{free}$ is the set of vertices that have not previously been fixed to a voltage.
For each operation $j$ that has been fixed to a supply voltage, we again accumulate the energy of each operation, register, and level conversion. The only difference from the expression for free operations is that now all voltages in the expression are constants determined prior to solving the ILP formulation. Consequently, the index $s_j$ can be removed from the summation and the decision variable $x$.
$\mathit{cnrg}_{free}(j, s)$ and $\mathit{cnrg}_{fix}(j)$ represent the level conversion energy at the input of free and fixed operations, respectively. $c_{in_i}$ is the input capacitance of operation $i$.
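Equation (1) itself is not reproduced in this text. Read literally, the description above suggests an objective of roughly the following shape. This is a reconstruction from the prose only: the exact index sets (for instance, whether operator energy is summed over $V_{oper}$ alone) and the argument lists of the fixed-voltage energy terms are assumptions.

```latex
\begin{align*}
\min \quad
  & \sum_{j \in V_{free}} \; \sum_{l \in R_j} \; \sum_{s \in \{1,2\}}
      x_{j,l,s} \Bigl[ \mathit{onrg}(j, s, c_{reg})
        + \mathit{rnrg}(s, \mathit{fanout}_j)
        + \mathit{cnrg}_{free}(j, s) \Bigr] \\
  + & \sum_{j \in V_{fix}} \; \sum_{l \in R_j}
      x_{j,l} \Bigl[ \mathit{onrg}(j, c_{reg})
        + \mathit{rnrg}(\mathit{fanout}_j)
        + \mathit{cnrg}_{fix}(j) \Bigr]
\end{align*}
```

In this form, the bracketed energy of a free operation is counted at whichever cycle and candidate voltage the solver selects, while the terms for fixed operations depend only on voltages that are already known.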
Equation (4) 
ILP Constraint Inequalities
Equation (5) guarantees that only one start time l is assigned to each operation i for which the supply voltage is already fixed. Equation (6) guarantees that only one start time l and supply voltage s can be assigned to each operation i that does not have a supply voltage assignment.
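Equations (5) and (6) are not reproduced in this text. Assignment constraints of the kind described are conventionally written as follows, using the decision variables and the execution time window $R_i$ defined earlier; the precise form used by MOVER may differ.

```latex
\begin{align*}
\sum_{l \in R_i} x_{i,l} &= 1
  && \forall\, i \in V_{fix}, \\
\sum_{l \in R_i} \; \sum_{s \in \{1,2\}} x_{i,l,s} &= 1
  && \forall\, i \in V_{free}.
\end{align*}
```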
Datapath Scheduling with Multiple Supply Voltages & Level Converters
Equation (7) guarantees that the voltage of a transitive NO-OP $j$ matches the voltage of all operations supplying an input to the transitive NO-OP. $V_{trnoop}$ is the set of vertices in the DFG corresponding to transitive NO-OPs. $E$ is the set of all arcs in the DFG.
Equation (8) enforces precedence constraints specified in the DFG. This constraint is an adaptation of the structured precedence constraint shown by Gebotys [1992] to produce facets of the scheduling polytope. Each arc $(i, j)$ with a latency $\mathit{lat}_{i,j} \ge 0$ specifies a minimum latency from the start of operation $i$ to the start of operation $j$. Equation (8) defines the set of precedence constraint inequalities corresponding to DFG arcs where the source and destination operations are both free (not fixed to a voltage). Simplified versions of this constraint are used when source or destination operations are fixed to a voltage.
Equation (9) enforces maximum latency constraints specified in the DFG. Each arc $(i, j)$ with a latency $\mathit{lat}_{i,j} < 0$ specifies a maximum delay from operation $j$ to operation $i$. Equation (9) defines the set of maximum latency constraint inequalities corresponding to arcs where the source and destination operations are both free (not fixed to a voltage). Simplified versions of this constraint are used when source or destination operations are fixed to a voltage. The remaining equations are simplifications of Equation (9).
Equations (10) and (11) 
Voltage Search
MOVER searches a continuous range of voltages when seeking minimum voltages for one-, two-, or three-supply designs. The user must specify a convergence threshold $V_{conv}$ that is used to determine when a voltage selection is acceptably close to minimum. Let $V_{hi}$ and $V_{lo}$ represent the current upper and lower bounds on the supply voltage. When searching for a minimum single supply voltage, all operations are initially considered to be free (not fixed to a voltage). When searching for a minimum set of two or three supply voltages, MOVER considers one power supply at a time. The voltage will be fixed for any operations not allocated to the supply voltage under consideration. Table I outlines the voltage search algorithm.
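Table I is not reproduced in this text. The use of an upper bound, a lower bound, and a convergence threshold suggests an interval-narrowing search, and the sketch below uses plain bisection as one plausible instance; it should not be read as the authors' exact procedure. The `is_feasible` callback is a hypothetical stand-in for building and solving the ILP at a given voltage, and the sketch assumes that feasibility is monotonic in voltage.

```python
def min_feasible_voltage(is_feasible, v_lo, v_hi, v_conv):
    """Bisect [v_lo, v_hi] for the lowest voltage accepted by `is_feasible`.

    is_feasible -- callable(voltage) -> bool (hypothetical ILP feasibility check)
    v_lo, v_hi  -- initial lower and upper bounds; v_hi must be feasible
    v_conv      -- convergence threshold on the width of the interval
    """
    if not is_feasible(v_hi):
        raise ValueError("upper bound is not feasible")
    while v_hi - v_lo > v_conv:
        v_mid = (v_lo + v_hi) / 2.0
        if is_feasible(v_mid):
            v_hi = v_mid  # schedule still fits: tighten the upper bound
        else:
            v_lo = v_mid  # schedule fails: raise the lower bound
    return v_hi           # feasible, and within v_conv of the lower bound


# Toy stand-in for the ILP: pretend schedules are feasible at 2.37 V and above.
v_min = min_feasible_voltage(lambda v: v >= 2.37, v_lo=1.5, v_hi=5.0, v_conv=0.05)
print(v_min)
```

Returning the upper bound rather than the midpoint guarantees that the reported voltage has actually passed the feasibility check.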
Partitioning
Partitioning is the process by which MOVER takes all free operations in the DFG and allocates each to one of two possible power supplies. Partitioning is not performed until a single minimum supply voltage is known for 
Set up the ILP constraint inequalities. Obtain a minimum energy schedule. Operations will only be assigned to $V_a$ if there is schedule slack available. There may be several ways that the operations can be partitioned. In such a case, the optimal ILP solution will maximize the energy dissipation of the lower voltage group (i.e., put the most energy-hungry operations in the lower voltage group). This will tend to maximize the benefit from reducing the voltage of the lower group.
Given a successful partition, operations assigned to $V_a$ will be put into the lower supply voltage group and operations assigned to $V_b$ will be put into the higher supply voltage group.
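The effect of the partitioning step can be illustrated on a toy instance. The sketch below replaces the ILP solver with exhaustive enumeration, which is only practical for a handful of operations, and it ignores resource constraints, register energy, and level conversions. All names, delays, and energies are made up for the example; candidate 1 plays the role of the lower voltage and candidate 2 the higher.

```python
from itertools import product


def partition(ops, arcs, delay, energy, max_cycles):
    """Exhaustively pick a candidate voltage (1 = lower, 2 = higher) per operation.

    ops        -- operation names in topological order
    arcs       -- (src, dst) dataflow pairs
    delay      -- delay[op][s]: execution time in clock cycles at candidate s
    energy     -- energy[op][s]: energy per execution at candidate s
    max_cycles -- latency bound on the whole schedule
    Returns (total_energy, {op: s}) for the cheapest feasible choice, or None.
    """
    best = None
    for choice in product((1, 2), repeat=len(ops)):
        volt = dict(zip(ops, choice))
        finish = {}
        for op in ops:  # as-soon-as-possible schedule for this voltage choice
            start = max((finish[src] for src, dst in arcs if dst == op), default=0)
            finish[op] = start + delay[op][volt[op]]
        if max(finish.values()) > max_cycles:
            continue  # violates the latency constraint
        total = sum(energy[op][volt[op]] for op in ops)
        if best is None or total < best[0]:
            best = (total, volt)
    return best


ops = ["m1", "a1", "a2"]
arcs = [("m1", "a2"), ("a1", "a2")]
delay = {"m1": {1: 3, 2: 2}, "a1": {1: 2, 2: 1}, "a2": {1: 2, 2: 1}}
energy = {"m1": {1: 90, 2: 100}, "a1": {1: 9, 2: 10}, "a2": {1: 9, 2: 10}}

total, volt = partition(ops, arcs, delay, energy, max_cycles=4)
lower_group = [op for op in ops if volt[op] == 1]
higher_group = [op for op in ops if volt[op] == 2]
print(total, lower_group, higher_group)
```

With these numbers there is one cycle of slack on the path through the multiplier. The cheapest assignment spends it on the multiplier, the most energy-hungry operation, rather than on the final adder, which matches the behavior described for the optimal ILP solution.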
The partition will fail if all operations are allocated to the lower supply voltage, all operations are allocated to the higher supply voltage, or the ILP solver exceeds some resource limit. The first situation indicates that the minimum single voltage could be a bit lower. In this event, MOVER lowers 
CHARACTERIZATION OF DATAPATH RESOURCES
The results presented in this paper make use of four types of circuit resources: an adder, multiplier, register, and level converter. MOVER requires models of the energy and delay of each type of resource as a function of supply voltage, load capacitance, and average switching activity. Each type of resource was simulated in HSPICE using 0.8-micron MOSIS library models with the level-3 MOS model. Energy dissipation, worst-case delay, and input capacitances were measured from the simulation. All resources were 16 bits wide. Load capacitance on each output was 0.1 pF. Input vectors were generated to provide 50% switching activities.
Datapath Operators and Registers
During optimization, operation energies and delays are scaled as a function of the voltage assignment being evaluated. Energy dissipation ($E$) for each operator and register scales with respect to supply voltage $V_{dd}$ as

$$E = E_0 \left( \frac{V_{dd}}{V_0} \right)^2$$

where $E_0$ is the energy dissipation of the operator or register measured at the nominal supply voltage $V_0$. Delay ($t_p$) for each operator and register scales with respect to supply voltage as

$$t_p = t_{p0} \, \frac{V_{dd}}{V_0} \left( \frac{V_0 - V_T}{V_{dd} - V_T} \right)^2$$

where $t_{p0}$ is the propagation delay measured at the nominal supply voltage $V_0$. The energy and delay scaling factors were derived directly from the CMOS energy and delay equations described by Rabaey [1996]. Table II gives the model parameters used by MOVER for each type of resource. Note that the register delay given here is just the propagation time relative to a clock edge. Register setup time is treated as part of the datapath operator delays.
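The scaling rules follow from the proportionalities cited in the Introduction (energy proportional to $V_{dd}^2$, delay proportional to $V_{dd} / (V_{dd} - V_T)^2$), normalized to the values measured at the nominal voltage. A minimal sketch is given below; the function names are hypothetical, and the numbers in the example, including the 0.8 V threshold, are assumed for illustration rather than taken from Table II.

```python
def scale_energy(e0, v, v0):
    """Energy at supply v, given energy e0 measured at nominal supply v0."""
    return e0 * (v / v0) ** 2


def scale_delay(tp0, v, v0, vt):
    """Propagation delay at supply v, given delay tp0 measured at nominal supply v0."""
    if v <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return tp0 * (v / v0) * ((v0 - vt) / (v - vt)) ** 2


# Assumed example: 100 pJ and 10 ns at a 5 V nominal supply, rescaled to 3.3 V.
e_33 = scale_energy(100.0, v=3.3, v0=5.0)
t_33 = scale_delay(10.0, v=3.3, v0=5.0, vt=0.8)
print(round(e_33, 2), round(t_33, 2))
```

Both functions return the measured value unchanged when `v` equals `v0`, which is a quick sanity check on the normalization.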
Level Conversion
Whenever one resource has to drive an input of another resource operating at a higher voltage, a level conversion is needed at the interface. Four alternatives were considered to accomplish this: omit the level converter, use a chain of inverters at successively higher voltages, use an active or passive pullup, or use a differential cascode voltage switch (DCVS) circuit as a level converter [Chandrakasan et al. 1994; Usami and Horowitz 1995]. We omit the level converter for step-down conversions and use the DCVS circuit for step-up conversions. Given appropriate transistor sizes, this circuit exhibits no static current paths and can operate over a full 1.5 V to 5.0 V range of input and output supply voltages. A model was needed that could accurately indicate the power dissipation and propagation delay of the DCVS level converter as a function of the input logic supply voltage $V_1$, output logic supply voltage $V_2$, and load capacitance. The circuit was studied both analytically and from HSPICE simulation results to determine a suitable form for the model equations. Coefficients of the equations were then calibrated so that the model equations would produce families of curves closely matching simulation results for $V_1$ ranging from 1.5 V to 5 V and $V_1 + V_T \le V_2 \le 5$ V. These are the ranges of supply voltages for which a level converter is needed. Typical energy dissipation of the level converter was found to be on the order of 5 to 15 pJ per switching event per bit, given a 0.1 pF load. Typical propagation delays were approximately 1 ns for level conversions such as 3.3 V to 5 V or 2.4 V to 3.3 V. Propagation delays become large as the input voltage of the level converter falls towards $2V_T$. A 2.5 V to 5 V conversion had a delay of about 2.5 ns. A 2 V to 5 V conversion had a delay of nearly 5 ns.
RESULTS

Datapath Examples
ILP schedule optimization results are presented for six example datapaths: a four-point FFT (FFT4), the 5th-order elliptic wave filter benchmark (ELLIP) [Rao 1992], a 6th-order Auto-Regressive Lattice filter (LATTICE), a frequency sampled filter (FSAMP) with three 2nd-order stages and one 1st-order stage, a direct form 9-tap linear phase FIR filter (LFIR9), and a 5th-order state-space realization of an IIR filter (SSIIR). In the FFT datapath, complex signal paths are split into real and imaginary dataflows. For all other datapaths, the signals are modeled as noncomplex integer values. All dataflows were taken to be 16 bits wide. Switching activities at all nodes were assumed to be 50%, i.e., the probability of a transition on any selected 1-bit signal is 50% in any one sample interval.
Each example was modeled for one sample period with dataflow and latency constraints specified for any feedback signals. Any loops that start and finish within the same sample period were completely unrolled. Any loops spanning multiple sample periods were broken. A dataflow passing from one sample period to the next was represented by input and output nodes in the DFG connected by a backward arc to specify a maximum latency constraint from the input to the output. A 20 ns clock was specified for all examples. Latency constraints were specified so that the data introduction interval equals the maximum delay from the input to the output of the datapath.

iteration of the datapath. "Max Lat/Clks" specifies the maximum latency (equal to the data sample rate) and the maximum number of control steps (Clks), both given in terms of the number of clock cycles. "Max +/-" specifies the maximum numbers of adder and multiplier circuits permitted in the design. Values of "-/-" indicate that unlimited resources were permitted. The columns headed by "Voltages 1 2 3" indicate the supply voltages selected by MOVER. A "-" is used to fill voltage columns "2" or "3" in those cases where a one or two supply voltage result is presented. The string "NR" in voltage columns "1" and "2" indicates that a solution with two supply voltages could not be obtained. "NR" in all three columns indicates that a solution with three supply voltages could not be obtained. The "Exec" column reports the minutes of execution time (real, not CPU) required to obtain the result. The number in parentheses identifies the type of machine used to obtain the result. "(1)" indicates a SPARCserver 1000 with 4 processors and 320 MB of RAM. "(2)" indicates a Sparc 5 with 64 MB of RAM.
MOVER Results
The bar graph down the center represents the normalized energy consumption of each test case. Each energy result is divided by the single supply voltage, unlimited resource, and minimum latency result to obtain a normalized value. Single supply voltage results are shown with black bars. All other results are shown in gray. This style of presentation is intended to visually emphasize the effect of different latency, resource, and supply voltage constraints on the energy estimate. The rightmost column presents the absolute energy estimate in units of $10^{-12}$ joules (pJ).

Figure 6 presents area penalty results. All but two columns have the same meaning as the corresponding columns in Figure 5. The only exceptions are the bar graph and the "area" column on the right. The "area" value is a weighted sum of the minimum circuit resources required to implement the datapath schedule. The resources (all 16 bits wide) were weighted as follows: adder = 1, multiplier = 16, register = 0.75, and level converter = 0.15. These weights are proportional to the transistor count of each resource. Each area value was divided by the area estimate for the corresponding single voltage result. Each single voltage result is shown as a black bar. Two and three voltage results are shown in gray.
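The area estimate and its normalization amount to a short calculation. The sketch below uses the weights quoted in the text; the function names and the resource counts in the example are hypothetical and do not correspond to any of the reported test cases.

```python
# Area weights quoted in the text (all resources 16 bits wide).
AREA_WEIGHTS = {
    "adder": 1.0,
    "multiplier": 16.0,
    "register": 0.75,
    "level_converter": 0.15,
}


def weighted_area(counts):
    """Weighted sum of resource counts, e.g. {"adder": 2, "multiplier": 1}."""
    return sum(AREA_WEIGHTS[kind] * n for kind, n in counts.items())


def area_penalty(multi, single):
    """Fractional area increase of a multiple-voltage design over the single-voltage one."""
    return weighted_area(multi) / weighted_area(single) - 1.0


# Hypothetical resource counts for a single-voltage and a dual-voltage schedule.
single = {"adder": 2, "multiplier": 1, "register": 4}
dual = {"adder": 3, "multiplier": 2, "register": 6, "level_converter": 4}
print(weighted_area(single), weighted_area(dual), area_penalty(dual, single))
```

Because a multiplier is weighted 16 times as heavily as an adder, a single extra multiplier dominates the penalty in this example.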
Observations
The preceding results permit several observations regarding the effect of latency, circuit resource, and supply voltage constraints on energy savings, area costs, and execution time. Because our primary objective has been to minimize energy dissipation through use of multiple voltages, we are especially interested in the comparison of multiple supply voltage results to minimum single supply voltage results. Energy savings ranging from 0% to 50% were observed when comparing multiple to single voltage results. Estimated area penalties ranged from a slight improvement to a 170% increase in area. Actual area penalties could be higher, since our estimate only considers the number of circuit resources used. There is not a clear

If we consider the impact of latency constraints alone, effects on area and energy are easier to observe. In most cases, multiple voltage area penalties were greatest for the minimum latency unlimited resource test cases. We can also observe that increasing latency constraints always led to the same or lower energy for a given number of supply voltages. However, the effect of latency constraints on the single vs. multiple voltage tradeoff varied greatly from one example to another. Results for multiple voltages are most favorable in situations where the single supply voltage solution did not benefit from increased latency, perhaps due to a control step bottleneck such as illustrated earlier in Figure 1.
The effect of resource constraints on energy savings is also relatively easy to observe. Not surprisingly, resource constraints tended to produce the lowest area penalties. The only reason for any area penalty at all in the resource constrained case is that sometimes the minimum single supply solution does not require all of the resources that were permitted. Energy estimates based on resource constrained schedules were consistently the same or higher than estimates based on unlimited resource schedules.
The results presented previously do not include energy or area costs associated with multiplexers that would be required to support sharing of functional units and registers. However, an analysis of multiplexer requirements for most of these schedules indicated that multiplexers would not have changed the relative tradeoff between number of voltages, energy dissipation, and circuit area. In a few cases the energy and area costs were increased substantially (up to 50% for energy and 108% for area), but the comparison between one, two, and three voltages was always either similar to the earlier results or shifted somewhat in favor of multiple voltages. The maximum energy savings was 54%, and the average was 32% when comparing two supply voltages to one. The maximum area penalty was 132% and the average was 42%. Results for three supply voltages, at best, were only slightly better than the two supply results.
Multiplexer costs were estimated in the following manner. A simple greedy algorithm was used to assign a functional unit to each operation and a register to each data value. Given this resource binding, we determined the fan-in to each functional unit and register. Assuming a pass-gate multiplexer implementation, we estimated worst-case capacitance on signal paths, total gate capacitance switched by control lines, and relative circuit area as a function of the fan-in and data bus width. A single pass-gate, turned on, was estimated to add a 5 fF load to each data input bit and 5 fF to the control inputs of a multiplexer. The circuit area for a pass gate was taken to be 0.07 times the area of one bit slice of a full adder. Multiplexer capacitances and area were added to the costs already used by MOVER. MOVER was then used to generate a new datapath schedule that accounts for these costs. In some cases, supply voltages had to be elevated slightly relative to previous results, in order to compensate for increased propagation delays.
DESIGN ISSUES
There are several design issues that a designer will need to take into consideration when a multiple voltage design is targeted for fabrication. In particular, the effects of multiple voltage operation on IC layout and power supply requirements should be considered. In this section we discuss the issues and identify improvements that would allow MOVER to more completely take them into account.
Layout
Following are some ways that multiple-voltage design may affect IC layout.
(1) If the multiple supplies are generated off-chip, additional power and ground pins will be required.
(2) It may be necessary to partition the chip into separate regions, where all operations in a region operate at the same supply voltage.
(3) Some kind of isolation will be needed between regions operated at different voltages.
(4) There may be some limit on the voltage difference that can be tolerated between regions.
(5) Protection against latch-up may be needed at the logic interfaces between regions of different voltage.
(6) New design rules for routing may be needed to deal with signals at one voltage passing through a region at another voltage.
Isolation requirements between different voltage regions can probably be adequately addressed by increased use of substrate contacts, separate routing of power and ground, increased minimum spacing between routes (for example, between one signal having a 2 V swing and another with a 5 V swing), and slightly increased spacing between wells. While these practices will increase circuit area somewhat, the effect should be small in comparison to increased circuitry (adders, multipliers, registers, etc.) needed to support parallel operations at reduced supply voltages. Areas for isolation will be further mitigated by grouping together resources at a particular voltage into a common region. Isolation is then needed at the periphery of the region only. Some of these layout issues can be incorporated into multiple voltage scheduling. Perhaps the greatest impact will be related to grouping operations of a particular supply voltage into a common region. Closely intermingled operations at different voltages could lead to complex routing between regions, increased need for level conversions, and increased risk of latch-up. Assigning highly connected operations to the same voltage could not only improve routing, but should also lead to fewer voltage regions on the chip, less space lost to isolation between voltage regions, and fewer signals passing between regions operating at different voltages.
Circuit Design
Some circuit design issues still need to be addressed by MOVER, including alternative level converter designs and control logic design.
Alternative level converter designs, such as the combined register and level converter, should be considered. The DCVS converter design considered in this paper does not exhibit static power consumption, but short-circuit energy is a problem. Delay and energy also increase greatly as the input voltage to the level converter becomes small.
MOVER makes assumptions about datapath control and clocking that are convenient for scheduling and energy estimation, but that will require support from the control logic. It is assumed that the entire control of the datapath is accomplished through selective clocking of registers and switching of multiplexers. This will require specially gated clocks for each register.
Power Supplies
Before implementing a multiple voltage datapath, some decisions must be made regarding the voltages that can be selected and the type of power supply to be used. Regarding voltage selection, we must decide how many supplies to use and determine whether or not nonstandard voltages are acceptable. Regarding the type of power supply, we consider only the choice between generating the voltage on-chip or off-chip. All of these choices will depend largely on the application. If on-chip heat dissipation is a primary constraint, voltages would be generated off-chip and DC-DC conversion efficiency would be a low priority. If battery life is the bottleneck, DC-DC conversion efficiency will determine whether or not multiple voltages yield an energy savings.
A simple analysis provides some insight into the conditions under which a new supply voltage could be justified. In a battery-powered system, we would need a DC-DC converter to obtain the new voltage. Let $\eta$ represent the efficiency of the DC-DC converter. The efficiency can be most easily described as the power output to the datapath divided by the power input to the DC-DC converter.
This model does not explicitly represent the effect of the amount of loading or choice of voltages on converter efficiency. For now, we are only trying to determine the degree of converter efficiency needed in order to make a new supply voltage viable. Conversely, given a DC-DC converter of known efficiency, we want to know how much voltage reduction is needed to justify use of the converter.
Let $\alpha$ represent the fraction of switched capacitance in the datapath that will be allocated to the new supply voltage. $V_1$ represents the primary supply voltage. $V_2$ represents the new reduced supply voltage under consideration. $E_1$ represents the energy dissipation of the datapath operating with the single supply voltage $V_1$. The energy $E_1$ can be split into a portion, $\alpha E_1$, representing the circuitry that will run at voltage $V_2$, and a remaining portion $(1 - \alpha) E_1$ that will continue to run at voltage $V_1$:
$$E_1 = \alpha E_1 + (1 - \alpha) E_1 \qquad (10)$$
When the new supply voltage $V_2$ is introduced, the first term in Equation (10) will be scaled by the factor $V_2^2 / V_1^2$ [...] DC-DC converter that achieves better than 90% efficiency for a 6V to 1.5V voltage reduction.
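The break-even condition implied by this analysis can be explored numerically. The sketch below is not taken from the paper's equations; it assumes that only the portion of the datapath moved to the new supply draws its energy through the DC-DC converter, that this portion's energy scales with the square of the supply voltage, and that level converter energy is ignored. Under those assumptions the dual-supply energy relative to the single-supply energy is alpha * (v2 / v1)^2 / eta + (1 - alpha), and the new supply saves energy whenever eta exceeds (v2 / v1)^2. The function names are hypothetical.

```python
def dual_supply_energy_ratio(alpha, v1, v2, eta):
    """Return E_dual / E_single under the assumptions stated in the lead-in.

    alpha -- fraction of switched capacitance moved to the new supply (0..1)
    v1    -- primary supply voltage
    v2    -- reduced supply voltage (0 < v2 <= v1)
    eta   -- DC-DC converter efficiency (0 < eta <= 1)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in (0, 1]")
    if not 0.0 < v2 <= v1:
        raise ValueError("require 0 < v2 <= v1")
    # Energy of the reassigned portion scales with V^2 and is drawn
    # from the battery through a converter of efficiency eta.
    reassigned = alpha * (v2 / v1) ** 2 / eta
    # The remaining portion still runs directly from the primary supply.
    return reassigned + (1.0 - alpha)


def break_even_efficiency(v1, v2):
    """Converter efficiency above which the new supply saves energy."""
    return (v2 / v1) ** 2


# For a 6V to 1.5V reduction the break-even efficiency is very low.
print(break_even_efficiency(6.0, 1.5))  # prints 0.0625
# Ratio below 1.0 means the dual-supply design uses less energy.
print(dual_supply_energy_ratio(alpha=0.5, v1=6.0, v2=1.5, eta=0.9) < 1.0)  # prints True
```

Under this simplified model the break-even efficiency does not depend on alpha, although the size of the savings does.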
CONCLUSIONS
In this paper we have presented MOVER, a tool that reduces the energy dissipation of a datapath design through use of multiple supply voltages. An area estimate is produced based on the minimum number of circuit resources required to implement the design. One, two, and three supply voltage designs are generated for consideration by the circuit designer. The user has control over latency constraints, resource constraints, total number of control steps, clock period, voltage range, and number of power supplies. MOVER can be used to examine and trade off the effects of each constraint on the energy and area estimates.
MOVER iteratively searches the voltage range for minimum voltages that will be feasible in a one, two, and three supply solution. An exact ILP formulation is used to evaluate schedule feasibility for each voltage selection. The same ILP formulation is used to determine which operations are assigned to each power supply.
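The idea of searching a continuous voltage range against a feasibility test can be sketched as follows. This is only an illustration and not MOVER's actual procedure: bisection is an assumed search strategy, and the feasibility test is a stand-in that checks whether a simple chain of operations fits a time budget, whereas MOVER evaluates feasibility with an ILP schedule. The delay model is the proportionality Vdd / (Vdd - VT)^2 quoted in the introduction. All function names, delays, and voltages are hypothetical.

```python
def relative_delay(vdd, vt):
    """CMOS delay up to a constant factor: Vdd / (Vdd - Vt)^2."""
    return vdd / (vdd - vt) ** 2


def make_chain_check(delays_at_vref, v_ref, vt, time_budget):
    """Build a stand-in feasibility test for a serial chain of operations.

    delays_at_vref -- operation delays measured at the reference voltage
    time_budget    -- total time available (same units as the delays)
    """
    total_at_ref = sum(delays_at_vref)
    ref = relative_delay(v_ref, vt)

    def is_feasible(vdd):
        if vdd <= vt:
            return False  # the delay model is only meaningful above threshold
        scaled_total = total_at_ref * relative_delay(vdd, vt) / ref
        return scaled_total <= time_budget

    return is_feasible


def min_feasible_voltage(is_feasible, v_low, v_high, tol=1e-3):
    """Bisect [v_low, v_high] for the lowest feasible voltage.

    Assumes feasibility is monotonic: feasible at v implies feasible above v.
    Returns None if even v_high is infeasible.
    """
    if not is_feasible(v_high):
        return None
    lo, hi = v_low, v_high
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_feasible(mid):
            hi = mid  # mid works, so try lower voltages
        else:
            lo = mid  # mid is too slow, so raise the lower bound
    return hi


# Hypothetical chain: 40 ns of logic at 5V, 100 ns available, Vt = 0.8V.
check = make_chain_check([20.0, 10.0, 10.0], v_ref=5.0, vt=0.8, time_budget=100.0)
v_min = min_feasible_voltage(check, v_low=1.0, v_high=5.0)
print(round(v_min, 2))
```

Bisection is valid here because the delay model decreases monotonically with supply voltage above the threshold; a feasibility test without that property would need a different search.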
MOVER was exercised for six different datapath specifications, each subjected to a variety of latency, resource, and power supply constraints, for a total of 70 test cases. The test cases were modest in size, ranging from 13 to 26 datapath operations and 2 to 24 control steps. The results indicate that some but not all datapath specifications can benefit significantly from use of multiple voltages. In many cases, energy was reduced substantially in going from one to two supply voltages. Improvements of as much as 50% were observed, but 20-30% savings were more typical. Adding a third supply produced relatively little improvement over two supplies, 15% at most. Results from MOVER are comparable to, and in many cases better than, results obtained using the MESVS (Minimum Energy Scheduling with Voltage Selection) ILP formulation presented in Johnson and Roy [1996]. Behavior with respect to latency, resource, and supply voltage constraints is similar between MOVER and MESVS. The improvement relative to a pure ILP formulation arises because the ILP formulation could only select from a discrete set of voltages, whereas MOVER can select from a continuous range of voltages.
