Optimal Selection of Supply Voltages and Level Conversions During Low Power Data Path Scheduling by Johnson, Mark C. & Roy, Kaushik
Purdue University
Purdue e-Pubs
ECE Technical Reports Electrical and Computer Engineering
3-1-1996
Optimal Selection of Supply Voltages and Level
Conversions During Low Power Data Path
Scheduling
Mark C. Johnson
Purdue University School of Electrical and Computer Engineering
Kaushik Roy
Purdue University School of Electrical and Computer Engineering
This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact epubs@purdue.edu for
additional information.
Johnson, Mark C. and Roy, Kaushik, "Optimal Selection of Supply Voltages and Level Conversions During Low Power Data Path
Scheduling" (1996). ECE Technical Reports. Paper 105.
http://docs.lib.purdue.edu/ecetr/105
TR-ECE 96-3 
MARCH 1996 
Optimal Selection of Supply Voltages and 
Level Conversions During Low Power Data Path 
Scheduling * 




School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907-1285, USA.
Ph: 317-494-2361
Fax: 317-494-3371 
e-mail: mcjohnso@ecn.purdue.edu, kaushik@ecn.purdue.edu 
Abstract 
In this paper we will consider how to select an optimal set of supply voltages and account for level conversion costs when optimizing the schedule of a resource dominated data path for minimum average power dissipation. Integer linear program (ILP) and non-linear program (NLP) formulations are presented for a minimum power schedule under latency and throughput constraints. Results are presented for several data path topologies under minimum latency constraints and under more relaxed latency constraints. The optimization demonstrated substantial benefit going from one to two supply voltages, but minimal additional benefit from any additional supplies. For example, a Kalman filter benchmark produced a power estimate of 356.7mW for a single 5V supply, 265.4mW for 4V and 5V supplies, but no additional improvement for three supplies. Increasing minimum schedule latency by 50% improved optimization results substantially for two and three supply voltages, but in most cases there was no improvement at all for a single optimal supply voltage.
*This research was supported in part by ARPA (under contract F33615-95-C-1625).
A great deal of current research is motivated by the need for decreased power dissipation while satisfying requirements for increased computing capacity. In portable systems, battery life is a primary constraint on power. However, even in non-portable systems such as scientific workstations, power is still a serious constraint due to limits on heat dissipation.
One design technique that promises substantial power reduction is voltage scaling. The term "voltage scaling" refers to the trade-off of supply voltage against circuit area and other CMOS device parameters to achieve reduced power dissipation while maintaining circuit performance. The dominant source of power dissipation in a conventional CMOS circuit is the charging and discharging of circuit capacitances during switching. For static CMOS, switching power is proportional to $V_{dd}^2$ [15]. This relationship provides a strong incentive to lower the supply voltage, especially since changes to any other design parameter can only achieve linear savings with respect to the parameter change. The penalty of voltage reduction is a loss of circuit performance. The propagation delay of CMOS is proportional to $V_{dd}/(V_{dd} - V_T)^2$ [15], where $V_T$ is the transistor threshold voltage.
A variety of techniques are applied to compensate for the loss of performance with respect to $V_{dd}$, including reduction of threshold voltages, increasing transistor widths, optimizing the device technology for a lower supply voltage, and shortening critical paths in the data path by means of parallel architectures and pipelining. Chandrakasan et al. describe these techniques in [15].
Data path designs can benefit from voltage scaling even without changes in device technologies. Algorithm transformations and scheduling techniques can be used to increase the latency available for some or all data path operations. The increased latency allows an operation to execute at a lower supply voltage without violating schedule constraints. "Architecture-Driven Voltage Scaling" is the name Chandrakasan et al. applied to this approach.
A number of researchers have developed systems or proposed methods that incorporate architecture-driven voltage scaling [4, 6, 7, 11, 5, 8, 9]. The HYPER-LP system [4] applies transformations to the data flow graph of an algorithm to optimize it for low power. Other systems accept the algorithm as given and apply a variety of techniques during scheduling, module selection, resource binding, etc. to minimize power dissipation. All of the systems mentioned above try to exploit parallelism in the algorithm to shorten critical paths so that reduced supply voltages can be used. Most of the systems [4, 6, 7, 11, 5] also try to minimize switched capacitance in the data path. Raje and Sarrafzadeh [9] take switching activities as given. They schedule the data path and assign voltages to data path operators so as to minimize power given a predetermined set of supply voltages.
The objective of this research has been to incorporate multiple supply voltage selection and level conversion costs into the low power optimization of resource dominated data paths. In this paper, ILP and NLP formulations are presented that generate a schedule with supply voltages assigned to each operation so as to minimize average power dissipation. These formulations are designed for resource dominated data paths, for which area, performance, and power dissipation are dominated by the data path resources (arithmetic operators and registers). For the remainder of this paper, we will refer to our ILP formulation as MPSVS (Minimum Power Schedule with Voltage Selection). MPSVS is closest in scope to the work of Raje and Sarrafzadeh [9]. However, MPSVS is distinguished by the fact that it selects an optimal set of one, two, or three supply voltages from a larger set of possible supply voltages, and factors level conversion effects into the delay constraints and power estimate. The NLP formulation generates a schedule with continuous valued schedule times and unlimited supply voltages. The NLP solutions are included for comparison to ILP results.
2 ILP Model for Minimum Power Schedule With Voltage Selection
The MPSVS formulation describes a minimum power scheduling problem under latency and multiple supply voltage constraints. It is a zero-one integer linear program (ILP) similar in structure to data path scheduling formulations described by DeMicheli [14] and Gebotys [10]. The primary input to MPSVS is a data flow graph that specifies the operations, data flows, and latency constraints for a data path. Other inputs to MPSVS include: specification of a discrete set of permitted supply voltages, a limit on the number of supply voltages that can be selected, a minimum difference between voltages that can be selected, average switching activities for each data path operation, and nominal propagation delay and average energy dissipation values for each data path resource. Solution of the minimum power scheduling problem results in a data path schedule, selection of an optimal set of supply voltages, and assignment of a supply voltage to each operation.
MPSVS makes the following assumptions: a one-to-one relationship of operator types to module types, unlimited resources, and a predetermined clock period. Furthermore, the outputs of all operations are registered for an entire sample interval of the data path. Level converters, when needed, are always located at the inputs to an operator.
Delay and power dissipation are accounted for in arithmetic operations, registers, and logic level conversions. Worst case propagation delay and average energy dissipation per input switching event were measured for each type of operator and register under nominal operating conditions. MPSVS re-scales the delay and energy estimates to be consistent with the supply voltages, switching activities, and estimated load capacitances in the data path. Energy estimates for operators and registers are scaled as

$$E = E_0 \times \left(\frac{V}{V_0}\right)^2$$

where $V_0$ is the nominal supply voltage, $V$ is the actual voltage, and $E_0$ is the nominal energy. Delay estimates for operators and registers scale as

$$t_p = t_{p0} \times \frac{V}{V_0} \times \left(\frac{V_0 - V_T}{V - V_T}\right)^2$$

where $t_{p0}$ is the nominal propagation delay and $V_T$ is the MOS transistor threshold voltage. Delay and energy estimates for level converters are treated in a similar manner except that they are functions of two supply voltages rather than one. Details of the level conversion models are given in section 4.1. In all cases delay is scaled linearly with respect to load capacitance. Energy is scaled linearly with respect to both load capacitance and switching activity.
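A minimal Python sketch of this re-scaling follows. It assumes that "linear" scaling means proportional scaling relative to the activity and load at which a resource was characterized, takes the 5V supply, 50% activity and 0.1pF load of Section 5.2 as that reference, and uses a hypothetical threshold voltage and hypothetical nominal energy and delay values. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Nominal:
    """Nominal characterization of one resource (operator or register)."""
    energy_pj: float       # E0: average energy per switching event at v_nom
    delay_ns: float        # t_p0: worst case propagation delay at v_nom
    v_nom: float = 5.0     # V0: nominal supply voltage [V]
    activity: float = 0.5  # switching activity during characterization (assumed reference)
    load_pf: float = 0.1   # load capacitance during characterization [pF] (assumed reference)


def scaled_energy(n: Nominal, v: float, activity: float, load_pf: float) -> float:
    """E = E0 * (V/V0)^2, then proportional to switching activity and load."""
    return (n.energy_pj * (v / n.v_nom) ** 2
            * (activity / n.activity) * (load_pf / n.load_pf))


def scaled_delay(n: Nominal, v: float, vt: float, load_pf: float) -> float:
    """t_p = t_p0 * (V/V0) * ((V0 - VT)/(V - VT))^2, then proportional to load."""
    if v <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return (n.delay_ns * (v / n.v_nom) * ((n.v_nom - vt) / (v - vt)) ** 2
            * (load_pf / n.load_pf))


if __name__ == "__main__":
    adder = Nominal(energy_pj=100.0, delay_ns=15.0)  # hypothetical values
    VT = 0.8                                         # hypothetical threshold [V]
    for v in (5.0, 4.0, 3.0):
        print(f"{v:.1f} V: {scaled_energy(adder, v, 0.5, 0.1):6.1f} pJ, "
              f"{scaled_delay(adder, v, VT, 0.1):5.1f} ns")
```

Lowering the supply from 5V to 4V cuts the energy of each operation to 64% of nominal while lengthening its delay, which is the trade-off the scheduler exploits.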
The objective function for MPSVS is an estimate of data path power dissipation as a function of supply voltages and the rate at which data samples are processed. The average energy of each data path operation, register operation, and level conversion is determined based on the voltage assignments. The sum of these energies over the entire data path represents the average energy dissipated by a single execution of the data path. The total energy is divided by the time interval between data samples to calculate average power.
2.1 Definitions 
Before presenting the ILP formulation, it is important to describe the manner in which a data path is specified for optimization and to define the notations that will be used. Notations to be defined include set names, set indices, and parameters that characterize data path resources.
The input to both the ILP and NLP formulations is a data flow graph (DFG) where each vertex represents an operation and each arc represents a data flow or latency constraint. This DFG representation is similar to the "sequencing graph" representation described by DeMicheli [14] except that hierarchical and conditional graph entities are not supported. Following is a brief description of the DFG definition.
The DFG is a directed acyclic graph, G(V, E), with vertex set V and edge set E. Each vertex corresponds one-to-one with an operator in the data path. Each edge corresponds one-to-one with a dependency between two operators: a data flow, a latency constraint, or both. Associated with each vertex is an attribute that specifies the operator type: adder, multiplier, or null operation (NO-OP). Associated with each edge is an attribute that indicates a latency constraint between the start times of the source and destination operations. A positive value indicates a minimum value for the destination start time minus the source start time. The magnitude of a negative value specifies a maximum value for the source start time minus the destination start time.
Two types of NO-OPs are used, which we will refer to as "transitive" and "non-transitive" NO-OPs. Neither type of NO-OP introduces delay or power dissipation. Both types serve as vertices in the DFG to which latency constraints can be attached. The transitive NO-OP is treated as if signals and their logic levels are propagated through the NO-OP. Non-transitive NO-OPs are ignored in the accounting of register delays, level conversions, and voltage supply choice.
Table 1 defines the sets and indices used in the ILP formulation for MPSVS. The "Set 
Name" column lists the labels used to represent the entire membership of a set. "Index" 
and "Index Aliases" identify the index variables used to represent individual members of the 
corresponding set. 
Table 2 describes constants and model parameters used by the ILP formulation. Indices, if any, for each parameter array are shown enclosed in parentheses following the parameter name.
2.2 Decision Variables 
Every possible pair of i and l values defines a possible assignment of a start time to an operator by means of the $x_{i,l}$ variables. The results of an "as soon as possible" (ASAP) and an "as late as possible" (ALAP) schedule are used to put bounds on the range of start times allowed for each operator [2].

$$x_{i,l} = \begin{cases} 1 & \text{if operation } i \text{ is scheduled to start at cycle } l \\ 0 & \text{otherwise} \end{cases}$$
Every possible pair of i and l values also defines a possible assignment of an execution time to an operator by means of the $cyc_{i,l}$ variables. The results of an ASAP and an ALAP schedule are used to put bounds on the range of execution times to be considered for each operator.

$$cyc_{i,l} = \begin{cases} 1 & \text{if operation } i \text{ is allowed } l \text{ cycles to execute} \\ 0 & \text{otherwise} \end{cases}$$

Table 1: Set Names and Indices for ILP Formulation

Set | Index | Description
M | | Set of operator types: M-1 = transitive NO-OP, M0 = non-transitive NO-OP, M1 = adder, M2 = multiplier.
V | i, j | Set of vertices in the DFG, each corresponding to an operator or NO-OP.
 | | Set of vertices in the DFG for which the start time will be anchored to the ASAP schedule.
E | (i, j) | Set of edges in the DFG, each corresponding to a data or timing dependency from operator i to j.
E_conv | (i, j) | Set of DFG edges representing data flows from operator i to j that could require level conversion.
 | | Set of DFG edges that do not have ...
 | | Set of data flows that include a register delay.
 | | Set of DFG edges that have a transitive NO-OP at the destination vertex.
 | l | Set of clock cycles available for scheduling of operations.
S | s, s1, s2 | Set of possible supply voltages available for selection.

Table 2: Parameters Used in ILP Formulation

Parameter Name | Description
minv | Minimum supply voltage [V]
maxv | Maximum supply voltage [V]
pathwidth | Width (in bits) of all data flows
activity(i, j) | Average switching activity on each signal in the entire data path. Ranges from 0 to 1.
t_clk | Clock period [ns]
T_SAMP | Data introduction interval [ns]
vspace | Minimum voltage difference between voltage supplies that are made available [V]
cdel(s1, s2) | Level converter delay for each possible pairing of supply voltages [ns]
cnrgy(s1, s2) | Average converter energy [pJ] dissipated when inputs to the converter switch, calculated for each possible pairing of supply voltages
cdelmult(i, j) | Factor by which to multiply converter delay in a data flow. Set to zero for arcs that cannot have a level conversion. Otherwise, this parameter is a scale factor ... output of operator i
lat(i, j) | Latency constraint on arc (i, j) [clock cycles]. A positive value indicates a minimum delay from i to j. The magnitude of a negative value specifies a maximum delay from j to i.
odel(i, s) | Propagation delay of operator i when using supply voltage s [ns]
onrgy(i, s) | Average energy dissipated for one execution of operator i when using supply voltage s [pJ]
rdel(i, s) | Propagation delay of the register at the output of operator i when using supply voltage s [ns]
rnrgy(i, s) | Average energy [pJ] dissipated when a new value is latched by the register at the output of operator i when using supply voltage s
voltage(s) | Voltage level [V] of supply s. These levels are determined so as to be uniformly distributed from minv to maxv.
Every possible pair of i and s values defines a possible assignment of supply voltage to an operator by means of the $v_{i,s}$ variables. This voltage assignment also specifies the level of a logic one output from the operator.

$$v_{i,s} = \begin{cases} 1 & \text{if operation } i \text{ is powered by voltage supply } s \\ 0 & \text{otherwise} \end{cases}$$

The $vsel_s$ variables determine which of the supply voltages (and logic levels) in S will be allowed to be used.

$$vsel_s = \begin{cases} 1 & \text{if supply voltage } s \text{ is available for use} \\ 0 & \text{otherwise} \end{cases}$$

The $vij_{i,j,s_1,s_2}$ variables account for all of the possible logic level conversions required in a data path. $vij_{i,j,s_1,s_2}$ is set to one when there is a data flow from operation i to j, supply voltage $s_1$ is assigned to operation i, supply voltage $s_2$ is assigned to operation j, and $voltage(s_1) < voltage(s_2)$. $vij_{i,j,s_0,s_0}$ is set to one if there is a data flow from i to j for which a level conversion is not required. $s_0$ represents the index for the lowest defined supply voltage, but $(s_0, s_0)$ was arbitrarily selected to represent all cases where a level converter is not required.

$$vij_{i,j,s_1,s_2} = \begin{cases} 1 & \text{if } (i,j) \in E_{conv}, \text{ operator } i \text{ uses supply } s_1, \text{ operator } j \text{ uses supply } s_2, \text{ and } voltage(s_2) > voltage(s_1) \\ 1 & \text{if } (i,j) \in E_{conv},\ voltage(s_1) = voltage(s_2) = minv, \text{ and the supply voltage for operator } i \text{ equals or exceeds that of operator } j \\ 0 & \text{otherwise} \end{cases}$$
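The case definition above can be sketched in Python for a single edge of $E_{conv}$ once the supply of each operator is fixed. The sketch assumes supply indices are sorted so that index 0 ($s_0$) is the lowest voltage; the function names are hypothetical. Only a step-up flow selects its own $(s_1, s_2)$ pair, and every other flow falls back to $(s_0, s_0)$, so exactly one entry is set per edge.

```python
def conversion_pair(voltage, s_i, s_j, s0=0):
    """(s1, s2) index pair whose vij variable is 1 for one edge (i, j) in E_conv.

    voltage -- list mapping supply index s to voltage(s) [V], lowest first
    s_i     -- supply index assigned to source operator i
    s_j     -- supply index assigned to destination operator j
    s0      -- index of the lowest supply, reused as the 'no converter' marker
    """
    if voltage[s_j] > voltage[s_i]:
        return (s_i, s_j)  # step-up: a level converter is required
    return (s0, s0)        # equal or step-down: no converter


def vij_table(voltage, s_i, s_j, s0=0):
    """Full 0/1 table vij[s1][s2] for one edge; exactly one entry equals 1."""
    pair = conversion_pair(voltage, s_i, s_j, s0)
    n = len(voltage)
    return [[int((s1, s2) == pair) for s2 in range(n)] for s1 in range(n)]


if __name__ == "__main__":
    levels = [1.5, 2.5, 3.5, 5.0]
    print(conversion_pair(levels, 1, 3))  # (1, 3): 2.5 V driving 5.0 V
    print(conversion_pair(levels, 3, 1))  # (0, 0): step-down, no converter
```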
2.3 Constraints 
There can only be one start time, one execution time, and one supply voltage assigned to each data path operation. These restrictions are enforced by constraint equations 1, 2, and 3, respectively.
If there is a data flow from operator i to j, operator i uses voltage supply $s_1$, operator j uses supply $s_2$, and $voltage(s_1) < voltage(s_2)$, then $vij_{i,j,s_1,s_2}$ is forced to a value of 1.

For each data flow (i, j), only one kind of level conversion can be specified. There must be one and only one choice of $s_1$ and $s_2$ for which $vij_{i,j,s_1,s_2}$ will equal 1.

$vij_{i,j,s_0,s_0}$ will equal 1 if no level conversion is used in the data flow from operator i to j. This is necessary in order to satisfy equation 5, which requires that exactly one level conversion is always specified for each data flow. $s_0$ is the index of the minimum supply voltage, but $(s_0, s_0)$ is used here as a way to indicate that no level conversion is required.

If operator j is a transitive NO-OP, force the supply voltage for operator j to match the supply voltage for operator i.

Restrict the number of supply voltages actually used to a specified number.

$$\sum_{s \in S} vsel_s = \text{number of supplies allowed} \qquad (8)$$

A voltage supply can only be assigned to operator i if that supply is available as indicated by $vsel_s$.

$$v_{i,s} \le vsel_s \qquad \forall\, i \text{ and } s \qquad (9)$$

Equation 10 guarantees that not more than one supply voltage will be selected in an interval of vspace volts.

$$\sum_{s_1 :\ voltage(s) \le voltage(s_1) < voltage(s) + vspace} vsel_{s_1} \le 1 \qquad \forall\, s \qquad (10)$$
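Constraints 8 and 10 together determine which supply sets the ILP may choose. A small Python sketch can enumerate them, using the settings of Section 5.3 (1.5V to 5V in 0.5V steps, selected voltages at least 1V apart) as defaults. The helper names are hypothetical, and the enumeration only illustrates the feasible set; it is not how the ILP solver searches it.

```python
from itertools import combinations


def candidate_voltages(minv=1.5, maxv=5.0, step=0.5):
    """Discrete supply levels uniformly spaced from minv to maxv inclusive."""
    n = round((maxv - minv) / step)
    return [round(minv + k * step, 6) for k in range(n + 1)]


def feasible_supply_sets(levels, n_supplies, vspace=1.0):
    """Sets of n_supplies levels in which neighbouring levels differ by >= vspace.

    For sorted levels this matches constraint (10): no half-open window
    [v, v + vspace) contains more than one selected supply.
    """
    eps = 1e-9  # tolerance for floating-point round-off in the gap test
    return [combo for combo in combinations(sorted(levels), n_supplies)
            if all(b - a >= vspace - eps for a, b in zip(combo, combo[1:]))]


if __name__ == "__main__":
    levels = candidate_voltages()
    for k in (1, 2, 3):
        print(k, len(feasible_supply_sets(levels, k)))
```

With these settings there are 8 single supplies, 21 feasible pairs and 20 feasible triples.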
For each data flow from operator i to j, the execution time allocated to operator j must meet or exceed the sum of the propagation delay of operator j, the register at the output of operator i, and the level conversion (if any). Equation 11 represents this constraint.
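The requirement behind equation 11 can be illustrated by computing the smallest whole number of clock cycles that covers the three delay components of one data flow. This sketch assumes the delays have already been re-scaled for the chosen supplies and that the converter delay on an edge is cdel(s1, s2) multiplied by cdelmult(i, j); the function name is hypothetical.

```python
import math


def cycles_needed(op_delay_ns, reg_delay_ns, conv_delay_ns, t_clk_ns, cdelmult=1.0):
    """Fewest whole clock cycles covering the delay of operator j, the register
    at the output of operator i, and the scaled level converter delay on (i, j)."""
    total_ns = op_delay_ns + reg_delay_ns + cdelmult * conv_delay_ns
    return math.ceil(total_ns / t_clk_ns)


if __name__ == "__main__":
    # Hypothetical delays with the 20 ns clock used for the examples.
    print(cycles_needed(30.0, 4.0, 0.0, 20.0))
```

An execution time $cyc_{j,l}$ with l below this value would violate the constraint for that flow.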
For every forward arc in the DFG, equation 12 ensures that the start time of operator j must exceed the start time of operator i by at least the execution time assigned to operator i. This guarantees that data flow dependencies in the data path are satisfied.

$$\sum_{l} l \, x_{j,l} \ge \sum_{l} l \, x_{i,l} + \sum_{l} l \, cyc_{i,l} \qquad \forall (i,j) \in E \text{ where } lat(i,j) \ge 0 \qquad (12)$$

For every arc with a non-zero latency constraint specified by parameter lat(i, j), the start time of operator j must exceed the start time of operator i by the amount lat(i, j). If lat(i, j) < 0, equation 13 has the effect of enforcing a maximum latency constraint of |lat(i, j)| clock cycles from the start time of operator j to i.

$$\sum_{l} l \, x_{j,l} - \sum_{l} l \, x_{i,l} \ge lat(i,j) \qquad \forall (i,j) \in E \text{ where } lat(i,j) \ne 0 \qquad (13)$$
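Equations 12 and 13 can also be read as a feasibility test on an integer schedule. The Python sketch below checks a candidate set of start times against both kinds of arcs; the dictionary layout and function name are assumptions made for illustration. The example uses a backward arc with a negative latency to cap the input-to-output delay at three cycles.

```python
def check_schedule(start, exec_cycles, lat):
    """Return the arcs of the DFG violated by a candidate schedule.

    start       -- dict: vertex -> start cycle
    exec_cycles -- dict: vertex -> execution time in clock cycles
    lat         -- dict: arc (i, j) -> lat(i, j) in clock cycles
    """
    violations = []
    for (i, j), arc_lat in lat.items():
        # Eq. (12): on forward arcs, j may not start before i has finished.
        if arc_lat >= 0 and start[j] < start[i] + exec_cycles[i]:
            violations.append((i, j, "precedence"))
        # Eq. (13): start[j] - start[i] >= lat(i, j); a negative value caps
        # start[i] - start[j] at |lat(i, j)|.
        if arc_lat != 0 and start[j] - start[i] < arc_lat:
            violations.append((i, j, "latency"))
    return violations


if __name__ == "__main__":
    arcs = {("in", "a"): 0, ("a", "out"): 0, ("out", "in"): -3}  # backward arc
    cycles = {"in": 0, "a": 2, "out": 0}
    print(check_schedule({"in": 0, "a": 0, "out": 2}, cycles, arcs))  # []
    print(check_schedule({"in": 0, "a": 0, "out": 4}, cycles, arcs))  # [('out', 'in', 'latency')]
```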
2.4 Objective Function 
An estimate of power dissipation serves as the objective function to be minimized when scheduling and assigning supply voltages to resources in the data path. The estimate is obtained by first taking the average total energy dissipated to process one input sample, i.e., one execution of the data path. The parameter arrays onrgy(i, s) and rnrgy(i, s) contain estimates of the energy expended to perform operation i and store the result for a single change of input values. cnrgymult(i, j) × cnrgy(s1, s2) gives the energy dissipation of the level conversion applied to a single change in the output of operation i destined for operation j. The parameter arrays give energy estimates for each possible choice of supply voltages. The voltage assignment variables $v_{i,s}$ and $vij_{i,j,s_1,s_2}$ are used to select one energy estimate from the parameter arrays for each operator, register, and level converter. Finally, the total energy is divided by the data introduction interval $T_{SAMP}$ to give an estimate of average power dissipation.

$$PWR = \frac{1}{T_{SAMP}} \times \left[ \sum_{i} \sum_{s} v_{i,s} \times \big(onrgy(i,s) + rnrgy(i,s)\big) + \sum_{(i,j) \in E_{conv}} \sum_{s_1} \sum_{s_2} vij_{i,j,s_1,s_2} \times cnrgymult(i,j) \times cnrgy(s_1,s_2) \right] \qquad (14)$$
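For a fixed voltage assignment, equation 14 reduces to a sum of table look-ups. The Python sketch below evaluates it, assuming energies in pJ and $T_{SAMP}$ in ns (so the result is in mW, since 1 pJ/ns = 1 mW) and treating flows that need no converter as contributing zero converter energy. The input names are hypothetical. The sketch evaluates the objective for one assignment; it does not minimize it.

```python
def average_power_mw(supply, voltage, onrgy, rnrgy, conv_flows, cnrgy, cnrgymult, t_samp_ns):
    """Evaluate the power objective for one fixed supply assignment.

    supply     -- dict: operator i -> supply index s (the v_{i,s} = 1 entry)
    voltage    -- list: supply index s -> voltage(s) [V]
    onrgy      -- dict: (i, s) -> operator energy per execution [pJ]
    rnrgy      -- dict: (i, s) -> output register energy per new value [pJ]
    conv_flows -- iterable of edges (i, j) in E_conv
    cnrgy      -- dict: (s1, s2) -> converter energy for a step-up pair [pJ]
    cnrgymult  -- dict: (i, j) -> multiplier for the converter energy on that edge
    t_samp_ns  -- data introduction interval T_SAMP [ns]
    """
    energy = sum(onrgy[i, s] + rnrgy[i, s] for i, s in supply.items())
    for i, j in conv_flows:
        s1, s2 = supply[i], supply[j]
        if voltage[s2] > voltage[s1]:  # only step-up flows need a converter
            energy += cnrgymult[i, j] * cnrgy[s1, s2]
    return energy / t_samp_ns  # pJ / ns = mW


if __name__ == "__main__":
    # Hypothetical two-operator example: an adder at 3 V feeding a multiplier at 5 V.
    p = average_power_mw({"a": 0, "m": 1}, [3.0, 5.0],
                         {("a", 0): 50.0, ("m", 1): 400.0},
                         {("a", 0): 10.0, ("m", 1): 20.0},
                         [("a", "m")], {(0, 1): 5.0}, {("a", "m"): 16.0}, 100.0)
    print(p)
```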
A different objective function is needed for the "as soon as possible" (ASAP) and "as late as possible" (ALAP) schedule formulations that are used to set bounds on operator start times and execution times. The ASAP and ALAP objective function (equation 15) is simply the sum of the start times for all vertices in the DFG. The ASAP schedule minimizes the objective, while the ALAP schedule maximizes the objective.
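In this report the ASAP and ALAP bounds come from solving the ILP with the objective of equation 15. For a DFG with fixed execution times and only forward arcs, the same kind of bounds can be computed with the familiar longest-path passes over a topological order, sketched below in Python. This is a swapped-in technique, not the GAMS model: it ignores negative latency arcs and does not anchor source nodes to their ASAP values as step 3 of Section 2.5 does, and latency_bound is a hypothetical parameter for the last cycle of the schedule.

```python
from graphlib import TopologicalSorter


def asap_alap(succ, exec_cycles, latency_bound):
    """ASAP and ALAP start cycles for a DFG with forward (precedence) arcs only.

    succ          -- dict: vertex -> list of successor vertices
    exec_cycles   -- dict: vertex -> execution time in clock cycles
    latency_bound -- cycle by which every vertex must have finished
    """
    preds = {v: [] for v in succ}
    for u, vs in succ.items():
        for v in vs:
            preds.setdefault(v, []).append(u)
    order = list(TopologicalSorter(preds).static_order())  # predecessors first

    asap = {}
    for v in order:  # earliest start: after the latest-finishing predecessor
        asap[v] = max((asap[u] + exec_cycles[u] for u in preds[v]), default=0)

    alap = {}
    for v in reversed(order):  # latest start: finish before the earliest successor
        latest_finish = min((alap[w] for w in succ.get(v, [])), default=latency_bound)
        alap[v] = latest_finish - exec_cycles[v]
    return asap, alap


if __name__ == "__main__":
    succ = {"in": ["a", "b"], "a": ["m"], "b": ["m"], "m": ["out"], "out": []}
    cycles = {"in": 0, "a": 1, "b": 1, "m": 2, "out": 0}
    asap, alap = asap_alap(succ, cycles, latency_bound=4)
    print(asap)
    print(alap)
```

Each operator's start time is then restricted to the window between its ASAP and ALAP values, which is how these results bound the $x_{i,l}$ variables.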
2.5 Solution Strategy 
The ILP formulation was implemented using GAMS [13] (General Algebraic Modeling System) and solved using the CPLEX integer program solver. The solution strategy taken was to start with a formulation that is relatively easy to solve and then solve successively more difficult problems, using the previous results to set bounds and initial conditions. Here is the sequence of modeling and optimization phases used to finally obtain a minimum power schedule.
1. Specify the DFG and timing constraints for the data path to be optimized.
2. Obtain the ASAP schedule.
3. Obtain the ALAP schedule. The ASAP results provide the initial conditions. Start times of source nodes in the DFG are anchored to the ASAP values before running the ALAP schedule.
4. Use the ASAP and ALAP results to set bounds on the start times and execution times of each operator.
5. Obtain a minimum power schedule where the number of voltage supplies is limited to one. The ASAP results provide a starting condition and an upper bound on the power objective.
6. Obtain a minimum power schedule where the number of voltage supplies is limited to two. The single supply voltage solution provides the starting state and an upper bound on the power objective.
7. Obtain a minimum power schedule for three voltage supplies, using the two supply solution for the starting state and for the upper bound on the power objective.
3 NLP Formulation
The NLP formulation is a continuous variable realization of the constraints and objective function that have been described for the ILP formulation. The NLP formulation should produce an optimized power dissipation lower than or equal to that of the ILP formulation, since the ILP solution has to be a feasible solution to the continuous variable problem.

The DFG used to specify a data path is identical for the ILP and NLP formulations. Assumptions regarding the structure of the data path to be scheduled are also the same. The specifications of variables and some constraints are different in the NLP formulation. Each quantity to be determined by the optimization is represented by a single continuous valued variable. For example, the start time of operation i is represented by a single continuous valued variable rather than a collection of zero/one variables. The execution times of operators, excluding NO-OPs, are constrained to be at least one clock cycle in duration. No restrictions are applied to the number of different supply voltages that can be selected. The supply voltages can be selected from a continuum of values between a user specified minimum and maximum. Quantities such as the propagation delay of each operator are calculated dynamically by the NLP solver rather than being taken from look-up tables.

Figure 1: DCVS Logic Level Converter (all transistors 0.8μ length and 4.0μ width except where indicated)

The NLP formulation was implemented using the GAMS input language and solved using CONOPT.
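Because the NLP treats the supply voltage as continuous, each operator can in principle run at the lowest voltage whose delay still fits its allotted time. The Python sketch below finds that voltage for one operator by bisection on the delay model of Section 2, which decreases monotonically in V above $V_T$. Bisection is a stand-in chosen for illustration, not what CONOPT does, and the defaults $t_{p0}$ = 15ns and $V_T$ = 0.8V are hypothetical values.

```python
def delay_ns(v, t_p0=15.0, v0=5.0, vt=0.8):
    """Delay model of Section 2: t_p0 * (V/V0) * ((V0 - VT)/(V - VT))^2."""
    return t_p0 * (v / v0) * ((v0 - vt) / (v - vt)) ** 2


def lowest_voltage(budget_ns, minv=1.5, maxv=5.0, tol=1e-6, **model):
    """Smallest supply in [minv, maxv] whose delay fits budget_ns, or None."""
    if delay_ns(maxv, **model) > budget_ns:
        return None          # infeasible even at the highest supply
    if delay_ns(minv, **model) <= budget_ns:
        return minv          # the lowest allowed supply already fits
    lo, hi = minv, maxv      # invariant: delay(lo) > budget >= delay(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delay_ns(mid, **model) <= budget_ns:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Two 20 ns clock cycles allotted to a hypothetical 15 ns operator.
    print(round(lowest_voltage(40.0), 3))
```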
4 Level Conversion
Four alternatives were considered for implementing logic level conversions in CMOS: 
1. Omit the logic level converter 
2. Use a chain of inverters 
3. Use an active or passive pullup 
4. DCVS level converter 
We omit the level converter for step-down conversions and use the DCVS circuit for step-up conversions. The DCVS circuit, shown in figure 1, was derived from a chip-to-chip level converter described by Chandrakasan [1]. More recently, Usami and Horowitz discussed the use of this circuit as a level converter in [3]. As long as the transistors are sized appropriately for the level conversion to be performed, this circuit exhibits no static current paths and it can operate over a full 1.5V to 5.0V range of input and output supply voltages.
Another option is to combine the register and level converter. This approach was documented by Usami and Horowitz [3]. The combined register and level converter was found to dissipate only 10% more power than the register alone.
4.1 Converter Modeling Approach 
A model was needed that could accurately indicate the power dissipation and propagation delay of the DCVS level converter as a function of the input logic supply voltage $V_1$, output logic supply voltage $V_2$, and load capacitance. The circuit was studied both analytically and from HSPICE [19] simulation results to determine a suitable form for the model equations. Coefficients of the equations were then calibrated so that the model equations would produce families of curves that closely match curves produced in HSPICE. The resulting model is valid for $V_1$ ranging from 1.5V to 5V and $V_1 + V_T \le V_2 \le 5V$. This corresponds to the range of supply voltages for which a level converter is needed.
4.2 Power Model 
The power model is separated into three factors. The first factor calculates the power consumption for $V_1 = V_2$. Charging and discharging of the load capacitance contributes a $V_2^2$ term to the power. The short circuit current on the paths through M1P/M1N and M2P/M2N contributes power as a third order polynomial of $V_2$.

The coefficients $a_3$ through $a_0$ are obtained by means of a polynomial curve fit to a plot of circuit power vs. $V_2$.

The next factor estimates the ratio of increase in power consumption due to $V_1$ being less than $V_2$.

$b_0$ represents the portion of power dissipation not affected by $V_1$. The fractional expression models the effect of $V_1 < V_2$. When $V_1 < V_2$, M2N is in saturation until $V_{OUT}$ drops to $V_1 - V_T$. Shortly thereafter, the cross-coupled circuit switches and M2P turns off. The fractional expression in $DCVSPWR(V_1, V_2)$ models the effect of saturation current in the pull-down transistors on the duration of short circuit current. The final term represents the power consumption in the inverter.
The power model is scaled linearly for load capacitance. All of the analytical expressions for DCVS power dissipation showed a linear dependence on load capacitance. Plots of power dissipation versus load capacitance showed an almost perfect linear dependence on the load. Furthermore, if one chooses a nominal load capacitance ($C_{L0}$) to evaluate power dissipation, the slope of power versus capacitance is found to be proportional to the power dissipation ($pwr0$) at the nominal load. $dpdc$ is the slope of power versus capacitance for the values of $V_1$ and $V_2$ for which $pwr0$ was measured. The following expression models this dependence on load capacitance.

$$DCVSPWR(V_1, V_2, C_L) = DCVSPWR(V_1, V_2) \times \left(1 + dpdc \times \frac{C_L - C_{L0}}{pwr0}\right) \qquad (18)$$
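Equation 18 is a one-line computation. In the Python sketch below the argument names follow the symbols in the text, and the numbers in the example are hypothetical (power in mW, capacitance in pF, with the 0.2pF total converter load of Section 5.2 used as the nominal load).

```python
def dcvs_power_at_load(p_nominal_load, cl_pf, cl0_pf, dpdc, pwr0):
    """Eq. (18): scale DCVS converter power from the nominal load CL0 to CL.

    p_nominal_load -- DCVSPWR(V1, V2) evaluated at the nominal load CL0
    cl_pf, cl0_pf  -- actual and nominal load capacitance
    dpdc           -- slope of power vs. load at the calibration voltages
    pwr0           -- power at the calibration voltages and nominal load
    """
    return p_nominal_load * (1.0 + dpdc * (cl_pf - cl0_pf) / pwr0)


if __name__ == "__main__":
    # Hypothetical calibration: 2.0 mW at 0.2 pF with a slope of 5.0 mW/pF.
    print(dcvs_power_at_load(2.0, 0.3, 0.2, 5.0, 2.0))
```

At the calibration voltages (p_nominal_load equal to pwr0) this reduces to pwr0 + dpdc × (CL - CL0), the straight line observed in HSPICE; at other voltages the slope is scaled in proportion to the power.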
4.3 Delay Model 
The delay model hinges on the following observation of delay versus V2 for fixed values of Vl. 
For V2 > Vl + VT, delay increases almost linearly with respect t o  V2. More importantly, the 
delay versus V2 lines all intersect a t  nearly the same point if extended. To take advantage of 
this behavior, a polynomial curve fit to  1 + delay was used to estimate the position of a point 
on the linear portion of each delay versus V2 curve. In particular, data points corresponding 
to  V2 := Vl + VT were used. The expression for DCVSDEL(Vl, Vl + VT) estimates these data 
points. 
The expression for DCVSDEL(Vl, V2) models the radial behavior of the delay versus V2 
curves. (Vo, delO) specifies the point from which the lines radiate. 
DCVSDEL(Vl, Vl + VT) - delO 
DCVSDEL(V1, V2) = x (V2 - Vo) +- deiO 
K+VT-Vo (20) 
Delay scales with respect to  load capacitance in a manner identical to !;hat described for 
power versus capacitance. 
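Equation 20 places the delay on the straight line through $(V_0, del0)$ and $(V_1 + V_T, DCVSDEL(V_1, V_1 + V_T))$. The Python sketch below assumes the knee delay has already been obtained from the fitted expression, whose coefficients are not reproduced here, so every numeric value in the example is a placeholder rather than a calibration from this report.

```python
def dcvs_delay(v1, v2, delay_at_knee, v0, del0, vt):
    """Eq. (20): DCVS converter delay for input supply v1 and output supply v2.

    delay_at_knee -- DCVSDEL(V1, V1 + VT) from the fitted expression
    (v0, del0)    -- common point from which the delay vs. V2 lines radiate
    The report's calibration also caps V2 at 5 V; that bound is not checked here.
    """
    knee = v1 + vt
    if v2 < knee:
        raise ValueError("the model is only calibrated for V2 >= V1 + VT")
    slope = (delay_at_knee - del0) / (knee - v0)
    return slope * (v2 - v0) + del0


if __name__ == "__main__":
    # Placeholder calibration values, not measurements from this report.
    for v2 in (3.0, 4.0, 5.0):
        print(v2, dcvs_delay(2.0, v2, delay_at_knee=1.2, v0=-1.0, del0=0.2, vt=0.8))
```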
5 Results 
5.1 Data Path Examples 
The ILP and NLP scheduling formulations were run for five example data paths: two toy benchmarks based on a four point FFT (FFT4a and FFT4b) [17], the 5th order elliptic wave filter benchmark (ELLIP) [16], a 6th order Auto-Regressive Lattice filter (LATTICE) based on the topology documented in [17], and the Kalman filter benchmark (KALMAN) [18]. Data flow graphs for each example are shown in figures 3 through 7. Figure 2 defines the notations used in each DFG. In the FFT data path, complex signal paths are split into real and imaginary data flows. The FFT4a example uses a separate adder to implement each 2's complement inversion. FFT4b lumps any 2's complement operations into the next adder input. For all other data paths, the signals are modeled as non-complex integer values. All data flows were taken to be 16 bits wide. Switching activities at all nodes were assumed to be 50%, i.e., the probability of a transition on any selected 1 bit signal is 50% in any one sample interval.
Figure 2: Key to DFG Notation (symbols for adder, 2's complement, data flow, and latency constraint)
Figure 3: FFT4a - 4 Point FFT with Balanced Paths
Each example was modeled for one sample period with data flow and latency constraints specified for any feedback signals. No conditional operations were modeled. Any loops that start and finish within the same sample period were completely unrolled. Any loops spanning multiple sample periods were broken. A data flow passing from one sample period to the next was represented by input and output nodes in the DFG connected by a backwards arc to specify a maximum latency constraint from the input to the output. A 20ns clock was specified for all examples.
Latency constraints were specified so that the data introduction interval equals the maximum delay from the input to the output of the data path. The total execution time of the data path is permitted to exceed the data introduction interval as long as the outputs generated after the maximum latency are only used in the next iteration. In that situation, a maximum latency constraint is applied between the output node and any inputs that use the output value.
Figure 4: FFT4b - 4 Point FFT with Imbalanced Paths
Figure 5: ELLIP - 5th Order Elliptic Wave Filter
Figure 6: LATTICE - 6th Order Lattice Filter (feedback paths are indicated by signal names, e.g., there is a feedback from G5(n) to G5(n-1); a maximum latency constraint is assigned to each feedback path)
Figure 7: KALMAN - Kalman Filter Benchmark

5.2 Characterization of Data Path Resources
A 16 bit ripple carry adder, a 16 bit carry-save multiplier, and a 16 bit register were simulated
in HSPICE to obtain propagation delay and power dissipation values under the following
conditions. The level 3 MOS model was used with parameter values for a 0.8µm MOSIS process.
A load capacitance of 0.1pF was applied to each output signal. Power supplies were set to
5V. Input signals were generated for which an average of 50% of the one bit signals would
switch simultaneously every 20ns. Worst case delay was measured and used in the adder delay
model. The register setup time requirement is lumped into the adder delay. Multiplier delay
was taken to be the largest delay observed for the entire sequence of random input signals.
Power dissipation was taken to be the average power dissipation for the sequence of random
input values. Nominal operating conditions for the level converter are the same as for the
other resources, except for power supplies and load capacitance. The converter requires two
power supplies: the nominal lower supply level was taken to be 3.3V, and the higher supply was
5V. Load capacitance was 0.1pF on both sides of the differential output, for a total load of
0.2pF. Table 3 gives the nominal power, energy and delay values that were measured for
each type of resource.
Table 3: Nominal Power and Delay Measurements
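The random input stimulus described in Section 5.2 can be approximated in a few lines. This is a hypothetical sketch, not the procedure used for the report: it assumes that each of the 16 input bits toggles independently with probability 0.5 at every 20ns step, which yields the stated 50% average switching. Writing the words out as HSPICE sources is not shown, and the function names are illustrative.

```python
import random

def toggle_stimulus(n_words, width=16, p_toggle=0.5, seed=0):
    """Successive input words in which each bit flips with probability p_toggle."""
    rng = random.Random(seed)
    word = rng.getrandbits(width)
    words = [word]
    for _ in range(n_words - 1):
        mask = 0
        for bit in range(width):
            if rng.random() < p_toggle:
                mask |= 1 << bit
        word ^= mask          # flip the selected bits for the next 20 ns step
        words.append(word)
    return words

def mean_toggle_fraction(words, width=16):
    """Average fraction of bits that change between consecutive words."""
    flips = sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))
    return flips / (width * (len(words) - 1))

WORDS = toggle_stimulus(10001)
print(round(mean_toggle_fraction(WORDS), 2))  # close to 0.5
```

Fixing the seed makes the sequence reproducible, so delay and power figures measured against it can be compared across supply voltages.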
5.3 Optimization Results 
Tables 4 and 5 present the results of running the ILP and NLP formulations for each example
data path under a variety of voltage supply restrictions. Table 4 reports the results when
maximum latency is set equal to the latency of the ASAP schedule. Table 5 reports the results
when maximum latency is set equal to the ASAP latency plus 50%. In both tables, NLP
formulation results are reported for all supplies fixed at 5V and for an unlimited selection
of supply voltages between 1.5V and 5V. ILP formulation results are reported for 5V fixed
supplies, a single optimal supply voltage, an optimal choice of two supply voltages, and an
optimal choice of three supply voltages. The ILP formulation was permitted to choose from
voltages ranging from 1.5V to 5V in 0.5V increments. Selected voltages were required to differ
by at least 1V. A clock period of 20ns was used for all examples.
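The candidate supply sets implied by these restrictions can be enumerated directly. The sketch below only illustrates the size of the choice space; it is not the ILP encoding used by MPSVS, which selects voltages through integer variables rather than by enumeration. Voltages are held in tenths of a volt so that the 0.5V grid and the 1V separation test are exact, and the function name `candidate_supply_sets` is hypothetical.

```python
from itertools import combinations

GRID_DV = list(range(15, 51, 5))  # 1.5 V ... 5.0 V in 0.5 V steps, in tenths of a volt
MIN_SEP_DV = 10                    # selected supplies must differ by at least 1 V

def candidate_supply_sets(k):
    """All k-supply selections on the grid whose members are at least 1 V apart."""
    sets = []
    for combo in combinations(GRID_DV, k):  # tuples come out in ascending order
        if all(hi - lo >= MIN_SEP_DV for lo, hi in zip(combo, combo[1:])):
            sets.append(tuple(v / 10 for v in combo))
    return sets

for k in (1, 2, 3):
    print(k, len(candidate_supply_sets(k)))
```

With eight grid voltages this gives 8 single-supply choices, 21 admissible pairs such as (2.5, 4.0), and 20 admissible triples; adjacent pairs such as (2.0, 2.5) are excluded by the 1V rule.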
Values reported in Tables 4 and 5 should be interpreted as follows.
The power estimate is the average energy per switching event divided by the sample period
for the data path. "Min Voltage" and "Max Voltage" report the smallest and largest supply
voltages assigned to at least one operator by the NLP formulation. "# Converters" indicates
the number of level converters needed as a result of the way voltages were assigned to operators.
"Voltage", "Voltage 1", etc. all report supply voltages selected by an ILP solution. Next to
each supply voltage is an indication of the number of operations of each type to which that
voltage was assigned. For example, "(5*)" next to a voltage indicates that five multiplications
were assigned to that supply voltage, and "(8+)" indicates eight additions.
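As a worked check of this definition, the sketch below divides energy by the sample period. Because the data introduction interval is set equal to the maximum latency, the sample period is assumed to be the latency in clock cycles times the 20ns clock; picojoules per nanosecond are milliwatts. The function name is hypothetical.

```python
CLOCK_NS = 20.0  # clock period used for all examples

def average_power_mw(energy_pj, latency_cycles, clock_ns=CLOCK_NS):
    """Average power = energy per sample / sample period (pJ / ns = mW).

    The sample period is taken as latency_cycles * clock_ns, since the data
    introduction interval equals the maximum latency.
    """
    return energy_pj / (latency_cycles * clock_ns)

# FFT4a: all-5V energy at the minimum latency (2 cycles) and at +50% (3 cycles).
print(average_power_mw(2955, 2))
print(average_power_mw(2955, 3))
```

For FFT4a this gives 73.875 mW at 2 cycles and 49.25 mW at 3 cycles, and 1401 pJ over 3 cycles gives 23.35 mW; after rounding these match the 73.9 mW, 49.2 mW and 23.4 mW entries of Tables 4 and 5.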
Table 4: Power Dissipation and Voltage Selection Results for No Critical Path Slack

Restrictions                  FFT4a   FFT4b   ELLIP   LATTICE   KALMAN
Latency [clock cycles]        2       4       10      6
NLP, all 5V   Energy [pJ]     2955    4427    4814
              Power [mW]      73.9    55.3    24.1    165.9
ILP, all 5V   Energy [pJ]     2955    4427    4814    19913     64200
              Power [mW]                              165.9     356.7
Table 5: Power Dissipation and Voltage Selection Results for a 50% Increase in Critical Path Latency

Restrictions                  FFT4a       FFT4b   ELLIP   LATTICE   KALMAN
Latency [clock cycles]        3           6       15      9         14
NLP, all 5V      Energy [pJ]  2955
                 Power [mW]   49.2        36.9    16.0    110.6     229.3
ILP, all 5V      Energy [pJ]  2955
                 Power [mW]   49.2
ILP, 1 supply    Energy [pJ]  1890
                 Power [mW]   31.5
                 Voltage      4.0V
ILP, 2 supplies  Energy [pJ]  1401
                 Power [mW]   23.4
                 Voltage 1    2.5V (8+)
                 Voltage 2    4.0V (8+)
                 # Converters 16
ILP, 3 supplies  Energy [pJ]  1401
                 Power [mW]   23.4
                 Voltage 1    2.5V (8+)
                 Voltage 2    4.0V (8+)
                 Voltage 3    unused
                 # Converters 16
5.4 Observations
In the preceding results, we are able to observe the effect of the number of supply voltages,
data path topology, and latency constraints on our ability to minimize power dissipation by
appropriate scheduling and selection of supply voltages.
For all but one of the data path examples that were evaluated, two appeared to be the
optimal number of supply voltages. Three supply voltages provided little or no reduction
in power and sometimes increased the number of level converters required. This may be a
consequence of evaluating data paths that are all comparable in size and complexity.
Data path topology seems to have the greatest impact on schedules with minimum latency
constraints. By "minimum latency", we mean that no schedule slack was available on the
critical path. In the single supply voltage case, the only usable slack is the difference between
the execution time of each operation in the critical path and the next larger multiple of the clock
period, regardless of the topology. For multiple supply voltages, a minimum latency data path
with most signal paths of similar length still offers relatively little opportunity
for voltage reduction. FFT4a and FFT4b were tailored to demonstrate this effect. The paths
in FFT4a are identical in length. The only available schedule slack is the difference between
the clock cycle time and the time for an addition. Multiple voltages are of no use in FFT4a.
FFT4b has three different path lengths and is able to take advantage of multiple voltages.
Among the less trivial examples, the signal paths through the DFG for the KALMAN bench-
mark were most nearly of similar length. Consequently, the KALMAN benchmark derived less
benefit from multiple supply voltages than ELLIP or LATTICE. The LATTICE filter DFG
had the greatest variation in signal path lengths and also derived the greatest benefit from
multiple supply voltages for the minimum latency constraint case.
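The single-supply limitation described above can be illustrated numerically. The sketch assumes a first-order delay model, delay proportional to V/(V - Vt)^2, with a hypothetical threshold Vt = 0.8V and hypothetical nominal delays; neither comes from the report's characterization. It lowers one common supply along the ILP's 0.5V grid until some critical-path operation would spill into an extra clock cycle. This per-operation check ignores any rescheduling of non-critical operations.

```python
import math

CLOCK_NS = 20.0   # clock period used for all examples
V_NOM = 5.0       # nominal supply used for characterization
V_T = 0.8         # ASSUMED threshold voltage (hypothetical, not from the report)
GRID = [x / 10 for x in range(15, 51, 5)]  # 1.5 V ... 5.0 V in 0.5 V steps

def delay_factor(v):
    """ASSUMED first-order CMOS delay dependence on supply: V / (V - Vt)^2."""
    return v / (v - V_T) ** 2

def scaled_delay_ns(d_nom_ns, v):
    """Delay at supply v, given the delay measured at V_NOM."""
    return d_nom_ns * (delay_factor(v) / delay_factor(V_NOM))

def lowest_single_supply(critical_delays_ns, clock_ns=CLOCK_NS):
    """Lowest grid voltage at which no critical-path operation needs more
    clock cycles than it does at V_NOM (so the latency is unchanged)."""
    cycles = [math.ceil(d / clock_ns) for d in critical_delays_ns]
    best = V_NOM
    for v in sorted(GRID, reverse=True):
        fits = all(math.ceil(scaled_delay_ns(d, v) / clock_ns) <= c
                   for d, c in zip(critical_delays_ns, cycles))
        if not fits:
            break  # delay grows monotonically as v drops, so stop here
        best = v
    return best

print(lowest_single_supply([12.0]))        # one hypothetical 12 ns operation
print(lowest_single_supply([12.0, 19.9]))  # add an operation with almost no slack
```

With a hypothetical 12ns operation on the 20ns clock, the supply can drop to 4.0V; adding a second critical-path operation at 19.9ns leaves no usable slack, so a single supply must stay at 5.0V even though the first operation could still be slowed.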
Increasing schedule latency by 50% does not improve the results much for a single supply
voltage, but multiple voltage results are enhanced, and the influence of topology is reduced.
In the single supply voltage minimum latency case, a lowering of voltage increases the delay
of all data path operations. If the data path was already voltage scaled for minimum latency,
a small voltage drop may cause the delay of many data path operators to exceed the next
multiple of a clock cycle and cause the data path latency to increase by more than 50%. In
the multiple supply voltage case, voltage reductions can be selectively applied to individual
operators to take up just the amount of schedule slack that is available. Topology becomes
less important with increased latency, since unbalanced signal paths are not needed to provide
slack for voltage scaling.
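The trade-off behind selective voltage reduction can be sketched as an energy estimate for a given per-operator voltage assignment. This is an illustrative model, not the MPSVS cost function: it assumes that dynamic energy scales as (V/5V)^2, that a level converter is needed on each edge where a lower-voltage producer drives a higher-voltage consumer, and that each converter costs a fixed energy. All operator names and energy values are hypothetical.

```python
V_NOM = 5.0

def count_level_converters(edges, supply):
    """One converter per DFG edge on which a lower-voltage producer drives a
    higher-voltage consumer (ASSUMPTION: high-to-low edges need none here)."""
    return sum(1 for u, v in edges if supply[u] < supply[v])

def mixed_supply_energy_pj(op_energy_5v_pj, supply, edges, e_conv_pj, v_nom=V_NOM):
    """Energy per sample: each operation's 5 V energy scaled by (V / v_nom)^2
    (ASSUMED quadratic scaling), plus a fixed energy per level converter."""
    dynamic = sum(e * (supply[op] / v_nom) ** 2
                  for op, e in op_energy_5v_pj.items())
    return dynamic + e_conv_pj * count_level_converters(edges, supply)

# Hypothetical three-operation chain: a feeds b, b feeds c.
EDGES = [("a", "b"), ("b", "c")]
SUPPLY = {"a": 2.5, "b": 4.0, "c": 4.0}
ENERGY_5V = {"a": 100.0, "b": 100.0, "c": 100.0}  # hypothetical pJ values
E_CONV = 5.0                                      # hypothetical pJ per converter

print(count_level_converters(EDGES, SUPPLY))
print(mixed_supply_energy_pj(ENERGY_5V, SUPPLY, EDGES, E_CONV))
```

In this toy case only the 2.5V-to-4.0V edge needs a converter, so the estimate is 25 + 64 + 64 + 5 = 158 pJ. Under the same quadratic assumption, 2955 pJ x (4.0/5.0)^2 is about 1891 pJ, close to the 1890 pJ reported for FFT4a with a single 4.0V supply in Table 5.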
Conclusions 
In this paper we have presented a method, MPSVS, for using integer programming to optimize
the schedule and supply voltage levels for a mixed voltage data path design. The primary
benefit of MPSVS is to obtain a data path schedule and supply voltage assignments that
minimize data path power dissipation. However, there are some beneficial side effects. Use of
level conversions should be lower than for multiple voltage scheduling algorithms that ignore
level conversion costs. Fewer level conversions should result in larger portions of the data path
that operate with a single supply voltage, simplifying layout and routing. Lowering the supply
voltage on signal paths with relatively large schedule slack will balance the delay paths and
should help reduce glitching activity.
Running MPSVS on a variety of data path examples resulted in the following observations.
In all but a perfectly balanced data path example, the optimal number of supply voltages 
turned out to be two. When minimum latency constraints are applied, no voltage scaling can 
be applied unless there are signal paths shorter than the critical path. Loosening the latency 
constraints allowed lower voltages to be selected, but the optimal number of supply voltages 
still appeared to be two. 
There are a number of extensions to MPSVS that would improve the quality of the
optimization results and bring them closer to a realistic specification of a data path. Useful
extensions would include the addition of data path resource constraints, module selection and
binding, support for retiming, support for conditional operations and loops, and including the
effect of schedule changes on switching activity. Problems of particular interest are the incorporation
of resource constraints and module binding, which will have to be reformulated to account for
the assignment of a voltage to each module.
References
[1] A.P. Chandrakasan, R. Allmon, A. Stratakos, and R.W. Brodersen, "Design of Portable Systems," IEEE 1994 Custom Integrated Circuits Conference, San Diego, CA, pp. 259-266.
[2] S. Chaudhuri, R.A. Walker, and J.E. Mitchell, "Analyzing and Exploiting the Structure of the Constraints in the ILP Approach to the Scheduling Problem," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Dec. 1994, pp. 456-471.
[3] K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power Design," Proceedings of the International Symposium on Low Power Design 1995, New York, pp. 3-8.
[4] A.P. Chandrakasan, et al., "Optimizing Power Using Transformations," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 14, No. 1, Jan. 1995, pp. 12-31.
[5] N. Kumar, S. Katkoori, L. Rader, and R. Vemuri, "Profile-Driven Behavioral Synthesis for Low-Power VLSI Systems," IEEE Design & Test of Computers, Fall 1995, pp. 70-84.
[6] A. Raghunathan and N.K. Jha, "Behavioral Synthesis for Low Power," Proceedings IEEE International Conference on Computer Design: VLSI in Computers and Processors 1994, pp. 318-322.
[7] A. Raghunathan and N.K. Jha, "An Iterative Improvement Algorithm for Low Power Data Path Synthesis," Proceedings of the International Conference on Computer Aided Design 1995, pp. 597-602.
[8] R. San Martin and J.P. Knight, "Power-Profiler: Optimizing ASICs Power Consumption at the Behavioral Level," Proceedings 32nd Design Automation Conference, June 1995, San Francisco, pp. 42-47.
[9] S. Raje and M. Sarrafzadeh, "Variable Voltage Scheduling," Proceedings of the International Symposium on Low Power Design 1995, New York, pp. 9-14.
[10] C.H. Gebotys and M.I. Elmasry, "A Global Optimization Approach for Architectural Synthesis," IEEE Transactions on CAD/ICAS, Vol. 12, No. 9, pp. 1266-1278, Sep. 1993.
[11] L. Goodby, A. Orailoglu, and P.M. Chau, "Microarchitectural Synthesis of Performance-Constrained, Low-Power VLSI Designs," Proceedings IEEE International Conference on Computer Design: VLSI in Computers and Processors 1994, pp. 323-326.
[12] J. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice Hall, Englewood Cliffs, N.J., to be published.
[13] Anthony Brooke, David Kendrick, and Alexander Meeraus, GAMS: A User's Guide, The Scientific Press, 1992.
[14] G. De Micheli, Synthesis and Optimization of Digital Circuits, McGraw-Hill, Inc., 1994.
[15] A.P. Chandrakasan, S. Sheng, and R.W. Brodersen, "Low-Power CMOS Digital Design," Journal of Solid-State Circuits, Vol. 27, No. 4, April 1992, pp. 473-483.
[16] D. Rao, "The Fifth Order Elliptic Wave Filter Benchmark," benchmark set: HLSynth92, http://www.cbl.ncsu.edu/www/CBLDocs/Bench.htm
[17] J.G. Proakis and D.G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Macmillan Publishing Company, Inc., 1992.
[18] C. Ramachandran, "Kalman Filter Benchmark," benchmark set: HLSynth92, http://www.cbl.ncsu.edu/www/CBLDocs/Bench.htm
[19] HSPICE User's Manual, Meta-Software, Inc., 1995.
