Gate-Level Design Exploiting Dual Supply Voltages for Power-Driven Applications
Chingwei Yeh, Min-Cheng Chang, Shih-Chieh Chang†, Wen-Bone Jone†
EE & †CS, Nat’l Chung-Cheng Univ., Chiayi 621, Taiwan, ROC
Principal author’s email: ieecwy@ccunix.ccu.edu.tw
Abstract
The advent of portable and high-density devices has made power
consumption a critical design concern. In this paper, we address the
problem of reducing power consumption via gate-level voltage scaling for
designs that are not under the strictest timing budget. We first use a
maximum-weighted independent set formulation for voltage reduction on the
non-critical parts of the circuit. Then, we use a minimum-weighted
separator set formulation to do gate sizing, and integrate the sizing
procedure with a voltage scaling procedure to enhance power saving on the
whole circuit. The proposed methods are evaluated using the MCNC benchmark
circuits, and an average power reduction of 19.12% over circuits having
only one supply voltage has been achieved.
1 Introduction
Power consumption has become a critical design concern due to the
increasing gap between the energy required by portable
computation/communication devices and the energy supplied by the battery.
In addition, as the number of devices packed into a single chip
approaches millions, heat dissipation becomes a problem that can
adversely affect the reliability and packaging cost of a design. These
factors have driven numerous research efforts on various power-saving
techniques [7, 5, 1].
The primary source of power consumption for a CMOS design
comes from the switching of logic states. The switching power is
expressed as [1]
$$P_{switch} = \alpha_{0 \to 1} \cdot f_{clk} \cdot (C_{load} \cdot V_{dd}^2) \qquad (1)$$
where $\alpha_{0 \to 1}$ is the average number of times in a clock cycle
that a switch from 0 to 1 occurs, $f_{clk}$ is the clock frequency, and
$C_{load}$ is the loading capacitance. The equation shows that the supply
voltage ($V_{dd}$) affects power dissipation quadratically. Thus, voltage
scaling has been deemed the most promising approach for power reduction
[1]. However, since the delay of a CMOS gate is inversely proportional to
the supply voltage, it is necessary to minimize or compensate for the
speed degradation that results from voltage scaling. Several techniques
have been proposed to deal with this degradation. For example, parallel
or pipelined architectures can be used to speed up a system block in the
first place; the voltage of the entire block is then scaled down to trade
the extra speed for lower power [1]. Also, scheduling techniques can be
used to identify the blocks whose voltages can be lowered without
compromising the overall system throughput [4].
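Equation (1) can be checked numerically. The following Python sketch evaluates it for one node at the two supply voltages used later in the experiments (5 V and 4.3 V) and at the 20 MHz clock of the power estimation. The switching activity and the load capacitance are hypothetical values chosen only for illustration.

```python
def switching_power(alpha, f_clk, c_load, v_dd):
    """Equation (1): average switching power of one node, in watts."""
    return alpha * f_clk * (c_load * v_dd ** 2)


# Hypothetical node: activity 0.25 per cycle, 20 MHz clock, 50 fF load.
ALPHA, F_CLK, C_LOAD = 0.25, 20e6, 50e-15

p_high = switching_power(ALPHA, F_CLK, C_LOAD, 5.0)   # gate kept at V_high
p_low = switching_power(ALPHA, F_CLK, C_LOAD, 4.3)    # gate moved to V_low

# Fractional saving; alpha, f_clk and c_load cancel out of the ratio.
saving = 1.0 - p_low / p_high
print(f"saving per node: {saving:.2%}")
```

Because the voltage enters quadratically, the per-node saving is 1 - (4.3/5)^2, about 26%, regardless of the activity, frequency, and load assumed above.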
The above block-level voltage scaling prescribes that the voltage of the
entire block be lowered in synchrony. In fact, voltage scaling can also
be applied at the level of gates. That is, a low voltage can be attempted
on non-critical parts of a block so that the timing constraints of the
block are not violated. However, direct application of the idea generates
a circuit with low-voltage parts scattered among the high-voltage parts.
Thus, it is possible that a low-voltage gate is made to drive a
high-voltage one. Since the high output of the low-voltage gate cannot
fully turn off the PMOS part of the high-voltage gate, there is a DC
leakage from power source to ground, which can lead to substantial power
loss. As a remedy, level restoration must be done at each driving
incompatibility point. Nonetheless, the restoration of the voltage level
itself consumes power. Hence the authors of [8] imposed the restriction
that, in addition to meeting the timing constraints at the primary
outputs, a gate may receive the low voltage only if the supply voltage of
all its fanout gates has been lowered. In doing so, the low-voltage gates
form one cluster that is contiguous with the primary outputs. This
contiguity requirement can be fulfilled using the clustered voltage
scaling (CVS) technique [8, 9]. The technique is basically a
breadth-first traversal from the primary outputs. During the traversal,
each visited gate is examined for possible reduction of its voltage
level. If the incurred timing penalty does not cause the signal paths
passing through this gate to violate the timing constraint, the supply
voltage of the gate is lowered. Otherwise, the voltage level of the gate
is kept and further traversal from this gate is terminated.
The CVS technique is the simplest way to achieve gate-level voltage
scaling. However, as will be elaborated in the subsequent sections, the
approach lacks a global view of the problem and so is too conservative to
explore every power-saving opportunity. In this regard, we first use a
maximum-weighted independent set formulation for voltage reduction on the
non-critical parts of the circuit. Then, we use a minimum-weighted
separator set formulation to do gate sizing, and integrate the sizing
procedure with a voltage scaling procedure to enhance power saving on the
whole circuit. The proposed methods are evaluated using the MCNC benchmark
circuits, and an average power reduction of 19.12% over circuits having
only one supply voltage has been achieved. These data provide grounds for
the application of gate-level voltage scaling.
Permission to make digital/hardcopy of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage, the copyright notice, the title of the publication
and its date appear, and notice is given that copying is by permission of ACM, Inc.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee.
DAC 99, New Orleans, Louisiana
(c) 1999 ACM 1-58113-109-7/99/06..$5.00
2 Exploiting Existing Timing Slack for Voltage Scaling
For designs that are not under the tightest delay constraint, there are
often non-critical parts of the design on which a low supply voltage can
be attempted. The CVS technique performs voltage scaling on the parts of
the circuit neighboring the primary outputs. For other parts of the
circuit, there is still the possibility of employing voltage reduction.
This can be done using the following algorithm (Dscale).
Dscale(N, T_spec, library, V_high, V_low)
{
    TCB = CVS(N, T_spec);
    while (1)
    {
        SlkSet = get_SlkSet(N, TCB, T_spec);
        candSet = check_timing(SlkSet, V_high, V_low);
        if (candSet != ∅)
        {
            weight_with_power_gain(candSet);
            LowSet = MWIS(candSet);
            for each node in LowSet
            {
                down-scale the supply voltage to V_low;
                insert necessary level restoration circuits;
            }
            update_timing(N, T_spec);
        } else
        {
            break;
        }
    }
}
In the pseudo code, N stands for the logic network to be processed. In
addition to N, the algorithm also needs the timing specification
($T_{spec}$), the two voltages $V_{high}$ and $V_{low}$, and the cell
library. The library must be processed in advance so that it includes,
besides the typical gate attributes, the new gate attributes when the
supply voltage is set to $V_{low}$. This involves simulating each gate at
$V_{low}$ using SPICE.
The first step of Dscale is to exploit the timing slack near the primary
outputs. This is done via the CVS procedure. When CVS finishes, a set of
nodes is identified as the time-critical boundary (TCB). As the concept
of the TCB will be used in the subsequent section, we elaborate its
definition here. A node is said to be in the TCB if both of the following
conditions hold: 1) the delay at the output of the node would violate the
timing constraint if its supply voltage were scaled to $V_{low}$; and 2)
the voltage of one of the node's fanout nodes has already been set to
$V_{low}$. Thus, the TCB is composed of the high-voltage nodes sitting
next to the low-voltage ones.
The next step of Dscale is to exploit the timing slack in other parts of
the circuit. The procedure get_SlkSet invokes a static timing analysis to
identify the part of the circuit that has timing slack, termed SlkSet in
the pseudo code. For each node of SlkSet, we check whether it is possible
to scale the voltage from $V_{high}$ to $V_{low}$ within the allowance of
the timing slack (procedure check_timing). The nodes to which $V_{low}$
can be applied are collected in candSet. To get the maximum power
reduction, we set the weight of each candSet node to the power reduction
obtained when $V_{low}$ is applied. The algorithm then proceeds to select
a subset of nodes from candSet to which $V_{low}$ can be applied to yield
the maximum gain. Since the cumulative delay incurred by simultaneous
voltage reduction on the same path can violate the timing constraint, the
selected nodes cannot lie on the same path. The problem can be solved by
composing a transitive graph of candSet and applying the maximum-weight
independent set algorithm [3], termed MWIS in the pseudo code, to get
LowSet. The voltage of the nodes in LowSet can then be scaled down to
$V_{low}$. The timing information of the circuit is recalculated and the
whole procedure is restarted. The entire algorithm stops when candSet is
empty.
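To make the selection step concrete, the Python sketch below states the independent-set formulation on a tiny example: two candidates conflict when one can reach the other along a signal path, and the best-weight conflict-free subset is chosen. It uses exhaustive search purely for illustration and is not the polynomial-time algorithm of [3]. The graph, the candidate names, and the weights are hypothetical.

```python
from itertools import combinations


def reachable(fanout, src):
    """All nodes reachable from src along fanout edges (src excluded)."""
    seen, stack = set(), [src]
    while stack:
        for w in fanout[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def mwis_bruteforce(fanout, weight):
    """Best-weight subset of candidates with no two nodes on a common path.

    Exhaustive search over all subsets: exponential, for small examples only.
    """
    cands = list(weight)
    reach = {v: reachable(fanout, v) for v in cands}
    best, best_w = set(), 0.0
    for k in range(1, len(cands) + 1):
        for subset in combinations(cands, k):
            # Reject the subset if any pair lies on the same path.
            if any(b in reach[a] or a in reach[b]
                   for a, b in combinations(subset, 2)):
                continue
            w = sum(weight[v] for v in subset)
            if w > best_w:
                best, best_w = set(subset), w
    return best, best_w


# Hypothetical network: a and b both feed c, c feeds d; e is isolated.
fanout = {"a": ["c"], "b": ["c"], "c": ["d"], "d": [], "e": []}
# Hypothetical power gains of the candidate nodes (d is not a candidate).
weight = {"a": 2.0, "b": 2.5, "c": 4.0, "e": 1.0}

low_set, gain = mwis_bruteforce(fanout, weight)
```

Here c has the largest individual gain, but choosing it excludes both a and b, so the selection {a, b, e} with total gain 5.5 beats {c, e} with 5.0.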
2.1 Complexity
Let n and e be the number of nodes and nets in the network,
respectively. The CVS technique is basically a breadth-first traversal,
so it takes $O(n + e)$. Since we use a simple static timing analysis
technique, the complexities of get_SlkSet, update_timing, check_timing,
and weight_with_power_gain are again $O(n + e)$. Lastly, the complexity of
the procedure MWIS is $O(ne \log(n^2/e))$ [3]. Thus, the entire algorithm
takes at most $O(ne \log(n^2/e))$.
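The linear bound for the timing routines follows from the structure of a basic static timing analysis, which visits every node and every connection once in a forward pass and once in a backward pass. The Python sketch below illustrates this under assumptions made here for brevity: a single delay per gate, a single required time `t_spec` at all outputs, and a netlist given as a `fanin` map with a topological `order`. It is not the timing engine used in the experiments.

```python
def slacks(fanin, order, delay, t_spec):
    """Slack of every node from one forward and one backward pass."""
    fanout = {v: [] for v in order}
    for v in order:
        for u in fanin[v]:
            fanout[u].append(v)
    arrival = {}
    for v in order:                       # forward pass: latest arrival
        arrival[v] = delay[v] + max((arrival[u] for u in fanin[v]),
                                    default=0.0)
    required = {}
    for v in reversed(order):             # backward pass: latest allowed
        required[v] = min((required[w] - delay[w] for w in fanout[v]),
                          default=t_spec)
    return {v: required[v] - arrival[v] for v in order}


# Hypothetical netlist: g1, g2 feed g3; g3 feeds g4; g2 feeds g5.
fanin = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g3"], "g5": ["g2"]}
order = ["g1", "g2", "g3", "g4", "g5"]
delay = {"g1": 1.0, "g2": 2.0, "g3": 1.0, "g4": 1.0, "g5": 0.5}

slack = slacks(fanin, order, delay, t_spec=5.0)
```

Nodes with positive slack, such as g5 in this example, are the ones a procedure like get_SlkSet would report.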
3 Creating New Timing Slack for Voltage Scaling
The algorithm Dscale terminates when all non-critical parts of the circuit
have been exploited. This is also the stopping point for similar
approaches like [4]. However, there are other possibilities unexplored.
In Section 1, we described that parallel/pipelined architectures are
often used to speed up the circuit to make room for a low supply voltage
[1]. The idea can also be used in gate-level circuits. However, we choose
not to change the logic structure as the architectural approaches do,
since this could invalidate all previous timing and power computations.
Instead, gates are up-sized to create new timing slack, which is then
traded for lower power. Although larger gates contribute greater
capacitance, which translates to more power cost, we will demonstrate
that with careful gate sizing, the overall power gain is still rewarding.
We impose the restriction [8] that gates of the lower voltage cannot be
scattered around the entire circuit. Rather, they must be in one cluster
that is contiguous with the primary outputs of the target system block.
Clearly, such a restriction precludes the driving incompatibility
situation, hence requiring no level restoration except at the boundary of
system blocks.
The algorithm (Gscale) proceeds in an iterative manner. During each
iteration, the CVS technique is coupled with a cut-based gate sizing to
get maximal power reduction. The following pseudo code describes the
operations of Gscale. Note that the original CVS technique operates only
with the primary outputs as the starting point. The new CVS, on the other
hand, operates with every TCB in an iterative fashion.
Gscale(N, T_spec, A_spec, library, V_high, V_low)
{
    TCB = CVS(N, T_spec, library, V_high, V_low);
    counter = 0;
    while (further area increase is allowed)
    {
        CPN = get_CPN(N, TCB, T_spec);
        weight_with_area_versus_time_gain(CPN, library);
        CUT = min_weight_separator(CPN);
        for each node in CUT
            resize gates if area increase is allowed;
        update_timing(N, T_spec);
        TCB_new = CVS(N, T_spec, library, V_high, V_low);
        if (TCB_new == TCB)
            counter++;
        else
        {
            counter = 0;
            TCB = TCB_new;    /* adopt the pushed boundary for the next pass */
        }
        if (counter > MAX_ITER) break;
    }
}
As in Dscale, algorithm Gscale starts with CVS to quickly set the initial
region for the low supply voltage. The central theme of the algorithm is
to push the TCB towards the primary inputs so that $V_{low}$ can be
applied to more nodes.
Pushing the TCB is equivalent to speeding up the signal paths leading
from the primary inputs to the TCB. To do this, we have to identify those
nodes that are the real causes of the delay. Similar to the procedure
get_SlkSet in Dscale, the procedure get_CPN uses a static timing analysis
to identify the critical path network (CPN). The CPN represents the set of
nodes that are candidates for improving the timing at the TCB.
To get the maximum speed-up with the minimum area increase, we set the
weight of each node to the area penalty over the timing improvement. The
goal is to select the set of nodes with the minimum weight and resize
them simultaneously. In order to preserve the timing calculated by
weight_with_area_versus_time_gain, the nodes lying on the same path
cannot be resized at the same time. This constraint can be dealt with
using the concept of the separator set: the minimum-weighted separator
set gives the desired selection of resizing targets.
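A minimum-weight separator can be computed with a max-flow formulation, which the Python sketch below illustrates. Each node is split into an input half and an output half joined by an edge whose capacity is the node weight, the original connections get infinite capacity, and augmenting paths are found breadth-first in the Edmonds-Karp manner. The graph encoding, the node names, and the weights are hypothetical, and the sketch only demonstrates the separator computation, not the weighting procedure or the resizing itself.

```python
from collections import deque


def min_weight_separator(edges, weight, sources, sinks):
    """Minimum-weight node set whose removal cuts every source-sink path."""
    inf = float("inf")
    cap, adj = {}, {}

    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)              # residual (reverse) edge
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for v, w in weight.items():                # split v into v_in -> v_out
        add((v, "in"), (v, "out"), w)
    for u, v in edges:                         # connections cannot be cut
        add((u, "out"), (v, "in"), inf)
    for s in sources:
        add("S", (s, "in"), inf)
    for t in sinks:
        add((t, "out"), "T", inf)

    def bfs():
        parent = {"S": None}
        queue = deque(["S"])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:                                # shortest augmenting paths
        parent = bfs()
        if "T" not in parent:
            break
        path, v = [], "T"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
    # Separator: input half still reachable from S, output half not.
    return {v for v in weight
            if (v, "in") in parent and (v, "out") not in parent}


# Hypothetical critical path network: a, b near the inputs, d on the TCB.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("b", "e"), ("e", "d")]
weight = {"a": 3, "b": 6, "c": 4, "e": 1, "d": 10}

cut = min_weight_separator(edges, weight, sources=["a", "b"], sinks=["d"])
```

In this example the separator {c, e} has total weight 5, cheaper than cutting at the inputs ({a, b}, weight 9) or at the boundary node d (weight 10).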
After the gate sizes have been fixed, the timing of the whole network is
updated. If the timing at the TCB improves, we push the TCB further by
invoking the CVS technique again. Otherwise, the value of counter is
incremented by one. The iteration continues until a maximum number
(MAX_ITER) of unsuccessful pushes of the TCB have been attempted.
3.1 Complexity
Let n and e be the number of nodes and nets in the network,
respectively. The CVS technique is basically a breadth-first traversal,
so it takes $O(n + e)$. Since we use a simple static timing analysis
technique, the complexities of get_CPN and update_timing are again
$O(n + e)$. The complexity of the weighting procedure
weight_with_area_versus_time_gain and that of resizing the CPN nodes can
easily be seen to be $O(n)$. Lastly, for min_weight_separator, we use the
Edmonds-Karp max-flow min-cut algorithm, with a complexity of $O(ne^2)$
[2]. Thus, the entire algorithm takes at most $O(ne^2)$.
4 Empirical Evaluation
The proposed algorithms are implemented in C and built on top of the SIS
[6] package. A total of 72 combinational cells from the COMPASS 0.6 µm
single-poly double-metal library are used here. Cells with inverted
outputs (e.g., NAND) have three different sizes (d0, d1, d2), while those
with non-inverted outputs have only two sizes. To support voltage
scaling, we enrich the library by adding the low-voltage gates. These
gates are copied from, and thus have the same functionality as, their
high-voltage counterparts. However, the timing characteristics of these
gates are obtained at the low supply voltage. Both level restoration
circuits proposed in [8, 10] are used.
Thirty-nine MCNC benchmark circuits are used as the test bed. Each
circuit is first optimized in a technology-independent way via the script
"script.rugged" provided in the SIS package. Then, technology mapping is
invoked to realize each circuit using the enriched COMPASS library. Since
our approach involves adjusting timing for voltage reduction, it is
necessary to set the timing constraint for each circuit so that
subsequent evaluations can be done soundly. To this end, we first run
"map -n1 -AFG" with zero required time on each circuit. According to the
SIS documentation, this scheme produces the minimum-delay circuit without
regard to area. Then, we loosen the timing constraint by 20% and issue the
map command again so that the SIS mapper can perform area-delay tradeoff
using the 20% timing slack. Such a setup complies with our very first
assumption, that the proposed voltage scaling applies only to those
circuits that are not under the strictest timing budget. The mapped
circuits are then processed by our algorithms using the delay of the
mapped circuit as the timing constraint, which is 20% greater than the
minimum delay. The maximum area increase is set to 10%.
As for the voltage setting, we use (5V, 4.3V), which is in accordance
with our internal design project. Table 1 shows the experimental results.
The power values in the table are measured using the generic SIS power
estimation function, which comprises random simulations using a 20 MHz
clock frequency and a pin-to-pin Elmore delay model. In the second column
(OrgPwr) of the table, we record the power value of the initial,
technology-mapped circuit. In the third column (CVS), we record the power
improvement of the CVS technique over the original circuit. The fourth
and fifth columns record the improvement results of Dscale and Gscale. The
last column shows the CPU time of Gscale running on a SUN Ultra SPARC
with 64 MB of RAM. The running time of Dscale is not listed as it is
comparable to that of Gscale. All Gscale results are obtained using a
value of ten for MAX_ITER.
For the CVS technique, there are 7 cases (C1355, C432, C499, f51m, i2,
mux, z4ml) on which CVS cannot make any low-power arrangement, and 2
cases (C3540, i6) on which CVS offers less than 5% improvement over the
original mapping. Nevertheless, the overall performance of CVS is
impressive. There are 18 cases in which more than 10% improvement is
achieved, and 3 cases (lal, x3, x4) where the improvement figures are more
than 20%. On average, CVS achieves 10.27% improvement over the original
circuit, which is good considering its simple algorithmic structure.
The results of CVS can be improved by using Dscale. However, since it is
necessary to insert level restoration circuitry at each low-to-high
boundary, the improvement is quite limited. There are 5 cases (dalu, i2,
i3, i6, pcle) where Dscale fails to improve on CVS. On average, Dscale
produces a 12.09% improvement over the original power, which is 1.82
percentage points better than CVS.
Lastly, Gscale offers consistent improvement over CVS and Dscale. In 17
out of 39 cases (C1355, C3540, C432, C499, C5315, alu2, f51m, i10, i5,
i6, k2, mux, pm1, sct, z4ml, x2, term1), the figures of Gscale are more
than twice those of CVS. There are only 3 cases (pcle, i2, i3) where
Gscale cannot make any contribution. On average, Gscale delivers a 19.12%
improvement over the original power, which is 8.85 and 7.03 percentage
points better than CVS and Dscale, respectively.
Table 2 shows the profiles of the circuits processed by CVS, Dscale, and
Gscale. The second column shows the original number of gates. Columns 3
and 4 show the number and ratio of gates whose voltage has been reduced
by CVS. The corresponding figures for Dscale and Gscale appear in columns
5 to 8. Lastly, columns 9 and 10 show the sizing results of Gscale.
Note that the ratio of low-voltage gates cannot be used directly to
derive the power saving figures in Table 1. This is because we use a
random simulation to do power estimation. Thus, the ratio must be
properly weighted by the capacitance loading and the switching
probability of each gate to derive the true power consumption value.
Still, the correlation between Table 1 and Table 2 can be observed.
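The effect of this weighting can be seen in a small Python sketch. It applies the switching-power expression of Equation (1) to two hypothetical gates and compares two assignments that both place half of the gates at the low voltage. The activities and capacitances are invented for illustration, and the overhead of level restoration circuitry is ignored.

```python
def power_saving(gates, v_high, v_low):
    """Fractional switching-power saving of a low-voltage assignment.

    Each gate is (alpha, c_load, is_low). The clock frequency cancels out
    of the ratio, so it does not appear here.
    """
    before = sum(a * c * v_high ** 2 for a, c, _ in gates)
    after = sum(a * c * (v_low if low else v_high) ** 2
                for a, c, low in gates)
    return 1.0 - after / before


# Two hypothetical gates: one quiet and lightly loaded, one busy and heavy.
quiet = (0.1, 10.0)    # (switching activity, load capacitance in fF)
busy = (0.4, 40.0)

# Both assignments have a low-voltage gate ratio of 50%.
saving_quiet_low = power_saving([(*quiet, True), (*busy, False)], 5.0, 4.3)
saving_busy_low = power_saving([(*quiet, False), (*busy, True)], 5.0, 4.3)
```

Lowering only the quiet gate saves about 1.5% of the switching power, while lowering only the busy gate saves about 24.5%, although the ratio of low-voltage gates is identical in both cases.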
On average, CVS is able to identify 37% of the gates for voltage scaling.
Dscale does a little better by finding an additional 8% of gates for
voltage scaling. However, this additional 8% cannot be completely turned
into power savings, since level restoration circuitry consumes extra
power. In contrast, Gscale raises the portion of low-voltage gates to
70%, and in the meantime keeps the number of level restoration gates
minimized, as CVS does.
Further, the data in the last 2 columns show that the ratio of
sized gates in
G
s
c
a
l
e is quite small. Except on circuits C1355,
C499, sct, term1, and z4ml, the sizing ratio is less than 10%. When
interpreted using the total gate area, the area overhead is less than
6%. With the small overhead shown in the current table, and the
large power gain shown in the previous table, our use of gate sizing
for voltage reduction is justiﬁed.
5 Conclusion
We have proposed gate-level voltage scaling algorithms for designs that
are not under the strictest timing budget. We showed that integrating
voltage scaling with gate sizing produces an average of 19.12% power
reduction over circuits that have only one supply voltage.
Our future research is to improve the gate sizing method so that the
timing impact of selecting a specific gate can be computed more globally
and accurately. In addition, advanced timing analysis, such as false path
elimination, can be incorporated to improve the overall results.
Table 1: Improvement over the Original Power (%)
circuit   OrgPwr(µW)      Improvement (%)              CPU
                          CVS   Dscale   Gscale
C1355         321.88     0.00     1.98    21.41     7.02
C2670         447.58    14.62    18.27    22.56    20.03
C3540         657.90     2.12     2.73    13.63    27.04
C432          108.66     0.00     4.20    13.83     1.01
C499          326.32     0.00     1.77    15.78     6.02
C5315        1089.07     9.42    12.25    23.75    84.08
C7552        1615.53     9.08    11.46    18.96   130.12
C880          228.49    17.02    17.94    19.09     4.01
alu2          144.87     6.33     8.15    16.74     3.01
alu4          245.74     5.45     6.95    17.74    13.03
apex6         346.72    18.02    20.15    24.70    22.03
apex7         127.61    19.53    21.33    21.56     2.01
b9             67.61    12.63    15.95    19.72     1.50
dalu          250.21    18.63    18.63    21.76    19.03
des          1615.72    18.78    20.72    22.10   347.26
f51m           69.74     0.00     1.80    16.32     1.00
i1             18.54    13.57    15.69    19.10     0.70
i10           997.01     9.28    11.18    20.02   185.14
i2             50.20     0.00     0.00     0.00     0.00
i3            109.61     0.43     0.43     0.43     1.70
i5            146.99     6.36     8.35    13.08     1.80
i6            222.70     3.04     3.04    25.74    15.02
k2            179.22     9.22    11.64    24.00    35.04
lal            41.48    20.65    23.54    23.86     1.02
mux            30.20     0.00     1.73    17.03     1.00
my_adder      132.19    11.80    12.03    13.24     1.01
pair          926.39    19.93    20.86    21.67    74.06
pcle           42.15    19.58    19.58    19.58     1.00
pm1            14.64     8.76    11.17    23.37     1.00
rot           388.74    13.88    18.22    22.21    18.02
sct            40.32     7.21     9.01    21.21     0.95
term1          83.40     9.60    12.12    17.53     1.00
too_large     117.71    12.48    15.91    23.82     3.01
vda           137.94    14.04    14.96    15.62     6.01
x1            150.51    19.60    21.06    25.00     4.01
x2             23.44     6.51     8.54    22.74     1.00
x3            382.57    22.99    23.84    25.16    20.02
x4            154.36    20.04    20.74    22.42     4.01
z4ml           30.94     0.00     3.71    19.16     0.54
average                 10.27    12.09    19.12
Table 2: Profiles
circuit      Org                 Low Volt                           Sizing
                           CVS        Dscale        Gscale
                      #  Ratio      #  Ratio      #  Ratio      #  AreaInc
C1355        390      0   0.00     27   0.07    286   0.73     58     0.01
C2670        583    280   0.48    340   0.58    487   0.84      6     0.00
C3540        996     68   0.07     95   0.10    532   0.53      9     0.00
C432         159      0   0.00     29   0.18     70   0.44      9     0.01
C499         390      0   0.00     35   0.09    214   0.55     56     0.01
C5315       1318    503   0.38    620   0.47   1193   0.91     23     0.00
C7552       1957    545   0.28    740   0.38   1281   0.65     82     0.01
C880         295    163   0.55    187   0.63    188   0.64      7     0.01
alu2         291     53   0.18     75   0.26    166   0.57     17     0.01
alu4         573    104   0.18    139   0.24    404   0.71     31     0.02
apex6        664    477   0.72    557   0.84    620   0.93      4     0.00
apex7        217    151   0.70    178   0.82    172   0.79      2     0.01
b9           111     56   0.50     77   0.69     86   0.77      6     0.03
dalu         706    430   0.61    430   0.61    517   0.73     12     0.00
des         2795   2047   0.73   2312   0.83   2384   0.85    115     0.01
f51m          81      0   0.00      6   0.07     47   0.58      6     0.02
i1            35     21   0.60     25   0.71     26   0.74      2     0.02
i10         2121    740   0.35   1022   0.48   1638   0.77     14     0.00
i2           102      0   0.00      0   0.00      0   0.00      0     0.00
i3           114      6   0.05      6   0.05      6   0.05      0     0.00
i5           199     48   0.24     76   0.38     99   0.50      1     0.00
i6           456     48   0.11     48   0.11    448   0.98     13     0.01
k2           880    240   0.27    344   0.39    807   0.92     15     0.01
lal           86     61   0.71     74   0.86     80   0.93      6     0.03
mux           60      0   0.00      4   0.07     33   0.55      4     0.04
my_adder     179     76   0.42     78   0.44     84   0.47      3     0.02
pair        1351    952   0.70    973   0.72   1042   0.77     14     0.00
pcle          68     42   0.62     42   0.62     42   0.62      0     0.00
pm1           43     16   0.37     23   0.53     39   0.91      4     0.05
rot          585    289   0.49    396   0.68    488   0.83      2     0.00
sct           73     19   0.26     25   0.34     59   0.81     11     0.05
term1        136     52   0.38     74   0.54     99   0.73     13     0.03
too_large    253     99   0.39    126   0.50    227   0.90      7     0.00
vda          485    168   0.35    189   0.39    211   0.44     16     0.01
x1           260    187   0.72    198   0.76    246   0.95      8     0.01
x2            39     10   0.26     14   0.36     33   0.85      3     0.02
x3           625    515   0.82    542   0.87    593   0.95     11     0.00
x4           270    213   0.79    225   0.83    234   0.87      3     0.00
z4ml          41      0   0.00      6   0.15     30   0.73      7     0.06
average                   0.37          0.45          0.70            0.01
References
[1] A. P. Chandrakasan and R. W. Brodersen, Low-power CMOS digital
design, Kluwer Academic Publishers, 1995.
[2] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to
Algorithms, Chap. 27, MIT Press, McGraw-Hill Book Co., 1992.
[3] D. Kagaris and S. Tragoudas, "Maximum independent sets on transitive
graphs and their applications in testing and CAD," Proc. Int. Conf. on
Computer-Aided Design, Nov. 1997, pp. 736-740.
[4] S. Raje and M. Sarrafzadeh, "Variable voltage scheduling," Int. Symp.
on Low Power Design, 1995, pp. 9-14.
[5] J. D. Meindl, "Low power microelectronics: retrospect and prospect,"
Proc. IEEE, vol. 83, no. 4, Apr. 1995.
[6] E. M. Sentovich et al., "SIS: A System for Sequential Circuit
Synthesis," Technical report UCB/ERL M92/41, Univ. of California,
Berkeley, May 1992.
[7] D. Singh et al., "Power conscious CAD tools and methodologies: a
perspective," Proc. IEEE, vol. 83, no. 4, Apr. 1995, pp. 570-593.
[8] K. Usami and M. Horowitz, "Clustered voltage scaling technique for
low-power design," Int. Symp. on Low Power Design, 1995, pp. 3-8.
[9] K. Usami et al., "Automated Low-Power Technique Exploiting Multiple
Supply Voltages Applied to a Media Processor," IEEE J. Solid-State
Circuits, vol. 33, no. 3, Mar. 1998, pp. 463-472.
[10] J. S. Wang, S. J. Shieh, J. C. Wang, and C. Yeh, "Design of standard
cells used in low power ASICs exploiting multiple-supply-voltage scheme,"
Proc. 11th ASIC Conf., Sept. 1998.