A Combined Gate Replacement and Input Vector Control Approach for Leakage Current Reduction by Yuan, Lin & Qu, Gang
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 2, FEBRUARY 2006 173
A Combined Gate Replacement and Input Vector
Control Approach for Leakage Current Reduction
Lin Yuan and Gang Qu
Abstract—Input vector control (IVC) is a popular technique for
leakage power reduction. It utilizes the transistor stack effect in
CMOS gates by applying a minimum leakage vector (MLV) to the
primary inputs of combinational circuits during the standby mode.
However, the IVC technique becomes less effective for circuits of
large logic depth because the input vector at primary inputs has
little impact on leakage of internal gates at high logic levels. In this
paper, we propose a technique to overcome this limitation by re-
placing those internal gates in their worst leakage states by other
library gates while maintaining the circuit’s correct functionality
during the active mode. This modification of the circuit does not re-
quire changes of the design flow, but it opens the door for further
leakage reduction when the MLV is not effective. We then present
a divide-and-conquer approach that integrates gate replacement,
an optimal MLV searching algorithm for tree circuits, and a ge-
netic algorithm to connect the tree circuits. Our experimental re-
sults on all the MCNC91 benchmark circuits reveal that 1) the gate
replacement technique alone can achieve 10% leakage current re-
duction over the best known IVC methods with no delay penalty
and little area increase; 2) the divide-and-conquer approach out-
performs the best pure IVC method by 24% and the existing con-
trol point insertion method by 12%; and 3) compared with the
leakage achieved by optimal MLV in small circuits, the gate re-
placement heuristic and the divide-and-conquer approach can re-
duce on average 13% and 17% leakage, respectively.
Index Terms—Gate replacement, leakage reduction, minimum
leakage vector (MLV).
I. INTRODUCTION
As VLSI technology and supply/threshold voltages continue scaling down, leakage power has become more
and more significant in the power dissipation of today’s CMOS
circuits. For example, it is projected that subthreshold leakage
power can contribute as much as 42% of the total power in
the 90-nm process generation [11]. Many techniques thus
have been proposed recently to reduce the leakage power
consumption. Dual threshold voltage process uses devices
with higher threshold voltage along noncritical paths to reduce
leakage current while maintaining the performance [16]. The
multiple-threshold CMOS (MTCMOS) technique places a high-$V_t$
device in series with low-$V_t$ circuitry, creating a sleep
transistor [13]. In dynamic threshold MOS (DTMOS) [3], the
gate and body are tied together and the threshold voltage is
altered dynamically to suit the operating state of the circuit.
Another technique to dynamically adjust threshold voltages is
the variable threshold CMOS (VTCMOS) [14]. All of these
approaches require process technology support.
Manuscript received May 28, 2005; revised October 15, 2005.
The authors are with the Electrical and Computer Engineering Department
and Institute for Advanced Computer Studies, University of Maryland, College
Park, MD 20742 USA (e-mail: yuanl@eng.umd.edu).
Digital Object Identifier 10.1109/TVLSI.2005.863747
Fig. 1. Leakage current of (a) INVERTER, (b) NAND2, and (c) NAND3. Data
obtained by simulation in Cadence Spectre using a 0.18-µm process.
The input vector control (IVC) technique is applied to re-
duce leakage current at circuit level with little or no performance
overhead [7]. It is based on the well-known transistor stack ef-
fect: a CMOS gate’s subthreshold leakage current varies dra-
matically with the input vector applied to the gate [10]. Recently,
Lee et al. observed that gate oxide leakage is also dependent
on the input vectors to a CMOS gate [12]. Besides, the max-
imal and minimal leakage vectors are the same for both sub-
threshold leakage and gate leakage. In our study, we use Ca-
dence Spectre to measure the overall leakage current in a CMOS
gate that includes both subthreshold leakage and gate leakage.
Fig. 1 lists the overall leakage current in INVERTER, NAND2 and
NAND3 gates under all the possible input combinations. We see
that the worst case leakage (marked in bold) is much higher
than the other cases. The idea of IVC technique is to manip-
ulate the input vector with the help of a sleep signal to reduce
the leakage when the circuit is in standby mode [9]. The
associated minimum leakage vector (MLV) problem seeks to find
a primary input vector that minimizes the total leakage current
in a given circuit [1], [4]–[6], [8]–[10], [15]. The MLV problem
is NP-complete and both exact and heuristic approaches have
been proposed to search for the MLV. A detailed survey is given
in Section II.
In this paper, we consider how to enhance IVC technique with
little or no re-design effort. In particular, we study the MLV+
problem that seeks to modify a given circuit and determine an
input vector such that the circuit’s functionality is maintained
at the active mode and the circuit leakage is minimized when
the circuit is at standby mode. Our solution to this problem is
based on the concept of gate replacement that is motivated by
the large discrepancy between the worst leakage and the other
cases (see Fig. 1). The essence of gate replacement is to replace
a logic gate that is at its worst leakage state (WLS) by another
library gate. This is illustrated by the following example.
Fig. 2. Motivation example for gate replacement. (a) Original MCNC
benchmark circuit C17 with total leakage 831.08 nA under the optimal MLV.
(b) New circuit C17 with three gates replaced and total leakage 476.88 nA
under the same MLV.
Consider circuit C17 from the MCNC91 benchmark
suite [21] [Fig. 2(a)]. An exhaustive search finds the MLV,
with the corresponding minimum total leakage
current of 831.08 nA. Note that one NAND2 gate $G$ is at its worst
leakage state, with a leakage current of 454.5 nA, which contributes more
than half of the total leakage. In fact, we have observed that a
significant portion of the total leakage is often caused by the
gates that are in their WLS (see Table II in Section V).
Instead of controlling the primary inputs, we consider replacing
these leakage-intensive gates. In particular, we replace
the NAND2 gate $G$ by a NAND3 gate $\tilde{G}$, where the third input
is the complement of the SLEEP signal [Fig. 2(b)]. In
active mode, SLEEP = 0 and $\tilde{G}$ produces the same output
as $G$. But in standby mode, SLEEP = 1 and $\tilde{G}$ has a
leakage of 94.87 nA [Fig. 1(b)], which is much smaller than
$G$’s 454.5 nA.
However, this replacement also changes the output of this gate
in standby mode and affects the leakage of two other gates. In
this case, we replace them in a similar fashion. As a result, the
new circuit’s total leakage becomes 476.88 nA, a 43% reduction
from the original 831.08 nA in Fig. 2(a).
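The two claims in this example, functional equivalence in active mode and the size of the reduction, can be checked with a short sketch. The leakage totals are the ones quoted above; the helper names (`nand2`, `nand3`, `replaced_gate`) and the convention that SLEEP = 0 in active mode are assumptions made for illustration.

```python
from itertools import product

def nand2(a, b):
    return 1 - (a & b)

def nand3(a, b, c):
    return 1 - (a & b & c)

def replaced_gate(a, b, sleep):
    # Third input is the complement of SLEEP (assumed: SLEEP = 0 in active mode).
    return nand3(a, b, 1 - sleep)

# Active mode: the NAND3 matches the original NAND2 on every input pair.
active_ok = all(replaced_gate(a, b, 0) == nand2(a, b)
                for a, b in product((0, 1), repeat=2))

# Standby mode: the third input is 0, so the output is forced to 1.
standby_outputs = {replaced_gate(a, b, 1) for a, b in product((0, 1), repeat=2)}

# Leakage totals quoted in the example (nA).
before_nA, after_nA = 831.08, 476.88
reduction_pct = 100 * (before_nA - after_nA) / before_nA

print(active_ok, standby_outputs, round(reduction_pct, 1))
```

The forced standby output is the reason the replacement can disturb fanout gates, which is why the other two gates in the example are replaced as well.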
The proposed gate replacement technique is conceptually different
from the existing IVC methods. In fact, they are complementary
to each other. Specifically, the IVC method considers
the entire circuit and searches for an appropriate input vector
that yields small leakage. The gate replacement technique directly
targets the logic gates that are in their WLS under a specific
input vector and replaces them to reduce leakage. This paper makes
the following contributions.
1) We examine the effectiveness of IVC methods¹ in multilevel
circuits. For all the 69 MCNC91 benchmarks, we
obtain the optimal MLV for small circuits and the best
over 10 000 random input vectors for large circuits. The
percentage of gates in their WLS is on average 15% and
17%, respectively, but these gates contribute more than 40% of
the circuit’s total leakage.
¹IVC-based approaches such as internal control point insertion [1] will be
discussed in Section II.
2) Motivated by the above observation, we propose the tech-
nique to replace gates that are in their WLS by other li-
brary gates that will generate less leakage current at those
states. Unlike other leakage reduction techniques such as
MTCMOS and DTMOS, this modification of the circuit
does not require changes of process technology in the
design flow. Hence, it will not increase the design com-
plexity or the leakage sensitivity.
3) We implement a fast gate replacement algorithm that
gives an average of 10% leakage reduction for a fixed
input vector. This algorithm’s run time complexity is
linear in the number of gates in the circuit in the average
case and quadratic in the worst case.
4) We develop a divide-and-conquer approach to combine
gate replacement and IVC. It reduces the leakage by 17%
and 24% over the optimal/suboptimal MLV mentioned in
1) with little area and delay overhead. The percentage of gates
in their WLS drops to 4% and 9%, respectively.
II. RELATED WORK
In this section, we mainly survey the efforts on IVC-based
leakage reduction techniques. A survey on other leakage mini-
mization techniques can be found in [7].
The effect of circuit input logic values on leakage current was
observed by Halter and Najm [9]. The underlying reason for this
effect was explained by Johnson et al. [10] as the transistor stack
effect. The authors of [9] proposed inserting a set of
latches that store the MLV at the primary inputs of a circuit,
forcing the combinational logic into a low-leakage state when
the circuit is idle. Many algorithms have been proposed to find
such MLV. Based on the nature of these algorithms, they can be
classified into the following groups:
Heuristic Algorithms: These include the random search al-
gorithm developed by Halter and Najm [9] and the genetic al-
gorithm proposed by Chen et al. [5].
Johnson et al. [10] defined leakage observability for each pri-
mary input as the degree to which the value of a particular input
is observable in the magnitude of leakage current. They itera-
tively chose the input with the largest leakage observability and
assigned it with a value that results in the smallest leakage. The
input combination constructed in this greedy fashion was taken
as the MLV.
In [15], Rao et al. introduced the concept of node controlla-
bility, which is defined as the minimum number of inputs that
have to be assigned to particular values to ensure that a node (or
gate) is in a specific state. Based on this, they proposed a fast
greedy heuristic to determine the values of the primary inputs
that minimize the node’s leakage.
Exact Algorithms: The MLV problem can be modeled as a
pseudo-Boolean satisfiability (SAT) problem. This formulation
allows us to apply the off-the-shelf SAT solvers to find the MLV
for leakage reduction [1], [2].
Gao and Hayes [8] formulated the MLV problem as an in-
teger linear programming (ILP) problem. They first use pseudo-
Boolean functions to represent leakage current in different types
of cells with the general sum-of-products form. Then they apply
the well-known Boole–Shannon expansion [19] to linearize the
objective function and constraints. Finally, they use an off-the-shelf
ILP solver to solve the ILP optimization. For large circuits,
the authors proposed a simplified mixed-ILP formulation that
uses selective variable-type relaxation to reduce the runtime.
Based on the pseudo-Boolean formulation of the leakage in
CMOS gates, two implicit pseudo-Boolean enumeration algo-
rithms are presented in [6]. The input space enumeration method
leverages integer valued decision diagrams and works well for
small circuits. The hyper-graph partitioning based recursive al-
gorithm represents a given circuit as a hyper-graph, partitions it,
and uses divide-and-conquer to solve the problem. The trade-off
between dynamic and leakage power in choosing the MLV has
also been discussed.
Internal Point Control: Due to the ineffectiveness of IVC
technique for circuits with large logic levels, Abdollahi et al.
proposed a technique to directly control the value of internal
pins to reduce leakage [1]. Their first approach inserts multi-
plexers at the input pins of each gate. The SLEEP signal se-
lects the correct input in active mode and chooses the input
values that produce low leakage current in standby mode. This
approach can reduce leakage in the CMOS gates significantly;
however, the inserted multiplexers will also generate leakage
current and introduce extra delay and area. In their second ap-
proach, they modify the library gates by adding SLEEP signal-
controlled transistors in the gate to select the low-leakage inputs
for its fanout gates. However, since the structure of the gates is
changed, a new set of library gates are needed.
Our gate replacement technique belongs to the class of in-
ternal point control, but is conceptually different from [1] in the
following aspects.
1) They treat each input pin of the gates as potential places to
insert multiplexers, while we consider only roots of each
tree. The search space is reduced substantially.
2) Their purpose in modifying a gate $G$ is to produce the
low-leakage input for $G$’s fanout gate, while we aim to
reduce the leakage current at $G$ itself.
3) They modify gates whenever necessary while we restrict
our algorithm to replace gates only by the available gates
in the library, and, hence, do not require gate structure
modification.
However, these two approaches can be combined, as we will
discuss in more detail in Sections III and IV.
III. LEAKAGE REDUCTION BY GATE REPLACEMENT
A logic gate is at its WLS when its input yields the largest
leakage current. Regardless of the primary input vector, a large
number of gates are at WLS, particularly when the circuit has
high logic depth. Take the 69 MCNC91 benchmarks for example.
When we apply the optimal
(or suboptimal) MLVs to these circuits, 16% of the gates on average
remain at WLS, producing more than 40% of the circuit’s
total leakage. A detailed report can be found in Section V. In this
section, we describe the gate replacement technique that directly targets
the leakage reduction in WLS gates.
Fig. 3. Gate replacement and the consequence to its fanout gate.
A. Basic Gate Replacement Technique
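As we have shown in the motivation example in Section I,
the proposed gate replacement technique replaces a gate $G(\vec{x})$
by another library gate $\tilde{G}(\vec{x}, \mathrm{SLEEP})$, where $\vec{x}$ is the input
vector at $G$, such that
1) $\tilde{G}(\vec{x}, 0) = G(\vec{x})$ when the circuit is active
(SLEEP = 0);
2) $\tilde{G}(\vec{x}, 1)$ has smaller leakage than $G(\vec{x})$ when the circuit is
in standby (SLEEP = 1).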
The first condition guarantees the correct functionality of the circuit
in active mode. The second condition reduces the leakage
of gate $G$ in standby mode, but it may change the output of
this gate. Note that, although we do not need to maintain the circuit’s
functionality in standby mode, this change may affect
the leakage of other gates and should be carefully considered.
Fig. 3(a) shows that the replacement of $G$ by $\tilde{G}$ changes the
output from 0 to 1. For simplicity, we assume that $G$’s fanout
only goes to a single gate, which can be a NAND, a NOR, or
an INVERTER. In Fig. 3(b) and (d), we see that such a change does
not affect the output of the fanout gate and, therefore, it will not affect
any other gates in the circuit. Let $L(G, 11)$ be the leakage of
gate $G$ with input 11; we can conveniently compute the leakage
reduction achieved by this replacement, which is the leakage decrease at $G$
plus the leakage change at the fanout gate in the case of (b), for example.
In Fig. 3(c), the replacement at gate $G$ not only changes the
output of gate $G$, it also puts the fanout gate at its WLS. Our solution is
to replace the fanout NAND2 gate by a NAND3 gate. This preserves
the output of the fanout gate, and the leakage change is again the sum of
the leakage changes at the two replaced gates. Similarly, in Fig. 3(f),
we replace the INVERTER by a NAND2 gate. Finally, in Fig. 3(e),
the replacement of $G$ moves both $G$ and its fanout gate away from
their WLS. It also changes the output of the NOR gate, for which
we can conduct a similar analysis.
Remarks:
• General Fanout: The above analysis is applicable to $G$’s
fanout gate of any type. The change of $G$’s output either
does not affect the fanout gate’s output [Fig. 3(b) and (d)] or changes
it. In the latter case, we either change the fanout gate’s output
back [Fig. 3(c) and (f)] or continue the analysis starting
from the fanout gate [Fig. 3(e)].
• Beyond library gates: If the library does not have a replacement
for $G$, we can add one transistor into the N or
P sections of $G$ to meet conditions 1 and 2. This is similar
to the gate modification method proposed in [1]. However,
that method attempts to control the output of the modified gate in
order to reduce the leakage in its fanout gate by producing
the desirable signal. Our gate replacement directly targets
the leakage reduction of the current gate.
• Multiple fanouts: When gate $G$ has multiple fanouts, we
analyze each of them and then consider their total leakage
when we compute the leakage change due to the replacement
of gate $G$.
• Compatibility: The gate replacement technique does not
change the primary input vector of the circuit. This implies
that we can combine it with existing MLV searching
strategies to further reduce leakage. The MLV+ problem
is based on this observation and is discussed in detail in
Section IV-B.
• Power overhead: There is not much dynamic power overhead
because the SLEEP signal remains constant in active
mode and will not cause any additional switching activities.
The leakage in gates $G$ and $\tilde{G}$ may be different in active
mode. Such difference becomes negligible when the circuit
stays in standby mode long enough [1].
• Other overhead: Gate replacement may introduce delay
and area overhead. This overhead can be controlled by restricting
replacement to gates off the critical path and by transistor
resizing. Gate replacement does not add new logic gates and
thus requires little or no effort to redo the place-and-route.
B. Fast Gate Replacement Algorithm
Based on the above gate replacement technique, we propose
a fast algorithm that selectively replaces gates to reduce the cir-
cuit’s total leakage for a given input vector. Fig. 4 gives the pseu-
docode of this algorithm.
We visit the gates in the circuit in topological order. We
skip all the gates that are not at WLS and the gates that have
already been visited or marked (line 16) until we find a new
gate $G$ at WLS (line 2). Lines 3–9 find a subset of gates $S$
and temporarily replace them. $S$ includes all the unmarked gates
whose leakage and/or output is affected by the replacement we
attempt to do on gate $G$ and other gates in $S$. We then compute
the total leakage change caused by the replacement of gates in
$S$ (line 10) and adopt these replacements if there is a leakage
reduction (lines 11–13). Otherwise, we simply mark gate $G$ as
visited and do not make any replacement (line 14). We then look
for the next unmarked gate at WLS, and this procedure stops
when all the gates in the circuit are marked.
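The procedure described above can be sketched in a few lines. This is only an illustrative approximation, not the pseudocode of Fig. 4: it assumes a circuit made of NAND gates and inverters (an inverter is a one-input NAND), a made-up leakage model in which the all-ones input is the worst state, and a replacement that adds one input tied to the complement of SLEEP (0 in standby). It also re-simulates the whole circuit at every step instead of updating only the affected set, so it does not achieve the run time discussed in this section. All function names are hypothetical.

```python
def nand(bits):
    return 0 if all(bits) else 1

def leak(bits):
    # Made-up leakage model (nA): all-ones is the worst state; more zeros leak less.
    zeros = bits.count(0)
    return 450.0 if zeros == 0 else 100.0 / zeros

def simulate(circuit, pi, replaced):
    """Standby-mode simulation; returns (set of gates at WLS, total leakage)."""
    val, wls, total = dict(pi), set(), 0.0
    for name, fanins in circuit:              # gates listed in topological order
        bits = [val[f] for f in fanins]
        if name in replaced:                  # extra input = NOT SLEEP = 0 in standby
            bits.append(0)
        if all(bits):
            wls.add(name)
        val[name] = nand(bits)
        total += leak(bits)
    return wls, total

def gate_replacement(circuit, pi):
    replaced, marked = set(), set()
    for name, _ in circuit:
        wls, base = simulate(circuit, pi, replaced)
        if name in marked or name not in wls:
            continue                          # skip marked gates and gates not at WLS
        trial = replaced | {name}             # tentative replacement set
        while True:                           # also replace gates pushed into WLS
            new_wls, total = simulate(circuit, pi, trial)
            pushed = new_wls - wls - marked
            if not pushed:
                break
            trial |= pushed
        if total < base:                      # adopt only if leakage goes down
            replaced = trial
            marked |= trial
        marked.add(name)
    return replaced, simulate(circuit, pi, replaced)[1]

# Hypothetical three-gate circuit: g1 = NAND(a, b), g2 = INV(g1), g3 = NAND(g1, g2).
circuit = [("g1", ["a", "b"]), ("g2", ["g1"]), ("g3", ["g1", "g2"])]
print(gate_replacement(circuit, {"a": 0, "b": 0}))
```

In this toy case, replacing g2 alone pushes g3 into its worst state and gives no net gain; replacing both lowers the made-up total from 600 nA to 250 nA, which mirrors the situation of Fig. 3(c).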
Correctness: The topological order guarantees that when we
find a gate at its WLS, all its predecessors have already been
considered. The replacement at line 7 ensures that the functionality
will not change in active mode. The subset $S$ constructed
in the while loop (lines 4–9) is the transitive closure
of gates that are affected by the replacement action at gate $G$.
Therefore, we only need to compute the leakage change on gates
within $S$ (line 10). We make the replacement only when this
leakage change is favorable, so the new circuit will have
less leakage in standby mode.
Fig. 4. Pseudocode of the gate replacement algorithm.
Complexity: Let $n$ be the number of gates in the circuit. The
for loop is linear in $n$. Inside the for loop, the computation of
leakage change and the marking of all gates in $S$ (lines 10–15) is
linear in $|S|$, the number of gates in $S$. The while loop (lines
3–9) stops when there is no new addition to $S$, and it will
be executed no more than $|S|$ times. As we have discussed in
Section III-A (see Fig. 3), in most cases, $S$ includes only $G$ and
its fanout gates. However, it may include all the gates of the
circuit in cases similar to Fig. 3(e), and so $|S|$ cannot be bounded
by any constant. That is, $|S|$ is $O(n)$ in the worst case and $O(k)$
on average, where $k$ is the maximal fanout of the gates in the
circuit. Consequently, the complexity of this gate replacement
algorithm is $O(n^2)$ in the worst case and $O(kn)$ on average.
Improvement: There are several ways to improve the
leakage reduction performance of the above gate replacement
heuristic. The tradeoff will be either increased design complexity,
or reduced circuit performance, or both. First, one can
consider gates that are not in the library, as we have commented
in the second remark in Section III-A (line 6). However, this
requires the measurement of leakage current, area, and delay
in these new gates, as they are not available in the library. A
second alternative is to insert a control point at one of $G$’s fanins.
For example, one can find the fanin $x_i$ such that replacing $x_i$
by its complement gives the largest leakage reduction.
If $x_i = 0$, replace it by OR($x_i$, SLEEP); if $x_i = 1$, replace
it by AND($x_i$, $\overline{\mathrm{SLEEP}}$). However, the addition of new gates
may require repeating placement and routing and will incur
more area and delay penalty in general. Third, one may also
consider both the library gate replacement and control point
insertion at the same time and choose the one that gives more
leakage reduction. Finally, whenever we replace gate $G$, we
also make the replacement for all the other gates in the selection
permanent (line 13). We have tested a couple of alternatives,
and they give limited improvement in leakage reduction at a very
high cost in run time complexity.
Fig. 5. Illustration for the proof of the NP-completeness of the MLV problem. (a) Circuit for satisfiability test. (b) Reducing the satisfiability test to MLV.
The incentive to keep the run time complexity of this gate
replacement algorithm low is that it will be combined with IVC
technique under the following divide-and-conquer approach to
solve the MLV+ problem.
IV. SOLVING THE MLV+ PROBLEM
Recall that the MLV problem seeks the input vector that min-
imizes the circuit’s total leakage. It has been claimed that this
problem is NP-complete for general circuits [1], [6], [10], [15].
But no formal proof has been given to our knowledge. In this
section, we first give a brief proof of the NP-completeness of
the MLV problem and then define the MLV+ problem, an ex-
tension of the MLV problem. Our main focus will be on the di-
vide-and-conquer approach that solves the MLV+ problem.
A. NP-Completeness of the MLV Problem
The MLV problem can be defined as follows: given a combi-
national circuit consisting of primary inputs (PIs), primary out-
puts (POs), internal logic gates connected by nets/wires, and the
leakage current of each gate under different input combinations,
determine an input vector at the PIs such that the total leakage
current of all the gates in the circuit is minimized.
Theorem: The MLV problem is NP-complete.
Proof: On one side, we have already mentioned a couple
of exact algorithms that solve the MLV problem by reducing it
to NP-complete problems such as pseudo-Boolean satisfiability
and ILP.
On the other side, we show that the NP-complete CIRCUIT-SAT
problem [18] can be reduced to the MLV problem.
Consider an arbitrary circuit, shown in Fig. 5(a). To test whether
the circuit is satisfiable (i.e., can produce a logic “1” at its output),
we construct a new circuit by adding a big inverter at its output
[Fig. 5(b)]. The inverter is big in the sense that it has a huge
leakage value $L_0$ when its input is “0” and a small leakage $L_1$
when its input is “1.” Specifically, we can set $L_0$ to be the sum
of $L_1$ and the leakage of each gate in the circuit when it is in
its WLS. Now we solve the MLV problem for this modified
circuit. If the total leakage is less than $L_0$, clearly the original
circuit is satisfiable and the MLV is one input vector that makes
the circuit output logic “1.” Otherwise, because the only
way for the total leakage to be larger than $L_0$ is when the input
to the big inverter is “0,” the original circuit is not satisfiable.
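The reduction in the proof can be sketched as follows. This is a toy illustration under stated assumptions: the circuit is given as a Python function returning its output bit and the leakage of its own gates, the leakage numbers are made up, and the MLV is found by brute force. The names `sat_via_mlv`, `l_big`, and `l_small` are hypothetical. The sketch compares against the threshold with `<=`, since with positive gate leakages any vector that drives the big inverter with “0” strictly exceeds the threshold.

```python
from itertools import product

def sat_via_mlv(num_inputs, circuit, wls_sum, l_small=1.0):
    """Decide satisfiability of `circuit` by solving MLV on the extended circuit.

    circuit(x) returns (output bit, total leakage of the original gates);
    wls_sum is the sum of every gate's worst-case leakage.
    """
    l_big = l_small + wls_sum                  # big inverter's leakage at input 0

    def total_leakage(x):
        out, leak = circuit(x)
        # The inverter sees the circuit output: small leakage at 1, huge at 0.
        return leak + (l_small if out == 1 else l_big)

    # Brute-force MLV search (exponential; only for tiny illustrations).
    mlv = min(product((0, 1), repeat=num_inputs), key=total_leakage)
    # Gate leakages are positive, so any vector with output 0 exceeds l_big.
    return total_leakage(mlv) <= l_big, mlv

def and_gate(x):                               # satisfiable: x0 AND x1
    return x[0] & x[1], 10.0 + 5.0 * (x[0] + x[1])

def contradiction(x):                          # unsatisfiable: x0 AND (NOT x0)
    return x[0] & (1 - x[0]), 13.0

print(sat_via_mlv(2, and_gate, wls_sum=20.0))       # (True, (1, 1))
print(sat_via_mlv(1, contradiction, wls_sum=13.0))  # (False, (0,))
```

For the AND gate, the only satisfying vector is also the one that puts the gate in its made-up worst state, so the total lands exactly on the threshold; this is the boundary case that motivates the non-strict comparison.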
B. MLV+ Problem and Outline of the Divide-and-Conquer
Approach
In the previous section, we have seen that leakage current can
be further reduced over the MLV by the proposed gate replace-
ment technique. We have also mentioned that this technique
is independent of the input vector and can be combined with
the MLV method. We, hence, formulate the following MLV+
problem.
Given a combinational circuit with PIs, POs, the internal
logic gates that implement the PI-PO functionality, and the
leakage current of each library gate under its different input
patterns, determine a gate level implementation of the same
PI-PO functionality without changing the place-and-route and
an input vector at the PIs that minimizes the total leakage.
Evidently, this is an extension of the MLV problem with the
relaxation of modifying the circuit by gate replacement. It enlarges
the search space of MLV and provides us with the opportunity
of finding better solutions. For a circuit of $p$ PIs and $n$ internal
logic gates, the search space for the original MLV problem is
the $2^p$ different input combinations. Under the above MLV+
formulation, the search space becomes $2^p \prod_{i=1}^{n} r_i$, where $r_i$ is
the number of library gates that can replace gate $i$, including gate
$i$ itself. Assuming that half of the gates have one replacement,
the solution space for the MLV+ problem will be $2^{n/2}$ times
larger than the solution space for the MLV problem. Even when
we restrict the gate replacement technique only to gates that
are at their WLS, this will be significant because 1) a circuit
normally has more gates than PIs and 2) the percentage
of gates in WLS is considerably high (16% on the MCNC91
benchmark when MLV is applied, and will be higher as the logic
depth of the circuit increases).
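As a quick numerical check of this counting argument, the following sketch computes both search-space sizes for a hypothetical circuit with 10 PIs and 100 gates, half of which have exactly one alternative library gate. The circuit size and the function names are assumptions chosen only for illustration.

```python
from math import prod

def mlv_space(num_pis):
    # Number of primary input vectors.
    return 2 ** num_pis

def mlv_plus_space(num_pis, choices_per_gate):
    # Each gate contributes its number of admissible implementations,
    # counting the original gate itself.
    return 2 ** num_pis * prod(choices_per_gate)

# Hypothetical circuit: 100 gates, half of them with one replacement candidate.
choices = [2] * 50 + [1] * 50
ratio = mlv_plus_space(10, choices) // mlv_space(10)
print(ratio == 2 ** 50)   # True
```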
As we have analyzed in the previous section, the MLV+
problem not only enlarges the solution space for the IVC
method, it also has the great potential in improving the solution
quality (in terms of leakage reduction) because of the stack ef-
fect. However, one challenge is how to explore such enormous
solution space for better solutions. Given the NP-completeness
of the MLV problem, we consider special circuits where the
MLV+ can be solved optimally and develop heuristics for the
general case. In the rest of this section, we describe details of
our proposed divide-and-conquer approach that consists of the
following phases:
1) decompose a general circuit into tree circuits.
2) find the MLV for each tree circuit optimally by dynamic
programming.
3) apply the gate replacement technique to the MLV for each
tree to further reduce leakage.
4) connect the tree circuits by a genetic algorithm.
C. Finding the Optimal MLV for Tree Circuits
A tree circuit is a single-output circuit in which each gate, except
the primary output, feeds exactly one other gate. A general
combinational circuit can be trivially decomposed into nonoverlapping
tree circuits [19], as illustrated in Fig. 7. The circuit
in Fig. 7(a) is not a tree because one gate has two fanout gates.
By splitting at the fanout of that gate, we get three trees,
each with its own root gate.
We consider a tree circuit with gates $G_1, G_2, \ldots, G_n$
sorted in the topological order, which is preserved by the tree
decomposition.
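Let $L(G_i, \vec{v})$ be the leakage current in the gate $G_i$ when
vector $\vec{v}$ is applied at $G_i$’s fanins. Each gate $G_i$ can be treated
as the root of a subtree circuit. Let $LK(G_i, z)$ be the minimum
total leakage of the tree circuit when it outputs logic value $z$
at root $G_i$, and $MLV(G_i, z)$ be the input vector to the tree circuit
that achieves $LK(G_i, z)$. We develop a dynamic programming
approach to compute the pairs $(LK(G_i, 0), MLV(G_i, 0))$ and
$(LK(G_i, 1), MLV(G_i, 1))$ for each gate $G_i$. The MLV for the tree
circuit rooted at gate $G_n$, with gates $G_1, G_2, \ldots, G_n$ sorted
in the topological order, can then be determined conveniently.
1) For each input signal $x$ to the tree, define
$$LK(x, 0) = LK(x, 1) = 0, \quad MLV(x, 0) = 0, \quad MLV(x, 1) = 1. \tag{1}$$
2) For each gate $G_i$, let
$$LK(G_i, z) = \min_{\vec{v}\,:\,G_i(\vec{v}) = z} \Big\{ L(G_i, \vec{v}) + \sum_{j=1}^{t} LK(G_{i_j}, v_j) \Big\} \tag{2}$$
$$MLV(G_i, z) = \bigcup_{j=1}^{t} MLV(G_{i_j}, v_j^{0}) \tag{3}$$
where $v_1, \ldots, v_t$ are the fanins of $G_i$ from gates
$G_{i_1}, \ldots, G_{i_t}$, respectively, and the input combination
$\vec{v}^{0} = (v_1^{0}, \ldots, v_t^{0})$ achieves $LK(G_i, z)$.
3) The minimum leakage of the tree circuit with gates $G_1, \ldots, G_n$
is given by
$$\min\{LK(G_n, 0),\ LK(G_n, 1)\} \tag{4}$$
and the MLV will be either $MLV(G_n, 0)$ or $MLV(G_n, 1)$ accordingly.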
A step-by-step illustration of the dynamic programming can
be found in [17].
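The dynamic program of steps 1) to 3) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the tree is encoded as nested tuples, only INV and NAND2 gates are supported, and the leakage table holds made-up values in nA. The names `solve`, `tree_mlv`, `GATE_FN`, and `LEAK` are hypothetical. The fanin combinations are enumerated directly rather than in Gray-code order.

```python
from itertools import product

# Logic functions of the supported gate types.
GATE_FN = {
    "INV": lambda v: 1 - v[0],
    "NAND2": lambda v: 1 - (v[0] & v[1]),
}

# Made-up leakage values (nA) per gate type and fanin vector.
LEAK = {
    "INV": {(0,): 100.0, (1,): 230.0},
    "NAND2": {(0, 0): 40.0, (0, 1): 75.0, (1, 0): 95.0, (1, 1): 450.0},
}

def solve(node):
    """Return {z: (min leakage, input assignment)} for the subtree at `node`."""
    if isinstance(node, str):                 # tree input: zero leakage, value = z
        return {z: (0.0, {node: z}) for z in (0, 1)}
    kind, children = node
    sub = [solve(child) for child in children]
    best = {}
    for v in product((0, 1), repeat=len(children)):   # all fanin combinations
        z = GATE_FN[kind](v)
        # Gate leakage plus the best leakage of each fanin subtree.
        cost = LEAK[kind][v] + sum(s[vj][0] for s, vj in zip(sub, v))
        if z not in best or cost < best[z][0]:
            vec = {}
            for s, vj in zip(sub, v):         # subtrees share no inputs: concatenate
                vec.update(s[vj][1])
            best[z] = (cost, vec)
    return best

def tree_mlv(root):
    # Pick the better of the two root output values.
    return min(solve(root).values(), key=lambda pair: pair[0])

# Hypothetical tree: NAND2(INV(a), NAND2(b, c)).
TREE = ("NAND2", [("INV", ["a"]), ("NAND2", ["b", "c"])])
print(tree_mlv(TREE))   # (345.0, {'a': 1, 'b': 0, 'c': 0})
```

Note that the best vector here puts the inverter in its more leaky state (230 nA rather than 100 nA) because doing so keeps the root NAND2 out of its worst state; a per-gate greedy choice would miss this.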
Correctness: We show the correctness of the recursive formulas
(2) and (3). To compute $LK(G_i, z)$, the minimum leakage of the
subtree rooted at gate $G_i$ with output $z$, we need to consider
all the possible combinations of fanins $\vec{v} = (v_1, \ldots, v_t)$ that
produce output $z$ at gate $G_i$. For each such combination, the
minimum leakage in the subtree rooted at $G_i$ is the sum of the
leakage at gate $G_i$ and the minimum leakage at each of its fanin
gates $G_{i_j}$ with output $v_j$, $LK(G_{i_j}, v_j)$. Equation (2) takes the
overall minimum leakage and gives the correct $LK(G_i, z)$. Assume
that this minimum leakage is achieved when $G_i$ has fanins
$(v_1^{0}, \ldots, v_t^{0})$. Note that $MLV(G_{i_j}, v_j^{0})$ is the input
vector for the subtree circuit rooted at $G_{i_j}$ to produce $v_j^{0}$ with
the minimum leakage $LK(G_{i_j}, v_j^{0})$. The tree structure of the circuit
guarantees that the subtrees rooted at $G_{i_1}, \ldots, G_{i_t}$ will
not share any common inputs. Therefore, $MLV(G_i, z)$ is the simple
concatenation of the $MLV(G_{i_j}, v_j^{0})$, as given in (3).
Fig. 6. MLV in a circuit before and after gate replacement.
Complexity: Equations (1) and (4) take constant time. For
each gate $G_i$, we need to compute $(LK(G_i, 0), MLV(G_i, 0))$ and
$(LK(G_i, 1), MLV(G_i, 1))$ by (2) and (3). This requires the enumeration
of all the $2^t$ different combinations of $G_i$’s fanins. For
the first combination, we need to perform $t$ additions in (2). If we
enumerate the remaining $2^t - 1$ cases following a Gray code, we
only need to update $L(G_i, \vec{v})$ (two operations), replace one
$LK(G_{i_j}, v_j)$ (two operations), and compare the result with the
current minimum leakage, a total of five operations. Therefore,
we need $t + 5(2^t - 1)$ operations for each $G_i$, and this gives a
complexity of $O(Kn)$, where $K$ is a constant depending on
the largest number of fanins in the circuit.
After obtaining the MLV for the tree circuit, we perform the
gate replacement algorithm proposed in Section III to further
reduce leakage. Note that, although the MLV is optimal, this
does not guarantee us an optimal solution for the MLV+ problem
on the tree circuit. For example, for the circuit in Fig. 6, the
algorithm finds the optimal MLV, with leakage
422 nA. Gate 2 is at its WLS and the gate replacement algorithm
does not give any improvement. A different input vector gives
the maximum leakage, 654 nA; however, when we apply the gate
replacement technique under that vector, the leakage is reduced
to 295 nA. In fact, this is the optimal solution for the MLV+
problem.²
D. Connecting the Tree Circuits
In the previous phase, we have determined the output and required
input for each individual tree circuit to yield the minimum
leakage. The goal of this phase is to combine all the tree
circuits to solve the MLV+ problem for the original circuit. The
root of each tree circuit may have multiple fanouts that go to
other tree circuits as input. Since we treat the tree circuits independently,
a conflict occurs if the output of a tree circuit and
the value required by its fanout gates are not consistent. For example,
in Fig. 7(a), the circuit is decomposed into three tree circuits
$T_1$, $T_2$, and $T_3$. $T_1$ outputs “1” when its MLV is applied,
while $T_2$ and $T_3$ require “0” and “1” from $T_1$ in their respective
MLVs. So we have a conflict.
Fig. 7. Resolving the conflict in connecting tree circuits.
²We conjecture that the MLV+ problem remains NP-hard for tree circuits. Because
we have already lost the optimality when we do the tree decomposition,
we will not discuss in detail how to find better solutions to MLV+ on tree
circuits. For the same reason, we did not focus on how to improve the fast gate
replacement algorithm in Section III-B.
There are basically three ways to resolve this conflict:
(I) enforcing the output of the driving tree $T_1$ at all the fanout gates [Fig. 7(b)];
(II) changing $T_1$’s output and enforcing this new value at
all the fanout gates [Fig. 7(c)];
(III) inserting an AND gate to allow them to be inconsistent
[Fig. 7(d)]. Similarly, if $T_1$ outputs “0” and some of its
fanouts require “1,” we can add an OR gate [as shown in
Fig. 7(e)].
To decide which one we should use to resolve the conflict, we
apply each of them and re-evaluate the circuit’s total leakage. In
(I), this requires the recomputing of the minimum leakage and
the MLV for tree circuit under the condition that its input
from is logic “1.” The dynamic programming algorithm in
Section IV-B can be trivially modified for this purpose. In (II),
we need to do the same procedure for tree circuit . Besides,
we have to replace the pair for tree circuit
by .
Both (I) and (II) resolve the conflict by sacrificing the min-
imum leakage of tree circuits under the provably optimal MLV.
In (III), we successfully connect the tree circuits while pre-
serving the minimum leakage and MLV for each tree with the
help of the SLEEP signal-controlled AND or OR gates. The cost
is that we have to add the leakage of the inserted AND or OR
gate into the total leakage. We mention that this gate addition
also preserves the correctness of the circuit at active mode when
.
It is now easy to make a decision on which method to adopt
to resolve a single conflict: use the one that gives the minimum
leakage. However, the decision at one conflict may affect the
existence of conflict at other places in the circuit. For example,
method (I) in Fig. 7(b) could change the output of tree and
directly affect whether there is a conflict at the root of .
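The single-conflict decision described above can be sketched as follows. This is a minimal illustration, assuming the circuit's total leakage has already been re-evaluated under each of the three methods; the class name, field names, and leakage numbers are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ConflictOptions:
    """Re-evaluated total leakage (nA) under each way of resolving one conflict."""
    leak_method_1: float  # (I): driver's output enforced at all fanout gates
    leak_method_2: float  # (II): driver's output changed, new value enforced
    leak_method_3: float  # (III): tree minima kept, plus inserted AND/OR gate leakage


def resolve_single_conflict(opts: ConflictOptions) -> Tuple[str, float]:
    """Pick the method that yields the smallest total leakage."""
    table = {
        "I": opts.leak_method_1,
        "II": opts.leak_method_2,
        "III": opts.leak_method_3,
    }
    method = min(table, key=table.get)
    return method, table[method]


# Hypothetical totals for one conflict; here the control gate is the cheapest.
choice = resolve_single_conflict(ConflictOptions(1250.0, 1310.0, 1190.0))
```

This greedy choice is only locally best: as noted above, resolving one conflict can create or remove conflicts elsewhere, which is why a global search is used to connect the trees.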
We use a genetic algorithm (GA) to resolve the conflicts and
connect all the tree circuits. A GA solution takes the form
of a binary bit stream in which each bit indicates whether there is a
conflict at the root of a tree and which method to use to resolve it.
In particular, a “1” means there is a conflict and method (III)
should be used; a “0” means that there is either no conflict or
we should use the better one of methods (I) and (II) to resolve
the conflict.
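One possible reading of this encoding is sketched below. It assumes, for illustration only, that the conflict status of each tree root and the better of methods (I) and (II) are already known per tree; in the actual flow these depend on earlier decisions. Treating a "1" bit at a conflict-free root as a no-op is also an assumption of this sketch, and all names are hypothetical.

```python
from typing import List


def decode_chromosome(bits: List[int],
                      has_conflict: List[bool],
                      better_of_1_and_2: List[str]) -> List[str]:
    """Map each bit to the action taken at the corresponding tree root."""
    actions = []
    for bit, conflict, fallback in zip(bits, has_conflict, better_of_1_and_2):
        if not conflict:
            actions.append("none")    # nothing to resolve at this root
        elif bit == 1:
            actions.append("III")     # insert a SLEEP-controlled AND/OR gate
        else:
            actions.append(fallback)  # better of methods (I) and (II)
    return actions


# Four tree roots, two of which have a conflict (hypothetical data).
plan = decode_chromosome([1, 0, 0, 1],
                         [True, True, False, False],
                         ["I", "II", "I", "II"])
```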
The GA follows a standard routine where we start with a pop-
ulation of random bit streams (referred to as chromosomes).
Based on each bit stream, we resolve the conflict, apply the dy-
namic programming algorithm in Section IV-B to re-compute
the minimum leakage of a tree circuit when methods (I) and (II)
are used, run the gate replacement algorithm in Fig. 4 on the
entire circuit, and compute the circuit’s total leakage. The fit-
ness for a bit stream is calculated from the leakage value. The
smaller the leakage, the larger the fitness. We sort all the chro-
mosomes according to their fitness and create the next gener-
ation by the roulette wheel method. In this method, the proba-
bility that a chromosome is selected as one of the two parents
is proportional to its fitness. Crossover, which refers to the ex-
change of substrings in two chromosomes, is performed among
parents to produce children. A simple mutation operation, which
flips a bit in the chromosome at the bit mutation rate, is also
used. The GA continues to generate a total of new chromosomes
and then starts the next generation. This process repeats
for a fixed number of generations (50 in our simulation), and the best
chromosome found is returned as the final solution.
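The routine above can be sketched as a small, self-contained GA. This is an illustrative sketch rather than the authors' implementation: the fitness mapping 1/leakage, single-point crossover, the default population size and mutation rate, and the `toy_leakage` stand-in for the real evaluation (conflict resolution, dynamic programming, gate replacement) are all assumptions.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def run_ga(leakage_of: Callable[[Chromosome], float], n_bits: int,
           pop_size: int = 20, generations: int = 50,
           mutation_rate: float = 0.02, seed: int = 0) -> Tuple[Chromosome, float]:
    """Roulette-wheel GA over bit streams; returns the best chromosome seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = min(population, key=leakage_of)
    for _ in range(generations):
        # Smaller leakage gives larger fitness (1/leakage is an assumed mapping).
        fitness = [1.0 / leakage_of(c) for c in population]
        children = []
        while len(children) < pop_size:
            # Roulette wheel: selection probability proportional to fitness.
            mother, father = rng.choices(population, weights=fitness, k=2)
            cut = rng.randint(1, n_bits - 1)          # single-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit independently at the bit mutation rate.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
        best = min(population + [best], key=leakage_of)
    return best, leakage_of(best)


def toy_leakage(bits: Chromosome) -> float:
    """Hypothetical cost: each inserted control gate adds 10 nA to a 100 nA base."""
    return 100.0 + 10.0 * sum(bits)


best_bits, best_leak = run_ga(toy_leakage, n_bits=8)
```

In a real flow, `leakage_of` would be by far the most expensive step, since each evaluation re-runs the tree-level optimization for the affected trees.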
E. Overhead Analysis
Because the control gates are introduced in the tree-connecting
stage of the algorithm, they also require the sleep signal to control them.
Hence, we need to consider the extra power that these control gates
and the sleep signal may consume, and their effect on the overall
power saving. In this subsection, we discuss these power overheads.
1) Control Gates: The control gates will consume extra dy-
namic power and leakage power. In this paper, we only consider
the leakage power overhead of the inserted gates and ignore their
dynamic power due to the following reasons. First, the number
of inserted control gates only accounts for 5% to 6% of the total
number of gates in the circuit. Second, they are simple 2-input
AND and OR gates, which have a relatively small intrinsic capac-
itance at the node compared to other gates. Third, the switching
activities in these control gates are very limited because one of
the two inputs is the sleep signal, which changes only at the mo-
ment when the circuit switches between active mode and sleep
mode. As dynamic power depends on physical capacitance
and switching activity, we consider this dynamic power
overhead negligible.
As for leakage power, we measured the average leakage cur-
rent in control gates over all possible inputs. In our algorithm,
we add this extra leakage current to the objective function, i.e.,
the overall leakage current to be minimized. Therefore, the
leakage saving achieved by our algorithm already accounts
for this overhead.
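The accounting described above can be illustrated with a short sketch. The per-input leakage values below are hypothetical placeholders, not measured data from the paper, and the function names are likewise assumptions.

```python
from itertools import product
from statistics import mean
from typing import Dict, Tuple

# Hypothetical leakage (nA) of a 2-input AND control gate for each input pair.
AND2_LEAKAGE_NA: Dict[Tuple[int, int], float] = {
    (0, 0): 10.0, (0, 1): 20.0, (1, 0): 15.0, (1, 1): 35.0,
}


def average_gate_leakage(table: Dict[Tuple[int, int], float]) -> float:
    """Average leakage of a 2-input gate over all possible input vectors."""
    return mean(table[v] for v in product((0, 1), repeat=2))


def objective(circuit_leakage_na: float, n_control_gates: int,
              table: Dict[Tuple[int, int], float]) -> float:
    """Total leakage to minimize: circuit leakage plus control-gate overhead."""
    return circuit_leakage_na + n_control_gates * average_gate_leakage(table)


total = objective(1000.0, 3, AND2_LEAKAGE_NA)
```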
2) Sleep Signal: Both the gate replacement and the control
gates require the sleep signal to drive them during active and
sleep mode. The generation of the sleep signal may consume
extra power. However, due to the fact that our experiment was
conducted at the logic synthesis level before placement and
routing, it is not practical to obtain such power data quantita-
tively. On the other hand, the sleep signal is required by many
other leakage minimization techniques, such as [1], [3], and
[13]. Hence, in this paper, we expect the generation of the sleep
signal to be similar to those approaches and we believe this
problem can be better solved at the physical level of circuit
design.
V. EXPERIMENTAL RESULTS
We implemented the gate replacement and divide-and-conquer
techniques in the SIS environment [20] and applied them to
69 MCNC91 benchmark circuits. Each circuit is synthesized
and mapped to a 0.18-μm technology library. We use Cadence
Spectre to simulate the leakage current for all the library gates
under every possible input vector. The supply voltage and
threshold voltage are 1.5 and 0.2 V, respectively. The measured
leakage current includes both subthreshold and gate leakage.
The simulations are conducted on a Sun UltraSPARC
workstation.
Our results are compared with traditional IVC methods in
terms of leakage saving, run time, area and delay penalty. The
69 benchmarks include 26 small circuits with 22 or fewer pri-
mary inputs (Table I) and 43 large circuits (Table II). For each
small circuit, we find the optimal MLV by exhaustive search. For
each large circuit, we choose the best MLV from 10 000 distinct
random input vectors. It is reported that this gives 99%
confidence that the vectors with lower leakage make up less than 0.5% of
the entire vector population [9], [15]. For a fair comparison
with [1], we also collect the average leakage of 1000 random
input vectors for each large circuit.
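The confidence figure for random search can be sanity-checked with a simple sampling model. The sketch below assumes vectors are drawn independently and uniformly, which is an assumption of this illustration and not necessarily the derivation used in [9], [15]; the function names are hypothetical.

```python
import math


def confidence(n_samples: int, fraction: float) -> float:
    """Probability that at least one of n samples lands in the best `fraction`.

    If that happens, the best sampled vector is itself in the best `fraction`,
    so fewer than `fraction` of all vectors have lower leakage.
    """
    return 1.0 - (1.0 - fraction) ** n_samples


def samples_needed(conf: float, fraction: float) -> int:
    """Smallest n with confidence(n, fraction) >= conf under the same model."""
    return math.ceil(math.log(1.0 - conf) / math.log(1.0 - fraction))


n_min = samples_needed(0.99, 0.005)   # samples for 99% confidence, 0.5% fraction
c_10k = confidence(10_000, 0.005)     # confidence reached with 10 000 vectors
```

Under this simplified model, 10 000 samples comfortably exceed the 99% level for a 0.5% fraction.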
Table I reports the results for the 26 small circuits. Column 4
lists the leakage current for each circuit when the best MLV
is applied. Even in this case, an average of 15% of the gates
are at WLS as shown in column 5. The fast gate replacement
algorithm is able to move about half of these gates from their
WLS (column 7). This results in a 13% leakage reduction with
only 4% area increase (columns 6 and 8). We restrict
ourselves to replacing only gates off critical paths. This
leaves 8% of the gates in the circuits at their WLS, but it also
guarantees that there is no delay overhead.
TABLE I
RESULTS ON 26 SMALL CIRCUITS WITH 22 OR FEWER PRIMARY INPUTS
The last four columns show that the divide-and-conquer algo-
rithm gives a 17% leakage reduction over the best MLV at the
cost of 9% more area. We incorporate delay constraints in the
genetic algorithm to ensure that the delay overhead is within
5%. The two columns in the middle give the number of tree
circuits in each case and the number of control gates we have used
to connect these trees. In only three cases did we insert
more than five control gates. Note that the addition of control
gates may decrease the delay because it reduces the fanouts of
the gate. The area increase comes from the addition of control
gates and the replacement of “smaller” gates by “bigger” library
gates.
Fig. 8 reports the leakage and WLS-gate reduction in the 43
large circuits (x-axis) with more than 22 PIs. We replace the
infeasible exhaustive search by the best solution from a random
search of 10 K input vectors. The fast gate replacement
algorithm is restricted to gates off critical paths; for the
divide-and-conquer approach, we set the maximal delay increase
to be 5%.
The benchmarks are sorted by the total leakage achieved by
the divide-and-conquer method normalized to the best of the 10 K
random search, which is shown as one of the two curves in the
top part of the figure. The average leakage reductions are 10%
by gate replacement only (leakage G.R.) and 24% by the
divide-and-conquer method (leakage D.C.). The maximal leakage
reductions are 46.4% and 60%, respectively. The three curves at
the bottom give the ratio of WLS gates. On average, the 10 K
random search has 17% of gates at WLS (orig, wls); the proposed
fast gate replacement and divide-and-conquer techniques reduce
this ratio to 11% (G.R. wls) and 9% (D.C. wls), respectively.
TABLE II
RESULTS ON 43 LARGE CIRCUITS WITH MORE THAN 22 PRIMARY INPUTS
Fig. 8. Leakage and WLS percentage on 43 large circuits with more than
22 PIs. X-axis lists benchmarks sorted by leakage current in divide-and-conquer
approach; Y-axis shows percentage of leakage and WLS gates.
More detailed results for these 43 circuits are shown in
Table II. Columns 4–6 list the leakage current, runtime, and
percentage of gates at WLS when the best MLV from 10 000
random vectors is applied to each circuit. The next four columns
show the results when the fast gate replacement algorithm is
applied to this best MLV. The average run time is only 0.05 s
and increases linearly with the number of gates in the circuit.
There is no delay overhead and the area increase is only 2%.
The next seven columns show results by the divide-and-con-
quer approach where we set a 5% maximum delay constraint.
In the genetic algorithm, we start with a population size of
and it converges after 50 generations. We are able to
achieve, over the best MLV from 10 000 random vectors, 24%
leakage saving with 7% area penalty on average. Although the
average run time is 6× that of the random search, this is
mainly caused by two circuits, i8 and des. They have
a couple of large tree circuits and, therefore, the frequently
called dynamic programming takes a considerably long time.
Excluding these two circuits, the average run times for random
search and the divide-and-conquer algorithm drop to 64.7 s and
143 s, respectively. More importantly, we see clearly that the run
time for random search increases exponentially with the number
of primary inputs and linearly with the number of gates (columns
2, 3, 5). However, the run time for the divide-and-conquer
approach grows at a much slower pace (column 12).
TABLE III
AVERAGE PERFORMANCE COMPARISON WITH ALGORITHM
Finally, the last two columns compare our results with those
reported in [1]. Because their detailed results are not available,
we can only compare the average performance. In their exper-
imental setup, the leakage reduction is compared with the av-
erage value among 1000 random vectors. For a fair compar-
ison, we also report in the last two columns the improvement
of our approaches over the same baseline. Table III summa-
rizes the performance improvement in the control point inser-
tion approach [1], our gate replacement algorithm, and the di-
vide-and-conquer approach.
VI. CONCLUSION
We study the MLV+ problem which seeks to modify a given
circuit and determine an input vector such that the correct func-
tionality is maintained when the circuit is active and the leakage
is minimized under the determined input vector when the circuit
is at stand-by mode. Allowing circuit modification without
changing its functionality enlarges the solution space of the IVC
method. We show that the MLV (and, hence, MLV+) problem is a
hard problem and propose low-complexity heuristics to solve
the MLV+ problem. The proposed algorithms are practical and
effective in the sense that we do not need to change the design
flow or re-do place-and-route. The experimental results show
that this technique significantly improves the performance of
IVC in leakage reduction at gate level with little area and delay
overhead.
ACKNOWLEDGMENT
The authors would like to thank the Editor-in-Chief, the As-
sociate Editor, and the reviewers for their valuable comments.
A full version of this paper can be found in [17].
REFERENCES
[1] A. Abdollahi, F. Fallah, and M. Pedram, “Leakage current reduction in
CMOS VLSI circuits by input vector control,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 12, no. 2, pp. 140–154, Feb. 2004.
[2] F. Aloul, S. Hassoun, K. Sakallah, and D. Blaauw, “Robust SAT-based
search algorithm for leakage power reduction,” in Proc. Int. Workshop
Integr. Circuit Des., 2002, pp. 167–177.
[3] F. Assaderaghi, D. Sinitsky, S. A. Parke, J. Bokor, P. K. Ko, and C. Hu,
“Dynamic threshold-voltage MOSFET(DTMOS) for ultra-low voltage
VLSI,” IEEE Trans. Electron Devices, vol. 44, no. 3, pp. 414–422, Mar.
1997.
[4] S. Bobba and I. N. Hajj, “Maximum leakage power estimation for
CMOS circuits,” in Proc. IEEE Alessandro Volta Memorial Workshop
Low Power Des., 1999, pp. 116–116.
[5] Z. Chen, M. Johnson, L. Wei, and K. Roy, “Estimation of standby
leakage power in CMOS circuits considering accurate modeling of
transistor stacks,” in Proc. ISLPED, 1998, pp. 239–244.
[6] K. Chopra and S. B. K. Vrudhula, “Implicit pseudo-Boolean enumer-
ation algorithms for input vector control,” in Proc. DAC, 2004, pp.
767–772.
[7] D. Duarte, Y. Tsai, N. Vijaykrishnan, and M. Irwin, “Evaluating run-time
techniques for leakage power reduction,” in Proc. VLSI Des., 2002, pp.
31–38.
[8] F. Gao and J. P. Hayes, “Exact and heuristic approaches to input vector
control for leakage power reduction,” in Proc. ICCAD, 2004, pp.
527–532.
[9] J. Halter and F. Najm, “A gate-level leakage power reduction method for
ultra low power CMOS circuits,” in Proc. CICC, 1997, pp. 475–478.
[10] M. C. Johnson, D. Somasekhar, and K. Roy, “Models and algorithms for
bounds on leakage in CMOS circuits,” IEEE Trans. Comput.-Aided Des.
Integr. Circuits Syst., vol. 18, no. 6, pp. 714–725, Jun. 1999.
[11] J. Kao, S. Narendra, and A. Chandrakasan, “Subthreshold leakage mod-
eling and reduction techniques,” in Proc. ICCAD, 2002, pp. 141–148.
[12] D. Lee, W. Kwong, D. Blaauw, and D. Sylvester, “Analysis and mini-
mization techniques for total leakage considering gate oxide leakage,”
in Proc. DAC, 2003, pp. 175–180.
[13] V. Khandelwal and A. Srivastava, “Leakage control through
fine-grained placement and sizing of sleep transistors,” in Proc.
IEEE/ACM Int. Conf. Comput.-Aided Des., Nov. 2004, pp. 533–536.
[14] T. Kuroda et al., “A 0.9 V 150 MHz 10 mW 4 mm² 2-D discrete cosine
transform core processor with variable threshold-voltage (VT) scheme,”
IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1770–1779, Nov. 1996.
[15] R. M. Rao, F. Liu, J. L. Burns, and R. B. Brown, “A heuristic to deter-
mine low leakage sleep state vectors for CMOS combinational circuits,”
in Proc. ICCAD, 2003, pp. 689–692.
[16] V. Khandelwal, A. Davoodi, and A. Srivastava, “Simultaneous Vt selection
and assignment for leakage optimization,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 13, no. 6, pp. 762–765, Jun. 2005.
[17] L. Yuan and G. Qu, “A combined gate replacement and input vector
control approaches for leakage current reduction,” Inst. Adv. Comput.
Studies (UMIACS), Univ. Maryland, College Park, MD, Tech. Rep. TR
2005-63, 2005.
[18] M. R. Garey and D. S. Johnson, Computers and Intractability, a Guide to
the Theory of NP-Completeness. San Francisco, CA: Freeman, 2001.
[19] G. D. Hachtel and F. Somenzi, Logic Synthesis and Verification Algo-
rithms. Norwell, MA: Kluwer, 1996.
[20] E. Sentovich et al., “SIS: A system for sequential circuit synthesis,”
Univ. California, Electron. Res. Lab. Memorandum, Berkeley, CA, no.
UCB/ERL M92/41, 1992.
[21] Synthesis and Optimization Benchmarks User Guide, Microelectronic
Center, Triangle Park, NC, 1991.
Lin Yuan (M’03) received the B.S. degree in infor-
mation engineering from Xi’an Jiaotong University,
China, in 2001. He is currently working toward the
Ph.D. degree in the Department of Electrical and
Computer Engineering, University of Maryland,
College Park.
His research interests include low-power de-
sign, VLSI design automation, and wireless sensor
networks.
Gang Qu (S’98–A’00–M’03) received the B.S. and
M.S. degrees in mathematics from the University of
Science and Technology, China, in 1992 and 1994,
respectively, and the Ph.D. degree in computer sci-
ence from the University of California, Los Angeles,
in 2000.
In 2000, he joined the Electrical and Computer
Engineering Department, University of Maryland,
College Park. In 2001, he became a member of
the University of Maryland Institute of Advanced
Computer Studies. His research interests include
low-power system design, computer-aided synthesis, sensor network, and
intellectual property reuse and protection.
Dr. Qu has received many awards for his academic achievements and service
and has served on the Technical Program Committee for many conferences.
Currently, he is the General Co-Chair of the 16th ACM Great Lakes Symposium
on VLSI.
