Automated phase assignment for the synthesis of low power domino circuits by Priyadarshan Patra & Unni Narayanan
Automated Phase Assignment for the Synthesis of Low Power Domino Circuits
Priyadarshan Patra
Strategic CAD Labs, Intel Corporation
JFT-104, 211 NE 25th Avenue
Hillsboro, OR 97124-5961
Unni Narayanan
Design Technology, Intel Corporation
SC12-606, 2200 Mission College Boulevard
Santa Clara, CA 95052-8119
Abstract
High performance circuit techniques such as domino logic have mi-
grated from the microprocessor world into more mainstream ASIC
designs. The problem is that domino logic comes at a heavy cost
in terms of total power dissipation. For mobile and portable de-
vices such as laptops and cellular phones, a high power dissipation
is an unacceptable price to pay for high performance. Hence, we
study synthesis techniques that allow designers to take advantage
of the speed of domino circuits while at the same time minimizing
total power consumption. Speciﬁcally, in this paper we present
three results related to automated phase assignment for the syn-
thesis of low power domino circuits: (1) We demonstrate that the
choice of phase assignment at the primary outputs of a circuit can
signiﬁcantly impact power dissipation in the domino block (2) We
propose a method for efﬁciently estimating power dissipation in a
domino circuit and (3) We apply the method to determine a phase
assignment that minimizes power consumption in the ﬁnal circuit
implementation. Preliminary experimental results on a mixture of
public domain benchmarks and real industry circuits show potential
power savings as high as 34% over the minimum area realization
of the logic. Furthermore, the low power synthesized circuits still
meet timing constraints.
1 Introduction
The advent of portable digital devices such as laptop computers
and cellular phones has made low power circuit design an increas-
ingly important research area [4, 13, 14, 12, 6, 11]. For example,
laptop computers have a limited battery life, and so the circuitry
in the computer must be designed to dissipate as little power as
possible without sacriﬁcing performance in terms of speed. Fur-
thermore, simultaneous low power and high performance designs
are needed beyond the realm of the microprocessors. For example,
ASICs in computer chipsets or cellular phones must also approach
microprocessor-like frequency targets, but are constrained by even
tighter power budgets [4]. The problem, of course, is that the objectives
of low power and high performance are often contradictory.
Consider, for example, the use of domino or dynamic logic which
is a necessity in high speed designs.
Figure 1 contains a schematic for a basic N-type domino gate.
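Figure 1: A basic domino gate. (Schematic omitted. The dynamic portion
contains the N-logic block, the precharge transistor, the evaluate
transistor, the clock $\Phi$ and the internal output O1; the static
portion is the inverting buffer that drives output O2.)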
The domino gate consists of a dynamic component and a static
component. During the precharge phase (when $\Phi = 0$), the output
of the dynamic gate (at O1) is precharged high, and the output of the
buffer is low. When the gate is evaluated, the output will condition-
ally discharge and result in the output O2 conditionally becoming
high. Observe that domino gates are inherently noninverting. The
reason is that a block of domino logic can only function correctly if
each gate makes a monotonic “0” to “1” transition. For additional
details about domino design the reader is referred to [16, 7, 5, 15].
We note that although domino logic greatly enhances perfor-
mance, this beneﬁt comes at a high cost in terms of power con-
sumption. Due to clock loading and the precharging every clock
cycle, domino gates can consume up to four times the power of an
equivalent static gate [16]. However, most efforts in low power
logic synthesis have focused on static CMOS gates [8, 9, 10, 11].
In this paper, we study the problem of low power logic synthesis
for domino circuitry. Speciﬁcally we present three results: (1) We
demonstrate that the output phase assignment affects power dissipation
in domino circuits. (2) We propose methods for efficiently
measuring power consumption in domino circuits. (3) We show how
these methods can be incorporated in an algorithm for determining
a phase assignment that minimizes power consumption in
domino circuits. In Section 2, we discuss some interesting proper-
ties about power consumption in domino gates. In Section 3, we
explain the problem of phase assignment and show that the choice
of output phase can affect power consumption. In Section 4, we
present an approach for determining phase assignments for low
power domino circuits. In Section 5, we present experimental results
that indicate the promise of our approach. Finally, in Section 6, we
propose future directions for this work.

____________________________
Permission to make digital/hardcopy of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage, the copyright notice, the title of the publication
and its date appear, and notice is given that copying is by permission of ACM, Inc.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee.
DAC 99, New Orleans, Louisiana
(c) 1999 ACM 1-58113-109-7/99/06..$5.00
2 Power Consumption in Domino Circuitry
Recall that in CMOS technology a large portion of power dissi-
pation on chip is due to dynamic power consumption at the gates
which is computed according to the formula:
$$\sum_{i=1}^{N} \frac{1}{2} C_i V_{dd}^2 f_i$$
where Ci is the output capacitance of the ith gate, Vdd is the supply
voltage, fi is the number of transitions at the output of the ith gate,
and N is the total number of gates on the chip [4]. Hence power
consumption is linearly related to the switching activity fi of a gate,
and clearly a reduction in fi will lead to a corresponding reduction
in the total power consumption of the circuit. We say that the signal
probability of a gate is the probability that the logical output of a
gate is high and the switching probability of a gate is the probability
that the output experiences a transition. We can relate the signal
probability and the switching probability of a domino gate such as
the one in Figure 1 in the following manner:
Property 2.1 Let pg be the signal probability of logical output O2
of gate g. Then Sg, the switching probability, at both O1 and O2 is
exactly pg.
Property 2.2 Domino gates never glitch or experience spurious
transitions at their outputs.
Property 2.1 becomes apparent if we trace the behavior of a
domino gate. Suppose the logical output of O2 is high. Then,
the output O1 must be low. This means that the dynamic portion of
the gate discharged the precharged current. Furthermore, the output
will need to be precharged during the next clock cycle. Thus, the
probability of a transition at O1 is precisely the signal probability at
O2. Furthermore, O2’s output experiences a transition if and only
if O1 experiences a transition. Hence, the switching probability at
O2 is also the signal probability at O2. Now suppose that the logi-
cal output of the gate at O2 remains “0”. In this case, no charging
or discharging takes place anywhere in the gate, and so, no power
is dissipated. Property 2.1 is interesting because domino gates, in
contrast to static gates, experience an asymmetry in switching ac-
tivity with respect to signal probability. Figure 2 compares the two
types of gates. In Section 3, we will show how this asymmetry
can be exploited during the phase assignment portion of domino
synthesis in order to reduce power consumption.
Figure 2: Signal probability and switching for domino and static
CMOS logic. (Plot omitted. Horizontal axis: signal probability, 0 to
1.0; vertical axis: switching probability, 0 to 1.0; one curve for
domino gates and one for static gates.)
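The asymmetry shown in Figure 2 can be illustrated numerically. The following is a minimal sketch and not the authors' code: the domino curve follows Property 2.1, while the static curve assumes that the output values in consecutive cycles are independent, which gives a transition probability of 2p(1 - p). That independence assumption and the function names are ours.

```python
def domino_switching(p):
    """Property 2.1: a domino gate switches with probability equal to
    the signal probability p of its logical output."""
    return p


def static_switching(p):
    """Assumed model for a static gate: consecutive output values are
    independent, so a 0->1 or 1->0 transition has probability 2p(1-p)."""
    return 2.0 * p * (1.0 - p)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(f"p={p:.1f}  domino={domino_switching(p):.2f}  "
              f"static={static_switching(p):.2f}")
```

Under this model a domino gate with signal probability 0.9 switches far more often than one with signal probability 0.1, whereas a static gate switches equally often in both cases.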
Property 2.2 is true because once a gate discharges current, its
output cannot be recharged until the next clock cycle. Hence, any
glitch that appears at the inputs of a domino block sets a chain of
monotonic transitions that cannot be reversed (until the next clock
cycle). The consequence is that since domino gates never glitch,
the switching activity can be modeled correctly under a zero delay
assumption.
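The dynamic power formula at the start of this section can be evaluated directly once the switching activity of each gate is known. The sketch below is illustrative only: the function name, the pairing of capacitance with activity, and the numeric values are hypothetical and are not taken from the paper.

```python
def dynamic_power(gates, vdd):
    """Evaluate sum_i 0.5 * C_i * Vdd^2 * f_i.

    gates: iterable of (C_i, f_i) pairs, where C_i is the output
    capacitance and f_i the switching activity of gate i.
    """
    return sum(0.5 * c * vdd ** 2 * f for c, f in gates)


if __name__ == "__main__":
    # Hypothetical gates: (capacitance in farads, transitions per second).
    gates = [(10e-15, 50e6), (20e-15, 10e6)]
    print(dynamic_power(gates, vdd=1.8))
```

Because each term is linear in f_i, halving the activity of a gate halves its contribution to the total.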
3 Phase Assignment for Domino Circuits
Recall from Section 1 that domino logic is inherently noninverting.
Hence, domino blocks must be synthesized without logical inverters.
In [15], Puri proposes the following flow for synthesizing
inverter-free blocks: (1) Perform a standard technology independent
synthesis. Inverters will appear at arbitrary points in this initial
realization. (2) Systematically remove inverters by changing the phase
of primary outputs and applying DeMorgan's Law. In Figure 3,
we illustrate an example of this procedure. Suppose we wish to
synthesize the following logic functions:
$f = \overline{(a + b) + (c \cdot d)}$ and
$g = \overline{(a + b)} + \overline{(c \cdot d)}$.
Referring to Step 1 in Figure 3, we generate an
initial synthesis. This realization cannot be implemented in domino
logic because of the internal inverters. Hence, we try “changing
the phase” of output g. We say an output is in positive phase if no
inverter appears at the output boundary. We say an output is in
negative phase if an inverter appears on the output boundary. We
note that a “negative phase” assignment does not mean that we are
changing the polarity of the output. In other words, a negative
phase assignment does not mean that we are implementing the
complement of the original output. In the initial synthesis, f is
implemented in the negative phase, and g is implemented in the
positive phase. In Step 2, we change the phase of g (and preserve
the logical value of g) by placing two “logical” inverters on the out-
put g. In Step 3, we push the inverter back, and apply DeMorgan’s
law to transform the OR gate into an AND gate. Finally, in Step
4, we remove the chained inverters. In general, phase assignment
is not as straightforward as this example. Figure 4 illustrates how
various phase assignments result in trapped inverters which in turn
result in signiﬁcant logic duplication. The reader is referred to [15]
for a detailed discussion on the trapped inverter phenomenon and
how it relates to logic duplication.
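The phase change of g in Figure 3 relies only on DeMorgan's law, so it can be checked exhaustively. The sketch below takes g to be the OR of the complements of (a + b) and (c · d), which is our reading of the example; the function names are ours.

```python
from itertools import product


def g_initial(a, b, c, d):
    """Initial synthesis: an OR gate fed by two internal inverters."""
    return (not (a or b)) or (not (c and d))


def g_negative_phase(a, b, c, d):
    """After the phase change: an inverter-free AND block followed by a
    single inverter at the output boundary."""
    domino_block = (a or b) and (c and d)
    return not domino_block


def equivalent():
    """Compare both forms of g on all 16 input combinations."""
    return all(
        g_initial(*bits) == g_negative_phase(*bits)
        for bits in product((False, True), repeat=4)
    )


if __name__ == "__main__":
    print(equivalent())
```

The logical value of g is preserved; only the location of the inversion moves from inside the block to its boundary.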
Phase Assignment and Switching Activity
We make the key observation that different phase assignments af-
fect the switching activity and hence power in the ﬁnal domino cir-
cuit implementation. Additionally, we show that the phase assign-
ment for minimum area is not necessarily the same as the phase
assignment for minimum power. For example, Figure 5 contains
circuits corresponding to two different phase assignments. If the
primary input signal probabilities are 0.9, we see that the second
realization has 75% fewer transitions including the transitions in the
static CMOS inverters at the boundaries. This is true even though
the second implementation is clearly not the minimum area imple-
mentation.
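The switching totals annotated in Figure 5 can be approximately reproduced with a few lines of arithmetic. This is a sketch under stated assumptions, not the authors' procedure: the gate structure (an OR of a and b, an AND of c and d, and an OR and an AND of those two results) is inferred from the node probabilities in the figure, inputs are assumed independent, a static inverter is assumed to switch with probability 2p(1 - p), and the two output-inverter figures are taken as read from the figure annotations.

```python
def p_and(x, y):
    """Signal probability of an AND gate with independent inputs."""
    return x * y


def p_or(x, y):
    """Signal probability of an OR gate with independent inputs."""
    return 1.0 - (1.0 - x) * (1.0 - y)


def static_inverter_switching(p):
    """Assumed static-gate model: transition probability 2p(1-p)."""
    return 2.0 * p * (1.0 - p)


def figure5_totals(p=0.9, first_outputs=0.8019, second_outputs=0.0019):
    # First realization: the domino block is driven by a, b, c, d directly.
    ab, cd = p_or(p, p), p_and(p, p)
    first_domino = ab + cd + p_or(ab, cd) + p_and(ab, cd)

    # Second realization: inputs are complemented by static inverters,
    # so every domino node sees complemented probabilities.
    q = 1.0 - p
    nab, ncd = p_and(q, q), p_or(q, q)
    second_domino = nab + ncd + p_and(nab, ncd) + p_or(nab, ncd)
    second_inputs = 4 * static_inverter_switching(p)

    first_total = first_domino + first_outputs
    second_total = second_domino + second_inputs + second_outputs
    return first_domino, second_domino, second_inputs, first_total, second_total


if __name__ == "__main__":
    fd, sd, si, ft, st = figure5_totals()
    print(f"domino block: {fd:.2f} vs {sd:.2f}; input inverters: {si:.2f}")
    print(f"reduction in total switching: {100 * (1 - st / ft):.1f}%")
```

With these inputs the domino block accounts for 3.6 expected transitions in the first realization and 0.40 in the second, and the overall reduction comes out close to the 75% quoted in the text.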
4 Major Results
Figure 6 contains a description of our overall approach for power
minimization. First, we generate an arbitrary initial phase assign-
ment. We impose this phase assignment on the outputs of the
logic network. Next we measure the power. The power measurement
step involves two parts: (1) Partitioning the sequential domino
blocks into disjoint combinational blocks (2) Computing the signal
probabilities at each node. If we have not exhausted all our candidate
phase assignments, we then heuristically generate a new candidate
phase assignment based upon the power measurements. If
we have exhausted all of our candidate phase assignments, then the
algorithm terminates. In Section 4.1, we describe our algorithm for
determining a candidate phase assignment. Then in Section 4.2,
we describe how we compute the power of the domino blocks.
Specifically, we describe an enhanced minimum feedback vertex
set heuristic that takes advantage of the properties of domino logic
blocks to effectively partition sequential blocks into combinational
blocks. Finally, we apply a variable ordering heuristic to reduce the
complexity of BDD computations.

Figure 3: An example of how phase assignment can be used to
remove inverters. (Schematics omitted. Step annotations: (1) Initial
results of technology independent synthesis; the zone within the
boundaries must become inverterless, so inverters must be removed.
(2) We try changing the output phase of one of the primary outputs.
(3) We push the inverter backwards and apply DeMorgan's Law.
(4) Finally, we have an inverter-free region which can be implemented
in domino logic.)

Figure 4: Phase assignments result in logic duplication. (Schematics
omitted. Panels (1) and (2) show inverterless domino blocks for
outputs f and g, one labeled with positive phase on both outputs and
one labeled with negative phase on both outputs.)

Figure 5: A comparison of switching in circuits from different
phase assignments. (Schematics omitted. Total switching for the first
realization: domino block 3.6, static inverters on inputs 0.0, static
inverters on outputs .8019. Total switching for the second
realization: domino block .40, static inverters on inputs .72, static
inverters on outputs .0019.)
Figure 6: Overall Power Minimization Paradigm. (Flowchart omitted.
Boxes: Generate Initial Phase Assignment; Power Estimation, consisting
of Partition Sequential Circuit into Combinational Blocks using
Enhanced MFVS and Compute Signal Probabilities Using Enhanced BDD
Variable Ordering; Generate New Candidate Phase Assignment; Output
Final Phase Assignment.)
4.1 Generating Candidate Phase Assignments
In this section we describe how we generate a candidate phase as-
signment. Our heuristic is guided by the following critical obser-
vation:
Property 4.1 Suppose we change the phase assignment of a primary
output. If the signal probability of an individual node in the
transitive fanin is p, then its new signal probability will be $1 - p$.
We also note that a particular choice of phase assignment might
be globally worse in terms of power because of area duplication.
Recall that the area duplication is due to conﬂicting phase assign-
ments, and this is related to some extent to the degree of total over-
lap of the transitive fanin cones of the primary outputs [15]. Hence,
we deﬁne the quantity:
$$O(i, j) = \frac{|D_i \cap D_j|}{|D_i| + |D_j|}$$
where i and j are primary outputs and Di and Dj are the sets of
nodes in the transitive fanin of i and j respectively. We call $O(i, j)$
the overlap of primary outputs i and j; it represents the worst
possible duplication penalty for incompatible phase assignments of i
and j. Next, we define the average signal probability
for the current phase assignment:
$$A_i = \frac{\sum_{n \in D_i} S_n}{|D_i|}$$
where Sn is the signal probability of node n. Finally, we can deﬁne
a cost function for various combinations of phase assignments:
$$K(i^+, j^+) = |D_i| A_i + |D_j| A_j + 0.5 \cdot O(i, j)\,(A_i + A_j)$$
$$K(i^-, j^-) = |D_i| (1 - A_i) + |D_j| (1 - A_j) + 0.5 \cdot O(i, j)\,((1 - A_i) + (1 - A_j))$$
$$K(i^+, j^-) = |D_i| A_i + |D_j| (1 - A_j) + 0.5 \cdot O(i, j)\,(A_i + (1 - A_j))$$
$$K(i^-, j^+) = |D_i| (1 - A_i) + |D_j| A_j + 0.5 \cdot O(i, j)\,((1 - A_i) + A_j)$$
Observe that Property 4.1 is incorporated into the cost function.
We note that $i^+$ refers to retaining the current phase (it does not
mean that we are making the output phase positive) and $i^-$ refers to
inverting the current phase (it does not mean that we are making the
output phase negative). Thus the heuristic for determining a phase
assignment that reduces power is:
(1) Generate an arbitrary initial phase assignment for all the primary
outputs.
(2) For each pair of primary outputs, determine the cost for each of
the four possible phase assignments.
(3) Choose the output pair and phase assignment combination of
minimum cost.
(4) Synthesize the circuit with that particular phase assignment.
(5) Measure the power using the techniques described in Section 4.2.
(6) If the power decreases with this output pair and phase assignment
combination, commit to that combination and remove the pair from the
candidate set. Otherwise, do not commit to that combination, but still
remove the pair from the candidate set.
(7) Return to Step 2 if there are still output pairs in the candidate
set.
We observe that this heuristic can be extended to
capture a greater degree of interaction between phase assignments
by extending the definition of the cost function K to more than a
pair of outputs. If the cost function is extended to all of the primary
outputs in the circuit, the heuristic essentially reduces to a greedily
ordered exhaustive search.
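Steps 2 and 3 of the heuristic can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it reads the operator between 0.5 and O(i, j) in the cost function as a multiplication, and the data layout (a dict of fanin sets and a dict of node signal probabilities), the function names, and the example values are all hypothetical. Synthesis and power measurement (Steps 4 to 6) are outside its scope.

```python
from itertools import combinations, product


def overlap(d_i, d_j):
    """O(i, j): shared fanin nodes relative to the combined cone sizes."""
    return len(d_i & d_j) / (len(d_i) + len(d_j))


def avg_signal_prob(d, s):
    """A_i: average signal probability over the transitive fanin d."""
    return sum(s[n] for n in d) / len(d)


def pair_cost(d_i, d_j, s, flip_i, flip_j):
    """K for one output pair; flip_x=True means inverting the current
    phase of that output, which maps A to 1 - A (Property 4.1)."""
    a_i = avg_signal_prob(d_i, s)
    a_j = avg_signal_prob(d_j, s)
    if flip_i:
        a_i = 1.0 - a_i
    if flip_j:
        a_j = 1.0 - a_j
    return (len(d_i) * a_i + len(d_j) * a_j
            + 0.5 * overlap(d_i, d_j) * (a_i + a_j))


def best_pair(fanin, s):
    """Return (cost, i, j, flip_i, flip_j) of minimum cost over all
    output pairs and all four phase combinations."""
    best = None
    for i, j in combinations(sorted(fanin), 2):
        for flip_i, flip_j in product((False, True), repeat=2):
            cost = pair_cost(fanin[i], fanin[j], s, flip_i, flip_j)
            if best is None or cost < best[0]:
                best = (cost, i, j, flip_i, flip_j)
    return best


if __name__ == "__main__":
    # Hypothetical two-output example with two shared fanin nodes.
    fanin = {"f": {"n1", "n2", "n3"}, "g": {"n2", "n3", "n4"}}
    s = {"n1": 0.9, "n2": 0.8, "n3": 0.7, "n4": 0.6}
    print(best_pair(fanin, s))
```

In this example both fanin cones have high average signal probability, so the cheapest combination inverts the current phase of both outputs.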
4.2 Power Computation in Domino Blocks
We estimate the power consumption in a domino block by the fol-
lowing equation:
$$\sum_{i=1}^{N} S_i \cdot C_i \cdot P_i$$
where Si is the signal probability at the output of gate i, Ci is the
load capacitance at the output of gate i, and Pi is a penalty for a
particular gate type. The quantity Pi arises because we wish to bal-
ance the tradeoff between power savings and circuit performance.
It is well known that certain logic structures such as domino AND
gates are slower than other structures such as domino OR gates.
The reason is that AND gates have transistors in series. For aggres-
sive circuit designs, the performance penalty for using an excessive
number of AND gates may be too high. Hence, we account for this
penalty.
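As a small illustration of the estimate, the sketch below evaluates the sum for a list of gates. It assumes that the three per-gate quantities are combined by multiplication, which is our reading of the equation; the function name, the tuple layout, and the penalty values in the example are hypothetical.

```python
def estimated_power(gates):
    """Sum S_i * C_i * P_i over all gates.

    gates: iterable of (signal_probability, load_capacitance, penalty).
    """
    return sum(s * c * p for s, c, p in gates)


if __name__ == "__main__":
    # Hypothetical block: an AND gate weighted more heavily than two ORs.
    gates = [
        (0.81, 1.0, 1.5),  # AND gate, larger penalty for series transistors
        (0.99, 1.0, 1.0),  # OR gate
        (0.19, 2.0, 1.0),  # OR gate driving a larger load
    ]
    print(estimated_power(gates))
```

Under this reading, unit capacitances and unit penalties reduce the estimate to the sum of the signal probabilities.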
The difficulty in estimating the power lies in the computation
of Si for each node in a sequential domino block. The naive ap-
proaches of using exact symbolic simulation or a straightforward
application of BDDs quickly break down in terms of computational
complexity. This is despite the fact that domino gate switching
behavior can be modeled correctly under a zero delay assumption.
Furthermore, the complexity increases due to the iterative nature
of the algorithm. Hence, in this section we propose methods for
efﬁciently measuring the power consumption in a domino block.
Speciﬁcally in Section 4.2.1 we propose a method for partition-
ing a sequential circuit into combinational blocks in order to sim-
plify power estimation. Additionally, in Section 4.2.2 we propose
a method for ordering the variables in the BDD signal probability
computation.
4.2.1 Minimum Feedback Vertex Set Partitioning
Sequential circuits contain cycles. Hence, the resulting state ex-
plosion makes it computationally expensive to compute the exact
signal probability of each circuit node. One heuristic for reducing
the computational complexity of computing signal probabilities at
the nodes is to partition the circuit into combinational blocks at the
expense of accuracy. For example, Figure 7 illustrates a preferred
partitioning of the circuit. It is preferred because this partitioning
results in a combinational block with fewer primary inputs. Thus, the
problem of computing the signal probability at an internal node is
greatly simpliﬁed.
Figure 7: A sequential circuit with various partitions. (Schematic
omitted. Labels: original sequential circuit; new PI; ideal
partitioning.)
Ideally, we would compute the minimum feedback vertex set
to generate the partitioning. Unfortunately, the MFVS problem is
NP-Complete. However, there are several well known heuristics
used in the testing domain that are potentially applicable for ap-
proximating the MFVS [2]. These techniques use the concept of
transforming a sequential network into an s-graph. An s-graph is a
directed graph representing structural dependencies (edges) among
ﬂip-ﬂops (vertices). Figure 8 shows the graph transformations used
in previous work.
Enhanced Minimum Feedback Vertex Sets for Domino
We make the observation that the process of phase assignment
inevitably leads to some logic duplication. In fact, the logic dupli-
cation in domino can be quite high when compared to the equiv-
alent static blocks [15]. This means that there are many nodes in
a domino block that share common fanins and fanouts. Thus the
corresponding s-graph contains nodes (ﬂip-ﬂops in the original cir-
cuit) that share common fanins and fanouts.
Hence, we propose a fourth transformation which is illustrated
in Figure 9. In that ﬁgure, the s-graph is strongly connected and
none of the original transformations illustrated in Figure 8 can fur-
ther reduce it. However our fourth transformation, known as a sym-
metry based transformation, groups vertices which have identical
fanins and identical fanouts into a weighted supervertex. Thus, the
vertices A, B and E form supervertex ABE with weight 3, and
vertices C and D combine to form supervertex CD with weight 2. The
MFVS algorithm of [2] is then applied to this transformed s-graph
of supervertices with the modiﬁcation that the supervertices in an
s-graph should be processed in descending order of their weights.
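The grouping step of the symmetry based transformation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the edge list is a hypothetical s-graph chosen so that it yields the supervertices named in the text (the actual edges of Figure 9 are not recoverable here), and the function name is ours. The subsequent MFVS computation of [2] is not shown.

```python
from collections import defaultdict


def symmetry_reduce(edges):
    """Group vertices with identical fanin and identical fanout sets.

    edges: iterable of (source, target) pairs of an s-graph.
    Returns (supervertex_name, weight) pairs in descending weight order,
    which is the order in which they would be processed.
    """
    fanin, fanout = defaultdict(set), defaultdict(set)
    vertices = set()
    for u, v in edges:
        fanout[u].add(v)
        fanin[v].add(u)
        vertices.update((u, v))

    groups = defaultdict(list)
    for v in sorted(vertices):
        key = (frozenset(fanin[v]), frozenset(fanout[v]))
        groups[key].append(v)

    supers = [("".join(members), len(members)) for members in groups.values()]
    return sorted(supers, key=lambda item: -item[1])


if __name__ == "__main__":
    # Hypothetical strongly connected s-graph: A, B, E each drive and are
    # driven by both C and D.
    edges = [(u, v) for u in "ABE" for v in "CD"]
    edges += [(v, u) for u in "ABE" for v in "CD"]
    print(symmetry_reduce(edges))
```

For this graph the result is the supervertex ABE with weight 3 followed by CD with weight 2.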
4.2.2 Variable Ordering for BDDs
Once we have split the loops in the sequential circuit to form a
combinational structure we are ready to use BDDs [1] to compute the
signal probability at each circuit node [3, 14]. We can greatly re-
duce the complexity of BDD computations by maximizing sharing
of nodes in the ROBDD. Domino blocks have the following proper-
ties that allow us to maximize BDD node sharing: (1) The circuits
are highly ﬂattened and a node’s average fanout is high, (2) The
overall circuit is highly convergent – nodes near the primary inputs
have a greater number of fanouts than nodes near the primary out-
puts, (3) Most signals in a block of control domino logic feed gates
at the same topological level in the circuit. Hence there is a heavy
overlap of logic cones in the domino implementation. We observe
that with a careful variable ordering we can exploit this overlap
of logic cones to generate more compact BDD representations. We
order the BDD variables according to two principles: (1) Variables
are ordered in the reverse of the order that the circuit inputs are first
visited when the gates are topologically traversed, (2) Gates that are
at the same topological level are traversed in the decreasing order of
the cardinality of their fanout cones. These principles heuristically
ensure that a variable takes a lower position in the BDD ordering
if it is near the primary inputs or has large fanout cones.

Figure 8: Transformations to generate MFVS. (Diagrams omitted. Panels
(a), (b) and (c) show s-graph reductions on vertices X, Y, U and V,
with the labels "Ignore X", "Remove X, mfvs = mfvs + {X}", and
"Ignore X, Ignore Y".)

Figure 9: A new transformation to generate MFVS. (Diagram omitted.
Symmetrization and weighted reduction: vertices A, B, E and C, D are
grouped into supervertices ABE and CD, after which ABE is ignored.)
Consider the example in Figure 10 where a circuit with nodes
P, Q and R is depicted. The two possible topological orders for
visiting the gates are P, Q, R and Q, P, R. Let's focus on the first
order which implies that (primary) inputs x1, x2, x3 are used first,
then x4, and finally x5. We let the input names stand for
the variables in the BDDs, which we construct for all (non input)
circuit nodes, namely, P, Q, and R. According to our previous
observations, the initial BDD ordering should be x5, x4, x3, x2, x1. In
Figure 10 this ordering corresponds to the first row of BDDs. Note
that it requires only 7 non-leaf BDD nodes to represent all the circuit
nodes. In contrast, the second row of BDDs is obtained when
the topological ordering x1, x2, x3, x4, x5 is used. This requires 11
BDD nodes. Finally, the bottom row shows the BDDs obtained if
the “natural” grouping is violated and the primary inputs are
arbitrarily combined. In this case the ordering is x5, x1, x4, x3, x2.
The last BDD variable ordering, which requires 9 non-leaf BDD
nodes, has the variable x1 “unnaturally sandwiched” between x5
and x4. In practice, the circuit nodes have much larger fanouts and
convergence; thus, in practice, our heuristic is actually much more
effective than what is depicted here.
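The two ordering principles can be sketched as a short routine. This is a hedged illustration, not the authors' code: the netlist format (a dict from gate name to fanin list), the alphabetical tie-break between gates with equal level and equal fanout cone size, and the gate connections in the example are assumptions; only the resulting order x5, x4, x3, x2, x1 is taken from the text.

```python
def bdd_variable_order(gates, primary_inputs):
    """Return a BDD variable order for a combinational netlist.

    gates: dict mapping gate name -> list of fanin names.
    primary_inputs: iterable of primary input names.
    """
    inputs = set(primary_inputs)

    level = {}

    def gate_level(name):
        # Topological level: primary inputs are level 0.
        if name in inputs:
            return 0
        if name not in level:
            level[name] = 1 + max(gate_level(f) for f in gates[name])
        return level[name]

    fanouts = {name: set() for name in list(gates) + list(inputs)}
    for gate, fanin in gates.items():
        for f in fanin:
            fanouts[f].add(gate)

    def fanout_cone_size(name):
        # Number of gates reachable from this node.
        seen, stack = set(), [name]
        while stack:
            for succ in fanouts[stack.pop()]:
                if succ not in seen:
                    seen.add(succ)
                    stack.append(succ)
        return len(seen)

    # Principle 2: within a level, larger fanout cones are visited first.
    traversal = sorted(
        gates, key=lambda g: (gate_level(g), -fanout_cone_size(g), g)
    )

    # Principle 1: record inputs in first-visit order, then reverse.
    first_visited = []
    for gate in traversal:
        for f in gates[gate]:
            if f in inputs and f not in first_visited:
                first_visited.append(f)
    return list(reversed(first_visited))


if __name__ == "__main__":
    # Hypothetical connections consistent with the description of Figure 10.
    gates = {"P": ["x1", "x2", "x3"], "Q": ["x3", "x4"], "R": ["P", "Q", "x5"]}
    print(bdd_variable_order(gates, ["x1", "x2", "x3", "x4", "x5"]))
```

For this netlist the gates are traversed as P, Q, R, the inputs are first visited in the order x1 to x5, and the returned variable order is the reverse of that.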
5 Experimental Results
Our experimental framework was an in-house domino synthesis
system and flow. We implemented the heuristic for efficiently
computing signal probabilities for domino circuits and the algorithm for
determining the phase assignment that results in minimum switching
activity. In our optimization objective function, for each gate i
we set the penalty Pi = 0 and the capacitance Ci = 1. Hence, we
effectively determined the phase assignment that minimized the total
switching activity. We measured the power using the EPIC PowerMill
circuit simulator which accounts for accurate delays and
capacitances. Table 1 and Table 2 contain our experimental results
for a mixture of internal proprietary control logic blocks and public
benchmark circuits.

Figure 10: BDD ordering heuristic. (Diagrams omitted. Panels: circuit
with nodes P, Q, R over inputs x1 to x5; BDDs with reverse topological
ordering; BDDs with topological order; BDDs with disturbed signal
grouping.)
Table 1 contains synthesis results and power measurements when
the signal probabilities assigned to the primary inputs were all
0.5 (different signal probabilities yielded similar results). We
used the following flow: (1) We performed technology independent
minimization (2) We independently applied either the minimum
area phase assignment algorithm (identical to [15]) or the minimum
power phase assignment algorithm (depending upon the experiment)
(3) We performed technology mapping (to a proprietary
cell library) (4) We measured the power using the EPIC PowerMill
tool with statistically generated input vectors with the appropriate
signal probabilities. The results under the column “MA” refer to the
OPTIMAL synthesis for minimum area with the algorithm specified
by [15]. Similarly, the results under the column “MP” refer to
the synthesis for minimum power based upon our heuristic. The
power is reported in terms of total capacitive current, short circuit
current, and leakage current (mAmps). The area is reported
in terms of the total number of standard cells.
We note some interesting points. First, the average power sav-
ings is 18%. Second, the benchmark circuit frg1 yields several
insights about our algorithm: (1) The circuit frg1 has only three
primary outputs. Hence, there are only $2^3 = 8$ possible phase
assignments. Despite this severely limited search space, the power
savings of 34% that we achieved is quite high. (2) The area over-
head for this particular synthesis was 48%. This example conclu-
sively shows that the minimum area phase assignment and mini-
mum power phase assignment are quite different.
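The percentage columns of Table 1 follow directly from the size and power columns, and recomputing them is a simple consistency check. The sketch below uses the Table 1 values; the function and variable names are ours. Because the reported sizes and powers are themselves rounded, individual recomputed percentages can differ slightly from the printed ones.

```python
# (circuit, MA size, MA power, MP size, MP power), values from Table 1.
TABLE1 = [
    ("Industry 1", 1849, 12.47, 1970, 9.65),
    ("Industry 2", 2272, 13.74, 2348, 14.13),
    ("Industry 3", 1589, 11.77, 1699, 8.56),
    ("apex7", 394, 3.71, 443, 2.98),
    ("frg1", 98, 1.30, 145, 0.86),
    ("x1", 404, 2.57, 421, 2.34),
    ("x3", 1372, 7.49, 1390, 6.25),
]


def area_penalty(ma_size, mp_size):
    """Percent area increase of the minimum-power result over minimum-area."""
    return 100.0 * (mp_size - ma_size) / ma_size


def power_savings(ma_pwr, mp_pwr):
    """Percent power reduction of the minimum-power result."""
    return 100.0 * (ma_pwr - mp_pwr) / ma_pwr


def averages(rows):
    """Average area penalty and average power savings over all rows."""
    areas = [area_penalty(r[1], r[3]) for r in rows]
    powers = [power_savings(r[2], r[4]) for r in rows]
    return sum(areas) / len(areas), sum(powers) / len(powers)


if __name__ == "__main__":
    for name, ma_size, ma_pwr, mp_size, mp_pwr in TABLE1:
        print(f"{name:<11}{area_penalty(ma_size, mp_size):6.1f}"
              f"{power_savings(ma_pwr, mp_pwr):6.1f}")
    avg_area, avg_power = averages(TABLE1)
    print(f"{'Average':<11}{avg_area:6.1f}{avg_power:6.1f}")
```

The recomputed averages agree with the reported average area penalty of 11.8% and average power savings of 18.0%.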
Table 2 contains the results for the circuits that were run through
the same ﬂow with an additional step of transistor resizing (after
technology mapping) in order to meet realistic timing constraints.
This set of experiments is interesting because the power and area
optimizations can potentially be “undone” by subsequent timing
optimizations. We note that the power based phase assignment ap-
pears to be quite robust with an average power savings of 35%.
Furthermore, the area penalty still is reasonably small, and in fact,
there is a power optimized circuit that has a smaller area than the
area optimized circuits.

Ckt         Desc.          #PIs  #POs  MA Size  MA Pwr  MP Size  MP Pwr  % Area Pen.  % Pwr Sav.
Industry 1  Control Logic   127   122     1849   12.47     1970    9.65          6.5        22.6
Industry 2  Control Logic    97    86     2272   13.74     2348   14.13          3.3        -2.8
Industry 3  Control Logic   117   199     1589   11.77     1699    8.56          6.9        27.3
apex7       Public Domain    79    36      394    3.71      443    2.98         12.4        19.5
frg1        Public Domain    31     3       98    1.30      145    0.86         48.0        34.1
x1          Public Domain    87    28      404    2.57      421    2.34          4.2         8.9
x3          Public Domain   235    99     1372    7.49     1390    6.25          1.3        16.6
Average                                                                         11.8        18.0
Table 1: Synthesis when signal probabilities of primary inputs were 0.5

Ckt         Desc.          #PIs  #POs  MA Size  MA Pwr  MP Size  MP Pwr  % Area Pen.  % Pwr Sav.
apex7       Public Domain    79    36      452    3.72      485    3.04          7.3        18.3
frg1        Public Domain    31     3       98    3.20      147    1.91           50        40.3
x1          Public Domain    87    28      406    7.67      433    6.10          6.7        20.5
x3          Public Domain   235    99     2005   70.13     1601   26.61        -20.0        62.0
Average                                                                          8.6        35.3
Table 2: Timed synthesis when signal probabilities of primary inputs were 0.5
6 Conclusions
In this paper, we have studied the problem of low power logic syn-
thesis for domino circuits. We have shown that the choice of output
phase assignment can dramatically affect power consumption in a
block of domino logic. Furthermore, we have presented heuris-
tics for determining a phase assignment that minimizes power con-
sumption in sequential domino blocks. Finally, we presented
experimental results on a variety of public benchmarks and industry
circuits that show these techniques can be beneﬁcial in practice.
One promising direction for future work is in the area of integrat-
ing the choice of phase assignment with timing optimization. We
believe that such a combination will lead to even greater power sav-
ings.
Acknowledgements
We would like to thank Barbara Chappell, Rony Friedman, Jeff
Parkhurst, Rob Roy, Prashant Sawkar, Prashant Saxena, Carl Seger,
Naresh Sehgal, George Stamoulis, Xinning Wang, Tamar Yehoshua,
and the team for their valuable feedback about this research.
References
[1] R. Bryant. Graph-based algorithms for boolean manipulation. IEEE
Transactions on Computers, C-35(8):677–691, 1986.
[2] S. T. Chakradhar, A. Balakrishnan, and V. D. Agrawal. An exact al-
gorithm for selecting partial scan ﬂip-ﬂops. In Design Automation
Conference, pages 81–86, 1994.
[3] S. Chakravarty. On the complexity of using bdds for the synthesis and
analysis of boolean circuits. In Allerton Conference on Communica-
tion, Control and Computing, pages 730–739, 1989.
[4] A. Chandrakasan and R. Brodersen. Low Power Digital CMOS
Design. Kluwer Academic Publishers, 1995.
[5] H. Y. Chen and S. M. Kang. Performance optimization for domino
cmos circuit modules. In ICCD, pages 522–525, 1997.
[6] J. C. Costa, J. Monteiro, and S. Devadas. Switching activity estimation
using limited depth reconvergent path analysis. In International Sym-
posium on low power electronics and design, pages 184–189, 1997.
[7] S. M. Kang. Data shifting and rotating apparatus. US Patent
4,396,994, August 1983.
[8] U. K. Narayanan, H. Leong, K. Chung, and C. L. Liu. Low power
multiplexer decomposition. In International Symposium on low power
electronics and design, pages 269–274, 1997.
[9] U. K. Narayanan and C. L. Liu. Low power logic synthesis for xor
based circuits. In International Conference on Computer-Aided De-
sign, 1997.
[10] U. K. Narayanan, P. Pan, and C. L. Liu. Low power logic synthe-
sis under a general delay model. In International Symposium on low
power electronics and design, 1998.
[11] R. Panda and F. Najm. Technology decomposition for low-power syn-
thesis. In IEEE Custom Integrated Circuits Conference, pages 627–
630, 1995.
[12] P. Patra. Approaches to Design of Circuits for Low-Power Computa-
tion. PhD thesis, The University of Texas at Austin, 1995.
[13] P. Patra and D. Fussell. Power-efﬁcient delay-insensitive codes for
data transmission. In Proc. of 28th Hawaii International Conference
on System Sciences, Jan 1995.
[14] M. Pedram. Power minimization in IC design: Principles and ap-
plications. ACM Transactions on Design Automation of Electronic
Systems, 1(1):3–56, 1996.
[15] R. Puri, A. Bjorksten, and T. Rosser. Logic optimization by output
phase assignment in dynamic logic synthesis. In International Con-
ference on Computer Aided Design, pages 2–8, 1996.
[16] N. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A
Systems Perspective. Addison-Wesley, 1993.