Optimizing CMOS circuits for low power using transistor reordering by Musoll Cinca, Enric & Cortadella, Jordi
Optimizing CMOS Circuits for Low Power using Transistor Reordering * 
E. Musoll and J. Cortadella 
Dept. of Computer Architecture 
Universitat Politècnica de Catalunya
08071 Barcelona, Spain
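Table 1(b): Power consumption of configurations (A)-(D), relative to
configuration (D) in case (1).

Case  Activity (trans./sec)                  (A)   (B)   (C)   (D)   Red.
(1)   D_a1 = 10K, D_a2 = 100K, D_b = 1M      0.81  0.84  0.98  1.0   19%
(2)   D_a1 = 1M,  D_a2 = 100K, D_b = 10K     0.58  0.53  0.53  0.48  17%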
Abstract 
This paper addresses the optimization of a circuit for low power using
transistor reordering. The optimization algorithm relies on a stochastic
model of a static CMOS gate that includes the power of the internal nodes
of the gate. This power-consumption model depends on the switching
activity and the equilibrium probabilities of the inputs of the gate. The
model allows an exploration of the different configurations of a gate that
are obtained by reordering its transistors. Thus, the best configuration
of each gate is selected and the overall power consumption of the circuit
is reduced.
1 Introduction 
The continuously increasing packing density and clock frequency of static
CMOS circuits have pushed low power as one of the principal design
parameters, especially in battery-powered portable systems such as
note-pad computers, personal digital assistants, multimedia terminals and
mobile telephones.
This paper addresses the optimization of a circuit for low power using
transistor reordering from a gate-level description. The optimization
algorithm uses a power-consumption model of a static CMOS gate that takes
into account the power of the internal nodes of the gate. This model
allows a fast exploration of the different configurations of a gate that
are obtained by reordering its transistors. Thus, the best configuration
of each gate is selected and the overall power consumption of the circuit
is decreased.
We focus on combinational multilevel circuits, where it has been shown
that the power consumption of useless signal transitions (i.e. those
transitions that do not contribute to the final result of the circuit)
accounts for a large fraction of the overall dynamic power consumption of
the circuit. Thus, it is necessary to incorporate the switching activity
of the input signals into the power-consumption model of the gate.
1.1 Motivation examples 
To illustrate why it is important to incorporate switching-activity
information into the power estimation of a gate, consider the four
possible configurations of the gate in Figure 1(a), which implements the
function $y = \overline{(a_1 + a_2)\, b}$. Different switching activities
of the inputs ($D_{a_1}$, $D_{a_2}$ and $D_b$) result in different optimal
transistor reorderings of the gate, as shown in Table 1(b). The
equilibrium probability (i.e. the probability of a signal being 1) of all
input signals has been set to 0.5. Table 1(b) shows the power consumption
of the different configurations for two input switching-activity
scenarios (cases (1) and (2)), relative to configuration (D) in case (1).
Time intervals between two consecutive transitions of input signal $k$ to
the gate follow an exponential
*This work has been supported by CICYT TIC95-0419 and Dept. 
d’Ensenyament de la Generalitat de Catalunya. 
distribution with average $1/D_k$. In case (1) the best transistor
ordering is given by configuration (A) of Figure 1(a); the power
consumption is decreased by 19% with respect to configuration (D). In
case (2), the power is decreased by 17% if configuration (D) is chosen
instead of (A).
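As a quick check, the two reductions quoted above can be recomputed from the relative powers listed in Table 1(b). This is only a worked restatement of those figures, not additional data:

```latex
\begin{align*}
\text{Case (1), (A) instead of (D):}\quad & \frac{1.0 - 0.81}{1.0} = 0.19 = 19\% \\
\text{Case (2), (D) instead of (A):}\quad & \frac{0.58 - 0.48}{0.58} \approx 0.172 \approx 17\%
\end{align*}
```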
The ripple-carry adder is another example in which the equilibrium
probability does not give enough information to optimize gates for low
power. Consider an adder implemented as a chain of full-adders that has to
calculate the addition of two n-bit operands with equal equilibrium
probability for all bits. The equilibrium probability of all inputs of the
full-adders is 0.5, but the switching activity of the full-adder inputs
corresponding to the operands is low (0.5 transitions per operation),
whereas the switching activity of the input corresponding to the
propagated carry is higher (especially in those full-adders that compute
the most-significant bits) because of the generation and propagation of
useless signal transitions.
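The growth of carry activity along the chain can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes a unit-delay model (every carry stage takes one time step), a carry-in fixed to 0, and it averages exhaustively over all pairs of consecutive operand values. The function names are hypothetical.

```python
from itertools import product

def majority(a, b, c):
    """Carry-out of a full-adder."""
    return (a & b) | (c & (a | b))

def carry_activity(n_bits):
    """Average transitions per operation of carries c[1..n], unit-delay model."""
    words = list(product((0, 1), repeat=n_bits))
    toggles = [0] * (n_bits + 1)   # toggles[i] counts transitions of carry c[i]
    cases = 0
    for a_old, b_old, a_new, b_new in product(words, repeat=4):
        # Settled carries for the previous operands (c[0] is the fixed carry-in).
        c = [0] * (n_bits + 1)
        for i in range(n_bits):
            c[i + 1] = majority(a_old[i], b_old[i], c[i])
        # Apply the new operands; all stages update together, one delay per step.
        for _ in range(n_bits):
            nxt = [0] + [majority(a_new[i], b_new[i], c[i]) for i in range(n_bits)]
            for i in range(n_bits + 1):
                toggles[i] += nxt[i] != c[i]
            c = nxt
        cases += 1
    return [t / cases for t in toggles[1:]]

activity = carry_activity(3)   # activity[0] is the carry out of the first stage
```

Under these assumptions each operand bit toggles 0.5 times per operation by construction, while the carry activity grows with the bit position because later stages first react to stale carries and then correct themselves.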
2 Previous work and overview 
Carlson [2] hinted at the possibility of using the transistor-reordering
technique to decrease power consumption and presented an algorithm for
delay/power/area optimization in which high speed was synonymous with high
power consumption. This approach to measuring power consumption is not
sufficiently accurate, since it does not consider the probability and
switching activity of signals. No power-consumption reductions are
reported in [2].
Input reordering is a subset of the transistor-reordering techniques. For
example, by reordering the inputs in a 3-input NAND gate, 6 different
configurations of the gate are
1066-1409/96 $5.00 © 1996 IEEE
[Table: 3-input NAND gate with input activities of 10K, 100K and 1M trans./sec; power of each input ordering]
level description of the circuit and generates an optimized 
circuit for low power is described. Results are presented for 
several MCNC benchmarks in Section 5. In Section 6, some 
conclusions are drawn. 
3 Power consumption of CMOS gates 
In CMOS circuits there are three sources of power consumption: switching
activity, direct-path short-circuit current and leakage current. In static
CMOS, the switching-activity source dominates the total power consumption
because of the charging of capacitors. The average power consumption of a
CMOS gate can be modeled with the equation
$$P_{avg} = \frac{1}{2} \frac{C_{load} V_{dd}^2}{T_{cyc}} D,$$
where $C_{load}$ is the load capacitance, $V_{dd}$ is the power supply
voltage, $T_{cyc}$ is the global clock period and $D$ is the number of
signal transitions at the output of the gate per clock cycle.
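A minimal numeric sketch of this model, which multiplies half the switched energy $C_{load} V_{dd}^2$ by the number of transitions per cycle and divides by the clock period. The component values below are illustrative assumptions, not figures from the paper.

```python
def average_power(c_load, v_dd, t_cyc, d):
    """P_avg = 1/2 * C_load * V_dd^2 * D / T_cyc, in watts for SI inputs."""
    return 0.5 * c_load * v_dd ** 2 * d / t_cyc

# Assumed values: 50 fF load, 5 V supply, 10 ns clock period,
# 0.5 output transitions per clock cycle.
p_avg = average_power(c_load=50e-15, v_dd=5.0, t_cyc=10e-9, d=0.5)
```

With these assumed values the result is about 31 microwatts. Doubling $D$ doubles the power, which is why the activity of each node matters as much as its capacitance.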
3.1 Definitions and overview 
Definition 3.1 (Stochastic process) Let $x(t)$ be the value of a system
characteristic being observed at time $t$. In most situations, $x(t)$ is
not known before time $t$ and may be viewed as a random variable. A
stochastic process is a description of the relation between the random
variables $x(t)$.
Definition 3.2 (Stationary Markov process) A stationary Markov process is
a stochastic process where the probability law relating the next period's
state to the current state does not change (or remains stationary) over
time.
Definition 3.3 (Equilibrium probability) Let $x(t)$, $t \in (-\infty,
+\infty)$, be a 0-1 stationary Markov process with random transition
times. The probability that it takes the value 1 at any given time $t$ is
the expected value $E[x(t)]$ at that time and it is independent of time.
This value is called the equilibrium probability of $x(t)$ and is denoted
as $P(x)$.
Definition 3.4 (Switching activity) The switching activity of a 0-1
stochastic process $x(t)$ is the number of 0-to-1 transitions and 1-to-0
transitions of $x(t)$ in a time unit.
Henceforth, we will model logic signals of a circuit as 0-1 stationary
Markov processes as in [g] to derive a power-consumption model of a static
CMOS gate that includes the switching activity at the output and internal
nodes of the gate. The model depends on both the equilibrium probabilities
and the switching activity of the inputs of the gate.
The switching activity in both input and output nodes of a gate is
measured with the transition density technique described in [6]. This
technique is briefly reviewed in the next section.
3.2 Transition density 
The transition density is a compact measure of the switching activity in
digital circuits. The transition density of a node is the average number
of signal transitions per time unit at that node, and it is defined as
$$D(y_j) = \sum_{i=1}^{n} P\left(\frac{\partial y_j}{\partial x_i}\right) D(x_i),$$
where $P(z)$ is the equilibrium probability, $D(z)$ is the transition
density, and $x_i$ ($y_j$) are the $n$ ($m$) gate inputs (outputs).
$\partial y_j / \partial x_i$ is named the boolean difference and it is a
boolean function that may depend on all $x_p$, $p = 1 \ldots n$,
$p \neq i$. The boolean difference $\partial y / \partial x$ is defined as
$y|_{x=1} \oplus y|_{x=0} = y(x) \oplus y(\bar{x})$. If
$\partial y_j / \partial x_i = 1$, then all the transitions at input $x_i$
are propagated to output $y_j$.
Thus, the transition density provides a fast way of propagating switching
activity from the primary inputs to the outputs of the gates that compose
a circuit.
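The propagation rule can be sketched by brute-force enumeration of the truth table. The code below is an illustration rather than the authors' implementation: it assumes statistically independent inputs, uses hypothetical helper names, and applies the rule to the gate of Section 1.1 with the activities of case (1).

```python
from itertools import product

def probability(f, p):
    """P(f = 1) for independent inputs with P(x = 1) = p[x]."""
    names = sorted(p)
    total = 0.0
    for values in product((0, 1), repeat=len(names)):
        env = dict(zip(names, values))
        if f(env):
            weight = 1.0
            for name in names:
                weight *= p[name] if env[name] else 1.0 - p[name]
            total += weight
    return total

def boolean_difference(f, x):
    """dy/dx = y|x=1 XOR y|x=0, returned as a new boolean function."""
    return lambda env: f({**env, x: 1}) != f({**env, x: 0})

def transition_density(f, p, d):
    """D(y) = sum over inputs of P(dy/dx_i) * D(x_i)."""
    return sum(probability(boolean_difference(f, x), p) * d[x] for x in d)

def oai21(env):
    """y = NOT((a1 OR a2) AND b)."""
    return not ((env["a1"] or env["a2"]) and env["b"])

p = {"a1": 0.5, "a2": 0.5, "b": 0.5}
d_case1 = {"a1": 10e3, "a2": 100e3, "b": 1e6}     # transitions per second
density = transition_density(oai21, p, d_case1)
```

Here the boolean differences have probabilities 0.25, 0.25 and 0.75 for $a_1$, $a_2$ and $b$, so the output density is dominated by the most active input $b$.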
Input order   a     b     a     c     b     c
              b     a     c     a     c     b
              c     c     b     b     a     a
Power         0.91  0.91  0.95  0.96  0.99  1.0
3.3 Extended power-consumption model 
3.3.1 Notation 
CALCULATE_H_FUNCTION (n_k, function)
  dfs_list_k = DEPTH_FIRST_SEARCH (n_k, vdd)
  for each element (dfs_list_k, e_kj) do
    ADD_INPUT_TO_MINTERM (e_kj, function)
    if source(e_kj) = vdd then
      CREATE_NEW_MINTERM (function)

Figure 2: (a) Static CMOS gate representation and (b) algorithm to obtain
function $H_{n_k}$.
We represent a static CMOS gate as a directed acyclic graph $(V, E)$.
$V = \{n_0 \ldots n_{p-1}, y, vdd, vss\}$ is the set of nodes representing
the $p$ internal nodes of the gate ($n_0 \ldots n_{p-1}$), the output node
($y$) and the power and ground nodes ($vdd$, $vss$). The set of edges
representing the $2q$ transistors ($q$ of type P and $q$ of type N) that
connect nodes in $V$ is
$E = \{e_{0_P} \ldots e_{q-1_P}, e_{0_N} \ldots e_{q-1_N}\}$. Figure 2(a)
shows the graph representation of gate (C) in Figure 1(a). Note that this
representation retains the transistor order information of the gate.
The power consumption of a node (output or internal) of a gate is
potentially affected by all the inputs of the gate. In particular, the
power consumption of node $n_k$ produced by input $x_i$
($W_{n_k}|_{x_i}$) is $\frac{1}{2} V_{dd}^2 C_{n_k} T_{x_i \to n_k}$,
where $T_{x_i \to n_k}$ is the number of transitions that node $n_k$
undergoes because of input $x_i$. In other words, $T_{x_i \to n_k}$
expresses how many signal transitions of $x_i$ ($D_{x_i}$) are propagated
to node $n_k$. $C_{n_k}$ is the capacitance of node $n_k$¹.
Henceforth, we assume that a node is charged (discharged) only when there
is a direct path from the power (ground) supply to the node, i.e. we do
not consider charge sharing among the nodes of the gate.
3.3.2 Computation of the model 
To compute $T_{x_i \to n_k}$, the boolean function that represents all
possible paths from the power supply to node $n_k$ needs to be calculated.
Let us call this function $H_{n_k}$; similarly, $G_{n_k}$ is the boolean
function that represents all possible paths from $n_k$ to ground².
The algorithm to obtain the $H_{n_k}$ function is depicted in Figure 2(b).
Function $H_{n_k}$ is obtained by generating a minterm for each possible
path from node $n_k$ to supply node $vdd$. A path from node $n_k$ to $vdd$
is a set of $r$ edges $e_{k_j}$ so that $dst(e_{k_0}) = n_k$,
$dst(e_{k_1}) = src(e_{k_0})$, ..., $dst(e_{k_{r-1}}) = src(e_{k_{r-2}})$
and $src(e_{k_{r-1}}) = vdd$, where $src(e_k)$ ($dst(e_k)$) is the source
(destination) node of edge $e_k$. Using a depth-first-search approach [3],
a list of all edges to visit is created (depth-first-search-list_k).
Afterwards, the edges of this list are added to the current minterm of the
$H_{n_k}$ boolean function (ADD_INPUT_TO_MINTERM()) until an edge
$e_{k_j}$ is reached so that $src(e_{k_j}) = vdd$. In this case, a new
minterm is created (CREATE_NEW_MINTERM()), sharing with the last created
minterm all its edges but the last one visited.
¹ Because the task of modeling the capacitances of the nodes of a gate is
difficult, these capacitances should be extracted and stored for all gates
of the library whenever possible. This is the approach followed in this
paper.
² Note that $G_{n_k}$ and $H_{n_k}$ are complementary functions only when
$n_k$ is the output node ($y$) of the gate.
In the example of Figure 2(a), the four minterms generated when
calculating $H_{n_1}$ are
$\{a_1 \bar{b},\ a_1 \bar{a}_1 \bar{a}_2,\ a_2 \bar{b},\ a_2 \bar{a}_2 \bar{a}_1\}$,
leading to $H_{n_1} = \bar{b}\,(a_1 + a_2)$. Similarly, $G_{n_k}$ can also
be derived. In Figure 2(a), $G_{n_1} = b$. The time complexity of these
algorithms is linear in the number of transistors of the gate.
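The path enumeration can be sketched as a recursive depth-first walk from a node back to the supply. The edge list below is a hypothetical encoding of one OAI21 configuration, chosen to be consistent with the example's result ($H_{n_1}$ conducting when $b = 0$ and $a_1$ or $a_2$ is 1, $G_{n_1} = b$). Figure 2(a) itself is not reproduced here, and the names `EDGES`, `paths_to_vdd` and `evaluate` are illustrative.

```python
# Hypothetical directed edges: (source, destination, input, conducting value).
# P transistors conduct when their input is 0, N transistors when it is 1.
EDGES = [
    ("vdd", "y", "b", 0),     # P transistor b
    ("vdd", "n0", "a1", 0),   # P transistor a1
    ("n0", "y", "a2", 0),     # P transistor a2
    ("y", "n1", "a1", 1),     # N transistor a1
    ("y", "n1", "a2", 1),     # N transistor a2
    ("n1", "vss", "b", 1),    # N transistor b
]

def paths_to_vdd(node, edges):
    """Depth-first enumeration of every path from `node` back to vdd."""
    found = []
    for src, dst, name, value in edges:
        if dst != node:
            continue
        if src == "vdd":
            found.append([(name, value)])
        else:
            found.extend([(name, value)] + rest for rest in paths_to_vdd(src, edges))
    return found

def evaluate(paths, env):
    """H is 1 when every transistor on at least one path conducts."""
    return any(all(env[name] == value for name, value in path) for path in paths)

h_n1 = paths_to_vdd("n1", EDGES)   # one product term per path
```

Two of the four product terms contain an input together with its complement and can never be satisfied, so only the two terms through transistor $b$ of the P network contribute to the function.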
Afterwards, the boolean difference of function $H_{n_k}$ with respect to
input $x_i$ (that is, $\partial H_{n_k} / \partial x_i$) and the
equilibrium probability of node $n_k$ need to be calculated. The boolean
function $\partial H_{n_k} / \partial x_i$ is calculated as explained in
Section 3.2. The equilibrium probability of node $n_k$ is obtained as
follows [4]: the probability of node $n_k$ being '1' at a given instant of
time ($P(n_k)|_t$) is the probability that $n_k$ was '1' in the instant
before ($P(n_k)|_{t-1}$) and it is not discharged
($\overline{P(G_{n_k})}$) or that it was '0'
($\overline{P(n_k)|_{t-1}}$) and it is charged ($P(H_{n_k})$), i.e.
$$P(n_k)|_t = P(n_k)|_{t-1}\, \overline{P(G_{n_k})} + \overline{P(n_k)|_{t-1}}\, P(H_{n_k}),$$
where $\overline{P(f)} = 1 - P(f)$.
Since all signals are assumed to be 0-1 stationary Markov processes, the
steady state value of $P(n_k)$ can be derived as
$$P(n_k) = \frac{P(H_{n_k})}{P(H_{n_k}) + P(G_{n_k})}.$$
We conclude that $W_{n_k}|_{x_i}$ is
$T_{cyc}$
If the contributions of all nodes (output and internal) are taken into
account, the power estimation of the gate is obtained as
$$P_{gate} = \sum_{i=1}^{n} \left( \sum_{k=0}^{p-1} W_{n_k}|_{x_i} + W_y|_{x_i} \right),$$
where $p$ is the number of internal nodes and $n$ is the number of inputs
of the gate.
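The equilibrium probability of an internal node, as described in prose above (the node stays at '1' unless discharged through $G_{n_k}$ and becomes '1' when charged through $H_{n_k}$), can be checked numerically. The sketch below is one reading of that description, not the authors' code. It iterates the update and compares it with the fixed point obtained by setting the probability at consecutive instants equal. The example values assume node $n_1$ of the example gate with all input probabilities at 0.5.

```python
def node_probability(p_h, p_g, steps=100, start=0.0):
    """Iterate P_t = P_{t-1} * (1 - P(G)) + (1 - P_{t-1}) * P(H)."""
    p = start
    for _ in range(steps):
        p = p * (1.0 - p_g) + (1.0 - p) * p_h
    return p

def steady_state(p_h, p_g):
    """Fixed point of the update above: P = P(H) / (P(H) + P(G))."""
    return p_h / (p_h + p_g)

# Assumed example: H = (not b) and (a1 or a2), G = b, inputs at probability 0.5,
# so P(H) = 0.5 * 0.75 = 0.375 and P(G) = 0.5.
p_iter = node_probability(0.375, 0.5)
p_closed = steady_state(0.375, 0.5)
```

Both values agree at 3/7, and the iteration converges regardless of the starting probability because each step shrinks the distance to the fixed point by the factor $1 - P(H) - P(G)$.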
4 Power-optimization algorithm 
OBTAIN_PROBABILITIES (circuit)
gate_list = DEPTH_FIRST_TRAVERSE (circuit)
for each gate gate in gate_list do
  info_inputs = OBTAIN_PROB_AND_DENS (gate, circuit)
  FIND_BEST_REORDERING (info_inputs, gate, circuit)
  info_output = CALCULATE_DENS (info_inputs, gate)
  UPDATE_CIRCUIT_INFORMATION (info_output, circuit)

Figure 3: Optimization algorithm.
In this section, an algorithm that traverses the gate description of the
circuit is presented. For each gate, it finds the best transistor
reordering using the power-consumption model explained in Section 3.
4.1 Algorithm overview 
Finding the best transistor reordering implies an exhaustive exploration
of each gate. Since most gates only have a small number of transistors in
series, an exhaustive exploration is feasible. The algorithm to obtain all
possible transistor reorderings of a gate will be addressed later.
A simplified algorithm of the optimization process for low power is shown
in Figure 3. The probabilities for all output nodes of the gates of the
circuit are computed in OBTAIN_PROBABILITIES() following the algorithm
proposed in [7]. DEPTH_FIRST_TRAVERSE() returns the list of gates
(gate_list) of the circuit (circuit) ordered in a depth-first fashion [3]
from the outputs, i.e. every gate appears somewhere after all of its
transitive fan-in gates. For each gate (gate) of this list, the
probability and transition density information for all of its inputs is
obtained from the circuit (OBTAIN_PROB_AND_DENS()). Afterwards, the best
reordering is derived for gate gate (FIND_BEST_REORDERING()). Finally, the
transition density of the output node of the gate is calculated
(CALCULATE_DENS()) and this information is transferred to the circuit
(UPDATE_CIRCUIT_INFORMATION()).
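The ordering requirement on the gate list (every gate after all of its transitive fan-in gates) is what a post-order depth-first traversal from the outputs produces. The sketch below illustrates only that step. The netlist encoding and the function name are assumptions made for the example, not the data structures used by the authors.

```python
def depth_first_traverse(fanin, outputs):
    """Return gates so that each one follows all of its transitive fan-in gates."""
    order, seen = [], set()

    def visit(gate):
        if gate in seen or gate not in fanin:   # primary inputs are not gates
            return
        seen.add(gate)
        for driver in fanin[gate]:
            visit(driver)
        order.append(gate)                      # post-order: fan-in gates first

    for out in outputs:
        visit(out)
    return order

# Hypothetical netlist: gate name -> names of the signals driving it.
fanin = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"], "g4": ["g3", "g1"]}
gate_list = depth_first_traverse(fanin, outputs=["g4"])
```

Processing gates in this order guarantees that the probability and transition density of every gate input are already known when the gate is optimized, which is what allows a single pass over the circuit.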
4.2 Monotonic characteristic 
The algorithm takes advantage of the following property of the model
explained in Section 3: the reduction of the power in an individual gate
always decreases the power of the circuit. The reason for this monotonic
behavior is that all possible transistor reorderings of a gate lead to the
same probability and transition density at its output node if the model
explained in Section 3 is used to compute them. Since (a) the model relies
precisely on the probability and transition density of the inputs of a
gate to decrease its power consumption and (b) the power of the circuit is
the sum of the power of its gates, it is clear that the reduction of the
power in an individual gate always decreases the total power of the
circuit. This monotonic behavior may not correspond to the actual behavior
of a circuit, but the experiments have shown that this local (greedy)
approach results in an overall power reduction for the whole circuit.
Thus, with only one traversal of the circuit, the optimal reordering
(always with respect to the model) for all gates is obtained.
4.3 Exhaustive exploration of gate configurations 
Table 2: Gate library.

Gate    #C   Gate         #C   Gate         #C
inv      1   aoi21[A,B]    4   oai21[A,B]    4
nand2        aoi22             oai22
nand3        aoi31[A,B]   12   oai31[A,B]   12
PIVOTE_AND_SEARCH (gate_graph, visited_reords, current_node)
  gate_graph = PIVOTING_ON_NODE (gate_graph, current_node)
  if not VISITED (gate_graph, visited_reords) then
    visited_reords = ADD_TO_VISITED_REORDS (gate_graph)
    for index = 1 to number_of_internal_nodes do
      if index ≠ current_node then
        PIVOTE_AND_SEARCH (gate_graph, visited_reords, index)

FIND_ALL_REORDERINGS (gate_graph)
  visited_reords ← ∅
  for index = 1 to number_of_internal_nodes do
    PIVOTE_AND_SEARCH (gate_graph, visited_reords, index)

Figure 4: Exhaustive exploration algorithm.
Figure 5: Execution example.
The algorithm recursively points to an internal node (current_node) and
pivots on it to obtain a new reordering (PIVOTING_ON_INTERNAL_NODE()).
Further searching for new reorderings is pruned if the reordering obtained
has already been visited (VISITED()). If it has not been visited, it is
added to the set of transistor reorderings of the gate already visited
(ADD_TO_VISITED_REORDERINGS()) and the algorithm is called again for all
internal nodes of the gate except the current one (this prevents the
generation of a reordering that we know beforehand has already been
visited). In [5] it is demonstrated that all possible transistor
reorderings of a gate are generated with the algorithm in Figure 4.
To illustrate how this algorithm works, it has been applied to the gate
implementing the function $y = \overline{(a_1 + a_2)\, b}$. Figure 5 shows
the execution. The starting graph representation of the gate is the one in
Figure 2(a). We observe that all four possible reorderings (those already
seen in Figure 1(a)) are generated.
The algorithm in Figure 4 works for gates that can be represented with a
series-parallel graph. The gates of typical libraries can all be
represented with this type of graph.
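For series-parallel gates, the number of reorderings can be cross-checked without pivoting. The sketch below is a different, simpler technique than the algorithm of Figure 4: it describes each network as a tree of series and parallel compositions and counts the permutations of the children of every series node, assuming the N and P networks are reordered independently and that all inputs are distinct. The tree encoding and function names are hypothetical.

```python
from math import factorial

def orderings(net):
    """Number of distinct series orderings of one series-parallel network."""
    if isinstance(net, str):          # a single transistor
        return 1
    kind, children = net
    count = 1
    for child in children:
        count *= orderings(child)
    if kind == "series":              # order matters only along a series chain
        count *= factorial(len(children))
    return count

def dual(net):
    """Swap series and parallel to obtain the complementary network."""
    if isinstance(net, str):
        return net
    kind, children = net
    other = "parallel" if kind == "series" else "series"
    return (other, [dual(child) for child in children])

def configurations(n_network):
    """Reorderings of the whole gate: N-network orderings times P-network orderings."""
    return orderings(n_network) * orderings(dual(n_network))

oai21 = ("series", ["b", ("parallel", ["a1", "a2"])])
nand3 = ("series", ["a", "b", "c"])
aoi211 = ("parallel", [("series", ["a", "b"]), "c", "d"])
counts = {name: configurations(net)
          for name, net in (("oai21", oai21), ("nand3", nand3), ("aoi211", aoi211))}
```

Under these assumptions the count for the example gate is four and the count for a 3-input NAND is six, matching the numbers quoted in the text.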
Table 2: Gate library (continued).

Gate    #C   Gate            #C   Gate            #C
nor3     6   aoi211[A,B,C]   12   oai211[A,B,C]   12
nor4    18   aoi221[A,B,C]   24   oai221[A,B,C]   24
             aoi222          48   oai222          48

5 Results
5.1 Scenarios for the experiments 
A wide range of MCNC circuits have been used as benchmarks. They have been
mapped into the gate library shown in Table 2. In some cases, to obtain
all transistor reorderings of a gate, it is necessary to have more
instances of that gate. For example, there are two instances of gate
oai21: oai21[A], which is able to implement configurations (A) and (B) of
Figure 1(a), and oai21[B], which is able to implement configurations (C)
and (D). All instances of the gates in Table 2 have been implemented in a
Sea-of-Gates design style.
Two scenarios have been considered to evaluate the power-consumption
savings obtained with the transistor reordering technique (see
Figure 6(a)). In Scenario A, the circuit is considered to be embedded in a
larger digital system. Thus, the equilibrium probability and the
transition density of the inputs of the circuit may take very different
values. In this scenario, the probabilities and transition densities of
the primary inputs of each circuit are randomly set with a uniform
distribution. Probabilities range from 0 to 1 and transition densities
range from 0 to 1 million transitions per second. In Scenario B, the
circuit is considered to be the whole digital system, with latches at its
inputs and working at a fixed frequency. In this scenario, the probability
and the transition density of the primary inputs of the circuit are set
to, respectively, 0.5 and 0.5 transitions per cycle. In both scenarios,
the optimization algorithm has been applied to the original gate-level
description of the circuits to obtain, for each gate, the best instance
and, for each instance, the best input reordering. Because all instances
of the same gate have the same area, the total area of the optimized
circuit remains the same.
.. 
7.1 
5.7 I 
Figure 6: The two scenarios considered. 
-Y.Z 
17.2 
2.3 
- -  
224 9.5 
148 6.2 
316 2.8 
w 11.5 
117 9.9 
43 7.8 
24 15.4 
W 5.8 
64 6.5 
55 9.7 
128 4.3 
45 2.8 
459 10.5 
1% 11.2 
47 8.4 
64 7.7 
67 12.2 
62 13.0 
49 8.4 
41 10.7 
73 13.8 
84 12.7 
155 7.9 
so 11.8 
540 2.1 
401 5.4 
235 8.6 
424 7.9 
442 7.5 
222 6.2 
284 7.7 
411 8.4 
516 5.6 
408 8.8 
206 12.9 
132 12.5 
485 11.0 
244 10.2 
313 i1.n 
1.2 
4 . 7  
-1.6 
-15.5 
4.4 
1.1 
0.0 
17.5 
-2.7 
9.4 
13.3 
6.2 
1.1 
-8.0 
11.7 
-2.4 
11.8 
13.8 
8.4 
1.5 
-5.3 
7.5 
3.7 
11.7 
2.3 
4.4 
3.6 
17.2 
R.1 
Y.6 
1.3 
13.5 
6.0 
-11.2 
-2.4 
0.0 
4.0 
1.0 
0.2 
z 
S 
14.6 
14.1 
4.6 
13.7 
9.3 
11.9 
- 
20.0 
10.2 
11.9 
12.5 
12.5 
12.3 
5.9 
11.0 
12.4 
13.3 
13.6 
16.8 
15.2 
13.8 
3.8 
15.4 
15.0 
12.6 
11.5 
1R.2 
2.9 
13.6 
12.3 
8.4 
7.6 
7.9 
3.3 
12.3 
13.7 
13.4 
13.3 
11.7 
9.0 
12.0 
- 
- 
5.7 
4.3 
1.7 
4.1 
3.7 
2.7 
8.7 
2.2 
1.8 
3.8 
4.1 
1.7 
0.6 
4.6 
5.3 
5.1 
6.4 
6.4 
4.9 
5.5 
3.0 
4.5 
3.0 
4.7 
1.9 
4.5 
5.7 
2.1 
1.6 
1.2 
4.5 
4.6 
6.5 
7.1 
7.0 
(1.8 
6.1 
3.5 
4.2 
S 
6.4 
8.2 
3.9 
5.1 
6.9 
5.5 
15.3 
- _. 
3.7 
1.6 
7.4 
5.5 
0.1 
3.2 
4.9 
6.4 
10.0 
11.5 
8.3 
3.6 
I .7 
5.4 
3.3 
8.3 
9.6 
4.6 
5.1) 
1.7 
7.1 
2.5 
4.6 
2.2 
5.5 
7.0 
5.7 
s 7  
0.7 
0.1 
10.2 
- 
D 
-0.2 
-3.7 
-3.0 
-2.9 
2.5 
10.8 
-1.2 
7.4 
14.4 
-2.6 
5.4 
0.0 
-2.6 
-1.9 
15.3 
14.1 
4.4 
8.1 
-14.0 
-4.9 
-4.3 
-4.Y 
5.Y 
5.3 
2.2 
4 .4  
4.3 
-3.4 
16.4 
-6. I 
10.9 
15.2 
-47 
 
4.7 
10.2 
-2.0 
0.3 
Table 3: Results obtained for several MCNC benchmarks 
for both scenarios considered. The number of gates is given 
in column G. 
For each scenario and circuit, two new gate-level descriptions have been
created. One of them contains the best transistor reordering for low power
for all gates found with the optimization algorithm, whereas the other one
contains the worst one. A switch-level simulator [11] extracts the power
consumption of each description. Thus, the maximum power reduction for
each scenario is obtained. The input signals to the circuits used by the
switch-level simulator have been generated with an exponential
distribution, i.e. time intervals between two consecutive transitions of
input signal $k$ to the gate follow an exponential distribution with
average $1/D_k$, $D_k$ being the transition density of input signal $k$.
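Input stimuli of this kind can be generated by accumulating exponentially distributed gaps. The sketch below shows one way to do it with the Python standard library. The function name, the seed and the time horizon are arbitrary choices for the example and are not taken from the paper's simulation setup.

```python
import random

def transition_times(density, horizon, rng):
    """Transition instants in [0, horizon) with exponential gaps of mean 1/density."""
    times = []
    t = rng.expovariate(density)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(density)
    return times

rng = random.Random(1996)            # fixed seed so that runs are repeatable
# Assumed example: 1M transitions per second observed over one millisecond.
times = transition_times(density=1e6, horizon=1e-3, rng=rng)
```

On average the list holds `density * horizon` instants (about 1000 here). Toggling the signal value at each instant yields a waveform whose transition density matches the requested $D_k$.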
Table 3 shows the results obtained. Columns M and S show the
power-consumption reduction (best case with regard to worst case for low
power) obtained with the model and with switch-level simulations,
respectively. Column D shows the increase in delay (best case for low
power with regard to a mapping into the original cell library). The delay
increases in most of the benchmarks because the best transistor
reorderings of a gate for low power and for low delay do not always
coincide. In fact, the rule of thumb that states that the critical
transistor should always be placed near the output terminal to obtain a
fast gate contradicts the low-power rule of placing it close to the ground
node, as can be observed in the motivation example (case (2)) in
Section 1.1 and in [9].
The average improvement in power consumption in scenario A is 12%, with an
average increase in delay of 4%. The estimated average improvement is 9%.
The reason for this lower value in the estimated improvement is that the
model, in general, overestimates the power consumption by an offset, thus
leading to a lower estimated improvement.
The power reduction in scenario B is roughly half the one in scenario A.
The power and delay of latches and the clock line in scenario B have not
been included in the results. In both scenarios there is a small average
increase in delay.
Thus, significant power-consumption reductions can be obtained in both
scenarios with little average increase in delay, and it is possible to
achieve power reductions without increasing the delay of the circuit.
6 Conclusions 
This paper shows that average power reductions of 12% with a 4% increase
in delay can be achieved by applying the transistor reordering technique.
An optimization algorithm that uses a power-consumption model of a static
CMOS gate has been presented. This novel power-consumption gate model
takes into account both the probabilities and the transition densities of
the inputs of the gates that compose the circuit.
The results suggest that (a) current libraries may be upgraded with more
instances of the gates with different transistor reorderings, so that an
optimization algorithm can choose the best instance for power reduction,
and (b) it is possible to obtain power reductions without increasing the
delay of the circuit. Our future work in the transistor reordering field
is devoted to this second direction.
References 
[1] M. Borah, M. Irwin, and R. Owens. Minimizing power consumption of static
CMOS circuits by transistor sizing and input reordering. In Proc. of the Int.
Conf. on VLSI Design, pages 294-298, Jan. 1995.
[2] B. Carlson and C. Chen. Performance enhancement of CMOS VLSI circuits by
transistor reordering. In Proc. DAC, pages 361-366, 1993.
[3] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms.
McGraw-Hill, 1990.
[4] R. Hossain, M. Zheng, and A. Albicki. Reducing power dissipation in serially
connected MOSFET circuits via transistor reordering. In Proc. of the Int. Conf.
on Computer Design, pages 614-617, Oct. 1994.
[5] E. Musoll and J. Cortadella. Optimizing CMOS circuits for low power using
transistor-reordering. Technical report, UPCDAC, Aug. 1995.
[6] F. Najm. Transition density, a stochastic measure of activity in digital
circuits. In Proc. DAC, pages 644-649, 1991.
[7] K. Parker and E. McCluskey. Probabilistic treatment of general combinational
networks. IEEE Trans. on Comp., C-24:668-670, June 1975.
[8] S. Prasad and K. Roy. Circuit optimization for minimization of power
consumption under delay constraint. In Proc. Int. Workshop on Low Power Design,
pages 15-20, Oct. 1994.
[9] W. Shen, J. Lin, and F. Wang. Transistor reordering rules for power
reduction in CMOS gates. In ASP-DAC, July 1995.
[10] C. Tan and J. Allen. Minimization of power in VLSI circuits using
transistor sizing, input ordering, and statistical power estimation. In Proc.
Int. Workshop on Low Power Design, pages 75-80, Apr. 1994.
[11] A. van Genderen. SLS: An efficient switch-level timing simulator using
min-max voltage waveforms. In Proc. VLSI'89 Conf., pages 79-88, Aug. 1989.
