Global and local properties of asynchronous circuits optimized for energy efficiency by Pénzes, Paul I. & Martin, Alain J.
Global and local properties of asynchronous circuits optimized for energy efficiency
Paul I. Pénzes, Alain J. Martin
{penzes, alain}@cs.caltech.edu
Department of Computer Science
California Institute of Technology
Pasadena, CA, USA
Abstract
In this paper we explore global and local properties of asynchronous circuits sized for the energy-efficiency metric $E \cdot t^2$. We develop a theory that enables an abstract view of transistor sizing. These results allow us to accurately estimate circuit performance and compare circuit design choices at the logic-gate level without going through the costly sizing process. We estimate that the improvement in energy efficiency due to sizing is  to  when compared to a design optimized for speed.
We study sequential composition of circuits and show that, for a circuit optimized for $E \cdot t^n$, the relationship between energy consumption and computation delay of the components is independent of $n$. When applied to optimizations for $E t^2$ via voltage scaling, this relationship implies that the powers of the components must be equalized.
1 Introduction
The continuing decrease in feature size and the corresponding increase in chip density and operating frequency have made energy consumption a major concern in VLSI design. As a consequence, energy efficiency is becoming an important consideration in IC design.
In [1] it was shown that the right measure of energy efficiency of a computation is $E \cdot t^2$, where $E$ is the energy consumed by the computation and $t$ is the delay (cycle time or latency) of the computation. There are several levels at which a VLSI system can be optimized for $E \cdot t^2$: architecture, circuit, or physical implementation. For the main part of this paper we focus on the circuit level and study the impact of transistor sizing on a system optimized for $E \cdot t^2$. We develop a theory that allows an abstract view of transistor sizing. First, the optimum N-to-P transistor ratio of a logic gate is derived. Then the relationship of energy to delay is found. We prove that the energy tradeoff between a slow and a fast system computing in parallel is such that only part of the energy saved by slowing down the fast system is spent on speeding up the slow one. We bound the optimal delay of a circuit between its scaled ($\frac{n+1}{n}$) slowest delay and fastest delay. We further prove that the delay of a system optimized for $E \cdot t^n$ is close to the scaled ($\frac{n+1}{n}$) smallest delay of the component that consumes the most energy, and that the overall energy consumption is close to the scaled ($n+1$) energy consumption of this component. Later in this paper we consider sequential composition of circuits optimized for $E t^2$ and infer a general relationship between the energies and delays of the component circuits. When applied to voltage scaling, this relationship shows that circuits composed sequentially should be designed so as to equalize their power usage. When applied to transistor sizing, the same relationship shows that circuits composed sequentially should be designed so as to make their power usage proportional to the square root of their asymptotic power (to be defined later). While our results are established for QDI asynchronous circuits, they can be applied to synchronous circuits as well.
The paper is organized as follows. Section 2 describes the $E t^n$ metric. Sections 3 and 4 present local and global properties of circuits optimally sized for $E \cdot t^n$. Section 5 considers the sequential composition of circuits and applies some of the theory developed in Section 4. Section 6 elaborates on the practical benefits of energy-efficient sizing. Finally, Section 7 sums up the main results. Many of the proofs have been omitted due to space limitations. They can be found in [4].
2 Preliminaries
We are looking for an optimization metric that combines energy $E$ and delay $t$ in a way that is independent of voltage. With such a metric at hand, if we desire a particular delay target $t$, we adjust the voltage to meet it, and a circuit optimized for the metric would have the best $E$ for that $t$. Likewise, we may choose an energy target $E$ and get a good $t$ instead. For CMOS, $E t^2$ is the best such metric. Basically, in first approximation, $E = C V^2$ and $t = \frac{k}{V}$; thus $E t^2$ is roughly constant over a range of voltages. For the purpose of this work we generalize the optimization metric $E t^2$ to $E t^n$, where $n \in \mathbb{N}$; $n$ is called the optimization index. This will allow us to compare circuits optimized for an entire range of metrics. For $n = 0$ the optimization metric is energy only; for $n = 1$ the optimization metric is the energy-delay product; for $n = 2$ the optimization metric is our $E \cdot t^2$; while for $n \to \infty$ the optimization metric is speed only. In this paper, optimizing $E t^n$ is used as a synonym for minimizing $E \cdot t^n$.
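The voltage argument above can be checked with a few lines of Python. This is only a sketch of the first-order model $E = C V^2$, $t = k / V$; the constants `C` and `K` and the voltage list are hypothetical values chosen for illustration and do not come from the paper.

```python
def energy(c, v):
    """First-order switching energy, E = C * V^2."""
    return c * v * v


def delay(k, v):
    """First-order delay, t = k / V."""
    return k / v


def metric(c, k, v, n):
    """The E * t^n metric evaluated at supply voltage v."""
    return energy(c, v) * delay(k, v) ** n


# Hypothetical constants; only the trend across voltages matters.
C, K = 2.0, 3.0
VOLTAGES = (0.8, 1.0, 1.2, 1.5)

for n in (1, 2, 3):
    values = [metric(C, K, v, n) for v in VOLTAGES]
    print(n, [round(x, 4) for x in values])
```

In this model $E t^n = C k^n V^{2-n}$, so only the row for $n = 2$ is flat across voltages, which is the sense in which $E t^2$ is voltage independent.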

We explore global and local properties of asynchronous circuits sized for $E \cdot t^n$ while abstracting away transistor sizing itself. These properties allow us to accurately estimate circuit performance and compare circuit design choices at the gate-network level without going through the costly sizing process.
Once a circuit has been designed down to a gate network, transistor sizes that ensure correct functionality and performance have to be chosen. The achievable improvement due to sizing is limited ( to  for our types of circuits); however, an improper choice of transistor sizes could significantly affect the efficiency of the design.
While the problem of transistor sizing for speed only ($n \to \infty$) is relatively well understood, sizing for the more general $E t^n$ metric is not. The main difficulty is that gate delays are not independent of wire parasitics, and the nice abstraction that the size of a gate is the geometrical mean of its neighbors does not apply.
Interconnects add extra costs and constraints to the optimization problem, and they are difficult to predict accurately before layout. For speed-only optimization ($n \to \infty$), the wire capacitance could, in theory, be overcome by increasing transistor sizes where appropriate. As we will show later, the parasitic wire capacitance plays a major role when optimizing for our target function and cannot, in general, be overcome in a straightforward way.
We model a transistor as a perfect switch in series with a linear resistor. The gate, source, and drain capacitances are proportional to the transistor width $w_i$, and the transistor resistance is inversely proportional to $w_i$. Thus a transistor network is modeled by an equivalent RC network. Wire resistance is not taken into account. Gate delays are modeled by the Elmore delay (tau model), while energy is considered proportional to the sum of the gate and wire capacitance switched during computation; the energy due to leakage and short-circuit currents is ignored.
Within this model, our target function $E \cdot t^n$ could be written as a function of transistor sizes $w_i$. This type of function belongs to a special class known as posynomials. A posynomial problem is the minimization of one posynomial while simultaneously satisfying a collection of upper-bound constraints on other posynomials. With the substitution $w_i = e^{x_i}$, a posynomial can be transformed into a convex function, and thus a posynomial program is a special case of a convex program. A convex program has the special property that a local minimum must necessarily be a global minimum [5]. This observation is exploited by tools that attempt to solve the optimization problem numerically [6].
3 Local properties
In this section we present two important local properties of circuits sized for $E \cdot t^n$. It is interesting to note that these properties are independent of the optimization index $n$. This suggests that the $E t^n$ optimization has only a global impact on circuits, while the topology of individual logic gates is not affected. As a result, the same logic gate library could be used both for high-speed ($n \to \infty$) circuits and energy-efficient ($n = 2$) circuits, provided that the library has enough drive range to accommodate the global circuit requirements.
3.1 Synchronization points
A logic gate network can be represented as a directed graph in which each logic gate has a corresponding vertex and each literal a corresponding edge. In such a graph, a path corresponds to the sequence of fired (switched) logic gates in a given execution. We call a private section of a path a maximum-length subpath that is not part of any other path. Similarly, a public section of a path is a subpath shared among other paths. A synchronization point in a logic gate network is any pull-up or pull-down of a gate that synchronizes (effectively waits for) two or more inputs in a given execution. For example, both the pull-up and the pull-down of a Muller C-element constitute a synchronization point, while the pull-ups or the pull-down of a NOR gate do not constitute synchronization points. Finally, a non-data-dependent logic gate network is either dataless (control only) or it has the property that each data rail within a channel has equal probability of firing in a typical execution. A non-data-dependent logic gate network, or a non-data-dependent execution of a data-dependent logic gate network (the most common case), is often a good approximation of the general behavior of the circuit. In this context we can state the following:
Theorem 1. In a non-data-dependent logic gate network optimally sized for $E t^n$, each synchronization point enabled to fire non-vacuously has its input signals arriving simultaneously, or, for any early transition, each path that contains that early transition has all its private-section transistors of minimum size.
If we define a cycle in a logic gate network as a closed path, and a normalized cycle as the ratio between the length of the cycle and the amount of activity on it (number of tokens), then we can state the following:
Corollary. All normalized cycles in a non-data-dependent logic gate network optimally sized for $E \cdot t^n$ are equal, unless the private sections of the shorter cycles are minimum size.
Theorem 1 suggests a practical way to achieve an $E \cdot t^n$ optimum: identify all synchronization points with unequal arrival times and slow down the fast path by shrinking the corresponding transistors. If a fast path cannot be further slowed down, all transistors on the private section of that path will be minimum size and the optimization on that path is complete.
3.2 Size of P-transistors relative to N-transistors
The following result shows that there exists a general relationship, when optimizing for $E \cdot t^n$, between the width of the N-transistors and the width of the P-transistors of the same logic gate. This result depends on the relative mobility $\mu$, the ratio of hole mobility over electron mobility.
Theorem 2. Consider any cycle of a logic gate network implementing a QDI circuit. Assume that each logic gate $i$ on this cycle has $k_{ni} \in \mathbb{N}^{+}$ N-transistors of width $w_{ni}$ in series and $k_{pi} \in \mathbb{N}^{+}$ P-transistors of width $w_{pi}$ in series. Under these circumstances, when optimized for $E \cdot t^n$,
$$w_{pi} = w_{ni} \sqrt{\frac{1}{\mu} \cdot \frac{k_{pi}}{k_{ni}}}.$$
It is important to note that in a QDI asynchronous circuit, in general, both the rising transition and the falling transition of a logic gate are on the same (possibly critical) cycle. Theorem 2 is a consequence of this property. Theorem 2 is a local relationship: the N-device to P-device ratio only depends on the topology of the gate (through $k_{ni}$ and $k_{pi}$) and the relative mobility $\mu$. It depends neither on the output fanout or output wire parasitic, nor on the global $E$ or global $t$, and it is also independent of the optimization index $n$.
One consequence of Theorem 2 is that the number of free variables in the search space for optimum sizing could be reduced roughly by half, by eliminating the free variables corresponding to either the N-transistors or the P-transistors. Theorem 2 also eases the way to any transistor sizing abstraction.
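The ratio in Theorem 2 can be illustrated numerically. The sketch below is not the paper's proof; it assumes a simple cost model in which the pull-down resistance is proportional to $k_n / w_n$, the pull-up resistance to $k_p / (\mu w_p)$, and the capacitance a gate input presents is proportional to $w_n + w_p$. The helper names and the example values of `k_n`, `k_p`, and `mu` are hypothetical.

```python
import math


def golden_min(f, lo, hi, iters=120):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def pn_ratio(k_n, k_p, mu):
    """Closed-form w_p / w_n from Theorem 2: sqrt((1/mu) * k_p / k_n)."""
    return math.sqrt(k_p / (mu * k_n))


def numeric_pn_ratio(k_n, k_p, mu, total=1.0):
    """Split a fixed input capacitance w_n + w_p = total so that the summed
    pull-down and pull-up resistance (assumed model) is smallest."""
    def resistance(w_n):
        w_p = total - w_n
        return k_n / w_n + k_p / (mu * w_p)

    w_n = golden_min(resistance, 1e-6 * total, (1.0 - 1e-6) * total)
    return (total - w_n) / w_n


# Hypothetical gate: two series N-transistors, one P-transistor, mu = 0.4.
print(pn_ratio(2, 1, 0.4), numeric_pn_ratio(2, 1, 0.4))
```

Under these assumptions the numerical split and the closed form agree, and neither depends on the load or on $n$, which matches the locality claim made for Theorem 2.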
4 Global properties
4.1 Global properties of $E$ and $t$ under $E \cdot t^n$ sizing
Consider a circuit optimized for $E \cdot t^n$ by transistor sizing. We make two main claims in this context. First, the consumed energy is independent, in first approximation, of the types (NAND, NOR, C-element, etc.) of gates used by the circuit and is solely dependent on the optimization index $n$ and the amount of wiring capacitance switched during computation. Second, the circuit speed is independent of the parasitics and depends only on the optimization index and the types of gates used. These results allow an abstract view of transistor sizing and shift the design emphasis to the logical level of circuits.
Theorem 3. For a circuit composed of a ring of inverters with equal output wire parasitics $p$, the optimization for $E \cdot t^n$ yields a total gate capacitance of $w = w_{ni} + w_{pi} = np$ per logic gate.
Proof. If we write $E$ and $t$ as functions of $w_{ni}$, $w_{pi}$ and $p$, then $\min(E t^n)$ implies
$$\frac{\partial (E t^n)}{\partial w_{ni}} = \frac{\partial (E t^n)}{\partial w_{pi}} = 0 \;\Rightarrow\; w_{ni} = \frac{np}{1 + \sqrt{1/\mu}} \;\text{ and }\; w_{pi} = w_{ni}\sqrt{\frac{1}{\mu}} = \frac{np\sqrt{1/\mu}}{1 + \sqrt{1/\mu}} \;\Rightarrow\; w = w_{ni} + w_{pi} = np.$$
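The result $w = np$ can be reproduced numerically for one stage of the ring. This is a sketch under a simplified model that is an assumption here: with the P/N split already fixed, the energy per stage is proportional to $w + p$ and the stage delay is proportional to $(w + p) / w$. The search helper and the value of `p` are hypothetical.

```python
import math


def golden_min(f, lo, hi, iters=120):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def stage_metric(w, p, n):
    """E * t^n for one inverter stage, up to constant factors."""
    stage_energy = w + p          # switched gate plus wire capacitance
    stage_delay = (w + p) / w     # resistance ~ 1/w driving a load of w + p
    return stage_energy * stage_delay ** n


def optimal_gate_capacitance(p, n):
    """Numerically minimize the stage metric over the gate capacitance w."""
    return golden_min(lambda w: stage_metric(w, p, n), 1e-6, 1e3)


P_WIRE = 3.0  # hypothetical wire parasitic, arbitrary units
for n in (1, 2, 5):
    print(n, optimal_gate_capacitance(P_WIRE, n), n * P_WIRE)
```

The numerical optimum tracks $n \cdot p$, so doubling the optimization index doubles the gate capacitance for the same wire load.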
Theorem 3 shows that the total gate capacitance of an operator is equal to the output wire capacitance times the optimization index. Thus a circuit optimized for $E \cdot t^2$ will have, on average, transistors twice as big as the same circuit optimized for $E \cdot t$. Theorem 3 also suggests a strong dependence of transistor sizes on wire capacitance, a dependency that is in general ignored when sizing for speed only. For $E \cdot t^2$ optimization, wire capacitance plays a major role and needs to be dealt with explicitly.
One can notice that Theorem 3 holds not only for a ring topology but also for a chain topology, given that the input drive of the chain is equal to the output drive of the chain, since in this case the $E$ and $t$ equations for a chain have the same form as the ones for a ring. This is an important observation: it makes our results for transistor sizing applicable to circuit delays both in terms of latency and cycle time. Whenever we use latency as the measure of delay, we make the salient assumption that the scrutinized component has its input drive equal to its output drive (i.e., no amplification). This is a reasonable assumption since most logic-gate chains are part of closed ring topologies.
Figure 1: Actual and estimated energy $E$ and delay $t$ for optimal $E \cdot t^n$ as a function of $n$. [Plot: horizontal axis is the index $n$ of $E t^n$, from 0 to 20; vertical axis is energy and delay, from 0 to 1000; curves: actual energy, energy from Eq. (7), actual delay, delay from Eq. (8).]
Under the conditions of Theorem 3 we can show the following:
Theorem 4. The energy $E$ and delay $t$ of a circuit optimized for $E \cdot t^n$ are given by
$$E = (n+1)\, E_0 \qquad (7)$$
$$t = \frac{n+1}{n}\, \tau_0 \qquad (8)$$
where $E_0$ is the total switched wire parasitic and $\tau_0$ is the smallest achievable delay.
Equations (7) and (8) can be generalized, in first approximation, to rings with arbitrary logic gates and arbitrary parasitics. This is illustrated in Figure 1 for a generic circuit composed of different gate types with unequal output parasitics. Based on (7), the consumed energy is independent of the types of gates used by the circuit and is solely dependent on the optimization index and the amount of wiring capacitance switched during computation. On the other hand, based on (8), the circuit speed is independent of the parasitics and depends only on the optimization index and the types of gates used.
The generalization of equations (7) and (8) is based on the mathematical elaboration of the following observation. Consider a complex circuit with given wire parasitics $p_i$ and gate sizes $w_i$ that optimize $E \cdot t^n$. Assume that this system consumes energy $E$ and operates with delay $t$. One can notice that if all parasitics are scaled by $\alpha \in \mathbb{R}^{+}$, then $\alpha w_i$ are the gate sizes that optimize $E \cdot t^n$ of the new circuit. This is because $E$ and $t$ depend on $w_i$ and $p_i$ in such a way that $\alpha$ drops out of the partial derivative equations. It follows that the energy of the new circuit is $E' = \alpha E$ and the new delay is unchanged, $t' = t$. Similarly, if all gate drives are scaled by $\beta \in \mathbb{R}^{+}$, the actual gate sizes $w_i$ that optimize $E \cdot t^n$ stay unchanged. It follows that the energy of the new circuit is unchanged, $E' = E$, and the new delay is $t' = \beta t$. These observations show that the consumed energy is directly proportional to the switched parasitics (through the scaling factor $\alpha$), while the operation delay is independent of the wire parasitics. On the other hand, the operation delay is directly proportional to the type of gates (through the scaling factor $\beta$), while the consumed energy is independent of the type of gates.
If $n \to \infty$ in (8), then $t = \tau_0$, an expected result in the case of speed-only optimization. On the other hand, if $n = 0$ in (7), then $E = E_0$, i.e., transistors should be sized as small as possible for minimum energy consumption, another expected result. For energy efficiency ($n = 2$), the cycle time of the circuit should be chosen as $\frac{3}{2}\tau_0$, while the minimum energy to achieve this delay will be $3 E_0$; the loss in speed is more than compensated by the energy savings.
Equations (7) and (8) provide an elegant way to analyze $E$ and $t$ independently at circuit level while optimizing $E \cdot t^2$ or, in general, $E \cdot t^n$. In particular, the speed of an $E \cdot t^n$-optimal system can be directly derived from the absolute speed ($\tau_0$) of the same system, i.e., a well-studied problem. Similarly, the energy consumption of the system can be directly derived from the total switched wire capacitance, which could be estimated, for example, using Rent's Rule.
4.2 Energy vs. delay under optimal $E \cdot t^n$ sizing
In this section we characterize the function Et
ie the minimum energy consumption of a circuit
given the delay of operation t at constant voltage
There is an absolute lower bound on the delay cycle
time or latency at which a given circuit can oper
ate no matter how big its transistors are sized We
called this limit 

in  t  

 w
i
 
As a result Et has a vertical asymptote to  at
t  

lim
t
 
Et   If the parasitics p
i
of
the circuit are considered xed which is the case in a
wirelimited design and there is no upper bound on
the delay t  w
i
  the minimum energy
with which the circuit can operate is E


P
p
i

the total switched wire capacitance assuming that
the minimum width of transistors is  E

is the
same term as in  As a result Et has a horizon
tal asymptote E

at t   lim
t
Et  E


The simplest function that fullls these two require
ments is
Et  

 R

 Et 
E

t
t 



Interestingly 
 can be veried using  and 
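The consistency between the closed forms (7), (8) and the energy function $E(t) = E_0 t / (t - \tau_0)$ can be checked directly. The following sketch uses hypothetical values for `E0` and `TAU0`; the function names are illustrative only.

```python
def energy_at_index(n, e0):
    """Eq. (7): energy of a circuit optimally sized for E * t^n."""
    return (n + 1) * e0


def delay_at_index(n, tau0):
    """Eq. (8): delay of a circuit optimally sized for E * t^n (n >= 1)."""
    return (n + 1) / n * tau0


def min_energy_at_delay(t, e0, tau0):
    """Energy function E(t) = E0 * t / (t - tau0), defined for t > tau0."""
    return e0 * t / (t - tau0)


# Hypothetical circuit parameters in arbitrary units.
E0, TAU0 = 5.0, 2.0

for n in (1, 2, 4, 10):
    t = delay_at_index(n, TAU0)
    print(n, t, energy_at_index(n, E0), min_energy_at_delay(t, E0, TAU0))
```

Substituting $t = \frac{n+1}{n}\tau_0$ gives $t - \tau_0 = \tau_0 / n$, so $E(t) = (n+1) E_0$ for every $n \geq 1$. For $n = 2$ this reproduces the delay $\frac{3}{2}\tau_0$ and the energy $3 E_0$.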
Consider a system composed of $m$ subsystems $S_i(E_i, \tau_i)$ executing in parallel, such that each subsystem has its minimum energy function of the form $\frac{E_i\, t}{t - \tau_i}$ (in particular, these subsystems can be chains or rings of arbitrary logic gates). If the subsystems are synchronized, then all delays affected by the synchronization stabilize to the same delay $t$ (cycle time or latency, depending on what is synchronized). As a consequence, the total energy function is
$$E(t) : \left(\max_{i \le m} \tau_i, \infty\right) \to \mathbb{R}^{+}, \qquad E(t) = t \sum_{i=1}^{m} \frac{E_i}{t - \tau_i}.$$
Note that this function has the same asymptotic behavior as the single-system form $\frac{E_0\, t}{t - \tau_0}$, and also that it is closed under addition, a property that the single-system form lacks.
Theorem 5. For a system composed of $m$ subsystems $S_i(E_i, \tau_i)$ as specified above, if the system is optimally sized for $E \cdot t^n$, then
$$E(t) \le (n+1) \sum_{i=1}^{m} E_i,$$
with equality iff all $\tau_i$'s are equal.
Proof. The optimal $E t^n$ of this composed system is reached for $E$ and $t$ that satisfy
$$\frac{d(E t^n)}{dt} = 0 \;\Rightarrow\; (n+1) \sum_{i=1}^{m} \frac{E_i}{t - \tau_i} = t \sum_{i=1}^{m} \frac{E_i}{(t - \tau_i)^2}.$$
For the next step in the proof we use the Cauchy-Schwarz inequality
$$\sum_{i=1}^{m} l_i^2 \sum_{i=1}^{m} r_i^2 \ge \left( \sum_{i=1}^{m} l_i r_i \right)^2,$$
which becomes an equality iff all $\frac{l_i}{r_i}$ terms are equal. With the substitutions $l_i = \frac{\sqrt{E_i}}{t - \tau_i}$ and $r_i = \sqrt{E_i}$, the previous inequality becomes
$$\sum_{i=1}^{m} \frac{E_i}{(t - \tau_i)^2} \sum_{i=1}^{m} E_i \ge \left( \sum_{i=1}^{m} \frac{E_i}{t - \tau_i} \right)^2,$$
with equality iff all $\tau_i$'s are equal. Using this inequality to bound the right-hand side of the optimality condition, we get
$$(n+1) \sum_{i=1}^{m} \frac{E_i}{t - \tau_i} \ge \frac{t}{\sum_{i=1}^{m} E_i} \left( \sum_{i=1}^{m} \frac{E_i}{t - \tau_i} \right)^2 \;\Rightarrow\; (n+1) \sum_{i=1}^{m} E_i \ge t \sum_{i=1}^{m} \frac{E_i}{t - \tau_i} = E(t) \;\Rightarrow\; (n+1) \sum_{i=1}^{m} E_i \ge E(t),$$
with equality iff all $\tau_i$'s are equal.
If for all $i$, $\tau_i = \tau_0$, we get $E(t) = (n+1) \sum_{i=1}^{m} E_i$ and $t = \frac{n+1}{n}\tau_0$, a generalization of (7) and (8) to systems composed in parallel.
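The bound of Theorem 5 can be explored numerically for a small parallel system. The sketch below minimizes $E(t)\, t^n$ with $E(t) = t \sum_i E_i / (t - \tau_i)$ by a one-dimensional search. The subsystem parameters are hypothetical and are not the values of the paper's own numerical example; the search helper is an illustrative choice, not the authors' tool.

```python
import math


def golden_min(f, lo, hi, iters=120):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def total_energy(t, energies, taus):
    """E(t) = t * sum_i E_i / (t - tau_i) for synchronized parallel subsystems."""
    return t * sum(e / (t - tau) for e, tau in zip(energies, taus))


def optimal_delay(n, energies, taus):
    """Common delay t > max(tau_i) that minimizes E(t) * t^n."""
    t_min = max(taus)
    return golden_min(lambda t: total_energy(t, energies, taus) * t ** n,
                      t_min * (1.0 + 1e-9), 100.0 * t_min)


# Hypothetical example: two subsystems with equal E_i and different tau_i.
N = 2
ENERGIES, TAUS = (1.0, 1.0), (1.0, 2.0)
t_opt = optimal_delay(N, ENERGIES, TAUS)
e_opt = total_energy(t_opt, ENERGIES, TAUS)
print("t =", t_opt, "E =", e_opt, "bound =", (N + 1) * sum(ENERGIES))
```

With unequal $\tau_i$ the optimal energy stays strictly below $(n+1) \sum E_i$, and when the $\tau_i$ are made equal the search returns $t = \frac{n+1}{n}\tau_0$ with the bound met exactly.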
Let us consider a numerical example to illustrate Theorem 5. If $n$ =  , $m = 2$, $\tau_1$ =  , $\tau_2$ =  , and $E_1 = E_2$ =  , then $t$ =   and $E = E_{S_1} + E_{S_2}$ =  . Notice that $\frac{n+1}{n}\tau_1$ =  , $\frac{n+1}{n}\tau_2$ =  , $(n+1) E_1$ =  , and $(n+1) E_2$ =  . Thus the optimal delay of the system is between $\frac{n+1}{n}\tau_1$ and $\frac{n+1}{n}\tau_2$, as claimed by the next theorem. The way $t$ is reached is by running the faster system $S_1$ slower than its own speed target $\frac{n+1}{n}\tau_1$, thus saving energy (from $(n+1) E_1$ down to $E_{S_1}$), and running the slower system $S_2$ faster than its own speed target $\frac{n+1}{n}\tau_2$, thus spending more energy (from $(n+1) E_2$ up to $E_{S_2}$). What Theorem 5 is saying is that the energy tradeoff between the slow and the fast systems is done such that only part of the energy saved by slowing down $S_1$ is spent on speeding up $S_2$, i.e., $(n+1) E_1 + (n+1) E_2$ is always greater than $E$.
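Theorem 6. For a system composed of $m$ subsystems $S_i(E_i, \tau_i)$ as specified above, if there exist $j \le m$, $\epsilon_1 \to 0$, and $\epsilon_2 > 0$ such that $\left| \frac{E_i}{E_j} \right| \le \epsilon_1$ and $\left| \frac{n+1}{n}\tau_j - \tau_i \right| \ge \epsilon_2$ for all $i \le m$, $i \ne j$, then for optimal $E \cdot t^n$: $t \approx \frac{n+1}{n}\tau_j$ and $E \approx (n+1) E_j$.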
The technical conditions in Theorem 6 are needed to avoid a division by zero for a case with no practical importance. More importantly, Theorem 6 tells us that the composed system runs close to the target delay ($\frac{n+1}{n}\tau_j$) of the component $S_j$ that consumes the most energy ($E_j$), and that the overall energy consumption is close to $(n+1) E_j$. In practice, generally all bits within a datapath pipeline are identical, and different datapath pipelines have similar structure; thus it could be assumed that the cycles formed by these bits have very similar or identical $\tau_0$'s. These $\tau_0$ cycles will generate a dominant term in the energy expression (since most of the energy is consumed in the datapath) and will bound the optimal cycle time of the system to $\frac{n+1}{n}\tau_0$ and its energy consumption to $(n+1) E_0$. The existence of some potentially faster cycles (due possibly to slack-matching buffers or fast control) will not have a significant impact on the global speed and energy of the system. Theorem 6 allows us to use, under certain circumstances, the simpler single-system formula $E(t) = \frac{E_0\, t}{t - \tau_0}$ in our global performance analysis.
Theorem 7. For the composed system considered above, we have
$$\max\left( \max_{i \le m} \tau_i,\; \frac{n+1}{n} \min_{i \le m} \tau_i \right) \le t \le \frac{n+1}{n} \max_{i \le m} \tau_i.$$
Theorem 7 bounds the optimal delay of a circuit between the scaled (by $\frac{n+1}{n}$) smallest component delay ($\min_{i \le m} \tau_i$) and largest component delay ($\max_{i \le m} \tau_i$). If those delays are close to each other, as is the case in a balanced design, both bounds on $t$ are tight. Based on Theorem 6, any of the bounds for $t$ in Theorem 7 could be reached if the energy consumption of the respective component is dominant. If $n \to \infty$, then
$$\max_{i \le m} \tau_i \le t \le \max_{i \le m} \tau_i \;\Rightarrow\; t = \max_{i \le m} \tau_i,$$
i.e., the speed of a circuit optimized for delay only is limited by the speed of its critical path, an expected result for speed-only optimization.
So far it was considered that all components participate in the computation. In general, some parts of a circuit are only activated under certain conditions; for example, a branch adder will be used only when a branch instruction is being executed. Thus some paths are not active on every computation cycle. The question is how these paths should be sized to ensure global $E \cdot t^n$ optimality.
Our results could be extended to a related but less general problem. Assume that a given path is activated every 1 out of $f \in \mathbb{N}^{+}$ computation cycles. The corresponding component perceives the system timing $t$ as $f t$. Thus its contribution to the total energy is
$$\frac{E_0\, f t}{f t - \tau_0} = \frac{E_0\, t}{t - \frac{\tau_0}{f}}.$$
If we consider $\tau_0' = \frac{\tau_0}{f}$, then all properties inferred for paths activated every computation cycle are true also for paths activated only every 1 out of $f$ computation cycles. In particular, the bound of Theorem 5 achieves equality when $\frac{\tau_0}{f}$ is equal to the other $\tau_i$'s normalized by their usage frequency.
5 Application: sequential composition
This section reveals some remarkable global properties of sequential systems optimized for $E \cdot t^n$.
Consider two programs A and B implemented by the circuits $S_A$ and $S_B$, respectively. Assume a sequential computation that repetitively runs program A to completion and then program B to completion (the delay between the end of one program and the start of the other is assumed negligible). We would like to know at what $t_A$, $t_B$ to run circuits $S_A$, $S_B$ so as to optimize the metric $E \cdot t^n$.
For the next theorem, assume the existence of a general energy function $E(t)$ (the minimum energy consumed by a system given the system's operation delay $t$). This is a more general energy function than the one defined in Section 4.2, since it is valid not only at circuit level but at any optimization level. Each system has its own $E(t)$, since the energy function will depend at high level on the particular computation being implemented and at low level on the circuits used.
Theorem 8. For the sequential composition of two systems $S_A$ and $S_B$, if the composite system is optimized for $E \cdot t^n$, then
$$\frac{dE_A}{dt_A} = \frac{dE_B}{dt_B},$$
independently of $n$.
Proof. The latency of the composed system is $t = t_A + t_B$, while its energy is $E = E_A(t_A) + E_B(t_B)$; thus we minimize $f(t_A, t_B) = (E_A(t_A) + E_B(t_B))(t_A + t_B)^n$. $f$ reaches its minimum where
$$\frac{\partial f}{\partial t_A} = \frac{\partial f}{\partial t_B} = 0 \;\Rightarrow\;$$
$$\frac{dE_A(t_A)}{dt_A}(t_A + t_B) + n\,(E_A(t_A) + E_B(t_B)) = 0, \qquad \frac{dE_B(t_B)}{dt_B}(t_A + t_B) + n\,(E_A(t_A) + E_B(t_B)) = 0 \;\Rightarrow\;$$
$$\frac{dE_A}{dt_A} = \frac{dE_B}{dt_B} = -\frac{n\,(E_A(t_A) + E_B(t_B))}{t_A + t_B}.$$
Theorem 8 is a very general result: it holds for any energy function $E(t)$ as defined earlier and any optimization index $n$. It extends to any number of sequential circuits $S_i$, and to the more general case of sequential composition where each circuit $S_i$ is used repetitively with probability $p_i$.

5.1 Sequential composition and voltage scaling
Assume the optimization parameter is $V$ (voltage scaling); then $E(t) = \frac{\kappa}{t^m}$. In [1] it is shown that $E \cdot t^m$ is constant over a wide voltage range for $m = 2$. Let us define $P_A$ and $P_B$ to be the power consumed by component $S_A$, respectively $S_B$.
Property 1. For the sequential composition of two systems $S_A$ and $S_B$, if the composite system is optimized for $E \cdot t^n$ through voltage scaling, then $P_A = P_B$.
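Proof. Using Theorem 8 with $E_A(t_A) = \frac{\kappa_A}{t_A^m}$ and $E_B(t_B) = \frac{\kappa_B}{t_B^m}$, we get
$$\frac{\kappa_A}{t_A^{m+1}} = \frac{\kappa_B}{t_B^{m+1}} \;\Rightarrow\; \frac{E_A}{t_A} = \frac{E_B}{t_B} \;\Rightarrow\; P_A = P_B.$$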
For $m = 2$ this correlation was first suggested by Mika Nyström. Property 1 tells us that if the optimization used is voltage scaling, circuits composed sequentially and optimized for $E \cdot t^n$ should be designed so as to equalize their power usage.
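Power equalization under voltage scaling can be observed numerically. The sketch below fixes the total latency $t_A + t_B$ and searches for the split that minimizes the total energy; any joint optimum of $E \cdot t^n$ must also be energy-minimal for its own total latency, so the split condition is the same. The constants `kappa_a`, `kappa_b` (the numerators of the assumed energy functions $\kappa / t^m$) and the latency value are hypothetical.

```python
import math


def golden_min(f, lo, hi, iters=120):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def best_split(kappa_a, kappa_b, m, total_t):
    """Return t_A minimizing kappa_a/t_A^m + kappa_b/t_B^m with t_A + t_B fixed."""
    def energy(t_a):
        return kappa_a / t_a ** m + kappa_b / (total_t - t_a) ** m

    return golden_min(energy, 1e-6 * total_t, (1.0 - 1e-6) * total_t)


def power(kappa, m, t):
    """P = E / t for the energy function E(t) = kappa / t^m."""
    return kappa / t ** (m + 1)


# Hypothetical components with m = 2 and a total latency of 3 time units.
KA, KB, M, TOTAL = 1.0, 8.0, 2, 3.0
t_a = best_split(KA, KB, M, TOTAL)
t_b = TOTAL - t_a
print(t_a, t_b, power(KA, M, t_a), power(KB, M, t_b))
```

For these values the search settles at $t_A = 1$ and $t_B = 2$, where both components draw the same power, as Property 1 states.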
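Property 2. For the sequential composition of two systems $S_A$ and $S_B$ with $E_A(t_A) = \frac{\kappa_A}{t_A^m}$ and $E_B(t_B) = \frac{\kappa_B}{t_B^m}$, if the composite system is optimized for $E \cdot t^n$ through voltage scaling, then:
if $n > m$, $\min(E t^n)$ is $\left( \sqrt[m+1]{\kappa_A} + \sqrt[m+1]{\kappa_B} \right)^{n+1} \left( \frac{\min t_A}{\sqrt[m+1]{\kappa_A}} \right)^{n-m}$;
if $n < m$, $\min(E t^n)$ is $\left( \sqrt[m+1]{\kappa_A} + \sqrt[m+1]{\kappa_B} \right)^{n+1} \left( \frac{\max t_A}{\sqrt[m+1]{\kappa_A}} \right)^{n-m}$;
and if $n = m$, then $\min(E t^n)$ is $\left( \sqrt[n+1]{\kappa_A} + \sqrt[n+1]{\kappa_B} \right)^{n+1}$, or $\sqrt[n+1]{\min(E t^n)} = \sqrt[n+1]{\kappa_A} + \sqrt[n+1]{\kappa_B}$.
This property was first proved by Karl Papadantonakis for $n = m = 2$. Property 2 gives a lower bound on the achievable optimum for sequential composition using voltage scaling.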
As a side note, for parallel composition of circuits $S_A$ and $S_B$, as a consequence of Theorem 1, $t_A = t_B$ for optimal $E \cdot t^n$. Assuming $E(t) = \frac{\kappa}{t^m}$, $\min(E t^n) = \min\left( (E_A(t_A) + E_B(t_A))\, t_A^n \right) = (\kappa_A + \kappa_B) \min t_A^{n-m}$. If $n > m$, $\min(E t^n)$ is reached for $\min t_A$ (highest feasible voltage), while if $m > n$, $\min(E t^n)$ is reached for $\max t_A$ (lowest feasible voltage). If $n = m$, then $\min(E t^n) = \kappa_A + \kappa_B$, i.e., the lower bound on the achievable optimum for parallel composition using voltage scaling is the sum of the bounds of the individual components.
5.2 Sequential composition and transistor sizing
Assume the optimization parameters are transistor sizes; then $E(t) = \frac{E_0\, t}{t - \tau_0}$. Let us define the asymptotic power of a circuit $S(E_0, \tau_0)$ as $P_f = \frac{E_0}{\tau_0}$. If the circuit $S$ is optimized for $E \cdot t^n$, its power consumption is $P = \frac{E}{t} = n \frac{E_0}{\tau_0} = n P_f$. This relationship shows that the power consumption of a circuit optimized for $E \cdot t^n$ increases linearly with the optimization index $n$. In particular, the power consumption of a given circuit optimized for $E \cdot t$ is half that of the same circuit optimized for $E \cdot t^2$ (at constant voltage). Using this new definition, we can show the following:
Property 3. For the sequential composition of two systems $S_A$ and $S_B$, if the composite system is optimized for $E \cdot t^n$ through transistor sizing, then
$$\frac{P_A}{\sqrt{P_{fA}}} = \frac{P_B}{\sqrt{P_{fB}}}.$$
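Proof. Using Theorem 8 with $E_A(t_A) = \frac{E_A\, t_A}{t_A - \tau_A}$ and $E_B(t_B) = \frac{E_B\, t_B}{t_B - \tau_B}$, we get
$$\frac{\sqrt{E_A \tau_A}}{t_A - \tau_A} = \frac{\sqrt{E_B \tau_B}}{t_B - \tau_B} \;\Rightarrow\; \frac{E_A(t_A)}{t_A} \cdot \frac{\sqrt{E_A \tau_A}}{E_A} = \frac{E_B(t_B)}{t_B} \cdot \frac{\sqrt{E_B \tau_B}}{E_B} \;\Rightarrow\; \frac{P_A}{\sqrt{E_A / \tau_A}} = \frac{P_B}{\sqrt{E_B / \tau_B}} \;\Rightarrow\; \frac{P_A}{\sqrt{P_{fA}}} = \frac{P_B}{\sqrt{P_{fB}}}.$$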
Property 3 tells us that, under transistor sizing optimization, circuits composed sequentially should be designed so as to make their power usage proportional to the square root of their asymptotic power. For the relevant case of $P_{fA} = P_{fB}$ we can state the following:
Property 4. For the sequential composition of two systems $S_A$ and $S_B$, if circuits $S_A$ and $S_B$ have equal asymptotic powers, i.e., $\frac{E_A}{\tau_A} = \frac{E_B}{\tau_B}$, then for optimal $E \cdot t^n$ under transistor sizing: $t_A = \frac{n+1}{n}\tau_A$, $E_A(t_A) = (n+1) E_A$, and $t_B = \frac{n+1}{n}\tau_B$, $E_B(t_B) = (n+1) E_B$.
Property 4 tells us that, under the equal asymptotic power assumption, each circuit of a sequential composition of circuits could be optimized (under transistor sizing) independently, and the composition of their local optima results in a global optimum.
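Properties 3 and 4 can be checked with a small two-variable search. The sketch below assumes the energy function $E(t) = E_0 t / (t - \tau_0)$ for each component and minimizes $(E_A(t_A) + E_B(t_B))(t_A + t_B)^n$ by nesting two one-dimensional searches. The component parameters are hypothetical, and the nested search is an illustrative choice rather than the authors' method.

```python
import math


def golden_min(f, lo, hi, iters=120):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def energy(t, e0, tau0):
    """Transistor-sizing energy function E(t) = E0 * t / (t - tau0)."""
    return e0 * t / (t - tau0)


def optimize_sequence(n, comp_a, comp_b):
    """comp_a, comp_b are (E0, tau0) pairs; returns (t_A, t_B) minimizing E * t^n."""
    def cost(t_a, t_b):
        return (energy(t_a, *comp_a) + energy(t_b, *comp_b)) * (t_a + t_b) ** n

    def best_t_b(t_a):
        return golden_min(lambda t_b: cost(t_a, t_b),
                          comp_b[1] * (1.0 + 1e-9), 50.0 * comp_b[1], 80)

    t_a = golden_min(lambda t: cost(t, best_t_b(t)),
                     comp_a[1] * (1.0 + 1e-9), 50.0 * comp_a[1], 80)
    return t_a, best_t_b(t_a)


def normalized_power(t, e0, tau0):
    """P / sqrt(P_f), with P = E(t) / t and asymptotic power P_f = E0 / tau0."""
    return (energy(t, e0, tau0) / t) / math.sqrt(e0 / tau0)


# Hypothetical components with unequal asymptotic powers (Property 3).
A, B = (1.0, 1.0), (4.0, 2.0)
t_a, t_b = optimize_sequence(2, A, B)
print(normalized_power(t_a, *A), normalized_power(t_b, *B))

# Hypothetical components with equal asymptotic powers (Property 4).
t_a_eq, t_b_eq = optimize_sequence(2, (1.0, 1.0), (2.0, 2.0))
print(t_a_eq, t_b_eq)
```

In the first case the two normalized powers coincide at the optimum. In the second case each component ends up at its own target $\frac{n+1}{n}\tau$, so optimizing the components independently gives the same answer as optimizing them jointly.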
The achievable lower bound, through transistor sizing, of a sequential composition is given by the next two properties.
Property 5. For the sequential composition of two systems $S_A$ and $S_B$, if the composite system is optimized for $E \cdot t$ through transistor sizing, then
$$\sqrt{\min(E t)} = \sqrt{E_A \tau_A} + \sqrt{E_B \tau_B} + \sqrt{(E_A + E_B)(\tau_A + \tau_B)}.$$
Property 6. For the sequential composition of two systems $S_A$ and $S_B$, if the composite system is optimized for $E \cdot t^n$ through transistor sizing, then
$$\min(E t^n) \le (n+1)(E_A + E_B) \left( \frac{n+1}{n} (\tau_A + \tau_B) \right)^n,$$
with equality iff $P_{fA} = P_{fB}$.
While Property 5 gives an exact minimum for the special case of $n = 1$, Property 6 gives an upper bound on this minimum. This upper bound is a tight bound, and due to the flatness of the $E \cdot t^n$ metric around the optimum, this bound is a good approximation of the absolute minimum.
6 Improvement in $E \cdot t^2$ metric due to transistor sizing
In theory, sizing for speed only requires $n \to \infty$. In practice, $n$ is chosen big but finite. This is because wire parasitics are not fully "washed away", since that would result in impractically large transistors. While the miniMIPS microprocessor (an asynchronous version of a MIPS R3000 [2]) was designed and sized for high speed, we estimate its optimization index $n$ to be between  to . If the same design had been optimized for $E \cdot t^2$, the expected energy improvement would have been  to , while the speed slowdown would have been between  to  of the original. This would have resulted in an overall $E \cdot t^2$ improvement of  to . We would expect this improvement everywhere on the chip except for the cache core cells, which are sized based on different considerations. Based on the notion of asymptotic power, the power consumption would decrease  to .
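The way such estimates follow from (7) and (8) can be sketched as below. The speed-oriented indices passed to the function are hypothetical placeholders, not the paper's estimate for the miniMIPS; only the formulas $E = (n+1) E_0$, $t = \frac{n+1}{n}\tau_0$, and $P = n P_f$ are taken from the text.

```python
def et2_metric(n):
    """E * t^2 of a circuit sized for index n, in units of E0 * tau0^2."""
    return (n + 1) ** 3 / n ** 2


def gains_when_resizing_for_et2(n_speed):
    """Compare a design sized for index n_speed with the same design at n = 2."""
    energy_gain = (n_speed + 1) / 3.0                    # E ratio, old over new
    slowdown = (3.0 / 2.0) / ((n_speed + 1) / n_speed)   # t ratio, new over old
    metric_gain = et2_metric(n_speed) / et2_metric(2)    # E*t^2 ratio, old over new
    power_gain = n_speed / 2.0                           # P ratio, old over new
    return energy_gain, slowdown, metric_gain, power_gain


# Hypothetical speed-oriented indices, chosen only for illustration.
for n_speed in (8, 16):
    print(n_speed, [round(x, 3) for x in gains_when_resizing_for_et2(n_speed)])
```

The $E \cdot t^2$ gain equals the energy gain divided by the square of the slowdown, and the power gain equals the energy gain times the slowdown, so the four figures are not independent.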
7 Conclusions
In this paper we have explored global and local properties of asynchronous circuits sized for the metric $E t^n$. We have developed a theory that allows an abstract view of transistor sizing under this type of optimization. We have shown that the target cycle time of an optimized circuit should be $t = \frac{n+1}{n}\tau_0$, where $\tau_0$ is the smallest achievable cycle time of the component consuming the most energy, while the energy to achieve this delay should be $E = (n+1) E_0$, where $E_0$ is the total wire parasitic switched during computation. For the miniMIPS microprocessor, we estimated that the improvement in energy efficiency ($E \cdot t^2$ improvement) due to transistor sizing is  to  when compared to a high-speed design.
We have considered a sequential composition of circuits optimized for $E \cdot t^n$ at any level of design and inferred a general relationship between the energies and delays of the component circuits. When applied to voltage scaling, this relationship shows that circuits composed sequentially should be designed so as to equalize their power usage. When applied to transistor sizing, the same relationship shows that circuits composed sequentially should be designed so as to make their power usage proportional to the square root of their asymptotic power.
Acknowledgments
We wish to thank the members of the Asynchronous VLSI Group at Caltech for many stimulating discussions: Mika Nyström, Catherine Wong, and Karl Papadantonakis, and José Tierno from IBM T.J. Watson Research Center.
The research described in this paper was sponsored by the Defense Advanced Research Projects Agency and monitored by the Air Force under contract FK
References
 Alain J Martin Towards an energy complex
ity of computation Information Processing
Letters   p
 Alain J Martin Andrew Lines Rajit Manohar
Mika Nystrom Paul Penzes Robert South
worth and Uri Cummings The Design of an
Asynchronous MIPS R
 Microprocessor
Proceedings of the th Conference on Ad
vanced Research in VLSI IEEE Computer So
ciety Press p 

 Jose Tierno An energycomplexity model for
VLSI computations PhD Thesis California
Institute of Technology 
 Paul I Penzes  PhD Thesis in preparation
Caltech
 PEGill WMurray MHWright Practical
Optimization Academic Press 
 JPFishburn AEDunlop TILOS A posyn
omial approach to transistor sizing Proceed
ings of the  International Conference on
Computeraided Design Nov 
