Upper and lower bounds on switching energy in VLSI by Kissin, G.
Centrum voor Wiskunde en Informatica 
Centre for Mathematics and Computer Science 
G. Kissin 
Upper and lower bounds on switching energy in VLSI 
Computer Science/Department of Algorithmics & Architecture, Report CS-R9044, September 1990
The Centre for Mathematics and Computer Science is a research institute of the Stichting Mathematisch Centrum, which was founded on February 11, 1946, as a nonprofit institution aiming at the promotion of mathematics, computer science, and their applications. It is sponsored by the Dutch Government through the Netherlands Organization for the Advancement of Research (N.W.O.).
Copyright © Stichting Mathematisch Centrum, Amsterdam 
UPPER and LOWER BOUNDS on SWITCHING ENERGY in VLSI 
Gloria Kissin 
Centre for Mathematics and Computer Science 
P.O. Box 4079, 1009 AB Amsterdam, The Netherlands 
and 
Department of Mathematics and Computer Science 
University of Amsterdam, Plantage Muidergracht 24 
1018 TV Amsterdam, The Netherlands 
September 1990 
ABSTRACT 
A technology independent framework is established for measuring the switching energy consumed by very large scale integrated (VLSI) circuits. Techniques are developed for analyzing functional energy consumption, and for designing energy-efficient VLSI circuits. A wire (or gate) in a circuit uses switching energy when it changes state from 1 to 0 or vice versa. This paper develops the Uniswitch Model (USM) of energy consumption, which measures the differences between pairs of states of an embedded circuit.
The following worst case lower bounds are obtained in USM. Monotone circuits require switching energy proportional to the circuit's area. A class of n-input, boolean-valued functions, including addition and multiplication, uses $\Omega(n \log_2 n)$ switching energy when computed by a shallow depth circuit. A special case of the parity function is shown to require switching energy proportional to the area.
This paper also derives upper bounds in USM. Novel circuits and layouts are obtained for n-bit OR and compare functions that have shallow depth and use only linear energy in the worst case. A shallow depth n-bit addition circuit is laid out in a novel manner that uses linear energy on average. This is a log factor better than the worst case lower bound for addition.
1980 Mathematics Subject Classification: 68A30 
CR Categories: B.7.1, F.1.1 
Key Words and Phrases: addition, compare functions, circuits, OR, parity, switching energy, uniswitch energy, uniswitch model, upper and lower bounds, USM, VLSI.
Note: This paper will appear in a forthcoming issue of JACM. This paper was partially prepared at CSRI, University of Toronto.

Upper and Lower Bounds on Switching Energy in VLSI 
1. INTRODUCTION 
This paper establishes upper and lower bounds on the switching energy required to realize some 
commonly computed functions with VLSI circuits. Specifically, OR and AND functions and compare 
functions are shown to be realizable with VLSI circuits that use an amount of switching energy that is 
never more than linear in the length of the input. A linear average energy layout is given for binary 
addition. Worst case lower bounds are obtained for monotonic circuits, a class of multiple-output func-
tions including addition and multiplication, and for a special case of the parity function. 
Energy is a practical concern in VLSI design because the energy consumed by a circuit is transformed into heat. How well a circuit can dissipate heat determines its operational limitations; thus, the less heat produced, the better. In addition, energy considerations determine a significant portion of the overall cost of a computer [Me86].
Common to all physical devices is the switching energy [MC80] consumed when a wire or gate changes state from 1 to 0 or vice versa. The amount of switching energy consumed is proportional to the area switched.
The Uniswitch Model (USM) of energy consumption, defined in the next section, and most results in this paper first appeared in [Ki82] and [Ki85]. This work also comprises part of [Ki87]. Lengauer and Mehlhorn [LM81] showed that n-input functions realizable in $AT^2 = O(n^2)$ require $\Omega(AT)$ switching energy, where A is area and T is time in the Thompson model [Th80]. Aggarwal et al. [ACR88] improved the result of Lengauer and Mehlhorn to obtain an $\Omega(n^2)$ energy bound for the class of transitive functions [Vu83]. Leo [Le84] independently showed that, for a specialized circuit basis, the parity function requires $\Omega(A)$ average switching energy, where A is the area of the parity circuit.
The bounds obtained in this paper estimate the wire switching energy in USM. Neglecting node 
switching energy is not a major restriction because nodes are assumed to have minimal area and circuits 
are generally connected graphs. USM measures the differences between two states of a circuit; hence at 
most one switching event per node or wire is recorded by USM. Race conditions (aka hazards) that 
induce wires to switch more than once are the domain of the Multiswitch Models, which are defined 
and discussed in [Ki87] and [Ki90]. USM provides a lower bound on the total energy used by a circuit. 
The rest of this paper is organized as follows. Section 2 defines the Uniswitch Model of energy consumption, aka uniswitch energy. (For the remainder of this paper, the term energy refers to wire switching energy.) Two worst case energy metrics are defined and shown to be equivalent up to a constant multiplicative factor. The motivation for USM is discussed in section 2.1.
Section 3 examines energy lower bounds in USM. Monotone circuits (i.e., basis $\{\wedge, \vee\}$) are shown to be inherently energy-inefficient in that a monotone circuit always uses worst case uniswitch energy proportional to the circuit's area. Nonlinear lower bounds are obtained for a class of multiple output functions including addition and multiplication. For a restricted basis, a nonlinear lower bound is obtained for the single output function, parity. All the bounds are in the form of energy-time tradeoffs, because allowing large time bounds (e.g., linear circuit depth) often permits trivial energy upper bounds.
In section 4, upper bounds are obtained in USM. In particular, a novel circuit and layout are described for the OR function (i.e., $x_1 \vee x_2 \vee \cdots \vee x_n$), which is optimal in expending O(n) worst case uniswitch energy. The OR technique is extended to compare functions (i.e., is X > Y? where $X, Y \in \{0,1\}^n$).
Section 5 addresses the question of average case energy consumption. A parallel prefix circuit for 
binary addition, laid out according to the technique of section 4, is shown to use linear uniswitch energy 
on the average. This is a log factor better than the worst case lower bound. Some open problems are 
stated in section 6. 
2. THE SETTING 
The Uniswitch Model of energy consumption defines an energy cost measure for VLSI circuits. USM measures the differences between pairs of states of a circuit. The following discussion sets the stage for a precise definition of USM.
A VLSI circuit is a combinational circuit [Bo77] embedded in a plane as in [BK81]. Salient assumptions of the Borodin/Brent/Kung models that are important to USM are as follows. A circuit is acyclic. A wire (edge) in a VLSI circuit has constant minimum width $\lambda > 0$. A non-input node (gate) in a circuit computes a logical function of 1 or 2 inputs (e.g., AND ($\wedge$), OR ($\vee$), NOT ($\neg$)) in constant time. Gates are separated by distances $\ge \lambda$. A gate has constant area $\lambda^2$, and a non-output gate has fanout 1 or 2. Input nodes have fanin 0. Output nodes have fanout 0. At most a constant number of wires, $\nu \ge 2$, can overlap or intersect at any point in a VLSI circuit.
Some of the energy lower bounds described in this paper are functions of the wire area of a VLSI circuit. These results require that the circuit in question be connected, that the circuit inputs exclude constant values, and that interior nodes be functionally dependent on the inputs (i.e., each interior node has two states).
Definition:
A CID VLSI circuit is a VLSI circuit that satisfies the following two properties.
E1: Each input has fanout at most p, for $p \ge 1$. This limits the number of free duplicates of any input bit. Presumably, a real circuit receives one or a few instances of an input bit, and if more are needed the input must be replicated by the circuit. This costs area and energy that cannot realistically be neglected. In section 2.1, an example is given that illustrates the asymptotic effect of replicating inputs.
E2: All instances of any input appear at input nodes that are within a constant distance of each other. This again is an input fanout constraint. For the purposes of this paper, input and output nodes are on a convex boundary of the circuit layout.
Properties E1 and E2 are called constant input duplicates (CID) assumptions. This paper is primarily concerned with CID VLSI circuits and their physical analogs. Hence, the term circuit generally refers to a CID VLSI circuit. The terms real circuit or physical circuit refer to the physical analog of a CID VLSI circuit. (In [Ki87], CID circuits are called CIF circuits.)
Definitions:
A legal state (hereafter also called state or stable state) s is a function that attributes values to the nodes and wires of a circuit C, i.e., if $C = (V, W)$, where V is the node set and W is the set of wires, then $s : V \cup W \to \{0,1\}$. Input node x has some value $x_0$, where $x_0 \in \{0,1\}$. Edge w emanating from input node x has value $s(w) = x_0$. Non-input nodes and edges have values consistent with the input and the labeling of the nodes (e.g., $s(\mathrm{AND}(0,1)) = 0$, $s(\mathrm{NOT}(0)) = 1$). $s_X$ denotes the state of C for input X. $X \mapsto s_X$ is a bijection between input vectors and states of circuit C. Since a state and its associated input vector are closely allied, they are used interchangeably in the following discussion. C is in state $s_i$ at time $t_i$; $s_0$ is the initial state of C.
The switching energy of a circuit C is defined on a pair of states. In particular, we are interested in what happens when one input vector to C is replaced by another input vector. In the definitions that follow, the pair of states in question is often denoted as $(s_0, X)$, where $s_0$ is the initial state and X is an input vector that induces a second (i.e., final) state.
Definitions:
Suppose VLSI circuit C changes state from $s_0$ to $s_f$, denoted $C: s_0 \to s_f$. Further assume that wire w has initial value $s_0(w) = w_0$ and final value $s_f(w) = w_f$, where $w_0, w_f \in \{0,1\}$. This change in the value of w is denoted $w: w_0 \to w_f$. Then w is switched (switches) iff $w_0 \neq w_f$. A wire of length l that switches accounts for l/k switching energy, where k > 0 is the length of wire that accounts for 1 unit of switching energy. If $W = \{w\}$ is the set of wires in circuit C, and X is the input vector such that $C: s_0 \to X$, then the wire energy $E_w$ consumed by C is
$$E_w(C, s_0, X) = \frac{1}{k} \sum_{\substack{w \in W \\ s_0(w) \neq s_X(w)}} \|w\|,$$
where $\|w\|$ is the area of wire w. Hence $E_w(C, s_0, X) \le \frac{1}{k}\,\mathrm{area}(C)$, where area(C) is the total wire area of C.
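The wire energy definition can be made concrete with a small sketch. The netlist format below (the `Circuit` class, the `OPS` table, and `wire_energy`) is a hypothetical toy representation of our own, not the paper's; a wire's value is taken to be the value of its tail node, and the parameter `k` plays the role of the normalizing constant k above.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Gate functions for a hypothetical toy netlist format (not from the paper).
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NOT": lambda a: 1 - a,
}


@dataclass
class Circuit:
    n_inputs: int                               # nodes 0..n_inputs-1 are inputs
    gates: List[Tuple[str, Tuple[int, ...]]]    # (op, predecessor ids), topological order
    wire_area: Dict[Tuple[int, int], float]     # (tail, head) -> ||w||

    def state(self, x: Sequence[int]) -> List[int]:
        """Legal state s_X as a list of node values; a wire carries its tail's value."""
        vals = list(x)
        for op, preds in self.gates:
            vals.append(OPS[op](*(vals[p] for p in preds)))
        return vals


def wire_energy(c: Circuit, s0: List[int], x: Sequence[int], k: float = 1.0) -> float:
    """E_w(C, s0, X): (1/k) times the total area of wires w with s0(w) != s_X(w)."""
    sx = c.state(x)
    return sum(a for (tail, _), a in c.wire_area.items() if s0[tail] != sx[tail]) / k


# XOR(x0, x1) = (x0 AND NOT x1) OR (NOT x0 AND x1); every wire has unit area.
xor = Circuit(
    n_inputs=2,
    gates=[("NOT", (0,)), ("NOT", (1,)), ("AND", (0, 3)), ("AND", (2, 1)), ("OR", (4, 5))],
    wire_area={e: 1.0 for e in [(0, 2), (0, 4), (1, 3), (1, 5), (2, 5), (3, 4), (4, 6), (5, 6)]},
)
s0 = xor.state((0, 0))
print(wire_energy(xor, s0, (1, 0)))  # 4.0: the wires leaving nodes 0, 2 and 4 switch
```

Node switching energy is deliberately ignored, matching the paper's focus on wire switching energy.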
Uniswitch energy is defined below for the worst case and average case. Two worst case uniswitch energy models, denoted $E^U_{worst}$ and $E^L_{worst}$, are defined below. $E^U_{worst}$ picks the maximum wire energy expended when all (initial state, input) pairs are considered. This is the most appropriate measure for analyzing circuits, hence the superscript U for Upper bounds. $E^L_{worst}$ is obtained by first determining the maximum energy expenditure over all inputs for each initial state. From this set of maxima, the minimum is chosen to obtain a bound that is valid for all initial states. $E^L_{worst}$ thus yields strong lower bounds (hence the superscript L) independent of a circuit's initial state.
The average energy model $E_a$, defined below, averages the wire energy expended by a circuit over all (initial state, input) pairs.
Definitions:
If $C_n$ is a VLSI circuit computing $f_n : \{0,1\}^n \to \{0,1\}^m$ such that $C_n$ is in state $s_0$ at time $t_0$, and $E_w(C_n, s_0, X)$ is the wire energy consumed by $C_n$ when $X = (x_1, \ldots, x_n)$ is the input to $C_n$ at time $t > t_0$, then $E^L_{worst}(C_n)$, the universal worst case uniswitch energy, is given by
$$E^L_{worst}(C_n) = \min_{s_0} \left[ \max_{X} E_w(C_n, s_0, X) \right],$$
$E^U_{worst}(C_n)$, the existential worst case uniswitch energy, is given by
$$E^U_{worst}(C_n) = \max_{(s_0, X)} E_w(C_n, s_0, X),$$
and $E_a(C_n)$, the average case uniswitch energy, is given by
$$E_a(C_n) = \sum_{(s_0, X)} E_w(C_n, s_0, X) \,/\, 2^{2n},$$
where $2^{2n}$ is the number of $(s_0, X)$ pairs. This definition of $E_a(C_n)$ assumes that the input vector is uniformly distributed over $\{0,1\}^n$.
$E^U_{worst}(C_n)$ is abbreviated as $E^U_{worst}$, and $E^L_{worst}(C_n)$ is abbreviated as $E^L_{worst}$.
$E^L_{worst}(C_n)$ is a good model for lower bound analyses, while $E^U_{worst}(C_n)$ seems better suited for upper bounds. The following theorem, however, shows that the two models are equivalent to within a constant factor. Therefore, it is sufficient to consider $E^U_{worst}$ for both upper and lower bounds, and $E^U_{worst}(C_n)$ will often be abbreviated as $E_{worst}$.
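For a very small circuit, all three metrics can be computed by brute force over the $2^{2n}$ (initial state, input) pairs. The sketch below is ours: it reuses a hypothetical XOR netlist built from AND/OR/NOT gates with unit-area wires and k = 1, and identifies each initial state $s_0$ with the input that induces it, as the definitions permit.

```python
from itertools import product

# Hypothetical toy netlist (our own format): XOR(x0, x1) built from AND/OR/NOT.
# Nodes 0 and 1 are inputs; gate j is node 2 + j, listed in topological order.
OPS = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "NOT": lambda a: 1 - a}
GATES = [("NOT", (0,)), ("NOT", (1,)), ("AND", (0, 3)), ("AND", (2, 1)), ("OR", (4, 5))]
WIRES = [(0, 1.0), (0, 1.0), (1, 1.0), (1, 1.0), (2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0)]  # (tail, area)


def state(x):
    """Node values of the legal state s_X."""
    vals = list(x)
    for op, preds in GATES:
        vals.append(OPS[op](*(vals[p] for p in preds)))
    return vals


def wire_energy(s0, x):
    """E_w(C, s0, X) with k = 1: total area of wires whose value changes."""
    sx = state(x)
    return sum(area for tail, area in WIRES if s0[tail] != sx[tail])


def uniswitch_metrics(n):
    """Brute-force (E^L_worst, E^U_worst, E_a) over all 2^(2n) (s0, X) pairs."""
    inputs = list(product((0, 1), repeat=n))
    table = [[wire_energy(state(x0), x) for x in inputs] for x0 in inputs]
    e_lower = min(max(row) for row in table)       # min over s0 of max over X
    e_upper = max(max(row) for row in table)       # max over all (s0, X)
    e_avg = sum(map(sum, table)) / 2 ** (2 * n)    # uniform average
    return e_lower, e_upper, e_avg


print(uniswitch_metrics(2))  # (6.0, 8.0, 3.75)
```

Here $E^U_{worst} = 8$ is attained by switching between inputs 01 and 10, while from the initial states 00 and 11 no input switches more than area 6.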
Theorem 2.1:
$E^U_{worst} \ge E^L_{worst} \ge \frac{1}{2} E^U_{worst}$
Proof:
The first inequality, $E^U_{worst} \ge E^L_{worst}$, is obvious, since a minimum of maxima cannot exceed the overall maximum.
Consider a function $f_n$ for which a circuit $C_n$ uses $E^U_{worst} = B$. Thus, by definition of $E^U_{worst}$, there exist two states of $C_n$, $s_1$ and $s_2$, such that when $C_n$ switches from $s_1$ to $s_2$, B amount of energy is consumed. Let A be the subset of wires of $C_n$ that have different values in $s_1$ and $s_2$, i.e., $C_n = (V, E)$, where E is the set of wires, and $A = \{w \mid w \in E \text{ and } s_1(w) \neq s_2(w)\}$.
By definition, area(A) = B.
Assume $C_n$ is in some state s other than $s_1$ or $s_2$. Consider the set A of wires. Let $A_1 = \{w \mid w \in A \text{ and } s(w) = s_1(w)\}$.
Let $A_2 = A - A_1 = \{w \mid w \in A \text{ and } s(w) = s_2(w)\}$ (each wire is binary valued, so a wire of A that disagrees with $s_1$ agrees with $s_2$).
There are two possible cases:
case 1: $\mathrm{area}(A_1) \ge \frac{1}{2}\,\mathrm{area}(A)$
Apply the appropriate inputs that will induce $s_2$ (i.e., cause $C_n$ to switch from state s to state $s_2$). This will cause the wires of $A_1$ to switch, thereby using energy $\ge \frac{1}{2} B$.
case 2: $\mathrm{area}(A_2) \ge \frac{1}{2}\,\mathrm{area}(A)$
Apply the appropriate inputs to induce state $s_1$. This will cause the wires of $A_2$ to switch, thereby using energy $\ge \frac{1}{2} B$.
Since the argument above applies to an arbitrary function $f_n$, an arbitrary circuit $C_n$ for $f_n$, and an arbitrary state s of $C_n$, it follows that $E^L_{worst} \ge \frac{1}{2} E^U_{worst}$. []
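The case analysis in the proof is constructive and can be exercised on random circuits. The sketch below is ours: `random_circuit` generates a hypothetical AND/OR/NOT netlist with random wire areas (fanout limits are not enforced, which the argument does not need), picks a pair $(s_1, s_2)$ attaining $E^U_{worst} = B$, and checks that from every state s the move chosen by case 1 or case 2 switches at least $B/2$.

```python
import random
from itertools import product

OPS = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "NOT": lambda a: 1 - a}


def random_circuit(n_inputs, n_gates, rng):
    """Hypothetical random AND/OR/NOT netlist; fanout limits are not enforced."""
    gates, wires = [], []
    for g in range(n_gates):
        node = n_inputs + g
        op = rng.choice(sorted(OPS))
        preds = tuple(rng.sample(range(node), 1 if op == "NOT" else 2))
        gates.append((op, preds))
        wires += [(p, rng.randint(1, 5)) for p in preds]   # (tail node, area)
    return gates, wires


def state(gates, x):
    """Node values of the legal state for input vector x."""
    vals = list(x)
    for op, preds in gates:
        vals.append(OPS[op](*(vals[p] for p in preds)))
    return vals


def switched(wires, s, t):
    """Area of wires whose value differs between states s and t."""
    return sum(a for tail, a in wires if s[tail] != t[tail])


def check_theorem_2_1(n_inputs, n_gates, seed):
    rng = random.Random(seed)
    gates, wires = random_circuit(n_inputs, n_gates, rng)
    states = [state(gates, x) for x in product((0, 1), repeat=n_inputs)]
    # (s1, s2) attains E^U_worst = B.
    s1, s2 = max(((a, b) for a in states for b in states), key=lambda pr: switched(wires, *pr))
    B = switched(wires, s1, s2)
    for s in states:
        # area(A_1): wires in A whose value in s agrees with s1.
        area_a1 = sum(a for t, a in wires if s1[t] != s2[t] and s[t] == s1[t])
        # Case 1 drives the circuit to s2 (switching A_1); case 2 drives it to s1 (switching A_2).
        target = s2 if area_a1 >= B / 2 else s1
        assert switched(wires, s, target) >= B / 2
    e_lower = min(max(switched(wires, s, t) for t in states) for s in states)
    return e_lower, B


print(check_theorem_2_1(3, 10, seed=0))  # (E^L_worst, E^U_worst) of one random circuit
```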
Definitions:
A function $f : \{0,1\}^* \to \{0,1\}^*$ is energy efficient iff there exists a family $\mathcal{C} = (C_n)_{n \in \mathbb{N}}$ of circuits with $C_n$ realizing $f|_{\{0,1\}^n}$ and $E_{worst}(C_n) = O(n)$. Circuit family $\mathcal{C} = (C_n)_{n \in \mathbb{N}}$ is energy efficient iff for all families $\mathcal{C}' = (C'_n)_{n \in \mathbb{N}}$ of circuits with $\Phi(\mathcal{C}') = \Phi(\mathcal{C})$: $E_{worst}(C'_n) = \Omega(E_{worst}(C_n))$.
Throughout this paper, log n means $\log_2 n$.
2.1 Model Motivation 
The intent of this section is to motivate the Uniswitch Model in light of physical considerations, and to discuss and motivate some of the assumptions included in the definition of USM.
USM is a good model for obtaining lower bounds because it conservatively estimates a circuit's 
switching behaviour. Thus, a lower bound in USM is an equally valid lower bound on multiswitch 
energy. 
USM takes no notice of how a circuit arrives at a particular state. This is the domain of the Multiswitch Models, which are discussed in [Ki87] and [Ki89]. However, in order to discuss the relevance of using USM to obtain upper bounds, the following multiswitch notions are introduced.
The switching behaviour of physical circuits is influenced by various delay functions, such as gate delay $\delta$, wire delay $\Delta$ and input delay I. $\delta$ determines the switching speed of a gate. $\Delta$ determines the time to transmit a bit along a wire. I determines when an input value arrives at an input port.
Definitions:
Let $(C_n, \delta, \Delta, I)$ denote a circuit scheme, where $C_n$ is a CID VLSI circuit with gate delay $\delta$, wire delay $\Delta$, and input delay I. A circuit scheme $(C_n, \delta, \Delta, I)$ exhibits the uniswitch property if each node or wire of $C_n$ switches at most once when $C_n$ changes from one input setting to another, according to $\delta$, $\Delta$ and I. Otherwise, $(C_n, \delta, \Delta, I)$ exhibits the multiswitch property.
Using USM to obtain upper bounds is justified for circuit schemes that exhibit the uniswitch property. For example, if each node of a circuit receives its inputs at the same time, race conditions cannot arise, and the uniswitch property is thus ensured. Some real circuits have this timing property. Where race conditions derive solely from a circuit's asynchrony (i.e., the paths to a node vary in length), a circuit scheme can acquire the uniswitch property if the circuit can be made synchronous. A "bad" input schedule can be offset by varying gate delays. These approaches to designing circuit schemes that achieve the uniswitch property are discussed in [Ki87] and [Ki89]. Further, according to C. Mead [Me86], many CMOS designs are synchronized to ensure that the corresponding circuit schemes have the uniswitch property.
USM is the first step in the systematic asymptotic analysis of switching energy consumption in VLSI circuits. As such, USM is justified as an upper bound model. In addition, USM is motivated by designers' practical efforts to prevent hazards and thus ensure the uniswitch property.
The rest of this section discusses the following circuit assumptions. 
1) Circuits are acyclic.
2) Each node of a VLSI circuit uses area $\lambda^2$.
3) Each input has at most constant fanout. 
4) All instances of an input appear within constant distance of each other. 
The first two constraints are simplifying assumptions, while the last two constraints attempt to 
model real circuits. The following elaborates on the four assumptions listed above. 
1) Circuits are acyclic. 
The study of combinational circuits (without loops) has a long and distinguished history. Krohn and Rhodes' [KR65ab] seminal work in this area showed that each sequential machine (with loops) can be decomposed into structures consisting of only combinational circuits and flip-flops.
A recommended architecture for sequential machines is the finite state machine, in which the combinational logic is isolated from the looping structure [MC80]. See Figure 2.0. This architecture lends itself to analysis of the combinational logic separately from the looping buffers.
2) Each node of a VLSI circuit uses area $\lambda^2$.
That the area of each node is at least $\lambda^2$ is dictated by manufacturing constraints. $\lambda$ is a technology dependent value that determines the minimum feature size of circuit components. Large node fanin requires node area proportional to the fanin. However, since our underlying circuit model has fanin at most two, minimal node area is sufficient. A constant fanin greater than two increases node area by a constant factor. Unbounded fanin is reasonably excluded from consideration, since practical circuits consist of components with bounded fanin.
A node requires large area to drive a large capacitive load, such as a large fanout or a long wire, 
without a degradation in time. Our underlying circuit model assumes that a node has outdegree at most 
two. Realizing a constant fanout greater than two increases the depth and area of the circuit by a constant 
factor. A larger fanout can be realized with a binary tree. Wire delay is not a concern of USM, which 
measures the differences between two static states. 
[Figure 2.0: Combinational Logic (C/L) block with inputs, outputs, and feedback paths]
Figure 2.0. Preferred Sequential Circuit: a Finite State Machine
I/O ports will generally be much larger than $\lambda^2$ area because they drive off-chip wires that are significantly larger than wires on the chip. This paper does not specifically address I/O ports, but the techniques developed herein are relevant and applicable to circuits with I/O ports of realistic size.
3) Each input has at most constant fanout.
4) All instances of an input appear within constant distance of each other.
These two constraints are called constant-input-duplicates (CID) assumptions. It is reasonable to expect that a real VLSI circuit will receive a single copy of any input, or at most a small number of copies. It is then up to the circuit to replicate the input and to transmit copies to distant locations on the chip as required. The following gives an example of the energy costs of replicating and transmitting input. The example shows that since these costs can have an asymptotic effect on the total energy, they cannot be neglected in practice. The constant-input-duplicates assumptions thus force the circuit designer to account for the cost of replicating and transmitting input.
Let $A = (a_0, \ldots, a_{n-1})$ and $B = (b_0, \ldots, b_{n-1})$ be sets of boolean variables. Consider the following DNF expression:
$$H(A,B) = (a_{n-1} \wedge b_{n-1}) \vee [(a_{n-1} \oplus b_{n-1}) \wedge (a_{n-2} \wedge b_{n-2})] \vee \cdots \vee [(a_{n-1} \oplus b_{n-1}) \wedge \cdots \wedge (a_1 \oplus b_1) \wedge (a_0 \wedge b_0)]$$
where $a_i, b_i \in \{0,1\}$, $0 \le i < n$.
Let $\bar{A} = \sum_{i=0}^{n-1} a_i 2^i$ and $\bar{B} = \sum_{i=0}^{n-1} b_i 2^i$.
H(A, B) computes the (n+1)st bit of $\bar{A} + \bar{B}$.
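The claim that H is the carry out of an n-bit addition can be checked exhaustively for small n. In this sketch (ours; `h_dnf` and `value` are illustrative names), bit tuples are indexed so that `a[i]` is $a_i$, i.e. index 0 is the least significant bit, and H is evaluated clause by clause exactly as written above.

```python
from itertools import product


def h_dnf(a, b):
    """H(A, B) evaluated clause by clause; a[i], b[i] are the bits a_i, b_i."""
    n = len(a)
    for j in range(n - 1, -1, -1):        # clause whose AND-term is (a_j AND b_j)
        xor_prefix = all(a[i] ^ b[i] for i in range(j + 1, n))
        if xor_prefix and (a[j] & b[j]):
            return 1
    return 0


def value(bits):
    """Integer value sum_i bits[i] * 2**i (bits[0] is the least significant bit)."""
    return sum(bit << i for i, bit in enumerate(bits))


def check(n):
    """H(A, B) equals bit n, the (n+1)st bit, of A-bar + B-bar for all 2^(2n) inputs."""
    for a in product((0, 1), repeat=n):
        for b in product((0, 1), repeat=n):
            assert h_dnf(a, b) == ((value(a) + value(b)) >> n) & 1
    return True


print(all(check(n) for n in range(1, 7)))  # True
# A = 1^4 and B with value 1: 15 + 1 = 16, so the carry out is 1.
print(h_dnf((1, 1, 1, 1), (1, 0, 0, 0)))   # 1
```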
Consider a circuit $C_n$ for H in which each occurrence of $a_i$ and $b_i$ in H(A, B) is provided at a distinct input node, and interior nodes have fanout at most one. Then $C_n$ is a tree with $\Theta(n^2)$ leaves. Brent and Kung [BK80] and Yao [Ya81] showed that such a tree requires area $\Omega(n^2 \log n)$ when it has $O(\log n)$ depth and the leaves are on a convex boundary of the layout. Thus, an optimal embedding of $C_n$ in USM will use no more than $O(n^2 \log n)$ energy, which is consumed when H switches from a state where $A = B = 0^n$ to a state where $A = 1^n$ and $B = 0^{n-1}1$.
The energy cost of H can be improved to $O(n^2)$ by using a nontree-like circuit $D_n$ in which the conjunctive clauses of H are energy-efficient (i.e., use at most O(n) energy per clause). The technique for realizing such a circuit is discussed in section 4. As in $C_n$, each occurrence of $a_i$ and $b_i$ in H(A, B) is provided at a distinct input node in $D_n$.
The analysis above charges only unit energy per input instance, although many inputs have O(n) instances and instances can be up to $O(n^2)$ distance apart when the input nodes are laid out as in Figure 2.1. Consider an analysis of H that realistically accounts for these factors.
Figure 2.1 illustrates a circuit $D_n$ for H that includes the input fanout trees. The example in the figure uses n = 4. Recall that F denotes a fanout node that computes the identity function of its input. Consider the area of $D_n$.
More than half the 2n inputs each have at least n/2 instances in H (inputs $a_i$ and $b_i$ each occur in i + 1 clauses). Hence the area of each of these input fanout trees in $D_n$ is $\Omega(n \log n)$, even when the large separation between instances is ignored. When an input switches, the entire fanout tree switches. Hence the total energy for the input fanout trees is $\Omega(n^2 \log n)$, which exceeds the $O(n^2)$ energy cost of the non-input portion of $D_n$ (assuming $D_n$ uses energy-efficient conjunctive clauses).
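The instance counts behind this paragraph follow from the shape of the DNF: clause j contains $a_j, b_j$ in its AND-term and $a_i, b_i$ for every i > j in its XOR-terms. The sketch below (ours; `instance_counts` is an illustrative name) tallies them and confirms, for small n, that more than n of the 2n inputs have at least n/2 instances and that the total number of input instances is n(n + 1).

```python
from collections import Counter


def instance_counts(n):
    """Occurrences of each input in the DNF for H(A, B).

    Clause j (0 <= j < n) contains a_j, b_j in its AND-term and a_i, b_i
    in its XOR-terms for every i with j < i < n.
    """
    counts = Counter()
    for j in range(n):
        for i in range(j, n):
            counts[f"a{i}"] += 1
            counts[f"b{i}"] += 1
    return counts


for n in (4, 8, 16):
    counts = instance_counts(n)
    heavy = [v for v, c in counts.items() if c >= n / 2]   # inputs with >= n/2 instances
    print(n, sum(counts.values()), len(heavy))
# 4 20 6
# 8 72 10
# 16 272 18
```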
[Figure 2.1: input fanout trees of fanout nodes (F) feeding energy-efficient conjunctive clauses such as $(a_4 \oplus b_4) \wedge (a_3 \oplus b_3) \wedge (a_2 \oplus b_2) \wedge (a_1 \wedge b_1)$]
Figure 2.1. $D_4$, a VLSI circuit with large input fanout
If we realistically assume that an input will arrive at a single input port, then the large separation between input instances in $D_n$ will be manifested by long fanout wires in $D_n$. In fact, $\Omega(n)$ inputs in $D_n$ have instances that are $\Omega(n^2)$ apart. This drives the energy cost to at least $\Omega(n^3)$.
In section 4, H is shown to be computable in O(n) energy in USM.
3. LOWER BOUNDS 
3.1 Trivial Bounds 
The trivial lower bound for the worst case switching energy of a circuit with n inputs is $\Omega(n)$ (for cyclic and acyclic circuits), achieved when all inputs are switched. In USM, an acyclic circuit of area A trivially uses $E_{worst} = O(A)$. Hence, in cases where A = O(n), $E_{worst} = \Theta(n)$. For many n-input functions this is achieved with circuits of O(n) depth. For example, an n-stage ripple carry adder, which has depth O(n), uses $E_{worst} = \Theta(n)$ since its area is A = O(n). Thus, in order to obtain superlinear lower bounds for energy, most of the theorems in this section assume sublinear circuit depth.
However, assuming sublinear depth is not always sufficient to guarantee a circuit of superlinear area. For example, the well known H-tree embedding, illustrated in Figure 3.1, can be used to realize a parity function on n inputs, where the nodes of the tree are exclusive-or ($\oplus$) gates. Such a circuit has shallow (i.e., log n) depth but only O(n) area, and hence O(n) energy in USM. However, the H-tree embedding has its input ports strewn throughout the layout, while in practice it is advisable to confine I/O ports to convex boundaries of the layout [Me80].
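Why the H-tree stays linear in area can be seen from its layout recurrence. The sketch below uses simplifying unit-size assumptions of our own (leaf cell side 1, routing channel width 1; `htree_side` is an illustrative name): an order-k H-tree places four order-(k-1) H-trees in a 2x2 arrangement around a channel, so its side length obeys $L_k = 2L_{k-1} + 1$, i.e. $L_k = 2^{k+1} - 1 = \Theta(\sqrt{n})$ for $n = 4^k$ leaves, and the area is $\Theta(n)$.

```python
def htree_side(k, leaf=1.0, channel=1.0):
    """Side length of an order-k H-tree with 4**k leaves.

    Hypothetical unit model: an order-k H-tree is four order-(k-1) H-trees
    in a 2x2 arrangement, separated by a routing channel of fixed width.
    """
    side = leaf
    for _ in range(k):
        side = 2 * side + channel
    return side


for k in range(1, 8):
    n = 4 ** k                       # number of leaves (parity inputs)
    area = htree_side(k) ** 2
    print(n, area, area / n)         # the ratio area / n stays below 4
```

The leaves of this layout are spread over the whole square, which is exactly what the convex-boundary assumption of Theorem 3.0 rules out.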
Theorem 3.0: (R.P. Brent & H.T. Kung [BK80], A. Yao [Ya81])
A tree with n leaves on a convex boundary and of depth D requires area
$$A \ge \frac{c\, n \log n}{\log(2D / \log n)}$$
for some constant c > 0.
In particular, if D = O(log n) then $A = \Omega(n \log n)$. Thus, to obtain superlinear energy bounds, many of the results in this section assume that n-input circuits have sublinear depth and that the input (and output) ports are on a convex boundary of the layout. In many cases, Theorem 3.0 above thus guarantees that the embedded circuits have superlinear area.
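Specializing Theorem 3.0 to two extreme depths makes the role of D explicit. Here $\gamma \ge 1$ is a constant introduced only for this illustration (a tree with n leaves has depth at least log n, so $\gamma \ge 1$ keeps the denominator positive), and logarithms are base 2.

```latex
D = \gamma \log n:\quad
A \ge \frac{c\, n \log n}{\log\!\left(2\gamma \log n / \log n\right)}
    = \frac{c\, n \log n}{\log (2\gamma)} = \Omega(n \log n)

D = n:\quad
A \ge \frac{c\, n \log n}{\log(2n / \log n)}
    = \frac{c\, n \log n}{1 + \log n - \log\log n} = \Omega(n)
```

At linear depth the bound degenerates to the trivial linear one, consistent with the ripple carry adder example above.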
Figure 3.1 H-tree Layout 
3.2 Monotone Circuits
Definition: [Sa76]
A monotone circuit is a circuit whose noninput nodes are labeled with functions from the monotone basis $\{\wedge, \vee\}$.
Theorem 3.1:
A monotone circuit $C_n$ without constant inputs, embedded in area A, requires worst case energy $E_{worst}(C_n) = \Omega(A)$.
Proof:
Assume $C_n$ is in a legal state. Thus $C_n$ may contain some "0" edges and some "1" edges. Let $A_0$ be the area of the "0" edges. Let $A_1$ be the area of the "1" edges.
case 1:
If $A_0 \ge \frac{1}{2} A$, then apply $1^n$ to the input nodes. Since every AND and OR gate outputs 1 when all of its inputs are 1, this will force all "0" edges to switch. Therefore, $E_{worst}(C_n) \ge A_0 \ge \frac{1}{2} A$.
case 2:
If $A_0 < \frac{1}{2} A$, then apply $0^n$ to the input nodes. By the same argument, this will force all "1" edges to switch. Therefore, $E_{worst}(C_n) \ge A_1 > \frac{1}{2} A$.
[]
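The two moves used in the proof can be simulated on random monotone netlists. In this sketch (ours; `random_monotone_circuit` builds a hypothetical AND/OR netlist with random wire areas and no constant inputs), the energy of applying $1^n$ is the area of the "0" wires and the energy of applying $0^n$ is the area of the "1" wires; taking the better move from every legal starting state always switches at least half the total wire area.

```python
import random
from itertools import product


def random_monotone_circuit(n_inputs, n_gates, rng):
    """Hypothetical random netlist over the monotone basis {AND, OR}."""
    gates, wires = [], []
    for g in range(n_gates):
        p, q = rng.sample(range(n_inputs + g), 2)
        gates.append((rng.choice(("AND", "OR")), p, q))
        wires += [(p, rng.randint(1, 5)), (q, rng.randint(1, 5))]  # (tail node, area)
    return gates, wires


def state(gates, x):
    """Node values of the legal state for input vector x."""
    vals = list(x)
    for op, p, q in gates:
        vals.append(vals[p] & vals[q] if op == "AND" else vals[p] | vals[q])
    return vals


def proof_energy(gates, wires, n_inputs, s):
    """Energy of the better of the two moves used in the proof of Theorem 3.1."""
    ones = state(gates, (1,) * n_inputs)
    zeros = state(gates, (0,) * n_inputs)
    assert all(ones) and not any(zeros)                  # monotone, no constants
    area_0 = sum(a for t, a in wires if s[t] == 0)       # switched by applying 1^n
    area_1 = sum(a for t, a in wires if s[t] == 1)       # switched by applying 0^n
    return max(area_0, area_1)


def min_over_states(n_inputs, n_gates, seed):
    gates, wires = random_monotone_circuit(n_inputs, n_gates, random.Random(seed))
    total = sum(a for _, a in wires)
    worst = min(proof_energy(gates, wires, n_inputs, state(gates, x))
                for x in product((0, 1), repeat=n_inputs))
    return worst, total


worst, total = min_over_states(4, 12, seed=1)
print(worst >= total / 2)  # True: at least half the wire area can always be switched
```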
Theorem 3.1 shows that in USM, a monotone circuit will switch at least half of its area in the worst case. Thus, the naive way of realizing the OR function on n inputs, with a monotone tree of $\vee$-gates, uses worst case energy proportional to the area of the tree. This high energy expenditure can be reduced for OR(n) and other functions by introducing negations into the circuit and by using a novel layout. Section 4 describes such a VLSI circuit, $C_n$, for computing OR(n), which uses a complete basis. $C_n$ has O(log n) depth and O(n log n) area, but uses only O(n) worst case uniswitch energy, which is at least a log factor better than a shallow depth monotone circuit for OR(n).
3.3 Multiple Output Functions 
The previous section gave a general energy lower bound for monotone circuits. No such nontrivial bound exists for the class of circuits over a complete basis. In this section, a class L of n-input functions is defined for which superlinear energy is required if the functions are realized by a circuit of sublinear depth, when the I/O ports are on a convex boundary of the layout. Class L includes addition and multiplication of two n-bit binary numbers and other common multiple output functions.
Intuitively, each n-input function in L is shown to have the property that many (i.e., $\Omega(n)$) outputs can be switched by switching only 1 input. Hence these functions are called 1-switchable. 1-switchability is shown to imply the existence of many switched paths between the switched outputs and the single switched input. By observing that these paths require large (i.e., $\Omega(n \log n)$) area, a large energy bound is obtained. The following discussion formalizes these notions.
If $\alpha$ and $\beta$ are two boolean bit strings of length greater than zero, then $H(\alpha, \beta) = 1$ denotes that $\alpha$ and $\beta$ differ in only one bit. $H(\alpha, \beta)$ is called the Hamming distance of the two strings.
Definition:
Let $f^{(n)} = (f_1, \ldots, f_{m(n)}) : \{0,1\}^n \to \{0,1\}^{m(n)}$. If for all n there exist $\alpha_n, \beta_n \in \{0,1\}^n$ such that $H(\alpha_n, \beta_n) = 1$ and, letting $S_n = \{i : 1 \le i \le m(n),\ f_i(\alpha_n) \neq f_i(\beta_n)\}$, $|S_n| = \Omega(n)$, then $f = (f^{(n)})_{n \in \mathbb{N}}$ is 1-switchable.
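For a fixed small n, the largest possible $|S_n|$ can be found by brute force over all pairs at Hamming distance 1. The sketch below (ours; `add_outputs` and `max_switched_outputs` are illustrative names) does this for integer addition with p-bit operands, listing inputs least significant bit first. A finite check like this can only illustrate the $\Omega(n)$ growth, not prove it.

```python
from itertools import product


def add_outputs(bits, p):
    """Integer addition: bits = (x_0..x_{p-1}, y_0..y_{p-1}), LSB first; returns z_0..z_p."""
    x = sum(b << i for i, b in enumerate(bits[:p]))
    y = sum(b << i for i, b in enumerate(bits[p:]))
    return [((x + y) >> i) & 1 for i in range(p + 1)]


def max_switched_outputs(f, n):
    """Largest |S_n| over all pairs (alpha, beta) at Hamming distance 1."""
    best = 0
    for alpha in product((0, 1), repeat=n):
        f_alpha = f(alpha)
        for j in range(n):
            beta = list(alpha)
            beta[j] ^= 1                     # flip exactly one input bit
            diff = sum(u != v for u, v in zip(f_alpha, f(beta)))
            best = max(best, diff)
    return best


for p in range(1, 6):
    n = 2 * p
    print(n, max_switched_outputs(lambda bits, p=p: add_outputs(bits, p), n))  # p + 1
```

For addition every one of the p + 1 = n/2 + 1 outputs can be flipped by a single input bit, which is the witness used for addition in Theorem 3.3.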
Lemma 3.1:
Let $C_n$ be a circuit with n inputs $(x_1, x_2, \ldots, x_n)$ and let $z_1$ be a node of $C_n$. Let $s_1, s_2$ be two states of circuit $C_n$ such that there exists i, $1 \le i \le n$, with $s_1(x_i) \neq s_2(x_i)$; for all k with $1 \le k \le n$ and $k \neq i$, $s_1(x_k) = s_2(x_k)$; and $s_1(z_1) \neq s_2(z_1)$. When $C_n$ switches between states $s_1$ and $s_2$, a path in $C_n$ from $x_i$ to $z_1$ switches.
Proof:
case 1:
$z_1$ is an input node. Since $x_i$ is the only input that switches, $z_1 = x_i$. Done.
case 2:
$z_1$ is a noninput node. Then at least one of the input edges to $z_1$ switches (otherwise the value of $z_1$ could not change). Call this switched edge $e_1$. Then the node at the tail of $e_1$ switches. Continue on in this way. Since $C_n$ is acyclic and $x_i$ is the only input that switches, this process yields a switched path from $x_i$ to $z_1$. []
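The backward walk in case 2 of the proof is easy to mechanize. In this sketch (a hypothetical netlist format of our own; inputs are numbered from 0 rather than 1), `switched_path` starts at a switched node z and repeatedly steps to a predecessor whose value also changed; because the netlist is acyclic and only one input changed, the walk must end at that input.

```python
import random

OPS = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "NOT": lambda a: 1 - a}


def random_circuit(n_inputs, n_gates, rng):
    """Hypothetical random AND/OR/NOT netlist; gate j is node n_inputs + j."""
    gates = []
    for g in range(n_gates):
        op = rng.choice(sorted(OPS))
        arity = 1 if op == "NOT" else 2
        gates.append((op, tuple(rng.sample(range(n_inputs + g), arity))))
    return gates


def state(gates, x):
    """Node values of the legal state for input vector x."""
    vals = list(x)
    for op, preds in gates:
        vals.append(OPS[op](*(vals[p] for p in preds)))
    return vals


def switched_path(gates, n_inputs, s1, s2, z):
    """Follow switched in-edges backwards from a switched node z (case 2 of the proof)."""
    path = [z]
    while path[-1] >= n_inputs:                      # still at a non-input node
        _, preds = gates[path[-1] - n_inputs]
        # If no input edge of a gate switched, the gate itself could not have switched.
        path.append(next(p for p in preds if s1[p] != s2[p]))
    return path[::-1]                                # from the switched input to z


gates = random_circuit(4, 15, random.Random(7))
s1 = state(gates, (0, 1, 1, 0))
s2 = state(gates, (0, 1, 0, 0))                      # only input node 2 differs
for z, (u, v) in enumerate(zip(s1, s2)):
    if u != v:
        print(z, switched_path(gates, 4, s1, s2, z))  # each path starts at node 2
```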
Theorem 3.2:
Let d be an integer $\ge \lceil \log(n+1) \rceil$. If a boolean n-input function $f_n$ is 1-switchable, then computing $f_n$ with a VLSI circuit $C_n$ of depth d, where the I/O ports are on a convex boundary of the layout, requires
$$E_{worst}(C_n) = \Omega\left(\frac{n \log n}{\log(2d / \log n)}\right).$$
Proof:
Consider VLSI circuit $C_n$ that realizes function $f_n$. By hypothesis, $f_n$ is 1-switchable. Therefore there exist two states of $C_n$, $s_1$ and $s_2$, in which only one input, say $x_i$, has a different value, and there exists a set S of outputs such that $|S| = \Omega(n)$ and every $z \in S$ switches when $C_n$ switches between $s_1$ and $s_2$. By Lemma 3.1, there exists a switched path from $x_i$ to each member of S when $C_n$ switches between $s_1$ and $s_2$. By the definition of USM, $C_n$ contains at most a constant number k of instances of input $x_i$. Hence, at least one instance of $x_i$ must account for $m \ge |S|/k = \Omega(n)$ switched paths between $x_i$ and elements of S. These switched paths form a tree with m leaves on a convex boundary and depth d, which requires area $\Omega\left(\frac{n \log n}{\log(2d/\log n)}\right)$ by Theorem 3.0. []
Note, in particular, that if d = O(log n) then $E_{worst}(C_n) = \Omega(n \log n)$.
Consider the following problem list L:
1) Integer Addition
input $(x_0, \ldots, x_{p-1}, y_0, \ldots, y_{p-1})$ where $x_i, y_i \in \{0,1\}$
output $(z_0, \ldots, z_p)$ where $z_i \in \{0,1\}$ and
$X = \sum_{i=0}^{p-1} x_i 2^i$, $Y = \sum_{i=0}^{p-1} y_i 2^i$, $Z = X + Y = \sum_{i=0}^{p} z_i 2^i$
$n = 2p$
$|Z| = p + 1 = \frac{n}{2} + 1$
2) Cyclic Shift
input $(x_0, \ldots, x_{p-1}, s)$ where $x_i \in \{0,1\}$, $0 \le s < p$
output $(z_0, \ldots, z_{p-1})$ where $z_i = x_{(i+s) \bmod p}$
$n = p + \lceil \log p \rceil$
$|Z| = p = \Theta(n)$
3) Integer Multiplication
input $(x_0, \ldots, x_{p-1}, y_0, \ldots, y_{p-1})$ where $x_i, y_i \in \{0,1\}$
output $(z_0, \ldots, z_{2p-1})$ where $z_i \in \{0,1\}$ and
$X = \sum_{i=0}^{p-1} x_i 2^i$, $Y = \sum_{i=0}^{p-1} y_i 2^i$
$n = 2p$
$|Z| = 2p = n$
$Z = X \cdot Y = \sum_{i=0}^{2p-1} z_i 2^i$
4) Product of 3 Matrices over $\mathbb{Z}_2$
input $(x_{11}, \ldots, x_{pp}, y_{11}, \ldots, y_{pp}, w_{11}, \ldots, w_{pp})$ where $x_{ij}, y_{ij}, w_{ij} \in \{0,1\}$
output $(z_{11}, \ldots, z_{pp})$ where
$z_{ij} = \sum_k (v_{ik} \cdot w_{kj}) \bmod 2$ and $v_{ik} = \sum_l (x_{il} \cdot y_{lk}) \bmod 2$
$n = 3p^2$
$|Z| = p^2 = \frac{n}{3}$
5) Binary-to-Unary
input $(x_0, \ldots, x_{\lceil \log p \rceil - 1})$ where $x_i \in \{0,1\}$
output $(z_0, \ldots, z_{p-1})$ where $X = \sum_i x_i 2^i$
$n = \log p$
$|Z| = p = 2^n$
Theorem 3.3:
The functions in $L$ are 1-switchable.
Proof:
For each function $f$ described in $L$, two states $s_1, s_2$ are given below by defining the input configurations $(X, Y, W)$ and the output configuration $Z$; $p$ is defined in $L$ for each function. The reader can verify that for each problem in $L$, when a circuit for $f$ with $n$ inputs switches between $s_1$ and $s_2$, one input bit switches and $\Omega(n)$ output bits switch.
1) Integer Addition
$s_1$: $X = 1^p$, $Y = 0^p \Rightarrow Z = 01^p$
$s_2$: $X = 1^p$, $Y = 0^{p-1}1 \Rightarrow Z = 10^p$
2) Cyclic Shift
for $p$ even:
$s_1$: $X = ((10)^{p/2}, 0) \Rightarrow Z = (10)^{p/2}$
$s_2$: $X = ((10)^{p/2}, 1) \Rightarrow Z = (01)^{p/2}$
3) Integer Multiplication
$s_1$: $X = 1^p$, $Y = 0^p \Rightarrow Z = 0^{2p}$
$s_2$: $X = 1^p$, $Y = 0^{p-1}1 \Rightarrow Z = 0^p 1^p$
4) Product of 3 Matrices over $Z_2$
for $p$ even:
$\Rightarrow [z_{ij}] = 1^{pp}$
(i.e. $y_{11} = 0$, $y_{ij} = 1$ $\forall ij \neq 11$)
for $p$ odd:
same as $p$ even except the last row of $[y_{ij}]$ is $0^p$ in $s_1$ and $s_2$.
5) Binary-to-Unary
$s_1$: $X = 0^{\lceil \log p \rceil} \Rightarrow Z = 0^p$
$s_2$: $X = 10^{\lceil \log p \rceil - 1} \Rightarrow Z = 1^{p/2} 0^{p/2}$
□
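The state pairs can be checked mechanically. The Python sketch below is illustrative only: the function names are ours, it covers items 1 to 3 of $L$, and it reads the strings above most significant bit first while storing bits least significant first (index $i$ carries weight $2^i$). For each problem it reports how many input bits and how many output bits differ between $s_1$ and $s_2$.

```python
def to_bits(value, width):
    """Little-endian bit list: position i holds the coefficient of 2**i."""
    return [(value >> i) & 1 for i in range(width)]


def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(u != v for u, v in zip(a, b))


def switching_profile(p):
    """Map each problem to (input bits that switch, output bits that switch)."""
    ones = (1 << p) - 1                      # value of the string 1^p
    profile = {}

    # 1) Integer addition: X = 1^p fixed, Y changes from 0^p to 0^{p-1}1.
    profile["add"] = (
        hamming(to_bits(0, p), to_bits(1, p)),
        hamming(to_bits(ones + 0, p + 1), to_bits(ones + 1, p + 1)),
    )

    # 2) Cyclic shift (p even): X = (10)^{p/2} fixed, s changes from 0 to 1.
    x = [1 - (i % 2) for i in range(p)]

    def shifted(s):
        return [x[(i + s) % p] for i in range(p)]   # z_i = x_{(i+s) mod p}

    s_width = (p - 1).bit_length()           # ceil(log2 p) bits encode s
    profile["shift"] = (
        hamming(to_bits(0, s_width), to_bits(1, s_width)),
        hamming(shifted(0), shifted(1)),
    )

    # 3) Integer multiplication: X = 1^p fixed, Y changes from 0^p to 0^{p-1}1.
    profile["mul"] = (
        hamming(to_bits(0, p), to_bits(1, p)),
        hamming(to_bits(ones * 0, 2 * p), to_bits(ones * 1, 2 * p)),
    )
    return profile


print(switching_profile(8))  # {'add': (1, 9), 'shift': (1, 8), 'mul': (1, 8)}
```

In every case exactly one input bit changes while a linear number of output bits change, which is what 1-switchability requires.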
Corollary 3.1:
Let $d$ be an integer $\geq \lceil \log n \rceil + 1$. The functions in $L$ require $E_{worst} = \Omega\left(\frac{n \log n}{\log(2d/\log n)}\right)$ to be computed by a VLSI circuit of depth $d$ where the I/O ports are on a convex boundary of the layout.
Proof:
By Theorem 3.3, the functions in $L$ are 1-switchable. $\therefore$ by Theorem 3.2, the functions of $L$ require $E_{worst} = \Omega\left(\frac{n \log n}{\log(2d/\log n)}\right)$. □
3.3.1 Related Work 
Several researchers have studied a subclass of 1-switchable functions called transitive functions, which includes integer multiplication and matrix multiplication. The set of transitive functions was defined by Vuillemin [Vu83], and Snyder and Tyagi [ST86] showed that the transitive functions form a proper subset of the 1-switchable functions.
Lengauer and Mehlhorn [LM81] obtained an $\Omega(n + n^2/\log(A/n^2))$ bound on the uniswitch energy of transitive functions. Snyder and Tyagi [ST86] rederived this result in the case where $A = O(n^2)$. The bounds obtained by both [LM81] and [ST86] use information theoretic arguments that preclude encodings. Aggarwal et al. [AGR88] improved their result to obtain an $\Omega(n^2)$ worst case bound
on the uniswitch energy of transitive functions.
[AGR88] also showed that if the I/O ports of an adder need not be on the periphery of the layout, then addition can be computed in $O(n \log n / \log\log n)$ uniswitch energy. Section 5 of this paper shows that addition can be computed in linear average energy while keeping the I/O ports on the periphery of the layout.
Snyder and Tyagi [ST86] have extended the result on 1-switchable functions to a range of depths. In particular, they showed that a convex VLSI circuit $C$ that computes a 1-switchable function in depth $d(n)$, $\log_2 n \leq d(n) \leq n^{\epsilon}$, $0 < \epsilon \leq 1$, requires $E_{worst}(C) \cdot d(n) = \Omega(\max(n \log n, n\,d(n)))$.
3.4 Single Output Functions 
The proof techniques of the previous section are applicable only to multi-valued functions. Section 4 describes a method for obtaining VLSI circuits for certain $n$-input predicates (i.e. single-valued functions), which use $E_{worst} = O(n)$. These predicates include OR and AND functions on $n$ inputs, and compare functions. However, it is unlikely that all $n$-input predicates that can be computed by a VLSI circuit of shallow (i.e. $O(\log n)$) depth can be computed in $O(n)$ worst case energy. The following discussion provides evidence for this conjecture, by describing a superlinear lower bound on parity, for a specialized basis.
Consider the parity function on $n$ boolean variables (i.e. $x_1 \oplus x_2 \oplus \cdots \oplus x_n$). In the special case where the circuit basis is $\{\oplus, \neg\}$, a superlinear energy lower bound for parity is derived in the following theorem, which was independently obtained by J. Leo [Le84].
Theorem 3.4:
To compute parity of $n$ inputs with a VLSI circuit $C_n$ of area $A$ requires $E_a(C_n) = \Omega(A)$ when the circuit basis is $\{\oplus, \neg\}$ and when $C_n$ contains no constant inputs and no nodes that compute a constant function.
Proof:
Let $W = \{w\}$ be the wires of $C_n$.
Note that when the basis for $C_n$ is $\{\oplus, \neg\}$, each node of $C_n$ computes a parity function of a nonempty subset of the inputs or their negations.
The inputs of $C_n$ are assumed to be uniformly distributed over $\{0,1\}$ (by the definition of $E_a$). Hence, each wire of $C_n$ has value 1 (or 0) for exactly half the states of $C_n$.
Since the initial input and the new input are chosen independently, $\therefore \forall w \in W$, $\Pr(w \text{ switches}) = 2 \cdot \frac{1}{2} \cdot \frac{1}{2} = 1/2$.
$\therefore E_a(C_n) \geq \frac{1}{2} A$. □
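The probabilistic step in the proof of Theorem 3.4 can be confirmed by brute force on a small example. The circuit below is a hypothetical 4-input $\{\oplus, \neg\}$ circuit with made-up wire areas; the sketch computes $E_a$ exactly, as switched wire area averaged over all $2^{2n}$ pairs of inputs, and shows that it equals half the total wire area, as the proof predicts.

```python
from fractions import Fraction
from itertools import product

# Hypothetical circuit over the basis {XOR, NOT}. Nodes 0..3 are the inputs;
# each gate is (operation, operand nodes, area of the wire it drives).
N_INPUTS = 4
INPUT_WIRE_AREA = [1, 1, 1, 1]
GATES = [
    ("xor", (0, 1), 2),   # node 4: x0 ^ x1
    ("xor", (2, 3), 2),   # node 5: x2 ^ x3
    ("not", (4,), 1),     # node 6: not (x0 ^ x1)
    ("xor", (6, 5), 3),   # node 7: not (x0 ^ x1 ^ x2 ^ x3)
    ("not", (7,), 1),     # node 8: parity of all four inputs
]


def evaluate(inputs):
    """Value on every wire: the input wires first, then the gates in order."""
    values = list(inputs)
    for op, args, _area in GATES:
        if op == "xor":
            values.append(values[args[0]] ^ values[args[1]])
        else:
            values.append(1 - values[args[0]])
    return values


def wire_areas():
    return INPUT_WIRE_AREA + [area for _op, _args, area in GATES]


def average_energy():
    """E_a: switched wire area averaged over all (initial, next) input pairs."""
    states = [evaluate(bits) for bits in product((0, 1), repeat=N_INPUTS)]
    areas = wire_areas()
    switched = sum(
        area
        for s0 in states
        for s1 in states
        for area, before, after in zip(areas, s0, s1)
        if before != after
    )
    return Fraction(switched, len(states) ** 2)


print(average_energy(), Fraction(sum(wire_areas()), 2))  # 13/2 13/2
```

Every wire computes a nonconstant affine function of the inputs, so each one switches on exactly half of the input pairs, regardless of the areas chosen.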
The definitions of $E_a$ and $E_{worst}$ yield the following corollary to Theorem 3.4.
Corollary 3.2:
To compute parity of $n$ nonconstant inputs with a VLSI circuit $C$ of area $A$ requires $E_{worst}(C) = \Omega(A)$ when the circuit basis is $\{\oplus, \neg\}$.
An alternate proof of the worst case lower bound for parity is presented below in Theorem 3.4A. 
The alternate proof yields a deterministic polynomial algorithm for computing a pair of states that 
induces a lot of energy. Theorems 3.4 and 3.4A together demonstrate the relative difficulty of average 
case analysis versus worst case analysis. 
Corollary 3.3:
To compute the parity function on $n$ boolean variables with a VLSI circuit $C_n$ of $O(\log n)$ depth with the I/O ports on a convex boundary of $C_n$ requires $E_a(C_n) = \Omega(n \log n)$ and $E_{worst}(C_n) = \Omega(n \log n)$ when the basis for $C_n$ is $\{\oplus, \neg\}$.
Proof:
Let $A$ be the area of $C_n$. By Theorem 3.4 and Corollary 3.2, $E_a(C_n) = \Omega(A)$ and $E_{worst}(C_n) = \Omega(A)$. Since circuit $C_n$ must fan in the $n$ inputs, and since nodes have indegree $\leq 2$, $A$ is at least as large as the area of a binary tree on $n$ leaves. [BK80] and [Ya81] showed that such a tree requires area $\Omega(n \log n)$ when the depth is $O(\log n)$ and the leaves are on a convex boundary.
$\therefore E_{worst}(C_n) \geq E_a(C_n) \geq \frac{1}{2} A = \Omega(n \log n)$. □
Theorem 3.4A below provides a direct proof of the worst case lower bound for parity, in the special case where the circuit basis is $\{\oplus\}$. The reader will note the relative complexity of the deterministic proof technique used in Theorem 3.4A, compared to the simple probabilistic argument used to prove the stronger result of Theorem 3.4. Theorem 3.4A is primarily due to Stephen Cook and uses an observation of Leslie Valiant [Va84].
Theorem 3.4A:
To compute parity on $n$ boolean variables with a VLSI circuit $C_n$ of area $A$ and $O(\log n)$ depth requires $E_{worst}(C_n) = \Omega(A)$ when the basis for $C_n$ is $\{\oplus\}$ and when $C_n$ contains no constant inputs and no nodes that compute a constant function.
Proof:
Note that when the basis for $C_n$ is $\{\oplus\}$, each node of $C_n$ computes a parity function of a nonempty subset of the inputs. Let $S$ be the set of nodes of $C_n$. Let $X = (x_1, \ldots, x_n)$ be the input nodes (variables) of the circuit (function).
Definition:
Let $p \in S$ and let $f_p$ be the parity function computed at node $p$. Let $X_p \subseteq X$ be such that $f_p$ is the parity function of inputs $X_p$. Let $w_{p1}, w_{p2}$ denote the output edges from $p$. (Recall that $C_n$ has fanout $\leq 2$.) Let $\mathrm{weight}(p) = \mathrm{area}(w_{p1}) + \mathrm{area}(w_{p2})$. When $p$ has fanout 1, $\mathrm{area}(w_{p2}) = 0$. When $p$ has fanout 0, $\mathrm{weight}(p) = 0$.
Lemma 3.2: [Stephen Cook]
There exists an assignment $B$ of boolean values to $x_1, \ldots, x_n$ such that when $C_n$ is in state $B$,
$$\sum_{p \in S \,\wedge\, f_p(X_p) = 1} \mathrm{weight}(p) \geq \frac{1}{2} A.$$
Proof of Lemma 3.2:
The following construction sequentially defines an assignment $B$ of values to the inputs, $B(x_1), \ldots, B(x_n)$, that will cause at least half the area of $C_n$ to be "1". In the following, $S_k$ is the subset of nodes of $C_n$ whose function depends on $x_k$ but on no input $x_i$ with $i > k$. $A_k$ is the area of the out edges of nodes in $S_k$. Since every node depends on a nonempty subset of the inputs, the sets $S_1, \ldots, S_n$ partition $S$, so $\sum_k A_k = A$.
More formally,
let $S_k = \{p \in S : x_k \in X_p \text{ and } \forall i > k,\ x_i \notin X_p\}$ and
let $A_k = \sum_{p \in S_k} \mathrm{weight}(p)$.
Basis of assignment: $B(x_1) = 1$
$$B(x_1) = 1 \Rightarrow \sum_{p \in S_1 \,\wedge\, f_p(x_1) = 1} \mathrm{weight}(p) = A_1$$
In general, suppose $B(x_1), \ldots, B(x_{k-1})$ have been determined. To determine $B(x_k)$:
There are two choices for $B(x_k)$.
Suppose $B(x_k) = 0$.
Let $W_0 = \sum_{p \in S_k \,\wedge\, f_p(X_p) = 0} \mathrm{weight}(p)$.
Let $W_1 = \sum_{p \in S_k \,\wedge\, f_p(X_p) = 1} \mathrm{weight}(p)$.
Case 1:
If $W_1 \geq \frac{1}{2} A_k$ then done, i.e. $B(x_k) = 0$.
Case 2:
If $W_1 < \frac{1}{2} A_k$ then set $B(x_k) = 1$. Since $\forall p \in S_k$, $f_p$ is a parity function and $x_k \in X_p$, changing $B(x_k)$ from 0 to 1 complements $f_p(X_p)$ for every $p \in S_k$. The weight of the nodes of $S_k$ with value 1 therefore becomes $W_0 = A_k - W_1 > \frac{1}{2} A_k$.
Note that setting $x_k$ does not affect the functions realized by nodes in $S_i$ for $1 \leq i < k$.
$$\therefore \sum_{p \in S_k \,\wedge\, f_p(X_p) = 1} \mathrm{weight}(p) \geq \frac{1}{2} A_k$$
$$\therefore \sum_{p \in S \,\wedge\, f_p(X_p) = 1} \mathrm{weight}(p) \geq A_1 + \frac{1}{2} \sum_{k=2}^{n} A_k \geq \frac{1}{2} A$$
□ (end of Lemma 3.2)
Since $C_n$ consists of $\oplus$ nodes only, $X = 0^n \Rightarrow$ all wires in $C_n$ have value 0. Let $B(X)$ be the value of $X$ determined by Lemma 3.2. If $C_n$ is switched such that $X: 0^n \to B(X)$, then $E_{worst}(C_n) \geq \frac{1}{2} A$.
□ (end of Theorem 3.4A)
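The greedy construction of Lemma 3.2 translates directly into code. In this Python sketch the circuit is a hypothetical random $\{\oplus\}$ circuit, each node is represented by its support $X_p$ as a bitmask together with a made-up weight, and $S_k$ is taken as the nodes whose highest-indexed input is $x_k$. The checks confirm the lemma's guarantee on many random circuits and that the all-zero input leaves every wire at 0.

```python
import random


def random_xor_circuit(n_inputs, n_gates, seed=0):
    """Hypothetical random circuit over the basis {XOR}.

    Each node is (support, weight): bit k-1 of `support` is set iff x_k is in
    X_p, and `weight` stands for the area of the node's out edges. Gates whose
    support cancels to the empty set would compute a constant, so they are
    skipped, as the theorem requires."""
    rng = random.Random(seed)
    supports = [1 << k for k in range(n_inputs)]     # the input nodes x_1..x_n
    while len(supports) < n_inputs + n_gates:
        a, b = rng.sample(range(len(supports)), 2)
        if supports[a] ^ supports[b]:
            supports.append(supports[a] ^ supports[b])
    return [(mask, rng.randint(0, 5)) for mask in supports]


def value(support, assignment):
    """f_p(X_p): parity of the assigned input bits inside the support."""
    return bin(support & assignment).count("1") % 2


def cook_assignment(nodes, n_inputs):
    """Greedy construction of B following the proof of Lemma 3.2."""
    assignment = 0                                  # every x_k tentatively 0
    for k in range(n_inputs):                       # bit k stands for x_{k+1}
        s_k = [(m, w) for m, w in nodes if m.bit_length() == k + 1]
        a_k = sum(w for _, w in s_k)
        w_1 = sum(w for m, w in s_k if value(m, assignment))
        if 2 * w_1 < a_k:                           # case 2: flip every node of S_k
            assignment |= 1 << k
    return assignment


def one_weight(nodes, assignment):
    """Total weight of the nodes whose value is 1 under the assignment."""
    return sum(w for m, w in nodes if value(m, assignment))


nodes = random_xor_circuit(8, 40, seed=1)
b = cook_assignment(nodes, 8)
print(one_weight(nodes, 0), one_weight(nodes, b), sum(w for _, w in nodes))
```

Switching the inputs from $0^n$ to the greedy assignment therefore turns on at least half of the total weight, which is the worst case bound of Theorem 3.4A in miniature.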
3.5 Open Problems
The USM lower bounds on parity are derived in the special case where the circuit contains only $\oplus$-gates and negations.
Conjecture:
To compute the parity function on $n$ bits by an $O(\log n)$ depth circuit in which the inputs are on a convex boundary requires $\Omega(n \log n)$ uniswitch energy.
The conjecture above does not restrict the basis of the parity circuit. Note that in order to obtain an $\Omega(\text{area})$ uniswitch lower bound for parity in the general case, a notion of a "minimal" circuit is required. This is because an extraneous circuit that uses $o(\text{area})$ energy can always be "attached" to a parity circuit.
What about the majority function? We believe that majority also requires superlinear uniswitch 
energy if computed by a shallow depth circuit. 
4. WORST CASE UPPER BOUNDS
4.1 Energy-Efficient OR and AND Circuits 
The energy-efficient OR circuit described in this section evolved from the simple observation that it is sufficient to turn on one OR input to turn on the output. Therefore, intuitively, even when many or all the inputs are turned on, only one of the "1" signals needs to propagate all the way to the output. In a completely analogous manner, it is sufficient to turn off one AND input in order to turn off the output. Thus,
when many inputs are turned off, only one "0" signal must propagate all the way to the output. This is the essence of the SOR (Smart OR) circuit and the SAND (Smart AND) circuit, described below. When many inputs to SOR are "1", all but one of these "1" signals are "killed", using the dual of SOR, which is SAND. Similarly, extraneous SAND inputs are "killed" using SOR signals.
The layout of the SOR/SAND circuit is designed so that the area used to "kill" signals (i.e. prevent "1" inputs from reaching the SOR output, and prevent "0" inputs from reaching the SAND output) is at most linear in the input size, and the area of both the "successful" path to an output plus the "killed" paths is at most linear in the input size.
The following recurrences describe the boolean functions $OR: \{0,1\}^n \to \{0,1\}$ and $AND: \{0,1\}^n \to \{0,1\}$ in a novel way. The reader can verify that $OR(x_1, \ldots, x_n) = x_1 \vee x_2 \vee \cdots \vee x_n$ and $AND(x_1, \ldots, x_n) = x_1 \wedge x_2 \wedge \cdots \wedge x_n$. The USM circuit realization of OR and AND is the energy-efficient SOR/SAND circuit.
Recurrences:
1) $OR(x_i, x_j) = x_i \vee (\bar{x}_i \wedge x_j)$
$OR(x_1, \ldots, x_n) = OR(x_1, \ldots, x_{n/2}) \vee [AND(\bar{x}_1, \ldots, \bar{x}_{n/2}) \wedge OR(x_{(n/2)+1}, \ldots, x_n)]$
2) $AND(x_i, x_j) = (x_i \vee \bar{x}_j) \wedge x_j$
$AND(x_1, \ldots, x_n) = [AND(x_1, \ldots, x_{n/2}) \vee OR(\bar{x}_{(n/2)+1}, \ldots, \bar{x}_n)] \wedge AND(x_{(n/2)+1}, \ldots, x_n)$
$OR(x_1, \ldots, x_n)$ is abbreviated by $OR(n)$. $OR(\bar{x}_1, \ldots, \bar{x}_n)$ is abbreviated by $OR(\bar{n})$. AND is similarly abbreviated. $(x_i, \ldots, x_n)$ is also written as $(x_i; x_n)$.
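The recurrences can be checked exhaustively for small $n$. In this Python sketch (function names hypothetical) the barred inputs $\bar{x}_i$ are modelled as $1 - x_i$, and each call splits its argument list in half exactly as the recurrences do.

```python
from itertools import product


def negate(bits):
    """The barred inputs: x-bar_i = 1 - x_i."""
    return [1 - b for b in bits]


def or_rec(xs):
    """OR(x) = OR(left) or [AND(left-bar) and OR(right)]."""
    if len(xs) == 1:
        return xs[0]
    half = len(xs) // 2
    left, right = xs[:half], xs[half:]
    return or_rec(left) | (and_rec(negate(left)) & or_rec(right))


def and_rec(xs):
    """AND(x) = [AND(left) or OR(right-bar)] and AND(right)."""
    if len(xs) == 1:
        return xs[0]
    half = len(xs) // 2
    left, right = xs[:half], xs[half:]
    return (and_rec(left) | or_rec(negate(right))) & and_rec(right)


for n in (2, 4, 8):
    for xs in product((0, 1), repeat=n):
        assert or_rec(list(xs)) == int(any(xs))
        assert and_rec(list(xs)) == int(all(xs))
print("recurrences agree with OR and AND for n = 2, 4, 8")
```

For $n = 2$ the two functions reduce to the base cases 1) and 2) above.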
The discussion that follows is a formal description of the construction used to obtain energy-efficient VLSI circuits. To clarify the formalism, the reader is advised to refer to Figures 4.0 and 4.1, which illustrate a VLSI circuit called LF. LF is an embedding in the plane of circuit SOR/SAND, which computes the functions OR/AND. Figure 4.0 illustrates LF on 2 inputs. Figure 4.1 recursively depicts LF on $n$ inputs. Circuit SOR/SAND and layout LF are precisely defined below.
Definition:
SOR/SAND(n) $= (V_{SD}, W_{SD})$ is a circuit, illustrated in Figures 4.0 and 4.1, such that
$V_{SD} = I_1 \cup I_2 \cup L$ where
$I_1$ are the input nodes $\{x_1, x_2, \ldots, x_n\}$ and
$I_2$ are the input nodes $\{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\}$.
$L$, the set of interior nodes, is as follows.
$L = \{(\vee, v_1^{i,k}), (\vee, v_2^{i,k}), (\wedge, v_3^{i,k}), (\wedge, v_4^{i,k}) \text{ where } 1 \leq i \leq \log_2 n,\ 1 \leq k \leq n/2^i\}$
$\{v_1^{\log n, 1}, v_4^{\log n, 1}\}$ are the output nodes. For consistency, $x_k$ is also denoted $v_1^{0,k}$ and $\bar{x}_k$ is also denoted as $v_4^{0,k}$. The nodes of $I_1$ and $I_2$ are labeled to indicate that the inputs that occur at nodes of $I_2$ are the negation of those at nodes of $I_1$.
$W_{SD}$, the set of edges, is as follows.
e(,k = iv i-l.2k v4,k) 
:, \ I ' ~ ' 
ei .k = (vi - l.2k vi ,k) 6 4 , 4 , 
The indices $i$, $j$ and $k$ are used to label the nodes and edges of SOR/SAND uniquely. The subscript $j$ distinguishes between types of nodes and edges, and superscripts $i$ and $k$ distinguish within a type. In particular, $i$ indexes SOR/SAND along a vertical axis, increasing from 0 at the inputs along the bottom to $\log n$ (i.e. depth(SOR/SAND)/2) at the top; $i$ is thus called a vertical index. $k$ indexes SOR/SAND along a horizontal axis, increasing from left to right, and is called a horizontal index.
Recall from section 2 that $s$ is a state function that attributes boolean values to the nodes and wires of a circuit. Thus, for $v \in V_{SD}$, $s(v)$ denotes the value of node $v$. In the upcoming analysis, when the input is not clear from context, $s_X(v)$ will denote the value of node $v$, where $X = (x_1, \ldots, x_n)$ is the input to SOR/SAND(n). Alternately, for $i \in \mathbb{N}$, $s_i(v)$ denotes the value of node $v$ at time $t_i$. Similarly, for $w \in W_{SD}$, $s(w)$ (or $s_X(w)$ or $s_i(w)$) denotes the value of the node at the tail of wire $w$. The state function is extended to sets of nodes and wires as follows. For $U \subseteq V_{SD} \cup W_{SD}$,
$s(U) = \{s(u) : u \in U\}$.
Let $F_n$ be the function realized by circuit SOR/SAND(n).
$F_n: \{0,1\}^n \to \{0,1\}^2$ such that $F_n(X) = (v_1^{\log n, 1}, v_4^{\log n, 1})$
The reader can verify that
$v_1^{\log n, 1} = OR(x_1, \ldots, x_n)$ and
$v_4^{\log n, 1} = \overline{OR}(x_1, \ldots, x_n) = AND(\bar{x}_1, \ldots, \bar{x}_n)$
Layouts of SOR/SAND(2) and SOR/SAND(n) are illustrated, respectively, in Figures 4.0 and 4.1. $F_n(x_1, \ldots, x_n)$ is abbreviated as $F_n(X)$ or $F(n)$. The following defines an embedding of a circuit in the plane.
Definition:
An $(I, J)$-grid-with-diagonals $GD_{IJ} = (\mathcal{V}, \mathcal{E})$ is a graph where
$\mathcal{V} = \{(k, m) \mid 0 \leq k < I,\ 0 \leq m < J\}$ (i.e. a set of cartesian coordinates), and edges of $\mathcal{E}$ join vertex pairs that are either unit distance apart or distance $\sqrt{2}$ apart. $GD_{44}$ is illustrated in Figure 4.2.
Definition:
A layout (embedding, placement), $\Psi$, of graph $G = (V, E)$ into $GD_{IJ}$ is a 1-to-1 mapping of $V$ into $\mathcal{V}$ and $E$ into paths (wires) of $\mathcal{E}$ such that $\forall (x, y) \in E$, $\Psi(x, y)$ is a path from $\Psi(x)$ to $\Psi(y)$, and every pair of paths in $\mathcal{E}$ is edge-disjoint.
Definitions:
$\mathrm{height}(GD_{IJ}) \triangleq I - 1$
$\mathrm{width}(GD_{IJ}) \triangleq J - 1$
Figures 4.0 and 4.1. Layout LF$(x_1, x_2)$, and layout LF$(x_1, \ldots, x_n)$ built recursively from LF$(x_1, \ldots, x_{n/2})$ and LF$(x_{n/2+1}, \ldots, x_n)$.
Figure 4.2. $GD_{44}$, with corner vertices $(0,0)$, $(0,3)$, $(3,0)$ and $(3,3)$.
$\mathrm{area}(GD_{IJ}) \triangleq \mathrm{height}(GD_{IJ}) \cdot \mathrm{width}(GD_{IJ})$
Let $LF(x_1, x_2, \ldots, x_n) = \Psi(\text{SOR/SAND}(n))$ such that input nodes are unit spaced on a line, with $\bar{x}_j$ placed to the immediate right of $x_j$ for $1 \leq j \leq n$, and the wire lengths are as follows.
$\|e_1^{i,k}\| = \|e_6^{i,k}\| = 2$, $\|e_3^{i,k}\| = \|e_4^{i,k}\| = 1$, $\|e_7^{i,k}\| = \|e_8^{i,k}\| = \sqrt{2}$, and
The relative location of the nodes of SOR/SAND(n) in the layout is evident from the recursive description of LF, illustrated in Figures 4.0 and 4.1. $LF(x_1, \ldots, x_n)$ is abbreviated by $LF(n)$ or LF. Some facts about $LF(n)$ and SOR/SAND(n):
1) $\mathrm{height}(LF(n)) = 2 + \mathrm{height}(LF(n/2)) = 2\log_2 n$
2) $\mathrm{area}(LF(n)) = \mathrm{height}(LF(n)) \cdot \mathrm{width}(LF(n)) = 2\log_2 n \cdot (2n - 1) \leq 4n \log_2 n$
Note that if $T(n)$ is a complete binary tree with $n$ leaves unit spaced on a line, $\mathrm{height}(T(n)) = \log n$ and $\mathrm{width}(T(n)) = n - 1$.
$\therefore \mathrm{area}(LF(n)) \approx 4 \cdot \mathrm{area}(T(n))$.
3) Let $D(n)$ be the depth of the SOR/SAND(n) circuit.
$D(n) = 2 + D(n/2) = 2\log_2 n$
4) The reader can verify from the recurrence for $OR(n)$ and layout $LF(n)$ that $\forall X \in \{0,1\}^n$, $(s_X(e_1^{i,k}), s_X(e_7^{i,k})) \neq (1,1)$ for $1 \leq i \leq \log_2 n$, $1 \leq k \leq n/2^i$. Similarly, $(s_X(e_6^{i,k}), s_X(e_8^{i,k})) \neq (0,0)$ for $1 \leq i \leq \log_2 n$, $1 \leq k \leq n/2^i$, from the recurrence for $AND(n)$ or by duality.
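Fact 4 concerns the two signals that meet at each combining gate. The sketch below evaluates the dual-rail recurrences functionally, ignoring the layout, and checks both halves of Fact 4 exhaustively for small $n$. It names signals by what they compute instead of by the edge labels $e_j^{i,k}$, so matching `or_left` and `gated_or` to particular edges is an assumption about Figures 4.0 and 4.1.

```python
from itertools import product


def sor_sand(xs, steps):
    """Dual-rail evaluation of the recurrences: returns (OR(x), AND(x-bar)).

    At every combining step, appends (or_left, gated_or, gated_and, and_bar_right):
      OR(x)      = or_left | gated_or,          gated_or  = AND(left-bar) & OR(right)
      AND(x-bar) = gated_and & and_bar_right,   gated_and = AND(left-bar) | OR(right)
    """
    if len(xs) == 1:
        return xs[0], 1 - xs[0]
    half = len(xs) // 2
    or_l, and_bar_l = sor_sand(xs[:half], steps)
    or_r, and_bar_r = sor_sand(xs[half:], steps)
    gated_or = and_bar_l & or_r      # a 1 from the right passes only if the left is all 0
    gated_and = and_bar_l | or_r     # dual "kill" signal in the AND plane
    steps.append((or_l, gated_or, gated_and, and_bar_r))
    return or_l | gated_or, gated_and & and_bar_r


for n in (2, 4, 8):
    for xs in product((0, 1), repeat=n):
        steps = []
        out_or, out_and_bar = sor_sand(list(xs), steps)
        assert out_or == int(any(xs)) and out_and_bar == 1 - out_or
        # OR plane: the two inputs of each combining OR gate are never both 1.
        assert all(not (or_l and g_or) for or_l, g_or, _, _ in steps)
        # AND plane: the two inputs of each combining AND gate are never both 0.
        assert all(g_and or and_r for _, _, g_and, and_r in steps)
```

Because at most one input of each combining OR gate is ever 1, a "1" arriving from the right half is blocked whenever the left half already produced one; this is the mechanism Lemma 4.3 turns into an area bound.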
The formal discussion that follows derives the upper bound on the worst case energy used by LF. Intuitively, the analysis below proceeds by first partitioning LF into a subgraph of "short" wires and a subgraph of "long" wires. The short wires are shown to constitute only $O(n)$ area, and are thus eliminated from subsequent discussion. Further, since LF consists of dual SOR and SAND subcircuits, only the long wires of the SOR subcircuit are fully analyzed. The long wires are shown to occupy $O(n \log n)$ area, but further analysis shows they use only $O(n)$ worst case energy. The reader is advised to refer to Figure 4.1, which depicts layout LF of circuit SOR/SAND(n), while reading the following definitions.
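Definitions:
Recall that circuit SOR/SAND(n) $= (I_1 \cup I_2 \cup L, W_{SD})$.
Let $\vee T(m, n) = (\vee V(m, n), \vee E(m, n))$ be a labeled subgraph of SOR/SAND(n) such that
$\vee V(m, n) = I_1 \cup \{v_1^{i,k}, v_3^{i,k} \mid 1 \leq i \leq \lceil \log(n - m + 1) \rceil,\ \lceil m/2^i \rceil \leq k \leq \lceil n/2^i \rceil\}$ and
$\vee E(m, n) = \{e_1^{i,k}, e_2^{i,k}, e_4^{i,k}, e_7^{i,k} \mid 1 \leq i \leq \lceil \log(n - m + 1) \rceil,\ \lceil m/2^i \rceil \leq k \leq \lceil n/2^i \rceil\}$
Below, $\wedge T(m, n)$ is defined analogously to $\vee T(m, n)$, as can be seen from Figure 4.1.
Let $\wedge T(m, n) = (\wedge V(m, n), \wedge E(m, n))$ be a labeled subgraph of SOR/SAND(n) such that
$\wedge V(m, n) = I_2 \cup \{v_2^{i,k}, v_4^{i,k} \mid 1 \leq i \leq \lceil \log(n - m + 1) \rceil,\ \lceil m/2^i \rceil \leq k \leq \lceil n/2^i \rceil\}$ and
$\wedge E(m, n) = \{e_3^{i,k}, e_5^{i,k}, e_6^{i,k}, e_8^{i,k} \mid 1 \leq i \leq \lceil \log(n - m + 1) \rceil,\ \lceil m/2^i \rceil \leq k \leq \lceil n/2^i \rceil\}$
$\vee$longwires $\triangleq \{e_2^{i,k} \mid 1 \leq i \leq \log n,\ 1 \leq k \leq n/2^i\}$
$\vee$shortwires $\triangleq \{e_1^{i,k}, e_4^{i,k}, e_7^{i,k} \mid 1 \leq i \leq \log n,\ 1 \leq k \leq n/2^i\}$
$\wedge$longwires $\triangleq \{e_5^{i,k} \mid 1 \leq i \leq \log n,\ 1 \leq k \leq n/2^i\}$
$\wedge$shortwires $\triangleq \{e_6^{i,k}, e_3^{i,k}, e_8^{i,k} \mid 1 \leq i \leq \log n,\ 1 \leq k \leq n/2^i\}$
longwires $\triangleq \vee$longwires $\cup\ \wedge$longwires
shortwires $\triangleq \vee$shortwires $\cup\ \wedge$shortwires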
In the following, node and edge indices $i$ and $k$ are mapped to input indices $i_1$, $i_2$, $i_3$, and $i_4$.
For $i, k > 0$,
let $i_1 = (k-1)2^i + 1$,
$i_2 = (2k-1)2^{i-1}$,
$i_3 = i_2 + 1$,
$i_4 = k 2^i$
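The index mapping above can be sketched as follows; `input_block` is a hypothetical helper. For node indices $(i, k)$, the inputs $x_{i_1}, \ldots, x_{i_2}$ lie under the left child and $x_{i_3}, \ldots, x_{i_4}$ under the right child, each block holding $2^{i-1}$ inputs, and for fixed $i$ the blocks tile $1, \ldots, n$.

```python
def input_block(i, k):
    """Return (i1, i2, i3, i4) for node/edge indices (i, k), with i, k > 0."""
    i1 = (k - 1) * 2 ** i + 1      # first input under the left child
    i2 = (2 * k - 1) * 2 ** (i - 1)  # last input under the left child
    i3 = i2 + 1                    # first input under the right child
    i4 = k * 2 ** i                # last input under the right child
    return i1, i2, i3, i4


print(input_block(2, 2))  # (5, 6, 7, 8)
```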
Definitions:
$\vee$longwires under $e_7^{i,k} \triangleq \vee E(i_1, i_2) \cap \vee$longwires
$\wedge$longwires under $e_8^{i,k} \triangleq \wedge E(i_3, i_4) \cap \wedge$longwires
Let $A_{\vee S}(n)$ be the area of the $\vee$shortwires in layout $LF(n)$. Let $A_{\wedge S}(n)$ be the area of the $\wedge$shortwires in layout $LF(n)$.
Lemma 4.1:
$A_{\vee S}(n) = O(n)$
$A_{\wedge S}(n) = O(n)$
Proof:
$A_{\vee S}(2) = 3 + \sqrt{2}$
$A_{\vee S}(n) = 2A_{\vee S}(n/2) + (3 + \sqrt{2}) \leq 10n$
$A_{\wedge S}(n) = O(n)$ by symmetry of layout $LF(n)$. □
Thus, since the shortwires can contribute at most $O(n)$ energy, the remaining analysis considers only the longwires. Further, only the $\vee$longwires are discussed in detail. The $\wedge$longwires follow by duality of the circuits and symmetry of the layouts.
Lemma 4.2:
If $OR(x_1, \ldots, x_n) = 0$ then $s(\vee\text{longwires}) = \{0\}$ and $s(\wedge\text{longwires}) = \{1\}$.
Proof:
In the following analysis, a node (or edge) and its respective function share the same label.
Recall that $\vee$longwires $= \{e_2^{i,k} \mid 1 \leq i \leq \log n,\ 1 \leq k \leq n/2^i\}$. The following shows that $s(\vee\text{longwires}) = \{0\}$ when $OR(n) = 0$; $n$ is a power of 2.
Consider $V_1 = \{v_1^{i,k} \mid 1 \leq i \leq \log n,\ 1 \leq k \leq n/2^i\}$.
Assume for the moment that $s(V_1) = \{0\}$.
Since $v_1^{i,k} = e_1^{i,k} \vee e_2^{i,k}$, then $s(v_1^{i,k}) = 0 \Rightarrow s(e_2^{i,k}) = 0$. And since the head of every $e_2^{i,k}$ edge is a $v_1^{i,k}$ node, $s(V_1) = \{0\} \Rightarrow s(\vee\text{longwires}) = \{0\}$.
It is left to show that $s(V_1) = \{0\}$.
Induction on $i$, the vertical index of SOR/SAND(n), which computes $OR(n)$.
basis: $i = 1$
Each node $v_1^{1,k}$ computes $OR(x_{2k-1}, x_{2k})$; for $n = 2$, SOR/SAND(2) realizes $OR(x_1, x_2)$, which is the function computed by node $v_1^{1,1}$. Each such value is 0 by hypothesis.
induction step:
Consider $v_1^{i,k} \in V_1$.
$v_1^{i,k} = v_1^{i-1,2k-1} \vee (v_4^{i-1,2k-1} \wedge v_1^{i-1,2k})$
By the induction hypothesis, $s(v_1^{i-1,2k-1}) = 0$ and $s(v_1^{i-1,2k}) = 0$.
$\therefore s(v_1^{i,k}) = 0$.
That $s(\wedge\text{longwires}) = \{1\}$ follows from the duality of OR and AND. □
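Lemma 4.3:
$\forall X \in \{0,1\}^n$, $1 \leq i \leq \log n$, $1 \leq k \leq n/2^i$,
(a) $s(e_7^{i,k}) = 1 \Rightarrow s(\vee\text{longwires under } e_7^{i,k}) = \{0\}$,
(b) $s(e_8^{i,k}) = 0 \Rightarrow s(\wedge\text{longwires under } e_8^{i,k}) = \{1\}$.
Proof:
(a) $s(e_7^{i,k}) = 1 \Rightarrow s(e_1^{i,k}) = 0$ by Fact 4, and $s(e_1^{i,k}) = s(v_1^{i-1,2k-1})$.
But $s(v_1^{i-1,2k-1}) = OR(x_{i_1}, \ldots, x_{i_2}) = 0$.
$\therefore$ by Lemma 4.2, $s(\vee\text{longwires under } e_7^{i,k}) = \{0\}$.
(b) follows by the duality of OR and AND. □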
Definitions:
A legal state of circuit SOR/SAND(n) is called a $k$-state if $OR(n) = k$, for $k \in \{0,1\}$. If $G = (V, E)$ is a circuit embedded in $GD_{IJ}$ and $k \in \{0,1\}$, then the $k$-area of $G$ (or $E$) is
$$\lambda \sum_{w \in E \,\&\, s(w) = k} \|w\|,$$
where $\lambda$ is a technology dependent constant $> 0$.
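Lemma 4.4:
For any 1-state of circuit SOR/SAND(n), the 1-area of $\Psi(\vee T(1,n)) \leq 10n$ and the 0-area of $\Psi(\wedge T(1,n)) \leq 10n$.
Proof:
Consider $\Psi(\vee T(1,n))$.
Let $A_{\vee 1}(n)$ be the 1-area of $\vee$longwires in layout $LF(n)$. Recall from Lemma 4.3 that for $w \in \vee$longwires, $s(w) = 1 \Rightarrow s(\vee\text{longwires under } w) = \{0\}$. Hence the following recurrence for $A_{\vee 1}$:
$A_{\vee 1}(2) \leq 1 + \sqrt{2}$
$A_{\vee 1}(n) \leq A_{\vee 1}(n/2) + n + \sqrt{2} - 1 \leq 2n + (\sqrt{2} - 1)\log_2 n$
The recurrence for $A_{\vee S}$ in the proof of Lemma 4.1 solves to $A_{\vee S}(n) = (3 + \sqrt{2})(n - 1)$, and the $\vee$shortwires can contribute at most this much 1-area. Hence the 1-area of $\Psi(\vee T(1,n))$ is at most $(3 + \sqrt{2})(n - 1) + 2n + (\sqrt{2} - 1)\log_2 n \leq 10n$, which is the first part of the Lemma. By duality of OR and AND, and by the symmetry of LF, the 0-area of $\Psi(\wedge T(1,n)) \leq 10n$. □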
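The area recurrences of Lemmas 4.1 and 4.4 can be evaluated numerically. This sketch takes both recurrences as stated above (with $n$ a power of 2) and checks the closed form $(3 + \sqrt{2})(n - 1)$ for the short wires, the bound on the long-wire 1-area, and the combined $10n$ bound, for $n$ up to $2^{30}$. It checks the arithmetic only, not the layout itself.

```python
import math

SQRT2 = math.sqrt(2)


def a_vshort(n):
    """Area of the vshortwires: A(2) = 3 + sqrt2, A(n) = 2 A(n/2) + 3 + sqrt2."""
    if n == 2:
        return 3 + SQRT2
    return 2 * a_vshort(n // 2) + 3 + SQRT2


def a_vone(n):
    """Bound on the 1-area of the vlongwires:
    A(2) = 1 + sqrt2, A(n) = A(n/2) + n + sqrt2 - 1."""
    if n == 2:
        return 1 + SQRT2
    return a_vone(n // 2) + n + SQRT2 - 1


for j in range(1, 31):
    n = 2 ** j
    assert math.isclose(a_vshort(n), (3 + SQRT2) * (n - 1))
    assert a_vone(n) <= 2 * n + (SQRT2 - 1) * math.log2(n)
    assert a_vshort(n) + a_vone(n) <= 10 * n
```

Unrolling the second recurrence gives exactly $2n - 2 + (\sqrt{2} - 1)\log_2 n$, so the logarithmic term is why the bound is stated as $2n + (\sqrt{2} - 1)\log_2 n$ rather than $2n$.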
Theorem 4.1:
For all pairs of legal states, the worst case energy used by $LF(n)$ is $E_{worst}(LF(n)) = O(n)$.
Proof:
Let $s_1$ and $s_2$ be two legal states of LF. There are 3 cases in which LF consumes energy when LF switches from $s_1$ to $s_2$.
1) $s_1$ is a 0-state and $s_2$ is a 1-state. By Lemma 4.2, $s_1$ a 0-state $\Rightarrow s(\vee\text{longwires}) = \{0\}$ and $s(\wedge\text{longwires}) = \{1\}$; indeed, since all inputs are 0 in a 0-state, every wire of $\Psi(\vee T(1,n))$ is 0 and every wire of $\Psi(\wedge T(1,n))$ is 1. By Lemma 4.4, when LF switches to state $s_2$, at most $10n$ area of $\Psi(\vee T(1,n))$ switches on and at most $10n$ area of $\Psi(\wedge T(1,n))$ switches off, consuming at most $20n$ energy.
2) $s_1$ is a 1-state and $s_2$ is a 0-state. The argument is as above, except that the 1-area of $\Psi(\vee T(1,n))$ switches off and the 0-area of $\Psi(\wedge T(1,n))$ switches on.
3) $s_1$ and $s_2$ are both 1-states and $s_1 \neq s_2$. At most twice the switching of case 1 occurs in this case, using at most $40n$ energy. □
4.2 Compare Functions 
The technique developed in the previous section to yield energy-efficient VLSI circuits for OR and 
AND functions can be applied further. In particular, this section shows how to extend the technique to 
produce energy-efficient circuits that compare boolean bit strings lexicographically.
Definition:
Let $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ such that $a_i, b_i \in \{0,1\}$ for $1 \leq i \leq n$. Then $A = B$ iff $\forall i,\ 1 \leq i \leq n$, $a_i = b_i$.
Theorem 4.2:
$A = B$ can be computed by a VLSI circuit $C_n$ of $O(\log n)$ depth and $E_{worst}(C_n) = O(n)$ when the $n$ inputs are on a convex boundary of $C_n$.
Proof:
$$A = B \iff \Big[\bigwedge_{1 \leq i \leq n} (a_i = b_i)\Big] = 1 \iff \Big[\neg \bigvee_{1 \leq i \leq n} (a_i \oplus b_i)\Big] = 1$$
Embed $a_i$ and $b_i$ such that they are $O(1)$ distance apart. Clearly $(a_i \oplus b_i)$ can be computed in $O(1)$ energy. Thus, by Theorem 4.1, $\bigvee_{1 \leq i \leq n} (a_i \oplus b_i)$ can be computed in $E_{worst} = O(n)$ when the circuit depth is $O(\log n)$ and $a_i, b_i$ are on a convex boundary of the layout. □
To obtain energy-efficient VLSI circuits for compare functions such as $>$, $\geq$, etc. (e.g. is $X > Y$, where $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_n)$ are lexicographic boolean bit strings), a VLSI circuit EG, illustrated in Figure 4.3 below, is constructed from three modified instances of the SOR/SAND circuit and some connecting circuitry. EG contains an instance of SOR/SAND as described in section 4.1, and a subcircuit called SAND/SOR in which the logical gates are reversed, i.e. $\wedge$-gates in SOR/SAND become $\vee$-gates in SAND/SOR and vice versa. The third "plane" of circuitry resembles "half" a SOR/SAND circuit. This partial SOR/SAND-like subcircuit, indicated by the striped lines in Figure 4.3, is made energy-efficient by "piggybacking" off the complete SOR/SAND subcircuit.
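The construction of EG is given in [Ki87]. As a hedged illustration of why lexicographic comparison fits the same pattern as the OR recurrence (the first decisive signal wins and later ones are suppressed), the sketch below computes $X > Y$ and $X = Y$ by halving the strings. It is a standard divide-and-conquer comparator chosen for illustration and models only the Boolean function, not EG's three planes, its layout, or its energy.

```python
from itertools import product


def gt_eq(xs, ys):
    """Return (X > Y, X == Y) for equal-length bit lists, most significant bit first."""
    if len(xs) == 1:
        x, y = xs[0], ys[0]
        return x & (1 - y), 1 - (x ^ y)
    half = len(xs) // 2
    gt_hi, eq_hi = gt_eq(xs[:half], ys[:half])
    gt_lo, eq_lo = gt_eq(xs[half:], ys[half:])
    # The high half decides unless it is tied; only then may the low half's answer pass,
    # just as OR(right) passes only when AND(left-bar) = 1 in the OR recurrence.
    return gt_hi | (eq_hi & gt_lo), eq_hi & eq_lo


def as_int(bits):
    """Value of a most-significant-bit-first bit sequence."""
    return int("".join(map(str, bits)), 2)


for n in (1, 2, 4):
    for xs in product((0, 1), repeat=n):
        for ys in product((0, 1), repeat=n):
            expected = (int(as_int(xs) > as_int(ys)), int(as_int(xs) == as_int(ys)))
            assert gt_eq(list(xs), list(ys)) == expected
```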
Theorem 4.3:
The worst case energy used by $EG(x_n, y_n, \ldots, x_1, y_1)$ is $E_{worst}(EG(n)) = O(n)$.
Proof Idea: Intuitively, each SOR/SAND-like "plane" uses $O(n)$ energy by Theorem 4.1. The connecting wires (denoted by the textured lines in Figure 4.3) use area $O(n)$ and hence energy $O(n)$. The combined circuit thus uses only $O(n)$ energy in the worst case. The formal details of the construction and analysis can be found in [Ki87].
4.3 Recent Results 
Kissin et al. [KKTV90] recently extended the SOR/SAND technique to $k$-threshold functions. They have obtained a linear upper bound on the worst case uniswitch energy required to compute a $k$-threshold function, where $k$ is a fixed constant.
5. AVERAGE ENERGY 
5.1 Definitions and Easy Bounds 
This section derives the basic definitions needed to discuss average energy, and analyzes some simple circuits. In particular, an $n$-leaf complete tree whose interior nodes are either all $\wedge$-gates or all $\vee$-gates is shown to use $O(n)$ average energy, when the leaves are embedded on a convex boundary of the layout. A novel adder layout that uses linear average energy is described in section 5.2.
Figure 4.3. EG$(x_n, y_n, \ldots, x_1, y_1)$, built recursively from EG$(x_n, y_n, \ldots, x_{n/2+1}, y_{n/2+1})$ and EG$(x_{n/2}, y_{n/2}, \ldots, x_1, y_1)$; nodes are marked as input or input/output. Most $i, k$ superscripts are omitted for clarity.
Recall from section 2 the following definition of average switching energy.
Definition:
If $E_w(C_n, s_0, X)$ (ref. section 2) is the wire energy dissipated when $C_n: s_0 \to X$, then $E_a(C_n)$, the average switching energy, is
$$E_a(C_n) \triangleq \sum_{(s_0, X)} E_w(C_n, s_0, X) \,/\, 2^{2n}$$
where $n = |X|$ and $2^{2n}$ is the number of input pairs. $E_a$ is also written as a function of $n$, i.e. $E_a(n)$.
$E_a(C_n)$ averages the wire energy over all pairs of inputs, which are assumed to be equally likely. Note that the definition of $E_a$ in [Ki82] averages the wire energy over all inputs to a circuit in a particular state. Thus, the definition above is stronger in that a lower bound for $E_a(C_n)$ does not depend on picking a "bad" initial state.
Average switching is defined analogously to average energy as follows.
Definition:
If $Sw(C_n, s_0, X)$ is the number of wires in circuit $C_n$ that switch when $C_n: s_0 \to X$, then $Sw_a(C_n)$, the average switching, is given by
$$Sw_a(C_n) = \sum_{(s_0, X)} Sw(C_n, s_0, X) \,/\, 2^{2n}$$
where $n = |X|$ and $2^{2n}$ is the number of input pairs. $Sw_a$ is also written as a function of $n$, i.e. $Sw_a(n)$. Note that $Sw_a(C_n)$ averages the number of wires that switch, while $E_a(C_n)$ averages the area of the wires of $C_n$ that switch. Implicit in the definitions of $E_a(C_n)$ and $Sw_a(C_n)$ is the assumption that the inputs to $C_n$ are uniformly distributed over $\{0,1\}$.
Definitions:
Let $w = (v, u)$ be a directed edge (wire) of circuit $C$. If $L$ is the length of the longest path from any input to node $v$, then edge $w$ is at level $L$. For consistency, an input node is called a level 0 wire. A complete binary tree with $n$ leaves and an $\vee$-gate ($\wedge$-gate) at each node is called an $n$-OR ($n$-AND) tree.
The following analysis shows that the average energy used by an $n$-AND tree or an $n$-OR tree is $\Theta(n)$ when the inputs are uniformly distributed over $\{0,1\}$.
Switching Lemma 5.1:
An $n$-OR ($n$-AND) tree $T_n$ has average switching $Sw_a(n) = \Theta(n)$.
Proof:
$Sw_a(n) = O(n)$ follows from the fact that $T_n$ contains $O(n)$ wires.
$Sw_a(n) = \Omega(n)$ follows from the fact that $\Pr(\text{level 0 wire switches}) = 1/2$. Thus, $n/2$ level 0 wires switch on the average. $\therefore Sw_a(n) = \Theta(n)$. □
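Switching Lemma 5.1 can be checked by brute force. This sketch assumes the wires of $T_n$ are the $n$ input wires plus the outputs of the $n - 1$ OR gates, and averages the number of switched wires over all $2^{2n}$ input pairs, exactly as in the definition of $Sw_a$.

```python
from itertools import product


def or_tree_wires(xs):
    """Values on all 2n - 1 wires of a complete n-OR tree (n a power of 2):
    the n input wires, then the outputs of the OR gates level by level."""
    wires = list(xs)
    level = list(xs)
    while len(level) > 1:
        level = [level[j] | level[j + 1] for j in range(0, len(level), 2)]
        wires.extend(level)
    return wires


def average_switching(n):
    """Sw_a(n): number of switched wires averaged over all 2^(2n) input pairs."""
    states = [or_tree_wires(bits) for bits in product((0, 1), repeat=n)]
    total = sum(
        sum(before != after for before, after in zip(s0, s1))
        for s0 in states
        for s1 in states
    )
    return total / len(states) ** 2


print(average_switching(4))  # 2.8671875
```

The inputs alone contribute $n/2$ switched wires on average, and the wires higher in the tree add little because an OR of many inputs is almost always 1.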
Theorem 5.1:
There exists a layout of an $n$-OR ($n$-AND) tree $T_n$ with leaves on a convex boundary, which consumes average energy $E_a(n) = \Theta(n)$.
Proof:
By [BK82] a complete tree embedded with $n$ leaves on a convex boundary requires $\Omega(n \log n)$ area. Consider $T_n$, a standard embedding illustrated in Figure 5.1. Input node $i$ is at position $(i, 0)$, so that all $\vee$-gates have an x-coordinate $k + 1$, for $i, k \in \mathbb{N}$. Since the vertical wire segments of $T_n$ contribute only $O(n)$ area, they are omitted from the following analysis. Let $A(k)$ denote the area of the horizontal wires of $T_k$. $A(k)$ is determined by the following recurrence, where $n$ is a power of 2.
$A(2) = 1$
$A(n) = 2A(n/2) + n/2$
The probability, $\Pr$, that a wire of $T_n$ switches is as follows:
$\Pr(\text{level 0 wire (i.e. input) switches}) = 1/2$
$$\Pr(\text{level } t \text{ wire switches}) = \frac{2^{2^t} - 1}{2^{2^{t+1} - 1}}$$
since a level $t$ wire carries the OR of $2^t$ inputs, which is 0 with probability $2^{-2^t}$.
Let $E_a(k)$ denote the average energy of (the horizontal wires of) $T_k$. $E_a(k)$ is obtained recursively below from the switching probabilities and the area recurrence for $A$.
$E_a(2) = 1/2$
$$E_a(n) = 2E_a(n/2) + \frac{n}{2}\left(\frac{2^{n/2} - 1}{2^{n-1}}\right)$$
where $\frac{n}{2}\left(\frac{2^{n/2} - 1}{2^{n-1}}\right)$ is the horizontal wire area at level $(\log n) - 1$ times the probability that this area switches.
$$n \to \infty \Rightarrow \frac{n}{2}\left(\frac{2^{n/2} - 1}{2^{n-1}}\right) \to 0. \quad \therefore E_a(n) = O(n).$$
$E_a(n) = \Omega(n)$ follows from the fact that $\Pr(\text{level 0 wire switches}) = 1/2$. Thus, the average number of level 0 wires that switch is $n/2$. Since each input wire is at least 1 unit long, $E_a(n) = \Omega(n)$.
$\therefore E_a(n) = \Theta(n)$. □
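The probability formula and the energy recurrence can be evaluated exactly with rational arithmetic. A level $t$ wire switches with probability $2q_t(1 - q_t)$ where $q_t = 2^{-2^t}$; the sketch checks that this matches the formula above, evaluates $A(n)$ and $E_a(n)$ from their recurrences (function names are ours), and compares $E_a(n)$ with a closed form.

```python
from fractions import Fraction


def p_switch(t):
    """Pr(a level-t wire switches) = (2^(2^t) - 1) / 2^(2^(t+1) - 1)."""
    return Fraction(2 ** (2 ** t) - 1, 2 ** (2 ** (t + 1) - 1))


def horizontal_area(n):
    """A(2) = 1, A(n) = 2 A(n/2) + n/2."""
    return 1 if n == 2 else 2 * horizontal_area(n // 2) + n // 2


def average_energy(n):
    """E_a(2) = 1/2, E_a(n) = 2 E_a(n/2) + (n/2) * p_switch(log2(n) - 1)."""
    if n == 2:
        return Fraction(1, 2)
    top_level = n.bit_length() - 2          # log2(n) - 1 for a power of 2
    return 2 * average_energy(n // 2) + Fraction(n, 2) * p_switch(top_level)


for t in range(5):
    q = Fraction(1, 2 ** (2 ** t))          # Pr(the OR of 2^t inputs is 0)
    assert p_switch(t) == 2 * q * (1 - q)

for j in range(1, 13):
    n = 2 ** j
    assert horizontal_area(n) == (n // 2) * j                      # Theta(n log n) area
    assert average_energy(n) == Fraction(n, 2) - Fraction(n, 2 ** n)

print(average_energy(8))  # 127/32
```

Because $q_{t+1} = q_t^2$, the switching probabilities telescope, and the recurrence evaluates to $E_a(n) = n/2 - n/2^n$: linear average energy, even though the horizontal area grows as $(n/2)\log_2 n$.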
5.2 An Average Energy-Efficient Adder Layout 
The Brent/Kung layout [BK82] of a shallow depth parallel prefix adder [LF80] uses $O(n \log n)$ energy both in the worst case and average case. While section 3 showed that this is optimal in the worst case, this section shows that, on the average, one can do better.
The recursive construction described in this section uses a layout technique of section 4 to obtain an embedding of the parallel prefix adder that uses, on the average, $O(n)$ energy. The following definitions introduce terminology that is later used to describe layouts.
Definitions:
Let $p$ be a node in a VLSI circuit $A$, embedded at location $(x_p, y_p)$, and let $q$ be a node in $A$ embedded at coordinate $(x_q, y_q)$. Let $d \in \mathbb{Z}^+$.
(i) $q$ is @ $p$ if $(x_q, y_q) = (x_p, y_p)$.
(ii) $q$ is $d$ units north (also written as N) of $p$ if $(x_q, y_q) = (x_p, y_p + d)$.
(iii) $q$ is $d$ units south (S) of $p$ if $(x_q, y_q) = (x_p, y_p - d)$.
(iv) $q$ is $d$ units east (E) of $p$ if $(x_q, y_q) = (x_p + d, y_p)$.
(v) $q$ is $d$ units west (W) of $p$ if $(x_q, y_q) = (x_p - d, y_p)$.
(vi) $q$ is $(d\sqrt{2})$ units north-east (NE) of $p$ if $(x_q, y_q) = (x_p + d, y_p + d)$.
(vii) $q$ is $(d\sqrt{2})$ units south-east (SE) of $p$ if $(x_q, y_q) = (x_p + d, y_p - d)$.
$d$ is called the displacement. An element of $\{N, S, E, W, NE, SE, @\}$ is called a heading. Headings are also defined on coordinates directly. For example, $q$ is two units S of $(x_1, y_1)$ if $(x_q, y_q) = (x_1, y_1 - 2)$. Clearly $d = 0$ for the heading @.
Figure 5.1. An $O(n \log n)$ layout for an $n$-OR tree.
Let $S(n)$ denote a VLSI circuit (i.e. an embedded circuit) that computes the carries generated by adding two $n$-bit binary numbers. In particular, $S(n)$ receives as input two vectors, $(p_1, \ldots, p_n)$ and $(g_1, \ldots, g_n)$, where $p_i = a_i \oplus b_i$ and $g_i = a_i \wedge b_i$ for $1 \leq i \leq n$, and $\sum_i a_i 2^{i-1}$ and $\sum_i b_i 2^{i-1}$ are the two $n$-bit binary numbers being added. $S(n)$ produces as output the carry vector $(c_1, \ldots, c_n)$ defined recursively as $c_0 = 0$, $c_i = g_i \vee (p_i \wedge c_{i-1})$ for $1 \leq i \leq n$. Once the carries are computed, the sum vector $(s_1, \ldots, s_{n+1})$, defined as $s_i = p_i \oplus c_{i-1}$ for $1 \leq i \leq n$ and $s_{n+1} = c_n$, can be computed.
S(n) is defined recursively. An abstract view of S(n), called generic S(n), is shown in Figure 5.5. Four corners of S(n) are distinguished, moving clockwise from the bottom left corner, as BLC, TLC, TRC, and BRC. Each of these denotes the coordinates of the gate or I/O port embedded at the respective corner. The following are computed at these corners of S(n): carry generate $g_{i+n-1}$ at BLC; carry propagate $p_i$ at BRC; block generate $G(n, i)$, which is the carry generated by the inputs to S(n), at TLC; and block propagate $P(n, i) = p_i \wedge p_{i+1} \wedge \cdots \wedge p_{i+n-1}$ at TRC.
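Suppose G(n, i) is read as the carry out of bits $i, \ldots, i+n-1$ when the carry into bit i is 0. This is one interpretation of "the carry generated by the inputs to S(n)". Under that reading, the standard carry-lookahead identity $c_{i+n-1} = G(n, i) \vee (P(n, i) \wedge c_{i-1})$ holds. The sketch below checks the identity exhaustively for 4-bit inputs. The helper names `block_gp` and `ripple` are hypothetical.

```python
from itertools import product


def block_gp(p, g, i, n):
    """Block generate G(n, i) and block propagate P(n, i) over bits i..i+n-1.

    p and g are 1-indexed through a leading dummy entry (index 0 unused).
    """
    G = 0                       # carry produced inside the block with carry-in 0
    for j in range(i, i + n):
        G = g[j] | (p[j] & G)
    P = 1
    for j in range(i, i + n):
        P &= p[j]               # P(n, i) = p_i AND ... AND p_{i+n-1}
    return G, P


def ripple(p, g, c0=0):
    """Carries c_0..c_m from c_j = g_j OR (p_j AND c_{j-1})."""
    c = [c0]
    for j in range(1, len(p)):
        c.append(g[j] | (p[j] & c[-1]))
    return c


# Check c_{i+n-1} = G(n, i) OR (P(n, i) AND c_{i-1}) for every block, m = 4.
m = 4
for bits in product((0, 1), repeat=2 * m):
    a, b = bits[:m], bits[m:]
    p = [0] + [x ^ y for x, y in zip(a, b)]
    g = [0] + [x & y for x, y in zip(a, b)]
    c = ripple(p, g)
    for i in range(1, m + 1):
        for n in range(1, m - i + 2):
            G, P = block_gp(p, g, i, n)
            assert c[i + n - 1] == G | (P & c[i - 1])
```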
[Figure 5.5. Generic S(n): corners TLC, TRC, BLC and BRC; G(n, i) and P(n, i) at the top; bottom row $g_{i+n-1}, c_{i+n-1}, p_{i+n-1}, \ldots, g_i, c_i, p_i$; carries $c_{i+n-1}$ and $c_{i-1}$ as ports; legend: input, output, input/output.]
$S(n) = (LV(n), LE(n))$ is a VLSI circuit composed of $LV(n)$, the set of embedded nodes, and $LE(n)$, the set of embedded edges. Each element of $LV(n)$ is written as a triple $(v, f, L)$, where v is the node identifier, f is the function computed at node v, and L is the location of v in the embedding. L is itself a triple $(k, D, v_P)$, where $k \in \mathbb{Z}^+$ is a displacement and $D \in \{N, S, E, W, NE, SE, @\}$ is a heading. $v_P$ is a coordinate of the layout or a node whose location is already defined, and the location of v is determined relative to $v_P$. For input nodes, the function entry in the node triple is I. In S(n), each node labeled $c_i$ computes the identity function. These nodes are output ports and are denoted functionally as O in the node triple of $LV(n)$. Examples of elements of $LV(n)$ are:
(i) $(g_2, I, (0, @, BLC(S(2))))$ states that $g_2$ is an input node embedded at the bottom left corner of S(2).
(ii) $(v_1, \vee, (2, N, g_2))$ means that $v_1$ is an $\vee$-gate embedded 2 units north of input $g_2$.
(iii) $(c_1, O, (1, W, BRC(S(2))))$ states that $c_1$ is an output port embedded 1 unit west of the bottom right corner of S(2).
An element of LE(n) is written as a pair (w, k), where w is an edge, written as a pair of adjacent nodes, and k is the (not necessarily integer) length of w in the embedding.
Recall that S(n) is a VLSI circuit that computes the carries of binary addition. The tools described so far in this section are used in the following recursive definition of S(n).
Recursive Definition of S(n):
base case: n = 2, illustrated in Figure 5.6.
S(2) = (LV(2), LE(2)), where the node set LV(2) and the edge set LE(2) are as follows.
$$
\begin{aligned}
LV(2) = \{ &\text{for } i \in \{1, 3, 5, \ldots, n-1\}:\ (p_i, I, (0, @, BRC(S(2)))),\\
&(g_i, I, (2, W, p_i)),\ (p_{i+1}, I, (4, W, p_i)),\ (g_{i+1}, I, (6, W, p_i)),\\
&(c_i, O, (1, W, p_i)),\ (v_1, \vee, (2, N, g_{i+1})),\\
&(v_2, \wedge, (1, N, g_i)),\ (v_3, \wedge, (2, N, p_i)),\ (v_4, \vee, (1, N, c_i)),\\
&(v_5, \wedge, (2, E, v_4)),\ (c_{i-1}, O, (2, E, v_5)) \}
\end{aligned}
$$
$$
\begin{aligned}
LE(2) = \{ &\text{for } i \in \{1, 3, 5, \ldots, n-1\}:\ ((g_{i+1}, v_1), 2),\ ((p_i, v_3), 2),\ ((v_5, v_4), 2),\\
&((p_{i+1}, v_2), 1 + \sqrt{2}),\ ((g_i, v_2), 1),\ ((v_4, c_i), 1),\ ((c_{i-1}, v_5), 2),\\
&((g_i, v_4), \sqrt{2}),\ ((p_i, v_5), \sqrt{2}),\ ((v_2, v_1), 3 + \sqrt{2}),\ ((p_{i+1}, v_3), 4 + \sqrt{2}) \}
\end{aligned}
$$
[Figure 5.6. Base Case for Adder Circuit: S(2), with $c_0 = 0$.]
S (2) is illustrated in Figure 5.6. 
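The sketch below makes the node and edge triples concrete. It places the nodes of LV(2) for i = 1, assuming BRC(S(2)) is at the origin. It then checks two things. First, no listed edge length is shorter than the shortest route that uses horizontal, vertical and 45-degree segments. Second, the base-case longwires $(v_2, v_1)$ and $(v_5, v_4)$ have total length $5 + \sqrt{2}$, which is the value of L(2) used later. It also checks the gate functions, reading each gate's inputs off its incoming edges in LE(2). That reading is an assumption of the sketch, and all identifiers are illustrative.

```python
import math
from itertools import product

SQ2 = math.sqrt(2)
OFFSET = {"@": (0, 0), "N": (0, 1), "S": (0, -1), "E": (1, 0),
          "W": (-1, 0), "NE": (1, 1), "SE": (1, -1)}

# LV(2) for i = 1: (node, function, (k, heading, reference)).
# Assumption: BRC(S(2)) is placed at the origin; "c0" stands for c_{i-1}.
LV2 = [
    ("p1", "I", (0, "@", "BRC")), ("g1", "I", (2, "W", "p1")),
    ("p2", "I", (4, "W", "p1")), ("g2", "I", (6, "W", "p1")),
    ("c1", "O", (1, "W", "p1")), ("v1", "or", (2, "N", "g2")),
    ("v2", "and", (1, "N", "g1")), ("v3", "and", (2, "N", "p1")),
    ("v4", "or", (1, "N", "c1")), ("v5", "and", (2, "E", "v4")),
    ("c0", "O", (2, "E", "v5")),
]
# LE(2) for i = 1: (u, v) -> length.
LE2 = {
    ("g2", "v1"): 2, ("p1", "v3"): 2, ("v5", "v4"): 2,
    ("p2", "v2"): 1 + SQ2, ("g1", "v2"): 1, ("v4", "c1"): 1,
    ("c0", "v5"): 2, ("g1", "v4"): SQ2, ("p1", "v5"): SQ2,
    ("v2", "v1"): 3 + SQ2, ("p2", "v3"): 4 + SQ2,
}

# Resolve locations in list order; every reference is placed before it is used.
pos = {"BRC": (0, 0)}
for node, _, (k, heading, ref) in LV2:
    dx, dy = OFFSET[heading]
    pos[node] = (pos[ref][0] + k * dx, pos[ref][1] + k * dy)


def octilinear(u, v):
    """Shortest length using horizontal, vertical and 45-degree segments."""
    dx, dy = abs(u[0] - v[0]), abs(u[1] - v[1])
    return max(dx, dy) - min(dx, dy) + SQ2 * min(dx, dy)


for (u, v), length in LE2.items():
    assert length >= octilinear(pos[u], pos[v]) - 1e-9   # no wire is too short

# Base-case longwires (v2, v1) and (v5, v4): total length L(2) = 5 + sqrt(2).
assert math.isclose(LE2[("v2", "v1")] + LE2[("v5", "v4")], 5 + SQ2)

# Gate functions, reading each gate's inputs off its incoming edges.
for a1, a2, b1, b2 in product((0, 1), repeat=4):
    p1, p2, g1, g2, c0 = a1 ^ b1, a2 ^ b2, a1 & b1, a2 & b2, 0
    v2 = g1 & p2          # edges (g1, v2), (p2, v2)
    v1 = g2 | v2          # edges (g2, v1), (v2, v1): block generate G(2, 1)
    v3 = p1 & p2          # edges (p1, v3), (p2, v3): block propagate P(2, 1)
    v5 = p1 & c0          # edges (p1, v5), (c0, v5)
    v4 = g1 | v5          # edges (g1, v4), (v5, v4): carry c1
    a, b = a1 + 2 * a2, b1 + 2 * b2
    assert v4 == (a1 + b1) >> 1
    assert v1 == (a + b) >> 2
    assert v3 == int(a + b == 3)   # both bits propagate exactly when a + b = 3
```

Under these placements, every edge except $(p_2, v_3)$ meets the shortest-route bound with equality. The edge $(p_2, v_3)$ is routed with an extra length of $2 - \sqrt{2}$.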
induction step: illustrated in Figure 5.7. 
S(n) is defined recursively from two instances of S(n/2). These are distinguished as LS(n/2) for the leftmost instance and RS(n/2) for the rightmost instance. The labels LS and RS are used only where ambiguity might arise; otherwise, S is used. Input node $p_i$ is embedded at BRC(RS(n/2)), and $g_{i+(n/2)-1}$ is at BLC(RS(n/2)). Input $p_{i+(n/2)}$ is embedded at BRC(LS(n/2)), and $g_{i+n-1}$ at BLC(LS(n/2)). LS(n/2) and RS(n/2) are laid out so that BRC(LS(n/2)) is two units west of BLC(RS(n/2)).
Because many nodes in S(n) have the same name, the following auxiliary names are introduced. Let $\{w_1, w_2, w_3, w_4\}$ denote the nodes at the top four corners of LS(n/2) and RS(n/2). In particular, $w_1$ is at TLC(LS(n/2)), $w_2$ is at TRC(LS(n/2)), $w_3$ is at TLC(RS(n/2)), and $w_4$ is at TRC(RS(n/2)). Let $w_5$ denote the node located $\sqrt{2}$ units south-east of $w_2$.
Recall that $S(n) = (LV(n), LE(n))$, where $LV(n)$, the vertex set, and $LE(n)$, the edge set, are defined recursively as follows.
$LV(n) = LLV(n/2) \cup RLV(n/2) \cup LA(n)$, where $LLV(n/2)$ is the vertex set of LS(n/2), $RLV(n/2)$ is the vertex set of RS(n/2), and
$$
\begin{aligned}
LA(n) = \{ &(v_1, \vee, (2, N, w_1)),\ (v_2, \wedge, (1, N, w_3)),\ (v_3, \wedge, (2, N, w_4)),\\
&(v_4, \vee, (\sqrt{2}, NE, w_3)),\ (v_5, \wedge, (\sqrt{2}, NE, w_4)),\ (v_6, F, (\sqrt{2}, NE, w_1)) \}
\end{aligned}
$$
[Figure 5.7. Adder Layout S(n): LS(n/2) and RS(n/2) side by side, 2 units apart, with corners TLC, TRC, BLC and BRC; bottom row $g_{i+n-1}, c_{i+n-1}, p_{i+n-1}, \ldots, g_i, c_i, p_i$; legend: input, output, input/output.]
$v_6$ is called root(S(n)) and is alternately labeled r. $r_L$ denotes root(LS(n/2)) and $r_R$ denotes root(RS(n/2)).
$LE(n) = LLE(n/2) \cup RLE(n/2) \cup LB(n)$. Here $LLE(n/2)$ is the edge set of LS(n/2) and $RLE(n/2)$ is the edge set of RS(n/2). $LB(n)$, the subset of edges that connect subcircuits LS(n/2) and RS(n/2), is defined as follows.
$$
\begin{aligned}
LB(n) = \{ &((w_1, v_1), 2),\ ((w_4, v_3), 2),\ ((w_2, v_2), \sqrt{2} + 1),\ ((w_3, v_2), 1),\\
&((w_3, v_4), \sqrt{2}),\ ((v_4, r_R), 2),\ ((w_4, v_5), \sqrt{2}),\ ((r_R, w_5), 2),\\
&((r, r_R), 2),\ ((v_2, v_1), 2n + \sqrt{2} - 1),\ ((w_2, v_3), 2n + \sqrt{2}),\ ((v_5, v_4), 2n - 2) \}
\end{aligned}
$$
S(n) is illustrated in Figure 5.7.
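Logically, the induction step combines the (G, P) pairs of the two halves. The sketch below follows one reading of the gates in LA(n) and LB(n), which is an assumption and is not stated in the paper:

* $v_2 = P_L \wedge G_R$ and $v_1 = G_L \vee v_2$ give the block generate.
* $v_3 = P_L \wedge P_R$ gives the block propagate.
* $v_5 = P_R \wedge c_{i-1}$ and $v_4 = G_R \vee v_5$ give the carry into the left half.

The sketch models only the logic, not the layout, and checks the carries against the ripple recurrence. The function names are hypothetical.

```python
from itertools import product


def block(p, g, lo, hi):
    """(G, P) for bits lo..hi (1-indexed, inclusive), combining two halves."""
    if lo == hi:
        return g[lo], p[lo]
    mid = (lo + hi + 1) // 2              # right half lo..mid-1, left half mid..hi
    G_R, P_R = block(p, g, lo, mid - 1)
    G_L, P_L = block(p, g, mid, hi)
    v2 = P_L & G_R                        # assumed role of v2
    v1 = G_L | v2                         # block generate of lo..hi
    v3 = P_L & P_R                        # block propagate (an AND-tree node)
    return v1, v3


def carries(p, g, lo, hi, c_in, out):
    """Set out[j] = c_j for lo <= j <= hi, given c_in = c_{lo-1}."""
    if lo == hi:
        out[lo] = g[lo] | (p[lo] & c_in)
        return
    mid = (lo + hi + 1) // 2
    G_R, P_R = block(p, g, lo, mid - 1)
    v5 = P_R & c_in                       # assumed role of v5
    v4 = G_R | v5                         # carry into the left half, c_{mid-1}
    carries(p, g, lo, mid - 1, c_in, out)
    carries(p, g, mid, hi, v4, out)


def check(n):
    """Compare against the ripple recurrence for all pairs of n-bit inputs."""
    for bits in product((0, 1), repeat=2 * n):
        a, b = bits[:n], bits[n:]
        p = [0] + [x ^ y for x, y in zip(a, b)]   # index 0 unused
        g = [0] + [x & y for x, y in zip(a, b)]
        out = [0] * (n + 1)
        carries(p, g, 1, n, 0, out)               # c_0 = 0
        c = 0
        for j in range(1, n + 1):
            c = g[j] | (p[j] & c)
            assert out[j] == c
        assert block(p, g, 1, n)[0] == out[n]     # with c_0 = 0, G(n, 1) = c_n


for n in (2, 4):
    check(n)
```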
The rest of this section shows that S(n) uses O(n) average energy. Intuitively, the proof begins by observing that a subcircuit of S(n) is an n-AND tree. By a proof similar to that of Theorem 5.1, this n-AND tree subcircuit uses O(n) average energy. The remainder of the proof closely follows the analysis of the SOR/SAND circuit of section 4. The wires of S(n) minus the n-AND tree subcircuit are partitioned into shortwires and longwires. The shortwires are shown to occupy O(n) area and are thus eliminated from further discussion. The switching probabilities of the longwires are then determined, assuming the inputs are uniformly distributed over {0, 1}. Using these switching probabilities, the average energy of the longwires is shown to be O(n).
Lemma 5.2:
The VLSI circuit S(n), which computes n + 1 addition carries in O(log n) depth, uses average energy $E_a(n) = O(n)$.
Proof: 
The first part of the proof extracts a subcircuit A of S(n) that is an n-AND tree. Let $V_3$ be the set of all nodes of S(n) labeled $v_3$. Let $W_{23}$ be the set of edges of S(n) labeled $(w_2, v_3)$, and let $W_{43}$ be the set of edges of S(n) labeled $(w_4, v_3)$. Let $P = (p_1, \ldots, p_n)$ be the inputs of S(n). Let $A = (A_N, A_E)$ denote the subcircuit of S(n) such that $A_N = V_3 \cup P$ and $A_E = W_{23} \cup W_{43}$. It is easy to see that A is an n-AND tree. Since the inputs P are uniformly distributed over {0, 1}, A uses average energy O(n) by a proof similar to that of Theorem 5.1.
The analysis that follows examines $\bar{S}(n)$, which is S(n) minus the n-AND tree A. Let $\bar{S}(n) = (L\bar{V}, L\bar{E})$ denote the graph obtained by removing the nodes and edges of A from S(n). In particular, $L\bar{V} = LV(n) - A_N$ and $L\bar{E} = LE(n) - A_E$.
The reader is advised to refer to Figure 5.7 to clarify the following definitions of longwires and shortwires.
Definitions: 
Let blongwires(S(n)) be the set of wires of $\bar{S}(n)$ labeled $(v_2, v_1)$.
Let rlongwires(S(n)) be the set of wires of $\bar{S}(n)$ labeled $(v_5, v_4)$.
$\text{longwires}(S(n)) = \text{blongwires}(S(n)) \cup \text{rlongwires}(S(n))$
$\text{shortwires}(S(n)) = L\bar{E} - \text{longwires}(S(n))$
An element of longwires(S(n)) is called a longwire. Similarly, a blongwire is a member of blongwires(S(n)) and an rlongwire is a member of rlongwires(S(n)).
Let R(n) be the area of shortwires(S(n)).
Let L(n) be the area of longwires(S(n)).
The following Lemma shows that the area of shortwires(S(n)) is O(n).
Lemma 5.3: 
$R(n) = O(n)$
Proof:
$R(2) = 9 + 3\sqrt{2}$
$R(n) \le 2R(n/2) + 8\log n + 8 + 3\sqrt{2} = O(n)$
The $8\log n$ term in the expression above accounts for the wire area needed to fan out the intermediate carries $c_{i-1}$ and $c_{i+(n/2)-1}$ (see Figure 5.7). []
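Unrolling the recurrence of Lemma 5.3 gives $R(n)/n = R(2)/2 + \sum_{k=2}^{\log n} (8k + 8 + 3\sqrt{2})/2^k$. This is a convergent series, and its limit is $20.5 + 3\sqrt{2} \approx 24.74$. The sketch evaluates the bound, taking the logarithm to base 2 (an assumption), and checks that $R(n)/n$ increases but stays below that limit. The function name is illustrative.

```python
import math

SQ2 = math.sqrt(2)


def R(n):
    """Shortwire-area bound from Lemma 5.3; n is a power of 2, n >= 2."""
    if n == 2:
        return 9 + 3 * SQ2
    return 2 * R(n // 2) + 8 * math.log2(n) + 8 + 3 * SQ2


LIMIT = 20.5 + 3 * SQ2   # limit of R(n)/n from the unrolled series

ratios = [R(2 ** k) / 2 ** k for k in range(1, 31)]
assert all(r < LIMIT for r in ratios)                   # R(n) = O(n)
assert all(a < b for a, b in zip(ratios, ratios[1:]))   # ratio rises toward LIMIT
print(ratios[-1])
```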
A wire contributes at most its area to the energy, so the shortwires can contribute at most O(n) energy. The remaining analysis therefore considers only the longwires. The area of the longwires can be determined from the following recurrence.
$L(2) = 5 + \sqrt{2}$
$L(n) = 2L(n/2) + 4n + \sqrt{2} - 3$
The reader can verify that $L(n) = O(n \log n)$.
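One way to verify the claim is to substitute $L(n) = 4n\log_2 n + An + B$ into the recurrence. This gives $B = 2B + \sqrt{2} - 3$, so $B = 3 - \sqrt{2}$. The base case $L(2) = 5 + \sqrt{2}$ then gives $A = \sqrt{2} - 3$, so

$$L(n) = 4n\log_2 n + (\sqrt{2} - 3)\,n + 3 - \sqrt{2} = \Theta(n \log n).$$

The sketch below checks this closed form against the recurrence for $n = 2, 4, \ldots, 2^{20}$. Like the recurrence, it assumes that n is a power of 2.

```python
import math

SQ2 = math.sqrt(2)


def L(n):
    """Longwire area of S(n) from the recurrence (n a power of 2, n >= 2)."""
    if n == 2:
        return 5 + SQ2
    return 2 * L(n // 2) + 4 * n + SQ2 - 3


def L_closed(n):
    """Closed form obtained by solving the recurrence."""
    return 4 * n * math.log2(n) + (SQ2 - 3) * n + 3 - SQ2


for k in range(1, 21):
    n = 2 ** k
    assert math.isclose(L(n), L_closed(n), rel_tol=1e-12)
```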
It remains to show that the average energy of the longwires is O(n), which follows from examining their switching probabilities. Figure 5.7 shows that the head of a longwire is a conjunction labeled either $v_5$ or $v_2$. In both cases, a node of the n-AND tree is one of the conjuncts. Intuitively, then, the longwires have a small probability of switching, because they are even less likely to switch than the wires of the n-AND tree. The following discussion formalizes this intuition.
Recall from sections 2 and 4.1 that, for $z \in L\bar{V} \cup L\bar{E}$, s(z) denotes the value of z for some input to S(n). The label z, however, serves double duty in the following analysis: z denotes a node (or wire) and its respective function. The intended usage is clear from the context. The following definition of stage facilitates the discussion of the switching probabilities of wires in S(n).
Definitions: 
A blongwire of area $2n + \sqrt{2} - 1$ is a stage $(\log n) - 1$ blongwire. An rlongwire of area $2n - 2$ is a stage $(\log n) - 1$ rlongwire. A node x is at stage k, denoted $x^k$, if x is the tail of a stage k longwire or x is an input to the tail of a stage k longwire.
Let $\Pr(w^B_k \text{ switches})$ denote the probability that stage k blongwire w switches. Let $\Pr(w^R_k \text{ switches})$ denote the probability that stage k rlongwire w switches.
The following Lemma shows that the probability of a longwire switching decreases as the stage of the wire increases.
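Lemma 5.4:
(i) $\Pr(w^B_k \text{ switches}) \le 1/2^{2^k - 1}$
(ii) $\Pr(w^R_k \text{ switches}) \le 1/2^{2^k - 1}$
Proof:
(i) $\Pr(w^B_k \text{ switches}) = \Pr(w^B_k: 0 \to 1) + \Pr(w^B_k: 1 \to 0) = 2\Pr(w^B_k: 1 \to 0) \le 2\Pr(s(w^B_k) = 1)$
$\le 2\Pr(s(x^k) = 1) \cdot \Pr(s(v_3^k) = 1)$, where $x^k$ is the other conjunct at the head of $w^B_k$.
Since $v_3^k = \bigwedge_{l=j}^{j+2^k-1} p_l$ for some j, $1 \le j \le n - 2^k + 1$,
$\Pr(s(v_3^k) = 1) = \Pr\left(\bigwedge_{l=j}^{j+2^k-1} p_l = 1\right) = 1/2^{2^k}$.
$\therefore \Pr(w^B_k \text{ switches}) \le 2 \cdot 1/2^{2^k} = 1/2^{2^k - 1}$.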
(ii) By the same argument as (i) above,
$\Pr(w^R_k \text{ switches}) \le 2\Pr(s(v_3^k) = 1) \cdot \Pr(s(x^k) = 1)$, where $x^k$ is the other conjunct at the head of $w^R_k$.
From (i) above, $\Pr(s(v_3^k) = 1) = 1/2^{2^k}$.
$\therefore \Pr(w^R_k \text{ switches}) \le 2 \cdot 1/2^{2^k} = 1/2^{2^k - 1}$. []
Continuation of Proof of Lemma 5.2 (i.e., $E_a(S(n)) = O(n)$).
Let SB(n) denote the average switching of blongwires(S(n)).
Let SR(n) denote the average switching of rlongwires(S(n)).
Let EB(n) denote the average energy of blongwires(S(n)).
Let ER(n) denote the average energy of rlongwires(S(n)).
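The bound of Lemma 5.4 rests on the fact that a conjunction of m independent uniform bits is rarely 1. Take two independent, uniformly random inputs. The AND of m bits then switches with probability exactly $2 \cdot 2^{-m}(1 - 2^{-m})$, which is at most $1/2^{m-1}$. With $m = 2^k$, this is the bound $1/2^{2^k - 1}$. The sketch below checks this by exhaustive enumeration for a single AND gate. It does not model the longwires themselves, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def and_switch_probability(m):
    """Exact Pr[AND of m bits differs between two independent uniform inputs]."""
    inputs = list(product((0, 1), repeat=m))
    switches = 0
    for x in inputs:
        for y in inputs:
            if all(x) != all(y):        # the AND output changes state
                switches += 1
    return Fraction(switches, len(inputs) ** 2)


for m in (1, 2, 4, 8):
    exact = and_switch_probability(m)
    closed = 2 * Fraction(1, 2 ** m) * (1 - Fraction(1, 2 ** m))
    assert exact == closed
    assert exact <= Fraction(1, 2 ** (m - 1))   # 1/2^(2^k - 1) when m = 2^k
```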
The following Lemma shows that the average switching of the longwires(S(n)) is O(n). More important, however, are the recurrences used to obtain the average switching. Lemma 5.6 subsequently uses them to show that the average energy of the longwires(S(n)) is O(n).
Lemma 5.5:
(i) SB(n) = O(n)
(ii) SR(n) = O(n)
Proof:
(i) SB(n), the average switching of the blongwires(S(n)), is given by the following recurrence, which uses the probabilities of Lemma 5.4.
$SB(2) \le 1$
$SB(n) \le 2\,SB(n/2) + \dfrac{1}{2^{(n/2)-1}}$
This standard recurrence has the solution SB(n) = O(n).
(ii) Different stage k rlongwires have different switching probabilities, but Lemma 5.4 provides an upper bound on these probabilities.
$SR(2) \le 1$
$SR(n) \le 2\,SR(n/2) + \dfrac{1}{2^{(n/2)-1}}$
$\therefore SR(n) = O(n)$. []
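Evaluating the Lemma 5.5 recurrence numerically shows how little the top-level terms add. The bound on SB(n) stays below 0.65n, and so does the bound on SR(n), which follows the same recurrence. This is a minimal sketch that treats the inequalities as equalities to obtain the bound. The function name is illustrative, and the constant 0.65 comes from evaluating the series, not from the paper.

```python
def switching_bound(n):
    """Bound on SB(n) from Lemma 5.5 (the SR(n) recurrence is the same).

    n is a power of 2, n >= 2; the inequalities are treated as equalities.
    """
    if n == 2:
        return 1.0
    # One top-stage longwire: stage k = log2(n) - 1, so 2**k - 1 = n/2 - 1.
    return 2 * switching_bound(n // 2) + 2.0 ** -(n // 2 - 1)


for k in range(1, 31):
    n = 2 ** k
    assert switching_bound(n) <= 0.65 * n   # SB(n) = O(n)
```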
The following Lemma shows that the average energy of longwires (S (n)) is O(n ). 
Lemma 5.6:
(i) EB(n) = O(n)
(ii) ER(n) = O(n)
Proof:
(i) Using the recurrences for area and average switching obtained above, the following recurrence is obtained for EB(n), the average energy of the blongwires(S(n)). Each new term is the area of the stage $(\log n) - 1$ blongwire multiplied by its switching-probability bound from Lemma 5.4.
$EB(2) \le 3 + \sqrt{2}$
$EB(n) \le 2\,EB(n/2) + \dfrac{2n + \sqrt{2} - 1}{2^{(n/2)-1}} = O(n)$
(ii) ER(n), the average energy of the rlongwires(S(n)), is obtained similarly from the area recurrence and the switching probabilities cited above.
$ER(2) \le 2$
$ER(n) \le 2\,ER(n/2) + \dfrac{2n - 2}{2^{(n/2)-1}} = O(n)$ []
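The longwire area L(n) is $\Theta(n \log n)$. Even so, in each energy recurrence the top-level term is a wire of length about 2n weighted by a switching probability of $1/2^{(n/2)-1}$, so that term vanishes rapidly. The sketch evaluates both bounds, treating each inequality as an equality. It checks that $EB(n) \le 3.6n$ and $ER(n) \le 2n$. These constants come from evaluating the series, not from the paper, and the function names are illustrative.

```python
import math

SQ2 = math.sqrt(2)


def EB(n):
    """Energy bound for blongwires from Lemma 5.6 (n a power of 2, n >= 2)."""
    if n == 2:
        return 3 + SQ2
    # Area of the top-stage blongwire times its switching-probability bound.
    return 2 * EB(n // 2) + (2 * n + SQ2 - 1) * 2.0 ** -(n // 2 - 1)


def ER(n):
    """Energy bound for rlongwires from Lemma 5.6."""
    if n == 2:
        return 2.0
    return 2 * ER(n // 2) + (2 * n - 2) * 2.0 ** -(n // 2 - 1)


for k in range(1, 31):
    n = 2 ** k
    assert EB(n) <= 3.6 * n and ER(n) <= 2 * n   # both are O(n)
```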
By Lemma 5.3, the shortwires(S(n)) use $E_a(n) = O(n)$. By a proof similar to that of Theorem 5.1, the n-AND tree A uses $E_a(n) = O(n)$, and by Lemma 5.6, the longwires(S(n)) use $E_a(n) = O(n)$. Hence S(n), which computes the carries for adding two n-bit binary numbers, uses average energy $E_a(S(n)) = O(n)$.
[] end of Lemma 5.2.
Theorem 5.2:
The average energy needed to add two n-bit binary numbers in depth O(log n) is $E_a(n) = O(n)$.
Proof:
The VLSI circuit S(n) can be used to compute the carries $(c_1, \ldots, c_n)$. Notice that in the layout of S(n), for $1 < i \le n$, input $p_i$ is embedded 3 units away from carry $c_{i-1}$. (Assume a constant 0 is located near $p_1$.) Hence, each element of the sum vector $(s_1, \ldots, s_{n+1})$ can be computed from the carry vector in O(1) energy, since $s_i = p_i \oplus c_{i-1}$ for $1 \le i \le n$, and $s_{n+1} = c_n$. Together with Lemma 5.2, the total average energy is therefore O(n). []
6. OPEN PROBLEMS: 
Lower Bounds on Single-Valued Functions 
To date, no nontrivial energy lower bounds are known for single-valued functions. In section 3, a superlinear energy lower bound is derived for the parity function in the special case where the circuit contains only $\oplus$-gates and negations. For an arbitrary basis, the parity problem is open, as is the corresponding problem for the majority function. The only other known results for single-valued functions are the upper bounds described in section 4.
Energy-Efficient Design Techniques 
The design techniques of section 4 were applied to specific functions: OR, compare and addition. An open problem is to characterize a class of functions for which the circuit and layout techniques of section 4 produce energy-efficient designs. Moreover, the layout technique alone, as applied to the parallel prefix adder circuit in section 5, is likely applicable to a larger class of circuits. In both the SOR/SAND circuit and the adder circuit, the layout technique entails embedding tree subcircuits in a specific manner.
ACKNOWLEDGEMENTS: 
I am very thankful to Stephen Cook for supervising this research. Cook also proved Lemma 3.2. 
Les Valiant provided an idea that led to Lemma 3.2. I thank Charles Rackoff and Joachim von zur
Gathen for their contributions to the presentation of this work. The importance of switching energy to 
VLSI was originally suggested to me by Carver Mead. 