On the Time-Bandwidth Proof in VLSI Complexity by Abu-Mostafa, Yaser S.
IEEE TRANSACTIONS ON COMPUTERS, VOL. C-36, NO. 2, FEBRUARY 1987
straight vertical connection between two points of V in one column.
Charge $\sqrt{M} - 1$ (for the length of the vertical line segment) and 1
(for the cell q occupies), hence a total of $\sqrt{M}$ cells to q. (Note that
Steiner points on this vertical line segment can only have outgoing h
edges beside the v edges now accounted for, and no other leaf can be
charged the same cells as q.)
Case b): the h edge marked by q has no other marks.
It means that this edge can be uniquely assigned to q and we can
again charge $\sqrt{M} - 1$ (for the horizontal line segment) and 1 (for the
cell of q), hence a total of $\sqrt{M}$ cells to q.
Case c): the h edge marked by q has other marks as well.
Note that in this case, the h edge necessarily ends in a Steiner cell,
with one outgoing v edge continuing on to q over a path of further v
edges. Say (without loss of generality) that the path leads from the
Steiner cell downwards. The only possibility for the h edge to be
marked by another leaf as well is that there is a cell of V in the same
column reached from the Steiner cell by going upward. Thus we
conclude that the h edge can only be marked by one other cell of V,
that necessarily lies in the same column as q and is vertically
connected to it. Now charge the usual $\sqrt{M}$ cells to q (for the h edge)
and $\sqrt{M}$ cells to the other leaf for the vertical line segments.
It follows (by carrying out this procedure for cells of V at
increasing distance from the root) that all $l - 1$ cells of V beside the
root can be charged a unique set of $\sqrt{M}$ cells. Hence we obtain
$t \ge (l - 1)\sqrt{M} + 1$.
We now complete the proof of Theorem 5.5 as follows. By Claim
5.5.1 we need $l$ conflict-free fetches to retrieve T. By Claim 5.5.2 we
have $t \ge (l - 1)\sqrt{M} + 1$, hence $l \le \lfloor (t - 1)/\sqrt{M} \rfloor + 1$. Thus
we can retrieve T by means of at most $\lfloor (t - 1)/\sqrt{M} \rfloor + 1$ fetches. □
By choosing for M a square close to N, the following result is
immediate. (Take, e.g., $M = \lfloor \sqrt{N} \rfloor^2$.)
Corollary 5.6: There is a linear skewing scheme using no more
than N memory banks, such that every rookwise connected template
of N cells in an N x N matrix can be retrieved in at most $\sqrt{N} + 2$
conflict-free fetches.
For arbitrary, connected templates T (including, e.g., diagonals) a
precise analysis as in Theorem 5.5 is hard, but the following
somewhat weaker bound can be obtained.
Theorem 5.7: Using s to store an N x N matrix into M memory
banks, any connected template of t cells can be retrieved by means of
at most $\lfloor 2t/\sqrt{M} \rfloor + 1$ conflict-free fetches.
Proof: Follow the same argument as in Theorem 5.5 until after
Claim 5.5.1. To estimate $l$ we now reason as follows. Enclose every
cell of V by a "box" of cells that are at most $\lceil \frac{1}{2}\sqrt{M} \rceil$ away from
it, measured in cells along a connected (but not necessarily rookwise
connected) chain. Note that the boxes indeed are squares, and that the
boxes thus surrounding the cells of V are all disjoint. Assuming $l > 1$,
the connectedness of T requires that in every box so distinguished
there is a chain of cells leading from the middle cells to the boundary.
This accounts for at least $\lceil \frac{1}{2}\sqrt{M} \rceil$ cells of T per box, hence
$t \ge l \lceil \frac{1}{2}\sqrt{M} \rceil$ and $l \le \lfloor 2t/\sqrt{M} \rfloor$. The bound stated in the theorem is
thus correct, including the case that t is small yet $l = 1$. □
Choosing again $M = \lfloor \sqrt{N} \rfloor^2$ ($\approx N$), it follows that every connected
template of N cells in an N x N matrix can be retrieved in at most
$2\sqrt{N} + O(1)$ conflict-free fetches, using the linear skewing scheme
s.
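To get a feel for the two fetch bounds, the following Python sketch evaluates them for a sample matrix size. It takes the bound of Theorem 5.5 to be $\lfloor (t - 1)/\sqrt{M} \rfloor + 1$ and that of Theorem 5.7 to be $\lfloor 2t/\sqrt{M} \rfloor + 1$, and it assumes M is a perfect square so that integer arithmetic is exact. The function names and the sample value N = 100 are illustrative assumptions only.

```python
import math

def _root_of_square(m):
    """Return sqrt(m) as an integer; this sketch only handles perfect squares."""
    root = math.isqrt(m)
    if root * root != m:
        raise ValueError("this sketch assumes M is a perfect square")
    return root

def rookwise_fetch_bound(t, m):
    """Assumed Theorem 5.5 bound: floor((t - 1) / sqrt(M)) + 1."""
    return (t - 1) // _root_of_square(m) + 1

def connected_fetch_bound(t, m):
    """Assumed Theorem 5.7 bound: floor(2t / sqrt(M)) + 1."""
    return (2 * t) // _root_of_square(m) + 1

n = 100                      # sample matrix dimension (hypothetical)
m = math.isqrt(n) ** 2       # the choice M = floor(sqrt(N))^2
print(rookwise_fetch_bound(n, m), connected_fetch_bound(n, m))
```

For templates of N cells, the first value stays within $\sqrt{N} + 2$ and the second within $2\sqrt{N}$ plus a constant, in line with Corollary 5.6 and the closing remark.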
On the Time-Bandwidth Proof in VLSI Complexity
YASER S. ABU-MOSTAFA
Abstract-A subtle fallacy in the original proof [1] that the computation
time T is lower bounded by a factor inversely proportional to the
minimum bisection width of a VLSI chip is pointed out. A corrected
version of the proof using the idea of conditionally self-delimiting
messages is given.
Index Terms-Bisected graph, computation time, information theory,
lower bounds, self-delimiting, VLSI complexity.
Manuscript received July 18, 1985; revised October 15, 1985. This work
was supported by the Program in Advanced Technologies (Aerojet, GM,
GTE, TRW).
The author is with the Departments of Electrical Engineering and Computer
Science, California Institute of Technology, Pasadena, CA 91125.
IEEE Log Number 8611452.
0018-9340/87/0200-0239$01.00 © 1987 IEEE
I. INTRODUCTION
The lower bound on $AT^2$, where $A$ is the area of a VLSI chip and $T$
is either the average or worst case computation time on the chip,
depends on the fact that $T \ge H/\omega$, where $H$ is an information-theoretic
constant that depends only on the function being computed
and $\omega$ is the bandwidth (minimum bisection width/unit time) of the
particular communication graph (chip model) in question [1]. The
simple and plausible proof of the relation $\omega T \ge H$ was based on
Shannon's Theorem 9 of [2], which states that it is not possible to
transmit information over a channel of capacity $C$ bits/unit time at an
average rate of more than $C$ bits/unit time. There is a problem in
applying this theorem to the communication of variable-length
messages between the two halves of a bisected communication graph,
because the definition of channel capacity (as used in Shannon's
proof) is tricky when the ensemble of messages over the channel has
variable lengths. The dilemma is demonstrated in the following
proposition. The definitions and basic properties of the entropy $H(x)$,
the conditional entropy $H(x|y)$, and the average mutual information
$I(x; y)$ can be found in [3, ch. 2]. We use the liberal notation which
identifies a random variable with its ensemble.
Proposition: Let $T$ be a nonnegative integer-valued random
variable with mean value $\bar{T}$. Let $m = m_1 \cdots m_T$, where each $m_i$ is an
independent (from $T$ and the rest of the $m_j$'s) random variable
assuming $2^\omega$ equiprobable values. Then $H(m) = \omega\bar{T} + H(T)$ bits.
Proof: Since $T$ is determined by $m$, $H(T|m) = 0$. With $\log = \log_2$,
$$
\begin{aligned}
H(m) &= H(m, T) - H(T|m) = H(m, T) \\
     &= H(T) + H(m|T) \\
     &= H(T) - \sum_{m,T} P(m, T) \log P(m|T) && \text{(by definition)} \\
     &= H(T) - \sum_{m,T} P(m, T) \log 2^{-\omega T} && \text{(equiprobability and independence)} \\
     &= H(T) + \omega\bar{T}.
\end{aligned}
$$
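As a numerical sanity check of the proposition, the following Python sketch enumerates every possible message for a small length distribution and compares the brute-force entropy with $\omega\bar{T} + H(T)$. The values of `omega` and `length_dist` and the helper names are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def message_entropy(omega, length_dist):
    """Entropy of m = m_1...m_T by enumerating all messages.

    length_dist maps each length T to Pr(T). Every symbol takes one of
    2**omega equiprobable values, independently of T and of the others.
    """
    symbols = range(2 ** omega)
    probs = []
    for length, p_len in length_dist.items():
        # Each of the (2**omega)**length strings of this length is equally likely.
        for _ in product(symbols, repeat=length):
            probs.append(p_len * (2 ** omega) ** (-length))
    return entropy(probs)

omega = 2                                  # hypothetical bandwidth
length_dist = {0: 0.25, 1: 0.25, 3: 0.5}   # hypothetical distribution of T

mean_T = sum(t * p for t, p in length_dist.items())
h_T = entropy(length_dist.values())
h_m = message_entropy(omega, length_dist)
print(h_m, omega * mean_T + h_T)
```

Whenever the length distribution is not concentrated on a single value, $H(T) > 0$ and the enumerated entropy exceeds $\omega\bar{T}$, which is the excess the proposition points to.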
When $m$ is the message being transmitted between the two halves
of a bisected communication graph, $H(m)$ is the information carried
by the message over the channel. Therefore, if $H(T)$ is positive
(variable-length messages), the total information of the message
will exceed $\omega\bar{T}$ by sneaking in more information in the length of
the message. In practice, this extra information is canceled out by the
requirement that the messages be self-delimiting (thus reducing
$H(m|T)$). An argument that incorporates self-delimiting is needed in
the case of a bisected communication graph in order to justify the $\omega\bar{T}$
bound on the information flow between the two halves of the graph.
The argument given in the next section does that using the idea of
conditionally self-delimiting messages. The same remark applies to
the $\omega T_{\text{worst}}$ bound in the worst case analysis. The total number of
different messages can in principle be
$2^0 + 2^{\omega} + 2^{2\omega} + \cdots + 2^{\omega T_{\text{worst}}}$,
which exceeds $2^{\omega T_{\text{worst}}}$. A similar argument incorporating
self-delimiting can be made in this case too.
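The worst-case count can be illustrated with a few lines of Python. The sketch below sums the number of symbol strings of every length from 0 to $T_{\text{worst}}$ and compares it with the number of strings of length exactly $T_{\text{worst}}$. The values chosen for `omega` and `t_worst` are arbitrary assumptions for illustration.

```python
import math

def count_messages(omega, t_worst):
    """Number of distinct strings of length 0..t_worst over 2**omega symbols."""
    return sum(2 ** (omega * t) for t in range(t_worst + 1))

omega, t_worst = 3, 4                      # hypothetical values
total = count_messages(omega, t_worst)
fixed_length = 2 ** (omega * t_worst)

# log2(total) is the number of bits needed to index all variable-length
# messages; it is strictly larger than omega * t_worst.
print(total, fixed_length, math.log2(total))
```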
II. THE NEW PROOF
We show that if $\omega$ bits/unit time is the bandwidth of a bisection R
- S of a communication graph that splits the input vector (uniformly
distributed random variable) $x$ into $x_R$ and $x_S$ and the output vector
(dependent random variable) $y$ into $y_R$ and $y_S$, then the average
computation time $\bar{T}$ on this graph is governed by $\omega\bar{T} \ge H(y_R | x_R)$.
The reader is referred to [1, Sec. 3.3] for the original argument.
Fig. 1 shows the information-theoretic model. Side R has full
knowledge of $x_R$, but no initial knowledge of $x_S$, and wants to
compute $y_R$, which depends on both $x_R$ and $x_S$. Through the message
vector from side S to side R, namely $m = m_1 \cdots m_T$ where $T =
T(x)$ is a random variable, side R learns enough information about $x_S$
to compute $y_R$. Since there are only $\omega$ wires between S and R, each
carrying at most one bit per unit time (in either direction), each $m_i$
can assume at most $2^\omega$ values. The mechanism of the computation
requires that at time $t = T(x)$, side R is ready with the value of $y_R$,
and knows that the computation is over. This means that,
conditioned on the knowledge of $x_R$, $m$ is self-delimiting. We now
give the formal argument.
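[Fig. 1. Information-theoretic model of a bisected communication graph. Labels: Side S, Side R, message $m$, $\omega$ wires, compute $y_R$.]
Proof of $\omega\bar{T} \ge H(y_R | x_R)$: Let $N = T_{\text{worst}}$ (maximum of $T$ over all
$x$). Define $\hat{m} = \hat{m}(x) = \hat{m}_1 \cdots \hat{m}_N$ as follows: $\hat{m}_i = m_i$ for $i \le T$
and $\hat{m}_i$ = constant for $T < i \le N$. (In other words, $\hat{m}$ is a fixed-length
version of the message $m$, with no explicit information about
the computation time.) By the computation mechanism, $y_R$ is
uniquely determined from $x_R$ and $\hat{m}$, i.e.,
$$H(y_R | x_R, \hat{m}) = 0.$$
Expanding the LHS term, we get $H(y_R | x_R) - I(y_R; \hat{m} | x_R) = 0$.
Since $I(y_R; \hat{m} | x_R) \le H(\hat{m} | x_R)$ (see [3]), we get $H(y_R | x_R) \le
H(\hat{m} | x_R)$ and it suffices to show that $H(\hat{m} | x_R) \le \omega\bar{T}$. We expand
$H(\hat{m} | x_R)$ as follows:
$$
\begin{aligned}
H(\hat{m} | x_R) = {} & H(\hat{m}_1 | x_R) \\
 & + H(\hat{m}_2 | \hat{m}_1, x_R) \\
 & + H(\hat{m}_3 | \hat{m}_1, \hat{m}_2, x_R) \\
 & + \cdots \\
 & + H(\hat{m}_N | \hat{m}_1, \ldots, \hat{m}_{N-1}, x_R).
\end{aligned}
$$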
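Denoting by $\hat{m}^t$ the first $t$ symbols $\hat{m}_1 \cdots \hat{m}_t$ of $\hat{m}$, this can be
rewritten as
$$H(\hat{m} | x_R) = \sum_{t=1}^{N} H(\hat{m}_t | \hat{m}^{t-1}, x_R).$$
To estimate the general term $H(\hat{m}_t | \hat{m}^{t-1}, x_R)$, we expand it as
$$
\sum_{\hat{m}^{t-1},\, x_R} P(\hat{m}^{t-1}, x_R) \times
\left( - \sum_{\hat{m}_t} P(\hat{m}_t | \hat{m}^{t-1}, x_R) \log P(\hat{m}_t | \hat{m}^{t-1}, x_R) \right).
$$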
For every $t = 1, \ldots, N$, whether or not the computation time $T$ is
less than $t$ can be determined by $\hat{m}^{t-1}, x_R$. If $T < t$, the inner
summation is zero because $\hat{m}_t$ will be a constant. Otherwise, the inner
summation is bounded by $\omega$ since $\hat{m}_t$ assumes at most $2^\omega$ values.
Therefore, $H(\hat{m}_t | \hat{m}^{t-1}, x_R) \le \Pr(T \ge t)\,\omega$, and
$$
H(\hat{m} | x_R) \le \sum_{t=1}^{N} \Pr(T \ge t)\,\omega
= \omega \times \sum_{t=1}^{N} \sum_{i=t}^{N} \Pr(T = i).
$$
Each term of the form $\Pr(T = j)$ appears $j$ times in the double
summation, hence we rewrite
$$H(\hat{m} | x_R) \le \omega \times \sum_{j=1}^{N} j \times \Pr(T = j)$$
which reduces to $\omega\bar{T}$, and the proof is complete. □
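The counting step at the end of the proof, that summing $\Pr(T \ge t)$ over $t = 1, \ldots, N$ gives the mean of $T$, can be checked exactly with rational arithmetic. The following Python sketch does so for one small distribution. The distribution and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def tail_sum(dist, n):
    """Sum over t = 1..n of Pr(T >= t), for dist mapping value -> probability."""
    return sum(
        sum(p for value, p in dist.items() if value >= t)
        for t in range(1, n + 1)
    )

def mean(dist):
    """Sum over j of j * Pr(T = j), i.e. the mean of T."""
    return sum(value * p for value, p in dist.items())

# Hypothetical distribution of the computation time T.
dist = {1: Fraction(1, 2), 2: Fraction(1, 3), 5: Fraction(1, 6)}
n = max(dist)   # plays the role of N = T_worst

print(tail_sum(dist, n), mean(dist))
```

Both quantities agree because each $\Pr(T = j)$ is counted once for every $t \le j$, which is the argument used in the text.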
Two final remarks are in order. First, the message need not be
absolutely self-delimiting, i.e., without the knowledge of $x_R$ one
message can conceivably be a prefix of another. Secondly, requiring
that at time $T$ the chip knows that the computation is over cannot be
replaced by the weaker condition that the output $y$ remains the same
from time $T$ on. The latter condition leaves us unsure about when to
collect the output, possibly until time $T_{\text{worst}}$.
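The first remark can be made concrete with a toy example. In the Python sketch below, $\omega = 1$ (one bit per time step) and $x_R$ takes two values. For each value of $x_R$ the set of possible messages is prefix-free, but the union over both values is not. The message sets, their probabilities, and all names are hypothetical and chosen only to illustrate conditional self-delimiting.

```python
import math

def is_prefix_free(words):
    """True if no word is a proper prefix of another word in the collection."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

# Hypothetical message distributions, one per value of x_R.
messages_given_xr = {
    0: {"0": 0.5, "10": 0.3, "11": 0.2},
    1: {"00": 0.4, "01": 0.4, "1": 0.2},
}
p_xr = {0: 0.5, 1: 0.5}   # x_R uniformly distributed
omega = 1                 # one wire, so each symbol is one bit

all_words = [w for dist in messages_given_xr.values() for w in dist]

# H(m | x_R): average over x_R of the entropy of the message given x_R.
cond_entropy = sum(
    p_xr[x] * -sum(p * math.log2(p) for p in dist.values())
    for x, dist in messages_given_xr.items()
)
# Mean computation time: the message length averaged over x_R and m.
mean_T = sum(
    p_xr[x] * sum(p * len(w) for w, p in dist.items())
    for x, dist in messages_given_xr.items()
)

print([is_prefix_free(d) for d in messages_given_xr.values()],
      is_prefix_free(all_words), cond_entropy, omega * mean_T)
```

In this example the conditional entropy of the message stays below $\omega\bar{T}$ even though "0" is a prefix of "00" once the knowledge of $x_R$ is dropped, consistent with the bound proved above.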
REFERENCES
[1] C. D. Thompson, "A complexity theory for VLSI," Ph.D. dissertation,
Carnegie-Mellon University, Pittsburgh, PA, 1980.
[2] C. E. Shannon, "A mathematical theory of communication," Bell
Syst. Tech. J., vol. 27, pp. 379-423, 1948.
[3] R. G. Gallager, Information Theory and Reliable Communication.
New York: Wiley, 1968.
[4] J. D. Ullman, Computational Aspects of VLSI. Rockville, MD:
Computer Science, 1984.
