Area-period tradeoffs for multiplication of rectangular matrices  by Lin, Ferng-Ching & Wu, I-Chen
JOURNAL OF COMPUTER AND SYSTEM SCIENCES 30, 329-342 (1985) 
Area-Period Tradeoffs for 
Multiplication of Rectangular Matrices 
FERNG-CHING LIN AND I-CHEN WU
Department of Computer Science and Information Engineering, 
National Taiwan University, Taipei, Taiwan, Republic of China 
Received February 27, 1984; revised October 9, 1984 
A VLSI computation model is presented with a time dimension in which the concept of
information transfer is made precise and memory requirements (lower bounds for $A$) and
area-period trade-offs (lower bounds for $AP^2$) are treated uniformly. By employing the
transitivities of cyclic shiftings and binary multiplication it is proved that
$AP^{2\alpha} = \Omega((\min(mn, mp, np)\, l)^{1+\alpha})$, $0 \le \alpha \le 1$, for the
problem of multiplying $m \times n$ and $n \times p$ matrices of $l$-bit elements. We also
show that $\min(mn, mp, np)\, l$ is the exact bound for chip area.
© 1985 Academic Press, Inc.
1. INTRODUCTION
In a VLSI computation, the task is distributed over various processing elements 
and the interconnecting wires are used for necessary communication. The wires 
often occupy much space so that area minimization becomes important in chip 
designs. Recent research [1, 6, 7, 10, 15-18] indicates that tradeoffs between chip
area $A$ and computation time $T$ exist for many problems.
For getting a lower bound for A, one usually considers the memory requirement 
at a certain stage of the computation. On the other hand, during the computation, 
some amount of information must be transferred from one part of the chip to the 
other due to the nature of the problem. If the amount of transferred information, 
the information content, can be measured from below then a lower bound for $AT^2$
can be derived. Various techniques for proving lower bounds can be found in the
references cited above and also in [3, 19]. Conventional VLSI models for obtaining
lower bounds lack handy ways to deal with the transient phenomena that different 
signals may occupy the same place at different time instances. This complicates the 
matter by hiding the time dimension in the models. 
In this paper, we propose an approach in which the time dimension is brought 
back into a conventional VLSI model. The concept of information content is made 
precise and memory requirements and area-time trade-offs are treated uniformly. 
Furthermore, in pipelined chips where T should be replaced by the computation 
period $P$ [18], similar results for area-period trade-offs can be derived without any
adjustment in our model. 
0022-0000/85 $3.00 
Copyright © 1985 by Academic Press, Inc.
All rights of reproduction in any form reserved. 
For problems like cyclic shiftings and binary multiplication, the information
contents are easy to measure because of the transitivities they induce. The useful
concept of transitivity was raised by Vuillemin [18] and we redefine it in a very
straightforward manner. Motivated by the fact that transitivity only occurs within a 
row or a column in the matrix multiplication problem, we introduce the notion of 
partitioned transitivity for summing up the information contents. 
For the multiplication of $m \times n$ and $n \times p$ Boolean matrices, Savage [16] showed
an area-time lower bound that holds when $(a - n)(b - n) < n^2/2$, where $a = \max(n, p)$
and $b = \max(n, m)$. Savage left the case $m, p > n$ and $(m - n)(p - n) \ge n^2/2$ as an
open question. We answer this question by proving a stronger and more complete result
$$AP^{2\alpha} = \Omega((\min(mn, mp, np)\, l)^{1+\alpha}), \qquad 0 \le \alpha \le 1,$$
where $l$ is the bit length of elements in the input matrices. It puts no constraint on
the dimensions of the matrices and goes down to the bit level of the computation.
We also show that $\min(mn, mp, np)\, l$ is the exact bound for chip area.
The result we obtain for square matrix multiplication provides indirect proofs of 
area-period trade-offs for related problems like transitive closure and matrix inver- 
sion. We also give a direct lower bound proof for the all-pair shortest-paths 
problem. 
2. THE A x T GRID MODEL 
A variety of VLSI computation models have been proposed [7, 11, 17]; they
share many features in common with the grid model which we shall use here. We 
postulate a rectangular grid in which wires run along the horizontal or vertical grid 
lines. On any grid line there can be a wire for each layer. Circuit elements, in par- 
ticular, I/O pads, contacts, and logic elements reside at the grid points. 
Although in different technologies the separation among layers and circuit
elements could be different, we can view the spacing between any two parallel grid
lines as a constant $\lambda$, and this does not affect the asymptotic complexity
analysis. The times and locations at which the input and output bits are available are
assumed to be fixed and independent of the input values. In other words, we only
consider when-and-where deterministic chips. We also assume that each input or output
bit is available only once, i.e., there is no free memory outside the chip.
Since the time dimension is hidden in such a 2D grid model, we would like to extend it
to a 3D or $A \times T$ grid model. In a 3D coordinate system, a point $(x, y, t)$ stands
for the state at the point $(x, y)$ in the 2D grid at time instance $t$. If we use the clock
time $\tau$ as the spacing in the time dimension, then $(i\lambda, j\lambda, k\tau)$, where
$i, j, k$ are nonnegative integers, represents a grid point in the $A \times T$ grid model.
A grid point $(x, y, t)$ will be called an entrance if some data bit is input to the point
$(x, y)$ at time $t$. Similarly, a grid point $(x, y, t)$ will be called an exit if some data
bit is output from the point $(x, y)$ at time $t$. Note that because any input or output bit
is available only once and the chip is when-and-where deterministic, there is a one-to-one
correspondence between input (resp. output) bits and entrances (resp. exits).
FIG. 1. Subdivision planes. (a) Dashed lines halving each $\lambda \times \lambda$ square of
the $x$-$y$ grid. (b) Dashed lines halving each $\tau$ interval in the $t$ direction.
We divide each $\lambda \times \lambda$ square in $A$ into four $\lambda/2 \times \lambda/2$
subsquares as shown by the dashed lines in Fig. 1a. In the direction of $t$ we also divide
each $\tau$ interval into two $\tau/2$ subintervals as shown in Fig. 1b. Each dashed line
stands for a vertical or horizontal plane in the $A \times T$ model, which we shall call a
subdivision plane.
A bisection surface is defined to be a connected surface which partitions the 
whole grid into two parts and any point of the surface must be on some subdivision 
plane. It follows that no grid point can lie on a bisection surface. The points to which
we pay particular attention, usually some of the exits, are called the observed
points. The main purpose of a bisection surface is to separate the observed points. If
a bisection surface separates the observed points into two equal halves then it is
called a balanced bisection surface.
FIG. 2. A typical information transfer. 
Suppose, in $A$, a signal is sent from one point to another by a wire path
$(i_1\lambda, j_1\lambda), \ldots, (i_n\lambda, j_n\lambda)$ during the period from $k\tau$
to $k'\tau$. We use, in $A \times T$, the path
$(i_1\lambda, j_1\lambda, k\tau), \ldots, (i_n\lambda, j_n\lambda, k\tau), (i_n\lambda, j_n\lambda, k'\tau)$
to represent the information transfer. A typical information transfer is shown in Fig. 2.
The vertical segment has length $(k' - k)\tau$, which is the propagation delay of the
signal. In this way, we can get around the diverse arguments about delay assumptions on
long wires as discussed in [4, 5, 14, 17].
The number of times information transfers cross a bisection surface is defined to
be the information content of the computation. It should be apparent that at most a
constant number of information transfers can pass through a bisection surface at
the same point. Therefore, if the information content is $I$ then the
area of the bisection surface is $\Omega(I)$. Usually we can only bound the information
content from below; hereafter we also denote this lower estimate by $I$.
3. INFORMATION CONTENTS AND AREA-TIME TRADE-OFFS 
We relate information contents to area lower bounds and area-time trade-offs in 
the following fundamental theorem. 
THEOREM 1. If the information content of any balanced bisection surface is $I$ then
$AT^{2\alpha} = \Omega(I^{1+\alpha})$ for $0 \le \alpha \le 1$.
Proof. When $\alpha = 0, 1$ we have the following two interesting cases:
(a) $A = \Omega(I)$ (lower bound for area)
(b) $AT^2 = \Omega(I^2)$ (area-time trade-off).
It suffices to show (a) and (b) because they readily imply
$AT^{2\alpha} = A^{1-\alpha}(AT^2)^{\alpha} = \Omega(I^{1-\alpha} I^{2\alpha}) = \Omega(I^{1+\alpha})$
for $0 \le \alpha \le 1$. We shall prove (b) and then (a).
Let the width, depth, and height of the grid be $w$, $d$, and $T$, respectively. We first
try to construct a balanced bisection surface as perpendicular to the $x$ axis as possible
and partition the grid points into two parts $P_1$ and $P_2$.
(i) Select a subdivision plane perpendicular to the $x$ axis by sliding it from left
to right. We must make sure that the number of observed points in the left
part, i.e., $0 \le x < (i + 1/2)\lambda$, does not exceed one half of the number of observed
points. Give the left part to $P_1$.
(ii) If the desired number is not reached, go to the plane $x = (i + 1)\lambda$ for more
observed points. This time we move from front to back in the $y$ axis direction
and make sure that the number of observed points in the region
$[0 \le x < (i + 1/2)\lambda] \cup ([0 \le x < (i + 3/2)\lambda] \cap [0 \le y < (j + 1/2)\lambda])$
is not greater than the desired number. Give the extra region to $P_1$; the result is
shown in Fig. 3a, b.
(iii) If the desired number is still not reached then we can cut more observed
points to $P_1$ in the pole
$[(i + 1/2)\lambda < x < (i + 3/2)\lambda] \cap [(j + 1/2)\lambda < y < (j + 3/2)\lambda]$
as shown in Fig. 3c.























FIG. 3. Constructing a bisection surface.
The area of the bisection surface is clearly no more than $dT + 3T$. This implies
$dT = \Omega(I)$. If we construct a balanced bisection surface perpendicular to the $y$ axis
instead, we have $wT = \Omega(I)$. Putting them together, we have
$AT^2 = wdT^2 = (wT)(dT) = \Omega(I^2)$. This completes the proof of (b). Similar arguments
are made for the bisection surface perpendicular to the $t$ axis (see Fig. 4). This time
we have $A = wd = \Omega(I)$, which is (a). ∎
An important measure for the time complexity of pipelined chips is the computation
period $P$. Vuillemin [18] defined $P$ as the maximal time interval between
two successive data passages at any input or output pad. In this paper we change
the definition to the average time interval. Since $P \le T$, any area-period lower
bound is automatically an area-time lower bound. The following theorem is a
stronger version of Theorem 1.
THEOREM 2. If the information content of any balanced bisection surface is $I$ then
$AP^{2\alpha} = \Omega(I^{1+\alpha})$ for $0 \le \alpha \le 1$.
FIG. 4. A bisection surface perpendicular to the $t$ axis.
Proof. For $A = \Omega(I)$ the proof is exactly the same as in Theorem 1, where we
only consider one computation. For $dP = \Omega(I)$ and $wP = \Omega(I)$ we have to consider,
say, $m$ pipelined computations. Let the total computation time be $T'$. By similar
arguments we have $dT' = \Omega(mI)$ and $wT' = \Omega(mI)$. Since $T'/m \to P$ when
$m \to \infty$, $dP = \Omega(I)$ and $wP = \Omega(I)$. ∎
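The limit $T'/m \to P$ used above can be illustrated numerically. The sketch below rests on an assumption that is not stated in the text: a chip with a fixed latency `latency` for one computation and a fixed period `period` between successive computations, so that $m$ pipelined computations take $T' = T + (m - 1)P$. The function names are hypothetical.

```python
from fractions import Fraction


def total_time(latency: int, period: int, m: int) -> int:
    """Total time T' for m pipelined computations (assumed model: T + (m - 1) * P)."""
    return latency + (m - 1) * period


def average_time(latency: int, period: int, m: int) -> Fraction:
    """Average time per computation, T'/m, as an exact fraction."""
    return Fraction(total_time(latency, period, m), m)
```

For a single computation the average equals the latency; as `m` grows it approaches the period, which is why the bounds on $dT'$ and $wT'$ turn into bounds on $dP$ and $wP$.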
4. PARTITIONED TRANSITIVITY 
From the previous section we know that information content is an effective
means to derive area lower bounds and area-period trade-offs. Vuillemin [18]
raised the useful concept of transitivity of functions for measuring information
contents. Here we want to redefine it in a very natural way. In mathematics, a
collection of mappings on a set $X$ is called transitive if, for any pair of elements
$x_1, x_2$ in $X$, there exists at least one mapping in the collection which maps $x_1$ into
$x_2$. Now in VLSI computation, for a given problem, if the values of some input variables
are properly controlled, one might be able to find that transitivity does occur
between parts of the input and output variables.
DEFINITION 1. Let $X$ be an input variable set and $Y$ be an output variable set. A
collection of one-to-one mappings, each of which maps a subset of $X$ into a subset of $Y$,
is said to be transitive if, for every $x$ in $X$ and every $y$ in $Y$, there is at least one
mapping in the collection which maps $x$ into $y$.
EXAMPLE 1 (Cyclic shiftings). We define the collection of cyclic shiftings from
$X = \{x_1, x_2, \ldots, x_m\}$ to $Y = \{y_1, y_2, \ldots, y_n\}$ as follows:
(i) If $m \ge n$, there are $m$ cyclic left shiftings $R_i$, $1 \le i \le m$, defined by
$y_j = R_i(x_{(j+i-1) \bmod m})$ for $1 \le j \le n$.
(ii) If $m < n$, there are $n$ cyclic right shiftings $R_i$, $1 \le i \le n$, defined by
$y_{(i+j-1) \bmod n} = R_i(x_j)$ for $1 \le j \le m$.
FIG. 5. Cyclic shiftings: (a) case (i); (b) case (ii).
We roughly illustrate the cyclic shifting Ri for cases (i) and (ii) in Figs. 5a and b, 
respectively. The collection of max(m, n) cyclic shiftings is certainly transitive. 
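The transitivity claim of Example 1 can be checked mechanically. The following is a minimal sketch, not the authors' construction verbatim: it uses 1-based indices and assumes that a residue of 0 in the "mod" expressions is read as the largest index ($m$ or $n$), a convention the text does not spell out. The function names are hypothetical.

```python
def cyclic_shiftings(m, n):
    """Return the max(m, n) cyclic shiftings as dicts {input index: output index}."""
    maps = []
    for i in range(1, max(m, n) + 1):
        if m >= n:
            # Left shifts: y_j receives x_{(j+i-1) mod m}; residue 0 is read as m.
            mapping = {((j + i - 2) % m) + 1: j for j in range(1, n + 1)}
        else:
            # Right shifts: x_j is sent to y_{(i+j-1) mod n}; residue 0 is read as n.
            mapping = {j: ((i + j - 2) % n) + 1 for j in range(1, m + 1)}
        maps.append(mapping)
    return maps


def is_transitive(maps, m, n):
    """Definition 1: every (x, y) pair is realized by at least one mapping."""
    covered = {(x, y) for mapping in maps for x, y in mapping.items()}
    return all((x, y) in covered
               for x in range(1, m + 1) for y in range(1, n + 1))
```

Each dictionary is one-to-one by construction, and sweeping `i` over its full range moves a fixed input across every output position, which is what makes the collection transitive.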
A transitive collection of mappings can often be induced from a given problem by
properly controlling some of its input variables. For example, consider the matrix
multiplication problem $C = AB$, where $A$, $B$, $C$ are $1 \times m$, $m \times n$, $1 \times n$
Boolean matrices. We can assign $B$ to be cyclic shift permutation matrices to induce a
collection of cyclic shiftings from $A$ to $C$. Similarly, if $A$, $B$, $C$ are $m \times n$,
$n \times 1$, $m \times 1$ Boolean matrices, we assign $A$ to be cyclic shift permutation
matrices and induce a collection of cyclic shiftings from $B$ to $C$.
EXAMPLE 2 (Binary multiplication). Write the problem of binary multiplication
as $(z_1, \ldots, z_{2n}) = (x_1, x_2, \ldots, x_n)(y_1, y_2, \ldots, y_n)$. Here we assume
$n$ is even. For each $i$, we can properly fix $y_i = 1$ and $y_j = 0$ for all $j \ne i$. This
induces a transitive collection of mappings from $\{x_1, x_2, \ldots, x_{n/2}\}$ to
$\{z_{n/2+1}, \ldots, z_n\}$.
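Example 2 amounts to the observation that multiplying by a power of two is a shift. The sketch below illustrates this under one assumption that the text leaves implicit: bit 1 is taken to be the least significant bit, so fixing $y_i = 1$ alone gives $y = 2^{i-1}$ and copies $x_j$ to $z_{j+i-1}$. The helper names are hypothetical.

```python
def induced_mapping(n, i):
    """Mapping {j: t} from x_j (1 <= j <= n/2) to z_t (n/2 < t <= n) when only y_i = 1."""
    half = n // 2
    return {j: j + i - 1 for j in range(1, half + 1)
            if half + 1 <= j + i - 1 <= n}


def agrees_with_multiplication(n, i):
    """Check, for every n-bit x, that bit t of x * 2**(i-1) equals bit j of x."""
    y = 1 << (i - 1)
    for x in range(1 << n):
        z = x * y
        for j, t in induced_mapping(n, i).items():
            if (z >> (t - 1)) & 1 != (x >> (j - 1)) & 1:
                return False
    return True


def is_transitive(n):
    """Every low-half input bit reaches every output bit z_{n/2+1..n} for some i."""
    half = n // 2
    covered = {(j, t) for i in range(1, n + 1)
               for j, t in induced_mapping(n, i).items()}
    return all((j, t) in covered
               for j in range(1, half + 1) for t in range(half + 1, n + 1))
```

The pair $(x_j, z_t)$ is realized by the choice $i = t - j + 1$, which always lies between 2 and $n$ for the index ranges used in the example.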
LEMMA 1. For a given problem, suppose there is a set of $m$ input variables and a
set of $n$ output variables such that between them a transitive collection of $f$ mappings
can be induced. Then the information content of any balanced bisection surface with
respect to the $n$ observed exits is at least $mn/2f$.
Proof. Consider any balanced bisection surface. Each input variable is mapped
into every output variable at least once. This implies that each input variable is
mapped into output variables in the other part at least $n/2$ times. Therefore, in total,
these $m$ input variables are mapped into output variables in the other part at least
$mn/2$ times. By the pigeonhole principle, there is one mapping which maps at least
$mn/2f$ input variables into the other part. Hence this particular mapping reveals at
least $mn/2f$ information transfers across the bisection surface. ∎
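The counting argument of Lemma 1 can be checked exhaustively on a small instance. The sketch below is an illustration under stated assumptions: it takes $m = n$ (with $n$ even) and the $n$ cyclic shifts as the transitive collection, uses 0-based indices, and models a balanced bisection surface simply as an assignment of inputs to two sides together with an equal split of the outputs. The function names are hypothetical.

```python
from itertools import combinations, product


def max_crossings(maps, in_side, out_side):
    """Largest number of inputs that a single mapping sends to the opposite side."""
    return max(sum(1 for x, y in mapping.items() if in_side[x] != out_side[y])
               for mapping in maps)


def check_lemma1(n):
    """Brute-force check of the bound mn/2f with m = n and f = n cyclic shifts."""
    maps = [{x: (x + s) % n for x in range(n)} for s in range(n)]
    bound = (n * n) / (2 * n)  # mn / 2f
    for in_side in product((0, 1), repeat=n):
        for left in combinations(range(n), n // 2):
            out_side = [0 if y in left else 1 for y in range(n)]
            if max_crossings(maps, in_side, out_side) < bound:
                return False
    return True
```

However the inputs are placed, exactly $n \cdot n/2$ input-output pairs straddle the surface, and they are shared among only $n$ shifts, so some shift must carry at least $n/2$ of them.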
For the special cyclic shiftings problem, $m = n$, Vuillemin [18] has proved the
area-period lower bound $AP^2 = \Omega(n^2)$. Brent and Kung [7] proved an area-time
lower bound $AT^2 = \Omega(n^2)$ for the binary multiplication problem. Here we apply
Lemma 1 and Theorem 2 to obtain stronger results as stated in the following two
theorems.
THEOREM 3. If a problem can induce a collection of cyclic shiftings from $m$ input
variables to $n$ output variables, then for this problem
$AP^{2\alpha} = \Omega((\min(m, n))^{1+\alpha})$ for $0 \le \alpha \le 1$.
Proof. As indicated in Example 1, the collection of $\max(m, n)$ cyclic shiftings is
transitive. By Lemma 1, the information content of any balanced bisection surface
is at least $mn/(2\max(m, n)) = \min(m, n)/2$. Applying Theorem 2 we have the
result. ∎
THEOREM 4. For the problem of multiplying two $n$-bit numbers,
$AP^{2\alpha} = \Omega(n^{1+\alpha})$ for $0 \le \alpha \le 1$.
Proof. A transitive collection of $n$ mappings between $n/2$ inputs and $n/2$ outputs
can be induced by the problem as indicated in Example 2. By Lemma 1 we know
that the information content is at least $(n/2 \cdot n/4)/n = n/8$. Theorem 2 gives the
area-period lower bound. ∎
Because the matrix multiplication problem can only induce transitivity inside 
each row or column but not the whole matrix, we have to generalize the concept of 
transitivity to partitioned transitivity. We also can extend Lemma 1 naturally. 
DEFINITION 2. Let $X$ be an input variable set and $Y$ be an output variable set. $X$
is partitioned into disjoint subsets $X_i$, $1 \le i \le p$, and $Y$ is partitioned into disjoint
subsets $Y_i$, $1 \le i \le p$. If a collection of one-to-one mappings is induced such that when
restricted to each $X_i$, it forms a transitive collection of mappings from $X_i$ to $Y_i$, then
we say that the problem induces partitioned transitivity.
LEMMA 2. Suppose a partitioned transitive collection of $f$ mappings is induced by
a problem. If $|X_i| = m_i$ and $|Y_i| = n_i$ for $1 \le i \le p$, and a balanced bisection surface
separates each $Y_i$ into two parts with $k_i$ ($\le n_i/2$) exits in one part, then the
information content is at least $\sum_{i=1}^{p} m_i k_i / f$.
Proof. Every input variable in $X_i$ is mapped into output variables (in $Y_i$)
belonging to the other side of the bisection surface at least $k_i$ times. Similar
arguments as in the proof of Lemma 1 can be applied to finish the proof. ∎
5. MATRIX MULTIPLICATION 
Before we can prove an area-period lower bound for the problem of matrix mul- 
tiplication we need to show an interesting combinatorial lemma. 
LEMMA 3. Suppose we arbitrarily partition an $m \times n$ matrix of points into two
parts $P_1$ and $P_2$ with $P_1$ containing $k$ points, $k \le mn/2$. If, for row $i$, there are
$r_i^1$ (resp. $r_i^2$) points belonging to $P_1$ (resp. $P_2$) and, for column $j$, there are
$c_j^1$ (resp. $c_j^2$) points belonging to $P_1$ (resp. $P_2$), then
$\sum_{i=1}^{m} r_i + \sum_{j=1}^{n} c_j \ge k/2$, where $r_i = \min(r_i^1, r_i^2)$ and
$c_j = \min(c_j^1, c_j^2)$.
Proof. In each row or column we mark the points which belong to the part with the
smaller number of points. (If the numbers are equal we freely choose one part.) We then have
$\sum_{i=1}^{m} r_i + \sum_{j=1}^{n} c_j \ge$ the number of marked points.
If every column contains at least $\lceil k/2n \rceil$ marked points, then the number of
marked points $\ge \lceil k/2n \rceil \cdot n \ge k/2$ as we desire. So let us assume that there
is a column in which at least $m - \lfloor k/2n \rfloor$ points are not marked. Let these
points be in rows $b_1, \ldots, b_u$, where $u \ge m - \lfloor k/2n \rfloor$. They must belong to
the same part, say $P_i$. Points in these rows but not in $P_i$ must be marked. There are two
cases, $i = 1$ or $2$, to be discussed:
(i) If $i = 1$ then
$$\text{number of marked points} \ge \sum_{j=1}^{u} r_{b_j}^2 \ge \sum_{j=1}^{u} (n - r_{b_j}^1)
\ge (m - \lfloor k/2n \rfloor)\, n - \sum_{j=1}^{u} r_{b_j}^1
\ge ((mn - k) + k/2) - k \ge k/2.$$
(ii) If $i = 2$ then
$$\text{number of marked points} \ge \sum_{j=1}^{u} r_{b_j}^1 \ge \sum_{j=1}^{u} (n - r_{b_j}^2)
\ge (m - \lfloor k/2n \rfloor)\, n - \sum_{j=1}^{u} r_{b_j}^2
\ge (mn - k/2) - (mn - k) \ge k/2.$$
∎
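Lemma 3 is purely combinatorial, so it can be verified by exhaustive search on small matrices. The sketch below is only a sanity check and not part of the proof; it encodes a partition as a 0/1 grid in which 1 means "belongs to $P_1$", and the function names are hypothetical.

```python
from itertools import product


def row_col_minorities(grid):
    """Sum of r_i over rows plus sum of c_j over columns for a 0/1 grid."""
    m, n = len(grid), len(grid[0])
    rows = sum(min(sum(row), n - sum(row)) for row in grid)
    col_counts = [sum(grid[i][j] for i in range(m)) for j in range(n)]
    cols = sum(min(c, m - c) for c in col_counts)
    return rows + cols


def check_lemma3(m, n):
    """Check sum(r_i) + sum(c_j) >= k/2 for every partition with k <= mn/2."""
    for cells in product((0, 1), repeat=m * n):
        k = sum(cells)  # number of points in P_1
        if 2 * k > m * n:
            continue
        grid = [cells[i * n:(i + 1) * n] for i in range(m)]
        if 2 * row_col_minorities(grid) < k:
            return False
    return True
```

A $3 \times 4$ matrix has only 4096 partitions, so the check runs instantly; larger sizes grow as $2^{mn}$ and are better left to the proof.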
We are now ready to prove the main theorem for the problem of matrix
multiplication. We define the problem as $C = AB$, where $A = (a_{ij})$, $B = (b_{ij})$,
$C = (c_{ij})$ are $m \times n$, $n \times p$, $m \times p$ matrices, respectively. Assume
elements in $A$ and $B$ have bit length $l$.
THEOREM 5. For the matrix multiplication problem,
$AP^{2\alpha} = \Omega((\min(mn, mp, np)\, l)^{1+\alpha})$ for $0 \le \alpha \le 1$.
Proof. For simplicity we assume $l$ is even. If we go down to the bit level, $A$, $B$, $C$
can be thought of as $m \times nl$, $n \times pl$, $m \times pl'$ matrices, where $l' \ge l$.
Bit $s$ of $a_{ij}$ is denoted by $a_{ij}^{(s)}$.
By properly controlling the values of $B$ we can induce a transitive collection of
mappings between row $i$ of $A$ and row $i$ of $C$, $1 \le i \le m$. What we can do is to make
each $a_{ij}^{(s)}$, $1 \le j \le n$ and $1 \le s \le l/2$, mapped into any $c_{ij'}^{(s')}$,
$1 \le j' \le p$ and $l/2 + 1 \le s' \le l$. This can be accomplished by combining the
transitivities of cyclic shiftings and binary multiplication as mentioned in the previous
section. There are in total $\max(n, p)\, l$ mappings. By Lemma 2, any balanced bisection
surface with respect to the $mpl/2$ observed points in $C$ has information content bounded
below by
$$\sum_{i=1}^{m} \frac{(nl/2)\, r_i}{\max(n, p)\, l} = \frac{\min(n, p)}{2p} \sum_{i=1}^{m} r_i,$$
where the $r_i$ are defined as in Lemma 3.
Let us denote bit slice $s$ of column $j$ of $B$ and $C$ by $B_j^{(s)}$ and $C_j^{(s)}$,
respectively. $B_j^{(s)}$ can be cyclically shifted to $C_j^{(s)}$ if the values of $A$ are
properly arranged. Here we only consider $l/2 + 1 \le s \le l$. By Lemma 2 again, the
information content of any balanced bisection surface is bounded below by
$$\sum_{j=1}^{pl/2} \frac{n\, c_j}{\max(m, n)} = \frac{\min(m, n)}{m} \sum_{j=1}^{pl/2} c_j,$$
where the $c_j$ are defined as in Lemma 3.
If we consider the matrix of points formed by $C_j^{(s)}$, $1 \le j \le p$ and
$l/2 + 1 \le s \le l$, and take $k = mpl/4$ in Lemma 3, then
$\max(\sum_{i=1}^{m} r_i, \sum_{j=1}^{pl/2} c_j) \ge mpl/16$. Therefore the
information content is at least $\min(\min(n, p)/2p, \min(m, n)/m) \cdot mpl/16$. With this
and Theorem 2 we finish the proof. ∎
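The last step of the proof leaves it to the reader to see that the stated quantity is $\Omega(\min(mn, mp, np)\, l)$. A short calculation gives $\min(n, p)/2p \cdot mpl/16 = \min(mn, mp)\, l/32$ and $\min(m, n)/m \cdot mpl/16 = \min(mp, np)\, l/16$, so the bound is at least $\min(mn, mp, np)\, l/32$. The sketch below checks this with exact arithmetic; the constant $1/32$ is derived here and is not stated in the text, and the function names are hypothetical.

```python
from fractions import Fraction


def theorem5_information_bound(m, n, p, l):
    """Lower bound on the information content as stated at the end of the proof."""
    row_factor = Fraction(min(n, p), 2 * p)   # from the row-wise transitivity
    col_factor = Fraction(min(m, n), m)       # from the column-wise transitivity
    return min(row_factor, col_factor) * Fraction(m * p * l, 16)


def target(m, n, p, l):
    """The quantity min(mn, mp, np) * l appearing in Theorem 5."""
    return min(m * n, m * p, n * p) * l
```

Comparing the two functions over a range of dimensions confirms that the bound never falls below `target(m, n, p, l) / 32`, with equality for square matrices.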
Savage [16] has proved an area-time lower bound for the matrix multiplication
problem at the matrix element level ($l = O(1)$). Even at the matrix element level our
result is more complete because it puts no constraint on the matrix dimensions. In
our notation, Savage's result can be written as
z>c !T.F 
( ( 
l-t2 max(m, n) - n)(2 max(p, n) -n 
4 2 max(m, n) max(p, n) )I 
when $(\max(m, n) - n)(\max(p, n) - n) < n^2/2$. Let us compare our result with
Savage's in more detail:
(i) When $m, p \le n$, both results imply $I = \Omega(mp)$.
(ii) When $m < n \le p$ (similarly for $p < n \le m$), both results imply $I = \Omega(mn)$.
(iii) When $n < m \le p$ (similarly for $n < p \le m$), our result gives $I = \Omega(mn)$.
Under the condition $(m - n)(p - n) < n^2/2$, Savage's result becomes
$I \ge c(n^2 - 2(m - n)(p - n))/8$. So in this case we get a better result.
When restricted to the square matrix case, $m = n = p = N$, Theorem 5 implies
$AP^{2\alpha} = \Omega((N^2 l)^{1+\alpha})$. Kung and Leiserson [9] have designed a renowned
hexagonal network for computing square matrix multiplication. In this network, if we use
the multiplier of Preparata [12] ($A = O(l)$ and $T = O(l^{1/2})$) then for the whole network
$A = O(N^2 l)$ and $T = O(N l^{1/2})$, or $AT^{2\alpha} = O((N^2 l)^{1+\alpha})$, which matches
the area-time lower bound. Preparata and Vuillemin [13] have designed a family of pipelined
chips for square matrix multiplication which achieve the area-time lower bound
with $O(\log N \cdot l^{1/2}) \le T \le O(N l^{1/2})$ if Preparata's multiplier is used.
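The claim that the hexagonal network matches the lower bound is a one-line exponent calculation: with $A = N^2 l$ and $T = N l^{1/2}$, $AT^{2\alpha} = N^{2+2\alpha} l^{1+\alpha} = (N^2 l)^{1+\alpha}$. The sketch below checks this numerically, taking all hidden constants to be 1, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math


def lower_bound(N, l, alpha):
    """(N^2 l)^(1 + alpha), the quantity in the lower bound of Theorem 5."""
    return (N * N * l) ** (1 + alpha)


def hexagonal_array_cost(N, l, alpha):
    """A * T^(2 alpha) with A = N^2 l and T = N sqrt(l), constants taken as 1."""
    area = N * N * l
    time = N * math.sqrt(l)
    return area * time ** (2 * alpha)
```

The two functions agree for every $\alpha$ in $[0, 1]$, so the upper and lower bounds coincide up to constant factors across the whole trade-off range.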
6. EXACT AREA BOUND 
In Theorem 1 or 2, if we set $\alpha = 0$ then $A = \Omega(I)$. From the proof of Theorem 5
we know that for the matrix multiplication problem $I = \Omega(\min(mn, mp, np)\, l)$. In
the following we will show that $\min(mn, mp, np)\, l$ is the exact bound for chip area.
For the case $n \ge m, p$, we design a mesh-connected network with $mp$ cells as
illustrated in Fig. 6.
FIG. 6. Network with $O(mpl)$ area.
Each cell contains a register $c_{ij}$ initialized as 0. At stage $k$, $1 \le k \le n$, the
column of input values $a_{1k}, \ldots, a_{mk}$ marches through the network leftward and
stays in each column of cells. Also the row of input values $b_{k1}, \ldots, b_{kp}$ marches
through the network upward and stays in each row of cells. Then $a_{ik} \times b_{kj}$ is
added to $c_{ij}$. After $n$ stages $c_{ij}$ has the final result.
One multiplication and one addition take place in each cell at each
stage. In order to keep the area of a cell as small as possible, we can serially input
and output the data bits and use the add-and-shift method for multiplication and
the ripple-carry method for addition. Every cell occupies $O(l)$ area and the whole
network occupies $O(mpl)$ area.
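The stage-by-stage behaviour of the network in Fig. 6 can be mimicked in software. The sketch below is a functional model only: it reproduces which products are accumulated at each stage, but not the bit-serial timing, the wiring, or the area. The function name is hypothetical.

```python
def mesh_multiply(A, B):
    """Simulate the mp-cell network: cell (i, j) accumulates a_ik * b_kj at stage k."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]          # registers c_ij, initialized as 0
    for k in range(n):                       # stage k
        a_col = [A[i][k] for i in range(m)]  # column k of A marches in
        b_row = B[k]                         # row k of B marches in
        for i in range(m):
            for j in range(p):
                C[i][j] += a_col[i] * b_row[j]
    return C
```

Each stage is a rank-one update of the $m \times p$ register array, so only $mp$ accumulators are needed however large $n$ is; this is why the design suits the case $n \ge m, p$.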
For the case $m \ge n, p$ we also design a mesh-connected network with $np$ cells as
illustrated in Fig. 7.
FIG. 7. Network with $O(npl)$ area.
This time the cells hold the values of $B$. At each stage one column of input values of
$A$ marches in leftward and stays in each column of cells first. Then one row of $C$
(initialized as 0's) marches in upward and accumulates its results. The area of the
network is $O(npl)$.
For the case $p \ge m, n$, we can similarly design a circuit with area $O(mnl)$. Since
for all cases the area lower bound $\min(mn, mp, np)\, l$ can actually be achieved, we
conclude that this bound is exact.
THEOREM 6. For the matrix multiplication problem, $A = \Theta(\min(mn, mp, np)\, l)$.
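The three cases can be summarized as a rule: keep the smallest of the three matrices resident in the cells and stream the other two. The sketch below encodes that rule. The first two cases follow Figs. 6 and 7; holding $A$ in the third case is an assumption, since the text only says the circuit is designed "similarly". The function name and the unit-constant area estimate are hypothetical.

```python
def stationary_design(m, n, p, l):
    """Return (matrix held in the cells, area estimate with constants taken as 1)."""
    if n >= m and n >= p:
        return ("C", m * p * l)   # Fig. 6: mp cells hold the c_ij
    if m >= n and m >= p:
        return ("B", n * p * l)   # Fig. 7: np cells hold the b_ij
    return ("A", m * n * l)       # assumed by symmetry: mn cells hold the a_ij
```

In every case the returned area equals $\min(mn, mp, np)\, l$, which is the matching upper bound behind Theorem 6.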
7. RELATED PROBLEMS 
As Savage pointed out in [16], both the transitive closure and the matrix inversion
problems can be reduced to the square matrix multiplication problem by the
identities given in [2, pp. 203, 242].
Theorem 5 therefore provides indirect proofs for the area-period lower bounds for
these two problems.
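One standard pair of block-matrix identities of this kind embeds a product $AB$ in the corner of a $3N \times 3N$ upper triangular matrix: the Boolean closure of $[[I, A, 0], [0, I, B], [0, 0, I]]$ is $[[I, A, AB], [0, I, B], [0, 0, I]]$, and the inverse of the same matrix over the integers is $[[I, -A, AB], [0, I, -B], [0, 0, I]]$. Whether these are exactly the forms printed in [2] is an assumption; the sketch below only verifies that the identities themselves hold. All function names are hypothetical.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def block3(A, B, corner):
    """Assemble [[I, A, corner], [0, I, B], [0, 0, I]] from N x N blocks (lists of lists)."""
    n = len(A)
    I, Z = identity(n), [[0] * n for _ in range(n)]
    block_rows = [(I, A, corner), (Z, I, B), (Z, Z, I)]
    return [sum((blk[i] for blk in row), []) for row in block_rows for i in range(n)]


def boolean_closure(M):
    """Reflexive-transitive closure by repeated Boolean squaring of (I OR M)."""
    n = len(M)
    R = [[int(bool(M[i][j]) or i == j) for j in range(n)] for i in range(n)]
    for _ in range(n.bit_length()):
        R = [[int(v > 0) for v in row] for row in matmul(R, R)]
    return R
```

Because the strictly upper triangular part $N$ of the block matrix satisfies $N^3 = 0$, both the closure and the inverse reduce to a polynomial in $N$ whose $N^2$ term is the product $AB$; a chip that computes either function on $3N \times 3N$ inputs therefore also multiplies $N \times N$ matrices.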
THEOREM 7. For the transitive closure problem, $AP^{2\alpha} = \Omega(N^{2(1+\alpha)})$ for
$0 \le \alpha \le 1$. For the matrix inversion problem,
$AP^{2\alpha} = \Omega((N^2 l)^{1+\alpha})$ for $0 \le \alpha \le 1$.
It is also easy to give direct proofs for these two problems by a method
analogous to the one for matrix multiplication. Guibas et al. [8] have designed a
VLSI network for the transitive closure problem which achieves the area-time lower
bound with $A = O(N^2)$ and $T = O(N)$.
Another related problem is the all-pair shortest-paths problem. The problem is to
compute, for each pair of vertices in a graph, the weight of the least-weight path
between them. Here we only look at the special case when every pair of vertices in
the graph of $N = 3n$ vertices is connected by an edge of weight 0 or 1. Let the cost
(Boolean) matrix be $M$ and denote the resulting matrix as $S(M)$. Then
$S(M) = (\bar{M})^*$, where $\bar{M}$ represents the complementary matrix of $M$. By substituting
special matrices for M, we have the following identity: 
where C=a. 
Now, if $A$ is fixed to be cyclic permutation matrices, a transitive
collection of cyclic shiftings is induced from each column in $\bar{B}$ to its corresponding
column in $C$. The same property holds for every row of $C$ when $B$ is fixed to be cyclic
permutation matrices. By applying Lemma 2 and Theorem 2, we obtain the area-period lower
bound for the all-pair shortest-paths problem.
THEOREM 8. For the all-pair shortest-paths problem,
$AP^{2\alpha} = \Omega(N^{2(1+\alpha)})$ for $0 \le \alpha \le 1$.
The network for transitive closure mentioned above can be modified by using
adders described in [12] ($A = O(l)$ and $T = O(l^{1/2})$) and comparators built by the same
design method to solve the all-pair shortest-paths problem with
$AT^{2\alpha} = O((N^2 l)^{1+\alpha})$. We conjecture that this is also the lower bound.
ACKNOWLEDGMENT 
The authors gratefully acknowledge the insightful comments and valuable suggestions of referee B. 
REFERENCES 
1. H. ABELSON AND P. ANDREAE, Information transfer and area-time trade-offs for VLSI
multiplication, Comm. ACM 23, No. 1 (1980), 20-23.
2. A. V. AHO, J. E. HOPCROFT, AND J. D. ULLMAN, “The Design and Analysis of Computer 
Algorithms,” Addison-Wesley, Menlo Park, Calif., 1974. 
3. A. V. AHO, J. D. ULLMAN, AND M. YANNAKAKIS, On notions of information transfer in VLSI cir- 
cuits, in “Proc. 15th Annual ACM Sympos. on Theory of Computing,” Boston, Mass., April 1983, 
pp. 133-139. 
4. G. BILARDI, M. PRACCHI, AND F. P. PREPARATA, A critique of network speed in VLSI models of
computation, IEEE J. Solid-State Circuits SC-17, No. 4 (1982), 696-702.
5. B. CHAZELLE AND L. MONIER, A model of computation for VLSI with related complexity results, in 
“Proc. 13th Annual ACM Sympos. on Theory of Computing,” Milwaukee, WI, May 1981, pp. 
318-325. 
6. R. P. BRENT AND L. M. GOLDSCHLAGER, Some area-time tradeoffs for VLSI, SIAM J. Comput. 11,
No. 4 (1982), 737-747.
7. R. P. BRENT AND H. T. KUNG, The area-time complexity of binary multiplication, J. Assoc. Comput.
Mach. 28, No. 3 (1981), 521-534.
8. L. J. GUIBAS, H. T. KUNG, AND C. D. THOMPSON, Direct VLSI implementation of combinatorial 
algorithms, in “Conf. on VLSI Technical Design and Fabrication,” California Institute of 
Technology, 1979, pp. 509-525. 
9. H. T. KUNG AND C. E. LEISERSON, Systolic arrays for VLSI, in “Introduction to VLSI Systems,” Sec- 
tion 8.3, Addison-Wesley, Menlo Park, Calif., 1980. 
10. R. J. LIPTON AND R. SEDGEWICK, Lower bounds for VLSI, in "Proc. 13th Annual ACM Sympos. on
Theory of Computing," Milwaukee, Wisc., May 1981, pp. 300-307.
11. C. A. MEAD AND L. A. CONWAY, “Introduction to VLSI Systems,” Addison-Wesley, Menlo Park, 
Calif., 1980. 
12. F. P. PREPARATA, A mesh-connected area-time optimal VLSI multiplier of large integers, IEEE 
Trans. Comput. C-32, No. 2 (1983), 194-198. 
13. F. P. PREPARATA AND J. E. VUILLEMIN, Area-time optimal VLSI network for parallel matrix
multiplication, Inform. Process. Lett. 11 (1980), 77-80.
14. V. RAMACHANDRAN, On driving many long lines in a VLSI layout, in “Proc. 23rd Annual Sympos. 
on Foundations of Computer Science,” Chicago, Il., November 1982, pp. 369-372. 
15. J. E. SAVAGE, “Planar Circuit Complexity and the Performance of VLSI Algorithms,” Technical 
Report No. CS-69, INRIA, Rocquencourt, 1981. 
16. J. E. SAVAGE, Area-time tradeoffs for matrix multiplication and related problems in VLSI models, J.
Comput. System Sci. 22, No. 2 (1981), 230-242.
17. C. D. THOMPSON, “A Complexity Theory for VLSI,” Ph.D. thesis, Carnegie-Mellon University, 
1980. 
18. J. E. VUILLEMIN, A combinatorial limit to the computing power of VLSI circuits, IEEE Trans.
Comput. C-32, No. 3 (1983), 294-300.
19. A. C. YAO, The entropic limitations on VLSI computations, in "Proc. 13th Annual ACM Sympos.
on Theory of Computing," Milwaukee, Wisc., May 1981, pp. 308-311.
