



College of Engineering
Applied Computation Theory
ON PROBLEM TRANSFORMABILITY IN VLSI




UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Approved for Public Release. Distribution Unlimited.
REPORT DOCUMENTATION PAGE
Security classification of this page: Unclassified
1a. Report security classification: Unclassified
1b. Restrictive markings: None
2a. Security classification authority: N/A
2b. Declassification/downgrading schedule: N/A
3. Distribution/availability of report: Approved for public release, distribution unlimited.
4. Performing organization report number(s): UILU-ENG-86-2204 (ACT-67)
5. Monitoring organization report number(s): N/A
6a. Name of performing organization: Coordinated Science Laboratory, Univ. of Illinois
6b. Office symbol (if applicable): N/A
6c. Address (City, State and ZIP Code): 1101 W. Springfield Avenue, Urbana, Illinois 61801
7a. Name of monitoring organization: Semiconductor Research Corporation
7b. Address (City, State and ZIP Code): P.O. Box 12053, Research Triangle Park, NC 27709
8a. Name of funding/sponsoring organization: Semiconductor Research Corporation
8b. Office symbol (if applicable): N/A
8c. Address (City, State and ZIP Code): P.O. Box 12053, Research Triangle Park, NC 27709
9. Procurement instrument identification number: SRC-RSCH-84-06-049-6
10. Source of funding nos.: Program element no. N/A; Project no. N/A; Task no. N/A; Work unit no. N/A
11. Title (include security classification): On Problem Transformability in VLSI
12. Personal author(s): Hornick, Scot and Sarrafzadeh, Majid
13a. Type of report: Interim; Technical, final
13b. Time covered: From    To
14. Date of report (Yr., Mo., Day): February 1986
15. Page count: 21
16. Supplementary notation: N/A
17. COSATI codes (Field, Group, Sub. Gr.):
18. Subject terms (continue on reverse if necessary and identify by block number):
VLSI model of computation, area-time tradeoff, lower bound,
problem transformation, computational prototype
19. Abstract:
The two basic performance parameters that capture the complexity of any VLSI chip are
the area of the chip, $A$, and the computation time, $T$. A systematic approach for establishing
lower bounds on $A$ is presented. This approach relates $A$ to the bisection flow, $\phi$. A
theory of problem transformation based on $\phi$, which captures both $AT^2$ and $A$ complexity, is
developed. A fundamental problem, namely, element uniqueness, is chosen as a computational
prototype. It is shown under general I/O protocol assumptions that any chip that decides
if the $n$ elements of a list (each with $(1+\epsilon)\log n$ bits) are unique must have
$\phi = \Omega(n \log n)$, and thus $AT^2 = \Omega(n^2 \log^2 n)$ and $A = \Omega(n \log n)$. A theory of
VLSI transformability reveals the inherent $AT^2$ and $A$ complexity of a large class of related problems.
20. Distribution/availability of abstract: [X] Unclassified/unlimited  [ ] Same as rpt.  [ ] DTIC users
21. Abstract security classification: Unclassified
22a. Name of responsible individual:
22b. Telephone number (include area code):
22c. Office symbol:
DD FORM 1473, 83 APR. Edition of 1 Jan 73 is obsolete. Unclassified
Security classification of this page
On Problem Transformability in VLSI*
Scot Hornick and Majid Sarrafzadeh
Coordinated Science Laboratory and 
Department of Electrical and Computer Engineering 
University of Illinois 
Urbana, IL 61801
Abstract: The two basic performance parameters that capture the complexity of any VLSI chip are
the area of the chip, $A$, and the computation time, $T$. A systematic approach for establishing lower
bounds on $A$ is presented. This approach relates $A$ to the bisection flow, $\phi$. A theory of problem
transformation based on $\phi$, which captures both $AT^2$ and $A$ complexity, is developed. A fundamental
problem, namely, element uniqueness, is chosen as a computational prototype. It is shown
under general I/O protocol assumptions that any chip that decides if the $n$ elements of a list (each with
$(1+\epsilon)\log n$ bits) are unique must have $\phi = \Omega(n \log n)$, and thus $AT^2 = \Omega(n^2 \log^2 n)$ and
$A = \Omega(n \log n)$. A theory of VLSI transformability reveals the inherent $AT^2$ and $A$ complexity of a
large class of related problems.
Key words: VLSI model of computation, area-time tradeoff, lower bound, problem transformation, 
computational prototype.
*This work was supported in part by the Semiconductor Research Corporation under contract RSCH 84-06-049-6.
1. Introduction
In the study of complexity theory, a fundamental problem, normally referred to as a computational
prototype, is chosen as the representative of a class of related problems. Establishing a
lower bound on some significant performance parameter of a computational prototype has always
been a difficult task, but once it is accomplished, the same bound for the rest of the problems in the
class is established by means of problem transformation. Employment of a computational prototype
is now classical: the most well-known examples are satisfiability in the theory of
NP-completeness [GJ] and element uniqueness in the RAM model [PS].
Recent improvements in fabrication technology have made VLSI an attractive computation
environment. The new challenge is the exploitation of the properties of VLSI to build efficient and
effective computational structures. In the VLSI model of computation as formulated by
[T, BK, AA], the fundamental complexity measures are $A$, the area of the VLSI chip, and $T$, its
computation time. VLSI computation theory addresses the problem of using these two resources in an
optimal (or efficient) manner. In order to establish criteria of optimality, research is often directed
at proving lower bounds on area, time, or various functions that capture an area-time tradeoff, e.g.,
$AT^2$. Standard techniques exist for proving lower bounds on $T$ and $AT^2$: they are based on
bounded fan-in arguments (in the case of $T$) and on information flow arguments (in the case of
$AT^2$) [T, BP]. In this paper, we will present a standard technique for proving lower bounds on $A$.
This technique is very similar to Thompson's bisection flow technique. Indeed, we will show that a
lower bound on the bisection flow for a particular computation immediately implies a lower bound
on the area of any chip that performs the computation (subject to appropriate input/output protocol
constraints).
To establish a lower bound on the bisection flow for a problem $\Pi$, there are two ways to
proceed. The traditional approach is to essentially start from scratch, without taking advantage of
previously derived lower bounds. A different approach is to utilize facts already known about
another problem and show, by means of problem transformation, that $\Pi$ is at least as hard as this
problem. Until now, the first technique has been used almost exclusively; the second approach has
been used only in trivial situations, for example, to observe that inverting an arbitrary matrix is at
least as hard as inverting a triangular matrix. Our goal is to establish a framework in which the
second technique, that is, problem transformation, can be efficiently employed. This framework
can be used to establish nontrivial lower bounds for a large class of related problems.
This paper is organized as follows. In Section 2, we modify the bisection flow technique of
Thompson to lower bound $A$ instead of $\sqrt{A}\,T$. We investigate the duality of area and time in these
lower bounds and show how, under this duality, a $\sqrt{A}\,T$ (i.e., $AT^2$) lower bound, obtained by
bisection flow arguments, implies an $A$ lower bound. In Section 3, we develop a theory of problem
transformation in VLSI that is based on the bisection flow. A computational prototype, namely,
element uniqueness, is introduced and nontrivial lower bounds on the bisection flow for this problem
are established. Finally, in Section 4, these results are integrated to establish nontrivial $AT^2$
and $A$ lower bounds for a large class of problems.
2. Lower Bounds Using Bisection Flow
Thompson, in his seminal thesis [T], proposed a now classical technique for analyzing VLSI
complexity, as follows. Consider a problem $\Pi(s)$, where $s$ is the input size, and a chip $C_\Pi$ with area
$A$ that is capable of solving $\Pi$ in time $T$. Let $l$ be a cut that partitions $C_\Pi$ into a left side (L) and a
right side (R), such that each side reads (almost) half of the inputs, i.e., $s/2 - o(s)$ bits, as shown in
Figure 1a. The general framework is one in which two processors, $P_L$ and $P_R$, associated respectively
with L and R, cooperate to solve $\Pi(s)$ (see Figure 1b). We denote by $\phi_\Pi(s)$ the number of
bits that $P_L$ and $P_R$ communicate to solve $\Pi(s)$. As Ullman noted [U], the history of the computation
performed by $C_\Pi$ can be modeled with an area-time solid, as shown in Figure 2. The communication
channel between $P_L$ and $P_R$ is represented by the rectangle $F$ (indicated by the dashed line)
that transects the longer of the two area dimensions. Thus, $F$ has sides of length $T$ and (at most)
$\sqrt{A}$; so $A_F$, the area of $F$, is at most $\sqrt{A}\,T$. If $\phi_\Pi(s)$ bits must flow across this channel, then
$A_F = \Omega(\phi_\Pi(s))$. Hence, we obtain:
$$\sqrt{A}\,T = \Omega(\phi_\Pi(s)), \tag{1}$$
or, equivalently,
$$AT^2 = \Omega(\phi_\Pi^2(s)). \tag{2}$$
Figure 1. (a) Bisection of a chip; (b) a spatial two-processor system
Figure 2. Area-time solid with spatial bisection of inputs
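As a worked restatement of the cut argument (a sketch only, with constant factors suppressed and assuming, as the text does, that each bit crossing $F$ occupies a constant amount of the rectangle), the chain of relations behind Equations (1) and (2) can be written as:

```latex
\begin{align*}
\phi_\Pi(s) &= O(A_F)
  && \text{(every bit of the flow crosses the rectangle $F$)} \\
A_F &\le \sqrt{A}\,T
  && \text{($F$ has sides of length at most $\sqrt{A}$ and $T$)} \\
\Longrightarrow\quad \sqrt{A}\,T &= \Omega(\phi_\Pi(s)) \\
\Longrightarrow\quad AT^2 &= \Omega(\phi_\Pi^2(s))
  && \text{(squaring both sides)}
\end{align*}
```

For instance, a flow bound of $\phi_\Pi(s) = \Omega(n \log n)$ would give $AT^2 = \Omega(n^2 \log^2 n)$.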
Lower bounds on chip area have been obtained for a number of specific problems, e.g.,
[BK,BP,L,S,DS\ r]. However, unlike $AT^2$ lower bounds, for which Thompson's thesis gives us a
standard proof technique, $A$ lower bounds are usually proven with involved ad hoc arguments.
Until now, general results on area were known only for 0/1 output functions [Y2] and for transitive
functions [V]. Here, we generalize both of these results and present a new methodology for
proving area lower bounds.
Again, we consider the area-time solid that models the computational history of $C_\Pi$. Suppose
there is a time $t_1$ at which $C_\Pi$ has read (almost) half of the inputs, i.e., $s/2 - o(s)$ bits. Let $F$
(indicated by the dashed line) be the rectangular intersection of the plane $t = t_1$ with the area-time solid,
as shown in Figure 3. Clearly, $A_F = A$.
This bisection also yields a two-processor system. Here, $P_B$ and $P_E$, associated respectively with the
beginning ($0 \le t \le t_1$) and end ($t_1 < t \le T$) of the computation of $C_\Pi$, cooperate to solve $\Pi(s)$ (see
Figure 4). We denote by $\psi_\Pi(s)$ the number of bits that $P_B$ and $P_E$ communicate to solve $\Pi(s)$.
Because the electrical circuitry of the chip must be causal, information cannot flow backwards in
time, and so this communication is strictly one-way, from $P_B$ to $P_E$.
Figure 3. Area-time solid with temporal bisection of inputs
Figure 4. A temporal two-processor system ($s/2$ bits in each of $P_B$ and $P_E$)
We can now state the following theorem relating $A$ to $\psi_\Pi(s)$.
Theorem 1: Any chip that solves $\Pi(s)$ must have area satisfying
$$A = \Omega(\psi_\Pi(s)). \tag{3}$$
Proof: As above, let us first assume that there is in fact a time $t_1$ when $s/2 - o(s)$ bits have been
read. Then the rectangle $F$ (see Figure 3) represents the communication channel from $P_B$ to $P_E$. All
information that crosses $F$ must be encoded in the chip's state (i.e., stored in its memory) at time $t_1$.
Since the storage of a bit requires some constant amount of area under any realistic assumptions,
$$A = A_F = \Omega(\psi_\Pi(s)).$$
Now, if there is no such time $t_1$, then at some instant $\Omega(s)$ bits must be read simultaneously.
This requires the existence of $\Omega(s)$ input ports, which would occupy $\Omega(s)$ area. Thus, $A = \Omega(s)$ in
this case. But $\psi_\Pi(s) \le s/2$, since $P_B$ can simply send all of its inputs to $P_E$. Therefore, in this case,
we also have $A = \Omega(\psi_\Pi(s))$. □
The above theorem gives us a convenient relationship between the area complexity of a VLSI
chip and the one-way communication complexity of a two-processor system. However, because
two-way communication complexity is the measure of interest in the proof of $AT^2$ lower bounds, it
is convenient to relate area to this measure also. If we denote by $\hat{\psi}_\Pi(s)$ the number of bits that $P_B$
and $P_E$ must communicate to solve $\Pi(s)$ when two-way communication is allowed, then obviously
$\hat{\psi}_\Pi(s) \le \psi_\Pi(s)$. Thus, we have the following corollary.
Corollary 1: Any chip that solves $\Pi(s)$ must have area satisfying
$$A = \Omega(\hat{\psi}_\Pi(s)). \tag{4}$$
Although this bound may in general be quite weak, we will find it sufficiently tight for many problems.
Input/output protocol constraints are often established in VLSI computation theory. Such
constraints reflect realistic assumptions regarding the physical structure of VLSI chips and the
computing environments in which these chips might be used, simplify the combinatorics involved
in the lower bound arguments, and avoid redundant solutions. Here, we will investigate the
dual roles played by area and time in the proof of these lower bounds. In particular, we will show
how a spatial constraint on the input/output protocol, which may be used to bound $\phi_\Pi(s)$,
corresponds to a temporal constraint, which may be used to bound $\psi_\Pi(s)$. The fundamental
observation here is that $\phi$ ($\psi$) depends only on the distribution of input/output variables between $P_L$
and $P_R$ ($P_B$ and $P_E$), and that the class of allowed distributions is governed by the spatial
(temporal) input/output protocol constraints.
We begin by summarizing typical input/output protocol constraints. For the purpose of this
discussion, we will assume that the input is organized as $n$ words, each with $k$ bits. First, we have
spatial constraints:
(A1) Unilocal: Each input/output bit is available at only one port (but perhaps at several time
instances);
(A2) Place-determinate: Input/output data are available at a prespecified (instance-independent)
place;
(A3) Word-local: For any cut $l$ partitioning the chip, $o(n)$ input (output) words enter (exit) the
chip on both sides of $l$;
(A4) Bit-local: For any cut $l$ partitioning the chip, $o(k)$ input (output) bit positions enter (exit)
the chip on both sides of $l$.
Second, we have temporal constraints:
(B1) Semellective: Each input/output bit is available at only one time instance (but perhaps at
several ports);
(B2) Time-determinate: Input/output data are available at a prespecified (instance-independent)
time;
(B3) Word-serial: At any time instance, at most one input (output) word has some, but not all,
of its bits already read (written);
(B4) Word-parallel: At any time instance, for all but at most one $l$, either all or none of the $l$th
significant bits of the input (output) words are already read (written).
When A1 and A2 (B1 and B2) are the only protocol constraints extant, the protocol is said to be 
non-word-local (non-word-serial).
Now, we will discuss the manner in which these constraints restrict the class of distributions
of input variables allowed in the two-processor system. Constraint A1 ensures that any particular
input/output bit resides in either $P_L$ or $P_R$, but not both. Correspondingly, constraint B1 ensures
that any particular input/output bit resides in either $P_B$ or $P_E$, but not both. Constraint A2 (or B2)
ensures that, for all problem instances of a given input size, any particular input/output bit resides
always in the same processor. Constraint A3 distributes the input/output bits between $P_L$ and $P_R$
essentially by word (possibly with $o(n)$ words fragmented across processors). Constraint B3
corresponds to A3 but is somewhat stronger. It distributes the input/output bits between $P_B$ and
$P_E$ also by word (with $O(1)$ words fragmented across processors). Constraint A4 distributes the
input/output bits between $P_L$ and $P_R$ essentially by their position in their respective words
(possibly with $o(k)$ positions fragmented across processors). Constraint B4, similar but stronger,
distributes the input/output bits between $P_B$ and $P_E$ also by bit position (with $O(1)$ positions fragmented
across processors). Because of this correspondence (see Figure 5), any theorem lower bounding $\phi$
(and hence $\sqrt{A}\,T$) that is predicated on some combination of A1-A4 immediately yields a theorem
lower bounding $\psi$ (and hence $A$) that is predicated on a corresponding combination of B1-B4.
Hereafter, we will use the notation of the spatial two-processor system to establish lower bounds
on $\phi$. From the previous discussion, it is clear that the same arguments can be used to establish
lower bounds on $\psi$.













Figure 5. Table summarizing correspondence between 
spatial and temporal constraints
3. Transformability in VLSI
In this section, we will develop a general theory for establishing lower bounds on $AT^2$ and $A$.
In Section 2, it was shown that the bisection flow fully captures both the $AT^2$ and the $A$ measure of
complexity.
Following the notation of Preparata-Shamos [PS], consider two problems $\Pi_1(s_1)$ and $\Pi_2(s_2)$,
and assume that a two-processor system $P_{\Pi_1}(s_1)$ is available that solves $\Pi_1(s_1)$. Problem $\Pi_2(s_2)$ can
be solved as follows.
1) The input to problem $\Pi_2(s_2)$ is converted into a suitable input to problem $\Pi_1(s_1)$.
2) $P_{\Pi_1}(s_1)$ is used to solve $\Pi_1(s_1)$.
3) The output of $\Pi_1(s_1)$ is transformed into a solution to problem $\Pi_2(s_2)$.
Thus, it is said that problem $\Pi_2(s_2)$ has been transformed to problem $\Pi_1(s_1)$. If steps 1 and 3
(above) can be done by transmitting $\phi_{2,1}(s_2)$ bits between the two processors in $P_{\Pi_1}(s_1)$, then $\Pi_2(s_2)$
is said to be $\phi_{2,1}(s_2)$-transformable to $\Pi_1(s_1)$; we write:
$$\Pi_2(s_2) \xrightarrow{\ \phi_{2,1}(s_2)\ } \Pi_1(s_1).$$
Proposition: If problem $\Pi_2(s_2)$ is known to require $\phi_{\Pi_2}(s_2)$ bits of information flow and $\Pi_2(s_2)$ is
$\phi_{2,1}(s_2)$-transformable to $\Pi_1(s_1)$, then $\Pi_1(s_1)$ requires information flow of at least
$\phi_{\Pi_2}(s_2) - \phi_{2,1}(s_2)$ bits in the two-processor system associated with $\Pi_1(s_1)$.
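The proposition follows from a short contradiction-style argument, sketched here under the definitions above. The symbol $f$, denoting the flow of a hypothetical two-processor system for $\Pi_1(s_1)$, is introduced only for this illustration.

```latex
\begin{align*}
&\text{Suppose some two-processor system solves } \Pi_1(s_1) \text{ with flow } f. \\
&\text{Composing it with steps 1 and 3 solves } \Pi_2(s_2)
  \text{ with flow at most } f + \phi_{2,1}(s_2). \\
&\text{Since } \Pi_2(s_2) \text{ requires } \phi_{\Pi_2}(s_2) \text{ bits:}
  \qquad f + \phi_{2,1}(s_2) \;\ge\; \phi_{\Pi_2}(s_2) \\
&\Longrightarrow\qquad f \;\ge\; \phi_{\Pi_2}(s_2) - \phi_{2,1}(s_2).
\end{align*}
```

In particular, when $\phi_{2,1}(s_2) = o(\phi_{\Pi_2}(s_2))$, the bound for $\Pi_1$ matches that of $\Pi_2$ up to lower-order terms.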
Now we need to search for a problem $\Pi(s)$ for which we can establish a lower bound of $\phi_\Pi(s)$
on the information flow and a transformation
$$\Pi(s) \xrightarrow{\ o(\phi_\Pi(s))\ } \Pi'(s')$$
for many related problems $\Pi'(s')$.
$\Pi(s)$ then serves as a computational prototype for this class of related problems. A good
computational prototype for a complexity class must be a simple problem, which makes it difficult to
establish a lower bound on its bisection flow complexity. Indeed, this is the case for computational
prototypes in other models of computation (e.g., satisfiability in the theory of NP-completeness). In
this paper, we choose element uniqueness (EU) as a computational prototype.
EU(n,h): Given $n$ inputs $(x_1, \ldots, x_n)$, each of which is represented with $h + \log n - 1$ bits, decide if they
are all unique ($h \ge 1$, otherwise the problem is trivial). By convention, if they are all
unique, then the output (one bit) is 1, otherwise the output is 0.
The following framework will be used to establish a lower bound on the communication
complexity of any two-processor system that solves EU(n,h). Consider a decision problem $\Pi(s)$, where
$s$ is the number of input bits, and let $P_\Pi(s)$ be a two-processor system that solves $\Pi(s)$. Consider a
matrix $M_\Pi(s)$, called the result matrix of $\Pi(s)$, with all $2^{s/2}$ possible values of inputs in $P_L$ as its
row indices and all $2^{s/2}$ possible values of inputs in $P_R$ as its column indices. The $(i,j)$-th entry of
the matrix is the output of $\Pi(s)$ when its input corresponds to the values of $i$ and $j$. It has been
shown by [Y1, MS]:
$$\phi_\Pi(s) = \lceil \log \operatorname{rank}(M_\Pi(s)) \rceil \tag{5}$$
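To make the result-matrix framework concrete, the following Python sketch builds the result matrix of a tiny element uniqueness instance and evaluates the right-hand side of Equation 5. It is an illustration only: the function names are hypothetical, the words are split evenly between $P_L$ and $P_R$ by assumption, and the rank is computed over the rationals with exact arithmetic.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Rank over the rationals via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def eu_result_matrix(n, bits):
    """Result matrix of EU on n words of `bits` bits each.

    Assumption for this sketch: P_L holds the first n/2 words, P_R the rest.
    Entry is 1 when all n words are distinct, 0 otherwise.
    """
    half = n // 2
    sides = list(product(range(2 ** bits), repeat=half))
    return [[1 if len(set(left + right)) == n else 0 for right in sides]
            for left in sides]


def log_rank_bound(matrix):
    """ceil(log2(rank)), the quantity on the right-hand side of Equation 5."""
    rank = matrix_rank(matrix)
    return (rank - 1).bit_length() if rank > 0 else 0
```

For two 2-bit words the result matrix is the 4 x 4 all-ones matrix with a zero diagonal, which has full rank, so `log_rank_bound(eu_result_matrix(2, 2))` evaluates to 2.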
In what follows, we show that $\phi_{EU}(n,h) = \Omega(nh)$ under the word-local protocol (Lemma 1)
and also under the bit-local protocol (Lemma 2). The two results will be combined in Theorem 2 to
show the same lower bound for EU under the non-word-local protocol. Consider the input data
organized as an array, with each word constituting a row and with the bit positions aligned as
columns. We begin by partitioning the input array as $X = [M, D]$, where $M$ (the matching part) and
$D$ (the data) are blocks of $\log n - 1$ and $h$ columns, respectively.
The bits of $M$ will be used to enforce an appropriate matching of the input words, which will be
specified later. Subsequently, we will be concerned only with the information flow induced by $D$,
and all bisection arguments will be based on the bits of $D$.
Lemma 1: Under the word-local protocol assumption (A3), $\phi_{EU}(n,h) = \Omega(nh)$.
Proof: The proof is based on a restriction of element uniqueness to pair-wise element uniqueness.
Without loss of generality, we assume that $d_i$ enters $P_L$ for $0 \le i < n/2$ and enters $P_R$ for
$n/2 \le i < n$. We will prove a lower bound on the flow by considering the restricted class of input
assignments such that:
$$m_i = i, \quad \text{for } 0 \le i < \frac{n}{2}, \text{ and}$$
$$m_i = i - \frac{n}{2}, \quad \text{for } \frac{n}{2} \le i < n.$$
In essence, we have partitioned the inputs into $n/2$ pairs, where each pair contains $d_i$ and $d_{i+n/2}$ for
$0 \le i < n/2$. The two members of each pair are in different processors (one in $P_L$, the other in $P_R$).
Thus, the elements are not unique (output = 0) if $d_i = d_{i+n/2}$ for any $0 \le i < n/2$. It can be shown
by a generalization of the argument in [MS] that the result matrix has full rank ($2^{nh/2}$), and thus
the flow has a bound of $\Omega(nh)$ [GLTWZ]. □
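The full-rank claim in the proof of Lemma 1 can be checked numerically for small parameters. The Python sketch below is not the generalized argument of [MS]; it simply builds the result matrix of the pair-wise restricted problem (entry 1 when $d_i \ne d_{i+n/2}$ for every pair) and computes its rank exactly. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Rank over the rationals via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def pairwise_result_matrix(n, h):
    """Result matrix of pair-wise element uniqueness on the data part D.

    Rows index the n/2 data words (h bits each) held by P_L, columns those
    held by P_R; the i-th word on the left is paired with the i-th on the right.
    """
    half = n // 2
    sides = list(product(range(2 ** h), repeat=half))
    return [[1 if all(a != b for a, b in zip(left, right)) else 0
             for right in sides]
            for left in sides]


def has_full_rank(n, h):
    """True when the rank equals 2^(n*h/2), the number of rows."""
    return matrix_rank(pairwise_result_matrix(n, h)) == 2 ** (n * h // 2)
```

For example, `has_full_rank(4, 2)` checks a 16 x 16 matrix, which is the Kronecker product of two copies of the 4 x 4 "not equal" matrix.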
Under the word-local assumption, each bit of a given input word enters the same processor
($P_L$ or $P_R$). Now, we consider the "opposite case," where half of the input bit positions are assigned
to each processor.
Lemma 2: Under the bit-local protocol assumption (A4), $\phi_{EU}(n,h) = \Omega(nh)$, for $h = O(\log n)$.
Proof: The fragment of $d_i$ in $P_L$ ($P_R$) is denoted by $d_i^L$ ($d_i^R$). As before, we restrict our attention to
a particular class of inputs, one that forces a pairing of a word fragment in $P_L$ with one in $P_R$.
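We partition $D$ into $n/H$ groups of $H$ elements each, where $H = 2^{h/2+1}$. In the $j$th group
($0 \le j \le n/H - 1$), let:
$$m_{jH+i} = j, \quad \text{for } 0 \le i < H,$$
$$d^L_{jH+i} = d^R_{jH+i} = i, \quad \text{for } 0 \le i < H/2, \text{ and}$$
$$d^L_{jH+i} = \pi_j(i - H/2), \quad d^R_{jH+i} = \sigma_j(i - H/2), \quad \text{for } H/2 \le i < H,$$
where $\pi_j$ is an arbitrary permutation of $\{0, \ldots, H/2-1\}$ and $\sigma_j$ is an arbitrary mapping
of $\{0, \ldots, H/2-1\}$ into itself. The words of the $j$th group are therefore:
$$\begin{array}{l|ccc}
 & d^L & d^R & m \\ \hline
d_{jH} & 0 & 0 & j \\
d_{jH+1} & 1 & 1 & j \\
\vdots & \vdots & \vdots & \vdots \\
d_{jH+H/2-1} & 2^{h/2}-1 & 2^{h/2}-1 & j \\
d_{jH+H/2} & \pi_j(0) & \sigma_j(0) & j \\
\vdots & \vdots & \vdots & \vdots \\
d_{jH+H-1} & \pi_j(2^{h/2}-1) & \sigma_j(2^{h/2}-1) & j
\end{array}$$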
In this setting, the elements are not unique (output = 0) if $\pi_j(i) = \sigma_j(i)$ for any $0 \le i < H/2$ and any $j$.
There are $(H/2)!$ permutations $\pi_j$ for any $j$. The result matrix for any one group is the submatrix
obtained by deleting certain rows from the matrix introduced in Lemma 1. Thus, it also has
full rank, i.e., $(H/2)!$. The overall result matrix is the Kronecker product of the group matrices,
and its rank is therefore the product of the ranks of these matrices:
$$\prod_{j=0}^{\lfloor n/H \rfloor - 1} (H/2)!\,.$$
From Equation 5, we can establish the desired bound on the flow:
$$\phi = \log \prod_{j=0}^{\lfloor n/H \rfloor - 1} (H/2)! = \sum_{j=0}^{\lfloor n/H \rfloor - 1} \log (H/2)! = \lfloor n/H \rfloor \cdot \Omega\!\left(\tfrac{H}{2} \log \tfrac{H}{2}\right) = \Omega(n \log H),$$
and, because $H = 2^{h/2+1}$, $\phi = \Omega(nh)$. □
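The arithmetic at the end of the proof of Lemma 2 can be evaluated for concrete parameters. The sketch below (hypothetical function names; it assumes $h$ is even so that $h/2$ is an integer, and $H \le n$ so that at least one group exists) computes the logarithm of the rank $((H/2)!)^{\lfloor n/H \rfloor}$ and compares it with $nh$.

```python
import math


def lemma2_log_rank(n, h):
    """log2 of the rank ((H/2)!)^floor(n/H), with H = 2^(h/2 + 1).

    Assumes h is even; floor(n/H) groups of H words are formed.
    """
    H = 2 ** (h // 2 + 1)
    groups = n // H
    return groups * math.log2(math.factorial(H // 2))


def ratio_to_nh(n, h):
    """Ratio of the log-rank bound to n*h; Lemma 2 says it stays bounded below."""
    return lemma2_log_rank(n, h) / (n * h)


if __name__ == "__main__":
    # Example parameters chosen only for illustration.
    for n, h in [(64, 2), (64, 4), (2 ** 16, 16)]:
        print(n, h, lemma2_log_rank(n, h), ratio_to_nh(n, h))
```

With $n = 64$ and $h = 4$, for example, $H = 8$, there are 8 groups, and the bound is $8 \log_2 4! = 8 \log_2 24$ bits.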
Now we will extend the results of Lemmas 1 and 2 to the non-word-local protocol assumption.
In this situation, any bit of any word may enter either of the two processors.
Theorem 2: Under the non-word-local protocol assumption, $\phi_{EU}(n,h) = \Omega(nh)$, for $h = O(\log n)$.
Proof: Our strategy is to show that, for an arbitrary (but fixed) partition of the input bits, a large
portion of the input words must all be either "substantially" word-local or "substantially" bit-local.
The set of input words is partitioned into two sets, the set $B$ of biased words and the set $U$ of
unbiased words. Intuitively, a biased word is one with most of its bits in one processor ($P_L$ or $P_R$),
and an unbiased word is one with almost the same number of bits in each processor. More formally,
$$B = \left\{ d_i : |d_i^L| > \tfrac{3h}{4} \text{ or } |d_i^R| > \tfrac{3h}{4} \right\}, \text{ and } U = \left\{ d_i : \tfrac{h}{4} \le |d_i^L| \le \tfrac{3h}{4} \right\}.$$
Note that $B \cup U = D$ and $B \cap U = \emptyset$. We analyze the distribution of input bits in each processor
according to the size of the sets $B$ and $U$.
Case 1) $|B| \ge 3n/4$ (thus $|U| \le n/4$):
We partition the biased words further into the left-biased $B_L$ and the right-biased $B_R$, namely,
$$B_L = \left\{ d_i \in B : |d_i^L| > \tfrac{3h}{4} \right\} \text{ and } B_R = \left\{ d_i \in B : |d_i^R| > \tfrac{3h}{4} \right\}.$$
The total number of bits in $P_L$ is $nh/2$. At most $(n/4)(3h/4) = 3nh/16$ of these bits belong to the
set $U$, and at most $n(h/4) = nh/4$ of these bits belong to the set $B_R$. Thus, at least $nh/2 - (3nh/16 +
nh/4) = nh/16$ bits in $P_L$ belong to the words in $B_L$. A symmetric argument verifies that at least
$nh/16$ bits in $P_R$ belong to the words in $B_R$; thus, $|B_L|, |B_R| \ge n/16$. Consider two input words
$d_i \in B_L$ and $d_j \in B_R$. Clearly, $d_i^L$ has at least $h/2$ bit positions that correspond with positions in $d_j^R$.
The remaining $h/2$ bits (of both $d_i$ and $d_j$) may be set to any arbitrary value. The result matrix has
rank $2^{nh/32}$, and thus $\phi = \Omega(nh)$.
Case 2) $|B| < 3n/4$ (thus $|U| > n/4$):
Consider an arbitrary pairing of the elements of $U$ and let $(d_i, d_j)$ be one such pair. By the
definition of unbiased, $d_i$ must have at least $h/4$ bits in each processor. Each bit position in $d_i^L$ ($d_i^R$)
corresponds to a bit position in either $d_j^L$ or $d_j^R$. We can distinguish two subcases:
Case 2a) Either $h/8$ positions in $d_i^L$ correspond to positions in $d_j^R$ or $h/8$ positions in $d_i^R$ correspond
to positions in $d_j^L$. These pairs are referred to as word-type.
Case 2b) In the event that $(d_i, d_j)$ does not satisfy the conditions of Case 2a, then, by the pigeonhole
principle, $h/8$ positions in $d_i^L$ correspond to positions in $d_j^L$ and $h/8$ positions in $d_i^R$ correspond to
positions in $d_j^R$. These pairs are referred to as bit-type.
Clearly, each of the $n/8$ pairs of unbiased inputs is either word-type or bit-type. By another application
of the pigeonhole principle, either there are at least $n/16$ word-type pairs or there are at least
$n/16$ bit-type pairs. Thus, we either have a word-local setting or a bit-local setting. In either case,
from Lemmas 1 and 2, we conclude $\phi = \Omega(nh)$. □
This implies, by virtue of Equation 1, that any chip with area $A$ that solves $EU(n, \epsilon \log n)$ in time $T$
satisfies $AT^2 = \Omega(n^2 \log^2 n)$ under the non-word-local assumption and, by virtue of Equation 4,
$A = \Omega(n \log n)$ under the non-word-serial assumption.
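The biased/unbiased classification used in the proof of Theorem 2 can be sketched in a few lines of Python. The input format is an assumption made for this illustration: `left_counts[i]` is the number of the $h$ data bits of word $d_i$ that enter $P_L$ (so $h$ minus that number enter $P_R$). The function names are hypothetical.

```python
def classify_words(left_counts, h):
    """Split word indices into (B_L, B_R, U) as in the proof of Theorem 2.

    A word is left-biased if more than 3h/4 of its bits are in P_L,
    right-biased if more than 3h/4 are in P_R, and unbiased otherwise.
    Integer arithmetic (4 * count > 3 * h) avoids rounding issues.
    """
    biased_left, biased_right, unbiased = [], [], []
    for i, in_left in enumerate(left_counts):
        in_right = h - in_left
        if 4 * in_left > 3 * h:
            biased_left.append(i)
        elif 4 * in_right > 3 * h:
            biased_right.append(i)
        else:
            unbiased.append(i)
    return biased_left, biased_right, unbiased


def which_case(left_counts, h):
    """Return 1 if |B| >= 3n/4 (Case 1 of the proof), else 2 (Case 2)."""
    b_left, b_right, _ = classify_words(left_counts, h)
    n = len(left_counts)
    return 1 if 4 * (len(b_left) + len(b_right)) >= 3 * n else 2
```

For instance, with $h = 8$ and `left_counts = [8, 7, 6, 4, 2, 1, 0]`, words 0 and 1 are left-biased, words 5 and 6 are right-biased, and words 2, 3, and 4 are unbiased.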
4. Applications
In this section, we demonstrate the application of the previous results to the establishment of
$AT^2$ and $A$ lower bounds for related problems. First, however, we prove the following lemma,
which facilitates problem transformation by allowing us to relax the unilocal (semellective)
assumption, A1 (B1). Instead, we now assume A1' (B1').
(A1') Bilocal: Each input/output bit is available at no more than two ports (but perhaps at several
time instances).
(B1') Bilective: Each input/output bit is available at no more than two time instances (but
perhaps at several ports).
Let $\phi'_{EU}(n,h)$ denote the bisection flow under A1', A2, and A3. Obviously, the traditional
bisection technique fails to establish any bound on $\phi'_{EU}(n,h)$ because each input may enter both
processors ($P_L$ and $P_R$). Nevertheless, we can still obtain a lower bound on $\phi'_{EU}(n,h)$ by employing
a method similar to the bisection technique.
Lemma 3: $\phi'_{EU}(n,h) = \Omega(nh)$ for $h = O(\log n)$.
Proof: Consider any (convex) chip $C'_{EU}$ that solves EU(n,h) under A1', A2, and A3. Let us partition
the chip into four sections, by means of lines parallel to the shorter side of the minimum-area
enclosing rectangle, such that each section contains $n/2$ input words (recall that there are now $2n$
input words: $\{x_0, x_0, x_1, x_1, \ldots, x_{n-1}, x_{n-1}\}$). The general framework is one in which the four
processors ($P_1, P_2, P_3, P_4$) associated with the four sections of $C'_{EU}$ cooperate to solve EU(n,h) (see Figure 6).
Figure 6. (a) $C'_{EU}$ to solve EU(n,h), with $n/2$ input words in each of the four sections; (b) a four-processor system
A straightforward modification of Equation 2 implies:
$$AT^2 = \Omega\left((\phi'_{EU}(n,h))^2\right),$$
where $\phi'_{EU}(n,h) = \max(\phi'_1, \phi'_2, \phi'_3)$. A lower bound on any one of the $\phi'_i$'s cannot be established
independent of the others, for it may be the case that processors to the left or right of the link
associated with $\phi'_i$ have access to the entire input set and thus do not need to send or receive any
information to or from the other processors. In fact, this situation occurs when $P_1$ and $P_3$ each contain a
copy of $(x_0, \ldots, x_{n/2-1})$, and $P_2$ and $P_4$ each contain a copy of $(x_{n/2}, \ldots, x_{n-1})$.
Our strategy is to partition the four processors into two sets, $P_L$ and $P_R$, such that each set
contains both copies of (at least) $n/16$ input words. These inputs can then be revealed to the other
set only by information flow through the links connecting $P_L$ and $P_R$. Since there are a total of $n/2$
inputs in $P_1$, and each input is repeated twice, there must be at least $n/4$ distinct inputs in $P_1$.
The other copies of these $n/4$ input words are in $P_1$, $P_2$, $P_3$, or $P_4$. By the pigeonhole principle,
$P_1$ and $P_i$ (for some $1 \le i \le 4$) must contain both copies of at least $n/16$ input words. Let
$P_L = \{P_1\} \cup \{P_i\}$ and $P_R = \{P_2, P_3, P_4\} - \{P_i\}$. We can view $P_L$-$P_R$ as a two-processor system with
a flow $\phi_{LR}$ of $\Omega(nh/16)$ bits between $P_L$ and $P_R$ (see Lemma 1). Clearly, $\phi_{LR} \le \phi'_1 + \phi'_2 + \phi'_3$, and
thus, $\phi'_{EU}(n,h) = \max(\phi'_1, \phi'_2, \phi'_3) = \Omega(nh)$. □
From the discussion of Section 2, it is clear that the temporal analog of Lemma 3 also holds under
assumptions B1', B2, and B3. Furthermore, A3 (B3) may be replaced by A4 (B4) while maintaining the
same flow bound. (The roles of $n$ and $h$ are simply reversed in Lemma 3.)
Now we demonstrate the problem transformation methodology by means of two examples.
Specifically, two fundamental problems are shown to be at least as hard (in either the $AT^2$ or the $A$
sense) as element uniqueness. We conclude with a brief catalog of related problems, together with
lower bounds on their $AT^2$ and $A$ complexity, as obtained via problem transformation.
The first problem is a fundamental one in computational geometry, namely, closest pair.
CP(n,h): Given a set of $n$ points $p_i = (a_i, b_i)$ for $0 \le i < n$, where each coordinate is represented
with $h + \log n - 1$ bits, find the closest pair of points.
We want to show $EU(n,h) \rightarrow CP(n,h)$. Assume there is a two-processor system $P_{CP}(n,h)$ that
solves CP(n,h). This system can then be used to solve EU(n,h) under the non-word-local assumption
(A1 and A2) in the following manner.
1) The coordinates of each point are set as $p_i = (x_i, 0)$, which is a trivial transformation.
2) $P_{CP}$ is used to solve this (restricted) closest pair problem. (The chip is bisected in such a way
that each processor inputs half of the "meaningful" data, that is, the $x_i$'s.)
3) Once the closest pair of points is determined, $P_L$ sends all of its output bits ($O(\log n)$) to $P_R$.
$P_R$ then computes the distance between the two points and outputs a 0 if the distance is equal
to 0 and a 1 otherwise. It is clear that the output is 1 if and only if the elements are unique.
By Theorem 2, $\phi_{EU}(n,h) = \Omega(nh)$. Since steps 1 and 3 above require the transmission of
$O(\log n) = o(nh)$ bits,
$$EU(n,h) \xrightarrow{\ O(\log n)\ } CP(n,h).$$
Theorem 3 follows immediately from the proposition.
Theorem 3: Under the non-word-local protocol assumption, $\phi_{CP}(n,h) = \Omega(nh)$ for $h = O(\log n)$.
Thus, any chip with area $A$ that solves $CP(n, \epsilon \log n)$ in time $T$ satisfies $AT^2 = \Omega(n^2 \log^2 n)$ under the
non-word-local assumption, and $A = \Omega(n \log n)$ under the non-word-serial assumption.
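The three steps of the transformation from element uniqueness to closest pair can be mirrored in software. The sketch below is a sequential illustration only: a brute-force `closest_pair` routine (a hypothetical stand-in for the two-processor system $P_{CP}$) replaces the chip, and the communication between $P_L$ and $P_R$ is not modeled. It assumes at least two input elements.

```python
from itertools import combinations


def squared_distance(p, q):
    """Squared Euclidean distance between two integer points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def closest_pair(points):
    """Brute-force closest pair; stands in for the hypothetical system P_CP."""
    return min(combinations(points, 2),
               key=lambda pair: squared_distance(pair[0], pair[1]))


def element_uniqueness_via_cp(xs):
    """Decide element uniqueness by reduction to closest pair.

    Step 1: map each element x_i to the point (x_i, 0).
    Step 2: solve the restricted closest pair problem.
    Step 3: output 0 if the closest pair is at distance 0, else 1.
    """
    points = [(x, 0) for x in xs]
    p, q = closest_pair(points)
    return 0 if squared_distance(p, q) == 0 else 1
```

For example, `element_uniqueness_via_cp([3, 1, 4, 1, 5])` returns 0 because the value 1 appears twice, while `element_uniqueness_via_cp([3, 1, 4, 5])` returns 1.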
Now we establish a 0  lower bound on the problem of finding the size of the maximum clique 
in an interval graph (MCIG) by showing that EU is transformable to it. This serves as an excellent 
illustration of the utility of the previous results.
$MCIG(n,h)$: Given a collection of intervals $I_i = (l_i, r_i)$ for $1 \le i \le n$, where $l_i$ and $r_i$ are respectively
the left and right endpoints of interval $I_i$, we can define a graph $G = (V,E)$, where
$V = \{I_i \mid 1 \le i \le n\}$ and $E = \{(I_i, I_j) \mid I_i \cap I_j \ne \emptyset,\ 1 \le i,j \le n\}$. Such a graph is called an
interval graph. Let $h + \log n - 1$ be the length of the integers used to represent the $l_i$'s
and $r_i$'s, i.e., $0 \le l_i, r_i \le n2^{h-1} - 1$ for $1 \le i \le n$. The problem is to find the size of the
maximum clique in this graph.
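As a concrete reading of this definition, the sketch below computes the maximum clique size with a sequential endpoint sweep. It relies on the fact that a set of pairwise-intersecting intervals shares a common point, so the answer is the largest number of intervals covering any single point. The function name and the sweep itself are illustrative assumptions and say nothing about how a VLSI chip would solve the problem.

```python
def max_clique_size(intervals):
    """Size of the maximum clique of the interval graph on closed intervals (l, r)."""
    events = []
    for left, right in intervals:
        events.append((left, 0))    # 0 = interval opens; sorts before a close at the same point
        events.append((right, 1))   # 1 = interval closes
    events.sort()

    best = current = 0
    for _, kind in events:
        if kind == 0:
            current += 1            # one more interval covers the sweep point
            best = max(best, current)
        else:
            current -= 1
    return best
```

Processing openings before closings at the same coordinate treats intervals that merely touch at an endpoint as intersecting, which matches closed intervals (including degenerate ones with $l_i = r_i$).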
We want to show $EU(n,h) \xrightarrow{o(nh)} MCIG(n,h)$. Assume there is a two-processor system $P_{MCIG}(n,h)$ that
solves $MCIG(n,h)$. This system can then be used to solve $EU(n,h)$ under the bilocal assumption
(A1, A2, and A3) in the following manner.
1) Each interval is set as $I_i = (x_i, x_i)$, which is a trivial transformation due to the bilocality.
2) $P_{MCIG}$ is used to solve this (restricted) maximum clique problem.
3) Ignoring the least significant bit, $P_L$ and $P_R$ form the logical OR of their output bits. $P_R$ then
sends its result to $P_L$ (or vice versa), and $P_L$ outputs the NOR of its result and that of $P_R$.
This requires exactly one bit of additional communication. It is clear that the output is 1 if
and only if the elements are unique.
By Lemma 3, $\Phi_{EU}(n,h) = \Omega(nh)$. Since steps 1 and 3 above require the transmission of one bit,
$EU(n,h) \xrightarrow{O(1)} MCIG(n,h)$. Theorem 4 follows immediately from the proposition.
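The reduction above, including the one-bit combination in step 3, can be sketched as follows. The sweep-based `max_clique_size` is a hypothetical stand-in for the solver, and the way the output bits are split between the two processors is an arbitrary assumption made only to mimic the OR/NOR step.

```python
def max_clique_size(intervals):
    """Hypothetical stand-in for the max-clique solver: sweep over closed intervals."""
    events = sorted([(l, 0) for l, _ in intervals] + [(r, 1) for _, r in intervals])
    best = current = 0
    for _, kind in events:
        current += 1 if kind == 0 else -1
        best = max(best, current)
    return best


def elements_unique_via_mcig(xs):
    """Return 1 if all values in xs are distinct, else 0 (needs a non-empty xs)."""
    intervals = [(x, x) for x in xs]           # step 1: degenerate interval per element
    size = max_clique_size(intervals)          # step 2: solve the restricted instance
    high_bits = size >> 1                      # step 3: ignore the least significant bit
    half = max(high_bits.bit_length(), 2) // 2  # assumed split of output bits between processors
    or_right = int(high_bits & ((1 << half) - 1) != 0)  # OR of the low-order share
    or_left = int(high_bits >> half != 0)               # OR of the high-order share
    return int(not (or_left or or_right))      # NOR of the two one-bit results
```

With distinct elements every degenerate interval is isolated, so the clique size is 1 and all bits above the least significant one are 0. Any duplicate forces a clique of size at least 2, which sets one of those bits.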
Theorem 4: Under the word-local protocol assumption, $\Phi_{MCIG}(n,h) = \Omega(nh)$ for $h = O(\log n)$.
Thus, any chip with area $A$ that solves $MCIG(n, \epsilon \log n)$ in time $T$ satisfies $AT^2 = \Omega(n^2 \log^2 n)$ under
the word-local or bit-local assumption, and $A = \Omega(n \log n)$ under the word-serial or word-parallel
assumption.
Element uniqueness can be transformed to a large number of related problems. Here, we list 
a few. All of these problems have $AT^2 = \Omega(n^2 \log^2 n)$ and $A = \Omega(n \log n)$.
1) Visibility problem: Given a collection of vertical segments $S_i = (b_i, t_i)$, where $b_i$ and $t_i$ are respectively
the bottom and the top points of $S_i$, for $1 \le i \le n$, find all pairs of segments that "see" each
other. Two segments $S_i$ and $S_j$ "see" each other if and only if there exists a horizontal segment that
crosses only $S_i$ and $S_j$. Element uniqueness is $O(1)$ transformable to this problem even if only one
such pair is desired.
2) Interval graph problems: maximum independent set, minimum clique cover, and minimum
dominating set in interval graphs.




We would like to express our gratitude to Franco Preparata for his guidance during this 
research. We also acknowledge many helpful and enlightening discussions with Gianfranco Bilardi, 
Prasoon Tiwari, and Doug West.
REFERENCES
[AA] Abelson, H. and Andreae, P., "Information Transfer and Area-Time Trade-offs for VLSI Multiplication,"
Communications of the ACM, vol. 23, no. 1, 1980, pp. 20-22.
[BP] Bilardi, G. and Preparata, F. P., "Tessellation Techniques for Area-Time Lower Bounds with
Applications to Sorting," to appear in Algorithmica, Mar. 1986.
[BK] Brent, R. P. and Kung, H. T., "The Chip Complexity of Binary Arithmetic," Journal of the
ACM, vol. 28, 1981, pp. 521-534.
[DSVT]
Ďuriš, P., Sýkora, O., Vrťo, I., and Thompson, C. D., "Tight Chip Area Lower Bounds for
Discrete Fourier and Walsh-Hadamard Transformations," Information Processing Letters, vol.
21, no. 5, Nov. 1985, pp. 245-247.
[GJ] Garey, M. R. and Johnson, D. S., Computers and Intractability, W. H. Freeman and Co., 1979.
[GLTWZ]
Gafni, E., Loui, M. C., Tiwari, P., West, D. B., and Zaks, S., "Lower Bounds on Common
Knowledge in Distributed Algorithms," technical report ACT-50, Coordinated Science Laboratory,
University of Illinois, 1984.
[L] Leighton, F. T., "Tight Bounds on the Complexity of Parallel Sorting," Proceedings of the 16th
Annual ACM Symposium on the Theory of Computing, Washington D.C., Apr. 1984, pp. 71-80.
[MS] Mehlhorn, K. and Schmidt, E. M., "Las Vegas is Better than Determinism in VLSI and Distributed
Computing," Proceedings of the 14th Annual ACM Symposium on the Theory of Computing,
San Francisco, May 1982, pp. 330-337.
[PS] Preparata, F. P. and Shamos, M., Computational Geometry, Springer-Verlag, 1985.
[S] Siegel, A. R., "Minimum Storage Sorting Networks," IEEE Transactions on Computers, vol. C-34,
no. 4, Apr. 1985, pp. 355-361.
[T] Thompson, C. D., A Complexity Theory for VLSI, Ph.D. thesis, Department of Computer Science,
Carnegie-Mellon University, 1980.
[U] Ullman, J. D., Computational Aspects of VLSI, Computer Science Press, 1983.
[V] Vuillemin, J., "A Combinatorial Limit to the Computing Power of VLSI Circuits," IEEE Transactions
on Computers, vol. C-32, no. 3, Mar. 1983, pp. 294-300.
[Y1] Yao, A. C., "Some Complexity Questions Related to Distributive Computing," Proceedings of
the 11th Annual ACM Symposium on the Theory of Computing, Atlanta, Apr. 1979, pp. 209-213.
[Y2] Yao, A. C., "The Entropic Limitations on VLSI Computations," Proceedings of the 13th Annual
ACM Symposium on the Theory of Computing, Milwaukee, May 1981, pp. 308-311.
