Introduction
The theory of computation is valid over a synthetic domain:
its formal models have relevance only if they correspond to possible computational systems.
Technological changes can affect the realm of possibility. In this light, it would be surprising if the "VLSI revolution" did not spawn new theoretical models. This paper is an attempt to show that interesting complexity results are available through the use of a "VLSI model of computation".
Two parameters are of overriding interest in a VLSI design: its speed and its size. Speed can be handled with familiar complexity tools, that is, measuring time by counting elementary operations. Size in the VLSI world is best expressed as the total area of silicon used. This is quite a different metric from a count of "active elements", "gates", or "registers". It may be the case that most of the chip is devoted to connections between such active elements. A complexity theory for VLSI must thus concern itself with the layout of active elements in the plane, along with their interconnections.
This research is supported in part by the National Science Foundation.

There is a natural unit of area for VLSI. Manufacturing and physical limitations give rise to a "minimum feature width", $\lambda$. This is the width of the narrowest wire, and $\lambda^2$ is approximately the area of the smallest transistor. The 64K RAM currently available has an area of about $10^5 \lambda^2$. Chips of $10^7$ or $10^8 \lambda^2$ may be possible (Mead, 1978).
The choice of a unit of time is slightly more problematical.
Here, following (Mead, 1978), it will be taken as the length of time that it takes a signal to propagate along a wire, or on-chip interconnection. This propagation time can be made independent of the length of the wire by fitting larger drivers to longer wires. Larger drivers of course occupy more area, but need never take more than 10% of the area of the wire they drive ($1 \lambda^2$ for a wire of length $10\lambda$, $10^4 \lambda^2$ for a $10^5 \lambda$ wire). By fudging $\lambda$ upwards by 5%, the area of the driver is thus absorbed into the area of its wire.
A full exposition of the VLSI model is deferred to Section 2.
The DFT.
The computational problem studied in this paper is the Discrete Fourier Transform (DFT). The DFT is defined over any commutative ring, but only finite rings will be considered here. Elements of infinite rings have no fixed-length representation, leading to grave computational difficulties. Approximate methods are beyond the scope of this paper.
A satisfactory ring does exist for VLSI: the integers modulo m. If $m = 2^k - 1$, ordinary fixed-point arithmetic (ones-complement, ignoring overflow) on k-bit words will produce exact answers. An N-point DFT can be performed in this ring if N divides $p - 1$ for each prime p dividing m (Bonneau, 1973). An immediate consequence of Bonneau's result is that $m > N$.
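The divisibility condition quoted above is easy to test for small word lengths. The following is a minimal sketch, not part of the original paper: the helper names `prime_factors` and `supports_dft` are hypothetical, and the check simply restates the condition "N divides p - 1 for each prime p dividing m" as given in the text.

```python
def prime_factors(m):
    """Return the set of distinct primes dividing m (plain trial division)."""
    factors, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors


def supports_dft(m, n_points):
    """Condition quoted in the text: N divides p - 1 for every prime p dividing m."""
    return all((p - 1) % n_points == 0 for p in prime_factors(m))


if __name__ == "__main__":
    # Moduli of the form 2^k - 1, as used for ones-complement arithmetic.
    for k in (4, 5, 8):
        m = 2**k - 1
        sizes = [n for n in range(1, m) if supports_dft(m, n)]
        print(k, m, sorted(prime_factors(m)), sizes)
```

Since N must divide p - 1, every prime factor p of m exceeds N, which is one way to see why m > N follows.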
Formally, the DFT is a matrix-vector multiplication, $A\vec{x} = \vec{y}$.
The input vector is $\vec{x}$, the output vector is $\vec{y}$, and A is an N by N matrix of constants, $A_{jk} = \omega^{jk}$ for $0 \le j, k < N$.
The constant $\omega$ must be a principal Nth root of unity. That is, it must satisfy $\omega^N = 1$ and $\sum_{0 \le j < N} \omega^{jk} = 0$ for $1 \le k < N$.
A fuller explanation of the DFT may be found in (Aho, 1974).
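To make the definition concrete, here is a small sketch of a DFT over the integers modulo m, computed directly as a matrix-vector product. It assumes the usual matrix entries A[j][k] = omega^(j*k); the function names and the example values (m = 17, omega = 4, N = 4) are illustrative choices, not taken from the paper.

```python
def is_principal_root(omega, n, m):
    """Check omega^n = 1 and sum_{0<=j<n} omega^(j*k) = 0 (mod m) for 1 <= k < n."""
    if pow(omega, n, m) != 1:
        return False
    return all(
        sum(pow(omega, j * k, m) for j in range(n)) % m == 0
        for k in range(1, n)
    )


def dft_mod(x, omega, m):
    """Compute y = A x over the integers mod m, assuming A[j][k] = omega^(j*k)."""
    n = len(x)
    return [
        sum(pow(omega, j * k, m) * x[k] for k in range(n)) % m
        for j in range(n)
    ]


if __name__ == "__main__":
    m, omega = 17, 4          # 4 has order 4 modulo 17
    x = [1, 2, 3, 4]
    print(is_principal_root(omega, len(x), m))
    print(dft_mod(x, omega, m))
```

This direct form uses N^2 multiplications; it is meant only to pin down the definition, not to suggest an efficient layout.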
Notation.
The following functional notation is used throughout this paper.
$f(n) = O(g(n))$ "f is big O of g", an upper bound. There exists a constant c for which $f(n) \le c\,g(n)$ for all but some finite (possibly empty) set of non-negative values for n.
$f(n) = \Theta(g(n))$ "f is theta of g", an exact bound. There exist constants $c_1$ and $c_2$ for which $c_1 g(n) \le f(n) \le c_2 g(n)$ for all but some finite (possibly empty) set of non-negative values for n.
$f(n) = \Omega(g(n))$ "f is omega of g", a lower bound. There exists a constant c for which $f(n) \ge c\,g(n)$ for all but some finite (possibly empty) set of non-negative values for n.
Proof strategy, results.
The strategy of this paper is to focus on the cost of communications in a parallel system. Little consideration is given to the silicon area or time taken to perform arithmetic on "local" data. Instead, the area-time analysis is
Following (Mead, 1978), a minimum value may be found for some particular cost function, such as the product of area with time. Alternatively, one may seek a function of area and time that describes the performance of many "good" designs. The result of this paper is expressed in both of these ways.
For cost functions of the form $AT^x$ with $0 \le x \le 2$, any chip that performs an N-point DFT costs at least $\Omega(N^{1+x/2})$. This minimum is achieved on chips whose arithmetic units are connected in a mesh-type pattern.
The relation $AT^2 \ge N^2/16$ bounds the performance of any chip of area A that computes an N-point DFT in time T. At least two designs come close to this limit: those with either a perfect shuffle or a mesh-type interconnection pattern.
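The relation between A and T can be read as a tradeoff curve: fixing one quantity gives a floor on the other. The sketch below rearranges the stated inequality and nothing more; the function names are hypothetical, and area and time are in the model's own units.

```python
import math


def min_time(n_points, area):
    """Smallest T consistent with A * T^2 >= N^2 / 16 for a given area A."""
    return n_points / (4.0 * math.sqrt(area))


def min_area(n_points, time):
    """Smallest A consistent with A * T^2 >= N^2 / 16 for a given time T."""
    return n_points**2 / (16.0 * time**2)


if __name__ == "__main__":
    n = 1024
    for area in (n, 4 * n, 16 * n):
        print(area, min_time(n, area))
```

Quadrupling the area can at best halve the time under this bound, which is why the product of A with the square of T is the natural quantity to track.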
Other approaches.
Area and time considerations have been studied previously. This paper's model is based on the assumptions of (Mead, 1978), who found an optimal value for the area-time product in VLSI memory chips.
Two studies have been made of the area requirements of interconnection patterns. A random graph on N nodes was found to require $O(N^2)$ area, using a model suggested by the wiring of a printed circuit board (Sutherland, 1973). The problem of embedding bipartite graphs in the plane has also been explored (Cutler, 1978).
The theory of cellular automata (von Neumann, 1966) can be used to elucidate some aspects of the area-time tradeoff.
For example, bus automata (Moshell, 1976)
Words.
The basic chunk of information considered in this paper is a word of $\lceil \log N \rceil$ bits. As indicated in Section 1, the DFT is defined over a finite commutative ring of more than N elements.
Since all input vectors are equally likely, the representation of each input element ("number") takes at least one word.
Wires, units of length and time.
A wire has unit width and transmits a word from one end to the other in unit time. If the transmission is performed bit-serially, the unit of time is proportional to the word length in bits. If the transmission is word-parallel, the unit of length is proportional to word length.
PEs.
A PE has a "state", or some number of words of storage.
Nexi.
Wires deliver words to and from a nexus associated with each PE. There is exactly one PE per nexus. Communication between a nexus and its PE is free, costing no area or time.
Each nexus is square in aspect, with side $s$ if $s$ wires connect to it. This ensures that there is more than enough edge length on the nexus to accommodate all connecting wires. The square shape does entail a large area charge for a nexus of large degree, but in this case its associated PE could be very powerful.
One layer is devoted to the "x" direction, one to the "y".
Wires may bend at grid corners. This corresponds to a connection between the two layers of silicon.
At grid corners, wires may cross at right angles with no effect on each other's signals or timing. This corresponds to insulating the two layers of silicon from each other.
Minimal Bisection Width
The minimal bisection width of a graph is, informally, the number of cuts needed to slice it in half. In other words, it is the smallest number of edges whose removal disconnects one half of the vertices from the other. In general, it is difficult to compute minimal bisection widths: the problem is NP-complete, in fact (Garey, 1974).
Fortunately, it is enough to know that every graph has a set of edges that realizes its minimal bisection width.
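For very small graphs the informal definition can be evaluated by exhaustive search. The sketch below tries every split of the vertices into halves and counts crossing edges; it is exponential in the number of vertices, in keeping with the hardness remark above. The function name and the convention for odd vertex counts (halves of size floor(n/2) and ceil(n/2)) are assumptions made for illustration.

```python
from itertools import combinations


def minimal_bisection_width(n_vertices, edges):
    """Fewest edges crossing any split into halves of size floor(n/2) and ceil(n/2)."""
    best = None
    for half in combinations(range(n_vertices), n_vertices // 2):
        side = set(half)
        # An edge is cut when its endpoints fall on opposite sides.
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best


if __name__ == "__main__":
    path4 = [(0, 1), (1, 2), (2, 3)]
    cycle6 = [(i, (i + 1) % 6) for i in range(6)]
    complete4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
    print(minimal_bisection_width(4, path4))
    print(minimal_bisection_width(6, cycle6))
    print(minimal_bisection_width(4, complete4))
```

The search visits each bisection twice when the vertex count is even, which is harmless here and keeps the code short.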
The following sections will derive bounds on area and time for any VLSI design. A graph will be associated with each design, defining a minimal bisection width, $\omega$. Lower bounds of $\omega^2/4$ and $\lfloor N/2 \rfloor / \omega$ will be found on area and time respectively. Thus $AT^2 \ge N^2/16$.

Next, consider "zig-zags" of the form $\{x = \alpha - 1$ for $y \ge \beta$, $x = \alpha + 1$ for $y \le \beta$, and $y = \beta$ for $\alpha - 1 \le x \le \alpha + 1\}$. Using the $\alpha$ determined above, vary $\beta$ to obtain another bisection of the output nexi. Only two wires may cross the horizontal ($y = \beta$) segment, so that its vertical sections must cross at least $\omega - 2$ wires.
This zig-zag accounts for the $\omega - 2$ square units of wire and nexus that lie within 1/2 unit of its vertical sections. By construction, these $\omega - 2$ square units of area are disjoint from the $\omega$ units of area accounted for by the line $\{x = \alpha\}$.
In all, $\lfloor \omega/2 \rfloor$ zig-zags may be drawn, each of the form $\{x = \alpha - k$ for $y \ge \beta$, $x = \alpha + k$ for $y \le \beta$, and $y = \beta$ for $\alpha - k \le x \le \alpha + k\}$.
Each accounts for $\omega - 2k$ area.
The total area of wire and nexus is thus
$\sum_{0 \le k \le \lfloor \omega/2 \rfloor} (\omega - 2k) \ge \omega^2/4$.
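The area bound can be checked numerically. The sketch below assumes the sum runs over k from 0 to floor(w/2) with the k-th zig-zag contributing w - 2k units (k = 0 being the straight line), as the preceding text describes, and compares the total with w^2/4. The name `zigzag_area` is hypothetical, and `w` stands for the minimal bisection width.

```python
def zigzag_area(w):
    """Sum of (w - 2k) for 0 <= k <= floor(w/2): area counted by the line and zig-zags."""
    return sum(w - 2 * k for k in range(w // 2 + 1))


def meets_quarter_square_bound(w):
    """True if the counted area is at least w^2 / 4 (compared in integers)."""
    return 4 * zigzag_area(w) >= w * w


if __name__ == "__main__":
    for w in (1, 2, 5, 8, 33):
        print(w, zigzag_area(w), w * w / 4, meets_quarter_square_bound(w))
```

For even w = 2m the sum is m(m + 1), and for odd w = 2m + 1 it is (m + 1)^2, so the bound holds with a little room to spare in both cases.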
Time
The information that must be transmitted over the partitioning wires is captured in the cross terms $A_{12} x_2$ and $A_{21} x_1$. A good VLSI design partitions the DFT so that $x_2$ has little effect on $y_1$, and $x_1$ has little effect on $y_2$.
Later in this section, the following result will be proved:
In other words, the DFT is inherently non-partitionable.
This notion of partitionability is similar to that of n-independence (Grigoryev, 1976).

There is at least one field embedded within each ring $Z_m$: if p is a prime dividing m, then there is a mapping $\phi$ from $Z_m$ onto the field $Z_p$. The mapping is the obvious one, $\phi(a) = a \bmod p$. Bonneau's result is that every embedded field must support an N-point DFT: in particular, each such field must have more than N elements.
The lower bound results of this section assume that the DFT is performed over the embedded field $Z_p$. In the VLSI model, computation in the full ring $Z_m$ takes at least as long as the reduced computation modulo p, since each element of the result vector modulo m uniquely determines the corresponding element modulo p.
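The reduction argument can be illustrated on a toy case: compute a DFT modulo m, reduce the result modulo a prime p dividing m, and compare with the DFT computed directly modulo p. This is a sketch under assumed values (m = 85 = 5 * 17, N = 4, omega = 13) and the usual matrix entries A[j][k] = omega^(j*k); the function names are hypothetical.

```python
def dft_mod(x, omega, m):
    """y = A x over the integers mod m, assuming A[j][k] = omega^(j*k)."""
    n = len(x)
    return [
        sum(pow(omega, j * k, m) * x[k] for k in range(n)) % m
        for j in range(n)
    ]


def reduction_commutes(x, omega, m, p):
    """True if reducing the mod-m DFT by p equals the DFT of the reduced input mod p."""
    full = dft_mod(x, omega, m)
    reduced = dft_mod([a % p for a in x], omega % p, p)
    return [y % p for y in full] == reduced


if __name__ == "__main__":
    m, omega = 85, 13         # 13^2 = -1 (mod 85), so 13 has order 4
    x = [7, 40, 61, 84]
    for p in (5, 17):
        print(p, reduction_commutes(x, omega, m, p))
```

The agreement follows from reduction modulo p respecting addition and multiplication, so any chip that produces the answer modulo m has implicitly produced the answer modulo p as well.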
Proof of equation 4.
In Section 1, the DFT matrix of constants A was defined:
(Pease, 1968, Thompson, 1980). Without loss of generality, let $\omega = N^{1/2+\epsilon}$. Then $AT^x = \Omega(N^{1+x/2-\epsilon x} + N^{1+x/2+\epsilon(2-x)})$.
Since $0 \le x \le 2$, the second term increases with $\epsilon$ while the first term decreases with $\epsilon$. Clearly, $\epsilon = 0$ achieves the minimum value, hence the theorem. $\Box$
From the proof of Theorem 5, it is clear that the optimal design has $\omega = O(N^{1/2})$, which corresponds to a mesh-type interconnection pattern.
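The minimization over the exponent offset can be tabulated directly. In the sketch below, the sum of the two terms is replaced by its dominant exponent, max(1 + x/2 - eps*x, 1 + x/2 + eps*(2 - x)), which is an assumption that holds for large N; the function names and the grid of eps values are illustrative only.

```python
def cost_exponent(eps, x):
    """Dominant exponent of N in N^(1+x/2-eps*x) + N^(1+x/2+eps*(2-x))."""
    return max(1 + x / 2 - eps * x, 1 + x / 2 + eps * (2 - x))


def best_eps(x, grid):
    """Grid value of eps giving the smallest dominant exponent."""
    return min(grid, key=lambda eps: cost_exponent(eps, x))


if __name__ == "__main__":
    grid = [i / 10 for i in range(-5, 6)]
    for x in (0.5, 1.0, 1.5):
        eps = best_eps(x, grid)
        print(x, eps, cost_exponent(eps, x))
```

For x strictly between 0 and 2 the two exponents cross at eps = 0, where the value is 1 + x/2; at the endpoints x = 0 and x = 2 one of the terms no longer depends on eps, so the minimum is attained on a range of values that includes eps = 0.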
A similar analysis may be performed for other problems, including matrix multiplication, Gaussian elimination, transitive closure, sorting, and permutation (Thompson, 1980).
