The Semi-Automatic Generation of Processing Element Control Paths for Highly Parallel Machines by Sabety, Theodore M. et al.
THE SEMI-AUTOMATIC GENERATION OF PROCESSING
ELEMENT CONTROL PATHS FOR HIGHLY PARALLEL
MACHINES
Theodore M. Sabety
Brian Mathies
David Elliot Shaw
CUCS-127-84




This paper describes a recently implemented program that very rapidly
generates control paths for different versions of the constituent processing
elements of a particular massively parallel machine, the NON-VON
Supercomputer. The program, called PLATO, accepts as input a set of
instruction opcodes, together with associated control information, and
produces as output a functionally correct, highly area-efficient set of PLA's
for the processing elements. Among the novel aspects of the program are the
use of a channel routing algorithm to generate a Weinberger Array layout for
the PLA and the generation, from a single input description, of different
variants corresponding to processing elements serving different [...].
By supporting the extremely rapid generation of processing elements with
[...], PLATO has already yielded major area and performance improvements in
the NON-VON processing element. Many of the techniques employed in the PLATO
system should prove applicable to the semi-automatic layout of processing
elements for other multiprocessor machines.
All appropriate organizational approvals for the publication of this paper
have been obtained. If accepted, the authors will prepare the final manuscript
in time for inclusion in the conference proceedings, and will present at the
conference.
Theodore M. Sabety
Table of Contents 
1. The NON-VON Supercomputer
2. Types of Processing Elements
3. Design Goals for PLATO
4. The PLATO Input File
5. Automatic Weinberger Array Layout
6. Channel Routing Algorithm
7. Conclusion
1. The NON-VON Supercomputer
NON-VON [1] is a massively parallel non-von Neumann supercomputer, portions
of which are now under construction at Columbia. While the machine has evolved
over the past several years, its most important elements remain the same. All
versions of the machine contain a primary processing subsystem (PPS),
implemented using custom nMOS VLSI circuits, and a secondary processing
subsystem (SPS), based on a set of "intelligent" disk drives.
The PPS comprises a large number (perhaps as many as a million) of processing
elements (PE's), and is constructed from custom nMOS VLSI chips, each
containing a number (eight, at present) of PE's. The PPS is organized as a
binary tree of PE's. In all but the latest version of the machine (NON-VON 4,
which will not be discussed further in this paper), a single control processor
is attached to the root of the PPS tree. The control processor broadcasts
instructions which are executed simultaneously by all PE's in the PPS.
(NON-VON 4 incorporates a number of processors, each capable of serving as a
control processor for some subtree in the PPS; these "large processing
elements" are interconnected by a high-bandwidth interconnection network.)
Figure 1 provides a description of the PPS.
Our first prototype, called NON-VON 1, was designed using largely ad hoc
methods. Our principal goals in constructing the NON-VON 1 prototype were to
validate the essential architectural principles of the NON-VON design, to
measure the area and aspect ratios of various silicon structures incorporated
within the PE's, and to perform certain electrical measurements on the
completed chips. For this reason, little attention was given to either
area- or time-optimization in the NON-VON 1 prototype chip. The NON-VON 1 PPS
chip has now been completed, fabricated, and tested through DARPA's MOSIS
system, and appears at present to be fully functional.
Figure 1: NON-VON primary processing subsystem. (Labels: control processor,
root, leaf nodes, left child.)
A second prototype, NON-VON 3, is now under development. (The name NON-VON 2
was assigned to an interesting architectural exercise that we do not currently
plan to carry beyond the "paper-and-pencil" stage, although its central ideas
may influence future NON-VON designs.) NON-VON 3 is similar in many
respects to the original NON-VON 1 design, but is expected to incorporate a
number of improvements suggested by the results of our initial experiments in
chip design and software development. In particular, the NON-VON 3 PE will
feature:
1. An area-efficient eight-bit ALU to replace the one-bit ALU
incorporated in the prototype NON-VON 1 PPS chip.
2. Fewer local registers, based on NON-VON 1 area measurements and
software simulation results.
3. A far better floor plan, formulated using precise measurements
taken from the prototype chip.
4. A generalization of certain NON-VON 1 instructions to support the
more efficient execution of many common instruction sequences.
5. Less silicon area devoted to control path logic.
Our plans called for the NON-VON 3 instruction set to be closely based on,
but, with few exceptions, more general than the one employed in NON-VON 1.
(Some of the additions we plan to incorporate in fact correspond to
commonly-used macros in our existing NON-VON 1 software.) It was also deemed
important that all existing NON-VON 1 software be simply and mechanically
translatable into NON-VON 3 instructions, so that none of our work to date
would be lost. (Translated programs would take advantage of some, but not all,
of NON-VON 3's enhancements.) In the future, of course, NON-VON 3 software
will be written using NON-VON 3 instructions, allowing the exploitation of all
of these features.
Early in the development cycle of NON-VON 3, it was recognized that the
successful accomplishment of these ambitious area and performance goals would
be greatly accelerated by the availability of a highly automated system for
the specification, design, layout and testing of the constituent processing
elements. To be useful, such a system would have to rapidly and reliably
generate "correct" layouts, allowing the user to experiment with alternative
processing element architectures with the confidence that the resulting layout
would in fact faithfully realize the more abstractly specified design. With
such a semi-automatic development environment, changes in the instruction set
might be realized in hardware in a fraction of the time that would otherwise
be required, facilitating extensive experimentation with and "fine tuning" of
the PE architecture.
2. Types of Processing Elements
With minor exceptions, all PE's in the PPS tree are physically identical.
Those differences that do exist are based on the manner in which communication
among adjacent PE's is accommodated. These differences in communication give
rise to the four classes of processing elements:
1. Leaf nodes that are the left child of some node
2. Leaf nodes that are the right child of some node
3. Internal nodes that are the left child of some node
4. Internal nodes that are the right child of some node
By convention, the root is considered an internal node and a left child. Each
type of PE must be aware of its type so that it can properly participate in
the communication instructions that operate between the PE's. For example,
when executing a SEND LEFT CHILD instruction, each left child in the tree must
latch data into an I/O register as well as send data to its left child; the
right children only send data and do not latch any incoming data.
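The four classes can be illustrated with a minimal sketch. It assumes, purely
for illustration, that the PE's of a complete binary tree are numbered in heap
order (root = 1, children of node k at 2k and 2k+1); this numbering and the
function name pe_type are hypothetical and do not come from PLATO or NON-VON.

```python
def pe_type(index, n_pes):
    """Classify PE `index` in a complete binary tree of `n_pes` nodes.

    Assumes heap numbering (root = 1); this numbering is an illustrative
    assumption, not something the paper specifies.
    """
    is_leaf = 2 * index > n_pes                # no children inside the tree
    # By the paper's convention the root counts as a left child.
    is_left = index == 1 or index % 2 == 0
    return ("leaf" if is_leaf else "internal",
            "left" if is_left else "right")


if __name__ == "__main__":
    for k in range(1, 8):
        print(k, pe_type(k, 7))
```

With seven PE's, nodes 1 to 3 are internal and nodes 4 to 7 are leaves, so all
four classes appear.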
One possible technique used to differentiate between the types of PE's would
be to encode the PE type on two control lines that enter the PLA at the input
to the AND-plane. In this scheme, one wire would distinguish between left and
right children, while the other would distinguish between leaves and internal
nodes. When the individual chips were wired together to form a complete PPS,
these inputs would be permanently wired to the appropriate constant logic
values to indicate the type of each PE. The disadvantage of this approach,
however, is that each PE would have to contain a considerable amount of logic
that would never be used, resulting in a waste of silicon area. To save
silicon "real estate", PLATO generates a different PLA for each type of PE,
each containing only that logic which is relevant to a PE of that type.
While the particular set of processor classes enumerated above, along with
their associated communication semantics, is specific to NON-VON-like
tree-structured machines, the presence of "boundary conditions" distinguishing
various classes of processing elements is common to most parallel
architectures. In parallel machines configured as a linear array, for
example, three types of PE's (leftmost, rightmost, and intermediate) would be
needed, while a two-dimensional mesh-connected machine might
require as many as nine PE types (central PE's, north, south, east and west
PE's, and the four corner PE's).
3. Design Goals for PLATO
Several goals were formulated when work began on the PLATO program:
1. The engineer should be able to define the PLA's for all PE types
with a single high-level definition.
2. The program should produce the smallest possible PLA consistent
with a given set of VLSI design rules.
3. The program should be integrated with all other layout and
simulation tools employed for PE design.
4. The program should execute with absolutely no intervention by the
design engineer.
The last goal is intended to minimize the possibility of errors introduced by
the human user, insuring that all layouts correctly realize the intended
higher-level function, and need not be extracted and simulated before
insertion into the PE layout.
4. The PLATO Input File
One notable feature of the PLATO program is the fact that only one input file
need be created to generate all four types of PLA's. The system makes use of
mnemonic labels wherever possible to aid in the isolation of errors and to
make it easier to identify PLA inputs and outputs in the finished layout. The
same labels are used by a register transfer level simulator for NON-VON
processing elements that is now being designed at Columbia, and which will
[...]. The input file lists the instruction opcodes, with appropriate state
variable inputs, and for each opcode, a list of the control lines that must be
excited. It contains three kinds of commands:
1. Commands that define the input file format.
2. Commands that describe the placement of inputs and outputs in the
layout.
3. Commands that describe the logical functionality of the array.
Figure 2 provides a simple example of a typical PLATO input file. The first
command line assigns the names INPUT_1, INPUT_2, INPUT_3, and INPUT_4 to the
first four opcode (and, in general, state) bits that will be encountered in
left-to-right order. The second command line specifies the order in which
these opcode and state bits should enter the AND plane of the PLA (listed from
the bottom to the top of the PLA, assuming that the bits enter the AND plane
from the right). The third command line in the example file specifies the
order in which the output lines of the array are to appear (listed from left
to right with the wires leaving the PLA at the bottom).
Figure 2
/* This is an example of a PLATO input file. All comments are ignored. */
/* list of input labels for input file ordering */
INPUT_1, INPUT_2, INPUT_3, INPUT_4;
/* list of same labels, but for PLA ordering */
INPUT_4, INPUT_2, INPUT_1, INPUT_3;
/* Then, a list of output labels are placed in the desired order. */
OUTPUT_1, OUTPUT_2, OUTPUT_3;
[instruction decoding lines illegible]
The rest of the file specifies the decoding function of the PLA. To specify
how an instruction is to be decoded, the engineer makes a list consisting of
every instruction opcode, together with all appropriate state variable inputs,
and a list of the control lines that must be excited to execute the
instruction properly. The processor's state machine is represented similarly:
the present state is considered one of the input bits and the next state is
defined as if each bit in the state machine were a control line.
The PLATO input file is easier to use than a truth table because "don't care"
conditions are considered acceptable logic values for input bits. This
facilitates the separation of state machine specification from instruction
execution specification because the next-state information need not be
included in each instruction decoding command line. This separation
contributes greatly to PLATO's ease of use and tends to minimize
user errors. PLATO converts this input file format into a truth table format
which is then used by such other design aids as logic minimization programs.
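The role of "don't care" input bits can be illustrated with a short sketch
that expands each pattern into the fully specified truth table rows it covers.
This is only one plausible reading of the conversion step; the function names
expand and to_truth_table and the control line names are hypothetical, and the
actual PLATO conversion may well retain don't-care entries for the logic
minimizer.

```python
from itertools import product


def expand(pattern):
    """Return every fully specified bit string matched by a pattern
    written over '0', '1' and 'X' (don't care)."""
    choices = [("0", "1") if ch == "X" else (ch,) for ch in pattern]
    return ["".join(bits) for bits in product(*choices)]


def to_truth_table(rows):
    """Map each fully specified input to the set of asserted control lines.

    `rows` is a list of (pattern, control_lines) pairs; rows whose patterns
    overlap simply contribute the union of their control lines.
    """
    table = {}
    for pattern, outputs in rows:
        for bits in expand(pattern):
            table.setdefault(bits, set()).update(outputs)
    return table


if __name__ == "__main__":
    # Hypothetical two-bit example: one instruction row and one state row.
    rows = [("1X", ["READ_A"]), ("11", ["NEXT_STATE_0"])]
    for bits, outs in sorted(to_truth_table(rows).items()):
        print(bits, sorted(outs))
```

Because overlapping rows are merged, a state machine row and an instruction
row can be written separately and still combine into a single table entry.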
The sample input file presented in Figure 2 shows how instructions are
specified. The MOV_A_B instruction, which causes the contents of the A
register to be transferred to register B, is executed by asserting two control
lines: OUTPUT_1 and OUTPUT_2. In this example, the control line OUTPUT_1
would be the "read A" register control line and the OUTPUT_2 control line
would be the "write B" register control line. Any number of control lines may
be specified; in the case of the subtract instruction, only one control line
is asserted.
Two extra input bits are represented in the input file for the NON-VON
processing element: the "leaf/not-leaf" and "left-child/not-left-child" bits
that were discussed earlier. These two bits are analyzed by the PLATO program
upon scanning of the input file and are used to separate the single input
into four truth tables, each representing the function of one of the
corresponding types of PLA. Figure 3 shows a small piece of the NON-VON 3
PLATO input file. Typically, this file would have more lines than the number
of opcodes in the instruction set. The NON-VON 3 input file,
for example, has 147 lines.
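The separation of one description into four tables can be sketched as follows.
The row layout, the encoding of the two type bits (1 = leaf, 1 = left child)
and all names in the example are assumptions made for illustration; they are
not taken from the PLATO source.

```python
# Hypothetical encoding of the two type bits: (leaf bit, left-child bit).
PE_TYPES = {
    "leaf_left": ("1", "1"),
    "leaf_right": ("1", "0"),
    "internal_left": ("0", "1"),
    "internal_right": ("0", "0"),
}


def split_by_pe_type(rows):
    """Distribute rows of a single description over the four PE types.

    Each row is (leaf_bit, left_bit, opcode_pattern, control_lines), where
    the two type bits may be '0', '1' or 'X' (applies to both values).
    The type bits are dropped from the per-type tables, since each PLA
    serves exactly one PE type.
    """
    tables = {name: [] for name in PE_TYPES}
    for leaf_bit, left_bit, pattern, outputs in rows:
        for name, (leaf, left) in PE_TYPES.items():
            if leaf_bit in ("X", leaf) and left_bit in ("X", left):
                tables[name].append((pattern, outputs))
    return tables


if __name__ == "__main__":
    rows = [
        ("X", "X", "0001", ["READ_A"]),     # same in every PE type
        ("X", "1", "0010", ["LATCH_IO"]),   # left children only
        ("0", "X", "0010", ["SEND"]),       # internal nodes only
    ]
    for name, table in split_by_pe_type(rows).items():
        print(name, table)
```

Rows that do not apply to a given PE type never reach that type's table, which
is where the area saving over a single shared PLA would come from.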
Figure 3:
/******************* NON-VON 3 PLA PLATO Inputs *******************/
/* format: (lf/nl), mnemonic, IR0-IR7, S0, S1, EN1, Is, LC, Ctrl-lines */
/* input lines into PLA (source) */
/* IR reg */ IR0, IR1, IR2, IR3, IR4, IR5, IR6, IR7,
/* PLA */ S0, S1,
/* PE */ EN1, Is, LC;
/* same input lines, but ordered for PLA */
S1, S0, IR0, IR1, IR2, IR3, IR4, IR5, IR6, IR7,
EN1, LC, Is;
/* Ctrl lines driven by PLA (dest) */
I· dpth ·1 
I· RESOLVE 
I· 10 ·1 I· PLA II 
RO l..RAM8 , WR1RAMS, WR lRAMl. RD~l. RDl.M.l.R. RD 1EN1. S E., L"n , 
WRIENl. WR1MAR. RO'IR, WR1BS. WR2B1. WR1Bl. ROTL, R01S1-
ROIBS, WR1A8. WR2Al, WRlAl. RDlAl, RD1A8, ADO, SUB, LCGICA:.. 
WR2Cl, WR2CS. WRICS, '...:RIC1, RDICl. RDICS. WR2IC8, WRlIea. 
WR2IOl, WR1I01, ROIIOl, R02I01. RDII08. RD2IC8, 
·1 
KCA, KCB. 
ORO. DRl. DR2. Ioo. 101, 102. 103. 104. 105. 106, 107. 
SOne~, Slne~; 
I' (a): nl non-lea! '1 
I· (I): l! lea! II 






'f:XXXXXXX x::n.1X I06; 
oxxxxxxx 10XIX SOne~; 
llX1XXXX 10XOX SCne~.Slne~; 
10XOOXXX 00x::n. 100.105.106; 
00000001 101XX WRIBS.RDlAB: 
.. 100XX RDlAS; 
01010001 100XX RDIIC8 ADO; 
.. 101XX WR2C1.WR2C8.ROIIOS ADD-CCMP~_IC8 01010:X1 100XX RDII08 SUB; • 
101XX WR2Al,WR2Bl.RD1I08 Sl~-
ENABLE 011101XX 10XXX SETEN1; , 









10001011 101XX RDL~.IOO.IOS.I07,DRl.CR2; 
lC01~001 100XX RDIBS, Ioo,I02, I04.DRO: 
101XO RDIBS,WR2I08,Ioo,I02,r04.DRO; 
101Xl RDIBS. roo, I02 104 DRO-1110~XXX lOXXX KCA,IOO.IOS,I07,DR1,DR2: 
.. 
" 
llXXX KCB,IOO,IOS,I07 DRl DR2-
OOXXX ORO: '" 
5. Automatic Weinberger Array Layout
In order to achieve an efficient use of silicon area, PLATO generates an
array using a variation of the Weinberger Array [3] layout technique. In a
Weinberger Array, the highly regular structure of a conventional PLA is
compressed into a functionally equivalent, but smaller form. The resulting
layout is less regular, and conceptually more complex, than a traditional PLA.
By way of background, an ordinary PLA layout consists of an AND-plane and an
OR-plane. The AND-plane comprises a set of regularly spaced columns
incorporating logic gates capable of generating the logical conjunctions of
its inputs. In the context of the processing element application, these
inputs are the instruction opcodes. The OR-plane is constructed similarly
from a set of regularly spaced gating elements that are used to generate the
logical disjunction of the outputs of the AND-plane gates. The OR-plane is
rotated 90 degrees from the orientation of the AND plane, allowing the outputs
of the AND-plane to connect to the inputs of the OR-plane.
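The logical behavior of the two planes can be modeled in a few lines. This is
a functional sketch only and says nothing about layout; the representation of
product terms as strings over '0', '1' and 'X', and the names evaluate_pla,
OUTPUT_1 and OUTPUT_2 as used here, are illustrative assumptions.

```python
def evaluate_pla(product_terms, or_plane, inputs):
    """Evaluate a PLA described by an AND-plane and an OR-plane.

    product_terms: one pattern per AND-plane column over '0', '1', 'X'.
    or_plane:      maps each control line to the indices of the AND-plane
                   columns it is connected to.
    inputs:        the opcode/state bits as a string of '0' and '1'.
    """
    # AND-plane: a column fires when every specified bit matches.
    fired = [all(p == "X" or p == b for p, b in zip(term, inputs))
             for term in product_terms]
    # OR-plane: a control line is asserted when any connected column fires.
    return {line: any(fired[k] for k in columns)
            for line, columns in or_plane.items()}


if __name__ == "__main__":
    terms = ["01X", "1X1"]
    or_plane = {"OUTPUT_1": [0], "OUTPUT_2": [0, 1]}
    for bits in ("010", "111", "000"):
        print(bits, evaluate_pla(terms, or_plane, bits))
```

In this toy example OUTPUT_1 connects to a single AND column while OUTPUT_2
connects to both, so most crossings in a full OR-plane would be unused.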
In constructing most processing elements of the kind used in highly parallel
machines, the population of transistors in the AND-plane far exceeds that
in the OR-plane. For this reason, a considerable amount of silicon area is
typically wasted when a conventional PLA is used to realize the control path
logic in such a processing element. The Weinberger Array is capable of
providing significant area savings in such applications.
This technique uses a conventional array structure for the AND-plane, but
obviates the need for a full OR-plane. In the example of a Weinberger PLA
shown in the figure, the AND-plane is on the bottom and the OR-plane on top.
Instruction opcode bits enter the AND-plane from the right. Control lines
exit the entire array from the bottom. The columns in the AND-plane feed into
the top and make contact with wires that run horizontally in polysilicon.
These wires extend only as far as the control line columns that
require the particular result being carried in the wire. If the layout is
designed appropriately, several different wires can often share a single track
in the OR-plane. Compaction is achieved through the shared use of tracks;
careful placement of AND-plane columns yields a layout with a minimal or near-
minimal number of tracks.
[Figure: example Weinberger PLA layout]
The authors are not aware of any earlier PLA generation tools that generate
Weinberger Array layouts automatically. Typically, the layout engineer must
manufacture the Weinberger array by hand. PLATO, on the other hand, employs a
channel-routing algorithm (described in the next section) to automatically
specify the wiring of the Weinberger array. Unlike the usual channel routing
problem, one end of the wire in the channel is connected to an AND-plane
output while the other is the gate of a transistor in the OR-plane. The
automatic Weinberger array layout algorithm incorporated in PLATO has
successfully produced a PLA for NON-VON 3 that is approximately 25% smaller
than the corresponding one produced using conventional PLA generation
techniques.
In the rare instances in which two wires in the array share the same track and
have transistor gates at the ends that meet, the highly compact AND-plane
layout primitives used by PLATO can result in design rule violations in the
Weinberger array. PLATO detects these cases and automatically produces extra
room in the array to resolve each conflict, as illustrated in the figure.
Empirically, however, such cases have been found to occur quite infrequently.
Out of a total of 208 AND-plane columns in a NON-VON 3 PLA, for example,
4 incorporate extra space to avoid design rule violations. PLATO's
"management by exception" approach thus permits effective compaction of the
control path without the introduction of design rule errors.
Figure 5: Initial net-list representation: 4 tracks. Columns, left to right:
i1, i2, i3, i4, o1, o2, o3.
6. Channel Routing Algorithm
The channel routing algorithm consists of two basic phases. The first phase
sets up a data structure that represents the placement of columns in the
AND-plane with control lines that leave the OR-plane and are routed through
the AND-plane. The second phase permutes this data structure, changing the
relative positions of all of the columns in the AND-plane in such a way as to
produce an OR-plane with a minimum number of tracks, and hence the least
amount of silicon area.
The initial form of the data structure, called an ordering, is generated from
the minimized truth table that represents the function to be realized by the
PLA. The ordering is a list of the desired AND-plane results appended to a
list of the control line outputs, enumerated in the order in which they are to
appear in the layout. Each AND result is generated by a column in the
AND-plane. The AND columns may be permuted at will, but the control line
columns must retain the order specified by the user.
The ordering is augmented by a net-list representation of the OR-plane. Like
the AND-plane, this net-list is initially generated from the truth table.
Figure 5 shows an example of this data structure. Along the bottom, the
labels i1 through i4 represent AND columns and the labels o1 through o3
represent control line columns. The rest of the figure shows a net-list that
represents a connection scheme between the inputs and the outputs of
the OR-plane. At this stage, whether a certain connection between a horizontal
wire and a vertical column is a contact or a transistor gate is immaterial to
the problem of minimizing the number of tracks in the array.
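How a given ordering and net-list determine a track count can be sketched with
the classic left-edge algorithm from channel routing. This is a stand-in
technique chosen for illustration, not necessarily what PLATO does, and the
net-list below is hypothetical: the connections in Figure 5 are not
recoverable, so the dictionary NETS and the name assign_tracks are invented
for this example.

```python
def assign_tracks(order, nets):
    """Assign each horizontal wire to a track using the left-edge algorithm.

    order: column names, left to right (AND columns and control lines).
    nets:  maps an AND column to the control line columns that use it.
    Two wires may share a track only if their column spans do not touch.
    Returns (wire -> track index, number of tracks used).
    """
    pos = {name: k for k, name in enumerate(order)}
    spans = []
    for src, sinks in nets.items():
        cols = [pos[src]] + [pos[s] for s in sinks]
        spans.append((min(cols), max(cols), src))
    spans.sort()                      # process wires by left edge
    track_ends = []                   # rightmost occupied column per track
    assignment = {}
    for lo, hi, src in spans:
        for t, end in enumerate(track_ends):
            if end < lo:              # track is free to the left of this wire
                track_ends[t] = hi
                assignment[src] = t
                break
        else:
            track_ends.append(hi)     # open a new track
            assignment[src] = len(track_ends) - 1
    return assignment, len(track_ends)


# Hypothetical net-list, not the one in the paper's figure.
NETS = {"i1": ["o1"], "i2": ["o2", "o3"], "i3": ["o1", "o2"], "i4": ["o3"]}

if __name__ == "__main__":
    initial = ["i1", "i2", "i3", "i4", "o1", "o2", "o3"]
    permuted = ["o1", "i1", "i3", "o2", "i2", "o3", "i4"]
    print(assign_tracks(initial, NETS))
    print(assign_tracks(permuted, NETS))
```

For this invented net-list the initial ordering needs four tracks, while the
permuted ordering lets wires share tracks and needs only two.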
Once the initial setup is completed, a depth-first search for a connection
scheme with the least number of tracks is performed. Figure 6 shows the
result of applying this algorithm to the net-list depicted in Figure 5. Note
that the order of the control line columns is preserved, although their
positions have changed. The ordering of the AND plane columns has been
successfully permuted to allow a connection scheme of only three tracks, the
best possible result for this example. At this point, PLATO determines [...]
Figure 6: Net-list after permutation: 3 tracks (optimum). Columns, left to
right: o1, i1, i3, o2, i2, o3, i4.
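A depth-first search of this kind can be sketched as an exhaustive enumeration
of column arrangements in which the AND columns are placed freely and the
control line columns keep their given relative order. The sketch below is an
assumption-laden illustration: it does no pruning, it measures tracks as the
maximum number of wire spans covering any column, and its net-list is
hypothetical rather than the one in Figures 5 and 6. The names track_count
and best_ordering are invented here.

```python
def track_count(order, nets):
    """Tracks needed = maximum number of wire spans covering any column."""
    pos = {name: k for k, name in enumerate(order)}
    spans = []
    for src, sinks in nets.items():
        cols = [pos[src]] + [pos[s] for s in sinks]
        spans.append((min(cols), max(cols)))
    return max(sum(lo <= p <= hi for lo, hi in spans)
               for p in range(len(order)))


def best_ordering(and_cols, out_cols, nets):
    """Depth-first search over arrangements; returns (tracks, ordering)."""
    best = [float("inf"), None]

    def dfs(prefix, remaining_and, next_out):
        if not remaining_and and next_out == len(out_cols):
            tracks = track_count(prefix, nets)
            if tracks < best[0]:
                best[0], best[1] = tracks, list(prefix)
            return
        for col in sorted(remaining_and):          # place any AND column
            prefix.append(col)
            dfs(prefix, remaining_and - {col}, next_out)
            prefix.pop()
        if next_out < len(out_cols):               # or the next control line
            prefix.append(out_cols[next_out])
            dfs(prefix, remaining_and, next_out + 1)
            prefix.pop()

    dfs([], frozenset(and_cols), 0)
    return best[0], best[1]


# Hypothetical net-list, not the one in the paper's figures.
EXAMPLE_NETS = {"i1": ["o1"], "i2": ["o2", "o3"],
                "i3": ["o1", "o2"], "i4": ["o3"]}

if __name__ == "__main__":
    ands, outs = ["i1", "i2", "i3", "i4"], ["o1", "o2", "o3"]
    print(track_count(ands + outs, EXAMPLE_NETS))
    print(best_ordering(ands, outs, EXAMPLE_NETS))
```

Exhaustive enumeration is practical only for toy inputs like this one; a PLA
with a couple of hundred AND columns would need pruning or heuristics, which
the paper does not describe in the surviving text.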
PLATO generates the actual layout description, expressed in Caltech
Intermediate Form (CIF), in two stages. First, the AND-plane is generated
by placing the layout primitives in those positions described by the
corresponding portion of the truth table. In the second stage, the OR-plane
is generated by placing primitives in positions specified by the net-list.
The layout is produced after labels are attached to appropriate places in the
layout.
7. Conclusion 
The PLATO tool employs three techniques to minimize the area required by the
control paths of processing elements for highly parallel machines:
1. The generation of control paths through the automatic generation,
using a channel-routing algorithm, of Weinberger Arrays.
2. The automatic generation of multiple types of PLA adapted to the
distinct types of PLA's incorporated in different PE's.
3. The use of highly compact layout primitives, together with an
automatic procedure for resolving any resulting design rule
violations.
Based on area comparisons between the NON-VON 1 and NON-VON 3 PE's, it
appears that each of these three techniques has proven responsible for an area
reduction on the order of 25%. The novel techniques embodied in the PLATO
system have thus been responsible in large part for our ability to increase
the number of processing elements in one PPS chip, which is one of the
essential cornerstones of the NON-VON approach to massively parallel
computation.
References
1. Shaw, "The NON-VON Supercomputer", Columbia Computer Science
Department, August, 1982.
2. Carver Mead and Lynn Conway, Introduction to VLSI Systems, c. 1980,
Addison-Wesley Publishing Co., Reading, Mass.
3. A. Weinberger, "Large scale integration of MOS complex logic: A layout
method", IEEE J. Solid-State Circuits, vol. SC-2, pp. 182-190, Dec.
1967.
