Evolution of the NON-VON Supercomputer by Shaw, David Elliot
EVOLUTION OF THE NON-VON SUPERCOMPUTER 
David Elliot Shaw 
CUCS-79-83 
Department of Computer Science, Columbia University, New York, New York 10027 
ABSTRACT 
NON-VON is a very high performance 
experimental supercomputer, portions of which are 
now being implemented at Columbia University. If 
our efforts are successful, it should be possible 
to construct NON-VON machines of various sizes 
that could ultimately support the extremely rapid 
execution of a wide range of information 
processing tasks relevant to the defense community 
in a highly cost-effective manner. This paper 
briefly sketches the most important aspects of the 
NON-VON architecture, identifies a few of our 
current architectural objectives, and describes 
the phased hardware implementation plan we have 
adopted for the next three years. 
INTRODUCTION 
NON-VON [Shaw, 1980; Shaw, 1982] is a 
massively parallel non-von Neumann supercomputer 
architecture that has been under investigation at 
Columbia since 1980. The machine was originally 
designed to provide highly efficient support for 
the kinds of symbolic information processing tasks 
that seem to arise frequently in the context of 
large-scale artificial intelligence and database 
management applications. While such tasks remain 
our primary focus, we have since come to suspect 
that the NON-VON architecture may prove applicable 
to such diverse application areas as signal 
processing, physical simulation, and low-level 
computer vision as well. The architecture could 
provide a common basis for the construction of 
very high performance machines having a wide range 
of physical sizes, extending from compact embedded 
systems to centralized large scale supercomputers. 
This paper identifies a few of our current 
architectural objectives, and describes the phased 
hardware implementation plan we have adopted for 
the next three years. Due to space limitations, 
software considerations will not be discussed in 
this paper, despite the fact that they have 
occupied a large fraction of our time. 
PROJECT GOALS 
The following goals are central to the NON-VON 
Project: 
1. The experimental construction of 
working prototypes of the NON-VON 
family of machines, in an attempt to 
validate certain innovative 
architectural principles that could 
have important practical implications. 
2. The development of languages, 
translators, and operating systems 
capable of effectively exploiting the 
potential parallelism of such machines 
without the introduction of prohibitive 
software complexity. 
3. The implementation of a modest corpus 
of working applications software that 
demonstrates NON-VON's potential 
advantages in the context of different 
kinds of computational tasks. 
Our approach to the solution of the hardware-related 
aspects of these goals involves: 
1. The extensive intermingling of 
processing and memory resources, 
supporting massive "fine granularity" 
parallelism. 
2. The construction of machines based on 
heterogeneous interconnection 
topologies, and incorporating both 
"large" and "small" processing 
elements. 
3. The provision of hardware support for 
both SIMD and MIMD control regimes, to 
support a wide range of parallel 
algorithms involving different modes of 
inter-processor communication. 
This research was supported in part by the Defense Advanced Research Projects Agency under contract 
N00039-80-G-0132. 
Invited paper, EASCON '83, Washington, D.C., November, 1983 
STAGED DEVELOPMENT OF THE NON-VON MACHINE 
During the coming three-year period, we plan 
to proceed in several stages toward the 
satisfaction of our long-range goals. Our 
(partially overlapped) three-stage development 
strategy is designed to minimize the risk involved 
in developing a highly unconventional 
supercomputer. We plan to begin by implementing 
and testing a relatively simple machine which 
nonetheless incorporates what we regard as the 
most essential elements of a full-scale NON-VON 
supercomputer. Architectural enhancements are to 
be added in stages, yielding incremental increases 
in power and generality without the introduction 
of an unmanageable increase in conceptual or 
engineering complexity at any single stage. 
NON-VON 1 
The first version we intend to implement, 
which we call NON-VON 1, comprises three 
subsystems: the Primary Processing Subsystem 
(PPS), the Secondary Processing Subsystem (SPS) 
and the Control Processor (CP). Briefly, the PPS 
incorporates a large number of simple, highly 
area-efficient Small Processing Elements (SPE's), 
configured as a binary tree. Each SPE is much 
smaller than a conventional microprocessor, 
allowing between 8 and 16 SPE's to be embedded 
within a single custom nMOS integrated circuit 
chip. We have recently completed, and are now in 
the process of testing and debugging, a circuit 
containing just one SPE, from which we hope to 
gain valuable information at the functional and 
electrical levels. Each SPE contains an eight-bit 
comparator unit, a one-bit ALU, a 64-byte local 
random access memory, and a small amount of logic 
for enabling and disabling the SPE and for 
communicating with other SPE's in the PPS. 
The CP is a conventional general purpose 
computer that broadcasts instructions to be 
executed simultaneously by all PE's in the PPS. 
Because only a single CP will be incorporated in 
the NON-VON 1 prototype, the machine will be 
limited to single instruction stream, multiple 
data stream (SIMD) applications, in which the CP 
sends instructions to be executed in "lock step" 
by all processing elements. (As we shall see, 
this restriction is to be relaxed in later 
versions of the machine.) 
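The lock-step broadcast described above, combined with the per-SPE enable and disable logic, can be illustrated with a minimal Python sketch. The class name `SPE`, the `broadcast` function, and the two sample instructions are hypothetical and are not taken from the NON-VON instruction set. The sequential loop only stands in for what the hardware would do in parallel, and the tree interconnection is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SPE:
    """Hypothetical model of one small processing element."""
    value: int = 0        # stand-in for one byte of local RAM
    enabled: bool = True  # a disabled SPE ignores broadcast instructions


def broadcast(spes, instruction):
    """Model the control processor issuing one instruction.

    Every enabled SPE applies the same instruction to its own data.
    """
    for spe in spes:
        if spe.enabled:
            instruction(spe)


spes = [SPE(value=v) for v in [5, 12, 7, 12, 3, 12, 9]]


def match_12(spe):
    # Compare local data with a broadcast constant; stay enabled on a match.
    spe.enabled = (spe.value == 12)


def clear(spe):
    spe.value = 0


broadcast(spes, match_12)  # step 1: select the matching SPE's
broadcast(spes, clear)     # step 2: only the selected SPE's execute this

values = [spe.value for spe in spes]
```

Under these assumptions, a single instruction stream still produces data-dependent behavior, because each element decides locally whether it takes part in the following steps.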
The SPS is based on a number of "intelligent 
disks" whose individual disk heads are each 
associated with a small amount of hardware capable 
of dynamically examining the data that pass 
underneath them, and passing selected records 
along to the PPS in a highly parallel fashion. 
Although the SPS is a key part of the NON-VON 
architecture, we have not yet begun to implement 
this subsystem. For this reason, our NON-VON 1 
prototype will be limited to the execution of SIMD 
algorithms in which the argument and result do not 
exceed the capacity of the PPS. 
Unlike more recent versions of the 
architecture, NON-VON 1 performs all arithmetic 
and logical operations except for comparison 
(which has special importance in most NON-VON 
algorithms) in a bit-serial fashion, and is rather 
limited in its choice of operands for most 
instructions. Because only one SPE is embedded on 
our initial prototype NON-VON 1 chip, a relatively 
low priority was placed on the minimization of 
silicon area; detailed measurements of the NON-VON 
1 layout have, however, formed the basis for the 
highly efficient floor plans now under development 
for use in later versions. 
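As a generic illustration of bit-serial arithmetic (not a description of the actual NON-VON 1 instruction sequence, which this paper does not give), the following Python sketch adds two eight-bit values one bit position per step, the way a one-bit ALU with a carry flag would. The function name `bit_serial_add` is hypothetical.

```python
def bit_serial_add(a, b, width=8):
    """Add two unsigned integers one bit per step.

    Returns the sum modulo 2**width and the final carry bit.
    """
    carry = 0
    result = 0
    for i in range(width):      # one step per bit position, LSB first
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry       # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))
        result |= s << i
    return result, carry


total, carry_out = bit_serial_add(100, 57)
```

An eight-bit addition takes eight passes through the loop here, which suggests why a wider ALU can shorten arithmetic-heavy instruction sequences.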
For the sake of completeness, it is probably 
worth mentioning at this point that the name NON-VON 
2 was assigned to an interesting architectural 
exercise that we do not currently plan to carry 
beyond the "paper-and-pencil" stage, although its 
essential ideas may well influence future NON-VON 
designs. 
NON-VON 3 
The machine we now call NON-VON 3 forms the 
basis for much of the work we plan to do during 
the next three years. Like NON-VON 1, our NON-VON 
3 prototype will include no disk drives and only a 
single control processor, and will thus be capable of 
executing only SIMD algorithms in which the data 
does not exceed the capacity of the PPS. The 
machine will be similar in most respects to the 
original NON-VON 1 design, but will incorporate a 
number of improvements suggested by the results of 
our initial experiments in chip design and 
software development. In particular, the NON-VON 
3 SPE will feature: 
1. An area-efficient eight-bit ALU to 
replace the one-bit ALU incorporated in 
the prototype NON-VON 1 SPE chip. 
2. Fewer local registers, based on NON-VON 
1 area measurements and software 
simulation results. 
3. A far better floor plan, formulated 
using precise measurements taken from 
the prototype chip. 
4. A generalization of certain NON-VON 
instructions to support the more 
efficient execution of many common 
instruction sequences. 
The NON-VON 3 instruction set is nearly 
identical to, and with few exceptions more 
general than, the one employed in NON-VON 1. Some 
of the additions in fact correspond to commonly 
used macros in our existing NON-VON 1 software. 
Before adopting this instruction set, however, we 
were careful to ensure that all existing NON-VON 1 
software could be simply and mechanically 
translated into NON-VON 3 instructions, so that 
none of our work to date would be lost. Such a 
translator should be completed shortly. 
Translated programs will take advantage of some, 
but not all, of NON-VON 3's enhancements. In the 
future, of course, NON-VON 3 software will be 
written using NON-VON 3 instructions, allowing the 
exploitation of all of these features. 
NON-VON 4 
The NON-VON 1 and 3 machines should serve to 
validate many of our most important architectural 
ideas, yielding major performance improvements on 
a number of problems amenable to SIMD execution. 
The more sophisticated NON-VON 4 architecture, 
though, is intended to provide for the highly 
efficient execution of a much wider range of 
computational tasks than NON-VON 1 and 3. The 
most significant enhancements we expect to 
incorporate in NON-VON 4 involve the addition of a 
few thousand large processing elements (LPE's) 
within the top portion of the PPS tree, all 
interconnected in a high-bandwidth interconnection 
network, and each capable of serving as a control 
processor for an independent PPS subtree. This 
should give NON-VON 4 the capacity for multiple 
instruction stream, multiple data stream (MIMD) 
and multiple SIMD (MSIMD) operations, multi-tasking 
and multi-user applications, and such 
problems as physical simulation for which the top 
of the NON-VON 3 tree would otherwise represent a 
significant communication bottleneck. 
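The multiple-SIMD idea, in which each processor near the top of the tree drives its own subtree with a separate instruction stream, can be sketched in Python as follows. The heap-order tree layout, the function `subtree_nodes`, and the two sample instructions are assumptions made for illustration only. The high-bandwidth network among the LPE's is not modeled, and the sequential loops stand in for parallel execution.

```python
def subtree_nodes(root, n):
    """Indices of all nodes in the subtree rooted at `root`.

    The binary tree has n nodes stored in heap order, so the children
    of node i are 2*i + 1 and 2*i + 2.
    """
    stack, nodes = [root], []
    while stack:
        i = stack.pop()
        if i < n:
            nodes.append(i)
            stack.extend((2 * i + 1, 2 * i + 2))
    return sorted(nodes)


n = 15                 # toy tree with four levels
data = list(range(n))  # one value per processing element

# Hypothetical assignment: the two nodes at depth 1 act as control
# processors, each issuing a different instruction to its own subtree.
programs = {1: lambda v: v + 100, 2: lambda v: v * 2}

for root, instruction in programs.items():
    for i in subtree_nodes(root, n):
        if i != root:  # the root plays the controller role in this sketch
            data[i] = instruction(data[i])
```

Because the two subtrees are disjoint, their instruction streams never touch the same element, which is what lets them run independently of one another.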
We hope to realize an additional 
multiplicative factor in total throughput by 
reducing the effective instruction cycle time 
(which is equal to the time required for parallel 
inter-SPE communication) far below the estimated 
two microseconds projected for NON-VON 1 and 3. 
Among the techniques we plan to employ to achieve 
such an improvement are a separation of 
instruction broadcast and inter-SPE data 
communication functions, the provision of a wider 
instruction broadcast data path, local caching of 
instructions, and tree pipelining of blocks of 
instructions during transfer to the local caches. 
Rough initial estimates suggest that these 
techniques might reduce average instruction cycle 
time by as much as a factor of four or five. 
Another important feature of the NON-VON 4 
design is the incorporation of a large number of 
standard, commercially available dynamic RAM 
chips, which we expect to couple tightly to the 
individual PPS chips. While we expect this RAM to 
be used in several different ways within the NON-VON 
4 machine, one of its most important functions 
would be as a high bandwidth "swapping memory", 
allowing data to be very rapidly transferred to 
and from the many local RAM's embedded within the 
PPS. 
ACKNOWLEDGEMENTS 
The following individuals have made 
substantial contributions to the theory, 
architecture, design, implementation, simulation, 
and programming of the NON-VON machine: Aaron 
Ackman, Dave Bacon, Raju Sopardikar, Peter Brajak, 
Tapan Chakraborty, Jane Chan, Dong Choi, Dayton 
Clark, Yoram Eisenstadter, Tyrone Faas, Bob Floyd, 
Bruce Hillyer, Lincoln Hu, Mark Huppert, Hussein 
Ibrahim, Kevin Kalajan, Don Knuth, Andrew 
Kosoresow, Stuart Kreitman, John Lai, Michael 
Lebowitz, Deep;! 1"4jmuctar, Ted Markowitz, Dan 
Miranker, Robert Montay, Abed Mougharbel, Reynaldo 
Newman, Terry Newton, Alessandro Piol, Rupan Roy, 
Ted Sabety, Ella Sanders, Sanjiv Sharma, Salvatore 
J. Stolfo, Arthur Suo, Steve Taylor, Danny Sykora, 
Shun Ueda, Gio Wiederhold, Michael Weisberg, and 
Terry Winograd. The author gratefully 
acknowledges their contributions. 
REFERENCES 
Shaw, David Elliot, Knowledge-Based Retrieval 
on a Relational Database Machine, Ph.D. Thesis and 
Computer Science Department Technical Report, 
Stanford University, May, 1980. 
Shaw, David Elliot, "The NON-VON 
Supercomputer", Technical Report, Department of 
Computer Science, Columbia University, October, 
1982. 
