Equipment for the implementation of a digital signal processing supercomputer for complex knowledge-based systems by Barnwell, Thomas Pinkney
GEORGIA INSTITUTE OF TECHNOLOGY 	 OFFICE OF CONTRACT ADMINISTRATION 
PROJECT ADMINISTRATION DATA SHEET 
El ORIGINAL n REVISION NO. 	 
	 GTRC/4S1 	 DATE  3 / 10 / 87  
	  School/Lab  EE  
Project No./(Center No.) E-21-624 (R6277 -0A0) 
Project Director: 	T. P. Barnwell, III 
Sponsor: U. S. Army Research Office 
  
Agreement No.: Grant No. DAAL03-87-G-0067 
Award Period: From 	3 / 1 / 87 	To 	2/29/88 	(Performance) 	4/30/88 	Reports 
Sponsor Amount: 
Contract Value: $ 
Funded: $ 
New With This Change Total to Date 
$  144,000  
$ 144,000 
Cost Sharing No./(Center No.)  E-21-381 (F6277 -0A0) Cost Sharing: $ 	36,000 
     
Title: Equipment for the Implementation of a Digital Signal Processing Supercomputer 
for Complex Knowledge-Based Systems 
ADMINISTRATIVE DATA 	 OCA Contact 
1) Sponsor Technical Contact: 
Military Security Classification: 	  
(or) Company/Industrial Proprietary: 	  
John B. Schonk 
2) Sponsor Issuing Office: 
Abram Van Hall 
U. S. Army Research Office 
P. O. Box 12211 
Research Triangle Park, NC 27709 
ONR Resident Rep. is ACO: 





See Attached 	 Supplemental Information Sheet for Additional Requirements. 
Travel: Foreign travel must have prior approval. Contact OCA in each case. Domestic travel requires sponsor 
approval where total will exceed greater of $500 or 125% of approved proposal budget category. 




Research Administrative Network 
Research Property Management 
Accounting 
SPONSOR'S I.D. NO 	  
Procurement/GTRI Supply Services 
Research Security Services 






Continued by Project No. 





GEORGIA INSTITUTE OF TECHNOLOGY OFFICE OF CONTRACT ADMINISTRATION 
SPONSORED PROJECT TERMINATION/CLOSEOUT SHEET 
Date 6/ 3/88 
Project No. E-21-624 
Includes Subproject No.(s) N/A  
School/Lab EE 
Project Director(s) T.P. Barnwell GTRC / GIT 
Sponsor 	U. S. Army Research Office 
Equipment for the Implementation of a Digital Signal Processing 
Supercomputer for Complex Knowledge-Based Systems 
Effective Completion Date: 2/29/88 
 
(Performance) 4/30/88 (Reports) 
 
    
Grant/Contract Closeout Actions Remaining: 
   
None 
Final Invoice or Copy of Last Invoice Serving as Final 
Release and Assignment 
Final Report of Inventions and/or Subcontract: 
Patent and Subcontract Questionnaire 
sent to Project Director 
Govt. Property Inventory & Related Certificate 
Classified Material Certificate 
Other 	  
Title 
Continues Project No. 	  
COPIES TO: 
Project Director 
Research Administrative Network 
Research Property Management 
Accounting 
Procurement/GTRI Supply Services 
Research Security Services 
Reports Coordinator (OCA) 
Program Administration Division 
Contract Support Division 
EQUIPMENT FOR THE IMPLEMENTATION OF A 
DIGITAL SIGNAL PROCESSING SUPERCOMPUTER 
FOR COMPLEX KNOWLEDGE BASED APPLICATIONS 
Final Report Submitted to the 
ARMY RESEARCH LABORATORY 
Research Agreement No. DAAL03-87-G-0067 
by 
School of Electrical Engineering 
Georgia Institute of Technology 
Atlanta, Georgia 30332 




Thomas P. Barnwell III 
Professor of Electrical Engineering 
and Rockwell Fellow 
Contents 
1 Introduction 
2 The Original Laboratory Network 
3 The Proposed Laboratory Network 
4 The Funding Constraints 
5 The Equipment Upgrade 
1 Introduction 
This contract provided equipment to integrate a digital signal processing 
(DSP) multiprocessor supercomputer, called the Optimal Synchronous 
Cyclo-Static Array (OSCAR), into the experimental environment at the 
Digital Signal Processing Laboratory at Georgia Tech. The equipment 
requested consisted of two major components: a VAX 11/785 computer 
system, to be used as a control processor for the OSCAR; and a Symbolics 
3645 LISP processor, to be used for knowledge-based signal processing 
applications both for the OSCAR and for other areas of the DSP research 
program. The OSCAR project, which was being funded as part of the DARPA 
multiprocessor program on Strategic Computing, was an outgrowth of 
theoretical research funded under the Joint Services Electronics Program 
(JSEP). The proposed equipment was needed to allow wide and 
flexible access to the OSCAR so that it could be both effectively developed 
and widely utilized in the total basic research program. The research 
areas which were to be directly affected by this equipment included optimal 
implementation of DSP algorithms on synchronous multiprocessors, image 
enhancement and reconstruction, computer vision, adaptive systems, and 
VLSI design of multiprocessor systems. All of the above areas are currently 
supported by the DoD at the DSP Laboratory at Georgia Tech. 
In addition to the areas of research directly affected by the OSCAR, the 
LISP processor was also to have a major impact on many other areas of 
DSP research. These include knowledge-based spectrum analysis, 
speech recognition and coding, image coding, computer vision, and objective 
measures for speech quality. All of these areas are also currently supported 
at the DSP Laboratory by either the DoD or other funding agencies. 
2 The Original Laboratory Network 
Figure 1 shows the layout of the computer network for the DSP Laboratory 
at the time this equipment upgrade was proposed. At that time, the 
network utilized four computers: a Data General MV10000 running AOS-VS; a 
Data General Eclipse S250 running AOS; a Data General Eclipse S250-AP, 
also running AOS; and a VAX 11/780 running UNIX. All four of these 
machines were connected in a network. All of the Data General computers were 
interconnected using an 8 megabit/second parallel bus called a 
multiprocessor communications adapter, or MCA. All three Data General computers 
ran the Xodiac network, which uses an X.25 protocol. The VAX 11/780 
was connected to the MV10000 using an Ethernet. The VAX connected into 
the network using TELNET and the TCP/IP protocol. All the resources 
from all the processors were accessible from anywhere in the network. 
Each computer in the network was used for a particular class of activities. 
The MV10000 was the primary DSP research machine, and supported the 
bulk of the DSP simulations and implementations. The Eclipse S250 was 
used exclusively for supporting educational activities. The Eclipse S250-AP 
was primarily a network server which supported the array processors and 
the data acquisition system. The VAX 11/780 was dedicated to VLSI and 
microelectronics research. All of the computers were then being used to 
capacity, operating twenty-four hours a day, seven days a week. 
3 The Proposed Laboratory Network 
The primary purpose of the proposed research equipment upgrade was to 
enhance the computational capacity of the computer network, to allow the 
OSCAR to be effectively utilized in a broad spectrum of DSP research 
activities, and to enhance the environment for knowledge-based signal processing 
research. To achieve this goal, two different components were required. The 
first was a powerful, general purpose computer which could be used to 
integrate the OSCAR into the existing computer network. The second was 
a dedicated LISP machine to be used with the OSCAR and other research 
activities in large, complex system realizations. The LISP processor would 
have the additional advantage of substantially improving the experimental 
environment of all of the knowledge-based signal processing research. 
The network upgrade which was proposed is shown in Figure 2. The 
computer system which was proposed to support the OSCAR was a VAX 
11/785 from the Digital Equipment Corporation (DEC). The system in-













































Figure 1: Original DSP Computer Network. 
accelerator, 6250 bpi magnetic tape, two unibus chassis, 1.2 gigabytes of 
mass storage, and 32 asynchronous terminal ports. The system was specif-
ically configured to run Berkeley UNIX 4.2. 
The system which was proposed to support the knowledge-based signal 
processing research was based on a Symbolics 3645 processor. The system 
included a Symbolics 3645 processor with 0.38 gigabytes of mass storage, 4 
megabytes of additional main memory, a floating point accelerator, a laser 
printer, and all the options needed to perform image processing functions. 
The new equipment was to be integrated into the existing computer 
network as shown in Figure 2. Both the VAX 11/785 and the Symbolics 
3645 were to be connected to the existing ethernet which then connected 
the MV10000 and the VAX 11/780. Since the new VAX 11/785 would also 
run UNIX and since UNIX network software was to be purchased with the 
Symbolics system, the new equipment was to integrate immediately into 
the existing network, and would be accessible to all network users. 
Table 1 shows a summary of the total cost of the proposed upgrade, 
including the cost of maintenance for the contract duration. The total cost 
of the research equipment was $503,864. Of this, $125,966 was provided as 
matching by Georgia Tech. Hence, the total funding requested was $377,898. 
Item No.  Specification                      Amount ($)
1         Cost of VAX 11/785 System             371,980
2         Cost of Symbolics 3645 System         173,495
3         One Year Maintenance                   34,016
          Subtotal                              579,491
1         Discounts on VAX 11/785                55,017
2         Discounts on Symbolics 3645            20,610
          Subtotal                              503,864
3         Georgia Tech Cost Sharing             125,966
          Total Cost                            377,898
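
The arithmetic behind Table 1 can be checked with a short script. This is only a sketch: the dollar figures are copied from the table, while the variable names and the grouping into list price, discounts, and cost sharing are illustrative assumptions about how the subtotals were formed.

```python
# Figures copied from Table 1 (US dollars); names are illustrative only.
costs = {
    "VAX 11/785 system": 371_980,
    "Symbolics 3645 system": 173_495,
    "One year maintenance": 34_016,
}
discounts = {
    "VAX 11/785": 55_017,
    "Symbolics 3645": 20_610,
}
cost_sharing = 125_966  # Georgia Tech matching funds

# First subtotal: undiscounted price of everything requested.
list_price = sum(costs.values())

# Second subtotal: equipment cost after vendor discounts.
equipment_cost = list_price - sum(discounts.values())

# Funding requested from the sponsor after Georgia Tech cost sharing.
funding_requested = equipment_cost - cost_sharing

print(f"Subtotal before discounts: {list_price:,}")
print(f"Subtotal after discounts:  {equipment_cost:,}")
print(f"Funding requested:         {funding_requested:,}")
```

Under this reading, the first subtotal is the undiscounted price, the second is the equipment cost quoted in the text, and the final line is the amount requested from the sponsor.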




























































Figure 2: Proposed DSP Computer Network. 
4 The Funding Constraints 
The equipment which was actually purchased and installed differed 
substantially from that described in the original proposal. There were three 
basic reasons for these variations. First, and most important, the proposal 
was not fully funded. In particular, only $144,000 was made available from 
ARO. When combined with the matching money from Georgia Tech, a total 
of $180,000 was available for the upgrade. Thus, it was simply not possi-
ble to acquire all of the equipment in the original proposal. Second, the 
money did not become available until around April, 1987. By that time, 
the cost and features of commercially available equipment had evolved con-
siderably. Thus, better equipment choices were available than when the 
proposal was originally written. Finally, the nature of the OSCAR project 
had also evolved considerably. The original multiprocessor was fully de-
signed, but due to funding problems at DARPA, was never built. Instead, 
a different OSCAR was designed and built. 
The basis of the new OSCAR design is the AT&T WE-DSP32 float-
ing point signal processor. The WE-DSP32 has a Control Arithmetic 
Unit (CAU) for 16-bit integer and control functions, and a Data Arith-
metic Unit (DAU) for 32-bit floating-point operations. The CAU is capa-
ble of 4 MIPS and the DAU can perform up to four million floating-point 
multiply-accumulates per second, or 8 Mflops. The device has internal 
memory consisting of 4 kbytes RAM and 2 kbytes ROM. The 100-pin version 
of the WE-DSP32 can access up to 56 kbytes of off-chip RAM with a 32-bit 
data bus. Powerful I/O capabilities are provided by the serial (SIO) and 
parallel (PIO) ports; data can be transferred through both of these either 
under program control or by direct memory access, which is transparent to 
any application running on the WE-DSP32. The serial port permits data 
exchange with either an external device or another WE-DSP32, while the 
parallel port permits a simple interface with a controlling microprocessor. 
The parallel port contains a number of registers which are directly addressable on 
the PIO bus. The host microcomputer can examine or change WE-DSP32 
memory locations by placing the appropriate address in the PIO Address 
Register (PAR) and reading or writing the contents of the PIO Data Register 
(PDR). If DMA and auto-increment are enabled, a contiguous block of 
memory may be accessed by specifying only the starting address and then 
doing repeated reads or writes. 
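
The block access pattern described above can be illustrated with a toy software model. This is a sketch under stated assumptions, not the actual WE-DSP32 register map: the class `PioPort`, its method names, and the increment step of one word per access are all hypothetical and serve only to show why a single starting address suffices when auto-increment is enabled.

```python
class PioPort:
    """Toy model of host access through an address register (PAR) and a
    data register (PDR). Hypothetical; not the real device interface."""

    def __init__(self, memory, auto_increment=True, step=1):
        self.memory = memory              # list standing in for DSP memory
        self.auto_increment = auto_increment
        self.step = step                  # assumed increment per access
        self.par = 0                      # current address register value

    def write_par(self, address):
        self.par = address

    def read_pdr(self):
        value = self.memory[self.par]
        self._advance()
        return value

    def write_pdr(self, value):
        self.memory[self.par] = value
        self._advance()

    def _advance(self):
        # With auto-increment on, each access moves PAR to the next word.
        if self.auto_increment:
            self.par += self.step


def download_block(port, start, words):
    """Write a contiguous block, giving only the starting address."""
    port.write_par(start)
    for word in words:
        port.write_pdr(word)


def read_block(port, start, count):
    """Read a contiguous block, giving only the starting address."""
    port.write_par(start)
    return [port.read_pdr() for _ in range(count)]


memory = [0] * 64
port = PioPort(memory)
download_block(port, 8, [10, 20, 30])
readback = read_block(port, 8, 3)
print(readback)
```

Without auto-increment, the host in this model would have to rewrite the address register before every access, which is the overhead the mechanism avoids.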
The host microcomputer has access to all of the WE-DSP32's memory 
through the parallel port, which permits examination and modification of 
memory locations and even downloading of new programs while the proces-
sor is running. However, although the DMA is transparent to the processor, 
it does incur "wait states" spanning several instruction cycles during which 
program execution is suspended. 
The WE-DSP32's instruction cycle time is 250 ns; memory cycle time is 
125 ns. A 160 ns version of the WE-DSP32 will be available in early 1987. 
The memory space of the WE-DSP32 is divided into two banks which permit 
interleaved accesses, such that four memory accesses may occur in each 
instruction cycle if bank accesses are alternated. Each instruction cycle 
consists of four states. On the address and data buses the four states 
are instruction fetch, one memory write and two memory reads. If two 
consecutive accesses are made to the same bank, a wait state lasting for 
one quarter of a cycle is automatically generated. 
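
A rough timing model of the bank interleaving just described is sketched below. It assumes, as a simplification not stated in the report, that every access occupies one quarter of the 250 ns instruction cycle and that each pair of consecutive accesses to the same bank adds exactly one quarter-cycle wait state. The function names are illustrative.

```python
INSTRUCTION_CYCLE_NS = 250.0
QUARTER_CYCLE_NS = INSTRUCTION_CYCLE_NS / 4  # one access state


def count_wait_states(banks):
    """Count consecutive access pairs that hit the same memory bank."""
    return sum(1 for prev, cur in zip(banks, banks[1:]) if prev == cur)


def access_time_ns(banks):
    """Total time for a sequence of accesses, including wait states.

    Simplifying assumption: one quarter cycle per access plus one
    quarter cycle per same-bank conflict.
    """
    return (len(banks) + count_wait_states(banks)) * QUARTER_CYCLE_NS


alternating = [0, 1, 0, 1]   # banks alternate: no conflicts
same_bank = [0, 0, 0, 0]     # every access hits bank 0

print(access_time_ns(alternating))
print(access_time_ns(same_bank))
```

In this model the alternating pattern fits the four accesses into a single instruction cycle, while the same-bank pattern incurs three wait states and takes 437.5 ns.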
The multiprocessor architecture must permit as much interconnection as 
possible between processing elements subject to the constraints of cost, size 
and feasibility of physical interconnection of the large number of 50-bit wide 
buses (32 bits for data, 14 bits for address and four bits for control signals). 
It should provide a means for data to pass between constituent processors 
as freely as possible and with as little delay as possible. To ensure that a 
finite number of cyclo-static solutions exist for any given solvable problem, 
the data path delays between pairs of processors and between local and 
external memory for any processor must be equal and constant. 
The architecture chosen consists of a number of motherboards to each 
of which are connected up to five constituent processor boards. The moth-
erboard consists of five banks of four IDT7130-70 dual-port memory chips, 
giving 1k x 32 bits of dual-port RAM per bank, plus address decoding logic 
and connector banks. Each bank of dual-port memory forms a commu-
nication path between two constituent processor boards. The dual-port 
memory, while not permitting simultaneous access to a memory location 
from both ports, is fast enough to permit two memory cycles during each 
processor cycle; this ensures that an operand stored by one processing el-
ement may be accessed by another during the same processor cycle. Each 
processor board also contains a bank of 1k x 32 bits of dual-port mem-
ory, which is used to form a communication path with another processor 
board. Thus each constituent processor board can communicate directly 
with up to four neighbors - two through the motherboard, one through its 
own dual-port memory and one through that neighbor's dual-port mem-
ory. This is illustrated in Figure 3. Thus a single motherboard with five 
constituent processor boards forms a fully connected synchronous multipro-
cessor. A large variety of architectures can be formed using any number of 
constituent processor boards. A simple synchronous ring may be formed 
simply by chaining processor boards together through their dual-port mem-
ories; more complex structures may be assembled by combining any number 
of constituent processor boards and motherboards with less than full con-
nectivity. 
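
The claim that one motherboard with five constituent processor boards is fully connected can be checked by counting links. The sketch below assumes a particular wiring that the report does not spell out: motherboard bank i joins boards i and i+1 (a ring), and the dual-port memory on board i joins it to board i+2. Any assignment that gives each board two motherboard links and two distinct on-board links would serve equally well.

```python
from itertools import combinations

N_BOARDS = 5

# Assumed wiring: five motherboard banks form a ring of neighbours.
motherboard_links = {frozenset((i, (i + 1) % N_BOARDS)) for i in range(N_BOARDS)}

# Assumed wiring: each board's own dual-port memory reaches the board
# two positions away, so it is also reached by the board two behind it.
onboard_links = {frozenset((i, (i + 2) % N_BOARDS)) for i in range(N_BOARDS)}

links = motherboard_links | onboard_links
all_pairs = {frozenset(pair) for pair in combinations(range(N_BOARDS), 2)}

fully_connected = links == all_pairs
neighbours = {
    board: sorted(other for link in links if board in link
                  for other in link if other != board)
    for board in range(N_BOARDS)
}

print(fully_connected)
print(neighbours)
```

Five motherboard banks plus five on-board memories give ten links, which is exactly the number of distinct pairs among five boards, so each board reaches its four neighbours directly.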
Figure 4 shows a block diagram of a constituent processor board. Each 
constituent processor board may be equipped with up to three WE-DSP32s 
- one 100-pin device which handles the external memory interface, and two 
40-pin devices. Each device has its own interface to the host microcomputer 
(through the parallel ports) and the devices are chained together through 
their serial ports. Thus each processor board is itself an asynchronous 
multiprocessor, capable of up to 24 Mflops; with the 160 ns version of the 
WE-DSP32, each board is capable of up to 37.5 Mflops. Each processor 
board is equipped with 8k x 32 bits of local memory, plus either a dual-port 
memory bank for use in the multiprocessor or another bank of 8k x 32 if 
the board is to be used as a stand-alone processor. 
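
The per-board throughput figures follow from the instruction cycle time. The sketch below assumes one multiply-accumulate (two floating-point operations) per instruction cycle per device and three devices per board, which is consistent with the 8 Mflops per device quoted earlier; the function name and parameters are illustrative.

```python
def board_mflops(cycle_ns, devices=3, flops_per_cycle=2):
    """Peak Mflops of one processor board.

    Assumes each device completes one multiply-accumulate
    (two floating-point operations) per instruction cycle.
    """
    cycles_per_second = 1e9 / cycle_ns
    return devices * flops_per_cycle * cycles_per_second / 1e6


print(board_mflops(250))  # 250 ns parts
print(board_mflops(160))  # 160 ns parts
```

With 250 ns parts this gives 24 Mflops per board, and with 160 ns parts 37.5 Mflops, matching the figures in the text.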
One constituent processor board is designated to supply the master clock, and 
all others are clocked from that board's clock signal. Each constituent 
processor board will look in the appropriate bank of dual-port memory to 
access data generated by another processor board; the compiler is designed 
to ensure that no processor looks for data before it has been generated 
and stored by the appropriate processor board. Each constituent processor 
































(Diagram of the multiprocessor communications architecture: a multiprocessor motherboard carrying five constituent processor boards, C.P.B.) 
Figure 3: OSCAR-32 Communications Board Layout. 
(Block diagram of the constituent processor board. Recoverable labels: sample clock, serial data in, serial data out, data and address buses, 8k x 32 local memory.) 
Figure 4: OSCAR-32 Constituent Processor. 
memory (which will be available to itself and one other processor board) or 
in all blocks of dual-port memory to which it has access, effectively making 
data available to all its neighbors simultaneously. 
5 The Equipment Upgrade 
With the reduced funding, it was not possible to upgrade the laboratory 
equipment as proposed in the original proposal. However, the goals of the 
upgrade remained essentially the same as in the original proposal. Stated 
briefly, these were a fundamental increase in the computational capacity 
of the laboratory network, the integration of the OSCAR multiprocessor 
into the overall laboratory environment, and the creation of a broadly based 
LISP capability. Because of the reduced funding, it was decided that it was 
impractical to purchase a dedicated LISP machine. 
The compromise which was realized is shown in Figure 5, which shows 
the laboratory network after the upgrade. The laboratory upgrade consists 
entirely of the addition of a Multiflow Trace Seven mini-supercomputer. 
As is shown in Figure 5, the Trace is integrated into the network on the 
TCP/IP Ethernet. The Trace computer had four specific advantages for the 
laboratory upgrade. First, it is an extremely powerful computational 
processor whose arithmetic capacity is easily accessed by its users. It can 
hence become both the primary computational resource for the network and 
the control/communications processor for the OSCAR. Second, the Trace 
runs standard UNIX, and its multiprocessor capacity is easily and 
automatically accessed through standard FORTRAN and C programs. Third, its 
high-speed VME bus and its network interconnection make it a very flexible 
environment in which to integrate the OSCAR. Finally, because it was a 
newly introduced processor, the Multiflow company was willing to offer it 
to Georgia Tech at an extremely attractive price. 
The equipment purchased is shown below in Table 2. The entire $180,000 
was spent on this equipment. The equipment is fully delivered and installed, 
and has been in operation since October, 1987. The Trace has proven to 







































































Figure 5: Final DSP Computer Network. 





Item  Qty  Vendor     Model      Description
1     1    Multiflow  7/200B-AB  7/200 CPU and Cabinet
                                 32 Megabytes Main Memory
                                 VME I/O Processor
                                 Disk Controller
                                 1.1 Gigabyte disk drive
                                 Cartridge Tape Drive
                                 16-line Asynchronous Multiplexor
                                 Video Console Terminal
                                 4.3 BSD based Unix Operating System
2     1    Multiflow  MU200-CG   32-64 Megabyte Memory Upgrade
3     1    Multiflow  EC100-AA   Ethernet Controller
4     1    Multiflow  SW110-AA   Trace Scheduling Fortran Compiler
5     1    Multiflow  SW120-AA   Trace Scheduling C Compiler
Table 2: Equipment in the Multiflow Trace. 
