Array processor architecture connection network by Shafer, Philip E. et al.
United States Patent [19]
Barnes et al.
[11] 4,365,292
[45] Dec. 21, 1982
[54] ARRAY PROCESSOR ARCHITECTURE CONNECTION NETWORK
[75] Inventors: George H. Barnes; Stephen F. Lundstrom, both of Wayne; Philip E. Shafer, Holmes, all of Pa.
[73] Assignee: Burroughs Corporation, Detroit, Mich.
[21] Appl. No.: 97,419
[22] Filed: Nov. 26, 1979
[51] Int. Cl.3 .............................................. G06F 13/06
[52] U.S. Cl. .................................................... 364/200
[58] Field of Search ... 364/200 MS File, 900 MS File; 301/241, 242, 243
[56] References Cited
U.S. PATENT DOCUMENTS 
3,200,380 8/1965 MacDonald et al. ............... 364/200 
3,226,689 12/1965 Amdahl et al. ..................... 364/200 
3,401,380 9/1968 Bell et al. ............................ 364/200 
3,537,074 10/1970 Stokes et al. ........................ 364/200 
3,593,302 7/1971 Saito et al. .......................... 364/200 
4,044,333 8/1977 Auspurg et al. .................... 364/200 
4,051,551 9/1977 Lawrie et al. ....................... 364/200 
4,099,235 7/1978 Hoschler et al. ................... 364/200 
4,101,960 7/1978 Stokes et al. ........................ 364/200 
OTHER PUBLICATIONS 
Lawrie, D. H., “Access and Alignment of Data in an 
Array Processor”, IEEE Trans. on Computers, vol. 
C-24, 1975, pp. 1145-1155. 
Primary Examiner-Gareth D. Shaw 
Assistant Examiner-Thomas M. Heckler 
[57] ABSTRACT 
A connection network is disclosed for use between a 
parallel array of processors and a parallel array of mem- 
ory modules for establishing non-conflicting data com- 
munications paths between requested memory modules 
and requesting processors. The connection network 
includes a plurality of switching elements interposed 
between the processor array and the memory modules 
array in an Omega networking architecture. Each 
switching element includes a first and a second proces- 
sor side port, a first and a second memory module side 
port, and control logic circuitry for providing data 
connections between the first and second processor 
ports and the first and second memory module ports. 
The control logic circuitry includes strobe logic for 
examining data arriving at the first and the second pro- 
cessor ports to indicate when the data arriving is re- 
questing data from a requesting processor to a requested 
memory module. Further, connection circuitry is asso- 
ciated with the strobe logic for examining requesting 
data arriving at the first and the second processor ports 
for providing a data connection therefrom to the first 
and the second memory module ports in response 
thereto when the data connection so provided does not 
conflict with a pre-established data connection cur- 
rently in use. 
5 Claims, 9 Drawing Figures 
https://ntrs.nasa.gov/search.jsp?R=20080004202 2019-08-30T02:24:35+00:00Z
U.S. Patent Dec. 21, 1982 Sheet 1 of 6 4,365,292
[FIG. 1: drawing not reproduced. Surviving labels: DATA BASE MEMORY, DBM CONTROLLER, EXTENDED MEMORY (EM0, EM1, ...), CONNECTION NETWORK (CN), COORDINATOR (CR), PROC. 0 through PROC. 511 (each with CN BUFF, EU and PM), TO/FROM FILE MEMORY, TO/FROM SUPPORT PROCESSOR SYSTEM.]
U.S. Patent Dec. 21, 1982 Sheet 2 of 6 4,365,292
[FIG. 2 and FIG. 3: drawings not reproduced. Surviving labels: PROCESSOR, COORDINATOR (CR), ENABLE, GO, CONNECTION NETWORK BUFFER (CNB), CONNECTION, MEMORY, TO/FROM SUPPORT PROCESSOR SYSTEM.]
U.S. Patent Dec. 21, 1982 Sheet 3 of 6 4,365,292
[FIG. 4 and FIG. 5: drawings not reproduced. Surviving labels: COMMUNICATION REGISTER, I/O BUFFER AND DECODER, DBMC, MEMORY STORAGE UNIT, MAR, BUFFER, BYTE-SERIAL DATA FROM CN, DBM CONTROLLER.]
U.S. Patent Dec. 21, 1982 Sheet 4 of 6 4,365,292
[Drawing not reproduced. Surviving labels: processor-side ports PP0 through PP15 and E.M.-side ports EMP0 through EMP15.]
U.S. Patent Dec. 21, 1982 Sheet 5 of 6 4,365,292
[FIG. 9 and other drawings not reproduced. Surviving labels: TO/FROM FILE MEMORY, 64K WD MEMORY, DATA BUS (55 BITS), TO/FROM EXTENDED MEMORY MODULE.]
U.S. Patent Dec. 21, 1982 Sheet 6 of 6 4,365,292
[Drawing not reproduced; no legible labels survive in the extracted text.]
ARRAY PROCESSOR ARCHITECTURE CONNECTION NETWORK

The invention described herein was made in the performance of work under NASA Contract No. NAS 2-9897 and is subject to the provisions of Section 305 of the National Aeronautics and Space Act of 1958 (72 Stat. 435, 42 U.S.C. 2457).

BACKGROUND AND OBJECTS OF THE INVENTION

This invention relates generally to connection networks for interconnecting an array of data processors with an array of memory modules. More particularly, this invention relates to a connection network of the Omega gender wherein a connection between a requesting processor and a requested memory module is made substantially instantaneously as the requested memory module's address flows through the connection network from the requesting processor to the requested memory module.

In the development of digital computers a most important design goal has always been to maximize their operating speed, i.e., the amount of data that can be processed in a unit of time. It has become increasingly apparent in recent times that two important limiting conditions exist within the present framework of computer design. These are the limits of component speed and of serial machine organization. To overstep these limitations two different types of parallel operating systems have been developed.

First, multiprocessing systems have been developed wherein a number of quite independent processors have been linked together to operate in parallel on differing portions of a program or job in order to speed execution of that program or job. Frequently, the processors are linked together in a network loop or similar fashion, thus greatly slowing the cooperation between processors. When the processors are linked together by a parallel and much faster network such as a crossbar network, the network control mechanism and the cost and reliability of the network quickly become unwieldy for a reasonably large number of processors.

Second, high speed parallel locked-step processing systems have been developed providing an array of processing elements under the control of a single control unit.

As speed requirements of computation have continued to increase, systems employing greater numbers of parallel memory modules have been developed. One such system has in the order of 64 parallel memories, see U.S. Pat. No. 3,537,074, issued Oct. 27, 1970, to R. A. Stokes et al, and assigned to the assignee of the present invention. However, parallel processors have not been without their own problems.

Primarily, parallel processors are often so far removed from the conventional scalar processors that they are hard to program. Secondarily, parallel processors are fashioned to operate efficiently with vectorized data but are quite inefficient operating upon scalar data. Finally, parallel processors, being found operating in locked-step fashion in prior art, force all processors in the parallel array thereof to perform in synchronization whether or not such operation is needed in all processors.

The manner of difficulty in programming the parallel array has been greatly eased by the incorporation and use of the computational envelope approach as disclosed in U.S. Pat. No. 4,101,960, issued July 18, 1978, in the name of Stokes et al, and assigned to the assignee of the present invention. Briefly, in the computational envelope approach a host or support processor of a general processing variety (such as a Burroughs B7800) functions as an I/O controller and user interface. Special purpose jobs are transferred in their entirety (program and data) to a large high speed secondary storage system or data base memory and from hence to the array memory modules and array processor for processing. During the special purpose processing period the front end support or host processor is freed for other processing jobs. Once the complete special purpose job or task is executed by the array processors, the resultants therefrom are returned through the array memories and the data base memory to the support processor for output to the user.

It is an object of the present invention to provide a fast, efficient connection network of the Omega gender for use between an array of requesting processors and an array of requested memory modules.

It is another object of the present invention to provide a plurality of switching elements cooperating in an Omega gender connection network, each element therein strobing requesting data flowing thereto to provide substantially instantaneously a proper non-conflicting connection path therethrough.

SUMMARY OF THE INVENTION

In carrying out the above and other objects of this invention there is provided a connection network for use between an array of requesting processors and an array of requested memory modules. The network comprises an array of switching elements cooperating in an Omega networking architecture. Each switching element includes a first and a second processor side port, a first and a second memory module side port, and control logic circuitry for providing data connections between the first and the second processor ports and the first and the second memory module ports. The control logic circuitry includes strobe logic for examining data arriving at the first and the second processor ports to indicate when the data arriving is requesting data from a requesting processor to a requested memory module. Further, connection circuitry is associated with the strobe logic for examining requesting data arriving at the first and the second processor ports for providing a data connection therefrom to the first and the second memory module ports in response thereto when the data connection so provided does not conflict with a pre-established data connection currently in use.

Various other objects and advantages and features of this invention will become more fully apparent in the following specification with its appended claims and accompanying drawings wherein:

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a block diagram showing the environmental 
architecture of the present invention; 
FIG.  2 is a diagram of the major component parts of 
a processor used in a processing array in the present 
invention; 
FIG.  3 depicts the arrangement of a coordinator in 
the architecture of the present invention; 
FIG. 4 is a detailed diagram of the coordinator of 
FIG.  3; 
FIG. 5 is a diagram of the major component parts of a memory module used in an extended memory module array in the present invention;
FIG. 6 is a diagram of a partial portion of the Omega-type connection network used to interpose the processors of FIG. 2 and the memory modules of FIG. 5;
FIG. 7 is a logic diagram of a 2×2 crossbar switching element used in the Omega network of FIG. 6;
FIG. 8 is a circuit diagram of a control logic circuit used in the crossbar switching element of FIG. 7; and
FIG. 9 is a logic diagram of a data base memory used to interface in a computational envelope architectural manner the parallel and multiprocessing array of the present invention and the support processor which provides programs, data and I/O communication with the user.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The present connection network invention resides in a parallel data processing environment. The parallel data processing environment, see FIG. 1, comprises five major component elements; namely, the processor array 11, the extended memory module array 13, the connection network 15 interconnecting the processor array 11 with the extended memory module array 13, the data base memory 17 with data base memory controller 19 for staging jobs to be scheduled and for high-speed input/output buffering of jobs in execution, and the coordinator 21 used to synchronize the processor array 11 and coordinate data flow through the connection network 15.

In operation, all data and program for a run is first loaded into the data base memory 17 prior to the beginning of the run. This loading is initiated by a Support Processor System (not shown) which functions as a user and input/output interface to transfer under control of a user data and programs from a secondary storage file memory (not shown). The use of such a Support Processor System is detailed in the above-cited U.S. Pat. Nos. 3,537,074 and 4,101,960.

As the run is initiated, the data base memory controller 19 transfers code files from the data base memory 17 to the extended memory module array 13. Then the data base memory controller 19 transfers the processor array 11 code files thereto and the necessary job data to the extended memory module array 13.

Once initiated the present invention is capable of parallel execution in a manner similar to the lock-step array machines disclosed in U.S. Pat. Nos. 3,537,074 and 4,101,960. Simple programs (having a copy thereof resident in each processor of the processor array 11), with no data-dependent branching, can be so executed. However, the present invention is not limited to parallel mode operation since it can also function in the manner of a conventional multiprocessor. Thus, as will be detailed hereinafter, the present invention performs essentially just as efficiently whether the data is arranged in the form of vectors or not. The processor array 11 can function in the lock-step fashion for vector processing and in independent scalar fashion for multiprocessing operation.

A simple but illustrative example of the operation of the present invention involves the following vector calculation:

1. A+B=C
2. If C greater than 0 do subroutine W to calculate Z
3. If C equals 0 do subroutine X to calculate Z
4. If C less than 0 do subroutine Y to calculate Z
5. Calculate D = A divided by Z

With vector notation it is appreciated that A represents the elements a_i from a_1 to a_n wherein n equals the number of elements in the vector. The same relationship holds for vectors B, C, D and Z. Also it is appreciated that the elements of vectors are individually stored in memory modules of the extended memory module array 13 and that the elements are fetched therefrom and operated thereupon individually by the individual processors of the processor array 11.

The elements of the vectors are loaded into the memory modules according to some particular mapping scheme. The simplest loading scheme would be to load the first vector element into the first memory module, the second vector element into the second memory module, etc. However, such a simple mapping does not lead to efficient parallel processing for many vector operations. Hence, more complex mapping schemes have been developed such as disclosed in U.S. Pat. No. 4,051,551, entitled “Multidimensional Parallel Access Computer Memory System”, issued Sept. 27, 1977 in the names of Lawrie et al, and assigned to the assignee of the present invention. The mapping scheme disclosed therein is incorporated in the “Scientific Processor” of U.S. Pat. No. 4,101,960, issued July 18, 1978, in the name of Stokes et al and assigned to the assignee of the present invention.

The actual mapping scheme selected is relatively unimportant to the basic operation of the present invention. It is important however, that the rule of the mapping scheme be stored in processor array 11 so that each processor therein can calculate by that rule where the vector element is stored that must be fetched. For example, if the first processor is always to fetch from the first memory module 13, and the second processor from the second memory module 13, etc. the instruction stored in each processor would express “fetch element specified from memory i” where “i” would be the processor 29 number. As will be detailed later, each processor 29 has wired in, preferably in binary format, its own processor number and each memory module likewise. It will also be appreciated that each stored element is identified as being stored in a particular memory module 13 at a particular storage location therein. The exact storage location is a direct function of the mapping used to store the element. This mapping is known and stored as a subroutine in each processor 11 to determine from whence it is to fetch its particular element.

As will also be detailed later, the connection network 15 can be set to “broadcast” to all processors 11 simultaneously. Thus in one transference it can store a copy of mapping subroutines and other program instructions in each processor 29.

Following loading of the program instructions the execution of the problem or job to be solved begins. In the example above, each processor 29 fetches its element a_i of vector A and its element b_i of vector B to calculate by addition its element c_i of vector C. Since the processors are all doing the same thing, the above fetching and calculating occurs early in parallel.

However, each processor is now storing a value c_i which is either greater than, equal to, or less than zero. It performs the appropriate subroutine to calculate its element z_i of Z. Thus in sharp contrast to prior art locked-step processors, the processors 29 of the present invention are not all proceeding on the same branch instruction simultaneously.

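The five-step vector calculation can be sketched in Python as follows. This is only an illustration: the bodies of subroutines W, X and Y are not given in the specification, so the three functions below are hypothetical stand-ins, and the list comprehension merely simulates what each processor would do independently on its own elements. Coordinator synchronization is not modeled.

```python
from typing import List

# Hypothetical stand-ins: the specification does not define W, X or Y.
def subroutine_w(c: float) -> float:   # chosen when c > 0
    return c

def subroutine_x(c: float) -> float:   # chosen when c == 0
    return 1.0

def subroutine_y(c: float) -> float:   # chosen when c < 0
    return -c

def processor_step(a_i: float, b_i: float) -> float:
    """Work done by one processor on its own elements a_i and b_i."""
    c_i = a_i + b_i                    # step 1: C = A + B
    if c_i > 0:                        # steps 2-4: data-dependent branch
        z_i = subroutine_w(c_i)
    elif c_i == 0:
        z_i = subroutine_x(c_i)
    else:
        z_i = subroutine_y(c_i)
    return a_i / z_i                   # step 5: D = A / Z

def run_vector_job(a: List[float], b: List[float]) -> List[float]:
    # Each list position plays the role of one processor of the array.
    return [processor_step(a_i, b_i) for a_i, b_i in zip(a, b)]
```

Each position takes its own branch, which is the behavior the text contrasts with prior locked-step arrays.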
Assuming now that the entire vector Z must be determined before the next step can be executed, each processor calculating a vector element z_i issues a flag indicating “I got here” or that it has successfully completed its last instruction, that of calculating z_i. That processor 29 is halted. The Coordinator 21 monitors all processors 29 that are calculating a vector element z_i. When all such processors 29 issue their “I got here” flag, the Coordinator 21 issues a “GO” instruction and all processors 29 then begin the next instruction (in the present example, that of calculating vector element d_i of D). Thus, the processors 29 can function independently and effectively but still be locked in parallel by a single instruction when such parallel operation is called for.

Following calculation of the elements d_i of the vector D, the vector D is fed back through the connection network 15 to the extended memory module 13 and from hence through the data base memory 17 to the user.

The above simplified illustrative example of the operation of the present invention presented an overview of the structure and function of the present invention. A fuller understanding may be derived from a closer examination of its component parts.

The processor array 11 consists of 512 independent like processors 29 denoted as Processor 0 through Processor 511 in FIG. 1. Each processor 29 includes a connection network buffer 23, an execution unit 25 and a local processor memory 27. The number of processors 29 in the processor array 11 is selected in view of the power of each processor 29 and the complexity of the overall tasks being executed by the processor array 11. Alternate embodiments may be fabricated employing more or less than 512 processors 29 having less or more processing capability than detailed below.

PROCESSOR

In the array of 512 (516 counting spares) processors 29, each processor is identical to all others except for an internal identifying number which is hardwired into each processor (by 10 lines representing in binary code the number of the processor) in the preferred embodiment but may be entered by firmware or software in alternate embodiments. It is of importance only that when data is to flow to or from processor number 014 (for example) that a single processor 29 is identified as being number 014.

Each processor 29 is in itself a conventional FORTRAN processor functioning to execute integer and floating-point operations on data variables. Each processor 29, see FIG. 2, is partitioned into three parts; namely, the Execution Unit 25, the Processor Memory 27, and the Connection Network Buffer 23.

The Execution Unit 25 is the logic part of the Processor 29. The Execution Unit 25 executes code contained in its Processor Memory 27, fetches and stores data through the Connection Network 15 via its Connection Network Buffer 23. As will be detailed hereinafter, the Execution Unit 25 also accepts commands from the Coordinator 21 via the Command Line 31 and synchronization via the Synchronization Line 33. The Execution Unit 25 executes instructions stored in its Processor Memory 27 by addressing same through Address Path 37 to fetch instruction code on Fetch Line Path 39 or to store instruction code on Store Path 41. Likewise, data is transferred through the connection network 15 via the connection network buffer 23 through the Address Path 43, the store data path 45, and the fetch data path 47.

Associated with the execution unit 25 is an enable flip-flop 34 and an I-got-here flip-flop 36. A flag bit programmed into the code executed by the execution unit 25 indicates when the execution unit 25 is executing an instruction or task which must be concluded by itself and all other like execution units 25 working on the same data before all execution units 25 proceed with further instructions. This flag bit sets enable flip-flop 34 thereby raising the enable output line 44 thereof and deactivating the not-enable line 38 thereof which is fed to the coordinator 21 to indicate which processors 29 are not-enabled. When the instruction or task is completed by the execution unit 25 (i.e., the calculation of “Z” in the example above), a bit is sent out to set the “I-got-here” flip-flop 36 which raises its I-got-here output line 40 to the coordinator 21. The coordinator 21 thereupon issues a command via its command and synchronization lines 31 and 33 to halt the processor 29 until the coordinator 21 receives an I-got-here signal from all enabled processors 29. Then a “GO” signal is issued from the coordinator 21 through line 42 and AND gate 44 to reset the I-got-here flip-flop 36 and the enable flip-flop 34. The Coordinator 21 also releases its halt commands through lines 31 and 33 and all processors 29 begin in parallel the execution of the next task or instruction which usually involves the fetching of freshly created or modified data from the extended memory array 13 through the connection network 15.

The Processor Memory 27 holds instructions for execution by the Execution Unit 25 and the data to be fetched in response to the instructions held. Preferably, the Processor Memory 27 is sufficient for storage of over 32,000 words of 48 bits plus 7 bits of error code each. Data, address and control communications is solely with the Execution Unit 25.

The Connection Network Buffer 23 functions as an asynchronous interface with the Connection Network 15 to decouple the Processor 29 from the access delays of the Connection Network 15 and the Extended Memory Module Array 13 (FIG. 1). The Connection Network Buffer 23 communicates basically three items; the number of a particular Memory Module in the Extended Memory Module Array 13 with which communication is desired plus the operation code (i.e., fetch, store, etc.), the address within the particular Memory Module, and one word of data. The presence of an Extended Memory Module number in the Connection Network Buffer 23 functions as a request for that Extended Memory Module. An “acknowledge” signal is returned via the Connection Network 15 from the Extended Memory Module selected to indicate the success of the request.

Each and every Connection Network Buffer 23 is clock-synchronized with the Connection Network 15 to eliminate time races therethrough.

COORDINATOR

The Coordinator 21, as seen in FIG. 3, in essence performs two major functions: first, it communicates with the Support Processor primarily to load jobs into and results out of the data base memory 19; second, it communicates with the processors 29 in order to provide synchronization and control when all processors 29 are to operate in parallel on a particular piece of data or at a particular step in a program.

The coordinator’s function in communicating with 
the Support Processor is relatively simple under the 
computational envelope approach such as detailed, for 
example, in U.S. Pat. No. 4,101,960. Under the compu- 
tational envelope approach all data and programs for at 
least one job is transferred from the Support Processor 
to the Data Base Memory 17. Thereafter, the Support 
Processor is freed. Program and data are transferred to 
the Extended Memory Module Array 13, and eventu- 
ally to  the Processor Array 11 for processing. Upon 
completion of the processing, results are communicated 
back to the Data Base Memory 17 and finally back to 
the Support Processor. 
Communication is maintained between the Coordina- 
tor 21 and the Support Processor via a Communication 
Register 49 and with the Data Base Memory Controller 
19 through an I/O Buffer and Decoder 51, see FIG. 4. 
A Connection Network Buffer 23 identical in struc- 
ture and function to  the Connection Network Buffer 23 
of each Processor 29 (see FIG. 2) permits the Coordina- 
tor 21 to  communicate with the Extended Memory 
Module Array 13 through the Connection Network 15 
as easily as if it were a Processor 29. Likewise, a Con- 
nection Network Port 53 permits the Coordinator 21 to 
communicate with the Processors 29 through the Con- 
nection Network 15 as easily as if it were a memory 
module in the Extended Memory Module Array 13. 
Other important links to the Coordinator 21 from the 
Processors 29 are via the “I got here” lines 40 and the 
NOT Enabled lines 38 from each Processor 29. By 
ORing the lines 38 and 40 individually for each Proces- 
sor 29 in an OR circuit 59 and summing all OR circuit 
59 output lines 61 through an AND circuit 63 an output 
65 is obtained which signifies that every enabled Pro- 
cessor 29 has finished its current processing task and 
raised its “I got here” line 40. The output line 65 is fed 
through Control Logic 67 to issue a “GO” signal on GO 
Line 42 to release all enabled Processors 29 and allow 
them to continue processing in parallel. The control 
logic 67 also provides on line 33 the synchronization for 
the Processors 29 to  provide proper timing between the 
Processors 29 and the Connection Network 15. Further, 
the Control Logic 67 provides standard communication 
control on a Communication Bus 73 between the Com- 
munication Register 49, I/O Buffer and Decoder 51, 
Connection Network Port 53, and the Connection 
Network Buffer 23. 
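The OR/AND arrangement that produces output 65 can be written as a short Boolean sketch. The function name and the list-of-booleans representation are assumptions made for illustration; the specification describes hardware gates, not software.

```python
from typing import Sequence

def go_signal(not_enabled: Sequence[bool], got_here: Sequence[bool]) -> bool:
    """Model of OR circuits 59 feeding AND circuit 63.

    not_enabled[i] models NOT Enabled line 38 of processor i.
    got_here[i]    models "I got here" line 40 of processor i.
    """
    # One OR circuit 59 per processor: the processor either is not taking
    # part in the synchronized step or has already finished it.
    per_processor = [ne or gh for ne, gh in zip(not_enabled, got_here)]
    # AND circuit 63: output 65 is true only when every OR output is true.
    return all(per_processor)
```

A processor that is not enabled never holds up the others, because its NOT Enabled line alone satisfies its OR circuit. Note that in this sketch the result is also true when no processor is enabled at all; how Control Logic 67 treats that case is not stated in the text.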
EXTENDED MEMORY MODULE 
The Extended Memory Module 13 is the “main” 
memory of the present invention in that it holds the data 
base for the program during program execution. Tem- 
porary variables, or work space, can be held in either 
the Extended Memory Module 13 or the Processor 
Memory 27 (see FIG. 4), as appropriate to the problem. 
All I/O to and from the present invention is to and from 
the Extended Memory Module 13 via the Data Base 
Memory 19 (see FIG. 1). Control of the Extended 
Memory Module 13 is from two sources; the first being 
instructions transmitted over the Connection Network 
15 and the second being from the Data Base Memory 
Controller 19 (see FIG. 1) which handles the transfers 
between the Data Base Memory 19 and the Extended 
Memory Module 13. 
In the preferred embodiment of the present invention 
there are 521 individual memory modules in the Ex- 
tended Memory Module 13. The number 521 is chosen 
because it is a prime number larger than the number 
(512) of Processors 29. The combination of 521 Memory 
Modules 13 with 512 Processors 29 facilitates vector 
processing as detailed in U.S. Pat. Nos. 4,051,551 and 
4,101,960. 
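One way to see why a prime module count helps is the sketch below. It assumes the simplest interleaved rule, module = address mod 521; this rule is an assumption for illustration only, since the actual mapping is the one disclosed in the cited patents. Under that assumption, 512 elements fetched at any fixed stride that is not a multiple of 521 fall in 512 different modules, whereas a power-of-two module count can send every element to the same module.

```python
NUM_MODULES = 521      # prime number of memory modules, from the text
NUM_PROCESSORS = 512   # number of processors, from the text

def module_for(address: int, num_modules: int = NUM_MODULES) -> int:
    # Assumed mapping rule (not taken from the patent): simple interleaving.
    return address % num_modules

def modules_touched(base: int, stride: int,
                    count: int = NUM_PROCESSORS,
                    num_modules: int = NUM_MODULES) -> set:
    """Set of modules hit when `count` processors fetch elements
    base, base + stride, base + 2*stride, ..."""
    return {module_for(base + k * stride, num_modules) for k in range(count)}
```

For example, `len(modules_touched(0, 512))` is 512 with 521 modules, while `len(modules_touched(0, 512, num_modules=512))` is 1.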
Each memory module 13 is identical to all others 
except that it has its own module number (i.e., 0-520) 
associated with it, preferably in a hardwired binary 
coded form. The purpose of having a memory module 
13 numbered is to provide identification for addressing. 
Storage locations within each memory module 13 are 
accessed by the Memory Module number and storage 
locations within that memory module comprising in 
essence the total address. 
Each memory module in the Extended Memory 
Module 13 is conventional in that it includes basic stor- 
age and buffering for addressing and data, see FIG. 5. 
Basic storage is provided in the preferred embodiment 
by a Memory Storage Unit 73 sufficient to store 64,000 
words each having 55 bits (48 data bits and 7 checking 
bits). High speed solid state storage is preferred and 
may be implemented by paralleling four 16K RAM 
memory chips. 
Standard address registers are also provided; a first 
Memory Address Register 75 for addressing from the 
Data Base Memory 17 and a second Memory Address 
Register 77 for addressing from the Processors 29 via 
the Connection Network 15. Data buffering is provided 
by a one-word buffer 79 for data communication with 
the Data Base Memory 17 and a parallel-to-byte-serial 
buffer 81 for communication through the Connection 
Network 15. Byte communication rather than word 
communication is handled through the Connection Net- 
work 15 to minimize the number of data paths and 
switching paths required therethrough. Alternate em- 
bodiments may range, for example, from bit communica- 
tion, which is simple but slow, to word communication, 
which is relatively faster but likewise quite expensive 
and massive in hardware implementation. 
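The parallel-to-byte-serial conversion can be illustrated with a small sketch. The command descriptions in this section refer to 11-bit frames, and a 55-bit word divides into exactly five such frames. The frame order (most significant frame first) and the function names are assumptions; the text does not state them.

```python
WORD_BITS = 55                               # 48 data bits + 7 checking bits
FRAME_BITS = 11                              # frame width named in the command list
FRAMES_PER_WORD = WORD_BITS // FRAME_BITS    # 5 frames per word

def word_to_frames(word: int) -> list:
    """Split a 55-bit word into five 11-bit frames (assumed MSB-first order)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 55 bits")
    mask = (1 << FRAME_BITS) - 1
    return [(word >> (FRAME_BITS * i)) & mask
            for i in reversed(range(FRAMES_PER_WORD))]

def frames_to_word(frames: list) -> int:
    """Reassemble the word on the receiving side."""
    word = 0
    for frame in frames:
        word = (word << FRAME_BITS) | frame
    return word
```

Sending five narrow frames instead of one 55-bit word is the trade described above: fewer data and switching paths through the network at the cost of more transfer steps per word.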
Communication through the connection network 15 
and the Extended Memory Module 13 is straightfor- 
ward. A strobe signal and accompanying address field 
indicates the arrival of a request for a particular Ex- 
tended Memory Module by number. The requested 
number is compared to the actual memory module num- 
ber (preferably hardwired in binary coded format) and 
a true comparison initiates an “acknowledge” bit to be 
sent back to the requesting Processor 29 and to lock up 
the Connection Network 15 path therebetween. 
As will be detailed hereinafter, following the strobe, 
and accompanying the address field, will be any one of 
four different commands, namely: 
(1) STOREM. Data will follow the address; keep up 
the acknowledge until the last character of data has 
arrived. The timing is fixed; the data item will be 
just one word long. 
(2) LOADEM. Access memory at the address given, 
sending the data back through the Connection 
Network 15, meanwhile keeping the “acknowl- 
edge” bit up until the last 11 bit frame has been 
sent. 
(3) LOCKEM. Same as LOADEM except that following
the access of data, a ONE will be written
into the least significant bit of the word. If the bit was
ZERO, the pertinent check bits must also be complemented
to keep the checking code correct. The
old copy is sent back over the Connection Network 15.
(4) FETCHEM. Same as LOADEM except that the
“acknowledge” is dropped as soon as possible. The
Coordinator 21 has sent this code to imply that
it will switch the Connection Network 15 to broadcast
mode for the accessed data. The data is then
sent into the Connection Network 15, which has
been set to broadcast mode by the Coordinator 21,
and will go to all processors 29.
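The four commands can be illustrated with a small behavioral model. The following Python sketch is not taken from the disclosure: the class name ExtendedMemoryModule, the method request, and the dictionary used for storage are assumptions made for illustration, and the check bits, frame timing and broadcast switching are deliberately left out. It shows the module number comparison that produces the “acknowledge” and the way LOCKEM returns the old word while writing a ONE into its least significant bit.

```python
from dataclasses import dataclass, field

WORD_BITS = 55  # word width stated in the text; check bits are not modeled


@dataclass
class ExtendedMemoryModule:
    """Hypothetical behavioral model of one module (names are illustrative)."""
    number: int                      # the hardwired module number
    cells: dict = field(default_factory=dict)

    def request(self, module_number, command, address, data=None):
        """Return (acknowledged, word_sent_back) for one strobed request."""
        # A request is acknowledged only when the requested number matches.
        if module_number != self.number:
            return (False, None)
        if command == "STOREM":
            self.cells[address] = data & ((1 << WORD_BITS) - 1)
            return (True, None)
        if command not in ("LOADEM", "LOCKEM", "FETCHEM"):
            raise ValueError(f"unknown command: {command}")
        old = self.cells.get(address, 0)
        if command == "LOCKEM":
            # Write a ONE into the least significant bit; the old copy goes back.
            self.cells[address] = old | 1
        # FETCHEM differs from LOADEM only in acknowledge timing and in the
        # broadcast set up by the Coordinator, neither of which is modeled here.
        return (True, old)
```

Under this model a requester that receives an old word whose least significant bit is ZERO knows that it was the one that set the bit, which resembles a test-and-set primitive; that reading is an interpretation, not a statement from the text.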
CONNECTION NETWORK 
The Connection Network 15 has two modes of operation.
In a special purpose mode detailed hereinafter the
Coordinator 21 may use the Connection Network 15 to
perform special tasks. In the special mode, a typical
operation for the Connection Network 15 is the “Broadcast”
operation wherein, under command from the
Coordinator 21, a word of data is “broadcast” to all
processors 29 from either the Coordinator 21 or a selected
particular Extended Memory Module 13.
In the normal mode of operation a “request strobe” 
establishes a two-way connection between the request- 
ing processor 29 and the requested Extended Memory 
Module 13. The establishment of the connection is ac- 
knowledged by the requested Extended Memory Mod- 
ule 13. The “acknowledge” is transmitted to the re- 
quester. The release of the connection is initiated by the 
Extended Memory Module 13. Only one request arrives
at a time to a given Extended Memory Module 13. The
Connection Network 15, not the Extended Memory
Module 13, resolves conflicting responses.
With reference to FIGS. 1 and 6, the Connection
Network 15 appears to be a dial-up network with up to
512 callers, the processors 29, possibly dialing at once.
There are 512 processor ports (only 16 shown,
PP0-PP15), 521 Extended Memory Ports (only 16
shown, EMP0-EMP15), and two Coordinator ports
(see FIG. 4): one, the Connection Network Port 53,
functioning as an Extended Memory Port, and the other,
the Connection Network Buffer 23, functioning as a
processor port.
With reference now to FIG. 6, it can be seen that the 
Connection Network 15 is a standard Omega Network 
comprised of a plurality of switching elements 83 
wherein each switching element 83 is in essence a 
two-by-two crossbar network. 
Addressing is provided by the requester and is decoded
one bit at a time on the fly by the Connection
Network 15. Consider, for example, that processor port
PP10 desires communication with Extended Memory
Port EMP11. The processor port PP10 transmits the
Extended Memory Port EMP11 number in binary form
(1011). Each switching element 83 encountered examines
one bit in order from the most significant bit to the
least. Thus switch element 83a examines a binary one
and therefore outputs on its lower (reference FIG. 6)
line 85. Switch element 83b examines a binary zero and
therefore outputs on its top line 87. Switch elements 83c
and 83d both examine binary ones, leading to a final
output to EMP11. For EMP11 to communicate back to
PP10, a binary representation of ten (1010) is transmitted,
thereby causing, in the above-described manner,
communications to be established through switch elements 83d,
c, b, and a to PP10.
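The bit-at-a-time decoding described above can be sketched in a few lines of Python. The sketch assumes the textbook Omega wiring, in which a perfect shuffle of the line numbers precedes each stage of two-by-two switches; the exact wiring of FIG. 6 is not reproduced here, and the function name route and its parameters are illustrative only.

```python
def route(source: int, dest: int, n_bits: int = 4):
    """Trace a request through an assumed perfect-shuffle Omega network.

    Returns the exit chosen at each stage ("upper" or "lower") and the
    port number reached after the last stage.
    """
    mask = (1 << n_bits) - 1
    pos = source
    exits = []
    for stage in range(n_bits):
        # Each stage examines one destination bit, most significant first.
        bit = (dest >> (n_bits - 1 - stage)) & 1
        # Perfect shuffle (rotate left), then the switch replaces the low
        # bit of the line number with the examined bit: 0 = upper, 1 = lower.
        pos = ((pos << 1) & mask) | bit
        exits.append("lower" if bit else "upper")
    return exits, pos
```

For the example in the text, route(10, 11) examines the bits 1, 0, 1, 1 in turn and so takes the lower, upper, lower and lower exits before arriving at port 11. Because every stage discards one bit of the source number and shifts in one bit of the destination number, the final port depends only on the destination.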
For a special or “broadcast” mode of operation it can
be seen from FIG. 6 that if all switch elements 83 were
to establish dual communication paths therethrough,
communication could be established between any one
Extended Memory Port EMP0 through EMP15 and all
processor ports PP0 through PP15. Likewise, communication
can be broadcast from any one of the processor
ports PP0 through PP15 to all of the Extended Memory
Ports EMP0 through EMP15.
Although only 16 processor ports (PP0-PP15) and 16
Extended Memory Ports (EMP0-EMP15) are shown in
FIG. 6, the Omega type Connection Network 15 is
readily expandable to handle any number of processor
and memory ports.
Each switch element 83 has an upper or first processor
side port 84, a lower or second processor side port
86, an upper or first extended memory side port 88 and
a lower or second extended memory side port 90. Further,
each switch element 83 includes a plurality of
AND logic gates 89, a plurality of OR logic gates 91,
and a control logic circuit 93 (see FIG. 7). The control
logic examines one bit of the data flowing to the switch
element 83 to control the passage of data therethrough
in accord with the above-described operation.
The control logic circuit 93 generates control signals
E1, E2, E3 and E4 to control the flow of data through
the switch element 83. The control logic circuit is fed
by two bits each from the upper and lower processor
ports 84 and 86. The two bits from the upper processor
port 84 are inputted on line 92. One of the bits is a strobe
signal indicating that an addressing request is passing
through the switch element 83 and the other bit indicates
whether the request is to exit through upper port
88 or lower port 90. As will be detailed, the control
logic circuit 93 recognizes the strobe bit and honors the
exit request if the requested exit port is free. The control
logic circuit 93 will also keep the requested path
through the switch element 83 open or locked long
enough for an “acknowledge” signal to return from the
Extended Memory Module 13 indicating a successful
path connection through the entire Connection Network
15. The “acknowledge” signal will keep the path
connection locked for a time sufficient to pass the desired
data therethrough. If no “acknowledge” signal is returned
within a time sufficient for the request to travel
through the Connection Network 15 and the “acknowledge”
signal to be echoed back, then the control logic circuit
93 will release the requested path through the switch
element 83.
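The request, hold and release behavior just described can be modeled as a small state machine. The Python sketch below is a simplification under stated assumptions: the class SwitchElement, its method names, and the two window lengths STROBE_WINDOW and LATCH_WINDOW are hypothetical, time is counted in abstract ticks, and the actual gates of FIGS. 7 and 8 are not represented.

```python
from dataclasses import dataclass, field

STROBE_WINDOW = 8  # hypothetical: ticks allowed for an acknowledge to return
LATCH_WINDOW = 5   # hypothetical: ticks needed to pass the data once latched


@dataclass
class SwitchElement:
    """Illustrative model of one two-by-two switch element's path control."""
    # Maps a busy exit port to [requesting input port, ticks remaining].
    held: dict = field(default_factory=dict)

    def request(self, in_port: str, exit_port: str) -> bool:
        """Honor a strobed request only if the requested exit port is free."""
        if exit_port in self.held:
            return False
        self.held[exit_port] = [in_port, STROBE_WINDOW]
        return True

    def acknowledge(self, exit_port: str) -> bool:
        """An acknowledge arriving in time latches the path for the data."""
        if exit_port not in self.held:
            return False
        self.held[exit_port][1] = LATCH_WINDOW
        return True

    def tick(self) -> None:
        """Advance time; release any path whose window has run out."""
        for port in list(self.held):
            self.held[port][1] -= 1
            if self.held[port][1] <= 0:
                del self.held[port]
```

In this model a second request for a busy exit port is simply refused, and a path that never receives an acknowledge is released after STROBE_WINDOW ticks, which mirrors the timeout described in the paragraph above.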
The control logic circuit 93 receives the two bits
above-described from input line 92, and a similar two
bits from the lower processor port are inputted on line
94. The “acknowledge” bit arriving through the upper
Extended Memory Port 88 is inputted on line 96 while
the “acknowledge” bit arriving through the lower
Extended Memory Port 90 is inputted through line 98.
Two commands from the Coordinator 21 are received
on line 100. Although the control logic circuit 93 is
shown in more detail in FIG. 8, it is appreciated that
many alternative embodiments could be fashioned to
fulfill the function of the control logic circuit 93 as
above-described. The control logic circuit 93 includes
in the FIG. 8 embodiment thereof four identical input
AND gates 102a through 102d for summing the above-described
strobe and exit port request bits. Two inverter
gates 104a and 104b are provided to complement the
exit port request bits. Four strobe circuits 106a through
106d are provided. Each strobe circuit 106 when triggered
remains “ON” for a period of time sufficient for
an “acknowledge” signal to arrive back if a successful
path is completed through the entire Connection Network
15. Each strobe circuit 106 feeds through an OR
gate 108 shown individually as OR gates 108a through
108d to produce an energizing signal E identified individually
as E1 through E4 to open and hold open the
request path through the switching element 83 (see
FIG. 7).
Single “acknowledge” bits are sent back on lines 96
and 98 and are combined through AND gates 110a
through 110d with the outputs of the strobe circuits
106a through 106d as shown in FIG. 8 to initiate a latch
circuit 112 shown individually as latch circuits 112a
through 112d to “latch” or keep locked the requested
path through the switching element 83 for a period
sufficient to pass at least an entire data word of 55 bits
therethrough, 11 bits at a time. It is realized that the
strobe 106 and latch 112 circuits may be fashioned as
monostable multivibrators or other delay or timing
devices depending on the length of time (i.e., how many
nanoseconds) the strobe or latch is required to be “ON”.
The length of time required is dependent upon the type
of circuit elements chosen, the size and the clocking
speed of the Connection Network 15.
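Since a 55-bit word crosses the network 11 bits at a time, the latch must hold the path for five frames. The following Python sketch illustrates that arithmetic; the function names to_frames and from_frames are hypothetical, and sending the most significant frame first is an assumption, since the text does not state the frame order.

```python
WORD_BITS = 55    # data word width given in the text
FRAME_BITS = 11   # bits passed through the network at a time
FRAMES_PER_WORD = WORD_BITS // FRAME_BITS  # five frames per word


def to_frames(word: int) -> list:
    """Split a 55-bit word into five 11-bit frames (assumed MSB-first order)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 55 bits")
    mask = (1 << FRAME_BITS) - 1
    return [(word >> (FRAME_BITS * i)) & mask
            for i in reversed(range(FRAMES_PER_WORD))]


def from_frames(frames: list) -> int:
    """Reassemble the word from frames received in the same order."""
    word = 0
    for frame in frames:
        word = (word << FRAME_BITS) | frame
    return word
```

A receiver that applies from_frames to the output of to_frames recovers the original word, so the latch period only needs to cover the five frame times plus whatever propagation delay the chosen circuit elements introduce.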
In an alternate or special mode of operation a two-bit
signal is received from the Coordinator 21 on line 100
and processed through exclusive OR circuit 114 and
OR gates 108a through 108d to open all paths through
the switching element 83 to provide for a “broadcast”
mode wherein any one particular processor 29 may “talk
to” all of the memory modules in the Extended Memory
Module Array 13 and wherein any one memory module
in the Extended Memory Module Array 13 may load
each processor 29 in the Processor Array 11.
DATA BASE MEMORY 
Referring again to FIG. 1, the Data Base Memory 17 
is the window in the computational envelope approach 
of the present invention. All jobs to be run on the pres- 
ent invention are staged into the Data Base Memory 17. 
All output from the present invention is staged back 
through the Data Base Memory 17. Additionally, the 
Data Base Memory 17 is used as back-up storage for the 
Extended Memory Module 13 for those problems 
whose data base is larger than the storage capacity of
the Extended Memory Module 13. Control of the Data
Base Memory 17 is from the Data Base Memory Controller
19, which accepts commands both from the Coordinator
21 for transfers between the Data Base Memory
17 and the Extended Memory 13, and from the
Support Processor System (not shown) for transfers
between the Data Base Memory 17 and the File Memory
(not shown).
In the preferred embodiment of the invention, see
FIG. 9, a general CCD (charge-coupled device) array
101 is used as the primary storage area, and two data
block size buffer memories 103 and 105 of 64K word
capacity each are used for interfacing to the secondary
storage file memory. Experience in large data array
systems and scientific array processors indicates that
about 99% of the traffic between the Data Base Memory
17 and the file memory is generally simple large data
block transfers of program and data. To provide for
high volume, high speed transfers, four data channels
107, 109, 111 and 113 are provided.
The buffer memories 103 and 105 are connected to
the CCD array 101 through a data bus 115, preferably
55 bits wide, and a data register 117 of 440 bits width.
The data bus 115 feeds directly to the Extended Memory
Modules 13 with no additional buffering required
except for the one-word (55 bit) I/O Buffer 51 (see FIG.
4) provided with each Extended Memory Module 13.
DATA BASE MEMORY CONTROLLER 
The Data Base Memory Controller 19 interfaces two
environments: the present invention internal environment
and the file memory environment, since the Data
Base Memory 17 is the window in the computational
envelope. The Data Base Memory 17 allocation is under
the control of the file memory function of the Support
Processor. The Data Base Memory Controller 19 has a
table of that allocation, which allows the Data Base
Memory Controller 19 to convert names of files into
Data Base Memory 17 addresses. When the file has been
opened by a present invention program it is programmed
as far as allocation is concerned, and remains
resident in Data Base Memory 17 until it is either closed
or abandoned. For open files, the Data Base Memory
Controller 19 accepts descriptors from the Coordinator
21 which call for transfers between Data Base Memory
17 and Extended Memory Modules 13. These descriptors
contain absolute Extended Memory Module 13
addresses but actual file names and record numbers for
the Data Base Memory 17 contents.
Operation is as follows. When a task for the present 
invention has been requested, the Support Processor 
passes the names of the files needed to start that task. In 
some cases existing files are copied into newly named 
files for the task. When all files have been moved into 
the Data Base Memory 17, the task starts in the present 
invention. When the task in the present invention opens
any of these files, the allocation will be frozen within
the Data Base Memory 17. It is expected that “typical”
task execution will start by opening all necessary files.
During the running of a present invention task, other
file operations may be requested by the user program on
the present invention, such as creating new files and
closing files.
Extended Memory Module 13 space is allocated ei- 
ther at compile time or dynamically during the run. In 
either case, Extended Memory Module 13 addresses are 
known to the user program. Data Base Memory 17 
space, on the other hand, is allocated by a file manager, 
which gives a map of Data Base Memory 17 space to
the Data Base Memory Controller 19. In asking the
Data Base Memory Controller 19 to pass a certain
amount of data from Data Base Memory 17 to Extended
Memory Module 13, the Coordinator 21, as part of the 
user program, issues a descriptor to the Data Base 
Memory Controller 19 which contains the name of the 
Data Base Memory 17 area, the absolute address of the 
Extended Memory Module 13 area, and the size. The 
Data Base Memory Controller 19 changes the name to 
an address in Data Base Memory 17. If that name does 
not correspond to an address in Data Base Memory 17, 
an interrupt goes back to the Coordinator 21, together 
with a result descriptor describing the status of the 
failed attempt. 
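The descriptor handling described above amounts to a table lookup with a failure path. The Python sketch below illustrates it; the Descriptor fields, the DataBaseMemoryController class, and the dictionary returned as a result descriptor are assumptions made for illustration and do not reflect any format given in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical transfer descriptor issued by the Coordinator."""
    file_name: str     # name of the Data Base Memory area
    emm_address: int   # absolute Extended Memory Module address
    size: int          # amount of data to pass


class DataBaseMemoryController:
    """Illustrative model: converts file names into Data Base Memory addresses."""

    def __init__(self, allocation_table: dict):
        # Map of file name to Data Base Memory address, as supplied by the
        # file manager.
        self.allocation_table = dict(allocation_table)

    def transfer(self, descriptor: Descriptor) -> dict:
        base = self.allocation_table.get(descriptor.file_name)
        if base is None:
            # The name has no address: interrupt the Coordinator and return
            # a result descriptor describing the failed attempt.
            return {"interrupt": True, "status": "name not found",
                    "descriptor": descriptor}
        return {"interrupt": False, "dbm_address": base,
                "emm_address": descriptor.emm_address,
                "size": descriptor.size}
```

Keeping names on the Data Base Memory side and absolute addresses on the Extended Memory side reflects the division of labor in the text: the user program knows its Extended Memory addresses, while only the file manager knows where a file sits in Data Base Memory.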
Not all files will wait to the end of a present invention
run to be unloaded. For example, the number of snapshot
dumps required may be data dependent, so it may
be preferable to create a new file for each one and to
close the file containing a snapshot dump so that the
File Manager can unload it from Data Base Memory 17.
When the present invention task terminates normally,
all files that should be saved are closed.
Although the present invention has been described
with reference to its preferred embodiment, it is under- 
stood by those skilled in the art that many modifica- 
tions, variations, and additions may be made to the 
description thereof. For example, the number of proces- 
sors or memory modules in the arrays thereof may be 
increased as specific processing and storage require- 
ments may dictate. Also, although the Connection Net- 
work is described as an Omega network it is clear that 
any network having local mode control and the ability 
to decode path direction bits or flags on the fly may be 
used. Further, the Omega network may be doubled in 
size so as to minimize the effect of a single blocked path.
Routine mapping algorithms may interpose actual mem- 
ory module destinations and memory module port des- 
ignations if desired. Additional gating may be provided 
in each switching element of the Connection Network 
to allow for a “wrap-around” path whereby processors 
may communicate with each other as well as with mem- 
ory modules. The control from the Coordinator may be 
expanded so that there can be two separate broadcast 
modes; one to the processors, and the reverse to the 
memory modules. 
Further, although the present invention has been 
described with a crossbar network having each switch- 
ing element fashioned to examine all incoming data for 
a “strobe” or addressing bit, it is appreciated that once 
a “strobe” bit is detected and an “acknowledge” bit 
returned that logic could be provided to free up the bit
position of the “strobe” bit for other purposes during 
the period when an acknowledged latch was present 
and data was being transferred through the switching 
element from a processor to a memory module. For 
example, the freed-up strobe bit position could be used 
for a parity bit for the data being transferred. Other like 
changes and modifications can also be envisioned by 
those skilled in the art without departing from the sense 
and scope of the present invention. 
Thus, while the present invention has been described 
with a certain degree of particularity, it should be un- 
derstood that the present disclosure has been made by 
way of example and that changes in the combination 
and arrangement of parts obvious to one skilled in the 
art, may be resorted to without departing from the 
scope and spirit of the invention. 
What is claimed is: 
1. A connection network for use between a parallel 
array of processors and a parallel array of memory 
modules for establishing non-conflicting data communi- 
cations paths between requested memory modules and 
requesting processors, said connection network com- 
prising: 
a plurality of switching elements interposed between 
said parallel array of processors and said parallel 
array of memory modules in an Omega networking 
architecture wherein each switching element in- 
cludes: 
a first and a second processor side port; 
a first and a second memory module side port; 
control logic means for providing data connections 
between said first and second processor ports and 
said first and second memory module ports, said 
control logic means including: 
strobe means for examining data arriving at said first 
and second processor side ports to indicate when 
said arriving data describes a request for connec- 
tion from a requesting processor to a requested 
memory module, each said request for connection 
from a requesting processor of said parallel array of 
processors being independent as to synchronism
with requests for connection from other requesting 
processors of said last mentioned array; and 
connection means associated with said strobe means 
for examining requesting data arriving at said first 
and said second processor ports for providing a 
data connection therefrom to said first and said 
second memory module ports in response thereto 
when said data connection so provided does not 
conflict with a pre-established data connection 
currently in use whereby non-conflicting data com- 
munications paths are established between request- 
ing processors and requested memory modules. 
2. The connection network according to claim 1
wherein said connection means includes:
means for maintaining said data connection provided
by said connection means for a period of time sufficient
for data to flow therefrom to a requested
memory module and an acknowledge signal to be
echoed back from said requested memory module.
3. The connection network according to claim 2
wherein said connection means further includes:
means for maintaining said data connection provided
by said connection means for a period of time sufficient
for a plurality of bytes of data to flow therethrough
upon detection of said acknowledge signal
echoed back from said requested memory module.
4. The connection network according to claim 1 or
claim 2 or claim 3 wherein said data connection provided
by said connection means is a byte wide data
connection.
5. The connection network according to claim 1 or
claim 2 or claim 3 wherein said data connection provided
by said connection means is a twelve-bit wide
data connection.
* * * * *
