Calculation of timing in the map-200 and map-300 array processors computers, 1981 by Hutchinson, Fredrick B. (Author) & Houston, Johnny L. (Degree supervisor)
CALCULATION OF TIMING IN THE MAP-200 
AND MAP-300 ARRAY PROCESSORS COMPUTERS 
A THESIS 
SUBMITTED TO THE FACULTY OF ATLANTA UNIVERSITY 
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS 
FOR THE DEGREE OF MASTER OF SCIENCE 
BY 
FREDRICK BRYAN HUTCHINSON 
DEPARTMENT OF MATHEMATICAL SCIENCES 
ATLANTA, GEORGIA 
MAY 1981 
ABSTRACT 
COMPUTER SCIENCE 
HUTCHINSON, FREDRICK B. B.S. CLAFLIN COLLEGE, 1979 
CALCULATION OF TIMING IN THE MAP-200 AND MAP-300 ARRAY 
PROCESSOR COMPUTERS. 
Advisor: Dr. Johnny L. Houston 
Thesis dated May 1981 
The primary intent of this thesis is to calculate the timing of software 
routines developed for the MAP-200 and MAP-300 Array Processor Computers. 
The routines were developed for the expansion of a Real-Time Hybrid Computer 
System (RTHCS) which is being upgraded by the Flight Dynamics Laboratory 
(AFWAL/FI) of the Air Force Wright Aeronautical Laboratory (AFWAL) at Wright 
Patterson Air Force Base (WPAFB). 
ACKNOWLEDGEMENTS 
In appreciation for their patience and the many hours of hard work, I would 
like to thank Drs. Johnny L. Houston and Benjamin F. Martin, Ms. Marie Reed, Ms. 
G. Baker-Roberts and the entire Department of Mathematical Sciences for the 
help they have so willingly given me; and most of all I would like to thank God. 
TABLE OF CONTENTS
Acknowledgements
List of Figures
List of Tables
Glossary
I. Introduction
   A. Array Processors
   B. The MAP-200 and MAP-300 Array Processors
   C. SEL-MAP Mini Vector Processors
II. Architecture and Basic Features of the MAP-200 and MAP-300 Array Processors
   A. Architecture and Basic Features
   B. Shared Memory SEL-MAP Vector Processor Architecture
III. Timing of Calculations in the MAP-200 and MAP-300 Processors
   A. Units of Measure for Generic Instructions
   B. Options for Program Efficiencies
   C. Optimizing of Options for Program Efficiencies
IV. Mathematical Expressions for Which Computer Routines are Developed
   A. Numerical Integration Expressions
   B. Coordinate Transformation Expressions
V. Time Analysis of the Mathematical Equations Above
Footnotes
Bibliography
LIST OF FIGURES
Figure
1 Traditional array processor architecture
2 Representation of the 32-bit floating point
3 Architectural diagram of the MAP-200 and MAP-300 array processors
4 Arithmetic processor program example
5 Programming efficiencies
6 Optimized program efficiencies
7 Flowchart of the routines
8 Functions of each routine
9 Routines and equations for Runge-Kutta, 4th Order, evaluations
10 Illustration of the final computation of the execution time of the Runge-Kutta routines
LIST OF TABLES
Table
1 Basic Features of the Multi-Arithmetic Array Processors
Glossary
1. Array Processor - a system composed of a set of identical CPUs acting
synchronously under control of a common broadcasting unit.
2. A/D - converts analog information to digital.
3. D/A - forms basic links between the world of "real" phenomena, where the
variables are generally continuous analog quantities, and the "engineer
designed" world of digital information processing and data communications,
where the variables are discrete quantized quantities.
4. CRT - (Cathode Ray Tube) used to display characters on a video screen.
5. I/O Processor - controls I/O devices and handles the movement of data
between the memory and the I/O devices.
6. Megabyte - one million bytes.
7. Memory - the functional portion of a computer that stores data.
8. Minicomputer - a computer whose cost and size are smaller than those of
medium and large-scale computers.
9. MOS - acronym for Metal Oxide Semiconductor.
10. Multibus - allows complete overlap of data input and processing of earlier
data.
11. Multiplexed - the technique in which one set of wires is time-shared for
two or more different functions.
12. Pipelining - speeds up an Arithmetic-Logic Unit (ALU) and depends on
instruction look ahead to provide several instructions to work on at the same
time.
13. Transducers (Xducers) - hardware devices that change information from one
physical form to another and hence serve as communication links between
the computer and its environment.
14. μsec - microsecond.
I. INTRODUCTION 
A. Array Processors 
Small computers have never been able to operate fast enough for 
applications requiring complex calculations and high speed monitoring. In 
vibration analysis, for example, computers must analyze signals from several 
transducers and compare monitored signals to prerecorded patterns. A new kind 
of small computer called an Array Processor (AP) can do complex calculations 
several orders of magnitude faster than minicomputers. These computers collect
and calculate data at mainframe speeds to handle video inspection, structural 
analysis, and vibration testing at a fraction of mainframe cost. 
Most array processors are made up of several independent processors each 
optimized to do a small set of tasks. The processors exchange data and control 
signals through three memories and sets of memory data lines (buses). The 
multibus structure speeds operation by reducing the amount of time processors 
wait to exchange data. An illustration of both the traditional array processor 
architecture and an inside view of an array processor can be found in Fig. 1. 
Although array processors have been highly successful in speeding up host
computational tasks by factors of 100 or more, certain inefficiencies and
overhead penalties are incurred if care is not taken with the overall system
configuration. For example, conventional benchmarks such as memory access speed
and multiply times typically do not include data command initialization overhead,
and if not avoided, these components of the overall run time may constitute a
much more significant and serious burden than the array function arithmetic
computation time.
Fig. 1. Traditional array processor architecture. (cont.) Inside an array processor.
The following architectural features are desirable in array processors (AP):
* Multiple memory buses to allow complete overlap of data input and
processing of earlier data.
* Ability to mix memory speeds so that cost effective processing systems can
be configured.
* Multiple, parallel, asynchronous processors to handle individual I/O and
arithmetic tasks.
* High speed programmable I/O processors. This capability is extremely
important in many situations where a "snapshot" of a high speed event must
be captured and processed later at a more leisurely pace.
* An array processor software executive residing internally, rather than in the
host computer.
* Array processor resident function library routines.
* FORTRAN programmability. A higher level language such as FORTRAN
allows the array processor program to be initialized and defined in the host
program, and yet gives the user the ability to execute complex conditional
loops.
Array processors (AP) can be either synchronous or non-synchronous 
computers. Synchronous AP's have a single master clock signal that operates all 
processors within the AP at the same speed. Synchronous operation allows the 
entire AP to proceed through a program a single step at a time to find program 
errors. 
However, synchronous AP's often cannot be easily upgraded with additional 
circuits. For example, AP processing speed theoretically can be increased by 
adding memory or I/O processors. But in synchronous AP's adding circuits may 
increase electrical-lead capacitance or resistance, which in turn can slow signals 
and cause malfunctions. 
Nonsynchronous AP's are generally not affected by clock speed variation
because each processor operates from its own clock. Clock cycles in these
machines can range from 50 to 170 nanoseconds. Since each processor operates at
a different speed, processors must communicate with each other by means of
"hand shaking" signals that indicate the start and finish of each task. Here, a
processor receives no new task until it issues a "hand shake" signal indicating that
all previous tasks are complete. In nonsynchronous processing, additional
processors for I/O, faster calculations, and so forth can be added without regard
to effects on clock speed, since hand shake signals are unclocked.¹
B. The MAP-200 and MAP-300 Array Processors 
MAP is an acronym that stands for Macro Arithmetic Array Processors. The
MAP array processor product line, introduced by CSPI (Central System Processing
Incorporation) in 1974, represents a bold departure from traditional processor
architecture. Several hundred MAPs are now operating in the field in diverse and
demanding application areas. CSP Incorporated was formed in 1968 to meet the
growing need for high-speed array processing systems in signal processing, data
reduction and other computationally intensive fields.
The MAP Series (200, 300 and 6400) of array processors represents a
significant advance in the computation of interactive arithmetic. Interfaced to
virtually any mini or super-minicomputer, these floating point, FORTRAN IV
programmable co-processors improve computational speeds by several orders of
magnitude.
MAP offers unprecedented features and advantages: 
* Multi-bus architecture allows completely overlapped I/O, arithmetic
processing and control.
* Flexible non-interleaved memory structure permits direct addressing of
bytes, 16-bit half-words, 32-bit full-words, and 64-bit double precision
words.
* AP-resident executive for full task sequencing, multi-tasking memory
management, and coordination of processors.
* Peripheral device I/O processors relieve the host of data flow and
management tasks, and permit real-time data transfer at up to 30
megabytes per second.
* FORTRAN library provides over 300 signal processing and math functions as
well as support routines for intelligent I/O processors.
* Modularity allows field expansion of memory and upgrading of processors.
* Software transportability ensures compatibility throughout the 32-bit and
64-bit array processor product line.
* Hardware compatibility within the MAP product line: the MAP-200 and MAP-300
utilize compatible memories, power supplies, chassis, interfaces, and
peripherals, thus minimizing spare parts inventory and simplifying service
support.
The MAP-200 and MAP-300 series are 32-bit floating point array processors 
developed for numerically and I/O intensive signal and image processing 
applications (see Fig. 2). Table 1 is an illustration of the features of the MAP. 
* Host interface
MAP array processors have been successfully interfaced to most popular
16-bit and 32-bit mini and super-minicomputers.
Fig. 2. Representation of the 32-bit floating point.
Table 1. Basic Features of the Multi-Arithmetic Array Processors.

             MAP-200                  MAP-300                   MAP-6400
I/O Rates    up to 36 megabytes/sec   up to 36 megabytes/sec    up to 36 megabytes/sec
Operation    1 multiply and 2 adds    2 multiplies and 4 adds   1 multiply and 2 adds
             in 420 nsec              in 420 nsec               in 1 μsec

[A further row is only partly legible in the source: "10±76" appears under MAP-300 and MAP-6400.]

MAP's software system includes an extensive FORTRAN-callable array and
matrix processing library of over 300 signal processing and math routines, an
internal Executive (Snap II), and complete utilities which include Assemblers,
Simulators, Loaders, Debuggers, and Host Loadable Diagnostics.²
C. SEL-MAP Mini Vector Processors 
Systems Engineering Laboratories Inc. (SEL) came out with a family of mini
vector processors called the VPS 3200, 3300 and 6400. The machines are aimed at
scientific and engineering applications where intensive, interactive arithmetic is
required.
The VPS 3200 is based on the CSPI MAP-200 array processor, and has the
capability of performing one add and 1/3 of a multiplication in 140 ns. The VPS
3300 is based on the CSPI MAP-300, and has the capability of performing twice as
many operations at one time as the VPS 3200.
SEL offers a package system, which includes software and peripherals, based
on the VPS 6400. Hardware includes ½ M byte of MOS memory in the
minicomputer, 16 K words of program memory and 32 K words of data memory in
the array processor, an 80 M byte Control Data Corporation disk drive, a 75-ips
Pertec tape drive, a 600-lpm Dataproducts printer, a 300-cps printer and a
Hazeltine CRT console.
Software for the vector processors includes a multiuser multi-programming
Mapped Programming Executive (MPX-32), scientific run-time and vector
processing libraries, compilers, assemblers, loaders, debuggers and media
conversion utilities. MPX permits concurrent vector processing, multi-batch
scale processing and multi-terminal program development.
The VPS weds two corporate technologies. The bulk of the system consists
of a System 32/77 high speed 32-bit computer, a machine known for its mammoth
input-output rates in data reduction and control applications. Although wrapped
in SEL sheet metal, the other part of the VPS system will consist of a 32- or
64-bit vector and array processor from Computer Signal Processors Inc.³
II. Architecture and Basic Features of the MAP-200 and MAP-300 Array Processors
A. Architecture and Basic Features 
The MAP contains special purpose processors designed to operate in a
parallel manner to efficiently realize a given processing algorithm.
Synchronization is achieved through a unique queuing and flag mechanism to allow
the various processors to run at their maximum respective speeds. This
architecture also facilitates programming by dividing the overall control,
addressing, and arithmetic processing operations associated with the algorithm
into separate program modules (see Fig. 3).
The Central System Processor Unit (CSPU) acts as the MAP system
executive, and interprets the Function Control Block issued by the host computer
to the MAP. The appropriate instruction sequences and parameters are then
supplied to the Arithmetic Processor (AP) for the actual data processing. The AP
itself consists of two parallel functional elements: the Arithmetic Processor Unit
(APU), which performs the actual arithmetic operations, and the Arithmetic
Processing Addresser (APS), which produces the memory addresses for data to be
retrieved or stored while the APU is operating. Together they provide substantial
time savings by
permitting the arithmetic sequences to overlap the I/O operations rather than 










Fig. 3. Architectural diagram of the MAP-200 and MAP-300 array processors.
B. Shared Memory SEL-MAP Vector Processor Architecture 
The Vector Processor Unit (VPU) is an array processor capable of performing 
floating point calculations on large arrays of data streams. It operates as a 
processor to a 32 series computer system, and performs repetitive operations 
requiring a large number of summations and multiplications. 
The architecture of the Vector Processor Unit is such that executive 
(management), arithmetic addressing, and I/O operations are handled in parallel by 
separate processors within the Vector Processor. These processors communicate 
primarily by means of their commonly shared memory bus system. Thus, data I/O, 
data addressing and arithmetic computations are performed in parallel. This 




III. Timing of Calculations in the MAP-200 and MAP-300 Processors 
A. Units of Measure for Generic Instructions 
The effective time for a MAP-200 and a MAP-300 is 420 to 480 nanoseconds.
The nominal timing for the generic instructions is:
Multiply Instruction        480 nanoseconds
Add Instruction             240 nanoseconds
Move Instruction             80 nanoseconds
Control Instruction
   No Jump                   80 nanoseconds
   With Jump                240 nanoseconds (MAP-200)
                            160 nanoseconds (MAP-300)
Therefore, the execution time for any arithmetic program can be computed 
by determining the total number of nanoseconds for the various instructions in the 
main loop and multiplying by the number of data elements to be processed. The 
overhead instructions for initialization need only be counted once. It is therefore 
obvious that the overhead times can be amortized when the MAP is used to 
process relatively long data blocks (see Fig. 4). 
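The timing rule described here can be sketched in a few lines of Python. This is a minimal illustration, not part of the thesis routines: the names `NOMINAL_NS`, `JUMP_NS`, `block_time_ns` and `execution_time_ns` are hypothetical, and the instruction counts are read off the Fig. 4 example under the assumption that a move combined with a multiply or add in one statement costs nothing extra.

```python
# Nominal instruction times in nanoseconds, from the table of generic instructions.
NOMINAL_NS = {"MUL": 480, "ADD": 240, "MOV": 80, "CTRL": 80}
JUMP_NS = {"MAP-200": 240, "MAP-300": 160}


def block_time_ns(counts, model="MAP-200"):
    """Sum the nominal times of a block, given instruction counts by class."""
    total = 0
    for kind, count in counts.items():
        per_instruction = JUMP_NS[model] if kind == "JUMP" else NOMINAL_NS[kind]
        total += per_instruction * count
    return total


def execution_time_ns(overhead, loop, n_elements, model="MAP-200"):
    """One-time overhead plus the main-loop time multiplied by the element count."""
    return block_time_ns(overhead, model) + n_elements * block_time_ns(loop, model)


# Counts assumed from the Fig. 4 example (stand-alone moves only).
overhead = {"MOV": 3, "CTRL": 1}                   # three input moves and the final CLEAR
loop = {"MOV": 4, "MUL": 2, "ADD": 2, "JUMP": 1}   # main loop, one pass per element

for n in (1, 100, 10_000):
    total = execution_time_ns(overhead, loop, n)
    share = block_time_ns(overhead) / total
    print(f"N={n:>6}: {total:>10} ns, overhead share {share:.4%}")
```

The printed overhead share shrinks as N grows, which is the amortization effect the text describes for long data blocks.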
B. Options for Program Efficiencies 
There are some programming efficiencies which can be incorporated into MAP
programs to reduce the number of instructions required and/or the amount of time
required for execution. For instance, since a multiplication takes 480
nanoseconds, a transfer can be accomplished in 80 nanoseconds, and an addition
can be done in 240 nanoseconds, an average of one multiply and two adds
(executed in proper sequence and under the proper constraints) can be completed
within a single 480 nanosecond time frame.

Step  Description                    Statement             NSec
 1    M4 ← A                         MOV(IQA,M4)             80
 2    A4 ← C                         MOV(IQA,A4)             80
 3    M5 ← B                         MOV(IQA,M5)             80
 4    M0 ← U(k)                      MOV(IQA,M0)             80
 5    M1 ← V(k)                      MOV(IQA,M1)             80
 6                                   MUL(M0,M4)             480
 7                                   MOV(A0),MUL(M1,M5)     480
 8                                   MOV(P,A1)               80
 9    R ← (AU + C)                   ADD(A0,A4)             240
10    A3 ← R, R ← [(AU + C) + BV]    MOV(A3),ADD(A3,A1)     240
11    OQ ← R                         MOV(R,OQ)               80
12    If more outputs, jump to 4     JUMP(4,E0)             240
13    HALT                           CLEAR(RA)               80

Fig. 4. Arithmetic processor program example. In order to perform the vector
operation $Y_k = AU_k + BV_k + C$, where Y, U, and V are N element vectors,
this example might apply.
Note that in this example, instructions 1, 2, and 3 are one-time input overhead
and that instruction 13 is also overhead. Hence, the total arithmetic execution
time for an N element vector is
320 + N [2000] nanoseconds.
For instance, the sequence:
MUL (M1, M6)
ADD (A1, A3)
ADD (A2, A4)
MOV (P, A5)
will execute in a total of 560 nanoseconds because the arithmetic unit will decode
and execute the ADD instructions while the multiplier is operating on M1 and M6.
However, the sequence:
MUL (M1, M6)
MOV (P, A5)
ADD (A1, A3)
ADD (A2, A4)
requires 1040 nanoseconds because the MOV instruction cannot be executed until
the P register has received the results of the MUL instruction.
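The two totals can be reproduced with a small timing model. The sketch below is an assumption-laden simplification, not a description of the actual MAP decoder: it supposes that MUL and ADD are dispatched instantly to their own units when the unit is free, that a MOV holds the decoder for its full 80 nanoseconds, that MUL writes the P register and ADD writes an R register, and that instructions are handled strictly in program order. The function name `sequence_time_ns` is hypothetical.

```python
MUL_NS, ADD_NS, MOV_NS = 480, 240, 80


def sequence_time_ns(program):
    """Return the completion time of an in-order instruction sequence.

    program is a list of (op, sources, destination) tuples.
    """
    decoder_free = 0                    # time at which the decoder can take the next instruction
    unit_free = {"MUL": 0, "ADD": 0}    # time at which each arithmetic unit becomes idle
    ready = {}                          # register name -> time its value is available
    finish = 0
    for op, sources, dest in program:
        operands_ready = max((ready.get(reg, 0) for reg in sources), default=0)
        if op == "MOV":
            start = max(decoder_free, operands_ready)
            end = start + MOV_NS
            decoder_free = end          # assumed: a transfer occupies the decoder
        else:
            start = max(decoder_free, operands_ready, unit_free[op])
            end = start + (MUL_NS if op == "MUL" else ADD_NS)
            unit_free[op] = end
            decoder_free = start        # assumed: dispatch itself takes no time
        ready[dest] = end
        finish = max(finish, end)
    return finish


adds_first = [
    ("MUL", ("M1", "M6"), "P"),
    ("ADD", ("A1", "A3"), "R"),
    ("ADD", ("A2", "A4"), "R"),
    ("MOV", ("P",), "A5"),
]
move_first = [
    ("MUL", ("M1", "M6"), "P"),
    ("MOV", ("P",), "A5"),
    ("ADD", ("A1", "A3"), "R"),
    ("ADD", ("A2", "A4"), "R"),
]
print(sequence_time_ns(adds_first), sequence_time_ns(move_first))
```

In the first ordering both additions finish at 480 nanoseconds, exactly when the product reaches P, so only the final transfer is added. In the second ordering the decoder stalls on the MOV until 560 nanoseconds and the two additions then run back to back.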
The asynchronous, parallel architecture of the AP (Array Processor) allows 
the instruction decoder to continue executing instructions as long as the required 
hardware is not busy. 
Figure 4 is repeated below on the left and a re-ordered coding is presented 
on the right (see Fig. 5). Note that the number of instructions required has not 
changed, but the execution time has been reduced from 2000 to 1680 nanoseconds 
by initiating a transfer during one multiplication and initiating an addition during 






Step  Original (Fig. 4)      NSec    Re-ordered             NSec
 1    MOV(IQA,M4)             80     MOV(IQA,M4)             80
 2    MOV(IQA,A4)             80     MOV(IQA,A4)             80
 3    MOV(IQA,M5)             80     MOV(IQA,M5)             80
 4    MOV(IQA,M0)             80     MOV(IQA,M0)             80
 5    MOV(IQA,M1)             80     MUL(M0,M4)             480
 6    MUL(M0,M4)             480     MOV(IQA,M1)             -
 7    MOV(A0),MUL(M1,M5)     480     MOV(A0),MUL(M1,M5)     480
 8    MOV(P,A1)               80     ADD(A0,A4)              -
 9    ADD(A0,A4)             240     MOV(P,A1)               -
10    MOV(A0),ADD(A0,A1)     240     MOV(A0),ADD(A0,A4)     240
11    MOV(R,OQ)               80     MOV(R,OQ)               80
12    JUMPC(4,E0)            240     JUMP(4,E0)             240
13    CLEAR(RA)               80     CLEAR(RA)               80
Fig. 5. Programming efficiencies.
The bottleneck in this particular program sequence is at instruction 9 which 
requires that all previous instructions be completed before it is initiated. 
Further efficiencies can be achieved by overlapping transfers and additions 
to enable the entire program loop to execute in 960 nanoseconds. This is 
illustrated in the next section. 
C. Optimizing of Options for Program Efficiencies 
An optimized MAP program is one in which all of the arithmetic elements 
(multiplier, adder, and control) are operating as simultaneously as possible to 
implement the required operations. This implies that for many functions of 
interest involving multiplies (where there is at least 1 multiply for every 2 
additions), the execution time should be determined by the speed of the multiplier 
unit since additions and transfers can be "hidden" behind multiply operations. 
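A rough way to see when the multiplier sets the loop time is to compare the total busy time of the multiplier with that of the adder. The sketch below is only an idealized lower bound under the assumption of perfect overlap, and the function name `ideal_loop_ns` is hypothetical.

```python
MUL_NS, ADD_NS = 480, 240


def ideal_loop_ns(n_mul, n_add):
    """Idealized loop time when the multiplier and adder overlap perfectly.

    The busier of the two units sets the pace; transfers and control are
    assumed to be completely hidden.
    """
    return max(n_mul * MUL_NS, n_add * ADD_NS)


# Two multiplies and two adds per output, as in Y = AU + BV + C.
print(ideal_loop_ns(2, 2))   # multiplier-bound
# One multiply with three adds breaks the "1 multiply per 2 adds" condition.
print(ideal_loop_ns(1, 3))   # adder-bound
```

With one multiply for every two additions the two busy times are equal (480 nanoseconds each), which is why that ratio marks the boundary in the text.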
One way of accomplishing the overlap is to "fold" the operations required to 
produce the previous output behind the operations required for the present input. 
For the vector translation function, this means that the additions and transfers 
required to produce the previous output sample can be hidden behind the two 















Step  Description                    Statement             N-Sec
 1    M4 ← A                         MOV(IQA,M4)             80
 2    A4 ← C                         MOV(IQA,A4)             80
 3    M5 ← B                         MOV(IQA,M5)             80
 4    M0 ← U(k)                      MOV(IQA,M0)             80
 5    A1 ← BV(k-1), ...              ...                     ...
 6    A3 ← AU(k-1) + C, ...          MOV(A3),ADD(A1,A3)      -
 7    M1 ← V(k)                      MOV(IQA,M1)             -
 8    A0 ← AU(k), BV(k)              MOV(A0),MUL(M1,M5)     480
 9    OQ ← Y(k-1), ...               ...                     ...
10    JUMP if more output            JUMP C(4,E0)            -
11    HALT                           CLEAR(RA)               80
                                               TOTAL 960 NS
Fig. 6. Optimized program efficiencies.
The program loop starts at statement 4 with the input of the present
U-sample, U(k). It is then multiplied by scalar A at statement 5. The present
V-sample, V(k), is input at statement 7 and multiplied by scalar B at statement 8,
while the AU(k) product is transferred to an adder register. The first product is
then added to scalar C in statement 9 before the next process is folded back
through the loop. During the second pass, the BV(k) product is transferred to an
adder register in statement 5, and added to the AU(k) + C result to form the
required result, which is output at statement 9. Notice that the second pass of
adder operations and transfers has been hidden behind the 2 multiplies performed
on the first pass. The total loop time per output is seen to be reduced to 960 NS,
which is just the time required to perform the pair of multiplies. All add and
control operations have been hidden behind the multiplies, which, for this
example, represents the optimum solution.
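The folding idea can be illustrated in software terms. The sketch below is a plain Python analogy, not MAP code: each pass performs the multiplies for the present sample and finishes the additions left over from the previous sample, with one extra drain step at the end. The function names `direct` and `folded` are hypothetical.

```python
import math


def direct(a, b, c, u, v):
    """Straightforward evaluation of Y(k) = A*U(k) + B*V(k) + C."""
    return [a * uk + b * vk + c for uk, vk in zip(u, v)]


def folded(a, b, c, u, v):
    """Same result, with each sample's final addition deferred to the next pass."""
    outputs = []
    pending = None                      # (AU(k-1) + C, BV(k-1)) carried between passes
    for uk, vk in zip(u, v):
        au = a * uk                     # first multiply for the present sample
        if pending is not None:
            # Final addition for the previous sample, done "behind" the multiplies.
            outputs.append(pending[0] + pending[1])
        bv = b * vk                     # second multiply for the present sample
        pending = (au + c, bv)          # AU(k) + C formed now, BV(k) added next pass
    if pending is not None:
        outputs.append(pending[0] + pending[1])   # drain the last sample
    return outputs


u = [0.5, 1.0, 1.5, 2.0]
v = [2.0, 1.0, 0.0, -1.0]
same = all(math.isclose(x, y) for x, y in zip(direct(2.0, 3.0, 1.0, u, v),
                                              folded(2.0, 3.0, 1.0, u, v)))
print(same)
```

The arithmetic per output is unchanged; only the order in which the work is issued differs, which is what lets the hardware hide the additions and transfers behind the two multiplies.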
IV. Mathematical Expressions for Which Computer Routines are Developed 
A. Numerical Integration Expressions 
The numerical integration software routines for the array processor are 
developed to expand the capability of the RTHCS by taking maximum advantage 
of the digital high speed pipelining and parallel processing capability of the MAP- 
300 Array Processor. 
The software accepts from the host main digital an input vector consisting 
of 1 to 25 first order, state variable derivatives. The following four numerical 
integration algorithms are implemented with the array processors: (1) Runge- 
Kutta, 4th order; (2) Adams-Bashforth, 2nd order predictor; (3) Adams-Moulton, 
4th order predictor/corrector; (4) Mod-Gurk, 3rd order. All algorithms are 
implemented in both single precision and double precision real number accuracies. 
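(1) Runge-Kutta, 4th Order:
\[
\begin{aligned}
Y_{n1} &= Y_n + \tfrac{h}{2}\,\dot{Y}_n \\
Y_{n2} &= Y_n + \tfrac{h}{2}\,\dot{Y}_{n1} \\
Y_{n3} &= Y_n + h\,\dot{Y}_{n2} \\
Y_{n+1} &= Y_n + \tfrac{h}{6}\left(\dot{Y}_n + 2\dot{Y}_{n1} + 2\dot{Y}_{n2} + \dot{Y}_{n3}\right)
\end{aligned}
\]
Note: h = integration step size.
Integration from $Y_n$ to $Y_{n+1}$:
Input 1: $Y_n$, h, $\dot{Y}_n$
Output 1: $Y_{n1}$
Input 2: $Y_n$, h, $\dot{Y}_{n1}$
Output 2: $Y_{n2}$
Input 3: $Y_n$, h, $\dot{Y}_{n2}$
Output 3: $Y_{n3}$
Input 4: $Y_n$, h, $\dot{Y}_{n3}$
Output 4: $Y_{n+1}$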
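The four input/output stages can be sketched as a single step function. This is a minimal illustration assuming the classical 4th-order Runge-Kutta stage weights and an autonomous derivative function f(y); the names `rk4_step` and `decay` and the test system are hypothetical and are not the thesis routines.

```python
import math


def rk4_step(f, y, h):
    """Advance the state vector y by one step of size h; f(y) returns the derivatives."""
    def offset(base, scale, deriv):
        return [b + scale * d for b, d in zip(base, deriv)]

    d0 = f(y)                          # derivative at Y_n
    d1 = f(offset(y, h / 2, d0))       # derivative at Y_n1
    d2 = f(offset(y, h / 2, d1))       # derivative at Y_n2
    d3 = f(offset(y, h, d2))           # derivative at Y_n3
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, d0, d1, d2, d3)]


def decay(y):
    """Hypothetical test system: each state variable obeys dy/dt = -y."""
    return [-value for value in y]


y = [1.0]
for _ in range(10):                    # integrate from t = 0 to t = 1 with h = 0.1
    y = rk4_step(decay, y, 0.1)
print(y[0], math.exp(-1.0))
```

The state is kept as a list so the same function handles anywhere from 1 to 25 state variables, matching the input vector size described above.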
(2) Adams-Bashforth, 2nd Order Predictor:
\[
Y_{n+1} = Y_n + \tfrac{h}{2}\left(3\dot{Y}_n - \dot{Y}_{n-1}\right)
\]
Note: h = integration step size
Input: $Y_n$, h, $\dot{Y}_n$, $\dot{Y}_{n-1}$
Output: $Y_{n+1}$, $\dot{Y}_{n-1}$
Note: shift store $\dot{Y}_{n-1} = \dot{Y}_n$
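A short sketch of the predictor together with the shift store follows. It assumes the standard 2nd-order Adams-Bashforth formula; the function name `ab2_step`, the test system dy/dt = -y, and the exact starting history are hypothetical choices made only for illustration.

```python
import math


def ab2_step(y, h, d_now, d_prev):
    """Return (Y_{n+1}, new previous derivative) for one Adams-Bashforth step."""
    y_next = [yi + h / 2 * (3 * dn - dp) for yi, dn, dp in zip(y, d_now, d_prev)]
    return y_next, d_now               # shift store: the current derivative becomes the old one


h = 0.01
y = [1.0]                              # y(0) = 1 for dy/dt = -y
d_prev = [-math.exp(h)]                # exact derivative at t = -h, used to start the method
for _ in range(100):                   # integrate from t = 0 to t = 1
    d_now = [-value for value in y]
    y, d_prev = ab2_step(y, h, d_now, d_prev)
print(y[0], math.exp(-1.0))
```

Because the method needs one past derivative, a starting value has to come from somewhere; here it is taken from the known exact solution.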
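(3) Adams-Moulton, 4th Order Predictor/Corrector:
Predictor:
\[
Y^{p}_{n+1} = Y_n + \tfrac{h}{24}\left[55\dot{Y}_n - 59\dot{Y}_{n-1} + 37\dot{Y}_{n-2} - 9\dot{Y}_{n-3}\right]
\]
Corrector:
\[
Y_{n+1} = Y_n + \tfrac{h}{24}\left[9\dot{Y}^{p}_{n+1} + 19\dot{Y}_n - 5\dot{Y}_{n-1} + \dot{Y}_{n-2}\right]
\]
Note: h = integration step size
$\dot{Y}^{p}_{n+1}$ = derivative value based upon predictor value for Y at next time step
Input (predictor): $Y_n$, h, $\dot{Y}_n$, $\dot{Y}_{n-1}$, $\dot{Y}_{n-2}$, $\dot{Y}_{n-3}$
Output (predictor): $Y^{p}_{n+1}$
Input (corrector): $\dot{Y}^{p}_{n+1}$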
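The predictor/corrector pair can be sketched as one step function. This assumes the standard 4th-order Adams-Bashforth predictor and Adams-Moulton corrector coefficients with a single correction per step; the names `am4_step` and `decay`, the test system, and the exact starting history are hypothetical.

```python
import math


def am4_step(f, y, h, d0, d1, d2, d3):
    """One predictor/corrector step.

    d0..d3 are the derivatives at steps n, n-1, n-2 and n-3; f(y) returns
    the derivative vector for a given state.
    """
    predicted = [yi + h / 24 * (55 * a - 59 * b + 37 * c - 9 * d)
                 for yi, a, b, c, d in zip(y, d0, d1, d2, d3)]
    dp = f(predicted)                  # derivative based on the predictor value
    return [yi + h / 24 * (9 * p + 19 * a - 5 * b + c)
            for yi, p, a, b, c in zip(y, dp, d0, d1, d2)]


def decay(y):
    """Hypothetical test system dy/dt = -y."""
    return [-value for value in y]


h = 0.1
y = [1.0]
# Starting history taken from the exact solution at t = -h, -2h, -3h.
history = [[-math.exp(k * h)] for k in (1, 2, 3)]
for _ in range(10):                    # integrate from t = 0 to t = 1
    d0 = decay(y)
    y = am4_step(decay, y, h, d0, *history)
    history = [d0, history[0], history[1]]   # shift store of past derivatives
print(y[0], math.exp(-1.0))
```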
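(4) Mod-Gurk, 3rd Order:
\[
Y_{n+1} = A_n Y_n + A_{n-1} Y_{n-1} + A_{n-2} Y_{n-2}
 + h\left(b_n \dot{Y}_n + b_{n-1}\dot{Y}_{n-1} + b_{n-2}\dot{Y}_{n-2}\right)
\]
\[
A_n = 1.1462084, \qquad A_{n-1} = -0.2010870, \qquad A_{n-2} = 0.0548787
\]






Input: $Y_n$, $Y_{n-1}$, $Y_{n-2}$, h, $\dot{Y}_n$, $\dot{Y}_{n-1}$, $\dot{Y}_{n-2}$
Output: $Y_{n+1}$, $Y_{n-1}$, $Y_{n-2}$, $\dot{Y}_{n-1}$, $\dot{Y}_{n-2}$
Note: shift store $Y_{n-2} = Y_{n-1}$, $Y_{n-1} = Y_n$,
$\dot{Y}_{n-2} = \dot{Y}_{n-1}$, $\dot{Y}_{n-1} = \dot{Y}_n$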
B. Coordinate Transformation Expressions 
The coordinate transformation software shall accept two modes of input: (a)
a 3 component input vector and three angles (yaw, pitch, and roll) of
transformation and (b) a 3 component input vector and a 3 X 3 direction cosine
matrix. The output of the offeror's software shall consist of the transformed
input vector and the direction cosine matrix (when not provided). It shall be
possible for this software to accept and transform up to 20 input vectors during
each system call.⁵
Direction Cosine Matrix (D): Defined in terms of three angles, Yaw, Pitch, and
Roll = $\psi$, $\theta$, $\phi$, respectively. D is orthogonal: $D^{T} = D^{-1}$.
D (3X3) =
\[
\begin{aligned}
D(1,1) &= \cos\psi\cos\theta \\
D(1,2) &= \sin\psi\cos\theta \\
D(1,3) &= -\sin\theta \\
D(2,1) &= \cos\psi\sin\theta\sin\phi - \sin\psi\cos\phi \\
D(2,2) &= \sin\psi\sin\theta\sin\phi + \cos\psi\cos\phi \\
D(2,3) &= \sin\phi\cos\theta \\
D(3,1) &= \cos\psi\sin\theta\cos\phi + \sin\psi\sin\phi \\
D(3,2) &= \sin\psi\sin\theta\cos\phi - \cos\psi\sin\phi \\
D(3,3) &= \cos\theta\cos\phi
\end{aligned}
\]
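The two input modes can be sketched as follows. This is an illustration only: the function names `direction_cosine_matrix` and `transform` are hypothetical, angles are assumed to be in radians, and D(3,3) is taken as cos(pitch)·cos(roll), the value required for D to be orthogonal.

```python
import math


def direction_cosine_matrix(yaw, pitch, roll):
    """Build the 3 x 3 direction cosine matrix from yaw, pitch and roll (radians)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, sy * cp, -sp],
        [cy * sp * sr - sy * cr, sy * sp * sr + cy * cr, sr * cp],
        [cy * sp * cr + sy * sr, sy * sp * cr - cy * sr, cr * cp],
    ]


def transform(d, vectors):
    """Apply D to each 3-component vector; at most 20 vectors per call."""
    if len(vectors) > 20:
        raise ValueError("at most 20 input vectors per call")
    return [[sum(d[i][j] * v[j] for j in range(3)) for i in range(3)]
            for v in vectors]


# Mode (a): angles supplied, matrix computed and returned with the result.
d = direction_cosine_matrix(math.pi / 2, 0.0, 0.0)
print(transform(d, [[1.0, 0.0, 0.0]]))

# Mode (b): a matrix is supplied directly (here the identity).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(transform(identity, [[1.0, 2.0, 3.0]]))
```

A quick sanity check on any matrix built this way is that D multiplied by its transpose gives the identity, which follows from the orthogonality property stated above.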
V. Time Analysis of the Mathematical Equations Above 
The execution time for any arithmetic program can be computed by
determining the total number of nanoseconds (NSec) for the various instructions
in the main loop and multiplying by the number of data elements to be processed.
The overhead instructions for initialization need only be counted once.
The equation for the Runge-Kutta, 4th Order, evaluation has been used as a test
routine and is being used as a format for the remaining mathematical equations.
Figure 7 illustrates the flowchart of the routines. The functions of each routine
are illustrated in Fig. 8. Test routines are shown in Fig. 9. An illustration of the
final computation of the execution time for the Runge-Kutta routine is shown in
Fig. 10.
Each of the numerical routines can be accessed by a FORTRAN call
statement of the form:
CALL {routine name} (X0, Y0, F, B, N, ERROR, Y, NC), where
X0 is the initial value of X.
Y0 is an array of N values representing the initial values of the solution.
N is the number of equations in the system (1 ≤ N ≤ 25).
B is the final value.
Fig. 7. Flowchart of the routines. 
APRKI - RUNGE-KUTTA SCALAR INITIATION
   Multiplies each constant by the step size h and stores
   the products at locations JZ-24.
APRK0 - RUNGE-KUTTA EVALUATION 0
   Uses the initial values X0 and Y0. It stores X0 in X
   and Y0 in Z.
   Calls the user defined function APFNT, whose job is to
   compute K0. APFNT computes K0 and places it in Y.
APRK1 - RUNGE-KUTTA EVALUATION 1
   Uses Y (K0), ab10, a1, X0, Y0
   Outputs a new X and Z
   Moves Y (1st output of APFNT) to K0
   Calls APFNT, which outputs K1, stores it at Y.
APRK2 - RUNGE-KUTTA EVALUATION 2
   Uses Y (K1), H, Y0, a2, ab21
   Outputs a new X and Z
   Moves Y (2nd output of APFNT) to K1
   Calls APFNT, which outputs K2, places it at Y.
APRK3 - RUNGE-KUTTA EVALUATION 3
   Uses Y (K2), X0, Y0, a3, ab31, ab32
   Outputs a new X and Z
   Moves Y (3rd output of APFNT) to K2
   Calls APFNT, which outputs K3, stores it at Y (K3)
APRKY - RUNGE-KUTTA INTEGRATION ROUTINE
   Uses C0, C1, C2, C3, and ARRAY W (Y0, K0, K1, K2,
   K3). Outputs YRK

APRK0: APURK0, APSRK0
APRK1: APURK1, APSRK1
APRK2: APURK2, APSRK2
APRK3: APURK3, APSRK3
APRKY: APURKY, APSRKY

* The APU and APS object modules are combined under
the name of APRKn.MO
Fig. 8. Functions of each routine.
There is also a User Defined Function, named APFNT. APFNT has an
APSFNT routine and an APUFNT routine. The above routines are representative
of these functions.
Fig. 9. Routines and equations for Runge-Kutta, 4th Order, evaluations.
Given h, $X_0$, $Y_0$:
\[
\begin{aligned}
Y_{RK} &= Y_0 + hC_0K_0 + hC_1K_1 + hC_2K_2 + hC_3K_3 \\
K_0 &= f(X_0,\; Y_0) \\
K_1 &= f(X_0 + a_1 h,\; Y_0 + h\,ab_{10}K_0) \\
K_2 &= f(X_0 + a_2 h,\; Y_0 + h\,ab_{20}K_0 + h\,ab_{21}K_1) \\
K_3 &= f(X_0 + a_3 h,\; Y_0 + h\,ab_{30}K_0 + h\,ab_{31}K_1 + h\,ab_{32}K_2)
\end{aligned}
\]
The column vector A consists of the constants $a_1$, $a_2$, and $a_3$. The two
dimensional vector AB consists of the constants $ab_{10}$, $ab_{20}$, $ab_{21}$,
$ab_{30}$, $ab_{31}$, and $ab_{32}$. The column vector C consists of the constants
$C_0$, $C_1$, $C_2$, and $C_3$.
\[
A = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}, \qquad
AB = \begin{bmatrix} ab_{10} & 0 & 0 \\ ab_{20} & ab_{21} & 0 \\ ab_{30} & ab_{31} & ab_{32} \end{bmatrix}, \qquad
C = \begin{bmatrix} C_0 \\ C_1 \\ C_2 \\ C_3 \end{bmatrix}
\]
The Runge-Kutta routines are APRKI, APRK0, APRK1, APRK2, APRK3, and
APRKY. Each of these routines consists of an APU and APS program. These
programs each have an APRKn.MS file (source code) which, when run through the
MAP cross assembler, produces an APRKn.MO (object code) and an APRKn.LST
(listing).
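The role of the A, AB and C constants can be sketched with a step function driven by those arrays. This is an illustration in Python rather than MAP code: the function name `rk_step` and the test system are hypothetical, and because the constant values are not listed here, the classical 4th-order set is assumed purely as an example.

```python
import math


def rk_step(f, x0, y0, h, a, ab, c):
    """One explicit Runge-Kutta step using the A, AB and C constants.

    f(x, y) returns the derivative vector; a holds a1..a3, ab holds the
    lower-triangular rows (ab10), (ab20, ab21), (ab30, ab31, ab32), and
    c holds C0..C3.
    """
    k = [f(x0, y0)]                                        # K0
    for i in range(1, len(c)):
        y_stage = [y + h * sum(ab[i - 1][j] * k[j][m] for j in range(i))
                   for m, y in enumerate(y0)]
        k.append(f(x0 + a[i - 1] * h, y_stage))            # K1, K2, K3
    return [y + h * sum(c[i] * k[i][m] for i in range(len(c)))
            for m, y in enumerate(y0)]


# Assumed constants (classical 4th-order Runge-Kutta), for illustration only.
A = [0.5, 0.5, 1.0]
AB = [[0.5, 0.0, 0.0],
      [0.0, 0.5, 0.0],
      [0.0, 0.0, 1.0]]
C = [1 / 6, 1 / 3, 1 / 3, 1 / 6]


def growth(x, y):
    """Hypothetical test system dy/dx = y."""
    return list(y)


x, y, h = 0.0, [1.0], 0.1
for _ in range(10):                                        # integrate from x = 0 to x = 1
    y = rk_step(growth, x, y, h, A, AB, C)
    x += h
print(y[0], math.e)
```

Keeping the constants in arrays mirrors the split between the scalar initiation routine, which scales them by h once, and the evaluation routines, which reuse them at every stage.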
[Fig. 9 (cont.): test system of ten first-order equations; only fragments are legible, e.g. $Y'_9(x) = Y_9(x) + Y(x) + X$ and $Y'_{10}(x) = Y_{10}(x) + Y(x) + X$.]
Fig. 9. (cont.) Routines and equations for Runge-Kutta, 4th Order, evaluations.
A differential equation of order 10 was used to test the Runge-Kutta routine.















59710 - TOTAL 
Fig. 10. Illustration of the final computation of the execution time for the
Runge-Kutta routines.
Y is a vector of order N if NC = 0, and Y = Y(B), where Y is the solution.
If NC > 0, then Y is an NC X N matrix where the ith row of Y holds the solution
at the ith step.
ERROR is the tolerance.
ERROR should be > $10^{-12}$.
Footnotes
1. Array Processors, Dr. Peter Alexander, CSP Inc., Burlington, Mass.
2. MAP Guide to MAP Family of I/O Interface, CSPI, The Array Processor,
Billerica, Mass.
3. Article, Computer Science News, Sept. 24, 1979, "SEL Unveils Mini Vector
Processors."
4. Reference Manual. Vector Processor Common Memory System (VPCMS). Vol.
1 of IV, System Overview, pp. 1.1-1.2.
5. Enhancement of AFWAL/FI Real-Time Hybrid Computer System. Wright -




Bibliography
1. Reference Manual - Vector Processor Common Memory System (VPCMS).
Vol. 1 of IV, System Overview.
2. MAP Programming Course - Student notebook (MAP Overview).
Articles
1. MAP Guide to MAP Family of I/O Interface, CSPI, The Array Processor,
Billerica, Mass.
2. Computer Science News, Sept. 24, 1979, "SEL Unveils Mini Vector Processor."
