A multiprocessor implementation of a contextual image processing algorithm by Swain, P. H. et al.
SR-PO-00474
NAS9-15466
AgRISTARS: A Joint Program for Agriculture and Resources Inventory Surveys Through Aerospace Remote Sensing
Supporting Research

"Made available under NASA sponsorship in the interest of early and wide dissemination of Earth Resources Survey Program information and without liability for any use made thereof."
July 1980 
Technical Report 
A Multiprocessor Implementation of a 
Contextual Image Processing Algorithm 
by Bradley W. Smith, Howard Jay Siegel, and Philip H. Swain
(E80-10313) A MULTIPROCESSOR IMPLEMENTATION OF A CONTEXTUAL IMAGE PROCESSING ALGORITHM (Purdue Univ.) 234 p HC A11/MF A01 CSCL 05B
N80-32808
Unclas G3/43 00313
 
Purdue University 
Laboratory for Applications of Remote Sensing 
West Lafayette, Indiana 47907 






































STAR Information Form




4. Title and Subtitle: A Multiprocessor Implementation of a Contextual Image Processing Algorithm
5. Report Date: July 1, 1980
6. Performing Organization Code:
7. Author(s): Bradley W. Smith, Howard Jay Siegel, and Philip H. Swain
8. Performing Organization Report No.: 070180
9. Performing Organization Name and Address: Laboratory for Applications of Remote Sensing, Purdue University, West Lafayette, IN 47907
10. Work Unit No.:
11. Contract or Grant No.: NAS9-15466
12. Sponsoring Agency Name and Address: NASA/Johnson Space Center, Houston, TX 77058
13. Type of Report and Period Covered:
14. Sponsoring Agency Code:
15. Supplementary Notes:
16. Abstract:
Contextual classifiers are being developed as a method to exploit the spatial/
 
spectral context of a pixel to achieve accurate classification. Classification
 
algorithms such as the contextual classifier typically require large amounts of
 
computation time. One way to reduce the execution time of these tasks is
 
through the use of parallelism. The applicability of the CDC Flexible Processor
 
system for implementing contextual classifiers is examined. Extensive testing
was performed on a CDC Flexible Processor simulator. The results show that a
dramatic increase in throughput can be obtained using the CDC Flexible Processor array.
 
17. Key Words (Suggested by Author(s)): Classifying image data, contextual information, contextual classifier, multiprocessor systems, remote sensing
18. Distribution Statement:
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages:
For sale by the National Technical Information Service, Springfield, Virginia 22161








This work was supported by the National Aeronautics and Space Administration under Contract NAS9-15466.

This report is based on the Master of Science Dissertation by Bradley W. Smith. Portions of this work have appeared in the following:
 
Philip H. Swain, H. J. Siegel, and Bradley W. Smith, "A Method for Classifying Multispectral Remote Sensing Data Using Context," Proceedings of the Symposium on Machine Processing of Remotely Sensed Data (IEEE Catalog No. 79 CH 1430-8), pp. 343-353, June 1979.

Philip H. Swain, Paul E. Anuta, David A. Landgrebe, and H. J. Siegel, "Volume III: Processing Techniques Development, Part 2: Data Preprocessing and Information Extraction Techniques," LARS Contract Report 113079, November 1979, 160 pages.

Philip H. Swain, H. J. Siegel, and Bradley W. Smith, "Contextual Classification of Multispectral Remote Sensing Data Using a Multiprocessor System,"
 




H. J. Siegel, Philip H. Swain, and Bradley W. Smith, "Parallel Processing
 
Implementation of a Contextual Classifier for Multispectral Remote Sensing
 
Data," Proceedings of the Symposium on Machine Processing of Remotely Sensed
 
Data (IEEE Catalog No. 80 CH 1533-9), pp. 19-29, June 1980.
 
The authors wish to thank David A. Landgrebe and Leah J. Siegel for
 
their comments and suggestions. They also wish to thank Mel Boes, Dave Cur­







LIST OF FIGURES ........................................................ v
ABSTRACT ............................................................... vii
1. INTRODUCTION ........................................................ 1
2. OVERVIEW OF THE FLEXIBLE PROCESSOR .................................. 3
   2.1 The Hardware .................................................... 3
       2.1.1 Introduction .............................................. 3
       2.1.2 The CDC Flexible Processor ................................ 3
       2.1.3 Register Files ............................................ 6
       2.1.4 Registers and Arithmetic Units ............................ 6
       2.1.5 Micro-memory and Input/Output ............................. 8
       2.1.6 Microprogramming of the Flexible Processor ................ 9
       2.1.7 A Flexible Processor Image Processing System .............. 9
   2.2 The Software .................................................... 9
       2.2.1 Introduction .............................................. 9
       2.2.2 Registers ................................................. 12
       2.2.3 The Transfer Constant Instruction ......................... 12
       2.2.4 The Transfer Register Instruction ......................... 13
       2.2.5 Using the Temporary Files ................................. 14
       2.2.6 Using the Large Files ..................................... 15
       2.2.7 Programming the Arithmetic Logic Unit ..................... 15
       2.2.8 The Index Registers ....................................... 19
       2.2.9 Conditional Operations .................................... 19
       2.2.10 Subroutine Calls, Program Jumps and the Stack ............ 22
       2.2.11 The Hardware Multiply .................................... 23
       2.2.12 Bus Registers ............................................ 24
       2.2.13 Shifting Data ............................................ 25
       2.2.14 Input/Output to the Flexible Processors .................. 25
       2.2.15 Interrupts ............................................... 27
   2.3 Conclusions ..................................................... 27
3. THE FLEXIBLE PROCESSOR ARRAY SIMULATOR .............................. 28
   3.1 Introduction .................................................... 28
   3.2 Organization of the Simulator ................................... 28
   3.3 Operation of the Simulator ...................................... 29
   3.4 Documentation ................................................... 31
   3.5 Changes to Increase Speed ....................................... 31
   3.6 Flexible Processor Micro-Assembler .............................. 34
   3.7 Conclusions ..................................................... 34
4. FLEXIBLE PROCESSOR SYSTEM IMPLEMENTATION OF MAXIMUM LIKELIHOOD
   CLASSIFICATION ALGORITHM ............................................ 35
   4.1 Introduction .................................................... 35
   4.2 Implementation of the Maximum Likelihood Classifier on a
       Flexible Processor Array ........................................ 36
       4.2.1 Introduction .............................................. 36
       4.2.2 Subset of Classes for Each Processor Method ............... 36
       4.2.3 Subset of Pixels for Each Processor Method ................ 39
   4.3 Conclusions ..................................................... 43
5. FLEXIBLE PROCESSOR SYSTEM IMPLEMENTATION OF A CONTEXTUAL
   CLASSIFICATION ALGORITHM ............................................ 45
   5.1 Introduction .................................................... 45
   5.2 The Contextual Classifier ....................................... 45
   5.3 Serial Implementation of a Contextual Classifier ................ 47
   5.4 Flexible Processor Implementation of a Simple Contextual
       Classifier ...................................................... 53
   5.5 Contextual Classification on a Flexible Processor System ........ 55
   5.6 Conclusions ..................................................... 60
6. CONCLUSIONS ......................................................... 61
REFERENCES ............................................................. 65
APPENDICES ............................................................. 66
   APPENDIX 1 - Flexible Processor System Simulator Displays ........... A-1
   APPENDIX 2 - Simulator Flowcharts ................................... A-4
   APPENDIX 3 - Simulator Listing ...................................... A-10
   APPENDIX 4 - Flexible Processor Micro-Assembler Listing ............. A-85
   APPENDIX 5 - Implementation of the Maximum Likelihood Classifier
                on a Flexible Processor ................................ A-109
 






2.1 The Basic Components of a Flexible Processor ......................... 4
2.2 Flexible Processor Array Block Diagram ............................... 5
2.3 Flexible Processor Structure ......................................... 7
2.4 The CDC Flexible Processor Coding Form ............................... 10
2.5 Typical Flexible Processor Configuration ............................. 11
2.6 Complete Listing of the ALU Mnemonics ................................ 16
2.7 Complete ALU Instruction Set ......................................... 18
2.8 ALU Mnemonics ........................................................ 20
2.9 Conditional Mask Functions Implemented on Simulator .................. 20
3.1 Flexible Processor Simulator Control Tree Diagram .................... 30
3.2 Sixteen-Flexible Processor Simulator Control Tree .................... 30
3.3 Simulator Commands ................................................... 32
3.4 Single Step Commands ................................................. 32
3.5 Memory Editor Commands ............................................... 32
4.1 An A By B Image Divided Among N Flexible Processors .................. 41
5.1 Horizontally Linear Neighborhoods .................................... 48
5.2 Pidgin ALGOL Implementation of the Contextual Classifier --
    With Redundant Calculations .......................................... 48
5.3 Pidgin ALGOL Implementation of the Contextual Classifier --
    Without Redundant Calculations ....................................... 51
 




(b) Inter-Flexible Processor Data Transfers Required -





5.5 Vertically Linear Neighborhoods
5.6 Diagonally Linear Neighborhoods ...................................... 58
5.7 The Diagonals of an A By B Image ..................................... 58
 






Contextual classifiers are being developed as a method to exploit the
 
spatial/spectral context of a pixel to achieve accurate classification.
 
Classification algorithms such as the contextual classifier typically require
large amounts of computation time. One way to reduce the execution time of
these tasks is through the use of parallelism. The applicability of the CDC
Flexible Processor system for implementing contextual classifiers is examined.
Extensive testing was performed on a CDC Flexible Processor simulator.
 






Since man has been able to fly, he has attempted to gain information
about the earth from above. Over the past decade, attempts to extract
information from multispectral image data have proved increasingly successful.
Traditional methods of pattern recognition applied to individual picture
elements have yielded accurate results; however, greater accuracy can be
obtained if contextual information is also employed. Accuracy has been
increased by up to 55.8% using contextual methods [10]. Because the
computational requirements of the contextual classifier are very large,
conventional computer systems are not able to handle the processing on a
real-time basis [10]. One way to reduce the execution time of these tasks is
through the use of parallelism.
 
Various parallel processing systems that can be used for remote sensing
have been built or proposed. These include pipelined processors [1],
multimicrocomputer systems [8,9], and special purpose systems [4]. The Control
Data Corporation (CDC) Flexible Processor system [1,2,3] is a commercially
available multiprocessor system which has been recommended for use in remote
sensing [5]. The Flexible Processor system includes up to 16 separate
processing units called Flexible Processors. In addition to the Flexible
Processors, a typical configuration might consist of a CDC Cyber 170 series
computer, a system controller featuring a Cyber 18-20 computer, up to 64K
bytes of bulk memory per Flexible Processor, and a high-speed data
transmission structure called a ring [5]. An in-depth discussion of both
hardware and software aspects of the Flexible Processor system can be found
in Chapter 2.
 
There is a simulator for the Flexible Processor array written in the C
programming language [6], which runs on the UNIX operating system. During the
simulation of 16 Flexible Processors, the simulator occupies 64K bytes of main
memory and 161,280 bytes of secondary storage. Further discussion of the
simulator is in Chapter 3 and Appendices 1, 2, and 3.
 
The main computation required by the contextual classifier resembles
 
the Gaussian maximum likelihood classifier. Since the maximum likelihood
 
classifier is considerably less complicated, the software development
required for the contextual classifier was based on the maximum likelihood
 
classifier. The logic behind the maximum likelihood classifier, the use of
 
a multiprocessing system to execute a maximum likelihood classifier, and a
 




Contextual classifiers are discussed in Chapter 5. A description of
 
the contextual classifier, a serial algorithm, a multiprocessor implementation
of the classifier, and timing analyses are given.
 
In Chapters 4, 5, and 6, timings of the Gaussian maximum likelihood
 
classifier and the contextual classifier are presented. Both classifiers
 
currently run on the Flexible Processor simulator as discussed in Chapter 3.
 
Chapter 6 draws conclusions on the usefulness of the Flexible Processor array
for performing contextual classifications.
 






Key elements of the Flexible Processor are discussed first. The focus
is on the Flexible Processor, which is the basic building block of the
Flexible Processor system. Further details are in [2,3].
 
2.1.2 The CDC Flexible Processor
 
The basic components of a Flexible Processor are shown in Figure 2.1.
Each Flexible Processor is microprogrammable, allowing parallelism at the
instruction level. An example of the way in which N Flexible Processors may
be configured into a system is shown in Figure 2.2. Up to 16 Flexible
Processors can be linked together, providing substantial parallelism at the
processor level. The clock cycle time of a Flexible Processor is 125 nsec
(nanoseconds). Since 16 Flexible Processors can be connected in a parallel
and/or pipelined fashion, the effective throughput can be drastically
increased, resulting in a potential effective cycle time of less than 10 nsec.
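The "less than 10 nsec" figure is simply the 125-nsec clock divided among 16 processors. The short C sketch below (C being the language of the simulator described in Chapter 3) makes that arithmetic explicit. The function names are hypothetical, and the result is an idealized bound that assumes every processor completes one microinstruction per cycle with no communication or synchronization overhead.

```c
/* Clock cycle of one Flexible Processor, in nanoseconds (from the text). */
#define FP_CYCLE_NS 125.0

/* Idealized effective cycle time of an array of Flexible Processors:
 * each processor is assumed to finish one microinstruction per cycle,
 * with no time lost to communication or synchronization. */
double effective_cycle_ns(int processors)
{
    return FP_CYCLE_NS / (double)processors;
}

/* Aggregate rate in millions of microinstructions per second,
 * under the same best-case assumption. */
double aggregate_mips(int processors)
{
    return 1000.0 / effective_cycle_ns(processors);
}
```

For 16 processors this gives 125/16 = 7.8125 nsec, or an aggregate of 128 million microinstructions per second under the same assumptions; real programs will fall short of this bound.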
 
A central feature of the Flexible Processor is its dual 16-bit internal
bus structure, enabling the Flexible Processor to manipulate either 16- or
32-bit operands. If 32-bit operands are used, the Flexible Processor can be
programmed to execute floating point routines (on its integer hardware)
based on the floating point representation of such systems as the IBM 370
and the PDP-11/70. If the needed data width is 16 bits, the Flexible Pro-










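Fig. 2.1. The Basic Components of a Flexible Processor. [Diagram not reproduced; labels include small file (16 words x 32 bits), large file, micro-memory, high-speed ring channel, line channels, peripherals, and memory.]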
 



























Fig. 2.2. Flexible Processor Array Block Diagram. [Diagram not reproduced; shows Flexible Processors (FP), each with local bulk memory, connected by control paths and data paths.]




In each Flexible Processor there are two files of registers, one
called the temporary register file and the other the large register file.
Both are divided into 16-bit addressable subunits. If the needed path width
is 16 bits, the two files can act like four files, thus creating more
addressable space. A special feature of the temporary file is its two
separate read and two separate write address registers, which can save much
CPU time in many types of matrix operations. The large register file has
its own two read/write address registers. It is possible to read from or
write to either file and simultaneously increment (or decrement) the
address register. The temporary file is 16 words, 32 bits each, while the
large file is 4096 words, 32 bits each. All of the register files consist
of 60-nsec random-access memory.
 
2.1.4 Registers and Arithmetic Units
 
Details of the architecture of a Flexible Processor are shown in Figure
2.3. There are three 32-bit general purpose registers, called the E, F,
and G registers. All of these registers are connected to the arithmetic
logic unit (ALU), which can perform 32-bit additions in 125 nsec. The E and
G registers are readable directly through the ALU. The general purpose
registers can be shifted separately, or the E and F registers can be combined
into a 64-bit shift register for double-length shifts. The output of the
ALU is a 32-bit register, A, that is addressable by byte (8 bits), which
makes a variety of byte manipulations possible. Separate from the ALU is a
hardware integer multiplier, which takes two bytes and multiplies them to
produce a 16-bit result in 250 nsec. The input registers are the P and Q
registers, which are each 16 bits wide. The user can choose which two bytes
 




[Diagram not reproduced; labels include index registers, index compare registers, maintenance compare registers, condition mask registers, the temporary and large register files, and the Q register.]
Fig. 2.3. Flexible Processor Structure.
 
registers and eight corresponding compare registers. The index registers
 
can be used for looping and can be incremented or decremented during any
 
statement not addressing those registers. The Flexible Processor also
contains a hardware jump stack, so it is capable of handling standard types of
program calls such as subroutine jumps.
 
2.1.5 Micro-memory and Input/Output
 
The micro-memory consists of 4K 48-bit words. It stores the microprogram.
Each Flexible Processor in a system can contain a different program.
 
Input/Output (I/O) for the Flexible Processor depends on the overall
system (i.e., the Flexible Processor array and its host machine). A Flexible
Processor is capable of interrupting another Flexible Processor for I/O.
I/O among the Flexible Processors is done in one of two ways. The first is a
very high speed communication link arranged in a ring configuration [2,3].
It operates at four megawords (16 bits per word) per second. Each Flexible
Processor has a station on the ring, and each station on the ring is
connected to two other stations. When a Flexible Processor writes to the
ring, it supplies 16 bits of data and the address of the destination. If a
station receives data for another address, it shifts the data to the next
station. This is continued until the data reaches the correct station.
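The forwarding rule just described can be sketched in C as follows. The sketch assumes a unidirectional ring with stations numbered 0 through 15, each passing a word it does not own to station (i + 1) mod 16; the names ring_msg, ring_deliver, and ring_stream_time_us are hypothetical, and per-hop timing and failure handling are not modeled. The streaming estimate uses only the quoted rate of four megawords per second.

```c
#include <stdint.h>

#define RING_STATIONS 16   /* one station per Flexible Processor */

/* A word on the ring: 16 bits of data plus the destination station. */
struct ring_msg {
    uint16_t data;
    unsigned dest;
};

/* Deliver msg, injected at station src, by passing it to the next
 * station until it reaches msg.dest. Returns the accepting station and
 * stores the number of hops in *hops; returns RING_STATIONS for an
 * invalid station address. Assumes a unidirectional ring. */
unsigned ring_deliver(unsigned src, struct ring_msg msg, unsigned *hops)
{
    *hops = 0;
    if (src >= RING_STATIONS || msg.dest >= RING_STATIONS)
        return RING_STATIONS;
    unsigned station = src;
    while (station != msg.dest) {
        station = (station + 1) % RING_STATIONS;  /* shift to next station */
        ++*hops;
    }
    return station;
}

/* Time, in microseconds, to move a stream of 16-bit words at the quoted
 * ring rate of four megawords per second (four words per microsecond). */
double ring_stream_time_us(unsigned words)
{
    return words / 4.0;
}
```

Under these assumptions a word sent from station 14 to station 1 wraps past station 15 and station 0, taking three hops, and a 1024-word block needs at least 256 microseconds at the quoted rate.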
 
Special hardware has been added to remove data from the ring in the event of
a station failure. The data is loaded into the "input file." This 16-word,
32-bit/word register file can be used as a small buffer. Another form of
I/O is through up to 16 64K-byte banks of shared 160-nsec memory. This is
not as fast as the previous method; however, for large data transfers, it





2.1.6 Microprogramming of the Flexible Processor

The Flexible Processor is microprogrammed in "micro-assembly
language," allowing parallelism at the instruction level, as indicated in
the Flexible Processor coding form shown in Figure 2.4. For example, it is
possible to conditionally increment an index register, do a program jump,
multiply two 8-bit integers, and add the E and G registers, all
simultaneously. This type of operational overlap, in conjunction with the
multiprocessing capability of the Flexible Processors, greatly increases the
speed of the Flexible Processor array.
 
2.1.7 A Flexible Processor Image Processing System
 
Figure 2.5 is provided as an example of one possible Flexible Processor
array configuration [5]. The setup of this system has many desirable
features for picture processing. The parallel-pipelined architecture of the
Flexible Processors enables the system to do rapid matrix multiplications.
Image displays are attached, so it is possible to view the pictures.
The two 800-bpi tape drives, along with the 50M disk unit, provide enough
storage space for jobs that require large amounts of memory. In addition,
the system can handle up to eight terminals on its resident operating system
 








The host for the Flexible Processor system is programmable in FORTRAN.
 
Flexible Processor programs written in assembly language can be called from
the FORTRAN library, enabling the calling programs to be written in FORTRAN
 



























Fig. 2.4. The CDC Flexible Processor Coding Form. [Coding form not reproduced.]



























Fig. 2.5. Typical FP System Configuration. [Diagram not reproduced.]

Processor assembly language, making the use of the system much easier. If
the necessary Flexible Processor routines exist, data analysis packages
written in FORTRAN, such as LARSYS, can run on the Flexible Processor system
with very simple modifications. The rest of this section over-






The three general purpose registers (E, F, and G) are each divided into
halves because they are 32 bits wide and the busses are only 16 bits wide.
The most significant bits of the registers are referred to as the "one"
group and the least significant bits are referred to as the "zero" group.
For example, the most significant bits of the E register are called E1, and
the least significant bits of the E register are called E0.

The ability to address registers in groups of 16 bits allows one to
address halves of two separate registers simultaneously. For example, if one
wished to write into the upper 16 bits of the F register and the lower 16
bits of the G register, the pair would be referred to as F1G0 in the
command. Both halves receive the same data, but they receive it in one
machine cycle instead of two. This increases throughput when, for example,
loading initial conditions.
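A minimal C model of the "one" and "zero" halves, and of a paired destination such as F1G0, follows. The type and function names are hypothetical; the point is only that one 16-bit bus value lands in the upper half of F and the lower half of G in the same cycle, leaving the other halves untouched.

```c
#include <stdint.h>

/* A 32-bit general purpose register (E, F, or G), viewed as two 16-bit
 * halves: the "one" group (bits 31-16) and the "zero" group (bits 15-0). */
typedef uint32_t fp_reg;

uint16_t half1(fp_reg r) { return (uint16_t)(r >> 16); }
uint16_t half0(fp_reg r) { return (uint16_t)(r & 0xFFFFu); }

/* Replace one half of a register with a 16-bit bus value. */
fp_reg set_half1(fp_reg r, uint16_t v) { return (r & 0x0000FFFFu) | ((fp_reg)v << 16); }
fp_reg set_half0(fp_reg r, uint16_t v) { return (r & 0xFFFF0000u) | v; }

/* Paired destination F1G0: the same bus value is written into F1 and G0,
 * modeled here as one call standing for one machine cycle. */
void write_f1g0(fp_reg *f, fp_reg *g, uint16_t bus)
{
    *f = set_half1(*f, bus);
    *g = set_half0(*g, bus);
}
```

For example, with F = 11112222 and G = 33334444 (hex), writing ABCD to F1G0 leaves F = ABCD2222 and G = 3333ABCD.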
 
2.2.3 The Transfer Constant Instruction

These registers can be loaded with a constant using the Transfer Constant
(TC) instruction. Figure 2.4 shows the coding form; line three gives the
format of the TC instruction. Omitting the AAAA and the comments, the basic
form of the instruction is:

TC $HHHH DST0 DST1
 




The $ tells the assembler that the four following digits are to be
interpreted as hexadecimal. This command places the constant on both data
lines to enable the loading of two registers simultaneously. The DST
(destination) is filled in by an appropriate register which can read off the
corresponding bus. Not all registers can provide data to ("source") or
receive data from ("destine") both busses. For example, F1 cannot read
(destine) bus 0, the E and G registers can only be sourced into the
arithmetic logic unit, and the E1 and G0 registers can read only from bus 1
[3]. Some examples of correct TC instructions are:

TC $FFA8 E0G1 F1G0
TC $0100 E0 G0
TC $0101 E0 NOP

The first command in the example transfers the hexadecimal constant FFA8 to
the sixteen-bit registers E0, F1, G0, and G1. The second command transfers
the hex constant 0100 to the E0 and G0 registers. In the third command, the
NOP indicates that bus 1 is not used. Note that while it is not possible to
source two different registers onto the same bus at the same time, it is
possible to destine two registers off the same bus at the same time.
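The bus rules above can be checked mechanically. The following C sketch models a TC instruction as driving one constant onto both busses and latching it into each named half. Only the restrictions stated in the text (F1 cannot read bus 0; E1 and G0 read only bus 1) are enforced; treating every other half as able to read either bus is an assumption of this sketch, and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-bit register halves that can be TC destinations. */
enum half { E0, E1, F0, F1, G0, G1, NHALVES };

/* Bus restrictions stated in the text: F1 cannot read bus 0, and E1 and
 * G0 can read only bus 1. Every other half is assumed to read either bus. */
bool can_destine(enum half h, int bus)
{
    if (bus == 0)
        return h != F1 && h != E1 && h != G0;
    return true;
}

/* Sketch of "TC $HHHH DST0 DST1": the constant is driven onto both busses
 * and every half named in dst0 (bus 0) or dst1 (bus 1) latches it in the
 * same cycle. dst0 and dst1 are bit masks of (1u << half); 0 means NOP.
 * Returns false, leaving regs unchanged, if any half cannot read its bus. */
bool tc_execute(uint16_t regs[NHALVES], uint16_t constant,
                unsigned dst0, unsigned dst1)
{
    for (int h = 0; h < NHALVES; ++h) {
        if ((dst0 & (1u << h)) && !can_destine((enum half)h, 0))
            return false;
        if ((dst1 & (1u << h)) && !can_destine((enum half)h, 1))
            return false;
    }
    for (int h = 0; h < NHALVES; ++h)
        if ((dst0 | dst1) & (1u << h))
            regs[h] = constant;
    return true;
}
```

Under this model, tc_execute(regs, 0xFFA8, (1u << E0) | (1u << G1), (1u << F1) | (1u << G0)) reproduces the first example above, while a request to latch F1 from bus 0 is rejected.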
 
2.2.4 The Transfer Register Instruction
 
Another way in which the registers can be used as a source of information
is the Transfer Register (TR) instruction. This is the fourth format shown
in Figure 2.4. The basic format of the instruction is:

TR SRC0 DST0 SRC1 DST1

This instruction tells the computer to source the register in the SRC0 field
to bus 0 and to use the register(s) in the DST0 field as the destination(s);
the SRC1 and DST1 fields do the same for bus 1. In the event that one bus is
not to be used, a NOP must be placed in both the SRC and DST fields
corresponding to that bus.
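The TR semantics can be sketched similarly. In the C fragment below, registers are modeled as a plain array of 16-bit locations indexed by small integers, a hypothetical encoding rather than the machine's register numbering. Both busses are read before either destination is written, which is how this sketch models a single-cycle transfer; which source and destination combinations the hardware actually permits is not addressed here.

```c
#include <stdint.h>

#define NOP (-1)   /* marks an unused SRC or DST field */

/* Sketch of "TR SRC0 DST0 SRC1 DST1" over an array of 16-bit locations.
 * Both busses are driven from their sources before any destination is
 * written. A NOP source leaves its bus unused; a NOP destination
 * discards the bus value. */
void tr_execute(uint16_t loc[], int src0, int dst0, int src1, int dst1)
{
    uint16_t bus0 = 0, bus1 = 0;
    if (src0 != NOP) bus0 = loc[src0];
    if (src1 != NOP) bus1 = loc[src1];
    if (src0 != NOP && dst0 != NOP) loc[dst0] = bus0;
    if (src1 != NOP && dst1 != NOP) loc[dst1] = bus1;
}
```

Because the model reads both sources first, a single call can exchange two locations; that is a property of the sketch, not a documented capability of the Flexible Processor.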
 
2.2.5 Using the Temporary Files

A special feature of the temporary register files, discussed in Section
2.1.3, is that they have separate read and write indices. The indices are
T0RA, T0WA, T1RA, and T1WA, which stand respectively for Temporary file 0
Read Address, Temporary file 0 Write Address, Temporary file 1 Read Address,
and Temporary file 1 Write Address. Each is four bits in length. When using
the temporary files, one usually initializes the index value and then uses
special instructions to increment, decrement, or clear these registers while
doing other operations. When storing information to a temporary file, the
mnemonic used is TFxf, where x is the file number and f is the function to
be performed. The available functions are:

U increment the corresponding index
D decrement the corresponding index
C zero the corresponding index
N perform no operation on the index

The machine will update the read or write address depending on the context;
i.e., if a temporary file is used as a source, the read address is assumed,
and if it is used as a destination, the write address is assumed. Some
examples are as follows:

TC $0101 TF0U TF1D
TC $0101 TF0N TF1C
TC $0101 TF0C TF1C

In the examples, the hex constant 0101 is stored in temporary files 0 and 1
while the write pointers are incremented, decremented, left unchanged, or
cleared.
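The temporary file and its independent read and write addresses can be modeled as below. This C sketch makes assumptions that the text does not state: each entry is treated as 16 bits wide (one bus), the address function is applied after the access (post-increment), and the 4-bit addresses wrap modulo 16. The type and function names are hypothetical.

```c
#include <stdint.h>

/* One 16-entry temporary file with independent 4-bit read and write
 * address registers (e.g. T0RA and T0WA). */
struct temp_file {
    uint16_t word[16];
    uint8_t ra, wa;               /* read and write addresses, 0-15 */
};

/* Address-register functions used in the TFxf mnemonic. */
enum addr_fn { FN_U = 'U', FN_D = 'D', FN_C = 'C', FN_N = 'N' };

/* Apply an address function; 4-bit addresses are assumed to wrap. */
static uint8_t update(uint8_t a, enum addr_fn f)
{
    switch (f) {
    case FN_U: return (uint8_t)((a + 1) & 0xF);
    case FN_D: return (uint8_t)((a + 15) & 0xF);   /* minus 1, mod 16 */
    case FN_C: return 0;
    default:   return a;                            /* FN_N: no change */
    }
}

/* File used as a destination: write at the write address, then update it. */
void tf_write(struct temp_file *t, uint16_t v, enum addr_fn f)
{
    t->word[t->wa] = v;
    t->wa = update(t->wa, f);
}

/* File used as a source: read at the read address, then update it. */
uint16_t tf_read(struct temp_file *t, enum addr_fn f)
{
    uint16_t v = t->word[t->ra];
    t->ra = update(t->ra, f);
    return v;
}
```

Under these assumptions, the first example line, TC $0101 TF0U TF1D, corresponds to tf_write(&t0, 0x0101, FN_U) and tf_write(&t1, 0x0101, FN_D): both files receive 0101 at address 0, after which T0WA is 1 and T1WA has wrapped to 15.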
 
2.2.6 Using the Large Files
 
The large files, discussed in Section 2.1.3, have only one pointer per
file, but are accessed in the same manner as the temporary files. To access
a file, the format is LFxf, where x is the file number and f is the function
to be performed on the file. The functions performed are C, D, and N, as
defined in Section 2.2.5, and A, which adds index register 0 to the
corresponding pointer and uses that location as the desired address. The
instruction

TC $0101 LF0U LF1D

would store the hex constant 0101 in large files 0 and 1 while incrementing
the pointer for large file 0 and decrementing the pointer for large file 1.
The large file pointers are 10 bits long and are called L0A and L1A. Both
the large file and the temporary file pointers can be accessed in the same
manner as standard general purpose registers.
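The large-file addressing modes, including the A mode that offsets the pointer by index register 0, can be sketched the same way. This C model is hypothetical: it treats the A mode as leaving the pointer unchanged, applies the other functions (including the U used in the example) after the access, and wraps addresses modulo the 4096-word file size; since the text gives the pointers as 10 bits, the exact address width is an assumption here.

```c
#include <stdint.h>

#define LF_WORDS 4096u   /* large-file size given in Section 2.1.3 */

/* One large file with its single address pointer (L0A or L1A). */
struct large_file {
    uint32_t word[LF_WORDS];
    uint16_t ptr;
};

/* Compute the effective address for LFxf and update the pointer.
 * 'U', 'D', 'C', 'N' act as for the temporary files, after the access;
 * 'A' uses pointer + IDX0 as the address and leaves the pointer alone.
 * Addresses wrap modulo LF_WORDS (an assumption of this sketch). */
static unsigned lf_address(struct large_file *l, char f, uint16_t idx0)
{
    unsigned addr = l->ptr % LF_WORDS;
    switch (f) {
    case 'U': l->ptr = (uint16_t)((l->ptr + 1u) % LF_WORDS); break;
    case 'D': l->ptr = (uint16_t)((l->ptr + LF_WORDS - 1u) % LF_WORDS); break;
    case 'C': l->ptr = 0; break;
    case 'A': addr = (l->ptr + idx0) % LF_WORDS; break;
    default:  break;                       /* 'N': no change */
    }
    return addr;
}

/* Store v in the large file using addressing function f. */
void lf_write(struct large_file *l, uint32_t v, char f, uint16_t idx0)
{
    l->word[lf_address(l, f, idx0)] = v;
}
```

With IDX0 = 10 and the pointer at 1, an A-mode write lands at address 11 and the pointer stays at 1, which is what makes this mode convenient for indexed table lookups in the sketch.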
 
2.2.7 Programming the Arithmetic Logic Unit
 
In the TR instruction there is a field labeled ADD (see Figure 2.4).
This field controls the function of the ALU. Output from the ALU is
available as the A (accumulator) register, which can be sourced in the same
manner as the F and G registers. In the event that the A register is not
sourced, the result is moved to the F0-F1 register pair. The A register
differs from the other general purpose registers in that it is byte
addressable, an ability that makes it one of the most powerful registers on
the machine. Figure 2.6 is a listing of the ALU mnemonics and a brief
interpretation of their meanings. It is important to remember that this
machine is microprogrammable; thus there are many possibilities that are not
in the mnemonic set. This is the extent of the assembler
 
Mnemonic  Function        Comments
ADD       A = E + G       Two's complement add the E and G registers.
AND       A = EG          Logically AND the E and G registers.
E         A = E           This is the method for sourcing the E register,
                          making it possible to get data to either bus
                          from the E register.
E+1       A = E + 1       These make it possible to increment, decrement,
E-1       A = E - 1       and double the E register without ever having
E+E       A = E + E       to load a constant.
E-G       A = E - G       Two's complement subtract the G register from
                          the E register.
E=G       A = E - G - 1   The Flexible Processor has a branch-if-negative
                          command. If the E register is less than or equal
                          to the G register, this will branch.
EN        A = 'E          Logically complement the E register (E NOT).
G         A = G           This makes it possible to use the G register as
                          a data source to both busses.
GN        A = 'G          Logically complement the G register.
OR        A = E + G       Logically OR the E and G registers.
SB1       A = E - G       Ones' complement subtract the G register from
                          the E register.
SET       A = E + 'E      Set A to all ones.
XOR       A = E ⊕ G       EXCLUSIVE OR the E and G registers.
ZRO       A = E'E         Load the A register with all zeros.

Fig. 2.6. Complete Listing of the ALU Mnemonics.
mnemonics for the ALU, but there are more commands. Figure 2.7 shows a
listing of the entire command set. To use this list, first type either an A
or an L (for arithmetic or logical) and then a C or an N (for carry or no
carry). The A (L) determines the basic function type. The C (N) further
determines the type of function by determining the type of carry. With the
above, it is possible to use Figure 2.7 to determine the exact function
number desired. The only other entity necessary is the function number
(from 0 to F). Thus ANF describes the arithmetic function in the no-carry
portion of the table, in row F (function number 15). All three of
 




As shown in Figure 2.3, the A register is divided into four bytes
numbered zero through three. If A0 is sourced, bytes 0 and 1 will be
obtained. Likewise, sourcing A1 will yield bytes 2 and 3. If bytes 1 and 2
are needed together, adding an SW (which stands for SWap bytes) to the end of
A0 will yield the desired result. If bytes 0 and 3 are needed, adding an SW
to the end of A1 will yield the desired result. Thus, A0SW is the correct way
to address bytes 1 and 2.

Another feature of the A0 and A1 registers is that they can do a right
shift, preserving the signs of the registers. This is accomplished by
concatenating an RS (Right Shift) at the end of the desired register. It is
possible to do a right shift in conjunction with a byte swap. The ALU has the
ability to shift a byte of zeros into either (or both) of the A0 and A1
registers. This is accomplished by shifting both accumulators right by one
byte and loading the upper byte of the pair with zeros. The mnemonic for
this is an RZ (Right shift Zero fill) concatenated at the end of the byte
pair desired. Figure 2.8 is a list of the possible combinations of the
accumulators and the above operations [3]. The bus numbers are omitted
because they can be sourced to either bus. Shift is done before swap. B0, B1,
B2, and B3 indicate the four bytes of the A register.

Function  Logical          Arithmetic Operations
Number    Functions        No Carry              With Carry
0         F = 'E           F = E                 F = E+1
1         F = '[E+G]       F = [E+G]             F = [E+G]+1
2         F = ['EG]        F = [E+'G]            F = [E+'G]+1
3         F = [E'E]        F = -1 (2's comp)     F = 0
4         F = '[EG]        F = E+[E'G]           F = E+[E'G]+1
5         F = 'G           F = [E+G]+[E'G]       F = [E+G]+[E'G]+1
6         F = [E'G+'EG]    F = E-G-1             F = E-G
7         F = [E'G]        F = [E'G]-1           F = [E'G]
8         F = ['E+G]       F = E+[EG]            F = E+[EG]+1
9         F = ['E'G+EG]    F = E+G               F = E+G+1
A         F = G            F = [E+'G]+[EG]       F = [E+'G]+[EG]+1
B         F = [EG]         F = [EG]-1            F = [EG]
C         F = [E+'E]       F = E+E               F = E+E+1
D         F = [E+'G]       F = [E+G]+E           F = [E+G]+E+1
E         F = [E+G]        F = [E+'G]+E          F = [E+'G]+E+1
F         F = E            F = E-1               F = E

Fig. 2.7. Complete ALU Instruction Set.
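Read row by row, Fig. 2.7 can be expressed as a C function of E and G. The sketch below is one reading of the table, not CDC's hardware description: in the arithmetic columns, '+' inside brackets is taken as logical OR and '+' between terms as two's complement addition, 'X as NOT X, and juxtaposition as AND, which is consistent with Fig. 2.6 (for example, ZRO is E'E and SET is E+'E). Every entry in the "With Carry" column equals the "No Carry" entry plus one, so the sketch adds the carry uniformly. The names alu and alu_decode are hypothetical.

```c
#include <stdint.h>

/* One reading of Fig. 2.7: output F for operands E and G.
 * logical != 0 selects the "Logical Functions" column (which has no
 * carry variants); otherwise carry != 0 selects "With Carry". */
uint32_t alu(int logical, int carry, unsigned fn, uint32_t e, uint32_t g)
{
    if (logical) {
        switch (fn & 0xFu) {
        case 0x0: return ~e;
        case 0x1: return ~(e | g);
        case 0x2: return ~e & g;
        case 0x3: return 0;
        case 0x4: return ~(e & g);
        case 0x5: return ~g;
        case 0x6: return e ^ g;
        case 0x7: return e & ~g;
        case 0x8: return ~e | g;
        case 0x9: return ~(e ^ g);
        case 0xA: return g;
        case 0xB: return e & g;
        case 0xC: return 0xFFFFFFFFu;
        case 0xD: return e | ~g;
        case 0xE: return e | g;
        default:  return e;
        }
    }
    uint32_t f;
    switch (fn & 0xFu) {
    case 0x0: f = e; break;
    case 0x1: f = e | g; break;
    case 0x2: f = e | ~g; break;
    case 0x3: f = 0xFFFFFFFFu; break;            /* -1 in two's complement */
    case 0x4: f = e + (e & ~g); break;
    case 0x5: f = (e | g) + (e & ~g); break;
    case 0x6: f = e - g - 1u; break;
    case 0x7: f = (e & ~g) - 1u; break;
    case 0x8: f = e + (e & g); break;
    case 0x9: f = e + g; break;
    case 0xA: f = (e | ~g) + (e & g); break;
    case 0xB: f = (e & g) - 1u; break;
    case 0xC: f = e + e; break;
    case 0xD: f = (e | g) + e; break;
    case 0xE: f = (e | ~g) + e; break;
    default:  f = e - 1u; break;
    }
    /* Each "With Carry" entry is the "No Carry" entry plus one. */
    return carry ? f + 1u : f;
}

/* Decode a mnemonic such as "ANF": A or L, then C or N, then a hex
 * function number. Returns 1 on success, 0 for a malformed mnemonic. */
int alu_decode(const char *m, int *logical, int *carry, unsigned *fn)
{
    if ((m[0] != 'A' && m[0] != 'L') || (m[1] != 'C' && m[1] != 'N'))
        return 0;
    char h = m[2];
    if (h >= '0' && h <= '9')      *fn = (unsigned)(h - '0');
    else if (h >= 'A' && h <= 'F') *fn = (unsigned)(h - 'A' + 10);
    else return 0;
    *logical = (m[0] == 'L');
    *carry = (m[1] == 'C');
    return m[3] == '\0';
}
```

For instance, ANF applied to E = 10 yields 9 (F = E-1), and AN6 yields E-G-1, which, barring overflow, is negative exactly when E is less than or equal to G, matching the branch-if-negative use described in Fig. 2.6.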
 
2.2.8 The Index Registers
 
In the diagram of the machine structure (Figure 2.3), there are four
index registers, four index compare registers, and four condition mask
registers. None of these registers can be sourced for their contents alone.
Index register 0 and its corresponding compare register are 16 bits long,
while all the others are only eight bits long. The IDX field, shown in
Figure 2.4, controls the operation of the indices and their compare
registers. An INx command, where x is one of the index registers, will
increment index register x by one. A DCx will decrement index register x
by one, while a CLx will clear index register x. CLA will clear all
registers. The index compare registers (see Figure 2.4) are used to hold values
 




The condition mask registers control the condition to be used. These
 
registers do not have a one-to-one correspondence to the index registers.
 
Figure 2.9 is a list of the functions used in the current software (a full
 




It is possible to test for the conditions in Mask Register 0 by placing
 
a TN in the CND (CoNDition) column. Figure 2.4 shows the location of the CND
 
column in the coding form. To test for the logical "not" of the condition
 






















































































Fig. 2.8. ALU Source Mnemonics. [Table not reproduced; legend: Z = one byte of zeros, LS = sign of lower two bytes, US = sign of upper two bytes.]
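How A0, A1, and the SW variants select bytes can be sketched as follows. The text says which bytes each mnemonic yields but not which byte lands in the upper half of the bus; this C sketch assumes the swapped result equals a 16-bit half of A rotated right by one byte, so A0SW gives byte 2 over byte 1 and A1SW gives byte 0 over byte 3. RZ is modeled on a single sourced pair as a one-byte right shift with zero fill; RS and the combined shift-and-swap forms are omitted. All names are hypothetical.

```c
#include <stdint.h>

/* Byte n (0-3) of the 32-bit A register; byte 0 is least significant. */
static uint8_t a_byte(uint32_t a, int n) { return (uint8_t)(a >> (8 * n)); }

/* Form a 16-bit bus value from two bytes of A, 'hi' in the upper half. */
static uint16_t pair(uint32_t a, int hi, int lo)
{
    return (uint16_t)((a_byte(a, hi) << 8) | a_byte(a, lo));
}

uint16_t src_a0(uint32_t a)   { return pair(a, 1, 0); }  /* A0: bytes 0, 1 */
uint16_t src_a1(uint32_t a)   { return pair(a, 3, 2); }  /* A1: bytes 2, 3 */
uint16_t src_a0sw(uint32_t a) { return pair(a, 2, 1); }  /* A0SW: bytes 1, 2 */
uint16_t src_a1sw(uint32_t a) { return pair(a, 0, 3); }  /* A1SW: bytes 0, 3 */

/* RZ on a sourced pair: shift right one byte, zero-filling the upper byte. */
uint16_t rz(uint16_t pair_value) { return (uint16_t)(pair_value >> 8); }
```

With A = 44332211 (hex), this model sources 2211 for A0, 4433 for A1, 3322 for A0SW, and 1144 for A1SW.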

















Index compare reg0 = index 0
Index compare reg0 ≠ index 0
Index compare reg1 = index 1
Index compare reg1 ≠ index 1
Index compare reg2 = index 2
Index compare reg2 ≠ index 2
Index compare reg3 = index 3
Index compare reg3 ≠ index 3
Fig. 2.9. Conditional Mask Functions Implemented on Simulator.
the condition in Mask Register 3, an AD is placed in the CND column. Furth­
ermore, the AD must be placed at least two instructions after an increment
 
or decrement of the register in question. If the condition tested is true,
 
the current instruction is executed.
 
The ability to conditionally execute a statement enables a conditional
 
program jump. Recall that the basic form for a TC statement is:
 
TC $HHHH DST0 DST1
 
If DST0 is the MAR (Memory Address Register), then after execution of the
 
next statement, the Flexible Processor will do a conditional jump to the
 
value indicated by the hex constant, which can be a program label.
 
The following is an example of a conditional jump to hex address 1234:
 
TC $0001 NOP CMR3
 
TC AD $1234 MAR NOP
 
The first statement will set the condition mask, while the second statement
 
will jump to memory location 1234 if IDX0 = ICR0. To do an unconditional
 
program jump, omit the AD. The following:
 
TC NEXT MAR NOP
 
will jump to the program label NEXT. Since the MAR and instruction fetch of
 
the Flexible Processor are buffered, it is impossible to do an immediate
 
program jump. This adds little complication to the programming, except that
 
the step to be executed before the jump is placed after the actual jump
 
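statement. It is very important, when reading source code for the machine, to remember that the statement written immediately after a jump is executed before the jump takes effect; the written order and the order of execution are reversed.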
 
The Flexible Processor contains two program status words. One can be user loaded and is called PAST. The other contains the current program status word and is called NOW.
 
2.2.10 Subroutine Calls, Program Jumps, and the Stack
 
As shown in Figure 2.3, there is a 16-by-12-bit stack called the return jump stack. This is a typical buffer which is used to hold return addresses as well as temporary data. As indicated in Figure 2.4, there is a field labeled RJ, which controls the return jump stack. There are three possible commands for the stack. SR (SubRoutine jump) will take the current value of the MAR (which is pointing to the next statement), increment it by one, and store the result on the top of the stack. This will be the return address. JP (JumP return) takes the current top of stack and places it in the MAR. DF (Delete First item) will delete the top of the stack. The JP does not perform the delete function. Another feature of the SR, JP, and DF is that they all trap out interrupts.
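The three RJ commands can be summarized with a small Python model of the stack. The class and method names are hypothetical, interrupt trapping is not modelled, and the overflow behaviour (silently discarding the oldest entry) is an assumption: the report says only that the stack is 16 entries deep and that no error is flagged on overflow.

```python
class ReturnJumpStack:
    """Hypothetical model of the 16-entry, 12-bit return jump stack.

    SR pushes MAR + 1, JP reads the top of stack for the MAR without
    popping it, and DF deletes the top entry. Assumption: on overflow the
    oldest entry is silently lost (no error is raised).
    """

    DEPTH = 16
    MASK = 0xFFF  # entries are 12 bits wide

    def __init__(self):
        self.entries = []

    def sr(self, mar):
        """SR: MAR already points at the next statement; push MAR + 1."""
        if len(self.entries) == self.DEPTH:
            self.entries.pop(0)  # assumed overflow behaviour, no error raised
        self.entries.append((mar + 1) & self.MASK)

    def jp(self):
        """JP: the value placed in the MAR; the stack is left unchanged."""
        return self.entries[-1]

    def df(self):
        """DF: delete the top of stack."""
        self.entries.pop()


stack = ReturnJumpStack()
label = 0x100
stack.sr(label + 1)     # SR issued at `label`, MAR already at label + 1
print(hex(stack.jp()))  # prints 0x102, i.e. label + 2
stack.df()              # JP alone does not pop, so a DF follows it
```

Keeping JP and DF separate matches the text: a return is written as a JP followed by a DF, as in the return sequence shown later in this section.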
 
The MAR is buffered, so all operations that seem to be performed on the
 
MAR are actually performed on the buffer. One program cycle is needed to
 
dump the buffer into the MAR. This makes the micro-assembly language some­
what confusing, as the Flexible Processor will execute the statement immedi­
ately following any modification to the MAR. For simplicity, the examples
 
use a NOP following a jump. In actual practice, however, this will be re­
placed by a statement that is more productive.
 
A typical subroutine jump looks like the following:
 
(Fields) type RJ $HHHH DST0 DST1 
label TC SR $1234 MAR NOP 
TC NOP NOP NOP 
The above routine will store label+2 on the stack, execute the NOPs, and
 





(Fields) TYPE RJ $HHHH DST0 DST1 
TC JP NOP NOP NOP 
TC DF NOP NOP NOP 
This will take the top of stack, place it in the MAR, and then delete the
 
top of stack. Since the CND field is valid on all types of instructions, it is possible to do a conditional subroutine jump just by placing the condition in the CND field. The result looks like:
 
(Fields) TYPE CND RJ $HHHH DST0 DST1
 
TC AD SR $1234 MAR NOP
 
If the AD condition is true, this will store the return address, execute the next statement, and then continue execution at location 1234.
 
2.2.11 The Hardware Multiply
 
The only remaining functional unit to be discussed is the hardware mul­
tiply. As shown in Figure 2.3, the inputs are the P and Q registers, which
 
are each 16 bits in length. The result of the multiply is a 16 bit product,
 
which can be the result of the multiplication of any two bytes. This is the
 
only case where the same byte can be sourced twice. The mnemonics for the addressing are L for the lower byte and U for the upper byte. Thus, to multiply the lower byte of the P register by the upper byte of the Q register, a PLQU would be placed in the MULT field. Caution must be taken
 
when a multiply is initiated. A multiply takes two machine
 
cycles before the result can be sourced. If an interrupt is received
 
before the result is ready, the result will be lost. To prevent
 
such loss, it is necessary to trap out all interrupts. This is accom­
plished as follows: whenever a multiply is done, an SR is placed in
 
the RJ column of the first statement of the multiply, and a DF is placed
 
in the RJ column of the second multiply statement. The net result is to push a return address onto the stack and then pop it off the stack, which traps out interrupts as needed. Further caution must be taken in that the RJ stack is only 16 entries deep, so overflow is possible.
 
If overflow occurs, no error will be flagged. The following is a routine
 
to square the lower byte of the Q register.
 
(Fields) type RJ MULT $HHHH SRC0 DST0 SRC1 DST1 
TC SR QLQL $0057 MAR NOP 
TR DF QLQL MULT F0 MULT F1
This not only does a multiply, but it also does a program jump and traps interrupts all at the same time, showing how this machine obtains very high processing speeds. (Consider that each program step takes .125
 






This rule can be modified to the byte level, yielding the 32-bit result in
 




The two registers in Figure 2.3 labeled BRG0 and BRG1 are the bus registers. Normally these are used for breakpointing. It is possible to use these registers as general purpose registers (if no breakpointing is needed). To write into these registers, BRG0 and BRG1 are put into the respec­







The SH field of an instruction is shown in Figure 2.4. The OEINC,
 
OFINC, and OGINC fields determine what type of shift is to take place (left or right, circular or not, padded with ones or zeroes or data from the program status word). The P field determines the precision of the shift. If
 
the P field is set to S, all of the registers are treated as separate regis­
ters; however, if the P field is set to D (Double precision), the E and the
 
F registers are tied together as one register for the shift. There are com­
mands that not only determine the data to be shifted, but they also control
 
the conditions under which the shifts are done [3].
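The difference between single and double precision shifts can be sketched as follows. The 16-bit register width, treating E as the upper half of the tied E/F pair, and restricting the sketch to a logical left shift with zero fill are all assumptions made for illustration; the report does not give these details, and the circular and ones-padded variants are omitted. The function name is hypothetical.

```python
def shift_left(e, f, count, precision="S", width=16):
    """Logical left shift of the E and F registers with zero fill.

    precision "S": E and F are shifted as separate registers.
    precision "D": E and F are tied into one 2*width-bit register, so bits
    shifted out of F enter the low end of E (E assumed to be the high half).
    """
    mask = (1 << width) - 1
    if precision == "S":
        return (e << count) & mask, (f << count) & mask
    if precision == "D":
        pair = ((e << width) | f) << count
        pair &= (1 << (2 * width)) - 1
        return pair >> width, pair & mask
    raise ValueError("precision must be 'S' or 'D'")


print([hex(r) for r in shift_left(0x0001, 0x8000, 1, "S")])  # ['0x2', '0x0']
print([hex(r) for r in shift_left(0x0001, 0x8000, 1, "D")])  # ['0x3', '0x0']
```

The only difference between the two calls is whether the top bit of F is lost (S) or carried into E (D).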
 
2.2.14 Input/Output to the Flexible Processors
 
Input/Output (I/O) is one of the most complicated parts of the entire
 
Flexible Processor system. I/O must occur in one of the following forms:
 
1. Flexible Processor to host
 
2. Flexible Processor to Flexible Processor
 
3. Flexible Processor to MOS RAM (shared bulk memory).
 
For large amounts of data requiring Flexible Processor to Flexible Processor
 
communication, FP to MOS RAM is the most reasonable form of data transfer.
 
If the high-speed communication link, as described in 2.1.5, is used, there
 
is only a buffer for 16 words of information. This requires very closely
 
timed algorithms, as any error would result in the loss of data. Each Flex­
ible Processor is connected to four 16-bit channels, which are called Direct
 
Storage Access (DSA) channels. Each of the channels is connected to four
 
banks of 250 nsec MOS RAM. Each bank of MOS RAM is addressed by bank and
 
channel. Different banks on various channels may be shared. For example, bank 1 on channel 3 may be the same as bank 2 on channel 1. The Flexible Processor selects the bank and address for each of its channels through four S (Storage location) registers and four B (Bank) registers. Since the RAM memory is much slower than the clock cycle, the read
 
is done in two stages. The first stage sends the address to the MAR of the
 
specified bank. Upon completion of a read, the Flexible Processor will au­
tomatically increment the MAR of the specified bank by one. Within the next
 
four cycles, the data will appear in the Zx register, where x is the channel
 
number (see Figure 2.3). The data will remain in the Zx register until the
 
next read is initiated. In the event of a "memory bank busy," or "data not
 
ready," the Flexible Processor will automatically wait for two machine cy­
cles, after which it will repeat the process. To do a write, the data is
 
sourced directly to the MBR (Memory Buffer Register) of the memory bank
 
corresponding to the bank register. (A write is a one-stage process.) The Flexible Processor is programmed to do I/O through the IO statement type. Figure 2.4 shows the form of the statement. The IO statement is similar to
 
the TR statement in that arithmetic calculations can be done simultaneously
 
with I/O. The following statements show how to initialize the S and B regis­
ters. (The S and B registers are linked together so that they can be load­
ed in one statement.)
 
IO CND IDX RJ MULT ADD SRC0 SRC1 IO CH0 CH1 CH2 CH3 
IO ZRO A0 A1 DS LS LS LS LS 
IO F0 F0 DS LB LB LB LB 
IO DF PLQL MULT MULT DS LSB LSB LSB LSB 
The first statement loads all four S registers with 0000. The second
 
loads all four B registers with the contents of F0. The third loads all
 
four S and B registers with the output of the multiplier. The DS stands for
 
DSA I/O. The leading L in the channel column stands for load.
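A behavioural sketch of one DSA channel may help in following how the S and B registers are used and how banks can be shared between channels. The class and method names are hypothetical. The sketch models only data movement (loading S and B, a read that leaves the word in Zx and then increments the bank address, and a one-stage write), not the four-cycle read latency or the "bank busy" retries. The report does not say whether a write also advances the address, so this sketch leaves it unchanged.

```python
class DSAChannel:
    """Data-movement model of one Direct Storage Access channel."""

    def __init__(self, banks):
        self.banks = banks  # bank number -> list of 16-bit words (MOS RAM)
        self.s = 0          # S register: storage location within the bank
        self.b = 0          # B register: bank selected on this channel
        self.z = None       # Zx register: last word read on this channel

    def load(self, s, b):
        """Load the S and B registers together."""
        self.s, self.b = s, b

    def read(self):
        """Read the addressed word into Zx, then increment the bank address."""
        self.z = self.banks[self.b][self.s]
        self.s += 1
        return self.z

    def write(self, word):
        """One-stage write of a 16-bit word to the addressed location."""
        self.banks[self.b][self.s] = word & 0xFFFF


# Bank 1 on channel 3 is the same physical memory as bank 2 on channel 1.
shared = [0] * 8
ch1 = DSAChannel({2: shared})
ch3 = DSAChannel({1: shared})
ch3.load(0, 1)
ch3.write(0x1234)
ch1.load(0, 2)
print(hex(ch1.read()), ch1.s)  # 0x1234 1
```

Sharing is expressed by letting two channels hold a reference to the same list, so a word written through one channel is visible through the other under a different bank number.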
 
After initializing the S and B registers, the read needs to be ini­
tialized, which is done by placing an R in the channel field of the channel
 




Zx register. To do a write, a W is placed in the channel fields into
 






With I/O, interrupts are often needed. The Flexible Processor has the
 
ability to handle up to 16 different interrupts [2,3]. The Flexible Processor can interrupt itself, the host, and other Flexible Processors. While processing an interrupt routine, the Flexible Processor sets a flip-flop indicating that an interrupt is being processed. This traps out all lower
 
priority interrupts. The interrupt flip-flops are reset when the program
 






This has been an introduction to the parts of the Flexible Processor
 
and the parts of the instruction set that will be used in the Gaussian max­
imum likelihood classifier and contextual classification algorithms dis­
cussed in Chapters 4 and 5. For further documentation, consult the CDC
 
Flexible Processor Textbook [3].
 




Each Flexible Processor has a complicated microprogrammable internal
 
architecture. This was overviewed in Chapter 2. As stated earlier, an ad­
vantage of this microprogrammable architecture is that it allows parallelism
 
at the instruction level. This makes user verification of the correctness of
 
Flexible Processor algorithms and accurate mathematical timing analyses of
 
these algorithms very difficult. Thus, in order to debug, verify, and time
 
Flexible Processor algorithms, a simulator and micro-assembly language assembler for an array of Flexible Processors have been developed. The simulator and assembler run under the UNIX operating system on a PDP-11 series computer, and they have been used successfully to program a maximum likelihood classifier, as discussed in Chapter 4, and a contextual classifier, as discussed in Chapter 5. The simulator displays the contents of the Flexible
 
Processor registers on a terminal screen, in a format demonstrated in Appen­
dix 1. This chapter describes the organization and operation of the simula­
tor.
 
3.2 Organization of the Simulator
 
The Flexible Processor system simulator is based on a single FP simulator developed at Purdue [7]. Its capabilities have been extensively expanded.
 
The current version can simulate up to sixteen Flexible Processors, the
 
maximum number allowed in an actual system. Furthermore, should any design changes take place in the actual system, the simulator can be modified to simulate up to forty-eight Flexible Processors. The current maximum program length is 2000 lines. The simulator occupies approximately 64K bytes of main memory.
The simulator is divided into four programs, all written in C [6], a language much like PL/I or PASCAL. Each of the four programs performs a different task. "Monh.c" is the system monitor, which interfaces the simulator to the user. "EXECh.c" is the simulator proper, which simulates all of the system instructions except the I/O and the shift instructions. "shioh.c" simulates the rest of the instruction set. The final program in the set is "helph.c," which contains a brief help file for the user who is stranded in the monitor routine. In addition, helph.c contains special routines that make the program consistent with all versions of the UNIX operating system. This makes the program portable to any system that supports UNIX and the C programming language. This routine also contains all the paging algorithms that are used, which keeps those routines localized and should ease debugging in the future.
 
3.3 Operation of the Simulator
 
The program structure for a single Flexible Processor simulation can be
 
represented by the control tree diagram in Figure 3.1. All register files
 
are considered indexed registers. The 16-Flexible Processor system is basi­
cally the same tree structure, but there is one more level in the control
 
tree, as shown in Figure 3.2. The structure beneath the command level is
 
the same as for the single Flexible Processor case. If the monitor receives a '#', it will move one node closer to the root of the control tree on any of the branches.
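The '#' rule for moving toward the root of the control tree can be sketched as a small tree walk. The node names below are illustrative only and are not the simulator's actual node labels; the `Node` class and `move` function are hypothetical, and unknown commands simply leave the current node unchanged in this sketch.

```python
class Node:
    """One node of a monitor control tree (node names are illustrative)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}

    def add(self, name):
        child = Node(name, self)
        self.children[name] = child
        return child


def move(current, command):
    """'#' moves one node toward the root; a child name moves down."""
    if command == "#":
        return current.parent if current.parent is not None else current
    return current.children.get(command, current)  # unknown: stay put


root = Node("system")
fp0 = root.add("FP 0")
command_level = fp0.add("command")
single_step = command_level.add("single step")

node = single_step
for _ in range(4):
    node = move(node, "#")
print(node.name)  # system
```

In the 16-processor case the extra level in Figure 3.2 corresponds here to the per-processor nodes between the root and each command level.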
 
In the Command Level, there are ten possible commands, which are shown
 









































Fig. 3.2. System Flexible Processor Simulator Control Tree. 
of one program step and will move to the single step node. Figure 3.4
 
gives the command set for the single step node. If m is typed, the only valid arguments are a '#' or a register name. The monitor will print
 
the old value of the register and ask for a new one if the register named
 
is a single register. If the register selected is a register file, the
 
monitor will ask for the index. Upon receiving the index, the monitor
 
will print the old value and prompt the user for input. Valid commands are
 




These are all of the functions supported by the simulator at this time.
 
Appendix 2 contains flowcharts outlining the operation of the simula­
tor. Appendix 3 contains a source listing of the simulator. As mentioned
 
previously, the maximum likelihood classifier and a contextual classifier
 






At the beginning of every major portion of program code, there are com­
ments describing the program flow and variables. This should facilitate
 
understanding of the routine and future simulator modifications, as it
 
translates the routine from a computer language into English.
 
3.5 Changes to Increase Speed
 
Normally, output to the terminal is done one character at a time. This
 
requires the program to generate an interrupt to the operating system for
 
each character to be displayed. The operating system then checks several
 
flags, adds special characters where needed, awakens the device driver,
 







s 	 Single step program.
 
m Go to memory level.
 
L Load assembled object code.
 
t Print the contents of the
 
registers after the input
 




v 	 Save the current register
 




e XXX 	 Execute XXX program steps.
 
stop Exit from monitor routine.
 
! unix Execute system command.
 
# Move up one node.
 
p Print out all the registers.
 




Fig. 3.3. Simulator Commands.
 
s Single step program.
 
m Go to memory level.
 
e XXX Execute XXX program steps.
 
# Move up one node to command level.
 
p Print out all the registers.
 
h,H Print out the help file, followed
 
by the name of the current node.
 
dtemp Display the contents of temporary files.
 
dlarge Display the contents of large files.
 
dmem Display the contents of micro-memory.
 
Fig. 3.4. Single Step Commands.
 
Changes the old values to XXX.
 
Increments the index without changing the old value.
 
Decrements the index without changing the old value.
 
Return to original level (either the command level
 
or the single step level).
 





The output from a single execution step requires exactly one screen, which is 3370 characters. Buffering is done so that the computer handles the interrupt routine once per screen instead of once per character. The only change in the interrupt routine is that instead of displaying one character, the computer displays 3370. This reduces the load on the system by 3369 interrupt routines per screen of output. Most of the time required for output
 
is not due to the physical transfer of data; rather, it is due to the other
 
areas of the interrupt routine. The net result is that the simulator output
 
is over 3300 times faster with buffering than without. While the different
 
command levels require different size buffers, the average increase in speed
 
due to buffering is 4500%.
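The buffering idea can be sketched in a few lines: characters are collected and handed to the operating system once per full screen instead of once per character. The `write` callable stands in for a single output system call, and the class name is hypothetical; the 3370-character screen size is taken from the text.

```python
class ScreenBuffer:
    """Collect output characters and hand them to `write` once per screen."""

    def __init__(self, write, screen_size=3370):
        self.write = write              # stand-in for one OS output call
        self.screen_size = screen_size  # characters in one simulator screen
        self.pending = []

    def put(self, ch):
        self.pending.append(ch)
        if len(self.pending) == self.screen_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.write("".join(self.pending))
            self.pending = []


calls = []
buf = ScreenBuffer(calls.append)
for ch in "x" * 3370:  # one full screen of output
    buf.put(ch)
print(len(calls), 3370 - len(calls))  # 1 3369
```

The second number printed is the count of output calls avoided for one screen, matching the 3369 interrupt routines per screen quoted above.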
 
The PDP-11 series computer uses 16 address bits; thus the maximum
 
amount of data address space is limited to 65,536 bytes. Each simulated
 
Flexible Processor memory and registers require approximately 60,000 bytes,
 
so a special paging routine was written to page the simulated Flexible Processor memories and registers in and out of main memory as required. Output to disk is done in units of 65,536 bytes instead of units of 1 byte, which allows the swapping routine that exchanges part of the Flexible Processor memories to run in 1 second. Without buffering, this routine took 2.5 hours of straight
 
transfer time. Originally, this program required the total computing power
 




In a high level language, such as C, PL/I, FORTRAN, or PASCAL, one pro­
gram step corresponds to many machine steps. To minimize the number of
 
machine steps per program step, frequently used variables were placed in the
 





requires the computer to load the variable C from memory. The machine then loads C into a register, increments the register, and stores C back into its original location. Frequently accessed variables are placed in a register, so fewer memory fetches are necessary. This often reduces the number of executed steps by three. When C is not used, C is stored and
 
accessed in the usual manner. Thus, the hardware of the computer was used
 
to obtain maximum throughput.
 
3.6 Flexible Processor Micro-Assembler
 
The micro-assembler [7] takes the Flexible Processor micro-assembly language and translates it into machine micro-code. A microprogram must end with a # to signal the end of input to the micro-assembler. After the micro-assembler is invoked, it prompts the user for the input file. When it is finished, it will move the assembled output to a file called "object," which can then be loaded into the simulator via the load command. A source 




The Flexible Processor micro-assembler and simulator are operational
 
and have been used. Up to 16 Flexible Processors can be simulated. The
 
current versions do not include Flexible Processor-host, inter-Flexible Pro­
cessor (ring), and Flexible Processor-bulk memory communications.
 
4. FLEXIBLE PROCESSOR SYSTEM IMPLEMENTATIONS OF
 




To demonstrate the use of a Flexible Processor system on a task less
 
complex than the contextual classifier, consider the analysis of Landsat
 
data using a Gaussian maximum likelihood classifier. Landsat measurements
 
are taken from four spectral bands and are received by the Flexible Proces­
sor as a data vector. Based on decision theory akin to that developed in
 
the contextual classifier model, the vector is classified by determining the
 
probability that it belongs to each information class and assigning it to
 
the class for which this probability is maximum.
 
The way in which a Flexible Processor may be used in implementing a
 
Gaussian maximum likelihood classifier is demonstrated below. The tech­
niques described are to be extended to the contextual classification algo­
rithm.
 
In Section 4.2, methods for implementing the maximum likelihood classifier on a Flexible Processor array are presented. The ways in which the contextual classifier can be implemented on a Flexible Processor array are
 
presented in Chapter 5.
 
4.2 Implementation of the Maximum Likelihood
 




Two methods for implementing the maximum likelihood classifier on an
 
Flexible Processor array are discussed. The first assigns to each Flexible
 
Processor a different set of classes, and each Flexible Processor processes
 
all pixels for its assigned classes. The second method assigns to each
 
Flexible Processor a different subimage, and each Flexible Processor
 
processes the pixels in its subimage for all classes. The basic matrix
 
operations are the same for both methods.
 
The ability to do a fast matrix multiply is at the heart of efficiently
 
implementing the maximum likelihood classifier. The form for the matrix
 






where $X$ is the data vector, $U_i$ is the mean vector for the ith class, and $C_i$ is the covariance matrix [10] for the ith class.
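The matrix product at the core of this classifier, the quadratic form $(X-U_i)^t C_i^{-1} (X-U_i)$, can be sketched in Python. The loop order follows the "sum"/"hold" scheme described in Section 4.2.2: one column of $C_i^{-1}$ is processed at a time, so only two scalars have to be carried between columns. Plain lists stand in for the Flexible Processor's temporary and large files, the floating point format is ignored, and the function name is hypothetical.

```python
def quadratic_form(x, u, c_inv):
    """(x-u)^t C^-1 (x-u) using the column-by-column "sum"/"hold" order."""
    n = len(x)
    d = [x[k] - u[k] for k in range(n)]  # X - U_i
    hold = 0.0
    for j in range(n):                   # one column of C^-1 at a time
        s = 0.0                          # "sum", re-initialized to 0
        for i in range(n):
            s += d[i] * c_inv[i][j]      # (X - U_i)^t times column j
        hold += s * d[j]                 # weight by the jth element of X - U_i
    return hold


print(quadratic_form([1.0, 2.0], [0.0, 0.0], [[2.0, 0.0], [0.0, 1.0]]))  # 6.0
```

The result equals the ordinary double sum over all i and j of $(X-U_i)$ elements times $C_i^{-1}$ entries; only the grouping of the additions differs.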
 
4.2.2 Subset of Classes for Each Processor Method
 
Consider the use of the Flexible Processor array to perform these clas­
sifications using the first method. Assume there are m distinct classes and
 
the computer contains p Flexible Processors. Each Flexible Processor is as­
signed to process m/p classes. The large file in each Flexible Processor is
 
initialized with the inverse of the covariance matrix and mean vector for
 
each class it was assigned. The current data vector is stored in each Flex­
ible Processor in the temporary file. When a new data vector is loaded into
 
a Flexible Processor, it overwrites the previous one. For simplicity, but
 
without loss of generality, in the following assume that m = p. If m is
 
greater than p, then in each Flexible Processor instead of applying just one
 
inverse covariance matrix to the data set several would be applied. This
 








, creating a new vector. This vector would then be multiplied by $(X-U_i)$, resulting in a scalar. In our implementation, the order has been somewhat altered. $(X-U_i)^t$ is multiplied by a column of $C_i^{-1}$, accumulating in a variable called "sum." After this is done for column j of $C_i^{-1}$, "sum" is multiplied by $(X-U_i)_j$ (the jth element of $(X-U_i)$), accumulating the result in a variable called "hold" and re-initializing "sum" to 0 [1]. The following is
 






















where N is the dimension of the covariance matrix, X[I] is the Ith element of the input vector, and C[I,J] is the element in the Ith row and Jth column of the covariance matrix. At the end of the routine, the value contained in the
 
"hold" variable is the desired scalar. This algorithm requires fewer stores
 
and fetches than the standard algorithm, so it shortens the run time of the
 
process. All pointers are kept in the index register, further simplifying
 
the process. Temporary file locations are used for sum and hold, so the
 






One way to perform this algorithm is to have the host send $C_i^{-1}$ and $U_i$ to Flexible Processor i. The host then sends the current data vector to Flexible Processor 0, then 1, etc. When a processor receives the data vector, it calculates "hold." After the host gives all Flexible Processors the data for pixel (i,j), it waits until Flexible Processor 0 has calculated the value for its "hold." The host then retrieves the value of "hold," loads Flexible Processor 0 with the data vector for the next pixel, and adds a precomputed constant to calculate the discriminant function. The host executes this process for all Flexible Processors. After the last Flexible Processor has transmitted the result, the host does a compare and stores the
Processor has transmitted the result, the host does a compare and stores the
 
class index corresponding to the maximum of the discriminant values computed
 
for this pixel. Thus, the compares and adds are done by the host while the
 




This maximum likelihood classifier implementation has been programmed on a simulator for a Flexible Processor array at the Laboratory for Applications of Remote Sensing. The simulator displays the contents of the main registers and provides a variety of tools for debugging Flexible Processor microcode, as discussed in Chapter 3.
 
Allowing 40 Flexible Processor machine cycles for each floating point addition and 9 Flexible Processor machine cycles for each floating point multiply*, the number of machine cycles is as follows, where j = number of pixels and n = number of measurements (size of data vector):

load covariance matrix:
load and normalize data vector: $42jn + j$
inner loop of algorithm: $56jn^2$
outer loop of algorithm: $61jn$
total: $56jn^2 + 103jn + 4n^2 + 2n + j + 9$

*Floating point numbers had an eight-bit exponent and 16-bit mantissa.

This assumes that m, the number of classes, equals p, the number of processors. If m is greater than p, the runtime may be approximated by multiplying by $\lceil m/p \rceil$.
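The cycle estimate can be evaluated with a short function. It uses the total given above, reading the stray i in the printed total as j (the number of pixels), and applies the suggested scaling by $\lceil m/p \rceil$ when there are more classes than processors. The 125-nsec step time is taken from earlier in the report; the function and constant names are hypothetical, and the estimate inherits the assumed 40-cycle addition and 9-cycle multiply.

```python
import math

CYCLE_SECONDS = 125e-9  # one Flexible Processor program step (125 nsec)


def ml_cycles(j, n, m=1, p=1):
    """Machine-cycle estimate for j pixels of n measurements each.

    Uses 56jn^2 + 103jn + 4n^2 + 2n + j + 9 for m = p, scaled by
    ceil(m/p) when there are more classes (m) than processors (p).
    """
    base = 56 * j * n**2 + 103 * j * n + 4 * n**2 + 2 * n + j + 9
    return base * math.ceil(m / p)


print(ml_cycles(j=100, n=4))              # 130981
print(ml_cycles(j=100, n=4, m=24, p=16))  # 261962
seconds = ml_cycles(j=100, n=4) * CYCLE_SECONDS  # estimated run time
```

Because the $56jn^2$ term dominates, the estimate grows roughly with the square of the number of spectral bands.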
 
Preliminary tests indicate that a single Flexible Processor will perform a maximum likelihood classification faster than a PDP-11/70 with floating point hardware. Exact comparisons of the Flexible Processor array performance with other systems are difficult without detailed information about
 
factors such as pre- and/or post-processing of the data not included in the
 
computation time, data precision used, memory load time, etc. However, to
 
give a general idea of the effectiveness of this approach, consider a clas­
sification of 256-by-256 pixels of Landsat data (n=4) using 16 classes and a
 
complete array of 16 Flexible Processors (and a host machine). The total
 
processing time is approximately 10.4 sec. ESL states that their array processor is up to 25 times faster than the IBM 370/158; for the classification of four channels into eight classes, their time is 6.3 sec.
 
4.2.3 Subset of Pixels for Each Processor Method
 
An alternative method to perform the pointwise maximum likelihood clas­
sification of pixels using a Flexible Processor array is based upon having
 
each Flexible Processor perform the maximum likelihood classification for a different section of the image. Recall that the contextual classifier performs
 
computations similar to those used by the maximum likelihood classifier, but
 




Consider performing a maximum likelihood classification on an A-by-B image with N Flexible Processors. One way to approach the problem is to
 
divide the image into N subimages and have each Flexible Processor perform
 
the maximum likelihood classification for all pixels in its subimage. This
 
is shown in Figure 4.1. If all subimages have the same number of pixels,
 
then the Flexible Processors will be fully utilized and the classification
 
of the entire image will take approximately 1/N as much time as it would
 
take a single Flexible Processor to perform the entire classification.
 
Thus, maximum improvement, i.e., a factor of N, is obtained.
 
Consider the case in which each subimage does not contain the same num­
ber of pixels, which will occur when (A*B)/N is not an integer. This will
 
lead to underutilization of the Flexible Processors, but, as will now be shown, this underutilization will be negligible.
 
One way to approach this situation is as follows. To each of N-1 Flexible Processors, assign a subimage of size

$$\lceil (A * B)/N \rceil$$

where $\lceil x \rceil$, the ceiling of x, is the smallest integer greater than or equal
 




For example, if A=117, B=196, and N=16, then
 
$\lceil (A * B)/N \rceil = \lceil 22932/16 \rceil = \lceil 1433.25 \rceil = 1434$
pixels are in each subimage associated with 15 Flexible Processors. The
 









An A-by-B Image Divided
 





are associated with one Flexible Processor. This sixteenth Flexible Processor will have fewer pixels to classify and thus will finish before the other Flexible Processors (assuming that, on average, the time for the floating point calculations is approximately the same for all pixels). This implies some underutilization of the sixteenth Flexible Processor, since it must sit idle waiting for the others to finish. Ideally, a factor of N=16 performance improvement over a single Flexible Processor is desired, which, in this case, would require all 16 Flexible Processors to each classify 1434 pixels. To compute the utilization of the Flexible Processor array, divide the number of pixels actually classified by the maximum number that could be classified in the same amount of time if all 16 Flexible Processors were fully util­






Therefore, a factor of 99+% of N improvement is obtained.
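The pixel assignment and the utilization figure for the example A=117, B=196, N=16 can be checked with a short sketch. The function names are hypothetical; integer ceiling division avoids floating point rounding, and the sketch rejects images so small that the last processor would be left with a negative share.

```python
def subimage_sizes(a, b, n):
    """Pixels per processor: N-1 get ceil(A*B/N), the last gets the rest."""
    total = a * b
    size = -(-total // n)  # integer ceiling of total / n
    last = total - size * (n - 1)
    if last < 0:
        raise ValueError("image too small for this assignment")
    return [size] * (n - 1) + [last]


def utilization(a, b, n):
    """Pixels classified / pixels N fully busy processors could classify."""
    sizes = subimage_sizes(a, b, n)
    return (a * b) / (max(sizes) * n)


sizes = subimage_sizes(117, 196, 16)
print(sizes[0], sizes[-1], round(utilization(117, 196, 16), 4))  # 1434 1422 0.9995
```

The sixteenth processor receives 22932 - 15 * 1434 = 1422 pixels, so it idles for the time of only 12 pixels.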
 
In general, using the above assignment of pixels to subimages, the
 




$(A * B)/(\lceil (A * B)/N \rceil * N)$
 
The maximum value of the denominator is A*B+N-1 and occurs when A*B = k*N+1, where k is an arbitrary integer. Therefore,

$$\min\left[(A * B)/(\lceil (A * B)/N \rceil * N)\right] = (A * B)/(A * B + N - 1).$$
Based on typical sizes of remotely sensed images and assuming that the max­
imum size of a Flexible Processor array is 16,
 
A * B > 10 *N, 
and
 
(A * B)/(A * B + N - 1) > 99%.
 
Thus, in general, the worst case performance is 99+% of the ideal factor of
 
improvement over a single Flexible Processor.
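The worst-case bound can also be checked numerically for N = 16. This is a brute-force check over a range of image sizes, not a proof, and the function names are hypothetical. It compares the padded utilization with the bound (A*B)/(A*B + N - 1) and confirms that the two coincide when A*B = k*N + 1.

```python
def padded_utilization(total, n):
    """A*B / (ceil(A*B/N) * N) for an image of `total` pixels."""
    return total / (-(-total // n) * n)


def worst_case_bound(total, n):
    """(A*B) / (A*B + N - 1), the lower bound derived above."""
    return total / (total + n - 1)


def bound_holds(n, lo, hi):
    """Check the bound for every image size in [lo, hi)."""
    return all(padded_utilization(t, n) >= worst_case_bound(t, n)
               for t in range(lo, hi))


n = 16
print(bound_holds(n, 1, 20000))  # True
print(padded_utilization(100 * n + 1, n) == worst_case_bound(100 * n + 1, n))  # True
```

For A*B = 1601 the utilization is 1601/1616, just above 99%; larger images push the worst case closer to 100%.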
 
In Appendix 5, the maximum likelihood classifier programs for this
 
Flexible Processor implementation are described. Included are the routines for floating point arithmetic, using a 14-bit exponent and a 16-bit mantissa. The current algorithm, which runs on the simulator described in Chapter 3, uses 3526 125-nsec steps to process one pixel (a four-component floating point data vector) and two classes, including choosing the maximum value. Performing a two-class maximum likelihood classification on 400 pixels of actual Landsat data (as used in the tests described in [12]), a single Flexible
 
Processor averaged 410 microseconds per pixel (including the time to move
 
the image data from the bulk memory to the processor). Thus, if 16 Flexible
 
Processors were used, each with its own bulk memory, an effective processing
 




Two methods of calculating a Gaussian maximum likelihood classifier
 
have been discussed. The timings for both algorithms have been discussed.
 
The first method presented requires the host to do much data collection,
 
while the second does not. It was shown that the second method provides
 
high utilization of the Flexible Processors. The actual micro-assembly
 
language program for the second method is presented in Appendix 5.
 
In the next chapter, the way in which a parallel processing system such
 





5. FLEXIBLE PROCESSOR IMPLEMENTATION OF
 




This chapter explores the actual implementation of a contextual clas­
sifier. Section 5.2 briefly describes the contextual classifier approach.
 
Section 5.3 gives serial algorithms for performing it. Section 5.4 presents
 
a Flexible Processor program to implement the contextual classification al­
gorithm with a simple size three neighborhood and an image size such that
 
the number of rows is a multiple of the number of Flexible Processors in the
 
system. The use of the Flexible Processor system to implement a general
 
contextual classifier is explored in Section 5.5.
 
5.2 The Contextual Classifier
 
The image data to be classified are assumed to be a two-dimensional I-by-J array of multivariate pixels. Associated with the pixel at "row i" and "column j" is the multivariate measurement n-vector $X_{ij} \in R^n$ and the true class of the pixel $\theta_{ij} \in \Omega = \{\omega_1, \ldots, \omega_c\}$. The measurements have class-conditional densities $f(X \mid \omega_k)$, k = 1, 2, ..., c, and are assumed to be class­



In order to incorporate contextual information into the classification process, when each pixel is to be classified, p-1 of its neighbors are also examined. This neighborhood, including the pixel to be classified, will be referred to as the p-array. Intuitively, to classify each pixel, the contextual classifier computes the probability of the given observed pixel being in class k by also considering the measurement vectors (values) observed for the neighbor pixels in the p-array. Specifically, for each pixel and for each class in $\Omega$, a discriminant function g is calculated by summing the weighted probabilities of the p-1 neighbor pixels occurring in all possible classification states. This is described below mathematically for pixel (i,j) being in class $\omega_k$. The description is followed by an example to clarify the notation used. Further details may be found in [10,11,13].
 
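$$g_k(X_{ij}) = \sum_{\substack{\theta_{ij} \in \Omega^p \\ \theta_{1ij} = \omega_k}} G_p(\theta_{ij}) \prod_{\ell=1}^{p} f(X_{\ell ij} \mid \theta_{\ell ij})$$

where

$X_{\ell ij}$ is the measurement vector from the $\ell$th pixel in the p-array (for pixel (i,j)),

$\theta_{\ell ij}$ is the class of the $\ell$th pixel in the p-array (for pixel (i,j)), with $\theta_{ij} = (\theta_{1ij}, \ldots, \theta_{pij})$, and

$G_p(\theta_{ij}) = G_p(\theta_1, \theta_2, \ldots, \theta_p)$ is the a priori probability of observing the p-array $\theta_1, \theta_2, \ldots, \theta_p$.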
 
Within the p-array, the pixel locations may be numbered in any convenient
 




To clarify the computation of the discriminant function, consider the following example. Let the context array (neighborhood) be the p=3 (two nearest neighbors) choice shown in Figure 5.1, with the pixels numbered such that the pixel (i,j) to be classified is associated with $X_1$ and $\theta_1$, pixel (i,j-1) is associated with $X_2$ and $\theta_2$, and pixel (i,j+1) is associated with $X_3$ and $\theta_3$. Assume there are two possible classes: $\Omega = \{a, b\}$. Then the discriminant function for class b is explicitly
 
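$$\begin{aligned}
g_b(X_{ij}) &= \sum_{\theta_2 \in \Omega} \sum_{\theta_3 \in \Omega} f(X_1 \mid b)\, f(X_2 \mid \theta_2)\, f(X_3 \mid \theta_3)\, G_3(b, \theta_2, \theta_3) \\
&= f(X_1 \mid b) f(X_2 \mid a) f(X_3 \mid a) G_3(b,a,a) + f(X_1 \mid b) f(X_2 \mid a) f(X_3 \mid b) G_3(b,a,b) \\
&\quad + f(X_1 \mid b) f(X_2 \mid b) f(X_3 \mid a) G_3(b,b,a) + f(X_1 \mid b) f(X_2 \mid b) f(X_3 \mid b) G_3(b,b,b)
\end{aligned}$$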
 
Note that $G_3(\theta_{ij}) = G_3(\theta_1, \theta_2, \theta_3)$ is the relative frequency of occurrence in the scene of the specific neighborhood configuration $(\theta_1, \theta_2, \theta_3)$. After computing the discriminant functions $g_a$ and $g_b$ for pixel (i,j), pixel (i,j) is assigned to the class which has the largest discriminant function value.
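To make the sum concrete, here is a minimal C sketch of $g_k$ for the p = 3 horizontal neighborhood, assuming the class-conditional densities of the three pixels have already been computed. The names `discriminant`, `classify`, and `NCLASS` are illustrative, not taken from the report, and `G` is indexed in the pixel numbering of the example (center pixel first, then left, then right).

```c
#include <assert.h>

#define NCLASS 2   /* two classes, a = 0 and b = 1, as in the example */

/* g_k for a horizontally linear p = 3 neighborhood.
   f1, f2, f3: class-conditional densities f(X_l | class) for the center
   pixel (i,j), the left neighbor (i,j-1), and the right neighbor (i,j+1).
   G[t1][t2][t3] = G_3(theta_1, theta_2, theta_3), theta_1 = center class. */
double discriminant(int k, const double f1[NCLASS], const double f2[NCLASS],
                    const double f3[NCLASS], double G[NCLASS][NCLASS][NCLASS])
{
    assert(k >= 0 && k < NCLASS);
    double sum = 0.0;
    for (int t2 = 0; t2 < NCLASS; t2++)       /* every class of the left pixel  */
        for (int t3 = 0; t3 < NCLASS; t3++)   /* every class of the right pixel */
            sum += f1[k] * f2[t2] * f3[t3] * G[k][t2][t3];
    return sum;
}

/* Assign pixel (i,j) to the class with the largest discriminant value. */
int classify(const double f1[NCLASS], const double f2[NCLASS],
             const double f3[NCLASS], double G[NCLASS][NCLASS][NCLASS])
{
    int best = -1;
    double value = -1.0;
    for (int k = 0; k < NCLASS; k++) {
        double g = discriminant(k, f1, f2, f3, G);
        if (g > value) { value = g; best = k; }
    }
    return best;
}
```

For example, take a center pixel whose densities slightly favor a (0.5 vs. 0.4), two neighbors that strongly favor b (0.1 vs. 0.9), and a G that puts probability 0.4 on each homogeneous triple (a,a,a) and (b,b,b) and 0.2/6 on each of the other six. Then `classify` returns b ($g_a = 0.0185$, $g_b \approx 0.132$): the context outweighs the center pixel's own evidence.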
 
5.3 Serial Implementation of a Contextual Classifier
 
Algorithm 1, shown in Figure 5.2, is one way to implement the contextual classifier. The particular classifier considered here uses a horizontally linear p-array of size three, as shown in Figure 5.1.

First consider the main loop. Let the original image to be classified be an I-by-J array called A. Columns 0 and J-1, the two side edges of the image, are not classified, since these pixels do not have both left and right neighbors. The variable "value" will contain the maximum "g"
 
Fig. 5.1. Horizontally Linear Neighborhoods.
 
Main Loop
for i = 0 to I-1 do                    /* row */
begin
    for j = 1 to J-2 do                /* column */
    begin
        value = -1                     /* max "g" */
        class = -1                     /* class with max "g" */
        for k = 1 to C do              /* for each class */
        begin
            current = g(i,j,k)
            if current > value
            then begin
                value = current
                class = k
            end
        end
    end
end

Discriminant Function Calculation
function g(i,j,k)
begin
    sum = 0
    for r = 1 to C do                  /* all possible classes of pixel (i,j-1) */
        for q = 1 to C do              /* all possible classes of pixel (i,j+1) */
            sum = compf(i,j-1,r)*compf(i,j,k)*compf(i,j+1,q)*G(r,k,q) + sum
    return (sum)
end

Class-Conditional Density Calculation
function compf(a,b,k)
begin
    x = A(a,b)                         /* measurement vector of pixel (a,b) */
    expo = -.5 * [log|Σk| + (x-mk)^T Σk^-1 (x-mk)]
    return (e^expo)
end

Fig. 5.2. Algorithm 1.
 





(discriminant function) value calculated for pixel (i,j). This variable may be updated as the "g" for each class is calculated. The variable "class" is the class associated with "value." In the main loop, "g(i,j,k)" is a call to a function that calculates the discriminant function for pixel (i,j) and class k. This function is called I*(J-2)*C times, once for each class and for each pixel being classified.

Consider the calculation of g(i,j,k). The class of pixel (i,j) is held constant at k, while all other possible class assignments are considered for pixels (i,j-1) and (i,j+1). For each assignment of classes to the pixels neighboring pixel (i,j), of which there are C*C, the product of the class-conditional densities ("compf") is weighted by "G(r,k,q)," the a priori probability of observing the 3-array $(\omega_r, \omega_k, \omega_q)$. The "G" array is predetermined and prestored. For each call "g(i,j,k)," the value of "sum" for that i, j, and k is calculated. "Sum" is then returned as the value of "g(i,j,k)." In this straightforward version of the g(i,j,k) routine, the function that computes a class-conditional density ("compf") is called three times for each of the C*C class assignments, i.e., 3*C*C times each time "g" is called.

Now consider the "compf" routine. This calculates the class-conditional density for pixel (a,b) and class k using the following equation:

$$f(X_{ab} \mid \omega_k) = \exp\left\{-\tfrac{1}{2}\left[\log\lvert\Sigma_k\rvert + (X_{ab} - m_k)^T\, \Sigma_k^{-1}\, (X_{ab} - m_k)\right]\right\}$$

where the measurement vector for each pixel is of size four, $\Sigma_k^{-1}$ is the inverse covariance matrix for class k (a four-by-four matrix), $m_k$ is the mean vector for class k (a size four vector), "T" indicates the transpose, and "log" is the natural logarithm. The constant factor $(2\pi)^{-n/2}$ of the Gaussian density is omitted; it is the same for every class and therefore does not affect the classification. For each class, the algorithm uses $\log\lvert\Sigma_k\rvert$, $\Sigma_k^{-1}$, and $m_k$ as precomputed constants. For each call "compf(a,b,k)," the value of $e^{\text{expo}}$ for that a, b, and k is calculated and returned as the value of "compf(a,b,k)."

Algorithm 1 executes the "compf" subroutine $3I(J-2)C^3$ times. Since each pixel has only C "f"s (class-conditional densities), each density is recomputed roughly $3C^2$ times, so this approach is inefficient by a factor on the order of $C^2$. Algorithm 2 rectifies this problem by saving certain "f" values rather than recalculating them.
 
Algorithm 2, shown in Figure 5.3, implements the contextual classifier without the redundant executions of "compf" that occur in Algorithm 1. Let X, Y, and Z correspond to the pixels (i,j-1), (i,j), and (i,j+1), respectively, where (i,j) is the pixel to be classified. Each of X, Y, and Z is a vector of size C. Element t of X contains the class-conditional density ("compf") of the current (i,j-1) pixel for class t. Y and Z are defined similarly. By using these three vectors to save the class-conditional densities, each density (for a given pixel and class) is calculated only once, instead of on the order of $C^2$ times.

The main loop of Algorithm 2 is modified to calculate the class-conditional densities for the first three columns each time a new row is considered (i.e., each time "i" is incremented). Each time a new pixel in a given row is to be classified (i.e., just before "j" is incremented), these values are updated. In particular, X gets the Y values, Y gets the Z values, and new values are calculated to update Z.

The new discriminant function calculation, g', does not call the subroutine "compf." It gets the values it needs from the X, Y, and Z arrays. For each call "g'(k)," the value of "sum" for that k is calculated. "Sum" is then returned as the value of "g'(k)."
 
The same "compf" routine is used for both Algorithms 1 and 2. Algo­






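for i = 0 to I-1 do                    /* row */
begin
    for k = 1 to C do
    begin                              /* compute f's for 1st 3 columns */
        X(k) = compf(i,0,k)
        Y(k) = compf(i,1,k)
        Z(k) = compf(i,2,k)
    end
    for j = 1 to J-2 do                /* column */
    begin
        value = -1                     /* max "g" */
        class = -1                     /* class with max "g" */
        for k = 1 to C do
        begin
            current = g'(k)
            if current > value
            then begin
                value = current
                class = k
            end
        end
        if j < J-2
        then                           /* update X,Y,Z arrays */
            for k = 1 to C do
            begin
                X(k) = Y(k)
                Y(k) = Z(k)
                Z(k) = compf(i,j+2,k)
            end
    end
end

function g'(k)
begin
    sum = 0
    for r = 1 to C do
        for q = 1 to C do
            sum = X(r)*Y(k)*Z(q)*G(r,k,q) + sum
    return (sum)
end

Fig. 5.3. Algorithm 2.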









































There are other techniques that can make Algorithm 2 even more efficient; they have not been included in order to avoid obscuring the basic program flow. For example, whenever G(r,k,q) is zero, the multiplications for that term can be skipped.
 
The serial complexity of Algorithm 2 can be calculated in terms of assignment statements, multiplications, additions, and "compf" calculations. To initialize X, Y, and Z for each new row, I*C*3 assignments and calls to "compf" occur. For each pixel, at most C+1 assignments each to "value" and "class" occur, and C calls to "g'(k)" occur, each followed by an assignment to "current." In addition, for each row, the X, Y, and Z vectors are updated J-3 times, each update using 3*C assignments and C calls to "compf." Each execution of "g'(k)" uses $3C^2$ multiplications, $C^2$ additions, and $C^2+1$ assignments. Thus, the total complexity for Algorithm 2 is:

$I[J(C^3+7C+2) - (2C^3+14C+4)]$ assignments;
$3C^3 I(J-2)$ multiplications;
$C^3 I(J-2)$ additions; and
I*J*C "compf" calculations.

The growth is proportional to $I J C^3$ assignments, multiplications, and additions, and I*J*C "compf" calculations.
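The counts above can be checked with a small runnable C sketch of Algorithm 2, instrumented with counters. `compf_stub` is a hypothetical stand-in that returns a deterministic positive number instead of evaluating a Gaussian density, and `G[r][k][q]` is indexed left, center, right, as G(r,k,q) is in Figure 5.3. All names are illustrative.

```c
#include <assert.h>

#define NC 4   /* number of classes C (four in the report's tests) */

/* Counters, used only to check the complexity figures in the text. */
static long compf_calls = 0;
static long mults = 0;

/* Hypothetical stand-in for compf(i,j,k). */
static double compf_stub(int i, int j, int k)
{
    compf_calls++;
    return 1.0 / (1 + i + j + k);
}

/* g'(k): X, Y, Z hold the densities of pixels (i,j-1), (i,j), (i,j+1). */
static double gprime(int k, const double X[NC], const double Y[NC],
                     const double Z[NC], double G[NC][NC][NC])
{
    double sum = 0.0;
    for (int r = 0; r < NC; r++)
        for (int q = 0; q < NC; q++) {
            sum += X[r] * Y[k] * Z[q] * G[r][k][q];
            mults += 3;
        }
    return sum;
}

/* Algorithm 2 on an nrows-by-ncols image (I-by-J in the text).
   labels[i*ncols + j] receives the class of pixel (i,j), 1 <= j <= ncols-2. */
void algorithm2(int nrows, int ncols, double G[NC][NC][NC], int *labels)
{
    assert(nrows > 0 && ncols >= 3 && labels);
    double X[NC], Y[NC], Z[NC];
    for (int i = 0; i < nrows; i++) {
        for (int k = 0; k < NC; k++) {          /* densities of columns 0, 1, 2 */
            X[k] = compf_stub(i, 0, k);
            Y[k] = compf_stub(i, 1, k);
            Z[k] = compf_stub(i, 2, k);
        }
        for (int j = 1; j <= ncols - 2; j++) {
            double value = -1.0;                /* max g' */
            int cls = -1;                       /* class with max g' */
            for (int k = 0; k < NC; k++) {
                double current = gprime(k, X, Y, Z, G);
                if (current > value) { value = current; cls = k; }
            }
            labels[i * ncols + j] = cls;
            if (j < ncols - 2)                  /* slide the window one column */
                for (int k = 0; k < NC; k++) {
                    X[k] = Y[k];
                    Y[k] = Z[k];
                    Z[k] = compf_stub(i, j + 2, k);
                }
        }
    }
}
```

With I = 3, J = 10, and C = 4, the counters come out to I*J*C = 120 "compf" calls and $3C^3 I(J-2)$ = 4608 multiplications. The zero-G shortcut mentioned earlier could be added by skipping a term when G[r][k][q] is 0.0; it is left out here so that the counts match the formulas.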
 
In this section, a contextual classifier based on a horizontally linear neighborhood of size three has been analyzed. Algorithms for contextual classifiers using neighborhoods of other sizes and shapes would be analogous to the algorithms presented.
 
Algorithms 1 and 2 are written for conventional uniprocessor systems.
 








Consider the implementation of a contextual classifier on an array of N Flexible Processors. Assume the neighborhood is horizontally linear, as shown in Figure 5.1. Divide the A-by-B image into subimages of B/N rows, each A pixels long, as shown in Figure 4.1. Assign each subimage to a different Flexible Processor. The entire neighborhood of each pixel is then included in its subimage. Each Flexible Processor can therefore execute the uniprocessor algorithm presented in Section 5.3 on its own subimage. No interaction between Flexible Processors is needed; i.e., each Flexible Processor can process its subimage independently.
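The row partition described above can be sketched in C as follows, assuming B = kN as in the text. `rows_for_fp` and `fp_of_row` are hypothetical helper names.

```c
#include <assert.h>

typedef struct { int first, last; } RowRange;   /* inclusive row indices */

/* Rows handled by Flexible Processor p (0 <= p < N) when the B image rows
   are split into N equal blocks of k = B/N consecutive rows (B = kN). */
RowRange rows_for_fp(int p, int B, int N)
{
    assert(N > 0 && B % N == 0 && p >= 0 && p < N);
    int k = B / N;                               /* rows per processor */
    RowRange r = { p * k, p * k + k - 1 };
    return r;
}

/* Which Flexible Processor owns a given row. A horizontally linear
   neighborhood never leaves its row, so (row,j-1), (row,j), and (row,j+1)
   always map to the same processor. */
int fp_of_row(int row, int B, int N)
{
    assert(N > 0 && B % N == 0 && row >= 0 && row < B);
    return row / (B / N);
}
```

For a hypothetical 512-row image on 16 Flexible Processors, processor 0 gets rows 0 to 31 and processor 15 gets rows 480 to 511.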
 
The LARS Flexible Processor micro-assembler and simulator are being used to gather statistics on the execution time for the size three horizontally linear neighborhood contextual classifier. Because each Flexible Processor is microprogrammable, program correctness is determined and execution times are analyzed through the use of the micro-assembler and simulator. The current implementation of the contextual classifier uses 780 microprogram instructions. Execution times per pixel vary because all floating point operations are done in software. The classification time for the first pixel on a line differs from that of the remaining pixels on the same line. This difference is due to the three-pixel window: for the first pixel on a line, data must be calculated for each of the pixels in the window, while for each remaining pixel, data must be calculated for only one new pixel. The data words of the pixel measurement vectors, covariance matrices, etc., consist of a 14-bit two's complement exponent and a 17-bit sign-magnitude mantissa. The covariance matrices, logarithms of the determinants of the covariance matrices, a priori probabilities ($G_p$), and the X, Y, and Z vectors are all stored in the large file. In this way, each Flexible Processor has all the information it needs for performing the classification on its subimage. The subimage data itself would be stored in a bulk memory. A multiple Flexible Processor configuration which associates one bulk memory with each Flexible Processor would be best for this application. To test the Flexible Processor contextual classifier program, the classification of two rows of eight pixel measurement vectors (stored in the large file) using four classes was evaluated. The data were actual Landsat data, as used in [12]. Evaluation of the serial Algorithm 2 from Section 5.3 showed that a PDP-11/70 required .073 seconds per pixel, while a single Flexible Processor required .075 seconds per pixel. Although at first it seems that the PDP-11/70 ran faster, the limited exponent range of the 11/70 floating point hardware yielded incorrect results due to rounding error. Normalizing the data to overcome this error would require an extra .030 seconds per pixel on the PDP-11/70 (.103 seconds per pixel in total), so the Flexible Processor is over 25% faster. The floating point in the Flexible Processor is implemented in software and uses a 14-bit exponent to avoid this problem. These tests are by no means exhaustive: the simulator must run for many hours just to obtain a result for one pixel. Further testing is in progress.
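The report gives only the field widths of the Flexible Processor's software floating point words. The sketch below decodes such a word under a HYPOTHETICAL bit layout (exponent in bits 30 to 17, mantissa sign in bit 16, 16-bit magnitude read as a binary fraction); the real bit positions and normalization are not stated in the text. It only illustrates how a 14-bit two's complement exponent field is sign-extended and applied.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL layout: bits 30..17 exponent (14-bit two's complement),
   bit 16 mantissa sign (1 = negative), bits 15..0 magnitude / 2^16. */
uint32_t fpw_encode(int exponent, int negative, uint16_t mag)
{
    assert(exponent >= -8192 && exponent <= 8191);
    return ((uint32_t)(exponent & 0x3FFF) << 17)
         | ((uint32_t)(negative != 0) << 16)
         | mag;
}

double fpw_decode(uint32_t w)
{
    int e = (int)((w >> 17) & 0x3FFF);
    if (e & 0x2000)                  /* sign-extend the 14-bit exponent */
        e -= 0x4000;
    double v = (double)(w & 0xFFFF) / 65536.0;
    for (; e > 0; e--) v *= 2.0;     /* scale by 2^e without libm */
    for (; e < 0; e++) v *= 0.5;
    return ((w >> 16) & 1) ? -v : v;
}
```

Under these assumptions, an exponent of -3 with magnitude 0x8000 and the sign bit set decodes to -0.0625, and the exponent field itself can hold values from -8192 to 8191, which is the wide range the text relies on.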
 
Using .1 seconds per pixel as a rough approximation of the PDP-11/70 processing time, and .08 seconds per pixel as a rough approximation of the processing time of a single Flexible Processor, a 16 Flexible Processor configuration in which each processor had its own bulk memory would perform contextual classifications at a rate of 200 pixels per second (16/.08), as opposed to 10 pixels per second (1/.1) for a single PDP-11/70. As mentioned in Section 5.3, additional programming techniques that would increase this processing rate can be incorporated (this is currently in progress). Furthermore, as more experience in programming Flexible Processors is obtained, additional improvements in execution time can be expected.
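The rate estimate above is simple arithmetic. As a sketch, assuming perfect N-fold scaling with one bulk memory per processor and no inter-processor transfers (the B = kN case), with `pixels_per_second` as an illustrative name:

```c
#include <assert.h>

/* Aggregate classification rate of n independent processors when one
   processor needs sec_per_pixel seconds per pixel (ideal scaling). */
double pixels_per_second(int n, double sec_per_pixel)
{
    assert(n > 0 && sec_per_pixel > 0.0);
    return n / sec_per_pixel;
}
```

This gives 200 pixels per second for 16 processors at .08 seconds per pixel and 10 pixels per second for one processor at .1 seconds per pixel.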
 
5.5 Contextual Classification on a Flexible Processor System

Consider the implementation of a contextual classifier on an array of Flexible Processors as discussed in Section 5.4. Again, assume the neighborhood is horizontally linear, as shown in Figure 5.1, and the image is divided into subimages of B/N rows A pixels long, as shown in Figure 4.1. If B = kN, where k is an integer, there is 100% utilization of the Flexible Processors. Furthermore, there is no overhead for inter-Flexible Processor data transfers, since the entire neighborhood of each pixel is included in its subimage. Therefore, a factor of N improvement is attained.
 
If (A*B)/N is an integer, but B = kN + x, 0<x<N, then either some Flexible Processors can be left underutilized so that every neighborhood stays within one subimage, or all Flexible Processors can be fully utilized by dividing some neighborhoods between Flexible Processors, which necessitates inter-Flexible Processor data transfers. This is shown for a simple example in Figure 5.4, where N=2, A=4, and B=3. In Figure 5.4(a) no inter-Flexible Processor transfers are needed, but Flexible Processor number 1 is not fully utilized. In Figure 5.4(b) both Flexible Processors are fully utilized, but due to the horizontally linear neighborhood, at least pixel 11 will have to be sent to Flexible Processor number 1 and at least pixel 12 will have to be sent to Flexible Processor number 0.
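The two options for B = kN + x can be compared with a small C sketch. It assumes that (a) whole rows are dealt out so that some processors get one extra row, and (b) the A*B pixels are cut in raster order into N equal runs. The transfer count assumes a size three horizontal neighborhood with the edge columns left unclassified, as in Algorithms 1 and 2; the function names are hypothetical.

```c
#include <assert.h>

/* Option (a): whole rows per processor. With B = kN + x, the busiest
   processor has ceil(B/N) rows and sets the run time. */
double row_split_utilization(int B, int N)
{
    assert(B > 0 && N > 0);
    int busiest = (B + N - 1) / N;            /* ceil(B/N) rows */
    return (double)B / ((double)N * busiest);
}

/* Option (b): A*B pixels in raster order split into N equal runs.
   Returns the number of single-pixel transfers needed. */
int split_transfers(int A, int B, int N)
{
    assert(A >= 3 && N > 0 && (A * B) % N == 0);
    int per_fp = (A * B) / N, transfers = 0;
    for (int p = 1; p < N; p++) {
        int c = (p * per_fp) % A;             /* column where run p starts */
        if (c == 0)
            continue;                         /* boundary falls between rows */
        if (c - 1 >= 1)
            transfers++;                      /* pixel c-1 needs pixel c */
        if (c <= A - 2)
            transfers++;                      /* pixel c needs pixel c-1 */
    }
    return transfers;
}
```

For three rows of four pixels on two Flexible Processors, option (a) gives a utilization of 0.75 and option (b) needs 2 pixel transfers, matching pixels 11 and 12 in the example above.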
 
If (A*B)/N is not an integer, some inter-Flexible Processor data transfers will be necessary. The number of transfers will be a function of the way in which the pixels are assigned to Flexible Processors, as in the previous paragraph. Determining the computationally fastest approach whenever B = kN + x, 0<x<N, requires knowledge of the actual image size, the actual number of Flexible Processors used, the exact time required to execute inter-Flexible Processor transfers, and the length of the neighborhood.

Fig. 5.4. (a) Underutilization With No Inter-FP Communication. (b) Inter-FP Data Transfers Required, Full Utilization.
 
There are two other cases of linear neighborhoods: vertically linear and diagonally linear, as shown in Figures 5.5 and 5.6. The vertically linear case is just a 90° rotation of the horizontally linear case. The diagonally linear case can be simplified to a 45° rotation of the horizontally linear case for B = kN by the proper assignment of pixels to Flexible Processors. Consider an A-by-B image, A < B and B = Nk. Label the diagonals from 0 to A+B-2, as shown in Figure 5.7 for A=4 and B=6. The pixels can then be grouped into B sets of A pixels as follows:
 








Using these rules, each Flexible Processor is assigned k groups. Thus, the problem has been reduced to the equivalent of the horizontally linear case, which has already been discussed. The case B = kN + x, 0<x<N, is even more complex than the analogous situation in the horizontally linear case and requires a detailed tradeoff analysis based on the actual image size, the actual number of Flexible Processors used, the exact time required to




Now consider nonlinear neighborhoods, that is, neighborhoods which do not fit into one of the linear classes. For example, all of the neighborhoods in Figure 5.8 are nonlinear. Figure 5.8(a) and its rotations represent the simplest nonlinear neighborhood. It is included in all other






0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8

Fig. 5.7. The Diagonals of an A-by-B Image.

[Figure: three nonlinear neighborhoods, panels (a), (b), and (c).]
Fig. 5.8. Nonlinear Neighborhoods.




It can be shown that there is no way to partition an A-by-B image into N (not necessarily equal) sections such that a contextual classifier using a nonlinear neighborhood can be implemented without inter-Flexible Processor data transfers. This will be demonstrated for the nonlinear kernel of Figure 5.8(a); because this kernel is included in every other nonlinear neighborhood, the result holds for all nonlinear neighborhoods. There are three cases to consider. If there is a horizontal border between two subimages stored in different Flexible Processors, then pixels 1 and 2 in Figure 5.8(a) will be in different Flexible Processors. If there is a vertical border, pixels 2 and 3 will be in different Flexible Processors. If there is a diagonal border, pixels 1 and 2 will be in different Flexible Processors. How Flexible Processors should be assigned in order to minimize computation time will depend upon the particular image size, the number of Flexible Processors, the time required for inter-Flexible Processor communications, and the shape and size of the neighborhood. These factors will also determine how effective the Flexible Processor array is for performing contextual classifications based on a given neighborhood.
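The three-case argument can be checked numerically for any particular partition. This C sketch counts pixels whose neighborhood spans more than one Flexible Processor, for a hypothetical partition into blocks of B/N rows. Because Figure 5.8 is not reproduced here, the kernel is assumed to be an L of three pixels with pixels 1 and 2 vertically adjacent and pixels 2 and 3 horizontally adjacent, which is consistent with the horizontal-border and vertical-border cases described above.

```c
#include <assert.h>

typedef struct { int dr, dc; } Offset;   /* neighbor offset (row, column) */

/* Hypothetical partition: B rows split into N blocks of B/N rows. */
static int fp_of(int row, int col, int B, int N)
{
    (void)col;                           /* row blocks ignore the column */
    return row / (B / N);
}

/* Count pixels whose whole neighborhood (p offsets) lies inside the
   A-column, B-row image but touches more than one Flexible Processor. */
int crossing_pixels(int A, int B, int N, const Offset *nb, int p)
{
    assert(N > 0 && B % N == 0 && nb && p > 0);
    int count = 0;
    for (int r = 0; r < B; r++)
        for (int c = 0; c < A; c++) {
            int inside = 1, crosses = 0, home = fp_of(r, c, B, N);
            for (int t = 0; t < p; t++) {
                int rr = r + nb[t].dr, cc = c + nb[t].dc;
                if (rr < 0 || rr >= B || cc < 0 || cc >= A) { inside = 0; break; }
                if (fp_of(rr, cc, B, N) != home)
                    crosses = 1;
            }
            if (inside && crosses)
                count++;
        }
    return count;
}
```

On a 4-by-4 image split between two processors, the size three horizontal neighborhood {(0,-1), (0,0), (0,1)} yields 0 crossing pixels, while the assumed kernel {(0,0), (1,0), (1,1)} yields 3: every pixel in the last row of the first block whose kernel fits inside the image.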
 
The discussion of performing classifications with the Flexible Processor system demonstrates one way in which a multiple-processor system can be used to speed up the processing of image data. Future work involves programming the contextual classifier on the Flexible Processor simulator using neighborhoods of different sizes and shapes and determining the most efficient
 
assignment of pixels to Flexible Processors for each case examined. The im­
plementation of the classifier will provide hard data to verify the effec­





Algorithms for performing contextual classifications using a size three horizontally linear neighborhood were presented. Algorithm 1 was a straightforward approach. Algorithm 2 was a more efficient approach that avoided redundant calculations. The serial computational complexity of Algorithm 2 was shown to grow in proportion to $I J C^3$ assignments, multiplications, and additions, and I*J*C "compf" calculations. It was explained how N Flexible Processors could perform the classifications N times faster than a single Flexible Processor.
 
In summary, other papers have shown contextual classifiers to be powerful remote sensing tools. Their main disadvantage is their computational complexity. This Chapter has demonstrated how parallel processing
 





The goal of the research in this project has been to implement a contextual classifier on a simulator of an array of CDC Flexible Processors. To achieve this end, the simpler Gaussian maximum likelihood classifier was implemented first. The maximum likelihood classifier program provided a vehicle for gaining experience in coding for a Flexible Processor and for debugging the simulator. Many of the computations performed by the maximum likelihood classifier are identical to computations required for the contextual classifier, but the overall algorithm is considerably simpler. Thus implementing the maximum likelihood classifier provided a useful means for begin-




The next major step was to implement the contextual classifier on the Flexible Processor simulator. As the program currently runs, it is approximately 25% faster on a single Flexible Processor system than on a PDP-11/70. Extensive testing, using 300 pixels from actual Landsat data, produced the following average timings for the Flexible Processor floating point algorithms used in the contextual classifier program (using a
 













A Flexible Processor, operating on actual Landsat data, can perform a contextual classification using a size three horizontally linear neighborhood and four classes at a rate of approximately 75 milliseconds per pixel. Thus, a 16 Flexible Processor system would process approximately 213 pixels per second (16/.075). When more experience programming the Flexible Processor has been gained, these times can most likely be improved.
 
It is important to realize that 60 to 90% of the processing time for
 
the contextual classifier is spent in software implementations of floating
 
point algorithms. Thus, the addition of floating point hardware (with the
 
needed precision) would greatly increase the processing speed of the classifications.
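As a rough illustration of the 60 to 90% figure (a bound, not a measurement from this study), Amdahl's law gives the overall speedup $S$ when a fraction $f$ of the run time is sped up by a factor $s$ and the rest is unchanged:

$$S = \frac{1}{(1-f) + f/s} \;\le\; \frac{1}{1-f}$$

With $f = 0.6$ the bound is $1/0.4 = 2.5$; with $f = 0.9$ it is $1/0.1 = 10$. These limits assume floating point time shrinks to nothing, so real hardware would fall somewhat short of them.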
 
Recall that a Flexible Processor is programmed in micro-assembly language, allowing parallelism at the instruction level. For example, it is possible to do all of the following simultaneously: increment an index register conditionally, do a program jump, multiply two 8-bit integers, and add two 32-bit integers. This type of operational overlap, in conjunction with the multiprocessing capability of the Flexible Processors, greatly increases the speed of the Flexible Processor array.
 






Dual 16-bit internal bus system.
 
Able to operate with either 16- or 32-bit words.
 
125 nsec clock cycle.
 
125 nsec time to add two 32-bit integers.
 
250 nsec time to multiply two 8-bit integers.
 




In order to debug, verify, and time Flexible Processor algorithms, a simulator for an array of up to 16 Flexible Processors has been developed. This simulator runs under the UNIX operating system on a PDP-11/70 series
 




The experience gained through the use of the simulator has made evident
 




Multiple processors (up to 16).

User microprogrammable, with parallelism at the instruction level.

Connection ring for inter-Flexible Processor communications.

Shared bulk memory units.
 






Micro-assembly language: difficult to program.
 
Program memory limited to 4k micro-instructions.
 
Based on the investigations to date, the advantages of this system appear to outweigh the disadvantages. However, alternative approaches, such as multimicroprocessor systems, should also be considered to determine the most cost-effective approach for implementing the contextual classifier and other computationally demanding image processing operations for remote sensing.
 
Through the use of parallel, pipelined, and/or special purpose computer systems such as the CDC Flexible Processor system, the types of computations required for the contextual classifier and other computationally demanding processes can be implemented efficiently. This will not only reduce the time required to do contextual classification, but will also allow the in-





[1] G. R. Allen, L. O. Bonrud, J. J. Cosgrove, and R. M. Stone, "The Design and Use of Special Purpose Processors for the Machine Processing of Remotely Sensed Data," Proceedings of the 1973 Conference on Machine Processing of Remotely Sensed Data (IEEE Cat. No. 73 CH 083-2GE), pp. 1A-25-1A-42, Oct. 1973.
[2] Control Data Corp., Cyber-Ikon Image Processing System Design Concepts,
 




[3] Control Data Corp., Cyber-Ikon Flexible Processor Programming Textbook,
 




[4] K. S. Fu, "Special computer architectures for pattern recognition and
 
image processing-an overview," in Proc. 1978 National Computer Conf.,
 
pp. 1003-1013, June 1978.
 
[5] J. L. Kast, P. H. Swain, and T. L. Phillips, The Feasibility of Using a
 
Cyber-Ikon System as the Nucleus of an Experimental Agricultural Data
 
Center, LARS Contract Report 021678, Laboratory for Applications of Re­




[6] B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall, Englewood Cliffs, NJ, 1978.
[7] K. W. Krause, "Use of the CDC Cyber-Ikon Simulator," unpublished report, School of Electrical Engineering, Purdue University, West Lafayette, IN 47907, August 1978.
 
[8] H. J. Siegel, P. T. Mueller, Jr., and H. E. Smalley, Jr., "Control of a partitionable multimicroprocessor system," Proceedings of the 1978 International Conference on Parallel Processing (IEEE Catalog No. 78 CH 1321-9C), pp. 9-17, August 1978.
 
[9] H. J. Siegel, L. J. Siegel, R. J. McMillen, P. T. Mueller, Jr., and S. D. Smith, "An SIMD/MIMD multimicroprocessor system for image processing and pattern recognition," Proceedings of the 1979 IEEE Computer Society Conference on Pattern Recognition and Image Processing (IEEE Catalog No. 79 CH 142T-2C), pp. 214-224, August 1979.
 
[10] P. H. Swain, H. J. Siegel, and B. W. Smith, "A method for classifying
 
multispectral remote sensing data using context," Proceedings of the
 
1979 Machine Processing of Remotely Sensed Data Symposium (IEEE Catalog
 
No. 79 CH 1430-8), pp. 343-353, June 1979.
 
[11] P. H. Swain, P. E. Anuta, D. A. Landgrebe, and H. J. Siegel, Vol. III: Processing Techniques Development, Part 2: Data Preprocessing and Information Extraction Techniques, LARS Contract Report 113079, Laboratory for Applications of Remote Sensing (LARS), Purdue University, West Lafayette, IN, November 1979.
 
[12] 	 P. H. Swain, S. B. Vardeman, and J. C. Tilton, Contextual
 
Classification of Multispectral Image Data, LARS Technical Report
 
011080, Laboratory for Applications of Remote Sensing (LARS), Purdue
 
University, West Lafayette, IN, Jan. 1980.
 
[13] P. H. Swain, H. J. Siegel, and B. W. Smith, "Contextual classification
 
of multispectral remote sensing data using a multiprocessor system,"
 




[14] J. C. Tilton, P. H. Swain, and S. B. Vardeman, "Context distribution
 
estimation for contextual classification of multispectral image data,"
 
Proceedings of the 1980 Machine Processing of Remotely Sensed Data
 





FLEXIBLE PROCESSOR SYSTEM SIMULATOR DISPLAYS
 
A. Simulator Output Display
 
B. Simulator Display of Temporary File 
C. Simulator Display of Large File
 
A. SIMULATOR OUTPUT DISPLAY
 
[Screen display of the registers of one simulated Flexible Processor (e.g., IDX0-IDX3, ICR0-ICR3, TF0AR, TF0AW, LF0A, LF1A, MULT), each with its hexadecimal contents.]





B. SIMULATOR DISPLAY OF TEMPORARY FILE

[Screen display of temporary file locations temp[0] through temp[f], each shown as two 16-bit hexadecimal words.]
C. SIMULATOR DISPLAY OF LARGE FILE

[Screen display of large file contents, eight 16-bit hexadecimal words per line, with line addresses lf[0], lf[4], lf[8], and so on.]







A. Setting Up Simulation
B. Input FP# and Operation to be Performed
C. Read and Modify Register or Program Memory Content
D. Execute Single Execution Step
E. Subroutine "Exec" for Executing Single Instructions


























































C. Read and Modify Register or Program Memory Content.
































E. Subroutine EXEC for Executing Single Instructions. Subroutine SKIP
 












* x X X XXXX X XX X X
* x XXXX xx XX x x 
* X X X X X X X X XXXXXX 
* x XXxX X XX X X 
* X X XX X X X X X XX X X 
* X X X XXXX XXXXXX XX XX X X 






/* Global int: carry in; double add parameters; opcode; condition
 * flag; number of processors in use; */
int car,par0,par1,par2,par3,op,conflg,lim;
 
/* b0-b3 eight bit ALU output chunks (as defined in textbook)
 
* upper sign extended (ALU); lower sign extended (ALU); 






* Command at * level table for use in lookup routine
char *comtab[] {
 
lt, R,1 V 1, b, . so ,# ,P 1h • help, el $Ire, 
-1 }; 




"S", "dtemp", "dmem", "dlarg", "m" "b", I"p" "h", "1help", "aei,"4#1 
"r", -1 }; 































































































































































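0 /* 0 */,
0100 /* 40 */,
02100 /* 440 */,
04100 /* 840 */,
06100 /* c40 */,
0200 /* 80 */,
02200 /* 480 */,
04200 /* 880 */,
06200 /* c80 */,
0600 /* 180 */,
02600 /* 580 */,
0500 /* 140 */,
02500 /* 540 */,
04500 /* 940 */,
06500 /* d40 */,
01700 /* 3c0 */,
03700 /* 7c0 */,
0300 /* c0 */,
02300 /* 4c0 */,
01000 /* 200 */,
03000 /* 600 */,
05000 /* a00 */,
07000 /* e00 */,
01100 /* 240 */,
03100 /* 640 */,
05100 /* a40 */,
07100 /* e40 */,
01600 /* 380 */,
03600 /* 780 */,
07600 /* f80 */,
0 /* ffff */,
01400 /* 300 */,
03400 /* 700 */,
05400 /* b00 */,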









05500 /* b40 */,
07500 /* f40 */,
0700 /* 1c0 */,
05700 /* bc0 */,
0400 /* 100 */,
01200 /* 280 */,
03200 /* 680 */,
05200 /* a80 */,
07200 /* e80 */,
02700 /* 5c0 */,
04700 /* 9c0 */,
01300 /* 2c0 */,
03300 /* 6c0 */,
05300 /* ac0 */,
07300 /* ec0 */,
02400 /* 500 */,
04400 /* 900 */,
0500 /* 140 */,
02500 /* 540 */,
04500 /* 940 */,
06500 /* d40 */,
-1 };
 
/* For some registers, they are loaded by looking up their



























021 /* 11 */,
 
041 /* 21 */,
 




022 /* 12 */,
 
042 /* 22 */,
 








025 /* 15 */,
 
045 /* 25 */,
 
065 /* 35 */,
 
017 /* f */,
 





03 /* 3 */,
023 /* 13 */,
010 /* 8 */,
030 /* 18 */,
050 /* 28 */,
070 /* 38 */,
011 /* 9 */,
031 /* 19 */,
051 /* 29 */,
071 /* 39 */,
016 /* e */,
036 /* 1e */,
076 /* 3e */,
067 /* 37 */,
014 /* c */,
034 /* 1c */,
054 /* 2c */,
074 /* 3c */,
0 /* ffff */,
015 /* d */,
035 /* 1d */,
055 /* 2d */,
075 /* 3d */,
0 /* ffff */,
057 /* 2f */,
04 /* 4 */,
012 /* a */,
032 /* 1a */,
052 /* 2a */,
072 /* 3a */,
027 /* 17 */,
047 /* 27 */,
013 /* b */,
033 /* 1b */,
053 /* 2b */,
073 /* 3b */,
024 /* 14 */,
044 /* 24 */,
05 /* 5 */,
025 /* 15 */,
045 /* 25 */,
065 /* 35 */, -1 };


































































0400 /* 100 */,
 
04200 /* 880 */,
 
024200 /* 2880 */,
 
014200 /* 1880 */, 
034200 /* 3880 */, 
020200 /* 2080 */, 
030200 /* 3080 */, 




023200 /* 2680 */,
027200 /* 2e80 */,
037200 /* 3e80 */,
03000 /* 600 */,
07000 /* e00 */,
013000 /* 1600 */,
017000 /* 1e00 */,
027000 /* 2e00 */,
035600 /* 3b80 */,
03400 /* 700 */,
021600 /* 2380 */,
025600 /* 2b80 */,
031600 /* 3380 */,
01000 /* 200 */,
05000 /* a00 */,
015000 /* 1a00 */,
025000 /* 2a00 */,
035000 /* 3a00 */,
021000 /* 2200 */,
031000 /* 3200 */,
011000 /* 1200 */,
02400 /* 500 */,
06400 /* d00 */,
012400 /* 1500 */,
016400 /* 1d00 */,
026400 /* 2d00 */,
032400 /* 3500 */,
036400 /* 3d00 */,
026000 /* 2c00 */, -1





















































































int d1int[] {
00 /* 0 */,
 
0166 /* 76 */,
 
02 /* 2 */,
 
015 /* d */,
 

























































/* 2d */,
/* 3d */,
/* 11 */,
/* 51 */,
/* 71 */,
/* 31 */,
/* 41 */,
/* 61 */,
/* 21 */,
/* 4f */,
/* 5f */,
/* 6f */,
/* 7f */,
/* f */,
/* 1f */,






/* 1c */,
/* 2c */,
/* 3c */,
/* 5c */,
/* 4e */,
/* 5e */,
/* 6e */,
/* 7e */,
/* 17 */,
/* 27 */,
/* 37 */,
/* 47 */,
/* 57 */,
/* 67 */,
/* 77 */,
/* 4 */,
/* 54 */,
/* 74 */,
/* 14 */,
/* 34 */,
/* 44 */,
/* 64 */,
/* 24 */,
/* a */,
/* 1a */,
/* 2a */,
/* 3a */,
/* 5a */,
/* 6a */,
/* 7a */, -1

















































































































/* The size of the large files has been changed to 4K to simulate the real */
/* system more closely. 4096 locations are allocated. */
int lf1 {4202}; /* was 1131 with 1K */
/**/
int mir {8298}; /* was 2156 */
int mbr {8301}; /* was 2159 */
int res0 {8304}; /* was 2162 */
int res1 {8305}; /* was 2163 */
int bus0 {8306}; /* was 2164 */
int bus1 {8307}; /* was 2165 */
int stack {8308}; /* was 2166 */
/**/
int ovl {8324}; /* was 2182 */
int ovh {8325}; /* was 2183 */
int shcon {8326}; /* was 2184 */
int mulflg {8327}; /* was 2185 */
int mult {8328}; /* was 2186 */
int mem {8329}; /* was 2187 */
int stptr {11525}; /* was 4983 */
/**/
int cych {11526}; /* was 4984 */
int cycl {11527}; /* was 4985 */
int cycsh {11528}; /* was 4986 */
int s {11532}; /* was 4990 */
int b {11536}; /* was 4994 */



"ainout", "qresin", "aresin", "aresout", "qinout", "aqzin", "brg0",
"brg1", "cmr0", "cmr1", "cmr2", "cmr3", "e0", "e1",
"f0", "f1", "g0", "g1", "indx0", "indx1", "indx2",
"indx3", "icr0", "icr1", "icr2", "icr3", "intr", "imr0",
"imr1", "if0a", "if1a", "lf0a", "lf1a", "mar", "marc",
"now", "mmi0", "mmi1", "mmi2", "mcr0", "mcr1", "mcr2",
"mcr3", "ioupu", "past", "p", "q", "tf0", "tf0ar",
"tf0aw", "tf1", "tf1ar", "tf1aw", "spare", "lf0", "lf1",
"mir", "mbr", "res0", "res1", "bus0", "bus1", "stack",
"ovl", "ovh", "shcon", "mulflg", "mult", "mem", "stptr",











25, 26, 27, 28, 29, 30, 31,
 
32, 33, 34, 35, 36, 37, 38,
 
39, 40, 41, 42, 43, 44, 45,
 
46, 47, 48, 49, 50, 51, 52,
 
53, 54, 55, 56, 57, 58, 59,
 
60, 61, 63, 64, 65, 66, 82,
 
83, 84, 100, 101, 102, 106, 4202,
 
8298, 8301, 8304, 8305, 8306, 8307, 8308,
 
8324, 8325, 8326, 8327, 8328, 8329, 11525,
 
11526, 11527, 11528, 11532, 11536, 11540, -1 };
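The register name table and the address table above are parallel arrays. Read row by row, "lf0" pairs with 106 and "lf1" with 4202, exactly 4096 locations (one large file) apart, and "mir" pairs with 8298. The following is a minimal sketch of how a monitor could map a register name to its simulated memory index. It assumes that positional pairing and uses only a few entries. The function name reg_address is hypothetical.

```c
#include <string.h>

/* A few entries from the register name and address tables, paired by position.
 * The pairing is inferred from the listing; the address table ends with -1. */
static const char *const reg_names[] = { "lf0", "lf1", "mir", "mbr", "res0", "res1", NULL };
static const int reg_addrs[] = { 106, 4202, 8298, 8301, 8304, 8305, -1 };

/* Return the simulated memory index of a named register, or -1 if unknown. */
int reg_address(const char *name)
{
    for (int i = 0; reg_addrs[i] != -1 && reg_names[i] != NULL; i++)
        if (strcmp(reg_names[i], name) == 0)
            return reg_addrs[i];
    return -1;
}
```

Keeping names and addresses in two arrays with a -1 sentinel follows the convention the listing already uses for its integer tables.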
 




























char *srctab[];
int sr0int[];
int srcint[];
int sr1int[];
char *d0tab[];
int db0int[];
int d0int[];




int d1int[];
char *regtab[];













































































































* X X XXXX X X X X XXXX 
* XX XX X X XX X X X X X 
* XXX X X X X X X X X X 
* X X X X X X X XXXXXXX X 
* X X X X X XX X X XX X X 
* X X XXXX X X X X XX XXXX 
* 
* SIMULATOR SYSTEM MONITOR MOD 12 FOR VERSION 7 redone by BWS 03/01/80. 
* CRASHPROOFED on 01/21/80 by BWS. UNNECESSARY FUNCTION CALLS REPLACED WITH 
* GOTOs. THIS ALSO WILL SPEED EXECUTION AND SHORTEN PROGRAM LENGTH. 
* This must be compiled as 'cc monh.c exech._ shioh._ helph._ -is -o 
* where exec and shio can be .o or .c 
* The following includes include the standard I/O library
* and two global files. 'incl0h.h' contains general global variables,
* while 'incl1h.h' contains global variables pertaining to
* the processor register variables. 'incl1h.h' and 'incl3h.h'
* are generated by 'glob.c' with 'regin' as input. Therefore,
* to change processor variables, the variable is added or changed
* in 'regin' and new include files are generated by 'glob'.
* To change general global variables they must be physically changed 









/* name and stornam are the computer system's name for the swap area */
/* that will hold the memory for paging. This will enable the machine */
/* to run with up to 62 FPs. Procnum is the processor number currently */
/* in memory. Pagec is the page-modified flag (0 if changed). This will */
/* improve the system response time if one just wishes to take a peek */
/* at what one of the other processors is doing. */
/* tmpfd and statfd are the file descriptors for the temporary file */
/* and the status file. This is done so that the old I/O package */
/* may be used, and thus the throughput of the system increased. The */
/* fd is not currently being used in this version, because of the */
/* flag variable. If the program being loaded into an FP is the same */
/* as a program loaded into another FP, the output file is not closed, */
/* as the close command takes over 12 seconds!!!!! The change in */
/* I/O packages took one evening of time, but the result is that the */
 






/* This is to buffer the output. This will give an increase of 17:1 in */
/* the throughput of the system. The fflush will cause the buffer to */
 












/* flag is set to one only at the beginning of the program. The flag */
/* is set so that the program will not close a file when the first program is loaded */





lim=15; /*the address of processor max*/
 
/* this 	section of code creates the name of the temporary file in /tmp*/
 
 








printf("temporary file used: %s\n",stornam);
 








/* The original create opened the temporary file for read only. It was */
/* closed. The following will re-open the temporary file for both */
/* read and write; this is why the new I/O library was not chosen. */
 




/* Prompt for Status load */
 
printf("cyber Ikon simulator, hex mode\n");
 
printf("Do you want to load status? ");
 




















/* The function 'go' is the first level of command;
* it will call itself and reprompt for the processor in
* case of an error. It is also called by 'pros' on receipt














/* If some part of the current page in memory has been modified, save */
/* the current version of the page. Else, trash it! */
 















int *reg; { 
int i,j;
char com[3],string[40];
/* Print prompt and scan for command */
loop:	printf("-- ");
fflush(stdout);
scanf("%s",com);
/* The following routine decides the action to be taken */
 



































/* type in a t to print the current values of the registers */
 































/* type in b X for a breakpoint after X steps of execution */
 











































recreate the register display on the screen without */
/* executing any of the program steps. This is used to */
 





/* execute help command, this is used to aid the user in */ 





/* This will run a trace on the program, printing the register */
/* values every time it encounters a subroutine jump or a return */
 























/* After the statement has been executed, this part of the */
 





/* This will execute the number of program steps corresponding */
 












































/*********************************************************************/
/* jump back to the monitor routine. */
if(quest(reg) == 13) goto loop;
}
 










































































































/* p = print the registers as they now stand. This will fail */



















/* This will run a trace on the compiled program. It will */
 
/* print the registers when it encounters an sr or a jp, effect*/
 

























/* Modify subcommand level, prompts with 'Register?' and */
 























/* If it is a valid input register, its index is put in 'bias' */
 









/* Registers with an index of 3 */
 
case 0: case 1: case 2: case 3: case 4: case 5: case 53:
 


















/* Registers with an index of 15 */
 








/* Registers with an index of 4095 (i.e., the large files) */
 
















/* Registers with an index of 2 */
 



































* indexed registers; there is no need to return to the register prompt 


















/* Enter allows you to enter data, increment or decrement the
 
* indexed register. If the maximum index is exceeded, the monitor
* returns to the 'Register?' level. */
enter(add,reg, limit)
 




































































printf("Hex please\n");
fflush(stdout);
scanf("%s",this); /* Dummy scan */
inhex();
/* General lookup routine returns the index in a table 'chartab'









for(j=0;(r=chartab[i][j]) == item[j] && r != '\0';j++);









/* Print routine prints all the processor registers
 


















if (strcmp(f6,"") != 0)
 
printf(" %s %s%s%s %s%s%s",f6,f7a,f7,f7b,f8a,f8,f8b);
printf(" %s%s%s %s %s %s %s %s\n",f9a,f9,f9b,f10,f11,f12,f13,f14);
 










printf(" "); pnt(reg[e1]); pnt(reg[e0]);
 










printf(" "); pnt(reg[g1]); pnt(reg[g0]);
 

































printf("\nIF0A: IFA: LF0A: LFA:");
 

































/* Called by prireg, prints out a hex number i with leading zeros */
 







/* Store the processor registers on 'status' in the format:
 * index reg[index] --------------- reg[index+9] */
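The comment only names the layout of the status file: a starting index followed by reg[index] through reg[index+9]. The following is a minimal sketch of a writer and a matching reader for that layout. It assumes hexadecimal values (the monitor works in hex) and 16-bit registers. The names store_status and load_status are hypothetical.

```c
#include <stdio.h>

/* Write registers ten per line: the starting index, then reg[index]..reg[index+9]
 * in hex (hex output is an assumption; the listing only names the layout). */
void store_status(FILE *fp, const int *reg, int nregs)
{
    for (int index = 0; index < nregs; index += 10) {
        fprintf(fp, "%d", index);
        for (int k = index; k < index + 10 && k < nregs; k++)
            fprintf(fp, " %x", (unsigned)reg[k] & 0xffffu);
        fputc('\n', fp);
    }
}

/* Read the same layout back; returns how many registers were restored. */
int load_status(FILE *fp, int *reg, int nregs)
{
    int index, count = 0;
    while (fscanf(fp, "%d", &index) == 1) {
        if (index < 0)
            break;                      /* reject a corrupt index */
        for (int k = index; k < index + 10 && k < nregs; k++) {
            unsigned v;
            if (fscanf(fp, "%x", &v) != 1)
                return count;           /* truncated line */
            reg[k] = (int)v;
            count++;
        }
    }
    return count;
}
```

Because every line carries its own starting index, a partly damaged file can still restore the registers on the lines that survive.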
/* Load the processor registers from the file 'status' in the
 

































/* Print the registers according to their index (found in regin)
 













































/* Memory editor, prompts for address then from that computes
 















/* Edit memory in much the same way as the indexed registers */
 
edit(reg,efad) 
int *reg; { 
char verb[5];
 



































































































/* Display memory in readable format, keeps asking for more as long
 




































/* Display the large files, keep displaying a page at a time until
 





























/* This routine will load a pasl-formatted object file into memory */
 
/* Since the close command takes so long, this checks to see if the */
 
/* previously loaded file is different from the requested file. If it */
 
/* is, the previously loaded file is closed and the new file is */
 
/* opened. If it is the same, no files are closed or opened, saving 12*/
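The loader keeps the previously loaded file open and closes it only when a different file is requested, because close was so slow on the host. The following is a minimal sketch of that check using standard I/O. It assumes a 64-character name limit. open_object and its static state are hypothetical stand-ins for the listing's flag variable: a null stream plays the role of flag = 1 on the first call.

```c
#include <stdio.h>
#include <string.h>

static char last_name[64];  /* name of the file currently held open */
static FILE *last_fp;       /* NULL until the first successful open */

/* Return a stream for name, reusing the open one if the name is unchanged. */
FILE *open_object(const char *name)
{
    if (last_fp != NULL && strcmp(name, last_name) == 0) {
        rewind(last_fp);            /* same file: skip the slow close/open */
        return last_fp;
    }
    if (last_fp != NULL)            /* nothing to close on the first call */
        fclose(last_fp);
    last_fp = fopen(name, "r");
    if (last_fp != NULL) {
        strncpy(last_name, name, sizeof last_name - 1);
        last_name[sizeof last_name - 1] = '\0';
    }
    return last_fp;
}
```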
 
























/* the first time through, it would be rather difficult to close the */
/* previously opened file, as one does not exist. If flag = 1, it is */
 









































/*********************************************************************/
/* This routine, if called, will store the page in secondary */
/* storage; this is the page-out routine. */






/* this routine will get the correct starting address for the */
/* beginning of the page. */
{ lseek(tmpfd,(24000L * (long) procnum),0);
/* this routine does the write into secondary storage. */
/* fwrite becomes write(tmpfd,reg,24000); (fdes,bufptr,sizofbuf) */
/* write returns -1 on error, or # of bytes written. */
if (write(tmpfd,reg,24000) < 0) printf("cannot write!\n");
}
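Each simulated FP owns a fixed 24000-byte slot in the swap file, so paging a processor out is a seek to 24000 * procnum followed by one write. The following is a minimal POSIX sketch of that scheme together with a matching page-in. It assumes the same slot size for reads; the read comment in this listing mentions 10000, which may reflect an older size. page_out and page_in are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

#define PAGE_BYTES 24000L  /* bytes per processor slot, as in the listing */

/* Save one processor's page at its slot in the swap file; 0 on success. */
int page_out(int fd, int procnum, const void *page)
{
    if (lseek(fd, (off_t)(PAGE_BYTES * procnum), SEEK_SET) < 0)
        return -1;
    return write(fd, page, (size_t)PAGE_BYTES) == PAGE_BYTES ? 0 : -1;
}

/* Load one processor's page; a short read (slot past end of file) returns -1. */
int page_in(int fd, int procnum, void *page)
{
    if (lseek(fd, (off_t)(PAGE_BYTES * procnum), SEEK_SET) < 0)
        return -1;
    return read(fd, page, (size_t)PAGE_BYTES) == PAGE_BYTES ? 0 : -1;
}
```

Fixed-size slots keep the address arithmetic trivial. The cost is that the swap file grows to cover the highest processor number that has been paged out.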
 
/* This is the routine that loads the "page" into memory. If */
/* the page does not exist, the routine will automatically */
 










/* the new read is done as follows: read(tmpfd,reg,10000) -1 for error
 








/* this is just a way of demonstrating that a page in memory*/
 
/* has been modified. This will speed up the rate of data */
 








/* this function (called nargs) is just to satisfy a fluke in*/
 
/* the UNIX operating system. Because this program is run */
 
/* with separate Instruction and Data space, and UNIX is not */
 
/* quite equipped to handle such things on system calls, this */
 
/* was added to the code to prevent a dump every time that a */
 









* XXXXXX X X XXXXXX XXXX X X XXXX 
* X XX X X XX X X X 
* XXXXX XX XXXXX X X X X 
* X XX X X XXXXXX X 
* X X X X X X X X XX X X 
* XXXXXX X X XXXXXX XXXX X X XX XXXX 
* FIRST PART OF FP EXECUTION ROUTINES 
* CAUTION: CODE SHOULD BE ADDED TO SHIOH.C TO AVOID OVERFLOW ERRORS
* OF THE FORM 'format error exec.o'. THIS IS 11/70 FOR
* 'You can't compile a file that big'.
* The include files here are explained in monh.c; the ones
* used here are of necessity non-initialized.
* This is compiled as 'cc monh. exech.c shioh. helph._ -0' 







/* 'exec' does some overhead work, zeros the output variables and
* updates the cycle count, then calls the actual execution routines





























f8b="";
f9a="";




















/* The preceding are executed whether or not the condition is met
 









/* Execute the ALU portion; check the arit/log bit and call
 * exarit or exlog based on that test */
exalu(reg)
int *reg; {











int *reg; {
f6[0]='a';
car=0;






















































































































































































































































/* Double register add routine simulates a 32 bit integer add
 





















/* Set processor overflow by the conditions described in the manual */
one=two=three=0L;
one=((long) par1<<16) + ((long) par0 & 0x0000ffffL);
two=((long) par3<<16) + ((long) par2 & 0x0000ffffL);
three=one+two+(long) car;
reg[res1] = (int) (three >> 16);
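The double-register add above forms two 32-bit operands from 16-bit register pairs (par1:par0 and par3:par2), adds the carry, and keeps the upper 16 bits in res1. The following is a minimal sketch of the same arithmetic with fixed-width types. It assumes unsigned 16-bit halves and assumes the lower half goes to res0; that store is not visible in this part of the listing. double_add and struct dadd_result are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result of a 32-bit add carried out on 16-bit register halves. */
struct dadd_result {
    uint16_t res0;   /* low 16 bits of the sum (assumed destination) */
    uint16_t res1;   /* high 16 bits of the sum */
    int carry_out;   /* 1 if the 32-bit sum overflowed */
};

/* Add (par1:par0) + (par3:par2) + carry_in, as the simulator does with longs. */
struct dadd_result double_add(uint16_t par0, uint16_t par1,
                              uint16_t par2, uint16_t par3, int carry_in)
{
    uint64_t one = ((uint64_t)par1 << 16) | par0;
    uint64_t two = ((uint64_t)par3 << 16) | par2;
    uint64_t three = one + two + (uint64_t)(carry_in & 1);
    struct dadd_result r;
    r.res0 = (uint16_t)(three & 0xffffu);
    r.res1 = (uint16_t)((three >> 16) & 0xffffu);
    r.carry_out = (int)((three >> 32) & 1u);
    return r;
}
```

Doing the sum in a 64-bit type leaves bit 32 free to report the carry out of the high half. The simulator's use of 32-bit longs cannot do that directly.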
 








































































































































































































































































/* was !=*/ 	 maskt(reg); 
break; 

































































































































/* The following two routines are called by 'excon' to test for
 


















if(reg[icr0]<=reg[indx0]) { conflg=ffff; return; }
if((reg[cmr1]&010)!=0)
 




















/* The following is the same as 'maskt' except that all conditions
 * must be false for 'conflg' to be set, inhibiting execution */
maskf(reg)
 










if((reg[cmr1]&02)!=0) { printf("Sense not simulated\n");
erflg=ffff; }
if((reg[cmr1]&04)!=0)




















if(reg[icr1]!=reg[indx1]) { conflg=0; return; }
/* Conlod checks the conditions of the bus and sets the now register
 * based on its state (and the state of various registers)





































if((reg[mir+2]&02000)!=0) { f2[2]='s'; reg[past]=reg[now]; }
}
 
/* Execute the multiply. 'mulflg' contains the value of the
 * previous multiply and 'merflg' is set if the current
 * multiply is not the same as the last. Results are not placed














if(reg[mulflg]!=0) {reg[mulflg]=0; merflg=ffff; break; }
 
=








if(reg[mulflg]!=1) {reg[mulflg]=1; merflg=ffff; break; }
 
=








if(reg[mulflg]!=2) {reg[mulflg]=2; merflg=ffff; break; }
reg[mult]=(reg[q]&0377) * (reg[p]&0377);
break;
 




if(reg[mulflg]!=3) {reg[mulflg]=3; merflg=ffff; break; }
reg[mult]=(reg[p]&0377) * ((reg[q]>>8)&0377);
break;
 






if(reg[mulflg]!=5) {reg[mulflg]=5; merflg=ffff; break; }
reg[mult]=((reg[p]>>8)&0377) * ((reg[p]>>8)&0377);
break;
case 6: case 9:
f5="puql";
if(reg[mulflg]!=6) {reg[mulflg]=6; merflg=ffff; break; }
 








if(reg[mulflg]!=7) {reg[mulflg]=7; merflg=ffff; break; }








if(reg[mulflg]!=10) {reg[mulflg]=10; merflg=ffff; break; }
 
=




case 11: case 14:
 
f5="qlqu"; 
if(reg[mulflg]!=11) {reg[mulflg]=11; merflg=ffff; break; }
 
=







if(reg[mulflg]!=15) {reg[mulflg]=15; merflg=ffff; break; }
 
=
reg[mult]=((reg[q]>>8)&0377) * ((reg[q]>>8)&0377);
break;
}
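Each multiply case picks one 8-bit byte of P or Q for each operand. In the listing, case 2 multiplies the low bytes of Q and P, case 3 the low byte of P by the high byte of Q, case 5 the high byte of P by itself, cases 6 and 9 are both labelled "puql", and case 15 squares the high byte of Q. These cases are consistent with a case number of 4*a + b, where a and b select 0 = P low, 1 = P high, 2 = Q low, 3 = Q high. That encoding is an inference from the visible cases, not something the listing documents. The following is a sketch under that assumption; fp_multiply and select_byte are hypothetical.

```c
#include <stdint.h>

/* Return one 8-bit operand: sel 0 = P low, 1 = P high, 2 = Q low, 3 = Q high. */
static unsigned select_byte(uint16_t p, uint16_t q, unsigned sel)
{
    uint16_t src = (sel < 2) ? p : q;
    return (sel & 1u) ? ((src >> 8) & 0377u) : (src & 0377u);
}

/* Hypothetical decoding of the multiply selector: mode = 4*a + b. */
unsigned fp_multiply(uint16_t p, uint16_t q, unsigned mode)
{
    unsigned a = (mode >> 2) & 3u;
    unsigned b = mode & 3u;
    return select_byte(p, q, a) * select_byte(p, q, b);   /* fits in 16 bits */
}
```

Since the product is commutative, mode pairs such as 6 and 9 or 11 and 14 give the same result. This matches the listing's sharing of one case body between them.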
/* Return jump */
 
exrj(reg) 






















































































































































































/* Execute the main part of the instruction */ 
exmain(reg)
 





























































/* Use input codes to get an index returned from a list of valid
 






















































































































- a1 */
case 8: 
















































/* a0rs - a1 */
case 8: 








































/* a0rz - a1 */
case 8: 




else { s0=b1; s1=srsw(s1,reg); }
break; 











































































/* a1sw - a0 */
case 4: 



































































































/* a1rz - a0 */
case 4: 














































/* Given that there are no special ALU considerations, this
 
* is where sources are determined. ss is the return 





































































































case 19: case 20: case 21: case 22: case 23: case 24: case 25: case 26:
 
















































































































































































/* If the z register is sourced, processor may simulate a four
 
* cycle wait state */









































/* The default case looks up the register index and sources the
 










/* Store the s0 source into the d0 destination */
stord0(reg)
 
int *reg; { 
































































































































































































































































































































































































































































































































































* XXXx X X X XXXX X X XXXx 
* X X X X ) X X, X X 
*xXXx xxxXxx x x x xxxxxx* 
* x x x x x x x I x x 
* 	 X XXXx x x xx x x X Xx 
*XXXX X X X XXXX X XUXX XXXX 
* SHIFT AND I/O EXECUTION ROUTINES


















/* Determine if it's a double or single shift, formulate opcodes
 * and call shift routines accordingly */

































/* This routine executes the double shift and calls 'shcl' to execute
 












if(op!=op1) { printf("Double shift error\n"); erflg=ffff;
return; }

























































































































/* General single shift routine: op contains the opcode
 * of the register to be shifted, a0 and a1 contain
 * the indices of the register and k keys where the














































































































































































































reg[a1]=reg[a1]&0177400;
break;
 







































































































































































































/* DSA read reads from the disk randomly, flags overflow
 * Memory banks are files 'bank0', 'bank1', etc.
 * Four memory banks are simulated; the read reads on all four channels














/* If file nonexistent, flag error */
if((b0ptr=fopen("bank0","r"))==NULL) {
 




/* Scan memory input into z-reg until memad 'reg[s+j]' is matched
 * If the end of the bank is reached, an error is flagged */
for(i=0;i<=reg[s+j];i++) if((fscanf(b0ptr,"%x",&reg[aqzin+j]))==-1) {
 





















































printf("Bank %d read error\n",reg[b+j]);
erflg=ffff; return; }
for(i=0;i<=reg[s+j];i++) if((fscanf(b3ptr,"%x",&reg[aqzin+j]))==-1)



















/* The DSA write simulates a random access memory on the disk
 












/* See if file exists, if not create it and append zeros until
 
* the proper address is reached, then append the value in the 





















/* If the file exists, determine whether the value can be
 
* appended to the end of the file or whether it has to replace 










/* At this point it has been determined that the value must be inserted
 * in an existing file. The file is written onto a dummy file
 * called ..buff and, when the address is reached, the 'aqzin' value

























/* Restore bank0 */
system("mv ..buff bank0");
break;
 
































































/* This routine writes on the disk in 'bank0' when it is indicated
 

















/* This routine writes on the disk in 'bank1' when it is indicated
 


























/* If ptrname is 0, the pointer is set to offset.
 * If ptrname is 1, the pointer is set to its current location
 * plus offset. */
 




/* If ptrname is 3, 4 or 5, the meaning is as above for 0, 1 and 2,
 * except that the offset is multiplied by 512. */
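The ptrname convention above is that of the early UNIX seek call. The line for ptrname 2 is missing from this listing; in that call it sets the pointer to the size of the file plus offset. The following is a minimal sketch of the mapping onto the POSIX lseek call, assuming a modern POSIX host. v6_seek is a hypothetical helper, not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Emulate seek(fd, offset, ptrname): 0/1/2 are relative to start, current
 * position and end of file; 3/4/5 do the same with the offset counted in
 * 512-byte blocks. Returns the new position, or -1 for a bad ptrname. */
off_t v6_seek(int fd, long offset, int ptrname)
{
    static const int whence[3] = { SEEK_SET, SEEK_CUR, SEEK_END };
    if (ptrname < 0 || ptrname > 5)
        return -1;
    if (ptrname >= 3) {
        offset *= 512L;   /* block-relative forms */
        ptrname -= 3;
    }
    return lseek(fd, (off_t)offset, whence[ptrname]);
}
```

The block forms existed because early offsets were 16-bit integers. With a 64-bit off_t the byte forms alone cover every case, so the helper only translates the old arguments.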
 





/* This will create the help files to aid a user with the exec */
 











* XXXXX XX XXXX X XXXX 
* X )( X X X XX X X 
* X X X X XXXX -X X 
* XXX XXXXXX X X X 
* X x xX x x xx x 
* X X X XXXX XXXXX XX" XXXX 
* 
* MICROCODE ASSEMBLER 1:01 9/10/79 
* 
* CAUTION: IF THIS SEEMS NOT TO WORK, CHANCES ARE IT'S YOUR INPUT
* FILE. ANSWER THE FOLLOWING QUESTIONS YES BEFORE
* ATTEMPTING TO 'DEBUG' THE ASSEMBLER.
* 1. Is there a # at the end of the source listing?
* 2. Do all the instructions have the proper number
*    of fields?
* 3. Are there no control characters hidden in the file?
* If this doesn't cure the problem, check the tables to make sure
* you have proper input formats. I could have sworn
* many times that the assembler was bonkers when it was
* my source. This assembler should not have bugs.
* This program is compiled as 'cc pasl.c -IS'
* The first line includes the new I/O package

















/* Globals, line number; table index; error flag; output words;
 










* The following tables are the heart of the assembler. *
* A mnemonic is read in to one of the field inputs and *
* this is "looked up" in one of the following tables. *
* If a match is found, the index of the table at the matching *
* word is returned. In many cases this is the opcode *
* associated with the mnemonic looked up. If it is not, *
* that index is used as an index to another array of integer *
* constants whose values are the proper opcodes. *
* If the input mnemonic is not found, a -1 is returned and *
* an error is flagged. *
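The lookup scheme described above is a linear search that returns an index, optionally mapped through a parallel integer table to get the opcode. The following is a minimal sketch under stated assumptions. The string tables are assumed to end with a null pointer (the listing's integer tables use -1 instead). table_index and opcode_for are hypothetical names, although the calling shape mirrors lookup(rj, rjtab) in rjdec.

```c
#include <string.h>

/* Return the index of item in a NULL-terminated table of mnemonics, or -1. */
int table_index(const char *item, const char *const table[])
{
    for (int i = 0; table[i] != NULL; i++)
        if (strcmp(item, table[i]) == 0)
            return i;
    return -1;
}

/* When the index is not itself the opcode, a parallel table supplies it. */
int opcode_for(const char *item, const char *const table[], const int codes[])
{
    int i = table_index(item, table);
    return (i < 0) ? -1 : codes[i];
}
```

For example, with a hypothetical table {"add", "sub", "nop", NULL} and codes {037, 032, 051}, looking up "nop" yields 051, and an unknown mnemonic yields -1 for the caller to report as an error.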
/*
 











































"fpnl",
"fps",






/* Lookups for opcodes in condition field*/
 
int conint[] {0,0,1,2,2,3,4,4,5,6,6,7,8,8,9,10,10,11,12,12,13,





























































































/* Opcode lookups */
int addint[] {037,032,051,06,06,043,043,023,
054,a,057,033,036,046,020,020,
025,025,046,026};




































"la5",









"n", ","0", " 1 ","21', "Y , "U', "u". a "," 
-1 };
int shcint[] {0,0,1,2,3,4,5,6,7};
 
































































































































































































/* 440 */,
/* 840 */,
/* c40 */,
/* 80 */,
/* 480 */,
/* 880 */,
/* c80 */,
/* 180 */,
/* 580 */,
/* 140 */,
/* 540 */,
/* 940 */,
/* d40 */,
/* 3c0 */,





























/* 700 */,
/* b00 */,


















































































































































int sr1int[] {
/* 0 */,
/* 1 */,
/* 11 */,
/* 21 */,
/* 31 */,
/* 2 */,
/* 12 */,
/* 22 */,
/* 32 */,
/* 6 */,
/* 16 */,
/* 5 */,
/* 15 */,
/* 25 */,










/* 18 */, 
/* 28 */,
 
/* 38 */, 





/* 39 */, 
/* e */, 
 
036 /* 1e */,
076 /* 3e */,
067 /* 37 */,
014 /* c */,
034 /* 1c */,
054 /* 2c */,
074 /* 3c */,
0177777 /* ffff */,
015 /* d */,
035 /* 1d */,
055 /* 2d */,
075 /* 3d */,
0177777 /* ffff */,
057 /* 2f */,
04 /* 4 */,
012 /* a */,
032 /* 1a */,
052 /* 2a */,
072 /* 3a */,
027 /* 17 */,
047 /* 27 */,
013 /* b */,
033 /* 1b */,
053 /* 2b */,
073 /* 3b */,
024 /* 14 */,
044 /* 24 */,
05 /* 5 */,
025 /* 15 */,
045 /* 25 */,
065 /* 35 */,













































-1 };
int d0int[] {
00 /* 0 */,
0400 /* 100 */,
04200 /* 880 */,
024200 /* 2880 */,
014200 /* 1880 */,
034200 /* 3880 */,
020200 /* 2080 */,
030200 /* 3080 */,
010200 /* 1080 */,
023200 /* 2680 */,
027200 /* 2e80 */,
037200 /* 3e80 */,
03000 /* 600 */,
07000 /* e00 */,
013000 /* 1600 */,
017000 /* 1e00 */,
027000 /* 2e00 */,
035600 /* 3b80 */,
03400 /* 700 */,
021600 /* 2380 */,
025600 /* 2b80 */,
031600 /* 3380 */,
01000 /* 200 */,
05000 /* a00 */,
015000 /* 1a00 */,
025000 /* 2a00 */,
035000 /* 3a00 */,
021000 /* 2200 */,
031000 /* 3200 */,
011000 /* 1200 */,
02400 /* 500 */,
06400 /* d00 */,
012400 /* 1500 */,
016400 /* 1d00 */,
026400 /* 2d00 */,
032400 /* 3500 */,
036400 /* 3d00 */,
026000 /* 2c00 */, -1 };
/* Bus one destination codes */
 

















































































































/* d */,
/* 1d */,
/* 2d */,
/* 3d */,
/* 11 */,
/* 51 */,
/* 71 */,
/* 31 */,
/* 41 */,
/* 61 */,
/* 21 */,
/* 4f */,
/* 5f */,
/* 6f */,
/* 7f */,
/* f */,
/* 1f */,
/* 2f */,
/* 3f */,
/* 58 */,
/* c */,
/* 1c */,
/* 2c */,
/* 3c */,
/* 5c */,
/* 4e */,
/* 5e */,
/* 6e */,
/* 7e */,
/* 17 */,
/* 27 */,
/* 37 */,
0107 /* 47 */,
0127 /* 57 */,
0147 /* 67 */,
 




0124 /* 54 */,
 
0164 /* 74 */,
024 /* 14 */,
 
064 /* 34 */,
 
0104 /* 44 */,
 
0144 /* 64 */,
 




032 /* 1a */,
052 /* 2a */,
072 /* 3a */,
0132 /* 5a */,
0152 /* 6a */,
 


















printf("Cannot open source file\n");
return; }











/* Action is based on first character in line
 
* # says end of file, go to pass two 
* * says comment 
* space or tab says instruction, no label
* anything else is a label */ 
switch(name[O]) {
 



































/* Default to build the label from the first characters (until
 















































/* Check to see if name is in table */





































printf(" Pass one complete:\n\n");
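Pass one decides what each source line is from its first character: '#' ends the source, '*' marks a comment, a blank or tab starts an unlabeled instruction, and anything else begins a label. The following is a minimal sketch of that dispatch. The enum and the name classify_line are hypothetical, and label extraction is omitted because its exact rule is truncated in this listing.

```c
/* Kinds of source line recognized by the first-character test in pass one. */
enum line_kind { END_OF_SOURCE, COMMENT, INSTRUCTION, LABELED };

/* Classify one source line by its first character. */
enum line_kind classify_line(const char *line)
{
    switch (line[0]) {
    case '#':
        return END_OF_SOURCE;   /* go to pass two */
    case '*':
        return COMMENT;
    case ' ':
    case '\t':
        return INSTRUCTION;     /* instruction with no label */
    default:
        return LABELED;         /* label precedes the instruction */
    }
}
```

This is also why the assembler's caution asks whether there is a # at the end of the source listing: without it, pass one never reaches pass two.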
 


































































































































/* General table lookup routine
 *
 * This routine returns the index of the item in the
 































































out0=(iopdec(iop)<<12) | (car<<14) | (iodec(ch3)<<9) | (iodec(ch2)<<6)
| (iodec(ch1)<<3) | iodec(ch0);
fprintf(objptr,"%x %x %x\n",out0,out1,out2);
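The first output word packs the decoded fields side by side: four 3-bit channel codes in bits 0 to 11, the I/O operation at bit 12, and car at bit 14. The following is a minimal sketch of that packing and of recovering one channel. It assumes the field widths implied by the shift amounts (3 bits per channel, 2 bits each for the I/O op and car); the masks are not in the listing. pack_out0 and unpack_channel are hypothetical.

```c
#include <stdint.h>

/* Pack already-decoded fields into the first output word. Widths are inferred
 * from the shift amounts; masks keep a bad code from spilling into a neighbor. */
uint16_t pack_out0(unsigned iop, unsigned car, unsigned ch3, unsigned ch2,
                   unsigned ch1, unsigned ch0)
{
    return (uint16_t)(((car & 03u) << 14) | ((iop & 03u) << 12) |
                      ((ch3 & 07u) << 9) | ((ch2 & 07u) << 6) |
                      ((ch1 & 07u) << 3) | (ch0 & 07u));
}

/* Recover channel field n (0..3) from a packed word. */
unsigned unpack_channel(uint16_t out0, int n)
{
    return (out0 >> (3 * n)) & 07u;
}
```

Masking each field is a defensive choice the original does not appear to make. Without it, an out-of-range decode value would silently corrupt the adjacent field.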
 












/* Determine if constant is hex or a label
 * $ means hex number but the $ itself must be delimited











if (fscanf(srcptr,"%x",&out1)==0)


































/* Decode the ALU; carry log/arit returned if car; function number in
 













/* See if item add[] is in the addtab, it will be
 











/* Not a special mnemonic, so decode in two parts, car-log/arit










printf("Invalid ALU field Line %d\n",lnum);









printf("Invalid ALU field Line %d\n",lnum);
}
 
/* General shift decode function
 
* Takes shop, the input character mnemonic and 





























printf("Invalid SHIFT field Line %d\n",lnum);
 





















printf("Invalid SHIFT field Line %d\n",lnum);
}
 



























printf("Invalid CND field Line %d\n",lnum);
 
















printf("Invalid IDX field Line %d\n",lnum);
}
 
/* Decode return jump field */
rjdec(rj)
char *rj; {
int ret;
if((rj[0]=='*')&&(rj[1]=='\0')) return(0);
if((ret=lookup(rj,rjtab)) != -1) 
return(ret); 
else 




















printf("Invalid MULT field Line %d\n",lnum);
}
 
















printf("Invalid IOP field Line %d\n",lnum);
}
 
















printf("Invalid IO field Line %d\n",lnum);
 
























printf("Invalid SOURCE (Bus 0) field Line %d\n",lnum);
else 



















printf("Invalid SOURCE (Bus 1) field Line %d\n",lnum);
 








































printf("Invalid DESTINATION (Bus 0) field Line %d\n",lnum);
 





















IMPLEMENTATION OF THE MAXIMUM LIKELIHOOD CLASSIFIER
ON A FLEXIBLE PROCESSOR
A. Initialization of the FPs by the Host Computer
B. Interrupt Routine for Flexible Processor
C. Overview of Maximum Likelihood Classifier Flexible Processor Algorithm
D. Flowchart of Floating Point Addition Routine
E. Flowchart of Floating Point Multiplication Routine
F. Flowchart of Floating Point Compare Routine
G. Actual Flexible Processor Program for Maximum Likelihood Classification
A. INITIALIZATION OF THE FPs BY THE HOST COMPUTER
1. Initialize memory to zero.
2. Send FP size, p, sigma, X, and U.
3. Calculate det(sigma).
4. Calculate inv(sigma).
5. Calculate ln(det(sigma)).
6. Calculate ln(p).
7. Send FP ln(p), ln(det(sigma)).
8. Send FP inv(sigma).
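Steps 3 and 4 leave the matrix work to the host. The following is a minimal sketch of how det(sigma) and inv(sigma) can be obtained together by Gauss-Jordan elimination with partial pivoting. This is a standard technique used here for illustration, not necessarily the method the host program used. NMAX and invert are hypothetical. The logarithms in steps 5 to 7 would then be log(det) and log(p) from the C math library.

```c
#define NMAX 8  /* hypothetical maximum number of channels */

static double dabs(double x) { return x < 0 ? -x : x; }

/* Invert the n-by-n matrix a into inv by Gauss-Jordan elimination with
 * partial pivoting. Returns det(a), or 0.0 if a is singular. a is overwritten. */
double invert(int n, double a[NMAX][NMAX], double inv[NMAX][NMAX])
{
    double det = 1.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            inv[i][j] = (i == j) ? 1.0 : 0.0;
    for (int c = 0; c < n; c++) {
        int piv = c;                       /* choose the largest pivot */
        for (int r = c + 1; r < n; r++)
            if (dabs(a[r][c]) > dabs(a[piv][c]))
                piv = r;
        if (a[piv][c] == 0.0)
            return 0.0;
        if (piv != c) {                    /* a row swap flips the sign of det */
            for (int j = 0; j < n; j++) {
                double t = a[c][j]; a[c][j] = a[piv][j]; a[piv][j] = t;
                t = inv[c][j]; inv[c][j] = inv[piv][j]; inv[piv][j] = t;
            }
            det = -det;
        }
        double d = a[c][c];
        det *= d;                          /* det is the product of pivots */
        for (int j = 0; j < n; j++) {
            a[c][j] /= d;
            inv[c][j] /= d;
        }
        for (int r = 0; r < n; r++) {      /* clear column c in other rows */
            if (r == c)
                continue;
            double f = a[r][c];
            for (int j = 0; j < n; j++) {
                a[r][j] -= f * a[c][j];
                inv[r][j] -= f * inv[c][j];
            }
        }
    }
    return det;
}
```

Computing the determinant as the product of pivots costs nothing extra once the inverse is being formed. That suits a host that must send both quantities to every FP.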
 



































C.1. Load Data Into FP

1) Zero all registers. This includes all index registers, index compare registers, large file
address registers, maintenance compare registers
 




2) Read the first number and store it in register F.

3) Copy the number stored in the F register
 
into the index compare registers number 0 and 1.
 
(This number is the dimension of sigma.)
 
4) Load all conditions. (This means that the
 
index compare registers are going to test for
 
equality to n.) Index register three will check
 
for equality to zero.
 
5) Test and increment index register three.
If it is not equal to zero, read a number and load it
into the F register.
6) Move the F register to temporary file zero
while incrementing the write counter.
 




8) Zero all index registers while moving n to
the P register of the multiply while trapping interrupts.
(This can be done using the "sr" command.)
9) With interrupts trapped, move multiply
output to condition register 2. (This means that
 
the condition registers are now set to check for
 
Index registers 0 and 1 equal to n, index regis­




10) Test and increment index register 2. If it
 
is n squared, exit.
 
11) Read a number and store it in large file
zero, while simultaneously incrementing its address buffer.
 










[Figure: FP memory layout with two columns, Temporary Files and Large Files. Recoverable labels: vector for X (X[1,2], ...); Sigma (..., Sigma[n,1]); mean vector for class one (U[1,1], U[1,2], U[1,3], U[1,4]).]




















C.3. First Matrix Multiplication
 
1) Initialize all registers. Move 1 to the
read address of temporary file zero. Zero all other index registers, large file addresses, and temporary file addresses.
 
2) Move temporary file 0 to the E register
 
(while incrementing the read address pointer.)
 
3) Move large file 0 to the G register (while
incrementing the address pointer.)
 
4) Call floating point multiply routine.
 
5) Store result in temporary file 1, while
 
increasing the write pointer.
 








8) Increment index reg 0.
 
9) Go to step 1.
 
10) Increment index 1 by 1.

11) Zero F register. (This is used as the accumulator for the floating point add.)
 
12) Zero index register 0.
 
13) Zero temporary file 1 read address.
 




15) Call the floating point add subroutine (40 cycles). (This routine has been modified to increment the temporary file 1 read pointer as it goes along, so a separate increment is not necessary.)

16) Go to step 14.

17) Set the temporary file 0 pointer to 1.

18) Store F in large file 1 (while increment-








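Taken together, steps 1-18 compute one element of the product of sigma and the vector on each pass. The n products are parked in temporary file 1, summed into the F register, and the total is written to large file 1. The Python sketch below assumes that sigma is stored row by row, so that row i occupies positions i*n through i*n + n - 1. The function name is hypothetical, and ordinary Python floats stand in for the Flexible Processor's software floating point.

```python
from typing import List


def first_multiply(n: int, x: List[float], sigma: List[float]) -> List[float]:
    """Sketch of C.3: y = sigma * x, one output element per pass.

    sigma is assumed to be stored row by row (row i is sigma[i*n:(i+1)*n]).
    """
    large_file1 = []
    for i in range(n):
        # steps 2-5: form the n products and park them in "temporary file 1"
        temp_file1 = [x[j] * sigma[i * n + j] for j in range(n)]
        acc = 0.0  # step 11: the F register is zeroed as the accumulator
        for product in temp_file1:  # steps 14-16: repeated floating point adds
            acc += product
        large_file1.append(acc)  # step 18: store F in large file 1
    return large_file1
```

For example, `first_multiply(2, [1.0, 2.0], [1.0, 2.0, 3.0, 4.0])` gives `[5.0, 11.0]`.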
C.4. Second Matrix Multiplication
 
1) Zero all pointers to the large files and the temporary file 1 address.

2) Write a 0 to temporary file address 0.

3) Transfer temporary file zero, memory location 0, to index register 3.


5) While incrementing the pointer to temporary file 0, move the contents to the E register.

6) While incrementing the pointer to large file 1, move the contents to the G register.
 




8) Call the floating point add routine.
 
9) Send the result to temporary file 1.
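The second multiplication pairs the vector in temporary file 0 with the result of the first multiplication in large file 1. Read this way, the two passes together evaluate the quadratic form x^T sigma x. The sketch below follows that reading. Step 7 is missing from this copy, so treating it as the multiply of E by G is an assumption, and the function name is hypothetical.

```python
from typing import List


def second_multiply(x: List[float], y: List[float]) -> float:
    """Sketch of C.4: accumulate x[i] * y[i], where y = sigma * x.

    The result is the quadratic form x^T sigma x.
    """
    acc = 0.0  # running sum kept in temporary file 1
    for xi, yi in zip(x, y):
        acc += xi * yi  # assumed step 7 multiply (E x G), then step 8 add
    return acc
```

With `x = [1.0, 2.0]` and `y = [5.0, 11.0]` (the output of the first multiplication for sigma = [[1, 2], [3, 4]]), the result is `27.0`.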
 








D. FLOATING POINT ADDITION ROUTINE
 























































E. FLOATING POINT MULTIPLICATION ROUTINE 





[Flowchart not reproduced; legible fragments include "F0 = product", "F1 = sum" and "strip sign off E and G".]
 
G. Actual Flexible Processor Program for Maximum Likelihood Classification
 
org 001a
* Average time: 612 cycles per pixel (down 10%)
* Min time: 21e cycles per pixel (down 30%)
*****BAYES MAXIMUM LIKELIHOOD CLASSIFIER VER. 021580 2:50****

*****For TWO PIXELS .......................................

tc * * * * fplr mar nop
tc * * * * $ 0000 *
* First interrupt routine. This routine handles the interrupt *
* to load the covariance matrices, the mean vectors *
* and the data vector. *

org 001e
tc * * * * vinr mar nop
tc * * * * $ 0000 * *
* This routine will handle the interrupt when the host just *
 




tc * * * * $ 0000 * *

* These values are to be loaded into compare register 3. *
* These will test the respective registers for inequality to *
* their compare registers. *

tc * cla * * $ 000a t0wa t1wa
tc * * * * $ 0001 tf0n tf1n
*************************************************************
* This will clear all of the index registers and zero the *
* temporary file write addresses. *
*************************************************************
 
tc * * * * $ 0000 nop cmr0
* This will clear the temporary file 0 read address and the *
* condition register to prevent spurious results. *
tc * * * * $ 0000 nop cmr2
* This will zero the other condition register and the temp *
* file read address. The dimension of the incoming data is *
* assumed to be 4X4. If thematic mapper data is to be used, *
* the matrix will be five by five. *
 




* This will store N in the index compare registers. *
tr * * * * * nop nop f0 icr1
tc * * * * $ 0010 nop icr2
* This is just setting up the counter variables for the loop. *
wait tc * * * * wait mar nop
tc * * * * $ 0000 brg0 brg1
tr * * * pupu * mult e0 * *
* The host will start execution at 100 and wait here for the *
* host to interrupt the FP, at which point the FP will do a *
* program jump to $0007, where there will be a jump to the *
* correct routine. *
* This is the wait routine, which waits for an interrupt. *
*****************************Begin Routine*****************************
fpmr tr * * * * e a1 e0 a0 q
* Load multiplicand *
*************************************************
tr * * * * g a0 p a1 g0
* Load multiplier *
tc * * * * $ 0004 * cmr0
* This condition will check to see if f0 < 0. *
tc * * * * $ 0002 * cmr3
* Is index0 = compare register 0? *
*************************************************
tr * cl0 * puqu add a1 * a0 f1
tr * * * puqu * mult f0 f1 e1
tr * * * * * * * f0 icr0
* If the value returned is zero, zero both registers and return. *
tc ad * jp * $ 0000 f0 f1
tc ad * df * $ 0000 * *
* If f0 is justified, return; the product is normalized. *
*************************************************
tc * * * * $ 0000 * cmr1
tc tnn * jp * $ 80ff g1 *
 







* Save the exponent in g1; clear e1 for a counter. *
tr * * * * zro f1 g1 a0 e1
tr * cl0 * * zro a0 e0 * *
* By here, the product cannot be zero. The normalization process *
* will take fewer than four repeats of this loop. If it ever *
* takes more, something is branching directly to this *
* process. *
*************************************************************
nrm tr fnn in0 * * e+1 a0 e0 a1 *
sh fnn * * * * nzin lzin nzin s
*************************************************************
* By now, the result must be normalized. *
*************************************************************
tr * * * * g a0 * a1 f1
tr * * * * e a0 g1 a1 g0
tr * * * * * f0 e0 f1 e1
sh * * * * * lcir lcir lzin s
tr * * * * e-g a0 e0 a1 e1
*************************************************************
* This will take the normalized result, shift it left, and adjust *
* the exponent so that it agrees with the mantissa. *
*************************************************************
tc * * * * $ 01ff g1 *
tc * * * * $ ffff * g0
tr * * jp * acb a0 * a1 f1
* This will "mask off" any carries into the unused portion of *
* the exponent. *
*************************************************************
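The fpmr routine multiplies the mantissas in the hardware multiplier, adds the exponents, renormalizes by shifting left, and masks the exponent. This listing shows only part of the exact bit layout of the Flexible Processor's numbers (for example the $80ff and $01ff masks). The Python below therefore uses a simplified stand-in format: a sign flag, a 16-bit unsigned mantissa normalized so that bit 15 is set, and a signed exponent, with value (-1)^sign * mantissa * 2^(exponent - 16). It sketches the technique and is not a bit-exact model of fpmr. All names are hypothetical.

```python
import math
from typing import Tuple

MANT_BITS = 16
TOP_BIT = 1 << (MANT_BITS - 1)  # a normalized mantissa has bit 15 set

# Stand-in format (an assumption): (sign, mantissa, exponent) with
# value = (-1)**sign * mantissa * 2**(exponent - 16)
Soft = Tuple[int, int, int]
ZERO: Soft = (0, 0, 0)


def from_float(v: float) -> Soft:
    """Convert a Python float to the stand-in format (truncating)."""
    if v == 0.0:
        return ZERO
    frac, exp = math.frexp(abs(v))  # abs(v) = frac * 2**exp, 0.5 <= frac < 1
    return (1 if v < 0 else 0, int(frac * (1 << MANT_BITS)), exp)


def to_float(a: Soft) -> float:
    s, m, e = a
    return (-1.0) ** s * m * 2.0 ** (e - MANT_BITS)


def fp_mul(a: Soft, b: Soft) -> Soft:
    """Multiply two normalized values, keeping the high 16 product bits."""
    sa, ma, ea = a
    sb, mb, eb = b
    if ma == 0 or mb == 0:
        return ZERO                          # zero operand: return zero
    prod = ma * mb                           # 32-bit product of the mantissas
    e = ea + eb                              # exponents add
    while prod < (TOP_BIT << MANT_BITS):     # normalize: shift left until bit 31 is set
        prod <<= 1
        e -= 1
    return (sa ^ sb, prod >> MANT_BITS, e)   # sign is the XOR of the input signs
```

For example, `to_float(fp_mul(from_float(3.0), from_float(5.0)))` returns `15.0`. With normalized inputs the 32-bit product is at least 2^30, so the normalization loop runs at most once.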





* This routine does the initial setup of the variables. *
*************************************************************
fplr tc * cl2 * * $ 0000 l0ad l1ad
* This clears all the index registers and the large file write *
* pointers. *
*************************************************************
tc * * * * $ 0100 nop icr2
tc * * * * $ 0010 nop cmr3
*************************************************************
* The 0010 tests for index2 <> its compare. *
* This will load the compare register to check for index *
* register equal to its stored value. *
*************************************************************
* This loads the temporary file with the original location of the *
* mean vector. *
tc * * * * $ 0000 f0 *
io * * * * * f0 * ds lsb ****
* This loads the bank and address location of the covariance *
* matrix. *
*****************************End Routine*****************************

* This routine loads the covariance matrix.
imr io * in2 * * * * * ds r ** *
tr * * * * * z0 f0 z1 f1
* This loads the mantissa into the f1 register *
* and the exponent into the f0 register. *
tc ad * * * imr mar nop






* This is the routine that loads the mean vectors.
* This routine loads all 16 mean vectors at once.
*******************************************************

mnr tc * cl2 * * $ 0010 nop icr3
imnr tr * in2 * * * z0 f0 z1 f1
* This does the I/O call and loads the number into the f0-f1 *
* register pair. *
* This does the I/O call for the next number. *
sh * * * * * * ln31 *
* This shifts the mean vector to the left and negates the *
* sign bit. *
*************************************************************
sh * * * * * * tcir * s
* This negates the sign of the mean vector and shifts it into *
* the sign position. This is done because the vector [illegible] *
* is in [illegible] form. This way, an addition to the vector will *
* actually perform the operation of subtracting the vector *
* from the addend. *
io * * * * * * * ds r ** *
* This does the I/O call for the next number (if there is one). *
*************************************************************
tc ad * * * imnr mar nop
tr * * * * * f0 lf0u f1 lf1u
* Load the next element in the vector, and store the new value. *
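Storing the mean vector with its sign negated lets the classifier form x - mean with the floating point addition routine alone. The small sketch below assumes a 16-bit sign-magnitude word with the sign in bit 15. The listing's own format may place the sign differently, and the helper names are hypothetical.

```python
from typing import List, Sequence

SIGN_BIT = 0x8000  # assumed sign position in a 16-bit sign-magnitude word


def negate_sign_magnitude(word: int) -> int:
    """Negate a 16-bit sign-magnitude word by flipping its sign bit."""
    return (word ^ SIGN_BIT) & 0xFFFF


def centered(x: Sequence[float], neg_mean: Sequence[float]) -> List[float]:
    """Form x - mean by addition, since the stored mean is already negated."""
    return [xi + mi for xi, mi in zip(x, neg_mean)]
```

For example, with the mean [1.0, 3.0] stored as [-1.0, -3.0], `centered([5.0, 2.0], [-1.0, -3.0])` gives `[4.0, -1.0]`.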
 




*********************************Begin Routine*********************************
* This routine loads and normalizes the data vector. It can *
* be called to execute by itself. *
*************************************************************
tc * * * * $ 0004 * icr3
vinr tc * * * * $ 0040 nop cmr3
* The 0040 tests for index 3 <> its compare (4 elements of *
* the data vector). *
*************************************************************
tc * * * * $ 000a t0wa t1wa
tc * cl3 * * $ 0151 f0 nop
io * * * * * f0 * ds lsb ** *
* This initializes the location of the read pointer, *
* and this initializes the first read. *
io * * * * * * * ds r ** *
* This sets the large file hold registers to a minimum value. *
*************************************************************
tc * * * * $ 014f l0ad l1ad
tc * * * * $ 807f * f1
tc * * * * $ ffff f0 *
tr * * * * * f0 lf0u f1 lf1u
tr * * * * * f0 lf0u f1 lf1u
 
loip tr * in3 * * * z0 tf0u z1 tf1u
*************************************************************
* This loads the mantissa and exponent into the temp file. *
*************************************************************
* The following does the I/O call.
io * * * * * * * ds r ** *
* This is executed before the jump, and it will load the needed data into
* the z0-z1 register pair before it is needed, eliminating a two cycle





tc ad * * * loip mar nop
tc * * * * $ 0000 brg0 brg1
tc * * * * $ 0006 t0wa t1wa
tc * * * * $ 0000 tf0n tf1n
*
* The 0 in location 6 of the temporary files is a cycle counter.
* It keeps track of the class currently being worked upon.
*
* After normalizing the data vector, store it; repeat until all
* the elements are finished, then repeat the cycle until all four elements
* are finished being processed.
*
tc * * * * $ 000a t0ra t1ra
tc * cl3 * * $ 0002 t0wa t1wa
tc * * * * $ 0151 l0ad l1ad
lolp tr * in3 * * * tf0u brg0 tf1u brg1
tc * * sr * fpar mar *
tr * * * * * tf0u f0 tf1u f1
tc * * * * $ 0040 * cmr3
tc ad * * * lolp mar *
tr * * * * * f0 tf0u f1 tf1u
tc * cl3 * * $ 000a t0ra t1ra
* This stores the normalized data vector in locations 2-5 of the
* temporary file. The second vector will appear in locations 6-9 of
* the temporary file.
* This will fall through to the matrix processing routine.
* This is the beginning of the matrix multiply routine.
stra tc * cla * * $ 0000 brg0 brg1
tc * * * * $ 0006 t0ra t1ra
tr * * * * * tf0n p * *
tc * * * * $ 0010 * q
tc * * * plql $ 0000 * nop
tr * * * plql * mult l0ad mult l1ad
tc * * * * $ 0001 t0ba t1ba
tc * * * * $ 0001 tf0u tf1u
tc * * * * $ 0002 t0ba t1ba
mlty tr * in1 * * * tf0u e0 tf1u e1
* This loads the multiplicand into the e0-e1 register pair.
tc * * sr * fpmr mar nop
* This does the program jump to the floating point multiply routine.
tr * * * * * lf1u g1 lf0u g0
* This step is done before the jump is actually executed. It will load the
* multiplier into the g0-g1 register pair (F = E x G, floating point multiply).
tc * * sr * fpar mar nop
* This step will jump to the floating point addition routine. This routine
* calculates the sum of the contents of the F register and the BRG register
* pair. The result of the add is then stored in the F register.
tc * * * * $ 0004 nop icr1
tc * * * * $ 0004 nop cmr3
* The 0004 tests for index1 <> its compare.
* This is executed before the jump. It will just load the condition register
* with the next condition to be tested.
*
tc ad * * * mlty mar nop
tr * * * * * f0 brg0 f1 brg1
*
* On index register 1 not equal to its compare, jump to the beginning of the
* multiply routine.
tc * * * * $ 0001 t0ba t1ba
tr * * * * * tf1n e0 nop nop
* Get the address of the jth item in the data vector.
tr * * * * ac0 a1 tf0d a0 tf1d
tr * * * * ac0 a0 t0ra a0 t1ra
* tr * * * * ac0 a1 * a0 t1ra
* The above was a change to ensure that the program works; it is kept.
* This will update the address for the next round, store it, and point to the
* item in question.
*
tr * in2 * * * tf0c e0 tf1c e1
*
* This will load the multiplier for the second multiply into the e0-e1
* register pair. Simultaneously, this will zero the temp file pointers. They
* will now point to the location of the accumulator.
tr * * * * * bsr0 g1 bsr1 g0
tc * * sr * fpmr mar nop
tr * * * * g a0 g1 a1 g0
* This is just a subroutine jump to the floating point multiply routine.
* F = E x G
tc * * sr * fpar mar nop

* F = F + BRG. This calculates the subtotal of the matrix multiply.
tr * * * * * f0 tf0n f1 tf1n
tc * * * * $ 0002 t0ba t1ba
* The above two steps load the subtotal into temporary file location
* zero. They then reset the read and write pointers of the temporary file to
* location two.
tc * * * * $ 0004 nop icr2
tc * * * * $ 0010 nop cmr3
* The 0010 tests for index2 <> its compare.
* This will do a test for index 0 not equal to its compare register.
tc ad cl1 * * mlty mar nop
tc * * * * $ 0000 brg0 brg1
tr * * * puql * * * mult mcr3
tc * * * * $ 0000 t0ba t1ba
tc * * * * $ 014f l0ad l1ad
* Add precomputed constants (150 cycles, not in code)
* This is the compare routine.
tc * * * * $ 0000 t0ba t1ba
tr * * * * * tf1n g1 tf0n g0
tc * * sr * fcmp mar
tr * * * * * lf0n e0 lf1n e1
* Here, if G > E (f0 = f1 = 0), return tf1[6] as the class (to location 150).
tr * cl0 * * * * * f1 icr0
tc * * * * $ 0002 * cmr3
* This will increment tf0n[6], the pointer to this array.
tc * * * * $ 0006 t0ba t1ba
tr ad * * * * tf0n e0 * *
tc ad cl0 * * chck mar *
tr ad * * * a0 tf0n * *
tr * * * * e a0 lf0u a1 lf1u
tr * * * * * tf0n lf0u tf1n lf1u
tr * * * * * tf0n e0 *
tr * * * * e+1 a0 tf0n * *
chck tc * * * * $ 0010 * icr0
tc * * * * $ 0001 * cmr3
tc * * * * $ 0004 p
tr * * * * * * * tf0n q
tc * * * plql $ 0151 * g0
tr * * * plql * mult e0 tf0n idx0
tr * * * * add a0 l0ad a1 *
tr * * * * add a1 * a0 l1ad
tc * * * * $ 0000 t0ba t1ba

tc * * * * $ 0001 tf0u tf1u
tc adn * * * lolp mar *
tc * cl0 * * $ 0000 e0 e1
finl tc * * * * $ 0150 l0ad l1ad
otpt io * * * * * * lf0d ds w ****
io * * * * * lf0d * ds w *
io * * * * * * lf0d ds w ****
io * * * * * lf0d * ds w *
tr * * * pupu * mult brg0 * *
tc * * * * wait mar nop
*****************************End Routine*****************************
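Without the register traffic, the two-pixel program does three things for each class. It forms a quadratic form in (x - mean), adds precomputed per-class constants (outside this listing), and keeps the class with the largest result. The sketch below shows the standard Gaussian maximum likelihood discriminant in that shape. It assumes that the stored matrix is the inverse covariance matrix and that each class constant is ln p(class) - 0.5 ln|Sigma|. The listing only says "precomputed constants", so both are assumptions, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def discriminant(x: Sequence[float], neg_mean: Sequence[float],
                 sigma_inv: Matrix, const: float) -> float:
    """g(x) = const - 0.5 * d^T sigma_inv d, with d = x - mean.

    neg_mean holds the negated mean, so d is formed by addition.
    """
    n = len(x)
    d = [x[i] + neg_mean[i] for i in range(n)]                             # x - mean
    y = [sum(sigma_inv[i][k] * d[k] for k in range(n)) for i in range(n)]  # first multiply
    q = sum(d[i] * y[i] for i in range(n))                                 # second multiply
    return const - 0.5 * q


def classify(x: Sequence[float],
             classes: Sequence[Tuple[Sequence[float], Matrix, float]]) -> int:
    """Return the index of the class with the largest discriminant."""
    best_class, best_value = -1, float("-inf")
    for idx, (neg_mean, sigma_inv, const) in enumerate(classes):
        value = discriminant(x, neg_mean, sigma_inv, const)
        if value > best_value:
            best_class, best_value = idx, value
    return best_class
```

With two identity-covariance classes centred at (0, 0) and (10, 10), the point (9, 9) is assigned to class 1 and the point (1, 0) to class 0.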
 
* This is the floating point addition routine. 9/4/79. 3:45:00.

fpar tr * * * * * bsr1 g1 f1 e1
sh uns cl0 * * * lzin nzin lzin s
sh * * * * * rzin nzin rzin s
* This will strip the sign of the mantissa and save it for future use.
*
tr * * * * * * * bsr0 icr0
tc tnn * * * $ 0000 * cmr0
tc tnn * * * $ 0010 * cmr1
tr tnn * jp * * * * * *
tr tnn * df * * * * * *
tr * * * * zro a0 * a1 cmr1
*
* This will compare the BRG to zero; if it is zero, return.
*
tr * * * * zro a0 e0 a1 g0
* This will zero the registers to prevent spurious results.
tr * * * * e-g a0 nop a1 icr0
* If |e| < |g|, the program will reverse the numbers and continue.
* Since addition is commutative, this does not affect the result.
*
tr * * * * xor a1 e0 a0 nop
tc * * * * $ 0080 * g0
tr * * * * and a1 * a0 g0
tc * * * * $ 8000 e0 *
tr * * * * e-g a1 * a0 g0
tc * * * * $ 0010 * cmr0
tc fnn * * * nsh mar *
* If the exponent on one of the two numbers is less than zero
* and the other is not, subtracting the exponents to get the number of
* shifts will not give the correct answer, so special handling
* must be added to compensate for this problem. The way that this
* routine handles the problem is to exclusive-OR the two numbers
* together and then strip off everything but the sign bit. This
* is then subtracted from a constant (for speed). The constant is
* 8000, so if there is a 1 in the sign position, the result will
* not be negative, indicating that the correction must take place.
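The sign-difference test described above can be checked in isolation. The minimal sketch below works on 16-bit words. The `sign_bit` parameter is included because the code masks with $0080 rather than $8000, so which bit holds the exponent sign is an assumption. The function name is hypothetical.

```python
WORD_MASK = 0xFFFF


def signs_differ(a: int, b: int, sign_bit: int = 0x8000) -> bool:
    """XOR the words, keep only the chosen sign bit, subtract from 8000 (hex).

    The 16-bit difference is "not negative" (bit 15 clear) exactly when
    the two sign bits differed, so the correction must take place.
    """
    stripped = (a ^ b) & sign_bit
    result = (0x8000 - stripped) & WORD_MASK
    return (result & 0x8000) == 0
```

When the signs agree, `stripped` is 0 and the result is 8000 (negative as a 16-bit value). When they differ, the result is 0 (or 7f80 with the $0080 mask), which is not negative.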
tc tnn * * * $ 0020 * cmr0
tr * * * * e-g a1 g1 *
* This will test for E - G negative. This is to ensure that |brg| >=
* |f|, simplifying the algorithm greatly.
tc tnn * * * swap mar nop
* This invokes the swap routine that will force the above to be true.
tr * * * * zro * * a0 cmr0
tc * * * * $ 0010 * cmr1
* The zeroes that are loaded into condition mask register 0 tell
* the machine not to check for any of the conditions represented.
* The 0010 loaded into cmr1 tells the machine to check for the compare
* register greater than index register one. In this case, this will
* determine whether the two numbers are equal, or equal in magnitude
* and opposite in sign.
tr * * * * zro a0 g1 a1 e1
tc tnn * * * equl mar nop
* If they are, the program will jump to a special routine.
* By this point in the program, |e| > |g|.
tr * * * * * bsr0 e0 f0 g0
tc * * * * $ 0010 * idx0
tc * * * * $ 0020 * cmr1
tc tnn * * * rtnf mar nop
* If the number of shifts required > 16, return the value in the
* F register.
tr * * * * zro a1 g1 a0 cmr1
tc * cl0 * * $ 0001 * cmr3
* This loads the data to be processed and programs the CPU to check
* for reg0 <> indx0. This is represented by a one in the first position. This
* check is invoked by the AD command.
*
shft sh * in0 * * * rzin nzin nzin s
tc ad * * * shft mar nop
*
* The index register contains the amount by which G > E (the number of orders
* of magnitude). This routine shifts E to the right until the two orders of
* magnitude are equal.
tc * * * * $ 0000 g1 *
tc * * * * $ 0020 * cmr0
tc fpn * * * gpos mar nop
*
* If g1 >= 0, its sign is taken to be positive, and the numbers are
* handled in a corresponding manner.
* By this point, g must be negative.
tc * * * * $ 0002 cmr0
tc tpn * * * ssgn mar nop

*
* If E and G are both negative, the signs are the same, and the
* two numbers are simply added, with the common sign preserved.
tc * * * * * * * *
* At this point |G| > |E|, so the resultant sign will be that of G.
* Without regard to sign, the result will be the old sign of G
* plus |g - e|.
dsgn tr * * * * en a0 e0 * *
tr * * * * e+1 a0 e0 * *
tr * * * * add a1 e0 a0 g0
* This calculates g - e.
*
tc * * * * $ 0010 * cmr0
* If the result is >= zero, there is not a one in the first bit position,
* so the number is not normalized and must be shifted until there appears
* a '1' in the first bit position.
*
norm tr fnn * * * e-1 a0 e0 * *
tc fnn * * * norm mar nop
sh fnn * * * * nzin nzin lzin s
*
* This routine normalizes the data.
*
tr* * * * g aO tO * *
 
tc * * * * $ 80ff * g0
tr * * * * and * * a0 f1
tc * * * * $ 0002 * cmr0
sh * * * * * nzin lzin nzin s
sh tpn * * * * nzin roin nzin s
sh fpn * * * * nzin rzin nzin s
tc * * jp * $ 0000 * *
tc * * df * $ 0000 * *
* This routine sets the sign to the correct sign and returns to the calling
* routine.
gpos tc tpn * * * dsgn mar nop
tr * * * * * * * * *
* Before the jump to GPOS, the condition register was set to check for
* e < 0. If it is, the signs are opposite and the data is treated
* correspondingly.
* By default, G and E have the same sign, so the values are simply
* added.
tr * * * * * * * f0 g0
ssgn tr * * * * add a1 g1 a0 g0
sh * * * * * nzin nzin rcir s
tc * * * * $ 0010 nop cmr0
*
* This checks for a carry out of the MSB, indicating normalization is
* necessary.
tr tnn * * * g a1 e0 * *
tr tnn * * * e+1 a0 e0 * *
tr tnn * * * g a0 f0 * *
tc tnn * jp * $ 80ff * g0
tr tnn * df * and * * a0 f1
* If it is, then the number is normalized and the subroutine returns.
sh * * jp * * nzin nzin rcir s
tr * * df * g a0 f0 * *
* This routine exchanges the two registers involved so that |G| > |E|.
*
swap tr * * * * * f1 g1 f0 g0
tr * * * * * bsr0 f0 bsr1 f1
tc * * * * fpar mar nop
tr * * * * g a0 brg0 a1 brg1
*
* This calls the original routine.
* This is the action taken when the numbers have the same magnitude.
equl tc * * * * $ 0000 nop cmr1
tc * * * * $ 0002 nop cmr0
tc fpn * * * epos mar nop
tc * * * * $ 0020 nop cmr0
tc tp * * * ssgn mar nop
tc fp * * * swap mar nop
tc * * * * $ 0000 nop nop
epos tc fp * * * ssgn mar nop
tc * * * * $ 0100 nop cmr0
tr * * * * e=g nop nop nop nop
tc tn * * * zapp mar nop
tc * * * * $ 0020 nop cmr0
tr * * * * e-g a1 g1 a0 nop
tc tn * * * dsgn mar nop
tc * * * * $ 0000 g1 cmr0
tc * * * * dsgn mar nop
tr * * * * * nop nop bsr1 f1
tc * * * * $ 0000 * *
zapp tr * * jp * zro a0 f0 a1 f1
tc * * df * $ 0000 nop cmr0
 




nsh tc * cl0 * * $ 0010 nop cmr0
* Bug in assembler: a null line will not be assembled. By this point
* in the program, the exponent on one of the two numbers must
* be less than zero. This part of the routine will force the negative
* part to be stored in the BRG register. Since a swap can take place,
* all the original flags must be reset in the event of a shift.
*
tr * * * * g a1sw * a0rz g0
tc tnn * * * glz mar nop
* The G/BRG register contains the negative exponent; no swap is needed.
*
tr fnn * * * * f1 g1 f0 g0
tr * * * * * bsr0 f0 bsr1 f1
tr * * * * g a0 brg0 a1 brg1
tr * * * * * bsr1 g1 f1 e1
sh uns * * * * lzin nzin lzin s
sh * * * * * rzin nzin rzin s
*
* This swaps the two numbers and resets all the flags needed by the
* rest of the routine.
*
glz tc * * * * $ 0000 e0 g0
tr * * * * e-g a1sw * a0sw g0
tr * * * * g a1rz nop a0rs icr0
* Calculate the number of shifts needed; if it is < 0, it is
* actually > 80 (16), so return the value in the F register.
tc tnn * * * rtnf mar nop
tr * * * * g a0sw e0 a1rz nop
tc * * * * $ 0010 * g0
tr * * * * e-g a1 nop a0 g0
tc fnn * * * rtnf mar nop
*
* If the number of shifts required is > 16, return the data in the
* F register.
tc * * * * $ 0000 * cmr0
tc * * * * $ 0000 e1
tr * * * * * bsr0 e0 f0 g0
tc * * * * shft mar nop
tc * * * * $ 0001 * cmr3
* Prepare to shift the data and return to the shifting routine.
rtnf tr * * jp * * * * *
tr * * df * * * * *
* Return the contents of the F register.
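The addition routine follows the usual pattern. It first orders the operands so that the first has the larger magnitude, then shifts the smaller mantissa right by the exponent difference. If that shift would reach 16 or more, it returns the larger operand unchanged. Otherwise it adds or subtracts the mantissas and renormalizes. The sketch below assumes a simplified stand-in format rather than the Flexible Processor's actual word layout: a sign flag and a 16-bit mantissa with bit 15 set, with value (-1)^sign * mantissa * 2^(exponent - 16). All names are hypothetical.

```python
import math
from typing import Tuple

MANT_BITS = 16
TOP_BIT = 1 << (MANT_BITS - 1)  # a normalized mantissa has bit 15 set

# Stand-in format (an assumption): (sign, mantissa, exponent) with
# value = (-1)**sign * mantissa * 2**(exponent - 16)
Soft = Tuple[int, int, int]
ZERO: Soft = (0, 0, 0)


def from_float(v: float) -> Soft:
    if v == 0.0:
        return ZERO
    frac, exp = math.frexp(abs(v))
    return (1 if v < 0 else 0, int(frac * (1 << MANT_BITS)), exp)


def to_float(a: Soft) -> float:
    s, m, e = a
    return (-1.0) ** s * m * 2.0 ** (e - MANT_BITS)


def fp_add(a: Soft, b: Soft) -> Soft:
    """Add two normalized values, truncating the shifted-out bits."""
    if b[1] == 0:
        return a
    if a[1] == 0:
        return b
    if (a[2], a[1]) < (b[2], b[1]):   # swap so that |a| >= |b|
        a, b = b, a
    sa, ma, ea = a
    sb, mb, eb = b
    shift = ea - eb
    if shift >= MANT_BITS:            # b cannot change a 16-bit result
        return a
    mb >>= shift                      # align b's mantissa with a's exponent
    if sa == sb:                      # same signs: add magnitudes
        m, e = ma + mb, ea
        if m >= (1 << MANT_BITS):     # carry out of bit 15: shift right once
            m >>= 1
            e += 1
        return (sa, m, e)
    m, e = ma - mb, ea                # opposite signs: subtract magnitudes
    if m == 0:
        return ZERO                   # equal magnitude, opposite sign
    while m < TOP_BIT:                # normalize left
        m <<= 1
        e -= 1
    return (sa, m, e)
```

For example, adding 5.0 and -3.0 leaves a mantissa of 4000 (hex), which is shifted left once to give 2.0. Adding 1.0 and 2^-20 needs a 20-place shift, so 1.0 is returned unchanged.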
 
*************************************************************
* This is just for a breakpoint and is to be removed when *
* the program is actually inserted into the code. *
fcmp tc * * * * $ 0000 t0ba t1ba
* This accepts the data in the E register and G register as *
* inputs. Initially, the program stores the original data in *
* the temporary file: the E register goes in location 0 and the *
* G register goes in location 1. The following will also *
* strip off the sign bit. *
tr * * * * e a0 tf0u a1 tf1u
tr * * * * g a0 tf0u a1 tf1u
* This routine strips off the sign bit. The correct sign bit *
* is saved in the PAST register. *
sh uns * * * * lzin nzin lzin s
sh * * * * * rzin nzin rzin s
tr * * * * e a1sw e0 a0sw e1
tr * * * * g a0sw g1 a1sw g0
* The 0002 in cmr0 will check for E1 negative. This is done *
* in the past sense. If E1 < 0, then jump to the routine that *
* will handle that case. *
tc * * * * $ 0002 nop cmr0
tc * * * * $ 0000 nop cmr1
tc tpn * * * emng mar *
*************************************************************
* By this point, the E register must not be negative (>= 0). *
* The 0020 in cmr0 will test for g < 0. If g < 0, e is the *
* greater of the two numbers. If not, they are both >= 0. *
tc * * * * $ 0020 * cmr0
tc tpn * * * egrt mar *
* This will determine if there is a difference in exponent sign. *
tc tpn * * * $ 0000 * cmr1
tc tnn * * * gxng mar *
* This will jump if the sign of G is 1, i.e. G is negative *
* in the exponent portion. *
*************************************************************
tc * * * * $ 0002 * cmr0
tc tnn * * * ggrt mar *
* By here, the exponent of g is positive. If the exponent of *
* E is negative, both mantissas being positive, e < g. *
tr fnn * * * e-g a0 e0 a1 e1
tc tnn * * * ggrt mar *
tc fnn * * * $ 0000 f0 f1
tc * * jp * $ 0000 t0ba t1ba
tr * * df * * tf0n e0 tf1n e1
*************************************************************
* Since both exponents and mantissas are nonnegative, this *
* routine calculates e - g, exponents in the HOBP and mantissas *
* in the LOBPs. If the result is < 0, g > e; else return E. *
ggrt tc * * * * $ 0001 t0ba t1ba
tc * * jp * $ 0001 f0 f1
tr * * df * * tf0n e0 tf1n e1
* If f1 >= 0, e > g, return tf[1]. *
egrt tc * * * * $ 0000 f0 f1
tc * * jp * $ 0000 t0ba t1ba
tr * * df * * tf0n e0 tf1n e1
* This is the section of the program that is called if E is *
* negative (mantissa). *
emng tc * * * * $ 0020 * cmr0
tc fpn * * * ggrt mar nop
* This section does the compare if both the operands are < 0. *
* This will determine if there is a difference in exponent sign. *
ncmp tc * * * * $ 0000 * *
tc tnn * * * gbng mar *
* This will jump if the sign of G is 1, i.e. G is negative *
* in the exponent portion. *
tc * * * * $ 0002 * cmr0
tc fnn * * * nnpp mar *
* By here, the exponent of g is positive. If the exponent of *
* E is negative, both mantissas being negative, e > g. *
tc tnn * * * $ 0000 f0 f1
tc * * jp * $ 0000 t0ba t1ba
tr * * df * * tf0n e0 tf1n e1
* The above will return e. *
gbng tc tnn * * * ebng mar *
* Both G's exponent and sign are negative. If true, the same *
* holds true for E. If this is false, return g. *
tc * * * * $ 0001 t0ba t1ba
tc * * jp * $ 0001 f0 f1
tr * * df * * tf0n e0 tf1n e1
ebng tr * * * * e-g a0 e0 a1 e1
tc fnn * * * ggrt mar *
tc * * * * $ 0000 f0 f1
tc * * jp * $ 0000 t0ba t1ba
tr * * df * * tf0n e0 tf1n e1
* Both the mantissa and the exponent of both E and G are *
* less than zero. Calculate e - g; if the result is positive, g > e. *
gxng tc fnn * * * egrt mar nop
tc * * * * $ 0000 * *
test tr * * * * e-g a0 e0 a1 e1
tc fnn * * * egrt mar nop
tc tnn * * * ggrt mar nop
tc * * * * $ 0000 * *
* At this point (preceding line), both E and G are positive. The *
* sign of the exponent of g is negative. If the sign of the *
* exponent of E is positive, e > g; hence return E. *
nnpp tr * * * * e-g a0 e0 a1 e1
tc tnn * * * egrt mar nop
tc fnn * * * ggrt mar nop
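fcmp separates cases by the signs of the two mantissas and exponents before it subtracts. In a sign/magnitude format, the same ordering can be expressed with a sort key: positive values are ordered by (exponent, mantissa), and negative values by the reverse. The sketch below assumes the simplified format (sign flag, 16-bit normalized mantissa, signed exponent). It also assumes the flag convention suggested by the egrt and ggrt labels: 0 when e is at least g, 1 when g is greater. All names are hypothetical.

```python
import math
from typing import Tuple

MANT_BITS = 16
Soft = Tuple[int, int, int]  # assumed (sign, mantissa, exponent) stand-in format


def from_float(v: float) -> Soft:
    if v == 0.0:
        return (0, 0, 0)
    frac, exp = math.frexp(abs(v))
    return (1 if v < 0 else 0, int(frac * (1 << MANT_BITS)), exp)


def order_key(x: Soft) -> Tuple[int, int, int]:
    """Sort key that orders sign/magnitude values numerically."""
    s, m, e = x
    if m == 0:
        return (0, 0, 0)           # zero sits between negatives and positives
    if s == 0:
        return (1, e, m)           # positives: larger exponent, then mantissa, wins
    return (-1, -e, -m)            # negatives: larger magnitude is smaller


def fp_greater(e: Soft, g: Soft) -> Tuple[int, Soft]:
    """Return (flag, larger): flag is 1 when g > e, else 0."""
    if order_key(g) > order_key(e):
        return 1, g
    return 0, e
```

For example, comparing e = -3.0 with g = 2.0 returns flag 1 and g. Comparing e = -0.5 with g = -4.0 returns flag 0 and e.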













* THREE PIXEL LINEAR NEIGHBORHOOD -- 04 classes! 




tc * * * * $ 0010 nop icr0
tc * * * * $ 0002 nop cmr3
lold tc * * * * $ 0000 tf0u tf1u
tc adn in0 * * lold mar nop
tr adn * * * * if0u tf0u if1u tf1u
* Load |Covariance Matrix|
tc * cl0 * * $ 0030 nop icr0
tc * * * * $ 8000 g1 t1ba
tc * * * * $ 0000 t0ba g0
loam tr * in0 * * * if0u e0 if1u e1
tc adn * * * loam mar nop
tr * * * * xor a0 lf0u a1 lf1u
* Load mean vectors
tc * cl0 * * $ 0120 nop icr0
loac tc adn * * * loac mar nop
tr adn in0 * * * if0u lf0u if1u lf1u
tr * * * * zro a0 lf0u a1 lf1u
tr * * * * zro a0 lf0u a1 lf1u
* Load covariance matrix.
tc * cl0 * * $ 0032 nop icr0
load tc adn * * * load mar nop
tr adn in0 * * * if0u lf0u if1u lf1u
tc * cl0 * * $ 0152 l0ad l1ad
tc * * * * $ 0533 nop icr0
loag tc adn * * * loag mar nop
 




* The above routine will load the data vector and the a priori
* configurational probabilities.
* The configuration of the large file is as follows:
*
*                               Base 10      Base 16
* -ln |Covariance matrices|     000-016      000-00f
* Mean vectors (x -1)           017-064      010-03f     /* 12 classes */
* Covariance matrices           065-255      040-0ff     /* 12 classes */
* Data vectors                  256-287      100-121     /* 8 pixels */
* Not currently used            289-337      122-151
* g[r,j,q]                      338-1668     152-684     /* 11 classes */
* Not currently used            1668-1776    684-6f1
* a[r]                          1777-1792    6f1-700
* b[j]                          1793-1808    701-710
* c[q]                          1809-1824    711-720
* Not used                      1825         721
* i                             1826         722
* j                             1827         723
* k                             1828         724
* r                             1829         725
* q                             1830         726
* value                         1831         727
* class                         1832         728
* jj                            1833         729
* address of g[r,j,q]           1834         72a
tc * * * * $ 0722 l0ad l1ad
tc * * * * $ 0000 lf0u lf1u
ilpl tc * * * * $ 0723 l0ad l1ad
tc * * * * $ 000f t0ba t1ba
tc * * * * $ 06f1 tf0n tf1n
tc * * sr * comf mar nop
tc * * * * $ fffe lf0n lf1n
tr * * * pupl * nop nop mult mcr3
*
* lf0,1[1826] = 0; lf0,1[1827] = -2; call comf to calculate a[j],
* where 0 <= j < 3. comf calculates the 3 classes for pixel
* k+2. comf also assumes that l0ad and l1ad are 1827.
*
* comf also assumes that the location of the destination is stored
* in tf0[7].
* for i = 0 to I-1 do:
*
tc * * * * $ 0723 l0ad l1ad
tc * * * * $ 000f t0ba t1ba
tc * * * * $ 0701 tf0n tf1n
tc * * sr * comf mar nop
tc * * * * $ ffff lf0n lf1n
tr * * * pupl * nop nop mult mcr3
* Call comf to calculate b[j], where 0 <= j <= 4.
tc * * * * $ 0723 l0ad l1ad
tc * * * * $ 000f t0ba t1ba
tc * * * * $ 0711 tf0n tf1n
tc * * sr * comf mar nop
tc * * * * $ 0000 lf0n lf1n
tr * * * pupl * nop nop mult mcr3
*
* Call comf to calculate c[j], where 0 < j <= 4.
tc * * * * $ 0723 l0ad l1ad
tc * * * * $ 0003 lf0u lf1u
tc * * * * $ 0001 lf0u lf1u
* lf0,1[1827] = 3; lf0,1[1828] = 1.
* for k = 1 to J-2 do:
klpl tc * * * * $ 0727 l0ad l1ad
tc * * * * $ 8001 nop e1
tc * * * * $ 8000 e0 nop
tr * * * * e a0 lf0u a1 lf1u
tr * * * * e a0 lf0u a1 lf1u
* lf0,1[1831] = -1; lf0,1[1832] = -1;
* value = class = -1
tc * * * * $ 0723 l0ad l1ad
tc * * * * $ 0000 lf0u lf1u
* lf0,1[1827] = 0;
* j = 0 (for j = 0 to c-1 do:)
jlpl tc * * * * $ 0725 l0ad l1ad
tc * * * * $ 0004 p nop
tc * * * * $ 0000 lf0u lf1u
* lf0,1[1829] = 0; p = 3 (always = number of classes C)
* for r = 0 to c-1 do:
tc * * * * $ 0723 l0ad l1ad
tr * * * * * nop nop lf0n q
tc * * * plql $ 072a l0ad l1ad
tr * * * plql * mult e0 nop nop
tc * * * * $ 0152 nop g0
tr * * * * add a0 lf0n a1 lf1n
*
* lf0,1[1834] = (lf1,0[1827] x C) + base address of G
* This will provide the address for g[0,j,0].
*
rlpl tc * * * * $ 0726 l0ad l1ad
tc * * * * $ 0000 lf0u lf1u
tc * * * * $ 0000 brg0 brg1
*
* sum = 0; lf1[1830] = 0;
* for q = 0 to c-1 do:
*
qlpl tc * * * * $ 0725 l0ad l1ad
tc * * * * $ 06f1 nop g0
tr * * * * add a0 l0ad a1 l1ad
*
* e = 1777 + lf1[1829]; l0ad, l1ad = e; (a[r])
tr * * * * * lf0n f0 lf1n f1
* f0,1 = lf0,lf1
tc * * * * $ 0701 nop g0
tc * * * * $ 0723 l0ad l1ad
tr * * * * * lf0n e0 nop nop
tr * * * * add a0 l0ad a0 l1ad
* e = 1793 + lf1[1827]; l0ad, l1ad = e; (b[j])
tr * * * * * lf0n e0 lf1n e1
tc * * sr * fpmr mar nop
tr * * * * * f1 g1 f0 g0
*
* f = e x g -- f = a[r] x b[j];
tc * * * * $ 0711 nop g0
tc * * * * $ 0726 l0ad l1ad
tr * * * * * lf0n e0 nop nop
tr * * * * add a0 l0ad a0 l1ad
tr * * * * * lf0n e0 lf1n e1
*
* Calculate the location of c[q] (1809 + lf1[1830]).
tc * * sr * fpmr mar nop
tr * * * * * f1 g1 f0 g0
* f = e x g -- f = a[r] x b[j] x c[q];
tc * * * * $ 072a l0ad l1ad
tr * * * * * lf0n l0ad lf1n l1ad
* l0ad, l1ad = lf1[1834] (g[r,j,q])
tr * * * * * lf0n e0 lf1n e1
tc * * sr * fpmr mar nop
tr * * * * * f1 g1 f0 g0
* f = e x g -- f = a[r] x b[j] x c[q] x g[r,j,q]
tc * * sr * fpar mar nop
tr * * * * * nop nop nop nop
* f = f + sum
tr * * * * * f0 brg0 f1 brg1
* sum = f
tc * * * * $ 072a l0ad l1ad
tr * * * * * lf0n e0 lf1n e1
tr * * * * e+1 a0 lf0n a0 lf1n
 
* Update the pointer into g[r,j,q] to the next q.
*
tc * * * * $ 0726 l0ad l1ad
tr * * * * * lf0n e0 nop nop
tr * * * * e+1 nop nop a0 idx0
tc * * * * $ 0004 nop icr0
tc * * * * $ 0002 nop cmr3
tc adn * * * qlpl mar nop
tr * * * * e+1 a0 lf0n a0 lf1n
* q = q + 1 (lf0,1[1830] = lf0,1[1830] + 1);
* if q <> 4 goto qlpl
tc * * * * $ 072a l0ad l1ad
tr * * * * * lf0n e0 nop nop
tc * * * * $ 006f nop g0
tr * * * * add a0 lf0n a0 lf1n
*
* e = lf0,1[1834]
* e = e + 133
* lf0,1[1834] = e
* The program has just gone through all possible values of q
* in the combination g[r,j,q]; it must now update to the next
* value of r, as j is held constant. 12 is necessary because
* g is a 4 X 4 X 4 matrix. The program is pointing to the
* last element of a given r and j.
tc * * * * $ 0726 l0ad l1ad
tc * * * * $ 0000 lf0d lf1d
* lf1[1830] = 0 (q = 0)
*
tr * * * * * lf0n e0 nop nop
tr * * * * e+1 a1 nop a0 idx0
tc * * * * $ 0004 nop icr0
tc * * * * $ 0002 nop cmr3
tc adn * * * rlpl mar nop
tr * * * * e+1 a0 lf0n a0 lf1n
* lf1[1829] = lf1[1829] + 1 (r = r + 1)
* Store the updated value of r.
* if r <> 4 (base 10) goto rlpl
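For the class j under test, the qlpl and rlpl loops above accumulate a sum over every class pair (r, q) of the two neighbours. The Python sketch below shows that sum. It assumes that a, b and c hold the per-class values stored at a[r], b[j] and c[q] in the memory map, and that g holds the a priori configurational probabilities. Reading a and c as the left and right neighbours of the pixel being classified is an interpretation, and the function name is hypothetical.

```python
from typing import Sequence


def context_sum(j: int, a: Sequence[float], b: Sequence[float],
                c: Sequence[float],
                g: Sequence[Sequence[Sequence[float]]]) -> float:
    """Sum of a[r] * b[j] * c[q] * g[r][j][q] over all r and q."""
    total = 0.0                      # sum = 0
    for r in range(len(a)):          # rlpl: for r = 0 to C-1
        for q in range(len(c)):      # qlpl: for q = 0 to C-1
            total += a[r] * b[j] * c[q] * g[r][j][q]
    return total
```

With two classes, uniform g, a = [1, 2], b = [3, 4] and c = [5, 6], the sums are 99 for j = 0 and 132 for j = 1.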
tc * * * * $ 0727 LOad llad 
tr * * * * * IfOn fO Ifln fl 
tr* * * * * bsrO eO bsrl el 
tc * * * * fcmp mar nOp 




* Floating point compare of e and g.
tr * * * * * nop nop f1 idx0
tc * * * * $ 0000 nop icr0
tc * * * * $ 0002 nop cmr3
tr ad * * * * bsr0 lf0u bsr1 lf1u
tc ad * * * $ 0723 l0ad l1ad
tr ad * * * * lf0n e0 lf1n e1
tc ad * * * $ 0728 l0ad l1ad
tr ad * * * e a0 lf0u a1 lf1u
*
* f will be set to zero or one, depending on whether g
* or e is greater. If g is greater, the new value is less
* than the old value. If e is greater, its value and class
* are the new ones for the pixel under consideration.
* In program code:
*     compare e, g;
*     if (f != 0) {
*         lf0,1[1831] = brg0,1;
*         lf0,1[1832] = lf0,1[1827];
*     }
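The accumulation loop over r and q and the compare step above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a, b and c are taken to be length-4 sequences and g a 4 X 4 X 4 nested list indexed g[r][j][q], as the comments describe. The outer loop over candidate classes j and the name `best_class` are hypothetical, since the listing only shows the work for one j followed by the running max/class update.

```python
from itertools import product


def best_class(a, b, c, g, n=4):
    """Return (best_value, best_j) over candidate classes j.

    a, b, c: length-n sequences; g: n x n x n nested list indexed g[r][j][q].
    The loop over j is an assumption; the listing handles one j at a time.
    """
    best_value, best_j = None, None
    for j in range(n):
        total = 0.0
        # Accumulate a[r] * b[j] * c[q] * g[r][j][q] over all r and q.
        for r, q in product(range(n), range(n)):
            total += a[r] * b[j] * c[q] * g[r][j][q]
        # Keep the larger value and its class, like the compare of e and g.
        if best_value is None or total > best_value:
            best_value, best_j = total, j
    return best_value, best_j
```

With a strict comparison, an earlier class is kept on a tie, which matches the listing only replacing the stored value when e is greater.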
 
tc * * * * $ 0729 l0ad l1ad
tc * * * * $ 0000 lf0n lf1n
* lf0,1[1833] = 0; (jj = 0)
tc * * * * $ 06f1 l0ad l1ad
tr * * * * * lf0n e0 nop nop
* e0 = lf[1777] (a[0])
tc * * * * $ 0701 l0ad l1ad
tr * * * * * lf0n g1 lf1n g0
tr * * * * g a0 g1 a1 g0
* g1 = lf[1793] (b[0])
tc * cl0 * * $ 0001 nop e1g0
tc * * * * $ 0002 nop cmr3
tc * * * * $ 0016 nop icr0
* This will be used to augment the original two values, so that
* the program can shift b[jj] -> a[jj]. The 16 represents the 24
* pixels allowed (max).
ljjp tr * in0 * * g a0 l0ad a0 l1ad
tr * * * * * lf0n f0 lf1n f1
tr * * * * e a1 l0ad a0 nop
tr * * * * e a0 nop a1 l1ad
tr * * * * add a1 nop a0 g0
tr * * * * add a0 nop a1 e1
tc adn * * * ljjp mar nop
tr * * * * * f0 lf0n f1 lf1n
* Since there are 24 data vectors maximum, this will move all of
* them, whether they are there or not. The two adds are the update
* of the address pointer.
*
tc * * * * $ 000f t0ba t1ba
tc * * * * $ 0711 tf0n tf1n
tc * * sr * comf mar nop
tc * * * * $ 0724 l0ad l1ad
* Calculate new c[jj]'s
tc * * * * $ 0724 l0ad l1ad
tr * * * * * lf0n e0 lf1n e1
tr * * * * e+1 a0 e0 a0 e1
tr * * * * e a1 nop a0 idx0
tc * * * * $ 0001 nop icr0
tc adn * * * ilpl mar nop
tr * * * * e a0 lf0n a1 lf1n
fexp tc * * * * $ 0001 g1 nop
tc * * sr * fpmr mar nop
* E = E * log(e) (log to the base 2)
*
tc * * sr * floo mar nop
tr * * * * * f0 brg0 f1 brg1
*
* ent = floor(E)
*
tc * * * * $ 0007 t0ba t1ba
tr * * * * * f0 tf0u f1 tf1u
*
* save ent in temp file[7]
*
tr * * * * * nop nop f1 e1
tc * * * * $ 8000 g1 nop
tc * * sr * fpar mar nop
tr * * * * xor a0 nop a1 f1
tr * * * * * f0 tf0u f1 tf1u
tr * * * * * f0 tf0d f1 tf1d
* Store xsq in temporary file location 9.
tr * *
tc * *
tc * *
* temp1 = temp1 * fract
* store in location 8 of the temporary file.
*
tr * * * * * tf0n f0 tf1n f1
tc * * * * $ 000b nop brg1
tc * * sr * fpar mar nop
tc * * * * $ daa9 brg0 nop
*
* temp2 = xsq + q2
*
tr * * * * * f0 e0 f1 e1
tc * * sr * fpmr mar nop
tr * * * * * tf1n g1 tf0n g0
*
* temp2 = temp2 * xsq
*
tc * * * * $ 0013 nop brg1
tc * * sr * fpar mar nop
tc * * * * $ a005 brg0 nop
*
* temp2 = temp2 + q1
tr * * * * * f0 e0 f1 e1
tc * * sr * fpmr mar nop
tr * * * * * tf1d g1 tf0d g0
* temp2 = temp2 * xsq
tc * * * * $ 0017 nop brg1
tc * * sr * fpar mar nop
tc * * * * $ b730 brg0 nop
*
* temp2 = temp2 + q0
*
tr * * * * * f0 tf0n f1 tf1n
* store temp2 in temporary location 9 (temp2 is already in f)
tc * * sr * fpar mar nop
tr * * * * * tf0u brg0 tf1u brg1
*
* F = temp1 + temp2
tr * * * * * f0 e0 f1 e1
tc * * * * $ 0001 g1 nop
tc * * sr * fpmr mar nop
tc * * * * $ b505 nop g0
*
* F = F * sqrt(2)
tr * * * * * tf0d brg0 tf1d brg1
tr * * * * * f0 tf0n f1 tf1n
tc * * * * $ 8000 g1 nop
tc * * sr * fpar mar nop
tc * * sr * fdiv mar nop
tc * * * * $ 0007 t0ba t1ba
tr * * * * * tf1n g1 tf0n g0
*
* get ent
tc * * * * $ 3fff nop e1
tr * cl0 * * and a0 e0 a1 icr0
tc * * * * $ 0001 nop cmr3
tc * * * * $ 0000 e0g1 nop
loop tc ad * * * loop mar nop
sh ad in0 * * * nzinn nzinn lzinn s
*
* get integer value of ent.
*
tr * * * * * nop nop f1 e1
tr * * jp * add a0 nop a1 e1
tr * * df * * f0 e0 mult mcr3
*
* Add to current power of two. Routine is now over.
*
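The fexp routine above follows the usual base-2 range reduction: multiply by log2(e), split off ent = floor(E), approximate 2 raised to the fractional part, then add ent to the binary exponent. The Python sketch below mirrors those steps as an illustration only. The listing evaluates 2**fract with a rational approximation whose coefficients (q0, q1, q2 and the constants loaded into brg) are only partly legible in this copy, so a truncated Taylor series is swapped in for that one step. The names `fexp_sketch` and `exp2_fraction` are hypothetical.

```python
import math

LOG2_E = 1.0 / math.log(2.0)  # log2(e), the factor applied to E first


def exp2_fraction(f, terms=12):
    """Approximate 2**f for 0 <= f < 1.

    Swapped-in technique: a truncated Taylor series of exp(f * ln 2),
    not the listing's rational approximation.
    """
    y = f * math.log(2.0)
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= y / k
        total += term
    return total


def fexp_sketch(x):
    e = x * LOG2_E                # E = E * log2(e)
    ent = math.floor(e)           # ent = floor(E), an int
    fract = e - ent               # fractional part, 0 <= fract < 1
    # Adding ent to the binary exponent is a multiply by 2**ent.
    return math.ldexp(exp2_fraction(fract), ent)
```

Because fract always lies in [0, 1), the polynomial only has to be accurate on a short interval, which is the point of the reduction.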
fdiv tc * cl0 * * $ 0000 nop e1
tr * * * * * bsr0 e0 f0 g0
tc * * * * $ 8000 g1 nop
tc * * * * $ 0004 * cmr0
tc * * * * $ 0001 * cmr3
loop tr * * * * e-g a0 f0 a0 icr0
tr fnn * * * * f0 e0 f0 icr0
tr fnn * * * add a0 nop a1 e1
tc ad * * * loop mar nop
sh * * * * * nzinn nzinn rzinn s
tr * * * * e a1 f0 bsr1 e1
tr * * * * zro f1 g1 a0 g0
tc * * * * $ 4000 e0 nop
sh * * * * * lcir nzinn lcir s
tr * * * * e-g a0 e0 a1 e1
sh * * * * * rcir nzinn nzinn s
tc * * * * $ bfff g1 g0
tr * * * * and a1 e0 a0 e1
tr tnn * jp * e+1 a0 e0 a1 e1
tr tnn * df * and a1 nop a0 f1
sh * * jp * * nzinn lzinn nzinn s
tr * * df * e a1 nop a0 f1
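The fdiv routine above appears to divide mantissas by repeated shift-and-subtract. The Python sketch below shows restoring division under an assumed number format: an unsigned 16-bit mantissa m with bit 15 set and an exponent x, with value m / 2**16 * 2**x. That format is inferred from the constants used elsewhere in the listing ($8000/$0001 for 1.0, $b505/$0001 for sqrt(2)), not stated in it. Sign handling is omitted, and `restoring_divide` and `fdiv_sketch` are hypothetical names.

```python
def restoring_divide(num, den, bits=16):
    """Return floor(num * 2**bits / den) by shift-and-subtract.

    Requires 0 <= num < 2 * den, so at most one quotient bit lies above
    the binary point (true for two normalized mantissas).
    """
    assert den > 0 and 0 <= num < 2 * den
    quotient, remainder = 0, num
    for _ in range(bits + 1):
        quotient <<= 1
        if remainder >= den:      # trial subtraction succeeds: emit a 1 bit
            remainder -= den
            quotient |= 1
        remainder <<= 1           # move on to the next quotient bit
    return quotient


def fdiv_sketch(m_e, x_e, m_g, x_g):
    """Divide (m_e, x_e) by (m_g, x_g); both mantissas normalized (bit 15 set)."""
    q = restoring_divide(m_e, m_g)    # q / 2**16 is m_e / m_g, in [0.5, 2)
    x = x_e - x_g
    if q >= 1 << 16:                  # quotient >= 1.0: renormalize
        q >>= 1
        x += 1
    return q, x
```

For example, `fdiv_sketch(0xC000, 2, 0x8000, 2)` divides 3.0 by 2.0 and returns the pair for 1.5.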
floo tr * cl0 * * zro bsr1 e0 a1 e1
sh uns * * * * lzinn nzinn nzinn s
tc * * * * $ 0001 nop cmr0
tc * * * * $ 0001 nop cmr3
tc tpn * * * next mar nop
sh * * * * * lzinn nzinn nzinn s
tc tnn * jp * $ 0000 f0 f1
tc tnn * df * $ 0000 e0 e1
tc * * * * $ 0034 nop g0
tr * * * * e-g a0 e0 a0 f1
tr fnn * jp * * bsr0 f0 bsr1 f1
tr fnn * df * * nop nop nop nop
sh * * * * * nzinn rzinn nzinn s
sh * * * * * nzinn rzinn nzinn s
tr * * * * * bsr0 e0 bsr1 icr0
l01p tc ad * * * l01p mar nop
sh ad in0 * * * lzinn nzinn nzinn s
tr * cl0 * * e a1 e0 a0 nop
l02p tc fnn * * * l02p mar nop
sh fnn * * * * lzinn nzinn nzinn s
tr * * jp * e a0 f0 bsr1 f1
tc * * df * $ 0000 e0 e1
next tc tnn * jp * $ 8001 f0 f1
tc tnn * df * $ 8000 f0 e1
tc * * * * $ 0034 nop g0
tr * * * * e-g a0 e0 a0 f1
tr fnn * jp * * bsr0 f0 bsr1 f1
tr fnn * df * * nop nop nop nop
sh * * * * * nzinn rzinn nzinn s
sh * * * * * nzinn rzinn nzinn s
tr * * * * * bsr0 e0 bsr1 f1
sh * * * * * nzinn lzinn nzinn s
sh * * * * * nzinn rzinn nzinn s
tc * * * * $ 0000 nop e1
tr * * * * * nop nop f1 icr0
ls1p tc ad * * * ls1p mar nop
sh ad in0 * * * lzinn nzinn nzinn s
tr * cl0 * * e a1 e0 a0 nop
tr ad * * * e+1 a0 e0 nop nop
ls2p tc fnn * * * ls2p mar nop
sh fnn * * * * lzinn nzinn nzinn s
tr * * jp * e a0 f0 bsr1 f1
tc * * df * $ 0000 e0 e1
fpmr tr *
******* ** *** 
* Load multiplicand *
tr * * * * g a0 p a1 g0
* Load multiplier *
tc * * * * $ 0004 * cmr0
* This condition will check to see if f0 < 0. *
tc * * * * $ 0002 * cmr3
* Is index0 = compare register 0? *
tr * cl0 * puqu add a1 * a0 f1
tr * * * puqu zro mult f0 a0 e1
tc * * * * $ 0000 g1 g0
tr * * * * * nop nop f0 g0
tr * * * puql g a0sw g1 a1sw g0
tr * * * puql * mult e0 nop nop
tr * * * plqu add a1 g1 a0 g0
tr * * * plqu * mult e0 * *
tr * * * * add a1 g1 a0 g0
tr * * * * g a0rz f0 a1sw nop
tr * * * * * * * f1 e1
tr * * * * * * * f0 icr0
* If the value returned is zero, zero both registers and return. *
tc ad * jp * $ 0000 f0 f1
tc ad * df * $ 0000 * *
* If f0 is justified, return. The product is normalized. *
tc * * * * $ 0000 * cmr
tc tnn * jp * $ bfff g1 *
tr tnn * df * and a0 * a1 f1
* Save the exponent in g1; clear e1 for a counter. *
tr * * * * zro f1 g1 a0 e1
tr * cl0 * * zro a0 e0 * *
* By here, the product cannot be zero. The normalization process *
* will take fewer than four repeats of this loop. If it ever *
* takes more, something is branching directly to this *
* process. *
nrm tr fnn in0 * * e+1 a0 e0 a1 *
sh fnn * * * * nzin lzin nzin s
* By now, the result must be normalized. *
tr * * * * g a0 * a1 f1
tr * * * * e a0 g1 a1 g0
tr * * * * * f0 e0 f1 e1
sh * * * * * lcir lcir lzin s
tr * * * * e-g a0 e0 a1 e1
********* ******* *** ** ********* **** *** *** ***************
* This will take the normalized result, shift it left, and *
* adjust the exponent so that it agrees with the mantissa. *
tc * * * * $ 7fff g1 *
tc * * * * $ ffff * g0
tr * * jp * acb a0 * a1 f1
* This will "mask off" any carries into the unused portion of *
* the exponent. *
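The multiply routine's steps (zero check, exponent add, then left shifts until the leading bit is set, adjusting the exponent each time) can be sketched in Python as below. It assumes the same format as the other sketches: a 16-bit mantissa with bit 15 set and value m / 2**16 * 2**x. Signs and the final exponent masking are left out, and `fpmr_sketch` is a hypothetical name.

```python
def fpmr_sketch(m_e, x_e, m_g, x_g):
    """Multiply (m_e, x_e) by (m_g, x_g); return a normalized (mantissa, exponent)."""
    if m_e == 0 or m_g == 0:
        return 0, 0                       # zero product: zero both halves and return
    full = m_e * m_g                      # 32-bit product of two 16-bit mantissas
    x = x_e + x_g
    while (full & 0x80000000) == 0:       # shift left until the top bit is set
        full <<= 1
        x -= 1
    return full >> 16, x                  # keep the high 16 bits as the mantissa
```

With two normalized inputs the product is at least 2**30, so the loop runs at most once, consistent with the comment that normalization takes fewer than four passes.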




* This is the floating point addition routine. 3/1/80
fpar tr * * * * * bsr1 g1 f1 e1
sh uns cl0 * * * lzin nzin lzin s
sh * * * * * rzin nzin rzin s
* This will strip the sign of the mantissa and save it for future use.
*
tr * * * * * * * bsr0 icr0
tc tnn * * * $ 0000 * cmr0
tc tnn * * * $ 0010 * cmr1
tr tnn * jp * zro a0 e0 a1 e1
tr tnn * df * zro a0 g1 a1 g0
tr * * * * zro a0 * a1 cmr1
* This compares the brg to zero; if it is zero, return.
tr * * * * zro a0 e0 a1 g0
* This will zero the registers to prevent spurious results.
*
tr * * * * e-g a0 nop a1 icr0
*
* If |e| < |g|, the program will reverse the numbers and continue.
* Since addition is commutative, this should not affect the results.
*
tr * * * * xor a1 e0 a0 nop
tc * * * * $ 2000 * g0
tr * * * * and a1 * a0 g0
tc * * * * $ 8000 e0 *
tr * * * * e-g a1 * a0 g0
tc * * * * $ 0010 * cmr0
* If the exponent on one of the two numbers is less than zero
* and the other is not, subtraction to yield the number of shifts
* will not yield the correct answer, so special handling
* must be added to compensate for this problem. This routine
* handles the problem by exclusive-oring the two numbers
* together and then stripping off everything but the sign bit. This
* is then subtracted from a constant (for speed). The constant is
* 8000, thus if there is a 1 in the sign position, the result will
tc tnn * * * $ 0020 * cmr0
tr * * * * e-g a1 g1 * *
* This will test for E - G -> G negative. This is to ensure that
* |brg| >= |f|, simplifying the algorithm greatly.
tc tnn * * * swap mar nop
* This invokes the swap routine, which will force the above to be true.
tr * * * * zro * * a0 cmr0
tc * * * * $ 0010 * cmr1
* The zeroes that are loaded into condition mask register 0 tell
* the machine not to check for any of the conditions represented.
* The 0010 loaded into cmr1 tells the machine to check for the compare
* register greater than index register one. In this case, this will
* determine whether the two numbers are equal, or equal in magnitude
* and opposite in sign.
tr * * * * zro a0 g1 a1 e1
tc tnn * * * equl mar nop
* If they are, the program will jump to a special routine.
* By this point in the program, |e| > |g|.
tr * * * * * bsr0 e0 f0 g0
tc * * * * $ 0010 * idx0
tc * * * * $ 0020 * cmr1
tc tnn * * * rtnf mar nop
*
* If the number of shifts required > 16, return the value in the
* F register.
tr * * * * zro a1 g1 a0 cmr1
tc * cl0 * * $ 0001 * cmr3
* This loads the data to be processed and programs the CPU to check
* for reg0 # idx0 (not equal). This is represented by a one in the
* first position. This check is invoked by the AD command.
shft sh * in0 * * * rzin nzin nzin s
tc ad * * * shft mar nop
*
* The index register contains the amount by which G > E (the number of
* orders of magnitude). This routine shifts E to the right until the two
* orders of magnitude are equal.
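The shift loop above aligns the smaller operand before the add. A minimal Python sketch of that step, assuming integer exponents and unsigned 16-bit mantissas, including the early exit when more than 16 shifts would be needed (`align` is a hypothetical name):

```python
def align(m_small, x_small, x_big, max_shift=16):
    """Shift the smaller operand's mantissa right so its exponent equals x_big.

    Returns None when more than max_shift places would be needed, the case in
    which the listing returns the larger operand (the F register) unchanged.
    """
    shift = x_big - x_small
    if shift < 0:
        raise ValueError("operands must be ordered so that x_big >= x_small")
    if shift > max_shift:
        return None
    return m_small >> shift
```

Shifting a 16-bit mantissa right by 16 or more places already leaves zero, so the early return saves loop iterations rather than changing the sum.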
tc * * * * $ 0000 g1 *
tc * * * * $ 0020 * cmr0
tc fpn * * * gpos mar nop
*
* If g1 >= 0, its sign is taken to be positive, and the numbers are
* handled in a corresponding manner.
* By this point, g must be negative.
tc * * * * $ 0002 * cmr0
tc tpn * * * ssgn mar nop
* If E is negative and G is negative, the signs are the same, so the
* two numbers are just added and one of the signs is preserved.
tc * * * * * * * *
* At this point, |G| > |E|, so the resultant sign will be that of G.
* The result carries the old sign of G, and its magnitude, without
* regard to sign, is |g| - |e|.
dsgn tr * * * * en a0 e0 * *
tr * * * * e+1 a0 e0 * *
tr * * * * add f1 e0 a0 g0
*
* This calculates g - e.
tc * * * * $ 0010 * cmr0
*
* If the result is >= zero, there is not a one in the first bit position,
* so the number is not normalized and must be shifted until a '1'
* appears in the first bit position.
*
norm tr fnn * * * e-1 a0 e0 * *
tc fnn * * * norm mar nop
sh fnn * * * * nzin nzin lzin s
*
* This routine normalizes the data.
*
tr * * * * g a0 f0 * *
tc * * * * $ 3fff * g0
tc * * * * $ 0002 * cmr0
sh * * * * * nzin lzin nzin s
sh tpn * * * * nzin rzin nzin s
sh fpn * * * * nzin rzin nzin s
tc * * jp * $ 0000 e0g1 e1g0
tc * * df * $ f000 * *
* This routine sets the sign to the correct sign and returns to the calling
* routine.
*
gpos tc tpn * * * dsgn mar nop
tr * * * * * * * * *
 
* Before the jump to GPOS, the condition register was set to check for
* e < 0. If it is, the signs are opposite and the data is treated
* correspondingly.
* By default, both G and E have the same sign, so the results are just
* added.
tr * * * * * * 1 0 g0
ssgn tr * * * * add a1 g1 a0 g0
sh * * * * * nzin nzin rcir s
tc * * * * $ 0010 nop cmr
* This checks for a carry out of the MSB, indicating normalization is
* necessary.
tr tnn * * * g f1 e0 * *
tr tnn * * * e+1 a0 e0 * *
tr tnn * * * g a0 f0 * *
tc tnn * jp * $ bfff * g0
tr tnn * df * and * * a0 f1
* If it is, then the number is normalized and the subroutine returns.
sh * * jp * * nzin nzin lcir s
tr * * df * g a0 f0 * *
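Putting the pieces of fpar together (order the operands by magnitude, align, then add or subtract magnitudes and renormalize), a Python sketch might look like the following. Values are assumed to be (sign, mantissa, exponent) tuples with a 16-bit mantissa whose bit 15 is set, or mantissa 0 for zero. That representation and the name `fpar_sketch` are assumptions for illustration, and the listing's exponent-sign special cases (nsh, glz) are not modeled.

```python
def fpar_sketch(a, b):
    """Add two values given as (sign, mantissa, exponent) tuples.

    sign is +1 or -1; mantissa is 16-bit with bit 15 set, or 0 for zero;
    value = sign * mantissa / 2**16 * 2**exponent (assumed format).
    """
    if a[1] == 0:
        return b
    if b[1] == 0:
        return a
    # Order by magnitude so that |big| >= |small| (the swap step).
    big, small = (a, b) if (a[2], a[1]) >= (b[2], b[1]) else (b, a)
    shift = big[2] - small[2]
    if shift > 16:
        return big                      # small operand shifts out entirely
    m_small = small[1] >> shift
    sign, x = big[0], big[2]
    if big[0] == small[0]:
        m = big[1] + m_small            # same signs: add magnitudes
        if m > 0xFFFF:                  # carry out of the MSB
            m >>= 1
            x += 1
    else:
        m = big[1] - m_small            # opposite signs: result takes the sign of big
        if m == 0:
            return (1, 0, 0)            # equal magnitudes cancel to zero
        while (m & 0x8000) == 0:        # shift left until bit 15 is set
            m <<= 1
            x -= 1
    return (sign, m, x)
```

Ordering first means the subtraction never goes negative, which is the same simplification the comments give as the reason for the swap routine.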
*
* This routine exchanges the two registers involved so that |G| > |E|.
swap tr * * * * * f1 g1 f0 g0
tr * * * * * bsr0 f0 bsr1 f1
tc * * * * fpar mar nop
tr * * * * g a0 brg0 a1 brg1
* This calls the original routine.
* This is the action taken when the numbers have the same magnitude.
equl tc * * * * $ 0000 nop cmr1
tc * * * * $ 0002 nop cmr0
tc fpn * * * epos mar nop
tc * * * * $ 0020 nop cmr0
*
* Is e1 >= 0? (Originally, was f >= 0?)
tc tpn * * * ssgn mar nop
tc * * * * $ 0000 nop nop
* If f1 < 0, then the two have the same sign and should be added.
* By this point, the brg is definitely negative, and the f1 register
* is positive. Reverse the two and continue processing.
tr * * * * * f1 e0 bsr1 g0
tr * * * * e a0 g1 nop nop
tr * * * * g a1 nop a0 e1
tr * * * * * bsr0 e0 f0 g0
tr * * * * e a0 f0 a1 f1
tr * * * * g a0 brg0 a1 brg1
tr * * * * g a0 e0 a1 e1
tr * * * * * f1 g1 f0 g0
sh uns * * * * lzinn lzinn lzinn s
sh * * * * * rzinn rzinn rzinn s
tc * * * * lbll mar nop
tr * * * * zro a0 g1 a1 e1
* The two numbers are reversed so the program can continue processing.
epos tc fpn * * * ssgn mar nop
IblL tc * * * * $ 0100 nop cmr0
tr * * * * e=g nop nop nop nop
tc tn * * * zapp mar nop
* By here, E and G have different signs. If they have the same
* mantissa, the result should be zero. This is what the above
* lines of code determine.
tc * * * * $ 0020 nop cmr0
tr * * * * e-g a1 g1 a0 nop
tc tn * * * lbLl mar nop
tc fn * * * lbl mar nop
tr * * * * * nop nop bsr1 f1
* If the mantissa of g is greater than the mantissa of e, force the
* resultant sign and exponent to be those of g. Otherwise, force them
* to be those of e.
Ibll tr * * * * e-g a0 g1 nop nop
tr * * * * g a1 e0 nop nop
tr tnn * * * en a0 e0 nop nop
tr tnn * * * e+1 a0 e0 nop nop
tr * * * * e f1 e0 a0 g0
tc * * * * $ 0010 * cmr0
nnrm tr fnn * * * e-1 a0 e0 * *
tc fnn * * * nnrm mar nop
sh fnn * * * * nzin nzin lzin s
tr * * * * g a0 f0 * *
tc * * * * $ 3fff * g0
tr * * * * and * * a0 f1
tc * * * * $ 0002 * cmr0
sh * * * * * nzin lzin nzin s
sh tpn * * * * nzin rzin nzin s
sh fpn * * * * nzin rzin nzin s
tc * * jp * $ 0000 e0g1 e1g0
tc * * df * $ 0000 * *
* This routine sets the sign to the correct sign and returns to the calling
* routine. It is separate because somewhere a sign convention changed.
zapp tr * * jp * zro a0 f0 a1 f1
tc * * df * $ 0000 nop cmr0
* This routine handles numbers whose exponents have different signs.
*
nsh tc * cl0 * * $ 0010 nop cmr0
* Bug in assembler: a null line will not be assembled. By this point
* in the program, the exponent on one of the two numbers must
* be less than zero. This part of the routine will force the negative
* part to be stored in the brg register. Since a swap can take place,
* all the original flags must be reset in the event of a shift.
*
tr * * * * g a1sw * a0rz g0
tc tnn * * * glz mar nop
*
* The g/brg register contains the negative exponent; no swap is needed.
tr fnn * * * * f1 g1 f0 g0
tr * * * * * bsr0 f0 bsr1 f1
tr * * * * g a0 brg0 a1 brg1
tr * * * * * bsr1 g1 f1 e1
sh uns * * * * lzin nzin lzin s
sh * * * * * rzin nzin rzin s
* This swaps the two numbers and resets all the flags needed by the
* rest of the routine.
glz tc * * * * $ 0000 e0 g0
tr * * * * e-g a1sw * a0sw g0
tr * * * * g a1rz nop a0rz icr0
* Calculate the number of shifts needed; if it is < 0, it is
* actually > 80 (16), so return the value in the F register.
tc tnn * * * rtnf mar nop
tr * * * * g a0sw e0 a1rz nop
tc * * * * $ 0010 * g0
tr * * * * e-g a1 nop a0 g0
tc fnn * * * rtnf mar nop
* If the number of shifts required is > 16, return the data in the
* F register.
tc * * * * $ 0000 * cmr0
tc * * * * $ 0000 * e1
tr * * * * * bsr0 e0 f0 g0
tc * * * * shft mar nop
tc * * * * $ 0001 * cmr3
*
* Prepare to shift the data and return to the shifting routine.
*
rtnf tr * * jp * zro a0 e0g1 a1 e1g0
tr * * df * * * * * *
* Return the contents of the F register.
comf tc * * * * $ 0000 brg0 brg1
tc * * * * $ 0006 t0wa t1wa
tc * * * * $ 0000 tf0n tf1n
* The 0 in location 6 of the temporary files is a cycle counter.
* It keeps track of the class currently being worked upon.
*
* After normalizing the data vector, store it; repeat until all
* the elements are finished, then repeat the cycle until all four
* elements are finished being processed.
*
tc * * * * $ 000a t0ba t1ba
tr * * * * * lf0n e0 ln nop
tc * * * * $ 0102 nop g0
* The data vectors are stored in location 100 of the large file.
tr * * * * add a0 l0ad a0 l1ad
tr * * * * * lf0u tf0u lf1u tf1u
tr * * * * * lf0u tf0u lf1u tf1u
tr * * * * * lf0u tf0u lf1u tf1u
tr * * * * * lf0u tf0u lf1u tf1u
tc * * * * $ 0003 nop icr3
tc * * * * $ 000a t0ba t1ba
tc * cl3 * * $ 0002 t0wa t1wa
tc * * * * $ 0010 l0ad l1ad
lolp tr * in3 * * * tf0u brg0 tf1u brg1
tc * * sr * fpar mar *
tr * * * * * lf0u f0 lf1u f1
tc * * * * $ 0040 * cmr3
tc ad * * * lolp mar *
tr * * * * * f0 tf0u f1 tf1u
tc * cl3 * * $ 000a t0ra t1ra
*
* This stores the normalized data vector in locations 2-5 of the
* temporary file. Location 6 is used for a counter. The results will
* be stored in the location pointed to by location 7 of the temporary file.
*
* This will fall through to the matrix processing routine.
* This is the beginning of the matrix multiply routine.
*
stra tc * cla * * $ 0000 brg0 brg1
tc * * * * $ 0006 t0ra t1ra
tr * * * * * tf0n p * *
tc * * * * $ 0010 * q
tc * * * plql $ 0040 * g0
* was $ 0000 to nop!
tr * * * plql * mult e0 nop nop
* tc * * * * $ 0040 nop g0
* this was here.
tr * * * * add a0 l0ad a0 l1ad
tc * * * * $ 0001 t0ba t1ba
tc * * * * $ 0001 tf0u tf1u
* For indirectly addressing the current row of the normalized
* data vector.
*
tc * * * * $ 0002 t0ba t1ba
mlty tr * in1 * * * tf0u e0 tf1u e1
* This loads the multiplicand into the e0-e1 register pair.
tc * * sr * fpmr mar nop
* This does the program jump to the floating point multiply routine.
tr * * * * * lf1u g1 lf0u g0
* This step is done before the jump is actually executed. This will load the
* multiplier into the g0-g1 register pair (F = E X G, floating point multiply).
tc * * sr * fpar mar nop
* This step will do a jump to the floating point addition routine. This
* routine calculates the sum of the contents of the F register and the BRG
* register pair. The result of the add is then stored in the F register.
tc * * * * $ 0004 nop icr1
tc * * * * $ 0004 nop cmr3
* The 0004 tests for index1 <> its compare.
* This is executed before the jump. It will just load the condition register
* with the next condition to be tested.
tc ad * * * mlty mar nop
tr * * * * * f0 brg0 f1 brg1
*
* On index register 1 not equal to its compare, jump to the beginning of the
* multiply routine.
tc * * * * $ 0001 t0ba t1ba
tr * * * * * tf1n e0 nop nop
* Get the address of the jth item in the data vector.
tr * * * * ac0 a1 tf0d a0 tf1d
tr * * * * ac0 a0 t0ra a0 t1ra
* tr * * * * ac0 a1 * a0 t1ra
* The above was a change to ensure that the program works; it is kept.
* This will update the address for the next round, store it, and point to the
* item in question.
*
tr * in2 * * * tf0c e0 tf1c e1
* This will load the multiplier for the second multiply into the e0-e1
* register pair. Simultaneously, this will zero the temp file pointers. They
* will now point to the location of the accumulator.
*
tr * * * * * bsr0 g1 bsr1 g0
tc * * sr * fpmr mar nop
tr * * * * g a0 g1 a1 g0
* This is just a subroutine jump to the floating point multiply routine
* (f = E X G).
tc * * sr * fpar mar nop
tr * * * * * tf0n brg0 tf1n brg1
*
* F = F + BRG. This calculates the subtotal of the matrix multiply.
tr * * * * * f0 tf0n f1 tf1n
tc * * * * $ 0002 t0ba t1ba
* The above two steps load the subtotal into temporary file location
* zero. They then reset the read and write pointers of the temporary file
* to location two.
tc * * * * $ 0004 nop icr2
tc * * * * $ 0010 nop cmr3
* The 0010 tests for index2 # its compare.
* This will do a test for index 0 not equal to its compare register.
tc ad cl1 * * mlty mar nop
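The two nested multiply-accumulate passes above compute a quadratic form: each row of the matrix is multiplied by the data vector to give a subtotal, which is then multiplied by the jth data item and added to the accumulator. A Python sketch of that ordering, assuming m is the matrix (presumably the inverse covariance) and x the normalized data vector; the name `quadratic_form` is hypothetical:

```python
def quadratic_form(m, x):
    """Return x^T M x, accumulated as sum_j x[j] * (sum_k m[j][k] * x[k])."""
    total = 0.0
    for j, row in enumerate(m):
        subtotal = 0.0
        for k, m_jk in enumerate(row):   # first pass: row j of M times x
            subtotal += m_jk * x[k]
        total += x[j] * subtotal         # second multiply by the jth data item
    return total
```

For example, with the identity matrix and x = [3, 4] the result is 25.0.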
 
* tc * * * * $ 0000 brg0 brg1
tr * * * puql * * * mult mcr3
tc * * * * $ 0000 t0ba t1ba
tc * * * * $ 8000 g1 nop
tc * * * * $ 0000 nop g0
tr * * * * * tf0n e0 tf1n e1
tr * * * * xor a0 brg0 a1 brg1
* quadratic = quadratic * -1
*
tc * * * * $ 0006 t0ba t1ba
tr * * * * * tf0n l0ad tf1n l1ad
* location of log(|sigma|)
*
tc * * sr * fpar mar nop
tr * * * * * lf0n f0 lf1n f1
tr * * * puql * * * mult mcr3
* calculate log(det(sigma)) + quadratic
tr * * * * zro a0 e0 a1 e1
tc * * * * $ 0001 g1 nop
tr * * * * zro f0 e0 a0 g0
sh * * * * * lcirn lcirn lcirn s
sh * * * * * lcirn lcirn lcirn s
tr * * * * e-g a0 e0 a1 e1
sh * * * * * rcir rcir rcir s
sh * * * * * rcir rcir rcir s
tr * * * plqu * nop nop mult mcr3
etst tc * * * * $ 0002 nop cmr0
tc fnn * * * pos mar nop
* For some reason, exp only works for positive exponents. This
* addition should change the data so that it is positive. It will
* then take the result and invert it by dividing one by it.
*
* By here the number is negative. This strips off the sign.
tc * * sr * fexp mar nop
sh * * * * * rzin nzin nzin s
*
* Call the exponentiation routine.
*
tc * * * * $ 8000 brg0 nop
tc * * * * $ 0001 nop brg1
* This is 1.0 in the computer's notation.
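If the pair loaded into brg here is read as a 16-bit mantissa ($8000 in brg0) and an exponent ($0001 in brg1), the value is 0.5 X 2 = 1.0. Under that reading, which is an inference rather than something the listing states, the $b505 mantissa paired with exponent $0001 that fexp loads for sqrt(2) also decodes correctly. A small Python decoder illustrates the assumption; `decode` is a hypothetical name.

```python
def decode(mantissa, exponent, sign=1):
    """Value of a (mantissa, exponent) pair under the assumed format."""
    return sign * mantissa / 2**16 * 2.0**exponent
```

For example, `decode(0x8000, 1)` gives 1.0 and `decode(0xB505, 1)` gives about 1.41422.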
*
tc * * sr * fdiv mar nop
tr * * * * e a0 f0 a1 f1
tr * * * plqu * nop nop mult mcr3
* This fdiv routine will calculate 1/f, which is the same result
* as exp should give using the correct exponent.
tc * * * * meet mar nop
tr * * * * * f0 e0 f1 e1
* Skip the next few statements.
*
pos tc * * sr * fexp mar nop
tr * * * * * * * * nop
* By here, the routine is "OK" for positive numbers. No special processing
* is needed here. When done, just fall through.
meet tr * * * * * nop nop * *
* OKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOK to the first time through!!!!!
* This calculates exp{.5[log(det(inv(sigma))) - quadratic]}
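The etst/pos/meet sequence evaluates exp{.5[log(det(inv(sigma))) - quadratic]} with an exponent routine that is trusted only for positive arguments, so a negative argument is negated, exponentiated, and then inverted with 1.0/f. A Python sketch of that control flow follows. `math.exp` stands in for fexp, and the names `exp_nonneg` and `density_term` are hypothetical.

```python
import math


def exp_nonneg(y):
    """Stand-in for fexp, assumed reliable only for y >= 0."""
    if y < 0:
        raise ValueError("exponent routine assumed valid only for y >= 0")
    return math.exp(y)


def density_term(log_det_inv_sigma, quadratic):
    """exp(0.5 * (log det(inv(sigma)) - quadratic)), inverting for negative arguments."""
    y = 0.5 * (log_det_inv_sigma - quadratic)
    if y >= 0:
        return exp_nonneg(y)
    return 1.0 / exp_nonneg(-y)          # like the fdiv of 1.0 by f
```

Since exp(-y) = 1/exp(y), the detour costs one division but gives the same value.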
*
tr * * * plqu * nop nop mult mcr3
tc * * * * $ 000f t0ba t1ba
tr * * * * * tf0n e0 nop nop
tr * * * * e+1 a0 tf0n a0 tf1n
tr * * * * e a0 l0ad a0 l1ad
tr * * * * * bsr0 lf0n bsr1 lf1n
* Store the new value in the location given in tf0n[f].
* This will increment tf0n[6], the pointer to this array.
tc * * * * $ 0006 t0ba t1ba
tr * * * * * tf0n e0 * *
tr * * * * e+1 a0 tf0c a0 idx0
tr * * * * * tf0n lf0n tf1n lf1n
tc * * * $ 0004 * icr 
tc* * * * $ 0001 * cmr3 
tc * * * * $0004 p * 
tr * * * e al * aO 
tc* * * OiO1 $ 0151 * go 
tr * * * plql * mult e0 tf0n idx0
tr * * * * add a0 l0ad a1 *
tr * * * * add a1 * a0 l1ad
tc * * * * $ 0000 t0ba t1ba
tc * * * * $ 0000 tf0u tf1u
tc * * * * $ 0001 tf0u tf1u
tc * * * * $ 000a t0ba t1ba
tc * * * * $ 0002 t0ra t1ra
tc adn* * * 1op mar * 
tc* ca'* * $S 0000 eO el 
finl tc * * * d 	 Load 1lad$li50 
tr * * Jppupu* mult brbO * * 
tc * * df * $ 0000 nop nop
fcmp tc * * * * $ 0000 t0ba t1ba
* This accepts the data in the E register and G register as *
* inputs. Initially, the program stores the original data in *
* the temporary file: the E register goes in location 0 and the *
* G register goes in location 1. The following will also *
* strip off the sign bit. *
tr * * * * e a0 tf0u a1 tf1u
tr * * * * g a0 tf0u a1 tf1u
* ******** ******* ** ****** ****** *** ****** ************ **** *
* This routine strips off the sign bit. The correct sign bit *
* is saved in the PAST register. *
sh uns * * * * lzin nzin lzin s
sh * * * * * rzin nzin rzin s
tr * * * * e a1sw e0 a0sw e1
tr * * * * g a0sw g1 a1sw g0
****** ************************** ************* **** *
* The 0002 in cmr0 will check for E1 negative. This is done *
* in the past sense. If E1 < 0, then jump to the routine that *
* will handle that case. *
** ******* **** ** ***** *************************** ** *** *********
tc * * * * $ 0002 nop cmr0
tc * * * * $ 0000 nop cmr1
tc tpn * * * emng mar *
* By this point, the E register must not be negative (>= 0). *
* The 0020 in cmr0 will test for g < 0. If g < 0, e is the *
* greater of the two numbers. If not, they are both >= 0. *
tc * * * * $ 0020 * cmr0
tc tpn * * * egrt mar *
* This will determine whether the exponent signs differ. *
tc tpn * * * $ 0000 * cmr1
tc tnn * * * gxng mar *
* This will jump if the sign of G is 1, i.e. if G is negative *
* in the exponent portion. *
tc * * * * $ 0002 * cmr0
tc tnn * * * ggrt mar *
******************* ******** ******* ** ************* ****** *
* By here, the exponent of g is positive. If the exponent of *
* E is negative, both mantissas being positive, e < g. *
* ** ************ ****** *** ** ******** ** ****** ******** ******** *
tr fnn * * * e-g a0 e0 a1 e1
tc tnn * * * ggrt mar *
tc fnn * * * $ 0000 f0 f1
tc * * jp * $ 0000 t0ba t1ba
tr * * df * * tf0n e0 tf1n e1
* Since both exponents and mantissas are nonnegative, this *
* routine calculates e - g, with exponents in the HOBP and *
* mantissas in the LOBP. If the result is < 0, g > e; else return E. *
ggrt tc * * * * $ 0001 t0ba t1ba
tc * * jp * $ 0001 f0 f1
tr * * df * * tf0n e0 tf1n e1
* If f1 > 0, e > g; return tf[1]. *
* ** *** *************** **************** ********* ********* ****** *
egrt tc * * * * $ 0000 f0 f1
tc * * jp * $ 0000 t0ba t1ba
tr * * df * * tf0n e0 tf1n e1
********* *********** ******** *** ** **** ** ***** ** **************
* This is the section of the program that is called if E is *
* negative (mantissa). *
emng tc * * * * $ 0020 * cmr0
tc fpn * * * ggrt mar nop
* This section does the compare if both the operands are < 0. *
* This will determine whether the exponent signs differ. *
ncmp tc * * * * $ 0000 * *
tc tnn * * * gbng mar *
* This will jump if the sign of G is 1, i.e. if G is negative *
* in the exponent portion. *
tc * * * * $ 0002 * cmr0
tc fnn * * * nnpp mar *
* By here, the exponent of g is positive. If the exponent of *
* E is negative, both mantissas being negative, e > g. *
tc tnn * * * $ 0000 f0 f1
tc * * jp * $ 0000 t0ba t1ba
tr * * df * * tf0n e0 tf1n e1
****************** * ******************* ****** **** ****
* The above will return e. *
gbng tc tnn * * * ebng mar *
* Both G's exponent and sign are negative. If true, the same
* holds true for E. If this is false, return g.
tc * * * * $ 0001 t0ba t1ba
tc * * jp * $ 0001 f0 f1
tr * * df * * tf0n e0 tf1n e1
********* ************** ** **************** ** ********** **** ** *
ebng tr * * * * e-g a0 e0 a1 e1
tc fnn * * * ggrt mar *
tc * * * * $ 0000 f0 f1
tc * * jp * $ 0000 t0ba t1ba
tr * * df * * tf0n e0 tf1n e1
* Both the mantissa and the exponent of both E and G are *
* less than zero. Calculate e - g; if the result is positive, g > e. *
gxng tc fnn * * * egrt mar nop
tc * * * * $ 0000 * *
test tr * * * * e-g a0 e0 a1 e1
tc fnn * * * egrt mar nop
tc tnn * * * ggrt mar nop
tc * * * * $ 0000 * *
** ******* * * ***** ******* ******** ************ **** ***********
* At this point (preceding line), both E and G are positive. The *
* sign of the exponent of g is negative. If the sign of the *
* exponent of E is positive, e > g; hence return E. *
nnpp tr * * * * e-g a0 e0 a1 e1
tc tnn * * * egrt mar nop
tc fnn * * * ggrt mar nop
tc* * * * $ao00 * * 
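fcmp chooses the larger of E and G by working through cases on the mantissa sign and the exponent sign. Assuming the same (sign, mantissa, exponent) representation as the other sketches, that case analysis collapses into a single ordering key, as in this hypothetical `fcmp_sketch`. Ties return E, following the "else return E" comment. The listing also sets f to 0 or 1, which this sketch does not reproduce.

```python
def fcmp_sketch(e, g):
    """Return the larger of two (sign, mantissa, exponent) values; ties return e."""
    def key(v):
        sign, mantissa, exponent = v
        if mantissa == 0:
            return (0, 0, 0)                    # zero sits between the signs
        if sign > 0:
            return (1, exponent, mantissa)      # bigger exponent, then mantissa, wins
        return (-1, -exponent, -mantissa)       # for negatives, bigger magnitude loses
    return e if key(e) >= key(g) else g
```

The key works only because mantissas are normalized: with bit 15 always set, a larger exponent always means a larger magnitude.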