Architecture and data processing alternatives for Tse computer.  Volume 1:  Tse logic design concepts and the development of image processing machine architectures by Bodenheimer, R. E. & Rickard, D. A.
Produced by the NASA Center for Aerospace Information (CASI) 
https://ntrs.nasa.gov/search.jsp?R=19760026763 2020-03-22T13:26:31+00:00Z

National Aeronautics and Space Administration
Goddard Space Flight Center
Greenbelt, Maryland 20771
FINAL REPORT. Contract NSG-5002
Architecture and Data Processing
Alternatives for the Tse Computer
VOLUME 1: Tse Logic Design Concepts
and the Development of Image Processing
Machine Architectures
D. A. Rickard
R. E. Bodenheimer
TECHNICAL REPORT TR-EE/CS-76-1
September 1976
ARCHITECTURE AND DATA PROCESSING
ALTERNATIVES FOR THE TSE COMPUTER
VOLUME 1. TSE LOGIC DESIGN CONCEPTS AND THE
DEVELOPMENT OF IMAGE PROCESSING MACHINE ARCHITECTURES
Robert E. Bodenheimer - Principal Investigator
D. A. Rickard - Co-Investigator
Department of Electrical Engineering
The University of Tennessee
Knoxville, Tennessee 37916
Final Report. NSG-5002
Period: May 1974 - August 1976
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION
GODDARD SPACE FLIGHT CENTER
GREENBELT, MARYLAND 20771
ABSTRACT
Schaefer and Strong have proposed a new class of digital computer
components which would perform two-dimensional array logic operations
(tse logic) on binary data arrays. This dissertation is concerned with
the further development of tse logic concepts through the design of Golay
transform processing machines that utilize tse logic.
The basic tse components that are currently under development
at NASA's Goddard Space Flight Center are described. The properties
of Golay transforms which make them useful in image processing are
reviewed, and several architectures for Golay transform processors are
presented with emphasis on the skeletonizing algorithm.
A hardwired skeletonizing machine is designed using basic tse
components. An output disable control line is shown to be an extremely
useful addition to active tse logic devices. Two additional hardwired
skeletonizing machines are developed using tse logic devices with
an output disable control. Several alternate techniques are illustrated
for performing the critical index recognition operation, and new tse
logic devices are introduced. A unique pipeline architecture is
developed for performing ultrahigh speed image processing. In addition,
a programmable tse computer capable of performing numerous Golay trans-
forms is designed. Programs are written for performing both the skel-
etonizing and swelling algorithms.
Conventional logic control units are developed for the Golay
transform processors. One is a unique microprogrammable control unit
that uses a microprocessor to control the tse computer. The remaining
control units are based on programmable logic arrays.
Performance criteria are established and utilized to compare
the various Golay transform machines. On the basis of this research
a critique of tse logic is presented, and directions for additional
research are identified.
TABLE OF CONTENTS
CHAPTER PAGE
1. INTRODUCTION . . . 1
   Historical Background . . . 2
   Future Requirements . . . 19
2. BASIC TSE LOGIC DEVICES AND CONCEPTS . . . 21
   Tse Logic . . . 21
   Electro-Optical Tse Logic . . . 23
   Tse Analog-to-Digital Conversion . . . 33
3. GOLAY HEXAGONAL PARALLEL PATTERN TRANSFORMATIONS . . . 36
   Neighborhood Considerations . . . 36
   Golay Transforms . . . 40
4. BASIC TSE LOGIC DESIGN CONCEPTS AND THEIR APPLICATION TO THE DESIGN OF A GOLAY TRANSFORM PROCESSOR . . . 47
   Elementary Tse Processor Control . . . 48
   Tse Memories . . . 51
   Hexagonal-To-Rectangular Array Transformation . . . 56
   Golay Neighbor Planes . . . 58
   Index Recognition . . . 66
   Golay Function . . . 72
   Implementation of the Skeletonizing Algorithm . . . 76
   Evaluation of Skeletonizing Machines . . . 83
5. IMPLEMENTATION OF THE SKELETONIZING ALGORITHM USING TSE LOGIC DEVICES WITH A DISABLE INPUT . . . 86
   Improved Conventional Logic Control Signal Interface to Tse Circuits . . . 86
   Some Additional Improved Tse Memories . . . 91
   Improved Index Recognition Circuits . . . 97
   An OR Latch Implementation of the Skeletonizing Algorithm . . . 104
   Control of the OR Latch Skeletonizing Machine . . . 110
   An Improved AND Latch Implementation of the Skeletonizing Algorithm . . . 113
6. A PIPELINED ARCHITECTURE FOR THE SKELETONIZING MACHINE . . . 120
   Multiple Tse Processing . . . 120
   A Minimum Hardware, Modified Pipeline Implementation of the Skeletonizing Algorithm . . . 121
   Additional Modified Pipeline Implementations of the Skeletonizing Algorithm . . . 133
7. A SPECIAL PURPOSE PROGRAMMABLE TSE PROCESSOR . . . 143
   A Programmable Tse Computer . . . 143
   Tse Computer Control Unit . . . 147
   Tse Computer Instruction Set . . . 152
   Microprogram Control of Tse Operations . . . 158
   A Cross-Assembler for the Tse Computer . . . 169
   Application Program Examples . . . 176
   Tse Computer Performance Evaluation . . . 178
8. CONCLUSION . . . 180
   A Critique of Tse Logic . . . 180
   Suggested Directions for Future Research . . . 181
REFERENCES . . . 183
APPENDIXES . . . 185
   APPENDIX A. SCHEMATIC SYMBOLS FOR TSE LOGIC DEVICES . . . 186
   APPENDIX B. TSE MASK PATTERNS . . . 190
   APPENDIX C. THE CDP1802 MICROPROCESSOR . . . 192
   APPENDIX D. CONTROL MICROPROGRAMS FOR THE TSE COMPUTER . . . 205
   APPENDIX E. RT-11 MACRO ASSEMBLER . . . 222
   APPENDIX F. SELECTED MACRO DEFINITIONS FROM THE TSE COMPUTER CROSS-ASSEMBLER MACRO LIBRARY . . . 234
   APPENDIX G. SAMPLE APPLICATIONS PROGRAMS FOR THE GOLAY TRANSFORM TSE COMPUTER . . . 246
LIST OF TABLES
TABLE PAGE
1.1 Golay Neighborhoods . . . 13
4.1 Characteristics of the Golay Neighbor Planes Generator Circuits for Rectangular Arrays of Three or Seven Subfields . . . 68
4.2 Characteristics of the Space and Time Iterative Index Recognition Circuits . . . 74
4.3 Basic Hardwired Skeletonizing Machine Performance Characteristics . . . 84
5.1 Performance Characteristics of the Index Recognition Circuits . . . 106
5.2 OR Latch Skeletonizing Machine Control Signals . . . 114
5.3 Improved AND Latch Skeletonizing Machine Control Signals . . . 118
5.4 Performance Characteristics of the Improved AND Latch and OR Latch Skeletonizing Machines . . . 119
6.1 Minimum Hardware, Modified Pipeline Skeletonizing Machine Control Signals . . . 134
6.2 Control Signals for the Modified Pipeline Skeletonizing Machine with Three Golay Neighbor Planes Generators . . . 138
6.3 Control Signals for the Modified Pipeline Skeletonizing Machine with the Space Iterative Index Recognition Circuit . . . 140
6.4 Performance Characteristics of the Modified Pipeline Skeletonizing Machines . . . 141
7.1 Tse Computer Control Signal Functions . . . 153
7.2 Tse Instructions . . . 154
7.3 Register and Mask Constant Definitions . . . 157
7.4 Tse Instruction Characteristics . . . 171
7.5 Special Tse Computer Instructions . . . 173
7.6 Performance of the Tse Computer as a Skeletonizing Machine . . . 179
A.1 Schematic Symbols for Tse Logic Devices . . . 187
B.1 Tse Mask Patterns . . . 191
C.1 CDP1802 Microprocessor Instruction Set . . . 195
C.2 COSMAC CDP1802 Register Assignments . . . 204
E.1 Legal Separating Characters . . . 223
E.2 Special Characters . . . 224
E.3 Operator Characters . . . 225
E.4 Legal Binary Operators . . . 226
E.5 Some Allowable Listing Directive Arguments . . . 229
E.6 Allowable Conditions . . . 231
E.7 Subconditional Directives . . . 232
LIST OF FIGURES
FIGURE PAGE
1.1 Organization of SPAC . . . 3
1.2 Arrangement of link circuits in the rectangular neighborhood of a single processing element . . . 5
1.3 Organization of Solomon . . . 5
1.4 Organization of ILLIAC IV . . . 9
1.5 Arrangement of Golay's proposed hexagonal module array . . . 11
1.6 Golay neighborhood pattern for a hexagonal array partitioned into three subfields . . . 12
1.7 A module of the parallel picture processing machine proposed by Kruse . . . 15
1.8 Logic unit organization for the serial model of the parallel picture processing machine . . . 17
1.9 Organization of a cell of CLIP 3 . . . 18
2.1 An example of primitive tse operations on binary images . . . 22
2.2 Basic tse logic devices . . . 25
2.3 Two modes of interleaver operation . . . 26
2.4 A prototype interleaver . . . 27
2.5 Exploded view of a tse EXCLUSIVE-OR circuit . . . 28
2.6 Alternate slide down device structures . . . 30
2.7 Alternate implementations of the contractor device . . . 31
2.8 An image intensifier implementation of the spiller device . . . 32
2.9 Tse analog-to-digital conversion hardware schematic for an eight level image . . . 34
2.10 Tse computer concept . . . 35
3.1 Eight point rectangular neighborhood . . . 38
3.2 Six point hexagonal neighborhood . . . 39
3.3 Hexagonal arrays of three, four, and seven subfields . . . 42
3.4 An example of the Golay transform skeletonizing algorithm . . . 45
4.1 One tse OR latch . . . 52
4.2 One tse AND latch . . . 53
4.3 Master-slave tse memory . . . 55
4.4 Subfield assignments and neighborhood patterns for rectangular arrays with three, four, and seven subfields . . . 57
4.5 Golay neighbor planes generator for rectangular arrays with four subfields . . . 59
4.6 Type 1 Golay neighbor planes generator for rectangular arrays with three or seven subfields . . . 61
4.7 Type 2 Golay neighbor planes generator for rectangular arrays with three or seven subfields . . . 62
4.8 Tse EXCHANGE gate . . . 64
4.9 Equivalent circuit for an EXCHANGE gate . . . 65
4.10 Type 3 Golay neighbor planes generator for rectangular arrays with three or seven subfields . . . 67
4.11 Space iterative combinational index recognition circuit for indices one, two, and three . . . 70
4.12 Time iterative, comparison type index recognition circuit . . . 71
4.13 Timing diagram for recognizing an index with a weight of three using the comparison type index recognition circuit . . . 73
4.14 Golay function circuit for a swelling operation . . . 75
4.15 Golay function circuit for a skeletonizing operation . . . 75
4.16 Block diagram of a tse logic implementation of the skeletonizing algorithm . . . 77
4.17 Schematic of a hardwired skeletonizing machine . . . 78
4.18 Timing diagram for one iteration of the skeletonizing algorithm . . . 80
4.19 Images formed during one iteration of the skeletonizing algorithm . . . 81
5.1 Basic conventional logic control signal interface to tse circuits . . . 87
5.2 Improved one tse OR latch . . . 89
5.3 Improved one tse AND latch . . . 90
5.4 Improved master-slave tse memory . . . 92
5.5 Six tse circular right shift parallel-input, parallel-output master-slave shift register . . . 93
5.6 Control signal timing diagram for the parallel-input, parallel-output master-slave tse shift register . . . 94
5.7 Six tse circular right shift, parallel-input, parallel-output shift register . . . 95
5.8 Control signal timing diagram for the parallel-input, parallel-output tse shift register . . . 96
5.9 Multiplexed index recognition circuit . . . 98
5.10 Timing diagram for the multiplexed index recognition circuit . . . 99
5.11 Shift register based index recognition circuit . . . 101
5.12 Timing diagram for the shift register based index recognition circuit . . . 102
5.13 Comparison type index recognition circuit using EXCLUSIVE-OR gates . . . 103
5.14 Timing diagram for the comparison type index recognition circuit using EXCLUSIVE-OR gates to identify basis points with an index of one, two, or three . . . 105
5.15 Hardwired skeletonizing machine using OR latches . . . 107
5.16 Timing diagram for one complete iteration of the skeletonizing algorithm using the OR latch type skeletonizing machine . . . 111
5.17 Control unit for the OR latch skeletonizing machine . . . 112
5.18 Schematic diagram for an improved AND latch hardwired skeletonizing machine . . . 115
5.19 Timing diagram for one complete iteration of the skeletonizing algorithm using the improved AND latch skeletonizing machine . . . 116
6.1 Three plane mixer . . . 122
6.2 Block diagram of a traditional pipeline machine architecture . . . 123
6.3 Block diagram of a modified pipeline architecture for tse logic image processing machines using Golay transforms . . . 126
6.4 Block diagram of a modified pipeline tse logic implementation of the skeletonizing algorithm . . . 127
6.5 A two plane mixer . . . 128
6.6 Schematic for the minimum hardware, modified pipeline implementation of the skeletonizing algorithm . . . 131
6.7 Timing diagram for the minimum hardware, modified pipeline implementation of the skeletonizing algorithm . . . 132
6.8 Schematic for the modified pipeline implementation of the skeletonizing algorithm with three Golay neighbor planes generators . . . 135
6.9 Timing diagram for the modified pipeline skeletonizing machine with the shift register based index recognition circuit . . . 136
6.10 Timing diagram for the modified pipeline skeletonizing machine with the space iterative index recognition circuit . . . 139
7.1 A special purpose tse computer organization . . . 144
7.2 A three subfield mask generator circuit . . . 146
7.3 Tse logic for the Golay transform tse computer . . . 148
7.4 Organization of a conventional microprogrammed control unit . . . 150
7.5 Block diagram of the tse computer control unit . . . 151
7.6 A flow chart for the general ALU operations control program . . . 159
7.7 A timing diagram for the tse register and ALU operations (except TCLRI) . . . 162
7.8 A flow chart for the tse compare operations control program . . . 163
7.9 A timing diagram for the tse compare operations . . . 164
7.10 A flow chart for the index recognition control program . . . 165
7.11 A timing diagram for recognizing an index with a weight of two . . . 167
7.12 A flow chart for the tse input control program . . . 168
7.13 A timing diagram for the tse input operation . . . 170
7.14 A macro definition for the TMIX instruction . . . 175
7.15 A timing diagram for the TMIX instruction . . . 177
C.1 Internal structure of the CDP1802 COSMAC Microprocessor . . . 193
D.1 Tse computer general ALU operations control program . . . 206
D.2 Long delay subroutine for variable program counters . . . 211
D.3 Tse computer compare operations control program . . . 212
D.4 Tse computer index recognition control program . . . 215
D.5 Long delay subroutine for R3 as the calling program counter . . . 219
D.6 Tse computer input control program . . . 220
F.1 Representative macro definitions from the tse computer cross-assembler macro library . . . 235
G.1 A program for performing the Golay transform skeletonizing algorithm . . . 247
G.2 A program for performing the Golay transform swelling algorithm . . . 252
CHAPTER 1
INTRODUCTION
A primary goal of computer architecture research is to increase the
speed and efficiency of computing machines. The performance boundary
for a particular machine organization is imposed by the basic physical
limitations of the available hardware. Therefore, increased computing
power must ultimately be obtained through improvements in computer organ-
ization. Realizable computer architectures, however, are constrained by
the availability of suitable hardware. Thus, optimum computer system
development requires that component design and system architecture be
considered in concert. This principle is the basis for the general
goals of the research reported in this dissertation.
Schaefer and Strong [1]* have proposed a new class of digital computer
components which would perform two-dimensional array logic operations
(tse** logic) on binary data arrays. The goal of this research
effort is to support NASA's development of the tse logic concept through
the design of a two-dimensional parallel computer using hypothetical tse
logic devices. By developing tse computer system architecture concepts
*Numbers in brackets refer to those entries listed in the list of
references.
**Tse is the English transliteration of the Chinese word for a
pictograph character.
concurrently with the physical hardware, trade-offs between tse component
complexity and system design constraints can be optimized. For example,
useful additions to the tse logic family have been proposed as a result
of this research.
Historical Background
The concept of a two-dimensional parallel computer was utilized by
Unger [2, 3] as a method for improving the performance of digital computers
in spatially oriented problems such as pattern detection and
recognition. Unger's proposed spatial or SPAC computer consists of a
master control unit and a rectangular array of logical modules (Figure
1.1). Each module contains a one-bit accumulator, several one-bit mem-
ory registers, and some logical circuitry. Modules communicate directly
with their four immediate neighbors. In addition, there is an external
input in the form of a photocell. Thus, with the exception of the control
unit, a module has all the features of a rudimentary conventional
computer.
Unger's SPAC computer utilizes global control in which identical
commands are issued to all logical modules in the array. The master
control unit includes a random-access memory for instruction storage, a
program counter, and appropriate instruction decoding circuitry. Opera-
tion is similar to that of a conventional digital computer control unit
except that the control signals are distributed to each module in the
rectangular array, rather than to a single arithmetic-logic unit. The
global control feature of SPAC limits use of the machine to problems
involving applied parallelism.
Figure 1.1 Organization of SPAC.
Unger recognized the importance of connectivity in spatial computer
operations. As a result, each SPAC module is connected to its eight
nearest neighbor modules through special link circuits (Figure 1.2).
When a link instruction is executed, a storage register in each link is
set if the accumulators of the two logical modules to which the link is
connected both contain a one. The storage register is cleared if the
accumulator of one, or both, of the modules contains a zero. The state
of each link storage register remains fixed until another link instruc-
tion is executed.
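The link operation described above can be sketched in a few lines of Python. This is a minimal illustration, not Unger's circuit: it models only the horizontal link orientation, and the function name `horizontal_links` and the list-of-rows representation of the accumulators are assumptions made for the example.

```python
# Hypothetical model of a SPAC link instruction, horizontal orientation only.
def horizontal_links(acc):
    """Return the link storage registers between horizontally adjacent modules.

    acc is a list of rows of 0/1 accumulator values. links[r][c] is the link
    joining module (r, c) and module (r, c + 1); it is set only when both
    accumulators contain a one, and cleared otherwise.
    """
    return [[row[c] & row[c + 1] for c in range(len(row) - 1)] for row in acc]


acc = [
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
links = horizontal_links(acc)
print(links)  # [[1, 0, 0], [0, 1, 1]]
```

The vertical and diagonal link orientations would follow the same pattern with a different pair of module coordinates.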
Expand instructions are used in conjunction with the link operation
to perform tasks involving connectivity. For example, a horizontal
expand instruction causes a one to be placed in the accumulator of each
module which is connected through a horizontal chain of set link elements
to two modules with ones in their accumulators. An expansion may be performed
with respect to any combination of the four available link orientations:
horizontal, vertical, positive diagonal, and negative diagonal.
The real significance of the expand operation is that the contents of a
module can be directly affected by the contents of arbitrarily distant
modules.
Although Unger's computer was impractical in terms of hardware cost,
the architecture of SPAC was simulated on an IBM 704 general purpose com-
puter. In this form, the machine was successfully applied to the recogni-
tion of alphanumeric characters and the detection of L-shaped patterns.
Figure 1.2 Arrangement of link circuits in the rectangular neighborhood
of a single processing element.
Figure 1.3 Organization of Solomon. [Block diagram labels: program store, central control unit, network control, PE sequencer, row select, column select, array of processing elements (PE), bulk storage, I/O control, instruction and data bus.]
In 1962 Slotnick [4] proposed a more flexible parallel processing
computer called SOLOMON. Like Unger's machine, the SOLOMON computer,
shown in Figure 1.3, consists of a large rectangular array of processing
elements (similar to the logical modules in SPAC) and a central, global
control unit. There are, however, three basic differences between
SOLOMON and SPAC. First, the SOLOMON computer introduces limited local
control of each processing element by inhibiting execution of the current
instruction wherever the mode of a particular processing element does not
match the mode specified by the instruction. A processing element may
be in one of four modes as determined by internal conditions and stored
data. Thus, although identical instructions are broadcast to each proc-
essing element, individual conditional jumps can be programmed [5].
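The mode-based inhibition described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the dictionary representation of a processing element, the function name `broadcast`, and the example operation are all hypothetical, and the way SOLOMON derives a mode from internal conditions and stored data is not modeled.

```python
# Hypothetical model of SOLOMON-style local control by mode matching.
def broadcast(pes, required_mode, operation):
    """Broadcast one instruction to every processing element (PE).

    Only PEs whose current mode equals the mode named in the instruction
    execute the operation; execution is inhibited in all other PEs.
    """
    for pe in pes:
        if pe["mode"] == required_mode:
            pe["value"] = operation(pe["value"])
    return pes


pes = [
    {"mode": 0, "value": 5},
    {"mode": 1, "value": 5},
    {"mode": 0, "value": 7},
]
broadcast(pes, required_mode=0, operation=lambda v: v + 1)
print([pe["value"] for pe in pes])  # [6, 5, 8]
```

Because the same instruction reaches every element but takes effect only where the mode matches, a sequence of such broadcasts behaves like a conditional branch taken independently in each element.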
The second basic difference between SOLOMON and SPAC is the inter-
communication pattern of the processing element array. Unger's design
is based on a simple rectangular array with communication between a
module and its four nearest neighbors. The SOLOMON computer retains
this basic intercommunication pattern but with five possible modifica-
tions. These are as follows:
1. A vertical cylinder formed by establishing communication between
the outside columns of the array.
2. A horizontal cylinder formed by establishing communication
between the outside rows of the array.
3. A torus formed by combining the first two options.
4. A single straight line formed by a connection of all the proc-
essing elements.
5. A circular array formed by connecting the end processing
elements of the straight line.
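The first three modifications listed above amount to wrapping the four-neighbor pattern around one or both pairs of array edges. The following Python sketch shows one way to express this; the function name `neighbors` and its flag parameters are assumptions for illustration, and the straight line and circular options (4 and 5) are not covered.

```python
# Hypothetical neighbor lookup for a rectangular PE array with optional wraparound.
def neighbors(r, c, rows, cols, wrap_cols=False, wrap_rows=False):
    """Return the coordinates of the four nearest neighbors of PE (r, c).

    wrap_cols joins the outside columns (vertical cylinder), wrap_rows joins
    the outside rows (horizontal cylinder), and both together form a torus.
    Neighbors that fall off an unwrapped edge are omitted.
    """
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if wrap_rows:
            nr %= rows
        if wrap_cols:
            nc %= cols
        if 0 <= nr < rows and 0 <= nc < cols:
            result.append((nr, nc))
    return result


print(neighbors(0, 0, 4, 4))                                  # plain array
print(neighbors(0, 0, 4, 4, wrap_cols=True))                  # vertical cylinder
print(neighbors(0, 0, 4, 4, wrap_cols=True, wrap_rows=True))  # torus
```

A corner element has two neighbors in the plain array, three on either cylinder, and four on the torus.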
A third distinction between the organizations of SOLOMON and SPAC
is the link structure included in SPAC. No equivalent structure was
provided in the SOLOMON computer since SOLOMON was designed primarily
for numeric processing.
The SOLOMON design was based on a 32 x 32 array of processing ele-
ments. A machine of this size was never constructed due to the exces-
sive hardware cost; however, a 10 x 10 array was built [6]. Research
using this machine led to the design of a SOLOMON II computer with a
faster clock rate, a faster multiply time, and a 24-bit word length [7].
Further studies and the advent of medium scale integrated circuits
encouraged the development of the ILLIAC IV computer which is the larg-
est parallel-array computer now in existence [8].
The arithmetic-logic unit of the ILLIAC IV computer consists of 256
processing elements arranged in four reconfigurable SOLOMON type arrays
of 64 processors each (Figure 1.4). A separate global control unit is
provided for each array so that the machine can be operated as four
independent quadrants, two 128 element arrays, or one 256 element array.
Overall system control is provided by a Burroughs B-6500 computer.
Each processing element in the array requires 10^4 emitter-coupled-logic
gates to execute 4 x 10^6 instructions per second. In addition,
there are 2048 sixty-four bit words of memory within each processing
element. A one giga-bit disk with a transfer rate of 10^9 bits per second
is provided for mass data storage. Because of economic considerations,
only one quadrant of the four quadrant ILLIAC IV system was
originally constructed. Additional details of the ILLIAC IV design and
the applications research which has been undertaken using this machine
can be found in the engineering literature [5-9].
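As a rough check on the scale of these figures, the per-element numbers can be multiplied out for the full 256 element design. The short Python calculation below reads the quoted values as 4 x 10^6 instructions per second and 2048 sixty-four bit words per element; the aggregate rate is an upper bound that assumes every element executes on every instruction, which is an assumption of this sketch and not a claim made in the report.

```python
# Back-of-the-envelope totals for the full four quadrant ILLIAC IV design.
PROCESSING_ELEMENTS = 256
INSTRUCTIONS_PER_SECOND_PER_PE = 4 * 10**6
WORDS_PER_PE = 2048
BITS_PER_WORD = 64

# Upper bound: all processing elements active on every instruction.
peak_rate = PROCESSING_ELEMENTS * INSTRUCTIONS_PER_SECOND_PER_PE
memory_bits = PROCESSING_ELEMENTS * WORDS_PER_PE * BITS_PER_WORD

print(peak_rate)    # 1024000000
print(memory_bits)  # 33554432
```

On this reading the aggregate instruction rate is of the same order as the 10^9 bits per second disk transfer rate.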
Figure 1.4 Organization of ILLIAC IV. [Block diagram labels: B-6500 general purpose computer, peripheral devices, four control units, arrays 0 through 3 of 64 PE's each, disk, I/O buffer.]
In 1969 Golay proposed a two-dimensional computer to perform hexagonal
parallel pattern transformations using the hexagonal module array
shown in Figure 1.5 rather than the traditional square module array [10].
The hexagonal tessellation simplifies connectivity dependent operations
since each point in the array has six equally distant neighbors rather
than four primary neighbors at the distance unity and four secondary
neighbors at the distance √2 as in the square module array. In addition,
the Golay transformations specify division of the array into subfields
of non-neighboring modules (Figure 1.6) so that no two neighboring
modules need be operated on simultaneously as a function of each other.
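The subfield property can be checked with a small Python sketch. The axial coordinates (q, r) and the assignment rule (q - r) mod 3 are assumptions chosen for illustration; the report defines its own subfield layout in Figure 1.6, and this sketch only demonstrates that a three subfield partition with no neighboring modules in the same subfield exists.

```python
# Hypothetical three-subfield assignment on a hexagonal array in axial coordinates.
HEX_NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1))


def subfield(q, r):
    """Assign module (q, r) to one of three subfields (0, 1, or 2)."""
    return (q - r) % 3


def neighbors_share_no_subfield(size):
    """Check that no module in a size x size patch shares a subfield with a neighbor."""
    for q in range(size):
        for r in range(size):
            for dq, dr in HEX_NEIGHBORS:
                if subfield(q, r) == subfield(q + dq, r + dr):
                    return False
    return True


print(neighbors_share_no_subfield(12))  # True
```

Since every neighbor of a module lies in a different subfield, all modules of one subfield can be updated at once without any of them depending on another module updated in the same step.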
The basic computational unit in Golay's proposed computer is the
submodule which corresponds to a module in Unger's SPAC computer. Each
submodule represents the state of one point in a two-dimensional binary
image, and the entire image is represented by a planar, hexagonal layer
of submodules. Normally, the machine consists of a number of layers
which overlay each other to form a hexagonal array of modules. All of
the submodules within one module are interconnected and, furthermore,
each submodule of a particular layer, k, communicates directly with its
six nearest neighbor submodules within layer k.
The Golay transform classifies the 64 possible patterns of the six
element surround of a submodule into the 14 characteristic indices shown
in Table 1.1. Submodule operations can be a function of the subfield
select signal, the state (1 or 0) of the submodule, the index of the sub-
module's surround, or the central control unit commands. Often, the
modular operations are repeated until no further change occurs in the
layer, k, on which the operations are being performed. This condition is
detected by forming the modulo 2 sum of the current and the immediately
preceding iterations in an auxiliary layer, x, and then determining
whether or not layer x is empty.
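The count of 14 indices can be reproduced by grouping the 64 surround patterns into classes that are identical under rotation of the hexagon. The Python sketch below assumes that this rotational equivalence is what defines an index, which is consistent with the weights listed in Table 1.1; the bit ordering and the function names are choices made for the example, and the class labels it produces do not correspond to Golay's index numbers.

```python
from collections import Counter


def rotations(pattern):
    """Return the six rotations of a 6-bit surround pattern."""
    return [((pattern >> k) | (pattern << (6 - k))) & 0b111111 for k in range(6)]


def canonical(pattern):
    """Pick the numerically smallest rotation as the class representative."""
    return min(rotations(pattern))


# Group all 64 possible surrounds by their rotation class.
classes = Counter(canonical(p) for p in range(64))

print(len(classes))              # 14
print(sorted(classes.values()))  # [1, 1, 2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6]
```

The class sizes match the weights in the table: two classes of weight 1, one of weight 2, two of weight 3, and nine of weight 6, for a total of 64 patterns.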
Figure 1.5 Arrangement of Golay's proposed hexagonal module array.
Figure 1.6 Golay neighborhood pattern for a hexagonal array
partitioned into three subfields.
TABLE 1.1
GOLAY NEIGHBORHOODS
          0     0     1     1     1     1     1
         0 0   1 0   1 0   1 1   1 1   1 1   1 1
Pattern   +     +     +     +     +     +     +
         0 0   0 0   0 0   0 0   0 1   0 1   1 1
          0     0     0     0     0     1     1
Index     0     1     2     3     4     5     6
Weight    1     6     6     6     6     6     1     Σ = 32
0 1 1 1 0 0 1
1 1 1 0 1 0 1 1 0 0 1 0 0 1
Pattern + + + + + + +
0 0 0 0 0 0 0 1- 1 0 1 1 0
1 0 1 1 0 0 1
Index 7 8 9 10 11 12 13
Weight 2 6 6 6 6 3 3 Σ = 32
Although the two-dimensional computer proposed by Golay was never
constructed, a special purpose computer system capable of performing
simple Golay transforms on a three layer, 128 x 128 array was implemented
[11]. This Golay logic processor (GLOPR) was used successfully
to distinguish between two types of white blood cells and to perform
other pattern recognition tasks.
Kruse [12] proposed a parallel picture processing machine with
many features similar to those of the Golay logic processor but utiliz-
ing a rectangular module array. Each module within the array is a
synchronous sequential circuit (Figure 1.7) which has a state transition
function that is dependent on the present states of that module, the
eight nearest neighbors of that module, and a set of global control
signals. The number of possible states which a module may possess is
limited by design economy since, for even a small number of states, the
number of possible neighborhood patterns becomes large.
Each neighborhood pattern is called a template. State transitions
occur when the neighborhood of a module matches one of the templates
specified in a particular instruction. Kruse allows for rotationally
symmetric and iterative operations similar to those used in the Golay
transform. "Don't care" states within the neighborhood pattern and
limited arithmetical operations are also proposed. The potential prob-
lem of conflicting state transitions is avoided by forcing the template
matching operation to proceed in a specified order. Only the state
transition indicated by the first matching template is allowed to occur.
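A minimal sketch of ordered template matching with "don't care" entries follows. It assumes binary module states, a 3 x 3 neighborhood listed row by row with the module itself in the middle, and zero-valued points outside the image; these conventions and the names `DONT_CARE`, `matches`, and `apply_instruction` are illustrative assumptions, not Kruse's notation.

```python
DONT_CARE = None  # template entry that matches either state


def matches(neighborhood, template):
    """Compare a 9-element neighborhood with a 9-element template."""
    return all(t is DONT_CARE or t == n
               for n, t in zip(neighborhood, template))


def apply_instruction(image, templates):
    """Apply an ordered list of (template, new_state) pairs to every module.

    All modules read the old image, so the update is parallel in effect.
    Only the first matching template causes a state transition.
    """
    rows, cols = len(image), len(image[0])

    def at(r, c):
        # Assumption: points outside the image are treated as state 0.
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            hood = [at(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            for template, new_state in templates:
                if matches(hood, template):
                    out[r][c] = new_state
                    break  # later templates are ignored
    return out


ISOLATED_ONE = (0, 0, 0, 0, 1, 0, 0, 0, 0)
ANY_ONE = (DONT_CARE,) * 4 + (1,) + (DONT_CARE,) * 4
```

With the instruction `[(ISOLATED_ONE, 0), (ANY_ONE, 1)]` an isolated point is erased; reversing the order leaves it in place, which shows how the fixed matching order resolves conflicting transitions.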
Two essentially serial, information extracting operations are
employed by Kruse's parallel picture processing machine. The first is
Figure 1.7 A module of the parallel picture processing machine
proposed by Kruse.
a neighborhood counting operation in which the number of occurrences of
a specific neighborhood is computed during each local operation. This
information is used for texture analysis and area measurements. The
second is a coordinate extraction operation which identifies the coor-
dinate of the first module encountered in a predetermined scanning
sequence whose neighborhood matches a specified template. This process
can be used to locate particular features within a picture.
Since all eight nearest neighbors as well as multiple state tran-
sitions are utilized in Kruse's machine, the logical modules must be
quite complex. Therefore, Kruse did not attempt to present a truly
parallel implementation of the design. Instead, a special purpose serial
machine capable of performing one local operation at a time was con-
structed (Figure 1.8). This machine simulates the parallel picture
processing operation by sequentially matching templates to each neigh-
borhood in the image. A conventional computer provides system control.
One of the most recent parallel image processing machines, CLIP 3,
was described by Stamopoulos [13]. CLIP 3 contains an iterative, 16 x 12
array of logic cells. As shown in Figure 1.9, each cell includes a sum-
mation and threshold device, an OR gate, and a function generator. A
global control unit provides three control lines that select one of eight
threshold levels and eight control lines that select the neighbor inputs
which are to be summed. Either a square or hexagonal tessellation can
be selected under program control.
The output of the threshold unit is ORed with the contents of the
B storage register and applied to one input of the function operator.
Storage register A provides the second function generator input. Eight
control lines select the Boolean operations which are performed on the
Figure 1.8 Logic unit organization for the serial model of the parallel picture
processing machine.
Figure 1.9 Organization of a cell of CLIP 3.
inputs to produce two outputs, N and D. The N output, which represents
the state of the cell, is supplied to the threshold units of neighboring
cells.
The CLIP 3 processor has been constructed using TTL logic and MOS
memory at a cost of approximately $10,000. Serial scanning and display
devices are used for economic reasons. The A and B storage devices are
shift registers whose contents can be displayed on a dual beam oscillo-
scope. Pattern inputs are supplied by means of a light pen. Several
examples of successful software developed using CLIP 3 are given in [13].
A major limitation of this machine is the relatively small number of
cells in the array.
Future Requirements
The two-dimensional parallel computers described in the previous
section are potentially orders of magnitude faster than conventional computers
in image processing and matrix manipulation tasks. Nevertheless,
even more powerful machines will be required to process the 50,000 images
per day that NASA expects to receive from earth observation spacecraft
during the 1980's [1]. In fact, other important tasks such as real-time
modeling of the weather through studies of ocean currents and atmospheric
conditions are already awaiting the advent of sufficiently powerful par-
allel processing machines [14].
In order to provide adequate speed and resolution, future machines
may require arrays containing as many as 1024 x 1024 elements. Construc-
tion of these machines using hardware organizations such as those des-
cribed in this chapter would be a formidable task. Although individual
modules with sufficient capabilities could probably be produced as single
integrated circuits (e.g., microprocessors), the problem of interconnecting
a million or more components to form the complete array would remain.
The development of inherently parallel tse logic devices as proposed by
Schaefer and Strong [1] is an attempt to make large two-dimensional par-
allel computers a reality.
A summary of the basic tse logic devices proposed by Schaefer and
Strong [1] is presented in Chapter 2. The neighborhood considerations
which directed this investigation toward implementation of the Golay
transform skeletonizing algorithm are discussed in Chapter 3. Chapters
4 and 5 present basic tse logic implementations of the skeletonizing
algorithm. Several new tse logic devices and a cost function are pro-
posed. A hardwired, conventional logic control unit for the tse logic
processor is developed. In Chapter 6, a high-speed implementation of
the skeletonizing algorithm, based on a unique application of the pipe-
line principle, is discussed. A programmable tse computer organization
which can perform Golay transform algorithms is developed in Chapter 7.
A microprocessor based control philosophy is proposed. Chapter 8 summarizes
the significant results of this research and suggests directions
for future investigations.
CHAPTER 2
BASIC TSE LOGIC DEVICES AND CONCEPTS
Previously, researchers have proposed two-dimensional parallel
computer architectures based on planar arrays of logical modules. Each
module is essentially a highly specialized microprocessor which, to
obtain maximum processing speed, must have a different organization for
each unique computer architecture. The tse logic devices proposed by
Schaefer and Strong [1] represent an alternate technique for construct-
ing large, two-dimensional parallel computers. This chapter summarizes
the work of Schaefer and Strong as reported in [1].
Tse Logic
A tse is a two-dimensional, rectangular matrix of binary data. Tse
devices execute logic operations simultaneously on all sets of binary
data from corresponding matrix positions within a group of input tses.
The primitive operations AND, OR, NEGATE, and SLIDE demonstrated in
Figure 2.1 form a functionally complete tse logic set. The AND, OR, and
NEGATE operations represent the standard Boolean functions as applied
to tses rather than bits, whereas the SLIDE operation translates a binary
image an integer number of matrix positions in the ± x and/or ± y direc-
tion. Through the four basic SLIDE operations (right, left, up, and
down), tse logic provides a unique method of transferring data from any
tse matrix position (i,j) to any other matrix position (m,n). Any
Boolean function of arbitrarily selected tse data points can be generated
Figure 2.1 An example of primitive tse operations on binary images.
using the primitive tse operations. Therefore, any digital computer can
be constructed from a standard set of tse logic devices which perform
only elementary operations. The simplicity of the computational cells
within each tse logic device is expected to permit their successful inte-
gration into the required rectangular array.
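The four primitive operations can be modeled in software as follows. This is a sketch under assumptions: a tse is represented as a list of rows of 0/1 values, the function names are hypothetical, and positions vacated by a SLIDE are filled with zeros.

```python
def tse_and(a, b):
    """Element-by-element AND of two tses of equal size."""
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tse_or(a, b):
    """Element-by-element OR of two tses of equal size."""
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tse_negate(a):
    """Complement of every element of a tse."""
    return [[1 - x for x in row] for row in a]


def tse_slide(a, dx, dy):
    """Translate a tse dx columns to the right and dy rows down.

    Negative values slide left or up. Vacated positions become 0.
    """
    rows, cols = len(a), len(a[0])
    return [[a[r - dy][c - dx]
             if 0 <= r - dy < rows and 0 <= c - dx < cols else 0
             for c in range(cols)]
            for r in range(rows)]
```

In this model, moving the data point at row i, column j to row m, column n is `tse_slide(a, n - j, m - i)`.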
Electro-Optical Tse Logic
Schaefer and Strong [1] proposed an implementation of the tse logic
concept that is based on a combination of electronic and fiber optic
technologies. The AND, OR, and NEGATE functions, as well as a special
REFORMAT operation, are performed by active semiconductor devices. SLIDE
operations and general data transfers are accomplished using passive fiber
optic devices.
Active tse logic devices consist of an integrated array of computa-
tional cells. Each cell contains photo-detectors which convert the
optical input into electrical signals. The electrical signals are processed
by conventional MOSFET circuits and converted to an optical output
by either exciting or failing to excite an electroluminescent material.
Thus, the state of each data point in a tse is indicated by the presence
(state 1) or absence (state 0) of light at the corresponding array
position.
Photon coupling between tse logic devices is provided by coherent
fiber optic bundles which act as tse image data paths. Optical inputs
to individual cells of multiple tse input devices, such as the AND and
OR gates, are projected onto the integrated circuit by a fiber optic
interleaver. The glass fibers which make up the interleaver are arranged
so that data points from corresponding matrix positions of two input
tses are brought into adjacency at the output, thereby becoming inputs
to a single cell in the active tse logic device. Initially, the complexity
of the interleaver fiber optic array will limit the number of
tse inputs to two. The basic tse logic devices are illustrated in
Figure 2.2, and schematic symbols for tse devices are listed in Appendix A.
The problem of distributing light to multiple fiber optic bundles
restricts the basic fan-out of each tse logic device to one; however,
the interleaver can be used in reverse as an image duplicator. Half of
the light output of each electroluminescent point in the array appears
at each duplicator output. The two modes of operation for an interleaver
are illustrated in Figure 2.3, and a prototype interleaver is shown in
Figure 2.4. Because of the arrangement of the glass fibers within the
interleaver and the decreased light intensity at each output, an active
integrated circuit reformatter is normally required at each output. The
reformatter serves as a buffer and restores the proper signal levels.
Therefore, the effective fan-out of a tse logic device can be increased
to two by adding a duplicator (interleaver) and two reformatters to the
basic device. Further increases in fan-out can be obtained by cascading
additional duplicators and reformatters. As shown by Schaefer and Strong
[1], the reformatters can be replaced by negators when the complement of
the input tse is required. This approach reduces propagation delay and
the number of components required in some tse circuits but also reduces
the potential noise immunity of the negator device since the logic one
input threshold must be set to less than one-half of the normal logic
one light intensity. The basic tse logic circuit for the EXCLUSIVE-OR
of two tses is illustrated in Figure 2.5.
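The logic of an EXCLUSIVE-OR built only from two-input AND and OR gates and negators can be written as (A AND NOT B) OR (NOT A AND B). The sketch below uses that standard decomposition; it is an assumption that this matches the exact component arrangement of Figure 2.5, and the list-of-rows model and function names are hypothetical.

```python
def tse_and(a, b):
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tse_or(a, b):
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tse_negate(a):
    return [[1 - x for x in row] for row in a]


def tse_xor(a, b):
    """EXCLUSIVE-OR of two tses from two-input AND, OR, and NEGATE only.

    Each input is used twice (once true, once complemented), which is
    why the hardware needs a duplicator on each input path.
    """
    a_and_not_b = tse_and(a, tse_negate(b))
    not_a_and_b = tse_and(tse_negate(a), b)
    return tse_or(a_and_not_b, not_a_and_b)
```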
	
Figure 2.2 Basic tse logic devices.
Figure 2.3 Two modes of interleaver operation.
Figure 2.4 A prototype interleaver. (Courtesy of Earth Observation Systems
Division, Goddard Space Flight Center)
Figure 2.5 Exploded view of a tse EXCLUSIVE-OR circuit.
(Courtesy of Earth Observation Systems Division, Goddard
Space Flight Center).
SLIDE operations are accomplished by transferring the tse from one
fiber optic image path to another path which is offset in the x and/or y
direction. Normally, fibers in the output path which receive no inputs
from the original tse are masked in order to guarantee a logical zero in
the corresponding position of the output tse. Figure 2.6 demonstrates
the image path configuration required for a SLIDE down operation. By
utilizing the proper offset, a SLIDE gate can be constructed so as to
translate a tse any number of matrix positions.
SLIDE gates are one of several intercommunication devices proposed
by Schaefer and Strong [1]. The other intercommunication devices being
considered for use in tse computers include: a cycler, a rotater, a
vertical inverter, a horizontal inverter, a diagonal inverter, a hori-
zontal sweeper, a vertical sweeper, a contractor, a spiller, a magnifier,
and a demagnifier. In the context of this study, the most important
special intercommunication devices are the contractor and the spiller.
A contractor has one tse input and one optical output element. The out-
put signal is a logic one if any elements of the input tse are in the
logic one state. This device is useful for detecting an empty or all
zero tse which corresponds to a black image. Two potential realizations
of the contractor are illustrated in Figure 2.7. The spiller device
shown in Figure 2.8 has one tse input and one tse output. The output
tse is a white or all logic one image if and only if the element of the
first row and first column of the input tse is a logic one. Spiller
operation can be obtained from the tse circuit shown in the lower section
of Figure 2.7 if all elements of the input image except element (1,1)
are masked.
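The behavior of the contractor and the spiller can be sketched as below. The list-of-rows tse model and the function names are assumptions, and the spiller output is assumed to be all zeros when element (1,1) is a logic zero.

```python
def contractor(tse):
    """Single output element: 1 if any element of the input tse is 1."""
    return 1 if any(any(row) for row in tse) else 0


def spiller(tse):
    """All-ones tse if the first-row, first-column element is 1.

    Assumption: the output is all zeros otherwise.
    """
    value = 1 if tse[0][0] == 1 else 0
    return [[value] * len(row) for row in tse]


def spiller_from_contractor(tse):
    """Spiller obtained by masking every element except (1,1) and contracting."""
    masked = [[x if (r, c) == (0, 0) else 0 for c, x in enumerate(row)]
              for r, row in enumerate(tse)]
    value = contractor(masked)
    return [[value] * len(row) for row in tse]
```

An empty tse is detected with `contractor(tse) == 0`, which is the test needed to stop an iterative transform.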
Figure 2.6 Alternate SLIDE down device structures.
Figure 2.7 Alternate implementations of the contractor device.
Figure 2.8 An image intensifier implementation of the spiller device.
Performance specifications for the electro-optical tse logic components
will be an important consideration in the development of tse
computer architectures. The experimental devices currently being developed
are based on a 128 x 128 element array. Initially, active tse logic
devices will have a response time (propagation delay) of approximately
five milliseconds and will consume up to three watts of power. Passive
devices will add significant volume and weight to the tse computer
structure. For example, the prototype interleaver (Figure 2.4, page 27)
weighs 0.7 grams, is six centimeters long, and has an output image area
of one square inch. Major objectives in the development of tse logic
devices will be to increase the array size to 1024 x 1024 elements while
reducing the component size, power consumption, and propagation delay.
A major advantage of the tse logic concept is that these improvements
can be incorporated into individual active tse logic devices as the
electro-optical technology improves.
Tse Analog-to-Digital Conversion
A major application of tse logic will be processing the multiple
gray level images received by earth observation spacecraft. Before these
images can be processed, they must be digitized into a set of binary
tses. Figure 2.9 demonstrates the concept of digitizing an image into
three tses by using threshold devices. The output tse data word consists
of three ordered tses and represents an eight level quantization of the
original image. A six tse data word derived from a 1024 x 1024 pixel
image would contain over six million bits of data representing 64 gray
levels. Figure 2.10 illustrates the tse logic concept as applied to pro-
cessing earth resources data.
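The digitization concept and the data word size quoted above can be checked with a short sketch. It assumes the gray levels are already available as integers from 0 to 2^n - 1 and extracts one binary tse per bit, most significant first; the hardware uses optical threshold devices instead, so this is only a functional model with hypothetical names.

```python
def digitize(image, n_tses=3):
    """Split a gray level image into n_tses binary tses, most significant first.

    Assumption: each pixel is an integer in the range 0 .. 2**n_tses - 1.
    """
    return [[[(pixel >> bit) & 1 for pixel in row] for row in image]
            for bit in range(n_tses - 1, -1, -1)]


def bits_in_data_word(n_tses, rows, cols):
    """Total number of bits in a tse data word."""
    return n_tses * rows * cols


planes = digitize([[5, 2], [7, 0]])  # eight level image, three tses
```

The most significant tse is the same as thresholding the image at level 4, and `bits_in_data_word(6, 1024, 1024)` gives 6,291,456 bits for the six tse, 64 level case.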
Figure 2.9 Tse analog-to-digital conversion hardware
schematic for an eight level image.
(Courtesy of Earth Observation Systems Division,
Goddard Space Flight Center)
Figure 2.10 Tse computer concept.
(Courtesy of Earth Observation Systems Division, Goddard Space Flight Center)
CHAPTER 3
GOLAY HEXAGONAL PARALLEL PATTERN TRANSFORMATIONS
Many important pattern detection and recognition algorithms are
based on nearest neighbor logic in which each point in a binary image is
treated as the basis point of a localized neighborhood. These algorithms
utilize image transformations which require various operations to be per-
formed on the basis point as a function of the states of the basis point
and the points within its neighborhood. The extent and form of the
neighborhood must be carefully matched to the application so that hard-
ware complexity can be minimized. In the first section of this chapter,
some factors which influence the choice of a neighborhood are discussed.
The second section reviews a particular type of pattern recognition
algorithm which utilizes nearest neighbor logic and is the basis of the
tse logic designs presented in the remainder of this dissertation.
Neighborhood Considerations
As pointed out by Golay [10], there are only three types of planar,
symmetric, isotropic point arrangements: the square, hexagonal, and
triangular arrays. In a standard, uniformly-shaped rectangular digitiza-
tion pattern, each basis point has four primary neighbors at the distance
unity, four secondary neighbors at the distance √2, and additional
neighbors at distances of two and greater. The number of points which
can be considered in the basic neighborhood is limited by the practical
considerations of hardware complexity and cost. Therefore, the simple
eight point neighborhood shown in Figure 3.1 is normally selected for the
rectangular array. As shown by Kruse [12], more complex neighborhood
operations can be simulated by selected sequences of local operations
using the basic neighborhood.
The rectangular neighborhood is advantageous in pattern recognition
algorithms which depend on an orthogonal coordinate system. In general,
however, the rectangular neighborhood requires more complex processing
than the hexagonal neighborhood [10]. As shown by Golay [10], any oper-
ation on a basis point, P, which requires knowledge of the connectivity
properties of the rectangular neighborhood of P should be a function of
both the four primary and the four secondary neighbors of P. However,
since the secondary neighbors are at a somewhat greater distance from P,
the function should depend less strongly on these variables. Thus, the
development and implementation of algorithms involving the connectivity
of rectangular neighborhoods is a complex task.
This task becomes even more complex when the triangular array is
employed. In order to determine whether or not a basis point operation
will alter the connectivity of a triangular array, one must know the
states of three nearest neighbors at distance unity, six neighbors at
distance √3, and three neighbors at distance two. Therefore, the triangular
array is not normally used in pattern recognition algorithms.
In contrast, the uniform hexagonal digitization pattern proposed by
Golay [10] yields a basic neighborhood (Figure 3.2) of six points which
are equidistant from the basis point. As a result, the connectivity
properties of the hexagonal neighborhood are readily defined as a func-
tion of the basis point and its six nearest neighbors. The single dis-
advantage of the hexagonal array is that the natural coordinate axes do
Figure 3.1 Eight point rectangular neighborhood.
Figure 3.2 Six point hexagonal neighborhood.
not correspond to an orthogonal system. Therefore, the hexagonal array
is favored in applications, such as the extraction of earth resources
data from satellite pictures, where the required algorithms do not rely
on orthogonal coordinates.
Golay Transforms
The Golay transform is based on the set of 14 rotationally indepen-
dent patterns of zero and one states which can occur in the surround
(neighborhood) of a basis point within a hexagonal array. As shown in
Table 1.1, page 13, each pattern is assigned a characteristic index and a
weight which indicates the number of distinct orientations of the pattern
that can be obtained by rotating the illustrated surround. The sum of
the weights is 64, the total number of possible neighborhood patterns.
Because all patterns of the same index are considered equivalent, Golay
transform operations are invariant under rotation of an image through
angles which are multiples of 60°.
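The count of 14 patterns and the weights that sum to 64 can be reproduced by grouping all six-element surrounds into classes that differ only by rotation. The sketch below does this by brute force; the tuple representation and function names are assumptions, and the classes are not numbered in Golay's index order.

```python
def rotations(pattern):
    """All six rotations of a six-element surround, as a set of tuples."""
    return {pattern[i:] + pattern[:i] for i in range(6)}


def rotation_classes():
    """Group the 64 possible surrounds into rotationally equivalent classes."""
    seen = set()
    classes = []
    for n in range(64):
        pattern = tuple((n >> i) & 1 for i in range(6))
        if pattern in seen:
            continue
        orbit = rotations(pattern)
        seen |= orbit
        classes.append(orbit)
    return classes


classes = rotation_classes()
weights = sorted(len(c) for c in classes)
```

The size of each class is the weight of the corresponding index: two classes of weight 1, one of weight 2, two of weight 3, and nine of weight 6.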
The connectivity property of a Golay hexagonal neighborhood is
uniquely defined by the state of the basis point and the index of the
surround. For example, changing the state of a basis point within a
neighborhood of index six or less cannot alter the connectivity of the
array. However, the connectivity of the array can be altered by changing
the state of a basis point with a neighborhood of index seven or greater.
A knowledge of both the state of a basis point and the index of its sur-
round is sufficient to determine the effect which a particular pattern
transform operation will have on the connectivity of a neighborhood with
basis point P, if and only if the neighbors of P are not simultaneously
transformed as a function of P. The Golay transform satisfies this
condition by dividing the hexagonal binary data array into a number of
subfields, each of which contains only non-adjacent points. Three sub-
fields are commonly used although, as shown in Figure 3.3, the array can
also be divided into four or seven subfields. All data points within a
particular subfield are processed in parallel. However, only one sub-
field is transformed at a time to avoid the potential logical conflicts.
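One way to generate a three subfield partition is sketched below. It assumes axial hexagonal coordinates (q, r), in which the six neighbors of a point are given by the offsets in `NEIGHBORS`, and assigns subfield (q - r) mod 3. The coordinate system, the rule, and the labels 0 to 2 are illustrative assumptions rather than Golay's own numbering.

```python
# Offsets to the six nearest neighbors in axial hexagonal coordinates.
NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def subfield(q, r):
    """Subfield label (0, 1, or 2) of the point at axial coordinates (q, r)."""
    return (q - r) % 3


def subfields_are_non_adjacent(size):
    """Check that no point shares a subfield with any of its six neighbors."""
    for q in range(-size, size + 1):
        for r in range(-size, size + 1):
            for dq, dr in NEIGHBORS:
                if subfield(q, r) == subfield(q + dq, r + dr):
                    return False
    return True
```

Because every neighbor offset changes q - r by 1 or 2, two adjacent points can never carry the same label, so all points of one subfield may be transformed in parallel.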
Thus far this discussion has assumed a simple Golay transform in
which the surround of each basis point of an image, k, is equivalent to
the hexagonal neighborhood of that point. The Golay transform also per-
mits more general compound operations in which the surround of a basis
point of image k can be defined as a logical function of the hexagonal
neighbors of the corresponding points in a multiplicity of other binary
images, p,q,...,u,v. The compound Golay transform can be utilized in
processing multiple gray level images which have been digitized.
The general hexagonal parallel pattern transformations [10] are
basis point operations which are performed simultaneously on all points
within one subfield of an image and then sequentially on each subfield.
Golay transforms are expressed in the form
$$M_{a_0 a_1 \ldots a_{13},\, n}: \quad k = a_{i(L_1)} \cdot L_2 \cdot L_3 + \overline{a_{i(L_1)} \cdot L_2} \cdot k \qquad (1)$$
where M stands for the general basis point operation. The $a_0$ - $a_{13}$ subscripts
of M represent the indices of the basis point's surround for
which an operation will be performed. Subscript n stands for the number
of iterations required. Each iteration involves performing the indicated
operation on each of the specified subfields in turn. When n is not
specified, the operation is performed only once. If n is replaced
1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3
1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3
1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3
1 2 3 1 2 3 1 2
 3 1 2 3 1 2 3

1 2 1 2 1 2 1 2
 3 4 3 4 3 4 3
1 2 1 2 1 2 1 2
 3 4 3 4 3 4 3
1 2 1 2 1 2 1 2
 3 4 3 4 3 4 3
1 2 1 2 1 2 1 2
 3 4 3 4 3 4 3

1 2 3 4 5 6 7 1
 4 5 6 7 1 2 3
6 7 1 2 3 4 5 6
 2 3 4 5 6 7 1
4 5 6 7 1 2 3 4
 7 1 2 3 4 5 6
2 3 4 5 6 7 1 2
 5 6 7 1 2 3 4
Figure 3.3 Hexagonal arrays of three, four, and seven subfields.
by the symbol ∞ instead of a number, the operation is performed until
the image ceases to be transformed by further iterations.
In Equation (1), k represents the binary state of the basis point
being operated on. When a simple Golay transform is being performed,
$L_1$ is a logical function of one or more of the hexagonal neighbors of
the basis point in image k. In the case of a compound Golay transform,
$L_1$ can also be a function of the hexagonal neighbors of corresponding
basis points in images p,q,...,u,v. In general, $L_1 = L_1(k,p,q,...,u,v,...)$
and $L_1 = k$ implies a simple Golay transform. The surround of the
basis point consists of the six outputs of the function $L_1$ which correspond
to hexagonal neighbors a-f in the simple Golay transforms. The
term $i(L_1)$ is the index of the surround of the basis point on which the
operation is being performed. When the index of the surround is listed
as a subscript of M, $a_{i(L_1)} = 1$; otherwise, $a_{i(L_1)} = 0$.
$L_2$ specifies which one of the three, four, or seven subfields is
currently being transformed. For an image divided into four subfields,
$L_2$ will be true for only one-fourth of the image points at any one time.
When the operation is to be performed on all subfields of an image, a
superscript of three, four, or seven may be given with M instead of
specifying $L_2$. $L_3$ can be either a control signal or a logical function
of the various images utilized in compound Golay transforms.
A number of useful Golay transforms are given in [10] and [11]. One
example is the simple skeletonizing operation defined by
$$M^{3}_{1\text{-}3,\,\infty}: \quad k = \overline{a}_{i(k)} \cdot k \qquad (2)$$
When this algorithm is applied to a binary image plane, all simple blobs
are reduced to a single logic one point, and all blobs with holes are
reduced to one or more loops consisting of a single layer of logic one
points. Figure 3.4 demonstrates the application of this algorithm. The
superscript of M specifies that the points in the binary image are
assigned to one of three subfields by the numbering scheme shown in
Figure 3.3, page 42, and that the subfields are processed sequentially
in ascending order. The first subscript of M indicates that the basis
point operation is to be performed whenever the index of the surround is
one, two, or three. Therefore, whenever i(k) = 1, 2, or 3,
$$a_{i(k)} = 1 \quad \text{and} \quad \overline{a}_{i(k)} = 0;$$
otherwise,
$$a_{i(k)} = 0 \quad \text{and} \quad \overline{a}_{i(k)} = 1.$$
As specified by the second subscript of M, the skeletonizing operation is
performed iteratively until the image becomes stable. When each subfield
is processed, points within that subfield are set to zero unless they are
currently in state one and have a surround whose index is not one, two or
three. Figure 3.4 shows only one iteration of the algorithm. However,
this simple image will not be transformed further by another iteration,
so the final skeleton is the result of the third subfield operation.
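A small software model of the skeletonizing operation is sketched below. Several things are assumptions made for illustration: the image is stored as a set of axial hexagonal coordinates (q, r) holding logic ones, the three subfields are assigned by (q - r) mod 3, and indices 1, 2, and 3 of Table 1.1 are read as surrounds with one, two, or three contiguous ones and no others. The function names are hypothetical.

```python
# Six neighbor offsets in cyclic order around the basis point.
NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def surround(points, q, r):
    """States of the six neighbors of (q, r), in cyclic order."""
    return [1 if (q + dq, r + dr) in points else 0 for dq, dr in NEIGHBORS]


def index_is_1_2_or_3(bits):
    """True for a single run of one, two, or three ones in the surround."""
    ones = sum(bits)
    changes = sum(bits[i] != bits[(i + 1) % 6] for i in range(6))
    return 1 <= ones <= 3 and changes == 2


def skeletonize(points):
    """Clear qualifying points subfield by subfield until the image is stable."""
    points = set(points)
    while True:
        before = set(points)
        for field in range(3):          # subfields in ascending order
            snapshot = set(points)      # parallel update within one subfield
            for q, r in snapshot:
                if (q - r) % 3 == field and index_is_1_2_or_3(surround(snapshot, q, r)):
                    points.discard((q, r))
        if points == before:
            return points


RING = set(NEIGHBORS)          # six points around an empty center
BLOB = RING | {(0, 0)}         # the same ring with its center filled
```

In this model the filled seven point blob shrinks to the single point (0, 0), while the ring, which is a blob with a hole, is left unchanged as a closed loop.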
Although the skeletonizing algorithm is a simple Golay transform,
its tse logic implementation must embody the general Golay transform
principles. In addition, hardware minimization is a particularly impor-
tant consideration in the initial development of tse logic circuits.
Therefore, implementation of the skeletonizing algorithm will be
ORIGINAL IMAGE
0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0
0 0 0 1 1 1 0 0
 0 0 0 1 0 1 0
0 0 0 1 0 1 0 0
 0 0 1 0 0 1 0
0 0 0 1 1 1 0 0
 0 0 0 0 0 0 0

RESULT OF FIRST SUBFIELD OPERATION
0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0
 0 0 0 1 0 1 0
0 0 0 1 0 1 0 0
 0 0 1 0 0 1 0
0 0 0 1 1 1 0 0
 0 0 0 0 0 0 0

RESULT OF SECOND SUBFIELD OPERATION
0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0
 0 0 0 1 0 1 0
0 0 0 1 0 1 0 0
 0 0 1 0 0 1 0
0 0 0 1 1 1 0 0
 0 0 0 0 0 0 0

RESULT OF THIRD SUBFIELD OPERATION
0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0
 0 0 0 1 0 1 0
0 0 0 1 0 1 0 0
 0 0 1 0 0 1 0
0 0 0 1 1 1 0 0
 0 0 0 0 0 0 0
Figure 3.4. An example of the Golay transform skeletonizing algorithm.
emphasized in the tse logic designs presented in the following chapters.
Little generality will be lost since, as demonstrated in Chapter 4, any
simple Golay transform can be implemented by a straightforward modification
of the basic tse logic skeletonizing machine.
CHAPTER 4
BASIC TSE LOGIC DESIGN CONCEPTS AND THEIR APPLICATION TO
DESIGN OF A GOLAY TRANSFORM PROCESSOR
Every switching function can be expressed in a canonical sum-of-
products form, where each expression consists of a finite number of
switching variables, constants, and the operations AND, OR, and NOT [15].
The family of electro-optical tse logic devices proposed by Schaefer
and Strong [1] is functionally complete since the AND, OR, and NOT operations
are all available. Thus, tse circuits analogous to each of
the major subdivisions of a conventional digital computer can be de-
signed. A computer organization in which arithmetic-logic unit, con-
trol unit, and memory are all constructed from tse logic devices is
conceivable. If such a computer is developed by interconnecting tse
circuits which correspond directly to the logic circuits of a conventional
computer, the tse computer will simply be a two-dimensional expansion
of the conventional computer. Each element in the n x n array
of a tse logic device becomes one component of an elemental computer
which is isomorphic to the original conventional computer [1]. This
organization, however, does not realize the full power of tse logic
since tse intercommunication devices provide the potential for a more
sophisticated design in which data can be transferred between the ele-
mental computers. A new computer architecture based on the unique
characteristics of tse logic is required.
The goal of this research is to contribute to the development of
tse computer architectures through the design of tse logic circuits
which perform the Golay transform skeletonizing algorithm. These tse
logic units will be controlled by conventional logic control units.
A two-dimensional control unit would potentially allow dissimilar
operations to be performed on various areas of an image simultaneously
and independently. This flexibility, however, is not essential and
can only be obtained with a significant increase in the number of tse
logic components. Also, before the development of a two-dimensional
control unit is inaugurated, the design of tse arithmetic-logic units
should be thoroughly understood.
This chapter introduces some basic tse logic concepts and presents
one tse logic implementation of the skeletonizing algorithm. Additions
to the tse logic family proposed by Schaefer and Strong [1] are described.
Several circuits are presented for generating Golay neighbor
planes which simplify index recognition using tse logic.
Elementary Tse Processor Control
Control of basic tse processing units can be achieved by providing
control images that consist of all ones when true and all zeros when
not true. The control image can be created by switching a light source
that illuminates each element in a fiber optic bundle. One possible
light source is an array of electroluminescent devices manufactured on
a semiconductor substrate. This source is similar to the output array
of a standard active tse logic device and is assumed to be switched by
a single CMOS compatible control line. The state of the output tse cor-
responds to the state of the control line.
Some tse processing requires control of individual data points
within the tse. A number of different techniques can be used to set
particular data points within a tse to the logic one or logic zero
level. One method of obtaining this control is to AND or OR the tse
with a tse mask which contains a predetermined binary image. The tse
mask image can be permanently stored in an electro-optical tse memory
chip which contains an array of electroluminescent devices. A standard
electroluminescent array based on the light source described earlier
could be programmed to produce any required tse mask. Typically, the
array would be programmed by specifying the final metalization pattern
used in the manufacturing process to define the power connections to
elements of the array. The same general technique is currently employed
in the production of semiconductor read-only-memories. Alternately, the
tse mask can be produced from an all logic one electroluminescent array
by placing an "opaque and clear" photographic film mask between the ar-
ray and the fiber optic image path. In cases where the only operation
required is to set particular points within the data tse to zero, a film
mask can be placed directly in the data path. Still another possibility
is to design the active tse logic devices so that selected points within
their output electroluminescent arrays can be permanently set to either
logic level during manufacture. This might be accomplished by altering
the final metalization pattern as described above or by forming the
final integrated circuit interconnections using the recently developed
dye laser micro-welder [16].
In both conventional and tse computer architectures, conditional
operations are often performed when a zero state is detected. Signals
to control these operations must be derived from a device which is ca-
pable of detecting an empty or all zero tse. The contractor and spil-
ler devices described by Schaefer and Strong [1] can be combined to
perform this function. The practical logic circuit implementations of
these devices produce long propagation delays which could seriously
limit the throughput of a tse circuit. Therefore, a new tse device
which produces an output image that is all ones if and only if the in-
put image contains at least one logic one point should be developed.
This device is referred to as a total spiller. A primary objective in
the development of a total spiller should be to minimize propagation
delay through the circuit.
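
The input-output behavior specified for the total spiller can be
sketched in software. The sketch below is only a functional model: it
assumes a tse is represented as a list of rows of 0/1 values, the name
total_spiller is illustrative, and nothing in it reflects the
propagation delay of a real device.

```python
def total_spiller(image):
    """Return an all-ones image if any input point is one, else all zeros."""
    fill = 1 if any(point for row in image for point in row) else 0
    return [[fill] * len(row) for row in image]


empty = [[0, 0, 0], [0, 0, 0]]
sparse = [[0, 0, 0], [0, 1, 0]]
print(total_spiller(empty))   # every point zero: the input tse is empty
print(total_spiller(sparse))  # every point one: at least one input point is set
```

An all-zero output therefore signals an empty tse, which is the
condition needed for the conditional operations described above.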
The timing relationships between tse control signals will depend
upon the propagation delay introduced by each tse logic device in the
circuit. At this time these propagation delays are not well defined;
however, the response time of the electroluminescent material in active
tse logic devices is expected to be the controlling factor. For this
reason, the response times for the various active devices are likely to
be approximately equal. Since some knowledge of propagation delays is
required for efficient circuit design, all active tse logic devices ex-
cept the spiller, combiner, and total spiller are assumed to introduce
a maximum of one unit propagation delay. Because the spiller, combiner,
and total spiller operations are more complex than the operation of a
typical gate, a propagation delay of two units has been arbitrarily as-
signed to these devices. Images will be carried through tse buses and
passive devices, such as sliders and interleavers, at the speed of light,
so zero propagation delay is assumed for passive devices. These assump-
tions allow relative comparisons of the performance of tse logic designs
and permit the design of conventional logic control units for tse circuits.
Although control unit timing should be modified as the characteristics
of real tse devices become known, the control techniques illustrated in
these designs will be useful in the development of future control units.
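
The delay assumptions stated above can be tabulated and summed along a
signal path. The following is a minimal sketch of that bookkeeping; the
device names used as dictionary keys and the example path are
hypothetical and do not correspond to any circuit in this report.

```python
# Unit propagation delays assumed in the text.
TWO_UNIT_DEVICES = {"spiller", "combiner", "total_spiller"}
PASSIVE_DEVICES = {"bus", "slider", "interleaver"}


def device_delay(kind):
    """Assumed worst-case delay of one device, in unit gate delays."""
    if kind in PASSIVE_DEVICES:
        return 0
    if kind in TWO_UNIT_DEVICES:
        return 2
    return 1  # every other active device


def path_delay(path):
    """Total assumed delay along a chain of devices."""
    return sum(device_delay(kind) for kind in path)


example_path = ["slider", "and", "or", "total_spiller"]  # hypothetical path
print(path_delay(example_path))  # 0 + 1 + 1 + 2 = 4
```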
Tse Memories
In the initial stages of tse logic development, integrated read-and-write
tse memories are not expected to be available because of
their complexity. Tse memory requirements will be met by using the
integrated circuit one tse read-only-memories described in the previous
section and by constructing read-and-write tse memories from standard
tse devices. The simplest read-and-write tse memory is the OR latch
illustrated in Figure 4.1. This device stores one binary image in a
feedback path which is controlled by CMOS compatible signal C. The in-
put path is controlled by a second CMOS compatible signal labeled E.
Normally, the tse ROMs switched by C and E would contain all ones; however,
special image patterns could also be used, or the control input
images could be generated by another tse circuit. When E = 1 and C = 1,
the output image, Q+, is Q + I (Q OR I). This property of the OR latch
can be used to design logic circuits in which partial operations are
performed sequentially and ORed into the tse latch to produce the final
result.
A comparable tse latch based on the AND gate is shown in Figure
4.2. The input and feedback paths are controlled by CMOS compatible
signals H and S, respectively. As specified by the control table, the
output of the AND latch is Q • I (Q AND I) when H and S are both zero.
This property of the AND latch can be used to form the final result of
a tse operation by logically ANDing the results of a set of sequential
operations.

***The images stored in each tse read-only-memory used in this
dissertation are defined in Appendix B.

Figure 4.1 One tse OR latch

CONTROL TABLE
E    C    Q+ij
0    0    0
0    1    Qij
1    0    Iij
1    1    Qij + Iij

Figure 4.2 One tse AND latch

CONTROL TABLE
H    S    Q+ij
0    0    Qij • Iij
0    1    Iij
1    0    Qij
1    1    1

Although an objective of tse logic design is to attain
high speed operation through parallelism, serial logic circuits are
often necessary for component minimization. Throughput can remain high
because of the parallel nature of the tse logic devices themselves.
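
The control tables of the OR latch and the AND latch can be reproduced
by simple point-by-point Boolean expressions. The sketch below assumes
that the control images are all ones or all zeros, so each control
signal is modeled as a single bit, and that a tse is a list of rows of
0/1 values. The expressions are chosen only because they match the
control tables; the circuits of Figures 4.1 and 4.2 may be organized
differently.

```python
def or_latch(q, i, e, c):
    """Next OR latch state, point by point: (I AND E) OR (Q AND C)."""
    return [[(iv & e) | (qv & c) for qv, iv in zip(qrow, irow)]
            for qrow, irow in zip(q, i)]


def and_latch(q, i, h, s):
    """Next AND latch state, point by point: (I OR H) AND (Q OR S)."""
    return [[(iv | h) & (qv | s) for qv, iv in zip(qrow, irow)]
            for qrow, irow in zip(q, i)]


# Accumulate partial results: load the first image, then OR in the rest.
partials = [[[1, 0], [0, 0]], [[0, 0], [0, 1]], [[0, 1], [0, 0]]]
q = or_latch([[0, 0], [0, 0]], partials[0], e=1, c=0)
for partial in partials[1:]:
    q = or_latch(q, partial, e=1, c=1)
print(q)  # [[1, 1], [0, 1]]
```

The loop illustrates the sequential use described above, in which
partial results are ORed into the latch one at a time.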
A major disadvantage of the basic latch circuits is that the output
image is destroyed before storage of the input image can be guaranteed.
In circuits where the next input to the latch depends upon the
present output of the latch, incorrect operations can result. A master-
slave memory, such as the one shown in Figure 4.3, can be used to avoid
this difficulty. The timing diagram in Figure 4.3 illustrates the control
signal sequence for storing a new tse in the master-slave memory.
Timing is based on the following conservative, worst-case assumptions
about the active tse logic devices:
     Minimum propagation delay - 0 units
     Maximum propagation delay - 1 unit
     Minimum turn-on time      - 0 units
     Maximum turn-off time     - 1 unit
The assumption of a minimum propagation delay of zero assures that cor-
rectly generated control signals will not permit race conditions to
develop in tse circuits. In tse timing diagrams, the turn-on and turn-off
timing constraints are illustrated by showing control signal state
changes as if they occur over a time span of one gate delay. Actually,
control signal state changes will occur instantaneously, and the state
of the tse logic device being controlled can change at any time up to
one gate delay later.

Figure 4.3 Master-slave tse memory. (Schematic and control sequence for
storing a new image; control signals E1, C1, E2, and C2 plotted against
time, 0 to 15 unit gate delays.)

Hexagonal-To-Rectangular Array Transformation
Standard tse logic devices utilize a rectangular array of binary
data points and are not directly compatible with the hexagonal array
employed in the Golay transforms. This difficulty can be overcome by
noting that a rectangular array will be created from the Golay hexa-
gonal digitization pattern if the data points in even numbered rows
are shifted one-half unit distance to the left. The Golay neighborhood
of each basis point, P, in the resulting rectangular array can be defined
using the knowledge of which points formed the neighborhood of P
in the original array. Subfield assignments and typical neighborhood
patterns for rectangular arrays with three, four, and seven subfields
are shown in Figure 4.4. Note that the neighborhood patterns for corresponding
points in arrays of three and seven subfields are identical.
Also note that data which is originally acquired through a hexagonal
digitization pattern can be processed using the square tse logic array
without introducing error. Data acquired through a scanning process
can be automatically stored in a square array. If the data acquisition
process is parallel, a special fiber optic bundle could be designed to
convert the hexagonal array into a square array. Alternately, one
could utilize a standard rectangular digitization pattern and merely
assign particular Golay neighbors to each basis point. For sampling
intervals which are small compared to the critical dimensions of inter-
est in the binary image, the minor errors introduced by this process
should be acceptable.
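
The three and four subfield assignments can be generated from the row
and column position alone. The sketch below follows the repeating rows
that remain legible in Figure 4.4(a) and (b), namely 1 2 3 / 3 1 2 for
three subfields and 1 2 / 3 4 for four subfields. It assumes zero-based
Python indices with the first row of the figure at index 0; the function
names are illustrative, and the seven subfield case is not attempted.

```python
def three_subfields(i, j):
    """Subfield number (1 to 3) for row index i and column index j."""
    return (j + (2 if i % 2 else 0)) % 3 + 1


def four_subfields(i, j):
    """Subfield number (1 to 4) for row index i and column index j."""
    return 2 * (i % 2) + (j % 2) + 1


for i in range(4):
    three = " ".join(str(three_subfields(i, j)) for j in range(8))
    four = " ".join(str(four_subfields(i, j)) for j in range(8))
    print(three, "   ", four)
```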

Figure 4.4 Subfield assignments and neighborhood patterns for
rectangular arrays with three, four, and seven subfields.
(a) Three Subfields: rows alternate 1 2 3 1 2 3 ... and 3 1 2 3 1 2 ...
(b) Four Subfields: rows alternate 1 2 1 2 ... and 3 4 3 4 ...
(c) Seven Subfields

Golay Neighbor Planes
Golay's modular operations are normally functions of the index of
the basis point, P. As a result, a Golay transform processor must be
capable of performing logical operations on a point $P_{i,j}$ as a function
of its coplanar Golay neighbors which, in the case of four subfields
for example, are located at positions $P_{i,j-1}$, $P_{i-1,j}$, $P_{i-1,j+1}$,
$P_{i,j+1}$, $P_{i+1,j}$, and $P_{i+1,j-1}$. In contrast, binary tse logic
operations are performed on separate image planes in parallel, and the state
of a point $X_{i,j}$ in the output array is only a function of the state of
corresponding points in position $(i,j)$ of the input image arrays. Thus, for
index recognition using tse logic, the state of each neighbor of a basis point,
$P_{i,j}$, should be available in position $(i,j)$ of a separate image plane.
Six Golay neighbor planes (GA, GB, GC, GD, GE, and GF) are required to
represent the state of each neighbor of every basis point in a tse.
A tse logic circuit which will produce the six Golay neighbor
planes for a tse divided into four subfields is shown in Figure 4.5.
Plane GA is formed by sliding the input image right one element. This
action transfers a point in position $P_{i,j-1}$ of the original image to
position Pij of plane GA. Therefore, each point in plane GA is the A
Golay neighbor of the point which occupies the same relative location
in the original tse. Similarly, plane GB contains the B Golay neigh-
bors, and so forth. The input image is assumed to have a one element
deep border of dummy zeros since a complete neighborhood cannot be de-
fined for the border elements.
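
The generation of the six neighbor planes for four subfields amounts to
six shifts of the input image. The sketch below models a slider as a
shift with zero fill and assumes a tse is a list of rows of 0/1 values
with zero-based indices. Only the slide for plane GA is stated
explicitly in the text; the other five slides are inferred from the
neighbor positions listed for four subfields, and the function names
are illustrative.

```python
def slide(image, down=0, right=0):
    """Shift an image by (down, right) elements, filling vacated points with 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            si, sj = i - down, j - right  # source point that lands on (i, j)
            if 0 <= si < rows and 0 <= sj < cols:
                out[i][j] = image[si][sj]
    return out


def neighbor_planes_four_subfields(image):
    """Six planes whose point (i, j) holds the named neighbor of P(i, j)."""
    return {
        "GA": slide(image, right=1),           # P(i, j-1)
        "GB": slide(image, down=1),            # P(i-1, j)
        "GC": slide(image, down=1, right=-1),  # P(i-1, j+1)
        "GD": slide(image, right=-1),          # P(i, j+1)
        "GE": slide(image, down=-1),           # P(i+1, j)
        "GF": slide(image, down=-1, right=1),  # P(i+1, j-1)
    }


image = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
for name, plane in neighbor_planes_four_subfields(image).items():
    print(name, plane)
```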
When a tse is divided into three or seven subfields, a somewhat
more complex Golay neighbor planes generator circuit is required because
Figure 4.5 Golay neighbor planes generator for rectangular arrays
with four subfields.
the surrounds of basis points in even and odd numbered rows are oriented
in opposite directions. The A, B, C, D, E, and F Golay neighbors of
basis point $P_{i,j}$ are $P_{i,j-1}$, $P_{i-1,j}$, $P_{i-1,j+1}$, $P_{i,j+1}$,
$P_{i+1,j+1}$, and $P_{i+1,j}$, respectively, if i is even but are
$P_{i,j-1}$, $P_{i-1,j-1}$, $P_{i-1,j}$, $P_{i,j+1}$, $P_{i+1,j}$, and
$P_{i+1,j-1}$, respectively, if i is odd. Since $P_{i,j-1}$ is neighbor
A in each case, plane GA can be created by sliding the original tse
right one element. However, sliding the input tse down one element in
an attempt to create plane GB actually results in an image which contains
B Golay neighbors in even rows and C Golay neighbors in odd rows.
To obtain a tse with B Golay neighbors in odd rows, one can slide the
original image down one element and to the right one element. By com-
bining the odd rows of this tse with the even rows of the previous
image, a single tse which corresponds to Golay neighbor plane GB can
be obtained. This process is illustrated in Figure 4.6 which is a
schematic for one possible Golay neighbor planes generator circuit for
images divided into three or seven subfields. As shown in Appendix B,
the one tse read-only-memory labeled ME consists of all ones in odd
rows and all zeros in even rows. Mask MO is the complement of ME. The
Type 1 Golay neighbor planes generator circuit requires 38 active and
29 passive tse logic devices. Based on the standard assumption of one
unit gate delay for each active tse logic device, the complete circuit
has a propagation delay of six gate delays.
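
The construction of plane GB described above can be written as two
slides combined through the row masks ME and MO. The sketch below
assumes that rows are numbered from one at the top, so Python row index
0 is an odd row; this numbering, the zero fill at the edges, and the
function names are assumptions made for illustration only.

```python
def slide(image, down=0, right=0):
    """Shift an image by (down, right) elements, filling vacated points with 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            si, sj = i - down, j - right
            if 0 <= si < rows and 0 <= sj < cols:
                out[i][j] = image[si][sj]
    return out


def row_mask(rows, cols, odd_rows):
    """All ones in odd-numbered rows if odd_rows is true, else in even rows."""
    return [[1 if ((i + 1) % 2 == 1) == odd_rows else 0 for _ in range(cols)]
            for i in range(rows)]


def plane_gb(image):
    """Plane GB for three or seven subfields, built with masks ME and MO."""
    rows, cols = len(image), len(image[0])
    me = row_mask(rows, cols, odd_rows=True)   # ME: ones in odd rows
    mo = row_mask(rows, cols, odd_rows=False)  # MO: complement of ME
    even_part = slide(image, down=1)           # B neighbors in even rows
    odd_part = slide(image, down=1, right=1)   # B neighbors in odd rows
    return [[(even_part[i][j] & mo[i][j]) | (odd_part[i][j] & me[i][j])
             for j in range(cols)] for i in range(rows)]


image = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(plane_gb(image))
```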
Figure 4.7 illustrates the advantages which can be obtained by re-
placing active tse read-only-memory masks with film masks inserted
directly in the signal path. The boxes labeled FE and FO represent
photographic film masks which contain the same patterns as the active
tse masks ME and MO shown in Figure 4.6. An opaque area of the film
Figure 4.6 Type 1 Golay neighbor planes generator for rectangular arrays with three or
seven subfields.
Figure 4.7 Type 2 Golay neighbor planes generator for rectangular
arrays with three or seven subfields.
forces the corresponding tse data point to zero by blocking the light.
Clear areas of the film have no effect on their corresponding tse data
points. This implementation of the Golay neighbor planes generator
circuit requires 22 active and 21 passive tse devices. Circuit propa-
gation delay is five gate delays, one less than for the Type 1 circuit.
The technique of programming selected points in the output array of an
active tse device to one or the other logic level could be used to im-
plement the Type 2 Golay neighbor planes generator without photographic
masks.
The Golay neighbor planes generator circuit can be simplified even
further by using the proposed new tse logic device illustrated in Figure
4.8. An EXCHANGE gate is a passive tse intercommunication device with
two tse inputs (A and B) and two tse outputs (A and B). The EXCHANGE
gate is constructed from optical fibers using the same techniques employed
in the construction of tse interleavers and image buses. Output
image path A contains the fibers which carry the data from the odd
rows of the image A input and the fibers which carry the data from the
even rows of the image B input. The B output path contains fibers from
the even rows of A and the odd rows of B. Thus, the EXCHANGE gate per-
forms the operation of exchanging the data in the alternate rows of
two images. This is one of the operations required to generate the
Golay neighbor planes for tses divided into three or seven subfields.
A schematic for the equivalent circuit of the row EXCHANGE gate is presented
in Figure 4.9. Since the tse array is square, a similar column
exchange operation can be obtained by rotating the EXCHANGE gate fiber
optic array 90° with respect to the image buses interfaced to the inputs
and outputs of the device. The EXCHANGE gate illustrates that
special fiber optic arrays can be constructed to perform certain tse
logic functions that would normally require several active devices and
tse masks. The special tse logic devices reduce circuit power consumption
and propagation delay.

Figure 4.8 (Caption illegible in the source; the figure illustrates the EXCHANGE gate.)
Figure 4.9 Equivalent circuit for an EXCHANGE gate.
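
The row exchange performed by the EXCHANGE gate can be stated compactly
in software. The sketch below assumes rows are numbered from one at the
top, so Python row index 0 is an odd row, and that both inputs have the
same dimensions; the function name exchange is illustrative.

```python
def exchange(a, b):
    """Return (out_a, out_b) with the even-numbered rows of a and b swapped."""
    out_a, out_b = [], []
    for i, (arow, brow) in enumerate(zip(a, b)):
        if (i + 1) % 2 == 1:      # odd row: each output keeps its own input
            out_a.append(list(arow))
            out_b.append(list(brow))
        else:                     # even row: the two inputs trade places
            out_a.append(list(brow))
            out_b.append(list(arow))
    return out_a, out_b


ones = [[1, 1], [1, 1], [1, 1]]
zeros = [[0, 0], [0, 0], [0, 0]]
out_a, out_b = exchange(ones, zeros)
print(out_a)  # [[1, 1], [0, 0], [1, 1]]
print(out_b)  # [[0, 0], [1, 1], [0, 0]]
```

Because the operation is a fixed routing of fibers, it involves no
active device, which is consistent with the zero delay assumed for
passive devices.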
EXCHANGE gates are employed in the third type of Golay neighbor
planes generator circuit shown in Figure 4.10. Only one output of each
EXCHANGE gate is used. This implementation of the Golay neighbor
planes generator requires only 18 active and 21 passive tse logic devices.
Propagation delay is reduced to four gate delays by the passive
nature of the EXCHANGE gates. A comparison of the three types of Golay
neighbor planes generator circuits is provided in Table 4.1.
Index Recognition
With the neighborhood points for each element of an image avail-
able in the six Golay neighbor planes, the task of designing circuits
to perform any binary function of a basis point and the basis surround
reduces to a relatively standard logic design problem. Tse logic circuits
can be designed to recognize any surround index or combination
of indices. An index recognition circuit can be a sequential circuit
or a pure combinational circuit. The major distinction between the
conventional logic design procedure and the required tse logic design
procedure is that many of the traditional minimization procedures do
not apply to tse logic design because of the fan-in and fan-out restric-
tions. An important similarity between conventional and tse logic
circuit designs is that a trade off generally exists between operating
speed and circuit complexity. This fact is demonstrated by the index
recognition circuits presented below.

Figure 4.10 Type 3 Golay neighbor planes generator for rectangular
arrays with three or seven subfields.

TABLE 4.1
CHARACTERISTICS OF THE GOLAY NEIGHBOR PLANES GENERATOR
CIRCUITS FOR RECTANGULAR ARRAYS OF THREE
OR SEVEN SUBFIELDS

Circuit   Figure   Number of         Number of        Propagation Delay in
Type      Number   Passive Devices   Active Devices   Standard Gate Delays
Type 1    4.6      29                38               6
Type 2    4.7      21                22               5
Type 3    4.10     21                18               4

The Golay transform skeletonizing algorithm requires recognition
of basis points which have a surround index of one, two, or three.
Figure 4.11 shows the schematic of a combinational circuit which pro-
duces an output tse with zeros marking the positions of basis points
that have an index of one, two, or three. Each cell in the space iter-
ative circuit recognizes one of the six rotationally equivalent orienta-
tions of a surround with index one, two, or three. Six cells are re-
quired to recognize all of the possible orientations of the surrounds.
This index recognition circuit has a propagation delay of only ten gate
delays but requires 125 active and 77 passive tse logic devices.

Figure 4.11 Space iterative combinational index recognition circuit for indices one, two, and three.
Figure 4.12 Time iterative, comparison type index recognition circuit.

Figure 4.12 illustrates a time iterative index recognition circuit
which determines the index by comparing the input tse to tse masks.
Each Golay neighbor plane is EXCLUSIVE-ORed with a tse mask to produce
a tse with logic one points marking positions where the neighbor plane
and the mask have complementary values. These six planes are ANDed
together to form the input to the tse latch. The tse latch input contains
logic one points only in positions where none of the corresponding
Golay neighbor plane points match their corresponding masks. If all
of the input masks are turned on, the tse latch input image will contain
logic one points in the positions of basis points with a surround
of index zero. Each of the 64 possible combinations of input mask
states corresponds to one orientation of one of the 14 possible indices.
All basis points with a particular surround index can be recognized by
sequentially checking for every orientation of that surround and ORing
the partial results into the tse latch. The timing diagram shown in
Figure 4.13 illustrates the control sequence required to recognize an
index with a weight of three. To perform the skeletonizing algorithm,
basis points with index one, two, or three must be recognized. This
procedure takes 215 unit gate delays but requires only 61 active and
33 passive tse logic devices (Table 4.2). An important feature of this
index recognition circuit is that any index or combination of indices
can be recognized. With the addition of a NEGATE operation, recognizing
that the index of a basis point is not an element of a specified set
A is equivalent to identifying the index as an element of set B where
the union of A and B is the set of all 14 indices. The upper bound on
the time required to recognize all basis points with any given set of
indices is established by the time required to identify half of the 64
possible surround orientations. Thus, in the worst case, 383 unit gate
delays are required to identify all basis points with indices from a
particular set.
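
The comparison procedure can be sketched as a loop over surround
orientations. In the sketch below each neighbor plane is a list of rows
of 0/1 values, a surround is a 6-tuple of neighbor states in the order
A through F, and each mask is modeled as a single bit that is one when
the mask is turned on. The function names and the tuple representation
are assumptions made for illustration; the surround patterns that
belong to each Golay index are not defined here and must be supplied by
the caller.

```python
from itertools import product


def rotations(pattern):
    """Distinct rotations of a 6-tuple of neighbor states (A, B, C, D, E, F)."""
    return sorted({pattern[k:] + pattern[:k] for k in range(6)})


def match_orientation(planes, pattern):
    """Image with ones where the six neighbor planes equal `pattern`.

    Each plane is XORed with a mask holding the complement of the wanted
    state, and the six results are ANDed together.
    """
    rows, cols = len(planes[0]), len(planes[0][0])
    out = [[1] * cols for _ in range(rows)]
    for plane, wanted in zip(planes, pattern):
        mask = 1 - wanted  # a mask that is on selects neighbors in state zero
        for i in range(rows):
            for j in range(cols):
                out[i][j] &= plane[i][j] ^ mask
    return out


def recognize(planes, base_patterns):
    """OR the matches for every orientation of every base pattern into a latch."""
    rows, cols = len(planes[0]), len(planes[0][0])
    latch = [[0] * cols for _ in range(rows)]
    for base in base_patterns:
        for orientation in rotations(base):
            hit = match_orientation(planes, orientation)
            latch = [[latch[i][j] | hit[i][j] for j in range(cols)]
                     for i in range(rows)]
    return latch


# Grouping the 64 neighbor combinations by rotation gives the 14 classes.
classes = {min(rotations(p)) for p in product((0, 1), repeat=6)}
print(len(classes))  # 14
```

The final lines confirm the count quoted above: the 64 combinations of
six neighbor states fall into 14 classes under rotation.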
Golay Function
Once the index recognition circuits have been designed, a Golay
function circuit can be developed to perform the logical operations
specified by a particular Golay transform. The Golay function circuit
for performing the swelling operation [10] defined by the symbol
M3- 5,M[k = a i(k) + a i(k) • k]	 (5)
is illustrated in Figure 4.14. As shown in Figure 4.15, the skeleton-
izing algorithm requires an even simpler Golay function circuit.
Similar Golay function blocks can be designed for any Golay transform.

Figure 4.13 Timing diagram for recognizing an index with a weight of three using the
comparison type index recognition circuit. (Control signals plotted
against time, 0 to 35 unit gate delays.)

TABLE 4.2
CHARACTERISTICS OF THE SPACE AND TIME ITERATIVE
INDEX RECOGNITION CIRCUITS

                                                        Time Required to Recognize
Circuit           Figure   Number of        Number of         Indices 1, 2, and 3
Type              Number   Active Devices   Passive Devices   (Unit Gate Delays)
Space Iterative   4.11     125              77                10
Time Iterative    4.12     61               33                215

Figure 4.14 Golay function circuit for a swelling operation. (Inputs: layer
input image and index recognition image, with basis points of indices 3-5
identified by 1's; output: Golay function image.)
Figure 4.15 Golay function circuit for a skeletonizing operation. (Inputs:
layer input image K and index recognition image ID, with basis points of
indices 1-3 identified by 1's; output: Golay function image F.)

Note that although the Golay function is performed simultaneously on
all points of the layer input image, the results at any one time are
only valid for one subfield within the image. The Golay function must
normally be performed sequentially on each subfield of the image. A
hardwired tse logic circuit which illustrates the procedure for performing
a Golay transform is presented in the next section.
Implementation of the Skeletonizing Algorithm
The task involved in the Golay transform skeletonizing algorithm,
M1-3,[k = a i(k) • k]	 (6)
is to shrink all simple blobs in the binary image plane, K, to a single
one, while reducing blobs with holes to one or more loops consisting
of a single layer of ones. During an iteration through the algorithm,
the specified modular operation (Golay function) is performed in parallel
on the points within each of the three subfields in sequence.
As each subfield is processed, the basis points within that subfield which
are in state one and have a surround index not equal to one, two, or
three are allowed to remain in state one while all other points in that
subfield are set to zero. The process is complete when further itera-
tions produce no change in the output image.
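
The iteration just described can be sketched as a conventional
sequential program. The sketch below is not a model of the tse circuit
and rests on several assumptions that are not defined in this section:
rows are numbered from one at the top (Python row index 0 is an odd
row), the three subfields follow the 1 2 3 / 3 1 2 row pattern of Figure
4.4(a) and are numbered 0 to 2 here, and indices one, two, and three are
taken to be the surrounds with one, two adjacent, or three adjacent
neighbors in state one. Points outside the image are read as zero.

```python
def neighbors(image, i, j):
    """States of Golay neighbors A..F of point (i, j) for three subfields."""
    if (i + 1) % 2 == 0:  # even-numbered row
        offsets = [(0, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0)]
    else:                 # odd-numbered row
        offsets = [(0, -1), (-1, -1), (-1, 0), (0, 1), (1, 0), (1, -1)]
    rows, cols = len(image), len(image[0])
    return tuple(
        image[i + di][j + dj]
        if 0 <= i + di < rows and 0 <= j + dj < cols else 0
        for di, dj in offsets
    )


def subfield(i, j):
    """Subfield (0 to 2) of a point; no two Golay neighbors share a subfield."""
    return (j + (2 if i % 2 else 0)) % 3


def removable(surround):
    """ASSUMPTION: index 1, 2, or 3 means one to three adjacent ones."""
    ones = sum(surround)
    changes = sum(surround[k] != surround[k - 1] for k in range(6))
    return ones in (1, 2, 3) and changes == 2  # a single contiguous arc


def iterate(image):
    """One iteration: the three subfields are processed in sequence."""
    image = [row[:] for row in image]
    for field in range(3):
        hits = [(i, j) for i, row in enumerate(image) for j, v in enumerate(row)
                if v and subfield(i, j) == field
                and removable(neighbors(image, i, j))]
        for i, j in hits:  # all points of one subfield change together
            image[i][j] = 0
    return image


def skeletonize(image):
    """Repeat iterations until the image no longer changes."""
    while True:
        nxt = iterate(image)
        if nxt == image:
            return image
        image = nxt


blob = [[0] * 5 for _ in range(5)]
for i, j in [(1, 1), (1, 2), (2, 2)]:
    blob[i][j] = 1
print(sum(map(sum, skeletonize(blob))))  # 1: the blob shrinks to a single one
```

Processing one subfield at a time is what prevents two neighboring
points from being removed in the same step, so a simple blob is reduced
to a single one rather than erased.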
A block diagram of the tse logic implementation of the skeleton-
izing algorithm is presented in Figure 4.16, and a schematic for the
circuit is illustrated in Figure 4.17. This particular design is
intended for high speed processing and, therefore, utilizes the space
iterative, combinational index recognition circuit. The timing diagram
Figure 4.16 Block diagram of a tse logic implementation of the skeletonizing algorithm.
(Blocks: layer input logic, latch A, Golay neighbor planes generator with
outputs GA through GF, index recognition logic, Golay function logic,
subfield multiplexing circuit, and layer output.)
Figure 4.17 Schematic of a hardwired skeletonizing machine.
provided in Figure 4.18 shows the sequence of control signals required
by the skeletonizing machine.
The operation of the skeletonizing machine can be explained by
following an image, K, through one iteration of the skeletonizing al-
gorithm. Assume that the binary image labeled K in Figure 4.19 is
available at the layer input of the skeletonizing machine. Upon com-
pletion of the current task, the layer input logic will automatically
gate (t = 0) the new image plane K into latches A and A' for temporary
storage. Golay neighbor planes are generated using EXCHANGE gates
(t = 18), and basis points with a surround whose index is not one, two,
or three are identified by the index recognition circuit. At t = 28
the output of the index recognition circuit is ANDed with image K to
obtain the Golay function output image, F(t = 29). Although the entire
image is being processed, only the first subfield results are valid.
Image F propagates unaltered through the subfield multiplexing circuit
and the layer input logic to the input of latch A. Subfield one of
image F' is then ANDed into latch A (t = 48) by manipulating the tse
mask control lines so that only the first subfield portion of the reg-
ister A input control image changes. Latch A now contains subfields
two and three of the original image, K, and a new subfield on which
the Golay modular operation has been performed. The first subfield
operation is now complete. The image created by the first subfield
operation is allowed to propagate back through the circuits described
above with the result that the second subfield is processed and ANDed
back into latch A(t = 82). After the completion of the second subfield
operation, the resultant binary image is again processed by the Golay
neighbor planes generator (t = 86), the index recognition circuit
Figure 4.18 Timing diagram for one iteration of the
skeletonizing algorithm. (Control signals plotted against time, with the
axis marked at 0, 50, and 100 unit gate delays, across the first, second,
and third subfield operations.)
Figure 4.19 Images formed during one iteration of the skeletonizing algorithm.
(Panels show the original image K, Golay neighbor plane GA, the index
recognition output, the Golay function output F, the register A output QA,
the subfield multiplexing circuit output, and the EXCLUSIVE-OR output at
times from t = 0 to t = 126.)
(t = 96), and the Golay function circuit (t = 97). Since the third
subfield operation marks the completion of one iteration, the result-
ant subfield is not ANDed directly back into latch A. Instead, the
third subfield of image F is combined with subfields one and two from
image latch A by the subfield multiplexing circuit to produce tse
F'(t = 101). Image F' is EXCLUSIVE-ORed (t = 106) with the original
image K which has been stored in latch A. The resultant image contains
a one in the position of any basis point which changed state as
a result of a modular operation performed during the current iteration.
If no points have been altered, image F' is the Golay skeleton of the
original image K and processing is complete. The total spiller produces
a control tse which allows a new image, labeled J, to be accepted
for processing and causes the layer output-true image to become all
ones. However, if any basis point was altered during the current
iteration, another iteration is required. In that case, the control
image produced by the total spiller allows image F' to be stored in
latches A and A' (t = 126). Processing then continues as described
above until the skeleton of the image is obtained.
Using this skeletonizing machine, each iteration of the skeletonizing
algorithm takes 112 unit gate delays. The total image processing
time is, of course, dependent upon the number of iterations required
to obtain the skeleton of a particular image. Processing time is not
a function of the size of the tse array. A total of 208 active and
134 passive tse components are required for this implementation of the
skeletonizing operation. Of these, 116 active and 58 passive compo-
nents are required because of the fan-in and fan-out limitations of
the electro-optical tse logic devices. The number of basic tse logic
components required to perform the skeletonizing algorithm can be
reduced to 196 active and 91 passive devices by using the comparison
type index recognition circuit and a negator. The time for each iter-
ation is increased to 679 unit gate delays.
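
The speed penalty of the comparison type design can be put in
perspective with a first-order estimate of total processing time. The
sketch below assumes that iterations simply follow one another with no
additional overhead, which the text does not state explicitly; the
dictionary keys and function name are illustrative.

```python
# Unit gate delays per iteration quoted in the text for the two designs.
DELAYS_PER_ITERATION = {"space_iterative": 112, "comparison_type": 679}


def processing_time(iterations, machine):
    """Estimated total time in unit gate delays, independent of array size."""
    return iterations * DELAYS_PER_ITERATION[machine]


slowdown = (DELAYS_PER_ITERATION["comparison_type"]
            / DELAYS_PER_ITERATION["space_iterative"])
print(processing_time(5, "space_iterative"))   # 560
print(processing_time(5, "comparison_type"))   # 3395
print(round(slowdown, 2))                      # 6.06
```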
Evaluation of Skeletonizing Machines
Adequate evaluation and comparison of skeletonizing machines re-
quires a set of performance parameters. Several performance charac-
teristics of the skeletonizing machine described in this chapter are
given in Table 4.3. A hardware cost function defined by
     Hardware Cost = Ax + By + Cz

     where x = number of active tse devices,
           y = number of passive tse devices,
           z = number of unit lengths of fiber optic bundle,
           and the constants A, B, and C are weighting factors which
           are determined from the actual size and price of the
           components,

is proposed for comparison of various machines. This cost function
provides a more complete representation of circuit size and power con-
sumption than a total component count. The distinction between active
and passive tse devices is maintained because active devices consume
power without contributing significant weight or bulk, whereas the
passive devices consume no power but contribute substantially to cir-
cuit weight and bulk. The interconnecting fiber bundles are also
important because of their weight and bulk. The length of fiber bundles
required for a particular circuit cannot be accurately determined at
this time, and, therefore, is not included in Table 4.3.

TABLE 4.3
BASIC HARDWIRED SKELETONIZING MACHINE PERFORMANCE CHARACTERISTICS

Cost function                                   208A + 134B
Number of control signals                       10
Total component count                           342
Gate delays per iteration (time in seconds)     112 (0.56)
Data rate, simple images per minute             107.14
Average power consumption in watts              546
Peak power consumption in watts                 609
Speed-power product in watt-seconds             339.36

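The cost function can be evaluated mechanically once weighting factors are chosen. The following is a minimal sketch, not part of the original report: the class name, the function name, and the numeric weights are illustrative assumptions, and the fiber term defaults to zero because fiber lengths are not tabulated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareCount:
    """Device counts for one tse circuit (names are illustrative)."""
    active: int           # x: number of active tse devices
    passive: int          # y: number of passive tse devices
    fiber_units: int = 0  # z: unit lengths of fiber bundle (not tabulated)


def hardware_cost(count, a, b, c=0.0):
    """Evaluate Hardware Cost = A*x + B*y + C*z for the given weights."""
    return a * count.active + b * count.passive + c * count.fiber_units


# Cost function 208A + 134B as listed in Table 4.3.
basic_machine = HardwareCount(active=208, passive=134)

# With A = B = 1 the cost function reduces to a plain component count.
component_count = hardware_cost(basic_machine, a=1, b=1)

# Hypothetical weights, chosen only to show that active and passive
# devices can be penalized differently.
weighted_cost = hardware_cost(basic_machine, a=3.0, b=1.0)
```

Setting A = B = 1 reproduces the total component count, which is why the weighted form carries more information than the count alone.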
The relative speed of various skeletonizing machines can be most
accurately determined by considering the average rate at which simple
images can be processed. A simple image is defined as any image whose
skeleton is itself. Only one iteration is required to process a simple
image. The simple image processing operation represents the most
basic task which includes all operations required in the general skele-
tonizing algorithm. Note, however, that the time required to process
a typical image will be much greater than the time required to process
a simple image.
Both peak and average power consumption data are provided in Table
4.3. Peak power consumption provides an indication of the relative
cost and size of the power supplies required for the circuit. Average
power consumption is important for space applications and in battery
powered equipment where the total available energy is limited. A gen-
erally accepted figure of merit which includes consideration of machine
speed and power consumption is provided by the speed-power product.
The speed-power product listed in Table 4.3 reveals the energy which
must be expended to perform one skeletonizing operation. The time and
power consumption figures given in Table 4.3 are based on the five
millisecond propagation delay and three watt power consumption pro-
jected by NASA [1] for early prototype tse devices.
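The time and data rate entries follow directly from the gate delay count and the five millisecond unit delay quoted above. The sketch below shows that arithmetic; the function names are illustrative assumptions, and only the conversion stated in the text is modeled.

```python
UNIT_GATE_DELAY_S = 0.005  # five millisecond propagation delay from the text


def iteration_time_s(gate_delays):
    """Time for one iteration, in seconds."""
    return gate_delays * UNIT_GATE_DELAY_S


def simple_images_per_minute(gate_delays):
    """Data rate for simple images, which need exactly one iteration."""
    return 60.0 / iteration_time_s(gate_delays)


# 112 unit gate delays per iteration for the basic hardwired machine.
basic_time = iteration_time_s(112)
basic_rate = simple_images_per_minute(112)
```

For 112 unit gate delays this gives 0.56 seconds per iteration and about 107.14 simple images per minute, in agreement with Table 4.3.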
CHAPTER 5
IMPLEMENTATION OF THE SKELETONIZING ALGORITHM USING
TSE LOGIC DEVICES WITH A DISABLE INPUT
The tse logic control technique described in Chapter 4 permits
conventional logic control of tse circuit operations while requiring the
CMOS compatible output disable signal to be provided for only one basic
tse device, the electroluminescent array read-only-memory. Each control
signal requires one passive and two active tse devices (Figure 5.1) to
complete the interface with the tse circuit. One unit gate delay is also
added to the tse data path. Two improved skeletonizing machines and a
technique for reducing the number of tse devices required for the control
signal interface are presented in this chapter. In addition, a conven-
tional logic organization for the control units of dedicated tse logic
circuits is developed, and several index recognition circuits which illus-
trate the trade off between circuit complexity and operating speed are
characterized.
Improved Conventional Logic Control Signal
Interface to Tse Circuits
A majority of the control signals applied to tse logic circuits are
used to disable image propagation through selected data paths. This goal
could also be accomplished by deactivating the output electroluminescent
array of the active tse logic devices from which the image is
originating. A CMOS compatible control line which activates the array
when in the logic one state and deactivates the array when in the logic
zero state can be provided for each active tse logic device. Figure 5.2
illustrates the hardware cost reduction obtained when a tse OR latch is
constructed using this control technique. Note that control signal E is
applied to the active tse device which creates the input image. The
hardware cost function for the improved one tse OR latch is 3A + 2B
compared to a hardware cost of 7A + 4B for the basic one tse OR latch
(Figure 4.1, page 52). Circuit operating speed is also improved by this
control technique. One disadvantage of the control technique is that the
complexity of the integrated circuit, active tse logic devices is
increased. This disadvantage should be offset by the reduced tse logic
power consumption which will be obtained when the electroluminescent
array is deactivated. The power consumption of a deactivated tse device
could be reduced to essentially zero by allowing the control signal to
power-down the logic circuit portion of the tse device as well as the
electroluminescent output array.

[Figure 5.1 Basic conventional logic control signal interface to tse
circuits. Labeled elements: tse data path input, one tse ROM,
conventional logic control line, tse logic gate G (AND, OR,
EXCLUSIVE-OR, etc.), tse data path output.]

An improved one tse AND latch which uses the proposed control
technique is illustrated in Figure 5.3. Note that control signal S is
interfaced to the feedback tse data path using the same technique
employed in the basic one tse AND latch (Figure 4.2, page 53). This
control signal interface could be simplified by assuming that some
active tse logic devices are available which are enabled for normal
operation or forced to produce an all logic one output tse depending
upon the state of a control line. An increase in the complexity of the
active tse logic devices would be required. Since this increase will
not be associated with additional advantages such as reduced power
consumption, the assumption that this class of tse logic devices will
be available will not be made at this time. After additional experience
has been gained in the design of tse logic circuits and in the
fabrication of tse devices, a decision can be made concerning the
development of tse devices with this capability.

[Figure 5.2 Improved one tse OR latch.]

     CONTROL TABLE
     E    C    Qij
     0    0    0
     0    1    Qij
     1    0    Iij
     1    1    Qij + Iij

[Figure 5.3 Improved one tse AND latch.]

     CONTROL TABLE
     H    S    Qij
     0    0    Qij · Iij
     0    1    Iij
     1    0    Qij
     1    1    1

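The control tables of Figures 5.2 and 5.3 can be read as next-state functions. The sketch below is one such reading, assuming an image is packed into an integer bit mask with one bit per point; the function names and the bit mask representation are illustrative and do not describe the electro-optical hardware.

```python
def or_latch_next(q, i, e, c):
    """Improved OR latch: E gates the input image I, C gates the stored
    image Q, and the two gated images are ORed together."""
    gated_input = i if e else 0
    gated_feedback = q if c else 0
    return gated_input | gated_feedback


def and_latch_next(q, i, h, s, width):
    """Improved AND latch: H forces the input path to all ones, S forces
    the feedback path to all ones, and the two paths are ANDed."""
    ones = (1 << width) - 1
    input_path = ones if h else i
    feedback_path = ones if s else q
    return input_path & feedback_path


# Four-point example images.
Q, I = 0b1100, 0b1010
or_rows = [or_latch_next(Q, I, e, c) for e in (0, 1) for c in (0, 1)]
and_rows = [and_latch_next(Q, I, h, s, 4) for h in (0, 1) for s in (0, 1)]
```

Each list enumerates the four control combinations in table order, so `or_rows` holds 0, Q, I, Q + I and `and_rows` holds Q · I, I, Q, all ones.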
Some Additional Improved Tse Memories
The circuit schematic and timing diagram for an improved one tse
master-slave memory is presented in Figure 5.4. Only ten unit gate
delays are required to store a new image in this memory whereas 15 unit
delays are required to store an image in the basic master-slave tse mem-
ory (Figure 4.3, page 55). The cost function of the improved memory is
6A + 4B compared to a cost function of 14A + 8B for the basic master-slave
memory.
Master-slave tse memories can be chained to create tse shift regis-
ters of any desired length. An example is the six tse circular right
shift register illustrated in Figure 5.5. Nine active and six passive
tse devices are required in each section of this register for a total
cost function of 54A + 36B. Figure 5.6 illustrates the typical control
sequences for this shift register. A functionally equivalent shift reg-
ister constructed to use the elementary control technique would require
21 active and 12 passive tse devices in each section for a total hardware
cost of 126A + 72B. The hardware cost of this circuit is reduced by more
than 50 percent by adding control lines to the active tse devices.
An alternate implementation of the six tse, circular right shift,
parallel-input, parallel-output shift register is shown in Figure 5.7.
This design does not exhibit master-slave capability on the parallel
inputs; however, as illustrated in Figure 5.8, this design does have the
advantage of parallel loading in four unit gate delays rather than the
nine unit gate delays required by the true master-slave design. The same
number of tse components are required to construct either circuit.

[Figure 5.4 Improved master-slave tse memory. Schematic and control
sequence for storing a new image; control signals E1, C1, E2, C2; time
axis 0 to 10 unit gate delays.]

[Figure 5.5 Six tse circular right shift parallel-input,
parallel-output master-slave shift register.]

[Figure 5.6 Control signal timing diagram for the parallel-input,
parallel-output master-slave tse shift register. Signals PL, SR, E1,
C1, E2, C2, and OUTPUT; a parallel load followed by a shift right; time
axis 0 to 35 unit gate delays.]

[Figure 5.7 Six tse circular right shift, parallel-input,
parallel-output shift register.]

[Figure 5.8 Control signal timing diagram for the parallel-input,
parallel-output tse shift register. Signals PL, SR, E1, C1, E2, C2, and
OUTPUT; a parallel load followed by two shift right operations; time
axis 0 to 30 unit gate delays.]

Improved Index Recognition Circuits
As demonstrated in Chapter 4, index recognition circuit character-
istics are important factors in determining the hardware cost and data
rate of a tse logic implementation of the skeletonizing algorithm. The
space iterative index recognition circuit (Figure 4.11, page 70) requires
only ten unit gate delays to recognize basis points with a surround of
index one, two, or three but has a hardware cost function of 125A + 77B.
A number of index recognition circuits which offer a range of choices in
the trade-off between circuit complexity and operating speed are desir-
able since component minimization is an important goal in the development
of tse logic circuits. The basic technique for reducing the amount of
hardware required to implement an index recognition circuit is to design
the circuit to operate in a time sequential mode.
Figure 5.9 shows an index recognition circuit which identifies basis
points with a surround of index one, two, or three by checking for one
possible orientation of each of the three indices at a time. The order
of the Golay neighbor plane inputs to the combinational logic portion of
the circuit is rotated by the multiplexer circuit so that basis points
with the correct index will be identified regardless of the orientation
of their surround. The partial result obtained after each rotation is
stored in the OR latch. As demonstrated by the timing diagram in Figure
5.10, 75 unit delays are required to recognize all three indices using
this circuit. The cost function for the multiplexed index recognition
circuit is 104A + 69B. Thus, this circuit requires 21 fewer active
devices and eight fewer passive devices than the space iterative index
recognition circuit. The multiplexed index recognition circuit is not
practical for implementation using basic tse devices which lack a single
line disable input since a minimum of 38 control line interfaces would be
required. This would increase the circuit cost function to 180A + 107B
which is 42 percent more components than the high performance space
iterative index recognition circuit (Figure 4.11, page 70) requires.

[Figure 5.9 Multiplexed index recognition circuit.]

[Figure 5.10 Timing diagram for the multiplexed index recognition
circuit. Time axis 0 to 70 unit gate delays.]

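The time sequential principle (test one orientation per step, rotate the neighbor plane order, and accumulate partial results in an OR latch) can be sketched in software. In the sketch below each neighbor plane is an integer bit mask with one bit per basis point. The three templates are placeholders chosen for illustration and are not claimed to be the Golay definitions of index one, two, and three; all names are assumptions.

```python
# Placeholder surround templates, one orientation each (NOT the report's
# definitions): a 1 requires a neighbor in that plane, a 0 forbids one.
TEMPLATES = [
    (1, 0, 0, 0, 0, 0),
    (1, 1, 0, 0, 0, 0),
    (1, 1, 1, 0, 0, 0),
]


def recognize(planes, templates, width):
    """Return a bit mask of points whose six-neighbor surround matches
    any template in any of the six rotations."""
    mask = (1 << width) - 1
    latch = 0  # plays the role of the OR latch holding partial results
    for shift in range(6):
        # Rotate the order in which the planes reach the matching logic.
        rotated = planes[shift:] + planes[:shift]
        for template in templates:
            match = mask
            for plane, required in zip(rotated, template):
                match &= plane if required else (~plane & mask)
            latch |= match
    return latch


# Three points: point 0 has one neighbor (plane 3), point 1 has two
# adjacent neighbors (planes 5 and 0), point 2 has two opposite
# neighbors (planes 0 and 3), which matches no placeholder template.
planes = [0b110, 0, 0, 0b101, 0, 0b010]
recognized = recognize(planes, TEMPLATES, width=3)
```

The combinational matching logic is evaluated six times rather than replicated six times, which mirrors the hardware saving at the expense of additional delay.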
An alternate technique for rotating the Golay neighbor planes is
illustrated by the shift register based index recognition circuit
presented in Figure 5.11. This circuit can recognize all basis points
with a surround of index one, two, or three in 82 unit gate delays
(Figure 5.12). The cost function for the shift register based index
recognition circuit is 68A + 45B. Thus, a 34 percent savings in tse
components can be realized with only a nine percent increase in
propagation delay by rotating the Golay neighbor planes with a shift
register rather than a multiplexer. The shift register type index
recognition circuit can be realized using the basic control technique
(Figure 5.1, page 87) at a hardware cost of 140A + 81B. Such a design
would be impractical, however, since improved performance could be
achieved at a lower hardware cost using the space iterative index
recognition circuit (Figure 4.11, page 70).
The hardware cost of performing the index recognition task using the
comparison type circuit (Figure 4.12, page 71) discussed in Chapter 4 is
60A + 33B. As illustrated in Figure 5.13, the hardware cost of the
comparison type index recognition circuit can be reduced to 20A + 13B if
integrated circuit EXCLUSIVE-OR gates are available. Because of the
general usefulness of the EXCLUSIVE-OR function, EXCLUSIVE-OR gates are
proposed for development as the first complex, integrated tse logic
device. An improved comparison type index recognition circuit using
EXCLUSIVE-OR gates requires only 159 unit gate delays (Figure 5.14) to
identify all basis points with a surround of index one, two, or three.
The worst case time required to identify basis points with any set of
indices is only 285 unit gate delays.

[Figure 5.11 Shift register based index recognition circuit.]

[Figure 5.12 Timing diagram for the shift register based index
recognition circuit. Time axis 0 to 80 unit gate delays.]

[Figure 5.13 Comparison type index recognition circuit using
EXCLUSIVE-OR gates.]

Performance characteristics for the various index recognition cir-
cuits that are useful in performing the Golay transform skeletonizing
algorithm are summarized in Table 5.1. The trade off between circuit
complexity and operating speed is illustrated by the increase in propa-
gation delay as the number of tse components required to perform the
index recognition task is reduced. Table 5.1 also shows the potential
value of including a single line disable on the active tse logic devices
since only two of the five index recognition circuits are practical if
the single line disable is not provided.
An OR Latch Implementation of the Skeletonizing Algorithm
A medium performance implementation of the Golay transform skeleton-
izing algorithm using OR latches and the shift register type index
recognition circuit is illustrated in Figure 5.15. This circuit operates
in basically the same manner as the skeletonizing machine described in
Chapter 4. The OR latch design, however, has the advantage of permit-
ting changes in the subfield operating order. Normally the modular oper-
ation specified by the Golay transform skeletonizing algorithm is per-
formed on basis points in subfields one, two, or three in that order.
If some other subfield operating order is employed, a skeleton of the
original image will still be obtained but, depending upon the shape of
the original image, the skeleton might be shifted to a slightly different
location within the image field. This effect could be useful in a
potential cloud tracking application which has been suggested for the
skeletonizing machine [17].

[Figure 5.14 Timing diagram for the comparison type index recognition
circuit using EXCLUSIVE-OR gates to identify basis points with an index
of one, two, or three. Time axis 0 to 160 unit gate delays.]

TABLE 5.1
PERFORMANCE CHARACTERISTICS OF THE INDEX RECOGNITION CIRCUITS

Circuit Type                      Figure    Cost        Propagation Delay
                                  Number    Function    in Unit Gate Delays
Space iterative combinational     4.11      125A+77B     10
  circuit
Multiplexed circuit               5.9       104A+69B     75
Shift register circuit            5.11      68A+45B      82
Basic comparison circuit          4.12      60A+33B     214
Comparison circuit with           5.13      20A+13B     159
  EXCLUSIVE-OR gates

[Figure 5.15 Hardwired skeletonizing machine using OR latches.]

The location and velocity of major cloud formations is one type of
valuable meteorological information which can be obtained from earth
resources satellites. Traditionally, successive images of the cloud
formation are transmitted to ground stations where conventional computers
are used to compute the speed and direction of the cloud formation. If
a tse logic skeletonizing machine is included in the satellite, cloud
images could potentially be skeletonized in real time. A sequence of
skeletons could be transmitted to earth as one image which would show the
track of the air mass carrying the cloud. Since less data would be
transmitted to earth, the bandwidth of the transmission channel and the
processing load on earthbound computers could both be reduced. One
potential problem is the time varying shape of the cloud formation. The
skeletonizing operation should consistently produce skeletons at a point
near the geometric center of the original cloud image to facilitate
accurate cloud tracking. An OR latch skeletonizing machine design such
as the one illustrated in Figure 5.15 would allow several skeletons of
each cloud image to be obtained using different subfield operating orders.
The average skeleton of the image would then provide a more consistent
indication of cloud position than a single skeleton.
The OR latch skeletonizing machine illustrated in Figure 5.15 is
more amenable to alternating subfield operating orders than an improved
version of the skeletonizing machine presented in Chapter 4 (Figure 4.17,
page 78) would be because the new subfield produced by the current
processing step is always combined with the other two subfields via the
subfield multiplexing circuit. There are three alternate paths through
the subfield multiplexing circuit. Each path contains a film mask or an
active tse OR gate with a programmed electroluminescent output array.
The first image path is masked so that only those data points within
subfield one of the input tse will be transmitted. Data points within the
second and third subfields are zeroed. Similarly, the second and third
image paths transmit only data points that are within the second and
third subfields of the image, respectively. Either the output of latch
A or the Golay function circuit output can be selected for transmission
through any of the image paths. Normally, the image output from latch A
propagates through the two image paths which correspond to subfields that
are not currently being operated on, and the Golay function circuit
output propagates through the remaining image path. The image paths are
recombined at the output of the subfield multiplexing circuit to produce
a new image which is the result of the current skeletonizing algorithm
operation. If the current operation is the last of the three subfield
operations, the new image can be compared to the result of the preceding
operation to determine whether the skeleton is complete or another
iteration is required. Image propagation through the three subfield
multiplexing circuit image paths is completely controlled by conventional
logic signals. Thus, the control unit of the OR latch skeletonizing
machine can be designed to select any required subfield operating order
and change the order as often as necessary.
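The behavior of the subfield multiplexing circuit can be summarized in a few lines of code. This is a minimal sketch under stated assumptions: images are flat lists of 0/1 values, `subfield_of` is a hypothetical list giving the subfield number (1, 2, or 3) of each point, and the function names are illustrative. The actual subfield layout on the tse array is not modeled.

```python
def masked(image, subfield_of, k):
    """One image path: pass points in subfield k, zero all others."""
    return [bit if sf == k else 0 for bit, sf in zip(image, subfield_of)]


def subfield_multiplex(latch_a, golay_out, subfield_of, active):
    """Route the Golay function output through the path of the active
    subfield and the latch A output through the other two paths, then
    recombine the three paths by ORing them."""
    paths = [
        masked(golay_out if k == active else latch_a, subfield_of, k)
        for k in (1, 2, 3)
    ]
    return [p1 | p2 | p3 for p1, p2, p3 in zip(*paths)]


# Six points, two in each subfield; the Golay output clears every point,
# but only the points in the active subfield (two) are allowed to change.
layout = [1, 2, 3, 1, 2, 3]
new_image = subfield_multiplex([1] * 6, [0] * 6, layout, active=2)
```

Because `active` is an ordinary argument, any subfield operating order can be obtained by changing the sequence of values passed in, which corresponds to the conventional logic control described above.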
The input logic required by the OR latch skeletonizing machine is
significantly less complex than the input logic required by the machine
presented in Chapter 4. This is partially due to the use of an
integrated EXCLUSIVE-OR gate. In addition, however, note that the total
spiller has been replaced by a contractor device which produces a single
CMOS compatible output. The output of the contractor is a logic one if
any basis point changed states as a result of a modular operation
performed during the current iteration of the skeletonizing algorithm. In
that case, another iteration is required. The conversion from a total
spiller to a contractor is a direct consequence of the assumption that
active tse components can be deactivated by a one bit control line. As
a further result of this assumption, some of the input logic operations
can be performed by the conventional logic control unit, and fewer tse
components are required to implement the input logic.
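The role of the contractor in deciding whether another iteration is needed can be sketched as a fixed-point loop. In the sketch below images are flat lists of 0/1 values, `iterate_once` is a hypothetical stand-in for one complete iteration of the skeletonizing algorithm, and the toy operation is included only to exercise the loop; none of these names come from the report.

```python
def contractor(image):
    """Single-bit output: 1 if any point of the image is a one."""
    return 1 if any(image) else 0


def changed_points(before, after):
    """EXCLUSIVE-OR of two images: ones mark points that changed state."""
    return [b ^ a for b, a in zip(before, after)]


def run_until_stable(image, iterate_once, max_iterations=1000):
    """Repeat iterations until the contractor reports no changed points."""
    for _ in range(max_iterations):
        new_image = iterate_once(image)
        if contractor(changed_points(image, new_image)) == 0:
            return new_image
        image = new_image
    raise RuntimeError("no stable image within max_iterations")


def toy_iteration(image):
    """Toy stand-in (not the Golay transform): clear the last one while
    more than a single one remains."""
    if sum(image) <= 1:
        return list(image)
    last = len(image) - 1 - image[::-1].index(1)
    return image[:last] + [0] + image[last + 1:]


stable = run_until_stable([1, 1, 1, 0], toy_iteration)
```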
Control of the OR Latch Skeletonizing Machine
A timing diagram for performing one complete iteration of the
skeletonizing algorithm using the OR latch skeletonizing machine (Figure
5.15, page 107) is illustrated in Figure 5.16. The timing diagram
includes control signals which are used to minimize power consumption by
disabling active tse devices when they are not in use. Each iteration of
the skeletonizing algorithm is defined as one machine cycle, and each
subfield operation is defined as a subcycle. Thus, there are three
subcycles within the 315 unit gate delay machine cycle of the OR latch
type skeletonizing machine.
The control unit for the skeletonizing machine can be developed
using any of the conventional logic, sequential circuit design tech-
niques. One state-of-the-art control unit design is illustrated in
Figure 5.17. Control signals for the skeletonizing machine are
generated by programmable logic arrays (PLAs). A binary counter chain
which is driven at a clock rate corresponding to twice the inverse of the
standard gate propagation delay provides inputs to the programmable logic
arrays. Each unit time increment in the 315 unit gate delay machine cycle
is uniquely defined by the state of the counter chain. The PLAs produce
the control signals required by the skeletonizing machine by decoding the
state of the counter chain, the state of the contractor output, and the
state of external control signals which start and stop the machine. An
output signal from one PLA resets the binary counter chain at the end of
each machine cycle. In applications such as cloud tracking where a
fixed sequence of different subfield operating orders is required, an
additional binary counter circuit can be used to determine the current
subfield operating order. The programmable logic array would provide a
clock pulse to the counter at the completion of each machine cycle. The
PLA would then decode the new counter state to determine the subfield
operating order required during the next machine cycle. Table 5.2
summarizes the function of each control signal produced by the OR latch
skeletonizing machine control unit.

[Figure 5.16 Timing diagram for one complete iteration of the
skeletonizing algorithm using the OR latch type skeletonizing machine.
Signals INPUT, MP, IP, AE1, AC1, AE2, AC2, A'E2, A'C2, AQ, GP, SP, PL,
SR, E1, C1, E2, C2, IRP, EID, IDP, FP, S1, S1*, S2, S2*, S3, S3*, SMT,
TS, CP, LO, and OD; time axis 0 to 300 unit gate delays, divided into
the first, second, and third subfield operations.]

[Figure 5.17 Control unit for the OR latch skeletonizing machine.
Labeled elements: 100 kHz crystal oscillator (SQXO-2A), divide-by-N
counter (4059A, N = 500), two 4 bit synchronous counters (74LS163),
programmable logic array, START/STOP and CLEAR inputs, control signal
outputs.]

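The counter and PLA organization can be sketched as a counter whose state is decoded into control signal levels. In the sketch below only the 315 unit machine cycle comes from the text. The signal windows and the list of subfield operating orders are hypothetical placeholders, since the real values are defined by the timing diagram and the application, and all function names are assumptions.

```python
MACHINE_CYCLE = 315  # unit gate delays per machine cycle (from the text)

# Hypothetical active windows [start, stop) for two signals; the real
# windows would be read from the timing diagram.
WINDOWS = {"PL": [(0, 4)], "SR": [(10, 20), (30, 40)]}

# Hypothetical fixed sequence of subfield operating orders.
ORDERS = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]


def pla_decode(count, windows=WINDOWS):
    """Decode the counter state into control signal levels."""
    return {
        name: int(any(start <= count < stop for start, stop in spans))
        for name, spans in windows.items()
    }


def run_machine_cycle(order_index, cycle=MACHINE_CYCLE):
    """Step the counter through one machine cycle. The counter is reset
    at the end of the cycle and the order counter is clocked once."""
    trace = [pla_decode(count) for count in range(cycle)]
    next_order_index = (order_index + 1) % len(ORDERS)
    return trace, next_order_index


trace, next_index = run_machine_cycle(order_index=0)
```

Changing the control sequence then amounts to changing the decode table rather than the sequencing hardware, which is the appeal of a programmable logic array in this role.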
An Improved AND Latch Implementation of the
Skeletonizing Algorithm
The OR latch type skeletonizing machine can be converted to an
improved version of the AND latch type skeletonizing machine presented
in Chapter 4 by redesigning register A, register A', and the subfield
multiplexing circuit. A schematic for the AND latch type skeletonizing
machine is presented in Figure 5.18, and a timing diagram for one machine
cycle is illustrated in Figure 5.19. Since a higher data rate is
achieved with fewer tse components, this design is preferred over the
OR latch design when only one subfield operating order is required. A
control unit for the AND latch machine could be constructed using the
technique described for the OR latch machine. Control signals for the
AND latch skeletonizing machine are described in Table 5.3. The
hardware costs, power requirements, and performance of the AND latch and
OR latch versions of the skeletonizing machine are summarized in Table 5.4.

TABLE 5.2
OR LATCH SKELETONIZING MACHINE CONTROL SIGNALS

Control    Number of Components    Control
Signal     Controlled              Function
MP          2                      Machine Power
IP          3                      Input Logic Power
AE1         1                      Layer Input Gate
AC1         1                      Latch A
AE2         1                      Latch A
AC2         1                      Latch A
A'E2        1                      Latch A'
A'C2        1                      Latch A'
AQ          1                      Latch A Output
GP          9                      GNPG Power
SP         18                      Shift Register Power
PL         10                      Shift Register Input
SR          6                      Shift Slave into Master
E1          6                      Shift Right to Next Latch
C1          6                      Store Slave
E2          6                      Input to Master
C2          6                      Store Master
IRP        16                      Index Recognition Combinational
                                   Logic Power
EID         1                      Index Recognition Latch Input
IDP         2                      Index Recognition Latch Power
FP         10                      Golay Function and Multiplexer Power
S1          1                      First Subfield Control
S1*         1                      NOT First Subfield Control
S2          1                      Second Subfield Control
S2*         1                      NOT Second Subfield Control
S3          1                      Third Subfield Control
S3*         2                      NOT Third Subfield Control
SMT         1                      Subfield Multiplexer Output
TS          4                      Contractor Power
CP          1                      Continue Processing Control
LO          1                      Layer Output Control
OD          0                      Contractor Output Signal

[Figure 5.18 Schematic diagram for an improved AND latch hardwired
skeletonizing machine.]

[Figure 5.19 Timing diagram for one complete iteration of the
skeletonizing algorithm using the improved AND latch skeletonizing
machine. Time axis 0 to 300 unit gate delays, divided into the first,
second, and third subfield operations.]

The AND and OR latch skeletonizing machines presented in this
chapter are medium performance machines. In particular applications, a
higher throughput design or a design which requires fewer tse components
may be required. Because the basic skeletonizing machine designs are
independent of the type of index recognition circuit used, these machines
can be tailored to specific applications by selecting the appropriate
index recognition circuit from Table 5.1, page 106. In low data rate
applications, the improved comparison type index recognition circuit
should be used to reduce hardware costs. For applications where the
machine cycle time of the skeletonizing machine using a shift register
based index recognition circuit is too long, the multiplexed or space
iterative index recognition circuits can be employed at a significantly
higher hardware cost. Alternately, the skeletonizing machine organization
described in Chapter 7 can be utilized. This organization offers several
performance advantages in ultrahigh data rate applications.
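The selection rule described above can be expressed as a small lookup: choose the least expensive index recognition circuit whose propagation delay fits the application. The sketch below uses the cost functions and delays quoted in this chapter; the function name and the default weights A = B = 1 are illustrative assumptions.

```python
# (circuit type, active devices, passive devices, unit gate delays),
# with values as quoted in the text of this chapter.
CIRCUITS = [
    ("space iterative", 125, 77, 10),
    ("multiplexed", 104, 69, 75),
    ("shift register", 68, 45, 82),
    ("basic comparison", 60, 33, 214),
    ("comparison with EXCLUSIVE-OR gates", 20, 13, 159),
]


def cheapest_within(max_delay, a=1.0, b=1.0):
    """Return the circuit with the lowest cost A*x + B*y among those
    whose propagation delay does not exceed max_delay, or None."""
    candidates = [c for c in CIRCUITS if c[3] <= max_delay]
    if not candidates:
        return None
    return min(candidates, key=lambda c: a * c[1] + b * c[2])[0]


choice_fast = cheapest_within(10)
choice_medium = cheapest_within(100)
choice_slow = cheapest_within(200)
```

With equal weights, a 100 unit delay budget selects the shift register circuit, while a 200 unit budget selects the comparison circuit with EXCLUSIVE-OR gates, matching the guidance given for low data rate applications.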

TABLE 5.3
IMPROVED AND LATCH SKELETONIZING MACHINE CONTROL SIGNALS

Control    Number of Components    Control
Signal     Controlled              Function
MP          6                      Machine Power
IP          3                      Input Logic Power
AE1         1                      Layer Input Gate
AC1         1                      Latch A
AE2         1                      Latch A
A'E2        1                      Latch A'
A'C2        1                      Latch A'
H           1                      Latch A Input
H*          1                      Latch A Input
S           1                      Latch A Store
AQ          1                      Latch A Output
GP          9                      GNPG Power
SP         18                      Shift Register Power
PL         10                      Shift Register Input
SR          6                      Shift Slave into Master
E1          6                      Shift Right to Next Latch
C1          6                      Store Slave
E2          6                      Input to Master
C2          6                      Store Master
IRP        16                      Index Recognition Combinational
                                   Logic Power
EID         1                      Index Recognition Latch Input
IDP         2                      Index Recognition Latch Power
FP          5                      Golay Function and Multiplexer Power
X           1                      NOT First Subfield Control
X*          3                      First Subfield Control
SMT         1                      Subfield Multiplexer Output
TS          4                      Contractor Power
CP          1                      Continue Processing Control
LO          1                      Layer Output Control
OD          0                      Contractor Output Signal

TABLE 5.4
PERFORMANCE CHARACTERISTICS OF THE IMPROVED
AND LATCH AND OR LATCH SKELETONIZING MACHINES

                                          OR Latch      Improved
                                                        AND Latch
Cost function                             123A+89B      120A+86B
Number of control signals                 31            29
Total component count                     212           206
Gate delays per iteration                 315 (1.58)    313 (1.57)
  (time in seconds)
Data rate, simple images per minute       38.10         38.34
Average power consumption in watts        145.70        160.00
Peak power consumption in watts           177.00        195.00
Speed-power product in watt-seconds       229.47        250.35

CHAPTER 6
A PIPELINED ARCHITECTURE FOR THE SKELETONIZING MACHINE
The medium performance tse logic skeletonizing machines presented
in Chapter 5 achieve high data processing rates using a minimum number
of elementary tse logic devices. However, certain applications may
require even higher data processing rates at the expense of additional
hardware. The various index recognition circuit designs described in
Chapter 5 illustrate that the data processing rate can be improved sig-
nificantly by increasing the hardware complexity of the index recog-
nition circuit. This chapter presents a pipelined architecture for
the skeletonizing machine which allows further improvements in the
data processing rate through a more efficient utilization of the index
recognition circuit.
Multiple Tse Processing
The Golay transform algorithms permit the transformation of only
one subfield of an image at a time. As a result, only one third or
less (less for the four or seven subfields case) of the elements of
each tse logic device used in the Golay neighbor planes generator, index
recognition, and Golay function circuits can be performing a useful
operation on a single image at any instant. The operation performed by
the Golay neighbor planes generator inherently restricts the number of
simultaneous input images to one because all subfields of the input
image are involved in the output function. Points within different
subfields do not interact in the index recognition and Golay function
circuits. Therefore, these circuits are capable of processing distinct
subfields from several different images simultaneously.
Figure 6.1 illustrates a three plane mixer circuit which can be
used to create a composite image for parallel processing by the index
recognition circuit. The inputs to the three plane mixer are Golay
neighbor planes of the same type from three separate tses. To achieve
the minimum index recognition time, the Golay neighbor planes must be
available simultaneously, and, thus, three Golay neighbor planes
generator circuits are required. The shift register based index
recognition circuit (Figure 5.11, page 101) provides input latches which
can accept the three subfields of the composite tse from the three plane
mixer sequentially. This permits the use of a single Golay neighbor
planes generator circuit when a somewhat longer image processing time
is acceptable. Six three plane mixer circuits are required to produce
the composite Golay neighbor planes GA*, GB*, GC*, GD*, GE*, and GF*.
One additional three plane mixer is used to create a composite image,
QA*, of the three tses currently being processed. This image is one
input to the Golay function logic.
A Minimum Hardware, Modified Pipeline Implementation of
the Skeletonizing Algorithm
The possibility of processing distinct subfields from several
different images simultaneously suggests the development of a
skeletonizing machine architecture which uses a modification of the
pipeline principle. Figure 6.2 illustrates the traditional pipeline
organization for an image processing machine. The image processing
algorithm is broken down into a sequence of steps which can be
implemented by successive logic circuits. Synchronizing registers [18]
store the partial result obtained from each step while the succeeding
processing step is being performed. Images are fed into the top of the
pipeline, processed as they are clocked down the pipeline, and extracted
at the bottom. This type of machine processes images at the rate at
which they can be clocked into the machine rather than at a rate which
is dependent upon the complexity of the algorithm. Although this
organization is expensive in terms of hardware cost, the pipeline is
highly efficient since all of the gating can be utilized 100 percent of
the time [18].

[Figure 6.1 Three plane mixer.]

[Figure 6.2 Block diagram of a traditional pipeline machine
architecture. Input data enters at the top and output data leaves at
the bottom.]

Golay transform algorithms are generally not fixed length
algorithms because the required number of iterations is a function of
the image being processed. An excessively large number of synchronizing
registers and logic circuits would be required to guarantee the
completion of the Golay transform in a straight pipeline of the type
illustrated in Figure 6.2. This difficulty can be overcome by providing
a gated feedback path for the output image. A logic circuit at the
input of the pipeline would determine whether to gate a new image into
the first synchronizing register or gate the current output back into
the pipeline for additional processing. Using this technique, any
integral number of processing stages can be used to construct a machine
to perform a convergent Golay transform such as the skeletonizing
algorithm.
Although feasible, a conventional pipeline organization of a Golay
transform machine would not be efficient because only one subfield of
an input image could be processed at a time. Thus, one-third or more
of the elements of each device employed in the image processing logic
would be unused at any instant. Figure 6.3 is a block diagram of a
unique pipeline organization for Golay transform processing machines.
This organization allows one image processing logic circuit to process
one subfield from each of three different pipelined images
simultaneously. The hardware minimization achieved by using this
architecture is extremely important because of the high cost projected
for tse logic devices.
A block diagram of a tse logic implementation of the skeletonizing
algorithm using the new pipeline organization is illustrated in Figure
6.4. The Golay neighbor planes generator, three plane mixer, index
recognition, and Golay function logic circuits comprise the image
processing logic. Three Golay neighbor planes generators are required to
obtain the highest processing rates. However, Figure 6.4 illustrates
a lower hardware cost design which takes advantage of the latches
available in the shift register based index recognition circuit to
permit sequential use of a single Golay neighbor planes generator
circuit.
Latches A, B, and C are the synchronizing registers of a
conventional pipeline organization. One subfield operation is performed
on the input image as the image is clocked between each succeeding
latch. Two plane mixer circuits, such as the one illustrated in Figure
6.5, are used to form the composite images which represent the result of
each subfield operation. In addition to the two plane mixer which
provides the feedback signal to the layer input logic, a two plane mixer
is required at the input to latch B and latch C. Latches B' and C'
preserve the images present at the beginning of each iteration of the
algorithm for comparison to the processed images.
Figure 6.3 Block diagram of a modified pipeline architecture for tse
logic image processing machines using Golay transforms.
Figure 6.4 Block diagram of a modified pipeline tse logic
implementation of the skeletonizing algorithm.
Figure 6.5 A two plane mixer.
The operation of this machine can be explained by following an
image, J, through one subfield operation of the skeletonizing algorithm.
Image J is gated through the layer input logic and clocked into latch
A. Simultaneously, other images are clocked into latches B, B', C',
and C. Now, assume that latch B' contains a previous input image, K,
latch B contains the result of the first subfield operation on K, K*,
latch C' contains a previous input image, L, and latch C contains the
result of the first and second subfield operations on image L, L**.
The output of latch A is enabled and Golay neighbor planes are
generated for image J. Subfield one of each of the Golay neighbor planes
from image J is gated through the three plane mixer network and clocked
into the shift register type index recognition circuit. The output of
latch A is disabled, and the output of latch B is enabled so that
subfield two from each Golay neighbor plane of image K* can be clocked
into the index recognition circuit. Subfield three from each Golay
neighbor plane of image L** is clocked into the index recognition
circuit after disabling output QB of latch B and enabling output QC of
latch C.
Upon completion of the index recognition procedure, images J, K*,
and L** are combined in another three plane mixer, and a composite
image consisting of subfield one of image J, subfield two of image K*,
and subfield three of image L** is provided to the Golay function logic.
Image F, which is formed by the Golay function logic, consists of the
first, second, and third subfields of images J, K*, and L**,
respectively. The first subfield of image F and the second and third
subfields of image J are combined by a two plane mixer for clocking into
latch B. Subfield two of image F is combined with subfields one and
three of image K* by another two plane mixer for clocking into latch C.
Image L** has just completed one iteration of the skeletonizing
algorithm, so subfield three of image F is combined with the previously
processed subfields one and two of image L** to form image F' which is
the result of the current iteration performed on image L. Image F' is
EXCLUSIVE-ORed with image L to detect any differences in the images.
If no differences are detected, image F' is the skeleton of image L
and will be gated out of the machine as a new image is clocked into
latch A. If a difference is detected, image F' is gated into the
input of latch A for additional processing. Latches A, B, C, B', and C'
are then clocked to store new images in them, and processing continues
as described above. At this point latch B' contains image J, latch
B contains image J* which is the result of the first subfield
operation on image J, latch C' contains image K, latch C contains image
K** which is the result of the second subfield operation on image K, and
latch A contains either a new image or the result of the last subfield
operation on image L. Successive subfield operations continue on each
of the images until no reduction of the image is obtained in one
complete iteration, and the skeleton of the image is gated out of the
machine. When three Golay neighbor planes generators are utilized to
obtain higher data processing rates, the serial procedure for
generating composite Golay neighbor planes is not required, and any type
of index recognition circuit can be employed.
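The flow of images through latches A, B, and C, including the gated
feedback path, can be followed with a short scheduling sketch in
Python. It is a behavioral illustration only: toy_subfield_op is a
hypothetical stand-in for a real subfield operation (it merely shrinks
one component of a three-number "image" toward 1), latches B' and C'
are folded into the first element of each (start, current) pair, and
gate-level timing is not modeled.

```python
from collections import deque


def toy_subfield_op(image, k):
    """Hypothetical stand-in for subfield operation k: shrink component k toward 1."""
    parts = list(image)
    if parts[k] > 1:
        parts[k] -= 1
    return tuple(parts)


def run_modified_pipeline(images, subfield_op=toy_subfield_op):
    """Process images three at a time, one subfield per image per machine cycle.

    stage[0], stage[1], stage[2] model latches A, B and C. Each occupied
    stage holds (image at start of this iteration, partially processed image).
    """
    pending = deque(images)
    stage = [None, None, None]
    finished, cycles = [], 0
    while pending or any(s is not None for s in stage):
        # Layer input logic: admit a new image only when latch A is free.
        if stage[0] is None and pending:
            new_image = pending.popleft()
            stage[0] = (new_image, new_image)
        cycles += 1
        # One pass of the shared logic: subfield k of the image in stage k.
        results = [
            None if s is None else (s[0], subfield_op(s[1], k))
            for k, s in enumerate(stage)
        ]
        # The image leaving stage C has completed a full iteration.
        feedback = None
        if results[2] is not None:
            start, out = results[2]
            if out == start:          # no change detected: skeleton found
                finished.append(out)
            else:                     # change detected: feed back to latch A
                feedback = (out, out)
        stage = [feedback, results[0], results[1]]
    return finished, cycles


skeletons, cycles_used = run_modified_pipeline([(3, 1, 2), (1, 1, 1), (2, 2, 2)])
```

With these toy inputs every image converges to (1, 1, 1). An image
that is still changing re-enters latch A ahead of any waiting input,
which mirrors the choice made by the layer input logic.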
Figure 6.6 is a schematic of the minimum hardware, modified
pipeline implementation of the skeletonizing algorithm. The timing
diagram provided in Figure 6.7 shows that the cycle time of this machine
is 144 gate delays. For a particular image, each iteration of the
skeletonizing algorithm requires 432 gate delays. However, because of
the pipeline organization and simultaneous processing of three
different images, the effective or average time for each iteration is
only 144 gate delays. As a result, this machine can process 83 simple
images per minute compared to only 38 simple images per minute for the
machines described in Chapter 5 which use the same index recognition
circuit.
Figure 6.6 Schematic for the minimum hardware, modified pipeline
implementation of the skeletonizing algorithm.
Figure 6.7 Timing diagram for the minimum hardware, modified
pipeline implementation of the skeletonizing algorithm.
Thirty-four control signals are required by this machine. Their
functions are outlined in Table 6.1. The hardware cost function for
this implementation of the skeletonizing algorithm is 183A + 129B for
a total of 312 components. A measure of the efficiency of this design
is provided by the speed-power product of 120 watt-seconds compared to
over 229 watt-seconds for the hardwired machines described in Chapter 5.
Additional Modified Pipeline Implementations of
the Skeletonizing Algorithm
Figure 6.8 illustrates the general schematic for modified
pipeline implementations of the skeletonizing algorithm when three Golay
neighbor planes generators are available. The index recognition
circuit can be of any desired type. Figure 6.9 is the timing diagram for
the case of the shift register based index recognition circuit which
was utilized in the minimum hardware, modified pipeline skeletonizing
machine. Machine cycle time is reduced to 108 gate delays for a
processing rate of 111 simple images per minute. The hardware cost
function is 193A + 157B for a total of 350 components. Elimination of
the serial procedure for generating the composite Golay neighbor planes
reduces the speed-power product by 14.5 percent to 102 watt-seconds.
TABLE 6.1
MINIMUM HARDWARE, MODIFIED PIPELINE SKELETONIZING
MACHINE CONTROL SIGNALS
Control   Number of Components   Control
Signal    Controlled             Function
MP         5                     Machine Power
IP         3                     Input Logic Power
AE1        3                     Layer Input Gate
AC1        2                     Latch A Slave
AE2        2                     Latch A Slave
AC2        2                     Latch A Master
AQ         2                     Latch A Output
C'C        1                     Latch C' Output
GP        23                     GNPG Power
S1         6                     Mixer Subfield 1
S2         6                     Mixer Subfield 2
S2*       12                     Mixer Subfield 2 NOT
S3         6                     Mixer Subfield 3
SP        18                     Shift Register Power
PL         6                     Shift Register Input
SR         6                     Shift Slave into Master
E1         6                     Shift Right to Next Latch
C1         6                     Store Slave
E2         6                     Input to Master
C2         6                     Store Master
IRP       16                     Index Recognition Combinational Logic Power
EID        1                     Index Recognition Latch Input
IDP        2                     Index Recognition Latch Power
FP        17                     Golay Function and Multiplexer Power
BE1        4                     Latch B and C Slave Input
BC1        2                     Latch B and C Slave Feedback Path
BE2        2                     Latch B and C Master Input
BC2        2                     Latch B and C Master Feedback Path
BQ         1                     Latch B Output
CQ         2                     Latch C Output
SMT        1                     Subfield Multiplexer Output
TS         4                     Contractor Power
CP         1                     Continue Processing Control
LO         1                     Layer Output Control
OD         0                     Contractor Output Signal
Figure 6.8 Schematic for the modified pipeline implementation of the
skeletonizing algorithm with three Golay neighbor planes generators.
Figure 6.9 Timing diagram for the modified pipeline skeletonizing
machine with the shift register based index recognition circuit.
This implementation of the skeletonizing machine provides approximately
twice the data processing capability of the OR latch machine presented
in Chapter 5 while reducing the speed-power product by 56 percent. The
cost of this improved performance is a 57 percent increase in the
number of active components and a 76 percent increase in the number of
passive components required to build the skeletonizing machine. The
functions of the 29 control signals required by this machine are
outlined in Table 6.2.
For applications which require ultrahigh data processing rates,
a space iterative index recognition circuit can be used in the
skeletonizing machine in Figure 6.8. A timing diagram for this
implementation of the skeletonizing algorithm is illustrated in Figure
6.10. Control signal functions are described in Table 6.3. The hardware
cost of this machine is 249A + 189B. A total of 438 components, or more
than twice the number of components required by the OR latch
skeletonizing machine, are utilized in this ultrahigh speed design.
Figure 6.10 shows that the machine cycle time is only 34 gate delays
which allows the skeletonizing machine to process 353 simple images
per minute. This is approximately nine times the processing speed of
the OR latch machine described in Chapter 5. The speed-power product
of this high speed, modified pipeline skeletonizing machine is 93
watt-seconds versus 229 watt-seconds for the OR latch machine. The
characteristics of the three modified pipeline skeletonizing machines
described in this chapter are summarized in Table 6.4.
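The data rates and speed-power products quoted for the three machines
can be cross-checked with a few lines of arithmetic. The Python sketch
below assumes a gate delay of 5 milliseconds, which is inferred from
Table 6.4 (144 gate delays listed as 0.72 seconds) rather than stated
in this chapter, and it assumes that one machine cycle is charged per
simple image, which is the relationship the tabulated figures follow.
The names GATE_DELAY_S and machine_figures are illustrative.

```python
GATE_DELAY_S = 0.005  # assumed: inferred from 144 gate delays = 0.72 s in Table 6.4

# (gate delays per effective iteration, average power in watts), from Table 6.4
machines = {
    "minimum hardware": (144, 166.19),
    "shift register index recognition": (108, 189.50),
    "space iterative index recognition": (34, 545.12),
}


def machine_figures(gate_delays, average_power_w):
    """Return (cycle time in s, simple images per minute, speed-power product in W-s)."""
    cycle_s = gate_delays * GATE_DELAY_S
    images_per_minute = 60.0 / cycle_s
    speed_power_ws = average_power_w * cycle_s
    return cycle_s, images_per_minute, speed_power_ws


for name, (delays, power) in machines.items():
    cycle_s, rate, product = machine_figures(delays, power)
    print(f"{name}: {cycle_s:.2f} s, {rate:.2f} images/min, {product:.2f} W-s")
```

Under these assumptions the computed values agree with the tabulated
data rates and speed-power products to two decimal places.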
Extension of the modified pipeline organization described in this
chapter to the realization of Golay transforms involving four or seven
TABLE 6.2
CONTROL SIGNALS FOR THE MODIFIED PIPELINE SKELETONIZING
MACHINE WITH THREE GOLAY NEIGHBOR PLANES GENERATORS
Control   Number of Components   Control
Signal    Controlled             Function
MP         5                     Machine Power
IP         3                     Input Logic Power
AE1        3                     Layer Input Gate
AC1        2                     Latch A Slave
AE2        2                     Latch A Slave
AC2        2                     Latch A Master
AQ         2                     Latch A Output
C'C        1                     Latch C' Output
GP        63                     GNPG Power
SP        18                     Shift Register Power
PL         6                     Shift Register Input
SR         6                     Shift Slave into Master
E1         6                     Shift Right to Next Latch
C1         6                     Store Slave
E2         6                     Input to Master
C2         6                     Store Master
IRP       16                     Index Recognition Combinational Logic Power
EID        1                     Index Recognition Latch Input
IDP        2                     Index Recognition Latch Power
FP        17                     Golay Function and Multiplexer Power
BE1        4                     Latch B and C Slave Input
BC1        2                     Latch B and C Slave Feedback Path
BE2        2                     Latch B and C Master Input
BC2        2                     Latch B and C Master Feedback Path
BQ         3                     Latch B Output
SMT        1                     Subfield Multiplexer Output
TS         4                     Contractor Power
CP         1                     Continue Processing Control
LO         1                     Layer Output Control
OD         0                     Contractor Output Signal
Figure 6.10 Timing diagram for the modified pipeline skeletonizing
machine with the space iterative index recognition circuit.
TABLE 6.3
CONTROL SIGNALS FOR THE MODIFIED PIPELINE
SKELETONIZING MACHINE WITH THE SPACE ITERATIVE INDEX
RECOGNITION CIRCUIT
Control   Number of Components   Control
Signal    Controlled             Function
MP          5                    Machine Power
IP          3                    Input Logic Power
AE1         3                    Layer Input Gate
AC1         2                    Latch A Slave
AE2         2                    Latch A Slave
AQ          4                    Latch A Output
C'C         1                    Latch C' Output
GP         63                    GNPG Power
IR        131                    Index Recognition Circuit Power
FP         15                    Golay Function and Multiplexer Power
BE1         4                    Latch B and C Slave Input
BC1         2                    Latch B and C Slave Feedback Path
BE2         2                    Latch B and C Master Input
BC2         2                    Latch B and C Master Feedback Path
BQ          3                    Latch B Output
SMT         1                    Subfield Multiplexer Output
TS          4                    Contractor Power
CP          1                    Continue Processing Control
LO          1                    Layer Output Control
OD          0                    Contractor Output Signal
TABLE 6.4
PERFORMANCE CHARACTERISTICS OF THE MODIFIED PIPELINE SKELETONIZING MACHINES
Machine Type                          Cost        Number of  Total      Gate Delays    Data Rate,     Average      Peak         Speed-Power
                                      Function    Control    Component  per Iteration  Simple Images  Power        Power        Product in
                                                  Signals    Count      (Time in       per Minute     Consumption  Consumption  Watt-Seconds
                                                                        Seconds)                      in Watts     in Watts
Minimum hardware modified pipeline    183A+129B   34         312        144 (0.72)      83.33         166.19       255          119.66
Modified pipeline with shift          193A+157B   29         350        108 (0.54)     111.11         189.50       363          102.33
register index recognition
Modified pipeline with space          249A+189B   19         438         34 (0.17)     352.94         545.12       714           92.67
iterative index recognition
subfields is straightforward. Each additional subfield requires two
additional latches, one for storage of the input image and one for tem-
porary storage of the additional partial result from the subfield oper-
ation for that subfield. The three plane mixer circuits become four
or seven plane mixers which consist of a mask for each subfield and OR
gates to combine the multiple masked inputs into a single composite
output image. Four or seven Golay neighbor planes generators are re-
quired unless the sequential technique for generating the composite
Golay neighbor plane is employed. The two plane mixer circuits re-
quire only a mask change to insure that the correct subfields from the
two input images are combined to form the composite output image. Since
the Golay neighbor planes generator for the four subfields case is
much simpler than for the three or seven subfields case, the case of
four Golay neighbor planes generators does not increase the basic hard-
ware cost of the Golay transform machine. However, the seven Golay
neighbor planes generators for the seven subfields case represent a sub-
stantial increase in hardware cost for the basic Golay transform machine
organization.
This chapter presented the development of modified pipeline
realizations of the skeletonizing algorithm which are useful in
applications that require high data processing rates. Chapter 7 will
describe the design of a programmable tse computer architecture which
is capable of performing Golay transforms with a relatively small
number of tse logic devices.
CHAPTER 7
A SPECIAL PURPOSE PROGRAMMABLE TSE PROCESSOR
A number of different architectures for hardwired tse logic
skeletonizing machines have been presented in previous chapters.
These machines have the advantage of providing relatively high data
processing rates but do not offer the flexibility which could be
obtained with a programmable tse processor. This chapter presents a
special purpose programmable tse computer which can be used to perform
numerous Golay transforms. A microcomputer-based tse computer control
unit is described and used to define a basic instruction set for the
tse processor. Programs for performing the Golay transform
skeletonizing and swelling algorithms are illustrated. In addition, the
use of advanced microprogramming techniques to define additional
instructions for the tse computer is demonstrated.
A Programmable Tse Computer
Figure 7.1 is a block diagram of a special purpose tse computer
which is designed to perform Golay transform operations on images
divided into three subfields. The machine consists of an
arithmetic-logic unit (ALU), an accumulator latch (A), two general
purpose latches (B and C), an index recognition latch (I), an output
latch (O), a contractor, an index recognition circuit, and a control
unit. The ALU includes a latch which temporarily stores the result of an
ALU operation when the accumulator or a general purpose latch is
being used as both a source and a destination register for the current
operation. This prevents undesirable race conditions from developing
in the tse processor.
Figure 7.1 A special purpose tse computer organization.
The accumulator and general purpose registers, B and C, serve
as both source and destination registers. Latch O can only function
as a destination register, and latch I can only function as a source
register for ALU operations. In the ALU operations which require
two operands, the accumulator is entered in the right side of the ALU
as one operand, while the output of latch B, C or I is entered in the
left side of the ALU as the second operand. The ALU is capable of
performing the AND, EXCLUSIVE-OR, and COMPLEMENT operations. Images
can also be gated through the ALU to the input of any destination
register. Since the registers are OR latches, the OR operation is
normally performed by gating one operand through the ALU and ORing
that image with the second operand which is stored in the destination
register.
Two independent mask generator circuits of the type illustrated
in Figure 7.2 are included in the ALU. An image mask for any of the
three possible subfields or any combination of those subfields can be
created by controlling the states of the three tse read-only memories.
For example, if M1, M2, and M are all active, the output image will
be M3. The ALU is organized so that these mask tses can be ORed or
ANDed with the ALU input images from the source registers. This
provides the capability of performing ALU operations on entire images or
only on selected subfields of the images. The result of each ALU
operation is tested by the contractor which detects all zero tses.
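The register and ALU behavior described above can be modeled with a
small Python sketch. It is a toy model under stated assumptions, not
the tse logic itself: each image is held as an integer bitmap, point i
is assumed to belong to subfield i mod 3, and the class and method
names (TseAluSketch, and_a, xor_a, or_into, and so on) are hypothetical.
The source and destination restrictions, the OR latch behavior, the
ANDing or ORing of masks with source images, and the contractor test
follow the description in the text.

```python
class TseAluSketch:
    """Toy model of the tse registers and ALU; images are int bitmaps."""

    SOURCES = ("A", "B", "C", "I")        # latch O cannot be a source
    DESTINATIONS = ("A", "B", "C", "O")   # latch I cannot be an ALU destination

    def __init__(self, n_points):
        self.n_points = n_points
        self.reg = {name: 0 for name in "ABCIO"}
        self.all_zero = True              # contractor output for the last ALU result

    def subfield_mask(self, *subfields):
        """Mask selecting any combination of subfields 0..2 (assumed: i % 3)."""
        mask = 0
        for i in range(self.n_points):
            if i % 3 in subfields:
                mask |= 1 << i
        return mask

    def _source(self, name, mask=None, combine="and"):
        if name not in self.SOURCES:
            raise ValueError(f"{name} cannot be used as a source register")
        image = self.reg[name]
        if mask is None:
            return image
        return image & mask if combine == "and" else image | mask

    def _store(self, dest, result, clear=True):
        if dest not in self.DESTINATIONS:
            raise ValueError(f"{dest} cannot be used as a destination register")
        self.all_zero = (result == 0)     # contractor tests every ALU result
        # OR latch: without a clear, new data is ORed with the stored image.
        self.reg[dest] = result if clear else self.reg[dest] | result

    def move(self, dest, src, mask=None):
        self._store(dest, self._source(src, mask))

    def and_a(self, dest, src, mask=None, combine="and"):
        self._store(dest, self._source(src, mask, combine) & self.reg["A"])

    def xor_a(self, dest, src, mask=None):
        self._store(dest, self._source(src, mask) ^ self.reg["A"])

    def or_into(self, dest, src, mask=None):
        self._store(dest, self._source(src, mask), clear=False)


alu = TseAluSketch(9)
alu.reg["A"] = 0b111111111
alu.reg["B"] = 0b101010101
alu.and_a("C", "B", mask=alu.subfield_mask(0))    # subfield 0 of B, ANDed with A
c_after_and = alu.reg["C"]
alu.or_into("C", "B", mask=alu.subfield_mask(1))  # OR subfield 1 of B into latch C
c_after_or = alu.reg["C"]
alu.move("A", "B")
alu.xor_a("C", "B")                               # identical images give an all zero tse
images_identical = alu.all_zero
```

The final EXCLUSIVE-OR illustrates how the contractor can signal that
two images are identical, the kind of test used to decide whether an
iteration of a Golay transform altered an image.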
In addition to the ALU, the tse computer utilizes an index
recognition network consisting of a Golay neighbor planes generator
and a comparison type index recognition circuit. This function
could be implemented as a programmed ALU operation using individual
logical and slide operations but would then require extremely long
execution times. The hardwired comparison type index recognition
circuit provides an effective trade off between the hardware cost
and the speed of the Golay transform tse computer while preserving
the ability to recognize all fourteen possible indices. The
accumulator is the source register for all index recognition operations.
Latch I is the destination register.
Figure 7.2 A three subfield mask generator circuit.
A schematic of the tse logic used in the Golay transform computer
is illustrated in Figure 7.3. The hardware cost function for this
machine is 97A+69B. No random access tse memory is included in the
computer organization because of the high hardware cost of tse memory.
Also, due to the flexibility of the tse logic ALU, external memory
requirements should be minimal. In most applications, only a serial
input image buffer memory would be required to synchronize the
variable length Golay image processing operation to the incoming
data. Additional general purpose registers could be added to the tse
computer organization if they are needed for temporary storage
of partial results from a complex transform operation. This would
require less hardware than adding external memory to the tse computer.
Tse Computer Control Unit
Theoretically, a computer control unit should be able to
produce the optimum control bit sequence for performing any operation
which is feasible with the register and ALU organization of the
controlled computer. In practice, this degree of flexibility can only be
approached by utilizing a microprogrammed control unit which contains
the control bit combinations in microinstructions that are read from a
control memory. Groups of microinstructions form microprograms that
control the execution of each macroinstruction. Thus, microprogrammed
computers require a small computing section within the central
processing unit (CPU) to execute the microprograms. A block diagram of
the organization of a typical microprogrammed control unit is shown in
Figure 7.4. The cycle time of a microprogrammed control unit must be
significantly faster than the minimum cycle time of the main computer
for the microprogrammed control unit to be efficient. In conventional
computers this requirement places severe restrictions on the design of
the control unit. The trade off between the complexity of the control
unit and the time required to decode and execute each instruction must
be carefully considered in the design. Since the projected propagation
delay of tse logic devices is several orders of magnitude longer than
the propagation delay of standard bipolar and MOS logic, a complex
conventional logic control unit can potentially provide efficient
control of a tse logic ALU.
Figure 7.3 Tse logic for the Golay transform tse computer.
A tse computer control unit that provides the benefits of
microprogramming and permits the use of conventional semiconductor
memory for tse computer program storage has been designed. The control
unit is based on RCA's CDP 1802 COSMAC microprocessor [19]. Figure 7.5
is a detailed block diagram of the control unit. The 1802 microprocessor
was chosen over other currently available devices because of several
unique architectural features which enhance the input-output (I/O) and
control capabilities of the device. These features include on-chip I/O
and the use of multiple program counters. A brief summary of the
architecture and instruction set of the 1802 is provided in Appendix C.
Figure 7.4 Organization of a conventional
microprogrammed control unit.
Figure 7.5 Block diagram of the tse computer control unit.
The 1802 microprocessor controls the tse components by outputting
control bytes and executing internal timing loops to account for the
propagation delays of the tse logic devices. Flag input EF1 monitors
the output of the contractor to detect all zero tses at the output of
the ALU. The EF2 flag is used for external input requests. These
requests are acknowledged by setting bit three of output port five.
Input images are gated into latch A under control of the Q line. At the
end of the execution cycle for major tse instructions, EF3 is tested
for an external request to halt tse processing. This feature can be
used to single step the tse computer through the more complex tse
instructions. Conventional techniques [19] can be used to single step
the 1802 microprocessor through the simpler tse instructions and the
microprograms themselves. Table 7.1 summarizes the function of each
major tse control signal.
Tse Computer Instruction Set
An instruction set consisting of all the standard CDP 1802
instructions plus 26 generic tse instructions with over 1300 variations
has been developed for the special purpose Golay transform tse
computer. The 1802 instruction set is given in Table C.1, Appendix C.
COSMAC instructions can be used alternately as macroinstructions or
as microinstructions. Table 7.2 summarizes the basic tse instruction
TABLE 7.1
TSE COMPUTER CONTROL SIGNAL FUNCTIONS
Name            Number of Bits   Function
EF1             1                Detects All Zero Tses At ALU Output
EF2             1                External Input Request
EF3             1                External Request to Halt Tse Processing
Q               1                Controls External Input to Latch A
Output Port 1   8                ALU Control
Output Port 2   8                Source Register Output and ALU Control
Output Port 3   8                Destination Register Input and Feedback Control
Output Port 4   8                Index Recognition Circuit Control
Output Port 5   1                Acknowledge Input Request
Output Port 6   1                ALU Control
Output Port 7   2                ALU Output Latch Control
TABLE 7.2
TSE INSTRUCTIONS
Instruction                           Mnemonic                     Operation
REGISTER OPERATIONS
MOVE REGISTER TO REGISTER             TMOV REG1,REG2               (REG2)→REG1 (SEE TOUT)
MOVE IMMEDIATE TO REGISTER            TMVI REG,MASK                (MASK)→REG
CLEAR REGISTER I                      TCLRI                        0→I
LOGIC OPERATIONS
AND REGISTER TO A                     TAND REG1,REG2,MASK          [(MASK)+(REG2)]·(A)→REG1
AND A TO REGISTER                     TANA REG1,REG2,MASK          [(MASK)+(A)]·(REG2)→REG1
AND IMMEDIATE TO REGISTER             TANI REG1,REG2,MASK          (MASK)·(REG2)→REG1
AND NOT REGISTER TO A                 TANN REG1,REG2,MASK          NOT[(MASK)·(REG2)]·(A)→REG1
AND NOT A TO REGISTER                 TANAN REG1,REG2,MASK         NOT[(MASK)·(A)]·(REG2)→REG1
OR REGISTER TO A                      TOR REG,MASK                 [(MASK)·(REG)]+(A)→A
OR A TO REGISTER                      TORA REG,MASK                [(MASK)·(A)]+(REG)→REG
OR IMMEDIATE TO REGISTER              TORI REG,MASK                (MASK)+(REG)→REG
OR NOT REGISTER TO A                  TORN REG,MASK                NOT[(MASK)+(REG)]+(A)→A
OR NOT A TO REGISTER                  TORAN REG,MASK               NOT[(MASK)+(A)]+(REG)→REG
EXCLUSIVE-OR REGISTER TO A            TXOR REG1,REG2,MASK          [(MASK)·(REG2)]⊕(A)→REG1
EXCLUSIVE-OR A TO REGISTER            TXRA REG1,REG2,MASK          [(MASK)·(A)]⊕(REG2)→REG1
EXCLUSIVE-OR [illegible] TO REGISTER  [illegible] REG1,REG2,MASK   [illegible]
[illegible]                           TCNR REG1,REG2,MASK          (MASK) [illegible] (REG2)→REG1
INDEX RECOGNITION
IDENTIFY BASIS POINTS WITH            TIDA X                       (I)+(ID)→I
A SURROUND OF INDEX X                                              Where ID is a tse with 1's in
                                                                   positions which correspond
                                                                   to basis points of A which
                                                                   have a surround of index X.
COMPARE OPERATIONS
CONTRACT REGISTER                     TCNT REG,MASK                (MASK)·(REG); IF =0, 0→RF
                                                                   IF ≠0, 1→RF
TEST REGISTER                         TTEST REG,MASK               (MASK)+(REG); IF =0, 0→RF
                                                                   IF ≠0, 1→RF
COMPARE REGISTER                      TCMP REG,MASK                [(MASK)·(REG)]⊕[(MASK)·(A)];
                                                                   IF RESULT=0, 0→RF
                                                                   IF RESULT≠0, 1→RF
COMPARE IMMEDIATE                     TCPI REG,MASK                (MASK)⊕(REG); IF (REG)=(MASK), 0→RF
                                                                   IF (REG)≠(MASK), 1→RF
TSE BRANCH OPERATIONS
TSE SHORT BRANCH ON ZERO              TBZ ADR                      IF RF=0, M(R(P))→R(P).0
                                                                   ELSE R(P)+1
TSE SHORT BRANCH ON NO ZERO           TBNZ ADR                     IF RF≠0, M(R(P))→R(P).0
                                                                   ELSE R(P)+1
TSE LONG BRANCH ON ZERO               TLBZ ADR                     IF RF=0, M(R(P))→R(P).1
                                                                   M(R(P)+1)→R(P).0
                                                                   ELSE R(P)+2
TSE LONG BRANCH ON NO ZERO            TLBNZ ADR                    IF RF=1, M(R(P))→R(P).1
                                                                   M(R(P)+1)→R(P).0
                                                                   ELSE R(P)+2
INPUT/OUTPUT OPERATIONS
INPUT TSE                             TIN                          INPUT TSE→A
OUTPUT TSE                            TOUT REG                     (REG)→O (SEE TMOV)
set which was created by using the 1802 instructions as
microinstructions. The dummy arguments REG, REG1, REG2, and MASK should
be replaced by the appropriate arguments from Table 7.3 when programs
are written using the COSMAC or tse instructions.
Tse instructions are divided into six basic groups consisting
of register operations, logic operations, index recognition operations,
compare operations, branch operations, and input-output operations.
The register operations facilitate movement of tse data within the
computer. As is the case in all tse instructions, register O cannot
be specified as a source register, and register I cannot be specified
as a destination register. Logic instructions provide the operations
which are necessary to perform Golay functions. In general, register
A cannot be specified as REG2 when register A is an implicit operand
in the logic operations. The contents of any source register can be
tested using the compare operations. These instructions are typically
used to determine whether or not an image was altered by the last
iteration of a Golay transform. The branch instructions provide
a method for testing the results of a tse operation and for performing
conditional operations. Both short branches, which are limited to
the current memory page, and long branches, which can specify any
memory location, are included in the instruction set. Note that the
tse branch instructions depend on the contents of R(F).0 which is set
to one or zero during each compare operation. The standard 1802
instructions provide additional branching capabilities. For example,
the B2 and BN2 instructions are used to test for external input
requests.
TABLE 7.3
REGISTER AND MASK CONSTANT DEFINITIONS

Name                          Symbol    Decimal Value

COSMAC REGISTERS
REGISTER 0                    R0          0
REGISTER 1                    R1          1
REGISTER 2                    R2          2
REGISTER 3                    R3          3
REGISTER 4                    R4          4
REGISTER 5                    R5          5
REGISTER 6                    R6          6
REGISTER 7                    R7          7
REGISTER 8                    R8          8
REGISTER 9                    R9          9
REGISTER 10                   RA         10
REGISTER 11                   RB         11
REGISTER 12                   RC         12
REGISTER 13                   RD         13
REGISTER 14                   RE         14
REGISTER 15                   RF         15

TSE REGISTERS
REGISTER A                    A         225
REGISTER B                    B         210
REGISTER C                    C         180
REGISTER O                    O         120
REGISTER I                    I           8

TSE MASKS
ALL ZERO MASK                 M0          0
ALL ONE MASK                  M          12
SUBFIELD ONE MASK             M1          9
SUBFIELD TWO MASK             M2         10
SUBFIELD THREE MASK           M3         15
SUBFIELDS ONE AND TWO MASK    M12        11
SUBFIELDS ONE AND THREE MASK  M13        14
SUBFIELDS TWO AND THREE MASK  M23        13
Microprogram Control of Tse Operations
When an application program is assembled for the tse computer,
each tse instruction generates a multiple byte operational code which
is actually a group of 1802 instructions and data bytes. This infor-
mation is used to produce the control signal sequence required to
perform the specified tse operation. Some tse instructions, such as
TCLRI and the tse branch instructions, require less than six bytes of
microcode to control their execution. The microcode expansions of
these instructions are used directly as their operational codes.
Thus, these instructions are self-contained in the sense that they
do not require a separate microprogram to control their execution.
The remaining tse instructions require somewhat longer control programs
which are executed as called subroutines. The subroutine call is
performed by an 1802 set P instruction which is used as the first
byte of the tse instruction operational code. Control signal sequences
are specified by the remaining bytes of the operational code. Control
microprograms always return with a SEP R3 instruction since all tse
computer applications programs will use R(3) as their program counter.
Figure 7.6 is a flow chart for the general ALU operations control
program which is listed in Figure D.1, Appendix D. This microprogram
controls the execution of all the tse register and logic operations
except TCLRI. These instructions require a six byte operational code.
The first byte is a SEP R9 instruction which calls the general ALU
operations microprogram. The remaining five bytes contain the ALU mask,
tse register output, port six, tse register input number one, and tse
register input number two control bytes which are output by the ALU
[Flow chart, Figure 7.6]
START
 1. Output the mask control byte to port 1, the register output control
    byte to port 2, and the port 6 control byte, all from the op code
    at M(R(3)).
 2. Delay 9 gate delays.
 3. Output 1 to port 7 to turn the ALU output AND gate off.
 4. Delay 2 gate delays.
 5. Output 0 to ports 1, 2, and 5.
 6. Output tse register input control byte no. 1 from the op code at
    M(R(3)) to port 3.
 7. Delay 2 gate delays.
 8. Output tse register input control byte no. 2 from the op code at
    M(R(3)) to port 3.
 9. Delay 2 gate delays.
10. Zero tse result? YES: put 0 into R(F).0. NO: put 1 into R(F).0.
11. Output 11110000 to port 3 to turn all tse destination latch inputs
    off with all feedback paths on. Output 0 to port 7 to turn the ALU
    output latch off.
12. External input request? YES: set bit 3 of port 5 to acknowledge the
    request and branch to the input service routine.
13. Halt tse processing? (YES / NO); on NO, RETURN.
Figure 7.6 A flow chart for the general ALU operations control program.
operations microprogram to control execution of the specified tse
instruction.
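
The six byte operational code layout described above can be sketched in Python. This is only an illustration: the function name and the example control byte values are hypothetical, and the only facts taken from the text are the byte order and the leading SEP R9 (encoded as 0xD9, since the 1802 set P instruction is 0xD0 plus the register number).

```python
SEP_BASE = 0xD0  # 1802 set P opcode: SEP Rn is encoded as 0xD0 + n


def alu_op_code(mask, reg_out, port6, reg_in1, reg_in2):
    """Build the six byte op code of a tse register or logic instruction.

    Byte order follows the text: SEP R9, ALU mask, tse register output,
    port six, register input number one, register input number two.
    The control byte values are supplied by the caller (hypothetical here).
    """
    control = [mask, reg_out, port6, reg_in1, reg_in2]
    if any(not 0 <= b <= 0xFF for b in control):
        raise ValueError("each control byte must fit in eight bits")
    return bytes([SEP_BASE | 9] + control)


# Illustrative control byte values only; they are not taken from the report.
example = alu_op_code(0x0C, 0x01, 0x01, 0xD0, 0xF0)
print(example.hex())
```

Because the five control bytes travel in the application program itself, the microprogram called by SEP R9 never has to decode which register or logic instruction it is executing.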
The general ALU operations control program utilizes the versatile
I/O capability of the 1802 to minimize the complexity of the control
microprogram. Control bytes which vary from instruction to instruc-
tion within the tse register and logic instruction classes are output
directly from the application program memory space addressed by R(3).
This eliminates the need to specifically decode the individual tse
register and logic instructions before initiating their execution.
Constant control bytes are output as immediate data from the control
microprogram to minimize the length of the tse instruction operational
codes. The 1802 microprocessor is particularly efficient at performing
these tasks because the output data pointer, R(X), is automatically
incremented during each output operation, and the register which is
assigned as the data pointer can be changed by a single set X
instruction.
Time delays are included in the microprogram to account for the
relatively long propagation delay of the tse logic devices. The
long delay subroutine MLDLY listed in Figure D.2, Appendix D, is called
by the ALU operations microprogram to create the time delays. A
standard subroutine call and return technique [19] is employed, and
two data bytes are passed to MLDLY to specify the length of the delay.
The 1802 executes most COSMAC instructions in two machine cycles which
consist of eight machine states each. To simplify control signal timing,
a 3.2 MHz clock frequency was chosen for the 1802 microprocessor. This
frequency permits the 1802 and associated components to be operated
from a five volt power supply and provides exactly 2000 machine cycles
in five milliseconds. Thus, 1000 two-cycle 1802 instructions can be
executed within the propagation delay of a tse logic gate.
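
The timing arithmetic in this paragraph can be checked with a short Python sketch. It assumes, as the figures above imply, that a machine cycle takes eight clock periods and that the tse gate propagation delay is five milliseconds; the constant and function names are illustrative.

```python
CLOCK_HZ = 3.2e6          # 1802 clock frequency chosen in the text
CLOCKS_PER_CYCLE = 8      # clock periods per machine cycle (assumed)
GATE_DELAY_S = 5e-3       # tse gate propagation delay implied by the text


def machine_cycles(seconds):
    """Number of 1802 machine cycles that fit in the given time."""
    return round(seconds * CLOCK_HZ / CLOCKS_PER_CYCLE)


cycles_per_gate_delay = machine_cycles(GATE_DELAY_S)
two_cycle_instructions = cycles_per_gate_delay // 2

print(cycles_per_gate_delay)   # machine cycles per gate delay
print(two_cycle_instructions)  # two-cycle instructions per gate delay
```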
Figure 7.7 is a timing diagram for the tse register and logic
instructions which execute under control of the general ALU operations
microprogram. The execution time for these instructions is 19 gate
delays. Flags EF1, EF2, and EF3 are tested during the execution of
these instructions. At the end of the tse instruction execution cycle,
R(F).0 will contain zero if the image at the output of the ALU was an all
zero tse. Otherwise, R(F).0 will be set equal to one. Note that the
contents of R(F).0 is not necessarily indicative of the result of a
tse OR instruction since the final OR operation is normally performed
at the destination register.
A listing of the tse computer compare operations control program
is provided in Figure D.3, Appendix D, and a flow chart for the micro-
program is shown in Figure 7.8. The tse compare instructions have
four byte operational codes which consist of a SEP RC instruction
and three control bytes. Only three control bytes are required
because there is no destination register. With this single exception,
the tse compare operations microprogram performs essentially the same
function as the general ALU operations microprogram. Tse compare
instructions execute in 15 gate delays (Figure 7.9).
One of the most complex tse computer operations is
the index recognition instruction. The index recognition control
microprogram is listed in Figure D.4, Appendix D. Figure 7.10 is a
flow chart for this microprogram. The index recognition instruction
[Timing diagram: traces Q, P11-P18, P31-P34, P41-P48, P61, P71, P72, and
SAMPLE FLAGS against TIME, with the time axis marked from 0 to 20.]
Figure 7.7 A timing diagram for the tse register
and ALU operations (except TCLRI).
[Flow chart, Figure 7.8]
START
1. Output the mask control byte to port 1, the register output control
   byte to port 2, and the port 6 control byte, all from the op code
   at M(R(3)).
2. Delay 9 gate delays.
3. Output 1 to port 7 to turn the ALU output gate off.
4. Delay 6 gate delays.
5. All zero tse result? YES: put 0 into R(F).0. NO: put 1 into R(F).0.
6. Output 0 to ports 1, 2, 6, and 7 to turn the tse components off.
7. External input request? YES: set bit 3 of port 5 to acknowledge the
   request and branch to the input service routine.
8. Halt tse processing? (YES / NO); on NO, RETURN.
Figure 7.8 A flow chart for the tse compare operations control program.
[Timing diagram: traces P11-P18, P21-P28, P31-P34, P35-P38, P41-P48, P61,
P71, P72, and SAMPLE FLAGS against TIME, with the time axis marked from
0 to 15.]
Figure 7.9 A timing diagram for the tse compare operations.
[Flow chart, Figure 7.10]
START
 1. Get the index weight byte from the op code into RB.
 2. Get the index byte into D, set bit 7, and push onto the stack.
 3. Output the first phase of the index byte from the op code at
    M(R(3)) to port 4.
 4. Delay 4 gate delays.
 5. Output the second phase of the index byte from the stack to port 4.
 6. Delay 4 gate delays.
 7. Decrement the stack and op code pointers.
 8. Output the third phase of the index byte from the op code at
    M(R(3)) to port 4.
 9. Decrement the weight counter.
10. Delay 1 gate delay.
11. Last index byte? NO: repeat from step 2. YES: output 0 to port 4 to
    turn the index recognition masks off.
12. External input request? YES: set bit 3 of port 5 to acknowledge the
    request and branch to the input service routine.
13. Halt tse processing? (YES / NO); on NO, RETURN.
Figure 7.10 A flow chart for the index recognition control program.
operational code varies between three and eight bytes in length
depending upon the weight of the index. Each index recognition instruc-
tion begins with a SEP RA. The second byte of the operational code
specifies the weight of the index, and the remaining bytes provide
the control bits which are output to drive the index recognition masks
and enable register I. The six least significant bits of the control
bytes correspond to the six Golay neighbors of a basis point but have
the complementary logic state.
Index recognition images are ORed into register I so that multiple
indices can be easily recognized. The index recognition instruction
has a variable execution time consisting of nine gate delays for each
orientation of the specified surround. A timing diagram for recognizing
an index with a weight of two is shown in Figure 7.11. Most indices
have a weight of six and can be recognized in 54 gate delays.
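
The relationship between index weight, operational code length, and execution time can be sketched as follows. The function name is illustrative, and the sketch assumes one control byte per orientation, which is consistent with the three to eight byte range and the nine gate delays per orientation stated above.

```python
def index_recognition_cost(weight):
    """Return (op code length in bytes, execution time in gate delays).

    Assumes one control byte per orientation of the surround, preceded by
    the SEP RA byte and the weight byte, and nine gate delays per
    orientation as stated in the text.
    """
    if not 1 <= weight <= 6:
        raise ValueError("index weight must be between one and six")
    op_code_bytes = 2 + weight
    gate_delays = 9 * weight
    return op_code_bytes, gate_delays


for w in (1, 2, 6):
    print(w, index_recognition_cost(w))
```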
A special TCLRI instruction is provided for clearing the contents
of the index recognition register before each set of index recognition
operations. TCLRI is a self-contained instruction which turns the
index recognition latch off by executing an 1802 OUT 4 instruction with
an immediate data byte of zero. Two unit gate delays are inserted at
the end of the TCLRI instruction by calling LDLY (Figure D.5, Appendix
D). This ensures that the index recognition latch will clear before
the next index recognition operation. The TCLRI operational code is
five bytes long.
The tse input instruction has a one byte operational code, SEP RD.
Figure D.6, Appendix D, is a listing of the microprogram which controls
the execution of the tse input instruction and Figure 7.12 is a flow
[Timing diagram: control signal traces and SAMPLE FLAGS against TIME.]
Figure 7.11 A timing diagram for recognizing an index with a weight of two.
[Flow chart, Figure 7.12]
START
1. Set Q to enable a tse input.
2. Output 11100000 to port 3 to turn the register A feedback path
   reformatter off.
3. Delay 3 gate delays.
4. Output 11110000 to port 3 to turn the register A feedback path
   reformatter on.
5. Delay 2 gate delays.
6. Reset Q to disable the tse input.
7. External input request? YES: set bit 3 of port 5 to acknowledge the
   request and branch to the input service routine.
8. Halt tse processing? (YES / NO); on NO, RETURN.
Figure 7.12 A flow chart for the tse input control program.
chart for the microprogram. The Q line is set to enable an external
tse input to latch A, and the feedback path reformatter is turned off
to clear the register (Figure 7.13). After three gate delays, the
feedback path is re-enabled, and following two additional gate delays,
Q is reset to disable the external tse input. Execution time for the
tse input instruction is five gate delays. Both the EF2 and the EF3
flags are tested. The tse output instruction, TOUT, is a special case
of the TMOV instruction and is controlled by the general ALU operations
microprogram.
Tse branch instructions are self-contained operations which perform
an 1802 GLO RF instruction followed by the appropriate 1802 branch
instruction (BZ, BNZ, LBZ, or LBNZ). The short tse branches require
a three byte operational code, and the long tse branches require a
four byte operational code. Tse branch instructions execute in essen-
tially zero gate delays.
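
As an illustration of the self-contained branch operational codes, the following Python sketch assembles the byte sequences described above. The opcode values are the standard 1802 encodings (GLO RF is 0x8F, BZ is 0x32, BNZ is 0x3A, LBZ is 0xC2, LBNZ is 0xCA); the function name is hypothetical, and the mapping from individual tse branch mnemonics to these 1802 branches is not given in this section, so the sketch is keyed by the 1802 mnemonic.

```python
GLO_RF = 0x8F                          # copy R(F).0 into D
SHORT_BRANCH = {"BZ": 0x32, "BNZ": 0x3A}
LONG_BRANCH = {"LBZ": 0xC2, "LBNZ": 0xCA}


def tse_branch_op_code(branch, target):
    """Return the op code bytes: GLO RF followed by an 1802 branch."""
    if branch in SHORT_BRANCH:
        # Short branch: one address byte within the current memory page.
        return bytes([GLO_RF, SHORT_BRANCH[branch], target & 0xFF])
    if branch in LONG_BRANCH:
        # Long branch: full 16 bit address, high order byte first.
        return bytes([GLO_RF, LONG_BRANCH[branch],
                      (target >> 8) & 0xFF, target & 0xFF])
    raise ValueError("unknown 1802 branch mnemonic: " + branch)


print(tse_branch_op_code("BZ", 0x0134).hex())
print(tse_branch_op_code("LBNZ", 0x0234).hex())
```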
Table 7.4 summarizes the important characteristics of the basic
tse instruction set which has been developed for the Golay transform
tse computer. The efficiency of the 1802 microprogram control technique
is indicated by the fact that the control routines listed in Appendix D
require only 238 bytes of memory.
A Cross-Assembler for the Tse Computer
A cross-assembler has been written to aid in the development of
tse computer applications programs. The tse computer cross-assembler
is a macro library which can be used in conjunction with Digital Equip-
ment Corporation's RT-11 MACRO assembler [20] and a PDP 11/40 minicom-
[Timing diagram: traces P11-P18, P21-P28, P31-P34, P35-P38, P41-P48, P61,
P71, P72, and SAMPLE FLAGS against TIME, with the time axis marked from
0 to 5.]
Figure 7.13 A timing diagram for the tse
input operation.
TABLE 7.4
TSE INSTRUCTION CHARACTERISTICS

Instruction Class         Op Code Length   Execution Time    Control Microprogram
                          in Bytes         in Gate Delays    Program Counter

Register (except TCLRI)   6                19                R9
TCLRI                     5                2                 --
Logic                     6                19                R9
Index Recognition         3-8              9-54              RA
Compare                   4                15                RC
Branch                    3-4              0                 --
Input                     1                5                 RD
Output                    6                19                R9
puter to assemble both microprograms and applications programs for
the tse computer. The RT-11 MACRO assembler features conditional and
macroassembly capabilities [20] which are utilized in assembling
programs for the tse computer. A brief summary of the RT-11 MACRO
assembler commands is provided in Appendix E. Interested readers can refer
to Korn [21] for a general discussion of macros and conditional assem-
bly.
The tse computer macro library defines all of the COSMAC and tse
instructions as well as the pseudo and delay instructions listed in
Table 7.5. A complete listing of the tse computer macro library is too
long to be included here. However, a representative subset of the macro
library is listed in Appendix F. Each macro definition specifies the
symbolic name of an instruction. Operational code bytes for the instruc-
tion can be calculated by the macro assembler if they cannot be specified
as constants. Both logical operations and conditional tests are utilized
in computing the calculated operational codes. Since the PDP 11/40
utilizes 16 bit words, the BYTE operation is used to truncate the words
to eight bits. Many of the macro definitions include a call to another
macro which will test for illegal conditions. For example, the short
branch instruction macros call $$$PAG, which checks for an illegal
branch across page boundaries.
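
The kind of test performed for short branches can be sketched in Python. The listing of $$$PAG is not reproduced in this section, so the function below is only a hypothetical equivalent: it assumes 256 byte memory pages and uses the fact that an 1802 short branch replaces only the low order byte of the program counter, so the branch address byte and the target must share the same high order address byte.

```python
def check_short_branch(operand_address, target):
    """Return the one byte branch operand, or raise if the branch is illegal.

    operand_address is the location of the branch address byte; target is
    the full 16 bit destination. Both names are illustrative.
    """
    if (operand_address >> 8) != (target >> 8):
        raise ValueError("illegal short branch across a page boundary")
    return target & 0xFF


print(hex(check_short_branch(0x01FE, 0x0110)))
```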
New instructions can be added to the repertoire of the tse computer
by defining additional macros for them. As an example, consider a tse
mix operation that combines subfields from the accumulator and another
source register to form the resultant image. This operation is given
the symbolic name TMIX and assigned three arguments. The first argument
TABLE 7.5
SPECIAL TSE COMPUTER INSTRUCTIONS

Instruction                    Mnemonic   Operation

PSEUDO OPERATIONS
ORIGIN                         ORG        Specifies program starting address
END                            END        Marks the end of a source program
DATA BYTE                      DB         Places a data byte in the object file
DATA WORD                      DW         Places a data word (two bytes) in the
                                          object file
DATA STORAGE                   DS         Reserves a set of memory locations
                                          for data storage

MACRO OPERATIONS
SUBROUTINE CALL                CALL       Sets P to 4 to initiate the standard
                                          subroutine call procedure
RETURN FROM SUBROUTINE         RSR        Sets P to 5 to initiate the standard
                                          subroutine return procedure
LOAD IMMEDIATE, REGISTER       LOAD       Loads the given 16 bit constant into
                                          the specified COSMAC register
LONG DELAY, RETURN TO R(3)     LDLY       Delay 8N+22 cycles
LONG DELAY, RETURN TO          MLDLY      Delay 8N+30 cycles
CALLING PROGRAM COUNTER
DELAY 3                        DLY3       Delay 3 cycles
DELAY 4                        DLY4       Delay 4 cycles
DELAY 6                        DLY6       Delay 6 cycles
DELAY 9                        DLY9       Delay 9 cycles
DELAY 10                       DLY10      Delay 10 cycles
DELAY 12                       DLY12      Delay 12 cycles
DELAY 15                       DLY15      Delay 15 cycles
DELAY 18                       DLY18      Delay 18 cycles
is REG1 which specifies the destination register. The second argument,
REG2, identifies the source register that is to be mixed with the accu-
mulator. A third argument, MASK, is required to specify the subfields
of REG2 that should appear in the resultant image. The switching
expression for this operation is
$[(\mathrm{MASK}) \cdot (\mathrm{REG2})] + [(\overline{\mathrm{MASK}}) \cdot (\mathrm{A})] \rightarrow \mathrm{REG1}$.   (7)
Figure 7.14 provides a listing of the macro definition for the TMIX
instruction.
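
The switching expression for TMIX can be illustrated with a small Python sketch that models each tse image as an integer whose bits stand for array points. This bit-vector model, the function name, and the example bit patterns are assumptions made for illustration; they are not the hardware representation.

```python
def tmix(reg2, a, mask, width):
    """Evaluate [(MASK).(REG2)] + [(NOT MASK).(A)] on width-bit images."""
    full = (1 << width) - 1
    # Points selected by the mask come from REG2; all others come from A.
    return (mask & reg2 & full) | (~mask & a & full)


# Six point example: the mask selects points 5 and 2 (hypothetical subfield).
result = tmix(reg2=0b111000, a=0b000111, mask=0b100100, width=6)
print(format(result, "06b"))
```

With an all one mask the result is simply REG2, and with an all zero mask it is the accumulator, which is the behavior expected of a subfield mix.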
The TMIX instruction is written to execute with R(3) as the pro-
gram counter. Mask, register output, and port six control bytes are
output as immediate data to gate the specified subfields of REG2 into
the ALU output latch. A long delay subroutine, LDLY, (Figure D.5,
Appendix D) is then called to insert a delay which allows the image
to propagate through the ALU. After nine gate delays the ALU output
image will consist of the specified subfields of REG2 and zeros in the
unspecified subfields. This image is temporarily stored in the ALU
output latch, instead of being sent to the destination register.
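
The delay constant passed to LDLY can be estimated from the 8N+22 cycle formula in Table 7.5. The sketch below assumes that those cycles are 1802 machine cycles and that one gate delay corresponds to 2000 machine cycles; under these assumptions it reproduces the constants 2247, 997, and 247 that appear in the comments of the TMIX macro listing. The function name is illustrative.

```python
CYCLES_PER_GATE_DELAY = 2000  # assumed: 5 ms gate delay at 3.2 MHz


def ldly_constant(gate_delays):
    """Nearest N such that LDLY's 8N+22 cycles spans the requested delay."""
    cycles = gate_delays * CYCLES_PER_GATE_DELAY
    return round((cycles - 22) / 8)


for g in (9, 4, 1):
    n = ldly_constant(g)
    print(g, n, oct(n), 8 * n + 22)
```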
To complete the TMIX operation, the contents of register A
should be ANDed with the complement of the subfield mask specified in
the TMIX macro call and ORed with the current contents of the ALU
output latch. Then the resultant image should be stored in the
destination register, REG1. These operations can be executed under control
of the general ALU operations microprogram (Figure D.1, Appendix D).
A SEP R9 instruction byte is included in the TMIX instruction opera-
tional code to initiate the microprogram call. Conditional assembly
        .MACRO  TMIX REG1, REG2, MASK   ; MIX REGISTER WITH A
        .NLIST  SRC
        .BYTE   ^O141                   ; OUT 1
        .BYTE   MASK                    ; MASK CONTROL BYTE
        .BYTE   ^O142                   ; OUT 2
        .BYTE   <^O017&REG2>            ; REGISTER OUTPUT CONTROL BYTE
        .BYTE   ^O146                   ; OUT 6
        .BYTE   ^B00000001              ; PORT SIX CONTROL BYTE, P61 ON
        .BYTE   ^O327                   ; LDLY 2247.
        DW      ^O4307                  ; DELAY NINE GATE DELAYS
        .BYTE   ^O147                   ; OUT 7
        .BYTE   ^B00000011              ; TURN ALU OUTPUT LATCH ON
        .BYTE   ^O327                   ; LDLY 997.
        DW      ^O1745                  ; DELAY FOUR GATE DELAYS
        .BYTE   ^O147                   ; OUT 7
        .BYTE   ^B00000001              ; TURN AND GATE AT OUTPUT OF ALU OFF
        .BYTE   ^O327                   ; LDLY 247.
        DW      ^O0367                  ; DELAY ONE GATE DELAY
        .BYTE   ^O331                   ; SEP R9
        .IIF EQ, MASK,      .BYTE ^O300 ; CONDITIONAL MASK
        .IIF EQ, MASK-^O14, .BYTE ^O000 ; CONTROL BYTES
        .IIF EQ, MASK-^O11, .BYTE ^O320
        .IIF EQ, MASK-^O12, .BYTE ^O340
        .IIF EQ, MASK-^O17, .BYTE ^O260
        .IIF EQ, MASK-^O13, .BYTE ^O360
        .IIF EQ, MASK-^O16, .BYTE ^O240
        .IIF EQ, MASK-^O15, .BYTE ^O220
        .BYTE   ^B00000001              ; REGISTER OUTPUT CONTROL BYTE
        .BYTE   ^B00000001              ; PORT SIX CONTROL BYTE, P61 ON
        .BYTE   REG1                    ; REGISTER INPUT CONTROL BYTE NUMBER 1
        .BYTE                           ; REGISTER INPUT CONTROL BYTE NUMBER 2
        .LIST   SRC
        .ENDM   TMIX
Figure 7.14 A macro definition for the TMIX instruction.
statements are used to establish the correct mask byte. The register
output, port six, and register input bytes required by the general
ALU operations microprogram are also included in the TMIX instruction
operational code.
A timing diagram which illustrates the control signals for the
TMIX instruction is shown in Figure 7.15. Thirty-four gate delays
are required to execute the TMIX instruction. This operation could
be performed by a sequence of basic tse instructions. For example,
TANI C,B,M1
TORA C,M23
is equivalent to
TMIX C,B,M1 .
The advantage of using the microprogrammed TMIX instruction is a
reduction in execution time.
Application Program Examples
The tse computer can perform a variety of Golay transforms. Fig-
ures G.1 and G.2, Appendix G, are listings of tse computer programs
for performing the Golay transform skeletonizing and swelling algorithms,
respectively. The skeletonizing program is 129 bytes long and requires
607 unit gate delays to process a simple image. The swelling program
is slightly more complex but requires only 147 bytes of program storage.
One iteration of the swelling algorithm can be performed in 664 unit
gate delays. Neither of these programs utilizes tse register B.
Therefore, register B can be used for temporary storage of another image.
The simplicity of these programs is indicative of the power of the tse
computer instruction set and CPU organization.
[Timing diagram: traces Q, P11-P18, P21-P28, P31-P34, P35-P38, P41-P48,
P61, P71, P72, and SAMPLE FLAGS against TIME, with the time axis marked
from 0 to 30.]
Figure 7.15 A timing diagram for the TMIX instruction.
Tse Computer Performance Evaluation
Classically, the evaluation of computer performance has proven
to be a difficult problem which is highly dependent upon the task
that is assigned to the computer. At this time, the performance of
the special purpose Golay transform tse computer can best be evaluated
by considering the skeletonizing algorithm. Table 7.6 summarizes the
performance characteristics of the tse computer in the skeletonizing
machine application.
The low hardware cost and relatively low data rate of this machine
are primarily due to the use of a comparison type index recognition
circuit. The ability of the tse computer to perform a variety of
Golay transforms under program control is the primary advantage of
the machine.
TABLE 7.6
PERFORMANCE OF THE TSE COMPUTER AS A SKELETONIZING MACHINE

Parameter                                       Value

Cost Function                                   97A+69B
Number of Control Signals                       35
Total Component Count                           166
Gate Delays per Iteration (Time in Seconds)     607 (3.04)
Data Rate, Simple Images per Minute             19.77
Average Power Consumption in Watts              204.96
Peak Power Consumption in Watts                 222
Speed-Power Product in Watt-Seconds             622.07
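
The derived entries of Table 7.6 can be approximately reproduced from the gate delay count and the average power. The following Python sketch assumes a gate delay of exactly five milliseconds; under that assumption the results agree with the tabulated values to within rounding. The function name is illustrative.

```python
GATE_DELAY_S = 5e-3  # assumed tse gate propagation delay in seconds


def skeletonizing_performance(gate_delays, average_power_w):
    """Return (seconds per iteration, images per minute, watt-seconds)."""
    seconds = gate_delays * GATE_DELAY_S
    images_per_minute = 60.0 / seconds
    speed_power_product = average_power_w * seconds
    return seconds, images_per_minute, speed_power_product


seconds, rate, product = skeletonizing_performance(607, 204.96)
print(round(seconds, 3), round(rate, 2), round(product, 2))
```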
CHAPTER 8
CONCLUSION
The tse logic devices proposed by Schaefer and Strong [1] have
been used to develop hardwired, pipelined, and programmable architec-
tures for Golay transform processing machines. These machines illus-
trate that tse logic circuits can perform useful image processing
algorithms which have not been optimized for tse logic processing.
The key step toward performing Golay transforms with tse logic was
the development of the Golay neighbor planes generator circuit. This
circuit facilitates tse logic implementations of the index recognition
operation. Because the hardware cost and processing rate of a Golay
transform machine are highly dependent upon the index recognition
circuit, several alternate realizations of the index recognition opera-
tion were developed. In addition, several new tse logic devices were
proposed, and a set of performance evaluation parameters was developed
to aid in the design of tse logic circuits. Techniques were also
developed for controlling tse logic circuits with conventional logic
control units constructed from a microcomputer or from programmable logic
arrays.
A Critique of Tse Logic
The major advantages of tse logic devices are their ability to
operate on a large number of data points simultaneously and the high
data processing rates which can potentially be achieved due to this
parallelism. The fan-in and fan-out limitations of the devices are
drawbacks since they tend to increase both the hardware cost and
propagation delay of a tse logic circuit. Perhaps the most serious
disadvantage of current tse logic circuits is the large number of
device-to-device interconnections which are required. In conventional
integrated logic circuits, manufacturing costs and failure rates
both increase substantially with any increase in the number of external
	connections required by the circuit. Although improved interconnection
techniques will have to be developed for tse logic circuits, there is
currently no evidence to support a theory that the cost and reliability
of tse logic circuits will not be heavily dependent upon the device-
to-device interfaces. In fact, since the same basic integrated circuit
technology is used to fabricate both conventional logic and active
tse logic devices, the cost and failure rate characteristics of the
conventional devices can be expected to prevail.
Suggested Directions for Future Research
One method of reducing the number of device-to-device interfaces
in a tse logic circuit is to increase the functional complexity of the
individual tse logic devices. This technique has been used successfully
in conventional logic and should be investigated for possible use in
tse logic. A projection of the integrated circuit complexity, which
should be realizable by a target date such as 1985, would aid tse device
designers in the task of partitioning complex logic functions into
individual circuits. As an example of the usefulness of this approach,
consider the potential advantage of an integrated tse latch or read-
write memory. Three active and two passive tse devices are currently
required to construct the simplest tse latch. An integrated tse
latch would reduce the number of device-to-device interfaces from
eight to two. In addition, the integrated tse latch could potentially
reduce the propagation delay, power consumption, and size of the tse
latch.
Although the electro-optical family of tse logic devices was
used for all the design examples in this dissertation, the basic
designs and design principles described here are not dependent upon
the signal transfer technique. Because of the low efficiency which
is characteristic of electro-optical interfaces, additional signal
transfer techniques should be investigated.
An alternate technique for achieving high speed parallel processing
by using arrays of microprocessors or programmable logic circuits
should also be investigated. The possible advantages of this technique
include a reduction in the number of different integrated circuits
which must be fabricated and reduced device-to-device interface
complexity. The possibility of a reduction in interface complexity
is projected because of the bus oriented structure of microprocessors.
As a long term project, research should be conducted on the use
of two-dimensional tse logic control units for tse circuits. If the
full potential of a completely two-dimensional computer can be realized,
a revolutionary advancement over the large scale computing capabilities
of today's computers could be expected.
REFERENCES
[1] D. H. Schaefer and J. P. Strong, Tse Computers, X-943-75-14,
Goddard Space Flight Center, 1975.
[2] S. H. Unger, "A Computer Oriented Toward Spatial Problems,"
Proceedings of IRE, vol. 46, pp. 1744-1750, Oct. 1958.
[3] S. H. Unger, "Pattern Detection and Recognition," Proceedings
of IRE, vol. 47, pp. 1737-1751, Oct. 1959.
[4] D. L. Slotnick, W. C. Borck, and R. C. McReynolds, "The Solomon
Computer," A.F.I.P.S. Proceedings, First Fall Computer Conference,
pp. 97-107, 1962.
[5] D. Lewin, Theory and Design of Digital Computers, London: Thomas
Nelson and Sons Ltd., 1972, pp. 308-3 .
[6] J. C. Murtha, Advances in Computers, vol. 7, New York: Academic
Press, 1966, pp. 10-22.
[7] "Solomon II-Parallel Network Processor," Westinghouse Electric
Corp. Report No. 1869A, Aerospace Division, Baltimore, Maryland,
1964.
[8] G. H. Barnes, et al., "The Illiac IV Computer," IEEE Trans. on
Computers, vol. C-17, pp. 746-757, Aug. 1968.
[9] D. J. Kuck, "Illiac IV Software and Application Programming,"
IEEE Trans. on Computers, vol. C-17, pp. 758-770, Aug. 1968.
[10] M. J. E. Golay, "Hexagonal Parallel Pattern Transformations,"
IEEE Trans. on Computers, vol. C-18, pp. 733-739, Aug. 1969.
[11] K. Preston, "Feature Extraction by Golay Hexagonal Pattern
Transforms," IEEE Trans. on Computers, vol. C-20, pp. 1007-
1014, Sept. 1971.
[12] B. Kruse, "A Parallel Picture Processing Machine," IEEE Trans.
on Computers, vol. C-22, pp. 1075-1086, Dec. 1973.
[13] C. D. Stamopoulos, "Parallel Image Processing," IEEE Trans. on
Computers, vol. C-24, pp. 424-433, April 1975.
[14] W. H. Ware, "The Ultimate Computer," IEEE Spectrum, vol. 9,
pp. 84-87, March 1972.
[15] Z. Kohavi, Switching and Finite Automata Theory, New York: McGraw-
Hill, 1970, p. 57.
[16] "Laser Used as Micro-Welder," Electro-Optical Systems Design,
vol. 7, p. 6, Jan. 1975.
[17] D. H. Schaefer, Personal communication.
[18] J. B. Peatman, The Design of Digital Systems, New York:
McGraw-Hill, 1970, pp. 216-2z].
[19] User Manual for the CDP1802 COSMAC Microprocessor, RCA Solid
State Division, Somerville, N.J., 1976.
[20] RT-11 System Reference Manual, Digital Equipment Corporation,
Maynard, Mass., 1975.
[21] G. A. Korn, Minicomputers for Engineers and Scientists, New York:
McGraw-Hill, 1973, pp. 123-130.
APPENDIXES
APPENDIX A
SCHEMATIC SYMBOLS FOR TSE LOGIC DEVICES
Table A.1 lists the tse logic devices and their schematic
symbols. To simplify schematic drawings, three different symbols
are used for the interleaver. Note that the input and output surfaces
are reversed when the interleaver is used as a combiner rather than
as a duplicator. Also, note that when the slide gates move an image more
than one matrix position, a number should be included within the slide
gate symbol to indicate the extent of the slide.
TABLE A.1
SCHEMATIC SYMBOLS FOR TSE LOGIC DEVICES

Device                         Symbol

ACTIVE DEVICES
AND                            (drawing not reproduced)
OR                             +
NEGATE                         O
EXCLUSIVE-OR                   (drawing not reproduced)
REFORMAT                       (drawing not reproduced)
TOTAL SPILLER                  (drawing not reproduced)
CONTRACTOR                     CNT
ROM                            (drawing not reproduced)

PASSIVE DEVICES
INTERLEAVER AS A COMBINER      (drawing not reproduced)
INTERLEAVER AS A DUPLICATOR    (drawing not reproduced)
SLIDE UP                       SU
SLIDE DOWN                     SD
SLIDE RIGHT                    SR
SLIDE LEFT                     SL
EXCHANGE                       A B
IMAGE BUS-LONG                 L
IMAGE BUS-SHORT                S
APPENDIX B
TSE MASK PATTERNS
Table B.1 defines the mask patterns for the tse read-only memories
and programmed output active devices. The ROMs are programmed to
produce logic one outputs at the array positions specified as active
and logic zero outputs elsewhere. In the case of programmed output
active devices, the array outputs which are specified as active are
normal outputs which can produce a logic one or a logic zero state
that is a function of the inputs to the active device. The remaining
array outputs of the programmed output active devices are disabled
so that they always produce a logic zero output.
The notation $G_S^T$ is used to specify points in the S Golay subfields
of the array where the array is divided into three, four, or seven
subfields as specified by the type superscript, T. For example,
$G_{23}^{3}$ is the set of all points in the second and third subfields of
an image where the image has been partitioned into three Golay subfields.
TABLE B.1
TSE MASK PATTERNS

Name                             Symbol   Active Elements of an n x n
                                          Array with General Element $a_{ij}$

ALL LOGIC ONE TSE                M        All
ODD TSE MASK                     MO       $a_{ij}$ where i = 0, 2, 4, 6, ...
EVEN TSE MASK                    ME       $a_{ij}$ where i = 1, 3, 5, 7, ...
ALL LOGIC ZERO TSE               M0       None
GOLAY SUBFIELD ONE MASK          M1       $a_{ij} \in G_1^3$
GOLAY SUBFIELD TWO MASK          M2       $a_{ij} \in G_2^3$
GOLAY SUBFIELD THREE MASK        M3       $a_{ij} \in G_3^3$
GOLAY SUBFIELDS ONE AND          M12      $a_{ij} \in G_{12}^3$
TWO MASK
GOLAY SUBFIELDS ONE AND          M13      $a_{ij} \in G_{13}^3$
THREE MASK
GOLAY SUBFIELDS TWO AND          M23      $a_{ij} \in G_{23}^3$
THREE MASK
APPENDIX C
THE CDP1802 MICROPROCESSOR
The CDP1802 COSMAC microprocessor [19] is a byte-oriented
central processing unit constructed as a complementary-symmetry
MOS integrated circuit. A block diagram of the internal structure
of the 1802 (Figure C.1) shows that the COSMAC architecture is based
on an array of 16 general-purpose 16-bit scratch-pad registers. These
general-purpose registers are connected to a common bus and can be
selected by the four-bit N, P, and X registers to perform specific
tasks. The scratch-pad registers can be used as program counters,
data address pointers, general-purpose counters, and temporary
data storage locations. High and low bytes of the scratch-pad
registers can also be gated between the register array and the eight-
bit D register which functions as an accumulator.
One of the outstanding features of the COSMAC architecture is that
any of the scratch-pad registers can be used as the program counter.
This permits a very fast and efficient subroutine call which is
performed by a one-byte set P instruction that simply specifies a
new program counter. The tse computer control subroutines are called
by set P instructions.
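
The set P calling mechanism can be modeled with a few lines of Python. The class and the addresses below are purely illustrative; the sketch only shows that changing the four-bit P register switches which scratch-pad register acts as the program counter, so that SEP R9 enters a control microprogram and SEP R3 resumes the application program.

```python
class Cosmac:
    """Minimal model of the 1802 register array and the P register."""

    def __init__(self):
        self.r = [0] * 16   # sixteen 16-bit scratch-pad registers
        self.p = 0          # four-bit program counter selector

    def sep(self, n):
        """SEP Rn: make R(n) the program counter."""
        self.p = n & 0xF

    def pc(self):
        """Current program counter value, R(P)."""
        return self.r[self.p]


cpu = Cosmac()
cpu.r[3] = 0x0100   # application program counter (hypothetical address)
cpu.r[9] = 0x0400   # entry of a control microprogram (hypothetical address)
cpu.p = 3

cpu.sep(9)          # call: execution continues at R(9)
called_at = cpu.pc()
cpu.sep(3)          # return: execution resumes at R(3)
returned_to = cpu.pc()
print(hex(called_at), hex(returned_to))
```

No return address is saved or restored in this scheme; the application program counter R(3) is simply left untouched while another register serves as the program counter.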
Another important feature of the 1802 is a flexible input-output
structure. Four EF flags which can be used as one bit inputs are
included in the CPU. These flags can be tested by 1802 branch
instructions and are used to check the status of the tse computer. The Q flag
functions as a single bit output which can be set, reset, and tested by
[Block diagram: memory address multiplexer (MUX), I/O command lines,
bi-directional data bus, control logic, the N, P, X, and T registers,
register select decoder, the scratch-pad register array R(0).1/R(0).0
through R(F).1/R(F).0, the D register, the DF flag, and the 8-bit bus.]
Figure C.1 Internal structure of the CDP1802
COSMAC microprocessor.
(Courtesy of Solid State Division, RCA Corporation)
the 1802 CPU. The tse computer input data path is controlled by the
Q line. In addition to this on-chip 1/0, the 1802 includes a set
of memory-oriented 1/0 instructions which are used to provide control
signals to the tse computer.
A summary of the COSMAC 1802 microprocessor instruction set is
given in Table C.1. The notation R(W) indicates the register designated
by W where W is N, X, or P. When the low order or high order bytes of
R(W) are referenced individually, the notation R(W).0 refers to the low
order byte while R(W).1 denotes the high order byte. As an example
of the operation notation, the symbols
D->M(R(X)); R(X)-1
mean that the contents of register D are stored in the memory location
pointed to by the register selected by the current contents of X and
that the register specified by X is decremented by one.
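The operation notation can be read as a sequence of register transfers. The following Python sketch models three of the operations of Table C.1. The class and method names are assumptions chosen for illustration, and only the parts of the machine needed for these operations are modeled.

```python
class Cosmac:
    """Minimal register-transfer model (illustrative, not a full 1802)."""

    def __init__(self):
        self.r = [0] * 16            # 16-bit scratch-pad registers R(0)..R(F)
        self.d = 0                   # 8-bit D register
        self.x = 0                   # 4-bit X register
        self.mem = bytearray(65536)  # byte-addressed memory M

    def stxd(self):
        """STXD: D->M(R(X)); R(X)-1."""
        self.mem[self.r[self.x]] = self.d
        self.r[self.x] = (self.r[self.x] - 1) & 0xFFFF

    def glo(self, n):
        """GLO: R(N).0->D (low order byte)."""
        self.d = self.r[n] & 0xFF

    def ghi(self, n):
        """GHI: R(N).1->D (high order byte)."""
        self.d = (self.r[n] >> 8) & 0xFF


cpu = Cosmac()
cpu.x = 2                # X selects R(2) as the data pointer
cpu.r[2] = 0x10FF
cpu.d = 0xA5
cpu.stxd()               # store D at address 0x10FF, then decrement R(2)
```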
Several features of the 1802 instruction set should be noted.
First, all COSMAC instructions except the long branch, long skip,
and NOP instructions execute in two machine cycles consisting of eight
states each. The long branch, long skip, and NOP instructions execute
in three machine cycles. This feature simplifies the realization of
precisely timed control signals. Second, most of the 1802 instructions
require only a simple one-byte operational code, which conserves memory.
Third, all the logic and arithmetic operations utilize the contents of
D and the contents of memory as operands. Data stored in scratch-pad
registers cannot be operated on by these instructions. Finally, note
that the memory address required by the I/O instructions is obtained from
TABLE C.1
CDP1802 MICROPROCESSOR INSTRUCTION SET [19]

Instruction                             Mnemonic  Operation

MEMORY REFERENCE
LOAD VIA N                              LDN       M(R(N))->D; FOR N NOT 0
LOAD ADVANCE                            LDA       M(R(N))->D; R(N)+1
LOAD VIA X                              LDX       M(R(X))->D
LOAD VIA X AND ADVANCE                  LDXA      M(R(X))->D; R(X)+1
LOAD IMMEDIATE                          LDI       M(R(P))->D; R(P)+1
STORE VIA N                             STR       D->M(R(N))
STORE VIA X AND DECREMENT               STXD      D->M(R(X)); R(X)-1

REGISTER OPERATIONS
INCREMENT REG N                         INC       R(N)+1
DECREMENT REG N                         DEC       R(N)-1
INCREMENT REG X                         IRX       R(X)+1
GET LOW REG N                           GLO       R(N).0->D
PUT LOW REG N                           PLO       D->R(N).0
GET HIGH REG N                          GHI       R(N).1->D
PUT HIGH REG N                          PHI       D->R(N).1

LOGIC OPERATIONS
OR                                      OR        M(R(X)) OR D->D
OR IMMEDIATE                            ORI       M(R(P)) OR D->D; R(P)+1
EXCLUSIVE OR                            XOR       M(R(X)) XOR D->D
EXCLUSIVE OR IMMEDIATE                  XRI       M(R(P)) XOR D->D; R(P)+1
AND                                     AND       M(R(X)) AND D->D
AND IMMEDIATE                           ANI       M(R(P)) AND D->D; R(P)+1
SHIFT RIGHT                             SHR       SHIFT D RIGHT, LSB(D)->DF, 0->MSB(D)
SHIFT RIGHT WITH CARRY                  SHRC      SHIFT D RIGHT, LSB(D)->DF, DF->MSB(D)
RING SHIFT RIGHT                        RSHR      (SAME AS SHRC)
SHIFT LEFT                              SHL       SHIFT D LEFT, MSB(D)->DF, 0->LSB(D)
SHIFT LEFT WITH CARRY                   SHLC      SHIFT D LEFT, MSB(D)->DF, DF->LSB(D)
RING SHIFT LEFT                         RSHL      (SAME AS SHLC)

CONTROL INSTRUCTIONS
IDLE                                    IDL       WAIT FOR DMA OR INTERRUPT; M(R(0))->BUS
NO OPERATION                            NOP       CONTINUE
SET P                                   SEP       N->P
SET X                                   SEX       N->X
SET Q                                   SEQ       1->Q
RESET Q                                 REQ       0->Q
SAVE                                    SAV       T->M(R(X))
PUSH X,P TO STACK                       MARK      (X,P)->T; (X,P)->M(R(2))
                                                  THEN P->X; R(2)-1
RETURN                                  RET       M(R(X))->(X,P); R(X)+1; 1->IE
DISABLE                                 DIS       M(R(X))->(X,P); R(X)+1; 0->IE

BRANCH INSTRUCTIONS - SHORT BRANCH
SHORT BRANCH                            BR        M(R(P))->R(P).0
NO SHORT BRANCH (SEE SKP)               NBR       R(P)+1
SHORT BRANCH IF D=0                     BZ        IF D=0, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF D NOT 0                 BNZ       IF D NOT 0, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF DF=1                    BDF       IF DF=1, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF POS OR ZERO             BPZ       (SAME AS BDF)
SHORT BRANCH IF EQUAL OR GREATER        BGE       (SAME AS BDF)
SHORT BRANCH IF DF=0                    BNF       IF DF=0, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF MINUS                   BM        (SAME AS BNF)
SHORT BRANCH IF LESS                    BL        (SAME AS BNF)
SHORT BRANCH IF Q=1                     BQ        IF Q=1, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF Q=0                     BNQ       IF Q=0, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF EF1=1                   B1        IF EF1=1, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF EF1=0                   BN1       IF EF1=0, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF EF2=1                   B2        IF EF2=1, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF EF2=0                   BN2       IF EF2=0, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF EF3=1                   B3        IF EF3=1, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF EF3=0                   BN3       IF EF3=0, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF EF4=1                   B4        IF EF4=1, M(R(P))->R(P).0 ELSE R(P)+1
SHORT BRANCH IF EF4=0                   BN4       IF EF4=0, M(R(P))->R(P).0 ELSE R(P)+1

INPUT - OUTPUT BYTE TRANSFER
OUTPUT 1                                OUT 1     M(R(X))->BUS; R(X)+1; N LINES=1
OUTPUT 2                                OUT 2     M(R(X))->BUS; R(X)+1; N LINES=2
OUTPUT 3                                OUT 3     M(R(X))->BUS; R(X)+1; N LINES=3
OUTPUT 4                                OUT 4     M(R(X))->BUS; R(X)+1; N LINES=4
OUTPUT 5                                OUT 5     M(R(X))->BUS; R(X)+1; N LINES=5
OUTPUT 6                                OUT 6     M(R(X))->BUS; R(X)+1; N LINES=6
OUTPUT 7                                OUT 7     M(R(X))->BUS; R(X)+1; N LINES=7
INPUT 1                                 INP 1     BUS->M(R(X)); BUS->D; N LINES=1
INPUT 2                                 INP 2     BUS->M(R(X)); BUS->D; N LINES=2
INPUT 3                                 INP 3     BUS->M(R(X)); BUS->D; N LINES=3
INPUT 4                                 INP 4     BUS->M(R(X)); BUS->D; N LINES=4
INPUT 5                                 INP 5     BUS->M(R(X)); BUS->D; N LINES=5
INPUT 6                                 INP 6     BUS->M(R(X)); BUS->D; N LINES=6
INPUT 7                                 INP 7     BUS->M(R(X)); BUS->D; N LINES=7

BRANCH INSTRUCTIONS - LONG BRANCH
LONG BRANCH                             LBR       M(R(P))->R(P).1; M(R(P)+1)->R(P).0
NO LONG BRANCH (SEE LSKP)               NLBR      R(P)+2
LONG BRANCH IF D=0                      LBZ       IF D=0, M(R(P))->R(P).1; M(R(P)+1)->R(P).0
                                                  ELSE R(P)+2
LONG BRANCH IF D NOT 0                  LBNZ      IF D NOT 0, M(R(P))->R(P).1; M(R(P)+1)->R(P).0
                                                  ELSE R(P)+2
LONG BRANCH IF DF=1                     LBDF      IF DF=1, M(R(P))->R(P).1; M(R(P)+1)->R(P).0
                                                  ELSE R(P)+2
LONG BRANCH IF DF=0                     LBNF      IF DF=0, M(R(P))->R(P).1; M(R(P)+1)->R(P).0
                                                  ELSE R(P)+2
LONG BRANCH IF Q=1                      LBQ       IF Q=1, M(R(P))->R(P).1; M(R(P)+1)->R(P).0
                                                  ELSE R(P)+2
LONG BRANCH IF Q=0                      LBNQ      IF Q=0, M(R(P))->R(P).1; M(R(P)+1)->R(P).0
                                                  ELSE R(P)+2

SKIP INSTRUCTIONS
SHORT SKIP (SEE NBR)                    SKP       R(P)+1
LONG SKIP (SEE NLBR)                    LSKP      R(P)+2
LONG SKIP IF D=0                        LSZ       IF D=0, R(P)+2 ELSE CONTINUE
LONG SKIP IF D NOT 0                    LSNZ      IF D NOT 0, R(P)+2 ELSE CONTINUE
LONG SKIP IF DF=1                       LSDF      IF DF=1, R(P)+2 ELSE CONTINUE
LONG SKIP IF DF=0                       LSNF      IF DF=0, R(P)+2 ELSE CONTINUE
LONG SKIP IF Q=1                        LSQ       IF Q=1, R(P)+2 ELSE CONTINUE
LONG SKIP IF Q=0                        LSNQ      IF Q=0, R(P)+2 ELSE CONTINUE
LONG SKIP IF IE=1                       LSIE      IF IE=1, R(P)+2 ELSE CONTINUE

ARITHMETIC OPERATIONS
ADD                                     ADD       M(R(X))+D->DF, D
ADD IMMEDIATE                           ADI       M(R(P))+D->DF, D; R(P)+1
ADD WITH CARRY                          ADC       M(R(X))+D+DF->DF, D
ADD WITH CARRY, IMMEDIATE               ADCI      M(R(P))+D+DF->DF, D; R(P)+1
SUBTRACT D                              SD        M(R(X))-D->DF, D
SUBTRACT D IMMEDIATE                    SDI       M(R(P))-D->DF, D; R(P)+1
SUBTRACT D WITH BORROW                  SDB       M(R(X))-D-(NOT DF)->DF, D
SUBTRACT D WITH BORROW, IMMEDIATE       SDBI      M(R(P))-D-(NOT DF)->DF, D; R(P)+1
SUBTRACT MEMORY                         SM        D-M(R(X))->DF, D
SUBTRACT MEMORY, IMMEDIATE              SMI       D-M(R(P))->DF, D; R(P)+1
SUBTRACT MEMORY WITH BORROW             SMB       D-M(R(X))-(NOT DF)->DF, D
SUBTRACT MEMORY WITH BORROW, IMMEDIATE  SMBI      D-M(R(P))-(NOT DF)->DF, D; R(P)+1
R(X). Since the contents of register X can be changed by a one-byte
set X instruction, data can be output efficiently from calling programs,
data storage areas in memory, and from immediate data bytes. This feature
is utilized extensively in the tse computer control programs, which
output both immediate data and data obtained from the calling program.
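The immediate-data case can be traced with a short Python sketch. Setting X equal to the current program counter makes the OUT instruction transmit the byte that follows it and step over that byte. The function name and the dictionary-based memory are assumptions for illustration. The three bytes are the octal values listed at locations 010011 through 010013 of the general ALU operations control program in Appendix D, and only the SEX and OUT instructions are modeled.

```python
def run(mem, start, steps, p=9, x=3):
    """Execute `steps` SEX/OUT instructions with R(P) as the program counter.

    Returns the (port, byte) pairs placed on the bus and the final program counter.
    """
    r = [0] * 16
    r[p] = start
    bus = []
    for _ in range(steps):
        opcode = mem[r[p]]
        r[p] += 1                            # instruction fetch advances R(P)
        high, n = opcode >> 4, opcode & 0x0F
        if high == 0xE:                      # SEX: N->X
            x = n
        elif high == 0x6 and 1 <= n <= 7:    # OUT n: M(R(X))->BUS; R(X)+1
            bus.append((n, mem[r[x]]))
            r[x] += 1
        else:
            raise ValueError(f"opcode {opcode:03o} is not modeled in this sketch")
    return bus, r[p]


# SEX R9 (351), OUT 7 (147), immediate data byte 003, all octal.
memory = {0o010011: 0o351, 0o010012: 0o147, 0o010013: 0o003}
bus, next_pc = run(memory, 0o010011, steps=2)
```

Because X and P select the same register, the increment performed by OUT also moves the program counter past the data byte, so execution resumes at the next instruction.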
In some applications a subroutine is called from several distinct
programs which may use different program counters. The set P subroutine
call technique is unsatisfactory for these applications because the
subroutine cannot easily determine which register was the calling program
counter. Two alternate subroutine call procedures are provided to
simplify this type of subroutine call. The first procedure is a MARK
subroutine technique [19] in which the MARK instruction is used to
save the current value of X and P in a software stack. This method
permits the use of nested subroutines where the nesting order varies
dynamically. The second procedure is the standard call and return
technique [19] which uses two linking subroutines to control the call
and return processes. The standard call and return technique is the
most advanced call and return method. Advantages of the standard call
and return technique include unlimited subroutine nesting capability and
maximum flexibility in storing scratch-pad registers. In the standard
subroutine call and return technique, registers four and five are
assumed to point to the linking call subroutine and the linking return
subroutine, respectively. A call is initiated by setting P to four.
The address of the called subroutine is specified by two data bytes
which should follow the set P instruction. Returns are initiated by
setting P to five. Except during the actual call and return operations,
both main programs and subroutines which utilize the standard call and
return technique execute with register three as the program counter.
All three subroutine call procedures are used in the tse computer
programs to maximize their efficiency.
The standard subroutine call and return technique requires some
of the scratch-pad registers to be dedicated to particular functions.
The tse computer control programs also assign particular functions to
certain scratch-pad registers. Table C.2 lists the functions assigned
to the COSMAC registers in the tse computer control unit application.
TABLE C.2
COSMAC CDP1802 REGISTER ASSIGNMENTS

Register  Function
R(0)      DMA Address Register
R(1)      Interrupt Service Program Counter
R(2)      Stack Pointer
R(3)      Main Program Counter
R(4)      Dedicated Program Counter for the Call Subroutine
R(5)      Dedicated Program Counter for the Return Subroutine
R(6)      Pointer to the Return Location and Arguments Passed to the
          Called Subroutine
R(7)      Dedicated Program Counter for the Long Delay Subroutine with
          R(3) as the Calling Program Counter
R(8)      Scratch-Pad Register used by the Long Delay Subroutine
R(9)      Dedicated Program Counter for the General ALU Operations
          Control Program
R(A)      Dedicated Program Counter for the Index Recognition Control
          Program
R(B)      Unassigned
R(C)      Dedicated Program Counter for the Compare Operations Control
          Program
R(D)      Dedicated Program Counter for the Input Control Program
R(E)      Dedicated Program Counter for the Long Delay Subroutine for
          Variable Calling Program Counters
R(F)      Unassigned
APPENDIX D
CONTROL MICROPROGRAM FOR THE TSE COMPUTER
This appendix lists four microprograms which control execution
of the basic tse instruction set of the Golay transform tse computer.
Two long delay subroutines are also listed. The total length of
the microprograms presented in this appendix is 238 bytes.
TSE COMPUTER GENERAL ALU OPERATIONS CONTROL PROGRAM
ALL INSTRUCTIONS EXCEPT TCLRI, TIDA, TCNT, TTEST,
TCMP, TCPI, TBZ, TBNZ, TLBZ, TLBNZ AND TIN.
R9 FUNCTIONS AS THE PROGRAM COUNTER

010000  323                  EXALU:  SEP    R3
010001  141                  ALUOP:  OUT    1              ; OUTPUT MASK CONTROL BYTE
010002  142                          OUT    2              ; OUTPUT REGISTER OUTPUT CONTROL BYTE
010003  146                          OUT    6              ; OUTPUT PORT 6 CONTROL BYTE
010004  171 336 010 307 042          MLDLY  2247.          ; DELAY 9 GATE DELAYS
010011  351                  VEST:   SEX    R9             ; PREPARE TO OUTPUT IMMEDIATE DATA
010012  147                          OUT    7              ; TURN AND GATE AT OUTPUT OF ALU ON
010013  003                          DB     <^B00000011>   ; AND TURN ALU OUTPUT LATCH FEEDBACK
                                                           ; PATH ON
010014  171 336 003 345 042          MLDLY  997.           ; DELAY 4 GATE DELAYS
010021  147                          OUT    7              ; TURN AND GATE AT OUTPUT OF
010022  001                          DB     <^B00000001>   ; ALU OFF
010023  171 336 001 360 042          MLDLY  496.           ; DELAY TWO GATE DELAYS
010030  141                          OUT    1              ; TURN ALU MASK ROM'S OFF
010031  000                          DB     <^B00000000>
010032  142                          OUT    2              ; TURN REGISTER OUTPUTS AND REMAINING
010033  000                          DB     <^B00000000>   ; ALU MASKS OFF
010034  146                          OUT    6              ; TURN PORT 6 OFF
010035  000                          DB     <^B00000000>
010036  343                          SEX    R3             ; PREPARE TO OUTPUT DATA FROM MAIN
                                                           ; PROGRAM
010037  143                          OUT    3              ; TURN THE REFORMATTER AT THE INPUT
                                                           ; OF THE SELECTED LATCH ON AND TURN
                                                           ; THE FEEDBACK PATH REFORMATTER OF THAT
                                                           ; LATCH OFF IF AN OR OPERATION IS
                                                           ; NOT REQUIRED
010040  171 336 001 361 042          MLDLY  497.           ; DELAY 2 GATE DELAYS
010045  143                          OUT    3              ; TURN FEEDBACK PATH REFORMATTER
                                                           ; BACK ON IF IT WAS TURNED OFF
010046  171 336 001 357 042          MLDLY  495.           ; DELAY TWO GATE DELAYS
010053  064 062                      B1     DST1           ; TEST CONTRACTOR OUTPUT WHICH IS
                                                           ; WIRED TO NOT EF1, BRANCH IF EF1=1
010055  370 001                      LDI    1              ; 1 --> D
010057  257                          PLO    RF             ; 1 --> RF  CURRENT RESULT IS NOT AN ALL
                                                           ; ZERO TSE
010060  060 066                      BR     DST2           ; BRANCH TO CONTINUE DESTINATION PROGRAM
010062  370 000              DST1:   LDI    0              ; 0 --> D
010064  257                          PLO    RF             ; 0 --> RF  CURRENT RESULT IS AN ALL
                                                           ; ZERO TSE
010065  257                          PLO    RF             ; DELAY 2 CYCLES
010066  351                  DST2:   SEX    R9             ; PREPARE TO OUTPUT IMMEDIATE DATA
010067  304 304                      DLY6                  ; DELAY 6 STATES
010071  143                          OUT    3              ; TURN THE INPUT TO EVERY LATCH OFF
010072  360                          DB     <^B11110000>   ; WITH FEEDBACK PATHS ALL LEFT ON
010073  147                          OUT    7              ; TURN ALU OUTPUT LATCH OFF
010074  000                          DB     <^B00000000>
010075  065 104                      B2     DST3           ; CHECK FOR EXTERNAL INPUT REQUEST
010077  145                          OUT    5              ; IF REQUEST IS PRESENT ACKNOWLEDGE
                                                           ; BY SETTING BIT 3 OF PORT 5.
010100  004                          DB     <^B00000100>   ; THE INPUT ROUTINE MUST RESET THIS
010101  300 017 377                  LBR    INPUT          ; BIT.
010104  066 110              DST3:   B3     DST4           ; OTHERWISE, CHECK FOR A REQUEST TO HALT
010106  060 104                      BR     DST3           ; TSE PROCESSING AND GO INTO PROGRAM LOOP
                                                           ; IF THE REQUEST IS PRESENT
010110  343                  DST4:   SEX    R3             ; RETURN VALUE OF X TO 3
010111  060 000                      BR     EXALU          ; BRANCH TO JUST BEFORE THE ENTRY
                                                           ; POINT TO RESTORE R9 BEFORE
                                                           ; RETURNING TO THE MAIN PROGRAM

Figure D.1 Tse computer general ALU operations control program.
LONG DELAY SUBROUTINE FOR VARIABLE PROGRAM COUNTERS
GENERATES 8N+20 CYCLES DELAY
RE FUNCTIONS AS THE PROGRAM COUNTER

010322  160                  EXMLDY: RET                   ; RETURN TO CALLING PROGRAM
010323  162                  MLDLY:  LDXA                  ; PUT HIGH AND LOW BYTES OF THE
010324  270                          PHI    R8             ; DELAY CONSTANT INTO R8
010325  162                          LDXA
010326  250                          PLO    R8
010327  050                  MLDLY1: DEC    R8             ; DECREMENT R8 IN A LOOP
010330  230                          GHI    R8             ; HIGH BYTE OF R8 --> D
010331  373 377                      XRI    377            ; COMPLEMENT D
010333  072 327                      BNZ    MLDLY1         ; BRANCH TO REPEAT IF NOT ZERO
010335  342                          SEX    2              ; ELSE SET X TO 2
010336  022                          INC    R2             ; INCREMENT R2
010337  060 322                      BR     EXMLDY         ; BRANCH TO ENTRY POINT MINUS 1

Figure D.2 Long delay subroutine for variable program counters.
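The object code printed beneath each MLDLY macro call in the listings of this appendix suggests that the macro expands to five bytes: a MARK instruction, a set P to register E, the high and low bytes of the delay constant, and a decrement of register 2. The macro definition itself is not reproduced here, so the Python sketch below is an inference from the listings. The function and constant names are assumptions.

```python
MARK = 0x79      # MARK opcode (171 octal)
SEP_RE = 0xDE    # SEP with N = E (336 octal)
DEC_R2 = 0x22    # DEC with N = 2 (042 octal)


def mldly_bytes(n):
    """Return the five object bytes of `MLDLY n.` as three-digit octal strings."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("delay constant must fit in 16 bits")
    code = [MARK, SEP_RE, n >> 8, n & 0xFF, DEC_R2]
    return [f"{byte:03o}" for byte in code]


nine_gate_delays = mldly_bytes(2247)   # the constant used for a 9 gate delay wait
```

The two constant bytes are the ones fetched by the pair of LDXA instructions at the start of the long delay subroutine.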
TSE COMPUTER COMPARE OPERATIONS CONTROL PROGRAM
INSTRUCTIONS TCNT, TTEST, TCMP, AND TCPI
RC IS USED AS THE PROGRAM COUNTER

010200  323                  EXCMP:  SEP    R3             ; RETURN TO MAIN PROGRAM
010201  141                  CMPOP:  OUT    1              ; OUTPUT MASK CONTROL BYTE
010202  142                          OUT    2              ; OUTPUT REGISTER OUTPUT CONTROL BYTE
010203  146                          OUT    6              ; OUTPUT PORT 6 CONTROL BYTE
010204  171 336 010 307 042          MLDLY  2247.          ; DELAY 9 GATE DELAYS
010211  354                          SEX    RC             ; PREPARE TO OUTPUT IMMEDIATE DATA
010212  147                          OUT    7              ; TURN AND GATE AT OUTPUT OF ALU ON
010213  001                          DB     <^B00000001>   ; AND LEAVE THE ALU OUTPUT LATCH FEEDBACK
                                                           ; PATH REFORMATTER OFF
010214  171 336 005 331 042          MLDLY  1497.          ; DELAY 6 GATE DELAYS
010221  064 230                      B1     CMP1           ; TEST THE CONTRACTOR OUTPUT WHICH
                                                           ; IS WIRED TO NOT EF1. BRANCH IF EF1=1
010223  370 001                      LDI    1              ; 1 --> D
010225  257                          PLO    RF             ; 1 --> RF  CURRENT RESULT IS NOT AN ALL
                                                           ; ZERO TSE
010226  060 234                      BR     CMP2           ; BRANCH TO CHECK FOR EXTERNAL INPUT
                                                           ; REQUEST
010230  370 000              CMP1:   LDI    0
010232  257                          PLO    RF             ; 0 --> RF
010233  257                          PLO    RF             ; DELAY TWO CYCLES
010234  141                  CMP2:   OUT    1              ; TURN ALU MASKS OFF
010235  000                          DB     <^B00000000>
010236  142                          OUT    2              ; TURN REGISTER OUTPUTS AND REMAINING
010237  000                          DB     <^B00000000>   ; ALU MASKS OFF
010240  146                          OUT    6              ; TURN PORT 6 CONTROL BITS OFF
010241  000                          DB     <^B00000000>
010242  147                          OUT    7              ; TURN AND GATE AT THE OUTPUT OF
010243  000                          DB     <^B00000000>   ; THE ALU OFF
010244  065 253                      B2     CMP3           ; CHECK FOR EXTERNAL INPUT REQUEST
010246  145                          OUT    5              ; IF THE REQUEST IS PRESENT ACKNOWLEDGE
010247  004                          DB     <^B00000100>   ; BY SETTING BIT 3 OF PORT 5. INPUT
010250  300 017 377                  LBR    INPUT          ; ROUTINE MUST RESET THIS BIT
010253  066 257              CMP3:   B3     CMP4           ; CHECK FOR A REQUEST TO HALT TSE
010255  060 253                      BR     CMP3           ; PROCESSING AND GO INTO A PROGRAM LOOP
                                                           ; IF THE REQUEST IS PRESENT
010257  343                  CMP4:   SEX    R3             ; OTHERWISE, RETURN THE VALUE OF X TO 3
                                                           ; AND BRANCH TO JUST BEFORE THE ENTRY
010260  060 200                      BR     EXCMP          ; POINT TO RESTORE RC BEFORE RETURNING
                                                           ; TO THE MAIN PROGRAM.

Figure D.3 Tse computer compare operations control program.
INDEX RECOGNITION CONTROL PROGRAM
INSTRUCTION TIDA
RA IS THE PROGRAM COUNTER. RB IS USED AS A WEIGHT COUNTER.

010113  323                  EXID:   SEP    R3
010114  162                  IDAOP:  LDXA                  ; GET INDEX WEIGHT INTO D
010115  253                          PLO    RB             ; STORE WEIGHT IN REGISTER B
010116  003                  IDAOP1: LDN    R3             ; LOAD INDEX BYTE INTO D
010117  371 100                      ORI    100            ; SET BIT 7 OF INDEX BYTE IN D
010121  122                          STR    R2             ; PUSH SECOND PHASE OF INDEX BYTE ONTO
                                                           ; STACK
010122  144                          OUT    4              ; OUTPUT INDEX IDENTIFICATION CONTROL
                                                           ; BYTE
010123  171 336 003 344 042          MLDLY  996.           ; DELAY 4 GATE DELAYS
010130  342                          SEX    R2             ; PREPARE TO OUTPUT DATA FROM THE STACK
010131  304 304                      DLY6                  ; DELAY 6 CYCLES
010133  144                          OUT    4              ; OUTPUT SECOND PHASE OF INDEX
                                                           ; IDENTIFICATION
                                                           ; CONTROL BYTE TO ENABLE INPUT TO
                                                           ; LATCH I
010134  042                          DEC    R2             ; RESTORE STACK POINTER TO CORRECT
                                                           ; VALUE
010135  043                          DEC    R3             ; POINT R3 BACK TO CURRENT INDEX
                                                           ; RECOGNITION BYTE
010136  171 336 003 344 042          MLDLY  996.           ; DELAY 4 GATE DELAYS
010143  343                          SEX    R3             ; DELAY 2 CYCLES
010144  343                          SEX    R3             ; PREPARE TO OUTPUT INDEX RECOGNITION
                                                           ; BYTE AGAIN
010145  144                          OUT    4              ; DISABLE INPUT TO LATCH I
010146  053                          DEC    RB             ; DECREMENT WEIGHT COUNTER
010147  171 336 000 366 042          MLDLY  246.           ; DELAY 1 GATE DELAY
010154  213                          GLO    RB             ; GET LOW BYTE OF WEIGHT COUNTER INTO D
010155  062 116                      BZ     IDAOP1         ; BRANCH TO CHECK NEXT ORIENTATION IF
                                                           ; NOT FINISHED
010157  352                          SEX    RA             ; PREPARE TO OUTPUT IMMEDIATE DATA
010160  144                          OUT    4              ; TURN INDEX RECOGNITION MASKS OFF
010161  000                          DB     <^B00000000>
010162  065 171                      B2     IDA1           ; CHECK FOR EXTERNAL INPUT REQUEST
010164  145                          OUT    5              ; ACKNOWLEDGE REQUEST BY SETTING BIT
                                                           ; 3 OF PORT 5.
010165  004                          DB     <^B00000100>   ; THE INPUT SERVICE ROUTINE
                                                           ; MUST RESET THIS BIT.
010166  300 017 377                  LBR    INPUT          ; BRANCH TO INPUT SERVICE ROUTINE
010171  066 175              IDA1:   B3     IDA2           ; CHECK FOR A REQUEST TO HALT TSE
                                                           ; PROCESSING
010173  060 171                      BR     IDA1           ; GO INTO A PROGRAM LOOP IF REQUEST
                                                           ; IS PRESENT
010175  343                  IDA2:   SEX    R3             ; PREPARE TO RETURN TO THE MAIN PROGRAM
                                                           ; WITH X=3
010176  060 113                      BR     EXID           ; BRANCH TO RESTORE RA BEFORE
                                                           ; RETURNING TO THE MAIN PROGRAM

Figure D.4 Tse computer index recognition control program.
LONG DELAY SUBROUTINE FOR R3 AS THE CALLING
PROGRAM COUNTER
GENERATES 8N+22 CYCLES DELAY
R7 FUNCTIONS AS THE PROGRAM COUNTER

010341  323                  EXDLY:  SEP    R3             ; RETURN TO THE CALLING PROGRAM
010342  103                  LDELAY: LDA    R3             ; PUT HIGH AND LOW BYTES OF THE
010343  270                          PHI    R8             ; DELAY CONSTANT INTO R8
010344  103                          LDA    R3
010345  250                          PLO    R8
010346  050                  LDLY1:  DEC    R8             ; DECREMENT R8 IN A LOOP
010347  230                          GHI    R8             ; HIGH BYTE OF R8 --> D
010350  373 377                      XRI    377            ; COMPLEMENT D
010352  072 346                      BNZ    LDLY1          ; BRANCH TO REPEAT IF NOT ZERO
010354  060 341                      BR     EXDLY          ; ELSE BRANCH TO ENTRY POINT MINUS 1
010356                               .END

Figure D.5 Long delay subroutine for R3 as the calling
program counter.
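The delay produced by this subroutine can be checked by counting machine cycles. Appendix C states that every instruction used here executes in two machine cycles, so each pass through the four-instruction loop costs eight cycles. The Python sketch below walks through the subroutine for a delay constant N. The function name is an assumption, and the count includes the calling SEP R7 and the returning SEP R3. Under these assumptions the total comes to 8N + 22, which agrees with the 22-cycle overhead quoted in the listing header.

```python
def ldelay_cycles(n):
    """Count machine cycles for one call of the R7 long delay subroutine.

    Valid for delay constants below 0xFF00, where the high byte of R8
    first reads 377 (octal) only after R8 wraps around from zero.
    """
    per_instruction = 2                  # machine cycles per instruction
    cycles = per_instruction             # SEP R7 in the calling program
    cycles += 4 * per_instruction        # LDA R3, PHI R8, LDA R3, PLO R8
    r8 = n & 0xFFFF
    while True:
        r8 = (r8 - 1) & 0xFFFF           # DEC R8
        d = (r8 >> 8) ^ 0xFF             # GHI R8 followed by XRI 377
        cycles += 4 * per_instruction    # DEC, GHI, XRI, BNZ
        if d == 0:                       # BNZ falls through when D is zero
            break
    cycles += 2 * per_instruction        # BR EXDLY, SEP R3
    return cycles


two_gate_delays = ldelay_cycles(497)
```

The loop runs N + 1 times because the exit test looks only at the high byte, which changes when R8 is decremented past zero.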
TSE COMPUTER INPUT CONTROL PROGRAM
INSTRUCTION TIN. REGISTER RD IS USED AS
THE PROGRAM COUNTER

010262  323                  EXTIN:  SEP    R3
010263  173                  TINOP:  SEQ                   ; SET Q TO ENABLE INPUT IMAGE
010264  355                          SEX    RD             ; PREPARE TO OUTPUT IMMEDIATE DATA
010265  143                          OUT    3              ; TURN REGISTER A FEEDBACK PATH
010266  340                          DB     <^B11100000>   ; REFORMATTER OFF
010267  171 336 002 353 042          MLDLY  747.           ; DELAY THREE GATE DELAYS
010274  143                          OUT    3              ; TURN REGISTER A FEEDBACK PATH
010275  360                          DB     <^B11110000>   ; REFORMATTER ON
010276  171 336 001 361 042          MLDLY  497.           ; DELAY TWO GATE DELAYS
010303  172                          REQ                   ; RESET Q TO DISABLE IMAGE INPUT
010304  065 313                      B2     TIN1           ; CHECK FOR EXTERNAL INPUT REQUEST
010306  145                          OUT    5              ; IF THE REQUEST IS PRESENT ACKNOWLEDGE
010307  004                          DB     <^B00000100>   ; BY SETTING BIT 3 OF PORT 5. THE INPUT
010310  300 017 377                  LBR    INPUT          ; ROUTINE MUST RESET THIS BIT.
010313  066 317              TIN1:   B3     TIN2           ; CHECK FOR A REQUEST TO HALT TSE
010315  060 313                      BR     TIN1           ; PROCESSING AND GO INTO A PROGRAM
                                                           ; LOOP IF THE REQUEST IS PRESENT
010317  343                  TIN2:   SEX    R3             ; OTHERWISE RETURN THE VALUE OF X TO 3
010320  060 262                      BR     EXTIN          ; AND BRANCH TO JUST BEFORE THE ENTRY
                                                           ; POINT TO RESTORE RD BEFORE RETURNING
                                                           ; TO THE MAIN PROGRAM.

Figure D.6 Tse computer input control program.
APPENDIX E
RT-11 MACRO ASSEMBLER
This appendix summarizes the features of the RT-11 Macro assembler
[20] that are utilized in the tse computer cross-assembler. A detailed
explanation of the Macro assembler can be found in [20].
The RT-11 Macro assembler offers three features that are essential
to the tse computer cross-assembler. First, Macro permits user-defined
macros which allow new instructions to be defined. Second, Macro
includes numerous conditional assembly directives which simplify
the generation of special operational codes. Third, Macro provides
listing control directives which can be used to enhance the readability
of assembled tse computer programs.
Macro accepts a source program with up to four fields. The general
format of a source statement is
label: operator operand(s) ;comments
The label and comment fields are optional. Either the operator or the
operand field may be omitted depending upon the contents of the other.
When more than one operand appears in the operand field, the operands
are separated by one of the legal separating characters defined in Table
E.1. The legal character set for source statements includes the letters
A through Z, the digits 0 through 9, and the special characters defined
in Table E.2.
Some of the special characters listed in Table E.2 are used as
operator characters which specify unary or binary operations on the
given operands. Tables E.3 and E.4 define the legal unary and binary
TABLE E.1
LEGAL SEPARATING CHARACTERS [20]

Character  Definition                Usage
space      one or more spaces        A space is a legal separator only for
           and/or tabs               argument operands. Spaces within
                                     expressions are ignored.
,          comma                     A comma is a legal separator for both
                                     expressions and argument operands.
<...>      paired angle brackets     Paired angle brackets are used to enclose
                                     an argument, particularly when that
                                     argument contains separating characters.
                                     Paired angle brackets may be used anywhere
                                     in a program to enclose an expression for
                                     treatment as a term. (The angle bracket
                                     construction should be used when the
                                     argument contains unary operators.)
^\...\     up arrow construction     This construction is equivalent in
           where the up arrow        function to the paired angle brackets
           character is followed     and is generally used only where the
           by an argument            argument contains angle brackets.
           bracketed by any
           paired printing
           characters
TABLE E.2
SPECIAL CHARACTERS [20]

Character  Designation          Function
           carriage return      formatting character
           line feed            source statement terminator
           form feed            source statement terminator
           vertical tab         source statement terminator
:          colon                label terminator
=          equal sign           direct assignment indicator
%          percent sign         register term indicator
           tab                  item or field terminator
           space                item or field terminator
#          number sign          immediate expression indicator
@          at sign              deferred addressing indicator
(          left parenthesis     initial register indicator
)          right parenthesis    terminal register indicator
,          comma                operand field separator
;          semicolon            comment field indicator
<          left angle bracket   initial argument or expression indicator
>          right angle bracket  terminal argument or expression indicator
+          plus sign            arithmetic addition operator or
                                auto increment indicator
-          minus sign           arithmetic subtraction operator
                                or auto decrement indicator
*          asterisk             arithmetic multiplication operator
/          slash                arithmetic division operator
&          ampersand            logical AND operator
!          exclamation point    logical inclusive OR operator
"          double quote         double ASCII character indicator
'          single quote         single ASCII character indicator
^          up arrow             universal unary operator,
                                argument indicator
\          backslash            macro numeric argument indicator
TABLE E.3
OPERATOR CHARACTERS [20]

Unary
Operator  Explanation           Example
+         plus sign             +A          (positive value of A,
                                            equivalent to A)
-         minus sign            -A          (negative, 2's complement
                                            value of A)
^         up arrow, universal   ^F3.0       (interprets 3.0 as a
          unary operator                    1-word floating-point number)
                                ^C24        (interprets the one's
                                            complement of the binary
                                            representation of 24(8))
                                ^D127       (interprets 127 as a decimal
                                            number)
                                ^O34        (interprets 34 as an octal
                                            number)
                                ^B11000111  (interprets 11000111 as a
                                            binary value)
TABLE E.4
LEGAL BINARY OPERATORS [20]

Binary
Operator  Explanation            Example
+         addition               A+B
-         subtraction            A-B
*         multiplication         A*B (16-bit product returned)
/         division               A/B (16-bit quotient returned)
&         logical AND            A&B
!         logical inclusive OR   A!B
operators, respectively. Note the ^O and ^B constructions which are
used extensively in the tse computer macros to indicate whether a
number is in the octal or binary radix. Operands can be numbers or
previously defined symbols. The symbols are usually defined by direct
assignment statements which have the general format
symbol = expression
The tse mask symbols, tse register symbols, and COSMAC register symbols
are all defined by direct assignment statements. Their decimal values
are given in Table 7.3, page 157.
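The effect of the radix operators can be sketched in Python. The function below evaluates a single numeric term with an optional ^B, ^O, ^D, or ^C prefix. It is an illustration rather than a description of how Macro is implemented: the function name is an assumption, the ^F operator is omitted, and the octal default radix and the trailing period for decimal numbers (as in MLDLY 2247.) follow the usual Macro conventions.

```python
def parse_term(term, default_radix=8):
    """Evaluate a numeric term and return its 16-bit value."""
    if term.startswith("^"):
        operator, body = term[1].upper(), term[2:]
        if operator == "B":
            return int(body, 2)          # binary radix
        if operator == "O":
            return int(body, 8)          # octal radix
        if operator == "D":
            return int(body, 10)         # decimal radix
        if operator == "C":
            # one's complement of the 16-bit value of the operand
            return ~parse_term(body, default_radix) & 0xFFFF
        raise ValueError(f"unary operator ^{operator} is not modeled")
    if term.endswith("."):
        return int(term[:-1], 10)        # trailing period marks a decimal number
    return int(term, default_radix)


mask = parse_term("^B11000111")
```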
In some instructions and macro definitions, the current value
of the assembly location counter must be known. A special symbol, the
period, is used to represent the current value of the assembly location
counter. The period can be used in any expression in which the other
defined symbols are legal. The tse computer cross-assembler uses
the current value of the assembly location counter to check for illegal
attempts to branch across page boundaries using short branch instructions.
When an illegal branch is detected, a .ERROR directive is used to output
an error message to the list file as a warning to the programmer. Error
messages are also printed out when the programmer attempts to use an
illegal input/output port or register specification.
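The page-boundary rule follows from the short branch operation in Table C.1, which replaces only the low order byte of the program counter. The Python sketch below shows one way such a check could be written. The function names and the message text are assumptions, and the actual macro logic of the cross-assembler is not reproduced in this appendix.

```python
def same_page(address_a, address_b):
    """True when both addresses share the same high order byte (256-byte page)."""
    return (address_a >> 8) == (address_b >> 8)


def check_short_branch(branch_address, target_address):
    """Return None for a legal short branch, otherwise an error message."""
    operand_address = branch_address + 1   # byte holding the low order target address
    if same_page(operand_address, target_address):
        return None
    return (f"short branch at {branch_address:06o} "
            f"cannot reach {target_address:06o}")


# B1 DST1 in Figure D.1: instruction at 010053, target at 010062 (octal).
legal = check_short_branch(0o010053, 0o010062)
# Hypothetical branch whose target lies on the following page.
illegal = check_short_branch(0o010376, 0o010400)
```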
The RT-11 Macro assembler provides several types of assembler
directives which occupy the operator field of a Macro source line
and cause the assembler to perform special processing operations.
Listing control, data storage, terminating, conditional assembly, and
macro directives are all used in the tse computer cross-assembler.
The listing control directives .LIST and .NLIST are used to
control the contents of the list file created by the assembly process.
Macro utilizes a listing level count to determine whether or not a
particular line of source code should be listed. When used without
an argument, the .LIST and .NLIST statements cause the listing level
count to be incremented or decremented, respectively. Listing is
suppressed whenever the listing level count is negative.
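The listing level count behaves like a simple counter. The Python sketch below models the argument-free form of the two directives. The class and function names are assumptions, the initial count of zero is assumed, and the directive lines themselves are left out of the output for simplicity.

```python
class ListingControl:
    """Tracks the listing level count for .LIST and .NLIST without arguments."""

    def __init__(self, level=0):
        self.level = level          # assumed initial value

    def list_(self):
        self.level += 1             # .LIST increments the count

    def nlist(self):
        self.level -= 1             # .NLIST decrements the count

    def is_listed(self):
        return self.level >= 0      # listing is suppressed while negative


def listed_lines(source):
    """Return the source lines that would appear in the list file."""
    control = ListingControl()
    output = []
    for line in source:
        word = line.strip()
        if word == ".LIST":
            control.list_()
        elif word == ".NLIST":
            control.nlist()
        elif control.is_listed():
            output.append(line)
    return output


shown = listed_lines(["A", ".NLIST", "B", ".NLIST", "C", ".LIST", "D", ".LIST", "E"])
```

Because the count nests, a macro can bracket its body with .NLIST and .LIST without disturbing the listing state of the code that called it.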
The listing control directives can also be used with arguments.
In that case, the listing level count is not affected, but the listing
mode is overridden in a manner specified by the argument. The most
commonly used listing directive arguments are shown in Table E.5.
Listing directives are used extensively in the tse computer
cross-assembler to prevent macro expansions from listing. This improves the
readability of the assembled applications programs.
Since the PDP 11/40 is a 16-bit minicomputer, the RT-11 Macro
assembler normally works with 16-bit numbers. The COSMAC microprocessor,
however, is an eight-bit machine that requires eight-bit source code and
data bytes. A data storage directive, .BYTE, allows the Macro assembler
to produce object files which are suitable for the COSMAC control unit
of the tse computer. The .BYTE directive truncates specified arguments
to eight bits. The argument can be a number or any legal expression
whose 16-bit value has a high byte that contains either all zeros or all
ones.
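The restriction on .BYTE arguments can be expressed as a short Python check. The function name is an assumption, and raising an exception merely stands in for whatever error indication the assembler gives when the high byte is neither all zeros nor all ones.

```python
def byte_directive(value):
    """Truncate a 16-bit expression value to the eight bits stored by .BYTE."""
    value &= 0xFFFF                       # Macro works with 16-bit numbers
    high = value >> 8
    if high not in (0x00, 0xFF):          # high byte must be all zeros or all ones
        raise ValueError(f"value {value:06o} does not fit in one byte")
    return value & 0xFF


positive = byte_directive(0o347)          # small positive value
negative = byte_directive(-1)             # 16-bit two's complement value 177777 (octal)
```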
One terminating directive is used in the tse computer cross-assembler
to indicate the physical end of a source program. This .END directive
can have an optional argument that indicates the entry point of the
program. Often, this feature is used to provide automatic start-up
of programs after they are loaded into the computer.

TABLE E.5
SOME ALLOWABLE LISTING DIRECTIVE ARGUMENTS [20]

Argument  Default  Function
SEQ       list     Controls the listing of source line sequence numbers.
LOC       list     Controls the listing of the location counter (this
                   field would not normally be suppressed).
BIN       list     Controls the listing of generated binary code
                   (supersedes BEX).
SRC       list     Controls the listing of the source code.
COM       list     Controls the listing of comments. This is a subset of
                   the SRC argument and can be used to reduce listing
                   time and/or space where comments are unnecessary.
SYM       list     Controls the listing of the symbol table for the
                   assembly.
Conditional assembly directives are one of the most important
Macro assembler features. These directives provide the programmer
with the capability to conditionally include or ignore blocks of
source code during the assembly process. The general form of a
conditional block of Macro code is
.IF con di ti on, argument (s) ; Start of Conditional Block
; Statements in the Range of
the Conditional Block
.ENDC
	
End of Conditional Block
where the condition which must be met for the block to be included
in the assembly is one of those given in Table E.6.
There are three subconditional directives (Table E.7) which can
be placed within conditional blocks to indicate that an alternate
section of code should be assembled when the main condition is not
met. Alternatively, these subconditionals can be used to indicate the
unconditional assembly of a section of code within the conditional block.
The value of the condition, found upon entering the conditional block
of code, is the implied argument of the subconditional statements.
One-line conditional blocks can be written using an immediate
conditional directive of the form
.IIF condition, argument, statement
The allowable conditions and arguments are the same as those defined
earlier. Note that a .ENDC statement is not required for immediate
conditionals.
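As an illustration of how the conditions of Table E.6 decide whether a block (or the single statement of an immediate conditional) is assembled, the following Python sketch evaluates a condition against its argument(s). The function names `condition_met` and `iif`, and the use of a set of names as a symbol table, are assumptions made for this example only; they are not part of the MACRO assembler.

```python
from typing import List, Optional, Set


def condition_met(cond: str, *args, symbols: Optional[Set[str]] = None) -> bool:
    """Return True if the block guarded by `cond` would be assembled."""
    symbols = symbols or set()       # hypothetical stand-in for the symbol table
    c = cond.upper()
    if c in ("EQ", "Z"):
        return args[0] == 0
    if c in ("NE", "NZ"):
        return args[0] != 0
    if c in ("GT", "G"):
        return args[0] > 0
    if c == "LE":
        return args[0] <= 0
    if c in ("LT", "L"):
        return args[0] < 0
    if c == "GE":
        return args[0] >= 0
    if c == "DF":
        return args[0] in symbols
    if c == "NDF":
        return args[0] not in symbols
    if c == "B":
        return args[0].strip() == ""
    if c == "NB":
        return args[0].strip() != ""
    if c == "IDN":
        return args[0] == args[1]
    if c == "DIF":
        return args[0] != args[1]
    raise ValueError("unknown condition: " + cond)


def iif(cond: str, arg, statement: str, **kw) -> List[str]:
    """Immediate conditional: emit the one statement only if the condition holds."""
    return [statement] if condition_met(cond, arg, **kw) else []


print(condition_met("EQ", 0o006 - 0o006))   # expression equal to zero
print(iif("NE", 1, ".ERROR"))               # statement is emitted
```

Because the immediate form carries its single statement on the same line, the sketch needs no counterpart of .ENDC for it.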
TABLE E.6
ALLOWABLE CONDITIONS [20]

     Conditions
Positive  Complement  Arguments             Assemble Block If
EQ        NE          expression            expression = 0 (or ≠ 0)
GT        LE          expression            expression > 0 (or ≤ 0)
LT        GE          expression            expression < 0 (or ≥ 0)
DF        NDF         symbolic argument     symbol is defined (or undefined)
B         NB          macro-type argument   argument is blank (or nonblank)
IDN       DIF         two macro-type        arguments identical (or different)
                      arguments separated
                      by a comma
Z         NZ          expression            same as EQ/NE
G                     expression            same as GT/LE
L                     expression            same as LT/GE
TABLE E.7
SUBCONDITIONAL DIRECTIVES [20]

Subconditional  Function
.IFF            The code following this statement up to the next
                subconditional or end of the conditional block is
                included in the program if the value of the
                condition tested upon entering the conditional
                block is false.
.IFT            The code following this statement up to the next
                subconditional or end of the conditional block is
                included in the program if the value of the
                condition tested upon entering the conditional
                block is true.
.IFTF           The code following this statement up to the next
                subconditional or the end of the conditional block
                is included in the program regardless of the value
                of the condition tested upon entering the conditional
                block.
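The effect of the three subconditionals can be sketched as a small line filter. This is an illustrative Python model, not the assembler's implementation: the function name `select_lines` is hypothetical, the body lines are made-up placeholders, and nested conditional blocks are deliberately not handled.

```python
SUBCONDITIONALS = {".IFT", ".IFF", ".IFTF"}


def select_lines(condition_true: bool, body):
    """Return the lines of one conditional block that would be assembled."""
    # Before any subconditional appears, code is assembled only when the
    # condition tested on entry to the block is true.
    mode = ".IFT"
    kept = []
    for line in body:
        token = line.strip()
        if token in SUBCONDITIONALS:
            mode = token             # switch the rule for the lines that follow
            continue
        if mode == ".IFTF" or (mode == ".IFT") == condition_true:
            kept.append(token)
    return kept


body = [".IFT", "LINE-IF-TRUE", ".IFF", "LINE-IF-FALSE", ".IFTF", "LINE-ALWAYS"]
print(select_lines(True, body))
print(select_lines(False, body))
```

The condition is evaluated once, on entry to the block, which is why the sketch takes it as a single Boolean argument rather than re-testing it at each subconditional.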
Macro directives are the final type of assembler directives
utilized by the tse computer cross-assembler. A .MACRO statement of
the form
.MACRO name, dummy argument list
serves as the first statement of each macro definition. The name of
the macro can be any legal symbol. Similarly, any required arguments
are represented by legal symbols in the dummy argument list. These
symbols can be used outside the body of the macro definition with no
conflicts of definition. A comment field can be included after the
dummy argument list.
The last statement in each macro definition must be a .ENDM
directive. The .ENDM directive is of the form
.ENDM name
where name is an optional argument which, if used, must be the
name of the macro being terminated. Examples of correctly defined
and terminated macros are given in Appendix F.
APPENDIX F
SELECTED MACRO DEFINITIONS FROM THE TSE
COMPUTER CROSS-ASSEMBLER MACRO LIBRARY
Figure F.1 is a listing of some representative macro definitions
from the tse cross-assembler macro library. The macro library is
intended to be used with Digital Equipment Corporation's RT-11 MACRO
assembler and a PDP-11/40 minicomputer.
.MACRO SEP REG
.NLIST SRC
$$$ER REG
.BYTE <^O320+REG>
.LIST SRC
.ENDM SEP
.MACRO REQ
.NLIST SRC
.BYTE ^O172
.LIST SRC
.ENDM REQ
.MACRO MLDLY ADR
.NLIST SRC
.BYTE ^O171 ; MARK
.BYTE ^O336, <ADR&^O177400>/^O400, ADR&^O377 ; SEP RE, HADR, LADR
.BYTE ^O042 ; DEC R2
.LIST SRC
.ENDM MLDLY
.MACRO DLY6
.BYTE ^O304,^O304 ; NOP NOP
.LIST SRC
.MACRO LDI RP
.NLIST SRC
.BYTE ^O370, RP
.LIST SRC
Figure F.1 Representative macro definitions from the tse computer
cross-assembler macro library.
.MACRO LDN REG
.NLIST SRC
$$$ER REG
.BYTE <REG>
.LIST SRC
.ENDM LDN
.MACRO STR REG
.NLIST SRC
$$$ER REG
.BYTE <^O120+REG>
.LIST SRC
.ENDM STR
.MACRO DB X1,X2,X3,X4,X5,X6
.NLIST SRC
.BYTE X1
.IF NB,X2
.BYTE X2
.IF NB,X3
.BYTE X3
.IF NB,X4
.BYTE X4
.IF NB,X5
.BYTE X5
.IF NB,X6
.BYTE X6
.ENDC
.ENDC
.ENDC
.ENDC
.ENDC
Figure F.1 (Continued)
.MACRO B1 RP
.NLIST SRC
$$$PAG RP
.BYTE ^O064,RP&^O377
.LIST SRC
.ENDM B1
.MACRO BZ RP
.NLIST SRC
$$$PAG RP
.BYTE ^O062,RP&^O377
.LIST SRC
.ENDM BZ
.MACRO LBR ADR
.NLIST SRC
ADR1=<^O177400&ADR>/^O400
ADR2=ADR&^O377
.BYTE ^O300,ADR1,ADR2
.LIST SRC
.ENDM LBR
.MACRO PLO REG
.NLIST SRC
$$$ER REG
.BYTE <^O240+REG>
.LIST SRC
.ENDM PLO
.MACRO DEC REG
.NLIST SRC
$$$ER REG
.BYTE <^O040+REG>
.LIST SRC
.ENDM DEC
Figure F.1 (Continued)
.MACRO OUT REGL
.NLIST SRC
$$$ERL REGL
.BYTE <^O140+REGL>
.LIST SRC
.ENDM OUT
.MACRO T CLRI
. WL I ST S rC
.EYTE ^0144
LYTE ^000000000
. DYTE "013,27
DW 07 Sj G
.LIST `;tir.;
ENDM TCLRI
. MACiO TCMR REG1, REG2, MASK;
.NLIST SRC
BYTE "0331
IF EO., REG2--`O341
IFT
BYTE MAST
.BYTE "V10110001
IFF
BYTE ";`JIASK*16.:
R43=<REG2 ,^'0Q1 7t
. DYTE "."0220! RO>
I=N0C
,BYTE RE.G i
DYTE	 REG13
LIST aRC
EfiSI;^ i TL':>' R
; CLEAR REGISTER I
OUT 4
DD 0
LULY 494.
DELAY 2 GATE DELAYS
THIS MACRO TURN S THE i=EEDDACK MATH
REFORMATTER OF LATCri I OFF
COMPLEMENT REGISTER
SEP R9
;
ASSEMBLE IF RE02 IS A
MASK CONTROL ' BYTE
REGISTER OUTPUT CONTROL BYTE
A SSEMDLE IF RE02 IS NOT A
i MASK CONTROL BYTE
PICK OFF BOTTOM 4 BITS OF REG 2
REGISTER OUTPUT CONTROL BYTE
FORT SIX CONTROL BYTE. P61 ON
; RIrG1STER INPUT CONTROL BYTE NUMBER I
i REG I S T i_R INPUT  = ::..7NTR%OL BYTE NUMB-ER 2
Figure F.1 (Continued)
. 11ACRO TMOV REG1, REG2 MOVE REGISTER TO REGISTER
. NLIST ,SrC
. 13'T TIE  •^0331 i ti EP R7
. LYTE -`D00000000 MASK CONTI^ ZOL BYTE
. IF E0, R' CG2— ^OO41
. I FT ; ASSEMBLE IF RE02 I S A
F:YTE .11 DO1100001 ; REGISTER OUTPUT CONTROL BYTE
. UYTE ,^D00000000 ; PORT SIX CONTROL BYTE.	 F &J. OFF
. I Fr ; ASSEMBLL I T-	 EG2 I'D NOT A
RG= l
 REG2y"O0 1 7i i
BYTE: .''O '0 ! RO ; R E G I STER OUT rlJl" CONTROL BYTE
DY i E -^u00000001 ; rORT SIX CONTROL BYTE.	 x'61 Or,;
L *,14 30
EY^i E REG I ; REGISTER INPUT CONTROL BYTE NUMBER I
. DYTE z ' s;<3{,0 ! REG1 D REGIS ER INPUT l ,^ON T ROL BYTE NUMBER 2
L I D T  SRF-
ErZll	 E rsOv
I A-' %O ^'MV I REG, MASK, ; MOVE IMMEDIATE  TO REGISTER
NL i ST SARC
LDYTE I SEP R^
DYTE i A I ;	 M SI-%' CONTROL 1l.", "1C TL7
. I:YTE "B00000000 REGISTER OUTPUT CONTROL BYTE
1; 111 T E ."W" 100000001 FORT SIX CONTROL BYTE.	 P61 ON
. D`{TE REED ;	 REGISTER INPUT CONTROL BYTE NUMBER 1
.BYTE ` i1t 0!REGD REGISTER INPUT CONTROL BYTE NUMBER 2
y
LI J 1 JRC
Efs -.1 my
Figure F.1 (Continued)
.MACRO TANDN REG1, REG2, MASK ; AND NOT REGISTER TO A
.NLIST SRC
.BYTE ^O331 ; SEP R9
.IF EQ, REG2-^O341
.IFT ; ASSEMBLE IF REG2 IS A
.BYTE MASK ; MASK CONTROL BYTE
.BYTE ^B01100001 ; REGISTER OUTPUT CONTROL BYTE
.IFF ; ASSEMBLE IF REG2 IS NOT A
.BYTE <MASK&^O007> ; MASK CONTROL BYTE
RO=<REG2&^O017> ; PICK OFF LOWER FOUR BITS OF REG2
.BYTE <^O140!RO> ; REGISTER OUTPUT CONTROL BYTE
.ENDC
.BYTE ^B00000000 ; PORT SIX CONTROL BYTE. P61 OFF
.BYTE REG1 ; REGISTER INPUT CONTROL BYTE NUMBER 1
.BYTE <^O360!REG1> ; REGISTER INPUT CONTROL BYTE NUMBER 2
.LIST SRC
.ENDM TANDN
.MACRO TOR
	 REG, MASK OR REGISTER TO A
. •{LIST SRC
.BYTE "01 ; SEP R9
.IF  ENV REG-"0341
. IFT ; ASSEMBLE IF REG IS A
.BYTE MASK MASK CONTROL BYTE
.BYTE "B00100001 REGISTER OUTPUT CONTROL BYTE
.BYTE ""00000000 ; PORT SIX  }CONTROL BYTE.	 P61 OFF
. I FF ; ASSEMBLE IF REG IS NOT A
.BYTE w''0007&MASK> ; MASK CONTROL BYTE
.BYTE ; REGISTER OUTPUT CONTROL BYTE
' BYTE •"DO000000i ; PORT SIX CONTROL BYTE.
	 PV ON
. ENDC
. DY T E C"O360 ! REG> ; PLOISTER IAEf UT CONTROL BYTE NUMBER I
.BYTE A"0360=6>
 ; REGISTER INPUT  CONS ROL BYTE NUMBER 2
.LIST  ti RC
.ENDM TOR
Figure F.1 (Continued)
; CONTRACT REGISTERMACRO TrNT R%EGI MASK
;~LIST SRC
BYTE "OS-33
If" EQQ, F%rEG---`OS41
xFT
BYTE MA V'
LYTE -^B00100001
.BYTE " DOOOOOOOO
. I PF
LYTE <MA';'3Z&^0007D,
LYTE e 'r: G. f`0 177s
BYTE -00000- 0 0^'^0
. E iuOC'
RI I SRC
EEwrtM T C NT
MACRO TOUT REG
. INL I ST SiiC'
BYTE "'03r3
/
A1^
^y /^ y^y
LYTE "B00 (00000
1r'l eJ^`:[IE^J Yj ^^^^ 17:
u'i T E -'- ,0 20! iOi
BYTE ``D00000001
DYTE '. 1 0I I I 100
BYTE ^D1III10,00
. LI
r
S
,
T^
rf 
3FRC
END 1 TOUT
 1
SEP RC
;
ASSEMBLE
MASK CON'
REGISTER
PORT SIX
ASSEMBLE
ALIJ MASK'
i REGISTER
FORT SIX
IF REG IS A
rROL BYTE
OUTPUT CONTROL BYTE
CONTROL BYTE. P61 OFF
IF" REG IS NOT A
CONTROL BYTE
CIJTr[Jl' CONTP:OEL, BYTE
C0NIT R 0L B'z TE. P61 ON
; OUTPUTiT TW E
; S EP R9
; MASER. CONTROL EYTE
; REGISTER OUTPUT CONTROL BYTE:
; FORT SI X CONTROL BYTE. P61 ON
; REGISTER 0 INPUT CONTROL BYI E NUMBER 1
REGISTER 0 INPUT CONTROL BYTE NUMBER ?
Figure F.1 (Continued)
.MACRO TIDA X ; IDENTIFY BASIS POINTS WITH A
.NLIST SRC ; SURROUND OF INDEX X
.BYTE ^O332 ; SEP RA
.IF EQ, X ; ASSEMBLE IF INDEX IS ZERO
.BYTE ^O001 ; 1
.BYTE ^O277 ; INDEX ZERO CONTROL BYTE
.ENDC
.IF EQ, X-^O006 ; ASSEMBLE IF INDEX IS SIX
.BYTE ^O001 ; 1
.BYTE ^O200 ; INDEX SIX CONTROL BYTE
.ENDC
.IF EQ, X-^O007 ; ASSEMBLE IF INDEX IS SEVEN
.BYTE ^O002 ; 2
.BYTE ^O252, ^O225 ; INDEX SEVEN CONTROL BYTES
.ENDC
.IF GT, X-^O013 ; ASSEMBLE IF INDEX IS TWELVE OR THIRTEEN
.BYTE ^O003 ; 3
.IF EQ, X-^O014 ; ASSEMBLE IF INDEX IS TWELVE
.BYTE ^O266, ^O255, ^O233 ; INDEX TWELVE CONTROL BYTES
.ENDC
.IF EQ, X-^O015 ; ASSEMBLE IF INDEX IS THIRTEEN
.BYTE ^O211, ^O222, ^O244 ; INDEX THIRTEEN CONTROL BYTES
.ENDC
.ENDC
.IF NE, X ; ASSEMBLE IF INDEX IS NOT ZERO
.IF NE, X-^O006 ; ASSEMBLE IF INDEX IS NOT SIX
.IF NE, X-^O007 ; ASSEMBLE IF INDEX IS NOT SEVEN
.IF LE, X-^O013 ; ASSEMBLE IF INDEX IS NOT TWELVE OR THIRTEEN
.BYTE ^O006 ; 6
.IF EQ, X-^O001 ; ASSEMBLE IF INDEX IS ONE
.BYTE ^O276, ^O275, ^O273 ; INDEX ONE CONTROL BYTES
.BYTE ^O267, ^O257, ^O237
.ENDC
Figure F.1 (Continued)
. BYTE ^0274, "0271, "0263
. 
BYTE ^0247, -^0217, ^0236,
. ENDG
.IF EQ, X-''0003
. BYTE "0270, '^0261, "0243
.BYTE
^rye	 . 0 ^^
	
^ 23
 ,_.^ r c	 U^Q"l,	 v.-. ,. Fes.	 ....^
. FNDC
. I F Erg , X-"0004
BYTE ^0260, 025 1, ^0203
BYTE "0206, ^0215, ^02 0
. ENUC	 a
.IF  co, X .... sODOFJ
. DYTE -0240, « 0201, ^0202
. LYTE ^0204, ^0210, "0220
. ENDC
.IF  EQ, X-"Q l 3
.BYTE "^0254, ^0231 ^0262
LYTE ^0245, .'`02 13, "^0226
_ ENBC
.IF EQ, X-^0011
.BYTE ^0264, "0251, ^0223
. rr.YTE ^3236, ^0215, •^0232
. ENDC
.IF  EQ, X--0012
.BYTE ^0242, "0210, ^0221
.BYTE  ^02: 4, "0205, ^0212
. ENDC
IF Eta, X-^015
. DYTE ^02511 ^0227P,
.UYTE ^0235, 1 0272, ^0265
;w
ENVC
ASSEMBLE IF INDEX IS TWO
INDEX TWO CONTROL BYTES
ASSEMBLE IF INDEX I S THREE:
INDEX THREE CONTROL BYTES
ASSEMBLE IF IP.DLX IS T=OUR
INDEX FOUR CONTROL BYTES
ASSEMBLE IF INDEX IS 71VE
INDEX  FIVE CONTROL 'BYTE=:
ASSEMBLE IF INDEX IS EIGHT
INDEX EIGHT CONTROL BYTES
ASSEMBLE IF INDEX IS NINE
INDEX NINE CONTROL BYTES
ASSEMBLE IF INDEX IS TEN
INDEX TEN CONTROL BYTE;
ASSEMBLE IF INDEX IS ELEVEN
INDEX ELEVEN CONTROL BYTES
Figure F.1 (Continued)
ENDC
ENDC
ENDC
LIST SRC
. ENDM T I DA
.MACRO TCMP REG,
	 MASK ; COMPARE REGISTER
WL I ST RC
BY T E ^03133 ; ti EP RC
Mfg=-=MASF:;11^. > ; SHIFT LOWER FOUR BIT.^-_. OF MASK 'BYTE INTO
ir1 — PtO MA^f;. ; UPPER FOUR BITS AND COPY BACK INTO LOWER FOUR PITS ALSO
BYTE ; MASK CONTROL BYTE
R0= REG&^017i ; PICK OF'F BOTTOM 4 BITS OF REG
. rp	 r_	 .i^ °.',OHO	 i ; REGISTER OUTPUT CONTROL BYTE
.
BY
TE ^B00000001 ; PORT SIX CONTROL BYTE. 	 P61 ON
.LIST SRC
.MACRO TIN ; INPUT TSE
.NLIST SRC
.BYTE ^O335 ; SEP RD
.LIST SRC
.ENDM TIN
.MACRO TBNZ RP ; TSE BRANCH ON RF NOT EQUAL TO ZERO
.NLIST SRC
$$$PAG RP
.BYTE ^O217 ; GLO RF
.BYTE ^O072, RP&^O377 ; BNZ RP
.LIST SRC
.ENDM TBNZ
Figure F.1 (Continued)
.MACRO $$$PAG RPT
.NLIST SRC
ADR1=<^O177400&RPT>/^O400
ADR2=<^O177400&.>/^O400
.IIF NE, <ADR1-ADR2>, .ERROR ; ###: ILLEGAL BRANCH OVER PAGE BOUNDARY
.ENDM $$$PAG
.MACRO $$$ER REGT
$$$2=2
.IIF GE, REGT, $$$2=$$$2-1
.IIF LE, <REGT-15.>, $$$2=$$$2-1
.IIF NE, $$$2, .ERROR ; ###: ILLEGAL REGISTER SPECIFICATION
.ENDM $$$ER
.MACRO $$$ERL REGLT
$$$2=2
.IIF GE, REGLT, $$$2=$$$2-1
.IIF LE, <REGLT-7.>, $$$2=$$$2-1
.IIF NE, $$$2, .ERROR ; ###: ILLEGAL PORT ASSIGNMENT
.ENDM $$$ERL
Figure F.1 (Continued)
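The address arithmetic used by the long-branch macro and by the page-check macro at the end of Figure F.1 can be sketched in Python. The masks 0o177400 and 0o377 and the divisor 0o400 come from the listing; the function names `split_address` and `same_page` are hypothetical, and the sketch assumes a 256-byte page, which is what those constants imply.

```python
def split_address(addr: int):
    """Split a 16-bit address into (high byte, low byte)."""
    high = (addr & 0o177400) // 0o400   # upper eight bits, shifted down
    low = addr & 0o377                  # lower eight bits
    return high, low


def same_page(target: int, location: int) -> bool:
    """True if a short branch from `location` to `target` stays on one page."""
    return split_address(target)[0] == split_address(location)[0]


print(split_address(0o000275))          # an address on page zero
print(same_page(0o000101, 0o000264))    # both addresses lie on page zero
print(same_page(0o000400, 0o000377))    # crosses a page boundary
```

A short branch carries only the low byte of its target, so an assembler-time check of this kind is what prevents a branch from silently landing on the wrong page.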
APPENDIX G
SAMPLE APPLICATIONS PROGRAMS
FOR THE GOLAY TRANSFORM TSE COMPUTER
Figure G.1 is a listing of a program for performing the Golay
transform skeletonizing algorithm using the tse computer. The
skeletonizing program is 129 bytes long and requires 607 unit gate delays
to process a simple image. A program for performing the swelling
algorithm is listed as Figure G.2. Each iteration of the swelling
algorithm takes 664 unit gate delays. The swelling program is 147
bytes long.
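The program lengths quoted above can be checked against the octal location counter values in the listings. The Python sketch below is only a worked arithmetic check; the function name `program_length` is hypothetical. The start and end addresses for the skeletonizing program (000100 and 000301) are read from Figure G.1. The end address used for the swelling program (000623) is an assumption chosen to agree with the stated 147 bytes, since that line of the scanned listing is not clearly legible.

```python
def program_length(start_octal: str, end_octal: str) -> int:
    """Length in bytes between two octal location counter values."""
    return int(end_octal, 8) - int(start_octal, 8)


# Skeletonizing program: START at 000100, END at 000301 (from Figure G.1).
print(program_length("000100", "000301"))   # 129

# Swelling program: BEGIN at 000400; the end address 000623 is assumed.
print(program_length("000400", "000623"))   # 147
```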
000100 START:
	 TIN
000100 325
000101 ANOTHER: TMOV
	 C, A
000101 331
0001 02 000
000105 1 4 1
000104 000
000105 264
00010L 364
00010Y t 1:L^:I
000107 144
00rii lo 000
000
 i 11 027
000112 501
000110 356
0001 14 T I DA	 1
000114 332
00011b 1 0I.S.
000116 271.
000117 275
000120 273
00Q2 1 267
000122 257
000120 227
000124 TIDA	 2
0001 24 -a32000120 304,
0001`6 274
00012/ 271
000ijo /
000151 247
000i= 217
•J •J0001'"`j' 216 
; INPUT NEXT IMAGE FOR SKELETONIZING
; SAVE IMAGE IN REGISTER C
; CLEAR INDEX REGISTER
; RECOGNIZE INDEX ONE
; RECOGNIZE INDEX TWO
Figure G.1 A program for performing the Golay transform
skeletonizing algorithm.
Figure G.1 (Continued)
0001 34 T I DA
000134 Uti2
00012,15  006
000 1 36 270
0001;;7 211
003140 24ti
030141 207
000 142' 21f•
000143 ^ ;4
00.40144 TANDN	 A, 11 ml
X300144 X31
000145, i0 i
0.3v 146 1 50
000147 000
r^ r..F rfsy s-1 .3 l41i
0 73 151 361
00315 2 TCLR I
00&1 132 144Cv:3 1 000
030 154 3,2 7
00:315 15 0 C, 1
0.0.E 1 `,U ;,56
0001 IS T I DA	 1
000157 rJ^
wr.31: t3 OCS
f f f ^ S.J 1 .:: / J
OO CJ, 16 2 27
^00. 1
 L J
4J JV 1 i,J{1 4 •J7
; RECOGNIZE INDEX THREE
; PERFORM GOLAY FUNCTION OPERATION
; FOR THE FIRST SUBFIELD
; CLEAR INDEX REGISTER
; RECOGNIZE INDEX ONE
000147 TIDA	 2
000167 ac2
000170 006
00017! 274
000!72 27!
0001730 263,
00074 247
000175 217
000176 216
000177 TIDA
000177 SS2
000200 006
00020| 270
000202 261
000205 24%
000204 207
000205 216
000206 254
000207 T	 N	 A,I,M2
000207 Sa!
000210 002
00021: 150
0002!2 000
000215 541
000214 S6!
000215 TCLRI
000215 144
00021& 000
00217 527
000220 001
0030221 356
; RECOGNIZE INDEX TWO
; RECOGNIZE INDEX THREE
; PERFORM GOLAY FUNCTION OPERATION
; FOR THE SECOND SUBFIELD
; CLEAR INDEX REGISTER
Figure G.1 (Continued)
000222 TIDA	 RECOGNIZE INDEX ONE
000222 332
000223 006
000224 276
00022; 27F=.
000226 21:3
000227 267
O?.: 2 30 25-17
00020 1 217
000202 TIDA	 2	 RECOGNIZE INDEX TWO
.. ^,2: a-3^J{J^-^^
r332
.3,^^
000233 001:.
000204 274
000235 271
000216 263
000237 247
050240 217
000241 236.
03024 2 TIDA
	 :3
	 RECOGNIZE INDEX THREE000242 312,
J^^^J i.-.;r.524 ryryV0L
OD0244 270
005245 261
000246 24.J
000247 207
y 000251 214
Figure G.1 (Continued)
000232 TA14DN A, I, M3
0602§2 Gei
000253 007
000254 '	 !S0
0002SS 000
00025& e4|
000257 S6±
000260 TCMP C,M
000260 228
0002&1 104
000262 203 
000265 001
000264 TVNZ ANOTHER
000264 217
000 176§ 072
000266 101
00024,7 TOUT A
0002` 321
000270 000
00027! 021
000272 001
000272 170
000274 c70
000275 CHECK. B2 START
00027§ ohs
000276 108
000277 DR CHEC1{
000227 060
000_^ 1. 6o 275
000301 END
; PERFORM GOLAY FUNCTION OPERATION
; FOR THE THIRD SUBFIELD
; COMPARE ITERATION RESULT WITH IMAGE
; SAVED IN REGISTER C
; BRANCH TO PERFORM ANOTHER
; ITERATION IF DIFFERENT
; OTHERWISE OUTPUT THE SKELETON IMAGE
; CHECK FOR A NEW INPUT IMAGE
; WAIT IN A LOOP FOR THE NEW IMAGE
Figure G.1 (Continued)
000400 BEGIN:
	 TIN
oo04Jf0 joeb ;	 INPUT IMAGE FOR SWELLING
 000401 CONT
00)401 Ti V C.A000401 301 , SAVE IMAGE IN REGISTER C
0004«2 000
000408 141
. 000,404 000
o00^Jf5 ^ 4
000406 064
000407 TCLRI000407 144 ; CLEAR INDEX REGISTER
000410 000
000411 327
000412 001
000413 aS6
000414 TIDA000414 322 ; RECOGNIZE INDEX THREE
000415 006
000&16 000
0004!7 261
000420 243 .
000421 207
000422 2IG
000423 444
000424 TIDA 4 000424 RECOGNIZE INDEX FOUR
.000425 /0\
00042& 260
0.00427 ^l
OGO4SO 203
000401 206
0004&2 215
000^^ 200
Figure G.2 A program for performing the Golay transform
swelling algorithm.
000424
000404 332
000455 006
000426 240
000457 201
000440 202
000441 204
000442 210
000445 220
000444
000444 031
000445 001
000446 120
000447 000
000450 041
000351 S6!
000452
000452 371
0004= 001
000454 010
000455 001
000456 370
000457 270
000460
`	 000460 144
000461 000
00)4%2 527
000460 001
000464 356
TIDA 5 ; RECOGNIZE INDEX FIVE
TANDN A,I,M1 ; PERFORM
; THE GOLAY FUNCTION OPERATION
TOR I,M1 ; ON THE FIRST SUBFIELD IN TWO
; OPERATIONS
TCLRI ; CLEAR INDEX REGISTER
Figure G.2 (Continued)
000468	 TIDA
	
3
	 ; RECOGNIZE INDEX THREE
000465
	 sat
000466	 006
000467	 270
000470	 261
000471	 z40
000472	 207
000473	 216
000474
	
204
000475	 TIDA	 4	 : RECOGNIZE INDEX FOUR
000475	 Sat	 .
030476	 006
000477	 260
00oso0	 251
000301
	 208
000502	 206
000305	 215
000504	 230
00010§
	 TIDA	 5	 ;'RECOGNIZE INDEX FIVE
000303	 132
000506	 006
000507	 240
000510
	 201
000511	 202
000512	 204
000513	 210
000514	 220
Figure G.2 (Continued) .
.	 .	 .i
000=5
000515 331
000516 002
000517 154
000524 000
000521 541
003522 361
00052:3 
000 523 331
000124 002
000525 010
000526 001
000527 270
0GOS3 } 270
0005.1
00053i 144
000512 000
000s= 227
000514 OO 1
00005S5 -.x`16
00 506
000 O& 302
000517 004-
000540 270
000541 261
000542 243
000549 207
000044
000545 214
TANDN A,I,M2 ; PERFORM THE GOLAY FUNCTION OPERATION
TOR I,M2 ; ON THE SECOND SUBFIELD IN TWO
; OPERATIONS
TCLRI ; CLEAR INDEX REGISTER
TIDA 3 ; RECOGNIZE INDEX THREE
Figure G.2 (Continued)
TIDA 4 ; RECOGNIZE INDEX FOUR
TIDA 5 ; RECOGNIZE INDEX FIVE
TANDN A,I,M3 ; PERFORM GOLAY FUNCTION OPERATION
TOR I,M3 ; ON THE THIRD SUBFIELD IN TWO
; OPERATIONS
Figure G.2 (Continued)
000SK,
000546 332
000547 006
000550 260
0005s1 251
000552 200
000552 206
000554 213
000550 230
000556
000556 Sj2
000357 006
000060 240
000561 20!
000562 202
000160 214
000564 210
005565 220
050066
000566 Sal
000367 007
000070 !SO
00&37! 000
000572 341
000572 361
000574
000574 S81
000575 007
000576 0I0
005577 001
000600
00050! 270
000602 TCrI` P	 c, M
000602 333 
000603 104
000604 205
000605 001
00OW& TSNZ	 CONTINUE
; COMPARE RESULT TO IMAGE STORED IN
; REGISTER C
; BRANCH TO PERFORM ANOTHER ITERATION
005,50 21
0?0407 072
00 6io 001
000611L TrJU'r
000611 331
000612 000
0006ij 021
000614 001
00061b 170
000616 370
000617 CHECK:	 B2
000617 065
00620 000
000421 ER
000621 060
000622 217
000620	 END
; IF DIFFERENT
A ; IF NO DIFFERENCE OUTPUT THE SWELLED
; IMAGE
BEGIN ; CHECK FOR NEW IMAGE INPUT REQUEST AND
CHECK ; CONTINUE CHECKING IN A LOOP UNTIL ONE
; OCCURS
Figure G.2 (Continued)
