Architecture and data processing alternatives for the TSE computer.  Volume 3:  Execution of a parallel counting algorithm using array logic (Tse) devices by Bodenheimer, R. E. & Metcalfe, A. G.
General Disclaimer 
One or more of the Following Statements may affect this Document 
 
 This document has been reproduced from the best copy furnished by the 
organizational source. It is being released in the interest of making available as 
much information as possible. 
 
 This document may contain data, which exceeds the sheet parameters. It was 
furnished in this condition by the organizational source and is the best copy 
available. 
 
 This document may contain tone-on-tone or color graphs, charts and/or pictures, 
which have been reproduced in black and white. 
 
 This document is paginated as submitted by the original source. 
 
 Portions of this document are not fully legible due to the historical nature of some 
of the material. However, it is the best reproduction available from the original 
submission. 
 
 
 
 
 
 
 
Produced by the NASA Center for Aerospace Information (CASI) 
https://ntrs.nasa.gov/search.jsp?R=19760026765 2020-03-22T13:26:39+00:00Z
National Aeronautics and Space Administration,
Goddard Space Flight Center
Greenbelt, Maryland 20771
FINAL REPORT. Contract NSG-5002
Architecture and Data Processing
Alternatives for the Tse Computer
VOLUME 3: Execution of a Parallel
Counting Algorithm Using Array Logic
(Tse) Devices
A. G. Metcalfe
R. E. Bodenheimer
TECHNICAL REPORT TR-EE/CS-76-3
September 1976

ABSTRACT
A new family of digital logic elements, known as tse logic devices,
has been proposed by D. H. Schaefer and J. P. Strong at the Goddard
Space Flight Center, Greenbelt, Maryland. Tse logic elements are
parallel computing elements which implement primitive logical functions
concurrently at each position of a two-dimensional binary array. The
purpose of this research was to examine different tse hardware structures
for performing a certain task.
A parallel algorithm for counting the number of logic-1 elements in
a binary array or image was developed at GSFC during preliminary
investigation of the tse concept. After summarizing the research at
GSFC, the counting algorithm is implemented using a basic combinational
structure. Modifications which improve the efficiency of the basic
structure are also presented. A programmable tse computer structure is
then proposed, along with a hardware control unit, tse instruction set,
and software program for execution of the counting algorithm. Finally,
a comparison is made between the different structures in terms of their
more important characteristics. To more clearly illustrate the projected
advantages of tse logic, a program to perform the same task was written
for a conventional binary processor and included in the comparison.
TABLE OF CONTENTS
CHAPTER                                                            PAGE
1. INTRODUCTION . . . 1
2. TSE LOGIC DEVICES . . . 5
3. A PARALLEL COUNTING ALGORITHM . . . 14
     Modified forms of the counting algorithm . . . 25
       Generating sums in sectors of an image . . . 25
       Simplification of the algorithm for clustered elements . . . 30
4. TSE HARDWARE IMPLEMENTATION OF THE COUNTING ALGORITHM . . . 34
     Combinational circuit approach . . . 34
     Pipeline network implementation . . . 38
     Implementation using a programmable tse processor . . . 41
       Hardware considerations . . . 41
       Software considerations . . . 54
     Modified forms of the counting algorithm . . . 64
5. COMPARISONS AND CONCLUSION . . . 66
LIST OF REFERENCES . . . 74
VITA . . . 76
LIST OF TABLES
TABLE                                                              PAGE
1. Summary of Modifications per Iteration . . . 26
2. Tse Computer Instruction Set . . . 55
3. Tse Processor Instructions . . . 58
4. Tse Program Control Instructions . . . 59
5. Tse Computer Program for Execution of the Counting Algorithm . . . 60
6. Program for Execution of the Counting Algorithm on the IBM 360 . . . 68
7. Summary of Image Processing Times for Tse and Conventional Implementations . . . 70
8. Physical Characteristics of Tse Implementations of the Counting Algorithm . . . 72
LIST OF FIGURES
FIGURE                                                             PAGE
1. A Two tse Input, Digital AND Gate . . . 6
2. Use of DUPLICATORS to Increase Effective Fan-Out of a tse Device to Four . . . 8
3. Method of Implementing SLIDE Operation . . . 9
4. An Example of Elementary tse Operations on Typical Images . . . 10
5. Control Method Used for Switching of Image Paths . . . 12
6. A tse Plane in Which Each 1-Element Represents a Classified Region Whose Area is Desired . . . 15
7. The Result of Applying the Counting Algorithm to the Binary Image of Figure 6 . . .
8. Result After Each Iteration of the Counting Algorithm, Indicating Partial Summing Method . . . 18
9. The Results of Applying the Steps in the Algorithm to Image A for One Iteration . . . 20
10. The Results and Significance of Applying the Algorithm to Image A After Each Iteration . . . 24
11. Example of Partial Summing Where an Error is Generated . . . 28
12. Example of Modified Counting Algorithm . . . 31
13. Example of Simplification of the Algorithm . . . 33
14. Combinational Network Implementation . . . 35
15. Contents of a Typical Box for the Combinational Circuit Implementations . . . 36
16. Pipeline Network Implementation to Increase Image Processing Rate . . . 39
17. Latch Implementation for the tse Register . . . 40
18. A Microprogram, Microprocessor Control Unit Concept for the tse Processor . . . 44
19. Block Diagram for the tse Processor . . . 46
20. Organization of the tse Logical Operations Unit . . . 47
21. Image Buss with Conditional and Monitor Devices . . . 49
22. Organization of the tse Image Registers . . . 50
23. Organization and Content of tse ROM Registers . . . 51
24. Implementation of a Horizontal Sweep Device [5] . . . 52
CHAPTER 1
INTRODUCTION
Since the advent of the first computers, a great deal of effort has
been devoted to the task of developing processors with improved data
processing rates.	 Technological	 innovations have been the primary
contribution to the improvements which have been realized throughout the
history of the computer. 	 However, there is evidence that the speed at
which present-day components can operate is fast approaching the limit
at which electronic signals can propagate [1]. 	 Thus, refinements in
areas other than the speed of semiconductor devices will provide the
probable source for significant increases in data processing rates in
the future.
One such area which has gained interest in recent computer
developments is the concept of parallel processing. The term "parallel
processor" is used to describe a computer whose Arithmetic Logic Unit
is structured to operate on each bit of an n-bit operand concurrently.
This term is sometimes also used to describe computer architectures in
which a number of different instructions may be executing at any one
time.	 However, a description of such processors is beyond the scope of
this investigation.
	
Of particular interest in this research is the
array processor, a special type of parallel-data processor. The array
processor, in general, addresses and operates upon large blocks of bits,
usually multidimensional in their arrangement.
The concept of the parallel processor is actually not new.
Virtually all computers in use today exhibit some degree of parallelism
in that they are word-oriented machines. This is justified by the fact
that some of the most common forms of data handled by these machines
(for example, ASCII or BCD information) occur most frequently as a
group of bits. The processing of such data would become unnecessarily
cumbersome if handled bit-by-bit. Therefore, the word-oriented
processor is favored over a completely serial one. In fact, the
word-oriented computer has become the most highly developed and widely used
form of information processing machine in general use at this time.
Only recently have more highly parallel 	 processors been given
serious consideration as practical tools for information processing.
Although the advantages of such processors are many, their development
has been limited by such factors as cost, size, and maintenance
considerations, which are due to the increased component count and
number of connections. Although some array processors such as SPAC [2],
SOLOMON [3], and ILLIAC IV [4] have been proposed, few have actually
reached an operational status (the ILLIAC IV has been partially
completed), and virtually none have found widespread general use. One
factor which is expected to contribute to the development of parallel
processors is the current state of integrated circuit technology. This
technology allows the fabrication of large scale cellular component
structures at a reasonable cost and small size.
Insofar as present-day computers are concerned, they are well
organized to handle much of the data which they encounter, since these
data occur mainly in a word-oriented fashion. However, one form of
data which becomes cumbersome to process is the digitized image. Even a
low resolution, simple binary image would be digitized to a minimum of
about 10^4 bits, and, at present, almost all images are processed in a
highly serial manner using conventional processors. In order to
optimize speed and efficiency, a processor capable of handling at least
as many bits as there are image elements would be required. Hence,
image processing is a very likely field for the expansion of parallel
array processors. In particular, the organization of an image processor
would necessarily be two-dimensional, implying communication between
horizontal and vertical neighboring elements, instead of simply
employing a large number of bits which are spatially unrelated.
The success of the Earth Resources Technology Program (ERTS-1, now
known as LANDSAT-1) has led to the consideration of parallel processing
for the development of practical and efficient methods for identification
and classification of earth resources. The fact that the LANDSAT-1
satellite images cover approximately six million square kilometers per
day has provided the main challenge to the NASA Data Processing Facility
for the retrieval and processing of these data. Among the most promising
programs which have resulted from this challenge is one at the NASA
Goddard Space Flight Center which has projected the development of a
family of two-dimensional parallel logic devices. In essence, each of
the logic devices represents a computing element whose array size is the
same as the number of picture elements (or pixels) in the image and can
perform a primitive logical operation concurrently at each image
position. Currently, the utilization of fiber optics is being considered
for the fabrication of these devices. Projected refinements in fiber
optics technology indicate that parallel computing elements could be
constructed which are faster, less power-consuming, and possibly even
smaller than conventional electronic components [5].
In the second chapter of this thesis, the work of Schaefer and
Strong related to two-dimensional logic devices is discussed. The
third chapter presents Strong's counting algorithm, and the fourth
chapter is devoted to the hardware implementation of the algorithm. In
the fifth chapter, the merits of the different hardware implementations
are compared, and conclusions are presented based on these comparisons.
CHAPTER 2
TSE LOGIC DEVICES
Consider an image composed of a 512 x 512 rectangular array of
picture elements in which the gray level of each element is quantized to
six bits. There are over 1.5 x 10^6 bits of information in this image.
Another way to visualize a digitized image is as six binary image
planes, each plane containing 512 x 512 bits. The binary image plane or
bit plane is a two-dimensional binary data array called a "tse." The
origin of the term "tse" is the result of an analogy drawn between
binary bits and words of the English language. Just as the Chinese
language makes use of single symbols which represent many English words,
the binary array represents many binary bits. The term "tse" is the
transliteration of the Chinese word for the pictograph character, and
thus has been adopted as the word for the binary data array [5].
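The bit-plane view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `bit_planes` and `total_bits` are hypothetical, an image is represented as a list of rows of integer gray levels, and plane 0 is taken to be the least significant bit.

```python
def bit_planes(image, depth):
    """Split a gray-level image into `depth` binary planes (tses).

    Plane k holds bit k of every pixel; plane 0 is the least
    significant bit (an assumed ordering, not stated in the report).
    """
    return [[[(pixel >> k) & 1 for pixel in row] for row in image]
            for k in range(depth)]


def total_bits(rows, cols, depth):
    """Number of bits in a rows x cols image quantized to `depth` bits."""
    return rows * cols * depth


# A tiny 2 x 2 image with six-bit gray levels.
tiny = [[5, 63],
        [0, 32]]
planes = bit_planes(tiny, 6)

print(planes[0])                 # least significant plane
print(planes[5])                 # most significant plane
print(total_bits(512, 512, 6))   # the 512 x 512, six-bit case from the text
```

For the 512 x 512, six-bit image the total is 1,572,864 bits, which is the "over 1.5 x 10^6" figure quoted in the text.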
A family of tse logic devices which utilize electro-optical
technology and which are capable of performing simple, parallel logical
operations simultaneously on one or two tses has been proposed by
Schaefer and Strong [5, 6]. Figure 1 illustrates a tse gate capable of
ANDing two binary image planes. A tse gate consists of two parts, an
interleaver and an electro-optical threshold device. The interleaver is
a passive device which consists simply of two bundles of n^2 optical
fibers, where n x n is the size of the bit plane for which the gate is
designed. These bundles are merged or interleaved such that corresponding
positional elements in the Image A and Image B inputs are combined to the
same elemental position at the interface of the electro-optical device.
When used in this manner, the interleaver is referred to as a combiner.
The electro-optical device is an active integrated circuit which
converts the optical inputs to electrical signals which are logically
ANDed in a conventional manner. Electrical signals at the output are
converted to an optical output by an electro-luminescence process.
Since only one fiber bundle can be connected directly to the output,
the fan-out of a tse logic gate is one. In order to increase the
effective fan-out, one or more interleavers can be used, in a reverse
manner, at the output of a tse gate. An interleaver is referred to as a
duplicator when used in this manner. Since each output element of a
duplicator is one-half the intensity of the input, the original light
intensity must be restored before the outputs can be used. Therefore,
each output from the duplicator must interface to a reformator, which is
an active tse buffer device used to restore the proper optical signal
levels. Figure 2 demonstrates how the effective fan-out from the AND
gate can be increased to four.
In addition to the AND operation, other primitive operations can be
implemented. The OR, EXCLUSIVE-OR, NEGATE and SLIDE operations are
implemented in a similar manner as the AND gate, except that the
single-operand devices need not include the combiner at the input. The SLIDE
operation is an image translation in the UP, DOWN, RIGHT or LEFT
direction. Conceptually, this operation is generated by interfacing two
fiber bundles with a physical offset, as illustrated in Figure 3. The
results of performing these primitive operations on typical images are
depicted in Figure 4.
[Figure 2. Use of DUPLICATORS to increase effective fan-out of a
tse device to four. Diagram labels: Image A and Image B inputs;
interleaver as COMBINER; tse AND; interleavers as DUPLICATORS;
tse REFORMATOR; four fan-out outputs, each Image A AND B.]
[Figure 4. An example of elementary tse operations on typical
images. Panels: Image A; Image B; A OR B; A EX-OR B; INVERT A;
SLIDE B RIGHT. Legend: LOGICAL 1, LOGICAL 0.]
Tse logic gates which implement functions other than the primitive
logical operations have been proposed. One of these gates, the
contractor, has been found to be useful in most tse computer structures.
The contractor is a control device which indicates the presence of any
1-elements in a tse. If there are no 1-elements in any position of the
binary image, the output of the device is logic-0, otherwise the output
is logic-1. This device is different in that the input is a tse, but
the output is a single-bit logic signal. A device of this type is
necessary for implementing conditional image operations. Other special
tse logic devices can be found in Reference 5, Appendix A.
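The primitive operations and the contractor described above can be modeled in software for experimentation. The following is a minimal sketch, not a description of the optical hardware: a tse is represented as a list of rows of 0/1 integers, the function names are hypothetical, and it is assumed that a SLIDE discards elements moved past the edge and fills vacated positions with logic-0.

```python
def tse_and(a, b):
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tse_or(a, b):
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tse_xor(a, b):
    return [[x ^ y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tse_negate(a):
    return [[1 - x for x in row] for row in a]


def slide(a, direction, distance=1):
    """Translate a tse; vacated positions become 0 (an assumption)."""
    rows, cols = len(a), len(a[0])
    dr, dc = {"UP": (-1, 0), "DOWN": (1, 0),
              "LEFT": (0, -1), "RIGHT": (0, 1)}[direction]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr * distance, c + dc * distance
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = a[r][c]
    return out


def contractor(a):
    """Single-bit output: 1 if the tse contains any 1-element, else 0."""
    return int(any(any(row) for row in a))


image_a = [[1, 0],
           [0, 1]]
image_b = [[1, 1],
           [0, 0]]
print(tse_and(image_a, image_b))
print(slide(image_b, "RIGHT"))
print(contractor(tse_and(image_a, tse_negate(image_a))))
```

Each function touches every array position independently, which is the software counterpart of a tse gate acting on all positions concurrently; only the contractor reduces a whole tse to one bit.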
The basic tse gates can be interconnected in much the same manner
as conventional logic components to form structures which perform useful
functions. In order to realize more efficient utilization of components
in a complex tse structure, some method of controlling the propagation
of images must be provided. To facilitate the switching of paths along
which an image will travel, all active tse devices are assumed to have a
one-bit control line for turning the electro-luminescence on and off.
In the off state, the output tse is a zero-tse; that is, all elements in
the array are logic-0. The use of this control scheme is illustrated in
Figure 5. Assuming that only one of the three control lines is active
at any time, the circuit can execute a SLIDE UP, SLIDE DOWN, or a
NO-OPERATION.
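The control scheme can be illustrated with a small software model. This sketch rests on assumptions that go beyond the text: the three gated paths are merged by an element-wise OR (standing in for an optical combiner), vacated positions after a slide are logic-0, and all function names are hypothetical. Figure 5 itself is not reproduced here, so the actual circuit may differ in detail.

```python
def slide_vertical(image, dr):
    """Move every element dr rows down (dr < 0 moves up), zero-filling."""
    rows, cols = len(image), len(image[0])
    return [[image[r - dr][c] if 0 <= r - dr < rows else 0
             for c in range(cols)] for r in range(rows)]


def gated(image, control):
    """An active device: passes its image when on, emits a zero-tse when off."""
    if control:
        return image
    return [[0] * len(image[0]) for _ in image]


def switch(image, up, down, nop):
    """Three parallel paths, one per control line, merged element-wise."""
    paths = [gated(slide_vertical(image, -1), up),
             gated(slide_vertical(image, 1), down),
             gated(image, nop)]
    return [[paths[0][r][c] | paths[1][r][c] | paths[2][r][c]
             for c in range(len(image[0]))] for r in range(len(image))]


img = [[1, 0],
       [0, 1]]
print(switch(img, up=1, down=0, nop=0))
print(switch(img, up=0, down=1, nop=0))
print(switch(img, up=0, down=0, nop=1))
```

Because the inactive paths contribute zero-tses, the merged output equals the output of the single active path, which is why only one control line may be active at a time.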
In the sections which follow, a parallel algorithm for counting
the number of 1-elements in a binary image will be implemented using tse
logic devices. To provide a basis for comparison, different tse hardware
CHAPTER 3
A PARALLEL COUNTING ALGORITHM
In earth resource applications, techniques of pattern recognition
are applied to the classification of terrain or surface features. After
the classification process, the measurement of the area of an identified
region is desired in many instances. For example, a typical application
of earth resource technology might involve the mapping of the bodies of
water in a certain land region. After the portion representing water is
identified in each frame, the total area of the water surface is
desired, since this information is important to the classification of
the land region, by percentage, of water. Figure 6 illustrates a tse
image plane of a typical frame after the identification of bodies of
water. The desired area is represented by the 1-elements. Each
element in the region classified contributes a partial area to the
total. The problem of area measurement is solved by counting the
number of 1-elements in the tse. In a conventional digital computer,
this measurement is attained through a sequential decision process, one
element at a time. The parallel counting algorithm of Strong
[5, Appendix F] achieves the solution differently. A few of the descriptive
characteristics of the algorithm are presented in this section.
The counting algorithm is applicable to any binary image (or tse),
2^m rows by 2^n columns, where m and n are any nonzero, positive integers.
After the application of the algorithm, the result is also in the form
of an image.	 However, the desired information, which is a binary
number indicating the number of 1-elements in the image, is found in the
bottom row of the image.	 The remaining elements of the image are all
zero. This is illustrated in Figure 7 for m, n = 3. The original image
of Figure 6 is seen to contain 23 1-elements. Figure 7, which is the
result of the application of the algorithm, has the number 00010111 in
the bottom row.	 This is seen to be the binary representation of the
number 23.	 The method by which the image of Figure 7 is generated from
the original	 image is outlined as follows.
Basically, the counting process which is carried out by the algorithm
consists of a number of iterations, each of which generates partial 	 sums
over the image.	 Each successive iteration generates these sums over
larger areas, the final	 area being the entire image. 	 To illustrate the
process, consider the 4 x 4 binary image shown in Figure 8(a). Of course,
the 1-elements and 0-elements in the image represent the logic level at
each position. However, for this discussion, consider each as a one-bit
binary number, a 1-element representing one unit of area to be contributed
to the total, and a 0-element representing no area to be contributed. In
the first iteration, each number in the first column and third column is
added to the number immediately to its right. Thus, eight additions of
two elements each are performed in parallel and the result is as shown in
Figure 8(b). The eight groups or sectors over which the additions were
generated are indicated in Figure 8(b), the encircled numbers being the
two-bit first partial sums. In the second iteration, each group (partial
sum) is added to the one immediately to its right, thus generating four
second partial sums, as shown in Figure 8(c). Note that each group
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 1 0 1 1 1
Figure 7. The result of applying the counting algorithm to the
binary image of Figure 6.
1 1 0 1      1 0 0 1      0 0 1 1
0 1 1 1      0 1 1 0      0 0 1 1
0 0 1 0      0 0 0 1      0 0 0 1
0 0 0 0      0 0 0 0      0 0 0 0
(a) Original
(b) Result after first iteration
(c) Result after second iteration

0 0 0 0      0 0 0 0
0 1 1 0      0 0 0 0
0 0 0 0      0 0 0 0
0 0 0 1      0 1 1 1
(d) Result after third iteration
(e) Result after fourth iteration
Figure 8. Result after each iteration of the counting algorithm,
indicating partial summing method.
encompasses a full row, and the number in the group represents the
number of 1-elements which were in that row of the original image
(Figure 8(a)). Since the maximum number of 1-elements in any row is
four, and only three bits are necessary to represent the sum of the
elements in that row, the sum is right-justified. In the third
iteration, the row sums are added to generate two partial sums, each over
two rows, and, in the final iteration, the two partial sums are
combined to form the sum over the entire image. Note here that, in
addition to being right-justified, the sum in each group appears in
the bottom row of the group.
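The partial-summing idea in the description above can be checked numerically before turning to the bit-level steps. The sketch below is an illustration only: it works on ordinary integers rather than on bits laid out within the image, the function name `partial_sum_stages` is hypothetical, and the 4 x 4 array is the original image of Figure 8(a).

```python
def partial_sum_stages(image):
    """Return the group sums after each iteration.

    Horizontally adjacent groups are merged first until each row is a
    single group, then vertically adjacent groups are merged.
    """
    grid = [list(row) for row in image]
    stages = [grid]
    while len(grid[0]) > 1:          # column-merging iterations
        grid = [[row[i] + row[i + 1] for i in range(0, len(row), 2)]
                for row in grid]
        stages.append(grid)
    while len(grid) > 1:             # row-merging iterations
        grid = [[a + b for a, b in zip(grid[i], grid[i + 1])]
                for i in range(0, len(grid), 2)]
        stages.append(grid)
    return stages


figure_8a = [[1, 1, 0, 1],
             [0, 1, 1, 1],
             [0, 0, 1, 0],
             [0, 0, 0, 0]]

for stage in partial_sum_stages(figure_8a):
    print(stage)
```

The four merged stages correspond to Figures 8(b) through 8(e): eight two-element sums, four row sums, two sums over pairs of rows, and finally the total of 7, which appears as 0111 in the bottom row of Figure 8(e).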
As evidenced by the above description, the algorithm is inherently
parallel.	 The algorithm could, of course, be implemented by various
methods, such as by conventional programmed processors. 	 However, the
parallel	 nature of the algorithm makes it particularly well 	 suited to the
concept of tse logic. In the following paragraph, one iteration of the
algorithm is reduced to a number of steps and described in terms of the
necessary tse operations which were defined in the previous chapter.
To complement the illustration of the algorithm, an arbitrary image
is used to show the effect of the execution of each step. For simplicity,
the 8 x 8 (m, n = 3) image shown in Figure 6 (page 15) is chosen as the
original, and is identified in the algorithm as Image A.
STEP 1. Create a new image, Image B, by performing a SLIDE 1
        RIGHT operation on Image A. The result of this step
        is shown in Figure 9(a).
STEP 2. Mask the odd-numbered columns of both Image A and
        Image B. This forces these columns to contain all
        zeros. The masked images are labeled Image C and
        Image D, respectively, and the result of this step
        is shown in Figure 9(b).
STEP 3. The AND and EXCLUSIVE-OR of Image C and Image D
        are generated and are labeled Image E and Image F,
        respectively. The result of this step is shown
        in Figure 9(c).
STEP 4. Image E is checked for all zeros. If Image E is
        all zeros, the iteration is complete. If Image E
        contains any 1-elements, a SLIDE 1 LEFT operation is
        performed on Image E. The result is labeled Image C,
        and Image F is relabeled as Image D. The result of
        performing this step is shown in Figure 9(d).
STEP 5. Repeat Steps 3 and 4 until Image E is all zeros.
        In the example, Steps 3 and 4 are repeated once.
        The result of performing this step is shown in
        Figure 9(e).
[Figure 9. The results of applying the steps in the algorithm to
Image A for one iteration. Panels: (a) Step 1, Image A and Image B;
(b) Step 2, Image C and Image D; (c) Step 3, Image E and Image F;
(d) Step 4, Image C and Image D; (e) Step 5, Image E and Image F.]
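The five steps above can be traced with a short Python sketch of the first iteration. Several things here are assumptions rather than facts from the report: the 8 x 8 array `IMAGE_A` is transcribed from the partly legible Figures 9 and 10 (it does contain 23 1-elements, as the text states, but individual entries may differ from Figure 6), slides are taken to fill vacated positions with logic-0, columns are numbered from 1 at the left, and all function names are hypothetical.

```python
IMAGE_A = [[0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 1, 1, 1, 1],
           [0, 0, 0, 0, 1, 1, 1, 1],
           [0, 0, 0, 0, 0, 0, 1, 1],
           [0, 0, 1, 1, 0, 0, 1, 1],
           [0, 1, 1, 1, 0, 0, 0, 1],
           [0, 1, 1, 1, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 0, 0, 0]]


def slide_horizontal(img, dx):
    """SLIDE dx RIGHT (dx < 0 slides LEFT); vacated positions become 0."""
    cols = len(img[0])
    return [[row[c - dx] if 0 <= c - dx < cols else 0 for c in range(cols)]
            for row in img]


def mask_odd_columns(img):
    """Zero columns 1, 3, 5, ... (1-based), i.e. even 0-based indices."""
    return [[v if c % 2 == 1 else 0 for c, v in enumerate(row)]
            for row in img]


def first_iteration(a):
    b = slide_horizontal(a, 1)                        # Step 1
    c, d = mask_odd_columns(a), mask_odd_columns(b)   # Step 2
    repeats = 0
    while True:
        e = [[x & y for x, y in zip(rc, rd)] for rc, rd in zip(c, d)]  # Step 3: carries
        f = [[x ^ y for x, y in zip(rc, rd)] for rc, rd in zip(c, d)]  # Step 3: sums
        if not any(any(row) for row in e):            # Step 4: contractor test
            return f, repeats
        c, d = slide_horizontal(e, -1), f             # Step 4: SLIDE 1 LEFT, relabel
        repeats += 1                                  # Step 5: repeat Steps 3 and 4


result, repeats = first_iteration(IMAGE_A)
for row in result:
    print(row)
print("Steps 3 and 4 repeated:", repeats)
```

Under these assumptions, each adjacent pair of columns in the result holds the two-bit count of 1-elements in the corresponding pair of Image A, and the loop repeats Steps 3 and 4 once, in agreement with the example in Step 5.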
These steps describe the first iteration of the algorithm. 	 All
subsequent iterations use the same steps, with the exception that Step 1
and Step 2 are modified. The magnitude and direction of the slide in
Step 1 and the pattern of the masking in Step 2 are different for each
successive iteration. These differences are outlined later. First,
consider the effect of each step outlined above within the first
iteration.
Step 1 creates a new Image B from the original Image A in which
every element occupies the same position as the one to its right in the
original image. Step 2, by masking the odd columns from both Image A and
Image B, creates Image C, which consists only of the even-numbered
columns of the original image, and creates Image D, which consists only
of the odd-numbered columns of the original image, displaced
one position to the right. No information is lost in the masking process,
as might first be concluded. When Image A is masked to create Image C,
the effect is to retain the even-numbered columns of the original image.
On the other hand, when Image B (same as Image A, displaced by one
column) is masked to create Image D, the effect is to retain the
odd-numbered columns of the original. Therefore, all information which was
contained in the original image has been retained, and none is lost.
The masking process actually removes redundant information from the
images. Step 3 adds, independently and concurrently, each element in
Image C to the corresponding element in Image D. The sums (result of
the EXCLUSIVE-OR operation) are placed in Image F and the carries
(result of the AND operation) in Image E. Step 4 checks Image E to
determine whether or not any carries were generated. If not, Image F is
the result of the iteration. If any carries were generated, they must
be added to the sums, using another EXCLUSIVE-OR and AND operation, thus
generating another sum image and carry image (Image F and Image E).
Step 5 indicates that the adding operation of Step 4 is repeated until
the carry image shows all zeros, thus indicating that the addition has
been completed. Note that Steps 3, 4, and 5 describe an addition
process which is completely analogous to the operation of a conventional
ripple-carry adder circuit [5, Appendix F]. Step 3 implements a function
similar to that of the first half-adder in each cell of a conventional
adder. Steps 4 and 5 represent the method by which the second
half-adder of each cell adds the incoming carry to the sum from the first
half-adder of the cell to generate an outgoing carry to be used by the
next cell. Upon completion of all five steps, Image F is the result of
the first iteration. This image is the original for the next iteration.
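The addition process of Steps 3, 4, and 5 has a familiar word-level analogue, sketched below on ordinary Python integers. This is an illustration of the same EXCLUSIVE-OR / AND / shift-left loop, not code from the report; the function name is hypothetical, and the left shift of the carry plays the role of the SLIDE 1 LEFT applied to Image E.

```python
def add_by_carry_propagation(x, y):
    """Add two non-negative integers using only AND, XOR and a left shift.

    Returns the sum and the number of passes through the XOR/AND step.
    """
    passes = 0
    while True:
        carry = x & y          # counterpart of Image E (AND)
        total = x ^ y          # counterpart of Image F (EXCLUSIVE-OR)
        passes += 1
        if carry == 0:         # counterpart of the all-zeros check on Image E
            return total, passes
        x, y = carry << 1, total   # slide the carries left and add again


print(add_by_carry_propagation(2, 1))
print(add_by_carry_propagation(3, 5))
```

The number of passes depends on how far carries must ripple, which is why the tse version needs the contractor to decide when an iteration is finished rather than using a fixed number of repetitions.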
Each successive iteration performs the same basic operation over
larger groups, the final iteration being the one which generates the
sum over the entire image. Figure 10 illustrates the result after
each iteration, along with the sectors over which summing is performed.
The fact that any sector (encircled areas in Figure 10(a) through (f))
contains a binary number representing the number of 1-elements in the
corresponding sector of the original (Image A) can be readily verified
from the figure.
As previously stated, Step l and Step 2 must be modified for each
ry
iteration.	 For instance,	 the second iteration of the algorithm differs
from the first,in that the slide operation is a SLIDE 2 RIGHT and the
columns that are masked are the first, and second, fifth and sixth, r;
ninth and tenth, and so forth.	 Using the same approach as presented in
the first iteration, the sum over horizontal	 groups of four will	 be {k
r
generated when the second iteration is performed on the result of the
first iteration,	 as shown	 in Figure	 1'0(b).
a
Modification of the subsequent iterations is similar, until each
horizontal group is the length of an entire row, as shown in
Figure 10(c). At this point, the rows must be added to one another, in
much the same manner as groups were added before. Therefore, Step 1 of
the next iteration will be a SLIDE 1 DOWN, and the odd-numbered rows,
instead of columns, will be masked in Step 2. From that point,
modifications are the same numerically as before, retaining the DOWN
slide direction and row masking. Table 1 gives the necessary
modifications as a function of the number of the iteration being
performed.

[Figure 10: Image A (8 x 8) followed by six 8 x 8 result panels, with
the summed sectors encircled:
(a) Result After First Iteration, SLIDE 1 RIGHT;
(b) Result After Second Iteration, SLIDE 2 RIGHT;
(c) Result After Third Iteration, SLIDE 4 RIGHT;
(d) Result After Fourth Iteration, SLIDE 1 DOWN;
(e) Result After Fifth Iteration, SLIDE 2 DOWN;
(f) Result After Final or Sixth Iteration, SLIDE 4 DOWN.]

Figure 10. The results and significance of applying the algorithm to
Image A after each iteration.
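
The iteration schedule described above can be sketched in Python. This
is only an illustration, not the report's implementation. The helper
names are hypothetical, an image is a list of rows of 0/1 values, and
the sketch assumes that both the image and its slid copy are ANDed with
the same mask before being added, with carries slid one place left
until none remain.

```python
def slide_right(img, s):
    """SLIDE s RIGHT: every row moves s columns right; zeros enter on the left."""
    return [[0] * s + row[:len(row) - s] for row in img]

def slide_down(img, s):
    """SLIDE s DOWN: every row moves s rows down; zero rows enter at the top."""
    width = len(img[0])
    return [[0] * width for _ in range(s)] + [row[:] for row in img[:len(img) - s]]

def slide_left_1(img):
    """SLIDE 1 LEFT, used to move carries to the next higher bit position."""
    return [row[1:] + [0] for row in img]

def AND(a, b):
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def XOR(a, b):
    return [[x ^ y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def keep_mask(height, width, s, by_rows):
    """1 marks kept elements; the first s of every 2*s columns (or rows) are masked."""
    return [[((r if by_rows else c) // s) % 2 for c in range(width)]
            for r in range(height)]

def add_images(a, b):
    """Add two images as sets of binary numbers (least significant bit on the right)."""
    total, carry = XOR(a, b), AND(a, b)
    while any(any(row) for row in carry):
        carry = slide_left_1(carry)
        total, carry = XOR(total, carry), AND(total, carry)
    return total

def count_ones(img):
    """Slides 1, 2, 4, ... RIGHT, then 1, 2, 4, ... DOWN (the standard ordering)."""
    height, width = len(img), len(img[0])
    for by_rows, limit, slide in ((False, width, slide_right),
                                  (True, height, slide_down)):
        s = 1
        while s < limit:
            mask = keep_mask(height, width, s, by_rows)
            img = add_images(AND(img, mask), AND(slide(img, s), mask))
            s *= 2
    return img

def bottom_row_value(img):
    """Read the bottom row as a binary number."""
    return int("".join(str(bit) for bit in img[-1]), 2)

# Hypothetical 8 x 8 test image (not Image A of Figure 10).
image = [[1 if (3 * r + 5 * c) % 4 == 0 else 0 for c in range(8)] for r in range(8)]
print(bottom_row_value(count_ones(image)), sum(map(sum, image)))
```

The two printed numbers should agree: the first is read from the bottom
row of the final image, the second is a direct count of the 1-elements.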
TABLE 1
SUMMARY OF MODIFICATIONS PER ITERATION

Operation          Modifications

Step 1             SLIDE $2^{k-1}$ RIGHT,   $k \le n$
Slide Operation    SLIDE $2^{k-n-1}$ DOWN,  $k > n$

Step 2             (a) Mask following columns:
Mask Operation         For $k \le n$ and $i = 1, 2, \ldots, 2^{n-k}$:
                       $(2^k i - 2^k + 1,\ 2^k i - 2^k + 2,\ \ldots,\ 2^k i - 2^{k-1})$
                   (b) Mask following rows:
                       For $k > n$ and $i = 1, 2, \ldots, 2^{m+n-k}$:
                       $(2^{k-n} i - 2^{k-n} + 1,\ 2^{k-n} i - 2^{k-n} + 2,\ \ldots,\ 2^{k-n} i - 2^{k-n-1})$

k = Number of the current iteration.  $2^n$ = Number of columns in the image.
$2^m$ = Number of rows in the image.  m + n = Total number of iterations per image.

Modified forms of the counting algorithm. The parallel counting
algorithm is seen to be an efficient and potentially fast method of
summing elements in a binary image. Results similar to those shown can
be obtained by utilizing certain allowable variations in the basic
steps of the algorithm. With these variations, the execution of the
algorithm can possibly be greatly simplified for special types of input
images. Some modified forms of the counting algorithm are presented
below, using the image in Figure 6 (page 15) as an example.

Generating sums in sectors of an image. In the standard form of
the algorithm, the magnitude and direction of the slide in Step 1 of
each iteration is specified by the number of the iteration. For any
image of $2^m$ rows by $2^n$ columns, the order of the slide operations
is 1, 2, 4, ..., $2^{n-1}$ RIGHT, then 1, 2, 4, ..., $2^{m-1}$ DOWN.
Note, however, that the iterations need not be tied to this specific
ordering. After performing any number of the iterations containing
RIGHT slides, the iterations containing DOWN slides may be commenced.
After any number of DOWN slides, more of the RIGHT slides may be
performed, and so on. The iterations may be intermixed in any way,
subject to only two restrictions. First of all, the actual order of the
iterations containing slides of a certain direction should not be
disturbed when iterations in the other direction are inserted between
them. For example, if the iteration containing the SLIDE 4 RIGHT is
performed at a certain time, then no matter how many iterations
containing DOWN slides are performed, the next RIGHT slide iteration
will be a SLIDE 8 RIGHT. Of course, the same applies when RIGHT slides
are inserted between DOWN slides. The second restriction concerns the
size of the partially summed sectors which may be generated as a result
of performing the iterations in a different order. Certain orderings of
the iterations will generate sectors whose row lengths may not be large
enough to contain the binary number representing the number of elements
in the sector. For instance, consider the 8 x 8 image in Figure 11 to
which the algorithm will be applied. The order of the iterations is
chosen to be the SLIDE 1 RIGHT iteration, followed by the SLIDE 2 RIGHT
iteration, followed by the SLIDE 1 DOWN, SLIDE 2 DOWN, and SLIDE 4 DOWN
iterations, as depicted in the figure. After the application of these
five iterations, the result indicates partial summing over two sectors,
each 8 rows by 4 columns. However, note that there are 19 1-elements in
the left half of the original image, and that the binary form for 19
(10011) cannot be placed in the bottom row of the corresponding sector.
The most significant bit is lost, thus introducing an error in the
computation. Clearly, the result after the final iteration
(SLIDE 4 RIGHT) is incorrect.
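
The second restriction amounts to a bit-width check, which can be
sketched as follows. The function names are hypothetical, and the
sketch assumes that bits which do not fit in the sector row are simply
dropped, as the text describes for the left half of Figure 11.

```python
def bits_needed(count):
    """Number of binary digits needed to write count."""
    return max(1, count.bit_length())

def stored_value(count, sector_columns):
    """Value left in a sector row once higher-order bits are lost."""
    return count % (1 << sector_columns)

def worst_case_fits(sector_rows, sector_columns):
    """True if a completely filled sector can still be counted without overflow."""
    return bits_needed(sector_rows * sector_columns) <= sector_columns

count, columns = 19, 4              # left half of the Figure 11 image
print(format(count, "b"))           # binary form of 19
print(bits_needed(count) <= columns)
print(stored_value(count, columns))
print(worst_case_fits(8, 4), worst_case_fits(1, 8))
```

Here 19 needs five bits but the sector row offers four, so only the low
four bits (0011, that is 3) remain. The worst-case check is
conservative: an ordering that fails it can still give correct sums for
images whose sectors are sparsely filled.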
[Figure 11: 8 x 8 panels labeled Original, First Iteration, Second
Iteration, Third Iteration, Fourth Iteration, and Fifth Iteration,
followed by the final result. An "error here" annotation marks the
bottom row after the fifth iteration, and the final result is marked
"incorrect".]

Figure 11. Example of partial summing where an error is generated.

Within the restrictions, any rearrangement of the order of the
iterations will generate the same final result. The real significance
of the rearrangement is that by properly choosing the sequence, then
omitting one or more of the iterations, the end result will indicate
partial summing over a number of sectors of the original image. This
capability could be useful for some applications which require finding
average densities in different partitions of an image.
As an example, consider the 8 x 8 image shown in Figure 12(a). In
addition to finding the total number of 1-elements in the image, the
distribution of these elements over the four quadrants is also desired.
To achieve this, the order of the iterations is changed to
SLIDE 1 RIGHT, SLIDE 2 RIGHT, SLIDE 1 DOWN, and SLIDE 2 DOWN. After
these four iterations, the desired partial sums are available, as shown
in Figure 12(e). To complete the operation, the remaining two
iterations, a SLIDE 4 RIGHT and SLIDE 4 DOWN, are performed.
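
The quadrant example can be sketched in Python as shown here. This is
an illustration under stated assumptions, not the report's procedure:
each image row is held as an integer whose most significant bit is the
leftmost column, ordinary integer addition stands in for the
sum-and-carry loop, the function name `iterate` is hypothetical, and
the test image is made up rather than taken from Figure 12.

```python
def iterate(rows, width, direction, s):
    """One iteration: slide by s, mask both images, and add them."""
    full = (1 << width) - 1
    if direction == "RIGHT":
        # Keep the right-hand s columns of every group of 2*s columns.
        keep = sum(1 << (width - 1 - c) for c in range(width) if (c // s) % 2)
        a = [r & keep for r in rows]
        b = [(r >> s) & keep for r in rows]
    else:  # "DOWN"
        # Keep the lower s rows of every group of 2*s rows.
        kept = [(i // s) % 2 == 1 for i in range(len(rows))]
        slid = [0] * s + rows[:len(rows) - s]
        a = [r if k else 0 for r, k in zip(rows, kept)]
        b = [r if k else 0 for r, k in zip(slid, kept)]
    return [(x + y) & full for x, y in zip(a, b)]

image = ["00000000", "00011111", "00001111", "00000011",
         "00110011", "01110001", "01110001", "00000000"]
rows = [int(r, 2) for r in image]

# Reordered iterations: stop after four to read the quadrant sums.
for step in (("RIGHT", 1), ("RIGHT", 2), ("DOWN", 1), ("DOWN", 2)):
    rows = iterate(rows, 8, *step)
quadrants = {
    "upper left": rows[3] >> 4, "upper right": rows[3] & 0xF,
    "lower left": rows[7] >> 4, "lower right": rows[7] & 0xF,
}

# The remaining two iterations complete the total.
for step in (("RIGHT", 4), ("DOWN", 4)):
    rows = iterate(rows, 8, *step)
total = rows[7]
print(quadrants, total)
```

Each quadrant sum sits in the bottom row of its 4 x 4 sector, so the
sketch is only valid while every quadrant holds fewer than 16
1-elements.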
[Figure 12: five 8 x 8 panels, (a) through (e), showing the original
image in (a) and the partial sums over the four quadrants in (e).]

Figure 12. Example of modified counting algorithm.

Simplification of the algorithm for clustered elements. When the
elements to be counted do not cover most of the total image frame, some
of the iterations may possibly be omitted. Counting of clustered
elements, those which lie totally within some smaller area of an image,
requires only as many iterations of the algorithm as would the smallest
$2^m \times 2^n$ (m, n are integers) image which will enclose the
cluster. Once this reduced image size is determined, the partial
summing procedure described above is applied to the image. The process
is complete when the size of the partially summed sectors is the same
as the reduced image size for the cluster. Provided that every element
of the cluster was located within a single sector of the original, the
result shown in that sector will actually be the desired sum. In order
to ensure that the cluster lies entirely within the sector, a number of
slides DOWN and RIGHT should be applied to the original to relocate the
cluster to the extreme bottom right of the image. This will also cause
the result to always be located in the normal bottom right position.
Thus, by elimination of some of the iterations, the processing time can
be reduced.
The simplification is practical only if there is some method of
determining the size of the cluster, adjusting the pattern of the
iterations, and determining the number of slides needed to relocate the
cluster. In particular, if the execution of the algorithm is under the
control of some type of monitoring device, an overall reduction in
processing time may be realized by checking every image before starting
the counting process, then making the necessary adjustments, if any.

As an example, suppose that the binary image of Figure 13(a) is
the result of some classification process, and the encircled areas have
been found to be the areas of interest. However, for some reason, only
the area at the lower left is to be measured. By some further
classification process, the area at the upper right is removed, as
shown in Figure 13(b). Upon checking the image, the monitoring device
finds that the smallest area which is $2^m$ rows by $2^n$ columns and
contains all of the elements to be counted is 4 x 4 (m = n = 2). To
relocate the image to the bottom right corner, simple slide operations
are performed, resulting in the image of Figure 13(c). The partial
summing procedure is then applied, with a final sector size of 4 x 4.
Thus, the execution time for the algorithm will be shorter.

More significant reductions in processing time will be realized when
image planes larger than 8 x 8 are used.

  0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0
  0 0 0 1 1 1 1 1     0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0
  0 0 0 0 1 1 1 1     0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0
  0 0 0 0 0 0 1 1     0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0
  0 0 1 1 0 0 1 1     0 0 1 1 0 0 0 0     0 0 0 0 0 0 1 1
  0 1 1 1 0 0 0 1     0 1 1 1 0 0 0 0     0 0 0 0 0 1 1 1
  0 1 1 1 0 0 0 1     0 1 1 1 0 0 0 0     0 0 0 0 0 1 1 1
  0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0
        (a)                 (b)           (c) 4 x 4 sector
                                              at lower right

  0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0
  0 0 0 0 1 0 0 0
        (d)
  after applying complete
  algorithm for a 4 x 4 image

Figure 13. Example of simplification of the algorithm.
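
The check performed by the monitoring device can be sketched as
follows. This is a hypothetical helper, not something specified in the
report. It finds the bounding box of the 1-elements, rounds each side
up to a power of two, and reports the slides that bring the cluster
flush with the bottom and right edges, which is one way of meeting the
relocation requirement.

```python
def next_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p

def plan_cluster_count(image):
    """image: list of equal-length strings of '0' and '1'."""
    ones = [(r, c) for r, row in enumerate(image)
            for c, v in enumerate(row) if v == "1"]
    if not ones:
        return None
    top = min(r for r, _ in ones)
    bottom = max(r for r, _ in ones)
    left = min(c for _, c in ones)
    right = max(c for _, c in ones)
    sector_rows = next_power_of_two(bottom - top + 1)
    sector_cols = next_power_of_two(right - left + 1)
    return {
        "sector": (sector_rows, sector_cols),
        "slide_down": len(image) - 1 - bottom,
        "slide_right": len(image[0]) - 1 - right,
        # m + n for the reduced image: log2 of each side.
        "iterations": (sector_rows.bit_length() - 1) + (sector_cols.bit_length() - 1),
    }

# Lower-left cluster as read from Figure 13(b).
figure_13b = ["00000000", "00000000", "00000000", "00000000",
              "00110000", "01110000", "01110000", "00000000"]
print(plan_cluster_count(figure_13b))
```

For this image the plan calls for a 4 x 4 sector and four iterations,
compared with six for the full 8 x 8 image.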
CHAPTER 4
TSE HARDWARE IMPLEMENTATION OF THE COUNTING ALGORITHM
Combinational circuit approach. As described in the chapter on tse
components, tse gates may be interconnected to form more complex tse
functions in much the same manner as conventional gates are
interconnected to form more complex boolean functions. Using a
straightforward approach, one method of implementing the counting
algorithm is to simply connect the proper gates together in such a way
that the desired steps are performed on the image as it propagates
through the network. Conceptually, this is the simplest and most direct
realization of the algorithm.

The complete circuit for implementation of the algorithm may be
viewed as a group of cascaded "black boxes," where each box has only an
input tse and an output tse, and performs a single iteration of the
algorithm. This arrangement is illustrated in Figure 14. Since the
magnitude and direction of the slide and the pattern of the mask are
different for each iteration, each box will have different contents.
However, the following description of the contents of one box is a
general one, and the differences between each box can be summarized.

The hardware needed to perform one iteration of the algorithm is
shown in Figure 15, and the steps within the iteration can be readily
associated with certain gates in the structure. Tse Gate 1 is the slide
gate which performs the slide operation in the appropriate [...] mask
pattern, as outlined in Step 2, by allowing information to pass through
the AND gate where a 1 appears in the same position in the mask, and
generating a zero elsewhere.
As described in Step 3, tse Gates 4 and 5 generate the sum and
carry images, respectively. Tse Gates 6, 7, 8, and 9 effectively
perform Step 4 and Step 5, but in a somewhat different manner. The
carry image generated by Gate 5 is one input of the OR gate (Gate 6),
whose other input is initially clear. The carry image then propagates
through Gate 6 unaffected to Gate 7, where a SLIDE 1 LEFT is performed
on the image. This will displace any carry bit generated to the left,
as is expected when two binary numbers are being added. Remember,
however, that many sets of two numbers are being added simultaneously
in this case. The displaced carries are then fed back into one input of
Gates 8 and 9, where they are added to the previously generated sums.
Again, this follows directly from the case for binary numbers. When the
carries are added to the sums, new carries may be generated. These new
carries are, in turn, shifted left and added to the sum, which may
generate even more carries. Thus, carries will propagate around the
feedback loop until no new carries are generated. This is analogous to
binary addition. In fact, the entire summing operation performed by
Gates 4, 5, 6, 7, 8, and 9 is equivalent to having $2^{m+n}$ (the total
number of image elements) one-bit full adders connected in groups of p,
where p is the number of image elements being summed per group in the
particular iteration. Hence, the true potential of parallel processing
is apparent.

After the feedback loop is stabilized (that is, no new carries are
being generated), the output can be assumed to be correct. Actually,
there is no indication as to when this condition has been reached;
therefore, the amount of time required for worst-case carry propagation
delay must elapse before the output can be assumed to be stable and
correct for the next iteration.
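
The carry feedback loop can be modelled in a few lines of Python. This
is a sketch under assumptions: each image row is an integer, the gate
numbers in the comments reflect one reading of the description above
rather than Figure 15 itself, and the pass counter is added only to
show why a worst-case delay must be allowed for.

```python
def add_images(a, b, width):
    """Add two images row by row; return the sum image and the loop passes used."""
    full = (1 << width) - 1
    total = [x ^ y for x, y in zip(a, b)]   # sum image (Gate 4)
    carry = [x & y for x, y in zip(a, b)]   # carry image (Gate 5)
    passes = 0
    while any(carry):
        shifted = [(c << 1) & full for c in carry]         # SLIDE 1 LEFT (Gate 7)
        carry = [t & s for t, s in zip(total, shifted)]    # new carries
        total = [t ^ s for t, s in zip(total, shifted)]    # new sums
        passes += 1
    return total, passes

# 0111 + 0001 ripples a carry through three positions.
print(add_images([0b0111], [0b0001], 8))
# Many additions proceed at once, one per row (and per sector within a row).
print(add_images([3, 5], [1, 5], 8))
```

The number of passes depends on the data, which is why the hardware,
having no completion signal, must always wait for the longest possible
carry chain.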
The number of gates required for this implementation of the counting
algorithm for a 512 x 512 image is 360. Based on an initially projected
power consumption of 3 watts per gate, the total power required for the
circuit is 1080 watts. The processing time, taken to be the delay from
the introduction of the original image to the input of the circuit to
the time at which the output is stable (worst-case), is 756 tse gate
delays. In terms of a projected delay of 5 milliseconds per gate, the
processing time for this configuration is 3.78 seconds. This
corresponds to an image processing rate of 0.26 images per second.
These characteristics are summarized in the next chapter (see
Chapter 5, page 70) where they are also compared to those of other
implementations of the algorithm.
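
The quoted figures follow from simple arithmetic, reproduced here as a
check. The constants are the report's projections; the variable names
are chosen for this sketch.

```python
GATES = 360
WATTS_PER_GATE = 3
GATE_DELAYS = 756           # worst-case path through the circuit
MS_PER_GATE_DELAY = 5

power_watts = GATES * WATTS_PER_GATE
latency_ms = GATE_DELAYS * MS_PER_GATE_DELAY
images_per_second = 1000 / latency_ms

print(power_watts, latency_ms / 1000, round(images_per_second, 2))
# prints: 1080 3.78 0.26
```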
Pipeline network implementation. One of the most serious
disadvantages of the combinational circuit implementation is the large
propagation delay from the input to the output. Although this delay
cannot be easily reduced, a higher rate of image processing can be
realized by considering a pipeline structure, such as the one
illustrated in Figure 16. The circuit is basically the same as for the
combinational circuit implementation, except that intermediate tse
registers have been placed between each iterative box. An
implementation of the tse register is presented in Figure 17. These
registers temporarily hold intermediate results so that data flow
through the structure at a constant rate, controlled by the frequency
of the clock. The clock is set at a frequency such that its period is
slightly greater than the worst-case propagation delay of the slowest
box. Between each clock pulse, a completely new image can be placed at
the input. The result of that image will appear at the output after
m+n-1 clock pulses. Although the actual delay from the input to the
output for any single image is greater, the rate of processing (number
of images per unit time) is increased.

The pipeline implementation of the algorithm requires a total of
468 gates, representing a power consumption of 1404 watts. For a single
image, the number of gate delays is 1368, corresponding to a processing
time of 6.84 seconds. However, the processing time per image for a
number of images being input to the circuit at the clock frequency is
0.38 seconds. The image processing rate for this implementation is 2.63
images per second, an improvement over the combinational circuit
implementation.
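
The pipeline figures can be cross-checked in the same way. The stage
count of 18 is m + n for a 512 x 512 image. The clock period of 76 gate
delays is not stated in the text; it is inferred here by assuming the
single-image latency equals the stage count times the clock period, and
it agrees with the quoted 0.38 seconds per image.

```python
STAGES = 18                 # m + n for a 512 x 512 image
GATES = 468
WATTS_PER_GATE = 3
TOTAL_GATE_DELAYS = 1368    # single-image latency in gate delays
MS_PER_GATE_DELAY = 5

power_watts = GATES * WATTS_PER_GATE
latency_ms = TOTAL_GATE_DELAYS * MS_PER_GATE_DELAY
delays_per_stage = TOTAL_GATE_DELAYS // STAGES      # inferred clock period
clock_period_ms = delays_per_stage * MS_PER_GATE_DELAY
images_per_second = 1000 / clock_period_ms

print(power_watts, latency_ms / 1000, clock_period_ms / 1000,
      round(images_per_second, 2))
# prints: 1404 6.84 0.38 2.63
```

Latency per image is worse than in the combinational circuit (6.84
against 3.78 seconds), but throughput is set by the clock period alone.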
Implementation using a programmable tse processor. Present-day
technology and projections to the near future indicate that the early
tse components will be bulky, have large power consumption, and will
not have a suitable degree of fiber alignment to allow easy
interconnection of gates. Therefore, initial efforts will tend to favor
structures which are as simple as possible, even though repetitive use
of the structure may require considerably greater execution times per
image.

Hardware considerations. As an illustration of the type of
structure proposed, consider a unit which contains only the most
elementary gates: AND, OR, NEGATE, SLIDE 1 RIGHT, SLIDE 1 LEFT,
SLIDE 1 UP, and SLIDE 1 DOWN. Consider also a set of tse registers, as
many as needed to hold intermediate results, and a control and bus
scheme which allows any register to be directed through any gate and
the result to be directed to any register. Using this machine, a
SLIDE 64 RIGHT operation, for example, would be implemented by
executing the SLIDE 1 RIGHT operation 64 times on the same register.
The EXCLUSIVE-OR function of Register A and Register B, for example,
would be implemented by performing the proper sequence of AND, OR, and
NEGATE operations to generate AB' + A'B. Registers A and B and three
other registers for intermediate results would be used in generating
this operation.
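
Both compositions can be sketched with a minimal gate set in Python.
This is an illustration only. Images are lists of integer rows, the
function names mirror the gate names but are otherwise hypothetical,
and the temporaries t1 and t2 stand in for intermediate registers.

```python
def AND(a, b):
    return [x & y for x, y in zip(a, b)]

def OR(a, b):
    return [x | y for x, y in zip(a, b)]

def NEG(a, width):
    """Complement every element of a width-column image."""
    return [~x & ((1 << width) - 1) for x in a]

def SLR(a):
    """SLIDE 1 RIGHT: the rightmost column is lost, a zero enters on the left."""
    return [x >> 1 for x in a]

def EOR(a, b, width):
    """EXCLUSIVE-OR built as AB' + A'B from AND, OR, and NEGATE."""
    t1 = AND(a, NEG(b, width))      # A B'
    t2 = AND(NEG(a, width), b)      # A' B
    return OR(t1, t2)

def slide_right(a, n):
    """Variable-length slide: n applications of SLIDE 1 RIGHT."""
    for _ in range(n):
        a = SLR(a)
    return a

print(EOR([0b1100], [0b1010], 4))          # one 4-column row
print(slide_right([0b10000000], 3))        # one 8-column row
```

The cost of the simple hardware is visible in the loop: a slide of
magnitude n takes n gate passes instead of one.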
The machine described in the previous paragraph is the general form
of a tse computer. Although inherently slow, the machine has the
advantage of being structurally simple and versatile in that the
control unit can be reprogrammed to execute virtually any function or
algorithm.

As intermediate results are generated, they are stored in certain
tse registers whose outputs must be directed along different paths
according to the next desired operation. Therefore, some method for the
switching of image paths is necessary. Recall that Figure 2 (page 8)
shows a typical gate whose fan-out has been increased to four using
image duplicators. The four outputs are then connected to four
different paths through reformators. In order to cause the image to
propagate through only one of the four paths, the control bit to the
reformators in the other three paths is turned off. This will allow
all-zero images to propagate through these paths, which is the same as
having them disconnected from the source of the image. Hence, the
method of control in a tse structure is to switch the active tse
elements on and off in the proper sequence. Since the control signals
switch entire images and not individual elements within an image, the
generation of these signals can easily be controlled by a small
conventional binary computer or microprocessor.
Shown in Figure 18 is a microprogrammed control system for a tse
computer organized around a microprocessor. The use of the Intel 8080
microprocessor is projected in this paper, although any other
microprocessor would be suitable. The memory consists of conventional
ROM and RAM organized as 8-bit words. The microprogram, which generates
the control sequences for the tse processor, is stored in the ROM
portion of the memory. For ease of modification, the main program is
stored in the RAM portion of the memory.

The various operations performed by the tse processor (AND, OR,
NEGATE, and so forth), along with program control functions (Branch,
Halt, Register Transfer, and so forth) comprise the instruction set.
These instructions, which are coded in some 8-bit format, are used to
structure the program for the counting algorithm. This program is
stored in the memory associated with the microprocessor. As each tse
instruction is encountered during execution of the program, the
microprocessor decodes the instruction under the direction of a system
monitor program. The microprocessor then outputs the appropriate
control words through the output port to the tse control lines in a
proper sequence. When a program control instruction is encountered, its
effect will be restricted to the microprocessor control system, and no
control of tse components is generated.

[Figure 18: block diagram. Labeled blocks: Clock; 8080 Microprocessor;
Address Bus; Data Bus; Memory (RAM, ROM); Status Register; Input Port
(Conditional and Monitor Inputs); Microprogram Control Register;
Output Port; Microprocessor Control; tse Processor Array with tse
input and tse output.]

Figure 18. A microprogram, microprocessor control unit concept for the
tse processor.
In tse conditional instructions, the status of the tse processor
must be monitored by the control, as in checking an image for the
presence of any 1-elements and branching if true. For this purpose,
certain devices which monitor tse images and convert the status of
these images to a few binary bits are required. The outputs of these
devices are connected to the input port, where the information may be
addressed by the microprocessor control.

Figure 19 presents a block diagram for the tse processor. The
architecture includes these subsystems: a tse Logical Operations Unit,
an Image Bus, tse Registers, fixed (Read Only) tse Registers, and tse
monitor devices. Control inputs to the logical devices are not shown;
however, each active device which is involved in the switching of image
paths represents an incoming control line. Each subsystem of this
organization is described below.

The organization of the tse Logical Operations Unit is illustrated
in Figure 20. An image placed at Input 1 will be directed through the
SLR, SLL, SLU, SLD, NEG or NOP gates to the output. NOP is used in tse
register transfer operations. When another image is placed at Input 2,
the two images can be directed through the AND or OR gate to the
output. Input 2 can be disabled and the SLR or SLD gate in the feedback
path enabled to perform the Horizontal or Vertical Sweep operation [5]
on the image at Input 1. Sweep operations are required to generate the
18 masks (m+n = 18) for an image size of 512 x 512 used in the counting
algorithm without the requirement for a large ROM. The latch at the
output is a register which retains the result of the operation until
cleared.
A primary purpose of the Data Bus is to transmit an image from the
output latch of the Logical Operations Unit to the tse Image Registers,
where the image is gated into a destination register. Also connected to
the Data Bus are two special tse devices as shown in Figure 21. One of
the devices is a tse contractor gate. This device serves as a
zero-image or zero-tse detect bit when monitored by the control unit.
Another special device on the Data Bus is a tse row output gate, which
transmits to the control unit input port the information in the eight
rightmost elements of the bottom row of the image latched on the Data
Bus. This device allows the control unit to have access to the
numerical result of the counting algorithm.

The tse Image Registers are connected to the Data Bus, as shown in
Figure 22. For implementation of the counting algorithm, eight tse
registers are required. The output of any one of these registers can be
directed to Input 1 of the Logical Operations Unit, with the exception
of one register which will be labeled Register R0. This register is
connected directly to Input 2 of the Logical Operations Unit and is the
only path connected to that input.
Two fixed (read-only) image registers are used to store certain
pattern images, as presented in Figure 23, for an 8 x 8 image. These
patterns are used in conjunction with the sweep operations to generate
each mask required by the counting algorithm. The method used to
generate these masks is outlined below.

  0 0 0 0 1 0 0 0     0 0 0 0 0 0 0 0
  0 1 0 0 1 0 0 0     0 1 0 1 0 1 0 1
  0 0 1 0 1 0 0 0     0 0 1 1 0 0 1 1
  0 1 1 0 1 0 0 0     0 0 0 0 1 1 1 1
  0 0 0 1 1 0 0 0     1 1 1 1 1 1 1 1
  0 1 0 1 1 0 0 0     0 0 0 0 0 0 0 0
  0 0 1 1 1 0 0 0     0 0 0 0 0 0 0 0
  0 1 1 1 1 0 0 0     0 0 0 0 0 0 0 0
        M1                  M2

Figure 23. Organization and content of tse ROM registers.

Consider the simple tse circuit shown in Figure 24(a). Let the
image of Figure 24(b) be placed at the input. The SLIDE 1 RIGHT gate in
the feedback loop will cause the contents of any column in the input
image to be duplicated in the column to its right. The duplicated
column will propagate through the OR gate and be duplicated again. As
shown in the output image of Figure 24(c), the column will continue to
duplicate, or "sweep," across the image until all columns have been
duplicated. This operation is called a HORIZONTAL SWEEP [5]. A VERTICAL
SWEEP is implemented in a similar manner. When the OR gate in the
Logical Operations Unit is enabled, and the SLR or SLD gate in the
feedback path of the Logical Operations Unit is enabled, a circuit
similar to that of Figure 24 is realized, and a sweep operation is
performed on the image at Input 1.

  (a) Circuit: the INPUT tse enters an OR gate whose output is the
      OUTPUT tse; an SLR gate in the feedback path returns the output
      to the OR gate.

  1 0 0 0 0 0 0 0     1 1 1 1 1 1 1 1
  1 0 0 0 0 0 0 0     1 1 1 1 1 1 1 1
  1 0 0 0 0 0 0 0     1 1 1 1 1 1 1 1
  0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0
  1 0 0 0 0 0 0 0     1 1 1 1 1 1 1 1
  1 0 0 0 0 0 0 0     1 1 1 1 1 1 1 1
  0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0
    INPUT tse           OUTPUT tse
       (b)                 (c)

Figure 24. Implementation of a horizontal sweep device [5].
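
The sweep circuit can be modelled as a fixed-point iteration. This is a
sketch with integer rows (most significant bit on the left), and the
function name is hypothetical. The output is the input ORed with the
output slid one place right, repeated until nothing changes.

```python
def horizontal_sweep(rows):
    """OR gate with a SLIDE 1 RIGHT gate in its feedback path."""
    out = list(rows)
    while True:
        nxt = [r | (o >> 1) for r, o in zip(rows, out)]
        if nxt == out:
            return out
        out = nxt

# Input image as read from Figure 24(b): 1-elements in the leftmost column only.
figure_24b = [0x80, 0x80, 0x80, 0x00, 0x00, 0x80, 0x80, 0x00]
for row in horizontal_sweep(figure_24b):
    print(format(row, "08b"))
```

Rows that start with a 1 in the leftmost column fill completely, and
all-zero rows stay zero, matching the output image of Figure 24(c).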
The sweep operation is utilized in the generation of masks as
illustrated in the following example. During the execution of the third
iteration of the counting algorithm, the following mask is required for
an 8 x 8 image:

  0 0 0 0 1 1 1 1
  0 0 0 0 1 1 1 1
  0 0 0 0 1 1 1 1
  0 0 0 0 1 1 1 1
  0 0 0 0 1 1 1 1
  0 0 0 0 1 1 1 1
  0 0 0 0 1 1 1 1
  0 0 0 0 1 1 1 1

To generate this mask, two images are formed by a SLIDE 3 UP and a
SLIDE 4 UP on the contents of M2 (Figure 23). The results of these
operations are ANDed. This places the desired mask pattern in the top
row of the tse, and 0-elements elsewhere. A VERTICAL SWEEP operation is
then performed to duplicate the pattern downward, and the result is the
required mask for the third iteration.
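
The mask construction can be traced step by step in Python. This sketch
rests on two assumptions: the contents of M2 are taken as read from
Figure 23, with the column patterns in rows 1 to 3 and an all-ones row
in row 4 (counting from 0), and the same recipe is extended to the
first and second iterations, which the text does not spell out.

```python
# M2 as read from Figure 23 (an assumption), one integer per row.
M2 = [0b00000000, 0b01010101, 0b00110011, 0b00001111,
      0b11111111, 0b00000000, 0b00000000, 0b00000000]

def slide_up(rows, n):
    """SLIDE n UP: rows move up, zero rows enter at the bottom."""
    return rows[n:] + [0] * n

def AND(a, b):
    return [x & y for x, y in zip(a, b)]

def vertical_sweep(rows):
    """OR gate with a SLIDE 1 DOWN gate in its feedback path."""
    out = list(rows)
    for _ in range(len(rows) - 1):
        out = [r | d for r, d in zip(rows, [0] + out[:-1])]
    return out

def column_mask(iteration):
    """Mask for a RIGHT iteration: pattern row ANDed with the all-ones row, then swept."""
    top_row_only = AND(slide_up(M2, iteration), slide_up(M2, 4))
    return vertical_sweep(top_row_only)

for row in column_mask(3):
    print(format(row, "08b"))
```

ANDing with the copy slid up by 4 keeps only the top row, because the
all-ones row of M2 lands there and every row below it is zero.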
Software considerations. The instruction set for the tse processor
includes the operations performed by the Logical Operations Unit. Since
the processor was designed on the basis of a minimal hardware
structure, operations such as EXCLUSIVE-OR and slides of magnitude
other than one are not included, but must be programmed by the user
when required. In addition to the tse instructions, there must be
available to the programmer certain program control instructions or
tse-microprocessor instructions. These instructions are used for
branching, subroutine call and return, indexing, halting, and so forth.
The primary distinction between tse processor instructions and program
control instructions is that the latter do not involve any transfer or
modification of information by the tse processor array. Their effect is
confined to the microprocessor control system.

The complete instruction set derived for the tse processor is
presented in Table 2. This set is partitioned into a tse processor
instruction set and a tse program control set. All tse registers are
indicated by upper case letters, and all microprocessor system
registers are indicated by lower case letters. Note that some of the
program control instructions have double-precision (16-bit)
capabilities. These are indicated by an asterisk (*). Double precision
operations are necessary to allow for numbers which may exceed
$2^8 - 1$, or 255, as in the case where image sizes may exceed
256 x 256 elements. Of course, the use of double precision operations
implies the extension of the associated register to 16 bits using some
auxiliary register. Other control instructions are implemented by
microprocessor subroutines, as in the case of *SUB a, b.

TABLE 2
TSE COMPUTER INSTRUCTION SET

tse Processor Instructions

Mnemonic    Instruction                      Description

AND A       Logical AND                      (A) . (R0) -> A
OR A        Logical OR                       (A) + (R0) -> A
SLR A       Slide 1 Right                    SLIDE R (A) -> A
SLL A       Slide 1 Left                     SLIDE L (A) -> A
SLU A       Slide 1 Up                       SLIDE U (A) -> A
SLD A       Slide 1 Down                     SLIDE D (A) -> A
NEG A       Complement                       NEG (A) -> A
VSW A       Vertical Sweep                   V. SWEEP (A) -> A
HSW A       Horizontal Sweep                 H. SWEEP (A) -> A
MOV A, B    Register Transfer                (A) -> B
JMZ A, n    Conditional Jump on null         If (A) = 0, GO TO n
            Image to Location n

tse Program Control Instructions (Microprocessor Instructions)

Mnemonic    Instruction                      Description

JMP n       Unconditional Jump to            GO TO n
            location n
JMZ a, n    Jump on zero to location n       IF a = 0, GO TO n
MOV a, b    Register Transfer                (b) <- a
LDI a, n    Load Immediate                   n -> a
INC a       Increment                        (a) + 1 -> a
CALL n      Subroutine Call                  Store Program Control
                                             Register, GO TO n
RET         Return from Subroutine           GO TO (location
                                             stored by CALL)
*DAD a      Arithmetic Left Shift            (a) + (a) -> a
            (Multiply by 2)
In order to program the processor to perform any task, the proper
,g	 sequence of instructions must be stored in the working memory of the
i
processor.	 Therefore, a coding scheme or format must be provided to
represent each instruction.	 Tables 3 and 4 show a coding scheme for
the instructions	 in Table 2..	 Although there are many possible formats,
the one shown has been structured to require a relatively simple micro-
program for decoding.- This simplification reduces the amount of
read -only memory required to store the microprogram, and also reduces
the decoding time for the tse instructions.
A tse computer program for execution of the counting algorithm is
presented in Table 5. The size of the image in this case is 512 x 512
elements. Note that the program calls some subroutines which perform
frequently used functions. These functions were omitted from the basic
instruction set for simplicity. The subroutines are EOR, VSLR, VSLL,
VSLU and VSLD. Subroutine EOR performs the EXCLUSIVE-OR operation on
the images in Registers A and B. Subroutines VSLR, VSLL, VSLU and VSLD
perform variable-length slide operations in each of the four directions.
The image upon which the slide is performed is stored in Register A and
the magnitude of the slide in Register z prior to the call. Since the
four variable slide subroutines differ only in the slide instruction at
location IOTA, only one of the subroutines is shown in the table.
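The EXCLUSIVE-OR subroutine can be illustrated with a small software model. The sketch below is not part of the report: it assumes an image is a list of rows of 0/1 values, and the helper names (neg, and_images, or_images, eor) are hypothetical. It forms the result from complement, AND and OR only, which are the operations the EOR listing in Table 5 appears to use.

```python
# Illustrative model only: an "image" is a list of rows, each a list of 0/1.
# Helper names are hypothetical and do not come from the report.

def neg(img):
    """Complement every element of the image (models NEG)."""
    return [[1 - p for p in row] for row in img]

def and_images(a, b):
    """Element-by-element AND of two images of equal size (models AND)."""
    return [[p & q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def or_images(a, b):
    """Element-by-element OR of two images of equal size (models OR)."""
    return [[p | q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def eor(a, b):
    """EXCLUSIVE-OR built as (NOT a AND b) OR (a AND NOT b)."""
    return or_images(and_images(neg(a), b), and_images(a, neg(b)))

image_a = [[0, 0], [1, 1]]
image_b = [[0, 1], [0, 1]]
result = eor(image_a, image_b)
```

Each helper acts on the whole image at once, which is the property the tse gates provide in hardware; the nested loops here are only a sequential stand-in.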
TABLE 3
TSE PROCESSOR INSTRUCTIONS

               Number of
Instruction    Bytes        Coding
VSW A          1            1 0001 xxx
HSW A          1            1 0010 xxx
SLL A          1            1 0100 xxx
SLR A          1            1 0101 xxx
SLU A          1            1 0110 xxx
SLD A          1            1 0111 xxx
AND            1            1 1001 RRR
OR             1            1 1010 RRR
NEG            1            1 1011 RRR
JMZ            2            1 1101 RRR
                            aaaaaaaa
MOV            2            1 1110 xxx
                            [RRRR]s [RRRR]d

aaaaaaaa = Branch Address
xxx = Don't Care
s = Source Register
d = Destination Register

Register Format (RRR or [RRRR]):

Register    RRR     [RRRR]s or [RRRR]d
A           000     0000
B           001     0001
C           010     0010
D           011     0011
G           100     0100
L           101     0101
M           110     0110
N           111     0111
M1                  1000
M2                  1001
..
R0                  1100
TABLE 4
TSE PROGRAM CONTROL INSTRUCTIONS

               Number of
Instruction    Bytes        Coding
MOV            2            0 0001 xxx
                            xx [rrr]s [rrr]d
LDI            3            0 1011 [rrr]d
                            nnnnnnnn
                            nnnnnnnn
SUB            2            0 0011 xxx
                            xx [rrr]s [rrr]d
JMZ            2            0 1010 rrr
                            aaaaaaaa
JMP            2            0 0010 xxx
                            aaaaaaaa
CALL           2            0 0100 xxx
                            aaaaaaaa
RET            1            0 1100 xxx
DAD            1            0 1110 rrr
INC            1            0 1111 rrr
HLT            1            0 1101 xxx

aaaaaaaa = Branch or Call Address
nnnnnnnn = Immediate Bytes

Register Format:

Register    rrr
a           000
b           001
c           010
v           011
w           100
x           101
y           110
z           111
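The simplicity of decoding can be illustrated in software. The following is a minimal sketch, not the report's microprogram: it assumes that the codings in Tables 3 and 4 are written most significant bit first, so that the leading bit selects the instruction group, the next four bits are the operation code, and the last three bits are the register or don't-care field. The function and table names are hypothetical.

```python
# Operation codes transcribed from Tables 3 and 4 as (mnemonic, bytes).
# The bit positions used below are an assumption about the byte layout.
TSE_PROCESSOR_OPS = {
    0b0001: ("VSW", 1), 0b0010: ("HSW", 1), 0b0100: ("SLL", 1),
    0b0101: ("SLR", 1), 0b0110: ("SLU", 1), 0b0111: ("SLD", 1),
    0b1001: ("AND", 1), 0b1010: ("OR", 1), 0b1011: ("NEG", 1),
    0b1101: ("JMZ", 2), 0b1110: ("MOV", 2),
}
PROGRAM_CONTROL_OPS = {
    0b0001: ("MOV", 2), 0b1011: ("LDI", 3), 0b0011: ("SUB", 2),
    0b1010: ("JMZ", 2), 0b0010: ("JMP", 2), 0b0100: ("CALL", 2),
    0b1100: ("RET", 1), 0b1110: ("DAD", 1), 0b1111: ("INC", 1),
    0b1101: ("HLT", 1),
}

def decode_first_byte(byte):
    """Split the first byte of an instruction into its assumed fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    is_tse = (byte >> 7) & 1          # 1 = tse processor, 0 = program control
    opcode = (byte >> 3) & 0b1111     # four-bit operation code
    low_bits = byte & 0b111           # register field or don't-care bits
    table = TSE_PROCESSOR_OPS if is_tse else PROGRAM_CONTROL_OPS
    if opcode not in table:
        raise ValueError("operation code not listed in Tables 3 and 4")
    mnemonic, length = table[opcode]
    group = "tse processor" if is_tse else "program control"
    return {"group": group, "mnemonic": mnemonic,
            "length": length, "low_bits": low_bits}

slr = decode_first_byte(0b10101000)   # 1 0101 xxx
ldi = decode_first_byte(0b01011010)   # 0 1011 010
```

Because the group bit and the operation code sit in fixed positions, a single shift and mask is enough to select the routine for an instruction, and the byte count tells the decoder how many further bytes to fetch.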
TABLE 5
TSE COMPUTER PROGRAM FOR EXECUTION OF THE COUNTING ALGORITHM

Location  Mnemonic            Comment
          LDI   w, 1          Initiate
          LDI   x, 1          Microprocessor
          LDI   y, 0          Counter
ZETA      MOV   L, A
          MOV   y, a          Test for Row
          SUB   a, 1          Counting or
          JMZ   a, ALPHA      Column Counting
          MOV   x, z          Slide Input
          CALL  VSLR          Image Right
          MOV   A, B
          MOV   M2, A
          MOV   w, z
          CALL  VSLR          Create Mask
          MOV   M4, R0
          AND   A
          VSW   A
          MOV   A, C
ALPHA     JMZ   y, BETA
          MOV   x, z          Slide Input
          CALL  VSLD          Image Down
          MOV   A, B
          MOV   M1, A
          MOV   w, z
          CALL  VSLL          Create Mask
          MOV   M3, R0
          AND   A
          HSW   A
          MOV   A, C
BETA      MOV   C, R0
          AND   B             AND input image
          AND   L             with mask
          MOV   B, R0
          MOV   L, A          Generate Sum,
          CALL  EOR           Store in C
          MOV   A, C
          MOV   L, B          Generate Carry,
          AND   B             Store in B
          MOV   B, R0
DELTA     OR    G
          MOV   G, R0         Test for
          MOV   D, A          additional carry bits
          CALL  EOR           If none, begin
          JMZ   A, GAMMA      next grouping
          MOV   G, D
          MOV   D, A
          SLL   A
          MOV   A, R0         Add Carry to Sum,
          MOV   C, A          Generate New Carry
          CALL  EOR
          MOV   A, L
          AND   C
          MOV   C, G
          JMP   DELTA         Repeat Carry test
GAMMA     DAD   x             Increment Counters
          INC   w
*         LDI   v, 512        Test for Completed
          SUB   v, x          Row or Column Summing
          JMZ   v, EPSILON    If Column Summing
          JMP   ZETA          Completed, Start Row Summing
EPSILON   JMZ   a, ETA        If Row Summing
          LDI   x, 1          Complete, Stop
          LDI   y, 1          Reset Counters,
          LDI   w, 1          Start Next Grouping
          JMP   ZETA
ETA       HLT

Location  Subroutine VSLR Instruction
VSLR      LDI   b, 0
IOTA      SLR   A
          INC   b
          MOV   b, c
          SUB   c, z
          JMZ   c, THETA
          JMP   IOTA
THETA     RET

Subroutines VSLD, VSLU, VSLL are similar to VSLR.

Location  Subroutine EOR Instruction
EOR       MOV   A, M
          MOV   B, N
          NEG   A
          NEG   B
          MOV   A, R0
          AND   N
          MOV   B, R0
          AND   M
          MOV   M, R0
          OR    N
          MOV   N, A
          RET

Note: All registers are assumed to be cleared before execution
of this program.
Original image is in Register L.
* Image size = 512 x 512.
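The control flow of subroutine VSLR can be mirrored in a few lines of software. This is a sketch under stated assumptions rather than a model of the hardware: the image is a list of rows of 0/1 values, "right" is taken to mean toward higher column indices, and vacated positions are assumed to be filled with zeros. The function names are hypothetical.

```python
# Illustrative model only; fill value and direction convention are assumptions.

def slide_right(img):
    """Slide every row one position to the right (models a single SLR)."""
    return [[0] + row[:-1] for row in img]

def variable_slide_right(img, z):
    """Repeat the single slide z times, following the VSLR loop structure."""
    if z < 1:
        # The listing slides once before it compares the counter with z,
        # so a magnitude below 1 is rejected here.
        raise ValueError("slide magnitude must be at least 1")
    b = 0                          # VSLR:  LDI b, 0
    while True:
        img = slide_right(img)     # IOTA:  SLR A
        b += 1                     #        INC b
        c = b - z                  #        MOV b, c / SUB c, z
        if c == 0:                 #        JMZ c, THETA
            return img             # THETA: RET
        # otherwise                         JMP IOTA

slid = variable_slide_right([[1, 0, 0, 0], [0, 1, 0, 0]], 2)
```

The other three subroutines would differ only in the single-slide helper called at the point marked IOTA.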
Modified forms of the counting algorithm. Basically, the tse
program shown in Table 2 (page 55) is used to implement the modified
forms of the algorithm. Of course, some adjustments to parts of the
program are necessary. These adjustments are minor and are presented
without listing the entire program.
In one of the modified forms of the algorithm, recall that not all
of the right-slide iterations or the down-slide iterations are performed,
but are truncated. To accomplish this, the immediate bytes of the LDI
instruction at program location GAMMA+2 must be altered, since the value
they contain will signal the end of each series of iterations. Before
execution, the immediate bytes of the LDI should be loaded with the
number 2^m, where m is the number of right-slide iterations to be
executed. After all desired right-slide iterations have been performed,
the immediate bytes of the LDI should be loaded with 2^n, where n is the
number of down slides to be executed.
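The reason the terminating value is a power of two can be checked with a short calculation. The sketch below assumes, as read from Table 5, that register x starts at 1 and is doubled by DAD x once per iteration, and that a series ends when x equals the constant loaded by the LDI at GAMMA+2. The function name is hypothetical.

```python
def slide_schedule(iterations):
    """Return the slide magnitudes used and the value of x after the series."""
    x = 1                      # LDI x, 1
    magnitudes = []
    for _ in range(iterations):
        magnitudes.append(x)   # slide magnitude used in this iteration
        x += x                 # DAD x: (x) + (x) -> x
    return magnitudes, x

full_magnitudes, full_limit = slide_schedule(9)     # complete 512-element series
short_magnitudes, short_limit = slide_schedule(3)   # truncated after m = 3
```

Under these assumptions nine iterations leave x at 512, the constant in the unmodified program, and truncating to m iterations leaves x at 2^m.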
In another modified form of the algorithm, which concerns partial
summing by sectors, the right-slide iterations and the down-slide
iterations can be intermixed. To accomplish this, the desired sequence of
iterations must be specified in some portion of memory and addressed by
the main routine. The program is not changed up to location GAMMA. At
this location, however, a branch should occur to address the portion of
memory in which the iteration sequence is located. When the next
desired iteration is identified, registers w, x, and y should be modified
accordingly. These registers control the magnitude and direction of the
slide operation for each iteration. At this point, a branch to location
ZETA should occur. This will start the iteration. Upon completion of
the iteration, the processor will again be at location GAMMA. Thus, the
iterations are repeated in this manner until the desired sequence is
completed.
A total of 92 tse gates are required for the programmable tse
computer implementation of the counting algorithm. The corresponding
power requirement for this number of gates is 276 watts. Total
processing time for a single image is 68,775 tse gate delays, representing
an image processing time of 343.9 seconds per image. This corresponds
to an image processing rate of 2.9 x 10^-3 images per second. Image
processing times are based on an average carry-propagation distance of
three positions per image. Unlike the combinational and pipeline
implementations, the processor can indicate early completion of the
algorithm. Maximum carry propagation in each iteration is unlikely,
especially where this maximum is eight or more.
The concept of two-dimensional logic devices and the tse computer
represents a new and different approach to the task of image processing.
Although the basic ideas used in the development of tse operations are
not particularly innovative and were conceived long ago, the use and
refinement of these ideas have been severely limited by the lack of a
technology required to implement them. Only now can such a highly
parallel logic structure even begin to be considered as having practical
applications in the future. At the present time, the first tse gates
have been conceived and are under design and development. The
specifications related to these gates give an indication that, in the near
future, tse computers may replace conventional computers in certain
applications. In order to gain a better perspective as to the relative
merits of tse processors, the characteristics as presented in the
previous sections will be compared with the characteristics of a
conventional processor.
Of course, the execution of the counting algorithm cannot be
considered as a proving ground for tse processors. There are many tasks
that can be performed on a tse computer which could allow the processor
to compete effectively with conventional processors, in terms of
efficiency. Each task would generate a different set of specifications for
comparison, which in turn would generate a different set of conclusions.
Therefore, the results of consideration of the counting algorithm for tse
implementation must not be taken to rigidly apply to tse computers in
general. Any projection based on the results generated here must be
carefully evaluated.
All comparisons presented in this section are based on an image
size of 512 x 512 pixels. This represents an excellent resolution,
almost that of a standard television receiver. At present, the
512 x 512 is projected as the upper limit on the image size of tse
components.
A good evaluation of the usefulness of the tse computer must
include a comparison to a typical conventional processor. For the
purpose of comparison, the counting algorithm will be implemented in
terms of the language of the IBM 360. The total processing time will
then be determined using the actual instruction cycle times of the
360/65, Level N.
The IBM 360 is not by any means the fastest processor available;
however, its characteristics are intended to be representative of most
computers in general use today. Also, the results generated by this
example can be easily extended to fit almost any conventional processor
by determining actual times for tasks performed.
A program written for the 360 to count the number of 1-elements in
an image is shown in Table 6. The basic approach is to address each
picture element and increment a counter if the element is a logic "1."
An image is stored as 8192 consecutive 32-bit memory locations in the
computer. The image could also be stored externally, possibly latched
at the output of a parallel-type camera, and addressed in much the same
manner as the internal memory would be addressed. For the basic
counting algorithm, the program of Table 6 is very efficient. The
execution time for this program ranges between 1.06 seconds for an
all-zero tse and 1.15 seconds for an all-one tse, where the average
time is 1.10 seconds.

TABLE 6
PROGRAM FOR EXECUTION OF THE COUNTING
ALGORITHM ON THE IBM 360

Location  Instruction          Comment
          SR    8, 8           Clear Register 8
          SR    4, 4           Clear Register 4
          LA    6, 1 (0, 4)    8192 → Register 6
          SLA   6, 13 (0)
          SR    5, 5           Clear Register 5
A3        L     3, 0 (5, 2)    Load Register 3 with one word
                               from memory
          LA    7, 32 (0, 0)   32 → Register 7
A2        CR    3, 8           Compare MSB of Register 3 to
                               zero. If zero do not
          BH    A1             increment Register 4
          LA    4, 1 (0, 4)
A1        SLL   3, 1 (0)       Shift Register 3 left one position
          BCT   7, A2          Go to A2 unless 32 shifts
                               have occurred
          LA    5, 1 (0, 5)    Increment pointer to
                               address new word
          CR    5, 6           Compare new word address
                               to 8192
          BL    A3             Go to A3 unless 8192 words
                               have been checked
          END

Register 2 should contain the absolute address of the first word of
the image before this program is executed.
Register 4 will contain the result after the execution of the
program.
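The element-by-element approach of Table 6 can be restated in a high-level language for comparison with the parallel method. The sketch below is an illustration only, not a translation validated against the 360: it assumes the image is supplied as a list of unsigned 32-bit integers, and the function name is hypothetical.

```python
WORD_BITS = 32
WORDS_PER_IMAGE = 8192          # 512 x 512 elements / 32 bits per word

def count_ones(words):
    """Count 1-elements by testing the most significant bit and shifting."""
    count = 0                                 # plays the role of Register 4
    for word in words:                        # outer loop over image words
        for _ in range(WORD_BITS):            # inner loop of 32 shifts
            if word & 0x80000000:             # is the most significant bit set?
                count += 1
            word = (word << 1) & 0xFFFFFFFF   # shift left one position
    return count

small_count = count_ones([0xFFFFFFFF, 0x00000000, 0x80000001])
all_ones_count = count_ones([0xFFFFFFFF] * WORDS_PER_IMAGE)
```

The inner loop body runs 262,144 times for a 512 x 512 image regardless of content, which is consistent with the narrow spread between the all-zero and all-one execution times quoted above.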
The execution times for the three tse implementations of this
research depend upon the propagation delay per gate. Initially, a
delay of 5 milliseconds per gate has been specified for tse gates
which are being developed [5, page 5]. Of course, future technological
projections indicate improvement in the propagation delay. Eventually,
the delay is expected to be comparable to the delays of present-day
binary logic gates.
Table 7 summarizes the processing time per image for the 360, along
with the number of gate delays per image of each tse implementation and the
corresponding execution times for different projected values of the tse
gate delay. A comparison indicates that tse hardware structures behave
in much the same manner as binary gate structures. For instance, (1) a
combinational structure will be much faster for any given task than a
sequential structure (computer) because of repetitive use of fewer
components in the latter, and (2) a pipeline structure will improve the
image processing rate (number of images per unit time) over that of a
combinational structure. However, the actual delay for any image through
the pipeline structure may be longer.
Another indication which results from the comparison is that, for
the basic parallel counting algorithm, the image processing rates for
the combinational and pipeline methods are on the same order of magnitude
as that of the binary processor, but the programmable tse computer is
slower, for the present time. This is not unexpected since tse logic is
fast compared to conventional logic (due to the inherent parallelism),
yet tse gate delays are slow compared to the delays of present-day binary
gates fabricated by TTL or CMOS technologies (about 10 nanoseconds).
However, as the speed of the tse gate is improved, the processing rate of
the tse computer will surpass that of the binary processor. As the tse
gate propagation delay approaches that of today's binary gates, the
advantage of the tse computer is apparent in that the number of 1-elements
in a 512 x 512 image can be counted in about one-third of one millisecond.

TABLE 7
SUMMARY OF IMAGE PROCESSING TIMES FOR TSE AND
CONVENTIONAL IMPLEMENTATIONS

                     Gate Delays   Processing time per image, by tse gate delay:
Implementation       per Image     5 ms          5 µs          5 ns
tse-Combinational    756           3.78 sec      3.78 ms       3.78 µs
tse-Pipeline         76            0.38 sec      0.38 ms       0.38 µs
tse-Computer         68,775        343.8 sec     0.344 sec     0.344 ms

IBM 360                            1.10 seconds
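The entries of Table 7 follow from multiplying the number of gate delays by the assumed delay per gate. The short sketch below reproduces that arithmetic and also estimates the gate delay at which the programmable tse computer would match the 1.10-second figure for the 360. The break-even value is a derived illustration, not a number stated in the report, and the names are hypothetical.

```python
GATE_DELAYS_PER_IMAGE = {
    "tse-Combinational": 756,
    "tse-Pipeline": 76,
    "tse-Computer": 68_775,
}
IBM_360_SECONDS = 1.10   # average execution time quoted for Table 6

def processing_time(gate_delays, delay_seconds):
    """Time per image in seconds for a given delay per tse gate."""
    return gate_delays * delay_seconds

def break_even_delay(gate_delays, reference_seconds):
    """Gate delay at which a tse implementation equals the reference time."""
    return reference_seconds / gate_delays

computer_at_5ms = processing_time(GATE_DELAYS_PER_IMAGE["tse-Computer"], 5e-3)
computer_at_5ns = processing_time(GATE_DELAYS_PER_IMAGE["tse-Computer"], 5e-9)
computer_break_even = break_even_delay(
    GATE_DELAYS_PER_IMAGE["tse-Computer"], IBM_360_SECONDS)
```

With these figures the tse computer would equal the 360 at a gate delay of roughly 16 microseconds, so the 5-microsecond column of Table 7 is already on the favorable side of that point.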
Other characteristics which could be used to compare the merits of
the different implementations are total power, total size and weight,
dollar cost, speed-power product, and gate count. However, most of
these are physical characteristics which may be refined independently
of each other. Therefore, no projections can be made as to when a certain
characteristic will be refined to the point that the tse computer is
feasible for a certain application. Some of the more meaningful
characteristics are summarized in Table 8. The only conclusion that can
be drawn concerning these characteristics is that any tse processor that
could be built using current fiber optics technology would probably not
be a practical replacement for the binary processor. Currently,
alternatives to the power-consuming light sources and bulky optical fibers
are being considered for the fabrication of tse logic components.
The need for improved data-handling capabilities in digital
computers will inevitably lead to research into increased parallelism.
The results of this section indicate that tse computers will not replace
conventional processors presently, although they have a definite
potential for future use. From the standpoint of the tse computer in
orbit as an earth resources image processor, many of its characteristics
are very promising. However, the size and power consumption of the
processor must be brought to within limitations for practical spacecraft.

TABLE 8
PHYSICAL CHARACTERISTICS OF TSE IMPLEMENTATIONS
OF THE COUNTING ALGORITHM

                     Power          Speed-Power Product   Gate
Implementation       Consumption    @ 5 ms/gate           Count
tse-Combinational    1080 W         4082.4 W-sec          360
tse-Pipeline         1404 W         533.5 W-sec           468
tse-Computer         276 W          94,888 W-sec          92
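The power and speed-power entries of Table 8 can be cross-checked with a few lines of arithmetic. The sketch below assumes a figure of 3 watts per gate, which is inferred from the table (1080 W for 360 gates) rather than quoted from the text, and uses the 5 ms/gate processing times of Table 7. The names are hypothetical.

```python
WATTS_PER_GATE = 3   # inferred from Table 8, not stated explicitly

# (gate count, processing time per image in seconds at 5 ms/gate)
IMPLEMENTATIONS = {
    "tse-Combinational": (360, 3.78),
    "tse-Pipeline": (468, 0.38),
    "tse-Computer": (92, 343.8),
}

def power_watts(gate_count):
    """Total power under the assumed per-gate consumption."""
    return gate_count * WATTS_PER_GATE

def speed_power_product(gate_count, seconds_per_image):
    """Energy per image in watt-seconds."""
    return power_watts(gate_count) * seconds_per_image

products = {
    name: speed_power_product(gates, seconds)
    for name, (gates, seconds) in IMPLEMENTATIONS.items()
}
```

On this measure the pipeline uses the least energy per image and the programmable computer the most, even though the computer draws the least power, because its processing time per image is far longer.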
	
The development of the tse computer will, of course, depend primarily
upon the amount of research effort devoted to the concept in the near
future. Until then, the expansion of today's digital computer will
more than likely take the form of increased parallelism of conventional
logic components. At some time in the future, however, the physical
characteristics of tse devices will be refined to the point where they
are more attractive than are the increasing number of inter-connections
required for conventional gates, thus marking the advent of the
generation of tse computers in digital processing.
LIST OF REFERENCES
1. Millman, J., and H. Taub, Pulse, Digital and Switching Waveforms.
New York: McGraw-Hill, 1965.
2. Unger, S. H., "A Computer Oriented Toward Spatial Problems,"
PROC. IRE, vol. 46, October 1958, pp. 1744-1750.
3. Slotnick, D. L., W. C. Borck, and R. C. McReynolds, "The Solomon
Computer," AFIPS PROC. FJCC, vol. 22, 1962, pp. 97-107.
4. Barnes, G. H., et al., "The ILLIAC IV Computer," IEEE TRANS. on
Computers, vol. C-17, August 1968, pp. 746-757.
5. Schaefer, D. H., and J. P. Strong, III, "tse Computers," X-943-75-14,
NASA, GSFC, January 1975.
6. Schaefer, D. H., and J. P. Strong, III, "Two-Dimensional Radiant
Energy Array Computers and Computing Devices," Patent Application
Serial No. 468614, May 8, 1974.
	
VITA
Alan Gerald Metcalfe was born in , on
, He graduated from Clarksville High School in 1969, at
which time he entered the pre-Engineering curriculum at Austin Peay
State University. In 1974, he received the Bachelor of Science degree
in Electrical Engineering from The University of Tennessee. At this
time, he entered The University of Tennessee Graduate School and earned
the Master of Science degree in Electrical Engineering in December 1976.
In September 1975, he was employed as an Electrical Engineer with
the Tennessee Valley Authority.
