














Annual Report for Period October 1985-September 1986






Rear Admiral R. H. Shumaker, Superintendent
D. A. Schrady, Provost
The work reported herein was supported by Naval Electronic Systems Command.
Reproduction of all or part of this report is authorized.
This report was prepared by:
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
REPORT DOCUMENTATION PAGE (READ INSTRUCTIONS BEFORE COMPLETING FORM)
1. REPORT NUMBER: NPS-62-86-007
2. GOVT ACCESSION NO.
3. RECIPIENT'S CATALOG NUMBER
4. TITLE (and Subtitle): Similarity Counting Architecture for Object Detection
5. TYPE OF REPORT & PERIOD COVERED: Annual Report for Period October 1985-September 1986
6. PERFORMING ORG. REPORT NUMBER
7. AUTHOR(s): Chin-Hwa Lee
8. CONTRACT OR GRANT NUMBER(s)
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Postgraduate School, Monterey, CA 93943-5100
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS




13. NUMBER OF PAGES: 24
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)
15. SECURITY CLASS. (of this report): Unclassified
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE
16. DISTRIBUTION STATEMENT (of this Report): Unlimited
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number):
Parallel Architecture, Content Addressable Memory, Object Detection,
Image Correlation
20. ABSTRACT (Continue on reverse side if necessary and identify by block number):
A new algorithm to detect an object in an image is presented here. It can achieve
results similar to those of the two-dimensional correlation method with a shorter
execution time. An architecture using content addressable memory is
implemented in the SCALD CAD environment. The design of the shift counter




Similarity Counting Architecture For Object Detection
Chin-Hwa Lee





In digital image processing it is often necessary to examine an image and
ask whether a known object exists in it. This is generally referred to as the
object detection problem. The object can be anywhere in the field of view and
can have different aspect angles or sizes. The method used to find the object
should therefore be invariant to differences in position, scale, and
orientation between the object and the model image, called the template. In
aerial photography, satellite IR imagery, or multispectral images such as
those collected by LANDSAT, the object in an image may show position shifts,
scale changes, and aspect changes due to different camera positions or sensor
distortion. For calibrated mensuration it is also necessary to associate the
views in two different images as belonging to the same object. This is
usually called the image registration problem. The usual approach to
automatic image registration is two-dimensional correlation or the sequential
similarity detection method [1,2]. A new algorithm to solve the problems
mentioned above is proposed here. First, the algorithm is introduced. Next, a
special architecture that implements it with associative memory, or content
addressable memory (CAM), is described. Finally, a logic design of a
one-dimensional implementation is discussed in detail.
Similarity Counting Algorithm:
For simplicity, let us start with the one-dimensional problem. The input
signal is a sequence f(x), where x = 0, 1, ..., L-1. The template signal is a
sequence g(x), where x = 0, 1, ..., M-1. Assume that there is no need to worry
about the aperiodic correlation; a cyclic correlation of f(x) with g(x) is
good enough to indicate the existence of the object in the input. Let

$$f_e(x) = f(x), \qquad x = 0, 1, \ldots, L-1 \tag{1}$$

and construct a periodic g_e(x) = g_e(x + nL), where n is an integer:

$$g_e(x) = \begin{cases} g(x), & x = 0, 1, \ldots, M-1 \\ 0, & x = M, \ldots, L-1 \end{cases} \tag{2}$$

The periodic correlation is:

$$h(x) = \sum_{m=0}^{L-1} f(m)\, g_e(m+x) \tag{3}$$
Step 1: Shift operation
    input sequence:           f(0), f(1), f(2), ..., f(M), ..., f(L-1)
    template sequence
    shifted by one position:        g(0), g(1), ..., g(M-1)
Step 2: Operate and accumulate
    f(0), f(1), f(2), ..., f(M), ..., f(L-1)
          g(0), g(1), ..., g(M-1)
    sum := P[f(1),g(0)] + P[f(2),g(1)] + ... + P[f(M),g(M-1)]
    where P is a multiply operator
Step 3: Assign the sum to the output sequence h(x)
    h(1) := sum
In sequential algorithms the above steps are performed for every sample in
the output sequence. There are M multiply-add operations in the steps for
each output point, so a total of M*L multiply-add operations is required to
get the correlation sequence. This makes it an expensive operation.
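As a rough illustration of this cost, the sequential procedure can be sketched in Python. The function name `cyclic_correlation` and the explicit operation counter are assumptions introduced here; the indexing follows equation (3), and only the M nonzero samples of the extended template are visited.

```python
def cyclic_correlation(f, g):
    """Sequential cyclic correlation in the sense of equation (3).

    f: input sequence of length L; g: template of length M <= L.
    Returns (h, ops): the L output points and the number of multiply-adds.
    """
    L, M = len(f), len(g)
    h = [0] * L
    ops = 0
    for x in range(L):                      # one output point per shift
        total = 0
        for k in range(M):                  # g_e is zero outside 0..M-1
            total += f[(k - x) % L] * g[k]  # f(m) * g_e(m + x) with m = k - x
            ops += 1
        h[x] = total
    return h, ops


# Hypothetical test signal: the template pattern sits at positions 2..4.
h, ops = cyclic_correlation([0, 0, 1, 2, 3, 0, 0, 0], [1, 2, 3])
print(ops)               # M * L multiply-adds
print(h.index(max(h)))   # output index of the correlation peak
```

With L = 8 and M = 3 the counter reports 24 multiply-adds, which is the M*L figure quoted above.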
If this formulation is extended to a two-dimensional input f(x,y) of L*L
pixels and a two-dimensional template g(x,y) of M*M pixels, the shift







The "operate and accumulate" step is applied to all the point pairs in the
overlapped area between f(x,y) and g(x,y). The accumulated sum is assigned to
the output h(x,y). The total work to obtain h(x,y) is therefore L^2*M^2
multiply-add operations. For an input image of 400x400 pixels and a template
image of 56x52 pixels, the correlation typically required two to three hours
(approximately 10,000 sec) of execution time on the VAX-11/750. At the end,
one examines all the local maximum points in the correlation image plane.
Those are the places where the image signal is most similar to the template
image. Barnea recognized this computational difficulty and proposed a
different operation in step 2 which is less expensive than multiplication
[1]. The operation in step 2 of the "sequential similarity






Taking an absolute difference instead of performing a multiplication saves
some time. If the image pixel f(x) at x is similar to the template pixel
g(x+n) at x+n, the accumulated absolute difference will be small. The
original approach of finding the local maxima in the correlation function
becomes one of finding the local minima in the output image plane. Barnea
also recognized that if the input image is very different from the template
image at a certain position, the accumulated sum grows quickly. If it exceeds
a certain threshold, it is advantageous to abort the "operate and accumulate"
step early. Onoe improved the algorithm to allow automatic threshold setting,
so that the total number of operations performed was further reduced [2].
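The early-abort idea can be illustrated with a short Python sketch. It is not the algorithm of [1] or [2] as published: the function name `ssda`, the fixed `threshold` argument, and the cyclic indexing (borrowed from equation (3)) are assumptions made for this example, and the automatic threshold setting of [2] is not reproduced.

```python
def ssda(f, g, threshold):
    """Accumulate absolute differences, aborting once the sum exceeds threshold.

    Returns one (error, terms_used) pair per cyclic position x.
    """
    L, M = len(f), len(g)
    results = []
    for x in range(L):
        error = 0
        used = 0
        for k in range(M):
            error += abs(f[(k - x) % L] - g[k])  # absolute difference replaces the product
            used += 1
            if error > threshold:                # position is already a poor match
                break
        results.append((error, used))
    return results


# Hypothetical test signal: the template pattern sits at positions 2..4.
results = ssda([0, 0, 1, 2, 3, 0, 0, 0], [1, 2, 3], threshold=2)
best = min(range(len(results)), key=lambda x: results[x][0])
total_terms = sum(used for _, used in results)
print(best, total_terms)
```

Positions that differ strongly from the template are abandoned after one or two terms, so the total number of terms evaluated falls below M*L, and the best match is now the minimum rather than the maximum.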
A new approach, called the similarity counting algorithm, is proposed here;
it allows the algorithm to be implemented in parallel hardware. The new
algorithm can also be implemented in sequential software, in the same way as
the similarity detection method mentioned above. The operation in step 2 is
generalized to a comparison operation, which can be either an exact
comparison or a comparison with tolerance. The exact comparison operator P
is defined as:

$$P[f(x), g(x+n)] = \begin{cases} 1, & \text{if } f(x) = g(x+n) \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

The comparison operator P_t with tolerance t is then:

$$P_t[f(x), g(x+n)] = \begin{cases} 1, & \text{if } |f(x) - g(x+n)| \le t \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
In this approach the operation is further reduced to a comparison operator
that yields binary results, so the sum operation can be replaced by counting
the comparison results. As in the correlation method, the answer is found at
the local maxima in the output image plane, which is the opposite of the
sequential similarity detection method. This approach avoids any algebraic
calculation and requires only logical comparisons. If it is implemented in
sequential software, it can also have all the advantages of Barnea's and
Onoe's techniques. Similarity counting with tolerance should have the
additional advantage of making the technique less dependent on orientation
and scale. Test results from software simulations of the algorithm show
evidence of those advantages. The new approach also allows better
registration when the images and the template have different gray tone
biases. The results showed that if there is prior knowledge or an estimate of
the tone bias or geometric distortion, it is possible to select an
appropriate tolerance to minimize misregistration in the result. Therefore,
the new technique is in many respects comparable or superior to both the
correlation method and the sequential similarity detection method.
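A minimal sequential sketch of similarity counting is given below, assuming the tolerance test of equation (6) uses "less than or equal" and the cyclic indexing of equation (3). The function name `similarity_count` and the test data, including the gray tone bias of 2, are hypothetical.

```python
def similarity_count(f, g, t=0):
    """Count, for each cyclic position x, the template samples that match f.

    t = 0 gives the exact comparison of equation (5);
    t > 0 gives the tolerance comparison of equation (6).
    """
    L, M = len(f), len(g)
    h = []
    for x in range(L):
        count = 0
        for k in range(M):
            if abs(f[(k - x) % L] - g[k]) <= t:  # binary comparison result
                count += 1                       # counting replaces the summation
        h.append(count)
    return h


template = [10, 20, 30]
image = [0, 0, 12, 22, 32, 0, 0, 0]   # template pattern with a tone bias of +2

print(similarity_count(image, template, t=0))  # exact match finds nothing
print(similarity_count(image, template, t=2))  # tolerance recovers the peak
```

In this toy case the exact comparison misses the biased object entirely, while a tolerance equal to the bias restores a full count of M at the matching position.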
Parallel Hardware Architecture:
A parallel hardware architecture is studied here to implement the
similarity counting algorithm. Before explaining how parallel hardware such
as an associative memory, or content addressable memory (CAM), can be used,
it is necessary to describe an alternative parallel algorithm. For
simplicity again, assume a one-dimensional case of f(x) with 8 points and g(x)
with 3 points. The calculated output h(x) for all the points is:
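$$\begin{aligned}
h(0) &= P[f(0), g_e(0)] + P[f(1), g_e(1)] + P[f(2), g_e(2)] \\
h(1) &= P[f(7), g_e(0)] + P[f(0), g_e(1)] + P[f(1), g_e(2)] \\
h(2) &= P[f(6), g_e(0)] + P[f(7), g_e(1)] + P[f(0), g_e(2)] \\
h(3) &= P[f(5), g_e(0)] + P[f(6), g_e(1)] + P[f(7), g_e(2)] \\
h(4) &= P[f(4), g_e(0)] + P[f(5), g_e(1)] + P[f(6), g_e(2)] \\
h(5) &= P[f(3), g_e(0)] + P[f(4), g_e(1)] + P[f(5), g_e(2)] \\
h(6) &= P[f(2), g_e(0)] + P[f(3), g_e(1)] + P[f(4), g_e(2)] \\
h(7) &= P[f(1), g_e(0)] + P[f(2), g_e(1)] + P[f(3), g_e(2)]
\end{aligned} \tag{7}$$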
As mentioned above, for the similarity counting algorithm the P operator for
an exact match is given in equation (5) and that for a tolerance match is
given in equation (6).
Let us assume that there is a content addressable memory whose functional
block diagram is shown in Fig. 2. The content addressable memory has 32
memory cells. Each cell stores an 8-bit byte which is accessible by its
address in the same way as in regular random access memory (RAM). There
is a register, called the comparant register, that stores a desired bit
pattern, and the CAM can search for that pattern in the cells. It is possible
to search for only part of the comparant's bit pattern by using the bit
enable mask. If a bit in the mask is set to one, the corresponding bit of the
comparant is matched against that bit of the cells. For example,
    the comparant is:           100001XX
    and the bit enable mask is: 11111100
Bit 2 to bit 7 of all the cells will be matched with those of the comparant.
Bit 0 and bit 1 of the comparant will not affect the result because they
are masked out in the comparison. The tolerance match of equation (6) of the
similarity counting algorithm can be implemented using the bit enable mask
capability of the CAM. For the above arrangement, a byte 10000100
located at address 3 will be found in the CAM operation as shown below.


















(found) 10000100 - cell 3
Assume further that no other cell in the upper bank matches the comparant in
the specified bits. A match word, whose bit pattern corresponds to the cells
in the CAM, can be read out at the end, and the addresses of the matching
cells can be decoded from it. For the above example, the read-out word will
be hexadecimal 1000. If there is another byte 10000101 located at cell
address 1, then the read-out match word is hexadecimal 5000. By checking the
bits in the read-out match word, the location of all matches in the CAM cells
can be determined.
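The masked search and the match word can be mimicked in software. The sketch below is only a behavioural model: the function name `cam_search` is hypothetical, one 16-cell bank is modelled, and the bit ordering (cell 0 mapped to the most significant bit of the match word) is inferred from the hexadecimal values 1000 and 5000 quoted above. Masking the two low bits approximates a tolerance match by ignoring differences confined to those bits.

```python
def cam_search(cells, comparant, mask):
    """Compare every cell with the comparant on the bits enabled by mask.

    Returns a match word with one bit per cell; cell 0 is assumed to map to
    the most significant bit, which reproduces the values quoted in the text.
    """
    n = len(cells)
    word = 0
    for address, value in enumerate(cells):   # done in parallel in a real CAM
        if (value ^ comparant) & mask == 0:   # masked-out bits cannot cause a mismatch
            word |= 1 << (n - 1 - address)
    return word


bank = [0] * 16                # hypothetical contents of one 16-cell bank
bank[3] = 0b10000100
print(format(cam_search(bank, 0b10000100, 0b11111100), "04X"))

bank[1] = 0b10000101           # differs from the comparant only in a masked bit
print(format(cam_search(bank, 0b10000100, 0b11111100), "04X"))
```

The second search matches both cell 1 and cell 3 because their difference lies entirely in the masked-out low bits.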
The CAM can perform the P operation defined in (5) and (6) in parallel
over all the data stored in the cells. Examine the equations listed in (7).
In the usual sequential algorithm h(x) is calculated one row at a time. In a
parallel algorithm the input f(x) can be stored in the cells of the CAM. Each
time, one sample of the template g(x) is taken and used as the comparant.
Then, in one operation time of the CAM, the match word yields one column of
the equations in (7). The critical issue is how to use the match word output
from the CAM to yield the result. At this point it can be seen that the CAM
must be operated a number of times proportional to the size of the template,
M. If everything else can be done in hardware, the total number of operations
for a one-dimensional signal is reduced from L*M to M, as compared with the
traditional correlation method. This is an even larger reduction than that of
the FFT algorithm, which reduces an N^2 problem to an N log N problem.
Since the output of a comparison operator P is either binary 0 or 1, the
accumulation of the comparison results can be simplified to binary counting,
and parallel binary counters can be used to accumulate the results. If the
samples of the template g(x) are compared in the reverse order, the
accumulation starts with the rightmost column in (7) and moves toward the
left. This way the index argument of f(x) in the P operators decreases in
value. If f(x) is stored in the CAM cells in fixed positions, the accumulated
result has to be shifted toward the least significant direction before the
later result can be added to it. In other words, either the match word read
out or the counters must be shifted so that the accumulation in the counters
is done in the same way as described in the rows of (7). It is found to be
advantageous to shift the counters instead of the match word, because that
minimizes the total number of shifts required. Fig. 3 shows a simplified view
of the shift count operation discussed above. A bank of binary counters is
connected into a parallel-load ring. Bit 0 of the match word enables counter
0, bit 1 enables counter 1, and so on. When this operation finishes, the
result h(0) is left in counter 0, whose number happens to be the index of the
first argument of the P operators performed most recently. From the equations
in (7) it is seen that the index of f(x), the first argument of P, is the L's
complement of the output index of h(x). Therefore, a simple final data
shuffling is necessary.
The parallel algorithm can be summarized as follows:
(1) Load the input f(x) into the CAM cells in the forward order.
      f(0)  -> cell 0
      f(1)  -> cell 1
      ...
      f(31) -> cell 31
(2) Load one comparant at a time from g(x), in the reverse order.
      1st time: g(M-1) -> comparant
      2nd time: g(M-2) -> comparant
      ...
      Mth time: g(0)   -> comparant
(3) For each comparant load, perform one count shift operation.
(4) At the end, reshuffle the data based on the L's complement of the
    current position.
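The four steps above can be emulated sequentially to check that the shift-and-count scheme reproduces the rows of (7). This is a sketch under stated assumptions: the names `shift_count_correlate` and `rows_of_eq7` are hypothetical, the inner loop stands in for one parallel CAM operation, the counter ring is shifted toward the lower index between comparant loads, and the exact or tolerance match is modelled with a bit mask.

```python
def shift_count_correlate(f, g, mask=0xFF):
    """Emulate the CAM plus counter-ring algorithm for one-dimensional data."""
    L = len(f)
    counters = [0] * L                        # one counter per CAM cell
    for k in range(len(g) - 1, -1, -1):       # step (2): template in reverse order
        comparant = g[k]
        for cell in range(L):                 # one CAM operation (parallel in hardware)
            if (f[cell] ^ comparant) & mask == 0:
                counters[cell] += 1           # match bit enables the counter
        if k > 0:                             # step (3): shift the ring by one position
            counters = counters[1:] + counters[:1]
    # step (4): the output index is the L's complement of the counter position
    return [counters[(L - x) % L] for x in range(L)]


def rows_of_eq7(f, g, mask=0xFF):
    """Direct row-by-row evaluation, used here only as a reference."""
    L = len(f)
    return [sum(1 for k in range(len(g)) if (f[(k - x) % L] ^ g[k]) & mask == 0)
            for x in range(L)]


f = [7, 7, 1, 2, 3, 7, 7, 7]   # hypothetical 8-point input
g = [1, 2, 3]                  # hypothetical 3-point template
print(shift_count_correlate(f, g))
print(rows_of_eq7(f, g))
```

The emulation performs M passes over the counter ring, one per comparant, which corresponds to the M CAM operations claimed for the hardware.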


































A Logic Design for a One-Dimensional Implementation:
Assume that a CAM like that described in the previous section exists.
Details of the CAM will not be discussed here. A one-dimensional
implementation of the similarity counting algorithm as a special-purpose
peripheral subsystem is shown in Fig. 4. There are four main parts in this
subsystem. The first part is the CAM, which stores the input f(x) in its
cells. The second part is the template buffer, where g(x) is stored. The
third part is the shiftcount unit, and the fourth part is the controller. The
subsystem has the usual interface to the host buses. Decoding circuitry and
parallel ports accept commands from the host and receive f(x) and g(x) to
detect the object in the signal. The status of the operation is reported back
to the host. If the operation is successful, the result will



















Fig. 4. The special peripheral subsystem.
The system-level interface between the host and the subsystem is as
follows. The subsystem accepts as inputs:
  • Image data, f(x)
  • Template data, g(x)
  • Match tolerance, used to set the bit enable mask in the CAM
The subsystem produces as outputs:
  • Locations of the best match clusters
  • The number of match clusters
  • The locations and the counts of the other match clusters
The host should be able to use this information in an application program.
The template buffer is just a regular RAM buffer with no special circuitry
in it. The details of the shiftcount unit are discussed here. The operations
of the CAM, the template buffer, and the shiftcount unit are coordinated by
the control unit. To achieve high-speed operation and flexible configuration
for two-dimensional signals, the controller can be implemented with bit-slice
microprocessors. The one-dimensional case is simpler, and the controller is
implemented directly in hardware logic. The design of the control unit is not
discussed here.
Shift Count Unit:
Following the algorithm discussed in the previous section, the binary
counter bank is designed first. Fig. 5 shows the logic diagram of the basic
counter, called scell 1. Two F163's are used to provide an 8-bit binary counter.
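The cascading of two 4-bit counters into one 8-bit counter cell can be sketched behaviourally as follows. This is not a pin-accurate model of the F163 or of the drawing in Fig. 5: the class names `Counter4` and `SCell1`, the method names, and the priority of clear over load over count are assumptions made for illustration.

```python
class Counter4:
    """Behavioural 4-bit synchronous counter with parallel load (assumed model)."""

    def __init__(self):
        self.q = 0

    def terminal_count(self):
        return self.q == 0xF

    def clock(self, enable=False, load=None, clear=False):
        if clear:
            self.q = 0
        elif load is not None:
            self.q = load & 0xF
        elif enable:
            self.q = (self.q + 1) & 0xF


class SCell1:
    """8-bit counter cell built from two cascaded 4-bit counters."""

    def __init__(self):
        self.low = Counter4()
        self.high = Counter4()

    @property
    def q(self):
        return (self.high.q << 4) | self.low.q

    def clock(self, enable=False, load=None, clear=False):
        # The carry is sampled before the clock edge, as in a synchronous cascade.
        carry = enable and self.low.terminal_count()
        low_load = None if load is None else load & 0xF
        high_load = None if load is None else (load >> 4) & 0xF
        self.low.clock(enable, low_load, clear)
        self.high.clock(carry, high_load, clear)


cell = SCell1()
for _ in range(16):
    cell.clock(enable=True)   # count sixteen enabled clock edges
print(cell.q)
cell.clock(load=0xFE)         # parallel load, as used for shifting between cells
print(hex(cell.q))
```

The parallel load path is what later allows a neighbouring counter's value to be copied into the cell during the shift operation.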


































Fig. 5. The basic counter cell.
Using scell 1, a bank of sixteen binary counters called scell 16 is
constructed as shown in Fig. 6. There are enable inputs EN<15..0> that allow
the counters to be incremented selectively. There are also parallel load
paths between the counters in the bank. Using these parts, the shiftcount
unit is constructed as shown in Fig. 7. The shiftcount unit has a finite
state machine that performs its operations at the leading edge of the clock
and sequences to the next state at the falling edge of the clock. The
shiftcount unit incorporates two scell 16 units. At the end of each CAM
operation, the shiftcount unit is started. Its finite state machine performs
the following sequence of operations:
           At the leading edge               At the falling edge
state 0:   load in the match word            change to state 1
           for low bank cells from CAM
state 1:   load in the match word            change to state 2
           for high bank cells from CAM
state 2:   send the 32 bit enable            change to state 3
           input to the two scell 16
state 3:   activate the binary counter       change to state 0
           bank for parallel shift upward
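The four-state sequence can be modelled with a small Python class. The sketch is hypothetical in several respects: the class name `ShiftCountUnit`, the mapping of enable bit i to counter i, the 8-bit wrap of each counter, and the direction of the ring shift (toward the lower index, following the algorithm section rather than the orientation of the drawing) are all assumptions. One call to `clock` stands for one full clock period, that is, the action at the leading edge followed by the state change at the falling edge.

```python
class ShiftCountUnit:
    """Behavioural model of the shiftcount finite state machine (assumed details)."""

    def __init__(self):
        self.state = 0
        self.low_word = 0
        self.high_word = 0
        self.counters = [0] * 32              # two banks of sixteen 8-bit counters

    def clock(self, cam_word=0):
        if self.state == 0:                   # latch match word for the low bank
            self.low_word = cam_word & 0xFFFF
        elif self.state == 1:                 # latch match word for the high bank
            self.high_word = cam_word & 0xFFFF
        elif self.state == 2:                 # apply the 32 enables to the counters
            enable = (self.high_word << 16) | self.low_word
            for i in range(32):
                if (enable >> i) & 1:
                    self.counters[i] = (self.counters[i] + 1) & 0xFF
        else:                                 # state 3: parallel shift around the ring
            self.counters = self.counters[1:] + self.counters[:1]
        self.state = (self.state + 1) % 4     # sequence to the next state


unit = ShiftCountUnit()
unit.clock(0b101)   # state 0: low-bank match word
unit.clock(0b1)     # state 1: high-bank match word
unit.clock()        # state 2: selective count
unit.clock()        # state 3: shift
print(unit.state, [i for i, c in enumerate(unit.counters) if c])
```

After the four clock periods the machine is back in state 0 and ready for the match words of the next comparant.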





















































Fig. 7. Shift count Unit.
The interface between the shiftcount unit and the control unit is defined as
follows:
1. Reset: resets the finite state machine and the binary counter bank.
2. Start: starts the shiftcount operation after the CAM output is
   available.
3. CLK:   clock input.
4. Read:  when this level is pulled low, the contents of the binary
   counter bank are shifted out at the trailing edge of the clock.
Timing Verification:
The binary counter banks of the two scell 16 units are triggered at the
trailing edge of the clock. The timing verification results are shown in Fig.
8. This diagram shows the rising edge of the clock at 0 nanoseconds and the
trailing edge of the clock at 50 nanoseconds. For fixed, stable START*,
RESET*, and RD* inputs, the changing (unstable) states are shown as
cross-hatched areas in the diagram. All the intermediate signals in the
diagram, named according to the SCALD signal name syntax, are stable at those
two edges. Therefore, there are no timing problems in the proposed design.
All the device propagation delays are included in the test. For the
registers, the requirements on minimum pulse width, setup time, and hold time
are also checked and verified.
Simulation:
The design has also been checked in the SCALD simulator. Part of the
waveform results is shown in Fig. 9. A start pulse synchronized with the
clock is sent to the circuit. Different values of the data input D[15..0] are
provided to check whether the counter banks in the scell 16 units work
correctly. The RESET* function is also checked to see whether the unit can be
cleared. The simulation was run from time 0 to time 6000 nanoseconds. The
basic clock period is 100 nanoseconds, so 120 possible events were tested in
the simulation run. Whenever RD* is pulled low, as shown in Fig. 9, the
stored counter results are output from the tri-state driver. The 19P$S<2..0>
signal shows the state of the finite state machine. Each time START* is
activated, the state machine goes through states 0, 1, 2, and 3 and returns
to state 0. The 82P$Y<31..0> signal is the enable input to the counter bank
that causes the selective counting. The 56P$Y signal is the parallel load
signal to the counter bank for shifting upward, as in Fig. 3. One drawback of
the simulation process is its slow response time. For a moderate-sized design
like this, it requires almost 50 minutes to get the results.
(Timing diagram: signal traces plotted from 0 to 100 ns.)
Timing Verification of Drawing 'SHIFTCNT'
Compilation on Sun Sep 8 16:08:56 1985
Verifier Directives:
  CLOCK.PERIOD 100.0!
  CLOCK.INTERNALS 10 <10.0ns>!
  CLOCK_SKEW 0.0!
Fig. 8. Timing Verification Results.























































In this paper, a new similarity counting algorithm for object detection
in signals is studied. Both the conventional sequential algorithms and the
proposed parallel algorithm are presented. A design is reported that uses
content addressable memory and large counter banks. Constructed in hardware,
this design allows object detection using M comparison operations on the CAM,
whereas regular correlation methods require L*M total operations. At present,
the logic design of the shiftcount unit, the template buffer, and the control
unit for the one-dimensional case is complete. The details of the logic
design, together with the timing verification results and the simulation
results, are discussed here. Further study is underway to extend the
controller design to the two-dimensional signal case. Along with this study,
an evaluation of a sequential similarity counting algorithm against the other
algorithms is also underway.
References
[1] D. I. Barnea and H. F. Silverman, "A Class of Algorithms for Fast Digital
    Image Registration," IEEE Trans. Comp., vol. C-21, pp. 179-186, Feb. 1972.
[2] M. Onoe and M. Saito, "Automatic Threshold Setting for the Sequential
    Similarity Detection Algorithm," IEEE Trans. Comp., pp. 1052-1053, Oct. 1976.





1. Library, Code 1422 2
Naval Postgraduate School
Monterey, CA 93943
2. Professor C.H. Lee, Code 62 Le 50
Electrical and Computer Engineering Dept.
Naval Postgraduate School
Monterey, CA 93943
3. Defense Technical Information Center 2
Attn: DDC-TCA
Cameron Station, Building 5
Alexandria, VA 22234
4. C.E. Holland, Jr. Code 811 2
Naval Ocean System Center
San Diego, CA 92152-5122






