Parallel processing for digital picture comparison by Kou, L. T. & Cheng, H. D.
Parallel Processing for Digital Picture Comparison
H.D. Cheng and L.T. Kou
University of California, Davis
Davis, CA 95616
In picture processing, an important problem is to identify two digital pictures of the same scene taken
under different lighting conditions. This kind of problem arises in remote sensing, satellite signal
processing, and related areas. The identification can be done by transforming the gray levels so that the
gray level histograms of the two pictures are closely matched. The transformation problem can be solved by
using the 'packing' method. In this paper, we propose a VLSI architecture consisting of m x n processing
elements with extensive parallel and pipelining computation capabilities to speed up the transformation, with
time complexity O(max(m,n)), where m and n are the numbers of gray levels of the input picture and the
reference picture, respectively. With a uniprocessor and a dynamic programming algorithm, the time complexity
is O(m^3 x n). The algorithm partition problem, an important issue in VLSI design, is discussed.
Verification of the proposed architecture is also given.
Index terms: Digital picture comparison, packing algorithm, very large scale integration (VLSI), algorithm
partition, VLSI architecture verification.
I. INTRODUCTION
The technique of dynamic programming has wide applications in computer science [6,7] for solving
mathematical problems arising from multistage decision processes. Based on the dynamic programming path-finding
algorithm, the technique is both mathematically sound and computationally efficient. The
recent advent of very-large-scale integration (VLSI) technology has triggered the thought of implementing some
algorithms directly in hardware with extensive parallel and pipelining computation capabilities. The use of
VLSI architectures to implement dynamic programming procedures has been investigated for several applications.
Guibas et al. [8] describe a VLSI architecture for a class of dynamic programming problems characterized by
optimal parenthesization. Chu and Fu [9] describe VLSI architectures for recognition of context-free and
finite-state languages. Chiang and Fu [10] describe a VLSI implementation of Earley's algorithm for parsing
general context-free languages. Cheng and Fu [11] describe algorithm partition and parallel recognition of
general context-free languages using a fixed-size VLSI architecture. Liu and Fu [12] describe a VLSI
implementation for string-distance computation. Clarke and Dyer [15] describe four VLSI architectures for line
and curve detection. Cheng and Fu [13,14] propose VLSI architectures for pattern matching and hand-written
symbol recognition. In this paper, we propose a VLSI architecture for identifying digital pictures that are
taken from the same scene under different lighting conditions. This is a very important problem related to
remote sensing, satellite signal processing, and other areas. As an important issue in VLSI design, the
algorithm partition problem is discussed. The backtracking procedure is also discussed in detail, and the
formal verification of the proposed architecture is given. An example is used to illustrate the operation of the
proposed VLSI architecture.
II. PRELIMINARY
The image matching technique has been used extensively for many applications such as curvature sequence
detection [2], template matching and pattern matching [1], character recognition, target recognition, aerial
navigation and stereo mapping, picture matching, earth resource analysis, missile guidance, intelligence
gathering systems, and robotics [2,3].
There are many situations in which we want to match or register two pictures with one another, or match
some given pattern with a picture [2]. For example:
(a) Given two or more pictures of the same scene taken by different sensors, we want to determine the
characteristics of each pixel with respect to all of the sensors so that we can classify the pixels.
(b) Given two pictures of a scene taken at different times, we want to determine the points at which they
differ so that we can analyze the changes that have taken place.
(c) Given two pictures of a scene taken from different positions, we want to identify corresponding points
in the pictures and then determine their distances from the camera to obtain three-dimensional
information about the scene.
https://ntrs.nasa.gov/search.jsp?R=19890017140 2020-03-20T02:08:01+00:00Z
(d) We want to find places in a picture where it matches a given pattern.
In this paper we discuss another very important aspect of picture processing, which is to identify two
digital pictures of the same scene taken under different lighting conditions. These kinds of problems arise in
many areas such as remote sensing and satellite signal processing. The identification can be done by
transforming the gray levels so that the gray level histograms of the two pictures are closely matched.
Mathematically, a picture is defined by a function of two variables F(x,y), where F(x,y) is the brightness,
or K-tuples of brightness values in several spectral bands [2,4,5], and x and y are the coordinates in the image
plane. In the black and white case, the values are called gray levels. These values are real, non-negative,
and bounded. The pictures are represented as matrices with integer elements, which are the pixels. A gray
level histogram of an image is a function that gives the frequency of occurrence of each gray level in the
image. Where the gray levels are quantized from 0 to n, the value of the histogram at a particular gray level
p, denoted H(p), is the number or fraction of pixels in the image with that gray level [5]. When pictures of
the same scene are obtained under different lighting conditions, different histograms result. To
identify these pictures, we can transform their gray level scales so that their histograms closely
match each other.
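As a small illustration of the histogram H(p) defined above, the following sketch counts pixels per gray level. The function name `gray_level_histogram` and the choice to number gray levels 1..levels (to match the indexing of the histograms used in the rest of the paper) are assumptions made for this example only.

```python
from collections import Counter


def gray_level_histogram(picture, levels):
    """Return H as a list where H[p - 1] is the number of pixels with gray level p.

    `picture` is a matrix (list of rows) of integer gray levels in 1..levels.
    """
    counts = Counter(pixel for row in picture for pixel in row)
    return [counts.get(p, 0) for p in range(1, levels + 1)]


# A 2 x 3 picture with four possible gray levels.
picture = [[1, 2, 2],
           [3, 3, 3]]
hist = gray_level_histogram(picture, 4)
```

Dividing each entry by the total number of pixels would give the fractional form of the histogram instead of the count form.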
Assume that $H_1$ and $H_2$ are histograms of two pictures obtained from the same scene with m and n gray levels,
respectively. An algorithm is proposed to "reshape" $H_1$ (i.e., rescale its gray levels) so that it has the
minimal deviation from $H_2$. The mathematical problem is defined by:
\[ Z = \min_{(x_0,\ldots,x_n)} \sum_{j=1}^{n} \left| H_2(j) - \sum_{i=p}^{q} H_1(i) \right| \]
where $p = x_{j-1}$ and $q = x_j - 1$, subject to
\[ 1 = x_0 \le x_1 \le \cdots \le x_n = m+1, \tag{1} \]
\[ x_i = \text{integer, for } i = 1,\ldots,n. \]
It will transform the gray levels $x_{j-1},\ldots,x_j - 1$ in one picture into gray level j in the other picture,
for suitably chosen $x_{j-1}$ and $x_j$, j = 1,...,n.
This problem can be interpreted as a packing problem: to pack m objects of sizes {$H_1(1),\ldots,H_1(m)$} into n boxes
of spaces {$H_2(1),\ldots,H_2(n)$} in such a way that
(i) if the ith object has been placed in the jth box, the (i+1)th object is not allowed to be packed into
the kth box for any k < j, and
(ii) the accumulated error due to space over-packed or left over is minimized.
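To make the packing interpretation concrete, the sketch below evaluates the accumulated error of one given packing. It assumes the packing is described by boundaries x_0,...,x_n with 1 = x_0 <= x_1 <= ... <= x_n = m+1, where objects x_{j-1} through x_j - 1 go into box j; the function name `packing_error` is hypothetical.

```python
def packing_error(h1, h2, x):
    """Accumulated error of packing objects h1 into boxes h2 using boundaries x.

    x = [x_0, ..., x_n] uses 1-based gray levels; objects x[j-1] .. x[j]-1
    are packed into box j (an empty range leaves the box unused).
    """
    m, n = len(h1), len(h2)
    if len(x) != n + 1 or x[0] != 1 or x[-1] != m + 1:
        raise ValueError("boundaries must run from 1 to m + 1")
    if any(a > b for a, b in zip(x, x[1:])):
        raise ValueError("boundaries must be non-decreasing")
    total = 0
    for j in range(1, n + 1):
        packed = sum(h1[x[j - 1] - 1 : x[j] - 1])  # sizes of objects in box j
        total += abs(h2[j - 1] - packed)           # over-packed or left-over space
    return total


h1 = [2, 3, 1, 4]   # object sizes H1(1..4)
h2 = [5, 5]         # box spaces H2(1..2)
z = packing_error(h1, h2, [1, 3, 5])   # objects 1-2 in box 1, objects 3-4 in box 2
```

Here the first packing fills both boxes exactly, while moving the boundary to [1, 2, 5] under-fills box 1 and over-fills box 2.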
Such a problem can be solved by using dynamic programming techniques. Let $S_j(i)$ be the minimal accumulated
error caused by transforming the gray levels 1,...,i into the gray levels 1,...,j. The recursive formula is
given by
\[ S_j(i) = \min_{0 \le u \le i} \left\{ S_{j-1}(u) + \left| H_2(j) - \sum_{v=u+1}^{i} H_1(v) \right| \right\} \tag{2} \]
for i = 1,...,m and j = 1,...,n, where the initial conditions are $S_0(0) = 0$, $S_0(i) = \sum_{v=1}^{i} H_1(v)$
for all i = 1,...,m, and $S_j(0) = \sum_{u=1}^{j} H_2(u)$ for all j = 1,...,n. If i > j, then
$\sum_{k=i}^{j} H_1(k) = 0$. The minimal accumulated error, $S_n(m)$, can be computed.
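The recursion can be sketched on a uniprocessor as follows. This is only an illustration under stated assumptions: the minimum is taken over 0 <= u <= i (so a box may stay empty), and prefix sums replace the repeated inner summation, which is a convenience of this sketch and not part of the formulation above. The function name `min_accumulated_error` is hypothetical.

```python
def min_accumulated_error(h1, h2):
    """Return the table S where S[j][i] is the minimal accumulated error S_j(i)."""
    m, n = len(h1), len(h2)
    # prefix[i] = H1(1) + ... + H1(i); the sum over v = u+1..i is prefix[i] - prefix[u].
    prefix = [0] * (m + 1)
    for i in range(1, m + 1):
        prefix[i] = prefix[i - 1] + h1[i - 1]

    S = [[0] * (m + 1) for _ in range(n + 1)]
    S[0] = prefix[:]                       # S_0(i): all of levels 1..i left unpacked
    for j in range(1, n + 1):
        S[j][0] = S[j - 1][0] + h2[j - 1]  # S_j(0): boxes 1..j left empty
        for i in range(1, m + 1):
            S[j][i] = min(
                S[j - 1][u] + abs(h2[j - 1] - (prefix[i] - prefix[u]))
                for u in range(i + 1)
            )
    return S


table = min_accumulated_error([2, 3, 1, 4], [5, 5])
best = table[2][4]   # S_n(m) for m = 4, n = 2
```

With prefix sums each entry costs O(i) work, so this sketch runs in O(m^2 x n); recomputing the inner sum term by term for every u is what adds the extra factor of m.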
The straightforward execution of this procedure obtains the optimal solutions for all (i,j) pairs
with time complexity O(m^3 x n) on a uniprocessor. In this paper, we propose an m x n VLSI array to
speed up the computation. The time complexity of the proposed architecture is O(max(m,n)).
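One way to see where a cubic factor in m can come from is to count additions under a simple model. The model is an assumption of this note: every $S_j(i)$ is evaluated directly from the recursion for u = 0,...,i-1, and each inner sum of $i-u$ terms of $H_1$ is recomputed from scratch.

```latex
\sum_{j=1}^{n}\sum_{i=1}^{m}\sum_{u=0}^{i-1}(i-u)
  = n\sum_{i=1}^{m}\frac{i(i+1)}{2}
  = \frac{n\,m(m+1)(m+2)}{6}
  = O(m^{3}n)
```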
III. VLSI DIGITAL PICTURE COMPARATOR
3.1 The algorithm and its VLSI implementation
We propose a VLSI architecture based on the space-time domain expansion approach [14,15], which has a
very natural and regular configuration and can be implemented easily with today's VLSI technology.
Another important issue in VLSI design, the algorithm partition problem, is also addressed by the proposed VLSI
architecture. The proposed VLSI architecture can speed up the digital picture comparison procedure greatly
by using extensive parallel and pipelining techniques. Before discussing the VLSI architecture in detail, we
propose the following algorithm.
Let $H_1$ and $H_2$ be the histograms of two pictures taken of the same scene with m and n gray levels,
respectively.
Algorithm 1: The algorithm for digital picture comparison:
S_0(0) := 0;
for i := 1 to m do
begin
  S_0(i) := 0;
  for k := 1 to i do
    S_0(i) := S_0(i) + H_1(k)
end;
for j := 1 to n do
begin
  S_j(0) := 0;
  for k := 1 to j do
    S_j(0) := S_j(0) + H_2(k)
end;
for i := 1 to m do
  for u := 0 to i-1 do
  begin
    v := u + 1;
    T(v) := 0;
    for k := v to i do
      T(v) := H_1(k) + T(v)
  end;
for i := 1 to m do
  for j := 1 to n do
  begin
    T' := S_{j-1}(i) + H_2(j);
    I := i (index channel value);
    T := 0;
    for u := 0 to i-1 do
    begin
      v := u + 1;
      s := 0;
      for k := v to i do
        s := s + H_1(k);
      T := |H_2(j) - s|;
      T := S_{j-1}(u) + T;
      if T < T' then
        T' := T and output v to the index channel by letting I := v
    end;
    S_j(i) := T';
    Append index-pair (I,j-1) to index-pair (i,j), when the identification signal arrives, and form
    {(I,j-1), (i,j)}
  end
end.
We can build a VLSI array with m x n processing elements. Each processing element has a subtracter, which
produces the absolute value of the difference of its two inputs, and a comparator, which compares two input values
and outputs the smaller value with the corresponding index to the next processing element below it. The
functions performed by the (i,j)th processing element are as follows:
Input: $H_2(j)$, the outputs of the (i-1,j)th processing element, $\sum_{k=v}^{i} H_1(k)$, the index-pair,
$S_{j-1}(i-1)$ and $S_{j-1}(i)$.
Output: $S_j(i)$ and the index-pair to the right element when the identification signal arrives, and the intermediate
results to the processing element below.
Operations: Each processing element has a local connection to the processing element beneath it, which
accepts the intermediate results including the accumulated errors and the index-pairs, and a local connection
to the right processing element, which receives $S_j(i)$ and the index-pair (i,j) when the identification signal
arrives. Each processing element can perform accumulation, |a-b|, and comparison operations, each of which requires one
time unit. The adder is a combinational circuit, which either requires no time unit or has a delay much
smaller than a time unit. The data move among the processing elements, one processing element per time
unit.
1) The input data of $H_1$ arrive at the (i,j)th PE, which accumulates them, one time unit each.
Computing $|H_2(j) - \sum_{k=v}^{i} H_1(k)|$ needs one time unit.
2) $S_{j-1}(i-1)$ arrives at the (i,j)th processing element, which performs the operation
$S_{j-1}(i-1) + |H_2(j) - H_1(i)|$; this requires one time unit. The result is delayed one time unit.
3) $\min_{0 \le u \le i-2} \{ S_{j-1}(u) + |H_2(j) - \sum_{v=u+1}^{i} H_1(v)| \}$ arrives at the (i,j)th processing
element from the (i-1,j)th processing element and is compared with the result of step 2), which requires one
time unit. At the same time, the identification signal arrives and the result is compared with
$S_{j-1}(i) + H_2(j)$, which requires one time unit. Then $S_j(i)$ and the index-pair are sent to the
(i,j+1)th processing element.
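The three operations above amount to splitting the minimum for $S_j(i)$ into three candidates: the value pipelined from the element above (u = 0,...,i-2), the locally computed value (u = i-1), and the empty-box value checked when the identification signal arrives (u = i). The sketch below shows only this arithmetic, not the timing or wiring; the function name `pe_output` and its argument layout are assumptions of this example.

```python
import math


def pe_output(s_prev, h1, h2_j, i):
    """Combine the three candidates seen by the (i, j)th processing element.

    s_prev[u] holds S_{j-1}(u) for u = 0..m, h1 is the list H1(1..m)
    stored 0-based, and h2_j is H2(j).
    """
    def packed(u):
        # Sum of H1(v) for v = u+1 .. i (0-based slice u .. i-1).
        return sum(h1[u:i])

    # Step 3: best candidate for u = 0 .. i-2, as delivered from the PE above.
    from_above = min(
        (s_prev[u] + abs(h2_j - packed(u)) for u in range(0, i - 1)),
        default=math.inf,
    )
    # Step 2: candidate for u = i-1, computed locally.
    local = s_prev[i - 1] + abs(h2_j - h1[i - 1])
    # Identification signal: candidate for u = i, box j left empty.
    empty_box = s_prev[i] + h2_j
    return min(from_above, local, empty_box)


s0 = [0, 2, 5, 6, 10]            # S_0(0..4) for H1 = [2, 3, 1, 4]
value = pe_output(s0, [2, 3, 1, 4], 5, 2)
```

For i = 1 there is no element above to supply a candidate, which the sketch models with an infinite placeholder.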
Algorithm 2: VLSI implementation of Algorithm 1
Input: Gray levels of the input picture, $H_1(i)$, and of the reference picture, $H_2(j)$ (for 1 ≤ i ≤ m, 1 ≤ j ≤ n);
indices, index pairs; initial conditions: $S_0(0)$, $S_0(i)$, and $S_j(0)$ (for 1 ≤ i ≤ m, 1 ≤ j ≤ n); and identification
signals.
Output: The accumulated error $S_j(i)$ and the corresponding index pairs.
Move the gray levels $H_2(j)$ of the reference picture, the identification signals, and the index j from the
top to the bottom, one processing element per time unit. Move the gray levels $H_1(i)$ of the input picture and
the index i from the left to the right of the VLSI array, one processing element per time unit. The identification
signals are sent at the fifth time unit and move down one processing element per two time units and
move to the right one processing element per time unit. When the identification signal arrives, it opens
the connection channel from the comparator to the right processing element, and $S_j(i)$ is sent to
the processing element (i,j+1). To obtain the 'packing' sequence, we have to perform a backtracking procedure,
which can be done in several ways as follows.
1) Output the accumulated error matrix S and/or the index-pairs to the host machine, which will perform the
backtracking procedure.
2) Attach another VLSI module and use the tag of the index pair as the search key to perform the
backtracking procedure.
3) Expand the 'append' operation to one which appends the index to the index list of its ancestor.
An index list is formed by appending an index or an index list. We can use the index (m,n) as the tag to
find the 'packing' sequence. This changes the backtracking procedure into a forward one and speeds up the
computation, but it requires a large output channel capacity, especially for the processing elements
located at the upper-right corner. The upper bound of the channel capacity for the (i,j) processing
element will be (i+j+1).
4) Add an index register to each processing element, consisting of two parts: the first part for the
first index and the second part for the second index. The second part of the index pair register is
compared with the tag. If they match, the second part is output into the output channel and also
output into the first part as the tag to its top and left side neighbors. The tag moves up until it
matches another index pair, and the procedure continues. We need one index register for each
processing element. At the (2i+j+3)th time unit, send a backtracking signal which moves along the
channels connecting to the left neighbor and the one on top of it, one processing element per time
unit. The index (m,n) is used for the tag of the (m,n)th processing element. It needs at most (m+n)
time units to complete this procedure.
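The first option, backtracking on the host machine, can be sketched as follows. This is a minimal illustration and not the hardware procedure: it assumes the host keeps, for every (i,j), the index u that achieved the minimum, and then walks back from (m,n). The names `pack_with_backtracking`, `pred`, and `boxes` are hypothetical.

```python
def pack_with_backtracking(h1, h2):
    """Return (S_n(m), boxes) where boxes[j-1] = (first, last) gray levels packed into box j.

    A pair with first > last means box j is left empty.
    """
    m, n = len(h1), len(h2)
    prefix = [0]
    for value in h1:
        prefix.append(prefix[-1] + value)

    S = [[0] * (m + 1) for _ in range(n + 1)]
    pred = [[0] * (m + 1) for _ in range(n + 1)]   # best u for each (j, i)
    S[0] = prefix[:]
    for j in range(1, n + 1):
        S[j][0] = S[j - 1][0] + h2[j - 1]
        for i in range(1, m + 1):
            def cost(u):
                return S[j - 1][u] + abs(h2[j - 1] - (prefix[i] - prefix[u]))
            best_u = min(range(i + 1), key=cost)
            pred[j][i] = best_u
            S[j][i] = cost(best_u)

    # Walk back from (m, n), one box at a time.
    boxes = []
    i = m
    for j in range(n, 0, -1):
        u = pred[j][i]
        boxes.append((u + 1, i))   # levels u+1 .. i were packed into box j
        i = u
    boxes.reverse()
    return S[n][m], boxes


error, boxes = pack_with_backtracking([2, 3, 1, 4], [5, 5])
```

If the walk ends with i > 0, gray levels 1..i were not packed into any box; their sizes are already included in the error through the initial condition for j = 0.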
From the above discussion, we can conclude that the proposed architecture can compare two digital pictures
by transforming the gray levels. In many applications, only the summation error is required. In such cases, we
can further simplify the structure of the processing element and the entire VLSI architecture. If there are P
digital pictures to be compared with the reference picture, or an input digital picture to be compared with P
reference digital pictures, we can make a P-time expansion. The time complexity will be O(max(P x m, n)). With
a uniprocessor, the time complexity will be O(P x m^3 x n). To indicate the most closely matched digital picture, we
number the digital pictures and add a register consisting of two parts. One part is for the summation error,
the other is for the index of the numbered digital pictures. We also add a counter, which is initially set to
zero and starts at the (2m+n+3)rd time unit.
The operation of the register is as follows:
begin
  error.register := ∞;
  if error.register > error.array
  then begin
    error.register := error.array;
    index.register := counter
  end
end
The final value of index.register indicates the index of the most closely matched digital picture. If we use a
three-dimensional array (P x m x n processing elements), the time complexity will be reduced to O(max(P,m,n)).
The details are omitted here.
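The register operation can be illustrated in software as a running minimum over the P summation errors. In this sketch the pictures are numbered from 1 and the errors arrive as a list; both choices, and the name `most_matched`, are assumptions of the example.

```python
import math


def most_matched(errors):
    """Return (index, error) of the picture with the smallest summation error.

    `errors` lists the summation error of each numbered picture in arrival order.
    """
    error_register = math.inf   # no error seen yet
    index_register = None
    for counter, error in enumerate(errors, start=1):
        if error_register > error:      # strictly smaller: keep the earliest on ties
            error_register = error
            index_register = counter
    return index_register, error_register


winner = most_matched([7, 3, 9, 3])
```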
IV. VERIFICATION OF THE PROPOSED ALGORITHM
To verify Algorithm 2, we need the following lemmas and theorem.
Lemma 1: The identification signal arrives at the (i,j)th processing element at the (2i+j+2)th time unit.
Proof: The identification signal is sent at the fifth time unit and needs 2(i-1) time units to reach
the ith row; it then needs j-1 time units to arrive at the (i,j)th processing element. In total,
5+2i-2+j-1 = 2i+j+2 time units.
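Lemma 1 can be checked numerically by propagating the signal step by step. The sketch assumes the signal enters at element (1,1) at the fifth time unit, travels down the first column at one element per two time units, and travels right along each row at one element per time unit; the function name `arrival_by_propagation` is hypothetical.

```python
def arrival_by_propagation(m, n, start=5):
    """Arrival time of the identification signal at every element (i, j) of an m x n array."""
    arrival = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if i == 1 and j == 1:
                arrival[(i, j)] = start                      # sent at the fifth time unit
            elif j == 1:
                arrival[(i, j)] = arrival[(i - 1, j)] + 2    # one row down per two time units
            else:
                arrival[(i, j)] = arrival[(i, j - 1)] + 1    # one column right per time unit
    return arrival


times = arrival_by_propagation(4, 6)
matches_lemma = all(t == 2 * i + j + 2 for (i, j), t in times.items())
```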
Lemma 2: $\sum_{k=v}^{i} H_1(k)$ will be computed at the (v,j)th processing element at the (i+v+j-2)th time unit,
for all 1 ≤ v ≤ i, 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Proof: First consider the case j = 1. From the data arrangement in Fig. 3, the first input of the vth row
will arrive at the boundary of the array at the 2(v-1)th time unit; then (i-v+1) time units are needed to compute
$\sum_{k=v}^{i} H_1(k)$. In total, 2(v-1)+(i-v+1) = i+v-1 time units are needed. Since the computation of the
(v,k)th processing element starts one time unit earlier than that of the (v,k+1)th processing element, the
(v,j)th processing element needs i+v+j-2 time units to produce the summation.
Theorem: After receiving the inputs, $S_j(i)$ will be produced at the (2i+j+3)th time unit, for all 1 ≤ i ≤ m and
1 ≤ j ≤ n.
Proof: We prove the theorem by induction on i and j.
Basis: First we consider the case i = j = 1. Since $S_0(0)$ and $S_0(1)$ are fixed values which exist already, $H_1(1)$
is input into the processing element (1,1), which performs the accumulation, requiring one time unit.
Then $|H_2(1) - H_1(1)|$ is computed in another time unit. It is added to $S_0(0)$ and delayed one time
unit. At the fourth time unit, the result is compared with $S_1(0)$. When the identification signal arrives
at the fifth time unit, the result of the comparator is compared with $S_0(1) + H_2(1)$ and $S_1(1)$ is output, which
needs one more time unit: 6 = 2x1+1+3 = 2xi+j+3.
Induction Step: Our induction hypothesis is that all (p,q)th processing elements can produce the outputs
and the index-pairs at the (2xp+q+3)th time unit, for all 1 ≤ p ≤ i and 1 ≤ q ≤ j.
Now consider the (i+1,j)th processing element. According to Lemma 2 and the hypothesis,
$\sum_{k=v}^{i+1} H_1(k)$ will be computed at the (v,j)th processing element at the (i+1+v+j-2)th time unit, for all
1 ≤ v ≤ i, 1 ≤ i ≤ m and 1 ≤ j ≤ n, and the comparators are connected in a pipeline fashion, so
$M = \min_{0 \le u \le i} \{ S_{j-1}(u) + |H_2(j) - \sum_{v=u+1}^{i+1} H_1(v)| \}$ will be output from the
(i,j-1)th processing element at the (2i+j-1+2+3)th time unit. Also, $S_{j-1}(i+1)$ will be input at the
(2x(i+1)+j-1+3)th time unit. At the same time, $N = S_{j-1}(i+1) + H_2(j)$ will be computed. According to Lemma 1,
the identification signal arrives at the (i+1,j)th processing element at the (2(i+1)+j+2)th time unit. Then M and N
will be compared, and the minimum {M,N} = $S_j(i+1)$ will be sent to the (i+1,j) processing element at the
(2i+j+5)th time unit. Since $S_{j+1}(i+1)$ will be one time unit later than $S_j(i+1)$, $S_{j+1}(i+1)$ will be
obtained at the (2i+j+6) = (2(i+1)+(j+1)+3)th time unit. Therefore the proof is completed.
Corollary 1: The accumulated error and the index pairs can be obtained at the (2m+n+3)th time unit.
Proof: Apply the theorem with i = m and j = n.
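As a rough sense of scale, consider m = n = 256 gray levels (a value chosen here only for illustration). The two quantities below are not directly comparable, since one counts array time units and the other is an order-of-magnitude operation count for a uniprocessor, but they indicate the gap between the two approaches.

```latex
2m + n + 3 = 2(256) + 256 + 3 = 771,
\qquad
m^{3} n = 256^{3} \cdot 256 = 2^{32} \approx 4.29 \times 10^{9}
```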
V. ALGORITHM PARTITION
We could use a one-dimensional array, or a two-dimensional array whose size differs from the problem size, by
performing time expansions following the partition rule.
A. Using the One-Dimensional Array
First we assume that the size of the array is m. We can consider it as an m-space expansion along the x1
direction. The input channels form queues. The register holds the initial value and the result
from the CR output, which is loaded into the register by the control signal. The control signal is sent at the
(m+1)th time unit and moves down one processing element per two time units. The input is repeated n times.
The time complexity will be O(m x n).
B. Using the Two-Dimensional Array with the Dimensions k x l
If k ≥ m and l ≥ n, it is the case which has already been discussed. We now consider the other cases.
According to the partition rule, we have to make an ⌈m/k⌉-time expansion and an ⌈n/l⌉-time expansion. There are
also queues for feedback of the data. The lengths of the queues vary with the values of m and n to
make the right data meet at the right processing element at the right time. This causes much difficulty for
the control system and the queue structures. Hence, we either use a sufficiently large VLSI architecture
or use a one-dimensional array to solve the partition problem.
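The number of time expansions needed for a fixed-size k x l array can be computed as below. The sketch reads the bracketed quotients in the partition rule as ceilings, which is an assumption, and the function name `expansion_counts` is hypothetical.

```python
def expansion_counts(m, n, k, l):
    """Return (row_passes, column_passes) needed to cover an m x n problem with a k x l array."""
    if min(m, n, k, l) < 1:
        raise ValueError("all sizes must be positive")
    row_passes = -(-m // k)      # ceiling of m / k using integer arithmetic
    column_passes = -(-n // l)   # ceiling of n / l
    return row_passes, column_passes


passes = expansion_counts(256, 256, 64, 32)
```

When k >= m and l >= n both counts are 1, which corresponds to the full-size array discussed earlier.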
VI. CONCLUDING REMARKS
We have proposed a VLSI architecture for digital picture comparison. The time complexity is O(max(m,n))
using a two-dimensional m x n array, where m is the number of gray levels of the input digital picture and n is
the number of gray levels of the reference digital picture. With a uniprocessor, the comparison process has time
complexity O(m^3 x n) if the straightforward computation approach is used. If there are p reference pictures,
the proposed architecture solves the comparison process in time O(max(m x p, n)), whereas on a uniprocessor
the time complexity is O(m^3 x p x n). With a three-dimensional array, this problem can be solved in time
O(max(m,n,p)). One important issue in VLSI design, the algorithm partition, is discussed, and formal
verification of the proposed VLSI architecture is given. The proposed architecture will be useful for remote sensing,
satellite signal processing, and other related areas. It can also be useful for other 'packing' related
tasks and for real-time digital picture processing.
LIST OF REFERENCES
1. K. S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall, Inc., 1982.
2. A. Rosenfeld and A. C. Kak, Digital Picture Processing, Vol. 2, Academic Press, New York.
3. R. J. Offen, VLSI Image Processing, Collins Professional and Technical Books, William Collins Sons & Co.
Ltd., 1985.
4. T. Pavlidis, Algorithms for Graphics and Image Processing, Computer Science Press, 1982.
5. D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1982.
6. R. E. Bellman, Dynamic Programming, Princeton, NJ: Princeton Univ. Press, 1975.
7. K. Q. Brown, "Dynamic programming in computer science," CMU Tech. Rep., February 1979.
8. L. J. Guibas, H. T. Kung and C. D. Thompson, "Direct VLSI implementation of combinatorial algorithms,"
Caltech Conf. on VLSI, January 1979.
9. K. H. Chu and K. S. Fu, "VLSI architectures for high speed recognition of general context-free languages
and finite state languages," Proc. 9th Annual Int'l. Symp. Comput. Arch., Austin, TX, April 1982.
10. Y. T. Chiang and K. S. Fu, "Parallel parsing algorithm and VLSI implementations for syntactic pattern
recognition," IEEE Trans. on Pattern Anal. Machine Intell., May 1984.
11. H. D. Cheng and K. S. Fu, "Algorithm partition and parallel recognition of general context-free languages
using fixed-size VLSI architecture," Pattern Recognition, Vol. 19, No. 5, 1986.
12. H. H. Liu and K. S. Fu, "VLSI arrays for minimum-distance classifications," in VLSI for Pattern
Recognition and Image Processing, edited by K. S. Fu, Springer-Verlag, Berlin, Heidelberg, 1984.
13. H. D. Cheng and K. S. Fu, "VLSI architectures for string matching and pattern matching," to appear in
Pattern Recognition, Vol. 20, No. 1, 1987.
14. H. D. Cheng and K. S. Fu, "VLSI architecture for dynamic time-warp recognition of hand-written symbols,"
IEEE Trans. Acoust., Speech, Signal Processing, Vol. 34, No. 3, June 1986.
15. N. J. Clarke and C. R. Dyer, "Curve Detection in VLSI," in VLSI for Pattern Recognition and Image Processing,
edited by K. S. Fu, Springer-Verlag, Berlin, Heidelberg, 1984.
16. W-H Chow and L. T. Kou, "Matching Two Digital Pictures," Proc. International [...] Symposium, December
1978.
