Real time pipelined system for forming the sum of products in the processing of video data by Wilcox, Brian
United States Patent [19] [11] Patent Number: 4,750,144
Wilcox [45] Date of Patent: Jun. 7, 1988
[54] REAL TIME PIPELINED SYSTEM FOR FORMING THE SUM OF PRODUCTS IN THE PROCESSING OF VIDEO DATA
[75] Inventor: Brian Wilcox, Tujunga, Calif.
[73] Assignee: The United States of America as represented by the Administrator of the National Aeronautics and Space Administration, Washington, D.C.
[21] Appl. No.: 815,106
[22] Filed: Dec. 31, 1985
[51] Int. Cl.4 ........ G06F 15/34
[52] U.S. Cl. ........ 364/728; 364/757; 382/42
[58] Field of Search ........ 364/728, 736, 754, 757, 758, 604; 382/42, 41, 23, 34; 358/166
[56] References Cited
U.S. PATENT DOCUMENTS
3,725,689 4/1972 Kautz ........ 364/728
3,866,030 11/1975 Baugh et al. ........ 364/757
3,956,622 5/1976 Lyon ........ 364/757
3,993,890 11/1976 Peled et al. ........ 364/724
4,013,879 3/1977 Bornmann et al. ........ 364/757
4,121,296 10/1978 Snijders et al. ........ 364/724
4,145,931 3/1979 Delforge ........ 364/724
4,161,033 7/1979 Martinson ........ 364/728
4,231,100 10/1980 Eggemont ........ 364/724
4,255,794 3/1981 Nakayama ........ 364/724
4,293,922 10/1981 Davio et al. ........ 364/757
4,322,810 3/1982 Nakayama ........ 364/724
4,328,426 5/1982 D'Ortenzio ........ 382/34 X
4,334,277 6/1982 Bond et al. ........ 364/758 X
4,347,580 8/1982 Bond ........ 364/728 X
4,388,693 6/1983 Nakayama ........ 364/724
4,407,013 9/1983 Kanemasa ........ 364/736
4,430,721 2/1984 Acampora ........ 364/724
4,432,066 2/1984 Benschop ........ 364/758
4,450,533 5/1984 Petit et al. ........ 364/724
4,489,393 12/1984 Kawahara et al. ........ 364/728
4,574,357 3/1986 Pastor et al. ........ 382/42 X
4,623,923 11/1986 Orbach ........ 382/42 X
4,665,556 5/1987 Fukushima et al. ........ 382/41
OTHER PUBLICATIONS
R. Bakis, et al., Pipelined Convolver for Two-Dimensional Images, IBM Technical Disclosure Bulletin, vol. 14, No. 2, Jul. 1971, pp. 475-476.
Primary Examiner-Gary V. Harkcom
Assistant Examiner-Tan V. Mai
Attorney, Agent, or Firm-Paul F. McCaul; John R. Manning; Thomas H. Jones
[57] ABSTRACT
A 3-by-3 convolver utilizes 9 binary arithmetic units 
(10) connected in cascade for multiplying 12-bit binary 
pixel values pi which are positive or two’s complement 
binary numbers by 5-bit magnitude (plus sign) weights 
Wi which may be positive or negative. The weights are 
stored in registers (13,14 and 15) including the sign bits 
(shown separately for convenience). For a negative 
weight, the one’s complement of the pixel value to be 
multiplied is formed at each unit by a bank of exclusive 
OR gates under control of the sign of the corresponding 
weight Wi, and a correction is made by adding the sum 
of the absolute values of all the negative weights for 
each 3 x 3 kernel. Since this correction value remains 
constant as long as the weights are constant, it can be 
precomputed and stored in a register (16) as a value to 
be added to the product PW of the first arithmetic unit. 
10 Claims, 2 Drawing Sheets 
https://ntrs.nasa.gov/search.jsp?R=19880014785 2020-03-23T19:52:07+00:00Z
U.S. Patent Jun. 7, 1988 Sheet 1 of 2 4,750,144
(FIG. 1, drawing not reproduced)
U.S. Patent Jun. 7, 1988 Sheet 2 of 2 4,750,144

REGISTERS
Multiplicand p_i                    1 0 1 0   Input
Multiplier w_i                      1 1 0 1   Input
S = Σ|w_i| or PW+S        0 0 0 1 0 0 1 1 0   Input

STEPS
1. Enter S                0 0 0 1 0 0 1 1 0   Accumulator
                                    1 0 1 0
2. Add p_i                0 0 0 1 1 0 0 0 0   Accumulator
   Shift p_i                      1 0 1 0
3. Add zero               0 0 0 1 1 0 0 0 0
   Shift p_i                    1 0 1 0
4. Add p_i                0 0 1 0 1 1 0 0 0
   Shift p_i                  1 0 1 0
5. Add p_i                0 1 0 1 0 1 0 0 0   Final product PW+S (Output)

FIG. 2
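The add-and-shift steps of FIG. 2 can be sketched in a few lines of Python. This is only an illustrative model of the arithmetic, not the gate-level circuit; the function name `shift_add_accumulate` and its parameters are hypothetical and do not appear in the patent.

```python
def shift_add_accumulate(pixel, weight_mag, s, weight_bits=4):
    """Return (PW + S, accumulator trace) using repeated add and shift.

    pixel       -- multiplicand p_i (non-negative integer)
    weight_mag  -- multiplier |w_i| (magnitude only)
    s           -- incoming value S entered in the accumulator first
    """
    acc = s                          # step 1: enter S
    trace = [acc]
    for bit in range(weight_bits):   # inspect multiplier bits, LSB first
        if (weight_mag >> bit) & 1:
            acc += pixel << bit      # add multiplicand at this significance
        # a 0 bit adds nothing; the multiplicand is only shifted
        trace.append(acc)
    return acc, trace


# Values taken from FIG. 2: p_i = 1010, w_i = 1101, S = 000100110
result, trace = shift_add_accumulate(0b1010, 0b1101, 0b000100110)
for step, acc in enumerate(trace, start=1):
    print(step, format(acc, "09b"))
```

Run with the FIG. 2 inputs, the five accumulator states printed are the five rows labeled steps 1 through 5 in the figure.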
REAL TIME PIPELINED SYSTEM FOR FORMING 
THE SUM OF PRODUCTS IN THE PROCESSING 
OF VIDEO DATA 
ORIGIN OF INVENTION 
The invention described herein was made in the per- 
formance of work under a NASA contract, and is sub- 
ject to the provisions of Public Law 96-517 (35 USC 
202) in which the Contractor has elected not to retain 
title. 
BACKGROUND OF THE INVENTION
This invention relates to a convolver, namely a system comprising an arithmetic unit for carrying out the operation defined by the following equation:

$$\sum_{i=1}^{k} p_i w_i$$

This equation is explained and illustrated for the case of k=9, i.e., for a 3-by-3 moving window (kernel) of video data and more specifically to a real time pipelined convolver for forming the sum of products p_i w_i, wherein i is a number from 1 to n^2 and n is a number that defines the size of an n-by-n kernel, p_i are the pixel values, typically 12-bit values of an n-by-n kernel, w_i are the convolver weights that may have positive (w_i>0) and negative (w_i<0) values. The pixel values and weights are represented by absolute value binary numbers and a sign bit.
In the processing of video data, it is necessary to produce the sum of products of fixed weights w_i times the corresponding pixel values p_i of successive rows in an n-by-n kernel, such as a 3-by-3 kernel. Examination of the typical weights involved in low-level vision applications indicates that small positive or negative integers are most common, with the ratio of the smallest to the largest weight being usually less than 20. Consequently, each weight contains six bits consisting of a sign bit and five bits for magnitude from zero to 31. The sign bit expands the range of weight values from -31 to +31.
To prevent the data path from overflowing, it is necessary to scale the output of the convolver. Scaling is accomplished by shifting down the data one or more bits, i.e., dividing by some power of two. This is most easily done by switches selecting the parallel output, i.e., by selecting the output from one or more bit positions to the right. The problem is in the requisite hardware for multiplying the pixel values p_i by positive and negative weights w_i and forming the sum of the products.
SUMMARY OF THE INVENTION
In accordance with the present invention, a convolver is comprised of N-1 buffers for storing N-1 rasters (scan lines) of pixel values for an n-by-n kernel. The necessary multiplications and additions are carried out in arithmetic operating units connected in cascade in accordance with the following equation:

$$\sum_{i=1}^{k} p_i w_i = \sum_{w_i>0} p_i |w_i| + \sum_{w_i<0} \bar{p}_i |w_i| + \sum_{w_i<0} |w_i|$$

where k=n x n, p_i are the pixel values, w_i are the convolver weights, and p̄_i are the one’s complement of pixel values which are either unsigned or two’s complement binary numbers, and weights are represented by absolute value binary numbers plus sign bits.
The number N-1 of complete scan lines that must be stored in order to cover successive n-by-n kernels, are stored in N-1 series connected buffers, each buffer storing a number n of pixels where n is the total number of pixels in a scan line. The accumulation of successive products p_i w_i is accomplished by an array of m one-bit full adders using the absolute values of the convolver weights, both positive and negative, and the one’s complement of the pixel value p_i when it is to be multiplied by a negative convolver weight w_i. The accumulated sums of p_i|w_i| and p̄_i|w_i| are corrected for the negative convolver weights by adding to the sum of products p_i|w_i| + p̄_i|w_i| the sum of |w_i| for all convolver weights less than zero, i.e., for negative convolver weights.
The novel features that are considered characteristic of this invention are set forth with particularity in the appended claims. The invention will best be understood from the following description when read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the present invention.
FIG. 2 illustrates an example of the algorithm for forming the sum of products PW+S in the system of FIG. 1.
DESCRIPTION OF PREFERRED EMBODIMENTS
Referring to the drawings, FIG. 1 is a schematic diagram of a 3-by-3 convolver which produces the sum of products of nine fixed weights w1 through w9 times the corresponding pixel values p1 through p9 in a 3-by-3 moving window of video data. Examination of the typical weights used in low-level vision algorithms indicates that small positive or negative integers are most commonly used, with the ratio of the smallest to the largest weight being usually less than 20. This means that a six-bit (including sign) weight is adequate, since this can represent integer values from -31 to +31.
To prevent the 12-bit data path from overflowing (because multiplying a 12-bit pixel by a 5-bit weight produces a result PW+S having 17 significant bits), it is also necessary to scale the output of the convolver in some variable way (since all large positive weights will produce a much larger result than a mix of small positive and negative weights). Scaling is most easily accomplished in hardware by shifting the data one or more bits, i.e., dividing by some power of two. If scaling were not included, adding nine of these 17-bit quantities PW+S together can produce a result having as many as 21 significant bits. Since only 12 of these bits can be output to maintain a constant data path throughout the pipeline, it would be excessive to compute the result accurately to 21 bits. However, somewhat more than 12 bits must be retained in intermediate stages of the convolver, since it is common to take derivatives of heavily smoothed data, which involves subtracting quantities which are nearly equal. To preserve 12 significant bits of result when subtracting quantities differing only by 10%, or so, requires a 16-bit internal data path. To ensure validity of the least significant bit of the output, an additional bit of low significance is also needed internally to the convolver. Thus, the convolver is provided with a 17-bit parallel data path between pipelined blocks for the results PW+S.
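The bit widths quoted above can be checked with a short calculation. The sketch below is a hedged illustration: the constant names are hypothetical, and the treatment of the "10% or so" argument (losing about log2(10) bits when nearly equal quantities are subtracted) is one assumed reading of the text rather than a derivation given in the patent.

```python
import math

PIXEL_BITS = 12        # pixel width stated in the text
WEIGHT_MAG_BITS = 5    # weight magnitude width (sign bit kept separately)
TAPS = 9               # 3-by-3 kernel

max_pixel = (1 << PIXEL_BITS) - 1          # 4095
max_weight = (1 << WEIGHT_MAG_BITS) - 1    # 31

# Widest single product and widest sum of nine such products
one_product_bits = (max_pixel * max_weight).bit_length()
nine_products_bits = (TAPS * max_pixel * max_weight).bit_length()

# Assumed reading of the 10% argument: a difference of one tenth of the
# operands loses about log2(10) bits, which are added to the 12 bits kept.
internal_bits = math.ceil(PIXEL_BITS + math.log2(10))
data_path_bits = internal_bits + 1         # one extra low-order bit

print(one_product_bits, nine_products_bits, internal_bits, data_path_bits)
```

Under these assumptions the four printed values agree with the 17, 21, 16 and 17 bits named in the paragraph above.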
Scaling of the 17-bit result is accomplished in two stages: the input pixel values to the convolver may be shifted down in significance, allowing more room for the carry overflow when large positive weights are used, and the 12-bit output actually taken from the 17-bit convolver data output may be shifted up to allow for cancellation when subtracting nearly equal quantities. As discussed above, adding the products of nine 12-by-5 multiplications can produce a 21-bit result. However, having all nine weights near the maximum value of 31 is very unlikely, since they could all be divided by two and the result scaled to produce nearly identical results. Thus, it is reasonable to assume that the sum of the nine weights can always be kept to somewhat under the maximum 279 (9-by-31). If the sum of the weights is kept under 256 (9% less than 279), overflow into the 21st bit can be avoided. This means that shifting the input pixel down in significance by up to three bits permits the 17-bit data path to accommodate the most significant bits of the 20-bit result. As a consequence, using the convolver of FIG. 1 calls for a programmable shift of from zero to three bits in the input data using external data input select switches as a scaler A. The shift of the output 12-bit data path with respect to the internal 17-bit data path is similarly programmable from zero to three using a scaler B, so that when subtracting nearly equal quantities more significant bits are preserved.
The arithmetic units 10 are implemented with custom VLSI circuit chips. The two complete scan lines that must be stored in order to cover the 3-by-3 window are stored in conventional line buffers (memory chips) comprised of N-3 pixel delays 11 and 12, each with three pixel delay elements in cascade for a total of N pixel delays, and three pixel delay elements which precede the N-3 pixel delay 11 to store the pixels p1 through p9 in a moving window array as follows:

p3 p2 p1
p6 p5 p4
p9 p8 p7

It is highly desirable to utilize VLSI technology for implementation of the convolver (except for the line buffers), including registers 13, 14 and 15 for the three sets of weights w1-w3, w4-w6 and w7-w9. Note that the weights are represented by five bits of (magnitude) plus a sign bit. The sign bits are stored in registers 13a which are in reality part of registers 13, 14 and 15, but shown here separate because they are so separated by the convolver algorithm. Custom VLSI implementation allows the 12-by-5 bit arithmetic units 10 to be implemented directly within an internal 17-bit data path. Moreover, a custom VLSI circuit is easiest to design when the circuit to be implemented is a regular, repeated structure, as in this case of pipelined arithmetic units 10.
Accumulation of successive multiplications can be accomplished most straightforwardly by a multiplication algorithm of repeated add and shift operations, as illustrated in FIG. 2 using a simple example of a four-bit pixel as the multiplicand 1010 and a four-bit weight as the multiplier 1101. A nine-bit quantity S equal to 000100110, the sum of absolute values for all negative weights (w_i<0) or the result PW+S from the previous arithmetic unit, is assumed as an input to an arithmetic unit 10. The incoming S is entered in the accumulator of the arithmetic unit as a first step. Then the least significant bit of the multiplier is inspected. Because it is a bit 1, the next step is to add the multiplicand in the four least significant bit positions of the accumulator. In the third step, the next least significant bit of the multiplier is inspected. Because it is a bit 0, zero is effectively added by doing nothing except effectively shifting the multiplicand before going into step number 4. Then the third multiplier bit is inspected. Because it is a bit 1, the multiplicand is effectively shifted, and added to the content of the accumulator. The process is repeated in step 5 for the last bit of the multiplier which, upon inspection after a shift, causes the multiplicand to be added to the partial products in the accumulator. The final product PW+S appears in the accumulator after the last step.
This algorithm for the arithmetic units 10 can be implemented for a 17-bit pixel and 5-bit weight (absolute value) with a 17-by-5 array of one-bit full adder circuits. (Note that in the preferred implementation illustrated in the simple example with reference to FIG. 2, the multiplicand is shifted only in the sense of it being added in a position of one binary bit higher significance in each step using selecting gates to the inputs of the adder, rather than actually shifting the multiplier in the registers 13, 14 and 15.) Thus, nine 17-by-5 arrays, for a total of 765 full adders, are needed for the convolver. In the preferred embodiment, seventeen-bit latches are used at the output of each of the arithmetic units 10 to store the intermediate accumulated results PW+S. This allows each multiplication to take the full pixel scan time. Appropriate reduction in the line buffer delay N corrects for these delays. The multiplication steps needed for each pixel are synchronously performed during pixel scan time.
Signed arithmetic is accomplished by the simple expedient of complementing the pixel value via exclusive OR operators G1 through G9 (each consisting of a bank of 17 exclusive OR gates connected to a one pixel delay element D for a 3 x 3 array of data) prior to multiplications by a negative weight. Since the one’s complement of the pixel value plus one would produce the negative of that value in two’s complement representation, the product of a negative weight and the pixel value is equal to the product of the absolute value of the weight (five bits without a sign bit), and the complement of the pixel value plus one. This is accomplished in the system of FIG. 1 by adding the absolute values of all negative weights together at the outset (since they are fixed at the time of programming), and storing that value

$$\sum_{w_i<0} |w_i|$$

in a register 16.
The convolver thus organized performs the function of the following equation:

$$\sum_{i=1}^{9} p_i w_i = \sum_{w_i>0} p_i w_i + \sum_{w_i<0} (-p_i)\,|w_i| \tag{1}$$

Using two’s complement arithmetic (i.e., -p_i = p̄_i + 1), this function becomes:
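
$$\sum_{i=1}^{9} p_i w_i = \sum_{w_i>0} p_i w_i + \sum_{w_i<0} (\bar{p}_i + 1)\,|w_i| \tag{2}$$

However, instead of forming p̄_i + 1, the distributive law is applied yielding the following equation:

$$\sum_{i=1}^{9} p_i w_i = \sum_{w_i>0} p_i |w_i| + \sum_{w_i<0} \bar{p}_i |w_i| + \sum_{w_i<0} |w_i| \tag{3}$$

where p_i are the pixel values, w_i are the convolver weights, and p̄_i are the one’s complement of the pixel values. In this way a uniform VLSI architecture can handle both positive and negative convolver weights.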
Note that the input pixel data to the convolver may be 
positive or negative, and that the final output of the 
convolver may also be positive or negative. In a partic- 
ular unit, all bits of a pixel value are complemented 
when it is to be multiplied by a negative weight ex- 
pressed in the form of absolute binary value plus a sign 
bit. Consequently, the multiplicand will have all 1’s left 
of the most significant bit to fill the 17-bit word for the 
scaled pixel weight (if it was initially positive, or was 
negative with a positive weight), and so the product 
PW will be negative for a negative weight, but the 
output PW+S may be positive or negative depending 
on the value of S and whether it is positive or negative. 
In that regard, it should be noted that the correction value

$$\sum_{w_i<0} |w_i|$$

added initially is always a positive number because it is the sum of the absolute value of all negative weights (five bits absolute without the sign bit).
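The signed-weight scheme described above, with the correction value preloaded and each pixel complemented when its weight is negative, can be modeled in Python as a rough check of the arithmetic. This is a behavioral sketch under stated assumptions, not the patented circuit: the names `WORD_BITS`, `to_signed` and `convolve_window` are hypothetical, the pixel and weight values are arbitrary test data, and the 17-bit path is modeled simply by masking.

```python
WORD_BITS = 17                      # internal data path width from the text
MASK = (1 << WORD_BITS) - 1


def to_signed(word):
    """Interpret a 17-bit word as a two's complement integer."""
    if word & (1 << (WORD_BITS - 1)):
        return word - (1 << WORD_BITS)
    return word


def convolve_window(pixels, weights):
    """Sum of products for one window using only weight magnitudes."""
    # Correction value: sum of |w_i| over negative weights, entered first
    acc = sum(-w for w in weights if w < 0) & MASK
    for p, w in zip(pixels, weights):
        word = p & MASK             # sign-extended two's complement pixel
        if w < 0:
            word ^= MASK            # XOR bank: one's complement of the pixel
        acc = (acc + word * abs(w)) & MASK   # one cascaded unit: PW + S
    return to_signed(acc)


# Arbitrary example data (signed 12-bit pixels, small signed weights)
pixels = [100, -7, 2047, 0, 55, -2048, 12, 300, -1]
weights = [-1, 0, 1, -2, 0, 2, -1, 0, 1]
direct = sum(p * w for p, w in zip(pixels, weights))
print(convolve_window(pixels, weights), direct)
```

The two printed values agree whenever the true sum fits in the 17-bit word, which is the situation the input scaler is meant to guarantee.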
Although particular embodiments of the invention 
have been described and illustrated herein, it is recog- 
nized that modifications and variations may readily 
occur to those skilled in the art. Consequently, it is 
intended that the claims be interpreted to cover such 
modifications and variations. 
What is claimed is: 
1. A system utilizing n x n binary arithmetic units
connected in cascade for multiplying binary pixel
values pi by binary weights Wi, where the pixel values 
are either positive or two’s complement and the weights 
are expressed by an absolute value binary number and 
sign bit, comprising 
means for storing an array of n-by-n pixel values of 
raster scanned video data in a moving window of n 
successive pixels from n successive rasters, 
means for coupling said array of n-by-n pixel values to said n x n binary arithmetic units for multiplication of positive binary pixel values by corresponding positive weights and multiplication of the one’s complement of binary pixel values by corresponding negative weights, and
means for adding to the total sum produced by said 
arithmetic units connected in cascade the sum of 
the absolute value of all negative weights. 
2. A system comprised 
of a number k of pipelined arithmetic operating units 
and N-1 buffers connected in series for storing 
values of two rasters of pixels, plus means for stor- 
ing the values of n pixels in series ahead of said 
buffers, while convolving n-by-n kernels in succes- 
sion, where k equals n times n, and where the pixel 
values are either positive or two’s complement 
binary numbers and the weights are positive or 
negative numbers expressed as a binary number of 
magnitude and a sign bit, registers for storing k 
convolver weights, w1 through wk, 
means responsive to a sign bit of negative weights for 
forming the one’s complement of the pixel values 
to be multiplied by negative weights, and 
means for forming the sum of products in each oper- 
ating unit in accordance with the equation PW + S, 
where the product PW is formed by repeated shift 
and add of pixel value P under control of a weight 
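value W, where S is a previous sum of products PW formed in accordance with the equation

$$\sum_{i=1}^{k} p_i w_i = \sum_{w_i>0} p_i |w_i| + \sum_{w_i<0} \bar{p}_i |w_i| + \sum_{w_i<0} |w_i|$$

where p_i are the pixel values and p̄_i are the one’s complement of pixel values, and where S for the first arithmetic unit forming the sum of products PW+S is the term

$$\sum_{w_i<0} |w_i|$$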
3. A system as defined in claim 2 comprising 
registers for storing the weights, including sign bits, 
and 
an additional register for storing the sum of the absolute value of all negative weights formed at the outset as said term

$$\sum_{w_i<0} |w_i|$$

to be added to the convolver output as a correction
for an error produced in multiplying pixel values 
by negative weights by addition of the one’s com- 
plement of the negative weights. 
4. A system as defined in claim 3 where n equals 3 for
convolving 3 by 3 kernels.
5. A system for forming the sum of products in the processing of video data comprising
nine binary arithmetic units connected in cascade for multiplying binary pixel values p_i which are positive or two’s complement binary numbers by weights w_i which are positive or two’s complement binary numbers,
means for storing said weights, including the sign bits of said weights,
means for forming the one’s complement of a pixel value p_i to be multiplied by a negative weight w_i under control of the sign bit of the negative weight w_i, and
means for adding as a correction the sum of the absolute values of all the negative weights for each 3 x 3 kernel.
6. A system as defined in claim 5 wherein said correction sum is precomputed, and
including means for storing said precomputed correction sum as a value to be added to the product of the first arithmetic unit.
7. A system comprised of
N-1 buffers for storing N-1 rasters of pixel values for an n-by-n kernel,
means for carrying out necessary multiplications and additions in arithmetic operating units connected in cascade in accordance with the following equation:

$$\sum_{i=1}^{k} p_i w_i = \sum_{w_i>0} p_i |w_i| + \sum_{w_i<0} \bar{p}_i |w_i| + \sum_{w_i<0} |w_i|$$

where k=n x n, p_i are the pixel values, w_i are the system weights, and p̄_i are the one’s complement of pixel values which are either unsigned or two’s complement binary numbers, and weights are represented by absolute value binary numbers plus sign bits,
means for storing the number N-1 of complete rasters that must be processed in order to cover successive n-by-n kernels consisting of N-1 series connected buffers, each buffer storing a number n of pixels where n is the total number of pixels in a raster,
means for accumulating successive products p_i w_i consisting of an array of m one-bit full adders using the absolute values of the system weights, both positive and negative, and the one’s complement of the pixel value p_i when it is to be multiplied by a negative weight w_i, and
means for correcting the accumulated sums of p_i|w_i| and p̄_i|w_i| for negative weights by adding to the sum of products p_i|w_i| + p̄_i|w_i| the sum of the absolute value w_i for all negative weights.
8. A method for forming the sum of products in the processing of video data in a kernel of n x n pixels by multiplying binary pixel values p_i by binary weights w_i, where the pixel values are either positive or two’s complement and the weights are expressed by an absolute value binary number and sign bit, comprising the steps of
storing the array of n-by-n pixel values of raster scanned video data in a moving window of n successive pixels from n successive rasters,
multiplying positive binary pixel values by corresponding positive weights and multiplying the one’s complement of binary pixel values by corresponding negative weights,
forming the sum of products produced by said multiplying steps and
adding to the sum of products the sum of the absolute value of all negative weights.
9. A method for forming the sum of products in the processing of video data in a kernel of 3 x 3 pixels by multiplying binary pixel values p_i which are either positive or two’s complement binary numbers by weights w_i which are positive or two’s complement, comprising the steps of
storing said weights including the sign bits of said weights,
forming the one’s complement of a pixel value p_i to be multiplied by a negative weight w_i, and
adding as a correction the sum of the absolute values of all the negative weights for each 3 x 3 kernel.
10. A method for forming the sum of products in the processing of video data in a 3-by-3 kernel as defined in claim 9, wherein said correction sum is precomputed, and
including the step of storing said precomputed correction sum as a value to be added to said sum of products.
* * * * *
