Method And System For Generating And Implementing Orientational Filters For Real-time Computer Vision Applications by Kubota, Toshiro & Alford, Cecil O.
United States Patent [19]
Kubota et al. 
[54] METHOD AND SYSTEM FOR GENERATING 
AND IMPLEMENTING ORIENTATIONAL 
FILTERS FOR REAL-TIME COMPUTER 
VISION APPLICATIONS 
[75] Inventors: Toshiro Kubota, Columbia, S.C.; Cecil
O. Alford, Lawrenceville, Ga.
[73] Assignee: Georgia Tech Research Corporation, 
Atlanta, Ga. 
[21] Appl. No.: 08/801,602 
[22] Filed: Feb. 13, 1997 
Related U.S. Application Data 
[60] Provisional application No. 60/011,813, Feb. 16, 1996,
provisional application No. 60/011,814, Feb. 16, 1996, and 
provisional application No. 60/011,815, Feb. 16, 1996. 
[51] Int. Cl.6 ...................................................... G06F 17/00 
[52] U.S. Cl. ............................................. 708/313; 708/308 
[58] Field of Search .......................... 364/724.1, 724.011,
364/724.17, 724.19, 572; 375/346; 382/275,
266; 340/706; 708/313, 308
[56] References Cited
U.S. PATENT DOCUMENTS
4,891,630 1/1990 Friedman et al. ...................... 340/706
5,519,793 5/1996 Grammes ................................ 382/266
5,526,446 6/1996 Andelson et al. ...................... 382/275
OTHER PUBLICATIONS
"Computation of Orientational Filters for Real-Time Computer Vision Problems I: Implementation and Methodology," Kubota, Toshiro and Alford, Cecil O., Real-Time Imaging, vol. 1, 1995, pp. 261-281.
"Computation of Orientational Filters for Real-Time Computer Vision Problems II: Multi-resolution Image Decomposition," Kubota, Toshiro and Alford, Cecil O., Real-Time Imaging, vol. 2, 1996, pp. 91-116.
"Computation of Orientational Filters for Real-Time Computer Vision Problems III: Steerable System and VLSI Architecture," Kubota, Toshiro and Alford, Cecil O., Real-Time Imaging, vol. 3, 1997, pp. 37-58.
US006009447A
[11] Patent Number: 6,009,447
[45] Date of Patent: Dec. 28, 1999
"Design and Implementation of Orientational Filters Using Separable Approximation Methods," Kubota, Toshiro and Alford, Cecil O. and Woods, Michael B., Proc. IEEE/SUC Int. Conf. pp. 333-338.
Discrete Fourier Analysis of Multidimensional Signal, Multidimensional Digital Signal Processing by Dan E. Dudgeon and Russell M. Mersereau, Prentice-Hall, pp. 76-81 (1984).
From Images to Surfaces, Vision by David Marr, W.H. 
Freeman pp. 174-183 (1982). 
"Survey of Texture Mapping", Heckbert, Paul S., IEEE 
CG&A, Nov. 1986, pp. 231-332. 
(List continued on next page.) 
Primary Examiner-Joseph E. Palys 
Assistant Examiner-Nguyen Xuan Nguyen 
Attorney, Agent, or Firm-Harness, Dickey & Pierce, P.L.C. 
[57] ABSTRACT 
Methods for generating and implementing digital orienta-
tional filters. A first method of the present invention is 
provided for generating digital orientational filters of uni-
form size but each having a different fixed orientation. A 
second method provides for dilation of the digital orientation 
filters generated by the first method, both in a decimated and 
an undecimated format. A third method of the present 
invention provides for steering the orientation of the filters 
generated by the first method. Also, associated VLSI hard-
ware based systems implementing the above methods are 
disclosed. The above methods and systems allow digital 
orientational filters to be utilized in computer vision and 
other applications requiring a large amount of video signal 
data to be processed in real-time. 
10 Claims, 21 Drawing Sheets 
OTHER PUBLICATIONS (continued)
"Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications, A Tutorial," Vaidyanathan, P.P., Proceedings of the IEEE, vol. 78, No. 1, Jan. 1990, p. 56.
"Wavelets and Signal Processing," Rioul, Olivier and Vetterli, Martin, IEEE SP Magazine, Oct. 1991, pp. 14-15 and 36-38.
Orthonormal Bases of Wavelet, Orthonormal Bases of Compactly Supported Wavelets by Ingrid Daubechies, Communication in Pure and Applied Mathematics, vol. 41, No. 7, pp. 914-923 (1988).
"Determining Three-Dimensional Shape from Orientation 
and Spatial Frequency Disparities," Jones, David G. and 
Malik, Jitendra, pp. 661-669. 
"Adaptive enhancement based on a visual model," Peli, Eli, 
Optical Engineering, vol. 26, No. 7, Jul. 1987, pp. 655-660. 
"Image Representation Using Block Pattern Models and Its 
Image Processing Applications," Wang, Yao and Mitra, 
Sanjit K., IEEE Transactions on Pattern Analysis and 
Machine Intelligence, vol. 15, No. 4, Apr. 1993, pp. 
321-336.
"Application of Multiresolution Spatial Filters to Long-Axis Tracking," Abramson, S.B. and Fay, F.S., IEEE Transactions on Medical Imaging, vol. 9, No. 2, Jun. 1990, pp. 151-158.
"Parallel Processing for Computer Vision and Image Understanding," Choudhary, Alok and Ranka, Sanjay, Computer, Feb. 1992, pp. 7-10.
"A Real-Time Algorithm for Signal Analysis with the Help 
of the Wavelet Transform," Holschneider, M., Kronland-
Martinet, R., Morlet, J. and Tchamitchian, Ph., pp.287-297. 
"An Implementation of the 'algorithme a trous' to Compute 
the Wavelet Transform," Dutilleux, P., pp. 298-304.
"The Design and Use of Steerable Filters," Freeman, William T. and Adelson, Edward H., IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, No. 9, Sep. 1991, pp. 891-906.
"X-Y separable pyramid steerable scalable kernels," Shy, Douglas and Perona, Pietro, IEEE, 1994, pp. 237-245.
"Finite Representation of Deformable Kernels," Perona, 
Pietro, pp. 8-17. 
"A New Parallel 2-D FFT Architecture," Pytosh, Teresa M., 
Magnani, Alberto M., IEEE, 1990, pp. 905-908. 
"Parallel Visual Pathways: A Review," Lennie, Peter, Vision 
Research, vol. 20, pp. 561-594.
"Fast Algorithms for Discrete and Continuous Wavelet Transforms," Rioul, Olivier and Duhamel, Pierre, IEEE Transactions on Information Theory, vol. 38, No. 2, Mar. 1992, pp. 569-586.
"Parallel Computer Architectures for Image Processing," Reeves, Anthony P., Computer Vision, Graphics and Image Processing, vol. 25, 1984, pp. 68-88.
"A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," Mallat, Stephane G., IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, No. 7, Jul. 1989, pp. 674-693.
"The Discrete Wavelet Transform: Wedding the A Trous and 
Mallat Algorithms," Shensa, Mark, J ., IEEE Transactions on 
Signal Processing, vol. 40, No. 10, Oct. 1992, pp. 
2464-2482. 
"Multichannel Texture Analysis Using Localized Spatial 
Filters," Bovik, Alan, C., Clark, Marianna, and Geisler, 
Wilson, S., IEEE Transactions on Pattern Analysis and 
Machine Intelligence, vol. 12, No. 1, Jan. 1990, pp. 55-73. 
"A Quasi Radix-16 FFT VLSI Processor," Bhatia, R., 
Furuta, M. and Ponce, J., IEEE, 1991, pp. 1085-1088. 
"Texture Analysis Based on A Human Visual Model," Tan, 
T.N. and Constantinides, A.G., IEEE, 1990, pp. 2137-2140.
"The Design of Multistage Separable Planar Filters," Treitel, Sven and Shanks, John L., IEEE Transactions on Geoscience Electronics, vol. GE-9, No. 1, Jan. 1971, pp. 10-27.
"Unsupervised Texture Segmentation Using Gabor Filters," Jain, Anil K. and Farrokhnia, Farshid, IEEE, 1990, pp. 14-19.
"The Visual Cortex," Tusa, Ronald J., Am. J. EEG Technol. vol. 26, 1986, pp. 135-143.
"The Cortex Transform: Rapid Computation of Simulated 
Neural Images," Watson, Andrew B, Computer Vision, 
Graphics, and Image Processing, vol. 39, 1987, 
pp.311-327. 
Toshiro Kubota, Cecil O. Alford and Michael B. Woods, "Design and Implementation of Orientational Filters Using Separable Approximation Methods", Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 333-338, Oct. 1993.
Gilbert Strang and Vasily Strela, "Short Wavelets and Matrix Dilation Equations", IEEE Transactions on Signal Processing, vol. 43, No. 1, pp. 108-115, Jan. 1995.
William T. Freeman and Edward H. Adelson, "The Design and Use of Steerable Filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, No. 9, pp. 891-906, Sep. 1991.
Douglas Shy and Pietro Perona, "X-Y separable pyramid 
steerable scalable kernels", IEEE Computer Vision and 
Pattern Recognition, pp. 237-244, Jan. 1994. 
D. Raghuramireddy and R. Unbehauen, "A New Realization 
for Microprocessor Implementation of 2-D Denomina-
tor-Separable Digital Filters for Real-Time Processing", 
IEEE Transactions on Signal Processing, pp. 2349-2353, 
Sep. 1992. 
Cormac Herley and Martin Vetterli, "Orthogonal Time-Varying Filter Banks and Wavelets", IEEE Circuit and Systems, pp. 391-394, May 1993.
Cormac Herley and Martin Vetterli, "Orthogonal Time-Varying Filter Banks and Wavelet Packets", IEEE Transactions on Signal Processing, vol. 42, No. 10, pp. 2650-2663, Oct. 1994.
[Sheet 1 of 21: FIG. 1. Block diagram with blocks labeled INPUT (12), MEMORY, PROCESSOR (16), VLSI (24), OUTPUT and MONITOR (20), plus reference numeral 18.]
[Sheet 2 of 21: FIGS. 3 and 4. Recoverable labels: vertical filters 1 to P, horizontal filters 1 to P, intermediate memory.]
[Sheet 3 of 21: FIGS. 5 and 6A. FIG. 5 labels: intermediate memory, vertical filters 1 to P, horizontal filters 1 to P. FIG. 6A flow chart: inputs are the filter type (Gabor, directional Gaussian, Gaussian derivative, etc.), the number of filters (FN), the filter size (M), the filter parameters and the approximation order (P); steps are filter coefficients computation, approximation matrix generation and singular value decomposition; outputs are P vertical filters of size M and FN x P horizontal filters of size M.]
[Sheet 4 of 21: FIG. 6B. Schematic of the SVD step with blocks labeled U and V.]
[Sheet 5 of 21: FIGS. 7A-7H. Plots in four rows: (a) spline order = 1, (b) spline order = 3, (c) spline order = 5, (d) spline order = 7.]
[Sheet 6 of 21: FIG. 8. Computational structure of the separable wavelet approximation: (a) overall system with pre-filter, low-pass banks and high-pass banks (signals I[n,m], c0, c1, c2, d1); (b) pre-filter bank; (c) low-pass filter bank; (d) high-pass filter bank. Note on the figure: the portion enclosed by dashed boxes can be shared with other orientational filters.]
[Sheet 7 of 21: FIGS. 9 and 10. FIG. 9: reorder buffer (82) between the vertical filter output and the horizontal filter input, showing pixel order at level 2 and level 3; the numbers indicated are the pixel indices. FIG. 10: undecimated separable-wavelet approximation (Scheme II) with multiple orientational filters, 1st and 2nd levels. Legend: RAG = Reorder Buffer Address Generator, VFM = Vertical Filter Module, HFM = Horizontal Filter Module.]
[Sheet 8 of 21: FIG. 11. Flow chart (100): inputs are the filter type (Gabor, directional Gaussian, Gaussian derivative, etc.), the number of filters (FN), the filter size (M), the filter parameters and the approximation order (P); steps are filter coefficients computation, approximation matrix generation and singular value decomposition, giving P vertical filters and FN x P horizontal filters of size M; a spline approximation step with a basic spline order input then yields the interpolation filter, low-pass filter, high-pass vertical filter and high-pass horizontal filter.]
[Sheet 9 of 21: FIG. 12. Block diagram (130): input, vertical filters a1[y] to aP[y], horizontal filters b11[x] to bQP[x], interpolation units q1(θ) to qQ(θ), output.]
[Sheet 10 of 21: FIG. 13. Structure for FSA combined with SWA (132): (a) overall system; (b) pre-filter bank; (c) low-pass filter bank; (d) orthogonal filters; (e) projection filters k.]
[Sheet 11 of 21: FIG. 14. Structure of undecimated steerable MRD. Legend: RAG = Reorder Buffer Address Generator, VFM = Vertical Filter Module, HFM = Horizontal Filter Module, qk(θ) = Interpolation Unit. Notes on the figure: the identical MRD module can be cascaded for further MRD; a distinct set of interpolation units is needed for each orientation tuning.]
[Sheet 12 of 21: FIGS. 15 and 16. FIG. 15 flow chart: inputs are the filter type (Gabor, directional Gaussian, Gaussian derivative, etc.), filter size (M) and filter parameters; a Fourier series expansion of the filter with approximation order (Q) gives Q basic filters; approximation matrix generation and singular value decomposition with approximation order (P) give P vertical filters and Q x P horizontal filters of size M; a spline approximation step with a basic spline order input yields the interpolation filter, low-pass filter, high-pass vertical filter, high-pass horizontal filter and interpolation coefficients. FIG. 16 symbol key: 2-port adder with data width w; w1 by w2 multiplier producing an output of width w1 + w2; 2-port adder followed by D flip-flops with data width w; global bus.]
[Sheet 13 of 21: FIG. 17. VLSI architecture of SV/OSD: host/local controller, memories mem1 to memM, a vertical filter chip (VFC), horizontal filter chips (HFC), and outputs Filter 1 to Filter FN.]
[Sheet 14 of 21: FIG. 18. Vertical filter chip (184): register file with coeff and addr inputs, multipliers, and Filter 1 to Filter NFV.]
[Sheet 15 of 21: FIG. 19. Horizontal filter chip: coeff and addr inputs, binary adder, partial sum (psum), and Filter 1 to Filter NFH with outputs hout1 to houtNFH.]
[Sheet 16 of 21: FIG. 20. VLSI architecture of SV/OSD with M > WFV and M > WFH. Note on the figure: coeff, host_addr and chip_id signals from the host are routed to every VFC and HFC although the figure does not show the connections to some of the chips.]
[Sheet 17 of 21: FIG. 21. 2D separable filter chip for the pre-filter and low-pass filter in SWA (192).]
[Sheet 18 of 21: FIG. 22. VLSI architecture for decimated SWA: host/local controller, SFC (pre-filter), SFC (low-pass filter), and VFC and HFC chips (high-pass filter) producing the Level 1 outputs Filter 1 to Filter FN.]
[Sheet 19 of 21: FIG. 23. VLSI architecture of FSA + SV/OSD (182).]
[Sheet 20 of 21: FIG. 24. VLSI architecture for decimated FSA + SWA, 1st level decomposition: SFC (pre-filter), VFC and HFC (high-pass filter), and VFCs forming the interpolation unit. Note on the figure: only the first level decomposition is shown; the subsequent decompositions are implemented in exactly the same way as the first level.]
[Sheet 21 of 21: FIG. 25. VLSI architecture for undecimated SWA: host/local controller, SFC (pre-filter), SFC (low-pass filter), reorder buffers (RB) with rbwr and rbaddr control signals, and VFC and HFC chips (high-pass filter) producing the Level 1 outputs Filter 1 to Filter FN.]
METHOD AND SYSTEM FOR GENERATING 
AND IMPLEMENTING ORIENTATIONAL 
FILTERS FOR REAL-TIME COMPUTER 
VISION APPLICATIONS 
This application claims benefit of provisional application 60/011813, filed Feb. 16, 1996, provisional application 60/011814, filed Feb. 16, 1996, and provisional application 60/011815, filed Feb. 16, 1996.

TECHNICAL FIELD

The present invention relates generally to digital filters, and particularly to digital filter generation methods and systems for computer vision processing that enable vision data to be processed in real-time with minimal associated hardware costs.

DISCUSSION

In the last two decades, an immense amount of research has been conducted on how to construct a machine system which is capable of seeing and understanding a visual scene as well as humans. This problem is referred to as the computer vision problem. The problem has two major sub-problems: (1) identifying some particular object in a scene, or (2) understanding and describing all the objects in a scene. The computer vision problem may be categorized with computationally intensive problems, such as global weather modeling, fluid turbulence and molecular dynamics. The computational requirement for a vision system can be estimated by assuming that 1024x1024 pixels will be processed at a rate of 30 frames/sec. Such a system must process about 30 million data elements (pixels) per second. If one thousand operations have to be performed on each pixel to fulfill the goal of understanding the scene, 30 billion operations per second will be required. For most systems, the number of operations per pixel will be higher by a factor of 10 to 100, leading to an estimated requirement of computational power between 100 and 1,000 billion operations per second. To build such a system requires parallel processing and special hardware.

Another difficulty in computer vision is that the problem is not well-posed from a computational perspective. For example, a system must detect a chair from image sequences which represent dynamic views inside a room. A simple pattern matching method which compares an image with a small template representing the chair does not work, since the chair can be any size and at any orientation relative to the image frame. The chair can also be partially occluded by other objects, and obscured by lighting and signal noise. Thus, input images representing the scene must be processed in such a way that the result produces a data representation of each object which is minimally dependent on an affine transformation, partial occlusion and noise. This representation is often expressed as a collection of characteristics called "features". Edges, corners and texture have been suggested as good features for image analysis.

A reasonable approach to the computer vision problem is to apply the human visual processing mechanism to computer vision methods. This interaction of neurophysiological research and computer engineering not only produces better vision methods, but also helps understand the mechanism of the human visual system. However, only a small portion of human visual processing is understood. Thus, no consensus has been established on how to solve the vision problem.

In order to make the vision problem tractable, the problem is divided into three levels of processing: low, intermediate, and high. Low level processing takes pixel data as inputs and extracts primitive features such as edges, texture, depth map, and optical flow. This level of processing is mostly regular and data-independent, requiring numeric operations on huge amounts of pixel data. The intermediate level is dependent and regular, and takes the primitive features generated from low level processing and extracts more meaningful features such as surfaces and contours. The computations are often both symbolic and numeric. High level processing is highly data dependent and very diverse and interacts with the database of objects to determine types of objects in the image.

In low-level vision processing, two types of information have to be extracted from images simultaneously. One is a feature type which can be characterized well in the frequency domain. The second is the location of the feature in the spatial domain. Taking a Fourier transform of the image is not acceptable since the transform loses all the spatial information and the location of features cannot be determined. In order to extract the information from two domains, spatial-frequency analysis needs to be performed on images. Also, in order to extract features of various orientations, the spatial-frequency analysis must have directional selectivity. An operator which performs spatial-frequency analysis with directional selectivity is called an orientational filter. The operator can be performed in either the spatial domain or the frequency domain. Many such operators appear in computer vision research. Some examples are the windowed Fourier transform, Gabor filters and Gaussian derivatives.

Many filters tuned to different orientations are needed to detect features with various orientations. In order to detect features with various sizes, filters with different sizes are needed. Thus, a computer vision system must contain many filters tuned to different frequency regions in order to be flexible enough to solve different problems. Also, the implementation cost of the vision system depends heavily on the implementation cost of each filter. In order to minimize cost without losing flexibility, it is essential to be able to tune the filter to a particular frequency response dynamically from frame to frame, or even within a frame in real-time, to allow the system to be adapted to a particular problem and to changes of input images.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system in which digital orientational filters generated and implemented by the method and system of the present invention are utilized;
FIG. 2 is a block diagram illustrating the structure of a separable approximation scheme;
FIG. 3 illustrates the construction of an approximation matrix with multiple orientational filters;
FIG. 4 illustrates a first digital filter implementation scheme;
FIG. 5 illustrates a second digital filter implementation scheme;
FIG. 6A is a flow diagram of the methodology associated with the filter generation scheme of the present invention;
FIG. 6B is a block schematic diagram illustrating filters generated by the method of FIG. 6A;
FIGS. 7A-7H illustrate interpolation functions and corresponding filters generated through a multi-resolution decomposition method according to a preferred embodiment of the present invention;
FIG. 8 is a block diagram illustrating the computational structure of the separable wavelet approximation method for decimated multi-resolution decomposition (MRD) according to the present invention;
FIG. 9 illustrates a reorder buffer utilized for undecimated MRD;
FIG. 10 is a block diagram of the structure of the SV/OSD separable approximation method combined with FSA of the present invention;
FIG. 11 is a flow diagram of the methodology involved in generating MRD filter coefficients according to the present invention;
FIG. 12 is a block diagram of the structure of an SV/OSD system combined with an FSA system;
FIG. 13 is a block diagram of the structure for an FSA system combined with an SWA system;
FIG. 14 is a block diagram of the structure of an undecimated steerable MRD system;
FIG. 15 is a flow diagram of the methodology for generating steerable filter coefficients according to a preferred embodiment of the present invention;
FIG. 16 illustrates the hardware symbol representations for hardware used to implement a steerable MRD system of the present invention;
FIG. 17 is a block diagram of the VLSI architecture of an SV/OSD system of the present invention;
FIG. 18 is a block diagram of the structure of a vertical filter chip according to the present invention;
FIG. 19 is a block diagram of the structure of a horizontal filter chip according to the present invention;
FIG. 20 is a block diagram of the VLSI architecture of an SV/OSD implementation of the present invention;
FIG. 21 illustrates the VLSI architecture of a 2D separable filter chip of the present invention;
FIG. 22 illustrates the VLSI architecture for a decimated SWA system of the present invention;
FIG. 23 illustrates the VLSI architecture of FSA combined with an SV/OSD system of the present invention;
FIG. 24 illustrates the VLSI architecture for decimated FSA+SWA system of the present invention; and
FIG. 25 illustrates a structure for a FSA+SWA undecimated system of the present invention.

SUMMARY OF THE INVENTION

The present invention provides an inexpensive method and implementation scheme for orientational filters suitable for real-time computer vision applications requiring processing of pixels at high throughput and producing outputs with associated low latency.

According to a first preferred embodiment of the present invention, a method of generating a plurality of digital orientational filters is provided. The method includes the steps of providing a processor for processing filter data; inputting filter parameters into the processor; computing filter coefficients from the filter parameters; generating a digital filter approximation matrix based on the filter coefficients; and decomposing the approximation matrix into a plurality of separable digital filters to filter data in real time.

According to a second preferred embodiment of the present invention, the present invention provides a method of processing video data, and includes the steps of providing a processor for processing input pixel data; inputting filter parameters into the processor; computing filter coefficients from the filter parameters; generating a digital filter approximation matrix based on the computed coefficients; decomposing the approximation matrix into a plurality of separable filters to filter data in real time; generating a basic spline function; approximating each separable filter by the basic spline function, which produces a set of low-pass and high-pass filters for an efficient recursive multi-resolution decomposition; and outputting the filtered pixel data. The method also includes the steps of increasing a dilation factor of the orientational filters by a predetermined factor; and repeating the above steps for L levels of decomposition, where L equals a positive non-zero integer.

A third preferred embodiment of the present invention includes a method for generating a plurality of steerable digital orientational filters, and includes the steps of generating a two-dimensional mother wavelet function; decomposing the mother wavelet function into a set of basis filters; and decomposing the basis filters into separable filters to control orientation of each of the plurality of filters in response to input data characteristics.

In addition, the present invention provides VLSI based hardware schemes for implementing the above methods of the present invention in a hardware-minimizing and cost-effective manner.

Thus, it is an object of the present invention to provide a method for generating orientational filters for real-time data processing applications that exhibits high throughput, low latency, small computational complexity and small storage criteria.

It is also an object of the present invention to provide a multi-resolution decomposition (MRD) method to produce a hierarchical image representation suitable for a variety of image processing applications.

It is a further object of the present invention to provide a method of tuning orientational digital filters dynamically in real-time while still satisfying application performance criteria.

It is another object of the present invention to provide a VLSI-based component system that implements a plurality of digital orientational filters for real-time data processing in a cost-effective manner.
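The generation steps of the first embodiment (filter coefficients, approximation matrix, decomposition into separable filters) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented procedure: the Gabor kernel and its parameters (`wavelength`, `sigma`), the function names, and the choice to build the approximation matrix by placing the FN filters side by side are illustrative assumptions. The side-by-side layout is used here because a truncated singular value decomposition of it yields P shared vertical filters and FN x P horizontal filters of size M, which are the counts labeled in the flow chart of FIG. 6A.

```python
import numpy as np

def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0):
    """Real (cosine-phase) Gabor kernel of shape (size, size); parameters are illustrative."""
    half = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size] - half
    xr = x * np.cos(theta) + y * np.sin(theta)      # coordinate along the orientation
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)

def separable_bank(filters, order):
    """Approximate F_N filters (each M x M) by `order` shared vertical filters
    and F_N * order horizontal filters, using a truncated SVD."""
    m = filters[0].shape[0]
    # Assumed layout: filters placed side by side, giving an M x (M * F_N) matrix.
    approx_matrix = np.hstack(filters)
    u, s, vt = np.linalg.svd(approx_matrix, full_matrices=False)
    vertical = u[:, :order]                          # shape (M, P)
    # Scale the right singular vectors and cut each into F_N segments of length M.
    horizontal = (s[:order, None] * vt[:order]).reshape(order, len(filters), m)
    return vertical, horizontal                      # horizontal has shape (P, F_N, M)

def reconstruct(vertical, horizontal, k):
    """Rebuild filter k as a sum of P outer products (vertical x horizontal)."""
    return sum(np.outer(vertical[:, p], horizontal[p, k])
               for p in range(vertical.shape[1]))

if __name__ == "__main__":
    M, F_N, P = 9, 4, 3
    bank = [gabor_kernel(M, k * np.pi / F_N) for k in range(F_N)]
    vertical, horizontal = separable_bank(bank, P)
    worst = max(np.abs(reconstruct(vertical, horizontal, k) - bank[k]).max()
                for k in range(F_N))
    print(vertical.shape, horizontal.shape, worst)
```

With `order` equal to M the reconstruction is exact up to rounding; smaller orders trade accuracy for fewer one-dimensional filtering passes.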
DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 
Referring to FIG. 1, a computer system including digital filters generated and implemented for computer vision applications by the method and system of the present invention is shown generally at 10. The computer system 10 is of the type well known in the art used for vision based applications, such as a SUN Ultra SPARC station. In summary, the system includes a video signal input 12 and a memory 14 for storing video signal data input into the system through the input 12. The system also includes a processor 16 that is programmed to control video signal data processing in accordance with the present invention, along with a system power source 18 of the type well known in the art. The system also includes a video monitor 20, which is preferably a standard raster graphics monitor with 1,024x1,024 pixel resolution, and which receives the processed video data from the processor via output 22. The digital orientational filters generated by the method of the present invention may be programmed through a conventional programming language into the memory 14 and utilized by the processor 16. Alternatively, the filters may be implemented through VLSI-based hardware, as indicated in phantom at 24 and as described below in detail.
To address the present invention in a constructive manner, the following filter requirements are formulated in such a way that they are independent of technology and applications, and will be referred to throughout the following description as implementation criteria.
The assumptions below are made throughout the following description:
1) Input images consist of N×N pixels and are processed in a scan line order.
2) The number of filters in the system is F_N.
3) For simplicity, all the filters have the same size, M×M, where M << N.
4) The critical time, t_m, is the longest operation time between one multiply-accumulate operation and one memory write/read operation. These two operations govern the speed of a real-time filter system in current VLSI technology.
Based on the above assumptions, the implementation criteria can be defined as the following.
1. High Throughput
The system must be capable of handling an input rate of O(1/t_m) and producing outputs at the same rate. With current VLSI technology, a multiply-accumulate operation on 16-bit integers can be easily done in 30 nsec, and fast RAM can provide an access time in under 30 nsec. This implies a possible throughput of about 33 million pixels/sec in current technology.
2. Low Latency
Latency is defined as the time delay between the first input available and the output produced by this input. The requirement on system latency is to be bounded by O(t_m N M). This is the best that can be done when the input stream is arriving in a scan line order, since the system requires at least M rows of data in order to perform a filter operation of size M×M.
3. Small Computational Complexity
Computational complexity is defined as the order of the total amount of computation required to complete the filtering operation for the whole image. The requirement is that the computational complexity of the system be less than O(F_N N^2 M^2). A direct implementation of the filters has a complexity of O(F_N N^2 M^2). Small complexity implies that real-time performance can be achieved with less hardware.
4. Small Storage
The storage is the amount of memory in bytes required to implement the filters. A system which requires small storage is less expensive in terms of chip count, board size, and cost than a system which requires larger storage. Although commercial RAMs have become more dense and less expensive, smaller size memory chips are faster than larger size memory chips with the same technology. Thus by keeping the storage requirements smaller, a faster system can be designed with the same implementation cost. Storage requirement can easily grow large if every filter in a system requires its own storage since the number of filters in a system can be fairly large (6 different orientations, 7 different radial frequencies and 1 quadrature pair amounts to 6×7×2=84 filters). Also the storage requirement of a method to perform one orientational filter tends to measure the latency of the method. For example, suppose Method A requires O(N×N) bytes of memory to implement an orientational filter whereas Method B requires O(N) bytes of memory. It implies that Method A needs to wait for O(N×N) bytes of data before producing an output whereas Method B needs to wait for only O(N) bytes of data. Thus the latency of Method A is at least O(N×N) and the latency of Method B is at least O(N). Note that at least MN words of storage are necessary when the inputs are coming in raster order since at least M rows of data are needed to perform M×M filtering operations.
From the above, the storage requirement of the system is O(N), and is not dependent on F_N. As will be seen from the discussion below, the method and implementation scheme of the present invention satisfy all four implementation criteria.
Orientational Filters
Orientational filters (directional filters) are a class of filters which have a narrow angular bandwidth in the frequency domain and are used for spatial-frequency analysis. Their applicability has been demonstrated in texture analysis, texture segmentation, edge detection, contour following, shape analysis, stereo analysis, image coding, video coding, image restoration, image enhancement and motion detection. Their advantages in computer vision problems are: 1) the ability to be tuned to a certain radial frequency and orientation, and perform as a spatially localized frequency analyzer, 2) the ability to extract many image features easily, and 3) the resemblance to certain functions in the human visual system.
In computer vision problems, objects of interest vary in size and orientation relative to the image frame, and their size and orientation can change from frame to frame due to the movement of the camera or the movement of the objects. For this reason, orientational filters have to be tuned to a particular angular/radial frequency and a spatial/frequency resolution so that the analysis can be performed on multiple objects, and follow the dynamic motion of the objects.
For a computer vision system to be used in various applications, the orientational filters need to be tunable in the following four aspects: angular center frequency, radial center frequency, radial frequency bandwidth, and angular frequency bandwidth, matched to particular applications and situations.
A shift between images is caused by either a small time difference or a different viewing point. Applications such as motion analysis and stereo analysis need to find a match between certain features in multiple images and compute the distance between the matched features. However, if the transform does not preserve translation, the distance cannot be computed properly. Since motion analysis and stereo analysis form a core in image understanding problems, preserving image translation is essential.
The digital orientational filter generation and implemen-
tation methods and systems of the present invention meet the 
above-described implementation criteria, and are therefore 
suitable for real-time computer vision applications. The 
following description sets forth the details of the present 
invention as it relates to the following categories: I. Sepa-
rable Approximation; II. Multi-Resolution Image Decom-
position; III. Steerable System; and IV. System Hardware 
Design. 
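The implementation criteria described above lend themselves to a quick back-of-envelope check. The following is a minimal sketch, not part of the disclosed system: the function names are hypothetical, and the image size N=512 and filter size M=15 in the usage note are assumed values chosen only for illustration. The 30 nsec critical time and the 6 x 7 x 2 filter bank come from the text.

```python
# Back-of-envelope numbers for the four implementation criteria.
# All function names are illustrative; none come from the patent.

T_M_SEC = 30e-9  # critical time t_m quoted in the text (30 nsec)


def throughput_pixels_per_sec(t_m: float) -> float:
    """Criterion 1: one pixel in and one pixel out per critical time."""
    return 1.0 / t_m


def filter_count(orientations: int, radial_frequencies: int, quadrature: int = 2) -> int:
    """Number of filters F_N in a bank with quadrature pairs."""
    return orientations * radial_frequencies * quadrature


def latency_bound_sec(t_m: float, n: int, m: int) -> float:
    """Criterion 2: latency of order t_m * N * M (M rows of N pixels)."""
    return t_m * n * m


def direct_macs(f_n: int, n: int, m: int) -> int:
    """Criterion 3 baseline: direct filtering costs F_N * N^2 * M^2 macs."""
    return f_n * n * n * m * m


def min_storage_words(n: int, m: int) -> int:
    """Criterion 4: at least M rows of N pixels must be buffered."""
    return m * n
```

With the assumed N=512 and M=15, `latency_bound_sec(T_M_SEC, 512, 15)` is roughly 0.23 msec and `min_storage_words(512, 15)` is 7,680 words, while `filter_count(6, 7)` gives the 84 filters mentioned in the text.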
I. Separable Approximation 
It should be understood that the discrete orientational 
filters to be approximated originate from continuous orien-
tational functions, and the filters are derived from sampling 
the continuous functions using a rectangular sampling grid. 
Throughout the following description, h(x,y) denotes the
orientational function. It is sampled at {(x_m, y_n)} to form a
discrete orientational filter h[m,n]. The size of the discrete
filter is M×M as before.
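As a rough illustration of the sampling step, the sketch below samples a continuous function on a centered rectangular grid with unit spacing. The Gabor-like prototype, its parameters (`freq`, `sigma`), and both function names are assumptions introduced here for the example; the text does not prescribe a particular orientational function.

```python
import math


def gabor(x, y, theta, freq=0.25, sigma=2.0):
    """Hypothetical orientational function h(x, y): a Gaussian envelope
    times a cosine carrier oriented at angle theta (radians)."""
    xr = x * math.cos(theta) + y * math.sin(theta)
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * freq * xr)


def sample_filter(h, m_size, **params):
    """Sample h on an M x M rectangular grid {(x_m, y_n)} centered at the
    origin with unit spacing, returning h[m][n] as nested lists."""
    half = (m_size - 1) / 2.0
    coords = [i - half for i in range(m_size)]
    return [[h(x_m, y_n, **params) for y_n in coords] for x_m in coords]
```

For example, `sample_filter(gabor, 9, theta=math.pi / 6)` yields a 9 x 9 discrete filter whose center tap is h(0, 0).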
In the separable approximation method, an orientational filter h(x, y) is approximated by a linear sum of separable filters as
$$h(x, y) \approx \sum_{i=1}^{P} a_i(x)\, b_i(y).$$
In this form, the orientational filter can be implemented by P banks of separable filters operating in parallel followed by accumulators as shown at 30 in FIG. 2. The separable approximation method enables fast operation of the filter and reduces the hardware complexity significantly. The separable approximation method of the present invention is a synthesis of two known approximation methods: Singular Value Decomposition (SVD) and Orthogonal Sequence Decomposition (OSD), and will be referred to throughout as Singular Value/Orthogonal Sequence Decomposition (SV/OSD).
SV/OSD
The SVD method has an advantage in its approximation performance and a disadvantage in its implementation, while the reverse is true for OSD. Hence there are two approaches to improve the separable approximation method. The first one is to improve the implementation of SVD by adapting the implementation structure of OSD. The other is to improve the performance of OSD by obtaining the set of orthogonal sequences which provides the best fit to a given set of orientational filters to be approximated. It will be shown that these two approaches produce the separable approximation method of the present invention, which will now be described.
A set of orientational filters {h_k(x,y)} (0 ≤ k < F_N) is approximated, where each filter is represented as an FIR filter of the size M×M. Typically, {h_k(x,y)} is generated from the same prototype filter with different orientations. The orientational filters are combined to form an M×MF_N matrix A as shown at 32 in FIG. 3. The matrix A is called an approximation matrix. A set of M adjacent columns from column kM through column (k+1)M-1 constitutes the orientational filter h_k. SVD is performed on the approximation matrix to produce r_A vectors of length M, r_A vectors of the length MF_N, and the corresponding eigenvalues. The first set of vectors are denoted u_i (0 ≤ i < r_A), the second set of vectors are denoted v_i (0 ≤ i < r_A), and the square roots of the eigenvalues are denoted as w_ii. Then h_k can be expressed by the separable form,
$$h_k[m, n] = \sum_{i=1}^{r_A} w_{ii}\, u_i[m]\, v_i[kM + n].$$
Assume the weights, w_ii, are sorted by magnitude in descending order, and the corresponding vectors, u_i and v_i, are shuffled in correspondence with w_ii so that the decomposition h_k[m,n] still holds. Then the P-th order separable approximation is:
$$h_k[m, n] \approx \sum_{i=1}^{P} w_{ii}\, u_i[m]\, v_i[kM + n].$$
Since the u_i are orthogonal vectors, and are common to all the orientational filters to be approximated, it is an OSD, although the vectors are generated from SVD. Hence this separable approximation method accomplishes the advantages of both SVD and OSD; it has the good approximation performance of SVD and the simple implementation of OSD.
The error E_A introduced by the approximation is measured as a sum of approximation errors in a least squares sense,
$$E_A = \sum_{i=P+1}^{r_A} w_{ii}^2.$$
In this sense, u_i are the optimum orthogonal sequences for approximating {h_k}. Since this algorithm is a combination of SVD and OSD, it is denoted as SV/OSD.
Convergence of Approximation
A normalized energy error for an approximation order P is at most (M-P)/M for SVD and SV/OSD. A normalized energy error for an approximation order P is
$$E_P^2 = \frac{\sum_{i=P+1}^{M} w_{ii}^2}{\sum_{i=1}^{M} w_{ii}^2}.$$
By choosing the P largest eigenvalues (w_ii) for the approximation, the worst case is when all the eigenvalues have the same magnitude. In that case, the normalized energy error is
$$E_P^2 = \frac{M - P}{M}.$$
Referring to FIG. 6A, the methodology for generating digital orientational filters according to the separable approximation method of the present invention is shown generally at 50. At step 52, filter parameters, such as filter type, number of filters required (F_N), filter size (M×M), and other filter parameters that may vary according to a particular filter application are input into the processor. At step 54, filter coefficients are computed through use of the above filter parameters. At step 56, an approximation matrix, as shown in FIG. 3, is generated. The filter approximation order (P) is then input at step 58 and SVD is performed on the matrix to generate a set of 1D filters with optimal orthogonal coefficients based on SVD performance characteristics. As a result, P vertical filters are output at step 60. Correspondingly, F_N P horizontal filters are output at step 62.
FIG. 6B illustrates the generation of digital filters as shown generally at 70.
It should be appreciated that there is no guaranteed upper error bound for OSD. The approximation may not converge to the function being approximated if the function does not reside in the sub-space spanned by the set of orthogonal functions used for the approximation.
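The SV/OSD construction described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the patented implementation: it assumes NumPy is available, the function names `sv_osd` and `reconstruct` are hypothetical, and the filters in the usage note are random placeholders rather than real orientational filters. It stacks the filters side by side into the approximation matrix, takes the SVD, keeps the first P left singular vectors as the shared vertical filters, and folds the singular values into per-filter horizontal filters.

```python
import numpy as np


def sv_osd(filters, order):
    """Sketch of SV/OSD for a list of F_N filters, each an M x M array.

    Returns:
      u_p  : M x P array, the P shared (orthogonal) vertical filters u_i
      proj : F_N x P x M array, horizontal filters w_ii * v_i[kM + n]
      w    : all singular values, sorted in descending order
    """
    m = filters[0].shape[0]
    # Approximation matrix A: filters placed side by side, M x (M * F_N).
    a = np.hstack(filters)
    # numpy returns singular values already sorted in descending order.
    u, w, vt = np.linalg.svd(a, full_matrices=False)
    u_p = u[:, :order]
    weighted_v = w[:order, None] * vt[:order, :]  # P x (M * F_N)
    # Columns kM .. (k+1)M-1 of the weighted rows belong to filter k.
    proj = np.stack([weighted_v[:, k * m:(k + 1) * m] for k in range(len(filters))])
    return u_p, proj, w


def reconstruct(u_p, proj_k):
    """Order-P approximation of one filter: sum_i u_i[m] * (w_ii v_i[kM+n])."""
    return u_p @ proj_k
```

A usage sketch: with `filters = [rng.standard_normal((5, 5)) for _ in range(4)]` and `order=3`, the summed squared error over all four reconstructed filters equals the sum of the squared discarded singular values `w[3:]`, which is the standard least-squares property of a truncated SVD. Only P vertical filters are needed regardless of how many filters are in the bank.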
Implementation Scheme for SV/OSD
As described above, SV/OSD possesses good approximation performance, good implementation characteristics, and good convergence behavior. There are two possible schemes to implement 1D spatial filtering. They are called pipelined filtering and parallel filtering. Pipelined filtering takes one input at a time sequentially. The input is multiplied with all the filter coefficients at the same time, and each result of multiplication is accumulated at an accumulator attached to the multiplication unit. All the accumulators are connected together through delay units, and the output of the last accumulator in the chain is the output of the filter. Parallel filtering takes M inputs at a time, and each input is multiplied with a corresponding filter coefficient. The results of the multiplications are added through the binary tree adder to form the output. Both schemes require M multipliers and M-1 adders. Pipelined filtering is suitable for a sequential input stream, and parallel filtering is suitable for a parallel input stream. For SV/OSD, pipelined filtering is suitable for horizontal filtering since the input is coming sequentially in a raster order, and parallel filtering is suitable for vertical filtering since the input can be provided in parallel so that the latency of the system reduces to O(NM). It is important to note that the parallel filtering requires an input buffer so that M parallel inputs can be provided to the filter unit simultaneously.
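The two 1D filtering schemes described above can be modeled in software as follows. This is a behavioral sketch only, with hypothetical function names; it models the data flow (one multiply per coefficient, a chain of accumulators separated by delays, or a binary adder tree) rather than any actual hardware timing. Both functions compute the same causal FIR output y[t] = sum_i c[i] x[t-i], with samples before the start of the stream taken as zero.

```python
def pipelined_fir(samples, coeffs):
    """Pipelined scheme: each input is multiplied by all M coefficients at
    once; accumulators are chained through one-sample delays, and the last
    accumulator in the chain is the filter output."""
    m = len(coeffs)
    acc = [0.0] * m  # accumulator contents after the previous input
    out = []
    for x in samples:
        # Tap j holds coefficient c[M-1-j], so the chain output is a convolution.
        products = [coeffs[m - 1 - j] * x for j in range(m)]
        # Each accumulator adds its product to the delayed neighbor (M-1 adders).
        acc = [products[0]] + [products[j] + acc[j - 1] for j in range(1, m)]
        out.append(acc[-1])
    return out


def tree_sum(values):
    """Add values pairwise, level by level, as a binary tree adder would."""
    vals = list(values)
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            paired.append(vals[-1])
        vals = paired
    return vals[0]


def parallel_fir(samples, coeffs):
    """Parallel scheme: M buffered inputs are multiplied by their
    coefficients simultaneously and summed through an adder tree."""
    m = len(coeffs)
    padded = [0.0] * (m - 1) + list(samples)  # models the input buffer
    out = []
    for t in range(len(samples)):
        window = padded[t:t + m]  # x[t-M+1] .. x[t]
        out.append(tree_sum(c * w for c, w in zip(reversed(coeffs), window)))
    return out
```

Feeding both functions the same stream and coefficients gives identical outputs; the difference is only in how the inputs must be presented (one at a time versus M at a time from a buffer).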
A point very influential to the structure of the filter system is the order of filtering. Separable filtering can be done in either the horizontal-vertical order or the vertical-horizontal order. As shown at 34 in FIG. 4, with the horizontal-vertical order, the system requires an intermediate memory of size NM after each horizontal filter, thus requiring a total of NMP words of memory. As shown at 36 in FIG. 5, with the vertical-horizontal order, the intermediate memory can be shared among the vertical filters because the inputs to the vertical filters are identical. Thus the system requires a total of NM words of memory. The amount of intermediate memory needed is NM no matter how many orientational filters are to be implemented.
It should be noted that the orthogonal filters are implemented as horizontal filters and the projection filters are implemented as vertical filters in the horizontal-vertical filtering scheme. On the other hand, the orthogonal filters are implemented as vertical filters and the projection filters are implemented as horizontal filters in the vertical-horizontal filtering scheme.
Because of the memory requirement advantage the vertical-horizontal filtering scheme possesses over the horizontal-vertical filtering scheme, the vertical-horizontal filtering is employed to implement SV/OSD, as shown at 36 in FIG. 5.
As discussed above, the approximation converges at least linearly to the original filter as the approximation order approaches M. Thus, even for the worst case, the computational complexity of the filters using the SV/OSD is only slightly more than the complexity of the direct method, and the ratio of the two is close to 1 for large F_N. However, the experimental convergence and speed of SV/OSD is much faster than linear convergence in most cases, and sufficient approximation for computer vision applications can be achieved with a much smaller approximation order than M. Thus, SV/OSD satisfies all the implementation criteria. Also, the computational advantage of SV/OSD increases as the number of filters in the system increases.
II. Multi-Resolution Image Decomposition
The present invention also provides a multi-resolution image decomposition (MRD) method that produces multiple filtered output images. Each output represents the contents of an input image over a certain frequency region. The output image corresponding to a lower frequency region has a lower resolution and can be decimated (decimated MRD). Hence, the multiple output images have a pyramid structure wherein the lowest frequency plane is the smallest and the size increases as the frequency band associated with the image increases.
Thus, multi-resolution image decomposition is suitable for an image analysis platform for the following reasons. First, the objects to be recognized often have very different sizes. Hence it is impossible to define the optimal resolution for all the objects. Second, the objects can be recognized easily if the context of the image is known. For example, if a house is recognized first, then it is easy to find a window as a rectangle inside the house. But it is more difficult to recognize the window if the house is not recognized first. Using the multi-resolution technique, it is easy to first process the coarse image to understand the context of the original image and then move to finer images for further processing (coarse-to-fine processing). Third, coarse-to-fine processing can speed up processing since the coarse information can be represented by fewer samples. The finer details require more samples, but the prior information derived from the context constrains the region of observation. Moreover, if an object can be recognized from a coarse description, then processing finer details is not needed. In addition, decomposing an image into different frequency bands is useful in analyzing the image. For example, an edge consists of higher frequencies while most texture information has its energy concentrated in narrow frequency bands.
The following discusses multi-resolution decomposition computation applied to MRD in both decimated and undecimated formats. The discussion is based on wavelets because the time-frequency characteristic of the wavelet transform is suitable for image analysis. More specifically, decimated MRD is described in terms of a Wavelet Series (WS) defined by
$$d^k_{m,n} = \sum_{i,j} f[i,j]\, 2^{-k+1}\, \psi(2^{-k+1} i - m,\ 2^{-k+1} j - n) = \sum_{i,j} f[i,j]\, \psi^{k-1}(i - 2^{k-1} m,\ j - 2^{k-1} n)$$
where d^k_{m,n} is the k-th decomposition output at location (m,n), f[i,j] is the input signal, ψ(x) is the orientational filter, and ψ^k(x) = 2^{-k/2} ψ(2^{-k} x). It is assumed that the sampling period of f[m,n] is 1 in both directions for simplicity. For general sampling, ψ^k(m,n) has to be replaced by ψ^k(mT_x, nT_y) where T_x and T_y are sampling periods in the x and y directions respectively.
The formula for undecimated MRD is
$$d^k_{m,n} = \sum_{i,j} f[i,j]\, 2^{-k+1}\, \psi(2^{-k+1} i - 2^{-k+1} m,\ 2^{-k+1} j - 2^{-k+1} n).$$
The decimated MRD involves decimation by 2 in both directions. Thus, the size of the image decreases by 2×2 from one level of the decomposition to the next level. The first level decomposition requires N^2 convolutions with each convolution having a complexity O(M^2). Hence the total complexity of the first level decomposition is O(N^2 M^2). Similarly the k-th level of decomposition requires N^2/4^{k-1} convolutions. Each convolution at this level has a complexity O(4^{k-1} M^2) due to the dilation of the filter. Hence the total complexity of the k-th level decomposition is O(N^2 M^2). Now if the decomposition is performed up to the L-th level, the complexity of the whole decomposition is O(L N^2 M^2). The discrete wavelet transform (DWT) performs MRD in a recursive fashion. The k-th level decomposition is performed on the (k-1)-th decomposition using the same filter kernel. This recursive scheme is possible when the wavelets are orthogonal to each other. With DWT, the decomposition can be done in O(N^2 M^2) and is independent of L. Then at the k-th level of decomposition, the complexity is O(N^2 M^2/4^{k-1}). Thus, the total amount of the computation is
$$\sum_{k=1}^{L} N^2 M^2 / 4^{k-1} \le \frac{4}{3} N^2 M^2.$$
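The complexity comparison for decimated MRD above can be checked with exact arithmetic. The sketch below is illustrative only; the function names are hypothetical and the counts are the operation totals stated in the text, not measurements of any implementation. It contrasts the direct approach, where each level costs N^2 M^2 because the dilated filter grows as the image shrinks, with the recursive DWT approach, where the cost per level falls by a factor of 4.

```python
from fractions import Fraction


def direct_decimated_cost(n, m, levels):
    """Direct decimated MRD: level k has N^2 / 4^(k-1) convolutions, each
    costing 4^(k-1) * M^2 because the filter is dilated."""
    return sum(
        Fraction(n * n, 4 ** (k - 1)) * (4 ** (k - 1) * m * m)
        for k in range(1, levels + 1)
    )


def dwt_cost(n, m, levels):
    """Recursive DWT: the same M x M kernel is reused at every level, so
    level k costs N^2 * M^2 / 4^(k-1)."""
    return sum(Fraction(n * n * m * m, 4 ** (k - 1)) for k in range(1, levels + 1))
```

For any number of levels, `dwt_cost` stays below 4/3 of the single-level cost, while `direct_decimated_cost` grows linearly with the number of levels.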
An approximation method that will be referred to as the Wavelet Approximation approximates 1D continuous wavelets in a manner that enables application of DWT even though the wavelets are not orthogonal. The Wavelet Approximation can be extended to 2D continuous wavelets by first decomposing the wavelets using Separable Approximation and applying Wavelet Approximation to each 1D filter separately. A new approximation method according to a preferred embodiment of the present invention is the Separable Wavelet Approximation (SWA), and will be discussed in detail below.
For undecimated MRD, the size of the image stays the same, whereas the size of the filter increases by 2×2. Thus the amount of computation increases exponentially as the decomposition level increases. For the level L MRD, the total computation is
$$\sum_{k=1}^{L} 4^{k-1} N^2 M^2 = (4^L - 1) N^2 M^2 / 3.$$
Thus, the computational complexity of undecimated MRD is O(4^L N^2 M^2).
The description below first describes the Discrete Wavelet Transform (DWT) which computes a decimated 1D MRD efficiently. Second, it introduces the Wavelet Approximation. Third, it introduces SWA. Fourth, it discusses implementation issues for undecimated MRD, and suggests an efficient computation scheme based on SWA.
Discrete Wavelet Transform
A wavelet series is defined by a set of wavelet coefficients {d^k_m}, where
Although wavelet series decomposition can be done directly to each subspace W_k through implementation of the above equation, a more efficient way is to use recursion. The first decomposition divides V_0 into two sub-spaces, V_1 and W_1. The second decomposition divides V_1 into two sub-spaces, V_2 and W_2. The k-th decomposition divides V_{k-1} into V_k and W_k. If an input is given as a continuous function f, it is mapped to a discrete sequence c^0_m by
$$c^0_m = \int f(x)\, \phi(x - m)\, dx.$$
If the input is given in discrete form (f[m]), the DWT can then be performed in the discrete domain as is well known in the art.
Wavelet Approximation
The wavelet approximation method allows the wavelet series to be computed in a recursive fashion, the same way as the DWT, even though the wavelets are not orthonormal to each other. A mother function of dyadic wavelets is denoted as ψ_b(x). The continuous wavelets are approximated by
As the wavelet is dilated, the discrete filter also expands with zeros being padded between each filter coefficient. This suggests that a dilation operation at each level of decomposition can be moved prior to the discrete filter due to an operator identity
$$D_n G(z^n) = G(z) D_n$$
where D_K is decimation by K, and G(z) is a z-transformed representation of a discrete filter. It also shows that the wavelets ψ_b^k(x) can be implemented in a cascaded fashion with two filters, g[m] and φ_I(x). The mother wavelet decays as t→∞ and can be truncated at some time point in the above approximation. The sequence is derived in such a way that the Wavelet approximation is exact at integer time points within the truncation points.
Once the wavelet approximation is obtained, the wavelet decomposition for k ≥ b+1 can be recursively performed. The low pass sequence is decimated by 2 at each decomposition. However, the high pass sequence is not decimated. Thus the decomposition is not critically sampled. A basic spline function is often used as the interpolation function, φ_I(x). With the basic spline of order k, the discrete filter h[n] becomes
$$h[n] = 2^{-k-1/2} \binom{k+1}{n}.$$
There are no constraints on the mother wavelet, ψ(x). The method works for any function as long as a sufficient approximation of the wavelet is done through the spline function.
Separable Wavelet Approximation
Most of the notations and approximations are 2D extensions of ones in the previous section. First a 2D mother wavelet ψ_b(x,y) is decomposed into a separable form using SV/OSD,
$$\psi_b(x, y) = \sum_{i=1}^{P} a_i(x)\, b_i(y).$$
Then each 1D function is approximated by the basic spline.
$$a_i(x) \approx \sum_{j} g_i^x[j]\, \phi_I(x - j) \quad \text{and} \quad b_i(y) \approx \sum_{j} g_i^y[j]\, \phi_I(y - j).$$
Following the development in Section 3.2, the decomposition can be done in a recursive fashion using discrete filters h[m], g_i^x[m] and g_i^y[m].
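The operator identity relating decimation and a zero-padded (dilated) filter can be checked numerically. The sketch below assumes one common reading of the identity: filtering with the dilated filter G(z^n) and then decimating by n gives the same sequence as decimating by n first and then filtering with the undilated G(z). The helper names `fir`, `decimate`, and `dilate` are hypothetical, and the filter is causal with samples before the start of the stream taken as zero.

```python
def fir(x, g):
    """Causal FIR filter: y[t] = sum_i g[i] * x[t - i], zero before t = 0."""
    return [
        sum(g[i] * x[t - i] for i in range(len(g)) if t - i >= 0)
        for t in range(len(x))
    ]


def decimate(x, n):
    """Keep every n-th sample, starting with the first."""
    return x[::n]


def dilate(g, n):
    """Insert n-1 zeros between consecutive coefficients: G(z) -> G(z^n)."""
    out = []
    for i, c in enumerate(g):
        out.append(c)
        if i < len(g) - 1:
            out.extend([0] * (n - 1))
    return out
```

For instance, with `x = list(range(1, 17))`, `g = [1, 2, 1]` and `n = 2`, `decimate(fir(x, dilate(g, n)), n)` and `fir(decimate(x, n), g)` produce the same list. The second form needs only the short filter and runs on half as many samples, which is why moving the dilation ahead of the filter saves computation.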
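$$d^k_{m,n} = \sum_{i_x,i_y} f[i_x,i_y]\, \psi_b^{k-1}(i_x - 2^{k-1} m,\ i_y - 2^{k-1} n)$$
$$= \sum_{i_x,i_y} f[i_x,i_y]\, 2^{-k+1}\, \psi_b(2^{-k+1} i_x - m,\ 2^{-k+1} i_y - n)$$
$$= \sum_{i_x,i_y} f[i_x,i_y]\, 2^{-k+1} \sum_{i} a_i(2^{-k+1} i_x - m)\, b_i(2^{-k+1} i_y - n)$$
$$\approx \sum_{i_x,i_y} f[i_x,i_y]\, 2^{-k+1} \sum_{i} \Big\{ \sum_{j_x} g_i^x[j_x]\, \phi_I(2^{-k+1} i_x - m - j_x) \Big\} \Big\{ \sum_{j_y} g_i^y[j_y]\, \phi_I(2^{-k+1} i_y - n - j_y) \Big\}$$
$$= \cdots$$
$$c^k_{m,n} = \sum_{i_x,i_y} f[i_x,i_y]\, 2^{-k}\, \phi_I(2^{-k} i_x - m)\, \phi_I(2^{-k} i_y - n)$$
$$= \sum_{i_x,i_y} f[i_x,i_y]\, 2^{-k} \Big\{ \sqrt{2} \sum_{j_x} h[j_x]\, \phi_I(2^{-k+1} i_x - 2m - j_x) \Big\} \Big\{ \sqrt{2} \sum_{j_y} h[j_y]\, \phi_I(2^{-k+1} i_y - 2n - j_y) \Big\}$$
$$= \sum_{j_x} h[j_x - 2m] \sum_{j_y} h[j_y - 2n] \sum_{i_x,i_y} f[i_x,i_y] \cdots$$
$$c^0_{m,n} = \sum_{i_x,i_y} f[i_x,i_y]\, \phi_I(m - i_x)\, \phi_I(n - i_y)$$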
The level k decomposition is performed on a decimated k-1 level decomposition, namely c^{k-1}. The low pass part of the decomposition above is merely a separable 2D filtering by the same filter in both dimensions. The prefiltering part of the decomposition is also a separable 2D filtering by the interpolation filter.
Plots of φ_I and h[m] for spline orders of 1, 3, 5, and 7 are shown generally at 70 in FIGS. 7A-7H. The first order spline is shown at 72 in FIG. 7A, and the low-pass filter shown at 73 in FIG. 7E is generated by the equation h[n] above, with k=1. However, as the spline order increases, as shown at 74, 76 and 78, in FIGS. 7B, 7C, and 7D, respectively, the resultant filters shown at 75, 77 and 79 in FIGS. 7F, 7G and 7H, respectively, approach a uniform Gaussian distribution, thereby causing the filter to produce a smoother approximation. The high pass part of the decomposition uses a filter bank structure to implement the separable approximation. It contains a set of separable filters with g_i^x[m] for the orthogonal direction and g_i^y[m] for the projection direction. FIG. 8 shows the structure of SWA at 80. There can be multiple high-pass banks for a system with multiple orientational filters. In that case, the same pre-filter stage and the low-pass bank can be shared among the multiple high-pass filter banks. Also the set of orthogonal filters can be shared among the multiple orientational filters when SV/OSD is used for Separable Approximation.
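The observation that the spline low-pass filters approach a Gaussian shape as the spline order grows can be illustrated numerically. The sketch below assumes the low-pass filter for spline order k has the binomial form h[n] = 2^(-k-1/2) * C(k+1, n) for n = 0 .. k+1; the function names and the choice of comparing against a Gaussian with matching mean (k+1)/2 and variance (k+1)/4 are assumptions made here for illustration.

```python
import math


def spline_lowpass(order):
    """Binomial low-pass filter assumed for a basic spline of the given order."""
    scale = 2.0 ** (-order - 0.5)
    return [scale * math.comb(order + 1, n) for n in range(order + 2)]


def gaussian_fit_error(order):
    """Largest tap-wise gap between the unit-sum filter and a Gaussian
    density with the same mean and variance."""
    h = spline_lowpass(order)
    total = sum(h)
    taps = [c / total for c in h]
    mean = (order + 1) / 2.0
    var = (order + 1) / 4.0
    gauss = [
        math.exp(-((n - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        for n in range(order + 2)
    ]
    return max(abs(a - b) for a, b in zip(taps, gauss))
```

Evaluating `gaussian_fit_error` for orders 1, 3, 5 and 7 gives a steadily shrinking gap, consistent with the trend described for FIGS. 7E-7H. Under this assumed form the taps of `spline_lowpass(k)` sum to the square root of 2 for every order.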
Benefits of SWA are examined in terms of the implementation criteria. Assume the size of h is M_h, and the size of the spline filter φ_I is M_I. Since the decomposition can be implemented in a pipeline fashion as shown in FIG. 8, and every filter is separable, the throughput can be as high as 1/t_m. Thus, the SWA satisfies the first implementation criterion.
As soon as the pre-filtering stage starts generating outputs, the first stage decomposition can proceed. Also the k-th level of the decomposition can proceed as soon as the c^{k-1} is generated. Thus, the whole filter system operates in a pipeline fashion. The latency at the pre-filter is N M_I t_m. The pixel input rate decreases by 1/4 from one level to the next level because of the decimation at each stage. This implies that the latency increases at each stage since it takes more time to collect necessary pixels. The time between input pixels at the k-th level decomposition is 4^{k-1} t_m. Also the size of the output image decreases by 2 in each dimension. Thus, the latency of the low pass filter bank at the k-th level decomposition is (N/2^{k-1}) M_h (4^{k-1} t_m) = 2^{k-1} N M_h t_m, and the latency of the high pass filter bank at the L-th level is 2^{L-1} N M t_m. The total latency of the L-th level decomposition is approximately
6,009,447 
15 
L-! 
NM1tm + ~ 2'-1 NMhtm + 2L-! NMtm = 
k=l 
16 
buffer stores one row of data after vertical filtering, and feeds 
input to the horizontal filtering units so that the long shifting 
delay between two multiplication units can be deleted. The 
scheme uses one DWT module which is time-multiplexed to 
with M""Mh. Note that Mr-5 and Mh=7 when the 5th order 
basic spline is used for the wavelet approximation. If the L'h 
5 compute multiple DWTs at different time-points. FIG. 9 
illustrates the reordering scheme of the reorder buffer gen-
erally at 82. A counter and a PAL programmed for each 
decomposition level is used to provide a proper address to 
the reorder buffer. This scheme is scalable and the utilization 
of the chip area for the horizontal filter unit is much better. 
The filter unit can contain more multipliers reducing the 
:~~~ed:~~~~~i~~:xi~~~~~ :~el~~Zn~;i:; t~isd~l:~=~s fi;{:~ 10 
O(Y.-1 tmNM). Therefore, the SWA satisfies the second 
implementation criterion. 
The direct implementation requires computation of 
O(FNLN2M). When SWA is employed, the first level of 
decomposition requires 2N2MI multiply-accumulate opera- 15 
tions (macs) for pre-filtering, 2N2Mh macs for low-pass 
filtering, N2 PM macs for vertical filtering in the high-pass 
banks, and F NN2PM macs for horizontal filtering in the 
high-pass banks. The second level of the decomposition 
requires N2Mh/4 macs for low-pass filtering, PN2M/4 macs 20 
for horizontal filtering in the high-pass banks, and 
F NN2 PM/4 macs for vertical filtering in the high-pass banks. 
Thus, the whole decomposition requires 2N2MI macs for 
pre-filtering, 4N2Mh/3 macs for low-pass filtering, 4PN2 M/3 
macs for vertical filtering in the high-pass banks, and 25 
4FNN2 PM/3 macs for horizontal filtering in the high-pass 
banks. Horizontal filtering in the high-pass banks dominates 
the complexity of the computation. Therefore the amount of 
the computation for the decomposition is 4FNN2 PM/3. The 
computation increases only slightly from single-resolution 
30 to multi-resolution decomposition, and is much less than the 
direct method. The SWA satisfies the third implementation 
criterion. 
The pre-filtering stage requires NMI words of memory. At 
the k'h level of the decomposition, the size of the input image 
to the filter banks (i.e. ck-1 ) is 2k-1Nx2k-1N due to the 35 
decimation. Thus, at the k'h stage, the low-pass filter bank 
requires NMh/2k words of memory and the high pass filter 
bank stage requires NM/2k-l words of memory. Both the 
low-pass and high-pass filter access the same part of the 
input image, thus the input buffer can be shared. Assume 40 
M>Mh. Then only NM/2k-l words of memory are required 
instead of NM/2k-l + NMh/2k-1 . The total storage required is 
L 
number of chips needed for implementing a horizontal filter. 
The scheme requires an extra PAL and counter. However, 
they can be shared among other horizontal filters at the same 
level. 
For vertical filters, a suitable implementation of a dilated
filter is to introduce M \(2^{k-1}\)-way multiplexers between the
input buffer and the filter unit. Each multiplexer selects the
rows where non-zero filter coefficients are aligned. The
complete structure for an undecimated multi-resolution
decomposition system using SWA is shown at 84 in FIG. 10.
Using the scheme, the latency of the decomposition stays
the same as in the decimated SWA case, the throughput
also stays \(1/t_m\), the amount of computation is
\(2N^2 M_I + 2LN^2 M_h + LN^2 MP + L F_N N^2 MP =
2N^2 M_I + LN^2(2M_h + MP + F_N MP)\), and the amount of storage
is \(N M_I + LN\max(M, M_h) + NL(F_N P + 1)\). The number of reorder
buffers in the system is \(L(F_N P + 1)\) and the number of
reorder buffer address generators is L.
Referring to the flow diagram 100 in FIG. 11, the meth-
odology implemented for the multi-resolution image decom-
position method of the present invention is shown. Steps 102 
through 114 are identical to those steps utilized for gener-
ating the digital orientational filters through the separable 
approximation method described above. However, upon the 
vertical and horizontal filters being output at steps 112 and 
114, the filters are subjected to a spline approximation as 
shown at 116, 118 and as described above. In addition, at 
step 120, a basic spline order is input into the basic spline 
function and is utilized along with the spline approximation 
calculations. Subsequently, a high pass vertical filter is 
output at step 122, and a high pass horizontal filter is output 
\[N M_I + \sum_{k=1}^{L} \frac{N(M + M_h)}{2^{k-1}} < N(M_I + 2M + 2M_h)\]
at step 124. In addition, an interpolation filter is output at
step 126 for pre-filtering, while a low pass filter is output at 
step 128 for the low-pass filter banks. 
This quantity is independent of \(F_N\) and is O(N), and the
approximation satisfies the fourth implementation criterion
regarding small storage requirements.
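The storage bound can be checked numerically. The following is a minimal sketch that evaluates the sum as it is written in the text (the pre-filter buffer plus N(M + M_h)/2^(k-1) words per level) and compares it with N(M_I + 2M + 2M_h); the function names and sample sizes are illustrative assumptions.

```python
def swa_storage_words(n, m_i, m, m_h, levels):
    """Storage for a decimated SWA decomposition with `levels` levels."""
    total = float(n * m_i)                       # pre-filter buffer: N * M_I words
    for k in range(1, levels + 1):
        total += n * (m + m_h) / 2 ** (k - 1)    # level-k term of the sum in the text
    return total


def storage_bound(n, m_i, m, m_h):
    """Upper bound quoted in the text: N * (M_I + 2M + 2M_h)."""
    return n * (m_i + 2 * m + 2 * m_h)


if __name__ == "__main__":
    n, m_i, m, m_h = 512, 7, 15, 9               # illustrative values only
    for levels in (1, 2, 4, 8):
        print(levels, swa_storage_words(n, m_i, m, m_h, levels),
              storage_bound(n, m_i, m, m_h))
```

Neither function takes the number of orientational filters as an argument, and both scale linearly with N, which mirrors the two properties claimed for this quantity.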
III. Steerable System 
In many image analysis and image processing tasks, it is 
useful for the filter system to have a capability of changing 
its orientation dynamically under adaptive control. Such a 
filter system is called a steerable system, and those tasks 
Another benefit of SWA is that a large part of the 
decomposition structure can be shared with other decompo-
sition structures associated with different orientational fil-
ters. In FIG. 8, the portion which can be shared with different 
orientational filters is enclosed in dashed boxes. 
Undecimated Separable Wavelet Approximation 
which utilize the steerability include local orientation
analysis, contour following, target tracking and image 
enhancement. Filter h(x,y) is considered steerable if a 
rotated copy of h(x,y) at an arbitrary orientation can be 
expressed by a finite linear sum of basis filters. Thus, 
The decimation at each level of the SWA plays an 
essential role in efficient decomposition. However, in some 
applications, it is preferred to have the decomposition without
decimation. This is common for image analysis applications
since the decimation introduces aliasing and causes
the transformation to be shift variant. This section examines
how to produce an undecimated MRD without losing
computational and implementational advantages of the 
SWA. 
An external buffer (the reorder buffer) is implemented for
undecimated separable wavelet approximation. The reorder
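\[h_\theta(x, y) = \frac{1}{2}\alpha_0(x, y) + \sum_{n=1}^{\infty}\{\alpha_n(x, y)\cos(n\theta) + \beta_n(x, y)\sin(n\theta)\}\]
where,
\[\alpha_n(x, y) = \frac{1}{\pi}\int_{-\pi}^{\pi} h_\theta(x, y)\cos(n\theta)\,d\theta\]
and
\[\beta_n(x, y) = \frac{1}{\pi}\int_{-\pi}^{\pi} h_\theta(x, y)\sin(n\theta)\,d\theta.\]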
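Then a Q-th order approximation to the steerable system can
be accomplished by selecting Q basis filters from \(\alpha_i(x, y)\) and
\(\beta_i(x, y)\) with the Q largest energy values. The approximation
formula is
\[h_\theta(x, y) \approx \sum_{i=1}^{Q} \gamma_i(x, y)\,q_i(\theta)\]
where \(\{\gamma_i\} \subset \{\alpha_i, \beta_i\}\) and
\[q_i(\theta) = \begin{cases} 1/2 & \text{if } \gamma_i = \alpha_0 \\ \cos(n\theta) & \text{if } \gamma_i \in \{\alpha_n\},\ n \neq 0 \\ \sin(n\theta) & \text{if } \gamma_i \in \{\beta_n\} \end{cases}\]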
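The selection of the Q largest-energy basis filters can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical test filter (a rotated anisotropic Gaussian derivative), a small 9x9 sampling grid, and a rectangle-rule approximation of the integrals over 64 angles; none of these choices, nor the function names, come from the patent.

```python
import math


def rotated_filter(theta, x, y):
    """Hypothetical test filter: an anisotropic Gaussian derivative rotated by theta."""
    u = x * math.cos(theta) + y * math.sin(theta)
    v = -x * math.sin(theta) + y * math.cos(theta)
    return u * math.exp(-(u * u + 4.0 * v * v) / 2.0)


GRID = [(x * 0.5, y * 0.5) for y in range(-4, 5) for x in range(-4, 5)]
K = 64                                             # number of theta samples
THETAS = [-math.pi + 2.0 * math.pi * k / K for k in range(K)]


def fourier_basis(max_order):
    """Return (kind, n, samples) for alpha_0..alpha_max and beta_1..beta_max."""
    basis = []
    for n in range(max_order + 1):
        for kind, trig in (("alpha", math.cos), ("beta", math.sin)):
            if kind == "beta" and n == 0:
                continue                           # beta_0 is identically zero
            # (1/pi) * integral over [-pi, pi], rectangle rule: (2/K) * sum
            samples = [2.0 / K * sum(rotated_filter(t, x, y) * trig(n * t)
                                     for t in THETAS) for (x, y) in GRID]
            basis.append((kind, n, samples))
    return basis


def q_of_theta(kind, n, theta):
    """Steering function q_i(theta) from the approximation formula."""
    if kind == "alpha":
        return 0.5 if n == 0 else math.cos(n * theta)
    return math.sin(n * theta)


def select_largest_energy(basis, q):
    """Keep the q basis filters with the largest sum of squared samples."""
    return sorted(basis, key=lambda b: sum(c * c for c in b[2]), reverse=True)[:q]


def steer(selected, theta):
    weights = [q_of_theta(kind, n, theta) for kind, n, _ in selected]
    return [sum(w * b[2][i] for w, b in zip(weights, selected))
            for i in range(len(GRID))]


def mean_sq_error(selected):
    """Squared approximation error summed over the grid, averaged over theta."""
    err = 0.0
    for t in THETAS:
        approx = steer(selected, t)
        err += sum((rotated_filter(t, x, y) - a) ** 2
                   for (x, y), a in zip(GRID, approx))
    return err / K


if __name__ == "__main__":
    basis = fourier_basis(9)
    for q in (2, 4, 8):
        print(q, mean_sq_error(select_largest_energy(basis, q)))
```

Because the sampled cosines and sines are orthogonal over the angle grid, adding basis filters in order of energy cannot increase the averaged error in this sketch.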
with
\[= \sum_{i_x, i_y} f[i_x, i_y]\,\phi_I(m - i_x)\,\phi_I(n - i_y).\]
FIG. 13 shows the structure at 132 which corresponds to the
approximation. The pre-filter and low-pass modules are the
same as SWA. The high-pass modules are divided into three
parts: orthogonal filters, projection filters, and interpolation
units. The orthogonal filters and projection filters approximate
the FSA basis functions using SV/OSD. They are
followed by the interpolation unit to steer the filter to a
particular orientation. The orthogonal filters and projection
filters can be shared among filters at different orientations.
An independent interpolation unit is necessary for each
orientation.
The idea of undecimated MRD discussed above can be
easily extended for steerability in a very similar way as the
above decimated MRD scheme. FIG. 14 shows the structure
of the undecimated steerable MRD at 134. The only difference
between FIGS. 13 and 4 in terms of the hardware
structure is the addition of the interpolation units which steer
the filter. The lower portion of the MRD module in FIG. 14
approximates a set of FSA basis functions, whereas the
Note that (31) is the complex Fourier series of \(h_\theta(x, y)\),
which is \(2\pi\)-periodic in \(\theta\). Thus, \(a_i(x, y) = \alpha_i(x, y) + j\beta_i(x, y)\).
In FSA, flexibility is present in its filter selection, and
\(\alpha_i(x, y)\) and \(\beta_i(x, y)\), which correspond respectively to the real
and imaginary parts of the complex filter \(a_i(x, y)\), are separate
entities in the filter selection process.
FSA+SV/OSD
The basis filters are non-separable, in general, and the
method does not satisfy the orientational filter implementation
criteria discussed above. For the steerable system to
satisfy the criteria, SV/OSD can be applied to \(\{\gamma_i(x, y)\}\) for
the separable approximation at the expense of additional
error. The approximation matrix is formed by combining for
the Q-th order FSA. Using this combination of FSA and
SV/OSD, denoted as FSA+SV/OSD, an orientational filter at
an arbitrary orientation can be approximated as
\[h_\theta(x, y) \approx \sum_{i=1}^{Q} q_i(\theta) \sum_{j=1}^{P} a_j(y)\,b_{ij}(x),\]
where
\[\gamma_i(x, y) \approx \sum_{j=1}^{P} a_j(y)\,b_{ij}(x).\]
The computational structure of FSA+SV/OSD is shown at 
130 in FIG. 12. To implement a system with multiple 
orientations, the number of orientation control units (inside 
the dashed box in FIG. 12) must equal the number of 
required orientations. 
The throughput of the system can be \(1/t_m\), the latency is
O(NM), the amount of computation is \(N^2(MQP + F_N)\), and
the storage requirement is NM.
FSA+SWA 
counterpart in FIG. 13 approximates a set of orientational
filters. It also requires LQ(P+1)+1 reorder buffers and L
reorder buffer address generators.
Referring to FIG. 15, the flow diagram shown at 150
represents the methodology utilized to implement a steerable
digital orientational filter system. The methodology steps
152-174, except step 154, are identical to the flow diagram
steps shown at 100 in conjunction with the description of the
multi-resolution image decomposition method of the present
invention. The methodology shown at 150 differs in that the
approximation contains Q orientation-independent 2D
basis filters instead of multiple orientation filters as in 100.
The basis filters are the results of Fourier series expansion
applied to the mother wavelet filter. Step 154 also produces
interpolation coefficients which are used to adjust (steer) the
orientation of the overall filter operation.
IV. Hardware Design 
The following description relates to hardware design of a 
set of VLSI chips for SV/OSD, SWA and FSA+SWA imple-
mentation. First, the design for SV/OSD is considered. 
Second, the design for SV/OSD is adapted for SWA. Third,
the design for undecimated SWA will be discussed. Fourth,
it is shown that a modification on the design for SWA can
implement FSA+SWA.
In addition to the design criteria discussed earlier, an 
additional design concern is scalability. Four types of scal-
ability are considered: 
The next extension for the steerable system is to add MRD
capability. FSA and SWA can be combined in the following
way. First, the filter of interest is decomposed with FSA.
Second, each basis filter is decomposed into a separable
form using SV/OSD. Third, each separable pair is approximated
using the Wavelet Approximation. This approximation
process is defined by
1. Input Image Scalability. 
The system can accommodate any input image size by 
only changing the size of the input buffer which is imple-
mented with discrete memory chips. 
2. Filter Number Scalability. 
The design is modular so that the number of filters can be 
increased by adding extra filter components to the system. 
The amount of hardware increases only linearly as the
number of filters increases.
where
3. Filter Size Scalability.
The size of filters can be increased by adding extra filter 
components to the system. No new design is necessary for 
a different filter size. The amount of hardware increases only 
linearly as the filter size increases. 
4. Approximation Order Scalability.
The order of approximation can be increased by adding
extra filter components to the system. No new design is
necessary for a different approximation order. The amount of
hardware increases linearly as the order of approximation
increases.
The main components of the design are a separable
approximation filter bank and a 2D separable filter. The filter
bank is implemented using two custom-made VLSI chips:
a vertical filter chip (VFC) and a horizontal filter chip (HFC).
The 2D separable filter is implemented with a single custom-made
VLSI chip, the separable filter chip (SFC).
In the following description, it is assumed that a memory 
chip allows simultaneous write and read operations in one 
clock cycle. This can be done by either using dual port 
RAM, using two separate memory chips for read and write, 
using a video RAM for random writes and sequential reads, 
or using fast RAM so that on one phase (high clock) the 
memory can be read and on the other phase (low clock) it 
can be written. The width of the input pixels to the system 
is denoted as Bp. 
to change the parallel input pattern from the memory banks
to the chip. The secondary register file is provided to enable
a smooth transition from one row to the next row. Its
contents are updated with a new set of filter coefficients
while the filter coefficients are read from the primary register
file. At the end of the row, the contents of the secondary
register file are transferred to the primary register file, and
filtering for the next row can proceed continuously without
any interruption. Either the host or a local_controller which
resides on the same board with the filter chip can be used to
store a new set of coefficients to the secondary register file.
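The primary/secondary arrangement is a form of double buffering. The following is a minimal behavioral sketch, not a description of the actual chip logic; the class and method names are hypothetical.

```python
class DoubleBufferedRegisterFile:
    """Behavioral model of a primary/secondary coefficient register file."""

    def __init__(self, coefficients):
        self.primary = list(coefficients)      # read by the filter during a row
        self.secondary = list(coefficients)    # written by the host or local controller

    def load_secondary(self, coefficients):
        """Stage the next row's coefficients without disturbing filtering."""
        if len(coefficients) != len(self.primary):
            raise ValueError("coefficient count must match the filter width")
        self.secondary = list(coefficients)

    def end_of_row(self):
        """Transfer the staged coefficients to the primary file."""
        self.primary = list(self.secondary)

    def filter_column(self, pixels):
        """Parallel (vertical) filtering of one column of input pixels."""
        return sum(c * p for c, p in zip(self.primary, pixels))


if __name__ == "__main__":
    rf = DoubleBufferedRegisterFile([1, 2, 3])
    rf.load_secondary([0, 1, 0])               # staged while the current row is filtered
    print(rf.filter_column([4, 5, 6]))         # still uses the old coefficients
    rf.end_of_row()
    print(rf.filter_column([4, 5, 6]))         # now uses the staged coefficients
```

In this model the write to the secondary file never affects the value read by the filter until the end-of-row transfer, which is the property the text relies on.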
The structure of the VFC is shown in FIG. 18.
An extra adder at the end of each parallel filter is used to
add the output of the filter (sum$$) and a partial result
(vpout$$) which is provided externally. This adder is needed
when the size of the filter is larger than \(W_{FV}\).
Component Count
The table below lists the major components in a VFC and
their counts.
Component Count of the Vertical Filter Chip
Component Type    Count                               Size (bits)
Multiplier        \(W_{FV} N_{FV}\)
Adder             \(W_{FV} N_{FV}\)
D flip-flop       \(N_{FV} W_{FV}\)
Register File     \(2W_{FV}\)
Input Ports       \((W_{FV} + 2N_{FV} + 1)B_P\)
Output Ports      \(2N_{FV} B_P\)
Global Bus        \(W_{FV} + 1\)
Many signal names in the design have numeric post-fixes,
such as hout1 and vout2. Within the text, the notation
<signal_prefix>$$ is used to refer to the set of signals with
the same prefix but different numeric post-fixes. For
example, hout$$ implies hout1, hout2, hout3, and so on.
Signals internal to the VLSI chip are indicated with a
single underline. All signals without an underline are external
to the VLSI chip.
FIG. 16 shows at 180 the symbols used in later figures to
represent certain components in the designs.
SV/OSD 
The VLSI implementation structure of SV/OSD is shown 
at 182 in FIG. 17. The structure can be divided into three 
modules: input buffer, vertical filters and horizontal filters. 
The vertical filters and horizontal filters are to be designed 
in separate VLSI chips, since only one set of vertical filters 
is needed for the whole system (for single stage filtering), 
whereas each orientational filter needs its own set of hori-
zontal filters. 
Input Buffer 
An intermediate memory module is used to provide M 
parallel inputs to the vertical filters. The memory module 
consists of M independent memory banks whose size is N 
words each. One memory module is needed for all the 
orientational filters. A separate intermediate memory module 
is needed at each level of MRD. 
Components Count 
The memory module requires M memory chips of N 
words each. 
Vertical Filter Chip (VFC) 
Each vertical filter requires M multipliers and M-1
adders. Assume \(W_{FV}\) is the width of a filter implemented in
a VFC, and \(N_{FV}\) is the number of filters implemented in the
chip. The chip requires \(B_P W_{FV}\) input ports where input
pixels are received in parallel. It also requires \(2B_P N_{FV}\)
output ports where the outputs of the \(N_{FV}\) vertical filters are sent
to the horizontal filter chip. The pixel width is doubled after
multiplication to preserve accuracy. Each filter in the chip
requires a register file of \(B_P W_{FV}\) bits where the filter
coefficients are stored. The coefficients for each multiplication
do not change while the input is coming from the same
row, but change when the input moves to the next row since
it is easier to update the coefficients at every new row than
Horizontal Filter Chip (HFC) 
The horizontal filter unit employs a pipelined filtering
scheme. Each filter requires M multipliers and M-1 adders.
There is no need for a buffer between a vertical filter and
horizontal filter pair. The output of the vertical filter can be
directly fed to the HFC. Outputs of the horizontal filters are
added together by a binary adder to form the linear sum of
the separable approximation. The structure of the HFC is
shown at 186 in FIG. 19.
Assume \(W_{FH}\) is the width of a filter implemented in a
HFC, and \(N_{FH}\) is the number of filters implemented in the
chip. Note that the number of bits in the output of vertical
filtering is \(2B_P\) to preserve accuracy. The chip requires
\(2B_P N_{FH}\) input ports where input pixels are received sequentially
for \(N_{FH}\) filters. It also requires \(B_P\) input ports (coeff)
for loading filter coefficients, \(3B_P N_{FH}\) input ports
(hpout$$) for a partial result of the horizontal filtering, and \(3B_P\)
input ports (psum) for a partial result of the linear sum for
the separable approximation. The ports hpout$$ are used
when the filter size is larger than \(W_{FH}\), and the ports psum
are used when the approximation order is larger than \(N_{FH}\). It
requires \(3B_P\) output ports for the result from the binary
adder, and \(3N_{FH} B_P\) output ports for the partial results from
each filter before the binary adder. The latter outputs are
used together with hpout$$ when the filter size is larger than
\(W_{FH}\). The secondary register files are provided for updating
the filter coefficients without an interruption to the filtering.
Unlike vertical filtering, the coefficients are fixed as long as
the orientational filter being approximated is the same.
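The individual port counts listed above add up to the totals \((5N_{FH} + 4)B_P\) and \((3N_{FH} + 3)B_P\) given in the HFC component table. A short sketch of that arithmetic follows; the function name and dictionary labels are illustrative, not signal names from the design.

```python
def hfc_port_counts(n_fh, b_p):
    """Tally HFC input and output port bits from the per-signal counts in the text."""
    inputs = {
        "pixel inputs from vertical filtering": 2 * b_p * n_fh,
        "coeff": b_p,
        "hpout$$": 3 * b_p * n_fh,
        "psum": 3 * b_p,
    }
    outputs = {
        "binary adder result": 3 * b_p,
        "per-filter partial results": 3 * n_fh * b_p,
    }
    return sum(inputs.values()), sum(outputs.values())


if __name__ == "__main__":
    for n_fh, b_p in ((1, 8), (4, 8), (6, 12)):     # illustrative values only
        total_in, total_out = hfc_port_counts(n_fh, b_p)
        print(n_fh, b_p, total_in, total_out)
```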
Component Count
The table below lists the major components in a HFC and
shows their sizes and counts.
Component Count of the Horizontal Filter Chip
Component Type    Count                       Size (bits)
Multiplier        \(W_{FH} N_{FH}\)           \(2B_P \times B_P\)
Adder             \(W_{FH} N_{FH}\)           \(2B_P\)
D flip-flop       \(N_{FH} W_{FH} + 1\)       \(3B_P\)
Register File     \(2W_{FH}\)                 \(B_P\)
Input Ports       \((5N_{FH} + 4)B_P\)        1
Output Ports      \((3N_{FH} + 3)B_P\)
Global Bus        \(N_{FH} + 1\)              \(2B_P\)
multiplexer (decimate) is 0, and the input to the horizontal
filter is the external input, hin, which is supplied from a
reorder buffer.
It requires the following input ports: \(B_P W_{FS}\) for parallel
input pixels (in$$), \(2B_P\) for hin, and \(B_P\) for filter coefficients
(coeff). It requires the following output ports: \(2B_P\) for vout
and \(3B_P\) for the filter output (out).
Architecture
FIG. 17 shows the VLSI architecture of
SV/OSD with FN orientational filters being implemented. 
Each box corresponds to one VLSI chip. Every HFC obtains 
the same vout inputs from VFC, which may not be clear 
from the figure. 
Only the input buffer module is affected by the input
image size. Thus, the architecture satisfies the first scalability
criterion. The number of orientational filters implemented can be
increased by adding more HFC modules as shown at 190 in
FIG. 20. Thus, the second scalability criterion is satisfied. When the
filter size M is larger than \(W_{FV}\) and \(W_{FH}\), additional VFCs
and HFCs can be connected as shown. Each VFC obtains
inputs (in$$) from distinct memory banks. The outputs of
the first VFC are connected to the input (vpout$$) of the next
VFC. The last VFC of the sequence produces the results of
the vertical filtering which are passed to HFCs in the same
way as in FIG. 19. Multiple HFCs are connected in cascade.
Every HFC obtains the same vout$$ from the last VFC. Each
HFC except the last one produces partial results from each
filter (hout$$), which are connected to hpout$$ of the next
HFC in the chain. The last HFC produces the complete result
of the separable approximation. The architecture satisfies the
third scalability criterion. When the approximation order P
is larger than \(N_{FV}\) and \(N_{FH}\), additional VFCs and HFCs can
be connected to existing VFCs and HFCs as shown in FIG.
19. Every VFC receives the same set of inputs from the
memory module. There is no interconnection among VFCs.
The output of a VFC (vout$$) is connected to vout$$ of the
corresponding HFC. In the HFC chain, the output hout is
connected to psum of the next HFC, if any. The last HFC
produces the complete result of the separable approximation.
The architecture satisfies the fourth scalability criterion.
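The filter size scaling described above relies on chaining partial sums from one chip to the next. The following is a minimal behavioral sketch for the vertical (parallel) case, assuming each chip adds an externally supplied partial result to its own partial sum; the function names and sample data are illustrative.

```python
def vfc_chip(pixels, coeffs, vpout=0):
    """One chip: a partial sum over its taps plus the external partial result."""
    return vpout + sum(c * p for c, p in zip(coeffs, pixels))


def cascaded_vertical_filter(column, coeffs, chip_width):
    """Split an M-tap filter across ceil(M / chip_width) chips chained via vpout."""
    partial = 0
    for start in range(0, len(coeffs), chip_width):
        partial = vfc_chip(column[start:start + chip_width],
                           coeffs[start:start + chip_width],
                           partial)
    return partial


if __name__ == "__main__":
    column = [1, 2, 3, 4, 5, 6, 7]             # one pixel per memory bank
    coeffs = [2, -1, 0, 3, 1, -2, 4]           # a 7-tap filter, wider than one chip
    direct = sum(c * p for c, p in zip(coeffs, column))
    print(direct, cascaded_vertical_filter(column, coeffs, chip_width=3))
```

The cascaded result equals the direct dot product for any chip width, which is why no new chip design is needed when the filter grows.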
SWA (Decimated MRD) 
Implementation of SWA can be divided into three modules:
pre-filter, low-pass filter, and high-pass filter modules.
Both the pre-filter and low-pass filter are separable filters,
and each of them can be implemented with a single VLSI
chip. The high-pass filters are implemented using SV/OSD.
Pre-filter Unit 
Component Count 
The table below lists the major components in a SFC and 
shows their sizes and counts. 
TABLE 9-3
Component Count of the Separable Filter Chip
Component Type    Count                   Size (bits)
Multiplier I      \(W_{FS}\)              \(B_P \times B_P\)
Multiplier II     \(W_{FS}\)              \(2B_P \times B_P\)
Adder I           \(W_{FS} - 1\)          \(2B_P\)
Adder II          \(W_{FS} - 1\)          \(3B_P\)
D flip-flop I     \(W_{FS}\)              \(2B_P\)
D flip-flop II    \(W_{FS} - 1\)          \(3B_P\)
multiplexer       1                       \(2B_P\)
Register File     \(4W_{FS}\)             \(B_P\)
Input Ports       \((W_{FS} + 3)B_P\)     1
Output Ports      \(5B_P\)
Global Bus        \(3B_P\)
Low-pass Filter Unit 
The SFC shown in FIG. 21 can be used for low-pass
filtering. The size of the low-pass filter is \((k+2) \times (k+2)\) for a k-th
order basic spline. Thus, the size of the low-pass filter is a
little larger than the pre-filter. A typical filter length for the
low-pass filter is 9x9.
High-pass Filter Unit 
The high-pass filters are implemented with SV/OSD. The 
structure shown at 182 in FIG. 17 can be used. 
Architecture 
FIG. 22 shows a VLSI architecture to implement SWA for
decimated MRD at 196.
The size of each memory bank at the k-th level of decomposition
is \(N/2^{k-1}\) due to a decimation of the output. The
host/local_controller provides a write signal to each memory
bank. The decimation at the k-th stage can be performed by
providing a write signal to the memory banks so that only every
other row of outputs is stored, and within each stored row only
every other column is stored. No decimation is done after the
pre-filtering.
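Decimation by write signal can be modelled as a predicate on the row and column index of each output pixel. The following is a minimal sketch, assuming the even-indexed rows and columns are the ones kept (the text does not say which phase is stored); the function names are illustrative.

```python
def write_enable(row, col):
    """Assumed write signal: asserted on even rows and even columns only."""
    return row % 2 == 0 and col % 2 == 0


def decimate_by_write_signal(image):
    """Store only the pixels for which the write signal is asserted."""
    stored = []
    for r, row in enumerate(image):
        kept = [pixel for c, pixel in enumerate(row) if write_enable(r, c)]
        if kept:
            stored.append(kept)
    return stored


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    for row in decimate_by_write_signal(image):
        print(row)
```

No data is moved or filtered to achieve the decimation in this model; the unwanted pixels are simply never written, which is why the memory bank size halves at each level.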
Only the memory chips have to be replaced when the
input image size becomes large. Thus, the first scalability
criterion is satisfied. The number of orientational filters implemented
can be increased by adding more HFCs in the
high-pass filter module. Thus, the second scalability criterion
is satisfied. The sizes of the pre-filter and low-pass filter are
independent of the size of the orientational filters and are
dependent on the order of the basic spline used in the
approximation. The size of the high-pass filters is the same
as the size of the orientational filters. The third and fourth
The pre-filter is a separable filter. Note that the size of the
pre-filter is \(k \times k\) when a k-th order basic spline is used for the
approximation, and the spline order is typically set to 7.
Assume \(W_{FS}\) is the width/height of the filter implemented in
the separable filter chip (SFC) shown at 192 in FIG. 21. The
chip contains \(2W_{FS}\) multipliers and \(2W_{FS} - 2\) adders. The
result of the vertical filter (vout) is an external output and the
input to the horizontal filter (muxout) can be either vout or
the external signal, hin. This extra multiplexing is needed for
undecimated MRD. For the decimated MRD, the select
signal for the multiplexer (decimate) is 1, and the input to the
horizontal filter is the immediate output from the vertical
filter. For the undecimated MRD, the select signal for the
scalability criteria can be accomplished by employing a 
scheme such as that shown in FIG. 20 for the high-pass filter 
module. 
Undecimated SWA
Input Buffer
At the k-th level of decomposition, the total amount of data at
the input buffer increases due to dilation of filters to \(2^{k-1}MN\),
which is divided into M memory banks each having a size
of \(2^{k-1}N\). Consider each memory bank as a 2D memory
array with the number of columns being N and the number
of rows being \(2^{k-1}\). For the first N cycles, each memory bank
outputs the data from the first row. For the second N cycles,
it outputs the data from the second row. After \(2^{k-1}N\) cycles,
the memory access returns to the first row. This way of
dividing the memory buffer removes the M \(2^{k-1}\)-way multiplexers.
The input pixel is written to the input buffer at a
consecutive location in the first memory bank for \(2^{k-1}N\)
cycles, and the access moves to the next bank, and so on.
After \(2^{k-1}MN\) cycles, the access returns to the first memory
bank. The host or the local_controller has to provide proper
addresses to each memory bank.
Reorder Buffer
As described above, a reorder buffer stores one row of
data after a vertical filtering. The reorder buffer and a reorder
buffer address generator provide a sequence of data to
horizontal filters.
Architecture
FIG. 20 shows the VLSI architecture for undecimated
SWA. The host/local_controller generates all the addresses
to the reorder buffers as well as the input buffers instead of
having a separate address generator. The reorder buffers in
the k-th level of the high-pass filters can be shared among the
horizontal filters in the level. Note that in a low-pass filter
module, the SFC provides an input to the reorder buffer, and
the output of the reorder buffer is fed back to the SFC for
horizontal filtering. An input signal, decimate, at the SFC
has to be 0 for this configuration. The architecture satisfies
all the scalability criteria.
Summary
SV/OSD
Two schemes may be used to perform a 1D filter operation:
pipelined filtering and parallel filtering. For 2D separable
filtering, the former is suitable for a filter whose
direction aligns parallel to the input sequence, and the latter
is suitable for a filter whose direction is perpendicular to the
input sequence. Assuming the inputs are in a raster order,
pipelined filtering is suitable for horizontal filtering, and
parallel filtering is suitable for vertical filtering.
In order to make use of the implementation advantage of
SV/OSD, the orthogonal filters have to be performed on an
input image before the projection filters so that the outputs
of the orthogonal filters can be shared among multiple sets
of projection filters. In this case, the system requires \(P(F_N + 1)\)
filters. If the filter order is reversed, the system requires
\(2PF_N\) filters. The amount of intermediate storage can be
reduced to NM if the orthogonal filters are the vertical filters,
and are performed before the projection filters which are
horizontal filters.
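The two filter counts quoted above can be compared directly. The following short sketch assumes P shared orthogonal filters plus P projection filters per orientation in the first ordering, and a full pair of filters per term and per orientation in the reversed ordering; the function names are illustrative.

```python
def filters_orthogonal_first(p, f_n):
    """P shared orthogonal filters plus P projection filters per orientation."""
    return p + p * f_n                      # equals P * (F_N + 1)


def filters_projection_first(p, f_n):
    """Nothing is shared: two filters per term for each orientation."""
    return 2 * p * f_n


if __name__ == "__main__":
    p = 3                                   # illustrative approximation order
    for f_n in (1, 2, 8, 16):
        print(f_n, filters_orthogonal_first(p, f_n), filters_projection_first(p, f_n))
```

The two orderings tie for a single orientation, and the saving from sharing approaches a factor of two as the number of orientational filters grows.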
FSA+SV/OSD 
FSA+SV/OSD can be implemented with a modification to
the architecture for SV/OSD shown in FIG. 17. The interpolation
module has to be designed and placed after the
horizontal filters. The module interpolates the outputs from
the basis filters to steer the filter. The module requires Q
multipliers and Q-1 adders assuming the order of FSA is Q.
Interpolation Unit
An interpolation unit takes an output of every basis filter
as an input every cycle, and multiplies the output with an
interpolation coefficient. The results of the multiplications
are added together at a binary adder. FIG. 16 depicts this
computational scheme. The system requires multiple interpolation
units when it implements multiple steerable filters.
It can be seen from FIG. 16 that the VFC can be used to
implement interpolation units. The parallel inputs (in$$)
come from the set of basis filters, the register file contains
the interpolation coefficients, and each unit in VFC is
assigned to steer the orientational filter to a certain orientation.
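The interpolation unit amounts to a weighted sum of the basis filter outputs for each pixel. The following is a minimal behavioral sketch, assuming the interpolation coefficients are the cosine and sine steering functions of the Fourier series approximation; the function names and the two-filter example are illustrative.

```python
import math


def steering_coefficient(kind, n, theta):
    """Coefficient for a basis filter of kind 'alpha' or 'beta' and harmonic n."""
    if kind == "alpha":
        return 0.5 if n == 0 else math.cos(n * theta)
    return math.sin(n * theta)


def interpolation_unit(basis_outputs, coefficients):
    """Q multiplications followed by an adder, modelled as a dot product."""
    if len(basis_outputs) != len(coefficients):
        raise ValueError("one coefficient is needed per basis filter output")
    return sum(o * c for o, c in zip(basis_outputs, coefficients))


def steer_pixel(basis_outputs, basis_kinds, thetas):
    """One interpolation unit per requested orientation, sharing the same inputs."""
    return [interpolation_unit(basis_outputs,
                               [steering_coefficient(k, n, t) for k, n in basis_kinds])
            for t in thetas]


if __name__ == "__main__":
    kinds = [("alpha", 1), ("beta", 1)]     # a hypothetical two-filter basis
    outputs = [3.0, 4.0]                    # basis filter outputs at one pixel
    print(steer_pixel(outputs, kinds, [0.0, math.pi / 4, math.pi / 2]))
```

Each additional orientation adds only another set of coefficients and another dot product over the same basis outputs, which matches the use of one filter unit of a VFC per steered orientation.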
Architecture 
The VLSI architecture for FSA+SV/OSD is shown at 200
in FIG. 23. The only difference from FIG. 17 is the additional
VFC after the set of HFCs. The first, third and fourth
scalability criteria can be accomplished in the same way as
SV/OSD as previously discussed. When the number of
orientations the system must steer exceeds \(N_{FV}\), multiple
VFCs are needed after HFCs as shown in FIG. 23. Thus the
architecture satisfies all scalability requirements.
Upon reading of the foregoing description, it should be
appreciated that SV/OSD guarantees a linear convergence of
the approximation. The approximation converges to the
original filter as the approximation order approaches M. The
speed of convergence typically is faster than linear
convergence, and approaches exponential convergence.
SWA
Separable Wavelet Approximation (SWA) may be utilized
for an efficient decimated MRD scheme. It is a combination
of separable approximation and wavelet approximation. The
amount of computation for an L level decomposition using
the method is approximately \(4F_N N^2 PM/3\), the throughput is
\(1/t_m\), the latency is approximately \(2^{L-1}N(M_R + M)t_m\), and the
amount of storage required is less than \(N(M_I + 2\max(M, M_h))\).
SWA can be modified to perform an undecimated MRD.
The scheme employs an intermediate buffer called a reorder
buffer, at the output end of each vertical filter, to provide a
reordered input stream to the horizontal filter. The reorder
buffer arranges the order of a pixel stream so that the
horizontal filtering can be done using a pipelined filtering
scheme. The size of each reorder buffer is N. The system
requires \(L(F_N P + 1) + 1\) reorder buffers and L reorder buffer
address generators. The latency of the decomposition is the
same as the decimated SWA case, which is approximately
\(2^{L-1}N(M_R + M)t_m\). The throughput is \(1/t_m\), the amount of
computation is \(2N^2 M_I + 2LN^2 M_h + LN^2 MP + L F_N N^2 MP =
2N^2 M_I + LN^2(2M_h + MP + F_N MP)\), and the amount of storage is
\(N M_I + LN\max(M, M_h) + NL(F_N P + 1)\).
SWA uses a basic spline function as an interpolation filter.
The lengths of the pre-filter and the low-pass filter in SWA
depend on the order of the basic spline. If a k-th order basic
spline is used, then the lengths of the pre-filter and the low-pass
filter are k and k+2 respectively. Based on performance
evaluation, a basic spline of order 7 achieves a good
performance/computation trade-off.
System Integration 
Finally the VLSI architecture for FSA+SWA is consid-
ered. It achieves the implementation criteria, MRD 
capability, and steerability. The architecture is constructed 
by modifying the architecture for SWA. Both decimated and 
undecimated FSA+SWA are considered here.
Decimated FSA+SWA
The architecture for decimated FSA+SWA is shown at 
202 in FIG. 24. 
Undecimated FSA+SWA 
The architecture for undecimated FSA+SWA is shown at 
204 in FIG. 25. 
Steerable Implementation 
Fourier Series Approximation (FSA) exhibits a high
degree of flexibility in the filter selection process, since the
real and the imaginary parts are separate entities in the
selection process. SV/OSD can be combined with FSA
(FSA+SV/OSD) for an efficient steerable filter implementation
with the throughput of the system being \(1/t_m\), the
latency being O(NM), the amount of computation being
\(N^2(MQP + F_N)\), and the storage requirement being NM.
Also, FSA and SWA can be combined (decimated FSA+SWA)
for an efficient decimated steerable MRD scheme,
with the throughput of the system being \(1/t_m\), the latency
being \(2^{L-1}N(M_R + M)t_m\), the amount of computation being
\(4N^2 PQM/3\), and the storage requirement being \(N(M_I + \max(M, M_h))\).
FSA and SWA can be combined (undecimated
FSA+SWA) for an efficient undecimated steerable MRD
scheme, with the throughput of the system being \(1/t_m\), the
latency being \(2^{L-1}N(M_R + M)t_m\), the amount of computation
being \(2N^2 M_I + LN^2(2M_h + MP + MPQ)\), and the storage
requirement being \(N M_I + LN\max(M, M_h) + NL(PQ + 1)\).
Hardware
Six VLSI architectures have been defined, including ones
for (1) SV/OSD, (2) decimated MRD using SWA, (3)
undecimated MRD using SWA, (4) FSA+SV/OSD, (5) a
decimated steerable MRD using FSA+SWA, and (6) an
undecimated steerable MRD using FSA+SWA. Each of
these systems satisfies the four identified scalability criteria.
What is claimed is:
1. A method of processing video data, comprising the
steps of:
A. providing a processor for processing input pixel data;
B. inputting filter parameters into said processor;
C. computing a set of 2D filter coefficients having a
specific orientation from said filter parameters;
D. generating a single digital filter approximation matrix
based on said computed filter coefficients; and
E. decomposing said approximation matrix into a plurality
of separable digital filters by utilizing a set of
orthogonal filter coefficients for each of said plurality
of filters that provides an optimal approximation of
each of said plurality of filters; and
F. increasing a dilation factor of said digital filters by a
predetermined factor in order to generate an undecimated
multi-resolution image decomposition output.
2. The method of claim 1, wherein said dilation factor
comprises a 2x2 dilation factor.
3. The method of claim 1, further comprising the steps of:
G. filtering pixel data;
H. outputting said filtered pixel data;
I. adjusting a dilation factor of said plurality of digital
filters by a predetermined factor; and
J. repeating steps A-I for N levels of decomposition,
where N is a positive, nonzero integer.
4. The method of claim 1, further comprising the steps of:
G. formulating a basic spline function to obtain pre-filter
and low-pass filter coefficients; and
H. approximating a set of 1D high-pass filters by decomposing
said basic spline function into separable functions.
5. The method of claim 1, further comprising the step of
increasing a decimation factor of said filtered data in conjunction
with said step of increasing a dilation factor of the
orientation filters to provide a decimated multi-resolution
decomposition filter output.
6. The method of claim 1, wherein said step of increasing
a dilation factor comprises:
generating two-dimensional mother wavelet functions;
decomposing the mother wavelet functions into separable
one-dimensional functions;
generating a basic spline function;
generating a low-pass filter based on a spline order,
associated with said basic spline order function; and
approximating each of the separable one-dimensional
filters in terms of said basic spline function to generate
said plurality of digital filters.
7. The method of claim 6, wherein said step of approximating
each of said separable one-dimensional functions
comprises approximating each of said separable one-dimensional
functions to generate interpolation, low-pass
filters, and high-pass filters for recursive multi-resolution
decomposition.
8. The method of claim 6, wherein said step of generating
a basic spline function comprises the step of generating high
pass filters in a horizontal orientation using the basic spline
function given by the equations
and generating high-pass filters in the vertical direction
using
where said above functions \((a_i, b_i)\) are derived from a mother
function of dyadic wavelets decomposed into a separable
fashion given by the equation:
\[\Psi_\theta(x, y) \approx \sum_{i=1}^{P} a_i(x)\,b_i(y)\]
9. The method of claim 1, wherein said step of decomposing
comprises decomposing said approximation matrix
into a plurality of sets of horizontal filters and at least one
vertical filter that are implemented in a vertical-horizontal
filtering scheme.
10. The method of claim 9, further comprising the step of
inputting filtered data into a reorder buffer after the step of
filtering the data through a vertical filter to minimize shift
delay between the vertical and horizontal filters and to
produce an undecimated filtered data output.
* * * * *
* * * * * 
