VLSI architecture for an Underwater Robot Vision System by Ila, Viorela et al.
Oceans - Europe 2005 
VLSI Architecture for an Underwater Robot Vision 
System 
Viorela Ila and Rafael Garcia
Institute of Informatics and Applications
Campus Montilivi, 17071 Girona, Spain
Email: {viorela,rafa}@eia.udg.es
Francois Charot
IRISA
Campus de Beaulieu, 35042 Rennes, France
Email: francois.charot@irisa.fr
Abstract-It is well known that image processing requires a huge amount
of computation, mainly in low-level processing, where the algorithms
deal with a great number of data (pixels). One way to estimate motion
involves detecting the correspondences between two images. For
normalised correlation criteria, previous experiments have shown that
the result is not altered in the presence of nonuniform illumination.
Usually, hardware for motion estimation has been limited to simple
correlation criteria. The main goal of this paper is to propose a VLSI
architecture for motion estimation using a matching criterion more
complex than the Sum of Absolute Differences (SAD). Today's hardware
devices provide many facilities for the integration of increasingly
complex designs, as well as the possibility to communicate easily with
general-purpose processors.
0-7803-9103-9/05/$20.00 ©2005 IEEE
I. INTRODUCTION
Presently, different methods for motion estimation and localisation of
an underwater vehicle exist, mainly based on acoustic sensor networks.
This strategy is relatively expensive, since transponders have to be
deployed from a ship, calibrated and recovered after the mission. This
procedure is therefore not adequate for low-cost, small-size underwater
vehicles such as URIS (Underwater Robotic Intelligent System), developed
at the University of Girona (see figure 1). One cost-effective
alternative is to equip the vehicle with a down-looking camera, which
acquires seafloor images while the robot is performing its mission. This
camera provides rich visual information which can be used for vehicle
motion estimation. The sequence of images acquired by the camera mounted
on the robot can be used to construct a map of the zone surveyed by the
submersible. This map is a composite image constructed by aligning a set
of smaller consecutive images. An example of part of an underwater
mosaic image is presented in figure 2.
Fig. 1. URIS underwater vehicle, developed at the University of Girona.
In most cases the process involves recovering the motion of the vehicle
by means of grey-level correlation [1] or optical flow [2].
Unfortunately, underwater images are difficult to process due to the
characteristics of the transmission medium. Blurring of image elements,
clutter and nonuniform illumination are some of the problems present in
underwater imaging.
Our aim in the present work is, starting from an analysis of VLSI
architectures for motion estimation, to develop efficient Field
Programmable Gate Array (FPGA) hardware implementations of a motion
estimation algorithm for the particular case of underwater images. In
order to enable hardware implementation, a number of transformations of
the algorithm may be required. The algorithm and the required
transformations are introduced in section II. Section III describes our
proposed architectures for real-time motion estimation on underwater
images. The implementation would allow motion to be estimated at video
rate, as described in section IV. The paper ends with the conclusions
and further work.
II. REAL TIME MOTION ESTIMATION
When an autonomous robot performs a real mission, it needs to perceive
its environment in real time. In this application the time constraints
are defined by the frame rate of a low-cost PAL video camera mounted on
the vehicle. A frame rate of 25 frames per second corresponds to 0.04
seconds per frame, the time between the acquisition of the first and the
last pixel of a frame. Nowadays, different solutions exist for solving
real-time problems. Software solutions running under Real Time Operating
Systems (RTOS) provide quite good results when time constraints are
critical. When time requirements exceed the capabilities of an RTOS, a
possible solution is to customise the implementation using Application
Specific Integrated Circuits (ASIC). These devices should be considered
only when the applications are specific and have a simple
implementation. Alternative solutions like microprocessors have been the
dominant devices in use for general-purpose computing for the last
decade. Image processing tasks, however, involve complex algorithms
applied to a large amount of data. Reconfigurable devices take the best
of both worlds, combining the flexibility of software with the execution
speed of hardware. Thus, hardware architectures based on these devices
can be a good trade-off for achieving real-time performance in robotic
applications. The evolution of reconfigurable device technology in the
last few years permits complex design integration, high parallelism and
hardware-software codesign.
Fig. 2. Sample underwater mosaic image.
Fig. 3. Correspondences between current and reference frame detected
using normalised correlation criteria.
Authorized licensed use limited to: UNIVERSITAT DE GIRONA. Downloaded on April 26,2010 at 10:39:14 UTC from IEEE Xplore.  Restrictions apply. 
A. Matching Criteria 
The motion estimation algorithm considers two images: the current image
$I_c$ acquired by the camera and a reference image $I_r$ coming from the
control system. Point correspondences between the current image and a
previous reference image have to be found in order to compute a motion
estimation matrix. Often this requires detecting features in one image
and matching them in the other. The selection of features may depend on
the application, although points are commonly used because they can be
easily extracted and are quite robust to noise [3]. A correlation
algorithm provides, for each interest point $(x_c, y_c)$ of the current
image, its corresponding match $(x_r, y_r)$ in the reference image (see
figure 3). The correlation score is defined as the covariance between
the grey levels of a region defined by the correlation window in the
current image and the same region defined in the reference image. The
algorithm searches for all similar patches inside the corresponding
search window. A normalised correlation criterion C, which ensures that
the result is not altered in the presence of nonuniform illumination, is
shown in equation (1). In our previous work we successfully applied this
criterion to underwater images in the presence of nonuniform
illumination [4]. The correlation score between a point $(x_c, y_c)$ in
the first image $I_c$ and a point $(x_r, y_r)$ in the second image $I_r$
is defined as:
\[
C = \frac{\sum_{i=-a}^{a} \sum_{j=-a}^{a} \left( I_c(x_c+i,\, y_c+j) - \bar{I}_c(x_c, y_c) \right) \left( I_r(x_r+i,\, y_r+j) - \bar{I}_r(x_r, y_r) \right)}{(2a+1)^2 \sqrt{\sigma^2(I_c)\, \sigma^2(I_r)}} \tag{1}
\]
where $(2a+1) \times (2a+1)$ is the size of the correlation window,
$\bar{I}_c(x_c, y_c)$ and $\bar{I}_r(x_r, y_r)$ are the average
intensities of the two correlation windows, and $\sigma^2(\cdot)$
denotes the variance of each window. The correlation algorithm computes
the correlation score of each pixel within the search window and selects
the highest one.
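The search described above can be sketched in a few lines of Python. This is only an illustrative software model, not the implementation used in this work: the function names, the row-major `img[y][x]` indexing, the displacement range `p` and the convention of returning 0 for a flat (zero-variance) window are all assumptions made for the example.

```python
import math

def window(img, x, y, a):
    """Grey levels of the (2a+1) x (2a+1) window centred on (x, y)."""
    return [img[y + j][x + i] for j in range(-a, a + 1) for i in range(-a, a + 1)]

def correlation_score(cur, ref, xc, yc, xr, yr, a):
    """Normalised correlation of equation (1) between two windows."""
    wc = window(cur, xc, yc, a)
    wr = window(ref, xr, yr, a)
    n = len(wc)                          # (2a+1)^2 samples
    mc = sum(wc) / n                     # average intensity, current window
    mr = sum(wr) / n                     # average intensity, reference window
    var_c = sum((v - mc) ** 2 for v in wc) / n
    var_r = sum((v - mr) ** 2 for v in wr) / n
    if var_c == 0 or var_r == 0:         # assumption: flat window scores 0
        return 0.0
    cov = sum((c - mc) * (r - mr) for c, r in zip(wc, wr))
    return cov / (n * math.sqrt(var_c * var_r))

def best_match(cur, ref, xc, yc, a, p):
    """Scan displacements in [-p, p] x [-p, p]; return (score, xr, yr)."""
    best = (-2.0, xc, yc)                # scores always lie in [-1, 1]
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            s = correlation_score(cur, ref, xc, yc, xc + dx, yc + dy, a)
            if s > best[0]:
                best = (s, xc + dx, yc + dy)
    return best
```

Because the window mean is subtracted and the result is divided by the standard deviations, the score does not change when one window is scaled by a positive gain and shifted by an offset, which is the property exploited under nonuniform illumination. The caller is expected to keep all windows inside the image bounds.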
Correlation algorithms have important properties such as regularity and
modularity. Thus, they can be broken down into computational blocks
which can be processed in parallel. We propose a decomposition of
criterion C for its parallelisation. There are five sums to be computed
in equation (1): sum1, sum2, sum3, sum4 and sum5.
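\[
\begin{aligned}
\mathrm{sum}_1 &= \sum_{i=-a}^{a} \sum_{j=-a}^{a} I_c(x_c+i,\, y_c+j) \\
\mathrm{sum}_2 &= \sum_{i=-a}^{a} \sum_{j=-a}^{a} I_c(x_c+i,\, y_c+j)^2 \\
\mathrm{sum}_3 &= \sum_{i=-a}^{a} \sum_{j=-a}^{a} I_c(x_c+i,\, y_c+j) \cdot I_r(x_r+i,\, y_r+j) \\
\mathrm{sum}_4 &= \sum_{i=-a}^{a} \sum_{j=-a}^{a} I_r(x_r+i,\, y_r+j)^2 \\
\mathrm{sum}_5 &= \sum_{i=-a}^{a} \sum_{j=-a}^{a} I_r(x_r+i,\, y_r+j)
\end{aligned} \tag{2}
\]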
Then, equation (1) becomes: 
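\[
C = \frac{(2a+1)^2 \cdot \mathrm{sum}_3 - \mathrm{sum}_1 \cdot \mathrm{sum}_5}{\sqrt{\left[ (2a+1)^2 \cdot \mathrm{sum}_2 - \mathrm{sum}_1^2 \right] \cdot \left[ (2a+1)^2 \cdot \mathrm{sum}_4 - \mathrm{sum}_5^2 \right]}} \tag{3}
\]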
This decomposition of equation (1) leads to an easy parallel
implementation, since each Processing Element (PE) of the architecture
computes these five sums in parallel. The Post Processing Element (PPE)
then performs the remaining computation.
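As a minimal software sketch of this split, the following Python code accumulates the five sums in one pass over the window and then evaluates equation (3) from them. The function names, the `img[y][x]` indexing and the choice of returning 0 when a window is flat are assumptions of the example, not part of the proposed hardware.

```python
import math

def five_sums(cur, ref, xc, yc, xr, yr, a):
    """Accumulate sum1..sum5 of equation (2) in a single pass."""
    s1 = s2 = s3 = s4 = s5 = 0
    for j in range(-a, a + 1):
        for i in range(-a, a + 1):
            c = cur[yc + j][xc + i]
            r = ref[yr + j][xr + i]
            s1 += c          # accumulation
            s2 += c * c      # multiply-accumulation
            s3 += c * r      # multiply-accumulation
            s4 += r * r      # multiply-accumulation
            s5 += r          # accumulation
    return s1, s2, s3, s4, s5

def score_from_sums(s1, s2, s3, s4, s5, a):
    """Equation (3): the work left to the post-processing stage."""
    n = (2 * a + 1) ** 2
    den = (n * s2 - s1 * s1) * (n * s4 - s5 * s5)
    if den <= 0:             # assumption: a flat window scores 0
        return 0.0
    return (n * s3 - s1 * s5) / math.sqrt(den)
```

The inner loop contains exactly two plain accumulations and three multiply-accumulations per pixel pair, and no window mean has to be known in advance, which is what makes the form of equation (3) convenient for a streaming array of PEs.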
B. Linear Arrays for Motion Estimation
Our approach is based on ideas applied in VLSI architectures for the
motion compensation algorithms used in video coding standards. Full
Search Block Matching Algorithms (FSBMA) are used for motion estimation
in such applications [5]-[7]. In these algorithms the current frame is
divided into blocks of n x n pixels, and for every block the algorithm
searches for similar blocks in the previous frame within a search area
of size (2p + n) x (2p + n). In full search the algorithm has a very
regular data flow for the search area [8]. These data are used
repeatedly in the computation for different candidates, which can
simplify the architecture.
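For reference, a full search over one block can be sketched as follows, using the Sum of Absolute Differences as the matching criterion. This is a generic illustration of FSBMA rather than any of the cited architectures; the function names and the `img[y][x]` indexing are assumptions, and the caller must keep the whole (2p + n) x (2p + n) search area inside the previous frame.

```python
def sad(cur, prev, bx, by, dx, dy, n):
    """Sum of Absolute Differences between an n x n block of the current
    frame at (bx, by) and the block displaced by (dx, dy) in the previous frame."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - prev[by + dy + j][bx + dx + i])
    return total

def full_search(cur, prev, bx, by, n, p):
    """Evaluate all (2p+1)^2 candidate displacements; return (cost, dx, dy)."""
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            cost = sad(cur, prev, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Neighbouring candidates share all but one row or column of search-area pixels, which is the data reuse that the array architectures discussed in this section exploit.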
Komarek and Pirsch [9] analysed four different systolic arrays for
FSBMA mapping. Linear or quadratic array structures can be used for the
implementation of the algorithm. Moreover, two possible ways of
computation arise from the associativity of the operations in the
algorithm. The four resulting array structures are denoted AB1 and AS1
for linear arrays, and AB2 and AS2 for quadratic arrays. These
architectures exploit concurrency using different structures and numbers
of PEs.
Fig. 4. Linear structures: a) AB1-type; b) AS1-type.
The complexity of the motion estimation algorithm of equation (3)
introduces some restrictions in choosing an adequate array architecture.
For instance, a quadratic array is suitable only in the case of a simple
PE architecture. When more computation must be done in parallel, it can
consume a lot of resources, which are sometimes not available. This is
the reason why we restrict our analysis to linear arrays; AB1-type and
AS1-type are shown in figure 4.
One solution to reduce the latency introduced by both AB1 and AS1
structures is to increase the memory access bandwidth. When every PE is
supplied with data coming from external memory, idle cycles can be
avoided. When access to external memory is constrained, the latency can
also be reduced by controlling the data flow through the array using
registers and multiplexers. This strategy was applied by Vos [7], Roma
[6] and Hsieh [10] in order to make the AB2 quadratic architecture more
efficient, and by Yang [11] for linear arrays. When different reference
and search data flow into each PE and the intermediate results are
broadcast through the array, this latency can be reduced, as pointed out
in table I.
Our approach estimates the motion of certain points of the image, which
reduces the number of operations but increases the complexity of memory
addressing. Due to the complexity of the chosen motion estimation
algorithm, the output bandwidth is also restricted, since a
post-processing calculation must be performed. The goal is to design a
motion estimation architecture that reduces the input/output bandwidth
while maintaining hardware efficiency and a reduced execution time. The
memory throughput can be reduced by adding many registers for the search
area and also for the reference block. Vos [7] proposed reducing the
space occupied in the device by replacing the registers with memory
blocks, which halves the silicon occupancy. With current FPGA
technology, the available embedded memory blocks can save a large number
of logic elements. Apart from its low memory bandwidth, the solution
proposed by Yang [11] also reduces the latency, making it the most
efficient of the proposed linear arrays. Like any other solution, it
also has some small drawbacks. One is that, for high hardware
utilisation, the restriction (2p + 1) = n must be satisfied.
III. VLSI ARCHITECTURE FOR NORMALISED CORRELATION
While in FSBMA the image is divided into blocks and the algorithm looks
for matches of every block in a search area, our approach looks for
correspondences of areas surrounding interest points. These are scene
features which can be reliably found when the camera moves from one
location to another, even when the lighting conditions of the scene
change. In our previous work we proposed a real-time implementation of
interest point detection [12]. Due to its simplicity, the Sum of
Absolute Differences (SAD) has been the most extensively used matching
criterion in VLSI implementations. In the case of underwater imaging,
our previous work [4] showed that, by applying normalised correlation
criteria to find matches in pairs of images, the result is invariant to
nonuniform illumination. The complex error measurement computation is
shared between two computational elements: an array of Processing
Elements (PE), each PE performing two accumulations and three
multiply-accumulations, and a Post Processing Element (PPE) containing
multipliers, subtractors, a square root unit and a divider to compute
the error measurement.
The VLSI architectures for motion estimation presented above can be
easily adapted to our algorithm. The complexity of the processing
element led us to set aside the quadratic arrays, so that only linear
arrays are analysed in this proposal. Figure 5 shows the AB1 and AS1
structures adapted to our design. The architectures correspond to an
experimental correlation window size of 3 x 3 and a search window size
of 7 x 7.
As can be seen in figure 5 b), a reduced AS1 structure is analysed,
where each PE has a search window datum input from the memory and the
correlation window data are broadcast through the array. Compared with
AB1, this strategy reduces the number of time cycles but increases the
number of processing elements.
The AB1-type architecture is suitable for applications where large
motion vectors must be estimated, which imply big search windows. When
the application requires faster processing, the AS1-type architecture
can achieve higher performance than the AB1-type for small search window
sizes. The approach introduced by Yang et al. [11] is an important
contribution to reducing the memory throughput compared with both AB1
and AS1. Figure 6 shows this strategy applied to our algorithm. Another
advantage of this architecture is that there is only one delay cycle
between the results corresponding to each candidate in a line, while in
the AB1 structure there are (n - 1) delay cycles, depending on the
correlation window size. The drawback of this architecture is that it
processes one column less than the others, so that "dummy" data are
input to the array.
Fig. 5. Proposal of linear architectures to implement the normalised
correlation algorithm. a) AB1-type; b) AS1-type.
TABLE I
Architecture | Time Cycles | Processing Elements | Memory Addressing
AB1
AB1 reduced
AS1
AS1 reduced
Yang arch.
n x (2p + 1) x (2p + n)
n x (2p + 1)^2   n   n + 1
n x (2p + 1) x (2p + n)
n^2 x (2p + 1)
n x (2p + 1)^2   n   2
n   2
2   2p + 1
2p + 1   (2p + 1) + 1
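As a rough worked example, assume the FSBMA convention of section II-B, in which an n x n correlation window is searched inside a window of side 2p + n. Under that assumption the experimental sizes used for figure 5 (a 3 x 3 correlation window and a 7 x 7 search window) give n = 3 and p = 2, and the three distinct time-cycle expressions that appear in table I evaluate to:

\[
\begin{aligned}
n = 3,\quad 2p + n = 7 \;&\Rightarrow\; p = 2,\quad 2p + 1 = 5,\\
n\,(2p+1)\,(2p+n) &= 3 \cdot 5 \cdot 7 = 105,\\
n\,(2p+1)^2 &= 3 \cdot 25 = 75,\\
n^2\,(2p+1) &= 9 \cdot 5 = 45.
\end{aligned}
\]

For these sizes 2p + 1 = 5 differs from n = 3, so the condition (2p + 1) = n quoted earlier for high hardware utilisation of Yang's array would not hold; the figures are meant only to give a feeling for the orders of magnitude involved.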
Despite this, Yang's proposal is an efficient architecture, due to its
low memory throughput, the small delay it introduces and the fact that
the results of the PE array can be easily pipelined into the PPE without
extra control requirements.
1) Processing Element: A processing element executes two accumulations
and three multiply-accumulations in parallel. Figure 7 presents the
internal structure of one PE in both cases: AB1-type and AS1-type
(Yang's architecture). In the AB1 structure the PE performs three
multiplications and five additions, as the accumulation is done in a
separate block at the end of the PE array. In the AS1 structure, by
contrast, the accumulations are done in the PE. This increases the size
of the PE but simplifies the control, since each PE has the same
structure. In the case of AB1, one PE applies the arithmetic operators
to the outputs of the previous PE; therefore the bit size of the inputs
and outputs of each PE varies through the array of processing elements.
Fig. 6. Yang VLSI architecture to implement the normalised correlation
algorithm.
It is clear that the AB1-type PE may occupy less silicon area than the
AS1-type PE, where multiply-accumulations must be performed in parallel.
This analysis helps the designer to choose between a solution that saves
hardware and one that reduces memory access and increases execution
speed. As mentioned above, depending on the application, the designer
may opt for the AB1-type, which can deal with large search areas but
introduces large latencies, or the AS1-type when performance is a
critical issue. In the case of underwater images, the motion of the
vehicle is slow, so the displacement between consecutive frames is quite
small. Therefore the architecture proposed by Yang can accelerate the
algorithm to reach real-time performance with fair resource
requirements.
2) Post Processing Element: The Post Processing Element (PPE) is one of
the critical parts of our design. The results from the array of PEs are
pipelined into the PPE. The PPE computes the correlation criterion
defined in equation (3). Seven multiplications, three subtractions, one
64-bit square root and one 32-bit division have to be implemented in
hardware. These operations are implemented in parallel. The square root
implementation is based on the non-restoring algorithm proposed by Li
[13]. The advantages of this method are the reduced space occupied on
the FPGA device and the fact that it generates an exact result value.
Figure 8 shows the computations performed by the PPE.
Fig. 7. Processing element internal structure. a) AB1-type; b) AS1-type
(Yang).
Fig. 8. Post Processing Element.
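The operation count quoted above can be checked against a small integer-only model of the PPE. The sketch below is an assumption-laden illustration, not the hardware described here: the square root is a generic bit-by-bit (digit-recurrence) routine standing in for the non-restoring design of [13], and `FRAC_BITS`, the function names and the treatment of a zero denominator are hypothetical choices made for the example.

```python
FRAC_BITS = 15   # hypothetical output format: score scaled by 2**15

def isqrt_bitwise(value):
    """Integer square root, producing one result bit per iteration."""
    if value < 0:
        raise ValueError("square root of a negative number")
    root = 0
    # Start from the highest power of four not exceeding the operand.
    bit = 1 << (max(value.bit_length() - 1, 0) & ~1)
    while bit:
        if value >= root + bit:
            value -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root

def ppe(s1, s2, s3, s4, s5, n):
    """Fixed-point evaluation of equation (3) from the five PE sums,
    with n = (2a+1)^2 the number of pixels in the correlation window."""
    num = n * s3 - s1 * s5                  # 2 multiplications, 1 subtraction
    var_c = n * s2 - s1 * s1                # 2 multiplications, 1 subtraction
    var_r = n * s4 - s5 * s5                # 2 multiplications, 1 subtraction
    den = isqrt_bitwise(var_c * var_r)      # 1 multiplication, 1 square root
    if den == 0:                            # assumption: flat window scores 0
        return 0
    scaled = (abs(num) << FRAC_BITS) // den # 1 division
    return scaled if num >= 0 else -scaled
```

The comments add up to seven multiplications, three subtractions, one square root and one division, matching the count given for the PPE. Because the square root is truncated to an integer, the scaled score can differ slightly from the floating-point value of equation (3); only the ranking of candidates matters for the final selection.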
The last step of the algorithm compares the measurements corresponding
to every candidate match. The result of the algorithm is the coordinates
of the pixel with the highest correlation score.
IV. IMPLEMENTATION AND ANALYSIS
The purpose of this work is the implementation of the proposed motion
estimation algorithm on a target FPGA. This was accomplished by
describing the algorithm in VHDL and then synthesising it for the FPGA
device. Prior to any hardware design we chose to implement a MATLAB
software version corresponding to every step of the algorithm. MATLAB is
a tool which provides procedural routines to operate on images
represented as matrices. The software implementation of the algorithms
has two important roles: to choose the most adequate algorithm for our
application and to provide benchmark results for the hardware
implementation.
The application was targeted at FPGA devices for several reasons. FPGAs
are a recent trend in signal processing applications, where more
original work can be done in terms of performance optimisation. In
addition, today's FPGA devices offer very attractive hardware
facilities: a high I/O pin count, embedded memory blocks, a large logic
area, high possible clock speeds and soft and hard embedded processors.
Advanced CAD software tools are available to assist every design stage.
VHDL was chosen for the hardware design, foremost because of familiarity
and also because of its wide support. Parameterisable VHDL blocks were
implemented, making it possible to reuse the design on different FPGA
platforms. The ModelSim simulation tool was used for design verification
together with Altera's Quartus II design software. Altera's Stratix
family was chosen as the hardware target, mostly because of its
accessibility and low cost. Important characteristics such as embedded
multipliers and memory blocks have also been an influential factor. We
used the Nios Development Kit Stratix 10k edition and the Nios
Development Kit Stratix 25k hardware platforms for the synthesis and
testing of the algorithm.
The implementation must be flexible, which means that when some of the
parameters of the algorithm are changed, the newly generated hardware
must remain valid. The architecture was designed so as to permit these
parameters to be changed. Computational complexity affects the level of
parallelism, since many multiplications and accumulations must be
performed at the same time. Flexibility can refer either to the
architecture or to the implementation. The architecture must be able to
support variation of the algorithm's parameters, such as the number of
corners and the correlation and search window sizes. This implies
parameterisation of the implementation. As FPGA implementation allows
optimisation at bit level, these parameters determine the bit size of
the accumulator results, which in turn affects the computational
requirements in the PPE. Table II shows the hardware requirements and
the performance of Yang's architecture applied to different correlation
and search window sizes. The required resources are quantified using
Logic Elements (LE) and DSP blocks. LEs are the basic logic blocks of
the selected FPGA device architecture and DSP blocks are the embedded
multipliers of the FPGA device. The performance is given in ms and
represents the latency introduced by the normalised correlation
algorithm for the detection of 100 matchings using a reasonable clock
frequency of 10 MHz.
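The dependence of the accumulator bit sizes on the window size can be estimated with a short calculation. The sketch below assumes unsigned 8-bit pixels, which is not stated in the text, and uses simple worst-case bounds; the function name and the returned labels are hypothetical.

```python
def accumulator_bits(window_side, pixel_bits=8):
    """Worst-case unsigned widths of the PE sums and of the operand of the
    square root in equation (3), for a window_side x window_side window."""
    n = window_side * window_side            # pixels in the correlation window
    max_pixel = (1 << pixel_bits) - 1
    linear = n * max_pixel                   # bound for sum1 and sum5
    squared = n * max_pixel * max_pixel      # bound for sum2, sum3 and sum4
    bracket = n * squared                    # bound for n*sum2 - sum1^2
    return {
        "sum1/sum5": linear.bit_length(),
        "sum2/sum3/sum4": squared.bit_length(),
        "radicand": (bracket * bracket).bit_length(),
    }
```

Under the 8-bit assumption, calling `accumulator_bits(7)` and `accumulator_bits(15)` covers the smallest and largest correlation windows of table II, and the bound on the square root operand stays within 64 bits in both cases, which is consistent with the 64-bit square root mentioned for the PPE.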
Choosing an adequate description language allows migration of the design
to different hardware platforms. Moreover, complexity reduction means
avoiding floating-point units, which are very expensive in area. Both
the corner detection and the matching algorithm make use of the division
operation. These algorithms must be transformed so that the computation
is based only on fixed-point arithmetic. Silicon must be optimally used
to implement the computation, so the data storage and the control parts
must be minimised [14].
TABLE II
Correlation window   Search window   Logic Elements   DSP blocks   Performance (ms)
7 x 7                13 x 13         2946             35           0.3534
9 x 9                17 x 17         3259             41           0.7186
11 x 11              21 x 21         3470             47           1.3458
13 x 13              25 x 25         3688             50           2.2213
15 x 15              29 x 29         3933             56           3.8438
Concerning the data storage, embedded memory blocks can be used for
structures such as FIFOs or RAM blocks, so the on-chip storage is
restricted to the size of these blocks. The frequency used for
interfacing the external memory, where the whole image must be stored,
has an impact on the total execution time of the algorithm. Therefore,
the architecture must be carefully chosen to reduce the memory
throughput. Silicon can also be saved by using external controllers for
the communication with the host computer.
V. CONCLUSIONS AND FURTHER WORK
Based on an extensive analysis of the hardware design and implementation
of motion estimation algorithms, a vision system targeting an FPGA
device is proposed. The vision system will be in charge of the
acquisition and processing of the images acquired by the camera and of
communication with the control system of the underwater robot, which
runs on a Pentium processor in an embedded PC/104+ computer. The
constraints of our design are clearly defined by the frame-rate
performance, memory access and device capacity. The linear array
proposed by Yang was chosen to implement the matching problem. Testing
and synthesis using CAD tools provide us with information about the
timing and FPGA hardware utilisation of the algorithms. Our experiments
show that the matching algorithm can run 50 times faster on an
FPGA-based architecture than on a Pentium-based PC/104+ computer.
Commercial alternatives to this system exist on the market:
frame-grabbers, FPGA boards for PC/104+, FPGA-based dedicated image
processing boards, etc. The proposed system improves on these existing
solutions by providing real-time image processing facilities at low cost
and small size.
REFERENCES 
[1] N. Gracias and J. Santos-Victor, "Underwater video mosaics as visual
navigation maps," Computer Vision and Image Understanding, vol. 79 (1),
pp. 66-91, 2000.
[2] X. Xu and S. Negahdaripour, "Vision-based motion sensing for
underwater navigation and mosaicing of ocean floor images," in
Proceedings of the MTS/IEEE OCEANS, 1997, vol. 2, pp. 1412-1417.
[3] C. Harris and M. Stephens, "A combined corner and edge detector," in
Proceedings of the Fourth Alvey Vision Conference, Manchester, 1988,
pp. 147-151.
[4] R. Garcia, X. Cufi, and V. Ila, "Recovering camera motion in a
sequence of underwater images through mosaicking," in First Iberian
Conference on Pattern Recognition and Image Analysis, Lecture Notes in
Computer Science, no. 2652, 2003, pp. 255-262.
[5] A. Benedetti, A. Prati, and N. Scarabottolo, "Image convolution on
FPGAs: the implementation of a multi-FPGA FIFO structure," in
Proceedings of the Euromicro Conference 1998, Aug. 1998, vol. 1,
pp. 123-130.
[6] N. Roma and L. Sousa, "Efficient and configurable full search block
matching processors," IEEE Transactions on Circuits and Systems for
Video Technology, vol. 12, no. 12, pp. 1160-1167, December 2002.
[7] L. Vos and M. Stegherr, "Parameterizable VLSI architectures for the
full-search block-matching algorithm," IEEE Transactions on Circuits and
Systems, October 1989, pp. 1309-1316.
[8] P. Baglietto, M. Maresca, A. Migliaro, and M. Migliardi, "Parallel
implementation of the full search block matching algorithm for motion
estimation," in International Conference on Application Specific Array
Processors, July 1995, pp. 182-192.
[9] T. Komarek and P. Pirsch, "Array architectures for block matching
algorithms," IEEE Transactions on Circuits and Systems, vol. 36, no. 10,
pp. 1301-1308, Oct. 1989.
[10] C. H. Hsieh and T. P. Lin, "VLSI architecture for block-matching
motion estimation algorithm," IEEE Trans. on Circuits and Systems for
Video Technology, vol. 2, no. 2, pp. 169-175, June 1992.
[11] K.-M. Yang, M.-T. Sun, and L. Wu, "A family of VLSI designs for the
motion compensation block-matching algorithm," IEEE Transactions on
Circuits and Systems, pp. 1317-1325, Oct. 1989.
[12] V. Ila, R. Garcia, and F. Charot, "Proposal of a parallel
architecture for a motion detection algorithm," in International
Conference on Pattern Recognition, Cambridge, Aug. 2004.
[13] W. Li and W. Chu, "A new non-restoring square root algorithm and
its VLSI implementations," in 1996 IEEE International Conference on
Computer Design: VLSI in Computers and Processors, 7-9 Oct. 1996,
pp. 538-544.
[14] F. Charot, C. Labit, and P. Lemonnier, "Architectural study of a
block-recursive motion estimation algorithm," Real-Time Imaging, vol. 3,
no. 2, pp. 111-128, 1997.
