This paper describes an FPGA-based hardware implementation of the Santos-Victor optical flow algorithm, which is useful in robot guidance applications. The system used for this task contains an ALTERA FPGA (20K100), an interface to a digital camera, three VRAM memories that hold the input data, and output memories (a VRAM and an EDO) that hold the results. The system has been used previously to develop and test other vision algorithms, such as image compression and optical flow calculation with differential and correlation methods. The system allows the digital camera, or the FPGA output (the results of the algorithms), to be connected to a PC through its FireWire or USB port. The problems encountered on this occasion have motivated the adoption of another hardware structure for certain vision algorithms with special requirements, namely those that need very code-intensive processing.
INTRODUCTION
The first part of this document describes previous work on real-time hardware implementations of visual image processing algorithms, which are used as components of a visual processing system within a global active vision system. The second part describes the attempted hardware implementation of the cited algorithm, which yields important conclusions for developing and testing new applications or improving previous ones. First, the environment in which this application has to work is described, together with the structure of the vision system and the frame grabber used. The following section explains some vision algorithms developed previously, with their characteristics, properties, advantages, drawbacks and possible applications. The next parts contain the description of the experimental hardware vision system and of the FPGA implementation of the algorithm, which in the end could not be mapped onto the available system. Finally, section 5 presents the results and conclusions.
Vision system
The active vision system can be divided into the following blocks:
-Digital camera: the digital element (a commercial model) that provides monochrome images of 480 x 500 pixels.
-Frame grabber: this block stores the image information in dual-port memories.
-Log-Polar: it carries out the log-polar transformation of the captured images.
-Optical flow: it calculates the optical flow from a sequence of images (at least two consecutive images).
-Actions calculus: it processes the received data and decides where to focus the camera.
-Control camera: it carries out the motions needed to focus the camera on a specific point.
PREVIOUS WORK
Implementations of some of these blocks are described in [1], [2], [3]. All of them share the same structure (Fig. 2), which provides easy interconnection and real-time operation. A small hardware block can do its work more quickly than a larger one, and many of these blocks can operate in parallel if the intermediate results are accessible. The most important parts of this hardware structure are the input and output memories, which can be accessed at the same time from the source and target systems. The other element is always an FPGA (or several), which holds the vision algorithm and performs the memory addressing.
Optical flow techniques and hardware implementations
Optical flow is a concept with many definitions: the apparent motion that an observer notices in an image, a two-dimensional vector that indicates the motion of objects or features in a sequence of images, and so on. The definition is less important than the fact that optical flow is the best option for calculating three-dimensional properties of the environment, and for obtaining other useful motion information, starting from the luminance changes of the points of the image plane. Optical flow algorithms have many applications, among which the navigation of autonomous mobile robots in unknown environments stands out. These techniques are detailed in [4], and it is generally very difficult for them to achieve real-time performance (this is possible only with strong restrictions on the environment). They can be divided into four types. Many of these algorithms are not actually suitable for practical applications, either because of the hardware requirements (parallel computers or ASICs) of real-time operation or because of the excessive duration (hours) of high-precision approaches on a workstation.
The two algorithms described next use differential and region correlation (matching) techniques. The first is a hardware realization of the Horn & Schunck algorithm [5], and the second is a hardware implementation of the Camus algorithm [6] [7], which modifies the correlation scheme by restricting the image movement.
FPGA implementation of the Horn & Schunck Optical Flow Algorithm.
This classical differential algorithm is used in [1] to develop a real-time processing system. Fig. 3 shows the structure of the system, which has two blocks of external memory (input and output memories) and an external sequencer that loads the input memory as required. The calculation block is divided between two FPGAs (XC40203 and XC4005H), and the system structure requires at least two input memories to hold two input frames and one output memory to hold the calculated flow. All of them are dual-port memories like those used in the next section. This system could process 19 images (50x50 pixels) per second.
FPGA implementation of Camus correlation Optical Flow Algorithm.
This correlation method is used in [3], which shows that it is capable of operating in real time. To determine the speed, or displacement, of a pixel, the algorithm chooses among (2d+1)^2 possible displacements, where d is an integer value that depends on the speed, in the following stages:
4. Election: the displaced region that most resembles the reference one is the one with the smallest values of the excitement stage. This stage therefore finds the smallest of the excitement values.
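The search over (2d+1)^2 candidate displacements and the final election of the smallest value can be illustrated with a minimal Python sketch. It is not the implementation of [3]: the matching cost (a sum of absolute differences over a square patch), the patch size and the function name `best_displacement` are assumptions made only for illustration, since the earlier stages are not reproduced in this text.

```python
def best_displacement(prev, curr, row, col, d=1, half_patch=1):
    """Pick the displacement (dy, dx) in [-d, d]^2 with the smallest cost.

    prev, curr: two consecutive grey-level images as lists of lists.
    The cost is a sum of absolute differences (an assumed stand-in for
    the "excitement" values of the text).
    """
    margin = d + half_patch
    if not (margin <= row < len(prev) - margin and
            margin <= col < len(prev[0]) - margin):
        raise ValueError("pixel too close to the image border")

    best, best_cost = (0, 0), None
    # (2d+1)^2 candidate displacements are examined.
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            cost = 0
            for r in range(-half_patch, half_patch + 1):
                for c in range(-half_patch, half_patch + 1):
                    cost += abs(prev[row + r][col + c]
                                - curr[row + dy + r][col + dx + c])
            # Election: keep the candidate with the smallest cost.
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost
```

With d = 1 the search covers nine candidates, which matches the (+1, 0, -1) range of flow values per direction reported for this implementation.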
Fig. 4 shows the logical design of the FPGA that implements the optical flow calculation algorithm. Its blocks are as follows:
-The two RAMDP_8kb blocks are dual-port RAM memories (8 KB) used to store the image data temporarily. They are necessary because the frame grabber delivers the image information sequentially and several lines must be loaded before the processing can be carried out. Their size is 16 consecutive lines of 96 pixels, out of 480 possible ones, because of the time restriction on the calculation (see point 4). The device chosen is the CY7C09159V IC.
-The RAMDP block (32 KB) is the dual-port RAM memory in which the calculated optical flow vectors are stored. The IC used is a CY7C09079V.
-The FPGA contains all the modules required for the optical flow calculation. The chosen device was an ALTERA EPF10K50RC240-3, of which 37 inputs and 75 outputs are used. The system uses two clocks. One, named "CLKWRITE", is supplied by the interface with the CCD camera and has a frequency of 10 MHz. The other, "CLKREAD", is generated internally and has a frequency of 15.384615 MHz (that is, the maximum possible for the correct operation of all the processing units).
The output of the system is available on the right-port pins of the dual-port memory RAMDP (32 KB). This memory holds the optical flow vectors that have been obtained. The system also generates the three enable signals ("out 1", "out 2" and "out 3") that the later processing stages use to read safely from this memory without interference from writes. The optical flow of three images is stored, and the enable signals indicate whether the memory is ready to be read ("out 1" enables the reading of image 1, and so on). The flow of each pixel is stored as a byte that holds the (u,v) coordinates (flow in the x-direction and y-direction, respectively). The numeric format is a 4-bit two's complement number, which expresses the possible results (+1, 0, -1) in each direction. With this hardware implementation, on an ALTERA EPF10K50RC240-3, a minimum clock cycle of 65 ns was achieved, giving a processing rate of 22.56 images (96x96 pixels) per second.
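The output format and the timing figures above can be checked with a short Python sketch. The text states only that each flow byte holds u and v as 4-bit two's complement numbers; placing u in the high nibble and v in the low nibble is an assumption, as are all function names. The cycles-per-pixel helper further assumes that the whole frame period is spent on processing.

```python
def to_nibble(value):
    """Encode a small signed integer as a 4-bit two's complement nibble."""
    if not -8 <= value <= 7:
        raise ValueError("value does not fit in 4 bits")
    return value & 0xF


def from_nibble(nibble):
    """Decode a 4-bit two's complement nibble to a signed integer."""
    return nibble - 16 if nibble & 0x8 else nibble


def pack_flow(u, v):
    """Pack (u, v) into one byte; u in the high nibble is an assumption."""
    return (to_nibble(u) << 4) | to_nibble(v)


def unpack_flow(byte):
    """Recover (u, v) from a byte produced by pack_flow."""
    return from_nibble(byte >> 4), from_nibble(byte & 0xF)


def clock_mhz(period_ns):
    """Clock frequency in MHz for a clock period given in nanoseconds."""
    return 1e3 / period_ns


def cycles_per_pixel(period_ns, frames_per_second, width, height):
    """Clock cycles available per pixel if the full frame time is used."""
    return 1.0 / (period_ns * 1e-9 * frames_per_second * width * height)
```

For example, `clock_mhz(65)` reproduces the 15.384615 MHz quoted for "CLKREAD", and `cycles_per_pixel(65, 22.56, 96, 96)` gives roughly 74 cycles per pixel under the stated assumption.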
FPGA implementation of a Log-polar Algorithm.
This third application [2] is another instance of the structure of the vision systems developed previously. It forms a vision block used to compress the volume of image data without losing useful properties. Fig. 5 shows the elements of this application. In this case the FPGA's job is very small. It only has to address the input memory and compare each address with the "valid" addresses held in the ROM memory, in order to form the log-polar image, which is stored in the output memory. This last memory is again a dual-port RAM, so that another system can read it more quickly. The image size in this case is 96x128 points, obtained from 480x500 source images (with an ALTERA EPF8282A), and the operating speed is 25 images (96x128 (radial and points)) per second.
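The table-driven mapping described above can be illustrated in software with a minimal Python sketch. It is not the design of [2]: the exponential ring spacing, the nearest-pixel sampling, the reading of 96x128 as 96 rings by 128 angular sectors, and the names `build_logpolar_table` and `apply_table` are all assumptions. The table plays the role of the ROM of "valid" source positions.

```python
import math


def build_logpolar_table(height, width, rings, sectors, r_min=1.0):
    """Precompute, for each (ring, sector), the source pixel to copy.

    Ring radii grow exponentially from r_min to the largest radius that
    fits inside the image, which is one common log-polar layout.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    r_max = min(cy, cx)
    table = []
    for i in range(rings):
        radius = r_min * (r_max / r_min) ** (i / (rings - 1))
        row = []
        for j in range(sectors):
            angle = 2.0 * math.pi * j / sectors
            y = int(round(cy + radius * math.sin(angle)))
            x = int(round(cx + radius * math.cos(angle)))
            row.append((y, x))
        table.append(row)
    return table


def apply_table(image, table):
    """Form the log-polar image by copying the selected source pixels."""
    return [[image[y][x] for (y, x) in row] for row in table]
```

Because the table is computed once, forming each log-polar image reduces to memory reads, which is consistent with the small amount of logic the text reports for this block.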
HARDWARE VISION SYSTEM DESCRIPTION
Several examples of vision systems that use a very similar hardware structure have been presented. Until now, the researcher had to develop a new board for every new design and assemble several of them to form a complete system. An FPGA-based vision system has been developed to avoid this problem and to make implementations easier. The block diagram and schematics of this system are shown in Fig. 6 and Fig. 7, and the operation of each block is fully described in [8], [9]. This system is now being finished, and real results will be available soon.
HARDWARE VISION SYSTEM APPLICATION
The hardware implementations of vision algorithms presented previously fit the hardware vision system perfectly, but to complete the study we needed to test other approaches that are also very useful for vision applications. The algorithm chosen on this occasion to test the system is developed in [10]. It is an example of a purposive vision algorithm, in which obstacle detection for good robot navigation is almost the only concern. The optical flow field produced by this method is derived from geometric properties of the camera-scene configuration and does not depend on camera parameters. The structure of the optical flow obtained from an image sequence with this method is easy to analyse, at the cost of some "small" drawbacks. For instance, a projection of the image plane onto the ground plane (on which the robot is moving) is needed. The coordinate system used is shown in Fig. 8, where Pc is the image plane produced by a camera that moves with pure translation along a straight line over flat ground. The flow obtained on this image plane is very complex, even for simple structures, because of perspective effects. In this scenario, movement analysis is a very difficult task. Ph is the plane that would be generated by a camera with a vertical optical axis (pointing down). The optical flow field obtained with this approach is very simple because the distance to the ground is constant: all vectors have the same length, and holes or other obstacles can be detected easily. For practical reasons this vertical camera orientation is not useful, but the optical flow on the Pc plane can be inversely projected onto the Ph plane, which greatly simplifies the flow. With this analysis the work is therefore easier, subject to some restrictions. While in (C) the ground movement is mapped onto a complex vector flow, in (H) all vectors have the same length and orientation if the movement is purely translational.
If this projection is possible, the robot movement (and obstacle detection) can be analysed easily from the normal flow alone, without the influence of camera parameters or robot velocity. To avoid the aperture problem, inherent in using only first-order derivatives, an affine transformation is employed in which the second-order parameters are discarded. The error of this approximation is smaller than the one produced, for instance, by using Laplacians to calculate second-order derivatives. A brief summary of the algorithm follows. The motion field is described by an affine model whose parameters can be related to the camera movement and to the orientation of the image plane with respect to the ground plane. For a purely translational camera movement these parameters do not depend on the camera parameters, and they can be estimated at a system initialization stage from three or more optical flow measurements by a least-squares approximation. The method first calculates the x, y and t derivatives to build a matrix M (three measurements) that defines a linear system, which is solved with the pseudo-inverse method to give the six optical flow parameters.
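The least-squares estimation of the six flow parameters can be illustrated with a minimal Python sketch. The equations of the original formulation are not reproduced in this text, so the sketch rests on assumptions: the six-parameter affine model u = u0 + ux·x + uy·y, v = v0 + vx·x + vy·y, the usual gradient constraint Ix·u + Iy·v + It = 0, and a solution through the normal equations (M^T M)θ = M^T b. All names are hypothetical.

```python
def affine_flow_row(x, y, ix, iy):
    """One row of M for the assumed affine model and gradient constraint."""
    return [ix, ix * x, ix * y, iy, iy * x, iy * y]


def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("measurements do not constrain all parameters")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - partial) / aug[r][r]
    return x


def fit_affine_flow(samples):
    """Least-squares fit of [u0, ux, uy, v0, vx, vy].

    samples: sequence of (x, y, ix, iy, it) tuples, where ix, iy, it are
    the x, y and t derivatives of the image at point (x, y).
    """
    samples = list(samples)
    rows = [affine_flow_row(x, y, ix, iy) for x, y, ix, iy, _ in samples]
    rhs = [-s[4] for s in samples]
    n = 6
    # Normal equations: (M^T M) theta = M^T b.
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           for i in range(n)]
    mtb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    return solve_linear(mtm, mtb)
```

Forming M^T M here costs 36 sums of products per set of measurements, which gives a feel for the arithmetic load discussed in the text, although the sketch makes no claim about the operation count of the hardware design.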
Obtaining these values requires multiplying the matrix M by its transpose, which involves many products and sums. Once this matrix is obtained, its inverse must be calculated, which requires its determinant (again, many arithmetic operations). With these operations the optical flow parameters would be ready, and only the plane parameters would remain to be calculated. The only drawback is that trigonometric expressions have to be evaluated. Describing a hardware architecture that does all this in real time inside an FPGA raises many problems. Nevertheless, we have designed the hardware structure for these requirements. The number of multiplications required to calculate the matrix for a single point of the image is shown here. It cannot be avoided that more points, approximately 9, are required to obtain a reliable output. This means a notable increase in products and sums, and therefore in the execution time and hardware requirements of the algorithm. As noted, the inverse of the resulting matrix must then be calculated.
Another calculation, the determinant of the matrix, is needed. This operation requires a number of sums and products that depends on the matrix size: for a 2x2 matrix two products and one sum are needed; for a 3x3 matrix, twelve products and five sums; and for a 4x4 matrix, twenty-six products and nine sums. For a 6x6 matrix, as in this case, the number of operations grows very considerably. The calculation can be made more efficient by decomposing the matrix into matrices of lower order, down to 2x2, but many operations are still needed. For instance, our VHDL implementation of this module (which requires 180 products and 124 sums on 16-bit operands) occupies 80% of an EPF10K70RC240-2, an FPGA that has been used for full implementations of other algorithms. With these figures in mind we could not achieve a viable hardware design, but all the hardware modules have been developed for future implementations. The calculation of the optical flow vectors and their inverse projection would be the last modules, in which Taylor series are used to approximate the trigonometric expressions. In conclusion, a very high volume of basic operations (sums, subtractions and products) is necessary. If a hardware implementation of an algorithm with these characteristics is needed, matrix calculations and trigonometric functions must be handled easily. Neither parametric modules nor VHDL can do this efficiently in a single-FPGA design. Exhaustive Matlab simulations of the algorithm output, and comparisons with theoretical results [11], have been carried out, as shown in Fig. 9. An intermediate image of the sequence is shown, with the associated optical flows (theoretical and produced by the algorithm) superimposed on it.
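The decomposition of the determinant into determinants of lower order, down to 2x2, can be illustrated with a minimal Python sketch. It uses plain cofactor (Laplace) expansion along the first row, which is an assumption about the decomposition meant in the text; it does not reproduce the VHDL module or its operation counts, and the function names are hypothetical.

```python
def minor(matrix, row, col):
    """Return the matrix with the given row and column removed."""
    return [r[:col] + r[col + 1:] for i, r in enumerate(matrix) if i != row]


def determinant(matrix):
    """Determinant by recursive cofactor expansion along the first row."""
    n = len(matrix)
    if any(len(r) != n for r in matrix):
        raise ValueError("matrix must be square")
    if n == 1:
        return matrix[0][0]
    if n == 2:
        # Base case: two products and one subtraction.
        return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    total = 0
    for col in range(n):
        sign = -1 if col % 2 else 1
        total += sign * matrix[0][col] * determinant(minor(matrix, 0, col))
    return total
```

Each level of the recursion multiplies the number of sub-determinants by the current matrix order, which is why the operation count for the 6x6 case grows so quickly.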
This work has proved the validity of the algorithm in a software simulation, but for real-time operation this hardware approach was evidently not adequate: if the module needed for matrix manipulation occupies almost half the chip and its operating speed is very low, the solution is not acceptable. This is an example, like others that have been encountered [12], of a vision algorithm that does not match those developed previously, because it requires some complex operations. It shows, as discussed in the next section, that another approach is needed.
RESULTS AND CONCLUSIONS
This work has proved that the previous system is not well suited to all vision applications. We will soon try another vision board, from LSP [13], which combines DSP and FPGA solutions for digital signal processing. The data-intensive functions will be mapped onto an FPGA (Virtex or Altera), and the code- or mathematics-intensive functions, like those used by the present algorithm, will be mapped onto DSP elements so that the work is done efficiently. This board links directly with the software used (Matlab, Simulink), so it should be a very good tool for developing and testing other vision algorithms on a hardware system with real-time requirements. With this system and the one used previously, many more vision algorithms will be developed and tested in the future.
In summary, these systems are intended to be good components of a vision laboratory for building real-time robotics applications with real-time vision modules.
