INTRODUCTION
Techniques for extracting motion from images follow two opposite approaches. On one side, many analytical methods of motion analysis have been formalised: examples are the studies on optical flow (Beauchemin and Barron (1)) and the extraction of moving features (Kanade and Tomasi (2), Smith and Brady (3)). These methods can extract not only moving points but also the direction and velocity of motion, and they are generally very expensive in terms of computational time. For example, Liu et al. (4) propose a real-time motion detection system, but it runs on a HyperSparc workstation and needs a Datacube MV200. On the other side, many approaches have been developed only for specific real-time applications and therefore prefer empirical and approximate solutions. Examples are methods based on the analysis of single columns of pixels for measuring vehicular flow on roads, called Inductive Loop Emulators (Fathy and Siyal (5)), and systems that use road markings to evaluate moving objects easily with respect to the fixed background (Bouzar et al. (6)). These video-based systems work at pixel level only and cannot provide detailed information on individual vehicles, so they lack generality and flexibility. Our proposal can be viewed as a trade-off between the goal of achieving object segmentation with very simple processes (easily portable to hardware) and the aim of processing the whole image in order to extract complete information on the moving objects and their shape. The goal is to provide a hardware solution to the problem of detecting moving objects in outdoor scenes in real time.
Therefore the aim of this work is the development of a system for video surveillance and object tracking that meets these requirements: simple processes, so that a cheap and reliable VLSI implementation can be developed; real-time operation, to achieve the frame-rate processing necessary for video-surveillance applications; a flexible approach, possibly reconfigurable according to the final application; on-site implementation, without the need to install cumbersome general-purpose PCs where vehicles have to be detected.
To meet all these requirements, a prototype based on SRAM-based FPGAs has been developed. The approach we follow, oriented to surveillance and traffic control applications, extracts moving points from images with simple image processing techniques, then segments them in order to obtain moving objects; the location and identification of the objects will be used in a further tracking step (Koller et al. (7), Barattin, Cucchiara and Piccardi (8)). In particular, we adopt a spatio-temporal filtering that integrates motion information from sequences of frames with the information on gray-level variation in each single frame. In the paper, we outline the proposed approach for extracting moving objects, which in the specific road context can be classified as vehicles, and we describe the hardware solution based on reconfigurable hardware. Finally, we present performance results of the developed prototype.
THE PROPOSED APPROACH
Moving objects in an outdoor scene can be perceived by an observer because both motion and luminance contrast concur to make the object shape pop out of the background. Accordingly, many approaches exploit differential computation with both spatial and temporal filtering. A promising solution (8) uses the following steps for moving object extraction: 1) detection of moving points by performing a difference on three consecutive frames; 2) detection of high-contrast points in the image, i.e. points with high gradient, as possible edges of moving points; 3) execution of a Moving Edge Closing, that is, a morphological closure between moving points and sharp edges in order to extract moving objects. The algorithm for extracting moving points is based on the difference of three consecutive frames ((8), Yoshinari and Michihito (9)): we adopt this method because we verified that it is particularly robust to noise due to camera movements, and because it avoids the detection of very small moving objects in the scene (such as tree leaves, reflections, etc.).
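The first step can be illustrated with a minimal sketch. It assumes a common formulation of the three-frame difference, in which a pixel is marked as moving only if it changes in both consecutive frame pairs; the exact operator used in (8) and (9) is not reproduced here, and the function name, the single threshold and its value are illustrative assumptions (the prototype uses a hysteresis thresholding instead).

```python
import numpy as np

def double_difference(prev, curr, nxt, threshold=20):
    """Mark pixels that changed in BOTH consecutive frame pairs.

    Assumed formulation: logical AND of the two thresholded absolute
    differences. `threshold` is a hypothetical value, not from the paper.
    """
    # Widen the type so that uint8 subtraction cannot wrap around.
    prev, curr, nxt = (f.astype(np.int16) for f in (prev, curr, nxt))
    changed_before = np.abs(curr - prev) > threshold
    changed_after = np.abs(nxt - curr) > threshold
    return changed_before & changed_after
```

Under this formulation, a change that appears in only one frame pair (for instance a one-frame flicker) is discarded, which is consistent with the robustness to noise reported above.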
The three steps in (8) are followed by other processes for obtaining separate objects even in the presence of occlusion: the objects are finally classified as moving vehicles according to a set of rules, and a rule-based tracking system is proposed. Independently of the adopted high-level symbolic system, the low-level system must perform moving object segmentation at very high speed in order to meet the real-time requirements of applications. Therefore, most or all frames must be processed. From these requirements arises the need for a system able to segment moving objects at frame rate: the three steps considered above take about one second when tested on a high-performance PC for images already stored in central memory, thus without considering the transfer time of the frame grabber acquisition. This time, even if not too critical, is still far from real-time requirements. Moreover, in many real applications the adoption of a standard PC is not affordable for many reasons, first of all cost limitations: in many distributed applications, such as road traffic control, a suitable solution should equip all traffic lights with an intelligent camera able to detect and measure vehicular movement. Finally, an alternative could be the real-time transfer of all frames from the road to a processing center, but this is also not affordable given the current costs of transfer bandwidth (for example, in the traffic control system installed and running in Bologna, Italy, camera transfer rates are only 0.2 frames/sec).
HARDWARE IMPLEMENTATION
In this context, the main contribution of this work is to propose a robust approach and its hardware implementation based on Field Programmable Gate Arrays, which answers the requirements of high integration and, possibly, low cost. The developed prototype is based on the GigaOps G800 board (Giga Ops Documentation (10)). In this working environment, we have implemented different versions of differential algorithms for moving point extraction, with two-frame difference, three-frame difference and difference-with-background approaches (9), in order to compare results in various external conditions. In addition to moving point extraction, we concurrently perform edge detection and moving edge closure in real time.
The prototyping board we used is the GigaOps G800 Spectrum board (10), schematically shown in figure 1, where the main blocks of the system can be seen. The actual computation is performed by pairs of Xilinx XC4010E FPGAs, connected in modules called XMODs. We developed the system in VHDL and compiled it with Synopsys tools (Synopsys (11)). The resulting netlist file has been translated into a bitstream to be downloaded into the FPGAs at execution time. Once the bitstream has been produced, it can be downloaded to the hardware every time we need to use it.
VEHICLE DETECTION
All operations are executed in a pipeline, with delay lines for performing near-neighbour mask algorithms (e.g. edge detection), exploiting the synchronism clock of the acquisition system. In our approach, target extraction is based on spatio-temporal segmentation: "temporal", because it exploits information on moving points; "spatial", because it performs a convolution to exploit luminance variations in a 3x3 near-neighbour mask to select edge points. We define a suitable moving edge closure in order to obtain a closed contour of a moving object. This algorithm correlates moving points (detected by the double-difference algorithm) with high-gradient points (extracted with a standard Sobel operator). Temporal and spatial filtering have to be performed simultaneously. Therefore we exploit the data parallelism available in the G800 board in order to meet the real-time constraints required by the application.
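The spatial part of the segmentation can be sketched in software as follows. This is only an illustration of a standard Sobel operator on a 3x3 neighbourhood followed by binarization; the use of |Gx| + |Gy| as gradient magnitude, the replicated border and the threshold value are assumptions of this sketch and are not taken from the prototype.

```python
import numpy as np

def sobel_edges(image, threshold=100):
    """Binary edge map from the 3x3 Sobel masks.

    |Gx| + |Gy| is used as a cheap magnitude (an assumption, chosen
    because it avoids squares and square roots); `threshold` is a
    hypothetical value.
    """
    img = image.astype(np.int32)
    p = np.pad(img, 1, mode="edge")  # replicate border pixels
    # The eight neighbours of every pixel, as shifted views of the image.
    tl, tc, tr = p[:-2, :-2], p[:-2, 1:-1], p[:-2, 2:]
    ml, mr = p[1:-1, :-2], p[1:-1, 2:]
    bl, bc, br = p[2:, :-2], p[2:, 1:-1], p[2:, 2:]
    gx = (tr + 2 * mr + br) - (tl + 2 * ml + bl)  # horizontal gradient
    gy = (bl + 2 * bc + br) - (tl + 2 * tc + tr)  # vertical gradient
    return (np.abs(gx) + np.abs(gy)) > threshold
```

In a streaming implementation, the shifted views would correspond to the outputs of the delay lines, which make the three image rows of the mask available at the same clock cycle.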
The data path is shown in figure 2. The image coming from a standard camera and grabbed by the decoder (implemented by the SCVIDMOD) flows both to YFPGA1 and to YFPGA2. The former produces the double-difference image, while the latter performs edge detection. Results are sent to YFPGA3 through hbus_data(5) and hbus_data(6), respectively. The former carries the binarized information on the moving points (obtained with a hysteresis thresholding), the latter carries the edge image, also binarized. The lines hbus_data(0)-hbus_data(4) reach each module to synchronize them through a semi-frame counter. YFPGA3 performs the moving-edge closure described above, exploiting information from moving points and contour points. The final results of this operator are passed to YFPGA4 through hbus_data(15) to allow a further morphological closure with four closing steps. Finally, in the current prototype, results are sent to the encoder (i.e. the SCVIDMOD) in order to be displayed on the CRT.
The final morphological closure is an optional step that can be useful for providing closed contours of moving objects. However, this iterative operation is time-consuming: therefore, the total throughput of the system has been improved with a two-step pipeline, as shown in figure 3.
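As an illustration of this optional step, the following sketch implements a binary morphological closure with a 3x3 structuring element. How the four closing steps of the prototype are divided between dilations and erosions is not stated here, so the sketch simply assumes `iterations` dilations followed by the same number of erosions; all names are illustrative.

```python
import numpy as np

def _neighbourhood(mask):
    """Stack the nine 3x3-shifted copies of a boolean mask (False outside)."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([p[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def dilate(mask):
    # A pixel is set if any pixel of its 3x3 neighbourhood is set.
    return _neighbourhood(mask).any(axis=0)

def erode(mask):
    # A pixel survives only if its whole 3x3 neighbourhood is set.
    return _neighbourhood(mask).all(axis=0)

def closing(mask, iterations=1):
    """Binary closing: `iterations` dilations, then as many erosions."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Work on a margin wide enough to hold the dilated shape, so that
    # the image border does not distort the result; crop it at the end.
    out = np.pad(mask.astype(bool), iterations,
                 mode="constant", constant_values=False)
    for _ in range(iterations):
        out = dilate(out)
    for _ in range(iterations):
        out = erode(out)
    return out[iterations:-iterations, iterations:-iterations]
```

Each dilation or erosion needs the complete result of the previous one, which is the data dependency that makes the operation iterative and costly in latency.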
PERFORMANCE EVALUATION
The current solution for intersection management in most cities equipped with intelligent traffic light controllers is based on the use of inductive loops. These devices produce only information on the number of vehicles passing over them. However, lack of information (and hence inflexibility) is not the only drawback of inductive loops. Due to the bandwidth of the common infrastructure networks installed in urban environments, the data output rate is normally slow (in the Utopia system (12), for instance, data are updated every 5 seconds), since the information acquired on the road must reach the traffic light control system. To increase the data output rate while enriching the information available to the processing system, dedicated hardware solutions are the most promising choices. ISPDs (In-System Programmable Devices), such as FPGAs, rely on high integration and reprogrammability, which makes them very useful for rapid prototyping. Furthermore, on-site installation of such devices limits the bandwidth needed to meet the real-time constraints, since local processing makes it possible to transmit only the synthetic results of the processing instead of whole frames.
The European PAL video standard adopts a frame rate of 25 frames/sec, i.e. one frame every 40 msec. Since the PAL standard is interlaced, the above times refer to semi-frames, and since the double-difference operator needs three whole frames, in theory we need 240 msec to obtain the double-difference image. However, thanks to the data parallelism introduced and to the two-step pipeline shown in figure 3, we are able to overlap operations, obtaining YFPGA3's output in 240 msec (the edge detection operation is performed in parallel on the source image).
We could use just three consecutive frames, obtaining frame-rate behaviour. Nevertheless, in order to catch the movement of vehicles driving at 40 to 60 km/h, we output one frame every five.
As shown in figure 3, the morphological closure must wait for the end of the moving-edge closure before it can be performed. Moreover, each step of the closure needs the result of the previous one. With a four-step closure this means a latency of 160 msec, to be added to the 240 msec of the previous processing stage. But since we can pipeline the two steps (onto the last step of the double-difference), we can obtain the final result in 200+160=360 msec.
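The latency figures above can be restated as a small calculation. The 40 msec overlap between the two stages is not given explicitly: it is inferred here from the difference between the 240 msec of the first stage and the 200 msec that appear in the final sum, and should be read as an assumption of this sketch, as should all the names used.

```python
STEP_MS = 40             # time unit per processing step, as used in the text
DOUBLE_DIFF_MS = 240     # latency of the double-difference stage
CLOSING_STEPS = 4        # steps of the final morphological closure
OVERLAP_MS = 40          # assumption: overlap gained by pipelining (240 - 200)

def closing_latency_ms(steps=CLOSING_STEPS, step_ms=STEP_MS):
    """Each closure step waits for the previous one, so latencies add up."""
    return steps * step_ms

def total_latency_ms(pipelined=True):
    """Latency of the whole chain, with or without the two-step pipeline."""
    first_stage = DOUBLE_DIFF_MS - (OVERLAP_MS if pipelined else 0)
    return first_stage + closing_latency_ms()

print(closing_latency_ms())               # 160
print(total_latency_ms(pipelined=False))  # 400
print(total_latency_ms(pipelined=True))   # 360
```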
Since 200 msec are due to acquisition time, we are able to refresh the output image every 160 msec, that is 5 frames/sec. This is enough to obtain a sufficiently good continuity of movement in the resulting image. Nevertheless, the 240 msec for the double-difference computation are necessary to catch only strong movements of the objects in the scene and to increase the robustness of the system. Figure 4 shows one example frame for each of the four possible video outputs of our system which, using a standard PAL camera, is able to provide different forms of video output at frame rate: the sequence of standard colour images (Fig. 4, upper left, without any image processing), the moving points (Fig. 4, upper right), the edge points (Fig. 4, lower left) and the moving objects (Fig. 4, lower right), obtained with a one-step morphological closure. Therefore real-time processing is performed together with a severe compression of the images (from colour pixels to 1-bit/point images), keeping only the important information about motion and, at the same time, requiring a very limited bandwidth. The system has been tested on real road traffic scenes in the cities of Modena and Bologna (Italy). This work is part of a project sponsored by the Bologna Provincia government for a city control center with vision-based traffic monitoring.
EXPERIMENTAL RESULTS

CONCLUSION AND FUTURE WORKS
In this paper, we have presented a traffic-control system implemented with FPGAs. This system is the low-level implementation of a complete urban traffic controller, able to track vehicles, to count and classify vehicles (for special applications, such as reserved bus lanes, which need a vehicle classification in a loose sense) and to extract additional information such as turning rates, position and length of queues, etc. In (8), the algorithm setup for daytime conditions and the high-level tracking module have been presented. This paper reports the hardware implementation of the low-level algorithm. At the same time, research activities on other conditions of the day, and in particular at night, are being carried out (Cucchiara and Piccardi (13)). The experiments performed show that vehicle detection under different light conditions requires very different image processing algorithms, depending on the visual cues that have to be detected (e.g. vehicle template at daytime and headlights at night). This analysis suggests the use of a reconfigurable low-level system able to adaptively change its computational function. In the near future, we intend to implement this dynamically configurable behaviour in hardware by exploiting the in-field reprogrammability of the FPGAs.
Testing the system on more sequences (and in different weather conditions, such as rainy, foggy and cloudy) is another goal for the near future.
BIBLIOGRAPHY
