Motion can be a useful sensory dimension for autonomous navigation if it is available in real time. In this paper we present our approach to real time motion processing, giving an outline of the algorithm and hardware architecture we have developed. We then describe in more detail how this architecture is being implemented using FPGA technology.
INTRODUCTION
If motion is to be a useful sensory dimension for navigating in a dynamic environment, it must be extracted quickly relative to the dynamics of the objects in that environment, i.e. in "real-time". In the context of motion processing, "real-time" refers to processing that is quick enough to prevent temporal aliasing [1]. For an autonomous vehicle, the result of slow or high latency processing is at best a poor navigation plan and at worst a catastrophic collision. Producing real-time motion estimates is challenging since it typically involves applying some form of iterative search to the massive bandwidth of a live video stream. Furthermore, overheads such as memory management and interfacing must be addressed. To overcome these problems we approach the design problem from end to end. That is, care is taken to ensure the chosen algorithm can be implemented efficiently in hardware, and the hardware is chosen to ensure our overall goals can be met.
What Are We Trying To Achieve?
Our primary goal is to design a compact, self-contained system for computing motion estimates in real time. This system is targeted for use as a part of a navigation system of an autonomous vehicle and is to use a combination of range and visual data in order to reduce ambiguity. The output of the system is a segmented one dimensional motion estimate.
Non Traditional Approaches to Motion Processing
When considering motion processing hardware, it is common to assume that the system must digitize a visual image (or range data) and process that image with digital hardware; however, this is not necessarily the case. For example, the Visionchip [2] [3] [4] is an approach to vision processing where, rather than using a camera and separate processing hardware, the imaging and processing components are incorporated into a single device. A variety of motion processing algorithms can be used and they are often implemented using analogue components [3], though hybrid digital implementations are also possible [4]. Difficulties still abound in this technology, including low resolution and problems integrating video and processing elements into the same silicon.
Ljubo Vlacic
Intelligent Control Systems Laboratory, Griffith University, Nathan, Q, 4111, AUSTRALIA, l.vlacic@griffith.edu.au
Other systems are active rather than strictly passive. For example, structured light can be used to determine motion. In this scenario, a known pattern of light is projected into the environment using a laser. A camera detects the reflected pattern and determines the structure and motion of objects in the environment based on the deformation in that pattern. For example, a laser stripe and camera can be used to determine relative motion between an object and a vehicle [5]. A similar system is used to enable robotic grasping of moving objects [6]. Another active system was proposed by Houghton et al. [7]. They show how the speckle pattern that arises when a laser shines on an optically rough surface can be used to measure motion. This idea was developed [8] into an ASIC that is capable of processing 1D speckle images at high speed.
Traditional Approaches to Motion Processing
There are three broad approaches in traditional motion processing hardware: custom design, video processor based implementations and PC based implementations. [19] show how a mobile robot can navigate through the center of a corridor using the temporal derivatives of optical flow.
Why aren't these approaches enough?
It is evident from the previous discussion that modern hardware developments have made real time motion processing more achievable. This raises the question: why are existing approaches not yet sufficient?
The central tenet of our work is that motion segmentation should be performed using a combination of visual and range information in order to minimise the ambiguity that arises when visual information alone is used. This ambiguity arises because visual motion is a function of both velocity and depth. None of the implementations listed above directly utilise range data. Algorithms that combine data from both sources exist (e.g. [20] [21]); however, these have not been used in real time implementations and have other drawbacks. Furthermore, none of these implementations are able to segment the visual environment into coherently moving regions, a feature critical for effective navigation planning.
ALGORITHM
In this section we define what we mean by "real time" in the context of our work and show how a dynamic scale space is used to achieve real time processing. We then introduce our motion estimation algorithm that fuses visual and range data to eliminate motion ambiguity. Our algorithm separates motion estimation from motion segmentation and reduces what is traditionally an expensive 2D minimisation to a less costly 1D problem.
Real Time and Dynamic Scale Space
Gradient-based approaches such as ours fail if apparent motion greater than about one pixel per frame is present. Greater motion results in temporal aliasing, and this in turn makes the image derivatives that we use to estimate motion invalid. Based on this, we define "real-time" as a rate quick enough to avoid temporal aliasing. The usual method of resolving the problem of temporal aliasing is to use a scale-space approach where a pyramid of subsampled images is created. The motion estimation algorithm is then applied at each level of the pyramid. High velocities can be reliably measured at higher levels of the pyramid while the lower levels (i.e. those levels with less subsampling) are used to measure lower velocities. Unfortunately, this scheme is problematic. Generating and storing an image pyramid is time consuming, and propagating motion estimates from one level to the next adds significant complexity [22], making it extremely difficult to implement such a system in real time.
Rather than implementing a full scale-space scheme, we use a dynamic scale-space [1] where an appropriate scale is chosen, based on range data, to avoid temporal aliasing. The nearest object is likely to have the highest apparent velocity, and it is the object with which we are most likely to collide in the short term, so we choose a scale that prevents temporal aliasing for that object. A simple geometric argument [1] based on the pinhole camera model can be used to derive the following relationship between the required frame rate for real time processing, object velocity (V = 0.1 m/s) and distance (D = 0.4 m), and the camera focal length (f = 4.8 mm)
and pixel pitch (r
This is a worst-case relationship based on a maximum relative velocity of 2V between the camera and an object. In our environment, with our camera, a frame rate of 192 fps is required to achieve real time if dynamic scale space is not used. With a 5 level dynamic scale-space, this falls to a much more attainable 12 fps.
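The frame rate figures quoted above can be checked with a short calculation. The sketch below is a reconstruction from the pinhole model, not the relationship as printed in the paper: it assumes the required rate equals the worst-case image velocity in pixels per second, that each scale level halves the resolution, and that the pixel pitch is 12.5 µm (a hypothetical value chosen because it reproduces the 192 fps figure; the pitch is not recoverable from the text here).

```python
def required_frame_rate(v, d, f, pitch, levels=1):
    """Frame rate (frames/s) keeping apparent motion at or below one pixel per frame.

    v      object speed in m/s (the camera is assumed to move at the same speed)
    d      distance to the nearest object in m
    f      focal length in m
    pitch  pixel pitch in m (assumed value, see lead-in)
    levels number of scale levels; each level is assumed to halve the resolution
    """
    relative_speed = 2.0 * v                 # worst case: closing speed of 2V
    image_speed = f * relative_speed / d     # speed on the sensor, pinhole projection
    pixels_per_second = image_speed / pitch  # convert m/s on the sensor to pixel/s
    subsample = 2 ** (levels - 1)            # coarsest level of the dynamic scale space
    return pixels_per_second / subsample


V, D, F, PITCH = 0.1, 0.4, 4.8e-3, 12.5e-6   # PITCH is an assumption
full_resolution_fps = required_frame_rate(V, D, F, PITCH)
five_level_fps = required_frame_rate(V, D, F, PITCH, levels=5)
print(full_resolution_fps, five_level_fps)
```

Under these assumptions the two printed values come out at approximately 192 and 12 frames per second, matching the figures in the text.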
Motion Estimation
Our short range motion estimation algorithm uses the optical flow constraint equation (OFCE [23]), together with the equations of motion [24], to fuse visual and range information [1]. This results in the following constraint equation.
In this equation, I_x and I_t are the horizontal and temporal derivatives of the image sequence, f is the camera focal length, γ is a constant converting m/sec to pixel/frame and Z is depth. Ux is the lateral apparent velocity, which corresponds to the speed (in m/sec) at which a point in the image appears to be moving. Since Ux does not in general correspond to an object's physical velocity, its absolute value is irrelevant. This allows us to assume the scale factors f and γ are equal to one, further simplifying our formulation.
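To make the roles of the derivatives and depth concrete, the following sketch estimates a lateral velocity for a single image row. It assumes a constraint of the form I_x·Ux/Z + I_t = 0 with f = γ = 1, which is one form consistent with the definitions above and not necessarily the exact equation used in this work. A plain least-squares fit stands in for the robust estimator, and the function name and central-difference derivatives are illustrative choices.

```python
def estimate_ux(prev_row, curr_row, next_row, depth):
    """Least-squares lateral velocity for one image row (assumed constraint form).

    Solves I_x * (Ux / Z) + I_t = 0 over all interior pixels of the row.
    """
    num = 0.0
    den = 0.0
    for x in range(1, len(curr_row) - 1):
        ix = (curr_row[x + 1] - curr_row[x - 1]) / 2.0  # horizontal derivative
        it = (next_row[x] - prev_row[x]) / 2.0          # temporal derivative (3 frames)
        num += ix * it
        den += ix * ix
    if den == 0.0:
        return 0.0  # no horizontal texture, so motion cannot be measured
    return -depth * num / den


# A linear intensity ramp moving right by 0.5 pixel per frame, seen at depth 2.
ramp_prev = [x + 0.5 for x in range(8)]
ramp_curr = [float(x) for x in range(8)]
ramp_next = [x - 0.5 for x in range(8)]
ux = estimate_ux(ramp_prev, ramp_curr, ramp_next, depth=2.0)
```

The same image motion yields a larger Ux for a more distant object, which is the velocity/depth ambiguity that the range data is used to resolve.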
OUR HARDWARE IMPLEMENTATION
In this section a brief introduction to the core components of our system is given before discussion moves to the specific detail of our implementation.
Processing
In order to accelerate prototype development we have opted for a commercially available prototype development system from Lyr Signal Processing [26]. The key processing element on the SignalMaster platform is an Analog Devices Sharc ADSP-21062 DSP which has access to 16MB of RAM. A number of interface options are available on the board (e.g. PC104, Ethernet); however, the most important for our application is the BITSI interface that allows us to add a GatesMaster mezzanine card. This adds a Virtex XCV800 FPGA running at 40MHz and an additional 16MB of SDRAM, allowing greater implementation flexibility. The majority of our system is implemented in the FPGA since this allows a compact solution where all processing, glue and interface logic is contained within a single chip.
Fuga 15D Camera
We use the Fuga 15D camera from C-Cam Technologies because its simple RAM-like interface eases development. The Fuga 15D has a logarithmic intensity response, making it less sensitive to illumination variation; however, it has higher noise levels and less contrast than other cameras. Fortunately, the noise pattern is fixed and easily corrected, and contrast can be adjusted via gain controls, though a trade-off must be found between sufficient contrast and sensitivity to illumination variation. A further issue with this camera is that it does not use a shutter. The Fuga camera measures the value of a pixel only when that pixel is addressed, while traditional cameras effectively measure the value of all pixels at the same time, that is, when the shutter closes. Thus, if there is a significant delay addressing pixels, the resulting image may contain motion distortion appearing as shear in the shape of moving objects. We have found that in our context, motion distortion is negligible.
ICSL Vehicle Testbeds
Our system is being tested using the ICSL robots [27]. The ICSL has developed a number of vehicle test-beds for use in evaluating autonomous vehicle concepts without the need for large scale testing facilities. These robots feature a distributed multi-microcontroller architecture where the PIC 16C74 microcontroller and the I2C serial bus are the primary building elements. Subsystems include infrared and ultrasonic ranging systems for navigation, a radio packet modem for communication and a laser based system for intelligent speed adaptation. A number of behaviors have been developed for these test-beds including fuzzy logic based leader following, static obstacle avoidance, lane keeping, intersection navigation and overtaking. Maximum velocity for the robots is 0.1 m/s. Figure 1 illustrates the architecture of our implementation.
IMPLEMENTATION DETAILS
To allow for testing, a link to a PC allows for visualization of results, though this link can easily be replaced by another device if further processing is required. This design consists of three key sections: the memory subsystem, a collection of processes (i.e. reading from camera, processing data, output to PC, etc.) and buffers designed to mediate communication between processes and memory.
RAM Interface and Memory Management
Because this system comprises a number of processes, each of which could potentially require simultaneous access to RAM, bus arbitration is a necessity. Further, careful memory management (RAM allocation) leads to more efficient RAM use since devices use RAM differently.
Bus arbitration is implemented via a combination of decoupling buffers and the RAM Interface and Controller (RAMIC) module. Each process has a decoupling buffer implemented using Block SelectRAM (this RAM is within the FPGA so buffering does not cause further contention). The buffers allow each process to operate at full speed without waiting on memory. This is especially critical for the Fuga camera, where we must maintain a constant pixel rate to avoid distortion. The RAMIC polls each buffer in a round robin fashion, allowing each buffer appropriate access to RAM and placing an absolute upper bound on RAM access times. To maintain a constant pixel rate, buffers do not pause when an overflow/underrun occurs. In this situation it is important to know the upper bound on memory access time so that the design can be made overflow/underrun free. Aside from providing a low level interface to SDRAM, address decoding and bus arbitration, the RAMIC also provides a number of composite operations that allow more efficient RAM use. For example, we exploit the fact that data from the Fuga camera is 8 bits wide, while each SDRAM location is 32 bits wide, to optimize memory access. A section of RAM is allocated as a "frame-buffer" with a one to one mapping from camera to RAM addresses; however, we stack pixels so that each RAM location stores four frames of data. To implement this, existing pixel data is read from RAM, shifted 8 bits to the left and the new pixel is inserted in the least significant eight bits. From the point of view of the buffer, this is a single "write pixel" operation, which is more efficient than using the polling process to perform the implied read and write separately.
However, the real efficiency is realized when data is read for processing. Our algorithm requires 3 frames of data at each location to compute temporal image derivatives. With our pixel stacking scheme, this data can be obtained with a single RAM read operation.
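The pixel stacking scheme can be illustrated in software. This is a minimal sketch of the bit manipulation only; the function names are hypothetical, and the actual operation is performed inside the RAMIC on the FPGA rather than in code like this.

```python
MASK32 = 0xFFFFFFFF  # each SDRAM location is 32 bits wide


def write_pixel(word, pixel):
    """Composite "write pixel": shift the stored word left 8 bits, insert the new pixel.

    The oldest of the four stacked pixels falls off the top of the word.
    """
    return ((word << 8) | (pixel & 0xFF)) & MASK32


def read_frames(word, count=3):
    """Unpack the `count` most recent pixels from one word, newest first."""
    return [(word >> (8 * i)) & 0xFF for i in range(count)]


# Five successive frames write to the same camera address.
word = 0
for pixel in (10, 20, 30, 40, 50):
    word = write_pixel(word, pixel)

latest_three = read_frames(word)  # the three values needed for a temporal derivative
```

After the fifth write the word holds the four most recent pixels (20, 30, 40, 50), and a single read returns the three newest values in one access.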
The final function of the RAMIC is to generate zero-order calibration data for the Fuga camera. When the system is first started, the RAMIC generates a single image by taking the pixel-wise average of 16 frames of data. During this time a plain, translucent sheet of paper covers the lens so that the image seen by the camera contains only noise. Next, the distribution of pixel intensities is shifted so that it has zero mean. The resulting image represents the noise at each pixel, called the calibration value. The corresponding calibration value is subtracted from each pixel to eliminate the fixed noise pattern.
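The calibration procedure described above can be sketched as follows. This is an illustration using floating-point lists for clarity; the function names are hypothetical, images are flattened to one list of pixels, and the hardware would presumably work in fixed-point arithmetic.

```python
def build_calibration(frames):
    """Zero-order calibration image from a stack of frames of a featureless scene.

    Step 1: pixel-wise average over all frames.
    Step 2: shift the averaged image so that it has zero mean.
    """
    n_frames = len(frames)
    n_pixels = len(frames[0])
    mean_image = [sum(frame[i] for frame in frames) / n_frames
                  for i in range(n_pixels)]
    offset = sum(mean_image) / n_pixels
    return [value - offset for value in mean_image]


def apply_calibration(raw, calibration):
    """Subtract each pixel's calibration value to remove the fixed noise pattern."""
    return [pixel - cal for pixel, cal in zip(raw, calibration)]


# Sixteen identical frames of a blank scene with a fixed per-pixel offset pattern.
blank_frames = [[100, 104, 96, 100] for _ in range(16)]
calibration = build_calibration(blank_frames)

# A uniformly brighter scene carrying the same fixed pattern.
corrected = apply_calibration([120, 124, 116, 120], calibration)
```

Because the calibration image has zero mean, subtracting it removes the per-pixel pattern without changing the overall brightness of the image.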
Buffers
The Virtex FPGA has dedicated RAM (known as Block SelectRAM) available on board. Each SelectRAM block provides 4096 bits of storage, is dual ported, and the data width of each port can be chosen independently. A total of 28 blocks of SelectRAM are available on our XCV800 Virtex device. Because we use SelectRAM for decoupling buffers, we are able to implement buffers that can simultaneously be accessed by a process and by the RAMIC. The specific design of buffers varies depending on their task, though they all have the general structure shown in Figure 2.
Camera Datapath
In the camera data-path, the Camera Interface generates the control signals necessary to read a 512*32 pixel image from the camera. E-X and E-Y are the X and Y address strobes respectively, ADDR contains the address, and ADCK (active low) is the analogue to digital conversion strobe, controlled by a 1MHz pixel clock. Frame timing is controlled by the 10Hz FRAME-CLK signal from the Range Scanner Interface. The camera interface also applies zero-order calibration to the raw camera data. The preprocessing block then takes the calibrated image and subsamples it appropriately for the current dynamic scale space. This subsampled image is fed into a queue (as per [9]) where a low pass filter is applied, and the result is passed to the camera buffer. Data is written into RAM sequentially from address 0 up to address (512*32)-1 in row major order.
Range Scanner
The range scanner block implements the interface to a range scanning device and provides support operations such as mapping the range data onto the camera coordinate system and provision of a frame clock and of scale data. Our target range scanner operates at 10Hz and this rate will be used as the frame clock. This is slightly slower than our real-time rate of 12fps; however, 10Hz is acceptable since, in practice, we will rarely experience the worst case conditions assumed in our temporal aliasing calculations. Scale data relates to the image width at the current scale and other related parameters.
Processing
Our system is designed so that the reading of new image and range data occurs concurrently with the output of data to the PC. When the buffering of this data is complete and the scale data has been updated, processing begins. The ProcessingIN buffer computes image derivatives and these are used by the processing block to generate a robust estimate of motion for each image column. This one dimensional motion estimate is then smoothed and segmented using a weak string model to produce the final motion estimate and segmentation.
Output Process
Because both the output of existing data to the PC and the input of new data from the camera and range finder occur concurrently, care must be taken to ensure data is not clobbered. We achieve this by ensuring that output to the PC always leads input of new data, so that data is only updated once it has been output. To prevent possible buffer underruns, the PC Buffer reads ahead slightly before triggering the PC to start accepting data (via the START-PC line clocks per pixel per iteration under the assumption of 512*32 pixel images and 10 iterations per frame. This is more than sufficient for our algorithm.
ACKNOWLEDGMENTS
We would like to thank the Fraunhofer Autonomous Intelligent Systems Group (AIS) for their generous donation of the SignalMaster platform and Fuga camera used in this work.
CONCLUSIONS
In this presentation we have introduced a motion estimation algorithm tailored for real time implementation, then shown how this algorithm can be implemented in a single chip, together with all glue and interface logic, to produce a compact sensing solution.
