Abstract - Stereopsis, the perception of depth of an object within a scene viewed from two different points, is often modelled in an effort to generate three-dimensional information from a dual image capture. This paper presents an FPGA-based hardware prototype of a stereoscopic capture system using a pair of CMOS image sensors. Data buffering and pipelining methods are employed using FPGA control of embedded SRAM and DPRAM memory. Captured real-time dual image data is displayed on a VGA display and concurrently transferred, via USB, to a host PC, enabling the development of host-based 3D image processing algorithms. Concurrent migration of data from image sensor pair to VGA, C++, and MATLAB is demonstrated.
I INTRODUCTION
Stereopsis is the capture of a common scene from two slightly different viewpoints, providing three-dimensional information that would otherwise be lost were each image examined as a separate entity [1]. Single camera systems are restricted to rigid scenes, or scenes where the object movement, both rate and magnitude, is known [2]. For applications such as object recognition, gaming, industrial automation, or automotive safety, multiple fixed position cameras are used. The key to successful stereopsis is the ability to find object correspondence in both captured frames [1].
With the rise in the use of embedded systems and portable/handheld electronic hardware [3], there is an upturn in research and development in areas of real-time image processing, recognition, and multidimensional image capture. The objective of this research is to present a hardware FPGA algorithm acceleration platform, which includes a stereoptic image capture providing real-time data to the hardware-accelerated algorithm.
This paper presents an FPGA hardware controlled prototype of a stereoptic capture system using a pair of CMOS image sensors. The novel dual protocol output system has been designed for the comparison of concurrent firmware and software stereoptic algorithms. Captured dual image data is displayed on a VGA display, and concurrently transferred via USB to a host PC, enabling the development of host-based 3D image processing algorithms prior to porting them to an FPGA hardware implementation. The image pair is ported to a MATLAB application for demonstration.
A pair of synchronised Kodak CMOS image sensors (KAC-9647) [4] is controlled from a Spartan 3 development board [5] that incorporates a Xilinx XC3S200 FPGA. Image buffering is carried out using 1 MByte of fast SRAM (a pair of ISSI IS61LV25616AL-10T) [6]. The raw dual image is ported from the image buffer to a customised 24-bit RGB (VGA) graphics card carrying an AD75123 [7]. The same raw data is concurrently ported to a USB device, the FIFO FT245R [8], enabling the development of image processing applications on a stereoptic frame pair, using personal computer (PC) applications such as C/C++ and MATLAB.
This paper reviews how single protocol stereoptic hardware systems quantify their output results. The paper includes an overview of stereopsis theory, and the considerations in finding image correspondence. The paper describes the dual image stereoptic capture system FPGA hardware prototype architecture for the KAC-9647 image sensor. The paper illustrates the data pipeline to the SRAM buffer, and the porting of data to the USB device and a VGA display.
The paper presents image data throughput performance, and real-time captured stereo images transferred to a VGA screen, a host-based C++ image bitmap data converter, and a MATLAB user interface application. The structure of the paper is as follows: Section II reviews reported hardware stereoptic systems. Section III introduces a brief overview of stereopsis theory, with Section IV presenting the hardware architecture. Sections V, VI, VII and VIII present the CIS, SRAM, USB and VGA peripherals respectively.
II REVIEW OF STEREOPTIC HARDWARE SYSTEM QUANTIFYING
Stereoptic hardware systems primarily base the quality of the system on the time taken to complete the correspondence algorithm (see Section III) for a single frame [2], [9]-[14]. The frame dimension and image complexity are therefore key factors in rating the quality of the stereopsis implementation. Another important attribute is the quality of the disparity map (see Section III) [11], [13], [14]. However, the disparity map requires both stereo images for an opinion-based visual comparison. A more rigid approach to the analysis of results is the rectification of the stereo pair, the image processing levels of which may far outweigh the original task of object tracking or distance measuring [11].
An alternative analysis of results for those using configurable logic is the comparison of processing time between an embedded configuration and a PC-based application running the same algorithm [12]-[14], with PC-based algorithms using a database to source a stereo pair.
The hardware method presented ports the same live captured pair to the PC and the configurable hardware simultaneously, allowing an immediate comparison of a hardware embedded version and a PC application version of the developing algorithm.
III STEREOPSIS OVERVIEW
Stereopsis is the extraction of three-dimensional data from a pair of images of the same scene taken from slightly different viewpoints [1]. Consider the stereo image pair shown in Figure 1 below. Using only the image on the left, the comparison of physical size of the sphere and the cube is lost. Using the image on the right, the sphere is nearer than the cube, but if the cube represents an office block, is the sphere a marble or a beach ball? Using the information of both left and right images together determines that both objects are in close proximity to each other and are of comparable dimension. Consider the upper left hand corner of the cube in both images. Its coordinates are slightly different in both image frames. The difference in coordinates, in conjunction with the position of both cameras, applied to epipolar geometry, provides the distance of the object from the camera pair [1], [14].
However, the lower left hand corner of the cube is not available in the right image. This scenario is referred to as a discontinuity, for which stereopsis fails. Likewise, the sphere, although complete, has no defined points for correspondence, and so stereopsis also fails. Increasing the distance between the cameras increases the disparity, but it also increases the separation between corresponding points in the two images, so the search for a match may fail. If the points share a common coordinate in both images, then the disparity is zero, which also results in a stereoptic failure.
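The relationship between disparity and distance can be sketched for the simplest textbook case: a rectified pair of identical, parallel cameras, where depth is focal length times baseline divided by disparity. The function name, the focal length in pixels, and the baseline below are illustrative assumptions and are not values taken from the prototype.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth in metres of one matched point in a rectified stereo pair.

    Assumes parallel, identical cameras (a simplification of the
    general epipolar geometry). All parameters are hypothetical.
    """
    disparity = x_left - x_right  # horizontal shift in pixels
    if disparity <= 0:
        # Zero disparity carries no depth information, matching the
        # stereoptic failure case noted in the text.
        return None
    return focal_px * baseline_m / disparity


if __name__ == "__main__":
    # Hypothetical figures: 500 px focal length, 60 mm baseline.
    print(depth_from_disparity(140.0, 128.0, 500.0, 0.06))
    print(depth_from_disparity(100.0, 100.0, 500.0, 0.06))
```

Because depth is inversely proportional to disparity, a wider baseline improves depth resolution for distant objects, at the cost of the harder correspondence search noted above.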
IV HARDWARE ARCHITECTURE
This section presents the hardware architecture, and the method used to pipeline image data through the memory buffering system. There are four hardware peripherals: two (the CIS pair) write data to the memory buffer, and two (the VGA display and the USB device attached to the PC) read from it. The different read/write rates of the peripherals to the common SRAM are further buffered using dedicated DPRAMs for each peripheral. The dual frameset data from the CIS pair is transferred to the SRAM via DPRAM and, when required, read from the SRAM buffer for display on a VGA screen, again via a DPRAM. A third SRAM to DPRAM interaction reads a sample dual frameset for USB data transfer to a personal computer (PC) for processing.
A Field Programmable Gate Array (FPGA) controls the elements of the system, provides bus selection, and determines the address location for reads from and writes to the SRAM. It also provides the DPRAM interfaces for each of the four hardware peripheral devices. Each of the hardware peripherals has 2048 bytes of allocated FPGA DPRAM for interfacing with the SRAM data bus, each referred to here as a "block" of data. The DPRAM consists of a common matrix of data that is independently accessed from two ports (Port A and Port B). Each port is clocked separately, can have a different data bus width, and can be written to and read from simultaneously. In the event of a data bus contention, that is, reading and writing the same byte within the common matrix, the DPRAM is configured to read first and write second.
Figure 3 illustrates a DPRAM interconnect for a CIS. The DPRAM configurations selected for the two CIS are identical. Port A, with an 8-bit data bus, has 2048 addresses (000 to 7FF) and is clocked by the data valid signal from each CIS whilst image data (not header data) is presented on the CIS data bus. Port B, with a 16-bit data bus, has 1024 addresses (000 to 3FF) and is clocked from the SRAM data valid signal during the write to SRAM. The parity bit per byte available at each port is ignored.
The DPRAM undertakes 32 write cycles to transfer a complete image from CIS to SRAM, since each full DPRAM contains the data for 8 rows of the 256 x 256 image. Each write time is limited by the SRAM access time and is constant at 2048 x 10 ns + 220 ns = 20.7 µs.
The 220ns offset accounts for the SRAM set up for write, and exit from write. Whilst the SRAM write time is constant, the time between sequential writes is dependent on the length of exposure per frame. The pixel address value, and the CIS frame rate determine when to empty the DPRAM to SRAM. Emptying too early results in the upper address bytes of the block retaining data from the previous block. Emptying too late results in the upper address bytes of the block obtaining data from the next block.
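The block figures quoted in this section can be checked with a short calculation. The sketch below uses only the stated values (2048-byte blocks, a 256 x 256 image, 10 ns SRAM access, 220 ns overhead); the function names and the way a pixel index is split into a block number and a Port A address are illustrative assumptions, not a description of the FPGA logic.

```python
IMAGE_ROWS = 256
IMAGE_COLS = 256
BLOCK_BYTES = 2048        # Port A depth of one CIS DPRAM
SRAM_ACCESS_NS = 10       # access time per block address
WRITE_OVERHEAD_NS = 220   # set-up for write plus exit from write

ROWS_PER_BLOCK = BLOCK_BYTES // IMAGE_COLS       # image rows held per block
BLOCKS_PER_IMAGE = IMAGE_ROWS // ROWS_PER_BLOCK  # DPRAM-to-SRAM writes per image


def block_write_time_us():
    """Time to empty one full DPRAM block into the SRAM, in microseconds."""
    return (BLOCK_BYTES * SRAM_ACCESS_NS + WRITE_OVERHEAD_NS) / 1000.0


def split_pixel_address(row, col):
    """Return (block number, Port A address) for a pixel (assumed layout)."""
    pixel = row * IMAGE_COLS + col
    return pixel // BLOCK_BYTES, pixel % BLOCK_BYTES


if __name__ == "__main__":
    print(ROWS_PER_BLOCK, BLOCKS_PER_IMAGE, block_write_time_us())
    print(split_pixel_address(8, 0))
```

Under these assumptions, the first pixel of row 8 is the first byte of the second block, which is the point at which the first block must already have been emptied to the SRAM.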
The dual image required for the VGA display is collected from the SRAM using a 2048 byte DPRAM arranged in a 32-bit read and an 8-bit write configuration, (unlike the CIS, which is a 16-bit read). The DPRAM has the capacity for four VGA lines each of which is comprised of two CIS lines and therefore interlaced. It will take 20.48 µs to refresh its data contents from the SRAM, and deplete its contents over a time period of 124 µs. It has a write/read duty cycle of 14.2%, which is constant, unlike the other peripherals.
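The quoted duty cycle can be reproduced if it is read as the refill time expressed as a fraction of one complete refill-plus-depletion period. That interpretation is an assumption; the sketch below simply checks that it yields the stated 14.2% from the stated 20.48 µs and 124 µs.

```python
def refill_duty_cycle(refill_us, drain_us):
    """Fraction of one refill-plus-drain period spent refilling the DPRAM.

    Assumes the two phases run back to back; this is an interpretation
    of the quoted figure, not a statement of the FPGA timing.
    """
    return refill_us / (refill_us + drain_us)


if __name__ == "__main__":
    duty = refill_duty_cycle(20.48, 124.0)
    print(f"{duty * 100:.1f}%")
```

For the remaining fraction of each period the VGA path leaves the SRAM data bus free for the CIS writes and USB reads.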
The DPRAM for the USB device is identical to that of the VGA DPRAM, again taking 20.48 µs to refresh its data contents from the SRAM. Its rate of depletion solely depends on the rate at which the host PC requires the data. Vigilance is required on triggering the refresh from SRAM task. Refreshing too early will cause the last address byte to be overwritten before it has been read, and refreshing too late will cause the first address byte of the previous block to be retained as the first byte of the next block.
V CMOS IMAGE SENSOR
Each CIS has an independent clock running at 25 MHz and is configured over an I2C bus by the FPGA. The row and frame timing is initiated by the HSync and VSync signals applied from the FPGA to the CIS. The KAC-9647 image sensor has a maximum listed frame rate of 68 fps for a 640 x 480 frame. A frame size of 256 rows by 256 columns has been selected in the CIS configuration, increasing the maximum possible frame rate to 241 fps. The primary reason for selecting this frame size is that it simplifies the process of assigning each pixel to an address. The pixel address is generated from a count of Data Valid rising edges, ignoring edges that are not image data, such as those from the row header. The CIS in turn provides image data to the FPGA on a 10-bit data bus that is reduced to 8 bits by discarding the 2 least significant bits. This data is clocked into a DPRAM using the Data Valid leading edge, with the interpreted pixel count as the dual-port address location.
Figure 4 demonstrates the horizontal timing (row time length) of a CIS. To increase the overall exposure time by increasing the row time, the trailing edge of HSync is delayed. This is the method implemented for this hardware configuration, although there are alternatives. Figure 5 demonstrates the vertical timing (frame time length) of a CIS. To increase the overall exposure time by increasing the frame time, the trailing edge of VSync is delayed.
The pixel location within a 256 x 256 CIS image provides the address applied to the 11-bit address bus of Port A of its corresponding DPRAM. The three most significant bits of this address select one of eight rows, whereas the lower eight bits denote the column position of the pixel. The DPRAM has no concern with the remaining upper five bits of the 16-bit pixel address; these upper bits determine the block position within the SRAM page.
Each CIS has 51 parameters that are accessed by the FPGA CIS interface via an I2C bus. These parameters can be changed using the push button and toggle switches on the Digilent Spartan 3 development board. The values of the switch settings are displayed on the VGA screen, the seven-segment display being disconnected to make way for the 12 IO required for the USB device.
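The reduction of each 10-bit sample to 8 bits amounts to a right shift by two. The sketch below is a software illustration of that single step only; the function name and the range check are assumptions added for clarity.

```python
def to_8bit(sample_10bit):
    """Drop the 2 least significant bits of a 10-bit sensor sample."""
    if not 0 <= sample_10bit <= 0x3FF:
        raise ValueError("expected a 10-bit value")
    return sample_10bit >> 2


if __name__ == "__main__":
    for sample in (0x000, 0x003, 0x200, 0x3FF):
        print(hex(sample), hex(to_8bit(sample)))
```

Discarding the low bits keeps the full intensity range while reducing each pixel to one byte, which matches the 8-bit Port A data bus.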
VI SRAM MAPPING
The SRAM configuration used in this application is a pair of 256K x 16-bit arrays (IS61LV25616AL-10T). The arrays have a combined capacity of 1 Mbyte with an access time of 10 ns. Both devices share a common 18-bit address bus (00000h to 3FFFFh), and common read/write and output enable signals. Due to this restriction, the SRAMs cannot operate independently, i.e. one array reading whilst the other array is writing. As a result, and with the objective of attaining the maximum data throughput rate, both arrays have been cascaded to provide a 32-bit wide data bus.
The SRAM is the hub for image data transfer: all data captured by the dual CIS (DCIS) is routed through the SRAM for access by the USB device and VGA display. The interface between each peripheral and the SRAM is a dedicated DPRAM, allowing different read/write rates over varying data bus widths. With concurrent operation of DCIS capture, VGA display, and USB transfer, regulation of read/write access is required to inhibit bus collision on the common SRAM data bus. Figure 7 illustrates the SRAM data bus layout, where peripheral access is determined and selected by the FPGA.
Figure 7: SRAM data bus configuration.
The SRAM has the capacity to store up to eight pairs of captured images. The page selected for write by the DCIS is auto incrementing, but has the capability of skipping a page, if for example the USB device has been halted. The VGA and USB read will, on rollover to a new page, select the most recent complete page updated by the DCIS.
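The eight-page capacity follows from the device sizes and the 256 x 256 frame. The sketch below reproduces that arithmetic and adds a hypothetical page-advance rule; treating a skipped page as one still held by a halted reader is an assumption made for illustration, as are all names.

```python
SRAM_BYTES = 2 * 256 * 1024 * 2   # two devices of 256K x 16 bits
IMAGE_BYTES = 256 * 256           # one 8-bit 256 x 256 frame
PAGE_BYTES = 2 * IMAGE_BYTES      # one page holds a left/right pair
PAGES = SRAM_BYTES // PAGE_BYTES  # pages available in the SRAM


def next_write_page(current, busy_page=None):
    """Advance the DCIS write page, skipping one page held by a reader.

    The skip rule is an assumption: a page that a halted reader (for
    example the USB device) is still using is not overwritten.
    """
    candidate = (current + 1) % PAGES
    if candidate == busy_page:
        candidate = (candidate + 1) % PAGES
    return candidate


if __name__ == "__main__":
    print(PAGES, next_write_page(7), next_write_page(2, busy_page=3))
```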
VII USB DEVICE
The entire hardware apparatus becomes a USB device when interfaced with an FT245, allowing a PC to act as the USB host and providing a platform from which the end user may apply a choice of image processing methods to the delivered image pair. The FT245 is a USB 2.0 compliant interface to a FIFO buffer accessed from a bi-directional, half-duplex, 8-bit data bus. The FT245 is capable of reading from its associated DPRAM at 6.25 Mbytes/s, giving 47.7 dual frames per second (dfps); however, the USB engine (full speed) limits this rate to 11.4 dfps.
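Both frame-rate figures can be reproduced by dividing a byte rate by the size of one dual frame (two 256 x 256 8-bit images). The 1.5 Mbytes/s value below is an assumption: it is the raw 12 Mbit/s full-speed USB signalling rate with no allowance for protocol overhead, chosen because it matches the quoted 11.4 dfps.

```python
DUAL_FRAME_BYTES = 2 * 256 * 256  # left and right 8-bit frames


def dual_frames_per_second(bytes_per_second):
    """Dual frames per second sustained at a given byte rate."""
    return bytes_per_second / DUAL_FRAME_BYTES


if __name__ == "__main__":
    fifo_rate = dual_frames_per_second(6.25e6)      # FT245 FIFO side
    usb_rate = dual_frames_per_second(12e6 / 8)     # assumed raw USB rate
    print(f"{fifo_rate:.1f} dfps, {usb_rate:.1f} dfps")
```

The comparison shows that the USB link, not the FIFO interface to the DPRAM, sets the achievable transfer rate to the host.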
The data bus side of the FT245 has two status signals to indicate that (a) the host can accept more data from the data bus, or (b) the device has received data from the host that has yet to be read. There is no feedback regarding the quantity of data to be read or written; this information is derived from the interfaced DPRAM address. The host must read available data, as the data bus will not overwrite unread data. Figure 8 illustrates the internal FPGA configuration, permitting the dual image capture system to act as a USB device capable of transferring data to the USB host (PC). The bulk of data transfer is image data from the DPRAM to the FT245 data bus. There are instances when the host needs to send a CIS parameter change, or an address status update, to the FPGA in the form of a 256-byte datagram. This datagram, whilst available on the FT245 bus, is not directed to the DPRAM; it is sent to a USB Control entity that applies the enclosed command. The USB Control ignores the preamble, acknowledges the next block number that the host will require and, finally, updates each CIS control with its associated parameter list.
The device then returns the datagram to the host with the updated variables. The host cannot otherwise determine whether received content is image data or command data, so a preamble of 32 bytes (eight repetitions of 0000FFFFh) marks the data as control/status rather than image data. The preamble relies on the unlikelihood that neighbouring same-coloured pixels of the Bayer array would have alternating minimum/maximum values.
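On the host side, the preamble check could look something like the sketch below. The byte order of 0000FFFFh on the wire (00 00 FF FF) and the function name are assumptions; only the 32-byte preamble length and the 256-byte datagram size come from the text.

```python
PREAMBLE = bytes.fromhex("0000FFFF") * 8  # eight repetitions, 32 bytes
DATAGRAM_BYTES = 256


def is_control_datagram(buf):
    """True if buf looks like a control/status datagram rather than image data.

    Assumes the datagram arrives as one 256-byte buffer starting with
    the preamble, with 0000FFFFh sent most significant byte first.
    """
    return len(buf) == DATAGRAM_BYTES and buf.startswith(PREAMBLE)


if __name__ == "__main__":
    control = PREAMBLE + bytes(DATAGRAM_BYTES - len(PREAMBLE))
    image = bytes(range(256))
    print(is_control_datagram(control), is_control_datagram(image))
```

A check of this kind can in principle misclassify image data that happens to contain the same 32-byte pattern, which is why the text appeals to how unlikely that pattern is in Bayer data.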
VIII VGA OUTPUT
The image capture of the DCIS is read from the SRAM and transferred, via its associated DPRAM, to the 10-bit RGB graphics card (using the eight MSBs) to generate an analogue VGA output. The image pair is normally displayed in raw Bayer grid format, direct from each CIS, prior to any image processing. The VGA format, with its 480 x 640 visible window, is in fact a 525 x 800 matrix. The matrix requires a constant refresh period of 16.8 ms.
IX RESULTS
Figure 10 illustrates a stereo capture of a distant pair of folders and a coil of wire (used as a subject for test and analysis). The warping of the image in Figure 9 is due to the convex CRT surface, not barrelling due to the CIS lens. At the top of the images there appears a strip that should be at the bottom of the image. This originated as an indexing error and was retained in the test to demonstrate the size of a block of DPRAM data per image. The image on the left is in Bayer grid array format, whereas the image on the right has been converted to greyscale, without interpolation, using the FPGA.
Figure 11 (USB to PC data) illustrates an almost canvas-like texture on the left image that is not present in the right, an artefact of displaying Bayer format without colour. The vertical edges in the right hand image appear jagged. This is due to the synchronised cameras sharing a common data strobe while running from separate clocks, which causes a phase shift. This has been corrected using a unique data strobe for each CIS.
Figure 12 shows the images imported into MATLAB via the USB interface. The left hand image displays a form of lattice interference, or noise. This is due to aliasing that occurs in MATLAB figures when an image size is changed by dragging a corner or by forced resizing to fit within a plot boundary. Figures 13 and 14 provide overviews of the hardware layout. Table 1 demonstrates that approximately one third of the resources in the relatively small Xilinx XC3S200 FPGA are used (without correspondence algorithm implementation).
X CONCLUSION
This paper presents a low-cost FPGA implementation of a real-time stereoscopic image capture system. Its novelty lies in the ability to deliver a common captured image pair concurrently over synchronous (VGA) and USB protocols. The system demonstrates a method of applying stereoptic research and development to low-end FPGA devices and is a suitable platform for validation of firmware and software multidimensional image processing by comparative methods.
Future work will implement high-speed USB, increasing the throughput to 364 dfps. Further development of the MATLAB user interface will be performed to enable control of the CMOS image sensor parameters and to implement a series of real-time image processing algorithms.

