Abstract-This paper presents a real-time trinocular disparity processor. The core module performs pairwise segmented window matching for both the center-right and center-left image pairs as well as for their scaled-down image pairs. Combining the resulting cost functions yields nine different cost curves. A hierarchical classifier with a two-level architecture selects the most promising disparity value using information provided by the calculated cost curves and the pixel's spatial neighborhood. The disparity processor has been evaluated with an indoor dataset and with a real-time implementation using an FPGA and three cameras. Special care has been taken to reduce the memory footprint so that the processor does not need external memory.
I. INTRODUCTION
Trinocular vision makes use of three cameras to calculate a disparity space image (DSI). The DSI is generated by pairwise matching of the images from the different cameras, using a local window-based stereo matching architecture.
An improvement of occlusion handling in trinocular vision compared to stereo vision is achieved by Mozerov [1]. The main idea is based on the assumption that a region occluded in one matched stereo pair (center-left images) is in general not occluded in the opposite matched pair (center-right images). A global optimization technique is used to derive the composite DSI. Bidirectional matching using trinocular stereo is used by Ueshiba [2] to detect half-occlusions and to discard false matches. It uses a cumulative cost function derived from a summation of both cost curves. This paper likewise calculates several DSIs. However, instead of combining them, a hierarchical classifier is used to select the most likely disparity for each pixel in the final DSI. The matching algorithm is based on the adaptive-weight algorithm proposed by Yoon [3], which adjusts the support weight of each pixel in a fixed-size window. The support weights depend on the color and spatial difference between each pixel in the window and the center pixel. Dissimilarities are computed based on the support weights and the plain similarity scores. Their experiments indicate that a local stereo matching algorithm can produce depth maps similar to those of global algorithms. A hardware implementation using the same ideas is published by Motten [4].
For each matching result, a confidence metric is calculated. A good comparison between different confidence metrics can be found in the evaluation paper of Hu [5]. Confidence metrics suitable for hardware implementation can be found in [6], whose authors conclude that neighboring pixels contain valuable information to distinguish good matches from bad ones.
Recently, many stereo matching systems have been proposed for hardware implementation. A real-time FPGA-based stereo vision system that makes use of the census transform is presented by Jin [7]. Their system includes all the pre- and post-processing functions, such as rectification, LR-check and uniqueness test, in a single FPGA. Another extensive implementation can be found in [8]. It divides the problem into two parts: first, a rough depth map is constructed using a segmentation-based sum of absolute differences (SAD) window comparison; second, a disparity refinement module identifies false matches and replaces them with new estimates. Hardware implementations of a trinocular disparity processor are scarce. A sum of SADs implementation with fixed windows can be found in [9]. This paper combines the strengths of an advanced stereo vision system with a two-scale adaptive window SAD incorporated in a trinocular setup.
II. SYSTEM OVERVIEW

A. General Architecture
The trinocular disparity processor takes three images captured by three cameras that are vertically aligned and horizontally offset. Objects therefore appear on the same horizontal line (the epipolar line) in all images. The horizontal distance between the same object in the center image and in the left (or right) image is called the disparity. If the setup is calibrated correctly, the disparity of an object in the center-left and the center-right image pair should be the same. This characteristic can be used to discard false matches using bidirectional matching [2] or to improve the quality of the DSI, especially in occluded regions [1].
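The consistency property described above can be illustrated with a minimal sketch. It assumes two disparity rows, one from the center-left and one from the center-right pair, already expressed in the same units. The function name, the `tolerance` parameter and the use of `None` for rejected pixels are hypothetical and are not taken from the processor described in this paper.

```python
def cross_check(disp_cl, disp_cr, tolerance=1):
    """Keep a disparity only where both image pairs agree.

    disp_cl, disp_cr: disparities of one image row obtained from the
    center-left and center-right pair. tolerance is an assumed
    threshold in pixels.
    """
    checked = []
    for d_left, d_right in zip(disp_cl, disp_cr):
        if abs(d_left - d_right) <= tolerance:
            checked.append(d_left)   # both pairs agree: accept
        else:
            checked.append(None)     # disagreement: possible occlusion or false match
    return checked


print(cross_check([5, 5, 12, 7], [5, 6, 3, 7]))  # [5, 5, None, 7]
```

A real system would typically try to recover rejected pixels from the pair in which they are not occluded instead of only discarding them.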
The architecture consists of three main blocks. The first block captures the pixel streams, generates the scaled images and places them in multiple on-chip parallel memories. The second block performs a pairwise window matching of the different streams (see Fig. 1) using a binary [...] cost aggregation [8]. The third block calculates [...] for each data stream and selects the final disparity [...]. For every window that needs to be [...] calculation is performed. The larger the disparity [...], the more SAD calculations are needed. The [...] that contains a SAD score for each disparity [...] (starting from 0); this array is also known as [...]. In this architecture, nine different cost curves [...] each pixel in the DSI (1).
In this paper, C1 stands for the lowest minimum of the [...] (Cost Curve). C2 stands for the [...] SAD score, and so on. Their corresponding [...] indicated by D1 and D2. Most matching algorithms [...] the disparity from the cost curve using a "Winner-Takes-All" (WTA) approach. Doing so, the minimum of the [...] will become the calculated disparity D1.
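As an illustration of a cost curve and of WTA selection, the following sketch computes a SAD score per candidate disparity and extracts the two lowest scores with their disparities. It is a simplified model under stated assumptions: the window is one-dimensional, the matching pixel in the other image is assumed to lie at column `x - d`, and C2 is taken simply as the second-lowest score. All function names are hypothetical.

```python
def cost_curve(center_row, other_row, x, half_window, max_disparity):
    """SAD score for every candidate disparity 0..max_disparity at column x."""
    curve = []
    for d in range(max_disparity + 1):
        sad = 0
        for k in range(-half_window, half_window + 1):
            # Assumed geometry: the match in the other image sits d pixels to the left.
            sad += abs(center_row[x + k] - other_row[x + k - d])
        curve.append(sad)
    return curve


def two_lowest(curve):
    """Return (C1, D1) and (C2, D2): the two lowest SAD scores and their disparities."""
    order = sorted(range(len(curve)), key=lambda d: curve[d])
    d1, d2 = order[0], order[1]
    return (curve[d1], d1), (curve[d2], d2)


center = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
other = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10]   # same pattern, shifted by 2
curve = cost_curve(center, other, x=4, half_window=1, max_disparity=3)
(c1, d1), (c2, d2) = two_lowest(curve)
print(curve)   # [120, 120, 0, 120]
print(d1)      # 2, the WTA disparity
```

The difference between C1 and C2 is one commonly used indication of how distinctive a match is, which is why both values are kept here.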
III. HIERARCHICAL CLASSIFICATION
In the previous section it is explained that nine disparity values are generated for each pixel (1). In order to [...] them for generating the DSI, a two-level hierarchical classifier is constructed (Fig. 2). In the first level, the disparity values are investigated independently. For each disparity value a binary confidence is constructed using the methods presented [...]. These confidences are passed on to the second level classifier, which selects the disparity to use, or indicates that no disparity has been found.
According to [6], the most important [...] Segmentation Size (SEG) and the [...]s Differences Binary Window [...]. A decision tree (DT) is chosen as [...] stream individually.
The second level classifier uses the generated binary [...] with the agreement between the [...]. This new feature is called the Sum of [...] (SSDD); it calculates the depth [...]ent disparity streams taking the [...] (2).
The goal of this classification level is to select the most promising disparity value (3). The input of this classification level is the SSDD for each disparity stream. The output of this classification level is a disparity selection. A confidence value is afterwards generated using the same method as for each individual stream.
An exhaustive search is performed in order to know which combination of streams provides the highest disparity improvement. This process will be elaborated in future writings. From Fig. 3 we can see that the addition of extra streams improves the quality of the DSI. The trinocular setup improves the DSI most noticeably at occluded regions. The scaled images improve the disparity map at parts with little texture.
[...] Tsukuba dataset [10]. Comparison of [...] DSI (Left) and DSI generated from the [...]0 and CR1 data streams (Right).
IV. SYSTEM DESIGN
The hardware architecture consists of three main modules. First, a filter and subsampling module has been added to the pre-processing module [8] so that a scaled image is generated with one-fourth the size of the original image. Second, the window matching module is modified from [8] to allow for multiple data stream matching. Third, a hierarchical classification module is constructed to select the most promising disparity from the different disparity results.
A. Pre-Processing Module
The pre-processing module consists of four different entities for each pixel stream: first a demosaicing algorithm is used to reconstruct the color image, next a rectification module is used to remove lens distortion and perform trinocular calibration, and lastly the image is filtered and downsampled to generate a scaled image.
Pixels generated by the camera are formatted in a Bayer pattern consisting of the four colors Red (R), Green1 (G1), Blue (B) and Green2 (G2), representing the three color filters.
The demosaicing algorithm estimates the color components for each pixel. Using linear interpolation, the missing RGB colors are reconstructed from the adjacent pixels [8]. The proposed architecture makes use of the YCrCb color space: the luminance (Y) values are used to compare the two input streams, while the chrominance values (Cr, Cb) are used to construct the binary mask window. Hence, the reconstructed RGB color space needs to be transformed into the YCrCb color space (8).
Two different kinds of distortion are present in a trinocular camera setup: the lens distortions and the misalignment of the three cameras. Since the search space is restricted to the epipolar line, both distortions must be resolved before matching can be performed. The intrinsic and extrinsic parameters of the individual cameras and the transformation matrix of the trinocular setup are determined offline using images of checkerboard patterns [11]. These parameters are then used to construct the x and y mapping coordinates for each pixel in the image. The rectification module uses those coordinates to rectify the images in real time [6].
The rectified pixel stream is passed through a 3x3 mean filter and downsampled by a factor of two. The original pixel stream is annotated with level 1 (L1), while the scaled pixel stream is annotated with level 0 (L0).
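The color transform and the scaling step can be sketched in software as follows. The conversion coefficients are the common full-range ITU-R BT.601 values, which is an assumption: the coefficients of the paper's own transform are not reproduced here and may differ. The border handling of the filter (clamping to the image edge) and both function names are likewise hypothetical.

```python
def rgb_to_ycrcb(r, g, b):
    """Full-range BT.601 conversion (assumed coefficients), returns (Y, Cr, Cb)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return round(y), round(cr), round(cb)


def mean3x3_downsample(img):
    """3x3 mean filter, then keep every second pixel in x and y.

    img is a list of rows of integer intensities. Neighbors outside the
    image are clamped to the nearest edge pixel (an assumed border rule).
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total // 9)
        out.append(row)
    return out


print(rgb_to_ycrcb(255, 255, 255))                # (255, 128, 128)
print(mean3x3_downsample([[9] * 4 for _ in range(4)]))  # [[9, 9], [9, 9]]
```

Halving both dimensions gives a scaled image with one-fourth the number of pixels, which matches the size stated for the scaled stream.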
B. Window Matching Module
The pixel streams originating from the right and left cameras are compared with the stream of the center camera using a segmentation-based SAD calculation (Fig. 5). During every clock cycle, a window of the center camera is compared with four windows of the left or right camera. Since four successive pixels are stored in one memory location, one memory read accesses four pixels; hence four comparison modules run in parallel.
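A segmentation-based SAD can be sketched as a SAD that is restricted by a binary mask. In the sketch below the mask marks pixels whose chrominance is close to that of the window's center pixel, and only those pixels contribute their luminance difference. The mask rule, the `threshold` parameter and the function names are assumptions made for illustration; the actual rule is the one of the architecture in [8].

```python
def binary_mask(chroma_window, threshold):
    """1 where a pixel's (Cr, Cb) is close to the center pixel's, else 0."""
    n = len(chroma_window)
    cr0, cb0 = chroma_window[n // 2][n // 2]
    return [[1 if abs(cr - cr0) + abs(cb - cb0) <= threshold else 0
             for (cr, cb) in row]
            for row in chroma_window]


def masked_sad(center_luma, other_luma, mask):
    """Sum of absolute luminance differences over the masked pixels only."""
    return sum(m * abs(a - b)
               for row_a, row_b, row_m in zip(center_luma, other_luma, mask)
               for a, b, m in zip(row_a, row_b, row_m))


# 3x3 window whose right column belongs to a differently colored object.
chroma = [[(100, 100), (100, 100), (200, 50)] for _ in range(3)]
center_luma = [[10, 10, 80] for _ in range(3)]
other_luma = [[12, 10, 20] for _ in range(3)]

mask = binary_mask(chroma, threshold=20)
print(mask)                                        # [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
print(masked_sad(center_luma, other_luma, mask))   # 6
```

Without the mask the same windows would give a SAD of 186, dominated by the column that belongs to the other object.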
On every clock cycle, the stream selection unit (SSU) determines where each data stream is written to and which windows are compared.
The clock frequency of the window matching module directly controls the possible disparity search width of the trinocular matching architecture and can be adapted to the available resources. The higher the ratio between the window matching clock and the pixel clock, the more comparisons can be executed per pixel. In the example in Fig. 6, the window matching module is clocked twenty-four times faster than the pixel streams.
On each clock cycle (CC), the comparison module compares the reference window with four consecutive windows (Fig. 5). The lowest SAD score and its corresponding index are saved in a register, so that on the next clock cycle this lowest SAD score can be compared against the SAD scores of the next four windows. When the end of a search window is reached, the index indicates the disparity result and a new search window is initiated. In our example, the center image is compared with the right image in the first eight clock cycles and with the left image in the next eight CCs. This architecture makes it possible to easily change the disparity search width and the compared data streams for each pixel in the DSI. By adapting the SSU, it is possible to switch between a trinocular disparity search width of thirty-two and a stereo disparity search width of one hundred twenty-four without changing the architecture.
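The register-based search described above can be modeled in software as a running minimum that is updated once per group of four SAD scores. This is a behavioral sketch only, not a description of the hardware; the function name and the `windows_per_cycle` parameter are hypothetical, and ties are resolved in favor of the lowest index as an assumption.

```python
def search_disparity(sad_scores, windows_per_cycle=4):
    """Model of the running-minimum search over one search window.

    Returns (best_index, best_score, cycles). Each loop iteration stands
    for one clock cycle in which windows_per_cycle scores are examined.
    """
    best_score = None
    best_index = None
    cycles = 0
    for start in range(0, len(sad_scores), windows_per_cycle):
        group = sad_scores[start:start + windows_per_cycle]
        for offset, score in enumerate(group):
            # Strict comparison keeps the earlier index when scores tie.
            if best_score is None or score < best_score:
                best_score = score
                best_index = start + offset
        cycles += 1
    return best_index, best_score, cycles


scores = [50] * 32      # search width of thirty-two
scores[13] = 7
print(search_disparity(scores))  # (13, 7, 8)
```

With four windows per cycle, a search width of thirty-two takes eight cycles, which matches the eight clock cycles per image pair in the example above.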
C. Hierarchical Classification Module
The hierarchical classification module generates the features used during the classification phase and performs the two classification steps. The first level classifier calculates the confidence of each stream in the selection (4, 5). Different thresholds are selected for each stream, but the main structure of the classifier remains the same. The second level classifier selects the most promising disparity stream for the final DSI (6, 7, 8).
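The two-level structure can be sketched as follows, under several loudly stated assumptions. The first level is modeled as a fixed decision structure with per-stream thresholds on two features: a segment size `seg` and a hypothetical score gap `gap`. The second level picks, among the confident streams, the disparity that disagrees least with the others. This agreement measure is a stand-in chosen for illustration and is not the SSDD feature of Section III; all names, features and thresholds are hypothetical, and all disparities are assumed to be expressed in the same units.

```python
def level1_confident(seg, gap, thresholds):
    """First level: same structure for every stream, only thresholds differ."""
    min_seg, min_gap = thresholds
    if seg < min_seg:
        return False          # too little support in the window
    return gap >= min_gap     # minimum must be distinctive enough


def level2_select(disparities, confident):
    """Second level: choose one disparity, or None if no stream is confident."""
    candidates = [i for i, ok in enumerate(confident) if ok]
    if not candidates:
        return None

    def disagreement(i):
        # Stand-in agreement measure: distance to the other confident streams.
        return sum(abs(disparities[i] - disparities[j])
                   for j in candidates if j != i)

    best = min(candidates, key=disagreement)
    return disparities[best]


print(level1_confident(seg=30, gap=12, thresholds=(20, 10)))    # True
print(level2_select([12, 12, 40], [True, True, True]))          # 12
print(level2_select([12, 12, 40], [False, False, False]))       # None
```

Returning `None` mirrors the case in which the second level indicates that no disparity has been found.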
Figure 6. Window matching of different data streams.
V. IMPLEMENTATION
The architecture and methods presented in this paper have been implemented on an FPGA system, based on an Altera Cyclone IV with 114,480 logic elements and 432 memory blocks. The sources of the input streams are three cameras with a resolution of 640x480 and a pixel clock of 16 MHz resulting in a refresh rate of 52 Hz. The current implementation consists of the proposed design using a 7x7 binary adaptive window SAD with a window matching clock of 96 MHz.
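The figures quoted above can be cross-checked with a short calculation. The sketch assumes that every pixel clock cycle carries one active pixel (blanking intervals are ignored) and that four windows are compared per matching clock cycle, as described for the window matching module. Both function names are hypothetical.

```python
def refresh_rate_hz(pixel_clock_hz, width, height):
    """Frames per second if every pixel clock cycle delivers one pixel."""
    return pixel_clock_hz / (width * height)


def comparisons_per_pixel(matching_clock_hz, pixel_clock_hz, windows_per_cycle=4):
    """Window comparisons available during one pixel period."""
    return (matching_clock_hz // pixel_clock_hz) * windows_per_cycle


print(round(refresh_rate_hz(16_000_000, 640, 480), 2))   # 52.08
print(comparisons_per_pixel(96_000_000, 16_000_000))     # 24
```

A 96 MHz matching clock is six times the 16 MHz pixel clock, so under these assumptions twenty-four window comparisons fit into one pixel period.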
The architecture has been designed to reduce memory usage, so there is no need for external memories. Avoiding external memory has the additional advantage that the latency between input frame and output frame becomes minimal, which makes the system suitable for incorporation in real-time control loops. In addition to the evaluation presented in section II, the system has also been tested in real-life environments.
VI. CONCLUSIONS
A trinocular disparity processor has been proposed. We investigated nine cost curves resulting from pairwise comparison of the images of three cameras. Each data stream has been investigated independently of the others, and ultimately a hierarchical classification algorithm chooses the most promising disparity value.
For each of the nine cost curves, a classification algorithm is trained in order to provide a confidence indication for their disparity values. These confidences are passed on to the second level classifier which selects the disparity to use, or indicates that no disparity has been found.
The selection of classification algorithms has been used as a guideline for the implementation in an FPGA. From the results we can conclude that the quality of the disparity space image increases by using more cost curves from a trinocular camera.
Due to the adaptability of the window matching module and the hierarchical classification structure, the system can easily be expanded with more data streams to further improve the disparity space image.
