Abstract
Introduction
In computer vision, stereo images are captured in a way similar to how human eyes capture scenes. A key challenge in stereo systems is to reduce visual fatigue while maintaining a sufficient sense of 3D realism [1]. In this paper, we propose a real-time virtual re-convergence hardware platform that decreases visual fatigue. To this end, we adopt a virtual view control method that virtually adjusts the optical axes of a stereoscopic camera so that a target object in the divergence zone is brought into convergence [2]. Fig. 1 shows a virtual-view-controlled image obtained from the original view. To create the virtual view image, the virtual view control procedure shifts each image of the stereo pair by an appropriate amount.
Hardware Architecture of Virtual Reconvergence
As shown in Fig. 2 , our virtual re-convergence system is realized with the following sub-systems:
• Image rectification: to increase the accuracy of disparity estimation and decrease the visual fatigue caused by stereoscopic misalignment.
• Disparity estimation: to estimate the disparities in the stereo pair and provide depth information, as a depth map, to the vergence control procedure.
• Depth post-processing: to increase the accuracy of the estimated disparities. We devise a so-called "disparity smoothing" filter that smooths each disparity value using its neighboring disparities.
• Virtual view control: to determine and adjust to the main object that affects visual fatigue. This process uses three major steps: 1) calculate a disparity histogram; 2) sort the disparity values that exceed the threshold value; 3) find the maximum disparity value.
We implemented the virtual re-convergence platform as a dedicated real-time hardware architecture. Fig. 3 shows the overall hardware architecture. The platform consists of an image rectification core, a disparity estimation core, a depth post-processing core, a virtual view control core, various memories, and a memory controller. The memory controller interfaces to external memory and arbitrates between the DDR2 and SRAM memories. The data port of each memory is 32 bits wide.
Image Rectification Core
Fig. 4 shows the data transfer flow of the image rectification procedure. The lookup table memory holds the memory addresses of the rectified image. The pixel value in the input image memory that corresponds to the address in the lookup table is transferred to the rectified image memory. Rectifying high-resolution images requires a large lookup table, which increases delay. Therefore, for real-time processing, we compress the lookup table using differential encoding; an example is shown in Fig. 5.
Fig. 4. Rectification data transfer process
The overall architecture of our rectification core is shown in Fig. 3. The core consists of a dual-port BRAM, a PX/PY calculator, and a PX/PY-to-1-D address converter. The PX/PY calculator acts as a differential decoder for the compressed lookup table read from the dual-port BRAM. The PX/PY-to-1-D address converter determines the destination of each pixel.
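The decoding and address conversion described above can be illustrated with a minimal software sketch. It assumes that the lookup table stores, for each input pixel in raster order, a destination coordinate (PX, PY), and that differential encoding keeps the first value as-is and every later value as a delta from its predecessor. The function names and the table layout are hypothetical; the exact format used in the hardware is not specified here.

```python
def encode_differential(values):
    """Keep the first value and store every later value as a delta."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def decode_differential(deltas):
    """Rebuild absolute values by accumulating the deltas (PX/PY calculator role)."""
    out = []
    acc = 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


def to_1d_address(px, py, width):
    """Convert a 2-D coordinate into a linear memory address."""
    return py * width + px


def rectify(src, px_deltas, py_deltas, width, height):
    """Move each input pixel to the destination given by the decoded table.

    Assumption: entry i of the table belongs to input pixel i in raster order.
    """
    px = decode_differential(px_deltas)
    py = decode_differential(py_deltas)
    dst = [0] * (width * height)
    for i, value in enumerate(src):
        x, y = px[i], py[i]
        if 0 <= x < width and 0 <= y < height:
            dst[to_1d_address(x, y, width)] = value
    return dst


# Usage: a 2x2 image whose table mirrors every row horizontally.
px_table = encode_differential([1, 0, 1, 0])
py_table = encode_differential([0, 0, 1, 1])
mirrored = rectify([10, 20, 30, 40], px_table, py_table, width=2, height=2)
```

Because neighboring table entries usually differ by small amounts, the deltas need fewer bits than absolute coordinates, which is the motivation for compressing the table in this way.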
Disparity Estimation Core
Fig. 6 shows the overall flow of the disparity estimation core. The core performs two major steps: 1) search range estimation and 2) disparity estimation. Accurate disparity estimation requires considering several candidate disparities in the search range. The search range estimation module generates a histogram of the disparity levels found in the search range. It consists of census transform, Hamming distance, adder tree, and winner-take-all (WTA) modules. The search range estimation module removes sparse disparities, and the updated search range is then used to determine the disparity during disparity estimation. The disparity estimation hardware has an architecture similar to that of the search range estimation hardware.
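The two-step flow can be sketched in software as follows. This is a simplified illustration, not the hardware design: it uses a small dense census window with clamped borders, a single image plane, and a hypothetical `min_count` parameter to decide which disparities count as sparse. All function names are assumptions made for the example.

```python
def census(img, x, y, radius):
    """Bit vector for pixel (x, y): 1 where a neighbor is larger than the center."""
    h, w = len(img), len(img[0])
    center = img[y][x]
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            yy = min(max(y + dy, 0), h - 1)  # clamp at the image border
            xx = min(max(x + dx, 0), w - 1)
            bits = (bits << 1) | (1 if img[yy][xx] > center else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two census vectors."""
    return bin(a ^ b).count("1")


def wta_disparity(left, right, x, y, search_range, radius=1):
    """Winner-take-all: the candidate disparity with the lowest Hamming cost."""
    ref = census(left, x, y, radius)
    best_d, best_cost = 0, None
    for d in search_range:
        if x - d < 0:
            continue
        cost = hamming(ref, census(right, x - d, y, radius))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def disparity_histogram(left, right, max_disp, radius=1):
    """Step 1a: histogram of WTA disparities over the full search range."""
    hist = [0] * (max_disp + 1)
    for y in range(len(left)):
        for x in range(len(left[0])):
            d = wta_disparity(left, right, x, y, range(max_disp + 1), radius)
            hist[d] += 1
    return hist


def prune_search_range(hist, min_count):
    """Step 1b: drop sparse disparities (min_count is a hypothetical threshold)."""
    return [d for d, n in enumerate(hist) if n >= min_count]


# Usage: the left image is the right image shifted by two pixels.
right = [
    [3, 8, 1, 9, 4, 7, 2, 6, 5, 0],
    [5, 2, 9, 1, 7, 3, 8, 0, 6, 4],
    [7, 4, 0, 6, 2, 9, 5, 1, 8, 3],
]
left = [[0, 0] + row[:-2] for row in right]
hist = disparity_histogram(left, right, max_disp=4)
reduced_range = prune_search_range(hist, min_count=2)
d = wta_disparity(left, right, 6, 1, reduced_range)  # step 2 on the reduced range
```

Running the second pass over the reduced range evaluates fewer candidates per pixel, which is the benefit of estimating the search range first.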
Fig. 6. Overall flow of disparity estimation core
In our core, an 11×11 window is used for the census transform, as recommended by several previous works [3, 4]. To generate the census transform window in one clock cycle, we use twelve separate line buffers. Eleven buffers compose the window, and the remaining buffer serves as the storing buffer. The first eleven clock cycles are consumed to construct the initial window. After that, only one clock cycle is needed per pixel, because ten rows of the window are reused. Since the distance between two sequential elements within the same window is three pixels, we change the reading order of the buffer as shown in Fig. 7.
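The role of the twelve line buffers can be modeled with a short sketch. It assumes a simple rotation scheme in which the buffer holding the oldest row becomes the next storing buffer; the class name and this rotation policy are illustrative assumptions, and the modified reading order of Fig. 7 is not modeled.

```python
class LineBufferBank:
    """Twelve line buffers: eleven feed the window, one receives the incoming row."""

    def __init__(self, width, window_rows=11):
        self.window_rows = window_rows
        self.buffers = [[0] * width for _ in range(window_rows + 1)]
        self.store = 0       # index of the buffer that receives the next row
        self.rows_seen = 0

    def push_row(self, row):
        """Write one image row into the storing buffer, then rotate roles."""
        self.buffers[self.store] = list(row)
        self.store = (self.store + 1) % len(self.buffers)
        self.rows_seen += 1

    def ready(self):
        """True once enough rows have arrived to fill the window."""
        return self.rows_seen >= self.window_rows

    def window(self):
        """The eleven most recent rows, oldest first (all buffers except the storing one)."""
        n = len(self.buffers)
        return [self.buffers[(self.store - k) % n]
                for k in range(self.window_rows, 0, -1)]


# Usage: feed 13 rows; the window then holds rows 2 through 12.
bank = LineBufferBank(width=4)
for r in range(13):
    bank.push_row([r] * 4)
rows_in_window = [row[0] for row in bank.window()]
```

With this arrangement a new row can be written without disturbing the eleven rows that are currently being read.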
The elements of the census transform window are registers. We use two images and the Y, U, and V planes of each; thus, the proposed architecture has six census transform windows. Fig. 8 shows the hardware architecture for the census transform. Each element of the window is compared with the center element. If an element is larger than the center element, the comparator outputs a '1' bit; otherwise, it outputs a '0' bit. The resulting bit streams (the result of the census transform) are generated for each pixel and stored in a switching buffer. Multiple pixels in the window are processed in parallel. Our census transform generates a 120-bit vector per pixel.
Fig. 9 shows the disparity smoothing used as depth post-processing. We use a 9×9 window for disparity smoothing. The spacing between two elements is four pixels. Given the 9×9 window and the element interval, we set the line buffer size to 64 lines. After the central pixel is compared with the other elements in the image window, the accumulator module adds the selected elements in the disparity window. We use a non-restoring array divider for the unsigned divider module. The divider module requires 16 clocks per pixel.
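A minimal sketch of the disparity smoothing step follows. It assumes that an element is "selected" when its image intensity is within a tolerance of the central pixel, that the four-pixel spacing is a sampling step of four, and that the result is the integer average of the selected disparities. The `tolerance` parameter and the selection rule are assumptions made for the example.

```python
def smooth_disparity(image, disparity, x, y, taps=9, step=4, tolerance=10):
    """Average the disparities of window elements whose intensity resembles the center.

    taps x taps elements are sampled `step` pixels apart around (x, y).
    """
    h, w = len(image), len(image[0])
    half = taps // 2
    center = image[y][x]
    total = 0   # accumulator for the selected disparities
    count = 0   # number of selected elements (the divisor)
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            yy, xx = y + j * step, x + i * step
            if not (0 <= yy < h and 0 <= xx < w):
                continue  # skip elements that fall outside the image
            if abs(image[yy][xx] - center) <= tolerance:
                total += disparity[yy][xx]
                count += 1
    # The center always selects itself, so count is at least 1.
    return total // count  # unsigned integer division


# Usage on a tiny example with a 3x3 dense window.
image = [[10, 10, 200], [10, 10, 200], [10, 10, 200]]
disp = [[4, 4, 50], [4, 8, 50], [4, 4, 50]]
smoothed = smooth_disparity(image, disp, 1, 1, taps=3, step=1)
```

In the example the outlier value 8 at the center is pulled toward its similar neighbors, while the disparities of the bright column are excluded from the average.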
The virtual view control core is simpler than the other cores. The outputs of disparity smoothing are accumulated in the histogram buffer. The comparison module identifies the values in the histogram buffer that are greater than the threshold. The final value in the histogram buffer is the selected disparity of the main object.
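The selection of the main-object disparity and the subsequent image shift can be sketched as follows. The sketch assumes that the threshold applies to histogram counts, that the largest qualifying disparity is chosen, and that the shift is split between the two images so the chosen disparity becomes zero. The function names and the way the shift is divided are hypothetical.

```python
def select_main_disparity(disparity_map, max_disparity, threshold):
    """Histogram the disparities, keep bins above the threshold, return the largest."""
    hist = [0] * (max_disparity + 1)
    for row in disparity_map:
        for d in row:
            hist[d] += 1
    candidates = [d for d, n in enumerate(hist) if n > threshold]
    return max(candidates) if candidates else 0


def shift_row(row, shift, fill=0):
    """Shift a row horizontally by `shift` pixels (positive moves content right)."""
    out = [fill] * len(row)
    for x, value in enumerate(row):
        nx = x + shift
        if 0 <= nx < len(row):
            out[nx] = value
    return out


def reconverge(left, right, disparity):
    """Shift both images toward each other so the given disparity becomes zero."""
    half = disparity // 2
    new_left = [shift_row(r, -half) for r in left]
    new_right = [shift_row(r, disparity - half) for r in right]
    return new_left, new_right


# Usage: an object at x=3 in the left image and x=1 in the right image.
main_d = select_main_disparity([[1, 1, 3], [3, 3, 7]], max_disparity=7, threshold=1)
new_left, new_right = reconverge([[0, 0, 0, 9, 0]], [[0, 9, 0, 0, 0]], 2)
```

After the shift, the object occupies the same column in both images, which corresponds to placing it at the convergence point.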
Implementation Results
Fig. 10 shows our implemented virtual re-convergence hardware platform. Fig. 10(a) shows the stereoscopic imaging camera, built from cameras with identical specifications on a highly adjustable jig. The stereo camera sends each image through an HDMI cable to the FPGA board shown in Fig. 10(b). The board contains the HDMI image capture board, a Xilinx Virtex-5 FPGA chip, and a JTAG interface. The HDMI image capture board has two input ports from the stereo camera and one output port to the 3D TV, as shown in Fig. 10(c). Our system runs at 60 frames per second. Table 1 lists the resource utilization of each module in our system and compares it with an existing system. We reduced the complexity of disparity estimation, which requires the largest amount of logic resources. Table 2 quantitatively compares the disparity estimation speed of the proposed architecture with other implementations. Although the frame rate of the proposed system is lower than that of other systems, its processing speed, measured in million disparity estimations per second (MDE/s), is sufficiently high for real-time operation.
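For reference, MDE/s is commonly computed as the image width times the image height times the number of disparity levels times the frame rate, divided by one million. The sketch below applies this common definition. Only the 60 frames per second figure comes from the text above; the resolution and disparity range in the usage line are hypothetical placeholders, not values of the proposed system.

```python
def mde_per_second(width, height, disparity_levels, fps):
    """Million disparity estimations per second for a given configuration."""
    return width * height * disparity_levels * fps / 1e6


# Usage with placeholder values (640x480 and 64 levels are assumptions).
example = mde_per_second(640, 480, 64, 60)
```

The metric lets systems with different resolutions, disparity ranges, and frame rates be compared on a single scale.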
