Abstract: This paper reports a depth-adaptive sharpness adjustment algorithm for stereoscopic perception improvement, and presents its field-programmable gate array (FPGA) implementation results. The first step of the proposed algorithm is to estimate the depth information of an input stereo video on a block basis. Second, the objects in the input video are segmented according to their depths. Third, the sharpness of the foreground objects is enhanced, and that of the background is maintained or weakened. This paper proposes a new sharpness enhancement algorithm to suppress visually annoying artifacts, such as jagging and halos. The simulation results show that the proposed algorithm can improve stereoscopic perception without intentional depth adjustments. In addition, the hardware architecture of the proposed algorithm was designed and implemented on a general-purpose FPGA board. Real-time processing for full high-definition stereo videos was accomplished using 30,278 look-up tables, 24,553 registers, and 1,794,297 bits of memory at an operating frequency of 200 MHz.
Introduction
As three-dimensional (3D) consumer electronics products, such as 3DTVs, 3D monitors, and 3D smartphones, have become increasingly popular along with the success of 3D movies, 3D image processing has attracted growing interest. Stereoscopic 3DTV can provide viewers with the impression of depth and a greater sense of presence [1]. The main depth cue used by the human visual system comes from the horizontal differences (parallax) between the two ocular viewpoints. To control stereoscopic perception in a 3D display, it is important to adjust the depth, or parallax, properly. Many depth adjustment algorithms have been developed [2] [3] [4] [5] [6] [7] [8]. For example, some researchers, such as Kim and Sohn [2], proposed controlling the depth information based on visual fatigue [2] [3] [4] [5] [6]. Fig. 1 describes the general depth adjustment algorithm. First, the pixel distance between the left-eye view frame (LVF) and right-eye view frame (RVF), i.e., the disparity, is estimated. Note that the disparity corresponds to depth information. Second, the appropriate depth is selected by considering the visual fatigue level or the stereoscopic perception. Third, the pixels in the LVF or RVF are moved according to the adjusted depth. In this step, so-called holes can occur around the shifted pixels. Finally, these holes are filled using appropriate interpolation methods [9].
The main drawback of such a simple parallax shifting method is that it can create losses in the image area due to unavoidable cropping at the screen edges that occurs when eliminating the unpaired points. In addition, all of these hole-filling techniques can lead to a range of distortions, which may be noticeable and visually annoying. In particular, artificially growing the depth may increase the visual fatigue level.
This paper presents a sharpness adjustment algorithm for artifact-free stereoscopic perception improvement without a deliberate depth adjustment for objects in 3D video sequences. First, the disparity for each LVF/RVF pair is estimated on a block basis. Second, object segmentation is performed according to the disparity, so that the foreground and background objects are discriminated. Finally, the sharpness of the foreground object is enhanced, and that of the background objects is maintained or lessened. For this step, a new sharpness enhancement algorithm that mitigates jagging and halo artifacts is presented. The experimental results show that the proposed algorithm enhances depth perception without the visual fatigue caused by an artificial depth adjustment. In addition, the hardware architecture was designed for the proposed algorithm, and the results of its implementation are discussed. The proposed algorithm was implemented successfully on a dedicated field-programmable gate array (FPGA) board operating at 200 MHz in real time using the following resources: 30,278 look-up tables (LUTs), 24,553 registers, and 1,794,297 bits of memory. The implemented hardware will soon be applied to a specific 3D monitor product model.
The remainder of this paper is organized as follows: Section 2 describes the proposed algorithm. Section 3 reports the simulation results. Section 4 summarizes the hardware implementation results, and Section 5 reports the conclusions.
The Proposed Algorithm

Disparity Estimation
For several decades, many disparity estimation algorithms for stereo images have been developed [10] [11] [12] [13] [14]. To enable real-time hardware implementation, this paper adopts a typical block-matching-based disparity estimation algorithm because of its low computational complexity. Note that block matching is performed only in the horizontal direction because the input stereo frames are assumed to be rectified already. The matching block size was set to 16×16, and the searched disparity vector (DV) was assigned to the central 8×8 region of the matching block via proper overlapping with its neighboring matching blocks. A typical three-step hierarchical search was used for further computational reduction. Levels 2, 1 and 0 represent the coarsest, middle, and finest resolutions, respectively. Prior to the search, the frames at the middle and coarsest resolutions, i.e., levels 1 and 2 (named $I^{(1)}$ and $I^{(2)}$, respectively), are produced by downsampling the original frame by 1/2 and 1/4, respectively, in both directions. First, a level 2 search is performed on the 8×8 block. The best DV at level 2, $d^{(2)}$, is obtained by minimizing the sum of absolute differences (SAD) as follows:
$$d^{(2)} = \arg\min_{d \in \Omega^{(2)}} \sum_{(x, y) \in W^{(2)}} \left| I_L^{(2)}(i + x, j + y) - I_R^{(2)}(i + x + d, j + y) \right| \qquad (1)$$
where $I_L^{(2)}$ and $I_R^{(2)}$ represent the LVF and RVF, respectively, at level 2, and $(i, j)$ denotes the coordinate of the upper-left corner pixel in matching block $W^{(2)}$. The search range at level 2, $\Omega^{(2)}$, is set to [-32, 32]. Similarly, the search at level 1 is performed for a local search area centered at $2 \times d^{(2)}$, and the best DV at level 1, $d^{(1)}$, is found. The matching block size at level 1 is 8×8, and $\Omega^{(1)}$ is set to [-1, +1]. Finally, the search at level 0 is performed for a local search area $\Omega^{(0)}$ centered at $2 \times d^{(1)}$, and the final DV, $d = d^{(0)}$, is found. The matching block size is 16×16, and $\Omega^{(0)}$ is set to [-1, +1]. After obtaining the DV for each block, the so-called bidirectional check [15] is performed to investigate the accuracy of block matching. That is, if the target block in the LVF is matched to a particular block in the RVF with the corresponding DV $d$, the best DV of the matched block in the RVF is searched for in a given search area in the LVF. Once this reverse DV has been estimated, it is compared with $-d$. If the two are equal, $d$ is determined to be reliable. Otherwise, the DV of the target block is replaced with the median of the DVs of the neighboring blocks because $d$ may be unreliable. In addition, a morphological closing operation and median filtering are applied to the derived DV map on a 3×3 block basis. In this manner, a block-based disparity map is obtained.
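The single-level SAD search and the bidirectional check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sad`, `best_dv`, `is_reliable`), the plain list-of-rows image representation, and the sign convention for the displacement are assumptions made here for illustration, and the three-level hierarchy, the median replacement, and the 3×3 post-filtering are omitted.

```python
def sad(src, dst, i, j, d, size):
    """Sum of absolute differences between the size x size block of `src`
    whose upper-left corner is at column i, row j, and the block of `dst`
    displaced horizontally by d."""
    total = 0
    for y in range(j, j + size):
        for x in range(i, i + size):
            total += abs(src[y][x] - dst[y][x + d])
    return total


def best_dv(src, dst, i, j, size, search):
    """Horizontal-only search: return the displacement d in [-search, search]
    that minimises the SAD, skipping candidates that leave the frame."""
    width = len(src[0])
    best, best_cost = 0, None
    for d in range(-search, search + 1):
        if i + d < 0 or i + d + size > width:
            continue
        cost = sad(src, dst, i, j, d, size)
        if best_cost is None or cost < best_cost:
            best, best_cost = d, cost
    return best


def is_reliable(left, right, i, j, size, search):
    """Bidirectional check (assumed form): the DV found from LVF to RVF is
    kept only if the reverse search from the matched RVF block returns -d."""
    d = best_dv(left, right, i, j, size, search)
    reverse = best_dv(right, left, i + d, j, size, search)
    return d, reverse == -d
```

For a right frame that is a pure horizontal shift of the left frame, `is_reliable` returns the shift together with `True`; a block flagged `False` would be the one whose DV is replaced by the median of its neighbors.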
Disparity-based Foreground Segmentation
The second step in the proposed algorithm is to extract the dominant foreground object(s) using the DV histogram. Fig. 3 illustrates the extraction process from a typical DV histogram. One feature of the DV histogram is that the DVs of a foreground object are generally located on the right side of the histogram, whereas those of the background object are located on the left side. Another feature of the DV histogram is that the DVs of the background object(s) tend to be similar and gathered together. Based on these two features, object segmentation is performed on the DV histogram. First, the start point and end point having meaningful non-zero bin values ($d_S$ and $d_E$) are found on the DV histogram, as shown in Fig. 3(a). For example, $d_S$, which meets the following condition (2), is found, searching from the left using: Fig. 3(b) . Finally, a labeling operation is applied to only the foreground pixels. A well-known connected-component labeling algorithm is adopted [16]. After labeling, several foreground objects can be produced. In this paper, the largest object was adopted as the most dominant foreground object.
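The final labeling step, in which the largest connected foreground region is kept, can be sketched as follows. This is only an illustration under stated assumptions: the binary foreground mask is taken as given (the histogram thresholding that produces it is not reproduced here), a breadth-first flood fill with 4-connectivity stands in for the labeling algorithm of [16], and the function name `largest_foreground_object` is hypothetical.

```python
from collections import deque


def largest_foreground_object(mask):
    """Label 4-connected foreground regions in a binary mask and return
    the set of (row, col) positions belonging to the largest region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    largest = set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected component.
            component = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep only the largest component found so far.
            if len(component) > len(largest):
                largest = component
    return largest
```

Returning a set of positions keeps the sketch short; a hardware-oriented version would more likely write a label map, which is why this should be read as a functional description rather than an architectural one.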
Object-based Sharpness Enhancement
The last step of the proposed algorithm is to enhance the sharpness of only the selected foreground object. In this study, the remaining regions in the frame were left unprocessed. Note that a typical sharpness enhancement algorithm may cause unwanted artifacts, such as jagging, the halo effect, and noise boosting. Polesel et al. employed an adaptive filter that controls the sharpness in such a way that contrast enhancement occurs in high-detail areas, with little or no image sharpening occurring in the smooth areas [17]. The adaptive filter emphasizes the medium-contrast details in the input image more than the large-contrast details, such as abrupt edges, to avoid overshoot effects in the output image. To this end, the adaptive unsharp masking (AUM) method first divides each input image into three regions: smooth, medium-contrast, and high-contrast regions. The adaptive filter does not perform a sharpening operation in smooth areas. Therefore, the overall system is more robust to the presence of noise in the input images than the traditional approaches. In addition, the local dynamics in the high-contrast areas are already high, and such regions require only moderate sharpening. The medium-activity areas require the most enhancement. Based on this, the AUM applies strong sharpening to the medium-contrast regions, whereas only moderate sharpening is applied to the high-contrast regions. In this manner, the AUM accomplishes the dual objectives of avoiding noise amplification and excessive overshoot in the detail areas. According to the experimental results in Section 3, however, the AUM algorithm does not resolve jagging or halo artifacts. This paper proposes a sharpness enhancement algorithm that has lower computational complexity and produces fewer artifacts than the AUM algorithm. Fig. 4 describes the proposed algorithm. In this figure, $x(m, n)$ and $y(m, n)$ denote the pixel located at $(m, n)$ in the LVF or RVF input and in the processed output, respectively.
Like the AUM, the proposed algorithm consists of two parts: pixel-wise weight computation and sharpness enhancement. The details are described in the following subsections.
Weight Computation according to Edge level
For convenience, only the weight computation process for the horizontal direction is discussed. First, a typical low-pass filter (LPF) and high-pass filter (HPF) are applied sequentially to $x(m, n)$. Second, the horizontal edge level of the processed pixel, i.e., λ was fixed at 0 to avoid the overshoot artifacts. The weights and thresholds were determined empirically according to intensive experiments for various stereo video sequences. As a result, in this study, $T_1$, $T_2$, $T_3$, and $T_4$ were set to 20, 40, 70, and 100, respectively.
Sharpness Enhancement
After the horizontal and vertical weights, i.e., $\lambda_h$ and $\lambda_v$, are obtained, the sharpness enhancement using these weights is applied to $x(m, n)$, as shown in the upper part of Fig. 4. To mitigate the halo or jagging artifacts, this paper proposes the use of an edge-preserving LPF. As seen in Fig. 4, the sharpness enhancement was performed using one-dimensional processing to enable simple hardware implementation. Fig. 6 illustrates the basic concept of the proposed edge-preserving LPF. The LPF coefficients for the five rows in the 5×5 filtering processing block were determined adaptively. The proper weights 
In (3), the threshold $T_S$ was set to 60. In other words, a pixel that was significantly different from the current pixel $x(m, n)$ was excluded from the computation. Note that proper normalization follows. Using the computed LPF coefficients, $x(m, n)$ was low-pass-filtered. The edge-preserving HPF was then accomplished by subtracting the low-pass-filtered $x(m, n)$ from the original $x(m, n)$. The proposed HPF provided clearer edges than the typical HPF. As a result, the halo effect around the strong edges was avoided. $\lambda_h$ was then multiplied with the horizontally HPF-ed output, and $\lambda_v$ was similarly multiplied with the vertically HPF-ed output. Finally, $y(m, n)$ was obtained by adding the results to $x(m, n)$.
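The one-dimensional edge-preserving sharpening described above can be illustrated with a short sketch. Several details are assumptions rather than the paper's definitions: the base LPF coefficients are taken to be uniform over a 5-tap window, border pixels are replicated, a single scalar `lam` replaces the per-pixel weights $\lambda_h$ and $\lambda_v$, and the function name `edge_preserving_sharpen_1d` is hypothetical. Only the exclusion rule with threshold 60, the normalization, and the structure "output = input + weight × (input − low-pass)" are taken from the text.

```python
def edge_preserving_sharpen_1d(row, lam, t_s=60, radius=2):
    """Sharpen one row as y = x + lam * (x - LPF(x)), where the LPF
    averages only the neighbours whose value is within t_s of the
    centre pixel (assumed uniform taps, 2 * radius + 1 wide)."""
    n = len(row)
    out = []
    for m in range(n):
        centre = row[m]
        total, count = 0.0, 0
        for k in range(-radius, radius + 1):
            idx = min(max(m + k, 0), n - 1)  # replicate border pixels
            if abs(row[idx] - centre) <= t_s:
                total += row[idx]
                count += 1
        low = total / count    # normalisation; count >= 1 (centre qualifies)
        high = centre - low    # edge-preserving high-pass component
        out.append(centre + lam * high)
    return out
```

On a strong step edge such as `[10, 10, 10, 200, 200, 200]`, pixels on the far side of the edge are excluded from the average, so the high-pass component is zero and the row is returned unchanged. This is the behavior that avoids overshoot (halo) around strong edges.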
Performance Evaluation
Four Middlebury stereo images were used to evaluate the performance of the proposed algorithm: Reindeer, Cones, Dwarves, and Art. The frame size of all of the test sequences was 1920×1080. The frame format was side-by-side. Therefore, in the case of the left-right (L/R) side-by-side format, the frame size of the LVF/RVF was 960×1080, and in the case of the top-bottom (T/B) side-by-side format, the frame size of the LVF/RVF was 1920×540. Fig. 7 shows the foreground extraction result for the Reindeer image in T/B format. To avoid visually annoying artifacts around the object boundary when enhancing the sharpness, an 8-pixel exterior band around the boundary of the foreground object was also assigned to the foreground object.
For the comparisons, the AUM and the generalized unsharp masking (GUM) algorithm [19] were employed. Segments of the results are displayed in Fig. 8. The AUM results suffered from jagging artifacts around the strong edges (see Fig. 8(b)). In addition, the GUM results provided weak sharpness in the bright intensity range, as shown in Fig. 8(d). In contrast, the proposed algorithm mitigated these jagging artifacts while maintaining the sharpness around the edges, as shown in Fig. 8(c). Fig. 9 presents another result. The halo effect can be seen around the patterns in Fig. 9(b), and jagging artifacts can be observed around the diagonal edges. The GUM and the proposed algorithm significantly suppressed such phenomena while enhancing the sharpness, but the GUM boosted the noise in the flat areas, as shown in Fig. 9(d).
Figs. 10 and 11 compare the final outputs for the Dwarves and Art images, respectively, in anaglyph form. The foreground objects (e.g., the dwarves and the plaster cast) were sharpened, whereas the background wallpaper was not. On a typical 3D monitor, the stereo video sequences processed by the proposed algorithm were found to provide better depth perception than the original video sequences, with reduced visual fatigue. To apply the proposed algorithm to a particular 3D monitor, it was implemented on an FPGA platform, which will be described in the following section.
For an objective evaluation of the sharpness enhancement, an image quality assessment metric, multi-scale structural similarity (MS-SSIM), was employed [18]. Table 1 lists the MS-SSIM values for the various algorithms. Note that the closer the MS-SSIM is to 1, the closer the sharpness of the test image is to that of the original. As shown in Table 1, the proposed algorithm provides higher MS-SSIM values than the AUM and GUM on average.
Hardware Implementation
Architecture Design
The target video has a resolution of 1920×1080 and a frame rate of 60 Hz. A careful design is needed to process such a full high-definition stereo video in real time. Dual buffering of the frame units, parallel connection of the random access memory (RAM) in the FPGA, and pipelining of the modules were all applied in the hardware design for real-time operation.
The hardware was designed to operate at 200 MHz for real-time operation on a dedicated FPGA board. For the 200 MHz operation, the clock cycles required for the one matching block (MB) calculation were determined by: Therefore, approximately 49,440 clock cycles are available for a one-slice calculation because there are 60 MBs in a single slice. In the disparity estimation at level 2, which has the heaviest calculation load, the number of pixels to be read is 640 (8×8 pixels from the right side and 72×8 pixels from the left side). If one pixel is read in each clock cycle, 640 clock cycles are required, which is more than the cycles allocated (412 cycles) to a single MB calculation for 200 MHz operation. Because 1,300 cycles are needed to handle one MB and one slice has 60 MBs, a total of 78,000 clock cycles would be needed for a single-slice disparity estimation without pipelining, which is also more than the 49,440 cycles allowed by the 200 MHz operation. To reduce the number of required clock cycles to conform to 200 MHz operation, 16 parallel pixels are first read together from the line buffers in a single cycle. Therefore, the clock cycles for reading the data from both sides of the line buffers for the level 2 disparity estimation are reduced to 36 cycles from the 640 cycles. Furthermore, by pipelining the disparity estimation process, the clock cycles for a single-slice calculation are reduced to 5,300 from 78,000 cycles. The number of clock cycles required for the later processes, such as filtering, histogram building, and labeling, is approximately 3,740 cycles. Approximately 20,000 clock cycles are consumed while waiting to fill the line buffers. Therefore, the total number of cycles used in this implementation is approximately 29,040, which is less than the 49,440 cycles available for the 200 MHz operation. Fig. 12 presents a block diagram of the hardware implementation for the algorithm. The hardware consists of 13 modules.
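The cycle budget above can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the text; the variable names are chosen here for illustration, and 5,300 cycles is taken as the pipelined disparity-estimation cost, which is the reading under which the three contributions sum to the stated total of 29,040.

```python
MBS_PER_SLICE = 60            # matching blocks per slice
CYCLES_PER_MB_SERIAL = 1_300  # per-MB cost without pipelining
BUDGET_PER_SLICE = 49_440     # cycles available per slice at 200 MHz

# Pixels read for one level-2 search: 8x8 (right side) + 72x8 (left side).
level2_reads = 8 * 8 + 72 * 8

# Disparity estimation for one slice without pipelining.
serial_disparity = CYCLES_PER_MB_SERIAL * MBS_PER_SLICE

pipelined_disparity = 5_300   # disparity estimation after pipelining
post_processing = 3_740       # filtering, histogram building, labeling
line_buffer_wait = 20_000     # waiting for the line buffers to fill

total_used = pipelined_disparity + post_processing + line_buffer_wait
slack = BUDGET_PER_SLICE - total_used

print(level2_reads, serial_disparity, total_used, slack)
# prints: 640 78000 29040 20400
```

The unpipelined estimate (78,000 cycles) exceeds the per-slice budget, whereas the pipelined total leaves roughly 20,400 cycles of margin per slice.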
A 1 GB module of DDR2-800 SO-DIMM RAM is used as the external RAM, which is controlled by the high-performance memory controller embedded in the FPGA. The pixel data, converted from red-green-blue (RGB) 4:4:4 to luminance-bandwidth-chrominance (YUV) 4:4:4, are read in real time at 60 Hz. The line buffer stores the left and right side pixel data slices for the search operation. One slice consists of 32 horizontal lines, but only 16 lines are stored in the line buffer because there is no disparity estimation in level 0.
When both line buffers are full, the disparity estimation block begins to calculate the DV values. The core block extracts the foreground region using the proposed algorithm. The foreground region information is stored in internal RAM. The stored data is then read out from RAM, and the sharpening is processed. For the RVF data, the stored foreground region determines whether to proceed with the sharpening. Because there is only a position difference between the LVF and RVF data, the shifted foreground region can be used for LVF sharpening. Therefore, the foreground extraction for the LVF can be omitted.
If the data is determined to be a foreground, the luminance (Y) data from the sharpening filter and the chroma (UV) data from external RAM are combined for the final YUV data. If the data is determined to be the background, the UV data and Y data without sharpening are combined for the final YUV data. The final YUV data are converted back to RGB for an enhanced 3D data display.
FPGA Implementation
The proposed hardware architecture was coded in Verilog HDL and implemented with a dedicated FPGA. The implementation was verified using RTL simulations. The HDL model was synthesized on the FPGA device. Fig. 13 shows a simulation waveform of the process used to calculate the DV of an MB, as an example of the implementation. The waveform shows the internal signals of the FPGA operating at 200 MHz. It also shows that a total of 88 clock cycles are needed for one MB disparity estimation. Therefore, a total of 5,280 clock cycles are required for the disparity estimation of the 60 MBs. This is similar to the estimated number of cycles (5,300 cycles) described in Section 4.1. Table 2 lists the synthesis results, which show that 30,278 LUTs, 24,553 registers, and 1,794,297 bits of memory were used. For the real-time processing of the algorithm for a 1080p 3D display, the core block was designed to operate at 200 MHz. The maximum operating frequency was measured at 213.36 MHz. The other blocks, such as the display control and the RGB-to-YUV conversion, were set to 148.35 MHz for the connected displays.
As shown in Fig. 14 , the functionality of the proposed algorithm was verified using a typical 3D image with the algorithm implemented on the FPGA board.
Conclusions
This paper proposed a sharpness adjustment algorithm that provides artifact-free stereoscopic perception improvement for 3D video sequences. The proposed algorithm was shown empirically to improve stereoscopic perception without visual fatigue because there is no intentional disparity adjustment for objects. In addition, the hardware architecture of the proposed algorithm was designed, and the real-time processing for full HD stereo videos was demonstrated on a general-purpose FPGA development board with an operating frequency of 200 MHz, using 30,278 look-up tables, 24,553 registers, and 1,794,297 bits of memory. As a result, the proposed framework is a possible solution for artifact-free stereoscopic perception improvement in 3D applications.
