Abstract Stereo matching is one of the most widely used algorithms in real-time image processing applications such as positioning systems for mobile robots, three-dimensional building mapping, and the recognition, detection and three-dimensional reconstruction of objects. In area-based algorithms, the similarity between one pixel of the left image and one pixel of the right image is measured using a correlation index computed on vicinities of these pixels called correlation windows. To preserve edges, small windows need to be used. On the other hand, for homogeneous areas, large windows are required. Because only local information is used, matching between primitives is difficult. In this article, the FPGA implementation of an efficient similarity-based adaptive window algorithm for real-time estimation of dense disparity maps is described. To evaluate the proposed algorithm's performance, the developed FPGA architecture was simulated via ModelSim-Altera 6.6c using different synthetic stereo pairs and different correlation window sizes. In addition, the FPGA architecture was implemented in a Cyclone II EP2C35F672C6 FPGA embedded in an Altera DE2 development board. The disparity maps are computed at a rate of 76 frames per second for stereo pairs of 1280 × 1024 pixel resolution and a maximum expected disparity equal to 15. The developed FPGA architecture offers better results than most of the real-time area-based stereo matching algorithms reported in the literature, allows increasing the processing speed up to 93,061,120 pixels per second and can be implemented in the majority of mid-range FPGA devices.
Introduction
The perception of the depth of the points contained in a scene is one of the most important tasks of computer vision systems and has been used in several applications such as positioning systems for mobile robots and the recognition, detection and three-dimensional reconstruction of objects [2, 3, 4, 13, 24].
Although numerous techniques exist to determine the depth of a scene, extracting depth information from images obtained by a stereo configuration has become the most widely used technique. In this technique, the correspondence between stereo pairs and the geometrical configuration of the stereo camera make it possible to obtain depth images called disparity maps. To determine a disparity map, it is necessary to measure the similarity of the points contained in two images. Techniques to determine these similarities are divided into two categories: area-based algorithms [1, 14, 16, 25, 32, 33] and feature-based algorithms [7, 12, 19].
Area-based algorithms use the gray scale or color values of the pixels surrounding the pixel of interest for similarity estimation and produce dense disparity maps, i.e., they compute a disparity for each pixel in the stereo pair. These algorithms are more efficient in runtime, computational resource consumption and mathematical simplicity than feature-based algorithms. On the other hand, feature-based algorithms are based on specific interest points and are more stable against changes of contrast, environmental conditions and illumination, because they represent the geometric properties of the scene and the interest points are selected according to detectors of specific features. The main restriction of feature-based algorithms is that they cannot generate dense disparity maps; therefore, they often need to be applied together with other techniques. In addition, a pre-processing stage for feature extraction is necessary, which increases the computational resource consumption and runtime.
Because FPGA devices allow high-speed handling of large amounts of information, several algorithms for the estimation of disparity maps have been implemented in these devices [10, 17, 18, 34]. The range of disparity levels varies depending on the configuration of the cameras; for algorithms implemented in FPGAs, this implies a significant increase in the consumption of hardware resources. This has motivated several authors to study ways of reducing that disadvantage [21, 28] and to search for new approaches to implement stereo vision algorithms in FPGA devices.
Related works
The system presented in [36] consists of a 4 × 4 array of FPGAs connected in a mesh-type configuration; the authors use a maximum of nearly 35,000 four-input LUTs, which allows processing 40 frames per second for images of 320 × 240 pixel resolution. In [5], a structure based on four Xilinx Virtex 2000E FPGAs is presented, obtaining dense disparity maps at a speed of 40 frames per second for images of 256 × 360 pixel resolution. In [6], the use of a single FPGA is proposed; the developed system processes images of 640 × 480 pixel resolution at 30 frames per second.
The architecture developed in [29] uses a technique based on SAD to calculate the optical flow efficiently; the system generates dense disparity maps at speeds above 800 frames per second for images of 320 × 240 pixels using a correlation window of 7 × 7 and a maximum expected disparity equal to 121. A modification of SAD is shown in [22]; the authors of this work synthesize several versions of SAD to determine the hardware resource requirements and performance. By decomposing the SAD correlation window into rows and columns using buffers, a resource saving of around 50 % is reached. Using different window shapes, the high memory consumption decreases without any detriment to quality. Disparity maps are calculated at a speed of 122 frames per second for images of 320 × 240 pixels and a maximum expected disparity equal to 64.
The architecture in [26] uses four FPGAs to perform rectification in real time; a left-right consistency verification is then applied to improve the quality of the produced disparity map. Speeds of 30 frames per second are reached for images of 640 × 480 pixel resolution and a maximum expected disparity equal to 128. In [1], an FPGA correlation-edge distance approach is proposed. Speeds of 76 frames per second are reached for images of 1280 × 1024 pixel resolution and a maximum expected disparity equal to 15. Using a geometric feature, the Euclidean distance between the selected point and the nearest left edge, the developed FPGA architecture provides an improvement over other conventional correlation-based stereo matching algorithms.
In [9], a module for real-time disparity map computation implemented in an Altera Stratix IV FPGA is proposed; disparity maps are computed at a rate of 320 frames per second for images of 640 × 480 pixels and a maximum expected disparity equal to 80. Finally, the module developed in [11] processes 275 frames per second for images of 640 × 480 pixel resolution with a maximum expected disparity equal to 80; the presented architecture provides a high processing speed at the expense of accuracy, with great scalability in terms of disparity levels.
Adaptive window algorithms
Several adaptive algorithms have been proposed to improve results at both depth discontinuities and homogeneous areas. An adaptive window technique in combination with SAD is used in [30]; the algorithm processes images of up to 1024 × 1024 pixels with a maximum expected disparity equal to 32 at 47 frames per second. The authors of [20] estimate the current depth by changing the size and shape of the correlation window. These changes are performed iteratively according to the local variation of the gray scale values. However, the algorithm is computationally expensive and sensitive to the initial depth estimates. The authors of [35] change the window size and shape by optimizing over a large class of compact windows via a minimum ratio cycle. The algorithm presented in [23] proposes using edges in the reference image to determine the size of a rectangular window. In [37], pixels are aggregated adaptively based on pixel similarity using a tree structure. In [27], performing the cost aggregation process from a histogram perspective is proposed.
To simplify the adaptive approach, several algorithms have been proposed. The authors of [8] compute the correlation coefficients on nine windows, and the one yielding the lowest value is retained. In [15], the use of a central window surrounded by several support windows is proposed. The correlation coefficients of the best support windows, i.e., the lowest values, are added to the coefficient computed on the central window. The reduced number of windows used in these algorithms cannot cover the whole range of sizes and shapes required in all situations. The use of non-parametric measures has been proposed by the authors of [38]; in the Census transform, each pixel and its surroundings are mapped into a vector of Boolean variables denoting the ordering relation between the center pixel and each vicinity pixel. Boolean vectors are compared using the Hamming distance. Hamming distances are summed over a small local area and the shift that minimizes the summed Hamming distance is retained as the disparity. Non-parametric measures reduce the sensitivity to outliers but do not solve the window size problem, because the window size must remain small.
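The Census transform and Hamming comparison described above can be illustrated with a minimal Python sketch. The function names, the 3 × 3 default vicinity and the bit ordering are assumptions made for illustration and are not taken from [38]; the aggregation of Hamming distances over a local area is omitted.

```python
def census(img, x, y, w=1):
    """Encode the (2w+1) x (2w+1) vicinity of (x, y) as a bit string.

    Each bit is 1 when the vicinity pixel is darker than the center pixel.
    The scan order (row by row) is an illustrative assumption.
    """
    center = img[y][x]
    bits = 0
    for j in range(-w, w + 1):
        for i in range(-w, w + 1):
            if i == 0 and j == 0:
                continue  # the center pixel is not compared with itself
            bits = (bits << 1) | (1 if img[y + j][x + i] < center else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two Census vectors differ."""
    return bin(a ^ b).count("1")
```

Because only the ordering of gray scale values is encoded, an outlier changes at most one bit of the vector, which is the source of the reduced sensitivity mentioned above.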
Correlation using fixed-size windows
The main disadvantage of the algorithms described in Sect. 2.1 is that they cannot be implemented in dedicated hardware for real-time processing. In this research, however, we are interested in stereo matching algorithms suitable for real-time image processing. The most suitable are correlation-based algorithms such as the Sum of Absolute Differences (SAD), because they have a regular structure with fixed runtime. In addition, several systems that use correlation-based algorithms have been described in the literature.
In the majority of area-based algorithms, a rectangular vicinity centered on a reference pixel in one of the images of a stereo pair is compared with similar vicinities for some pixels in the same raster line of the other image. These vicinities are called correlation windows and can be compared using a correlation-based measure such as SAD:
$$\mathrm{SAD}(x, y, s) = \sum_{i=-w}^{w} \sum_{j=-w}^{w} \left| I_l(x+i, y+j) - I_r(x+i+s, y+j) \right|, \tag{1}$$
where $I_l(x+i, y+j)$ and $I_r(x+i+s, y+j)$ are the gray scale values of the pixels within the window in the left and right images, respectively, $(2 \times w + 1)^2$ is the correlation window size, $s$ is the shift of the window in the right image and $s_m$ is the maximal shift of the correlation window in the right image. A correlation coefficient is determined for each pixel and the shift that minimizes the correlation coefficient is retained as the disparity. These algorithms yield a dense depth map, but they require a long runtime. Disparity maps generated by applying the SAD algorithm to different synthetic stereo pairs are shown in Fig. 1. The main problem with this algorithm is the selection of the correlation window size. Large windows allow the correct correlation values to be determined in areas with uniform texture. However, they imply a high computational demand and produce erroneous values at certain points, because edges are blurred and small features are eliminated, as seen in Fig. 1c, f, i. On the other hand, small windows imply a low computational demand, but the correlation coefficient becomes sensitive to noise; hence, erroneous values are generated in regions of uniform texture, as seen in Fig. 1b, e, h.
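As an illustration of fixed-window matching, the following Python sketch computes the SAD coefficient for one shift and retains the shift with the lowest coefficient. The function names and the boundary handling (shifts that would leave the image are skipped) are assumptions made for this example; images are plain lists of rows indexed as `img[y][x]`.

```python
def sad(left, right, x, y, s, w):
    """SAD coefficient over a (2w+1) x (2w+1) window for shift s."""
    total = 0
    for j in range(-w, w + 1):
        for i in range(-w, w + 1):
            total += abs(left[y + j][x + i] - right[y + j][x + i + s])
    return total


def sad_disparity(left, right, x, y, w, s_max):
    """Return the shift in 0..s_max that minimizes the SAD coefficient."""
    width = len(left[0])
    # Assumption: only shifts that keep the right window inside the image
    # are evaluated.
    shifts = [s for s in range(s_max + 1) if x + w + s < width]
    return min(shifts, key=lambda s: sad(left, right, x, y, s, w))
```

Every pixel costs `(s_max + 1) * (2w + 1)**2` absolute differences in this form, which is why the window size dominates the runtime discussed above.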
To avoid the main disadvantages of the SAD algorithm (blurred edges and noise in homogeneous areas), the use of an adaptive correlation window based on a similarity criterion suitable for real-time image processing is proposed. Hence, this article describes an area-based stereo matching algorithm in which the size and shape of the correlation window are adjusted for each pixel in the reference image according to its content, together with its FPGA implementation. The proposed algorithm uses the gray scale value variations in the window to determine the similarity criterion. It is demonstrated that, even with a simple similarity criterion, the proposed algorithm outperforms other adaptive window algorithms and can be implemented in dedicated hardware for real-time processing such as FPGA devices. Furthermore, it is demonstrated that the developed FPGA architecture outperforms most of the other real-time area-based stereo matching algorithms reported in the literature while maintaining a high processing speed.
The rest of this paper is organized as follows: Sect. 4 presents the proposed algorithm and the technique used to determine the similarity criterion for the selection of pixels. In Sect. 5, the FPGA architecture for the proposed algorithm is described. Experimental results for different synthetic stereo pairs, a comparison with other adaptive window algorithms, a comparison with several real-time stereo matching implementations reported in the literature and FPGA implementation results are detailed in Sect. 6. Finally, Sect. 7 concludes this article.
The proposed method
The main objective of this research is to develop an algorithm that uses a single window, which is processed only once using a recursive approach appropriate for dedicated hardware implementation. To explain the proposed algorithm, the Tsukuba scene shown in Fig. 2a is used. This image presents multiple objects at different depths. The depth of each object is indicated using gray scale values, as shown in Fig. 2b.
The pixels within the small overlaid window illustrated in Fig. 3a include projections of points of different objects, as shown in Fig. 3b. When the correlation coefficient is computed using all the pixels of this window, the averaging effect yields errors in the estimated disparity. On the other hand, Fig. 3c shows a vicinity in which only the pixels that are projections of points of the same object are used, while the others are not considered and are eliminated from the window. Pixels that are not considered are indicated in black. The color of the retained pixels is similar to that of the central pixel and they have the same depth, as shown in Fig. 3d. Using this window, disparity estimation is more accurate.
In the similarity-based adaptive window algorithm (SBAW), a fixed-size window is centered on each pixel of the reference image, but only the pixels selected by the similarity criterion are used to compute the correlation coefficient. Any correlation coefficient based on gray scale values can be modified using this technique. For example, the standard SAD expression turns into:
$$\mathrm{SAD}(x, y, s) = \sum_{i=-w}^{w} \sum_{j=-w}^{w} b(x, y, i, j) \left| I_l(x+i, y+j) - I_r(x+i+s, y+j) \right|,$$
where the coefficient $b(x, y, i, j)$ is equal to 1 when the pixel of the correlation window is a projection of the same object as the selected point, and zero otherwise; $i$ and $j$ are the summation indices. This corresponds to defining a window of variable size and shape that can be adapted to the local data of the reference image. To ensure that the pixels within the window correspond to the same object as the selected pixel $P_l(x, y)$, a pixel $P_l(x+i, y+j)$ is included in or excluded from the window according to a similarity criterion. If the two pixels are similar, $b(x, y, i, j)$ is set to 1; otherwise, it is set to zero.
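The masked sum can be sketched in Python as follows. The similarity test is passed in as a hypothetical predicate `is_similar(center, pixel)` so that the example does not commit to a particular criterion; the function name and the image layout (`img[y][x]`) are likewise assumptions for illustration.

```python
def sbaw_cost(left, right, x, y, s, w, is_similar):
    """SAD restricted to the window pixels judged similar to the center.

    is_similar(center, pixel) plays the role of b(x, y, i, j): when it
    returns False, the pixel contributes nothing to the sum.
    """
    center = left[y][x]
    total = 0
    for j in range(-w, w + 1):
        for i in range(-w, w + 1):
            b = 1 if is_similar(center, left[y + j][x + i]) else 0
            total += b * abs(left[y + j][x + i] - right[y + j][x + i + s])
    return total
```

With a predicate that always returns `True`, the function reduces to the plain SAD of Eq. 1, which makes the relation between the two expressions explicit.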
Techniques to define the similarity criteria
Several techniques can be used to define the similarity criterion. However, a technique based on a recursive approach is more suitable in terms of computational efficiency and facilitates hardware implementation. In this section, a technique based on the comparison of gray scale values is described. It is demonstrated that even a simple technique allows the use of adaptive windows and increases the accuracy of the disparity map.
Criterion based on comparison of the gray scale values
We can assume that two pixels are not similar, and that they have different disparities, when there is a significant difference between their gray scale values [39]. We therefore set $b(x, y, i, j)$ to 1 only when the gray scale value $p(x+i, y+j)$ is close to the gray scale value of the selected pixel $p(x, y)$, i.e., if:
$$\left| p(x+i, y+j) - p(x, y) \right| \le T(x, y), \tag{2}$$
where T(x, y) is the maximum acceptable difference between the gray scale values. In practice, it would be sufficient to assign T, for all p(x, y), a constant value defined by the user. However, the problem is to determine a value that is appropriate for all points contained in the input stereo pair. By analyzing simulations performed in Matlab R2013a, it was determined that small values of T are most appropriate for points in regions near edges, while higher values of T are more suitable for regions that belong to the same object. It was also determined that assigning a constant value to T causes erroneous estimations in regions where, due to the color of the selected pixel, some pixels of the same object are eliminated from the correlation window. Therefore, assigning a constant value to T does not ensure that an appropriate value is used for each point of the input stereo pair, and wrong estimations will be obtained at some points.
To compute T, the use of the sum of absolute differences between the selected pixel and its vicinity pixels is proposed. This value of T adapts appropriately to most points contained in the stereo pair, i.e., in homogeneous areas, in which all the points are assumed to correspond to the same object, T is small and excludes noise points and point-like features of the object. This increases the accuracy of the correlation measure. On the other hand, when multiple objects are projected, the T value increases; however, this value still makes it possible to differentiate between the object that includes the selected point and the others. Through multiple tests performed in Matlab, it was determined that the minimum set of pixels required for an accurate estimation of T consists of the pixels around the selected pixel, $I_l(x, y)$ (see Fig. 4 and Eq. 4).
Finally, it was determined that in areas where the variation within the correlation window is high, mainly at points close to the corners of objects, the T value is high and the disparity estimation accuracy decreases. Even with this limitation, a high accuracy can be reached (see Fig. 4). In this case, j = 1 and T can be defined as T = K (Eq. 4).
However, with a simple adjustment, it is possible to increase the accuracy level. When the selected point is a corner, the T value is high and points of different objects would be included in the correlation window. Therefore, when the T value is high, it can be replaced with a smaller value that makes it possible to differentiate between the object that includes the selected point and the others, based on the assumption that multiple objects, i.e., multiple corners, are projected into a large correlation window. If the correlation window size increases, the number of projected objects and corners increases proportionally. We propose to compute the T value by adding a restriction parameter applicable at corner points (Eq. 3). This parameter is computed as shown in Eq. 5; it reduces the errors at corners and increases the general accuracy by nearly 2 %. Figure 5 shows some correlation windows for the Tsukuba scene using Eq. 3. Although several techniques can be used to define the similarity criterion, we can affirm that, even with a simple technique like the one proposed in this article, any rectangular correlation window can be adapted to the local variations in the stereo pair.
$$T(x, y) = \begin{cases} K(x, y), & K(x, y) \le w \\ w, & \text{otherwise} \end{cases} \tag{3}$$
with n = bits per pixel (bpp). (5)
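The thresholding scheme of Eqs. 2 and 3 can be sketched in Python under explicit assumptions: the local variation K is taken here as the sum of absolute differences between the selected pixel and its eight immediate neighbors, and the restriction parameter is passed in as the hypothetical argument `limit` rather than derived from the number of bits per pixel. Neither choice is confirmed by the text above; both are illustrative.

```python
def local_variation(img, x, y):
    """Assumed form of K: sum of absolute differences to the 8 neighbors."""
    center = img[y][x]
    return sum(abs(img[y + j][x + i] - center)
               for j in (-1, 0, 1) for i in (-1, 0, 1))


def threshold(img, x, y, limit):
    """T(x, y) as in Eq. 3: K when it does not exceed the limit, else the limit."""
    k = local_variation(img, x, y)
    return k if k <= limit else limit


def is_retained(img, x, y, i, j, limit):
    """b(x, y, i, j) as in Eq. 2: keep pixels within T of the selected pixel."""
    return abs(img[y + j][x + i] - img[y][x]) <= threshold(img, x, y, limit)
```

In a flat region K is zero, so only pixels identical to the selected one are kept; next to an edge K is large and the limit takes over, which is the corner restriction described above.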
In standard algorithms, the disparity $d_l(x, y)$ is defined as the shift s giving the maximum (or minimum) of the correlation values (Eq. 1). To detect occlusions, left-right consistency is used. For each pixel, if the disparity $d_l(x, y)$ computed using the left image as the reference is equal to the disparity $d_r(x + S_m, y)$ computed using the right image as the reference (Eq. 6), the solution is considered correct. Otherwise, the pixel is marked as occluded and its disparity can either be computed with subpixel accuracy or be assigned the minimum of $d_l(x, y)$ and $d_r(x + S_m, y)$. In this work, the minimum of $d_l(x, y)$ and $d_r(x + S_m, y)$ is used.
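A minimal Python sketch of this merging step is given below. It assumes that the two disparity rows have already been aligned, so that position k in both lists refers to the same scene point; the alignment itself (the column offset in Eq. 6) is not reproduced, and the function name is hypothetical.

```python
def merge_disparities(d_left, d_right):
    """Combine aligned left- and right-referenced disparities for one row.

    Returns (final, occluded): the final disparity is the minimum of the
    two estimates, and a pixel is flagged as occluded when they disagree.
    """
    final, occluded = [], []
    for dl, dr in zip(d_left, d_right):
        occluded.append(dl != dr)
        final.append(min(dl, dr))
    return final, occluded
```

When the two estimates agree, the minimum is simply the common value, so a single `min` covers both the consistent and the occluded case.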
Computational complexity
To explain the computational complexity of the proposed algorithm, the computational complexity of SAD is analyzed first. It is defined as follows: $O_{SAD}(MSD/d_0)$, where M is the size of the input image, S is the size of the correlation window, D is the maximum expected disparity and $d_0$ is the increment of the disparity values. Like SAD, the proposed algorithm has a computational complexity defined in the same terms (Table 2).
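To give a feeling for the magnitude of this expression, the following Python sketch evaluates M S D / d_0 for a given setup. It assumes that M and S are pixel counts (image area and window area); the function and parameter names are illustrative only, and the result is floored to an integer.

```python
def sad_operation_count(width, height, window, max_disparity, step=1):
    """Evaluate M * S * D / d_0 with M and S taken as pixel counts."""
    m = width * height        # M: number of pixels in the input image
    s = window * window       # S: number of pixels in the correlation window
    return m * s * max_disparity // step
```

For a 384 × 288 image, a 3 × 3 window and a maximum expected disparity of 15, the count is 14,929,920; it grows with the square of the window side.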
Based on the high efficiency of the SAD algorithm, and considering that its computational complexity is equal to that of the proposed algorithm, it is possible to affirm the high efficiency of the proposed method. In addition, when SAD with left-right consistency was implemented, its runtime was similar to the runtime of the proposed method for the same setup (Table 1). As can be seen, the increase in runtime, close to 30 % in all cases, is the time required for the computation of the T value.
Fig. 5 The selected pixels using the proposed algorithm
All the runtimes are measured in seconds and were obtained via Matlab.
Although the algorithm presented in Sect. 4 has a low mathematical complexity, computing a disparity map for a synthetic stereo pair of 384 × 288 pixel resolution (the resolution of the Tsukuba scene) implies a runtime close to 1 second. This time is not appropriate for real-time applications, which was the main motivation to search for efficient ways to implement the proposed algorithm; an FPGA implementation was selected. In Fig. 6, an overview of the developed FPGA architecture is shown. This architecture has three inputs: clk_pixel, the pixel rate of the input stereo pairs, and left_image[7:0] and right_image[7:0], the gray scale values of pixels from the left and right images, respectively. It has one output, disparity[7:0], corresponding to the disparity value for the selected pixels. The developed FPGA architecture can process input stereo pairs of x × y pixel resolution, where $x \in \mathbb{N}$ and $y \le 2048$. Furthermore, this architecture computes the disparity maps by applying the SBAW algorithm using n × n correlation windows, where $n = 2k + 1\ \forall\, k \in \mathbb{N}$, and considering a maximum expected disparity equal to $2^k - 1\ \forall\, k \in \mathbb{N}$. Its general behavior can be described as follows: first, the buffer modules store the gray scale values of the pixels contained in n horizontal lines for both the left and right images of the input stereo pair.
Next, the storage_vector modules generate n storage vectors; each vector consists of a register holding the gray scale values of n vertical pixels taken from the horizontal lines stored previously. Then, the left-disparity and right-disparity values are computed separately via the SBAW modules. Later, a multiplexer (mux) sets the final disparity value as the minimum of the two disparity values previously computed by the SBAW modules. Finally, the equalizer module converts the final disparity value to a gray scale value of 8 bits of depth. In the following subsections, the architecture of each individual module is shown in detail.
The buffer module
To store the data necessary for the disparity computation, the use of buffer modules is proposed. These modules store the gray scale values of the pixels contained in n horizontal lines of an image and allow all stored lines to be read in parallel. An overview of the FPGA architecture of the buffer module is shown in Fig. 7. This module consists of three different submodules. The RAM_driver module manages an array of n + 1 single-port RAM units (RAM), assigning to each one the corresponding address, address[9:0], and the corresponding write-read value, w/r[n+1:0]. The w/r[n+1:0] output consists of one logic vector of n + 1 bits; the write-read value of each RAM is determined by one of the bits of the w/r[n+1:0] output. The outputs of the buffer modules are determined via state machines, which are controlled by the horizontal resolution of the input stereo pairs, x_resolution[11:0], and the correlation window size, n[5:0]. In Table 3, the behavior of the state machine for the output w/r[n+1:0] is shown; the number of states is n + 1. In every state, n RAMs are in read mode while one RAM is in write mode. In Table 4, the behavior of the state machine for the output address[9:0] is shown.
The RAM module consists of a synchronous single-port RAM unit; its general settings were: type = synchronous, width = 8, depth = 2048, operation type = single port; all other parameters were left at their defaults. These parameters allow the gray scale values of each pixel contained in horizontal lines of images of up to 2048 pixels of horizontal resolution to be stored with 8 bits of color depth. The use of an array of RAM modules makes it possible to read the gray scale values of the pixels contained in n horizontal lines of an image (see Table 5).
The n_lines_generator module reads the outputs of the RAM modules and determines which RAM modules are in read mode at any time. To assign lines to the outputs of the n_lines_generator module in ascending order, i.e., pixel_1[7:0] = input image line number l, pixel_2[7:0] = input image line number l + 1, ..., pixel_n[7:0] = input image line number l + n − 1, the outputs of the RAM modules in read mode are assigned to the outputs of the n_lines_generator module as shown in Table 6; the first column corresponds to the output w/r[n+1:0] of the RAM_driver module, and the second column corresponds to the numbers of the RAM modules assigned to the outputs of the n_lines_generator module.
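The rotation of write and read roles among the n + 1 line memories can be modeled in software as follows. This Python class is a behavioral sketch only: the class and attribute names are hypothetical, the state sequences of Tables 3, 4 and 6 are not reproduced, and a simple round-robin rotation is assumed.

```python
class LineBuffer:
    """n + 1 line memories: one is written while the other n are read."""

    def __init__(self, n, width):
        self.n = n
        self.rams = [[0] * width for _ in range(n + 1)]
        self.write_index = 0  # memory currently in write mode
        self.address = 0      # column address shared by all memories

    def push(self, pixel):
        """Store one pixel and return the n stored pixels of this column.

        The returned list is ordered from the oldest stored line to the
        newest, mirroring the ascending order of the n_lines_generator.
        """
        column = [self.rams[(self.write_index + k) % (self.n + 1)][self.address]
                  for k in range(1, self.n + 1)]
        self.rams[self.write_index][self.address] = pixel
        self.address += 1
        if self.address == len(self.rams[0]):
            # End of line: the oldest memory becomes the next one to write.
            self.address = 0
            self.write_index = (self.write_index + 1) % (self.n + 1)
        return column
```

The extra memory is what allows a new line to be written without disturbing the n lines currently feeding the correlation window.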
The storage_vector module
To compute the disparity value via the SBAW algorithm, it is necessary to have stored the gray scale values of all the pixels of the correlation window. However, the buffer module only provides the gray scale values of one vertical line of the correlation window at a time. To store the rest of the values efficiently, the use of n register-based storage vectors is proposed. Each storage vector behaves similarly to a shift register; however, it allows multiple data to be read in one clock cycle. In general, when a line begins, the gray scale value of the pixel with coordinate (1) is stored in index [7:0] of one storage vector; in the following clock cycle, this value is moved to index [15:8] and the gray scale value of the pixel with coordinate (2) is stored in index [7:0]. A similar process is repeated for all the pixels of the line. Fig. 8 shows the behavior of the storage_vector module with the following settings: number of lines to process = n, v = 8 × n − 1. In Fig. 9, the architecture of the storage_vector module is shown.
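The shifting behavior described above can be mimicked with integer arithmetic. In this Python sketch the storage vector is modeled as an integer of 8 × n bits; the function names are hypothetical and the model ignores clocking and reset.

```python
def shift_in(vector, pixel, n):
    """Shift an 8n-bit storage vector by one pixel and insert a new one.

    The new pixel lands in bits [7:0]; the previous content moves up by
    8 bits and the oldest pixel falls off the top.
    """
    mask = (1 << (8 * n)) - 1
    return ((vector << 8) | (pixel & 0xFF)) & mask


def read_pixel(vector, k):
    """Read the pixel held in bits [8k+7 : 8k]; k = 0 is the newest."""
    return (vector >> (8 * k)) & 0xFF
```

Unlike a plain shift register, every 8-bit field can be read in the same cycle through `read_pixel`, which is what lets the whole window row be consumed in parallel.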
The SBAW module
For the computation of the disparity map via the SBAW algorithm, a pixel-parallel and window-parallel architecture was designed. The necessary data are obtained from the storage_vector modules; by using the appropriate indexes, it is possible to process video streams in real time, producing disparity maps of (X − w) × (Y − w) pixel resolution, where X and Y are the resolution values of the input video stream and $(2 \times w + 1)^2$ is the size of the correlation window used. The architecture of the SBAW module is presented in Fig. 10; its general behavior is as follows: first, the absolute_differences modules compute the absolute difference between the pixels of the left and right images within the correlation window. This process is executed in each of the $d_{max} + 1$ absolute_differences modules, implemented in parallel, which are configured for the expected disparity levels from 0 to $d_{max}$; each module processes only one disparity level and computes the absolute differences only for pixels that are projections of the same object as the selected pixel (Eqs. 3-5). Then, the output of each absolute_differences module is sent to its corresponding adder module; in this step, the adder blocks compute the sum of the absolute differences for all pixels retained in the correlation window. Finally, the minimum module assigns the corresponding index to each correlation value, determines the minimum correlation value and sets the disparity value to the index of the minimum correlation value. In the developed FPGA architecture, two SBAW modules were implemented in parallel, where the first module uses the left image as the reference and the second module uses the right image.
Fig. 8 Behavior of the storage_vector module
The minimum module
To achieve an appropriate propagation of the processed data, the use of the minimum module is proposed (Fig. 11). It consists of an index_generator module and k min modules implemented sequentially. First, the index_generator module assigns the corresponding indexes to all the correlation values from the previous stage; then, the min_1 module receives all the correlation values and their indexes. Afterwards, this module determines the minimum of each pair of correlation values, where the values are grouped in pairs and no correlation value appears in more than one pair; the minimum correlation values and their indexes obtained here are placed in the vectors value[x:0], where x = 16 × ($d_{max}$ + 1) − 1, and index[x:0], where x = 8 × ($d_{max}$ + 1) − 1, respectively. This process is repeated sequentially until only one correlation value and its index are placed in the output vectors.
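The staged pairwise reduction can be sketched in Python as below. The sketch assumes that the number of correlation values is a power of two and that ties are resolved in favor of the lower index; both the function name and the tie rule are assumptions for illustration.

```python
def minimum_index(values):
    """Reduce correlation values pairwise until one (index, value) remains.

    Each pass of the loop corresponds to one min stage: values are grouped
    in disjoint pairs and the smaller of each pair moves on with its index.
    """
    stage = list(enumerate(values))  # role of the index_generator module
    while len(stage) > 1:
        next_stage = []
        for a, b in zip(stage[0::2], stage[1::2]):
            next_stage.append(a if a[1] <= b[1] else b)
        stage = next_stage
    return stage[0]
```

For 2^k input values the loop runs k times, matching the k sequential min modules; with a maximum expected disparity of 15, four stages are needed.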
The equalizer module
To make the disparity values suitable for display on LCD screens or other output devices, the use of the equalizer module is proposed (Fig. 6). This module converts the final disparity value to a gray scale value through disparity[7:0] × 256 / $d_{max}$. To reduce the hardware resource consumption, this process is performed with a CASE structure, which considers all expected disparity levels and maps the final disparity value to the integer constant corresponding to the disparity[7:0] × 256 / $d_{max}$ operation.
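Because the number of disparity levels is small, the scaling can be precomputed once, which is the software analogue of the CASE structure. The Python sketch below builds such a table; the rounding (floor) and the clamping of the top level to 255, so that the result fits in 8 bits, are assumptions not stated in the text.

```python
def build_equalizer_table(d_max):
    """Precompute disparity * 256 / d_max for every disparity level."""
    # Assumption: results are floored and clamped to the 8-bit maximum.
    return [min(255, d * 256 // d_max) for d in range(d_max + 1)]
```

For a maximum expected disparity of 15, the table has 16 entries and a lookup such as `build_equalizer_table(15)[8]` replaces the multiplication and division at run time.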
Discussion and analysis of results
The FPGA architecture presented in Sect. 5 was implemented with a top-down approach. All the modules were programmed in Verilog; Quartus II Web Edition version 10.1SP1 was used for the synthesis process. To verify the functionality of each module individually, post-synthesis simulations were executed in ModelSim-Altera 6.6c.
Simulation results
To evaluate the behavior of proposed algorithm, the developed FPGA architecture was simulated in Model SimAltera 6.6c using different synthetic stereo pairs and different sizes for correlation windows. The selected tests stereo pairs were the Tsukuba, Venus, Teddy and Cones scenes. The window sizes used were: f3; 5; 7; 9; 11; 13; 15; 17; 19; 21; 23; 25; 27; 29; 31; 33; 35; 37; 39; 41g. In Fig. 12 , the behavior of the error obtained in the disparity maps generated for different window sizes for the selected synthetic stereo pairs is shown; this demonstrates the effectiveness of the developed FPGA architecture. Disparity maps have been compared using the method proposed in [31] , in which the percentage of pixels with a disparity error greater than one is computed. Three percentages are computed, one for all non-occluded pixels (nonocc), one for all pixels (all) and one for occluded pixels near depth discontinuities (disc). Performance for occluded pixels is not considered because no algorithm computes occluded pixels explicitly.
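The bad-pixel percentage used in this evaluation can be sketched in Python as follows. The function name, the optional `mask` argument used to restrict the evaluation to a subset of pixels (for example, non-occluded pixels) and the list-of-rows image layout are assumptions for illustration and are not taken from [31].

```python
def bad_pixel_percentage(disparity, ground_truth, mask=None, threshold=1):
    """Percentage of evaluated pixels whose disparity error exceeds threshold."""
    bad = 0
    total = 0
    for y, row in enumerate(ground_truth):
        for x, truth in enumerate(row):
            if mask is not None and not mask[y][x]:
                continue  # pixel excluded from this evaluation region
            total += 1
            if abs(disparity[y][x] - truth) > threshold:
                bad += 1
    return 100.0 * bad / total if total else 0.0
```

Calling the function with different masks yields the three reported figures, provided that masks for the non-occluded and near-discontinuity regions are available.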
On the other hand, Fig. 13 presents the error percentage of all pixels (all) for all evaluated synthetic stereo pairs obtained via SBAW algorithm compared with using the SAD algorithm. For the SBAW algorithm, if a small correlation window is used, error is more important than error for the SAD algorithm because some pixels of the window are not used and effective area of the used window is reduced. However, error in untextured areas is significantly reduced when a large correlation window is used. Error at discontinuities grows with the correlation window but it is smaller than the error for the SAD algorithm. So, we can affirm that performance of the SBAW algorithm is better when a large correlation window is used.
Table 7 presents quantitative results on the number of erroneous pixels obtained by the proposed algorithm for the Tsukuba, Venus, Teddy and Cones scenes using a 41 × 41 correlation window, compared with other real-time stereo matching algorithms reported in the literature. To process and collect the data presented in Table 7, the developed architecture was scaled and synthesized to operate with the appropriate maximum expected disparity values. In all cases, Quartus II Web Edition version 10.1SP1 was used for the synthesis process and simulations were executed in ModelSim-Altera 6.6c. Table 7 shows that the results of the proposed algorithm improve on those of most of the real-time stereo matching algorithms reported in the literature.
In addition, like the majority of these algorithms, the proposed algorithm achieves high performance for small values of maximum disparity (Tsukuba and Venus scenes), whereas medium performance is observed for high values of maximum disparity (Teddy and Cones scenes).
Furthermore, comparisons with other adaptive window algorithms, namely the SMW algorithm [8], the Census algorithm [38] and the Hirschmüller (HIR) algorithm [15], were performed. For these comparisons, the synthetic image shown in Fig. 15a was used. Two textured objects are present in the synthetic scene, which appear as a square and as the background in the image. Figure 15b shows the ground truth map, where well-defined edges correspond to depth discontinuities. Figure 16 shows the disparity maps computed by all the evaluated algorithms using a 27 × 27 correlation window.
In the areas corresponding to a single object, all the algorithms estimate the disparity precisely because the correlation window is large; however, the averaging effect generates errors at depth discontinuities, which is clearly visible in the disparity maps of the SMW and HIR algorithms (Fig. 16a, b). The HIR algorithm reduces errors at discontinuities, but false matches remain because the central window is always used. The square windows used in the SMW algorithm are well adapted to this image; however, there are false matches at the corners of the central square object. The performance of the Census algorithm is worse because of the repetitive pattern in the image. With the SBAW algorithm (Fig. 16d), the estimated disparity map is very similar to the ground truth, even near depth discontinuities and at the corners of the central square object. Table 8 shows the numerical values obtained in this comparison. All values shown in this table for the HIR, SMW and Census algorithms were obtained with Matlab implementations. The erroneous pixels were measured by applying Eq. 7, where I_1 is the ground truth map, I_2 is the disparity map generated by a particular algorithm, x is the horizontal resolution of the input image, y is the vertical resolution of the input image and N is set to x × y. In all cases, a similar setup was applied and similar behavior was observed for the different test scenes.
In Table 8, only two percentages are computed, one for all the pixels and one for the pixels near discontinuities, because the objects are well textured. The quantitative comparison demonstrates that the error percentages increase with window size for the SMW and HIR algorithms, but decrease with window size for the SBAW algorithm. In the SMW algorithm, the window is adapted according to the local texture, as confirmed by the low error percentages, but a large window is not well adapted at the corners and the error percentages rise with the window size. Error percentages for the Census algorithm are high due to the repetitive patterns in the image, but its performance at discontinuities is better than that of the HIR algorithm. When the SBAW algorithm is applied with a small window, errors are caused by a lack of information in the correlation window. With large windows, on the other hand, the SBAW algorithm avoids errors near depth discontinuities. This behavior is confirmed by the low error percentages of the SBAW algorithm for pixels near depth discontinuities. For the SBAW algorithm, the best performance is obtained with a large window. This is a difference and an advantage with respect to the other algorithms, in which the window size must remain small.
Tables 9 and 10 compare the hardware resource usage of all the synthesized and simulated configurations of the developed FPGA architecture. Figure 12 indicates that the SBAW algorithm reaches acceptable behavior with a 21 × 21 correlation window; in this case, the hardware consumption of the developed FPGA architecture is appropriate for the majority of mid-range FPGA devices, such as the Stratix III family of Altera or the Spartan III family of Xilinx. For larger window sizes, however, only high-end FPGA devices such as the Stratix V family of Altera support the hardware resource consumption. It is up to the user to select the configuration of the SBAW algorithm most appropriate to his or her particular requirements.
Finally, Table 11 compares the processing speed with that of other real-time stereo matching algorithms reported in the literature. Due to the mathematical simplicity of the proposed algorithm, the developed architecture does not require complex arithmetic operations such as quotients and square roots (which have a long runtime); hence, it maintains a high processing speed. The comparison in Table 11 shows an increase of up to 93,061,120 pixels per second with respect to other algorithms implemented in FPGA devices.
Implementation results
The developed FPGA architecture was implemented in an FPGA Cyclone II EP2C35F672C6 embedded in the Altera DE2 development board, and the selected configuration for the SBAW algorithm was d_max = 15 and 2 × w + 1 = 15. To acquire the input stereo pairs, a TRDB DC2 board connected to the first expansion port of the DE2 board was used. The TRDB DC2 board provides stereo pairs of 1280 × 1024 pixel resolution in RGB format. The value of the green channel was used as the gray scale value of the input stereo pairs. So that the images can be adjusted to the environmental characteristics of the input scene, the implementation allows the exposure of the cameras to be configured. Four push buttons of the DE2 board are used to assign the exposure value of the cameras. The function of each of these push buttons is detailed in Table 12.
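The gray scale conversion described above amounts to selecting one channel. A minimal Python sketch, assuming the image is stored as a height × width × 3 array in R, G, B order (the function name `green_as_gray` is hypothetical), is:

```python
import numpy as np

def green_as_gray(rgb):
    """Return the green channel of an H x W x 3 RGB image as the gray scale image."""
    rgb = np.asarray(rgb)
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # Channel index 1 is green under the assumed R, G, B ordering.
    return rgb[:, :, 1].copy()
```

Selecting a single channel avoids the multiplications of a weighted luminance conversion, which suits a hardware implementation.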
The output disparity maps were displayed on a Terasic 4.3-inch LCD screen of 800 × 480 pixel resolution connected to the second expansion port of the DE2 board. The processing speed of the FPGA implementation is 76 fps (99,614,720 pixels per second) for input stereo pairs of 1280 × 1024 pixel resolution. The resource consumption of the implemented architecture is shown in Table 13.
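The relation between the frame rate and the pixel throughput reported above can be checked with a short calculation. The helper name `pixel_throughput` is hypothetical and used only for illustration.

```python
def pixel_throughput(width, height, fps):
    """Pixels processed per second for a given resolution and frame rate."""
    return width * height * fps

# 1280 x 1024 pixels per frame at 76 frames per second
print(pixel_throughput(1280, 1024, 76))  # 99614720
```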
Conclusions
In this article, an area-based algorithm suitable for real-time stereo matching, using an adaptive window technique based on a gray scale similarity criterion, was presented.
Only selected pixels of the window are used, according to their similarity to the central pixel. Furthermore, a technique to determine the similarity criterion was described, and it was demonstrated that, even with a simple similarity criterion, the SBAW algorithm outperforms other adaptive window algorithms reported in the literature. The best performance of the SBAW algorithm was obtained with a large window, which is appropriate for homogeneous areas. However, since the effective size and shape of the window are adaptive, blurring effects at discontinuities are avoided.
To improve its processing speed, the proposed algorithm was implemented in an FPGA device. The developed FPGA architecture outperforms most of the other real-time stereo matching algorithms in the literature: it reaches a high accuracy level, increases the processing speed and can be implemented in the majority of mid-range FPGA devices.
Furthermore, an important characteristic of the presented architecture is its scalability: all the modules and submodules that make up the developed FPGA architecture can easily be adapted to process correlation windows larger than those simulated and implemented here. In addition, the FPGA architecture allows different levels of maximum expected disparity (d_max) to be configured; consequently, the module for the computation of disparity maps can be configured with values appropriate to the environmental characteristics of the input video streams. The developed architecture can therefore be applied to a wide range of real-time stereo vision applications, such as positioning systems for mobile robots and the recognition, detection and three-dimensional reconstruction of objects.
