ABSTRACT This paper proposes a low-complexity and high-throughput VLSI architecture for correcting barrel distortion in images acquired by wide-angle cameras. Given a raw image obtained by a low-cost single-sensor camera that employs the Bayer color filter array (CFA), the proposed architecture performs barrel distortion correction (BDC) jointly with color demosaicking so as to produce a barrel-distortion-corrected color image. The backward mapping process, which maps each pixel location in the corrected image space to that in the distorted image space, is performed incrementally in order to reduce the number of complicated arithmetic units. The sub-pixel image resampling process is executed for each color channel considering the CFA. As a result, the proposed architecture performs BDC jointly with color demosaicking despite its low hardware complexity. A prototype BDC processor based on the proposed architecture is implemented with 42.1K logic gates in a 0.18-µm CMOS technology; the correction throughput is 200 Mpixels/s. Compared to the previous single-channel BDC processors, the proposed architecture offers the additional functionality of performing BDC jointly with color demosaicking while keeping the complexity low. The correction efficiency, which is defined to account for the correction throughput as well as the complexity, is 2.22 times that of the previous state-of-the-art BDC processor. The correction quality is comparable to that of the previous processor in terms of the peak signal-to-noise ratio.
I. INTRODUCTION
Wide-angle cameras are extensively used in industrial imaging systems such as video surveillance systems, automotive black boxes, imaging radars, and clinical endoscopes. However, a contortion called barrel distortion generally appears in images acquired by these cameras due to inevitable optical aberrations [1]. This causes serious problems in the aforementioned applications. These problems might be prevented to some extent by employing a lens whose physical characteristics are close to the ideal so that the distortion is not noticeable. However, this is not feasible in practice for a cost-effective system because such a lens is generally very expensive. Therefore, barrel distortion is usually corrected by employing digital signal processing techniques as presented in [2]-[11]. In addition, when application systems using wide-angle cameras need to be miniaturized, it is crucial to implement barrel distortion correction (BDC) in low-complexity hardware.
Aiming at an efficient realization in low-complexity hardware, several researchers have studied BDC from the implementation perspective [6], [8], [9], [12]. They succeeded in achieving real-time correction with low-complexity hardware, but their processors are inefficient for correcting multi-channel color images because they have to perform BDC for each channel separately even though the distortion is monochromatic. This paper presents the design and implementation of an efficient processor to perform BDC in single-sensor cameras that employ the Bayer pattern as a color filter array (CFA). In contrast to the previous processors, which were developed with a focus on BDC for a single-channel image, the proposed processor performs BDC jointly with color interpolation in low-complexity hardware, thereby producing a distortion-corrected multi-channel image from a distorted Bayer-pattern image. The contributions of the paper are summarized as follows.
• This paper presents an efficient VLSI architecture to perform BDC jointly with color demosaicking. The proposed architecture is designed so that the backward mapping process is executed incrementally, once for each pixel location in the Bayer pattern. The sub-pixel image resampling process is executed for each color channel considering the Bayer pattern. As a consequence, BDC and color demosaicking are merged effectively in the proposed architecture, and thus the overall system complexity can be reduced significantly.
• A prototype processor is implemented based on the proposed architecture, and the implementation results are evaluated thoroughly by comparing them with those of the previous work. The correction quality is also evaluated both objectively and subjectively. In terms of the figure of merit, which is defined to consider the hardware complexity as well as the correction throughput, the proposed processor is 2.22 times more efficient than the previous one. The correction results of the proposed processor are comparable to those of the previous one in terms of the peak signal-to-noise ratio (PSNR).
To the best of the author's knowledge, this is the first work to combine BDC with color demosaicking and to prove the concept with a VLSI implementation. This paper is an extension of the preliminary work presented in [12]. In addition to the method to perform BDC jointly with color demosaicking, which was presented in the preliminary work [12], this paper presents a method to process the backward mapping incrementally and an efficient architecture based on it. Furthermore, this paper elaborates the evaluation of the proposed processor from various aspects, including the correction quality and the ASIC/FPGA implementation results. It shows how the proposed processor achieves comparable correction quality and illustrates how other interpolation schemes can be employed to achieve superior correction quality.
The rest of the paper is organized as follows. Section II presents the BDC process and reviews previous studies on efficient hardware architecture for performing BDC. Section III describes the proposed architecture. Section IV evaluates the proposed architecture by investigating the implementation results and the correction quality. Finally, Section V concludes the study.
II. BACKGROUND
A. BARREL DISTORTION CORRECTION
As the shape of a lens is not ideal, its magnification factor is not uniform, and thus images acquired through a lens are geometrically distorted. In particular, for a wide-angle lens with a substantially short focal length, the magnification factor usually decreases with the distance from the optical center such that the image seems to have been mapped onto a barrel. Such an effect is called barrel distortion and is illustrated in Fig. 1. In practice, the barrel distortion is usually considered to be uniform along a circle around the optical center, and thus most previous studies, including [4], [6], [8]-[10], [13], assumed that the distortion is radially symmetric, as does this study.
Barrel distortion can be modeled as the movement of pixels from the corrected image space (CIS) to the distorted image space (DIS) [4]. As shown in Fig. 1, a pixel in CIS is shifted by barrel distortion and mapped to another pixel in DIS. In the figure, $\theta$ is equal to $\theta'$ because of radial symmetry; hence, the two triangles in the two image spaces are different in size but similar to each other. The similarity ratio of the two triangles is defined as a scale factor $s$ and is expressed as a mapping polynomial of $r$ [5], [8]-[11]:
$$ s = \sum_{k=0}^{N} a_k r^{2k} \tag{1} $$
where $r$ denotes the Euclidean distance between $(x, y)$ and the optical center in CIS, and $a_k$ is the distortion coefficient of $r^{2k}$. Using $s$, we can find the location of a pixel in DIS from the corresponding pixel in CIS, as expressed in the figure. This expression is in fact a backward mapping, as it describes a mapping from a CIS pixel location to a DIS pixel location. Given a distorted image, its undistorted version can be reconstructed by copying the intensity of a DIS pixel to the corresponding CIS pixel. As the DIS pixel location calculated by backward mapping may not be an integer location, the intensity at a sub-pixel point has to be estimated. Bilinear interpolation, which combines the intensities of four neighboring pixels, is usually employed for the sub-pixel image resampling required in the BDC process [6]-[10].
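The backward-mapping procedure described above can be sketched for a single-channel image as follows. This is a minimal illustration rather than the processor's actual datapath: the function names, the list-of-rows image layout, the choice to leave out-of-range pixels at zero, and the assumption that the polynomial includes a constant term $a_0$ are all hypothetical.

```python
import math

def scale_factor(r2, coeffs):
    """Evaluate s = sum_k a_k * r^(2k) with Horner's rule in r^2.

    coeffs = [a_0, a_1, ..., a_N] (the a_0 term is an assumption of this sketch).
    """
    s = 0.0
    for a in reversed(coeffs):
        s = s * r2 + a
    return s

def correct_barrel(dis, coeffs, center):
    """Backward-mapping BDC of a single-channel image dis[y][x]."""
    h, w = len(dis), len(dis[0])
    cx, cy = center
    cis = [[0.0] * w for _ in range(h)]
    for yi in range(h):
        for xi in range(w):
            # CIS location relative to the optical center
            x, y = xi - cx, yi - cy
            s = scale_factor(x * x + y * y, coeffs)
            # Corresponding DIS location (generally non-integer)
            xd, yd = s * x + cx, s * y + cy
            x0, y0 = math.floor(xd), math.floor(yd)
            if x0 < 0 or y0 < 0 or x0 + 1 >= w or y0 + 1 >= h:
                continue  # mapped outside DIS: leave the pixel at 0
            fx, fy = xd - x0, yd - y0
            # Bilinear interpolation of the four neighboring DIS pixels
            cis[yi][xi] = ((1 - fx) * (1 - fy) * dis[y0][x0]
                           + fx * (1 - fy) * dis[y0][x0 + 1]
                           + (1 - fx) * fy * dis[y0 + 1][x0]
                           + fx * fy * dis[y0 + 1][x0 + 1])
    return cis
```

With `coeffs = [1.0]` the scale factor is 1 everywhere, so the mapping reduces to the identity, which is a convenient sanity check.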
B. PREVIOUS WORK
The BDC process requires high computational complexity. To reconstruct CIS, we have to perform backward mapping for every CIS pixel. As described in the previous subsection, this process requires computation of the mapping polynomial to obtain the scale factor. In addition, interpolation has to be performed to acquire the sub-pixel intensity. Real-time applications demand a high correction throughput; to achieve a high-throughput and low-complexity implementation of BDC, dedicated VLSI implementations have therefore been preferred to other possible implementations in software [5] or on a graphics processing unit [14]. Targeting real-time and low-complexity applications, this paper presents the VLSI design of a BDC processor.
Several researchers presented efficient VLSI architectures to perform BDC. Ngo et al. presented a pipelined architecture to perform a backward-mapping-based BDC process [6] . Chen et al. [8] proposed a low-complexity architecture by manipulating the backward mapping process to avoid complicated mathematical operations such as trigonometric functions. A multi-cycle architecture was proposed, where the hardware units required in each step in the BDC process are shared in a time-multiplexed manner so that hardware complexity can be reduced by sacrificing the correction throughput to some extent [9] .
III. PROPOSED ARCHITECTURE
A. MOTIVATION AND OVERALL ARCHITECTURE
Most previous studies focused on developing efficient architectures to perform BDC for single-channel images [6]-[10], and little consideration has been given to performing BDC efficiently for multi-channel color images. In single-sensor camera systems, where the Bayer pattern is usually employed as a CFA [15], each channel of a color image has to be interpolated from a given raw image in the Bayer pattern. This process is called color demosaicking [16], and it usually precedes BDC [17], [18]. BDC for a color image can be implemented by instantiating a conventional single-channel BDC processor for each color channel, as shown in Fig. 2a. However, this is not efficient when attempting to realize a low-complexity system because the result of the backward mapping process for a pixel location is monochromatic, i.e., identical irrespective of the color channel [19]-[21].
The overall system architecture with the proposed BDC processor is shown in Fig. 2b. The proposed architecture has a single backward mapping unit to perform the backward mapping process for each pixel location in a raw image, whereas the conventional architecture has as many backward mapping units as color channels because BDC is performed separately for each color channel. In addition, the backward mapping in the proposed architecture is performed in an incremental manner in order to reduce hardware complexity. The color interpolation unit performs the sub-pixel image resampling process for each color channel considering the Bayer pattern. As a result, the proposed architecture executes color demosaicking along with BDC effectively while having low complexity.
The overall processing flow of the proposed architecture is shown in Fig. 3. The proposed architecture performs the backward mapping and the color interpolation for each pixel location in CIS. In the backward mapping, the squared radius and the scale factor corresponding to a CIS pixel location are calculated so as to find the DIS pixel location as illustrated in Fig. 1. Given the DIS pixel location resulting from the backward mapping, the pixel intensity for each color channel is estimated considering the Bayer pattern.
VOLUME 6, 2018
FIGURE 5. Internal structure of the backward mapping unit in (a) the conventional architecture based on the non-incremental calculation [6], [8], [9] and (b) the proposed architecture based on the incremental calculation, where ≪ and ≫ denote the bitwise shift operators, SQ means the square unit, and MAC means the multiply-accumulate unit.
B. INCREMENTAL BACKWARD MAPPING UNIT
The backward mapping process entails calculation of the mapping polynomial to obtain the scale factor of the pixel location. In the proposed architecture, the backward mapping is performed based on the incremental process which was presented in the previous work [10] . In the mapping polynomial expressed in (1) , N affects the correction quality and is set to 2 in the proposed architecture because the improvement of the correction quality is known to be insignificant when the degree of the mapping polynomial is greater than 4 [4] , [11] .
Let $(x_n, y_n)$, $r_n$, and $s_n$ denote the location, the radius, and the scale factor, respectively, for the $n$-th pixel in the BDC process. In the proposed architecture, correction is performed in a raster-scanning order, as shown in Fig. 4, where the relation between two consecutive locations can be expressed as follows:
$$ (x_{n+1}, y_{n+1}) = \begin{cases} (x_n + 1,\; y_n) & \text{for Case I} \\ (x_0,\; y_n + 1) & \text{for Case II} \end{cases} \tag{2} $$
where $(x_0, y_0)$ denotes the top-left position of an image, Case I applies when the next pixel lies in the same row, and Case II applies when the next pixel starts a new row. If the backward mapping process of $(x_{n+1}, y_{n+1})$ is implemented straightforwardly, $r_{n+1}^2$ should be calculated first, followed by the calculation of $s_{n+1}$ for scaling. As indicated in (1), the direct calculation of $s_{n+1}$ necessitates the computation of $r_{n+1}^2$ and $r_{n+1}^4$, which entails high complexity. In the proposed method, however, $s_{n+1}$ is calculated in an incremental manner without any square or fourth-power operations. The incremental calculation of $r_{n+1}^2$ can be derived as expressed in (3), and $r_{n+1}^2 + r_n^2$ can also be calculated incrementally on the basis of (3). Consequently, the incremental calculation of the scale factor can be derived as expressed in (4).
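As a sketch of how such incremental relations can be obtained (not necessarily in the exact form of (3) and (4)), assume $N = 2$, coordinates measured relative to the optical center so that $r_n^2 = x_n^2 + y_n^2$, and a step to the next pixel in the same row, i.e., $x_{n+1} = x_n + 1$ and $y_{n+1} = y_n$. Under these assumptions:

```latex
\begin{align*}
r_{n+1}^2 &= (x_n + 1)^2 + y_n^2 = r_n^2 + 2x_n + 1, \\
r_{n+1}^2 + r_n^2 &= 2r_n^2 + 2x_n + 1, \\
s_{n+1} - s_n &= a_1\left(r_{n+1}^2 - r_n^2\right) + a_2\left(r_{n+1}^4 - r_n^4\right) \\
              &= \left(r_{n+1}^2 - r_n^2\right)\left[a_1 + a_2\left(r_{n+1}^2 + r_n^2\right)\right] \\
              &= (2x_n + 1)\left[a_1 + a_2\left(r_{n+1}^2 + r_n^2\right)\right].
\end{align*}
```

In this form, the squared radius is updated with a shift and additions ($2x_n$ is a one-bit left shift), and the scale factor is updated with multiply-accumulate operations only, without any explicit square or fourth-power unit. A step to the first pixel of the next row can be handled analogously by differencing in $y$ instead of $x$.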
The backward mapping unit in the proposed architecture is designed based on the incremental process described above. In Fig. 5, the structure of the backward mapping unit in the proposed architecture is compared with that in the conventional architecture. As shown in the figure, the backward mapping unit in the proposed architecture has considerably fewer complicated arithmetic units such as square operators than the conventional one. In addition, the calculations of $(x_{n+1}, y_{n+1})$, $r_{n+1}^2$, and $s_{n+1}$ can be performed in parallel because these calculations have no data dependencies in the incremental process, whereas in the conventional one, the calculations have to be performed serially. As the backward mapping unit is pipelined to achieve a high correction throughput, the parallel operations reduce the number of pipeline stages considerably compared to the conventional architecture.
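The incremental process can be checked numerically with a short software model. The sketch below is an illustration under stated assumptions, not the hardware design: it assumes $N = 2$, integer coordinates measured relative to the optical center, and a hypothetical bookkeeping choice in which the values at the start of each row are themselves updated incrementally from the start of the previous row. The names `direct_scale` and `incremental_scan` are invented for this example.

```python
def direct_scale(x, y, a):
    """Direct evaluation of s = a0 + a1*r^2 + a2*r^4 (reference only)."""
    r2 = x * x + y * y
    return a[0] + a[1] * r2 + a[2] * r2 * r2

def incremental_scan(x0, y0, width, height, a):
    """Yield (x, y, r2, s) in raster order using incremental updates.

    Only the first pixel is evaluated directly; every later value is
    obtained from a previous one without square or fourth-power operations.
    """
    r2_row = x0 * x0 + y0 * y0
    s_row = a[0] + a[1] * r2_row + a[2] * r2_row * r2_row
    for row in range(height):
        y = y0 + row
        x, r2, s = x0, r2_row, s_row
        yield x, y, r2, s
        for _ in range(width - 1):
            # Step right within the row: r2 grows by 2x + 1
            d = 2 * x + 1
            r2_next = r2 + d
            s += d * (a[1] + a[2] * (r2_next + r2))
            x, r2 = x + 1, r2_next
            yield x, y, r2, s
        # Step down to the start of the next row: r2 grows by 2y + 1
        d = 2 * y + 1
        r2_next = r2_row + d
        s_row += d * (a[1] + a[2] * (r2_next + r2_row))
        r2_row = r2_next
```

Comparing each yielded `s` with `direct_scale(x, y, a)` over a small image confirms that the incremental updates reproduce the direct polynomial evaluation.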
C. COLOR INTERPOLATION UNIT
In the proposed architecture, the sub-pixel image resampling process, which is required to obtain the intensities at non-integer pixel locations resulting from the backward mapping process, is performed for each color channel given the pixels in the raw image. As a result, the proposed architecture produces a distortion-corrected color image from a distorted raw image. The raw image is acquired by a camera that has a single photo sensor and thus provides only one color component at each pixel location. The Bayer pattern is usually employed for arranging color components; Fig. 6 shows the Bayer pattern considered in the proposed architecture. In the figure, a red (R) component is located at $(2k_1 - 2, 2k_2 - 2)$, a blue (B) component is located at $(2k_1 - 1, 2k_2 - 1)$, and a green (G) component is located at $(2k_1 - 1, 2k_2 - 2)$ or $(2k_1 - 2, 2k_2 - 1)$, where $H$ denotes the number of pixels in each column of an image and $\lceil a \rceil$ means the smallest integer greater than or equal to $a$.
The pixel intensity of each color channel is obtained by carrying out bilinear interpolation with the intensities of the four neighboring pixels that have the color component corresponding to the channel. To illustrate how the interpolation is carried out, let us take an example case in which $(x', y')$ is located as shown in Fig. 7, where $(x', y')$ denotes the DIS pixel location resulting from backward mapping of the CIS pixel location $(x, y)$. In the figure, $a_{\mathrm{int}}$ and $a_{\mathrm{frac}}$ denote the integer and fractional parts of $a$, respectively. As illustrated in the figure, the intensity of each color channel is obtained by combining the intensities of the four neighboring pixels based on bilinear interpolation, which can be expressed as (5)-(7). In the expressions, CIS.$\alpha(x, y)$ denotes the pixel intensity of the channel $\alpha \in \{R, G, B\}$ at $(x, y)$ in CIS, DIS$(x, y)$ denotes the pixel intensity at $(x, y)$ in DIS as given in the Bayer pattern, and we have used $1 - a \approx \bar{a}$ for $a \in \{x'_{\mathrm{frac}}, y'_{\mathrm{frac}}\}$, where $\bar{a}$ means the bitwise inversion of $a$. The interpolation has been shown here for one case only; it can be conducted similarly for the other cases, which are omitted for brevity.
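The channel-wise resampling on the Bayer pattern can be sketched in software as follows. This is an illustrative model under several assumptions, not a transcription of (5)-(7): it uses floating-point weights, assumes 0-based coordinates with R at (even, even) and B at (odd, odd) locations, and interpolates G inside the diamond of four G neighbors by applying bilinear weights in the rotated coordinates $u = x + y$, $v = x - y$. All function names are hypothetical, and the sub-pixel location is assumed to lie far enough inside the image that every neighbor exists.

```python
import math

def lattice_floor(v, parity):
    """Largest integer <= v whose parity is `parity` (0 = even, 1 = odd)."""
    f = math.floor(v)
    return f if (f - parity) % 2 == 0 else f - 1

def bilinear_on_lattice(sample, p, q, parity):
    """Bilinear interpolation at (p, q) from samples on a spacing-2 lattice."""
    p0, q0 = lattice_floor(p, parity), lattice_floor(q, parity)
    fp, fq = (p - p0) / 2.0, (q - q0) / 2.0
    return ((1 - fp) * (1 - fq) * sample(p0, q0)
            + fp * (1 - fq) * sample(p0 + 2, q0)
            + (1 - fp) * fq * sample(p0, q0 + 2)
            + fp * fq * sample(p0 + 2, q0 + 2))

def resample_rgb(dis, xd, yd):
    """Estimate (R, G, B) at the sub-pixel DIS location (xd, yd).

    dis[y][x] is the Bayer raw image: R at (even x, even y),
    B at (odd x, odd y), and G at the remaining locations.
    """
    def raw(x, y):
        return dis[y][x]

    def raw_rotated(u, v):
        # G sites have odd x + y, hence odd u and odd v in rotated coordinates
        return dis[(u - v) // 2][(u + v) // 2]

    r = bilinear_on_lattice(raw, xd, yd, 0)
    b = bilinear_on_lattice(raw, xd, yd, 1)
    g = bilinear_on_lattice(raw_rotated, xd + yd, xd - yd, 1)
    return r, g, b

def approx_one_minus(frac_bits, width):
    """Approximate 1 - a for a `width`-bit fraction by bitwise inversion.

    The result is smaller than the exact value by one least significant bit.
    """
    return ~frac_bits & ((1 << width) - 1)
```

Because bilinear interpolation reproduces linear functions exactly, filling each color plane of a test image with a linear ramp gives a simple way to validate the neighbor selection for all three channels.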
In the proposed architecture, the color interpolation unit calculates the intensities at the sub-pixel location for each color channel. The color interpolation here is slightly different from the traditional color interpolation [16] in that it is carried out in order to acquire the pixel intensities at the sub-pixel locations calculated by the backward mapping unit. The internal structure of the color interpolation unit is shown in Fig. 8. By considering the pixel location calculated by the backward mapping unit, the address generation unit generates the addresses at which the pixel intensities to be used for interpolation are read from an external memory. The proposed architecture is designed by assuming that the entire pixel data in DIS can be accessed from the external memory, as in the architectures of the previous BDC processors [6], [8]-[10]. The weighting factor generation unit calculates the weighting factors to be used for interpolation for each color channel. The interpolation unit performs the bilinear interpolation for each color channel by combining pixel intensities read from the memory according to their weighting factors.
IV. EVALUATION
A. CORRECTION QUALITY
This subsection shows that the correction quality achieved by the proposed architecture is comparable to that achieved by the conventional architecture. For the evaluation, the quality of the correction results obtained by the proposed architecture was compared with that of the results obtained by the conventional architecture, where the same mapping polynomial was used consistently for each BDC process and sub-pixel image resampling was performed in the same way, by bilinear interpolation, as in the previous BDC processors [6], [8]-[10]. For a fair comparison of the correction quality, the results of the conventional architecture were obtained by performing BDC separately for each channel of the color image, where the color image was generated in a way equivalent to that in the proposed architecture, i.e., by bilinear interpolation [22], [23]. The quality of the correction results for the conventional and the proposed architectures was measured in terms of the PSNR, where the referential results were obtained by performing BDC separately for each channel of a color image. Table 1 summarizes the quality of the correction results; it is noteworthy that the absolute difference between the quality achieved by the proposed and the conventional architectures is very small. Fig. 9 illustrates one of the correction results in the table in order to demonstrate the subjective quality.
In fact, for the R and B channels, both the proposed and the conventional architectures use the same set of four neighboring pixels corresponding to the channels so as to obtain the pixel intensities according to the bilinear interpolation. However, for the G channel, the proposed architecture uses the set of four neighboring pixels in the diamond-shaped square for the interpolation, whereas the conventional architecture uses a different set of pixels. For instance, in Fig. 10, the proposed and the conventional architectures calculate $R_p$ by the linear combination of $R_2$, $R_4$, $R_{10}$, and $R_{12}$, identically. To calculate $G_p$, however, the proposed architecture uses the linear combination of $G_3$, $G_6$, $G_8$, and $G_{11}$, whereas the conventional architecture uses the linear combination of $G_9$, $G_{14}$, $G_3$, $G_6$, $G_8$, and $G_{11}$. As a result, in the R and B channels, there is no difference between the results obtained by the proposed and the conventional architectures, while in the G channel, there may be some difference. This analysis is validated by the results shown in Table 1.
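For reference, the PSNR used in the quality comparison above can be computed per channel as in the following sketch. The peak value of 255 assumes 8-bit pixel intensities, which is an assumption of this example rather than a detail stated here, and the function name and list-of-rows image layout are hypothetical.

```python
import math

def psnr(reference, result, peak=255.0):
    """PSNR in dB between two equally sized single-channel images.

    Both images are lists of rows; `peak` is the maximum possible
    intensity (255 assumes 8-bit pixels).
    """
    count = 0
    squared_error = 0.0
    for ref_row, res_row in zip(reference, result):
        for ref_px, res_px in zip(ref_row, res_row):
            squared_error += (ref_px - res_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Evaluating this function on each of the R, G, and B planes of a corrected image against the referential result yields one PSNR value per channel.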
B. IMPLEMENTATION RESULTS
The architecture of the processor was described in Verilog hardware description language (HDL) and synthesized using Synopsys Design Compiler with a standard logic library. Table 2 summarizes the ASIC implementation results of the proposed processor and the previous BDC processors. The proposed processor performs triple-channel BDC jointly with color demosaicking, while the other processors perform single-channel BDC only [8]-[10]. That is, each of the previous processors corresponds to one of the three BDC processors shown in Fig. 2a. It is worth noting that the correction quality of the BDC processors in the table is comparable, as the sub-pixel image resampling is performed in the same way. Considering that the proposed processor achieves functionality equivalent to that of three instantiations of a previous BDC processor combined with a color demosaicking processor, its hardware complexity is very low. In the table, the figure of merit is the measure of the correction efficiency, which reflects the hardware complexity as well as the correction throughput. In terms of the figure of merit in Table 2, the proposed processor is 2.22 times superior to the previous one [10], even though the proposed processor additionally performs color demosaicking. The proposed processor can compute one CIS pixel intensity for each channel of a color image without stalls when four DIS pixel intensities are available simultaneously. For the BDC processors, the DIS pixel intensities are usually read from the external memory, and the bandwidth requirement per channel of the proposed processor is the same as that of the previous one [8], as shown in Table 2. The bandwidth requirement is quite low in the processors presented in [9] and [10]; however, their correction throughput is not high even at a high operating frequency, and the processor presented in [10] requires additional internal buffers.
The proposed processor was also implemented using an FPGA device. The architecture described in HDL for the ASIC implementation was imported and synthesized targeting a low-cost FPGA device (Intel EP20K600EB) by using Quartus II 15.0. The total number of logic elements is 8.21K. The correction throughput is 138 channels·Mpixels/s at the operating frequency of 46 MHz.
C. DISCUSSION
This paper shows an efficient method to combine BDC and color demosaicking and its efficient hardware architecture, which is validated in a prototype VLSI implementation. The proposed method is motivated by the following observation. The sub-pixel image resampling, which is required for the backward-mapping-based BDC, can be considered to be the estimation of the pixel intensity at a non-integer location given the pixel intensities at the integer locations, which is similar to the color demosaicking that estimates the pixel intensity of a missing color component given the pixel intensities in the Bayer pattern. In fact, the proposed method is equivalent to color demosaicking for the non-integer pixel locations resulting from the backward mapping process in BDC.
FIGURE 9. BDC results of sample image 1 in Table 1: (a) an original distorted raw image in the Bayer pattern, (b) the referential result, (c) the result in the conventional architecture, and (d) the result in the proposed architecture.
FIGURE 10. Example of a pixel mapped into the DIS Bayer pattern image, where $C_p$ stands for the pixel intensity of the $C$ channel at the location of $p$, where $C \in \{R, G, B\}$.
FIGURE 11. Sub-pixel resampling for CIS.G$(x, y)$ using the bicubic interpolation, where $(x', y')$ has been obtained by the backward mapping process, both $x'_{\mathrm{int}}$ and $y'_{\mathrm{int}}$ are odd, and $x'_{\mathrm{frac}} + y'_{\mathrm{frac}} < 1$.
Even though the proof of concept has been conducted by employing basic bilinear interpolation as the color demosaicking method, which results in the zipper artifacts observed in Fig. 9, it should be noted that the proposed joint method does not require the use of bilinear interpolation. Since the proposed joint method can be regarded as color demosaicking for the non-integer pixel locations, more advanced interpolation schemes such as those presented in [16], [24], and [25] can be employed in order to improve the correction quality, which is left for future work. For instance, the appendix describes how the traditional bicubic interpolation can also be incorporated in the proposed method.
V. CONCLUSION
This paper proposed an efficient architecture to correct barrel distortion in images acquired using wide-angle cameras. By exploiting the correction order of pixels, backward mapping is performed incrementally to reduce hardware complexity. Sub-pixel image resampling is performed for each color channel by considering the CFA. As a result, the proposed architecture efficiently performs BDC jointly with color demosaicking. The implementation results show that the proposed architecture has very low complexity even with the versatile functionality.
the four values, $p_0$ through $p_3$, at the uniformly spaced locations, and $t$, $0 \le t < 1$, represents the normalized position between the second and the third locations. Readers interested in how the bicubic interpolation is derived from the one-dimensional cubic interpolations are referred to [26]. Although the bicubic interpolation has been presented here for only one case of the sub-pixel image resampling required in the BDC process, it works for the other cases in similar ways, which are omitted to avoid verbosity.
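A one-dimensional cubic interpolation over four uniformly spaced values, and its separable extension to a 4×4 neighborhood, can be sketched as follows. The specific kernel is an assumption: the Catmull-Rom form (cubic convolution with parameter −0.5) is used here as a common choice and may differ from the cubic interpolation actually adopted. The function names are hypothetical.

```python
def cubic(p0, p1, p2, p3, t):
    """Catmull-Rom cubic interpolation between p1 (t = 0) and p2 (t = 1).

    p0..p3 are values at four uniformly spaced locations, and t is the
    normalized position between the second and the third locations.
    """
    return 0.5 * (2 * p1
                  + (p2 - p0) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3)

def bicubic(grid, tx, ty):
    """Separable bicubic interpolation on a 4x4 neighborhood grid[j][i].

    Each row is interpolated at tx first; the four row results are then
    interpolated at ty.
    """
    rows = [cubic(row[0], row[1], row[2], row[3], tx) for row in grid]
    return cubic(rows[0], rows[1], rows[2], rows[3], ty)
```

To apply this to a color channel of the Bayer pattern, the 4×4 neighborhood would be gathered from pixels of that channel only, in the same way as the four neighbors are selected for the bilinear case.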
ACKNOWLEDGMENT
Although Hui-Seong Jeong's direct contribution was too small for him to be included in the author list, I appreciate his devotion and endeavour, as part of this study originated from his thesis work [12]. The EDA tools were supported by IDEC, South Korea.
