Scene changes occur frequently in film broadcasting and tend to destabilize de-interlacing, producing blurred, jagged, and artifact-laden output. This paper presents an efficient VLSI architecture for video de-interlacing that takes scene change into account to improve video quality. The architecture contains three main parts. The first is scene change detection, which examines the absolute pixel difference between two adjacent even or odd fields. The second is a background index mechanism that classifies the pixels of the input field as motion or non-motion. The third is a spatial-temporal edge-based median filter, which interpolates the motion pixels. Compared with existing de-interlacing approaches, our architecture significantly improves the PSNR of video sequences containing scene changes; in other situations it also maintains better performance. The proposed architecture has been implemented as a VLSI chip in a UMC 0.18-µm CMOS process. The total gate count is 30114 and the layout area is about 710 µm × 710 µm. The power consumption is 39.78 mW at a working frequency of 128.2 MHz, which is sufficient to de-interlace HDTV in real time.
Introduction
Video de-interlacing techniques are important for display quality today because of the popularity of progressive display devices, such as LCD displays, PC monitors, and HDTVs, which require a progressive scan format. Video de-interlacing is a picture format conversion that changes interlaced images into progressive images. The traditional NTSC system is broadcast in an interlaced scan format, which reduces both the required bandwidth and large-area flicker. However, the interlaced scan technique creates undesirable visual artifacts and makes lines flicker, twitter, and crawl.
Numerous de-interlacing techniques have been proposed for interlaced-to-progressive scan conversion [1]-[20]; they can be roughly classified into four categories: intrafield de-interlacing, interfield de-interlacing, motion adaptive de-interlacing, and motion compensated de-interlacing. Intrafield de-interlacing [2]-[7] uses a single field to reconstruct one complete frame. The edge-based line average (ELA) method [2] is widely used; it extracts edge information and interpolates by averaging between lines along the estimated edge. This method provides good results when the edge is correctly estimated. Nevertheless, it performs poorly when the edge information is incorrect, and it is sensitive to small pixel values. Interfield de-interlacing [2], [8]-[11] generates a full progressive frame by directly merging two consecutive fields. Normally, the video quality is better than that of intrafield de-interlacing in a static area, but a line-crawling effect occurs in a motion area. Motion adaptive de-interlacing [9]-[14] combines the advantages of intrafield and interfield de-interlacing. If no motion is detected, interfield de-interlacing provides pleasing resolution with low computational complexity; otherwise, intrafield de-interlacing is used. Motion compensated de-interlacing [9], [10], [15]-[20] uses a macroblock to search for the most similar block in two successive even or odd fields and calculates its motion vectors to form a new field. However, this approach is more complex, and it is difficult to obtain good results without reliable motion estimation.
The above de-interlacing techniques are capable of improving visual quality; nevertheless, their performance is seriously affected by scene change. Scene change in film broadcasting tends to destabilize the output of a de-interlacing technique; in most cases, it produces jagged, blurred, or otherwise artifact-laden results. During a scene change, the interfield information is very likely to be misleading, which produces artifacts or blurring. Therefore, scene change needs to be addressed as part of the de-interlacing process.
We propose an efficient video de-interlacing technique with reliable interfield information [14]. This de-interlacing method involves decomposing videos into foreground and background areas. We examine the previous fields and the next field for a period of time (i.e., a few fields). If the values of the pixels remain unchanged, then we can assume that they belong to the background. At the same time, we check for the occurrence of scene change from one frame to its adjacent frames. To improve the quality of de-interlacing, scene change has to be taken into account when de-interlacing techniques are applied. This paper presents the VLSI implementation of the proposed method, which provides a simple hardware architecture, has low computation cost, and is easy to implement in real-time hardware.
Copyright © 2007 The Institute of Electronics, Information and Communication Engineers
The Proposed Algorithm
The first stage of de-interlacing is scene change detection, which ensures that the interfield information can be used correctly. If a scene change is detected, the interfield information is disregarded, and all interpolated pixels are produced by intrafield de-interlacing. If no scene change is detected, the motion adaptive de-interlacing procedure is adopted. To obtain precise and stable interfield information, a pixel is observed for a few fields to confirm that it is not moving; afterward, it is classified as background and is interpolated by interfield interpolation.
Scene Change Detection
Within the same shot, a frame may differ from the next one because of factors such as camera movement, focal length change, large object movement, and scene fade. To detect a scene change between two consecutive frames, a dissimilarity measure between the two frames must be defined; such measures are mainly pixel-based [21], [22] or histogram-based [21], [23]. After exploring several pixel-based and histogram-based algorithms and their dissimilarity measures, we found that the pixel-based method provides a simple measure and a low-cost hardware implementation for scene change detection.
We now start with the first stage of our method: scene change detection. Let $F_{n-1}(x, y)$, $F_n(x, y)$, and $F_{n+1}(x, y)$ denote the previous field, the current field, and the next field, respectively, where the two-dimensional spatial indices are $x = 1, 2, \ldots, W$ and $y = 1, 2, \ldots, H$, and $W$ and $H$ are the width and height of the frame. The absolute pixel difference of two adjacent even or odd fields, $FD_n$, is the absolute difference between the fields $F_{n-1}$ and $F_{n+1}$. Thus, $FD_n$ is defined as
$$FD_n(x, y) = \left| F_{n-1}(x, y) - F_{n+1}(x, y) \right| \quad (1)$$
$FD_n$ is used to define the similarity pixels, and the similarity pixels statistics value ($SPS_n$) for frame number $n$ is obtained by
$$SPS_n = \sum_{x=1}^{W} \sum_{y=1}^{H} S_n(x, y), \qquad S_n(x, y) = \begin{cases} 1, & FD_n(x, y) < Th \\ 0, & \text{otherwise} \end{cases} \quad (2)$$
where the threshold parameter $Th$ needs to be set in advance. Equation (2) is used to detect a scene change between two adjacent even or odd fields, $F_{n-1}$ and $F_{n+1}$. When a scene change occurs, $SPS_n$ is less than half of the total number of pixels in a field.
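As an illustration only, the detection rule described above can be sketched in software as follows. The function names and the list-of-rows field representation are assumptions made for this sketch, not part of the proposed hardware; the default threshold of 16 is the value used later in the simulations.

```python
def field_difference(prev_field, next_field):
    """Per-pixel absolute difference between F(n-1) and F(n+1)."""
    return [[abs(p - q) for p, q in zip(row_p, row_q)]
            for row_p, row_q in zip(prev_field, next_field)]


def similar_pixel_count(fd, th=16):
    """Count the pixels whose difference is below the threshold Th."""
    return sum(1 for row in fd for d in row if d < th)


def is_scene_change(prev_field, next_field, th=16):
    """A scene change is flagged when fewer than half of the pixels are similar."""
    fd = field_difference(prev_field, next_field)
    total_pixels = sum(len(row) for row in fd)
    return similar_pixel_count(fd, th) < total_pixels / 2


# Tiny 2 x 2 fields used only to exercise the sketch.
field_a = [[10, 12], [200, 198]]
field_b = [[11, 15], [190, 199]]   # close to field_a
field_c = [[250, 90], [20, 30]]    # unrelated content

print(is_scene_change(field_a, field_b))
print(is_scene_change(field_a, field_c))
```

With these toy fields, the first pair differs by less than the threshold at every pixel, while the second pair differs by more than the threshold everywhere.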
The Proposed De-interlacing Method
The proposed motion adaptive de-interlacing method involves decomposing videos into foreground and background areas. Two adjacent even or odd fields are recognized as non-motion pixels if their absolute pixel difference value is less than a defined threshold. If the value remains unchanged for a few fields, then it is classified as background, and the remaining pixels are classified as foreground.
The $FD_n$ defined in Eq. (1) is also used to define the non-motion pixels. The background index ($BI_n$) for frame number $n$ is obtained by
$$BI_n(x, y) = \begin{cases} BI_{n-1}(x, y) + 1, & FD_n(x, y) < Th \\ 0, & \text{otherwise} \end{cases} \quad (3)$$
The background index indicates how likely a pixel is to lie in the background area. Initially, $BI_0$ is set to all zeros. When a scene change is detected, $BI_n$ is reset to all zeros. $BI_n$ is then applied to classify interpolated pixels as foreground or background. If the value of $BI_n$ is greater than the threshold $K$, the interpolated pixel is classified as background; otherwise, it is classified as foreground. The background is filled with the pixels of the previous frame and the foreground is interpolated using the Spatial-Temporal Edge-Based Median Filter [5] (ST-ELA). The output frame, $OF_n(x, y)$, for frame number $n$ can be obtained by
$$OF_n(x, y) = \begin{cases} F_{n-1}(x, y), & BI_n(x, y) > K \\ F_{ST\text{-}ELA}(x, y), & \text{otherwise} \end{cases} \quad (4)$$
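The background index update and the output selection can be sketched as below. This is a software illustration under stated assumptions: the function names are hypothetical, the hold-above-$K$ behavior mirrors the counter behavior described for the hardware, and the defaults $Th = 16$ and $K = 2$ are the values reported in the simulations.

```python
def update_background_index(bi_prev, fd, th=16, k=2, scene_change=False):
    """Return the new background index map from the previous one.

    A pixel's index grows while its field difference stays below Th,
    is held once it exceeds K (assumed saturation, as in the counter),
    and is cleared on motion or on a scene change.
    """
    if scene_change:
        return [[0] * len(row) for row in bi_prev]
    new_bi = []
    for bi_row, fd_row in zip(bi_prev, fd):
        new_row = []
        for bi, d in zip(bi_row, fd_row):
            if d >= th:
                new_row.append(0)        # motion: clear
            elif bi > k:
                new_row.append(bi)       # already background: hold
            else:
                new_row.append(bi + 1)   # still: increment
        new_bi.append(new_row)
    return new_bi


def select_output(bi, prev_field, st_ela_field, k=2):
    """Background pixels copy the previous field; foreground uses ST-ELA."""
    return [[p if b > k else s for b, p, s in zip(b_row, p_row, s_row)]
            for b_row, p_row, s_row in zip(bi, prev_field, st_ela_field)]


bi = update_background_index([[0, 3, 2, 2]], [[5, 5, 20, 5]])
print(bi)
print(select_output([[1, 3]], [[100, 101]], [[50, 51]]))
```

In this sketch a pixel must stay still for more than $K$ consecutive updates before the previous field is trusted, which is what keeps unreliable interfield data out of the output after motion or a scene change.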
ST-ELA is a motion adaptive de-interlacing technique that performs edge-based line averaging on a spatial-temporal window. In the current frame, the value of an interpolated pixel, $OF_n(x, y)$, can be decided by the ST-ELA method. Let
and $m_1 = |a - f|$, $m_2 = |b - e|$, $m_3 = |c - d|$, $m_4 = |p - u|$, $m_5 = |q - t|$, and $m_6 = |r - s|$. Then the pair $(v, w)$ with the minimum absolute difference can be obtained by
$$|v - w| = \min\{m_1, m_2, m_3, m_4, m_5, m_6\} \quad (5)$$
Let $A$ be defined as the average value of $(v, w)$,
$$A = \frac{v + w}{2} \quad (6)$$
Thus, the value of the interpolated pixel, $F_{ST\text{-}ELA}(x, y)$, can be obtained by
This method could raise the edge-detection consistency by checking the past and the future edge orientation of neighboring pixels.
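A minimal sketch of the ST-ELA interpolation is given below. Several points are assumptions rather than statements from the text: the pixel labels are taken to mean $a, b, c$ on the line above and $d, e, f$ on the line below in the current field, with $p, q, r$ from the previous field and $s, t, u$ from the next field, as the pairings in $m_1$ to $m_6$ suggest; the five operands of the median are likewise assumed; and the average uses integer division as a hardware shift would.

```python
from statistics import median


def st_ela(a, b, c, d, e, f, p, q, r, s, t, u):
    """Interpolate one pixel from a spatial-temporal window (sketch)."""
    # Candidate pairs in the same order as m1..m6 in the text.
    pairs = [(a, f), (b, e), (c, d), (p, u), (q, t), (r, s)]
    # Pair (v, w) with the minimum absolute difference.
    v, w = min(pairs, key=lambda vw: abs(vw[0] - vw[1]))
    # Average A of the selected pair.
    avg = (v + w) // 2
    # ASSUMPTION: median over A, the vertical neighbors b and e,
    # and the temporal neighbors q and t.
    return median([avg, b, e, q, t])


value = st_ela(a=10, b=20, c=30, d=32, e=60, f=90,
               p=0, q=40, r=0, s=100, t=44, u=100)
print(value)
```

In this example the smallest difference is $m_3 = |c - d| = 2$, so $A = 31$, and the median of the assumed operand set is 40.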
The Proposed Hardware Architecture
The proposed method uses a de-interlacing technique of low computational complexity to deliver higher quality video sequences on progressive devices; however, it is still difficult to achieve real-time interpolation. Our hardware architecture provides a simple design and low computation cost, and it is easy to implement in real time. The proposed hardware architecture pipelines scene change detection and interpolation. The interpolated value is either $F_{n-1}(x, y)$ or the result of ST-ELA; meanwhile, the scene change information of the next field is calculated during interpolation. The block diagram of the hardware architecture for the proposed method is shown in Fig. 1. The images of the previous field $F_{n-1}$, the present field $F_n$, and the next field $F_{n+1}$ are held in storage. While $F_{n-1}(x, y)$, $F_n(x, y)$, and $F_{n+1}(x, y)$ are used to calculate the results of ST-ELA, the scene change detection for the next field proceeds. If a scene change is detected, the background index is set to zero, the interfield information is ignored, and all interpolated pixels are taken from the ST-ELA operation. However, if no scene change is detected, the motion adaptive de-interlacing procedure proceeds with the background information of the current field.
Fig. 1 The block diagram of the proposed hardware architecture.
Fig. 2 The architecture of scene change detection.
The Hardware Architecture of Scene Change Detection Proposed De-interlacing
To determine the occurrence of scene change correctly, the hardware calculates $FD_{n+1}(x, y)$ according to Eq. (1), that is, the absolute difference between the previous field and the next field. The value of $FD_{n+1}(x, y)$ determines the similarity pixels statistics value, $SPS_{n+1}$, and the background index, $BI_{n+1}(x, y)$, based on Eqs. (2) and (3), respectively. The architecture of scene change detection is shown in Fig. 2. The data of $F_n(x, y)$, $F_{n+2}(x, y)$, and $BI_n(x, y)$ are read first, and then the absolute pixel difference between $F_n(x, y)$ and $F_{n+2}(x, y)$ is determined. The upper four bits of $FD_{n+1}(x, y)$ are connected to the inputs of an OR gate to test whether the value is greater than or equal to $Th$, which is set to 16 in the experiment. The output of the OR gate controls whether two counters, BI Count and SPS Count [24], increment, hold, or clear. The output is low when $FD_{n+1}(x, y)$ is smaller than $Th$; in this case both $BI_{n+1}(x, y)$, which is the content of the BI Count counter, and the SPS Count counter increase. When $BI_{n+1}(x, y)$ is greater than the threshold $K$, the interpolated pixel is defined as background; otherwise it is defined as foreground. The NOR gate holds the original data when $BI_{n+1}(x, y)$ is greater than $K$. Conversely, the output of the OR gate is high when $FD_{n+1}(x, y)$ is greater than or equal to $Th$. In this case, the content of the BI Count counter, $BI_{n+1}(x, y)$, is cleared, but the content of the SPS Count counter, $SPS_{n+1}$, is unchanged.
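The reason a single OR gate suffices for the threshold test can be checked with a short sketch: for an 8-bit difference and a threshold of 16, the value is at least 16 exactly when one of the upper four bits is set. The function name below is hypothetical, and the 8-bit width is an assumption based on 8-bit pixel data.

```python
def or_gate_threshold(fd):
    """OR of the upper four bits of an 8-bit difference value."""
    if not 0 <= fd <= 255:
        raise ValueError("expected an 8-bit value")
    upper_nibble = (fd >> 4) & 0xF
    return upper_nibble != 0


# The gate output matches the comparison fd >= 16 for every 8-bit input.
matches = all(or_gate_threshold(fd) == (fd >= 16) for fd in range(256))
print(matches)
```

Choosing a power-of-two threshold is what removes the need for a full magnitude comparator in this stage.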
SPS Count sums the similar pixels of two adjacent even or odd fields, and its value is used to determine whether a scene change occurs. A scene change occurs between $F_n$ and $F_{n+2}$ if the value of SPS Count is less than half of the total pixels in a field. In this case, $BI_{n+1}$ is cleared to zero, the contents of the two-bit shift-left register SC Reg, shown in Fig. 2, are set to high, and the multiplexer will read 0 in the next field. Otherwise, if the value of SPS Count is greater than or equal to half of the total pixels in a field, the two-bit register shifts left by one bit and inserts 0 in the least significant bit.
Fig. 3 The architecture of ST-ELA.
The Architecture of ST-ELA
The architecture of ST-ELA, shown in Fig. 3, includes data input FIFOs, the absolute difference circuits, the minimum value sorting circuit, and the median value sorting circuit [25], [26]. The interpolation times for odd and even fields differ, since the processing of an odd field is slower and consequently causes system delay. This problem is solved by the multiplexer MUX0, which balances the processing time of odd and even fields. The output of MUX0 is high when the first row of an odd field is being processed; otherwise, it remains low.
In the operation of ST-ELA, the average of the two pixels with the minimum directional difference is obtained from Eqs. (5) and (6); then the median value is found according to Eq. (7). The normal processing of ST-ELA is depicted in Fig. 4. To decrease operation time, the hardware simultaneously calculates the average of the two pixels with the minimum directional difference and sorts the median value. The absolute difference circuits calculate the directional differences, which are input to the comparators; the directional difference in each direction is obtained by subtracting the smaller number from the larger number. The comparators are located in front of the absolute difference circuit, and as a result, they save one stage of comparison when sorting the median values. The data format of the comparator in the minimum value sorting circuit has eleven bits. The upper three bits hold the decoded number and the other eight bits hold the directional difference. All the directional differences are input to the minimum value sorting circuit. After the minimum directional difference is found, the decoded number is sent to the multiplexer MUX5 to obtain the average of the two pixels with the minimum directional difference.
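The eleven-bit word format can be illustrated with the sketch below. It assumes that the decoded number is simply the index of the direction (0 to 5) and that the minimum search compares only the lower eight bits, so that the direction tag travels with the difference without affecting the ordering; neither detail is spelled out in the text, and the function names are hypothetical.

```python
def pack_word(code, diff):
    """Build an 11-bit word: 3-bit direction code above an 8-bit difference."""
    if not (0 <= code < 8 and 0 <= diff < 256):
        raise ValueError("code must fit in 3 bits and diff in 8 bits")
    return (code << 8) | diff


def minimum_direction(diffs):
    """Return (direction code, difference) of the smallest directional difference."""
    words = [pack_word(code, diff) for code, diff in enumerate(diffs)]
    # ASSUMPTION: only the lower eight bits take part in the comparison.
    best = min(words, key=lambda word: word & 0xFF)
    return best >> 8, best & 0xFF


# Six directional differences m1..m6, used only as sample data.
code, diff = minimum_direction([80, 40, 2, 100, 4, 100])
print(code, diff)
```

The returned code plays the role of the select signal that MUX5 uses to pick the pixel pair to average.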
Interpolation at the frame border is treated differently because some of the surrounding pixels are missing. The frame border interpolation hardware shown in Fig. 5 is used when the pixel being interpolated lies on the frame border while ST-ELA proceeds. The operation is similar to the normal interpolation process. The interpolation results are obtained through the absolute difference circuit, the minimum value sorting circuit, and the median value sorting circuit. Finally, the multiplexer MUX4, shown in Fig. 3, determines the result of ST-ELA by selecting the interpolation data from either the frame border interpolation hardware or the normal processing hardware.
The Simulation Results and VLSI Implementation
To evaluate the performance of video de-interlacing with scene change detection, the two threshold values used in the simulations are $Th = 16$ and $K = 2$. We compared the software simulation of the proposed method with those of Chen [12] and Gao [20]. The results for the test sequences and their characteristics using scene change detection, shown in Table 1, Fig. 6, and Fig. 7, demonstrate that the proposed method presents a more pleasing visual quality. For example, in Fig. 6(d), artifacts are eliminated and line crawling is reduced; in Fig. 7(d), artifacts are eliminated and blurring is effectively decreased. In addition, Fig. 8 illustrates the subjective quality comparison. The original video sequence is shown in Fig. 8(a); the results of Chen [12] and Gao [20] are illustrated in Figs. 8(b) and 8(c), respectively. The results from the proposed method with background index setting, shown in Fig. 8(d), demonstrate higher quality video sequences than those produced by the other methods. Furthermore, the PSNRs of the test sequence of Fig. 8, summarized in Table 2, also demonstrate that the proposed method outperforms the methods of Chen [12] and Gao [20]. The average PSNRs of the results from the proposed method and from different interpolation methods for various sequences are compared in Table 3. It indicates that the results of the proposed method have better PSNR when scene change occurs; in other situations, the proposed method also maintains good performance.
Table 2 The PSNRs of News in Fig. 8.
Table 3 The average PSNRs of the various de-interlacing methods on some video sequences.
Table 4 The chip specifications.
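For reference, a PSNR figure of this kind is commonly computed as sketched below. The text does not state the exact definition used, so the standard form for 8-bit data (peak value 255, mean squared error over all pixels) is assumed here, and the function name is hypothetical.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    count = sum(len(row) for row in reference)
    squared_error = sum((a - b) ** 2
                        for ref_row, test_row in zip(reference, test)
                        for a, b in zip(ref_row, test_row))
    mse = squared_error / count
    if mse == 0:
        return math.inf   # identical frames
    return 10 * math.log10(peak ** 2 / mse)


original = [[52, 55], [61, 59]]
restored = [[54, 55], [60, 59]]
print(round(psnr(original, restored), 2))
```

A higher value means the de-interlaced frame is closer to the original progressive frame.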
The proposed VLSI architecture has been described in a high-level language and verified by simulations of the Verilog description and by MATLAB simulations. The VLSI architecture was synthesized using the Synopsys Design Compiler with a UMC 0.18-µm, 1.8 V, six-metal-level CMOS standard cell library. The specifications of the VLSI architecture are listed in Table 4, and the chip photomicrograph is shown in Fig. 9. The power consumption of the VLSI architecture has been characterized by gate-level simulations. The required working frequency for HDTV is 105 MHz [27]; thus the chip, which runs at 128.2 MHz, can de-interlace HDTV in real time. In Table 5, the performance of the hardware architecture is compared to that of de Haan [28]. The results demonstrate that the proposed method is the more efficient of the two.
Conclusions
This paper presented an efficient VLSI architecture for video de-interlacing that takes scene change into account for real-time applications. The architecture has three features. First, the scene change detection scheme ensures that the interfield information can be used correctly. Second, the background index records the foreground and background areas of the field, which increases the precision and stability of the interfield information and improves the quality of the sequence. Third, the foreground area is interpolated by ST-ELA and the background area is filled with the pixels of the previous frame. The proposed hardware architecture provides a simple design and low computation cost, and it is easy to pipeline. The simulation results have demonstrated that this high-performance architecture is able to de-interlace HDTV in real time with better PSNR. Finally, the architecture was implemented as a VLSI chip.
