Abstract-Motion estimation algorithms are used in various video coding systems. With the advent of VLSI technology, a large collection of processing elements can be assembled to achieve high-speed computation economically. However, testing a VLSI chip becomes necessary because defects can be introduced during the design or implementation phases. This paper therefore describes a novel testing scheme for motion estimation whose key goal is to provide high reliability for the motion estimation architecture. Experimental results show that the design achieves 100% fault coverage. The main advantages of the scheme are minimal performance degradation, small hardware overhead, and support for at-speed testing.
I. INTRODUCTION
In recent years, multimedia applications have become more flexible and powerful with the development of semiconductor, digital signal processing, and communication technologies. The latest video standard [1], H.264, also known as MPEG-4 Part 10 Advanced Video Coding (AVC), is regarded as the next-generation video compression standard. In video coding systems, motion estimation (ME) [2], [3] is the most computationally demanding component of the encoder, consuming about 60%-80% of the total computation time. The motion estimation algorithm also profoundly influences the visual quality of the reconstructed images: more accurate predictions increase the compression ratio and improve the peak signal-to-noise ratio (PSNR) at a given bit rate. Since the motion estimation algorithm is not specified by the video coding standards, many algorithms exist with different hardware complexities, power consumption, and processing times.
Owing to rapid advances in semiconductor fabrication technology, a large number of transistors can be integrated on a single chip. However, integrating a large number of processors on a single chip increases the logic-per-pin ratio, which drastically reduces the controllability and observability of the on-chip logic. Consequently, testing such complex and dense circuits becomes very difficult and expensive.
Currently, built-in self-test (BIST) [4]-[8] is an effective approach to testing integrated circuits (ICs). Because the circuit and its tester are implemented on the same chip, the circuit can test itself, which reduces the need for external testing.
This paper presents a BIST scheme with minimal performance penalty and significantly smaller area overhead. With the presented test method, the circuit achieves 100% fault coverage with very low test application time and low area overhead. The experimental results demonstrate the effectiveness of this work.
The rest of this paper is organized as follows. Section II introduces motion estimation and its parallel architecture. Section III proposes design-for-testability techniques for motion estimation and the corresponding BIST scheme. Section IV presents the experimental results, and Section V concludes the paper.
II. MOTION ESTIMATION
The motion estimation process consists of a search strategy for the motion displacement offset, i.e., the motion vector (MV), and the computation of a matching cost metric, such as the sum of absolute differences (SAD) or the sum of squared differences (SSD). The search strategy selects a set of candidate MVs, evaluates the cost metric for each, and chooses the one with the minimum cost. After the encoder selects the MV that minimizes the cost metric, it encodes the difference block (prediction residual) between the original and motion-compensated blocks. Each residual block is then transformed, quantized, and entropy coded.
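As a minimal illustration of the two cost metrics and the prediction residual described above, the Python sketch below evaluates SAD and SSD for one candidate block and forms the residual. The function names and the 2×2 example blocks are hypothetical, and blocks are plain nested lists rather than any particular video-library type.

```python
def sad(block, candidate):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(c - r) for row_c, row_r in zip(block, candidate)
               for c, r in zip(row_c, row_r))


def ssd(block, candidate):
    """Sum of squared differences between two equally sized blocks."""
    return sum((c - r) ** 2 for row_c, row_r in zip(block, candidate)
               for c, r in zip(row_c, row_r))


def residual(block, prediction):
    """Prediction residual: original block minus motion-compensated block."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(block, prediction)]


if __name__ == "__main__":
    current = [[1, 2], [3, 4]]
    predicted = [[2, 2], [1, 5]]
    print(sad(current, predicted))       # 4
    print(ssd(current, predicted))       # 6
    print(residual(current, predicted))  # [[-1, 0], [2, -1]]
```

SSD penalizes large errors more heavily than SAD, while SAD needs no multipliers, which is one reason SAD is the usual choice in hardware.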
Motion estimation algorithms exploit the temporal redundancy of a segmented video sequence. Among all search algorithms, the full-search block-matching algorithm has been shown to produce the best results in finding displacement vectors (MVs). It is implemented in two stages: the calculation of the SAD for each candidate displacement vector, followed by a search for the smallest SAD value. This is summarized by (1) and (2):

$$ SAD(i, j) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \left| C(k, l) - R(i+k, j+l) \right| \tag{1} $$

$$ MV = \arg\min_{(i, j)} SAD(i, j) \tag{2} $$

where N × N is the macroblock size and (i, j) ranges over the candidate displacements in the search window.
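Equations (1) and (2) can be exercised with a short full-search sketch. It assumes an N×N current block C and a square search window W of size (N+2p)×(N+2p), so that candidate offsets (i, j) run from 0 to 2p and R(i+k, j+l) is read directly from W; the motion vector is reported relative to the window centre as (i - p, j - p). The window layout, the centring convention, and the tie-breaking rule (first minimum in raster order) are assumptions of this example, not details taken from the paper.

```python
def sad_at(C, W, i, j):
    """Equation (1): SAD between block C and the candidate at offset (i, j) in W."""
    n = len(C)
    return sum(abs(C[k][l] - W[i + k][j + l]) for k in range(n) for l in range(n))


def full_search(C, W):
    """Equation (2): evaluate every offset exhaustively and keep the minimum SAD.

    Returns ((dy, dx), cost), with the motion vector measured from the window centre.
    """
    n = len(C)
    span = len(W) - n                      # equals 2p for a (N + 2p)-wide window
    best_cost, best_ij = None, None
    for i in range(span + 1):
        for j in range(span + 1):
            cost = sad_at(C, W, i, j)
            if best_cost is None or cost < best_cost:   # first minimum wins ties
                best_cost, best_ij = cost, (i, j)
    p = span // 2
    return (best_ij[0] - p, best_ij[1] - p), best_cost


if __name__ == "__main__":
    # 8x8 window with distinct pixel values, so only the true match has SAD 0.
    W = [[(r * 8 + c) * 7 % 251 for c in range(8)] for r in range(8)]
    # 4x4 current block copied from window offset (3, 1); the centre offset is (2, 2).
    C = [row[1:5] for row in W[3:7]]
    print(full_search(C, W))  # ((1, -1), 0)
```

The double loop is the "sequential exploration of the search area"; in the parallel architecture the inner SAD sum is what the AD-PE array evaluates concurrently for each candidate.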
Here, C(k, l) and R(i+k, j+l) denote the pixels of the current macroblock and of the candidate macroblock [9] in the search region displaced by (i, j), respectively. In other words, block matching is performed by sequentially exploring the search area, while the distortion terms of each candidate are computed in parallel. Each absolute-difference (AD) node of the corresponding structure is implemented by an absolute-difference processing element (AD-PE). The parallel architecture for computing the SAD value and its AD-PE structure are shown in Fig. 1 and Fig. 2, respectively. The AD-PE stores the value of C(k, l) and receives the value of R(i+k, j+l) corresponding to the current position of the reference block in the search window. It performs the subtraction and the absolute-value computation and adds the result to the partial result coming from the AD-PE above it (see Fig. 2). The partial results are accumulated along the columns, and a linear array of adders performs the horizontal summation of these column sums to compute the SAD.

III. DESIGN-FOR-TESTABILITY AND BIST SCHEME

The presented test scheme consists of two major steps. First, to obtain complete controllability and observability of each component in a single AD-PE, ripple-carry adders (RCAs) are used as both the absolute-difference computation unit and the addition unit in each AD-PE, as shown in Fig. 3. Second, an efficient BIST scheme is designed for the motion estimation architecture, as shown in Fig. 4. The proposed BIST scheme includes three modes:
• Normal mode: In this mode, each AD-PE performs its normal function in the motion estimation architecture.
• Test mode: The test pattern generator (TPG) sends test patterns to each AD-PE, and the output responses are then compacted by the output response analyzer.
• Analyze mode: In this mode, the output response analyzer uses the compacted signature to determine whether the architecture is faulty.
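In normal mode, the AD-PE array computes the SAD as Section II describes: each AD-PE adds |C(k, l) - R(i+k, j+l)| to the partial sum arriving from the AD-PE above it, and a linear array of adders sums the column results. The behavioral Python sketch below models that dataflow for one candidate position and checks it against a direct SAD computation. It is a functional model only; `ad_pe`, `adpe_array_sad`, and `direct_sad` are hypothetical names, and pipelining, bit widths, and the test-mode input selection are not modeled.

```python
def ad_pe(c, r, partial_in):
    """One AD-PE: |c - r| added to the partial sum from the AD-PE above."""
    return partial_in + abs(c - r)


def adpe_array_sad(C, R):
    """N x N AD-PE array: accumulate down each column, then sum the columns.

    R holds the candidate block already aligned, i.e. R[k][l] = R(i+k, j+l).
    """
    n = len(C)
    column_sums = []
    for l in range(n):
        partial = 0                       # the top AD-PE of a column receives 0
        for k in range(n):
            partial = ad_pe(C[k][l], R[k][l], partial)
        column_sums.append(partial)
    total = 0
    for s in column_sums:                 # linear adder array (horizontal summation)
        total += s
    return total


def direct_sad(C, R):
    """Reference SAD computed in one expression, for comparison."""
    return sum(abs(c - r) for rc, rr in zip(C, R) for c, r in zip(rc, rr))


if __name__ == "__main__":
    C = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    R = [[12, 18, 30], [45, 50, 55], [70, 81, 88]]
    print(adpe_array_sad(C, R), direct_sad(C, R))  # 17 17
```

Because addition is associative, the column-then-row order of the array yields the same SAD as any other summation order; the order only matters for the hardware's latency and wiring.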
A. Test pattern generator
From the behavior of a one-bit full adder, the test patterns 000-111 detect all single stuck-at faults, giving 100% fault coverage. Table I lists the proposed test patterns, which cover all single stuck-at faults of the RCA, and Fig. 5 shows the proposed test pattern generator. It can be shown that these test patterns cover all single stuck-at faults of an 8-bit ripple-carry adder. The patterns can be reduced to 3-bit patterns and generated by a 3-bit LFSR. The patterns of Table I achieve 100% single stuck-at fault coverage of the RCA because, if the (Cin, A0, B0) sequence is 111→011→001→100→010→101→110, the carry into the next stage is generated by the adder itself, so the carries of the other RCA stages are generated as well. The (A1, B1) inputs of the RCA follow the sequence 00→10→01→10→11→11→01, the (A2, B2) inputs follow 10→11→11→01→00→10→01, and the (A3, B3) inputs follow 11→01→00→10→01→10→11. Each (An, Bn) pair must be matched with the corresponding carry, as shown in Table I. The propagation of the test patterns is shown in Fig. 6.
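The pattern sequences above can be checked with a small behavioral model. The Python sketch below assumes a 3-bit LFSR that shifts right and feeds s1 XOR s0 back into the most significant bit; these taps are an inference, chosen because they reproduce the listed (Cin, A0, B0) sequence 111→011→001→100→010→101→110, and the circuit in Fig. 5 may be built differently. It then applies the listed (A1, B1) to (A3, B3) sequences, reuses them with period 3 for bits 4 to 7 (also an assumption), and records which (carry-in, An, Bn) combination reaches each full-adder stage on every clock.

```python
WIDTH = 8
# (A_n, B_n) sequences for bits 1..3 as listed in the text, one entry per clock.
STAGE_AB = {
    1: ["00", "10", "01", "10", "11", "11", "01"],
    2: ["10", "11", "11", "01", "00", "10", "01"],
    3: ["11", "01", "00", "10", "01", "10", "11"],
}


def lfsr3(seed=0b111, steps=7):
    """Assumed 3-bit LFSR: shift right and feed s1 XOR s0 into the MSB."""
    state, states = seed, []
    for _ in range(steps):
        states.append(state)
        feedback = ((state >> 1) ^ state) & 1
        state = (feedback << 2) | (state >> 1)
    return states


def stage_patterns():
    """For each RCA bit, list the (carry_in, A_n, B_n) triple it sees on every clock."""
    per_stage = [[] for _ in range(WIDTH)]
    for t, state in enumerate(lfsr3()):
        carry = (state >> 2) & 1                      # Cin is the LFSR MSB
        for n in range(WIDTH):
            if n == 0:
                a, b = (state >> 1) & 1, state & 1    # A0, B0 come from the LFSR
            else:
                pair = STAGE_AB[(n - 1) % 3 + 1][t]   # assumed period-3 reuse
                a, b = int(pair[0]), int(pair[1])
            per_stage[n].append((carry, a, b))
            carry = (a & b) | (carry & (a ^ b))       # full-adder carry-out
    return per_stage


if __name__ == "__main__":
    print([format(s, "03b") for s in lfsr3()])
    # ['111', '011', '001', '100', '010', '101', '110']
    for n, patterns in enumerate(stage_patterns()):
        print(f"bit {n}: {len(set(patterns))} distinct, 000 applied: {(0, 0, 0) in patterns}")
```

Under these assumptions every bit position receives the seven non-zero (carry-in, An, Bn) combinations, and the carry into bit 4 equals the carry into bit 1 on every clock, which is what keeps the period-3 extension consistent.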
B. Output response analyzer
The output response analyzer (ORA) of the two adders can be implemented with a multiple-input shift register (MISR) and some extra logic. The MISR is widely used as a signature analyzer to compact the output responses of the circuit under test (CUT). During test mode, the MISR compacts the output responses, and its final state serves as the signature, which is checked to determine whether the CUT passes or fails. Testing with a signature analyzer is therefore simple and has low hardware cost, because the full responses to the test patterns need not be stored. The MISR of the ORA receives two kinds of input data: the AD-PE output data, passed through XOR gates, and the generated patterns, applied to its SET pins. If the MISR is initialized properly, its output after the test should be all '0', which is easy to check. As an example, the ORA for an 8-bit RCA is shown in Fig. 7.
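A minimal sketch of signature compaction is shown below, assuming an internal-XOR (Galois-style) MISR whose feedback taps follow x^8 + x^4 + x^3 + x^2 + 1, a polynomial chosen only for illustration. To obtain the all-zero fault-free result described above, the sketch XORs each response with its expected value before compaction and seeds the register with 0; this is one plausible reading of the XOR gates and initialization in the text, not a reconstruction of Fig. 7. The response values are hypothetical.

```python
def misr_step(state, data, width=8, taps=0x1D):
    """One clock of an internal-XOR MISR: shift left, apply feedback, XOR in data."""
    mask = (1 << width) - 1
    msb = (state >> (width - 1)) & 1
    state = (state << 1) & mask
    if msb:
        state ^= taps          # 0x1D encodes the x^4 + x^3 + x^2 + 1 feedback terms
    return state ^ (data & mask)


def signature(responses, expected, width=8, taps=0x1D):
    """Compact (response XOR expected); a fault-free run leaves the register at 0."""
    state = 0
    for got, want in zip(responses, expected):
        state = misr_step(state, got ^ want, width, taps)
    return state


if __name__ == "__main__":
    expected = [0x3A, 0x7F, 0x01, 0xC4, 0x55, 0x90, 0x2E]  # hypothetical golden sums
    faulty = list(expected)
    faulty[2] ^= 0x04                                      # one flipped output bit
    print(signature(expected, expected))  # 0
    print(signature(faulty, expected))    # 64
```

Because the MISR is linear and its shift-with-feedback step is invertible when the constant tap is present, a single erroneous response can never compact to zero; multiple errors can still alias to the fault-free signature with small probability, which is the usual trade-off of signature analysis.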
IV. RESULTS AND DISCUSSION
The proposed BIST architecture was implemented in Verilog HDL and synthesized with Synopsys Design Compiler. Comparisons of area overhead, fault coverage, and number of test patterns are presented to verify the performance of the proposed BIST architecture.
The design was carried out top-down at the gate level in Quartus II and verified by waveform simulation; it passes both the unit test and the integrated test. Fig. 8 shows the functional waveforms of the AD-PE (as you …). The fault verification checks the validity of the test patterns used to test the adders, in order to verify whether the simplified test of the AD-PE is reasonable. As shown in Fig. 10, when test mode is active, the TPG generates the test patterns on each clock cycle; with test on sel set to '111111', add1 sum and add2 sum show the BIST verification results. In normal mode, test on sel is set to '000000', and add1 sum and add2 sum show the verification results of the AD-PE in the motion estimation architecture.
Table II compares the number of test patterns, pin overhead, test time, and fault coverage with those of [10]. Using the proposed BIST scheme and fault models, the single stuck-at fault coverage of each AD-PE reaches 100%, which confirms the validity of the test patterns. The area overhead of the motion estimation architecture including BIST is 0.2%, which is acceptable for industrial use.
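The 100% single stuck-at result for the adders can be reproduced in miniature with a fault simulator. The Python sketch below models each full adder at gate level (two XOR gates, two AND gates, and one OR gate, with input stems and fanout branches treated as separate fault sites), applies the seven (Cin, A0, B0), (A1, B1), (A2, B2), (A3, B3) vectors listed in Section III-A to an 8-bit RCA, and counts the faults whose effect reaches any sum bit or the carry-out. The gate netlist, the fault list, and the assumption that the bit 1 to 3 sequences repeat with period 3 for bits 4 to 7 belong to this example rather than to the paper, and only the adder, not the full AD-PE or BIST controller, is modeled.

```python
WIDTH = 8
CIN_A0_B0 = ["111", "011", "001", "100", "010", "101", "110"]
STAGE_AB = {
    1: ["00", "10", "01", "10", "11", "11", "01"],
    2: ["10", "11", "11", "01", "00", "10", "01"],
    3: ["11", "01", "00", "10", "01", "10", "11"],
}
# Fault sites of one gate-level full adder: input stems, fanout branches, gate outputs.
LINES = ["a", "b", "c", "a1", "a2", "b1", "b2", "c1", "c2",
         "x", "x1", "x2", "and1", "and2", "s", "co"]


def full_adder(a, b, c, fault=None):
    """Gate-level full adder; fault is (line, stuck_value) or None."""
    def line(name, value):
        return fault[1] if fault is not None and fault[0] == name else value

    a, b, c = line("a", a), line("b", b), line("c", c)
    x = line("x", line("a1", a) ^ line("b1", b))
    s = line("s", line("x1", x) ^ line("c1", c))
    and1 = line("and1", line("a2", a) & line("b2", b))
    and2 = line("and2", line("x2", x) & line("c2", c))
    return s, line("co", and1 | and2)


def rca(a_bits, b_bits, cin, fault=None):
    """Ripple-carry adder (LSB first); fault is (bit, line, stuck_value) or None."""
    carry, outputs = cin, []
    for n in range(WIDTH):
        local = fault[1:] if fault is not None and fault[0] == n else None
        s, carry = full_adder(a_bits[n], b_bits[n], carry, local)
        outputs.append(s)
    return outputs + [carry]            # all sum bits plus the carry-out are observed


def test_vectors():
    """Expand the sequences into (A bits, B bits, Cin); bits 4..7 reuse bits 1..3."""
    vectors = []
    for t in range(len(CIN_A0_B0)):
        cin, a0, b0 = (int(ch) for ch in CIN_A0_B0[t])
        a_bits, b_bits = [a0], [b0]
        for n in range(1, WIDTH):
            pair = STAGE_AB[(n - 1) % 3 + 1][t]
            a_bits.append(int(pair[0]))
            b_bits.append(int(pair[1]))
        vectors.append((a_bits, b_bits, cin))
    return vectors


def fault_coverage(vectors):
    """Return (detected, total) over all single stuck-at faults in the RCA."""
    faults = [(n, name, v) for n in range(WIDTH) for name in LINES for v in (0, 1)]
    golden = [rca(*vec) for vec in vectors]
    detected = sum(
        any(rca(*vec, fault=f) != good for vec, good in zip(vectors, golden))
        for f in faults
    )
    return detected, len(faults)


if __name__ == "__main__":
    detected, total = fault_coverage(test_vectors())
    print(f"{detected}/{total} single stuck-at faults detected")
```

Under these assumptions all 256 modeled faults are detected, even though the all-zero full-adder pattern is never applied; a fault on an internal carry is exposed because it flips the next stage's sum bit.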
V. CONCLUSIONS
This paper describes a BIST scheme for parallel motion estimation in video coding systems. In test mode, test patterns are scanned into the motion estimation architecture under test, and the test responses are scanned out. All control signals are generated by the controller. Experimental results show that the area overhead of the BIST circuitry for the motion estimation architecture is less than 1%. The fault coverage of each AD-PE reaches 100%, which confirms the validity of the test patterns. Moreover, the BIST structure is easy to design and apply to the motion estimation architecture, which indicates that simplifying the BIST design of the AD-PE in the motion estimation architecture is reasonable.
