The increasing demand of various platform-based SoC designs has evolved in the multimedia field. In video applications, DSP processor [1] [2] [3] and/or accelerator designs [4] [5] [...] accommodate other computations. In [...] algorithms, the interconnection flexibility will be confined [...] but truly simple compared to that of [8]. Meanwhile, the data bandwidth and high-interconnection [...]

The chip implementation of the RCA will be discussed in the same section. We give a brief conclusion in the last section.
[...] fabric with the minimum number of GPEs can be obtained.
Otherwise, repeat Step 3.
Four 8-bit additions/subtractions
Two 16-bit additions/subtractions
One three-input 16-bit addition
Fig. 3. The proposed reconfigurable computation array.
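The three operation modes listed above can be sketched in software as follows. This is only an illustrative model, not the circuit of Fig. 3: it assumes the modes belong to a single GPE, that operands are packed into one 32-bit word with lane 0 in the least significant bits, and that results wrap around rather than saturate. The function names `gpe_addsub` and `gpe_add3` are hypothetical.

```python
# Illustrative model of the three operation modes (assumed 32-bit packed words).

def _lanes(word, width):
    """Split a 32-bit word into lanes of `width` bits, lane 0 = least significant."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(32 // width)]


def _pack(lanes, width):
    """Pack lane values back into one word, truncating each lane to `width` bits."""
    mask = (1 << width) - 1
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & mask) << (i * width)
    return word


def gpe_addsub(a, b, width, subtract=False):
    """Lane-wise add/subtract: width=8 gives four lanes, width=16 gives two lanes.

    Assumption: no carry or borrow crosses a lane boundary, and each lane wraps.
    """
    if width not in (8, 16):
        raise ValueError("width must be 8 or 16")
    pairs = zip(_lanes(a, width), _lanes(b, width))
    out = [(x - y) if subtract else (x + y) for x, y in pairs]
    return _pack(out, width)


def gpe_add3(a, b, c):
    """One three-input 16-bit addition (assumed to wrap to 16 bits)."""
    return (a + b + c) & 0xFFFF
```

For example, `gpe_addsub(0x01020304, 0x10203040, 8)` adds four byte lanes independently, while `gpe_addsub(a, b, 16, subtract=True)` performs two independent 16-bit subtractions.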
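Design flow (figure labels): Step 1: Algorithm Exploration; Step 2: GPE Exploration; Step 3: RCA Exploration; Step 4: Performance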
Configuration Controller and Memory
The key parameters of the configuration controller are shown in Fig. 4(a), where rA and rB denote the source operand addresses, rD represents the destination operand address, lpc0 and lpc1 are loop counters, and bA0, bB0, bD0, bA1, bB1, and bD1 denote offsets. The function of the configuration controller is to manipulate the selection of the following modes: ME, RGB2YUV, DCT/IDCT, and FIR filter. The pseudo code is illustrated in Fig. 4(b). [...] Fig. 5. The operation procedures are illustrated on the right-hand side of Fig. 5.
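One way to picture how these parameters could interact is a two-level address generator. The sketch below is not the pseudo code of Fig. 4(b), which is not reproduced here. It assumes that lpc0 counts the inner loop and lpc1 the outer loop, that the offsets bA0, bB0, bD0 are added to rA, rB, rD after every inner iteration, and that bA1, bB1, bD1 are added once after each complete inner loop. The names `ControllerConfig` and `address_stream` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControllerConfig:
    """Parameters named in the text; their exact semantics are assumed here."""
    rA: int    # first source operand address
    rB: int    # second source operand address
    rD: int    # destination operand address
    lpc0: int  # inner loop count (assumption)
    lpc1: int  # outer loop count (assumption)
    bA0: int   # offsets applied after each inner iteration (assumption)
    bB0: int
    bD0: int
    bA1: int   # offsets applied after each outer iteration (assumption)
    bB1: int
    bD1: int


def address_stream(cfg):
    """Yield the (rA, rB, rD) triple used at every inner-loop step."""
    a, b, d = cfg.rA, cfg.rB, cfg.rD
    for _ in range(cfg.lpc1):
        for _ in range(cfg.lpc0):
            yield a, b, d
            a += cfg.bA0
            b += cfg.bB0
            d += cfg.bD0
        a += cfg.bA1
        b += cfg.bB1
        d += cfg.bD1


cfg = ControllerConfig(rA=0, rB=100, rD=200, lpc0=2, lpc1=2,
                       bA0=4, bB0=4, bD0=0, bA1=-8, bB1=0, bD1=2)
for triple in address_stream(cfg):
    print(triple)
```

With these illustrative values, a negative outer offset (bA1 = -8) rewinds rA to its start after each inner loop, while rB keeps advancing and rD steps to the next result slot.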
When we operate one 8x8 block matching, the pointer- [...] -64, 0 and 2, respectively. The detailed operations are described as follows. The loop lpc0 records 4 iterations for one 8x8 block matching. In the lpc1 loop, in order to move rA to the start point of the reference block, we set bA1 = -64.

Fig. 6. Configuration of RGB2YUV.

3.3 8x8 DCT/IDCT
The DCT/IDCT is the key computation for the MPEG-4 standard. When the 8x8 DCT as shown in Fig. 7 [...]
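Returning to the 8x8 block matching described above, the loop structure can be sketched in software as follows. Several points are assumptions rather than statements from the text: the matching criterion is taken to be the sum of absolute differences (SAD), each of the 4 inner iterations is assumed to process 16 pixel pairs (64 / 4), and the candidate blocks are assumed to be stored back to back in a flat memory. The function name `match_candidates` is hypothetical.

```python
def match_candidates(memory, rA, rB, num_candidates, lanes=16):
    """Return one SAD value per candidate block.

    memory: flat list of pixel values
    rA:     address of the 8x8 reference block (64 consecutive pixels)
    rB:     address of the first candidate block (assumed contiguous layout)
    """
    sads = []
    for _ in range(num_candidates):      # outer loop (lpc1 in the text)
        acc = 0
        for _ in range(64 // lanes):     # inner loop (lpc0): 4 iterations
            for k in range(lanes):
                acc += abs(memory[rA + k] - memory[rB + k])
            rA += lanes                  # both pointers advance by 16 pixels
            rB += lanes
        sads.append(acc)
        rA += -64                        # bA1 = -64: back to the block start
    return sads


reference = [10] * 64
candidate0 = [12] * 64
candidate1 = [10] * 63 + [15]
memory = reference + candidate0 + candidate1
print(match_candidates(memory, rA=0, rB=64, num_candidates=2))
```

The inner loop advances rA by 64 pixels in total, which is why an outer-loop offset of -64 brings it back to the start of the reference block before the next candidate is evaluated.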
[...] chrominance. That is why we convert the RGB color space to the YUV space. Moreover, in most video codecs, the RGB2YUV conversion is one of the huge consuming [...]

4. Comparison Results and Implementation
In this section, we give comprehensive comparison results, as listed in Table 2 and Fig. 16, in terms of [...] and performance as well as hardware utilization rate. In terms of the number of GPEs as listed in Table 2, the number of GPEs in the RCA is reduced by 25% compared with [8]. In addition, the 12 GPEs can be equivalently regarded as three 16x16-bit multipliers. Therefore, the proposed RCA has the lowest hardware cost among the three structures (the proposed RCA, [3], and [8]). Table 2 shows that our proposed architecture has superior performance to that of [3] and keeps the same performance as that of [8]. Applying the proposed cost-effective RCA architecture, hardware utilization improvements of 25%, 18.7%, and 23.9%, as shown in Fig. 8, can be achieved for ME, RGB2YUV, and DCT/IDCT, respectively.

[...] 0.13 um CMOS process. The critical delay time obtained from the static timing analysis (STA) of Synopsys is 12.5 ns (i.e., 80 MHz) under the worst-case condition.

5. Conclusion
We have contributed a cost-effective accelerator based on the novel RCA fabric without sacrificing performance for the platform-based SoC design. Via the proposed RCA, the RCA usage rates for ME, RGB2YUV, and DCT/IDCT can be improved by 25%, 18.7%, and 23.9%, respectively, compared with those of [8].
