Abstract
I. INTRODUCTION
With mature digital video technology, inexpensive camcorders are gradually entering our lives. The original purpose of MPEG-7 is to provide a powerful search engine that helps people easily find what they are looking for. Several MPEG-7 toolkits integrate useful functionalities for categorizing and organizing personal collections. However, some related research showed that most people categorize their albums only at the semantic level, and current recognition techniques are still unable to meet this kind of demand [1]. MPEG-7 descriptors are good tools for indexing and retrieval, but their use should not be limited to these tasks. They can be creatively extended and linked to applications such as rate control in real-time video coding and movement detection in surveillance systems. In these applications, the computational complexity of a real-time implementation of the descriptors is not a trivial issue.
With statistics derived from MPEG-7 descriptors, good indications of image and video properties can provide reference parameters for video pre-processing such as auto white balance, RGB gain tuning, saturation control, auto contrast, and edge enhancement. In video coding, they can assist fast motion estimation algorithms, rate control policies, probability distribution models for entropy coding, and so on. Recent research applied the edge histogram descriptor and the scalable color descriptor (SCD) to segmentation for content-based video coding [2], [3]. In a surveillance system, the system can notify the police to keep an eye on unusual behavior by analyzing object trajectories. The face descriptor can also provide automatic identification of uncertified people to a certain degree.
MPEG-7 visual descriptors record statistics of images and video sequences in terms of color, texture, object shape, and motion. Because of the variety of possible applications, we take the implementation of color descriptors as our starting point. Color is one of the most important visual attributes for human vision and image processing. It is also an expressive visual feature in image and video retrieval. In MPEG-7, six descriptors are selected to record color statistics of images and video. Among them, the color structure descriptor (CSD) and SCD provide better image indexing and retrieval results [4]. In this paper, we focus on the architecture of CSD and SCD and on the analysis of resource sharing between them.
Although the concepts of CSD and SCD differ, they have similar color transformation, histogram accumulation, and non-linear quantization steps. These similarities make resource sharing possible. The challenge in realizing CSD for a real-time video system is that each pixel in a frame needs to be scanned 64 times. The resulting data bandwidth and excessive operating frequency make a direct CSD implementation impractical for real multimedia systems. The trade-off between input bandwidth and the local indexing algorithm for the color appearance in one structuring window (SW) has to be analyzed carefully to lower the operating frequency. Realizing SCD is not difficult, so we concentrate on providing a suitable solution for CSD and on how the two descriptors can share the same resources.
The operational analysis of a software simulation of CSD is shown in Table I. "Accumulation" comprises the operations related to moving the SW and accumulating the CSD histogram. For a video sequence with a frame size of 256x256 at 30 fps, 4.5 giga instructions per second (GIPS) and 6 gigabytes per second (GB/s) of memory bandwidth are needed. This computational cost is the reason why CSD cannot be applied to real-time products without a hardware accelerator, and no good solution exists at present.
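As a rough cross-check of the workload, the number of pixel visits of a naive CSD scan (no local SW buffer) can be estimated with a few lines of Python. This is only a back-of-the-envelope sketch: the 3 bytes per RGB pixel and the "one pixel visit per cycle" model are assumptions made here for illustration, and the function name is hypothetical. Under these assumptions the numbers agree with the 119 MHz, 357 MB/s, and 476 MHz figures quoted in the parallelism analysis of Section IV; they do not reproduce the software instruction counts of Table I.

```python
# Naive CSD workload estimate: every 8x8 structuring window rescans all of its pixels.
# Assumptions (for illustration only): 24-bit RGB input, one pixel visit per clock cycle.
FRAME_W, FRAME_H, FPS = 256, 256, 30
SW = 8                 # structuring window is 8x8 pixels
BYTES_PER_PIXEL = 3    # assumed 24-bit RGB


def naive_csd_workload(width, height, fps, sw=SW, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return (window positions per frame, pixel visits per second, bytes per second)."""
    windows = (width - sw + 1) * (height - sw + 1)
    visits_per_second = windows * sw * sw * fps
    bytes_per_second = visits_per_second * bytes_per_pixel
    return windows, visits_per_second, bytes_per_second


windows, visits, bandwidth = naive_csd_workload(FRAME_W, FRAME_H, FPS)
print(f"window positions per frame : {windows}")
print(f"pixel visits per second    : {visits / 1e6:.1f} M")
print(f"input bandwidth            : {bandwidth / 1e6:.1f} MB/s")
# With four cycles per pixel update on single-port SRAM, the clock requirement quadruples.
print(f"clock at 4 cycles per pixel: {4 * visits / 1e6:.1f} MHz")
```

The window count is 249 x 249 because an 8x8 window that moves one pixel at a time has 256 - 8 + 1 positions along each axis.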
In this paper, we first give a brief introduction to the algorithms of CSD and SCD. Next, we show the block diagrams of CSD and SCD and the similarities between them in Section III. Section IV describes the architecture of color appearance indexing in CSD. The design of the Haar transform and the resource sharing issues of SCD are discussed in Section V. Section VI shows the experimental results and Section VII concludes the paper.
II. ALGORITHMS OF CSD AND SCD
A. Color Structure Descriptor
CSD represents an image by both color accumulation and the local spatial distribution of colors. The CSD histogram is built with a moving 8x8 SW, which shifts one row or one column at a time, to observe which colors are present in it. Each color bin that is present is then incremented by exactly one, no matter how many pixels of that color exist in the window (Fig. 2(a)). As a result, dispersed gray pixels appear in more SWs, which is finally reflected in the gray bin of the CSD description. This property lets us easily distinguish images whose colors have dissimilar dispersion. Figure 3 depicts the CSD extraction procedure [7]. Our design chooses the highest number of bins for a more precise CSD description in real-time applications. The top path directs the flow of the 256-bin CSD, which starts with the color transformation from RGB to HMMD. The next step is histogram accumulation, followed by a decision on the number of bins needed. After non-linear quantization, the CSD description is derived.
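The accumulation rule described above can be sketched in software as follows. This is a minimal reference model, not the proposed hardware: it assumes the image has already been converted to color indices (one integer bin index per pixel), and the function name `csd_histogram` is hypothetical. Color transformation, bin-number decision, and non-linear quantization are left out.

```python
def csd_histogram(index_image, num_bins=256, sw=8):
    """Count, for every color bin, how many SW positions contain that color.

    index_image: list of rows, each row a list of color bin indices.
    A bin is incremented once per window, however many pixels of that
    color the window holds.
    """
    height = len(index_image)
    width = len(index_image[0])
    hist = [0] * num_bins
    for top in range(height - sw + 1):          # window moves one row at a time
        for left in range(width - sw + 1):      # and one column at a time
            present = set()
            for row in index_image[top:top + sw]:
                present.update(row[left:left + sw])
            for color in present:
                hist[color] += 1
    return hist


# 9x9 example: background color 5, one corner pixel of color 7, one center pixel of color 9.
image = [[5] * 9 for _ in range(9)]
image[0][0] = 7
image[4][4] = 9
hist = csd_histogram(image)
print(hist[5], hist[7], hist[9])
```

In the 9x9 example there are four window positions. The corner pixel falls in only one of them, while the center pixel falls in all four, so two single pixels of different colors contribute differently depending on where they sit.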
B. Scalable Color Descriptor
SCD is similar to a traditional color histogram. The difference is that SCD applies a Haar filter to the histogram and expresses it in the frequency domain. This approach provides a scalable description because the length of the description can be varied according to the precision required. Similar to frequency-domain quantization in image/video codecs, SCD reserves more bits for "low frequency" bins and fewer bits for "high frequency" bins.
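To make the idea of a scalable, frequency-ordered histogram concrete, the following sketch applies a plain unnormalized 1-D Haar decomposition (pairwise sums and differences) to a histogram and orders the coefficients from coarse to fine. It is an illustration under simplifying assumptions: the coefficient ordering across the hue, saturation, and value axes and the per-coefficient bit allocation defined by the standard are not reproduced, and the function name `haar_pyramid` is hypothetical.

```python
def haar_pyramid(hist):
    """Unnormalized 1-D Haar decomposition of a histogram.

    Each level replaces adjacent bin pairs by their sum (low band) and
    keeps their difference (high band). Output order: overall sum first,
    then differences from the coarsest level to the finest.
    """
    n = len(hist)
    if n == 0 or n & (n - 1):
        raise ValueError("histogram length must be a power of two")
    low = list(hist)
    high_bands = []
    while len(low) > 1:
        sums = [low[i] + low[i + 1] for i in range(0, len(low), 2)]
        diffs = [low[i] - low[i + 1] for i in range(0, len(low), 2)]
        high_bands.append(diffs)   # finest differences are produced first
        low = sums
    coeffs = low                   # single lowest-frequency coefficient
    for diffs in reversed(high_bands):
        coeffs = coeffs + diffs    # append coarse differences before fine ones
    return coeffs


coeffs = haar_pyramid([4, 2, 5, 5])
print(coeffs)        # [16, -4, 2, 0]
print(coeffs[:2])    # a shorter, coarser description of the same histogram
```

Keeping only a prefix of the coefficient list corresponds to a shorter description with less detail, which is the sense in which the description length can be varied.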
The SCD extraction flow and block diagram are shown in Fig. 4. [...] concept of energy centralization. Finally, the SCD description is expressed from "low frequency" to "high frequency".
III. BLOCK DIAGRAM OF CSD AND SCD
The CSD block diagram is shown in Fig. 5. After the color transform from RGB to HMMD, the main part of CSD is the three parallel local histogram observing (LHO) blocks, which indicate which colors are present in each structuring window [8], [9]. After the colors of the three SWs have been indicated, the results are summed and sent to CSD histogram accumulation. The SCD block diagram is illustrated in Fig. 4 and was described in the previous section.
The first similarity between CSD and SCD is the color transform: RGB to Hue-Max-Min-Difference (HMMD) for CSD and RGB to Hue-Saturation-Value (HSV) for SCD. Evaluation of saturation in HSV is the only overhead. The second similarity is non-linear quantization. The architectures of the two quantization blocks are the same, except for the quantization tables. The third similarity is not a characteristic of the two algorithms but the local buffer they need. The buffer in each LHO in CSD just fits the buffering needs of the Haar filter in SCD.
IV. COLOR APPEARANCE RECORDING IN CSD
A. Parallelism Analysis
Our CSD generator is specified for video sequences with a frame size of 256x256 at 30 fps. The operating frequency is limited to 27 MHz, which is common for most TV systems. This requirement can be achieved by buffering three successive SWs (8x10 pixels). The purpose of the buffer is data sharing. The scan order is shown in Fig. 6. The pixel values of the three SWs are completely updated after discarding the top-row pixels of the previous three SWs and reading in ten new bottom-row pixels for the current SWs. After the SW colors in one stripe have been indexed, we start to index the SW colors in the next stripe. The displacement between adjacent stripes is three pixels.
The degree of parallelism is decided according to the target frequency. Without a local SW buffer, each pixel in every window has to be scanned again even though it was already scanned while the neighboring window was processed. The memory bandwidth is then about 357 MB/s and the required operating frequency is 119 MHz. This figure assumes that the histogram can be updated once per cycle. Because of the access-speed limitation of single-port SRAM, however, it takes four cycles on average to update one pixel, which forces the required operating frequency up to 476 MHz. The relationship between parallelism and operating frequency is shown in Table II. Three-way parallelism is the final decision, as it meets the requirement without over-design.
B. Color Appearance Recording in LHO
How to efficiently record which colors exist in a SW is another main issue. It is unrealistic either to query all pixels at the same time or to query them one by one over 64 cycles. Querying all pixels at the same time makes the interconnection of the decision circuit very large and inconvenient to handle. Querying over 64 cycles can only be realized by raising the operating frequency. To solve this problem, we propose the LHO architecture. An LHO contains an SRAM to record the color histogram of a SW and a color appearance register bank to indicate which colors exist in the SW according to the values of the color bins.
The main idea of LHO is to record the SW histogram in order to indicate which colors exist in a SW. While updating the histogram, we observe the value of the color bin being changed and save this information in the color appearance register bank. A nonzero bin means that the color belongs to the window. After the update, the three register banks are summed and sent to the CSD histogram accumulation block.
Using SRAM to record the SW histogram is an area-efficient method, but the number of histogram update cycles is directly restricted by the SRAM specification. Single-port SRAM provides one read or one write per cycle. That means that, when a color gives us the address of the color bin to update, we read the bin value in one cycle, add or subtract 1, and write it back to the SRAM in another cycle. With an appropriate design based on dual-port SRAM, the histogram update throughput can reach one update per cycle at the expense of double the SRAM area and power. For power reasons, we choose single-port SRAM as the SW histogram buffer. Single-port SRAM takes four cycles to refresh the histogram for each pixel: two cycles to remove the contribution of the outgoing pixel and two to add the incoming pixel. Updating three SWs by refreshing ten pixels therefore takes 40 cycles. Figure 7 depicts the LHO architecture.
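The behavior of a single LHO can be modeled in software as shown below. This is a behavioral sketch under stated assumptions, not the circuit of Fig. 7: the class and function names are hypothetical, only one LHO sliding down one column of window positions is modeled (the three-way sharing of a ten-pixel row is omitted), and the initial window fill is charged the same read-plus-write cost as a normal update.

```python
class LocalHistogramObserver:
    """Behavioral model of one LHO: SW histogram plus one appearance flag per bin."""

    CYCLES_PER_UPDATE = 2  # single-port SRAM: one read cycle, then one write cycle

    def __init__(self, num_bins=256):
        self.hist = [0] * num_bins          # stands in for the SRAM
        self.present = [False] * num_bins   # stands in for the register bank
        self.cycles = 0

    def _update(self, color, delta):
        value = self.hist[color] + delta    # read, then add or subtract 1
        self.hist[color] = value            # write back
        self.present[color] = value != 0    # nonzero bin: color is in the window
        self.cycles += self.CYCLES_PER_UPDATE

    def load(self, pixels):
        for color in pixels:
            self._update(color, +1)

    def slide(self, outgoing, incoming):
        """Move the window by one row: drop the old top row, add the new bottom row."""
        for color in outgoing:
            self._update(color, -1)
        for color in incoming:
            self._update(color, +1)


def column_appearance_counts(index_image, left=0, sw=8, num_bins=256):
    """Slide one SW down a column of the image and count in how many positions each color appears."""
    lho = LocalHistogramObserver(num_bins)
    rows = [row[left:left + sw] for row in index_image]
    for row in rows[:sw]:
        lho.load(row)
    counts = [int(flag) for flag in lho.present]
    for top in range(1, len(rows) - sw + 1):
        lho.slide(rows[top - 1], rows[top + sw - 1])
        for color, flag in enumerate(lho.present):
            counts[color] += int(flag)
    return counts, lho.cycles


# 10 rows x 8 columns: the first row has color 5, all other rows have color 1.
image = [[5] * 8] + [[1] * 8 for _ in range(9)]
counts, cycles = column_appearance_counts(image)
print(counts[5], counts[1], cycles)
```

Each one-row slide of an 8-pixel-wide window costs 8 removals and 8 additions at two cycles each, that is, four cycles per pixel position, which mirrors the per-pixel cost stated above.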
V. HAAR TRANSFORM AND RESOURCE SHARING IN SCD
A. Haar Transform
The Haar transform in SCD operates on the color histogram to represent it in a "frequency domain" form. Depending on the precision that users require, SCD descriptions are sent from the low band to the high band. The architecture of the Haar transform is shown in Fig. 8. Two bin values are saved in two registers, ready for the summation and subtraction of the Haar filter. Most of the time, the subtraction results, which represent the high band, are transmitted to the saturation and format changing block as the final output. The last output of the Haar filter represents the bin with the lowest frequency and is sent to the saturation and format changing block in the last cycle. Three SRAMs in 
The most time-consuming and area-consuming part of (1) is the evaluation of the hue value. If the divider and the multiplier were mapped directly into hardware, they would require high precision and become a bottleneck of the chip. Since the dividend and divisor are 8-bit integers and the final hue output is quantized into 16 categories, both drawbacks can be eliminated by building a mapping table for hue evaluation. According to our synthesis results, realizing the color transformation by table look-up saves 36% of the area and shortens the critical path compared with using the fundamental operations. Saturation evaluation in HSV is the only overhead. Similar to the look-up table for hue, we also build a look-up table for saturation.
C. Non-linear Quantization
Non-linear quantization in CSD and SCD can be achieved with the same binary comparison method and folding technique. In CSD, non-linear histogram quantization is the final step after histogram accumulation is finished. Each bin has to be quantized into 8 bits, which would take 255 comparisons if done directly. With binary comparison, only eight comparisons are needed to quantize one bin. This strategy is shown in Fig. 9. As shown in (a), each time we compare the bin with the center value of the valid range. Since the latency of non-linear quantization is negligible compared with that of CSD histogram accumulation, the 255 comparators can be folded into one. With the architecture in (b), 2048 (256x8) cycles and one comparator are needed to complete this work. In SCD, non-linear quantization is applied before the Haar transform. Each bin is quantized into 4 bits. With the same architecture but a different quantization table, 1024 (256x4) cycles are required to finish the quantization.
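The binary comparison strategy can be sketched as a binary search over a sorted threshold table, with one comparison per step to mirror the single folded comparator. The threshold values below are placeholders chosen only to be non-linear; they are not the quantization tables of CSD or SCD, and the function name `quantize_bin` is hypothetical.

```python
def quantize_bin(value, thresholds):
    """Map a bin value to a quantization level by binary search.

    thresholds: ascending list of 2**bits - 1 boundaries; level k is the
    number of thresholds that are <= value.
    Returns (level, number of comparisons used).
    """
    lo, hi = 0, len(thresholds)   # candidate levels are lo..hi inclusive
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2      # center of the remaining valid range
        comparisons += 1          # one comparator, reused every cycle
        if value >= thresholds[mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons


# Placeholder non-linear tables (assumed for illustration only).
thresholds_8bit = [k * k for k in range(1, 256)]   # 255 boundaries, 256 levels
thresholds_4bit = [2 ** k for k in range(15)]      # 15 boundaries, 16 levels

level, used = quantize_bin(10, thresholds_8bit)
print(level, used)                                  # 3 8
total_cycles = sum(quantize_bin(v, thresholds_8bit)[1] for v in range(256))
print(total_cycles)                                 # 2048 for 256 bins
```

Because the number of candidate levels is a power of two, the search range halves exactly at every step, so an 8-bit output always takes eight comparisons and a 4-bit output always takes four, independent of the bin value.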
VI. EXPERIMENTAL RESULT
In this section, we show the indexing and retrieval results of CSD and SCD for comparison. The indexing and retrieval database contains 52b images in 78 categories. The images were collected from the Internet and manually categorized. The number of images per category varies from 2 to 28. The semantic categories include landscapes, animals, buildings, transportation, night scenes, paintings, cartoons, and so forth. Nine sample images are shown in Fig. 10. Furthermore, to extend the concepts of these descriptors to image and video coding, we replace the default color spaces with the YCbCr domain, and the performance drops only slightly.
Here we use a quantitative measure method called query-by-example (QBE) suggested by . QBE sorts the distances between the description vector of the query image and those of the images contained in a database. The retrieval rank is the rank at which a certain ground-truth image is retrieved. The normalized modified retrieval rank (NMRR) eliminates the influence of the number of ground-truth images. Finally, the average normalized modified retrieval rank (ANMRR) is the average of the NMRR over all queries. A smaller ANMRR means that the descriptor provides better indexing and retrieval ability. [...] experimental results in [6]. These good results imply that we can apply the concepts of these descriptors to the field of image and video coding, which uses YCbCr as the default color space.
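For readers unfamiliar with the measure, the sketch below computes NMRR and ANMRR from retrieval ranks. The text above does not spell out the formula, so this follows the definition commonly used in MPEG-7 color descriptor evaluations as an assumption: ranks beyond a cut-off K are replaced by a penalty of 1.25 K, with K = min(4 NG, 2 GTM), where NG is the number of ground-truth images of the query and GTM is the largest NG over all queries. Function names are hypothetical.

```python
def nmrr(ranks, max_ground_truth):
    """Normalized modified retrieval rank of one query.

    ranks: 1-based ranks at which the ground-truth images were retrieved.
    max_ground_truth: largest ground-truth set size over all queries (GTM).
    """
    ng = len(ranks)
    k = min(4 * ng, 2 * max_ground_truth)          # assumed cut-off rank
    penalized = [r if r <= k else 1.25 * k for r in ranks]
    avr = sum(penalized) / ng                      # average (penalized) rank
    mrr = avr - 0.5 * (1 + ng)                     # 0 when retrieval is perfect
    return mrr / (1.25 * k - 0.5 * (1 + ng))       # scaled into [0, 1]


def anmrr(ranks_per_query):
    """Average NMRR over all queries; smaller is better."""
    gtm = max(len(ranks) for ranks in ranks_per_query)
    return sum(nmrr(ranks, gtm) for ranks in ranks_per_query) / len(ranks_per_query)


perfect = [[1, 2, 3], [1, 2]]      # every ground-truth image retrieved first
print(anmrr(perfect))              # 0.0
print(nmrr([100, 200], 2))         # 1.0, all ground-truth images missed
```

A value of 0 means that all ground-truth images are ranked at the top, and a value of 1 means that none of them is found within the cut-off.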
VII. CONCLUSION
In this paper, we present a vision of future MPEG-7 descriptor applications, not only for indexing and retrieval but also for real-time multimedia applications. We also propose the first analysis of a dedicated hardware architecture that combines the CSD and SCD descriptors and can generate CSD and SCD descriptions together for a frame size of 256x256 at 30 fps. This design provides a speed-up of about 12 times over running on a 2.54 GHz microprocessor platform, which makes real-time applications achievable. The detailed design exploration of the hardware implementation and the practical reference data of the prototype are valuable for future researchers.
