Abstract
I. Introduction
The use of a lossless data compressor can bring a number of increasingly important benefits to an electronic system. This contrasts with audio and video compression systems (such as JPEG and MPEG), which are lossy and hence only recreate an approximation of the original data. The most obvious benefit of data compression is a reduction in the volume of data that must be stored. This is important where the storage medium itself is costly (such as memory) or where other parameters, such as power consumption, weight or physical volume, are critical to product feasibility. Data compression reduces the total storage requirement and thus yields a cost saving.
The push to roll out high-definition video-enabled video and imaging equipment is creating numerous challenges for video system architects. The increased image resolution brings with it higher performance requirements for basic video data path processing and next-generation compression standards, outstripping what standalone digital signal processors (DSPs) can provide. In addition, the system specifications require designers to support a range of standard and custom video interfaces and peripherals usually not supported by off-the-shelf DSPs. While it is possible to go the route of application-specific integrated circuits (ASICs) or to use application-specific standard products (ASSPs), these can be difficult and expensive alternatives that might require a compromised feature set. Furthermore, these choices can lead to a short product life cycle and force yet another system redesign to meet varied and quickly changing market requirements. Field programmable gate arrays (FPGAs) are an option that can bridge the flexibility gap in these types of designs.
Additionally, with the increasing number of embedded hard multipliers and high memory bandwidth, the latest generation of FPGAs can enable customized designs for video systems while offering a manifold performance improvement over the fastest available stand-alone DSPs. With state-of-the-art FPGA co-processor design flows, designers can now implement high-performance DSP video and image processing applications. This new generation of tools facilitates the design of a system architecture that is more scalable and powerful than traditional DSP-only designs while at the same time taking advantage of the price and performance benefits of FPGAs. We have been researching high-performance lossless data compression/decompression hardware as the means to achieve the high throughput target [1]-[5].
II. Video Compression Application Requirements
A wide variety of digital video applications currently exist. They range from simple low-resolution, low-bandwidth applications (multimedia, picture phone) to very high-resolution, high-bandwidth applications (HDTV). This section presents the requirements of current and future digital video applications and the demands they place on the video compression system. To illustrate the importance of video compression, the transmission of digital television signals is considered. The bandwidth required by a digital television signal is approximately one-half the number of picture elements (pixels) displayed per second. The analog pixel size in the vertical dimension is the distance between scanning lines, and in the horizontal dimension it is the distance the scanning spot moves during ½ cycle of the highest video signal transmission frequency. The bandwidth is given by the equation
$$B_W = \frac{1}{2}\,(F_R)(N_L)(R_H)$$
where $B_W$ = system bandwidth, $F_R$ = number of frames transmitted per second (fps), $N_L$ = number of scanning lines per frame, and $R_H$ = horizontal resolution (lines), proportional to pixel resolution.
Video compression and video CODECs will therefore remain a vital part of the emerging multimedia industry for the foreseeable future, allowing designers to make the most efficient use of available transmission or storage capacity. In this paper we introduce the basic components of an image or video compression system. We begin by defining the concept of an image or video encoder (compressor) and decoder (decompressor). We then describe the main functional blocks of an image encoder/decoder (CODEC) and a video CODEC. We emphasize the real implementation of the CODEC as a VLSI system on an FPGA (Xilinx Spartan 3 kit).
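As a rough numerical illustration of the bandwidth relation in this section (one-half the number of pixels displayed per second), the following minimal sketch evaluates it for one set of parameters. The function name and the example values (30 fps, 525 lines, horizontal resolution of 400) are assumptions chosen for illustration and are not taken from the paper.

```python
def video_bandwidth_hz(frame_rate, lines_per_frame, horizontal_resolution):
    """Approximate bandwidth as half the number of pixels displayed per second."""
    pixels_per_second = frame_rate * lines_per_frame * horizontal_resolution
    return 0.5 * pixels_per_second


# Hypothetical example values, chosen only to exercise the formula.
bandwidth = video_bandwidth_hz(frame_rate=30, lines_per_frame=525,
                               horizontal_resolution=400)
print(f"Approximate bandwidth: {bandwidth / 1e6:.2f} MHz")
```

Because the relation is linear in each factor, doubling the frame rate, the line count, or the horizontal resolution doubles the required bandwidth.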
III. Basic Video Compression Scheme
Video or visual communications require significant amounts of information transmission. Video compression, as considered here, involves the bit rate reduction of a digital video signal carrying visual information. Traditional video-based compression, like other information compression techniques, focuses on eliminating the redundant elements of the signal. The degree to which the encoder reduces the bit rate is called its coding efficiency; equivalently, its inverse is termed the compression ratio:
$$\text{coding efficiency} = (\text{compression ratio})^{-1} = \frac{\text{encoded bit rate}}{\text{decoded bit rate}}$$
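The two quantities defined above are reciprocals of each other, which the following minimal sketch makes explicit. The function names and the example bit rates (a 100 Mb/s decoded stream encoded at 4 Mb/s) are hypothetical and serve only to illustrate the definitions.

```python
def compression_ratio(decoded_bit_rate, encoded_bit_rate):
    """Ratio of the decoded (uncompressed) bit rate to the encoded bit rate."""
    return decoded_bit_rate / encoded_bit_rate


def coding_efficiency(decoded_bit_rate, encoded_bit_rate):
    """Inverse of the compression ratio: encoded bit rate over decoded bit rate."""
    return encoded_bit_rate / decoded_bit_rate


# Hypothetical bit rates in bits per second.
decoded, encoded = 100e6, 4e6
print("compression ratio:", compression_ratio(decoded, encoded))
print("coding efficiency:", coding_efficiency(decoded, encoded))
```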
IV. Basic Video Transmission Scheme
Applications of digital video range from low-quality videophones and teleconferencing to high-resolution television. The most effective compression algorithms remove the temporal redundancy with motion compensation. Local image displacements are measured from one frame to the next and are coded as motion vectors. Each frame is predicted from the previous one by compensating for the motion. An error image is calculated and compressed with a transform code. The MPEG standards are based on this motion compensation. For teleconferencing, color images have only 360 by 288 pixels. A maximum of 30 images per second are transmitted, but more often 10 or 15. If the images do not include too much motion, a decent quality video is obtained at 128 kb/s, which can be transmitted in real time through a digital telephone line. The High Definition Television (HDTV) format has color images of 1280 by 720 pixels and 60 images per second. The resulting bit rate is on the order of $10^3$ Mb/s. To transmit HDTV [11] through channels used by current television technology, the challenge is to reduce the bit rate to 20 Mb/s without any loss of quality.
A Field Programmable Gate Array (FPGA) is a semiconductor device containing programmable logic components and programmable interconnects [1]. The programmable logic components can be programmed to duplicate the functionality of basic logic gates (such as AND, OR, XOR, NOT) or more complex combinatorial functions such as decoders or simple math functions. In most FPGAs, these programmable logic components (or logic blocks) also include memory elements, which may be simple flip-flops or more complete blocks of memory. FPGAs require three major types of elements [2]. Figure 4 shows the basic architecture of an FPGA that incorporates these three elements.
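Returning to the bit rates quoted earlier in this section for teleconferencing and HDTV, the raw figures can be checked with a short calculation. The sketch below assumes 24 bits per color pixel, which is an assumption made here and is not stated in the paper; the helper name is likewise hypothetical.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel=24):
    """Uncompressed bit rate in bits per second (24 bits/pixel is an assumption)."""
    return width * height * frames_per_second * bits_per_pixel


hdtv = raw_bit_rate(1280, 720, 60)        # HDTV format quoted in the text
teleconf = raw_bit_rate(360, 288, 15)     # teleconferencing at 15 images/s

print(f"HDTV raw rate: {hdtv / 1e6:.1f} Mb/s")
print(f"Reduction needed for 20 Mb/s: {hdtv / 20e6:.1f}x")
print(f"Teleconferencing raw rate: {teleconf / 1e6:.1f} Mb/s")
print(f"Reduction needed for 128 kb/s: {teleconf / 128e3:.1f}x")
```

Under this assumption the HDTV raw rate comes out at roughly 1.3 × 10^3 Mb/s, consistent with the order of magnitude given in the text.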
VII.A Design Architecture of the Compressor/Decompressor
The design is a dictionary-style compressor based around a dictionary implemented in the form of a content addressable memory (CAM). The length of the CAM varies from 16 to 1024 tuples (4-byte locations), trading complexity for compression. Typically, the device complexity increases by a factor of 1.5 each time the dictionary size doubles. The design also has master control logic, target control logic and a DMA controller. The architecture has two independent compression and decompression engines that can work simultaneously in full-duplex mode. Each engine has two 32-bit ports that are used to move data to and from the PCI interface logic. Each of these ports has its own buffering scheme formed by dual-port SRAM memory blocks plus control logic that enables a smooth flow of data during compression and decompression operations. A handshaking protocol formed by bus request, bus acknowledge and wait signals is used by each of these four ports to input or output data to the interface core. Compression and decompression commands are issued through a common 32-bit control data port. A 4-bit address is used to access the internal registers that store the commands, information related to compressed and uncompressed block sizes, and CRC (Cyclic Redundancy Check) codes to verify the compression operations. Each channel includes a CRC unit that processes the uncompressed data entering the compression engine or leaving the decompression engine. A total of 10 registers form the register bank: 5 registers are used to control the compression channel and the other 5 the decompression channel. The first bit in the address line indicates whether the read/write operation accesses the compression or the decompression registers. The device includes a test mode that simultaneously decompresses the block being compressed and reports any mismatches in the CRC codes using an interrupt. The architecture is based around a block of CAM to realize the dictionary. A CAM [14, 17, 18] is used to store data, much the same as a RAM.
The write mode of CAM and RAM is similar to some degree, but the read mode differs significantly. With RAM we input an address and get data out. With CAM we input data and, if this data is stored in the CAM, we get the address of that data out. There is an address at the output even if there is no match, so with a CAM we need a Match bit to indicate whether the CAM contains the input data.
Figure 3. FPGA synthesis and implementation
Figure 4. Basic Architecture of FPGA
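The difference between the RAM-style and CAM-style read described above can be illustrated with a small behavioral model. This is a software sketch only, not the hardware design: the class and method names are hypothetical, the sequential loop stands in for the parallel comparison done in hardware, and returning address 0 on a miss is an assumption used to show why the Match bit is needed.

```python
class SimpleCam:
    """Behavioral model of a small CAM (hypothetical, for illustration only)."""

    def __init__(self, depth):
        self.cells = [None] * depth  # None marks an unwritten location

    def write(self, address, data):
        # Write mode: the same as a RAM, data is stored at a given address.
        self.cells[address] = data

    def read_ram_style(self, address):
        # RAM read: address in, data out.
        return self.cells[address]

    def search(self, data):
        # CAM read: data in, (match bit, address) out.
        # Hardware compares all locations in parallel; this loop models that.
        for address, stored in enumerate(self.cells):
            if stored == data:
                return True, address
        # An address is still driven on a miss (assumed 0 here),
        # so the caller must check the match bit first.
        return False, 0


cam = SimpleCam(depth=4)
cam.write(2, 0xCAFE)
print(cam.search(0xCAFE))  # match bit set, address of the stored word
print(cam.search(0xBEEF))  # match bit clear, address output is meaningless
```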
The figure shows how a CAM has been applied for string matching in this work. The data to be matched is sent to the CAM as a byte stream. In parallel, it is compared to all strings (i.e. words) stored in the CAM. If a match is found, it is indicated by the Match bit. The Match Address reports the "address" of the string that matched in the CAM. Exact string matching is performed, and thus only one string (or none) will give a match. The string matcher has not yet been integrated with the Snort program, although this is the intention.
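One possible software reading of this string-matching scheme is sketched below: each incoming byte extends the stream, and every stored word is checked against the most recent bytes. The function name, the example words and the example stream are hypothetical, and the nested loop only models what the hardware would do in parallel.

```python
def match_stream(stream, words):
    """Return (start position, match address) for every stored word found.

    `words` plays the role of the CAM contents; the index of a word in the
    list is treated as its match address (an assumption for this sketch).
    """
    hits = []
    for end in range(1, len(stream) + 1):
        for address, word in enumerate(words):
            # Compare the last len(word) bytes received so far with the word.
            if word and stream[max(0, end - len(word)):end] == word:
                hits.append((end - len(word), address))
    return hits


# Hypothetical stored words of different widths and a sample byte stream.
stored_words = [b"/etc", b"passwd"]
print(match_stream(b"GET /etc/passwd", stored_words))
```

Note that with stored words of different lengths, two words could end at the same byte; the sketch reports both, whereas the text assumes at most one match.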
A further contribution of this paper is the implementation of a Variable Word-Width Content Addressable Memory (CAM) in an FPGA for string matching. This gives great flexibility with respect to how the Snort rule set is defined. In order to design a system for an FPGA, a Hardware Description Language (HDL) is needed. VHDL [18, 19, 20] was chosen for this work in spite of its somewhat restricted programming features.
VIII. The Dictionary Architecture
In our architecture the CAM-based dictionary has 16, 32 or 64 tuples. The n-tuple dictionary is formed by a total of n×32 CAM cells. Each cell stores one bit of a data tuple and can either maintain its current data or load the data present in the cell above. The dictionary architecture is shown in the figure above. The architecture compares the search data with the data present in the dictionary using one XOR gate per input bit, plus a tree of 2-input AND gates with log2(dictionary width) levels, to obtain a single comparison bit per dictionary position. Although the delay of the search operation is in principle independent of dictionary length, the fan-outs and long wires of large dictionaries reduce its speed considerably. An adaptation vector, named move in the figure, whose length equals the dictionary length, defines which cells keep their current data and which cells load data from their north-neighboring cell.
We implemented the CAM in hardware using Xilinx ISE 8.1i.
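The search and adaptation behavior of the dictionary described in this section can be sketched as a bit-level software model. This is not the VHDL implementation: all names are hypothetical, and the choice that position 0 loads the incoming tuple when its move bit is set is an assumption, since the text only specifies loading from the cell above.

```python
WORD_BITS = 32  # each dictionary position holds one 4-byte tuple


def and_tree(bits):
    """Reduce bits with a tree of 2-input ANDs; return (result, tree depth)."""
    levels = 0
    while len(bits) > 1:
        bits = [bits[i] & bits[i + 1] for i in range(0, len(bits), 2)]
        levels += 1
    return bits[0], levels


def tuple_matches(stored, data):
    """One XOR per bit (inverted so 1 means 'equal'), then the AND tree."""
    equal_bits = [1 - (((stored ^ data) >> i) & 1) for i in range(WORD_BITS)]
    result, _ = and_tree(equal_bits)
    return bool(result)


def search(dictionary, data):
    """One comparison bit per dictionary position."""
    return [tuple_matches(stored, data) for stored in dictionary]


def adapt(dictionary, move, new_tuple):
    """Positions with move=1 load from the position above; others hold.

    Assumption: position 0 has no neighbor above and loads new_tuple instead.
    """
    updated = list(dictionary)
    for i, move_bit in enumerate(move):
        if move_bit:
            updated[i] = new_tuple if i == 0 else dictionary[i - 1]
    return updated


dictionary = [0xA, 0xB, 0xC, 0xD]
print(search(dictionary, 0xC))
print(adapt(dictionary, move=[1, 1, 1, 0], new_tuple=0xC))
```

With a 32-bit tuple the AND tree has five levels, which matches the logarithmic depth given in the text.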
IX. Implementation and RTL Overview
X. Conclusions
In this paper we presented the initial portion of our research project on the ASIC design, implementation and verification of a large video data transformation SoC [24] design in an FPGA system. In future, the complete pre-coder design will be appended to the coders, and a complete video compression system will be realized after mathematical verification of the algorithm. The intention is to use a Xilinx Spartan 3E FPGA for implementation of the ASIC design. Moreover, the design should be tested in real time using Xilinx ISE 8i.
XI. Acknowledgment
I am thankful to Prof. Amit Konar for his kind guidance, and to the Embedded System Lab of CMERI, Durgapur, for the lab support that enabled me to complete this work.
