Abstract: in this paper
I. Introduction
The discrete wavelet transform (DWT) is widely used in image compression and is a building block of several image compression standards, such as JPEG2000. This is because the DWT decomposes a signal into sub-bands that carry both time and frequency information, which helps achieve a high compression ratio. The architecture of the two-dimensional DWT is composed mainly of multirate filters. Because practical applications involve extensive computation, parallel and embedded decimation techniques are employed to optimize the architecture, which mainly comprises two horizontal filter modules and one vertical filter module working in parallel and in pipeline fashion. The architecture is designed to generate two outputs per working clock cycle, alternating between the coefficients of two sub-bands, and it takes approximately 2N²(1 − 4⁻ᴶ)/3 clock cycles to compute J levels of decomposition for an N×N image. For simplicity and without loss of generality, we introduce the architecture by taking the bi-orthogonal 9/7 wavelet transform as an example.
II. Background
As computing power grows, 3D animation is becoming one of the vital parts of communication, and 3D animation compression naturally attracts attention. There are two approaches to 3D animation compression. In the first approach, the topology is assumed to be static, with little or no change in its properties. In the second approach, the topology may change arbitrarily between frames.
III. Proposed Compression Scheme
The data to be compressed is assumed to be regularly sampled, as is common in scientific and medical visualization. A new scheme, called the lifting scheme, is applied to compress the image, and the inverse discrete wavelet transform is applied to recover the original image. Because the compression technique is lossy, it is difficult to recover the original image exactly. The process is divided into three steps, namely split, predict, and update, which are discussed below.
IV. Lifting Scheme for 2-D Wavelet Transform
The lifting scheme constructs wavelets mainly in the spatial domain and comprises three steps: split, predict, and update. Lifting wavelets are called second-generation wavelets. The basic principle is to break up the polyphase matrices of the wavelet filters into a sequence of upper and lower triangular matrices, which converts the filter implementation into banded matrix multiplications. Every finite impulse response (FIR) wavelet filter can be factored into lifting steps, and this factorization is highly non-unique.
Predict:
We use n even samples to predict the value of a neighboring odd sample. With a good prediction method, the chance is high that the original odd sample and its prediction are close, so we replace the odd sample with the difference between the two. As long as the signal is highly correlated, the newly calculated odd samples will on average be smaller than the original ones and can be represented with fewer bits. At this point the odd half of the signal has been transformed. To transform the other half, we would have to apply the predict step to the even half as well. However, because the even half is merely a sub-sampled version of the original signal, it has lost some properties that we want to preserve. In the case of images, for example, we may want to keep the intensity (the mean of the samples) constant across the different levels.
Update: The third step updates the even samples using the newly calculated odd samples so that the desired property is preserved. We apply these three steps repeatedly to the even samples, transforming half of them each time, until all samples are transformed.
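The three steps can be illustrated with a minimal Python sketch. It assumes the simplest possible predictor (each odd sample is predicted by the even sample to its left, as in the Haar wavelet) and an update that preserves the mean of the samples. The function names are hypothetical and chosen only for this illustration; the sketch is not the filter used by the proposed architecture.

```python
def lifting_step(signal):
    """One level of Haar-style lifting on an even-length list of numbers."""
    # Split: separate the even-indexed and odd-indexed samples.
    even = signal[0::2]
    odd = signal[1::2]
    # Predict: each odd sample is predicted by its even neighbour and
    # replaced by the prediction error (the detail coefficient).
    detail = [o - e for o, e in zip(odd, even)]
    # Update: correct the even samples so that their mean equals the
    # mean of the input signal.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def inverse_lifting_step(approx, detail):
    """Undo one lifting level by running the steps backwards."""
    even = [a - d / 2 for a, d in zip(approx, detail)]   # undo update
    odd = [d + e for d, e in zip(detail, even)]          # undo predict
    signal = []
    for e, o in zip(even, odd):                          # merge (undo split)
        signal.extend([e, o])
    return signal


def lifting_transform(signal, levels):
    """Apply the three steps repeatedly to the approximation half."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = lifting_step(approx)
        details.append(detail)
    return approx, details


samples = [2, 4, 6, 8, 10, 12, 14, 16]
coarse, detail_levels = lifting_transform(samples, 3)
```

For this smooth input, the detail coefficients stay small compared with the samples themselves, and the single remaining approximation coefficient equals the mean of the input, which is the property the update step is meant to preserve.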
A. Wavelet Realization and Implementation
Wavelets are functions that satisfy certain requirements; a wave is an oscillating function of time or space. The name wavelet comes from the requirement that these functions integrate to zero, waving above and below the x-axis. A wavelet is a small wave with finite energy that is concentrated in time or space, which makes it a tool for analyzing and comparing transient, nonstationary, or time-varying phenomena. The diminutive in the name suggests that the function has to be localized. There are many kinds of wavelets; one of the simplest is the Haar wavelet. In short, a wavelet is a wave-like oscillation that starts at zero, increases, and then decreases back to zero.
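The zero-integral and finite-energy requirements can be checked numerically for the Haar wavelet. The sketch below assumes the usual definition of the Haar mother wavelet on the unit interval and uses a simple midpoint rule; the helper names are hypothetical.

```python
def haar_wavelet(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def midpoint_integral(f, a, b, steps=3000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Integrate over an interval wider than the support [0, 1) to show localization.
area = midpoint_integral(haar_wavelet, -1.0, 2.0)
energy = midpoint_integral(lambda t: haar_wavelet(t) ** 2, -1.0, 2.0)
```

The area comes out as zero (the positive and negative lobes cancel) while the energy is finite and close to one, and the function vanishes outside the unit interval.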
B. Multirate Filter Bank for DWT
The computations of the DWT are inner products of the input signal with a family of wavelets. The bi-orthogonal 5/3 wavelet transform, which can be factored into two lifting stages, is adopted in the JPEG2000 standard for lossless image compression. For simplicity, we take it as an example to introduce the proposed architecture for the 2-D DWT, using the conventional lifting factorization for the forward transform of the 5/3 wavelet filters. Each level of the decomposition shown in the figure is composed of the filters G and H followed by downsampling, and splits the input data into two parts. G and H denote the high-pass filter and the low-pass filter, respectively. Downsampling outputs one sample for every two input samples. For convenience of description, a three-level DWT with 4-tap G and H filters is used in the following architecture design.
Figure: Three stage wavelet decomposition.
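The two lifting stages of the 5/3 transform can be sketched in Python as follows. The sketch assumes the commonly cited reversible integer form of the 5/3 lifting steps (one predict stage and one update stage with floor rounding), an even-length input, and symmetric extension at the boundaries. These choices and the function names are assumptions made for illustration and are not taken from the architecture described here.

```python
def forward_53(x):
    """Forward reversible 5/3 lifting of an even-length list of integers."""
    even, odd = x[0::2], x[1::2]
    m = len(odd)
    # Stage 1 (predict): high-pass coefficients from the odd samples.
    d = []
    for n in range(m):
        right = even[n + 1] if n + 1 < m else even[n]  # mirror at the right edge
        d.append(odd[n] - (even[n] + right) // 2)
    # Stage 2 (update): low-pass coefficients from the even samples.
    s = []
    for n in range(m):
        left = d[n - 1] if n > 0 else d[0]             # mirror at the left edge
        s.append(even[n] + (left + d[n] + 2) // 4)
    return s, d


def inverse_53(s, d):
    """Inverse transform: undo the update stage, then the predict stage."""
    m = len(d)
    even = []
    for n in range(m):
        left = d[n - 1] if n > 0 else d[0]
        even.append(s[n] - (left + d[n] + 2) // 4)
    x = []
    for n in range(m):
        right = even[n + 1] if n + 1 < m else even[n]
        x.extend([even[n], d[n] + (even[n] + right) // 2])
    return x


low, high = forward_53([10, 12, 14, 16, 18, 20, 22, 24])
```

Because each stage only adds an integer function of the other half of the samples, the inverse can subtract exactly the same quantity, which is why this form suits lossless compression.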
The filter coefficients are stored in shift registers. As the data enter the module sequentially, the first input sample is multiplied by g0 and g2, the second input sample is multiplied by g1 and g3, and so forth. The resulting products are placed into two sets of shift registers. Next, the adders A, B, and C sum the data in the shift registers. The whole data flow for the G part is depicted in Fig. 2. If this hardware module is used to build the entire three-level DWT, the utilization of the modules at the higher levels becomes lower. From the computation equations above, the computing times of levels 2 and 3 are one half and one quarter of that of level 1, respectively. Therefore, the second-level and third-level computations can share a single hardware module by feeding its output back to its input.
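The workload argument can be made concrete with a small software model of a three-level filter bank. The sketch below assumes the Daubechies 4-tap coefficients as stand-ins for the 4-tap G and H filters, periodic extension at the signal boundary, and one input sample per clock cycle. None of these choices is specified in the text, and the function names are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Assumed 4-tap low-pass filter H (Daubechies-4) and its high-pass mate G.
H = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
G = [H[3], -H[2], H[1], -H[0]]


def analysis_level(x, h=H, g=G):
    """Filter x with H and G and keep one output for every two inputs."""
    n = len(x)
    low, high = [], []
    for k in range(0, n, 2):  # only the outputs that survive downsampling
        low.append(sum(h[i] * x[(k - i) % n] for i in range(len(h))))
        high.append(sum(g[i] * x[(k - i) % n] for i in range(len(g))))
    return low, high


def multilevel_dwt(x, levels=3):
    """Feed the low-pass output of each level into the next level."""
    approx, highs, workloads = list(x), [], []
    for _ in range(levels):
        workloads.append(len(approx))  # input samples handled at this level
        approx, high = analysis_level(approx)
        highs.append(high)
    return approx, highs, workloads


approx, highs, workloads = multilevel_dwt([1.0] * 64)
# Fraction of the level-1 time that a dedicated module at each level is busy.
dedicated_utilization = [w / workloads[0] for w in workloads]
# Utilization when levels 2 and 3 share one module.
shared_utilization = (workloads[1] + workloads[2]) / workloads[0]
```

Under these assumptions, dedicated modules for levels 2 and 3 would be busy for only one half and one quarter of the level-1 time, whereas a single shared module for both levels is busy for three quarters of it.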
C. Methodology
The following equations can be used to compute the high-pass and low-pass coefficients of the jth decomposition level of the lifting DWT using the filters. By using the lifting scheme and the embedded decimation technique, the numbers of adders and multipliers in the proposed architecture are reduced significantly, to 4 and 8, respectively.
Figure: The proposed pipelined architecture for FU1.
A typical performance comparison of DWT architectures evaluates the number of multipliers, the number of adders, storage size, computing time, control complexity, latency, and hardware utilization. First, we analyze the performance of the proposed architecture in theory. With the computing time normalized to the same internal clock rate, it is easily derived that T = 2N²(1 − 4⁻ᴶ)/3, where N² is the image size and J denotes the number of 2-D DWT decomposition levels.
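The computing-time expression can be cross-checked by summing the cycles level by level. The sketch below assumes that level j operates on an (N/2ʲ⁻¹)×(N/2ʲ⁻¹) input and that the architecture emits two coefficients per clock cycle, as stated for the proposed design. The function names are hypothetical, and pipeline fill latency is ignored.

```python
def cycles_closed_form(n, levels):
    """T = 2 N^2 (1 - 4^(-J)) / 3 for an N x N image and J levels."""
    return 2 * n * n * (1 - 4.0 ** (-levels)) / 3


def cycles_by_level(n, levels):
    """Sum the cycles of each level, assuming two outputs per cycle."""
    total = 0
    size = n
    for _ in range(levels):
        total += (size * size) // 2  # size^2 coefficients, two per cycle
        size //= 2                   # the next level works on the LL band
    return total


t_formula = cycles_closed_form(512, 3)
t_summed = cycles_by_level(512, 3)
```

For a 512×512 image and three levels, both routes give 172032 cycles, and as J grows the total approaches 2N²/3.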
V. Problem Formulation
A fast architecture for the two-dimensional discrete wavelet transform based on the lifting scheme is proposed. Both parallel and embedded decimation techniques are employed to optimize the architecture, which is composed mainly of two horizontal filter modules and one vertical filter module and works in parallel and pipeline fashion with 100% hardware utilization. The architecture is designed to produce two outputs per working clock cycle, alternating between the coefficients of two sub-bands.
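The row-then-column data flow behind the horizontal and vertical filter modules can be modeled in software as follows. This is only a functional sketch: it assumes a Haar-style lifting step in place of the actual wavelet filters, processes rows sequentially rather than in parallel, and uses hypothetical function names. Sub-band naming conventions also vary; here the first letter refers to the horizontal pass.

```python
def lift_1d(v):
    """Haar-style lifting of an even-length sequence (stand-in filter)."""
    even, odd = v[0::2], v[1::2]
    detail = [o - e for o, e in zip(odd, even)]
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def vertical_pass(block):
    """Apply lift_1d down every column of a list-of-rows block."""
    approx_cols, detail_cols = [], []
    for column in zip(*block):
        a, d = lift_1d(list(column))
        approx_cols.append(a)
        detail_cols.append(d)
    # Transpose back so that the results are lists of rows again.
    to_rows = lambda cols: [list(r) for r in zip(*cols)]
    return to_rows(approx_cols), to_rows(detail_cols)


def dwt2_one_level(image):
    """One 2-D decomposition level: horizontal pass, then vertical pass."""
    low_rows, high_rows = [], []
    for row in image:               # horizontal filtering of each row
        a, d = lift_1d(row)
        low_rows.append(a)
        high_rows.append(d)
    ll, lh = vertical_pass(low_rows)    # vertical filtering of the low band
    hl, hh = vertical_pass(high_rows)   # vertical filtering of the high band
    return ll, lh, hl, hh


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ll, lh, hl, hh = dwt2_one_level(image)
```

Each of the four sub-bands has a quarter of the input samples, and only the LL band is passed on to the next decomposition level.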
VI. Conclusion
An efficient architecture for the discrete wavelet transform has been proposed, based on the lifting scheme. The transform module of the proposed architecture includes two horizontal filters working in parallel and one vertical filter working in pipeline fashion. Embedded decimation, parallel, and pipeline techniques were adopted to optimize the design, significantly reducing the internal memory size and the number of storage accesses while increasing the throughput. Compared with other designs, the proposed architecture has lower hardware cost and control complexity, as well as short system latency. The architecture is fast as well as power and area efficient, and has 100% hardware utilization. The design is regular, simple, and well suited for VLSI implementation.
