This paper presents the design of a new real-time integer-to-integer lifting-based wavelet transform (IWT) architecture. An efficient design method is proposed to construct an integrated programmable VLSI architecture that can operate as a forward or backward IWT in pipelined fashion. The integrated VLSI structure is simple, modular, and cascadable for computing the wavelet transform based on the 5/3 biorthogonal filters. The architecture is optimal with respect to both area and time, and it is independent of the size of the input signal, so no memory is required. The lifting steps are adapted to be causal, which makes the proposed architecture suitable for real-time processing applications. The critical path of the architecture is equal to the critical path of one lifting step. The numerical precision has been established using a Simulink model. Experimental tests have been made with 8-bit signed two's complement integers. Based on the experimental results, the data path width of the proposed architecture is fixed at 10 bits.
Introduction
The lifting scheme is an elementary method for obtaining truly lossless nonlinear integer-to-integer wavelet transforms with the following properties: (1) fast implementation, because the lifting principle exploits the similarities between the high-pass and low-pass filters, which reduces the computational complexity; (2) in-place calculation, because the original signal is progressively replaced by its transform, so auxiliary memory can be avoided and the hardware implementation can be compact; (3) simple inversion, because the backward transform is realized using the inverse elementary operations of the forward transform, taken in reverse order [1], [2], [3], [4].
In the lifting scheme, integer data can be maintained after the filtering operation if the input data are integers. This is achieved simply by rounding in each lifting step, so that the linear lifting steps are replaced by nonlinear approximations. The resulting reversible integer-to-integer wavelet transform is called the IWT. It is important to note that the filter coefficients need not be integers for the IWT [1].
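The reversibility of a rounded lifting step can be illustrated with a minimal Python sketch. The function names, the sample values, and the coefficient 0.7071 are illustrative assumptions and are not taken from the paper. The sketch only shows that the backward step recomputes the same rounded quantity and therefore recovers the integers exactly, even though the coefficient is not an integer.

```python
import math

def forward_step(even, odd, coeff):
    # detail = odd - floor(coeff * even); the floor makes the step nonlinear
    return [o - math.floor(coeff * e) for e, o in zip(even, odd)]

def backward_step(even, detail, coeff):
    # the same rounded prediction is recomputed from the untouched even samples
    return [d + math.floor(coeff * e) for e, d in zip(even, detail)]

even = [3, -7, 12, 0]
odd = [5, -2, 9, 1]
coeff = 0.7071  # deliberately non-integer (illustrative value)

detail = forward_step(even, odd, coeff)
assert all(isinstance(v, int) for v in detail)      # integers in, integers out
assert backward_step(even, detail, coeff) == odd    # exact reconstruction
```

The step is invertible because the even samples pass through unchanged, so the rounding error introduced in the forward direction is reproduced bit for bit in the backward direction.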
Much work has been published on traditional convolution-based designs for DWT implementation. The architectures range from SIMD arrays to folded architectures such as systolic arrays and parallel filters. The folded architectures implement online versions of the recursive pyramid algorithm (RPA) [6]. These architectures support single-chip VLSI implementations and are optimal with respect to both area and time under the word-serial model [7], [8], [9].
Recently, most work has focused on the lifting scheme. Numerous papers have been published on efficient VLSI architectures for 1-D and 2-D lifting-based DWT [10]-[15]. In [10], a lifting-based architecture is proposed that performs the forward and inverse DWT for a set of filters anticipated in JPEG2000. An efficient lifting-based VLSI architecture is proposed in [11], which flips conventional lifting structures to shorten the critical path and reduce the memory requirement. In [12], a systematic design method for efficient pipelined VLSI architectures of the lifting scheme is proposed, which includes specific lifting factorization, dependence graph formation, and systolic array mapping. A VLSI architecture for IWT implementation is proposed in [13] that is capable of achieving very high frame rates with moderate gate complexity. DSP-type architectures for the IWT are presented in [14], dealing with optimal factorization and finite-precision effects. Although the lifting scheme has been widely studied in the literature, most of these works consider non-causal systems where the whole signal is buffered.
In this paper, we address a new real-time integer-to-integer lifting-based wavelet transform (IWT) architecture. An efficient design method is proposed to construct an integrated programmable VLSI architecture that can operate as a forward or backward IWT. The architecture is causal, and no memory is needed for buffering.
The paper is organized as follows. In Section 2, the theory of the lifting-scheme factorization of the polyphase matrix is reviewed. The design issues of the real-time forward and backward IWT are given in Section 3. In Section 4, the fixed-point lifting structure is described together with a numerical precision analysis. The design procedure for the integrated VLSI architecture is provided in Section 5. Finally, conclusions are drawn in Section 6.
Lifting Scheme
A discrete-time FIR filter h[n] can be decomposed in the z-transform domain into two parallel filters using the polyphase representation
$$H(z) = H_e(z^2) + z^{-1} H_o(z^2)$$
where $H_e(z)$ contains the even filter coefficients and $H_o(z)$ contains the odd filter coefficients of the FIR filter $H(z)$. The z-transforms of the decomposed filters can be expressed as
$$H_e(z) = \sum_k h[2k]\, z^{-k}, \qquad H_o(z) = \sum_k h[2k+1]\, z^{-k}$$
The general block scheme of the DWT is analogous to the classical subband system shown in figure 1. Let the sets of filters $\{H_a(z), G_a(z)\}$ and $\{H_s(z), G_s(z)\}$ represent the analysis and synthesis low-pass and high-pass filters, respectively. The corresponding polyphase matrices are defined as [4]
The forward DWT can be expressed in terms of the polyphase matrix as (2) and the backward DWT is represented as
where T is the matrix transpose operator and I is the 2 x 2 identity matrix.
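As an illustration of the polyphase idea, the following Python sketch checks numerically that filtering followed by downsampling by two gives the same result as filtering the even and odd phases separately at the lower rate. It assumes the convention $H(z) = H_e(z^2) + z^{-1} H_o(z^2)$; the helper names and the test data are hypothetical and are not taken from the paper.

```python
def conv(a, b):
    """Plain linear convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def filter_then_downsample(h, x):
    # full-rate filtering, then keep every second output sample
    return conv(h, x)[0::2]

def polyphase_filter(h, x):
    # split filter and signal into even and odd phases
    h_even, h_odd = h[0::2], h[1::2]
    x_even, x_odd = x[0::2], x[1::2]
    branch_e = conv(h_even, x_even)
    branch_o = [0] + conv(h_odd, x_odd)   # the z^{-1} term delays the odd branch
    n = max(len(branch_e), len(branch_o))
    branch_e += [0] * (n - len(branch_e))
    branch_o += [0] * (n - len(branch_o))
    return [e + o for e, o in zip(branch_e, branch_o)]

h = [-1, 2, 6, 2, -1]          # illustrative integer taps
x = list(range(1, 9))          # illustrative input
assert filter_then_downsample(h, x) == polyphase_filter(h, x)
```

Both branches of the polyphase form run at half the input rate, which is the property that the lifting factorization builds on.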
It has been shown in [5] that for a given complementary pair of filters $\{H_a(z), G_a(z)\}$, there always exist Laurent polynomials $S_{ai}(z)$ and $T_{ai}(z)$ for $1 \le i \le q$ and a non-zero constant K such that
This means that the polyphase matrix $P_a(z)$ can be factorized into a finite sequence of alternating upper and lower triangular matrices. This factorization is not unique; several pairs of $\{S_{ai}(z)\}$ and $\{T_{ai}(z)\}$ filters are admissible. However, all possible choices give the same result for the DWT realization. In practice, the filter pairs $\{S_{ai}(z), T_{ai}(z)\}$ are usually FIR filters with 1 to 3 taps [1]. Computing with the $S_{ai}(z)$ filter is called primal lifting, or simply lifting, while computing with the $T_{ai}(z)$ filter is known as dual lifting. The forward and backward lifting schemes are shown in figure 2.
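For orientation, the factorization theorem of Daubechies and Sweldens is commonly written in the form below. This is a sketch in the notation of this section; the ordering of the factors, the possible transposition, and the placement of the scaling matrix depend on how the analysis polyphase matrix is defined, so the expression should be read as an assumed standard form rather than as the exact equation of the paper.

```latex
P_a(z) \;=\; \prod_{i=1}^{q}
\begin{bmatrix} 1 & S_{ai}(z) \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ T_{ai}(z) & 1 \end{bmatrix}
\begin{bmatrix} K & 0 \\ 0 & 1/K \end{bmatrix}
```

Each triangular factor has unit determinant, so its inverse is obtained simply by negating the off-diagonal polynomial, which is why the backward transform reuses the same lifting filters in reverse order.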
Lifting Structure realization
The lifting scheme realizes an analysis or synthesis filter bank as a factorized polyphase matrix, which is convenient for both the design and the implementation of the wavelet transform. Lifting scheme architectures have been proposed in the literature [2], especially in recent years owing to the growing interest generated by JPEG2000. The well-known 5/3 biorthogonal filter is the default filter employed by JPEG2000 for lossless transforms. The analysis biorthogonal 5/3 filters $\{H_a(z), G_a(z)\}$ have the following coefficients [1]:
The polyphase matrix of the above filters is
A possible factorization of $P_a(z)$ using two lifting steps is
where $0 \le k < N/2$ for an input data stream x of length N.
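The relation between the 5/3 filters and their two lifting steps can be checked with exact rational arithmetic. The sketch below assumes the standard LeGall 5/3 analysis coefficients (low-pass taps -1/8, 1/4, 3/4, 1/4, -1/8 and high-pass taps -1/2, 1, -1/2), a predict factor of 1/2, and an update factor of 1/4. The normalization and indexing used in the paper may differ, and the symmetric boundary extension is an assumption made only to keep the example finite.

```python
from fractions import Fraction as F

H = [F(-1, 8), F(1, 4), F(3, 4), F(1, 4), F(-1, 8)]  # low-pass taps centred on x[2k]
G = [F(-1, 2), F(1), F(-1, 2)]                        # high-pass taps centred on x[2k+1]

def ext(x, n):
    """Read x[n] with whole-sample symmetric extension (assumed boundary rule)."""
    last = len(x) - 1
    if n < 0:
        n = -n
    if n > last:
        n = 2 * last - n
    return x[n]

def detail(x, k):
    # predict step: odd sample minus the mean of its two even neighbours
    return ext(x, 2 * k + 1) - (ext(x, 2 * k) + ext(x, 2 * k + 2)) / 2

def approx(x, k):
    # update step: even sample plus a quarter of the two neighbouring details
    return ext(x, 2 * k) + (detail(x, k - 1) + detail(x, k)) / 4

def direct(x, k):
    # direct filtering with the 5-tap and 3-tap filters, then downsampling
    low = sum(c * ext(x, 2 * k - 2 + i) for i, c in enumerate(H))
    high = sum(c * ext(x, 2 * k + i) for i, c in enumerate(G))
    return low, high

x = [F(v) for v in (3, -1, 4, 1, -5, 9, 2, -6)]
for k in range(len(x) // 2):
    assert (approx(x, k), detail(x, k)) == direct(x, k)
```

The two lifting steps need only additions and divisions by powers of two, whereas the direct form needs five and three multiply-accumulate operations per output pair.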
The given system is a multirate system: the input sampling rate is $F_s$, while the output sampling rate is half of that, $F_s/2$. Each of the lifting steps has a similar computing pattern; the differences are in the input samples and the multiplier factors. The lifting scheme must be causal for real-time processing applications. The predict lifting step is causal, while the update lifting step is not. This is usually not a real problem, because the processing operations can be delayed to make the system causal. To adapt the resulting lifting scheme to real-time applications, the computation is delayed by one unit of time as:
The resultant dependence graph (DG) can be drawn for the corresponding lifting as shown in figure 3 . It is important to note that d 
The above equations are projected to obtain the signal flow graph (SFG) of the real-time backward lifting for reconstructing the original signal, as shown in figure 5.
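The causality argument can be made concrete with a small streaming sketch in Python. It consumes one input sample at a time and keeps only three registers, so no signal buffer is needed. The sketch assumes the common indexing in which the detail for pair k needs x[2k+2], a predict factor of 1/2, an update factor of 1/4, and a zero reset value for the detail register. The function name and these choices are illustrative; the paper's exact delayed equations are not reproduced here.

```python
def forward_lifting_stream(samples):
    """Yield (a[k], d[k]) as soon as the even sample x[2k+2] has arrived."""
    x_even = None   # register holding x[2k]
    x_odd = None    # register holding x[2k+1]
    d_prev = 0      # register holding d[k-1]; zero reset state is an assumption
    for n, x in enumerate(samples):
        if n % 2 == 1:
            x_odd = x                        # odd sample: just latch it
            continue
        if x_even is not None:
            d = x_odd - (x_even + x) / 2     # predict step, now fully causal
            a = x_even + (d_prev + d) / 4    # update step uses the stored detail
            yield a, d
            d_prev = d
        x_even = x                           # current even sample becomes x[2k]

# A linear ramp is predicted perfectly, so every detail sample is zero.
pairs = list(forward_lifting_stream([4, 6, 8, 10, 12, 14, 16]))
assert all(d == 0 for _, d in pairs)
```

Each output pair appears one even-sample period after the corresponding odd sample, which is the price paid for causality; the output rate is half the input rate, as expected for a multirate system.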
Fixed-point Reversible Lifting Structure
An invertible transform is one that can be inverted when it is calculated using exact arithmetic. In practice, finite-precision arithmetic is usually employed, and such arithmetic is inherently inexact due to rounding error. A transform that remains invertible in finite-precision arithmetic is called reversible. It is possible to create transforms that are not only invertible but reversible as well [15]. Reversible transforms map integers to integers and approximate linear wavelet transforms. Although reversible wavelet transforms map integers to integers, such transforms are not fundamentally integer in nature; they are based on arithmetic over the real numbers in conjunction with rounding operations [16].
The 5/3 transform is truly multiplierless (i.e., its underlying lifting filters all have coefficients that are strictly powers of two). Each of the resulting architectures in figures 4 and 5 has a computational complexity of 4 additions and 2 shifts. The total delay between the input signal and the reconstructed signal is three clock cycles, or $3/F_s$. The critical path of each architecture is equal to the delay of the predict step plus the delay of the update step. The critical path of each lifting step is given by
$$T_L = 2T_A + T_S \qquad (16)$$
where $T_A$ is the latency of the adder and $T_S$ is the latency of the arithmetic shifter.
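As a short worked consequence of equation (16), and assuming the predict and update steps are connected back to back without intermediate registers, the critical path of the whole architecture follows by adding the two step delays. If a register stage were placed between the two steps, the longest combinational path would shrink to a single step. The symbols $T_{\text{direct}}$ and $T_{\text{pipelined}}$ are introduced here only for illustration, and register setup and propagation overheads are ignored.

```latex
T_{\text{direct}} = T_{\text{predict}} + T_{\text{update}} = 2T_L = 4T_A + 2T_S,
\qquad
T_{\text{pipelined}} = T_L = 2T_A + T_S
```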
The reversible implementations of the forward and backward operations of equations (12), (13), (14), and (15) are approximated by nonlinear operations that map integers to integers. The forward IWT equations are implemented as
while the backward IWT equations are implemented as
where the symbol $\lfloor \cdot \rfloor$ denotes the floor function. In this work, all arithmetic operations are fixed-point operations, and operands are represented as two's complement signed integers. Under these conditions, the arithmetic right shift (denoted by ») of a number V by p bits is equivalent to $\lfloor V/2^p \rfloor$, that is, the largest integer not larger than $V/2^p$.
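The floor-based integer lifting described above can be sketched in Python, whose `>>` operator on integers is an arithmetic (flooring) shift. The sketch assumes a predict step with a shift by one bit and an update step with a shift by two bits, with no rounding offset, so that each step costs two additions and one shift. The function names, the symmetric boundary handling, and the test vector are illustrative assumptions, not the paper's equations.

```python
def iwt53_forward(x):
    """One level of integer 5/3 lifting on an even-length list of ints."""
    even, odd = x[0::2], x[1::2]
    half = len(even)
    d = []
    for k in range(half):
        nxt = even[k + 1] if k + 1 < half else even[k]   # assumed symmetric extension
        d.append(odd[k] - ((even[k] + nxt) >> 1))        # predict: floor((e0 + e1) / 2)
    a = []
    for k in range(half):
        prev = d[k - 1] if k > 0 else d[0]               # assumed symmetric extension
        a.append(even[k] + ((prev + d[k]) >> 2))         # update: floor((d0 + d1) / 4)
    return a, d

def iwt53_backward(a, d):
    """Undo iwt53_forward by running the same steps in reverse order."""
    half = len(a)
    even = []
    for k in range(half):
        prev = d[k - 1] if k > 0 else d[0]
        even.append(a[k] - ((prev + d[k]) >> 2))         # inverse update
    x = []
    for k in range(half):
        nxt = even[k + 1] if k + 1 < half else even[k]
        x += [even[k], d[k] + ((even[k] + nxt) >> 1)]    # inverse predict, re-interleave
    return x

assert (-3) >> 1 == -2 == -3 // 2        # arithmetic shift floors negative values
x = [-128, 127, 5, -7, 0, 33, -90, 64]   # 8-bit signed test values
a, d = iwt53_forward(x)
assert iwt53_backward(a, d) == x         # lossless round trip
```

The round trip is exact for any integer input because the backward pass recomputes exactly the same shifted sums as the forward pass.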
Numerical precision analysis
A comparison study has been carried out using Simulink in Matlab 7 to determine the number of bits required for a satisfactory fixed-point implementation. The study started by examining the BIBO (bounded-input bounded-output) gain of the lifting implementation of the 5/3 biorthogonal filters. The cascade equivalence relations, obtained by interchanging a filter and a downsampler, facilitate the computation of the BIBO gain [2]. The equivalent low-pass filter obtained after j stages of the basic filter bank structure is
It is apparent from table 1 that the worst-case bit-depth expansion for the lifting implementation of the 5/3 biorthogonal filters is 2 bits for up to five levels of decomposition. The values computed above at the different levels refer to the filter bank or lifting implementation of the wavelet transform using the 5/3 biorthogonal filters. The case of the integer-to-integer wavelet transform (IWT) is now considered for the purpose of the hardware implementation of the proposed architecture.
where the additional term bounds the effect of the floor operations used in each lifting step. It is noted in [2] that this term has negligible impact on the number of bits required for representing the subband sample values. Consequently, if the input samples are b-bit two's complement integers, then (b+2)-bit integers are sufficient to represent the reversibly transformed output subbands for up to five levels of decomposition.
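The BIBO gain computation can be sketched in Python as follows. The sketch assumes that the equivalent filter after j levels is the cascade of the basic filters with their taps spread by successive powers of two, and that the BIBO gain is the sum of the absolute tap values. It uses the LeGall 5/3 taps with unit DC gain for the low-pass filter; this normalization, the function names, and the rough bit estimate are assumptions, and the values of table 1 are not reproduced here.

```python
import math

H = [-1/8, 2/8, 6/8, 2/8, -1/8]   # assumed 5/3 analysis low-pass taps
G = [-1/2, 1.0, -1/2]             # assumed 5/3 analysis high-pass taps

def conv(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def upsample(f, factor):
    """Insert factor-1 zeros between taps, i.e. F(z) -> F(z^factor)."""
    out = [0.0] * ((len(f) - 1) * factor + 1)
    out[::factor] = f
    return out

def bibo_gains(levels):
    """Return [(low-pass gain, high-pass gain)] for levels 1..levels."""
    gains = []
    low = [1.0]                                   # equivalent low-pass of earlier levels
    for j in range(1, levels + 1):
        step = 2 ** (j - 1)
        high_eq = conv(low, upsample(G, step))    # high-pass branch at level j
        low = conv(low, upsample(H, step))        # low-pass branch at level j
        gains.append((sum(abs(c) for c in low), sum(abs(c) for c in high_eq)))
    return gains

def extra_bits(levels):
    # rough estimate: bits needed to cover the worst gain over all subbands
    worst = max(max(pair) for pair in bibo_gains(levels))
    return math.ceil(math.log2(worst))

for level, (g_low, g_high) in enumerate(bibo_gains(5), start=1):
    print(level, g_low, g_high)
```

Because the gain is a worst-case bound over all bounded inputs, the resulting bit estimate is conservative; signals met in practice rarely excite the worst case.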
Experimental Results
Four input test signals are used to evaluate the performance of the proposed IWT architecture. The signals, shown in figure 6, are named blocks, bumps, quad-chirp, and white Gaussian noise. The input samples of each test signal are represented as 8-bit signed two's complement integers. All signals are 1024 samples long. The SNR in decibels is used to measure the performance as
where x is the original input data represented as 8-bit signed two's complement integers and $x_r$ is the reconstructed output data.
Figure 6. The tested signals.
The SNR values for the input signals after five levels of forward and backward IWT are given in tables 2, 3, 4, and 5. The transform is lossless for all input test signals at a 10-bit data width, where infinite (Inf.) SNR values are obtained. Therefore, the data path width of the proposed architecture is fixed at 10 bits.
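The SNR measure used for these results can be sketched in Python. The sketch assumes the usual energy-ratio definition, namely ten times the base-10 logarithm of the signal energy divided by the energy of the reconstruction error; the function name is hypothetical. A lossless reconstruction has zero error energy, which is reported as an infinite SNR.

```python
import math

def snr_db(x, x_r):
    """SNR in dB between the original samples x and the reconstruction x_r."""
    signal = sum(v * v for v in x)
    noise = sum((v - r) ** 2 for v, r in zip(x, x_r))
    if noise == 0:
        return math.inf          # lossless reconstruction
    return 10 * math.log10(signal / noise)

assert snr_db([1, -2, 3], [1, -2, 3]) == math.inf
assert abs(snr_db([10, 0], [9, 0]) - 20.0) < 1e-9
```

With integer samples, any finite SNR value indicates that at least one sample was reconstructed incorrectly, for example because the data path was too narrow.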
Proposed VLSI Architecture
The predict and update lifting steps have a similar computing pattern. It is therefore possible to design a single programmable processing element (PE) with control inputs such that the PE can operate as a predict or an update lifting step. To configure the PE, two control inputs, denoted m (shift) and s (add/subtract), are applied as
Table 6 shows the settings used for the forward and backward predict and update lifting steps. The detailed structural design of the lifting step PE is shown in figure 7. Any one of the four operations (forward predict, forward update, backward update, and backward predict) can be implemented using the given PE by selecting the corresponding control inputs m and s. The programmable lifting step PE brings regularity to the overall system design, so the implementation of the forward lifting IWT is straightforward.
Better performance can be achieved by using a pipelined structure, as shown in figure 8. Pipelining is obtained by adding two latches between $PE_0$ and $PE_1$. The critical path is thereby reduced and becomes equal to the critical path of one PE. The functional block diagrams of the proposed forward and backward IWT differ in the way the input data are supplied to the pipelined processing elements $PE_0$ and $PE_1$; see the bold boxes in figures 8 and 9. It is possible to build an integrated structure that can function as a forward or backward IWT architecture by adding multiplexers with a control signal u. If u=0, the integrated structure operates as the forward IWT architecture; otherwise, it operates as the backward IWT architecture.
The block diagram of the programmable forward/backward IWT architecture is shown in figure 10. In the forward mode (u=0), the input multiplexers select the input x to the architecture, and the buffers corresponding to the 'a' and 'd' outputs are active. At the same time, the control selections of the PEs are set to $m_0=1$, $m_1=1$, $s_0=0$, and $s_1=1$. In the backward mode (u=1), the control selections are $m_0=1$, $m_1=0$, $s_0=1$, and $s_1=0$, and the input multiplexers route 'a' and 'd' to the inputs of the architecture. At the same time, the output multiplexer buffer is active to deliver the output $x_r$.
Figure 10: The integrated architecture for forward and backward IWT.
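A behavioural sketch in Python may help to show how one programmable PE can serve all four lifting operations. The model below is hypothetical: it exposes the shift amount and the add/subtract choice directly as the parameters `shift` and `subtract`, and it does not assume any particular encoding of the control bits m and s, which is defined by Table 6. The pipeline latches and the multiplexers controlled by u are not modelled.

```python
def lifting_pe(left, right, centre, shift, subtract):
    """Hypothetical model of one lifting PE: centre -/+ floor((left + right) / 2**shift)."""
    term = (left + right) >> shift                        # adder followed by arithmetic shifter
    return centre - term if subtract else centre + term   # adder/subtractor

def forward(x_even, x_odd, x_even_next, d_prev):
    # forward mode: the first PE predicts, the second PE updates
    d = lifting_pe(x_even, x_even_next, x_odd, shift=1, subtract=True)
    a = lifting_pe(d_prev, d, x_even, shift=2, subtract=False)
    return a, d

def backward(a, d, d_prev, x_even_next):
    # backward mode: the first PE undoes the update, the second PE undoes the predict
    x_even = lifting_pe(d_prev, d, a, shift=2, subtract=True)
    x_odd = lifting_pe(x_even, x_even_next, d, shift=1, subtract=False)
    return x_even, x_odd

a, d = forward(x_even=10, x_odd=-3, x_even_next=7, d_prev=5)
assert backward(a, d, d_prev=5, x_even_next=7) == (10, -3)
```

The same function body is used four times with different arguments, which mirrors the regularity that a single programmable PE brings to the hardware design.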
Conclusion
In this paper, the design of a programmable, modular, integrated VLSI architecture for computing the 1-D IWT is proposed. The proposed architecture is simple and cascadable for computing multilevel decompositions, and it can be programmed to operate as a 1-D forward or backward IWT. The integrated architecture is independent of the size of the input signal and therefore does not include any memory, which is an advantage in VLSI design with respect to both area and time.
The data precision analysis is performed using a Simulink model in the Matlab 7 environment. The data path width of the architecture is selected as 10 bits for 8-bit integer input samples in two's complement representation. A 10-bit width is sufficient for a lossless reversible transform of up to five levels of IWT.
The architecture is suitable for use in real-time processing systems. A better arrangement is obtained by using a pipelined configuration, which reduces the critical path of the architecture to the critical path of one lifting step and consequently increases the processing speed. With simple modifications, the proposed architecture can be used as a 2-D IWT.
