Multiplierless Modules for Forward and Backward Integer Wavelet
  Transform by Kolev, Vasil
International Conference on Computer Systems and Technologies - CompSysTech’2003 
 
 
 
 
 
Multiplierless Modules for Forward and Backward Integer 
Wavelet Transform  
 
Vasil Kolev 
 
 Abstract: This article presents a new architecture for an integer DWT in reprogrammable logic. It is
based on second-generation wavelets, which reduce the number of operations. A new basic structure for a
parallel architecture, and modules for the forward and backward integer discrete wavelet transform, are proposed.
         Key words: lifting scheme, reprogrammable logic, wavelet, filter bank, integer DWT.
 
Introduction  
        Wavelet analysis is useful for problems in many application areas. It is a systematic
way of analyzing functions with a rich basis [1, 6]. In general, the basis functions have good
time-frequency localization and zero mean. A prototype function can also be dispensed with,
as is the case for second-generation wavelets.
         The discrete wavelet transform (DWT) is used in image and sound processing, noise
reduction, tomography, modems, digital holography, etc. Processors for this transform are
often based on serial input, so the data enter serially from the first to the last sample.
These processors are effective for one-dimensional signals, but they can also be used for
two- and three-dimensional signals. A two-dimensional picture can be represented as a
sequence of one-dimensional signals with the pixels arranged in a predefined manner. The
wavelet transform by the lifting scheme (LS), working "in place", is described in [5, 8]. An
integer-arithmetic realization with optimized VLSI structures for the JPEG2000 standard is
given in [10].
The goal of this paper is to build modules with a new basic structure for the integer discrete
wavelet transform (integer DWT) by LS, for real-time processing and for studying the
hardware parameters.
 
The algorithm overview 
        The basic principle of LS (for the analysis filter $H(z)$ and the synthesis filter $G(z)$) is
the polyphase representation:
$$H(z) = H_e(z^2) + z^{-1} H_o(z^2), \qquad G(z) = G_e(z^2) + z^{-1} G_o(z^2), \tag{1}$$
where the polyphase components $H_e(z^2)$, $G_e(z^2)$ and $H_o(z^2)$, $G_o(z^2)$ are Laurent
polynomials, which form a commutative ring. Division with remainder is possible in this ring, but
the result is not unique. These polynomials collect the even and odd filter coefficients and
form the polyphase matrix [2]:
$$P(z) = \begin{bmatrix} H_e(z) & G_e(z) \\ H_o(z) & G_o(z) \end{bmatrix} \tag{2}$$
Factoring $P(z)$ gives a representation by triangular matrices, which define
the prediction (P) and update (U) phases (Fig. 1).
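
The polyphase identity (1) can be checked numerically. The sketch below is only an illustration: the 5/3 low-pass taps, the causal indexing of the filter and the helper names `polyphase_split` and `evaluate` are assumptions made for the example and are not taken from the modules described in this paper.

```python
def polyphase_split(h):
    """Split a causal FIR filter h[0], h[1], ... into even and odd taps."""
    return h[0::2], h[1::2]


def evaluate(coeffs, z):
    """Evaluate the polynomial sum_k coeffs[k] * z**(-k)."""
    return sum(c * z ** (-k) for k, c in enumerate(coeffs))


# Assumed example: 5/3 low-pass analysis taps written as a causal filter.
h = [-1 / 8, 2 / 8, 6 / 8, 2 / 8, -1 / 8]
h_even, h_odd = polyphase_split(h)

# Compare H(z) with H_e(z^2) + z^-1 * H_o(z^2) at an arbitrary test point.
z = complex(0.3, 0.7)
lhs = evaluate(h, z)
rhs = evaluate(h_even, z ** 2) + z ** (-1) * evaluate(h_odd, z ** 2)
assert abs(lhs - rhs) < 1e-12
```

The even taps go to one branch and the odd taps to the other, which is what allows each branch to run at half the input rate.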
Analysis  
   Let $s_j$ be the given signal at level j. The requirement for decorrelation
poses the problem of representing the signal with a minimum number of coefficients [4].
• Split
Applying the lazy wavelet separates the input samples into even and odd sets of
coefficients [8] (Fig. 1).
 
 
Fig. 1. Lifting scheme, forward transform ($s_j$ is split into $even_{j-1}$ and $odd_{j-1}$; the outputs are $s_{j-1}$ and $d_{j-1}$)
 
 
Each of these two sets contains half of the samples of the original signal:
$$(even_{j-1}, odd_{j-1}) := \mathrm{Split}(s_j) \tag{3}$$
• Prediction (P)
Let the signal be strongly correlated. The even sequence can then be used to predict
the odd sequence. The detail part $d_{j-1}$ is the difference between the odd values and their predictions:
$$d_{j-1} = odd_{j-1} - \mathrm{P}(even_{j-1}) \tag{4}$$
Using the sample triple $s_j[2n]$, $s_j[2n+1]$, $s_j[2n+2]$ for linear interpolation, the
prediction (P) equals the average of the two even samples. The difference between the odd
value and the predicted value forms the wavelet coefficient. A rounding operator is used (the floor
function $\lfloor x \rfloor$ is the greatest integer less than or equal to x). This gives the following wavelet
equation:
$$d_{j-1}[n] = s_j[2n+1] - \left\lfloor \frac{s_j[2n] + s_j[2n+2]}{2} \right\rfloor \tag{5}$$
If the odd value coincides with the predicted value, the wavelet coefficient is zero.
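
The split and prediction steps (3) to (5) can be sketched as follows. This is a minimal illustration, not the hardware module: the function names are hypothetical, the input length is assumed to be even, and the mirroring of the last even sample at the right edge is an assumption, since the boundary treatment is not specified here.

```python
def split(s):
    """Lazy wavelet, eq. (3): even-indexed and odd-indexed samples."""
    return s[0::2], s[1::2]


def predict(even, odd):
    """Detail coefficients, eq. (5), with integer floor division.

    Assumption: at the right edge the missing sample s[2n+2] is replaced
    by s[2n] (mirror extension).
    """
    d = []
    for n in range(len(odd)):
        right = even[n + 1] if n + 1 < len(even) else even[n]
        d.append(odd[n] - (even[n] + right) // 2)
    return d


# A linear ramp is predicted exactly, so the details vanish away from the edge.
s = [10, 12, 14, 16, 18, 20, 22, 24]
even, odd = split(s)
d = predict(even, odd)
print(d)  # [0, 0, 0, 2]
```

Only the last coefficient is nonzero, and that is a consequence of the assumed edge rule rather than of the signal itself.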
• Update (U)
The main goal of the update phase (U) is to obtain, from the calculated wavelet coefficients, the
scaling function that preserves the average of the signal at every level of
decomposition. The coefficient $s_{0,0}$ is the average value of the signal at the last level. The update
gives the scaling function for the next lifting step:
$$s_{j-1} = even_{j-1} + \mathrm{U}(d_{j-1}) \tag{6}$$
From the above condition, the scaling coefficients are given by the expression:
$$s_{j-1}[n] = s_j[2n] + \left\lfloor \frac{d_{j-1}[n] + d_{j-1}[n-1]}{4} \right\rfloor \tag{7}$$
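
Equations (5) and (7) together give one level of the forward transform. The sketch below assumes an even-length input and mirror extension at both edges (s[2n+2] replaced by s[2n] at the right edge, d[n-1] replaced by d[0] at n = 0); these edge rules and the name `forward_step` are assumptions for illustration.

```python
def forward_step(s):
    """One level of the integer lifting transform, eqs. (5) and (7)."""
    even, odd = s[0::2], s[1::2]
    # Prediction, eq. (5); right edge mirrored (assumption).
    d = [odd[n] - (even[n] + (even[n + 1] if n + 1 < len(even) else even[n])) // 2
         for n in range(len(odd))]
    # Update, eq. (7); left edge mirrored (assumption).
    s_low = [even[n] + (d[n] + (d[n - 1] if n > 0 else d[0])) // 4
             for n in range(len(even))]
    return s_low, d


s_low, d = forward_step([5, 9, 3, 8, 12, 4, 7, 7])
print(s_low)  # [7, 4, 11, 5]
print(d)      # [5, 1, -5, 0]
```

Python's `//` rounds toward minus infinity, so it matches the floor function of the equations for negative sums as well.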
   Reconstruction
 Inverting the update step (6) gives the even values of the restored sequence at the
given level:
$$even_{j-1} = s_{j-1} - \mathrm{U}(d_{j-1}) \tag{8}$$
With the even values recovered from $s_{j-1}$ and $d_{j-1}$, the prediction step (4) can be inverted in the same way, which gives the
odd values of the restored sequence at the given level:
$$odd_{j-1} = d_{j-1} + \mathrm{P}(even_{j-1}) \tag{9}$$
 The reconstructed signal $\tilde{s}_j$ is the merge of the even and odd values:
$$\tilde{s}_j = \mathrm{Merge}(even_{j-1}, odd_{j-1}) \tag{10}$$
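
The inverse steps (8) to (10) can be sketched in the same way. The function name and the mirror edge rules are assumptions; they must simply match the rules used in the forward direction. The input pair in the example is the one-level transform of the sequence 5, 9, 3, 8, 12, 4, 7, 7 under those same assumptions.

```python
def inverse_step(s_low, d):
    """One level of the inverse integer lifting transform, eqs. (8)-(10)."""
    # Undo the update, eq. (8); d[n-1] at n = 0 is mirrored to d[0] (assumption).
    even = [s_low[n] - (d[n] + (d[n - 1] if n > 0 else d[0])) // 4
            for n in range(len(s_low))]
    # Undo the prediction, eq. (9); the right edge is mirrored (assumption).
    odd = [d[n] + (even[n] + (even[n + 1] if n + 1 < len(even) else even[n])) // 2
           for n in range(len(d))]
    # Merge, eq. (10): interleave even and odd samples.
    merged = []
    for e, o in zip(even, odd):
        merged.extend([e, o])
    return merged


restored = inverse_step([7, 4, 11, 5], [5, 1, -5, 0])
print(restored)  # [5, 9, 3, 8, 12, 4, 7, 7]
```

Because each lifting step is undone by subtracting exactly the integer that was added, the reconstruction is exact regardless of the rounding used.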
 
  
 
The new basic structure 
        Realizing the LS algorithm requires several values to be stored at the same time.
The basic architecture proposed in [5] consists of two adders, three registers and one
multiplier. Such a configuration requires more resources and is inconvenient for updating. In
[10] a structure with one multiplier, one adder, a register and a multiplexer is proposed. Its
disadvantages are the large number of registers needed for multiplication, which
increases the time of the arithmetic operations, and the fact that the samples can only be
processed serially.
 
 
 
 
 
 
 
 
 
 
 
 
Fig.2 The new processing element  
 
The new structure (Fig. 2) consists of two programmable delays (Dm and Dn), three
registers (R) and one adder. This structure is the basis for building the integer DWT.
 
 Description of the modules
 1. Analysis module (Fig. 3).
The input signal is 8-bit. The sum of the even values ($s_{J-1}[2n]$ and $s_{J-1}[2n+2]$) is stored in a
register, and the division by two is performed as a shift one bit to the right. The result is
subtracted from $s_{J-1}[2n+1]$ by passing to two's-complement code. The outputs are the
wavelet values $d_{J-1}[n-1]$ and $d_{J-1}[n]$, which are summed. If the sum is negative, the
division by four is combined with a one-bit correction, and the result is added to the
even value $s_{J-1}[2n]$. This gives the scaling values $s_{J-1}[n]$. The blocks enclosed by the dashed line
are built with the new basic structure.
 
2. Reconstruction module (Fig. 4).
The inputs are the wavelet samples $d_{J-1}[n]$ and the scaling-function samples $s_{J-1}[n]$. The block
enclosed by the dashed line is built with the new basic structure. If the sum of the values $d_{J-1}[n]$
and $d_{J-1}[n-1]$ is negative, the result must be corrected by one bit to
eliminate the error in the next operation. The register holding the sum is shifted two bits to the
right to divide by four.
Next, the difference between $s_{J-1}[2n]$ and the result of the shift is formed; this gives the
even values s[2n], which are stored in an additional register. Two such values are summed, and
the sum is divided by two by shifting the register one bit to the right. The
quotient is added to $d_{J-1}[n]$, which gives the odd value s[2n+1]. A
multiplexer then delivers the reconstructed values ($\tilde{s}_{J-1}$).
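
The data flow of the two modules can be modelled in software with additions, subtractions and right shifts only. The sketch below is a behavioural illustration under stated assumptions, not the VHDL design: the function names and the mirror edge rules are hypothetical, and the one-bit correction for negative sums is not modelled, because Python's `>>` on negative integers is an arithmetic shift that already rounds toward minus infinity.

```python
def analysis(s):
    """Forward step with adds, subtracts and shifts only (eqs. 5 and 7)."""
    even, odd = s[0::2], s[1::2]
    low, d = [], []
    for n in range(len(odd)):
        nxt = even[n + 1] if n + 1 < len(even) else even[n]  # assumed edge rule
        d.append(odd[n] - ((even[n] + nxt) >> 1))             # >> 1 divides by two
        prev = d[n - 1] if n > 0 else d[0]                    # assumed edge rule
        low.append(even[n] + ((d[n] + prev) >> 2))            # >> 2 divides by four
    return low, d


def reconstruction(low, d):
    """Backward step with adds, subtracts and shifts only (eqs. 8-10)."""
    even = []
    for n in range(len(low)):
        prev = d[n - 1] if n > 0 else d[0]
        even.append(low[n] - ((d[n] + prev) >> 2))
    out = []
    for n in range(len(d)):
        nxt = even[n + 1] if n + 1 < len(even) else even[n]
        odd = d[n] + ((even[n] + nxt) >> 1)
        out += [even[n], odd]  # interleaving plays the role of the multiplexer
    return out


# The shifts agree with floor division, including for negative sums.
assert all((x >> 1) == x // 2 and (x >> 2) == x // 4 for x in range(-512, 512))

samples = [5, 9, 3, 8, 12, 4, 7, 7]
low, d = analysis(samples)
restored = reconstruction(low, d)
assert restored == samples
```

In hardware the shift costs only wiring, which is why no multiplier or divider appears in either module.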
 
The Hardware implementation of the modules  
Reprogrammable logic makes it possible to change a computer system in real time, by
reconfiguration at the system level and by changing the data and instructions of
the individual processing elements (PE). This logic operates at considerably higher frequencies than
a digital signal processor (DSP). Its disadvantage is the limited number
of logical operations. The advantages of hardware description
languages can be exploited, and a bottom-up design method can be applied. Errors can be
corrected, and changes can be made to the whole chip structure. Here the
high-level hardware description language VHDL is used for the design description. Xilinx
Foundation is used for place and route. The simulations are made with ModelSim 5.6a from
Mentor Graphics. The test takes into account the intended application of the modules, which will
work with positive integer samples. A signal with 64 samples (Fig. 5) is generated, whose
values have a normal distribution.
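
A comparable test stimulus can be generated in software and used to check that the transform is lossless. The sketch below is an assumption-laden illustration: the mean of 128, the standard deviation of 40, the clipping to the 8-bit range, the seed and all function names are chosen for the example and are not parameters reported in this paper; the edge rules are mirror extensions as an assumption.

```python
import random


def P(even):
    """Prediction operator: floor of the mean of neighbouring even samples."""
    last = len(even) - 1
    return [(even[n] + even[min(n + 1, last)]) // 2 for n in range(len(even))]


def U(d):
    """Update operator: floor of a quarter of the sum of neighbouring details."""
    return [(d[n] + d[max(n - 1, 0)]) // 4 for n in range(len(d))]


def forward(s):
    even, odd = s[0::2], s[1::2]
    d = [o - p for o, p in zip(odd, P(even))]
    low = [e + u for e, u in zip(even, U(d))]
    return low, d


def inverse(low, d):
    even = [l - u for l, u in zip(low, U(d))]
    odd = [x + p for x, p in zip(d, P(even))]
    return [v for pair in zip(even, odd) for v in pair]


# 64 positive integer samples with an (assumed) normal distribution, 8-bit range.
rng = random.Random(2003)
signal = [min(255, max(0, round(rng.gauss(128, 40)))) for _ in range(64)]

low, d = forward(signal)
assert inverse(low, d) == signal  # lossless round trip
```

The same sample list could serve as a reference vector when comparing simulator output against a software model.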
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
Fig. 3 Architectures of forward integer DWT 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Fig. 4 Architectures of backward integer DWT 
  
 
 
After optimization, the designed modules have the characteristics listed in Table 1.
The modules designed with the proposed new basic structure reduce
the hardware requirements for generating the outputs of the forward ($s_{J-1}[n]$,
$d_{J-1}[n]$) and backward ($\tilde{s}_{J-1}$) integer DWT (Table 2).
 
 
                                                                                             Tаble 1 
                          Analysis module               Reconstruction module
Target device             Virtex-E xcv200e-pq240-8      Spartan2 xc2s150-fg256-6
Work frequency (MHz)      100                           100
Registers                 30 (8-bit)                    21 (9-bit)
Adders/subtractors        5 (four 8-bit adders,         6 (9-bit adders)
                          one 8-bit subtractor)
 
         
Fig. 5             
                                                              
 
Fig. 6 
                                                      
CONCLUSIONS AND FUTURE WORK 
         The following conclusions can be drawn from the designed modules and from
the results achieved:
- Parallel processing with the proposed block structure simplifies the hardware
implementation of the arithmetic operations (Table 2);
- The forward and backward integer DWT by the lifting scheme have the same
computational complexity;
- Parallel and serial operations on the data within one clock period are possible by using
a state chart;
- Data whose length is not a power of two can also be processed;
- The standard methods for the (5, 3) filter bank require 8 operations, while the LS requires only 5.
This makes integer-valued filters useful and effective for increasing the computation
speed;
- An integer-valued signal is transformed losslessly
(Fig. 5);
- The architectures with fixed-point arithmetic (the proposed modules and those in [10])
are faster than floating-point arithmetic when
processing a line of 256 samples with 8-bit accuracy (Table 3).
In future work the modules are expected to be applied to an integer DWT with several levels and
to analysis with the LS algorithm for sound and image compression. An ASIC implementation
of the modules is also possible.
 
References:
[1] Daubechies I., Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
[2] Daubechies I., Sweldens W., Factoring wavelet transforms into lifting steps,
Journal of Fourier Analysis and Applications, vol. 4, pp. 247-269, 1998.
[3] Gnavi S., Penna B., Grangetto M., Magli E., Olmo G., DSP performance
comparison between lifting and filter banks for image coding, Proc. of ICASSP,
pp. III-3144-III-3147, 2002.
[4] Jensen A., la Cour-Harbo A., Ripples in Mathematics: The Discrete Wavelet
Transform, Springer Verlag, 2001.
[5] Kishore A., Chaitali C., A VLSI Architecture for Lifting-Based Forward and Inverse
Wavelet Transform, IEEE Transactions on Signal Processing, vol. 50, 2002.
[6] Resnikoff H. L., Wavelet Analysis: The Scalable Structure of Information, Springer,
1998.
[7] Ritter J., Molitor P., A pipelined architecture for partitioned DWT based lossy
image compression using FPGA's, Proc. 9th International Symposium on FPGA, 2001.
[8] Sweldens W., The lifting scheme: A custom-design construction of biorthogonal
wavelets, Applied and Computational Harmonic Analysis, vol. 3, no. 2, pp. 186-200, 1996.
[9] Sweldens W., Schröder P., Building your own wavelets at home, tech. rep., Industrial
Mathematics Initiative, University of South Carolina, no. 5, 1995.
[10] Trenas M. A., López J., and Zapata E. L., Configurable Architecture for the
Wavelet Packet Transform, Journal of VLSI Signal Processing, vol. 32, pp. 255-273, 2002.
 
ABOUT THE AUTHOR 
Vasil Kolev, Institute of Information and Communications Technologies, Bulgarian
Academy of Sciences, E-mail: kolev_acad@abv.bg.
 
Table 2
Filter (5/3)       This work     Kishore A. [5]
Adders             4             8
Shifters           2             4

Table 3
                                     This work     DSP [9], floating point     DWT [10], FPGA
Processing time, 256-sample line     12 µs         400 µs                      20 µs
