In the past few years, wavelet transforms have become a hot topic of research. Discrete and continuous wavelet transforms have been widely used in signal and multimedia processing. Given the high performance and flexibility of reconfigurable computing systems, it is attractive to design a reconfigurable architecture for discrete and continuous wavelet transforms that covers a wide range of wavelet filters. In this paper, we propose a unified computation framework for discrete and continuous wavelet transforms based on the lifting scheme, together with a reconfigurable architecture that includes reconfigurable lifting step arrays and a reconfigurable address generator. The unified framework is the theoretical basis of the system. The step array is the computing core of the engine. The address generator supports several memory scan patterns, which are used to generate memory access addresses. To validate the architecture, an FPGA prototype is built on a Xilinx Virtex II FPGA to test the reconfiguration of the 2-D discrete 5/3 and 9/7 transforms (defined in the JPEG2000 specification) and the 2-D continuous Haar wavelet transform. Furthermore, a 3-level decomposition of a 512 x 512 grayscale image is performed, and the results show that the decomposition finishes within 12.16 ms when running at 20 MHz. It can be concluded that this design is applicable and scalable.
Introduction
In the past twenty years, there has been an ever-increasing amount of interest in the wavelet transform. There are several types of wavelet transforms, depending on the nature of the signal (continuous or discrete) and of the time and scale parameters (continuous or discrete). In this paper we focus on the implementation of the discrete wavelet transform (DWT) and the continuous wavelet transform (CWT) defined in [11]. The DWT and CWT are widely used in signal analysis [3], image compression [8] [7], and so on.
*This work is supported by Natural Science Foundation of Zhejiang Province, China (Grant No. Y105355).
†Contact author: Xuezeng Pan, email: xzpan@zju.edu.cn
1-4244-0910-1/07/$20.00 ©2007 IEEE.
Generally speaking, a given wavelet suits a particular class of signals or applications, so different or adaptive wavelets are selected for different signals or applications. It is therefore worthwhile to design a reconfigurable system, as an ASIC or on reconfigurable hardware such as an FPGA, for discrete and continuous wavelet transforms with different wavelet filters.
At the algorithmic level, the lifting scheme provides a theoretical basis for the reconfigurable system. In [4], Daubechies and Sweldens proposed the lifting scheme for the discrete transform, and Stoffel first extended the lifting scheme to the non-subsampled wavelet transform, namely the continuous wavelet transform [13]. It is therefore possible to build a unified lifting scheme framework for reconfigurable computing of discrete and continuous wavelet transforms.
The lifting scheme leads to a faster, in-place calculation of the wavelet transform, so it is widely used to speed up DWT computation and to reduce the memory requirement of the 2-D DWT. At the implementation level, there has been considerable work on VLSI architectures for discrete wavelet transforms. Most of it targets one or two specific wavelet filters rather than a wide range of filters [9] [1] [10] [5]. As far as the CWT is concerned, little literature is available on mapping the CWT onto VLSI. Reference [2] presents a wide range of algorithms and architectures for computing the 1-D and 2-D DWT and the 1-D and 2-D CWT. However, those architectures are based on convolution, and the DWT architecture is not compatible with the CWT. In [6], a reconfigurable FPGA prototype of the DWT for some wavelet filters is proposed, and [12] introduces a systematic and reconfigurable VLSI architecture for the DWT. Neither [6] nor [12] discusses reconfigurable computing of the CWT.
The rest of this paper is organized as follows. A unified computation framework for DWT and CWT based on the lifting scheme is proposed in section 2. In section 3, the reconfigurable architecture is presented, and the reconfigurable lifting step kernel and reconfigurable address generator are described in detail. An FPGA prototype for the 5/3 and 9/7 DWT and the Haar wavelet CWT is designed in section 4; furthermore, a 3-level decomposition of a 512 x 512 grayscale image is performed and the performance is tested. Finally, a brief summary in section 5 concludes this paper.
Figure 1. Block diagram of discrete wavelet transform and continuous wavelet transform.
Unified Computation Framework of Discrete and Continuous Wavelet Transforms
The lifting scheme [4] is not only a new approach for constructing second-generation wavelets but also an effective algorithm for accelerating the computation of wavelet transforms. Computing wavelet transforms by the lifting scheme provides many benefits, such as a faster and fully in-place implementation, immediate computation of the inverse transform, and easy handling of the boundary extension.
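The in-place and immediate-inverse properties can be illustrated with a minimal Python sketch. It assumes the well-known reversible 5/3 lifting steps (one predict step and one update step with integer rounding) and whole-sample symmetric boundary extension. The function names `forward_53`, `inverse_53` and `_mirror` are hypothetical, and the sketch is not the hardware kernel described in this paper.

```python
def _mirror(i, n):
    """Index under whole-sample symmetric extension of a length-n signal."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def forward_53(x):
    """One level of the reversible 5/3 lifting transform (illustrative sketch).

    Works in place on a copy of x: after the call, even positions hold
    low-pass samples and odd positions hold high-pass samples.
    """
    y = list(x)
    n = len(y)
    if n < 2:
        raise ValueError("need at least two samples")
    # Predict step: each odd sample minus the rounded mean of its even neighbours.
    for i in range(1, n, 2):
        y[i] -= (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)]) // 2
    # Update step: each even sample plus a rounded quarter of neighbouring details.
    for i in range(0, n, 2):
        y[i] += (y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) // 4
    return y


def inverse_53(y):
    """Inverse transform: the same lifting steps in reverse order with flipped signs."""
    x = list(y)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    for i in range(0, n, 2):
        x[i] -= (x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)] + 2) // 4
    for i in range(1, n, 2):
        x[i] += (x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) // 2
    return x


signal = [10, 12, 14, 13, 9, 7, 8, 20]
coeffs = forward_53(signal)
restored = inverse_53(coeffs)
```

Each step overwrites one half of the samples using only the other half, so no second buffer is needed, and the inverse follows directly by undoing the steps in reverse order.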
The forward DWT consists of two analysis filters, low pass $H(z^{-1})$ and high pass $G(z^{-1})$, followed by subsampling, while the inverse DWT first upsamples and then applies two synthesis filters, low pass $H(z)$ and high pass $G(z)$. If the subsampling and upsampling are removed, the block diagram becomes that of the CWT. From Figure 1, we can see that the difference between the 2-scale continuous wavelet transform and the discrete wavelet transform lies in the use of subsampling, which leads to different factorization forms of the wavelet filters under the lifting scheme.
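The role of subsampling can be made concrete with a small Python sketch. It assumes an unnormalized Haar filter pair and periodic extension at the right edge, both chosen only for brevity. The function name `haar_analysis` and its `subsample` flag are hypothetical.

```python
def haar_analysis(x, subsample):
    """Two-channel Haar analysis filter bank (illustrative, unnormalized).

    subsample=True  -> DWT-style output: every second filter output is kept.
    subsample=False -> the non-subsampled transform (the CWT of this paper).
    Periodic extension at the right edge is an assumption of this sketch.
    """
    n = len(x)
    low = [(x[i] + x[(i + 1) % n]) / 2 for i in range(n)]
    high = [(x[(i + 1) % n] - x[i]) / 2 for i in range(n)]
    if subsample:
        low, high = low[::2], high[::2]
    return low, high


x = [4, 6, 10, 12, 8, 6, 5, 5]
dwt_low, dwt_high = haar_analysis(x, subsample=True)
cwt_low, cwt_high = haar_analysis(x, subsample=False)
```

With subsampling, each subband holds half as many samples as the input, so the total output size equals the input size. Without subsampling, each subband is as long as the input, so the total coefficient count doubles.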
According to [4], the forward discrete wavelet transform is computed by the following (for even n): (1) where $S_1(z)$ and $R_1(z)$ are the low subband and high subband, respectively, $s_i(z)$ and $t_i(z)$ are the predict lifting step and update lifting step, respectively, and $c_1$ and $c_2$ are normalization factors.
$S_{even}(z)$ and $S_{odd}(z)$ are the even part and odd part of $S_0(z)$, respectively.
In [13], the forward continuous wavelet transform is computed by the following: (2)
where the lifting steps are $Q_{n-1}(z)$, $Q_n(z) - 1$, and so on, and $c$ is a constant factor.
By integrating eq. (1) with eq. (2), we can build a unified computation framework for DWT and CWT.
For the forward wavelet transform, the wavelet filter is factorized by
For the DWT, $S_{01}(z)$ and $S_{02}(z)$ are the even part and odd part of $S_0(z)$, respectively, while for the CWT, both are equal to $S_0(z)$.
For the inverse wavelet transform, the wavelet filter is factorized by
The specific factorization of wavelet filters is calculated by Euclid's algorithm. 
According to the different modes, the access status of the two memories is listed in Table 2. As for data storage, DWT coefficients are easily stored in in-place mode, while storing CWT coefficients is more complex. Because the size of the 1-D CWT coefficients is doubled, the low subband coefficients are stored in in-place mode and the high subband coefficients are stored in the other memory. In Fig. 4, the storage of 2-D CWT coefficients is done in a similar way.

The FPGA prototype targets the Xilinx Virtex II (xc2v500) FPGA. Table 3 presents the hardware resource usage of the design. The reconfigurable lifting array includes four lifting step kernels. The 5/3 and 9/7 wavelet filters defined in the JPEG2000 standard [8] are chosen for the 2-D DWT. The Haar wavelet, which is widely used in image edge detection, is taken as the wavelet filter of the 2-D CWT. According to the unified computation framework introduced in section 2, the 5/3, 9/7 and Haar wavelet filters are factorized respectively as follows: (8)

Table IV.

We performed a 3-level decomposition of a 512 x 512 grayscale image with the reconfigurable wavelet engine architecture. The number of clock cycles our architecture needs for an N x N tile with 3-level decomposition is N*N/4+(4+N/4)*2N+(4+N/8)*N+(4+N/16)*N/2. The 3-level decomposition of a 512 x 512 grayscale image finishes within 12.16 ms when running at 20 MHz. Simulation results showed that the FPGA prototype system not only realizes hardware reconfiguration but also has better performance for 2-D DWT and CWT.

In this paper, we proposed a unified computation framework for DWT and CWT based on the lifting scheme and designed a reconfigurable architecture for DWT and CWT that covers a wide range of wavelet filters. Experimental results show that the architecture is effective. Because the CWT and DWT are both widely used in signal and multimedia processing, this work is attractive and valuable. Future work includes the design of fine-granularity lifting step kernels and the automatic generation of RTL code and FPGA configuration files.
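The cycle-count expression quoted in the performance results can be checked numerically with a short Python sketch. The function name `decomposition_cycles` is hypothetical, and the expression is taken exactly as printed in the text, with integer division assumed for the N/4, N/8 and N/16 terms.

```python
def decomposition_cycles(n):
    """Clock cycles for a 3-level decomposition of an n x n tile.

    Evaluates the expression as printed in the text:
    N*N/4 + (4 + N/4)*2N + (4 + N/8)*N + (4 + N/16)*N/2
    """
    return (n * n // 4
            + (4 + n // 4) * 2 * n
            + (4 + n // 8) * n
            + (4 + n // 16) * n // 2)


CLOCK_HZ = 20_000_000  # 20 MHz, the clock rate stated in the text

cycles = decomposition_cycles(512)
elapsed_ms = cycles / CLOCK_HZ * 1e3
print(cycles, round(elapsed_ms, 2))  # prints: 244736 12.24
```

Evaluated this way, the expression gives 244,736 cycles for N = 512, or about 12.24 ms at 20 MHz, which is close to the reported 12.16 ms.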
