I. INTRODUCTION
THE field of discrete wavelet transforms (DWT) has been attracting substantial interest, in part because wavelet analysis is capable of decomposing a signal into a particular set of basis functions equipped with good spectral properties [1]-[4]. Wavelet analysis has been used to detect system nonlinearities by making use of its localization feature [5]. DWT-based multi-resolution analysis leads to both time and frequency localization [4], [6]-[9].
Indeed, wavelet filter banks establish a strong support for many signal processing systems [10]. Wavelets are employed in numerical analysis [11], [12], real-time processing [11], image compression and reconstruction [3], [13]-[17], pattern recognition [11], biomedicine [12], approximation theory, computer graphics [18], and image and video coding standards (H.265) [19], [20]. Following the adoption of the bi-orthogonal 2.2 wavelet filters in the JPEG2000 standard [3], [6], [13], [21], much research effort has been devoted to reducing the computational and circuit complexities of DWT hardware architectures in VLSI systems [2], [11], [13], [22], [23].
A particular class of DWTs is that based on the Daubechies wavelets [24]. They are well-suited and commonly used in image compression applications [3], [25], [26]. Herein we refer to the Daubechies wavelets generated from 4- and 6-tap filter banks as Daub-4 and -6 wavelets, respectively. In particular, whereas the Daub-4 wavelets are often employed in applications where the signals are smooth and slowly varying, the Daub-6 wavelets are used for signals bearing abrupt changes, spikes, and high undesired noise levels [11]. Daub-4 wavelets can be highly localized to smooth [2], [25] and Daub-6 wavelets have found applications in medical imaging, such as wireless capsule endoscopy, where images of fine details are regarded as important [11], [24], [27].
Since wavelets can be associated to specific filter banks, practical wavelet analysis is achieved by means of sub-band coding [24] , [28] - [30] . Sub-band coding is a basic filtering principle which splits a given signal in several frequency bands for subsequent encoding [31] . In particular, 2-D multi-resolution analysis is obtained via sub-band coding [24] , [28] .
In this paper, we propose a new multi-encoding technique that achieves exact computation of multi-level 2-D Daubechies wavelet transforms using algebraic integer (AI) encoding. Compared to existing AI designs in the literature [1]-[3], [6], [11], [13], [19], the proposed design can compute wavelet image approximations entirely over integer fields and with a single final reconstruction step (FRS) in a purely AI-based 2-D architecture. The design avoids the need for intermediate reconstruction steps.
Moreover, the proposed architecture is sought to be multiplier-free. Such a design facilitates accuracy, speed, and a relatively small chip area, as well as a reduced design cost. The new design is multi-encoded and multi-rate, operating over AI with no intermediate reconstruction steps. In this framework, error-free computations can be performed until the final FRS. Our architecture emphasizes output image quality and speed, accepting higher complexity and power consumption in exchange for accuracy.
This paper unfolds as follows. Section II describes the AI context and brings a literature survey. An overview of the issue of fixed-point errors and their mitigation using AI bases is also provided. Section III reviews the principles of sub-band coding by means of Daub-4 and -6 filters. Section IV translates the mathematical formalism of AI encoding into the 2-D sub-band coding context. Wavelet sub-band coding using multi-encoding with AI bases is provided for multi-level decomposition, considering both Daub-4 and -6 filter banks. The final reconstruction step (FRS) procedure for the proposed analyses is described in Section V. Based on the expansion factor method ([32], p. 274), alternative FRS schemes were also sought for the Daub-6 case. Field programmable gate array (FPGA) implementation results, hardware resource consumption, and power consumption are provided in Section VI for both 4- and 6-tap filters.
We also compare published 1-D and 2-D DWT architectures with the proposed architectures. Maximum operating frequency, signal-to-noise ratio (SNR) and peak-signal-to-noise ratio (PSNR) figures are sought using the proposed designs operating in fixed-point. Concluding remarks are given in Section VII.
II. CONTRIBUTIONS

A. The Problem of Fixed-Point Errors
Filter banks associated with Daubechies wavelets have irrational coefficients whose representation in fixed-point requires truncation or rounding off [2], [6], [33]. Such approximations introduce representation errors which propagate through a given filter bank. Moreover, the longer the required filter bank is, the greater the computational error may become. This process lowers the signal-to-noise ratio of the resulting data.
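The representation error discussed above can be illustrated with a minimal sketch. It assumes the standard closed form of the Daub-4 scaling coefficients and a simple round-to-nearest quantizer; the function names `quantize` and `max_coefficient_error` are hypothetical helpers introduced only for this illustration.

```python
import math

SQRT3 = math.sqrt(3.0)
# Standard closed form of the Daub-4 low-pass (scaling) coefficients.
DAUB4 = [c / (4.0 * math.sqrt(2.0))
         for c in (1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3)]


def quantize(value, frac_bits):
    """Round `value` to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def max_coefficient_error(coeffs, frac_bits):
    """Largest absolute representation error over a set of coefficients."""
    return max(abs(c - quantize(c, frac_bits)) for c in coeffs)


if __name__ == "__main__":
    # The error is never zero for irrational coefficients, whatever the word length.
    for bits in (6, 8, 12, 16):
        print(bits, max_coefficient_error(DAUB4, bits))
```

Each coefficient error is bounded by half a least significant bit, but it is injected at every tap and at every decomposition level, which is the propagation effect described in this subsection.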
B. Prior Art on AI-Based DWT
AI encoding can address the computational noise injection in wavelet analysis systems [6] . Pioneered by Cozzens and Finkelstein [3] , [34] , [35] , AI quantization has been employed in several signal processing schemes, including wavelet and discrete cosine transform analysis [1] , [11] , [13] , [36] . A significant advantage of the AI encoding is its capability of mapping the required irrational wavelet coefficients into vectors or arrays of integers. Therefore, wavelet decomposition can be performed without errors in a vectorial framework consisting exclusively of integer operations. Thus, the irrational coefficients of the Daubechies filters can be represented into integers, according to a selected AI basis [3] , [6] , [37] .
AI encoding schemes require a reconstruction step to convert the resulting AI encoded quantities back into fixed-precision binary. The design of digital architectures for the 1-D Daub-4 and -6 filters was pioneered by Wahid and Dimitrov in the recent past. Importantly, the 2-D architectures proposed by Wahid et al. [1]-[3], [6], [13] rely on intermediate reconstruction steps [3], [6], [13], [19]. Errors incurred in these intermediate reconstructions diminish the benefits of using AI encoding for 2-D multi-level DWTs. This is an outstanding problem in the current literature which we identify and correct in the present contribution.
C. Proposed Encoding Scheme
We address the issue described above by proposing a multi-encoding method that achieves error-free computation across the 2-D decomposition levels. In our method, the reconstruction step appears only once, at the final level of decomposition and filtering [38]. Unlike the schemes described in [2], [3], [6], [19], our scheme operates entirely over the AI representation, up to a single and final reconstruction block, without any intermediate reconstruction steps. Thus, the FRS is the only possible source of computational errors.
In view of the above, we propose a new AI-based architecture for sub-band coding of images using 2-D Daub-4 and -6 wavelet filters. The AI quantization approach leads to an architecture possessing a parallel channel structure [36] . Input data is successively wavelet decomposed over several levels according to application requirements. The single FRS employs constant coefficient multipliers based on canonical signed digit (CSD) representation, offering low circuit complexity. This architecture facilitates very low levels of uncorrelated and uncoupled quantization noise in the final decomposed image data.
III. REVIEW OF SUB-BAND CODING
Wavelet decomposition of input image data can be accomplished by sub-band coding. A 2-D finite impulse response (FIR) filter bank processes the input data, resulting in approximation and detail sub-images.
The input image is of resolution pixels; and it is input to a pair of low-pass (approximation) and high-pass (detail) filters and , respectively. The filters operate column-wise on the image followed by dyadic down-sampling, i.e., only one of every two columns is retained. Then the same process is applied row-wise. The outputs are four sub-images , and , which represent the 2-D wavelet coefficients for the coarse approximation, vertical details, horizontal details, and diagonal details, respectively. This process is shown in Fig. 1 for one-level wavelet analysis via filter banks. Symbols and are used to denote the column-wise and row-wise down-sampling, respectively ([39], pp. 6-26). The resultant sub-images are all of size , because of dyadic down-sampling.
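The approximation branch of the one-level analysis described above can be sketched as follows. This is only an illustration under stated assumptions: the filters are causal with zero initial conditions, the odd-indexed outputs are the ones retained by the down-sampler, and the helper names `fir_decimate`, `transpose`, and `approximation` are hypothetical.

```python
def fir_decimate(signal, taps):
    """Causal FIR filtering (zero initial conditions), keeping every second output."""
    out = []
    for n in range(1, len(signal), 2):      # assumed down-sampling phase
        out.append(sum(t * signal[n - k]
                       for k, t in enumerate(taps) if n - k >= 0))
    return out


def transpose(matrix):
    """Swap rows and columns of a list-of-lists image."""
    return [list(row) for row in zip(*matrix)]


def approximation(image, lowpass):
    """Low-pass filter and decimate along one dimension, then along the other."""
    first = transpose([fir_decimate(col, lowpass) for col in transpose(image)])
    return [fir_decimate(row, lowpass) for row in first]
```

Feeding the returned sub-image back into `approximation` yields the next, coarser level; each call halves both image dimensions.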
These operations can be performed recursively [24] , [28] . The resulting approximation can be re-submitted to the signal flow architecture shown in Fig. 1 . As a result, after each iteration a coarser approximation can be achieved. Let the original image to be analyzed be denoted by . Fig. 2 shows the recursive diagram of the multi-level wavelet analysis. After each set of filter banks, a coarser approximation , is furnished. Each level also produces the detail information. In this work, we focus on the computation of the coarser approximations . The topmost branch of the signal flow shown in Fig. 1 computes the approximation data. Detail data , and are normally discarded or thresholded in data compression applications [24] .
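The 2-D FIR filter banks based on the Daub-4 and -6 filters are of particular relevance [2], [28]. Let the low-pass filters associated with these filter banks be denoted as $\mathbf{h}_4$ and $\mathbf{h}_6$, respectively. These particular filters possess irrational quantities as shown below [2], [3], [6], [24], [40]:
$$\mathbf{h}_4 = \frac{1}{4\sqrt{2}} \begin{bmatrix} 1+\sqrt{3} & 3+\sqrt{3} & 3-\sqrt{3} & 1-\sqrt{3} \end{bmatrix}^{T}$$
$$\mathbf{h}_6 = \frac{1}{16\sqrt{2}} \begin{bmatrix} 1+\sqrt{10}+\sqrt{5+2\sqrt{10}} & 5+\sqrt{10}+3\sqrt{5+2\sqrt{10}} & 10-2\sqrt{10}+2\sqrt{5+2\sqrt{10}} & 10-2\sqrt{10}-2\sqrt{5+2\sqrt{10}} & 5+\sqrt{10}-3\sqrt{5+2\sqrt{10}} & 1+\sqrt{10}-\sqrt{5+2\sqrt{10}} \end{bmatrix}^{T}$$
where the superscript $T$ denotes transposition.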
IV. AI-BASED DAUBECHIES-4 AND -6 SCALING FILTERS
A. Mathematical Background
An algebraic integer is a real or complex number that is a root of a monic polynomial with integer coefficients [41] - [43] . Algebraic integers can be employed to define encoding mappings which can precisely represent particular irrational numbers by means of usual integers. Considering the roots of the monic polynomials , and we can extend the set of integers by including the algebraic integer and . Doing so, a given quantity can possibly be represented as where , and are integers. Sets and constitute two bases for AI encoding. Notice that these two bases are adequate for representing the 4-and 6-tap Daubechies filter coefficients. Thus, taking apart quantities and as scaling factors, the Daub-4 and -6 filter coefficients can be represented as --Therefore, these unnormalized low-pass FIR filters of 4-tap/6-tap can be split into separate filters given by
where Therefore, the Daub-4 and -6 filter bank analysis can be separated into two/three structures. This facilitates a two/four integer channel structure, where the integer coefficient filters and ; and and are considered. All implied computations are necessarily over an integer field.
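For concreteness, the splitting described above can be written out from the standard closed forms of the Daubechies scaling coefficients. The symbols $\zeta$, $\zeta_1$, $\zeta_2$, the filter names $\mathbf{h}_4$, $\mathbf{h}_6$ (normalized low-pass filters), and the channel labels are notation assumed for this sketch and may differ from the labels used in the original equations.

```latex
% Daub-4: one algebraic integer, root of the monic polynomial x^2 - 3
\zeta = \sqrt{3}, \qquad \zeta^{2} - 3 = 0
4\sqrt{2}\,\mathbf{h}_4 = \mathbf{h}^{(1)} + \zeta\,\mathbf{h}^{(\zeta)}, \qquad
\mathbf{h}^{(1)} = [\,1\;\;3\;\;3\;\;1\,]^{T}, \quad
\mathbf{h}^{(\zeta)} = [\,1\;\;1\;\;{-1}\;\;{-1}\,]^{T}

% Daub-6: two algebraic integers, roots of x^2 - 10 and x^4 - 10x^2 - 15
\zeta_1 = \sqrt{10}, \qquad \zeta_2 = \sqrt{5 + 2\sqrt{10}}
16\sqrt{2}\,\mathbf{h}_6 = \mathbf{h}^{(1)} + \zeta_1\,\mathbf{h}^{(\zeta_1)} + \zeta_2\,\mathbf{h}^{(\zeta_2)}
\mathbf{h}^{(1)} = [\,1\;\;5\;\;10\;\;10\;\;5\;\;1\,]^{T}, \quad
\mathbf{h}^{(\zeta_1)} = [\,1\;\;1\;\;{-2}\;\;{-2}\;\;1\;\;1\,]^{T}, \quad
\mathbf{h}^{(\zeta_2)} = [\,1\;\;3\;\;2\;\;{-2}\;\;{-3}\;\;{-1}\,]^{T}
```

Every entry of the channel filters is a small integer, so each channel can be realized with additions and bit-shifts only.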
Notice that a usual integer can be effortlessly represented in either basis: it is simply the coefficient of the unity basis element, with all remaining coefficients equal to zero. This is relevant for encoding image pixel values, which are integers. In practical terms, this means that no circuitry for encoding integer input data is necessary. AI-based Daub-4 and -6 filter structures are shown in Figs. 4 and 5. These filters possess zero initial conditions.
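A minimal 1-D sketch of the Daub-4 case may help here. It assumes the basis $\{1, \sqrt{3}\}$ and the integer channel filters $[1, 3, 3, 1]$ and $[1, 1, -1, -1]$ that follow from the standard Daub-4 coefficients; the names `ai_daub4` and `reconstruct` are hypothetical and do not refer to blocks of the proposed architecture.

```python
import math

H1 = [1, 3, 3, 1]        # integer channel multiplying the basis element 1
H_ZETA = [1, 1, -1, -1]  # integer channel multiplying sqrt(3)


def fir(signal, taps):
    """Causal FIR filter with zero initial conditions."""
    return [sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]


def ai_daub4(pixels):
    """Exact, unnormalized Daub-4 low-pass output as two integer sequences."""
    return fir(pixels, H1), fir(pixels, H_ZETA)


def reconstruct(ch_one, ch_zeta):
    """Final reconstruction: the only place where sqrt(3) is approximated."""
    scale = 1.0 / (4.0 * math.sqrt(2.0))
    return [scale * (a + math.sqrt(3.0) * b) for a, b in zip(ch_one, ch_zeta)]
```

For instance, `ai_daub4([10, 20, 30, 40])` returns `([10, 50, 120, 200], [10, 30, 40, 40])`: the integer pixels enter both channels directly, and all intermediate values remain integers.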
B. 2-D Filtering
We now provide the mathematical framework to describe the operation of the proposed AI-based multi-level encoding design. The following notation is adopted in this work. Let be an matrix with columns and be an -point column vector. The operation ⦶ is defined according to: ⦶ where is the convolution operation. Analogously, operation is given by: ⦶ In other words, and ⦶ are the filtering operations along the rows and columns of a given image, respectively, followed by a dyadic down-sampling stage.
where is the input image of integer pixel values. Substituting (1) into (3), we obtain:
Notice that . Thus, we obtain:
where and are given in Table I . The operations described above are illustrated in Fig. 3(a) . The combinational block A is exploited to compute and from the AI filter bank. The resulting filtered images decomposition and necessitate only integer arithmetic to be rendered. Multi-level analysis follows the same algorithm, i.e., multiplications by the AI base are never explicitly performed.
Further decompositions are similarly computed. In particular, the 2nd level decomposition is formulated below:
Applying (4) into above expression, we proceed as follows:
where approximations and have their fully expanded forms given in Table I. Indeed, the above manipulation can be similarly applied to the remaining approximation levels. Thus the th level approximation is furnished by where and are shown in Table I. Notice that the required multiplications by 3 shown in Table I can be easily realized by a bit-shift operation and an addition, i.e., $3x = (x \ll 1) + x$, where $x$ is an integer. The above multi-level analyses for Daub-4 DWT filters are depicted in Fig. 3(b). Expressions for level 2 and shown in Table I induce the implementation of the combinational block B, as shown in Fig. 3(b). The architecture of this block is detailed in Fig. 3(c). Fig. 3(b) also shows the FRS block, which is detailed in the next section.
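The multi-level Daub-4 recursion can be sketched in software as below. Table I is not reproduced here; the recombination rules in `encode_image` and `next_level` are derived only from the identity $\zeta^2 = 3$ with $\zeta = \sqrt{3}$, and all function names are hypothetical. The down-sampling phase and the zero initial conditions are assumptions of the sketch.

```python
import math

H1 = [1, 3, 3, 1]        # channel multiplying 1
H_ZETA = [1, 1, -1, -1]  # channel multiplying sqrt(3)


def fir_decimate(signal, taps):
    """Causal FIR filtering (zero initial conditions), keeping odd-indexed outputs."""
    return [sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(1, len(signal), 2)]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def filt2d(image, taps_a, taps_b):
    """Filter and decimate along one dimension with taps_a, then the other with taps_b."""
    half = transpose([fir_decimate(c, taps_a) for c in transpose(image)])
    return [fir_decimate(r, taps_b) for r in half]


def times3(x):
    """3x realized with one bit-shift and one addition."""
    return (x << 1) + x


def triple(image):
    return [[times3(v) for v in row] for row in image]


def combine(*terms):
    """Element-wise sum of equally sized integer images."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*terms)]


def encode_image(image):
    """Integer channels (p, q) such that the filtered image equals p + sqrt(3) * q."""
    p = combine(filt2d(image, H1, H1), triple(filt2d(image, H_ZETA, H_ZETA)))
    q = combine(filt2d(image, H1, H_ZETA), filt2d(image, H_ZETA, H1))
    return p, q


def next_level(a_one, a_zeta):
    """One decomposition level applied to the AI-encoded image a_one + sqrt(3) * a_zeta."""
    p1, q1 = encode_image(a_one)
    pz, qz = encode_image(a_zeta)
    # (p1 + z*q1) + z*(pz + z*qz) = (p1 + 3*qz) + z*(q1 + pz), since z*z = 3
    return combine(p1, triple(qz)), combine(q1, pz)


def final_reconstruction(a_one, a_zeta, levels):
    """Single decoding step; (4*sqrt(2))**2 = 2**5 is accumulated per 2-D level."""
    shift = 5 * levels
    return [[(a + math.sqrt(3.0) * b) / (1 << shift) for a, b in zip(r1, rz)]
            for r1, rz in zip(a_one, a_zeta)]
```

A pixel image enters as `next_level(image, zeros)`, where `zeros` is an all-zero image of the same size; the square root of 3 is evaluated only inside `final_reconstruction`.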
2) AI-Based Daub-6 DWT Decomposition: In a similar fashion, the Daub-6 filter bank can be put into the AI formalism.
where , and are given in Table II . The error free integer operations described above are illustrated in Fig. 6(a) . The combinational block C is employed in order to furnish , and from the AI filter bank. The level 2 decomposition follows similar manipulations, as detailed below --⦶ Calling (5), we derive the following expression:
where the approximations , and are given in Table II . The general result for the level decomposition is shown in Table II . Fig. 6(b) depicts the full scheme of a four-level Daub-6 decomposition. The structure of combinatorial block D stems from the expressions shown in Table II for level 2 and decompositions. Fig. 6 (c) details this stage.
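To see why the Daub-6 recursion also stays within integer arithmetic, the sketch below multiplies two AI-encoded quantities exactly. It assumes the four-element basis $\{1, \zeta_1, \zeta_2, \zeta_1\zeta_2\}$ with $\zeta_1 = \sqrt{10}$ and $\zeta_2 = \sqrt{5 + 2\sqrt{10}}$, which is consistent with the three non-unity elements mentioned in the text but is not taken from Table II; `ai_multiply` and `to_float` are hypothetical names.

```python
import math

ZETA1 = math.sqrt(10.0)
ZETA2 = math.sqrt(5.0 + 2.0 * math.sqrt(10.0))


def ai_multiply(a, b):
    """Exact product of a0 + a1*z1 + a2*z2 + a3*z1*z2 and the analogous b.

    Only integer operations are used. Reduction rules (assumed basis):
    z1**2 = 10 and z2**2 = 5 + 2*z1.
    """
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    c0 = a0 * b0 + 10 * a1 * b1 + 5 * a2 * b2 + 20 * (a2 * b3 + a3 * b2) + 50 * a3 * b3
    c1 = a0 * b1 + a1 * b0 + 2 * a2 * b2 + 5 * (a2 * b3 + a3 * b2) + 20 * a3 * b3
    c2 = a0 * b2 + a2 * b0 + 10 * (a1 * b3 + a3 * b1)
    c3 = a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1
    return (c0, c1, c2, c3)


def to_float(a):
    """Floating-point value of an AI-encoded quantity (used for checking only)."""
    return a[0] + a[1] * ZETA1 + a[2] * ZETA2 + a[3] * ZETA1 * ZETA2
```

In hardware, only products with the fixed filter channels arise, so the general multiplication above collapses to constant integer scalings such as 2, 5, 10, and 20, each realizable with shifts and additions.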
V. FINAL RECONSTRUCTION STEP
The proposed AI-based wavelet analyses based on Daub-4 and -6 filter banks are computed entirely over extended integer fields. However, the resulting AI encoded approximations must be converted back to standard fixed-point representation. This is required in order to interface the resulting approximation sub-images with conventional real-time systems. Decoding operations for Daub-4 and -6 consist of explicitly performing the computations in (6) and (7), respectively. Fortunately, the scaling factors involved are always powers of two, which can be conveniently realized with bit-shift operations. The above decoding operations are realized at the FRS blocks depicted in Figs. 3(b) and 6(b), respectively. Therefore, the only possible source of errors in the proposed architectures for Daub-4 and -6 is the multiplication by the AI basis elements.
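As a hedged illustration of the decoding step, assume that the level-$n$ Daub-4 approximation is held in two integer channels and the Daub-6 approximation in four, over the bases $\{1, \zeta\}$ and $\{1, \zeta_1, \zeta_2, \zeta_1\zeta_2\}$ with $\zeta = \sqrt{3}$, $\zeta_1 = \sqrt{10}$, and $\zeta_2 = \sqrt{5 + 2\sqrt{10}}$. These symbols and channel labels are assumptions of this sketch and not necessarily the notation of (6) and (7). Each 2-D level applies the unnormalized 1-D filter twice, so the accumulated scaling is a power of two:

```latex
% Daub-4: (4\sqrt{2})^{2n} = 2^{5n}
\mathbf{A}_n \approx 2^{-5n}\left( \mathbf{A}_n^{(1)} + \zeta\,\mathbf{A}_n^{(\zeta)} \right)

% Daub-6: (16\sqrt{2})^{2n} = 2^{9n}
\mathbf{A}_n \approx 2^{-9n}\left( \mathbf{A}_n^{(1)} + \zeta_1\,\mathbf{A}_n^{(\zeta_1)}
  + \zeta_2\,\mathbf{A}_n^{(\zeta_2)} + \zeta_1\zeta_2\,\mathbf{A}_n^{(\zeta_1\zeta_2)} \right)
```

The approximation sign reflects that the basis elements must be represented with finite precision at this stage only.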
We propose two approaches for the FRS design: (i) CSD representation and (ii) expansion factor method. 
A. CSD Approximation
The FRS can be directly implemented by approximating the required irrationals in (6) and (7) by rationals. A possibility is employing CSD representation. Table III displays the CSD encodings for the Daub-4 basis element and Table IV shows the encodings for the Daub-6 basis elements, for several word lengths, as well as the associated relative errors. CSD encoding requires only bit-shifters and adders/subtracters.
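The CSD approach can be sketched as follows. The routine uses the usual non-adjacent-form recoding, which yields the canonical signed-digit representation; it does not reproduce the entries of Tables III and IV, and the function names are hypothetical.

```python
import math


def to_csd(n):
    """Canonical signed-digit (non-adjacent form) digits of integer n, LSB first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n % 4)     # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_constant(value, frac_bits):
    """CSD digits of `value` rounded to `frac_bits` fractional bits."""
    return to_csd(round(value * (1 << frac_bits)))


def csd_multiply(x, digits, frac_bits):
    """Multiply integer x by a CSD constant using shifts and add/subtract only."""
    acc = 0
    for position, d in enumerate(digits):
        if d == 1:
            acc += x << position
        elif d == -1:
            acc -= x << position
    return acc >> frac_bits     # arithmetic right shift (floors the result)
```

Because no two adjacent CSD digits are non-zero, the number of adders and subtracters needed for a constant multiplier is kept small for a given word length.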
B. Expansion Factor Method
Expansion factors are scaling constants usually employed in the design of approximate discrete transforms [44] , [45] . In ( [32] , p. 274), Britanak et al. survey the topic in this context. Recently this methodology was extended and adapted to the design of final reconstruction blocks related to AI based architectures [38] .
An expansion factor is simply a constant that simultaneously scales a given set of real numbers into integer values. In practical terms, only approximate integers at a given error tolerance are sought.
In mathematical terms, we have the following structure. Let the AI elements , and constitute a vector . An expansion factor is a real number that satisfies the following minimization problem ([32, p. 274]) (8) where returns the Euclidean norm and is the rounding-off function. Resulting integer approximations are ,
given by , and . Now, we can recast (7) according to Notice that the above expression in parentheses can be evaluated by means of integer arithmetic, which requires simple additions and bit-shift operations in hardware. As a consequence, only a single non-integer multiplication by is required. As posed above, (8) is a non-linear, unconstrained optimization problem. Its intractability indicates the application of computational search. In this case, we must impose a constraint to the search space.
Thus, for with a precision of , we could obtain five distinct solutions for (8) . These values are listed in Table V . The scaling factor choice depends on the specific application in question, resource constraints, and the accepted error tolerance.
For example, taking , we obtain
This particular scaling leads to percent relative errors of 0.0042, 0.0104, and 0.0014 in the three scaled basis elements, respectively. We used the CSD representation for Daub-4 filters and both CSD representation and the expansion factor method for Daub-6.
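The computational search described in this subsection can be sketched as a bounded grid search. The objective below is a plain Euclidean rounding error, which is one reading of (8) and may differ from the exact form used in [32]; the search bounds, step, and the names `expansion_error`, `search_expansion_factors`, and `reconstruct` are assumptions of this illustration, and the values in Table V are not reproduced.

```python
import math


def expansion_error(alpha, values):
    """Euclidean norm of alpha*values minus its rounded-off version."""
    return math.sqrt(sum((alpha * v - round(alpha * v)) ** 2 for v in values))


def search_expansion_factors(values, lo, hi, step, keep=5):
    """Grid search over [lo, hi]; returns the `keep` candidates with smallest error."""
    count = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(count + 1)]
    return sorted(candidates, key=lambda a: expansion_error(a, values))[:keep]


def reconstruct(channels, values, alpha):
    """Approximate c0 + sum(c_i * v_i) by c0 + sum(c_i * m_i) / alpha, m_i integers."""
    integers = [round(alpha * v) for v in values]
    c0, rest = channels[0], channels[1:]
    # The sum uses integer arithmetic; only the division by alpha is non-integer.
    return c0 + sum(c * m for c, m in zip(rest, integers)) / alpha
```

With `values` set to the non-unity basis elements, the integer sum inside `reconstruct` maps to additions and bit-shifts, leaving a single non-integer scaling by `1/alpha`.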
The expansion factor method is expected to offer better results for larger bases. Indeed, the Daub-4 architecture could not benefit from the expansion factor method since its basis contains only one non-unity element. However, because the AI basis related to the Daub-6 scheme has three non-unity elements, the expansion factor method could lead to useful architectures in the FRS following computational search algorithms for suitable integer combinations. In the next section we provide measurement results concerning the expansion factor method.
The architectures for Daub-4 and -6 filter banks were implemented on a Xilinx Virtex-6 xc6vcx240t-1ff1156 device using the ML605 evaluation board. The designs were tested with six different standard images obtained from [46]. The gray-scale 512 × 512 images Woman, Cameraman, and Reflection were submitted to the Daub-4 filter banks, whereas Mandrill, Lena, and CT head were submitted to the Daub-6 filter banks. Hardware results were verified with MATLAB. Fig. 7 displays hardware results from the Xilinx FPGA for the Daub-4 and -6 filter banks. Table VI shows a performance comparison among the proposed Daub-4 and -6 architectures for single-level decomposition of the 8-bit Lena image.
For comparison, we devised a version of the proposed system that operates over fixed-point arithmetic instead of AI-based arithmetic. For such, we employed 8 bits for word size with 6 fractional bits. In this case, the required filter banks were implemented by quantizing the exact filter coefficients into the fixed-point representation. Notice that the fixed-point scheme incurs coupled quantization noise, whereas the AI-based architecture is immune to this source of contamination. Fig. 8 shows the results for the fixed-point design. Tables VII and VIII list resource consumption for the Daub-4 and -6 filter banks. Monitored resources include: the number of slice registers, the look-up table (LUT) count, and the number of configurable logic blocks (CLB).
A. Resource Consumption and Figures of Merit
Critical path delays (CPD), the maximum operating frequency, the area-time product (AT), and $AT^2$ were selected as figures of merit, being also reported in Tables VII and VIII. The AT product is a standard performance metric in digital hardware designs. It refers to chip area and speed (maximum frequency) of the design. Lower AT values indicate a higher speed of operation. In an FPGA, the area (A) is provided by the number of slice LUTs used for logic, as given by the FPGA design tool called XFLOW, and the time is simply the critical path delay. The quantity $AT^2$ is useful when clock speed is the driving factor of design optimization, for high-throughput realizations. Table IX shows the estimated power consumption for the Daub-4 and -6 filter banks.
Xilinx power analyzer (XPA) was employed to analyze the power consumption on Xilinx FPGA Virtex-6 device. The quiescent (static) power dissipation is a combined effect of standby and leakage power (dominant) dissipations [54] . At 40 nm process technology static power dominates dynamic power. Dynamic power represents the fluctuating power as the design runs and is the sum of short-circuit and capacitive (switching of logic cells) power dissipations. Leakage and standby currents do exist in digital circuits and are reported for the entire FPGA. Dynamic power consumption is associated only with the logic of the design under test. Therefore, reported static power for FPGAs can be several Watts and has limited usefulness as a metric. Table XIII lists the dynamic power consumption for the single level decomposition. The quiescent power reported for FPGAs are for the entire chip, not just the relevant parts of the particular design being tested.
The memory requirement, expressed as the count of 1-deep FIFO elements, for the Daub-4 and -6 schemes is given in Table X as a function of the image size in pixels and the number of decomposition levels.
The SNR and PSNR were adopted as figures of merit.
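For reference, the two figures of merit can be computed as in the sketch below. It uses the common textbook definitions, which are assumed here because the exact expressions are not stated in the text; `snr_db`, `psnr_db`, and the default peak of 255 for 8-bit images are choices made for this illustration.

```python
import math


def _flatten(image):
    """Row-major list of pixel values as floats."""
    return [float(v) for row in image for v in row]


def snr_db(reference, test):
    """SNR in dB: reference energy over error energy."""
    ref, tst = _flatten(reference), _flatten(test)
    signal = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, tst))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)


def psnr_db(reference, test, peak=255.0):
    """PSNR in dB: squared peak value over mean squared error."""
    ref, tst = _flatten(reference), _flatten(test)
    mse = sum((r - t) ** 2 for r, t in zip(ref, tst)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)
```

Here `reference` would be a high-precision software result and `test` the hardware output for the same sub-image.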
B. Comparison With Existing Methods
A significant amount of work has been published on 1-D and 2-D DWT VLSI architectures [1]-[3], [6], [13], [19], [51], [55]-[57]. In particular, the designs proposed in [3], [6] address the Daub-4 and -6 wavelet analysis. Also, detailed data are reported in [3], [6], allowing us to derive meaningful comparisons.
Considering an 8-bit input word length, the obtained SNR and PSNR values for the proposed architectures were roughly 30-40% higher than those of the 1-D and 2-D DWT architectures described in [3], [6].
Among the FRS approaches we have mentioned, we used the canonical signed digit (CSD) approximation for comparison. Moreover, we compared the proposed architectures with several prominent VLSI 2-D DWT designs available in the literature. In particular, we selected the following works: [47]-[53]. Table IX shows the comparison results.
The proposed architectures are also compared with recently published AI-based DWT architectures. The results in Table XIII are for the CSD representation considering an 8-bit equivalent word size, unless it is specifically mentioned that we employed the expansion factor method. The comparison is provided in Table XIII. The proposed architectures are entirely multiplier-free with no coupled quantization noise; possess low levels of both uncorrelated and uncoupled quantization noise; and offer the highest maximum frequency of operation among the compared designs. Since the design is speed optimized using fine-grain pipelining and parallel architectures, it is not anticipated to yield advantages in terms of power and area. In a sense, we traded power and resources for speed (maximum frequency).
VII. CONCLUSION
We proposed a multi-encoded AI-based 2-D wavelet filter bank architecture capable of arbitrarily high numerical accuracy. The introduced design employs AI-based arithmetic which is (i) error-free, (ii) defined over integers, and (iii) free of multiplications.
By employing AI encoding, resulting wavelet decomposed images had SNR and PSNR figures improved by approximately 25-30% when compared to a counterpart fixed-point system with 8-bit word length and 6 fractional bits.
Comparing the paper [1] , our proposed Daub-4 and -6 architectures The SNR and PSNR values for the AI-based Daub-6 architecture were approximately 6-10% higher than the figures obtained from the Daub-4 architecture. The better mathematical properties of the Daub-6 wavelets, such as more vanishing moments, explain this difference. Due to its inherent simplicity of coefficients and smaller number of AI numbers, the Daub-4 AI-based architecture consumed approximately 50% less power than the Daub-6 systems. Moreover, its maximum frequency of operation is approximately 90% higher under the same conditions.
A single FRS is the only source of computational error. Noise injection from intermediate fixed-point errors is non-existent. We proposed several designs for the FRS based on CSD representation and expansion factor scaling. These two methods allowed various configurations of accuracy and tolerable circuit complexities. Applications exist in sub-band coding of high dynamic range image sequences. Standard images were analyzed. FPGA-based four-level prototypes for Daubechies 4- and 6-tap wavelet filters are operational at a compilation target frequency of 100 MHz on the Xilinx ML605 board. Place-and-route timing analysis furnished 282.50 MHz and 146.42 MHz for the Daub-4 and -6 architectures, respectively. Daub-4 and -6 single-level decomposition architectures were also FPGA prototyped with the Xilinx Virtex-6 device at 442.47 and 274.72 MHz, respectively. CMOS sensor arrays for imaging are being continuously improved with increasing resolutions. The dynamic range of typical imaging applications is also increasing, and more emphasis is being placed on picture quality. In the presence of higher resolution, increased dynamic range, and increased frame rate, there is no option but to increase the throughput of the digital filtering architectures.
Finally, it is important to notice that, in principle, the discussed AI-based scheme can be applied to any type of DWT as long as the scaling and wavelet coefficients of the corresponding filters can be given an exact representation. For instance, this is the case for the Haar, Daubechies-4/-6, and Bior-5/3 wavelets. On the other hand, wavelets such as the Gaussian and Mexican hat wavelets do not have a compatible DWT version.
