Multiplier Free Implementation of 8-tap Daubechies Wavelet Filters for Biomedical Applications by Eminaga, Y. et al.
WestminsterResearch
http://www.westminster.ac.uk/westminsterresearch
 
 
This is a copy of the author’s accepted version of a paper subsequently published in the 
proceedings of the IEEE New Generation of Circuits and Systems Conference, Genova, 
Italy 7 to 9 Sep 2017.
It is available online at:
https://dx.doi.org/10.1109/NGCAS.2017.63
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be 
obtained for all other uses, in any current or future media, including 
reprinting/republishing this material for advertising or promotional purposes, creating 
new collective works, for resale or redistribution to servers or lists, or reuse of any 
copyrighted component of this work in other works.
Multiplier Free Implementation of 8-tap Daubechies
Wavelet Filters for Biomedical Applications
Yaprak Eminaga, Adem Coskun, and Izzet Kale
Applied DSP and VLSI Research Group
University of Westminster
London, W1W 6UW, United Kingdom
Email: y.eminaga@my.westminster.ac.uk, a.coskun@westminster.ac.uk, kalei@westminster.ac.uk
Abstract—Due to an increasing demand for on-sensor biosignal
processing in wireless ambulatory applications, it is crucial to
reduce the power consumption and hardware cost of the signal
processing units. The Discrete Wavelet Transform (DWT) is a very popular tool for artifact removal, detection and compression in the time-frequency analysis of biosignals, and it can be implemented as a two-branch filter bank. This work proposes a new, completely multiplier-free filter architecture for implementing Daubechies wavelets, which targets Field-Programmable-Gate-Array (FPGA) technologies by replacing multipliers with Reconfigurable Multiplier Blocks (ReMBs). The results show that the proposed technique reduces the hardware complexity by 40% in terms of Look-Up Table (LUT) count and can be used in low-cost embedded platforms for ambulatory physiological signal monitoring and analysis.
Keywords—Biomedical applications, Discrete Wavelet Transform (DWT), Reconfigurable Multiplier Blocks (ReMB), multiplierless implementation, Daubechies wavelets.
I. INTRODUCTION
Biomedical signals are known to be non-stationary; thus, their analysis requires information that is localized in both time and frequency in order to maintain signal fidelity during processing tasks such as artefact removal and detection. The Discrete Wavelet Transform (DWT) performs signal decomposition via translated and dilated versions of a basis function, which effectively localizes the signal in both the time and frequency domains. The DWT can be realized by a two-channel quadrature mirror filter bank with a lowpass filter (h0) and a highpass filter (h1), as shown in Fig. 1 [1]. Multiresolution analysis is achieved by recursively applying the filter bank to the lowpass filtered output. The output of each filter is downsampled by 2; the outputs of the lowpass and highpass branches are known as the approximation coefficients (cAm[n]) and the detail coefficients (cDm[n]), covering the lower and upper halves of the input spectrum, respectively. Fig. 1 shows the implementation of a three-level analysis filter bank. Among the wide range of wavelet families, Daubechies wavelets are a popular choice, and the Daubechies-4 (db4) wavelet, with four vanishing moments and 8 taps, has been used in many different biomedical signal processing applications [2]–[4]. Daubechies wavelets lead to orthogonal filter banks in which the lowpass and highpass filters are non-symmetric, have equal lengths, and have real, fixed coefficients. They are known to be maximally flat filters whose coefficients can be approximated by dyadic fixed-point values without significant loss of accuracy. Thus, the coefficients can be implemented as signed binary numbers, which reduces hardware complexity and power consumption.
Fig. 1. Three-level analysis filter bank (input x[n]; lowpass h₀ and highpass h₁ filters, each followed by ↓2; detail outputs cD0[n], cD1[n], cD2[n] and approximation outputs cA0[n], cA1[n], cA2[n]).
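The recursive structure of Fig. 1 can be sketched in a few lines of NumPy. This is a floating-point illustration only, assuming the standard db4 decomposition coefficients (as tabulated, for example, by PyWavelets), zero-padded full convolution at the signal borders, and the alternating-flip relation h1[n] = (-1)^(n+1) h0[N-1-n] for the highpass filter; the function names are hypothetical and a practical system may handle borders differently.

```python
import numpy as np

# Standard db4 (8-tap) lowpass decomposition filter h0; its taps sum to sqrt(2).
H0 = np.array([-0.010597401784997278, 0.032883011666982945,
               0.030841381835986965, -0.18703481171888114,
               -0.02798376941698385, 0.6308807679295904,
               0.7148465705525415, 0.23037781330885523])
N = len(H0)
# Highpass filter by alternating flip (assumed sign convention).
H1 = np.array([(-1) ** (n + 1) * H0[N - 1 - n] for n in range(N)])


def analysis_stage(x):
    """One two-channel stage: filter with h0 and h1, then downsample by 2."""
    c_a = np.convolve(x, H0)[::2]  # approximation coefficients
    c_d = np.convolve(x, H1)[::2]  # detail coefficients
    return c_a, c_d


def dwt_levels(x, levels=3):
    """Re-apply the stage to the lowpass output at every level, as in Fig. 1."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = analysis_stage(approx)
        details.append(detail)
    return approx, details


rng = np.random.default_rng(0)
x = rng.standard_normal(256)
c_a2, (c_d0, c_d1, c_d2) = dwt_levels(x)
print(len(c_d0), len(c_d1), len(c_d2), len(c_a2))  # 132 70 39 39
```

Because the db4 filter bank is orthogonal, one analysis stage preserves signal energy, which is a convenient sanity check for any fixed-point version of the same structure.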
Biosignal processing tools such as filters and discrete transforms employ constant multiplications, which can be implemented as shift-add operations to reduce the hardware complexity and cost of medical systems. For example, in [5], [6], the authors presented a decimation filter chain for ElectroCardioGram (ECG) acquisition systems that replaces constant multiplications with shift-add operations. The DWT also employs fixed-coefficient filters associated with the selected mother wavelet; hence, it can benefit from shift-add network topologies. Several studies, mainly in the area of image processing, have used these networks to implement wavelet filters, including the 5/3 and 9/7 lifting-based wavelets [7]–[9] and 4- and 6-tap Daubechies filters [10]. However, to the best of the authors' knowledge, the use of the Reconfigurable Multiplier Block (ReMB) [11] for implementing wavelet filters has not been investigated in the biomedical signal processing literature. This paper presents a hardware efficient ReMB structure and its Field-Programmable-Gate-Array (FPGA) implementation, which can be employed in time-multiplexed filter structures for wavelet analysis of biomedical signals and is suitable for low-cost portable medical devices. The design efficiently employs the dedicated resources of the FPGA, as presented in [11], [12], but extends that approach to take advantage of newer FPGA technology. Section II introduces the ReMB method, followed by the quantization considerations for the fixed-point db4 coefficients and the design details and implemented structure of the proposed ReMB. Section III compares the resource utilization of the proposed design with that of a general purpose multiplier. Finally, Section IV presents the conclusions.
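As a minimal illustration of replacing a constant multiplication with shift-add operations, the sketch below multiplies integer samples by 646, the integer form of one db4 coefficient magnitude in Table I, using the signed-digit decomposition 646 = 2^9 + 2^7 + 2^3 - 2^1. The decomposition and the function name are chosen here for illustration only; they are not taken from [5], [6] or from the proposed ReMB.

```python
def times_646(x: int) -> int:
    """Multiply by the constant 646 using only shifts and three adders.

    646 = 2**9 + 2**7 + 2**3 - 2**1; grouping the four terms in pairs
    gives a balanced adder tree with an adder depth of two.
    """
    left = (x << 9) + (x << 7)    # first-level adder
    right = (x << 3) - (x << 1)   # first-level subtracter
    return left + right           # second-level adder


samples = [0, 1, -3, 127, -128, 2047]
products = [times_646(s) for s in samples]
print(products)
```

In Python, left-shifting a negative integer multiplies it by the corresponding power of two, so the same expression covers signed samples; hardware would use sign-extended two's complement words.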
II. METHOD
For this study, the ReMB design methodology introduced in [11] is extended to recent FPGAs, which replace 4-input Look-Up Tables (LUTs) with 6-input ones. A 1-bit full adder/subtracter can be implemented using the dedicated carry-chain logic and an LUT for the remaining XOR gate. Reconfigurability in multiplier blocks is achieved with a multiplexer whose output is connected to at least one input of the adder. The multiplexer can be implemented using the unused inputs of the LUT that implements the XOR gate. Demirsoy et al. [11] introduced an ReMB algorithm that maximizes the use of FPGA logic elements by adopting a "basic graph structure". A basic structure, as shown in Fig. 2 (a), is simply a two-input adder with at least one of its inputs connected to a 2:1 multiplexer that can be implemented with a 4-input LUT. Owing to the dedicated resources of the FPGA, the adders in basic structures are implemented as ripple-carry adders. In this work, the basic structures are modified to use the new 6-input LUTs, which enables 2:1 multiplexers to be replaced with 3:1 ones at no additional cost, as shown in Fig. 2 (b). The inputs of these muxes can be connected to the input of the ReMB, to the output of another basic structure, or to ground. To implement a set of coefficients, a number of these basic structures can be interconnected in chain form (i.e. horizontally cascaded) and tree form (i.e. the inputs of a mux connected to the output of another basic structure). The number of coefficients generated at the output depends on the basic structure topology, the number of basic structures and how they are interconnected. For example, if two basic structures as in Fig. 2 (b) are interconnected (each with 3 different outputs), the output set size is at most 9 (3 × 3). To find a valid ReMB design for a target coefficient set, it is critical to determine the required depth of the design and the adder depth of each coefficient. The depth of a design is the number of cascaded stages needed to obtain the required number of coefficients, and the adder depth is the number of cascaded adders on the path between the input and output nodes that generates each coefficient. Following these requirements, the ReMB depth can be generalized using Eqn. 1.
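$$\mathrm{ReMB}_{\mathrm{depth}} = \max\left(AD_{\max},\ \min(k)\right) \qquad (1)$$
where $AD_{\max}$ is the maximum adder depth over all coefficients and $k$ is the number of cascaded basic structures (i.e. layers), so that $\min(k)$ is the smallest number of layers able to generate the required number of coefficients.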
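The counting argument and Eqn. 1 can be checked with a small behavioural model. The sketch below assumes a simplified 3:1 basic structure in which the mux selects one of three operands that is added to a fixed, shifted operand; the shift values, the ground input and the chain topology are hypothetical and are not the ones used in Fig. 4. Feeding a unit input (x = 1) makes each output equal to the coefficient it realizes.

```python
from itertools import product


def basic_structure(a, mux_inputs, sel, shift_a):
    """Hypothetical 3:1 basic structure: (a << shift_a) + mux_inputs[sel].

    A subtract option would be modelled the same way; it is omitted here.
    """
    return (a << shift_a) + mux_inputs[sel]


def chained_outputs(x=1):
    """All outputs of two basic structures in a chain, one per select pair."""
    outputs = set()
    for s0, s1 in product(range(3), repeat=2):
        # Stage 1: mux inputs are shifted copies of the ReMB input.
        y1 = basic_structure(x, [x, x << 1, x << 2], s0, shift_a=3)
        # Stage 2: fed by stage 1; one mux input is tied to ground (0).
        y2 = basic_structure(y1, [x, x << 4, 0], s1, shift_a=2)
        outputs.add(y2)
    return outputs


def remb_depth(ad_max, k_min):
    """Eqn. 1: max(maximum adder depth, minimum number of layers)."""
    return max(ad_max, k_min)


coeffs = chained_outputs()
print(sorted(coeffs))                  # [36, 37, 40, 41, 48, 49, 52, 56, 64]
print(remb_depth(ad_max=2, k_min=2))   # 2
```

With these particular shifts all 3 × 3 = 9 select combinations give distinct values; other choices can produce duplicates, which is why the product of mux sizes is only an upper bound on the output set size. The values ad_max = 2 and k_min = 2 correspond to the db4 case discussed in Section II-B.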
Fig. 2. Basic structure with (a) 2:1 multiplexer [11] and (b) 3:1 multiplexer (inputs A, B0, B1 and, in (b), B2 with shifts ±a, ±b0, ±b1, ±b2; select line S in (a), S0 and S1 in (b); adder/subtracter (+/-) output Sum).
TABLE I
SELECT LINE VALUES FOR MULTIPLEXERS (M0:M3) GIVEN IN FIG. 4.
Fixed-point value   Integer (×2^10)   S0   S1   S2   S3
0.0107421875        11                1    0    1    0
0.033203125         34                1    0    1    1
0.03125             32                0    1    0    0
0.1875              192               0    0    2    0
0.0283203125        29                2    0    1    0
0.630859375         646               1    0    2    0
0.71484375          732               2    0    2    2
0.23046875          236               1    1    1    2
A. Coefficient Quantization
The accuracy of the DWT depends on the precision of the decomposition and reconstruction filter coefficients. Quantizing floating-point coefficients introduces quantization error, which accumulates as it propagates through the filter bank. To evaluate finite-word-length effects on input data, the filter coefficients associated with the db4 mother wavelet are quantized with various precisions and employed in the DWT. ECG and ElectroEncephaloGram (EEG) signals obtained from PhysioNet [13] are selected as reference signals for evaluation. The error variance between the input and the reconstructed data is measured with floating-point and fixed-point filter coefficients. Fig. 3 shows the measured error values for different coefficient word-lengths for both ECG (blue) and EEG (red). An error variance of approximately -70 dB can be observed with a coefficient word-length of 11 bits (10 fractional bits), which is deemed negligible for this study, as such an error is not observable by clinicians during time-domain analysis. These coefficients are also scaled by 2^10 to obtain integer values, and their absolute values are given in Table I.
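The quantization step and the error measurement described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' test bench: Gaussian noise stands in for the PhysioNet ECG/EEG records, only one decomposition level is used, and the highpass and synthesis filters are derived from the quantized lowpass filter by alternating flip and time reversal, so the printed dB values are not expected to reproduce Fig. 3. The helper names are hypothetical.

```python
import numpy as np

# Standard db4 (8-tap) lowpass decomposition filter in floating point.
H0 = np.array([-0.010597401784997278, 0.032883011666982945,
               0.030841381835986965, -0.18703481171888114,
               -0.02798376941698385, 0.6308807679295904,
               0.7148465705525415, 0.23037781330885523])


def quantize(h, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 2.0 ** frac_bits
    return np.round(h * scale) / scale


def highpass(h0):
    """Alternating flip h1[n] = (-1)**(n+1) * h0[N-1-n] (assumed convention)."""
    n = np.arange(len(h0))
    return (-1.0) ** (n + 1) * h0[::-1]


def one_level_error_var(x, h0):
    """Variance of x minus its one-level analysis/synthesis reconstruction."""
    y = np.zeros(len(x) + 2 * len(h0))
    for h in (h0, highpass(h0)):
        c = np.convolve(x, h)[::2]       # analysis: filter, then downsample by 2
        up = np.zeros(2 * len(c))
        up[::2] = c                      # synthesis: upsample by 2 ...
        r = np.convolve(up, h[::-1])     # ... then filter with the time-reversed filter
        y[:len(r)] += r
    delay = len(h0) - 1                  # the filter bank delays x by N-1 samples
    return np.var(x - y[delay:delay + len(x)])


rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
print(np.round(np.abs(H0) * 2 ** 10).astype(int))  # integer magnitudes, cf. Table I
for frac_bits in (7, 9, 10, 12, 15):
    var = one_level_error_var(x, quantize(H0, frac_bits))
    print(f"{frac_bits + 1}-bit coefficients: {10 * np.log10(var):.1f} dB")
```

The word-length in the printout counts one sign bit plus the fractional bits, matching the 11-bit (10 fractional bits) convention used in the text; no integer bits are needed because every db4 coefficient magnitude is below 1.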
Fig. 3. Estimated error variance (dB, axis from -95 to -45) between input and reconstructed output versus filter coefficient word-length (8 to 16 bits), for ECG and EEG inputs.
B. ReMB Design Details
The db4 filters are implemented using an ReMB built from the basic structures given in Fig. 2 (b). Since the highpass filter is obtained from the lowpass filter by reversing the coefficient order and alternating the signs, both filters share the same eight distinct coefficient magnitudes, which simplifies the ReMB design procedure. Before starting the design, it is critical to evaluate the adder depth of the individual coefficients and the number of layers needed to generate eight coefficients. One basic structure as in Fig. 2 (b) can generate three distinct values (i.e. its output set size is three). Interconnecting two of these structures as a chain can generate a maximum of nine distinct integers, whereas three basic structures interconnected in a tree form with two layers can generate at most 27 integers. Therefore, the number of layers required for eight coefficients with the proposed basic structures is two. In addition, the maximum adder depth of the coefficients is two; however, three basic structures are required to realize all coefficients. Therefore, according to Eqn. 1, the ReMB depth is calculated as two, with three basic structures interconnected in a tree form. Fig. 4 shows the ReMB designed accordingly, which employs three adders, one 2:1 multiplexer and three 3:1 multiplexers. The main controller addresses the select values required to generate the coefficients in the correct order. The select values for each mux (M0:M3) in Fig. 4 are denoted S0:S3, respectively, and are presented in Table I, where '0' selects the top input, '1' selects the bottom input (for a 2:1 mux) or the middle input (for a 3:1 mux), and '2' selects the bottom input of a 3:1 mux.
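A quick way to sanity-check the statement that the maximum adder depth is two is to write each integer of Table I in canonical signed-digit (CSD) form and count the adders that a balanced tree would need for each coefficient in isolation. This is a generic CSD analysis assumed here for illustration; it is not the ReMB synthesis algorithm of [11], which additionally shares adders between coefficients through the multiplexers. The names TABLE_I, csd_digits and adder_depth are hypothetical.

```python
TABLE_I = [11, 34, 32, 192, 29, 646, 732, 236]  # integer magnitudes from Table I


def csd_digits(n):
    """Canonical signed-digit digits (+1, 0, -1) of n > 0, least significant first."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            d = 0
        else:
            d = 2 - (n % 4)  # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
        digits.append(d)
        n = (n - d) // 2
    return digits


def adder_depth(n):
    """Depth of a balanced adder tree that sums the nonzero CSD terms of n."""
    nonzero = sum(1 for d in csd_digits(n) if d != 0)
    return (nonzero - 1).bit_length()  # equals ceil(log2(nonzero))


for c in TABLE_I:
    terms = [f"{'+' if d > 0 else '-'}2^{i}"
             for i, d in enumerate(csd_digits(c)) if d]
    print(c, " ".join(terms), "depth", adder_depth(c))
print("max adder depth:", max(adder_depth(c) for c in TABLE_I))
```

Only 646 and 732 need four signed-digit terms; all other magnitudes need three or fewer, so a per-coefficient balanced tree never exceeds two adder levels, which agrees with the value used in Eqn. 1.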
The proposed ReMB targets FPGA platforms: it takes advantage of the dedicated fast carry logic and implements the multiplexers at no additional cost, as described before. However, when non-FPGA technologies are targeted, the ReMB can be redesigned with the increased flexibility of larger muxes. In FPGA platforms, the resource cost of an individual mux with more than three inputs is comparable to that of an adder. For instance, a 4-bit 4:1 mux and a 4-bit full adder each utilize four LUTs, whereas in non-FPGA technologies multiplexers are relatively cheaper [14].
III. EXPERIMENTAL RESULTS AND COMPARISONS
For this study, two time-multiplexed Time-Delay Line (TDL) implementations are realized in order to compare the resource utilization of a conventional reference design with a general purpose multiplier and a design with the ReMB, shown in Fig. 5 (a) and (b), respectively. The reference time-multiplexed FIR filter structure comprises an input memory, a coefficient memory and a single Multiply-ACcumulate (MAC) unit with a general purpose multiplier. Such a filter structure operates sequentially: at every cycle, one input sample is multiplied by one coefficient stored in memory, and this process is controlled by a simple control unit. Each product is added to the running sum of the previous products by an accumulator and a register.
Fig. 4. Constant multiplier block designed for db4 filter coefficients (input xm[n], output ym[n]; multiplexers M0 to M3, three adder/subtracters (+/-) and fixed left shifts between << 1 and << 6).
Fig. 5. A time-multiplexed TDL FIR filter implemented using (a) a dedicated multiplier and coefficient memory, and (b) the proposed ReMB block replacing multiplier and coefficient memory (both versions contain an input memory, a control unit and registers between x[n] and y[n]).
On the other hand, the proposed multiplier block generates the products of the input with the absolute values of the required filter coefficients and thus replaces both the coefficient memory and the general purpose multiplier of the reference design. As can be seen from Fig. 5 (b), a multiplexer placed after the ReMB selects between the generated product and its complement, thereby applying the coefficient sign. Here, the controller addresses the correct coefficient for each tap by generating the control lines for the multiplexers and adders/subtracters employed in the ReMB (given in Section II), as well as for the multiplexer after it.
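The two datapaths of Fig. 5 can be compared with a cycle-level behavioural sketch. It assumes the integer magnitudes of Table I together with the signs of the standard db4 lowpass decomposition filter, models the sign multiplexer as a choice between the product and its negation (two's complement), and uses an ordinary multiplication as a stand-in for the shift-add ReMB output; all names are hypothetical. Outputs are scaled by 2^10, like the coefficients.

```python
# Integer magnitudes from Table I and signs of the db4 lowpass decomposition filter.
MAGNITUDES = [11, 34, 32, 192, 29, 646, 732, 236]
SIGNS = [-1, +1, +1, -1, -1, +1, +1, +1]


def mac_reference(taps):
    """Fig. 5 (a): signed coefficient memory plus a general purpose multiplier."""
    coefficient_memory = [s * m for s, m in zip(SIGNS, MAGNITUDES)]
    acc = 0
    for k in range(len(coefficient_memory)):  # one tap per clock cycle
        acc += taps[k] * coefficient_memory[k]
    return acc


def mac_remb_style(taps):
    """Fig. 5 (b): magnitude-only product followed by a sign multiplexer."""
    acc = 0
    for k in range(len(MAGNITUDES)):
        product = taps[k] * MAGNITUDES[k]       # behavioural stand-in for the ReMB
        acc += -product if SIGNS[k] < 0 else product  # mux: product or its negation
    return acc


def fir_time_multiplexed(x, mac):
    """y[n] = sum_k h[k] * x[n-k], computed tap by tap for each output sample."""
    delay_line = [0] * len(MAGNITUDES)           # models the input memory
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]  # holds x[n], x[n-1], ..., x[n-7]
        y.append(mac(delay_line))
    return y


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
y_ref = fir_time_multiplexed(x, mac_reference)
y_remb = fir_time_multiplexed(x, mac_remb_style)
print(y_ref == y_remb)  # True: both datapaths produce identical integer outputs
```

The highpass branch could reuse the same magnitudes with reversed order and alternating signs, which is why a single ReMB serves both filters in this sketch's assumptions.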
For all experiments, the filter architectures are designed using System Generator for DSP in the MATLAB Simulink environment and are implemented on a Kintex-7 FPGA with Vivado v16.2. The resource utilization of each architecture after implementation is given in Table II in terms of LUTs and Flip-Flops (FFs). In addition, the critical path delays of the multiplier and the ReMB are given in terms of adder and multiplier operation times, denoted τa and τm, respectively. Table II shows that the proposed design uses fewer resources than the reference design. The reference design costs 212 LUTs and 258 FFs, of which the multiplier alone accounts for 163 LUTs and 129 FFs. The ReMB, in turn, yields substantial savings against the reference design: the overall filter cost is reduced by 38% (LUTs fall from 212 to 144 and FFs from 258 to 151), and the ReMB itself utilizes 43% fewer LUTs and 57% fewer FFs than the general purpose multiplier.
TABLE II
RESOURCE UTILIZATION OF THE PROPOSED AND THE REFERENCE DESIGNS AFTER IMPLEMENTATION ON XILINX KINTEX-7 DEVICE.
Filter resource utilization after implementation
                      Fig. 5 (a)                   Fig. 5 (b)
LUT                   212                          144
FF                    258                          151
Multiplier resource utilization
                      General purpose multiplier   ReMB
LUT                   163                          93
FF                    129                          56
Critical path delay   τm                           2τa
In FPGA implementations, multiplexer delays are not included in the path delay because the multiplexers are embedded into the LUTs; therefore, only the logic depth of the adders is critical. Since its adder depth is two, the proposed design has a low logic depth compared to the general purpose multiplier, which reduces the critical path delay of the multiplication operation.
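The percentages quoted above can be recomputed directly from Table II. The short script below assumes that each saving is expressed relative to the reference figure; the paper does not spell out how its overall filter cost is aggregated, so the LUT-plus-FF sum at the end is only one possible definition and is labelled as an assumption.

```python
# Table II figures (Xilinx Kintex-7, after implementation).
filter_cost = {"LUT": (212, 144), "FF": (258, 151)}       # Fig. 5 (a) vs Fig. 5 (b)
multiplier_cost = {"LUT": (163, 93), "FF": (129, 56)}      # general purpose vs ReMB


def reduction(ref, new):
    """Percentage saving of `new` relative to the reference value `ref`."""
    return 100.0 * (ref - new) / ref


for name, table in (("filter", filter_cost), ("multiplier", multiplier_cost)):
    for resource, (ref, new) in table.items():
        print(f"{name} {resource}: {reduction(ref, new):.1f}% fewer")

# Assumed aggregate: LUTs and FFs simply counted together.
ref_total = sum(ref for ref, _ in filter_cost.values())
new_total = sum(new for _, new in filter_cost.values())
print(f"filter LUT+FF: {reduction(ref_total, new_total):.1f}% fewer")
```

The multiplier-level figures reproduce the 43% and 57% savings stated in the text, while the filter-level LUT and FF savings are about 32% and 41%, respectively.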
IV. CONCLUSION
In this paper, a hardware efficient implementation of the db4 wavelet and scaling filters is presented that employs a specifically designed ReMB. It is shown that adding multiplexers to shift-add networks provides reconfigurability to well-known constant multiplication blocks. By taking advantage of recent FPGA technologies with 6-input LUTs, 3:1 muxes are employed in the ReMB design at no additional hardware cost, which updates the techniques proposed in the state of the art. To evaluate the resource efficiency of the proposed structure, it is implemented on a Kintex-7 FPGA and compared to a reference design. As the results reported in this paper demonstrate, the proposed ReMB can decrease the overall hardware cost of a time-multiplexed filter by approximately 40% compared to a general purpose multiplier. The low-cost, hardware efficient structure of the proposed multiplier is suitable for DWT filter banks and can be used in low-cost embedded platforms for ambulatory physiological signal monitoring and analysis.
ACKNOWLEDGMENT
The authors wish to thank the University of Westminster
Faculty of Science and Technology for the PhD Studentship.
REFERENCES
[1] S. Mallat, A wavelet tour of signal processing: the sparse way. Academic Press, 2008.
[2] S. Lahmiri, “Comparative study of ECG signal denoising by wavelet thresholding in empirical and variational mode decomposition domains,” Healthcare Technology Letters, vol. 1, no. 3, pp. 104–109, 2014.
[3] C. Ye, B. V. Kumar, and M. T. Coimbra, “Heartbeat classification using morphological and dynamic features of ECG signals,” IEEE Transactions on Biomedical Engineering, vol. 59, no. 10, pp. 2930–2941, 2012.
[4] N. K. Al-Qazzaz, S. Hamid Bin Mohd Ali, S. A. Ahmad, M. S. Islam, and J. Escudero, “Selection of mother wavelet functions for multi-channel EEG signal analysis during a working memory task,” Sensors, vol. 15, no. 11, pp. 29015–29035, 2015.
[5] Y. Eminaga, A. Coskun, S. A. Moschos, and I. Kale, “Low complexity all-pass based polyphase decimation filters for ECG monitoring,” in Ph.D. Research in Microelectronics and Electronics (PRIME), 2015 11th Conference on. IEEE, 2015, pp. 322–325.
[6] Y. Eminaga, A. Coskun, and I. Kale, “Two-path all-pass based half-band infinite impulse response decimation filters and the effects of their non-linear phase response on ECG signal acquisition,” Biomedical Signal Processing and Control, vol. 31, pp. 529–538, 2017.
[7] A. Darji, R. Arun, S. N. Merchant, and A. Chandorkar, “Multiplier-less pipeline architecture for lifting-based two-dimensional discrete wavelet transform,” IET Computers & Digital Techniques, vol. 9, no. 2, pp. 113–123, 2014.
[8] C.-H. Hsia, J.-H. Yang, and W. Wang, “An efficient VLSI architecture for discrete wavelet transform,” in Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015 Asia-Pacific. IEEE, 2015, pp. 684–687.
[9] J. Wu, A. Ang, K. M. Tsui, H. Wu, Y. S. Hung, Y. Hu, J. Mak, S.-C. Chan, and Z. Zhang, “Efficient implementation and design of a new single-channel electrooculography-based human–machine interface system,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 2, pp. 179–183, 2015.
[10] S. K. Madishetty, A. Madanayake, R. J. Cintra, V. S. Dimitrov, and D. H. Mugler, “VLSI architectures for the 4-tap and 6-tap 2-D Daubechies wavelet filters using algebraic integers,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, no. 6, pp. 1455–1468, 2013.
[11] S. S. Demirsoy, I. Kale, and A. Dempster, “Reconfigurable Multiplier Blocks: Structures, Algorithm and Applications,” Circuits, Systems, and Signal Processing, vol. 26, no. 6, pp. 793–827, 2007.
[12] S. S. Demirsoy, I. Kale, and A. G. Dempster, “Synthesis of Reconfigurable Multiplier Blocks: Part I-Fundamentals,” in IEEE International Symposium on Circuits and Systems, ISCAS, 2005. IEEE, 2005, pp. 536–539.
[13] A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. Ivanov, R. Mark, J. Mietus, G. Moody, C. Peng, and H. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals,” Circulation, vol. 101, no. 23, pp. E215–20, 2000.
[14] P. Tummeltshammer, J. C. Hoe, and M. Püschel, “Time-multiplexed multiple-constant multiplication,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 9, pp. 1551–1563, 2007.
