



Design and Implementation of Complexity Reduced Digital Signal 
Processors for Low Power Biomedical Applications
Eminaga, Y.
 
This is an electronic version of a PhD thesis awarded by the University of Westminster. 
© Miss Yaprak Eminaga, 2019.
The WestminsterResearch online digital archive at the University of Westminster aims to make the 
research output of the University available to a wider audience. Copyright and Moral Rights remain 
with the authors and/or copyright owners.
Whilst further distribution of specific materials from within this archive is forbidden, you may freely 
distribute the URL of WestminsterResearch: (http://westminsterresearch.wmin.ac.uk/).
In case of abuse or copyright appearing without permission e-mail repository@westminster.ac.uk
Design and Implementation of Complexity
Reduced Digital Signal Processors for Low
Power Biomedical Applications
Yaprak EMİNAĞA
A thesis submitted in partial fulfilment of the requirements




my parents Özdal & Mustafa EMİNAĞA
and
to the loving memories of my grandparents
Bahire & Hüseyin KUBİLAY
I declare that the work presented in this thesis is my own, has not been submitted
for any other award, is identical to the content of the electronic submission and that,
to the best of my knowledge, it does not contain any material previously created by




Wearable health monitoring systems, capable of signal sensing, acquisition, local
processing and transmission, can provide remote care with supervised, independent
living. A generic biopotential signal (such as Electrocardiogram (ECG) and
Electroencephalogram (EEG)) processing platform consists of four main functional
components. The signals acquired by the electrodes are amplified and preconditioned
by the (1) Analog-Front-End (AFE) and then digitized by the (2) Analog-to-Digital
Converter (ADC) for further processing. The local digital signal processing is
usually handled by a custom designed (3) Digital Signal Processor (DSP), which is
responsible for any one, or a combination, of signal processing algorithms such as
noise detection, noise/artefact removal, feature extraction, classification and
compression. The digitally processed data is then transmitted via the (4) transmitter,
which is renowned as the most power hungry block in the complete platform. All the
aforementioned components of a wearable system must be designed and fitted into an
integrated system where the area and power requirements are stringent. Therefore,
the hardware complexity and power dissipation of each functional component are
crucial aspects when designing and implementing a wearable monitoring platform.
The work undertaken focuses on reducing the hardware complexity of a biosignal DSP
and presents low hardware complexity solutions that can be employed in the
aforementioned wearable platforms.
A typical state-of-the-art system utilizes Sigma Delta (Σ∆) ADCs incorporating a
Σ∆ modulator and a decimation filter, and state-of-the-art decimation filters
employ linear phase Finite-Impulse-Response (FIR) filters with high orders that
increase the hardware complexity [1–5]. In this thesis, the novel use of minimum phase
Infinite-Impulse-Response (IIR) decimators is proposed, where the hardware complexity
is massively reduced compared to the conventional FIR decimators. In addition, the
non-linear phase effects of these filters are investigated, since phase non-linearity
may distort the time domain representation of the signal being filtered, which is an
undesirable effect for biopotential signals, especially when the fiducial characteristics
carry diagnostic importance. In the case of ECG monitoring systems, the effect of the
IIR filter phase non-linearity is minimal and does not affect the diagnostic accuracy of
the signals.
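As an illustration of the phase behaviour discussed above, the following Python sketch evaluates a two-path all-pass based half-band IIR filter. It is not the thesis implementation: the transfer function H(z) = 0.5 [A1(z^2) + z^-1 A2(z^2)] with first-order sections A(z) = (α + z^-1)/(1 + α z^-1) is the textbook form and is assumed here, the function names are hypothetical, and the coefficients α1 = 0.125 and α2 = 0.5625 are the ones quoted in the captions of Figures 3.4 and 3.8.

```python
import cmath
import math

# Coefficients quoted in the captions of Figures 3.4 and 3.8.
ALPHA1 = 0.125
ALPHA2 = 0.5625


def allpass(alpha, z_inv):
    """First-order all-pass A(z) = (alpha + z^-1) / (1 + alpha * z^-1)."""
    return (alpha + z_inv) / (1.0 + alpha * z_inv)


def halfband_response(omega, a1=ALPHA1, a2=ALPHA2):
    """Assumed structure: H(z) = 0.5 * (A1(z^2) + z^-1 * A2(z^2))."""
    z_inv = cmath.exp(-1j * omega)
    z_inv2 = z_inv * z_inv
    return 0.5 * (allpass(a1, z_inv2) + z_inv * allpass(a2, z_inv2))


def branch_group_delay(alpha, omega):
    """Group delay (in samples) of the all-pass section A(z^2)."""
    num = 2.0 * (1.0 - alpha ** 2)
    den = 1.0 + 2.0 * alpha * math.cos(2.0 * omega) + alpha ** 2
    return num / den


def halfband_group_delay(omega, a1=ALPHA1, a2=ALPHA2):
    """Group delay of H(z): the phase of H is the mean of the two branch
    phases, so the delay is the mean of the branch delays, where the
    second branch carries one extra sample from z^-1."""
    return 0.5 * (branch_group_delay(a1, omega)
                  + branch_group_delay(a2, omega) + 1.0)


if __name__ == "__main__":
    # nu is the normalized frequency f / fs at the rate the filter runs at.
    for nu in (0.0, 0.0625, 0.125, 0.25, 0.375):
        omega = 2.0 * math.pi * nu
        mag_db = 20.0 * math.log10(max(abs(halfband_response(omega)), 1e-12))
        delay = halfband_group_delay(omega)
        print(f"nu={nu:<7} |H|={mag_db:8.2f} dB  group delay={delay:.4f} samples")
```

The spread of the printed group delay across the band of interest is the quantity that a phase compensator would have to flatten; a linear phase FIR filter would show a constant value instead.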
The work undertaken also proposes two methods for reducing the hardware complexity
of a popular biosignal processing tool, the Discrete Wavelet Transform (DWT). General
purpose multipliers are known to be hardware and power hungry in terms of the number
of addition operations, or of underlying building blocks such as full adders and half
adders, that they require. A higher number of adders leads to an increase in the power
consumption, which scales with the clock frequency, supply voltage, switching activity
and the resources utilized. The resources of a typical Field-Programmable-Gate-Array
(FPGA) are Look-up Tables (LUTs), whereas those of a custom Digital Signal Processor
(DSP) are the gate-level cells of standard cell libraries that are used to build
adders [6]. One of the proposed methods is the replacement of the hardware and power
hungry general purpose multipliers and the coefficient memories with reconfigurable
multiplier blocks that are composed of simple shift-add networks and multiplexers.
This method substantially reduces the resource utilization as well as the power
consumption of the system. The second proposed method is the design and implementation
of the DWT filter banks using IIR filters, which employ fewer arithmetic operations
than the state-of-the-art FIR wavelets. This reduces the hardware complexity of the
analysis filter bank of the DWT and can be employed in applications where
reconstruction is not required. However, the synthesis filter bank for the IIR wavelet
transform has a higher computational complexity than the conventional FIR wavelet
synthesis filter banks, since it requires re-indexing of the filtered data sequence,
which can only be achieved with extra registers. This led to the proposal of a novel
design which replaces the complex IIR based synthesis filter banks with FIR filters
that approximate the associated IIR filters. Finally, a comparative study is presented
where the hybrid IIR/FIR and FIR/FIR wavelet filter banks are deployed in a typical
noise reduction scenario using wavelet thresholding techniques. It is concluded that
the proposed hybrid IIR/FIR wavelet filter banks provide better denoising performance
and reduced computational complexity and power consumption in comparison to their
IIR/IIR and FIR/FIR counterparts.
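The shift-add idea behind the multiplier blocks can be illustrated with a small Python sketch. It is not the ReMB design proposed in the thesis: it only shows how a single constant multiplication can be decomposed into shifts and additions/subtractions using the Canonic Signed Digit (CSD) representation listed among the acronyms. The function names are hypothetical, and the constants 7, 27, 29 and 39 are borrowed from the captions of Figures 4.2 and 4.3 purely as examples.

```python
def csd_digits(n):
    """Return the CSD digits of a positive integer, least significant first.

    Each digit is -1, 0 or +1 and no two adjacent digits are non-zero.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def adder_cost(digits):
    """Adders/subtractors needed: one fewer than the non-zero digits."""
    return sum(1 for d in digits if d) - 1


def shift_add_multiply(x, digits):
    """Multiply integer x by the encoded constant using shifts, adds and subtracts."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    for c in (7, 27, 29, 39):
        digits = csd_digits(c)
        binary_cost = bin(c).count("1") - 1
        print(c, digits, "CSD adders:", adder_cost(digits),
              "binary adders:", binary_cost)
```

For 27 the sketch yields 32 − 4 − 1, i.e. two subtractors instead of the three adders needed by the plain binary form 16 + 8 + 2 + 1. A reconfigurable block would additionally share such partial products between coefficients and select among them with multiplexers, which this sketch does not attempt.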
Acknowledgements
First and foremost, I would like to express my sincerest appreciation to my supervisor
Dr. Adem Coşkun for his constant support, approachability, enthusiasm, and in-depth
knowledge throughout my PhD. He has always been supportive, positive and more
importantly understanding. It is an honour to be his first PhD student and thank you
for making my PhD experience a very enjoyable one.
I would also like to express my deepest gratitude to my second supervisor Prof. İzzet
Kale for his endless support, encouragement, always positive attitude and great
knowledge throughout my PhD. Thank you for always enlightening my way for more than
a decade, for always pointing me in the correct direction, and for encouraging me to try
my best even during the difficult times. It has been and always will be an honour to
work under your guidance.
To Dağhan Özbilenler, thank you for making my life better and for helping me get over
any obstacle I face. I would not get over this journey without your endless support. I
would also like to thank my whole family and friends for having faith in me.
Last but not least, I would like to thank my parents for being my inspiration and
supporting me both financially and emotionally, and believing in me throughout my
whole life. I would not be the person I am and I would not be where I am now, if it
was not for the endless compromises you made and your infinite love. My dear sister
Başak and dear brother Barış, thank you for giving me the opportunity to know that I





AIQ Algebraic Integer Quantization
AP Action Potential
ASIC Application Specific Integrated Circuit
AV Atrioventricular
BPM Beats Per Minute
CIC Cascaded Integrator-Comb
CLB Configurable Logic Block
CNS Central Nervous System
CSC Cross Spectral Coherence
CSD Canonic Signed Digit
CSE Common Sub-expression Elimination
CWT Continuous Wavelet Transform
DA Distributed Arithmetic
DAG Directed Acyclic Graph
DCT Discrete Cosine Transform
DFT Discrete Fourier Transform
DR Distortion Ratio
DSP Digital Signal Processor








FIR Finite Impulse Response





IIR Infinite Impulse Response
KCM Constant Coefficient Multiplier
LIFO Last-in First-out
LMS Least Mean Square
LSB Least Significant Bit
LTC Long Term Condition
LUT Look-up Table
MAC Multiply-Accumulate
MAE Maximum Absolute Error
MAG Minimum Adder Graph
MCM Multiple Constant Multiplication
MEMS Micro-Electro-Mechanical Systems
MSB Most Significant Bit
MSD Minimum Signed Digit
MSE Mean Square Error
ND(TDL) Numerator-Denominator TDL
PAR Place and Route
PCA Principal Component Analysis
PDA Personal Digital Assistant
PNS Peripheral Nervous System
PR Perfect Reconstruction
PSD Power Spectral Density
PZP Pole-Zero Plane
QMF Quadrature Mirror Filter
QNPSD Quantization Noise Power
RAG-n Reduced Adder Graph
RAM Random Access Memory
ReMB Reconfigurable Multiplier Block
RF Radio Frequency










STFT Short-Time Fourier Transform
TDA Tap-Delay Accumulate
TDL Tapped Delay Line
TSMC Taiwan Semiconductor Manufacturing Company
TWAC T-Wave Alternans Challenge
VHDL VHSIC Hardware Description Language
VLSI Very-Large-Scale Integration






List of Acronyms vi
List of Figures xiii
List of Tables xix
1 Introduction 1
1.1 Wireless Body Area Network . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 WBAN General Architecture . . . . . . . . . . . . . . . . . . . 4
1.1.2 State-of-the-art WBAN Systems with Local Processing . . . . . 6
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Novel Contributions of the Work . . . . . . . . . . . . . . . . . . . . . 13
1.4 Outline of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5 Chapter Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2 Physiological Background and Wavelet Theory 20
2.1 Introduction to Physiological Signals . . . . . . . . . . . . . . . . . . . 20
2.1.1 Electrocardiography (ECG) . . . . . . . . . . . . . . . . . . . . 21
2.1.2 Electroencephalography (EEG) . . . . . . . . . . . . . . . . . . 25
2.1.3 Electromyography (EMG) . . . . . . . . . . . . . . . . . . . . . 28
2.2 Introduction to Wavelet Theory . . . . . . . . . . . . . . . . . . . . . . 30
2.2.1 Short-Time Fourier Transform . . . . . . . . . . . . . . . . . . . 32
2.2.2 Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3 Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3 Decimation Filter for Wearable ECG Monitoring Systems 42
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2 State-of-the-art Decimation Filters in Biomedical Applications . . . . . 44
3.3 Proposed Decimation Filter Structure . . . . . . . . . . . . . . . . . . . 45
3.3.1 The Slink Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.2 Two-Path All-pass Based Half-Band IIR Filters . . . . . . . . . 47
3.3.3 The Slink Roll-off Compensation Filter . . . . . . . . . . . . . . 49
3.3.4 Decimation Chain Magnitude Response . . . . . . . . . . . . . . 49
3.4 Phase Characteristics of All-pass Based Half-band Polyphase IIR Filter 50
3.5 Phase Compensation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.6 T-wave Alternans Challenge and MIT-BIH Arrhythmia Database . . . 54
3.7 Error Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.8 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.9 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.10 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4 Investigation of Hardware Efficient Implementation Methods for Wavelet
Filter Banks 71
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Parallel Architectures and Multiplier Blocks . . . . . . . . . . . . . . . 73
4.2.1 Common Sub-expression Elimination (CSE) Technique . . . . . 74
4.2.2 Directed Acyclic Graph (DAG) . . . . . . . . . . . . . . . . . . 76
4.3 Time-multiplexed Architectures and Multiplier Blocks . . . . . . . . . . 77
4.4 Reconfigurable Multiplier Blocks for FPGAs . . . . . . . . . . . . . . . 79
4.4.1 ReMBs for 4-series FPGAs . . . . . . . . . . . . . . . . . . . . . 79
4.4.2 ReMBs for 7-series FPGAs . . . . . . . . . . . . . . . . . . . . . 83
4.5 ReMB for Daubechies Filters in Biomedical Applications . . . . . . . . 86
4.5.1 db4 Filter Coefficient Quantization . . . . . . . . . . . . . . . . 87
4.5.2 ReMB Structures for db4 Filters . . . . . . . . . . . . . . . . . . 90
4.5.3 db4 Filters and DWT Filter Bank Architectures . . . . . . . . 93
4.6 Hardware Validation and Cost Assessment . . . . . . . . . . . . . . . . 99
4.6.1 Multiplier Block Cost Assessment . . . . . . . . . . . . . . . . . 99
4.6.2 Time-Multiplexed FIR Filters Cost Assessment . . . . . . . . . 101
4.6.3 1-level Analysis Filter Bank Cost Assessment . . . . . . . . . . 102
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5 IIR Wavelet Filter Banks for Biomedical Signal Processing 109
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 Orthogonal IIR Wavelet Analysis Filter Banks . . . . . . . . . . . . . . 111
5.3 IIR Wavelet Synthesis Filter Banks . . . . . . . . . . . . . . . . . . . . 119
5.3.1 The Block Processing Technique . . . . . . . . . . . . . . . . . . 122
5.3.2 Approximation of anti-causal IIR filters by causal FIR filters . . 125
5.4 Design of IIR Wavelet Analysis Filters for Biomedical Applications . . 128
5.5 IIR/IIR Wavelet Filter Banks for Biomedical Applications . . . . . . . 131
5.5.1 Floating-Point Models . . . . . . . . . . . . . . . . . . . . . . . 131
5.5.2 Fixed-Point Models . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.5.3 Hardware Validation and Cost Assessment . . . . . . . . . . . . 146
5.6 Hybrid IIR/FIR Wavelet Filter Banks for
Biomedical Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5.6.1 Hybrid IIR/FIR Wavelet Filter Banks - Floating Point . . . . . 153
5.6.2 Hybrid IIR/FIR Wavelet Filter Banks - Fixed Point . . . . . . . 159
5.6.3 Hardware Validation and Cost Assessment . . . . . . . . . . . . . 163
5.7 Hybrid IIR/FIR Wavelet Filter Banks for ECG Signal Denoising . . . . 172
5.7.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.7.2 Generated ECG data and Synthetic Noise Sources . . . . . . . . 175
5.8 Results and Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6 Conclusions and Future Work 184
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184







1.1 The architecture of a wearable WBAN system, illustrating the typical
three tier system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Main functional blocks of WBAN sensors; AFE, Σ∆ ADC, DSP, RF
communications and Power Management . . . . . . . . . . . . . . . . . 5
2.1 Conduction system of the heart and generation of an ECG signal by the
temporal and spatial summation of APs [47]. . . . . . . . . . . . . . . . 22
2.2 Time domain features of an ECG signal: P, QRS, and T represent at-
rial depolarization, ventricular depolarization, and atrial and ventricular
repolarization respectively [48] . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Lateral view of the cerebral cortex with occipital, temporal, parietal and
frontal lobes [52]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Example EEG recordings at (a) frontal, (b) temporal, (c) parietal and
(d) occipital lobe, which were recorded while subject performed different
motor/imagery tasks [55]. . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5 A schematic view of the activation of muscle fibres by the CNS impulse [52]. 29
2.6 Example recording of EMG obtained from individual and combined fin-
ger movements of a healthy subject. Each five second interval (indicated
with dashed lines) represents different finger’s movement [65]. . . . . . 29
2.7 Time and frequency domain responses of a (a) Stationary and (b) Non-
stationary signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.8 Example wavelet functions of different families. . . . . . . . . . . . . . 35
2.9 Time and frequency resolution of (a) Short-Time Fourier Transform and
(b) Wavelet Transform [88]. . . . . . . . . . . . . . . . . . . . . . . . . 37
2.10 DWT analysis and synthesis filter banks, for 3 level decomposition and
reconstruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.1 Behavioural structure of the decimation filter, incorporating the 4th or-
der Slink, two 2-path all-pass based HB IIR, and Slink compensation
filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 The magnitude response of the fourth order Slink filter given in (3.1). . 46
3.3 Two-path all-pass based HB IIR decimator structure, incorporating
all-pass filters A1(z) and A2(z) in the top and bottom paths respectively. . 47
3.4 First order all-pass filter structures for (a) A1(z) in the top branch with
α1 = 0.125 and (b) A2(z) in the bottom branch with α2 = 0.5625 . . . . 48
3.5 Magnitude response of the proposed Half-Band (HB) Infinite Impulse
Response (IIR) filters presented in Figure 3.3. . . . . . . . . . . . . . . 48
3.6 Slink roll-off compensation filter structure with coefficient αc = 0.03125. 49
3.7 Full band magnitude response of the overall decimation filter, (a) at
input rate (zoomed into the passband region) and (b) at the output rate
(decimated) along with the Slink (blue) and Slink compensation (red)
filters magnitude response. . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.8 The group delay of the two-path all-pass based HB IIR filter with the
all-pass section coefficients α1 = 0.125 and α2 = 0.5625. . . . . . . . . . 52
3.9 The two-path polyphase IIR filters normalized group delay response cor-
rected using (a) a single section corrector and (b) a 4 section corrector
along with the original filters (blue), and the phase compensators (red)
group delay responses. . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.10 PSD of 10 seconds long (a) 12 lead recordings (from Lead I to Lead V6)
of record twa55 in sinus rhythm obtained from TWAC Database and (b)
13 Lead II recordings from MIT-BIH Arrhythmia Database with various
conduction abnormalities and beat morphologies. (AFIB: Atrial fibrillation,
AFL: Atrial flutter, SBR: Sinus bradycardia, IVR: Idioventricular rhythm,
SVTA: Supraventricular tachyarrhythmia, VFL: Ventricular flutter,
VT: Ventricular tachycardia, BII: 2° heart block, PREX: Pre-excitation,
B: Ventricular bigeminy, T: Ventricular trigeminy and P: Paced rhythm) . . 55
3.11 Power spectrum measurements within the signal bandwidth of 500 Hz
for sampling rate of 64 kHz, (a) Σ∆ modulator output and (b) decimation
chain output without phase compensation (black) and with phase
compensation (red). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.12 PSD of Lead II recording of record twa55 at sinus rhythm (blue) versus
the group delay variation with (green) and without phase compensation
(red) filters, f1 = 22.5 Hz (ν1 = 0.0625) and f2 = 45 Hz (ν2 = 0.125)
indicated by the yellow lines at fs2 = 46.08 kHz. . . . . . . . . . . . . . 59
3.13 Decimation chain output (black) versus the input(red) (a) without phase
compensation and (b) with phase compensation. . . . . . . . . . . . . . 61
3.14 Amplitude difference between the input and output of the decimation
chain, without (original - red) and with compensation (corrected - black). 61
3.15 (a) Waveform Dissimilarity between the input and output of the overall
decimation chain without (blue) and with group delay compensation
(red). (b) Input/ Output Distortion Ratios of the overall decimation
chain without (blue) and with phase compensation (red). . . . . . . . . 62
3.16 Decimation chain output (black) versus the input (red) (a) without
phase compensation and (b) with phase compensation (Ventricular Ta-
chycardia). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.17 Amplitude difference between the input and output of the decimation
chain, without (red) and with compensation (black). (Ventricular Ta-
chycardia) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.18 (a) Waveform Dissimilarity between the input and output of the overall
decimation chain without (blue) and with group delay compensation
(red). (b) Input/ Output Distortion Ratios of the overall decimation
chain without (blue) and with phase compensation (red). . . . . . . . . 64
4.1 Parallel (a) Tapped Delay Line (TDL) and (b) Time Delay and Accu-
mulate (TDA) filter architectures where the boxes highlight the multi-
plication blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.2 Constant multiplier structure for coefficients 7, 29, 39 (a) before and (b)
after CSE technique. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3 Multiplier graph for 27 represented with (a) CSD and (b) method given
in [127] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.4 Adder graphs generated for coefficient set 1, 7, 16, 21 and 33 by (a) BHM
and (b) RAG-n [127] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.5 Xilinx 4-series (a) Configurable logic block and (b) simplified half slice
[144]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.6 (a) Basic structure with 2:1 multiplexer and 4-input LUT mapping for
(b) addition, (c) subtraction and (d) addition/subtraction. . . . . . . . 80
4.7 Example adder graph with N = 8, k = 2 and n = 2 [125]. . . . . . . . . 83
4.8 Simplified half slice of Xilinx 7-series [146]. . . . . . . . . . . . . . . . . 83
4.9 (a) Basic structure with 3:1/4:1 multiplexer and its 6-input LUT map-
ping for (b) addition, (c) subtraction and (d) addition/subtraction. . . 84
4.10 (a) Basic structure with two 2:1 muxes and (b) its mapping on a 5-input,
2-output LUT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.11 Estimated (a) MSE and (b) SER of the reconstructed output with vari-
ous filter coefficient precision. . . . . . . . . . . . . . . . . . . . . . . . 88
4.12 Frequency response of (a) fixed-point filters at each decomposition level ,
(b) Wavelet and Scaling filters with floating-point and 11-bit fixed-point
coefficients (red), (c) Scaling and Wavelet function associated with db4,
and (d) Pole-zero plane of the floating and 11-bit fixed-point coefficients
of db4 scaling filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.13 The ReMB designed for db4 wavelet filters. . . . . . . . . . . . . . . . . 91
4.14 Constant multiplier blocks designed for db4 filter coefficients. (a) Design2
and (b) Design3 [155] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.15 Time-multiplexed TDL FIR filter implemented using (a) a general pur-
pose multiplier and coefficient memory, and (b) the proposed ReMB
block replacing multiplier and coefficient memory. . . . . . . . . . . . . 94
4.16 The controller for the lowpass analysis filter (h0(k)). . . . . . . . . . . . 96
4.17 One level analysis filter bank comprised of a lowpass (h0(k)) and highpass
(h1(k)) time-multiplexed TDL filters with; (a) parallel multiplier
and coefficient memory and (b) the proposed ReMB. . . . . . . . . . . 97
4.18 The controller for the analysis filter bank. . . . . . . . . . . . . . . . . 98
5.1 Polyphase realization of two-channel IIR QMF bank. . . . . . . . . . . 112
5.2 9th order IIR lowpass (blue) and highpass (red) filters. Maximally flat
with K = 9 and N = 4; (a) Magnitude response, (b) Pole-Zero Plane.
Elliptic with K = 1 and N = 0; (c) Magnitude response, (d) Pole-Zero
Plane. Intermediate with K = 5 and N = 2; (e) Magnitude response, (f)
Pole-Zero Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.3 Scaling and Wavelet functions for 9th order (a) Maximally flat, (b) El-
liptic, and (c) Intermediate IIR wavelet filters. . . . . . . . . . . . . . . 118
5.4 One level IIR wavelet filter bank in polyphase structure with causal
stable analysis and stable but anti-causal synthesis filters. . . . . . . . . 121
5.5 Implementation of non-causal time reversed allpass filter using Powell
and Chau technique [169]. . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.6 (a) One level hybrid IIR/FIR wavelet filter bank in polyphase structure
with causal and stable IIR analysis and FIR synthesis filter banks. . . . 127
5.7 ilet3; (a) Magnitude response and (b) Pole-Zero locations. ilet5; (c)
Magnitude response and (d) Pole-Zero locations. (e) ilet3 Scaling and
Wavelet functions, and (f) ilet5 Scaling and Wavelet functions. . . . . . 129
5.8 Analysis filter bank responses for 5 level decomposition where D and A
are the highpass and the lowpass branch responses. (a) ilet3, (b) ilet5
Wavelet Transform (WT), and (c) db4 WT . . . . . . . . . . . . . . . . 130
5.9 Truncated impulse response of ilet3, (a) Maximum error between
H0(z) and its truncated version, and (b) Top figure; Magnitude response,
Bottom figure; Magnitude error of the H0(z) (blue) and the truncated
H0(z) (red) impulse response with L0 = 8. . . . . . . . . . . . . . . . . 132
5.10 Truncated impulse response of ilet5, (a) Maximum error between
H0(z) and its truncated version, and (b) Top figure; Magnitude response,
Bottom figure; Magnitude error of the H0(z) (blue) and the truncated
H0(z) (red) impulse response with L0 = 8 and L1 = 16. . . . . . . . . . 132
5.11 (a) First level analysis filter bank (FB), (b) first level synthesis FB and
(c) implementation of A0(z^-1) with L0 = 8 in floating point precision
for ilet3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.12 (a) First level analysis FB, (b) first level synthesis FB, (c) implementation
of A0(z^-1) with L0 = 8 and (d) implementation of A1(z^-1) with
L1 = 16 in floating point precision for ilet5. . . . . . . . . . . . . . . . 135
5.13 Timing diagram of A0,1(z^-1) in Figures 5.11 and 5.12. . . . . . . . . . . 136
5.14 ilet3 FB performance for perfect reconstruction with L0 = 8, (a) ECG
record-232 input (red) vs reconstructed output (blue), (b) MAE (mV) for
ECG data records, (c) EEG record-chb14 input (red) vs reconstructed
output (blue), and (d) MAE (mV) for EEG data records. . . . . . . . . 137
5.15 ilet5 FB performance for perfect reconstruction with L0 = 8 and L1 =
16, (a) ECG record-232 input (red) vs reconstructed output (blue), (b)
MAE (mV) for ECG data records, (c) EEG record-chb14 input (red) vs
reconstructed output (blue), and (d) MAE (mV) for EEG data records. 138






  rH0pzq coefficients, ilet5 wavelet
Analysis filters; (c) Magnitude responses, and (d) PZPs for floating point
and fixed point coefficients. . . . . . . . . . . . . . . . . . . . . . . . . 140
5.17 Peak gain and quantization noise shaping for 1st order ND-TDL allpass
structure [171]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.18 (a) Gain |P1(z)| and (b) Output quantization noise power (QNPSD), for
A0(z) of ilet3 (blue), A0(z) of ilet5 (red) and A1(z) of ilet5 (black) . . . 142
5.19 ilet3 (a) One-level IIR synthesis filter bank architecture and (b) the
timing diagram for controlling the operation of the synthesis filter bank. 149
5.20 ilet5 One-level IIR synthesis filter bank architecture with the block pro-
cessing method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.21 (a) Phase compensation error with (L0 − 1)th order FIR filter for ilet3
wavelet, (b) Magnitude of Linear Distortion Transfer Function (DL(z)),
(c) Magnitude of Aliasing Distortion Transfer Function (DA(z)), and (d)
the group delay of the analysis and synthesis filter banks. . . . . . . . . 153
5.22 ilet3 wavelet; (a) Analysis filter magnitude responses, (b) Synthesis filter
magnitude responses, (c) Analysis filter group delay, and (d) synthesis
filter group delay. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.23 Floating model of one-level hybrid IIR/FIR wavelet filter bank for ilet3
wavelet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.24 (a) Phase compensation error with (L0 − 1)th and (L1 − 1)th order FIR
filters for ilet5 wavelet, (b) Magnitude of Linear Distortion Transfer
Function (DL(z)), (c) Magnitude of Aliasing Distortion Transfer
Function (DA(z)), and (d) the group delay of the analysis and synthesis filter
banks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.25 ilet5 wavelet; (a) Analysis filter magnitude responses, (b) Synthesis filter
magnitude responses, (c) Analysis filter group delay, and (d) synthesis
filter group delay. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.26 Floating model of one-level hybrid IIR/FIR wavelet filter bank for ilet5
wavelet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.27 Magnitude responses of; (a) ilet3 synthesis filters and (b) ilet5 synthesis
filters with floating-point (blue-red) and fixed-point (black-green) coef-
ficients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.28 Magnitude of; (a) linear, and (b) aliasing distortion transfer functions,
and (c) the hybrid filter bank group delay, with floating- (blue) and
fixed-point (red) coefficients for ilet3 wavelet. . . . . . . . . . . . . . . 160
5.29 Magnitude of; (a) linear, and (b) aliasing distortion transfer functions,
and (c) the hybrid filter bank group delay, with floating- (blue) and
fixed-point (red) coefficients for ilet5 wavelet. . . . . . . . . . . . . . . 161
5.30 Four seconds of (a) ECG data record-232 (top figure) and reconstruction
error (bottom figure), and (b) EEG data record-chb14 (top figure) and
reconstruction error (bottom figure). . . . . . . . . . . . . . . . . . . . 163
5.31 (a) The multiplier free architecture of the Hybrid IIR/FIR wavelet filter
bank for ilet3, and (b) the structure of the ReMB. . . . . . . . . . . . . 165
5.32 The structure of the controller designed for generating the ReMB control
signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.33 The multiplier free architecture of the one-level Hybrid IIR/FIR wavelet
filter bank for ilet5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.34 Structure of (a) ReMB0 designed for R0(z) and (b) ReMB1 designed for
R1(z) of the ilet5 Hybrid IIR/FIR wavelet filter bank. . . . . . . . . . . 169
5.35 The controller designed for generating the ReMB control signals. . . . . 170
5.36 The controller designed for generating the ReMB control signals. . . . . 171
5.37 The block diagram of the DWT based denoising method. . . . . . . . . 174
5.38 A 5 second segment of the (a) clean record ‘105’, (b) the generated
EMG noise (c) generated baseline wander and (d) the noisy record with
an SNR of -8 dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.39 Average (a) SNR Improvement (dB), and (b) MSE, after wavelet de-
noising with ilet3, ilet5, db4, db6, db8, sym4 and coif4. . . . . . . . . . 178
B.1 (a) State diagram and (b) logic design, of the 3-bit up/down counter
employed in the design of ilet3 IIR synthesis filter bank to generate the
required addresses for the dual-port RAM. . . . . . . . . . . . . . . . . 210
B.2 (a) State diagram and (b) logic design, of the 4-bit up/down counter
employed in the design of ilet5 IIR synthesis filter bank to generate the
required addresses for the dual-port RAM. . . . . . . . . . . . . . . . . 211
C.1 Seven level analysis filter bank responses of (a) ilet3, (b) ilet5, (c) db4
and (d) db6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
C.1 Seven level analysis filter bank responses of (e) db8, (f) sym4, and (g)
coif4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
xviii
List of Tables
2.1 Normal values of a Lead II ECG features of a healthy subject in sinus
rhythm at 60 BPM [49].
2.2 Typical Adult Human Scalp EEG Waves and Associated Frequency
Ranges and Body Activities [54].
3.1 Variation in the group delay in normalized frequency bands of νB1 =
0–0.0625 and νB2 = 0–0.125.
3.2 Two-path All-pass based HB IIR Filter Characteristics Comparison with
the State-of-the-art.
4.1 Lowpass and Highpass, Analysis and Synthesis Filter coefficients.
4.2 Fixed-point (11-bit) db4 wavelet filter coefficients, their adder costs and
shift-add format used to design the proposed ReMB.
4.3 CSD encoded coefficients and common sub-expressions.
4.4 Select line (S0:S4) values for multiplexers given in Figures 4.13 and 4.15
(b) to generate the lowpass analysis filter coefficients.
4.5 Resource utilization of individual MCM blocks that are designed using
the ReMB, CSE and DAG fusion methods as well as the Xilinx
Multiplier LogiCORE™.
4.6 Resource utilization of the time-multiplexed TDL Filters with the proposed
ReMB and the Xilinx Multiplier after Place and Route on Xilinx
Kintex-7 device.
4.7 Resource Utilization and Power Consumption of the Multiplier Free db4
Filter Bank Architectures.
5.1 Average Error Measures for evaluating the implemented three level IIR/IIR
ilet3 and ilet5 wavelet filter banks.
5.2 The average MSE, QNPSD, and SER metrics obtained for White Gaussian
Noise, ECG and EEG data with ilet3 and ilet5 filter banks.
5.3 The average QNPSD obtained for Approximation (A3, A2, A1) and detail
coefficients (D1, D2, D3) at decomposition levels 3, 2, and 1 for
White Gaussian Noise, ECG and EEG data with ilet3 and ilet5 filter
banks.
5.4 Fixed-point (9-bit) ilet3 and ilet5 wavelet filter coefficients, their adder
costs and shift-add format used to design the constant multiplications.
5.5 Resource Utilization and Power Consumption of the ilet3 and ilet5 Analysis
IIR Filter Bank Architectures.
5.6 Resource Utilization and Power Consumption of the Multiplier Free ilet3
and ilet5 IIR/IIR Filter Bank Architectures.
5.7 Average Error Measures for evaluating the implemented three level hybrid
IIR/FIR ilet3 and ilet5 wavelet filter banks.
5.8 Average Error Measures for Evaluating Three Level Hybrid IIR/FIR
ilet3 and ilet5 Wavelet Filter Banks with Finite-Precision Filter Coefficients.
5.9 FIR Synthesis Filters’ Coefficients for ilet3 and ilet5 Wavelets.
5.10 Control signals for the ReMB designed for R0(z) of ilet3 Hybrid IIR/FIR
wavelet filter bank.
5.11 Control signals for the ReMB designed for R0(z) of ilet5 Hybrid IIR/FIR
wavelet filter bank.
5.12 Control signals for the ReMB designed for R1(z) of ilet5 Hybrid IIR/FIR
wavelet filter bank.
5.13 Resource Utilization and Power Consumption of the Multiplier Free ilet3
and ilet5 IIR/FIR Filter Bank Architectures.
5.14 SNR improvement (dB) and MSE after wavelet denoising the four noisy
ECG records with input SNR of -8 dB.
A.1 Truth tables used to design the db4 lowpass filter h0(k) ReMB controller.
A.2 Truth tables used to design the db4 highpass filter h1(k) ReMB controller.
B.1 Truth tables used to design the ilet3 R0(z) ReMB controller.
B.2 Truth tables used to design the ilet5 R0(z) ReMB controller.




There is a growing demand for managing people with Long Term Conditions (LTCs)
such as cardiovascular diseases, hypertension, diabetes, cancer, and epilepsy. Currently
15 million people in England suffer from one or more such chronic diseases [7]. In
addition, more than 17% of the UK population are aged 65 and over, and this number
is predicted to rise due to the ageing population demographics [8]. The unavoidable
result of this population ageing trend is an increase in the number of chronically
ill patients, which is projected to rise by 3 million within the next two decades [7].
Another inevitable result is the increase in the annual cost of health and social care
services. According to the Department of Health, in 2012 the care and treatment of
patients with LTCs accounted for 50% of general practitioner appointments, 64% of
outpatient appointments, and 70% of healthcare expenditure in England [9].
These facts emphasize the importance of long-term unobtrusive physiological monitoring
for early-stage diagnosis of health issues, as well as for decreasing healthcare
costs by allowing healthcare services to be delivered at the patient’s home without affecting
their daily routine [10]. Thus, they act as significant catalysts for developing clinically
effective and cost-effective healthcare solutions using recent technological advancements.
The rapid improvement in the semiconductor industry enables innovative technologies
to be manufactured and integrated while increasing computational performance and
diminishing sizes. Wearable systems, such as a Wireless Body Area Network (WBAN), can
provide remote care for elderly and chronically ill people, which is also convenient
for the healthcare provider. Utilization of these systems can also deliver supervised
but independent living and improved healthcare quality to the users while reducing
the cost of health services through fewer General Practitioner and
hospital admissions and more efficient chronic disease management [11].
Typical applications of medical wearable systems are continuous LTC monitoring
for chronically ill patients, remote postoperative rehabilitation, daily activity monitoring,
fall and movement detection, location tracking for elderly people, and assessment
and/or enhancement of sporting technical abilities [12].
The benefits of wearable systems also bring new challenges in the development and
acceptance of these systems. Compared to non-ambulatory monitoring systems,
wearable human assistive systems are in close contact with users for a longer time,
and therefore they should not restrict the users’ lives. In addition, because ambulatory
monitoring does not take place in a steady and controlled environment,
wearable systems must operate with optimum performance in real time. While
reliability, comfort, and usability are of great importance, security and acceptance are
equally significant [13].
1.1 Wireless Body Area Network
A WBAN is a wireless networking technology that interconnects miniaturised and
autonomous sensor nodes or routers in, on or around a human body, monitoring body
function during normal activities for sporting, health and emergency applications [14].
During the last few decades WBAN has become a main research focus in medical
diagnostics and personal healthcare due to the growing demand for long-term real-time
health monitoring and proactive healthcare [15, 16]. With advancing technology,
the aim is to replace bulky wired devices with small, low-weight, low-cost, portable
and wireless ones, in order to provide real-time feedback of useful medical information
to medical servers, and mobility and comfort to the users. Smart wearable systems
can measure different physiological parameters including bio-potential signals,
skin and body temperature, heart rate, blood pressure, respiration and blood oxygen
saturation (SpO2) [17].
Wearable WBAN systems can be categorized in two groups according to where the signal
processing takes place: off-site systems acquire raw biomedical data and pass them to a
remote server for further advanced digital signal processing, whereas on-site systems
handle the signal processing locally. In the first group, the physiological monitoring
system acquires the raw data and either stores the data locally for off-line processing
or transmits them through a wireless communication link to a remote site for real-time
processing. Some examples of the research prototypes include LifeGuard [18],
LiveNet [19], and RTWPMS [20]. These
prototypes are capable of measuring multiple body parameters such as blood pressure,
SpO2, heart rate and body temperature. They integrate medical sensors, a wearable
device such as a Personal Digital Assistant (PDA) and a remote device such as a Tablet PC,
with the communication links between the medical sensors and the wearable device
implemented through wires. All of these are bulky wired systems which are not
compatible with the present understanding of the term “wearable” and do not have signal
processing capabilities.
The second group of systems is on-site, whereby the medical sensor nodes process the
acquired data in real time using their on-node processing abilities and either transmit
them to a remote terminal or perform corresponding operations, such as detection
and classification [21]. The challenge with on-site processing is the limited energy
supply and computational capability, which limits the complexity of the algorithms
deployed on the sensor node. This work focuses on the design and development of a
low-power, low-complexity digital signal processor that can be employed in the medical
sensor nodes of an on-site wearable physiological monitoring system.
1.1.1 WBAN General Architecture
A generic WBAN system is composed of three tiers which are the medical sensor,
personal terminal, and remote terminal tiers [13]. The medical sensing tier involves
the medical sensors which acquire physical and chemical quantities, apply real-time
analog and digital signal processing and then transmit processed signals to a personal
terminal via wireless body communication networks. The second tier is usually a base
station or a smart phone which relays information between the medical sensor node
and the remote terminal. Finally, the remote terminal stores or manages the received
data and can link with a medical server, an emergency server and/or informal caregivers
for the required medical response. The architecture of a WBAN system with three tiers is
shown in Figure 1.1.

Figure 1.1: The architecture of a wearable WBAN system, illustrating the typical three tier system
The medical sensing tier generally consists of battery powered medical sensor nodes
including analog and digital components which are desired to have high acquisition and
processing accuracy while being low-cost, small in size, and power efficient. A generic
medical sensor consists of four main functional components as shown in Figure 1.2 [22].




Figure 1.2: Main functional blocks of WBAN sensors; AFE, Σ∆ ADC, DSP, RF communications and
Power Management
The biopotential signals acquired by the electrodes are amplified and preconditioned by
the Analog-Front-End (AFE) block and then digitized via the Analog-to-Digital
Converter (ADC) for further processing. Digital signal processing is usually handled by
a micro-controller and/or a custom designed Digital Signal Processor (DSP). The
digitally processed signals are then transmitted to the second tier via the Radio Frequency
(RF) communication block using Wi-Fi, Bluetooth or ZigBee, which requires a wireless
communication module on the sensor node. Finally, the power management of all these
components is handled by the power management block. Operating lifetime
is one of the main obstacles in sensor design and the highest power is consumed by
the wireless communication block [23,24]. Since the medical sensor has limited computing
power and energy supply, it is necessary to handle some of the signal processing, such
as noise reduction, feature detection and classification, on the sensor nodes in order
to avoid transmission of unnecessary information (e.g. motion artifact). This is a
trade-off between the power consumption of the wireless communication and that of the
signal processing.
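The trade-off can be illustrated with a back-of-the-envelope power budget. The sketch below is only an illustration: the function name, the simple duty-cycle model and every numeric value are assumptions chosen for the example, not measurements from any of the cited systems.

```python
def average_power_uw(p_dsp_uw, p_radio_active_uw, radio_duty_cycle):
    """Average node power (in microwatts) for a radio that is active only
    for a fraction of the time, plus a constant local-processing cost."""
    if not 0.0 <= radio_duty_cycle <= 1.0:
        raise ValueError("radio_duty_cycle must lie in [0, 1]")
    return p_dsp_uw + radio_duty_cycle * p_radio_active_uw


# Hypothetical radio power while transmitting (assumption, not a measured value).
RADIO_ACTIVE_UW = 30_000.0

# Case 1: raw samples are streamed, no local processing, radio on 10% of the time.
raw_streaming = average_power_uw(0.0, RADIO_ACTIVE_UW, 0.10)

# Case 2: local processing costs an assumed 50 uW but reduces the data volume,
# so the radio is on only 1% of the time.
local_processing = average_power_uw(50.0, RADIO_ACTIVE_UW, 0.01)

print(f"raw streaming:    {raw_streaming:.0f} uW")
print(f"local processing: {local_processing:.0f} uW")
```

Under these assumed values, the extra power spent on local processing is far smaller than the power saved by keeping the radio off, which is the situation the paragraph above describes. With a sufficiently expensive processor or a sufficiently cheap radio the balance would tip the other way.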
1.1.2 State-of-the-art WBAN Systems with Local Processing
Over the last two decades, wearable physiological monitoring sensors with embedded
digital signal processing capabilities have been the major research focus in the wearable
health monitoring field, which has produced many research prototypes as well
as commercially available products. With the developing semiconductor industry,
wearable devices embed digital signal processing into the hardware so that local
processing takes place before any data transmission. The advantages of on-sensor signal
processing are: (1) reduced system power consumption, (2) robust and autonomous
results with minimized system latency, (3) increased quality of the recorded signals, and
(4) a decreased amount of data to be transmitted and thus less off-line processing [25]. In
the last decade, different Application Specific Integrated Circuit (ASIC) and
System-on-Chip (SoC) designs have been proposed, each with different analog and digital
processing capabilities.
Some designs are implemented for multi-signal processing and can be deployed in various
applications. In [23] a digital signal processing platform to be deployed in ambulatory
monitoring systems was proposed. The platform provides power and performance
reconfigurability, which allows it to be employed in different biopotential signal
processing applications, and has four operation modes: (1) high performance, (2) low
power, (3) data collection, and (4) sleep mode. In another study, a biosignal acquisition
system is proposed that consists of three separate chips for the AFE, DSP and transceiver,
fabricated in a TSMC 0.18 µm standard CMOS process [26]. The total power
consumption is 1.5 mW at a supply voltage of 1.2 V, of which the digital back end
processor accounts for 460 µW.
On the other hand, several studies have presented signal specific DSP designs that
concentrate mainly on Electroencephalogram (EEG) and Electrocardiogram
(ECG) applications. Verma et al. [27] presented a SoC for continuous real-time
detection of seizure onset in epilepsy patients. Each EEG channel is served by one SoC,
which comprises an AFE, ADC and DSP. The DSP performs local feature extraction,
which is followed by wireless transmission of the extracted data for classification.
It is reported that each SoC is implemented in a 0.18 µm CMOS process
and consumes 120 µW on a supply voltage of 1 V. In another study, an 8-channel
scalable EEG acquisition SoC which integrates an AFE, a personalized seizure onset
detector and a classifier is presented [28]. It is implemented in 0.18 µm CMOS technology
and shows an energy efficiency of 2 µJ/classification at 128 classifications/s. In [29]
a prototype device is presented that assembles a Micro-Electro-Mechanical Systems (MEMS)
sensor for nasal airflow measurement, an Integrated Circuit (IC) for sleep apnea detection
and autonomous scoring by applying time-domain signal processing, and a wireless
transmitter. The IC chip is fabricated in 0.5 µm CMOS technology and integrates an AFE,
a breathing rhythm detection block and a time-to-digital converter. The IC chip dissipates
33 µW on a 5 V supply voltage. Page et al. [30] demonstrated a Field-Programmable
Gate Array (FPGA) implementation and an ASIC design in 65 nm CMOS of a personalized
seizure detector and classifier. The ASIC dissipates between 37 nW and
77 nW when running at its nominal and maximum frequencies of 484 Hz and 1 GHz,
respectively, with a 1 V supply voltage. In more recent studies, neural signal processors
are demonstrated where Discrete Wavelet Transform (DWT) is employed for various
purposes such as feature extraction and data compression. In [31], a multiplier free
DWT processor for a 32-channel neural recording system is presented which is imple-
mented in a 0.18 µm CMOS process. The processor is responsible for data compression
through a four-level DWT decomposition and is reported to consume 37 µW at 6.4
MHz operating frequency. Finally, another 32-channel neural signal processor mapped
on 0.18 µm CMOS technology is reported by Yang et al. in [32] with spike detection,
feature extraction and spike classification capabilities. The processor employs Haar
DWT based feature extraction and consumes 24 µW at 160 kHz operating frequency.
In addition to EEG applications, the following are some examples of ECG
processors from the open literature. In 2011, Liu et al. [33] proposed an ECG
signal-processing ASIC in 0.18 µm CMOS technology, employing a wavelet transform
algorithm for real time artefact removal, QRS detection and heart rate
prediction and classification, with a total power consumption of 29 µW on a supply
voltage of 1 V. In [34] a 3-lead wireless ECG SoC integrating an AFE, an RF transceiver
and an application specific micro-controller that can be programmed to perform heart
rate detection is demonstrated. The SoC is implemented in 0.13 µm CMOS technology
and can operate in two modes, heart rate detection and raw data transmission,
during which the power consumptions are 17.4 µW and 74.8 µW respectively on
a supply voltage of 0.7 V. In another study, Min et al. [35] presented an ECG detector
chip fabricated in 0.35 µm CMOS technology employing an ADC, to be deployed in
implantable cardiac pacemakers. The detector is based on the wavelet transform and
consumes 19 µW on a supply voltage of 3 V. In [36] an ECG processor capable of real-time
ECG recording and QRS, P and T wave detection that employs wavelet transform is
proposed. It is implemented in 0.18 µm CMOS technology and consumes 457 nW at
0.5 V supply. An event-driven ADC with real-time QRS detector, in 0.13 µm CMOS
technology, to be employed in ambulatory ECG monitoring applications is presented
by Zhang et al. [37]. The detector employs two algorithms; (1) pulse triggered and (2)
time-assisted pulse triggered QRS detection, and together with ADC the detector con-
sumes 220 nW at 300 mV supply. Kim et al. [38] described a mixed signal SoC which
integrates AFE and a custom designed DSP for continuous and real-time 3 channel
ECG acquisition, heart beat detection using wavelet transforms and motion artefact
removal as well as electrode-tissue impedance measurement. The proposed SoC can
operate in three modes, which are data acquisition, heart beat detection and accurate R
peak detection. The digital back end can be programmed to perform two motion artefact
removal algorithms (i.e. adaptive Least Mean Square (LMS) filtering and Principal
Component Analysis (PCA)) prior to accurate R peak detection. It is implemented in
0.18 µm CMOS technology and power consumption is between 32 µW and 82.4 µW
at a supply voltage of 1.2 V depending on the selected mode. Zou et al. [39] presented
an ASIC fabricated in TSMC 65 nm CMOS technology, for ECG acquisition, data
compression and R-peak detection applications. The ASIC embeds data compression
and R peak detection algorithms based on wavelet transform. The chip is tested in
two modes with two different supply voltages. The first mode is the recording mode, in
which the approximation of the measured signals is stored in the memory bank and
compressed before transmission. In this mode the ASIC consumes 49 µW and 312 µW
with 0.7 V and 1.2 V supplies, respectively, at an operating frequency of 9 kHz. The second
mode is the detection mode, where a detection algorithm is applied to the acquired
signal and the output is compressed for wireless transmission. The second mode
dissipates less power than the first: 33 µW and 233 µW for 0.7 V and 1.2 V
supply voltages, respectively. By contrast, the transmitter on its own consumes 37.2 mW
with a 2 V supply voltage. In [40] an ASIC in 65 nm CMOS technology that implements
adaptive ECG feature extraction and delineation algorithms is presented.
The total power consumption of the ASIC is reported as 614 µW. An improved version
of this processor is presented in [41] which provides additional features like statistical
analysis based feature extraction and classification based on naive Bayes classifier. A
similar technology is used for implementation and the power consumption is reported
to be 2.78 µW at 10 kHz operating frequency with a supply voltage of 1 V. Furthermore,
in a more recent study an efficient ECG processor is designed for QRS detection
purposes which employs Haar wavelet based DWT with a multiplier free structure. It
is implemented in 0.18 µm CMOS technology and reported to consume 410 nW with
a supply voltage of 1 V [42].
The aforementioned systems are all capable of achieving low power consumption; however,
it is difficult to make a fair comparison between them due to the different
technologies, operating frequencies, and numbers of channels they use, as well as
the modalities they employ. Although there are many other solutions besides the
examples provided, there is still room for optimization in hardware and power
consumption, which will enable the use of sophisticated techniques in mobile health
monitoring systems. As reported in [42], DWT filter banks with higher
filter orders than the Haar wavelet filters are avoided due to the stringent hardware and
power constraints of portable medical devices. This motivates the
investigation of different techniques for implementing low-complexity DWT filter
banks with different wavelet functions.
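One simple, and admittedly incomplete, way to see why the raw power figures are hard to compare is to normalise each reported power by its operating frequency, giving an energy per clock cycle. The sketch below uses only power and frequency values quoted in the review above; the normalisation itself is an illustrative choice made here, not a metric used by the cited works, and it ignores technology node, supply voltage, channel count and functionality.

```python
def energy_per_cycle_pj(power_w, clock_hz):
    """Energy per clock cycle in picojoules: E = P / f."""
    return power_w / clock_hz * 1e12


# (reported power in W, reported operating frequency in Hz), as quoted above.
reported = {
    "[31] DWT processor": (37e-6, 6.4e6),
    "[32] neural processor": (24e-6, 160e3),
    "[41] ECG processor": (2.78e-6, 10e3),
}

energy = {name: energy_per_cycle_pj(p, f) for name, (p, f) in reported.items()}

for name, value in energy.items():
    print(f"{name}: {value:.2f} pJ per cycle")
```

Ranked by power alone, [41] is the lowest of the three and [31] the highest; ranked by energy per cycle, the order is reversed. Neither ranking is a fair comparison on its own, which is the point made in the paragraph above.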
1.2 Objectives
Wearable health monitoring systems have very stringent area and power requirements
as they are desired to be small and to operate for a long time. As mentioned
above, the highest power is consumed by the wireless communication block [23, 24]. Since
the medical sensor has limited computing power and energy supply, the aim over the last
decade has been to incorporate as much local signal processing as possible in order to
reduce the amount of data transmitted. Therefore, the aim of this research is to
investigate alternative
solutions to the state-of-the-art in the biomedical signal processing literature in order
to reduce the hardware complexity of a biomedical DSP while reducing its power
consumption. Based on the literature review, the objectives for this research can be
listed as follows:
• To investigate the applicability of infinite impulse response filters in the decimation
chain of a biomedical signal acquisition system due to their hardware simplicity
and low power consumption. There is a trade-off between hardware simplicity
and the non-linear phase. Thus, the subsequent aim is to investigate
and evaluate the non-linear phase effects of the all-pass based infinite impulse
response filters.
• To analyse the implementation cost of the wavelet transform and investigate a more
hardware and power efficient implementation alternative. The wavelet transform
is a popular tool employed in biomedical applications for decomposing signals
into different frequency bands where noise reduction and detection algorithms
can be applied. Furthermore, reconfigurable multiplier blocks have been used
in fixed coefficient filtering and transforms, where they significantly reduced hardware
utilization for FPGA implementations. Thus, employing reconfigurable multiplier
blocks in wavelet transform filters is expected to contribute to power reduction of the
transform in biomedical applications.
• To investigate, design and implement an infinite impulse response filter based wavelet
transform that can be employed in biomedical signal processing applications, since
infinite impulse response filters are hardware efficient. They
are well known to achieve similar or better filtering
performance with lower filter orders compared to finite impulse response filters.
Low order filters lead to reduced hardware complexity
and power consumption since the number of power hungry arithmetic operations
is smaller. Furthermore, the better frequency selectivity that can be achieved with
simple structures provides an advantage; therefore, it is also aimed to investigate
the denoising performance of this novel implementation of the wavelet
families in the analysis of biopotential signals.
• The IIR wavelet analysis filter banks exhibit non-linear phase, which introduces
unwanted distortions in the reconstructed data. A typical FIR based perfect
reconstruction filter bank overcomes this problem by simply time-reversing the
filter coefficients, which eliminates the non-linear phase of the analysis filter bank.
However, due to the feedback path existing in the IIR filter structures, the
time-reversal of the filter coefficients is not a straightforward operation.
Therefore, the subsequent aim of this thesis is to overcome the phase non-linearity
of the IIR wavelet filter banks by investigating different approaches.
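The time-reversal property mentioned in the last objective can be checked numerically for the FIR case. The following is a minimal sketch, assuming a one-level two-channel orthogonal filter bank built from the 4-tap Daubechies (db2) lowpass filter, chosen only to keep the example short; the helper names, the test signal and the quadrature-mirror sign convention for the highpass filter are assumptions of this sketch rather than details taken from the thesis.

```python
import math


def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def downsample(x):
    """Keep the even-indexed samples."""
    return x[::2]


def upsample(x):
    """Insert a zero after every sample."""
    y = [0.0] * (2 * len(x))
    y[::2] = x
    return y


# 4-tap Daubechies (db2) analysis lowpass filter.
s3 = math.sqrt(3.0)
h0 = [c / (4.0 * math.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]
L = len(h0)
# Analysis highpass filter via the alternating-flip (QMF) relation.
h1 = [(-1) ** n * h0[L - 1 - n] for n in range(L)]
# Synthesis filters are simply the time-reversed analysis filters.
g0 = h0[::-1]
g1 = h1[::-1]


def analyse(x):
    return downsample(convolve(x, h0)), downsample(convolve(x, h1))


def synthesise(approx, detail):
    low = convolve(upsample(approx), g0)
    high = convolve(upsample(detail), g1)
    return [p + q for p, q in zip(low, high)]


x = [1.0, 4.0, -2.0, 3.0, 0.5, -1.5, 2.0, 7.0]
approx, detail = analyse(x)
y = synthesise(approx, detail)

delay = L - 1  # the cascade behaves as a pure delay of L - 1 samples
reconstructed = y[delay:delay + len(x)]
max_error = max(abs(r - s) for r, s in zip(reconstructed, x))
print(f"maximum reconstruction error: {max_error:.2e}")
```

The output equals the input delayed by L - 1 samples, up to floating-point rounding. An IIR analysis filter has no finite coefficient list to reverse in this way, since its time-reversed impulse response is non-causal, which is the difficulty the objective refers to.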
1.3 Novel Contributions of the Work
The novel contributions of the research conducted and resulting publications are listed
below.
• This work presents the novel use of a decimation filter chain incorporating two-path
HB all-pass based IIR filters in ECG data acquisition systems with high filtering
performance, a completely multiplier free hardware structure and low power
consumption. The proposed structure substantially reduces hardware complexity
compared to the existing decimation filters used in this field, which in turn reduces
power consumption.
• The non-linear phase effects of the described filters on ECG signals, which may
introduce signal distortions leading to misdiagnosis, are investigated, which, to the
best of the author’s knowledge, is a first in the open literature. This work concludes
that the proposed design introduces minimal distortion to the signal of interest
and would not affect critical diagnosis, and thus that this is an efficient approach
for the decimation process in high resolution biomedical data conversion and
acquisition applications.
• A novel Reconfigurable Multiplier Block (ReMB) structure targeting FPGA
technologies is proposed for 8-tap Daubechies filters, which have been widely employed
in biomedical signal processing applications. Unlike the state-of-the-art, this structure
employs Look-up Tables (LUTs) with an increased number of inputs, which increases
the reconfigurability of the system. The proposed structure substantially reduces
hardware complexity compared to a general purpose multiplier, which in turn reduces
power consumption. The method introduced can also be employed in the
design of other wavelet functions.
• An IIR filter based Discrete Wavelet Transform for biomedical signal processing
applications is investigated, together with the effect of the phase non-linearity of the
analysis IIR Wavelet Filter Bank. In the open literature, IIR wavelets have not been
used in real-time continuous data processing applications which require perfect
reconstruction, due to the problems arising from their non-linear phase. In this
thesis, a novel solution of Hybrid IIR/FIR Wavelet Filter banks is proposed which
enables the use of IIR wavelets in continuous biomedical signal processing applications
while achieving minimal computational complexity, power consumption
and near-perfect reconstruction. As a result of this research:
– Two novel hybrid IIR/FIR wavelet filter banks are proposed and designed
to be deployed in biomedical signal processing applications.
– The floating- and fixed-point structures of the proposed hybrid filter banks
are modelled in which the quantization noise effects in the filter bank datapath
and filter datapath itself are investigated.
– The proposed hybrid systems are implemented on an FPGA device using
the novel method of designing ReMBs which further reduces the complexity
of the proposed systems.
• Novel use of an IIR filter based Wavelet Transform for ECG signal denoising is
proposed. The denoising performance of the proposed filter banks is compared with
that of the conventional FIR based wavelet transform. The proposed designs provide
better performance in terms of the improved Signal-to-Noise Ratio (SNR) and Mean
Square Error (MSE). IIR wavelets are shown to achieve better frequency selectivity
than the conventional FIR wavelets, which provides a significant advantage in
denoising applications.
• List of publications relevant to this work:
– Y. Eminaga, A. Coskun, S. A. Moschos and I. Kale, “Low Complexity All-
Pass Based Polyphase Decimation Filters for ECG Monitoring” in 11th Con-
ference on Ph.D. Research in Microelectronics and Electronics (PRIME),
IEEE, 2015, pp. 322-325.
– Y. Eminaga, A. Coskun, and I. Kale, “Two-Path All-pass Based Half-Band
Infinite Impulse Response Decimation Filters and the Effects of Their Non-
Linear Phase Response on ECG Signal Acquisition”, Journal of Biomedical
Signal Processing and Control, vol. 31, pp. 529-538, 2017.
– Y. Eminaga, A. Coskun, and I. Kale, “Multiplier Free Implementation of
8-tap Daubechies Wavelet Filters for Biomedical Applications”, in New Gen-
eration of CAS (NGCAS), IEEE, 2017, pp. 129-132.
– Y. Eminaga, A. Coskun, and I. Kale, “Area and Power Efficient Implementation
of db4 Wavelet Filter Banks for ECG Applications Using Reconfigurable
Multiplier Blocks” in 4th International Conference on Frontiers of Signal
Processing (ICFSP), IEEE, 2018, pp. 65-68.
– Y. Eminaga, A. Coskun, and I. Kale, “IIR Wavelet Filter Banks for ECG
Signal Denoising” in Signal Processing: Algorithms, Architectures, Arrange-
ments, and Applications (SPA). IEEE, 2018, pp. 130-133.
– Y. Eminaga, A. Coskun, and I. Kale, “Hybrid IIR/FIR Wavelet Filter Banks
for ECG Signal Denoising” in Biomedical Circuits and Systems Conference
(BioCAS), IEEE, 2018, pp. 227-231.
• List of awards received from the attended conferences:
– The Gold Leaf Certificate awarded for the best written and presented paper
in the 11th Conference on PhD Research in Microelectronics and Electronics
(PRIME), 2015.
– The Best Presentation Certificate awarded in the Doctoral Conference or-
ganized by the Faculty of Science and Technology at the University of West-
minster, 2017.
– The Excellent Oral Presentation Certificate awarded for the best presented
paper in the 4th International Conference on Frontiers of Signal Processing
(ICFSP), 2018.
1.4 Outline of the Thesis
This thesis consists of six subsequent chapters.
Chapter 2 is composed of two main parts. The first part introduces background
information on the physiological signals that are most commonly recorded for diagnostic
purposes inside and outside the clinical environment. This is further divided into three
sections providing detailed information about the physiological origin, morphological
characteristics, recording techniques and applications of the ECG, EEG and
Electromyogram (EMG) signals. The second part provides a review of wavelet
theory and explains its main differences from, and advantages over, the Short-Time Fourier
Transform (STFT). Here the two main types of wavelet transform (WT), the Continuous
Wavelet Transform (CWT) and the DWT, are introduced, and the DWT is described in
terms of digital filter banks.
Chapter 3 presents the proposed decimation filter chain that comprises a Slink filter,
two two-path all-pass based HB IIR filters and a Slink roll-off compensator. A review
of the state-of-the-art decimation filters used in biomedical applications is presented,
followed by a detailed description of the decimation filter chain structure and the
magnitude characteristics of its stages. A brief description of phase linearity and
group delay is provided. The phase characteristics of the two-path all-pass based HB IIR
filter are analysed and the phase compensation method used is described. The chapter
further describes the ECG databases used for simulation purposes and concludes with
a discussion of the simulation results.
Chapter 4 presents the proposed ReMB structure for 8-tap Daubechies wavelet transform
filters, which replaces the multiplication operation with a reconfigurable shift-add
network using multiplexers optimized for FPGA implementation. The chapter provides
a brief introduction to the different methods used for multiplier-block implementation
and gives some examples from the open literature. The method used for implementing
the proposed design is described, followed by the implementation results in terms of
hardware utilization and estimated dynamic power consumption. The proposed design
is compared with alternative designs achieved through state-of-the-art methods and with a
general purpose multiplier. In addition, the comparisons are extended to designs from the
open literature implemented for the db4 filters in order to evaluate the efficiency
of the proposed system. Chapter 4 concludes with discussions and conclusions.
Chapter 5 provides an introduction on how to employ IIR filters for the wavelet transform
in biomedical applications. A procedure for designing IIR wavelets is presented, where
detailed mathematical steps are provided along with design examples. Furthermore, the
synthesis filter bank implementation problems due to the need for anti-causal filtering
are addressed, and two methods are presented to overcome this problem. This chapter
provides detailed design and fixed-point implementation considerations regarding the
analysis and synthesis filter banks of the IIR wavelets as well as the computational
complexity and the estimated power dissipation of the proposed systems. In addition,
the application of the designed wavelet filter banks in an ECG signal denoising scenario
is presented and their performance is compared to that of the state-of-the-art wavelet
filter banks. The chapter ends with a conclusions section.
Chapter 6 provides the summary and the evaluation of the study presented in this
thesis. The conclusions drawn, possible extensions of this work and the future work
are presented in detail.
1.5 Chapter Conclusion
Chapter 1 provided an introductory knowledge on the WBAN systems and the chal-
lenges faced by both users and engineers. The aims, contributions and publications





2.1 Introduction to Physiological Signals
Physiological signal monitoring has been one of the most widely employed methods
for patient observation and diagnosis. The monitoring systems are usually used in a
clinical environment where admission of the patient is required for continuous real-time
monitoring. Portable recorders, such as the Holter monitor, have been widely used for
monitoring outside the hospital; however, they do not provide real-time data that can be
used for taking immediate action in emergency situations. As stated in the previous
sections, emerging wireless and low-power technologies allow these monitoring systems
to be employed outside the clinical environment and offer a level of comfort to
the users while supplying real-time diagnostic data to healthcare professionals, the
healthcare system and caregivers.
This chapter provides background information on the most widely used physiological
signals for diagnostic purposes that can be monitored inside and outside the clinical
environment. These signals are the ECG, EEG and EMG waves that can be measured
through the body surface. They are generated by ion exchange through the cell
membrane, which results in a cycle of the cellular potential, known as the Action Potential
(AP), in different types of cells [43].
2.1.1 Electrocardiography (ECG)
A human heart consists of four chambers, the right and left atria and ventricles, which are
responsible for the collection and transport of blood throughout the body. The right
side of the heart collects the deoxygenated blood from the systemic veins and pumps
it towards the lungs whereas the left side collects the oxygenated blood from the pul-
monary veins and pumps it to the rest of the body [44].
Cardiac excitation involves the generation of electrical impulses, referred to as APs,
by individual cells due to the electrical current flow across the cell membranes and
their conduction to neighbouring cells [45]. The conduction system of the heart consists
of three groups of specialized cells which are known as the Sinoatrial (SA) Node,
Atrioventricular (AV) Node and bundle of His. Cells within the SA node, also known
as pacemaker cells, which are located in the right atrium, have the fastest rate of AP
generation and drive the rest of the heart at this rate [46]. Thus, electrical activation
of the heart starts within the SA node, which fires electrical impulses that propagate
through the AV Node and the bundle of His down to the Purkinje fibres [46].
A recorded ECG signal represents the temporal and spatial summation of the action
potentials generated by these specialised groups of cells through the conduction system
during a cardiac cycle [47]. A simplified anatomy of the heart and the generated AP
shapes and durations for specialised cardiac cells are illustrated in Figure 2.1. Here it
is depicted that different parts of the cardiac conduction system give rise to APs with
different shapes and durations, at different times and different locations. In this figure,
the APs generated by each group of cells are colour coded and their contribution to an
ECG waveform (bottom right corner) during a cardiac cycle can be clearly seen. For
instance, the two pink APs generated by the ventricular muscles contribute towards
the generation of the QRS complex and the T wave which are important features of















[Figure 2.1 image: AP traces labelled Atria and Ventricles; horizontal axis: Time, msec (0 to 700)]
Figure 2.1: Conduction system of the heart and generation of an ECG signal by the temporal and
spatial summation of APs [47].
ECG Waveform Morphology
ECG signals reflect the cardiac activation of the heart measured between any two points
on the body surface using electrodes. A standard clinical ECG signal is composed of
waves at different frequency bands, each reflecting the electrical activation of different
parts of the heart. These waves are known as the P waves, QRS complex (combination
of Q, R and S waves), and T waves, representing the atrial depolarization, ventricular
depolarization and repolarization, respectively. The segments and intervals between
these waves such as ST segment as well as PQ/PR interval, QRS width, QT and RR
interval also represent different cardiac events and carry diagnostic information. Figure
































Figure 2.2: Time domain features of an ECG signal: P, QRS, and T represent atrial depolarization,
ventricular depolarization, and atrial and ventricular repolarization, respectively [48].
These waves have extremely low amplitudes, ranging from 100 µV to 5 mV, and a low
diagnostic frequency bandwidth, from 0.05 to 100 Hz [49]. The standard clinical
features of the ECG waves for a healthy adult male in sinus rhythm are presented in
Table 2.1 [49] and are dependent on several factors such as age, gender, heart rate,
respiration patterns and diseases [49].
Table 2.1: Normal values of Lead II ECG features of a healthy subject in sinus rhythm at 60
BPM [49].
Feature           Normal Value   Normal Limit
P width           110 ms         ± 20 ms
PQ/PR interval    160 ms         ± 40 ms
QRS width         100 ms         ± 20 ms
QT interval       400 ms         ± 40 ms
ST segment        70 ms          ± 10 ms
P amplitude       0.15 mV        ± 0.05 mV
QRS height        1.5 mV         ± 0.5 mV
ST level          0 mV           ± 0.1 mV
T amplitude       0.3 mV         ± 0.2 mV
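The normal ranges in Table 2.1 can be turned into a simple range check. The sketch below is only an illustration: it assumes that the Normal Limit column is a symmetric tolerance around the Normal Value, and the names `NORMAL_ECG`, `out_of_range` and the example measurements are hypothetical and not part of the thesis.

```python
# Normal values and limits transcribed from Table 2.1 (Lead II, 60 BPM).
# Durations are in milliseconds, amplitudes in millivolts.
# Assumption: the limit is a symmetric tolerance around the normal value.
NORMAL_ECG = {
    "P width": (110.0, 20.0),
    "PQ/PR interval": (160.0, 40.0),
    "QRS width": (100.0, 20.0),
    "QT interval": (400.0, 40.0),
    "ST segment": (70.0, 10.0),
    "P amplitude": (0.15, 0.05),
    "QRS height": (1.5, 0.5),
    "ST level": (0.0, 0.1),
    "T amplitude": (0.3, 0.2),
}


def out_of_range(measured):
    """Return the names of features lying outside normal value +/- limit."""
    flagged = []
    for name, value in measured.items():
        nominal, limit = NORMAL_ECG[name]
        if abs(value - nominal) > limit:
            flagged.append(name)
    return flagged


# Hypothetical measurements, not taken from the thesis or from any database.
example = {"QRS width": 135.0, "QT interval": 410.0, "ST level": 0.05}
flagged = out_of_range(example)
print(flagged)
```

A check of this kind only flags candidates for closer inspection; as the text notes, the normal values themselves depend on age, gender, heart rate and other factors.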
ECG Recording Techniques and Applications
Standard clinical ECG instrumentation consists of ten surface electrodes placed on
the chest and limbs. This system records the 12-lead ECG, which is the most
commonly used method for diagnostic purposes. In wearable WBAN systems, usually
1-lead ECGs are recorded with two electrodes and 3- or 6-lead ECGs with three electrodes,
in order to minimize the system complexity and power dissipation [34]. The ECG is a
fundamental component in patient monitoring and diagnosis; thus accurate ECG signal
acquisition and its precise analysis are of great importance and have been the subject of
numerous research works [33–41,50]. Diagnosis of cardiac health conditions mostly
relies on the assessment of ECG data based on the interbeat timing and wave amplitudes.
Different types of arrhythmias can be distinguished by the morphological and beat-
to-beat interval variations and/or missing beats. These morphological abnormalities
sometimes can be fatal and often occur sporadically which require long-term and real-
time monitoring [26, 33, 34, 38, 40]. ECG is also employed in cardiac implants such as
Pacemakers. The detection of abnormalities is used for triggering stimulations by the
implanted device in order to regulate functioning of the heart [35].
2.1.2 Electroencephalography (EEG)
The nervous system is responsible for the collection, transmission and processing of
information from various body parts and comprises two subsystems. The first is the Central
Nervous System (CNS), which comprises the brain and spinal cord. The cerebral cortex
is the largest part of the brain, which is located at the outermost layer and consists of
two hemispheres responsible for vital functions such as movement, perception, learning
and speaking [51]. Each hemisphere is divided into four cortical lobes: (1) occipital,
(2) temporal, (3) parietal and (4) frontal lobes. Figure 2.3 shows one cortical half with
the four major lobes. The frontal lobe is responsible for the control of voluntary
movements such as speaking and finger movement, whereas hearing, seeing and sensing are
controlled by the temporal, occipital and parietal lobes, respectively [51]. The second
subsystem is the Peripheral Nervous System (PNS), comprising the nerves responsible
for transmitting information between the CNS and other organs [52].
Figure 2.3: Lateral view of the cerebral cortex with occipital, temporal, parietal and frontal lobes [52].
EEG is the detection and recording of the electrical activity of the brain along the
scalp, produced by the firing of the cortical neurons, which are the main functional
cell type in the cerebral cortex [53]. Similar to the cardiac cells, a positive deflection in
the membrane potential across the cortical neurons results in the generation of APs, which
propagate through the PNS, creating a communication link between the CNS and PNS
peripherals. The electrical activity of a single cortical neuron does not have sufficient
strength to be recorded by surface electrodes, since it is attenuated by thick layers of
tissue (fluids, bones, and skin) around the cerebral cortex [51]. Thus, a recorded EEG
signal represents the summation of the synchronous activity of numerous cortical neurons.
The brain is the driving force in the nervous system and many brain disorders are
diagnosed by critical evaluation of the recorded EEG signals; therefore accurate recording
of the brain activity is a significant task.
EEG Rhythms and Waveforms
Typical adult human scalp EEG signals have amplitudes ranging from 10 to 300 µV
and a frequency spectrum ranging from 0 to 100 Hz, which can vary between
individuals as well as with the state of the brain activity [54]. Figure 2.4 presents 60
































Figure 2.4: Example EEG recordings at (a) frontal, (b) temporal, (c) parietal and (d) occipital lobe,
which were recorded while subject performed different motor/imagery tasks [55].
The EEG signal comprises five major waves with different frequency ranges that
are associated with different brain activities. These waves are also referred to as rhythms
due to their oscillatory and repetitive behaviour. As shown in Table 2.2, when the
brain is in an active state, such as thinking and focusing, the cerebral cortex is actively
processing information; therefore, the rhythms observed during these activities have high
frequency. Consequently, due to the high frequency, the cortical neurons’ activation is
asynchronous, which results in low-amplitude rhythms (Beta, Gamma). On the contrary,
the inactive state of the brain, such as drowsiness or deep sleep, leads to low-frequency
but high-amplitude rhythms (Delta, Theta, Alpha).
Table 2.2: Typical Adult Human Scalp EEG Waves and Associated Frequency Ranges and Body
Activities [54]
EEG Wave Primarily Associated Brain Activity Frequency Band (Hz)












Gamma (γ) Event-related synchronization (ERS) 30 - 45 
Recording Techniques and Applications
The EEG can be recorded either non-invasively or invasively. Non-invasive recording
is achieved by placing surface electrodes on the scalp, whereas invasive recording is
achieved by directly placing subdural electrodes into the brain tissue. The invasive EEG
recording is also known as Electrocorticogram (ECoG) [51]. The ECoG signals are
usually less contaminated with artefacts and have higher amplitudes; however, ECoG is an
invasive method and not suitable for ambulatory applications. Non-ambulatory clinical
EEG recording systems commonly use a standardized method for electrode placement,
also known as the International 10/20 system, in which 64 or more electrodes are placed
along the scalp [56]. On the other hand, ambulatory EEG recording systems usually
employ fewer electrodes due to the power and hardware limitations [27–30].
EEG is an important measure for seizure detection applications in epilepsy patients,
diagnosis of sleep disorders [29, 57] as well as development of brain-computer inter-
face [58,59]. Detection of the onset of epileptic seizures can be a challenging task and
necessitates long-term monitoring, traditionally in a controlled clinical environment
where the patient has to be motionless for a prolonged period of time. This causes
discomfort to the patient, increasing the demand for more convenient methods that can be
easily used. These include off-line [60,61] and on-line recording systems [27,28,30,62],
both offering extended recording time with satisfactory signal quality while increasing
level of patients’ comfort and reducing health expenses.
2.1.3 Electromyography (EMG)
EMG is a technique for recording the electrical activity of the skeletal muscles during
voluntary and reflexive movements. On receiving an impulse from the CNS, skeletal
muscle fibres are activated, generating muscle fibre APs propagating along the length
of fibres. Spatial and temporal summation of APs from different muscle fibres results in
electrical potential that can be measured between two points on the body surface [63].
Figure 2.5 presents a schematic view of the activation of muscle fibres.
Figure 2.5: A schematic view of the activation of muscle fibres by the CNS impulse [52].
EMG Signal Characteristics
The amplitude and frequency of EMG signals range from 100 µV to 90 mV and from
25 Hz to several kilohertz, respectively, depending on the muscle group of interest [64].
An example EMG obtained from finger movements of a healthy subject generated by
a two-channel surface EMG system is presented in Figure 2.6.















Figure 2.6: Example recording of EMG obtained from individual and combined finger movements of
a healthy subject. Each five second interval (indicated with dashed lines) represents different finger’s
movement [65].
EMG Recording Techniques and Applications
As with EEG, EMG can be recorded invasively and non-invasively using needle and
surface electrodes, respectively. Invasive EMG recording is achieved by direct place-
ment of needle electrodes in specific muscle fibres of interest and has the advantage
of being less prone to artefacts. However, invasive measurements are impractical and
discomforting for patients. On the other hand, EMG recording via surface electrodes is
convenient for ambulatory wearable applications where no clinical admission is required
and patients can function in their comfort zone. However, surface EMG signals have
weaker signal amplitudes due to the attenuation by extra tissue on the signal path and
thus, they are more likely to be contaminated by artefacts. This necessitates improved
acquisition and processing methods for increased EMG signal interpretation accuracy
both for diagnostic and research purposes.
Wireless EMG sensors are being deployed in a wide range of research areas along with
the traditional diagnostic applications [65–67]. These include gesture recognition,
prosthetic control, and rehabilitation [68–73]. These research outputs offer multi-channel
recording modalities for long-term monitoring, while Brunelli et al. [71] employ
on-node signal processing capabilities.
2.2 Introduction to Wavelet Theory
Over the past three decades, WT has been a very popular tool for time-frequency
domain analysis of non-stationary signals whose spectral content varies in time. WT
provides adaptive, multiresolution analysis, which makes it a very successful signal
processing technique that has been used in various fields including biomedicine, geophysics,
telecommunications, and image and video coding. In Fourier analysis, a time domain
signal is expanded onto orthogonal basis functions, sine and cosine waves, in order to
define its spectral content. Although sinusoids have perfect compact support in the
frequency domain and therefore provide very good frequency localization, they are global
(i.e. have infinite support) in time. This means that a frequency component observed in the
frequency domain cannot be localized in the time domain. In some applications such as
biomedical diagnostics, it is crucial to localize frequency variations over time, thus the
Fourier Transform (FT) cannot be used in such applications to analyse non-stationary
signals where both time and frequency information is critical. An alternative solution
to this problem was introduced by Gabor [74] in 1946 known as the STFT. In this
method a signal is segmented into short intervals using a windowing technique and the
FT is applied on each signal segment. The segments of a non-stationary signal are
treated as stationary whose statistics remain unchanged for their duration. However,
this method suffers from the lack of resolution due to the fixed window length through-
out the analysis. Many real world signals’ spectral contents change rapidly, therefore
finding an appropriate short-time window during which signal is stationary can be a
challenging task. Choice of a very narrow window results in poor frequency resolution
(i.e. localization), whereas a wide window results in poor time resolution and may
invalidate the stationarity assumption within the window. The drawback of STFT was
overcome by the introduction of WT, which analyses signals with varying window lengths.
WT provides a multiresolution analysis by means of shrinking the window length at
high frequencies and expanding it at low frequencies. Alfred Haar [75] was the first
to present the concept of wavelets in 1909, by introducing the simplest wavelets, known as
the Haar wavelets [76]. Following Haar, many applied physicists and mathematicians
such as Morlet [77], Grossman [77] and Meyer [78] studied wavelets as an alternative to
Fourier based analysis techniques for many years. However, it was not until the late 1980s
that the connection between wavelets and signal processing was established [79, 80].
Since then WT has attracted great research interest, especially in the field of digital
signal/image processing.
2.2.1 Short-Time Fourier Transform
The well-known FT is a mathematical tool that is used to transform a time domain
signal $x(t)$ into the frequency domain and is defined as
$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \tag{2.1}$$
where $\omega = 2\pi f$. As (2.1) states, FT expands any signal onto orthogonal basis
functions of sinusoids. However, FT does not provide information on how the frequency
contents of the signal vary with time. In other words, one cannot determine whether
the frequency information of a signal obtained through FT is continuously present
throughout the time of observation or only at certain intervals, which can be easily
seen in the time-domain representation [81].
Now, let us consider a signal $x_s(t)$ given in (2.2), which is a multi-tone sinusoid with
frequencies 20 Hz, 100 Hz, 200 Hz and 400 Hz.
$$x_s(t) = \cos(2\pi 20 t) + \cos(2\pi 100 t) + \cos(2\pi 200 t) + \cos(2\pi 400 t) \tag{2.2}$$
The signal has constant frequency content over an indefinite time span; thus, $x_s(t)$ is
stationary. Another multi-tone signal $x_{ns}(t)$ over a one second observation window,
with the same frequency content as $x_s(t)$, is given in (2.3).
$$x_{ns}(t) =
\begin{cases}
\cos(2\pi 20 t), & 0 < t \le 0.25\ \mathrm{s} \\
\cos(2\pi 100 t), & 0.25 < t \le 0.5\ \mathrm{s} \\
\cos(2\pi 200 t), & 0.5 < t \le 0.75\ \mathrm{s} \\
\cos(2\pi 400 t), & 0.75 < t \le 1\ \mathrm{s}
\end{cases} \tag{2.3}$$
Although $x_s(t)$ and $x_{ns}(t)$ have the same spectral components, the frequency content
of $x_{ns}(t)$ varies over finite time spans within the observation window, which results
in a different time-domain signal, and it effectively becomes non-stationary within that
observation window. The time domain and frequency domain responses of these signals
are given in Figure 2.7.
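The comparison of the two signals can be reproduced numerically. The following is a minimal sketch, assuming a sampling rate of 1 kHz (a value chosen here for illustration, not stated in the text) and evaluating the Fourier sum only at the four tone frequencies; the helper names are hypothetical.

```python
import cmath
import math

FS = 1000                      # assumed sampling rate in Hz (illustrative choice)
N = FS                         # one-second observation window
TONES = [20, 100, 200, 400]    # tone frequencies of (2.2) and (2.3), in Hz


def x_s(n):
    """Stationary signal (2.2): all four tones are present at every instant."""
    t = n / FS
    return sum(math.cos(2 * math.pi * f * t) for f in TONES)


def x_ns(n):
    """Non-stationary signal (2.3): one tone per quarter-second segment."""
    segment = (n - 1) // (N // 4)     # samples with 0 < t <= 0.25 s map to segment 0
    return math.cos(2 * math.pi * TONES[segment] * n / FS)


def dft_magnitude(signal, freq):
    """Magnitude of the Fourier sum of signal(n), n = 1..N, at one frequency."""
    acc = sum(signal(n) * cmath.exp(-2j * math.pi * freq * n / FS)
              for n in range(1, N + 1))
    return abs(acc)


spectrum_s = {f: dft_magnitude(x_s, f) for f in TONES}
spectrum_ns = {f: dft_magnitude(x_ns, f) for f in TONES}
print(spectrum_s)    # equal peaks at all four tones
print(spectrum_ns)   # peaks at the same four tones, each a quarter of the height
```

Both spectra peak at the same four frequencies. The shorter duration of each tone in $x_{ns}(t)$ shows up only as a lower peak, so the spectrum alone does not reveal when each tone occurs.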




















Figure 2.7: Time and frequency domain responses of a (a) Stationary and (b) Non-stationary signal.
Carrying out a spectral analysis using the FT with a one-second window size provides
little information about the time-domain characteristics of the signal $x_{ns}(t)$. Thus,
shorter window lengths are required, with the assumption that the signal will be
stationary in that duration. In applications where the time information as well as the
frequency information of the signal is needed, STFT can be used to localize frequency
variations over time. The STFT of any signal $x(t)$ is defined as
$$X_{STFT}(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, h(t - \tau)\, e^{-j\omega t}\, dt \tag{2.4}$$
where $h(t)$ is the window function. STFT uses a fixed-length window $h(t)$ which is
shifted along the time axis in order to analyse the spectral content of the signal in the
windowed interval. Although this is a popular time-frequency method, it suffers from
a time and frequency resolution trade-off. Due to the fixed-length window, the whole
signal has to be analysed with the same time and frequency resolution. Therefore,
the accuracy of this method is limited by the size and shape of the window function
employed. Biomedical signals are non-stationary and are composed of high frequency
spectral components which are closely spaced in time and long lasting low frequency
components that are closely spaced in frequency. Such high frequency components can
only be analysed with very short duration windows that have poor frequency resolution,
whereas low frequency components require better frequency resolution with a longer
window. Thus, STFT is not the optimal technique for this type of analysis.
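A discrete approximation of the STFT shows the localization and its cost. The sketch below assumes a 1 kHz sampling rate and a rectangular 0.25 s window $h(t)$ moved in non-overlapping steps. Both are illustrative choices rather than values taken from the text, and the function names are hypothetical.

```python
import cmath
import math

FS = 1000                      # assumed sampling rate in Hz
TONES = [20, 100, 200, 400]    # tone frequencies of the non-stationary signal
WIN = 250                      # rectangular window of 0.25 s (assumed shape and length)


def x_ns(n):
    """Non-stationary test signal: one tone per quarter-second segment."""
    segment = (n - 1) // 250
    return math.cos(2 * math.pi * TONES[segment] * n / FS)


def stft_magnitude(start, freq):
    """|X_STFT| for the window covering samples start+1 .. start+WIN."""
    acc = sum(x_ns(n) * cmath.exp(-2j * math.pi * freq * n / FS)
              for n in range(start + 1, start + WIN + 1))
    return abs(acc)


dominant = []
for start in range(0, FS, WIN):            # four non-overlapping window positions
    mags = {f: stft_magnitude(start, f) for f in TONES}
    dominant.append(max(mags, key=mags.get))

time_resolution_s = WIN / FS               # every event is localized to 0.25 s at best
frequency_resolution_hz = FS / WIN         # spacing between resolvable frequencies
print(dominant, time_resolution_s, frequency_resolution_hz)
```

Each window position now reports a single dominant tone, so the time order of the tones is recovered. Halving the window would halve the time resolution figure but double the frequency spacing, which is the fixed trade-off described above.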
2.2.2 Wavelet Transform
Wavelet analysis utilizes wavelets to transform a signal into a time-frequency
representation for easier interpretation and processing in comparison to the FT. A wavelet is a
limited duration oscillatory wavelike function that has a zero mean and its energy
concentrated in time. Wavelets have different shapes that change according to their
additional mathematical properties [82]. The WT calculates the correlation between
a signal and the selected wavelet, which results in large magnitude wavelet coefficients
where the shapes of the signal and the wavelet resemble each other, and low magnitude
coefficients otherwise. The selected wavelet is generally referred to as the mother wavelet.
There are many different mother wavelets such as Daubechies (db), Coiflets (coif) and
Symmlet (sym), some examples of which are shown in Figure 2.8 [83].
(a) db4 (b) coif4 (c) sym4
Figure 2.8: Example wavelet functions of different families.
Selection of the mother wavelet is dependent on the application requirements and
some research is reported for the selection of optimal wavelets for biomedical signal
processing [84–86]. The WT can be divided into two main groups known as the CWT
and the DWT which are briefly described in the following sections.
Continuous Wavelet Transform
The CWT is achieved by convolving a signal with translated and dilated versions of
the mother wavelet, generating wavelet coefficients. The temporally shifted and scaled
versions of the mother wavelet create the wavelet basis function $\psi_{a,\tau}(t)$, which is
defined in (2.5).
$$\psi_{a,\tau}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - \tau}{a}\right) \tag{2.5}$$
where $\psi(t)$ is the mother wavelet (i.e. wavelet function), $a$ is the scaling factor with
$a > 0$, $a \in \mathbb{R}$, $\tau$ is the amount of shift in time and $\frac{1}{\sqrt{a}}$ is the factor
required for energy preservation, so that the wavelets have the same energy at every scale. For
this transform $a$ and $\tau$ vary continuously over $\mathbb{R}$ and the CWT of a function $x(t)$ can
be calculated as
$$W_\psi(a, \tau) = \int_{-\infty}^{\infty} x(t)\, \psi^{*}_{a,\tau}(t)\, dt \tag{2.6}$$
where $W_\psi(a, \tau)$ are the generated wavelet coefficients and $^{*}$ denotes complex
conjugation. (2.6) can be realized as a convolution, and $\psi(t)$ has a bandpass spectrum due
to the admissibility condition, which implies that the FT of $\psi(t)$, $\Psi(\omega)$, vanishes at
$\omega = 0$ [87]. Further mathematical derivations and proofs can be found in [82, 83]. Hence,
CWT is computed by filtering the signal with dilated bandpass filters. The scaling factor $a$ is
directly related to the spectrum analysing window size. A small scale factor compresses the
wavelet in time, so that rapidly changing details, corresponding to high frequency
components, can be captured. A larger scale factor results in a stretched wavelet, capturing
slowly changing coarser features of the signal, which correspond to low frequency
components. By varying the scale factor, the whole spectrum can be analysed with varying
window sizes. In addition, changing $\tau$ moves the time localization centre of
$\psi_{a,\tau}(t)$, where each $\psi_{a,\tau}(t)$ is centred around $\tau$. This way, one-dimensional
time-domain data is transformed into two dimensions. Therefore, CWT provides good spectral
resolution for low frequency components and good temporal resolution for high frequency
components. The time-frequency resolution of both STFT and CWT are presented in


















Figure 2.9: Time and frequency resolution of (a) Short-Time Fourier Transform and (b) Wavelet
Transform [88].
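The role of the $1/\sqrt{a}$ factor and of the zero-mean property can be checked numerically. The sketch below uses the Mexican hat shape purely as a convenient real-valued example; it is not one of the wavelets used in the thesis. The integration grid, step size and function names are all assumptions made for illustration.

```python
import math


def mexican_hat(t):
    """Mexican hat shape, used here only as an example mother wavelet."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def scaled_wavelet(t, a, tau):
    """Scaled and shifted wavelet psi((t - tau) / a) / sqrt(a)."""
    return mexican_hat((t - tau) / a) / math.sqrt(a)


DT = 0.01                                      # integration step (assumed)
GRID = [k * DT for k in range(-6000, 6001)]    # t from -60 to 60


def integrate(values):
    """Rectangle-rule approximation of an integral over GRID."""
    return sum(values) * DT


def cwt_coefficient(x, a, tau):
    """Correlation of x(t) with the real-valued scaled and shifted wavelet."""
    return integrate([x(t) * scaled_wavelet(t, a, tau) for t in GRID])


# Zero mean: the wavelet has no DC component, i.e. a bandpass spectrum.
mean = integrate([mexican_hat(t) for t in GRID])

# Energy of the scaled wavelet is the same at every scale.
energy = {a: integrate([scaled_wavelet(t, a, 5.0) ** 2 for t in GRID])
          for a in (0.5, 1.0, 4.0)}


# A signal shaped like the wavelet at scale 2 and shift 5 correlates strongly there.
def signal(t):
    return mexican_hat((t - 5.0) / 2.0)


matched = cwt_coefficient(signal, 2.0, 5.0)
shifted = cwt_coefficient(signal, 2.0, 12.0)
print(mean, energy, matched, shifted)
```

The coefficient at the matching scale and shift is much larger than the one at a displaced shift, which is the correlation behaviour described for the WT.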
Discrete Wavelet Transform
Unlike CWT that employs continuous scale and shift values, the DWT discretizes these
values such that $a^{-1} = a_0^{j}$ and $\tau = k a_0^{-j} \tau_0$, where $j, k$ are integers and $a_0 > 1$, $\tau_0 > 0$









Choice of $\psi$, $a_0$, and $\tau_0$ is a significant factor in practical implementations of DWT. Hence,
selection of $a_0 = 2$ and $\tau_0 = 1$ results in a wavelet orthonormal basis which yields
a perfect tiling of the time-frequency plane as demonstrated in Figure 2.9 (b) [82]. A






and the wavelet coefficients of a discrete signal $x(n)$ for $n = 0, 1, \ldots, M - 1$ can be
written as








where $\frac{1}{\sqrt{M}}$ is a normalizing term ensuring the recovery of the signal transformed into
the wavelet domain. As mentioned previously, the wavelet function has a bandpass spectrum;
thus wavelet coefficients calculated up to scale $2^j$ do not cover the whole signal
spectrum and it is necessary to add the low frequencies of the spectrum as well. This
can be achieved via the introduction of the scaling function $\varphi(n)$, which has a similar
definition to the wavelet function but with different scaling parameters, as shown in (2.10).










$$W_\varphi(j_0, k) = \frac{1}{\sqrt{M}} \sum_{n} x(n)\, 2^{j_0/2}\, \varphi(2^{j_0} n - k). \tag{2.11}$$
where $j \ge j_0$. In practical applications $j_0$ is selected as 0 and the length of the
input data is selected to be a power of 2 such that $M = 2^J$, where $J$ is an integer. The
$W_\psi(j, k)$ and $W_\varphi(j_0, k)$ are referred to as the Forward DWT. Therefore, the original
signal $x(n)$ can be recovered using the computed wavelet and scaling coefficients using













The wavelet function can be expressed as a series summation of the scaling functions







where $p$ is an integer dummy variable. If $\psi(n)$ is scaled by $2^j$ and shifted








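where $m = 2k + p$. Therefore, by substituting (2.14) into (2.9), the wavelet coefficients
can be expressed by the scaled and shifted versions of the scaling functions, and this gives
$$W_\psi(j, k) = \sum_{m} h_\psi(m - 2k)\, W_\varphi(j + 1, m) \tag{2.15}$$
where $W_\varphi(j + 1, m) = \frac{1}{\sqrt{M}} \sum_{n} x(n)\, 2^{(j+1)/2}\, \varphi(2^{j+1} n - m)$. Similarly, the scaling
coefficients can be written as
$$W_\varphi(j, k) = \sum_{m} h_\varphi(m - 2k)\, W_\varphi(j + 1, m) \tag{2.16}$$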
Observing (2.15) and (2.16), it can be seen that these are convolution operations with
the functions $h_\varphi(m)$ and $h_\psi(m)$, which are simply the impulse responses of a set of
analysis filters. Therefore, the calculation of both wavelet and scaling coefficients can
be realized as octave band filtering, also known as subband coding, which decomposes a
signal into octave frequency bands by consecutive stages of filtering and subsampling.
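The filtering-and-subsampling recursion can be written out directly. The sketch below implements one analysis stage as the sum over $m$ of $h(m - 2k)$ times the finer-scale scaling coefficients. It uses the two-tap Haar filters as an assumed example and treats the input samples as the finest-scale scaling coefficients, which is a simplification introduced here and not a statement from the text.

```python
import math

S = 1.0 / math.sqrt(2.0)
H_PHI = [S, S]       # Haar lowpass (scaling) filter, an assumed example
H_PSI = [S, -S]      # Haar highpass (wavelet) filter, an assumed example


def analysis_step(h, finer):
    """Compute sum_m h(m - 2k) * finer[m] for every output index k."""
    coarse = []
    for k in range(len(finer) // 2):
        acc = 0.0
        for i, tap in enumerate(h):      # i plays the role of m - 2k
            m = 2 * k + i
            if m < len(finer):           # ignore taps that fall past the data
                acc += tap * finer[m]
        coarse.append(acc)
    return coarse


# Hypothetical samples, treated as the finest-scale scaling coefficients.
x = [1.0, 4.0, -3.0, 0.0]
approx = analysis_step(H_PHI, x)     # scaling (approximation) coefficients
detail = analysis_step(H_PSI, x)     # wavelet (detail) coefficients
print(approx, detail)
```

The index `2 * k` in the loop is where the subsampling by 2 happens: only every second output of the filter is ever computed, so each stage halves the number of coefficients.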
Implementation of Discrete Wavelet Transform
DWT can be implemented by two-channel quadrature mirror filter banks with lowpass
filter $h_0(n)$ (i.e. $h_\varphi(n)$) and highpass filter $h_1(n)$ (i.e. $h_\psi(n)$) [82]. The output
from each filter is downsampled by 2, where the outputs at the lowpass and highpass branches
are known as the approximation coefficients (i.e. scaling coefficients), covering the lower
half of the input signal band, and the detail coefficients (i.e. wavelet coefficients), covering
the upper half of the input signal band, respectively. For the following stages, the
approximation coefficients obtained from the previous stage are further decomposed with the
same lowpass and highpass filters. Decomposition, also known as analysis, is an
iterative process and provides coarser resolution in time and finer resolution in frequency.










In order to reconstruct the decomposed signal, the procedure applied in the analysis stage
is repeated in the reverse direction by upsampling the detail and approximation coefficients,
followed by a filtering operation using another set of lowpass ($g_0(n)$) and highpass ($g_1(n)$)
filters, where the outputs of the lowpass and highpass branches are added. This
procedure, also known as synthesis, is repeated until the original signal length is recovered.
Complete demonstration of analysis and synthesis filter banks for DWT calculation is






































(b) Synthesis Filter Bank
Figure 2.10: DWT analysis and synthesis filter banks, for 3 level decomposition and reconstruction.
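A three-level decomposition and reconstruction of the kind shown in Figure 2.10 can be sketched in a few lines. The example assumes the two-tap Haar filters, for which filtering and resampling reduce to sums and differences of sample pairs. This is an illustrative choice and not the db4 or IIR filter banks developed in the thesis, and the function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)


def analyse(signal, levels):
    """Return [approximation, coarsest detail, ..., finest detail] using Haar filters."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [S * (a - b) for a, b in pairs])   # highpass branch, downsampled by 2
        approx = [S * (a + b) for a, b in pairs]             # lowpass branch, downsampled by 2
    return [approx] + details


def synthesise(coeffs):
    """Invert analyse(): upsample, filter and add, from the coarsest level upwards."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend([S * (a + d), S * (a - d)])
        approx = rebuilt
    return approx


# Hypothetical input whose length is a power of two.
x = [2.0, 4.0, 6.0, 8.0, 10.0, 8.0, 6.0, 4.0]
coeffs = analyse(x, 3)
rebuilt = synthesise(coeffs)
print([len(c) for c in coeffs])
print(rebuilt)
```

Each analysis level halves the number of approximation coefficients, so three levels on eight samples leave a single approximation coefficient, and synthesis restores the original length one level at a time.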
2.3 Chapter Conclusions
This chapter provided physiological information on the most widely used biopotential
signals in the field of diagnostics and monitoring. The physiological origins of ECG,
EEG and EMG signals, along with their morphological characteristics and different
measuring techniques, were presented. Feasible and accurate measurements of these
signals are of great importance, since they are used in a variety of applications, examples
of which were provided for each signal. In addition, the second part of
the chapter provided a brief introduction to the wavelet transform and its differences
compared to the well-known STFT. WT is a powerful tool in signal processing applications
which has the advantage of variable length windows for analysing different frequency
components. Thus, it is a superior transform for non-stationary signal applications where
signal properties vary over time. WT can be divided into two main types, which are
CWT and DWT. The DWT can be realized as a tree-form filter bank, which provides
a practical implementation of the wavelet transform. Chapters 4 and 5 provide further
information on filter bank implementation and their properties.
Chapter 3
Decimation Filter for Wearable
ECG Monitoring Systems
3.1 Introduction
In Chapter 1, the main functional blocks of a wearable WBAN sensor were presented.
Recalling Figure 1.2, the second block is an ADC, which is required to have
sufficient resolution and minimal quantization noise for accurate digitization. Due to
the low amplitude of the biopotential signals described in Chapter 2, the
ADC to be deployed needs to have 10 to 16 bits of resolution. Thus, Sigma-Delta (Σ∆)
oversampled ADCs are well suited for the digitization process, since they
provide high resolution and dynamic range for low-bandwidth biomedical signals with
simple hardware architectures [89]. A Σ∆ ADC incorporates two sections, the
modulator and the digital filter. The modulator reduces the quantization noise in the
signal band by moving it to higher frequencies by means of oversampling and noise
shaping. Digital processing of the raw high-rate bit stream from the modulator is
computationally complex, resulting in high power consumption and expensive digital
circuitry. Therefore, a decimation filter is used to reduce the sampling rate while
preserving the resolution, by filtering out the high frequency noise and downsampling.
This chapter presents a decimation filter chain for ECG signal acquisition which provides
highly efficient filtering while introducing minimal signal distortion. The
proposed design employs a cascaded Slink filter (generally spuriously referred to as a
Cascaded Integrator-Comb (CIC) filter [90,91]), two-path all-pass based HB IIR filters
and a Slink roll-off compensator [92–98]. IIR filters are renowned for their computational
efficiency, however at the expense of a non-linear phase response, which
may result in waveform distortions in the time domain. The work reported here also
studies the non-linear phase effects of these filters on ECG data. For further investigation,
phase compensation filters are implemented using low-complexity cascaded all-pass
filter sections, and the results from the original and phase corrected filters are comparatively
studied. To the best of the authors' knowledge, this is a first in the biomedical signal
processing literature. The rest of this chapter provides a brief summary of the state-of-the-art
decimation filters used in biomedical applications. The detailed structure of
the proposed decimation filter, along with its magnitude, phase and group delay characteristics,
is then provided. The following section provides information on the phase
compensation filter design and introduces the databases used for simulation, test and
evaluation purposes. The chapter concludes with the results and discussion sections.
3.2 State-of-the-art Decimation Filters in Biomedical Applications
In the open literature, several decimation filters have been proposed for biosignal acquisition
systems. Some employ multi-stage Finite Impulse Response (FIR) decimation
filters with very high filter orders, which necessitate extra processing in terms
of arithmetic operations [1–4,26]. In [5,99], single stage FIR decimation filters are used,
in which the filter coefficients are optimized in order to match the analog loop filter
response of the modulator. Single stage decimation with a high decimation rate requires
a sharper transition and a higher stopband attenuation, and thus a higher filter order.
Canonic Signed Digit (CSD) coefficients are another alternative for reducing
the complexity of the decimation filters. This representation uses a series of subtractions
and additions in order to accomplish the multiplication operation. Although
the number of multiplier units is decreased by this method, the need for high order
digital filters in the decimation stage is not avoided [100,101]. The aforementioned filters
are all designed using high order FIR filters, which inevitably increase the power
consumption and hardware complexity. In [102–106], single stage Slink decimators are
employed. However, using a single Slink filter with a high decimation ratio comes at a
high cost, as it does not easily provide the required stopband attenuation for suppressing
the high frequency noise, and exhibits a high roll-off in the passband region, which
degrades the in-band signal features. Furthermore, in [101,107], three stage decimation
chains, each employing a Slink, an FIR and an IIR filter, are proposed. However, due to
the non-linear phase of the IIR filter, an additional equalizer is applied following the
decimation chain. Thus, even though the overall filter order is reduced compared to
their FIR counterparts, the hardware complexity and power consumption are increased
due to the equalizer.
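The CSD idea mentioned above can be illustrated with a minimal sketch. It assumes integer coefficients and uses the standard non-adjacent-form recoding; the function names `to_csd` and `csd_multiply` are hypothetical and are not taken from the cited works.

```python
def to_csd(n):
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, LSB first."""
    digits = []
    while n != 0:
        if n & 1:
            # +1 when n mod 4 == 1, -1 when n mod 4 == 3, so that the
            # next digit is guaranteed to be zero (no adjacent non-zeros).
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply integer x by the CSD-coded constant using shifts, adds and subtracts only."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    # 7 = 8 - 1, so one subtraction replaces the two additions of plain binary.
    print(to_csd(7))
    print(csd_multiply(5, to_csd(7)))
```

The recoding lowers the number of non-zero digits, and therefore the number of adders, but it does not change the filter order, which is the limitation noted in the text.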
3.3 Proposed Decimation Filter Structure
The proposed decimator (filter followed by downsampling) is designed to achieve a
decimation ratio of 128 in order to demodulate the 1-bit output stream from a third-order
Σ∆ modulator. It is composed of four cascaded multiplier free stages:
a fourth-order Slink filter, two fifth-order two-path all-pass based HB IIR filters and
a first order Slink compensation filter. The overall decimator is designed to achieve
0.04 dB passband ripples and 74 dB stopband attenuation. The behavioural structure
of the cascaded decimation filter, incorporating the decimation ratios at each stage, is
presented in Figure 3.1.
























Figure 3.1: Behavioural structure of the decimation filter, incorporating the 4th order Slink, two
2-path all-pass based HB IIR, and Slink compensation filters.
3.3.1 The Slink Filter
A fourth order Slink filter with a decimation ratio of 32 is the first stage of the proposed
design. It has a linear phase and a multiplier free structure, with the z-domain transfer
function given in (3.1).










The gain of the filter is required to roll off at a faster rate than the Σ∆ modulation noise
rises, and therefore the order of the filter has to be one higher than the Σ∆ modulator's
order. A filter with a higher order causes a higher roll-off in the band of interest, whereas a
lower order does not provide the required stopband characteristics, resulting in aliasing in
the signal band [108]. The Slink filter incorporates two sections, with the opportunity of
shifting the downsampler in between them. The two sections are made up
of four cascaded accumulators and four cascaded differencers. The shifted downsampler
enables the differencers to operate at a rate 32 times slower than the input, reducing
the power consumption, which makes the filter well suited for applications requiring high
decimation factors, low circuit complexity and low power consumption [91]. The magnitude
response of the Slink filter is presented in Figure 3.2.
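The accumulator, downsampler and differencer arrangement described above can be sketched behaviourally as follows. This is an illustrative model under stated assumptions, not the fixed-point implementation used in this work: it uses unbounded Python integers, zero initial conditions and no gain normalisation (the DC gain is $32^4$), and the function names are hypothetical. Under these assumptions the structure corresponds to $H(z) = \left(\frac{1 - z^{-32}}{1 - z^{-1}}\right)^4$ followed by 32:1 downsampling.

```python
def slink_decimate(samples, order=4, ratio=32):
    """Accumulators at the input rate, downsample, differencers at the output rate."""
    integ = [0] * order
    kept = []
    for n, x in enumerate(samples):
        v = x
        for i in range(order):          # cascaded accumulators
            integ[i] += v
            v = integ[i]
        if (n + 1) % ratio == 0:        # downsampler shifted between the two sections
            kept.append(v)
    prev = [0] * order
    out = []
    for v in kept:
        for i in range(order):          # cascaded differencers at the low rate
            v, prev[i] = v - prev[i], v
        out.append(v)
    return out


def moving_sum_reference(samples, order=4, ratio=32):
    """Reference: cascaded 32-point moving sums at the input rate, then 32:1 decimation."""
    sig = list(samples)
    for _ in range(order):
        sig = [sum(sig[max(0, n - ratio + 1): n + 1]) for n in range(len(sig))]
    return sig[ratio - 1::ratio]


if __name__ == "__main__":
    stream = [1 if (i * 7919) % 13 < 6 else -1 for i in range(640)]  # toy 1-bit stream
    print(slink_decimate(stream) == moving_sum_reference(stream))
```

Both functions give the same output, while the first performs its differencing on one sample in every 32, which is the source of the power saving noted in the text.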
Figure 3.2: The magnitude response of the fourth order Slink filter given in (3.1).
The Slink filter is behaviourally equivalent to four cascaded 32-point moving averagers
followed by a 32:1 sample rate decimation. Hence, as it can be seen, magnitude-





3.3.2 Two-Path All-pass Based Half-Band IIR Filters
In order to provide computationally efficient second and third stages of the decimation
chain shown in Figure 3.1, very high fidelity minimum phase two-path all-pass based
HB IIR filters are designed, where at each stage the input signal is decimated by two.
The filters described here are formed by two parallel paths composed of second order
all-pass filters with multiplier free structures. The delayer in the bottom path of the
recursive polyphase structure creates an increasing phase difference between the two
paths. This difference reaches 90° at $\nu_c$ and 180° at $\nu_s$, which define the normalized
cut-off and stopband frequencies, respectively. Since $A_i(z)$, given in (3.2), is rational
and sparse, the downsamplers can be shifted before the all-pass sections by using the
noble identity.









Figure 3.3: Two-path all-pass based HB IIR decimator structure, incorporating all-pass filters $A_1(z)$
and $A_2(z)$ in the top and bottom paths respectively.
The coefficients of the all-pass sections in the second and the third decimation stages
are designed to be powers of two or sums of powers of two ($\alpha_1 = 0.125$ and $\alpha_2 = 0.5625$) [92].
These coefficients eliminate the need for multipliers by replacing them with shift and add
operations, as shown in Figure 3.4 (a) and (b), where ">>" represents a binary shift operation along
with a number indicating the number of shifts. The z-domain transfer function for
both all-pass sections is [92, 110],
$A_i(z) = \dfrac{\alpha_i + z^{-2}}{1 + \alpha_i z^{-2}}$ (3.2)





















Figure 3.4: First order all-pass filter structures for (a) $A_1(z)$ in the top branch with $\alpha_1 = 0.125$ and
(b) $A_2(z)$ in the bottom branch with $\alpha_2 = 0.5625$.
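As a small illustration of how such coefficients avoid multipliers, the sketch below scales an integer sample using shifts and one addition only. The decompositions $0.125 = 2^{-3}$ and $0.5625 = 2^{-1} + 2^{-4}$ are arithmetic facts. Whether Figure 3.4 realises $\alpha_2$ with exactly this pair of shifts is an assumption, and the function names are hypothetical.

```python
def mul_alpha1(x):
    """x * 0.125 realised as a single right shift by 3."""
    return x >> 3


def mul_alpha2(x):
    """x * 0.5625 realised as (x >> 1) + (x >> 4), i.e. 2**-1 + 2**-4."""
    return (x >> 1) + (x >> 4)


if __name__ == "__main__":
    # Multiples of 16 are used so that no bits are lost by the shifts.
    for x in (16, 160, -320):
        print(x, mul_alpha1(x), mul_alpha2(x))
```

For inputs that are not multiples of 16 the shifts truncate, so a fixed-point design would need guard bits or rounding. That choice is outside the scope of this sketch.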
Each of the proposed HB IIR filters achieves 0.47 µdB passband ripples in the region of
signal activity and 70 dB stopband attenuation with minimal hardware complexity.
The magnitude response of these filters is presented in Figure 3.5.
Figure 3.5: Magnitude response of the proposed HB IIR filters presented in Figure 3.3.
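The magnitude behaviour of the two-path structure can be explored with the following sketch. It assumes the usual two-path form $H(z) = \frac{1}{2}\left[A_1(z) + z^{-1}A_2(z)\right]$, with $A_i(z)$ as in (3.2) and the delayer in the bottom path. The function names are hypothetical, and the code evaluates the filter at the input rate of the stage, before downsampling.

```python
import cmath
import math

ALPHA1, ALPHA2 = 0.125, 0.5625


def allpass(alpha, z):
    """All-pass section of (3.2): (alpha + z^-2) / (1 + alpha z^-2)."""
    z2 = z ** -2
    return (alpha + z2) / (1 + alpha * z2)


def halfband(nu):
    """Two-path response at normalized frequency nu = f/fs (assumed structure)."""
    z = cmath.exp(2j * math.pi * nu)
    return 0.5 * (allpass(ALPHA1, z) + z ** -1 * allpass(ALPHA2, z))


def gain_db(nu):
    """Magnitude in dB, floored to avoid log of zero at exact transmission zeros."""
    return 20 * math.log10(max(abs(halfband(nu)), 1e-300))


if __name__ == "__main__":
    for nu in (0.0, 0.05, 0.125, 0.25, 0.4, 0.45):
        print(f"nu = {nu:5.3f}  gain = {gain_db(nu):9.3f} dB")
```

Because each path is all-pass, the response is power complementary about $\nu = 0.25$, that is $|H(\nu)|^2 + |H(0.5 - \nu)|^2 = 1$. This property links a very small passband ripple to a large stopband attenuation.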
3.3.3 The Slink Roll-off Compensation Filter
The Slink compensator is the last stage in the proposed design, exhibiting a passband
response that is the inverse of that of the Slink filter in order to compensate for the amplitude
roll-off in the band of interest (dc to half Nyquist) caused by the Slink filter. The
compensator has a similar structure to the all-pass sections given in Figure 3.4,
except that only one delayer is required, as shown in Figure 3.6 [90, 111]. This filter also
has a multiplier free structure, since its only coefficient is designed as a power of two
($\alpha_c = 0.03125 = 2^{-5}$).







Figure 3.6: Slink roll-off compensation filter structure with coefficient $\alpha_c = 0.03125$.
3.3.4 Decimation Chain Magnitude Response
The decimation filters introduced in the previous sections were implemented using
MATLAB in order to validate the overall system performance. Figure 3.7 (a) and (b)
present the fullband magnitude response of the decimation chain at the oversampled
input rate and at the decimated output rate, respectively.








































Figure 3.7: Full band magnitude response of the overall decimation filter, (a) at input rate (zoomed
into the passband region) and (b) at the output rate (decimated) along with the Slink (blue) and
Slink compensation (red) filters magnitude response.
In most ECG signal applications, the desired dynamic range can be up to 60 dB and the
minimum required resolution is 8 bits [22]. As can be observed from Figure 3.7 (a),
the overall decimation chain achieves 0.04 dB passband ripples and 74 dB stopband
attenuation. This fits the specifications of the application, as it provides the required
signal resolution and attenuates the high frequency quantization noise. The magnitude
characteristics of the Slink (blue) and Slink compensation (red) filters in the band of
interest are presented in Figure 3.7 (b), in which the inverse Slink characteristics of the
Slink compensator can be easily seen. The Slink roll-off is compensated up to a certain
ripple size by using only a first order multiplier free all-pass IIR filter.
3.4 Phase Characteristics of All-pass Based Half-band Polyphase IIR Filter
IIR filters are well known for their non-linear phase response, which should be compensated
in order to avoid phase distortion of the signal being processed. The
filter group delay is a measure of the linearity of the filter's phase response and is
defined as the negative derivative of the phase of the system with respect to frequency,
$\tau(\nu) = -\dfrac{1}{2\pi}\dfrac{d\phi(\nu)}{d\nu}$ (3.3)
where $\phi(\nu)$ is the filter phase and $\nu = f/f_s$ is the normalized frequency,
$f_s$ being the sampling frequency [112]. The group delay contribution of the
second stage and third stage filters (Figure 3.1), both of which are chosen to have the
same structure and coefficients as depicted in Figure 3.3 and Figure 3.4, to the
overall group delay is,
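$\tau_{IIR}(\nu) = \underbrace{N_1 \tau_{HB}(N_1 \nu)}_{\text{2nd stage HB IIR}} + \underbrace{2N_1 \tau_{HB}(2N_1 \nu)}_{\text{3rd stage HB IIR}}$
$\tau_{HB}(\nu) = \dfrac{1}{2} + \sum_{k=1}^{K} \tau_k(\nu) + \Delta$ (3.4)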
where $N_1$ is the downsampling ratio at the first stage of the decimation filter and $K$ is
the number of paths in the filter structure. In (3.4), $\frac{1}{2}$ is the group delay contribution of
the delayer used in the bottom path of the polyphase structure and $\tau_k(\nu)$ represents the
group delay contribution of the $k$th all-pass filter with coefficient $\alpha_k$ to the overall group
delay of the all-pass based HB IIR decimation filter $\tau_{HB}(\nu)$, which can be formulated
as follows,
$\tau_k(\nu) = \dfrac{2(1 - \alpha_k^2)}{1 + \alpha_k^2 + 2\alpha_k \cos(4\pi\nu)}$ (3.5)
and $\Delta = \sum \delta(\nu_z)$ is the summation of the delta functions in the group delay occurring
at frequencies $\nu_z > 0.25$ corresponding to the filter zeros [112]. Since the frequency
components beyond the cut-off frequency correspond to the filter's stopband region,
the parameter $\Delta$ in (3.4) can be ignored. The total group delay for the aforementioned
filters has a bell-like shape that can be formulated as $\tau_{HB}(\nu) = \frac{1}{2} + \tau_1(\nu) + \tau_2(\nu)$
according to (3.4), for $K = 2$ and ignoring the delta functions in the filter's stopband.
Here $\tau_1(\nu)$ and $\tau_2(\nu)$ are the group delay functions of the all-pass filters in the top and
bottom branches. The group delay of the two-path all-pass based HB IIR filter with
non-linear phase response is presented in Figure 3.8.
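The closed form in (3.5) can be checked numerically against the definition of group delay as the negative phase derivative. The sketch below is illustrative only and its function names are hypothetical. `tau_hb` simply transcribes the expression for $\tau_{HB}(\nu)$ stated in the text, without the $\Delta$ term.

```python
import cmath
import math


def tau_allpass(alpha, nu):
    """Closed-form group delay of one all-pass section, as in (3.5), in samples."""
    return 2 * (1 - alpha ** 2) / (1 + alpha ** 2 + 2 * alpha * math.cos(4 * math.pi * nu))


def tau_numeric(alpha, nu, h=1e-6):
    """Group delay from a central difference of the phase of (3.2)."""
    def phase(v):
        z2 = cmath.exp(-4j * math.pi * v)          # z^-2 on the unit circle
        return cmath.phase((alpha + z2) / (1 + alpha * z2))
    dphi = phase(nu + h) - phase(nu - h)
    dphi = (dphi + math.pi) % (2 * math.pi) - math.pi   # remove any 2*pi wrap
    return -dphi / (2 * math.pi * 2 * h)


def tau_hb(nu, a1=0.125, a2=0.5625):
    """Expression for tau_HB as stated in the text, ignoring the delta terms."""
    return 0.5 + tau_allpass(a1, nu) + tau_allpass(a2, nu)


if __name__ == "__main__":
    for nu in (0.0, 0.0625, 0.125, 0.2, 0.25):
        print(f"nu = {nu:6.4f}  tau_HB = {tau_hb(nu):.4f} samples")
```

Each $\tau_k(\nu)$ grows monotonically from $\nu = 0$ to $\nu = 0.25$, which is consistent with the rising side of the bell-like shape described above.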













Figure 3.8: The group delay of the two-path all-pass based HB IIR filter with the all-pass section
coefficients $\alpha_1 = 0.125$ and $\alpha_2 = 0.5625$.
3.5 Phase Compensation
Linear phase is desired for most systems, especially in biomedical applications,
since the temporal characteristics of a biosignal are of great importance for diagnostic
purposes. A non-linear phase results in a non-constant group delay, which introduces
different time delays at different frequencies and may result in distortion
of the time domain amplitude of a signal. Phase non-linearity can be compensated by
designing cascaded all-pass correctors with a phase response which is approximately
opposite to that of the polyphase filter [112]. The aim is to achieve an almost constant
group delay response in the band of interest. A general transfer function for a K section
compensator is,












where $\alpha_{c0}$ and $\alpha_{cm}$ are the coefficients of a single second order and higher order correctors,
respectively [112]. In order to study the non-linear phase effects of the proposed
all-pass based HB IIR filters, a second order single corrector and a four section compensator
are designed according to the detailed algorithm provided in [112]. Due to the
high sampling rate at the second stage of decimation, the phase non-linearity in the
band of interest (i.e. the group delay peak-to-peak difference) is small; however, it
grows quickly as the sampling rate reduces at the third stage. Thus, a single all-pass
corrector at the second stage is sufficient, whereas a 4 section corrector is required
for the third stage in order to achieve a better correction in the band of interest. In
this study, the phase is compensated up to $\nu_c = 0.125$, which reduces the peak-to-peak
difference in the band of interest from 0.431 to 0.0041. In
other words, the peak-to-peak error is reduced by a factor of $K = 106.3$ [112], with
$K = \dfrac{\tau_b(\nu)_{max} - \tau_b(\nu)_{min}}{\tau_a(\nu)_{max} - \tau_a(\nu)_{min}}$, for $0 \le \nu < \nu_c$ (3.7)
where $\tau_b(\nu)$ and $\tau_a(\nu)$ are the group delays before and after phase correction, respectively.
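The ratio in (3.7) is a quotient of peak-to-peak group delay spreads, which can be sketched as below. The function names are hypothetical and the inputs are assumed to be group delay samples taken over $0 \le \nu < \nu_c$.

```python
def peak_to_peak(values):
    """Spread between the largest and smallest group delay in the band."""
    return max(values) - min(values)


def improvement_ratio(tau_before, tau_after):
    """Ratio of (3.7): spread before correction over spread after correction."""
    return peak_to_peak(tau_before) / peak_to_peak(tau_after)


if __name__ == "__main__":
    # Rounded peak-to-peak figures quoted in the text.
    print(0.431 / 0.0041)
```

With the rounded spreads quoted in the text the quotient is close to 105. The reported factor of 106.3 presumably follows from the unrounded values.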
Figure 3.9 (a) presents the normalized group delay responses of the original filter (blue)
versus the phase corrected filter (black) via a single section all-pass corrector (red).
Similarly, Figure 3.9 (b) illustrates the group delay responses of the original (blue) and
the phase corrected filter (black) via a 4 section all-pass corrector (red).





















































Figure 3.9: The normalized group delay response of the two-path polyphase IIR filter corrected using (a)
a single section corrector and (b) a 4 section corrector, along with the group delay responses of the
original filter (blue) and the phase compensators (red).
3.6 T-wave Alternans Challenge and MIT-BIH Arrhythmia Database
For evaluating the performance of the proposed decimation filter with ECG data, 10
second long recordings from two databases are used. The T-Wave Alternans Challenge (TWAC)
Database provides ECG recordings in sinus rhythm, which are acquired
and digitized at 500 Hz by a typical 12-lead standard ECG from healthy subjects [113].
In addition, the MIT-BIH database provides ECG data with a wide range of arrhythmia
and beat morphology variations, which are recorded via a two-channel recorder and
digitized at 360 Hz [114, 115]. Spectral analysis was carried out for both datasets
by averaging the Power Spectral Densities (PSDs) of each dataset for
each lead in order to determine the frequency content distribution of the data records.
Results from the TWAC Database records showed that more than 99% of the average
power is concentrated in the frequency band of 0 to 50 Hz ($\nu = 0$ to $0.1$). Figure 3.10
(a) presents the average power of the 12 leads (labelled I to V6) in the corresponding
frequency band for record twa55. Figure 3.10 (b) shows the average power
in several frequency bands of Lead II records of 13 different heart conditions, obtained
from the MIT-BIH Arrhythmia database. These heart conditions are chosen according to
their density of occurrence, such as atrial fibrillation, ventricular fibrillation and tachycardia.
In the presence of different arrhythmias and conduction abnormalities, similar
to the previous observation, more than 99% of the average power is concentrated in the
frequency band of 0 to 45 Hz ($\nu = 0$ to $0.125$). The 60 Hz power-line interference (USA
standard) and its second harmonic can be observed as peaks in the corresponding plots.

























































Figure 3.10: PSD of 10 second long (a) 12 lead recordings (from Lead I to Lead V6) of record twa55 in
sinus rhythm obtained from the TWAC Database and (b) 13 Lead II recordings from the MIT-BIH Arrhythmia
Database with various conduction abnormalities and beat morphologies. (AFIB: Atrial fibrillation,
AFL: Atrial flutter, SBR: Sinus bradycardia, IVR: Idioventricular rhythm, SVTA: Supraventricular
tachyarrhythmia, VFL: Ventricular flutter, VT: Ventricular tachycardia, BII: 2° heart block, PREX:
Pre-excitation, B: Ventricular bigeminy, T: Ventricular trigeminy and P: Paced rhythm)
In addition, the spectral resemblance between each of the leads, and between consecutive 10
second time segments of the same lead, is studied for both sets of data by measuring
the CSC,
$C_{xy}(f) = \dfrac{|P_{xy}(f)|^2}{P_{xx}(f)\,P_{yy}(f)}$ (3.8)
where $P_{xx}$ and $P_{yy}$ are the power spectral estimates of $x$ and $y$ respectively, and $P_{xy}$ is
the cross spectral estimate of $x$ and $y$. The coherence value shows the degree of spectral
resemblance of $x$ to $y$ at each frequency, and ranges between 0 and 1 [49]. The CSC
results are almost 1 for each lead and for different time instances, showing very high spectral
resemblance amongst them. Therefore, the performance of the decimation filters
on one channel of a 10 second segment of an ECG signal is a good indication of their
performance with respect to the overall system and longer recordings.
3.7 Error Measures
The performance of the proposed filter without and with phase correction can be
quantified by measuring the similarity of the filter input and output signals in the time domain
(after time alignment) by calculating the cross correlation and the Root Mean Square Error
(RMSE), which are defined in (3.9) and (3.10), respectively [49, 116]. The correlation
coefficient values are normalised between 0 and 1, which indicate no similarity and an exact
match, respectively. The complement of the correlation coefficient is used as a measure
of the waveform dissimilarity and is insensitive to variations in the gain [117]. Thus,
the RMSE is used to measure the amplitude variations between the input $x(k)$ and the output
$y(k)$.




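where $C_{xy}$ is the covariance between $x(k)$ and $y(k)$, and $C_{xx}$ and $C_{yy}$ are the auto-covariances
of $x(k)$ and $y(k)$, respectively.
$RMSE = \sqrt{\dfrac{1}{K}\sum_{k=1}^{K}\big(x(k) - y(k)\big)^2}$ (3.10)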
where $K$ is the number of samples, $x(k)$ is the ECG data record, and $y(k)$ is the
output from the decimation chain. The spectral similarities of the corresponding signals
are quantified by calculating the Distortion Ratio (DR) and the CSC, given in
(3.11) and (3.8), respectively. The CSC is a measure that is not affected by the signal
morphology and amplitude, quantifying the steadiness of the input and output signal
phases as a function of frequency [117].
$DR = \sqrt{\dfrac{\sum \big(Y(f) - X(f)\big)^2}{\sum X(f)^2}}$ (3.11)
where $X(f)$ and $Y(f)$ are the spectral magnitudes of the input and output of the
decimation chain.
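The time and frequency domain error measures above can be sketched as follows. The RMSE and DR follow (3.10) and (3.11). The correlation coefficient is assumed to be the usual covariance normalised by the two auto-covariances, and the percentage scaling of the dissimilarity is likewise an assumption. All function names are hypothetical.

```python
import math


def rmse(x, y):
    """Root mean square error between input x(k) and output y(k), as in (3.10)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def correlation(x, y):
    """Normalised covariance C_xy / sqrt(C_xx * C_yy) (assumed form of the coefficient)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    cxx = sum((a - mx) ** 2 for a in x)
    cyy = sum((b - my) ** 2 for b in y)
    return cxy / math.sqrt(cxx * cyy)


def dissimilarity_percent(x, y):
    """Complement of the correlation coefficient, expressed in percent."""
    return 100 * (1 - correlation(x, y))


def distortion_ratio(X, Y):
    """DR of (3.11) from the spectral magnitudes of the input X(f) and output Y(f)."""
    num = sum((b - a) ** 2 for a, b in zip(X, Y))
    den = sum(a ** 2 for a in X)
    return math.sqrt(num / den)


if __name__ == "__main__":
    x = [0.0, 1.0, 0.0, -1.0] * 5
    y = [2 * v for v in x]                 # same waveform, different gain
    print(dissimilarity_percent(x, y))     # insensitive to the gain change
    print(rmse(x, y))                      # sensitive to the gain change
```

The example shows why the two time domain measures are used together: a pure gain change leaves the dissimilarity at zero, while the RMSE registers the amplitude difference.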
3.8 Experimental Results
In order to test and evaluate the performance of the proposed decimation filter for ECG
signal acquisition, the ECG data described in Section 3.6 are fed into a single-loop 3rd
order 1-bit Σ∆ modulator and modulated at a rate 128 times faster than the Nyquist
sampling frequency, and are then filtered and decimated. The behavioural fixed-point
models of the aforementioned filters were implemented in MATLAB. The
simulations are run for the 10 second long Lead II recordings of two different databases.
The first dataset includes 6 recordings (records twa39, twa46, twa55, twa60, twa90 and
twa99) from healthy subjects in sinus rhythm, and the second dataset includes 13 recordings, each
representing different arrhythmias and beat morphologies, as depicted in Figure 3.10.
Both datasets are decimated by the decimation filters described in Section 3.3 with
and without phase compensation. The decimation filter performance is first evaluated
by measuring the SNR at the output of the Σ∆ modulator and of the decimation chain
without and with compensation, at sampling rates of 46.08 kHz and 64 kHz.
The signal bandwidths are chosen as 360 Hz and 500 Hz for the aforementioned
sampling conditions, respectively. The most distinguishable feature of an ECG signal
is the QRS complex (see Figure 2.2), whose frequency content can extend up to 50
Hz [26, 35, 118, 119]. Any added noise in this bandwidth will distort the time domain
signal and will be most visible in the QRS complex. Thus, a 50 Hz sine wave with an
added 60 dB white noise is fed to the Σ∆ modulator to demonstrate the effect of Σ∆
modulation and decimation filtering in the signal bandwidth of 50 Hz. The SNRs (in
the band of interest) at the output of the modulator and of the decimation chains without
and with phase compensation at the sampling rate of 64 kHz were 80.2 dB, 80.1 dB
and 80.1 dB, respectively. The proposed decimation filter therefore preserves the
SNR in the band of interest after the decimation process. In addition, the SNRs obtained
after decimating the modulated signal with the decimation filters without and with
compensation demonstrate that the compensation filters do not contribute to the
in-band noise attenuation. Figures 3.11(a) and 3.11(b) present the power spectra of the
Σ∆ modulator output and the decimation chain output, respectively.





































Figure 3.11: Power spectrum measurements within the signal bandwidth of 500 Hz for a sampling rate
of 64 kHz, (a) Σ∆ modulator output and (b) decimation chain output without phase compensation
(black) and with phase compensation (red).
The normalized group delay introduced by the final three stages of the decimation chain (two
all-pass based HB IIR filters and a Slink compensator), the compensated group delay
(up to $\nu = 0.125$), the filter magnitude response at the decimation chain output and the
PSD of a Lead II ECG record (twa55) from a healthy subject are shown in Figure 3.12.
Figure 3.12: PSD of Lead II recording of record twa55 in sinus rhythm (blue) versus the group delay
variation with (green) and without (red) phase compensation filters; $f_1 = 22.5$ Hz ($\nu_1 = 0.0625$) and
$f_2 = 45$ Hz ($\nu_2 = 0.125$) are indicated by the yellow lines at $f_{s2} = 46.08$ kHz.
The variations in the total group delay of the filters are calculated in the specified
frequency bands (yellow lines in Figure 3.12). Table 3.1 provides the group delay
variations in number of samples and in µsec at two different sampling rates, $f_{s1} = 64$ kHz
and $f_{s2} = 46.08$ kHz, for the TWAC Database and the MIT-BIH Arrhythmia Database,
respectively.
Table 3.1: Variation in the group delay in normalized frequency bands of $\nu_{B1} = 0$ to $0.0625$ and
$\nu_{B2} = 0$ to $0.125$.
Normalized frequency   Without compensation                       With compensation
band (νB)              Samples  µsec (46.08 kHz)  µsec (64 kHz)   Samples  µsec (46.08 kHz)  µsec (64 kHz)
0-0.0625               1.86     40.28             29              0.22     4.8               3.44
0-0.125                7.61     165.15            119             0.49     10.63             7.66
As mentioned in Section 3.6 and shown in Figure 3.10, more than 99% of the power
of the ECG data is concentrated in the frequency range of $\nu = 0$ to $0.125$ for
both datasets, thus the group delay variations are explored in this frequency range.
According to the upper and lower limits of the wave durations given in Table 2.1,
the group delay variation caused by the decimation filters is relatively low for
frequencies up to $\nu = 0.125$. The maximum variations the signals are exposed to are
165 µsec ($f_{s2} = 46.08$ kHz) and 119 µsec ($f_{s1} = 64$ kHz) without compensation, and
10.63 µsec ($f_{s2} = 46.08$ kHz) and 7.66 µsec ($f_{s1} = 64$ kHz) with compensation. These
deviations are low compared to the wave durations and normal limits
of standard ECG features, thus the group delay variation due to phase non-linearity
without compensation in this frequency range is negligible.
The first experiment incorporated the data obtained from healthy subjects, which
are sampled at 500 Hz and oversampled by a factor of 128 (i.e. 64 kHz). Figures 3.13(a)
and 3.13(b) illustrate the input (twa55) and the output from the decimation chain without and
with phase compensation, respectively. As presented in Figure 3.13, there is no visible
difference between the two filter outputs, which is confirmed by Figure 3.14 illustrating the
amplitude error. In Figure 3.14, a relatively large decaying overshoot
can be observed, which is due to the transient response of the IIR filters.


































Figure 3.13: Decimation chain output (black) versus the input(red) (a) without phase compensation
and (b) with phase compensation.










Figure 3.14: Amplitude difference between the input and output of the decimation chain, without
(original - red) and with compensation (corrected - black).
The average of the six RMSE values of the data records obtained from the decimation chain
without compensation, calculated according to (3.10), is 15 nV, with a maximum
value of 20 nV and a minimum of 10 nV; for the set with compensation it
is 13.2 nV, with a maximum of 15.5 nV and a minimum of 0.7 nV. The most obvious
amplitude differences are observed at the QRS peaks, where the frequency content of
the QRS complex is exposed to a higher group delay. The largest peak amplitude
errors in the six data records demodulated with the decimation filter without and with phase
correction are 43 nV and 44 nV, respectively. According to the normal variation limits given for
the ECG wave amplitudes (Table 2.1), both the mean errors and the maximum QRS peak
errors are acceptable since they are relatively small. In addition, the overall means of the
input/output dissimilarity calculated by (3.9) are $0.2 \times 10^{-3}$% and $0.13 \times 10^{-3}$% for the
uncompensated and compensated filters, respectively, and the waveform dissimilarities for
each data record are shown in Figure 3.15(a).
Figure 3.15: (a) Waveform Dissimilarity between the input and output of the overall decimation chain
without (blue) and with group delay compensation (red). (b) Input/ Output Distortion Ratios of the
overall decimation chain without (blue) and with phase compensation (red).
Finally, the spectral similarity is measured by calculating the distortion ratio as well as
the CSC between the input and the output. The DR is calculated for the six ECG
data records using (3.11); the results are presented in Figure 3.15(b) and the means are 4.8% and
3.3% for the filters without and with compensation, respectively. The CSC between the input
and the output is also calculated, giving values that are approximately one for most
of the frequencies and showing high spectral coherence between the input and output.
The above mentioned experiment is repeated for the data obtained from the MIT-BIH database,
which are digitized at 360 Hz and oversampled by a factor of 128 (i.e. 46.08 kHz).
The average of the RMSE values obtained from the decimation chain without compensation is
83.5 nV and for the one with compensation it is 75 nV. The largest peak amplitude errors
in the 13 data records demodulated with the decimation filter without and with phase correction
are 48 nV and 60 nV, respectively. A 10 second long Lead II recording from a patient suffering
from ventricular tachycardia (record 213), after processing with the decimation filters
discussed above, is presented in Figure 3.16. The error measures between the input and
the outputs of the original and compensated filters are given in Figure 3.17.
































Figure 3.16: Decimation chain output (black) versus the input (red) (a) without phase compensation
and (b) with phase compensation (Ventricular Tachycardia).












Figure 3.17: Amplitude difference between the input and output of the decimation chain, without
(red) and with compensation (black). (Ventricular Tachycardia)
Figures 3.18 (a) and (b) show the waveform dissimilarity and the measured distortion
ratio for these records. The mean DR is calculated to be 5.3% and 3.7% for the decimation
filter without and with compensation, respectively. Also, the overall mean of the
input/output dissimilarity is estimated to be $0.4 \times 10^{-3}$% and $0.16 \times 10^{-3}$% for the
uncompensated and compensated filters, respectively. The group delay compensation therefore
yields a better performance, in the form of a lower waveform dissimilarity.
Figure 3.18: (a) Waveform Dissimilarity between the input and output of the overall decimation chain
without (blue) and with group delay compensation (red). (b) Input/ Output Distortion Ratios of the
overall decimation chain without (blue) and with phase compensation (red).
The performance of the proposed polyphase filter is also compared to all-pass
based polyphase filters that are designed according to the algorithms provided in the
state of the art. Cascading multiple polyphase structures improves the stopband
attenuation while keeping the cost and complexity low, however at the cost of
increased passband ripples and increased group delay. In order to achieve a better stopband
attenuation, a cascade of two of the proposed polyphase filters is implemented [91].
In addition, a two path polyphase filter with two cascaded all-pass filters in both paths
is implemented according to the algorithm provided in [120], and finally the filter designed
in [121], incorporating 4 cascaded two path polyphase structures with single all-pass filters,
is implemented.
Table 3.2 shows the filter specifications, the number of adders and multipliers required
and the group delay introduced by each filter.
Table 3.2: Two-path All-pass based HB IIR Filter Characteristics Comparison with the State-of-the-art
           Filter specifications                                       Cost                 Group delay variations (samples)
Filter     Order  Passband      Stopband          Normalized transition  Adders  Multipliers  0-0.0625   0-0.125
                  ripples (dB)  attenuation (dB)  bandwidth νT
Proposed   5      0.47 µ        70                0.15                   6       0            1.86       7.61
[91]       10     1 µ           140               0.15                   12      0            3.44       14.17
[120]      9      1.17 µ        70                0.05                   9       4            3.01       12.45
[121]      12     0.1           88                0.05                   11      0            3.63       15.32
3.9 Analysis
The SNR calculated at the modulator and decimation filter outputs, shown in Figure
3.11, demonstrates that the proposed design, with or without compensation, successfully
filters the out-of-band quantization noise while preserving the SNR in the signal band
without detrimental group delay distortion. Figure 3.12 shows that the use of phase
compensation filters reduces the group delay variation in the frequency band
ν = 0-0.125, which directly affects the time delay that each frequency component is
exposed to. Furthermore, Table 3.1 presents the group delay variations in the frequency
bands ν = 0-0.0625 and ν = 0-0.125 in terms of both samples and µsec, since the
proposed filters are designed as discrete-time digital filters. The sample values can
be converted into seconds by dividing them by the oversampling frequencies, which are
fs1 = 46.08 kHz and fs2 = 64 kHz for the MIT-BIH Arrhythmia Database and the TWAC
Database, respectively. To give a more detailed understanding of Table 3.1, consider the
fourth and sixth columns, which present the group delay variation in µsec for the
decimation filters without and with phase compensation in the bands of interest. In the
band ν = 0-0.0625 the group delay variation of the proposed decimation chain is measured
as 29 µsec, which indicates that if a frequency component at DC (ν = 0) is delayed
by 0 seconds, the frequency component at ν = 0.0625 is delayed by 29 µsec. The use of
phase compensation filters reduces this variation to 3.44 µsec. Group delay variation
misaligns the frequency components of the time-domain signal, which results
in signal distortion; thus, lower group delay variation indicates less distortion.
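The conversion from samples to time described above can be sketched as follows. This is a minimal illustration, not part of the original analysis: the function name and the example value of 1.5 samples are hypothetical, and only the two oversampling frequencies are taken from the text.

```python
def group_delay_seconds(delay_samples: float, fs_hz: float) -> float:
    """Convert a group delay expressed in samples to seconds at sample rate fs_hz."""
    return delay_samples / fs_hz


FS1_HZ = 46_080.0  # oversampling frequency quoted for the MIT-BIH records
FS2_HZ = 64_000.0  # oversampling frequency quoted for the TWAC records

# Hypothetical group delay variation, used only to illustrate the conversion.
example_samples = 1.5
delay_us_fs1 = group_delay_seconds(example_samples, FS1_HZ) * 1e6
delay_us_fs2 = group_delay_seconds(example_samples, FS2_HZ) * 1e6
```

The same number of samples corresponds to a shorter time at the higher oversampling frequency, which is why the text reports the variations in both units.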
Although phase compensation reduces the group delay variation, the values obtained
without phase compensation are comparatively low with respect to the wave durations
and normal limits of standard ECG features presented in Table 2.1. This is a good
indication that phase compensation is not compulsory for this application;
nevertheless, further simulations are carried out in order to evaluate the effect of phase
non-linearity on different ECG data records. The results of these simulations are
presented in Figures 3.15 and 3.18, which show the waveform dissimilarity and distortion
ratio (DR) for each ECG data record of the two aforementioned databases. In both figures,
the blue and red bars present the results obtained without and with phase
compensation, respectively. Figure 3.15 (a) presents the waveform dissimilarity for six
TWAC ECG data records, and it can be observed that the phase compensation filters
reduce the waveform dissimilarity. Amongst these six data records, twa90 shows higher
values, which is related to the power of the signal. The twa90 record is observed to
have lower signal power, calculated as the average of the absolute squares of the
time-domain samples, than the other data sets, which shows that the signal of interest
is more affected by the high frequency noise. As the group delay variation increases at
frequencies above ν = 0.125, this leads to increased signal distortion and thus
increased waveform dissimilarity. However, the values presented in this figure are in
the order of 10^-4, which is a negligible mismatch.
In addition, Figure 3.15 (b) presents the distortion ratio, which phase compensation
reduces, as expected. The twa46 and twa90 records present higher ratios than the other
signals, which is due to their lower signal power. As DR is the ratio of the power of
the difference between the magnitude of the original data and the decimation filtered
data to the power of the signal, the DR values can vary for different signals. The DR
results also depend on the frequency content of the signals. When the signal has more
high frequency components it faces more distortion due to the increasing group delay
variation, which affects the effectiveness of the phase compensators, as can be seen
from the difference between the red bars for these two data records. Although phase
compensation filters reduce DR and waveform dissimilarity, the results obtained without
phase compensation are negligible. Therefore, it is not crucial to use phase
compensation filters, which would increase the hardware and algorithm
complexity. Furthermore, Figures 3.18 (a) and (b) present the waveform dissimilarity
and DR for the ECG data records obtained from the MIT-BIH Arrhythmia Database.
Similarly, phase compensation filters reduce the signal distortion for all data records.
The two conditions presented in both figures, ventricular bigeminy (B) and 2° heart
block (BII), exhibit relatively higher waveform dissimilarity and DR than the
other cardiac conditions. The power of these signals is calculated to be lower than
that of the rest of the signals, which leads to a relatively higher distortion ratio
and waveform dissimilarity since, according to (3.11), decreasing signal power
increases both measures.
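The dependence of the distortion ratio on signal power can be illustrated with a short sketch. It assumes, from the verbal description only, that DR is the power of the sample-wise difference divided by the power of the original signal; the exact form of (3.11) is not reproduced here, and the function names and test signals are hypothetical.

```python
def signal_power(samples):
    """Average of the absolute squares of the time-domain samples."""
    return sum(abs(v) ** 2 for v in samples) / len(samples)


def distortion_ratio(original, filtered):
    """Assumed DR: power of the difference signal over power of the original."""
    difference = [o - f for o, f in zip(original, filtered)]
    return signal_power(difference) / signal_power(original)


# Two hypothetical records with the same absolute error but different power.
strong = [1.0, -1.0, 1.0, -1.0]
weak = [0.1, -0.1, 0.1, -0.1]
error = 0.01

dr_strong = distortion_ratio(strong, [v - error for v in strong])
dr_weak = distortion_ratio(weak, [v - error for v in weak])
```

With an identical error, the lower-power record yields the larger ratio, which matches the behaviour reported for the low-power records.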
The simulation results are obtained using two ECG data sets with different diagnostic
importance. The spectral similarity for both sets of data (healthy and unhealthy) is
determined to be approximately the same, owing to the passband characteristics of the
decimation filters. The morphological similarity between the filter input and output is
very high for the filters without compensation in the case of both normal sinus rhythm
and arrhythmia. In addition, since the mean amplitude error is negligibly small and
almost the same for both filters, the use of the compensation filter is not necessary,
thus saving power and cost by avoiding the use of extra hardware. This is due to the
fact that more than 99 % of the signal energy is concentrated in the frequency range
ν = 0-0.125, where the group delay variations are minimal. Although different
results are obtained for each data set, it should be noted that biopotential signals are
non-stationary and there is no single best solution for them. Overall, the results
show that the proposed design does not cause any drastic distortion which might lead
to misdiagnosis.
The comparisons with the state-of-the-art filters and with the filters implemented
according to the state-of-the-art algorithms, presented in Table 3.2, demonstrate that
the proposed polyphase structures are superior to the others since they provide very
small passband ripples (0.47 µdB) and sufficient stopband attenuation with negligible
group delay variation in the band of interest. The proposed design requires no
multipliers, which makes it superior to the filter designed in [120], whilst the two
exhibit almost the same magnitude characteristics. Although [121] is a multiplier-free
design as well, it requires almost twice the number of adders in order to achieve a
stopband attenuation of -80 dB and suffers from large passband ripples. In addition,
the group delay variation introduced by the proposed design is almost half of that
introduced by [120] and [121]. Cascading two polyphase structures provides a very high
stopband attenuation and micro-dB passband ripples. However, the performance of a
single polyphase filter incorporated in a decimation chain is sufficient to
attenuate the out-of-band quantization noise and prevent aliasing; therefore there is no
need for higher order filters with increased group delay variation.
3.10 Conclusions
This chapter presented the design of a decimation chain composed of a fourth-order
Slink filter, two fifth-order allpass based polyphase halfband IIR filters and a
first-order Slink compensator to be employed in ECG monitoring applications. The proposed
decimation chain is a low-complexity design which can be implemented without
multipliers, since the allpass filter coefficients are simply powers of 2 and can
be implemented with shift-add operations. The state-of-the-art decimation filters used
in biomedical applications employ high order FIR filters, which increases the
complexity of the decimators. Thus, the proposed design provides a reduction in
complexity and offers an attractive alternative to the existing solutions in the
literature. Furthermore, it is demonstrated that the phase non-linearity of the IIR
polyphase filters does not cause significant distortion of the morphological and
spectral characteristics of the input ECG signal. This is due to the very narrow, low
frequency range that corresponds to the physiologically significant frequencies of ECG
signals. In other words, these frequency bands are close to DC, where the group delay
variation is already minimal (minimum phase filter) without any group delay compensator.
The high spectral coherence, high morphological correlation and low nV error between
the input and output signals indicate that the IIR polyphase filter introduces
minimal distortion to the signal, which would not affect critical diagnosis; therefore,
phase compensation is not a must for such an application. The work reported in this
chapter is that of a decimation filter for ECG data acquisition systems,
with very efficient filtering performance that delivers low power and low complexity.
The proposed design meets the requirements of demodulating a Σ∆ modulator out-








Implementation of DSP algorithms requires intensive multiply-add operations, which
increases the importance of efficiently realizing the associated hardware for power
limited applications. General purpose multipliers are the most widely used building
blocks; however, they are known to be hardware and power inefficient [122]. Hence, a
vast amount of research is being carried out to reduce their complexity. Nevertheless,
general purpose multipliers are not required for multiplications with fixed constants,
known as Single Constant Multiplication (SCM) or Multiple Constant Multiplication (MCM).
They can therefore be replaced with bit-wise shifts and less complex adders and/or
subtractors in order to reduce the hardware complexity and power consumption of the
system. As adders and subtractors have similar hardware complexity, both will be
referred to as adders in the rest of the document. The complexity of such shift-add
networks depends on the number of adders, since shifting can be realised by hard-wiring
for bit-parallel arithmetic operations, without any additional hardware [123]. Although
replacement of multipliers with fixed shift-add networks decreases the hardware cost,
different design methodologies and techniques have been studied over the years in
order to optimize the resource utilization of these networks for both parallel and
time-multiplexed architectures [122–131].
Most discrete transforms, such as the Discrete Fourier Transform (DFT) and the Discrete
Cosine Transform (DCT), as well as digital filters, employ constant multiplications
that can be implemented as shift-add operations. Discrete wavelet analysis is another
example of a discrete transform that can be realised as a filter bank. It employs fixed
coefficient filters associated with a selected mother wavelet, hence it can benefit from
shift-add network topologies. To the best of the author's knowledge, the replacement of
filter multipliers with these networks has not been investigated in depth in the
biomedical signal processing literature. This chapter presents a hardware efficient
reconfigurable constant multiplier block structure and its FPGA implementation, which
can be employed in time-multiplexed filter structures for wavelet analysis of biomedical
signals such as ECG, EEG, and EMG. This design is based on efficiently employing the
dedicated resources of the FPGA as presented in [124,125], extended to take advantage
of newer FPGA technology. The rest of this chapter provides a brief summary
of existing methods for designing multiplier blocks in parallel and time-multiplexed
filter architectures. The detailed structure of the implemented design and its comparison
with the general purpose parallel multiplier are then presented. Furthermore, a DWT FB
is designed and implemented on an FPGA, and its hardware resource utilization and
power consumption are compared with state-of-the-art designs that are
also multiplier free. The chapter concludes with a results and comparisons section.
4.2 Parallel Architectures and Multiplier Blocks
In a parallel filter architecture as shown in Figure 4.1, multiplications are performed
concurrently. When dealing with constant multiplications, each general-purpose mul-
tiplier shown in the dashed line boxes in Figure 4.1, can be replaced with a separate
fixed shift-add network.
Figure 4.1: Parallel (a) Tapped Delay Line (TDL) and (b) Time Delay and Accumulate (TDA) filter
architectures where the boxes highlight the multiplication blocks.
Minimization of the number of adders used in each network is known as the SCM
problem. The coefficients can be represented in signed digit format, where the number of
non-zero terms determines the number of adders required to generate the corresponding
coefficient. However, instead of implementing separate SCMs, Bull et al. introduced
the primitive operator filter technique, also known as MCM or multiplier blocks [122], to
further reduce the hardware utilization. This method combines multiple constants and
shares the intermediate adder results between the coefficients, generating all
coefficients concurrently. Over the last three decades many different optimization
algorithms have been proposed for reducing the resource utilization of multiplier blocks
for parallel architectures. These can be broadly divided into two concepts, Common
Sub-expression Elimination (CSE) [129–135] and Directed Acyclic Graph (DAG)
[122, 128, 133, 134, 136, 137].
4.2.1 Common Sub-expression Elimination (CSE) Technique
CSE is an optimization technique that maximizes the sharing of common sub-expressions
by exploiting repeating digit patterns among coefficients that are in the Canonic
Signed Digit (CSD) format. CSD is a subset of the signed digit number system in which
constants are represented using the signed digits -1, 0, and 1. It provides a unique
representation for any n-bit two's complement number and gives the minimum number of
non-zero digits compared with both the two's complement and the signed digit formats
[138]. The CSD algorithm searches for a bit string of the form 011...11 that contains
at least two consecutive ones and replaces this pattern with 100...01̄ of the same
length, where 1̄ corresponds to -1. The procedure starts from the least significant bit
(LSB) and proceeds towards the most significant bit (MSB) of a
constant, and continues searching until no adjacent digits are non-zero [139].
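The CSD recoding described above can be sketched in a few lines. This is an illustrative implementation of the standard non-adjacent-form recoding for positive integers, not code taken from [138] or [139]; the function names are hypothetical, and the adder count assumes one adder per non-zero digit beyond the first.

```python
def to_csd(value: int) -> list[int]:
    """Return the CSD digits (-1, 0, 1) of a positive integer, MSB first."""
    if value <= 0:
        raise ValueError("this sketch handles positive integers only")
    digits = []
    while value:
        if value & 1:
            # +1 when value % 4 == 1, -1 when value % 4 == 3: this choice
            # guarantees that the next digit is zero (no adjacent non-zeros).
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits[::-1]


def csd_adder_count(value: int) -> int:
    """Adders needed for a single constant: non-zero CSD digits minus one."""
    return sum(1 for d in to_csd(value) if d != 0) - 1


csd_7 = to_csd(7)    # 8 - 1
csd_29 = to_csd(29)  # 32 - 4 + 1
csd_39 = to_csd(39)  # 32 + 8 - 1
```

For example, 7 needs a single adder (8 - 1) in CSD form, whereas its binary form 111 would need two.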
In order to explain the CSE procedure, let us consider the three coefficient values 7,
29 and 39 and their CSD encodings 001001̄, 1001̄01, and 101001̄, respectively.
According to the CSE method, the most common sub-expression among these coefficients is
1001̄, and this can be shared to reduce the number of adders required. Figure 4.2 shows
the structure of a constant multiplier before and after sub-expression sharing, where
the dashed line box indicates the common expression. The amount of reduction in the


























Figure 4.2: Constant multiplier structure for coefficients 7, 29, 39 (a) before and (b) after CSE
technique.
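The sharing of the sub-expression 1001̄ (that is, 7 = 8 - 1) among the coefficients 7, 29 and 39 can be sketched as below. The two functions are a hand-written illustration of the idea and are not claimed to reproduce the exact structures drawn in Figure 4.2; the adder counts are simply tallied per add or subtract operation.

```python
def multiply_separately(x: int):
    """Each coefficient built from its own CSD digits, with no sharing."""
    adders = 0
    p7 = (x << 3) - x                 # 100-1
    adders += 1
    p29 = (x << 5) - (x << 2) + x     # 100-101
    adders += 2
    p39 = (x << 5) + (x << 3) - x     # 10100-1
    adders += 2
    return (p7, p29, p39), adders


def multiply_shared(x: int):
    """The common sub-expression 7x = 8x - x is computed once and reused."""
    adders = 0
    p7 = (x << 3) - x                 # shared sub-expression
    adders += 1
    p29 = (p7 << 2) + x               # 28x + x
    adders += 1
    p39 = (x << 5) + p7               # 32x + 7x
    adders += 1
    return (p7, p29, p39), adders


separate_products, separate_adders = multiply_separately(3)
shared_products, shared_adders = multiply_shared(3)
```

In this sketch the shared form uses three adders instead of five while producing the same three products.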
Over the years, different techniques have been introduced to identify the most
appropriate common sub-expression for elimination. Hartley's first algorithm
in [129] performs a recursive search in which the most frequently appearing
sub-expressions with two non-zero digits are extracted both from the coefficient itself
and across all the coefficients, whereas Pasko's algorithm [133] searches for patterns
with the most non-zero digits (i.e. the largest) and selects the one with the highest
occurrence frequency. Unlike these techniques, which are based on horizontal search,
Jang et al. [140] introduced a vertical search algorithm for common sub-expressions.
Some other studies combined the two methods and search for repeated digit patterns with a
minimum of two non-zero digits in both the vertical and horizontal directions [130,141].
4.2.2 Directed Acyclic Graph (DAG)
Adder graphs, also known as DAGs, use vertices (nodes) and edges to represent
two-input adders and binary shifts, respectively. A typical adder graph has an initial
vertex (not an adder) that is assigned the value 1 and a terminal vertex which gives the
desired output value. The output of each intermediate vertex corresponds to a partial
sum, also known as a fundamental. Each edge can be assigned any positive or negative
integer power of two, representing the multiplication value. In terms of multiplication,
the input fed into the initial vertex can be named the multiplicand, whereas the output
from the terminal vertex is the product. In order to have integer coefficients, a scaling of
2^n can be applied to fixed-point filter coefficients with LSB 2^-n, and the output can
be rescaled by 2^-n at no additional cost [127]. For further understanding, an example
for the SCM problem is presented in Figure 4.3 in which number 27 is represented using












Figure 4.3: Multiplier graph for 27 represented with (a) CSD and (b) method given in [127]
The adder graph method for implementing multiplier blocks is similar in concept to CSE,
except that common fundamentals are shared among the coefficients instead of common
sub-expressions. However, unlike CSE, fundamentals are not required to be in a certain
number format to obtain an optimal solution. Different heuristic approaches have been
published for finding a minimal solution for multiplier blocks using this
method. Adder graphs were first employed by Bull and Horrocks (BHM algorithm)
[122], where a set of fundamentals is created and used as an input for a new vertex to
create a new set of values. The BHM algorithm was further optimized by Dempster and
Macleod, who introduced the Reduced Adder Graph (RAG-n) algorithm [127], which
utilizes pre-stored Minimum Adder Graph (MAG) tables to generate graphs of
coefficients with already existing fundamentals. Unlike BHM, RAG-n can generate
partial sums by using values greater than the sum itself (i.e. 7 = 8 - 1, rather than
7 = 4 + 2 + 1) and uses only odd values, in order to maximize the number of fundamentals.
Figure 4.4 presents two adder graphs generated by the BHM and the RAG-n algorithms





















Figure 4.4: Adder graphs generated for coefficient set 1, 7, 16, 21 and 33 by (a) BHM and (b) RAG-n [127]
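To make the notion of shared fundamentals concrete, the sketch below evaluates a small hand-built adder graph for the coefficient set 1, 7, 16, 21 and 33. The graph is an assumption made for illustration: it is not the output of the BHM or RAG-n algorithms and is not claimed to match either graph in Figure 4.4. Each entry is one two-input adder, and edges are modelled as left shifts.

```python
# Each tuple: (fundamental, left source, left shift, sign, right source, right shift)
# Hypothetical hand-built graph; source 1 denotes the input vertex.
GRAPH = [
    (7, 1, 3, -1, 1, 0),    # 7  = 8*1 - 1
    (21, 7, 1, +1, 7, 0),   # 21 = 2*7 + 7, reusing the fundamental 7
    (33, 1, 5, +1, 1, 0),   # 33 = 32*1 + 1
]


def evaluate(graph, x: int) -> dict:
    """Propagate the multiplicand x through the adder graph."""
    values = {1: x}
    for name, left, lshift, sign, right, rshift in graph:
        values[name] = (values[left] << lshift) + sign * (values[right] << rshift)
    return values


def products(x: int) -> dict:
    """Products for the full coefficient set; 16 is a pure shift and needs no adder."""
    v = evaluate(GRAPH, x)
    return {1: v[1], 7: v[7], 16: v[1] << 4, 21: v[21], 33: v[33]}


adder_count = len(GRAPH)
example = products(5)
```

Here three adders generate all five products, because 21 is built from the already available fundamental 7 rather than from its own CSD digits.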
4.3 Time-multiplexed Architectures and Multiplier Blocks
For applications where the sample rate is lower than the maximum obtainable clock
frequency, time-multiplexing is an efficient way of fully utilizing FPGA resources and
reducing hardware cost. Time-multiplexed structures re-use hardware resources via
time-sharing, which reduces the number of adders and multipliers employed. These
designs are resource efficient compared with the parallel ones, and different filter
structures such as FIR, IIR, and polyphase filters, as well as filter banks, can be
implemented as time-multiplexed structures. Figure 4.15 (a) shows a conventional
time-multiplexed Tapped Delay Line (TDL) FIR architecture. Similar to the parallel
architectures, the general-purpose multipliers as well as the coefficient memory can be
substituted with a multiplier block for complexity and area reduction [125, 142].
Several studies employed multiplier blocks in time-multiplexed architectures by
introducing multiplexers, which add reconfigurability to the multiplier blocks by
providing the flexibility of choosing between different coefficients.
Turner et al. [143] explore the design of Reconfigurable Multiplier Blocks (ReMBs) on
FPGA devices by exploiting the resource redundancy created while mapping algorithms on
FPGAs via the CSE method. Based on a similar idea, Demirsoy et al. [125] developed an
optimization algorithm for time-multiplexed architectures that takes advantage of the
dedicated FPGA resources using the DAG method. Demirsoy's algorithm is based on the
creation of a table of the DAGs for all possible fundamentals and searches for the ones
that can be shared efficiently. The algorithm is created to provide an optimal solution
for Single-Input-Single-Output (SISO) and Single-Input-Multiple-Output (SIMO) systems
such as time-multiplexed filter banks. On the other hand, in [142] a more generalized
algorithm based on the fusion of single coefficient DAGs is introduced, which provides
area-efficient ReMB designs for ASIC implementations.
4.4 Reconfigurable Multiplier Blocks for FPGAs
4.4.1 ReMBs for 4-series FPGAs
The older series of Xilinx FPGAs such as the Virtex-4 are composed of basic building
blocks including Configurable Logic Blocks (CLBs), DSPs, programmable interconnect,
and I/O. Each CLB contains four slices where each slice has two 4-input LUTs, carry-
chain logic, and two flip-flops. A simplified schematic of a Xilinx 4-series half-slices are


































Figure 4.5: Xilinx 4-series (a) Configurable logic block and (b) simplified half slice [144].
A 1-bit full adder can be implemented using the dedicated carry-chain logic and an
LUT for the remaining XOR gate. The concept of reconfigurability in multiplier blocks
employs a multiplexer whose output is connected to at least one input of the adder.
The multiplexer can be implemented using the unused pins of the LUT that is used for
the XOR gate. This idea was first introduced by Turner et al. [143], in which a 4-input
LUT followed by the dedicated fast carry logic is used to implement reconfigurable
multiplier blocks with the CSE technique. Later, Demirsoy et al. [125] proposed an
improved algorithm by using "basic structures" fused into the adder graph technique,
where the structure of the ReMB is not dependent on the coefficient representations.
The main aim of both techniques is to make full use of FPGA logic elements in order
to create a set of coefficients for SISO and SIMO systems.
A basic structure is simply a two-input adder with at least one of its inputs connected to
a multiplexer that can be implemented with a 4-input LUT. Due to the available FPGA
logic technology, Demirsoy's algorithm is based on the use of 2:1 multiplexers (muxes),
which can be implemented on the FPGA slices at no additional cost. Figure 4.6
















2aA 2b0B0 for S  0



















2aA  2b0B0 for S  0




















2aA 2b0B0 for S  0



















2aA  2b0B0 for S  0
2aA 2b1B1 for S  1
Figure 4.6: (a) Basic structure with 2:1 multiplexer and 4-input LUT mapping for (b) addition, (c)
subtraction and (d) addition/subtraction.
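A behavioural model of the basic structure of Figure 4.6 can be sketched as follows. It is only an arithmetic illustration under stated assumptions: the function and parameter names are hypothetical, shifts are restricted to non-negative values for simplicity, and the add or subtract choice is tied to the mux select line as in the FPGA mapping of panel (d).

```python
def basic_structure(a_in, b_inputs, a_shift, b_shifts, subtract_on, select):
    """Sum = 2^a * A +/- 2^b_S * B_S for a 2:1 mux with select line S in {0, 1}.

    subtract_on[S] is True when the adder subtracts for that select value,
    modelling a select line that also drives the Add/Sub control.
    """
    a_term = a_in << a_shift
    b_term = b_inputs[select] << b_shifts[select]
    return a_term - b_term if subtract_on[select] else a_term + b_term


x = 3  # multiplicand
# Hypothetical configuration: A = B0 = B1 = x, a = 3, b0 = b1 = 0.
sum_s0 = basic_structure(x, (x, x), 3, (0, 0), (False, True), select=0)  # 8x + x
sum_s1 = basic_structure(x, (x, x), 3, (0, 0), (False, True), select=1)  # 8x - x
```

With this configuration a single basic structure delivers the two fundamentals 9x and 7x, selected at run time by S.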
The basic structure given in Figure 4.6 (a) is capable of generating two different sums
at the output depending on the select line (S) of the mux. If the adder can operate in a
switchable adder/subtractor mode controlled via the 'Add/Sub' input, then
the basic structure generates four different outputs, provided that there is a separate
control signal for the operation of the adder. The "≪" sign at the LUT
inputs represents a bit-wise left shift by the corresponding positive or negative values
(i.e. a, b0, and b1), which is equivalent to multiplying the inputs A, B0, and B1 by
powers of two. The Cin input of the basic structure given in Figure 4.6 represents the
carry bit from the previous, less significant stage or the initial carry bit in an n-bit
full adder carry chain. A basic structure with a 2:1 mux implements
Sum = 2^a A ± 2^b0 B0 or Sum = 2^a A ± 2^b1 B1, thus it produces different fundamentals
(i.e. partial sums) for the multiplier block. On an FPGA, the adder/subtractor has
limited functionality since the mux select line (S) controls both the mux and the
operation of the adder/subtractor, as shown in Figure 4.6 (d). Therefore, a basic
structure can only produce two different partial products in an FPGA design. Inputs of
muxes can be connected to the input of the ReMB, to the output of another basic
structure, or to ground. In order to implement a set of coefficients, a number of these
basic structures can be interconnected in chain form (i.e. horizontally cascaded) and
tree form (i.e. inputs of a mux connected to the output of another basic structure).
The number of generated coefficients at the output depends on the basic structure
topology, the number of basic structures and how they are interconnected. For example,
if two basic structures as given in Figure 4.6 (b) are interconnected (both with 2
different outputs) then the output set size is equal to 4 (2 × 2). To find a valid ReMB
design for a target coefficient set, it is critical to determine the required depth of
the design (depth_remb) and the adder cost of each coefficient. The depth of a design
represents the number of cascaded stages required
to obtain the required number of coefficients, which can be calculated using (4.1).
N = n^(2^k - 1)    (4.1)
where N is the size of the coefficient set at the output node, n is the maximum number of
fundamentals that can be generated with a basic structure (i.e. n = 2) and k is the
number of cascaded basic structures (i.e. layers). In addition, in order to generate each
coefficient, a minimum number of cascaded adders is required on each path between
the input and output nodes, known as the adder-cost (cost_adder) [145]. The adder cost







where nz(C) represents the number of non-zero terms for CSD coefficients. It is important
to note that the minimum adder depth for a design may require more adders to be
employed than the number required for a minimal adder solution. Thus, following these






Let us consider a set of coefficients C1 = {7, 19} with set size N = 2. The adder-cost for
7 (8 - 1) is one, whereas for 19 (16 + 2 + 1) it is two. Thus, a ReMB with a depth of two
should be designed. On the other hand, a set C2 = {9, 15, 33, 63} with N = 4, which
consists of coefficients with adder depth one, requires a ReMB depth of two as well. An
example adder graph which shows chain and tree form basic structure interconnections,
along with the maximum number of outputs, is demonstrated in Figure 4.7 [125].
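The depth reasoning in the two examples above can be sketched as below. Two assumptions are made and should be treated as such: (4.1) is read as N = n^(2^k - 1), which agrees with the values N = 8, k = 2 and n = 2 quoted for Figure 4.7, and the required ReMB depth is taken as the larger of the layer count implied by (4.1) and the largest adder-cost in the set, which reproduces both worked examples. The function names are hypothetical.

```python
def output_set_size(n: int, k: int) -> int:
    """Maximum coefficient set size for k layers, reading (4.1) as n^(2^k - 1)."""
    return n ** (2 ** k - 1)


def min_layers(num_coeffs: int, n: int = 2) -> int:
    """Smallest k whose maximum output set size covers num_coeffs."""
    if n < 2:
        raise ValueError("a basic structure must offer at least two outputs")
    k = 1
    while output_set_size(n, k) < num_coeffs:
        k += 1
    return k


def remb_depth(num_coeffs: int, adder_costs, n: int = 2) -> int:
    """Assumed rule: depth must satisfy both the set size and the adder-costs."""
    return max(min_layers(num_coeffs, n), max(adder_costs))


depth_c1 = remb_depth(2, [1, 2])        # C1 = {7, 19}
depth_c2 = remb_depth(4, [1, 1, 1, 1])  # C2 = {9, 15, 33, 63}
```

For C1 the depth is set by the adder-cost of 19, whereas for C2 it is set by the number of coefficients that must be selectable at the output.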
Figure 4.7: Example adder graph with N = 8, k = 2 and n = 2 [125].
4.4.2 ReMBs for 7-series FPGAs
In the previous section, the concept of a ReMB utilizing 4-input LUTs for implementing
a 2:1 mux and a full adder was presented. More recent FPGAs such as the Kintex-7
replace the 4-input LUTs with 6-input LUTs, which increases the number of logic
functions that can be realized by a single LUT by a factor of 2^48 (2^64 possible
functions instead of 2^16). A simplified half-slice of a Xilinx



































Figure 4.8: Simplified half slice of Xilinx 7-series [146].
In addition, the new LUTs can be configured to use six inputs with one output, or
five or fewer inputs with two outputs. With the increased number of input
ports of the LUTs, the method introduced above can be expanded by replacing the 2:1
mux in the basic structure with either a 3:1 or a 4:1 mux at no additional cost and the

















2aA 2b0B0 for S0  0, S1  0
2aA 2b1B1 for S0  1, S1  0






















2aA  2b0B0 for S0  0, S1  0
2aA  2b1B1 for S0  1, S1  0






















2aA 2b0B0 for S0  0, S1  0
2aA 2b1B1 for S0  1, S1  0






















2aA  2b0B0 for S0  0, S1  0
2aA 2b1B1 for S0  1, S1  0
2aA  2b1B1 for S0  0, S1  1
2aA 2b2B2 for S0  1, S1  1
Figure 4.9: (a) Basic structure with 3:1/4:1 multiplexer and its 6-input LUT mapping for (b) addition,
(c) subtraction and (d) addition/subtraction.
If the adder structure is either an adder or a subtractor then the basic structure can
produce three distinct results, as presented in Figures 4.9 (b) and (c), respectively.
Figure 4.9 (b) represents an addition operation coupled with a 3:1 mux, where signals
S1 and S0 represent the MSB and LSB of the mux control signal. In this configuration,
S1 = S0 = 0 selects the first input of the mux, B0 scaled by 2^b0, which is added to
2^a A and results in one of the three possible outputs, Sum = 2^a A + 2^b0 B0. Similarly,
Figure 4.9 (c) presents a subtraction operation in which the inputs are inverted and the
subtractor is fed with a 'hot one' through the carry input of the LSB. For S1 = 1 and
S0 = 0 the output of the mux is 2^b1 B1, which is subtracted from 2^a A, resulting in
Sum = 2^a A - 2^b1 B1. If, instead, the adder is a switchable adder/subtractor then the
basic structure can generate four distinct outputs, as shown in Figure 4.9 (d). A 4:1 mux
requires 6 input ports, two for the select lines and four for the data inputs. Since the
7-series FPGAs consist of only 6-input LUTs, the one extra input needed for the addition
would require an additional LUT, which is not desired.
Therefore, one of the inputs of the mux must be shared amongst the others. Although
input sharing might seem redundant and unnecessary, it actually provides more
reconfigurability to the design. In addition, by switching the ports of the input data
to the mux, three distinct combinations of output set are obtained with the same inputs,
which is due to the mux select line (S0 or S1) controlling both the mux and the operation
of the adder/subtractor. This simple modification adds more reconfigurability to the
basic structure and reduces the hardware complexity further, especially for high order
filters. Furthermore, the flexibility of using a 6-input LUT as a 5-input, 2-output LUT
makes it possible to add the outputs of two 2:1 muxes. This allows
fundamentals such as 5 (4 + 1) and 15 (16 - 1) to be realized in one structure,
which was not possible before. The new structure and its mapping on an LUT with
switchable adder/subtractor is demonstrated in Figure 4.10, and each input can be
added or subtracted similar to Figures 4.9 (b) and (c). Due to the dedicated resources





































Sum = 2^a A0 ± 2^b0 B0 for S = 0
Sum = 2^a A1 ± 2^b1 B1 for S = 1
Figure 4.10: (a) Basic structure with two 2:1 muxes and (b) its mapping on a 5-input, 2-output LUT.
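The structure of Figure 4.10, in which both adder operands come from 2:1 muxes driven by the same select line, can be modelled arithmetically as below. This is a sketch under assumptions: the names are hypothetical, the A inputs are taken to be hard-wired shifted copies of the multiplicand, and the add or subtract choice is tied to the select line. It reproduces the fundamentals 5 (4 + 1) and 15 (16 - 1) mentioned in the text.

```python
def dual_mux_structure(a_inputs, b_inputs, a_shift, b_shifts, subtract_on, select):
    """Sum = 2^a * A_S +/- 2^b_S * B_S, with one select line S shared by both muxes."""
    a_term = a_inputs[select] << a_shift
    b_term = b_inputs[select] << b_shifts[select]
    return a_term - b_term if subtract_on[select] else a_term + b_term


x = 3  # multiplicand
config = dict(
    a_inputs=(x << 2, x << 4),   # assumed hard-wired 4x and 16x
    b_inputs=(x, x),
    a_shift=0,
    b_shifts=(0, 0),
    subtract_on=(False, True),   # S = 0 adds, S = 1 subtracts
)
five_x = dual_mux_structure(select=0, **config)     # 4x + x
fifteen_x = dual_mux_structure(select=1, **config)  # 16x - x
```

A single-mux basic structure cannot switch both operands at once, which is why these two fundamentals need the dual-mux form.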
4.5 ReMB for Daubechies Filters in Biomedical Applications
As mentioned earlier in this chapter, the DWT can be implemented as a filter bank with a
tree structure which employs constant coefficient scaling and wavelet filters. The
coefficients of these filters are irrational and vary according to the selected mother
wavelet. The Daubechies-4 (db4) wavelet, with four vanishing moments and eight
coefficients, is a popular wavelet that has been used in many different biomedical
signal processing applications. These applications include ECG denoising [147,148] and
feature extraction [149,150], EEG spike detection [151,152] and artifact removal [153],
and motion detection using EMG [154]. Due to the wide application area of
db4, a ReMB is designed and implemented for a Kintex-7 FPGA using the principles
described in the previous sections. However, ReMBs can also be designed for other
wavelets.
4.5.1 db4 Filter Coefficient Quantization
Floating-point implementation of the wavelet transform leads to perfect reconstruction
(PR) FBs, meaning that the reconstructed output perfectly matches the input data.
The accuracy of the DWT therefore depends on the precision of the decomposition and
reconstruction filter coefficients, as well as the precision of the FB internal data-path.
Quantization of the filter coefficients deteriorates the filter characteristics and
operation, and consequently the PR characteristics of the DWT. In addition, the
coefficient word-length plays a significant role in the design of the ReMBs, since the
structure of the multiplier block depends on the desired coefficient precision. Longer
word-lengths result in an increased number of adders and thus higher resource utilization.
On the other hand, an insufficient number of bits deteriorates the filter
characteristics and the smoothness of the wavelet and scaling functions, as quantization
with lower precision displaces the filter zeros away from z = -1 (i.e. the Nyquist
frequency). In order to evaluate the effect of filter coefficient quantization, the
input and output data as well as the internal data-path of the DWT FB are designed
to have floating-point precision, whereas the decomposition and reconstruction filter
coefficients are quantized with various precisions and employed in a multi-level
decomposition and reconstruction procedure. The coefficient word-lengths are varied from
7 bits (6 fractional bits and 1 integer bit) to 15 bits (14 fractional bits and 1 integer
bit). Two realistic noise-free synthetic ECG and EEG signals are simulated and fed as
input data to the DWT FB for evaluation purposes. The synthetic ECG waveform is
generated using the model introduced by McSharry et al. [48], whereas the synthetic
EEG waveform is simulated by feeding white noise into a 7th order AR model which
is fitted to a real noise-free EEG record obtained from Physionet [114]. The MSE
and the Signal-to-Error Ratio (SER) between the floating-point inputs and the
reconstructed outputs of the fixed-point coefficient DWT FB are measured. Figure 4.11
shows the measured error values for different coefficient word-lengths for both
ECG (blue) and EEG (red) data for a 5-level DWT. As can be observed, increasing
the coefficient word-length decreases the introduced MSE and improves the SER.
Figure 4.11: Estimated (a) MSE and (b) SER of the reconstructed output with various filter coefficient
precision.
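The two error measures used above can be sketched as follows. This is a minimal illustration, not the thesis's evaluation script: the function names and the toy data are hypothetical, and the SER is assumed to be the ratio of signal energy to error energy expressed in dB.

```python
import math

def mse_db(reference, reconstructed):
    """Mean squared error between two equal-length sequences, in dB."""
    err = [(r - y) ** 2 for r, y in zip(reference, reconstructed)]
    return 10 * math.log10(sum(err) / len(err))

def ser_db(reference, reconstructed):
    """Signal-to-error ratio in dB (assumed: signal energy over error energy)."""
    signal = sum(r * r for r in reference)
    error = sum((r - y) ** 2 for r, y in zip(reference, reconstructed))
    return 10 * math.log10(signal / error)

x = [1.0, 2.0, 3.0]   # toy "input" (hypothetical data)
y = [1.5, 2.5, 3.5]   # toy "reconstruction" with a constant error of 0.5
print(mse_db(x, y), ser_db(x, y))
```

In the experiment described above, `reference` would be the floating-point input and `reconstructed` the output of the DWT FB with quantized coefficients.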
Figure 4.12(a) presents the filter responses at each decomposition level where D1-
D5 are the highpass branch responses at levels 1 to 5, respectively and A5 is the
lowpass branch response at level 5. Figure 4.12(b) compares the wavelet and scaling
filter responses, whereas Figure 4.12(c) presents the db4 scaling and wavelet functions
that are obtained with the floating-point (blue) and 11-bit (red) coefficients. Finally,
Figure 4.12(d) shows the Pole-Zero Plane (PZP) of the floating and fixed-point scaling
filter. An MSE of approximately -72 dB and an SER of approximately 140 dB are observed
with a coefficient word-length of 11 bits (10 fractional bits). In addition, Figures
4.12(b) and (c) demonstrate that coefficients represented with 11 bits preserve the
smoothness and the characteristics of the wavelet and scaling filters and functions.
The smoothness is directly linked to the number of zeros at the Nyquist frequency and the PZP of
the quantized filter demonstrates that the two zeros moved towards z  1 where
the remaining two are still closely placed. Therefore, the error introduced at the
reconstructed output with 11-bit coefficients is decided to be negligible for this study,
as such error is not observable with the naked eye and the wavelet properties are
preserved.

















































































Figure 4.12: Frequency response of (a) fixed-point filters at each decomposition level, (b) wavelet
and scaling filters with floating-point and 11-bit fixed-point coefficients (red), (c) scaling and wavelet
functions associated with db4, and (d) pole-zero plane of the floating-point and 11-bit fixed-point
coefficients of the db4 scaling filter.
4.5.2 ReMB Structures for db4 Filters
The fixed-point db4 filter coefficients and their integer values (scaled by $2^{10}$) are given
in Table 4.1.
Table 4.1: Lowpass and highpass, analysis and synthesis filter coefficients
      Synthesis Filters                               Analysis Filters
      Low-pass                High-pass               Low-pass                High-pass
      Fixed-point    Integer  Fixed-point    Integer  Fixed-point    Integer  Fixed-point    Integer
b0    0.23046875     236      -0.0107421875  -11      -0.0107421875  -11      -0.23046875    -236
b1    0.71484375     732      -0.033203125   -34      0.033203125    34       0.71484375     732
b2    0.630859375    646      0.03125        32       0.03125        32       -0.630859375   -646
b3    -0.0283203125  -29      0.1875         192      -0.1875        -192     -0.0283203125  -29
b4    -0.1875        -192     -0.0283203125  -29      -0.0283203125  -29      0.1875         192
b5    0.03125        32       -0.630859375   -646     0.630859375    646      0.03125        32
b6    0.033203125    34       0.71484375     732      0.71484375     732      -0.033203125   -34
b7    -0.0107421875  -11      -0.23046875    -236     0.23046875     236      -0.0107421875  -11
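The integer values in Table 4.1 can be reproduced with a short sketch. The double-precision coefficients below are an assumption: they are the commonly tabulated db4 scaling filter values, not numbers taken from this thesis, and `quantize` is a hypothetical helper that rounds to the nearest multiple of $2^{-10}$.

```python
# Double-precision db4 scaling (lowpass synthesis) coefficients. These are an
# assumption taken from standard Daubechies tables, not from the thesis.
DB4 = [0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
       -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
       0.032883011666982945, -0.010597401784997278]

def quantize(coeffs, frac_bits):
    """Round each coefficient to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    ints = [round(c * scale) for c in coeffs]
    return ints, [i / scale for i in ints]

ints, fixed = quantize(DB4, 10)   # 11-bit word: 1 integer bit + 10 fractional bits
print(ints)
print(fixed)
```

Under these assumptions the integer list matches the low-pass synthesis column of Table 4.1; the other three filters follow by reversing the order and alternating the signs.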
The lowpass ($h_0(k)$) and highpass ($h_1(k)$) db4 filters employed in both the synthesis
and analysis FBs are power complementary, which means that all four filters share the
same coefficient magnitudes, with reversed order and alternating signs (Table 4.1). Thus,
there are only eight distinct coefficient magnitudes and a single ReMB structure is designed
for all filters using the basic structures introduced in Subsection 4.4.2, with an additional
2:1 mux at the ReMB output to select between positive and negative coefficients. First,
$\mathrm{ReMB}_{depth}$ is calculated according to (4.1) where $n = 4$ and $N = 8$, which shows
that 2 cascaded stages are required. Then, the maximum adder cost over all eight coefficients
is calculated to be 2. Thus, (4.3) states that the depth of the ReMB is 2. Although
$\mathrm{ReMB}_{depth}$ provides a topological minimum, it does not consider the operations
required by each coefficient and the limitations of each basic structure for an FPGA
implementation. Therefore, in order to combine the fundamentals of all coefficients, three
basic structures are interconnected in a tree structure and the LSBs of the mux select lines
are used to control the adder/subtractor operation. The adder costs and shift-add operations
for the db4 coefficients are shown in Table 4.2 and the structure of the proposed ReMB
(referred to as Design 1 in the rest of this section) is presented in Figure 4.13.
Table 4.2: Fixed-point (11-bit) db4 wavelet filter coefficients, their adder costs and shift-add format
used to design the proposed ReMB.
Z      Adder Cost   Shift-Add
11     2            $2^2(2+1) - 1$
34     1            $2(2^4+1)$
32     0            $2^5$
192    1            $2^6(2+1)$
29     2            $2^2(4+1) + (2^3+1)$
646    2            $2(2^6(4+1) + (2+1))$
732    2            $2^2(2^6(2+1) - (2^3+1))$
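The shift-add forms of Table 4.2 can be checked numerically. The sketch below is only an arithmetic illustration (the function name `shift_add` is hypothetical and it does not model the mux structure of Figure 4.13): each product is formed with left shifts, additions and subtractions and compared against an ordinary multiplication.

```python
def shift_add(x):
    """Return {coefficient: coefficient * x} using only shifts, adds and subtracts."""
    x3 = (x << 1) + x        # 3x = (2 + 1)x
    x5 = (x << 2) + x        # 5x = (4 + 1)x
    x9 = (x << 3) + x        # 9x = (2^3 + 1)x
    return {
        11: (x3 << 2) - x,             # 2^2(2+1) - 1
        34: ((x << 4) + x) << 1,       # 2(2^4+1)
        32: x << 5,                    # 2^5
        192: x3 << 6,                  # 2^6(2+1)
        29: (x5 << 2) + x9,            # 2^2(4+1) + (2^3+1)
        646: ((x5 << 6) + x3) << 1,    # 2(2^6(4+1) + (2+1))
        732: ((x3 << 6) - x9) << 2,    # 2^2(2^6(2+1) - (2^3+1))
    }

# Exhaustive check over the signed 11-bit input range.
for x in range(-1024, 1024):
    for z, product in shift_add(x).items():
        assert product == z * x
```

The terms 3x, 5x and 9x appear in several rows, which is what allows a small number of adders to be shared across coefficients.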




























Figure 4.13: The ReMB designed for db4 wavelet filters.
Design 1 targets FPGA platforms: it takes advantage of the dedicated fast carry logic
and implements the multiplexers at no additional cost, as described before. The operation
of the adders is controlled by the LSBs of the 2-bit mux select lines, S0:S2, which are
denoted as ‘Add/Sub’. The adder performs an addition when Add/Sub = 0 and a subtraction
when Add/Sub = 1. However, when non-FPGA technologies are targeted, the ReMB can be
redesigned with the increased flexibility of using larger muxes. In FPGA platforms, the
resource cost of an individual mux with more than three inputs is comparable to that of
an adder. For instance, a 4-bit 4:1 mux and a 4-bit full adder each utilize four LUTs,
whereas for non-FPGA technologies the multiplexer cost is relatively lower [142].
Therefore, a second multiplier block is designed using the CSE method by searching for
common patterns in the CSD representations of the coefficients, as shown in Table 4.3.
Table 4.3: CSD encoded coefficients and common sub-expressions (-1 denotes the negative digit $\bar{1}$)
Z     2^10  2^9   2^8   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
11    0     0     0     0     0     0     1     0     -1    0     -1
34    0     0     0     0     0     1     0     0     0     1     0
32    0     0     0     0     0     1     0     0     0     0     0
192   0     0     1     0     -1    0     0     0     0     0     0
29    0     0     0     0     0     1     0     0     -1    0     1
646   0     1     0     1     0     0     0     1     0     -1    0
732   1     0     -1    0     0     -1    0     0     -1    0     0
236   0     0     1     0     0     0     -1    0     -1    0     0
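The CSD digits in Table 4.3 can be generated with a standard conversion routine. The sketch below is an illustration only (the function name `to_csd` is hypothetical); it produces digits in {-1, 0, 1} with no two adjacent non-zero digits, and does not perform the common sub-expression search itself.

```python
def to_csd(value, width=11):
    """Canonical signed digit form of a non-negative integer, MSB first."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)   # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    digits += [0] * (width - len(digits))   # pad up to the table width
    return digits[::-1]

for z in (11, 34, 32, 192, 29, 646, 732, 236):
    csd = to_csd(z)
    # Re-evaluate the signed digits to confirm they represent z.
    assert sum(d << (len(csd) - 1 - i) for i, d in enumerate(csd)) == z
    print(z, csd)
```

Each non-zero digit beyond the first costs one adder or subtractor, so the number of non-zero digits gives a quick upper bound on the adder count of an unshared implementation.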
Two common patterns are detected (shaded areas in Table 4.3) and shared across all
coefficients. This design, referred to as Design 2 in the rest of this section, is shown
in Figure 4.14 (a) and employs three adders, one 2:1 mux and two 6:1 muxes. Both
structures are compared to the structure created with the DAG fusion algorithm using the
SPIRAL online tool (referred to as Design 3 in the rest of this section), which aims to
minimize the adder cost [142,155]. Design 3 is given in Figure 4.14 (b) and employs
three adders, three 2:1 muxes and one 6:1 mux. For both designs, a controller
is responsible for generating the select signals and the ‘Add/Sub’ signals for the muxes







































Figure 4.14: Constant multiplier blocks designed for db4 filter coefficients. (a) Design 2 and (b)
Design 3 [155]
4.5.3 db4 Filters and DWT Filter Bank Architectures
Biomedical signals have frequency bands up to a few kHz, hence they require comparably
slow operating frequencies. Recent FPGAs can operate at up to a few GHz; therefore,
time-multiplexed architectures can be easily used and the hardware utilization of the
DWT FB can be massively reduced. A conventional time-multiplexed TDL filter is composed
of an input memory, a coefficient memory and a single Multiply-Accumulate unit with a
General Purpose (GP) multiplier [156]. Such a filter structure operates sequentially.
At every cycle, the incoming data is multiplied with one coefficient stored in a memory
and this process is controlled with a simple control unit, typically a counter. Each
generated product is accumulated with the previous one by using an accumulator and a
register. The structure of the conventional filter is shown in Figure 4.15 (a).
























Figure 4.15: Time-multiplexed TDL FIR filter implemented using (a) a general purpose multiplier and
coefficient memory, and (b) the proposed ReMB block replacing multiplier and coefficient memory.
On the other hand, the proposed ReMB produces the product of the input and the selected
coefficient at each clock cycle, which eliminates the need for a coefficient memory. As
can be observed from Figure 4.15 (b), a multiplexer placed after the ReMB selects between
the generated product or its complement. Here, the controller is responsible for
addressing the correct coefficient for each tap by generating the correct control signals
for the multiplexers and adders/subtractors employed in the ReMB as well as for the
multiplexer after it.
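The behaviour of the two time-multiplexed filters can be compared with a short behavioural sketch. This is not a model of the hardware in Figure 4.15: the function names are hypothetical, the shift-add forms follow Table 4.2, and the decomposition used for 236 is an assumption since that coefficient is not listed in Table 4.2.

```python
H0 = [-11, 34, 32, -192, -29, 646, 732, 236]   # lowpass analysis taps, scaled by 2**10

def remb(x, magnitude):
    """Shift-add product magnitude * x for the eight db4 coefficient magnitudes."""
    x3, x5, x9 = (x << 1) + x, (x << 2) + x, (x << 3) + x
    products = {
        11: (x3 << 2) - x, 34: ((x << 4) + x) << 1, 32: x << 5, 192: x3 << 6,
        29: (x5 << 2) + x9, 646: ((x5 << 6) + x3) << 1, 732: ((x3 << 6) - x9) << 2,
        236: ((x << 6) - x5) << 2,   # assumed decomposition, not listed in Table 4.2
    }
    return products[magnitude]

def fir_time_multiplexed(samples, taps=H0):
    """One output per input sample; one product per 'clock cycle' (inner loop)."""
    memory = [0] * len(taps)                 # input memory (delay line)
    out = []
    for sample in samples:
        memory = [sample] + memory[:-1]
        acc = 0
        for count in range(len(taps)):       # up-counter addresses tap and sample
            product = remb(memory[count], abs(taps[count]))
            acc += -product if taps[count] < 0 else product   # output mux: +/- product
        out.append(acc)
    return out

def fir_reference(samples, taps=H0):
    """Direct convolution with ordinary multiplications, for comparison."""
    return [sum(t * samples[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(samples))]

signal = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
assert fir_time_multiplexed(signal) == fir_reference(signal)
```

The inner loop corresponds to the eight clock cycles needed per output sample; no coefficient values are stored, only the tap signs and magnitudes that the controller would encode as select lines.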
The controller is simply an up-counter followed by a decoder, where the counter generates
the address to control the input memory and the decoder decodes these addresses
to generate the mux select lines. Table 4.4 presents the select line values required to
generate the lowpass analysis filter ($h_0(k)$) coefficients. S0:S3 correspond to the
select lines of the muxes given in Figure 4.13 and S4h0 is the select line of the mux at
the output of the ReMB given in Figure 4.15 (b). Select line values 0, 1, 2 and 3 choose
the mux inputs from top to bottom, with 0 and 3 selecting the top and bottom inputs,
respectively, whereas X is a Don’t Care indicating that the corresponding muxes and adders
are not employed in the generation of that coefficient.
Table 4.4: Select line (S0:S4) values for multiplexers given in Figures 4.13 and 4.15 (b) to generate the
lowpass analysis filter coefficients.
      Z      S0   S1   S2   S3   S4h0
b0    -11    1    1    1    0    1
b1    34     0    0    1    1    1
b2    32     3    X    3    0    1
b3    -192   3    1    2    0    0
b4    -29    2    0    0    0    1
b5    646    0    0    2    1    0
b6    732    2    1    2    2    1
b7    236    1    3    1    2    0
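The counter-plus-decoder behaviour can be sketched as a lookup indexed by a 3-bit counter. This is a behavioural illustration only: the names `SELECT_ROM` and `controller` are hypothetical, the rows are copied from Table 4.4, and the gate-level decoder logic of the actual controller is not modelled.

```python
# Decoder contents copied from Table 4.4: (S0, S1, S2, S3, S4h0) per tap.
# None stands for a don't-care (X) entry.
SELECT_ROM = [
    (1, 1, 1, 0, 1),      # b0 = -11
    (0, 0, 1, 1, 1),      # b1 = 34
    (3, None, 3, 0, 1),   # b2 = 32
    (3, 1, 2, 0, 0),      # b3 = -192
    (2, 0, 0, 0, 1),      # b4 = -29
    (0, 0, 2, 1, 0),      # b5 = 646
    (2, 1, 2, 2, 1),      # b6 = 732
    (1, 3, 1, 2, 0),      # b7 = 236
]

def controller(cycles):
    """Yield (address, select lines) for each clock cycle of a 3-bit up-counter."""
    count = 0
    for _ in range(cycles):
        yield count, SELECT_ROM[count]
        count = (count + 1) % len(SELECT_ROM)   # wraps after the eighth tap

for address, selects in controller(8):
    print(address, selects)
```

The same address drives the input memory, so the select lines and the sample read from memory stay aligned tap by tap.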
Figure 4.16 presents the structure of the control unit designed for controlling $h_0(k)$,
where $S_{x1}$ and $S_{x0}$ represent the MSB and LSB of the mux select line $S_x$, respectively. In
addition, A, B and C represent the decoded counter output from MSB to LSB, respectively,
while $\bar{A}$, $\bar{B}$ and $\bar{C}$ are their complements. The truth tables used to generate this














Figure 4.16: The controller for the lowpass analysis filter ($h_0(k)$).
The filter structures presented in Figure 4.15 are employed in two tree-structured one-level
analysis FBs as the lowpass ($h_0(k)$) and the highpass ($h_1(k)$) db4 filters. The
architectures of both FBs are shown in Figures 4.17 (a) and (b). For the conventional
FB with GP multipliers, two separate coefficient Read-Only Memories (ROMs), a single input
memory and a single controller are used. On the other hand, for the FB given in
Figure 4.17 (b), a bigger decoder is designed to simultaneously control both ReMBs in
$h_0(k)$ and $h_1(k)$ and the input memory; this decoder replaces the two ROMs
employed in Figure 4.17 (a) and is presented in Figure 4.18. The truth tables used to


























Figure 4.17: One level analysis filter bank comprised of a lowpass ($h_0(k)$) and highpass ($h_1(k)$) time-
























Figure 4.18: The controller for the analysis filter bank.
4.6 Hardware Validation and Cost Assessment
For hardware validation, cost assessment and performance evaluation, the aforementioned
three multiplier block designs (Figures 4.13 and 4.14) and the 1-level analysis filter
banks employing conventional FIR filters and filters with the proposed ReMB (Figure 4.17)
are synthesized and implemented on a Kintex-7 (xc7k325tffg900) FPGA in Vivado v16.2,
using the System Generator for DSP in the Matlab/Simulink environment. The resource
utilization of the multiplier blocks is reported in terms of LUT and slice counts,
together with the critical path delay ($Delay_{cp}$) expressed in terms of the adder
operation time, $\tau_a$. In addition, the filter bank resource utilization is presented
in terms of LUTs and flip-flops, and the maximum operating frequencies and power
consumption figures are compared with the designs available in the open literature.
4.6.1 Multiplier Block Cost Assessment
The multiplier block structures introduced in Subsection 4.5.2 and presented in Figures
4.13 (Design 1) and 4.14 (Design 2 and Design 3) each employ three adders. However,
while Design 1 and Design 2 have an adder cost (i.e. maximum number of adders
cascaded from the input to the output) of two, the adder cost of Design 3 is three, which
is a significant factor for the data-path delay and power consumption of the multiplier
blocks. All three structures are synthesized, placed and routed in Vivado v16.2, and
signed 8-bit and signed 11-bit data are fed as inputs to each multiplier structure. It
should be noted that the internal data-path of the multiplier structure is kept at full
precision in order to avoid quantization errors. Therefore, Design 1 employs a 15-bit and
a 16-bit adder/subtractor in the first stage and a 23-bit adder/subtractor in the second
stage of the ReMB. Table 4.5 presents the resource utilization figures after place and
route in terms of LUTs and slices, which are compared to each other as well as to the
off-the-shelf Xilinx Multiplier LogiCORE™ (version 12.0) [157].
Table 4.5: Resource utilization of individual MCM blocks that are designed using the ReMB, CSE
and DAG fusion methods as well as the Xilinx Multiplier LogiCORE™.
                           Design 1     Design 2     Design 3     Xilinx Multiplier LogiCORE™
Input word-length (bits)   8     11     8     11     8     11     8     11
LUT                        70    82     102   122    117   144    114   169
Slice                      21    26     31    34     34    41     34    53
$Delay_{cp}$               $2\tau_a$    $2\tau_a$    $3\tau_a$    $\tau_m$
As mentioned earlier, Design 1 targets FPGA platforms and uses the dedicated logic
efficiently owing to the placement of multiplexers before adders. Thus, it utilizes the
least amount of resources when compared to the other structures. For an 11-bit input,
which is a typical precision for biomedical signals, Design 1 employs 82 LUTs and 26
slices, which is 51% less than the Xilinx Multiplier LogiCORE™. Design 2, on the other
hand, reduces the LUT utilization by 27.8% compared to the Xilinx Multiplier LogiCORE™.
Design 2 employs fewer muxes and provides a better solution than Design 3, which makes it
a better candidate for non-FPGA platforms when implementing db4 filters for WT. In other
words, the DAG fusion [142] algorithm is not optimal in terms of adder cost and number of
multiplexers used, and it requires a more sophisticated controller. Nevertheless, the
results demonstrate that Design 1 is the optimal structure for the db4 filters on FPGA
platforms and can be used for replacing the parallel multipliers and coefficient memory.
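The percentage savings quoted above follow directly from the 11-bit columns of Table 4.5. The sketch below only reproduces that arithmetic; the dictionary and function names are hypothetical.

```python
# LUT and slice counts for an 11-bit input, copied from Table 4.5.
LUTS = {"Design 1": 82, "Design 2": 122, "Design 3": 144, "LogiCORE": 169}
SLICES = {"Design 1": 26, "Design 2": 34, "Design 3": 41, "LogiCORE": 53}

def reduction(cost, reference):
    """Percentage saving of `cost` relative to `reference`."""
    return 100.0 * (reference - cost) / reference

for name in ("Design 1", "Design 2", "Design 3"):
    print(name,
          round(reduction(LUTS[name], LUTS["LogiCORE"]), 1),      # LUT saving (%)
          round(reduction(SLICES[name], SLICES["LogiCORE"]), 1))  # slice saving (%)
```

Both the LUT and the slice savings of Design 1 round to 51%, while the 27.8% figure for Design 2 corresponds to its LUT count.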
4.6.2 Time-Multiplexed FIR Filters Cost Assessment
The Xilinx Multiplier LogiCORE™ and Design 1 are also employed in time-multiplexed
TDL filter architectures as shown in Figures 4.15 (a) and (b), respectively, and their
resource utilization is compared. The resource utilization of each architecture after
place and route is given in Table 4.6 in terms of LUTs and Flip-Flops (FFs). In addition,
the critical path delays ($Delay_{cp}$) of the multiplier and the multiplier block are
given in terms of the adder and multiplier operation times, $\tau_a$ and $\tau_m$,
respectively.
Table 4.6: Resource utilization of the time-multiplexed TDL filters with the proposed ReMB and the
Xilinx Multiplier after place and route on a Xilinx Kintex-7 device.
Filter Resource Utilization
LUT                     169   98¹
Max. Frequency (MHz)    82    82
¹ This number includes the 2:1 mux placed after the ReMB which selects between the positive and
negative coefficients.
It should be noted that pipelining is not considered for these implementations; however,
for the ReMB it can be achieved by adding registers after the adders and multiplexers
[125]. Table 4.6 shows that the resource utilization of Design 1 is lower than that of
the reference design. The cost of the reference design is estimated as 219 LUTs, 141 FFs
and 93 slices, where the multiplier on its own costs 169 LUTs. Design 1, on the other
hand, achieves high savings against the reference design. The overall filter cost is
reduced by 33%, which is achieved by a 39% reduction in LUTs and a 23% reduction in FFs,
where the ReMB on its own utilizes 43% fewer LUTs. In terms of critical path delay,
multiplexer delays are not included in the path delay in FPGA implementations since they
are embedded into LUTs; therefore, it is only critical to consider the logic depth of the
adders. Multiplier blocks have a reduced logic depth compared to multipliers, which
reduces the critical path delay. Thus, Design 1 offers the optimal solution for db4
filters in FPGA-targeted applications.
4.6.3 1-level Analysis Filter Bank Cost Assessment
The time-multiplexed filters implemented using the proposed ReMB and the general purpose
multiplier are employed in a 1-D analysis DWT filter bank in order to provide insight
into a whole system. For this purpose, a conventional tree architecture is implemented
where two analysis filters ($h_0(k)$ and $h_1(k)$) for one-level decomposition are
employed. The structures of the filter bank employing the multiplier-free filters and of
the control unit are presented in Figures 4.17 and 4.18, respectively. In order to
validate the performance of the system, an 8-bit random signal and an 11-bit ECG data
record from the MIT-BIH Arrhythmia database are used. The resource utilization after
place and route for both filter banks is presented in Table 4.7 in terms of LUTs,
registers, adders and multipliers. In addition, Table 4.7 compares the resource
utilization, maximum clock frequency and dynamic power consumption figures (if
applicable) of other multiplier-free db4 analysis filter bank implementations from the
open literature with those of the proposed ReMB implementation. System power consumption
is estimated at a clock speed of 50 MHz in order to have a fair comparison with the
literature, and the Xilinx Power Estimator tool is used for a more accurate analysis
[158]. The one-level analysis FB employing the proposed ReMB utilizes 255 LUTs and
129 FFs and achieves a maximum operating frequency of 90.38 MHz, where the power
consumption is estimated as 3.404 mW at a 50 MHz clock frequency for 11-bit ECG input
data.
In [159], Wahid et al. presented a matrix-based Algebraic Integer Quantization (AIQ)
mapped and a conventional fixed-point 1-level decomposition db4 filter bank architecture
for image processing applications. The db4 transform coefficients are implemented using
the CSD representation to replace multipliers with shift-add networks. It is reported
that 22 adders are required and the hardware cost is listed as 734 and 692 LUTs for the
AIQ and fixed-point (FP) based implementations, respectively, for coefficients with
10-bit precision. A more recent study by Hasan et al. [160] proposed two lifting-based
architectures for the db4 wavelet filters. Here, the filter coefficients were divided
into lifting steps and shift-add networks were used for implementing two 1-level
decomposition FBs without multipliers. The lifting scheme is used for lowering the
resource utilization of DWT FBs since polyphase matrices are used and the structure
halves the computational complexity. However, as the number of wavelet filter
coefficients increases, the calculation of the lifting coefficients becomes more
sophisticated and more arithmetic operations are required to represent them, which
increases their complexity. The resource utilizations of the two FBs were reported as
470 LUTs and 133 registers for Scheme 1 and 389 LUTs and 101 registers for Scheme 2.
The proposed ReMB based filter bank employs 216 LUTs and 108 registers, whereas the GP
multiplier based FB employs 315 LUTs and 137 registers, for 8-bit input data. These
results demonstrate that, although a conventional tree filter bank structure is used,
the proposed structure requires the least hardware resources, where a hardware
complexity reduction of 46% and 34% is achieved when compared to the two schemes
presented by Hasan et al., respectively. It is critical to note that these massive
savings are achieved through the ReMB itself and no filter bank architecture
optimization such as polyphase implementation is used. In [159], power consumption
figures of 6.8 and 8.1 mW were also presented for the two implemented architectures,
whereas the power consumption of the proposed and the reference designs is 2.93 and
4.41 mW, respectively. However, the technologies used in the literature are relatively
old compared to the technology used in this work, thus it is not fair to compare the
power figures. Therefore, the power consumption figures are compared against the
reference analysis filter bank and it can be observed that the use of the ReMB improves
the dynamic power consumption by 50.5% and 62% for 8-bit and 11-bit input data,
respectively. This demonstrates that the increase in the power











































































































































































































































































































































































































































































An efficient implementation of the db4 wavelet and scaling filters is presented here that
employs a specifically designed ReMB. It is shown that the addition of multiplexers into
shift-add networks provides reconfigurability to the well-known constant multiplication
blocks. By taking advantage of recent FPGA technologies with 6-input LUTs, 3:1/4:1 muxes
are employed in the design of ReMBs at no additional hardware cost, which updates the
concepts proposed in the open literature. In order to evaluate the resource and power
efficiency of the proposed structure, the proposed ReMB is employed in time-multiplexed
FIR filters and conventional DWT FBs, which are implemented on a Kintex-7 FPGA platform
and compared to reference designs employing parallel multipliers and to designs from the
open literature. Although there is a substantially diverse literature on efficient FPGA
and Very-Large-Scale Integration (VLSI) implementations of the wavelet transform, to the
best of the author’s knowledge, the application of reconfigurable multiplier blocks with
a structure optimized for FPGA platforms has not been investigated in the field of
biomedical signal processing. The replacement of multipliers in the DWT with shift-add
networks has been the subject of research in image processing and image compression
applications; however, reconfigurable constant multiplications have not been studied. As
the results demonstrate, the proposed ReMB massively reduces the resource utilization
when compared to parallel multipliers. The ratio of the savings increases with increasing
input word-length, as the number of adders in the parallel multiplier increases while the
number of adders in the ReMB remains the same. When employed in a time-multiplexed FIR
filter architecture, the additional 2:1 mux and the controller increase the hardware
utilization of the filter structure; however, the proposed design still utilizes 39%
fewer logic elements than the reference design. Furthermore, the 1-level analysis filter
bank cost assessment results also demonstrate that the proposed system massively improves
the resource utilization and power consumption compared to the open literature and the
conventional reference design.
4.8 Conclusions
In this chapter, an alternative implementation method is investigated in order to reduce
the hardware complexity and power consumption of DWT filter banks. A practical
approach referred to as ‘ReMB’, which replaces the conventional GP parallel multipliers,
is presented. This approach takes advantage of the constant coefficients and replaces
each multiplication with simpler shift and add operations. From the work presented in
this chapter, it can be concluded that the proposed ReMB approach simplifies the
implementation of DWT filter banks and that hardware and power reductions of up to 50%
can be achieved. The performance of the ReMB approach is also compared to
state-of-the-art multiplierless implementations of the presented wavelet family, and it
can be observed that the presented approach achieves the highest savings with no other
architectural or algorithmic optimization. These savings stem from the reduction in the
number of addition operations with the aid of large (3:1/4:1) multiplexers, which come
at no additional cost in FPGA implementations. The FPGA implementation results show that
the proposed approach is low-cost and power efficient compared to other FPGA
implementations. Therefore, it can be concluded that the ReMB
structures are suitable for DWT filter banks and can be used for ASIC implementa-




IIR Wavelet Filter Banks for
Biomedical Signal Processing
5.1 Introduction
As mentioned in Chapter 2, the DWT can be realised as a two-channel PR quadrature
mirror filter (QMF) bank in which the input data is decomposed by iterating the
lowpass branch of the analysis filter bank, and reconstructed through the synthesis filter
bank. Although the most commonly used wavelets are realised with non-recursive filters
that have a finite impulse response, recursive filters with an infinite impulse response
can also be used for implementing wavelet filter banks. In DSP applications, IIR filters
have an advantage over their FIR counterparts, as they can achieve comparable filter
specifications such as passband ripple, stopband attenuation and transition bandwidth
with much lower filter orders, which leads to fewer arithmetic operations, lower memory
requirements and hence lower system delay. Furthermore, realizing IIR filters with
polyphase networks composed of allpass filters further reduces the computational burden
and makes the filter more robust to coefficient quantization. Thus, IIR filters are more
desirable for low-power and low-complexity applications where coefficient precision is a
significant factor to consider. In the wavelet transform literature, a vast amount of
research has employed FIR filter banks in many application areas such as biomedical,
communication, audio, image and video processing [76,79,80,148,152,162], whereas IIR
wavelet filter banks have been studied relatively less and limited to image processing
and compression applications [163–166]. The use of non-linear phase IIR filters in the
analysis filter bank generally leads to synthesis filters that are either causal but
unstable or stable but non-causal. Therefore, the design of IIR based filter banks with
the PR property is more challenging than that of their FIR based counterparts, which is
the main reason for the limited application area and interest. However, the
computational simplicity of IIR filter based analysis filter banks makes them an
appealing alternative to FIR filter based ones, which motivated the study presented in
this chapter.
The rest of this chapter presents the desired properties and design procedure of
orthogonal IIR wavelet analysis filter banks which are realised as parallel connections
of real allpass filters. The IIR wavelet analysis filter design problem is thereby
reduced to allpass filter design, for which the Remez exchange algorithm [167] based on
eigenvalue decomposition is used. Furthermore, the problem of non-causal IIR synthesis
filters is investigated and a novel hybrid solution is proposed where the analysis and
synthesis filter banks employ IIR and FIR filters, respectively. To the best knowledge of
the author, this hybrid solution is the first in the biomedical signal processing and
wavelet literature to offer a reduced hardware complexity solution for DWT
implementation in portable health monitoring systems with limited size and power.
Validation and cost assessment studies are carried out and comparative results are also
presented.
5.2 Orthogonal IIR Wavelet Analysis Filter Banks
The analysis part of a two-channel PR IIR filter bank can be realised with a halfband
lowpass and a halfband highpass filter denoted by $H_0(z)$ and $H_1(z)$, respectively. These
filters are based on the parallel connection of two real all-pass filters ($A_0(z)$ and $A_1(z)$):

$$\begin{bmatrix} H_0(z) \\ H_1(z) \end{bmatrix} = \frac{1}{2}\begin{bmatrix} A_0(z^2) + z^{-1}A_1(z^2) \\ A_0(z^2) - z^{-1}A_1(z^2) \end{bmatrix} \tag{5.1}$$

where $A_0(z)$ and $A_1(z)$ are $M$th order allpass filters with real coefficients $\alpha_m$ and a
general transfer function,









As can be observed from (5.1), $H_0(z)$ and $H_1(z)$ are power complementary filters
since they satisfy (5.3), where $z = e^{j\omega}$.

$$\left|H_0(e^{j\omega})\right|^2 + \left|H_1(e^{j\omega})\right|^2 = 1 \tag{5.3}$$
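The power complementary property can be checked numerically for any pair of stable allpass branches. The sketch below is an illustration under stated assumptions: it uses first-order allpass sections with arbitrary, hypothetical coefficients `c0` and `c1`, and it assumes the factor of 1/2 in the sum and difference of the two branches.

```python
import cmath
import math

def allpass1(z, c):
    """First-order real allpass section (c + z^-1) / (1 + c z^-1), |c| < 1."""
    return (c + 1 / z) / (1 + c / z)

def halfband_pair(w, c0=0.1, c1=0.5):
    """H0 and H1 at frequency w, with hypothetical allpass coefficients c0 and c1."""
    z = cmath.exp(1j * w)
    a0 = allpass1(z * z, c0)          # A0(z^2)
    a1 = allpass1(z * z, c1) / z      # z^-1 A1(z^2)
    return (a0 + a1) / 2, (a0 - a1) / 2

# Largest deviation of |H0|^2 + |H1|^2 from 1 over a grid on [0, pi].
worst = max(abs(abs(h0) ** 2 + abs(h1) ** 2 - 1)
            for h0, h1 in (halfband_pair(math.pi * k / 256) for k in range(257)))
print(worst)
```

The deviation stays at rounding-error level because both branches have unit magnitude on the unit circle, so the sum of the squared magnitudes of their half-sum and half-difference is always one.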
A one-level IIR QMF filter bank based on the polyphase components $A_0(z)$ and $A_1(z)$





















Figure 5.1: Polyphase realization of two-channel IIR QMF bank.
The scaling and wavelet functions associated with the aforementioned filters can be
obtained by iterating the filter bank $J$ times on its lowpass branch as shown in (5.4).
This results in the transfer functions $\Phi(z)$ and $\Psi(z)$ with lowpass and bandpass
spectra, where their impulse responses are the scaling
 





















It is well known that the regularity of a wavelet is a very important property since
it defines the smoothness of the wavelet function and is useful for detecting
discontinuities in signals, such as electrode movements in biopotential signal recordings.
It is directly related to the wavelet’s vanishing moments, i.e. the number of times the
wavelet spectrum vanishes (goes to zero) at $\omega = 0$, i.e. $\Psi(e^{j\omega})\big|_{\omega=0} = 0$
where $z = e^{j\omega}$.
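Therefore, the aforementioned $H_0(z)$ and $H_1(z)$ require an additional flatness condition,

$$\left.\frac{\partial^k H_0(e^{j\omega})}{\partial \omega^k}\right|_{\omega=\pi} = 0 \quad \text{for } k = 0, 1, \dots, K-1 \tag{5.5}$$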
where $K$ corresponds to the number of zeros of $H_1(z)$ at $z = 1$ and of $H_0(z)$ at $z = -1$,
i.e. the Nyquist frequency. In other words, the associated wavelet function has $K$
consecutive vanishing moments [164]. The power complementary property given in
(5.3) reduces the design procedure to the design of $H_0(z)$. For a
given filter order, a trade-off between frequency resolution and wavelet regularity exists.
Therefore, it is critical to identify the needs of the application and select the best
possible frequency selectivity for a given flatness condition [165]. In this study, the
IIR wavelet design methodology introduced by Zhang et al. [165] is adopted for
implementing the IIR wavelet filters with the selected filter and regularity order, and
the design steps are introduced in this section. This methodology employs the Remez
exchange algorithm based on the eigenvalue problem in order to calculate the allpass
filter coefficients with an added flatness condition. Recalling (5.1), $H_0(z)$ is defined
as the parallel summation of the two allpass filters $A_0(z)$ and $A_1(z)$, and it can be
re-written as,












  A1 pz2q










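is an allpass filter with real coefficients $a_m$ and $a_0 = 1$. The phase response of
$z^{-1}U(z^2)$ can be expressed as,

$$\theta(\omega) = 2\phi(\omega) \tag{5.8a}$$

$$\phi(\omega) = \tan^{-1}\frac{\sum_{m=0}^{M} a_m \sin(2m-M-0.5)\omega}{\sum_{m=0}^{M} a_m \cos(2m-M-0.5)\omega}. \tag{5.8b}$$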
Following this, the frequency response of H0 pzq can be calculated by evaluating (5.1)
on the unit circle and the magnitude response can be written as,
H0  ejω   cos θ pωq
2
(5.9)




am p2mM  0.5q2n1  0 for n  1, 2, ..., N, (5.10)
for ω  pi, where N  K  1
2
, such that K is odd, N is an integer which is 0 ¤ N ¤M
and M is the allpass filter order. In order to achieve maximum regularity which leads to
maximally flat H0 pzq, N should be equal to the allpass filter order M . For cases when
0 ¤ N   M , the Remez algorithm is used to achieve equiripple magnitude response
by approximating either the passband r0, ωps or the stopband rωs, pis ripples. For a
half-band IIR filter implemented with polyphase components, ωp   ωs  pi, hence its
magnitude response is characterized by,
H0  ejωp2   H0  ejppiωpq2  1. (5.11)
Therefore, the approximation of the passband region is sufficient for calculating the
filter coefficients which further simplifies the IIR wavelet filter design procedure. The
first step in the Remez algorithm is the identification of the extremal frequencies (ωi)
114
in the band of interest where the magnitude response will be recursively calculated to




\frac{\sum_{m=0}^{M} a_m \sin(2m - M - 0.5)\omega_i}{\sum_{m=0}^{M} a_m \cos(2m - M - 0.5)\omega_i} = (-1)^i \delta (5.12)




which can be written in matrix form such that PA = δQA and solved as a generalized
eigenvalue problem, where δ is an eigenvalue and A is the corresponding eigenvector
with the allpass filter coefficients. The P and Q matrices are given in (5.13) and (5.14).
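P = \begin{bmatrix}
(-M-0.5) & (1.5-M) & \cdots & (M-0.5) \\
\vdots & \vdots & & \vdots \\
(-M-0.5)^{2N-1} & (1.5-M)^{2N-1} & \cdots & (M-0.5)^{2N-1} \\
\sin(-M-0.5)\omega_0 & \sin(1.5-M)\omega_0 & \cdots & \sin(M-0.5)\omega_0 \\
\vdots & \vdots & & \vdots \\
\sin(-M-0.5)\Theta & \sin(1.5-M)\Theta & \cdots & \sin(M-0.5)\Theta
\end{bmatrix} (5.13)

Q = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 \\
\cos(-M-0.5)\omega_0 & \cos(1.5-M)\omega_0 & \cdots & \cos(M-0.5)\omega_0 \\
\vdots & \vdots & & \vdots \\
\gamma\cos(-M-0.5)\Theta & \gamma\cos(1.5-M)\Theta & \cdots & \gamma\cos(M-0.5)\Theta
\end{bmatrix} (5.14)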
where γ = (−1)^{M−N} and Θ = ω_{M−N}. In order to find the optimal allpass filter
coefficients, vector A, the absolute minimum eigenvalue should be found. Once the
coefficients are identified, the poles of U(z) are computed, where the poles inside and
reciprocals of the poles outside the unit circle are assigned as poles of A0(z) and A1(z),
respectively.
The aforementioned algorithm is used to generate three 9th order IIR wavelet filters
with different regularities (i.e. number of zeros at the Nyquist frequency), as examples.
For all filters, the allpass filter order M = 4 and the passband edge frequency ωp = 0.4π
(i.e. νp = 0.2 where ω = 2πν) are selected. The first design, presented in Figure 5.2a,
exhibits a maximally flat filter response (i.e. Butterworth filter) with K = 9 (i.e.
N = 4). As can be observed in Figure 5.2a, all filter zeros are placed at z = −1 and
z = 1 for H0(z) (blue) and H1(z) (red), respectively. For a 9th order filter, nine zeros at
z = −1 (i.e. nine vanishing moments) is the maximum achievable regularity, thus the
filter is maximally flat. The second extreme case, with no flatness condition imposed on
the IIR wavelet filter, is presented in Figure 5.2b. Here, K = 1 and the resulting filter
exhibits equiripple passband and stopband characteristics, which is no different from
a conventional elliptic IIR filter. Looking at these two border examples, the trade-off
between the regularity and the frequency selectivity can be observed. In other words,
as the number of zeros at z = −1 and z = 1 increases, the transition bandwidth of H0(z) and
H1(z) gets wider, or vice versa, which degrades the frequency selectivity of the halfband





Figure 5.2: 9th order IIR lowpass (blue) and highpass (red) filters;
Maximally flat with K = 9 and N = 4; (a) Magnitude response, (b) Pole-Zero Plane
Elliptic with K = 1 and N = 0; (c) Magnitude response, (d) Pole-Zero Plane
Intermediate with K = 5 and N = 2; (e) Magnitude response, (f) Pole-Zero Plane
The wavelet and scaling functions associated with the maximally flat, elliptic and
intermediate filters are presented in Figures 5.3a, 5.3b, and 5.3c, respectively. It can be
observed that the wavelet and scaling functions associated with the maximally flat
design decay faster than those of the other two designs, which is due to the higher number of
vanishing moments, demonstrating the effect of an increased number of vanishing moments on
the wavelet decay speed.
Figure 5.3: Scaling and Wavelet functions for 9th order (a) Maximally flat, (b) Elliptic, and (c)
Intermediate IIR wavelet filters.
5.3 IIR Wavelet Synthesis Filter Banks
In wavelet theory one of the most important properties of the implemented filter bank is
the perfect reconstruction of a decomposed signal. Recalling Figure 5.1, the relationship
between the filter bank input and the reconstructed output for a PR QMF wavelet filter
bank can be expressed as;
\hat{X}(z) = X(z) D_L(z) + X(-z) D_A(z) (5.15)
where X(z) and X̂(z) are the input and the reconstructed output of the filter bank,
respectively, X(−z) is the alias component of X(z) as a result of the downsampling
operation, D_A(z) is the distortion transfer function caused by aliasing and D_L(z) is the linear
distortion transfer function which relates to the amplitude and phase characteristics of







The aliasing distortion is cancelled if D_A(z) = 0, which can be achieved by having
synthesis filters G0(z) = H1(−z) and G1(z) = −H0(−z). Also, if the analysis filters satisfy
H0(z) = H1(−z), then the filters are called quadrature mirror filters, thus the filter bank









and in order to have X̂(z) = X(z) then D_L(z) = c z^{-d}, where c is some constant and d
is the delay introduced by the system. Therefore, the QMF banks guarantee complete
aliasing cancellation but the condition D_L(z) = c z^{-d} is only met if the designed filters
have linear phase. As described in Section 5.2, the analysis as well as the synthesis
filters can be represented with polyphase components which are presented in Figure






















By substituting (5.18) into (5.16) and (5.17), the aliasing and linear distortion functions
of the analysis-synthesis filter bank become;




where (5.19a) satisfies the complete alias cancellation while (5.19b) is an allpass filter,
which indicates that the amplitude distortion is eliminated. However, the perfect signal
reconstruction is dependent on the phase responses of A0(z) and A1(z), thus phase
distortion persists unless a linear phase is obtained between the input and the output
of the system. Theoretically, the phase distortion can be cancelled by a synthesis filter
bank whose polyphase components have the inverse transfer functions of the allpass
filters of the analysis filter bank as given by (5.20) and presented in Figure 5.4. This
is a common practice for the FIR filter banks since the inverse of an FIR filter transfer
function is easily achieved by the time-reversal (i.e. flipping) of the filter coefficients










































Figure 5.4: One level IIR wavelet filter bank in polyphase structure with causal stable analysis and
stable but anti-causal synthesis filters.
Meanwhile, the linear distortion transfer function with the new synthesis filters is
obtained by inserting (5.20a) and (5.20b) into (5.17) which becomes a function of both









Both A0pz2qA0pz2q and A1pz2qA1pz2q demonstrate the phase non-linearity is com-
pensation as well as the cancellation of the amplitude and aliasing distortion. How-
ever, as it can be observed from (5.20b) that the time reversal of the IIR filter impulse
response leads to either causal and unstable or anti-causal and stable filters, thus im-
plementation of the synthesis filters is not as straight forward as FIR synthesis filter
banks. This work presents two solutions for realising the anti-causal synthesis filters
where the first solution employs the block processing method introduced by Powell and
Chau [169] and the second one uses FIR approximations of the anti-causal IIR syn-
thesis filters using analytical expressions [168] to achieve novel hybrid IIR/FIR wavelet
systems for low-complexity biomedical applications.
5.3.1 The Block Processing Technique
In order to obtain a causal and stable implementation for the synthesis filter bank
for infinite length input signals, the block processing technique introduced by Powell
and Chau [169] is employed in this work, which was originally introduced to achieve
approximate linear phase IIR filters. The feedback path in the IIR filter structure
results in an infinite duration impulse response which continues with decaying amplitudes.
Thus, physical realisation of such a filter results in finite duration since the parasitic
oscillations cannot be fully represented with the resolution supported by finite precision.
Therefore, truncation naturally occurs due to the hardware limitations, which indicates
that the filter response can be approximated with a truncated version of the impulse
response. The length of the truncated impulse response is dependent on the filter order since
higher orders result in longer impulse responses.
Truncation enables the implementation of non-causal time reversed filters in real time
by using the overlap-add method [170]. In this method, an infinite length input
sequence is divided into non-overlapping small blocks with a sufficient number of samples
and each block is filtered by two causal and stable IIR filters. Then the outputs are
truncated and combined to get the continuous output sequence. The size of the input
blocks is proportional to the length of the impulse response. In theory, the length of
the input blocks (L) must be larger than the length of the physical realization of the
impulse response (h(k)) so that the filter output length is less than or equal to 2L − 1







x_n(k) = \begin{cases} x(k), & nL \le k \le (n+1)L - 1 \\ 0, & \text{elsewhere} \end{cases} (5.22)
where x(k) is the infinitely long input and x_n(k) represents the non-overlapping blocks
of length L. To compute the non-causal time reversed filter output, each block of the
input sequence is convolved with the time reversed impulse response (h(−k)) with
the assumption that h(k) is of finite length and equals 0 for k > L. Then the output
of each block is computed by the addition of two (2L − 1)-length convolution outputs
from the adjacent input blocks of length L as shown in (5.23).








x pmqh pk mq  
k L¸
mpn 1qL
x pmqh pk mq







where y_n(k) is the output of each input block, X_{L,n} and X_{T,n+1} consist of the first and
the last L outputs of the adjacent input blocks, respectively, and y(k) is the non-causal
time reversed filter output [169]. Thus, the real-time implementation of this procedure
requires the following steps;
1. Store the input sequence into an L-length Last-in First-out (LIFO) register to
obtain a time reversed version of each input segment, x_n(−k).
2. 2L-length output sections are computed by filtering x_n(−k) with a_{0,1}(k), i.e.
a_{0,1}(k) * x_n(−k).
3. Add the last L output samples of the current sequence and the first L output
samples of the previous sequence, which can be achieved by introducing a delay
of 2L samples into the path resulting in y_n(k).
4. Finally, store the L samples of y_n(k) into an L-length LIFO register which will


















Figure 5.5: Implementation of non-causal time reversed allpass filter using Powell and Chau technique
[169].
Figure 5.5 presents the block diagram for a standalone realization of the A0,1(z^{-1}) filter, which
comprises two identical copies of the A0,1(z) filter, two L-length LIFO registers, a 2L-length
shift register and an adder.
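The block processing steps above can be illustrated with a minimal offline sketch in Python. It is not the Simulink structure of Figure 5.5: the LIFO and shift registers are replaced by list reversal and an accumulator array, and a first order allpass A(z) = (α + z^{-1})/(1 + αz^{-1}) is assumed. The function names (`allpass`, `time_reversed_allpass_blocks`, `anticausal_reference`) and the test signal are hypothetical and chosen only for illustration.

```python
import math

def allpass(x, alpha):
    """Causal first-order allpass A(z) = (alpha + z^-1) / (1 + alpha z^-1)."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for s in x:
        out = alpha * s + x_prev - alpha * y_prev
        y.append(out)
        x_prev, y_prev = s, out
    return y

def time_reversed_allpass_blocks(x, alpha, L):
    """Approximate the anti-causal filter A(z^-1) block by block."""
    n = len(x)
    acc = [0.0] * (n + 2 * L)              # acc[k + L] accumulates output sample k
    for start in range(0, n, L):
        block = list(x[start:start + L])
        block += [0.0] * (L - len(block))  # zero-pad a short final block
        # Step 1: time-reverse the block; step 2: 2L causal filter outputs.
        w = allpass(block[::-1] + [0.0] * L, alpha)
        for i, v in enumerate(w):
            k = start + L - 1 - i          # step 4: undo the time reversal
            acc[k + L] += v                # step 3: overlap-add with neighbours
    return acc[L:L + n]

def anticausal_reference(x, alpha, taps=200):
    """Direct evaluation of y(k) = sum_j h(j) x(k + j) with a long impulse response."""
    h = [alpha] + [(1 - alpha ** 2) * (-alpha) ** (j - 1) for j in range(1, taps)]
    n = len(x)
    return [sum(h[j] * x[k + j] for j in range(min(taps, n - k))) for k in range(n)]

x = [math.sin(0.3 * k) + 0.5 * math.cos(1.1 * k) for k in range(64)]
y_ref = anticausal_reference(x, 1.0 / 3.0)
for L in (8, 16):
    y_blk = time_reversed_allpass_blocks(x, 1.0 / 3.0, L)
    print(L, max(abs(a - b) for a, b in zip(y_blk, y_ref)))
```

In this sketch every output sample sees at least the first L + 1 taps of the impulse response, so the mismatch against the reference shrinks roughly as α^L when the block size grows, which is the block size versus accuracy trade-off discussed in the text.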
5.3.2 Approximation of anti-causal IIR filters by causal FIR
filters
One way of causal implementation of A0(z^{-2}) and A1(z^{-2}) is the block processing
method presented above. For this method the allpass filter impulse responses are
truncated to a certain number of samples in order to approximate the anti-causal filtering
operations via stable and causal filters. The selected number of samples approximately
corrects the phase distortion created by the analysis allpass filters up to a certain
error. The block processing method is equivalent to having an FIR filter of order L
which can approximate the amplitude and phase characteristics of the allpass filter but
with the ease of flipping the filter coefficients without worrying about instability or
anti-causality. These FIR filter coefficients can be calculated by a simple polynomial
expression given in (5.24) that are simply the Markov parameters of the state-space





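presented whereas (5.24b) and (5.24c) demonstrate the truncated and the time-reversed
truncated impulse responses of the allpass filters A0,1(z), respectively [168].
A_{0,1}(z) = \frac{z^{-1} + \alpha_{0,1}}{1 + \alpha_{0,1} z^{-1}} (5.24a)
\tilde{A}_{0,1}(z) = \alpha_{0,1} + (1 - \alpha_{0,1}^2) \sum_{k=1}^{L_{0,1}-1} (-1)^{k-1} \alpha_{0,1}^{k-1} z^{-k} + (-1)^{L_{0,1}-1} \alpha_{0,1}^{L_{0,1}-1} z^{-L_{0,1}} (5.24b)
R_{0,1}(z) = \alpha_{0,1} z^{-L_{0,1}} + (1 - \alpha_{0,1}^2) \sum_{k=1}^{L_{0,1}-1} (-1)^{k-1} \alpha_{0,1}^{k-1} z^{-(L_{0,1}-k)} + (-1)^{L_{0,1}-1} \alpha_{0,1}^{L_{0,1}-1} (5.24c)
where k ∈ Z is an integer, Ã0,1(z) is the FIR approximation of the allpass filters A0,1(z),
and R0,1(z) = z^{-L0,1} Ã0,1(z^{-1}), i.e. the time-reversed Ã0,1(z) delayed by L0,1 samples so
that it remains causal, can be used as synthesis filters. Therefore, analytically it
can now be shown that the transfer functions of both the top and bottom branches of
the filter bank satisfy (5.25), ensuring the phase compensation up to a certain error.
A_0(z) R_0(z) = z^{-L_0} - (-1)^{L_0} \varepsilon(\alpha_0, L_0) (5.25a)
A_1(z) R_1(z) = z^{-L_1} - (-1)^{L_1} \varepsilon(\alpha_1, L_1) (5.25b)
(z^{-1} + \alpha_{0,1}) \sum_{k=0}^{L_{0,1}-1} (-1)^k \alpha_{0,1}^k z^{-(L_{0,1}-k-1)} = z^{-L_{0,1}} - (-1)^{L_{0,1}} \alpha_{0,1}^{L_{0,1}} (5.26)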
where L0 and L1 are the FIR filter orders such that (5.25a) and (5.25b) indicate that
the analysis and synthesis paths are pure delays with the near-PR property up to a certain
error ε(α0, L0) and ε(α1, L1), respectively, where ε(α0,1, L0,1) = α0,1^{L0,1}. There exists a
trade-off between the system delay and the compensation error; in other words, the error
can be substantially reduced by increasing the FIR filters' order, which will increase the
system delay as well as the computational complexity. On the other hand, the length of
R0,1(z) is proportional to the absolute magnitude of the allpass coefficients. As the
absolute value of the coefficient grows towards one, longer FIR filters are required
to sufficiently approximate the first order allpass filter. This will lead to data
alignment problems between the top and the bottom branches for different allpass
filters, therefore extra registers need to be used in the branch path that employs the
lower order FIR filter. Hence, the amount of the delay should be equal to the delay
difference between R0(z) and R1(z), i.e. L1 − L0, considering that L1 ≥ L0. Thus, the












Figure 5.6: One level hybrid IIR/FIR wavelet filter bank in polyphase structure with causal and
stable IIR analysis and FIR synthesis filter banks.
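The near-PR relation in (5.25) and (5.26) can be checked numerically with a short Python sketch. It assumes the first order allpass A(z) = (α + z^{-1})/(1 + αz^{-1}) and an FIR synthesis filter R(z) of order L whose taps are the time-reversed, truncated allpass impulse response with the end tap set to (−α)^{L−1}, so that A(z)R(z) = z^{-L} − (−α)^L. The helper names (`poly_mul`, `fir_synthesis_coeffs`, `cascade_error`) are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 (list index = delay)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def fir_synthesis_coeffs(alpha, L):
    """Taps of R(z); entry d multiplies z^-d for d = 0..L."""
    taps = [(-alpha) ** (L - 1)]                        # z^0 tap
    taps += [(1 - alpha ** 2) * (-alpha) ** (L - 1 - d)  # inner taps
             for d in range(1, L)]
    taps.append(alpha)                                   # z^-L tap
    return taps

def cascade_error(alpha, L):
    """Largest coefficient mismatch in
    (alpha + z^-1) R(z) = (1 + alpha z^-1) (z^-L - (-alpha)^L),
    which is A(z) R(z) = z^-L - (-alpha)^L cleared of the allpass denominator."""
    lhs = poly_mul([alpha, 1.0], fir_synthesis_coeffs(alpha, L))
    target = [-((-alpha) ** L)] + [0.0] * (L - 1) + [1.0]
    rhs = poly_mul([1.0, alpha], target)
    return max(abs(p - q) for p, q in zip(lhs, rhs))

for alpha, L in ((1.0 / 3.0, 8), (0.10557, 8), (0.52786, 16)):
    print(alpha, L, cascade_error(alpha, L), abs(alpha) ** L)
```

The printed mismatch should be at rounding level, while the last column is the compensation error ε(α, L) = α^L, which shows why a coefficient closer to one needs a longer FIR filter for the same error.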
5.4 Design of IIR Wavelet Analysis Filters for Biomedical Applications
In the biomedical signal processing literature, a vast amount of research has employed different
wavelet families for various applications and biosignals. The selection of the mother
wavelet as well as its design is dependent on the application of interest. Peak and
discontinuity detection applications require mother wavelets with a high number of vanishing
moments, since they are capable of identifying any abrupt changes, even if they are not visible
to the human eye. This directly relates to the differentiability of the wavelet
polynomial [80]. On the other hand, the correlation of the wavelet function with the signal of
interest (i.e. their similarity to each other) is another selection/design criterion for the
mother wavelet, which is important for feature detection and denoising applications.
Furthermore, frequency selectivity of the wavelet filters is another critical property to
take into consideration for denoising applications. Based on these criteria, as stated in
the previous chapter, the Daubechies family and especially the mother wavelet db4 with
four vanishing moments and 8 distinct filter coefficients is the most popular mother
wavelet employed in various biomedical applications. This creates a strong reference
for the design consideration of IIR wavelets, where frequency and vanishing moment
characteristics similar to those of the db4 filters are aimed to be achieved. Therefore, two IIR
filters are designed using the method introduced in Section 5.2 with K = 3 (i.e. 3 zeros
located at z = −1) and one single coefficient (α0 = 1/3) allpass filter, and K = 5 (i.e.
5 zeros located at z = −1) and two single coefficient allpass filters (α0 = 0.106 and
α1 = 0.528). For both filters, allpass filter order M = 2, and passband edge frequency
ωp = 0.4π are selected. The 3rd and 5th order maximally flat IIR analysis filters will be
referred to as ilet3 and ilet5, and their magnitude responses, pole-zero locations and




Figure 5.7: ilet3; (a) Magnitude response and (b) Pole-Zero locations. ilet5; (c) Magnitude response
and (d) Pole-Zero locations. (e) ilet3 Scaling and Wavelet functions, and (f) ilet5 Scaling and Wavelet
functions.
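The regularity of the two designs can be verified with a small Python sketch. It assumes the usual polyphase form H0(z) = ½[A0(z^2) + z^{-1}A1(z^2)] with first order sections A(z) = (α + z^{-1})/(1 + αz^{-1}), where the second branch of ilet3 is a pure delay; (5.1) itself is not reproduced in this part of the text, so this form is an assumption. The helper names are hypothetical, and the ilet5 coefficients are the five-digit values quoted later in the chapter (0.10557 and 0.52786).

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 (list index = delay)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def in_z2(p):
    """Replace z^-1 by z^-2: insert a zero between neighbouring coefficients."""
    out = []
    for c in p:
        out += [c, 0.0]
    return out[:-1]

def halfband_numerator(alpha0, alpha1=None):
    """Numerator of A0(z^2) + z^-1 A1(z^2) over the common denominator.
    alpha1=None means the second branch is a pure delay (ilet3)."""
    num0, den0 = in_z2([alpha0, 1.0]), in_z2([1.0, alpha0])
    if alpha1 is None:
        num1, den1 = [1.0], [1.0]
    else:
        num1, den1 = in_z2([alpha1, 1.0]), in_z2([1.0, alpha1])
    top = poly_mul(num0, den1)
    bottom = [0.0] + poly_mul(num1, den0)   # leading zero models the z^-1 delay
    size = max(len(top), len(bottom))
    top += [0.0] * (size - len(top))
    bottom += [0.0] * (size - len(bottom))
    return [t + b for t, b in zip(top, bottom)]

def normalised(num):
    """Scale the numerator so that its first coefficient is one."""
    return [c / num[0] for c in num]

ilet3 = normalised(halfband_numerator(1.0 / 3.0))
ilet5 = normalised(halfband_numerator(0.10557, 0.52786))
print([round(c, 3) for c in ilet3])
print([round(c, 3) for c in ilet5])
```

Under these assumptions the normalised numerators come out as the binomial coefficients of (1 + z^{-1})^3 and, to within the rounding of the quoted coefficients, (1 + z^{-1})^5, i.e. three and five zeros at z = −1, in line with K = 3 and K = 5.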
The analysis filter bank responses over different decomposition levels, 5 in this case,
are also presented in Figure 5.8 for both ilet IIR and db4 analysis filter banks. The
magnitude response similarities between ilet3 and db4 based filter banks can be easily
observed, although ilet3 achieves a wider passband and a steeper transition. The ilet3 wavelet
is a special variation of the IIR wavelets since, unlike its counterparts, only one allpass
filter is employed in its polyphase structure and thus the analysis filter bank is
computationally very efficient. Since the wavelet and scaling filters achieve frequency
characteristics similar to those of the db4 filters, ilet3 can be an attractive alternative to db4. On the




Figure 5.8: Analysis filter bank responses for 5 level decomposition where D and A are the highpass
and the lowpass branch responses. (a) ilet3, (b) ilet5 WT, and (c) db4 WT
5.5 IIR/IIR Wavelet Filter Banks for Biomedical
Applications
As mentioned in Section 5.3, implementation of the IIR synthesis filter bank requires
more attention due to the anti-causal synthesis filters. To overcome the causality
problem, two methods, the block processing and the FIR approximation methods, are
introduced. In this section, the block processing method is employed in order to design
the synthesis filter banks for the IIR analysis filter banks, ilet3 and ilet5, discussed in
the previous section. Therefore, two filter bank structures are designed for the ilet3
and ilet5 wavelets and these filter banks are referred to as IIR/IIR wavelet filter banks,
as both the analysis and the synthesis filters are IIR filters.
5.5.1 Floating-Point Models
The block processing method requires the identification of the input block sizes that
will be used for the Simulink model implementation. The first step of block size iden-
tification is impulse response truncation. Since the IIR filter banks are implemented
as polyphase structures, the impulse responses of the allpass filters in the polyphase
branches are truncated and the effects on the overall filter response are evaluated by
calculating the maximum error introduced to the overall filter impulse response as a
function of L0 and L1 using (5.27).









and Ḧ0(e^{jω}) are the original and the
truncated IIR filters, respectively, and L0 and L1 are the block sizes for A0(z) and A1(z),
respectively. Since the wavelet and the scaling filters are halfband multirate filters, the
noble identities are used. Thus, allpass sections operate at half the input sampling rate,
which means that zeros in between impulse response samples are eliminated and the
block sizes can be halved. Figures 5.9 and 5.10 present the maximum error as a function of
L0 and L1, the magnitude responses and the error magnitude of the ilet3 and ilet5 scaling filters.
Figure 5.9: Truncated impulse response of ilet3, (a) Maximum error between H0(z) and Ḧ0(z),
which is the truncated IIR filter, and (b) Top figure; Magnitude response, Bottom figure; Magnitude
error of the H0(z) (blue) and Ḧ0(z) (red) impulse responses with L0 = 8.
Figure 5.10: Truncated impulse response of ilet5, (a) Maximum error between H0(z) and Ḧ0(z),
which is the truncated IIR filter, and (b) Top figure; Magnitude response, Bottom figure; Magnitude
error of the H0(z) (blue) and Ḧ0(z) (red) impulse responses with L0 = 8 and L1 = 16.
A block size of L0 = 8 is selected for ilet3, as it employs only one allpass component.
In other words, the size of the LIFO registers is eight and this results in a maximum
magnitude error of −70 dB. On the other hand, L0 = 8 and L1 = 16 are
selected for the ilet5 allpass sections, respectively, which leads to a maximum magnitude
error of −85 dB. Both maximum error magnitudes are negligible, since such a small magnitude
mismatch in the filter responses is sufficient to generate near-PR filter banks to be employed for
biomedical signal processing applications. However, there is a trade-off between the
block size and the computational complexity. As can be observed from Figures 5.9(a)
and 5.10(a), increasing the block size diminishes the maximum error but at the cost
of increased memory requirements, read/write operations, and power consumption.
Therefore, the block size determination can vary depending on the hardware resources
as well as the target application requirements.
The aforementioned filter banks designed for ilet3 and ilet5 wavelets are modelled
in the MATLAB Simulink environment, in which two three-level filter banks with floating
point precision are implemented in order to evaluate the effect of block sizes on the filter
banks' near-PR properties. It should be noted that these models are straightforward
implementations where a tree structure is employed without any optimization or
time-multiplexing considerations. Figures 5.11(a) and 5.12(a) present the structure of the
first level analysis filter bank for ilet3 and ilet5, respectively. For both filter banks
Numerator-Denominator TDL (ND(TDL)) allpass structures are employed with two
adders, two registers and one multiplier. The selection of the allpass structure relates
to the fixed point implementation which will be explained in the following section. In
Figures 5.11(b) and 5.12(b) a single level synthesis bank is presented for ilet3 and ilet5,
respectively, where the time-reversed allpass filters are annotated as 'A0inv' and 'A1inv'.
Similar to the analysis filter bank, only one branch of the polyphase structure employs a
time-reversed allpass filter for ilet3, where the second branch is pure delay. This provides
higher computational efficiency compared to the ilet5 synthesis filter bank since the
second branch does not employ a time reversed implementation. Figures 5.11(c),
5.12(c) and 5.12(d) depict the structure of the time-reversed allpass filters, 'A0inv' and
'A1inv'. As can be observed, the time reversed allpass implementation requires a correct
timing scheme in order to operate correctly, which includes the correct switching times










































Figure 5.11: (a) First level analysis FB, (b) first level synthesis FB and (c) implementation of A0(z^{-1})







































































Figure 5.12: (a) First level analysis FB, (b) first level synthesis FB, (c) implementation of A0(z^{-1})
with L0 = 8 and (d) implementation of A1(z^{-1}) with L1 = 16 in floating point precision for ilet5.
A generalized timing diagram for A0,1(z^{-1}) is presented in Figure 5.13 where the
Switch represents the timing for the switches as they need to switch at every L0 or L1
samples, Enable Bottom and Enable Top are used for enabling the allpass filters in the
bottom and top branches, respectively at every p2L0,1  1q samples. The top branch
is enabled L0,1 samples after the bottom branch in order to correctly overlap-add the








Figure 5.13: Timing diagram of A0,1(z^{-1}) in Figures 5.11 and 5.12.
This design is tested and evaluated based on the preservation of the perfect
reconstruction property to observe the effects of block processing. Thus, three different types
of test signals are fed into the system, which are white Gaussian data with varying
variances, and real ECG and EEG data collected from Physionet [114]. For the white
Gaussian input 100 Monte Carlo simulations are performed while 14 different ECG
















where x(k) and x̂(k) are the input and the reconstructed output of the filter bank, and
K is the number of samples of x(k).
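As an illustration of how such reconstruction metrics can be computed, the following Python sketch uses conventional definitions: SER as the ratio of signal energy to error energy in dB, MSE as the mean squared error, and MAE as the maximum absolute error (the expansion used for MAE in this chapter). The exact expressions used in this work may differ in normalisation, so these definitions and the function name `reconstruction_metrics` are assumptions.

```python
import math

def reconstruction_metrics(x, x_hat):
    """Return (SER in dB, MSE, MAE) between input x and reconstruction x_hat."""
    err = [a - b for a, b in zip(x, x_hat)]
    err_energy = sum(e * e for e in err)
    sig_energy = sum(a * a for a in x)
    mse = err_energy / len(err)
    mae = max(abs(e) for e in err)          # maximum absolute error
    ser_db = math.inf if err_energy == 0 else 10 * math.log10(sig_energy / err_energy)
    return ser_db, mse, mae

# Toy example: one sample of a +/-1 sequence is reconstructed with a 0.1 error.
ser_db, mse, mae = reconstruction_metrics([1.0, -1.0, 1.0, -1.0],
                                          [0.9, -1.0, 1.0, -1.0])
print(round(ser_db, 2), round(mse, 4), round(mae, 2))
```

With these definitions a higher SER and a lower MSE or MAE both indicate a reconstruction closer to the input.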
Figures 5.14(a) and 5.15(a) present a 10 second segment of the input and the output
of the ilet3 and ilet5 systems, respectively, with ECG record-232 obtained from the
MIT-BIH Arrhythmia Database, where Figures 5.14(b) and 5.15(b) show the Maximum
Absolute Error (MAE) between the input and output for 14 ECG records.
Figure 5.14: ilet3 FB performance for perfect reconstruction with L0 = 8, (a) ECG record-232
input (red) vs reconstructed output (blue), (b) MAE (mV) for ECG data records, (c) EEG
record-chb14 input (red) vs reconstructed output (blue), and (d) MAE (mV) for EEG data records.
Similarly, Figures 5.14(c) and 5.15(c) present a 10 second segment of the input and
the output of the ilet3 and ilet5 IIR/IIR wavelet filter banks, respectively, with EEG
record-chb1401 obtained from the CHB-MIT Scalp EEG Database, where Figures 5.14(d) and
5.15(d) show the MAE between the input and the output for 14 EEG records. As it can be seen, the




Figure 5.15: ilet5 FB performance for perfect reconstruction with L0 = 8 and L1 = 16, (a) ECG
record-232 input (red) vs reconstructed output (blue), (b) MAE (mV) for ECG data records, (c) EEG
record-chb14 input (red) vs reconstructed output (blue), and (d) MAE (mV) for EEG data records.
Furthermore, Table 5.1 summarizes the average error metrics obtained for each input
data type. An average SER of 65 dB and 88 dB is obtained for all inputs with the ilet3 and
ilet5 analysis and synthesis filter banks, respectively, which corresponds to a negligible
reconstruction error that will not be visible to the human eye and will not lead to a
misinterpretation of the biomedical data. This confirms the L0,1 selections for both filter
banks, thus fixed-point implementations are made to further analyse real-life implications.




                        ilet3                                  ilet5
                        SER (dB)   MSE          MAE (mV)      SER (dB)   MSE           MAE (mV)
Gaussian White Noise    65         1.09×10^-7   2.4×10^-3     87.8       5.5×10^-10    3×10^-4
ECG                     65.7       7.73×10^-8   3.5×10^-3     89.2       3×10^-9       2.9×10^-4
EEG                     65.4       1.05×10^-9   5.6×10^-4     89.6       3.8×10^-12    4.8×10^-5
5.5.2 Fixed-Point Models
It is well known that floating point implementations are not practical due to the
limited hardware and power resources. Therefore, the previously introduced models are
converted into fixed-point precision in terms of the filter coefficients, the filter datapath
and the filter bank datapath. Here, the implications of the coefficient and datapath
quantization are presented with the optimum solution to balance the trade-off between
the performance, and hardware and power consumption efficiency.
The proposed wavelet filter banks are solely polyphase structures, which eliminates the
sensitivity to the coefficient quantization of the direct form IIR filters; therefore, shorter
coefficient word-lengths can be selected as opposed to their FIR counterparts. Since
allpass filters are the smallest building blocks of the overall filter banks, the first step
in fixed-point conversion is the allpass coefficient quantization. The proposed analysis
and synthesis filter banks consist of three distinct coefficients, which are α0 = 1/3 for
ilet3 and α0 = 0.10557 and α1 = 0.52786 for ilet5. The effect of coefficient quantization










where |H(e^{jω})| and |H̃(e^{jω})| are the magnitude responses of the IIR filters formed
with floating-point and fixed-point allpass coefficients, respectively, and N_DFT is the
length of the DFT. In order to obtain fair comparisons with the FIR implementation
introduced in Chapter 4, an allpass coefficient word-length of 9 bits with 8 fractional bits
is used, resulting in approximately −70 dB of filter magnitude mismatch. Figure 5.16
presents the magnitude responses as well as the PZPs of the ilet3 and ilet5 wavelet and
scaling filters with floating-point and fixed-point allpass coefficients.
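A minimal Python sketch of the coefficient quantization step is given below. It assumes round-to-nearest quantization onto a grid of 2^-8, which is what 8 fractional bits provide; the rounding mode used for the coefficients in this work is not stated, so that choice and the names `quantize` and `coeffs` are assumptions made for illustration.

```python
def quantize(value, frac_bits=8):
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

coeffs = {"ilet3 alpha0": 1.0 / 3.0,
          "ilet5 alpha0": 0.10557,
          "ilet5 alpha1": 0.52786}

for name, value in coeffs.items():
    q = quantize(value)
    # Print the quantized value, its integer code and the absolute error.
    print(name, q, int(q * 256), abs(q - value))
```

Under this assumption the three coefficients map to the integer codes 85, 27 and 135 (out of 256), and the absolute coefficient error stays below half a least significant bit, i.e. below 2^-9.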




  rH0pzq coefficients, ilet5 wavelet Analysis filters; (c) Magnitude responses,
and (d) PZPs for floating point and fixed point coefficients.
140
A fixed-point conversion results in arithmetic round-off errors and arithmetic overflow.
The round-off errors occur due to the use of shorter precision in the datapath and
degrade the filter performance in a linear manner. Meanwhile, the arithmetic overflows
are more critical; they occur when the filter's dynamic range exceeds the maximum
supported dynamic range of the fixed point precision. Such overflows can lead to a
system behaving non-linearly, which will result in erroneous outputs. The dynamic
range of a filter is related to its gain and this gain is dependent on the filter structure
being used. Allpass filters can be implemented with different structures and each has
different gain and quantization noise characteristics varying with the filter coefficient
and the input frequency content. In [171] the effects of datapath quantization for
four different allpass structures are analysed using a linear quantization noise model in
which the quantization noise is modelled as a white additive noise which is uncorrelated
with the quantized input and is added at each adder and multiplier. In this study,
a similar analysis is carried out and the ND(TDL) allpass filter structure, presented in













Figure 5.17: Peak gain and quantization noise shaping for 1st order ND-TDL allpass structure
[171].
|P_1(z)| = \left|\frac{1 - z^{-2}}{1 + \alpha z^{-1}}\right|
|P_2(z)| =
\left|\frac{1}{1 + \alpha z^{-1}}\right|
|H_{N2}(z)| = \left|\frac{\alpha}{1 + \alpha z^{-1}}\right|
|H_{N3}(z)| =
The peak internal values of the implemented first order allpass structures are
determined at the output of each adder, since the coefficient values are smaller than one
and adders are the only arithmetic operations contributing towards the internal gain
growth. Thus, the transfer functions P1(z) and P2(z) are derived and presented in
(5.30). Evaluating P1(z) and P2(z) for the coefficient values 0.33, 0.1056 and 0.5279,
the resulting magnitude responses of P2(z) (i.e. |P2(e^{jω})|) are uniform and one as
expected, while the magnitude responses |P1(e^{jω})| are presented in Figure 5.18(a).
Figure 5.18: (a) Gain |P1(z)| and (b) Output quantization noise power (QNPSD), for A0(z) of ilet3
(blue), A0(z) of ilet5 (red) and A1(z) of ilet5 (black).
As can be observed, the maximum gain of $|P_1(e^{j\omega})|$ in the L1-norm sense is 2 at
half-Nyquist for all three coefficients. In other words, the magnitude response of the transfer
function of the internal calculations is bounded by 2 at every frequency, independently of the
coefficient values, provided that the coefficients are smaller than one. Consequently, if the
input signal has frequency components close to half-Nyquist, the gain of the internal operations
can double, and one extra guard bit is required at each arithmetic operator to prevent
arithmetic overflow due to bit growth. As mentioned above, the datapath word-length is
evaluated by adding a white noise input at each arithmetic node and computing the
quantization noise power at the output of each allpass filter. For this purpose, a
convergent quantizer is used after the multiplier to quantize the datapath back to
the selected number of fractional bits while keeping the added guard bit, so that this
quantizer alone defines the arithmetic round-off error. In this work, the effect of convergent
quantization on the ND(TDL) filter’s internal datapath is evaluated by feeding it with white
Gaussian noise represented by 12 fractional bits and quantizing the datapath to the same number
of fractional bits as the input. The resulting quantization noise power at the output of the
allpass filters with the aforementioned coefficients is presented in Figure 5.18(b); it ranges
between $-107$ dB and $-97$ dB, depending on the input frequency content and the value
of the allpass coefficient. As can be observed, the average noise power level increases
with the coefficient value, because the pole of the filter moves closer to the unit circle.
Moreover, the quantization noise grows without bound as the coefficient value approaches one,
since the numerator of $P_1(z)$ has no counteracting effect [171]. If the filter coefficients
approach one, the increase in quantization noise power can be compensated with a few
additional fractional bits.
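The two word-length rules used above can be illustrated with a minimal Python sketch. It assumes the worst-case gain of 2 and the 12 fractional bits quoted in the text; the function names are hypothetical, and the noise figure is the textbook $q^2/12$ variance of a single uniform rounding source, not the filter-shaped QNPSD of Figure 5.18(b).

```python
import math

def guard_bits(max_gain):
    """Extra integer bits needed so a signal amplified by max_gain cannot overflow."""
    return math.ceil(math.log2(max_gain))

def convergent_quantize(x, frac_bits):
    """Round x to frac_bits fractional bits, with ties going to the even code."""
    scale = 1 << frac_bits
    # Python's built-in round() rounds exact halves to the nearest even integer.
    return round(x * scale) / scale

def rounding_noise_power_db(frac_bits):
    """Variance q^2/12 of one uniform rounding error source, expressed in dB."""
    q = 2.0 ** -frac_bits
    return 10 * math.log10(q * q / 12)

print(guard_bits(2))                                  # one guard bit for a gain of 2
print(convergent_quantize(2.5 / 4096, 12) * 4096)     # tie goes to the even code 2
print(convergent_quantize(3.5 / 4096, 12) * 4096)     # tie goes to the even code 4
print(round(rounding_noise_power_db(12), 2))          # single-source noise floor in dB
```

Convergent rounding is chosen here because it is the method named in the text; unlike round-half-up it does not bias the ties in one direction.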
Finally, since the lowpass and highpass filters are implemented as polyphase structures
and the outputs of each branch do not grow beyond the input range, the summation of the
allpass structures requires only one extra bit to fully represent the output. Based on the
above evaluations, three-level analysis and synthesis banks are implemented, where the
block processing method is used for the synthesis filter bank and the finite-wordlength
effects are evaluated at the output of the filter bank. In addition, the approximation
and detail coefficients are crucial elements of the wavelet transform that are used for
denoising and detection applications. Thus, the quantization effects on these coefficients
are also evaluated individually. For this purpose three different types of data
are employed: white Gaussian noise, and the ECG and EEG datasets presented in
Figures 5.14 and 5.15. The ECG data records are represented with an 11-bit wordlength with
8 fractional bits, whereas the white Gaussian noise and EEG data are represented with a
14-bit wordlength with 12 fractional bits. The quantization is applied at the internal
datapath of each allpass filter, where 8 fractional bits are discarded. Table 5.2 presents the
average MSE, Quantization Noise Power (QNPSD), and SER at the output for both the ilet3 and
ilet5 wavelet filter banks, whereas Table 5.3 presents the average QNPSD values for the
approximation and detail coefficients (A3, A2, A1 and D3, D2, D1) at decomposition levels
3, 2, and 1, respectively.
Table 5.2: The average MSE, QNPSD, and SER metrics obtained for White Gaussian Noise, ECG
and EEG data with ilet3 and ilet5 filter banks.

| Input data | ilet3 MSE | ilet3 QNPSD (dB) | ilet3 SER (dB) | ilet5 MSE | ilet5 QNPSD (dB) | ilet5 SER (dB) |
|---|---|---|---|---|---|---|
| Gaussian White Noise | $3.5 \times 10^{-8}$ | -74.5 | 69.7 | $6 \times 10^{-8}$ | -71.2 | 67.4 |
| ECG | $6.4 \times 10^{-6}$ | -52.4 | 45.3 | $8.6 \times 10^{-6}$ | -50 | 44 |
| EEG | $3.6 \times 10^{-8}$ | -74.5 | 50.3 | $5.7 \times 10^{-8}$ | -71.6 | 48.2 |

Table 5.3: The average QNPSD obtained for Approximation (A3, A2, A1) and detail coefficients
(D1, D2, D3) at decomposition levels 3, 2, and 1 for White Gaussian Noise, ECG and EEG data
with ilet3 and ilet5 filter banks.

| Coefficients | ilet3 White Noise (dB) | ilet3 ECG (dB) | ilet3 EEG (dB) | ilet5 White Noise (dB) | ilet5 ECG (dB) | ilet5 EEG (dB) |
|---|---|---|---|---|---|---|
| A3 | -90.1 | -68.1 | -92 | -80.9 | -63.5 | -87.1 |
| A2 | -86.3 | -65 | -88.8 | -76.4 | -60.7 | -84.1 |
| A1 | -84.6 | -62.2 | -86 | -74.1 | -57.8 | -81.5 |
| D3 | -93.5 | -75.1 | -98.5 | -82.8 | -72.2 | -94.8 |
| D2 | -89.8 | -69.5 | -93.2 | -79.4 | -65.8 | -89.8 |
| D1 | -84.6 | -62.2 | -86 | -74.7 | -63.8 | -87.8 |

It should be noted that at each level the input data fed to the analysis filters is scaled
down by two, and the fractional bit growth is not truncated so as not to lose
precision in the approximation and detail coefficients. Therefore, decomposition levels
1, 2, and 3 employ 9, 10, and 11 fractional bits for the ECG data and 13, 14,
and 15 fractional bits for the white Gaussian noise and EEG data. The theoretical
analysis demonstrated that when white Gaussian noise with 12 fractional bits is fed
through the allpass filters, the expected allpass filter output quantization noise power for
the ilet3 and ilet5 floating-point filter coefficients ranges between $-107$ dB and $-97$ dB.
Table 5.3, which presents the QNPSD of the lowpass and highpass outputs for all
three types of input data, shows that the average QNPSD values for the white
Gaussian noise and EEG datasets are close but not equal to the theoretical values.
This is expected, as the fractional bit length increases at every decomposition level,
which decreases the QNPSD of the approximation and detail coefficients. However,
the structure of the input signal also has a significant effect on the theoretical QNPSD,
which can cause a deviation of up to 10% [171]. In addition, although the
EEG and white noise data have the same wordlengths, the QNPSD values of the EEG data
records are lower. This is due to the spectral content of EEG signals, as most of the signal
power is concentrated at the low frequencies where the QNPSD is much lower, as shown in
Figure 5.18. Thus, the presented results are in the expected noise power range. On the other
hand, the output noise power is slightly higher than the internal noise power, although the
difference is negligible, since quantization is also applied in the synthesis filter bank.
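The fractional-bit bookkeeping described above can be written as a short Python sketch. It assumes, as the text states, that each level halves its input and that the resulting extra LSB is kept; the helper name `fractional_bits` is hypothetical.

```python
def fractional_bits(input_frac_bits, level):
    """Fractional bits at a decomposition level when every level scales its
    input by 1/2 (one right shift) and keeps the extra LSB instead of truncating."""
    return input_frac_bits + level

# ECG records use 8 fractional bits, white noise and EEG use 12 (values from the text).
ecg = [fractional_bits(8, level) for level in (1, 2, 3)]
noise_eeg = [fractional_bits(12, level) for level in (1, 2, 3)]
print(ecg)
print(noise_eeg)
```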
5.5.3 Hardware Validation and Cost Assessment
For hardware validation, cost assessment and performance evaluation, the ilet3 and
ilet5 IIR wavelet filter banks (Figures 5.12 and 5.13) are designed as tree-structured
one-level filter banks, and are synthesized and placed and routed (PAR) on a Kintex-7
(xc7k325tffg900) FPGA in Vivado v16.2, using the System Generator for DSP in
Matlab/Simulink. The filter bank resource utilization is presented in terms of LUTs,
flip-flops and block Random Access Memories (RAMs), along with the maximum operating
frequencies and power consumption figures. The implementation of the analysis filter banks
follows the structures presented in Figures 5.12(a) and 5.13(a) and, as mentioned
earlier, the ND(TDL) filter structures are used for the allpass sections. In addition,
the filter coefficients are represented in CSD format and the CSE method is used to
implement them as hard-wired shifts with a minimum number of adders. The synthesis filter
bank implementation, on the other hand, requires more attention, as the
timing and storage of the incoming data are critical. To implement the LIFO
registers and the required data segmentation described by (5.22), two dual-port
RAMs are used, one at the input and one at the output of the synthesis filter bank.
The following subsections provide details of the synthesis filter bank architecture and
timing diagrams, as well as the resource and power figures of the complete filter banks.
IIR Analysis Filter Bank Architecture
As discussed previously, the IIR analysis filter banks are implemented using two-branch
polyphase structures which employ allpass sections to implement the scaling $(H_0(z))$
and the wavelet $(H_1(z))$ filters. The allpass sections are designed with
ND(TDL) IIR structures employing two registers, two adders and one multiplier. As
the filter coefficients are designed according to the requirements of the target
application, they are treated as constant multiplications; thus they are represented in CSD
format and implemented as hard-wired shifts and adds using the CSE method introduced
in Chapter 4. Hence the analysis filter bank is multiplier free: the ilet3 analysis
filter bank employs six adders and three registers, and the ilet5 analysis filter bank
employs ten adders and five registers, as each polyphase branch requires one first order
allpass filter. Based on the analysis provided in the previous section, the gain of the
allpass filters for both ilet3 and ilet5 is one, and one guard bit is used for the adders
to prevent any overflow that might lead to an unstable system. Table 5.4 presents the
fixed-point coefficients along with the adder cost and the shift-add network used to
implement the SCMs.
Table 5.4: Fixed-point (9-bit) ilet3 and ilet5 wavelet filter coefficients, their adder costs and
shift-add format used to design the constant multiplications.

| | ilet3 $a_0$ | ilet5 $a_0$ | ilet5 $a_1$ |
|---|---|---|---|
| Fixed-Point | 0.33203125 | 0.10546875 | 0.52734375 |
| Z | 85 | 27 | 135 |
| Adder Cost | 2 | 2 | 2 |
| Shift-Add | $(2^2 + 1)(2^4 + 1)$ | $2^5 - (2^2 + 1)$ | $2^7 + (2^3 - 1)$ |

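The shift-add forms in Table 5.4 can be checked with a few lines of Python. This is only an arithmetic sketch, not the hardware description: the function names are hypothetical, and the scaling by $2^8$ is inferred from the tabulated values (for example $85/256 = 0.33203125$).

```python
def times_85(x):
    t = (x << 2) + x          # adder 1: 5x  = (2^2 + 1) x
    return (t << 4) + t       # adder 2: 85x = (2^4 + 1) * 5x

def times_27(x):
    t = (x << 2) + x          # adder 1: 5x
    return (x << 5) - t       # adder 2: 27x = 2^5 x - 5x

def times_135(x):
    t = (x << 3) - x          # adder 1: 7x
    return (x << 7) + t       # adder 2: 135x = 2^7 x + 7x

for z, multiply in ((85, times_85), (27, times_27), (135, times_135)):
    # Each network must agree with an ordinary multiplication for every input.
    assert all(multiply(x) == z * x for x in range(-1024, 1024))
    print(z, z / 256)         # integer Z and the coefficient value it represents
```

Each function uses exactly two additions or subtractions, which agrees with the adder cost of 2 listed for all three coefficients.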
Table 5.5 presents the resource utilization and power consumption figures estimated at
50 MHz for the one-level analysis filter banks for the ilet3 and ilet5 wavelets. As expected,
the ilet3 analysis filter bank employs fewer resources than the ilet5 analysis filter bank:
32% resource savings and a 24% improvement in the power consumption figures are achieved
for typical 11-bit ECG data.
Table 5.5: Resource Utilization and Power Consumption of the ilet3 and ilet5 Analysis IIR Filter
Bank Architectures.

| | ilet3 | ilet3 | ilet5 | ilet5 |
|---|---|---|---|---|
| Input word length (bits) | 8 | 11 | 8 | 11 |
| Adders | 6 | 6 | 10 | 10 |
| Multipliers | 0 | 0 | 0 | 0 |
| LUTs | 59 | 75 | 99 | 122 |
| Registers | 55 | 73 | 72 | 96 |
| Max. Frequency (MHz) | 62.3 | 62.3 | 61 | 62 |
| Power (mW) | 1.755 | 2.197 | 2.166 | 2.673 |

The resource utilization of the multiplierless db4 analysis filter bank presented in
Chapter 4 is compared with that of the proposed IIR wavelet analysis filter banks. It can
be observed that the resources utilized are reduced by 61% and 43% with the
ilet3 and ilet5 analysis filter banks, respectively, which translates into power consumption
improvements of 55% and 21.5%, respectively.
IIR Synthesis Filter Bank Architecture
The fixed-point Simulink model of the synthesis filter banks for both ilets is discussed
in Section 5.5.1, where LIFO registers and switches are used for storing the data and flipping
it in time. For the hardware implementation the LIFO registers and switches
are replaced with dual-port RAMs. The dual-port RAMs are configured not to read
during write mode, and the data addresses are controlled by up-down counters. Hence,
the RAM writes into the addresses while the counter is counting up and reads from the
addresses while the counter is counting down. During the write operation the outputs
of the RAMs are reset to zero. The input ports are accessed concurrently with a delay of L
samples, where L is the selected block size. This configuration generates the required
input signal segmentation. The dual-port RAM outputs are then filtered by identical allpass
filters and added up to obtain the overlap-added output. Finally, an additional
dual-port RAM is employed to undo the time reversal and obtain the
required output. For the ilet3 wavelet the block size $L_0 = 8$ is selected; thus, the
dual-port RAM requires 16 addresses, the first eight for the top branch and the remaining
eight for the bottom branch. For this purpose, a 3-bit up-down counter is designed whose three
output bits are extended with an MSB of 0 for addresses 0 to 7 and an MSB of 1 for addresses
8 to 15. The structure of a one-level IIR synthesis filter bank for the ilet3 wavelet
is presented in Figure 5.19(a).
[Figure 5.19 not reproduced. Recoverable labels: inputs $X_0(z)$ and $X_1(z)$; address sequences
Address_a: A0-A7, A7-A0, A0-A7, ... and Address_b: A8-A15, A15-A8, A8-A15, ...]
Figure 5.19: ilet3 (a) One-level IIR synthesis filter bank architecture and (b) the timing diagram for
controlling the operation of the synthesis filter bank.
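The address generation described above can be sketched behaviourally in Python. This is an illustrative model under simplifying assumptions (a single RAM port, no write-enable or reset timing), and the function names are hypothetical; it only shows how counting up while writing and counting down while reading reverses a block in time, and how the extra MSB selects the RAM half.

```python
def updown_sequence(bits, ramps):
    """Counter values that ramp 0..2^bits-1 upwards, then back down, alternately."""
    top = (1 << bits) - 1
    sequence = []
    for ramp in range(ramps):
        values = range(top + 1) if ramp % 2 == 0 else range(top, -1, -1)
        sequence.extend(values)
    return sequence

def extend_address(count, msb, bits):
    """Prepend one MSB to the counter output to select the lower or upper RAM half."""
    return (msb << bits) | count

def reverse_block(block, bits=3):
    """Write one block while counting up, then read it back while counting down."""
    ram = [0] * (2 << bits)                      # 16 locations for a 3-bit counter
    counts = updown_sequence(bits, 2)
    half = len(counts) // 2
    for count, sample in zip(counts[:half], block):          # counting up: write
        ram[extend_address(count, 0, bits)] = sample
    return [ram[extend_address(count, 0, bits)] for count in counts[half:]]  # down: read

print(reverse_block([10, 11, 12, 13, 14, 15, 16, 17]))   # block comes out time-reversed
print([extend_address(c, 1, 3) for c in range(8)])        # bottom-branch addresses 8..15
```

Calling `extend_address` with `bits=4` gives the 0 to 15 and 16 to 31 ranges used when the block size is 16.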
Furthermore, the timing diagram for the synthesis filter bank is presented in Figure
5.19(b), where ‘CLK’ is the clock driving the synthesis filter bank, ‘DIR’ determines
the direction of the up-down counter, ‘WRa,b’ are the write enables for the top and
bottom input ports, respectively, and similarly ‘RSTa,b’ are the resets for the top and
bottom output ports. The architecture for the ilet5 IIR synthesis filter bank is
presented in Figure 5.20.
[Figure 5.20 not reproduced. Recoverable labels: $X_0(z)$, $X_1(z)$, $\hat{X}(z)$, and the delays
$z^{-2L_0}$ and $z^{-2(L_1 - L_0)}$.]
Figure 5.20: ilet5 One-level IIR synthesis filter bank architecture with the block processing method.
Here, unlike in the ilet3 design, each polyphase branch of the synthesis filter bank employs
the block processing method. The dual-port RAMs in the top branch have 16 addresses,
whereas the dual-port RAMs of the bottom branch have 32 addresses, due
to the block sizes $L_0 = 8$ and $L_1 = 16$ determined previously. The addresses for
the top branch are generated in the same way as in the ilet3 design, whereas the addresses
for the bottom branch are generated by a 4-bit up-down counter whose output is extended
with an MSB of 0 or 1 in order to obtain addresses from 0 to 15 and
16 to 31, respectively. The state diagrams and combinational logic of the 3- and 4-bit
up-down counters are presented in Appendix B, Figures B.1 and B.2. The same
timing principles used for ilet3 are employed for this structure as well. Finally, Table
5.6 presents the number of adders and multipliers as well as the resource utilization,
maximum operating frequency and estimated power consumption of the ilet3 and ilet5
one-level IIR/IIR wavelet filter banks.
Table 5.6: Resource Utilization and Power Consumption of the Multiplier Free ilet3 and ilet5 IIR/IIR
Filter Bank Architectures.

| | ilet3 | ilet3 | ilet5 | ilet5 |
|---|---|---|---|---|
| Input word length (bits) | 8 | 11 | 8 | 11 |
| Adders | 17 | 17 | 30 | 30 |
| Multipliers | 0 | 0 | 0 | 0 |
| LUTs | 220 | 288 | 399 | 520 |
| Registers | 139 | 175 | 231 | 288 |
| Max. Frequency (MHz) | 76.4 | 75.6 | 70.5 | 70 |
| Power (mW) | 5.333 | 6.437 | 8.751 | 10.434 |

Compared to the analysis filter bank resource utilization and power consumption figures
in Table 5.5, a considerable increase in the number of LUTs and FFs used, as well
as in the estimated power consumption, can be observed. For the ilet5 IIR/IIR
filter bank with an 11-bit input, the synthesis filter bank employs almost three times
the resources of the analysis filter bank, in addition to block RAMs of the FPGA, which
corresponds to approximately three times the power consumption. Although the block processing
method enables the causal implementation of the anti-causal filters, the hardware evaluation
results show that this method is not efficient in terms of either hardware complexity or power
consumption.
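The "almost three times" figure can be reproduced from Tables 5.5 and 5.6 with the following Python sketch. It rests on one assumption that the text does not state explicitly: the synthesis share is taken as the complete IIR/IIR filter bank minus the analysis filter bank, and LUTs and registers are simply added together as a resource count.

```python
# ilet5, 11-bit input: analysis bank (Table 5.5) and complete IIR/IIR bank (Table 5.6).
analysis = {"luts": 122, "regs": 96, "power_mw": 2.673}
complete = {"luts": 520, "regs": 288, "power_mw": 10.434}

# Assumption: synthesis cost = complete filter bank - analysis filter bank.
synthesis = {key: complete[key] - analysis[key] for key in analysis}

resource_ratio = (synthesis["luts"] + synthesis["regs"]) / (analysis["luts"] + analysis["regs"])
power_ratio = synthesis["power_mw"] / analysis["power_mw"]
print(round(resource_ratio, 2), round(power_ratio, 2))
```

Under this assumption both ratios come out between 2.7 and 3, in line with the statement above.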
5.6 Hybrid IIR/FIR Wavelet Filter Banks for Biomedical Applications
In Section 5.5 two filter bank architectures are presented in which the synthesis filter
banks are implemented using the block processing method. The cost and power estimation
studies demonstrated that the use of dual-port RAMs, together with the increased
number of arithmetic operations in the synthesis filter bank, reduces the likelihood
of these filter banks being employed in portable biomedical devices. A more
attractive alternative, in which the anti-causal IIR synthesis filters are replaced with FIR
filters, is therefore proposed. Following the method and the theory introduced in Section 5.3,
two hybrid IIR/FIR filter banks are designed and implemented for the ilet3 and ilet5 wavelets
to be employed in low-complexity biomedical signal processing applications. Since the
analysis IIR filters are already designed, the first step of the hybrid filter bank design
procedure is to determine the order of the FIR filters so as to minimize the phase
compensation error, defined as $\varepsilon(\alpha_i, L_i)$, while maintaining low complexity
and a reasonable system delay. After the FIR filter orders are determined,
the floating-point model is validated by feeding the system with random biomedical
signals and measuring the error introduced in the reconstructed signal. In order to
implement the system on a hardware platform, an FPGA in this case, the floating-point
filter banks are converted into fixed-point filter banks and then synthesized in order
to evaluate the hardware complexity and estimate the power consumption.
5.6.1 Hybrid IIR/FIR Wavelet Filter Banks - Floating Point
In Section 5.4, two ilet wavelets are designed for the analysis filter banks: ilet3, with one
first order allpass filter, and ilet5, with two first order allpass filters.
The structures of the analysis filter banks are also shown in Figures 5.11 and 5.12.
In order to implement the hybrid IIR/FIR wavelet filter banks, the FIR synthesis filters
for each allpass structure are designed using (5.24c). Figure 5.21(a) presents the phase
compensation error, measured in terms of the MSE between the IIR and the approximated
FIR filters, for FIR filter lengths varying from $L_0 = 2$ to $L_0 = 11$.
Figure 5.21: (a) Phase compensation error with $(L_0 - 1)$th order FIR filter for ilet3 wavelet, (b)
Magnitude of Linear Distortion Transfer Function ($D_L(z)$), (c) Magnitude of Aliasing Distortion
Transfer Function ($D_A(z)$), and (d) the group delay of the analysis and synthesis filter banks.
There is an inverse relationship between the filter order and the amount
of error introduced: increasing the FIR filter order reduces the error between
the two filters. As depicted in Figure 5.21(a), the magnitude of this error is
linear on a logarithmic scale in the number of coefficients selected for
the FIR filter. This is simply the result of the exponential decay of the error
$\varepsilon(\alpha_{0,1}, L_{0,1})$ with $\alpha_{0,1}^{L_{0,1}}$ given in (5.26). In addition,
the magnitude response of an ideal IIR allpass filter equals unity, which is due to the
trigonometric relationship between the numerator and denominator polynomials, as
$z = e^{j2\pi\nu} = \cos(2\pi\nu) + j\sin(2\pi\nu)$ by Euler’s formula. As demonstrated by
(5.24b), this ratio of polynomials is approximated
by a single polynomial (i.e. an FIR filter), which requires a compromise in the accuracy
of the unity magnitude response. Therefore, the filter magnitude mismatch (i.e. the error
between the ideal allpass IIR and the approximated FIR magnitude responses) has
a trigonometric character and approximates a sinusoidal response, as can be
observed in Figure 5.21(b). Furthermore, Figures 5.21(b) and 5.21(c) present
the magnitudes of the linear distortion and aliasing distortion transfer functions defined
by (5.16) and (5.17), respectively, where two filter lengths, $L_0 = 7$ (blue) and $L_0 = 11$
(red), are used to observe their effects. Finally, Figure 5.21(d) presents the overall
group delay of the analysis-synthesis filter banks. An FIR filter length of $L_0 = 11$ provides
a better approximation of the IIR analysis allpass filters, as the magnitude of the linear
distortion transfer function is $\approx 1$, the magnitude of the aliasing distortion function
is $-120$ dB and the group delay variation of the filter bank is on the order of
$10^{-4}$. However, there is a trade-off between performance and computational
complexity. Employing a 10th order FIR filter requires extra coefficients and higher
precision, which increases the complexity of the synthesis filter; therefore $L_0 = 7$ is
selected to balance this trade-off while maintaining the requirements for a near-PR
filter bank. The magnitude and group delay responses of the analysis and synthesis
filters implemented using $A_0(z)$ with $\alpha_0 = 1/3$
and the proposed 6th order FIR $(R_0(z))$ are presented in Figure 5.22.
Figure 5.22: ilet3 wavelet; (a) Analysis filter magnitude responses, (b) Synthesis filter magnitude
responses, (c) Analysis filter group delay, and (d) synthesis filter group delay.
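The straight-line behaviour on a logarithmic scale noted for Figure 5.21(a) can be illustrated with a small Python sketch. It is not the error measure of (5.26): it assumes a first order allpass of the form $A(z) = (\alpha + z^{-1})/(1 + \alpha z^{-1})$ and simply evaluates the energy of the impulse-response tail that an L-tap truncation discards. All function names are hypothetical.

```python
import math

def allpass_impulse_response(alpha, n_taps):
    """First n_taps samples of the assumed A(z) = (alpha + z^-1) / (1 + alpha z^-1)."""
    h = [alpha]
    for n in range(1, n_taps):
        h.append((1 - alpha ** 2) * (-alpha) ** (n - 1))
    return h

def truncation_error_db(alpha, n_taps):
    """Energy of the impulse-response tail dropped after n_taps samples, in dB."""
    # Closed form of sum_{n >= n_taps} h[n]^2, valid for n_taps >= 1.
    tail = (1 - alpha ** 2) * alpha ** (2 * (n_taps - 1))
    return 10 * math.log10(tail)

alpha = 1 / 3
errors = [truncation_error_db(alpha, taps) for taps in range(2, 12)]
steps = [later - earlier for earlier, later in zip(errors, errors[1:])]

# An allpass response has unit energy, so the tail can also be found numerically.
numeric_tail = 1 - sum(v * v for v in allpass_impulse_response(alpha, 7))

print([round(e, 1) for e in errors])
print(round(steps[0], 2))     # constant slope of 20*log10(alpha) dB per added tap
print(round(10 * math.log10(numeric_tail), 2), round(truncation_error_db(alpha, 7), 2))
```

Every added tap lowers this error by the same number of dB, which is the linear trend on a logarithmic axis; a larger coefficient gives a shallower slope and therefore needs a longer FIR filter.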
The similarity between the analysis and synthesis filters’ magnitude responses is
evident, although $G_0(z)$ and $G_1(z)$ exhibit smaller stopband attenuation. In
addition, the group delay responses of the analysis and synthesis filters are almost
opposite to each other, ensuring phase compensation up to a certain error. The
floating-point model of the one-level hybrid IIR/FIR wavelet filter bank for the ilet3
wavelet is presented in Figure 5.23.
Figure 5.23: Floating-point model of one-level hybrid IIR/FIR wavelet filter bank for ilet3 wavelet.
In addition, Figure 5.24(a) presents the effect of the $L_{0,1}$ selections on the magnitude
responses of the $R_{0,1}(z)$ designed for the ilet5 wavelet, and the degree of deviation
from the analysis allpass filters $A_{0,1}(z)$ in terms of MSE.
Figure 5.24: (a) Phase compensation error with $(L_0 - 1)$th and $(L_1 - 1)$th order FIR filters for
ilet5 wavelet, (b) Magnitude of Linear Distortion Transfer Function ($D_L(z)$), (c) Magnitude of
Aliasing Distortion Transfer Function ($D_A(z)$), and (d) the group delay of the analysis and
synthesis filter banks.
Unlike for the ilet3 wavelet, two first order allpass filters are employed, where the
coefficient of $A_0(z)$ is smaller than that of the ilet3 allpass filter and the coefficient of
$A_1(z)$ is larger. This is reflected in the MSE values, since selecting $L_{0,1} = 7$ results
in mismatches of $-140$ dB and $-50$ dB for $A_0(z)$ and $A_1(z)$, respectively. Therefore, a
smaller $L_0$ and a larger $L_1$ need to be selected. Figures 5.24(b), 5.24(c), and 5.24(d)
compare the effect of the $L_{0,1}$ selection, and based on these comparisons $L_0 = 5$ and
$L_1 = 11$ are selected as the orders for the synthesis FIR filters. The magnitude and group
delay responses of the analysis and the synthesis filters are presented in Figure 5.25.
Figure 5.25: ilet5 wavelet; (a) Analysis filter magnitude responses, (b) Synthesis filter magnitude
responses, (c) Analysis filter group delay, and (d) synthesis filter group delay.
Similar to the ilet3 wavelet and scaling filters, the synthesis filters’ magnitude responses
exhibit smaller stopband attenuation, which is 68 dB. The structure of the one-level
hybrid IIR/FIR wavelet filter bank for the ilet5 wavelet is presented in Figure 5.26.
Figure 5.26: Floating-point model of one-level hybrid IIR/FIR wavelet filter bank for ilet5 wavelet.
Furthermore, three-level floating-point hybrid IIR/FIR wavelet filter banks
are implemented for the ilet3 and ilet5 wavelets and are fed with the same datasets used
in the previous section, in order to evaluate the effect of the selected FIR filter lengths
on the PR property of the filter banks. For this purpose, the MSE, MAE, and SER
are calculated. The results are summarized in Table 5.7, where the presented values
demonstrate that a negligible amount of reconstruction error is introduced to the three data
types with different frequency content.
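Table 5.7: Average Error Measures for evaluating the implemented three level hybrid IIR/FIR ilet3
and ilet5 wavelet filter banks.

| Input data | ilet3 SER (dB) | ilet3 MSE | ilet3 MAE (mV) | ilet5 SER (dB) | ilet5 MSE | ilet5 MAE (mV) |
|---|---|---|---|---|---|---|
| Gaussian White Noise | 67 | $5.5 \times 10^{-10}$ | $3 \times 10^{-4}$ | 61.2 | $2.3 \times 10^{-7}$ | $2 \times 10^{-3}$ |
| ECG | 63 | $1.4 \times 10^{-6}$ | $1.9 \times 10^{-3}$ | 57.5 | $5.2 \times 10^{-6}$ | $3.5 \times 10^{-3}$ |
| EEG | 64 | $1.6 \times 10^{-9}$ | $3.8 \times 10^{-4}$ | 58.4 | $5.1 \times 10^{-9}$ | $6.7 \times 10^{-4}$ |
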
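A minimal Python sketch of how such error measures are commonly computed is given below. The definitions are assumptions, since the thesis defines its metrics outside this section: the MSE is the mean squared error, the SER is the ratio of signal energy to error energy in dB, and the MAE is taken here as the maximum absolute error. The function name and the `delay` parameter are hypothetical.

```python
import math

def error_metrics(reference, reconstructed, delay=0):
    """Return (MSE, MAE, SER in dB) after removing a known filter-bank delay.

    MAE is assumed to mean the maximum absolute error. The SER is undefined
    for a perfect reconstruction, because the error energy is then zero.
    """
    aligned = reconstructed[delay:]
    pairs = list(zip(reference, aligned))          # zip stops at the shorter signal
    errors = [x - y for x, y in pairs]
    mse = sum(e * e for e in errors) / len(errors)
    mae = max(abs(e) for e in errors)
    signal_energy = sum(x * x for x, _ in pairs)
    error_energy = sum(e * e for e in errors)
    ser_db = 10 * math.log10(signal_energy / error_energy)
    return mse, mae, ser_db

reference = [1, 2, 3, 4]
reconstructed = [0, 0, 1, 2, 3, 4.5]               # delayed by two samples, last one off by 0.5
mse, mae, ser_db = error_metrics(reference, reconstructed, delay=2)
print(mse, mae, round(ser_db, 2))
```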
Increasing the FIR filter orders makes the linear distortion arbitrarily small, hence
the hybrid filter banks can achieve better PR performance. However, as the FIR filter
order increases, the values of the extra coefficients get smaller, which increases
the hardware and computational complexity of the hybrid filter banks, as longer coefficient
word-lengths are required to represent the small coefficients. Otherwise, these
coefficients are rounded to zero and do not contribute to the PR performance
of the filter bank. The values presented in Table 5.7 show that, with the selected
synthesis FIR filter orders, the filter bank output is near-perfectly reconstructed.
5.6.2 Hybrid IIR/FIR Wavelet Filter Banks - Fixed Point
The fixed-point conversion of the analysis filter banks for both the ilet3 and ilet5 wavelets
is presented in Section 5.5, where the allpass filter coefficients are quantized to 9 bits.
Since FIR filters are more sensitive to coefficient quantization and require longer
word-lengths than the allpass-based halfband polyphase IIR filters, the
FIR filter coefficients are quantized to 10 bits (one sign bit and nine fractional bits). The
FIR synthesis filter $R(z)$ for the ilet3 wavelet employs eight distinct coefficients, whereas
the synthesis filters $R_0(z)$ and $R_1(z)$ for the ilet5 wavelet employ six and twelve distinct
coefficients, respectively. The magnitude responses of the synthesis filters with
floating-point $(G_{0,1}(z))$ and fixed-point $(\tilde{G}_{0,1}(z))$ coefficients are presented in
Figure 5.27.
Figure 5.27: Magnitude responses of (a) ilet3 synthesis filters and (b) ilet5 synthesis filters with
floating-point (blue-red) and fixed-point (black-green) coefficients.
The coefficient quantization leads to a 10 dB deterioration in the synthesis filters’
stopband attenuation. Unlike in conventional digital filter designs, the effect of this
deterioration is evaluated by recomputing the linear and aliasing distortion transfer
functions introduced in Section 5.3, as well as the overall filter bank group delay,
using the scaling and wavelet analysis and synthesis filters with the quantized
coefficients.
Figure 5.28: Magnitude of (a) linear and (b) aliasing distortion transfer functions, and (c) the hybrid
filter bank group delay, with floating- (blue) and fixed-point (red) coefficients for ilet3 wavelet.
Figure 5.29: Magnitude of (a) linear and (b) aliasing distortion transfer functions, and (c) the hybrid
filter bank group delay, with floating- (blue) and fixed-point (red) coefficients for ilet5 wavelet.
Although the floating-point IIR/FIR hybrid wavelet filter banks yield equiripple
responses, as outlined in Subsection 5.6.1, their finite-precision counterparts deviate from
the equiripple characteristics. In Figures 5.28 and 5.29, this deviation can be easily
observed in the red plots. As presented in Section 5.3, the linear and aliasing
distortion transfer functions are calculated using (5.16) and (5.17), which yield
deterministic responses, whereas the changes in the magnitude and group delay responses
after filter coefficient quantization are non-deterministic and cannot be expressed
analytically. In addition, the ripple characteristics, such as
their shape and amplitude, differ according to the selected loss-of-precision method.
In this study, the loss-of-precision method is determined empirically according to the
peak error magnitude, and convergent rounding is employed since it provides the smallest
peak error for both the linear and aliasing transfer functions. Increasing the precision of
the coefficients (more fractional bits) leads to smaller quantization errors
and eventually to responses similar to those obtained with the floating-point coefficients.
However, the quantization error introduced is negligible and provides a magnitude of
$\approx 1$ (with a deviation of -56.48 dB) for the linear distortion functions and almost
constant group delays (deviation of -33 dB) for both wavelets. Also, on careful observation of
these figures, the magnitude responses of the aforementioned transfer functions
are symmetric around half-Nyquist ($\nu = 0.25$), as the allpass IIR and
approximated FIR filters are half-band filters. Furthermore, the effect of coefficient
quantization is quantified using the same approach as for the floating-point
hybrid wavelet filter banks, where the error between the reconstructed data and
the input is measured; the results are presented in Table 5.8. In general, an increase
in the MSE and MAE values can be observed in comparison with the results
in Table 5.7. However, regardless of the increase in noise power, the features of the
biomedical data, such as the amplitude and fiducial points, are not affected; therefore
there is no observable distortion.
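Table 5.8: Average Error Measures for Evaluating Three Level Hybrid IIR/FIR ilet3 and ilet5 Wavelet
Filter Banks with Finite-Precision Filter Coefficients.

| Input data | ilet3 SER (dB) | ilet3 MSE | ilet3 MAE (mV) | ilet5 SER (dB) | ilet5 MSE | ilet5 MAE (mV) |
|---|---|---|---|---|---|---|
| Gaussian White Noise | 50.6 | $2.9 \times 10^{-6}$ | $8 \times 10^{-3}$ | 52 | $2.1 \times 10^{-6}$ | $6.5 \times 10^{-3}$ |
| ECG | 50 | $2.5 \times 10^{-5}$ | $9.4 \times 10^{-3}$ | 49.4 | $2.5 \times 10^{-5}$ | $12 \times 10^{-3}$ |
| EEG | 51.1 | $2.64 \times 10^{-8}$ | $1.7 \times 10^{-3}$ | 49.6 | $4.2 \times 10^{-8}$ | $2.3 \times 10^{-3}$ |
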
Figure 5.30: Four seconds of (a) ECG data record-232 (top figure) and reconstruction error (bottom
figure), and (b) EEG data record-chb14 (top figure) and reconstruction error (bottom figure).
Figure 5.30 presents four seconds of reconstructed ECG and EEG data after being
fed into a three-level ilet3 IIR/FIR hybrid wavelet filter bank with finite-precision
coefficients. The top, middle and bottom plots in both Figures 5.30(a) and (b) are
the input signal, the output signal and the reconstruction error introduced by the system,
respectively. In addition, it can be seen that the three-level ilet3 wavelet filter
bank introduces a constant delay of 126 samples, or 0.35 seconds at a sampling rate of
360 Hz, which is the time required for the filter bank to enter the steady state. This
delay is a fraction of a second, which is negligible for monitoring purposes since, in
real-time applications, the output delay should not be perceivable by the user [172,173].
5.6.3 Hardware Validation and Cost Assessment
For hardware validation, cost assessment and performance evaluation, the ilet3 and
ilet5 hybrid IIR/FIR wavelet filter banks are designed as a one-level tree structure
with polyphase components, as presented in Figures 5.31(a) and 5.33(a), and are synthesized
and placed and routed (PAR) on a Kintex-7 (xc7k325tffg900) FPGA in Vivado v16.2, using the
System Generator for DSP in Matlab/Simulink. For the hybrid systems, the analysis allpass
sections are designed to have the ND(TDL) IIR structure with two adders, two
registers and one multiplier, whereas the synthesis FIR filters are implemented as
time-multiplexed structures, which conventionally employ an input and a coefficient memory,
a multiplier and an accumulator. The allpass filter coefficients are represented in CSD
format and the CSE method is used to implement them as hard-wired shifts
with a minimum number of adders. Since the coefficient multiplications are the most
hardware- and power-demanding arithmetic operations, the ReMB method presented
in Chapter 4 is used to replace the coefficient memory and the multipliers of the
synthesis filters. The FIR synthesis filter $R_0(z)$ for the ilet3 wavelet employs eight
distinct coefficients, whereas the synthesis filters $R_0(z)$ and $R_1(z)$ for the ilet5 wavelet
employ six and twelve distinct coefficients, respectively. In order to design the ReMBs, the
aforementioned quantized coefficients are scaled by $2^9$ to obtain integer values. The
fixed-point coefficients and their integer representations are listed in Table 5.9.
Table 5.9: FIR Synthesis Filters’ Coefficients for ilet3 and ilet5 Wavelets.

| Coefficient | ilet3 $R(z)$ Fixed-point | ilet3 $R(z)$ Integer | ilet5 $R_0(z)$ Fixed-point | ilet5 $R_0(z)$ Integer | ilet5 $R_1(z)$ Fixed-point | ilet5 $R_1(z)$ Integer |
|---|---|---|---|---|---|---|
| r0 | 0.001953125 | 1 | 0 | 0 | 0.001953125 | 1 |
| r1 | -0.00390625 | -2 | -0.001953125 | -1 | -0.001953125 | -1 |
| r2 | 0.01171875 | 6 | 0.01171875 | 6 | 0.00390625 | 2 |
| r3 | -0.033203125 | -17 | -0.103515625 | -53 | -0.0078125 | -4 |
| r4 | 0.099609375 | 51 | 0.98828125 | 506 | 0.015625 | 8 |
| r5 | -0.296875 | -152 | 0.10546875 | 54 | -0.029296875 | -15 |
| r6 | 0.888671875 | 455 | - | - | 0.056640625 | 29 |
| r7 | 0.333984375 | 171 | - | - | -0.10546875 | -54 |
| r8 | - | - | - | - | 0.201171875 | 103 |
| r9 | - | - | - | - | -0.380859375 | -195 |
| r10 | - | - | - | - | 0.720703125 | 369 |
| r11 | - | - | - | - | 0.52734375 | 270 |

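The integer columns of Table 5.9 can be reproduced numerically with the Python sketch below. It is an observation about the tabulated numbers rather than a restatement of (5.24c): it assumes a first order allpass of the form $A(z) = (\alpha + z^{-1})/(1 + \alpha z^{-1})$, takes the time-reversed, truncated impulse response as the FIR filter, scales by $2^9$ and applies convergent rounding. The function name is hypothetical, and the use of the quantized value $27/256$ for the ilet5 $\alpha_0$ is likewise an assumption.

```python
from fractions import Fraction

def synthesis_fir_integers(alpha, n_taps, frac_bits=9):
    """Time-reversed, truncated impulse response of the assumed allpass
    A(z) = (alpha + z^-1) / (1 + alpha z^-1), scaled by 2^frac_bits and rounded."""
    h = [alpha] + [(1 - alpha ** 2) * (-alpha) ** (n - 1) for n in range(1, n_taps)]
    scale = 1 << frac_bits
    # round() on a Fraction rounds ties to even, i.e. convergent rounding.
    return [round(value * scale) for value in reversed(h)]

ilet3 = synthesis_fir_integers(Fraction(1, 3), 8)
ilet5_r0 = synthesis_fir_integers(Fraction(27, 256), 6)
print(ilet3)                                     # matches the ilet3 Integer column
print(ilet5_r0)                                  # matches the ilet5 R0(z) Integer column
print([float(Fraction(z, 512)) for z in ilet3])  # corresponding fixed-point values
```

Exact rational arithmetic is used so that the rounding decisions do not depend on floating-point error. The $R_1(z)$ column is not reproduced here because the floating-point value of $\alpha_1$ is not given in this section.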
Figure 5.31(a) presents the architecture used to design a one-level hybrid IIR/FIR
wavelet filter bank for the ilet3 wavelet, whereas Figures 5.31(b) and 5.32 show
the structure of the corresponding ReMB and its controller. As $R_0(z)$ employs eight
coefficients, the controller is a 3-bit counter whose output is decoded into the single bits
A, B, and C (MSB to LSB), and logic combinations of these bits are used to
generate the ReMB control signals.
Figure 5.31: (a) The multiplier free architecture of the Hybrid IIR/FIR wavelet filter bank for ilet3,
and (b) the structure of the corresponding ReMB.
Figure 5.32: The structure of the controller designed for generating the ReMB control signals.
Table 5.10 presents these control signals, where Sx1 and Sx0 represent the MSB and LSB
of Sx, respectively. The truth tables used to design the controller logic are provided in
Appendix B, Table B.1. The maximum adder depth required by the ilet3 FIR filter
coefficients is three; hence, the ReMB has an adder depth of three and employs three
adders. The overall filter bank is multiplier free and incorporates twelve adders, of which
six are used for the analysis filter bank and the remaining six for the synthesis filter
bank.
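Table 5.10: Control signals for the ReMB designed for $R_0(z)$ of ilet3 Hybrid IIR/FIR wavelet filter
bank.

| Coefficient Z | A | B | C | S01 | S00 | S11 | S10 | S21 | S20 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| -2 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 6 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| -17 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 51 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| -152 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| 455 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 171 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
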
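The counter-driven, time-multiplexed operation can be modelled behaviourally in Python as below. This is a sketch under stated assumptions, not the implemented architecture: an ordinary integer multiplication stands in for the ReMB shift-add network, the control signals of Table 5.10 are not modelled, and the function name is hypothetical. The coefficient order follows the counter states 000 to 111 of Table 5.10.

```python
COEFFS = [1, -2, 6, -17, 51, -152, 455, 171]   # integer r0..r7 of R(z), Table 5.9
FRAC_BITS = 9                                   # the coefficients were scaled by 2^9

def time_multiplexed_fir(samples):
    """One multiply-accumulate per counter state, eight counter states per output."""
    history = [0] * len(COEFFS)                 # input memory, newest sample first
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0
        for counter in range(len(COEFFS)):      # 3-bit counter: 000 .. 111
            # In hardware the ReMB would form this product with shifts and adds,
            # steered by the control signals selected by the counter value.
            acc += COEFFS[counter] * history[counter]
        outputs.append(acc / (1 << FRAC_BITS))  # undo the 2^9 coefficient scaling
    return outputs

impulse = [1] + [0] * 7
print(time_multiplexed_fir(impulse))            # impulse response = fixed-point coefficients
```

Because one arithmetic unit is reused for all eight coefficients, a structure of this kind needs eight clock cycles per output sample, which is the usual price of time multiplexing.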
Furthermore, Figure 5.33 presents the architecture used to design a one-level hybrid
IIR/FIR wavelet filter bank for the ilet5 wavelet, whereas Figures 5.34(a) and 5.34(b)
show the structures of the ReMBs of R0(z) (ReMB0) and R1(z) (ReMB1), respectively,
employed in the synthesis filter bank. In addition, the control signals required
for the correct operation of the ReMBs and the structures of the controllers designed for ReMB0
and ReMB1 are presented in Table 5.11 and Figure 5.35, and Table 5.12 and Figure 5.36,
respectively. As mentioned before, R0(z) has six coefficients, hence a 3-bit
counter is employed, whereas R1(z) has twelve coefficients, hence a 4-bit counter is
employed and the control logic is more sophisticated. The truth tables used to design the
controller logic are provided in Appendix B, Tables B.2 and B.3 for R0(z) and R1(z),
respectively. ReMB0 and ReMB1 employ three and four adders, respectively, and
therefore the overall filter bank is multiplier free and incorporates 21 adders where six
















































































Figure 5.34: Structure of (a) ReMB0 designed for R0(z) and (b) ReMB1 designed for R1(z) of the
ilet5 Hybrid IIR/FIR wavelet filter bank.
Table 5.11: Control signals for the ReMB designed for R0(z) of ilet5 Hybrid IIR/FIR wavelet filter
bank.
Coefficient   Counter    S0         S1    S2
Z             A  B  C    S01 S00    S1    S21 S20
0             0  0  0    1   1      0     0   0
-1            0  0  1    0   1      0     0   1
6             0  1  0    1   0      0     0   0
-53           0  1  1    0   0      1     0   1
506           1  0  0    1   0      0     1   1









Figure 5.35: The controller designed for generating the ReMB control signals.
Table 5.12: Control signals for the ReMB designed for R1(z) of ilet5 Hybrid IIR/FIR wavelet filter
bank.
Coefficient   Counter       S0         S1         S2         S3         S4         S5
Z             A  B  C  D    S01 S00    S11 S10    S21 S20    S31 S30    S41 S40    S5
1             0  0  0  0    1   0      X   X      X   X      1   0      1   0      0
-1            0  0  0  1    1   0      X   X      X   X      1   0      1   0      1
2             0  0  1  0    1   0      1   1      1   1      X   X      0   0      0
-4            0  0  1  1    X   X      X   X      X   X      X   X      1   1      1
8             0  1  0  0    0   0      1   0      1   1      X   X      0   0      0
-15           0  1  0  1    0   0      X   X      X   X      1   0      1   0      1
29            0  1  1  0    1   1      1   1      1   0      X   X      0   0      0
-54           0  1  1  1    1   1      X   X      X   X      0   1      1   0      1
103           1  0  0  0    1   1      1   0      0   1      X   X      0   0      0
-195          1  0  0  1    0   0      0   1      1   0      X   X      0   0      1
369           1  0  1  0    0   1      0   0      0   0      X   X      0   0      0
















Figure 5.36: The controller designed for generating the ReMB control signals.
The filter bank resource utilization is presented in terms of LUTs and flip-flops, along
with the maximum operating frequencies and the estimated power consumption at a 50 MHz
clock frequency. Unlike the IIR/IIR wavelet filter banks, the hybrid IIR/FIR wavelet
filter banks do not employ block RAMs, which is reflected in the estimated power
consumption figures: for the ilet3 and ilet5 wavelets with 11-bit input, this results in
37.5% and 55.7% improvement in the estimated power consumption, respectively. The
proposed hybrid systems also increase the maximum achievable operating frequency,
which indicates that the maximum path delays are shorter. Therefore, the
hybrid systems become an attractive alternative to the IIR/IIR wavelet filter banks.
Table 5.13: Resource Utilization and Power Consumption of the Multiplier Free ilet3 and ilet5
IIR/FIR Filter Bank Architectures.
                           ilet3            ilet5
Input word length (bits)   8       11       8       11
Adders                     12      12       21      21
Multipliers                0       0        0       0
LUTs                       229     272      488     561
Registers                  210     258      367     445
Max. Frequency (MHz)       81      85       84.6    83
Power (mW)                 3.93    4.681    5.571   6.7
5.7 Hybrid IIR/FIR Wavelet Filter Banks for ECG Signal Denoising
In the biomedical signal processing literature the DWT is used for various purposes, and
denoising is one popular application which has received considerable
attention in biomedical signal noise reduction. The theoretical and experimental studies
showed that the hybrid IIR/FIR wavelet filter banks are capable of maintaining
near-perfect reconstruction with the right selection of parameters. However, it is also
important to evaluate their performance in noise removal scenarios. The ECG signals
are usually contaminated with noise where the noise and signal spectra overlap and the
conventional filtering techniques are insufficient to remove this noise. The DWT is a
popular tool in the field of non-stationary signal processing that provides simultaneous
time and frequency information, and has been used to detect and remove the overlapping
noise from the signal. In the ECG denoising literature, a vast amount of research has
employed FIR filter banks with various wavelet families, the most popular being the
Daubechies (e.g. db4), Symmlets (e.g. sym4) and Coiflets (e.g. coif4) [156, 174–177].
However, IIR wavelet filter bank studies are less extensive and limited to image processing
and compression applications [166, 178]. This section presents the application
of the proposed hybrid IIR/FIR wavelet filter banks to ECG signal denoising. To
the best of the author's knowledge, this is a first in the wavelet literature for ECG
denoising, and the results demonstrate that the proposed hybrid IIR/FIR DWT filter
banks achieve better denoising performance in terms of SNR improvement and MSE,
with reduced arithmetic complexity, compared to the conventional FIR/FIR
wavelets.
5.7.1 Method
Various types of noise, such as powerline interference, baseline wander
and muscle contraction artifacts, are assumed to be additive and independent
of the ECG signal; the noisy signal is generally modelled as $x_n(n) = x_c(n) + e(n)$, where
$x_n(n)$, $x_c(n)$ and $e(n)$ are the noisy ECG, the clean ECG and the composite noise,
respectively. Powerline interference can be eliminated by a digital notch filter, whereas the
spectra of the other noise sources overlap with the spectrum of the ECG, which makes
them difficult to remove with conventional filtering techniques. In such circumstances,
wavelet thresholding can be employed, where the noisy signal is decomposed into several
levels, thresholded and reconstructed [179]. The block diagram of the DWT based

























































Figure 5.37: The block diagram of the DWT based denoising method.
• Decompose the noisy ECG signal into 7 levels.
• Compute threshold for each detail coefficient (i.e. $cD_j(k)$; outputs of $h_1(k)$ at
each level) and apply the selected thresholding technique to remove EMG noise.
• Nullify the finest level approximation coefficient to remove the baseline wander
(i.e. $cA_6(k)$; $h_0(k)$ output at level 7).
• Reconstruct the thresholded detail coefficients to obtain the denoised signal.
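The four steps above can be sketched in a few lines of Python. To keep the example self-contained it swaps in the Haar wavelet for the ilet and FIR wavelets used in this chapter and applies one fixed threshold `lam` at every level, whereas the study computes a threshold per level; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

def haar_analysis(x):
    """One-level Haar DWT: returns (approximation, detail). len(x) must be even."""
    s = math.sqrt(2.0)
    half = len(x) // 2
    cA = [(x[2 * i] + x[2 * i + 1]) / s for i in range(half)]
    cD = [(x[2 * i] - x[2 * i + 1]) / s for i in range(half)]
    return cA, cD

def haar_synthesis(cA, cD):
    """Inverse of haar_analysis."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(cA, cD):
        x.extend([(a + d) / s, (a - d) / s])
    return x

def soft(c, lam):
    """Soft thresholding of a single coefficient."""
    return math.copysign(max(abs(c) - lam, 0.0), c)

def dwt_denoise(x, levels=7, lam=0.0):
    """Decompose, threshold the details, nullify the last approximation, reconstruct.
    len(x) must be a multiple of 2**levels."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, cD = haar_analysis(approx)
        details.append([soft(c, lam) for c in cD])
    approx = [0.0] * len(approx)          # removes the slowly varying baseline
    for cD in reversed(details):
        approx = haar_synthesis(approx, cD)
    return approx
```

With `lam=0.0` the routine leaves a zero-mean alternating sequence untouched and maps a constant (pure baseline) input to zero, which shows the role of the nullified approximation branch.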
For this study, soft thresholding [180], given in (5.32a), is used where the threshold





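\[ \widehat{cD}_j = \begin{cases} \mathrm{sign}(cD_j)\,(|cD_j| - \lambda), & |cD_j| \ge \lambda \\ 0, & |cD_j| < \lambda \end{cases} \qquad \text{(5.32a)} \]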






\[ \frac{(K - 2i) + cD_j^2\,(K - i) + \sum_{i=1}^{K} cD_j}{K} \qquad \text{(5.32c)} \]
where cDj and cDj are the original and denoised detail coefficient at level j, respect-
ively. σ  medianp|cD1|q{0.6745 is the noise variance of level 1 detail coefficient, K is
the length of the detail coefficient at each level, and i  1, 2, . . . , K. The threshold-
ing method and threshold criterion is empirically determined where soft thresholding is
well-known for delivering smoother outputs and the Rigorous SURE threshold selection
scheme is known for successfully identifying the small details of signal overlapped with
noise. A good comparison of different threshold selection and thresholding methods
can be found in [182].
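A minimal sketch of the three ingredients named above (the median-based noise estimate, a Rigorous SURE threshold and soft thresholding) is given below. The risk expression follows the commonly used formulation of the rigorous SURE rule, in which the squared coefficients are sorted in ascending order and unit noise variance is assumed; it is not claimed to match (5.32b) and (5.32c) term by term, and the function names are hypothetical.

```python
import math

def noise_sigma(cD1):
    """sigma = median(|cD1|) / 0.6745, estimated from the level 1 detail coefficients."""
    mags = sorted(abs(c) for c in cD1)
    n = len(mags)
    median = mags[n // 2] if n % 2 else 0.5 * (mags[n // 2 - 1] + mags[n // 2])
    return median / 0.6745

def sure_threshold(cD):
    """Threshold minimising the SURE risk over the sorted squared coefficients.
    Assumes the coefficients have already been normalised to unit noise variance."""
    K = len(cD)
    squared = sorted(c * c for c in cD)
    best_risk, best_sq = float("inf"), 0.0
    running_sum = 0.0
    for i, s in enumerate(squared, start=1):
        running_sum += s
        risk = (K - 2 * i + running_sum + (K - i) * s) / K
        if risk < best_risk:
            best_risk, best_sq = risk, s
    return math.sqrt(best_sq)

def soft_threshold(cD, lam):
    """Shrink every coefficient towards zero by lam; magnitudes below lam become zero."""
    return [math.copysign(max(abs(c) - lam, 0.0), c) for c in cD]
```

One common convention, assumed here rather than taken from the thesis, is to normalise before selecting the threshold: `lam = sigma * sure_threshold([c / sigma for c in cD])`, followed by `soft_threshold(cD, lam)`.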
5.7.2 Generated ECG data and Synthetic Noise Sources
Four raw ECG records (‘103’, ‘105’, ‘109’, and ‘118’) are randomly taken from the MIT-
BIH arrhythmia database [114] and resampled to 256 Hz. In order to obtain
clean control data, preprocessing stages are applied, including notch and highpass
filtering (cut-off frequency $f_c = 0.5$ Hz), to remove the 60 Hz powerline interference and
baseline wander, respectively. Then, the EMG interference ($x_e(k)$) is modelled as white
Gaussian noise, whereas the baseline wander is modelled as an additive combination of












where $0 < f_i \le 1$ for $i = 1, 2, \ldots, P$, $f_s$ is the sampling frequency and $W(k)$ is lowpass
filtered ($f_c = 1$ Hz) white Gaussian noise. Thus, the composite noise is obtained by
$e(k) = A\,(x_e(k) + x_{bw}(k))$, where A is the input noise scaling factor that is determined








e pkq 2 10SNR{10 (5.34)
The ECG signal denoising performance of the ilet3 and ilet5 hybrid IIR/FIR filter banks
as well as the db4, db6, db8, sym4 and coif4 FIR/FIR wavelet filter banks is evaluated
and compared by computing the MSE and the SNR improvement. In this study, the
MSE results are used to evaluate the amount of signal distortion introduced after denoising,
whereas the SNR improvement results represent the ratio of improvement in




\[ \mathrm{SNR}_{imp} = 10 \log_{10} \frac{\sum_{k=1}^{K} |x_n(k) - x_c(k)|^2}{\sum_{k=1}^{K} |x_d(k) - x_c(k)|^2} \qquad \text{(5.35)} \]
where $x_d(k)$ is the denoised ECG signal. In Figure 5.38 a 5 second segment of the
clean record ‘105’, the generated EMG noise, the baseline wander and the noisy record




Figure 5.38: A 5 second segment of the (a) clean record ‘105’, (b) the generated EMG noise (c)
generated baseline wander and (d) the noisy record with an SNR of -8 dB.
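The two evaluation metrics used in this section can be computed as in the sketch below. The MSE is taken here as the mean of the squared difference between the denoised and clean signals, which is the standard definition and an assumption of this example; the SNR improvement is the ratio of the noise energy before and after denoising, expressed in dB.

```python
import math

def mse(denoised, clean):
    """Mean square error between the denoised and the clean signal."""
    return sum((d - c) ** 2 for d, c in zip(denoised, clean)) / len(clean)

def snr_improvement_db(noisy, denoised, clean):
    """Residual noise energy before denoising over residual energy after, in dB."""
    before = sum((n - c) ** 2 for n, c in zip(noisy, clean))
    after = sum((d - c) ** 2 for d, c in zip(denoised, clean))
    return 10.0 * math.log10(before / after)
```

As a small worked case, if the error amplitude drops from 1 to 0.1 on every sample, the MSE after denoising is 0.01 and the SNR improvement is 20 dB.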
5.8 Results and Discussions
Figure 5.38 shows that an input SNR of -8 dB results in an ECG signal which
is buried in noise, where the QRS peaks are barely visible and the lower-amplitude ECG
characteristics such as the T and P waves cannot be easily distinguished. Thus,
at such a low input SNR the ECG signal cannot be used for diagnostic
purposes, which necessitates noise removal. On the other hand, ECG recordings do
not always suffer from strong noise; therefore, higher SNR levels are also required to
mimic real life applications. Accordingly, four 60 second long records (‘103’, ‘105’, ‘109’,
and ‘118’) are contaminated by adding the synthetically generated EMG and baseline
wander with SNR ranging from 12 to 16 dB. For each data record and at each SNR,
100 Monte Carlo simulations are performed and the average SNR improvement and
MSE are computed. The results obtained after denoising the noisy record ‘105’ are
presented in Figure 5.39.
Figure 5.39: Average (a) SNR Improvement (dB), and (b) MSE, after wavelet denoising with ilet3,
ilet5, db4, db6, db8, sym4 and coif4.
Figure 5.39 (a) shows that the SNR improvement achieved after
wavelet thresholding follows a similar pattern for the different wavelet families. The noise
reduction method works more effectively when the noise power is high, which is due
to the decreasing correlation between the signal and the wavelet. The wavelet transform
simply performs a correlation analysis between the signal and the wavelet function,
and is expected to produce small-amplitude wavelet coefficients which typically
correspond to the noise. In this way a considerable amount of noise is suppressed. The
basic idea behind this method is that large-magnitude wavelet coefficients typically
correspond to the sharp features of the signal, such as the QRS complex, which are preserved.
However, this leads to signal distortion, especially in the relatively smoother features
such as the T and P waves. Thus, measurement of the SNR improvement is not
sufficient to determine the effectiveness of the noise reduction since the time-domain
characteristics of an ECG signal are diagnostically important. Hence, the MSE after
denoising is calculated and presented in Figure 5.39 (b). In this case, it can be observed
that higher noise power results in increased MSE, as the wavelet thresholding method
cannot distinguish between the signal and noise components as effectively and some
of the signal characteristics are distorted. Therefore, although the SNR improvement
values are promising, the important features of the ECG signal are affected.
Nevertheless, the presented MSE values are small and the distortion is negligible for
this application example. The aim of this study is to compare the denoising perform-
ance as well as the computational complexity of the proposed IIR filter based wavelet
filter banks with the conventional wavelet filter banks. Therefore, it is important to
highlight that the SNR improvement and MSE values obtained in this study may not
be optimal and can be improved with alternative/additional noise removal algorithms,
such as adaptive filtering and Independent Component Analysis (ICA).
From Figure 5.39, it can be observed that the ilet5 hybrid IIR/FIR wavelet filter bank
(red) provides the highest SNR improvement and the lowest MSE when compared to
the others, whilst coif4 FIR/FIR wavelet filter bank (burgundy) provides the second
highest SNR improvement and the lowest MSE results. Table 5.14 presents the average
SNR improvement figures (in dB) and the MSE values obtained after denoising four
noisy ECG records with an input SNR of -8 dB by using the aforementioned wavelet filter banks.
Table 5.14: SNR improvement (dB) and MSE after wavelet denoising the four noisy ECG records
with input SNR of -8 dB.
         SNR Improvement (dB)
Record   ilet3   ilet5   db4     db6     db8     sym4    coif4
‘103’    11.95   12.5    11.19   11.57   11.69   11.43   12.27
‘105’    12.5    13.17   11.67   12.18   12.39   11.95   12.85
‘109’    13      13.78   12.16   12.68   12.94   12.28   13.33
‘118’    12.18   12.86   11.35   11.82   12.08   11.51   12.53

         Mean Square Error (MSE)
Record   ilet3   ilet5   db4     db6     db8     sym4    coif4
‘103’    0.038   0.034   0.046   0.042   0.041   0.043   0.036
‘105’    0.033   0.028   0.04    0.036   0.034   0.038   0.037
‘109’    0.061   0.051   0.074   0.066   0.062   0.072   0.057
‘118’    0.064   0.056   0.078   0.07    0.066   0.075   0.059
In Table 5.14 the SNR improvement and MSE values for all data records vary for differ-
ent mother wavelets while the threshold method and rule are kept the same. This indic-
ates that the performance of the well-known wavelet thresholding based noise removal
method can be improved by using different wavelet families. This is simply due to the
various filter characteristics such as transition bandwidth and passband ripples, which
are directly related to the filtering performance. The analysis filter bank responses
are presented in Appendix C. These figures show that the highpass branches of the
analysis filter banks of ilet5 and coif4 exhibit wider passband regions with sharper
transition bands compared to the others. Hence, ilet5 provides the highest SNR improvement
and the lowest MSE values for all the data records, which are presented in
Table 5.14 as bold numbers. The ilet5 hybrid filter bank is followed by the coif4 both
under high and relatively low input noise power due to the better frequency selectivity
achieved. However, the computational complexity of these filter banks needs attention
since the ilet5 hybrid IIR/FIR wavelet filter bank achieves better frequency selectivity
by employing only 20 coefficient multiplications whereas the coif4 FIR/FIR wavelet
filter bank requires 48, with the assumption that the filter banks are implemented as
polyphase structures. Although an efficient method of implementing the
FIR/FIR wavelet filter banks is proposed in Chapter 4, the same method is also applied to the hybrid
IIR/FIR wavelet filter banks, which makes them superior to their FIR/FIR counterparts
for high order wavelets such as coif4. In addition, the db4, sym4, db6 and
db8 analysis and synthesis filters employ 16, 16, 24 and 32 coefficient multiplications
(with the assumption that the filter banks are implemented as polyphase structures),
respectively whereas the ilet3 hybrid filter bank employs only 9 and leads to better
SNR improvement and lower MSE at the output. While SNR improvement is an im-
portant factor in the denoising applications, it is important to note that lower MSE is
an indication of a relatively smaller signal distortion after denoising which is a signific-
ant factor for diagnostic applications. In terms of the computational complexity, since
the analysis filter banks of the proposed hybrid filter banks are implemented using the
allpass based halfband polyphase IIR filters, the arithmetic and storage complexity of
the hybrid filter banks will always be lower in comparison to the FIR wavelets based
on the selected mother wavelet and the FIR filter structure. Also, it is a well known
fact that, for fixed-point implementations, FIR filters are more sensitive to coefficient
quantization and require longer word-lengths than the allpass based halfband
polyphase IIR filters, which further increases the system complexity.
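The coefficient multiplication counts quoted in this discussion can be compared directly. The short sketch below only restates those counts and computes the relative saving of a hybrid filter bank over an FIR/FIR one; the names `MULTIPLICATIONS` and `saving_percent` are hypothetical.

```python
# Coefficient multiplications per filter bank, as quoted in the text
# (polyphase implementation assumed for all of them).
MULTIPLICATIONS = {
    "ilet3": 9, "ilet5": 20,
    "db4": 16, "sym4": 16, "db6": 24, "db8": 32, "coif4": 48,
}

def saving_percent(proposed, reference):
    """Relative reduction in multiplications of `proposed` with respect to `reference`."""
    ref = MULTIPLICATIONS[reference]
    return 100.0 * (ref - MULTIPLICATIONS[proposed]) / ref
```

For instance, `saving_percent("ilet5", "coif4")` evaluates to roughly 58.3 and `saving_percent("ilet3", "db4")` to 43.75.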
5.9 Conclusions
In this chapter, the design methodology for orthonormal IIR wavelets, introduced by
Zhang et al. [165] is described in detail. Using this method two IIR wavelet filters,
ilet3 and ilet5, are designed and their corresponding filter banks are implemented in
floating and fixed point precision. In order to eliminate the non-linear phase effects and
achieve a near perfect reconstruction, the synthesis filters need to be the time-reversed
versions of the analysis filters, which results in non-causal filters. Although it is possible
to achieve this for off-line processing with finite length input, extra care needs
to be taken for real-time implementation with infinitely long input sequences. Thus,
two methods for realising the non-causal filter implementation are introduced. The first
method incorporates a block processing technique where the input is divided into small
blocks and time-reversal of the input is achieved with the help of dual-port RAMs.
Then, the time reversed input blocks are filtered with allpass sections, added and time
reversed again to obtain the required output. The effects of the block sizes are investigated
and it is concluded that L0 = 8 for the ilet3 filters, and L0 = 8 and L1 = 16 for the ilet5 filters,
are sufficient to achieve near perfect reconstruction. These filter banks are
later converted into fixed point precision and implemented in the MATLAB/Simulink environment,
where the quantization effects are evaluated for white Gaussian noise, ECG
and EEG data. The evaluated systems are also synthesized and placed and routed on a
Kintex-7 FPGA device in order to obtain the resource utilization, maximum operating
frequencies and estimated power consumption figures. The second method for implementing
the synthesis filter banks incorporates the design and implementation of FIR
approximations of the required anti-causal IIR synthesis filters. The lengths of
the FIR filters are determined by evaluating the perfect reconstruction properties of
the filter, where L0 = 8 for the ilet3 filters, and L0 = 7 and L1 = 11 for the ilet5 filters, are selected.
Similarly, the floating point models are converted into fixed-point precision
and later synthesized and placed and routed on a Kintex-7 FPGA. The FPGA
synthesis results demonstrated that the hybrid systems are less power demanding and
are attractive alternatives to the IIR/IIR wavelet filter banks.
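The time-reversal idea behind the first method can be illustrated on a single finite sequence. The sketch below uses a first-order allpass section with a hypothetical coefficient a = 0.5 rather than the ilet3 or ilet5 sections, and it reverses the whole zero-padded sequence at once instead of working on RAM-buffered blocks, so it shows the principle only.

```python
def allpass(x, a):
    """Causal first-order allpass A(z) = (a + z^-1) / (1 + a z^-1), zero initial state."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for v in x:
        out = a * v + x_prev - a * y_prev
        y.append(out)
        x_prev, y_prev = v, out
    return y

def anticausal_allpass(x, a):
    """Realise A(z^-1) on a finite sequence: reverse, filter causally, reverse again."""
    return allpass(x[::-1], a)[::-1]

# Analysis with A(z) followed by synthesis with A(z^-1) cancels the phase,
# because A(z) * A(z^-1) = 1 for an allpass section.
a = 0.5                                        # hypothetical allpass coefficient
x = [1.0, 2.0, 3.0, -1.0, 0.5] + [0.0] * 60    # zero padding lets the response decay
y = allpass(x, a)
x_rec = anticausal_allpass(y, a)
```

Here `x_rec` matches `x` to within the truncated tail of the allpass response, which is why the block length in a block-based realisation has to be long enough for that tail to become negligible.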
Furthermore, the proposed hybrid IIR/FIR wavelet filter banks as well as the state-
of-the-art FIR/FIR wavelet filter banks are employed in an ECG denoising applica-
tion based on DWT denoising method. The results of this study demonstrated that
the proposed systems provide better frequency selectivity and hence better denoising
performance compared to the state-of-the-art FIR filter based systems with reduced
computational complexity. Therefore, the work proposed in this chapter concludes that
the IIR/IIR wavelet filter banks are computationally complex and have high memory
requirements whereas their hybrid counterparts provide more efficient implementation.
In addition, it is demonstrated that the hybrid systems are less computationally complex
and require less power compared to the FIR based filter banks while providing
better filter specifications that are useful for applications such as denoising. It should
be noted that the DWT based denoising methods are not always sufficient to
provide the required output SNRs for biomedical signals; thus, they are used along with
other techniques such as adaptive filtering, which are hardware and power demanding.
The reduction in the complexity and power requirements of the DWT filter bank by
using the proposed hybrid systems creates resources for additional methods and enables
their employment in power limited portable health monitoring applications.
Chapter 6
Conclusions and Future Work
6.1 Conclusions
Wearable, mobile health monitoring systems have received a considerable amount
of attention in recent years as a means of providing remote, supervised and independent
living for patients suffering from long term medical conditions. The key to achieving
this is to provide continuous and real time monitoring by using on-body sensors that
have on-site processing abilities and are capable of long-term transmission. However,
these systems face the challenge of operating in a resource and power constrained environment,
which requires low-complexity design solutions. Based on this motive, this
research proposed the design and implementation of complexity reduced digital signal
processor solutions for biomedical applications that reduce the power consumption
and can be employed in power limited portable health monitoring devices.
The work presented in this thesis first proposed the design of a low-complexity multiplier
free decimation chain composed of a fourth order Slink filter, two fifth order allpass
based polyphase halfband IIR filters and a first order allpass based Slink compensator.
The decimation filters used in biomedical applications employ high order FIR filters,
which increases the complexity of the decimators and hence the digital signal processor.
The proposed decimator provides reduction in complexity and offers an attractive al-
ternative to existing solutions in the literature by replacing the very high order FIR
based decimators with IIR decimators which employ only two distinct coefficients that
are implemented with simple shift and add operations. The proposed design is tested
and evaluated by feeding real ECG data through a Σ∆ modulator which is then filtered
and decimated via the proposed decimation chain. The phase non-linearity of the IIR
filters is an important factor that requires attention. Therefore, the phase non-linearity
effects of the proposed decimator are evaluated by measuring the time- and frequency-domain
distortion introduced to the filtered data. It is demonstrated that the phase
non-linearity of the IIR polyphase filters does not cause significant distortion of the
morphological and spectral characteristics of the input ECG signal. This is due to the
very narrow and low frequency range corresponding to the physiologically significant frequencies
of the ECG signals. In other words, these frequency bands of interest are close
to DC, where the group delay variation is already minimal (minimum phase filter).
The high spectral coherence, high morphological correlation and low error between the
input and the output signals confirm that the IIR polyphase filter introduces minimal
distortion to the signal, which would not affect critical diagnosis; therefore, phase
compensation is not essential for such applications.
In a typical on-body sensor the ADC is followed by the digital signal processing operations.
The DWT is one of the most popular tools employed in the biomedical signal
processing literature and it is commonly realised as tree-structured filter banks composed
of highpass and lowpass filters derived from wavelet functions, also known as
mother wavelets. These filters are conventionally implemented as FIR filters which
employ intensive multiply-add operations. The complexity of the DWT is reported
to be high which limits its application in the area and power limited on-body sensor
nodes. Therefore, in Chapter 4 an efficient implementation method for the DWT fil-
ters is proposed in which the resource and power hungry multipliers are replaced by
specifically designed ReMBs. In this method, the constant coefficient multiplications
of the scaling and wavelet filters are replaced with shift-add networks with an addition
of multiplexers in a time-multiplexed FIR filter structure. It is shown that addition
of the multiplexers introduces reconfigurability to the well known constant multiplication
blocks. By taking advantage of recent FPGA technologies with 6-input
LUTs, 3:1/4:1 multiplexers are employed in the design of the ReMBs at no additional hardware
cost, which updates the concepts proposed in the open literature. The proposed novel
solution for the DWT filter bank implementation is employed for the design of the db4
mother wavelet based filter bank, in order to evaluate the resource and power efficiency
of the proposed method. For this purpose an ReMB is specifically designed for the db4
filters and employed in time-multiplexed FIR filters within conventional DWT filter banks.
The proposed design is then implemented on a Kintex-7 FPGA platform and compared
to the reference designs employing parallel multipliers and to the other multiplier
block designs presented in the open literature. Although there is a wide literature on
efficient FPGA and VLSI implementations of the wavelet transform, to the best of the author’s
knowledge, the application of reconfigurable multiplier blocks with a structure optimized for
FPGA platforms has not been investigated in the field of biomedical signal processing.
The replacement of multipliers in the DWT with shift-add networks has been the subject of research
in image processing and image compression applications; however, reconfigurable
constant multiplications have not been studied. As the results demonstrated, the proposed
ReMB massively reduces the resource utilization when compared to the parallel multipliers.
The savings ratio increases with increasing input word-length, as
the number of adders in the parallel multiplier increases while the number of adders in
the ReMB remains the same. Furthermore, the 1-level analysis filter bank cost assessment
results also demonstrated that the proposed system massively improves the resource
utilization and power consumption compared to the open-literature and the conven-
tional reference design. The FPGA implementation results provided an insight that
the proposed structure of the multiplier block is low-cost and power efficient compared
to other FPGA implementations. Thus, such structures are suitable for DWT filter
banks and can be used for ASIC implementations and employed in low-cost embedded
platforms for ambulatory physiological signal monitoring and analysis.
Although an efficient implementation of the FIR filter based DWT filter banks is in-
troduced in Chapter 4, Chapter 3 presented that IIR filters are attractive alternatives
to FIR filters in terms of complexity reduction. Therefore, in Chapter 5 the design
and implementation of IIR filters based DWT filter banks are proposed for further
complexity reduction. For this purpose two IIR wavelet filters, ilet3 and ilet5, are
proposed, designed and implemented in floating and fixed point precision. The
IIR wavelet filters employed in the analysis filter bank suffer from non-linear phase,
and these effects must be eliminated in order to perfectly reconstruct the decomposed
input. In order to eliminate the non-linear phase effects and achieve a near
perfect reconstruction, the synthesis filters need to be the time-reversed versions of the
analysis filters, which results in non-causal filters. Thus, two methods for realising the
non-causal filter implementation are introduced. The first method incorporated block
processing technique where the input is divided into small blocks and with the help of
dual-port RAMs time-reversal of the input is achieved. Then, the time reversed input
blocks are filtered with allpass sections, added and time reversed again to obtain the
required output. These filter banks are later converted into fixed point precision and
implemented in MATLAB/Simulink environment where the quantization effects are
evaluated for white Gaussian noise, ECG and EEG data. The evaluated systems are
also synthesized and placed and routed on a Kintex-7 FPGA device in order to evaluate
the resource utilization and power consumption of the IIR/IIR filter banks. However,
the implementation results demonstrated that employing the block processing method
increases the hardware complexity as well as the power consumption when compared
to the FIR filter based DWT filter banks which does not serve the purpose of this
research. Therefore, the second method for implementing the synthesis filter banks is
proposed which incorporated the design and implementation of FIR approximations of
the required anti-causal IIR synthesis filters. The lengths of the FIR filters are determined
by evaluating the perfect reconstruction properties of the filter, and the floating
point and fixed-point models are implemented. The fixed point models are then synthesized
and placed and routed on a Kintex-7 FPGA. The filter banks composed of
IIR analysis and FIR synthesis filters are referred to as hybrid IIR/FIR wavelet filter
banks. The FPGA synthesis results demonstrated that the hybrid systems are less
power demanding and are attractive alternatives to the IIR/IIR wavelet filter banks.
The proposed hybrid IIR/FIR wavelet filter banks and the state-of-the-art FIR/FIR
wavelet filter banks are employed in an ECG denoising application based on DWT
denoising method. The results of this study demonstrated that the proposed systems
provide better frequency selectivity and hence better denoising performance compared
to the state-of-the-art FIR filter based systems with reduced computational complex-
ity. Therefore, the work proposed in Chapter 5, concluded that the IIR/IIR wavelet
filter banks are computationally complex and have high memory requirements whereas
their hybrid counterparts provide a more efficient implementation. In addition, it is
demonstrated that the hybrid systems are less computationally complex and require
less power compared to the FIR based filter banks while providing better filter spe-
cifications that are useful for applications such as denoising. The reduction in the
complexity and power requirements of the DWT filter bank by using the proposed
hybrid systems creates resources for additional methods and enables their employment
in power limited portable health monitoring applications.
6.2 Future Work
The work proposed in this research demonstrated that the hardware complexity and
power consumption of a biomedical signal processor can be massively reduced when
compared to the state-of-the-art design solutions for the decimation filters and the
DWT filter banks. The hardware utilization and power consumption of the proposed
designs are evaluated on a Kintex-7 FPGA; however, the designs are also suitable for ASIC
implementations to be employed in low-cost embedded platforms for ambulatory physiological
signal monitoring and analysis. Therefore, the design of a custom ASIC using the
proposed systems is the next step towards creating low-cost and low-complexity DSPs
for wearable body area networks.
The work presented in this thesis can be further extended by incorporating various
implementation techniques for reducing the hardware complexity of DWT filter banks
which are required to be applied for several decomposition and reconstruction levels.
As the analysis and synthesis filters are the same for each level of DWT analysis and the
required operating system frequency is rather slow, time-multiplexed architectures can
be employed where only one set of analysis and synthesis filters is required to reduce
the hardware resources. However, time-multiplexed designs will require additional
memory and memory read-and write operations in order to access the intermediate
detail and approximation coefficients of the filter-bank, hence this will increase the
power consumption of the overall system. Therefore, alternative methods need to be
investigated too.
Chapter 3 covers the design of multi-stage IIR filter based decimation filters to be
employed in biomedical applications. Due to the importance of the time domain characteristics
of the biomedical data, a non-linear phase analysis is carried out where ECG
data records are used for test purposes. The presented results demonstrated that the
non-linear phase of the allpass based half-band polyphase IIR filters results in only a
negligible amount of distortion in the ECG signal. However, due to time limitations the
research only considered ECG signals which are commonly used for ambulatory mon-
itoring applications. In the future, the study undertaken can be extended to consider
EEG signals with smaller amplitudes and narrower bandwidth and EMG signals with
larger amplitude and wider bandwidth compared to the ECG signals.
Furthermore, in this thesis, a novel class of IIR/FIR hybrid wavelet filter banks is
proposed where the FIR filters are designed using the impulse response truncation
method. The constructed synthesis filters can approximately eliminate the non-linear
phase effects and the amplitude distortion caused by the analysis filter bank. However,
the selection of the FIR filter lengths is arbitrary and there is a trade-off between low
complexity and low delay on the one hand and the minimization of the amplitude, phase and
aliasing distortion on the other. The impulse response truncation method is not optimal in the
sense of the H∞-norm. In other words, the selected filter lengths L0,1 do not provide the minimal
magnitude error. Therefore, this work can be further extended by investigating
different approximation methods such as Linear Matrix Inequality (LMI) and Balanced
Model Truncation (BMT), which can potentially minimise the peak error with shorter
filter lengths [183,184].
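The dependence of the peak error on the truncation length can be illustrated with a small sketch. It does not reproduce the hybrid filter banks of this thesis: it assumes a single first-order allpass section A(z) = (a + z^-1)/(1 + a z^-1) with a hypothetical coefficient a = 0.5, truncates its impulse response to a chosen number of taps, and evaluates the largest frequency-response deviation on a uniform grid. The function names are illustrative only.

```python
import cmath
import math


def allpass_impulse_response(a, length):
    """First `length` samples of A(z) = (a + z^-1) / (1 + a z^-1)."""
    h = []
    x_prev = 0.0
    y_prev = 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        # Difference equation: y[n] = a*x[n] + x[n-1] - a*y[n-1]
        y = a * x + x_prev - a * y_prev
        h.append(y)
        x_prev, y_prev = x, y
    return h


def peak_truncation_error(a, length, grid=512):
    """Largest |H_fir - A| over `grid + 1` frequencies in [0, pi]."""
    h = allpass_impulse_response(a, length)
    worst = 0.0
    for k in range(grid + 1):
        w = math.pi * k / grid
        z_inv = cmath.exp(-1j * w)
        exact = (a + z_inv) / (1 + a * z_inv)
        fir = sum(c * z_inv ** n for n, c in enumerate(h))
        worst = max(worst, abs(fir - exact))
    return worst
```

For this section the peak error has the closed form (1 + a) a^(L-1), so with a = 0.5 it falls from 0.1875 at L = 4 to about 0.0117 at L = 8. Truncation therefore trades filter length directly against peak error, which is the trade-off that H∞-oriented approximation methods aim to improve.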
A typical biomedical signal processor can be used for various applications, such as signal
detection, artifact removal and/or classification. The low-complexity and low-power
solutions proposed in this work can be easily extended to many other fields where the
DWT is to be employed in conjunction with other detection and/or noise reduction and/or
classification algorithms. Further investigation is required as these methods are usually
computationally complex and power demanding. For instance, the adaptive filtering
techniques used to eliminate the ocular artifacts from the EEG as in [185] include
iterative processing and updating of variables, or the independent component analysis




[1] D. McDonagh, O. Eljamaly, and A. Burdett, “1V 14 µW Switched-Opamp ∆Σ-
ADC for Bioelectric Data Aquisition,” in 4th IEEE/EMBS International Summer
School and Symposium on Medical Devices and Biosensors, ISSS-MDBS. IEEE,
2007, pp. 147–150.
[2] K. Muthusamy, T. H. Teo, and Y. P. Xu, “A 1-V 32-µW 13-bit CMOS Sigma-
Delta A/D Converter for Biomedical Applications,” in IEEE 8th International
Conference on ASIC, ASICON’09. IEEE, 2009, pp. 207–210.
[3] T. H. Teo, X. Qian, P. K. Gopalakrishnan, Y. S. Hwan, K. Haridas, C. Y. Pang,
H.-K. Cha, and M. Je, “A 700-µW Wireless Sensor Node SoC for Continuous Real-
Time Health Monitoring,” IEEE Journal of Solid-State Circuits, vol. 45, no. 11,
pp. 2292–2299, 2010.
[4] X. Liu, Y. Zheng, M. W. Phyu, F. Endru, V. Navaneethan, and B. Zhao, “An
Ultra-Low Power ECG Acquisition and Monitoring ASIC System for WBAN
Applications,” IEEE Journal on Emerging and Selected Topics in Circuits and
Systems, vol. 2, no. 1, pp. 60–70, 2012.
[5] J. Garcia, S. Rodriguez, and A. Rusu, “A Low-Power CT Incremental 3rd Order
Sigma Delta ADC for Biosensor Applications,” IEEE Transactions on Circuits
and Systems I: Regular Papers, vol. 60, no. 1, pp. 25–36, 2013.
[6] M. Kumm and P. Zipf, Multiple Constant Multiplication Optimizations for Field
Programmable Gate Arrays. Springer, 2016.
[7] DH/Long Term Conditions, “Raising the Profile of Long Term Conditions
Care: A Compendium of Information,” Jan 2008. [Online]. Available:
http://webarchive.nationalarchives.gov.uk/20130107105354/http://www.dh.gov.uk/prod_consum_dh/groups/dh_digitalassets/documents/digitalasset/dh_082067.pdf
[8] P. Large, “Long Term Conditions Compendium of Information: Third Edition,”
June 2015. [Online]. Available:
http://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/bulletins/annualmidyearpopulationestimates/latest
[9] DH/ Long Term Conditions, “Long Term Conditions Com-




[10] Y. Hao and R. Foster, “Wireless Body Sensor Networks for Health-Monitoring
Applications,” Physiological Measurement, vol. 29, no. 11, p. R27, 2008.
[11] C. R. May, T. L. Finch, J. Cornford, C. Exley, C. Gately, S. Kirk, K. N. Jenkings,
J. Osbourne, A. L. Robinson, A. Rogers et al., “Integrating Telecare for Chronic
Disease Management in the Community: What Needs to be Done?” BMC Health
Services Research, vol. 11, no. 1, p. 1, 2011.
[12] R. Igual, C. Medrano, and I. Plaza, “Challenges, Issues and Trends in Fall De-
tection Systems,” Biomed. Eng. Online, vol. 12, no. 66, pp. 1–66, 2013.
[13] C. Wang, W. Lu, M. R. Narayanan, S. J. Redmond, and N. H. Lovell, “Low-
power Technologies for Wearable Telecare and Telehealth Systems: A Review,”
Biomedical Engineering Letters, vol. 5, no. 1, pp. 1–9, 2015.
[14] H. Cao, V. Leung, C. Chow, and H. Chan, “Enabling Technologies for Wireless
Body Area Networks: A Survey and Outlook,” IEEE Communications Magazine,
vol. 47, no. 12, pp. 84–93, 2009.
[15] L. Gatzoulis and I. Iakovidis, “Wearable and Portable eHealth Systems,” IEEE
Engineering in Medicine and Biology Magazine, vol. 26, no. 5, pp. 51–56, 2007.
[16] A. Lymberis and A. Dittmar, “Advanced Wearable Health Systems and
Applications-Research and Development Efforts in the European Union,” IEEE
Engineering in Medicine and Biology Magazine, vol. 26, no. 3, pp. 29–33, 2007.
[17] M. Chan, D. Estève, J.-Y. Fourniols, C. Escriba, and E. Campo, “Smart Wearable
Systems: Current Status and Future Challenges,” Artificial Intelligence in
Medicine, vol. 56, no. 3, pp. 137–156, 2012.
[18] K. Montgomery, C. Mundt, G. Thonier, A. Tellier, U. Udoh, V. Barker, R. Ricks,
L. Giovangrandi, P. Davies, Y. Cagle et al., “Lifeguard-A Personal Physiological
Monitor for Extreme Environments,” in 26th Annual International Conference
of the IEEE Engineering in Medicine and Biology Society, IEMBS’04., vol. 1.
IEEE, 2004, pp. 2192–2195.
[19] M. Sung, C. Marci, and A. Pentland, “Wearable Feedback Systems for Rehab-
ilitation,” Journal of Neuroengineering and Rehabilitation, vol. 2, no. 1, p. 1,
2005.
[20] B.-S. Lin, B.-S. Lin, N.-K. Chou, F.-C. Chong, and S.-J. Chen, “RTWPMS: A
Real-Time Wireless Physiological Monitoring System,” IEEE Transactions on
Information Technology in Biomedicine, vol. 10, no. 4, pp. 647–656, 2006.
[21] P. K. Jain and A. K. Tiwari, “Heart Monitoring Systems-A Review,” Computers
in Biology and Medicine, vol. 54, pp. 1–13, 2014.
[22] S. Ha, C. Kim, Y. M. Chi, A. Akinin, C. Maier, A. Ueno, and G. Cauwen-
berghs, “Integrated Circuits and Electrode Interfaces for Noninvasive Physiolo-
gical Monitoring,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 5,
pp. 1522–1537, 2014.
[23] J. Hulzink, M. Konijnenburg, M. Ashouei, A. Breeschoten, T. Berset, J. Huisken,
J. Stuyt, H. de Groot, F. Barat, J. David et al., “An Ultra Low Energy Biomedical
Signal Processing System Operating at Near-Threshold,” IEEE Transactions on
Biomedical Circuits and Systems, vol. 5, no. 6, pp. 546–554, 2011.
[24] C. Bachmann, M. Ashouei, V. Pop, M. Vidojkovic, H. Groot, and B. Gyselinckx,
“Low-Power Wireless Sensor Nodes for Ubiquitous Long-Term Biomedical Signal
Monitoring,” IEEE Communications Magazine, vol. 50, no. 1, pp. 20–27, 2012.
[25] A. J. Casson, “Opportunities and Challenges for Ultra Low Power Signal Pro-
cessing in Wearable Healthcare,” in 23rd European Signal Processing Conference
(EUSIPCO). IEEE, 2015, pp. 424–428.
[26] S.-Y. Lee, J.-H. Hong, C.-H. Hsieh, M.-C. Liang, S.-Y. C. Chien, and K.-H.
Lin, “Low-Power Wireless ECG Acquisition and Classification System for Body
Sensor Networks,” IEEE Journal of Biomedical and Health Informatics, vol. 19,
no. 1, pp. 236–246, 2015.
[27] N. Verma, A. Shoeb, J. Bohorquez, J. Dawson, J. Guttag, and A. P.
Chandrakasan, “A Micro-Power EEG Acquisition SoC with Integrated Feature
Extraction Processor for a Chronic Seizure Detection System,” IEEE Journal of
Solid-state Circuits, vol. 45, no. 4, pp. 804–816, 2010.
[28] J. Yoo, L. Yan, D. El-Damak, M. A. B. Altaf, A. H. Shoeb, and A. P.
Chandrakasan, “An 8-Channel Scalable EEG Acquisition SoC with Patient-
Specific Seizure Classification and Recording Processor,” IEEE Journal of Solid-
State Circuits, vol. 48, no. 1, pp. 214–228, 2013.
[29] J. Jin and E. Sánchez-Sinencio, “A Home Sleep Apnea Screening Device with
Time-Domain Signal Processing and Autonomous Scoring Capability,” IEEE
Transactions on Biomedical Circuits and Systems, vol. 9, no. 1, pp. 96–104, 2015.
[30] A. Page, C. Sagedy, E. Smith, N. Attaran, T. Oates, and T. Mohsenin, “A Flex-
ible Multichannel EEG Feature Extractor and Classifier for Seizure Detection,”
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 2,
pp. 109–113, 2015.
[31] P. Turcza, “Distributed-Arithmetic-Based DWT Processor for Neural Recording
Systems,” in Mixed Design of Integrated Circuits and Systems, 2017 MIXDES-
24th International Conference. IEEE, 2017, pp. 591–595.
[32] Y. Yang, S. Boling, and A. J. Mason, “A Hardware-Efficient Scalable Spike Sort-
ing Neural Signal Processor Module for Implantable High-Channel-Count Brain
Machine Interfaces,” IEEE Transactions on Biomedical Circuits and Systems,
vol. 11, no. 4, pp. 743–754, 2017.
[33] X. Liu, Y. Zheng, M. W. Phyu, B. Zhao, M. Je, and X. Yuan, “Multiple Functional
ECG Signal Processing for Wearable Applications of Long-Term Cardiac
Monitoring,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 2, pp.
380–389, 2011.
[34] M. Khayatzadeh, X. Zhang, J. Tan, W.-S. Liew, and Y. Lian, “A 0.7-V 17.4-µW
3-Lead Wireless ECG SoC,” IEEE Transactions on Biomedical Circuits
and Systems, vol. 7, no. 5, pp. 583–592, 2013.
[35] Y.-J. Min, H.-K. Kim, Y.-R. Kang, G.-S. Kim, J. Park, and S.-W. Kim, “Design
of Wavelet-Based ECG Detector for Implantable Cardiac Pacemakers,” IEEE
Transactions on Biomedical Circuits and Systems, vol. 7, no. 4, pp. 426–436,
2013.
[36] X. Liu, J. Zhou, Y. Yang, B. Wang, J. Lan, C. Wang, J. Luo, W. L. Goh,
T. T.-H. Kim, and M. Je, “A 457 nW Near-Threshold Cognitive Multi-Functional
ECG Processor for Long-Term Cardiac Monitoring,” IEEE Journal of Solid-State
Circuits, vol. 49, no. 11, pp. 2422–2434, 2014.
[37] X. Zhang and Y. Lian, “A 300-mV 220-nW Event-Driven ADC With Real-Time
QRS Detection for Wearable ECG Sensors,” IEEE Transactions on Biomedical
Circuits and Systems, vol. 8, no. 6, pp. 834–843, 2014.
[38] H. Kim, S. Kim, N. Van Helleputte, A. Artes, M. Konijnenburg, J. Huisken,
C. Van Hoof, and R. F. Yazicioglu, “A Configurable and Low-Power Mixed Sig-
nal SoC for Portable ECG Monitoring Applications,” IEEE Transactions on Bio-
medical Circuits and Systems, vol. 8, no. 2, pp. 257–267, 2014.
[39] Y. Zou, J. Han, S. Xuan, S. Huang, X. Weng, D. Fang, and X. Zeng, “An Energy-
Efficient Design for ECG Recording and R-peak Detection Based on Wavelet
Transform,” IEEE Transactions on Circuits and Systems II: Express Briefs,
vol. 62, no. 2, pp. 119–123, 2015.
[40] N. Bayasi, T. Tekeste, H. Saleh, B. Mohammad, and M. Ismail, “A 65-nm Low
Power ECG Feature Extraction System,” in IEEE International Symposium on
Circuits and Systems (ISCAS). IEEE, 2015, pp. 746–749.
[41] N. Bayasi, T. Tekeste, H. Saleh, B. Mohammad, A. Khandoker, and M. Is-
mail, “Low-power ECG-Based Processor for Predicting Ventricular Arrhythmia,”
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24,
no. 5, pp. 1962–1974, 2016.
[42] P. Li, X. Zhang, M. Liu, X. Hu, B. Pang, Z. Yao, H. Jiang, and H. Chen, “A 410-
nW Efficient QRS Processor for Mobile ECG Monitoring in 0.18-µm CMOS,”
IEEE Transactions on Biomedical Circuits and Systems, vol. 11, no. 6, pp. 1356–
1365, 2017.
[43] R. F. Yazıcıoğlu, C. Van Hoof, and R. Puers, “Introduction to Biopotential
Acquisition,” in Biopotential Readout Circuits for Portable Acquisition Systems.
Springer, 2009, pp. 5–19.
[44] L. Formaggia, A. Quarteroni, and A. Veneziani, Cardiovascular Mathematics:
Modeling and Simulation of the Circulatory System. Springer Science & Business
Media, 2010, vol. 1.
[45] A. G. Kléber and Y. Rudy, “Basic Mechanisms of Cardiac Impulse Propagation
and Associated Arrhythmias,” Physiological Reviews, vol. 84, no. 2, pp. 431–488,
2004.
[46] L. Sherwood, “Cardiac Physiology,” in Human Physiology: From Cells to Systems.
Cengage Learning, 2015, pp. 297–314.
[47] J. Malmivuo and R. Plonsey, “The Heart,” in Bioelectromagnetism: Principles
and Applications of Bioelectric and Biomagnetic Fields. New York, USA: Oxford
University Press, 1995, ch. 6, pp. 119–130.
[48] P. E. McSharry, G. D. Clifford, L. Tarassenko, and L. A. Smith, “A Dynamical
Model for Generating Synthetic Electrocardiogram Signals,” IEEE Transactions
on Biomedical Engineering, vol. 50, no. 3, pp. 289–294, 2003.
[49] G. D. Clifford, F. Azuaje, and P. McSharry, “ECG Statistics, Noise, Artifacts,
and Missing data,” in Advanced Methods and Tools for ECG Data Analysis, ser.
Artech House Engineering in Medicine & Biology Series. London, UK: Artech
House, 2006, vol. 6, ch. 3, pp. 55–99.
[50] H. Chatterjee, R. Gupta, J. Bera, and M. Mitra, “An FPGA Implementation of
Real-time QRS Detection,” in 2nd International Conference on Computer and
Communication Technology (ICCCT). IEEE, 2011, pp. 274–279.
[51] L. Sörnmo and P. Laguna, “The Electroencephalogram–A Brief Background,” in
Bioelectrical Signal Processing in Cardiac and Neurological Applications. Lon-
don, UK: Elsevier Academic Press, 2005, ch. 2, pp. 25–53.
[52] L. Sherwood, “The Central Nervous System,” in Human Physiology: From Cells
to Systems, 9th ed. Boston, USA: Cengage Learning, 2015, ch. 5, pp. 133–182.
[53] N. Boutros, S. Galderisi, O. Pogarell, and S. Riggio, “EEG Recording and Ana-
lysis,” in Standard Electroencephalography in Clinical Psychiatry: A Practical
Handbook. West Sussex, England: John Wiley & Sons, 2011, pp. 13–30.
[54] S. Sanei and J. A. Chambers, “Introduction to EEG,” in EEG Signal Processing.
West Sussex, England: John Wiley & Sons, 2013, ch. 1, pp. 1–31.
[55] G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, and J. R. Wolpaw,
“BCI2000: A General-Purpose Brain-Computer Interface (BCI) System,” IEEE
Transactions on Biomedical Engineering, vol. 51, no. 6, pp. 1034–1043, 2004.
[56] J. Qian, J. S. Barlow, and M. P. Beddoes, “A Simplified Arithmetic Detector for
EEG Sharp Transients-Preliminary Results,” IEEE Transactions on Biomedical
Engineering, vol. 35, no. 1, pp. 11–18, 1988.
[57] C. Papadelis, C. Kourtidou-Papadeli, P. D. Bamidis, I. Chouvarda, D. Koufogi-
annis, E. Bekiaris, and N. Maglaveras, “Indicators of Sleepiness in an Ambulatory
EEG Study of Night Driving,” in 28th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society, EMBS’06. IEEE, 2006, pp.
6201–6204.
[58] C. J. Bell, P. Shenoy, R. Chalodhorn, and R. P. Rao, “Control of a Humanoid Ro-
bot by a Noninvasive Brain–Computer Interface in Humans,” Journal of Neural
Engineering, vol. 5, no. 2, p. 214, 2008.
[59] T. J. Sullivan, S. R. Deiss, T.-P. Jung, and G. Cauwenberghs, “A Brain-Machine
Interface Using Dry-Contact, Low-Noise EEG Sensors,” in IEEE International
Symposium on Circuits and Systems, ISCAS’08. IEEE, 2008, pp. 1986–1989.
[60] W. O. Tatum IV, L. Winters, M. Gieron, E. A. Passaro, S. Benbadis, J. Ferreira,
and J. Liporace, “Outpatient Seizure Identification: Results of 502 Patients us-
ing Computer-Assisted Ambulatory EEG,” Journal of Clinical Neurophysiology,
vol. 18, no. 1, pp. 14–19, 2001.
[61] J. J. Shih and W. O. Tatum, “Computer Assisted Ambulatory Electroenceph-
alography,” in Clinical Neurophysiology, 4th ed., ser. Contemporary Neurology
Series. New York, US: Oxford University Press, 2016, ch. 10, pp. 167–173.
[62] T. C. Chen, T. H. Lee, Y. H. Chen, T. C. Ma, T. D. Chuang, C. J. Chou, C. H.
Yang, T. H. Lin, and L. G. Chen, “1.4µW/Channel 16-Channel EEG/ECoG
Processor for Smart Brain Sensor SoC,” in IEEE Symposium on VLSI Circuits
(VLSIC). IEEE, 2010, pp. 21–22.
[63] G. Robertson, G. Caldwell, J. Hamill, G. Kamen, and S. Whittlesey, “Elec-
tromyographic Kinesiology,” in Research Methods in Biomechanics, 2nd ed.
Illinois, USA: Human Kinetics, 2013, ch. 8, pp. 179–201.
[64] J. G. Webster, “Biopotential Amplifiers,” in Medical Instrumentation Application
and Design, 4th ed. Massachusetts, USA: Wiley Global Education, 2009, ch. 6,
pp. 241–291.
[65] P. Kugler, C. Jaremenko, J. Schlachetzki, J. Winkler, J. Klucken, and B. Eskofier,
“Automatic Recognition of Parkinson’s Disease using Surface Electromyography
during Standardized Gait Tests,” in 35th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2013, pp.
5781–5784.
[66] C. J. Cheng, Z. P. Yang, C.-Z. Ou, and B.-S. Lin, “Wireless Surface Elec-
tromyography Monitoring System for Dysphagia Evaluation,” in Tenth Inter-
national Conference on Intelligent Information Hiding and Multimedia Signal
Processing (IIH-MSP). IEEE, 2014, pp. 37–40.
[67] M. Balouchestani and S. Krishnan, “Long-term Surface EMG Monitoring using
K-means Clustering and Compressive Sensing,” in Compressive Sensing IV, vol.
9484. International Society for Optics and Photonics, 2015, p. 94840Q.
[68] J. Fu, J. Chen, Y. Shi, and Y. Li, “Design of A Low-Cost Wireless Surface EMG
Acquisition System,” in 6th International IEEE/EMBS Conference on Neural
Engineering (NER). IEEE, 2013, pp. 699–702.
[69] Z. Ju, G. Ouyang, M. Wilamowska-Korsak, and H. Liu, “Surface EMG Based
Hand Manipulation Identification via Nonlinear Feature Extraction and Classi-
fication,” IEEE Sensors Journal, vol. 13, no. 9, pp. 3302–3311, 2013.
[70] A. Beneteau, G. Di Caterina, L. Petropoulakis, and J. J. Soraghan, “Low-
Cost Wireless Surface EMG Sensor Using the MSP430 Microcontroller,” in 6th
European Embedded Design in Education and Research Conference (EDERC).
IEEE, 2014, pp. 264–268.
[71] D. Brunelli, A. M. Tadesse, B. Vodermayer, M. Nowak, and C. Castellini, “Low-
cost Wearable Multichannel Surface EMG Acquisition for Prosthetic Hand Con-
trol,” in 6th IEEE International Workshop on Advances in Sensors and Interfaces
(IWASI). IEEE, 2015, pp. 94–99.
[72] A. Yousefian, S. Roy, and B. Gosselin, “A Low-Power Wireless Multi-Channel
Surface EMG Sensor with Simplified ADPCM Data Compression,” in IEEE In-
ternational Symposium on Circuits and Systems (ISCAS). IEEE, 2013, pp.
2287–2290.
[73] K. D. Manalo, N. B. Linsangan, and J. L. Torres, “Classification of Myoelectric
Signals Using Multilayer Perceptron Neural Network with Back Propagation Al-
gorithm in a Wireless Surface Myoelectric Prosthesis,” International Journal of
Information and Education Technology, vol. 6, no. 9, p. 686, 2016.
[74] D. Gabor, “Theory of Communication. Part 1: The Analysis of Information,”
Journal of the Institution of Electrical Engineers-Part III: Radio and Commu-
nication Engineering, vol. 93, no. 26, pp. 429–441, 1946.
[75] A. Haar, “Zur Theorie der Orthogonalen Funktionensysteme,” Mathematische
Annalen, vol. 69, no. 3, pp. 331–371, 1910.
[76] A. Graps, “An Introduction to Wavelets,” IEEE Computational Science and
Engineering, vol. 2, no. 2, pp. 50–61, Summer 1995.
[77] A. Grossmann and J. Morlet, “Decomposition of Hardy Functions into Square
Integrable Wavelets of Constant Shape,” SIAM Journal on Mathematical Analysis,
vol. 15, no. 4, pp. 723–736, 1984.
[78] Y. Meyer, Wavelets and Operators, Advanced Mathematics. Cambridge Univer-
sity Press, Cambridge, 1992.
[79] S. G. Mallat, “A Theory for Multiresolution Signal Decomposition: The Wavelet
Representation,” IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, vol. 11, no. 7, pp. 674–693, 1989.
[80] I. Daubechies, “Orthonormal Bases of Compactly Supported Wavelets,” Com-
munications on Pure and Applied Mathematics, vol. 41, no. 7, pp. 909–996, 1988.
[81] R. X. Gao and R. Yan, “From Fourier Transform to Wavelet Transform: A
Historical Perspective,” in Wavelets. Springer, 2011, pp. 17–32.
[82] S. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way. Academic
Press, 2008.
[83] I. Daubechies, Ten Lectures on Wavelets. SIAM, 1992.
[84] T. Omari and F. Bereksi-Reguig, “An Automatic Wavelet Selection Scheme for
Heart Sounds Denoising,” in IWBBIO, 2014, pp. 1450–1462.
[85] J. Rafiee, M. Rafiee, N. Prause, and M. Schoen, “Wavelet Basis Functions in
Biomedical Signal Processing,” Expert Systems with Applications, vol. 38, no. 5,
pp. 6190–6201, 2011.
[86] S. Li, Y. Ji, and G. Liu, “Optimal Wavelet Basis Selection of Wavelet Shrinkage
for ECG De-noising,” in International Conference on Management and Service
Science.
[87] A. N. Akansu, P. A. Haddad, R. A. Haddad, and P. R. Haddad, Multiresolution
Signal Decomposition: Transforms, Subbands, and Wavelets. Academic Press,
2001.
[88] A. N. Akansu and M. J. Smith, Subband and Wavelet Transforms: Design and
Applications. Springer Science & Business Media, 2012, vol. 340.
[89] J. J. McKee, N. E. Evans, and D. Wallace, “Sigma-Delta Analogue-to-Digital
Converters for ECG Signal Acquisition,” in Proceedings of the 18th Annual In-
ternational Conference of the IEEE Engineering in Medicine and Biology Society,
Bridging Disciplines for Biomedicine, vol. 1. IEEE, 1996, pp. 19–20.
[90] R. C. Morling, I. Kale, S. Morris, and F. Custode, “DSP Engine for Ultra-
Low-Power Audio Applications [Codec Application],” in Proceedings of the 2003
International Symposium on Circuits and Systems, ISCAS’03, vol. 5. IEEE,
2003, pp. V–357.
[91] Y. Eminaga, A. Coskun, S. A. Moschos, and I. Kale, “Low Complexity All-Pass
Based Polyphase Decimation Filters for ECG Monitoring,” in 11th Conference
on Ph.D. Research in Microelectronics and Electronics (PRIME). IEEE, 2015,
pp. 322–325.
[92] I. Kale, A. Krukowski, and N. Murphy, “On Achieving Micro-dB Ripple Poly-
phase Filters with Binary Scaled Coefficients,” in Second International Sym-
posium on DSP for Communication Systems, Signal Processing Research Insti-
tute Adelaide, Australia. Citeseer, 1994, pp. 26–29.
[93] I. Kale, R. Morling, A. Krukowski, and D. A. Devine, “Architectural Design
Simulation and Silicon Implementation of A Very High Fidelity Decimation Filter
for Sigma-Delta Data Converters,” in IEEE Instrumentation and Measurement
Technology Conference, IMTC/94. IEEE, 1994, pp. 878–881.
[94] I. Kale, R. C. Morling, A. Krukowski, and D. A. Devine, “A High Fidelity Decim-
ation Filter for Sigma-Delta Converters,” in Second International Conference on
Advanced AD and DA Conversion Techniques and their Applications. IET, 1994,
pp. 30–35.
[95] I. Kale and R. C. Morling, “High Resolution Data Conversion via Σ-∆ Modulat-
ors and Polyphase Filters: A Review,” Measurement, vol. 19, no. 3, pp. 159–168,
1996.
[96] I. Kale and R. Morling, “Sigma-Delta Modulation and Polyphase Filtering: The
Key to High Fidelity Data Conversion and Instrumentation,” in International
Workshop on ADC Modeling, 1996, pp. 183–192.
[97] A. Krukowski and I. Kale, “Polyphase IIR Filter Banks for Subband Adaptive
Echo Cancellation Applications,” in Proceedings of the 2003 International Sym-
posium on Circuits and Systems, ISCAS’03, vol. 4. IEEE, 2003, pp. IV–405.
[98] I. Kale, “An Integrated Ultra-Low-Power DSP Engine for Biomedical
Applications-From Algorithm to Custom Silicon Realization,” in Isik 2003 workshop
on New Challenges in Biomedical Engineering, invited keynote lecture, 14-16
July 2003.
[99] S. Tao and A. Rusu, “A Power-Efficient Continuous-Time Incremental Sigma-
Delta ADC for Neural Recording Systems,” IEEE Transactions on Circuits and
Systems I: Regular Papers, vol. 62, no. 6, pp. 1489–1498, 2015.
[100] Y. Yao, K. Zhang, H. Chen, and Y. Cheng, “The Decimator with Multiplier-Free
Realizations for High Precision ADC Applications,” in IEEE 10th International
Conference on ASIC (ASICON). IEEE, 2013, pp. 1–4.
[101] S.-Y. C. Chien, C.-H. Hsieh, M. P.-H. Lin, Q. Fang, and S.-Y. Lee, “Implement-
ation of a Real-time ECG Signal Processor,” in IEEE International Symposium
on Bioelectronics and Bioinformatics (ISBB). IEEE, 2014, pp. 1–4.
[102] S.-C. Lee, T.-J. Chen, and H. Chiueh, “A Multi-Channel Multi-Mode Physiolo-
gical Signals Acquisition and Analysis Platform,” in IEEE International Sym-
posium on Circuits and Systems (ISCAS). IEEE, 2013, pp. 397–400.
[103] E. Greenwald, E. So, Q. Wang, M. Mollazadeh, C. Maier, R. Etienne-Cummings,
G. Cauwenberghs, and N. Thakor, “A Bidirectional Neural Interface IC with
Chopper Stabilized BioADC Array and Charge Balanced Stimulator,” IEEE
Transactions on Biomedical Circuits and Systems, vol. 10, no. 5, pp. 990–1002,
2016.
[104] D. H. Gawali and V. M. Wadhai, “Mixed Signal SoC Based Bio-Sensor Node for
Long Term Health Monitoring,” in 2016 IEEE International WIE Conference
on Electrical and Computer Engineering (WIECON-ECE). IEEE, 2016, pp.
194–198.
[105] G. Gagnon-Turcotte, M. N. N. Khiarak, C. Ethier, Y. De Koninck, and B. Gos-
selin, “A 0.13-µm CMOS SoC for Simultaneous Multichannel Optogenetics and
Neural Recording,” IEEE Journal of Solid-State Circuits, no. 99, pp. 1–14, 2018.
[106] S.-Y. Park, J. Cho, K. Na, and E. Yoon, “Modular 128-Channel Delta Sigma
Analog Front-End Architecture Using Spectrum Equalization Scheme for 1024-
Channel 3-D Neural Recording Microsystems,” IEEE Journal of Solid-State Cir-
cuits, vol. 53, no. 2, pp. 501–514, 2018.
[107] C. Tsou, C.-H. Hsieh, M.-C. Liang, P.-W. Huang, and S.-Y. Lee, “ECG Acquis-
ition System with Heart Rate Detection and Energy Harvesting for Drivers,” in
International Symposium on Bioelectronics and Bioinformatics (ISBB). IEEE,
2015, pp. 31–34.
[108] R. Morling, I. Kale, C. Tsang, S. Morris, G. Hague, and C. Foran, “The Design
of a Sigma-Delta Codec for Mobile Telephone Applications,” in Second Inter-
national Conference on Advanced AD and DA Conversion Techniques and their
Applications. IET, 1994, pp. 11–17.
[109] P. P. Vaidyanathan, “Fundamentals of Multirate Systems,” in Multirate Systems and Filter
Banks. Englewood Cliffs, New Jersey: Prentice-Hall, Professional and Technical
Division, 1993, pp. 100–178.
[110] I. Kale, R. Morling, A. Krukowski, and C.-W. Tsang, “A High-Fidelity Decimator
Chip for the Measurement of Sigma-Delta Modulator Performance,” IEEE
Transactions on Instrumentation and Measurement, vol. 44, no. 5, pp. 933–939,
1995.
[111] S. Cetinsel, R. Morling, and I. Kale, “An FPGA Based Decimation Filter Processor
Design for Real-time Continuous-time Σ-∆ Modulator Performance Measurement
and Evaluation,” in 20th European Conference on Circuit Theory and
Design (ECCTD). IEEE, 2011, pp. 397–400.
[112] A. Krukowski and İ. Kale, DSP System Design: Complexity Reduced IIR Filter
Implementation for Practical Applications. Springer Science & Business Media,
2007.
[113] G. Moody, “The PhysioNet / Computers in Cardiology Challenge 2008: T-Wave
Alternans,” Computers in Cardiology, vol. 35, pp. 505–508, 2008.
[114] A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. Ivanov, R. Mark, J. Mietus,
G. Moody, C. Peng, and H. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet:
Components of a New Research Resource for Complex Physiologic Signals,” Cir-
culation, vol. 101, no. 23, pp. E215–20, 2000.
[115] G. B. Moody and R. G. Mark, “The Impact of the MIT-BIH Arrhythmia Data-
base,” IEEE Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp.
45–50, 2001.
[116] J. R. Buck, M. M. Daniel, and A. C. Singer, Computer Explorations in Signals
and Systems using MATLAB. Prentice-Hall, Inc., 1997.
[117] C. M. Hymel, M. H. Skolnick, R. A. Stubbers, and M. E. Brandt, “Temporally
Advanced Signal Detection: A Review of the Technology and Potential Applica-
tions,” IEEE Circuits and Systems Magazine, vol. 11, no. 3, pp. 10–25, 2011.
[118] L. G. Tereshchenko and M. E. Josephson, “Frequency Content and Characterist-
ics of Ventricular Conduction,” Journal of Electrocardiology, vol. 48, no. 6, pp.
933–937, 2015.
[119] Y. Zou, J. Han, X. Weng, and X. Zeng, “An Ultra-low Power QRS Complex
Detection Algorithm based on Down-sampling Wavelet Transform,” IEEE Signal
Processing Letters, vol. 20, no. 5, pp. 515–518, 2013.
[120] R. Valenzuela and A. Constantinides, “Digital Signal Processing Schemes for
Efficient Interpolation and Decimation,” IEE Proceedings G (Electronic Circuits
and Systems), vol. 130, no. 6, pp. 225–235, 1983.
[121] F. Harris, “Implementing High Performance, Low Computation, IIR Filters with
2-path Recursive All-pass Filters and the Harris-Sharpening Filter,” in 16th In-
ternational Conference on Digital Signal Processing. IEEE, 2009, pp. 1–5.
[122] D. Bull and D. Horrocks, “Primitive Operator Digital Filters,” IEE Proceedings
G (Circuits, Devices and Systems), vol. 138, no. 3, pp. 401–412, 1991.
[123] O. Gustafsson, A. G. Dempster, K. Johansson, M. D. Macleod, and L. Wanham-
mar, “Simplified Design of Constant Coefficient Multipliers,” Circuits, Systems,
and Signal Processing, vol. 25, no. 2, pp. 225–251, 2006.
[124] S. S. Demirsoy, I. Kale, and A. G. Dempster, “Synthesis of Reconfigurable Mul-
tiplier Blocks: Part I-Fundamentals,” in IEEE International Symposium on Cir-
cuits and Systems, ISCAS, 2005. IEEE, 2005, pp. 536–539.
[125] S. S. Demirsoy, I. Kale, and A. Dempster, “Reconfigurable Multiplier Blocks:
Structures, Algorithm and Applications,” Circuits, Systems, and Signal Pro-
cessing, vol. 26, no. 6, pp. 793–827, 2007.
[126] A. G. Dempster and M. D. Macleod, “Constant Integer Multiplication using
Minimum Adders,” IEE Proceedings-Circuits, Devices and Systems, vol. 141,
no. 5, pp. 407–413, 1994.
[127] ——, “Use of Minimum-adder Multiplier Blocks in FIR Digital Filters,” IEEE
Transactions on Circuits and Systems II: Analog and Digital Signal Processing,
vol. 42, no. 9, pp. 569–577, 1995.
[128] O. Gustafsson, “A Difference Based Adder Graph Heuristic for Multiple Constant
Multiplication Problems,” in IEEE International Symposium on Circuits and
Systems, ISCAS 2007. IEEE, 2007, pp. 1097–1100.
[129] R. I. Hartley, “Subexpression Sharing in Filters using Canonic Signed Digit Mul-
tipliers,” IEEE Transactions on Circuits and Systems II: Analog and Digital
Signal Processing, vol. 43, no. 10, pp. 677–688, 1996.
[130] A. Vinod, E.-K. Lai, A. Premkumar, and C. Lau, “FIR Filter Implementation
by Efficient Sharing of Horizontal and Vertical Common Subexpressions,” Elec-
tronics Letters, vol. 39, no. 2, pp. 251–253, 2003.
[131] L. Aksoy, E. Da Costa, P. Flores, and J. Monteiro, “Exact and Approximate
Algorithms for the Optimization of Area and Delay in Multiple Constant Multi-
plications,” IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, vol. 27, no. 6, pp. 1013–1026, 2008.
[132] M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan, “Multiple Constant
Multiplications: Efficient and Versatile Framework and Algorithms for Exploring
Common Subexpression Elimination,” IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 15, no. 2, pp. 151–165, 1996.
[133] R. Pasko, P. Schaumont, V. Derudder, S. Vernalde, and D. Durackova, “A New
Algorithm for Elimination of Common Subexpressions,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 1, pp.
58–68, 1999.
[134] Y. Voronenko and M. Püschel, “Multiplierless Multiple Constant Multiplication,”
ACM Transactions on Algorithms (TALG), vol. 3, no. 2, p. 11, 2007.
[135] C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. Chien, and C.-T. Hsu, “A Novel
Common-Subexpression-Elimination Method for Synthesizing Fixed-Point FIR
Filters,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51,
no. 11, pp. 2215–2221, 2004.
[136] H.-J. Kang and I.-C. Park, “FIR Filter Synthesis Algorithms for Minimizing the
Delay and the Number of Adders,” IEEE Transactions on Circuits and Systems
II: Analog and Digital Signal Processing, vol. 48, no. 8, pp. 770–777, 2001.
[137] O. Gustafsson, H. Ohlsson, and L. Wanhammar, “Improved Multiple Constant
Multiplication Using A Minimum Spanning Tree,” in Conference Record of the
Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004.,
vol. 1. IEEE, 2004, pp. 63–66.
[138] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays.
Springer Science & Business Media, 2007.
[139] R. M. Hewlitt and E. Swartzlander, “Canonical Signed Digit Representation for
FIR Digital Filters,” in IEEE Workshop on Signal Processing Systems, SiPS,
2000. IEEE, 2000, pp. 416–426.
[140] Y. Jang and S. Yang, “Low-power CSD Linear Phase FIR Filter Structure using
Vertical Common Sub-expression,” Electronics Letters, vol. 38, no. 15, pp. 777–
779, 2002.
[141] S.-H. Yoon, J.-W. Chong, and C.-H. Lin, “An Area Optimization Method for
Digital Filter Design,” ETRI Journal, vol. 26, no. 6, pp. 545–554, 2004.
[142] P. Tummeltshammer, J. C. Hoe, and M. Püschel, “Time-Multiplexed Multiple-
Constant Multiplication,” IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 26, no. 9, pp. 1551–1563, 2007.
[143] R. H. Turner and R. F. Woods, “Highly Efficient, Limited Range Multipliers
for LUT-based FPGA Architectures,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 12, no. 10, pp. 1113–1118, 2004.
[144] Xilinx Inc., “Virtex-4 FPGA User Guide,” User Guide UG070,
vol. 1, 2008.
[145] A. Dempster and M. Macleod, “Multiplication by an Integer using Minimum
Adders,” in Mathematical Aspects of Digital Signal Processing, IEE Colloquium
on. IET, 1994, pp. 11–1.
[146] Xilinx, “Series FPGAs configurable logic block,” User Guide, San Jose, CA,
vol. 1, 2016.
[147] B. N. Singh and A. K. Tiwari, “Optimal Selection of Wavelet Basis Function
Applied to ECG Signal Denoising,” Digital Signal Processing, vol. 16, no. 3, pp.
275–287, 2006.
[148] S. Lahmiri, “Comparative Study of ECG Signal Denoising by Wavelet Threshold-
ing in Empirical and Variational Mode Decomposition Domains,” Healthcare
Technology Letters, vol. 1, no. 3, pp. 104–109, 2014.
[149] C. Ye, B. V. Kumar, and M. T. Coimbra, “Heartbeat Classification using Mor-
phological and Dynamic Features of ECG Signals,” IEEE Transactions on Bio-
medical Engineering, vol. 59, no. 10, pp. 2930–2941, 2012.
[150] P. Khandait, N. Bawane, and S. Limaye, “Features Extraction of ECG Signal for
Detection of Cardiac Arrhythmias,” International Journal of Computer Applic-
ations, vol. 2, no. 1, pp. 520–525, 2012.
[151] T. Kalayci and O. Ozdamar, “Wavelet Preprocessing for Automated Neural Net-
work Detection of EEG Spikes,” IEEE Engineering in Medicine and Biology
Magazine, vol. 14, no. 2, pp. 160–166, 1995.
[152] N. K. Al-Qazzaz, S. Hamid Bin Mohd Ali, S. A. Ahmad, M. S. Islam, and
J. Escudero, “Selection of Mother Wavelet Functions for Multi-channel EEG
Signal Analysis during a Working Memory Task,” Sensors, vol. 15, no. 11, pp.
29015–29035, 2015.
[153] G. Inuso, F. La Foresta, N. Mammone, and F. C. Morabito, “Wavelet-ICA
Methodology for Efficient Artifact Removal from Electroencephalographic Re-
cordings,” in International Joint Conference on Neural Networks, 2007. IEEE,
2007, pp. 1524–1529.
[154] B. Najafi, K. Aminian, A. Paraschiv-Ionescu, F. Loew, C. J. Bula, and P. Robert,
“Ambulatory System for Human Motion Analysis using a Kinematic Sensor:
Monitoring of Daily Physical Activity in the Elderly,” IEEE Transactions on
Biomedical Engineering, vol. 50, no. 6, pp. 711–723, 2003.
[155] P. Tummeltshammer, “SPIRAL Project,” 2006. [Online]. Available:
http://spiral.net/hardware/multless.html
[156] Y. Eminaga, A. Coskun, and I. Kale, “Multiplier Free Implementation of 8-tap
Daubechies Wavelet Filters for Biomedical Applications,” in 2017 IEEE New
Generation of CAS (NGCAS). IEEE, 2017, pp. 129–132.
[157] Xilinx Inc., “Multiplier v12.0 LogiCORE IP Product Guide,” PG108, 2015.
[158] ——, “Xilinx Power Estimator (XPE),” 2018. [Online]. Available:
https://www.xilinx.com/products/technology/power/xpe.html
[159] K. A. Wahid, M. A. Islam, and S.-B. Ko, “Lossless Implementation of Daubechies
8-tap Wavelet Transform,” in 2011 IEEE International Symposium on Circuits
and Systems (ISCAS). IEEE, 2011, pp. 2157–2160.
[160] M. M. Hasan and K. A. Wahid, “Low-Cost Lifting Architecture and Lossless
Implementation of Daubechies-8 Wavelets,” IEEE Transactions on Circuits and
Systems I: Regular Papers, 2018.
[161] P. Longa, A. Miri, and M. Bolic, “A Flexible Design of Filterbank Architec-
tures for Discrete Wavelet Transforms,” in IEEE International Conference on
Acoustics, Speech and Signal Processing, 2007, vol. 3. IEEE, 2007, pp. III–1441.
[162] M. A. Islam and K. A. Wahid, “Area- and Power-efficient Design of Daubechies
Wavelet Transforms using Folded AIQ Mapping,” IEEE Transactions on Circuits
and Systems II: Express Briefs, vol. 57, no. 9, pp. 716–720, 2010.
[163] C. Herley and M. Vetterli, “Wavelets and Recursive Filter Banks,” IEEE Trans-
actions on Signal Processing, vol. 41, no. 8, pp. 2536–2556, 1993.
[164] X. Zhang, T. Yoshikawa, and H. Iwakura, “Recursive Orthonormal Wavelet Bases
with Vanishing Moments,” IEICE Transactions on Fundamentals of Electronics,
Communications and Computer Sciences, vol. 80, no. 8, pp. 1472–1477, 1997.
[165] X. Zhang and T. Yoshikawa, “Design of Orthonormal IIR Wavelet Filter Banks
using Allpass Filters,” Signal Processing, vol. 78, no. 1, pp. 91–100, 1999.
[166] X. Zhang, W. Wang, T. Yoshikawa, and Y. Takei, “Design of IIR Orthogonal
Wavelet Filter Banks using Lifting Scheme,” IEEE Transactions on Signal Pro-
cessing, vol. 54, no. 7, pp. 2616–2624, 2006.
[167] E. Y. Remez, “Sur la Détermination des Polynômes d’Approximation de Degré
Donnée,” Comm. Soc. Math. Kharkov, vol. 10, no. 4163, p. 196, 1934.
[168] E. Galijasevic and J. Kliewer, “On the Design of Near-perfect-reconstruction IIR
QMF Banks using FIR Phase-compensation Filters,” in Proceedings of the 2nd
International Symposium on Image and Signal Processing and Analysis, ISPA
2001. IEEE, 2001, pp. 530–534.
[169] S. R. Powell and P. M. Chau, “A Technique for Realizing Linear Phase IIR
Filters,” IEEE Transactions on Signal Processing, vol. 39, no. 11, pp. 2425–2435,
1991.
[170] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing.
Englewood Cliffs, NJ: Prentice-Hall, Inc., 1975, 777 p.
[171] A. Krukowski, R. C. Morling, and I. Kale, “Quantization Effects in the Poly-
phase N-path IIR Structure,” IEEE Transactions on Instrumentation and Meas-
urement, vol. 51, no. 6, pp. 1271–1278, 2002.
[172] J. A. van Alste, W. Van Eck, and O. Herrmann, “ECG Baseline Wander Reduc-
tion using Linear Phase Filters,” Computers and Biomedical Research, vol. 19,
no. 5, pp. 417–427, 1986.
[173] K. Englehart, B. Hudgins et al., “A Robust, Real-time Control Scheme for Mul-
tifunction Myoelectric Control,” IEEE Transactions on Biomedical Engineering,
vol. 50, no. 7, pp. 848–854, 2003.
[174] H. Hong and T. Yonghong, “A Novel Adaptive Wavelet Thresholding with
Identical Correlation Shrinkage Function for ECG Noise Removal,” Chinese
Journal of Electronics, vol. 27, no. 3, pp. 507–513, 2018.
[175] C.-I. Ieong, M. Li, M.-K. Law, P.-I. Mak, M. I. Vai, and R. P. Martins, “A 0.45 V
147-375 nW ECG Compression Processor With Wavelet Shrinkage and Adaptive
Temporal Decimation Architectures,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 25, no. 4, pp. 1307–1319, 2017.
[176] J. Zhu and X. Li, “Electrocardiograph Signal Denoising Based on Sparse Decom-
position,” Healthcare Technology Letters, vol. 4, no. 4, p. 134, 2017.
[177] P. Shemi and E. Shareena, “Analysis of ECG Signal Denoising using Discrete
Wavelet Transform,” in 2016 IEEE International Conference on Engineering and
Technology (ICETECH). IEEE, 2016, pp. 713–718.
[178] J. M. Abdul-Jabbar and R. W. Hmad, “Allpass-based Design, Multiplierless
Realization and Implementation of IIR Wavelet Filter Banks with Approxim-
ate Linear Phase,” in 2011 Fourth International Symposium on Innovation in
Information & Communication Technology (ISIICT). IEEE, 2011, pp. 118–123.
[179] J. Gao, H. Sultan, J. Hu, and W.-W. Tung, “Denoising Nonlinear Time Series
by Adaptive Filtering and Wavelet Shrinkage: A Comparison,” IEEE Signal
Processing Letters, vol. 17, no. 3, pp. 237–240, 2010.
[180] D. L. Donoho, “De-noising by Soft-thresholding,” IEEE Transactions on Inform-
ation Theory, vol. 41, no. 3, pp. 613–627, 1995.
[181] D. L. Donoho and I. M. Johnstone, “Adapting to Unknown Smoothness via
Wavelet Shrinkage,” Journal of the American Statistical Association, vol. 90, no.
432, pp. 1200–1224, 1995.
[182] S. R. Messer, J. Agzarian, and D. Abbott, “Optimal Wavelet Denoising for
Phonocardiograms,” Microelectronics Journal, vol. 32, no. 12, pp. 931–941, 2001.
[183] B. Beliczynski, I. Kale, and G. D. Cain, “Approximation of FIR by IIR Digital
Filters: An Algorithm based on Balanced Model Reduction,” IEEE Transactions
on Signal Processing, vol. 40, no. 3, pp. 532–542, 1992.
[184] Y. Yamamoto, B. D. Anderson, M. Nagahara, and Y. Koyanagi, “Optimizing FIR
Approximation for Discrete-time IIR Filters,” IEEE Signal Processing Letters,
vol. 10, no. 9, pp. 273–276, 2003.
[185] H. Peng, B. Hu, Q. Shi, M. Ratcliffe, Q. Zhao, Y. Qi, and G. Gao, “Removal
of Ocular Artifacts in EEG: An Improved Approach Combining DWT and ANC
for Portable Applications,” IEEE Journal of Biomedical and Health Informatics,
vol. 17, no. 3, pp. 600–607, 2013.
[186] C. Y. Sai, N. Mokhtar, H. Arof, P. Cumming, and M. Iwahashi, “Automated Clas-
sification and Removal of EEG Artifacts With SVM and Wavelet-ICA,” IEEE
Journal of Biomedical and Health Informatics, vol. 22, no. 3, pp. 664–670, 2018.
Appendix A
The truth tables used to design the controllers of the ReMBs designed for the lowpass
($h_0(k)$) and highpass ($h_1(k)$) db4 analysis filters are presented in Tables A.1 and A.2.
S4h0  sA sB   sC
208















































































































S4h1  sA  sB   C
209
Appendix B
The state diagram and the logic design of the 3-bit up/down counter employed in the
design of the ilet3 IIR synthesis filter bank to generate the required addresses for the
dual-port RAM are shown in Figures B.1(a) and B.1(b), respectively, where A, B, and C
are the output bits from MSB to LSB. The ‘DIR’ input determines the direction of the
count: if DIR = 1 the counter counts up, and if DIR = 0 it counts down. The state
diagram and the logic design of the 4-bit up/down counter employed in the design of
the ilet5 IIR synthesis filter bank to generate the required addresses for the dual-port
RAM are shown in Figures B.2(a) and B.2(b), respectively. The truth tables used to
design the controllers of the ReMBs designed for the $R_0(z)$ of ilet3 and the $R_0(z)$
and $R_1(z)$ of ilet5 FIR synthesis filters are presented in Tables B.1 to B.3.
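Figure B.1: (a) State diagram and (b) logic design of the 3-bit up/down counter employed in the
design of ilet3 IIR synthesis filter bank to generate the required addresses for the dual-port RAM.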
Figure B.2: (a) State diagram and (b) logic design of the 4-bit up/down counter employed in the design
of ilet5 IIR synthesis filter bank to generate the required addresses for the dual-port RAM.
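The counting behaviour described in this appendix can be illustrated with a small software model. The sketch below is not the logic design of Figures B.1(b) and B.2(b); it only mirrors the stated behaviour (DIR = 1 counts up, DIR = 0 counts down, outputs read from MSB to LSB). The function names `next_state`, `output_bits` and `run` are hypothetical, and wrap-around at the count limits (modulo 2^width) is an assumption, since the state diagrams are not reproduced here.

```python
def next_state(state: int, direction: int, width: int = 3) -> int:
    """Return the next count of a width-bit up/down counter.

    direction models the DIR input: 1 counts up, 0 counts down.
    Wrap-around modulo 2**width is an assumption of this sketch.
    """
    if direction not in (0, 1):
        raise ValueError("DIR must be 0 or 1")
    step = 1 if direction == 1 else -1
    return (state + step) % (1 << width)


def output_bits(state: int, width: int = 3) -> tuple:
    """Return the counter outputs MSB first, e.g. (A, B, C) for width=3."""
    return tuple((state >> shift) & 1 for shift in range(width - 1, -1, -1))


def run(start: int, directions, width: int = 3) -> list:
    """Apply a sequence of DIR values and return every visited state."""
    states = [start]
    for direction in directions:
        states.append(next_state(states[-1], direction, width))
    return states
```

Calling `run(0, [1] * 8, width=3)` steps the 3-bit model through all eight addresses and back to zero; passing `width=4` gives the corresponding model of the 4-bit counter.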
S20  sA sB   sAC   sBC































































S20  A` C
212
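Table B.3: Truth tables used to design the ilet5 $R_1(z)$ ReMB controller.
[Karnaugh maps over inputs AB and CD; minimized output expressions:]
$\mathrm{S01} = \bar{A}C + \bar{A}\bar{B} + \bar{B}\bar{C}\bar{D}$
$\mathrm{S00} = A\bar{D} + BC$
$\mathrm{S11} = \bar{A} + \bar{C}\bar{D}$
$\mathrm{S10} = \bar{A}C + D$
$\mathrm{S20} = \bar{A}\bar{B} + \bar{C}\bar{D}$
$\mathrm{S41} = \bar{A}\bar{B}\bar{C} + \bar{A}D + CD$
$\mathrm{S40} = \bar{A}\bar{B}CD$
$\mathrm{S5} = \bar{A}D + \bar{C}D$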
Appendix C
The analysis filter bank responses of the 7-level DWT filter banks that are used in
Section 5.7 for ECG signal denoising purposes are shown in Figure C.1, where D1 to D7
represent the filter responses of the highpass branch at levels 1 to 7 and A7 represents
the filter response of the lowpass branch at level 7.
Figure C.1: Seven level analysis filter bank responses of (e) db8, (f) sym4, and (g) coif4.
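Responses such as D1 to D7 and A7 can be computed by collapsing each branch of the analysis tree into one equivalent filter with the noble identities: the level-j highpass branch is H1(z^(2^(j-1))) multiplied by H0(z^(2^i)) for all earlier levels i, and the final lowpass branch is the product of all H0(z^(2^i)). The sketch below illustrates this under stated assumptions: the helper names are hypothetical, and the Haar filter pair is used only as a stand-in, since the db8, sym4 and coif4 coefficients are not listed in this appendix.

```python
import math


def upsample(h, m):
    """Insert m-1 zeros between taps, i.e. map H(z) to H(z**m)."""
    out = [0.0] * ((len(h) - 1) * m + 1)
    out[::m] = h
    return out


def convolve(a, b):
    """Direct-form linear convolution of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def equivalent_filters(h0, h1, levels):
    """Return ([D1, ..., DJ], AJ) equivalent impulse responses.

    h0 and h1 are the lowpass and highpass analysis filters of one stage.
    """
    details = []
    lowpass_chain = [1.0]  # product of H0(z**(2**i)) over earlier levels
    for j in range(levels):
        m = 2 ** j
        details.append(convolve(lowpass_chain, upsample(h1, m)))
        lowpass_chain = convolve(lowpass_chain, upsample(h0, m))
    return details, lowpass_chain


def magnitude(h, omega):
    """Magnitude of the frequency response of h at omega (rad/sample)."""
    re = sum(c * math.cos(omega * n) for n, c in enumerate(h))
    im = -sum(c * math.sin(omega * n) for n, c in enumerate(h))
    return math.hypot(re, im)


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)  # Haar pair, illustrative stand-in only
    details, approx = equivalent_filters([s, s], [s, -s], levels=7)
    grid = [math.pi * k / 512 for k in range(513)]
    for level, d in enumerate(details, start=1):
        peak = max(grid, key=lambda w: magnitude(d, w))
        print(f"D{level}: peak near {peak / math.pi:.3f} * pi rad/sample")
```

Replacing the Haar pair with the coefficients of another wavelet gives the corresponding per-level responses; evaluating `magnitude` over a frequency grid for each branch yields curves of the kind plotted in the figure.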
