Efficient Implementation of LMS Adaptive Filter based FECG Extraction on
  an FPGA by Vasudeva, Bhavya et al.
FPGA IMPLEMENTATION OF EFFICIENT SERIES AND PARALLEL ARCHITECTURES
FOR LMS ADAPTIVE FILTER BASED FECG EXTRACTION
Bhavya Vasudeva† Puneesh Deora Pyari Mohan Pradhan Sudeb Dasgupta
Dept. of ECE, Indian Institute of Technology Roorkee, Uttarakhand, India
ABSTRACT
In this paper, the field programmable gate array implementa-
tion of a fetal heart rate monitoring system is presented. A
least mean squares algorithm based adaptive filter (LMS-AF)
is used for the purpose of fetal electrocardiogram (FECG) ex-
traction. Two different architectures, namely series and paral-
lel, are proposed for the LMS-AF, with the series architec-
ture targeting lower utilization of hardware resources, and
the parallel architecture enabling less convergence time and
lower power consumption. The results show that the system effectively
detects the fetal R peaks with a sensitivity of 95.74% to 100% and a
specificity of 100%. The parallel architecture shows up to 85.88%
reduction in the convergence time for the non-invasive FECG database,
while the series architecture shows a 27.41% reduction in the number of
flip-flops used when compared with the existing methods.
Index Terms— Adaptive filter, fetal electrocardiogram
(FECG), fetal heart rate (FHR), floating point (FP), field
programmable gate array (FPGA), least mean squares (LMS)
1. INTRODUCTION
Over the past few decades, analysis of fetal electrocardiogram
(FECG) has proven to be a tool of great importance when it
comes to monitoring the well-being of the fetus during preg-
nancy and labour, unearthing vital information like fetal heart
rate (FHR), heart rate variability, etc. FHR extraction using
FECG recordings serves as a suitable method for mobile, low-cost,
regular, real-time monitoring of the fetus. However, the recorded
ECG signal contains the FECG contaminated with maternal ECG
(MECG), power line interference, muscle noise, and motion artifacts.
Various statistical and time domain techniques [1] have been
exploited to extract the FECG, such as adaptive filtering,
blind source separation, and wavelet transform.
As the adaptive filter is an accurate method for FECG ex-
traction and its computational complexity is relatively low [1],
a least mean squares algorithm based adaptive filter (LMS-
AF) is chosen for this study. The system is implemented on
an FPGA as it is a better prototyping platform for hardware
implementation compared to digital signal processors (DSPs).
This can also serve as a step towards the development of a
low-cost FHR monitoring system as a system on chip.
†corresponding author: bvasudeva@ec.iitr.ac.in
Previous hardware implementations of LMS-AF based
FECG extraction include [2], which was tested only on raw
synthetic signals without any preprocessing, [3], which used
analog preprocessing and FPGA based FECG extraction and
[4], wherein LMS-AF was implemented on a digital signal
controller. Some other methods for FECG extraction [5–9]
have also been implemented on hardware. Some of these
works [3][6][7] reportedly use fixed-point arithmetic, which
leads to lower precision than floating point (FP) arithmetic.
The main contributions of this paper are as follows:
• For fetal R peak detection, a norm to determine the
threshold is proposed to avoid false positive detection.
• A floating point unit (FPU) is developed for the FPGA
implementation to support FP calculations, and hence
improve the precision and accuracy of the system.
• For the implementation of the LMS-AF module, two
different architectures, namely series and parallel, are
proposed. While the former is developed for lower
hardware utilization, the latter is better in terms of
lower latency and power consumption.
2. METHODOLOGY
In order to retain the MECG and FECG components [10] and
attenuate the sources of noise, the signals are preprocessed.
To remove the high frequencies, a fourth order low pass But-
terworth filter is used. The cutoff of the filter is kept at 45 Hz,
so that the ECG components in the signal are retained [10].
In order to suppress the peak at 50 Hz due to power line interference,
a notch filter [11] centered at 50 Hz (quality factor 25) is used.
A two-stage moving average filter is used to obtain an approximation
of the baseline wander (low-frequency noise) present in the signal.
To remove the baseline wander, the output of this filter is
subtracted from the input signal. The operations performed are
summarized below:
M_1[n] = \frac{1}{N_1} \sum_{i=0}^{N_1-1} x[n + i - N_1 + 1]    (1)
M_2[n] = \frac{1}{N_2} \sum_{j=0}^{N_2-1} M_1[n + j - N_2 + 1]    (2)
where x is the input signal, n is the sample index, and M1 and
M2 are the first and second stage means with window sizes
N1 and N2, respectively. Both N1 and N2 are set to 200 in this work.
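As a minimal sketch of (1) and (2), the following Python code evaluates both window means directly from their definitions and subtracts M2 from the input. The function names and the treatment of samples before the start of the record (taken as zero) are assumptions made for illustration and are not taken from the implementation described in this paper.

```python
def causal_mean(x, window):
    """Mean over the current sample and the (window - 1) samples before it.

    Samples before the start of the record are treated as zero (assumption),
    so the first (window - 1) outputs are biased towards zero.
    """
    out = []
    for n in range(len(x)):
        start = max(0, n - window + 1)
        out.append(sum(x[start:n + 1]) / window)
    return out


def remove_baseline(x, n1=200, n2=200):
    """Return (x - M2, M2), with M1 and M2 as in Eqs. (1) and (2)."""
    m1 = causal_mean(x, n1)    # first stage mean, Eq. (1)
    m2 = causal_mean(m1, n2)   # second stage mean, Eq. (2)
    cleaned = [xi - bi for xi, bi in zip(x, m2)]
    return cleaned, m2
```

For a constant input, M2 settles to that constant once both windows are full, so the baseline-corrected output settles to zero.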
arXiv:1910.07496v1 [eess.SP] 14 Oct 2019
In order to separate the FECG from the preprocessed tho-
racic and abdominal ECG signals, LMS-AF [12] is used.
Let \mathbf{x}[n] = [x[n], x[n-1], \ldots, x[n-m+1]]^T represent
the input to the filter, where x[n] is the sample value at instant
n, m is the order of the filter, and (.)^T denotes the transpose
operator. \mathbf{w}[n] = [w_{m-1}[n], w_{m-2}[n], \ldots, w_0[n]] is the
weight vector, where w_{m-k}[n] is the kth weight at sample instant n.
The output of the filter at the nth sampling instant is given by
y[n] = \mathbf{x}^T[n] \mathbf{w}[n]    (3)
The error signal is calculated as
e[n] = d[n] - y[n]    (4)
where d[n] is the desired signal. The weight update is
carried out as follows:
w_k[n+1] = w_k[n] + 2\mu e[n] x[n-m+k+1]    (5)
where µ is the step size and k = 0, 1, ..., m−1. The thoracic
signal is taken as d[n] and the abdominal signal as x[n].
The convergence criterion for the filter weights is satisfied
after around 12 000 samples. In this work, m = 19 and µ = 7 × 10^{-5}.
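The recursion in (3) to (5) can be sketched in a few lines of Python. This is an illustrative software model, not the hardware design: the zero initial weights, the zero padding before the first sample, and the function name are assumptions. Both y[n] and e[n] are returned.

```python
def lms_filter(x, d, m=19, mu=7e-5):
    """Run an m-tap LMS adaptive filter and return (y, e) as lists.

    w[k] multiplies x[n - m + k + 1], matching the indexing of Eq. (5).
    Weights start at zero and samples before n = 0 are zero (assumptions).
    """
    w = [0.0] * m
    y_out, e_out = [], []
    for n in range(len(x)):
        taps = [x[n - m + k + 1] if n - m + k + 1 >= 0 else 0.0
                for k in range(m)]
        y = sum(wk * xk for wk, xk in zip(w, taps))              # Eq. (3)
        e = d[n] - y                                             # Eq. (4)
        w = [wk + 2 * mu * e * xk for wk, xk in zip(w, taps)]    # Eq. (5)
        y_out.append(y)
        e_out.append(e)
    return y_out, e_out
```

With a single tap and d[n] = 0.5 x[n], the weight approaches 0.5 and the error decays towards zero, which is a quick sanity check of the update rule.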
A modified version of the Pan and Tompkins algorithm
[13] is used to detect the fetal R peaks. The output of the
LMS-AF is differentiated, squared, and then passed through
a mean filter of length 40 to obtain the signal sdm. Since the
extracted FECG contains residual maternal R peaks as well
as sharper fetal R peaks, these operations enhance the fetal R
peaks in sdm. In order to determine the threshold th which
can be used to distinguish between the fetal and maternal R
peaks, a new norm is proposed. The mean m1 of sdm is used
as a threshold to determine the local maxima present in it.
The mean m2 of these local maxima is calculated. th is
then set as the mean of m1 and m2. Among the local maxima
already determined, those with amplitude less than th are
discarded. The maximum FHR can be 200 beats per minute
(bpm) [14], which corresponds to an RR interval of 300 samples
(for a sampling frequency of 1 kHz). For the remaining local maxima,
if the immediately following local maximum lies within 200 samples,
the location of the one with the larger amplitude of the
two denotes the fetal R peak.
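The thresholding steps above can be sketched as follows. The definition of a local maximum (greater than the previous sample, not smaller than the next one, and above m1), the reading of "within 200 samples" as a gap of at most 200, and the function name are assumptions made for this illustration.

```python
def detect_fetal_r_peaks(sdm, min_gap=200):
    """Return (peak_locations, th) for the enhanced signal sdm."""
    m1 = sum(sdm) / len(sdm)
    # Local maxima above m1 (the exact definition is an assumption).
    maxima = [i for i in range(1, len(sdm) - 1)
              if sdm[i] > m1 and sdm[i] > sdm[i - 1] and sdm[i] >= sdm[i + 1]]
    if not maxima:
        return [], m1
    m2 = sum(sdm[i] for i in maxima) / len(maxima)
    th = (m1 + m2) / 2                      # proposed threshold
    candidates = [i for i in maxima if sdm[i] >= th]
    peaks = []
    for i in candidates:
        if peaks and i - peaks[-1] <= min_gap:
            # Two candidates too close together: keep the larger one.
            if sdm[i] > sdm[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks, th
```

In this sketch a small residual maternal peak falls below th and is dropped, while a second candidate that follows a fetal peak too closely is merged with it.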
The difference between the consecutive R peaks is the RR
interval. The average of these RR intervals is taken, and di-
vided by the sampling frequency to get the average RR in-
terval length in seconds. The FHR is calculated as follows:
FHR (bpm) = 60/[RR interval length (s)] (6)
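As a worked example of (6), the short Python function below averages the RR intervals between detected peak locations and converts the result to beats per minute. The function name and the default sampling frequency argument are illustrative assumptions.

```python
def fetal_heart_rate(peak_locations, fs=1000.0):
    """Average FHR in bpm from R-peak sample indices, following Eq. (6)."""
    if len(peak_locations) < 2:
        raise ValueError("at least two R peaks are needed")
    # RR intervals in samples between consecutive peaks.
    rr = [b - a for a, b in zip(peak_locations, peak_locations[1:])]
    mean_rr_seconds = (sum(rr) / len(rr)) / fs
    return 60.0 / mean_rr_seconds
```

For peaks spaced 400 samples apart at 1 kHz, the average RR interval is 0.4 s, which gives 150 bpm.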
3. IMPLEMENTATION ON FPGA
For the purpose of FPGA implementation, the proposed sys-
tem is divided into four units as shown in Fig. 1.
3.1. FPU
An FPU is developed for performing arithmetic operations
(addition, subtraction and multiplication) and comparison.
The FP numbers are converted to their 32-bit binary representation
as per the IEEE 754 standard [15]. The sign, exponent and mantissa
are denoted by sa, ea, ma and sb, eb, mb for the inputs A and B,
respectively. sout, eout, and mout denote the sign, exponent and
mantissa of the output. The procedure followed for the FP adder is
listed in Fig. 2(a), where >> denotes the right shift operation.
A similar procedure is followed for the FP subtractor, except that
when the sign bits are the same, subtraction is performed after
comparing the mantissas, and when they are opposite, addition is
performed. For FP multiplication, sout = sa ⊕ sb, eout = ea + eb − bias,
and mout = ma × mb, where ⊕ denotes the bit-wise XOR operation.
For all three operations, when mout is not of the form 1.fout,
mout is repeatedly shifted left by one place and 1 is subtracted
from eout until the first bit of mout becomes 1. The procedure for
FP comparison is listed in Fig. 2(b). cout denotes the three cases,
A > B (cout = 01), A = B (cout = 00), and A < B (cout = 10).
Fig. 1: Block diagram of the FPGA implementation of the system.
Fig. 2: Pseudocodes for FP (a) addition and (b) comparison.
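To make the field-level operations concrete, the following Python sketch splits single-precision values into sign, exponent and fraction and then performs a multiplication and a comparison on those fields. It is a software illustration only: the comparison logic is an assumed reading of the cout encoding, because the pseudocode of Fig. 2 is not reproduced in the text. The multiplier truncates instead of rounding, handles only normalized operands without overflow, and normalizes with a single right shift, since the product of two mantissas in [1, 2) lies in [1, 4).

```python
import struct


def fields(value):
    """Split a number into IEEE 754 single-precision (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fp_multiply(a, b, bias=127):
    """Multiply two normalized singles using sign, exponent and mantissa fields."""
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    s_out = sa ^ sb                               # sout = sa XOR sb
    e_out = ea + eb - bias                        # eout = ea + eb - bias
    m_out = ((1 << 23) | fa) * ((1 << 23) | fb)   # 1.f mantissas, 48-bit product
    if m_out >> 47:                               # product >= 2: renormalize
        m_out >>= 1
        e_out += 1
    frac = (m_out >> 23) & 0x7FFFFF               # truncate to 23 fraction bits
    bits = (s_out << 31) | (e_out << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_compare(a, b):
    """Return cout: '01' if a > b, '00' if a == b, '10' if a < b (no NaN handling)."""
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    if (ea, fa) == (0, 0) and (eb, fb) == (0, 0):
        return "00"                               # +0 and -0 compare equal
    if sa != sb:
        return "10" if sa else "01"               # the negative operand is smaller
    if (ea, fa) == (eb, fb):
        return "00"
    a_larger_magnitude = (ea, fa) > (eb, fb)
    # For two negative numbers, the larger magnitude is the smaller value.
    return "01" if a_larger_magnitude != bool(sa) else "10"
```

For example, `fp_multiply(1.5, 2.5)` yields 3.75 and `fp_compare(-1.5, -2.0)` yields `'01'`.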
3.2. Preprocessing
3.2.1. Butterworth Filter
In this module, the output is obtained as follows [11]:
O[k] = α I[k] + β O[k−1] + γ O[k−2] + δ O[k−3] + ε O[k−4]
where I[k] is the sample value at instant k and O[k] is
the output value. The constants are obtained from the transfer
function of the filter. In this work, α = 0.00308, β = 3.28391,
γ = −4.08689, δ = 2.28117 and ε = −0.48140.
3.2.2. Notch filter
This module works in a similar manner as the previous mod-
ule, following the equation [11]:
O[k] = α I[k] + β I[k−1] + γ I[k−2] + δ O[k−1] + ε O[k−2]
In this case, α = 0.99405, β = −1.31278, γ = 0.99405,
δ = 1.31272 and ε = −0.98804.
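Both preprocessing filters are linear difference equations, so a single helper can model them in software. The sketch below uses the coefficient values quoted above. The helper name, the zero initial state, and the assumption that the last coefficient in each recurrence multiplies the oldest output term are illustrative choices and do not describe the hardware modules.

```python
def difference_equation(samples, feedforward, feedback):
    """O[k] = sum_j feedforward[j] * I[k-j] + sum_j feedback[j] * O[k-1-j].

    The filter state is initialised to zero (assumption).
    """
    inputs = [0.0] * len(feedforward)
    outputs = [0.0] * len(feedback)
    result = []
    for sample in samples:
        inputs = [sample] + inputs[:-1]          # newest input first
        o = (sum(c * v for c, v in zip(feedforward, inputs))
             + sum(c * v for c, v in zip(feedback, outputs)))
        outputs = [o] + outputs[:-1]             # newest output first
        result.append(o)
    return result


def butterworth_lowpass(samples):
    """Fourth order recurrence with the Butterworth constants from the text."""
    return difference_equation(
        samples, [0.00308], [3.28391, -4.08689, 2.28117, -0.48140])


def notch_50hz(samples):
    """Second order recurrence with the notch constants from the text."""
    return difference_equation(
        samples, [0.99405, -1.31278, 0.99405], [1.31272, -0.98804])
```

Chaining the two, for instance `notch_50hz(butterworth_lowpass(signal))`, gives a software counterpart of the first two preprocessing stages.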
3.2.3. Baseline wander removal
Fig. 3(a) shows the structure of the two-stage moving average
filter. As in (1), M1 is the average of N1 values. In every
clock cycle, the input is added to M1 and x[N1 − 1] is subtracted
from M1, both after being multiplied by 1/N1. For
the moving average operation, all the values in Memory 1 are
shifted by one position, so that x[N1 − 1] is discarded and the
new value is stored in x[0]. A similar procedure is followed
for calculating M2 as per (2). M1 is multiplied by 1/N2, stored
in Memory 2 and also added to M2. y[N2 − 1] can then be
directly subtracted from M2 to obtain the second stage mean.
M2 represents the baseline wander approximation. This output
is used to remove the baseline wander from the input by
performing one subtraction operation every clock cycle. The
latency of these three modules is 1 clock cycle.
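The per-cycle update described above, where the scaled new value is added and the scaled oldest value is subtracted, can be modelled in software with a shift register. In this sketch each call to `step` stands for one clock cycle. The class name, the zero-initialised memory, and the choice to store pre-scaled values in both memories are simplifying assumptions.

```python
from collections import deque


class MovingAverageStage:
    """One stage of the moving average, updated once per clock cycle."""

    def __init__(self, window):
        self.inv_n = 1.0 / window
        # Memory of pre-scaled samples, initialised to zero (assumption).
        self.memory = deque([0.0] * window, maxlen=window)
        self.mean = 0.0

    def step(self, sample):
        scaled = sample * self.inv_n
        oldest = self.memory[-1]          # value about to be discarded
        self.memory.appendleft(scaled)    # shift by one, new value at index 0
        self.mean += scaled - oldest      # add the newest, subtract the oldest
        return self.mean


def baseline_removed(samples, n1=200, n2=200):
    """Subtract the two-stage moving average (M2) from each input sample."""
    stage1, stage2 = MovingAverageStage(n1), MovingAverageStage(n2)
    out = []
    for x in samples:
        m2 = stage2.step(stage1.step(x))
        out.append(x - m2)
    return out
```

Each stage needs one multiplication, one addition and one subtraction per sample regardless of the window size, which is what makes this form attractive for a per-cycle update.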
Fig. 3: Illustration of (a) two stage moving average filter, proposed (b) series
and (c) parallel design of LMS-AF. β = 2µ. The abdominal and thoracic
signals have to be scaled appropriately (scaling) before being used.
3.3. FECG Extraction
3.3.1. Series Architecture
In Fig. 3(b), Memory 1 stores the vector xT [n] and an extra
element, and Memory 2 contains the weights of the filter. In
every clock cycle, one element of xT [n] is scaled, multiplied
by one element of w[n] and added to y[n]. The same ele-
ment of xT [n] is also copied to the immediate next position
in xT [n]. Thus, after m clock cycles, y[n] has been obtained
as in (3), and xT [n] has shifted by one index. In the follow-
ing clock cycle, error is calculated using (4), and the updated
value of the first weight of the filter is also obtained. This up-
dated weight value is stored in its position in the next clock
cycle. This sequential process is repeated until all the weights
are updated, which corresponds to m+ 1 clock cycles. After
a total of 2m + 1 clock cycles, a new input value is stored
in x[0] so that xT [n] is updated. The register containing d[n]
also gets updated. The required output for a particular pair of
xT [n] and d[n] is obtained after 2m+ 1 clock cycles.
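The cycle budget of the series architecture can be illustrated with a small software model that performs one multiply-accumulate or one weight update per simulated clock cycle. The scaling block and the shifting of Memory 1 are omitted, and the function name and cycle bookkeeping are assumptions that follow the description above.

```python
def series_lms_step(x_vec, w, d, mu):
    """One LMS iteration with one arithmetic step per simulated clock cycle.

    x_vec[k] is the input element paired with weight w[k].
    Returns (y, e, new_w, cycles).
    """
    m = len(w)
    cycles = 0
    y = 0.0
    for k in range(m):                 # m cycles: accumulate y[n], Eq. (3)
        y += x_vec[k] * w[k]
        cycles += 1
    e = d - y                          # Eq. (4), computed with the first update
    new_w = list(w)
    for k in range(m):                 # one updated weight computed per cycle
        new_w[k] = w[k] + 2 * mu * e * x_vec[k]   # Eq. (5)
        cycles += 1
    cycles += 1                        # last updated weight is written back
    return y, e, new_w, cycles
```

For m = 19 the model reports 2m + 1 = 39 cycles per input sample.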
3.3.2. Parallel Architecture
In Fig. 3(c), the Memory 1 (vector xT [n]) gets updated with
the next input value in every clock cycle. Each element of
xT [n] is scaled, and then multiplied with the elements from
the Memory 2 (vector w[n]). These are added to obtain y[n],
as in (3). The register containing d[n] is updated every clock
cycle, and used to calculate the error, using (4). Since 2µe[n]
is used in every weight update, it is calculated first, and
subsequently multiplied with the values from Memory 1 to
update the weights, using (5). The updated weights are stored
in Memory 2. All operations are performed in 1 clock cycle.
3.4. FHR Detection
3.4.1. Peak Enhancement
In this module, the operations listed in Fig. 4(a) are executed
in every clock cycle. cval and pval denote the current and
previous input values, respectively. sdiff denotes the differ-
entiated and squared signal, M is a memory of size P , and N
denotes the number of inputs.
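A possible software model of this module is sketched below. Since the pseudocode of Fig. 4(a) is not reproduced in the text, the zero initial values of pval and M, the running-sum form of the mean filter, and the function name are assumptions. The variable names follow the symbols defined above.

```python
from collections import deque


def peak_enhance(samples, p=40):
    """Differentiate, square and smooth the input; return (sdm, m1)."""
    memory = deque([0.0] * p, maxlen=p)   # M, zero-initialised (assumption)
    pval = 0.0                            # previous input value (assumption)
    total = 0.0
    sdm = []
    for cval in samples:
        sdiff = (cval - pval) ** 2        # differentiated and squared signal
        total += sdiff - memory[0]        # memory[0] is the oldest entry
        memory.append(sdiff)              # oldest entry is dropped automatically
        pval = cval
        sdm.append(total / p)             # mean filter of length p
    m1 = sum(sdm) / len(sdm)              # mean over the N inputs
    return sdm, m1
```

A step in the input produces a single non-zero squared difference, which the mean filter spreads over p consecutive output samples.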
3.4.2. Detection of Local Maxima
In this module, the local maxima are determined using m1
as the threshold. The operations executed are summarized in Fig.
4(b). in denotes the current input, R1 (R2) is used to store the
input value (location) for the next clock cycle, and R3 (R4)
is used to conditionally store the input value (location). The
locations and values of local maxima are denoted by pl and
pv, respectively. m1 and m2 are initialized to zero.
3.4.3. Fetal R Peak Detection
In the first cycle, the inputs pl and pv are stored in R1 and R2,
respectively. The operations executed are summarized in Fig.
4(c). out denotes the locations of the fetal R peaks detected.
3.4.4. FHR Calculation
The RR intervals are estimated using the differences between
consecutive outputs of the previous module. Two registers
are used for storing the current and previous input. The es-
timated RR intervals are accumulated and averaged out, fol-
lowing which FHR is obtained using (6).
4. RESULTS AND DISCUSSION
To test the system on real signals, the non-invasive FECG
(NiFECG) dataset [16] and the database for identification of
systems (DaISy) [17] are used. The synthetic signals were
simulated using the FECGSYN toolbox [18]. The thoracic and
abdominal signals are shown in Fig. 5. Figs. 6(a) and (d)
show the preprocessed real and synthetic signals. The fre-
quencies between 3 and 35 Hz are retained, while the other
frequencies are suppressed. The peak at 50 Hz is attenuated.
The output of the LMS-AF is shown in Figs. 6(b) and (e).
The signal obtained after peak enhancement (sdm) and the
detected fetal R peaks (fpk), are shown in Figs. 6(c) and (f).
Table 1 lists the quantitative results for the tested datasets.
It is observed that the proposed norm for the determination
of th results in no false positives. Table 2 summarizes the
comparison of the performance of the proposed work with various
FECG extraction methods. This work shows a 1.34% increase in
sensitivity and a 2% increase in accuracy for DaISy, along
with a 1.02% increase in sensitivity and a 7.51% increase in
accuracy when compared to works tested on both NiFECG and DaISy.
Fig. 4: Pseudocodes for (a) Peak Enhancement, (b) Detection of Local Maxima, and (c) Fetal R Peak Detection modules.
Fig. 5: Real (a) thoracic, and (b) abdominal signals. Synthetic (c) thoracic,
and (d) abdominal signals. In the abdominal signals, the higher peaks are
the maternal R peaks, and the fetal R peaks are annotated as fpk. Synthetic
dataset has no units (nu).
Fig. 6: Results of (a) preprocessing, (b) LMS-AF, and (c) peak detection for
real signals, and (d) preprocessing, (e) LMS-AF, and (f) peak detection for
synthetic signals. All values are normalized between 0 and 1.
Table 1
Results obtained for different datasets using the proposed approach.
Dataset                 FHR (bpm)   Sensitivity   Specificity   Accuracy
ecgca444 [16]           152         95.74%        100%          97.37%
ecgca840 [16]           161         96%           100%          97.37%
ecgca746 [16]           147         97.78%        100%          98.53%
ecgca771 [16]           153         100%          100%          100%
DaISy Channel 2 [17]    143         100%          100%          100%
DaISy Channel 3 [17]    143         100%          100%          100%
Synthetic [18]          115         100%          100%          100%
Table 2
Comparison of performance of various FECG extraction methods.
Method                      Dataset             Sensitivity   Accuracy
Tai Le et al. [19]          DaISy               98.68%        98.04%
Gini et al. [20]            DaISy               91%           87.30%
Lima-Herrera et al. [21]    DaISy and NiFECG    97.50%        92.10%
Morales et al. [3]          DaISy and NiFECG    -             89%
Proposed method             DaISy               100%          100%
Proposed method             DaISy and NiFECG    98.5%         99.04%
- Not reported.
The system is implemented on the Xilinx Artix-7 FPGA
(XC7A100TCSG324-1). The baseline wander removal module
consumes 2.691 W of power and utilizes 820 LUTs and 94
FFs. The power per cycle is 89.683 µW. The detection of local
maxima module consumes 0.167 W of power and utilizes 45
LUTs and 34 FFs. The power per cycle is 9.278 µW. All other
modules, except for the LMS-AF module, have minimal resource
(∼0 LUTs and FFs) and power utilization (0.068 W).
For the parallel design, the number of operations in every clock
cycle is greater than in the series design, and hence the resource
utilization is greater. On the other hand, the series architecture
distributes the same number of operations across more clock cycles,
and hence needs more time for convergence and consumes more power.
Table 3 summarizes the comparison between the existing implementations
of various FECG extraction methods on different hardware platforms
and the proposed architectures, after mapping the power consumption
and convergence time to an operating frequency of 50 MHz. The power
per cycle is 7.823 µW for the series and 65.133 µW for the parallel
architecture. As per the latencies, the convergence time for the
former is 39 times the convergence time for the latter. The series
architecture shows a 27.41% reduction in the number of FFs, whereas
the number of LUTs is comparable to the other methods. The parallel
architecture shows up to 85.88% reduction in the convergence time
when compared with the methods [3][6][8] using the NiFECG database.
It has also been reported that the implementation of FP operations
on an FPGA leads to excessive consumption of logic elements [22].
The use of fixed-point numbers would have resulted in lower resource
utilization and power consumption, as the operations involving FP
numbers are computationally intensive [2][8][22]. However, the use
of fixed-point numbers compromises the accuracy of the system.
Table 3
Comparison of hardware implementations of various FECG
extraction methods and the proposed approaches.
Method               Device                      Convergence   Power             LUTs     FFs
                                                 Time (ms)     Consumption (W)
LMS [2]              XC6SLX45-3-CSG394           -             -                 1042     440
LMS [3]              Spartan3E XC3S500E          600           -                 -        -
LMS [4]              dsPIC30F6014A               0.33          1.67*             -        -
OL-JADE [5]          OMAP L137                   948           -                 -        -
Infomax [6]          Stratix-V 5SGXEA7N2F45C2    3.4-54        0.55              -        -
Neural Network [7]   Stratix-II EP2S15F484C3     -             -                 9726     4324
BSS [8]              Spartan-3                   -             -                 3002     405
Proposed Series      Artix-7 XC7A100TCSG324-1    18.72         6.478             2368     294
Proposed Parallel    Artix-7 XC7A100TCSG324-1    0.48          1.954             22 407   640
- Not reported. * The system proposed by Ortega et al. [4] consumes 1 W, for the
current absorption of 200 mA and supply of 5 V, at 30 MHz operating frequency.
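The percentages quoted in this discussion can be reproduced from the entries of Table 3 with simple arithmetic. The pairing of each percentage with a particular reference value (405 FFs for [8] and the 3.4 ms lower bound for [6]) is inferred from the numbers and is an assumption, as are the variable names.

```python
def percent_reduction(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


m = 19                                   # filter order used in this work
cycles_series = 2 * m + 1                # clock cycles per sample, series design
convergence_ratio = 18.72 / 0.48         # series vs parallel convergence time (ms)

ff_reduction = percent_reduction(405, 294)       # FFs: BSS [8] vs proposed series
time_reduction = percent_reduction(3.4, 0.48)    # ms: Infomax [6] vs proposed parallel

print(cycles_series, round(convergence_ratio, 2))
print(round(ff_reduction, 2), round(time_reduction, 2))
```

The ratio of the two convergence times matches the 2m + 1 cycles per sample of the series design, and the two reductions round to 27.41% and 85.88%.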
5. CONCLUSION
In this paper, the FPGA implementation of an FHR monitoring
system is presented. For FECG extraction, an LMS-AF
is used, and series and parallel architectures are designed for
its implementation. The precision and accuracy of the system
are significantly enhanced by the use of an FPU. Comparison
with previous works shows that the parallel architecture requires
the least convergence time among the compared FPGA implementations,
while the series architecture uses the fewest FFs.
6. REFERENCES
[1] M. Hasan, M. Reaz, M. Ibrahimy, M. Hussain, and J. Ud-
din, “Detection and processing techniques of FECG sig-
nal for fetal monitoring”, Biological Procedures Online,
vol. 11, pp. 263-295, 2009.
[2] I. Hatai, I. Chakrabarti, and Swapna Banerjee, “FPGA
implementation of a fetal heart rate measuring system”,
2nd International Conference on Advances in Electrical
Engineering (ICAEE 2013), pp. 160-164, 2013.
[3] D. P. Morales, A. Garcia, E. Castillo, M. A. Carvajal, L.
Parrilla, and A. J. Palma, “An application of reconfig-
urable technologies for non-invasive fetal heart rate ex-
traction”, Med. Eng. Phys., vol. 35, no. 7, pp. 1005-1014,
2013.
[4] R. Arias-Ortega, M. J. Gaitan-Gonzalez, and O. Yanez-
Suarez, “Implementation of a real-time algorithm for ma-
ternal and fetal heart rate monitoring in a digital signal
controller platform”, Annual International Conference of
the IEEE Engineering in Medicine and Biology, Buenos
Aires, pp. 2354-2357, 2010.
[5] D. Pani, A. Dessi, B. Cabras, and L. Raffo, “A real-time
algorithm for tracking of foetal ECG sources obtained by
block-on-line BSS techniques”, Computing in Cardiol-
ogy Krakow, pp. 65-68, 2012.
[6] E. Tortia, D. Koliopoulos, M. Matraxia, G. Danese, and
F. Leporati, “Custom FPGA processing for real-time fe-
tal ECG extraction and identification”, Computers in Bi-
ology and Medicine, vol. 80, pp. 30-38, 2017.
[7] M. A. Hasan, M. I. Ibrahimy, M. Reaz, J. Uddin, and
M. S. Hussain, “VHDL modeling of FECG extraction
from the composite abdominal ECG using artificial in-
telligence”, Proceedings of the IEEE International Con-
ference on Industrial Technology, ICIT, pp. 1-5, 2009.
[8] C. Chareonsak, F. Sana, Y. Wei, and X. Bing, “Design of
FPGA hardware for a real-time blind source separation
of fetal ECG signals”, IEEE International Workshop on
Biomedical Circuits and Systems, vol. S2, no. 4, pp. 13-
16, 2004.
[9] M. I. Ibrahimy, M. B. I. Reaz, F. M. Yasin, T. H. Khoon,
and A. F. Ismail, “Fetal QRS complex detection algorithm
for FPGA implementation”, IEEE International Confer-
ence on Computational Intelligence for Modelling, Con-
trol and Automation, and International Conference on
Intelligent Agents, Web Technologies and Internet Com-
merce (CIMCA-IAWTIC’06), Vienna, pp. 846-850, 2005.
[10] R. Sameni and G. D. Clifford, “A review of fetal ECG
signal processing issues and promising directions”, The
open pacing, electrophysiology & therapy journal, pp.
S3/4-20, 2010.
[11] L. O. Chua, C. A. Desoer, and E. S. Kuh, Linear and
nonlinear circuits, McGraw-Hill, New York, 1987.
[12] B. Widrow and S. D. Stearns, Adaptive signal process-
ing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
[13] J. Pan and W. J. Tompkins, “A real-time QRS detection
algorithm”, IEEE Transactions on Biomedical Engineer-
ing, vol. 32, pp. 230-236, 1985.
[14] S. P. von Steinburg, A. L. Boulesteix, and C. Lederer, et
al., “What is the “normal” fetal heart rate?”, PeerJ, vol.
1, no. 82, 2013.
[15] “IEEE standard for binary floating-point arithmetic”,
1985 (ANSI/IEEE Std 754-1985).
[16] A. L. Goldberger, L. A. N. Amaral, and L. Glass, et
al., “PhysioBank, PhysioToolkit, and PhysioNet: Com-
ponents of a new research resource for complex physio-
logic signals”, Circulation Electronic Pages, vol. 101, no.
23, pp. e215-e220, 2000.
[17] De Moor B.L.R. (ed.), “DaISy: Database for the Iden-
tification of Systems”, Department of Electrical En-
gineering, ESAT/SISTA, K.U.Leuven, Belgium, URL:
http://www.esat.kuleuven.ac.be/sista/daisy/, May 2019.
[18] J. Behar, F. Andreotti, S. Zaunseder, Q. Li, J. Oster, and
G. D. Clifford, “An ECG model for simulating maternal-
foetal activity mixtures on abdominal ECG recordings”,
Physiological Measurement, vol. 35, no. 8, pp. 1537-
1550, 2014.
[19] T. Le, A. Moravec, M. Huerta, M. P. H. Lau, and H.
Cao, “Unobtrusive Continuous Monitoring of Fetal Car-
diac Electrophysiology in the Home Setting”, IEEE Sen-
sors, New Delhi, pp. 1-4, 2018.
[20] J. R. Gini, K. I. Ramachandran, R. H. Nair and P. Anand,
“Portable fetal ECG extractor from abdECG”, Interna-
tional Conference on Communication and Signal Pro-
cessing (ICCSP), Melmaruvathur, pp. 0845-0848, 2016.
[21] S. L. Lima-Herrera, C. Alvarado-Serrano, and P. R.
Hernández-Rodríguez, “Fetal ECG extraction based on
adaptive filters and Wavelet Transform: Validation
and application in fetal heart rate variability analysis”,
13th International Conference on Electrical Engineering,
Computing Science and Automatic Control (CCE), Mex-
ico City, pp. 1-6, 2016.
[22] S. Sahin, A. Kavak, Y. Becerikli, H. Engin Demiray,
“Implementation of floating point arithmetics using an
FPGA”, Mathematical Methods in Engineering, Springer
Netherlands, Dordrecht, pp. 445-453, 2007.
