ABSTRACT This paper describes a mixed-signal electrocardiogram (ECG) system for personalized and remote cardiac health monitoring. The novelty of this paper is fourfold. First, a low-power analog front end is designed with an efficient automatic gain control mechanism that maintains the input of the ADC at a level yielding optimum SNR, and an enhanced recyclic folded cascode opamp is used as the integrator of the ADC. Second, a novel on-the-fly PQRST boundary detection (BD) methodology is formulated for finding the boundaries in a continuous ECG signal. Third, a novel low-complexity ECG feature extraction architecture is designed by reusing the same module present in the proposed BD methodology. Fourth, the system can reconfigure the proposed low-power ADC between low (8 b) and high (12 b) resolution using a feedback signal obtained from the digital block during processing. The proposed system has been tested and validated on patient data from PTBDB, CSEDB, and the in-house IIT Hyderabad Data Base (IITHDB), achieving an accuracy of 99% on various normal and abnormal ECG signals. The whole system is implemented in 180-nm technology, consuming 9.47 µW (at 1 MHz) and occupying 1.74 mm² of silicon area.
I. INTRODUCTION
Among the non-communicable diseases that scourge the world, Cardiovascular Diseases (CVD) cause millions of deaths every year across the globe [1]. The mortality rate is rising steeply owing to delayed diagnosis and the lack of properly distributed health care facilities and prognosis centers in the vicinity of patients. There is a need for a robust automated device for the early detection of vital abnormal ECG signals in chronic CVD patients. To address these problems, it is necessary to develop a personalized, battery-powered CVD monitoring device that has a very small form factor, so that it is unobtrusive, and that works under the emerging cyber-physical system setup. These medical and technological needs impose many challenges on the development of such a device, viz., a low-power system design that trades off on-board processing against RF communication, a low-complexity analog front-end circuit design, and energy harvesting or a self-powering mechanism to prolong battery life. In this paper, our emphasis is on proposing a low-complexity system with an on-board processing methodology, preceded by a low-power analog module, targeting remote and personalized CVD monitoring. Although the existing analog front ends [2] fulfill the processing requirement, they suffer from high power consumption, which drains the battery quickly. To overcome this, and hence ensure prolonged operation of the monitoring device, each constituent block of the analog front end (AFE) is designed using the gm/ID optimization technique. This technique helps in sizing the individual MOSFETs such that the power consumption and noise levels are within the desired specifications. Further, to ensure reliable operation of the AFE, the performance parameters of the individual blocks are verified across process, voltage, and temperature variations.
On the other hand, the main research thrust so far has been on low-power feature extraction mechanisms [3], [4], where the objective was to obtain the ECG characteristic features (P, Q, R, S, T) with a methodology that could be implemented on a microcontroller, an FPGA, or an ASIC platform with low power consumption. However, these mechanisms [5], [6] assume that the ECG boundary containing all the required features is supplied by some other process or module, and hence boundary detection has not been taken into consideration in the power budgeting and the hardware complexity analysis. A robust boundary detection approach would significantly impact the accuracy of the subsequent feature extraction modules, but at the cost of higher circuit complexity and therefore higher power consumption than that reported in the literature [7], [8]. Hence, in this paper, we propose the following:
• a low-power AFE with an efficient automatic gain control mechanism that maintains the input of the ADC at a level yielding optimum SNR. The enhanced recyclic folded cascode opamp is used to implement the integrators required by the ADC (section II-A).
• a methodology for finding the boundaries of the ECG signal (the starting and ending index of a single heart beat), as shown in Fig. 1(a) (section II-B).
• a novel low-complexity architecture for ECG feature extraction that reuses the same module used in the proposed Boundary Detection (BD) methodology (section II-B). Unlike the state-of-the-art (SoA) architectures [3], the proposed architecture performs both BD and Feature Extraction (FE) using only one Discrete Wavelet Transform (DWT) core, as shown in Fig. 1(b).
The rest of the paper is organized as follows. Section II introduces the proposed methodologies and architecture design, section III presents the experimental results, and section IV concludes the paper. Fig. 1(b) shows the block diagram of the proposed personalized remote CVD monitoring system. As shown in Fig. 1(b), the analog block comprises the Automatic Gain Control (AGC) and a reconfigurable ADC, and receives a continuous resolution-switching feedback signal from the digital block.
II. PROPOSED METHODOLOGY
As shown in Fig. 1(b), the AGC sets the gain of the Programmable Gain Amplifier (PGA) such that the input ECG signal is amplified to the level at which the ADC achieves its maximum SNR. The ADC is activated only when the AGC completes its operation, and it writes the digital samples into a memory of fixed size. The ADC resolution is adjusted on the fly by a feedback signal from the Control Logic Unit (CLU), based on the classification of the ECG signal. The digital block comprises the proposed BD and FE modules and the low-complexity DWT core as the main processing unit (shown in bold in Fig. 1(b)), alongside a CLU, memory, and an intelligent rule engine [9]. The Rule Engine (RE), as seen in Fig. 1(b), decides the abnormality of the signal based on the extracted features [25], [26] and dynamically decides on the trade-off between on-board processing and RF communication (Bluetooth Low Energy/Zigbee/WiFi) to a smartphone, tablet, or cloud under the cyber-physical system framework [9].
A. ULTRA LOW POWER ANALOG FRONT END FOR ACQUISITION AND DIGITIZATION
The prime features of the Analog Front End (AFE), shown in Fig. 2(a), are:
(a) An ultra-low-power two-stage capacitively coupled signal conditioning circuit providing programmable amplification and tunable 2nd-order highpass and lowpass characteristics.
(b) An efficient AGC mechanism maintaining the input of the ADC at a level yielding optimum SNR. In all acquisition schemes reported to date (even those where the gain is controlled via a DSP), the ADC is kept on continuously. The present scheme is designed to consume less power than conventional ones because (i) the ADC is turned on only after the AGC has finished its job, and (ii) it avoids gain control through a DSP, which is very power hungry [12].
(c) A low-power, high-resolution ADC achieving 2nd-order noise shaping while using a single integrator.
(d) Seamless ADC resolution reconfigurability with minimal hardware and almost zero power overhead. This allows the digital circuit following the ADC to reduce its power consumption by opting to process low-resolution data. The proposed ADC implementation enables area- and power-efficient switching between the two modes compared with other ADC architectures, e.g., SAR ADC, pipeline ADC, etc. [13].
1) SIGNAL CONDITIONING STAGE WITH PROGRAMMABLE GAIN AND TUNABLE BANDWIDTH
The proposed low-power AFE provides the required gain and bandpass filter characteristics. Fig. 2(b) shows the schematic of the AFE. The heart of each stage is a fully differential Recyclic Folded Cascode (RFC) OTA adopted from [14]. Reconfigurability is introduced in the AFE through programmable gain and tunable bandwidth, extending its utility to the acquisition of various biopotential (ExG) signals. The voltage gain of the closed-loop amplifier is varied by changing the feedback factor. The highpass cutoff frequency is varied by changing the gate voltage of the pseudo-resistor, while the lowpass cutoff frequency is varied by changing the load capacitance C_L [16]. Further, a T-feedback network is used to reduce the effective feedback capacitance so that the same gain can be achieved with a much smaller capacitance [17]. The frequency response of the AFE, shown in Fig. 3(a), indicates the utility of this system for the acquisition of various ExG signals.

[Figure caption fragment (ADC schematic): The circuit renders 1st-order noise shaping when only the components in black are activated, and 2nd-order noise shaping when the components in dark red are also activated. Here C_S1 = 0.67 pF, C_S2 = 2.02 pF, C_Sfb = 0.79 pF, C_i1 = 4 pF, and V_cm = 0.9 V.]
2) AUTOMATIC GAIN CONTROL
Since the input voltage of the subsequent ADC needs to be maintained at an optimum level, the gain of this stage needs to be controlled. The output level of the AFE is controlled by the AGC stage, shown in Fig. 2(c), which selects the appropriate combination of capacitors from the capacitor bank [18] shown in Fig. 2(a). The AGC comprises a peak detector, a voltage range detector, a fully digital decoder modeled as a Moore machine, and a logic isolation block. The isolation block:
(i) decouples the gain control mechanism from the analog front end.
(ii) forwards the output of the AFE to the ADC, once the input falls in the desired amplitude range.
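The AGC behavior described above can be illustrated with a minimal behavioral sketch. The gain settings, the target window, and the function name below are hypothetical values chosen for illustration; the paper does not list them. The decoder is modeled as a simple state machine that steps the gain code until the detected peak falls inside the window, and only then is the ADC enabled.

```python
# Behavioral sketch of the AGC loop.
# GAINS and the voltage window are assumed values, not taken from the paper.
GAINS = [25, 50, 100, 200, 400]   # hypothetical PGA gains, one per capacitor-bank code
V_LOW, V_HIGH = 0.3, 0.6          # hypothetical target peak window at the ADC input (V)


def agc_settle(input_peak_v, start_code=0, max_steps=16):
    """Step the gain code until the amplified peak lies in [V_LOW, V_HIGH].

    Returns (gain_code, adc_enabled). The ADC is enabled only once the
    peak is inside the window, mirroring the role of the isolation block.
    """
    code = start_code
    for _ in range(max_steps):
        peak = input_peak_v * GAINS[code]            # peak detector output
        if peak < V_LOW and code < len(GAINS) - 1:
            code += 1                                # too small: raise the gain
        elif peak > V_HIGH and code > 0:
            code -= 1                                # too large: lower the gain
        else:
            return code, V_LOW <= peak <= V_HIGH     # settled, or out of codes
    return code, False
```

With these assumed values, a 5 mV input peak settles at the third gain setting and enables the ADC, while an input too small for even the largest gain leaves the ADC disabled.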
3) ADC DESIGN
The ADC digitizes the AFE's output so that it can be taken up by the digital module. A conventional Discrete Time (DT) Cascaded Integrator Feedback (CIFB) ADC modulator with 2nd-order noise shaping is chosen for this work. Because the integrator is the most power-hungry block of the ADC, a two-fold strategy was employed to minimize the ADC power consumption. First, the whole ADC was designed for the minimum possible current while keeping the target SNR intact. Second, the 2nd-order noise shaping was achieved using only a single integrator, which reduces the power consumption to nearly half that of an ADC employing two integrators, as shown in Fig. 2(d) [14]. The integrator is implemented using the enhanced recyclic folded cascode (ERFC) OTA [10], [11]. The ERFC OTA has twice the bandwidth of a conventional folded cascode OTA for the same power and area. The spectral density of the ADC output is plotted in Fig. 3(c). The figure clearly shows that the ADC renders 2nd-order noise shaping and an SFDR of 70 dB.
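To make the noise-shaping principle concrete, the following is a minimal behavioral sketch of a textbook first-order, 1-bit modulator loop. It is not the authors' single-integrator 2nd-order circuit; the function name and the test input are assumptions made for illustration. It shows only that the average of the bitstream tracks the input, which is what the later decimation filter exploits.

```python
def first_order_dsm(u, n_samples):
    """Return a +/-1 bitstream for a constant input u with |u| <= 1.

    Textbook first-order loop: one integrator followed by a 1-bit quantizer,
    with the quantizer output fed back and subtracted from the input.
    """
    state = 0.0
    bits = []
    for _ in range(n_samples):
        v = 1 if state >= 0 else -1   # 1-bit quantizer
        bits.append(v)
        state += u - v                # integrator accumulates input minus feedback
    return bits


bits = first_order_dsm(0.3, 4096)
mean = sum(bits) / len(bits)          # close to the input value 0.3
```

Because the integrator state stays bounded, the error between the bitstream mean and the input shrinks in proportion to the number of samples averaged.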
4) SEAMLESS ADC RESOLUTION RECONFIGURABILITY
The output of the ADC is taken up by the subsequent digital circuit for signal processing such as classification and feature extraction. Since the power consumed by this digital circuit is proportional to the resolution of the data it processes, the digital circuit may reduce its power consumption by reducing that resolution. A control signal from the digital circuit selects the output resolution of the ADC. The proposed ADC is designed to work in two modes controlled by the DSP: (i) the low-resolution (8 bit) mode, and (ii) the high-resolution (12 bit) mode. In the low-resolution mode, the modulator provides 1st-order noise shaping using one integrator, whereas in the high-resolution mode, the modulator provides 2nd-order noise shaping while still using only a single integrator. Since integrators are the most power-hungry circuits in ADCs, the higher resolution is obtained while keeping the power consumption nearly the same.
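As a rough guide to what the two modes imply, the standard ideal-quantizer relation between resolution and signal-to-noise ratio for a full-scale sine input can be evaluated as below. This is the general textbook formula, not a measured result of the proposed ADC, and the function names are chosen here for illustration.

```python
def ideal_snr_db(bits):
    """Ideal SNR in dB of a `bits`-bit quantizer for a full-scale sine input."""
    return 6.02 * bits + 1.76


def enob(snr_db):
    """Effective number of bits corresponding to a given SNR in dB."""
    return (snr_db - 1.76) / 6.02


snr_low = ideal_snr_db(8)     # low-resolution mode
snr_high = ideal_snr_db(12)   # high-resolution mode
```

Under this relation the 8-bit mode corresponds to roughly 50 dB and the 12-bit mode to roughly 74 dB, i.e. each extra bit requires about 6 dB more SNR from the modulator.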
B. PROPOSED ON THE FLY BOUNDARY DETECTION METHODOLOGY
As shown in Fig. 1(b), a Haar-based DWT core is shared by the proposed BD and FE modules. The proposed BD methodology works on R-peak and boundary estimation from an ECG signal. To get the optimum R-peak, the analysis is performed at the third resolution level of the DWT. The DWT has a filter bank structure with cascaded high-pass (h[n]) and low-pass (l[n]) filters [3], and every stage outputs half as many coefficients as there are samples at its input. To keep the computational complexity low in terms of the required mathematical operations, we have selected the Haar wavelet, the simplest wavelet function.
The Haar wavelet removes the noise and isoelectric line wandering of ECG signals, and has been shown to be suitable for health monitoring applications [3].
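The cascaded filter bank can be sketched in a few lines. This is a minimal software illustration of a Haar analysis cascade, not the authors' hardware DWT core; the function names and the pairing convention (sample i with sample i+1) are assumptions. The constant 1/√2 scaling is omitted, since a constant factor does not change the morphology of the signal.

```python
def haar_level(x):
    """One Haar analysis stage without the 1/sqrt(2) scaling.

    Returns (approximation, detail); each has half the input length.
    """
    approx = [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]   # low-pass branch
    detail = [x[i] - x[i + 1] for i in range(0, len(x) - 1, 2)]   # high-pass branch
    return approx, detail


def haar_detail(x, level):
    """Detail coefficients at the given resolution level (level >= 1)."""
    approx = list(x)
    detail = []
    for _ in range(level):
        approx, detail = haar_level(approx)   # feed the low-pass output to the next stage
    return detail
```

For a frame of 4096 samples, `haar_detail(frame, 3)` returns 512 coefficients and `haar_detail(frame, 5)` returns 128, matching the halving at every stage.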
To begin, N ECG samples (ECG_data) are applied as input to the first level of the DWT. Because of the downsampling after every filter stage, resolution level L yields $N/2^L$ coefficients, so the third resolution level yields $N/2^3$ coefficients. With N = 4096 ECG samples, this gives 512 detail coefficients (cD_L3) from the high-pass branch; see line 4 of Algorithm 1. The corresponding equations of the filters are (1) and (2), written here in the standard Haar form:

$$l[n] = \frac{1}{\sqrt{2}}\left(x[2n-1] + x[2n]\right) \qquad (1)$$

$$h[n] = \frac{1}{\sqrt{2}}\left(x[2n-1] - x[2n]\right) \qquad (2)$$

where x is the input to the filter stage and n = 1 to $N/2^L$.
Here L represents the resolution level of the DWT. In the above equations, the factor $1/\sqrt{2}$ can be eliminated because it is a constant multiplication factor for all samples and does not change the morphology of the input ECG signal. When the ECG is sampled at 1 kHz, an R-peak occurs at least once in every 1024 samples. Therefore, to locate the R-peaks, the cD_L3 coefficients are divided into m = N/1024 sub-frames (SF). For N = 4096, the cD_L3 coefficients are divided into 4 sub-frames, as shown in Fig. 4(c), where each sub-frame holds $N/(2^L \times m)$ coefficients, i.e., 128 coefficients for N = 4096. N is chosen as 4096 because 4096 samples contain at least three frames (heart beats), and at least three frames are required to obtain the boundaries of a single heart beat. With the architectural implementation in mind, we have not increased the depth beyond 4096 samples, since a deeper memory would increase the area and power consumption of the design. In Algorithm 1, cD_L3(1 : 128) to cD_L3(385 : 512) are the sub-frames, and the maximum index is obtained from each sub-frame; see line 7. While finding the maximum and minimum pairs, some other pairs in a sub-frame may be missed; these are highlighted in red in Fig. 4(c). The missed pairs can be captured based on two conditions:
1) The amplitude of the missed maximum coefficient should be at least 60% of min_4 (see line 8 of Algorithm 1), which is called the threshold value.
2) The index difference between the missed pair and the preceding and following pairs should be greater than 50.
The threshold is calculated as Th = K% × (min_4), with K = 60. This value is based on a statistical analysis performed on the three databases (PTBDB, CSEDB and IITHDB); a threshold below 60% risks detecting noise. The fixed value of 50 is derived from the number of ECG samples between consecutive R-peaks, scaled to the third level of the DWT.
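The sub-frame search and the threshold computation can be sketched as follows. This is an illustrative reading rather than the authors' Algorithm 1: the function names are hypothetical, and min_4 is assumed here to mean the smallest of the four sub-frame maxima, which the text does not state explicitly.

```python
def subframe_maxima(cd_l3, n_samples=4096, samples_per_frame=1024):
    """Return (global index, value) of the maximum coefficient in each sub-frame."""
    m = n_samples // samples_per_frame       # number of sub-frames (4 for N = 4096)
    size = len(cd_l3) // m                   # coefficients per sub-frame (128)
    maxima = []
    for k in range(m):
        frame = cd_l3[k * size:(k + 1) * size]
        idx = max(range(size), key=lambda i: frame[i])
        maxima.append((k * size + idx, frame[idx]))
    return maxima


def threshold(maxima, k_percent=60):
    """Th = K% of min_4, assuming min_4 is the smallest sub-frame maximum."""
    min_of_maxima = min(value for _, value in maxima)
    return k_percent * min_of_maxima / 100
```

For example, if the four sub-frame maxima are 100, 80, 90 and 120, the threshold under this assumption is 60% of 80, i.e. 48.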
At a 1 kHz sampling frequency, consecutive R-peaks are at least 400 samples apart. Since the analysis is performed at the third resolution level, the R-peak equivalent points in the DWT domain scale down proportionally by a factor of $2^3$, so $400/2^3$ gives the value 50. The value 400 changes if the digital samples arrive at a sampling rate other than 1 kHz. Therefore, successive maxima in the level-3 detail coefficients of the DWT are more than 50 indices apart, as shown in Fig. 4(a). A comparator compares all 512 cD_L3 coefficients with the threshold and outputs '1' if the coefficient is greater than the threshold and '0' otherwise, as shown in Fig. 4(a) and explained in lines 9 to 14 of Algorithm 1. The comparator outputs are stored in a memory (comp_mem) with a depth of 512 and a word length of 1 bit. To find the missed maximum coefficients in cD_L3, the '1's in comp_mem are counted such that the index difference between two successive '1's is greater than 50, as shown in Fig. 4(a) and line 32 of Algorithm 1.

[Algorithm 1: Pseudo code for Boundary Detection. Require: Boundaries of each ECG beat to be calculated.]
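The comparator and the index-gap rule can be sketched as below. This is one possible reading of the selection step, not a transcription of Algorithm 1: a '1' is kept when the next '1' is more than 50 indices away, and the last '1' is always kept. The function names are hypothetical.

```python
MIN_RR_SAMPLES = 400                 # minimum R-R distance at 1 kHz, from the text
LEVEL = 3                            # DWT resolution level used for R-peak analysis
MIN_GAP = MIN_RR_SAMPLES >> LEVEL    # 400 / 2**3 = 50


def comparator(cd_l3, th):
    """1-bit comparator output (comp_mem): 1 where the coefficient exceeds th."""
    return [1 if c > th else 0 for c in cd_l3]


def select_maxima(comp_mem, min_gap=MIN_GAP):
    """Keep a '1' index when the next '1' is more than min_gap indices away."""
    ones = [i for i, bit in enumerate(comp_mem) if bit == 1]
    kept = [a for a, b in zip(ones, ones[1:]) if b - a > min_gap]
    if ones:
        kept.append(ones[-1])        # the final '1' has no successor to compare with
    return kept
```

Adjacent '1's that belong to the same peak are collapsed into a single stored index, so the number of stored indices equals the number of candidate R-peaks in the frame.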
In Fig. 4(a), the difference between index_1 and index_2 is greater than 50, which signifies that index_1 is one of the maximum coefficients in cD_L3; its value is stored in a memory (store_index), see line 32 of Algorithm 1. The other maximum coefficients are calculated similarly and stored in the 'store_index' memory, whose depth varies from 4 to 7; the 'mem_max' depth depends on the number of pairs occurring in the cD_L3 coefficients. For every maximum index there is a minimum index nearby (within ±10 of the maximum) in the cD_L3 coefficients, which gives the minima, as explained in line 40 of Algorithm 1. The value ±10 was chosen from a statistical analysis of various ECG signals. The R-peak is calculated by projecting the maxima and minima onto the ECG memory, as explained in line 40 of Algorithm 1 and shown in Fig. 4(b). The boundaries are calculated by taking the average over R-peaks, as explained in line 54 of Algorithm 1. In real time, the ECG wave may start at any point within P, Q, R, S, T, and the ECG signal may or may not have the initial (B0) and final (B5) boundaries shown in Fig. 4(b). Lines 42 to 53 of Algorithm 1 describe the conditions for the occurrence of the first (B0) and last (B5) boundaries of a continuous ECG signal.
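The boundary computation can be illustrated with a short sketch. It assumes that "taking the average over R-peaks" means placing each boundary at the midpoint of two consecutive R-peak indices; this interpretation, the function names, and the omission of the optional first (B0) and last (B5) boundaries are simplifications made here.

```python
def beat_boundaries(r_peaks):
    """Boundaries as midpoints between consecutive R-peak sample indices."""
    return [(a + b) // 2 for a, b in zip(r_peaks, r_peaks[1:])]


def beats(r_peaks):
    """Return (start, r_peak, end) for every beat enclosed by two boundaries."""
    bounds = beat_boundaries(r_peaks)
    return list(zip(bounds, r_peaks[1:], bounds[1:]))
```

With R-peaks at samples 300, 1100, 1900, 2700 and 3500, the interior boundaries fall at 700, 1500, 2300 and 3100, giving three complete beats.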
The existing feature extraction algorithm [3] applies Maximum Modulus Analysis (MMA) on cD_L3 to get the temporal boundaries, say t1 and t2, as shown in Fig. 4(c). It calculates the R-peak based on the condition of whether t1 < t2 or t1 > t2. In the proposed algorithm, there is no need to check these conditions. Instead, t1 and t2 are left shifted by three bits, x1 = (t1 << 3) and x2 = (t2 << 3), and the R-peak is found as the maximum absolute value in the ECG memory within the range x1 to x2; ignoring the conditions optimizes the hardware. The accuracy of the algorithm is improved at the stage of finding the P/T waves, where the existing algorithm [3] extracts P/T at the fifth level using the QRS_on and QRS_off obtained at the third level. To find the exact P/T values, QRS_on and QRS_off should be divided by $2^2$, since the analysis is done at the fifth level. This improves the accuracy, and low complexity is achieved by discarding the two LSBs instead of using division hardware or a right shifter.
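The index mapping between DWT levels and the sample domain can be sketched as follows. The function names are hypothetical, and the software window is simply ordered with min/max; the point of the sketch is that the selected sample does not depend on whether t1 < t2 or t1 > t2.

```python
def r_peak_index(ecg, t1, t2, level=3):
    """Index of the largest-magnitude ECG sample between the projected t1 and t2."""
    x1, x2 = t1 << level, t2 << level            # level-3 index -> sample index
    lo, hi = min(x1, x2), max(x1, x2)
    hi = min(hi, len(ecg) - 1)                   # stay inside the ECG memory
    return max(range(lo, hi + 1), key=lambda i: abs(ecg[i]))


def level3_to_level5(index_l3):
    """Divide by 2**2 by dropping the two least significant bits."""
    return index_l3 >> 2
```

The same R-peak index is returned for `(t1, t2)` and `(t2, t1)`, and a level-3 index such as 200 maps to level-5 index 50.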
C. LOW COMPLEXITY FEATURE EXTRACTION ARCHITECTURE
The start and end boundaries and the R-peaks obtained from the boundary detection methodology, shown in Fig. 4(b), are given as input to the FE module. The remaining features to be extracted are the QRS complex and the P/T intervals, along with their indices in the main memory (ECG memory), for all the frames. The following explanation of Algorithm 2 is for one frame, and the same applies to all other frames extracted by the BD algorithm. To identify the QRS boundaries QRS_on and QRS_off, we have adopted the concept from [3]. In this methodology, accuracy is achieved along with architectural optimization in extracting the 'Q' and 'S' indices and the P/T wave intervals with their exact indices. Generally, if the R-peak is positive, the 'Q' and 'S' peaks are negative, and vice versa. Most feature extraction algorithms [3] do not account for the polarity of 'Q' and 'S', which costs accuracy, whereas the proposed method finds the 'Q' and 'S' points accurately. Once we get the R-peak index, we check whether the R-peak value is positive or negative by reading the main memory (ECG memory) at that index. To find the 'Q' index, we find the minimum value between QRS_on and the R-peak when the R-peak is positive, and the maximum when the R-peak is negative. The same applies to the 'S' point, but the range is from the R-peak to QRS_off. The optimization in obtaining the P/T waves lies in decreasing the computational complexity by removing unnecessary conditions [3]. Algorithms to date find the maximum and minimum for the P/T peaks based on conditions arising after applying MMA on the cD_L5 coefficients, along with other complex functions [3]. The low complexity of the optimized algorithm is explained with an example.
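The polarity-aware Q and S search can be sketched as below. This is a minimal software illustration with a hypothetical function name; it assumes QRS_on, the R-peak and QRS_off are already expressed as sample indices with QRS_on < R-peak < QRS_off.

```python
def q_and_s(ecg, qrs_on, r_peak, qrs_off):
    """Return (q_index, s_index) for one beat.

    For a positive R-peak, Q and S are the minima on either side of it;
    for a negative R-peak, they are the maxima.
    """
    pick = min if ecg[r_peak] > 0 else max
    q = pick(range(qrs_on, r_peak), key=lambda i: ecg[i])            # QRS_on .. R
    s = pick(range(r_peak + 1, qrs_off + 1), key=lambda i: ecg[i])   # R .. QRS_off
    return q, s
```

Inverting the polarity of the whole beat leaves the returned indices unchanged, which is the behavior the text attributes to the proposed method.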
Let x_1 and x_2 be the maximum and minimum coefficients among the cD_L5 coefficients in the range 1 to (QRS_on >> 2), i.e., QRS_on right shifted by two bits. Then, irrespective of whether the x_1 index is greater than the x_2 index or vice versa, we can go directly to the main memory (4096 samples) and find the maximum absolute value within the range (P_on = x_1 << 5) to (P_off = x_2 << 5) to get the P-peak index; see line 11 of Algorithm 2. This logic avoids the extra hardware required for handling the conditions (whether x_1 > x_2 or x_1 < x_2). In the implementation we have not used any bulky hardware such as multipliers and shifters; instead, we append the corresponding number of zeros to get the same value. The same procedure is followed to find the T peak and the T_on and T_off indices, with the range changing to (QRS_off >> 2) through cD_l5_end (the last sample in cD_L5) for the detail coefficients of the fifth resolution level; see Algorithm 2. The above pseudo code is for one frame and applies to all other frames obtained from the boundary detection algorithm, as shown in Fig. 4(b).
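The P-peak example can be written out as a short sketch. It is an illustrative reading of the text rather than the authors' Algorithm 2: the function name is hypothetical, indices are 0-based, and QRS_on is assumed to be given as a level-3 index of at least 4 so that the level-5 search range is not empty.

```python
def p_peak_index(ecg, cd_l5, qrs_on_l3):
    """P-peak sample index found through the level-5 detail coefficients."""
    end = qrs_on_l3 >> 2                           # level-3 index -> level-5 index
    window = range(0, end)
    x_1 = max(window, key=lambda i: cd_l5[i])      # index of the maximum coefficient
    x_2 = min(window, key=lambda i: cd_l5[i])      # index of the minimum coefficient
    p_a, p_b = x_1 << 5, x_2 << 5                  # level-5 index -> sample index
    lo, hi = min(p_a, p_b), max(p_a, p_b)          # same window whichever comes first
    return max(range(lo, hi + 1), key=lambda i: abs(ecg[i]))
```

Swapping the positions of the maximum and minimum coefficients gives the same sample window and therefore the same P-peak index. The T peak would follow the same pattern with the range starting at `qrs_off_l3 >> 2`.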
III. RESULTS AND DISCUSSIONS
The combined validation of the whole system is performed using the AMS simulator of Cadence Virtuoso. The input ECG is given in pwl format and fed to the analog block. The digital output of the ADC is fed to the digital block, and the input and output waveforms of the entire system are monitored on the AMS simulator for verification.
The ADC resolution switches from low (8 bit) to high (12 bit) only when there is an abnormality in the heart rate count. Initially the ADC outputs 8-bit ECG data, which the digital block processes. If the heart rate count deviates from the normal condition of the patient, the control signal from the digital block switches the ADC resolution to 12 bit to maintain the accuracy of ECG classification during feature extraction. The 8-bit resolution is chosen to reduce the power consumption of the chip during processing; even at 8-bit resolution, the heart rate of the patient can be tracked without losing accuracy in the heartbeat count. The digital module consumes 6.86 µW and 7.47 µW at 1 MHz for 8-bit and 12-bit ECG data, respectively.
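The switching rule can be sketched as a small decision function. The normal heart-rate range used below (60 to 100 beats per minute) is an assumed placeholder, since the paper refers only to the normal condition of the patient; the function names are likewise hypothetical.

```python
def heart_rate_bpm(r_peaks, fs_hz=1000):
    """Mean heart rate from R-peak sample indices at sampling rate fs_hz."""
    rr = [b - a for a, b in zip(r_peaks, r_peaks[1:])]   # R-R intervals in samples
    mean_rr = sum(rr) / len(rr)
    return 60.0 * fs_hz / mean_rr


def select_resolution(hr_bpm, normal_range=(60, 100)):
    """Return 8 (bits) for a normal heart rate and 12 (bits) otherwise."""
    low, high = normal_range                             # assumed per-patient range
    return 8 if low <= hr_bpm <= high else 12
```

R-peaks spaced 800 samples apart at 1 kHz correspond to 75 beats per minute and keep the ADC in the 8-bit mode, whereas a spacing of 400 samples (150 beats per minute) selects the 12-bit mode.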
A. ANALOG FRONTEND VALIDATION
The complete analog module was validated on ECG signals taken from PTBDB [15], CSEDB [21] and IITHDB. The digital output of the ADC (a stream of 1s and 0s) generated by the Spectre simulator in Cadence is exported to Matlab Simulink, where it is passed through a CIC filter to reconstruct the ECG signal. As shown in Fig. 3(b), the reconstructed waveform captures all the essential features of the ECG signal. The performance parameters of the designed AFE are compiled in Table II. Table III summarizes the performance of the designed ADC.
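The reconstruction step can be illustrated with a minimal CIC decimator. The authors performed this step in Matlab Simulink; the Python stand-in below, its function name, the decimation rate and the filter order are assumptions made for illustration only.

```python
def cic_decimate(bits, rate, order=2):
    """Cascaded integrator-comb decimator with differential delay 1.

    `order` integrators run at the input rate, the result is decimated by
    `rate`, and `order` combs run at the output rate. The output is
    normalized by the DC gain rate**order.
    """
    acc = [0] * order
    integrated = []
    for b in bits:
        v = b
        for k in range(order):          # integrator cascade
            acc[k] += v
            v = acc[k]
        integrated.append(v)
    dec = integrated[rate - 1::rate]    # keep every `rate`-th sample
    for _ in range(order):              # comb cascade
        prev = 0
        out = []
        for v in dec:
            out.append(v - prev)
            prev = v
        dec = out
    gain = rate ** order
    return [v / gain for v in dec]
```

After the initial transient of the comb stages, a constant input is reproduced at the output, which is the property that lets the filter recover the slowly varying ECG from the modulator bitstream.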
B. PROPOSED BD AND FE VALIDATION
The proposed methodology was validated on 350 test cases covering different cardiac diseases (myocardial infarction, hypertrophy, sinus arrhythmia, ventricular arrhythmias, etc.) and normal ECG signals taken from PTBDB [15], CSEDB [21] and the in-house IITHDB. These records are standard 12-lead ECGs sampled at 1 kHz. The stability and correctness of the proposed system are evaluated by validating the whole system on the various normal and abnormal ECG signals obtained from the ECG databases [15], [21]. We have implemented the design on three platforms: ASIC, FPGA, and an ARM board. The performance evaluations of BD and FE are shown in Table IV and Table V, respectively. The design is implemented on the ARM Cortex-M3 based microcontroller LPC1768, which has 500 KB of flash memory and 64 KB of SRAM. The design is coded in embedded C and compiled for the LPC1768 using the ARM mbed online compiler. The overall memory consumed by the implementation is 39.1 kB (8%) of flash and 0.7 kB (2%) of RAM. After processing, if the ECG signal is detected as abnormal by the rule engine logic [9], LED1 on the LPC1768 board blinks and the patient's fiducial points and ECG signal samples are simultaneously sent to the doctor's mobile phone via Bluetooth, as shown in Fig. 6. Verilog code is written for the integration of BD and FE, and the Xilinx ISE tool is used to verify the results. The Xilinx built-in memory core (BRAM) is used to store the ECG samples and the DWT coefficients. The BIT file is downloaded onto the Xilinx Virtex-7 FPGA chip using a JTAG cable, and the final results from the FPGA are observed with the ChipScope Pro tool. The design occupies 19% of the slice LUTs present in the Virtex-7 board. The proposed system is implemented in 180 nm technology, consuming 9.47 µW (at 1 MHz) and occupying 1.74 mm² of silicon area.
Table VI compares the proposed work with the state-of-the-art ECG-based cardiac health monitoring architectures. It should be noted that, among the existing architectures (Table VI), [7] focuses only on artifact removal and R-peak detection, not on feature extraction and classification of the ECG. Reference [8] achieves power consumption comparable to that of the proposed system, but its application is limited to ECG acquisition and heart rate monitoring, without detailed feature extraction.
IV. CONCLUSION
This paper presents a mixed-signal system for personalized and remote cardiac health monitoring. We have proposed novel methodologies to reduce the power consumption of the chip. The architecture based on the proposed methodology has been designed, and its performance has been compared with the state-of-the-art designs. Our first contribution is a low-power analog front end with an efficient automatic gain control mechanism that maintains the input of the ADC at a level yielding optimum SNR, together with an enhanced recyclic folded cascode opamp used as the integrator of the ADC. Second, a novel on-the-fly PQRST BD methodology is formulated for finding the boundaries. Our third contribution is a novel low-complexity ECG feature extraction architecture designed by reusing the same module present in the proposed BD methodology. As shown in Section III, the results obtained from the proposed system have medical significance in terms of detecting abnormal ECG waves. Our fourth contribution is the capability to reconfigure the ADC from low (8 bit) to high (12 bit) resolution using a feedback signal from the digital block. We have used ECG data from PTBDB, CSEDB and the in-house IIT Hyderabad DB (IITHDB) to validate the whole system, and obtained an accuracy of 99% on various healthy and unhealthy ECG signals. The whole design occupies an area of 1.74 mm² and consumes 9.47 µW (at 1 MHz) in the 180 nm technology node.
