ABSTRACT Electrocardiogram monitoring is crucial for the prevention and treatment of cardiovascular diseases. In this paper, we developed a wearable device involving multiple leads that monitors cardiac electrical activity using very large-scale integration technology to facilitate long-term activity monitoring. In addition to using limb leads and augmented limb leads for monitoring heart activity, six unipolar chest leads were attached to six positive electrodes and placed on the surface of the chest to record electrical activity of different regions of the heart. The multiple leads lie perpendicular to the frontal plane and were integrated within a single circuit to reduce hardware costs and size of the device. The proposed architecture also allows users to switch between lossless and lossy compression to control the power consumption and bit compression ratio. The effectiveness of the proposed approach was verified by fabricating a chip using 0.18-µm CMOS technology. The proposed architecture core has an operating frequency of 41 MHz and a gate count of 4K.
I. INTRODUCTION
Personal heart rate monitoring devices have been developed to facilitate the monitoring of potential health risks associated with cardiovascular diseases. The body sensor system is a special class of wireless sensor network and comprises a variety of miniature biosensors. These biosensors are employed in the body to continuously monitor biomedical signals. In addition to the limb leads and augmented limb leads that are used to monitor the electrical activity of the heart, six unipolar chest leads are attached to six positive electrodes and placed on the surface of the chest to record the electrical activity of different regions of the heart [1] . The leads lie perpendicular to the frontal plane. However, a compression algorithm is necessary to manage the signals from these additional leads. Power consumption, hardware costs, and device size are crucial parameters that must be reduced, and thus, some designs that use high-quality reconstructed signals with very-large-scale integration (VLSI) technology have been developed to address these particular issues. A lossless compression algorithm based on fuzzy decision control and hybrid entropy-coding techniques was proposed in [2] . In [3] , a joint QRS detection and data compression processor was developed. The QRS detector and lossless data compressor shared hardware resources to lower the overall power consumption of the system. A low-complexity lossless compression algorithm based on adaptive trending prediction with two-stage Huffman coding was proposed in [4] . The quality of the reconstructed signal was dependent on the bit compression ratio (BCR). A low BCR is required to balance the compression ratio (CR) and the data recovery performance. A number of compression algorithms based on compressed sensing (CS) have been developed to manage sparse signals, such as electrocardiogram (ECG) signals [5] - [7] . 
A low-complexity CS technique with prior probability of ECG sparsity in the wavelet domain and a variable orthogonal multiple matching pursuit (vOMMP) algorithm with two phases were proposed in [5]. A block sparse Bayesian learning framework was proposed in [6] to compress and reconstruct non-sparse fetal ECG recordings and to reduce the required code execution speed in the data compression stage, and another approach that combined near-precise compressed (NPC) and CS algorithms was developed to improve the quality of signal reconstruction [7]. Most of these methods are effective for overcoming hardware-related problems, namely high power consumption, high hardware costs, and large device size. However, each method supports only lossless or only lossy compression. In this study, we developed a wearable device involving multiple leads that monitors cardiac electrical activity using VLSI technology to facilitate long-term activity monitoring. The proposed compression architecture is integrated within a single circuit to reduce hardware costs and device size. It allows users to switch between lossless and lossy compression so that the BCR and power consumption can be controlled. Using a separate circuit for each lead increases the required hardware resources, device size, and power consumption. Therefore, we developed a low-cost compression architecture in which six leads are compressed using a single circuit. The simulation results in [7] revealed that some regions of an ECG signal are suitable for lossy compression, whereas other regions require lossless compression to achieve a high BCR and high signal-to-noise ratio (SNR). The proposed architecture integrates a lossless algorithm and a lossy algorithm to control the BCR and power consumption.
The paper is organized in the following manner. In section II, the theory underlying the proposed algorithm is presented. Simulation results are presented in section III. A discussion of the proposed device is presented in section IV, and conclusions are summarized in section V.
II. PROPOSED ARCHITECTURE
As presented in Fig. 1, a compression algorithm was developed that integrates lossless and lossy compression for processing signals obtained from multiple ECG leads. Three two-stage flip-flops were used to divide the input data into the following units: $\{x_{1,n}, x_{2,n}, x_{3,n}\}$ and $\{x_{4,n}, x_{5,n}, x_{6,n}\}$. A BCR adjustment module adjusts the BCR, and a mode selection module manages data allocation to the corresponding compression mode. A sample insertion counter controls a multiplexer and performs data switching to resolve error propagation and allow small-scale adjustment of the BCR. One of four data compression modes is selected based on the data allocation of the mode selection module. In the first step, the six leads are divided into two groups as follows:

$$G^1_{m,n} = [x_{1,n}, x_{2,n}, x_{3,n}]^T, \quad G^2_{m,n} = [x_{4,n}, x_{5,n}, x_{6,n}]^T \tag{1}$$
where $G^k_{m,n}$ indicates the n-th sample of the m-th lead in group k. After the data were divided into two groups, the samples in group 1 were stored in the first-stage D flip-flop. In the next step, the samples of group 1 were fed into the second-stage D flip-flop, and the samples in group 2 were stored in the first-stage D flip-flop. The difference between the present sample and the previous sample of each lead in each group, $z^k_{m,n}$, can then be calculated as follows:

$$z^k_{m,n} = G^k_{m,n} - G^k_{m,n-1} \tag{2}$$

After the difference between the present sample and the previous sample of each lead has been calculated, the difference between each of the two mapping leads ($m \neq 1$) and the base lead ($m = 1$) can also be calculated as follows:
The BCR adjustment module was used to control the compression mode using the BCR mode signals, as presented in Table 1. If a value obtained from the leads was less than or equal to the threshold value $V_{Th}$, then the corresponding output $\mathrm{mod}^k_{j,n}$ was set to zero.
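The pre-compression steps described above can be sketched in Python for one group of three leads. This is a minimal illustration rather than the chip's datapath: the function name `precompress` is hypothetical, the mapping leads are assumed to be expressed relative to the base lead by simple subtraction, and the threshold is assumed to apply to the magnitude of each value.

```python
def precompress(prev, curr, v_th):
    """prev, curr: the three samples of one lead group at times n-1 and n."""
    # Temporal difference of each lead: present sample minus previous sample.
    z = [c - p for c, p in zip(curr, prev)]
    # Assumption: mapping leads (m = 2, 3) are taken relative to the base lead (m = 1).
    zd = [z[0], z[1] - z[0], z[2] - z[0]]
    # Assumption: values whose magnitude does not exceed v_th are forced to zero.
    return [0 if abs(v) <= v_th else v for v in zd]


lossless = precompress([100, 200, 300], [150, 260, 300], v_th=0)
lossy = precompress([100, 200, 300], [150, 260, 300], v_th=20)
```

With `v_th=0` only values that are already zero are affected, which corresponds to lossless operation; a larger threshold zeroes small values and makes the step lossy.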
Algorithm 1 presents the data calculation performed before data compression. The mapping results from the BCR module, $[\mathrm{mod}^k_{1,n}, \mathrm{mod}^k_{2,n}, \mathrm{mod}^k_{3,n}]$, were fed into the mode selection module, where the data are managed and allocated to the corresponding compression mode. One of four data compression modes was selected based on the data allocation performed by the mode selection module. The compression mode is encoded in the first two most significant bits (MSBs) of the output data, known as the head, so that the mode can be identified during data decompression. The output data differ according to the compression mode used, and the processing conducted in the four compression modes can be described as follows: (Table 2), and only two types of input data are stored and compressed in mode 3. For example, if $\mathrm{mod}^k_{1,n} = \mathrm{mod}^k_{2,n}$, then the data format is [0 0], and only the $\mathrm{mod}^k_{1,n}$ and $\mathrm{mod}^k_{3,n}$ data are compressed. The bit length frame indicates the number of least significant bits (LSBs) needed to represent the data completely [3]. For example, bit length = [0 0 0 0] indicates that 2 bits are required, and bit length = [0 0 0 1] indicates that 3 bits are required. During data compression in mode 3, the necessary bit lengths of the two data items are calculated individually. If the necessary bit lengths differ, the longest bit length is selected, and the residual MSBs of the two data items are removed. Fig. 2 presents a data compression example conducted in mode 3. In this example, data 1 requires a bit length of 5 and data 2 requires a bit length of 4. The longest necessary bit length is selected, and the LSBs of both data items are retained according to that length. This enables the removal of any residual MSBs that carry no information from the two data streams.
Subsequently, MSB bit extension is applied during the recovery process.
Mode 4: If $\mathrm{mod}^k_{1,n} \neq \mathrm{mod}^k_{2,n} \neq \mathrm{mod}^k_{3,n}$, then the head = [1 1], and the output bit stream = [head, bit length, compressed data]. For mode 4, the three necessary bit lengths of the data items are calculated individually. If the necessary bit lengths differ, the longest bit length is selected, and the residual MSBs of the three data items are removed. Fig. 3 presents a data compression example conducted in mode 4. In this example, data 1 (necessary bit length of 5), data 2 (necessary bit length of 4), and data 3 (necessary bit length of 6) are each represented completely within the longest necessary bit length, and any residual MSBs that carry no information are removed. Algorithm 2 presents the compression performed in the multiple modes of the proposed architecture.
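The bit-length selection and MSB removal used in modes 3 and 4 can be illustrated with a short Python sketch. It is written under stated assumptions and does not reproduce the hardware: the data are treated as signed two's-complement integers, the 4-bit length code is assumed to be the bit length minus 2 (extrapolated from the two examples given above), and all function names are hypothetical.

```python
def needed_bits(v):
    """Smallest two's-complement width (minimum 2 bits) that can hold v."""
    n = 2
    while not -(1 << (n - 1)) <= v < (1 << (n - 1)):
        n += 1
    return n


def pack(values):
    """Keep the same number of LSBs for every value: the longest needed width."""
    width = max(needed_bits(v) for v in values)
    mask = (1 << width) - 1
    return width, [v & mask for v in values]


def unpack(width, fields):
    """Recover the signed values by extending the MSB of each field."""
    sign = 1 << (width - 1)
    return [(f ^ sign) - sign for f in fields]


def length_field(width):
    """Assumed 4-bit code: 0000 means 2 bits, 0001 means 3 bits, and so on."""
    return format(width - 2, "04b")


# Three values whose necessary bit lengths are 5, 4 and 6, as in the mode 4 example.
width, fields = pack([-9, 5, 20])
restored = unpack(width, fields)
```

Here all three values are stored in 6 bits, the longest necessary length, and `unpack` restores them exactly, which mirrors the MSB extension applied during recovery.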
Algorithm 1 Proposed Architecture Before Compression
Input: group data $G^1_{m,n} = [x_{1,n}, x_{2,n}, x_{3,n}]^T$ and $G^2_{m,n} = [x_{4,n}, x_{5,n}, x_{6,n}]^T$, where m is the row of vectors $G^k_{m,n}$ and $x_{h,n}$ is the n-th sample of input h; BCR mode; inserting count
000: $V_{Th}$ = 0
001: $V_{Th}$ = 0.0009
010: $V_{Th}$ = 0.002
011: $V_{Th}$ = 0.004
100: $V_{Th}$ = 0.008
101: $V_{Th}$ = 0.016
if (zd

Algorithm 2
$y_{C1}$ = $B_l$ LSB bits of ($S_1$), $y_{C2}$ = $B_l$ LSB bits of ($S_2$), $y_{C3}$ = $B_l$ LSB bits of ($S_3$),
$y_m$ = [head, $B_l$, $y_{C1}$, $y_{C2}$, $y_{C3}$]
Output data: $y_m$

Based on the simulation results presented in Table 3, the BCR can be changed using the BCR adjustment module in the lossy mode. However, changing the BCR caused an error propagation problem in the recovery process. To solve the error propagation problem and enable small-scale adjustment of the BCR, a sample insertion counter was incorporated: when the counter reached the set counting value listed in Table 3, the original samples were fed into mode 4 for compression. Power consumption can also be controlled by adjusting the BCR and the sample insertion counter. Section IV lists the power consumed in each of the modes.
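The role of the sample insertion counter can be illustrated with a simplified single-lead model in Python. This sketch is an assumption made for illustration, not the circuit itself: differences are taken between consecutive original samples, small differences are zeroed, and the decoder accumulates what it receives, so the reconstruction drifts until an original sample is inserted. The function name `reconstruct` and the parameter `insert_every` are hypothetical.

```python
def reconstruct(x, v_th, insert_every):
    """Return the decoder-side reconstruction of the sample list x."""
    recon = [x[0]]  # the first sample is sent unchanged
    for n in range(1, len(x)):
        if n % insert_every == 0:
            # Counter reached its set value: the original sample is inserted,
            # which clears any error accumulated so far.
            recon.append(x[n])
        else:
            d = x[n] - x[n - 1]
            if abs(d) <= v_th:
                d = 0  # lossy step: small differences are dropped
            recon.append(recon[-1] + d)
    return recon


ramp = list(range(9))  # slow ramp whose steps all fall below the threshold
with_insertion = reconstruct(ramp, v_th=1, insert_every=4)
without_insertion = reconstruct(ramp, v_th=1, insert_every=100)
```

In this toy case the reconstruction without insertion stays at zero and drifts away from the ramp, whereas inserting an original sample every four samples bounds the drift. A longer interval therefore trades recovery quality for fewer fully coded samples.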
The three crucial features of the proposed architecture are listed below:
1. The hardware cost and size can be reduced when the data from the six ECG leads are compressed using the proposed approach.
2. The proposed method enhances flexibility by enabling adjustment of the BCR in the lossy mode.
3. Power consumption can be controlled by adjusting the BCR.
III. SIMULATION RESULT
We evaluated the performance of the proposed method using the data obtained from the six leads from the MIT-BIH Arrhythmia Database.

$$\mathrm{SNR\ (dB)} = 20 \log_{10}\left(\frac{\|x\|_2}{\|x - \hat{x}\|_2}\right)$$

where $x$ and $\hat{x}$ are the original and recovered signals, respectively, and $\|\cdot\|_2$ is the 2-norm of the vector. The BCR can be calculated as follows:
where $B_o$ and $B_c$ refer to the bit numbers of uncompressed and compressed data, respectively. fed into mode 4 for compression. When $V_{Th}$ was low, most of the results retained their original values, and few values were changed to zero. The loss in the reconstructed data was small when most data retained their original values. Data recovery performance was better when the SNR values were high and the PRD values were low. When $V_{Th}$ was high, numerous values were changed to zero (i.e., the registers were overwritten by zero). The loss in the reconstructed data increased with $V_{Th}$, which appeared as a higher PRD and a lower SNR. Data recovery performance was also affected by the number of inserted samples. Increasing this number increased the error propagation, resulting in a low SNR and a high PRD. As presented in Figs. 4 and 5, the quality of the reconstructed signal was affected by $V_{Th}$ and the number of inserted samples. However, this does not imply that a high $V_{Th}$ was futile. Fig. 6 presents the BCR simulation results for different numbers of inserted samples and various $V_{Th}$ values. An increase in $V_{Th}$ implies that more values were set to zero, such that more data were fed into mode 1 for compression. In the lossy mode, a high $V_{Th}$ yielded a high BCR; that is, the compression efficiency was higher with a high $V_{Th}$ than with a low one. Thus, a low $V_{Th}$ provided superior data recovery performance but a low BCR, and a high $V_{Th}$ provided a high BCR but a low SNR. These features make the proposed architecture highly flexible because the compression algorithm can be switched between lossless and lossy compression based on the required BCR.

We also investigated the difference between the original and recovered signals at various $V_{Th}$ values. Fig. 7 presents the waveforms of the original and recovered signals when 20 samples were inserted at various $V_{Th}$ values. As presented in Fig. 7, the recovered signal was similar to the original when $V_{Th}$ was low ($V_{Th} = V_{Th1}$ and $V_{Th} = V_{Th2}$). When $V_{Th}$ was increased to $V_{Th3}$, a slight distortion was observed near the T wave. When $V_{Th}$ was increased to $V_{Th4}$ or $V_{Th5}$, obvious distortion was observed with evident deviation from the original signal. Most of the deviations were observed around the P or T wave region of the ECG signal. To clarify the deviation between the original and recovered signals at various $V_{Th}$ values, the squared error between the original and recovered signals was calculated as follows, as shown in Fig. 8:

$$\text{Squared error} = (x - \hat{x})^2$$
where $x$ and $\hat{x}$ indicate the original and recovered signals, respectively.
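The two recovery metrics used in this section can be computed as in the following Python sketch. It assumes the usual ratio form of the SNR, that is, the 2-norm of the original signal divided by the 2-norm of the reconstruction error, and a per-sample squared error; the function names are hypothetical.

```python
import math


def snr_db(x, x_hat):
    """SNR in dB: 20*log10 of ||x|| over ||x - x_hat|| (2-norms)."""
    signal = math.sqrt(sum(a * a for a in x))
    error = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    # A perfect (lossless) reconstruction has no error, so the SNR is unbounded.
    return math.inf if error == 0 else 20 * math.log10(signal / error)


def squared_error(x, x_hat):
    """Per-sample squared error between the original and recovered signals."""
    return [(a - b) ** 2 for a, b in zip(x, x_hat)]


original = [3.0, 4.0]
recovered = [3.0, 3.0]
snr = snr_db(original, recovered)          # 20*log10(5/1)
errors = squared_error(original, recovered)
```

Plotting `squared_error` over time is one way to see where in the waveform the loss concentrates, whereas `snr_db` summarizes the whole record in a single figure.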
As presented in Fig. 8, the maximum squared errors for $V_{Th1}$ and $V_{Th2}$ were approximately $1 \times 10^{-5}$ and $2 \times 10^{-4}$, respectively. Increasing $V_{Th}$ to $V_{Th3}$ raised the maximum squared error to $0.9 \times 10^{-3}$. Increasing $V_{Th}$ to $V_{Th4}$ and $V_{Th5}$ raised the maximum squared errors to 0.008 and 0.02, respectively. The squared error increased with $V_{Th}$ near the P or T wave region; however, it remained low in the flat region of the signal even under a high $V_{Th}$. As presented in Fig. 9, a low $V_{Th}$ and a low number of inserted samples provided the best recovery performance but a low BCR. By contrast, a high $V_{Th}$ and a high number of inserted samples provided a good BCR but the worst recovery performance. This implies that the BCR can be maximized while preserving data recovery performance by adjusting $V_{Th}$ and the number of inserted samples according to the signal region. A high $V_{Th}$ and a high number of inserted samples can be selected to achieve a high BCR and low power consumption for long-term monitoring. If a region of the ECG that is important to a physician's diagnosis requires a high-resolution signal, a low $V_{Th}$ and a low number of inserted samples (or the lossless mode) can be selected to achieve high recovery performance. In terms of both performance and data compression, the BCR and recovery performance can be adjusted through $V_{Th}$ and the number of inserted samples according to different requirements.
IV. VLSI IMPLEMENTATION AND COMPARISON

A. VLSI IMPLEMENTATION
The proposed architecture core was implemented at the register-transfer level (RTL) using 0.18-µm CMOS technology (Taiwan Semiconductor Manufacturing Company, Taiwan). Following RTL synthesis (Synopsys), Cadence Encounter Digital Implementation (EDI) was used for placement and routing. As the input contained six signals, pad limitation problems were encountered. Therefore, the architecture was implemented using two clock domains to synchronize the input signal with the circuits, as presented in Fig. 10. To resolve the clock domain metastability problem, a two-stage flip-flop was implemented within the multiclock domain [10]. Fig. 11 presents the photomicrograph of the proposed chip. Table 4 lists the power consumption values obtained using various compression modes in post-layout simulation. The maximum power consumption was 0.55 mW in the lossless mode. Power consumption was lower in the lossy mode, where it varied with $V_{Th}$ and the number of inserted samples. For $V_{Th1}$ and 14 inserted samples, the power consumption was 0.52 mW. For $V_{Th1}$ and 40 inserted samples, the power consumption was 0.50 mW. The minimum power consumption in the lossy mode, 0.41 mW, was obtained with $V_{Th5}$ and 40 inserted samples. Therefore, the proposed architecture allows the user to control the power consumption by varying $V_{Th}$ and the number of inserted samples.
B. MEASUREMENT RESULT
An Advantest V93000 was used to generate the ECG input data for the proposed chip and to measure the output data to verify the function of the chip. The V93000 was also used to measure the power consumption and operating frequency of the chip. Fig. 12 displays the shmoo plot of the proposed multilead compressor. In the shmoo plot, a state is marked as passed when the output data and the software results are the same. According to the measurement results, an operating frequency of 1 MHz was achieved with a core voltage of 1.78 V for low-power applications. The chip can be driven by a 46-MHz clock at 1.9 V for high-throughput applications. The power consumption of the chip was 0.936 mW based on results obtained with the V93000 equipment. Table 5 lists the hardware specifications of the chip.
VOLUME 6, 2018
C. COMPARISON
In this section, the proposed architecture is compared with existing designs, as presented in Table 6. Some previous designs have focused on overcoming various hardware-related problems and improving the performance of VLSI technology. The ECG encoder proposed in [4] was designed based on pipeline technology and employs a two-stage entropy encoder through a lookup table (LUT) to improve performance. The method in [7] integrated the NPC and CS algorithms to achieve a high level of reconstruction quality and a high CR while using a lossy compression algorithm. The method in [11] involved a mixed biosignal lossless data compressor that was capable of handling a multichannel electroencephalogram (EEG), ECG, and diffuse optical tomography.
The method proposed in [12] was a lossless compression algorithm that adaptively applied the most suitable prediction method based on the detection of characteristic waveform features. The method in [13] was a multifunction microcontrol unit with a slope-feature forecaster, register bank, reconfigurable filter, and UART interface to ensure compatibility. The lossless ECG compression algorithm in [14] was developed using fuzzy-based PSO prediction and Huffman region entropy-coding techniques to achieve higher compression rates and lower hardware costs.
The CR of the proposed architecture was not the highest among the compared methods. However, most other designs support only one channel, so their hardware cost, power consumption, and size increase linearly with the number of inputs. The proposed architecture enabled the compression of multiple ECG signals while achieving lower power consumption and lower cost. Moreover, in the proposed method, separate lossless and lossy compression modes were used to control the power consumption. The proposed architecture has superior flexibility because the compression algorithm can switch between the lossless and lossy compression modes based on the required BCR. A comparison of the aforementioned methods with the proposed architecture led to the following conclusions:
1. The proposed architecture supported the compression of six leads within a single circuit and had the lowest average gate count per lead among the compared architectures. Thus, reduced hardware costs and smaller chip sizes can be achieved when compressing multiple leads with the proposed architecture.
2. The CR of the proposed architecture was lower than that of the lossless algorithms presented in [4], [12], and [14]. However, the previously proposed architectures supported only one channel, so their power consumption, gate count, CR, and overall device size increase with the number of inputs. The proposed algorithm compresses data from six leads, thus eliminating the need to increase hardware resources.
3. The proposed compression algorithm was more flexible than the methods described in [4], [7], and [11]-[14], and enabled switching between lossless and lossy compression algorithms to control the power consumption, accuracy (SNR), and BCR.
Some potential methods for reducing the power consumption of each lead include:
• Switching between lossless and lossy compression algorithms: The reduction in power consumption in the lossy mode varied according to $V_{Th}$ and the number of inserted samples, as presented in Table 4. The simulation results displayed in Fig. 8 indicated that different regions of a lead can be compressed using different lossless/lossy modes to reduce the power consumption while obtaining a high-quality recovered signal and a high BCR. Therefore, switching between the lossless and lossy modes is one potential method of reducing power consumption.
• Increasing the threshold value ($V_{Th}$) and number of inserted samples: As presented in Table 4, the power consumption was reduced by increasing $V_{Th}$ and the number of inserted samples; however, the recovery performance was lowest in the regions of the lead where the signal changes most. Therefore, the flat region can be compressed using a high $V_{Th}$ or a high number of inserted samples to maintain a high BCR. By contrast, a region of a lead with large changes can be compressed using a low $V_{Th}$ and a small number of inserted samples to achieve high recovery performance.
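The region-dependent choice of threshold suggested above could be sketched as follows. This is only an illustration of the idea and was not part of the fabricated design: the activity measure (peak-to-peak range of a window), the parameter `flat_range`, and the function name `choose_v_th` are all assumptions, and the two threshold values in the usage lines are simply the lowest and highest nonzero settings listed in Algorithm 1.

```python
def choose_v_th(window, flat_range, v_th_low, v_th_high):
    """Pick a threshold for one window of samples from its peak-to-peak range."""
    activity = max(window) - min(window)
    # Flat regions tolerate a high threshold; active regions get a low one.
    return v_th_high if activity <= flat_range else v_th_low


flat = choose_v_th([0.00, 0.01, 0.00], 0.05, v_th_low=0.0009, v_th_high=0.016)
active = choose_v_th([0.0, 0.5, 1.0], 0.05, v_th_low=0.0009, v_th_high=0.016)
```

A window on the baseline would then be coded with the high threshold, and a window containing a steep wave with the low one.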
V. CONCLUSION
In this study, a novel microcompressor was developed that supports multiple leads and provides the ability to switch between lossless and lossy compression algorithms. The BCR and power consumption can also be controlled in our proposed architecture. The effectiveness of this method was verified by fabricating a chip using 0.18-µm CMOS technology.
