Abstract: Most abnormal cardiac events, such as myocardial ischemia, acute myocardial infarction (AMI), and fatal arrhythmia, can be diagnosed through continuous electrocardiogram (ECG) analysis. According to recent clinical research, early detection of such cardiac events, with a timely alarm, can shorten the delay before the patient reaches the hospital and greatly improve clinical outcomes. A long-term ECG monitoring system that can identify abnormal cardiac events and warn the user in realtime would therefore be helpful. Combining a wireless body area sensor network (BASN) with an on-sensor ECG processor is a possible solution for this application. In this paper, we design and implement a digital signal processor for continuous ECG monitoring and alarming based on the continuous wavelet transform (CWT). The proposed architecture uses both a programmable RISC processor and an application specific integrated circuit (ASIC) for performance optimization. According to the implementation results, the proposed processor, integrated with an ASIC for CWT computation, consumes only 79.4µW. Compared with the single-RISC processor, a power reduction of about 91.6% is achieved.
I. INTRODUCTION
The surface electrocardiogram (ECG) is an electrical recording of the heart's activity and an important noninvasive tool for diagnosing many heart diseases. With the aid of continuous ECG monitoring, abnormal cardiac events can be detected early, before they evolve into a serious phase. For example, in patients with acute myocardial infarction (AMI), the ECG often changes several hours before the full onset of fatal arrhythmia. Clinical outcomes for these patients may therefore improve greatly if these changes are identified before the situation becomes severe [1].
In the past, ECG measurements could only be taken in hospitals, and long-term ECG monitoring outside the clinical environment was difficult to realize. Some portable devices, such as ECG Holter monitors, can record the ECG signal for about 24 hours, but their form factor makes them inconvenient to carry in daily life. The emerging wireless body area sensor network (BASN) adopts local sensor nodes and is able to provide long-term monitoring of biosignals [2]. However, to achieve practical energy efficiency for the wireless sensors, application-oriented on-sensor signal processing units are often required [3]. Several on-sensor ECG processors have been proposed in recent years. Some designs meet the low-power requirement by using application specific integrated circuits (ASIC), but the functions they provide are limited [4]. Others use a general purpose processor (GPP) to support multiple functions, but the power consumption often increases as well [5]. The design of a low-power multi-functional on-sensor digital signal processor remains a challenge.
Recently, the continuous wavelet transform (CWT) has been found to be a useful tool for ECG signal processing [6]. Several CWT-based algorithms for ECG delineation or for feature extraction for various heart diseases have been proposed and discussed [7] [8]. In this work we aim to design a CWT-based processor that is suitable for realtime ECG analysis and abnormal cardiac event detection. The paper is organized as follows. Section II briefly reviews the background of CWT and ECG processing. Section III describes the proposed hardware architecture for the CWT-based processor. Section IV gives the implementation results and discussion. Finally, section V summarizes this work and gives the conclusion.
II. CONTINUOUS WAVELET TRANSFORM AND ECG PROCESSING

A. Continuous Wavelet Transform (CWT)
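The continuous wavelet transform (CWT) of a continuous-time signal x(t) is defined as

$$X_w(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{1}$$
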
where ψ * (t) is the complex conjugate of the mother wavelet ψ (t), b is the location of the wavelet, and a is the scaling parameter. X w (a, b) is considered to be the time-frequency representation at location b and scale a. Generally, choosing a larger value of a means to analyze the signal using a wider basis function, and the corresponding coefficients can reveal 
B. CWT and ECG Analysis
Due to its advantages in time-frequency analysis, CWT has been widely adopted for biosignal processing and has become popular for ECG analysis in recent years [6]. One of the major applications of CWT is ECG beat detection and delineation [7]. The delineation procedure aims to extract the locations of ECG feature points such as the QRS complex, P-wave, and T-wave. Many significant clinical parameters can be calculated once these feature points are retrieved. Figure 1 illustrates an example of the ECG signal with marks of some standard ECG parameters. According to clinical research, the temporal features of the ECG, such as the RR interval, heart rate variability (HRV), and QT interval, can reveal important information about the cardiac status of an individual. For instance, HRV is thought to reflect the heart's ability to adapt to changing circumstances and is a useful index for analyzing personal health status [9]. The QT interval is also viewed as an important parameter for risk stratification of sudden cardiac death (SCD) [10].
In addition to ECG delineation, recent research in ECG processing has found CWT useful for the detection and classification of various cardiac diseases. An algorithm for accurate detection of microvolt T-wave alternans (TWA), a biomarker for identifying patients at high risk of ventricular tachycardia (VT) and ventricular fibrillation (VF), is proposed in [8]. A pilot study using CWT to observe changes of cardiac activity during transient myocardial ischemia is reported in [11]. Other CWT-based methods for advanced classification of arrhythmia can be found in [12]. Since CWT is such a useful tool for analyzing a variety of heart diseases, a CWT-based ECG processor is a suitable choice to support the multiple functions required by the ECG monitoring and alarming system.
III. ARCHITECTURE DESIGN AND IMPLEMENTATION
A. Hardware Requirements
As described in section I, there are generally two choices for implementing the digital signal processor: a GPP or an ASIC. The GPP is a programmable processor that provides flexibility for customized situations. The ASIC usually gives higher performance than the GPP but lacks that flexibility. For ECG or biosignal processing, flexibility is often preferred because of inter-individual variations. The GPP therefore seems to be a proper candidate, because it preserves the programmability of the provided functions and can adjust to different conditions. On the other hand, the power consumption and the area cost of the on-sensor signal processor should be small enough to support long-term monitoring, and the ASIC architecture usually performs better than the GPP in this respect. According to our previous software simulation, the bottleneck of most CWT-based algorithms for ECG analysis lies in the computation of the CWT coefficients. Taking the CWT-based R-peak detection algorithm [7] as an example, computing the CWT coefficients accounts for about 98.2% of the total runtime. If the computation of CWT can be accelerated, the efficiency of the hardware should improve.
B. Proposed Architecture for the CWT-based ECG Processor
Figure 2 illustrates the proposed architecture of the CWT-based processor. There are two major parts in this architecture: the ASIC accelerator for CWT computation, and the OpenRISC processor for application-oriented processing. The OpenRISC processor is a GPP based on the OpenRISC core, OR1200, which is a 32-bit scalar RISC with Harvard microarchitecture [13]. Here we use the OpenRISC processor for its programmability to support multiple functions based on CWT, such as feature extraction and disease classification.
As shown in fig. 2, the ASIC accelerator reads the input ECG data and computes the CWT data of the required scales in realtime. Since the calculation of CWT is a convolution-based procedure, a multiply-accumulate (MAC) structure is adopted. The input ECG sample is first stored in an input buffer for delay and shifting. Next, the ASIC accelerator calculates the CWT data for each of the required scales serially. During the computation of one scale, the controller of the ASIC fetches the CWT coefficients from the look-up table and the corresponding ECG data from the input SRAM, and sends the data to the MAC for computation. After all the MAC operations required for the current scale are finished, the accumulated CWT data is sent to a data SRAM for storage. The ASIC then starts computing the next scale.
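The MAC-based flow described above can be sketched in software as follows. This is only an illustrative Python model under stated assumptions, not the implemented hardware: the names `mexican_hat_lut` and `cwt_mac_stream`, the use of the Mexican hat wavelet for the look-up table, the truncation of the wavelet to ±4a samples, and the single shared delay-line buffer are all assumptions made for the example.

```python
import numpy as np

def mexican_hat_lut(scale, half_width=4):
    """Sampled Mexican hat coefficients for one scale.

    ASSUMPTION: the look-up table holds the wavelet truncated to
    +/- half_width * scale samples; the real table contents may differ.
    """
    n = np.arange(-half_width * scale, half_width * scale + 1)
    u = n / scale
    return (1.0 - u**2) * np.exp(-0.5 * u**2) / np.sqrt(scale)

def cwt_mac_stream(x, luts):
    """Process samples one by one; for each sample, compute the scales serially."""
    max_taps = max(len(lut) for lut in luts.values())
    buf = np.zeros(max_taps)                  # input buffer (delay line)
    out = {scale: np.zeros(len(x)) for scale in luts}
    for i, sample in enumerate(x):
        buf = np.roll(buf, 1)                 # shift older samples by one slot
        buf[0] = sample                       # newest sample enters the buffer
        for scale, lut in luts.items():       # one scale after another
            acc = 0.0
            for j in range(len(lut)):         # one multiply-accumulate per coefficient
                acc += lut[j] * buf[j]
            out[scale][i] = acc               # accumulated CWT data for this scale
    return out

if __name__ == "__main__":
    ecg = np.sin(np.linspace(0.0, 6.28, 200))
    luts = {4: mexican_hat_lut(4), 8: mexican_hat_lut(8)}
    cwt = cwt_mac_stream(ecg, luts)
    print({scale: data.shape for scale, data in cwt.items()})
```

In this model each output sample at scale a corresponds to a wavelet centered 4a samples in the past, so a larger scale needs a longer buffer and introduces a longer delay.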
If the computations for all the required CWT scales are finished, the controller will send a flag as well as the CWT data to the OpenRISC processor in the next stage through the Wishbone bus. The OpenRISC processor will then check the flag and fetch the CWT data for the processing of the programmed functions. Finally, the output features and detection results computed by the OpenRISC processor are sent to an output SRAM for either local storage or further transmission through the wireless module for remote use.
Fig. 3. The timing diagram of the ASIC for CWT computation. For each scale n, $c^n_j$ represents the j-th wavelet coefficient, and $a_n$ indicates the index of the last coefficient at this scale. During the computation of CWT data, $x^n_k$ is the ECG data to be multiplied with the k-th wavelet coefficient.
C. Limitations of the Proposed Architecture
The proposed architecture using the ASIC accelerator has some limitations. First, there is a trade-off between the supported CWT scales and the memory usage. Due to the delay property of CWT computation between different scales, the sizes of the input buffer and of the SRAM used for intermediate CWT data storage are proportional to the number of supported scales of the design. That is, if a larger scaling parameter a in eq. (1) is used, the size of the SRAMs increases as well. For instance, in order to support all of the first 8 scales of CWT computation, about 7.2kb of SRAM is required.
Another limitation concerns the minimum operation frequency of the ASIC accelerator. Figure 3 shows the timing diagram of the ASIC accelerator during CWT computation. Since the CWT data of different scales are computed serially, the minimum operation frequency for realtime computation is determined by the total number of coefficient multiplications over all required scales. That is, if more scales are used, the required operation frequency increases. However, in most of the tested algorithms using the proposed architecture, we found that the lower bound of the operation frequency comes from the RISC processor. For example, suppose the sampling frequency of the input ECG signal is 250Hz. To support an ECG R-peak detection algorithm using one scale of CWT, the required operation frequency of the ASIC accelerator is about 10kHz, whereas the RISC processor needs about 250kHz to provide realtime detection results. By comparison, if a single RISC processor alone computes one CWT scale, the required operation frequency is 4MHz. That is, the ASIC accelerates the CWT computation by a factor of about 400. Therefore, in most cases, a reduction in operation frequency and power consumption can be expected from adopting the proposed architecture.
IV. IMPLEMENTATION RESULTS AND DISCUSSION
A. Implementation Results
Table I summarizes the implementation results of the CWT-based processor adopting the proposed architecture, using the UMC 90nm low-leakage CMOS process. The ASIC accelerator uses 30.3k logic gates and 3.3kb of memory. At most 4 scales are supported, and the maximum a is no larger than 16 in this case. The RISC processor uses 34.2k logic gates and requires 18KB of memory, including the instruction, data, and output memory. The power consumption is 16.7µW for the ASIC and 62.7µW for the OpenRISC processor, under an operation frequency of 400kHz. As discussed in section III-C, the operation frequency is determined either by the required number of CWT scales or by the bottleneck of the RISC processor, depending on the working conditions of the target application.
B. Case Analysis: ECG Delineation and Classification
In order to demonstrate the feasibility of the proposed architecture, we choose the ECG delineation and classification procedures for performance evaluation. Figure 4 illustrates the algorithm flow used for testing. The ECG signal is first processed by the CWT stage to get the CWT data. In the feature extraction stage, the CWT data is used for ECG beat detection and delineation to extract the PQRST information. After the feature points are extracted, the temporal and morphological ECG features such as RR interval, HRV, QT interval and T-wave amplitude can be calculated directly. The following decision making stage then exploits some clinical knowledge-based rules to identify abnormal cardiac events.
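The feature calculation and decision making stages can be illustrated with a small Python sketch. It is not the rule set used in this work: the function names `temporal_features` and `rate_alarm` are hypothetical, SDNN is used as one common HRV index, and the 60 and 100 beats-per-minute limits are the conventional resting bradycardia and tachycardia limits, chosen here only as an example of a knowledge-based rule.

```python
import numpy as np

def temporal_features(r_peaks, fs=250.0):
    """Temporal features from R-peak sample indices (fs in Hz)."""
    r = np.asarray(r_peaks, dtype=float)
    rr = np.diff(r) / fs                        # RR intervals in seconds
    return {
        "rr_s": rr,
        "heart_rate_bpm": 60.0 / rr.mean(),     # mean heart rate
        "sdnn_ms": 1000.0 * rr.std(ddof=1),     # HRV index: std. dev. of RR intervals
    }

def rate_alarm(features, low_bpm=60.0, high_bpm=100.0):
    """Illustrative rule only; the thresholds are assumptions of this sketch."""
    hr = features["heart_rate_bpm"]
    if hr < low_bpm:
        return "bradycardia"
    if hr > high_bpm:
        return "tachycardia"
    return "normal"

if __name__ == "__main__":
    feats = temporal_features([0, 250, 500, 750], fs=250.0)
    print(feats["heart_rate_bpm"], feats["sdnn_ms"], rate_alarm(feats))
```

Rules of this kind operate only on the extracted features, which is why they fit the programmable processor rather than the fixed-function CWT stage.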
Since the feature values greatly affect the detection results, the delineation must be sufficiently accurate. In our test, two scales of CWT using the Mexican hat function as the mother wavelet are used to provide robust ECG beat detection and PQRST extraction. For an ECG signal sampled at 250Hz, we choose scale 4 for QRS complex detection and scale 8 for T-wave and P-wave delineation. We first search for the QRS complex at scale 4 by finding the modulus maxima, and then search for the P-wave and T-wave at scale 8 within a certain temporal range according to their physiological characteristics. According to our software simulation results on the standard MIT-BIH QT database [14], this algorithm provides an average accuracy of 97.9% for the PQRST delineation of ECG signals.
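A minimal Python sketch of this two-scale search is given below. It follows the steps described above but is not the tested implementation: the helper names, the detection threshold (half of the largest modulus), the 200ms refractory period, and the search window passed to `search_wave` are hypothetical values chosen for illustration.

```python
import numpy as np

def mexican_hat(scale, half_width=4):
    """Sampled Mexican hat wavelet, truncated to +/- half_width * scale samples."""
    n = np.arange(-half_width * scale, half_width * scale + 1)
    u = n / scale
    return (1.0 - u**2) * np.exp(-0.5 * u**2) / np.sqrt(scale)

def cwt_scale(x, scale):
    """CWT data at one scale; mode="same" keeps the output aligned with x."""
    return np.convolve(x, mexican_hat(scale), mode="same")

def modulus_maxima(w, threshold, refractory):
    """Local maxima of |w| above threshold, at most one per refractory window."""
    mag = np.abs(w)
    peaks = []
    for i in range(1, len(mag) - 1):
        is_local_max = mag[i] >= mag[i - 1] and mag[i] > mag[i + 1]
        if not is_local_max or mag[i] < threshold:
            continue
        if peaks and i - peaks[-1] < refractory:
            if mag[i] > mag[peaks[-1]]:
                peaks[-1] = i               # keep the stronger of two close maxima
            continue
        peaks.append(i)
    return peaks

def detect_qrs(ecg, fs=250, scale=4):
    """QRS search at scale 4. Threshold and refractory period are assumptions."""
    w = cwt_scale(ecg, scale)
    threshold = 0.5 * np.max(np.abs(w))
    return modulus_maxima(w, threshold, refractory=int(0.2 * fs))

def search_wave(w, r_peak, start_offset, stop_offset):
    """Largest modulus of w in the window [r_peak + start, r_peak + stop)."""
    lo = max(r_peak + start_offset, 0)
    hi = min(r_peak + stop_offset, len(w))
    return lo + int(np.argmax(np.abs(w[lo:hi])))

if __name__ == "__main__":
    n = np.arange(1000)
    ecg = np.zeros(1000)
    for r in (200, 450, 700):               # synthetic beats, not real ECG data
        ecg += np.exp(-0.5 * ((n - r) / 3.0) ** 2)
        ecg += 0.3 * np.exp(-0.5 * ((n - (r + 75)) / 10.0) ** 2)
    r_peaks = detect_qrs(ecg)
    w8 = cwt_scale(ecg, 8)
    t_peaks = [search_wave(w8, r, 40, 100) for r in r_peaks]
    print(r_peaks, t_peaks)
```

A P-wave search would use the same `search_wave` helper with negative offsets, that is, a window placed before each detected QRS complex.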
The delineation and classification procedures are tested on both the CWT-based processor using the proposed architecture and the baseline single-RISC processor. For the proposed processor, parameters such as the required CWT scales are first configured in the ASIC accelerator to determine the working flow. The algorithms for feature extraction and decision making are written in C, compiled to machine code, and then programmed into the OpenRISC processor through the Wishbone bus. For the baseline single-RISC processor, all stages of the algorithm are programmed and tested on one OpenRISC processor. An example of the testing results is illustrated in fig. 5. Table II compares the implementation results of the baseline OpenRISC processor and the proposed processor. The ECG delineation algorithm using two scales of CWT described above is adopted for testing. The sampling rate of the input ECG signal is set to 250Hz with 12-bit resolution. Under this testing condition, the operation frequency required by the single-RISC processor to meet the realtime requirement is 10MHz, and the power consumption is 943.7µW. For the processor adopting the proposed architecture, the required operation frequency is 400kHz, and the overall power consumption is 79.4µW.
C. Comparison
Compared with the single-RISC processor, the proposed processor reduces the power consumption by 91.6%. Moreover, although one additional ASIC is integrated in the proposed processor, the total area cost decreases slightly in this case, because the required memory is greatly reduced. The single-RISC processor requires a total of 36KB of SRAM for the instruction and data memory. In the proposed processor integrated with the ASIC, however, only 3.3kb is used by the ASIC for CWT computation, and the memory required by the RISC processor for feature extraction and decision making is reduced to 18KB. That is, about 48.9% of the memory usage can be saved by adopting the proposed architecture in this case.
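The savings quoted above follow directly from the reported figures, as the short Python check below shows. The only assumption is the unit convention: "kb" is read as kilobits and "KB" as kilobytes, so the 3.3kb of ASIC memory is converted to kilobytes by dividing by 8. The variable names are illustrative.

```python
def percent_reduction(baseline, proposed):
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (1.0 - proposed / baseline)

# Power figures reported for the 250Hz, two-scale delineation test (in µW).
asic_power_uw = 16.7
risc_power_uw = 62.7
proposed_power_uw = asic_power_uw + risc_power_uw
baseline_power_uw = 943.7
power_saving = percent_reduction(baseline_power_uw, proposed_power_uw)

# Memory in kilobytes. ASSUMPTION: "kb" means kilobits, "KB" means kilobytes.
asic_memory_kbyte = 3.3 / 8.0
proposed_memory_kbyte = 18.0 + asic_memory_kbyte
baseline_memory_kbyte = 36.0
memory_saving = percent_reduction(baseline_memory_kbyte, proposed_memory_kbyte)

# Operation frequencies quoted in the text (in Hz).
cwt_speedup = 4e6 / 10e3          # single RISC vs. ASIC, one CWT scale
clock_reduction = 10e6 / 400e3    # single RISC vs. proposed processor

if __name__ == "__main__":
    print(f"power saving:    {power_saving:.1f}%")
    print(f"memory saving:   {memory_saving:.1f}%")
    print(f"CWT speedup:     {cwt_speedup:.0f}x")
    print(f"clock reduction: {clock_reduction:.0f}x")
```

Under this unit convention the check reproduces the 91.6% power reduction and the 48.9% memory saving, and the ASIC and OpenRISC power figures sum to the reported 79.4µW.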
V. CONCLUSION
In this work we design and implement a CWT-based multi-functional processor for long-term ECG monitoring and realtime alarming. A processor adopting the proposed architecture, with an ASIC accelerator integrated in the CWT stage, speeds up the computation and improves the overall hardware performance. According to the implementation results, a power reduction of about 91.6% and a memory saving of about 48.9% are achieved compared with the single-RISC processor. The proposed architecture is therefore considered more feasible for the realization of an on-sensor continuous ECG monitoring system.
