Abstract: Tiny embedded systems have not been an ideal platform for high-performance computing due to their constrained resources. Limitations in processing power, battery life, communication bandwidth, and memory restrict the applicability of existing complex medical analysis algorithms such as electrocardiogram (ECG) analysis. Among these limitations, battery lifetime has been a key technological constraint. In this paper, we address the problem of partitioning such a complex algorithm so that the energy consumed by wireless transmission is minimized. ECG analysis algorithms normally consist of preprocessing, pattern recognition, and classification. Considering the orientation of the ECG leads, we devise a technique to perform preprocessing and pattern recognition locally in small embedded systems attached to the leads. The features detected in the pattern recognition phase are then used for classification. Ideally, if all features detected for each heartbeat reside in a single processing node, no transmission is necessary. Otherwise, to perform classification, the features must be gathered on one local node, and communication is inevitable. We perform this feature grouping by modeling the problem as a hypergraph and applying partitioning schemes, which yields significant power savings in wireless communication. Furthermore, we use dynamic reconfiguration through software module migration. Combined with partitioning, this technique increases the overall power savings of such systems. Moreover, it adaptively alters the system configuration for different environments and patients. We evaluate the effectiveness of the proposed techniques on the MIT/BIH benchmarks and achieve, on average, a 70 percent energy saving.
INTRODUCTION
The electrocardiogram (ECG) is the record of the variation of bioelectric potential over time as the human heart beats. Because it is easy to use and noninvasive, the ECG plays an important role in patient monitoring and diagnosis. Multichannel ECG data provide cardiologists with essential information for diagnosing heart disease. Our primary objective is to verify the feasibility of implementing an ambulatory ECG analysis algorithm with real-time diagnosis functions on wearable computers. ECG analysis algorithms have long been difficult to realize in computer-aided ECG diagnosis. Implementing them is even harder on small mobile embedded systems, which must meet given latency requirements while minimizing overall energy dissipation. Distributed embedded systems have been successfully deployed in various wearable computers, and distributed architectures have been developed for cooperative detection, scalable data transport, and other capabilities and services. However, the complexity of the algorithms running on these systems introduces a new set of challenges associated with resource-constrained devices and their energy budgets. These obstacles may dramatically reduce the effectiveness of embedded distributed algorithms. Thus, a new attribute of distributed embedded computing, dynamic reconfigurability, must be developed and provided to such systems. Reconfiguration capability can be of great advantage here: it can adaptively alter the system configuration to accommodate the objectives and meet the constraints of highly dynamic systems.
There have been exciting advances in the development of pervasive computing technologies in the past few years. Computation, storage, and communication are now more or less woven into the fabric of our society, with much of the progress due to the relentless march of silicon-based electronics technology as predicted by Moore's Law. The emerging field of flexible electronics, in which electronic components such as transistors and wires are built on a thin flexible material, offers a similar opportunity to weave computation, storage, and communication into the fabric of the very clothing we wear, thereby creating an intelligent fabric (also called electronic textiles or e-textiles) [1]. The implications of seamlessly integrating a large number of communicating computation and storage resources, mated with sensors and actuators, in close proximity to the human body are quite exciting; for example, one can imagine biomedical applications in which biometric and ambient sensors woven into a patient's garment trigger and modulate the delivery of a drug. Realizing such applications is not just a matter of developing innovative materials for flexible electronics, along with accompanying sensors and actuators; the characteristics of flexible electronics technology and the requirements of the applications it enables necessitate radical innovation in system-level design. Electronic components built of flexible materials have characteristics that are very different from those of silicon and PCB-based electronics. Further, the operating scenarios of these systems involve environmental dynamics, physical coupling, resource constraints, infrastructure support, and robustness requirements that are distinct from those faced by traditional systems. This unique combination requires one to go beyond thinking of these systems as traditional electronic systems in a different form factor. Instead, a rethinking and complete overhaul of the system architecture and the design methodology for all layers of these systems is required.
RELATED WORK
Several "wearable" technologies exist to continually monitor a patient's vital signs, using low-cost, well-established disposable sensors such as blood oxygen finger clips and electrocardiogram electrodes. The Smart Shirt from Sensatex [2] is a wearable health monitoring device that integrates a number of sensory devices onto the Wearable Motherboard from Georgia Tech [3]. In the Smart Shirt design, the Wearable Motherboard is woven into an undershirt. Its interconnect is a flexible data bus that can support a wide array of sensory devices. These sensors communicate via the data bus with a monitoring device located at the base of the shirt. The monitoring device is integrated into a single processing unit that also contains a transceiver. Several other technologies have been introduced, such as MIThril from MIT [4], e-Textile from Carnegie Mellon University [5], Wearable e-Textile from Virginia Tech [6], and CustoMed and RFab-Vest from UCLA [7], [8]. The Lifeguard project at Stanford University is a physiological monitoring system consisting of physiological sensors (ECG/respiration electrodes, pulse oximeter, blood pressure monitor, temperature probe), a wearable device with built-in accelerometers (CPOD), and a base station (Pocket PC). The CPOD acquires and logs the physiological parameters measured by the sensors [9]. The Assisted Cognition Project at the University of Washington's Department of Computer Science explored the use of AI systems to support and enhance the independence and quality of life of Alzheimer's patients. Assisted Cognition systems use ubiquitous computing and artificial intelligence technology to replace some of the memory and problem-solving abilities that an Alzheimer's patient has lost [10]. However, none of these projects supports scalability or the adaptation of complex processing algorithms.
AUTOMATED FEATURE SET DETECTION
Given the goal of classifying objects based on their attributes, the functionality of an automated pattern recognition system can be divided into two basic tasks: The description task generates attributes of an object using feature extraction techniques, and the classification task assigns a group label to the object based on the attributes with a classifier.
There are two different approaches to implementing a pattern recognition system: statistical and structural. Each approach uses different schemes within the description and classification tasks that make up a pattern recognition system. Statistical pattern recognition [11], [12] draws on statistical decision theory to discriminate among data from different groups based on quantitative features of the data. The quantitative nature of statistical pattern recognition, however, makes it difficult to discriminate among groups based on the morphological (i.e., shape-based or structural) subpatterns and their interrelationships embedded within the data. This limitation provided the impetus for the development of structural approaches to pattern recognition.
Structural pattern recognition [13] , [14] relies on syntactic grammars to discriminate among data from different groups based upon the morphological interrelationships (or interconnections) present within the data. Structural pattern recognition systems are effective for image data as well as time-series data.
We have investigated an accurate ECG processing algorithm based on structural pattern recognition (as depicted in Fig. 1 ) mapped onto our processing units (dot-motes) [15] . The algorithm consists of three stages: preprocessing, pattern recognition, and classification. We perform preprocessing and pattern recognition locally, i.e., within close proximity to the ECG leads. The preprocessing includes filtering, while the pattern recognition includes heartbeat detection (through the QRS complex detection), segmentation, as well as feature extraction. Once the features are extracted, they will be processed for classification.
The filtering is performed by finite impulse response (FIR) filters with cutoff frequencies of 5 and 150 Hz for a sampling rate of 360 samples/sec. Heartbeat detection is implemented with a QRS detector based on the algorithm of Pan and Tompkins [16], with some improvements that exploit slope information. The scheme proposed by Laguna et al. [17] is used to extract the fiducial points. All onset and offset points are detected based on the location and convexity of the R point. We detect the onset of each point by locating the largest isoelectric region before it and then searching for the inflection point followed by the largest negative slope (for a convex R-wave) or the largest positive slope (for a concave R-wave). We detect the offset of each point by searching for a significant upslope following the end of the last downslope, in particular for the P, T, and S offsets. Features related to heartbeat intervals and ECG morphology are then calculated for each heartbeat. The list of features is given in Table 1 and is based on [18] and [19], with minor additions. A sample filtered ECG signal that was automatically segmented by our tool is depicted in Fig. 2.
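The 5-150 Hz band-pass step can be illustrated with a generic windowed-sinc FIR design. This is a minimal sketch, not the authors' filter: the Hamming window, the 201-tap length, and the helper names `bandpass_fir`, `fir_filter`, and `magnitude_response` are all assumptions made for illustration. The ideal band-pass kernel is formed as the difference of two ideal low-pass kernels and then windowed.

```python
import math


def bandpass_fir(num_taps: int, f_lo: float, f_hi: float, fs: float) -> list[float]:
    """Windowed-sinc band-pass FIR taps (Hamming window; assumed design)."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd tap count for a symmetric, linear-phase filter")
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal band-pass = ideal low-pass(f_hi) minus ideal low-pass(f_lo)
        if k == 0:
            ideal = 2.0 * (f_hi - f_lo) / fs
        else:
            ideal = (math.sin(2 * math.pi * f_hi * k / fs)
                     - math.sin(2 * math.pi * f_lo * k / fs)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def fir_filter(x: list[float], taps: list[float]) -> list[float]:
    """Direct-form FIR convolution; samples before x[0] are treated as zero."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j < 0:
                break
            acc += t * x[i - j]
        y.append(acc)
    return y


def magnitude_response(taps: list[float], f: float, fs: float) -> float:
    """|H(f)| of the FIR filter, used to check the pass band and stop band."""
    re = sum(t * math.cos(2 * math.pi * f * n / fs) for n, t in enumerate(taps))
    im = sum(t * math.sin(2 * math.pi * f * n / fs) for n, t in enumerate(taps))
    return math.hypot(re, im)


taps = bandpass_fir(201, 5.0, 150.0, 360.0)
print(magnitude_response(taps, 0.0, 360.0))   # baseline wander (DC) should be rejected
print(magnitude_response(taps, 20.0, 360.0))  # a QRS-band component should pass
```

A long, odd-length symmetric filter keeps the phase linear, so the QRS shape is delayed but not distorted; on a mote, a shorter filter or fixed-point taps would likely be preferred.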
We extract a total of 23 features from the ECG signals, and each derives from one of the groups below:
RR Interval Features: We extract four features based on RR intervals. The RR interval is the interval between two successive heartbeat fiducial points, taken at the maximum of the R-wave. The pre-RR interval is the RR interval between a detected heartbeat and the previous one. The post-RR interval is the RR interval between a given heartbeat and the next detected one. The average-RR interval is the average of all detected RR intervals, and the local average-RR interval is the average of the 10 most recent RR intervals.
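The four RR-interval features can be computed directly from the R-peak sample indices. The sketch below is a minimal illustration: `rr_features` is a hypothetical helper, the average-RR value is assumed to be computed offline over the whole record, and only beats with both a previous and a next beat receive features.

```python
def rr_features(r_peaks: list[int], fs: float, local_window: int = 10) -> list[dict[str, float]]:
    """Pre-, post-, average-, and local average-RR intervals (seconds) per heartbeat.

    r_peaks: sorted sample indices of detected R-wave maxima.
    """
    if len(r_peaks) < 3:
        return []
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]  # samples -> seconds
    average_rr = sum(rr) / len(rr)  # assumption: averaged over the whole record
    features = []
    for i in range(1, len(r_peaks) - 1):
        recent = rr[max(0, i - local_window):i]  # up to 10 intervals ending at beat i
        features.append({
            "pre_rr": rr[i - 1],   # previous beat -> this beat
            "post_rr": rr[i],      # this beat -> next beat
            "average_rr": average_rr,
            "local_average_rr": sum(recent) / len(recent),
        })
    return features


print(rr_features([0, 360, 540, 900], fs=360.0))
```

In a streaming deployment the average-RR value would more plausibly be a running mean, since future beats are not yet available.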
Heartbeat Interval Features: We extract five features related to heartbeat intervals. QRS duration is the time between QRS onset and QRS offset. T-wave duration is the time between T-wave onset and T-wave offset. The PR, ST, and QT durations are additions to the automated classification system. ST duration is the time between S-wave offset and T-wave onset. PR duration is the time between P-wave onset and R. QT duration is the time between Q-wave onset and T-wave offset. All of these features are obtained by first determining the start and end points of each interval and then subtracting the start point from the end point.
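Given the fiducial points of a beat, each interval feature is simply the end point minus the start point, converted from samples to seconds. This minimal sketch assumes the fiducial points arrive as a dictionary of sample indices; the key names and the helper `interval_features` are hypothetical.

```python
def interval_features(fp: dict[str, int], fs: float) -> dict[str, float]:
    """Five heartbeat-interval features in seconds from fiducial sample indices.

    Assumed keys: p_on, q_on, r, qrs_on, qrs_off, s_off, t_on, t_off.
    """
    def duration(start: str, end: str) -> float:
        # End point minus start point, converted from samples to seconds
        return (fp[end] - fp[start]) / fs

    return {
        "qrs": duration("qrs_on", "qrs_off"),
        "t_wave": duration("t_on", "t_off"),
        "pr": duration("p_on", "r"),
        "st": duration("s_off", "t_on"),
        "qt": duration("q_on", "t_off"),
    }


beat = {"p_on": 0, "q_on": 54, "qrs_on": 54, "r": 72,
        "qrs_off": 90, "s_off": 90, "t_on": 126, "t_off": 198}
print(interval_features(beat, fs=360.0))
```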
Geometric Points: We calculate the signal DC shift level by averaging the baselines of the previous five detected heartbeats. The maximal positive and minimal negative peaks are detected by computing the voltage difference between each sample in the heartbeat and the DC shift level. In addition, we extract the number of samples within 70-100 percent of the absolute peak value. Finally, we compute the slopes of the Q-onset-to-R and R-to-S segments.
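The geometric features can be sketched as follows. This is an illustration under stated assumptions: each previous beat's baseline is assumed to be already available as a single value (how it is estimated is not specified here), the 70-100 percent test is applied to the absolute deviation from the DC level, slopes are in amplitude units per second, and `geometric_features` is a hypothetical helper.

```python
def geometric_features(beat: list[float], prev_baselines: list[float],
                       q_on: int, r: int, s: int, fs: float) -> dict[str, float]:
    """DC level, peak deviations, near-peak sample count, and Q-onset-R / R-S slopes."""
    if not prev_baselines:
        raise ValueError("need at least one previous baseline value")
    recent = prev_baselines[-5:]              # previous five detected heartbeats
    dc = sum(recent) / len(recent)
    dev = [v - dc for v in beat]              # voltage difference from the DC shift level
    abs_peak = max(abs(d) for d in dev)
    return {
        "dc_level": dc,
        "max_positive_peak": max(dev),
        "min_negative_peak": min(dev),
        # Samples whose magnitude lies within 70-100 percent of the absolute peak
        "samples_near_peak": sum(1 for d in dev if abs(d) >= 0.7 * abs_peak),
        "slope_q_on_r": (beat[r] - beat[q_on]) * fs / (r - q_on),
        "slope_r_s": (beat[s] - beat[r]) * fs / (s - r),
    }


beat = [0.1, 0.1, 0.5, 1.1, 0.3, -0.2, 0.1]
print(geometric_features(beat, [0.1] * 5, q_on=1, r=3, s=5, fs=360.0))
```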
ECG Morphology Features: We extract eight features based on ECG morphologies arranged into four groups. Two groups consist of samples from heartbeat segments and two groups consist of samples from fixed intervals.
Within each group, one feature consists of samples from the original ECG signal, while the other is extracted from the normalized ECG signal. Normalization divides the amplitude of each sample by the standard deviation of the same heartbeat.
We extract samples from heartbeat segments in ECG morphologies 1 and 2 (see Fig. 3). In morphology 1, 10 samples between QRS onset and offset are extracted, and in morphology 2, nine samples between S-wave offset and T-wave offset are obtained. The number of samples collected also depends on the sampling rate and scales accordingly (the numbers above are for the original sampling rate of 360 samples per second).
We extract samples from a fixed interval in ECG morphologies 3 and 4 (see Fig. 4). In morphology 3, 10 samples between R - 50 ms and R + 100 ms are extracted, and in morphology 4, eight samples between R + 150 ms and R + 500 ms are acquired.
For all ECG morphologies, values that fall between two samples are estimated using linear interpolation. We repeated this feature extraction for three input sampling rates: 360, 200, and 100 samples per second. The original sampling rate of the MIT/BIH [20] benchmarks is 360 samples/second; the 200 and 100 samples/second inputs were obtained by downsampling.
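The morphology features combine three operations: picking evenly spaced positions between two points, linearly interpolating the signal at those fractional positions, and optionally normalizing by the beat's standard deviation. The sketch below is a minimal illustration; the helper names are hypothetical, the sample positions are assumed to include both endpoints, and the population standard deviation is assumed for normalization.

```python
import math
import statistics


def interpolate(x: list[float], pos: float) -> float:
    """Linearly interpolate x at a fractional sample index (clamped to the signal)."""
    pos = min(max(pos, 0.0), len(x) - 1.0)
    i = math.floor(pos)
    if i == len(x) - 1:
        return x[i]
    frac = pos - i
    return x[i] + frac * (x[i + 1] - x[i])


def samples_between(x: list[float], start: float, end: float, n: int) -> list[float]:
    """n evenly spaced samples from start to end, both endpoints included (n >= 2)."""
    step = (end - start) / (n - 1)
    return [interpolate(x, start + k * step) for k in range(n)]


def fixed_interval_samples(x: list[float], r: int, fs: float,
                           t0_ms: float, t1_ms: float, n: int) -> list[float]:
    """Samples in [R + t0_ms, R + t1_ms]; morphology 3 would use (-50, 100, 10)."""
    return samples_between(x, r + t0_ms * fs / 1000.0, r + t1_ms * fs / 1000.0, n)


def normalized(values: list[float], beat: list[float]) -> list[float]:
    """Scale values by the standard deviation of the same heartbeat."""
    sd = statistics.pstdev(beat)
    return [v / sd for v in values]


ramp = [float(i) for i in range(400)]
print(samples_between(ramp, 2, 8, 4))
print(fixed_interval_samples(ramp, r=100, fs=200.0, t0_ms=-50, t1_ms=100, n=10))
```

Because positions are expressed in milliseconds and converted with `fs`, the same code serves the 360, 200, and 100 samples/second inputs.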
Although our objective is to minimize communication among processing nodes before the classification phase, this study does not investigate the classification problem itself. Therefore, we did not implement a classifier for our platform; any classifier suitable for constrained embedded systems may be deployed.
SOFTWARE PROFILING
To measure the execution delay of our heartbeat detection and feature extraction program, we used Avrora [21] , a microcontroller simulator framework developed at the University of California, Los Angeles. Avrora is a precise and flexible simulator that preserves all timing and behavior of the instrumented program, while allowing user-defined profiling of application information. With Avrora, users can easily profile application-specific information such as branch frequency, maximum stack size, and memory access by adding custom program monitors.
For our experiments, we implemented a program monitor on Avrora that generates the control flow graph (CFG) while measuring the execution frequency and delay of each basic block. Since the CFG of our system is very large, only the major processes are shown in Fig. 5 . The CFG is dynamically generated based upon our compiled and assembled ECG program, while Avrora simulates the program execution. Unlike static analysis, parts of a program that are not executed during the simulation will not be accounted for. Also, the generated graph will accurately reflect compiler optimizations. Hardware interrupts, which occur intermittently during execution, are accounted for as well.
Delay analysis is performed per function during CFG generation, because basic-block information is too detailed to be practical. Function delay is measured from the moment execution enters a function to the moment it exits. Time spent in calls to other functions is included, while time spent in interrupts is not. Because functions with multiple execution paths may have inconsistent delays, the average function delay is computed over all execution instances. For our purposes, however, the functions that extract features from heartbeat signals each consist of a single execution path, so no precision is lost in our analysis. The delays of the feature detection modules for a sampling rate of 360 samples/second are illustrated in Fig. 6.
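The delay rule above (inclusive of callees, exclusive of interrupts, averaged over instances) can be sketched over a simple event trace. This is not Avrora's monitor API: the `(cycle, kind, name)` event format and `average_function_delays` are hypothetical, and the sketch assumes properly nested calls and non-nested interrupts whose handler bodies are not traced as enter/exit events.

```python
from collections import defaultdict


def average_function_delays(trace):
    """Average inclusive delay per function, excluding time spent in interrupts.

    trace: iterable of (cycle, kind, name), kind in
    {"enter", "exit", "irq_enter", "irq_exit"} (hypothetical event format).
    """
    stack = []          # active frames: [function name, entry cycle, cycles lost to interrupts]
    irq_start = None
    delays = defaultdict(list)
    for cycle, kind, name in trace:
        if kind == "enter":
            stack.append([name, cycle, 0])
        elif kind == "exit":
            fname, start, irq_cycles = stack.pop()
            if fname != name:
                raise ValueError(f"exit of {name} does not match entry of {fname}")
            # Callee time stays in (inclusive delay); interrupt time is removed
            delays[fname].append(cycle - start - irq_cycles)
        elif kind == "irq_enter":
            irq_start = cycle
        elif kind == "irq_exit":
            spent = cycle - irq_start
            for frame in stack:  # every active function was suspended by the interrupt
                frame[2] += spent
            irq_start = None
    return {fname: sum(v) / len(v) for fname, v in delays.items()}


trace = [
    (0, "enter", "detect_qrs"),
    (10, "enter", "slope"),
    (30, "exit", "slope"),
    (40, "irq_enter", "timer"),
    (55, "irq_exit", "timer"),
    (100, "exit", "detect_qrs"),
    (200, "enter", "slope"),
    (230, "exit", "slope"),
]
print(average_function_delays(trace))  # {'slope': 25.0, 'detect_qrs': 85.0}
```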
TARGET ARCHITECTURE MODEL
Networked sensor nodes containing constrained, often battery-powered, embedded computers can densely sample phenomena that were previously difficult or costly to observe. Sensor nodes can be placed anywhere on a patient's body. Because of the mobility of such systems, wireless sensor networks are expected to be autonomous and long-lived, surviving environmental hardships while conserving as much energy as possible.
It is well known that transmitting a single bit wirelessly can consume many orders of magnitude more energy than a single local computation [22]. Thus, we focus on the energy used for wireless communication. In our model, all nodes are placed in close proximity to one another, so we assume they communicate directly and multihop communication is not required. Therefore, the total energy consumed for in-network processing is:

$$E = \sum_{n} b(n)\, e(n), \qquad (1)$$

where $b(n)$ is the number of packets transmitted and $e(n)$ is the average amount of energy required to transmit one packet. In our design, we adopt the Collision Free Model (CFM), which simplifies programming by abstracting away the low-level details of channel contention and packet collision from algorithm designers. Because reliable communication is abstracted as an atomic operation, programming under CFM resembles existing algorithm design in parallel and distributed computation. CFM does not capture the impact of packet collisions, which distinguishes wireless from wired communication, so performance analysis under CFM is only approximate. Nevertheless, for simplicity, we adopt CFM in our design.
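The energy model can be evaluated directly once packet counts are known. This minimal sketch assumes the total is the sum of b(n)·e(n) over transmitting nodes n; the node names, packet counts, and per-packet energies below are hypothetical placeholders in arbitrary units, not measurements.

```python
def total_tx_energy(packets: dict[str, int], energy_per_packet: dict[str, float]) -> float:
    """E = sum over nodes n of b(n) * e(n): packets sent times average per-packet energy."""
    return sum(b * energy_per_packet[node] for node, b in packets.items())


def relative_saving(baseline: float, improved: float) -> float:
    """Fractional energy reduction of `improved` with respect to `baseline`."""
    return 1.0 - improved / baseline


# Hypothetical packet counts and per-packet energies (arbitrary units)
e_pkt = {"lead_I": 1.0, "lead_II": 1.0}
before = total_tx_energy({"lead_I": 30, "lead_II": 50}, e_pkt)
after = total_tx_energy({"lead_I": 10, "lead_II": 10}, e_pkt)
print(before, after, relative_saving(before, after))
```

When every node uses the same radio and packet size, e(n) is a constant, so the relative saving reduces to the relative reduction in packet count.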
DYNAMIC RECONFIGURATION
Sensor nodes are composed of embedded systems as well as general-purpose software, which introduces a tension between resource and energy constraints and the layers of indirection required to support true general-purpose operating systems. TinyOS [23], the state-of-the-art sensor operating system, tends to prioritize embedded system constraints over general-purpose OS functionality. TinyOS consists of a collection of software components written in the NesC language [24], ranging from low-level parts of the network stack to application-level routing logic. Our target operating system, SOS, is a new operating system for mote-class sensor nodes that takes a more dynamic point in the design spectrum [25]. SOS consists of dynamically loaded modules and a common kernel that implements messaging, dynamic memory, and module loading and unloading, among other services. Dynamic reconfigurability is one of our primary assumptions. In the domain of embedded computing, reconfigurability is the ability to modify the software on individual nodes of a network after the network has been deployed and initialized. This makes it possible to incrementally update the sensor network after deployment, add new software modules, and remove software modules once they are no longer needed. The growing tension between large, hard-to-update networks and complex applications that require incremental patches has made reconfigurability an issue that can no longer be ignored. SOS supports over-the-air reprogramming of sensor nodes, through which software modules may be modified, added, or removed.
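Module migration between configurations can be planned by comparing the old and new placements of feature-extraction modules. The sketch below only computes which modules each node must unload and load; it does not use SOS's actual module-loading interface, and `migration_plan`, the module names, and the node names are hypothetical.

```python
def migration_plan(old: dict[str, str], new: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Per-node unload/load lists needed to move from one module placement to another."""
    plan: dict[str, dict[str, list[str]]] = {}

    def entry(node: str) -> dict[str, list[str]]:
        return plan.setdefault(node, {"unload": [], "load": []})

    for module in sorted(set(old) | set(new)):
        src, dst = old.get(module), new.get(module)
        if src == dst:
            continue  # module stays where it is; nothing to transfer
        if src is not None:
            entry(src)["unload"].append(module)
        if dst is not None:
            entry(dst)["load"].append(module)
    return plan


old = {"qrs_duration": "lead_I", "qt_duration": "lead_I", "rr_features": "lead_II"}
new = {"qrs_duration": "lead_I", "qt_duration": "lead_II", "rr_features": "lead_II"}
print(migration_plan(old, new))
```

Only modules whose placement changes generate over-the-air traffic, which keeps the reconfiguration cost proportional to the size of the change rather than the size of the application.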
FEATURE SET PARTITIONING
A hypergraph is a generalization of a graph in which the set of edges is replaced by a set of hyperedges. A hyperedge extends the notion of an edge by allowing it to connect more than two vertices. Formally, a hypergraph $H = (V, E_h)$ is defined as a set of vertices $V$ and a set of hyperedges $E_h$, where each hyperedge is a subset of the vertex set $V$ [26], and the size of a hyperedge is the cardinality of this subset. Let $w_i$ denote the weight of vertex $v_i \in V$. A $K$-way vertex partition $\Pi = \{V_1, V_2, \ldots, V_K\}$ of $H$ is said to be balanced with an overall load imbalance tolerance $\varepsilon$ if each $V_k$ satisfies the following equation:

$$W_k \le W_{avg}(1 + \varepsilon), \quad k = 1, \ldots, K, \qquad (2)$$

where

$$W_{avg} = \frac{1}{K} \sum_{v_i \in V} w_i, \qquad (3)$$

$$W_k = \sum_{v_i \in V_k} w_i. \qquad (4)$$
In a partition of $H$, a hyperedge that has at least one vertex in a partition is said to connect that partition. The connectivity set $\Lambda_j$ of a hyperedge $e_j$ is defined as the set of partitions connected by $e_j$. The connectivity $\lambda_j = |\Lambda_j|$ of a hyperedge $e_j$ denotes the number of partitions connected by $e_j$. A hyperedge $e_j$ is said to be cut (external) if it connects more than one partition (i.e., $\lambda_j > 1$), and uncut (internal) otherwise (i.e., $\lambda_j = 1$). Therefore, the cutsize is defined as follows:

$$\mathrm{cutsize}(\Pi) = \left| \{\, e_j \in E_h : \lambda_j > 1 \,\} \right|. \qquad (5)$$
Hence, the cutsize equals the number of cut nets. Hypergraph partitioning is the problem of dividing a hypergraph into two or more parts such that the cutsize is minimized while a given balance criterion among the partition weights is satisfied. The hypergraph partitioning problem is known to be NP-hard [27].
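The definitions of connectivity, cutsize, and the balance criterion translate directly into code. In this minimal sketch, a partition is assumed to be a dictionary mapping each vertex to a part index in `0..K-1`, hyperedges are sets of vertices, and the helper names are hypothetical.

```python
def connectivity(hyperedge: set[str], part_of: dict[str, int]) -> int:
    """lambda_j: number of distinct parts connected by the hyperedge."""
    return len({part_of[v] for v in hyperedge})


def cutsize(hyperedges: list[set[str]], part_of: dict[str, int]) -> int:
    """Number of cut (external) hyperedges, i.e. those with lambda_j > 1."""
    return sum(1 for e in hyperedges if connectivity(e, part_of) > 1)


def is_balanced(weights: dict[str, float], part_of: dict[str, int], k: int, eps: float) -> bool:
    """Check W_k <= W_avg * (1 + eps) for every part k."""
    loads = [0.0] * k
    for v, w in weights.items():
        loads[part_of[v]] += w
    w_avg = sum(weights.values()) / k
    return all(load <= w_avg * (1 + eps) for load in loads)


weights = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [{"a", "b"}, {"b", "c", "d"}, {"c", "d"}]
split = {"a": 0, "b": 0, "c": 1, "d": 1}
print(cutsize(edges, split), is_balanced(weights, split, k=2, eps=0.0))  # 1 True
```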
During software partitioning, it is important to divide the system specification into clusters so that intercluster (intermote) connections are minimized. Hypergraphs naturally represent feature extraction algorithms: the vertices model the features, the vertex weights represent the computational time required to detect each feature, and the hyperedges represent sets of features that are triggered simultaneously, together with the number of times each set is triggered. Partitioning the hypergraph so that the cutsize is minimized while the partitions remain balanced reduces the communication required among processing units for the classification phase. Because all selected features must be classified at a single local node, internode communication is inevitable whenever the selected features reside on different nodes. The quality of the hypergraph partitioning algorithm therefore strongly affects the feasibility, quality, and cost of the resulting system.
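The hypergraph described above can be assembled from a log of the feature sets triggered per heartbeat and written in the hypergraph file format read by hMETIS [28]. This sketch follows the format described in the hMETIS manual (a header line `|E| |V| fmt` with `fmt = 11` for weighted hyperedges and vertices, one line per hyperedge starting with its weight, then one vertex weight per line, with 1-based vertex IDs); please verify against the manual of the installed version. The helper `to_hmetis`, the feature names, and the integer costs are hypothetical; singleton sets are skipped on the assumption that they can never be cut.

```python
from collections import Counter


def to_hmetis(feature_cost: dict[str, int], trigger_log: list[set[str]]) -> tuple[str, list[str]]:
    """Build a weighted hypergraph in hMETIS text format from per-beat feature sets.

    feature_cost: integer detection cost per feature (vertex weights, e.g. cycles).
    trigger_log:  features selected together for each heartbeat.
    Returns the file contents and the vertex order (line i+1 <-> names[i]).
    """
    names = sorted(feature_cost)
    index = {name: i + 1 for i, name in enumerate(names)}  # hMETIS vertex IDs are 1-based
    # Identical feature sets collapse into one hyperedge weighted by its frequency
    counts = Counter(frozenset(s) for s in trigger_log if len(s) > 1)
    edges = sorted(counts.items(), key=lambda item: sorted(index[v] for v in item[0]))
    lines = [f"{len(edges)} {len(names)} 11"]
    for edge, count in edges:
        lines.append(" ".join(str(x) for x in [count, *sorted(index[v] for v in edge)]))
    lines.extend(str(feature_cost[name]) for name in names)
    return "\n".join(lines) + "\n", names


text, order = to_hmetis({"qrs": 12, "qt": 30, "rr": 5},
                        [{"qrs", "qt"}, {"qrs", "qt"}, {"rr"}, {"qt", "rr", "qrs"}])
print(text)
```

The resulting file could then be partitioned with the `shmetis` program (for example, `shmetis features.hgr 3 5` for three parts with an unbalance factor of 5), again per the hMETIS manual.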
We employed a hypergraph partitioning algorithm based on the multilevel paradigm. In this paradigm, a sequence of successively coarser hypergraphs is constructed. A bisection of the smallest hypergraph is computed and then used to obtain a bisection of the original hypergraph by successively projecting and refining it onto the next finer hypergraph. We used hMETIS, a hypergraph partitioning program implemented for PCs [28]. The same algorithm could be ported to a mobile computer such as a Pocket PC to facilitate dynamic reconfiguration. The vision is that the hypergraph information is collected in real time from the processing nodes of the wearable computer, after which the algorithms running on the motes are reconfigured. The number of partitions is determined as described below:
The preprocessing and pattern recognition tasks must be completed before the next heartbeat arrives. Let the heart rate be $N$ beats per minute (bpm). The heartbeat period, in seconds, is then:

$$T_{heartbeat} = \frac{60}{N}. \qquad (6)$$
Let the time required for preprocessing and pattern recognition be $t_{pre}$ and $t_{recog}$, respectively. Then:

$$t_{pre} + t_{recog} < \alpha \cdot T_{heartbeat}, \quad \text{where } \alpha < 1. \qquad (7)$$

The factor $\alpha$ is set to 0.9 to leave a margin that prevents overloading the processing units. Therefore, the maximum CPU time that may be assigned to pattern recognition is $\alpha \cdot T_{heartbeat} - t_{pre}$, where $t_{pre}$ is fixed and can be obtained from the profiling stage. As described earlier, the vertex weights represent the computational time required for each feature, and $W_k$ is defined in (4). Therefore, the following objective must be satisfied:

$$W_k \le \alpha \cdot T_{heartbeat} - t_{pre}, \quad k = 1, \ldots, K. \qquad (8)$$
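To determine the value of $K$, we consider the total time required for pattern recognition on all features, $T_{recog}$ (obtained from the profiling analysis). Since the part weights sum to $T_{recog}$ and each part is bounded by (8), a lower bound on $K$ follows directly:

$$K \ge \left\lceil \frac{T_{recog}}{\alpha \cdot T_{heartbeat} - t_{pre}} \right\rceil. \qquad (9)$$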
Once partitioning is performed for this value of K, the solution may be imbalanced and violate the constraint in (8). In this case, K is incremented and the features are repartitioned until a feasible solution is found.
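The procedure for choosing K (compute the per-beat CPU budget, start from the lower bound, and increment K until every partition fits) can be sketched as below. The partitioner is a swapped-in stand-in: a greedy longest-processing-time-first assignment that ignores hyperedges, used only to make the loop runnable; in the described system, hMETIS would take its place. All helper names and the feature costs in the usage line are hypothetical.

```python
import math


def heartbeat_period(bpm: float) -> float:
    """T_heartbeat = 60 / N seconds for a heart rate of N beats per minute."""
    return 60.0 / bpm


def lpt_partition(costs: dict[str, float], k: int) -> dict[str, int]:
    """Stand-in partitioner: longest-processing-time-first greedy load balancing."""
    loads = [0.0] * k
    part = {}
    for feature in sorted(costs, key=costs.get, reverse=True):
        target = min(range(k), key=loads.__getitem__)  # least-loaded node
        part[feature] = target
        loads[target] += costs[feature]
    return part


def choose_k(costs, bpm, t_pre, alpha=0.9, partition=lpt_partition, k_max=64):
    """Smallest K, from the lower bound upward, whose partition meets W_k <= budget."""
    budget = alpha * heartbeat_period(bpm) - t_pre  # CPU time per node per heartbeat
    if budget <= 0 or max(costs.values()) > budget:
        raise ValueError("no partition can meet the per-beat deadline")
    k = max(1, math.ceil(sum(costs.values()) / budget))  # lower bound on K
    while k <= k_max:
        part = partition(costs, k)
        loads = [0.0] * k
        for feature, node in part.items():
            loads[node] += costs[feature]
        if max(loads) <= budget:
            return k, part
        k += 1  # imbalanced solution: add a partition and repartition
    raise RuntimeError("no feasible K found up to k_max")


# Hypothetical per-feature detection times in seconds, 120 bpm, t_pre = 50 ms
k, part = choose_k({"a": 0.35, "b": 0.35, "c": 0.1}, bpm=120, t_pre=0.05)
print(k, part)  # 3 {'a': 0, 'b': 1, 'c': 2}
```

In this example the lower bound gives K = 2, but the greedy split leaves one node at 0.45 s against a 0.4 s budget, so the loop increments to K = 3, mirroring the repartitioning step described above.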
SIMULATION ANALYSIS
This section presents several simulation analyses performed to demonstrate the effectiveness of our technique. All experiments were carried out with ECG signals from the MIT-BIH Arrhythmia database, which contains 48 half-hour excerpts of two-channel ambulatory ECG recordings obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. The recordings were digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range. We used all 48 complete records, which are freely available from PhysioNet [29], and repeated the experiments after downsampling all benchmarks to 200 and 100 samples per second. As illustrated in Table 2, each MIT-BIH record contains recordings from two channels; for simplicity, we used only the first channel. In the MIT/BIH benchmarks, the chest electrodes were originally selected because of their low noise level. We performed profiling analysis on the algorithm described in Section 3 using Avrora to compute the computational delay of the feature detection modules. The ECG algorithm was ported both to dot-motes (SOS) and to PCs; the PC version was written in C. The feature and hypergraph extraction simulations were run on a PC because of several software instabilities we encountered in SOS. For hypergraph partitioning, we used hMETIS. The MIT/BIH benchmarks were used at three sampling rates, as illustrated in Tables 3, 4, and 5: the original 360 samples/sec, and 200 and 100 samples/sec obtained by downsampling. In Tables 3, 4, and 5, two configuration scenarios were considered. In the first scenario, features were adaptively assigned to processing units based on hypergraph partitioning (adaptive partitioning).
In the second scenario, the optimized configuration was determined using hypergraph partitioning on benchmark 100 and remained fixed throughout our experiments (fixed configuration). The number of partitions was obtained from (9) for each benchmark. Table 3 lists the number of queries exchanged in both scenarios. Because the experiments were carried out through simulation, we were unable to measure the wireless power consumption directly. However, given the 23 features we examined, each query fits in a single dot-mote wireless packet (30 bytes). Therefore, by (1), the wireless energy consumption is proportional to the number of queries exchanged. On average, communication energy consumption was reduced by approximately 70 percent across all sets of experiments. The wireless communication overhead of partitioning was negligible because our hypergraphs are small, sparse, and slowly changing. Reconfiguration was performed only once per benchmark, so its effect on system performance was negligible.

His recent research interests lie in the area of embedded and reconfigurable computing, VLSI CAD, and design and analysis of algorithms. Dr. Sarrafzadeh is a fellow of the IEEE for his contribution to "Theory and Practice of VLSI Design." He is also a member of the IEEE Computer Society. He received a US National Science Foundation Engineering Initiation award, two distinguished paper awards in ICCAD, and the best paper award in DAC. He has served on the technical program committee of numerous conferences in the area of VLSI Design and CAD, including ICCAD, DAC, EDAC, ISPD, FPGA, and DesignCon. He has served as a committee chair of a number of these conferences. He is on the executive committee/steering committee of several conferences such as ICCAD, ISPD, and ISQED.
Professor Sarrafzadeh has published approximately 250 papers, is a coeditor of the book Algorithmic Aspects of VLSI Layout (World Scientific, 1994), and coauthor of the books An Introduction to VLSI Physical Design (McGraw Hill, 1996) and Modern Placement Techniques (Kluwer, 2003) . Dr. Sarrafzadeh is on the editorial board of the VLSI Design Journal, an associate editor of ACM Transactions on Design Automation (TODAES), and an associate editor of the IEEE Transactions on Computer-Aided Design (TCAD).
