Abstract-Wearable devices running advanced bio-signal analysis algorithms aim to foster a revolution in healthcare provision for chronic cardiac diseases. In this context, energy efficiency is of paramount importance, as long-term monitoring must be ensured while relying on a tiny power source. Operating at a scaled supply voltage, just above the threshold voltage, saves substantial energy, but it makes circuits, and especially memories, more prone to errors, threatening the correct execution of algorithms. Error detection and correction codes can protect the entire memory content; however, they incur large area and energy overheads, which may not be compatible with the tight energy budgets of wearable systems.
I. INTRODUCTION AND MOTIVATION
The emergence of wearable devices for long-term acquisition of cardiac signals (electrocardiogram or ECG) promises a paradigm shift in the monitoring of chronic heart-related conditions. State-of-the-art wearable cardiac sensors are not limited to sensing and (wirelessly) transmitting the acquired data; they also provide advanced Digital Signal Processing (DSP) capabilities to analyze bio-signals on-node and extract clinically relevant features [1, 2, 3]. Diverse applications have been proposed, ranging from the automated detection of epileptic seizures [12] to the predictive risk assessment of atrial fibrillation [13].
In this context, power spectral analysis (PSA) of the heart rate variability (HRV) is among the most widely employed strategies, as it allows the monitoring of various health conditions associated with the heart as well as other organs [4, 5], providing valuable frequency-domain medical indicators. The implementation of PSA on ultra-low power embedded devices requires a carefully tailored digital architecture.
A key element in these devices is the memory subsystem, where a significant amount of energy is consumed [1]. To maximize energy efficiency, an effective approach is to scale down the supply voltage (Vdd). Aggressive supply voltage scaling leads to quadratic energy savings, but makes circuits (and especially SRAM cells) prone to errors, compounding the reliability issues present in nanometer technologies. Larger memory bit-cells [15] and error detection and correction codes (ECC) [17] can help in dealing with errors induced by a scaled Vdd. However, such mechanisms impose large energy and area overheads. Ensuring the correctness of run-time execution in digital systems is therefore a major challenge, due to the increase in variability derived from technology scaling and near-threshold voltage supplies.
A striking alternative, often referred to as approximate computing, is to take advantage of the resilient nature and statistical properties of bio-signal DSP applications such as filtering and feature extraction. As an energy-saving strategy, the approximate computing paradigm relaxes reliability constraints when errors have a negligible impact from an application perspective. Algorithms in the embedded health monitoring domain operate on noisy acquisitions, while often presenting statistical or qualitative outputs [18]. For such scenarios, we extend the observations from our previous work [18] and argue that 100% exactness need not be provided in all cases, as doing so is extremely expensive from an energy efficiency viewpoint. In this paper, we propose to investigate the application of the approximate paradigm in bio-signal analysis and feature extraction applications and utilize their statistical properties for limiting the overhead of classical ECC schemes. Our contributions can be briefly described as:
1) We study the statistical properties of a PSA of heart rate variability system and classify data elements in the intermediate steps of the algorithm into significant and less-significant based on their contribution to output quality.
2) We apply a significance-based memory protection [...] in PSA systems and evaluate the energy gains [...] compared to traditional full ECC scheme.
3) We exploit the statistical properties of [...] elements produced by the PSA system [...] alternative mitigation scheme. The scheme, rather than [...] to correct the errors, only limits their impact [...] quality by replacing the erroneous data with [...] value that is the best available statistical [...] incorrect data.
4) We compare the proposed scheme [...] techniques such as ECC and analyze the energy [...] the quality loss on benchmarks of ECG recordings [...] embedded devices.
The rest of this paper is structured as follows. Section II describes the target PSA application in the [...] signals. Section III describes the impact of [...] memories and presents state-of-the-art techniques [...] address them. Section III describes the proposed [...] smart wearable sensors. It also analyzes the [...] content of the buffers employed by an [...] implementation, justifying our choice of a [...] scheme. Section IV presents experimental [...] presents experimental evidence showcasing how [...] scheme can achieve a high quality-of-service [...] embedded systems, in the presence of errors [...] small fraction of the memory buffers.
II. POWER SPECTRAL ANALYSIS (PSA)
The power spectral analysis (PSA) of the [...] proposed as a powerful technique to evaluate [...] control of the heart rate [10]. Such a system, [...] 1, is composed of 4 essential steps: a) in the first [...] difference between consecutive heartbeats [...] intervals) is extracted by processing the recorded [...] a fixed size window. Due to the non-periodic [...] intervals, Fast Lomb method is considered as [...] method for estimating the power spectrum [...] [6]. According to Fast Lomb, in the next step [...] RR intervals are extrapolated to a fixed size [...] samples), which are then in step c) processed [...] specific trigonometric functions [6]. A fast method [...] used at this step for estimating these complex [...] as the Fast-Fourier-Transform (FFT). In our [...] we used a Wavelet-based FFT, which was [...] sparsity (most elements close [...] substantial energy gains [6]. In this modified algorithm, initially a Discrete Wavelet Transform (DWT) of size N is applied on the input signal, followed by butterfly operations similar to the conventional FFT but with modified, simpler coefficients [6] [...] also in Figure 1. Finally, the Lomb [...] data, estimating the real-time power [...] the clinical practice, the most used [...] the ratio between the power in low- [...] as 0.04 - 0.15 Hz) and high-frequency [...] with LFHF Ratio = LFP/HFP. A deviation [...] above or below normal values is [...] issues [10].
The buffers required by this implementation [...] of the extrapolation step (composed [...] the output of the DWT transform [...] arrays).
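To make the LFHF metric concrete, the following minimal sketch computes LFHF Ratio = LFP/HFP from an already-estimated one-sided power spectrum. The low-frequency band (0.04 - 0.15 Hz) is taken from the text; the high-frequency band limits (0.15 - 0.40 Hz) are an assumption based on common HRV practice, and the function names are hypothetical.

```python
import numpy as np

LF_BAND = (0.04, 0.15)  # Hz, as stated in the text
HF_BAND = (0.15, 0.40)  # Hz, ASSUMED: conventional HRV high-frequency band


def band_power(freqs, psd, band):
    """Sum the spectral power of the bins falling inside [low, high)."""
    low, high = band
    mask = (freqs >= low) & (freqs < high)
    return float(np.sum(psd[mask]))


def lf_hf_ratio(freqs, psd):
    """LFHF ratio = LFP / HFP for a one-sided power spectrum."""
    return band_power(freqs, psd, LF_BAND) / band_power(freqs, psd, HF_BAND)


# Toy spectrum: two bins in the LF band, two bins in the (assumed) HF band.
freqs = np.array([0.05, 0.10, 0.20, 0.30])
psd = np.array([2.0, 2.0, 1.0, 1.0])
ratio = lf_hf_ratio(freqs, psd)
```

The spectrum itself would come from the Lomb-based pipeline described above; this sketch only covers the final ratio computation.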
III. UNRELIABLE MEMORIES AND PROPOSED APPROACH
As discussed in Section I, aggressive voltage scaling can induce a non-zero probability of erroneous reads and writes to the memory subsystem. In [7], bit-flip probabilities of 0.22% and 0.07% are reported for 6-transistor SRAMs supplied at 0.6V and 0.65V, respectively, implemented on a 40nm technology. More resilient memory topologies (such as 8-transistor cells and SCMEMs) do allow reliable operation at these voltage levels; however, they incur high area and energy overheads [7]. Alternatively, error detection and correction techniques can be employed to recover from bit-flip events, but they also add non-negligible complexity, from an area as well as an energy perspective.
Our proposed method minimizes such overhead by judiciously employing detection and correction of errors depending on the criticality of the stored data, providing high correctness guarantees only to its most critical part, as dictated by the application characteristics. Figures 2 and 3 show the typical data distribution across the two buffers used in the PSA application. Figure 3 highlights how the elements of the DWT_buffer are mostly centered on zero, and are therefore sparse, while elements of Extr_buffer (cf. Figure 2) have a non-sparse distribution.
Different data distribution patterns require different protection approaches. Intuitively, the most suitable protection scheme for the non-sparse Extr_buffer is to protect the Most Significant Bits (MSBs) of every word with an ECC code, as they have a larger influence on the output. Conversely, a small, but non-zero, probability of a bit-flip in the Least Significant Bits (LSBs) can be allowed (Figure 4). In the case of DWT_buffer, we discriminate between significant and non-significant words, instead of significant bits. In fact, as most of the elements of this buffer are close to zero, it is possible to replace them with their expected value (zero) if an error occurs, which can be detected by employing a simple parity check. For the rest (which, in the PSA application, reside in the low-frequency range) a more expensive error correction capability must be provided; in our case, ECC (Figure 5).
Such a partition between significant and non-significant words can be performed statically, i.e., independently of the particular window of inputs being processed. It is in fact derived from the inherent properties of the DWT, and the resulting separation of the processed data into high and low frequencies. It must be noted that a difference exists between the strategies adopted for non-significant bits in non-sparse buffers (e.g., LSBs in Extr_buffer) and non-significant words in sparse buffers (e.g., near-zero values in DWT_buffer). In the first case, errors remain completely undetected while, in the second, a parity check detects the error and invalidates the corresponding word, but no correction is required. In both cases, the aim is to have a negligible deviation in the end results of the application, while greatly diminishing the protection overhead. In the case of significant bits and words, each bit-flip is detected and corrected, because it strongly affects the quality of the output.
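To give a sense of scale for these per-bit probabilities, the sketch below converts them into the probability that a stored word contains at least one flipped bit. It assumes independent bit-flips and 32-bit words (the word width used in Section IV); the function name is hypothetical.

```python
def word_error_probability(bit_flip_prob, word_bits=32):
    """Probability that at least one bit of a word flips, assuming independent flips."""
    return 1.0 - (1.0 - bit_flip_prob) ** word_bits


# Per-bit probabilities quoted in the text for 0.65V and 0.6V supplies.
for vdd, p in ((0.65, 0.0007), (0.60, 0.0022)):
    print(f"Vdd = {vdd:.2f} V: P(word has >= 1 flip) = {word_error_probability(p):.2%}")
```

Under these assumptions, roughly 2% of the words at 0.65V and roughly 7% at 0.6V would be affected, which illustrates why leaving the whole memory unprotected is not an option.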
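The treatment of non-significant words in the sparse buffer can be sketched as follows: a parity bit is stored with each word, and a failed parity check on read replaces the word with its expected value (zero). This is only an illustrative software model with hypothetical function names, assuming even parity over 32-bit words; it is not the hardware implementation.

```python
def parity_bit(word):
    """Even-parity bit of a 32-bit word: 1 if the number of set bits is odd."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


def store(word):
    """Model a write: keep the data word together with its parity bit."""
    return (word & 0xFFFFFFFF, parity_bit(word))


def load_non_significant(stored, expected_value=0):
    """Model a read of a non-significant DWT_buffer word.

    If the parity check fails, the word is invalidated and replaced by its
    expected value (zero for near-zero DWT coefficients).
    """
    word, parity = stored
    return word if parity_bit(word) == parity else expected_value


clean = store(3)
corrupted = (clean[0] ^ (1 << 20), clean[1])  # single bit-flip in the data bits
```

Here `load_non_significant(clean)` returns the stored value, while `load_non_significant(corrupted)` returns zero instead of a value that is off by 2^20. A single parity bit only detects an odd number of flipped bits, so an even number of flips in one word would go unnoticed in this model.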
IV. EXPERIMENTAL SETUP AND RESULTS
To evaluate the proposed heterogeneous protection scheme, we developed a high-level fault simulation environment, executing the entire target application. In this way, we evaluated the impact of errors in the intermediate buffers on the quality of the PSA output, which was compared, under different protection schemes, to a fault-free execution. Single bit-flip errors in the buffers are considered with probabilities of 0.07% and 0.22%, corresponding to the behavior of a 6-transistor SRAM at 0.65V and 0.6V, respectively [6] .
Input ECG data was retrieved from the PAF prediction challenge database, available on the Physionet portal [8] . The database includes 100 recordings, of 30 minutes each. We have considered input data windows of 2, 4 and 6 minutes, with an overlap of 1, 3 and 5 minutes, respectively. Results from each recording and each window size are averaged in the presented results.
To obtain fair results, we employed error masks, forcing bit-flips in random locations of the buffers if they refer to unprotected regions. Different error masks are employed for each processed input window and for each buffer, but the same set is used across all protection configurations. For all buffers, data is represented with 32-bit words. For the Extr_buffer, we explored a protection of the 8, 16, or 32 (all) most significant bits, while for DWT_buffer we assumed a protection of the 5%, 10% or 15% most significant memory words.
In the following subsections, we explore the output degradation induced by bit-flips in the buffers (IV-A), the energy overhead of different protection schemes (IV-B) and the trade-off between energy efficiency and quality of service (IV-C).
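One possible way to build such error masks for the bit-level protection of Extr_buffer is sketched below with NumPy. Each word gets an XOR mask whose bits are set with the given bit-flip probability, and the bits covering the protected MSBs are cleared. The function name, the seed and the all-zero placeholder buffer are hypothetical; the 2048-word size corresponds to one 8K-byte buffer of 32-bit words.

```python
import numpy as np


def make_error_mask(n_words, bit_flip_prob, protected_msbs, rng, word_bits=32):
    """Return one uint32 XOR mask per word, with flips only in unprotected LSBs."""
    flips = rng.random((n_words, word_bits)) < bit_flip_prob  # column j <-> bit j
    flips[:, word_bits - protected_msbs:] = False             # protected MSBs never flip
    weights = np.uint64(1) << np.arange(word_bits, dtype=np.uint64)
    return (flips.astype(np.uint64) * weights).sum(axis=1).astype(np.uint32)


rng = np.random.default_rng(seed=0)       # fixed seed so the same masks can be replayed
mask = make_error_mask(2048, 0.0022, protected_msbs=16, rng=rng)
buffer = np.zeros(2048, dtype=np.uint32)  # placeholder buffer contents
faulty = buffer ^ mask                    # inject the bit-flips
```

Re-creating the generator with the same seed for every protection configuration draws the same underlying flip pattern, so configurations differ only in which flips are suppressed by the protection.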
A. Error Analysis
Figures 6 and 7 compare the percentage error in the computation of the LFHF ratio at supply voltages of 0.65V (bit-flip probability = 0.07%) and 0.6V (bit-flip probability = 0.22%), respectively.
Results highlight that the selective protection of significant words in sparse buffers can guarantee high-quality results with little overhead. In fact, for a voltage supply of 0.65V (Figure 6), less than 1% error in the LFHF ratio can be achieved by protecting only the 15% most significant words in the DWT_buffer, when the non-sparse Extr_buffer is error free. As expected, when the ratio between significant and non-significant words in DWT_buffer is reduced, the PSA error increases. Nevertheless, it remains rather low (3%) even when only the 5% most significant words are protected. Conversely, the protection of the non-sparse Extr_buffer presents more challenges: even when protection against bit-flips is provided for the 16 MSBs of each word (which corresponds to protecting half of the buffer content), a noticeable decrease in quality of service can still be observed (square-dotted orange line in Figure 6). The same trends can be noticed in Figure 7 for a lower voltage supply of 0.6V and the corresponding higher bit-flip probability. Interestingly, even in this case the deviation in the LFHF ratio, with respect to a fault-free execution, can be bounded to 5% by allowing errors in the 16 LSBs of Extr_buffer and only checking (but not correcting) errors in 90% of DWT_buffer.
B. Energy Analysis
We comparatively evaluated the energy overhead induced by the memory protection configurations by modeling different schemes using CACTI [11] in the McPAT framework [17] . The Extr_buffer requires two buffers, each of 8K Bytes, while the DWT_buffer is composed of four buffers of the same size. The operating temperature was assumed to be 300K. The technology node employed in the simulation of the memories was 40nm. All wirings were considered to be global, and the interconnect projection was taken as conservative. The memories have a single port used for both reading and writing. Also, for our purpose, we assumed that each of the memories consists of a single bank connected using a bus.
Relevant metrics, output of the CACTI model, consist of the dynamic read and write energies per access and the leakage power of the target memory configurations. To derive the corresponding dynamic energy, we retrieved the number of accesses to the different buffers from the high-level model of the application. In addition, the leakage energy was estimated by considering the execution time of an optimized version of PSA running on an ARM Cortex M4 processor running at 180MHz.
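The energy estimate described above combines a dynamic part (per-access energies multiplied by access counts) with a leakage part (leakage power multiplied by execution time). A minimal sketch of that bookkeeping follows; the function name is hypothetical and the numbers in the example are placeholders, not values from CACTI or from the application profile.

```python
import math


def memory_energy(n_reads, n_writes, e_read, e_write, p_leak, exec_time):
    """Total memory energy in joules: dynamic (per-access) part plus leakage."""
    dynamic = n_reads * e_read + n_writes * e_write
    leakage = p_leak * exec_time
    return dynamic + leakage


# Placeholder inputs (hypothetical): access counts, J/access, leakage in W, time in s.
example = memory_energy(n_reads=10, n_writes=5,
                        e_read=2e-12, e_write=3e-12,
                        p_leak=1e-6, exec_time=2.0)
```

In practice the access counts would come from the high-level application model and the per-access energies and leakage power from the memory model of each protection configuration.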
Additional storage is required to support data protection. In the case of Extr_buffer (protection of most significant bits), considering single error detection and correction per memory word, 6 extra ECC bits are required for each word when all 32 data bits are protected. Fewer ECC bits are used when only the most significant part of each word is protected: 5 and 4 bits in the case of 16- and 8-MSB protection, respectively. A maximum of one error occurring in a memory word has been assumed, and the number of ECC bits is chosen accordingly.
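These bit counts agree with the standard bound for single-error-correcting Hamming codes, where r check bits can protect k data bits if 2^r >= k + r + 1. The sketch below assumes a Hamming-style code (the text does not name the specific code) and uses a hypothetical function name.

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming bound)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


for k in (32, 16, 8):
    print(f"{k} protected data bits -> {sec_check_bits(k)} check bits")
```

For 32, 16 and 8 protected bits this yields 6, 5 and 4 check bits, matching the figures quoted above.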
In any case, a larger number of errors in each word can also be assumed, which will require higher-order ECC with more redundant bits to detect and correct all the errors. Note that this will lead to higher overheads for the conventional case, as opposed to the proposed scheme, which acts (for the less-significant words) irrespective of the number of errors, replacing erroneous words based on the detection of a single error. Therefore, focusing on one failure per word proves the efficacy of the proposed scheme for the popular case of single bit-flips while being fair to the conventional schemes.
In the case of DWT_buffer (protection of most significant words), a single parity bit is employed for non-significant words, allowing error detection. Single error correction is instead supported for significant words by employing a 6-bit error correction code. To simulate this heterogeneous structure in CACTI, we considered the two corner cases where all data is either employing 1-bit parity or 6-bit ECC. To derive the dynamic energy per read access of intermediate configurations, we employed the formula described in (1), where E_p is the read energy per access in the protected memory, E_u is the read energy per access in the unprotected memory, p is the percentage of words considered significant, and E_t is the net read energy per access in the heterogeneous memory. The write energy per access and the leakage power of the hybrid memory were also calculated in the same manner.
Figure 8 represents a scenario with the maximum possible number of accesses to the hybrid memories for running the application using the RR intervals from a patient recording of 5 minutes. This figure shows the energy consumption of the memory subsystem in an optimized embedded system where power management makes the [...] 2.67s for the processing of a five [...] patient. The variable S represents [...] respect to the baseline configuration [...] protected in the Extr_buffer and [...] protected in the DWT_buffer. It [...] all data entails a significant energy [...] with respect to baseline configuration [...] having a 16-bit MSB protection in [...] of significant words protection in DWT_buffer [...] energy overhead is acquired. As [...] scheme induces a negligible quality [...] just 2% at 0.65V.
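The per-access energy of intermediate configurations, referred to as (1) above, is derived from the two corner cases. A weighting of E_p and E_u by the share of significant words is consistent with the definitions of E_p, E_u, p and E_t; the sketch below assumes that linear form, with p expressed as a fraction between 0 and 1, and should be read as an assumption rather than as the exact published formula. The function name is hypothetical.

```python
def hybrid_energy_per_access(e_protected, e_unprotected, p_significant):
    """Weighted per-access energy E_t, ASSUMING E_t = p * E_p + (1 - p) * E_u."""
    if not 0.0 <= p_significant <= 1.0:
        raise ValueError("p_significant must be a fraction in [0, 1]")
    return p_significant * e_protected + (1.0 - p_significant) * e_unprotected


# Placeholder energies (hypothetical units): 10% of the words are significant.
e_t = hybrid_energy_per_access(e_protected=2.0, e_unprotected=1.0, p_significant=0.10)
```

With p = 0 or p = 1 the function returns the corresponding corner case, and the same weighting could be reused for the write energy per access and the leakage power.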
C. Energy / Quality-of-Service Trade-off
The proposed methodology can [...] proper memory configuration when [...] given for the target signal [...]. Alternatively, it can be used to [...] solution for a fixed energy budget.
V. CONCLUSION
In this paper, we have explored the energy benefits that can be obtained by applying hybrid data protection schemes to embedded memories for ultra-low power wearable monitoring systems. Working within a well-known, specific application domain allows designers to implement algorithms that exploit significance-based computing to effectively reduce the energy consumption, while keeping the output error under a certain threshold and providing outputs of high enough quality. Our experiments show that, by adopting different correctness guarantees for words and bits of varying significance from a digital signal processing viewpoint, the proposed approach can effectively reduce the energy overhead implicit in data protection, while minimally impacting the end-to-end quality of service of the target Power Spectral Analysis application.
The illustrated methodology is applicable to many real-world applications in the embedded health monitoring domain beyond PSA, because they share the same characteristics of processing noisy inputs, providing statistical or qualitative outputs and adopting a sparse representation in intermediate buffers.
Experimental evidence highlights that heterogeneous protection reduces the energy budget of the data memory used to store the buffers in real-life wearable ECG analysis systems by approximately 11%. Moreover, the proposed approach tolerates high error rates, potentially allowing more aggressive voltage/frequency scaling at system level. Hence, this observation opens promising avenues to be explored in the development of ultra-low power multi-modal wearable embedded systems.
