Abstract: Voltage scaling is a promising approach to reducing the power consumption of signal processing circuits. However, aggressive voltage scaling can introduce errors in the output signal, thus degrading the algorithmic performance of the circuit. We consider the specific case of the finite impulse response (FIR) filter, and identify two different sources of errors occurring due to voltage scaling: (a) errors introduced because of increased delay along the logic path and (b) errors caused by failures in the memory due to process variations. We design an FIR filter that uses a simple feedback-based approach to reduce the memory errors and a linear predictor structure to correct the logic errors. The proposed filter is more robust to both logic and memory errors caused by voltage scaling. The results show an improvement of about 10 dB or more in the output Signal to Noise ratio, even for a probability of error (P_err) as high as 0.5. We also apply the proposed technique to an image filtering application and observe a considerable improvement in the visual quality of the output image, along with an improvement of over 10 dB in the Peak Signal to Noise ratio for P_err as high as 0.5.
I. INTRODUCTION
Voltage scaling [1], [2] helps to reduce the power consumption in digital CMOS circuits. For signal processing applications, maintaining the circuit throughput is of greater value than performing computations faster than the sampling rate. This understanding allows designers to adopt a suitable supply-voltage scaling strategy to reduce dynamic, short-circuit and leakage power [3]. However, aggressive supply voltage scaling introduces errors in the circuit output and degrades the functional reliability of the circuit. Hence an effective voltage scaling strategy demands effective control of these errors.
The use of multiple supply voltages and adaptive voltage scaling for low power FIR filtering has been proposed in [4]. Several different approaches, including voltage scaling, for realizing low power implementations of FIR filters have been presented in [5]. In [6], [7], the authors have proposed approaches to compensate for the degradation in the algorithmic performance of signal processing circuits caused by voltage scaling. Similarly, we suggest using voltage scaling for low power FIR filtering and propose a methodology to improve the functional reliability of the FIR filter that will allow aggressive voltage scaling with minimum impact on the signal fidelity. However, in contrast to other approaches, we identify and compensate for two different types of errors occurring in a voltage-scaled FIR filter: the logic errors and the memory errors. To correct memory errors we propose a feedback-based approach which adapts the supply voltage for the coefficient memory when memory errors are detected. Finally, by including both the logic and the memory error correction circuitry, we show that the voltage-scaled noisy filter can operate at much higher failure probabilities.
II. IDENTIFYING THE DIFFERENT SOURCES OF ERROR IN A VOLTAGE-SCALED FIR FILTER
The output of an N-tap FIR filter is calculated as follows,
$$y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k] \qquad (1)$$
Here, x[n], y[n] and h[n] are the input, the output and the impulse response of the filter respectively. The direct form implementation of this FIR filter is shown in Fig. 1. The filter elements can be classified into two broad categories: the logic elements, consisting of the multipliers, adders and shift registers, and the memory element, comprising the coefficient memory.
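The direct form computation described above can be sketched in a few lines. This is a minimal illustration, assuming zero initial conditions (x[n] = 0 for n < 0); the function name `fir_filter` and the 3-tap example coefficients are hypothetical and are not the 27-tap design used in the simulations.

```python
def fir_filter(h, x):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k].

    Samples before the start of x are treated as zero (an assumption).
    """
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(n_taps):
            if n - k >= 0:
                # One multiply-accumulate per stored coefficient,
                # i.e. N coefficient reads per output sample.
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    h = [0.25, 0.5, 0.25]          # illustrative coefficients only
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(fir_filter(h, impulse))  # the impulse response reproduces h
```

The inner loop makes the split between the two element categories visible: the multiply and add are the logic path, while each `h[k]` access is a read from the coefficient memory.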
In an implementation of an FIR filter on a generic DSP architecture, the filter coefficients are stored in one of the memory spaces (program or data memory). Since many programmable DSPs are implemented using fully static CMOS technology, the dynamic power (P_d) can be significantly reduced using voltage scaling [5]. Since the logic path is active while computing every output sample, supply voltage scaling for the logic elements can be very effective in reducing the power consumption of the circuit. However, it is also important to observe that for each output sample, N coefficients need to be accessed from the memory. Thus, to effectively minimize the overall power consumption, voltage scaling needs to be implemented for both the logic and the memory elements. However, scaling the supply voltage increases the error rate of both logic and memory, and introduces two different types of errors in the output, as explained in the following subsections.
A. Logic error
If T_s denotes the sampling period of the circuit and T_cp denotes the critical path delay of the circuit, then the condition to ensure an error-free output is T_cp ≤ T_s [7].
If V_dd-crit denotes the critical supply voltage (defined as the voltage at which T_cp = T_s), then voltage overscaling (reducing V_dd below V_dd-crit) will increase the critical path delay, which introduces random errors at the output.
These random errors modify (1) as follows,
$$\hat{y}[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k] + \varepsilon[n] \qquad (2)$$
where ŷ[n] is the corrupted output signal and ε[n] is the error. Thus the logic errors directly corrupt the output signal. For simulation purposes we designed a 27-tap low-pass FIR filter with a cutoff frequency of 1500 Hz and a transition band of 500 Hz using the MATLAB 'fdatool'. To simulate the logic errors we assume a bit-error probability (P_log-err), and randomly introduce bit-errors in the output signal.
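The logic-error simulation described above (random bit-errors in the output with a given bit-error probability) might look like the following. This is a sketch under stated assumptions: the output samples are taken to be 16-bit two's-complement integers, each bit is flipped independently, and the name `inject_bit_errors` is hypothetical. The text does not specify the word length or whether all bit positions are equally likely to fail.

```python
import random


def inject_bit_errors(samples, p_log_err, word_bits=16, rng=None):
    """Flip each bit of each output word independently with probability p_log_err.

    samples: signed integers that fit in word_bits (assumed fixed-point format).
    """
    rng = rng or random.Random()
    mask = (1 << word_bits) - 1
    corrupted = []
    for s in samples:
        word = s & mask                      # two's-complement bit pattern
        for bit in range(word_bits):
            if rng.random() < p_log_err:
                word ^= 1 << bit             # bit-error in this position
        if word >= 1 << (word_bits - 1):     # convert back to a signed value
            word -= 1 << word_bits
        corrupted.append(word)
    return corrupted


if __name__ == "__main__":
    clean = [100, -200, 300, -400]
    noisy = inject_bit_errors(clean, p_log_err=0.05, rng=random.Random(1))
    print(list(zip(clean, noisy)))
```

Because a flipped most significant bit changes the sample by a large amount, errors produced this way can be far larger than typical rounding noise.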
B. Memory error
As mentioned previously, an FIR filter can be implemented on a generic programmable DSP by storing the coefficients in the on-chip memory. For example, the TMS320C2x [8] provides a total of 544 16-bit words of on-chip SRAM, partitioned into program or data memory. Either of these spaces can be used as coefficient memory and be subjected to voltage scaling to reduce the overall power consumption. However, supply voltage scaling in the memory leads to memory errors. This happens because voltage scaling increases the sensitivity of the memory elements to manufacturing variations [9]. As a result, a larger number of memory failures occur at reduced supply voltages; these failures corrupt the stored values of the filter coefficients and result in memory errors.
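The frequency response of an FIR filter is given by
$$H(\omega) = \sum_{n=0}^{N-1} h[n]\, e^{-j\omega n} \qquad (3)$$
However, due to corruption of the filter coefficients, the actual transfer function gets modified to,
$$\hat{H}(\omega) = \sum_{n=0}^{N-1} \hat{h}[n]\, e^{-j\omega n} = H(\omega) + E(\omega) \qquad (4)$$
where ĥ[n] denotes the corrupted coefficients and
$$E(\omega) = \sum_{n=0}^{N-1} \left(\hat{h}[n] - h[n]\right) e^{-j\omega n}.$$
As seen in (4), the corrupted frequency response can be separated into two components: the original frequency response (H(ω)) and the error response (E(ω)). Clearly, the memory errors modify the frequency response of the filter.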
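The separation into an original response and an error response follows from the linearity of the sum over coefficients, and can be checked numerically. The sketch below is illustrative only: the coefficient values and the names `freq_response` and `decomposition_gap` are assumptions, not part of the original design.

```python
import cmath


def freq_response(h, omega):
    """Evaluate sum over n of h[n] * exp(-j * omega * n) at one frequency."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))


def decomposition_gap(h, h_corrupt, omega):
    """Return |H_corrupt(w) - (H(w) + E(w))|, which should be ~0."""
    err = [hc - c for hc, c in zip(h_corrupt, h)]   # per-coefficient error
    lhs = freq_response(h_corrupt, omega)
    rhs = freq_response(h, omega) + freq_response(err, omega)
    return abs(lhs - rhs)


if __name__ == "__main__":
    h = [0.25, 0.5, 0.25]            # illustrative designed coefficients
    h_corrupt = [0.25, 0.375, 0.25]  # one coefficient corrupted
    for omega in (0.0, 0.5, 2.0):
        print(omega, decomposition_gap(h, h_corrupt, omega))
```

Since E(ω) depends only on the coefficient errors, a single corrupted coefficient perturbs the response at every frequency, which is why the resulting output errors are not isolated.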
The logic and memory errors described above can be somewhat likened to the rounding and coefficient-quantization errors in fixed-point implementations of digital FIR filters. The characteristics and effects of these rounding and quantization errors are very well understood [10], and hence robust structures can be designed to compensate for them [11]. However, the distribution of logic and memory errors is not well defined, which introduces additional difficulties in modeling these errors and developing approaches to reduce them. We suggest one possible implementation in the next section.
III. IMPROVING CIRCUIT ROBUSTNESS TO LOGIC AND MEMORY ERRORS

A. Forward path correction for logic errors
Logic errors introduce noise directly in the output signal. Since the output signal is readily available for further processing, it is possible to include error correction mechanisms in the forward path itself. Several existing approaches can be used for this. The approach mentioned in [6] uses a checksum-based probabilistic error correction technique. Algorithmic noise tolerance (ANT) based detection and correction of errors due to voltage overscaling is proposed in [7], [12]. We correct the logic errors by using an implementation similar to the one proposed in [7]. Here, an N_p-tap linear predictor is cascaded at the output of the noisy filter and used for error detection and correction as described below.
1) Threshold calculation: The linear predictor output is
$$y_p[n] = \sum_{k=1}^{N_p} h_p[k]\, y[n-k],$$
where h_p[k] are the predictor coefficients obtained by minimizing the mean-square prediction error. Then the threshold T = 4σ_{e_p} is calculated, where σ_{e_p} denotes the variance of the prediction error (e_p) with the noiseless digital filter.
2) Error detection: An error is declared when |ê_p[n]| > T, where ê_p[n] = ŷ[n] − ŷ_p[n] is the prediction error with the noisy filter.
3) Error correction: If an error is detected, the system output is replaced by the predictor output ŷ_p[n]. If no error is detected, the system output is the same as the output of the noisy filter (ŷ[n]).
The above scheme works under the assumptions that (a) the magnitude of the error introduced by the noisy filter is very large and (b) the interval between two successive errors is greater than 2N_p.
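A predictor-based detect-and-correct loop of the kind outlined above can be sketched as follows. This is not the authors' implementation: the predictor coefficients and threshold are passed in as given, the function name `predict_and_correct` is hypothetical, and the choice to predict from previously corrected outputs (rather than from the raw noisy outputs) is an assumption made for this illustration.

```python
def predict_and_correct(y_hat, h_p, threshold):
    """Replace samples whose prediction error exceeds the threshold.

    y_hat: noisy filter output.
    h_p:   predictor coefficients; h_p[k] weights the sample k + 1 steps back.
    Returns (corrected output, per-sample error flags).
    """
    n_p = len(h_p)
    out = list(y_hat[:n_p])          # first N_p samples pass through unchecked
    flags = [False] * len(out)
    for n in range(len(out), len(y_hat)):
        # Prediction from the N_p most recent (already corrected) outputs.
        y_pred = sum(h_p[k] * out[n - 1 - k] for k in range(n_p))
        if abs(y_hat[n] - y_pred) > threshold:
            out.append(y_pred)       # error detected: use the prediction
            flags.append(True)
        else:
            out.append(y_hat[n])     # no error: keep the filter output
            flags.append(False)
    return out, flags


if __name__ == "__main__":
    ramp = [float(n) for n in range(8)]
    noisy = list(ramp)
    noisy[5] += 1000.0               # one large, isolated error
    # A 2-tap predictor [2, -1] predicts a linear ramp exactly.
    corrected, flags = predict_and_correct(noisy, [2.0, -1.0], threshold=4.0)
    print(corrected, flags)
```

The toy example satisfies both working assumptions of the scheme: the error is large compared with the threshold, and it is isolated. Smaller or closely spaced errors would not be handled as cleanly.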
In our view, the above scheme requires some modifications, for the reasons described below. The coefficients (h_p[k]) of the linear predictor are obtained by minimizing the mean-squared prediction error. This requires solving the normal equations [13] of the form,
$$\sum_{k=1}^{N_p} h_p[k]\, R_n(|i-k|) = R_n(i), \quad i = 1, \ldots, N_p \qquad (5)$$
where R n is the short-time autocorrelation of the input signal. Thus the optimal predictor coefficients obtained from (5) 
B. Feedback correction for memory errors
The memory errors directly modify the frequency response of the filter. This may lead to errors in each and every output sample (since every output sample is now calculated from corrupted memory coefficients) and hence these errors are certainly not bursty or isolated. Moreover, there is no correlation between the different values of the coefficients. The linear predictor works on the assumption that neighboring samples of the input signal are correlated. This means that the linear predictor cannot be used to correct these memory errors. To satisfactorily address this problem the corrupted filter coefficients need to be corrected before the input signal is fed to the filter. We propose a feedback approach to identify and correct erroneous filter coefficients as described below.
Reducing the supply voltage to the SRAM array increases the probability of failure (due to read disturb, access or write failure) of the memory elements [9]. Thus we can reduce the memory errors if we can detect failures in the coefficient memory and then increase the supply voltage until the failure rate is zero, or at an acceptable value. Ideally, we would like to detect memory errors by comparing the corrupt coefficients directly with the designed coefficients. However, the true coefficient values are not available to us. What is available to us instead is the corrupted output signal. Hence we can indirectly detect memory errors by observing the deviation of the observed output from the expected output. For instance, let us assume that the expected output is y[n] = 0, ∀ n. Then, for this fixed value of y[n], a special input test pattern (x[n]) can be generated such that the error-free filter produces it. However, if the filter coefficients are corrupted, then y[n] ≠ 0 for at least one n = 0, ..., N − 1. Thus if y[n] ≠ 0, an error is detected and hence the supply voltage for the memory elements needs to be increased by a pre-defined step-size, i.e. V_dd = V_dd + ∆, where ∆ is the step-size.
The block diagram and the flowgraph are shown in Fig. 4. From the flowgraph it can be seen that V_dd is increased only after every N samples. Thus the entire test pattern (x[n]) is allowed to pass through the circuit, and if the sum of all the N output samples is found to be non-zero, then V_dd is increased by ∆. This is done because changing V_dd instantaneously after each non-zero sample is not practically possible. This process can continue until all the output samples settle to the expected value (i.e. 0) or until a pre-defined maximum value of the supply voltage (V_dd-max) is reached.
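The feedback loop described above can be sketched as a short calibration routine. This is a minimal illustration under stated assumptions: voltages are held as integer millivolts to avoid floating-point accumulation, the callable `run_test_block` (hypothetical) stands in for passing the N-sample test pattern through the filter at a given supply voltage, and the toy failure model in the demo is invented purely for illustration.

```python
def calibrate_vdd(run_test_block, vdd_start_mv, step_mv, vdd_max_mv):
    """Raise the memory supply in fixed steps until a test block reads all zero.

    run_test_block(vdd_mv) must return the N output samples for the test
    pattern, whose expected value is 0 for every sample.
    Returns (final supply in mV, True if the outputs settled to zero).
    """
    vdd = vdd_start_mv
    while True:
        outputs = run_test_block(vdd)
        # Decide once per block of N samples, not after every sample.
        if sum(abs(v) for v in outputs) == 0:
            return vdd, True
        if vdd + step_mv > vdd_max_mv:
            return vdd, False        # upper bound reached, errors remain
        vdd += step_mv               # V_dd = V_dd + delta


if __name__ == "__main__":
    def toy_block(vdd_mv, n=27):
        # Invented model: coefficient reads are error-free from 750 mV upward.
        return [0] * n if vdd_mv >= 750 else [1] + [0] * (n - 1)

    print(calibrate_vdd(toy_block, 200, 50, 1000))
```

Summing absolute values rather than raw samples avoids the case where positive and negative output errors cancel within a block.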
In SRAM, the read-disturb failures present the primary bottleneck for scaling the supply voltage in the coefficient memory. Hence we show our experiments by considering only the read-disturb failures. Fig. 5 shows the relation between the supply voltage (V_dd) and the read margin (data obtained from [9]). From Fig. 5 we can see that as V_dd increases, the read margin increases and hence the absolute error between the true coefficients and the corrupted coefficients reduces. Here we have assumed that a bit-error in the coefficient occurs if the read margin is below a threshold value (R_thresh = 50 mV).
C. Combining the two
The final implementation assumes that both logic and memory errors occur together, and includes both the feedback and the feedforward correction circuitry. The block diagram of the final implementation is shown in Fig. 6. The performance of the proposed configuration is discussed in the next section.
IV. RESULTS AND DISCUSSION
The final implementation combines the memory-error and the logic-error correction circuitry to improve the overall functional performance of the FIR filter. To obtain a quantitative measure of the performance improvement, the output SNR is calculated as SNR = 10 log_10(σ_s / σ_n), where σ_s is the variance of the error-free output y_ideal[n] and σ_n is the variance of the noise. Here, the noise is equal to y_ideal[n] − y_noisy[n] when there is no error correction mechanism, and equal to y_ideal[n] − y_corrected[n] when error correction is introduced. Thus we can obtain two SNRs, SNR_noisy and SNR_corrected, corresponding to y_noisy[n] and y_corrected[n] respectively. The input signal in these simulations is a set of 3000 samples extracted from a random speech signal.
Fig. 7. Variation of the output SNR with P_log-err
Fig. 7 shows the improvement in SNR_corrected over SNR_noisy due to the addition of the linear predictor based logic correction block as the bit-error probability (P_log-err) is increased. This figure is generated by assuming that the time interval between two successive isolated errors is 12 (< 2N_p) and that there are no memory errors. The performance is expected to deteriorate as the burst interval is decreased. Fig. 7 shows an improvement of around 15 dB in the SNR at lower values of P_log-err. However, the performance of the correction circuitry degrades with increasing P_log-err.
Fig. 8 shows the variation of the probability of memory error (P_mem-err) with supply voltage. From Fig. 8 we can observe that in the proposed memory-error correction scheme, increasing the supply voltage helps to bring down the probability of memory failures (due to the increasing read margin). To demonstrate the algorithmic robustness of the proposed filter we need to verify the functional performance of the combined circuit at lower operating voltages, or in other words at higher probabilities of bit-failures.
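The SNR measure defined above, the ratio of the variance of the error-free output to the variance of the noise expressed in dB, can be computed as in the following sketch. The function names are hypothetical, the population variance is used (the text does not say which estimator was applied), and returning infinity for a zero-noise output is a choice made here for convenience.

```python
import math


def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def snr_db(y_ideal, y_test):
    """10 * log10(var(y_ideal) / var(y_ideal - y_test)), in dB."""
    noise = [a - b for a, b in zip(y_ideal, y_test)]
    noise_var = variance(noise)
    if noise_var == 0.0:
        return math.inf              # no noise at all (assumed convention)
    return 10.0 * math.log10(variance(y_ideal) / noise_var)


if __name__ == "__main__":
    y_ideal = [1.0, -1.0, 1.0, -1.0]
    y_noisy = [0.5, -0.5, 0.5, -0.5]
    y_corrected = [0.9, -0.9, 0.9, -0.9]
    print("SNR_noisy     =", snr_db(y_ideal, y_noisy))
    print("SNR_corrected =", snr_db(y_ideal, y_corrected))
```

Calling the same function once with the noisy output and once with the corrected output gives the two quantities that are compared throughout this section.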
Thus, to verify the circuit robustness, we need to approximate the behavior shown in Fig. 5 and Fig. 8 and simulate these memory errors. To do this, a bit-failure probability (P_mem-err) is assumed, and then a randomly generated number for each bit of each coefficient is compared with P_mem-err to determine if it is corrupted. For example, if we fix the supply voltage to be V_dd = 0.2 V, then from Fig. 8 we can expect the errors in the coefficients to occur with an error probability of P_mem-err = 0.35, and then compute the corrupted filter coefficients.
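The per-bit comparison described above can be sketched as follows. This is an illustration under stated assumptions: the coefficients are taken to be stored as 16-bit two's-complement fractions (a Q15-style format, which the text does not specify), and the function name `corrupt_coefficients` is hypothetical.

```python
import random


def corrupt_coefficients(h, p_mem_err, word_bits=16, rng=None):
    """Corrupt stored coefficients one bit at a time.

    For every bit of every coefficient a random number is drawn and compared
    with p_mem_err; the bit is flipped when the draw falls below it.
    """
    rng = rng or random.Random()
    scale = 1 << (word_bits - 1)
    mask = (1 << word_bits) - 1
    corrupted = []
    for c in h:
        # Quantize to the assumed fixed-point storage format, with saturation.
        q = max(-scale, min(scale - 1, round(c * scale)))
        word = q & mask
        for bit in range(word_bits):
            if rng.random() < p_mem_err:
                word ^= 1 << bit
        if word >= scale:            # back to a signed value
            word -= 1 << word_bits
        corrupted.append(word / scale)
    return corrupted


if __name__ == "__main__":
    h = [0.25, -0.5, 0.125]
    print(corrupt_coefficients(h, p_mem_err=0.35, rng=random.Random(7)))
```

Lowering `p_mem_err` between calls mimics the effect of raising the memory supply voltage, which is how the simulation moves along the failure-probability curve.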
Thus, if the correction circuit starts with an operating value of supply voltage (V_dd), in the simulation we start by assuming an initial value of P_mem-err. Now, as the correction circuit increases V_dd in steps (to reduce memory errors), in the simulation we reduce P_mem-err in steps. Thus we basically move along the curve in Fig. 8. Also, as mentioned previously, the supply voltage adaptation terminates if V_dd = V_dd-max. For simulation purposes this upper bound can be specified in terms of the number of steps (max-steps), i.e. the number of times P_mem-err is reduced (or V_dd is increased).
Fig. 9. Illustration of the proposed memory error correction scheme.
Fig. 9 simulates the run-time behavior of the memory correction scheme. We start with V_dd = 0.2 V, or P_mem-err = 0.35. Each output sample is compared to 0. After every N output samples we check if $\sum_{n=0}^{N-1} |y[n]|$ is non-zero. If it is non-zero, then we increase V_dd, i.e. effectively reduce P_mem-err. This process continues until all the output samples settle to 0 (assuming that no bounds such as V_dd-max or max-steps are set). In our case N = 27. Thus from Fig. 9 we see that after every N = 27 samples V_dd is increased (and P_mem-err reduced). This continues for around 12 iterations (each iteration = N samples). At the end of the memory-correction phase all the output samples settle to 0, the normalized absolute error approaches 0, and V_dd and P_mem-err settle down to 0.75 V and 0.03 respectively. (The absolute error is normalized to [0, 1] for better illustration.)
Using these assumptions, we first obtain Fig. 10, which is generated by assuming that only memory errors are present. Fig. 10 shows the improvement in SNR_corrected over SNR_noisy due to the addition of the memory correction block as the probability of memory failure (P_mem-err) is increased. The figure shows that at lower values of P_mem-err a significant improvement of around 12 dB in the SNR is achieved by using the proposed memory correction scheme. The performance of the circuit deteriorates as the failure probabilities increase.
It is also necessary to remember that whenever memory errors are involved, the degradation in output SNR may not be monotonic. This is because not every coefficient of an FIR filter contributes equally to the output. Hence the amount of output degradation (or improvement on correction) actually depends on which coefficient is corrupted (and corrected).
Fig. 11. Variation of the output SNR with P_log-err and P_mem-err
Fig. 11 shows the improvement in SNR_corrected over SNR_noisy due to the addition of the logic and memory correction blocks as the bit-error probability (P_log-err) and the probability of memory failure (P_mem-err) are varied. The figure shows a constant improvement of around 10 dB in the output SNR for the entire range of P_err.
In all the above simulations the error correction circuits themselves are assumed to be error free. This is made possible by ensuring T_cp ≤ T_s for the correction circuits. The P_d for the logic error correction circuit can be minimized by using a linear predictor with a smaller tap-length or reduced precision. As demonstrated in [7], the P_d overhead due to the correction circuits is compensated for by the increased power savings afforded by voltage overscaling. Also, since the calibration of the optimal supply voltage for the coefficient memory happens offline (training phase), the memory error correction circuit does not contribute to the P_d during runtime.
Applications to Image Processing
Image filtering is very commonly used for image enhancement. For example, low-pass filtering is used for image blurring, noise removal etc., and high-pass filtering is used for edge detection, image sharpening etc. Since an image is a 2D signal, the impulse response of the FIR filter is also 2D. However, it is possible to decompose the 2D kernel into a set of orthogonal 1D sub-filters [14]. Thus the 2D filtering operation can be separated into successive 1D filtering operations along the rows and columns of the image. So the proposed correction technique can be applied directly to reduce the output degradation due to logic and memory errors in a voltage-scaled FIR filter for images. Here we have chosen image blurring as an example application. As seen in Fig. 12, the image (d) obtained by using the proposed logic and memory error correction schemes looks significantly better than the one without any correction (c). The output Peak Signal to Noise ratio (PSNR) is defined as,
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{Max}^2}{\mathrm{MSE}}\right).$$
Here, Max is the maximum possible pixel value of the image and MSE is the mean square error between the error-free and the noisy image. Thus we can obtain two PSNRs, PSNR_noisy and PSNR_corrected, corresponding to the noisy image and the corrected image respectively. For the images shown in Fig. 12, PSNR_noisy is 17.70 dB and PSNR_corrected is 28.68 dB. This shows that the corrected image is not only visually closer to the error-free output image but also significantly better in terms of the PSNR. Fig. 13 shows the PSNR values for the noisy and the corrected images for different values of P_mem-err and P_log-err. From the figure, we can see that even for error probabilities as high as 0.5, the scheme shows an improvement of around 10 dB in the PSNR.
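The row-and-column filtering and the PSNR measure described above can be sketched together. This is a minimal illustration, assuming 8-bit images stored as lists of rows (so Max = 255), a causal 1D filter with zero initial conditions applied along both directions, and the standard squared-peak form of the PSNR. All function names are hypothetical.

```python
import math


def fir_1d(h, x):
    """Causal 1D FIR with zero initial conditions."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def separable_filter(image, h):
    """Filter every row with h, then every column of the result with h."""
    rows = [fir_1d(h, row) for row in image]
    cols = [fir_1d(h, list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to row-major


def psnr_db(reference, test, max_val=255):
    """10 * log10(max_val^2 / MSE) between two equally sized images."""
    sq_err, count = 0.0, 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            sq_err += (r - t) ** 2
            count += 1
    mse = sq_err / count
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    image = [[4.0, 0.0], [0.0, 0.0]]
    blurred = separable_filter(image, [0.5, 0.5])
    print(blurred)
    print(psnr_db(image, blurred))
```

Because each pass is an ordinary 1D FIR operation, the logic and memory error models used for the 1D filter apply unchanged to each row and column pass.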
V. CONCLUSION
In this paper we presented an FIR filter which is more robust to errors introduced in the output signal due to voltage scaling. To arrive at this robust design, the total errors in the output signal were classified into logic errors (occurring due to violation of the logic path delay condition) and memory errors (occurring due to failures in the coefficient memory elements). A linear predictor based error correction circuit was employed for detecting and correcting logic errors. A simple feedback-based circuit was used to detect and reduce memory failures by adjusting the supply voltage of the coefficient memories. The results indicated an improvement of about 10 dB or more in the output SNR, even for a probability of error as high as 0.5.
Fig. 13. Variation of PSNR with P_err for the Lena image
