Abstract: A low-power technique for digital filtering referred to as adaptive error-cancellation (AEC) is presented in this paper. The AEC technique falls under the general class of algorithmic noise-tolerance (ANT) techniques proposed earlier for combating transient/soft errors. The proposed AEC technique exploits the correlation between the input and soft errors to estimate and cancel out the latter. In this paper, we apply AEC along with voltage overscaling (VOS), where the voltage is scaled beyond the minimum (referred to as $V_{dd\text{-}crit}$) necessary for correct operation. We employ the AEC technique in the context of a frequency-division multiplexed (FDM) communication system and demonstrate that up to 71% energy reduction can be achieved over present-day voltage-scaled systems.
I. INTRODUCTION
THE RAPID growth in demand for portable and wireless computing systems is driving the need for increasingly higher functionality with low energy consumption [1]-[4]. However, with feature sizes being scaled into the deep submicron (DSM) regime, the emergence of DSM noise [5], [6] consisting of ground bounce, crosstalk, IR drops, clock jitter, charge sharing, process variations, etc., resulting from relentless scaling of feature sizes [7], has raised questions about our ability to design reliable and efficient (hence affordable) microsystems and, hence, the ability to extend Moore's law [8] well into the deep submicron regime.
Our past research [9]-[11] on energy-efficiency bounds of DSM VLSI systems in the presence of noise strongly suggests that design techniques based on noise-tolerance need to be developed if energy-efficiency and reliability are to be jointly addressed. Indeed, the 2001 International Technology Roadmap for Semiconductors [7] refers to error-tolerance as a design challenge for the next decade. We have developed noise-tolerance at the algorithmic [12] as well as circuit [13] levels of the design hierarchy. In [12], we proposed algorithmic noise-tolerance (ANT) as a technique that, when combined with supply voltage overscaling (VOS), enables the design of low-power signal processing systems that operate at energy efficiencies beyond those achieved by present-day systems. The overall approach of employing VOS in combination with ANT for low power is referred to as soft DSP. In soft DSP systems, energy efficiency and reliability issues are addressed jointly. This is the key difference between ANT and fault-tolerant computing techniques [14]-[18].
In this paper, we propose a new ANT technique referred to as adaptive error-cancellation (AEC). The proposed AEC technique is based on the observation that soft errors at the output of a voltage overscaled system exhibit an extremely complicated (though deterministic) dependence on the input signal and the underlying datapath architecture. Hence, by modeling the soft-error signal as a stochastic process and exploiting its correlation with the input signal, one can devise an error cancellation scheme that is akin to echo cancellation techniques employed in voiceband modems. The configuration of the error cancellation scheme can be calibrated adaptively by the well-known least mean square (LMS) algorithm.
We optimize the proposed AEC technique by developing an energy-optimum AEC design that minimizes the energy overhead due to error cancellation while being subject to an algorithmic performance constraint. In comparison with the previously proposed prediction-based error-control (PEC) scheme [12] for narrowband frequency-selective filters, the proposed AEC technique is well-suited for broadband signal processing and communication systems, e.g., 3G wireless communications [19] , [20] and next-generation digital subscriber loop (DSL) systems [21] , [22] . We employ the proposed AEC technique to design a low-power frequency-division multiplexed (FDM) system [23] for which the input signal is composed of several bandlimited signals occupying adjacent frequency bands. Simulation results demonstrate that an AEC-based filter achieves 43-71% energy savings without incurring any algorithmic degradation.
The paper is organized as follows. In Section II, we review our past work on ANT. In Section III, we propose the adaptive error-cancellation (AEC) technique that is suitable for the design of low-power broadband DSP and communication systems. An energy-optimum AEC design strategy is developed in Section IV. Simulation results are presented and evaluated in Section V.
II. ALGORITHMIC NOISE-TOLERANCE (ANT)
In this section, we present VOS and ANT and describe their use in the design of low-power signal processing systems.
A. Voltage Overscaling (VOS)
Dedicated DSP implementations are designed subject to an application-specific throughput requirement. Specifically, for correct operation, the critical path delay $T_{cp}$ of the DSP architecture should be less than or equal to the sample period $T_s$ of the application, i.e.,
$$T_{cp} \le T_s. \tag{1}$$
1053-587X/03$17.00 © 2003 IEEE
The critical path delay $T_{cp}$ is a function of the supply voltage $V_{dd}$. As the voltage is scaled down, power dissipation reduces quadratically (assuming that dynamic power is the dominant source), whereas the delay (including the critical path delay $T_{cp}$) increases. At a certain point, (1) is violated, and soft errors start to appear internally and eventually at the output. The supply voltage at which $T_{cp} = T_s$ is referred to as the critical supply voltage and denoted as $V_{dd\text{-}crit}$. Present-day voltage scaling stops at the point where $V_{dd} = V_{dd\text{-}crit}$. Overscaling the supply voltage beyond $V_{dd\text{-}crit}$ results in output errors if critical paths and other longer paths are excited by certain input patterns, i.e., soft errors occur. This induces algorithmic performance degradation such as a loss in the output signal-to-noise ratio (SNR).
For the purpose of illustration, consider a simple four-tap FIR filter implementation as shown in Fig. 1 . The worst-case delay is shown to be , where is the propagation delay of 1-bit full adder at . Assume that the supply voltage is overscaled to a value --such that --, i.e., VOS is applied. With the same clock rate (throughput), we see that
but --
This indicates that the top six MSBs of the filter output will be in error when input patterns exciting the critical paths and other longer paths are applied. Note that most arithmetic units employed in practice use LSB-first computation. This makes soft errors appear at the MSBs first, thereby creating errors of large magnitude. Hence, error detection is easy, but error correction is difficult to achieve. This opens up a unique opportunity for ANT.
B. Algorithmic Noise-Tolerance
The key idea behind ANT is to employ a low-complexity error-control block that detects and corrects errors that may arise in a comparatively large VOS block. An effective ANT technique is one that has low complexity (compared with the VOS block) and is able to mitigate the performance degradation. These techniques may observe the input, output, and certain intermediate signals of the VOS block to generate an output $\hat{y}[n] \approx y_o[n]$, where $y_o[n]$ is the error-free output of the VOS block.
We now derive the conditions under which soft DSP leads to energy savings over optimal error-free voltage-scaled systems (defined as systems operating at $V_{dd\text{-}crit}$). The dynamic energy dissipation per clock cycle of such a system is given by
$$E_{orig} = C_{orig} V_{dd\text{-}crit}^2 \tag{4}$$
where $C_{orig}$ is the average switching capacitance that accounts for signal transition activities, voltage swing ranges, load, and parasitic capacitances at all of the circuit nodes. It can be regarded as a measure of the hardware complexity of the underlying architecture. Note that $E_{orig}$ is the minimum energy dissipation that conventional voltage scaling can achieve.
In comparison, the dynamic energy dissipation per clock cycle of the corresponding soft DSP system is given by
$$E_{soft} = C_{orig} \left( \frac{V_{dd\text{-}crit}}{K_{vos}} \right)^2 + C_{ANT} V_{dd\text{-}crit,ANT}^2 \tag{5}$$
where $C_{ANT}$ represents the overhead complexity due to ANT, $V_{dd\text{-}crit,ANT}$ is the critical supply voltage for the ANT-based error-control block, and $K_{vos}$ is the VOS factor (VOSF), taken here as the ratio $V_{dd\text{-}crit}/V_{dd}$ so that larger values mean deeper overscaling. From (4) and (5), it can be easily shown that $E_{soft} < E_{orig}$, provided
$$\frac{C_{ANT}}{C_{orig}} < \left( 1 - \frac{1}{K_{vos}^2} \right) \left( \frac{V_{dd\text{-}crit}}{V_{dd\text{-}crit,ANT}} \right)^2. \tag{6}$$
In practice, the condition in (6) is easily satisfied by making $C_{ANT}$ as small as possible and/or by making $K_{vos}$ as large as possible. There is indeed an interesting direct relationship between $C_{ANT}$ and $K_{vos}$. When $K_{vos}$ is increased, the performance degradation becomes larger as more critical paths and other longer paths start to fail. This requires increasingly sophisticated and perhaps complex ANT techniques that may increase $C_{ANT}$. Fig. 2 depicts the previously proposed prediction-based error-control (PEC) technique [12]. The PEC technique is effective in reducing energy dissipation for narrowband filters while incurring a minor performance loss. Many modern-day DSP and communication applications require broadband signal processing techniques. In this paper, we propose an ANT technique referred to as adaptive error-cancellation (AEC), which is suitable for broadband signal processing.
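The energy comparison above can be checked numerically. The following is a minimal sketch, assuming the energy model "switching capacitance times supply voltage squared" and assuming that the VOSF is the ratio of the critical supply voltage to the overscaled supply voltage. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
def soft_dsp_energy(c_orig, c_ant, v_crit, v_crit_ant, k_vos):
    """Return (e_orig, e_soft), the per-cycle dynamic energies.

    Assumptions (not taken from the paper's lost equations):
    - energy = capacitance * voltage**2
    - k_vos = v_crit / v_dd, so k_vos > 1 means overscaling
    """
    v_dd = v_crit / k_vos                 # overscaled supply of the main block
    e_orig = c_orig * v_crit ** 2         # error-free, critically scaled system
    e_soft = c_orig * v_dd ** 2 + c_ant * v_crit_ant ** 2  # VOS block + ANT overhead
    return e_orig, e_soft


# Hypothetical numbers: ANT block has 10% of the main block's capacitance.
e_orig, e_soft = soft_dsp_energy(c_orig=1.0, c_ant=0.1,
                                 v_crit=2.5, v_crit_ant=1.0, k_vos=2.0)
savings_percent = 100.0 * (e_orig - e_soft) / e_orig
print(f"E_orig={e_orig:.4f}, E_soft={e_soft:.4f}, savings={savings_percent:.1f}%")
```

Under these assumptions, savings appear whenever the ANT overhead term is smaller than the energy removed from the main block by overscaling, which is the trade-off that the condition in the text expresses.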
III. ADAPTIVE ERROR CANCELLATION
In this section, we present the AEC technique. The soft-error signal is modeled as a stochastic process, and the crosscorrelation between the input signal and the soft-error signal is exploited for error control. As soft errors are input dependent, they can be regarded as an echo of the input signal, and hence, echo cancellation algorithms used in modern communication systems can be employed to effectively restore the system performance. Fig. 3 illustrates the proposed AEC technique. In the presence of soft errors due to VOS, the output of an $N$-tap VOS filter $H$ can be expressed as
$$y[n] = \sum_{k=0}^{N-1} h_k x[n-k] + \eta[n] = y_o[n] + \eta[n] \tag{7}$$
where $y_o[n]$ is the error-free output, $\eta[n]$ is the soft output error, $h_k$ is the $k$th-tap coefficient, and $x[n-k]$ is the $k$th delayed input sample.
A. AEC Algorithm
For a given implementation of $H$, the soft error $\eta[n]$ depends on the input samples $x[n-k]$. Therefore, an error canceler $H_e$ can be employed to generate a statistical replica of the soft errors from these input samples, which can then be subtracted from the output. The resulting estimate of $y_o[n]$, which is denoted by $\hat{y}_o[n]$, is given by
$$\hat{y}_o[n] = y[n] - \mathbf{w}^H \mathbf{x}[n] \tag{8}$$
where $\mathbf{w}$ is the coefficient vector of the error canceler $H_e$, and $\mathbf{x}[n]$ collects the input samples $x[n-k]$. It can be chosen to minimize the estimation error $e[n]$, which is defined as
$$e[n] = \eta[n] - \mathbf{w}^H \mathbf{x}[n]. \tag{9}$$
Here, we use the commonly employed minimum mean-squared error (MMSE) criterion that minimizes
$$J = E\left[ |e[n]|^2 \right]. \tag{10}$$
While the value of $\mathbf{w}$ that minimizes (10) can be obtained as a solution to the Wiener-Hopf equation [24], in practical signal processing systems, an adaptive algorithm such as the least mean square (LMS) algorithm [24] given below is commonly employed:
$$\hat{\eta}[n] = \hat{\mathbf{w}}^H[n] \mathbf{x}[n] \tag{11}$$
$$e[n] = \eta[n] - \hat{\eta}[n] \tag{12}$$
$$\hat{\mathbf{w}}[n+1] = \hat{\mathbf{w}}[n] + \mu \mathbf{x}[n] e^*[n] \tag{13}$$
where $\hat{\mathbf{w}}[n]$ is an estimate of the optimum tap-weight vector of $H_e$, $e^*[n]$ is the complex conjugate of $e[n]$, and $\mu$ is the step size. The computations in (11) are done in the filter (F) block of the AEC, and those in (13) are executed in the weight-update (WUD) block. Note that the feedback loop shown in Fig. 3 is employed to adapt the error canceler. The stability of this feedback structure is governed by the well-known stability analysis of the LMS algorithm, which can be found in [24]. This analysis shows that stability is guaranteed as long as the step size is less than an upper bound.
A practical approach to implement the AEC algorithm described above is to have an auto-calibration phase during power-up. Note that such calibration is commonly used in many practical adaptive systems. During this phase, a predefined input signal is passed through the VOS filter $H$, and a precomputed error-free output $y_o[n]$ is used as the desired signal (see the multiplexer in Fig. 3). After the tap-weight vector $\hat{\mathbf{w}}[n]$ has converged, the WUD-block can be powered down, and the multiplexer control signal can be flipped so that $\hat{\eta}[n]$ gets subtracted directly from the output $y[n]$, thereby canceling out the soft errors. If the WUD-block is left powered up, then the error canceler would be able to track variations in the temperature.
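The calibrate-then-freeze procedure can be sketched in a few lines of NumPy. This is a minimal illustration under strong assumptions, not the paper's simulator: signals are real-valued, and the soft error is replaced by a synthetic signal that is a linear function of the input plus a small disturbance (real VOS errors are nonlinear and only partly correlated with the input). All names (`lms_calibrate`, `cancel`, `g`) are hypothetical.

```python
import numpy as np


def lms_calibrate(x, eta, num_taps, mu):
    """Adapt canceler weights with real-valued LMS, using eta as the desired signal."""
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        x_vec = x[n - num_taps + 1:n + 1][::-1]  # [x[n], x[n-1], ..., x[n-num_taps+1]]
        eta_hat = w @ x_vec                      # F-block: error estimate
        err = eta[n] - eta_hat                   # estimation error
        w = w + mu * err * x_vec                 # WUD-block: weight update
    return w


def cancel(x, y, w):
    """Frozen-weight operation: subtract the estimated soft error from the output."""
    eta_hat = np.convolve(x, w)[:len(x)]
    return y - eta_hat


rng = np.random.default_rng(0)
x = rng.standard_normal(5000)                    # whitened training input
h = np.array([0.5, 0.3, -0.2, 0.1])              # hypothetical primary filter
y_o = np.convolve(x, h)[:len(x)]                 # precomputed error-free output
g = np.array([0.4, -0.25, 0.0, 0.0])             # synthetic "echo" path for the soft error
eta = np.convolve(x, g)[:len(x)] + 0.01 * rng.standard_normal(len(x))
y = y_o + eta                                    # output as seen under VOS

w = lms_calibrate(x, y - y_o, num_taps=4, mu=0.01)  # calibration phase
residual = cancel(x, y, w) - y_o                    # what remains after cancellation
print("soft-error variance:", np.var(eta))
print("residual variance:  ", np.var(residual))
```

During calibration the desired signal is formed as the difference between the observed output and the precomputed error-free output, which mirrors the role of the multiplexer described above. After convergence only `cancel` runs, so the weight-update loop contributes no further energy.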
B. Algorithmic Performance Measures
We now define algorithmic performance measures needed for characterizing the energy-optimum AEC design presented in Section IV. Note that the filter output under VOS can be written as
$$y[n] = y_o[n] + \eta[n] = s[n] + v[n] + \eta[n] \tag{14}$$
where $y_o[n]$ is the error-free output composed of a desired signal $s[n]$ and signal noise $v[n]$, and $\eta[n]$ denotes the soft error due to VOS. While the relationship between $\eta[n]$ and the input $x[n]$ is deterministic, it is extremely complex for any reasonably sized filter. Hence, we choose to model this relationship in a statistical manner by quantifying the degradation in the output SNR.
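Definition 1: The output SNR of a conventional VOS filter is defined as $SNR_{vos}$, which is given by
$$SNR_{vos} = 10 \log_{10} \frac{\sigma_s^2}{\sigma_v^2 + \sigma_\eta^2} \tag{15}$$
where $\sigma_s^2$, $\sigma_v^2$, and $\sigma_\eta^2$ are the variances of the desired signal $s[n]$, signal noise $v[n]$, and soft error $\eta[n]$, respectively.
Definition 2: The output SNR of a soft filter employing the AEC for ANT is defined as $SNR_{AEC}$, which is given by
$$SNR_{AEC} = 10 \log_{10} \frac{\sigma_s^2}{\sigma_v^2 + \sigma_e^2} \tag{16}$$
where $\sigma_e^2$ is the variance of the residual soft error [or estimation error, see (12)] after the AEC.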
In practice, AEC-based soft filters are designed for an application-specific performance requirement $SNR_{des}$, such that
$$SNR_{AEC} \ge SNR_{des} \tag{17}$$
where $SNR_{des}$ is specified for the worst-case signal noise at the filter output, whose variance is denoted $\sigma_{v,max}^2$. The error canceler $H_e$ in Fig. 3 is an adaptive filter that takes the soft error $\eta[n]$ as the desired signal and generates the estimated signal $\hat{\eta}[n]$ as its output. Thus, the estimation error between $\eta[n]$ and $\hat{\eta}[n]$ is seen as the output noise. The AEC algorithm given in (11)-(13) must achieve the algorithmic performance as specified in (17) for a given $SNR_{des}$. Parameters that determine $SNR_{AEC}$ include the VOSF, the length $N_e$ of the error canceler, the precision of the F-block, and the precision of the WUD-block.
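The two SNR definitions and the performance requirement translate directly into a small calculation. The sketch below assumes that the SNRs are expressed in decibels as the ratio of desired-signal variance to the sum of signal-noise variance and (residual) soft-error variance. The helper names and the variance values are hypothetical.

```python
import math


def snr_db(var_signal, var_noise, var_error):
    """Output SNR in dB with soft-error (or residual-error) variance var_error."""
    return 10.0 * math.log10(var_signal / (var_noise + var_error))


def max_residual_variance(var_signal, var_noise, snr_des_db):
    """Largest residual soft-error variance that still meets snr_des_db."""
    return var_signal / (10.0 ** (snr_des_db / 10.0)) - var_noise


# Hypothetical variances for illustration only.
var_s, var_v, var_eta = 100.0, 0.5, 5.0
snr_vos = snr_db(var_s, var_v, var_eta)              # before cancellation
budget = max_residual_variance(var_s, var_v, 22.0)   # allowed residual for 22 dB
snr_at_budget = snr_db(var_s, var_v, budget)         # equals the requirement
print(f"SNR under VOS: {snr_vos:.2f} dB, residual budget: {budget:.4f}")
```

Working with a residual-variance budget rather than the SNR itself is convenient because the budget can be compared directly with the estimation-error variance of the canceler.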
C. Energy-Savings Measures
The average energy savings achieved by an AEC-based soft filter is defined as
$$E_{sav} = \frac{E_{orig} - E_{soft}}{E_{orig}} \times 100\% \tag{18}$$
where $E_{orig}$ is the energy dissipation of the conventional filter at the optimally scaled voltage of $V_{dd\text{-}crit}$, and $E_{soft}$ is the energy dissipation of the soft filter at the overscaled voltage of $V_{dd}$. It can be seen from Fig. 3 that $E_{soft}$ is given by
$$E_{soft} = E_H + E_{H_e} \tag{19}$$
where $E_H$ and $E_{H_e}$ are the energy dissipations of the primary filter $H$ and the error canceler $H_e$, respectively. For a given input signal, the value of $E_H$ is determined by the supply voltage $V_{dd}$, length $N$, and coefficients $h_k$ of the primary filter $H$. To quantify $E_{H_e}$, we define a vector $\mathbf{s} = [s_0, s_1, \ldots, s_{N-1}] \in \{0,1\}^N$, where $N$ is the length of the primary filter $H$, and $\{0,1\}^N$ is an $N$-dimensional vector space with binary elements $s_k$. We denote $s_k = 1$ if the $k$th tap of the error canceler $H_e$ is powered up and $s_k = 0$ otherwise. The length of $H_e$ can be written as
$$N_e = \sum_{k=0}^{N-1} s_k. \tag{20}$$
We assume that the WUD-block is switched off after $\hat{\mathbf{w}}[n]$ has converged. This gives
$$E_{H_e} = \sum_{k=0}^{N-1} s_k E_k \tag{21}$$
where $E_k$ is the energy dissipation due to the $k$th tap of $H_e$. Given the coefficient $w_k$, $E_k$ can be estimated as a function of $w_k$ via the weighted multiplier energy model [25]. Note that $E_k$ can be obtained via any of the available power modeling approaches [26], [27] and then employed to solve the energy optimization problem.
To achieve maximum energy savings, we need to minimize $E_{soft}$ subject to the performance constraint (17). This is formulated as an energy optimization problem as given in
$$\text{minimize: } E_{soft} \quad \text{subject to: } SNR_{AEC} \ge SNR_{des}. \tag{22}$$
In the next section, we will derive the energy-optimum AEC design based on the solution of (22) .
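The bookkeeping behind the tap-mask description can be sketched as follows. This is a minimal illustration assuming a binary mask with one entry per tap (1 for powered up, 0 for powered down), per-tap energies supplied by some external power model, and a soft-filter energy equal to the primary-filter energy plus the energy of the powered-up canceler taps. All names and numbers are hypothetical.

```python
def canceler_length(mask):
    """Number of powered-up canceler taps."""
    return sum(mask)


def canceler_energy(mask, tap_energy):
    """Energy of the canceler: only powered-up taps contribute."""
    return sum(s * e for s, e in zip(mask, tap_energy))


def energy_savings_percent(e_orig, e_filter_vos, mask, tap_energy):
    """Savings of the soft filter relative to the conventional filter, in percent."""
    e_soft = e_filter_vos + canceler_energy(mask, tap_energy)
    return 100.0 * (e_orig - e_soft) / e_orig


# Hypothetical example: 4-tap primary filter, taps 0 and 2 of the canceler enabled.
mask = [1, 0, 1, 0]
tap_energy = [0.5, 0.4, 0.3, 0.2]   # per-tap energies from an assumed power model
n_e = canceler_length(mask)
e_he = canceler_energy(mask, tap_energy)
savings = energy_savings_percent(e_orig=10.0, e_filter_vos=3.0,
                                 mask=mask, tap_energy=tap_energy)
print(n_e, e_he, savings)
```

Expressing the canceler configuration as a mask makes the optimization variable explicit: minimizing the soft-filter energy for a fixed overscaled supply reduces to choosing which entries of the mask are set.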
IV. ENERGY-MINIMUM ERROR-CANCELLATION ALGORITHM
We now consider a given filter $H$ whose length $N$ and coefficients $h_k$ are determined by frequency domain specifications such as filter bandwidth. For a given input signal, the energy dissipation of $H$ is a function of the supply voltage $V_{dd}$ or, equivalently, the VOSF. From (21), the energy dissipation of the corresponding error canceler $H_e$ is a function of the VOSF and the vector $\mathbf{s}$. Thus, $E_{soft}$ in (22) is a function of the VOSF and the vector $\mathbf{s}$ only, of which the energy-optimum solutions are provided in Sections IV-A and B, respectively.
A. Energy-Optimum VOSF
To illustrate the relationship between and VOSF, we rewrite (5) as --
-
where and -are determined by the architecture of the primary filter , --is the supply voltage for the error canceler , and is the VOSF. The first and second terms on the right-hand side of (23) correspond to and in (19) , respectively. Note that and in (24) are related. Starting with and , increases with because more and more taps of contribute to the soft errors at the output and the magnitude of the soft errors themselves increase. However, the relationship between and is extremely complex and nonlinear. Hence, in this paper, we find the optimum value of by determining the optimum value of for a given value of , as described next. It is shown in our simulations that needs to be maximized at the point where the algorithmic performance constraint in (22) is just satisfied.
B. Energy-Optimum AEC
We now derive the energy-optimum vector $\mathbf{s}$ for a given $H$ and VOSF. The reason for the existence of an energy-optimum AEC is that the performance degradation due to VOS is dominated by soft errors from a few of the taps of $H$ having large coefficient magnitude, as these taps can easily excite the critical paths and other longer paths, thereby contributing more to the performance degradation. Therefore, a reduced-order AEC exists that can restore the algorithmic performance.
In what follows, we assume a zero-mean and uncorrelated input signal $x[n]$. This is a reasonable assumption for most practical broadband systems because such systems employ scramblers to deliberately "whiten" input signals for the purpose of easing timing recovery functions in the receiver and combating interference. Note that the above assumption on the input signal is only for the purpose of simplifying the mathematical development so that the key advantages of the proposed AEC technique can be illustrated clearly. In Section V, we relax this assumption to include nonzero mean and correlated signals.
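From (8), (9), and (20), the variance of the residual soft error after cancellation by the AEC can be expressed as
$$\sigma_e^2 = \sigma_\eta^2 - \sigma_x^2 \sum_{k=0}^{N-1} s_k |w_k|^2 \tag{25}$$
where $\sigma_x^2$ and $\sigma_\eta^2$ are the variances of the input signal $x[n]$ and soft output error $\eta[n]$, respectively, for the given $H$ and VOSF, and the $w_k$s are the optimum coefficients of $H_e$, given by [24]
$$w_k = \frac{E\left[ x[n-k]\, \eta^*[n] \right]}{\sigma_x^2}. \tag{26}$$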
Note that from (16) and (17) , in (25) due to the -tap has the following constraint: (27) where is determined by . Using the above notations, the energy optimization problem for AEC can be expressed as an explicit function of the vector , as follows:
subject to: (28) where , , , and are given by (21), (25), (16) , and (17), respectively.
The optimization problem (28) can be solved via the Lagrange multiplier method [28] . We define the Lagrangian function as (29) where is the sensitivity vector of the Lagrange multiplier. The solution to (28) is obtained at the point ( ), satisfying (30) for any and . It can be shown that in (30) is given by [25] if if (31) where is the optimum value of . The energy-optimum length of the error canceler is obtained as (32) From (31), if the th tap of has a large coefficient while consuming a relatively small energy , then . In other words, the input has to be utilized to cancel the soft output errors. On the other hand, we can switch off the th tap of if this tap consumes more energy (large ) but has a minor contribution in terms of error cancellation (small ). In practice, we can avoid the computation of by powering down those taps in starting with the tap with the largest value of and continuing until the performance constraint in (28) is violated.
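The practical power-down procedure described at the end of the paragraph above can be sketched as a greedy loop. This is an illustration under stated assumptions rather than the paper's exact rule: the input is white, so the residual error variance is modeled as the soft-error variance minus the input variance times the sum of squared coefficients of the powered-up taps; and taps are ranked by the hypothetical ratio "tap energy divided by the error variance that the tap removes". The function name and all numbers are invented for the example.

```python
def greedy_tap_selection(w, tap_energy, var_x, var_eta, var_max):
    """Power down canceler taps, least useful per unit energy first.

    Returns a 0/1 mask. Assumes real coefficients and a white input, so each
    tap k removes var_x * w[k]**2 from the residual error variance.
    """
    n = len(w)
    mask = [1] * n

    def residual(m):
        return var_eta - var_x * sum(s * wk * wk for s, wk in zip(m, w))

    if residual(mask) > var_max:
        raise ValueError("even the full-length canceler violates the constraint")

    def cost_per_benefit(k):
        benefit = var_x * w[k] ** 2
        return float("inf") if benefit == 0 else tap_energy[k] / benefit

    for k in sorted(range(n), key=cost_per_benefit, reverse=True):
        mask[k] = 0                      # tentatively power the tap down
        if residual(mask) > var_max:     # constraint violated: restore and stop
            mask[k] = 1
            break
    return mask


# Hypothetical canceler: two dominant taps, two negligible ones.
mask = greedy_tap_selection(w=[0.4, -0.25, 0.05, 0.01],
                            tap_energy=[1.0, 1.0, 1.0, 1.0],
                            var_x=1.0, var_eta=0.23, var_max=0.01)
print(mask)
```

In this example the two small-coefficient taps are switched off and the two dominant taps remain, which matches the qualitative argument that only taps contributing substantially to error cancellation justify their energy cost.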
We now describe the relationship between the performance degradation due to VOS and the energy-optimum AEC configuration. Denote as the soft-error component from the th tap of . As is excited by the input , it is reasonable to assume that is statistically independent of and for . Thus, we can rewrite (26) as (33) In general, if the th tap of has a large coefficient , then critical paths and other longer paths get excited easily, thereby resulting in a larger value for and for . From (33) , this implies that is large, which in turn implies [from (31) ]
. This is to be expected as is induced by and thus can only be canceled by the th tap of . As the filter bandwidth increases, the predominant contribution to the soft-error energy at the output will be from fewer taps of . This is because wideband filters have a narrow impulse response. Thus, more s will be zero, resulting in a smaller . This indicates that the proposed AEC technique is best suited for wideband filters.
Finally, we study the convergence characteristics of the energy-optimum AEC. Employing the same assumption on as above, the misadjustment , which is defined as the ratio of the excess MSE (in the steady state) to the optimum MSE, can be expressed as [24] (34) whereas the convergence time constant is given by [24] (35)
From (34), for the same amount of misadjustment, the energy-optimum AEC having a smaller can employ a larger step size for the calibration. This results in a faster settling time [see (35)] than that of a conventional AEC, thereby reducing the energy overhead during the calibration. This further demonstrates that the proposed AEC technique is well-suited for wideband filters.
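The step-size argument can be made concrete with the standard small-step-size LMS approximations for a white input of variance $\sigma_x^2$ and an $N_e$-tap canceler. These are textbook forms stated here as an assumption; the constants in (34) and (35) may differ. The symbols $\mathcal{M}$ (misadjustment) and $\tau$ (convergence time constant, in samples) are introduced only for this sketch.

```latex
\begin{align}
\mathcal{M} &\approx \frac{\mu}{2}\, N_e\, \sigma_x^2, \\
\tau &\approx \frac{1}{2 \mu\, \sigma_x^2}, \\
\text{so that, for fixed } \mathcal{M}: \quad
\mu &= \frac{2 \mathcal{M}}{N_e\, \sigma_x^2}, \qquad
\tau \approx \frac{N_e}{4 \mathcal{M}}.
\end{align}
```

Under these approximations the time constant grows linearly with $N_e$ at a fixed misadjustment, so halving the number of powered-up taps roughly halves the calibration time.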
C. Reduced-Order AEC Algorithm
Employing the energy-optimum AEC derived above, we propose a reduced-order LMS algorithm to compute the AEC coefficients, as shown in (36) (37) (38) where is given by (32) and if in (31) . We now determine the precisions of F-block and WUD block in the energy-optimum AEC. Assuming a uniform stochastic model for the quantization errors in the coefficients s, the mean-squared quantization noise referred to the output of is given by [29] (39) where is the variance of the input signal , is the maximum magnitude of s, and denotes the precision of s in the F-block.
To make quantization errors arbitrarily small, we define a factor such that , where is given by (25) . The precision is then obtained as [30] (40)
From (40), a smaller also reduces the precision of the F-block, thereby favoring energy reduction.
We employ the stopping criterion [31] to determine the precision of the WUD block. The stopping criterion asserts that the WUD block will stop adapting if the correction term in (38) becomes smaller than half of the least significant bit of s. This can be expressed as (41) where is the precision of s in the WUD block. From (41), the lower bound on is given by (42) Fig. 4 shows the architecture of the proposed reduced-order AEC. Simulation results in the next section demonstrate significant reduction in hardware complexity as compared with the conventional LMS algorithm, whereas the performance loss is negligible. Thus, we are able to satisfy (6) easily. Furthermore, even if the supply voltage of the reduced-order AEC is made identical to that of the VOS for simplicity of implementation, the reduced-order AEC is error-free because its critical path is much smaller than that of the filter .
V. APPLICATION TO FDM SYSTEMS
In this section, we study the performance of the proposed AEC-based low-power filter in the context of a frequency-division multiplexed (FDM) communication system. FDM is employed in many broadband communication systems today such as very high-speed digital subscriber line and wireless communication standards. We first describe the simulation setup and then evaluate the achievable energy savings versus algorithmic performance tradeoff. Fig. 5(a) illustrates the spectrum of the input signal , which consists of a baseband signal occupying the band and a bandpass signal in the adjacent band. This input signal emulates a FDM signal [23] . The input signal is also corrupted by a white Gaussian noise source . We assume that all the signals , , and noise are statistically independent.
A. Simulation Setup
The goal of the receiver signal processing is to extract the primary signal . This is accomplished by passing the input signal through a lowpass filter to suppress the out-of-band signal and noise components. We employ the optimization strategy given in Section IV to design AEC-based low-power filters that perform frequency-selective filtering [see Fig. 5(b) ]. In order to evaluate the energy-performance tradeoff for FDM systems at different bandwidths, we vary the bandwidth of from to . All the simulations employ the filter architecture shown in Fig. 1 , where two's complement carry-save Baugh-Wooley multipliers [32] and ripple-carry tree-style adders are being employed.
We employ a logic-level simulation to calculate the number of full-adder delays on every path from the input to the filter output given a sequence of inputs. Thus, all paths, and not just the critical paths, are included. The corresponding circuit delay under VOS is obtained by determining the delay of a full adder with respect to supply voltage via circuit simulation using HSPICE. Table I tabulates the full-adder delay with respect to the supply voltage for a 0.25-µm CMOS process. If the constraint in (1) is violated, the corresponding output will not be able to settle to its new value but instead retain its previous value, thereby resulting in an output error. The output SNR is calculated by averaging over the entire input data set. The energy dissipation is obtained via the gate-level simulation tool MED [33]. The energy overhead 
B. Performance Comparison
In these simulations, FDM systems are assumed to have a 22-dB output SNR requirement, i.e., SNR 22 dB. Thus, conventional optimally voltage-scaled filters have been designed to meet this performance specification with minimal complexity. The energy-optimum AEC filters were designed using the methodology described in Section IV to achieve the same algorithmic performance. Table II summarizes the results of this design methodology for different filter bandwidths. Note that the minimum-complexity has a transition bandwidth for different filter bandwidths. Using the optimal Parks-McClellan design method [34] , we obtained an with 32 taps. The optimum VOSF was found to be around 2.0.
In Fig. 6, we compare the energy-performance tradeoff achieved by an energy-optimum AEC-based filter for a bandwidth of . Table II shows that for this filter. For the purpose of comparison, we also provide in Fig. 6 the energy-performance tradeoff for this filter with an AEC with 4, 8, and 12 taps, respectively. Note that due to the presence of adjacent-band signals, soft output errors occur frequently as soon as the supply voltage is reduced below $V_{dd\text{-}crit}$. Thus, a sharp SNR drop is observed for the conventional filter. Fig. 6 shows that energy savings of 37, 69, and 64% are achieved at the desired output SNR with $V_{dd}$ = 1.8 V, 1.3 V, and 1.4 V, by using the four-tap, eight-tap, and 12-tap AEC, respectively. Hence, the eight-tap AEC gives the best energy-performance tradeoff.
Table II also indicates the trends in energy savings achieved by the energy-optimum AEC-based filters at different bandwidths. It can be seen that the hardware complexity of the energy-optimum AEC decreases with the filter bandwidth increasing from to . This is because wideband filters have a narrow soft-error energy distribution with respect to filter taps. Therefore, fewer filter taps contribute to the performance degradation, and this reduces the complexity of the AEC algorithm, thereby enabling larger energy reduction. The achievable energy savings ranges from 43.1 to 71.2% (when the WUD block is off) and 22.3 to 65.9% (when the WUD block is on, and thus, the energy savings is offset by the energy dissipation from the WUD block) as the filter bandwidth increases from to . We now evaluate the convergence speed of the energy-optimum AEC. Fig. 7 shows two learning curves of the energy-optimum AEC for filters with bandwidth of and , respectively. The step size is suitably chosen to obtain an output SNR equal to 22 dB. As indicated in Table II, the settling time for an energy-optimum AEC ranges from 220 to 630 samples, depending on the filter bandwidth. A relatively larger settling time is expected for filters with a narrower bandwidth, as $N_e$ is large for such filters. This is consistent with the observations in (34) and (35).
VI. CONCLUSIONS
In this paper, we have proposed an adaptive error-cancellation (AEC) algorithm for designing low-power soft DSP systems. We apply the AEC technique in the context of an FDM communication system and demonstrate significant energy savings over conventional filters without performance loss. Future work is being directed toward the application of the proposed AEC, and of the combination of AEC and PEC [12], in practical broadband communication systems. Developing ANT techniques for adaptive filters is of great interest, given the presence of such filters as equalizers in many communication systems. Soft DSP provides a new direction for research in the design of energy-efficient DSP algorithms and architectures, whereby DSP algorithms, architectures, and circuit properties are jointly optimized to push the limits of energy reduction.
