Abstract-In this paper, dynamic algorithm transformations (DAT's) for designing low-power reconfigurable signal-processing systems are presented. These transformations minimize energy dissipation while maintaining a specified level of mean squared error or signal-to-noise ratio. This is achieved by modeling the nonstationarities in the input as temporal/spatial transitions between states in the input state-space. The reconfigurable hardware fabric is characterized by its configuration state-space. The configurable parameters are taken to be the filter taps, coefficient and data precisions, and supply voltage V dd . An energy-optimal reconfiguration strategy is derived as a mapping from the input to the configuration state-space. In this strategy, taps are powered down starting with the tap with the smallest value of [ 
I. INTRODUCTION
THE recent growth of portable wireless networked communication systems has made it essential that maximum functionality be provided for prolonged periods under severe constraints on battery weight and life. This fact has made low-power digital signal processing (DSP) an important research area. Energy minimization techniques have been proposed at all levels of the design hierarchy beginning with algorithms and architectures and ending with circuits and technological innovations. Existing techniques include those at the algorithmic level (such as strength reduction [1]-[3] and variable-length vector quantizer (VQ) [4]), architectural level (such as pipelining [5], [6] and parallel processing [6]), logic (logic minimization [7], [8] and precomputation [9]), circuit (reduced voltage swing [10] and adiabatic logic [11]) and technological level [12]. Algorithm transformation techniques [13] such as look-ahead [6], relaxed look-ahead [14], algebraic transformations [15], and retiming [16] have been employed in high-speed and, more recently, low-power DSP system design. We refer to these algorithm transformations as static algorithm transformations (SAT's) because these are applied during the algorithm design phase, assuming a worst-case scenario, and their implementation is time-invariant. In recent years, reconfigurable signal processing has emerged as an alternative approach to low-power DSP. In [17], reconfigurability is employed to map a wide class of signal-processing algorithms (SPA's) to an appropriate architectural template. Related work includes approximate signal processing [18], [19] where just the right amount of computational resources, needed at a specific instant/period to meet the algorithm performance requirements, is allocated. Compiler-based run-time software optimization techniques are explored in [20], [21]. Field-programmable gate-array (FPGA)-based devices and their reconfiguration strategies are discussed in [22]-[24]. Hybrid architectures based on FPGA's and general-purpose DSP's are the topic of research in [25] and [26].
Manuscript received July 1, 1998; revised November 15, 1998. This work was supported by the National Science Foundation under CAREER Award MIP 96-23737, and by the Defense Advanced Research Projects Agency under Contract DABT63-97-C-0025.
The authors are with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA (e-mail: mgoel@uivlsi.csl.uiuc.edu; shanbhag@uivlsi.csl.uiuc.edu).
Publisher Item Identifier S 1063-8210(99)04569-2.
Our approach to the design of reconfigurable DSP is to add just the right degree of flexibility (as demanded by the application) to ASIC's, resulting in application-specific reconfigurable integrated circuits (ASRIC's). The ASRIC approach is suitable for mobile multimedia systems of the future as it maintains the energy and throughput efficiency of ASIC's. In this paper, we present dynamic algorithm transformations (DAT's) as a systematic approach to low-power ASRIC's. The reconfiguration strategies are derived via DAT's that optimize energy dissipation while maintaining a specified level of an algorithmic performance measure such as mean squared error (MSE) or signal-to-noise ratio (SNR). The DAT techniques are based upon the principle that the input is usually nonstationary and, hence, it is better (from an energy perspective) to adapt the algorithm and architecture to the input. In contrast, present-day systems (referred to as "worst-case designs") seek out and design for the worst-case scenario. For example, a broadband modem is typically designed for the longest cable length, the maximum cable temperature, and the worst-case near-end crosstalk (NEXT) interferer. The worst-case design requires high complexity and, hence, high energy consumption that remains the same even if the cable length, in practice, is small. A DAT-based modem will exploit this spatial variability from one location to another to reconfigure itself to save energy. Similarly, wireless channels exhibit extensive temporal variabilities due to fading that can be exploited by a DAT-based receiver.
The main contribution of this paper is to formalize the design of reconfigurable DSP systems. This is done by modeling: 1) the temporal/spatial variabilities in the input as state transitions in an input state-space and 2) reconfiguration of the hardware fabric as transitions in a configuration state-space. Given an input state, an energy-optimal configuration state is derived systematically as a solution to an optimization problem, which has energy as the objective function and a constraint on the SNR. The proposed design approach is independent of the hardware platform. From an implementation perspective, a reconfigurable DSP system has the SPA implemented in a reconfigurable hardware (see Fig. 1 ) (such as FPGA, certain DSP's, or ASIC's), while the input state and state transitions are monitored by a signal monitoring algorithm (SMA) block (or a controller).
In the past, the DAT-based approach has been successfully applied to low-energy equalizers for 51.84-Mb/s very-high-speed digital subscriber loop (VDSL) [27] and low-energy finite-impulse response (FIR) filters. In this paper, we apply DAT to a NEXT canceller for 155.52-Mb/s asynchronous transfer mode (ATM)-local area network (LAN) [28]. Simulation results indicate that the energy savings range from 2% to 87% as the cable length varies from 110 to 40 m, respectively, with an average energy savings of 69% as compared to the worst-case design corresponding to a cable length of 110 m.
The remainder of this paper is organized as follows. Section II provides preliminaries regarding adaptive filters and the variable supply voltage scheme. In Section III, we describe reconfigurable datapaths and their energy dissipation models. The main result is described in Section IV, where we present DAT's for adaptive filters. Finally, in Section V, we employ the DAT-based adaptive filter as a NEXT canceller in 155.52-Mb/s ATM-LAN over unshielded twisted-pair (UTP) category-3 copper wiring and present simulation results.
II. PRELIMINARIES
In this section, we describe preliminaries regarding adaptive filters [29] and the variable supply voltage scheme [19] that would be necessary for the development of DAT.
A. Adaptive Filtering
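Adaptive filters are based upon a stochastic model of the input signal $x(n)$, where $n$ is the time instant. The output of a fixed-coefficient $N$-tap filter processing the input $x(n)$ is given by

$$y(n) = \sum_{k=0}^{N-1} w_k^{*}\, x(n-k) \tag{1}$$

where $w_k$ are the filter coefficients and $^{*}$ denotes complex conjugation. The filter coefficients can be chosen to minimize the error given by

$$e(n) = d(n) - y(n) \tag{2}$$

where $d(n)$ is the desired signal. Typically, it is the MSE value that is minimized, where the MSE is defined as

$$J(n) = E\left[\,|e(n)|^{2}\,\right] \tag{3}$$

where $E[\cdot]$ is the expectation operator. If the error is assumed to be an ergodic process, then the sample average of $|e(n)|^{2}$ can be approximated by its time average, which is given by

$$J(n) \approx \frac{1}{L} \sum_{i=0}^{L-1} |e(n-i)|^{2} \tag{4}$$

where $L$ is the window over which the squared error is averaged. It can be shown [29] that if the input signal is a wide-sense stationary (WSS) white process (samples of $x(n)$ are uncorrelated), then the minimum MSE is given by

$$J_{\min} = \sigma_d^{2} - \sigma_x^{2} \sum_{k=0}^{N-1} |w_{\mathrm{opt},k}|^{2} \tag{5}$$

where $\sigma_d^{2}$ is the desired signal power, $\sigma_x^{2}$ is the input signal power, and $w_{\mathrm{opt},k}$ are the optimum coefficients. For stationary inputs, the optimum coefficients can be computed [29] by solving a system of linear equations. However, if the input signal is nonstationary, then the coefficients need to be updated via an adaptive algorithm such as the least mean square (LMS) algorithm [30], defined as follows:

$$y(n) = \mathbf{W}^{H}(n-1)\,\mathbf{X}(n) \tag{6}$$

$$\mathbf{W}(n) = \mathbf{W}(n-1) + \mu\, e^{*}(n)\,\mathbf{X}(n) \tag{7}$$

where $\mathbf{W}(n) = [w_0(n), \ldots, w_{N-1}(n)]^{T}$ is the coefficient vector, $\mathbf{X}(n) = [x(n), \ldots, x(n-N+1)]^{T}$ is the input vector, $e^{*}(n)$ is the complex conjugate of $e(n)$, defined in (2), $d(n)$ is the desired signal, $y(n)$ is the filter output, and $\mu$ is the step size. If the step size $\mu$ is sufficiently small, then the coefficients $\mathbf{W}(n)$ will approach $\mathbf{W}_{\mathrm{opt}}$ as $n$ approaches infinity. In that case, $J(n)$ in (4) will approach the optimum value $J_{\min}$ in (5). The LMS algorithm is commonly employed in numerous signal processing and communications applications due to its inherent simplicity.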
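The LMS recursion described above can be sketched in a few lines. This is a minimal illustration for real-valued signals only (so the conjugates drop out), not the authors' implementation; the function name `lms_identify`, the 3-tap system `h`, and the step size are assumptions chosen for the example.

```python
import random

def lms_identify(x, d, num_taps, mu):
    """Real-valued LMS: returns the final coefficients and the error sequence."""
    w = [0.0] * num_taps            # coefficient vector, starts at zero
    delay_line = [0.0] * num_taps   # x(n), x(n-1), ..., x(n-N+1)
    errors = []
    for x_n, d_n in zip(x, d):
        delay_line = [x_n] + delay_line[:-1]
        # Filter (F) block: output computed with the previous coefficients.
        y_n = sum(wk * xk for wk, xk in zip(w, delay_line))
        e_n = d_n - y_n
        # Weight-update (WUD) block: one multiply-add per tap.
        w = [wk + mu * e_n * xk for wk, xk in zip(w, delay_line)]
        errors.append(e_n)
    return w, errors

# Hypothetical noise-free system-identification run.
random.seed(0)
h = [0.5, -0.3, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
w, errors = lms_identify(x, d, num_taps=3, mu=0.01)
```

With a white unit-variance input and a small step size, the coefficients settle close to `h`, after which the update loop could be switched off as the text describes.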
A direct implementation of the LMS algorithm is shown in Fig. 2 , where each tap consists of two multipliers and two adders. The filter (F) block implements (6) and weight-update (WUD) block implements (7) . The critical path delay for the architecture in Fig. 2 is given by (8) where is the propagation delay for the multiplier, and are the propagation delays for the sum and carry outputs, respectively of a 1-b full adder, and is the precision of the adder in the F-block. Note that we assume a ripple-carry adder architecture and that is a power-of-two. Typically, we choose so that the sample rate requirements are met. In many applications, the WUD block is switched off (to conserve energy) after the optimum coefficients have been obtained, i.e., after convergence. In that case, the critical path delay is due to the adders and multipliers in the F block and is given by (9) Note that, if some of the taps in the F block are powered down, then in (9) will be smaller. Further energy savings can be obtained by lowering the supply voltage via schemes such as in [19] , [31] , and [32] , as described below.
B. Variable Supply Voltage for Low Power
If is the sample period and is the critical path delay when all the taps are powered up, then we choose (10) If some of the taps are powered down, then . The slack in critical path delay can be exploited to obtain energy savings. The propagation delay for CMOS circuits has an inverse relationship with the supply voltage, while the power dissipation has a quadratic relationship with the supply voltage. Therefore, the reduced critical path delay due to reconfiguration can be exploited to reduce the supply voltage in order to save energy. In fact, the slack is equivalent to the processing rate [19] , which is defined as (11) where . Thus, when , indicating that the processing rate of the architecture is maximum. It can be shown [19] that the supply voltage can be lowered to , where is given by (12) and , is the threshold voltage and . In [19] , the value of is obtained experimentally by buffering the input samples in a first-in firstout (FIFO) buffer and monitoring the number of unprocessed inputs in the buffer. A specific supply voltage value is then chosen according to (12) . In contrast to [19] , we determine algorithmically from (11) , where the critical path delay is varied (by powering down the specific taps) according to the requirements imposed by the input environment. It should be mentioned that (12) is employed to obtain a coarse value. Fine adjustments [31] to track the process and temperature variations can be done subsequently. The reader is referred to [31] and [32] for details and additional references. Thus, energy savings are obtained by powering down taps, reducing precisions and reducing the supply voltage . In order to enable these energy savings, we need an underlying reconfigurable hardware fabric and the energy dissipation models for the hardware, which are described below.
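To make the voltage-scaling step concrete, the sketch below assumes a first-order CMOS delay model in which delay is proportional to V/(V - Vt)^2, and takes the processing rate to be the reduced critical-path delay divided by the maximum critical-path delay (so it equals 1 when all taps are powered up). Both the model and the names `supply_voltage`, `v_ref`, and `v_t` are assumptions made for illustration; the exact form of (12) should be taken from [19].

```python
import math

def supply_voltage(rate, v_ref, v_t):
    """Lowest supply voltage that sustains a normalized processing rate.

    Assumes delay ~ V / (V - v_t)**2, so the rate relative to v_ref is
    ((V - v_t)**2 / V) / ((v_ref - v_t)**2 / v_ref). Solving the resulting
    quadratic for V and keeping the root above v_t gives the expression below.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    v0 = (v_ref - v_t) ** 2 / v_ref
    half = rate * v0 / 2.0
    return v_t + half + math.sqrt(rate * v0 * v_t + half * half)

def energy_ratio(rate, v_ref, v_t):
    """Switching energy relative to full-rate operation (quadratic in V)."""
    return (supply_voltage(rate, v_ref, v_t) / v_ref) ** 2

# Hypothetical numbers: 2.5-V nominal supply, 0.5-V threshold.
full = supply_voltage(1.0, 2.5, 0.5)
half_rate = supply_voltage(0.5, 2.5, 0.5)
```

Under these assumed values, halving the required rate lets the supply drop from 2.5 V to roughly 1.65 V, which is the kind of coarse estimate that a tracking loop could then refine.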
III. RECONFIGURABLE DATAPATH: THE SPA BLOCK
In this section, we present reconfigurable hardware and energy models for arithmetic units and filters. The reconfigurable hardware architecture and energy models are formulated so that energy-optimum reconfiguration strategies (described in Section IV) can be computed in real time. We will focus on energy models for the multipliers as these consume a large percentage of the total energy. It is well known that energy dissipation is a function of the input statistics in CMOS circuits. For a direct-form FIR filter, the input into the th-tap multiplier is a delayed copy of . Thus, the statistics of the data input are the same for all taps. Therefore, we present an energy dissipation model of a multiplier, which is a function of the coefficient input only.
A. Multiplier Energy Models
We will assume that a -b signal is being multiplied by a -b constant coefficient . The constant coefficient assumption is valid for adaptive filters if we assume that the WUD-block is powered-down after convergence. The multiplier energy model is based on two estimates of the transition activity in an array multiplier [33]. The first estimate is given by the number of ones in a binary representation of the coefficient . The justification for this estimate is that if the th bit in the coefficient is zero, then the transition activity in the th row of the array multiplier is reduced. In [34], a similar expression was employed in estimating the energy dissipation of a shift-add multiplier. Let be the two's complement representation of the coefficient . The number of ones is then given by (13) A linear model based on was regressed against real-delay energy consumption values obtained via the gate-level simulation tool MED [35] with typical delay values for a 0.18-μm 2.5-V CMOS technology. It was found that underestimates transition activity in the array multiplier by an error of approximately 14%. This is due to the fact that, in this model, the transition activity in a row due to the signal propagation from the previous row is ignored.
The second estimate is based on the number of zeros in the least significant bit (LSB) positions in the two's complement representation of . For example, has two zeros at the LSB positions. It can be shown that, in an array multiplier, the rows corresponding to these zeros have very little transition activity. Therefore, the energy model is given by the difference of the coefficient precision and the number of LSB zeros. This number is computed as (14) where is the bit-wise complement of . Regression of a linear model based on with real-delay energy values based on MED [35] indicated that (14) overestimates the multiplier energy by 75%.
As the two estimates (13) and (14) underestimate and overestimate the transition activity, respectively, an accurate energy model can be obtained by taking their weighted sum as follows: (15) where and are obtained from (13) and (14), respectively. In Fig. 3(a) , we show regression of the model in (15) against the real-delay energy consumption values. It was found that the model in (15) is accurate with less than 9% error as compared to a real-delay gate-level simulation.
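The two activity estimates and their weighted sum can be illustrated as follows. This is a sketch under stated assumptions: coefficients are treated as signed integers in two's complement, and the weights passed to `multiplier_energy` are placeholders, since the regression weights of (15) are not reproduced here. All function names are hypothetical.

```python
def twos_complement_bits(coeff, precision):
    """Bits of a signed integer coefficient, least significant bit first."""
    if not -(1 << (precision - 1)) <= coeff < (1 << (precision - 1)):
        raise ValueError("coefficient does not fit in the given precision")
    word = coeff & ((1 << precision) - 1)
    return [(word >> i) & 1 for i in range(precision)]

def ones_count(coeff, precision):
    """First estimate: number of ones in the coefficient word."""
    return sum(twos_complement_bits(coeff, precision))

def active_rows(coeff, precision):
    """Second estimate: precision minus the number of zeros at the LSB end."""
    trailing_zeros = 0
    for bit in twos_complement_bits(coeff, precision):
        if bit:
            break
        trailing_zeros += 1
    return precision - trailing_zeros

def multiplier_energy(coeff, precision, weight_ones, weight_rows):
    """Weighted sum of the two estimates; the weights are placeholders."""
    return (weight_ones * ones_count(coeff, precision)
            + weight_rows * active_rows(coeff, precision))

# 0b01101100 has four ones and two zeros at the LSB positions.
estimate = multiplier_energy(0b01101100, 8, 0.6, 0.4)
```

Because one estimate errs low and the other errs high, any convex combination lies between them; a regression against gate-level data would fix the actual weights.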
An architecture-level implementation of the hardware that evaluates the energy model in (15) is shown in Fig. 3(b) , where is the th b of coefficient . If such energy models for the SPA hardware are not available, then a look-up table (ROM) can be used for storing all possible energy values. Note that the models based on closed-form expressions such as (15) are useful in determining the energy-optimum configurations, but are not used to estimate the energy savings. 
B. Reconfigurable Adaptive Filters
In Fig. 4, we present a reconfigurable architecture for an adaptive filter, where we have modified the architecture in Fig. 2 by introducing control signals and , which power up/down the th tap. For example, setting forces a zero at the input to the F-block multiplier of the th tap and bypasses the F-block adder. Additional energy savings are possible by zeroing out or freezing . This is not done because it was found via a real-delay gate-level simulation that the energy dissipation in such a case is less than 5% of the average multiplier energy. In addition, it is assumed that reconfigurations in Fig. 4 occur once every samples, where is large (e.g., for the NEXT-canceller application). Therefore, energy dissipation due to the transition in the multiplier coefficient input from to 0 is ignored. For applications where fast reconfigurations are required, it might be better to instead hold the inputs to the previous value and force the output to zero. This can be done by replacing multiplexers at the multiplier inputs by pass transistors. Such an approach will also be useful for folded architectures (e.g., in fractionally spaced equalizers). If needed, additional multiplexers can be employed to power down the adder in F-block. Thus, the th tap in the F-block can be powered down by setting . Similarly, powers down the multiplier in the th tap of the WUD block.
Additional energy savings can be achieved by changing the precisions of the input signal and coefficients. In [36], the coefficient precision was varied by providing a gain at the output of the filter. By changing the gain, the adaptive filter can be made to converge to new coefficients with smaller precision. In this paper, we adjust the precision by forcing the LSB's to a value of "0." The advantage of this scheme is its quick convergence and simplicity of implementation. Forcing the LSB's to "0" does not vary the coefficients by a large amount. Thus, the adaptive filter tracks the new optimum coefficients (at optimum precision) very quickly. In Section IV, we will prove that the coefficient precision is a logarithmic function of the number of powered-up taps, and the precision requirements do not change by more than 2 b. Therefore, the reconfiguration of precision can be implemented with a very small hardware overhead consisting of two AND gates for each tap.
The energy dissipation of an -tap reconfigurable adaptive filter employed in the SPA block is given by (16) where and are the energy dissipations for the F and WUD blocks, respectively. It can be seen that is given by (17) where is the energy dissipation of the multiplier in the F-block and th tap. Similarly, the energy dissipation of the WUD block is given by (18) where is the energy dissipation of a weight-update block multiplier. It is worth mentioning that two inputs to all the multipliers in the WUD-block (see Fig. 2 ) have the same statistics and, therefore, all WUD-block multipliers are assumed to consume the same energy. In case of a sign-LMS algorithm and powers-of-two LMS algorithm, can be replaced by the energy consumption of a shifter. As there is no need to update a tap if it is powered down or if the filter has converged, then in these two cases, we will force . Therefore, after convergence, the critical path delay in the F-block is given by (19) where is the computation delay of a two-to-one multiplexer (2 1 mux). The critical path delay in (19) will be maximum when all the taps in adaptive filter are powered up. The maximum critical path delay is obtained by substituting in (19) and is given by (20) However, when some of the taps in the F-block are powered down (i.e., for some ) then in (19) will be smaller than in (20) . The reduced critical path delay for these cases can be exploited to save energy further by lowering the supply voltage by an appropriate amount, as described in Section II-B. It is worth noting that the effectiveness of critical path change is dependent on the relative values of to other terms in (20) .
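The energy bookkeeping for the reconfigurable filter can be sketched as below. The sketch counts multiplier energy only, assumes one shared energy value for every WUD-block multiplier (as the text argues), and uses hypothetical names and numbers; adder and multiplexer contributions, if (17) and (18) include them, are not modeled.

```python
def filter_energy(f_enable, wud_enable, f_mult_energy, wud_mult_energy):
    """Energy per sample of a reconfigurable adaptive filter.

    f_enable / wud_enable: 0/1 control signals, one per tap.
    f_mult_energy: per-tap F-block multiplier energy (coefficient dependent).
    wud_mult_energy: energy of one WUD-block multiplier (same for all taps).
    """
    if not len(f_enable) == len(wud_enable) == len(f_mult_energy):
        raise ValueError("per-tap lists must have the same length")
    e_f = sum(a * e for a, e in zip(f_enable, f_mult_energy))
    e_wud = wud_mult_energy * sum(wud_enable)
    return e_f + e_wud

# Hypothetical 4-tap filter with tap 1 powered down.
tap_energy = [2.0, 1.0, 1.5, 3.0]
adapting = filter_energy([1, 0, 1, 1], [1, 0, 1, 1], tap_energy, 2.5)
converged = filter_energy([1, 0, 1, 1], [0, 0, 0, 0], tap_energy, 2.5)
```

The gap between `adapting` and `converged` shows why switching off the WUD block after convergence matters in this model.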
C. Complex Adaptive Filters
In this section, we consider complex adaptive filters with complex-valued coefficients (where and are real and imaginary parts of ) processing the complex-valued data (where and are real and imaginary parts of ). These filters are commonly employed in many communications systems including the NEXT canceller, to be described in Section V. In particular, we will employ the low-power strength-reduced (SR) [3] architecture shown in Fig. 5. This architecture can be derived by viewing complex filtering as polynomial multiplication and then applying the strength-reduction transformation [15] at the algorithmic level. The reconfigurable SR adaptive filter architecture can be derived from Figs. 4 and 5. The energy dissipation of the SR adaptive filter is given by (21) where and indicates that the multipliers in the th tap of the F and WUD blocks are powered down.
In Section IV, we develop energy-optimum reconfiguration strategies for the reconfigurable adaptive filters presented in this section.
IV. DAT
In this section, we present DAT for low-power reconfigurable signal processing, specifically for adaptive filters. The motivation for DAT is that the worst-case scenario is usually not the nominal scenario. Hence, significant energy efficiencies can be gained by having an SMA block (see Fig. 1 ) that monitors the input state and then reconfigures the SPA block. This naturally leads to the definition of the input and configuration state-spaces discussed in Sections IV-A and IV-B, respectively. These definitions are then employed in formulating an energy optimization problem discussed in Section IV-C. The solution to this problem is derived in Section IV-D, which results in an energy-optimal reconfiguration strategy. In Section IV-E, we compute the energy savings due to DAT. The DAT technique is employed in a system identification example in Section IV-F.
A. Input State-Space
We model the nonstationarities in the input via the definition of an input state-space as follows. 
Definition 1: The input state-space
, where is a vector of input-dependent parameters. The input state at time instant , , will be in state (i.e., ) with a probability . As an example, consider the case of a 155.52-Mb/s ATM-LAN networked office building with 100 workstations, out of which 80 workstations are approximately 70 m from the closet and the remaining are distributed equally between 40 and 100 m. In this case, the state-space will have three elements, i.e., , , and and the probability of occurrence , , and . The elements of vector would be the parameters of interest in the design of an ATM-LAN receiver. For example, one parameter of interest is the peak-to-average ratio (PAR), which is defined as the ratio of the peak and the root-mean-squared values of the input signal. Thus, when the input state , the corresponding PAR (denoted as ) is useful in determining the input precision . Thus, , where , , and are the input signal energy, input PAR, and input SNR, respectively. Hence, the state-space components are given by dB dB dB dB dB dB dB dB dB
In Section V, we will employ to model different lengths of the cable for 155.52-Mb/s ATM-LAN. Usually, is monitored over a window of samples.
B. Configuration State-Space
A reconfigurable hardware fabric is characterized by its configuration state-space, which is defined below.
Definition 2: The configuration state-space , where is a vector of reconfiguration control signals. The hardware fabric will be in configuration at a given time instant where . For an -tap reconfigurable adaptive filter (see Fig. 4 can have values from 1.5 to 2.5 V with a step of 0.5 V, then , , , , and , are control words with eight, eight, three, two, and two bits, respectively. Thus, is a control word with 23 bits and the number of configurations . The above example indicates how easily the number of configurations explodes with an increase in reconfigurable parameters. We are interested in determining the energy-optimum configuration for every input state from a total of possible configurations while satisfying an MSE constraint. Due to the large number of possible configurations , it becomes important to develop a systematic approach to obtaining . In order to develop such an approach, we formally define the energy-optimum configuration as follows.
Definition 3: The energy-optimum configuration for a given input state is defined as
where is energy consumed by the SPA block in configuration , is the specified MSE and is the MSE achieved by the SPA block when the input is in state and the SPA block is in configuration .
If the SPA block consists of the adaptive filter in Fig. 4, then is given by (16) (for a real adaptive filter) or (21) (for a complex adaptive filter) and is given by (23) where , , and are as defined in Section II-A. Thus, in (22), the objective function is tied to the hardware fabric and the constraint is dependent upon the application at hand. Next, we formulate the energy optimization problem whose solution will result in an energy-optimum reconfiguration strategy.
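Definition 3 can be read as a constrained search over the configuration state-space. The sketch below performs that search exhaustively over tap power-down signals only, assuming a white input so that the achieved MSE follows the form of (5) with powered-down taps removed from the sum. The function name and all numbers are hypothetical, and the search visits 2^N candidates, so it is only practical for small N.

```python
from itertools import product

def optimum_configuration(coeffs, tap_energy, sigma_d2, sigma_x2, mse_spec):
    """Exhaustive search for the lowest-energy tap configuration.

    Returns (energy, enable) for the feasible configuration with the least
    energy, or None if no configuration meets the MSE specification.
    """
    best = None
    for enable in product((0, 1), repeat=len(coeffs)):
        # MSE model assumed from (5): disabled taps contribute nothing.
        mse = sigma_d2 - sigma_x2 * sum(a * w * w
                                        for a, w in zip(enable, coeffs))
        if mse > mse_spec:
            continue
        energy = sum(a * e for a, e in zip(enable, tap_energy))
        if best is None or energy < best[0]:
            best = (energy, enable)
    return best

# Hypothetical 4-tap case: signal power 0.8029 plus noise power 0.001.
result = optimum_configuration(
    coeffs=[0.05, 0.8, 0.02, 0.4],
    tap_energy=[1.0, 2.0, 1.0, 1.5],
    sigma_d2=0.8039, sigma_x2=1.0, mse_spec=0.005)
```

The exponential cost of this loop is the motivation for deriving a closed-form strategy instead of enumerating configurations.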
C. Energy Optimization Problem
In its most general form, the energy optimization problem can be written as s.t. (24) Note that the optimization problem in (24) is independent of the hardware platform, i.e., one could potentially have a platform based on an FPGA, programmable digital signal processor (software DAT) or a multiprocessor [17] . This is because the hardware-specific parameters can be incorporated via the definition of and the configuration state-space . In this paper, we have focused on a dedicated implementation of a reconfigurable DSP. We simplify the problem in (24) by solving it independently for each input state . The energy optimization problem for the reconfigurable adaptive filter in Fig. 4 is given by s.t. (25) Note that we do not include the WUD-block reconfiguration signals 's in the optimization problem because we assume that after the adaptive filter has converged, i.e., the WUD-block is powered down. Next, we determine the solution to (25) via the Lagrange multiplier method [37] to obtain a practical reconfiguration strategy.
D. Energy-Optimum Reconfiguration Strategy
A block-level diagram of a DAT-based adaptive filter is shown in Fig. 6 . The SPA block has the architecture shown in Fig. 4 and the SMA block computes the energy-optimum reconfiguration strategy (to be derived in this section) for the SPA block.
The first sub-block in the SMA block detects the value of the input state . The energy-optimum values of the configurable parameters , , , , and are then computed in other sub-blocks as a solution to the energy optimization problem (25) presented in Section IV-C. In order to keep the reconfiguration strategy simple, we propose to solve the optimization problem in (25) through the following three steps.
Step 1) Determine the optimum values of control signals and via the Lagrange multiplier method [37] .
Step 2) Determine the optimum precisions and via (31) and (36), respectively.
Step 3) Determine the optimum supply voltage from (11) and (12) and (19) and (20) . Note that it is possible to stop at any of the steps indicated above to obtain increasingly suboptimal low-energy solutions. We next describe the above three steps in more detail.
1) Energy-Optimum Choice of and (Step 1):
We define a dual optimization problem as follows: (26) where is a constant referred to as the Lagrange multiplier [37] and (27) where is a constant multiplier, is the th coefficient, and is the energy of a multiplier that has at one of the inputs.
It can be shown [38] that the solution to (26) is given by (28) where is a constant. Equation (28) indicates that it is better to power down the taps with small values of the energy-normalized metric , where is defined as (29) Intuitively, small values of imply that the th tap contributes less to the SNR (as is small), but consumes more energy ( is large). In practice, we do not need to compute the constant in (28) if we employ the reconfiguration strategy of powering down the taps, starting with those with the smallest value of , until the MSE constraint is violated. It is worth mentioning that the solution to Step 1 depends upon the multiplier architecture. The optimum value for is chosen as zero if either or the filter has converged.
If is assumed to be independent of , then the energy-optimum reconfiguration strategy would be to switch off taps with the smallest coefficients. Thus, the strategy proposed in [18] for low-pass filters (LPF's) falls out as a special case of (28) . Note that the reconfiguration in (28) is different from the case where only the end taps are powered down [36] . Finally, our strategy has been derived by solving an optimization problem and, hence, is guaranteed to be energy optimal under the SNR constraint. In Section V, we will demonstrate the application of this strategy to a NEXT canceller in a 155.52-Mb/s ATM-LAN transceiver.
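The practical form of Step 1 (power down taps in increasing order of the energy-normalized metric until the MSE constraint is violated) can be sketched as follows. The metric is taken here as squared coefficient magnitude divided by multiplier energy, and the MSE is modeled as in (5) with powered-down taps dropped; both are assumptions consistent with the description above, and the names and numbers are hypothetical.

```python
def power_down_taps(coeffs, tap_energy, sigma_d2, sigma_x2, mse_spec):
    """Greedy tap power-down ordered by |w_k|^2 / E_k (energies must be > 0)."""
    enable = [1] * len(coeffs)

    def mse(en):
        # Assumed MSE model: disabled taps no longer cancel their share.
        return sigma_d2 - sigma_x2 * sum(a * w * w for a, w in zip(en, coeffs))

    order = sorted(range(len(coeffs)),
                   key=lambda k: coeffs[k] ** 2 / tap_energy[k])
    for k in order:
        enable[k] = 0
        if mse(enable) > mse_spec:
            enable[k] = 1   # constraint violated: restore this tap and stop
            break
    return enable

# Same hypothetical 4-tap case: taps 0 and 2 are small and get powered down.
enable = power_down_taps(
    coeffs=[0.05, 0.8, 0.02, 0.4],
    tap_energy=[1.0, 2.0, 1.0, 1.5],
    sigma_d2=0.8039, sigma_x2=1.0, mse_spec=0.005)
```

In this example the powered-down taps are not at the ends of the filter, which is the case an end-tap-only scheme cannot reach.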
2) Energy-Optimum Choice of Precisions (Step 2):
Let the coefficients in the F-block be represented by bits. Assuming a linear stochastic model for the fixed-point error in the coefficients , the MSE for an -tap fixed-point FIR filter is given by [39] (30)
where the second term in (30) is the coefficient quantization error due to the fixed-point implementation and is the MSE for the floating-point algorithm.
From (30) , we see that in order to maintain a fixed quantization error, the required coefficient precision decreases with the filter length. Therefore, coefficient precision can be reduced when the taps get powered down in the reconfigurable datapath. From (28) and (30), the optimum coefficient precision is obtained as follows: (31) where is the maximum precision required when all the taps are powered up. The reduction in precision [see (31) ] with the number of taps is very small. In fact, one bit reduction in the precision is achieved for each four times reduction in filter length.
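The statement that one bit is saved for every fourfold reduction in filter length suggests a rule of the form B = B_max - (1/2) log2(N / N_on). The sketch below implements that rule as an illustration; it assumes the quantization term in (30) scales as N * 2^(-2B), rounds up so the quantization error never exceeds the full-length value, and uses hypothetical names.

```python
import math

def coefficient_precision(max_precision, total_taps, active_taps):
    """Coefficient bits needed when only active_taps of total_taps are on."""
    if not 0 < active_taps <= total_taps:
        raise ValueError("active_taps must be in 1..total_taps")
    exact = max_precision - 0.5 * math.log2(total_taps / active_taps)
    # Round up: never allow more quantization error than the full filter.
    return math.ceil(exact)

# Hypothetical 32-tap filter designed with 12-bit coefficients.
bits_full = coefficient_precision(12, 32, 32)
bits_quarter = coefficient_precision(12, 32, 8)
bits_sixteenth = coefficient_precision(12, 32, 2)
```

Even a sixteenfold reduction in active taps saves only two bits under this rule, which matches the remark that precision changes slowly with filter length.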
Similarly, the input precision can be determined from the expression for the signal-to-quantization noise ratio at the input (SQNR). Let be the input signal with the maximum value and mean squared value . Assuming that we employ bits to quantize and that the quantization noise is a uniformly distributed signal over the interval where , we obtain the quantization noise power as follows: (32) From (32), the SQNR (dB) can be obtained as dB (33) which can be further simplified to obtain dB dB (34) where PAR (dB) is the PAR at the input, and is defined as dB (35) For an equalizer, we can assume that an automatic gain control (AGC) block normalizes the input signal so that the input signal range matches that of the analog-to-digital converter (ADC). The root mean square (rms) value can then be computed by taking the square root of the time average of the squared input signal. The optimum precision can be computed from (34) as follows:
where is the worst-case PAR and is the maximum input precision. We keep the data precision fixed in the NEXT canceller for a 155.52-Mb/s ATM-LAN, as it was found that these do not change.
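For a uniform quantizer spanning the full input range, the standard result is SQNR(dB) = 6.02 B + 4.77 - PAR(dB), so every 6.02 dB of PAR below the worst case frees one bit. The sketch below applies that relation; the rounding direction (keep at least the worst-case SQNR) and the function names are assumptions, not taken from (36).

```python
import math

DB_PER_BIT = 20 * math.log10(2)    # about 6.02 dB per bit
OFFSET_DB = 10 * math.log10(3)     # about 4.77 dB

def sqnr_db(bits, par_db):
    """SQNR of a full-range uniform quantizer for a given input PAR."""
    return DB_PER_BIT * bits + OFFSET_DB - par_db

def input_precision(max_bits, worst_par_db, par_db):
    """Fewest bits that still match the SQNR of the worst-case design."""
    if par_db > worst_par_db:
        raise ValueError("PAR exceeds the worst case the design allows")
    saved_bits = math.floor((worst_par_db - par_db) / DB_PER_BIT)
    return max_bits - saved_bits

# Hypothetical case: 10-bit worst-case design at 15-dB PAR, current PAR 8 dB.
bits = input_precision(10, 15.0, 8.0)
```

Here a 7-dB drop in PAR releases one bit while keeping the SQNR at or above its worst-case value.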
3) Energy-Optimum Choice of Supply Voltage (Step 3):
In reconfigurable systems where variable supply voltage generators are available (such as in [31] and [32]), an energy-optimum value of (if computed) can be employed to provide a coarse initial estimate to the tracking loop. In this section, we demonstrate how the energy-optimum value for can be computed. The critical path delay of an adaptive filter is obtained by substituting , in (19). Next, we obtain the normalized processing rate by substituting (19) in (11) as follows: (37) where is the sample period. The optimum supply voltage can now be obtained by substituting into (12). Thus, we have presented a practical reconfiguration strategy to determine the configuration parameters , , , , and for a DAT-based adaptive filter, as shown in Fig. 6. In the following section, we compute energy savings due to a DAT-based system over a traditional worst-case design.
E. Energy Savings Via DAT
The average energy dissipation of the SPA block is given by (38) where is the energy-optimum configuration corresponding to state obtained as a solution to (24) . The SMA block is activated after samples and that too only if there is a transition in the state. For sufficiently large, the energy dissipation of the SMA block will be negligible. The total average energy dissipation per sample of a DAT-based system is obtained as (39) where, in general, will be negligible as compared to . The average energy savings due to a DAT-based system is given as (40) where is the energy dissipation of the worst-case design. In fact, equals the maximum value that in (38) can assume over all possible states .
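The averaging in this subsection amounts to a probability-weighted sum over input states, compared against the most expensive state. The sketch below assumes savings are reported as a fraction of the worst-case energy and treats the SMA overhead as an optional additive term; the state probabilities and energies are hypothetical.

```python
def average_energy(probabilities, energies):
    """Expected SPA energy per sample over the input states."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * e for p, e in zip(probabilities, energies))

def energy_savings(probabilities, energies, sma_energy=0.0):
    """Fractional savings of a DAT-based system over the worst-case design."""
    worst_case = max(energies)   # the worst-case design always pays this
    dat_energy = average_energy(probabilities, energies) + sma_energy
    return (worst_case - dat_energy) / worst_case

# Hypothetical three-state example; energies are normalized to the worst case.
savings = energy_savings([0.1, 0.8, 0.1], [0.13, 0.30, 1.00])
```

The result depends strongly on how much probability mass sits on the low-energy states, which is why the state probabilities are part of the input state-space definition.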
F. Example: System Identification
In this subsection, we apply DAT to a system identification example in order to demonstrate the concepts presented thus far. The problem is to estimate the impulse response of an unknown system via an adaptive filter, as shown in Fig. 7(a). The unknown system can represent an echo path in a voiceband modem or a crosstalk path in a high-speed data modem, where adaptive filters are usually employed to identify the unknown echo/crosstalk signal and then cancel it. Because the number of taps required by the adaptive filter varies with the unknown system, DAT-based approaches can achieve significant energy savings over traditional designs based upon worst-case assumptions.
We assume that the unknown system can be in five distinct states (i.e., ) with impulse responses for each of the states, as shown in Fig. 7(b). Such impulse responses can be encountered in a wireless channel with multipath fading. We assume that the probability of occurrence of these states is given by , , , , and . The input signal is uncorrelated with variance , and we assume a noise of variance (i.e., dB) at the output of the unknown system. As the signal in Fig. 7(a) represents the residual echo/crosstalk, we are interested in minimizing . Hence, we set the desired MSE, . For the DAT-based adaptive filter, assume an input precision of bits, coefficient precision of bits, maximum number of taps equal to taps, and maximum supply voltage of V. Also, assume the computational delays of the hardware blocks to be ns, ns, ns, ns, and a sampling rate of 50 MHz. Further assume that the hardware platform permits the powering-up of taps via control signals , reconfiguring the coefficient precision and supply voltage . Table I shows the optimum configurations achieved by employing the strategy presented in Section IV-D. When the input state varies from to , the number of powered-up taps decreases from 8 to 2, the Table I , is given by and , respectively, where 0 indicates that a tap is powered down, and 1 indicates that a tap is powered up. Thus, for states and , the energy-optimum configuration requires powering down internal taps and, hence, could not have been obtained via existing approaches [18], [36].
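To make the idea of powering down internal taps concrete, the following is a minimal sketch of system identification with a tap-gated LMS filter. It assumes a plain LMS update, a white unit-variance input, and a binary control vector in which 1 means powered up and 0 means powered down. The impulse response, the mask, the step size, and the function name are hypothetical; they are not the responses of Fig. 7(b) or the configurations of Table I.

```python
import random


def lms_identify(h_true, tap_mask, n_samples=5000, mu=0.02, noise_std=0.0, seed=0):
    """Identify h_true with an LMS filter whose taps are gated by tap_mask.

    tap_mask[k] == 1 means tap k is powered up; 0 means it is powered down,
    so its coefficient is held at zero and it adds nothing to the output.
    """
    if len(h_true) != len(tap_mask):
        raise ValueError("h_true and tap_mask must have the same length")
    rng = random.Random(seed)
    n_taps = len(tap_mask)
    w = [0.0] * n_taps
    x_hist = [0.0] * n_taps  # x_hist[k] holds x(n - k)
    for _ in range(n_samples):
        # Shift in a new white, unit-variance input sample.
        x_hist = [rng.gauss(0.0, 1.0)] + x_hist[:-1]
        # Output of the unknown system plus optional measurement noise.
        d = sum(h * x for h, x in zip(h_true, x_hist)) + rng.gauss(0.0, noise_std)
        # Only powered-up taps contribute to the filter output.
        y = sum(wk * x for m, wk, x in zip(tap_mask, w, x_hist) if m)
        e = d - y
        # LMS update for powered-up taps; powered-down taps stay at zero.
        w = [wk + mu * e * x if m else 0.0
             for m, wk, x in zip(tap_mask, w, x_hist)]
    return w


# Hypothetical sparse response: the significant taps are not contiguous,
# so the mask powers down internal taps rather than only trailing ones.
h = [0.9, 0.0, 0.0, -0.4, 0.0, 0.0, 0.2, 0.0]
mask = [1, 0, 0, 1, 0, 0, 1, 0]
w = lms_identify(h, mask)
print([round(wk, 4) for wk in w])
```

With this mask only three of the eight taps are active, yet the residual error is unchanged because the powered-down taps correspond to zero-valued coefficients of the assumed response.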
The energy dissipation of the SPA block in each of the five states is also shown in Table I. The energy dissipation for the worst-case design corresponds to state , which requires the maximum number of powered-up taps, the maximum precision, and the maximum supply voltage . The energy savings, shown in Table I, are computed by employing (40). Energy savings range from 0% to 74% as the input state changes from to . The average energy savings are computed via (40) and are shown in Table I. Average energy savings of 39% are achieved for this example. As can be seen, the average energy savings due to DAT depend upon the relative energy dissipation and probability of occurrence for the states corresponding to the nominal and the worst cases. Large energy savings can be expected for situations where , and the state corresponding to the worst case is not very likely. In Section V, we employ the DAT-based adaptive filter as a NEXT canceller for a 155.52-Mb/s ATM-LAN.
V. APPLICATION TO 155.52-MBITS/S ATM-LAN
In this section, we study the performance of the proposed DAT-based system in a high-speed digital communication system. We employ the DAT-based adaptive filter as a NEXT canceller for a data rate of 155.52 Mb/s over UTP wiring [40]. Fig. 8 shows a vendor's view of an asynchronous transfer mode (ATM)-based LAN. The environment of interest for the UTP category-three (UTP-3) user network interface (UNI) consists of "I1" and "I2" interfaces (see Fig. 8). The wiring distribution system runs either from the closet to desktop or between hubs in the closets. The wiring employed consists mostly of either a TIA/EIA-568 UTP-3 four-pair cable or the DIW 10 Base-T 25-pair bundle. The propagation loss for these channels increases rapidly with an increase in the frequency of operation. Therefore, bandwidth-efficient transmission schemes become necessary to support such high data rates over these channels. The carrierless amplitude phase (CAP) transmission scheme is such a scheme and is the standard [41] for 155.52-Mb/s ATM-LAN over UTP-3 wiring.
In the LAN environment, the two major causes of performance degradation for transceivers operating over UTP wiring are: propagation loss and crosstalk generated between adjacent wire pairs. The propagation loss that is assumed in system design is the worst-case loss given in the TIA/EIA-568 draft standard for category-3 cable [42] . This loss can be approximated by the following expression: (41) where the propagation loss is expressed in decibels per 100 m and the frequency is expressed in megahertz. The worst-case NEXT loss model for a single interferer is also given in the TIA/EIA Draft Standard [42] . The squared magnitude of the NEXT transfer function corresponding to this loss can be expressed as (42) where the frequency is in megahertz, and is expressed in decibels. In Section V-A, we briefly describe the CAP transceiver for a 155.52-Mb/s ATM-LAN. The interested reader is referred to [40] for further details. The simulation setup is described in Section V-B, while the simulation results are presented in Section V-C.
A. 155.52-Mb/s ATM-LAN Transceiver
The block diagram of a digital CAP transceiver is shown in Fig. 9. The bit stream to be transmitted is first passed through a scrambler. The scrambled bits are then fed into an encoder, which maps blocks of bits onto one of different complex symbols for a -CAP line code. In this study, we have employed because this value is necessary to limit the transmit spectrum to 30 MHz, a limit set by the Federal Communications Commission (FCC). The symbols and are processed by digital shaping filters. This requires that the shaping filters be operated at a sampling frequency , which is at least twice the maximum frequency component of the transmit spectrum. The outputs of the filters are subtracted and the result is passed through a digital-to-analog (D/A) converter, which is followed by an interpolating LPF. It can be seen that most of the signal processing at the transmitter (including transmit shaping) is done in the digital domain, which permits a robust VLSI implementation.
In the receiver (see Fig. 9), the analog signal is first amplified by a programmable gain amplifier (PGA) whose gain is controlled by a digital PGA control block. The output of the PGA is passed to an analog-to-digital (A/D) converter operating at 77.76 MHz, which converts the analog signal into a digital signal. The sampling instant of the A/D is controlled by a digital timing recovery block. The digital output of the A/D is processed by a fractionally spaced linear equalizer (FSLE). In addition, the local transmitted symbols are passed through a complex adaptive NEXT canceller, which tries to cancel the effect of the NEXT in the received signal. The algorithmic performance measure in this case is the SNR at the slicer, which is equal to the ratio of signal constellation power (which equals 42 for 64-CAP) to the MSE across the slicer. Hence, we have the following relation: $\mathrm{SNR\ (dB)} = 10\log_{10}(42/\mathrm{MSE})$ (43)
Henceforth, we employ the slicer SNR as the algorithmic/system performance measure. For 64-CAP, a slicer SNR of 29.45 dB is sufficient to obtain a probability of error less than $10^{-10}$.
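The constellation power quoted above and the mapping between slicer MSE and slicer SNR can be checked with a short sketch. It assumes the usual square 64-point constellation with in-phase and quadrature levels of ±1, ±3, ±5, ±7, which is consistent with the stated power of 42; the function names are hypothetical.

```python
import math


def cap64_constellation_power():
    """Average power of a square 64-point constellation with levels ±1, ±3, ±5, ±7."""
    levels = [-7, -5, -3, -1, 1, 3, 5, 7]
    points = [(a, b) for a in levels for b in levels]
    return sum(a * a + b * b for a, b in points) / len(points)


def slicer_snr_db(mse, signal_power=42.0):
    """Slicer SNR in dB: constellation power divided by the MSE across the slicer."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(signal_power / mse)


def mse_for_snr_db(snr_db, signal_power=42.0):
    """Inverse mapping: the slicer MSE that corresponds to a given slicer SNR."""
    return signal_power / (10.0 ** (snr_db / 10.0))


print(cap64_constellation_power())
print(round(mse_for_snr_db(29.45), 4))
```

The inverse mapping is convenient when a design is specified in terms of a desired MSE rather than a desired SNR.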
The complexity requirements for the NEXT canceller increase as the cable length increases. Traditionally, the NEXT canceller is designed for the worst-case scenario, i.e., the longest cable length. However, for shorter cable lengths, the complexity of the NEXT canceller can be reduced and, thus, substantial energy savings can be achieved. In this experiment, we employ a DAT-based NEXT canceller to enable the energy savings possible due to the different cable lengths.
B. Simulation Setup
We assume a spatial variation in the length of the UTP-3 cable from 110 to 40 m (see Fig. 8) with the lengths having a Gaussian distribution with a mean of 75 m, as indicated in Table II. Usually, an estimate of the probability distribution of the cable lengths can be obtained from surveys, which are currently not available. Thus, the state-space has eight states. In our simulations, we emulate the spatial variation of The received signal power depends upon the attenuation of the channel and, hence, the cable length. Furthermore, the input SNR is also a function of the received signal power and this determines the performance of the receiver. We define the input states (corresponding to the eight cable lengths in Table II The changes in the input state can be detected by monitoring the slicer SNR. In particular, we compute the MSE by averaging over 1024 symbols and substitute its value into (43) to obtain the slicer SNR. As the NEXT canceller is a complex adaptive filter, we employ the SR architecture proposed in [3]. The SR architecture enables 21% energy savings without any loss in SNR. However, the energy savings in this paper do not include those due to SR transformation. This is because the reference architecture for computing the energy savings is also based on the SR architecture. Assume that a slicer SNR of 31 dB (1.55 dB more than the minimum of 29.45 dB) is the desired performance level. Furthermore, if , then we assume that the input state has changed substantially so that a new SPA configuration for the NEXT canceller needs to be computed. We choose dB to remove undesired glitches in steady state. This will guarantee that the SNR is always better than 29.45 dB, thus keeping the bit error rate below $10^{-10}$.
We will assume ns, ns, and ns, ns. The sample period is ns. It is assumed that V and the SMA block always operates at 2.5 V. The worst-case design corresponds to number of powered up taps , coefficient precision bits, data precision bits, F-block adder precision , and supply voltage V. The data precision is kept constant at four bits because the input to the NEXT canceller belongs to the 64-CAP signal set , which can be represented with four bits.
For the energy consumption, it is assumed that standard cells based on 0.18-μm 2.5-V CMOS technology are employed. The energy consumption models for the arithmetic blocks are obtained by real-delay simulations via the gate-level simulation tool MED [35] and employed to compute the energy savings due to DAT. However, the SMA block employs the simple energy models (also supported by real-delay simulations), described in Section III-A, in order to compute the energy-optimum configuration, as presented in Section IV-D. We next present the simulation results for the DAT-based NEXT canceller.
C. Simulation Results
Consider the NEXT canceller designed for the worst case. The local transmitter (see Fig. 9) is switched on after 102 400 symbols. This introduces the NEXT interferer into the receiver, which the NEXT canceller attempts to cancel. In Fig. 10(a), the convergence plot of the slicer SNR shows that the performance of the worst-case design varies from 32 to 42 dB as the length is varied from 100 to 40 m. A slicer SNR of 29.45 dB ensures a probability of error less than $10^{-10}$. Hence, it is possible to trade off the excess performance for shorter cable lengths in order to achieve energy savings. We employ the DAT-based NEXT canceller to enable these energy savings.
Recall that for a DAT-based NEXT canceller, an SNR window of 31-34 dB was specified. This means that if the slicer SNR is less than 31 dB, then some of the taps are powered up to enhance the performance. Similarly, if the slicer SNR is greater than 34 dB, then some of the taps are powered down to achieve energy savings. Thus, the DAT-based NEXT canceller jointly optimizes energy dissipation and the slicer SNR. In Fig. 10(b), the algorithmic performance measure for a DAT-based NEXT canceller is plotted. It can be seen that the slicer SNR for the DAT-based system always lies in the window 31-34 dB during steady state, which guarantees adequate system performance. Whenever the channel length changes, there is a sudden decrease in the slicer SNR [refer to the peaks in Fig. 10(b)]. In that case, all the taps are turned on and the adaptive filter coefficients converge to their optimum settings. After convergence, the SMA block monitors the slicer SNR and determines the energy-optimum configuration for the NEXT canceller according to the strategy described in Section IV-D.
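The window rule described above can be sketched as a small decision function. This is only an illustration of the 31-34 dB hysteresis window. The one-tap step in `next_tap_count` is a hypothetical simplification; the SMA block described in the text instead computes the configuration with the strategy of Section IV-D. All function and parameter names are assumptions.

```python
def reconfigure_action(snr_db, snr_low_db=31.0, snr_high_db=34.0):
    """Decide what to do with the taps for a measured slicer SNR (in dB)."""
    if snr_low_db >= snr_high_db:
        raise ValueError("snr_low_db must be below snr_high_db")
    if snr_db < snr_low_db:
        return "power_up"      # performance too low: enable more taps
    if snr_db > snr_high_db:
        return "power_down"    # excess performance: trade it for energy
    return "hold"              # inside the window: keep the configuration


def next_tap_count(active_taps, snr_db, max_taps, step=1):
    """Hypothetical incremental policy: move the tap count by `step` per decision."""
    action = reconfigure_action(snr_db)
    if action == "power_up":
        return min(max_taps, active_taps + step)
    if action == "power_down":
        return max(1, active_taps - step)
    return active_taps


for snr in (30.2, 32.5, 36.0):
    print(snr, reconfigure_action(snr), next_tap_count(10, snr, max_taps=30))
```

Keeping a window rather than a single threshold avoids toggling taps on every small fluctuation of the measured SNR.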
The final configuration for each state (shown in Table II) indicates that the number of powered-up taps ranges from 4 to 30 for cable length variations from 40 to 110 m. Similarly, the coefficient precision varies from 11 to 12 bits. As there is a 4× reduction in the number of powered-up taps in going from state to , we obtain a reduction in coefficient precision by 1 b. Also, the supply voltage varies from 2.0 to 2.5 V for cable lengths ranging from 40 to 110 m. In Fig. 11, we plot energy savings [see (40)] for a DAT-based NEXT canceller employing array multipliers when the cable length varies from 110 to 40 m. The energy savings include the energy consumption in the SMA block. For the variable supply voltage case, the energy savings range from −2% to 89% for cable length variations from 110 to 40 m, respectively. Negative energy savings are due to the reconfiguration overhead that the traditional worst-case designs do not have. The energy savings range from −2% to 87% when the supply voltage is fixed and cable length is varied from 110 to 40 m. It was found that for the state probability distribution given in Table II, the average energy savings with the fixed supply voltage are 62%. Similarly, for the variable supply voltage case, the average energy savings were found to be 69%. Thus, for this application, only 7% additional energy savings over the fixed supply voltage case are achieved due to the variable supply voltage scheme. This is because the critical path delay varies only marginally with reduction in the number of powered-up taps. Larger energy savings can be expected in architectures where the critical path is a strong function of the powered-up taps. Thus, it can be seen that the DAT-based approach is quite attractive from the viewpoint of energy savings for the 155.52-Mb/s ATM-LAN application.
VI. CONCLUSIONS AND FUTURE WORK
We have proposed DAT's as a formal approach to the design of low-power reconfigurable DSP systems and have demonstrated their use in the design of a NEXT canceller for a 155.52-Mb/s ATM-LAN. The main contribution of the DAT-based approach is a systematic determination of energy-optimal reconfiguration strategies for any platform or application. Application of DAT techniques requires a proper understanding of the system requirements and the constraints imposed by the reconfigurable hardware fabric. Thus, the DAT approach embodies the growing trend of jointly addressing system and circuit design issues in order to obtain increasingly superior solutions.
DAT's have a broad range of applicability other than the ones presented in this paper. For example, DAT techniques can be applied for developing low-energy software for programmable DSP's, determining energy-optimal reconfiguration strategies for FPGA's, design of low-power wireless transceivers, lattice-based adaptive equalizers, forward error-correction (FEC) codecs, and computer-aided design (CAD) tools that enable the design of complex DAT-based DSP and communication systems.
