Abstract: Adaptive channel equalization algorithms are commonly used in wireless communications receivers to counter intersymbol interference, multipath dispersion, and other time-varying channel degradations. In this paper we obtain approximate expressions for the increase in mean square error of the LMS adaptive algorithm when the total processing power is decreased by reducing the number of data and filter coefficient bits used by the algorithm. We also obtain expressions for the power-optimal bit-allocation factor, which determines the proportion of the bits allocated to the data versus the coefficients. Numerical studies are presented for an exponential memory ISI channel and 4-ary PSK signalling. These studies indicate that as few as 8 bits total are needed to equalize the channel and that most of these bits (6 out of 8) should be allocated to the filter coefficients.
I. INTRODUCTION
In a battery powered receiver, the adaptive equalization function consumes a significant portion of total processing power. For example, the SINCGARS combat radio used by the US Army consumes on average 7 Watts in receive mode, of which more than 1 Watt is consumed by the channel equalizer [1]. Therefore the channel equalization function is a prime target for power reduction in these handsets. Many digital hardware design strategies have been proposed for power reduction, including: reduction of supply voltage, reduction of clock speed and data rate, parallelization and pipelining of operations, use of sign-magnitude arithmetic, and differential encoding of data [2], [3]. Another technique, which is the springboard for this paper, is the reduction of the number of bits used to represent the data and control variables in the digital circuit. This bit width reduction strategy is very highly leveraged since it reduces the power dissipation everywhere in the data and control flow paths. This strategy is also very versatile since it can be applied to any hardware architecture and can be easily adjusted in real time by dynamically switching off bus lines and register bits. However, bit width reduction generally entails a degradation in algorithm performance, as measured by adaptive algorithm convergence rate, steady state mean square error (MSE), and subsequent probability of bit error. This paper provides an analysis of MSE degradation versus power reduction for a widespread class of adaptive algorithms popularly known as the LMS algorithm.
The LMS algorithm was introduced by Widrow [4] and is one of the most common adaptation algorithms found in practical systems such as channel equalizers [5], adaptive antenna arrays [6], and interference cancellation systems [7]. The LMS algorithm adapts the filter coefficients of an FIR filter driven by an error signal formed by subtracting the training signal from the received data. We consider a quantized version of the LMS algorithm, called QLMS, which is an LMS algorithm implemented with separate uniform scalar quantizers in the data path and the filter coefficient path, where the quantizers can have different resolutions. To obtain significant power reduction, we propose applying different pairs of quantizers during the transient acquisition phase and the steady state tracking phase of the algorithm. In this paper we focus on the steady state tracking phase.
We first present a formula for the increase in steady state mean square error (MSE) due to quantization which generalizes the formulas of Caraiscos and Liu [8] to the case of complex data and coefficients. We then derive a pair of optimal bit-allocation factors which minimize the increase in MSE subject to: 1) a total bit-width constraint; and 2) a total power consumption constraint. Finally we show that QLMS with optimal bit-allocation consumes significantly less power than LMS at little expense in performance. While these results hold for the generic LMS algorithm in a variety of applications, we concentrate on the case of channel equalization with training in this paper. Numerical examples will be given for an IIR (exponential memory) channel which illustrate that the power can be reduced by more than a factor of 4 relative to the standard LMS implemented with 16 bit arithmetic and at negligible increase in MSE. This power reduction is achieved by a QLMS algorithm having a total of 8 bits and optimal bit allocation strategy consisting of assigning 2 bits to the data and 6 bits to the coefficients.
II. REGISTER LENGTH AND POWER
It is well known that the power consumed by the operation of loading successive time samples of a random sequence into a $B$-bit register is proportional to the average number of bit flips induced in the register [9]. While for a white sequence the average number of bit flips is $B$, in general this average can be much less than $B$ for a correlated sequence. This is because for correlated sequences higher order bits have a lower probability of transitioning than lower order bits. Several models for the power consumption of register loading have been proposed [9]. We propose to use a simple upper bound $P_B$ on the power consumption of a $B$-bit register, derived under the assumption of a zero mean wide sense stationary Gaussian random sequence. In this bound, referred to as (1), $\eta$ is the power dissipation-per-bit, which depends on factors such as load capacitance and supply voltage, and $R(\tau)$ is the autocorrelation function of the random sequence. Note that it is important that the bound (1) is conservative: if we constrain the right hand side of (1) to some maximum tolerable power dissipation, $P_{\max}$ say, then a circuit design which uses a register bit width $B_{\max}$ satisfying this constraint is guaranteed to consume less power than $P_{\max}$.
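The dependence of bit-flip activity on correlation can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the bound (1) itself: the two's complement number format, the full-scale range of 4 standard deviations, and the function names `ar1_sequence` and `average_bit_flips` are our own choices for illustration.

```python
import numpy as np


def ar1_sequence(pole, n, rng):
    """Unit-variance Gaussian AR(1) sequence x[k] = pole * x[k-1] + v[k]."""
    noise = rng.standard_normal(n) * np.sqrt(1.0 - pole ** 2)
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for k in range(1, n):
        x[k] = pole * x[k - 1] + noise[k]
    return x


def average_bit_flips(samples, bits, full_scale=4.0):
    """Average number of register bits that toggle per load.

    Assumed number format: `bits`-bit two's complement words covering
    [-full_scale, full_scale), with out-of-range samples saturated.
    """
    step = 2.0 * full_scale / (1 << bits)
    clipped = np.clip(samples, -full_scale, full_scale - step)
    codes = np.floor(clipped / step).astype(np.int64)
    words = codes & ((1 << bits) - 1)      # two's complement bit patterns
    toggled = words[1:] ^ words[:-1]       # bits that differ between loads
    return float(np.mean([bin(int(t)).count("1") for t in toggled]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for pole in (0.0, 0.9, 0.99):
        flips = average_bit_flips(ar1_sequence(pole, 20000, rng), bits=8)
        print(f"pole={pole}: {flips:.2f} bit flips per load")
```

In such a simulation the strongly correlated sequence toggles fewer bits per load than the white one, because its high order bits change rarely; the exact counts depend on the assumed number format and scaling.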
A plot of $P_B$ versus $B$ is given in Fig. 1 for an AR(1) sequence with real pole located at $a_1$. Note that as the pole approaches the unit circle the $P_B$ curve displays an abrupt threshold occurring at an increasingly high bit-width. The threshold bit-width can be specified by a formula.
III. QUANTIZED ADAPTIVE CHANNEL EQUALIZATION
Figure 2 shows the block diagram of an adaptive equalization system with two different quantizers, denoted $Q_d$ and $Q_c$, applied to the data and to the filter coefficients of an adaptive $p$-tap FIR filter.
The filter coefficients are updated by the LMS recursion with these quantizers applied, in which the error signal is itself quantized. Here $\mu$ is the gain parameter which controls the convergence properties of the algorithm. The total power per iteration of the quantized LMS algorithm is determined by the power dissipation of shift, add, multiply, memory load, and memory store operations. This depends on the specific design of the FIR filter and control circuitry. For illustration we will use an expression, referred to as (2), for the total power dissipation $P_T$ per iteration of LMS.
This expression is linear in the number of bits $B_d$ and $B_c$ and assumes fixed point complex arithmetic, overwriting the data stack without using shift operations, multiplication using table lookup as opposed to adding partial products, and generic power coefficients $\eta_g$ representing logic gate power consumption and $\eta_t$ representing table lookup operations.
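The quantized LMS recursion described at the start of this section can be sketched as follows. This is a minimal illustration under our own assumptions rather than the exact recursion analyzed here: the placement of the quantizers (on the data, the filter output, the error, and the updated coefficients), the saturating uniform quantizer with unit amplitude range, and the names `quantize` and `qlms` are all hypothetical.

```python
import numpy as np


def quantize(v, bits, amplitude=1.0):
    """Uniform quantizer with `bits` magnitude bits plus sign.

    Values saturate to [-amplitude, amplitude); complex inputs are
    quantized separately in their real and imaginary parts.
    """
    v = np.asarray(v)
    if np.iscomplexobj(v):
        return quantize(v.real, bits, amplitude) + 1j * quantize(v.imag, bits, amplitude)
    step = amplitude / (1 << bits)
    return np.clip(np.round(v / step) * step, -amplitude, amplitude - step)


def qlms(x, y, taps, mu, data_bits, coef_bits):
    """Quantized complex LMS; returns the final coefficients and error sequence."""
    w = np.zeros(taps, dtype=complex)
    xq = quantize(x, data_bits)                 # data path quantizer
    yq = quantize(y, data_bits)
    err = np.zeros(len(x), dtype=complex)
    for k in range(taps - 1, len(x)):
        window = xq[k - taps + 1:k + 1][::-1]   # most recent sample first
        out = quantize(np.vdot(w, window), data_bits)   # filter output w^H x
        err[k] = quantize(yq[k] - out, data_bits)       # quantized error
        # Coefficient path quantizer applied to the updated coefficients.
        w = quantize(w + mu * window * np.conj(err[k]), coef_bits)
    return w, err
```

Calling `qlms(x, y, taps=2, mu=0.01, data_bits=2, coef_bits=4)` would correspond to a 2-tap filter with 2 data bits plus sign and 4 coefficient bits plus sign; the signals should be prescaled to lie within the quantizer range.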
A. Performance of LMS Algorithm
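The performance of the LMS adaptive algorithm is typically characterized by two quantities: the speed of convergence and the excess MSE. We assume that $x_k$ and $y_k$ are jointly wide sense stationary. The algorithm then converges in the mean as long as the gain parameter $\mu$ satisfies the condition $0 < \mu < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the data covariance matrix $R_x$.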
When the QLMS algorithm converges, the MSE converges as a decaying exponential with the $1/e$ time constant of the slowest mode equal to $\tau_{3\mathrm{dB}} = 1/(-\max_i \ln|1 - \mu\lambda_i|)$, called the adaptation time constant, where the $\lambda_i$ are the eigenvalues of $R_x$. Note that the speed of convergence generally increases as $\mu$ increases.
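The adaptation time constant can be evaluated numerically from the eigenvalues of the data covariance matrix. The sketch below assumes that the mean-convergence condition takes the usual LMS form $|1 - \mu\lambda_i| < 1$ for every eigenvalue $\lambda_i$; the function name `lms_time_constant` and the example covariance matrix are hypothetical.

```python
import numpy as np


def lms_time_constant(covariance, mu):
    """Return (converges_in_mean, tau) for a Hermitian data covariance matrix.

    tau = 1 / (-max_i ln|1 - mu * lambda_i|), measured in iterations.
    """
    eigenvalues = np.linalg.eigvalsh(covariance)
    modes = np.abs(1.0 - mu * eigenvalues)      # per-mode decay factors
    if not np.all(modes < 1.0):
        return False, float("inf")              # some mode does not decay
    slowest = float(np.max(modes))
    if slowest == 0.0:
        return True, 0.0                        # all modes vanish in one step
    return True, 1.0 / (-np.log(slowest))


if __name__ == "__main__":
    # Hypothetical 2-tap covariance of a unit-variance AR(1) input with pole 0.8.
    R = np.array([[1.0, 0.8], [0.8, 1.0]])
    for mu in (0.01, 0.1, 1.2):
        print(mu, lms_time_constant(R, mu))
```

For this assumed covariance the eigenvalues are 0.2 and 1.8, so the small eigenvalue sets the slowest mode and a larger gain shortens the time constant until the convergence condition fails.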
A.2 Excess Mean Square Error
When the above condition for mean convergence of QLMS is met, an expression for the steady state mean square error can be derived. In this expression $\Delta\xi$ denotes the increase in MSE due to quantization, and $A_d$, $A_c$ are the maximum amplitude ranges of the data and filter coefficient quantizers. This expression applies to complex sequences and is derived in a manner very similar to the derivation of Caraiscos and Liu [8] for real-valued sequences. As in [8] we make several standard assumptions, including: the process $x_k$ is circularly Gaussian, the quantization error is a zero mean white sequence, and the quantization errors are independent of the data $x_k$ and the filter coefficients $w_k$. These assumptions are fairly restrictive but enable us to obtain useful closed form expressions.
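With these relations the increase in MSE due to quantization can be expressed in terms of the bit-widths $B_d$ and $B_c$; this expression is referred to below as (3).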
IV. OPTIMAL BIT ALLOCATION STRATEGIES
We present expressions for the optimum allocation of bits to data versus filter coefficients under two constraints: total number of bits and total power consumption. While a combined study of bit allocation as a function of convergence rate and excess MSE is of importance, for concreteness we limit the focus in this paper to the excess MSE.
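Assume that there are a total of $B_T + 2$ bits available to allocate between data and coefficients, i.e. $B_T = B_d + B_c$ bits in addition to the two sign bits. The bit allocation factor $\rho$ denotes the fraction of these bits allocated to the data, so that $B_d = \rho B_T$ and $B_c = (1 - \rho) B_T$.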
A. Total Bit-Width Constraint
Under a constraint on $B_T$ the objective becomes to minimize the increase $\Delta\xi$ in MSE with respect to $\rho$. Graphically, this is the same as minimizing $\Delta\xi$ along the diagonal line $B_T = B_d + B_c$ of slope $-1$ in the $(B_c, B_d)$ plane shown in Fig. 4. It is straightforward to show that $\Delta\xi$ is convex as a function of $\rho$ with a single minimum occurring at the point $\rho = \rho^*$, and that the minimum value $\min_\rho \Delta\xi$ is proportional to $2^{-B_T}$.
To obtain these concise relations we have assumed that the ranges of the quantizers are identical, $A_d = A_c = 1$. Observe that the optimal bit allocation factor $\rho^*$ converges to 1/2 as the combined register length $B_T$ goes to infinity. This is the regime where the standard allocation is optimal: allocate an equal number of bits to the data and to the filter coefficients. As the register length decreases or the convergence speed increases the standard allocation becomes suboptimal. In typical implementations, e.g. where AGC is implemented to prescale the data to unit variance, the gain parameter is chosen such that $\mu \ll 1$ to ensure convergence. In particular, if $\mu < 1/4$ then $\rho^*$ is less than 1/2 and more bits should be allocated to the filter coefficients than to the data. Also worth noting is that $\Delta\xi$ increases in the filter length $p$ at a linear rate, decreases in $\mu$ at an inverse square root rate, and decreases in $B_T$ at an exponential rate. Therefore, the total number of bits allocated gives more leverage over excess MSE than any other design parameter.
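The behavior of the optimal allocation factor can be explored numerically. The sketch below does not reproduce expression (3); it assumes a generic two-term model in which the data and coefficient quantization contributions decay as $2^{-2B_d}$ and $2^{-2B_c}$ with hypothetical weights `data_weight` and `coef_weight`. The closed form in `optimal_rho` is the minimizer of this assumed model only.

```python
import numpy as np


def delta_mse(rho, total_bits, data_weight, coef_weight):
    """Assumed model of the MSE increase for B_d = rho*B_T, B_c = (1-rho)*B_T."""
    data_term = data_weight * 2.0 ** (-2.0 * rho * total_bits)
    coef_term = coef_weight * 2.0 ** (-2.0 * (1.0 - rho) * total_bits)
    return data_term + coef_term


def optimal_rho(total_bits, data_weight, coef_weight):
    """Closed-form minimizer of the assumed model over rho."""
    return 0.5 + np.log2(data_weight / coef_weight) / (4.0 * total_bits)


def grid_search_rho(total_bits, data_weight, coef_weight, points=10001):
    """Brute-force minimizer over a uniform grid on [0, 1], as a cross-check."""
    grid = np.linspace(0.0, 1.0, points)
    values = delta_mse(grid, total_bits, data_weight, coef_weight)
    return float(grid[np.argmin(values)])


if __name__ == "__main__":
    mu = 0.01
    # Hypothetical weights, chosen only so that rho* < 1/2 exactly when mu < 1/4.
    a, b = 1.0, 1.0 / (4.0 * mu)
    for bits in (6, 16, 64):
        print(bits, optimal_rho(bits, a, b), grid_search_rho(bits, a, b))
```

Under this model the minimizer tends to 1/2 as the total bit-width grows, and the minimum value equals $2\sqrt{ab}\,2^{-B_T}$, which decays exponentially in $B_T$.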
B. Total Power Constraint
Under the constraint on $P_T$, we can use (2) to re-express the total combined number of bits $B_T$ as a function of $\rho$ and $P_T$; this relation is referred to as (5). Using (5) in the expression (3) for $\Delta\xi$ we again obtain a convex function of $\rho$ with unique minimum at $\rho = \rho^{**}$, which gives the corresponding minimum MSE with $B_T$ given by (5).
Observe that the optimal bit allocation factor $\rho^{**}$ converges to the standard allocation 1/2 as the total power constraint $P_T$ is relaxed to infinity. As $P_T$ decreases the standard allocation becomes suboptimal.
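The power-constrained allocation can be sketched in the same spirit. We assume here a power model that is linear in the bit-widths, $P_T = c_d B_d + c_c B_c$, with hypothetical per-bit costs `data_cost` and `coef_cost` standing in for the coefficients of (2), together with the same assumed two-term MSE model with hypothetical weights; none of these numbers come from a specific hardware design.

```python
import numpy as np


def total_bits_for_power(rho, power, data_cost, coef_cost):
    """B_T implied by the assumed power model P_T = c_d*rho*B_T + c_c*(1-rho)*B_T."""
    return power / (data_cost * rho + coef_cost * (1.0 - rho))


def delta_mse(rho, total_bits, data_weight, coef_weight):
    """Assumed two-term model of the MSE increase due to quantization."""
    data_term = data_weight * 2.0 ** (-2.0 * rho * total_bits)
    coef_term = coef_weight * 2.0 ** (-2.0 * (1.0 - rho) * total_bits)
    return data_term + coef_term


def power_optimal_rho(power, data_cost, coef_cost, data_weight, coef_weight,
                      points=10001):
    """Grid search for the allocation factor minimizing the MSE increase
    at fixed total power; returns (rho, corresponding B_T)."""
    grid = np.linspace(0.0, 1.0, points)
    bits = total_bits_for_power(grid, power, data_cost, coef_cost)
    values = delta_mse(grid, bits, data_weight, coef_weight)
    best = int(np.argmin(values))
    return float(grid[best]), float(bits[best])


if __name__ == "__main__":
    # Data operations assumed three times as costly per bit as coefficient ones.
    for power in (10.0, 50.0, 200.0):
        print(power, power_optimal_rho(power, 3.0, 1.0, 1.0, 25.0))
```

When the data bits are the more expensive ones, this search shifts bits away from the data relative to the bit-width constrained optimum, and the allocation factor moves back toward 1/2 as the power budget grows.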
V. NUMERICAL EXAMPLE: 4-PSK
Here we briefly consider the case of a Gaussian noise IIR channel with a single pole at $a_1 = 0.8$, a 4-PSK signal $s_k$ of unit variance, a noiseless training sequence $y_k = s_k$, and a 2-tap LMS filter with gain coefficient $\mu = 0.01$. This corresponds to a rather severe exponential memory channel with intersymbol interference (ISI) extending over 5 to 10 data samples. Figure 5 shows the $B_T$-constrained optimal data bit allocation factor $\rho^*$ as a function of $B_T$ superimposed on a plot of the resultant optimal MSE, $\xi$. Note that the formula (3) for $\Delta\xi$ is independent of any channel effects (i.e. independent of $R_x$ and $R_{xy}$). Therefore, the channel affects only that portion of the MSE that is not attributed to quantization error. This quantity is visible in the large $B_T$ region. Hence, the only effect of the channel on the MSE plot is to "shift" it up by the infinite precision MSE, $\sigma_y^2 - R_{xy}^H R_x^{-1} R_{xy} + \mu\,\mathrm{tr}(R_x)$. The channel has no effect on the shape of the plot or, as will be shown, on the optimal bit allocation factors.
Figure 5 shows that the MSE does not begin to degrade significantly until $B_T$ falls below approximately 6 bits. For $B_T = 6$ bits the optimal data bit allocation factor is approximately $\rho^* = 0.4$, which means that 2 bits plus sign should be allocated to the data while 4 bits plus sign should be allocated to each of the filter coefficients.
At this breakdown point the optimal data bit allocation factor is approximately $\rho^* = 0.25$. We can use relation (5) with $\rho = \rho^{**}$, which is plotted in Fig. 7, to find the corresponding $B_T$ as a function of $P_T$. From the plot we see that $P_T = 1200$ corresponds again to $B_T = 6$, but the optimal $\rho^{**}$ tells us to allocate only 1 bit plus sign to the data and 5 bits plus sign to the filter coefficients. This reduction is because the data operations dominate the power relation (2).
VI. CONCLUSION
We have derived expressions for optimal bit allocation for adaptive LMS algorithms under combined register length constraints and total power constraints. These expressions can easily be specialized to a specific hardware implementation for computation of the number of bits to allocate to data and filter coefficients. A general conclusion is that the standard design strategy of allocating an equal number of bits to the data and filter coefficients is optimal only as the power or register length constraints get very large. For typical LMS implementations, it is optimal to allocate more bits to the coefficients than to the data. In particular, we have found that it is possible to reduce the number 
