Channel equalization is a required function in many high-speed communication systems. Because of their high performance requirements and complexity, adaptive equalization filters consume significant power, and they are often implemented in dedicated hardware rather than in software on a DSP. In this paper, a power-scalable adaptive equalizer is presented in which the power scales with the required precision through the use of dynamic tap length and bit precision.
In (1), Δ is the adaptation step size.
INTRODUCTION
In recent years, power consumption in digital CMOS has become an increasingly important issue. In portable applications, the batteries are a major factor in the weight and size of the device. For non-portable devices, cooling problems associated with power dissipation have generated significant interest in power reduction. In this paper, the system design of a power-scalable least mean square (LMS) adaptive filter is described. LMS adaptive filters are used for channel equalization in modems and wireless transceivers. Because of the high data rates and computational complexity involved, adaptive equalization filters consume a large amount of power. Currently, equalization is typically hardwired rather than run on a digital signal processor, because current DSPs are unable to handle the large number of operations required per second. For example, [2] cites a requirement of 1440 million operations per second for a decision feedback equalizer.
Previous work on low-power implementation of equalizers includes the following. Nicol et al. [3, 4] proposed using carry-save adders, Wallace-tree multipliers and Booth encoding, with adaptive bit precision obtained by adding a programmable gain at the output of the filter. Further power reduction is achieved through burst-mode update and by setting small taps to zero. The delay line (see Figure 1) must still operate, however, because the other non-zero taps still need to be updated.
Shanbhag et al. [5] considered shutting off certain taps according to an optimal trade-off between power consumption and mean square error. When taps are shut down, the critical path length is reduced, and the supply voltage can be lowered to reduce power further. Additional power savings are achieved by forcing the least significant bits of the input signal and coefficients to zero to reduce their precision, and by using algebraic transformations and pipelining with relaxed lookahead techniques.
In this work, the quality of inter-symbol interference (ISI) cancellation, measured by the standard deviation of the steady-state output error, is traded off against two design parameters: (1) the length of the adaptive filter and (2) the precision of the taps. The trade-off is studied empirically, and the filter is implemented in CMOS logic to demonstrate that power scales with the required precision.
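An empirical study of this kind can be sketched in a few lines of Python. The sketch below is illustrative only and is not the authors' simulation: the channel response, the step size, the training-mode error (the transmitted symbol is assumed known), and the fixed-point format (two's complement with one sign bit and one integer bit) are all assumptions, and the names `quantize` and `lms_error_std` are hypothetical.

```python
import random
import statistics


def quantize(value, bits):
    """Round to signed fixed point with `bits` total bits (assumed range [-2, 2))."""
    scale = 1 << (bits - 2)                            # fractional bits = bits - 2
    code = round(value * scale)
    code = max(-2 * scale, min(2 * scale - 1, code))   # saturate at the range limits
    return code / scale


def lms_error_std(channel, num_taps, tap_bits, step=0.01, n_symbols=4000, seed=0):
    """Return the standard deviation of the steady-state LMS output error."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(n_symbols)]

    taps = [0.0] * num_taps
    delay_line = [0.0] * num_taps
    errors = []
    for n in range(n_symbols):
        # Channel output: the current symbol plus ISI from earlier symbols.
        received = sum(h * symbols[n - k]
                       for k, h in enumerate(channel) if n - k >= 0)
        delay_line = [received] + delay_line[:-1]

        output = sum(w * x for w, x in zip(taps, delay_line))
        error = symbols[n] - output    # training mode: the sent symbol is known

        # LMS tap update, with every tap stored at the reduced precision.
        taps = [quantize(w + step * error * x, tap_bits)
                for w, x in zip(taps, delay_line)]
        errors.append(error)

    # Use only the last quarter of the run, after the taps have settled.
    return statistics.pstdev(errors[-(n_symbols // 4):])
```

Calling `lms_error_std([1.0, 0.4, 0.2], num_taps=5, tap_bits=b)` for several values of `b`, and again for several tap counts, gives one curve of error against precision per filter length. In this model a coarse tap grid rounds small updates away, so adaptation stalls at a larger residual error.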
TRADE-OFF BETWEEN STEADY STATE ERROR, AND FILTER LENGTH AND PRECISION
Using three channel responses chosen from a set of randomly generated channels with ISI over 2 sample times, the performance of the filter with varying tap length, tap precision and adder precision is studied empirically. Figure 2 plots the standard deviation of the error as the tap length, tap precision and adder precision are varied. Referring to Figure 2, the following observations are made. First, given the number of taps and the precision of the final sum, a higher tap precision leads to a smaller error and better performance. Second, a larger number of taps may lead to poorer performance, especially with low-precision taps (10, 11 and 12 bits). As discussed in [6], the minimum filter order needed for a certain error rate is not well understood. The reader is referred to [1] for discussions on the effect of tap length and precision on the mean square error of the output. Finally, comparing the two graphs in [...]
A POWER-SCALABLE IMPLEMENTATION
Figure 4 shows the overall architecture of an implementation of a fifteen-tap scalable adaptive filter. The design in this work has four levels of adjustability: (1) five 11-bit taps, (2) ten 11-bit taps, (3) ten 16-bit taps, and (4) fifteen 16-bit taps. For simplicity, the precision of the delay line is kept constant, although reducing it dynamically would have reduced power dissipation further.
Figure 5: Turning off a block by gating the clocks, resetting the registers, and latching the inputs to the multipliers.

Implementing Adjustable Length
To shut off the latter two filter blocks when they are not needed, the clk signals of these blocks are gated. In addition, the inputs to the tap-update multiplier are latched, since err[Q:0] is still changing. All the registers are also asynchronously reset so that, when these blocks are restarted, they do not contain any old values. The latching and asynchronous reset are controlled by the rst signal of each sub-block, as shown in Figure 5.
Implementing Adjustable Tap-precision
[...] bits to zero. Using a shorter bit-length multiplier instead of a longer one reduces power dissipation when the precision is not required. In 0.18 µm technology, the area cost of duplicating the multiplier is relatively small, although the multipliers could be further optimized for power consumption.
Control Logic
The control logic performs two functions. First, it decodes the level[1:0] signal into signals indicating whether each block should be turned on or off (shut) and whether each block should run in high or low tap-precision mode (lowpre). Second, it generates the clk, lclk, rst, and lrst signals for each sub-block. This is summarized in Figure 7. The first portion is simple combinational logic that computes the shut and lowpre signals for each sub-block.
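The combinational decode can be modeled behaviorally as follows. This is a sketch under stated assumptions, not the circuit of Figure 7: it assumes three sub-blocks of five taps each, and it assumes that level[1:0] values 0 to 3 map in order onto the four levels of adjustability listed earlier (the paper does not give the encoding). The names `BlockControl` and `decode_level` are hypothetical.

```python
from typing import List, NamedTuple


class BlockControl(NamedTuple):
    shut: bool     # True: the sub-block is turned off
    lowpre: bool   # True: the sub-block runs in low tap-precision mode


def decode_level(level: int) -> List[BlockControl]:
    """Decode a 2-bit level into per-block shut/lowpre (assumed encoding)."""
    if not 0 <= level <= 3:
        raise ValueError("level must fit in two bits")
    # Assumed mapping: 0 -> 5 taps @ 11 bits, 1 -> 10 @ 11, 2 -> 10 @ 16, 3 -> 15 @ 16.
    active_blocks = (1, 2, 2, 3)[level]
    low_precision = level < 2
    # lowpre is a don't-care for a shut block; it simply follows the global mode here.
    return [BlockControl(shut=(i >= active_blocks), lowpre=low_precision)
            for i in range(3)]
```

For example, `decode_level(1)` keeps the first two sub-blocks running in low-precision mode and shuts the third.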
The second portion is slightly more elaborate because we allow the level[1:0] signal to be asynchronous. When shut is low, the inverted clk and rst signals for a filter sub-block are derived from the clk and rst signals. When shut goes high, the derived clock is held high at the subsequent rising clk edge. It is held high until the first rising clk edge following the falling edge of shut. While the derived clock is held high, the rst signals of the sub-blocks are held high to reset the internal registers and to latch the inputs to the tap-update multiplier. The implementation of this block is shown in Figure 8.
Figure 8: Implementation of clock-gating circuit.
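The gating behavior described above can be checked with a small half-cycle model. It is one plausible reading of the text, not the circuit of Figure 8: it assumes that shut is sampled by a flip-flop on each rising clk edge, that the derived clock is the inverted clk ORed with that sampled value, and that the sub-block rst is the global rst ORed with the same value. The function name `gate_clock` is hypothetical.

```python
def gate_clock(shut_samples, rst=False):
    """Model the derived clock and sub-block reset, one entry per half clock cycle.

    shut_samples[i] is the (possibly asynchronous) shut value during half-cycle i;
    clk is low on even half-cycles and high on odd ones.
    Returns a list of (clk, derived_clk, block_rst) tuples.
    """
    gate = False      # shut as sampled at the most recent rising clk edge
    prev_clk = 1      # so that the first low phase is not treated as an edge
    trace = []
    for i, shut in enumerate(shut_samples):
        clk = i % 2
        if prev_clk == 0 and clk == 1:      # rising edge of clk
            gate = bool(shut)
        derived_clk = int((not clk) or gate)   # inverted clk, forced high while gated
        block_rst = int(rst or gate)           # reset held for the whole gated interval
        trace.append((clk, derived_clk, block_rst))
        prev_clk = clk
    return trace
```

Because the inverted clock is already high just before a rising clk edge, forcing it high at that edge produces no glitch, and releasing it at a later rising edge resumes the normal inverted clock with a clean falling transition.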
IMPLEMENTATION RESULTS
The final layout in Figure 9 has a transistor count of 117,315 and dimensions of 743 µm by 773 µm in 0.18 µm technology, excluding the pads. The power consumption when clocked at 33 MHz with a 1.8 V supply is summarized in Tables 2 and 3. Using the clock-gating technique, we obtain power savings in the clock network, the delay line and tap registers, and the multipliers and adders of the filter sub-blocks that are turned off. Using adaptive precision for the taps, we are also able to scale down the power dissipation of the tap multipliers and tap registers linearly with the tap precision. Figure 10 shows the trade-off between power consumption and the standard deviation of the filter error.
CONCLUSIONS
In this paper, a trade-off between power dissipation and the quality of ISI cancellation of an LMS adaptive filter has been demonstrated. The results show that a shorter filter length and a smaller tap precision lead to a larger standard deviation of error, but also consume less power.
