Low-power and high-speed algorithms and architectures for complex adaptive filters are presented in this paper. These architectures have been derived via the application of algebraic and algorithm transformations. The strength reduction transformation, when applied at the algorithmic level, results in a power reduction of 21% as compared to the traditional cross-coupled structure. A fine-grain pipelined architecture is then developed via the relaxed look-ahead transformation. The pipelined architecture allows high-speed operation with minimum overhead and, when combined with power-supply reduction, enables additional power savings of 40-69%. Thus, an overall power-saving of SO-SO% over the traditional cross-coupled architecture is achieved.
Introduction
Power-reduction techniques [2, 3, 4] have been proposed at all levels of the design hierarchy, beginning with algorithms and architectures and ending with circuits and technological innovations. It is now well recognized that an astute algorithmic and architectural design can have a large impact on the final power dissipation characteristics of the fabricated VLSI solution. In this paper, we investigate algorithms and architectures for low-power and high-speed adaptive filters.
Algorithm transformation techniques [3] such as look-ahead [6], relaxed look-ahead [8], block-processing, and associativity [7] have been employed to design high-speed algorithms and architectures. Low-power operation is then achieved by trading off excess speed for power. Of particular interest is a class of transformations known as algebraic transformations [7]. Strength reduction [2] is an algebraic transformation which has been applied at the architectural level to trade off expensive multipliers for adders. This results in an overall savings in area and power. A key contribution of this paper is the application of the strength reduction transformation at the algorithmic level (instead of the architectural level) to obtain low-power adaptive filter algorithms. An algorithmic-level application of strength reduction is shown to be more effective in achieving power reduction than an architectural-level application. The application of strength reduction increases the critical path computation time. This results in a throughput limitation, which is undesirable in high-bit-rate applications. We address this problem with the relaxed look-ahead [8] transformation. This transformation results in a fine-grain pipelined architecture, which is an approximation of the architecture obtained by the look-ahead technique. The relaxed look-ahead technique maintains the functionality of the algorithm rather than the input-output behaviour. Furthermore, it is possible to trade off some of the increased throughput for reduced power dissipation via power-supply reduction as indicated in [3].
Preliminaries
In this section, we will review some of the basics of strength reduction and relaxed look-ahead pipelining.
Algebraic Transformation
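Consider the product of two complex numbers,

$$(a + jb)(c + jd) = (ac - bd) + j(ad + bc), \tag{2.1}$$

which requires four real multiplications and two real additions when computed directly. The strength reduction transformation rewrites the real and imaginary parts as

$$(a - b)d + a(c - d) = ac - bd, \tag{2.2}$$

$$(a - b)d + b(c + d) = ad + bc. \tag{2.3}$$

As can be seen from (2.2)-(2.3), the number of real multiplications is three and the number of additions is five. Therefore, this form of strength reduction transformation reduces the number of multipliers by one at the expense of three additional adders. If we assume that the effective capacitance of a two-operand multiplier is $K_C$ times that of a two-operand adder, it can be seen that strength reduction results in a power savings factor $PS$ given by

$$PS = \frac{P_{D,\mathrm{orig}} - P_{D,\mathrm{sr}}}{P_{D,\mathrm{orig}}} = \frac{K_C - 3}{4K_C + 2}, \tag{2.4}$$

where $P_{D,\mathrm{orig}}$ and $P_{D,\mathrm{sr}}$ are the dynamic power dissipation of the original and strength-reduced algorithms. From (2.4), it is clear that for $K_C > 3$, we will achieve power savings. Asymptotically, the power savings approach 25% as $K_C$ increases.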
It can be easily seen that the strength reduction transformation increases the critical path computation time, which can be a limitation in high-speed applications. This problem is solved by throughput enhancement techniques such as pipelining, as described next.
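The operation counts and the savings factor discussed above can be checked with a minimal sketch. The function names and the unit-capacitance cost model (adder cost 1, multiplier cost `k_c`) are illustrative assumptions, not part of the original text.

```python
def complex_mult_direct(a, b, c, d):
    """(a + jb)(c + jd) using 4 real multiplications and 2 additions."""
    return a * c - b * d, a * d + b * c


def complex_mult_strength_reduced(a, b, c, d):
    """Same product using 3 real multiplications and 5 additions."""
    shared = (a - b) * d           # term common to both outputs
    real = shared + a * (c - d)    # equals ac - bd
    imag = shared + b * (c + d)    # equals ad + bc
    return real, imag


def power_savings(k_c):
    """Savings factor when a multiplier costs k_c adder-equivalents."""
    original = 4 * k_c + 2         # 4 multipliers, 2 adders
    reduced = 3 * k_c + 5          # 3 multipliers, 5 adders
    return (original - reduced) / original


if __name__ == "__main__":
    print(complex_mult_direct(3, -2, 5, 7))
    print(complex_mult_strength_reduced(3, -2, 5, 7))
    for k_c in (3, 8, 16, 1000):
        print(k_c, power_savings(k_c))
```

The break-even point at `k_c = 3` and the approach to 0.25 for large `k_c` follow directly from the two cost expressions.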
Relaxed Look-ahead Pipelining
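In this sub-section, we describe the relaxed look-ahead [8] technique, which is an approximation to the look-ahead technique. Consider the LMS algorithm, described by

$$W(n) = W(n-1) + \mu e(n)X(n), \tag{2.5}$$

$$e(n) = d(n) - W^T(n-1)X(n), \tag{2.6}$$

where $W(n)$ is an $N \times 1$ vector of filter coefficients, $\mu$ is the adaptation step-size, $e(n)$ is the estimation error, $X(n)$ is the $N \times 1$ input vector, and $d(n)$ is the desired signal.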
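As a point of reference for the pipelined variants that follow, the serial LMS recursion can be sketched as below. This is a textbook real-valued LMS written for illustration; the function name, the zero initial coefficients, and the zero-padding of the input before time 0 are assumptions.

```python
def lms(x, desired, num_taps, mu):
    """Serial LMS: the error uses W(n-1), then W is updated with mu * e(n) * X(n)."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # X(n) = [x(n), x(n-1), ..., x(n-N+1)], with zeros before the first sample
        x_vec = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, x_vec))
        e = desired[n] - y
        w = [wk + mu * e * xk for wk, xk in zip(w, x_vec)]
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    import random

    random.seed(0)
    x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    # Hypothetical 2-tap system to identify: h = [0.5, -0.25]
    desired = [0.5 * x[n] - 0.25 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    w, _ = lms(x, desired, num_taps=2, mu=0.1)
    print(w)
```

Note that the error at time n must be available before the coefficients for time n can be formed, which is the recursive dependency that limits the throughput of a direct implementation.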
A pipelined LMS algorithm can be obtained via the relaxed look-ahead transformations described in [8]. In the transformed equations (2.7)-(2.8), LA is the look-ahead factor, and D1 and D2 are delays introduced via the delay relaxation and sum relaxation. These delays can be employed to pipeline the hardware operators in an actual implementation.
In this paper, we will employ the relaxed look-ahead pipelined LMS filter to obtain the pipelined filter architectures. Furthermore, the increased throughput due to pipelining can be employed to achieve high-speed and low-power (in combination with power-supply scaling).
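A minimal sketch of a relaxed look-ahead pipelined LMS filter is given below. It assumes the commonly cited form of the recursion, in which the coefficients are updated from W(n-D2) with a sum of LA error-input products delayed by D1, and the error is computed with W(n-D2). That form, the function name, and the zero initial conditions are assumptions made for illustration and should be checked against [8].

```python
def pipelined_lms(x, desired, num_taps, mu, d1, d2, la):
    """Assumed relaxed look-ahead LMS recursion (d2 >= 1, d1 >= 0, la >= 1):

        e(n) = desired(n) - W(n - d2)^T X(n)
        W(n) = W(n - d2) + mu * sum_{i=0}^{la-1} e(n - d1 - i) X(n - d1 - i)
    """
    def x_vec(n):
        return [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]

    zero = [0.0] * num_taps
    w_hist = []   # w_hist[n] holds W(n)
    e_hist = []   # e_hist[n] holds e(n)
    for n in range(len(x)):
        w_old = w_hist[n - d2] if n - d2 >= 0 else zero
        xn = x_vec(n)
        e_hist.append(desired[n] - sum(wk * xk for wk, xk in zip(w_old, xn)))
        # Accumulate the la most recent (delayed) error-input products
        grad = [0.0] * num_taps
        for i in range(la):
            m = n - d1 - i
            if m >= 0:
                xm = x_vec(m)
                for k in range(num_taps):
                    grad[k] += e_hist[m] * xm[k]
        w_hist.append([wk + mu * gk for wk, gk in zip(w_old, grad)])
    return w_hist[-1], e_hist


if __name__ == "__main__":
    import random

    random.seed(0)
    x = [random.uniform(-1.0, 1.0) for _ in range(6000)]
    desired = [0.5 * x[n] - 0.25 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    print(pipelined_lms(x, desired, 2, 0.05, d1=2, d2=2, la=1)[0])
```

With `d1=0`, `d2=1`, and `la=1` the sketch reduces to the serial LMS recursion. Convergence for other settings depends on the step-size and on the particular combination of D1, D2, and LA, so any choice should be validated by simulation.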
Low-Power Adaptive Filter Architecture
In this section, we will develop a low power adaptive filter via strength reduction transformation. We will assume that a passband digital communication system such as quadrature amplitude modulation (QAM) or carrierless amplitude/phase (CAP) modulation [5] is being employed. In this situation, the receiver processes a two-dimensional signal using a two-dimensional filter. This results in the traditional cross-coupled equalizer structure.
Traditional Cross-coupled Equalizer
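Assume the filter input to be a complex signal $X(n)$ given by

$$X(n) = X_r(n) + jX_i(n), \tag{3.1}$$

where $X_r(n)$ and $X_i(n)$ are the real and imaginary parts, respectively. Furthermore, if the filter $W(n)$ is also complex ($W(n) = c(n) + jd(n)$), then its output $y(n)$ can be obtained as follows:

$$y(n) = W^H(n-1)X(n) = [c^T(n-1)X_r(n) + d^T(n-1)X_i(n)] + j[c^T(n-1)X_i(n) - d^T(n-1)X_r(n)], \tag{3.2}$$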
where $W^H$ represents the Hermitian (transpose and complex conjugate) of the matrix $W$. A direct implementation of (3.2) results in the traditional cross-coupled structure shown in Fig. 1, which requires $4N - 2$ adders and $4N$ multipliers. In the adaptive case, a WUD-block would be needed to automatically compute the coefficients of the filter. This can be done as follows:

$$W(n) = W(n-1) + \mu e^*(n)X(n), \tag{3.3}$$

where $e(n) = e_r(n) + je_i(n) = Q[y(n)] - y(n)$ is the error, $Q[y(n)]$ is the output of the slicer, and $e^*$ is the complex conjugate of $e$. Therefore, to implement the WUD-block, we need the following real equations:

$$c(n) = c(n-1) + \mu[e_r(n)X_r(n) + e_i(n)X_i(n)], \tag{3.4}$$

$$d(n) = d(n-1) + \mu[e_r(n)X_i(n) - e_i(n)X_r(n)]. \tag{3.5}$$
From the WUD-block architecture in Fig. 2 , it is clear that we require 4 N + 2 adders and 4N multipliers for an N-tap complex filter. In the next subsection, we will present a low-power adaptive filter using strength reduction.
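The cross-coupled structure can be sketched in real arithmetic as follows. The sketch assumes the output is formed as the Hermitian product of the complex coefficient vector with the complex input, and that the update adds the step-size times the conjugated error times the input. The function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_coupled_output(c, d, xr, xi):
    """F-block: four real N-tap filters (4N real multiplications)."""
    yr = dot(c, xr) + dot(d, xi)
    yi = dot(c, xi) - dot(d, xr)
    return yr, yi


def cross_coupled_update(c, d, xr, xi, er, ei, mu):
    """WUD-block: four real products per tap (the scaling by mu is extra)."""
    c_new = [ck + mu * (er * a + ei * b) for ck, a, b in zip(c, xr, xi)]
    d_new = [dk + mu * (er * b - ei * a) for dk, a, b in zip(d, xr, xi)]
    return c_new, d_new


if __name__ == "__main__":
    c, d = [1, 2], [3, -1]          # W = c + jd
    xr, xi = [2, 0], [1, 4]         # X = xr + j*xi
    print(cross_coupled_output(c, d, xr, xi))
    print(cross_coupled_update(c, d, xr, xi, er=1, ei=-2, mu=0.5))
```

Comparing the result with Python's built-in complex arithmetic is a quick way to confirm the sign conventions.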
Low-Power Adaptive Filter Architecture
It can be easily seen that (3.2) involves the multiplication of two complex polynomials, so the strength reduction transformation presented in the previous section can be applied to (3.2). Applying the transformation, we obtain

$$y(n) = [y_1(n) + y_3(n)] + j[y_2(n) + y_3(n)], \tag{3.6}$$

where

$$y_1(n) = c_1^T(n-1)X_r(n), \tag{3.7}$$

$$y_2(n) = d_1^T(n-1)X_i(n), \tag{3.8}$$

$$y_3(n) = -d^T(n-1)X_1(n), \tag{3.9}$$

and $X_1(n) = X_r(n) - X_i(n)$, $c_1(n) = c(n) + d(n)$, and $d_1(n) = c(n) - d(n)$. The proposed architecture (see Fig. 3) requires three filters and two output adders, which corresponds to $3N$ multipliers and $4N$ adders. We now consider the adaptive version and specifically analyze the WUD-block. From (3.7)-(3.9) and Fig. 3, it seems an efficient architecture would result if $c_1(n)$ and $d_1(n)$ are adapted instead of $c(n)$ and $d(n)$.
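The three-filter form of the F-block can be sketched as below, using the combined quantities defined above (the sum and difference of the coefficient vectors, and the difference of the real and imaginary inputs). The sign placed on the third filter is an assumption chosen so that the outputs match the cross-coupled form; the function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def strength_reduced_output(c, d, xr, xi):
    """F-block with three real N-tap filters and two output adders."""
    x1 = [a - b for a, b in zip(xr, xi)]   # X1 = Xr - Xi
    c1 = [a + b for a, b in zip(c, d)]     # c1 = c + d
    d1 = [a - b for a, b in zip(c, d)]     # d1 = c - d
    y1 = dot(c1, xr)
    y2 = dot(d1, xi)
    y3 = -dot(d, x1)                       # shared by both outputs
    return y1 + y3, y2 + y3


if __name__ == "__main__":
    print(strength_reduced_output([1, 2], [3, -1], [2, 0], [1, 4]))
```

In an adaptive implementation the combined coefficient vectors would be stored and adapted directly rather than recomputed for every sample, as the text goes on to propose.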
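Applying the strength reduction transformation to the update equations for $c_1(n)$ and $d_1(n)$, we obtain

$$c_1(n) = c_1(n-1) + \mu[eX_1(n) + eX_3(n)], \tag{3.10}$$

$$d_1(n) = d_1(n-1) + \mu[eX_2(n) + eX_3(n)], \tag{3.11}$$

where

$$eX_1(n) = 2e_r(n)X_i(n), \tag{3.12}$$

$$eX_2(n) = 2e_i(n)X_r(n), \tag{3.13}$$

$$eX_3(n) = e_1(n)X_1(n), \tag{3.14}$$

and $e_1(n) = e_r(n) - e_i(n)$, $X_1(n) = X_r(n) - X_i(n)$. It can be seen from the WUD-block architecture (Fig. 4) that it requires only $3N$ multipliers and $4N + 3$ adders.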
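The strength-reduced weight update can be sketched in the same way. The pairing of the first product with the sum coefficients and the second with the difference coefficients is an assumption, derived by adding and subtracting the cross-coupled update equations. The function name is illustrative.

```python
def strength_reduced_update(c1, d1, xr, xi, er, ei, mu):
    """Update c1 = c + d and d1 = c - d with three real products per tap."""
    e1 = er - ei
    c1_new, d1_new = [], []
    for c1k, d1k, a, b in zip(c1, d1, xr, xi):
        ex1 = 2 * er * b        # the factor 2 is a shift in hardware
        ex2 = 2 * ei * a
        ex3 = e1 * (a - b)      # shared by both updates
        c1_new.append(c1k + mu * (ex1 + ex3))
        d1_new.append(d1k + mu * (ex2 + ex3))
    return c1_new, d1_new


if __name__ == "__main__":
    # c = [1, 2], d = [3, -1] gives c1 = [4, 1], d1 = [-2, 3]
    print(strength_reduced_update([4, 1], [-2, 3], [2, 0], [1, 4], er=1, ei=-2, mu=0.5))
```

The coefficient vector needed by the third filter of the F-block can be recovered as half the difference of the two adapted vectors, or maintained separately, depending on the implementation.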
Combining the architecture for the F-block (Fig. 3) and WUD-block (Fig. 4) , we obtain the proposed strength reduced low-power adaptive filter architecture in Fig. 5. 
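The multiplier and adder counts quoted above for the two architectures can be tallied to estimate the relative savings. The sketch below assumes the same cost model as in the preliminaries (a multiplier costs `k_c` adder-equivalents) and uses only the counts stated in the text; the function names and the example values of `num_taps` and `k_c` are hypothetical.

```python
def operator_counts(num_taps):
    """Multiplier and adder counts (F-block plus WUD-block) for N taps."""
    n = num_taps
    cross_coupled = {"mult": 4 * n + 4 * n, "add": (4 * n - 2) + (4 * n + 2)}
    strength_reduced = {"mult": 3 * n + 3 * n, "add": 4 * n + (4 * n + 3)}
    return cross_coupled, strength_reduced


def adaptive_filter_power_savings(num_taps, k_c):
    """Fractional savings of the strength-reduced filter over the cross-coupled one."""
    cross_coupled, strength_reduced = operator_counts(num_taps)
    p_cc = k_c * cross_coupled["mult"] + cross_coupled["add"]
    p_sr = k_c * strength_reduced["mult"] + strength_reduced["add"]
    return (p_cc - p_sr) / p_cc


if __name__ == "__main__":
    for k_c in (4, 8, 16):
        print(k_c, adaptive_filter_power_savings(num_taps=32, k_c=k_c))
```

Under these assumptions the savings simplify to (2·N·k_c − 3) / (8·N·(k_c + 1)), which stays below 25% for any finite `k_c`.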
Power Savings
Using the definition of $PS$ in (2.4) and the multiplier and adder counts of the F-block and WUD-block given above, the power savings $PS$ due to the proposed architecture is

$$PS = \frac{2NK_C - 3}{8N(K_C + 1)},$$

which approaches 25% for large $N$ and $K_C$.

The sample period of the SEA is bounded from below by a quantity that depends on $T_m$ and $T_a$, the two-operand multiply and single-precision add times, respectively. For applications that require large values of $N$, this lower bound may prevent a feasible implementation. In this section, we propose a solution to the problem by pipelining the SEA and thereby achieving high speed. Some of the speed will be traded off for power, thus achieving additional power savings.
In order to derive the PEA, we start with the SEA equations (3.6-3.9, 3.10-3.14) and then apply relaxed look-ahead. Observe that the equations are similar to those of the traditional LMS described by (2.5-2.6). By inspection of the relaxed look-ahead pipelined LMS algorithm given by (2.7-2.8), we get the equations which describe the PEA, in which eX1(n), eX2(n), and eX3(n) are as defined in (3.12-3.14). The block diagram of the PEA is shown in Fig. 6. In a practical implementation, the D1 and D2 delays will be employed to pipeline the F-block and WUD-block. Thus, all the operations in the PEA can be pipelined at a fine-grain level.
Assuming that the algorithmic delays have been retimed in a uniform fashion (i.e., all stages have the same delay), a lower bound on the input sample period T_PEA can be obtained. Higher values of D1 and D2 imply higher speed-ups.
Practical maximum values of D1 and D2 are a function of the desired algorithmic performance (i.e. BER and/or SNR at the slicer).
As mentioned before, pipelining along with power-supply reduction [2] has been proposed as a technique for reducing power dissipation.
Conclusions
Application of the strength reduction transformation [1, 2] at the algorithmic level (as opposed to the architectural level) has resulted in a low-power complex adaptive filter architecture. Power savings of approximately 21% were shown to be achievable. Relaxed look-ahead [8] pipelined architectures were then developed for achieving high-speed operation. An additional 40-69% power savings was achieved by scaling down the power supply.
