I. INTRODUCTION
It has been recognized that in addition to algorithmic performance measures (such as signal-to-noise ratio (SNR)), VLSI domain constraints (such as area, speed, and power) need to be addressed during the algorithm design phase. In recent years, algorithm transformation techniques [1] such as pipelining [2] have been proposed for high-speed and low-power applications [3]. By combining pipelining with folding [4], it is possible to trade off area with speed. Thus, all three major parameters of interest in a VLSI implementation (speed, power, and area) can be optimized by the design of pipelined algorithms. Recently, the relaxed look-ahead technique [5] has been proposed for efficient pipelining of adaptive digital filters. This technique is an approximation to the look-ahead technique [2]. In this correspondence, we employ relaxed look-ahead [5] and scattered look-ahead [2] to pipeline adaptive IIR (AIIR) filters [6]. While numerous AIIR filter structures have been proposed in the past [7], [8], we will focus on the equation-error based [9] approach. This paper is organized as follows. In Section II, we present some background material. In Section III, the pipelined AIIR architecture is derived via the application of relaxed look-ahead and scattered look-ahead. In Section IV, we present simulation results to verify the convergence analysis results and the performance of the pipelined architectures.

Manuscript received November 29, 1994; revised January 29, 1996. The associate editor coordinating the review of this paper and approving it for publication was Prof. Keshab Parhi.
N. R. Shanbhag is with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main St., Urbana, IL 61801 USA.
G.-H. Im is with AT&T Bell Laboratories, 200 Laurel Avenue, Middletown, NJ 07748 USA.
Publisher Item Identifier S 1053-587X(96)04512-6.
II. PRELIMINARIES
In this section, we describe the scattered look-ahead [2] (for pipelining of recursive digital filters), the relaxed look-ahead [5] (for pipelining of adaptive digital filters), and the properties of the relaxed look-ahead pipelined LMS filter [10].
A. Scattered Look-Ahead
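In a serial (or unpipelined) recursive digital filter, the current state $w(n)$ is computed as a function of past states $w(n-1), w(n-2), \ldots, w(n-N)$ and present and past values of the input $x(n)$. That is,
$$ w(n) = f_{\mathrm{serial}}\big(w(n-1), w(n-2), \ldots, w(n-N), x(n), x(n-1), \ldots\big) \tag{2.1} $$
where $N$ is the order of the filter, $f_{\mathrm{serial}}(\cdot)$ is a linear function, and $n$ is the time index. On the other hand, in an $M$-step scattered look-ahead pipelined filter, the current state is computed as follows:
$$ w(n) = f_{\mathrm{spipe}}\big(w(n-M), w(n-2M), \ldots, w(n-NM), x(n), x(n-1), \ldots\big) \tag{2.2} $$
where $f_{\mathrm{spipe}}(\cdot)$ is the scattered look-ahead function. The hardware overhead due to scattered look-ahead is $O(NM)$, which can be reduced via decomposition [2] to $O[N\log_2(M)]$. A significant characteristic of scattered look-ahead is that it preserves stability.
In general, for $H_{\mathrm{serial}}(z^{-1})$ given by
$$ H_{\mathrm{serial}}(z^{-1}) = \frac{1}{1-\sum_{i=1}^{N} a_i z^{-i}} \tag{2.3} $$
an $M$-step scattered look-ahead pipelined transfer function is given by
$$ H_{\mathrm{pipe}}(z^{-1}) = \frac{\sum_{i=0}^{N(M-1)} b_i z^{-i}}{1-\sum_{i=1}^{N} a'_{iM} z^{-iM}} \tag{2.4} $$
where the coefficients $b_i$ and $a'_{iM}$ are determined by the $a_i$ in (2.3). Because the denominator contains only powers of $z^{-M}$, the recursive loop contains $M$ delays that can be used for pipelining; the price is a numerator of order $N(M-1)$.
B. Relaxed Look-Ahead
Consider the first-order recursion
$$ w(n) = w(n-1) + a(n)u(n). \tag{2.5} $$
The computation time of (2.5) is lower bounded by a single add time. Next, we apply an $M$-step look-ahead to (2.5) in the time domain and obtain
$$ w(n) = w(n-M) + \sum_{i=0}^{M-1} a(n-i)u(n-i). \tag{2.6} $$
The delay relaxation involves the use of the delayed input $u(n-D_1)$ and the delayed coefficient $a(n-D_1)$ in (2.6). If the average value of the product $a(n)u(n)$ is more or less constant over $D_1$ samples, then (2.6) can be approximated as
$$ w(n) = w(n-M) + \sum_{i=0}^{M-1} a(n-D_1-i)u(n-D_1-i). \tag{2.7} $$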
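To make the difference between the exact look-ahead (2.6) and the delay relaxation concrete, the following Python sketch runs the first-order recursion (2.5) serially and in $M$-step look-ahead form on the same data. It is a minimal illustration under our own assumptions: the sequences $a(n)$ and $u(n)$, the zero initial conditions, and the function names are hypothetical. The look-ahead form reproduces the serial result exactly.

```python
from typing import List, Sequence


def _product(a: Sequence[float], u: Sequence[float], k: int) -> float:
    # a(k)u(k), taken as zero for k < 0 (zero initial conditions).
    return a[k] * u[k] if k >= 0 else 0.0


def serial_recursion(a: Sequence[float], u: Sequence[float]) -> List[float]:
    """First-order recursion (2.5): w(n) = w(n-1) + a(n)u(n), with w(-1) = 0."""
    w: List[float] = []
    prev = 0.0
    for n in range(len(a)):
        prev = prev + _product(a, u, n)
        w.append(prev)
    return w


def look_ahead_recursion(a: Sequence[float], u: Sequence[float],
                         M: int, D1: int = 0) -> List[float]:
    """M-step look-ahead (2.6); with D1 > 0 the products are delayed by D1
    samples, which is the delay relaxation:
        w(n) = w(n-M) + sum_{i=0}^{M-1} a(n-D1-i) u(n-D1-i)
    """
    w: List[float] = []
    for n in range(len(a)):
        past = w[n - M] if n >= M else 0.0
        w.append(past + sum(_product(a, u, n - D1 - i) for i in range(M)))
    return w


a = [1, 2, 3, 4, 5, 6]
u = [1, -1, 2, 0, 1, 3]
print(serial_recursion(a, u))                 # [1.0, -1.0, 5.0, 5.0, 10.0, 28.0]
print(look_ahead_recursion(a, u, M=2))        # identical to the serial result
print(look_ahead_recursion(a, u, M=2, D1=1))  # serial result delayed by one sample
```

In this open-loop toy, $a(n)$ does not depend on $w(n)$, so the delay-relaxed output is exactly the serial output delayed by $D_1$ samples. In an adaptive filter, $a(n)$ depends on past values of $w$ through the error, which is why the delay relaxation actually changes the algorithm and a convergence analysis is needed.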
Note that this approximation results in the "delayed LMS" [12] algorithm when applied to the traditional LMS algorithm. In general, this is a reasonable approximation provided that the average value of $a(n)u(n)$ in (2.6) varies slowly.
Application of the sum relaxation to (2.6) involves taking $L_A$ terms from (2.6), where $L_A \le M$, to get
$$ w(n) = w(n-M) + \sum_{i=0}^{L_A-1} a(n-i)u(n-i). \tag{2.8} $$
This relaxation can be justified if the average value of the product $a(n)u(n)$ is slowly varying, and simulations (for both LMS and the AIIR filters) indicate this to be a good approximation.
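The justification of the sum relaxation can be illustrated numerically by comparing the full $M$-term look-ahead sum with its $L_A$-term truncation. The sketch below assumes a hypothetical, slowly varying sequence standing in for $a(n)u(n)$. Under that assumption the $L_A$-term sum tracks the $M$-term sum up to the factor $M/L_A$. That constant factor is our own observation rather than a statement from the text; in an LMS-type update, such a factor could be absorbed into the step size $\mu$.

```python
from typing import Sequence


def look_ahead_sum(p: Sequence[float], n: int, M: int) -> float:
    """Full M-term correction of the M-step look-ahead: sum_{i=0}^{M-1} p(n-i),
    where p(k) stands for the product a(k)u(k)."""
    return sum(p[n - i] for i in range(M))


def sum_relaxed(p: Sequence[float], n: int, LA: int) -> float:
    """Sum relaxation: keep only the L_A most recent terms (L_A <= M)."""
    return sum(p[n - i] for i in range(LA))


# Hypothetical slowly varying product a(n)u(n): a constant plus a slow ramp.
p = [1.0 + 0.001 * k for k in range(100)]
M, LA, n = 8, 2, 50
ratio = look_ahead_sum(p, n, M) / sum_relaxed(p, n, LA)
print(round(ratio, 3))  # close to M / LA = 4
```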
C. Pipelined LMS (PIPLMS) Architecture
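The serial LMS filter is described by the following equations:
$$ \begin{aligned} e(n) &= d(n) - W^T(n-1)U(n) \\ W(n) &= W(n-1) + \mu\, e(n)U(n) \end{aligned} \tag{2.9} $$
where $W(n)$ is the weight vector, $U(n)$ is the input vector, $e(n)$ is the adaptation error, $\mu$ is the step size, and $d(n)$ is the desired signal. The relaxed look-ahead pipelined LMS architecture (see [10] for details) is given by
$$ \begin{aligned} e(n) &= d(n) - W^T(n-D_2)U(n) \\ W(n) &= W(n-D_2) + \mu \sum_{i=0}^{L_A-1} e(n-D_1-i)U(n-D_1-i) \end{aligned} \tag{2.10} $$
where $D_1$ delays are introduced via the delay relaxation, and $D_2$ delays are introduced via the sum relaxation. The $D_1$ and $D_2$ delays would be employed to pipeline the hardware operators in an actual implementation.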
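The serial LMS recursion and its relaxed look-ahead pipelined version can be exercised with the following Python sketch. It assumes the PIPLMS form $e(n) = d(n) - W^T(n-D_2)U(n)$, $W(n) = W(n-D_2) + \mu\sum_{i=0}^{L_A-1} e(n-D_1-i)U(n-D_1-i)$, which is how the relaxed look-ahead LMS filter is usually stated; [10] remains the authoritative reference. The plant, step size, delays, and function names are hypothetical. With $D_1 = 0$, $D_2 = 1$, and $L_A = 1$, the pipelined recursion reduces to the serial LMS filter, and the sketch checks this.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _regressor(x: Sequence[float], n: int, taps: int) -> Vector:
    # U(n) = [x(n), x(n-1), ..., x(n-taps+1)], with x(k) = 0 for k < 0.
    return [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]


def lms(x: Sequence[float], d: Sequence[float], taps: int, mu: float) -> List[Vector]:
    """Serial LMS: e(n) = d(n) - W(n-1)^T U(n); W(n) = W(n-1) + mu e(n) U(n)."""
    w = [0.0] * taps
    history = []
    for n in range(len(x)):
        U = _regressor(x, n, taps)
        e = d[n] - sum(wk * uk for wk, uk in zip(w, U))
        w = [wk + mu * e * uk for wk, uk in zip(w, U)]
        history.append(w)
    return history


def piplms(x: Sequence[float], d: Sequence[float], taps: int, mu: float,
           D1: int, D2: int, LA: int = 1) -> List[Vector]:
    """Relaxed look-ahead pipelined LMS in the assumed form
        e(n) = d(n) - W(n-D2)^T U(n)
        W(n) = W(n-D2) + mu * sum_{i=0}^{LA-1} e(n-D1-i) U(n-D1-i)
    with all quantities before time 0 taken as zero."""
    zero = [0.0] * taps
    W: List[Vector] = []
    e: List[float] = []
    for n in range(len(x)):
        w_old = W[n - D2] if n >= D2 else zero
        U = _regressor(x, n, taps)
        e.append(d[n] - sum(wk * uk for wk, uk in zip(w_old, U)))
        w_new = list(w_old)
        for i in range(LA):
            m = n - D1 - i
            if m >= 0:
                Um = _regressor(x, m, taps)
                w_new = [wk + mu * e[m] * uk for wk, uk in zip(w_new, Um)]
        W.append(w_new)
    return W


# Hypothetical noise-free identification of a 2-tap FIR plant.
rng = random.Random(0)
h = [0.8, -0.3]
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(hk * uk for hk, uk in zip(h, _regressor(x, n, 2))) for n in range(len(x))]
W_pipe = piplms(x, d, taps=2, mu=0.05, D1=4, D2=2, LA=1)
print([round(wk, 4) for wk in W_pipe[-1]])  # approaches h
```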
D. Convergence Analysis of PIPLMS
As mentioned before, the relaxed look-ahead technique modifies a given adaptive algorithm; hence, a convergence analysis is needed. The results of the convergence analysis of the mean-squared error (MSE) [10] (for $L_A = 1$ and $K = D_1/D_2$) show that the step size $\mu$ must satisfy an upper bound (2.11), expressed in terms of $E[U(n)U^T(n)]$, which is reduced by pipelining.
This reduction in the upper bound should be kept in mind in applications where fast convergence is important. However, it is not a problem in applications (such as adaptive equalizers in digital subscriber loops) where the step size $\mu$ is kept very small.
As $K$ decreases from $D_1$ down to unity, which corresponds to increasing $D_2$ from unity to $D_1$, the convergence of PIPLMS slows down. It was shown in [10] that first choosing appropriate values of $D_1$ and $D_2$ to achieve the desired speedup and then adjusting $L_A$ solves this problem. This is also true for the proposed pipelined AIIR filter, as will be shown in Section IV.
Finally, the misadjustment for PIPLMS is given by (2.13), where $b$ and $P$ are defined in (2.12). From (2.13), as $K$ is increased from unity toward $D_1$, the misadjustment increases. In practice, however, the misadjustment does not change substantially as $K$ varies.
In the next section, we develop the pipelined AIIR filter architectures.
III. PIPELINED AIIR ARCHITECTURES
In this section, we first formulate a pipelined system identification scenario and then develop the pipelined AIIR architectures.
A. Pipelined System Identification Scenario
In Fig. 1, we show the conventional serial system identification scenario, where $H(z^{-1})$, which is the unknown plant, is given by
$$ H(z^{-1}) = \frac{B(z^{-1})}{1 - A(z^{-1})} \tag{3.1} $$
with numerator polynomial $B(z^{-1})$ and denominator polynomial $1 - A(z^{-1})$; $\eta(n)$ is the additive noise uncorrelated with the input $x(n)$, $y(n)$ is the plant output (also the desired signal), and $H_m(n, z^{-1})$ is the time-varying model. Note that the numerator polynomial $B_m(n, z^{-1})$ and the polynomial $A_m(n, z^{-1})$ are adaptively computed. The denominator polynomial of the model is formed by taking the inverse $1/[1 - A_m(n, z^{-1})]$ after every coefficient update. If the orders of $B_m(n, z^{-1})$ and $A_m(n, z^{-1})$ are sufficiently high and if the adaptation mechanism converges successfully, then $B_m(n, z^{-1})$ will approach $B(z^{-1})$, and $A_m(n, z^{-1})$ will be close to $A(z^{-1})$. One of the disadvantages inherent to the equation-error approach is that the final solution may be biased. In this correspondence, however, we focus only on the problem of providing an efficient pipelined architecture.
We can employ relaxed look-ahead to pipeline the adaptive algorithm in Fig. 1 and scattered look-ahead to pipeline the time-varying recursive section $1/[1 - A_m(n, z^{-1})]$. Therefore, in the pipelined system identification scenario (see Fig. 2), we assume the plant to be in a scattered look-ahead form as in (2.4). Consequently, in Fig. 2, the polynomial $A_m(n, z^{-1})$ (for any time instant $n$) has the same form as the denominator in (2.4). In practice, given a plant of type (3.1), we can emulate the behavior of an equivalent pipelined plant by delaying the plant output $y(n)$ by $M$ latches (see Fig. 2).
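Because the pipelined plant is assumed to be in the scattered look-ahead form (2.4), it may help to see one way of obtaining that form from a serial denominator $D(z^{-1}) = 1 - \sum_{i=1}^{N} a_i z^{-i}$. The Python sketch below multiplies numerator and denominator by the $M-1$ "rotated" copies $D(\omega^k z^{-1})$, $\omega = e^{j2\pi/M}$. This is the basic root-rotation construction, chosen by us for illustration; the decomposition-based variant of [2] that reduces the hardware overhead is not shown, and the function names are hypothetical. For $1/(1 - 0.5z^{-1})$ and $M = 3$, the result is $(1 + 0.5z^{-1} + 0.25z^{-2})/(1 - 0.125z^{-3})$.

```python
import cmath
from typing import List, Tuple


def poly_mul(p: List[complex], q: List[complex]) -> List[complex]:
    """Multiply two polynomials in z^{-1} (list index = power of z^{-1})."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def scattered_look_ahead(a: List[float], M: int) -> Tuple[List[float], List[float]]:
    """M-step scattered look-ahead of 1 / (1 - sum_{i=1}^{N} a_i z^{-i}).

    Numerator and denominator are both multiplied by
    prod_{k=1}^{M-1} D(omega^k z^{-1}), omega = exp(j 2 pi / M),
    so the new denominator contains only powers of z^{-M}.
    For real a_i the imaginary parts cancel up to rounding, so only the
    real parts of the (numerator, denominator) coefficients are returned.
    """
    D = [1 + 0j] + [complex(-ai) for ai in a]
    num = [1 + 0j]
    for k in range(1, M):
        omega_k = cmath.exp(2j * cmath.pi * k / M)
        rotated = [c * omega_k ** i for i, c in enumerate(D)]  # D(omega^k z^{-1})
        num = poly_mul(num, rotated)
    den = poly_mul(D, num)
    return [c.real for c in num], [c.real for c in den]


num, den = scattered_look_ahead([0.5], M=3)
print([round(c, 6) for c in num])  # [1.0, 0.5, 0.25]
print(den)  # approximately [1, 0, 0, -0.125], up to rounding error
```

The numerator has order $N(M-1)$, which is the source of the extra hardware in the pipelined architecture.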
B. Serial AIIR Filter Architecture
The serial AIIR filter is described by the following equations:
where
$$ W^T(n) = [B^T(n),\ A^T(n)] \tag{3.3a} $$
$$ U^T(n) = [x(n),\ x(n-1),\ \ldots,\ x(n-N_B+1),\ \ldots] \tag{3.3c} $$
Note that W ( n ) is the coefficient vector with B(n) and A ( n ) being the coefficient vectors of the numerator and denominator polynomials, respectively.
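The equation-error update behind (3.2)-(3.3) can be illustrated with the following Python sketch. It assumes the standard equation-error regressor, in which $U(n)$ stacks the inputs $x(n), \ldots, x(n-N_B+1)$ with the past plant outputs $y(n-1), \ldots, y(n-N_A)$, together with an LMS coefficient update. $N_A$, the plant, the step size, and the function names are hypothetical. Because past plant outputs, rather than past model outputs, enter $U(n)$, the error is linear in $W(n)$. In this noise-free example, the estimates converge to the true coefficients; with additive noise $\eta(n)$, the equation-error solution may be biased, as noted in the discussion of Fig. 1.

```python
import random
from typing import List, Sequence, Tuple


def equation_error_lms(x: Sequence[float], y: Sequence[float], NB: int, NA: int,
                       mu: float) -> Tuple[List[float], List[float]]:
    """Equation-error adaptive IIR identification with an LMS update.

    U(n) = [x(n), ..., x(n-NB+1), y(n-1), ..., y(n-NA)]
    e(n) = y(n) - W(n-1)^T U(n)        (adaptation error)
    W(n) = W(n-1) + mu e(n) U(n)
    Returns (B, A) for the model B(z^-1) / (1 - A(z^-1)).
    """
    def at(s: Sequence[float], k: int) -> float:
        return s[k] if k >= 0 else 0.0

    W = [0.0] * (NB + NA)
    for n in range(len(x)):
        U = [at(x, n - k) for k in range(NB)] + [at(y, n - k) for k in range(1, NA + 1)]
        e = y[n] - sum(wk * uk for wk, uk in zip(W, U))
        W = [wk + mu * e * uk for wk, uk in zip(W, U)]
    return W[:NB], W[NB:]


# Hypothetical plant: y(n) = 0.5 y(n-1) + x(n) + 0.2 x(n-1), no additive noise.
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(8000)]
y: List[float] = []
for n in range(len(x)):
    y.append(0.5 * (y[n - 1] if n else 0.0) + x[n] + 0.2 * (x[n - 1] if n else 0.0))

B_hat, A_hat = equation_error_lms(x, y, NB=2, NA=1, mu=0.05)
print([round(b, 4) for b in B_hat], [round(c, 4) for c in A_hat])  # near [1, 0.2] and [0.5]
```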
The serial AIIR (SAIIR) filter architecture is shown in Fig. 3. Due to the recursive structure of the adaptation and filtering operations, the SAIIR architecture in Fig. 3 has a throughput bottleneck. In particular, it can be seen that the critical path for SAIIR has a computation time of
C. The PIPAIIR Architecture
In order to derive the PIPAIIR architecture, we start with the SAIIR equations (3.2). The process of pipelining the SAIIR proceeds in two steps. First, we transform (3.2) such that it is applicable to the scattered look-ahead-based pipelined system identification model of Fig. 2. This step will result in the pipelining of the FC block. Next, we apply relaxed look-ahead to the adaptive sections, which will result in the PIPAIIR architecture.
By inspection of (2.4), we transform (3.2) as follows:
$$ A^T(n) = [a_M(n),\ a_{2M}(n),\ \ldots] $$
Note that the resulting numerator order is greater than $N_B$. This is precisely due to the fact that we have assumed a pipelined system identification model based on scattered look-ahead. Next, we pipeline the adaptive sections via relaxed look-ahead. Applying delay and then sum relaxations to (3.5a) (see also (2.10)), we obtain the equations (3.6) that describe the PIPAIIR architecture. In Fig. 4, we show the PIPAIIR architecture, where the $D_2$ delays have been retimed such that they are placed at the outputs of the FB and FA blocks. This requires delaying $e_1(n)$ by an additional $D_2$ delays. Note that the plant output $y(n)$ is delayed by $M = D_2 + D_3$. This is because (3.6c) introduces a latency of $D_3$ samples, which is now added to the $D_2$ delays due to retiming.
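The combined effect of the two transformations on the adaptive filter can be illustrated with a small Python sketch. It is an illustrative form, not a reproduction of (3.6). The regressor follows the scattered look-ahead model, with past plant outputs only at lags that are multiples of $M$. The update applies the delay and sum relaxations in the same way as the PIPLMS recursion $W(n) = W(n-D_2) + \mu\sum_{i=0}^{L_A-1} e(n-D_1-i)U(n-D_1-i)$ with $e(n) = y(n) - W^T(n-D_2)U(n)$. For the hypothetical plant $1/(1 - 0.5z^{-1})$ and $M = 2$, the scattered look-ahead form is $(1 + 0.5z^{-1})/(1 - 0.25z^{-2})$, so the adapted numerator has two taps instead of one. This mirrors the increase of the numerator order beyond $N_B$ noted above. Parameter values and function names are hypothetical.

```python
import random
from typing import List, Sequence


def pipelined_equation_error(x: Sequence[float], y: Sequence[float], NB: int,
                             y_lags: Sequence[int], mu: float, D1: int, D2: int,
                             LA: int = 1) -> List[float]:
    """Equation-error update with delay and sum relaxations (illustrative form).

    U(n) = [x(n), ..., x(n-NB+1)] + [y(n-lag) for lag in y_lags]
    e(n) = y(n) - W(n-D2)^T U(n)
    W(n) = W(n-D2) + mu * sum_{i=0}^{LA-1} e(n-D1-i) U(n-D1-i)
    Returns the final coefficient vector W.
    """
    def at(s: Sequence[float], k: int) -> float:
        return s[k] if k >= 0 else 0.0

    def regressor(n: int) -> List[float]:
        return [at(x, n - k) for k in range(NB)] + [at(y, n - lag) for lag in y_lags]

    size = NB + len(y_lags)
    W: List[List[float]] = []
    e: List[float] = []
    for n in range(len(x)):
        w_old = W[n - D2] if n >= D2 else [0.0] * size
        e.append(y[n] - sum(wk * uk for wk, uk in zip(w_old, regressor(n))))
        w_new = list(w_old)
        for i in range(LA):
            m = n - D1 - i
            if m >= 0:
                w_new = [wk + mu * e[m] * uk for wk, uk in zip(w_new, regressor(m))]
        W.append(w_new)
    return W[-1]


# Hypothetical plant y(n) = 0.5 y(n-1) + x(n); its 2-step scattered look-ahead
# form is y(n) = 0.25 y(n-2) + x(n) + 0.5 x(n-1).
rng = random.Random(2)
x = [rng.uniform(-1.0, 1.0) for _ in range(6000)]
y: List[float] = []
for n in range(len(x)):
    y.append(0.5 * (y[n - 1] if n else 0.0) + x[n])

W = pipelined_equation_error(x, y, NB=2, y_lags=[2], mu=0.05, D1=2, D2=2)
print([round(w, 4) for w in W])  # near [1.0, 0.5, 0.25]
```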
In a practical implementation, we would employ the $D_2$ latches to pipeline the filter blocks FB and FA, whereas the $D_1$ latches would be used to pipeline the WUDA and WUDB blocks, and the $D_3$ latches would be employed to pipeline the FC block. The hardware requirements for SAIIR and PIPAIIR are shown in Table I. The TYPE I adder in Table I refers to the adder in the filter blocks FB, FA, and FC, whereas the TYPE II adder is present in the weight-update blocks WUDA and WUDB. It is clear that the increase in hardware is mainly due to the increase in the numerator order.
The convergence behavior of PIPAIIR can be easily obtained by making the corresponding substitutions into (2.11)-(2.13). We will verify the convergence analysis of PIPAIIR via simulations in the next section.
IV. SIMULATION RESULTS
In this section, we present simulation results to verify the convergence analysis results of Section III-B and the utility of sum relaxation.
A. Experiment A
In this experiment, we consider the problem of identifying the plant o.5;-1
The value of $D_3$ was kept at unity, whereas $D_1$ was varied to obtain different values of $K$ for $D_2 = 1$ and $D_2 = 2$. The plant SNR, which is defined as the ratio of the power of the input signal $x(n)$ to the power of the additive noise $\eta(n)$, was 32 dB. All simulation results were averaged over 100 independent trials, and the final results of this experiment are shown in Table III.
... is 3. It can be seen that the PIPAIIR architecture in Fig. 6 can be clocked (assuming uniform pipelining) with a minimum clock period of $T_{\mathrm{pipe}} = 20$ time units, which corresponds to a speedup of 7.2. Note that this speedup has been achieved in spite of an increase in the order of the FB block.
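For orientation, the figures quoted above can be related by simple arithmetic. This check makes two assumptions. First, the speedup is taken as the ratio of the serial critical-path time to the pipelined clock period; the serial value is not given explicitly in this section. Second, the plant SNR is taken as $10\log_{10}$ of the power ratio, writing $\sigma_x^2$ and $\sigma_\eta^2$ for the input and noise powers. Under these assumptions, the figures imply

$$ \text{speedup} = \frac{T_{\text{serial}}}{T_{\text{pipe}}} \;\Rightarrow\; T_{\text{serial}} \approx 7.2 \times 20 = 144 \ \text{time units}, \qquad 10\log_{10}\frac{\sigma_x^2}{\sigma_\eta^2} = 32\ \text{dB} \;\Rightarrow\; \frac{\sigma_x^2}{\sigma_\eta^2} = 10^{3.2} \approx 1585. $$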
In Table III, we summarize the effect of pipelining on SNRE and SNRE1, where SNRE and SNRE1 are the SNRs with respect to the adaptation error $e(n)$ and the estimation error $e_1(n)$, respectively. From Table III, it is clear that as the speedup increases, both SNRE and SNRE1 change only slightly. Even for a speedup of 8, ... (Table III). Note that as the speedup increases, the number of samples required to converge also increases. The slower convergence of PIPAIIR can be rectified by using sum relaxation with $L_A = 3$. As $L_A = 3$ implies an increase in the computation time of the WUDA and WUDB blocks, we need to increase the value of $D_1$ from 5 to 10. From Experiment B, we know that the degradation in SNR is negligible for these parameter values. In Fig. 8, we have plotted the
