In this paper, a non-uniform number representation and filter transformation techniques are used to increase the throughput rate of a Decision Feedback Equaliser (DFE). The DFE input data is non-uniformly quantised and represented by a signed power-of-two (SPT) number. Using this number representation, multipliers can be replaced with barrel shifters and adders. The mean quantisation noise power using SPT input data is examined. The Delayed Least Mean Square (DLMS) algorithm is used for training the DFE. The delay in the filter coefficient update, together with transformation techniques, results in a DFE structure that can be realised as the cascade of a series of modular sections.
I. INTRODUCTION
Numerous equaliser algorithms have been reported in the literature, such as decision feedback equalisation and maximum likelihood sequence estimation [1]. Decision feedback equalisation is often advantageous in channels with long impulse responses, where the complexity of maximum likelihood sequence estimation becomes prohibitive. Increasing data rates will inevitably lead to increased intersymbol interference in future TDMA systems. This is particularly true of high data rate wireless LANs such as HIPERLAN [2]. The HIPERLAN standard supports data rates of up to 23.5 Mb/s which, even in indoor environments, can lead to very severe intersymbol interference. This presents a problem for a conventional DFE, since the time available for performing the equaliser update is very small. Motivated by this, this paper discusses modified adaptive decision feedback equalisers (DFEs) that provide significant increases in throughput rate compared to conventional realisations.
A DFE can be realised using transversal filters, lattice filters or systolic arrays [1]. In [1], adaptive equalisers were considered for application to TDMA based systems which, in some cases, impose severe tracking requirements on the equaliser. However, in the case of HIPERLAN, and generally in indoor environments, reasonably stationary channel conditions can be assumed. The equaliser is trained using all, or part of, a 450-bit header packet and may then be fixed while the following data blocks (up to 49) are processed.
The comparatively long training sequence allows low complexity (slow converging) algorithms to be used for equaliser training. For this reason the LMS and DLMS algorithms have been considered here.
In this paper, two methods for improving the throughput rate of a transversal filter based DFE are described in Section II. In Section III, the convergence and output mean square error characteristics of the DFEs are examined.
II. ADAPTIVE TRANSVERSAL EQUALISERS
Two methods for increasing the throughput rate of a transversal filter based DFE are described in this section. The first method uses non-uniform quantisation (a signed power-of-two (SPT) approximation) of the equaliser input data [3], as shown in figure 1. The SPT approximation is carried out following frame synchronisation. The SPT quantisation is applied to the input data rather than to the filter coefficients. This approximation leaves the performance of the equaliser largely unaffected (see Section III), while exploiting the SPT representation of the input data can significantly reduce the complexity of the equaliser. In addition, the non-uniform quantisation of the input data is required only once per input sample. In contrast, SPT filter coefficients would have to be non-uniformly quantised again after each update, which introduces additional latency and complexity within the coefficient update loop.
A transversal filter is a natural choice for the equaliser architecture, since it allows significant exploitation of the SPT representation of the input data. Using the LMS algorithm for equaliser training allows the SPT input data to be exploited further, reducing the complexity of the coefficient update recursion. Unfortunately, adaptive transversal filters suffer from an inherent sampling rate limitation for a given speed of hardware. This is due to the feedback of the residual error needed to adapt the filter coefficients: the whole residual error calculation must be completed before the coefficient update can be performed. This throughput bottleneck can, however, be overcome using the DLMS algorithm [4]. The DLMS algorithm, an approximation of the LMS algorithm, is used here to realise a modular, high throughput filter structure whose clock rate is limited only by the delay in a single processing module. The modified structure also operates directly on the input data stream, so it can again benefit from the SPT number representation. It is shown in Section III that the degradation in performance when using the DLMS algorithm is not significant. However, the equaliser's stability becomes increasingly sensitive to the choice of step size as the delay in the coefficient adaptation is increased.
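The following sketch illustrates the delayed update. It identifies an unknown real-valued FIR channel with the recursion W(n) = W(n-1) + mu e(n-D) X(n-D). This is the real-valued case of the DLMS update, so no conjugate appears. The channel taps, step size `mu`, delay `delay` and binary training input are illustrative assumptions, and `dlms_identify` is a hypothetical name. The key point is that the update at time n consumes an error computed D samples earlier, which is what allows the error calculation to be pipelined.

```python
import random


def dlms_identify(h_true, mu=0.02, delay=4, n_samples=5000, seed=1):
    """Identify an unknown FIR channel with the delayed LMS (DLMS) update.

    The update applied at time n uses e(n - delay) and X(n - delay), so it
    does not have to wait for the error of the current sample.
    """
    rng = random.Random(seed)
    taps = len(h_true)
    w = [0.0] * taps
    x_vec = [0.0] * taps          # X(n) = [x(n), x(n-1), ..., x(n-taps+1)]
    pending = []                  # (e(n), X(n)) pairs waiting `delay` samples
    for _ in range(n_samples):
        x_vec = [rng.choice((-1.0, 1.0))] + x_vec[:-1]
        d = sum(h * x for h, x in zip(h_true, x_vec))   # desired response
        y = sum(wk * x for wk, x in zip(w, x_vec))      # filter output
        pending.append((d - y, x_vec))
        if len(pending) > delay:
            e_old, x_old = pending.pop(0)               # e(n-D), X(n-D)
            w = [wk + mu * e_old * xk for wk, xk in zip(w, x_old)]
    return w
```

Setting `delay=0` reduces the recursion to the conventional LMS update. With a small step size and this noise-free setup, both versions converge to the channel taps.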
A. Non-Uniform Number Representation
A discrete-time, B-bit two's complement number x(m) can be represented in the signed power-of-two space [5]. This is illustrated in figure 2, where a fractional value in the range [0, 1], represented by a two's complement number, is approximated by the nearest 2-SPT value.
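The figure 2 idea of snapping a value to the nearest N-SPT number can be sketched as an exhaustive search over all sums of N signed power-of-two terms. This minimal sketch makes two assumptions of its own, neither taken from the paper: the available terms are +/-2^-p for p = 0, ..., B-1, and an "unused" term contributes zero. The function names are hypothetical. Exhaustive search is only practical for small N and B, but it guarantees the nearest N-SPT value.

```python
from itertools import combinations_with_replacement


def spt_levels(n_terms, bits=8):
    """All sums of n_terms terms drawn from {0, +/-2^-p : p = 0..bits-1}."""
    terms = [0.0] + [s * 2.0 ** -p for p in range(bits) for s in (1.0, -1.0)]
    return sorted({sum(c) for c in combinations_with_replacement(terms, n_terms)})


def nearest_nspt(x, n_terms, bits=8):
    """Nearest N-SPT approximation of x, found by exhaustive search."""
    return min(spt_levels(n_terms, bits), key=lambda v: abs(v - x))
```

For example, 0.7 maps to 0.5 with one term and to 0.75 (0.5 + 0.25) with two terms.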
Hereafter, N-SPT will be used to denote an approximation of a two's complement integer using N power-of-two (POT) terms, each taking either positive or negative sign. The area and/or latency of a multiplier can be significantly reduced by using restricted-number representations for either the multiplier or the multiplicand, i.e. by limiting the number of non-zero digits. The multiplier can then be replaced with shift and addition elements. Using a 2-SPT representation of the input data, as described above, allows the multipliers in both the transversal filter and the coefficient update modules (for the LMS algorithm) to be replaced with a pair of barrel shifters and a single adder.
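This sketch shows how multiplying a coefficient by a 2-SPT data sample reduces to two shifts and one addition or subtraction. It rests on assumptions of our own:

- The coefficient `w` is an integer.
- The data sample is given as hypothetical `(sign, p)` pairs meaning the sum of sign*2^-p.
- To keep the arithmetic exact, the shifts are applied to the left and the result is returned scaled by 2^P, where P is the largest p.

A hardware implementation would more typically right-shift and truncate.

```python
def spt_multiply(w, terms):
    """Multiply integer w by an SPT datum using shifts and adds only.

    terms: (sign, p) pairs with sign in {+1, -1}, meaning x = sum(sign * 2**-p).
    Returns (acc, P) with acc == w * x * 2**P exactly, where P = max p.
    """
    P = max(p for _, p in terms)
    acc = 0
    for sign, p in terms:
        shifted = w << (P - p)                              # barrel shifter
        acc = acc + shifted if sign > 0 else acc - shifted  # adder / subtractor
    return acc, P
```

For instance, `spt_multiply(45, [(1, 1), (-1, 3)])` multiplies 45 by 0.5 - 0.125 = 0.375 and returns `(135, 3)`, i.e. 135/8 = 16.875.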
By applying the N-SPT approximation to the output of a uniform ADC, the composition of the two processes can be viewed as a non-uniform ADC, an N-SPT ADC. The mean quantisation noise power of such an ADC is characterised in figure 3 for variable N and B [6].
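The following is a Monte Carlo sketch of the N-SPT ADC described above. Each sample is first rounded by a B-bit uniform quantiser and then mapped to the nearest N-SPT level. The mean squared difference from the unquantised input is then measured. The input distribution (uniform on [-1, 1)), the rounding and saturation rules, the set of POT terms and all names are our assumptions. The numbers it produces are therefore not those of figure 3; the sketch only shows how such a noise-power curve can be estimated.

```python
import bisect
import random
from itertools import combinations_with_replacement


def nspt_levels(n_terms, bits):
    """Sorted N-SPT levels built from the terms {0, +/-2^-p : p = 0..bits-1}."""
    terms = [0.0] + [s * 2.0 ** -p for p in range(bits) for s in (1.0, -1.0)]
    return sorted({sum(c) for c in combinations_with_replacement(terms, n_terms)})


def nearest_level(levels, x):
    """Nearest entry of the sorted list `levels` to x."""
    i = bisect.bisect_left(levels, x)
    return min(levels[max(i - 1, 0):i + 1], key=lambda v: abs(v - x))


def uniform_quantise(x, bits):
    """B-bit two's complement rounding of x, saturating at [-1, 1 - step]."""
    step = 2.0 ** -(bits - 1)
    return max(-1.0, min(1.0 - step, round(x / step) * step))


def nspt_adc_noise_power(n_terms, bits, n_samples=20000, seed=0):
    """Monte Carlo estimate of the mean quantisation noise power of an N-SPT ADC."""
    rng = random.Random(seed)
    levels = nspt_levels(n_terms, bits)
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        xq = nearest_level(levels, uniform_quantise(x, bits))
        total += (x - xq) ** 2
    return total / n_samples
```

Evaluating `nspt_adc_noise_power(n, 8)` for n = 1, 2, 3 gives estimates that decrease as more POT terms are allowed.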
[Figure 3: mean quantisation noise power versus number of POT terms in the SPT number]

It was noted above that the representation of the feedback data samples was arbitrary. To minimise complexity, it is clearly advantageous to represent them as POT terms. By adjusting the magnitude of the training data samples, the magnitude of the filter coefficients can generally be kept bounded by one.
B. Pipelined DLMS DFE
The DLMS algorithm has previously been employed to allow pipelining of the LMS algorithm for a linear structure [4]. This method is extended here to develop a modular, high throughput DFE structure. The DLMS algorithm is given by equations (2) and (3).
The $W(n)$ and $X(n)$ vectors are first partitioned into their feedforward and feedback sections, and (2) is then rewritten as

$$[\,W_f(n),\ W_b(n)\,] = [\,W_f(n-1),\ W_b(n-1)\,] + \mu\, e^*(n-D)\,[\,X_f(n-D),\ X_b(n-D)\,] \qquad (4)$$

where $X_b(n)$ is the vector of previously detected symbols, i.e.

$$X_b(n) = [\,d(n-1)\ \ d(n-2)\ \cdots\ d(n-L)\,]$$

The vectors of filter coefficients for the feedforward and feedback filters are defined as

$$W_f(n) = [\,w_f^0(n)\ \cdots\ w_f^{L-1}(n)\,]$$

and

$$W_b(n) = [\,w_b^0(n)\ \cdots\ w_b^{L-1}(n)\,]$$

In a manner similar to [4], an output vector $Y(n)$ is defined. The elements of $Y(n)$ are given by (5), where $y_b(n-i)$ is the contribution to the current equaliser output from the feedback filter. Rewriting (5) gives
Initially, we define $y_b(n-i)$ as in (7a). It is stressed that this is not a strict realisation of the DLMS algorithm. However, inserting delays in the filter coefficient terms in (7a) yields a transposed filter structure implementing the DLMS algorithm. The kth coefficient used in (7a) is delayed by an additional $L-1-k$ sample periods, i.e. the delay elements $D_k$ are set to $D_k = L-k$. In this case, $y_b(n-i)$ is given by (7b).
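The sketch below makes the partitioned update (4) concrete. It trains a direct-form complex DFE (not the transposed or pipelined structure) with the DLMS recursion, then freezes the coefficients for decision-directed operation. It follows the conventions of (4):

- The output is taken as y(n) = W^H X(n), using the coefficients available at time n.
- The stacked vector is [X_f(n), X_b(n)], with X_b(n) = [d(n-1) ... d(n-L)].
- The update uses e*(n-D) and X(n-D).

The following are illustrative assumptions: the noise-free three-tap channel, the zero decision delay, the step size, and feeding the known training symbols back during training. `run_dfe` and its parameters are hypothetical names.

```python
import random

QPSK = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)


def decide(y):
    """Hard decision onto the +/-1 +/-j QPSK alphabet."""
    return complex(1 if y.real >= 0 else -1, 1 if y.imag >= 0 else -1)


def run_dfe(h, n_train=6000, n_data=1000, taps=3, delay=3, mu=0.01, seed=2):
    """Train a (taps, taps) DFE with DLMS, then run it decision-directed."""
    rng = random.Random(seed)
    n_total = n_train + n_data
    s = [rng.choice(QPSK) for _ in range(n_total)]
    # Noise-free channel output x(n) = sum_k h_k s(n - k)
    x = [sum(h[k] * s[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(n_total)]
    w = [0j] * (2 * taps)   # stacked coefficients [W_f, W_b]
    pending = []            # (e(n), X(n)) pairs waiting D samples for their update
    d = [0j] * n_total      # training or detected symbols d(n)
    sq_err, errors = [], 0
    for n in range(n_total):
        x_f = [x[n - i] if n - i >= 0 else 0j for i in range(taps)]
        x_b = [d[n - i] if n - i >= 0 else 0j for i in range(1, taps + 1)]
        vec = x_f + x_b                                        # [X_f(n), X_b(n)]
        y = sum(wk.conjugate() * v for wk, v in zip(w, vec))  # y(n) = W^H X(n)
        if n < n_train:
            d[n] = s[n]                        # training: desired response known
            e = d[n] - y
            sq_err.append(abs(e) ** 2)
            pending.append((e, vec))
            if len(pending) > delay:
                e_old, v_old = pending.pop(0)  # e(n - D) and X(n - D)
                w = [wk + mu * e_old.conjugate() * v for wk, v in zip(w, v_old)]
        else:
            d[n] = decide(y)                   # coefficients frozen after training
            errors += d[n] != s[n]
    return w, sq_err, errors
```

With the channel `[1.0, 0.4 + 0.2j, 0.2]`, the feedback taps can cancel the post-cursor interference exactly. The training error therefore becomes small, and the frozen equaliser makes no decision errors on the following data symbols.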
The transformed data flow diagram for a (3,3) DFE using the DLMS algorithm is shown in figure 5 . The structure consists of three identical processing modules (PMs). The latency in the output is 2L-1 sample periods. This is the time required for all the feedforward filter stages to fill and for the estimate of the desired response to propagate along the filter structure. It should be noted that the input to the feedforward filter enters from the left whilst the previous decision is input to all the feedback filter sections simultaneously. Note also that the index for the feedforward filter coefficients increases left to right, but for the feedback filter coefficients, it decreases left to right.
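Broadcasting the previous decision to every feedback section, with the coefficient index decreasing from left to right, is the input pattern of a transposed-form FIR filter. The cycle-level sketch below models such a chain of identical modules. Each module holds one coefficient, one multiply-add and one output register, and the chain produces the same output as direct-form convolution. The model is simplified under our own assumptions: the coefficients are fixed (no adaptation), only one filter section is modelled, and the 2L-1 output latency of the full structure is not represented. `transposed_fir` is a hypothetical name.

```python
def transposed_fir(coeffs, samples):
    """Cycle-level model of a transposed-form FIR chain of identical modules.

    Modules are ordered left to right holding coeffs[L-1], ..., coeffs[0];
    every module sees the same (broadcast) input sample, and partial sums
    move one module to the right per clock. The rightmost module gives y(n).
    """
    L = len(coeffs)
    order = list(reversed(coeffs))   # coefficient index decreases left to right
    regs = [0] * L                   # output register of each module
    out = []
    for x in samples:
        new = []
        for m in range(L):
            incoming = regs[m - 1] if m > 0 else 0   # partial sum from the left
            new.append(order[m] * x + incoming)      # multiply-add in module m
        out.append(new[-1])
        regs = new                                   # clock edge
    return out
```

For coefficients `[3, -1, 2]` and input `[1, 0, 2, -1, 4]`, the chain outputs `[3, -1, 8, -5, 17]`, identical to the sum over k of c_k x(n-k).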
The weight update for $w_f^i(n-i)$ required by (5) is obtained from (4) as
For the update of $w_b^i(n-i)$, there are two forms, corresponding to equations (7a) and (7b). For (7a), the weight update is
For (7b) the weight update is given by
Both (9a) and (9b) require global communication. In (9a), the same error term is fed back to all the coefficient update sections, whereas in (9b) the same data symbol is fed back. The form (9b) is attractive because the feedback data is only a complex number of the form ±1±j. In addition, because of the reversed order of the feedback filter coefficients, the error term in (8) is the same as that required in (9b), which reduces the communication costs considerably. An individual processing section for the DLMS DFE structure, using the update (9b), is shown in figure 6. The complexity of the proposed filtering structure differs from that of the LMS algorithm only in the additional pipelining latches. An N-SPT approximation of the input data can also be used to reduce the complexity, as described above for the LMS algorithm. In figure 6, adder A2 in each PM combines the feedback and feedforward filter contributions to the estimate of the desired response. Alternatively, the feedforward and feedback contributions can be propagated separately and summed in the last filter stage. Postponing the summation is particularly advantageous if N-SPT and M-SPT input data samples are used for the feedforward and feedback filter stages respectively. The summation of N+M two's complement numbers with the data estimate from the lower order stage can then be carried out once, in the final PM. The data streams in the final PM can be combined using an adder tree to determine the estimate of the desired response.
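Because the feedback data in (9b) are restricted to ±1±j, multiplying the conjugated error by a fed-back symbol needs no multiplier. The sketch below computes e* times d using only sign changes and two additions. It assumes d is one of the four QPSK points and follows the e*(n-D) times data ordering of the update (4). The function name is hypothetical, and the scaling by the step size mu is not included.

```python
def conj_error_times_symbol(e, d):
    """Return conj(e) * d for d in {+1+j, +1-j, -1+j, -1-j} without multiplying.

    With e = a + jb and d = p + jq (p, q = +/-1):
        conj(e) * d = (a*p + b*q) + j*(a*q - b*p)
    so each product by p or q is only a conditional negation.
    """
    a, b = e.real, e.imag
    ap = a if d.real > 0 else -a
    bq = b if d.imag > 0 else -b
    aq = a if d.imag > 0 else -a
    bp = b if d.real > 0 else -b
    return complex(ap + bq, aq - bp)   # two adders, no multipliers
```

In hardware, the conditional negations correspond to selecting between a value and its two's complement negation. This is why the update form (9b) is cheaper than one that broadcasts a general complex error.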
This structure also has the attractive feature that using different number representations for the feedforward and feedback input data streams does not affect the regularity of the structure, i.e. the functionality of each processing module remains identical. For the conventional LMS algorithm, different feedforward and feedback filter structures would be required for the different number representations of the input data. This is particularly useful in an application such as HIPERLAN, where the use of a GMSK modulation scheme allows significant savings to be made in the feedback architecture [2].
III. CONVERGENCE & RESIDUAL MSE
The effect of the non-uniform approximation of the input signal on the equaliser's performance is considered here. For comparison, a stationary channel characteristic leading to an eigenvalue spread of 46 [7] is used to distort a QPSK signal. Noise is added ($E_b/N_0$ = 20 dB) and the signal is root raised cosine filtered. Figure 7 compares the convergence of two equalisers. The first is a (3,3) DFE using the DLMS algorithm and 2-SPT input data, with the approximation obtained from a linearly quantised 8-bit input data stream. The second uses the LMS algorithm with the original 8-bit input data. For clarity, only a small number of points have been plotted for the LMS algorithm. The step size was the same in both cases. Neither the algorithm approximation nor the non-uniform quantisation of the input data has a significant effect on the convergence behaviour of the equaliser.
IV. CONCLUSIONS
This paper has discussed two methods to reduce the complexity and increase the throughput of adaptive transversal DFEs for applications such as HIPERLAN.
In the first method, non-uniform quantisation of the feedforward input data stream was proposed. The input data was represented using a signed power-of-two number representation, which allowed the multipliers to be simplified. In the second method, a new modular structure for implementing a pipelined DFE using the DLMS algorithm was described. With the modified structure, the throughput rate is determined by a single multiplier, barrel shifter and adder. Using non-uniform quantisation of the input data in conjunction with this structure allows the throughput rate to be improved still further.
Vol. CAS-40, No. 11, Nov. 1993, pp. 727-729.
