Design of a novel delayed LMS decision feedback equaliser for HIPERLAN/1 FPGA implementation by Sun, Yong et al.
Sun, Y., Nix, A. R., Bull, D., Milford, D., de Beauchesne, H., Sperling, R., &
Rouzet, P. H. (1999). Design of a novel delayed LMS decision feedback
equaliser for HIPERLAN/1 FPGA implementation. pp. 300-304.
doi:10.1109/VETEC.1999.778065
Peer reviewed version
Design of a Novel Delayed LMS Decision Feedback Equaliser for
HIPERLAN/1 FPGA Implementation
Y. Sun, A.R. Nix,
D.R. Bull & D. Milford
University of Bristol, Centre for
Communications Research, MVB Room 2.19,
Woodland Road, Bristol BS8 1UB, UK
Fax: +44-117-954-5206
E-mail: Yong.Sun@bristol.ac.uk
H. de Beauchesne,
R. Sperling & Ph. Rouzet
Thomson-CSF, Detexis
55, Quai Marcel Dassault - 92214
Saint-Cloud, Paris, France
Fax: +33 1 3481 6165
E-mail: rouzet@detexis.thomson-csf.com
Abstract - This paper presents the investigation of a new
equaliser algorithm and architecture optimised for low cost
FPGA implementation. The design was performed as part of
the ESPRIT WINHOME project and is fully compliant with
the European third generation HIPERLAN/1 wireless LAN
standard. The equaliser supports GMSK modulation at an
instantaneous transmission data-rate of just under 24
Mbit/s.
In this paper the equaliser algorithm and pipelined DLMS
DFE architecture are presented. Issues such as signal
quantisation, bit and frame synchronisation and frequency
offset correction are discussed in detail. The final structure
is shown to achieve considerable hardware simplification
together with improved performance when compared to a
standard implementation of the complex LMS equaliser.
I. Introduction
HIPERLAN represents a family of new European high-speed
wireless LAN standards. The first of these standards,
HIPERLAN/1, was completed in 1996 and products are
expected early in the new millennium. Standardisation
bodies in Europe (ETSI BRAN), North America (IEEE
802.11) and Japan (MMAC-PC) are continuing to develop
similar standards for high-speed wireless LANs.
With the convergence of computing, broadcasting and
telecommunications, multimedia computers have started to
enter the home in volume. The desire for high bit-rate wireless
communications is not only emerging from industry and
education, but also from private individuals in the home.
The ESPRIT WINHOME (Wireless INnovation in the
HOME) project [1] aims to provide domestic environments
with high quality interactive digital television, internet,
satellite access (via Eutelsat), videophone communications and
other multi-media services via the HIPERLAN/1 wireless
interface.
The main focus of the WINHOME project is to achieve
high quality wireless video links throughout the home
environment using HIPERLAN/1 technology. This will be
achieved through a combination of advanced equaliser
design and optimised video coding techniques. The WINHOME
system incorporates an adaptive Decision Feedback
Equaliser (DFE) filter to eliminate harmful Inter-Symbol
Interference (ISI), realised using FPGA (Field Programmable
Gate Array) technology. However, even the simplest
standard LMS algorithm is too complicated for such
technology. Hence, the design of a simple, high bit rate, low
power adaptive equaliser is a major challenge within the
WINHOME project.
When designing the equaliser there were four main factors
that directly affected its complexity: (1) the training
algorithm; (2) the equaliser structure; (3) the quantisation level;
and (4) the internal data representation. In addition, for
ad-hoc networks, the receiver must operate without control
from a central node. As a consequence, synchronisation and
frequency offset correction have to be carefully designed. In
this paper we present a low complexity, high performance
DLMS DFE algorithm and describe the architecture
chosen for its implementation. In section II, the developed
Delayed LMS (DLMS) DFE architecture is presented.
Section III describes and demonstrates the particular form of
the LMS update algorithm proposed. The signal quantisation
study is presented in section IV while section V explains
our recommended synchronisation and frequency
offset correction techniques.
II. DLMS DFE Architecture
A general standard LMS update algorithm, without
considering the FeedBack Filter (FBF), can be written as:
$$C_{k+1} = C_k + \Delta \, e_k \, V_k^{*} \qquad (1)$$
where C, e and V represent coefficients, error and input
data, respectively, and $\Delta$ is the adaptation step size.
A standard LMS DFE critical path can be obtained as 
shown in figure 1, where the multipliers operate with com- 
plex inputs. The compare device determines the difference 
(error) between the received signal and the training se- 
quence. The gradient estimate is weighted by the step size 
and the step size is selected as an exact power-of-two (POT) 
term. Assuming a fixed step size, a simple hard-wired bit 
shift can be used (as shown in figure 1). The hard detection
process is not shown in the critical path since this can be
performed in parallel with the coefficient update process.
Furthermore, the hard decision process is very simply
implemented by taking the sign of the real or imaginary part of
the DFE output data according to even or odd bit slots in the
GMSK modem.
0-7803-5565-2/99/$10.00 © 1999 IEEE
Figure 1: Critical path of standard LMS DFE 
The latency of the LMS DFE is mainly determined by two 
complex multipliers, the adder trees and the compare de- 
vice. To perform these tasks sequentially at the HIPER- 
LAN bit rate is not possible using current FPGA technology 
(the critical path in figure 1 must be performed within 
42ns). Hence, to enable a practical implementation the De- 
layed LMS DFE must be considered. 
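The 42 ns budget quoted above follows from the bit rate. The short check below is only illustrative: it assumes the nominal HIPERLAN/1 high bit rate of 23.5294 Mbit/s, whereas the text itself only states "just under 24 Mbit/s", and the names are hypothetical.

```python
# Assumed nominal HIPERLAN/1 high bit rate; the paper only says
# "just under 24 Mbit/s".
BIT_RATE_HZ = 23.5294e6


def bit_period_ns(bit_rate_hz: float) -> float:
    """Return the duration of one bit period in nanoseconds."""
    return 1e9 / bit_rate_hz


period_ns = bit_period_ns(BIT_RATE_HZ)
print(f"bit period = {period_ns:.1f} ns")
```

Every operation on the critical path of figure 1 would have to complete inside this one bit period in a non-pipelined design.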
The conventional complex form of the DLMS algorithm can
be written as,
$$C_{k+1} = C_k + \Delta \, e_{k-D} \, V_{k-D}^{*} \qquad (2)$$
$$e_k = d_k - C_k^{T} V_k \qquad (3)$$
where $C_k$ represents the vector of filter coefficients, $V_k$ the
vector of filter input data, $\Delta$ is the step-size and $e_k$ the error
between the equaliser output, $Z_k$, and the desired response $d_k$
(here $d_k$ is obtained from the 450 bit HIPERLAN/1 training
sequence). D represents the delay relaxation measured in
bit periods.
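The delayed update can be sketched in a few lines of NumPy. This is a minimal floating-point illustration of the DLMS recursion for a feedforward-only transversal filter, not the paper's hardware design: the feedback filter is omitted, and the function name, argument names and the history buffer are assumptions made for the example.

```python
import numpy as np


def dlms_train(v, d, n_taps, delta, delay):
    """Delayed LMS for a feedforward-only transversal filter.

    v: complex input samples, d: desired (training) symbols,
    delay: the delay D; delay=0 reduces to the standard LMS update.
    Returns the final coefficients and the error sequence.
    """
    c = np.zeros(n_taps, dtype=complex)
    errors = np.zeros(len(v), dtype=complex)
    history = []  # past (error, input vector) pairs for the delayed update
    for k in range(len(v)):
        # Input vector V_k = [v[k], v[k-1], ...], zero padded at the start.
        vk = np.array([v[k - i] if k - i >= 0 else 0.0 for i in range(n_taps)],
                      dtype=complex)
        e = d[k] - c @ vk  # e_k = d_k - C_k^T V_k (no conjugation)
        errors[k] = e
        history.append((e, vk))
        if k >= delay:
            e_old, v_old = history[k - delay]
            # C_{k+1} = C_k + delta * e_{k-D} * conj(V_{k-D})
            c = c + delta * e_old * np.conj(v_old)
    return c, errors
```

Because the update uses an error that is D bit periods old, the filter output and the coefficient update no longer have to finish within the same bit period, at the cost of a tighter limit on the usable step size.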
Naturally, by applying the above algorithm, the FeedFor- 
ward Filter (FFF) and the Coefficient Update (CU) can be 
computed in parallel. This allows one complex multiplier 
plus the error production and CU parts to be removed from 
the critical path. The latency has now been reduced to half 
that of the standard sequential LMS. However, since the 
detection device and the error production processes are sig- 
nificantly less complex in the feedback part of the DFE 
(compared to the feedforward section), it is possible to 
move the detection device into the feedback section to
balance the latency of the FFF and the FBF. By carefully
considering the connection between the FFF and the FBF, a
sub-summation and a delay of T are introduced after the FFF
so that the FFF and the FBF processes run fully in parallel.
Further pipelining of the DLMS DFE structure can be
implemented on the FFF and the CU to isolate the complex
multipliers, assuming these processes dominate the latency.
A further delayed DLMS DFE is proposed in order to
guarantee that the computation time available for a complex
multiply is at least one symbol period (42 ns). The resulting
DFE architecture was developed into a software
demonstrator as shown in figure 2.
It is now shown that by saving multipliers in the FFF struc- 
ture, it is possible to dramatically improve the resulting gate 
count. In our design, in order to cope with the home envi- 
ronment radio channel, we have proposed a DFE (6, 5) 
structure. The resulting multiplier count could rise to 2 x 6 + 
5 = 17 complex multipliers, thus resulting in 68 real multi- 
pliers in a non-optimised scheme. 
Figure 2: Software demonstrator of designed DLMS DFE 
The following simplification was developed for the
WINHOME implementation and relies on performing the signal
and coefficient calculations in series in the feedforward
loop. This can be achieved with the latest generation of FPGA
families (e.g. the newest 1 million gate VIRTEX family
from XILINX), which allow double frequency real
multipliers (47 MHz in our case). Instead of four real multipliers
in each stage, we use only two, as shown in figure 3,
resulting in a complete DFE (6, 5) performed with just 34 real
multipliers.
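The multiplier counts quoted for the DFE (6, 5) can be reproduced with simple arithmetic. The helper below is a hypothetical illustration that takes the 2 x 6 + 5 complex multiplier count from the text as given and only varies the number of real multipliers used per complex multiplier.

```python
def dfe_real_multipliers(n_ff: int, n_fb: int, reals_per_complex: int) -> int:
    """Real multiplier count for a DFE(n_ff, n_fb).

    Uses the count stated in the text: 2 * n_ff + n_fb complex multipliers.
    """
    complex_multipliers = 2 * n_ff + n_fb
    return complex_multipliers * reals_per_complex


# Four real multipliers per complex multiplier (non-optimised scheme).
baseline = dfe_real_multipliers(6, 5, reals_per_complex=4)
# Two real multipliers per complex multiplier, each run at double rate.
time_shared = dfe_real_multipliers(6, 5, reals_per_complex=2)
print(baseline, time_shared)
```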
Figure 3: Complex multiplier with 2 real multipliers 
With the proposed complex multiplier structure as shown in
figure 3, the combined complex multiplier and adder tree
stages of the pipeline can be implemented with a latency of
just two symbol periods. However, the proposed architecture
utilises half the FPGA area. As a result, the number of
8x8 real multipliers decreases from 68 to 34 in this
scheme.
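The idea of sharing two real multipliers across the two halves of a symbol period can be illustrated as follows. The exact schedule of figure 3 is not recoverable from this copy, so the ordering of the partial products below is an assumption; only the principle (two real multiplications per half period, followed by an add/subtract stage) is taken from the text.

```python
def complex_multiply_time_shared(a: complex, b: complex) -> complex:
    """Compute a * b with two real multipliers, each used twice per symbol."""
    # First half period (T/2): both multipliers take the real part of b.
    m1 = a.real * b.real
    m2 = a.imag * b.real
    # Second half period: the same two multipliers take the imaginary part of b.
    m3 = a.imag * b.imag
    m4 = a.real * b.imag
    # Add/subtract stage combines the four partial products.
    return complex(m1 - m3, m2 + m4)


product = complex_multiply_time_shared(3 + 4j, 1 - 2j)
```

Running the real multipliers at twice the symbol rate is what trades the extra symbol period of latency for the halved multiplier area.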
With this improved architecture, the critical path of the 
standard LMS DFE structure (as shown in figure 1) can thus 
be modified to produce a pipelined DLMS architecture as 
shown in figure 4. 
Figure 4: Critical path for the Pipelined DLMS DFE
III. LMS Algorithm Development
There are many ways to simplify the LMS update algo- 
rithm. The purpose of these simplifications is to reduce the 
complexity of the target hardware. However, the simplifica- 
tion process normally degrades the final system perform- 
ance. In this paper a simplification scheme is proposed 
which actually improves the resulting equaliser performance 
for GMSK (known as the real-error scheme). 
The real-error scheme is proposed for use with GMSK
modulation. The GMSK signal takes alternate (I, Q) values
of (±1, 0), (0, ±j) according to the sampling axis. This
also means that the corresponding symbols are either pure
real or pure imaginary. Therefore, a series real-error scheme
for DFE coefficient update is proposed.
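Because each training symbol lies on a single axis, one plausible reading of a real-error update is that only the error component on that axis drives the coefficient update. The sketch below illustrates that reading only; it is not taken from the paper's equations, and the slot convention (even slots real, odd slots imaginary) and all names are assumptions.

```python
import numpy as np


def real_error(e: complex, slot: int) -> complex:
    """Keep only the error component on the axis of the current symbol.

    Assumption: even slots carry real symbols, odd slots imaginary ones.
    """
    if slot % 2 == 0:
        return complex(e.real, 0.0)
    return complex(0.0, e.imag)


def real_error_lms_step(c, vk, dk, slot, delta):
    """One LMS-style update driven by the single-axis error."""
    e = dk - c @ vk
    return c + delta * real_error(e, slot) * np.conj(vk)
```

With a single-axis error, each product in the update has one real-valued factor, which is where a hardware saving over the full complex error would come from under this reading.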
In the general LMS equation (equation 1), $e_k$ represents the
complex error and can be written as,
where $z_k$ is the information symbol transmitted in the k-th
signalling interval. Here, the symbol sequence for $z_k$ is fixed
using a pre-set training sequence.
According to the GMSK modem design in our implementation,
the criteria can consist of minimising $E[\mathrm{Re}\{e_k\}^2]$ or
$E[\mathrm{Im}\{e_k\}^2]$ rather than $E[|e_k|^2]$. The justification of the
equation can be obtained as follows:
This leads to the following equations for the first real-error 
scheme (real-error scheme-1): 
Figure 5: FEBR performance of real-error schemes
(bar chart; vertical axis 90% to 98%; bars for real-error scheme-2,
real-error scheme-3, standard LMS, real-error scheme-3 and
real-error scheme-1)
From figure 5, it can be seen that real-error scheme-1 (98%
FEBR) and (round up) real-error scheme-3 (95% FEBR)
both improve the FEBR performance compared to the
standard LMS algorithm (93.5%). Real-error scheme-2 (92%
FEBR) offers reasonable performance. Interestingly, (round
down) real-error scheme-3 does not suffer too much
degradation when compared with scheme-1. All results were
obtained at an S/N of 20 dB and for a 40 ns RMS delay spread
channel with Rayleigh fading time taps (i.e. a worst case
home environment channel).
IV. Quantisation Study 
It is shown in [2] that the accumulated quantisation noise of
the LMS equaliser's coefficients results in an output
quantisation error whose mean squared value is, approximately,
inversely proportional to the adaptation step size $\Delta$. Taking
$\Delta$ to be very small (in order to reduce the excess mean
square error) can result in a considerable quantisation error.
Here, two critical parts are identified (the FFF and the
FFF-CU) and two different quantisation loops (internal filter
coefficients and external filter coefficients) are defined to
balance the need for high efficiency and low complexity.
For the FFF, the quantisation resolution can be defined as
M1 bits and the structure of the FFF can be redrawn as
shown in figure 6.
Figure 6: Resolution setup for feedforward filter
(diagram labels: V, m1 to m5, "From feedback filter")
Figure 6 shows that all branches in the FFF employ the
same level of quantisation resolution, M1.
From equation 2, it can be seen that the quantisation level of
the update section should be different from the adder and
the previous FFF multipliers. This arises since, if the
updating value ($\Delta \, e_{k-D} \, V_{k-D}^{*}$) drops below the quantisation
level, the updating value is lost. Thus, the quantisation level is set
at M1 for the multiplier, while a higher quantisation level of
M2 is used for the coefficient update section. The resulting
structure is shown in figure 7.
Figure 7: Resolution setup for coefficient update
(diagram labels: external filter coefficients; Ck+1 stored; M2,
internal filter coefficients; Ck)
Naturally, the value of M2 should equal 2M1 after the
complex multiplier. The top branch, Ck output, is used for the
DFE filter, while the bottom branch, Ck stored, is used in
the calculation of the next coefficient update (where greater
accuracy is required). By employing the above scheme, both
the excess mean square error and the quantisation error can
be controlled and minimised to an optimal value.
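The effect of keeping the stored coefficient at a finer resolution than the filter coefficient can be seen with a toy accumulator. This sketch assumes that M1 and M2 count fractional bits of a fixed-point value and that rounding to nearest is used; the paper does not state the number format, and the function names and the update value are hypothetical.

```python
def quantise(x: float, bits: int) -> float:
    """Round x to a fixed-point grid with `bits` fractional bits."""
    scale = 1 << bits
    return round(x * scale) / scale


def accumulate(update: float, steps: int, stored_bits: int, output_bits: int):
    """Apply the same small update repeatedly to one coefficient.

    The running value is kept at `stored_bits`; the filter only sees the
    value re-quantised to `output_bits`.
    """
    stored = 0.0
    for _ in range(steps):
        stored = quantise(stored + update, stored_bits)
    return stored, quantise(stored, output_bits)


M1, M2 = 6, 12
# Update smaller than half an M1 step: lost when the loop runs at M1.
single = accumulate(0.001, 100, stored_bits=M1, output_bits=M1)
# Same update survives when the stored coefficient uses M2 = 2 * M1 bits.
dual = accumulate(0.001, 100, stored_bits=M2, output_bits=M1)
print(single, dual)
```

In the single-resolution case the coefficient never moves, while the dual-resolution loop accumulates the small updates until they become visible at the M1 output.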
A simulation study of the quantisation performance in a
typical indoor channel has been performed and results are
shown in figure 8. The graph shows the FEBR performance
with quantisation (M1, M2) based on a DFE (4, 3). M2 is
assumed to equal 2M1 and Q(None) means no quantisation
is applied, i.e. full floating point precision is used. The
worst case channel conditions described in section III were
assumed for this study.
Figure 8: FEBR performance with Q(m, 2m)
(bar chart; vertical axis 0% to 100%; horizontal axis "Quantisation
status": Q(4, 8), Q(5, 10), Q(6, 12), Q(7, 14), Q(8, 16), Q(None))
From the results above, a quantisation level of Q(5, 10)
achieves a reasonable performance level. However,
compared with Q(6, 12), a difference of about 10% FEBR
appears. Thus, Q(6, 12) appears to offer an acceptable level of
performance. This study shows that the value of M1 should
be greater than or equal to 6. Further study was performed to
identify a minimum value for M2 (by fixing the value of
M1 and varying M2). The results of this study indicate that
M2 should have a minimum value of 10 bits.
to perform coarse frequency offset correction. The simu- 
lated performance of the coarse frequency offset method is 
shown in figure 10. The results indicate that for errors up to 
104 kHz, the mean detection error is less than 1 kHz. 
Figure 10: Coarse frequency offset detection performance
(plot; horizontal axis: Eb/No value (dB), 6 to 24)
To implement synchronisation, two filter circuits are pro- 
posed as a correlation engine (see figure 11). Two real fil- 
ters are applied with the interleaved input I, Q data. The 
coefficients of the two real filters are 1 or -1. To implement 
this kind of digital filter, only adders are involved and this 
makes the realisation very simple and direct. This structure 
is also used to calculate the coarse frequency offset during 
the training period by calculating carrier phase rotation be- 
tween correlation sequences. 
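A correlator with coefficients of 1 or -1 needs no multipliers, and the phase rotation between two correlation results gives a frequency estimate. The sketch below is a simplified illustration under stated assumptions: it works on complex samples rather than the interleaved I, Q data of the two real filters, the 32-element sign pattern, its repetition, the 42.5 ns bit period and the 50 kHz test offset are hypothetical, and the function names are not from the paper.

```python
import cmath
import math


def sign_correlate(samples, signs):
    """Correlate with +/-1 coefficients using only additions and subtractions."""
    acc = 0
    for x, s in zip(samples, signs):
        acc = acc + x if s > 0 else acc - x
    return acc


def coarse_offset_hz(corr_a: complex, corr_b: complex, gap_s: float) -> float:
    """Offset implied by the phase rotation between two correlation results
    whose reference sequences start gap_s seconds apart."""
    rotation = cmath.phase(corr_b * corr_a.conjugate())
    return rotation / (2 * math.pi * gap_s)


# Hypothetical test signal: a repeated sign pattern rotated by a 50 kHz offset.
BIT_PERIOD_S = 42.5e-9
signs = [1, -1, 1, 1] * 8
offset_hz = 50e3
received = [s * cmath.exp(2j * math.pi * offset_hz * n * BIT_PERIOD_S)
            for n, s in enumerate(signs + signs)]
corr_a = sign_correlate(received[:32], signs)
corr_b = sign_correlate(received[32:], signs)
estimate_hz = coarse_offset_hz(corr_a, corr_b, 32 * BIT_PERIOD_S)
print(round(estimate_hz))
```

The estimate is unambiguous only while the rotation over the gap stays below half a turn, so the spacing between the correlation sequences sets the largest offset that can be measured this way.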
Figure 11: Correlation engine with two real filters
(diagram labels: Input, Clock-24, Clock-48)
Assume input x(k); then
$$y(k) = x(k)f(31) + x(k-1)f(30) + \dots + x(k-30)f(1) + x(k-31)f(0)$$
Figure 12: Transposed transversal filter structure
For each filter, the transposed transversal filter structure
(see figure 12) has been implemented to minimise latency.
The total latency of this circuit includes sign switching,
shift, adder and truncation processes. The purpose of the
shift is to normalise the final output.
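The transposed form can be modelled in software to show why it keeps the latency low: every tap sees the current input sample, and each register holds a partial sum, so only one tap operation and one adder lie between input and output. This is a behavioural sketch with hypothetical names; the shift and truncation stages are omitted, and the coefficient list is indexed so that its first entry multiplies the newest sample.

```python
def transposed_fir(x, f):
    """Transposed-form FIR filter: y(k) = sum over i of f[i] * x(k - i)."""
    n = len(f)
    regs = [0] * (n - 1)  # delay registers holding partial sums
    y = []
    for sample in x:
        # Every tap operates on the same incoming sample.
        products = [coef * sample for coef in f]
        y.append(products[0] + (regs[0] if n > 1 else 0))
        # Each register takes its tap product plus the next register's old value.
        for i in range(n - 2):
            regs[i] = products[i + 1] + regs[i + 1]
        if n > 1:
            regs[n - 2] = products[n - 1]
    return y


output = transposed_fir([1, 2, 3, 4, 5], [1, -1, 1])
print(output)
```

With coefficients restricted to 1 or -1, each product reduces to a sign switch, which matches the adder-only realisation described for the correlation engine.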
VI. Conclusions 
In this paper, we have concentrated on the design and im- 
plementation of a high-speed equaliser for home based 
HIPERLAN/l applications. The quantisation study found 
that there were two main quantisation levels needed in the 
design. The internal processing in the update loop needed 
high bit precision to guarantee the accuracy of the step 
weights and to ensure convergence of the DFE. 
The development of a real-error (rather than the traditional 
complex error) scheme for the HIPERLAN/l GMSK 
equaliser resulted in reduced complexity and improved per- 
formance. Results have shown that in the worst case home 
environment, the designed DFE (6,5) achieved a 98% error 
free block transmission rate. 
The proposed pipelined DLMS DFE architecture guarantees 
FPGA implementation by reducing the latency of key criti- 
cal paths. The use of two real dual frequency multipliers 
results in a significant reduction in the number of required 
real multipliers, and hence the FPGA size. 
Coarse frequency offset correction was proposed in order to 
properly run the DFE engine and to minimise the effects of 
delayed frequency offset within the delayed LMS update 
algorithm. The transposed transversal filter structure com- 
bined with fixed coefficients can be used to implement syn- 
chronisation and coarse frequency offset detection. 
A full FPGA implementation is now underway using
XILINX VIRTEX technology as part of the ESPRIT
WINHOME project.
Acknowledgements 
The authors (UoB and Thomson-CSF Detexis) would like to 
thank the other members of the WINHOME consortium: 
Grundig (UK), SCT (UK) and Eutelsat (Fr) as well as the 
European Commission for their continual support and en- 
couragement. This work was performed as part of the ES- 
PRIT WINHOME project (25048). 
References 
[1] “WINHOME 25048 - Wireless Innovation for Multi-media in
the Home”, Project Programme, August 1997.
[2] C. Caraiscos and B. Liu, “A Roundoff Error Analysis of the
LMS Adaptive Algorithm”, IEEE Transactions on Acoustics,
Speech, and Signal Processing, vol. ASSP-32, pp. 34-41,
February 1984.
[3] Y. Sun, A.R. Nix and J.P. McGeehan, “Suitable Codes Design
for Frequency Offset Correction and Channel Matched Filter in
HIPERLAN”, ICUPC’96, Boston, USA, pp. 717-721, October
1996.
[4] ETSI Radio Equipment and Systems, “High Performance Radio
Local Area Network (HIPERLAN)”, Functional Specification
Version 1.1 (Draft), ETSI, January 1995 and 1996.
