Applications of FPGAs in high-performance adaptive channel equalization by Ossif, Sergei
Sergei Ossif
Applications of FPGAs in high-performance
adaptive channel equalization
Helsinki Metropolia University of Applied Sciences
Bachelor of Engineering
Degree Programme in Electronics
Bachelor’s Thesis
02.02.2016
Abstract
Author(s) Sergei Ossif
Title Applications of FPGAs in high-performance adaptive channel equalization
Number of Pages 24 pages + 1 appendix
Date 2 February 2016
Degree Bachelor of Engineering
Degree Programme Degree Programme in Electronics
Specialisation option
Instructor(s) Thierry Baills, Senior Lecturer
Wireless communications play an important role in today’s society. To achieve higher speeds and better bandwidth efficiency, modulation techniques have moved into the digital domain. Their implementation relies on digital signal processing to reduce intersymbol interference.
The idea of this project was to investigate the feasibility of building such a system on an FPGA platform. A Fractionally Spaced Equalizer that uses Least Mean Squares adaptation to configure 8-tap linear filters was built and evaluated. To test the system, the equalizer was connected to a microprocessor. The processor fed the test sequence to the equalizer and measured the time taken to process the data. The results were then compared to a similar system built on a multicore DSP processor.
This implementation showed promising results. The system proved to be about 40 times faster than the multicore DSP system, with an average time of 13.7 µs to process 64 samples.
Keywords Digital Signal Processing, DSP, Least Mean Squares, LMS,
Field-Programmable Gate Array, FPGA, Adaptive Filtering,
Adaptive Equalizer, Quadrature Amplitude Modulation, QAM
Contents
List of Figures
Abbreviations
1 Introduction
2 Theoretical Background
2.1 Transmission
2.1.1 Representation of digitally modulated signals
2.1.2 Quadrature amplitude modulation
2.2 Channel
2.2.1 Rayleigh Fading Channel
2.3 Reception
2.3.1 Demodulator
2.3.2 Equalizer
2.3.3 Least Mean Squares filter
3 Implementation
3.1 Workflow
3.2 Hardware
3.3 Software
3.4 Implementation
4 Results
Bibliography
Appendices
Appendix 1 Benchmark Code
List of Figures
1 A simple model of a communications system (reproduced from [1]).
2 Several signal space diagrams (reproduced from [2]).
3 Relationships between the channel frequency-transfer function and a signal with bandwidth W (reproduced from [3]).
4 Multipath propagation (copied from [4]).
5 Small-scale fading: mechanisms, degradation categories, and effects (reproduced from [3]).
6 Fading channel manifestations (reproduced from Proakis [3]).
7 Linear and decision feedback equalizers (reproduced from [1]).
8 Response of a multipath channel to a narrow pulse vs. delay, as a function of antenna position (reproduced from [3]).
9 Adaptive LMS filter (copied from [5]).
10 Basic DSP48E1 Slice Functionality (copied from [6]).
11 Complete system
12 Top view of System Generator
13 Fractionally spaced equalizer
14 Adaptive Filter top view
15 LMS Filter
16 Simplified diagram of the test environment
17 Equalizer error at mu=0.05
Abbreviations
AWGN Additive White Gaussian Noise
DFE Decision Feedback Equalizer
DSP Digital Signal Processor
FIR Finite Impulse Response
FPGA Field-Programmable Gate Array
FSE Fractionally Spaced Equalizer
ISI Intersymbol Interference
LMS Least Mean Squares
QAM Quadrature Amplitude Modulation
RISC Reduced Instruction Set Computer
1 Introduction
Wireless communications play an ever larger role in today’s world. It is now possible to carry a device capable of transferring information at speeds of up to 1 Gbit/s. With ever-increasing bandwidth requirements, engineers have to squeeze every bit of performance out of the hardware to stay competitive. This means innovation in more bandwidth-efficient modulation methods, a higher base frequency, and more complex modulators and demodulators.
With technological limits pushed so high, higher sensitivity is required. As the sensitivity increases, so do the degenerative effects of the environment in which the communication takes place. All kinds of objects and weather effects between the transmitter and the receiver can inject noise into the received signal. As a result, more thought is being put into the design of the demodulator.
An equalizer is the part of the demodulator designed to combat these degenerative effects and reduce intersymbol interference. It does so by approximating the inverse of the channel’s impulse response. The performance of the equalizer is critical to the successful operation of such systems.
The focus of this project is to investigate the feasibility of building such an equalizer on an FPGA platform and to compare its performance with an analogous DSP-processor-based system.
2 Theoretical Background
2.1 Transmission
Figure 1 presents a simplified block diagram of a digital communication system. The input data is applied to the modulator and transmitter, which converts the serialized data sequence into a bandlimited analog waveform and translates it into a frequency band suitable for transmission. As the signal propagates through the channel, it is subject to delay, attenuation and distortion in a frequency-dependent manner. At the other end, the receiver accepts the deteriorated signal and attempts to reconstruct the initial data sequence.
Figure 1: A simple model of a communications system (reproduced from [1]). Blocks: data source, modulator and transmitter, propagation channel with additive noise and additive interference, receiver and demodulator, recovered data.
Digital data is usually presented as a stream of binary digits. Regardless of whether the original data is analog (audio and video) or digital (the output of a computer), the goal is to transmit it to the destination over the given communication channel. Depending on the nature of the channel, the transmission can suffer from various impairments, such as noise, attenuation, distortion, fading and interference.
To make the transmission possible, the signal must be generated in a way that represents the binary data and accommodates channel characteristics such as bandwidth and impairments. Different channels have different characteristics and impairments, and as a result the generated signals can be vastly different. The process of mapping a digital sequence to signals for transmission over a communication channel is called digital modulation or digital signaling. [2, p. 95]
2.1.1 Representation of digitally modulated signals
In order to send a digital sequence over a physical medium, the data must first be processed in a way that allows the binary values to be mapped to a physical amplitude and phase. This mapping between the binary sequence and the signal sequence to be transmitted over the channel is called data-mapping. Depending on the modulation scheme, it can be either memoryless or with memory.
In a memoryless scheme, such as PAM or QAM, the binary sequence is parsed into subsequences of length $k$, and each subsequence is directly mapped into one of a set of waveforms $s_m(t)$, where $1 \le m \le 2^k$, regardless of previously transmitted signals. This is equivalent to a mapping from $M = 2^k$ messages to $M$ possible signals. If we assume that one of these signals is transmitted in every signaling interval $T_s$, then in each second
$$R_s = \frac{1}{T_s} \tag{1}$$
symbols are transmitted, where $R_s$ is called the signaling rate or symbol rate. As each signal carries $k$ bits of information, the bit interval $T_b$ is given by
$$T_b = \frac{T_s}{k} = \frac{T_s}{\log_2 M} \tag{2}$$
and the bit rate $R$ is given by
$$R = k R_s = R_s \log_2 M \tag{3}$$
If $\mathcal{E}_m$ is the energy content of the waveform $s_m(t)$, then the average signal energy is
$$\mathcal{E}_{avg} = \sum_{m=1}^{M} p_m \mathcal{E}_m \tag{4}$$
where $p_m$ is the probability of the $m$th signal, or message probability. If the messages are equiprobable, $p_m = 1/M$, and therefore
$$\mathcal{E}_{avg} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{E}_m \tag{5}$$
If all signals have the same energy, then $\mathcal{E}_m = \mathcal{E}$ and $\mathcal{E}_{avg} = \mathcal{E}$. The average transmitted energy per bit, when the signals are equiprobable, is
$$\mathcal{E}_{bavg} = \frac{\mathcal{E}_{avg}}{k} = \frac{\mathcal{E}_{avg}}{\log_2 M} \tag{6}$$
The average power sent by the transmitter is then
$$P_{avg} = \frac{\mathcal{E}_{bavg}}{T_b} = R\,\mathcal{E}_{bavg} \tag{7}$$
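Equations (1) to (7) can be checked numerically with a few lines of Python. The following is a minimal sketch, not part of the original work: the function name `signaling_metrics` and the example values (16 signals, a 1 µs signaling interval, unit energy per waveform) are assumptions chosen purely for illustration.

```python
import math

def signaling_metrics(M, Ts, energies, probs=None):
    """Evaluate equations (1)-(7) for an M-ary memoryless scheme."""
    k = math.log2(M)                      # bits carried by each signal
    if probs is None:                     # equiprobable messages, p_m = 1/M
        probs = [1.0 / M] * M
    Rs = 1.0 / Ts                         # (1) symbol rate
    Tb = Ts / k                           # (2) bit interval
    R = k * Rs                            # (3) bit rate
    E_avg = sum(p * e for p, e in zip(probs, energies))  # (4), reduces to (5)
    E_bavg = E_avg / k                    # (6) average energy per bit
    P_avg = R * E_bavg                    # (7) average transmitted power
    return {"Rs": Rs, "Tb": Tb, "R": R,
            "E_avg": E_avg, "E_bavg": E_bavg, "P_avg": P_avg}

# Hypothetical example values: M = 16, Ts = 1 microsecond, unit energies.
metrics = signaling_metrics(M=16, Ts=1e-6, energies=[1.0] * 16)
print(metrics)
```

With these assumed values each signal carries four bits, so the bit rate is four times the symbol rate and the energy per bit is a quarter of the symbol energy.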
Waveforms sm(t) that are used to transmit the signal over the communication channel
can be in any form. However, as they are transmitted over bandwidth limited channel
2.1.2 Quadrature amplitude modulation
Figure 2: Several signal space diagrams for M = 4, 8, 16, 32 and 64 (reproduced from [2]).
Quadrature amplitude modulation (QAM) is a memoryless modulation scheme that exists in both analog and digital forms.
The amplitudes of two carrier waves, 90° out of phase with each other (in quadrature), are changed (modulated or keyed) to represent the data signal. Amplitude modulating two carriers in quadrature can be equivalently viewed as both amplitude modulating and phase modulating a single carrier.
This is essentially achieved by mapping two separate $k$-bit symbols onto the two quadrature carriers $\cos 2\pi f_c t$ and $\sin 2\pi f_c t$, resulting in the signal waveform
$$s_m(t) = \Re\left[(A_{mi} + jA_{mq})\,g(t)\,e^{j2\pi f_c t}\right] = A_{mi}\,g(t)\cos 2\pi f_c t - A_{mq}\,g(t)\sin 2\pi f_c t, \quad m = 1, 2, \ldots, M \tag{8}$$
where $A_{mi}$ and $A_{mq}$ are the signal amplitudes of the quadrature carriers and $g(t)$ is the signal pulse. [2]
Another way to express the waveform is
$$s_m(t) = \Re\left[r_m e^{j\theta_m} e^{j2\pi f_c t}\right] = r_m \cos(2\pi f_c t + \theta_m) \tag{9}$$
where $r_m = \sqrt{A_{mi}^2 + A_{mq}^2}$ and $\theta_m = \tan^{-1}(A_{mq}/A_{mi})$. It becomes apparent that QAM waveforms can be expressed as a combination of amplitude ($r_m$) and phase ($\theta_m$) modulation.
In our case, where the signal amplitudes take the discrete values $(2m - 1 - M)$, $m = 1, 2, \ldots, M$, the signal space diagram is rectangular, as shown in Figure 2.
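The relationship between the rectangular form (8) and the polar form (9) can be illustrated with a small Python sketch. It is not taken from the original work and makes several assumptions: four amplitude levels per quadrature axis (a 16-point constellation), a plain binary bit labelling, a constant pulse g(t) = 1, and hypothetical function names.

```python
import math

def axis_levels(levels_per_axis):
    """Amplitudes (2m - 1 - M) for m = 1..M on one quadrature axis."""
    M = levels_per_axis
    return [2 * m - 1 - M for m in range(1, M + 1)]

def map_16qam(bits):
    """Map four bits to (A_mi, A_mq). Plain binary labelling is an assumption."""
    levels = axis_levels(4)
    a_i = levels[2 * bits[0] + bits[1]]   # first two bits select the I amplitude
    a_q = levels[2 * bits[2] + bits[3]]   # last two bits select the Q amplitude
    return a_i, a_q

def sample_rectangular(a_i, a_q, fc, t, g=1.0):
    """Equation (8), evaluated for a constant pulse g(t) = g."""
    phase = 2 * math.pi * fc * t
    return a_i * g * math.cos(phase) - a_q * g * math.sin(phase)

def sample_polar(a_i, a_q, fc, t):
    """Equation (9): the same sample from amplitude r_m and phase theta_m."""
    r = math.hypot(a_i, a_q)
    theta = math.atan2(a_q, a_i)          # four-quadrant form of tan^-1(A_mq / A_mi)
    return r * math.cos(2 * math.pi * fc * t + theta)

a_i, a_q = map_16qam([1, 1, 0, 0])
print(a_i, a_q, sample_rectangular(a_i, a_q, 1e3, 1.25e-4), sample_polar(a_i, a_q, 1e3, 1.25e-4))
```

`math.atan2` is used instead of a plain arctangent so that the phase lands in the correct quadrant when the in-phase amplitude is negative.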
2.2 Channel
The general concept of fading channels was first modeled in the 1950s and 1960s, primarily targeting over-the-horizon communications covering a wide range of frequency bands. Several models have been developed since, such as the additive white Gaussian noise (AWGN) channel and the Rayleigh fading channel. The ideal AWGN channel represents data samples that are free of any intersymbol interference (ISI) and are corrupted only by statistically independent Gaussian noise samples. The main source of performance degradation is thermal noise generated in the receiver. This noise can be approximated as white noise with a flat power density over the signal band. However, external interference received by the antenna can be more significant than the thermal noise. It can be characterized as having a broadband spectrum and quantified as antenna noise temperature. [3, 18-1]
Figure 3: Relationships between the channel frequency-transfer function and a signal with bandwidth W (reproduced from [3]). Panels (spectral density vs. frequency): (a) typical frequency-selective fading case (f0 < W); (b) typical flat-fading case (f0 > W); (c) null of the channel frequency-transfer function occurs at the signal band center (f0 > W).
While AWGN is a good starting point for understanding the basic principles, it is based on several assumptions that make it inaccurate in practical applications. The AWGN model assumes that signal attenuation vs. distance behaves as if propagation took place over ideal free space, free of all objects and obstacles, so that the attenuation between transmitter and receiver follows the inverse-square law.
However, in most practical cases the signal propagates in the atmosphere, near the ground. The signal often travels from transmitter to receiver over multiple reflective paths, a phenomenon known as multipath propagation. This effect causes fluctuations in the received signal’s amplitude, phase and angle of arrival.
Figure 4: Multipath propagation (copied from [4]).
2.2.1 Rayleigh Fading Channel
There are three basic mechanisms that impact signal propagation: reflection, diffraction, and scattering.
Reflection occurs when a propagating electromagnetic wave impinges upon a smooth surface with very large dimensions compared to the RF signal’s wavelength λ.
Diffraction occurs when the radio path between the transmitter and receiver is obstructed by a dense body with large dimensions compared to λ, causing secondary waves to be formed behind the obstructing body. Diffraction is a phenomenon that accounts for RF energy traveling from transmitter to receiver without line-of-sight between the two. It is often termed shadowing because the diffracted field can reach the receiver even when shadowed by an impenetrable obstruction.
Scattering occurs when a radio wave impinges on either a large rough surface or any surface whose dimensions are on the order of λ or less, causing the reflected energy to spread out (scatter) in all directions. In an urban environment, typical signal obstructions that yield scattering are lampposts, street signs, and foliage.
Figure 6 describes different channel fading manifestations.
To build a model of a realistic wireless channel, its impulse response is approximated and calculated. The model can then be implemented in the form of a tapped-delay-line filter.
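A tapped-delay-line channel model is simply an FIR filter whose coefficients are samples of the approximated impulse response. The sketch below is illustrative only; the tap values are hypothetical and do not describe any measured channel.

```python
def tapped_delay_line(samples, taps):
    """Pass samples through an FIR (tapped-delay-line) channel model."""
    delay_line = [0j] * len(taps)         # newest sample is kept at index 0
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]
        # Each tap weights one delayed copy of the signal (one propagation path).
        out.append(sum(h * d for h, d in zip(taps, delay_line)))
    return out

# Hypothetical two-path channel: a direct path and an echo at half amplitude.
taps = [1.0, 0.5]
print(tapped_delay_line([1, 0, 0], taps))     # impulse response of the model
print(tapped_delay_line([1, -1, 0], taps))    # the echo of the first symbol leaks into the second
```

In the second call the echo of the first symbol overlaps the second symbol, which is exactly the intersymbol interference that the equalizer is meant to remove.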
Figure 5: Small-scale fading: mechanisms, degradation categories, and effects (reproduced from [3]).
Figure 6: Fading channel manifestations (reproduced from Proakis [3]).
2.3 Reception
2.3.1 Demodulator
There are several design considerations when developing a demodulator. Essentially, the role of the demodulator is to: “(1) bandpass filter the incoming signal, (2) adjust the average input signal amplitude, (3) estimate and remove any carrier component, (4) equalize channel’s dispersive effects, (5) “time slice” the input signal to obtain pulse amplitude and phase measurements, (6) decide which pulse amplitude and phase pair was actually transmitted, and (7) convert that decision into associated bit pattern”. [1]
2.3.2 Equalizer
“The revolution in data communication technology can be dated from the invention of
automatic and adaptive channel equalization in the late 1960s.” [7]
As QAM was developed to increase the bit-per-Hertz ratio of the transmission, it also required the use of adaptive equalization to counteract the degenerative effects of signal propagation. Because the equalizer tends to consume most of the receiver’s computational resources, great effort has been put into finding an optimal tradeoff between cost and performance.
As the channel’s disruptive effects on the transmitted signal can be modeled as a linear filter, it is natural to use another linear filter to compensate for, or equalize, these effects. If the channel characteristics are known and time-invariant, the filter needs to be designed only once. In our application, however, this is not the case. Not only are the exact characteristics unknown, they also change with time. This means that the filter must be equipped with control logic to learn the characteristics and, if necessary, to track and adjust to changes. This is the situation in which an adaptive equalizer is used.
Figure 7: Linear and decision feedback equalizers (reproduced from [1]). Panels: (a) linear equalizer followed by a decision circuit; (b) feedforward filter and feedback filter around a decision circuit, with the regenerated symbols fed back.
There are two basic equalizer structures. The first is the linear equalizer (Figure 7a), whose output is a linear combination of the received signal and its delayed versions. The second is the decision feedback equalizer (DFE) (Figure 7b), whose output is a combination of the filtered input signal and a filtered version of the nonlinear decision circuit’s output. Which type to use depends on the practical considerations of the task at hand. The biggest concern with the DFE is that its feedback loop is very difficult to pipeline in high-speed designs; hence a linear equalizer was chosen.
Early demodulator designs used T-spaced equalizers, meaning that the input signal was sampled once every symbol. This required the timing of the sampling clock to be adjusted so that the samples were taken at the “top dead center” of the received symbols. [8] It was an intuitive and computationally efficient way to design an equalizer. However, even though the pulses arrive at rate $1/T$, the actual signal bandwidth is 10 to 40 percent higher, so symbol-rate sampling is insufficient to satisfy the Nyquist theorem. [1] Thus, the use of a Fractionally Spaced Equalizer (FSE) was proposed.
The idea is to sample the received signal faster than the symbol rate $f_B$ while keeping the output at $f_B$. While an FSE is computationally more complex than a T-spaced equalizer, it simplifies the overall design of the demodulator and allows it to perform at an almost ideal level. The most commonly used input rate is simply $2 f_B$, making the filter tap spacing equal to $T/2$. [8]
The complexity of an equalizer is directly related to its length. With the sample rate doubled, it becomes even more important to keep the length as short as possible while still achieving the desired performance. Even though there is no definitive way to calculate the required length, two approaches have become common. The first is to simply build a prototype and test its performance against an actual channel. The second is to use a rule of thumb, based on the idea that the equalizer’s converged impulse response will approximate the inverse of the channel. [1]
Figure 8: Response of a multipath channel to a narrow pulse vs. delay, as a function of antenna position (reproduced from [3]). Panels (received power vs. delay time τ): (a) antenna position #1 at time t1; (b) antenna position #2 at time t2; (c) antenna position #3 at time t3.
2.3.3 Least Mean Squares filter
The LMS algorithm was first introduced in the late 1950s by Widrow and Hoff in their study of a pattern-recognition machine. It is a simple and effective algorithm for the design of adaptive transversal filters. [9] The idea is to find the filter coefficients that minimize the mean square of the error between the desired signal and the actual output signal. The LMS algorithm consists of two basic processes:
1. A filtering process, where the output of the transversal filter is computed and an estimation error is generated
2. An adaptive process, where the tap weights of the filter are adjusted according to the estimation error
This combination produces a feedback loop around the LMS algorithm. Its formal definition reads:
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)\left[d(n) - \hat{\mathbf{w}}^H(n)\,\mathbf{u}(n)\right]^* \tag{10}$$
where $\hat{\mathbf{w}}(n)$ is the tap-weight vector of the LMS filter computed at time $n$, $\mathbf{u}(n)$ is the tap-input vector, $d(n)$ is the desired response, $\mu$ is the step-size parameter, the asterisk denotes complex conjugation and $H$ denotes Hermitian transposition. An easier way to represent the algorithm is the block diagram in Figure 9.
Figure 9: Adaptive LMS filter (copied from [5]). The input x(n) is filtered to give y(n) = h^T(n)x(n), which is subtracted from d(n) to form e(n); the coefficients are updated as h(n+1) = h(n) + μe(n)x(n).
Here, the input signal $\mathbf{x}(n)$ is given as a vector containing the current sample followed by the $N - 1$ previous samples. The output value of the FIR filter is the inner product of the input vector and the vector of $N$ filter coefficients $\mathbf{h}(n)$.
$$\mathbf{h}(n) = \begin{bmatrix} h_1(n) \\ h_2(n) \\ \vdots \\ h_N(n) \end{bmatrix}, \quad \mathbf{x}(n) = \begin{bmatrix} x(n) \\ x(n-1) \\ \vdots \\ x(n-N+1) \end{bmatrix} \tag{11}$$
$$y(n) = \mathbf{h}^T(n)\,\mathbf{x}(n) = \sum_{i=1}^{N} h_i(n)\,x(n-i+1) \tag{12}$$
$$e(n) = d(n) - y(n) \tag{13}$$
$$\mathbf{h}(n+1) = \mathbf{h}(n) + \mu\,e(n)\,\mathbf{x}(n) \tag{14}$$
The error $e(n)$ is the difference between the desired output $d(n)$ and the filter output $y(n)$. Based on it, the filter coefficients are updated to minimize the output mean squared error $E[(d(n) - y(n))^2]$. The coefficient vector $\mathbf{h}(n+1)$ is the sum of the current $\mathbf{h}(n)$ and the input vector $\mathbf{x}(n)$, scaled by the error value $e(n)$ and the adaptation rate $\mu$. [5, 10]
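Equations (12) to (14) translate almost line by line into code. The following real-valued Python sketch is an illustration rather than the implementation used in this work: it runs the LMS recursion in a simple system-identification setting, where the desired signal is produced by a hypothetical two-tap filter and the adaptive filter has to learn those taps. The function names, the step size and the random input are all assumptions.

```python
import random

def lms_step(h, x_vec, d, mu):
    """One LMS iteration following equations (12)-(14), real-valued."""
    y = sum(hi * xi for hi, xi in zip(h, x_vec))               # (12) filter output
    e = d - y                                                  # (13) estimation error
    h_next = [hi + mu * e * xi for hi, xi in zip(h, x_vec)]    # (14) weight update
    return h_next, y, e

def identify(true_taps, mu=0.1, iterations=2000, seed=0):
    """Adapt h towards true_taps using noise-free training data."""
    rng = random.Random(seed)
    n_taps = len(true_taps)
    h = [0.0] * n_taps
    x_vec = [0.0] * n_taps                # x(n), x(n-1), ..., x(n-N+1)
    errors = []
    for _ in range(iterations):
        x_vec = [rng.uniform(-1.0, 1.0)] + x_vec[:-1]
        d = sum(t * xi for t, xi in zip(true_taps, x_vec))     # desired response
        h, y, e = lms_step(h, x_vec, d, mu)
        errors.append(abs(e))
    return h, errors

h, errors = identify([0.5, -0.3])
print(h, errors[0], errors[-1])
```

A larger `mu` speeds up convergence but, beyond a bound set by the input power and the filter length, makes the recursion unstable; the value used here is deliberately conservative.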
3 Implementation
3.1 Workflow
While FPGAs can achieve very high performance in streamlined tasks, conditional algorithms come at a great cost. To alleviate this, Xilinx provides a licensed soft-core IP called MicroBlaze. It is a 32-bit RISC Harvard-architecture soft processor core and is included in the free version of Vivado. It is implemented entirely in the general-purpose memory and logic fabric of the FPGA and therefore allows for complete customization. In this system, the soft core handles all input/output transactions between the PC and the development board and prepares the data to be processed. After that, it starts the adaptive equalizer and feeds the data to its input. The interface between the adaptive equalizer and the soft core is implemented with several GPIO buses and multiplexer logic. The I and Q values are packed into a single variable using bit-shift operations, and glue logic between the adaptive equalizer and the soft core then splits it into two data buses. While this is a very simple and convenient solution, it is not optimal: these operations take a considerable amount of processing time, during which the equalizer is idle. A better solution would be to implement an AXI4 interface in the equalizer, which could then be accessed as a memory-mapped interface.
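The packing of I and Q into one word can be sketched as follows. This is an illustration in Python rather than the firmware used in this work, and it assumes 12-bit two's-complement samples (the sample width mentioned in the Results chapter), a 24-bit word, and I in the upper half; the actual bit layout of the GPIO bus is not stated in the text.

```python
WIDTH = 12                      # assumed sample width in bits
MASK = (1 << WIDTH) - 1         # 0xFFF

def pack_iq(i_val, q_val):
    """Pack two signed 12-bit integers into one 24-bit word (I in the upper half)."""
    return ((i_val & MASK) << WIDTH) | (q_val & MASK)

def unpack_iq(word):
    """Split the word again, as the glue logic does, and restore the signs."""
    def sign_extend(v):
        return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v
    return sign_extend((word >> WIDTH) & MASK), sign_extend(word & MASK)

word = pack_iq(-2048, 2047)
print(hex(word), unpack_iq(word))
```

Masking before shifting keeps the sign bits of a negative Q value from spilling into the I field.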
3.2 Hardware
The choice of hardware posed a major challenge. There are hundreds of different configurations on the market, priced from 100 euros up to several thousand. As the task at hand was calculation heavy, the FPGA chip had to have enough processing power. There are several ways to crunch numbers on an FPGA. The first is to use the general-purpose fabric to generate adders and multipliers. There are several topologies with different pros and cons. As this method is very area consuming, manufacturers started to add dedicated DSP blocks and multipliers to the chip to save area.
Another essential component was the interface between the FPGA board and the PC. The industry standard is PCI Express, which provides a 500 MB/s transfer rate and a protocol that is very easy to implement.
Figure 10: Basic DSP48E1 Slice Functionality (copied from [6]). Blocks: pre-adder, 25 x 18 multiplier, 48-bit accumulator/logic unit, pattern detector; ports A, B, C, D and P.
After all these considerations, the Digilent Nexys4 platform was chosen. It hosts a Xilinx Artix-7 FPGA chip, 16 MByte of CellularRAM and a plethora of peripheral interfaces, including 10/100 Ethernet. The board was chosen for its impressive computational power, with 240 DSP blocks on board. Each block has a pre-adder, a 25 x 18 multiplier and a 48-bit accumulator. Such an architecture is well suited to any kind of transversal filter, as it allows computations to be parallelized and pipelined.
3.3 Software
To provide an optimal solution in a timely manner, a whole set of software tools was used. The top-level design was created in Xilinx Vivado Design Suite. It allows Xilinx IPs to be used for general tasks and custom IPs to be packaged from VHDL or Verilog code. The filter design was done in System Generator, an add-on to Simulink made by Xilinx that combines design, synthesis, and verification in a single tool. After the filter is synthesized, it is packaged as an IP using Xilinx Vivado and integrated into the system. All this allows for a modular design flow, in which one starts with a simple proof of concept and progresses to a more complicated design, debugging and verifying the design at each step.
• Adaptive Equalizer module and interface
• Memory modules
• Outside interface modules
• Debugging modules
Figure 11: Complete system (Vivado block design: MicroBlaze with debug module and local memory, AXI interconnects, four AXI GPIO blocks with slices feeding the lms_0 equalizer core, ILA, AXI Timer, AXI Uartlite, AXI EthernetLite with MII to RMII bridge, PSRAM controller, interrupt controller, clocking wizard and reset logic).
3.4 Implementation
A fractionally spaced equalizer receives K input samples to produce one output sample and update its weights. In this implementation K is 2, meaning that the output sample rate is 1/T while the input sample rate is K/T. The weight update occurs at the output rate, which is the slower of the two.
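The rate relationship can be sketched in a few lines of Python. This is a behavioural illustration only, with a hypothetical function name and fixed taps; it is not the hardware structure described in this chapter and it omits the weight update, which would run once per emitted output.

```python
def fse_filter(samples, taps, K=2):
    """Run a T/K-spaced FIR over samples, emitting one output per K inputs."""
    delay_line = [0j] * len(taps)         # newest sample is kept at index 0
    outputs = []
    for n, x in enumerate(samples):
        delay_line = [x] + delay_line[:-1]
        if n % K == K - 1:                # a full group of K new samples has arrived
            outputs.append(sum(w * d for w, d in zip(taps, delay_line)))
    return outputs

# Four input samples at rate 2/T produce two outputs at rate 1/T.
print(fse_filter([1, 2, 3, 4], [1, 1]))
```

Every input sample enters the delay line, but the multiply-accumulate work is only needed once per symbol, which is why the weight update can also run at the slower rate.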
The initial design and simulation of the equalizer were done in System Generator. The software works in cooperation with Simulink and can automatically generate testbenches for VHDL simulation. The design is heavily based on a standard implementation provided by Xilinx Inc. Figure 12 presents the top view of the equalizer. The QAM16 source generates the I and Q parts of the signal, which are merged into a complex number. In the Interpolation block, these numbers are convolved with a sinc filter and interpolated to band-limit the signal. In the FIR Channel Model block, an FIR filter with coefficients [1 + 0.1j, 0.2 − 0.2j] is applied to the signal, simulating a channel. The unmodified signal from the QAM16 source is fed into the equalizer as a training sequence.
Figure 12: Top view of System Generator. Blocks: Random Data QAM16 Source, Interpolation, FIR Channel Model, Adaptive Equalizer (T_2 FSE) with inputs I_In, Q_In, I_desired, Q_desired and outputs I_Out, Q_Out, Error, DHAT, Coeff.
Figure 13 represents the equalizer. The input signal is fed into the filter. The output of the filter is then compared with the desired value, and the error is calculated (Listing 1).
function [mu_e_i, e_i, e_q, mu_e_q] = lmsErr(sym_i, filt_i, sym_q, filt_q, mu)

% Pipeline registers: signed 18-bit fixed point with 15 fractional bits
persistent r0, r0 = xl_state(mu, {xlSigned, 18, 15, xlRound, xlWrap});
persistent r1, r1 = xl_state(mu, {xlSigned, 18, 15, xlRound, xlWrap});
persistent r2, r2 = xl_state(mu, {xlSigned, 18, 15, xlRound, xlWrap});
persistent r3, r3 = xl_state(mu, {xlSigned, 18, 15, xlRound, xlWrap});
persistent r4, r4 = xl_state(mu, {xlSigned, 18, 15, xlRound, xlWrap});
persistent r5, r5 = xl_state(mu, {xlSigned, 18, 15, xlRound, xlWrap});

% Quantize the step size to the same fixed-point format
mu = xfix({xlSigned, 18, 15, xlRound, xlSaturate}, mu);

% I-channel error, output with one register of delay
e_i = r4;
r4 = sym_i - filt_i;

% Q-channel error, output with one register of delay
e_q = r5;
r5 = sym_q - filt_q;

% Scaled I error, passed through two register stages
mu_e_i = r0;
r0 = r1;
r1 = r4 * mu;

% Scaled and negated Q error, passed through two register stages
mu_e_q = r2;
r2 = r3;
r3 = -r5 * mu;
Listing 1: Error Calculation
Figure 13: Fractionally spaced equalizer. Blocks: Filters, MCode (Listing 1), IMap, QMap, Down Sample and Down Sample1; inputs I_In, Q_In, I_desired, Q_desired; outputs I_Out, Q_Out, Error, DHAT, Coeff.
The implementation of the filter algorithm is presented in Figure 14. Both LMS structures are identical, and the input data of each is downsampled by a factor of 2. In the top LMS structure the input data is additionally delayed by one sample, so that the two filters can work in parallel.
Figure 14: Adaptive Filter top view
Figure 15: LMS Filter
The inner workings of the LMS filter are a bit trickier. The ASR block on the left side is a shift register of length 4, and fir_fsm is a 2-bit counter, counting from 0 to 3. It synchronizes the whole system so that each value in the buffer is multiplied by the corresponding filter coefficient. As there are 4 values in the buffer and two such structures in parallel, the result is an 8-tap filter. The sum of all the products is kept in the accumulator. When the counter has gone through the whole cycle, the rst bit is set, which stores the data in the output register and clears the accumulator.
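The counter-driven multiply-accumulate cycle can be modelled behaviourally in Python. This sketch is an assumption-laden illustration, not a description of the generated hardware: the function names are hypothetical, and combining the two parallel branches by adding their outputs is an assumption about how the 8-tap result is formed.

```python
def serial_mac(buffer, coeffs):
    """Emulate one time-multiplexed branch: one multiply-accumulate per clock."""
    assert len(buffer) == len(coeffs) == 4
    accumulator = 0
    output_register = None
    for counter in range(4):                 # 2-bit counter: 0, 1, 2, 3
        accumulator += buffer[counter] * coeffs[counter]
        rst = counter == 3                   # end of the counter cycle
        if rst:
            output_register = accumulator    # latch the finished sum
            accumulator = 0                  # clear for the next cycle
    return output_register

def eight_tap_output(buf_a, coeffs_a, buf_b, coeffs_b):
    """Two 4-tap branches working in parallel, summed (an assumption)."""
    return serial_mac(buf_a, coeffs_a) + serial_mac(buf_b, coeffs_b)

print(serial_mac([1, 2, 3, 4], [1, 1, 1, 1]))
print(eight_tap_output([1, 2, 3, 4], [1, 0, 0, 0], [5, 6, 7, 8], [0, 0, 0, 1]))
```

Sharing one multiplier across four taps trades throughput for area: each branch needs four clocks per output but only a single DSP block.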
4 Results
To produce repeatable results, a test bench was designed. The idea is to eliminate as many unnecessary elements as possible while keeping the system running smoothly, thus decreasing the variance of the measurements. For precise measurement, a timer module was added to the FPGA fabric and connected to the processor via an AXI4-Lite bus. The first test is run on 10 000 samples to evaluate the convergence rate and the overall error reduction.
Figure 16: Simplified diagram of the test environment
As demonstrated in Figure 17, it takes approximately 550 iterations for the filter to converge. After that, the error magnitude stays at about 0.1. For the benchmark, an n-long sequence of 12-bit values is prepared for both the I and Q channels, where n is 50, 100 or 200. Then the timer is initialized and started. The benchmark for every n is repeated 100 times. Each time, the value of the timer is read before and after the benchmark and the two readings are subtracted. As the microprocessor runs at 100 MHz, the resulting tick count is divided by 100 · 10^6 to obtain the time in seconds. The code can be found in Appendix 1.
Figure 17: Equalizer error at mu=0.05 (x-axis: iterations, 0 to 4000; y-axis: error magnitude, 0 to 1)
Table 1 shows the execution time with respect to the number of samples in one frame. As can be seen from the results, the first two executions add a fixed overhead of 434 and 33 clocks, respectively. From the third run onwards the results are remarkably consistent. It also becomes apparent that the average time required to process each sample grows slightly with the size of the frame.
#                             n = 50     n = 100    n = 200
1                             1444       2744       5344
2                             1043       2343       4943
3                             1010       2310       4910
4                             1010       2310       4910
5                             1010       2310       4910
...                           ...        ...        ...
95                            1010       2310       4910
96                            1010       2310       4910
97                            1010       2310       4910
98                            1010       2310       4910
99                            1010       2310       4910
100                           1010       2310       4910
average clocks per value      20.2934    23.1467    24.57335
average time per value (ns)   202.934    231.467    245.7335
Table 1: Execution time in clock cycles (lower is better) at different sample sizes
To compare the results with the corresponding work in [5, p. 57, Table 6], the benchmark
was modified so that the frame size is the same 64 samples. The results were then
integrated into Table 2. The comparison clearly shows the advantage the FPGA has over
the DSP implementation: its average execution time is about 40 times shorter than that of
the fastest DSP variant. There are several reasons for this.
Implementation            Total average, ns   Max, ns      Min, ns      Records
FPGA FSE                  13 786.7            18 080       13 740       100
LMS                       552 296.42          513 860      552 544      2856
NLMS                      679 089.46          641 005      679 325      1070
Complex LMS               1 174 887.12        1 098 390    1 175 245    2208
Complex NLMS              1 298 153.61        1 221 652    1 298 738    818
Complex LMS 2 cores       2 289 930.02        2 290 858    2 289 170    3532
Complex NLMS 2+1 cores    4 731 336.67        4 818 911    4 616 363    2477
Table 2: Algorithm execution benchmarks
The biggest impact is caused by the difference in architecture. The flexibility of the FPGA
allows for concurrent and pipelined computation of the algorithm. The data flow can be
divided into smaller streams and processed in parallel. Having configurable hardware
means that the whole system can be streamlined. The LMS module is connected directly
to the microprocessor and requires very little overhead to use. There is no loss due to
context switching or multicore synchronization. The DSP implementation also uses
floating-point calculations. While in theory this could provide better accuracy, it is also
computationally more expensive and slower.
Bibliography
1 Treichler JR, Fijalkow I, Johnson CR. Fractionally Spaced Equalizers. IEEE Signal Processing Magazine. 1996;13.
2 Proakis J, Salehi M. Digital Communications. 5th ed. McGraw Hill; 2008.
3 Gibson J. The mobile communications handbook. 2nd ed. CRC Press LLC; 1999.
4 Ghent University. Physical radio channel models. Accessed: 15.01.2016. http://www.wica.intec.ugent.be/research/propagation/physical-radio-channel-models
5 Kempi I. Adaptive Channel Equalization: Multicore Processor Implementation. Metropolia University of Applied Sciences; 2015.
6 Xilinx, Inc. 7 Series DSP48E1 User Guide; 2014.
7 Gitlin RD, Hayes JF, Weinstein SB. Data Communications Principles. Plenum Press; 1992.
8 Bingham JAC. The Theory and Practice of Modem Design. Wiley Interscience; 1988.
9 Haykin S, Widrow B. Least-Mean-Square Adaptive Filters. Wiley Interscience; 2003.
10 Haykin S. Adaptive Filter Theory. 3rd ed. Prentice Hall; 1995.
Appendix 1
1 Benchmark Code
#include <stdio.h>
#include "platform.h"
#include <limits.h>
#include "fsl.h"
#include "xtmrctr.h"
#include "xparameters.h"

#include "test_values.h"   /* test sequences in_i[] and in_q[] */
#define COUNT 64            /* samples per frame */

void print(char *str);

int main()
{
    init_platform();

    print("START\n\r");
    int out_i[COUNT];
    int out_q[COUNT];

    // Initialize timer
    XTmrCtr TmrCtrInstancePtr;
    XStatus Status = XTmrCtr_Initialize(&TmrCtrInstancePtr, XPAR_TMRCTR_0_DEVICE_ID);
    if (Status != XST_SUCCESS) {
        return XST_FAILURE;
    }

    // Timer self-test
    Status = XTmrCtr_SelfTest(&TmrCtrInstancePtr, 0);
    if (Status != XST_SUCCESS) {
        return XST_FAILURE;
    }

    // Load timer config
    XTmrCtr_SetOptions(&TmrCtrInstancePtr, 0, XTC_AUTO_RELOAD_OPTION);
    volatile u32 TValue1;
    volatile u32 TValue2;

    TValue1 = XTmrCtr_GetValue(&TmrCtrInstancePtr, 0);
    XTmrCtr_Start(&TmrCtrInstancePtr, 0);

    int TimerValues[100];

    int c, i;
    // Repeat the benchmark 100 times
    for (i = 0; i < 100; i++) {
        // timer reading before the frame is processed
        TValue1 = XTmrCtr_GetValue(&TmrCtrInstancePtr, 0);
        for (c = 0; c < COUNT; c++) {
            // load values into equalizer
            putfslx(in_i[c], 0, FSL_DEFAULT);
            putfslx(in_q[c], 1, FSL_DEFAULT);
            // get values from equalizer
            getfslx(out_i[c], 0, FSL_DEFAULT);
            getfslx(out_q[c], 1, FSL_DEFAULT);
        }
        // timer reading after the frame is processed
        TValue2 = XTmrCtr_GetValue(&TmrCtrInstancePtr, 0);
        // elapsed clock cycles for this run
        TimerValues[i] = TValue2 - TValue1;
        // send timer values
        xil_printf("%d\r\n", TimerValues[i]);
    }

    // send test results
    xil_printf("out_i:\r\n");
    for (c = 0; c < COUNT; c++)
        xil_printf("%d\r\n", out_i[c]);

    xil_printf("out_q:\r\n");
    for (c = 0; c < COUNT; c++)
        xil_printf("%d\r\n", out_q[c]);

    cleanup_platform();
    return 0;
}
Listing 2: Benchmark Code
