Implementation of Four Real-Time Software Defined Receivers and a
Space-Time Decoder using Xilinx Virtex 2 Pro Field Programmable Gate Array
Peter J. Green and Desmond P. Taylor
Department of Electrical and Computer Engineering University of Canterbury
Christchurch
New Zealand
peter.green@canterbury.ac.nz
taylor@elec.canterbury.ac.nz
Abstract
This paper describes the concept, architecture, development
and demonstration of a real-time, high-performance,
software defined 4-receiver system and a space-time decoder
to be implemented on a Xilinx Virtex 2 Pro Field
Programmable Gate Array (FPGA). It is designed and developed
for research into receiver diversity and multiple input
multiple output (MIMO) wireless systems. Each receiver has a
Freescale DSP56321 digital signal processor (DSP) to run
synchronization, channel state estimation and equalization
algorithms. The system is software defined to allow
flexibility in the choice of receiver demodulation formats,
output data rates and space-time decoding schemes. Hardware,
firmware and software aspects of the receiver and space-time
decoder system that meet the design requirements are
discussed.
The current system implementation is an enhancement
to an existing Smart Antenna Software RAdio Test System
(SASRATS) platform [5, 6] designed to test and verify
various space-time architectures and algorithms. Our
original receivers, developed in 2000 [5], were designed for
low bandwidth (270 kbaud) and beamforming/smart antenna
work. The latest enhancement allows high bandwidth
performance (4 Mbaud), parallel 16-bit IQ baseband data
outputs and greater programmability. The 4 receivers
complement a 4-transmitter space-time (ST) encoding platform
designed and developed for real-time testing of ST coding
schemes developed by Alamouti [1] and others mentioned
in [4]. The primary objective is to increase system capacity
and performance through the use of multiple antennas,
employing spatial multiplexing and ST coding and decoding.
The focus of this paper is on the digital and baseband
portion of the system. The analogue portion consists of an
RF amplifier, a phase locked local oscillator, a mixer,
intermediate frequency (IF) bandpass filters and high gain IF
amplifiers. Its purpose is to amplify, translate and filter a
received radio frequency signal at 915 MHz or 2.4 GHz to
an IF of 70 MHz, where digitization by bandpass sampling
occurs. The architecture of the digital IF and baseband
portion of the receivers is shown in Figure 1.
The 70 MHz IF signal is digitized by bandpass sampling
with a high speed Analog Devices AD6640 analog-to-digital
converter (ADC) at a sampling clock frequency of 65 MHz.
The sampling clock frequency is fully programmable and is
generated by the Direct Digital Synthesizer (DDS) in an
AD9857 upconverter integrated circuit (IC) working in fixed
tone mode. This gives us the flexibility to change the
sampling clock frequency for various experiments. The output
of the ADC is then fed into a versatile Analog Devices
AD6620 Digital Down Converter (DDC), which digitally
downconverts, decimates and filters the input data to
baseband IQ signals for further processing. The AD6620 is
fully programmable and, together with the AD9857, makes the
receiver software defined. However, the chip design
restricts the resolution of the output data words to 16 bits
and delivers the in-phase (I) and quadrature-phase (Q)
signals as sequential parallel 16-bit words. The 16-bit
sequential I and Q data are stored in 2 sets of 32-bit
buffers. A Data Valid pulse is generated after the IQ data
are latched into the buffers. The IQ data can then be
accessed as a simultaneous parallel 32-bit output pair or as
separate I and Q data words, using memory mapped addressing
under DSP or FPGA control. Simultaneous 32-bit parallel
transfer is normally used between the DDC and a National
Instruments Data Acquisition (NIDAQ) card, which is used
with the SASRATS platform to perform batch or pseudo
real-time processing of data using MATLAB. This feature is
useful during the development and testing stages of the
design.
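As an illustration of the bandpass sampling arithmetic described above, the following Python sketch computes where the 70 MHz IF lands after sampling at 65 MHz, and the tuning word that a phase-accumulator NCO would need in order to mix that alias down to baseband. The function names are hypothetical, and the 32-bit accumulator width is an assumption that should be checked against the AD6620 data sheet; this is a sketch of the arithmetic, not the receiver's configuration code.

```python
def alias_frequency(f_if_hz: float, f_s_hz: float) -> float:
    """Frequency at which a real tone at f_if_hz appears after sampling at f_s_hz."""
    folded = f_if_hz % f_s_hz             # fold into [0, fs)
    return min(folded, f_s_hz - folded)   # fold into the first Nyquist zone [0, fs/2]


def nyquist_zone(f_if_hz: float, f_s_hz: float) -> int:
    """Nyquist zone (1, 2, 3, ...) that contains f_if_hz; odd zones are not spectrally inverted."""
    return int(f_if_hz // (f_s_hz / 2)) + 1


def nco_tuning_word(f_hz: float, f_s_hz: float, bits: int = 32) -> int:
    """Tuning word for a phase-accumulator NCO; the accumulator width `bits` is an assumption."""
    return round(f_hz / f_s_hz * (1 << bits)) % (1 << bits)


f_if = 70e6   # IF centre frequency from the text
f_s = 65e6    # ADC sampling clock from the text

f_alias = alias_frequency(f_if, f_s)
zone = nyquist_zone(f_if, f_s)
tuning_word = nco_tuning_word(f_alias, f_s)

print(f"alias at {f_alias / 1e6:.1f} MHz, Nyquist zone {zone}, tuning word {tuning_word:#010x}")
```

With these values the IF aliases to 5 MHz, which lies comfortably inside the 32.5 MHz first Nyquist zone, so the DDC only has to translate a 5 MHz digital carrier to baseband. Changing the DDS output frequency changes `f_s` and therefore both the alias frequency and the required tuning word.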
Proceedings of the Third IEEE International Workshop on Electronic Design, Test and Applications (DELTA’06) 
0-7695-2500-8/05 $20.00 © 2005 IEEE 
[Figure 1: block diagram. Each of the four receivers (RX 1 to RX 4) consists of an antenna, an analog down converter (70 MHz IF output), an AD6640 analog-digital converter, an AD6620 digital down converter, an AD9857 direct digital synthesizer (65 MHz clock) and a Freescale DSP56321 synchronizer (DDS sync and NCO sync). The IQ and sync signals of all four receivers feed the Xilinx Virtex 2 Pro FPGA, which provides the data output.]
Figure 1. SASRATS 4-receiver system architecture with Xilinx space-time decoder
One set of buffered IQ signals is fed through Port A into
a Freescale DSP56321 platform, which runs the
synchronization and channel estimation algorithms.
The software for the DSP56321 is written in assembly
language for optimum high speed performance. The output of
these algorithms can dynamically control the Numerically
Controlled Oscillator (NCO) in the AD6620 down converter via
Port B of the DSP, or the AD9857 DDS through Port C, to
achieve synchronization.
In any digital receiver design, the major issues that must
be solved involve the adjustment of gain in the receiver
structure to work within the dynamic range of the signal
processing algorithms, the ability to synchronize to the car-
rier frequency and phase, the ability to synchronize the time
at which to sample the received symbols and the type of fil-
tering needed for best possible bit-error performance. This
paper does not cover the issues of gain control.
Carrier synchronization is required as the transmitter and
the local oscillator (LO) in the receiver are derived from
independent clock sources and will thus have a frequency
offset. Doppler shift in the channel will also affect the
received carrier frequency. A frequency offset causes the
carrier phase to rotate, and correct decisions cannot be made
on the rotating symbols. The local oscillator must be
adjusted to stop this carrier rotation and must continuously
track and correct any frequency offset that occurs over time.
We use a decoupled approach to carrier and timing
synchronization. This allows the receiver to correct for
carrier frequency offsets even when the symbol timing is not
known. Wherever possible, we adopt a feedforward approach
through the use of known training symbols (data-aided)
or preambles to resolve phase ambiguity. This will
also work in Rayleigh flat fading channels.
In our implementation, a sync preamble is sent by the
transmitter before data is sent, to assist the receiver in
correcting the frequency offset. At the receiver, the I and Q
data are processed by the algorithm depicted in Figure 2,
which adjusts the frequency of the NCO to negate the
frequency offset. On initialization and after a Data Valid
signal appears on the Interrupt Request (IRQ) line of the
DSP, the DSP reads the I and Q data stored in the 32-bit
buffer. The algorithm then computes the phase angle of the
vector using the Coordinate Rotation Digital Computer
(CORDIC) algorithm [2]. The CORDIC algorithm is an iterative
shift-add algorithm which can perform rectangular to polar
conversions in DSPs quickly and efficiently. This value is
stored, the next IQ pair is read and the new phase angle is
computed. The current and previous phases are compared and,
taking into account modulo 2π operation, the difference is
computed. As the time between the current and previous
samples is known, the frequency offset can be estimated. A
positive estimate indicates that the NCO frequency is below
the TX carrier, and a negative estimate that it is above.
Knowing the direction and speed of the rotating vector, the
NCO is then programmed to correct for this. A new IQ pair
is read and the process is repeated over the duration of the
sync preamble.
[Figure 2: flowchart]
1. Initialize DSP and interrupts.
2. Wait for an interrupt on IRQA, then read $I_i$ and $Q_i$ and compute $\theta_i$ using the CORDIC algorithm (mod $2\pi$).
3. Wait for the next IRQA, then read $I_{i+1}$ and $Q_{i+1}$ and compute $\theta_{i+1}$ using the CORDIC algorithm (mod $2\pi$).
4. Compute the angle between $\theta_{i+1}$ and $\theta_i$:
   IF $Q_{i+1} > 0$ AND $Q_i < 0$ THEN $\theta_f = 2\pi - \theta_i + \theta_{i+1}$
   OR IF $Q_{i+1} < 0$ AND $Q_i > 0$ THEN $\theta_f = -(2\pi + \theta_i - \theta_{i+1})$
   ELSE $\theta_f = \theta_{i+1} - \theta_i$
5. Adjust the NCO frequency: IF $\theta_f < 0$, reduce the NCO frequency by a scaled factor of $|\theta_f|$; ELSE increase the NCO frequency by a scaled factor of $|\theta_f|$.
6. Set $\theta_i = \theta_{i+1}$ and return to step 3.
Figure 2. Algorithm for Carrier Frequency Offset Compensation
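The phase measurement and offset estimate described above can be sketched in Python as follows. This is a floating-point illustration under stated assumptions, not the DSP56321 assembly code: the iteration count, the function names and the test signal (a 50 kHz offset observed at a 1 µs sample period) are all hypothetical, and the phase difference is wrapped into (−π, π], which is a common way of handling the modulo 2π operation and not necessarily the exact rule used in Figure 2.

```python
import math

CORDIC_ITERS = 16  # assumed iteration count; sets the angular resolution
ATAN_TABLE = [math.atan(2.0 ** -k) for k in range(CORDIC_ITERS)]


def cordic_phase(i: float, q: float) -> float:
    """Phase of the vector (i, q) in [0, 2*pi) using vectoring-mode CORDIC."""
    if i == 0 and q == 0:
        return 0.0
    x, y, angle = i, q, 0.0
    if x < 0:
        # Rotate by pi first, because the iterations only converge in the right half-plane.
        x, y, angle = -x, -y, math.pi
    for k in range(CORDIC_ITERS):
        shift = 2.0 ** -k  # a right shift by k bits in a fixed-point implementation
        if y > 0:
            # Rotate clockwise by atan(2^-k) to drive y towards zero.
            x, y, angle = x + y * shift, y - x * shift, angle + ATAN_TABLE[k]
        else:
            # Rotate counter-clockwise by atan(2^-k).
            x, y, angle = x - y * shift, y + x * shift, angle - ATAN_TABLE[k]
    return angle % (2 * math.pi)


def wrapped_phase_difference(theta_prev: float, theta_curr: float) -> float:
    """Phase advance from theta_prev to theta_curr, wrapped into (-pi, pi]."""
    d = (theta_curr - theta_prev) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d


def frequency_offset_hz(theta_prev: float, theta_curr: float, sample_period_s: float) -> float:
    """Frequency offset implied by the phase advance over one sample period."""
    return wrapped_phase_difference(theta_prev, theta_curr) / (2 * math.pi * sample_period_s)


# Hypothetical test signal: a vector rotating at +50 kHz, sampled every 1 microsecond.
sample_period = 1e-6
true_offset = 50e3
phases = [6.2 + 2 * math.pi * true_offset * sample_period * n for n in range(2)]
iq = [(1000.0 * math.cos(p), 1000.0 * math.sin(p)) for p in phases]

theta0 = cordic_phase(*iq[0])
theta1 = cordic_phase(*iq[1])
estimate = frequency_offset_hz(theta0, theta1, sample_period)
print(f"estimated offset: {estimate:.0f} Hz")
```

The two samples straddle the 2π boundary, so the example exercises the wrap handling. A positive estimate means the vector is rotating counter-clockwise, which corresponds to raising the NCO frequency in the correction step.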
A symbol sync preamble follows the carrier sync preamble.
The symbol sync preamble allows the receiver to
estimate and track the time at which to sample the received
waveform. Symbol timing recovery will be implemented
on the DSP56321 based on an all-digital maximum
likelihood symbol timing recovery algorithm by Watkins and
Taylor [9]. The Watkins timing error detector (TED) works
at baud rate and has performance comparable to the
fractionally sampled (2× baud rate) Gardner TED or the
Amplitude Directed TED by Verdin and Tozer [3]. The Watkins
TED also works well in Rayleigh flat fading channels.
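For readers unfamiliar with timing error detectors, the Gardner TED mentioned above as a point of comparison can be sketched as follows. This is the well-known Gardner detector operating on 2 samples per symbol, not the baud-rate Watkins algorithm of [9], which is not reproduced here. The cosine test waveform and the function names are hypothetical choices made only to give a closed-form check.

```python
import math


def gardner_ted(prev_symbol: complex, mid_sample: complex, curr_symbol: complex) -> float:
    """Gardner timing error from two symbol-spaced samples and the sample midway between them."""
    return ((curr_symbol - prev_symbol) * mid_sample.conjugate()).real


def waveform(t: float, symbol_period: float = 1.0) -> complex:
    """Hypothetical noise-free waveform for alternating +1/-1 symbols."""
    return complex(math.cos(math.pi * t / symbol_period), 0.0)


def timing_error(tau: float, n: int = 4, symbol_period: float = 1.0) -> float:
    """Detector output when every sampling instant is displaced by tau."""
    t_curr = n * symbol_period + tau
    return gardner_ted(
        waveform(t_curr - symbol_period, symbol_period),
        waveform(t_curr - symbol_period / 2, symbol_period),
        waveform(t_curr, symbol_period),
    )


for tau in (-0.1, 0.0, 0.1):
    print(f"tau = {tau:+.1f} T -> error = {timing_error(tau):+.4f}")
```

For this waveform the detector output equals sin(2πτ/T): it is zero when the samples fall on the symbol centres and changes sign with the direction of the timing error, which is what a tracking loop needs. The cost is the extra mid-symbol sample, which a baud-rate detector avoids.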
In our complementary SASRATS transmitter platform
[7] running a 2-transmitter Alamouti encoding scheme, two
symbols, $s_0$ and $s_1$, are transmitted simultaneously from
two transmitters at time instant $t$. At time instant $t + T$,
the symbols $-s_1^*$ and $s_0^*$ are transmitted
simultaneously from the transmitters, where $*$ represents the
complex conjugate. This is depicted in Figure 3. The
transmitted symbols travel through 2 independent channels
$h_0$ and $h_1$ to a receiver, where noise and interference
$n_0$ and $n_1$ are added to the received signal. $h_0$ and
$h_1$ are complex multiplicative distortions assumed constant
across two consecutive symbols. It is shown in [1] that at
the input of the combiner, the received signals are given by

$$r_0 = r(t) = h_0 s_0 + h_1 s_1 + n_0$$
$$r_1 = r(t + T) = -h_0 s_1^* + h_1 s_0^* + n_1$$

A channel estimator algorithm estimates $h_0$ and $h_1$, and
this information is also fed to the combiner to yield two
combined output signals

$$\tilde{s}_0 = (\alpha_0^2 + \alpha_1^2) s_0 + h_0^* n_0 + h_1 n_1^*$$
$$\tilde{s}_1 = (\alpha_0^2 + \alpha_1^2) s_1 - h_0 n_1^* + h_1^* n_0$$

where $\alpha_0$ and $\alpha_1$ are the magnitudes of $h_0$
and $h_1$ respectively. The signals $\tilde{s}_0$ and
$\tilde{s}_1$, together with the channel state information
$h_0$ and $h_1$, are sent to the maximum likelihood (ML)
detector so that ML estimates $\hat{s}_0$ and $\hat{s}_1$ can
be made of $s_0$ and $s_1$. As we use PSK modulation of the
symbols at the transmitter, the decision rule in the ML
detector is to choose $s_i$ iff
$d^2(\tilde{s}_0, s_i) \le d^2(\tilde{s}_0, s_k), \forall i \ne k$
for $\hat{s}_0$, and to choose $s_i$ iff
$d^2(\tilde{s}_1, s_i) \le d^2(\tilde{s}_1, s_k), \forall i \ne k$
for $\hat{s}_1$, where $d^2(x, y)$ is the squared Euclidean
distance between signals $x$ and $y$.
[Figure 3: block diagram. The TX 0 antenna transmits $s_0$ then $-s_1^*$, and the TX 1 antenna transmits $s_1$ then $s_0^*$. The signals reach the RX 0 antenna (SASRATS receiver 0) over channels $h_0$ and $h_1$ with noise $n_0$, $n_1$, and the RX 1 antenna (SASRATS receiver 1) over channels $h_2$ and $h_3$ with noise $n_2$, $n_3$. Each receiver's channel estimator passes its channel estimates and received signals ($r_0$, $r_1$ and $r_2$, $r_3$) to the combiner and maximum likelihood detector on the Virtex FPGA, which produce $\tilde{s}_0$, $\tilde{s}_1$ and $\hat{s}_0$, $\hat{s}_1$.]
Figure 3. Block diagram of Alamouti decoding implementation on SASRATS platform
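The combining and detection steps can be checked numerically with a short Python sketch. It uses the combining rule of the Alamouti scheme in [1], under which the combined signals reduce to the expressions for the two combiner outputs given above. The QPSK constellation, the channel values and the function names are hypothetical, and the example is noise-free so that the result can be verified exactly; it is not the FPGA implementation.

```python
import cmath
import math


def alamouti_combine(r0: complex, r1: complex, h0: complex, h1: complex):
    """Combine two consecutive received samples using the channel estimates h0 and h1."""
    s0_tilde = h0.conjugate() * r0 + h1 * r1.conjugate()
    s1_tilde = h1.conjugate() * r0 - h0 * r1.conjugate()
    return s0_tilde, s1_tilde


def ml_detect(s_tilde: complex, constellation):
    """Choose the constellation point with the smallest squared Euclidean distance."""
    return min(constellation, key=lambda s: abs(s_tilde - s) ** 2)


# Hypothetical unit-energy QPSK constellation and channel gains.
QPSK = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]
s0, s1 = QPSK[0], QPSK[2]
h0 = 0.8 * cmath.exp(1j * 0.3)
h1 = 0.5 * cmath.exp(-1j * 1.1)

# Received samples at t and t + T (noise terms set to zero for the check).
r0 = h0 * s0 + h1 * s1
r1 = -h0 * s1.conjugate() + h1 * s0.conjugate()

s0_tilde, s1_tilde = alamouti_combine(r0, r1, h0, h1)
gain = abs(h0) ** 2 + abs(h1) ** 2  # alpha_0^2 + alpha_1^2

s0_hat = ml_detect(s0_tilde, QPSK)
s1_hat = ml_detect(s1_tilde, QPSK)
print(s0_hat == s0, s1_hat == s1)
```

Without noise, each combiner output is the transmitted symbol scaled by the real gain α0² + α1², so the cross terms between the two symbols cancel and each symbol can be detected independently. This is why the decoding is a linear process that maps well onto FPGA multipliers and adders.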
The Alamouti scheme can be extended to more than one
receiver, but implementing a 2-transmitter, 2-receiver
system, for example, requires the estimation of 4 channels
($h_0$, $h_1$, $h_2$ and $h_3$), 2 at each receiver, as shown
in Figure 3. The decoding of the Alamouti encoded signals is
a linear process, and our SASRATS receiver system design will
implement the combiner and maximum likelihood detection
on the Xilinx Virtex 2 Pro FPGA board using the Xilinx
Integrated System Environment (ISE) Foundation design tool.
The channel estimation is carried out on the DSP platforms.
The channel estimation algorithm will be a feedforward
data-aided (DA) technique using a set of training symbols
sent after the carrier and timing sync preambles, so that
optimal minimum mean square error (MMSE) estimates can be
obtained from the received training symbols via ML
estimation. Data-aided reception is of relatively low
complexity since channel estimation (carrier
synchronization) and detection are totally decoupled [8].
Each transmitter will have orthogonal preambles to enable the
receiver to estimate the different channels $h_0$ and $h_1$
needed in the 2 TX, 1 RX Alamouti decoding scheme mentioned
above.
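To illustrate why orthogonal preambles let one receiver separate the two channels, the following Python sketch correlates the received training samples against each transmitter's preamble. The 4-symbol preambles, the channel values and the function name are hypothetical, and the correlation estimator is a standard data-aided estimator chosen for illustration, not necessarily the algorithm that will run on the DSP56321.

```python
import cmath


def estimate_channel(received, preamble) -> complex:
    """Correlate the received training samples with one known preamble and normalize."""
    correlation = sum(r * p.conjugate() for r, p in zip(received, preamble))
    energy = sum(abs(p) ** 2 for p in preamble)
    return correlation / energy


# Hypothetical orthogonal preambles: their inner product is zero.
preamble_tx0 = [complex(1), complex(1), complex(1), complex(1)]
preamble_tx1 = [complex(1), complex(-1), complex(1), complex(-1)]

# Hypothetical flat-fading channel gains, constant over the preamble.
h0 = 0.8 * cmath.exp(1j * 0.3)
h1 = 0.5 * cmath.exp(-1j * 1.1)

# Both transmitters send their preambles at the same time (noise-free for the check).
received = [h0 * p0 + h1 * p1 for p0, p1 in zip(preamble_tx0, preamble_tx1)]

h0_hat = estimate_channel(received, preamble_tx0)
h1_hat = estimate_channel(received, preamble_tx1)
print(abs(h0_hat - h0), abs(h1_hat - h1))
```

Because the preambles are orthogonal, the contribution of the other transmitter sums to zero in each correlation, so each estimate depends only on its own channel. With noise present, longer preambles average the noise down at the cost of training overhead.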
The hardware architecture allows high speed IQ and
synchronization signals from up to 4 receivers to be combined
on the FPGA board for processing. For more sophisticated
space-time decoding experiments that require a Viterbi
decoder or other decoding algorithms, the Xilinx Core
Generator modules can be used. The Xilinx Core Generator
Intellectual Property (IP) modules [10] include math
functions, correlators, filters, modulators, memory and
storage elements incorporated into the ISE Foundation
software. However, the Viterbi algorithm requires an
additional license.
Figure 4. Photo of one SASRATS receiver
with external FPGA board
Figure 5. Photo of 4 SASRATS receivers with
external FPGA board on top
Figure 4 shows the internal digital hardware of one receiver
and Figure 5 shows the complete system consisting of 4
receivers. They are fully operational up to 5 Mbaud and are
fully software defined.
1 Conclusions
We have described the design and development of a
real-time software defined 4-receiver system and an Alamouti
decoder to be implemented on an FPGA. Synchronization
and decoder algorithms are fully defined, with the DSP
software and FPGA firmware currently undergoing development
and testing.
References
[1] S. Alamouti. Space block coding: A simple transmitter
diversity technique for wireless communications. IEEE J.
Select. Areas Commun., 16:1451–1458, Oct. 1998.
[2] R. Andraka. A survey of CORDIC algorithms for FPGA
based computers. Proc. ACM Int. Symp. on Field
Programmable Gate Arrays, Monterey, CA, USA, pages
191–200, 1998.
[3] R. Clark. On synchronization issues in wireless mobile
digital communications. Master's thesis, University of
Canterbury, 2002.
[4] D. Gesbert et al. From theory to practice: An overview of
MIMO space-time coded wireless systems. IEEE Journal on
Selected Areas in Communications, 21:281–302, Apr. 2003.
[5] P. Green and D. Taylor. Smart antenna software radio test
system. Proceedings of the First IEEE International
Workshop on Electronic Design, Test and Applications,
1:68–72, Jan. 2002.
[6] P. Green and D. Taylor. Experimental verification of
space-time algorithms using the smart antenna software radio
test system (SASRATS) platform. Personal, Indoor and Mobile
Radio Communications, 2004. PIMRC 2004. 15th IEEE
International Symposium on, 4:2539–2544, 2004.
[7] P. Green and D. Taylor. Implementation of a high speed four
transmitter space-time encoder using field programmable
gate array and parallel digital signal processors.
Proceedings of the Third IEEE International Workshop on
Electronic Design, Test and Applications, Jan. 2006.
[8] H. Meyr, M. Moeneclaey and S. A. Fechtel. Digital
Communication Receivers. Wiley, 1998.
[9] D. Taylor and G. Watkins. ML symbol timing recovery in
the Rayleigh flat-fading channel using baud rate sampling.
Conf. Proceedings of the 2001 International Symposium on
Signals, Systems, and Electronics (ISSE'01), pages 62–65,
2001.
[10] Xilinx. http://www.xilinx.com/ipcenter/coregen/coregen_iplists.htm,
Nov. 2005.
