FPGA Implementation of a Real Time Maximum Likelihood Space-Time Decoder on a MIMO Software Radio Test Platform by Green, P.J. & Taylor, D.P.
Peter J. Green and Desmond P. Taylor
Department of Electrical and Computer Engineering
University of Canterbury
Christchurch, New Zealand
Email: drgreenpeter@gmail.com
Email: taylor@elec.canterbury.ac.nz
Abstract—This paper describes the concept, architecture,
development and demonstration of a real-time, maximum likelihood Alamouti decoder for a wireless 4-transmit 4-receiver multiple input and multiple output (MIMO) Smart Antenna Software Radio Test System (SASRATS) platform. It is implemented on a Xilinx Virtex 2 Pro Field Programmable Gate Array (FPGA). Hardware, firmware, use of the Xilinx Core Generator Intellectual Property modules and experimental verification of the decoder are discussed.
Keywords: real-time implementation; Alamouti; FPGA; maximum likelihood decoder; MIMO; software radio test platform
I. INTRODUCTION
The proposed system implementation is developed on
an existing MIMO Smart Antenna Software RAdio Test
System (SASRATS) platform [1], [2] designed to test and
verify various space time architectures and algorithms. The
4 receivers complement a 4-transmitter space time (ST)
encoding platform [3] designed and developed for real-time
testing of ST coding schemes developed by Alamouti [4] and
others mentioned in [5]. The primary objective is to increase
system capacity and performance through the use of multiple
antennas, employing spatial multiplexing and ST coding and
decoding. Spatial multiplexing and diversity techniques are
currently adopted in the IEEE 802.11n draft specification to
fully exploit the benefit of MIMO channels.
The focus of this paper is on the digital baseband portion of the system, particularly the real-time implementation of the Alamouti decoder on a Xilinx Virtex 2 Pro FPGA. Other MIMO testbeds [6], [7] typically perform post-processing operations such as channel estimation and Alamouti decoding in Matlab after capturing large batches of data. Real-time implementation of a 2 × 1 Alamouti decoder was briefly described in [8]. Our work describes in detail the real-time implementation of a maximum likelihood 2 × 2 Alamouti decoder, extended to a 2 × 4 system, on the SASRATS platform.
II. OVERVIEW OF THE SASRATS ARCHITECTURE
Figure 1. SASRATS 4-receiver system architecture with the Xilinx FPGA channel estimator
[Block diagram not reproduced. Labels shown: RX 1 to RX 4; analog down converter; analog-to-digital converter; digital down converter; direct digital synthesizer; DSP56321 synchronizer; Xilinx Virtex 2 Pro FPGA; 70 MHz IF; 65 MHz clock; IQ, DDS sync, NCO sync and data output signals.]
The basic architecture of the SASRATS receivers is shown in Figure 1. The analog portion amplifies, translates and filters a received radio frequency signal at 915 MHz or 2.4 GHz to an intermediate frequency of 70 MHz, where digitization and bandpass sampling occur. The output of the analog-to-digital converter is then fed into a digital down converter which digitally downconverts, decimates and filters the input data to produce baseband in-phase (I) and quadrature-phase (Q) signals for further processing. The SASRATS receivers work asynchronously with the transmitters, and we have developed and implemented real-time algorithms for carrier and symbol timing synchronization [9] and for channel estimation [10] in DSP and FPGA.
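As a small worked example of the bandpass sampling step, the sketch below computes where a 70 MHz IF lands after sampling. It assumes the 65 MHz clock labelled in Figure 1 is the ADC sampling clock, which the text does not state explicitly; the function names are illustrative only.

```python
def alias_frequency(f_in_hz, f_s_hz):
    """Apparent frequency, in the first Nyquist zone, of a tone at f_in_hz sampled at f_s_hz."""
    f = f_in_hz % f_s_hz
    return min(f, f_s_hz - f)


def nyquist_zone(f_in_hz, f_s_hz):
    """1-based Nyquist zone index; odd zones keep the spectrum non-inverted."""
    return int(f_in_hz // (f_s_hz / 2)) + 1


F_IF = 70e6  # intermediate frequency from the text
F_S = 65e6   # ASSUMPTION: the 65 MHz clock in Figure 1 is the ADC sample clock

f_alias = alias_frequency(F_IF, F_S)
zone = nyquist_zone(F_IF, F_S)
print(f"IF appears at {f_alias / 1e6:.1f} MHz after sampling (Nyquist zone {zone})")
```

Under that assumption the IF folds down to 5 MHz in an odd Nyquist zone, so the digital down converter would only need to translate a low-frequency, non-inverted replica to baseband.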
We adopt a feedforward approach through the use of known training symbols (data-aided) or preambles at the transmitter to resolve magnitude and phase ambiguities in Rayleigh flat fading channels. We assume that the channels change only slowly during the period between training preambles. A portion of the FPGA performs real-time channel estimation as described in [10]. In a 4-transmitter and 4-receiver (4 × 4) MIMO system, each receiver must estimate 4 distinct channels, giving a total of 16 channel estimates for 4 receivers.
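The estimator itself is described in [10] and is not reproduced here. Purely to illustrate what a data-aided estimate of one flat fading channel involves, the sketch below uses a generic least-squares correlation against a known preamble. The estimator choice, the short BPSK preamble and all names are assumptions for illustration, not the authors' design.

```python
import cmath


def estimate_channel(received, preamble):
    """Least-squares estimate of a single complex channel gain from known training symbols."""
    num = sum(r * complex(p).conjugate() for r, p in zip(received, preamble))
    den = sum(abs(p) ** 2 for p in preamble)
    return num / den


# Hypothetical 8-symbol BPSK preamble (NOT a GSM training sequence code).
preamble = [1, -1, 1, 1, -1, -1, 1, -1]
h_true = 0.8 * cmath.exp(1j * 0.6)           # example flat fading gain
received = [h_true * p for p in preamble]    # noise-free reception, other transmitter off
h_hat = estimate_channel(received, preamble)
print(abs(h_hat), cmath.phase(h_hat))
```

In a 4 × 4 system this calculation would be carried out once per transmit-receive pair, which is where the count of 16 estimates comes from.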
III. OVERVIEW OF ALAMOUTI SCHEME
The Alamouti scheme is the only orthogonal space-time block code using complex signals for two transmit antennas which provides full diversity of 2 and full rate of 1. For more than two transmit antennas, the goal is to design transmission codes that achieve full diversity at the highest possible rate with low decoding complexity.
2010 Fifth IEEE International Symposium on Electronic Design, Test & Applications
978-0-7695-3978-2/10 $26.00 © 2010 IEEE
DOI 10.1109/DELTA.2010.9
In our 2 × 2 MIMO implementation, we use two distinct training codes over 2 time-multiplexed preamble slots at the transmitter. When one transmitter is sending training data in one time slot, the other is off. These 26-bit preambles are GSM training sequence codes (TSC) 0 and 1 [11]. The two transmitters then transmit 128 space-time encoded data symbols simultaneously before the cycle repeats. The SASRATS transmitters are programmed to run a 2-transmit Alamouti encoding scheme, in which two symbols, $s_0$ and $s_1$, are transmitted simultaneously from the two transmitters at time instant $t$. At time instant $t + T$, the symbols $-s_1^{*}$ and $s_0^{*}$ are transmitted simultaneously from the transmitters, where $^{*}$ denotes the complex conjugate. The transmission matrix is represented by
$$
\mathbf{S} = \begin{bmatrix} s_0 & s_1 \\ -s_1^{*} & s_0^{*} \end{bmatrix} \tag{1}
$$
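The encoding rule in (1) can be sketched in a few lines. This is an illustrative software model, not the transmitter firmware; the function name and the list-based interface are assumptions.

```python
def alamouti_encode(symbols):
    """Map an even-length symbol list to two transmit streams following matrix (1).

    Each row of (1) is one symbol period; each column is one transmitter.
    """
    if len(symbols) % 2:
        raise ValueError("Alamouti encoding needs an even number of symbols")
    tx0, tx1 = [], []
    for s0, s1 in zip(symbols[0::2], symbols[1::2]):
        s0, s1 = complex(s0), complex(s1)
        tx0 += [s0, -s1.conjugate()]  # TX0 sends s0 at t, then -s1* at t + T
        tx1 += [s1, s0.conjugate()]   # TX1 sends s1 at t, then  s0* at t + T
    return tx0, tx1


tx0, tx1 = alamouti_encode([1 + 1j, 1 - 1j])
print(tx0, tx1)
```

Two symbols are sent over two symbol periods, which is why the code has rate 1.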
The transmitted symbols travel through 2 independent channels $h_0$ and $h_1$ to a receiver, where noises $n_0$ and $n_1$ are added to the received signals. $h_0$ and $h_1$ are complex multiplicative distortions assumed constant across two consecutive symbols. This is depicted in Figure 2.
Figure 2. Block diagram of Alamouti decoding implementation on SASRATS platform
[Block diagram not reproduced. Labels shown: TX 0 and TX 1 antennas; RX 0 and RX 1 antennas; channels $h_0$ to $h_3$; noise terms $n_0$ to $n_3$; received signals $r_0$ to $r_3$; SASRATS receivers 0 and 1, each with a channel estimator; combiner and maximum likelihood detector on the Virtex FPGA producing $\tilde{s}_0$, $\tilde{s}_1$ and $\hat{s}_0$, $\hat{s}_1$.]
It is shown in [4] that, at the input of the combiner, the received signals are given by
$$
\begin{aligned}
r_0 &= r(t) = h_0 s_0 + h_1 s_1 + n_0 \\
r_1 &= r(t + T) = -h_0 s_1^{*} + h_1 s_0^{*} + n_1
\end{aligned} \tag{2}
$$
In our implementation [10], a real-time FPGA-based channel estimator produces the estimates $\hat{h}_0$ and $\hat{h}_1$, and this information is fed to the combiner to yield two combined output signals
$$
\begin{aligned}
\tilde{s}_0 &= \hat{h}_0^{*} r_0 + \hat{h}_1 r_1^{*} \\
\tilde{s}_1 &= \hat{h}_1^{*} r_0 - \hat{h}_0 r_1^{*}
\end{aligned} \tag{3}
$$
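A quick numerical check of (2) and (3) helps show why the combiner decouples the two symbols. The sketch below is a floating-point model with perfect channel knowledge and no noise; the channel and symbol values are arbitrary examples, and the function name is illustrative.

```python
def alamouti_combine(r0, r1, h0_hat, h1_hat):
    """Combiner of equation (3) for one receive antenna."""
    s0_tilde = h0_hat.conjugate() * r0 + h1_hat * r1.conjugate()
    s1_tilde = h1_hat.conjugate() * r0 - h0_hat * r1.conjugate()
    return s0_tilde, s1_tilde


# Arbitrary example values (assumptions for illustration).
s0, s1 = (1 + 1j) / 2 ** 0.5, (-1 + 1j) / 2 ** 0.5
h0, h1 = 0.9 + 0.2j, -0.3 + 0.5j

# Received signals from equation (2), with n0 = n1 = 0.
r0 = h0 * s0 + h1 * s1
r1 = -h0 * s1.conjugate() + h1 * s0.conjugate()

s0_tilde, s1_tilde = alamouti_combine(r0, r1, h0, h1)
gain = abs(h0) ** 2 + abs(h1) ** 2
print(s0_tilde / gain, s1_tilde / gain)  # recovers s0 and s1
```

With exact estimates the cross terms cancel, leaving each output equal to the transmitted symbol scaled by $|h_0|^2 + |h_1|^2$.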
The signals $\tilde{s}_0$ and $\tilde{s}_1$ are sent to the maximum likelihood (ML) detector so that ML estimates $\hat{s}_0$ and $\hat{s}_1$ can be made of $s_0$ and $s_1$. As we use PSK modulation of the symbols at the transmitter (equal energy constellations), the ML detector does not need channel estimates, and the decision rule simplifies as follows. For $\tilde{s}_0$, choose $s_i$ iff $d^2(\tilde{s}_0, s_i) \le d^2(\tilde{s}_0, s_k)$, $\forall i \ne k$. For $\tilde{s}_1$, choose $s_i$ iff $d^2(\tilde{s}_1, s_i) \le d^2(\tilde{s}_1, s_k)$, $\forall i \ne k$. Here $d^2(x, y)$ is the squared Euclidean distance between signals $x$ and $y$.
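The decision rule amounts to a nearest-neighbour search over the constellation. A minimal sketch follows, assuming a unit-energy QPSK constellation whose ordering is chosen arbitrarily here; the names are illustrative.

```python
import math

# Unit-energy QPSK points; the ordering is an assumption for illustration.
QPSK = [complex(re, im) / math.sqrt(2) for re in (1, -1) for im in (1, -1)]


def ml_detect_index(s_tilde, constellation=QPSK):
    """Index of the constellation point with minimum squared Euclidean distance."""
    return min(range(len(constellation)),
               key=lambda i: abs(s_tilde - constellation[i]) ** 2)


print(ml_detect_index(0.3 + 0.9j))        # nearest to (1 + 1j)/sqrt(2)
print(ml_detect_index(5 * (0.3 + 0.9j)))  # same decision after positive scaling
```

For equal energy constellations the decision is unchanged by a positive real scaling of the combiner output, which is why the detector needs no channel estimates.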
The complexity of the combiner and ML detector depends on the type of modulation. Binary phase shift keyed (BPSK) symbols are the simplest to detect. Detection of non-equal-energy modulation schemes requires channel estimates in the ML detector and has higher complexity. The present work considers BPSK and QPSK implementations only.
Implementation of a MIMO 2-transmitter, 2-receiver Alamouti system requires the estimation of 4 channels ($\hat{h}_0$, $\hat{h}_1$, $\hat{h}_2$ and $\hat{h}_3$), 2 at each receiver, as shown in Figure 2. In this situation, the combiner yields 2 outputs
$$
\begin{aligned}
\tilde{s}_0 &= \hat{h}_0^{*} r_0 + \hat{h}_1 r_1^{*} + \hat{h}_2^{*} r_2 + \hat{h}_3 r_3^{*} \\
\tilde{s}_1 &= \hat{h}_1^{*} r_0 - \hat{h}_0 r_1^{*} + \hat{h}_3^{*} r_2 - \hat{h}_2 r_3^{*}
\end{aligned} \tag{4}
$$
where $\hat{h}_2$ and $\hat{h}_3$ are channel estimates from the second receiver. In the case of a 2 × 2 Alamouti implementation using PSK signals, only the combiner changes; the ML detector remains the same. As seen from (4), the combiner output $\tilde{s}_0$ is the sum of $\tilde{s}_0$ from receiver 0 and $\tilde{s}_0$ from receiver 1. Likewise, $\tilde{s}_1$ is the sum of $\tilde{s}_1$ from receiver 0 and $\tilde{s}_1$ from receiver 1. Thus a 2 × $M$ Alamouti scheme can be implemented simply by summing the appropriate combiner outputs from $M$ receivers before feeding one ML detector.
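The 2 × $M$ extension can be sketched as a loop over receivers that accumulates the per-receiver combiner outputs of (3), which reproduces (4) for $M = 2$. This is a noise-free floating-point model; the channel values and all names are illustrative assumptions.

```python
def combine_one_receiver(r_a, r_b, h_a, h_b):
    """Equation (3) for one receiver: r_a, r_b are its two received samples,
    h_a, h_b its channel estimates from TX0 and TX1."""
    s0 = h_a.conjugate() * r_a + h_b * r_b.conjugate()
    s1 = h_b.conjugate() * r_a - h_a * r_b.conjugate()
    return s0, s1


def combine_receivers(per_receiver):
    """Sum the combiner outputs over M receivers before a single ML detector."""
    s0_total, s1_total = 0j, 0j
    for r_a, r_b, h_a, h_b in per_receiver:
        s0, s1 = combine_one_receiver(r_a, r_b, h_a, h_b)
        s0_total += s0
        s1_total += s1
    return s0_total, s1_total


# Example 2 x 2 case with arbitrary channels (h0, h1) and (h2, h3), no noise.
s0, s1 = (1 + 1j) / 2 ** 0.5, (1 - 1j) / 2 ** 0.5
channels = [(0.9 + 0.2j, -0.3 + 0.5j), (0.1 - 0.7j, 0.6 + 0.4j)]
inputs = [(ha * s0 + hb * s1,
           -ha * s1.conjugate() + hb * s0.conjugate(),
           ha, hb) for ha, hb in channels]

s0_sum, s1_sum = combine_receivers(inputs)
total_gain = sum(abs(ha) ** 2 + abs(hb) ** 2 for ha, hb in channels)
print(s0_sum / total_gain, s1_sum / total_gain)
```

Each added receiver contributes its own $|h|^2$ terms to the overall gain, while the detector that follows is unchanged.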
In an extended version of Alamouti for 4 transmitters [12], full rate is achieved but the system is half rank (quasi-orthogonal), with some loss in diversity, as the transmitted symbols cannot be fully decoupled. Tarokh’s STBC scheme [13] for 4 transmitters, on the other hand, achieves complete orthogonality at half the full rate. Tarokh’s scheme suffers no loss in diversity, and receiver decoding is simpler as the transmitted symbols can be fully decoupled.
The decoding of the Alamouti encoded signals is a linear process, and our SASRATS receiver system design implements the combiner and maximum likelihood detection on the Xilinx Virtex 2 Pro FPGA board using the Xilinx Integrated System Environment (ISE) Foundation design tool.
At the SASRATS receivers, the I and Q outputs are fed
into a Xilinx University Program Virtex 2 Pro Development
System board based on the Virtex 2 Pro XC2VP30 with
30,816 logic cells. This low cost development board from
Digilent Inc. has four 20-bit wide ports which are ideal for
our 4 receiver system.
The complete design is implemented using a top-down hierarchical schematic entry approach on the Xilinx Integrated System Environment (ISE) Foundation design tool. VHDL code can also be integrated as a block with other schematic components if desired. We have also made extensive use of various Xilinx Core Generator intellectual property (IP) modules incorporated within the ISE Foundation toolset to shorten design cycle time.
IV. IMPLEMENTATION OF THE ALAMOUTI COMBINER
AND ML DECODER
We begin by first describing the overall architecture of
the Alamouti 2 × 1 decoding scheme for QPSK modulated
received symbols as shown in Figure 3.
Figure 3. Block diagram of the Alamouti combiner and maximum likelihood detector implementation on the SASRATS receiver platform
[Schematic not reproduced. Blocks shown: pre-combiner, combiner, maximum likelihood detector (two parallel detector blocks, each with outputs BIT_1 and BIT_0) and output data formatter. Inputs: X_IN, Y_IN, h0_X, h0_Y, h1_X, h1_Y, GLOBAL_CLOCK, DV_CLK_SYS, DV_CLOCK and DV_ENABLE. Output: OUTPUT_BITS.]
The architecture consists of several blocks: the pre-combiner, combiner, ML detector and output data formatter. The inputs into the pre-combiner block consist of 16-bit I and Q data and the channel estimates $\hat{h}_0$ and $\hat{h}_1$, which remain static for the duration of 128 data symbols. On receipt of the Data Valid (DV) pulse from the channel estimator, the pre-combiner circuitry latches $r_0$ and $r_1$ over two symbol periods and calculates the complex conjugates of $\hat{h}_0$, $\hat{h}_1$ and $r_1$ needed in the combiner. This is achieved by performing a two’s complement operation on the imaginary parts of $\hat{h}_0$, $\hat{h}_1$ and $r_1$ using the Xilinx Two’s Complement IP module.
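Conjugation by two's complement can be modelled at the bit level as follows. This is a behavioural sketch only, not the Xilinx Two's Complement core; the 16-bit default width follows the text (the schematic buses are labelled 16:0), and the helper names are hypothetical.

```python
def twos_complement_negate(value, width=16):
    """Negate a two's complement bit pattern: invert all bits, add 1, wrap to width."""
    mask = (1 << width) - 1
    return ((value ^ mask) + 1) & mask


def to_signed(value, width=16):
    """Interpret a width-bit pattern as a signed integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value


def conjugate_fixed(re_bits, im_bits, width=16):
    """Complex conjugate in fixed point: real part unchanged, imaginary part negated."""
    return re_bits, twos_complement_negate(im_bits, width)


re_bits, im_bits = conjugate_fixed(100, 5)
print(re_bits, to_signed(im_bits))
```

One caveat of any two's complement negation is that the most negative code (0x8000 for 16 bits) maps to itself, so a design either tolerates that single case or keeps the inputs away from full negative scale.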
The combiner block shown in Figure 4 calculates $\tilde{s}_0$ and $\tilde{s}_1$. The product terms $\hat{h}_0^{*} r_0$, $\hat{h}_1 r_1^{*}$, $\hat{h}_1^{*} r_0$ and $\hat{h}_0 r_1^{*}$ are first calculated in 4 separate Xilinx Complex Multiplier v2.0 IP blocks. The product terms $\hat{h}_0^{*} r_0$ and $\hat{h}_1 r_1^{*}$ are then summed to compute $\tilde{s}_0$, and $\tilde{s}_1$ is formed by taking the difference between $\hat{h}_1^{*} r_0$ and $\hat{h}_0 r_1^{*}$. The sum and the difference are computed by two appropriately configured Xilinx Adder/Subtracter v7.0 IP cores, respectively.
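The same datapath can be written with separate real and imaginary integers, which is closer to how the products and sums appear in hardware. The sketch below is an integer model under simplifying assumptions: it uses unbounded Python integers with no truncation, scaling or overflow handling, and it does not describe the internals of the Xilinx cores.

```python
def cmul(a, b):
    """Complex multiply on (re, im) integer pairs using four real products."""
    ar, ai = a
    br, bi = b
    return ar * br - ai * bi, ar * bi + ai * br


def conj(a):
    return a[0], -a[1]


def combiner(r0, r1, h0, h1):
    """Four complex products, then one sum (s0) and one difference (s1), as in equation (3)."""
    p0 = cmul(conj(h0), r0)   # h0* r0
    p1 = cmul(h1, conj(r1))   # h1  r1*
    p2 = cmul(conj(h1), r0)   # h1* r0
    p3 = cmul(h0, conj(r1))   # h0  r1*
    s0 = (p0[0] + p1[0], p0[1] + p1[1])
    s1 = (p2[0] - p3[0], p2[1] - p3[1])
    return s0, s1


# Small integer example values (arbitrary).
s0_out, s1_out = combiner(r0=(10, -4), r1=(-7, 5), h0=(3, 1), h1=(2, -1))
print(s0_out, s1_out)
```

A real fixed-point design would also have to decide how many product bits to keep, a detail this sketch deliberately leaves out.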
Figure 4. Block diagram of the combiner
[Schematic not reproduced. Blocks shown: four complex multipliers (inputs ar, ai, br, bi; outputs pr, pi) forming $\hat{h}_0^{*} r_0$, $\hat{h}_1 r_1^{*}$, $\hat{h}_1^{*} r_0$ and $\hat{h}_0 r_1^{*}$, followed by adder and subtractor stages. Outputs: RE_S0, IMG_S0, RE_S1 and IMG_S1.]
The outputs $\tilde{s}_0$ and $\tilde{s}_1$ are then fed into the maximum likelihood (ML) detector processing block. The ML block consists of 2 parallel and independent sets of Euclidean distance calculators and minimum distance comparators, as shown in Figure 3, where the decision statistics $\tilde{s}_0$ and $\tilde{s}_1$ are processed independently.
Figure 5. Block diagram of the squared Euclidean distance block in the ML detector
[Schematic not reproduced. Stages shown: symbol difference, squarer and adder, forming a parallel squared Euclidean distance calculator. Inputs: RE_Si, IMG_Si. Outputs: SYMBOL_A (00), SYMBOL_B (01), SYMBOL_C (10) and SYMBOL_D (11).]
The Euclidean distance calculator block shown in Figure 5 first calculates, in parallel, the difference between the symbol decision statistic and 4 prestored QPSK symbols ($\pm 0.707 \pm j0.707$). The real and imaginary parts of each difference are then squared and added together. The 4 squared Euclidean distance outputs (A, B, C and D) are fed into the minimum Euclidean distance comparator block shown in Figure 6.
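The difference, square and add stages can be modelled as below. The assignment of constellation points to the labels A to D is an assumption made for illustration; the text gives the four points and the labels but not which point carries which label.

```python
# Prestored QPSK symbols as (re, im); the label-to-point mapping is hypothetical.
QPSK_POINTS = {
    "A": (0.707, 0.707),
    "B": (-0.707, 0.707),
    "C": (0.707, -0.707),
    "D": (-0.707, -0.707),
}


def squared_distances(re_s, im_s):
    """Squared Euclidean distance from the decision statistic to each stored symbol."""
    distances = {}
    for label, (re_p, im_p) in QPSK_POINTS.items():
        d_re = re_s - re_p                     # symbol difference, real part
        d_im = im_s - im_p                     # symbol difference, imaginary part
        distances[label] = d_re * d_re + d_im * d_im  # squarer and adder
    return distances


d = squared_distances(0.707, 0.707)
print(d)
```

In the FPGA the four distances are computed concurrently, whereas this model evaluates them one after another and returns them together.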
Figure 6. Block diagram of the minimum distance comparator block in the ML detector
[Schematic not reproduced. Blocks shown: six COMPMC16 magnitude comparators with GT and LT outputs, comparing the pairs (A,B), (A,C), (C,D), (B,D), (B,C) and (A,D); four AND3 gates; four FDCE_1 output latches (CHOOSE_A to CHOOSE_D); OR gates producing the QPSK decoded bits BIT_0 and BIT_1.]
The minimum Euclidean distance comparator is implemented using 6 two-input magnitude comparators. There are two outputs ($x > y$, $x < y$) from each comparator. The outputs from the various comparators are AND’ed together and latched based on the following minimum magnitude selection criterion, which chooses $A$ iff $(A < B)$ & $(A < C)$ & $(A < D)$, $B$ iff $(A > B)$ & $(B < C)$ & $(B < D)$, $C$ iff $(A > C)$ & $(B > C)$ & $(C < D)$, and $D$ iff $(A > D)$ & $(B > D)$ & $(C > D)$. Only one of the four outputs goes high when the criterion is met. The latched outputs are then fed into two OR gates to decode the estimated QPSK symbol into bits. Thus each minimum distance comparator, one for $\tilde{s}_0$ and one for $\tilde{s}_1$, has one 2-bit output which represents either 00, 01, 10 or 11.
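The selection criterion and bit decoding can be modelled with plain Boolean logic. The sketch below follows the strict inequalities given in the text and assumes the labelling A = 00, B = 01, C = 10, D = 11 with BIT_1 as the most significant bit; the exact wiring of the two OR gates is inferred from that labelling rather than stated in the text.

```python
def min_distance_select(a, b, c, d):
    """Return the one-hot choose flags (A, B, C, D) and the decoded (BIT_1, BIT_0)."""
    choose_a = (a < b) and (a < c) and (a < d)
    choose_b = (a > b) and (b < c) and (b < d)
    choose_c = (a > c) and (b > c) and (c < d)
    choose_d = (a > d) and (b > d) and (c > d)
    bit_1 = choose_c or choose_d   # set for symbols 10 and 11 (assumed mapping)
    bit_0 = choose_b or choose_d   # set for symbols 01 and 11 (assumed mapping)
    return (choose_a, choose_b, choose_c, choose_d), (int(bit_1), int(bit_0))


flags, bits = min_distance_select(5, 1, 7, 9)
print(flags, bits)
```

Because every comparison is strict, an exact tie between the two smallest distances raises none of the four flags in this model, and the decoded bits then default to 00.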
The output data formatter places the bit estimates of $s_0$ and $s_1$ in the correct time position, resulting in a continuous serial bit output which can be stored and checked against the original serial bit stream sent at the transmitter for bit error rate measurements. The system outputs 4 bits for every pair of QPSK symbols received.
The Alamouti 2 × 2 decoding scheme on our testbed is implemented by duplicating the pre-combiner and combiner blocks for the second receiver. The combiner outputs of both receivers are summed together in a multi-receiver summer block, as defined by (4) and shown in Figure 7, to form the new combined outputs $\tilde{s}_0$ and $\tilde{s}_1$ prior to ML detection. The same process is repeated for the Alamouti 2 × 3 and 2 × 4 schemes. In all cases, only one ML detector block is needed.
The same modular approach can be used to implement
Tarokh’s 4× 1 orthogonal STBC [13] with some extensions
to the receiver pre-combiner, combiner and ML detector
design. Applying Tarokh’s theory of complex generalised
orthogonal designs [13] to a 4 × 4 scheme for example,
requires the pre-combiner to store sets of 8 received symbols
and 4 channel estimates per receiver prior to combining, ML
detection and estimation of 4 symbols. Implementation is
beyond the scope of this paper, but is straightforward.
Figure 7. Block diagram of the 2 × 2 Alamouti implementation
[Schematic not reproduced. Blocks shown: pre-combiner and combiner for RX_0 and for RX_1; multi-receiver summer with overflow output, producing S0_X, S0_Y, S1_X and S1_Y; maximum likelihood detector; output data formatter. Output: OUTPUT_BITS.]
V. EXPERIMENTAL VERIFICATION OF THE ML
DETECTOR
Figure 8. SASRATS setup with HP11759B for channel estimation verification
[Diagram not reproduced. Labels shown: digital transmitters TX0 and TX1; HP 11759B radio frequency channel simulator; combiner; digital receiver RX0; Xilinx Virtex 2 Pro development board; NIDAQ input; computer.]
The first experiment to test the operation of the ML detector is performed on a 2 × 1 setup of the SASRATS platform as shown in Figure 8. TX0 and TX1 transmit the time-multiplexed GSM preambles TSC 0 and TSC 1, respectively, between data frames at 915 MHz. The two transmitters are programmed to transmit Alamouti space-time encoded data during the data frame. The modulation is BPSK and the symbol rate is 1500 kbaud. In this experiment, a Hewlett Packard 11759B radio frequency channel simulator is programmed to generate two independent, uncorrelated Rayleigh flat fading channels. 1048 kbit estimates from the output of the ML detector are captured by the NIDAQ card and compared with the actual transmitted bits in Matlab. Alamouti [4] assumes that the channels are constant across two consecutive symbols, but in a practical implementation this requirement is difficult to meet. In our experiment, the channel is assumed constant across the entire data frame of 128 symbols. We have verified in Matlab that, at an average SNR of 25 dB, there are virtually no errors, which confirms the correct operation of the entire system on the FPGA. The experiment was repeated with QPSK modulation of the data symbols, with similar results.
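The comparison of captured and transmitted bits reduces to counting mismatches. The authors performed this step in Matlab; the Python sketch below is an equivalent illustration with made-up bit sequences, not their script or their data.

```python
def bit_error_rate(sent_bits, received_bits):
    """Fraction of positions where the detected bit differs from the transmitted bit."""
    if len(sent_bits) != len(received_bits):
        raise ValueError("bit streams must have the same length")
    if not sent_bits:
        raise ValueError("bit streams must not be empty")
    errors = sum(s != r for s, r in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


# Made-up example sequences (not measurement data).
sent = [0, 1, 1, 0, 1, 0, 0, 1]
detected = [0, 1, 1, 0, 1, 1, 0, 1]
print(bit_error_rate(sent, detected))
```

In practice the captured stream also has to be time-aligned with the reference stream before counting, a step this sketch assumes has already been done.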
To test operation with more than one receiver, the SASRATS platform is reconfigured into a 2 × 2 MIMO system with the HP11759B removed, as the channel simulator cannot generate more than 2 Rayleigh fading channels. In the lab, 4 antennas spaced sufficiently apart are connected to the transmitter outputs and receiver inputs. It was found that under these conditions, the channels at 915 MHz are highly correlated and experience almost no fading. We are able to process the information from the FPGA to confirm operation of the ML space-time decoder. At a high SNR of 30 dB, there are no errors with either BPSK or QPSK modulated symbols, despite the highly correlated channels. We have also tested the SASRATS platform configured as a 2 × 4 Alamouti MIMO system, with excellent performance.
VI. CONCLUSIONS
We have described the implementation of a real time
maximum likelihood Alamouti decoder for use on our
MIMO platform implemented on an FPGA using the Xilinx
ISE tool and Core Generator IP modules. We have also
experimentally verified the operation of the decoder in a
closed Alamouti 2×1 diversity scheme using an RF channel
simulator and also in an open 2×2 and 2×4 antenna based
system under correlated channel conditions.
REFERENCES
[1] P. Green and D. Taylor, “Smart antenna software radio test system,” Proceedings of the First IEEE International Workshop on Electronic Design, Test and Applications, vol. 1, pp. 68–72, Jan. 2002.
[2] P. Green and D. Taylor, “Experimental verification of space-time algorithms using the smart antenna software radio test system (SASRATS) platform,” Personal, Indoor and Mobile Radio Communications, 2004. PIMRC 2004. 15th IEEE International Symposium on, vol. 4, pp. 2539–2544, 2004.
[3] P. Green and D. Taylor, “Implementation of a high speed four transmitter space-time encoder using field programmable gate array and parallel digital signal processors,” Proceedings of the Third IEEE International Workshop on Electronic Design, Test and Applications, pp. 466–471, Jan. 2006.
[4] S. Alamouti, “Space block coding: A simple transmitter diversity technique for wireless communications,” IEEE J. Select. Areas. Communication, vol. 16, pp. 1451–1458, Oct. 1998.
[5] D. Gesbert et al., “From theory to practice: An overview of MIMO space-time coded wireless systems,” IEEE Journal on Selected Areas in Communications, vol. 21, pp. 281–302, Apr. 2003.
[6] R. M. Rao et al., “Multi-antenna testbeds for research and education in wireless communications,” IEEE Communications Magazine, vol. 42, no. 12, pp. 72–81, 2004.
[7] S. Caban et al., “Vienna MIMO testbed,” EURASIP Journal on Applied Signal Processing, vol. 54868, pp. 1–13, 2006.
[8] P. F. P. Murphy and C. Dick, “An FPGA implementation of Alamouti’s transmit diversity technique,” University of Texas WNCG Wireless Networking Symposium, Oct. 2003.
[9] P. Green and D. Taylor, “Implementation of four real-time software defined receivers and a space-time decoder using Xilinx Virtex 2 Pro field programmable gate array,” Proceedings of the Third IEEE International Workshop on Electronic Design, Test and Applications, pp. 89–92, Jan. 2006.
[10] P. Green and D. Taylor, “Implementation of a real-time multiple input multiple output channel estimator on the smart antenna software radio test system platform using the Xilinx Virtex 2 Pro field programmable gate array,” Proceedings of the 2006 IEEE International Conference on Field Programmable Technology, pp. 257–260, Dec. 2006.
[11] ETSI/GSM, “Multiplexing and multiple access on the radio path,” GSM Recommendations Document 05.02 Version 3.8, Dec. 1995.
[12] M. Rupp and C. Mecklenbrauker, “On extended Alamouti schemes for space-time coding,” Wireless Personal Multimedia Communications, 2002. The 5th International Symposium on, vol. 1, pp. 115–119, Oct. 2002.
[13] V. Tarokh, H. Jafarkhani, and A. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456–1467, 1999.
