A Study on Multi-User MIMO Wireless Communication Systems
Author: Tran Thi Thao Nguyen
Alternative title: A Study on Multi-User MIMO Wireless Communication Systems (Japanese: マルチユーザMIMO無線通信システムに関する研究)
Degree conferral year: Heisei 28 (fiscal year 2016)
Degree conferral number: 17104甲情工第325号
URL http://hdl.handle.net/10228/00006318
Contents
1 Introduction
1.1 Background
1.2 Research Objectives
1.3 Thesis Hierarchy
2 Multi-User MIMO Wireless System Overview
2.1 Overview
2.2 Multi-User Protocol
2.3 Multi-User Transmission System
2.3.1 Channel Emulator
2.3.2 IDMA System
2.4 Summary
3 Multi-User MIMO Channel Emulator with Automatic Sounding Feedback
3.1 Introduction
3.2 MU-MIMO Channel Model
3.2.1 General MU-MIMO Channel Model
3.2.2 Statistical Model
3.2.3 Feedback Delay
3.3 Hardware Platform Implementation
3.3.1 Design of Functional Blocks
3.3.2 Gaussian Random Number Generator
3.3.3 Doppler Filter
3.3.4 Spatial Correlation Block
3.3.5 Rician Fading Block
3.3.6 FPGA Implementation
3.4 Measurement Results
3.4.1 Statistical Verification
3.4.2 Feedback Delay Verification
3.4.3 Platform Verification
3.5 Synthesis Results of Proposed Channel Emulator
3.6 Summary
4 Higher Order QAM Modulation for Uplink MU-MIMO IDMA Architecture
4.1 Introduction
4.2 System Overview
4.3 Iterative Chip-By-Chip Receiver
4.3.1 Elementary Signal Estimator
4.3.2 Extrinsic LLR Calculation
4.3.3 Interleaver
4.3.4 Antenna Diversity
4.3.5 Soft mapper
4.4 Simulation Results of QAM IDMA System
4.5 Complexity Comparison between SCM and QAM Modulation
4.6 Summary
5 Interleaved Domain Interference Canceller for Low Latency IDMA System
5.1 Introduction
5.2 Latency Analysis
5.3 Proposed Interleaved Domain Architecture
5.4 Implementation of Proposed Architecture
5.4.1 Conventional Architecture
5.4.2 Proposed Architecture
5.5 FPGA Implementation Results of Interleaved Domain IDMA Receiver
5.5.1 Simulation Results of Interleaved Domain IDMA Receiver
5.5.2 Synthesis Results of Interleaved Domain IDMA Receiver
5.6 Summary
6 Conclusions and Future Works
6.1 Conclusions
6.2 Future Works
A Snapshots of the Designs
Bibliography
List of Tables
3.1 Channel Emulator Specification
3.2 Simulation Parameters
3.3 Platform Verification Parameters
3.4 Synthesis Result of Feedforward Channel vs. Feedforward and Feedback Channel
4.1 Simulation Parameter of Higher Order QAM IDMA System
4.2 Complexity Comparison between SCM and QAM Modulation
5.1 Summary of Latency
5.2 Input/Output Port Parameters
5.3 Simulation Parameters
5.4 Comparison of Architectures
5.5 Synthesis Comparisons
5.6 Synthesis Results (Xilinx Virtex 6 240TFF784)
List of Figures
1.1 Multi-user transmission for a dense network
1.2 Standard development
1.3 Thesis hierarchy
2.1 MU transmission
2.2 UL-MU MAC Protocol in IEEE802.11ax
2.3 MU communication systems
2.4 Channel sounding procedure
2.5 IDMA transceiver with N users
3.1 MIMO fading coefficient generator structure
3.2 MU-MIMO channel emulator
3.3 CSI feedback protocol
3.4 Feedback mechanism in conventional channel emulator platform [20]
3.5 Feedback mechanism in proposed channel emulator platform
3.6 Flexible feedback delay adjustment
3.7 MIMO fading coefficient generator structure
3.8 Single path processing
3.9 AWGN generator
3.10 Doppler filter block
3.11 IEEE 802.11ac evaluation platform
3.12 Channel spectrum for 4x4 model D TGac
3.13 Channel capacity for 4x4 model D TGac
3.14 Snapshot of the feedback channel output
3.15 BER performance of IEEE 802.11ac system
3.16 Overview of the MU beamforming process
3.17 Platform implementation of MU beamforming process
3.18 EVM and constellation of the proposed system
4.1 Transceiver IDMA system with N users in one antenna k=1
4.2 16-QAM constellation in IDMA system
4.3 Mapping table of higher order QAM modulation
4.4 IDMA system with antenna diversity
4.5 Multiuser detection algorithm
4.6 Performance of SCM-QPSK and 16-QAM modulation with one antenna
4.7 Performance of Higher order QAM modulation with two antennas
4.8 Performance in mixed modulation for IDMA system
5.1 Conventional architecture of IDMA receiver
5.2 Proposed architecture of IDMA receiver
5.3 Flow chart of the conventional architecture
5.4 Flow chart of the proposed architecture
5.5 Architecture of the proposed interleaved domain architecture using dual-port RAM
5.6 Timing chart of the proposed architecture
5.7 BER performance of the proposed system vs SNR
5.8 Latency of the IDMA system vs iteration
5.9 Latency evaluations of the conventional architecture and the proposed architecture
A.1 MU-MIMO channel emulator for 4x4 antenna and 35 taps
A.2 MU-MIMO channel emulator with sounding feedback
A.3 MU-MIMO channel emulator evaluation by using oscilloscope
A.4 Spatial correlation block of MU-MIMO channel emulator
A.5 Rician block of MU-MIMO channel emulator
Abbreviations
5G 5th Generation
ADC Analog-to-Digital Converter
AP Access Point
APP A Posteriori Probability
AWGN Additive White Gaussian Noise
BER Bit Error Rate
BICM Bit-Interleaved Coded Modulation
BPSK Binary Phase Shift Keying
CDMA Code Division Multiple Access
CSI Channel State Information
CSMA/CA Carrier Sense Multiple Access with Collision Avoidance
DAC Digital-to-Analog Converter
DL Downlink
ESE Elementary Signal Estimator
FDMA Frequency Division Multiple Access
FEC Forward Error Correction
FFT Fast Fourier Transform
FPGA Field Programmable Gate Array
ICI Inter Carrier Interference
IDMA Interleave Division Multiple Access
ISI Inter-Symbol Interference
LLR Log-Likelihood Ratio
LOS Line Of Sight
LPF Low Pass Filter
LTE Long Term Evolution
LUT Look Up Table
MAC Media Access Control
MRC Maximal Ratio Combining
MU Multi-User
MU-BF Multi-User Beamforming
MUD Multi-User Detection
MU-MIMO Multi-User Multi-Input Multi-Output
NDP Null Data Packet
NDPA Null Data Packet Announcement
NLOS Non-Line-Of-Sight
NOMA Non-Orthogonal Multiple Access
OFDMA Orthogonal Frequency Division Multiple Access
OMA Orthogonal Multiple Access
PDP Power Delay Profile
PHY Physical
PSD Power Spectral Density
PSDU Physical Layer Service Data Unit
QAM Quadrature Amplitude Modulation
QPSK Quadrature Phase Shift Keying
RAM Random-Access Memory
RX Receiver
SCM Superposition Coded Modulation
SIFS Short Interframe Space
SMC Simulink Model Compiler
SOC System On Chip
STA Station
SU Single-User
TDMA Time Division Multiple Access
TF-R Trigger Frame for Random Access
TGac Task Group ac
TX Transmitter
UL Uplink
URNG Uniform Random Number Generator
VHT Very High Throughput
Symbols
N  Number of users
H  Channel coefficient matrix
L  Number of multipaths
t  Number of time slots
M  Number of transmitter antennas
R  Number of receiver antennas
R  Channel correlation
S(f)  Doppler power spectrum
f_d  Doppler frequency
T_d  Feedback delay duration
Samp Rate  Sampling rate
f_serial  Serial processing frequency
Chan Forward  Number of feedforward channel coefficients
Num PDPtaps  Number of PDP taps
Chan Coef  Number of feedforward and feedback channel coefficients
f_MAXuniform  Maximum frequency with uniform random generators
U  Number of uniform random generators added
a_0  Denominator coefficients
b_0  Numerator coefficients
f_s  Normalizing frequency
H_l^{iid}  Independent and identically distributed (i.i.d.) matrix
C  Cholesky decomposition matrix
P  Overall power of channel
K  Rician K-factor
H_LOS  LOS matrix
H_Rayleigh  Rayleigh matrix
x_n  Transmitted signal of the n-th user
d_n  Data length of the n-th user
c_n  Chip sequence of the n-th user
x_{n,k}  Symbol sequence of the n-th user and k-th antenna
J  Frame length
K  Number of transmitter antennas for each user
r_k  Received signal
x_{n,k}^{Real}  Real part of symbol sequence
x_{n,k}^{Img}  Imaginary part of symbol sequence
a_k  Complex zero-mean AWGN with variance σ^2
y_k  Received signal after OFDM demodulation
[?]_{n,k}  Sum of interference from other users and AWGN noise
H*_{n,k}  Conjugate of H_{n,k}(j)
ỹ_{n,k}  Received signal with the conjugate
~[?]_{n,k}  Sum of interference from other users and AWGN noise with the conjugate
[?](x_{n,k})  Output of ESE processing
E(~[?]_{n,k})  Mean of the interference
E(y_k)  Mean of the received signal
E(x_{n,k})  Mean of the transmitted signal
Var(~[?]_{n,k})  Variance of the interference
Var([?]_{n,k})  Variance of the interference without the conjugate
Var(y_k)  Variance of the received signal
Var(x_{n,k})  Variance of the transmitted signal
ĝ_{n,k}  Estimated symbol
b̂_{n,k}^{Real}  Estimated bit in real part
b̂_{n,k}^{Img}  Estimated bit in imaginary part
ĉ_{n,k}  Estimated chip sequence
v  Half of the number of bits per symbol
[?]  A point in the constellation diagram
π_n^{-1}  Deinterleaving for the n-th user
π_n  Interleaving for the n-th user
â_{n,k}  Despread output
c̃_{n,k}  Spread output
[?]_{n,k}(j)  Extrinsic LLRs
N_c  Number of sub-carriers
Ctrl  Sum of soft mapper delay and the ESE delay
SP  Spreading length
I  Number of interference cancellation iterations
ID  Index number of RAM
w_ena  Write enable of RAM
N_b  Number of data bits
W_d  Bit length in fixed-point operation
F  Clock frequency
Summary
In recent years, Multi-User Multi-Input Multi-Output (MU-MIMO) transmission has become a very important technique for improving the efficiency of wireless communication systems. MU-MIMO transmission allows multiple users to communicate simultaneously, which enhances system performance. Because of this, MU-MIMO systems have been incorporated into the current generation of wireless system standards.
Current MU-MIMO transmission schemes employ orthogonality in one way or another. For example, Space-Division Multiple Access (SDMA), introduced in 802.11ac, avoids interference by applying a spatial precoding matrix before transmission. On the other hand, Orthogonal Frequency Division Multiple Access (OFDMA) avoids interference by scheduling users in separate frequency resource units. The next generation of MU-MIMO transmission works in a completely non-orthogonal way, which further increases the system throughput because the control packets needed for user orthogonalization are no longer required.
Non-orthogonal multiple access (NOMA) has been proposed for Long Term Evolution (LTE) and is envisioned to be an essential component of the 5th Generation (5G) mobile network. Interleave Division Multiple Access (IDMA) is one of the NOMA techniques that can support multiple access for a large number of users in the same bandwidth. IDMA has several other advantages over multiple access schemes such as OFDMA and Code Division Multiple Access (CDMA), including higher spectral efficiency and insensitivity to clipping distortion. However, conventional IDMA has problems that must be considered, namely latency and hardware complexity. In addition, the theoretical improvements of IDMA are still unverified in practice, so experimental tests are needed to verify that all parts of the system work properly.
This thesis presents contributions to make IDMA systems applicable for future MU-MIMO communication systems.
• First, we present an MU-MIMO channel emulator that is indispensable not only for testing the ideas proposed in this thesis regarding MU-MIMO transmission but also for the experimental validation of current wireless communication systems.
• Second, we propose a novel interleaved domain IDMA architecture applicable to current wireless communication standards. The proposed architecture reduces the latency of interference cancellation by half, which doubles the throughput.
• In addition, to further improve the proposed IDMA system in terms of throughput and receiver complexity, we propose the use of higher order quadrature amplitude modulation (QAM), which increases throughput by simply changing the Log-Likelihood Ratio (LLR) calculation, without adding parallel IDMA cancellation processing chains.
Chapter 1
Introduction
1.1 Background
In high density wireless local area network (WLAN) environments, in which many users are present in a specific area, the collision probability of data transmission is high. As a result, the effective system throughput is severely decreased by collisions among the stations accessing the wireless channel simultaneously. In Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA), transmission by hidden nodes causes severe interference, i.e. collision, to an on-going transmission [3]. Wireless multiple access techniques that support a large number of users are being considered to address these problems. There have been significant advances in multiuser (MU) techniques for wireless communication over the last ten years. Fig. 1.1 shows the volume of public WLAN users from 2011 to 2016. As shown in the figure, the ever increasing number of users can only be supported by an efficient MU transmission based system.
MU transmission techniques can be distinguished by the resource they use to separate users: frequency, time, code, or power. These MU techniques are now being introduced in several new generation wireless standards (e.g., the fifth generation (5G) [1], 802.11ax [2]) as shown in Fig. 1.2. Next generation systems require high transmission data rates, low latency and low complexity. Furthermore, there is a growing concern about user fairness. From the system point of view, customers who pay the same charges for the same service expect the same quality of service (QoS). Future standards therefore also need to focus more on fairness to satisfy the customer.
Figure 1.1: Multi-user transmission for a dense network
Figure 1.2: Standard development
To satisfy these requirements, enhanced technologies are needed. Among the potential candidates, non-orthogonal multiple access (NOMA) is a key technology for enhancing the performance of next generation wireless communications. Orthogonal frequency division multiple access (OFDMA) is a well-known high-capacity orthogonal multiple access (OMA) technique, whereas NOMA offers a set of desirable benefits, including greater spectrum efficiency and the ability to support a large number of users. There are different types of NOMA techniques, including power-domain and code-domain. In power-domain NOMA multiplexing, multiple users are superimposed with different power gains, which causes a problem of user unfairness. Interleave Division Multiple Access (IDMA) is one of the code-domain NOMA techniques. IDMA is a special form of Code Division Multiple Access (CDMA) in which the receiver differentiates each station (STA) by its unique interleaving pattern instead of a unique spreading code. Compared to OFDMA and power-domain NOMA, IDMA allows multiple users to transmit at the same time and frequency without strict requirements on different frequencies and powers. Because of these advantages, this thesis studies how to improve current IDMA transceiver systems and their suitability for practical implementation.
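The idea of separating stations by interleaving pattern rather than by spreading code can be illustrated with a minimal sketch. It assumes a repetition spreader shared by all stations and seeded pseudo-random permutations as interleavers; the function names, the spreading length of 8 and the seeds are illustrative and are not taken from the thesis.

```python
import random

def make_interleaver(length, seed):
    """Return a pseudo-random permutation of range(length) for one station."""
    perm = list(range(length))
    random.Random(seed).shuffle(perm)
    return perm

def spread(bits, sp):
    """Repetition-spread each bit into sp chips using an alternating +1/-1 mask."""
    chips = []
    for b in bits:
        s = 1 - 2 * b  # bit 0 -> +1, bit 1 -> -1
        chips.extend(s * (1 if i % 2 == 0 else -1) for i in range(sp))
    return chips

def interleave(chips, perm):
    """Output position i carries the chip from input position perm[i]."""
    return [chips[p] for p in perm]

def deinterleave(chips, perm):
    """Inverse of interleave() for the same permutation."""
    out = [0] * len(chips)
    for i, p in enumerate(perm):
        out[p] = chips[i]
    return out

def despread(chips, sp):
    """Correlate each block of sp chips with the mask and take a hard decision."""
    bits = []
    for start in range(0, len(chips), sp):
        block = chips[start:start + sp]
        acc = sum(c * (1 if j % 2 == 0 else -1) for j, c in enumerate(block))
        bits.append(0 if acc >= 0 else 1)
    return bits

bits = [0, 1, 1, 0]
chips = spread(bits, sp=8)                     # same spreader for every station
perm_a = make_interleaver(len(chips), seed=1)  # station A
perm_b = make_interleaver(len(chips), seed=2)  # station B
tx_a = interleave(chips, perm_a)
print(despread(deinterleave(tx_a, perm_a), sp=8))  # decoded with A's own pattern
print(despread(deinterleave(tx_a, perm_b), sp=8))  # decoded with B's pattern
```

Only the permutation differs between the two stations in this sketch, which is the property the receiver relies on to tell them apart. A full IDMA receiver would add iterative interference cancellation on top of it.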
To apply enhanced systems to future standards, a wireless channel emulator is important for testing them. The wireless channel dictates the transmitter architecture, the transmission rate, and the receiver architecture. In MU wireless communication, the transmitted signals are attenuated by fading due to multipath propagation and by shadowing due to large obstacles in the signal path, which poses a fundamental challenge for reliable communication. This thesis focuses on the field programmable gate array (FPGA) implementation of an MU communication system, so an MU channel emulator is indispensable. The thesis proposes an MU multi-input multi-output (MU-MIMO) channel emulator with automatic sounding feedback. The feedback channel coefficients are separated from the feedforward channel coefficients by a programmable time duration. This programmability allows a thorough evaluation of the Doppler effect in MU transmission.
In previous studies of IDMA systems [4]-[7], the authors suggested the use of BPSK and QPSK modulation. The purpose of this thesis is to improve the spectral efficiency of IDMA transmission by proposing a low complexity higher order quadrature amplitude modulation (QAM) scheme for IDMA systems.
The main problem that needs to be addressed in designing an IDMA system is the latency caused by the interleaving process. In the interleavers proposed in the published literature, both the interleaving and de-interleaving operations permute sequences serially, which takes many hardware clock periods and leads to high processing latency and low processing throughput. This has been the bottleneck of the system throughput, especially when the number of iterations is large. Since the interference cancellation updates the extrinsic log likelihood ratios (LLRs) using the LLR values of the previous iteration, the iterations cannot be processed in parallel to hasten the interference cancellation, so reducing the latency of each iteration has a significant effect. Latency is particularly important because it has to meet a strict requirement. For example, recent 802.11 systems define a short interframe space (SIFS) of 16 µs, within which a wireless interface must process a received frame and respond with a response frame. In a practical IDMA system, however, each iteration of the interference cancellation consists of an interleaving and a deinterleaving process, which would make the latency much higher than the defined SIFS. This problem hinders the development of IDMA systems in practice. The thesis proposes a novel architecture for IDMA systems. The architecture calculates the updated extrinsic LLRs to detect multiple users in the interleaved domain, without a deinterleaver in the interference cancellation iteration. As a result, the proposed architecture increases the throughput by almost a factor of two and reduces the latency by almost half without increasing the complexity, which makes IDMA more feasible for practical implementation.
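A rough back-of-the-envelope model shows why removing one permutation pass per iteration matters against the SIFS budget. It is not the latency analysis of this thesis: it assumes that each serial interleaving or deinterleaving pass costs one clock cycle per chip and ignores every other delay. The frame length, clock frequency and iteration count are hypothetical placeholders.

```python
SIFS_US = 16.0  # SIFS quoted in the text, in microseconds

def cancellation_latency_us(frame_len, clock_mhz, passes_per_iteration, iterations):
    """Latency of iterative cancellation when every serial permutation pass
    costs one clock cycle per chip (an assumed cost model)."""
    cycles = frame_len * passes_per_iteration * iterations
    return cycles / clock_mhz  # clock_mhz equals cycles per microsecond

# Hypothetical figures: 1024-chip frame, 100 MHz clock, 4 iterations.
# Two passes per iteration: interleave and deinterleave.
conventional = cancellation_latency_us(1024, 100.0, passes_per_iteration=2, iterations=4)
# One pass per iteration: the deinterleaver is taken out of the loop.
interleaved_domain = cancellation_latency_us(1024, 100.0, passes_per_iteration=1, iterations=4)

print(f"conventional:       {conventional:.2f} us")
print(f"interleaved domain: {interleaved_domain:.2f} us")
print(f"ratio:              {conventional / interleaved_domain:.1f}")
print(f"SIFS budget:        {SIFS_US:.2f} us")
```

Under this simplified model the latency scales linearly with the number of permutation passes, so halving the passes halves the latency regardless of the placeholder values chosen.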
With these contributions, the implementation of an MU communication system such as IDMA becomes possible for future wireless systems.
1.2 Research Objectives
The target of this thesis is to make IDMA systems applicable to future wireless standards, which requires meeting the following objectives:
• An implementation of an MU-MIMO channel emulator for testing not only the IDMA system but also current MU wireless systems.
• A low complexity and high throughput IDMA system.
• A low latency IDMA system which can meet the requirements of future wireless standards.
The proposed MU-MIMO channel emulator is capable of automatically sending channel feedback, taken from the generated channel coefficients, to the access point after a programmable time duration. This function is used for MU beamforming features such as those in IEEE 802.11ac. A low complexity design of the MIMO channel emulator with a single path implementation for all MIMO channel taps is also considered. A single path design allows all elements of the MIMO channel matrix to share one Gaussian noise generator, Doppler filter, spatial correlation block and Rician fading emulator, which minimizes the hardware complexity. In addition, the single path implementation allows the feedback channel output to be added with only a few additional non-sequential elements, which would otherwise double in a parallel implementation.
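The processing chain named above can be sketched in software for a single tap. The sketch assumes the commonly used formulation in which i.i.d. complex Gaussian samples are correlated with the Cholesky factor C of a correlation matrix and then mixed with a fixed LOS term according to the Rician K-factor. The Doppler filter is omitted for brevity, and the 2x2 correlation matrix, the K-factor of 3 and all function names are illustrative assumptions rather than values from the thesis.

```python
import math
import random

def cholesky(r):
    """Lower-triangular C with C * C^T == r, for a real symmetric positive-definite r."""
    n = len(r)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                c[i][j] = math.sqrt(r[i][i] - s)
            else:
                c[i][j] = (r[i][j] - s) / c[j][j]
    return c

def complex_gaussian(rng):
    """Unit-variance circular complex Gaussian sample (Rayleigh-distributed magnitude)."""
    return complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))

def correlate(c, h_iid):
    """Apply spatial correlation: h = C * h_iid."""
    return [sum(c[i][k] * h_iid[k] for k in range(len(h_iid))) for i in range(len(c))]

def rician(h_los, h_rayleigh, k_factor, power=1.0):
    """Mix a fixed LOS term with a Rayleigh term according to the Rician K-factor."""
    a = math.sqrt(k_factor / (k_factor + 1.0))
    b = math.sqrt(1.0 / (k_factor + 1.0))
    return [math.sqrt(power) * (a * l + b * r) for l, r in zip(h_los, h_rayleigh)]

rng = random.Random(0)
C = cholesky([[1.0, 0.5], [0.5, 1.0]])                 # assumed correlation matrix
h_iid = [complex_gaussian(rng) for _ in range(2)]      # Gaussian generator stage
h_rayleigh = correlate(C, h_iid)                       # spatial correlation stage
h = rician([1 + 0j, 1 + 0j], h_rayleigh, k_factor=3.0) # Rician fading stage
print(h)
```

In a single path design the same four stages would be reused serially for every element of the channel matrix and every tap, instead of being instantiated once per coefficient.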
Previous works proposed systems based on Superposition Coded Modulation (SCM), where multiple layers of BPSK or QPSK modulated symbols are transmitted simultaneously to achieve spectrally efficient transmission in IDMA systems. However, this method has very high complexity because of the large number of streams that must be separated in the multi-user detection of the receiver. Instead of SCM, this thesis employs QAM modulation up to 256-QAM for high spectral efficiency transmission. The thesis presents a receiver architecture using a soft demapper which significantly decreases the receiver detection complexity. Although the maximum number of users that can be accommodated in the proposed system is slightly lower than in the conventional one, the proposed system is much better suited to modern multi-mode transceivers. Moreover, it requires only about 25% of the complexity of SCM-QPSK.
One of the problems in the hardware implementation of IDMA is its high latency due to iterative processing. The thesis proposes a novel architecture for the IDMA receiver with low latency while maintaining low complexity. The results show that the proposed architecture reduces the latency by about half and roughly doubles the throughput compared to the conventional architecture.
Figure 1.3: Thesis hierarchy
1.3 Thesis Hierarchy
Fig. 1.3 shows the hierarchy of this thesis, which has six chapters. This first chapter is the introduction. The remaining chapters are as follows:
Chapter 2. Multi-User Wireless System Overview
This chapter gives a general introduction to MU wireless communication systems. It briefly introduces the current techniques for multiple access systems and then points out the advantages of IDMA systems, such as high spectral efficiency and user fairness. Overviews of the IDMA system and of the MIMO channel emulator used for testing are also given in this chapter.
Chapter 3. Multi-User Channel Emulator System with Automatic Sounding Feedback
This chapter focuses on the channel emulator for MU wireless systems and the automatic sounding feedback channel. First, it describes the MU-MIMO wireless channel emulator and the feedback delay. Then, it presents the hardware implementation of the proposed channel emulator and the measurement results.
Chapter 4. Higher Order QAM Modulation for Uplink MU IDMA Architecture
This chapter presents the proposed higher order modulation IDMA system, which includes iterative multi-user detection with a simplified soft bit computation. A complexity comparison and simulation results for the QAM-IDMA system and the superposition coded modulation IDMA system are given to clarify the effectiveness of the proposed QAM-IDMA system.
Chapter 5. Interleaved Domain Interference Canceller for Low Latency IDMA System
This chapter describes the proposed interleaved domain architecture, which reduces the latency to almost half, effectively doubling the throughput with almost the same hardware utilization. The details of the implementation of the proposed architecture and its results are also given in this chapter.
Chapter 6. Conclusion and Future Work
This chapter summarizes the whole work and the results achieved. It also discusses possible research directions for future work to improve MU wireless communication systems.
Chapter 2
Multi-User MIMO Wireless System
Overview
2.1 Overview
Multi-user transmission is a radio transmission scheme that allows several stations to transmit at the same time. There are several specific multiple access techniques, such as Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), CDMA and OFDMA, designed to share the channel among several users. We separate these multiple access techniques into orthogonal multiple access (OMA), such as FDMA and OFDMA, and non-orthogonal multiple access (NOMA), such as CDMA and IDMA. In OMA, wireless users compete with each other for the frequency resource to transmit their information flow. If the concurrent access of several users cannot be controlled, collisions can occur. Since collisions are undesirable for connection-oriented communication such as mobile phones, personal/mobile users need to be allocated dedicated channels on request. A main issue with OMA techniques such as OFDMA is that the spectral efficiency is low when some bandwidth resources are allocated to users with poor channel state information (CSI). On the other hand, NOMA enables each user to access all the subcarrier channels, so the bandwidth resources allocated to users with poor CSI can still be accessed by users with strong CSI, which significantly improves the spectral efficiency.
MU transmission is divided into uplink (UL) (many-to-one) and downlink (DL) (one-to-many) transmission, as shown in Fig. 2.1. Our main emphasis is on UL communication, in which multiple users simultaneously communicate with a single receiver such as an access point (AP). In UL transmission, the IDMA technique allows all users to spread their signals across the entire bandwidth, as in a CDMA system. However, rather than using unique spreading codes to decode every user while treating the interference from other users as noise, the receiver differentiates each STA by its unique interleaving pattern. This leads to a low complexity receiver whose complexity grows linearly with the number of parallel stations (STAs) supported [10].
Figure 2.1: MU transmission
In testing an MU system, experimental tests using actual wireless transmission are very important to ensure that all parts of the system work properly. However, due to various factors such as government restrictions and logistical problems, experimental tests over the wireless medium often cannot be performed. In this case, a wireless channel emulator is indispensable. While the research works in the literature [8],[9] support single-user (SU) transmission, an MU channel emulator is needed for MU transmission.
2.2 Multi-User Protocol
MU techniques have been applied to and proposed for current and future wireless communication systems. Since the 802.11ac standard was ratified a few years ago, the downlink MU-MIMO system has become a very promising option for improving WLAN spectral efficiency [11]. Uplink MU is supported in 802.11ax [12]. Fig. 2.2 shows a simple example of UL-MU access in 802.11ax. In this protocol, the transmission timing of each station (STA) is centrally controlled by the AP. To convey the control information needed for UL-MU transmission to the users, the AP transmits a control frame called the Trigger Frame for Random Access (TF-R). Each user performs OFDMA random access according to the control information provided by the AP. Users who obtain a transmission opportunity send a frame to the AP. The AP responds according to the condition of the received UL-MU frames. This sequence is repeated every trigger interval. To process the UL-MU Media Access Control (MAC) protocol, UL-MU physical (PHY) transmission must first be supported. IEEE 802.11ax adopts an uplink OFDMA random access scheme. However, OFDMA techniques suffer from spectral inefficiency and high complexity in user scheduling. Therefore, NOMA techniques are a promising technology for future wireless systems such as 5G [1]. IDMA is one of the NOMA techniques and thus shares many of the advantages of NOMA in spectral efficiency and user fairness.
2.3 Multi-User Transmission System
The MU communication system includes the transmitter and the receiver, which are
connected by the channel as shown in Fig. 2.3. The transmitted signal is affected by channel
fading and by thermal noise caused by electronic devices.
2.3.1 Channel Emulator
The performance of a wireless system depends on the channel over which the signal travels
from the transmitter to the receiver. Unlike stable and predictable wired channels, radio
channels are random and not easy to analyze. Signals transmitted over radio channels are
obstructed by buildings, mountains and trees, and are then reflected, scattered and
diffracted. These phenomena are referred to as fading. As a result, many different versions
of the transmitted signal are collected at the receiver. Fading affects the quality of
radio communication systems. Hence, a channel emulator is very important for ensuring that
all parts of the system work properly.
Figure 2.2: UL-MU MAC Protocol in IEEE802.11ax
Figure 2.3: MU communication systems
MU-MIMO is a set of multiple-input multiple-output technologies for wireless
communications in which a set of users or wireless terminals, each with one or more
antennas, communicate with each other. In contrast, single-user MIMO involves a single
multi-antenna transmitter communicating with a single multi-antenna receiver. In the same
way that OFDMA adds multiple access capability to OFDM, MU-MIMO adds multiple access
capability to MIMO. The MU-MIMO channel model comprises the Doppler spectrum, the spatial
correlation, Rayleigh fading, Rician fading, multipath fading, path loss and shadowing. If
the line-of-sight (LOS) signal is much stronger than the other components, Rician fading
occurs. If there are multiple scatterers and no LOS signal, Rayleigh fading occurs. MU-MIMO
techniques can be applied to both indoor and outdoor environments, for example with the
channel models of 5G, WiMAX or 802.11ac systems. In 802.11ac, channel models A, B, C, D and
E are defined for indoor environments, and model F for both indoor and outdoor
environments. In indoor environments, the channel is not as easily affected by rough path
loss exponents. Although indoor delay spreads are often much smaller than outdoor ones,
indoor systems often have to achieve very high data rates. Although the channel parameters
differ between standards, the MU-MIMO channel emulator uses the same coefficient generator
for all of them.
Figure 2.4: Channel sounding procedure
MU transmission in 802.11ac systems enables the access point (AP) to send signals
simultaneously to all stations (STAs) without interference. This is possible by
calculating an MU beamforming (MU-BF) matrix from a priori knowledge of each STA's channel
state information (CSI). In order to evaluate the MU-BF performance, the transmitter medium
access control (MAC) must perform a channel sounding procedure, as shown in Fig. 2.4, for
all the receiving STAs. After receiving the feedback from each STA, the transmitter
computes an MU-BF matrix to be used for the MU-MIMO transmission. The performance of the
system varies with the time elapsed between the moment the STAs compute their channel
feedback and the moment the AP performs the MU transmission, because the channel evolves
in the meantime [14]. The channel feedback therefore plays an important role in MU
transmission.
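The effect of channel evolution on MU-BF can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses zero-forcing weights for single-antenna STAs as a simpler stand-in for the precoder of the actual system, and it models channel evolution with a hypothetical correlation factor `rho` between the channel at feedback time and the channel at transmission time. It assumes NumPy is available.

```python
import numpy as np

def zf_weights(h):
    """Zero-forcing MU-BF matrix for a (num_stas x num_tx) channel matrix."""
    w = h.conj().T @ np.linalg.inv(h @ h.conj().T)
    return w / np.linalg.norm(w, axis=0, keepdims=True)  # unit-norm columns

def complex_gaussian(rng, shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(0)
num_stas, num_tx = 2, 4
h_feedback = complex_gaussian(rng, (num_stas, num_tx))  # CSI reported by the STAs
w = zf_weights(h_feedback)

# Fresh CSI: the effective channel is diagonal, so STA 0 sees no leakage from STA 1.
leak_fresh = abs((h_feedback @ w)[0, 1])

# Stale CSI: the channel has drifted between feedback and MU transmission.
rho = 0.95  # hypothetical correlation between the two time instants
h_now = rho * h_feedback + np.sqrt(1 - rho**2) * complex_gaussian(rng, (num_stas, num_tx))
leak_stale = abs((h_now @ w)[0, 1])

print("leakage with fresh CSI:", leak_fresh)
print("leakage with stale CSI:", leak_stale)
```

The off-diagonal term of the effective channel is the inter-user interference, so a longer feedback delay (a smaller `rho`) is expected to raise it.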
2.3.2 IDMA System
This thesis focuses on uplink MU transmission with the IDMA system, since it can improve
the performance of future wireless systems. The IDMA system differs from the CDMA system
in its use of interleaving codes instead of spreading codes. In the IDMA system, the
spreading code is used as a repetition code. The bandwidth expansion is therefore fully
exploited for forward error correction, which typically results in a very low-rate code
compared with the CDMA system. For the same spreading length, the IDMA system can support
more users than the CDMA system, because in IDMA the spreading length may be smaller than
the number of users. Another advantage of the IDMA system is that it is less sensitive to
clipping distortion than the CDMA system. However, the greatest advantage of the IDMA
system is the low complexity of the receiver. The IDMA system has low cost and superior
performance in multi-user detection because it detects the desired signals in the presence
of interference and noise. The matched filter of the CDMA system has low complexity but
poor performance. The MMSE filter of CDMA has moderate performance but high complexity.
While the computation cost of the MMSE filter is N^2 for the CDMA system, the computation
cost of the interference cancellation is N for the IDMA system, where N is the number of
users. In the IDMA system, the interleaver patterns used by the participating stations
(STAs) are pre-generated and stored in both the access point (AP) and the STAs. The
specific interleaver used by a client depends on the index assigned to it by the AP during
association.
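As a minimal sketch of this idea, the code below derives an interleaver from an STA index so that the AP and the STA can build the same table independently. The seeding rule (`base_seed + sta_index`) and all function names are hypothetical; the thesis only states that the patterns are pre-generated and selected by index.

```python
import random

def make_interleaver(sta_index, length, base_seed=2017):
    """Pseudo-random permutation that depends only on the STA index."""
    rng = random.Random(base_seed + sta_index)  # hypothetical seeding rule
    perm = list(range(length))
    rng.shuffle(perm)
    return perm

def spread_and_interleave(bits, perm, spread_len):
    """Repetition-spread BPSK symbols (bit 0 -> +1, bit 1 -> -1), then permute the chips."""
    chips = [1 - 2 * b for b in bits for _ in range(spread_len)]
    return [chips[p] for p in perm]

def deinterleave(chips, perm):
    """Undo the permutation applied by spread_and_interleave."""
    out = [0] * len(perm)
    for i, p in enumerate(perm):
        out[p] = chips[i]
    return out

bits = [0, 1, 1, 0]
spread_len = 4
# The STA and the AP generate the same table independently from the assigned index.
perm_at_sta = make_interleaver(sta_index=3, length=len(bits) * spread_len)
perm_at_ap = make_interleaver(sta_index=3, length=len(bits) * spread_len)
tx_chips = spread_and_interleave(bits, perm_at_sta, spread_len)
rx_chips = deinterleave(tx_chips, perm_at_ap)
```

Because every user applies the same repetition code, the interleaver is the only user-specific element of the transmitter.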
Figure 2.5: IDMA transceiver with N users
The IDMA receiver includes an interference canceller that performs the multi-user
detection. In the IDMA and turbo coding literature, the a posteriori probability (APP)
decoder is placed inside the iteration loop because it improves the performance of IDMA
systems in iterative decoding. However, since this causes very high implementation
latency, we adopt a simpler iteration loop in which only the repetition decoder is placed
inside the loop [13]. The interference canceller consists of the elementary signal
estimator (ESE), the deinterleaver, the despreader, the extrinsic LLR calculation and the
soft mapper, as in Fig. 2.5. The extrinsic LLR calculation includes the spreader and the
interleaver. The ESE is used as a soft demapper that calculates the log-likelihood ratio
(LLR) of each bit in a symbol. The LLR output of the ESE is deinterleaved with the
interleaver unique to each user, and the reordered LLR values are then despread. In the
first iteration, the extrinsic information is very inaccurate. The receiver needs more than
4 iterations, even with little actual noise, to obtain an acceptable bit error rate (BER)
[15]. If the current iteration is not the last one, the despread LLRs are spread again for
the extrinsic LLR calculation, which is based on the difference between the LLRs before and
after despreading. For each chip, the result is the contribution of the other chips of the
spreading sequence, excluding the chip itself. The extrinsic LLRs are then interleaved to
produce the inputs of the soft mapper, which updates the mean and variance variables used
by the ESE. In the final iteration, the spreader and the interleaver are not needed. The
LLR values from the despreader are decoded by the channel decoder to produce the estimate
of the transmitted bits.
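The iteration loop described above can be sketched in software. The following is a simplified model under stated assumptions, not the hardware design: BPSK chips, unit channel gains, real-valued AWGN, no channel code (a hard decision replaces the channel decoder), and the standard chip-by-chip Gaussian approximation for the ESE. All names and parameter values are hypothetical.

```python
import math
import random

def idma_demo(num_users=4, num_bits=64, spread=32, sigma=0.3, iterations=6):
    rng = random.Random(1)
    n_chips = num_bits * spread
    mask = [1.0 if j % 2 == 0 else -1.0 for j in range(spread)]  # common repetition code
    perms = []
    for _ in range(num_users):
        p = list(range(n_chips))
        rng.shuffle(p)  # user-specific interleaver
        perms.append(p)
    bits = [[rng.randrange(2) for _ in range(num_bits)] for _ in range(num_users)]

    # Transmitters: spread, interleave, then superimpose all users plus noise.
    tx = []
    for k in range(num_users):
        chips = [(1 - 2 * b) * m for b in bits[k] for m in mask]
        tx.append([chips[p] for p in perms[k]])
    rx = [sum(tx[k][j] for k in range(num_users)) + rng.gauss(0.0, sigma)
          for j in range(n_chips)]

    mean = [[0.0] * n_chips for _ in range(num_users)]  # soft-mapper outputs
    var = [[1.0] * n_chips for _ in range(num_users)]
    decided = [[0] * num_bits for _ in range(num_users)]
    for it in range(iterations):
        last = it == iterations - 1
        for k in range(num_users):
            llr = [0.0] * n_chips
            for j in range(n_chips):
                tot_mean = sum(mean[u][j] for u in range(num_users))
                tot_var = sum(var[u][j] for u in range(num_users)) + sigma ** 2
                # ESE: treat the other users plus noise as Gaussian interference.
                ese = 2.0 * (rx[j] - (tot_mean - mean[k][j])) / (tot_var - var[k][j])
                llr[perms[k][j]] = ese  # deinterleaver
            ext = [0.0] * n_chips
            for b in range(num_bits):
                base = b * spread
                total = sum(mask[j] * llr[base + j] for j in range(spread))  # despreader
                decided[k][b] = 0 if total >= 0 else 1
                for j in range(spread):
                    # Spread again and remove each chip's own contribution.
                    ext[base + j] = mask[j] * total - llr[base + j]
            if not last:  # the spreader/interleaver path is skipped in the final iteration
                for j in range(n_chips):
                    m = math.tanh(ext[perms[k][j]] / 2.0)  # interleaver + soft mapper
                    mean[k][j] = m
                    var[k][j] = 1.0 - m * m
    errors = sum(d != b for k in range(num_users) for d, b in zip(decided[k], bits[k]))
    return errors, num_users * num_bits

errors, total = idma_demo()
print("bit errors:", errors, "of", total)
```

The per-user cost of the ESE step grows linearly with the number of users, which is consistent with the complexity argument made for IDMA earlier in this chapter.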
2.4 Summary
In this chapter, an overview of the multi-user wireless system has been given and the
multi-user protocol has been presented. The MU communication system includes the
transmitter and the receiver, and a channel emulator is needed for testing the system. The
thesis focuses on the MU channel emulator and on uplink MU transmission with the IDMA
system.
Chapter 3
Multi-User MIMO Channel Emulator
with Automatic Sounding Feedback
3.1 Introduction
In this chapter, we focus on the field programmable gate array (FPGA) implementation of
MU channel emulators for MU systems. Although the research works in the literature [8],
[9] all support wireless local area network (WLAN) environments, they are designed for
single-user (SU) transmission. Since the 802.11ac standard was ratified, downlink (DL)
multi-user (MU) transmission with multiple-input multiple-output (MIMO) antennas has
become a very promising option for improving WLAN system efficiency [11]. Uplink (UL)
MU-MIMO is supported in 802.11ax [12]. The UL and DL MU schemes can be considered dual
modes. In this chapter, we consider only the DL MU case, because the DL requires channel
state information feedback for beamforming, which is not necessary in the UL.
In evaluating the MU transmission performance of a hardware WLAN platform, one hurdle is
that timely channel sounding operations must be performed, which requires a working MAC
layer. Although channel emulators are commercially available [16], they do not support the
generation of feedback channel coefficients for MU-MIMO systems. A complete MAC and PHY
module that can process MAC information elements must therefore be available for MU-BF.
However, MAC development in itself takes a lot of time and resources, so it is carried out
in parallel with PHY development.
In this chapter, we present the design of an MU-MIMO channel emulator. The emulator can be
used for testing any MU system, such as IDMA, OFDMA and MU-MIMO, by changing the
parameters of the design. The proposed channel emulator is capable of sending channel
feedback automatically from the generated channel coefficients. The coefficients that are
convolved with the input transmitted signals are called the feedforward channel. The
feedback channel coefficients are separated from the feedforward channel coefficients by a
programmable time duration. In the uplink IDMA system, this channel feedback can be used
for the power control of each user. Moreover, in 802.11ac, the feedback channel can be
used for downlink MU-MIMO, which needs channel state information to compute the MU-BF. The
programmable time duration of the feedback channel allows a thorough evaluation of the
Doppler effect on MU-BF transmission. Aside from this, the feedback capability of the
channel emulator provides the following advantages:
1. Evaluation of MU-BF algorithms without channel estimation error. This is important
for non-linear MU-BF algorithms whose performance gain is highly sensitive to the
eect of channel estimation.
2. PHY level evaluation of MU-MIMO transmission with very minimal MAC features.
3. Evaluation of the MU-MIMO systems with virtual STAs. Virtual STAs are STAs that
are part of the MU-MIMO system, but whose bit error rate (BER) performance is not
calculated. This enables the evaluation of any MU-MIMO system configurations
even with a limited platform that has room for only one AP and one or few STAs.
The chapter is organized as follows. In Section 3.2, we describe MU-MIMO WLAN
channel emulator models and the feedback delay. Hardware platform implementation is
shown in Section 3.3. Section 3.4 shows the measurement results. Section 3.5 presents the
synthesis results, and Section 3.6 is our summary.
Figure 3.1: MIMO fading coefficient generator structure
3.2 MU-MIMO Channel Model
The MU-MIMO channel coefficient generator structure is shown in Fig. 3.1. At every time
instant, the channel model generates a set of coefficient matrices $H_{11}, \ldots, H_{LN}$
for STAs 1 to N and paths 1 to L. The aggregate MU-MIMO channel is then defined as
$$H_l(t) = \left[ (H_{l1}(t))^H, (H_{l2}(t))^H, \ldots, (H_{lN}(t))^H \right]^H$$
for the l-th multipath component and the t-th time instant. While not seen in the model,
each of the matrices can have multiple path components following a certain power delay
profile (PDP).
3.2.1 General MU-MIMO Channel Model
The MU-MIMO channel model comprises the Doppler spectrum, the spatial correlation,
Rayleigh fading, Rician fading, multipath fading, path loss and shadowing, as in Fig. 3.2,
where M is the number of transmitter antennas and R is the number of receiver antennas.
The designed channel emulator can be used for the general MU-MIMO channel model; in this
work, we use the actual values defined in the 802.11ac channel model as an example.
Moreover, because our 802.11ac transceiver was completed without a channel [17], the
channel emulator can be used to test this transceiver platform.
Figure 3.2: MU-MIMO channel emulator
3.2.2 Statistical Model
The statistics for path delay, Doppler and spatial correlation are based on the values
defined in the 802.11ac channel model. These values are the results of many experimental
measurements carried out by the companies that took part in the IEEE 802.11ac
standardization. The Task Group ac (TGac) channel model [18] produces randomly generated
channel matrix coefficients with defined spatial, temporal and spectral statistics. The
spatial correlation of the channel matrices, which follows the Kronecker model as assumed
since 802.11n, directly affects the channel capacity [19]. This means that the spatial
correlation can be expressed as
$$R_l = E\left[ \mathrm{vec}(H_l)\, \mathrm{vec}(H_l)^H \right] = R_{l,TX} \otimes R_{l,RX} \tag{3.1}$$
Equation (3.1) signifies that the channel correlation R can be estimated independently at
the transmitter and at the receiver. vec(·) is the vectorization of a matrix, a linear
transformation that converts the matrix into a column vector. Since the spatial
correlation is calculated as the Kronecker product of the transmitter and receiver antenna
correlations, the vectorization is used to express the matrix multiplication as a linear
transformation on matrices. $R_{l,TX}$ and $R_{l,RX}$ are the spatial correlation matrices
of the transmitter antennas and of the receiver antennas, respectively.
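The structure of (3.1) can be checked on a small example. The sketch below builds the Kronecker product of two hypothetical 2 x 2 real correlation matrices in plain Python; the correlation values are illustrative and are not taken from the TGac model. With the indexing assumed here, entry (m1*R + r1, m2*R + r2) of the result equals the transmit-side correlation between antennas m1 and m2 multiplied by the receive-side correlation between antennas r1 and r2.

```python
def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

# Hypothetical correlation matrices for 2 transmit and 2 receive antennas.
r_tx = [[1.0, 0.5],
        [0.5, 1.0]]
r_rx = [[1.0, 0.2],
        [0.2, 1.0]]

r_full = kron(r_tx, r_rx)  # 4 x 4 correlation of the vectorized channel

for row in r_full:
    print(row)
```

Only the two small matrices have to be specified, which is the practical benefit of the Kronecker assumption: the full correlation of the vectorized channel follows from the transmit side and the receive side separately.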
The temporal correlation of the channel is directly due to the Doppler spread, where the
channel coefficients undergo fading with respect to time. For outdoor environments, the
auto-correlation of the channel coefficients can be affected by the relative motion of the
user terminal and the base station.
For indoor wireless channels, the typical fading scenario involves human-based motion as
opposed to relative motion between the transmitter and the receiver [18]. These fading
effects can be described by the following Doppler power spectrum:
$$S(f) = \frac{1}{1 + A \left( \frac{f}{f_d} \right)^2} \tag{3.2}$$
where A is a constant defined to set S(f) = 0.1 (a 10 dB drop) at frequency f_d (thus
A = 9), and f_d is the Doppler frequency. Based on new experimental data collected during
the 802.11ac standardization, the channel coherence time was set to 800 ms, or an
equivalent Doppler spread of f_d = 0.414 Hz [18].
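As a quick numerical check of (3.2), the sketch below evaluates the bell-shaped spectrum with A = 9 and f_d = 0.414 Hz and confirms the 10 dB drop at f = f_d. The function name `doppler_psd` is hypothetical.

```python
import math

def doppler_psd(f_hz, fd_hz=0.414, a=9.0):
    """Bell-shaped Doppler power spectrum of (3.2)."""
    return 1.0 / (1.0 + a * (f_hz / fd_hz) ** 2)

for f in (0.0, 0.207, 0.414, 0.828):
    s = doppler_psd(f)
    print(f"f = {f:5.3f} Hz  S(f) = {s:.4f}  ({10 * math.log10(s):6.2f} dB)")
```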
In terms of frequency selectivity, the power delay profile (PDP) followed by the channel
model directly affects the frequency-domain statistics of the frequency-selective channel.
The 802.11ac channel model did not change the 802.11n PDP definitions, but instead defined
a mechanism to extend the previously defined PDPs to higher bandwidths. The 802.11n PDP
was defined only with a minimum tap spacing of 10 ns, for bandwidths up to 40 MHz.
3.2.3 Feedback Delay
The 802.11n standard defines a mechanism for channel feedback from the STA to the AP.
This was expanded in 802.11ac to support feedback from multiple users, as shown in
Fig. 3.3. First, the AP sends a null data packet announcement (NDPA) frame to start the
CSI feedback process. The null data packet (NDP) is a packet containing only training
symbols and is used solely for sounding the channel. After the NDP is received, each STA
sends a very high throughput (VHT) Compressed Beamforming frame containing the channel
feedback information.
Figure 3.3: CSI feedback protocol
As seen in the above protocol, a complete MAC and PHY module that can process MAC
information elements must be available in order to experiment with MU-BF transmissions.
We propose an implementation of a feedback channel emulator that automatically generates
MIMO channel feedback with programmable delay timing. This function makes it possible to
evaluate MU-BF without channel estimation and with only minimal MAC features. In other
words, one benefit of using our channel emulator instead of the wireless channel is that
channel feedback can be provided to the AP without initiating the protocol in the MAC. In
addition, the channel evolution due to the time delay associated with the protocol can be
parameterized to simulate various update periods in real WLAN operation.
In the conventional model [20], the channel emulator that generates the channel
coefficients is used as shown in Fig. 3.4. At the beginning, the AP MAC sends the NDP to
start the CSI process. The CSI is estimated at the PHY of each STA. The MAC of each STA
then constructs the beamforming report frame and feeds it back to the AP. At the AP, the
PHY parses each channel feedback and the MAC computes the MU-BF weights used to produce
the MU-BF signal. The MU-BF weights computed by the MAC are stored in the MU-BF RAM inside
the AP. Note that this is done transparently to the PHY, meaning that the PHY will use any
MU-BF weight stored in the MU-BF RAM regardless of the "freshness" of its contents.
Figure 3.4: Feedback mechanism in conventional channel emulator platform [20]
Figure 3.5: Feedback mechanism in proposed channel emulator platform
In our proposed MAC and PHY operation for evaluation, the channel feedback is written
directly by the proposed channel emulator. This results in a much simpler flow, as shown
in Fig. 3.5. Because the feedback channel coefficients are generated by the proposed
channel emulator, the non-AP STAs do not need any MAC functions, and hence their MAC layer
can be omitted. Moreover, only minimal MAC features are used at the AP: the CSI RAM, which
stores the channel feedback from the STAs, and the MU-BF weight calculation. In addition,
the physical layer service data unit (PSDU) RAM that contains the packets to be
transmitted is needed. The remaining MAC features, such as carrier sense multiple access
with collision avoidance (CSMA/CA), control or management frames and operator, are not
needed. If the transmitter and the receiver instead shared the channel information through
a direct connection, there would be two technical problems. First, the transmitter and the
receiver must agree on an NDP-like signaling scheme and some related control information
to support the direct connection. Hence, one needs to create a crude channel sounding
protocol, which in itself must be verified. This procedure is inefficient and prone to
error. In contrast, the proposed emulator is transparent to the transmitter and the
receiver, except for the writing of the feedback channel coefficients to the transmitter
RAM. Second, the feedforward channel has to be held in registers until the delay time has
elapsed, which consumes considerable hardware resources when the delay duration is large;
the proposed emulator reduces this memory requirement.
The delay controller in our proposed design is shown in Fig. 3.6. This controller selects
the feedback delay duration T_d used for generating the feedback channel. In a realistic
channel environment, the CSI feedback delay of each STA is a random number because of
delays in gathering the CSI, e.g. CSMA/CA and random back-off. To emulate the feedback
channel in this case, the delay controller sets the duration to a random number generated
in the same way as in the IEEE 802.11ac system simulator. Our channel emulator supports
both random and constant delays. A constant delay is very helpful when evaluating a new
MU-BF scheme; published papers have verified MU-MIMO BER performance under a constant
delay [21], [22]. In such cases, the proposed channel emulator provides a programmable
constant delay, e.g. 20 ms or 40 ms. In our proposed system, the delay controller sets the
delay duration to any pre-defined value supplied by the user.
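The two operating modes of the delay controller can be summarized in a few lines of code. This is only a behavioural sketch: the function name, the mode strings and the uniform distribution over 0 ms to 40 ms are assumptions made for illustration, since the text does not specify the distribution of the random delay.

```python
import random

def choose_feedback_delay_ms(mode, constant_ms=20.0, max_random_ms=40.0, rng=None):
    """Return the feedback delay T_d in milliseconds for one sounding event."""
    if mode == "constant":
        return float(constant_ms)  # programmable constant delay, e.g. 20 ms or 40 ms
    if mode == "random":
        rng = rng if rng is not None else random.Random()
        return rng.uniform(0.0, max_random_ms)  # assumed uniform distribution
    raise ValueError("mode must be 'constant' or 'random'")

rng = random.Random(5)
delays_constant = [choose_feedback_delay_ms("constant", constant_ms=20.0) for _ in range(3)]
delays_random = [choose_feedback_delay_ms("random", rng=rng) for _ in range(3)]
print(delays_constant)
print(delays_random)
```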
3.3 Hardware Platform Implementation
In the hardware implementation, the parameters of the 802.11ac channel model are used as
an example. The structure of the MIMO channel coefficient generator block of the 802.11ac
channel emulator is shown in Fig. 3.7. The main components include the additive white
Gaussian noise (AWGN) generator, the Doppler fading emulated by a low-pass filter (LPF),
the spatial correlator, the PDP blocks and the line-of-sight (LOS) effects. The channel
emulator specification is shown in Table 3.1.
Figure 3.6: Flexible feedback delay adjustment
Table 3.1: Channel Emulator Specification
Parameter                            Value
Output Sampling Rate                 124 Hz
Doppler Frequency                    0.414 Hz
Channel coherence time               800 ms
PDP tap spacing                      5 ns
Number of taps                       35
Supported Channel Models             TGac A-F
Supported MIMO Configuration         4 x 4
Supported Number of Users/Streams    2
Transmit signal bandwidth            80 MHz
3.3.1 Design of Functional Blocks
Fig. 3.7 shows the functional blocks of the 802.11ac channel model, on which the
functional blocks of the proposed channel emulator are based. Among the configurations in
Table 3.1, the largest number of channel coefficients must be generated for Channel Model
D (35 PDP taps) with the 4 x 4 MIMO TGac configuration and 5 ns PDP tap spacing. This
configuration needs a total of
$$Chan\_Forward = Num\_PDPtaps \times M \times R \times 2 = 35 \times 4 \times 4 \times 2 = 1120$$
independent Gaussian numbers, where Num_PDPtaps is the number of PDP taps. The factor of 2
accounts for the channel coefficients being complex numbers. If the functional blocks were
processed in parallel, 1120 low-pass filter, spatial correlation and Rician blocks would be
needed to turn these Gaussian numbers into channel coefficients. When a feedback channel is
supported, the total doubles to
$$Chan\_Coef = Chan\_Forward \times 2 = 1120 \times 2 = 2240$$
independent Gaussian numbers, as presented in Fig. 3.8.
Figure 3.7: MIMO fading coefficient generator structure
Figure 3.8: Single path processing
As the number of coefficients is very large and the hardware resources are limited, a
parallel implementation does not fit. To address this issue, we propose a design
methodology that computes all channel coefficients using a single path implementation.
Since the clock frequency of the FPGA board is high (80 MHz), we propose to use a higher
sampling frequency to reduce the complexity. For example, the sampling frequency of the
Doppler filter, Samp_Rate, is 124 Hz, and with a maximum of 35 PDP taps for model D, the
frequency needed to generate all 2240 channel coefficients is
$$f_{serial} = Samp\_Rate \times Chan\_Coef = 124\,\mathrm{Hz} \times 2240 \approx 277.7\,\mathrm{kHz}$$
Therefore, by increasing the sampling frequency, all channel coefficients are generated
serially by a design that includes one Gaussian generator, one LPF, one spatial
correlation block and one Rician fading block. This reduces the computational complexity
by up to 99% compared with the parallel processing of the conventional design. The single
path processing is shown in Fig. 3.8. All channel coefficients are generated by using the
serial processing.
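The sizing arithmetic of this subsection can be reproduced directly. The variable names below are hypothetical; the numbers are those of Table 3.1 and of the text. The exact product 124 Hz x 2240 is 277.76 kHz, which the text quotes as 277.7 kHz. The last line expresses the saving as the fraction of processing blocks removed when one serial path replaces 2240 parallel ones, which is one possible reading of the complexity figure quoted above.

```python
num_pdp_taps = 35    # channel model D
num_tx = 4           # M
num_rx = 4           # R
samp_rate_hz = 124   # Doppler filter output rate

chan_forward = num_pdp_taps * num_tx * num_rx * 2  # real and imaginary parts
chan_coef = chan_forward * 2                       # feedforward plus feedback
f_serial_hz = samp_rate_hz * chan_coef             # rate of the single serial path

blocks_saved = 1 - 1 / chan_coef                   # one block chain instead of chan_coef

print("Chan_Forward =", chan_forward)
print("Chan_Coef    =", chan_coef)
print("f_serial     =", f_serial_hz / 1e3, "kHz")
print("block saving =", round(100 * blocks_saved, 2), "%")
```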
This architecture makes use of a model-based design methodology using the simulink model
compiler (SMC) from Synopsys, Incorporated. Model-based design methodology utilizes
mathematical and visual methods for rapid simulation and prototyping. This is especially
suitable for channel design, where channel models are described either visually or
mathematically.
3.3.2 Gaussian Random Number Generator
Figure 3.9: AWGN generator
To generate the Gaussian random numbers, we use the uniform random number generator
(URNG) block in SMC and apply the central limit theorem by adding samples of the URNG
block. To ensure that the random coefficients are uncorrelated, we add the outputs of
several uniform random generators that have different random seeds. The maximum frequency
therefore becomes
$$f_{MAXuniform} = f_{serial} \times U = 277.7\,\mathrm{kHz} \times 4 \approx 1.1\,\mathrm{MHz}$$
where U is the number of uniform random generators added; one Gaussian sample is produced
for every 4 uniform samples in this case. We chose U = 4 as a good trade-off between
complexity and a low sampling frequency. The AWGN generator block is shown in Fig. 3.9.
The top branch produces all the necessary taps for the main (feedforward) channel output,
while the bottom branch produces all the necessary taps for the feedback channel output.
At the end of the block, a commutator sequentially switches the data from the two parallel
input ports to a single output port, so the data rate of the output port doubles, as in
Fig. 3.9. This is called a single path implementation. The output of the AWGN generator
therefore contains the feedforward channel coefficients interleaved with the feedback
channel coefficients.
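The central-limit construction can be sketched in software as follows. This is an illustration and not the SMC block itself: Python's `random.Random` stands in for the URNG, the seeds are arbitrary, and the scaling to zero mean and unit variance is an assumption about how the sum is normalized. With U = 4 the output is only approximately Gaussian and is bounded by about ±3.46.

```python
import random

def clt_gaussian(urngs):
    """Approximate N(0, 1) sample: sum of U uniform(0, 1) samples, centred and scaled."""
    u = len(urngs)
    total = sum(r.random() for r in urngs)
    # The sum has mean u/2 and variance u/12.
    return (total - u / 2.0) * (12.0 / u) ** 0.5

# U = 4 uniform generators with different seeds.
urngs = [random.Random(seed) for seed in (11, 22, 33, 44)]
samples = [clt_gaussian(urngs) for _ in range(20000)]

mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
print("sample mean:", round(mean, 3), " sample variance:", round(variance, 3))
```

A larger U improves the tails of the distribution at the cost of a proportionally higher uniform sample rate, which is the trade-off mentioned in the text.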
3.3.3 Doppler Filter
As mentioned in the previous section, the time-variant channel is modeled by a "bell
shape" power spectrum. The TGn channel model provides the digital filter in eq. (3.3),
which is an infinite impulse response (IIR) filter and is used by our emulator.
$$S(f) = U\, \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_7 z^{-7}}{a_0 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_7 z^{-7}} \tag{3.3}$$
where U = 2.79, the denominator coefficients a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7 are
1.00, -5.94, 14.8, -19.9, 15.2, -6.44, 1.28, 0.06, respectively, and the numerator
coefficients b_0, b_1, b_2, b_3, b_4, b_5, b_6, b_7 are 1.00, -4.63, 9.40, -10.9, 7.91,
-3.59, 0.92, -0.09, respectively [19]. Because we use these IIR filter parameters in
accordance with the 802.11ac channel model, we chose a normalization factor of 300,
consistent with [19], to obtain the effective sampling frequency of the Doppler filter.
This frequency equals the Doppler spread f_d = 0.414 Hz multiplied by the normalization
factor: f_s = f_d x 300 = 0.414 Hz x 300 ≈ 124 Hz.
In parallel processing, a total of 2240 IIR filters is needed for the 2240 channel
coefficients, as in Fig. 3.7, whereas the single path implementation needs only one IIR
filter for all coefficients, which keeps the complexity low. Normally, an IIR filter cannot
be shared among multiple input streams, because switching between streams destroys the
previous state held in the filter registers. To share the filter without affecting the
statistics of the generated channel taps, we use banks of random-access memory (RAM) to
save the filter states before switching from one channel tap to another. We use 7 RAM
blocks, each of size 2240 x 32 bit, to store the filter states of all channel taps. The
design is shown in Fig. 3.10.
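The state-saving idea can be demonstrated with a small software model. The sketch below time-multiplexes one IIR filter over several streams by loading and storing the filter state for each stream, and checks that the result equals that of dedicated filters. It is an assumption-laden illustration: the filter is a hypothetical stable second-order section rather than the 7th-order TGn filter, a direct-form II transposed structure is assumed, and `state_ram` keeps one row per stream, whereas the hardware uses one RAM per state register.

```python
def iir_step(b, a, state, x):
    """One sample of a direct-form II transposed IIR filter (a[0] must be 1)."""
    y = b[0] * x + state[0]
    order = len(state)
    for i in range(order - 1):
        state[i] = b[i + 1] * x - a[i + 1] * y + state[i + 1]
    state[order - 1] = b[order] * x - a[order] * y
    return y

def shared_filter(b, a, streams):
    """Filter several streams with one filter core by saving states per stream."""
    order = len(a) - 1
    state_ram = [[0.0] * order for _ in streams]
    out = [[] for _ in streams]
    for t in range(len(streams[0])):
        for ch, stream in enumerate(streams):
            state = list(state_ram[ch])  # load the saved state of this stream
            out[ch].append(iir_step(b, a, state, stream[t]))
            state_ram[ch] = state        # store it before switching streams
    return out

def dedicated_filter(b, a, stream):
    """Reference: one filter instance that serves a single stream."""
    state = [0.0] * (len(a) - 1)
    return [iir_step(b, a, state, x) for x in stream]

# Hypothetical stable second-order coefficients, for illustration only.
b = [0.2, 0.3, 0.2]
a = [1.0, -0.5, 0.25]
streams = [[1.0, 0.0, 0.0, 0.0, 2.0],
           [0.5, -1.0, 0.25, 0.0, 1.0],
           [0.0, 3.0, -2.0, 1.0, 0.0]]
multiplexed = shared_filter(b, a, streams)
reference = [dedicated_filter(b, a, s) for s in streams]
print(multiplexed == reference)
```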
3.3.4 Spatial Correlation Block
While the temporal elements of the matrices have already been correlated by the Doppler
filter, the spatial domain is still uncorrelated. Let the output of the Doppler filter be
arranged into a column vector $H_{l,iid}$ such that
$$H_{l,iid} = \left[ h_{l,11}\; h_{l,21}\; \cdots\; h_{l,M1}\; h_{l,12}\; \cdots\; h_{l,MR} \right]^T \tag{3.4}$$
where T denotes the transpose of a matrix. Using the correlation in (3.1), the spatially
correlated channel vector can be written as
$$H_{l,V} = C H_{l,iid} \tag{3.5}$$
where C is obtained from the Cholesky decomposition
$$R_l = C C^H \tag{3.6}$$
The spatial correlation block needs a total of M^4 = 256 complex multipliers. As in the
Doppler filter block, we use a single complex multiplier block oversampled by a factor of
256. Given that the output sampling frequency of the Doppler filter is 124 Hz and with a
maximum of 35 PDP taps, the spatial correlation block needs to run at about 1.1 MHz to
fulfill the task. We also use a simplified complex multiplier that has only three
multipliers instead of the usual four to reduce the utilization of hardware resources.
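The two ingredients of this block, the Cholesky factor of (3.6) and a complex multiplication with three real multipliers, can be sketched as follows. The correlation matrices are hypothetical 2 x 2 examples, NumPy is assumed to be available, and the three-multiplier identity shown is one common formulation; the text does not state which variant the hardware uses.

```python
import numpy as np

def mul3(x, y):
    """Complex product using three real multiplications instead of four."""
    a, b = x.real, x.imag
    c, d = y.real, y.imag
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return complex(k1 - k3, k1 + k2)

# Hypothetical Hermitian correlation matrices for 2 x 2 antennas.
r_tx = np.array([[1.0, 0.4], [0.4, 1.0]])
r_rx = np.array([[1.0, 0.3j], [-0.3j, 1.0]])
r = np.kron(r_tx, r_rx)       # Kronecker model of (3.1)
c = np.linalg.cholesky(r)     # lower-triangular C with R = C C^H, as in (3.6)

rng = np.random.default_rng(0)
h_iid = (rng.standard_normal(4) + 1j * rng.standard_normal(4)) / np.sqrt(2)

# (3.5) evaluated one product at a time, as a time-shared multiplier would do.
h_v = np.array([sum(mul3(complex(c[i, j]), complex(h_iid[j])) for j in range(4))
                for i in range(4)])

print(np.allclose(c @ c.conj().T, r), np.allclose(h_v, c @ h_iid))
```

For the 4 x 4 configuration, C is a 16 x 16 matrix, so (3.5) requires 16 x 16 = 256 complex products per tap, which matches the multiplier count given above.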
3.3.5 Rician Fading Block
Figure 3.10: Doppler filter block
Figure 3.11: IEEE 802.11ac evaluation platform
In general, the wireless MIMO channel consists of a line-of-sight (LOS) component and
non-line-of-sight (NLOS) components. In this section, both LOS and NLOS fading are
considered. The first tap power, or LOS component, which is much larger than the NLOS
component, is added to generate Rician fading as in eq. (3.7).
$$H = \sqrt{P} \left( \sqrt{\frac{K}{K+1}}\, H_{LOS} + \sqrt{\frac{1}{K+1}}\, H_{Rayleigh} \right) \tag{3.7}$$
where P is the overall power of the channel, K is the Rician K-factor, defined as the
ratio of the LOS and NLOS component powers, H_LOS is the LOS matrix and H_Rayleigh is the
Rayleigh matrix. Rician fading with K = 0 reduces to Rayleigh fading. When the LOS
component exists, K > 0.
As in the spatial correlation block, the throughput needs to be about 1.1 MHz to fulfill
the task.
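Equation (3.7) translates into a few lines of code. The sketch below is an illustration with hypothetical 2 x 2 matrices and a hypothetical function name; it shows that K = 0 returns the Rayleigh component scaled by the square root of P and that the two weights always have unit total power.

```python
import math

def rician_tap(p, k, h_los, h_rayleigh):
    """Combine LOS and Rayleigh matrices (nested lists) according to (3.7)."""
    los_w = math.sqrt(k / (k + 1.0))
    nlos_w = math.sqrt(1.0 / (k + 1.0))
    return [[math.sqrt(p) * (los_w * l + nlos_w * r) for l, r in zip(row_l, row_r)]
            for row_l, row_r in zip(h_los, h_rayleigh)]

# Hypothetical example matrices.
h_los = [[1 + 0j, 1j], [1j, 1 + 0j]]
h_rayleigh = [[0.3 + 0.4j, -0.1j], [0.2 + 0j, 0.5 - 0.5j]]

rayleigh_only = rician_tap(p=2.0, k=0.0, h_los=h_los, h_rayleigh=h_rayleigh)
strong_los = rician_tap(p=2.0, k=100.0, h_los=h_los, h_rayleigh=h_rayleigh)
print(rayleigh_only[0][0], strong_los[0][0])
```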
3.3.6 FPGA Implementation
Fig. 3.11 shows the channel emulator as a part of a complete MU-MIMO evaluation platform.
The transmitter and receiver form a complete MAC and PHY 802.11ac verification platform
previously implemented in [17].
The channel emulator board itself contains 5 Stratix II EP2S180F1508 FPGAs and one Virtex
4 FPGA. Four of the FPGAs are equipped with 4 analog-to-digital converters (ADCs) and 4
digital-to-analog converters (DACs) for connecting with the baseband. This is connected to
the passband converter of the channel emulator, called the interconnection device. There
are two interconnection devices in the channel emulator block: one connects to the
passband of the transmitter and the other to the passband of the receiver. This
architecture is used to verify the transmitter, the receiver and the channel emulator at
the passband. On the channel emulator board, four FPGAs, called FPGA A, B, C and D,
receive the transmitted signals from their ADCs and the channel coefficients generated by
the remaining FPGA, called FPGA E. These FPGAs then convolve the transmitted signals with
the channel coefficients to produce the received signals and send them to the receiver
through their DACs. Note that the feedback channel is connected to the transmitter by a
ribbon cable.
3.4 Measurement Results
In this section, we verify the results measured with the 4 x 4 channel emulator platform.
First, we verify the statistical properties of the generated main channel samples by
computing the Doppler spectrum of each tap and the stochastic capacity of the resulting
MIMO channel. Next, we investigate the MU-MIMO features of the proposed system by testing
the feedback channel output and by capturing, with an oscilloscope, the constellation of
the transmitted (TX) signal to which MU-BF has been applied. In this experiment, the
transmitter and the proposed channel emulator are synthesized inside the channel emulator
FPGA board. A Tektronix 3032 oscilloscope is used in place of the receiver to capture the
constellation diagram. After the MU-BF signal is combined with the channel coefficients,
the received signal is sent to the oscilloscope through the 12-bit DACs of the FPGA board.
We configure the oscilloscope to display the constellation diagram using its XY display
feature.
3.4.1 Statistical Verification
The simulator uses an 802.11ac system whose parameters are set as in Table 3.2. To test
the Doppler spread, we set the channel emulator configuration to the TGac channel model D
with 4x4 antennas. We then output the first channel coefficient h_11 of the first channel
tap to the signal tap and plot its power spectral density (PSD) to compare it with the
Doppler spread of the TGac system simulation. The Doppler spectrum of the second tap is
obtained in the same way. Fig. 3.12a compares the PSD of the reference output from the
simulation with the hardware result for the first tap, while Fig. 3.12b shows the results
for the second tap. As can be seen, both outputs have Doppler spectra with similar
distributions. In the model for indoor wireless LANs, all taps have a classical Doppler
spectrum, except for the first tap of channel D, which has a 10 dB spike [23]. The results
show the 10 dB spike shape of the first tap and the bell shape of the second tap in
Fig. 3.12a and Fig. 3.12b, respectively.
Much of the increase in capacity of IEEE 802.11 systems depends on the rank of the channel matrix. In the 802.11ac channel model, the spatial correlation of the channel matrices follows the Kronecker model, which affects the channel capacity. The PHY capacity of measured MIMO channels is calculated as in (3.8) [18].
$$C = \log_2 \det\left( I_R + \frac{\mathrm{SNR}}{M} H H^{H} \right) \tag{3.8}$$
where SNR is the average received signal-to-noise ratio, $R$ and $M$ are the numbers of receiver and transmitter antennas respectively, $H$ is the channel coefficient matrix and $(\cdot)^{H}$ denotes the Hermitian transpose. Assuming a 30 dB average SNR, we use (3.8) to verify the capacity of the generated MIMO channel. We configure the channel emulator for channel model D and set the distance between transmitter and receiver to 15 m, which satisfies the NLOS condition of the TGac channel. Fig. 3.13 shows that the capacities of the first tap of model D and model E in the NLOS condition obtained from the hardware emulator match well with those of the theoretical reference channel output from the standard TGac simulator.
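The capacity check in (3.8) can be illustrated with a short numerical sketch. This is a minimal example assuming NumPy is available; the function name `mimo_capacity` and the random 4x4 channel are illustrative choices and do not come from the emulator or the TGac simulator.

```python
import numpy as np

def mimo_capacity(H, snr_db):
    """Capacity in bit/s/Hz following (3.8): log2 det(I_R + (SNR/M) H H^H)."""
    H = np.asarray(H, dtype=complex)
    R, M = H.shape                       # R receive antennas, M transmit antennas
    snr = 10.0 ** (snr_db / 10.0)        # convert dB to a linear ratio
    G = np.eye(R) + (snr / M) * (H @ H.conj().T)
    _, logdet = np.linalg.slogdet(G)     # log|det|, numerically safer than det()
    return logdet / np.log(2.0)

# Illustrative channel only: i.i.d. complex Gaussian 4x4, not the TGac model D taps.
rng = np.random.default_rng(0)
H_iid = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
print(f"capacity at 30 dB: {mimo_capacity(H_iid, 30.0):.2f} bit/s/Hz")
```

A rank-one channel (for example an all-ones matrix) gives a much lower value than a full-rank channel at the same SNR, which is the rank dependence mentioned above.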
Figure 3.12: Channel spectrum for 4x4 model D TGac. (a) The first tap. (b) The second tap.
3.4.2 Feedback Delay Verification
In this subsection, we verify the feedback delay output of the channel emulator. Fig. 3.14 shows two waveforms separated by a 100 ms delay, which confirms that the emulator output is correct.
Next, we demonstrate the advantage of a programmable feedback delay. The CSI feedback delay in the TGac system simulator varies randomly between 0 ms and 40 ms, as it would in an actual channel environment, while the feedback delay of the proposed channel emulator is held constant at 20 ms. Fig. 3.15 shows the BER performance with random feedback delay from the channel simulator (pink curves) and with the proposed channel emulator (blue curve). The results differ by at least 3 dB when the delay duration changes. The proposed channel emulator can generate a constant channel feedback delay, which gives stable performance. This function is useful in experimental tests of new MU-BF algorithms that need a constant delay duration.
Figure 3.13: Channel capacity for 4x4 model D TGac
Table 3.2: Simulation Parameters
Parameter                          Value
Simulator                          802.11ac system
Number of transmitter antennas     4
Number of receiver antennas        4
Data length                        100 bytes
Transmit signal bandwidth          80 MHz
Modulation and coding scheme       2
Precoding                          Block Diagonal
Number of iterations               300
Channel decoding                   Hard Viterbi
Channel model                      TGac model D
CSI feedback delay                 Randomly from 0 ms to 40 ms
Figure 3.14: Snapshot of the feedback channel output
Figure 3.15: BER performance of IEEE 802.11ac system
3.4.3 Platform Verification
The platform verification parameters are shown in Table 3.3. In this subsection, we verify the platform in Fig. 3.11. However, to avoid problems related to the synchronization of multiple FPGAs, we synthesize the transmitter and receiver inside the channel emulator FPGA board, which contains five FPGAs on one board. Instead of receiving the transmitted (TX) signal from an external FPGA, we generate the TX signal in FPGA A of the channel emulator board. In this verification, we assume a two-user MIMO system with quadrature phase-shift keying (QPSK) modulation, 0.5 MHz signal bandwidth and TGac channel D.
Table 3.3: Platform Verification Parameters
Parameter                    Value
FPGA board type              Stratix II EP2S180F1508
Oscilloscope                 Tektronix 3032
System model                 MIMO-OFDM system
Number of antennas           2 x 2 for platform verification
Modulation type              QPSK
Channel model                802.11ac model D
Transmit signal bandwidth    0.5 MHz
CSI feedback delay           7.2 ms, 28.7 ms, 100 ms
Simulation time              7926 seconds
Fig. 3.16 shows the MU-BF process, and Fig. 3.17 shows its platform implementation. In the MIMO channel emulator board, FPGA E implements the MIMO channel emulator, which includes the feedforward channel and the feedback channel. The transmitted signals of the two users, x1 and x2, are produced inside FPGA A. FPGA A also calculates the MU-BF signal by convolving the transmitted signals with the feedback channel. This signal is then sent to FPGA B and FPGA C, which convolve the feedforward channel from FPGA E with the MU-BF signal from FPGA A to output x1 and x2. These signals are captured with the Tektronix 3032 oscilloscope. The EVM results of x1 are shown in Fig. 3.18. The EVM of the hardware implementation differs from that of the Matlab simulation by about 1% because of the fixed-point nature of the hardware implementation. As examples, we observe the constellation of x1 in the proposed system on the oscilloscope at delay durations of Td = 7.2 ms, 28.7 ms and 100 ms. Fig. 3.18 shows that the EVM increases continuously as the feedback delay increases, which is consistent with the degree of constellation scattering observed on the oscilloscope.
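For readers who want to reproduce an EVM figure from captured constellation points, the following is a minimal sketch. It assumes the common RMS definition of EVM (error power normalized by reference power); the text does not state which definition was used, and the function name `evm_percent` and the test points are hypothetical.

```python
import math

def evm_percent(received, reference):
    """RMS EVM in percent: 100 * sqrt(sum|r - s|^2 / sum|s|^2)."""
    err = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    ref = sum(abs(s) ** 2 for s in reference)
    return 100.0 * math.sqrt(err / ref)

# Ideal QPSK points with unit average power.
a = 1 / math.sqrt(2)
qpsk = [complex(a, a), complex(-a, a), complex(-a, -a), complex(a, -a)]

# A 1% amplitude error on every point (illustrative distortion only).
slightly_scaled = [1.01 * s for s in qpsk]
evm = evm_percent(slightly_scaled, qpsk)
print(f"EVM = {evm:.3f} %")
```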
Figure 3.16: Overview of the MU beamforming process
Figure 3.17: Platform implementation of MU beamforming process
3.5 Synthesis Results of Proposed Channel Emulator
The synthesis results for the target FPGA, Stratix II EP2S180F1508, are shown in Table 3.4. The efficiency of the single path implementation in reducing the complexity is apparent in this table. The table includes the synthesis results of the single path implementation of the feedforward channel, the single path implementation of both the feedforward and feedback channels, and the parallel processing of the feedforward channel.
Figure 3.18: EVM and constellation of the proposed system
In a parallel implementation, adding a feedback channel output would double the hardware complexity. A single path implementation, however, adds only a few non-sequential elements, even though sequential elements such as registers still double. In the single path implementation, the logic utilization for the feedforward and feedback channels together is only 20%, while the feedforward channel alone takes 15%. The comparison with parallel processing shows how efficient the single path implementation is: the estimated logic utilization of parallel processing is 16,800%, which cannot be fitted into the implementation device, whereas the single path implementation requires only 15%, a reduction by a factor of 1120. Because of the single path processing and the large memory resources available in the FPGA, the platform can further lower the tap spacing needed for higher bandwidth at the expense of a higher operating frequency. The channel emulator targets 802.11ac, for which the Doppler frequency is fixed at 0.414 Hz. Because the Doppler frequency in the IEEE 802.11 standards is small, the proposed model uses the single path implementation, which has the advantage of reducing the hardware resources. If the Doppler frequency is high in another system, a design with more than one processing path can be used.
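The reduction factor quoted above can be checked directly from the utilization figures in the text and the ALUT counts in Table 3.4. The sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Figures quoted in the text and in Table 3.4 (Stratix II EP2S180F1508).
single_path_util_pct = 15         # feedforward channel, single path implementation
parallel_util_pct = 16_800        # estimated, parallel processing for the full model
single_path_aluts = 18_929
parallel_aluts = 21_200_480

reduction_from_util = parallel_util_pct / single_path_util_pct
reduction_from_aluts = parallel_aluts / single_path_aluts
print(reduction_from_util, reduction_from_aluts)
```

Both ratios evaluate to the same factor, so the percentage and the raw ALUT counts are consistent with each other.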
3.6 Summary
In this chapter, we have proposed a 4x4 MU-MIMO channel emulator with automatic CSI feedback, which is necessary for the evaluation of the MU-BF system. Our emulator is based on FPGA technology and rapid prototyping software tools. Synthesis results have shown the efficiency of single path processing. After describing the theoretical model, we have outlined the emulator design and its basic operation. We have also discussed in detail the actual hardware emulator results and compared them with the theoretical ones. The design was implemented on the target Stratix II EP2S180F1508 FPGAs, and the analog results have been verified on an oscilloscope.
Table 3.4: Synthesis Result of Feedforward Channel vs. Feedforward and Feedback Channel
Type                          Feedforward channel         Feedforward and Feedback channel   Parallel processing for full model
Logic utilization             15%                         20%                                16,800%
- Combinational ALUTs         18,929 / 143,520 (13%)      21,663 / 143,520 (15%)             21,200,480 / 143,520 (14,772%)
- Dedicated logic registers   2,552 / 143,520 (2%)        9,328 / 143,520 (6%)               2,858,240 / 143,520 (1,992%)
Total I/O pins                123 / 1,171 (11%)           123 / 1,171 (11%)                  134,400 / 1,171 (11,477%)
DSP blocks                    768 / 768 (100%)            768 / 768 (100%)                   768 / 768 (100%)
Total block memory bits       276,468 / 9,383,040 (3%)    549,928 / 9,383,040 (6%)           309,644,160 / 9,383,040 (3,300%)
Total PLLs                    4 / 12 (33%)                4 / 12 (33%)                       4 / 12 (33%)
Chapter 4
Higher Order QAM Modulation for
Uplink MU-MIMO IDMA Architecture
4.1 Introduction
Interleave division multiple access (IDMA) is one of the multiple access schemes currently being considered for next generation wireless systems. Although IDMA has been studied as a special form of code division multiple access (CDMA) with advantages in supporting a large number of users, it has not been widely used for uplink multiple access because of the difficulties of multi-user detection (MUD).
IDMA distinguishes users by their different interleaver patterns. A distinguishing feature of IDMA is the need for MUD based on turbo-type iterative joint detection and decoding. Previous work on IDMA [4]-[7] suggested the use of BPSK and QPSK modulation. For higher spectral efficiency, some papers recommended superposition coded modulation (SCM), which uses multiple layers of BPSK or QPSK streams and treats them as virtual users [24],[25]. Because the effective number of users to be separated increases, the complexity of this method grows linearly with the number of SCM layers.
In this chapter, we propose a method that transmits a single layer of high order QAM symbols, together with a low-complexity detection scheme at the receiver. We employ log-likelihood ratio (LLR) calculation in the soft mapper and soft de-mapper to quickly separate the bits of one user. This is especially useful for the very high order QAM modulations employed in modern wireless standards.
The soft decision demapper for QAM modulation is itself computationally complex. Hence, we estimate the LLR with a simplified soft-output demapper that uses multiple comparators instead of a highly complex summation of multiple logarithms. This scheme has previously been used in bit-interleaved coded modulation (BICM) based systems such as 802.11 wireless LAN [5].
In this chapter, we explain the operation of higher order QAM modulation for the IDMA system and the throughput with antenna diversity for 16-QAM, 64-QAM and 256-QAM modulation. The performance of the proposed system is shown in terms of BER and hardware complexity compared to SCM-IDMA. Because the proposed system uses a regular QAM mapper, its transmitter architecture is identical to that of the 802.11 system apart from the actual interleaver pattern. Hence, our IDMA system is much easier to integrate into an IEEE 802.11 system than the conventional SCM-IDMA system.
The chapter is organized as follows. Section 4.2 describes the proposed IDMA system. Section 4.3 introduces the iterative MUD with a simplified soft bit computation. Section 4.4 presents the simulation results of the system. Section 4.5 compares the complexity of the SCM-QPSK-IDMA and QAM-IDMA systems, and Section 4.6 concludes the chapter.
4.2 System Overview
The transmitter and receiver structures of the proposed IDMA system with N users transmitting at the same time are shown in Fig. 4.1.
Figure 4.1: Transceiver IDMA system with N users in one antenna k=1
Let $d_n$ be the data sequence of user $n$. The data is encoded by a convolutional code and spread with a repetition code, which generates the chip sequence $c_n$. Then $c_n$ is permuted by the user-specific interleaver $\pi_n$ of user $n$. After symbol mapping, the symbol sequence $x_{n,k} = [x_{n,k}(1), \ldots, x_{n,k}(j), \ldots, x_{n,k}(J)]$ is produced, where $J$ is the frame length and $k$ is the antenna index. Next, the IFFT performs the OFDM modulation onto multiple subcarriers. Finally, a cyclic prefix is inserted into the OFDM symbol to prevent inter-symbol interference (ISI). This OFDM signal is transmitted over the channel.
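The transmit chain described above can be sketched end to end for one user. This is a minimal illustration assuming NumPy; the convolutional encoder is replaced by random coded bits, and the bit labelling in `pam4` (first bit selects the sign, second bit selects the inner level) is an assumption, since the mapping table of Fig. 4.3 is not reproduced in the text. The sizes follow the 16-QAM case used later in the simulation section.

```python
import numpy as np

rng = np.random.default_rng(1)

SP = 16            # spreading length
N_FFT = 1024       # number of subcarriers
CP_LEN = 64        # cyclic prefix length
BITS_PER_SYM = 4   # 16-QAM

# Stand-in for the convolutionally encoded bits of one user (encoder omitted).
coded_bits = rng.integers(0, 2, size=256)

# Repetition spreading with a balanced alternating mask (+1/-1 in bipolar form).
mask = np.tile([0, 1], SP // 2)
chips = np.repeat(coded_bits, SP) ^ np.tile(mask, coded_bits.size)

# User-specific random interleaver: interleaved[m] = chips[perm[m]].
perm = rng.permutation(chips.size)
interleaved = chips[perm]

def pam4(b_sign, b_mag):
    """Assumed Gray labelling per axis: levels -3, -1, +1, +3."""
    return (2 * b_sign - 1) * (3 - 2 * b_mag)

b = interleaved.reshape(-1, BITS_PER_SYM)
symbols = pam4(b[:, 0], b[:, 1]) + 1j * pam4(b[:, 2], b[:, 3])

# OFDM modulation followed by cyclic prefix insertion.
time_signal = np.fft.ifft(symbols, n=N_FFT)
tx = np.concatenate([time_signal[-CP_LEN:], time_signal])
print(chips.size, symbols.size, tx.size)
```

The cyclic prefix is simply a copy of the last `CP_LEN` time samples placed in front of the OFDM symbol, which is what lets the receiver treat the multi-path channel as one complex coefficient per subcarrier.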
At the channel, the transmitted data of each user is affected by multi-path fading with different Rayleigh coefficients. The signals of all users then combine to form the received signal $r_k(j)$.
The superscripts "Re" and "Im" indicate real and imaginary parts, respectively. Then,
$$x_{n,k}(j) = x^{Re}_{n,k}(j) + i\,x^{Im}_{n,k}(j) \tag{4.1}$$
In this chapter, we use 16-QAM, 64-QAM and 256-QAM modulation as examples of general higher order modulations; $x_{n,k}(j)$ denotes the transmitted QAM symbol.
The MUD algorithm has two main parts: the Elementary Signal Estimator (ESE) and the part that updates the mean and variance variables. Exact user separation relies on accurate estimation of these variables, which are sent as feedback to the ESE.
4.3 Iterative Chip-By-Chip Receiver
4.3.1 Elementary Signal Estimator
The IDMA system using higher order QAM modulation proposed in this chapter assumes a multi-path fading channel. Because of the OFDM modulation, we assume that ISI and inter-carrier interference (ICI) are completely eliminated.
The received signal after OFDM demodulation can be expressed as (4.2).
$$y_k(j) = \sum_{n=1}^{N} H_{n,k}(j)\,x_{n,k}(j) + A_k(j) \tag{4.2}$$
where $H_{n,k}(j) = \sum_{l=0}^{L-1} h_{n,k}(l)\,e^{-i 2\pi j l / N_c}$ is the channel coefficient of subcarrier $j$ with $L$ paths, and $A_k(j)$, the FFT of $a_k(j)$, is complex zero-mean AWGN with variance $\sigma^2$. We focus on $x_{n,k}(j)$ and rewrite (4.2) as
$$y_k(j) = H_{n,k}(j)\,x_{n,k}(j) + \zeta_{n,k}(j) \tag{4.3}$$
where
$$\zeta_{n,k}(j) = \sum_{m \neq n} H_{m,k}(j)\,x_{m,k}(j) + A_k(j) \tag{4.4}$$
Denoting the complex conjugate of $H_{n,k}(j)$ by $H^{*}_{n,k}(j)$, we have (4.5).
$$\tilde{y}_{n,k}(j) = H^{*}_{n,k}(j)\,y_k(j) = |H_{n,k}(j)|^2\,x_{n,k}(j) + \tilde{\zeta}_{n,k}(j) \tag{4.5}$$
where
$$\tilde{\zeta}_{n,k}(j) = H^{*}_{n,k}(j)\,\zeta_{n,k}(j) \tag{4.6}$$
Based on the central limit theorem, $\tilde{\zeta}_{n,k}(j)$ can be approximated as a Gaussian variable. This approximation is used by the ESE to generate the LLR for $x_{n,k}(j)$.
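$$\mathrm{LLR}\big(x_{n,k}(j)\big) = 2\,|H_{n,k}(j)|^2\,\frac{\tilde{y}_{n,k}(j) - E\big(\tilde{\zeta}_{n,k}(j)\big)}{\mathrm{Var}\big(\tilde{\zeta}_{n,k}(j)\big)} \tag{4.7}$$
$$E\big(\tilde{\zeta}_{n,k}(j)\big) = H^{*}_{n,k}(j)\,E\big(y_k(j)\big) - |H_{n,k}(j)|^2\,E\big(x_{n,k}(j)\big) \tag{4.8}$$
$$\mathrm{Var}\big(\tilde{\zeta}_{n,k}(j)\big) = R^{T}_{k,n}(j)\,\mathrm{Var}\big(\zeta_{n,k}(j)\big)\,R_{k,n}(j) \tag{4.9}$$
where
$$\mathrm{Var}\big(\zeta_{n,k}(j)\big) = \mathrm{Var}\big(y_k(j)\big) - R_{k,n}(j)\,\mathrm{Var}\big(x_{n,k}(j)\big)\,R^{T}_{k,n}(j) \tag{4.10}$$
$$R_{k,n}(j) = \begin{bmatrix} H^{Re}_{n,k}(j) & -H^{Im}_{n,k}(j) \\ H^{Im}_{n,k}(j) & H^{Re}_{n,k}(j) \end{bmatrix} \tag{4.11}$$
with $E(x_{n,k}(j)) = 0$ and $\mathrm{Var}(x_{n,k}(j)) = I$ in the first iteration. These quantities are also used to update the interference mean and variance in the next iteration, as discussed in detail in the soft mapper section.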
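The role of the interference mean can be seen in a small single-subcarrier sketch. It is a hedged illustration, not the thesis implementation: the noise term is set to zero, the channel and symbols are random, and the helper `ese_estimate` is a hypothetical name. It forms the matched-filter output of (4.5), subtracts the interference mean written as $H^{*}E(y) - |H|^2E(x)$, and normalizes by $|H|^2$ as in (4.12).

```python
import random

random.seed(2)
N = 4                                   # number of users (illustrative)
LEVELS = (-3, -1, 1, 3)                 # 16-QAM levels per axis

# One subcarrier j and one receive antenna k.
H = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
x = [complex(random.choice(LEVELS), random.choice(LEVELS)) for _ in range(N)]
y = sum(h * s for h, s in zip(H, x))    # (4.2) with the noise term set to zero

def ese_estimate(y, H, Ex, n):
    """Normalized estimate of user n's symbol given the symbol means Ex."""
    Ey = sum(h * m for h, m in zip(H, Ex))                      # mean of y
    y_tilde = H[n].conjugate() * y                              # (4.5)
    mean_zeta_tilde = H[n].conjugate() * Ey - abs(H[n]) ** 2 * Ex[n]
    return (y_tilde - mean_zeta_tilde) / abs(H[n]) ** 2

# First iteration: E(x) = 0, so the estimate still contains all interference.
g_first = ese_estimate(y, H, [0j] * N, 0)
# Idealized later iteration: the fed-back means equal the true symbols.
g_ideal = ese_estimate(y, H, x, 0)
print(abs(g_first - x[0]), abs(g_ideal - x[0]))
```

With perfect feedback and no noise the estimate collapses onto the transmitted symbol, which is the fixed point the iterative receiver tries to approach.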
We define the signal $\hat{g}_{n,k}(j)$ as (4.12).
$$\hat{g}_{n,k}(j) = \frac{\tilde{y}_{n,k}(j) - E\big(\tilde{\zeta}_{n,k}(j)\big)}{|H_{n,k}(j)|^2} \tag{4.12}$$
where $E\big(\tilde{\zeta}_{n,k}(j)\big)$ is the mean of $\tilde{\zeta}_{n,k}(j)$.
For demapping, we maximize the probability of bit $b_{n,k}(j)$ given the signal $\hat{g}_{n,k}(j)$, which is defined as $P\big(b_{n,k}(j) \mid \hat{g}_{n,k}(j)\big)$. Using Bayes' rule, we have
$$P\big(b_{n,k}(j) \mid \hat{g}_{n,k}(j)\big) = \frac{P\big(\hat{g}_{n,k}(j) \mid b_{n,k}(j)\big)\,P\big(b_{n,k}(j)\big)}{P\big(\hat{g}_{n,k}(j)\big)} \tag{4.13}$$
As Fig. 4.2 shows, all constellation points are equally likely, so we have
$$P\big(b_{n,k}(j) \mid \hat{g}_{n,k}(j)\big) = P\big(\hat{g}_{n,k}(j) \mid b_{n,k}(j)\big) \tag{4.14}$$
Figure 4.2: 16-QAM constellation in IDMA system
In higher order QAM modulation, we soft de-map the received data using the LLR in (4.15).
$$\mathrm{LLR}\big(b_{I,v,n,k}(j)\big) = \log \frac{P\big(b_{I,v,n,k}(j) = 1 \mid \hat{g}_{n,k}(j)\big)}{P\big(b_{I,v,n,k}(j) = 0 \mid \hat{g}_{n,k}(j)\big)} \tag{4.15}$$
$$\mathrm{LLR}\big(b_{I,v,n,k}(j)\big) = \log \frac{\sum_{\alpha \in S^{(1)}_{I,v,n,k}} p\big(\hat{g}_{n,k}(j) \mid x_{n,k}(j) = \alpha\big)}{\sum_{\alpha \in S^{(0)}_{I,v,n,k}} p\big(\hat{g}_{n,k}(j) \mid x_{n,k}(j) = \alpha\big)} \tag{4.16}$$
where $\alpha$ is a point in the QAM constellation; $S^{(0)}_{I,v,n,k}$ and $S^{(1)}_{I,v,n,k}$ denote the sets of constellation points whose $v$-th in-phase bit is 0 and 1, respectively, where $v$ runs up to half of the number of bits per symbol. $S^{(0)}_{Q,v,n,k}$ and $S^{(1)}_{Q,v,n,k}$ have the same meaning as $S^{(0)}_{I,v,n,k}$ and $S^{(1)}_{I,v,n,k}$, respectively, but for the imaginary component of the symbol. Computing the exact LLR for each bit of a higher order QAM signal involves computing the ratio of sums of probabilities over the constellation. Mathematically, (4.16) has to be evaluated for each bit of the received signal $\hat{g}_{n,k}(j)$ (e.g. computing 8 probabilities in 16-QAM modulation).
Figure 4.3: Mapping table of higher order QAM modulation
A sub-optimal simplified LLR can be obtained with the log-sum approximation $\log \sum_j z_j \approx \max_j \log z_j$. Thus, we have (4.17).
$$\mathrm{LLR}\big(b_{I,v,n,k}(j)\big) \approx \log \frac{\max_{\alpha_I \in S^{(1)}_{I,v,n,k}} p\big(\hat{g}_{I,n,k}(j) \mid x_{n,k}(j) = \alpha_I\big)}{\max_{\alpha_I \in S^{(0)}_{I,v,n,k}} p\big(\hat{g}_{I,n,k}(j) \mid x_{n,k}(j) = \alpha_I\big)} \tag{4.17}$$
$$\mathrm{LLR}\big(b_{I,v,n,k}(j)\big) \approx \frac{1}{4}\left[ \min_{\alpha_I \in S^{(0)}_{I,v,n,k}} \big(\hat{g}_{I,n,k}(j) - \alpha_I\big)^2 - \min_{\alpha_I \in S^{(1)}_{I,v,n,k}} \big(\hat{g}_{I,n,k}(j) - \alpha_I\big)^2 \right] \tag{4.18}$$
$$\triangleq D_{I,v,n,k} \tag{4.19}$$
Obtaining $D_{I,v,n,k}$ and $D_{Q,v,n,k}$ in (4.19) requires multiple computations of the logarithmic function and is therefore highly complex. Thus, in this chapter, we employ a further approximation, illustrated in Fig. 4.2. The mapping tables for 16-QAM, 64-QAM and 256-QAM are shown in Fig. 4.3. The approximate values of $D_{I,v,n,k}$ and $D_{Q,v,n,k}$ for 16-QAM modulation are shown below, writing $\hat{g}_I$ for $\hat{g}_{I,n,k}(j)$.
$$D_{I,1,n,k} = \begin{cases} 2(\hat{g}_I + 1), & \hat{g}_I < -2 \\ \hat{g}_I, & -2 \le \hat{g}_I \le 2 \\ 2(\hat{g}_I - 1), & \hat{g}_I > 2 \end{cases} \tag{4.20}$$
$$D_{I,2,n,k} = -|\hat{g}_I| + 2, \quad \text{for all } \hat{g}_I \tag{4.21}$$
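The piecewise expressions (4.20) and (4.21) can be compared numerically with the minimum-distance form (4.18). The sketch below assumes a labelling in which the first in-phase bit is 1 on the positive half-axis and the second bit is 1 on the inner levels; this labelling is an assumption chosen to be consistent with the signs in (4.20) and (4.21), since the mapping table of Fig. 4.3 is not reproduced in the text.

```python
def d_i1_16qam(g):
    """Piecewise-linear D_{I,1} of (4.20), using two comparators."""
    if g < -2:
        return 2 * (g + 1)
    if g > 2:
        return 2 * (g - 1)
    return g

def d_i2_16qam(g):
    """D_{I,2} of (4.21)."""
    return -abs(g) + 2

def max_log_llr(g, ones, zeros):
    """(4.18): 1/4 * (min over S(0) of (g - a)^2 - min over S(1) of (g - a)^2)."""
    d0 = min((g - a) ** 2 for a in zeros)
    d1 = min((g - a) ** 2 for a in ones)
    return 0.25 * (d0 - d1)

grid = [i / 20 for i in range(-100, 101)]          # g from -5 to 5
err1 = max(abs(d_i1_16qam(g) - max_log_llr(g, (1, 3), (-3, -1))) for g in grid)
err2 = max(abs(d_i2_16qam(g) - max_log_llr(g, (-1, 1), (-3, 3))) for g in grid)
print(err1, err2)
```

Under this labelling the comparator form agrees with (4.18) to within floating-point rounding, so for 16-QAM the saving comes from avoiding the search over constellation points rather than from any extra loss of accuracy.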
For 64-QAM modulation, we use the same method as for 16-QAM, but we calculate the probability of six bits instead of four. Writing $\hat{g}_I$ for $\hat{g}_{I,n,k}(j)$, we have (4.22), (4.23) and (4.24).
$$D_{I,1,n,k} = \begin{cases} 4(\hat{g}_I + 3), & \hat{g}_I < -6 \\ 3(\hat{g}_I + 2), & -6 \le \hat{g}_I < -4 \\ 2(\hat{g}_I + 1), & -4 \le \hat{g}_I < -2 \\ \hat{g}_I, & -2 \le \hat{g}_I \le 2 \\ 2(\hat{g}_I - 1), & 2 < \hat{g}_I \le 4 \\ 3(\hat{g}_I - 2), & 4 < \hat{g}_I \le 6 \\ 4(\hat{g}_I - 3), & \hat{g}_I > 6 \end{cases} \tag{4.22}$$
$$D_{I,2,n,k} = \begin{cases} 2(-|\hat{g}_I| + 3), & |\hat{g}_I| \le 2 \\ 4 - |\hat{g}_I|, & 2 < |\hat{g}_I| \le 6 \\ 2(-|\hat{g}_I| + 5), & |\hat{g}_I| > 6 \end{cases} \tag{4.23}$$
$$D_{I,3,n,k} = \begin{cases} |\hat{g}_I| - 2, & |\hat{g}_I| \le 4 \\ -|\hat{g}_I| + 6, & |\hat{g}_I| > 4 \end{cases} \tag{4.24}$$
$D_{Q,1,n,k}$, $D_{Q,2,n,k}$ and $D_{Q,3,n,k}$ are calculated in the same way as $D_{I,1,n,k}$, $D_{I,2,n,k}$ and $D_{I,3,n,k}$, but $D_{Q,v,n,k}$ is based on the imaginary component of the received signal.
For 256-QAM modulation, we proceed as for 16-QAM and 64-QAM, but we calculate the probability of eight bits. Writing $\hat{g}_I$ for $\hat{g}_{I,n,k}(j)$ and $\mathrm{sgn}(\cdot)$ for the sign function, we have (4.25), (4.26), (4.27) and (4.28).
$$D_{I,1,n,k} = \begin{cases} 8(\hat{g}_I - 7\,\mathrm{sgn}(\hat{g}_I)), & |\hat{g}_I| \ge 14 \\ 7(\hat{g}_I - 6\,\mathrm{sgn}(\hat{g}_I)), & 12 \le |\hat{g}_I| < 14 \\ 6(\hat{g}_I - 5\,\mathrm{sgn}(\hat{g}_I)), & 10 \le |\hat{g}_I| < 12 \\ 5(\hat{g}_I - 4\,\mathrm{sgn}(\hat{g}_I)), & 8 \le |\hat{g}_I| < 10 \\ 4(\hat{g}_I - 3\,\mathrm{sgn}(\hat{g}_I)), & 6 \le |\hat{g}_I| < 8 \\ 3(\hat{g}_I - 2\,\mathrm{sgn}(\hat{g}_I)), & 4 \le |\hat{g}_I| < 6 \\ 2(\hat{g}_I - \mathrm{sgn}(\hat{g}_I)), & 2 \le |\hat{g}_I| < 4 \\ \hat{g}_I, & 0 \le |\hat{g}_I| < 2 \end{cases} \tag{4.25}$$
$$D_{I,2,n,k} = \begin{cases} 4(-|\hat{g}_I| + 11), & |\hat{g}_I| \ge 14 \\ 3(-|\hat{g}_I| + 10), & 12 \le |\hat{g}_I| < 14 \\ 2(-|\hat{g}_I| + 9), & 10 \le |\hat{g}_I| < 12 \\ -|\hat{g}_I| + 8, & 6 \le |\hat{g}_I| < 10 \\ 2(-|\hat{g}_I| + 7), & 4 \le |\hat{g}_I| < 6 \\ 3(-|\hat{g}_I| + 6), & 2 \le |\hat{g}_I| < 4 \\ 4(-|\hat{g}_I| + 5), & 0 \le |\hat{g}_I| < 2 \end{cases} \tag{4.26}$$
$$D_{I,3,n,k} = \begin{cases} 2(-|\hat{g}_I| + 13), & |\hat{g}_I| \ge 14 \\ -|\hat{g}_I| + 12, & 10 \le |\hat{g}_I| < 14 \\ 2(-|\hat{g}_I| + 11), & 8 \le |\hat{g}_I| < 10 \\ 2(|\hat{g}_I| - 5), & 6 \le |\hat{g}_I| < 8 \\ |\hat{g}_I| - 4, & 2 \le |\hat{g}_I| < 6 \\ 2(|\hat{g}_I| - 3), & 0 \le |\hat{g}_I| < 2 \end{cases} \tag{4.27}$$
$$D_{I,4,n,k} = \begin{cases} -|\hat{g}_I| + 14, & |\hat{g}_I| \ge 12 \\ |\hat{g}_I| - 10, & 8 \le |\hat{g}_I| < 12 \\ -|\hat{g}_I| + 6, & 4 \le |\hat{g}_I| < 8 \\ |\hat{g}_I| - 2, & 0 \le |\hat{g}_I| < 4 \end{cases} \tag{4.28}$$
$D_{Q,1,n,k}$, $D_{Q,2,n,k}$, $D_{Q,3,n,k}$ and $D_{Q,4,n,k}$ are calculated in the same way as $D_{I,1,n,k}$, $D_{I,2,n,k}$, $D_{I,3,n,k}$ and $D_{I,4,n,k}$, but $D_{Q,v,n,k}$ is based on the imaginary component of the received signal.
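For 256-QAM the piecewise expressions lend themselves to a table-driven comparator bank. The sketch below is one possible arrangement, not the hardware design: each row stores the upper edge of a segment in $|\hat{g}_I|$ together with a slope and an offset, read with the minus signs restored in (4.26) to (4.28) and with $D_{I,1}$ taken as odd-symmetric. The bit labelling in `LABELS` is an assumption used only to compare the table with the minimum-distance form (4.18).

```python
import bisect
import math

INF = math.inf
# Rows: (upper bound on |g|, slope, offset), so that D = slope * |g| + offset.
SEGMENTS = {
    1: [(2, 1, 0), (4, 2, -2), (6, 3, -6), (8, 4, -12),
        (10, 5, -20), (12, 6, -30), (14, 7, -42), (INF, 8, -56)],  # (4.25), times sgn(g)
    2: [(2, -4, 20), (4, -3, 18), (6, -2, 14), (10, -1, 8),
        (12, -2, 18), (14, -3, 30), (INF, -4, 44)],                 # (4.26)
    3: [(2, 2, -6), (6, 1, -4), (8, 2, -10), (10, -2, 22),
        (14, -1, 12), (INF, -2, 26)],                               # (4.27)
    4: [(4, 1, -2), (8, -1, 6), (12, 1, -10), (INF, -1, 14)],       # (4.28)
}

def d_256qam(g, v):
    """Approximate LLR of in-phase bit v for 256-QAM via segment lookup."""
    a = abs(g)
    rows = SEGMENTS[v]
    idx = bisect.bisect_right([r[0] for r in rows], a)   # which segment |g| falls in
    _, slope, offset = rows[idx]
    d = slope * a + offset
    return math.copysign(d, g) if v == 1 else d

def max_log_llr(g, ones, zeros):
    """(4.18) evaluated by exhaustive search over the axis levels."""
    return 0.25 * (min((g - z) ** 2 for z in zeros) - min((g - o) ** 2 for o in ones))

def both_signs(mags):
    return [-m for m in mags] + list(mags)

odd = list(range(1, 16, 2))
LABELS = {  # assumed labelling: (levels with bit = 1, levels with bit = 0)
    1: (odd, [-m for m in odd]),
    2: (both_signs([1, 3, 5, 7]), both_signs([9, 11, 13, 15])),
    3: (both_signs([5, 7, 9, 11]), both_signs([1, 3, 13, 15])),
    4: (both_signs([3, 5, 11, 13]), both_signs([1, 7, 9, 15])),
}
grid = [i / 20 for i in range(-340, 341)]                # g from -17 to 17
worst = max(abs(d_256qam(g, v) - max_log_llr(g, *LABELS[v]))
            for v in (1, 2, 3, 4) for g in grid)
print(worst)
```

Each lookup needs one segment search and one multiply-add, whereas the exhaustive form compares against all sixteen levels per bit.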
From (4.7) and (4.12), we obtain the ESE output as in (4.29).
$$\hat{b}^{Re}_{n,k}(j) = \frac{2\,|H_{n,k}(j)|^4\,D_{I,v,n,k}}{\mathrm{Var}\big(\tilde{\zeta}_{n,k}(j)\big)} \tag{4.29}$$
$\hat{b}^{Im}_{n,k}(j)$ can be generated in a similar way.
4.3.2 Extrinsic LLR Calculation
After the LLRs are calculated, the corresponding ESE outputs $\hat{b}_{n,k}(j)$ are de-interleaved with the same interleaver index as the transmitter to form $\hat{c}_{n,k}(j)$.
From (4.29), the extrinsic LLR can be calculated. After an initial estimate of the transmitted symbols for all STAs, each STA's transmitted sequence is decoded. For STA $n$, the deinterleaving performed by the receiver is expressed as
$$\hat{c}_{n,k}(j) = \hat{b}_{n,k}\big(\pi_n^{-1}(j)\big) \tag{4.30}$$
where $\hat{b}_{n,k}(j)$ are the LLRs produced by the ESE processing and $\pi_n^{-1}(j)$ is the deinterleaver address for the $j$-th position.
Given the deinterleaved ESE output $\hat{c}_{n,k}(j)$, the despread output is
$$\hat{a}_{n,k}(i) = \sum_{sp=0}^{SP-1} \hat{c}_{n,k}\big(i \cdot SP + sp\big) \tag{4.31}$$
where $i = \lfloor j/SP \rfloor$, $i = 0, 1, \ldots, (J/SP - 1)$, indexes the despread data and $\lfloor \cdot \rfloor$ is the floor operation.
The spreading can be done as
$$\tilde{c}_{n,k}(j) = \sum_{sp=0}^{SP-1} \hat{c}_{n,k}\Big(\Big\lfloor \frac{j}{SP} \Big\rfloor \cdot SP + sp\Big) \tag{4.32}$$
The extrinsic LLR is calculated as the difference between $\tilde{c}_{n,k}(j)$ and $\hat{c}_{n,k}(j)$, followed by the interleaver:
$$\varepsilon_{n,k}(j) = \tilde{c}_{n,k}\big(\pi_n(j)\big) - \hat{c}_{n,k}\big(\pi_n(j)\big) \tag{4.33}$$
At the final iteration, channel decoding of the data is performed to produce the estimate $\hat{d}_n$ of the transmitted bits. In this chapter, we use the Viterbi algorithm for the channel decoder.
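The chain (4.30) to (4.33) can be sketched in a few lines. The example uses random numbers as stand-ins for the ESE output, a small frame for readability, and the convention `interleaved[m] = chips[perm[m]]`, which is an assumption about how the interleaver addresses are stored. As in (4.31), the chips are summed directly, so the sign of the spreading sequence is taken to be already removed.

```python
import random

random.seed(3)
SP = 16                      # spreading length
J = 64                       # chips per frame (illustrative, a multiple of SP)

perm = list(range(J))
random.shuffle(perm)         # assumed convention: interleaved[m] = chips[perm[m]]
b_hat = [random.gauss(0, 1) for _ in range(J)]   # stand-in for ESE output LLRs

# (4.30) de-interleave: put every LLR back at its pre-interleaving position.
c_hat = [0.0] * J
for m, j in enumerate(perm):
    c_hat[j] = b_hat[m]

# (4.31) de-spread: one sum per coded bit.
a_hat = [sum(c_hat[i * SP:(i + 1) * SP]) for i in range(J // SP)]

# (4.32) re-spread: copy each sum back to chip rate.
c_tilde = [a_hat[j // SP] for j in range(J)]

# (4.33) extrinsic LLR: remove each chip's own contribution, then re-interleave.
eps = [c_tilde[j] - c_hat[j] for j in perm]
print(len(a_hat), len(eps))
```

Subtracting each chip's own LLR is what makes the fed-back value extrinsic: every chip is informed only by the other chips of the same coded bit.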
4.3.3 Interleaver
The interleaver is a key component in the design of an IDMA system. The interleavers assigned to the users should be efficient and as simple as possible. Interleaver indices have to be unique and distinguishable from each other as well as easy to implement. The interleaver used in this chapter is a random interleaver: the interleaving patterns for the users are generated randomly. These patterns allow the system to uniquely identify each user during the MUD process.
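A random interleaver of this kind can be generated with a seeded shuffle. The sketch below is illustrative only; the helper names and the use of Python's `random.Random` are assumptions, and a hardware design would typically store the resulting address table in memory.

```python
import random

def make_interleavers(num_users, length, seed=0):
    """One independent random permutation per user."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_users):
        p = list(range(length))
        rng.shuffle(p)
        perms.append(p)
    return perms

def interleave(chips, perm):
    return [chips[i] for i in perm]

def deinterleave(chips, perm):
    out = [None] * len(perm)
    for m, i in enumerate(perm):
        out[i] = chips[m]
    return out

perms = make_interleavers(num_users=16, length=4096)
chips = list(range(4096))
restored = deinterleave(interleave(chips, perms[0]), perms[0])
all_distinct = len({tuple(p) for p in perms}) == len(perms)
print(restored == chips, all_distinct)
```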
4.3.4 Antenna Diversity
To improve the performance of higher order modulation in the IDMA system, we apply antenna diversity with two antennas, $y_1$ and $y_2$. We use maximal ratio combining (MRC) with post-FFT processing, combining the LLRs after de-interleaving. The detailed system is presented in Fig. 4.4.
The total signal of the $n$-th user at the output of the de-interleaver, combined over the antenna elements $k$, is given by (4.34).
$$\hat{a}_n(i) = \sum_{k=1}^{K} \hat{c}_{n,k}(j) \tag{4.34}$$
where $K$ is the total number of antennas and $\hat{c}_{n,k}(j)$ is the output value of the de-interleaver.
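The combining step can be sketched as a sum of de-interleaved chip LLRs over antennas followed by de-spreading. This is a minimal illustration with made-up LLR values; the function name `combine_antennas` is hypothetical, and the de-spreading follows the per-bit sum of (4.31).

```python
def combine_antennas(c_hat_per_antenna, sp):
    """Sum chip LLRs over the K antennas as in (4.34), then de-spread per coded bit."""
    combined = [sum(chips) for chips in zip(*c_hat_per_antenna)]
    return [sum(combined[i:i + sp]) for i in range(0, len(combined), sp)]

# Two antennas, two coded bits, spreading length 16 (illustrative values).
ant1 = [0.5] * 16 + [-0.2] * 16     # antenna 1: first bit fairly reliable
ant2 = [0.3] * 16 + [-0.6] * 16     # antenna 2: second bit more reliable
a_hat = combine_antennas([ant1, ant2], sp=16)
print(a_hat)
```

Because the chip LLRs already carry the channel-dependent weighting from the ESE, adding them lets the antenna with the stronger channel dominate the combined decision.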
Figure 4.4: IDMA system with antenna diversity
4.3.5 Soft mapper
An important part of the IDMA system is the soft mapper, which maps the LLRs of the bits to the constellation as described in Fig. 4.5. Its output is the mean and variance used in the next iteration of the ESE. The soft mapper operates in four steps:
• Step 1: Calculating the probability of each bit with known LLR values.
• Step 2: Calculating the probability of each symbol.
• Step 3: Mapping the probability of each symbol to the constellation.
• Step 4: Calculating the mean of bits.
The output of the de-spreading is the extrinsic LLRs for $\hat{g}_{n,k}(j)$. These LLRs are then used to generate the updated mean as in (4.35) and the updated variance as in (4.36).
$$E\big(x_{n,k}(j)\big) = \sum_{N_c=0}^{2^{N_{bpsc}}-1} (p + iq)_{N_c} \left( \frac{e^{\varepsilon_{n,k}(j)}}{1 + e^{\varepsilon_{n,k}(j)}} \right)_{N_c} \tag{4.35}$$
where $N_c$ indexes the points in the constellation diagram, $N_{bpsc}$ is the number of bits per symbol, and $p$ and $q$ are the values taken on the I and Q axes (e.g. the values are {-3, -1, +1, +3} for 16-QAM).
$$\mathrm{Var}\big(x_{n,k}(j)\big) = \mathrm{Var}\big(\alpha_{n,k}(j)\big) - E\big(x_{n,k}(j)\big)^2 \tag{4.36}$$
where $\mathrm{Var}\big(\alpha_{n,k}(j)\big)$ is the variance of the QAM symbol. $E\big(x_{n,k}(j)\big)$ and $\mathrm{Var}\big(x_{n,k}(j)\big)$ are updated in (4.8) and (4.10), respectively, to calculate the LLR for $x_{n,k}(j)$.
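The four steps of the soft mapper can be sketched for 16-QAM, one axis at a time. This is a hedged illustration under stated assumptions: the bit labelling in `BITS` is assumed (sign bit, then a magnitude bit that is 1 on the inner level), the bit probability uses the usual relation $P(b=1) = e^{L}/(1+e^{L})$, and the variance is computed with the standard identity second moment minus squared mean, which may differ in form from the expression in the text.

```python
import math

LEVELS = (-3.0, -1.0, 1.0, 3.0)
# Assumed labelling per axis: (sign bit, magnitude bit); magnitude bit 1 = inner level.
BITS = ((0, 0), (0, 1), (1, 1), (1, 0))

def soft_map_axis(llr_sign, llr_mag):
    """Mean and variance of one PAM-4 axis from the extrinsic LLRs of its two bits."""
    p1 = [1.0 / (1.0 + math.exp(-l)) for l in (llr_sign, llr_mag)]   # Step 1
    p_level = []
    for bits in BITS:                                                 # Step 2
        p = 1.0
        for bit, prob_one in zip(bits, p1):
            p *= prob_one if bit == 1 else 1.0 - prob_one
        p_level.append(p)
    mean = sum(a * p for a, p in zip(LEVELS, p_level))                # Steps 3 and 4
    var = sum(a * a * p for a, p in zip(LEVELS, p_level)) - mean ** 2
    return mean, var

def soft_map_16qam(llrs):
    """llrs = (I sign, I magnitude, Q sign, Q magnitude) -> (E(x), (Var_I, Var_Q))."""
    m_i, v_i = soft_map_axis(llrs[0], llrs[1])
    m_q, v_q = soft_map_axis(llrs[2], llrs[3])
    return complex(m_i, m_q), (v_i, v_q)

print(soft_map_16qam((0.0, 0.0, 0.0, 0.0)))      # no prior knowledge
print(soft_map_16qam((50.0, -50.0, -50.0, 50.0)))  # near-certain bits
```

With all LLRs at zero the mean is zero and the per-axis variance equals the average level energy, which matches the first-iteration initialization of the ESE; as the LLRs grow, the mean moves onto a constellation point and the variance shrinks toward zero.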
Figure 4.5: Multiuser detection algorithm
Table 4.1: Simulation Parameters of Higher Order QAM IDMA System
Parameter                       Value
Packet data size [bit]          128 (16-QAM), 192 (64-QAM), 256 (256-QAM)
Number of users                 16, 10, 7
Spreading length                16
Number of iterations in MUD     10
Number of symbols               1024
Number of AP antennas           2
Channel model                   Rayleigh channel (9 paths)
Modulation                      QPSK, 16-QAM, 64-QAM and 256-QAM
Cyclic Prefix                   64
Convolution Code                K=1/2, L=7, [171 133]
Number of block simulation      1000
4.4 Simulation Results of QAM IDMA System
The IDMA system is simulated to assess its performance with higher order QAM modulations such as 16-QAM, 64-QAM and 256-QAM. The simulation parameters are listed in Table 4.1.
For 16-QAM modulation, the data length is 128 bits. The data is encoded with the rate-1/2 convolutional code to produce 256 coded bits. With a spreading length of 16, the coded bits are spread to 4096 chips. All users employ the same spreading sequence, which contains a balanced number of +1 and -1. After spreading, the chips of each user are interleaved by a user-specific interleaver, which is randomly and independently generated with a length of 4096. Next, these chips are mapped to higher order QAM symbols. The OFDM symbol of each user is modulated onto multiple sub-carriers using the IFFT. The total number of sub-carriers Nc is set to 1024 for every type of modulation, and a cyclic prefix of 64 is inserted. Multi-path Rayleigh fading channels are used in this simulation. At the receiver side, the FFT is performed prior to the iterative MUD. The number of iterations is fixed at 10 to guarantee convergence.
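The frame dimensions follow from the packet sizes in Table 4.1. The short sketch below repeats the arithmetic for the three modulations, assuming no tail or padding bits are added by the encoder (the text does not mention any); the names are illustrative.

```python
SPREADING = 16
CODE_RATE_INV = 2     # rate-1/2 convolutional code: two coded bits per data bit

# (packet data size in bits, bits per QAM symbol), taken from Table 4.1.
configs = {"16-QAM": (128, 4), "64-QAM": (192, 6), "256-QAM": (256, 8)}

def frame_sizes(data_bits, bits_per_symbol):
    coded = data_bits * CODE_RATE_INV
    chips = coded * SPREADING
    symbols = chips // bits_per_symbol
    return coded, chips, symbols

sizes = {name: frame_sizes(*cfg) for name, cfg in configs.items()}
for name, (coded, chips, symbols) in sizes.items():
    print(name, coded, chips, symbols)
```

Under this assumption every packet size maps onto exactly 1024 QAM symbols, one per sub-carrier, which is why the number of sub-carriers can stay fixed across modulations.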
In Fig. 4.6, we compare the performance of 2-layer SCM-QPSK and 16-QAM modulation in an IDMA system with one antenna. Note that the total throughput per user is equal in both cases. However, because the convergence of the two methods differs, the number of users shown in this figure is the highest number of users for which each algorithm properly converges. The figure shows that the performance of the proposed algorithm differs from that of SCM-QPSK by only about 1 dB to 2 dB at a BER of $10^{-4}$. The complexity of the proposed algorithm, however, is much lower than that of SCM-QPSK, as shown in the next section.
To overcome the reduction in the number of parallel users when high order modulation such as 256-QAM is employed, we supplement the system with the antenna diversity in (4.34). Antenna diversity is also especially effective in severe fading situations, which can degrade the performance of a wireless system.
Fig. 4.7 shows the performance of the proposed system with the high number of users made possible by using two antennas. In this system, 16-QAM, 64-QAM and 256-QAM modulations can support up to 16 users, 10 users and 7 users, respectively, with good performance. These advantages are mainly due to the use of MUD and antenna diversity.
In a realistic multiple access system, each user has a different channel condition and different capability, which leads to a multiple access transmission in which each user chooses its modulation order independently. To show the performance of the proposed system in this scenario, we simulate a system with mixed modulation consisting of QPSK, 16-QAM, 64-QAM and 256-QAM. We select a total of 24 users, of which 15 users use QPSK, 4 users use 16-QAM, 3 users use 64-QAM and 2 users use 256-QAM. The receiver is assumed to have 2 antennas. The result in Fig. 4.8 shows that the OFDM-IDMA system can support the realistic scenario in which users choose their modulation independently.
Figure 4.6: Performance of SCM-QPSK and 16-QAM modulation with one antenna
Figure 4.7: Performance of Higher order QAM modulation with two antennas
Figure 4.8: Performance in mixed modulation for IDMA system
4.5 Complexity Comparison between SCM and QAM Modulation
According to the ESE algorithm for QPSK, SCM-QPSK modulation requires 32 multiplications, 20 additions/subtractions and 2 divisions [4]. The simplified-LLR higher order QAM modulation presented in this chapter has the following hardware complexities per chip, per user and per iteration: 16-QAM modulation requires 32 multiplications, 36 additions/subtractions and 2 divisions; 64-QAM modulation requires 32 multiplications, 72 additions/subtractions and 2 divisions; and 256-QAM modulation requires 32 multiplications, 136 additions/subtractions and 2 divisions.
The complexity of the IDMA receiver with 10 iterations is summarized in Table 4.2. Note that the effect of the complexity of the approximate LLR in the proposed system is reflected in the number of multiplications.
In SCM-QPSK modulation with 6 users and 2 streams per user, there are 12 streams in total. QPSK modulation carries 2 bits per symbol, so the total number of bits is 24. This is equivalent to the proposed 16-QAM system with 6 users, the proposed 64-QAM system with 4 users and the proposed 256-QAM system with 3 users.
Table 4.2: Complexity Comparison between SCM and QAM Modulation
Parameters               SCM-QPSK              16-QAM   64-QAM   256-QAM
Number of users          9 (x2 streams/user)   9        6        4
Multiplications          5760                  2880     1920     1280
Additions/Subtractions   3600                  3240     4320     3040
Divisions                360                   180      120      80
According to the results in Table 4.2, we can conclude that the more bits per symbol the higher order QAM modulation carries, the lower the overall complexity of the proposed IDMA system. For the same number of transmitted bits, the complexity of 256-QAM modulation is about 25% of that of SCM-QPSK modulation.
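The entries of Table 4.2 can be related to the per-chip operation counts quoted earlier in this section. The sketch below assumes that each table entry equals the per-chip count multiplied by the number of parallel streams and by the 10 iterations; this scaling is an inference from the numbers, not something the text states. Under that assumption all multiplication and division entries and the first three addition entries are reproduced, while the 256-QAM additions come out as 5440 rather than the 3040 listed in the table.

```python
ITERATIONS = 10
# Per chip, per user (or stream), per iteration:
# (multiplications, additions/subtractions, divisions), as quoted in the text.
PER_CHIP = {
    "SCM-QPSK": (32, 20, 2),
    "16-QAM": (32, 36, 2),
    "64-QAM": (32, 72, 2),
    "256-QAM": (32, 136, 2),
}
# Streams detected in parallel, from Table 4.2 (SCM-QPSK: 9 users x 2 streams).
STREAMS = {"SCM-QPSK": 18, "16-QAM": 9, "64-QAM": 6, "256-QAM": 4}

def totals(scheme):
    return tuple(op * STREAMS[scheme] * ITERATIONS for op in PER_CHIP[scheme])

table = {scheme: totals(scheme) for scheme in PER_CHIP}
for scheme, (mul, add, div) in table.items():
    print(f"{scheme:9s} mul={mul:5d} add={add:5d} div={div:4d}")

mul_ratio = table["256-QAM"][0] / table["SCM-QPSK"][0]
print(f"256-QAM multiplications relative to SCM-QPSK: {mul_ratio:.0%}")
```

The multiplication ratio of roughly 22% is in line with the figure of about 25% given above.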
4.6 Summary
In this chapter, the principles of the IDMA scheme for higher order QAM modulation have been presented. The IDMA system uses turbo-type iterative interference cancellation, which improves the performance and supports many users. SCM-IDMA can be used to improve the efficiency, but its structure is very complex. We have proposed a simplified LLR computation to reduce the complexity of the calculation in QAM modulation. One reason why QAM modulation has not been implemented in IDMA systems so far is the poor performance of QAM-IDMA. This chapter has also shown that antenna diversity is effective in improving the performance of QAM-IDMA.
Chapter 5
Interleaved Domain Interference
Canceller for Low Latency IDMA
System
5.1 Introduction
IDMA is a special form of code division multiple access (CDMA) in which the receiver differentiates the stations (STAs) by their unique interleaving patterns instead of by spreading codes. This leads to a low complexity receiver whose complexity grows linearly with the number of parallel STAs supported [10]. In the simplest case, the hardware complexity of the IDMA transmitter is very similar to that of a regular OFDMA or multi-carrier CDMA transmitter. The receiver, however, is recursive and requires deep memory hardware. The main problem that needs to be addressed in designing an IDMA system is the latency caused by the interleaving process. For the interleavers proposed in the literature so far, both the interleaving and de-interleaving operations permute sequences serially, which takes many hardware clock periods. This leads to high processing latency and low processing throughput, and it has been the bottleneck of the system throughput, especially when the number of iterations is large. The interference cancellation updates the extrinsic log likelihood ratios (LLRs) using the previous LLR values to improve performance. Reducing the latency of each iteration therefore has a significant effect, because parallel processing cannot be employed to speed up the interference cancellation. In addition, the reduction of latency is particularly important in the case of IEEE 802.11 systems. The standard defines a short interframe space (SIFS) of 16 µs, within which a wireless interface must process a received frame and respond with a response frame. In a practical IDMA system, however, each iteration of the interference cancellation consists of an interleaving and a deinterleaving process, which would cause a latency much higher than the defined SIFS. This problem is a major obstacle to the adoption of IDMA in commercial devices such as IEEE 802.11 devices.
Several papers have proposed different methods to reduce the latency of IDMA [30, 31, 32]. In [30], the latency is reduced by using grouped spread IDMA to decrease the number of users that participate in the iteration process. Although grouped spread IDMA has low latency and low complexity, its bit error rate (BER) performance is worse than that of an IDMA system that uses a small number of iterations. Parallel interleavers for user separation are proposed in [31] to improve the throughput. However, the correlation properties of these interleavers are very poor, which degrades the BER performance [31]. In [32], the author demonstrated the feasibility of implementing IDMA in current large scale integration (LSI) technology and proposed dual-frame processing to reduce the latency due to the waiting time that occurs in the interleaver and deinterleaver memory units. This is done by doubling the size of the random-access memory (RAM) blocks so that two frames are processed simultaneously. The paper [32] used the waiting time to transmit two frames and thereby doubled the throughput, but it cannot reduce the latency within an iteration of the interference cancellation. In contrast, our proposed architecture halves the latency by simplifying the architecture, without the need to double the RAM size. This architecture, called the interleaved domain architecture, calculates the updated extrinsic LLRs used to detect users in the interleaved domain, without the deinterleaver in the iteration of the interference canceller. As a result, the proposed architecture increases the throughput by halving the latency without increasing the complexity.
The rest of the chapter is organized as follows. In Section 5.2, we give an overview of the IDMA system. Section 5.3 describes the proposed IDMA receiver architecture in detail. In Section 5.4, we derive the hardware implementation of the proposed architecture. The results are shown in Section 5.5. Lastly, we conclude this chapter in Section 5.6.
[Figure 5.1 block labels: Soft mapper; Deinterleaver; Despreader; Elementary Signal
Estimator (ESE); Spreader; Interleaver; IDMA interference cancellation for the n-th user]
Figure 5.1: Conventional architecture of IDMA receiver
5.2 Latency Analysis
In this section, we focus on the interference canceller of the IDMA receiver shown in
Fig. 5.1. In the interference canceller, the extrinsic LLR is calculated to generate the
updated variables for the ESE in the next iteration.
Each iteration of the interference cancellation involves the following processes:
– ESE
– Deinterleaver
– Despreader
– Spreader
– Extrinsic LLR computation
– Interleaver
– Soft mapper
From the received signal $y_k(j)$, the first process computes an initial estimate
of each user's data bits using (4.29) to obtain $\hat{b}_{n,k}(j)$. The next step is the
deinterleaver shown in (4.30). Because of the writing process of the deinterleaver, the memory
operations need J cycles, which is equal to the frame size. The despreading step expressed
in (4.31) is an accumulator operation with a small latency equal to the spreading
factor SP. The computation of the extrinsic LLR shown in (4.33) includes the interleaving,
which again needs J cycles. Lastly, the feedback update variable in (4.35)–(4.36)
also has negligible latency because it uses a lookup table. The sum of the soft mapper
delay and the ESE delay is Ctrl cycles. In our design, Ctrl equals 14 cycles:
6 cycles caused by the soft mapper and 8 cycles caused by the ESE. Since the
interleaving/deinterleaving length is very large compared to the spreading length
and the arithmetic computation, the largest latency of the IDMA system is in the interleaver
and deinterleaver, with 2×J delayed cycles for the conventional architecture. Table 5.1
summarizes the latency.

Table 5.1: Summary of Latency
Type | Operation cycles
ESE processing and soft mapper | Ctrl
Deinterleaver | J
Despreader | SP
Spreader | 0
Extrinsic LLR computation | 0
Interleaver | J
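The tally in Table 5.1 can be reproduced with a few lines of Python. This is only an illustrative sketch: the function and dictionary names are ours, and the values J = 8192 and SP = 16 are taken from the simulation parameters in Table 5.3.

```python
# Per-iteration latency of the conventional interference canceller,
# tallied from the entries of Table 5.1. All names are illustrative
# and do not come from the hardware design itself.

def conventional_cycles_per_iteration(J, SP, ctrl=14):
    """Return (total, breakdown) in clock cycles for one iteration."""
    breakdown = {
        "ESE + soft mapper": ctrl,  # 8 cycles (ESE) + 6 cycles (soft mapper)
        "deinterleaver": J,         # one memory write per symbol of the frame
        "despreader": SP,           # accumulation over one spreading codeword
        "spreader": 0,
        "extrinsic LLR": 0,
        "interleaver": J,           # second pass through memory
    }
    return sum(breakdown.values()), breakdown

total, parts = conventional_cycles_per_iteration(J=8192, SP=16)
memory_share = (parts["deinterleaver"] + parts["interleaver"]) / total
print(total)                   # 16414
print(round(memory_share, 4))  # 0.9982
```

With these parameters the two memory passes account for more than 99% of one iteration, which is why the rest of the chapter concentrates on the interleaver and deinterleaver.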
5.3 Proposed Interleaved Domain Architecture
The relation between the interleaver and deinterleaver can be expressed as follows:
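\begin{align}
\hat{c}_{n,k}(j) = \hat{b}_{n,k}\left(\pi_n^{-1}(j)\right), \qquad
\hat{c}_{n,k}\left(\pi_n(j)\right) = \hat{b}_{n,k}(j) \tag{5.1}
\end{align}
On the other hand, the extrinsic LLR can be calculated as
\begin{align}
\varepsilon\left(x_{n,k}(j)\right)
&= \sum_{sp=0}^{SP-1} \hat{c}_{n,k}\left(\left\lfloor \frac{\pi_n(j)}{SP} \right\rfloor \times SP + sp\right)
 - \hat{b}_{n,k}\left(\pi_n^{-1}\left(\pi_n(j)\right)\right) \tag{5.2} \\
&= \sum_{sp=0}^{SP-1} \hat{c}_{n,k}\left(\left\lfloor \frac{\pi_n(j)}{SP} \right\rfloor \times SP + sp\right)
 - \hat{b}_{n,k}(j) \tag{5.3} \\
&= \sum_{sp=0}^{SP-1} \hat{c}_{n,k}\left(\left\lfloor \frac{\pi_n(j)}{SP} \right\rfloor \times SP + sp\right)
 - \hat{c}_{n,k}\left(\pi_n(j)\right) \tag{5.4}
\end{align}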
As shown in (5.4), the extrinsic LLR can be calculated by subtracting the current data
from the sum of all data in one spreading codeword. The sum of data in one spreading
codeword is $\sum_{sp=0}^{SP-1} \hat{c}_{n,k}(\lfloor \pi_n(j)/SP \rfloor \times SP + sp)$
and the current data is $\hat{c}_{n,k}(\pi_n(j))$.
The data $\hat{c}_{n,k}$, which is the data after the deinterleaver, is used instead of both
$\tilde{c}_{n,k}$ and $\hat{c}_{n,k}$ as in (4.33). The interleaver address $\pi_n(j)$ can be
calculated by the algebraic interleaver [34] from the sequence address $j$. Note that the
received signal $y_k(j)$ and the channel $H_{n,k}(j)$, which are used to calculate the ESE,
are interleaved signals. If the interference canceller can be processed in the interleaved
domain, the latency can be significantly reduced. In the original IDMA system, the data has to
be deinterleaved before it is processed at the despreader, and it has to be interleaved again
to calculate the updated LLRs. Thus, the deinterleaver and the interleaver have to be
processed sequentially in each iteration. According to (5.4), the deinterleaver, the
despreader, the spreader and the interleaver can be combined and processed
concurrently. Instead of using deinterleaved addresses to read the LLRs for despreading,
the interleaved domain architecture uses generated interleaved addresses to read these data
and calculate the extrinsic LLR. Therefore, the output of the proposed extrinsic LLR
calculation is the interleaved data. Fig. 5.2 presents the interleaved domain architecture in
the IDMA receiver. The deinterleaver, the despreader, the spreader and the interleaver in the
interference canceller are replaced by the interleaved domain block to reduce the latency
by half.
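The equivalence stated by (5.4) can be checked numerically. The sketch below is a software model only, not the hardware design: a random permutation stands in for the algebraic interleaver of [34], the LLR values are random numbers, and all function names are hypothetical. It compares the conventional chain (deinterleave, despread, spread, subtract, interleave) with a direct evaluation of (5.4).

```python
import random

def conventional_extrinsic(b_hat, pi, SP):
    """Deinterleave -> despread -> spread -> subtract -> interleave."""
    J = len(b_hat)
    c_hat = [0.0] * J
    for j in range(J):                      # deinterleaver, relation (5.1)
        c_hat[pi[j]] = b_hat[j]
    # despreader: one sum per spreading codeword
    despread = [sum(c_hat[i * SP:(i + 1) * SP]) for i in range(J // SP)]
    # spreader and subtraction in the deinterleaved (chip) domain
    ext_chip = [despread[m // SP] - c_hat[m] for m in range(J)]
    # interleaver: return to the interleaved domain
    return [ext_chip[pi[j]] for j in range(J)]

def interleaved_domain_extrinsic(b_hat, pi, SP):
    """Evaluate (5.4) directly, reading with interleaved addresses."""
    J = len(b_hat)
    c_hat = [0.0] * J
    for j in range(J):                      # the only memory write pass
        c_hat[pi[j]] = b_hat[j]
    out = []
    for j in range(J):
        base = (pi[j] // SP) * SP           # start of the spreading codeword
        out.append(sum(c_hat[base:base + SP]) - c_hat[pi[j]])
    return out

rng = random.Random(0)
J, SP = 64, 4                               # small illustrative sizes
pi = list(range(J))
rng.shuffle(pi)                             # stand-in for the interleaver of [34]
b_hat = [rng.gauss(0.0, 1.0) for _ in range(J)]

a = conventional_extrinsic(b_hat, pi, SP)
b = interleaved_domain_extrinsic(b_hat, pi, SP)
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True
```

Both functions produce the same interleaved-domain output, but the second one needs a single pass of memory writes instead of two.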
In (5.4), the data in one spreading codeword must be read simultaneously for despreading.
This means that SP data are read at the same time. Although a multiple-port
register can read SP data simultaneously, its implementation on a field
programmable gate array (FPGA) is currently not feasible because it requires too many
hardware resources. Therefore, we propose to use multiple RAMs instead of a multiple-port
register for low complexity. The number of RAMs is equal to the spreading length SP. The
memory size of each RAM is J/SP. Thus, the total memory size of the SP RAMs is J.
By using the RAMs, (5.4) is rewritten as (5.5), where $\hat{c}_{n,k,m}$ is the data of the
n-th user at the k-th antenna in the m-th RAM. The modulo of $\pi_n(j)$ by SP is used to
determine the RAM which stores the current data.
6RIWPDSSHU
(OHPHQWDU\6LJQDO
(VWLPDWRU(6(
,'0$,QWHUIHUHQFH&DQFHOODWLRQIRUXVHUQWK
,QWHUOHDYHG'RPDLQ
$UFKLWHFWXUH
෠ܾ௡ሺ݆ሻ
ߝ௡ሺ݆ሻ
ොܽ௡ሺ݆ሻ ܧሺݔ௡ሺ݆ሻሻܸܽݎሺݔ௡ሺ݆ሻሻ
௡
Figure 5.2: Proposed architecture of IDMA receiver
(xn;k( j)) =
SP 1X
m=0
cˆn;k;m
 jn( j)
SP
k!
  cˆn;k;(n( j)%SP)
 jn( j)
SP
k!
(5.5)
The deinterleaver and the interleaver are omitted in the proposed architecture. Thus,
the reading for the extrinsic LLR calculation in the current iteration and the writing for
the updated LLR calculation in the next iteration use the same RAM address. In two
consecutive iterations, the data is read and written at the same time, and each data item is
read SP times in random order. If only one RAM were used, data would be overwritten before it
had been read. Therefore, two RAMs are used to separate the reading and the writing processes
of two adjacent iterations, and the total number of RAMs becomes 2×SP. Since the target FPGA
has only dual-port RAM, the proposed architecture uses each dual-port RAM as two single-port
RAMs. Thus, the number of dual-port RAMs is SP, and the memory size of each dual-port RAM is
2×J/SP. In this paper, the terms “lower half” and “upper half” of a dual-port RAM indicate
the low addresses from 0 to J/SP - 1 and the high addresses from J/SP to 2×J/SP - 1,
respectively. The lower half and the upper half are used in two consecutive iterations.
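The bank organization behind (5.5) can be modeled in software as follows. This is a minimal sketch under stated assumptions: Python lists stand in for the SP RAMs, a random permutation replaces the algebraic interleaver of [34], the lower/upper half switching is left out, and the function names are hypothetical.

```python
import random

def write_banks(b_hat, pi, SP):
    """Store each LLR in bank pi(j) % SP at address pi(j) // SP."""
    J = len(b_hat)
    banks = [[0.0] * (J // SP) for _ in range(SP)]   # SP RAMs of depth J/SP
    for j in range(J):
        banks[pi[j] % SP][pi[j] // SP] = b_hat[j]
    return banks

def extrinsic_from_banks(banks, pi, SP):
    """Evaluate (5.5): one read per bank, all at the same address."""
    out = []
    for p in pi:
        addr = p // SP
        word = [banks[m][addr] for m in range(SP)]   # SP parallel reads
        out.append(sum(word) - word[p % SP])         # remove the current data
    return out

rng = random.Random(1)
J, SP = 64, 4                                        # small illustrative sizes
pi = list(range(J))
rng.shuffle(pi)                                      # stand-in interleaver
b_hat = [rng.gauss(0.0, 1.0) for _ in range(J)]

banks = write_banks(b_hat, pi, SP)
ext = extrinsic_from_banks(banks, pi, SP)

# Reference: evaluate (5.4) on a flat deinterleaved array.
c_hat = [0.0] * J
for j in range(J):
    c_hat[pi[j]] = b_hat[j]
ref = [sum(c_hat[(p // SP) * SP:(p // SP) * SP + SP]) - c_hat[p] for p in pi]

print(all(abs(x - y) < 1e-12 for x, y in zip(ext, ref)))  # True
print(sum(len(bank) for bank in banks))                    # 64
```

Because all data of one spreading codeword share the same address in different banks, one address fetches the whole codeword, and the total storage stays at J entries per half.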
5.4 Implementation of Proposed Architecture
5.4.1 Conventional Architecture
In the conventional architecture [35], the IDMA interference canceller processes the
iteration sequence of deinterleaver, despreader, spreader, extrinsic LLR calculation,
interleaver, soft mapper and ESE as shown in Fig. 4.1. In a hardware design, the processing of
the interleaver and the deinterleaver needs two RAMs with 2×J cycles to write the data. The
flow chart of the conventional architecture is presented in Fig. 5.3. In the first iteration,
the initialization values are the mean $E(x_{n,k}(j)) = 0$ and the variance
$\mathrm{Var}(x_{n,k}(j)) = 1$. The ESE calculation uses the received signal $y_k(j)$, the
channel of each user $H_{n,k}(j)$ and the initialization values to calculate the estimated
LLRs. The ESE calculation needs Ctrl delayed cycles. The deinterleaver separates the users,
each of which has a different interleaver pattern. In the conventional architecture, two
single-port RAMs are used in each iteration. In Fig. 5.3,
the deinterleaver uses one single-port RAM called “RAM 0” and the interleaver uses the
other single-port RAM called “RAM 1”. The deinterleaver uses “RAM 0” to write the
interleaved data at the interleaved write addresses, called “De IL WRITE”. After J
cycles, the data is read with sequence read addresses, called “De IL READ”. These sequence
data are despread after SP cycles. In the first iteration, these LLRs are not correct and need
to be updated. The LLRs are spread and the extrinsic LLRs are calculated by subtracting
the pre-despread data from the spread data. These extrinsic values are written in “RAM 1”
for the interleaver, called “IL WRITE”. After J cycles of writing, the interleaved data is
read, called “IL READ”. These interleaved data are used to calculate the updated mean
and variance at the soft mapper. These updated values are fed back to the ESE calculation
for the next iteration. In the last iteration, the deinterleaver and the despreader are used
to export the decoded bits. The despread data is written in “RAM 1” with sequence addresses
to export the decoded bits, called “SP WRITE”. This process needs J cycles to write the
despread data. In total, the interference cancellation in the conventional architecture needs
I×(2×J+SP+Ctrl) operation cycles.
5.4.2 Proposed Architecture
Fig. 5.4 presents the flow chart of the proposed architecture. In each iteration, the ESE
calculation and the soft mapper need Ctrl cycles to produce the LLRs. In the first iteration,
the proposed architecture writes the interleaved LLRs with the interleaved addresses into
the lower half of all dual-port RAMs. In Fig. 5.4, ID1 and ID2 are used to select the lower
half or the upper half of the dual-port RAMs, called “All RAMs ID1” and “All RAMs ID2”, where
ID1=mod(Iteration,2) and ID2=mod(ID1,2). If ID1 and ID2 are equal to 0, the lower
half of the dual-port RAMs is used. Otherwise, the upper half of the dual-port RAMs is used.
Therefore, the writing and the reading are processed in two different parts of the RAM in one
iteration to avoid overwriting. “All RAMs” means “RAM 1-st” to “RAM SP-th” as in
Fig. 5.5. The deinterleaver writing called “De IL WRITE” needs J cycles. After writing
the deinterleaved data, the proposed architecture simultaneously reads SP data from the SP
RAMs with the interleaved addresses, called “IL READ”. In “Extrinsic LLR calculation”, the
interleaved read data from the SP RAMs are added together simultaneously for despreading,
which saves SP cycles compared to the conventional system. After that, the current data is
subtracted from the despread data to obtain the extrinsic LLR. In the second iteration, the
LLRs output from the ESE calculation are written in the upper half of the dual-port RAMs at
the addresses which correspond to the read addresses in the first iteration. These iterations
are processed in a loop until the last iteration, in which Iteration is equal to I-1. In the
last iteration, the sequence address is used to read as in the normal deinterleaver, called
“De IL READ”. Then the LLRs from the SP RAMs are added together simultaneously for the
despreading to export the decoded bits. In Fig. 5.5, the “Last” signal is used to select the
sequence address and to enable the export of J/SP decoded bits. Since the proposed
architecture can skip the despreading and downsampling processes at the last iteration, it
saves J cycles compared to the conventional system. The proposed architecture needs J+Ctrl
cycles to process the data in each iteration, that is, I×(J+Ctrl) cycles in total. The latency
is reduced by half compared to the I×(2×J+SP+Ctrl) cycles of the conventional architecture.
Figure 5.3: Flow chart of the conventional architecture
Figure 5.4: Flow chart of the proposed architecture
The proposed architecture is shown in Fig. 5.5. The inputs are described in Table 5.2.
Note that the write address (WA) and the read address (RA) are sequence addresses which
are generated by a counter from 0 to J-1. The timing chart of the write enable (WE1, WE2),
the read enable (RE1, RE2) and the Last signal is shown in Fig. 5.6. WE1 and RE1 are used to
enable the writing and the reading process in the lower half of the dual-port RAMs. WE2 and RE2
are used to enable the writing and the reading process in the upper half of the dual-port RAMs.
Therefore, the delay between WE1 and WE2, as well as between RE1 and RE2, is J+Ctrl. The Last
signal is asserted right after the last iteration to control the exporting of the decoded bits.
It is set to 1 for J/SP cycles, which is equal to the length of the despread data, as shown in
Fig. 5.6. In Fig. 5.5, the algebraic interleaver is used to generate the interleaver index. The
write address input (wa) and the read address input (ra) of the RAMs are calculated based on
Eq. (5.5). If the upper half is chosen, which means that WE2 and RE2 are equal to 1, J/SP is
added to wa and ra, as shown by the black blocks in the figure. In the proposed architecture,
the data which are stored in the RAMs at the same address belong to the same spreading
codeword. In other words, the order of the data in the spreading codeword corresponds to the
RAM index. Thus, the write enables of the first RAM (we1) to the SP-th RAM (weSP) determine
in which RAM the current data is written. At any time, one write enable signal is equal to 1
and the others are equal to 0. In contrast, since the reading is performed simultaneously in
multiple RAMs for the despreading, the read enable (re) is the same for all RAMs. However, the
extrinsic LLR calculation needs to eliminate the current data from the despreading
calculation. The select signals sel1 to selSP are used for this purpose: the select signal of
the current data is set to 0. In the last iteration, the Last signal is used as a control
signal to export the decoded bits. The additional processing for the last iteration is
indicated by the dashed items in Fig. 5.5. The read address is the sequence address, which is
used to read the data from all RAMs as in the normal deinterleaver.
Since the extrinsic LLR calculation is skipped, all read data are added together to despread.
Thus, all select signals are set to 1.
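The address and enable logic described above can be summarized in a short behavioral sketch. It is a simplification under our own assumptions and not the Verilog design: one hypothetical function returns the RAM address, the one-hot write enables and the select signals for a given index p, where p is the interleaved index in a normal iteration and the sequence index in the last pass.

```python
def ram_controls(p, SP, J, upper_half, last=False):
    """Behavioral model of the RAM address, write enables and select signals.

    p          : index presented to the RAM array
    upper_half : True when WE2/RE2 are active (upper half of the RAMs)
    last       : True during the final pass that exports the decoded bits
    """
    bank = p % SP                                   # RAM that holds this data
    addr = p // SP + (J // SP if upper_half else 0) # add J/SP for the upper half
    we = [1 if m == bank else 0 for m in range(SP)] # one-hot write enable
    if last:
        sel = [1] * SP                              # plain despreading
    else:
        sel = [0 if m == bank else 1 for m in range(SP)]  # drop the current data
    return addr, we, sel

# Illustrative values taken from Table 5.3 (J = 8192, SP = 16).
addr_lo, we, sel = ram_controls(1000, SP=16, J=8192, upper_half=False)
addr_hi, _, _ = ram_controls(1000, SP=16, J=8192, upper_half=True)
_, _, sel_last = ram_controls(1000, SP=16, J=8192, upper_half=False, last=True)
print(addr_lo, addr_hi)          # 62 574
print(we.index(1), sum(sel))     # 8 15
print(sum(sel_last))             # 16
```

In hardware the write and read controls are generated separately and in different clock cycles; merging them into one function here only serves to show how bank, address offset and select mask relate to each other.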
5.5 FPGA Implementation Results of Interleaved Domain IDMA Receiver
In order to show the performance of the proposed system and to confirm the soundness
of the chosen design architecture, we simulate the BER performance and compare the latency
of the conventional architecture with that of the proposed architecture. The efficiency of
the proposed system in hardware utilization is also shown in this
section. The default simulation parameters are listed in Table 5.3.
Figure 5.5: Architecture of the proposed interleaved domain architecture using dual-port RAM
Figure 5.6: Timing chart of the proposed architecture
5.5.1 Simulation Results of Interleaved Domain IDMA Receiver
The BER performance results of the proposed architecture and the conventional architecture
are shown in Fig. 5.7. The fixed-point word length is 24 bits, comprising an integer length
of 8 bits and a fraction length of 16 bits. The simulation runs for up to 10,000 iterations
with a 512-bit data frame. Since the calculations of the ESE and the soft
mapper remain unchanged in the proposed architecture, the BER performance of the
fixed-point proposed architecture is the same as that of the fixed-point conventional
architecture in hardware implementation. The comparison between the hardware implementation
of the proposed system and the Matlab simulation of the conventional system is also shown in
Fig. 5.7. Since the fixed-point word length chosen in the design is large enough to represent
the LLR values, the BER performance of the proposed architecture is close to the BER
performance of the floating-point conventional architecture. Migrating from the floating-point
to the fixed-point representation results in a small (0.1 dB) loss in BER performance. The
small difference between the two curves also shows that the proposed system is robust to
fixed-point arithmetic.

Table 5.2: Input/Output Port Parameters
Din | Received signal
Hin | Estimated channel
WE1/WE2 | Write enable for the lower/upper half of the RAMs
RE1/RE2 | Read enable for the lower/upper half of the RAMs
WA/RA | Write/read addresses generated by a counter (0 to J-1)
Last | Equal to 1 right after the last iteration, otherwise equal to 0
Dout | Output signal of the decoded bits

[Figure 5.7 plot: BER performance (10^-7 to 10^-1) versus SNR (0 to 8 dB). Curves:
Conventional architecture (Matlab), floating-point; Conventional architecture, fixed-point;
Proposed architecture, fixed-point]
Figure 5.7: BER performance of the proposed system vs SNR
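To make the 24-bit format concrete, the following sketch quantizes a real value to a word with 16 fraction bits. The details are assumptions on our part, since the text only gives the word and fraction lengths: we assume a signed two's-complement word whose 8 integer bits include the sign, rounding to nearest, and saturation on overflow. The function name is hypothetical.

```python
def to_fixed(x, word_len=24, frac_len=16):
    """Quantize x to an assumed signed fixed-point format and return the
    value that the quantized word represents."""
    scale = 1 << frac_len                    # 2^16 steps per unit
    lo = -(1 << (word_len - 1))              # most negative integer code
    hi = (1 << (word_len - 1)) - 1           # most positive integer code
    code = max(lo, min(hi, round(x * scale)))  # round, then saturate
    return code / scale

print(to_fixed(1.0))        # 1.0
print(to_fixed(-1000.0))    # -128.0
print(abs(to_fixed(0.1) - 0.1) <= 2 ** -17)  # True
```

Under these assumptions the representable range is [-128, 128 - 2^-16] with a resolution of 2^-16, so the rounding error of an in-range LLR is at most 2^-17.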
Table 5.4 compares the conventional architecture [35], the dual-frame
processing [32] and the proposed architecture. Wd denotes the bit length in fixed-point
operation, F indicates the clock frequency (Hz) and Nb is the frame data size (bits).
Although the number of RAMs in the proposed architecture is larger than the number of
RAMs in the conventional architecture, the total memory size of the proposed architecture
is the same as that of the conventional architecture. Moreover, the memory size of the
proposed architecture is half that of the dual-frame processing. The throughput of the
proposed architecture is twice that of the conventional method and is the
same as that of the dual-frame processing. However, the latency of the proposed architecture
is reduced by half compared to the conventional architecture, while the dual-frame
processing [32] cannot reduce the latency.

Table 5.3: Simulation Parameters
System | IDMA
Modulation type | BPSK
Frame data size [bit] (Nb) | 512
Repetition code length (SP) | 16
Number of symbols (J) | 8192
Number of users | 20
Number of IDMA iterations (I) | 10
Number of algebraic interleaver stages | 3
Fixed-point word length [bit] (Wd) | 24
Fixed-point fraction length [bit] | 16
Channel model | One-path Rayleigh fading
Signal to noise ratio [dB] | 12
Simulation iteration (times) | 10,000
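The expressions of Table 5.4 can be evaluated side by side. The sketch below is illustrative only: the function name is ours, the parameters come from Table 5.3, the clock frequency of 110 MHz is the synthesis frequency reported in Table 5.5, and the channel decoder latency V is assumed to be 0.

```python
def table_5_4(F, Nb, I, J, SP, ctrl, Wd, V=0):
    """Memory size, operation cycles and throughput per Table 5.4."""
    conv_cycles = I * (2 * J + SP + ctrl) + V
    prop_cycles = I * (J + ctrl) + V
    return {
        "conventional": {"memory_bits": 2 * J * Wd,
                         "cycles": conv_cycles,
                         "throughput_bps": F * Nb / conv_cycles},
        "dual_frame":   {"memory_bits": 4 * J * Wd,
                         "cycles": conv_cycles,
                         "throughput_bps": 2 * F * Nb / conv_cycles},
        "proposed":     {"memory_bits": 2 * J * Wd,
                         "cycles": prop_cycles,
                         "throughput_bps": F * Nb / prop_cycles},
    }

res = table_5_4(F=110e6, Nb=512, I=10, J=8192, SP=16, ctrl=14, Wd=24)
for name, row in res.items():
    print(name, row["memory_bits"], row["cycles"], round(row["throughput_bps"]))

gain = res["proposed"]["throughput_bps"] / res["conventional"]["throughput_bps"]
print(round(gain, 3))  # 2.0
```

The throughput gain over the conventional architecture is slightly above 2 because the proposed architecture also saves the SP despreading cycles, while the memory stays at 2×J×Wd bits.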
As we can see above, the main contribution to the latency reduction is the interleaved
domain processing in the interference cancellation. Assuming a reference frequency of
640 MHz and an interleaver length of 900 bits, we plot the latency vs. the number of
iterations in Fig. 5.8. When the number of iterations increases, the number of
interleaving and deinterleaving passes increases, which causes the latency to become large.
By processing the updated LLR completely in the interleaved domain, the latency of the proposed
architecture is reduced by half compared to the conventional architecture. At the 10-th
iteration, the conventional architecture needs about 28 µs to operate the system, while the
proposed architecture needs only 14 µs, which meets the SIFS requirement of IEEE
802.11 mentioned in the Introduction. While 640 MHz is too high for an FPGA
implementation, an optimized application specific integrated circuit (ASIC) implementation of
the proposed architecture can come close. Additional techniques such as bit width and
IDMA iteration optimization can provide additional latency reduction but are outside the
scope of this paper.

Table 5.4: Comparison of Architectures
Type | Conventional architecture [35] | Dual-frame processing [32] | Proposed architecture
Memory size (bits) | 2×J×Wd | 4×J×Wd | 2×J×Wd
Throughput (bits/second) | F×Nb / (I×(2×J+SP+Ctrl)+V) | 2×F×Nb / (I×(2×J+SP+Ctrl)+V) | F×Nb / (I×(J+Ctrl)+V)
Operation cycles | I×(2×J+SP+Ctrl)+V | I×(2×J+SP+Ctrl)+V | I×(J+Ctrl)+V

[Figure 5.8 plot: Latency (µs, 0 to 30) versus number of IDMA iterations (1 to 10). Curves:
Conventional architecture; Proposed architecture]
Figure 5.8: Latency of the IDMA system vs iteration
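The latency figures quoted for Fig. 5.8 follow from the operation cycles of Table 5.4 divided by the clock frequency. The sketch below recomputes them; the function names are ours, and we assume that SP = 16 and Ctrl = 14 carry over from the design described earlier and that no channel decoder is present (V = 0).

```python
SIFS_US = 16.0  # short interframe space quoted in the text, in microseconds

def conventional_cycles(I, J, SP, ctrl, V=0):
    return I * (2 * J + SP + ctrl) + V

def proposed_cycles(I, J, ctrl, V=0):
    return I * (J + ctrl) + V

def latency_us(cycles, freq_hz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / freq_hz * 1e6

F, J, SP, CTRL = 640e6, 900, 16, 14   # 640 MHz, interleaver length 900
for I in range(1, 11):
    conv = latency_us(conventional_cycles(I, J, SP, CTRL), F)
    prop = latency_us(proposed_cycles(I, J, CTRL), F)
    print(f"I={I:2d}  conventional={conv:6.2f} us  proposed={prop:6.2f} us  "
          f"proposed meets SIFS: {prop <= SIFS_US}")
```

For ten iterations this gives 18,300 cycles (about 28.6 µs) for the conventional architecture and 9,140 cycles (about 14.3 µs) for the proposed one, in line with the values of about 28 µs and 14 µs read from Fig. 5.8.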
This simulation does not include a channel decoder such as a Viterbi decoder or a low
density parity check (LDPC) decoder. In the IDMA and turbo coding literature, the
convolutional encoder is usually of the recursive type because it has better performance in
iterative decoding when an a posteriori probability (APP) decoder is inside the iteration
loop. But since this would cause a very high latency and hardware complexity, the proposed
architecture opts for a simpler iteration loop where only the repetition decoder is placed
inside the iteration loop, as in [25]. Hence, even if the effect of a channel decoder is added,
the latency may increase but remains below 16 µs, so that the proposed IDMA architecture can
meet the time constraint of the SIFS. The effect of a channel decoder with a latency of
V cycles on the throughput can be seen in Table 5.4. For example, in [36] the Viterbi decoder
takes 54 clock cycles, which translates to a mere 0.08 µs of additional latency.
In Fig. 5.9, the latency evaluations of the conventional architecture and the proposed
architecture are shown on the same time scale. The interference canceller needs ten
iterations to estimate the bit information for each user. The operating frequency is the same
for the conventional architecture and the proposed architecture because the ESE
calculation, which has the longest path delay, is the same in the two architectures. Since the
operating frequency is the same, the latency of the proposed architecture can be compared by
operation cycles alone. With the simulation parameters of Table 5.3, the conventional
architecture needs 10×(2×8192+16+14) = 164,140 cycles, while the proposed architecture needs
10×(8192+14) = 82,060 cycles in the interference cancellation. Thus, the latency of the
proposed architecture is reduced by half compared to the conventional architecture, as shown
by the mathematical expressions in Table 5.4.
5.5.2 Synthesis Results of Interleaved Domain IDMA Receiver
The synthesis results for the target FPGA, a Xilinx Virtex 6 240TFF784, are presented in Table
5.5 and Table 5.6. Table 5.5 shows the hardware utilization of the conventional
architecture, the proposed architecture using single-port RAM and the proposed architecture
using dual-port RAM. Because the target FPGA has only dual-port RAM, the use
of single-port RAM increases the number of RAM blocks. It also requires extra logic for the
address decoder. Hence, its register and look-up table (LUT) usage is higher than that of
the conventional design and of the dual-port RAM design. The difference between the
conventional architecture and the proposed architecture using dual-port RAM is small, which
demonstrates that our proposed architecture using dual-port RAM is effective for the IDMA
system. The proposed architecture using dual-port RAMs increases the slice registers by 14%
while reducing the slice LUTs by 8% and the occupied slices by 1% compared to the conventional
architecture. The RAM and digital signal processing (DSP) block usage of the proposed
architecture is the same as that of the conventional architecture. Since the proposed
architecture has to generate specific write and read addresses, the number of registers needed
is slightly larger than in the conventional one. However, the number of slice LUTs and
occupied slices is slightly smaller than in the conventional architecture because the
despreading and the extrinsic LLR calculation are combined to use one adder in the proposed
architecture. The number of RAMs is the same because the total memory size is the same. The
evaluated frequency is 110 MHz, which is the same for the conventional architecture
and the proposed architecture because the ESE calculation, which has the longest path delay,
is the same in the two architectures.

Figure 5.9: Latency evaluations of the conventional architecture and the proposed architecture

Table 5.5: Synthesis Comparisons
Type | Conventional system [35] | Proposed system, single-port RAM | Proposed system, dual-port RAM
Frequency | 110 MHz | 110 MHz | 110 MHz
Slice Registers | 18,604 | 25,204 | 21,204
Slice LUTs | 41,919 | 76,947 | 38,551
Occupied Slices | 11,903 | 22,853 | 11,734
RAMB36E1 | 160 | 160 | 160
RAMB18E1 | 320 | 480 | 320
DSP48E1s | 420 | 420 | 420

Table 5.6: Synthesis Results (Xilinx Virtex 6 240TFF784)
Type | Proposed system | Available | Utilization (%)
Slice Registers | 21,204 | 301,440 | 7%
Slice LUTs | 38,551 | 150,720 | 25%
Occupied Slices | 11,734 | 37,680 | 31%
RAMB36E1 | 160 | 416 | 38%
RAMB18E1 | 320 | 832 | 38%
DSP48E1s | 420 | 768 | 54%
Table 5.6 shows the hardware utilization of the proposed architecture. The result
indicates that the proposed architecture can fit the target FPGA board.
5.6 Summary
We have presented the interleaved domain architecture of the interference cancellation for
the IDMA receiver, which reduces the latency by about 50% and approximately doubles the
throughput with almost the same hardware utilization. Because the interleaved
domain architecture uses the same LLR calculation equation as the conventional IDMA,
the BER performance of the interleaved domain architecture is unchanged. The simulation
results show that with a frequency of 640 MHz and an interleaver length of 900 bits, the
processing takes about 14 µs, which is less than 16 µs and therefore satisfies the SIFS
requirement of 802.11 systems. The design is implemented on the target FPGA, a Xilinx Virtex 6
240TFF784. The synthesis results have also shown the efficiency of the proposed
architecture compared to the conventional architecture and the ability to implement this
system on the target FPGA board.
Chapter 6
Conclusions and Future Works
6.1 Conclusions
The goal of this thesis is to make IDMA systems applicable to future MU-MIMO
communication systems. The IDMA system has several advantages over other uplink multiple
access schemes such as OFDMA and CDMA. However, since the latency of the IDMA system
is high due to iterative processing, IDMA has not yet been proposed for any wireless
standard. The interleaved domain IDMA system halves the latency and doubles
the throughput, which makes a practical implementation feasible. Moreover, the
proposed higher order QAM modulation for the IDMA system achieves low complexity and
also improves the throughput. Regardless of the wireless application, the proposed
MU-MIMO channel emulator is important for testing that the IDMA system and current MU-MIMO
systems work properly.
A comprehensive view of MU-MIMO wireless communication system has been pro-
vided in Chapter 1 and Chapter 2.
We have presented the implementation of the MU-MIMO channel emulator in Chapter 3.
This channel emulator also includes the automatic CSI feedback which is necessary for
the evaluation of the MU-BF system. Our emulator is based on FPGA technology and
rapid prototyping software tools. Synthesis results have also shown the efficiency of single
path processing in the hardware implementation. In a parallel implementation, adding a
feedback channel output would double the hardware complexity. A single path implementation,
however, results in only a few additional non-sequential elements, even though
the sequential elements such as registers double as usual. In the single path
implementation of IEEE 802.11ac channel model D, the logic utilization for both the feedforward
channel and the feedback channel is only 20%, while one feedforward channel alone
takes 15%. Comparing the single path implementation with parallel processing shows the
significant efficiency of the single path implementation. The estimated logic utilization
of parallel processing is 16800%, which consequently cannot fit into the implementation
device. The single path implementation method, however, requires only 15%,
a reduction by a factor of 1120.
In Chapter 4, we have proposed a low complexity IDMA system that uses simplified
higher order QAM modulations. For the same number of transmitted bits per symbol,
the complexity of 256-QAM modulation is about 25% of that of SCM-QPSK modulation.
By using the higher order QAM modulations, the proposed IDMA system improves
the throughput, but its BER performance degrades. We have compared the performance
of SCM-QPSK and higher order QAM modulation for an IDMA system with one antenna.
The performance of the proposed higher order QAM modulation is worse than that of
SCM-QPSK-IDMA by about 1 dB to 2 dB at a BER of 10^-4. We have shown the effectiveness of
using antenna diversity to improve the performance of the QAM-IDMA system. If two antennas are
used in the proposed system, the performance of the higher order QAM IDMA system is improved
by a factor of two compared to the one antenna IDMA system.
In Chapter 5, we have presented a low latency IDMA system which uses a novel
interleaved domain architecture. The proposed architecture performs multi-user detection
directly, without deinterleaving the received frame in the interference canceller iteration.
The interleaving is also no longer needed in the interference cancellation loop, which
decreases the latency. The hardware implementation of this low latency IDMA system
has been presented. By basing the design on RAM instead of registers, the proposed interleaved
domain architecture of the interference cancellation reduces the latency by 50%
and doubles the throughput with almost the same hardware utilization.
The simulation results show that with a frequency of 640 MHz and an interleaver length
of 900 bits, the processing takes about 14 µs and hence satisfies the SIFS requirement of
802.11 systems.
As a result of the low latency and low complexity IDMA architecture, the proposed
IDMA is more feasible for the practical implementation in future wireless communication
systems. In addition, the MU-MIMO channel emulator can provide the experimental tests
for the proposed IDMA in the implementation.
6.2 Future Works
In our future work, we will do a thorough analysis of the proposed system to improve
its convergence. One way to do this is via optimal power allocation for IDMA system.
Another avenue to improvement is by using a flexible spreading length and number of
iterations depending on number of users. Since the latency is independent of the spreading
length in the proposed architecture, the control signals for flexible spreading length may be
implemented easier than the conventional IDMA architecture.
For the chip design, a VLSI implementation of the proposed IDMA architecture is
necessary to obtain the power consumption and circuit area.
In the latency simulation in Chapter 5, we use a high clock frequency of 640 MHz
in order to achieve a low latency. At present, this frequency is very hard to meet.
Additional technologies need to be considered to achieve such high
frequencies.
Because a register-based design of the low-latency IDMA is complex, the current
design shown in Chapter 5 uses dual-port RAM. If multi-port RAM is supported,
the proposed interleaved-domain IDMA can achieve lower complexity.
The combination of the IDMA system and the OFDMA system is an interesting direction
for future work. The bandwidth resources are split orthogonally into identical sub-bands,
as in OFDMA. Each sub-band carries a number of users that transmit their signals
simultaneously within that sub-band using IDMA. Users in other sub-bands are
decoded independently, without any interference between sub-bands. The decoding
complexity of multi-user detection is therefore lower than in a pure IDMA system. With
this combination, we obtain greater spectral efficiency and reduce the amount of
multi-user detection required at the receiver side. Because IDMA is used instead of
NOMA power allocation, grouping users with weak and strong channel gains is
unnecessary. This leads to a low-complexity system in practical implementation.
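The sub-band grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the round-robin rule and the function names are hypothetical and are not taken from the thesis. Round-robin is chosen here precisely because, as noted above, no pairing by channel gain is required, so the assignment can ignore channel state.

```python
from typing import Dict, List


def assign_users_to_subbands(num_users: int, num_subbands: int) -> Dict[int, List[int]]:
    """Hypothetical round-robin grouping: user k goes to sub-band k mod num_subbands."""
    if num_users < 0 or num_subbands <= 0:
        raise ValueError("num_users must be >= 0 and num_subbands must be > 0")
    groups: Dict[int, List[int]] = {s: [] for s in range(num_subbands)}
    for user in range(num_users):
        groups[user % num_subbands].append(user)
    return groups


def max_users_per_detector(groups: Dict[int, List[int]]) -> int:
    """Largest number of users that any single per-sub-band detector must separate."""
    return max(len(users) for users in groups.values())


# Example: 16 users over 4 orthogonal sub-bands.
groups = assign_users_to_subbands(num_users=16, num_subbands=4)
for subband, users in groups.items():
    print(f"sub-band {subband}: users {users}")
print("users per multi-user detector:", max_users_per_detector(groups))
```

In this example each per-sub-band detector separates 4 users instead of 16, which is the source of the reduced multi-user detection load mentioned above. A real scheduler may use a different assignment rule.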
Appendix A
Snapshots of the Designs
This appendix shows snapshots of our proposed designs. For the model-based designs
of the MU-MIMO channel emulator in Chapter 3, we show snapshots of the circuits.
For the Verilog-based designs of the low-latency IDMA system, we show snapshots
of the simulation waveforms produced by ModelSim.
Figure A.1: MU-MIMO channel emulator for 4x4 antenna and 35 taps
Figure A.2: MU-MIMO channel emulator with sounding feedback
Figure A.3: MU-MIMO channel emulator evaluation by using oscilloscope
Figure A.4: Spatial correlation block of MU-MIMO channel emulator
Figure A.5: Rician block of MU-MIMO channel emulator
Acknowledgment
I would like to thank Prof. Hiroshi Ochi, who has instructed and supported me during the
Ph.D. course at Kyushu Institute of Technology. I would also like to thank Prof. Masayuki
Kurosaki, Dr. Leonardo Lanante and Dr. Yuhei Nagao for their insightful comments and
advice throughout my research.
I thank my parents and siblings, who support me in every undertaking in my life.
I am also indebted to the reviewers, Prof. Masato Tsuru and Prof. Xiaoqing
Wen, who took the time to read my thesis manuscript and gave very helpful advice, and to
Prof. Shuichi Ohno and Prof. Shigenori Kinjo, who traveled to Fukuoka for my thesis
defense and also gave me very insightful comments.
I am also thankful to the Japanese Government (MEXT) Scholarship Program for giving
me financial and moral support during my Ph.D. course.
I cannot thank all the lab members enough, especially my tutor Ms. Reina Hongyo, for
helping me solve all the problems related to daily life in Japan.
Bibliography
[1] L. Dai, B. Wang, Y. Yuan, S. Han, C.-L. I, and Z. Wang, “Non-orthogonal multiple
access for 5G: solutions, challenges, opportunities, and future research trends,” IEEE
Communications Magazine, vol. 53, no. 9, pp. 74-81, 2015.
[2] T. Uwai, T. Miyamoto, Y. Nagao, L. Lanante Jr., M. Kurosaki, and H. Ochi, “Adaptive
Backoff Mechanism for OFDMA Random Access with Finite Service Period in
IEEE802.11ax,” in Proc. IEEE Conference on Standards for Communications and
Networking (CSCN), Berlin, Germany, Oct. 2016.
[3] C. Lei, H. Bie, G. Fang, M. Mueck, and X. Zhang, “An Efficient Backoff Algorithm
Based on the Theory of Confidence Interval Estimation,” IEICE Transactions on
Communications, vol. E99-B, no. 10, pp. 2179–2186, May 2016.
[4] Y. Li, “OFDM-IDMA wireless communication systems,” Mphil thesis, City Univer-
sity of Hong Kong, P.R.China, 2007.
[5] F. Tosato and P. Bisaglia, “Simplified soft-output demapper for binary interleavered,”
IEEE Transactions on Wireless Communications, vol. 6, no. 5, pp. 1973-1983, May
2002.
[6] K. Li, X. Wang and L. Ping, “Analysis and optimization of interleave division
multiple-access communication systems,” IEEE Transactions on Wireless Commu-
nications, vol. 6, no. 5, pp. 1973-1983, May 2007.
[7] S. Yoshizawa, Y. Hatakawa, T. Matsumoto, and S. Konishi, “Hardware implementation
of an interference canceller for IDMA wireless communications,” in Proc. 2013
International Symposium on Intelligent Signal Processing and Communications
Systems (ISPACS), pp. 645-650, Japan, Nov. 2013.
[8] T.T.T. Nguyen, N.V. Ha, Y. Nagao, L. Lanante, B.H. Phu, M. Kurosaki and H. Ochi,
“Hardware implementation of a MIMO channel emulator,” in Proc. 28th International
Technical Conference on Circuits/Systems, Computers and Communications (ITC-
CSCC), pp. 758-761, Korea, Jul. 2013.
[9] M. D. Dianu, J. Riihijarvi and M. Petrova, “Measurement-based study of the perfor-
mance of IEEE 802.11ac in an indoor environment,” in Proc. 2014 IEEE International
Conference on Communications (ICC), pp. 5771-5776, Sydney, Australia, June 2014.
[10] L. Ping, “Interleave-division multiple access and chip-by-chip iterative multi-user de-
tection,” IEEE Communications Magazine, vol. 43, no. 6, pp. S19-S23, June 2005.
[11] IEEE 802.11ac, “Specification framework document,” IEEE 802.11-09/0992r21, Jan.
2011.
[12] IEEE 802.11ax, “Specification framework for TGax,” IEEE 802.11-15/0132r14, Jan.
2016.
[13] T. T. T. Nguyen, L. Lanante, Y. Nagao, and H. Ochi, “Low Complexity Higher Order
QAM Modulation for IDMA system,” in Proc. 2015 IEEE Wireless Communications
and Networking Conference Workshops (WCNCW 2015), pp. 129–134, New Orleans,
USA, Mar. 2015.
[14] G. Redieteab, L. Cariou, P. Christin and J. Helard, “PHY+MAC channel sounding
interval analysis for IEEE 802.11ac MU-MIMO,” in Proc. 2012 International Sympo-
sium on Wireless Communication System, pp. 1054-1058, Paris, France, Aug. 2012.
[15] L. Ping, L. Liu, K. Wu, and W. K. Leung, “Interleave division multiple access,” IEEE
Transactions on Wireless Communications, vol. 5, no. 4, pp. 938-947, Apr. 2006.
[16] Copyright 2016 Azimuth Systems Inc., “ACE400WB MIMO
Channel Emulator,” Azimuth, http://www.azimuthsystems.com/wp-
content/uploads/PB ACE400WB 8Jan16.pdf, accessed May 2016.
[17] T. Uwai, L. Lanante, B. Sai, H. Ochi, Y. Nagao and N. Surantha, “Live demonstration:
IEEE802.11 wireless LAN system verification platform,” in Proc. 2014 Asia Pacific
Conference on Circuit and System (APCCAS), pp. 175-176, Japan, Nov. 2014.
[18] TGac Channel Addendum, https://mentor.ieee.org/802.11/dcn/09/11-09-0308-12-
00ac-tgac-channel-model-addendum-document.doc, accessed Jan. 2016.
[19] L. Schumacher and B. Dijkstra, “Description of a MATLAB implementation of the
Indoor MIMO WLAN channel model proposed by the IEEE 802.11 TGn Channel
Model Special Committee”, The University of Namur, Jan. 2004.
[20] Y. Yokota, S. Yoshizawa, and H. Ochi, “ASIP Implementation of CSI
Feedback and Low Complexity Precoding for MU-MIMO system,” in Proc. The
3rd International Conference on Computing, Management and Telecommunications
(ComManTel), pp. 88-93, DaNang, Vietnam, Dec. 2015.
[21] X. Xie, X. Zhang and K. Sundaresan, “Adaptive Feedback Compression for MIMO
Networks,” In Proc. 19th Annual International Conference on Mobile Computer and
Networking (ACM MobiCom), pp. 477-488, Florida, USA, Sept. 2013.
[22] A. Michaloliakos, R. Rogalin, V. Balan, K. Psounis, G. Caire, “Efficient MAC for
distributed multiuser MIMO systems,” in Proc. 2013 Wireless On-demand, Networking
System and Service, pp. 52-59, Mar. 2013.
[23] M. Ibnkahla, “Handbook of Signal Processing for Mobile Communications,” CRC
Press, Jan. 2004.
[24] J. Tong and L. Ping, “Performance analysis of superposition coded modulation,”
Physical Communication, vol. 3, no. 3, pp. 147-155, Sep. 2010.
[25] L. Ping, P. Wang, X. Wang, “Recent progress in interleave-division multiple-access
(IDMA),” in Proc. IEEE Military Communications Conference, pp. 1-7, 2007.
[26] P.A. Hoeher and X. Wen, “Multi-Layer Interleave-Division Multiple Access for 3GPP
Long Term Evolution,” in Proc. IEEE International Conference on Communication
(ICC 2007), pp. 5508–5513, June 2007.
[27] D. Hao and P.A. Hoeher, “Iterative Estimation and Cancellation of Clipping Noise for
Multi-Layer IDMA Systems,” in Proc. 7th International ITG Conference on Source
and Channel Coding (SCC), pp. 1–6, Jan. 2008.
[28] K. Kusume, G. Bauch and W. Utschick, “IDMA vs. CDMA: Analysis and Compar-
ison of Two Multiple Access Schemes,” IEEE Transaction on Wireless Communica-
tion, vol. 11, no. 1, pp. 78–87, Jan. 2012.
[29] T.T.T. Nguyen, L. Lanante, Y. Nagao, and H. Ochi, “Low Complexity Higher Order
QAM Modulation for IDMA system,” in Proc. IEEE WCNC 2015 Workshop on Next
Generation WiFi Technology, pp. 129–134, New Orleans, USA, Mar. 2015.
[30] Z. Yin, X. Mao, J. Cai, and N. Zhang, “IDMA based MAI mitigation scheme with low
complexity and low latency,” IEEE Transaction on Wireless Communication, vol. 23,
no. 6, pp. 791–801, Dec. 2012.
[31] S. Wu, X. Chen, and S. Zhou, “A parallel Interleaver Design for IDMA Systems,”
in Proc. International Conference on Wireless Communication and Signal Processing
(WCSP), Nov. 2009.
[32] S. Yoshizawa, M. Nozaki, and H. Tanimoto, “VLSI Implementation of an Interference
Canceller Using Dual-Frame Processing for OFDM-IDMA Systems,” IEICE Trans-
action Fundamental, vol. E98-A, no. 3, pp. 811–819, Mar. 2015.
[33] F. Tosato and P. Bisaglia, “Simplified soft-output demapper for binary interleaved
COFDM with application to HIPERLAN/2,” in Proc. IEEE International Conference
on Communication (ICC 2002), pp. 664–668, 2002.
[34] O.Y. Takeshita and D.J. Costello, “New classes of algebraic interleavers for turbo
codes,” in Proc. IEEE International Symposium on Information Theory (ISIT),
pp. 419, Aug. 1998.
[35] R. Dodd, C. Schlegel, and V. Gaudet, “DS-CDMA implementation with iterative mul-
tiple access interference cancellation,” IEEE Transaction Circuits System, vol. 60,
no. 6, pp. 222–231, Jan. 2013.
[36] Y. Tang, D. Hu, W. Wei, W. Lin and H. Lin, “A Memory-Efficient Architecture for
Low Latency Viterbi Decoders,” in Proc. IEEE International Symposium VLSI Design
(VLSI-DAT), pp. 336-338, Hsinchu, Taiwan, Apr. 2009.
Publication List
Journals
1. Tran Thi Thao Nguyen, Leonardo Lanante, Yuhei Nagao, and Hiroshi Ochi, “Multi-
User MIMO Channel Emulator with Automatic Channel Sounding Feedback,” IEICE
Transactions on Fundamentals, vol. E99-A, no. 11, pp. 1918-1927, Nov. 2016.
International Conferences
1. Tran Thi Thao Nguyen, Leonardo Lanante, Yuhei Nagao, Masayuki Kurosaki, Shingo
Yoshizawa and Hiroshi Ochi, “Low Latency Interleave Division Multiple Access
System,” in Proc. The 31st International Conference on Information Networking
(ICOIN 2017), pp. 7–12, Da Nang, Vietnam, Jan. 2017.
2. Tran Thi Thao Nguyen, Leonardo Lanante, Yuhei Nagao, Masayuki Kurosaki, and
Hiroshi Ochi, “MU-MIMO Channel Emulator with Automatic Channel Sounding
Feedback for IEEE 802.11ac,” in Proc. 2016 IEEE Wireless Communications and
Networking Conference (WCNC 2016), pp. 1363–1368, Doha, Qatar, Apr. 2016.
3. Tran Thi Thao Nguyen, Leonardo Lanante, Yuhei Nagao, and Hiroshi Ochi, “Low
Complexity Higher Order QAM Modulation for IDMA system,” in Proc. 2015 IEEE
Wireless Communications and Networking Conference Workshops (WCNCW 2015),
pp. 129–134, New Orleans, USA, Mar. 2015.
4. Tran Thi Thao Nguyen, Yuhei Nagao, Leonardo Lanante, Masayuki Kurosaki, and
Hiroshi Ochi, “FPGA Implementation of a MIMO Channel Emulator for the IEEE
802.11n/ac,” in Proc. 2015 Vietnam-Japan International Symposium on Antennas
and Propagation (VJISAP 2015), pp. 77–82, Ho Chi Minh, Vietnam, Jan. 2015.
5. Tran Thi Thao Nguyen, Nguyen Viet Ha, Leonardo Lanante, Yuhei Nagao and Hi-
roshi Ochi, “Hardware Implementation of a MIMO Channel Emulator”, 2014 In-
ternational Symposium on Dependable Integrated Systems (DISC 2014), Kyushu
Institute of Technology, Fukuoka, Japan, Mar. 2014.
6. Tran Thi Thao Nguyen, Nguyen Viet Ha, Yuhei Nagao, Leonardo Lanante, Bui
Huu Phu, Masayuki Kurosaki, and Hiroshi Ochi “Hardware implementation of a
MIMO channel emulator,” in Proc. 28th International Technical Conference on Cir-
cuits/Systems, Computers and Communications (ITC-CSCC 2013), pp. 758–761,
Korea, Jul. 2013.
Domestic Conferences
1. Tran Thi Thao Nguyen, Leonardo Lanante, Yuhei Nagao, Masayuki Kurosaki, Hi-
roshi Ochi, and Shingo Yoshizawa, “Interleaver/Deinterleaver-less Implementation
of an Interference Cancellation in Low Latency for IDMA Systems”, in Proc. IEICE
Technical Report on Radio Communication Systems, vol. 116, no. 257, pp. 149–154,
Yokohama, Japan, Oct. 2016.
2. Tran Thi Thao Nguyen, Leonardo Lanante, Yuhei Nagao, Masayuki Kurosaki, and
Hiroshi Ochi, “FPGA Implementation of Channel Emulator with Automatic Channel
Sounding Feedback for MU-MIMO WLAN systems,” in Proc. IEICE Technical Re-
port on Antenna and Propagation, vol. 115, no. 286, pp. 213–218, Okinawa, Japan,
Nov. 2015.
3. Tran Thi Thao Nguyen, Leonardo Lanante, Yuhei Nagao, and Hiroshi Ochi, “Higher
Order QAM Modulation for IDMA system”, in Proc. IEICE Technical Report on Ra-
dio Communication Systems, vol. 114, no. 490, pp. 405–410, Kyoto, Japan, Mar. 2015.
4. Leonardo Lanante, Tran Thi Thao Nguyen, Tatsumi Uwai, Takafumi Tomiyasu, and
Hiroshi Ochi, “Interleave Division Multiple Access for Next Generation Wireless
LAN”, in Proc. IEICE General Meeting, pp. 373, Kyoto, Japan, Mar. 2015.
Proposals for IEEE 802.11ax standard
1. Tran Thi Thao Nguyen, Leonardo Lanante, Hiroshi Ochi, Tatsumi Uwai, and Yuhei
Nagao, “Uplink multi-user MAC protocol for 11ax,” doc.:IEEE11-14/0598r0, Waikoloa,
Hawaii, USA, May 2014.
2. Leonardo Lanante, Tran Thi Thao Nguyen, Hiroshi Ochi, Tatsumi Uwai, and Yuhei
Nagao, “MAC Efficiency Gain of Uplink Multi-user Transmission,” doc.:IEEE 802.11-
15/0089r1, Atlanta, Georgia, USA, Jan. 2015.
3. Tatsumi Uwai, Yuhei Nagao, Tran Thi Thao Nguyen, Leonardo Lanante, and Hiroshi
Ochi, “UL MU MAC Throughput under None Full Buffer Traffic,” doc.:IEEE
802.11-15/0376r2, Berlin, Germany, Mar. 2015.
4. Leonardo Lanante, Tran Thi Thao Nguyen, Tatsumi Uwai, Yuhei Nagao, and Hi-
roshi Ochi, “Considerations on UL MU resource scheduling,” doc.:IEEE 802.11-
15/0377r0, Berlin, Germany, Mar. 2015.
