Évaluation des techniques de réception multi-antennes dans un système DS-CDMA by Sarraf, Élie
Reproduced with permission of the copyright owner.  Further reproduction prohibited without permission.
UNIVERSITÉ DU QUÉBEC 
THESIS PRESENTED TO
L'UNIVERSITÉ DU QUÉBEC À TROIS-RIVIÈRES
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE MASTER'S DEGREE IN ELECTRICAL ENGINEERING
BY
ELIE H. SARRAF 
ÉVALUATION DES TECHNIQUES DE RÉCEPTION MULTI-ANTENNES DANS UN 
SYSTÈME DS-CDMA 
JUNE 2007
  
 
 
 
Université du Québec à Trois-Rivières
Library Service

Notice

The author of this dissertation or thesis has authorized the Université du Québec
à Trois-Rivières to distribute, for non-profit purposes, a copy of his or her
dissertation or thesis.
This distribution does not entail a waiver by the author of his or her
intellectual property rights, including copyright, over this dissertation
or thesis. In particular, the reproduction or publication of all
or a substantial part of this dissertation or thesis requires his or her
authorization.
Abstract 
Smart antennas (SAs) are set to play a significant role in the development of
next-generation wireless communication systems. SAs are a new technology for wireless
systems that uses a fixed set of antenna elements arranged in an array.
Technological progress has recently turned the introduction of multiple antenna
elements at the receiver unit of wireless access points and mobile terminals from a
purely theoretical concept into a practical issue in current and future wireless
communication systems.
The signals from these antenna elements are combined to form a movable beam
pattern that can be steered, using either digital signal processing or RF (Radio
Frequency) hardware, toward a desired direction that tracks mobile units (pedestrians,
cars, etc.) as they move. This allows the SA system to focus the RF resources on a
particular subscriber while minimizing the impact of noise, interference, and other
effects that can degrade signal quality.
The challenge lies in two aspects: the first is purely mathematical, called
signal processing, and the second is purely hardware, called implementation. These two
aspects are closely interdependent when good performance is required, especially under
real-time requirements, which is the case here.
In short, software and hardware engineers face an enormous challenge in
designing, developing and implementing a reliable system core that combines an efficient
algorithm (software) with an intelligent architecture (hardware) at a moderate cost for
the next generation of wireless communication systems. This challenge, this trade-off,
this future vision constitutes the core of our work.
Résumé 
One of the main advantages of smart antenna systems lies in the potential
increase in the number of users in the cellular network on the one hand, and in the
broadening of the range of services offered by the cellular system on the other.
The interest of these systems is their ability to react automatically to a
complex environment whose interference is known a priori. The increase in the
number of users and the improvement of the quality of service represent an asset
for future third- and fourth-generation wireless systems.
These systems rely on array antennas, devices for computing the angles of
arrival, and digital synthesis tools. The latter assign weights to the elements of
the array antenna in order to optimize the output signal.
An adaptive array antenna can be defined as an array capable of modifying its
radiation pattern. This modification is achieved by an efficient implemented
algorithm able to meet the desired specifications, more specifically in the
context of a real-time communication system.
The problem has two physically different aspects which, in terms of
application, are highly dependent on each other.
The first is a software and mathematical aspect called "signal processing";
it covers the mathematical methods applied to smart antennas. The second is a
hardware aspect called "implementation"; it covers integrated circuits (DSP, FPGA,
ASIC, ...). In our case, programmable FPGA components are used.
The objective of our study is to find, for each aspect, the best combination
in order to provide a corresponding optimal solution.
Preface 
This thesis is based on work carried out at the Laboratory of Signals
and Systems Integration (LSSI) and at the Laboratory of the Canadian Microelectronics
Corporation, Department of Electrical Engineering, Université du Québec à
Trois-Rivières.
The purpose of this thesis is to provide a comparative study of Smart
Antennas (SAs), covering previous, current and future approaches. In addition, this work
presents a performance evaluation of the mathematical algorithms and methods used for
SAs. Moreover, to minimize the time-to-market, this work also provides an evaluation of
the hardware implementation on FPGA (Field Programmable Gate Array).
According to the results, the best candidate will be able to drive the performance up and
the costs down.
Several thousand years ago, King Solomon wrote "The end of a matter is better 
than its beginning, and patience is better than pride". 
At the end of this matter, I'd like to thank my family for their support and
patience. I acknowledge my gratitude to my research advisor, Dr. Daniel Massicotte, as
well as my research co-advisor, Dr. Adel-Omar Dahmane, for providing me the tools I
needed to complete this work and giving me the opportunity to explore both past and
present ideas and issues.
I would also like to acknowledge my colleague and friend Dr. Messaoud-Ahmed
Ouameur for his insightful analysis. He was very helpful in guiding me toward a
qualitative methodology.
My sincere thanks go to my brother Walid for always helping out in a crisis, and
for his financial support.
Ultimately, however, all thanks are due to the beginning and end, the Alpha and
Omega, the creator, sustainer, and redeemer of my soul, Jesus Christ.
Table of Contents 
ABSTRACT
RÉSUMÉ
PREFACE
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS
CHAPTER 1 - INTRODUCTION
1.1 INTRODUCTION
1.2 AIM OF THESIS
1.3 OVERVIEW OF CONTENT
CHAPTER 2 - SMART ANTENNAS STATE-OF-THE-ART
2.1 INTRODUCTION
2.2 THE NEED FOR SMART ANTENNAS
2.3 MULTIPLE ACCESS SCHEMES
    2.3.1 Frequency Division Multiple Access (FDMA)
    2.3.2 Time Division Multiple Access (TDMA)
    2.3.3 Code Division Multiple Access (CDMA)
    2.3.4 Space Division Multiple Access (SDMA)
2.4 SMART ANTENNA'S SYSTEM
    2.4.1 A First Basic Approach
    2.4.2 A Second Systematic Approach
    2.4.3 Two different functionality approaches
2.5 SWITCHED SMART ANTENNAS (SBA)
    2.5.1 Description
    2.5.2 Advantages
    2.5.3 Drawbacks
2.6 ADAPTIVE SMART ANTENNAS (TBA)
    2.6.1 Description
    2.6.2 Advantages
    2.6.3 Drawbacks
2.7 GENERAL BENEFITS
2.8 FUTURE CHALLENGES
2.9 SUMMARY
CHAPTER 3 - PERFORMANCE EVALUATION OF BEAMFORMING TECHNIQUES
3.1 INTRODUCTION
3.2 SIGNAL MODEL
3.3 NON ADAPTIVE PILOT-CHANNEL AIDED BEAMFORMING TECHNIQUES
    3.3.1 Direct Approach (MRC)
    3.3.2 Direct Approach (DMI)
    3.3.3 Eigen-Decomposition Approach
    3.3.4 Code Filtering Approach
3.4 ADAPTIVE PILOT-CHANNEL AIDED BEAMFORMING TECHNIQUES
    3.4.1 Least Mean Squares (LMS)
    3.4.2 Normalized Least Mean Squares (NLMS)
    3.4.3 Noise Constrained Least Mean Squares (NC-LMS)
    3.4.4 Recursive Least Squares (RLS)
    3.4.5 Set Membership Identification (SMI)
3.5 SIMULATIONS RESULTS
    3.5.1 PTR Effect
    3.5.2 Number of Antenna Effect
    3.5.3 Number of Users Effect
3.6 SUMMARY
CHAPTER 4 - FPGA IMPLEMENTATION OF BEAMFORMING TECHNIQUES
4.1 INTRODUCTION
4.2 IMPLEMENTATION OF BEAMFORMING ALGORITHMS
4.3 FPGA IMPLEMENTATION OF MRC AND NC-LMS
    4.3.1 Complexity analysis
    4.3.2 Proposed architectures
    4.3.3 Implementation Technique
    4.3.4 Hardware resources
    4.3.5 Quantization study
4.4 FPGA DESIGN AND IMPLEMENTATION OF DMI
    4.4.1 Redesign of DMI
    4.4.2 Complexity analysis
    4.4.3 Quantization study
    4.4.4 Proposed architecture
    4.4.5 Implementation technique
    4.4.6 Mapping to VLSI architectures
    4.4.7 Hardware resources
    4.4.8 Tradeoffs for maximum number of users
4.5 SUMMARY
CHAPTER 5 - CONCLUSION
REFERENCES
APPENDIX A - RESULTS WITH A LOWER COMPLEXITY PLATFORM
APPENDIX B - RÉSUMÉ EN FRANÇAIS
List of figures
Chapter 2
Figure 2.1   Wireless communication environment
Figure 2.2   Problems related to wireless communications
Figure 2.3   Multiple access schemes
Figure 2.4   Concept of a CDMA system
Figure 2.5   The new dimension: Space Division Multiple Access
Figure 2.6   Two analogies
Figure 2.7   SA system and functionality
Figure 2.8   Switched-beam-array
Figure 2.9   Adaptive-beam-array
Figure 2.10  Comparison between the ABA and the TBA
Chapter 3
Figure 3.1   Delay and Sum Beamformer
Figure 3.2   PTR effect
Figure 3.3   Number of antenna effect
Figure 3.4   Number of users effect
Chapter 4
Figure 4.1   Proposed implementation of MRC (a) and NC-LMS (b)
Figure 4.2   MRC mounted into Xilinx Blocks
Figure 4.3   NC-LMS mounted into Xilinx Blocks
Figure 4.4   Quantization study results for MRC and NC-LMS using N=8, K=5 users,
             PTR=0dB, mobile speeds of 60 km/h, carrier frequency 2 GHz and chip rate
             of 1.25 Mchip/s, and Simulink blocks
Figure 4.5   Fixed point results of the DMI technique (24-31). N = 16 (Processing
             gain), K = 10 users, PTR = -6dB
Figure 4.6   Proposed block diagram of the proposed VLSI architecture for the SD-DMI
             with the corresponding data flow
Figure 4.7   Legend of operators
Figure 4.8   Block b1
Figure 4.9   Block b2
Figure 4.10  Block b3
Figure 4.11  Block b4
Figure 4.12  Block b5
Figure 4.13  Block b6
Figure 4.14  Block b7
List of tables
Chapter 4
Table 4.1   Number of arithmetic operations per iteration for MRC and NC-LMS
Table 4.2   Hardware resources for one antenna
Table 4.3   Hardware resources for four antennas
Table 4.4   Percentage of slices S and multipliers M for four antennas
Table 4.5   Arithmetic complexity of the SD-DMI (proposed) versus RLS, per antenna
            and per user
Table 4.6   Time steps derivation from the SD-DMI scheduling of equations (4.2 - 4.9)
Table 4.7   Hardware resources for different blocks in the architecture for L=4 and J=10
Table 4.8   Maximum number of users K_MAX over different FPGA devices in the case of
            L=4, J=10 and M=1
List of Acronyms
A
ABA       Adaptive Beam Array
AB        Adaptive Beamforming
ASIC      Application Specific Integrated Circuit
ASSP      Application Specified System Processor
A/D       Analog to Digital
B
BER       Bit Error Rate
BPSK      Binary Phase Shift Keying
BRAM      Block Random Access Memory
BS        Base Station
C
CCI       Co-Channel Interference
CDMA      Code Division Multiple Access
CPLD      Complex Programmable Logic Device
CSI       Channel State Information
D
dB        Decibel
DOA       Direction of Arrival
DS-CDMA   Direct Sequence Code Division Multiple Access
DSP       Digital Signal Processor
D/A       Digital to Analog
E
EM        Electromagnetic
EVD       Eigen Value Decomposition
F
FDMA      Frequency Division Multiple Access
FIR       Finite Impulse Response
FF        Flip Flop
FPGA      Field Programmable Gate Array
H
HDL       Hardware Description Language
I
IC        Integrated Circuit
IF        Intermediate Frequency
ISI       Inter-Symbol Interference
IPN       Interference Plus Noise
IOB       Input Output Bound
I/O       Input Output
L
LMS       Least Mean Squares
LUT       Look Up Table
M
MAC       Multiplier Accumulator
MCU       Maximum Core Unit
MAI       Multiple Access Interference
MIMO      Multiple Input Multiple Output
MMSE      Minimum Mean Square Error
MRC       Maximum Ratio Combining
MSINR     Maximum Signal plus Interference Noise Ratio
MUD       Multi-user Detector
N
NC-LMS    Noise Constrained Least Mean Squares
NLMS      Normalized Least Mean Squares
O
OBE       Optimal Bounding Ellipsoids
P
PCS       Personal Communication Service
PE        Process Element
POCR      Performance Over Complexity Ratio
PTR       Pilot to Traffic Power Ratio
Q
QPSK      Quadrature Phase Shift Keying
R
RF        Radio Frequency
RLS       Recursive Least Squares
S
SA        Smart Antenna
SB        Switched Beamforming
SINR      Signal to Interference Noise Ratio
SMI       Set Membership Identification
SNOI      Signal Not Of Interest
SOI       Signal Of Interest
T
TBA       Tracking Beam Array
TBUF      Tristate Buffer
TDMA      Time Division Multiple Access
U
ULA       Uniform Linear Array
UMTS      Universal Mobile Telecommunication System
V
VLSI      Very Large Scale Integration
VSLMS     Variable Step-size Least Mean Square
VHDL      VHSIC Hardware Description Language
VHSIC     Very High Speed Integrated Circuits
W
WCDMA     Wideband Code Division Multiple Access
WLAN      Wireless Local Area Network
WLL       Wireless Local Loop
Chapter 1 
Introduction 
1.1 Introduction 
Since the dawn of civilization, communication has been of foremost importance
to mankind. At first, communication was accomplished by sound, through the
voice. However, as the distance of communication increased, numerous devices were
introduced, such as horns, drums, and so forth.
At some point, animals such as pigeons were used to send messages over long
distances. In addition, visual techniques were introduced for even greater distances.
For example, signal flags and smoke signals were used in the daytime and fireworks at
night. These optical communications use the visible-light portion of the electromagnetic
(EM) spectrum; only in recent times has the EM spectrum outside the
visible region been adopted for communication, through the use of radio.
The radio antenna may be defined as the structure associated with the region of
transition between a guided wave and a free-space wave, or vice versa [KRA88]. In
other words, radio antennas couple EM energy from one medium, e.g. free space, to
another, such as a waveguide, wire or coaxial cable.
Consequently, applications of wireless communication systems such as
radio, Bluetooth, cellular networks, WLAN (Wireless Local Area Network) and MIMO
(Multiple Input Multiple Output) have spread throughout the world, and recent years have
witnessed wireless communications enjoying the fastest growth period in their history.
This rapid expansion in the applications of wireless communications is due to an
enormous evolution of signal processing algorithms, which drives up performance and
quality of service, and, most importantly, to the evolution of hardware
implementation, which drives designs toward lower power consumption, lower complexity
and lower cost.
FPGAs have historically been found in high-end professional broadcast systems,
network surveillance cameras and medical imaging equipment; their flexibility has made
them very suitable for digital signal processing applications. Today, the FPGA design
flow is still largely characterized by a hardware-centric approach.
What is required is to expose the high computational efficiency of FPGAs,
matched by high-bandwidth concurrent memory access and rich on-chip
interconnectivity, combined with complete programmability. These characteristics make
FPGAs well suited for highly efficient implementation of signal processing, packet
processing and high-performance computing applications [BOL06].
In our view, the hardware implementation aspect is much more crucial than the
signal processing aspect, since time-to-market depends on the hardware
implementation and its efficiency: whether it meets the real-time requirements and,
most importantly, whether its cost is affordable.
1.2 Aim of thesis 
The demand for high-performance, better-quality, lower-complexity wireless
communication systems has led to research and studies on this exciting topic.
Communications has become the key to momentous changes in the organization
of businesses and industries worldwide as they themselves adjust to the shift toward an
information economy. Consumers demand more and more high-tech services,
and these services require the development of sophisticated algorithms and, at the same
time, an intelligent hardware implementation.
In this project we address the problem of cellular systems in the third generation of
mobile communications. We evaluate existing methods, examine methods that had
never been used for SAs and, at some point, redesign an existing
algorithm. This redesign enables us to exploit more efficiency and parallelism in the
algorithm, which results in larger bandwidth and higher quality of service through more
reliable communication.
The major goal of this work is to provide a performance evaluation of the
existing numerical algorithms and methods used for SAs. The evaluation is carried out in
terms of the Bit Error Rate (BER) for varying Pilot to Traffic Power Ratio (PTR), number
of antenna elements L, and number of users K to be served by the base station (BS).
For an insightful analysis of the results and for better understanding, a DS-CDMA (Direct
Sequence Code Division Multiple Access) platform based on real-time requirements has
been developed and deployed for simulations. The purpose of this evaluation is to give
an overview of the methods used and to choose the method most favorable for
hardware implementation.
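The BER metric used in this evaluation can be illustrated with a minimal sketch. The Python code below is not the DS-CDMA platform of this thesis: it assumes a single-user BPSK link over an additive white Gaussian noise channel, and the function names `bit_error_rate` and `simulate_bpsk_ber`, as well as the SNR values, are hypothetical and chosen only to show how one BER point is estimated by counting detection errors.

```python
import math
import random


def bit_error_rate(sent, received):
    """Fraction of positions where the detected bit differs from the sent bit."""
    if len(sent) != len(received):
        raise ValueError("sequences must have the same length")
    if not sent:
        raise ValueError("sequences must not be empty")
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)


def simulate_bpsk_ber(snr_db, n_bits=20000, seed=0):
    """Monte Carlo BER of BPSK over an AWGN channel (illustrative assumption,
    not the multi-user, multi-antenna model evaluated in the thesis)."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)           # Eb/N0 as a linear ratio
    sigma = math.sqrt(1 / (2 * snr))    # noise std for unit-energy symbols
    sent = [rng.randint(0, 1) for _ in range(n_bits)]
    received = []
    for bit in sent:
        symbol = 1.0 if bit else -1.0   # BPSK mapping: 0 -> -1, 1 -> +1
        sample = symbol + rng.gauss(0.0, sigma)
        received.append(1 if sample > 0 else 0)
    return bit_error_rate(sent, received)


if __name__ == "__main__":
    for snr_db in (0, 4, 8):
        print(snr_db, simulate_bpsk_ber(snr_db))
```

Sweeping one parameter at a time (here the SNR; in this work the PTR, L, or K) and recording the resulting BER yields one performance curve per method, which is the form of comparison the evaluation relies on.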
For hardware evaluation, three methods were chosen based on their complexity
and performance, and two different implementation techniques were used:
rapid prototyping and regular manual coding. The purpose of this
evaluation is to give an overview of the two techniques and to find a suitable
architecture for each chosen algorithm that meets our real-time requirements.
1.3 Overview of content 
The contents are organized as follows: 
Chapter 2 provides an in-depth description of the SA technology. It discusses the
need for this technology and where to use it, and the different access schemes with their
characteristics and limitations. This chapter presents a description of the SA system,
its different approaches, advantages, drawbacks and future challenges.
Chapter 3 describes the mathematical aspect of the SA system, the backbone
of the system, known as beamforming. This chapter discusses the DS-CDMA platform
that has been used for simulations. Then it presents an in-depth description of the
algorithms used, both adaptive and non-adaptive. Simulations are carried out for
different schemes and parameters.
Chapter 4 describes the FPGA implementation of the three chosen algorithms.
It presents the implementation techniques, their characteristics and efficiency, then
describes the proposed architectures, the hardware resources, and the quantization study
used to determine the precision of the algorithm.
Lastly, Chapter 5 summarizes and concludes the thesis and outlines future
developments.
Chapter 2 
Smart Antennas State-Of-The-Art 
2.1 Introduction 
With the valuable EM spectrum being limited, engineers are looking for new
frontiers in the wireless communications battle. It would be very easy to allocate a
frequency channel to each user, but for a fixed bandwidth of spectrum there is a
fundamental limit on the number of radio channels that can be realized by a mobile or
cellular communication system operating over that bandwidth [BHO01], [BOU00], and
[BEL02a]. This can be considered one of many reasons for the interest in SA
technology.
A second reason is the multi-path phenomenon, when the transmitted radio 
signal is reflected by physical features/structures [BOUOO] such as buildings, cars and 
other users. This reflection creates, as it is shown in Figure 2.1, multiple signal paths 
between the base station and the user terminal, and the consequences of this multi-path 
phenomenon are very dramatic on the wireless communications. 
Figure 2.1: Wireless communication environment. 
The interest in this promising technology is increasing since spatial processing is
considered as the last frontier in the battle for cellular system capacity with a limited
amount of radio spectrum [BOU00]; as mentioned above, a limited channel
bandwidth must satisfy a growing demand from a large number of mobiles on
communications channels [BEL02a].
Unlike previously published work, which covered each area separately (the
communications problems, the antenna array design, and the adaptive algorithms) for
SAs, this work presents a global overview of SAs, from the need for their
employment to the performance of the adaptive algorithms used in SAs.
2.2 The need for smart antennas 
For a better understanding of signal propagation in a telecommunication system,
a model of wireless communication is illustrated in Figure 2.1. This model contains
a base station and two users communicating with each other.
One of the major problems is what we call Co-Channel Interference (CCI)
[IECOE], [ALE04]. It occurs when the same carrier frequency reaches the same receiver
from two separate transmitters, i.e. the signals that miss an intended user can become
interference for users on the same frequency in the same or an adjoining cell. Here is a
good illustration of this type of interference. Envision a perfectly still pool of water into
which a stone is dropped. The waves that radiate outward from that point are uniform
and diminish in strength evenly. This pure omni-directional broadcasting equates to one
caller's signal originating at the terminal and going uplink. It is interpreted as one signal
everywhere it travels.
Picture now a Base Station (BS) at some distance from the wave origin. If the
pattern remains undisturbed, it is not a challenge for a base station to interpret the waves.
But as the signal's waves begin to bounce off the edges of the pool, they come back
(perhaps in a combination of directions) to intersect with the original wave pattern. As
they combine, they weaken each other's strength. These are multi-path interference
problems [IECOE]. The problem becomes more serious when a few more stones are
dropped in different areas of the pool, equivalent to other calls starting. How could
a base station at any particular point in the pool distinguish which stone's signals were
being picked up and from which direction? This multiple-source problem is called
Co-Channel Interference.
These are two-dimensional analogies: to fully comprehend the distinction
between callers and/or signals in the earth's atmosphere, a base station must possess the
intelligence to place the information it analyzes in a true spatial context.
Another major problem is multi-path. In a wireless system, the transmitted
signal interacts with the environment in a very complex way. There are reflections from
large objects, diffractions from sharp edges, and scattering. The result of these
interactions is the presence of many signal components, which are referred to as the
multi-path signal at the receiver [ZHA01]. It is one of the major and serious problems in
wireless communications systems and can be a limiting factor in a system [ALE04],
[SHI97] (see Figure 2.1).
Many problems are associated with multi-path. One problem resulting from
having unwanted reflected signals is that the phases of the waves arriving at the
receiving station do not match. The phase of a radio wave is simply an arc of a radio
wave, measured in degrees, at a specific time [IECOE] (see Figure 2.2 (a)).
Figure 2.2: Problems related to wireless communications. (a) Signals out of phase,
(b) Phase cancellation, and (c) Fading.
Another resulting problem is phase cancellation, which occurs when the waves of
two multi-path signals are rotated to exactly 180° out of phase; the signals then cancel
each other (see Figure 2.2 (b)). The effect is more of a concern when the control channel
signal is canceled out, resulting in a black hole, a service area in which call set-ups will
occasionally fail [IECOE].
Fading also results from the multi-path problem. It is a reduction in signal
strength when the waves of multi-path signals are out of phase (see Figure 2.2 (c)). This
phenomenon is known as "Rayleigh fading" or "fast fading"; it causes the received signal
to fluctuate downward, producing a momentary but periodic degradation in quality
[IECOE].
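The effect of the relative phase on the combined signal can be illustrated numerically. The following Python sketch is not taken from the cited references; it assumes an idealised two-path model in which a direct and a reflected wave are added as complex phasors, and the name `combined_amplitude` is introduced here for illustration only.

```python
import cmath
import math


def combined_amplitude(phase_offset_deg, direct_amp=1.0, reflected_amp=1.0):
    """Amplitude of a direct wave plus one reflection (illustrative two-path model).

    phase_offset_deg is the phase of the reflected wave relative to the direct one.
    """
    direct = complex(direct_amp, 0.0)
    reflected = reflected_amp * cmath.exp(1j * math.radians(phase_offset_deg))
    return abs(direct + reflected)


# In phase, partially out of phase (fading), and exactly 180 degrees (cancellation).
for offset_deg in (0, 120, 180):
    print(offset_deg, round(combined_amplitude(offset_deg), 3))
```

Under these assumptions the two waves add to twice the single-path amplitude when in phase, fall back to the single-path amplitude at 120°, and cancel at 180°. A real channel contains many paths of unequal amplitude, so the cancellation is rarely complete.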
Another resulting problem is delay spread. It occurs in multi-path
propagation environments when a desired signal, arriving from different directions,
becomes delayed due to different travel distances [CHR00] (see Figure 2.1). This effect
has a critical impact on link quality. The result of this delay spread is Inter-Symbol
Interference (ISI) [ALE04], i.e. bits crashing into one another that the receiver
cannot sort out. When this occurs, the BER rises and eventually causes noticeable
degradation in signal quality [IECOE].
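How a delayed echo produces ISI can be sketched with a toy baseband model. The example below is an assumption made for illustration, not a model used in this work: antipodal symbols (+1/-1), one sample per symbol, and a single echo whose gain and delay (`echo_gain`, `echo_delay_symbols`) are hypothetical parameters.

```python
def two_path_channel(symbols, echo_gain, echo_delay_symbols):
    """Received samples: the direct path plus one delayed, attenuated echo."""
    received = []
    for n, symbol in enumerate(symbols):
        k = n - echo_delay_symbols
        # Before the first echo arrives, only the direct path contributes.
        echo = echo_gain * symbols[k] if k >= 0 else 0.0
        received.append(symbol + echo)
    return received


symbols = [1, -1, 1, 1, -1]
received = two_path_channel(symbols, echo_gain=0.8, echo_delay_symbols=1)
print([round(r, 2) for r in received])
```

With an echo delayed by one symbol, each received sample contains part of the previous symbol. Whenever two consecutive symbols differ, the sample magnitude drops from 1 to about 0.2, so a small amount of noise is then enough to cause a bit error.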
2.3 Multiple access schemes 
Due to the recent development of wireless communication systems, the range of
frequencies available for wireless communication technologies can be utilized in various
ways, which are referred to as multiple access schemes. These techniques are
adopted to allow numerous users to share a finite amount of signal spectrum
simultaneously.
The distribution of spectrum is required to achieve this high system capacity by
simultaneously allocating the available bandwidth (or available number of channels) to
multiple users. This must be accomplished without severe degradation in the
performance of the system in order to achieve high quality communications.
Conventionally, there are three major access schemes used to share the available
bandwidth in a wireless communication system. They are known as Frequency
Division Multiple Access (FDMA), Time Division Multiple Access (TDMA), and
Code Division Multiple Access (CDMA).
In [BHO01], the authors present the evolution of SAs for the first, second and
third generation. Here we present a brief overview of each technique, with its advantages
and limitations, in preparation for presenting the new technology known as Space
Division Multiple Access (SDMA).
2.3.1 Frequency Division Multiple Access (FDMA) 
FDMA is the most widespread multiple-access scheme for land mobile
communication systems due to its ability to discriminate channels easily by filters in
the frequency domain. In FDMA, every subscriber is allocated an individual, unique
frequency band or channel (see Figure 2.3 (a)), where the allocated system bandwidth is
divided into bands of width W_ch with guard spaces between adjacent channels to prevent
the spectrum overlap that may result from carrier frequency instability.
When a user sends a call request, the system assigns one of the
available channels to the user, and the channel is used exclusively by that user
during the call. The system reassigns this channel to a different user once the
previous call is terminated.
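The channel assignment described above can be sketched as a small channel pool. This is only an illustration: the class name `FdmaChannelPool`, the bandwidth figures, and the assumption that each channel consumes its own band plus one guard space are all hypothetical and not taken from any standard.

```python
class FdmaChannelPool:
    """Toy FDMA allocator: one exclusive frequency channel per active call."""

    def __init__(self, system_bandwidth_hz, channel_bandwidth_hz, guard_hz):
        # Simplifying assumption: every channel costs its band plus one guard space.
        self.num_channels = int(system_bandwidth_hz // (channel_bandwidth_hz + guard_hz))
        self.free = list(range(self.num_channels))
        self.in_use = {}

    def request_call(self, user):
        """Assign a free channel to the user, or return None if the call is blocked."""
        if not self.free:
            return None
        channel = self.free.pop(0)
        self.in_use[user] = channel
        return channel

    def end_call(self, user):
        """Release the user's channel so it can be reassigned to another user."""
        channel = self.in_use.pop(user)
        self.free.append(channel)


pool = FdmaChannelPool(system_bandwidth_hz=1_000_000,
                       channel_bandwidth_hz=25_000, guard_hz=5_000)
print(pool.num_channels)
```

The sketch makes the capacity limit of FDMA visible: once every channel is in use, further call requests are blocked until an ongoing call terminates.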
One of the most important advantages of an FDMA system is that there is no need
for synchronization or timing control and therefore the hardware is simple. In addition,
only flat fading needs to be considered for anti-fading techniques because the
bandwidth of each channel in FDMA is sufficiently narrow. On the other hand, there
are several problems associated with FDMA systems:
> Inter-modulation interference increases with the number of carriers.
> Variable rate transmission is difficult because such a terminal has to be equipped
with many modems. For the same reason, composite transmission of voice and
non-voice data is also difficult.
> High Q-value for the transmitter and receiver filters is required to guarantee high
channel selectivity.
2.3.2 Time Division Multiple Access (TDMA)
In the basic TDMA protocol, the transmission time is divided into frames of
equal duration, and each frame is divided into the same number of time slots of
equal duration. Each slot position within a frame is allocated to a different user, and this
allocation is maintained over successive frames.
Each user occupies a cyclically repeating time slot, so a channel may be thought
of as a particular time slot that reoccurs every frame. TDMA systems transmit data in a
buffer-and-burst method; thus, the transmission for any user is non-continuous. This
implies that digital data and digital modulation must be used with TDMA [SEU99].
A TDMA frame with four time slots per frame is illustrated in Figure 2.3 (b), with
the shaded areas representing the guard times in each slot in which transmission is
prohibited. Guard times are essential, as they prevent transmissions of different
(spatially distributed) users from overlapping due to transmission delay differences.
Figure 2.3: Multiple access schemes (a) FDMA, (b) TDMA.
2.3.3 Code Division Multiple Access (CDMA)
In CDMA systems, the signal is multiplied by a very large bandwidth signal
called the spreading signal. The spreading signal is a pseudo-noise code sequence that has
a chip rate which is orders of magnitude greater than the data rate of the message
[LEB99]. Each having its own pseudorandom codeword, all subscribers in a CDMA system
use the same carrier frequency and may transmit simultaneously. A CDMA system is
illustrated in Figure 2.4. The most distinctive feature of a CDMA system is that all the
terminals share the whole bandwidth, and each terminal's signal is discriminated by its code.
In a CDMA mobile communication system, since each subscriber is assigned its
own (orthogonal) code, the signal of each subscriber is separated by individually
correlating the received signal with the code sequence assigned to each subscriber,
performing a matched-filtering operation [SEU99].
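The spreading and correlation steps can be illustrated with a minimal baseband sketch. It assumes two synchronous users, antipodal symbols (+1/-1), an ideal noise-free channel, and two length-4 orthogonal codes chosen here for illustration; it is not the DS-CDMA platform used for the simulations in Chapter 3.

```python
def spread(bits, code):
    """Replace every data symbol (+1/-1) by the symbol multiplied by each chip of the code."""
    return [bit * chip for bit in bits for chip in code]


def despread(received, code):
    """Correlate each code-length block with the user's code and decide on the sign."""
    n = len(code)
    symbols = []
    for start in range(0, len(received), n):
        block = received[start:start + n]
        correlation = sum(r * c for r, c in zip(block, code))
        symbols.append(1 if correlation > 0 else -1)
    return symbols


code_a = [1, -1, 1, -1]   # illustrative orthogonal codes:
code_b = [1, 1, -1, -1]   # their chip-wise product sums to zero
bits_a = [1, -1, 1]
bits_b = [-1, -1, 1]

# Both users transmit at the same time on the same carrier; the channel adds them.
channel = [a + b for a, b in zip(spread(bits_a, code_a), spread(bits_b, code_b))]

print(despread(channel, code_a) == bits_a, despread(channel, code_b) == bits_b)
```

Because the two codes are orthogonal, the contribution of the other user vanishes in each correlation, so both symbol streams are recovered from the same composite signal. With asynchronous users or multi-path the codes are no longer perfectly orthogonal and residual multiple access interference remains.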
Figure 2.4: Concept of a CDMA system. (a) Spectrum of a CDMA system, (b) a
call initiation and holding model for the five-user case, (c) channel
allocation for each user.
When each user sends a call request to the base station, the base station assigns
one of the spreading codes to the user. When five users initiate and hold calls as
shown in Figure 2.4 (b), time and frequency are occupied as shown in Figure 2.4 (c).
Therefore CDMA requires a larger bandwidth as compared to FDMA and TDMA.
Furthermore, there is also a need for code synchronization in a CDMA system.
2.3.4 Space Division Multiple Access (SDMA)
In addition to these techniques, SAs provide users with a new multiple access
method, known as SDMA.
As its name implies, SDMA technology exploits information collected in the
spatial dimension in addition to the temporal dimension to achieve significant
improvements in wireless information transmission. Spatially selective transmission and
reception of RF energy provides substantial increases in wireless system capacity,
coverage and quality [ROY97].
The SDMA scheme, which is commonly referred to as space diversity, uses SAs to
provide control of space by providing virtual channels in the angle domain. With the use
of this approach, simultaneous calls in different cells can be established at the
same carrier frequency.
The SDMA scheme is based upon the fact that a signal arriving from a distant
source reaches different antennas in an array at different times due to their spatial
distribution, and this delay is utilized to differentiate one or more users in one area from
those in another area [GOD97a].
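The dependence of these arrival delays on the direction of the source can be sketched for a simple geometry. The example assumes a uniform linear array, a far-field plane wave, and an arrival angle measured from the array broadside; the function name and the numerical values (four elements spaced 7.5 cm apart) are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def element_delays(num_elements, spacing_m, angle_deg):
    """Arrival delay (s) at each element relative to element 0 for a plane wave.

    angle_deg is measured from the array broadside (assumed geometry).
    """
    theta = math.radians(angle_deg)
    return [m * spacing_m * math.sin(theta) / SPEED_OF_LIGHT
            for m in range(num_elements)]


# A user at broadside and a user at 30 degrees produce different delay profiles.
print(element_delays(4, 0.075, 0.0))
print(element_delays(4, 0.075, 30.0))
```

A source at broadside reaches all elements at the same instant, whereas a source at 30° produces a delay that grows linearly along the array. It is this direction-dependent delay profile that allows two users on the same channel to be told apart.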
Its advanced spatial processing capability enables it to locate many users,
creating a different sector for each user, as shown in Figure 2.5. This means that more
than one user can be allocated simultaneously to the same physical communications
channel in the same cell, with only an angular separation.
This technology dramatically improves the interference suppression capability
while it greatly increases frequency reuse, resulting in increased capacity and reduced
infrastructure cost. Basically, capacity is increased not only through inter-cell frequency
reuse, but also through intra-cell frequency reuse [BEL02a].
Figure 2.5: The new dimension: Space Division Multiple Access.
Moreover, this technique enables an effective transmission to take place in one
cell without affecting the transmission in another cell. Without the use of an array, this
can be accomplished by having a separate base station for each cell and keeping the cell
size permanent, while the use of space diversity enables dynamic changes of cell shapes to
reflect the user movement. Thus an array of antennas contributes an extra dimension
to this system by providing dynamic control in space, which leads to
improved capacity and better system performance.
In conclusion, we refer to this technology, or to this new dimension, as one of
Qualcomm's founders, Andrew Viterbi, stated: "Spatial processing remains as the most
promising, if not the last frontier, in the evolution of multiple access systems" [ROY97].
2.4 Smart antenna's system 
First of all, antennas have been the most neglected of all the components in
personal communications systems. Yet, the manner in which energy is distributed into
and collected from surrounding space has a profound influence on the efficient use of
spectrum, the cost of establishing new networks, and the service quality provided by
those networks [IECOE].
The term "antenna" itself comprises only the mechanical construction
transforming free EM waves into RF signals traveling on a shielded cable and vice versa;
most often it is called the "radiating element" [BHO01], [KRA88], i.e. it is the port
through which RF energy is coupled from the transmitter to the outside world and,
conversely, from the outside world to the receiver [IECOE].
The SA system combines multiple antenna elements with a signal processing
capability to optimize its radiation and/or reception pattern automatically in response to
the signal environment [IECOE]. In other words, we define a SA system as an array of
antenna elements [BHO01], [CHR00] connected to either an analog receiver or a digital
signal processor, whose radiation pattern adapts to the current signal environment
[BHO01].
2.4.1 A First Basic Approach 
The term SAs was born in the early 1990s when the well-developed adaptive
arrays used in the military were brought by several scientists into mobile
communications. To be more precise, SAs entered research in civil mobile
communications before the 1990s under the original name adaptive antenna arrays
[KAI05]. So, the fundamental theory of SAs is not new; in fact, they have been applied
in defense-related systems [BHO01], [BEL02a], [IECOE] since World War II
[BEL02a]. More recently, the application of SAs has been suggested for mobile
communications systems, to overcome the problems mentioned in section 2.2.
The most intriguing question concerns their name: they are called smart, but what makes
them smart?
In truth, antennas are not smart; antenna systems are smart [BEL02a], [IECOE].
Actually, it is the digital signal processing, along with the antennas, which makes the
system smart [BEL02a].
How can this smartness be understood? Where does it come from? Two
good illustrations show that the closest system to an SA is the human ear
[BEL02a] and [IECOE]; besides, the functionality of many engineering systems is readily
understood when it is related to the human body. For instance, close your eyes and
converse with someone as they move about the room. You will notice that you can
determine their location without seeing them because of the following:
» You hear the speaker's signals through your two ears, your acoustic sensors.
» The voice arrives at each ear at a different time.
» Your brain, a specialized signal processor, does a large number of calculations to
correlate information and compute the location of the speaker.
In addition to that, your brain adds the strength of the signals from each ear together,
so you perceive sound in one chosen direction as being twice as loud as everything else
[BEL02a], [IECOE]. Furthermore, if additional speakers join in the conversation, the
brain can tune out unwanted interferers and concentrate on one conversation at a time.
Conversely, the listener can respond back in the same direction as the desired speaker by
orienting his or her transmitter (his or her mouth) towards the speaker [BEL02a]. As a
result, 8, 10, or 12 ears can be employed to help fine-tune and turn up signal information
[IECOE], or in the equivalent electrical system, an array of 8, 10 or 12 antenna elements
can be used.
2.4.2 A Second Systematic Approach
Electrical SA systems do the same things as the ears in the reception case and as
the mouth in the transmission case; in both cases a digital signal processor takes the
place of the brain (see Figure 2.6).
Figure 2.6: Two analogies (a) human analogy, (b) SA analogy.
Therefore, after the digital signal processor receives the time delays from each
antenna element, it computes the Direction-Of-Arrival (DOA) of the Signal Of Interest
(SOI). It then adjusts the excitations (the amplitudes and phases of the signal) to produce
a radiation pattern that focuses on the SOI, while tuning out any Signal Not Of Interest
(SNOI) [BEL02a], which sums up the smartness of the system. This scenario
is illustrated in Figure 2.7.
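The two steps described above (estimating the DOA, then adjusting the excitations) can be sketched under strong simplifying assumptions: a narrowband signal, so that inter-element delays appear as phase shifts; a uniform linear array with half-wavelength spacing; a single noise-free source; and a conventional phase-only (delay-and-sum) beamformer. None of the function names below comes from the cited references, and this is not one of the algorithms studied in Chapter 3.

```python
import cmath
import math


def steering_vector(num_elements, spacing_wavelengths, angle_deg):
    """Relative phasors seen by a uniform linear array for a plane wave from angle_deg."""
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * m * phase_step) for m in range(num_elements)]


def estimate_doa_deg(snapshot, spacing_wavelengths):
    """DOA from the phase difference between elements 0 and 1 (one source, no noise)."""
    phase_diff = cmath.phase(snapshot[1] * snapshot[0].conjugate())
    return math.degrees(math.asin(phase_diff / (2 * math.pi * spacing_wavelengths)))


def steer_weights(num_elements, spacing_wavelengths, angle_deg):
    """Phase-only excitations that add the element outputs coherently for angle_deg."""
    a = steering_vector(num_elements, spacing_wavelengths, angle_deg)
    return [x / num_elements for x in a]


def response(weights, spacing_wavelengths, angle_deg):
    """Magnitude of the array output w^H a(angle) for a unit-amplitude plane wave."""
    a = steering_vector(len(weights), spacing_wavelengths, angle_deg)
    return abs(sum(w.conjugate() * x for w, x in zip(weights, a)))


M, D = 4, 0.5                            # assumed: 4 elements, half-wavelength spacing
snapshot = steering_vector(M, D, 20.0)   # noise-free SOI arriving from 20 degrees
doa = estimate_doa_deg(snapshot, D)
weights = steer_weights(M, D, doa)
print(round(doa, 2), round(response(weights, D, 20.0), 3), round(response(weights, D, -40.0), 3))
```

In this idealised case the estimated DOA matches the true one, the response towards the SOI is unity, and a signal arriving from -40° is strongly attenuated because it falls in a low-gain region of the pattern. Practical systems must estimate the DOA in the presence of noise and several sources, which requires more elaborate algorithms.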
Figure 2.7: SA system and functionality.
An SA system is generally co-located with a BS. As mentioned earlier in this
section, it combines an antenna array with a digital signal-processing capability to
transmit and receive in an adaptive, spatially sensitive manner.
This digital signal processor is called the control unit, or the SA's intelligence. It
receives signal input from a smart scanning receiver and outputs to an antenna
controller. The processor controls the feed parameters of the antenna, based on several
inputs, in order to optimize the communication link. In the following sub-sections
we present the types of SA system. The evolution process from the conventional
base station (cell sectoring or splitting), through diversity techniques (switched
diversity and diversity combining), to SA base stations is beyond the scope
of this thesis; for more in-depth details, the reader is referred to [BEL01] and
[IECOE].
2.4.3 Two different functionality approaches
Terms commonly heard today that embrace various aspects of SA system
technology include intelligent antennas, phased array, SDMA, spatial processing, digital
beamforming, adaptive antenna systems [IECOE], antenna array [SHI97], fully adaptive,
and others.
So, depending on the type of SA system, different optimization criteria can be
used for the transmit/receive patterns; SAs are more than just the "antenna", they are
rather a complete transceiver concept [BHO01]. Thus, SA systems are customarily [IECOE]
categorized as either switched or adaptive [BHO01], [BEL02a], [IECOE], [CHR00].
2.5 Switched smart antennas (SBA) 
2.5.1 Description 
The switched approach is called switched beam [BHO01], Switching-Beam
Array [SEU99], or switched-beam system [BEL02a], [CHR00]. In this work we refer to it
as the Switched Beam Array or SBA.
In the SA system, the SBA approach is considered as an extension of the current
cellular sectorization scheme, in which a typical sectorized cell site is composed of three
120-degree macro-sectors; this approach subdivides the macro-sectors into
micro-sectors [CHR00]. The switched approach forms multiple fixed beams with enhanced
selectivity in specific areas or in particular directions.
These antenna systems detect signal strength and select the best of several
[BHO01], [BEL02a], [IECOE], [CHR00] predetermined, fixed beams for the
subscribers as they move throughout the coverage sector [BEL02a], [IECOE]. So,
during the call, the system monitors the signal strength and switches to other
micro-sectors if required [CHR00], as illustrated in Figure 2.8 (a). Thus, instead of
shaping the directional antenna pattern with the metallic properties and physical design
of a single element, the switched approach couples the outputs of multiple antennas
in such a manner that it forms finely sectorized or directional beams with more spatial
selectivity than the conventional, single-element approach [IECOE].
Figure 2.8: Switched-beam-array (a) functionality (b) block diagram.
2.5.2 Advantages 
Concerning the hardware of the system, it is a simple technique that comprises
only a basic switching function between separate directive antennas or predefined beams
of an array [BHO01] (see Figure 2.8). A block diagram of SBA systems is illustrated in
Figure 2.8 (b). The switch matrix should quickly and accurately switch the subscriber's
channel to the correct beam, in which the user has the best signal, with no degradation
to voice quality [BHO01].
The receivers take the RF energy from the multi-beam antennas and choose the
appropriate beam [BEL02a], the one that gives the best Signal-to-Interference-plus-Noise
Ratio (SINR) [BHO01] or the one having the strongest signal [CHR00]. For more in-depth
information about this, please refer to [BHO01], [SEU99].
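The beam selection step can be sketched as follows. The example assumes four fixed beams formed by a four-element uniform linear array with half-wavelength spacing, a single user, and selection by strongest signal only; the beam centres and function names are hypothetical and do not describe any particular SBA product.

```python
import cmath
import math

BEAM_CENTERS_DEG = [-45.0, -15.0, 15.0, 45.0]  # hypothetical fixed beam directions


def array_gain(beam_center_deg, arrival_deg, num_elements=4, spacing_wavelengths=0.5):
    """Normalised gain of a fixed beam steered to beam_center_deg for a wave from arrival_deg."""
    delta = 2 * math.pi * spacing_wavelengths * (
        math.sin(math.radians(arrival_deg)) - math.sin(math.radians(beam_center_deg)))
    total = sum(cmath.exp(1j * m * delta) for m in range(num_elements))
    return abs(total) / num_elements


def select_beam(arrival_deg):
    """Return (index, gain) of the fixed beam that receives the user most strongly."""
    gains = [array_gain(center, arrival_deg) for center in BEAM_CENTERS_DEG]
    best = max(range(len(gains)), key=gains.__getitem__)
    return best, gains[best]


index, gain = select_beam(10.0)
print(BEAM_CENTERS_DEG[index], round(gain, 3))
```

A user at 10° is served by the beam centred at 15°, but with a gain slightly below the maximum because the user does not sit at the centre of the beam. As the user moves, the selection is simply repeated and the system switches to a neighbouring beam when that beam becomes stronger.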
The switched approach offers numerous advantages of more elaborate SA systems
at a fraction of the complexity and expense [CHR00]. Consequently, there are a number
of significant improvements with such a system.
They provide range extension benefits, yielding a 40% increase
in the range of the sector [BHO01], and reduce the transmitting power by 3-6 dB, for the
base station too, which reduces the interference introduced into co-channel cells [BHO01];
the gain of the system is consequently increased according to the location of the user
[BEL02a].
Another very practical benefit of this approach is a reduction in delay
spread in certain propagation environments. On the other hand, there are many
limitations to systems based on the switched approach.
2.5.3 Drawbacks 
Since the beams are predetermined, the intended user may not be in the center of
the beam [BEL02a]. The signal strength varies as the user moves through the sector;
especially if the user moves to the edge of the beam, the signal strength degrades rapidly
before the user is switched to another beam [BHO01], [CHR00]. Another limitation is
that a switched beam system does not distinguish between a desired user and an
interfering one [BHO01], [CHR00]; in other words, if the interferer is located at the
same angle as the desired user, it may be enhanced more than the desired user [BHO01],
[BEL02a], [SEU99], which results in poor quality due to low SINR.
So the interference suppression is limited by the antenna bandwidth of the
switched beam [BHO01]. However, compared to conventional sectored cells, SBA
systems can increase the range of a base station by 20% to 200%, depending on the
circumstances of operation. The additional coverage means that an operator can achieve
a substantial reduction in infrastructure costs [CHR00].
2.6 Adaptive smart antennas (ABA)
2.6.1 Description 
The second approach is called digitally adaptive beamforming (AB), phased
arrays, adaptive antenna array [BHO01], adaptive array [BEL02a], [CHR00], or
tracking-beam-array [SEU99]. In this work we refer to it as the Adaptive Beam Array
(ABA). This approach deals with the communication and the base station in a very
different way, in effect adding a dimension in space.
Using a variety of new signal processing algorithms, adaptive
systems provide more degrees of freedom [BEL02a], since they adapt the radiation
pattern to the RF signal environment as it changes in real time.
Figure 2.9: Adaptive-beam-array, (a) functionality, (b) block diagram.
This system takes advantage of its ability to effectively locate and track various
types of signals to dynamically minimize interference and maximize intended reception
[IECOE]. In other words, this system can direct the main beam towards the pilot signal,
i.e. the desired user or the SOI, while suppressing the antenna pattern in the direction of
interferers or SNOI [BEL02a], as illustrated in Figure 2.9 (a). Moreover, the adaptive
array system can customize an appropriate radiation pattern for each user.
In conclusion, this system continuously differentiates between the desired signals,
multipath and interfering signals, and calculates their DOA by utilizing
sophisticated signal processing algorithms. The technique constantly updates its
transmitting approach based on changes in both the desired and interfering signal
locations. It ensures that signal links are maximized by tracking and providing users with
main lobes and interferers with nulls, because the beams are neither macro-sectors nor
predefined patterns as in the SBA system. A block diagram of this system is illustrated in
Figure 2.9 (b).
The RF signals from the M antennas are coherently down-converted to an
Intermediate Frequency (IF), low enough for quality digitization of the
signals. The beamformer then processes the digital outputs for each channel; the
processing includes amplitude adjustments as well as phase adjustments, which result in
beam and null steering. So the ABA can be viewed as a spatial filter in which the pass and
stop bands are created along the directions of the signal and interference respectively
[BHO01]; for more about this scenario, please refer to [BHO01], [SEU99].
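The combined effect of amplitude and phase adjustments can be illustrated with a fixed null-steering example. The sketch assumes a four-element uniform linear array with half-wavelength spacing, known directions for one SOI and one SNOI, and weights obtained by projecting the SNOI direction out of the SOI steering vector. This projection is a textbook construction chosen here for brevity; it is not one of the adaptive algorithms evaluated in this thesis, and all names are illustrative.

```python
import cmath
import math


def steering_vector(num_elements, spacing_wavelengths, angle_deg):
    """Relative phasors seen by a uniform linear array for a plane wave from angle_deg."""
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * m * phase_step) for m in range(num_elements)]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def null_steering_weights(a_soi, a_snoi):
    """Weights with unit response towards the SOI and a null towards one SNOI."""
    # Remove from a_soi its component along a_snoi (orthogonal projection).
    coeff = inner(a_snoi, a_soi) / inner(a_snoi, a_snoi)
    w = [s - coeff * i for s, i in zip(a_soi, a_snoi)]
    # Normalise so that w^H a_soi equals 1.
    scale = inner(w, a_soi)
    return [x / scale.conjugate() for x in w]


def response(weights, spacing_wavelengths, angle_deg):
    """Magnitude of the array output w^H a(angle) for a unit-amplitude plane wave."""
    a = steering_vector(len(weights), spacing_wavelengths, angle_deg)
    return abs(inner(weights, a))


M, D = 4, 0.5  # assumed: 4 elements, half-wavelength spacing
w = null_steering_weights(steering_vector(M, D, 20.0), steering_vector(M, D, -40.0))
print(round(response(w, D, 20.0), 3), round(response(w, D, -40.0), 3))
```

The resulting pattern passes the SOI with unit gain and places a null on the SNOI, which is the spatial-filter behaviour described above. An adaptive system has to obtain such weights from the received data and keep updating them as users and interferers move.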
2.6.2 Advantages 
Although both systems attempt to increase gain with respect to the location of the
users, only the adaptive system [BHO01], [IECOE], [SEU99] is able to
contribute optimal gain while simultaneously identifying, tracking, and minimizing
interfering signals. This can be seen in Figure 2.9 (a), where only the main lobe is
directed towards the user while a null is directed at each interferer.
Figure 2.10 illustrates the beam patterns that each system might choose, in a 
scenario involving one desired signal and two co-channel interferers. The switched-
beam system is depicted on the left, while the adaptive system is on the right. Both 
systems direct their major lobe in the general direction of the SOI, but the adaptive 
system chooses a more-accurate placement, providing greater signal enhancement. 
Figure 2.10: Comparison between the SBA and the ABA: (a) comparison of
lobes, (b) comparison for different environments, (c) comparison for
user tracking.
Figure 2.10 (a) shows the beamforming lobes and nulls that SBA (left) and ABA
(right) systems might choose for identical user signals and co-channel interference,
while Figure 2.10 (b) shows the coverage patterns of ABA and SBA for two different
environments, low noise (left) and high noise (right).
Figure 2.10 (c) describes an ABA supporting two users on the same conventional
channel simultaneously in the same cell. The dark beam pattern is used to communicate
with the user on the left, while the light beam is used to communicate with the user on the
right. The right-hand part of the figure illustrates how the beam patterns of the ABA
shown on the left are updated to accommodate the motion of the users.
Similarly, the interfering signaIs arrive at places of lower gain outside the main 
lobe, but, again, in the adaptive system, the interfering signaIs receive maximum 
suppression [CHROO]. Figure 2.10 (b) shows a comparison, in terms of relative coverage 
area, of conventional sectorized, SBA, and ABA. 
In the presence of a low level of interference both types of SAs provide 
significant gains over the conventional .sectored systems. When a high level interference 
is present the interference rejection capability of the ABA provides significantly more 
coverage that either the conventional or SBA system [BEL02a], [IECOE], [CHROO]. 
Another important advantage of the next generation of adaptive-antenna systems 
is its capability to "create" spectrum. Because of the accurate tracking and robust 
interference-rejection capabilities, multiple users can share the same conventional 
channel within the same cell. System capacity increases through inter-cell frequency 
reuse patterns, as well as intra-cell frequency reuse. Figure 2.10 (c) shows how this 
technology can be used to accommodate two users on the same conventional channel, 
simultaneously, in the same cell. 
The dark beam pattern is used to communicate with the user on the left, while the 
light beam pattern is used to communicate with the user on the right. It should be noted 
that each pattern has nulls in the direction of the other user. As the users move, the 
beam patterns are constantly updated to maintain these nulls. 
It is this ability to continuously modify the beam pattern with respect to both 
lobes and nulls that separates the ABA approach from the SBA type. As interfering 
signals move throughout the sector, the switched beam pattern is not altered, because it 
only responds to movements of the signal of interest. 
Unlike the SBA approach, the ABA system is able to continue to distinguish 
between the signal and the interferer, and to allow them to get substantially closer than 
in the SBA system, while maintaining enhanced SINR levels. The most sophisticated 
adaptive SA will hand off any two co-channel users, whether they are inter-cell or 
intra-cell, before they get too close and begin to interfere with each other [CHR00]. 
2.6.3 Drawbacks 
Studies and evaluations of both systems in terms of performance and complexity 
have been made [SEU99] and showed the performance advantage of the ABA over the 
SBA. However, despite all those advantages, ABA systems have drawbacks: unlike the 
SBA, which is fairly easy to integrate into the base station architecture, the ABA is 
difficult and complex to integrate [BRO01], [BEL02a], [SEU99]. 
It should be noted that many references treat the ABA as the SA, although both 
approaches are smart and the ABA is much smarter than the SBA. 
This is what drew us to this challenge: to find an algorithm that is very 
powerful and to demonstrate its low complexity. 
2.7 General benefits 
An understanding of the signal propagation environment and channel characteristics 
is essential to the efficient use of a transmission medium. In recent years, signal 
propagation problems have been associated with conventional antennas, and 
interference and all its consequences are the major limiting factors in the performance of 
wireless communications. Thus the introduction of SAs is considered to have the 
potential of leading to a large increase in the performance of wireless communication 
systems. The benefits of SA systems, which apply to the ABA more than to the SBA, can 
be summarized and regrouped as follows: 
➢ Increased range and coverage: [BOU00], [BEL02a], [IECOE], [ALE04], 
[CHR00], [JIN05] SA systems provide enhanced coverage through range 
extension, focusing the energy sent out into the cell, hole filling, and better 
building penetration. Given the same transmitter power at the base station and 
subscriber unit, an SA can increase range by increasing the gain of the base station. 
Lower power requirements also enable a greater battery life and smaller/lighter 
handset size. 
➢ Increased capacity: [BEL02a], [ALE04], [SHI97], [CHR00], [JIN05] SA systems 
can also improve system capacity by mitigating interference and allowing 
transmission of different data streams from different antennas. They can be used 
to allow the subscriber and the base station to operate at the same range as a 
conventional system, but at lower power. This may permit FDMA and TDMA 
systems to be re-channelized to reuse frequency channels more frequently than 
conventional systems using fixed antennas, since the SINR is much greater when 
SAs are used: with an SA, the SINR is maximized even when interfering waves 
arrive together with the desired signal [OHI02]. In CDMA systems, if SAs are 
used to allow subscribers to transmit less power for each link, the Multiple 
Access Interference (MAI) is reduced, which increases the number of 
simultaneous subscribers that can be supported in each cell. 
➢ Better security: [BEL02a] In a society that is becoming more dependent on 
conducting business and transmitting personal information, security is an 
important issue. The employment of SA systems diminishes the risk of 
connection tapping, since the intruder must be situated in the same direction as 
the user as seen from the base station. 
➢ Reduced interference and multipath fading: [BOU00], [BEL02a], [IECOE], 
[SHI97], [CHR00], [JIN05] In wireless communications, SA systems can be 
used to reduce multipath fading: in the transmitting mode by forming beams in 
certain directions and nulls in others, and in the receiving mode by knowing the 
directional location of the signal's source and utilizing interference cancellation, 
thereby canceling some of the delayed arrivals. Usually, in the transmitting 
mode, the array focuses energy in the required direction, which helps to reduce 
multipath reflections and the delay spread. In the receiving mode, however, the 
array compensates for multipath fading by adding the signals emanating 
from other clusters after compensating for delays, as well as by canceling signals 
emanating from directions other than that of the desired user. 
➢ Reduced expenses: [IECOE], [CHR00] The SA system is power efficient because 
it combines the inputs of multiple elements to optimize the available processing 
gain towards the user, which results in lower amplifier costs and power 
consumption. 
➢ Better services: [BEL02a] The use of SA systems gives the network access to 
spatial information about the users. This information can be used to assess the 
positions of the users much more precisely than in existing networks. 
This can be applied in services such as emergency calls and location-specific 
billing. 
2.8 Future challenges 
Although SA systems are favorable in many ways, they also have drawbacks, 
which include a more complex transceiver structure [BOU00], [BEL02a] than that of a 
traditional base station and a growing need for efficient algorithms for real-time 
optimization and signal tracking. Thus, SA base stations will no doubt be much more 
expensive than conventional base stations, and the advantages should always be 
weighed against the cost, which is a big challenge for engineers. That is why they 
should be used where they are truly needed. 
Concerning the future challenges of SA systems, many questions remain open: 
➢ What are the strategies for next generation wireless systems [ALE04]? 
➢ What are the trends and challenges of SAs [ALE04]? 
➢ What will be the characteristics of the SA in the future? 
➢ When will SAs be ready for the market [KAI05]? 
In conclusion, the next generation of SAs offers the capability to "create" 
spectrum, so that multiple users can share the same conventional channel [CHR00], the 
ability to reject all kinds of interference, and the ability to locate and track the desired 
users while maintaining a high SINR and placing nulls towards all interfering users. 
In general, next-generation wireless systems require signal processing 
techniques capable of operating in a wide variety of scenarios with respect to propagation, 
traffic, interference, user mobility, antenna configuration, radio access technology and 
Channel State Information (CSI) reliability. 
Yet, the biggest challenge for us has two parts. The first is to develop an 
efficient adaptive algorithm that gives the best performance in different scenarios, and 
the second is to implement it with very low complexity, always maintaining a high 
Performance Over Complexity Ratio (POCR). 
2.9 Summary 
A global overview of SA systems was presented and discussed. The reasons 
explaining the need for this technology were given, together with an analysis of the 
multiple access schemes, from FDMA to SDMA. A basic and systematic approach to 
the SA was given. The advantages and disadvantages of the main categories of SAs, 
the switched beam array (SBA) and the adaptive beam array (ABA), were presented and 
discussed. 
The next chapter presents the search for the adaptive algorithm that leads the SA 
system to give its best response, whether towards a desired user or an interferer, in any 
wireless communication environment. 
The best candidate will play the role of the brain in the human analogy, and the 
role of the Digital Signal Processor (DSP) in the SA analogy. This candidate will push 
the system to its best performance in any wireless environment. 
Chapter 3 
Performance Evaluation of Beamforming Techniques 
3.1 Introduction 
There is a growing need for improvements that increase coverage, capacity, 
quality [ROY97] and reliability in many wireless communication systems. Optimal 
exploitation of new dimensions through SA technology has the potential to meet these 
needs while providing operators the ability to offer new value-added services to their 
customers. SAs are recognized as a key technology in 3G networks. They offer a mixed 
service capacity gain of more than 100% and hence reduce to less than half the number 
of base stations required. 
An SA's ability to abate both multipath and CCI makes it one of the leading 
candidates [JIN05] and one of the most promising technologies for enabling high 
capacity wireless networks. Since SAs are more expensive than conventional base 
stations, they should be used where they are truly needed. All characteristics and 
advantages were discussed in Chapter 2. 
The most important contribution of an SA is its capability of changing its pattern 
in real time in order to track the user with its maximum gain while trying to minimize 
the interference by placing the nulls of its radiation pattern at the locations of the other 
users [PEN02]. 
Many questions come to mind. How is the SA able to do that? What stands 
behind these scenarios? How does the antenna steer its beam toward the user? Is there 
an engine behind that? 
In fact, there is a complex mathematical process behind that physical 
reaction. This process is called beamforming, which is referred to as the heart or the 
main core of the SA. We describe it as the backbone of the SA system. 
The beamforming is a process in which each user's signal is multiplied with 
complex weights that adjust the magnitude and the phase of the signal from each 
antenna. Hence the array forms a transmit beam in the desired direction and minimizes 
the output in other directions. 
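The effect of the complex weights can be illustrated with a minimal sketch. It assumes a uniform linear array with half-wavelength spacing and a far-field plane wave; neither is specified in the text, and the function names `steering_vector` and `beamformer_gain` are hypothetical.

```python
import cmath
import math

def steering_vector(theta_deg, num_elements, spacing_wavelengths=0.5):
    """Array response a(theta) of a uniform linear array (assumed geometry)."""
    phase_step = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase_step * m) for m in range(num_elements)]

def beamformer_gain(weights, theta_deg):
    """Magnitude of the array output w^H a(theta) for a unit plane wave."""
    a = steering_vector(theta_deg, len(weights))
    return abs(sum(w.conjugate() * x for w, x in zip(weights, a)))

# Steer a 4-element array towards 30 degrees by choosing w = a(30) / M.
M = 4
weights = [x / M for x in steering_vector(30.0, M)]
gain_desired = beamformer_gain(weights, 30.0)   # unit gain in the look direction
gain_other = beamformer_gain(weights, -20.0)    # lower gain in another direction
```

Each weight only rotates the phase of one antenna signal here; an adaptive beamformer would also change the magnitudes in order to place nulls.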
There are two types of beamforming, corresponding to the two types of SAs presented 
in Chapter 2. In the Switched-Beam-Array (SBA), the complex weights are selected 
from a library of weights that form beams in specific, predetermined directions; this 
process is called Switched Beamforming (SB). In the Adaptive-Beam-Array (ABA), 
the weights are computed and adaptively updated in real time, iteratively; this 
process is called Adaptive Beamforming (AB). 
Due to the numerous advantages of the AB over the SB, the SB is not discussed 
further in this work; for more about this process the reader is referred to 
[GOD04], [SEU99], and [BRO01]. 
For better understanding, consider an SA system with M antenna elements, whose 
signals are processed adaptively in order to exploit the spatial dimension of the mobile 
radio channel. Figure 3.1 illustrates an example of a narrowband beamformer, or Delay 
and Sum Beamformer. In order to accomplish the tracking in real time, such a system has 
to have information about the location of all users in the system relative to the array and 
adjust the weights to change the phase and magnitude. 
Figure 3.1: Delay and Sum Beamformer (array output $y(t)$) 
As a result, the nulls of the radiation pattern are placed towards the interference 
while the peak of the radiation pattern is positioned towards the user, which amplifies its 
signal prior to its arrival at the radio. 
In other words, ABA systems have the ability to null out the interfering signal 
(uncorrelated) and steer the main beam in the direction of the SOI [PEN02]. A detailed 
schematic model of the functionality of this system is illustrated in Figure 2.9. 
Based on the DOA of the signal computed by a DOA algorithm, which is beyond 
the scope of this work (for more on DOA algorithms, please refer to [GOD97b], 
[GOD04]), the AB algorithms implemented in the digital signal processor adapt the 
complex weights, leading the SA system to direct its main beam towards the SOI and 
nulls towards the SNOI. 
Since the successful application of SA technology depends not only on the 
knowledge of the propagation channel but also on the choice of the AB 
algorithm applied at the base station [SHI98], the SA concept has been extensively 
studied, and most of the research activities have been dedicated to algorithm 
development and to theoretical and simulation analysis. 
In addition, the choice of an SA system and algorithm today is highly 
dependent on the air interface and its parameters [BOU00]. Among the most critical 
parameters are the multiple access method, the type of duplexing, pilot availability, 
channel, modulation, diversity, and frame structure. 
Besides the compatibility with the air interface, the level of Integrated Circuit (IC) 
technology in DSPs, FPGAs and Application Specific Integrated Circuits (ASICs) can 
also be a limiting factor for the implementation of SA algorithms. 
For all these reasons, the main contribution of this thesis is the search for 
the best adaptive algorithm for beamforming, allowing the antenna to direct its main 
beam towards the SOI and nulls towards the interferer or the SNOI. 
3.2 Signal model 
Adaptive signal processing involves the manipulation of signals induced onto the 
elements of an array [CHR00]. AB techniques exist that can yield multiple, 
simultaneously available beams. The beams can be made to have high gain and low side 
lobes, or controlled beamwidth. 
The AB techniques dynamically adjust the array pattern to optimize some 
characteristics of the received signal. In beam scanning, a single main beam of an array 
is steered and the direction can be varied either continuously or in small discrete 
steps. The ABA system uses AB techniques in order to reject interfering signals having a 
DOA different from that of the desired user. 
A signal model used for array processing is presented in this section. For more 
extended and in-depth details, the reader is referred to [GOD97b], [JIN00], and 
[GOD04]. 
Consider a reverse link (uplink) from $K$ mobile stations to a base station with 
multiple antennas. Let the baseband traffic transmission (stream) from the $k$th mobile 
station be 
$$x_{(k)}(t) = \sum_{n=-\infty}^{\infty} A_{(k)}\, s_{(k)}[n]\, d_{(k)}(t - nT; n),$$ 
where $A_{(k)}$ is the amplitude of the traffic transmission; $T$ represents the symbol 
interval; $s_{(k)}[n] \in S$ the $n$th symbol ($S$ is the symbol alphabet; $S \triangleq \{\pm 1\}$ for the 
Binary Phase Shift Keying (BPSK) case and $S \triangleq \{\pm 1/\sqrt{2} \pm j/\sqrt{2}\}$ for the Quadrature 
Phase Shift Keying (QPSK) case); and $d_{(k)}(t;n)$ the spreading waveform at the $n$th 
symbol of the $k$th transmission. 
The spreading waveform is given by 
$$d_{(k)}(t;n) = \sum_{\ell=0}^{N-1} c_{(k)}[\ell;n]\, \varphi(t - \ell T_c),$$ 
where $T_c$ is the chip interval; $N = T/T_c$ the processing gain; $c_{(k)}[\ell;n]$ the $\ell$th 
element of the spreading sequence for the $n$th symbol of the $k$th transmission; and 
$\varphi(t)$ the chip waveform of unit energy. 
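The spreading and despreading operations can be sketched at the chip level as follows. This is a simplified illustration under assumptions that go beyond the text: a rectangular chip waveform sampled once per chip, a single user, no noise, and the same hypothetical length-8 sequence reused for every symbol (the model above allows the sequence to change with $n$).

```python
def spread(symbols, code):
    """Map each symbol to N chips: chip l of symbol n is s[n] * c[l]."""
    return [s * c for s in symbols for c in code]

def despread(chips, code):
    """Correlate each block of N chips with the code and normalise by N."""
    n_chips = len(code)
    return [
        sum(chips[i + l] * code[l] for l in range(n_chips)) / n_chips
        for i in range(0, len(chips), n_chips)
    ]

code = [1, -1, 1, 1, -1, 1, -1, -1]   # hypothetical spreading sequence, N = 8
symbols = [1, -1, -1, 1]              # BPSK symbols from S = {+1, -1}
chips = spread(symbols, code)         # 4 symbols x 8 chips
recovered = despread(chips, code)     # symbols recovered by correlation
```

The processing gain $N$ appears as the number of chips accumulated per symbol in `despread`.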
Assume that the base station is equipped with $L$ antenna elements and 
$a(\theta)$ denotes the array response vector to the signal with angle of arrival (AOA) $\theta$. If the 
angle spread of the nominal AOA $\theta$, denoted by $\Delta\theta$, is sufficiently small, the channel 
vector $h_{(k)}$ can be written as $h \approx \alpha\, a(\theta)$. Notice that the spatial signature vector can be 
characterized by a single nominal AOA $\theta$ and a single attenuation factor $\alpha$ under the 
local scatter model. 
In general $\alpha$ can be considered as a random variable. For Rayleigh fading 
channels $\alpha$ would be a complex Gaussian random variable. For $M$ multipaths, there 
would be $M$ spatial signature vectors, one corresponding to each path. 
In agreement with the current 3G CDMA transmission schemes, the pilot channel is 
orthogonally multiplexed with the traffic channel. 
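First, let $M = 1$. Then the baseband received signal vector from $x_{(k)}(t)$ can 
be written as 
$$r_{(k)}(t - \tau_{(k)}) = \sum_{n=-\infty}^{\infty} A_{(k)}\, h_{(k)}\, s_{(k)}[n]\, d_{(k)}(t - \tau_{(k)} - nT; n), \tag{3.1}$$ 
where $\tau_{(k)}$ is the transmission delay. The pilot transmission is received at the same 
time. The received $k$th pilot transmission is given by 
$$p_{(k)}(t - \tau_{(k)}) = \sum_{n=-\infty}^{\infty} \bar{A}_{(k)}\, h_{(k)}\, \bar{s}_{(k)}[n]\, \bar{d}_{(k)}(t - \tau_{(k)} - nT; n), \tag{3.2}$$ 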
where $\bar{A}_{(k)}$ is the amplitude of the pilot transmission; $\bar{s}_{(k)}[n]$ the pilot symbol; and 
$\bar{d}_{(k)}(t;n)$ the spreading waveform of the $k$th pilot transmission. For simplicity, the 
pilot symbol $\bar{s}_{(k)}[n]$ is assumed to be fixed, i.e., $\bar{s}_{(k)}[n] = \bar{s}$, and the spreading waveform 
is $\bar{d}_{(k)}(t;n) = \sum_{\ell=0}^{N-1} \bar{c}_{(k)}[\ell;n]\, \varphi(t - \ell T_c)$, where $\bar{c}_{(k)}[\ell;n]$ is the spreading sequence for the 
$k$th pilot transmission. In general, the amplitude of the traffic transmission is greater 
than that of the pilot transmission, i.e., $A_{(k)} > \bar{A}_{(k)}$. The received signal vector at a base 
station is written as 
$$y(t) = \sum_{k=1}^{K} \left( r_{(k)}(t - \tau_{(k)}) + p_{(k)}(t - \tau_{(k)}) \right) + n(t), \tag{3.3}$$ 
where $n(t)$ is the zero-mean background noise with $E[n(t)\, n^H(\tau)] = (N_0/2)\, I\, \delta(t - \tau)$. 
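Let $y[\ell]$ denote the sample vector sequence of $y(t)$ from analog-to-digital converters 
following the bank of the $L$ matched filters for the chip waveform $\varphi(t)$. 
The sampled matched filter outputs can be written as 
$y[\ell] = \left. \int_{-\infty}^{\infty} y(t - \tau)\, \varphi(\tau)\, d\tau \right|_{t = \ell T_c + \tau_{(1)}}$. Assuming despreaders for the first transmissions, 
the despread signals for the desired traffic and pilot transmissions are written as [JIN00] 
$$r_{(1)}[n] = A_{(1)}\, h_{(1)}\, s_{(1)}[n] + i_{r,(1)}[n] \tag{3.4}$$ 
and 
$$p_{(1)}[n] = \bar{A}_{(1)}\, h_{(1)}\, \bar{s} + i_{p,(1)}[n] \tag{3.5}$$ 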
where $i_{r,(1)}[n]$ and $i_{p,(1)}[n]$ are Interference Plus Noise (IPN) vector sequences in the 
despread traffic and pilot transmissions, respectively [JIN00]. For multipath fading 
vector channels of $M$ paths, the received signal vector is written as 
$$y(t) = \sum_{k=1}^{K} \sum_{m=1}^{M} \left( r_{(k),m}(t - \tau_{(k),m}) + p_{(k),m}(t - \tau_{(k),m}) \right) + n(t), \tag{3.6}$$ 
where $r_{(k),m}(t)$ [$p_{(k),m}(t)$] is the received signal vector due to the $k$th traffic (resp., pilot) 
transmission in the $m$th path of the channel and $\tau_{(k),m}$ is the corresponding transmission 
delay. To deal with the multipath components independently after despreading, we 
assume that the multipath components are resolvable, i.e., $\min |\tau_{(k),m} - \tau_{(k),m'}| > T_c$ for 
$m \neq m'$ and all $k$. 
In each path, the channel vector is given by $h_{(k),m}$. Assume that the propagation 
delays of the desired transmission are known. For the $m$th multipath component, the 
sampled matched filter outputs are given by $y_m[\ell] = \left. \int_{-\infty}^{\infty} y(t - \tau)\, \varphi(\tau)\, d\tau \right|_{t = \ell T_c + \tau_{(1),m}}$. 
Through the despreading operation, we can get despread traffic and pilot vectors of each 
path. The despread traffic of the $m$th path is written as 
$$r_{(1),m}[n] = A_{(1)}\, h_{(1),m}\, s_{(1)}[n] + i_{r,(1),m}[n] \tag{3.7}$$ 
where $i_{r,(1),m}[n]$ is the IPN vector in the despread traffic vector sequence of the $m$th 
path. In addition, the despread pilot vector of the $m$th path is written as 
$$p_{(1),m}[n] = \bar{A}_{(1)}\, h_{(1),m}\, \bar{s} + i_{p,(1),m}[n] \tag{3.8}$$ 
where $i_{p,(1),m}[n]$ denotes the IPN in the despread pilot vector sequence. The subscript (1) 
denoting the first transmission will be omitted hereafter for notational convenience 
except when explicitly needed. Furthermore, without loss of generality, we assume that 
rewritten as 
3.9 
respectively. 
In [NAG96], the two-dimensional Rake (2D-Rake) combiner, wherein the 
beamforming weights are computed according to the approach presented in [SUA93], 
has been proposed for a base station equipped with multiple antennas. Recently, the 
approaches in [SUA93] have been modified to account for the presence of the pilot 
channel, wherein the processing is, instead, performed on the received signal after 
despreading [JIN00]. 
Moreover, in the presence of a training sequence, an interesting pool of adaptive 
algorithms can be used to compute the beamforming weights for the 2D-Rake receiver 
[HAY96]. The 2D-Rake combiner can provide the estimate of the symbol $s_1[n]$ using the 
$M$ vectors of size $L \times 1$ given in (3.7). 
In addition, if the IPN vectors, stacked as 
$i_r[n] \triangleq [\, i_{r,1}^T[n] \;\; i_{r,2}^T[n] \;\cdots\; i_{r,M}^T[n] \,]^T$, are temporally uncorrelated and possibly 
spatially correlated [JIN00], then the maximum SINR (MSINR) symbol estimate 
(decision variable) is given by [JIN00] 
$$\hat{s}[n] = \sum_{m=1}^{M} w_m^H\, r_m[n] \tag{3.10}$$ 
where 
$$w_m = [\, w_{m,1} \;\; w_{m,2} \;\cdots\; w_{m,L} \,]^T = R_{i,r,m}^{-1}\, h_m \tag{3.11}$$ 
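The combining rule of the 2D-Rake receiver can be sketched for a toy case. The sketch assumes spatially white IPN with identity covariance, so that the weight vector of each path reduces to its channel vector, and a noise-free observation; the channel values and the helper names `inner` and `rake_2d_combine` are hypothetical.

```python
def inner(w, r):
    """Hermitian inner product w^H r."""
    return sum(wi.conjugate() * ri for wi, ri in zip(w, r))

def rake_2d_combine(weights, despread_vectors):
    """Decision variable: sum over the M paths of w_m^H r_m[n]."""
    return sum(inner(w_m, r_m) for w_m, r_m in zip(weights, despread_vectors))

# Two paths (M = 2) and two antennas (L = 2); hypothetical channel vectors.
h = [[1 + 0j, 1j], [0.5 + 0j, -0.5 + 0j]]
symbol = -1                                       # transmitted BPSK symbol
r = [[hi * symbol for hi in h_m] for h_m in h]    # noise-free despread vectors
w = h                                             # w_m = R^{-1} h_m with R = I
s_hat = rake_2d_combine(w, r)
decision = 1 if s_hat.real >= 0 else -1
```

With these weights the decision variable equals the symbol scaled by the total channel energy over both paths, which is what the spatial and temporal combining is meant to collect.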
In the following subsections, we will review the approaches to compute the 
weight vectors $w_m$ that are used to combine the despread signals from the $L$ antennas 
[NAG96], [SUA93] and [JIN00]. 
It is worth mentioning that in the code-filtering approach [NAG96], [SUA93], 
both the received signal vector $y_m[\ell]$ and the despread signal vector $r_m[n]$ are used. 
Since the received signal vector $y_m[\ell]$ is used, computation at the chip rate is 
required. On the other hand, the approaches in [JIN00] only use the despread signal 
vector $r_m[n]$ to compute the weight vectors. This implies that the approaches in [JIN00] 
have a computational advantage over the code-filtering approach. 
3.3 Non adaptive pilot-channel aided beamforming techniques 
3.3.1 Direct Approach (MRC) 
Pilot channel-aided channel vector estimation is possible because the pilot 
symbol in the pilot channel is known. A simple method is the averaging technique to 
estimate the channel vector. The estimate of the channel vector associated with the m th 
path is given by 
3.12 
where $J$ is the number of sample vectors for averaging. Alternatively, the estimate of 
the vector $h_m$ in (3.12) can be recursively computed using the Wiener LMS algorithm for 
time-varying channels [LIN01]. Wiener LMS considers and efficiently incorporates the 
channel dynamics into the filtering/smoothing/prediction process. It was successfully 
used to estimate and track time-varying channels in WCDMA (Wideband CDMA) and 
CDMA2000 systems [OUA05]. This approach is also called Maximum Ratio Combining 
(MRC), wherein the weight vector is made equal to the channel vector: 
$$w_m = \hat{h}_m \tag{3.13}$$ 
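A minimal sketch of the averaging estimate followed by MRC is given below. It assumes a unit pilot amplitude, so the exact normalization of (3.12) is not reproduced, together with a hypothetical two-antenna channel and white Gaussian IPN; `estimate_channel` and `mrc_combine` are illustrative names.

```python
import random

def estimate_channel(pilot_vectors, pilot_symbol):
    """Average J despread pilot vectors and divide out the known pilot symbol."""
    J = len(pilot_vectors)
    L = len(pilot_vectors[0])
    return [sum(p[l] for p in pilot_vectors) / (J * pilot_symbol) for l in range(L)]

def mrc_combine(h_hat, r):
    """MRC output w^H r with the weight vector w set equal to h_hat."""
    return sum(w.conjugate() * x for w, x in zip(h_hat, r))

rng = random.Random(0)                  # fixed seed for a repeatable sketch
h_true = [1 + 0.5j, -0.3 + 0.8j]        # hypothetical channel vector, L = 2
pilot_symbol = 1 + 0j                   # fixed, known pilot symbol

def noise():
    """One complex Gaussian IPN sample (assumed white)."""
    return complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1))

pilots = [[hl * pilot_symbol + noise() for hl in h_true] for _ in range(200)]
h_hat = estimate_channel(pilots, pilot_symbol)
traffic = [hl * (-1) for hl in h_true]  # noise-free traffic vector, symbol -1
z = mrc_combine(h_hat, traffic)         # real part carries the symbol sign
```

Averaging over $J$ samples reduces the variance of the estimate by a factor of $J$, which is why the pilot can be much weaker than the traffic channel.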
3.3.2 Direct Approach (DMI) 
In the eigen-decomposition approach, since the channel vector estimate is 
computed from the eigenvector, the power algorithm or steepest ascent algorithm can be 
used [GOL83], [OJA83]. 
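The power algorithm mentioned above can be sketched as follows. This is a generic power iteration, not necessarily the exact variant of [GOL83]; the rank-one-plus-noise test matrix and the function name `power_iteration` are assumptions made for illustration.

```python
def power_iteration(matrix, num_iters=100):
    """Dominant eigenvector and eigenvalue of a Hermitian matrix (power method)."""
    n = len(matrix)
    v = [1.0 + 0.0j] * n
    for _ in range(num_iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) ** 2 for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^H R v (v has unit norm at this point).
    Rv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i].conjugate() * Rv[i] for i in range(n)).real
    return v, eigenvalue

# Hypothetical covariance: R = h h^H + 0.1 I for a two-antenna channel h.
h = [1 + 0j, 1j]
R = [[h[i] * h[j].conjugate() + (0.1 if i == j else 0.0) for j in range(2)]
     for i in range(2)]
v, lam = power_iteration(R)
# |h^H v| / ||h|| is close to 1 when v is aligned with h up to a phase factor.
alignment = abs(sum(hi.conjugate() * vi for hi, vi in zip(h, v))) / (2 ** 0.5)
```

The eigenvector is recovered only up to a complex scalar, which is the ambiguity that the pilot symbol is used to resolve in the eigen-decomposition approach.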
In addition to the estimates of the channel vectors, estimates of the covariance 
matrices of the IPN vectors are required. Let us define the two covariance matrices as 
$R_{i,r,m} \triangleq E[\, i_{r,m}[n]\, i_{r,m}^H[n] \,]$ and $R_{i,p,m} \triangleq E[\, i_{p,m}[n]\, i_{p,m}^H[n] \,]$. It has also been observed that 
$R_{i,r,m} = R_{i,p,m} = R_{i,m}$ [JIN00]. The estimates of $R_{i,r,m}$ and $R_{i,p,m}$ are obtained by 
3.14 
The covariance matrices of the IPN vectors can be estimated as 
3.15 
The above estimates are not guaranteed to be positive definite. The weight vectors 
$w_m$ are now given by 
$$w_m = \hat{R}_{i,m}^{-1}\, \hat{h}_m \tag{3.16}$$ 
from the estimates in (3.13) and (3.15). Notice that the above approach to provide the 
weight vector is the Direct Matrix Inversion (DMI) approach in [MON80]. In addition, 
there is no scalar ambiguity with this approach. This algorithm is later redesigned 
(SD-DMI) to reduce its complexity for a hardware implementation. 
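The DMI weight computation can be sketched by solving the linear system $R_{i,m} w_m = h_m$ instead of forming the inverse explicitly. The solver below is plain Gaussian elimination with partial pivoting, and the covariance and channel values are hypothetical; a practical implementation would also guard against an ill-conditioned or non-positive-definite estimate, which this sketch does not.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0j] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def dmi_weights(R_i_hat, h_hat):
    """DMI weight vector w = R_i^{-1} h, obtained by solving R_i w = h."""
    return solve_linear(R_i_hat, h_hat)

R_i_hat = [[2 + 0j, 0.5j], [-0.5j, 2 + 0j]]   # hypothetical Hermitian IPN covariance
h_hat = [1 + 0j, 1j]                          # hypothetical channel estimate
w = dmi_weights(R_i_hat, h_hat)
```

Solving the system costs on the order of $L^3$ operations per weight vector, which is the complexity that a hardware-oriented redesign would aim to reduce.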
3.3.3 Eigen-Decomposition Approach 
In the presence of the pilot transmission, the eigen-decomposition approach in 
[MON80] is modified as follows. Using (13), the estimate of the channel vector $h_m$ is 
obtained by the eigen-decomposition [GOL83] of $R_{h,m} = \frac{1}{1-\eta}\,(R_{r,m} - R_{p,m})$ [JIN00]. 
Let $\hat{e}_{h,m}$ denote the eigenvector associated with the maximum eigenvalue, $\lambda_{h,m}$, of the 
matrix $R_{h,m}$. Therefore, $h_m$ can be estimated as 
3.17 
The estimate in (3.17) still has a phase ambiguity. Using the despread pilot 
transmission $p_m[n]$, the phase ambiguity can be resolved by 
vector hm can be written as 
3.18 
In order to compute the weight vectors $w_m$, the covariance matrices $R_{i,m}$ shall be 
estimated. From (3.14), the estimate of the covariance matrix $R_{i,m}$ is now given by 
$$\hat{R}_{i,m} = \frac{1}{1-\eta}\left( \hat{R}_{p,m} - \eta\, \hat{R}_{r,m} \right) \tag{3.19}$$ 
It is worth mentioning that the estimate $\hat{R}_{i,m}$ in (3.19) can be replaced by the estimate in 
(3.15), while $\hat{h}_m$ is given in (3.18). 
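As a consistency check on the two matrix combinations used in this subsection, the following short derivation may help. It rests on assumptions that are not stated in the surviving text: unit-power traffic symbols, a pilot-to-traffic power ratio $\eta \neq 1$, and the same IPN covariance $R_{i,m}$ in both the traffic and pilot branches.

```latex
\begin{align*}
R_{r,m} &= h_m h_m^H + R_{i,m}, \\
R_{p,m} &= \eta\, h_m h_m^H + R_{i,m}, \\
R_{r,m} - R_{p,m} &= (1-\eta)\, h_m h_m^H
  \;\Rightarrow\; R_{h,m} = \tfrac{1}{1-\eta}\,(R_{r,m} - R_{p,m}) = h_m h_m^H, \\
R_{p,m} - \eta\, R_{r,m} &= (1-\eta)\, R_{i,m}
  \;\Rightarrow\; R_{i,m} = \tfrac{1}{1-\eta}\,(R_{p,m} - \eta\, R_{r,m}).
\end{align*}
```

Under these assumptions $R_{h,m}$ has rank one, so its principal eigenvector is proportional to $h_m$, and the second line of implications matches the form of the covariance estimate.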
3.3.4 Code Filtering Approach 
The approach in [SUA93] operates on the received signal $y_m[\ell]$ and the despread 
signal $r_m[n]$. Note that since the pilot transmission is available, the scalar ambiguity can 
be resolved as well. The latter approach, referred to as "Chip-EVD", can provide the 
optimal weight vectors. 
Let $\hat{e}_{h,m}$ denote the eigenvector associated with the maximum eigenvalue, $\lambda_{h,m}$, 
of the generalized eigen-decomposition of the matrix pencil $(R_{y,m}, R_{r,m})$, so that 
3.20 
Note that the complexity burden of the Chip-EVD technique stems mainly from the 
computation of $R_{y,m}$. The estimate in (3.20) still has a phase ambiguity. Using the 
despread pilot transmission $p_m[n]$, the phase ambiguity can be resolved by letting 
$\phi_m = \angle\left( \sum_{n} p_m^H[n]\, \hat{e}_{h,m} \right) - \angle(\bar{s})$ so that the estimate of the vector $h_m$ can be written as 
3.21 
The performance of the Chip-EVD approach is found to be affected by the 
processing gain being used, instead of the PTR as in the EVD approach [NAG96], [SUA93]. 
As the processing gain approaches unity, the Chip-EVD approach suffers from 
performance degradation. 
3.4 Adaptive pilot-channel aided beamforming techniques 
The literature on applying adaptive algorithms to compute the weight vectors 
is abundant [HAY96]. Recently, an attractive LMS-based adaptive technique that 
incorporates partial information on the channel was suggested [WEI02]; this adaptive 
scheme falls under the variable-step-size LMS family. On the other hand, unlike 
LMS-based methods, the RLS technique, although more complex, is not sensitive to the 
eigenvalue spread of the covariance matrix of the received signal. 
Eigenvalue spread is more pronounced in near-far situations and time-varying 
channel conditions [CAI00]. Adaptive techniques can be applied using the pilot 
transmission as a training sequence. The weight vector is computed according to the 
following MMSE criterion 
3.22 
The processing is performed after despreading, which keeps the implementation 
complexity low. As the pilot power may be a few dB below the traffic transmission 
power, some adaptive algorithms (e.g., LMS and RLS) may 
suffer from the high interference level due to the traffic transmission. A two-stage 
approach [CAI00] recently applied to multiuser detection can be used. 
In [CAI00], in contrast to the decision-directed approach, which tends to lose 
track of the time-varying channel after deep fades, a two-stage strategy is adopted 
wherein no training data is needed. The first stage is implemented in a blind mode while the 
second stage uses the pre-detected symbols as a training sequence. In this work a 
two-stage approach is used. However, the blind stage is replaced by a non-blind 
adaptive scheme using the despread pilot transmission. 
The second stage uses the detected traffic transmission from the first stage 
to compute the weight vector of the second stage. At a low PTR, as low as -12 dB, the 
two-stage approach provides a performance improvement over the single-stage one. 
3.4.1 Least Mean Squares (LMS) 
The LMS algorithm is commonly used. It is a stochastic gradient 
algorithm, which consists of two basic processes: the first is filtering, which 
involves (a) computing the output of a transversal filter produced by a set of tap inputs, 
and (b) generating an estimation error by comparing this output to a desired response; 
the second is adaptation, which involves the automatic adjustment of the tap 
weights of the filter in accordance with the estimation error [HAY96]. For more about 
this algorithm the reader is referred to [GOD97b], [GOD04], [HAY96], [MOO99], and 
[SLO93]. 
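The LMS algorithm can be summarized as follows: 
1. The filter output is computed as 
$$y_m[n] = w_m^H[n]\, p_m[n] \tag{3.23}$$ 
2. The estimation error is computed as 
$$e_m[n] = s[n] - y_m[n] \tag{3.24}$$ 
3. The weight adaptation is computed as 
$$w_m[n+1] = w_m[n] + \mu\, p_m[n]\, e_m^*[n] \tag{3.25}$$ 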
where $H$ denotes the Hermitian (conjugate) transpose and $*$ denotes the complex 
conjugate; $p_m[n]$ is the input signal, $s[n]$ is the reference signal, $n$ is the iteration index, 
$L$ is the number of taps or receiving antennas, and $\mu$ is the adaptation step size, chosen 
between 0 and $2 / \text{L-Input\_power}$, where L-Input_power is the sum of the mean-square 
values of all the tap inputs in the filter: 
$$\text{L-Input\_power} = \sum_{k=0}^{L-1} E\left[\, |p_m(n-k)|^2 \,\right] \tag{3.26}$$ 
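The three LMS steps can be sketched in a few lines. The sketch assumes the usual complex LMS forms, with the error taken as the reference minus the filter output and the correction $\mu p_m[n] e^*[n]$ quoted in Section 3.4.2, and it uses a hypothetical two-antenna channel with a fixed pilot symbol and white Gaussian IPN.

```python
import random

def lms(inputs, reference, mu):
    """Complex LMS: y = w^H p, e = s - y, w <- w + mu * p * conj(e)."""
    L = len(inputs[0])
    w = [0j] * L
    errors = []
    for p, s in zip(inputs, reference):
        y = sum(wi.conjugate() * pi for wi, pi in zip(w, p))        # filter output
        e = s - y                                                   # estimation error
        w = [wi + mu * pi * e.conjugate() for wi, pi in zip(w, p)]  # weight update
        errors.append(abs(e))
    return w, errors

rng = random.Random(1)      # fixed seed for a repeatable sketch
h = [1 + 0j, 1j]            # hypothetical channel vector, L = 2
s_bar = 1 + 0j              # fixed pilot symbol used as the reference
inputs = [
    [hl * s_bar + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for hl in h]
    for _ in range(500)
]
reference = [s_bar] * 500
w, errors = lms(inputs, reference, mu=0.05)
tail_error = sum(errors[-50:]) / 50     # average error magnitude after convergence
```

The tap-input power is roughly 2 in this example, so the step size 0.05 sits well inside the bound of 2 divided by the tap-input power.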
3.4.2 Normalized Least Mean Squares (NLMS) 
The NLMS algorithm is a companion of the ordinary LMS algorithm [HAY96]. It 
was introduced to resolve a problem concerning the input vector in the LMS 
algorithm, which can be summarized as follows: in the standard LMS 
algorithm, the correction $\mu\, p_m[n]\, e_m^*[n]$ applied to the tap-weight vector $w_m[n]$ at 
iteration $n+1$ is directly proportional to the tap-input vector $p_m[n]$. 
Therefore, when $p_m[n]$ is large, the LMS algorithm experiences a gradient noise 
amplification problem. In the NLMS algorithm, the correction applied to the tap-weight 
vector $w_m[n]$ at iteration $n+1$ is "normalized" with respect to the squared Euclidean 
norm of the tap-input vector $p_m[n]$ at iteration $n$, hence the term "normalized". For more 
about this algorithm, the reader is referred to [GOD04], [HAY96], [MOO99], and [SLO93]. 
The NLMS algorithm can be summarized as follows: 
1. The filter output is computed as 
$$y_m[n] = w_m^H[n]\, p_m[n] \tag{3.27}$$ 
2. The estimation error is computed as 
$$e_m[n] = s[n] - y_m[n] \tag{3.28}$$ 
3. The weight adaptation is computed as 
$$w_m[n+1] = w_m[n] + \frac{\mu}{a + \| p_m[n] \|^2}\, p_m[n]\, e_m^*[n] \tag{3.29}$$ 
where $H$ denotes the Hermitian (conjugate) transpose and $*$ denotes the complex 
conjugate; $p_m[n]$ is the input signal, $s[n]$ is the reference signal, $n$ is the iteration index, 
$L$ is the number of taps or receiving antennas, $\mu$ is the adaptation step size, and $a$ is a 
small constant. 
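The benefit of the normalization can be illustrated with a small sketch in which the same step size is used for two input scalings that differ by a factor of 100. The update assumes the standard NLMS form with a small regularizing constant `a`; the noise-free input and the channel values are hypothetical.

```python
def nlms(inputs, reference, mu, a=1e-6):
    """Complex NLMS: the LMS correction is divided by a + ||p||^2."""
    L = len(inputs[0])
    w = [0j] * L
    for p, s in zip(inputs, reference):
        y = sum(wi.conjugate() * pi for wi, pi in zip(w, p))   # filter output
        e = s - y                                              # estimation error
        norm_sq = sum(abs(pi) ** 2 for pi in p)                # squared Euclidean norm
        step = mu / (a + norm_sq)
        w = [wi + step * pi * e.conjugate() for wi, pi in zip(w, p)]
    return w

h = [1 + 0j, 1j]            # hypothetical channel vector, L = 2
results = {}
for scale in (1.0, 100.0):
    p = [scale * hl for hl in h]            # noise-free input at two power levels
    w = nlms([p] * 40, [1 + 0j] * 40, mu=0.5)
    y_final = sum(wi.conjugate() * pi for wi, pi in zip(w, p))
    results[scale] = abs(y_final - 1)       # residual error after 40 iterations
```

With plain LMS and the same step size, the higher-power input would violate the stability bound, whereas the normalized update converges at the same rate in both cases.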
3.4.3 Noise Constrained Least Mean Squares (NC-LMS) 
NC-LMS is a type of variable step-size LMS (VS-LMS) algorithm where the step-size 
rule arises naturally from the constraints. VS-LMS algorithms are a popular modification 
of LMS that reconciles the requirements of a large step size to maximize the convergence 
rate and a small step size to minimize misadjustment. 
What is particular about this algorithm is that it exploits knowledge of the channel noise 
variance for identification and tracking of Finite Impulse Response (FIR) channels. 
Knowledge of the noise variance can thus be useful in selecting search directions and/or 
the step size in an adaptive algorithm. This algorithm has not yet been applied to 
beamforming in the aforementioned papers. 
We chose this algorithm to evaluate its complexity and performance in comparison with 
other adaptive algorithms. For more about this algorithm the reader is referred to 
[YON01]. 
The NC-LMS algorithm can be summarized as follows: 
1. The step size adaptation is computed as 
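$$\alpha_m[n'] = \alpha\left(1 + \gamma\,\lambda_m[n']\right) \tag{3.30}$$
$$\lambda_m[n'+1] = \lambda_m[n'] + \beta\left(\tfrac{1}{2}\left(|e_m[n']|^2 - \sigma_m^2\right) - \lambda_m[n']\right) \tag{3.31}$$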
2. The estimation error is computed as 
3.32 
3. The weight adaptation is computed as 
$$\mathbf{w}_m[n'+1] = \mathbf{w}_m[n'] + \alpha_m[n']\,e_m[n']\,\mathbf{p}_m[n'] \tag{3.33}$$
where $s[n']$ is the reference pilot signal, $\mathbf{p}_m[n']$ is the input signal, $\sigma_m^2$ is the average IPN variance across the array (the expected squared norm of the IPN vector divided by $L$), $e_m[n']$ is the estimation error and $\mathbf{w}_m[n']$ is the weight vector. $\alpha$, $\beta$ and $\gamma$ are three constants; $\alpha_m[n']$ and $\lambda_m[n']$ are used for step-size adaptation.
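The NC-LMS recursion can be sketched as follows. Two points are assumptions rather than statements of the original method: the error is taken as $s[n'] - \mathbf{w}_m^H[n']\mathbf{p}_m[n']$, and the error is conjugated in the weight update so that the update is consistent with that output convention for complex data (for real-valued data the conjugate has no effect). The function name `nc_lms_step`, the constants and the test scenario are hypothetical.

```python
import math
import random


def nc_lms_step(w, lam, p, s, sigma2, alpha=0.02, beta=0.05, gamma=5.0):
    """One NC-LMS iteration; returns (weights, lambda, error)."""
    step = alpha * (1.0 + gamma * lam)                         # step size, as in (3.30)
    e = s - sum(wi.conjugate() * pi for wi, pi in zip(w, p))   # assumed error form
    # Conjugated error is an assumption, chosen to match the output w^H p.
    w_new = [wi + step * pi * e.conjugate() for wi, pi in zip(w, p)]
    lam_new = lam + beta * (0.5 * (abs(e) ** 2 - sigma2) - lam)  # as in (3.31)
    return w_new, lam_new, e


# Hypothetical check: identify 4 taps from data with known noise variance.
rng = random.Random(2)
sigma2 = 0.01
w_true = [0.5 + 0.5j, -0.3j, 0.2 + 0j, 0.1 - 0.4j]
w, lam = [0j] * 4, 0.0
half = math.sqrt(0.5)
for _ in range(3000):
    p = [complex(rng.gauss(0, half), rng.gauss(0, half)) for _ in range(4)]
    noise = math.sqrt(sigma2 / 2) * complex(rng.gauss(0, 1), rng.gauss(0, 1))
    s = sum(wt.conjugate() * pi for wt, pi in zip(w_true, p)) + noise
    w, lam, e = nc_lms_step(w, lam, p, s, sigma2)
nc_lms_error = max(abs(wi - wt) for wi, wt in zip(w, w_true))
```

While the squared error exceeds the known noise variance, `lam` grows and enlarges the step; once the error falls to the noise floor, `lam` decays toward zero and the step returns to about `alpha`.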
3.4.4 Recursive Least Squares (RLS) 
The RLS algorithm is an extension of the method of least squares in which the estimate of the tap-weight vector at iteration $n$ is updated upon the arrival of new data. This algorithm may be viewed as a special case of the Kalman filter [HAY96]. For more about this algorithm, the reader is referred to [GOD97b], [GOD04], and [HAY96].
The RLS algorithm can be summarized as follows: 
1. Initialization 
3.34 
2. The gain vector is computed as 
3.35 
3. The estimation error is computed as 
$$\alpha[n] = d[n] - \mathbf{w}^H[n-1]\,\mathbf{u}[n] \tag{3.36}$$
4. The weight adaptation is computed as 
$$\mathbf{w}[n] = \mathbf{w}[n-1] + \mathbf{k}[n]\,\alpha^*[n] \tag{3.37}$$
5. The Correlation matrix adaptation is computed 
3.38 
Where $\delta$ is a small positive number, $\mathbf{I}$ is the $L \times L$ identity matrix, and $L$ is the number of taps. $\lambda$ is the forgetting factor (between 0 and 1), $\mathbf{u}[n]$ is the input signal, $^H$ denotes the Hermitian (conjugate) transpose, $^*$ denotes the complex conjugate, $d[n]$ is the reference signal and $n$ is the iteration index.
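A compact sketch of one RLS iteration is given below. It assumes the conventional exponentially weighted recursion found in the adaptive filtering literature: a matrix initialized to $\delta^{-1}\mathbf{I}$, a gain vector, the a priori error of (3.36) and the weight update of (3.37). The matrix name `P`, the function name `rls_update` and the test data are assumptions made for this example.

```python
import random


def rls_update(w, P, u, d, lam=0.99):
    """One RLS iteration; returns (updated weights, updated matrix P)."""
    L = len(w)
    Pu = [sum(P[i][j] * u[j] for j in range(L)) for i in range(L)]   # P u
    denom = lam + sum(u[i].conjugate() * Pu[i] for i in range(L))    # lam + u^H P u
    k = [x / denom for x in Pu]                                      # gain vector
    alpha = d - sum(w[i].conjugate() * u[i] for i in range(L))       # error, as in (3.36)
    w_new = [w[i] + k[i] * alpha.conjugate() for i in range(L)]      # update, as in (3.37)
    uP = [sum(u[i].conjugate() * P[i][j] for i in range(L)) for j in range(L)]  # u^H P
    P_new = [[(P[i][j] - k[i] * uP[j]) / lam for j in range(L)] for i in range(L)]
    return w_new, P_new


# Hypothetical check: identify a fixed 3-tap vector from noiseless data.
rng = random.Random(1)
L, delta = 3, 0.01
w_true = [0.4 - 0.2j, -0.7 + 0j, 0.1 + 0.3j]
w = [0j] * L
P = [[(1 / delta if i == j else 0.0) + 0j for j in range(L)] for i in range(L)]
for _ in range(200):
    u = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(L)]
    d = sum(wt.conjugate() * ui for wt, ui in zip(w_true, u))
    w, P = rls_update(w, P, u, d)
rls_error = max(abs(wi - wt) for wi, wt in zip(w, w_true))
```

The per-iteration cost is dominated by the matrix-vector products and the rank-one update of `P`, which is why RLS is heavier than the LMS-type updates above.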
3.4.5 Set Membership Identification (SMI)
SMI theory is extended to the more general problem of linear-in-parameters filtering by defining a set-membership specification, as opposed to a bounded noise assumption. It is one of the Optimal Bounding Ellipsoids (OBE) algorithms.
It is known as the set-membership NLMS algorithm; it has an optimized adaptive step size and it has not previously been applied to beamforming. For more about this algorithm, please refer to [GOL98].
The SMI algorithm can be summarized as follows: 
1. The estimation error is computed as
$$\delta[n] = d[n] - \mathbf{a}^H[n-1]\,\mathbf{x}[n] \tag{3.39}$$
2. The weight adaptation is computed as
$$\mathbf{a}[n] = \mathbf{a}[n-1] + \alpha[n]\,\frac{\delta^*[n]\,\mathbf{x}[n]}{\mathbf{x}^H[n]\,\mathbf{x}[n]} \tag{3.40}$$
3. The step-size adaptation is computed as
$$\alpha[n] = \begin{cases} 1 - \dfrac{\gamma}{|\delta[n]|}, & \text{if } |\delta[n]| > \gamma \\ 0, & \text{otherwise} \end{cases} \tag{3.41}$$
Where $d[n]$ is the reference signal (the desired response), $\mathbf{x}[n]$ is the input signal, $\mathbf{a}[n]$ is the weight vector, $^H$ denotes the Hermitian (conjugate) transpose, $^*$ denotes the complex conjugate, $n$ is the iteration index, $\alpha[n]$ is the function that adapts the step size, and $\gamma$ is the only variable parameter in this algorithm (between 0 and 1).
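The set-membership update can be illustrated with a short sketch, assuming the usual set-membership NLMS reading of the three steps: the weights change only when the error magnitude exceeds $\gamma$, and the step is chosen so that the a posteriori error magnitude equals $\gamma$. The function name `sm_nlms_update` and the two-tap example are hypothetical.

```python
def sm_nlms_update(a, x, d, gamma):
    """One set-membership NLMS iteration; returns (weights, a priori error)."""
    delta = d - sum(ai.conjugate() * xi for ai, xi in zip(a, x))      # estimation error
    alpha = 1.0 - gamma / abs(delta) if abs(delta) > gamma else 0.0   # step size
    norm_sq = sum(abs(xi) ** 2 for xi in x)                           # x^H x
    a_new = [ai + alpha * delta.conjugate() * xi / norm_sq
             for ai, xi in zip(a, x)]                                 # weight update
    return a_new, delta


# Hypothetical two-tap example with error bound gamma = 0.5.
x = [1 + 0j, 1j]
a1, err1 = sm_nlms_update([0j, 0j], x, d=2 + 0j, gamma=0.5)
post = (2 + 0j) - sum(ai.conjugate() * xi for ai, xi in zip(a1, x))  # a posteriori error
a2, err2 = sm_nlms_update(a1, x, d=2 + 0j, gamma=0.5)  # error already within the bound
```

In this example the first call moves the weights until the remaining error magnitude equals the bound, and the second call leaves them unchanged, so updates occur only for samples that violate the bound.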
3.5 Simulation results
In order to compare the performance of the beamforming approaches described above, consider the following system parameters and channel conditions. A base station is equipped with a Uniform Linear Array (ULA). The ULA consists of $L$ antennas spaced half a wavelength apart. The angles of arrival are uniformly distributed within $[0, \theta_{max}]$, where $\theta_{max} = 120^\circ$. The array response vector $\mathbf{a}(\theta)$ is modeled as
$$\mathbf{a}(\theta) = \left[1, \exp(j\pi\sin\theta), \ldots, \exp(j(m-1)\pi\sin\theta), \ldots, \exp(j(L-1)\pi\sin\theta)\right]^T.$$
These settings, along with good antenna spacing, result in $E[\mathbf{a}(\theta)\mathbf{a}^H(\theta)] \approx \mathbf{I}_{L\times L}$ [SUA93]. Two orthogonal codes are used for the traffic and pilot transmissions. Considering a processing gain of 8, the orthogonal codes for the traffic and the pilot
transmissions are {1, -1, 1, -1, 1, -1, 1, -1} and {1, 1, 1, 1, 1, 1, 1, 1}, respectively. The different mobile stations are identified by their assigned unique random spreading (scrambling) sequences. Therefore, the spreading sequence for each mobile can be viewed as the product of the orthogonal codes and the random spreading sequences. For the unit-energy pulse-shaping filter, a root raised cosine with a roll-off factor of 0.22 is used.
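The array response vector and the two channelization codes described above can be checked with a short sketch. The function name `ula_response` and the 30 degree test angle are assumptions made for illustration.

```python
import cmath
import math


def ula_response(theta_deg, L):
    """Response of a half-wavelength-spaced ULA with L elements."""
    theta = math.radians(theta_deg)
    # Element m (m = 1..L) has phase (m - 1) * pi * sin(theta).
    return [cmath.exp(1j * m * math.pi * math.sin(theta)) for m in range(L)]


traffic_code = [1, -1, 1, -1, 1, -1, 1, -1]
pilot_code = [1] * 8

a = ula_response(30.0, 4)                                            # hypothetical angle
code_correlation = sum(t * p for t, p in zip(traffic_code, pilot_code))
```

Every entry of `a` has unit magnitude, and the zero correlation between the two codes is what allows the pilot and traffic transmissions to be separated after despreading with a processing gain of 8.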
The channel is modeled as a three-path time-varying Rayleigh fading process. The relative path delays are [0, 1.1, 3.19] μs, with respective relative powers of [0, -3, -9] dB. The absolute delays are randomly generated so that the maximum delay does not exceed the round-trip delay within the base station coverage range of a 5 km cell radius. The SINR for the additive white Gaussian noise is set to 10 dB.
Other system parameters include a carrier frequency of 2 GHz, a chip rate of 3.84 Mcps, and a mobile speed of 60 km/h, which is important for generating the time-varying fading. These figures are representative of WCDMA system parameters. Unless otherwise stated, the number of antennas is 4, the PTR is set to -6 dB and the number of users is 5.
The transmission is packetized as follows: a normalized slot size of 2560 chips is considered as a basic packet, and frames consisting of 15 slots are then constructed (38400 chips). For the different approaches, we consider that the number of samples available for the parameter update is two slots' worth of samples (2 × 2560/8 = 2 × 320 samples) after despreading. Below we analyze the effect of the PTR, the array size and the number of users on the performance of the different approaches.
3.5.1 PTR Effect
The effect of PTR is analyzed by sweeping the PTR over the range of [-12, 0] dB. It has been found that the EVD approach exhibits performance degradation as the PTR increases [8]. Figure 3.2 below reveals the same effect. As for the DMI approach, a BER degradation in the low PTR region is observed.
It has been shown that the degradation could be higher due to channel estimation error when despread pilot transmission averaging is used [JIN00]. We observed, through simulations, that averaging over two slots provides a good channel vector estimate. With the Chip-EVD approach, the performance is only as good as that of the MRC approach. This relatively poor, though expected, performance is not affected by the PTR level; rather, it is due to the low processing gain being used (here 8). Notice that the adaptive approaches based on RLS perform equally well at high PTR. Interestingly, at low PTR levels the two-stage RLS (TS-RLS) technique provides a relatively better BER performance.
[Plot: BER versus PTR (dB), from -12 to 0 dB; curves for MRC, DMI, EVD, Chip-EVD, NC-LMS, RLS and TS-RLS.]
Figure 3.2: PTR effect.
3.5.2 Number of Antenna Effect 
To analyze the effect of the number of antennas on the performance of the different approaches, we set the PTR level to -6 dB and keep the number of users at 5. The results in Figure 3.3 reveal that all the beamforming approaches benefit from the increase in the number of antennas. It has been shown in [JIN00] that the performance of the DMI and EVD approaches does not improve beyond about 4 antennas.
[Plot: BER versus number of antennas L, from 2 to 8; curves for MRC, DMI, EVD, Chip-EVD, NC-LMS, RLS and Two-Stage-RLS.]
Figure 3.3: Number of antenna effect.
This observation is attributed to errors in the estimation of the channel vector and the IPN covariance matrix. Using an averaging window of 2 slots in our simulation helps to overcome such problems. The relative performances remain the same as in the previous subsection. It is worth noting the poor performance of the EVD as compared to the other schemes. This is attributed to the relatively high PTR level (-6 dB) being used. The Chip-EVD approach suffers from performance degradation, possibly due to the relatively low processing gain. Once again, the two-stage RLS technique seems to outperform all other methods. However, no substantial improvement is recorded for either the single-stage RLS technique or the DMI.
3.5.3 Number of Users Effect 
The last experiment involves the effect of MAI, with the PTR level set to -6 dB and an array size of 4 antennas. The results (see Figure 3.4) show that the adaptive two-stage approach performs better than all the remaining approaches at different system loads (capacity, number of users). Compared to the EVD and Chip-EVD approaches, the two-stage RLS technique provides a relatively substantial capacity increase. However, DMI and even MRC are found to compete with almost similar performance.
[Plot: BER versus number of users K, from 3 to 7; curves for MRC, DMI, EVD, Chip-EVD, NC-LMS, RLS and TS-RLS.]
Figure 3.4: Number of users effect.
3.6 Summary 
In beamforming, both the amplitude and phase of each antenna element are controlled. Combined amplitude and phase control can be used to adjust side lobe levels and steer nulls better than can be achieved by phase control alone. As a result, the beamformer for a radio transmitter applies complex weights to the transmit signal, shifting the phase and setting the amplitude for each element of the antenna array.
Beamforming techniques, both adaptive and non-adaptive, have been discussed in this chapter. Simulation results have been presented based on a DS-CDMA platform for different parameters. The purpose of this evaluation study is to propose an algorithm that is favorable for hardware implementation.
Thus, the overall target is to achieve a performance/complexity tradeoff; however, once an efficient algorithm such as DMI has been identified, the challenge of designing the corresponding architecture begins. As mentioned before, an algorithm, however efficient and accurate, remains of little value unless someone succeeds in implementing it at a moderate cost.
Therefore, it is important to look for a performance/complexity tradeoff when searching for the best candidate for SAs. It is more important still to raise the design level by implementing a complex algorithm and, most importantly, to redesign that complex algorithm so that a low-complexity, low-power architecture can be revealed.
These hardware challenges are taken up in the next chapter, where different implementation methods and techniques for different beamforming algorithms are presented and discussed.
Chapter 4 
FPGA Implementation of Beamforming Techniques 
4.1 Introduction 
According to [GAM05], wireless infrastructure revenue continues to experience phenomenal growth, increasing from approximately $27 billion in 2003 to an estimated $35 billion in 2004. Industry analysts are predicting that 2004 will be the peak revenue year, as forecasts show the revenue figure dropping back to $27 billion in 2005, eventually settling into the $10-$15 billion range by the end of the decade. This revenue decline is driven both by lower prices and by a drop in base station deployments, from nearly 500,000 stations in 2004 to fewer than 200,000 in 2010.
To begin addressing this challenge, wireless BS designs are shifting from ASIC technology to readily available off-the-shelf components such as FPGAs, which are increasingly being used for signal processing applications. This shift is driven both by declining annual base station unit volumes and by FPGA technology improvements that increase processing power and enable a much lower cost per channel [GAM05].
FPGAs are increasingly being used for signal processing applications. They provide the necessary performance and flexibility to tackle many of today's most challenging DSP applications, from MIMO digital communications systems to H.264 encoding to high-definition broadcast systems. Within such systems, FPGAs are ideally suited to high-performance tasks traditionally served by ASICs and ASSPs (Application-Specific Standard Products) [TAH05].
The migration to FPGAs is not just an attempt to reduce costs and create a common platform to achieve commoditization; it is also being driven by time-to-market pressures, along with the need to make in-field upgrades of BS deployments [GAM05].
Hence, among the major challenges are those associated with the implementation of future wireless communication systems (e.g., a beamforming receiver). The design of low-complexity beamforming algorithms and corresponding VLSI (Very Large Scale Integration) architectures (or the redesign of existing algorithms so that the massive parallelism and bit-level arithmetic present in these algorithms can be revealed and efficiently implemented in a VLSI architecture) constitutes an interesting example.
In this chapter we present an FPGA implementation of three beamforming algorithms: Maximum Ratio Combining (MRC), Noise Constrained Least Mean Squares (NC-LMS) and Direct Matrix Inversion (DMI). The first two methods were implemented using rapid prototyping techniques, while the third was implemented using hand-coded VHDL (VHSIC Hardware Description Language). Fixed-point data representation and arithmetic are investigated to evaluate the effect of finite word length. The targeted FPGA devices are the Virtex-II, Virtex-II Pro and Virtex-4 families from Xilinx.
4.2 Implementation of Beamforming algorithms 
Few past works have been reported on the hardware implementation of beamforming algorithms. In [POW93], a reduced sampling rate beamforming technique for the reception of acoustic data transmitted through water was implemented on a DSP from Texas Instruments. In [NUT02], the authors present an FPGA implementation using Xilinx devices, together with calibration methodologies, for a digital beamforming system consisting of eight channels and operating in the L band (1.8 to 2 GHz). A comparison of efficient broadband beamforming architectures is presented in [WEI02]; however, no comparison of hardware resources is reported.
A low-power ASIC implementation of a 2 Mbps antenna-rake combiner supporting both Maximum Ratio Combining (MRC) and a Least Mean Square (LMS) filter is presented in [TAR05], where a 107 MHz clock frequency and a power consumption of 550 mW at the highest rate are achieved. In [CES05], a throughput of 1.7 MSPS (megasamples per second) was achieved in an implementation of matrix inversion on a Virtex-4 XC4VSX55 for beamforming, using the RLS approximation with QR decomposition of the input data matrix.
4.3 FPGA implementation of MRC and NC-LMS
4.3.1 Complexity analysis
MRC is chosen because it has the lowest hardware complexity among the evaluated algorithms. It is considered as a reference for most of the beamforming algorithms.
NC-LMS is chosen because it represents an intermediate complexity between MRC and DMI or RLS. It is an adaptive algorithm that has not yet been applied to beamforming in the aforementioned papers.
We chose to evaluate this algorithm because (i) it belongs to the LMS family and (ii) its complexity can be compared with that of MRC.
The number of arithmetic operations per iteration for both algorithms is given in Table 4.1.
Table 4.1: Number of arithmetic operations per iteration, per antenna and per user, for MRC and NC-LMS

            Adders   Subtractors   Delays   Multipliers
MRC            4          2           2          8
NC-LMS         8          7          10         19

From the required number of arithmetic operations per iteration, we can see that the NC-LMS is about 2.5 times more arithmetically complex than the MRC. This arithmetic complexity directly reflects the hardware complexity and the amount of resources (silicon) required to implement the algorithm, and later on it plays a decisive role in choosing the corresponding FPGA.
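The ratio quoted above can be checked directly from the counts in Table 4.1. The short sketch below is only an arithmetic illustration; the dictionary layout and variable names are assumptions.

```python
# Operation counts per iteration, per antenna and per user (Table 4.1).
ops = {
    "MRC":    {"adders": 4, "subtractors": 2, "delays": 2,  "multipliers": 8},
    "NC-LMS": {"adders": 8, "subtractors": 7, "delays": 10, "multipliers": 19},
}

totals = {name: sum(counts.values()) for name, counts in ops.items()}
total_ratio = totals["NC-LMS"] / totals["MRC"]                             # all blocks
mult_ratio = ops["NC-LMS"]["multipliers"] / ops["MRC"]["multipliers"]      # multipliers only
```

Counting every block gives 44 versus 16 operations (a ratio of 2.75), while the multipliers alone give 19 versus 8 (about 2.4), so "about 2.5 times" is a fair summary of either measure.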
4.3.2 Proposed architectures 
Based on the equations presented in Chapter 3, (3.12, 3.13) for the MRC and (3.30-3.33) for the NC-LMS, the proposed architectures are illustrated in Figure 4.1, where each hexagon represents a complex arithmetic block.
Figure 4.1: Proposed implementation of MRC (a) and NC-LMS (b).
The number of arithmetic blocks in the architectures is derived from the number of arithmetic operations shown in Table 4.1; in addition, all arithmetic operation blocks are complex-valued.
4.3.3 Implementation Technique
The implementations are carried out in a straightforward manner, using a rapid prototyping methodology with MATLAB®-Simulink® tools targeting Xilinx devices. After evaluating the algorithm with MATLAB, we based the implementation on Simulink® tools using System Generator, targeting Xilinx devices.
Simulink® provides a graphical environment for creating and modeling dynamical systems. System Generator consists of a Simulink® library called the Xilinx Blockset, and software to translate a Simulink® model into a hardware realization of the model.
System Generator maps system parameters defined in Simulink (e.g., as mask variables in Xilinx Blockset blocks) into entities and architectures, ports, signals, and attributes in a hardware realization. In addition, System Generator automatically produces command files for FPGA synthesis, HDL simulation, and implementation tools, so that the user can work entirely in graphical environments in going from system specification to hardware realization. For more about this tool, the reader is referred to [XIL05].
Another important aspect of our design is balancing computation with I/O (Input/Output). Since a special-purpose system typically receives data and outputs results through an attached host, I/O considerations influence overall system performance. The proposed implementations have very good concurrency and good communication by using fast components. Besides, the degree of concurrency is determined by the underlying algorithm, whether MRC or NC-LMS.
The architectures of MRC and NC-LMS are mounted into Simulink blocks (Figures 4.2 and 4.3, respectively).
Figure 4.2: MRC mounted into Xilinx Blocks. 
Both architectures have been synthesized using real data generated from a DS-CDMA platform. The data has been converted to digital form using the Gateway In and Gateway Out blocks, which provide an interface to the Xilinx Blockset in Simulink. Xilinx Gateway blocks handle the type conversions, since MATLAB uses double-precision floating-point and the Xilinx portion of the design uses fixed-point precision.
They play the role of Analog-to-Digital (A/D) or Digital-to-Analog (D/A) converters. The Xilinx Gateway In block represents an input port into the FPGA, while the Gateway Out block is an output port from the FPGA.
Figure 4.3: NC-LMS mounted into Xilinx Blocks. 
It should be noted that we have a latency of 15 cycles and 10 cycles for NC-LMS and MRC, respectively. The achieved clock frequency in both approaches is acceptable. For the MRC we have a minimum period of 10 ns and a maximum frequency of 95 MHz, and for the NC-LMS the minimum period and the maximum frequency are 30 ns and 33 MHz, respectively.
4.3.4 Hardware resources 
The hardware resource estimates include the numbers of slices, lookup tables (LUTs), memory blocks (BRAMs), embedded multipliers, I/O blocks (IOBs), flip-flops (FFs), and tristate buffers (TBUFs). These estimates make it easy to determine how design choices affect hardware requirements. The amount of FPGA hardware used is directly related to the data width, so it is best, for reasons of cost, circuit speed, and power dissipation, to request only the precision required for a particular application. Table 4.2 and Table 4.3 show the hardware resources for one and four antennas, respectively.
Table 4.2: Hardware resources for one antenna

                   (32,16)   (24,16)   (18,10)   (16,12)
Maximum Ratio Combining (MRC)
Slices               1006       758       348       310
FFs                  1328      1184       468       416
BRAMs                   0         0         0         0
LUTs                 1752      1272       360       320
IOBs                  128       144        72        64
Embedded Mults         32        32         8         8
TBUFs                   0         0         0         0
Noise Constrained Least Mean Squares (NC-LMS)
Slices               2529      2019      1191       890
FFs                  3138      2346      1242      1104
BRAMs                   0         0         0         0
LUTs                 4540      3534      1445       971
IOBs                  128       144        72        64
Embedded Mults         60        60        15        15
TBUFs                   0         0         0         0

The notation (32, 24)-bit indicates that the fixed-point arithmetic operations represent a 32-bit signed two's complement value with 24 fractional bits. It should be noted that, for a given bit representation, e.g. (32, 24)-bit, the amount of hardware for the NC-LMS is about 2.5 times the amount of hardware for MRC, which shows the real complexity of the NC-LMS.
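To make the (word length, fractional bits) notation concrete, the sketch below quantizes a real value to a signed two's complement format. It assumes rounding to the nearest code and saturation on overflow; the quantization and overflow settings actually used in the design are not stated here, and the function name `to_fixed` is hypothetical.

```python
def to_fixed(value, word_bits, frac_bits):
    """Quantize value to a signed (word_bits, frac_bits) two's complement format."""
    scale = 1 << frac_bits                    # weight of one unit = 2**-frac_bits
    lowest = -(1 << (word_bits - 1))          # most negative integer code
    highest = (1 << (word_bits - 1)) - 1      # most positive integer code
    code = int(round(value * scale))          # rounding is an assumption
    code = max(lowest, min(highest, code))    # saturation is an assumption
    return code / scale


q_16_12 = to_fixed(0.3, 16, 12)       # 12 fractional bits
q_32_24 = to_fixed(0.3, 32, 24)       # 24 fractional bits
saturated = to_fixed(100.0, 16, 12)   # outside the representable range
```

A (16, 12) format covers roughly -8 to +8 in steps of 2^-12, whereas a (32, 24) format covers roughly -128 to +128 in steps of 2^-24; the wider format costs more slices and multipliers, as the tables show.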
Table 4.3: Hardware resources for four antennas

                   (32,16)   (24,16)   (18,10)   (16,12)
Maximum Ratio Combining (MRC)
Slices               4024      3032      1392      1240
FFs                  5312      4736      1872      1664
BRAMs                   0         0         0         0
LUTs                 7008      5088      1440      1280
IOBs                  512       576       288       256
Embedded Mults        128       128        32        32
TBUFs                   0         0         0         0
Noise Constrained Least Mean Squares (NC-LMS)
Slices              10116      8076      4764      3560
FFs                 12552      9384      4968      4416
BRAMs                   0         0         0         0
LUTs                18160     14136      5780      3884
IOBs                  512       576       288       256
Embedded Mults        240       240        60        60
TBUFs                   0         0         0         0

Another comparison between MRC and NC-LMS concerns the percentage of slices and multipliers used on a given FPGA. For this comparison, we chose three FPGAs: the Virtex-II Pro XC2VP30 and XC2VP100, and the Virtex-4 XC4VSX55. The results are shown in Table 4.4.
Table 4.4: Percentage of slices (%S) and multipliers (%M) for four antennas.

Device   Metric   (32,16)   (24,16)   (18,10)   (16,12)
Maximum Ratio Combining (MRC)
VP30     %S          29.4      22.1      10.1       9.0
         %M           N/A       N/A        47        47
VP100    %S           9.1       6.8       3.1       2.8
         %M          57.6      57.6      14.4      14.4
VSX55    %S          16.4      12.3       5.7       5.0
         %M            50        50      12.5      12.5
Noise Constrained Least Mean Squares (NC-LMS)
VP30     %S          73.9      58.9      34.7      26.0
         %M           N/A       N/A      88.2      88.2
VP100    %S          22.9      18.3      10.8       8.1
         %M           N/A       N/A        27        27
VSX55    %S         41.16      32.8      19.3      14.5
         %M          93.8      93.8      23.4      23.4

For a given bit representation, in terms of the percentage of slices, the MRC barely scratches the surface of what the three chosen FPGAs can hold; the NC-LMS also occupies a modest percentage, although a larger one than MRC for the same word length. In terms of the percentage of multipliers, the NC-LMS needs almost twice as many multipliers as MRC on the three chosen FPGAs (entries marked N/A indicate an overflow) and about three times as many slices.
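The slice percentages can be reproduced from the four-antenna slice counts of Table 4.3. The device capacities below are not given in the text; they are assumed values recalled from the vendor data sheets and should be verified before reuse. The names `DEVICE_SLICES` and `slice_usage_percent` are likewise hypothetical.

```python
# Assumed total slice counts per device (not stated in the text; verify
# against the vendor data sheets before relying on them).
DEVICE_SLICES = {"XC2VP30": 13696, "XC2VP100": 44096, "XC4VSX55": 24576}

# Slices used by the four-antenna (32,16) designs, taken from Table 4.3.
SLICES_USED = {"MRC": 4024, "NC-LMS": 10116}


def slice_usage_percent(design, device):
    """Percentage of the device's slices occupied by the design."""
    return 100.0 * SLICES_USED[design] / DEVICE_SLICES[device]


mrc_vp30 = round(slice_usage_percent("MRC", "XC2VP30"), 1)
nclms_vp30 = round(slice_usage_percent("NC-LMS", "XC2VP30"), 1)
```

Under these assumed capacities the (32,16) column of the %S rows is reproduced, which suggests the percentages are simply used slices divided by available slices.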
4.3.5 Quantization study
In the quantization effect study and evaluation (see Figure 4.4), the performance is evaluated based on the loss in dB, using the following equation:
$$\text{Loss} = 20\log_{10}\left(\frac{1}{KM}\sum_{k=1}^{K}\sum_{m=1}^{M} E\left[\frac{\mathbf{w}_{(k),m}^H\,\hat{\mathbf{w}}_{(k),m}}{\|\mathbf{w}_{(k),m}\|\,\|\hat{\mathbf{w}}_{(k),m}\|}\right]\right)\ \text{dB} \tag{4.1}$$
where $\mathbf{w}_{(k),m}$ and $\hat{\mathbf{w}}_{(k),m}$ are the resulting weight vectors in floating point and fixed point, respectively.
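The loss metric can be sketched as an average normalized correlation between floating-point and fixed-point weight vectors. Two assumptions are made here: the magnitude of the inner product is taken so that the logarithm is real, and the expectation is replaced by a plain average over the supplied (user, path) pairs. The function name `weight_loss_db` is hypothetical.

```python
import math


def weight_loss_db(float_weights, fixed_weights):
    """Average normalized correlation of paired weight vectors, in dB."""
    total = 0.0
    for w, w_hat in zip(float_weights, fixed_weights):
        inner = sum(a.conjugate() * b for a, b in zip(w, w_hat))   # w^H w_hat
        norm_w = math.sqrt(sum(abs(a) ** 2 for a in w))
        norm_w_hat = math.sqrt(sum(abs(b) ** 2 for b in w_hat))
        total += abs(inner) / (norm_w * norm_w_hat)                # magnitude assumed
    return 20.0 * math.log10(total / len(float_weights))


same = weight_loss_db([[1 + 1j, 2 - 1j]], [[1 + 1j, 2 - 1j]])   # identical vectors
skewed = weight_loss_db([[1 + 0j, 0j]], [[1 + 0j, 1 + 0j]])     # vectors 45 degrees apart
```

Identical vectors give 0 dB, and any mismatch introduced by quantization pushes the value below zero, which is why the reported losses are negative.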
[Plot: horizontal axis Users; legend (32, 24), (32, 16), (24, 16).]
Figure 4.4: Quantization study results for MRC and NC-LMS using N=8, K=5 users, PTR=0 dB, a mobile speed of 60 km/h, a carrier frequency of 2 GHz, a chip rate of 1.25 Mchip/s, and Simulink blocks.
It should be noted that both algorithms operate well for a (32, 24)-bit representation, and both show good performance for a (32, 16)-bit representation. For an (18, 12)-bit representation, MRC shows poor performance (Loss = -0.7 dB) while the NC-LMS performs better (-0.45 dB).
4.4 FPGA design and implementation of DMI 
4.4.1 Redesign of DMI 
The DMI technique in subsection 3.3.2 requires an inherent matrix inverse operation (3.16), which is generally considered an expensive arithmetic operation that requires a long word length for fixed-point data representation and arithmetic. Besides, the time-varying nature of the wireless channel requires performing such an operation, if possible, at the symbol rate.
To overcome such problems, we redesigned the DMI technique using the steepest descent method (SD-DMI) to (i) render the inherent matrix inversion problem
computationally feasible through an iterative method and to (ii) ensure good tracking 
properties for time varying channels. 
A VLSI architecture targeted on FPGAs is presented. The goal is to maximize 
the number of users on a chip (or device in case of FPGA). Maximizing the number of 
users makes it possible to increase the capacity of the base station equipped with 
multiple antennas. 
In [HOQ05], a study of the maximum number of users on a device is reported, based on an FPGA implementation of a Multiuser Detector (MUD); however, no work has been done to maximize the number of users on a chip for beamforming or SAs.
Consider a window of width J samples. The value of the window's width will depend on the time-varying channel: as the channel variations become faster, the window width becomes smaller. Applying the gradient descent technique to the DMI technique (SD-DMI) will require the following operations:
1. The estimate of the channel vector hm [ n] and the channel covariance matrix 
Rh [ n ] associated with the m th path are recursively computed as 
4.2 
4.3 
2. The covariance matrices $\mathbf{R}_{r,m}$ and $\mathbf{R}_{p,m}$ can, similarly, be computed as
4.4
and
4.5
3. The covariance matrices of the IPN vectors can be estimated as 
4.6 
4. Using gradient descent techniques, the weight vectors $\mathbf{w}_m[n]$ are now given by
$$\mathbf{e}_m[n] = \mathbf{R}_{I,m}[n]\,\mathbf{w}_m[n-1] - \hat{\mathbf{h}}_m[n] \tag{4.7}$$
$$\mathbf{w}_m[n] = \mathbf{w}_m[n-1] - \mu\,\mathbf{e}_m[n] \tag{4.8}$$
5. An estimate of the required symbol $s[n]$ will be given by
4.9
4.4.2 Complexity analysis
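The steepest-descent update of equations (4.7)-(4.8) can be sketched as an iteration that drives the gradient $\mathbf{R}_{I,m}\mathbf{w}_m - \hat{\mathbf{h}}_m$ toward zero without forming a matrix inverse. For illustration, the covariance matrix and channel estimate are held fixed, whereas in the receiver they are refreshed at every sample. The function name `sd_dmi_update`, the 2 x 2 matrix and the step size are hypothetical.

```python
def sd_dmi_update(w, R, h, mu):
    """One steepest-descent step: e = R w - h, then w <- w - mu * e."""
    L = len(w)
    e = [sum(R[i][j] * w[j] for j in range(L)) - h[i] for i in range(L)]  # gradient
    return [w[i] - mu * e[i] for i in range(L)], e                        # new weights


# Hypothetical Hermitian positive-definite covariance and channel estimate.
R = [[2 + 0j, 0.5j],
     [-0.5j, 1 + 0j]]
h = [1 + 0j, 1j]

w = [0j, 0j]
for _ in range(200):
    w, e = sd_dmi_update(w, R, h, mu=0.5)
residual = max(abs(x) for x in e)
```

The iteration converges to the solution of $\mathbf{R}\mathbf{w} = \mathbf{h}$ provided the step size is below 2 divided by the largest eigenvalue of the matrix; each step needs only one matrix-vector product, which is where the quadratic operation count of SD-DMI comes from.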
We can briefly conclude from the simulation results that SD-DMI represents an attractive approach. Compared to the two-stage RLS, one can easily notice that the redesigned DMI approach, SD-DMI, offers relatively lower computational complexity (see Table 4.5).
Moreover, its recursive nature renders it attractive for fast time-varying channels. Besides, it is worth mentioning that the RLS algorithm presents potential divergence problems when implemented using fixed-point data representation and arithmetic.
Unless efficient techniques such as square QR decomposition are used, long word lengths are recommended. For comparison, Table 4.5 depicts the number of real additions and multiplications per single-pass weight update (one pilot sample) required per user for L antennas.
Table 4.5: Arithmetic complexity of the SD-DMI (proposed) versus RLS, per antenna and per user.

Method             RLS                             SD-DMI
Additions          $O(4L^3 + 12L^2 + 12L - 4)$     $O(22L^2 + 6L)$
Multiplications    $O(4L^3 + 8L^2 + 12L)$          $O(17L^2 + 4L)$

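To give a feel for the expressions in Table 4.5, the sketch below evaluates them for a given number of antennas. It assumes the leading RLS term is read as $4L^3$; the function names are hypothetical.

```python
def rls_ops(L):
    """Real operations per weight update for RLS (leading term assumed 4L^3)."""
    return {
        "additions": 4 * L**3 + 12 * L**2 + 12 * L - 4,
        "multiplications": 4 * L**3 + 8 * L**2 + 12 * L,
    }


def sd_dmi_ops(L):
    """Real operations per weight update for SD-DMI."""
    return {
        "additions": 22 * L**2 + 6 * L,
        "multiplications": 17 * L**2 + 4 * L,
    }


rls_4 = rls_ops(4)
sd_4 = sd_dmi_ops(4)
```

For L = 4 this gives 492 additions and 432 multiplications for RLS against 376 and 288 for SD-DMI, and the gap widens with L because the RLS expressions grow cubically under this reading.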
4.4.3 Quantization study 
Prior to any implementation, the effects of fixed-point representation and arithmetic should be studied. Toward this end, this section is devoted to fixed-point simulation, wherein it is found through trial and error that at least 13 bits are required. It is interesting, in light of the signal model (3.1)-(3.33), to develop a closed-form formula to first calculate the amount of precision $\beta$ (in bits) required by the A/D converter, as follows.
4.10 
where $K$ is the total number of users, and $N_P$ and $N_T$ are the pilot and traffic spreading factors. The pilot-to-traffic power ratio is denoted by $\eta$, and the channel attenuations $h_m(t)$ for the $m$th path are complex Gaussian random variables with variance $\sigma_{h,m}^2$. The SNR is denoted by $\gamma$ (in dB).
The factor 4 accounts for the fact that, with a probability of 0.99, the received signal sample will be within the range. $\Delta$ bits are added for additional precision, along with one bit for the sign. Equation (4.10) gives a precision in the range of 8-12 bits for almost all feasible combinations of the parameters stated above, which is possible with current A/D converters. For illustration, consider $N_P = 256$, $N_T = 16$ (for a 64 kb/s data rate), $\eta = 0.1$, $M = 3$, $\sigma_{h,m}^2 \in \{0, -3, -9\}$ dB, $\gamma = 12$ dB and $\Delta = 1$; then $\beta \approx 12$ bits.
Through simulations, it is observed that a word length of 13 bits or higher is required to maintain BER performance close to that of floating-point arithmetic (see Figure 4.5). For the FPGA implementation, we consider a word length of 16 bits for fixed-point data representation and arithmetic.
[Plot: BER versus Eb/No (dB), from 2 to 12 dB, for several fixed-point word lengths, including 8 bits.]
Figure 4.5: Fixed-point results of the SD-DMI technique (4.2-4.9). N = 16 (processing gain), K = 10 users, PTR = -6 dB.
4.4.4 Proposed architecture 
To achieve real-time implementation, either DSP processors or VLSI 
architectures could be applied. However, as real-time implementation is concemed, 
System On Chip (SOC) architectures offers more parallelism, more compact size and 
lower power consumption than general DSP processors [GU006]. 
Considering the hardware implementation of matrix inversions few works have 
been reported. In [JEN85] a matrix inversion algorithm has been implemented in a 
limited-precision adaptive antenna array processor where the matrix inversion was 
performed in a Multibus processor (Marinco APB3024M), using digitized baseband 
samples from the antennas. The actual device accumulator used a 32-bit mantissa, and 
for this reason the LU decomposition of matrix inversion had been selected. 
A high-speed implementation of two matrix inversion algorithms, for general and
symmetric matrices in orthogonal systolic architectures, was presented in
[PAP88]. A direct (Cholesky decomposition) and an iterative (Newton's iteration) matrix
inversion method were implemented using Altera's APEX FPGA in [YLI04].
In [ECH05] a scalable pipelined complex-valued matrix inversion architecture
that performs a QR factorization and a triangular matrix inversion was proposed
and implemented on a Xilinx FPGA; the hardware implementation can be
used as a core processor in a real-time smart antenna system and in a wide variety of
implementations of beamforming and MIMO algorithms as well.
In [SAL05] three fixed-point matrix inversion implementations based on Cholesky
decomposition are presented. An FPGA implementation of matrix
inversion using the QRD-RLS algorithm is presented in [KAR05], where a throughput of
0.13 M updates per second was achieved on a Virtex4 FPGA running at 115 MHz.
The architecture must respect real-time constraints; therefore, an
efficient implementation is essential to reduce the critical path delay, power and area of
the wireless receiver. To explore the proposed architecture, we break the algorithm down
into the tasks listed in Table 4.6.
Table 4.6: Time steps derived from the SD-DMI scheduling of equations (4.2-4.9)
STEP  #EQ.  BLOCK  OPERATION            LATENCY  STEP LATENCY
1     (22)  b1     h_m[n]               L        L+4
      (25)  b2     p_m[n] p_m^H[n]      L+4
      (24)  b2     r_m[n] r_m^H[n]      L+4
2     (25)  b3     R_p,m[n]             L        L+4
      (24)  b3     R_r,m[n]             L
      (23)  b2     R_h[n]               L+4
3     (26)  b4     R_I,m[n]             L        L
4     (27)  b5     R_I,m[n] w_m[n-1]    2L+3     2L+4
      (28)  b6     w_m[n]               1
5     (29)  b7     s[n]                 L+4      L+4
Total latency: 6L+16
Throughput: (2L+4)Fclk
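The total latency in Table 4.6 can be checked by adding the five step latencies. The short sketch below does this for any number of antennas L and converts the cycle count to time for a given clock period. The function names and the `clock_period_ns` parameter are illustrative choices, not part of the thesis.

```python
def step_latencies(L):
    """Step latencies in clock cycles, as read from Table 4.6."""
    return {1: L + 4, 2: L + 4, 3: L, 4: 2 * L + 4, 5: L + 4}


def total_latency(L):
    """Sum of the five step latencies; equals 6L + 16."""
    return sum(step_latencies(L).values())


def latency_ns(L, clock_period_ns):
    """Latency of the first estimate in nanoseconds for a given clock period."""
    return total_latency(L) * clock_period_ns
```

For four antennas, `total_latency(4)` gives 40 cycles, which corresponds to 800 ns if one assumes a 20 ns clock period.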
Based on a timing and data-dependency analysis, the top-level block diagram for the
SD-DMI algorithm is shown in Figure 4.6. This block diagram is a general architecture
representing the data dependency and the data flow between the different sub-blocks
that carry out the arithmetic operations of (4.2)-(4.9) of the SD-DMI.
The architecture demonstrates parallelism and reduced redundancy. For example,
to obtain the result of Step 1 we need b1 once and b2 twice. The data path is balanced and
facilitates pipelining in multiple sub-blocks for high-speed implementation. In
addition, the developed architecture should be reconfigurable for several baseband
processing UMTS (Universal Mobile Telecommunication Systems) systems; it can
thus be reconfigured while respecting all system constraints (hardware, algorithm, speed, ...).
[Block diagram: PE SD-DMI units replicated for users k = 1, 2, ..., K.]
Figure 4.6: Block diagram of the proposed VLSI architecture for the SD-DMI
with the corresponding data flow.
Each block diagram shown is for one user, k=1 (the multiple antennas are handled
inside the one block diagram), and one channel path, m=1. The architecture can easily be
replicated for multiple users and multiple paths. The VLSI architecture shown in Figure
4.6 is verified through end-to-end VHDL simulations along with an actual hardware
implementation to determine the device resources.
4.4.5 Implementation technique 
Implementing matrix inversion has long been an important challenge for
hardware engineers. Engineers have been drawn to designing a low-complexity
solution for this problem since it is one of the main cores in most signal
processing applications.
The choice of a suitable hardware architecture for the implementation of equations
(4.2-4.9) depends on the system specifications and on the available hardware resources
[BUR06]. The word length strongly affects the hardware resources needed to execute
the implemented algorithm with the required precision. Hence, we assume an
implementation on a 16-bit word length architecture, which we have shown to provide
sufficient fixed-point range.
All Process Elements (PEs) are mainly implemented for complex numbers;
however, in some PEs, as in blocks b3 and b7, there are no multiplications between two
complex numbers, so the architecture can simply be replicated for the imaginary
part. The complex multiplier is derived from [XIL05] and the equations in [SHA98].
It contains three real multipliers in a pipelined structure, with a
latency of 4 cycles and a throughput of one result per clock cycle at Fclk [XIL05].
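A complex product normally needs four real multiplications, but it can be rearranged to use three at the cost of extra additions. The sketch below shows one common three-multiplier arrangement; it is given only to illustrate the idea, and the exact equations used in [SHA98] and in the Xilinx core may be arranged differently. The function name `complex_mult_3` is hypothetical.

```python
def complex_mult_3(a, b, c, d):
    """Compute (a + jb)(c + jd) with three real multiplications.

    Returns (real, imag). One common arrangement, shown for illustration;
    the core cited in the text may use a different one.
    """
    k1 = c * (a + b)   # ac + bc
    k2 = a * (d - c)   # ad - ac
    k3 = b * (c + d)   # bc + bd
    real = k1 - k3     # ac - bd
    imag = k1 + k2     # ad + bc
    return real, imag
```

For example, `complex_mult_3(1, 2, 3, 4)` returns `(-5, 10)`, matching (1 + 2j)(3 + 4j) = -5 + 10j.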
The architecture is coded in VHDL using Xilinx ISE, which is invoked for
gate-level synthesis. ISE Foundation integrates the architecture in a complete logic design
environment for all leading Xilinx FPGA and CPLD (Complex Programmable Logic
Device) products; for more about this tool, the reader is referred to [XIL05].
The targeted FPGA components are the Virtex-II and Virtex-II Pro families from
Xilinx. The bottlenecks in our design are the matrix-vector product in (4.7) and the
vector-vector products in (4.3), (4.4), (4.5) and (4.9). For these multiplications, parallel
and systolic architectures have been employed.
4.4.6 Mapping to VLSI architectures
High-level synthesis has become crucial for achieving a fast time to market
for new VLSI implementations of signal processing algorithms, since it allows the
complexity to be estimated, which is an important consideration for real-time
implementations. In hardware, the complexity is reflected by the number of arithmetic
operations, recursiveness, dataflow, and memory access.
Considering that it is unlikely to have more than four receiving antennas in the
BTS, the analysis is made for four antennas (L=4).
[Legend entries, operator symbols not reproduced:]
- Half-complex multiplier (real-valued and complex-valued inputs)
- Complex multiplier
- Complex adder
- Right point shifter of a complex value (input multiplied by 2^b)
- Left point shifter of a complex value (input multiplied by 2^-b)
- Complex delay
Figure 4.7: Legend of operators.
Equation (4.2) computes the channel vector. The block architecture with the
corresponding process element (PE) for this equation is shown in Figure 4.8. The PE for
this equation has two shifters, one complex addition and one complex subtraction. The
number of PEs in the block (b1) is equal to L. The computation of the channel vector
requires a maximum of 20.680 ns.
Figure 4.8: Block b1.
Block b2 is used for the vector-vector product where the dimensions of the two
multiplied vectors are L x 1 and 1 x L, yielding an L x L matrix. This block is used to
calculate the partial covariance matrix for different input vectors, as in (4.3), (4.4) and (4.5).
The block architecture with the corresponding PE for this product is shown in Figure
4.9. The number of PEs is equal to L^2. The time required to compute the matrix is
4.728 ns.
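The product computed by block b2 is an outer product: every pair of elements is multiplied once, which is why L^2 PEs can produce all entries in parallel. The sketch below computes it sequentially in software; the function name `outer_product` and the assumption that the row vector is the conjugate transpose of the column vector (as in a covariance estimate) are illustrative.

```python
def outer_product(x):
    """Return the L x L matrix x x^H for a length-L complex vector x.

    Entry (i, j) is x[i] * conj(x[j]); in hardware each entry would map
    to one PE, here they are simply computed one after another.
    """
    return [[xi * xj.conjugate() for xj in x] for xi in x]
```

The result is Hermitian, so entry (j, i) is the conjugate of entry (i, j); a hardware design could exploit this to save multipliers, although nothing in the text indicates that block b2 does so.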
Figure 4.9: Block b2. 
Equations (4.4) and (4.5) compute the covariance matrices for the pilot and data
traffic respectively. The block architecture with the corresponding PE for this equation
is shown in Figure 4.10. The same architecture can be applied for the pilot covariance
matrix. The PE for this equation has one complex addition, one complex subtraction
and two shifting operations. The number of delay elements is determined by
the number of samples J. In addition, this PE has an L x L matrix input which is
computed by a separate block (b2). The number of PEs in the block (b3) is L^2.
The computation of the covariance matrices of the pilot and the traffic requires 11.120
ns each.
Figure 4.10: Block b3.
Equation (4.6) computes the covariance matrices of the IPN vectors. The block
architecture with the corresponding PE for this equation is shown in Figure 4.11. The
PE for this equation has two real shifting operations, one complex addition and one
complex subtraction. The number of PEs in the block (b4) is equal to L^2. The
computation of the R_I,m[n] matrix requires 17.057 ns.
Figure 4.11: Block b4.
Equations (4.7) and (4.8) compute the error e_m[n] and the weight vector w_m[n].
For the matrix-vector product R_I,m[n] w_m[n-1] a systolic architecture has been
deployed (see Figure 4.12). The matrix-vector product is identified as block b5. For
an L x L matrix and an L x 1 vector, the result is an L x 1 vector. Computing the resulting
vector takes 2L + 1 clock ticks. The number of PEs in the block (b5) is equal to L.
This architecture derives from the architecture presented in [QUI89]. Many
optimizations for matrix multiplication are given in [QUI89] and [PAR99]. These
optimizations can be used for area-time trade-offs, resulting in architectures with lower
power dissipation. This architecture has L complex-valued Multiplier-Accumulators (MACs),
where each MAC has one complex multiplication, one complex addition and one
complex register or delay element. The time required to compute the vector is
9.15 ns.
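The behaviour of a linear array of L MACs can be imitated in software with a skewed schedule, in which element j of the vector reaches MAC i on tick i + j. The sketch below is only a simplified model under that assumption; the function name `systolic_matvec` is hypothetical, and the actual array of [QUI89] and Figure 4.12 may be organised differently. It counts only the 2L - 1 ticks of the skewed schedule itself; any additional input or output registering, which would be needed to reach the 2L + 1 ticks quoted in the text, is not modelled.

```python
def systolic_matvec(R, w):
    """Matrix-vector product y = R w on a modelled linear array of L MACs.

    MAC i accumulates row i. On tick t it sees element j = t - i of w,
    so the inputs are skewed by one tick from one MAC to the next.
    Returns (y, ticks) where ticks is the length of the skewed schedule.
    """
    L = len(w)
    acc = [0j] * L
    ticks = 2 * L - 1
    for t in range(ticks):
        for i in range(L):
            j = t - i                       # vector element reaching MAC i now
            if 0 <= j < L:
                acc[i] += R[i][j] * w[j]    # one multiply-accumulate
    return acc, ticks
```

Each MAC performs exactly L multiply-accumulate operations, so the array does the same L^2 multiplications as a direct product, spread over time rather than over hardware.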
Figure 4.12: Block b5.
The block architecture with the corresponding PE for computing the weight vector is
identified as block b6. The number of PEs in this block is equal to L (see Figure 4.13).
This architecture has one shifting operation, one complex addition, one complex
subtraction and one complex delay element. The time required to compute the vector
is 7.598 ns.
Figure 4.13: Block b6.
Equation (4.9) computes the estimate of the symbol for a given path m = {1, ..., M}.
This is a vector-vector product where the dimensions of the two multiplied vectors are
1 x L and L x 1, so the result is a scalar.
The architecture is shown in Figure 4.14, where one complex MAC is used.
This architecture is identified as block b7. The result is available at clock tick L + 2.
The computation of the estimate requires 4 ns.
Figure 4.14: Block b7.
Finally, the estimates for all paths are summed, which yields the estimate of the
required symbol s[n].
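The two steps just described, an inner product per path followed by a sum over paths, can be sketched as follows. The exact form of (4.9) is not repeated here, so the sketch assumes the usual beamformer output w_m^H r_m for each path; the function name `estimate_symbol` and its argument layout are hypothetical.

```python
def estimate_symbol(weights, received):
    """Combine per-path beamformer outputs into one symbol estimate.

    weights[m] and received[m] are the length-L weight and received
    vectors of path m. Each path contributes w_m^H r_m (assumed form),
    and the contributions of all M paths are summed.
    """
    total = 0j
    for w_m, r_m in zip(weights, received):
        # One complex MAC per path: accumulate conj(w) * r over the L antennas.
        total += sum(w.conjugate() * r for w, r in zip(w_m, r_m))
    return total
```

In this form the per-path inner loop corresponds to the single complex MAC of block b7, and the outer loop to the final summation over paths.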
4.4.7 Hardware resources 
The hardware resources of the different blocks are shown in Table 4.7. The
estimates include the numbers of slices, lookup tables, flip-flops, memory blocks and
multipliers. These estimates make it easy to determine how design choices affect
hardware requirements. Block b2 requires 36 real multipliers (for four antennas), with a
considerable number of slices. This block can be considered a bottleneck in our
design. Blocks b2 and b7 require the least computational time and are thus the fastest
of all the blocks in the block diagram.
Table 4.7: Hardware resources for the different blocks in the architecture for L=4, J=10 and M=3.
Block                 b1    b2    b3    b4    b5    b6    b7
Slices                221   700   820   128   764   58    191
MULT18X18             0     12    0     0     12    0     3
Path delay, max (ns)  6.96  4.76  5.27  5.88  4.76  3.40  4.76
Clock freq. (MHz)     144   210   190   210   210   294   210
The computational time of b2 and b7 is the time needed to perform a
multiplication. The largest computational time is that of the weight vector
computation (blocks b5 and b6), which is taken as the computational time for each task,
since the architecture is fully pipelined and the critical path is constrained to be
less than or equal to 20 ns (this is our design choice). All blocks have the same
throughput, with quite similar latencies except for blocks b4 and b6. In some blocks the
latency depends on the number of antenna elements L, as in b5 and b7, or on the number of
delay elements J, as in b1 and b3.
The total latency (in cycles) for the first estimate is given by the following
formula:
T_Latency = 6L + 16    (4.11)
All the variables needed to compute the mathematical expressions of the SD-DMI
algorithm are available at the proper time, at the proper block and at the proper PE.
Thus the data flow through the VLSI network is correct, and the VLSI architecture
executes the algorithm correctly. It should be noted that the complete architecture respects
the total latency calculated using (4.11).
4.4.8 Tradeoffs for maximum number of users 
Depending on the objective of the implementation, a trade-off can be chosen.
There are many trade-offs in VLSI architectures, and they provide critical insight into
implementation issues that may arise during the product development process. These
optimizations or trade-offs make the proposed architecture more suitable for real-time
implementation.
In order to achieve high-throughput operation, pipelined and parallel processing
are often used in the implementation of digital signal processing algorithms. Time
constraints are respected since the architecture is fully pipelined. For area constraints,
the bottleneck lies in the area occupied by the multipliers and slices. The number of slices
and multipliers in the FPGA plays a decisive role in determining the number of users a
base station (equipped with 4 antenna elements) can serve.
In our application, the objective is to attain a maximum number of users in the
cell subject to the real-time requirements. The area constraints therefore come down to a
slices/multipliers trade-off. The following equations help to find the approximate
maximum number of users.
Since the computational time of block b2 is twice the computational time of 
block b2, which are considered in the same task, block b2 (hardware) is used once. 
Hence the total number of slices for the proposed architecture is given by 
4.12 
And the total number of multipliers for the proposed architecture is given by 
4.13 
The critical path is assumed to be less than 20 ns, so the maximum achieved
throughput is F_max = 50 MHz. If the targeted throughput of the user is equal to
F = 2 Mb/s, then we define the throughput factor F_f as
F_f = F_max / F = 50 / 2 = 25    (4.14)
If the channel has 3 paths, then we define the reuse factor Rf as 
4.15 
The number of users can be determined based on two constraints, the number of
slices and the number of multipliers; hence we define the Maximum Core Unit (MCU):
MCU_Slices = (Number of slices available in a device) / (Total required number of slices)    (4.16)
MCU_Multipliers = (Number of multipliers available in a device) / (Total required number of multipliers)    (4.17)
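The quantities defined in (4.14), (4.16) and (4.17) are simple ratios, and the sketch below evaluates them. The function names are hypothetical, the use of integer division (counting only whole cores) is an assumption, and the device and core figures in the usage note are made-up placeholders rather than data from the thesis.

```python
def throughput_factor(f_max_mhz, f_user_mbps):
    """Throughput factor F_f = F_max / F, as in (4.14)."""
    return f_max_mhz / f_user_mbps


def max_core_units(available, required):
    """Ratio of available to required resources, as in (4.16) and (4.17).

    Integer division is an assumption made here so that only whole
    cores are counted.
    """
    return available // required
```

With the values given in the text, `throughput_factor(50, 2)` is 25.0. For a hypothetical device with 10000 slices and 100 multipliers, and a hypothetical core needing 3000 slices and 27 multipliers, both `max_core_units(10000, 3000)` and `max_core_units(100, 27)` give 3; when the two ratios differ, the smaller one would be the binding constraint.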
Therefore the number of users is given by: 
4.18 
From Table 4.8, we can see that the maximum number of users we can attain, for the
chosen Xilinx FPGA devices, is 90 (for the Virtex2P XC2VP50). At 2 Mb/s per
user, this maximum number of users is a good result for the DS-CDMA
communication system.
Therefore a cheaper FPGA and/or a lower bit rate allows for a large
increase in the number of users to be served by the base station; on the other hand,
with a state-of-the-art FPGA, such as the Xilinx Virtex IV and Virtex V, the number of
users served by the base station increases further, but at a cost penalty.
Table 4.8: Maximum number of users N_MAX over different Xilinx FPGA devices in
the case of L=4, J=10 and M=3.
Family   VIRTEX2              VIRTEX 2 PRO
Device   -    -    -    -     -    -    -    XC2VP50
N_MAX    2    15   37   65    4    10   53   90
4.5 Summary 
In this chapter, FPGA implementations of beamforming receivers based on MRC
and NC-LMS for a DS-CDMA system have been presented and studied.
Owing to the adaptive nature of NC-LMS, the loss incurred in computing the
weight vectors is lower than with MRC, which overall makes NC-LMS
attractive at low word lengths; however, owing to its low complexity, MRC is more
suitable for FPGA implementation than NC-LMS. Pipelining and parallelism can be
introduced in both architectures to allow for higher clock frequencies and thus higher
throughputs.
Moreover, for high performance, robust transmission schemes require the
implementation of complex signal processing algorithms such as the DMI. An FPGA
implementation of the DMI has been presented, after resolving the matrix inversion
using a gradient descent method (SD-DMI). Performance analysis of the beamforming
techniques and fixed-point simulations have been carried out, and the hardware resources
have been reported.
The algorithm is massively parallel and is suitable for a fixed-point VLSI
implementation. We have demonstrated that there is no loss in the algorithm
performance due to this redesign of the algorithm. The results showed that a
sufficient number of users can be served in a cell, using a complex algorithm and at a
modest cost.
The FPGA implementation can be used as a core processor in a real-time SA
system and in a wide variety of MIMO algorithms. Furthermore, the FPGA
implementation can easily be ported to an ASIC implementation.
Chapter 5 
Conclusion 
As the industry transitions from a high-growth phase to a more mature state, cost
pressures will increasingly mount in all facets of the infrastructure, including the
wireless base station. In theory, this cost should be correlated with the improvement
in the performance of the base station; in other words, next-generation base station
deployments must meet the challenge of continually reducing cost (as measured by
cost per channel) while adding functionality to support new services, protocols, and
changing subscriber usage patterns.
However, can we maintain a moderate cost when launching the next-generation
base station?
SA technology can significantly improve wireless system performance and
economics for a range of potential users. It enables operators of Personal
Communication Service (PCS), cellular, and Wireless Local Loop (WLL) networks to
realize a significant increase in signal quality, capacity, and coverage.
Nevertheless, the success of SA systems relies on two considerations that
have often been overlooked when investigating SA technologies: first, the SA features
need to be considered early in the design phase of future wireless systems, which is
called top-down feasibility; second, a realistic performance evaluation of SA
techniques needs to be performed according to the critical parameters associated with
future system requirements, which is called bottom-up feasibility.
The first consideration was the objective of Chapter 2 of this work, where a
global overview of SA systems was studied and discussed. The second consideration
forms the core of Chapter 3, where we focused on the search for the best
beamforming algorithm, one that leads the SA system to steer its main beam towards the SOI and
its nulls towards the SNOI in any wireless communication environment.
Based on the simulation results, we found that the SD-DMI is
our best candidate for beamforming; it is a very efficient and powerful algorithm. Most
signal processing engineers therefore recommend this algorithm to industry because of its
high performance; hardware engineers, on the other hand, try to avoid it as much
as they can or, failing that, look for new techniques to implement
it without loss or degradation in performance.
Inspired by Tommy Lasorda's well-known remark that "the
difference between the impossible and the possible lies in a person's determination", we,
as hardware and signal processing engineers, accepted the challenge of
implementing the DMI.
We have redesigned the algorithm so that massive parallelism, pipelining
and bit-level arithmetic can be exploited. The redesign of the DMI has made it more
powerful, in particular by ensuring tracking properties for time-varying channels. We have
proposed an efficient FPGA implementation and estimated the maximum number of
users that a base station (equipped with four antenna elements) can serve. This was done
using Xilinx ISE, that is, manual VHDL coding. The advantage of this software is that it
allows the designer to verify each level of the design, so that each level can be identified
and checked separately, along with the placement and routing properties, package pin
assignment, and area and timing constraints.
We have also implemented two other beamforming algorithms, MRC and NC-LMS,
using a rapid prototyping methodology. These two techniques are of
intermediate complexity compared with the SD-DMI. The rapid prototyping technique is
fast in design ("just connecting blocks - no VHDL coding"); however, it suffers when it
comes to timing, scheduling and routing.
In conclusion, manual VHDL coding remains the reference for in-depth hardware
design and implementation, since it allows the designer to interact with every stage of the
design. The rapid prototyping technique, from our point of view,
serves for fast testing of the design, because the generated design, even though it is coded
in a known HDL, remains unfamiliar and hard to read for the designer. Once the design is
verified with the rapid prototyping technique, the designer can start working in depth with ISE.
As future work, there are several possibilities for extending the work
of this thesis:
- Developing an algorithm more powerful than the DMI and the TS-RLS
  (BER < 0.02).
- Developing VLSI architectures with low power and low complexity for
  the developed algorithms. These architectures play the role of a core
  processor in a real-time smart antenna system and in a wide variety of
  implementations of beamforming and MIMO algorithms as well.
- Implementing the developed architectures on FPGAs and ASICs.
- Developing a SOC based on SA.
References
[KRA88] J. D. Kraus, Antennas: McGraw-Hill, 1988.
[BOL06] I. Bolsens, "Programming modern FPGA platforms," IEEE 17th International Conference on Application-specific Systems, Architectures and Processors, Steamboat Springs, Colorado, September 11-13, 2006.
[BHO01] A. U. Bhobe and P. L. Perini, "An Overview of Smart Antenna Technology For Wireless Communication," IEEE Proceedings of Aerospace Conference, Volume 2, 10-17 March 2001, Page(s): 2/875 - 2/883 vol.2.
[BOU00] A. O. Boukalov and S. G. Haggman, "System aspects of smart-antenna technology in cellular wireless communications - an overview," IEEE Transactions on Microwave Theory and Techniques, vol. 48, pp. 919, 2000.
[BEL02a] S. Bellofiore, C. A. Balanis, J. Foutz, and A. S. Spanias, "Smart-antenna systems for mobile communication networks. Part 1. Overview and antenna design," Antennas and Propagation Magazine, IEEE, vol. 44, pp. 145, 2002.
[IECOE] International Engineering Consortium, Online Education, Web ProForum Tutorials, "Smart Antenna Systems," http://www.iec.org/online/tutorials/smart_ant/
[ALE04] A. Alexiou and M. Haardt, "Smart antenna technologies for future wireless systems: trends and challenges," Communications Magazine, IEEE, vol. 42, pp. 90, 2004.
[ZHA01] M. Zhai and Y. Liu, "An overview of spatial channel models used in smart antenna system analysis," International Conferences on Info-tech and Info-net, ICII 2001 Beijing, Volume 2, 29 Oct.-1 Nov. 2001, Page(s): 542 - 548 vol.2.
[SHI97] J. Shiann-Shiun, O. Garret, and X. Guanghan, "Experimental evaluation of smart antenna system performance for capacity improvement," IEEE Global Telecommunications Conference, GLOBECOM '97, Volume 1, 3-8 Nov. 1997, Page(s): 369 - 373.
[CHR00] M. Chryssomallis, "Smart antennas," Antennas and Propagation Magazine, IEEE, vol. 42, pp. 129, 2000.
[SEU99] C. Seungwon, S. Donghee, and T. K. Sarkar, "A comparison of tracking-beam arrays and switching-beam arrays operating in a CDMA mobile communication channel," Antennas and Propagation Magazine, IEEE, vol. 41, pp. 10, 1999.
[LEB99] J. C. Liberti, Jr. and T. S. Rappaport, Smart Antennas for Wireless Communications: Prentice Hall, 1999.
[ROY97] R. H. Roy, "An overview of smart antenna technology and its application to wireless communication systems," IEEE International Conference on Personal Wireless Communications, 17-19 Dec. 1997, Page(s): 234 - 238.
[GOD97a] L. C. Godara, "Applications of antenna arrays to mobile communications. I. Performance improvement, feasibility, and system considerations," Proceedings of the IEEE, vol. 85, pp. 1031, 1997.
[GOD97b] L. C. Godara, "Application of antenna arrays to mobile communications. II. Beam-forming and direction-of-arrival considerations," Proceedings of the IEEE, vol. 85, pp. 1195, 1997.
[JIN05] H. Jin and A. Acampora, "A Reservation-Based Media Access Control (MAC) Protocol Design for Cellular Systems Using Smart Antennas - Part I. Flat Fading," IEEE Transactions on Wireless Communications, vol. 4, pp. 792, 2005.
[KAI05] T. Kaiser, "When will smart antennas be ready for the market? Part I," Signal Processing Magazine, IEEE, vol. 22, pp. 87, 2005.
[OHI02] T. Ohira, "Analog smart antennas: an overview," The 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, Volume 4, 15-18 Sept. 2002, Page(s): 1502 - 1506 vol.4.
[PEN02] J. Peng and Z. Duan, "The research of adaptive algorithm of smart antenna arrays in the CDMA system," Proceedings of the 4th World Congress on Intelligent Control and Automation, Volume 2, 10-14 June 2002, Page(s): 838 - 841 vol.2.
[GOD04] L. C. Godara, Smart Antennas: CRC Press, 2004.
[SHI98] J. Shiann-Shiun, G. T. Okamoto, X. Guanghan, L. Hsin-Piao, and W. J. Vogel, "Experimental evaluation of smart antenna system performance for wireless communications," IEEE Transactions on Antennas and Propagation, vol. 46, pp. 749, 1998.
[JIN00] C. Jinho, "Pilot channel-aided techniques to compute the beamforming vector for CDMA systems with antenna array," IEEE Transactions on Vehicular Technology, vol. 49, pp. 1760, 2000.
[NAG96] A. F. Naguib, "Adaptive antennas for CDMA wireless networks," PhD Dissertation, Stanford University, Stanford, CA, 1996.
[SUA93] B. Suard, A. Naguib, G. Xu, and A. Paulraj, "Performance Analysis of CDMA Mobile Communication Systems Using Antenna Arrays," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Minneapolis, MN, April 1993, pp. 153-156.
[HAY96] S. Haykin, Adaptive Filter Theory, Third ed.: Prentice Hall, 1996.
[LIN01] L. Lindbom, M. Sternad, and A. Ahlen, "Tracking of time-varying mobile radio channels. 1. The Wiener LMS algorithm," IEEE Transactions on Communications, vol. 49, pp. 2207-2217, 2001.
[OUA05] M. Ahmed Ouameur and D. Massicotte, "Multiuser Wiener LMS for Time Varying Multipath Channel Estimation and Tracking in DS-CDMA Systems," submitted to EURASIP Journal on Applied Signal Processing, Special Issues on Reliable Communication over Rapidly Time-Varying Channels, June 1st, 2005.
[GOL83] G. H. Golub and C. F. Van Loan, Matrix Computations, Baltimore, MD: The Johns Hopkins Univ. Press, 1983.
[OJA83] E. Oja, Subspace Methods of Pattern Recognition. New York: Wiley, 1983.
[WEI02] S. Weiss and I. K. Proudler, "Comparing efficient broadband beamforming architectures and their performance trade-offs," 14th International Conference on Digital Signal Processing, DSP, Volume 1, 2002, Page(s): 417 - 423 vol.1.
[CAI00] G. Caire, "Two-stage nondata-aided adaptive linear receivers for DS/CDMA," IEEE Transactions on Communications, vol. 48, pp. 1712-1724, 2000.
[SLO93] D. T. M. Slock, "On the convergence behavior of the LMS and the normalized LMS algorithms," IEEE Transactions on Signal Processing, vol. 41, pp. 2811, 1993.
[MOO99] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing: Prentice Hall, 1999.
[YON01] W. Yongbin, S. B. Gelfand, and J. V. Krogmeier, "Noise-constrained least mean squares algorithm," IEEE Transactions on Signal Processing, vol. 49, pp. 1961, 2001.
[GOL98] S. Gollamudi, S. Nagaraj, S. Kapoor, and H. Yih-Fang, "Set-membership filtering and a set-membership normalized LMS algorithm with an adaptive step size," IEEE Signal Processing Letters, vol. 5, pp. 111, 1998.
[GAM05] D. Gamba, "Using FPGAs in Wireless Base Station Designs," in DSP Magazine, October ed., 2005, pp. 20-22.
[POW93] D. G. Powell and A. G. J. Holt, "Reduced sampling rate beamforming technique and its hardware implementation," Radar and Signal Processing, IEE Proceedings F, vol. 140, pp. 209-215, 1993.
[NUT02] T. W. Nuteson, J. E. Stocker, J. S. Clark, D. S. Haque, and G. S. Mitchell, "Performance characterization of FPGA techniques for calibration and beamforming in smart antenna applications," IEEE Transactions on Microwave Theory and Techniques, vol. 50, pp. 3043-3051, 2002.
[TAR05] A. Tarighat, E. Grayver, A. Eltawil, J. F. Frigon, G. Poberezhskiy, Z. Hanli, and B. Daneshrad, "A low-power ASIC implementation of 2Mbps antenna-rake combiner for WCDMA with MRC and LMS capabilities," Proceedings of the IEEE Custom Integrated Circuits Conference, 18-21 Sept. 2005, Page(s): 69 - 72.
[CES05] T. Cesear and R. Uribe, "Implementing Matrix Inversions in Fixed-Point Hardware," in DSP Magazine, October ed., 2005, pp. 32-35.
[XIL05] Xilinx IP center, Logic Core, Complex multiplier Core v2.1, April 2005, http://www.xilinx.com/bvdocs/ipcenter/data_sheet/cmpy.pdf
[HOQ05] Quoc-Thai Ho, Daniel Massicotte, and Adel-Omar Dahmane, "FPGA Implementation of an MUD Based on Cascade Filters for a WCDMA System," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 52919, 12 pages, 2006.
[GUO06] Yuanbin Guo, Jianzhong (Charlie) Zhang, Dennis McCain, and Joseph R. Cavallaro, "An Efficient Circulant MIMO Equalizer for CDMA Downlink: Algorithm and VLSI Architecture," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 57134, 18 pages, 2006.
[JEN85] R. Jenkins, "Implementing a matrix-inversion algorithm in a limited-precision adaptive antenna array processor," Antennas and Propagation Society International Symposium, Volume 23, Jun 1985, Page(s): 289 - 292.
[PAP88] G. M. Papadourakis and H. Andre, "High speed implementation of matrix inversion algorithms in orthogonal systolic architectures," IEEE Conference Proceedings Southeastcon, 11-13 April 1988, Page(s): 200 - 204.
[YLI04] M. Ylinen, A. Burian, and J. Takala, "Direct versus iterative methods for fixed-point implementation of matrix inversion," International Symposium on Circuits and Systems, ISCAS, Volume 3, 23-26 May 2004, Page(s): III - 225-8 Vol.3.
[ECH05] F. Echman and V. Owall, "A scalable pipelined complex valued matrix inversion architecture," IEEE International Symposium on Circuits and Systems, ISCAS, 23-26 May 2005, Page(s): 4489 - 4492 Vol. 5.
[SAL05] P. Salmela, A. Happonen, A. Burian, and J. Takala, "Several approaches to fixed-point implementation of matrix inversion," International Symposium on Signals, Circuits and Systems, ISSCS 2005, Volume 2, 14-15 July 2005, Page(s): 497 - 500.
[KAR05] M. Karkooti, J. R. Cavallaro, and C. Dick, "FPGA Implementation of Matrix Inversion Using QRD-RLS Algorithm," Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, October 28 - November 1, 2005, Page(s): 1625 - 1629.
[BUR06]
[SHA98]
[QUI89]
[PAR99]
A. Burg, S. Haene, D. Perels, P. Luethi, N. Felber, and W. 
Fichtner, "Algorithm and VLSI architecture for linear MMSE 
detection in MIMO-OFDM systems" IEEE International 
Symposium on Circuits and Systems, ISCAS 2006, 
21-24 May 2006 Page(s):4 pp. 
N. R. Shanbhag, "Algorithms Transformation Techniques for 
Low-Power Wireless VLSI Systems Design," International 
Journal of Wireless Information Networks, vol. 5, pp. 147-171, 
1998. 
P. Quinton. Y.Robert, Algorithmes et architectures systoliques: 
MASSON, 1989. 
K. K. PARHI, VLSI Digital Signal Processing System: WILEY, 
1999. 
[TAR05] O. Tahernia, "High-Performance DSP-Vision, Leadership, Commitment", DSP Magazine, October ed, 2005, pp. 4. 
[MON80] R. A. Monzingo and T. W. Miller, Introduction to Adaptive Arrays, New York: Wiley, 1980. 
Appendix A 
Results with a Lower Complexity Platform 
A.1 Platform characteristics 
To study the performance of the adaptive algorithms in a wireless 
communication system (transmission, reception, noise, ...), we developed a 
low-complexity platform based on the CDMA protocol. Its characteristics can 
be summarized as follows: 
➢ One transmitting antenna and multiple receiving antennas (SIMO); by default, 
the number of receiving antennas is 4. 
➢ Variable number of users; by default it is set to 8. 
➢ The SINR varies from 0 dB to 16 dB in steps of 4 dB. 
➢ Processing gain, or OVSF (Orthogonal Variable Spreading Factor); by default 
it is set to 16. 
➢ Number of frames: 40. 
➢ Number of slots per frame: 15. 
➢ Slot length in chips: 256000. 
➢ Time-varying channel whose variation rate depends on the speed of the user. 
The channel changes every 15 slots if the speed is below 5 km/h, every 7 slots 
between 5 and 20 km/h, every 4 slots between 20 and 60 km/h, every 2 slots 
between 60 and 90 km/h, and every slot above 90 km/h. 
➢ Long-code generation: each time the channel changes, the system regenerates 
the code. 
➢ The percentage of error in the channel is about 5%. 
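The speed-dependent channel update rule listed above can be sketched as a small lookup function. This is only an illustration: the function name is hypothetical, and the handling of speeds that fall exactly on a boundary (5, 20, 60 or 90 km/h) is an assumption, since the list does not specify it.

```python
def channel_update_period_slots(speed_kmh: float) -> int:
    """Number of slots between two channel changes for a given user speed.

    Thresholds follow the platform description; the treatment of the exact
    boundary values is an assumption made for this sketch.
    """
    if speed_kmh < 5:
        return 15
    if speed_kmh < 20:
        return 7
    if speed_kmh < 60:
        return 4
    if speed_kmh <= 90:
        return 2
    return 1  # faster than 90 km/h: the channel changes every slot


for v in (3, 10, 40, 75, 100):
    print(v, "km/h ->", channel_update_period_slots(v), "slot(s)")
```

With 15 slots per frame, the slowest class of users therefore sees one channel realization per frame, while the fastest sees a new one in every slot.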
A.2 Maximum Ratio Combining 
With the default parameters of the platform, the result of this algorithm is 
shown in Fig. 1. The BER decreases as the SINR increases and is about 
$10^{-3.7}$ at an SINR of 16 dB. 
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB] 
Figure 1: BER vs. SINR for the MRC 
A.3 Direct Matrix Inversion 
With the default parameters of the platform, the result of this algorithm is 
shown in Fig. 2. The BER decreases as the SINR increases and is about 
$10^{-3.7}$ at an SINR of 16 dB. 
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB] 
Figure 2: BER vs. SINR for the DMI 
A.3 Least Mean Squares 
The first simulation scenario for this algorithm plots the bit error rate 
(BER) versus the SINR for different values of the step size $\mu$ (see Fig. 3). 
The BER decreases as the SINR increases and is about $10^{-3.8}$ at an SINR of 
16 dB. Several values of $\mu$ were tried; the optimal $\mu$ is the one that 
gives the best result at BER = $10^{-2}$ and SINR = 5 dB. The optimal $\mu$ is 
found to be 0.05. 
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB, one curve per step size] 
Figure 3: BER vs. SINR for the LMS 
The second simulation scenario for this algorithm plots the MSE versus the 
number of iterations for different values of $\mu$ (see Fig. 4). The SINR is 
fixed to 10 dB and the remaining parameters of the platform keep their default 
values. Three values of the step size $\mu$ were used. As $\mu$ increases, the 
algorithm converges faster, but on the other hand the MSE increases. In 
addition, the MSE decreases as the number of iterations increases. 
[Plot: MSE versus training data length] 
Figure 4: MSE vs number of iterations for the LMS 
The third simulation scenario for this algorithm plots the BER versus the 
number of training bits for different values of $\mu$ (see Fig. 5). The SINR is 
set to 10 dB and the remaining parameters of the platform keep their default 
values. Two values of the step size $\mu$ were used. As the number of 
iterations increases, the BER decreases (to about $10^{-3.5}$). 
[Plot: BER versus training data] 
Figure 5: BER vs. number of training data for the LMS 
A.3 Normalized Least Mean Squares 
The first simulation scenario for this algorithm plots the bit error rate 
(BER) versus the SINR for different values of the step size $\mu$ (see Fig. 6). 
The BER decreases as the SINR increases and is about $10^{-3.8}$ at an SINR of 
16 dB. Several values of $\mu$ were tried; the optimal $\mu$ is the one that 
gives the best result at BER = $10^{-2}$ and SINR = 5 dB. The optimal $\mu$ is 
found to be 0.03. 
[Plot: BER versus SINR, one curve per step size] 
Figure 6: BER vs. SINR for the NLMS 
The second simulation scenario for this algorithm plots the MSE versus the 
number of iterations for different values of $\mu$ (see Fig. 7). The SINR is 
fixed to 10 dB and the remaining parameters of the platform keep their default 
values. Three values of the step size $\mu$ were used. As $\mu$ increases, the 
algorithm converges faster, but on the other hand the MSE increases. In 
addition, the MSE decreases as the number of iterations increases. 
[Plot: MSE versus training data length] 
Figure 7: MSE vs. number of iterations for the NLMS 
The third simulation scenario for this algorithm plots the BER versus the 
number of training bits for different values of $\mu$ (see Fig. 8). The SINR is 
set to 10 dB and the remaining parameters of the platform keep their default 
values. Two values of the step size $\mu$ were used. As the number of 
iterations increases, the BER decreases (to about $10^{-3.5}$). 
[Plot: BER versus training data] 
Figure 8: BER vs. number of training data for the NLMS 
A.4 Noise Constrained Least Mean Squares 
The first simulation scenario for this algorithm plots the bit error rate 
(BER) versus the SINR for different values of $\alpha$, $\beta$ and $\gamma$ 
(see Fig. 9). The BER decreases as the SINR increases and is about $10^{-3.8}$ 
at an SINR of 16 dB. The constant $\alpha$ was fixed to the optimal step size of 
the LMS, 0.05, and several values of $\beta$ and $\gamma$ were tried; the 
optimal values of $\beta$ and $\gamma$ are those that give the best result at 
BER = $10^{-2}$ and SINR = 5 dB. The optimal $\beta$ and $\gamma$ are found to 
be 0.3 and 0.5, respectively. 
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB, one curve per $\beta$/$\gamma$ pair] 
Figure 9: BER vs. SINR for the NCLMS 
The second simulation scenario for this algorithm plots the MSE versus the 
number of iterations for different values of $\alpha$, $\beta$ and $\gamma$ 
(see Fig. 10). The SINR is fixed to 10 dB and the remaining parameters of the 
platform keep their default values. Six different values of the constants 
$\beta$ and $\gamma$ were used, with $\alpha$ set to 0.05. As $\beta$ and 
$\gamma$ increase, convergence is better ensured, but on the other hand the MSE 
increases; 4 and 4.5 are the best values for $\beta$ and $\gamma$, 
respectively. In addition, the MSE decreases as the number of iterations 
increases. 
[Plot: mean square error versus training data length, 0 to 2500] 
Figure 10: MSE vs. number of iterations for the NCLMS 
The third simulation scenario for this algorithm plots the BER versus the 
number of training bits for different values of $\alpha$, $\beta$ and $\gamma$ 
(see Fig. 11). The SINR is set to 10 dB and the remaining parameters of the 
platform keep their default values. Two values each of $\beta$ and $\gamma$ 
were used. As the number of iterations increases, the BER decreases (to about 
$10^{-3.6}$). 
[Plot: BER versus training data] 
Figure 11: BER vs. number of iterations for the NCLMS 
A.5 Recursive Least Squares 
The first simulation scenario for this algorithm plots the bit error rate 
(BER) versus the SINR for different values of the constant $\delta$ (see 
Fig. 12). The BER decreases as the SINR increases and is about $10^{-4.8}$ at an 
SINR of 16 dB. The forgetting factor was fixed to 0.98 and several values of the 
constant $\delta$ were tried. The optimal $\delta$ is the one that gives the 
best result at BER = $10^{-2}$ and SINR = 5 dB. The optimal $\delta$ is found to 
be $10^{-2}$. 
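To make the roles of the forgetting factor and of $\delta$ concrete, the following is a minimal sketch of a textbook exponentially weighted RLS recursion for a beamformer with output $y = \mathbf{w}^H\mathbf{x}$. It is not the platform code: the function name is hypothetical, and the assumption that $\delta$ enters through the initialization $\mathbf{P}(0) = \delta^{-1}\mathbf{I}$ follows the usual textbook convention rather than anything stated in this appendix.

```python
import numpy as np


def rls_weights(X, d, lam=0.98, delta=1e-2):
    """Textbook RLS sketch.

    X     : (n_samples, n_antennas) complex array snapshots
    d     : (n_samples,) reference (training) symbols
    lam   : forgetting factor (0.98, as in the simulations above)
    delta : regularization constant; P(0) = I / delta is an assumption
    """
    n_ant = X.shape[1]
    w = np.zeros(n_ant, dtype=complex)
    P = np.eye(n_ant, dtype=complex) / delta
    for x, d_n in zip(X, d):
        Px = P @ x
        k = Px / (lam + np.vdot(x, Px))       # gain vector, uses x^H P x
        e = d_n - np.vdot(w, x)               # a priori error d - w^H x
        w = w + k * np.conj(e)                # weight update
        P = (P - np.outer(k, np.conj(x)) @ P) / lam
    return w
```

Under this convention, a small $\delta$ gives a large initial $\mathbf{P}$ and hence a weak prior on the weights, which is one possible reading of why the BER depends on $\delta$ when the training sequence is short.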
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB, one curve per value of $\delta$] 
Figure 12: BER vs. SINR for the RLS 
The second simulation scenario for this algorithm plots the MSE versus the 
number of iterations for different values of the constant $\delta$ (see 
Fig. 13). The SINR is fixed to 10 dB and the remaining parameters of the 
platform keep their default values. Four values of the constant $\delta$ were 
used. Setting $\delta$ to $10^{-2}$ gives the best performance, and at the same 
time the MSE remains constant. In addition, the MSE decreases as the number of 
iterations increases. 
[Plot: MSE versus training data length, 500 to 2500] 
Figure 13: MSE vs. number of iterations for the RLS 
The third simulation scenario for this algorithm plots the BER versus the 
number of training bits for different values of the constant $\delta$ (see 
Fig. 14). The SINR is set to 10 dB and the remaining parameters of the platform 
keep their default values. Two values of the constant $\delta$ were used. The 
BER for $\delta = 10^{-1}$ is much lower than for $\delta = 10^{-2}$. As the 
number of iterations increases, the BER decreases (to about $10^{-3.8}$). 
[Plot: BER versus training data] 
Figure 14: BER vs. number of training data for the RLS 
A.6 Set Membership Identification 
The first simulation scenario for this algorithm plots the bit error rate 
(BER) versus the SINR for different values of the constant $\gamma$ (see 
Fig. 15). The BER decreases as the SINR increases and is about $10^{-4.5}$ at an 
SINR of 16 dB. Several values of the constant $\gamma$ between 0 and 1 were 
tried; the optimal $\gamma$ is the one that gives the best result at 
BER = $10^{-2}$ and SINR = 5 dB. The optimal $\gamma$ is found to be 0.8. 
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB] 
Figure 15: BER vs. SINR for the SMI 
The second simulation scenario for this algorithm plots the MSE versus the 
number of iterations for different values of the constant $\gamma$ (see 
Fig. 16). The SINR is fixed to 10 dB and the remaining parameters of the 
platform keep their default values. Four values of the constant $\gamma$ were 
used. As $\gamma$ decreases, the MSE decreases too; the MSE also decreases as 
the number of iterations increases. 
[Plot: mean square error versus training data length] 
Figure 16: MSE vs. number of iterations for the SMI 
The third simulation scenario for this algorithm plots the BER versus the 
number of training bits for different values of the constant $\gamma$ (see 
Fig. 17). The SINR is set to 10 dB and the remaining parameters of the platform 
keep their default values. Four different values of the constant $\gamma$ were 
used. The BER decreases as $\gamma$ increases, and it also decreases as the 
number of iterations increases (to about $10^{-3.6}$). 
[Plot: BER versus training bits] 
Figure 17: BER vs. number of training data for the SMI 
A.7 Comparison 
A.7.1 First scenario 
The first performance comparison uses the default parameters of the platform 
and, for each algorithm, the best parameters found when it was simulated 
separately. The result is shown in Figure 18. We look for the algorithm whose 
curve first cuts, or passes closest to, the point (SNR = 5 dB, 
BER = $10^{-2}$); for this simulation the best algorithm is the RLS. 
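The selection criterion used in these comparisons (which curve reaches BER = $10^{-2}$ at the lowest SNR) can be sketched as follows. This is an illustrative helper, not the code used for the thesis: the function name is hypothetical, the crossing is located by linear interpolation of $\log_{10}$(BER) between simulated points, and the two curves in the usage example are made-up numbers, not simulation results.

```python
import numpy as np


def sinr_at_target_ber(sinr_db, ber, target=1e-2):
    """SINR (dB) at which a BER curve first drops to `target`.

    Interpolates log10(BER) linearly between neighbouring points.
    Returns None if the curve never reaches the target.
    """
    sinr_db = np.asarray(sinr_db, dtype=float)
    log_ber = np.log10(np.asarray(ber, dtype=float))
    log_t = np.log10(target)
    for i in range(len(sinr_db)):
        if log_ber[i] <= log_t:
            if i == 0:
                return float(sinr_db[0])
            frac = (log_ber[i - 1] - log_t) / (log_ber[i - 1] - log_ber[i])
            return float(sinr_db[i - 1] + frac * (sinr_db[i] - sinr_db[i - 1]))
    return None


# Made-up curves on the platform's SINR grid (0 to 16 dB, step 4 dB).
grid = [0, 4, 8, 12, 16]
curves = {
    "algorithm A": [2e-1, 3e-2, 4e-3, 6e-4, 1e-4],
    "algorithm B": [3e-1, 8e-2, 2e-2, 5e-3, 1e-3],
}
for name, ber in curves.items():
    print(name, sinr_at_target_ber(grid, ber))
```

The algorithm with the smallest returned value is the one whose curve cuts the target first.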
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB, one curve per algorithm] 
Figure 18: Default parameters 
A.7.2 Second scenario 
The second performance comparison is in terms of the BER, using the default 
parameters of the platform except for the speed of the user, which is changed 
to 100 km/h, and, for each algorithm, the best parameters found when it was 
simulated separately. The result is shown in Figure 19. We look for the 
algorithm whose curve first cuts, or passes closest to, the point (SNR = 5 dB, 
BER = $10^{-2}$); for this simulation the best algorithm is the RLS. 
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB, one curve per algorithm: MRC, DMI, LMS, NLMS, NCLMS, RLS, SMI] 
Figure 19: Speed = 100 Km/h 
A.7.3 Third scenario 
The third performance comparison is in terms of the BER, using the default 
parameters of the platform except for the number of users, which is changed to 
16, and, for each algorithm, the best parameters found when it was simulated 
separately. The result is shown in Figure 20. We look for the algorithm whose 
curve first cuts, or passes closest to, the point (SNR = 5 dB, 
BER = $10^{-2}$); for this simulation the best algorithm is the RLS. 
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB, one curve per algorithm: MRC, DMI, LMS, NLMS, NCLMS, RLS, SMI] 
Figure 20: Number of users = 16 
A.7.4 Fourth scenario 
The fourth performance comparison is in terms of the BER, using the default 
parameters of the platform except for the number of users, which is changed to 
50, and the processing gain, which is changed to 64, and, for each algorithm, 
the best parameters found when it was simulated separately. The result is shown 
in Fig. 21. We look for the algorithm whose curve first cuts, or passes closest 
to, the point (SNR = 5 dB, BER = $10^{-2}$); for this simulation the best 
algorithm is the DMI. 
[Plot: BER versus SNR (signal-to-noise ratio), 0 to 16 dB, one curve per algorithm: MRC, DMI, LMS, NLMS, NCLMS, RLS, SMI] 
Figure 21: Number of users = 50, OVSF = 64 
A.7.5 Fifth scenario 
The fifth performance comparison is in terms of the MSE, using the default 
parameters of the platform and, for each algorithm, the parameters that gave 
the best performance in the MSE simulations. The result is shown in Fig. 22. 
The best algorithm for this simulation is the RLS; the SMI comes second. 
[Plot: MSE versus training data length] 
Figure 22: MSE vs number of iterations, default parameters 
A.7.6 Sixth scenario 
The sixth and last performance comparison is in terms of the BER versus the 
number of training data, using the default parameters of the platform and, for 
each algorithm, the parameters that gave the best performance in the 
simulations of BER versus number of training data. The result is shown in 
Fig. 23. The best algorithm for this simulation is the RLS, and the NLMS comes 
second. 
[Plot: BER versus training data] 
Figure 23: BER vs. number of training data, default parameters. 
A.7 Conclusion 
The DMI algorithm converges more rapidly than the LMS but is computationally 
more complex. It requires a reference signal and a matrix inversion. 
The LMS algorithm requires knowledge of the desired signal. As for its 
performance, it always converges, provided the step size is small enough. Two 
significant features of this algorithm are its simplicity of implementation and 
its model independence, hence its robust performance. It requires neither 
measurements of the pertinent correlation functions nor a matrix inversion. 
Indeed, it is the simplicity of this algorithm that has made it the standard 
against which other adaptive filtering algorithms are benchmarked. In addition, 
we found that two factors affect its convergence behavior: the adaptation step 
size $\mu$ and the eigenvalues of the correlation matrix $\mathbf{R}$ of the 
tap-input vector. When $\mu$ is small, the adaptation is slow, which is 
equivalent to the LMS algorithm having a long "memory"; correspondingly, the 
excess mean-squared error after adaptation is small on average, because a large 
amount of data is used by the algorithm to estimate the gradient vector. On the 
other hand, when $\mu$ is large, the adaptation is relatively fast, but at the 
expense of an increase in the average excess mean-squared error after 
adaptation: less data enters the estimation, hence a degraded estimation error 
performance. The speed of convergence of the mean-squared error is affected by 
the spread of the eigenvalues of $\mathbf{R}$ to a lesser extent than the 
convergence of $E[\hat{\mathbf{w}}(n)]$. In addition, the algorithm converges 
slowly if the eigenvalue spread of $\mathbf{R}$ is large. Its major drawback is 
its relatively slow rate of convergence; moreover, it requires a reference 
signal (desired response). In a digital system this can be provided by 
transmitting a training sequence that is known to the receiver or, as the case 
may be, by using the spreading code. 
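The step-size trade-off described above can be illustrated with a minimal complex LMS sketch. This is not the platform code: the function name, the white unit-power test signal, the noise level and the two step sizes are all assumptions chosen for illustration. The update used is the standard one, $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{x}(n)\,e^*(n)$ with $e(n) = d(n) - \mathbf{w}^H(n)\mathbf{x}(n)$.

```python
import numpy as np


def lms(X, d, mu):
    """Complex LMS. X: (n_samples, n_antennas) snapshots, d: reference symbols.

    Returns the final weights and the a priori error at every iteration.
    """
    w = np.zeros(X.shape[1], dtype=complex)
    err = np.empty(len(d), dtype=complex)
    for n, (x, d_n) in enumerate(zip(X, d)):
        err[n] = d_n - np.vdot(w, x)        # e(n) = d(n) - w^H x(n)
        w = w + mu * x * np.conj(err[n])    # w(n+1) = w(n) + mu x(n) e*(n)
    return w, err


# Hypothetical test signal: 4 antennas, white unit-power snapshots, noisy reference.
rng = np.random.default_rng(1)
n_samples, n_ant = 3000, 4
X = (rng.standard_normal((n_samples, n_ant))
     + 1j * rng.standard_normal((n_samples, n_ant))) / np.sqrt(2)
w_true = np.array([0.8, -0.4 + 0.3j, 0.2j, -0.6])
noise = 0.1 * (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples))
d = X @ np.conj(w_true) + noise             # d(n) = w_true^H x(n) + noise

for mu in (0.005, 0.1):
    _, err = lms(X, d, mu)
    early = np.mean(np.abs(err[:100]) ** 2)    # reflects convergence speed
    late = np.mean(np.abs(err[-1000:]) ** 2)   # reflects excess MSE after adaptation
    print(f"mu={mu}: early MSE={early:.4f}, late MSE={late:.4f}")
```

With a white input such as this one, the eigenvalue spread of $\mathbf{R}$ is minimal, so the sketch isolates the effect of $\mu$ alone; a correlated input would be needed to observe the slowdown due to eigenvalue spread.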
Appendix B 
Summary (translated from the French) 
B.1 Introduction 
Recent years have witnessed an unprecedented evolution of the mobile 
telecommunications market, at the level of the handset, of the base station 
(BS) and of the services provided to consumers (e-mail, text messages, 
multimedia, video, conferencing, etc.). To cope with the foreseeable increase in 
the number of users on the one hand and in transmission rates on the other, 
future communication networks will have to deploy increasingly advanced and 
sophisticated techniques. Unfortunately, these new techniques are directly tied 
and proportional to the rise in the cost of the equipment or network 
infrastructure used, which brings a marked cost increase for the user and the 
consumer. 
Several approaches at different cost levels have been considered. One of them 
consists in combining the signals received by the elements of an antenna array 
(several interconnected antennas). This information-processing approach refers 
to systems that use smart antennas. One of the main advantages of these systems 
is the potential increase in the number of users in the cellular network on the 
one hand, and the wider range of services offered by the cellular system on the 
other. 
The appeal of these systems lies in their ability to react automatically to a 
complex environment whose interference is known a priori. They have the 
potential to reduce the interference inherent in multipath propagation, to 
improve the signal-to-noise ratio (SINR), and to allow frequency reuse in a 
confined environment. In addition, they make it possible to lower the side-lobe 
levels in the direction of the interference while keeping the main lobe pointed 
in a useful direction. Moreover, by channeling the energy directly between the 
BS and the handset, ambient noise is reduced. Consequently, interference from 
other users and from obstacles is eliminated. The increase in the number of 
users and the improvement in the quality of service are an asset for future 
third- and fourth-generation wireless systems. 
These systems rely on antenna arrays, devices for computing the angles of 
arrival, and digital synthesis tools. The latter assign weights to the elements 
of the antenna array in order to optimize the output signal. The optimization 
methods use predefined control techniques for beamforming and interference 
cancellation. 
An adaptive antenna array can be defined as an array able to modify its 
radiation pattern. This modification is carried out by an efficient implemented 
algorithm able to meet the desired specifications, more specifically in the 
context of a real-time communication system. 
The evolution of microelectronics and microsystems is characterized by the 
shrinking dimensions of integrated circuits. Over the last four decades, the 
semiconductor industry has continuously improved its products through higher 
integration density, higher operating speed and lower manufacturing cost. 
Indeed, digital transmission is experiencing considerable growth, linked to the 
evolution of the means and methods for designing digital systems (CAD, VHDL, 
FPGA, etc.). Programmable circuits make it possible to develop flexible 
electronic systems and prototypes quickly. 
B.2 Problem statement 
The mobile radio transmission channel is one of the most variable and least 
controllable communication media. Modeling this kind of communication channel 
is often very sophisticated because of the many problems and parameters that 
this type of channel involves. 
Along a path between the transmitter and the receiver, radio waves are subject 
to many irregularities in the morphology, electromagnetic characteristics, 
temperature and humidity of the medium they cross, all of which degrade the 
quality of the signal. As a result, radio transmissions fluctuate in time and 
in space, often with very large variations caused by several propagation 
phenomena. The transmitted signal is therefore degraded, which alters the 
transmitted data in several ways; hence the need for sophisticated systems at 
the receiver to reconstruct the measurand, in other words to recover the 
transmitted data with a very low bit error rate (BER), and hence the need for 
an efficient technology such as smart antennas (see Figure 1). 
[Diagram labels: direct path, indirect path, user #2] 
Figure 1: Example of a wireless network environment. 
In short, there are two main reasons for introducing smart antennas. First, 
they offer the possibility of increasing the capacity of the cellular network 
without having to enlarge the spectrum itself. Second, they help overcome the 
impairments imposed by the transmission channel: multipath fading, Doppler 
shift, inter-symbol interference (ISI) and co-channel interference. One of the 
great research difficulties is to achieve adequate performance gains while 
limiting the increase in complexity of practical implementations. 
The smart-antenna process called beamforming is purely mathematical and plays 
a vital role in smart-antenna systems. There are two types of beamforming: 
adaptive beamforming (TBA) and switched-beam beamforming (ABA). Studies in the 
literature have demonstrated the advantage of the former over the latter in 
tracking channel variations and in noisy environments. 
With a signal model, one observes that the output of the beamforming system is 
a function of the complex weights, so the problem lies in: 
(i) The choice of a suitable algorithm that computes the optimal weights, so 
that the system steers the beam toward the user of interest while keeping the 
contribution of the interfering users minimal. 
(ii) The proposal of a VLSI architecture for the chosen algorithm, which should 
minimize the cost while keeping the same performance. 
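As an illustration of how the beamformer output depends on the complex weights, the sketch below evaluates $y = \mathbf{w}^H\mathbf{x}$ for a uniform linear array. It is only a sketch under stated assumptions: the function names are hypothetical, the angle is measured from broadside, and the weights are those of a conventional (non-adaptive) beamformer, used here only to show beam steering rather than any of the adaptive algorithms evaluated in this work.

```python
import numpy as np


def steering_vector(theta_deg, n_antennas=4, spacing_wavelengths=0.5):
    """Array response of a uniform linear array for a plane wave from theta_deg.

    The angle is measured from broadside (assumption); spacing is in wavelengths.
    """
    m = np.arange(n_antennas)
    phase = 2 * np.pi * spacing_wavelengths * m * np.sin(np.deg2rad(theta_deg))
    return np.exp(-1j * phase)


def beamformer_output(w, x):
    """Beamformer output y = w^H x for one array snapshot x."""
    return np.vdot(w, x)


# Conventional weights steering the beam toward a user at 30 degrees.
w = steering_vector(30.0) / 4
gain_user = abs(beamformer_output(w, steering_vector(30.0)))
gain_interferer = abs(beamformer_output(w, steering_vector(-40.0)))
print(gain_user, gain_interferer)
```

An adaptive algorithm would replace the fixed `w` above with weights updated from the received data, so that the response toward interfering users is reduced further.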
B.3 Research objectives 
The problem has two physically different aspects which, in terms of 
application, are nevertheless strongly interdependent. The first is a software 
and mathematical aspect called "signal processing"; it covers the mathematical 
methods applied to smart antennas. The second is a hardware aspect called 
"implementation"; it covers integrated circuits (DSP, FPGA, ASIC, ...). In our 
case, FPGA programmable components are used. 
The objective of our study is to find, for each aspect, the best combination in 
order to provide a matching optimal solution. 
More specifically, regarding the first problem mentioned above, our objective 
is to insert adaptive-antenna methods into a multi-antenna DS-CDMA platform and 
to evaluate these mathematical algorithms, that is, to examine the existing 
algorithms in terms of performance (error rate as a function of the number of 
users, the number of antennas and the traffic) and of complexity (number of 
arithmetic operations per cycle), in order to single out those that best meet 
the requirements of the wireless communication industry. As a prediction of the 
expected result, the ability to steer the beam of an antenna by applying a 
powerful, high-performance algorithm makes it possible to provide wide coverage 
and to follow the movements of a user (user tracking) within the same cell 
while minimizing noise and interference, without resorting to a rotation 
mechanism (physical mechanism), together with the possibility of obtaining one 
or more beams with a high gain and a narrow half-power beamwidth. 
On the other hand, regarding the second problem mentioned above, our objective 
is to propose VLSI architectures for the chosen algorithms, and then to 
implement these algorithms on FPGA programmable components in order to carry 
out a detailed quantization analysis to determine the precision of the 
algorithm, and to estimate the hardware resources (number of adders, 
multipliers, registers, subtractors, logic cells, buffers, etc.). This last 
step is called synthesis. To complete the research objective, a study of the 
maximum number of users that can be supported in the programmable component 
will be carried out. 
8.4 Méthodologie préconisée 
Pour simuler les algorithmes choisis, une plateforme DS-CDMA a été développé. 
Cette dernière est basée sur le système SIMO. Ses caractéristiques sont les suivantes: 
~ Les éléments du réseau linéaire d'antennes (ULA) sont placés d'une façon qu'on a 
une distance de IJ2 entre chacun. Les angles d'arrivées sont distribués 
uniformément dans un intervalle de [0 BmaxJoùBmax = 1200 • 
➢ The pilot and traffic data use two orthogonal codes. With a gain of 8, the codes used
for the pilot and the traffic are {1, -1, 1, -1, 1, -1, 1, -1} and
{1, 1, 1, 1, 1, 1, 1, 1} respectively.
➢ The channel is modeled by three time-varying paths (Rayleigh fading). The path delays
are [0, 1.1, 3.19] µs, with average powers of [0, -3, -9] dB respectively. The cell
diameter is 5 km. The signal-to-noise ratio is 10 dB. The carrier frequency is 2 GHz,
the chip rate is 3.84 Mcps and the user speed is 60 km/h. By default, four antennas are
used, the number of users is five and the pilot-to-traffic power ratio (PTR) is -6 dB.
➢ The transmission is packetized as follows: the normalized slot size is 2560 chips, and
each frame contains 15 slots (38400 chips). The number of available samples is taken to
be two slots, which is equivalent to 2 × 2560/8 = 2 × 320 data symbols.
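The platform parameters listed above can be checked numerically. The following Python sketch is not part of the original work. It verifies that the two codes are orthogonal, reproduces the slot and frame arithmetic, and derives the maximum Doppler shift implied by the carrier frequency and the user speed. The steering-vector convention (angle measured from broadside, phase reference at the first element) and all function names are assumptions made for illustration.

```python
import cmath
import math

# Values quoted in the platform description.
PILOT_CODE = [1, -1, 1, -1, 1, -1, 1, -1]
TRAFFIC_CODE = [1, 1, 1, 1, 1, 1, 1, 1]
SPREADING_GAIN = 8
CHIPS_PER_SLOT = 2560
SLOTS_PER_FRAME = 15
CARRIER_HZ = 2.0e9
USER_SPEED_KMH = 60.0
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def code_correlation(a, b):
    """Inner product of two codes; zero means they are orthogonal."""
    return sum(x * y for x, y in zip(a, b))


def symbols_per_slot():
    """Number of despread data symbols in one slot."""
    return CHIPS_PER_SLOT // SPREADING_GAIN


def chips_per_frame():
    """Number of chips in one frame of 15 slots."""
    return CHIPS_PER_SLOT * SLOTS_PER_FRAME


def max_doppler_hz():
    """Maximum Doppler shift f_d = v * f_c / c."""
    return (USER_SPEED_KMH / 3.6) * CARRIER_HZ / SPEED_OF_LIGHT


def ula_steering_vector(theta_deg, n_antennas=4, spacing=0.5):
    """Steering vector of a ULA with element spacing given in wavelengths.

    Assumed convention (not taken from the thesis): theta is measured
    from broadside and the first element is the phase reference.
    """
    step = -2.0 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * step * n) for n in range(n_antennas)]


if __name__ == "__main__":
    print("code correlation:", code_correlation(PILOT_CODE, TRAFFIC_CODE))
    print("symbols per slot:", symbols_per_slot())
    print("chips per frame:", chips_per_frame())
    print("max Doppler (Hz):", round(max_doppler_hz(), 1))
```

Under these parameters the maximum Doppler shift is a little over 111 Hz, which indicates how quickly the Rayleigh-fading paths change relative to the slot duration.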
The algorithms available in the literature include the following:
➢ Maximum Ratio Combining (MRC)
➢ Direct Matrix Inversion (DMI)
➢ Eigenvalue Decomposition (EVD)
➢ Chip Eigenvalue Decomposition (Chip-EVD)
➢ Noise Constrained Least Mean Squares (NC-LMS)
➢ Recursive Least Squares (RLS)
➢ Two-Stage RLS (TS-RLS)
➢ Etc.
The algorithms implemented are the MRC, the DMI and the NC-LMS. Two methods were used
for the implementation. The first is known as rapid prototyping and derives from
object-oriented (OO) programming. To simulate the designed architecture, Xilinx blocks
(adders, multipliers, subtractors, registers, etc.) are placed in the Simulink
environment and connected to one another. After simulation, the hardware resources
required on the programmable FPGA devices can be estimated. This technique shortens
design time, and hence time to market, and automatically generates the VHDL code for the
proposed architecture.
The second method uses conventional hand-written VHDL in the Xilinx ISE software
environment. It exposes the implementation details at every design phase (placement and
routing, timing, etc.), which gives the developer close control over the design.
Using different implementation methods provides a broader and deeper view of the results
and supports objective conclusions.
B.5 Proposals
As mentioned above, three algorithms were selected for implementation. The MRC was
chosen because it is the baseline algorithm for smart antennas. The DMI was chosen
because it is a challenge for most engineers, owing to the matrix inversion and its
implementation cost. Finally, the NC-LMS lies at an intermediate level of complexity
between the other two. The MRC and the NC-LMS are implemented using rapid prototyping
(the first method described in this document). The DMI is implemented using manual
coding (the second method).
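As a point of reference for the baseline algorithm, maximum ratio combining weights each antenna branch by the complex conjugate of its channel coefficient before summing. The Python sketch below is a generic textbook form for a single path with known channel coefficients, not the architecture implemented in this work. The function name and the normalization by the total channel energy are assumptions.

```python
def mrc_combine(channel, received):
    """Combine one received sample per antenna using MRC.

    channel  : complex channel coefficient of each antenna (assumed known)
    received : complex sample received on each antenna
    Returns the combined symbol estimate, normalized by the channel energy
    (the normalization is an illustrative choice).
    """
    energy = sum(abs(h) ** 2 for h in channel)
    combined = sum(complex(h).conjugate() * x for h, x in zip(channel, received))
    return combined / energy


if __name__ == "__main__":
    # Hypothetical four-antenna channel and a noiseless BPSK symbol.
    channel = [1 + 1j, 0.5 - 0.2j, -0.3j, 0.8 + 0j]
    symbol = -1
    received = [h * symbol for h in channel]
    print(mrc_combine(channel, received))
```

In the noiseless case the combiner returns the transmitted symbol exactly, because the conjugate weights align the phases of all branches.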
Figure 2 shows the proposed MRC and NC-LMS architectures. These algorithms are mapped in
a direct, straightforward manner. The aim is to compare the complexity of the two and to
apply the NC-LMS to beamforming. Note that the NC-LMS algorithm had not previously been
applied to smart antennas; it is normally used outside this application domain.
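For readers unfamiliar with the NC-LMS, the Python sketch below illustrates the idea on a small real-valued system-identification problem. It assumes the commonly cited form of the noise-constrained update, in which the LMS step size is scaled by (1 + gamma * lambda) and lambda tracks half the difference between the squared error and the known noise variance. The exact variant, the parameter values and the beamforming mapping used in this work are not shown here, so every name and constant below is an assumption.

```python
import random


def nc_lms(inputs, desired, n_taps, noise_var,
           alpha=0.01, gamma=1.0, beta=0.01):
    """Noise-constrained LMS (assumed textbook form, real-valued).

    inputs    : list of input vectors, each of length n_taps
    desired   : list of desired (reference) samples
    noise_var : noise variance, assumed known a priori
    """
    w = [0.0] * n_taps
    lam = 0.0
    for x, d in zip(inputs, desired):
        e = d - sum(wi * xi for wi, xi in zip(w, x))
        step = alpha * (1.0 + gamma * lam)      # variable step size
        w = [wi + step * e * xi for wi, xi in zip(w, x)]
        lam += beta * (0.5 * (e * e - noise_var) - lam)
    return w


def demo():
    """Identify a hypothetical two-tap system observed in noise."""
    rng = random.Random(0)
    true_w = [0.5, -0.3]
    noise_std = 0.1
    inputs = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(5000)]
    desired = [sum(t * xi for t, xi in zip(true_w, x)) + rng.gauss(0.0, noise_std)
               for x in inputs]
    return true_w, nc_lms(inputs, desired, len(true_w), noise_std ** 2)


if __name__ == "__main__":
    true_w, estimate = demo()
    print("true:", true_w, "estimate:", estimate)
```

Compared with a plain LMS, the only extra state is the scalar lambda, which is consistent with the NC-LMS sitting between the MRC and the DMI in complexity.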
Figure 2: Proposed implementation of the MRC (a) and the NC-LMS (b).
The general architecture proposed for the DMI is shown in Figure 3. It is composed of
several pipelined and parallelized blocks. Within the general architecture, parallel and
systolic sub-architectures are used.
Figure 3: General architecture proposed for the SD-DMI.
B.6 Results
In the previous sections we proposed architectures to address the problem stated
earlier. The results of implementing these architectures are discussed next.
Two types of results are presented: first, comparative studies of the different
algorithms; second, the complexity, quantization and implementation of these algorithms.
B.6.1 Comparative studies
The first comparative study is a function of the pilot-to-traffic power ratio (PTR).
According to the results (Figure 4), the DMI shows a limited degradation in regions of
low PTR. Conversely, this algorithm achieves very high performance in regions where the
PTR is high.
Figure 4: Effect of varying the PTR (BER versus PTR in dB, from -12 to 0 dB, for MRC,
DMI, EVD, Chip-EVD, NC-LMS, RLS and TS-RLS).
The second comparative study is a function of the number of antennas (elements) in the
base station (linear array). The PTR and the number of users are fixed at -6 dB and 5
respectively. The results show that the TS-RLS algorithm gives the best performance
(convergence), and that the DMI responds similarly to the TS-RLS. The details are shown
in Figure 5.
Figure 5: Effect of varying the number of antennas (BER versus number of antennas L,
from 2 to 8, for MRC, DMI, EVD, Chip-EVD, NC-LMS, RLS and Two-Stage RLS).
To obtain an objective scientific comparison, the third comparative study is based on
the number of users (Figure 6). The PTR and the number of antennas are fixed at -6 dB
and 4 respectively. Compared with the second test, the performance of the TS-RLS is the
same, bearing in mind that the two tests represent different environments. The DMI and
the MRC have equivalent performance.
Figure 6: Effect of varying the number of users (BER versus number of users K, from 3
to 7, for MRC, DMI, EVD, Chip-EVD, NC-LMS, RLS and TS-RLS).
In summary, according to the three preceding simulation studies, the TS-RLS gives the
best performance. However, it is considerably more complex than the DMI algorithm.
B.6.2 Complexity, quantization and implementation studies
This section presents the complexity study of the implemented algorithms. The
algorithms selected for this study are the NC-LMS, the MRC and the DMI.
The study shows that the NC-LMS is 2.5 times more complex than the MRC and therefore
requires 2.5 times the hardware resources. The latency (delay) is 15 cycles for the
NC-LMS and 10 cycles for the MRC. To execute one operation, the MRC requires a minimum
period of 10 ns, for a maximum frequency of 95 MHz. Similarly, the NC-LMS requires a
minimum period of 30 ns, for a maximum frequency of 33 MHz. The results (Table 1) show
that implementing the NC-LMS with 4 antennas and 5 users requires an expensive FPGA
(Virtex XC4VSX55).
Table 1: Percentage of slices (%S) and multipliers (%M) for 4 antennas.

Maximum Ratio Combining (MRC)
Device   Resource   X = 32,16   24,16   18,10   16,12
VP30     %S         29.4        22.1    10.1    9.0
VP30     %M         N/A         N/A     47      47
VP100    %S         9.1         6.8     3.1     2.8
VP100    %M         57.6        57.6    14.4    14.4
VSX55    %S         16.4        12.3    5.7     5.0
VSX55    %M         50          50      12.5    12.5

Noise Constrained Least Mean Squares (NC-LMS)
Device   Resource   X = 32,16   24,16   18,10   16,12
VP30     %S         73.9        58.9    34.7    26.0
VP30     %M         N/A         N/A     88.2    88.2
VP100    %S         22.9        18.3    10.8    8.1
VP100    %M         N/A         N/A     27      27
VSX55    %S         41.16       32.8    19.3    14.5
VSX55    %M         93.8        93.8    23.4    23.4
The quantization study shows that the MRC is more sensitive to quantization than the
NC-LMS, owing to the adaptive nature of the latter. On the other hand, the MRC is 2.5
times less costly than the NC-LMS.
During the implementation of the DMI algorithm, the matrix inverse was solved by the
iterative gradient descent method. We observed that this method is inexpensive and, in
addition, the redesign revealed several characteristics of the DMI. Solving the matrix
inverse by the proposed method has many advantages, particularly in noisy environments
and in tracking the variations of the mobile channel.
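The principle of replacing an explicit matrix inverse by an iterative descent can be sketched as follows. The Python example assumes that the DMI weights solve a linear system R w = p, with R a Hermitian positive-definite correlation matrix and p a reference vector, and it uses a fixed step size equal to the inverse of the trace of R. These symbols, the step-size rule and the iteration count are assumptions for illustration. They do not reproduce the recursion or the architecture of this work.

```python
def matvec(matrix, vector):
    """Matrix-vector product for lists of complex numbers."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def solve_by_gradient_descent(R, p, iterations=200):
    """Approximate w = R^{-1} p without inverting R.

    Each iteration moves w along the residual p - R w. For a Hermitian
    positive-definite R, the step 1/trace(R) is below 2/lambda_max,
    which guarantees convergence (illustrative choice of step).
    """
    n = len(p)
    mu = 1.0 / sum(R[i][i].real for i in range(n))
    w = [0j] * n
    for _ in range(iterations):
        residual = [pi - ri for pi, ri in zip(p, matvec(R, w))]
        w = [wi + mu * ei for wi, ei in zip(w, residual)]
    return w


def residual_norm(R, p, w):
    """Euclidean norm of p - R w, used to check the solution."""
    return sum(abs(pi - ri) ** 2 for pi, ri in zip(p, matvec(R, w))) ** 0.5


if __name__ == "__main__":
    # Hypothetical 2 x 2 Hermitian positive-definite matrix.
    R = [[2 + 0j, 0.5 + 0.5j],
         [0.5 - 0.5j, 1 + 0j]]
    p = [1 + 0j, 1j]
    w = solve_by_gradient_descent(R, p)
    print(w, residual_norm(R, p, w))
```

Each iteration needs only multiply-accumulate operations, which map naturally onto pipelined and parallel hardware. When R changes slowly, the previous weights can also serve as the starting point of the next solve.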
1 X = (a, b), where "a" denotes the fractional part and "b" denotes the decimal part.
The quantization study for the DMI shows that 14 bits are sufficient to obtain adequate
performance. For additional precision, we opted for a 16-bit implementation.
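The effect of word length can be illustrated with a small fixed-point quantizer. The Python sketch below assumes a signed two's-complement format with rounding to nearest and saturation. The split between integer and fractional bits (10 fractional bits out of 14, and 12 out of 16) is a hypothetical choice, since the source does not state the formats used for the DMI.

```python
def quantize(value, total_bits, frac_bits):
    """Quantize a real value to signed fixed point.

    total_bits : word length, including the sign bit
    frac_bits  : number of bits after the binary point
    Rounds to the nearest representable value and saturates on overflow.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale


if __name__ == "__main__":
    sample = 0.3
    for total_bits, frac_bits in [(14, 10), (16, 12)]:  # assumed formats
        q = quantize(sample, total_bits, frac_bits)
        print(total_bits, "bits:", q, "error:", abs(q - sample))
```

With the same integer range, two extra fractional bits reduce the worst-case rounding error by a factor of four, which matches the motivation for moving from 14 to 16 bits.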
The architecture was shown to be well parallelized and pipelined, with a total latency
of T_Latency = 6L + 16 cycles, where L is the number of antennas. For L = 4, the latency
of the complete architecture is therefore 40 cycles.
Finally, a study was carried out to maximize the number of users on the programmable
FPGA device. To this end, several optimization loops (based on the number of slices and
multipliers) were performed. Several blocks containing a large number of multipliers had
been inserted in the architecture; one optimization is to synthesize the multipliers in
LUTs, which markedly reduces the number of dedicated multipliers used.
The best trade-off makes it possible to reach a maximum of 90 users on a VP50
(Virtex-II Pro), which is not very expensive.
The optimization result leads to a clear finding: 90 users can be reached with an
algorithm as complex as the DMI, at a frequency (throughput) of 2 MHz. According to
Table 2, the redesigned DMI is a good solution for smart antennas.
Table 2: Maximum number of users reached on a programmable device for L = 4, J = 10 and
M = 3.

Family          K_MAX for each device (device labels illegible)
Virtex-II       2     15    37    65
Virtex-II Pro   4     10    53    90
B.7 Conclusion 
In this work, a literature review was carried out, in which a series of algorithms in
the field of smart antennas was identified. The remainder of this conclusion focuses on
two main aspects of this research: the first presents the accomplishments, and the
second outlines future work to ensure the continuity of this project.
Accomplishments: two original contributions were made during this research:
1) the evaluation of several algorithms applied to smart antennas;
2) the implementation of three algorithms found in the literature (MRC, NC-LMS, DMI) on
programmable FPGA devices.
These contributions have strengths and original aspects that characterize the work
carried out, as follows:
➢ A well-developed DS-CDMA platform: the DS-CDMA platform developed has characteristics
and simulation conditions very close to a real communication environment, which yielded
realistic performance figures for the simulated algorithms.
➢ Implementation of the NC-LMS: the NC-LMS algorithm was integrated into smart antennas.
Note that this algorithm had not previously been used in this field.
➢ Use of rapid prototyping: the MRC and NC-LMS algorithms were implemented using rapid
prototyping, which allowed an in-depth study of the tool and of the various
implementation techniques.
➢ Reverse engineering of the DMI algorithm: the redesign of the DMI algorithm for
solving the matrix inverse brought many advantages for tracking users in noisy
environments. It also facilitated parallelism and pipelining to increase throughput.
➢ Use of manual coding: implementing the DMI with the ISE tool provided a good
understanding of the various integration phases.
➢ Estimation of the number of users: according to the literature review, estimating the
number of users had not been a focus of researchers in the smart antenna field. A strong
and original point of this research is the estimation of the number of users supported
by the programmable device for smart antenna systems, in order to find a trade-off
between the programmable device and the implemented algorithm.
Future work: this section gives an overview of possible extensions of the project
carried out in this master's thesis.
➢ Development of algorithms that outperform the DMI and the TS-RLS (BER < 0.02).
➢ Development of low-power, low-complexity architectures for the algorithms developed.
Such an implementation would serve as a processor core within an integrated circuit.
➢ Implementation of the architectures for the algorithms developed on FPGAs and ASICs.
➢ Development of an SoC (System on Chip) based on smart antenna systems.
B.8 Publications 
➢ Elie H. Sarraf, Messaoud Ahmed-Ouameur, and Daniel Massicotte, "FPGA Implementation
of Beamforming Receiver Based on MRC and NC-LMS for DS-CDMA System", IEEE International
Conference on Application-specific Systems, Architectures and Processors, pp. 114-117,
Steamboat Springs, CO, September 11-13, 2006.
➢ Elie H. Sarraf, Messaoud Ahmed-Ouameur, and Daniel Massicotte, "FPGA Design and
Implementation of Direct Matrix Inversion Beamforming based on a Steepest Descent
Method" (accepted at MWCAS 2007, Montréal, August 5-8, 2007).
