Reconfigurable cores for wireless appliances: Turbo codes by Brown, Edward
 
 
 
 
 
 
 
https://theses.gla.ac.uk/ 
 
 
 
 
Theses Digitisation: 
https://www.gla.ac.uk/myglasgow/research/enlighten/theses/digitisation/ 
This is a digitised version of the original print thesis. 
 
 
 
 
 
 
 
 
Copyright and moral rights for this work are retained by the author 
 
A copy can be downloaded for personal non-commercial research or study, 
without prior permission or charge 
 
This work cannot be reproduced or quoted extensively from without first 
obtaining permission in writing from the author 
 
The content must not be changed in any way or sold commercially in any 
format or medium without the formal permission of the author 
 
When referring to this work, full bibliographic details including the author, 
title, awarding institution and date of the thesis must be given 
 
 
 
 
 
 
 
 
 
 
 
 
 
Enlighten: Theses 
https://theses.gla.ac.uk/ 
research-enlighten@glasgow.ac.uk 
 
 
 
 
 
 
 
Reconfigurable Cores for Wireless 
Appliances: Turbo Codes
VOLUME 1 (OF 2)
Edward Brown
A themed portfolio submitted to 
The Universities of
Edinburgh,
Glasgow,
Heriot-Watt and 
Strathclyde
For the Degree of 
Doctor of Engineering in System Level Integration
© Edward Brown, May 2004
Abstract
This portfolio thesis documents the work carried out by the author whilst enrolled in the
Engineering Doctorate (EngD) programme. The author completed the majority of the work
at the sponsoring company, Xilinx, in their Digital Signal Processing division.
The thesis introduces the subject of Turbo codes, highlighting the motivation behind their
inclusion in international standards. Particular attention is given to the cdma2000 and
UMTS third generation mobile telephony standards. Both the technical and commercial
advantages and disadvantages of implementing Turbo codes in a Field Programmable Gate
Array (FPGA) based system are discussed. Third generation mobile technology is also
discussed, including an introduction to spread spectrum and rake receivers.
The commercial relevance of all projects conducted is discussed. These projects allowed
the sponsoring company to highlight the advantages of using FPGAs in third generation
mobile base stations.
A novel system for testing forward error correction (FEC) codes is presented. Results
obtained are shown and discussed. A novel parameterisable Turbo decoder is also
highlighted. The decoder allows the user to specify criteria that control the memory used
by the decoder and its latency. A novel hardware architecture for Turbo decoders is
proposed, as is a unique channel variance value that optimises a cdma2000 Turbo decoder.
Other subjects covered are Duo-Binary Turbo codes, Turbo decoder hardware architectures
and how to calculate the input values to Turbo decoders.
EngD Portfolio -  Volume 1, Edward Brown i
Declaration of Originality
The material in this thesis is entirely the results of my own independent research under the 
supervision of Dr J. M. Irvine and Mr B. Wilkie and is not the outcome of any collaborative work. 
All published or unpublished material used in this thesis has been given full acknowledgement.
Edward Brown 
28 September 2005
Acknowledgements
On a professional level I would like to thank both of my supervisors, Bill Wilkie and Dr.
James Irvine, for all the support and advice they have provided throughout my EngD. I am
extremely grateful for the time they spent advising me on all matters, from reviewing my
conference submissions to answering technical queries. Special thanks also go to Dr. Colin
Carruthers and Dr. David Lawrie. Colin, along with Bill, awarded me the opportunity to
work with Xilinx, while David proved an invaluable resource throughout the latter stages of
my EngD project. I also would not have been able to enrol on the EngD without the help of
my former supervisor, Patrick Lysaght. I would therefore, also like to thank Patrick for
giving me the opportunity to do the EngD.
I would also like to thank all of Xilinx Edinburgh, especially the DSP team, for making me
feel so welcome. Also to Ann and Rhona, thanks for driving and/or entertaining me on all
those long journeys along the M8.
I would like to express my gratitude to the ISLI and all its staff, particularly Alexandra
Buchanan for all the help and guidance she has provided. Other staff I would like to thank
are Sian Williams for all her help in the library, and also for helping with EngD matters in
the latter stages of my degree, including arranging my viva, and Wendy Glendinning for all
her help during the MSc portion of the EngD.
On a more personal basis I'd like to thank my Mum and Dad for all the support they have
given me throughout not only my EngD, but my undergraduate degree as well. I would like
to thank my sister and nephew for always supporting me throughout both of my degrees.
I'd like to thank all my friends, particularly Linda and Farah for supporting me throughout
my EngD, Linda and Louise, my flatmates, for putting up with me throughout my write up
and my golfing/drinking buddies, Richie, Steve, Nige, Jamie, Andy and Brian for providing
some much needed distraction over the last 5 years.
Glossary
3G
    Third generation mobile, an advancement from 2G standards, such as GSM and IS-95,
    that allows higher data rates. The European standard is UMTS; the United States of
    America standard is cdma2000.
Alpha probability
    One of the three probabilities required to calculate the final Turbo decoder output.
    The alpha probability is calculated by performing a forward trace of the trellis.
ASIC
    Application Specific Integrated Circuit. An integrated circuit that is not
    reconfigurable. It is used for one particular purpose.
AWGN channel
    Additive white Gaussian noise channel. A channel that is often used in communication
    system models. The name comes from the fact that the noise is added to the signal, has
    a flat power spectral density (white) and has a Gaussian amplitude distribution.
BER
    Bit error rate. A measure of how many bits in a particular sequence are in error.
    Usually, BER is given in powers of ten, e.g. $10^{-6}$, meaning 1 bit in every
    1 million bits is in error.
BERT
    BER Tester. A platform that can be used to determine the BER of a particular system.
Beta probability
    Required to calculate the final Turbo decoder output. Calculated by performing a
    backward trace of the trellis.
Block code
    A type of forward error correction (FEC) code that relies only on the current input
    for its output.
cdma2000
    The 3G standard to be implemented in the United States of America.
Coding gain
    The difference in dB between two BER curves, when compared at the same BER value.
Convolutional code
    A type of FEC code whose current output is dependent upon the previous m inputs.
DSP
    Digital Signal Processor. A specialised microprocessor that is used in digital signal
    processing (DSP). It contains components, such as Multiply-Accumulators (MACs), that
    are often used in DSP.
DVB
    Digital Video Broadcasting. A standard that uses Turbo coding on its reverse satellite
    channel.
Eb/N0
    A measure of signal to noise ratio, where Eb is the energy per bit and N0 is the noise
    spectral density.
Fading channel
    A channel model that is more like a real mobile channel than the AWGN channel
    introduced earlier. The most common fading channels are the Rayleigh channel and the
    Rician channel. The Rayleigh channel is a model where the source and destination have
    no direct line of sight. In the Rician channel, the source and destination have a
    direct line of sight, or there is a dominant component among the signals received by
    the destination.
Fast termination
    A method of reducing the average number of iterations performed by the Turbo decoder.
FEC
    Forward Error Correction. A method of detecting and correcting errors in a
    communications system.
FER
    Frame Error Rate. A measure of error in an FEC system. As data in 3G systems is
    transmitted in frames, this provides a quantifiable measure of how many frames would
    be dropped for any given system.
FPGA
    Field Programmable Gate Array. A reconfigurable integrated circuit containing DSP
    specific components, such as multipliers.
Gamma probability
    The gamma probability is required to calculate the final output from the Turbo
    decoder. It is also known as the branch probability.
Interleaver
    An interleaver is used to de-correlate data entering the second Turbo component
    encoder from the first Turbo component encoder, improving the BER performance of the
    Turbo code in the process.
LLR
    Log likelihood ratio. Data input to a Turbo decoder is in the format of an LLR. An LLR
    gives the probability of a particular symbol representing either a 1 or a 0. The
    larger the magnitude of the LLR, the higher the probability that it is correct.
Log-MAP
    Applying the MAP algorithm in the logarithm domain allows it to be implemented in
    hardware more easily: operations such as exponentials disappear and multiplications
    turn into additions.
MAP
    Maximum a posteriori. The MAP algorithm is used in Turbo codes to calculate the
    probability that a particular bit at a particular point in time is either a 1 or a 0.
Max-log-MAP
    A simpler variation of the log-MAP algorithm. The log-MAP algorithm requires a look up
    table to produce its output. The max-log-MAP algorithm simplifies the log-MAP by
    removing the look up table.
Max Scale
    The Max Scale is a variation of the max-log-MAP algorithm. It applies a scaling factor
    to the Turbo decoder output, improving the BER performance of the Turbo decoder.
Puncturing
    A method of increasing the information rate of a FEC code. The FEC code is punctured
    by removing certain symbols at the Turbo encoder output. The punctured symbols are
    replaced by an all zero codeword at the Turbo decoder input, thereby decreasing the
    BER performance.
RSC
    Recursive systematic convolutional. Turbo component encoders are RSC encoders. An RSC
    encoder is one that contains feedback and whose input is a part of the encoder output
    symbol.
SISO
    Soft-in, soft-out. Turbo component decoders are SISO decoders. By both inputting and
    outputting soft bits, the component decoders can share information on what they think
    the Turbo decoder output should be. The main types of SISO decoder used implement
    either the MAP algorithm or the SOVA.
SOVA
    Soft output Viterbi algorithm. A variation of the Viterbi algorithm that outputs soft
    bits, allowing it to be used to implement a Turbo component decoder.
Spread spectrum
    Almost all 3G standards are based on spread spectrum systems. Spread spectrum is a
    term that describes a system that allows users to share both time and frequency.
Turbo code
    A Turbo code is a type of FEC code. Traditional Turbo codes use RSC encoders as
    component encoders; Turbo product codes use block encoders as component encoders.
Turbo encoder
    The Turbo encoder is used at the transmission side of the communications system. It
    contains 2 encoders separated by an interleaver. Each encoder is known as a component
    encoder.
Turbo decoder
    The Turbo decoder is used at the receiver side of the communications system. It
    contains 2 SISO decoders separated by an interleaver and de-interleaver. Each SISO
    decoder is known as a component decoder.
UMTS
    The name of the 3G standard implemented in Europe.
VHDL
    Very high speed integrated circuit (VHSIC) hardware description language. A
    programming language used to model hardware systems. Widely used when designing FPGA
    circuits.
Sliding window
    A technique used to reduce the overall latency of a Turbo decoder. It allows the Turbo
    decoder to begin decoding before a complete set of data has been received.
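
The difference between the log-MAP and max-log-MAP entries above can be illustrated with a short sketch. It assumes the usual textbook formulation, in which the log-domain sum is a maximum plus a correction term (the quantity a look up table would typically hold) and the max-log-MAP variant simply drops that term. The function names are hypothetical and are not taken from the thesis or from any Xilinx core.

```python
import math


def max_star(a: float, b: float) -> float:
    """Log-MAP style combine: exact log(exp(a) + exp(b)).

    The log1p term is the correction that a hardware look up table
    would approximate (assumption: standard Jacobian logarithm form).
    """
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_approx(a: float, b: float) -> float:
    """Max-log-MAP style combine: the correction term is dropped."""
    return max(a, b)


if __name__ == "__main__":
    # The approximation error is largest when both metrics are equal
    # and shrinks as the metrics move apart.
    for a, b in [(0.0, 0.0), (1.0, -1.0), (5.0, -5.0)]:
        print(a, b, max_star(a, b) - max_approx(a, b))
```

The correction is bounded by log 2 and is only significant when the two metrics are close, which is why removing it trades a small loss in BER performance for simpler hardware.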
Symbols Used
$\alpha_t(s)$  Forward state metric/probability
$\beta_{t-1}(s')$  Backward state metric/probability
$\gamma_k(s', s)$  Branch metric/probability
$t$  Measure of time
$i$  Number of decoder iterations
$L$  Measure of memory
$L(u_k)$  Log-likelihood ratio
$L_e(u_k)$  Extrinsic data
$L_{in}(u_k)$  Component decoder extrinsic input
$L_{out}(u_k)$  Component decoder extrinsic output
$m$  Number of component encoders/decoders
$N$  Block size
$q$  Symbol width
$R$  Code rate
$S$  Number of states in trellis
$u_t$  Turbo encoder input
$\hat{u}_t$  Turbo decoder output
$w$  Sliding window width
$X_t^{s}$  RSC 1 systematic output
$X_t^{\prime s}$  RSC 2 systematic output
$X_t^{p}$  RSC 1 parity output
$X_t^{\prime p}$  RSC 2 parity output
$Y_t^{s}$  SISO 1 systematic input
$Y_t^{\prime s}$  SISO 2 systematic input
$Y_t^{p}$  SISO 1 parity input
$Y_t^{\prime p}$  SISO 2 parity input
$y$  Turbo decoder input
Contents
1. Executive Summary
1.1. Projects
1.2. Commercial Relevance
1.3. Novelty
1.4. Milestones
2. Portfolio Organisation
3. Taught Modules
3.1. Technical Credits
3.2. Business Modules
4. Publications
5. Commercial Relevance
5.1. FPGAs
5.2. Contribution
6. Technical Background
6.1. Convolutional Codes
6.2. Turbo Codes
6.2.1. Turbo Encoder
6.2.2. Interleaver
6.2.3. Turbo Decoder
6.3. Field Programmable Gate Arrays
6.4. Third Generation Mobile Technology
7. Results and Discussion
7.1. Bit Error Rate Test Platform
7.2. Block Size
7.3. Iterations
7.4. Code Rate
7.5. Parameterisable Turbo Decoder
7.6. cdma2000 vs. UMTS
7.7. Channel Variance
7.8. Comparison With Published Results
8. Conclusion
8.1. Future Direction
9. References
List of Figures
Figure 1: Work carried out by the EngD over a four year period
Figure 2: Organisation of Volume 1
Figure 3: Overview of a simple communications system
Figure 4: Convolutional encoder
Figure 5: RSC encoder
Figure 6: RSC encoder with trellis termination
Figure 7: Top-level view of a Turbo encoder
Figure 8: Top-level view of a Turbo decoder
Figure 9: UMTS component encoder
Figure 10: cdma2000 component encoder
Figure 11: DVB component encoder
Figure 12: cdma2000 standard interleaver
Figure 13: Calculating alpha probability
Figure 14: Calculating beta probability
Figure 15: Add, compare, select, offset unit
Figure 16: FPGA basic structure
Figure 17: System Generator DSP design flow
Figure 18: System Generator hardware design flow
Figure 19: Spread spectrum signal
Figure 20: Multipath channel
Figure 21: Signal received at base station
Figure 22: Rake receiver
Figure 23: Transport channel of a typical 3G system
Figure 24: BERT Platform top-level
Figure 25: Procedure for running BERT
Figure 26: System Generator implementation of BERT platform
Figure 27: BER performance of cdma2000 Turbo codec as block size changes
Figure 28: BER performance of UMTS Turbo codec as block size changes
Figure 29: BER performance of cdma2000 Turbo codec as number of iterations performed increases
Figure 30: BER performance of UMTS Turbo codec as number of iterations performed increases
Figure 31: BER plot of different fast termination thresholds
Figure 32: Average iterations plot for different fast termination thresholds
Figure 33: Turbo decoder with fast termination implemented
Figure 34: Effect of using different code rates on BER performance in cdma2000 Turbo codec
Figure 35: Sliding window technique
Figure 36: Decoding of packet with block size 256 using traditional Turbo decoder, Turbo decoder with sliding window 64 and Turbo decoder with sliding window 32
Figure 37: FER plot showing different sliding window sizes
Figure 38: BER plot for different internal metrics
Figure 39: BER performance of log-MAP, Max-log-MAP and Max Scale algorithm
Figure 40: Comparison of cdma2000 and UMTS Turbo decoders
Figure 41: Plot of against noise variance
Figure 42: Non-optimal BER performance of code rate y using $\sigma^2$ value of 1
Figure 43: Code rate y BER plots for noise variance values between 1 and 1.35
Figure 44: Optimal BER performance of code rate y using $\sigma^2$ value of 1.1
Figure 45: Comparison of BER plots with known $\sigma^2$ and BER plots using $\sigma^2 = 1.1$
Figure 46: BER plot for comparison with Valenti and Sun [57]
Figure 47: BER plot for comparison with Qi [58]
Figure 48: BER plot for comparison with Sugimoto et al [59]
Figure 49: BER plot for comparison with Sugimoto et al [59]
Figure 50: BER plot for comparison with Valenti and Sun [57]
List of Tables
Table 1: Technical credits breakdown
Table 2: Business credits breakdown
Table 3: cdma2000 puncturing patterns for data
Table 4: cdma2000 puncturing patterns for trellis termination
Table 5: DVB standard puncturing patterns
Table 6: Resources available and resources required for various BERT implementations
Table 7: FPGA resources required for different internal fractional widths
Table 8: FPGA resources used for different SISO algorithms
Table 9: Comparison of Valenti and Sun [57] with results produced by the RE at BER of $10^{-6}$
Table 10: Comparison of Qi [58] with results produced by the RE
Table 11: Comparison of Sugimoto et al [59]
Table 12: Comparison of Sugimoto et al [59]
Table 13: Comparison of Valenti and Sun [57] with results produced by the RE at BER of $10^{-6}$
1. Executive Summary
Chapter 1: Executive Summary
The initial specification of the Engineering Doctorate (EngD) industrial project stated that
by the time of its completion the EngD Research Engineer (RE) would have
investigated new opportunities within wireless appliances to exploit the reconfigurability of
Field Programmable Gate Arrays (FPGAs) as a key, product-differentiating feature. This
was achieved through numerous projects, all of which were based around Turbo codes.
1.1. Projects
The first stage of the industrial project was to decide on a task that was mutually beneficial
to the sponsoring company, Xilinx, and the RE. Xilinx had acquired a Turbo encoder and
Turbo decoder from Frontier Designs [1] that conformed to the UMTS specification [2].
To create the Turbo encoder and decoder, Frontier used their A|RT Designer tool [3, 4] to
convert C++ code to a synthesizable VHDL netlist. As Xilinx intellectual property (IP) is
written in structural VHDL it can take a relatively long time to simulate. To overcome this
problem Xilinx usually supplied a behavioural model of the design to customers for
simulation purposes; Frontier had not supplied one. It was decided that producing this model
would be a perfect learning opportunity for the RE to familiarise himself with Turbo codes, while
producing something that benefited the sponsoring company and their customers. This
stage of the project included both a literature search and practical work. The work carried
out in this portion of the industrial project is summarised in Appendices A-C {vol. 2/pp.
1-25}. The time scale to complete this portion of the project can be obtained from Figure 1.
The reports produced by the RE during this project were of great commercial relevance to
Xilinx, particularly ‘Investigation of Turbo Decoder Hardware Architectures’, Appendix D
{vol. 2/pp. 26-40}. At this time Xilinx were designing a Turbo codec that complied with
the cdma2000 standard [5]. ‘Investigation of Turbo Decoder Hardware Architectures’
allowed Xilinx to determine what hardware architecture should be used for their cdma2000
Turbo decoder.
Figure 1 highlights the commercial relevance and novelty of all projects undertaken by the
RE and shows milestones passed by the RE as the industrial project progressed.
The second project tackled was to use a bit error rate test (BERT) platform [6] created by
Nallatech to test the Frontier Turbo codec. The forward error correction (FEC) designs
were instantiated in the BERT using a VHDL wrapper. The FEC designs could be either
written in VHDL or supplied as an EDIF netlist. The FEC design under test (DUT) could then be
stimulated by random data supplied by a Matlab script. Throughout the life of this project
there were conflicts between the EDIF netlist supplied by Frontier and the Nallatech BERT;
as a result the Frontier Turbo codec could not be tested. The development of the Nallatech
BERT was passed to a staff Engineer at Xilinx, while the RE began to develop a recursive
systematic convolutional (RSC) encoder that conformed to the cdma2000 Turbo code
standard. In between the two projects the RE was able to present a poster [7] highlighting
the advantages of using both Turbo codes and FPGAs.
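
The principle of a BERT, as described above, can be sketched in a few lines: random data stimulates the design under test, the output is compared with the input, and the errors are counted. The sketch below is not the Nallatech platform or its Matlab script. It is an illustration in Python that assumes uncoded BPSK over an AWGN channel and uses a hard decision as a stand-in for the DUT; the function name and parameters are hypothetical.

```python
import math
import random


def measure_ber(ebn0_db: float, n_bits: int, seed: int = 0) -> float:
    """Estimate the BER of uncoded BPSK over AWGN (illustrative stand-in for a DUT)."""
    rng = random.Random(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    # Noise standard deviation for unit-energy symbols and code rate 1 (assumption).
    sigma = math.sqrt(1.0 / (2.0 * ebn0))
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)                # random stimulus
        symbol = 1.0 - 2.0 * bit                # map 0 -> +1, 1 -> -1
        received = symbol + rng.gauss(0.0, sigma)
        decided = 0 if received >= 0.0 else 1   # placeholder for the decoder under test
        if decided != bit:
            errors += 1
    return errors / n_bits


if __name__ == "__main__":
    for ebn0_db in (0.0, 3.0, 6.0):
        print(ebn0_db, measure_ber(ebn0_db, 20000))
```

In a real test platform the hard decision would be replaced by the encoder, channel model and decoder under test, and many more bits would be needed to resolve low error rates.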
While the cdma2000 encoder was being developed Xilinx became interested in the subject
of Duo-Binary Turbo codes, which were to be used in the Digital Video Broadcasting
(DVB) standard [8]. The RE was asked to research the subject of Duo-Binary Turbo codes
and compile a report on the subject, Appendix E {vol. 2/pp. 41-50}, paying particular
attention to the differences between Duo-Binary and standard Turbo codes. This allowed
Xilinx to determine the effort required to convert their cdma2000 codec to a DVB codec.
Once the cdma2000 Turbo codec was complete it was tested using the Nallatech BERT
platform. This helped Xilinx verify that their codec design was working correctly while
creating marketing data for their design. As with most Xilinx cores the Turbo decoder was
parameterisable. The user could specify parameters such as sliding window size, decoder
input width and which algorithm to use for the decoder. The parameterisable aspect of the
design allowed the RE to test the design using various configurations. The results of these
tests were presented by the RE in two separate publications [9, 10]. These papers helped
Xilinx publicise the reconfigurable nature of their design while allowing the RE to publish
novel research.
One subject that was of interest to both the RE and Xilinx was how to manipulate
the input data to a Turbo decoder so that the best possible BER performance was achieved.
The result of this investigation was the discovery of a novel variance value that optimised
the performance of Xilinx’s Turbo decoder; this is documented in Appendix F {vol. 2/pp.
51-65}. When this literature search was taking place it became apparent that the Nallatech
BERT platform was not performing optimally: often the BERT would crash when a test
was running. It was therefore necessary that the problems with the Nallatech BERT be
solved or a new system be designed. Around the same time a new version of the Xilinx
System Generator tool [11] had been released. It was decided that the RE should
implement a BERT in System Generator. Implementing a BERT in System Generator
allowed the RE to create something that was completely novel and allowed Xilinx to
investigate how the System Generator tool performs when it is used in system development.
All results presented in this thesis were obtained using the System Generator BERT. A
System Generator design can be altered repeatedly and implemented on any compatible
Xilinx FPGA platform [12, 13, 14], further highlighting the reconfigurable nature of the
System Generator tool and Xilinx designs.
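The principle of a BERT (random stimulus into a design under test, a noisy channel, then a count of bit errors at the output) can be illustrated in software. The sketch below is not the System Generator design, which is a hardware implementation; it is a minimal model under stated assumptions. The DUT is a placeholder 3x repetition code rather than a Turbo codec, the channel is assumed to be BPSK over additive white Gaussian noise, and the names `dut_encode`, `dut_decode` and `measure_ber` are hypothetical.

```python
import math
import random


def dut_encode(bits):
    # Placeholder DUT encoder (assumption): 3x repetition code, rate 1/3.
    return [b for b in bits for _ in range(3)]


def dut_decode(soft):
    # Placeholder DUT decoder: sum the three soft values per bit, then threshold.
    return [1 if sum(soft[i:i + 3]) > 0 else 0 for i in range(0, len(soft), 3)]


def measure_ber(n_bits, ebn0_db, seed=0):
    """Return the measured bit error rate for one Eb/N0 point."""
    rng = random.Random(seed)
    rate = 1 / 3  # code rate of the placeholder DUT
    # Noise standard deviation for BPSK with unit symbol energy.
    sigma = math.sqrt(1 / (2 * rate * 10 ** (ebn0_db / 10)))
    tx_bits = [rng.randint(0, 1) for _ in range(n_bits)]
    symbols = [1.0 if c else -1.0 for c in dut_encode(tx_bits)]
    received = [s + rng.gauss(0.0, sigma) for s in symbols]
    rx_bits = dut_decode(received)
    errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
    return errors / n_bits
```

Calling `measure_ber` over a range of Eb/N0 values gives the points of a BER plot. A software loop of this kind needs a very large number of bits to resolve low error rates, which is the motivation for moving the test into hardware.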
1.2. Commercial Relevance
The subject of commercial relevance is dealt with more comprehensively in Chapter 5. The
commercial relevance of all projects is summarised below with reference to
Figure 1.
• Behavioural models and literature search: The report “Investigation of Turbo
Decoder Hardware Architectures”, Appendix D {vol. 2/pp. 26-40}, was used to
determine which architecture should be used to implement the cdma2000 Turbo decoder.
• Nallatech BERT: The BERT was used by the RE to produce BER plots for
Xilinx’s cdma2000 Turbo codec; a selection of the results produced was presented
by the RE [9].
• cdma2000 Turbo encoder: The RE designed an RSC encoder that conformed to
the cdma2000 standard.
• Investigation of calculating Turbo decoder inputs: The outcome of this
investigation was a report entitled “Calculating Inputs to Turbo Decoders”,
Appendix F {vol. 2/pp. 51-65}. The report suggested a novel value that could be
used for channel variance; the value proposed optimised the Turbo decoder core
created by Xilinx.
• SysGen BERT: The System Generator BERT allowed Xilinx to research and
market their Turbo code cores and their 3G solutions in general.
1.3. Novelty
A summary of the novel aspects of the EngD project is shown below with reference to
Figure 1.
• Behavioural models and literature search: Presentation of original hardware
architectures.
• Nallatech BERT: This project highlighted a novel parameterisable Turbo decoder
[9].
• cdma2000 Turbo encoder: The RSC encoder was designed entirely using Xilinx
structural libraries and was targeted to the Xilinx Virtex-II FPGA.
• Investigation of calculating Turbo decoder inputs: Presentation of a novel
channel variance value that optimised the performance of a Turbo decoder.
• SysGen BERT: The System Generator BERT is novel because it is
designed completely in System Generator. Designing in System Generator allowed
features to be added rapidly and the testing system to be automated [15]. As it is
designed in System Generator it allows non-FPGA designers to use and upgrade the
BERT platform in an FPGA environment without using standard FPGA design
techniques such as HDLs.
1.4. Milestones
The milestones referenced in Figure 1 are detailed below.
• 1: June 2001, 120 taught technical credits achieved.
• 2: February 2002, Turbo encoder behavioural model complete. Report on encoder
behavioural model complete, Introduction to Turbo codes report complete.
• 3: June 2002, Turbo decoder behavioural model partially complete. Report on
decoder behavioural model complete, Investigation of Turbo Decoder Hardware
Architectures report complete.
• 4: December 2002, poster presentation, SET for Europe.
• 5: May 2003, Duo-Binary Turbo Codes report complete.
• 6: September 2003, paper published at IEE Colloquium on DSP Enabled Radio.
• 7: November 2003, Calculating Input Values for Turbo Decoders report complete.
• 8: January 2004, 60 taught business credits achieved.
• 9: May 2004, paper published at World Wireless Congress.
• 10: June 2004, System Generator BERT completed.
2. Portfolio Organisation
Chapter 2: Portfolio Organisation
The Portfolio Thesis is presented in two volumes. Volume 1 presents the main motivations
and outcomes of the work. It also shows the commercial relevance and novelty of the
work. Volume 2 presents numerous appendices containing papers and reports compiled by
the RE as the EngD progressed. All reports presented are on the subject of Turbo codes but
cover numerous areas within the Turbo code field.
The remainder of Volume 1 is organised as follows: Chapter 3 discusses the taught
technical and business credits that were chosen by the RE, showing how the classes chosen
aided the RE both in his research and in gaining an understanding of how the projects
undertaken impacted on Xilinx commercially. Chapter 4 highlights all publications made
by the RE. The commercial relevance of projects undertaken by the RE is examined in
Chapter 5. It is shown how the numerous projects conducted by the RE contributed to the
commercial success of Xilinx’s 3G solutions and how FPGAs can be used to reduce the
cost of building a mobile base station.
Chapter 6 presents the technical background to the RE’s industrial project. Firstly the
subject of channel coding is introduced. Particular attention is given to the subject of
convolutional codes and to the differences between a standard convolutional encoder and a
recursive systematic convolutional (RSC) encoder, as used in Turbo codes. An
introduction to Turbo codes is then presented. Each component of the Turbo codec (the
Turbo encoder, interleaver and Turbo decoder) is described separately. The section on
Turbo encoders includes a discussion of the standards looked at by the RE during the
EngD: UMTS [2], cdma2000 [5] and DVB [8]. The advantages and disadvantages of
implementing a Turbo codec in a Field Programmable Gate Array (FPGA) based system
are also discussed. Chapter 7 will highlight some of the results obtained by the RE and
discuss the significance of these results. The results include tests that change inputs such
as block size and code rate. Novel results such as a comparison between UMTS and
cdma2000 Turbo codecs will also be analysed. Finally, Chapter 8 will summarise the thesis
and propose future work that should be carried out.
Figure 2 shows how each of the chapters in Volume 1 relates to the others. All chapters in
some way or another have an impact on the commercial relevance of the project. As stated
previously, the business modules have a direct link to the commercial relevance of the
projects undertaken. Any results generated were used to promote Xilinx 3G solutions to
potential customers; similarly, any publications highlighting these results promoted Xilinx’s
3G solutions to international audiences.
[Figure 2 boxes: Chapter 3: Technical and Business Modules; Chapter 4: Publications;
Chapter 5: Commercial Relevance; Chapter 6: Technical Background; Chapter 7: Results and
Discussion]
Figure 2: Organisation of Volume 1
Chapter 3: Taught Modules
3. Taught Modules
Taught technical credits were obtained in the RE’s first year at the Institute for System 
Level Integration (ISLI). The 60 business credits were obtained from the University of 
Strathclyde Graduate Business School.
3.1. Technical Credits
In total 120 credits were gained at the ISLI; the subjects taken by the RE to gain these
credits are shown in Table 1. The classes at the ISLI provided an insight into many aspects
of system on a chip (SoC) design. Classes that benefited the RE’s industrial project were: 
Intellectual Property Block Authoring (IPBA); Intellectual Property Block Integration 
(IPBI); Communication Algorithms; and Mobile Communications. Both IPBA and IPBI 
contributed to the RE’s knowledge of digital design with a particular emphasis on design 
reuse through parameterisation. Communication Algorithms and Mobile Communications 
helped the RE understand some of the basic concepts of mobile telephony and error control 
coding.
Table 1: Technical credits breakdown
Module                                   Credits
SoC Overview                             2
System Partitioning                      12
IPBA                                     12
IPBI                                     12
VLSI Design                              12
Software Engineering                     15
Microcontrollers and Microprocessors     12
Towards Deep Submicron                   7
Communications Algorithms                12
Multimedia and Video                     7
Mobile Communications
Broadband and Digital Networks           10
3.2. Business Modules
Completing the business modules allowed the RE to be more aware of the commercial 
relevance of the work carried out during the industrial project. Sixty credits were gained by 
the RE from the Master of Business Administration (MBA) modules available at the 
University of Strathclyde Graduate Business School. The MBA modules chosen by the RE
and their credit weighting are shown in Table 2.
Table 2: Business credits breakdown
Module                                   Credits
Marketing Management                     12
Data Management
Information Systems                      6
Managing People in Organisations         12
Finance and Financial Management         12
Financial and Management Accounting      12
From a purely research point of view Information Systems was very beneficial for the RE. 
It helped the RE gain an understanding of how to manage and extract necessary information 
where an abundance of information is at the researcher’s disposal. Both Finance and 
Financial Management and Financial and Management Accounting allowed the RE to 
recognise how important financial constraints and project deadlines were in allowing 
projects to be financially beneficial to the company. Marketing Management was of 
particular significance to the RE. It introduced the RE to new concepts that could be used 
to highlight work the RE had completed and allowed the RE to look at products from a 
customer’s viewpoint. Gaining this ability meant that the RE could improve systems such 
as the System Generator BERT by adding features which improved its usability.
Chapter 4: Publications
4. Publications
The RE made numerous presentations at international conferences. The publications made
by the RE are highlighted below.
• Reconfigurable Cores for Wireless Appliances [7]: An introduction to both
FPGAs and Turbo codes. The RE also highlighted the advantages of implementing
Turbo codes in FPGAs.
• A Memory-efficient Parameterisable FPGA Implementation of the cdma2000
Turbo Codec [9]: Presented a novel parameterisable Turbo decoder. The RE used
the Nallatech BERT to produce results that showed how changing the parameters of
the Turbo codec core affected performance.
• A Memory-efficient Parameterizable FPGA Implementation of the cdma2000
Turbo Codec [10]: The paper presented here showed how the BERT could be used
to calculate novel results such as a channel variance value that optimised the Turbo
codec core.
• Rapid Prototyping of a Test Harness for Forward Error Correcting Codes [16]:
The paper concentrated on the implementation of the System Generator BERT
designed by the RE. It highlighted a number of inputs that made the BERT user
programmable. The publication was included in the conference session ‘Novel
Applications of Reconfigurability’.
• Rapid Prototyping of a Test Harness for Forward Error Correcting Codes
[15]: In this paper the RE presented a number of results that were produced using
the System Generator BERT. The results included a comparison between the
cdma2000 and UMTS Turbo codecs.
Chapter 5: Commercial Relevance
5. Commercial Relevance
By the time 3G mobile networks were launched in the UK [17], mobile telephony, due to its
rapid acceptance by the consumer, was a relatively mature product. Consumers had therefore
become accustomed to a certain level of quality for a certain price on mobile networks,
for example seamless handover between base stations while on a call and near 100%
network coverage. Mobile telephones were expected to be lightweight and compact, have a
battery that could last for 2 to 3 days and offer basic functions such as games and a
calendar. Both network operators and phone manufacturers faced problems when 3G
telephony was initially launched: phones and network tariffs were expensive, battery life
was short and phones were relatively large and heavy. One of the major technological
challenges for 3G network operators was implementing an algorithm that could seamlessly
hand over calls between GSM and 3G networks. This is partly due to the fact that the
UMTS standard implemented in Europe has no direct backward compatibility with GSM.
Benson and Thomas [18], Jugl and Pampel [19] and Lugara et al [20] discuss the
limitations of certain handover algorithms and present simulation results of these
algorithms.
The majority of consumers did not see the advantages of 3G over more mature technologies
such as GPRS. This led to a very slow take-up of 3G phones. Three, the first company to
launch 3G in the UK, had only 210,000 customers by the end of their first year in March
2004; they had expected one million [21]. However, the market for 3G telephones is
expanding: all but one of the companies able to provide 3G services were expected to launch
3G networks by December 2004 [22]. Three have attempted to attract customers by
offering tariffs that are competitive with the 2G network operators’ free minutes and free
short message service (SMS) bundles. This has been backed with free trials of Three’s 3G
services such as football highlights and video conferencing. This seems to have been
successful, with Three reaching a total of 1.2 million customers in the UK by August 2004
[23].
5.1. FPGAs
The size and power consumption of FPGAs mean it is difficult for them to be used in
mobile phones in the imminent future. However, given their reconfigurability they are
ideal for use in mobile base stations or for hardware prototyping. The reconfigurable nature
of FPGAs is particularly important given the changing nature of 3G mobile standards.
Both UMTS and cdma2000 standards changed a number of times between their inception
and the final standard being set. This offers manufacturers of FPGAs a great advantage
over the manufacturers of Application Specific Integrated Circuits (ASICs). An FPGA
manufacturer can offer their 3G solutions to any customers and as the standards evolve so
can the solutions offered by the FPGA manufacturers. However, when the final standard is
released the ASIC manufacturer can offer a solution that is smaller, faster and consumes
less power. Time to market is a very important part of any product development; the
reconfigurability of FPGAs means that their solutions can always beat ASICs to the target
market. As FPGAs are “off the shelf” parts they are also cheaper than ASICs for low
volume production.
Initially FPGAs were used as ‘glue logic’ in mobile base stations, so called because it
connected two or more complex systems. However, as the processing power of FPGAs has
increased they have been used more extensively in base stations. Many base stations use a
combination of ASICs, Digital Signal Processors (DSPs) and FPGAs to implement their
desired system. ASICs offer the designer high speed, low power and small area, but are
very costly to design and offer little freedom in terms of reconfiguration. DSPs and FPGAs
both offer reconfigurability; while DSPs may be smaller and cheaper per unit than FPGAs,
they cannot offer the same processing power. This has led to FPGAs replacing both ASICs
and DSPs in many base stations. FPGAs are ideal for use in components such as rake
receivers and Turbo decoders as they allow multiple channels to be processed in parallel.
This leads to FPGAs being less expensive per channel than DSPs [24]. Reconfigurability is
not the unique selling point of FPGAs, as DSPs can also be reconfigured. The unique
selling point of FPGAs is offering a chip that is reconfigurable and can offer the user a
processing speed that can come relatively close to that of an ASIC.
As FPGA technology evolves, FPGAs will eventually become small, low power devices
relative to the ASICs of today. Obviously ASIC technology will also evolve and become
smaller, faster and consume less power than future FPGAs. However, ASICs cannot offer the
reconfigurability and time to market advantages that future FPGAs will offer. Just as at the
moment a base station manufacturer can reconfigure an FPGA remotely by transmitting a
bit stream, in the future network operators will be able to reconfigure FPGAs on mobile
telephones remotely. This is a key aim of software defined radio (SDR) [25]. The
reconfigurable nature of FPGAs may well allow them to eat into the ASICs’ share of the
handset and base station market when SDR becomes common across all networks. The
subject of SDR is discussed in more detail in Chapter 6.4.
5.2. Contribution
All work carried out by the RE has been of significant commercial relevance to Xilinx. The
most significant piece of work carried out was undoubtedly the System Generator BERT.
This was used both as a research tool and as a marketing tool. After creating their Turbo
codecs the main problem facing Xilinx was functional verification and generating results
to present to customers for marketing purposes. Initially the codec was functionally
verified by comparing the output of the VHDL simulations with a C++ model of the codec.
Any marketing or research data was also generated with the C++ model. The problem with
this is that it can take weeks or even months to generate accurate plots. Using real
hardware to test the system was the only viable solution. The System Generator BERT
was implemented in both a Nallatech XtremeDSP Kit [12] and an Annapolis Wildcard II
[13] PCMCIA card. Both of these contained Xilinx Virtex II 3000 chips [26]. Using the
System Generator BERT meant that requests for plots could be taken from customers,
generated and passed to the customer rapidly. In some cases the System Generator BERT
could be given to the customer so that they could test the Turbo codec core for themselves.
Another important aspect of the System Generator BERT is that it can be used in
demonstrations and in conference presentations: an accurate BER plot could be generated in
a matter of seconds. As the system was automated using various scripts it also meant that
the user could set up a number of simulations and then leave the BERT running
continuously. The BERT was designed to be extremely user friendly: to start a test the user
needed only very basic Matlab skills, such as how to run a script.
The behavioural models designed by the RE were to be used by Xilinx customers who
purchased the Xilinx/Frontier Turbo codec and therefore had great commercial relevance to
Xilinx. The main purpose of the behavioural model was to allow the user to simulate the
Turbo codec. As well as producing the same results as the structural design it also had to
offer a significant simulation speed-up over the structural design and be easy to use. The
behavioural model could also be used to verify the structural VHDL design.
The RE also designed a recursive systematic convolutional (RSC) encoder that conformed
to the cdma2000 standard for Xilinx. Again, it could be used by customers in their
cdma2000 codec.
The numerous reports created by the RE also had an impact on the commercial output from
Xilinx. Two reports of particular significance were “Investigation of Turbo Decoder
Hardware Architectures”, Appendix D {vol. 2/pp. 26-40}, and “Calculating Input Values
for Turbo Decoders”, Appendix F {vol. 2/pp. 51-65}. The latter proposed a novel value
that optimised the performance of the Xilinx Turbo decoder. The former was used to
determine which architecture should be used to implement the cdma2000 Turbo decoder.
Chapter 6: Technical Background
6. Technical Background
In 1948 Claude Shannon published his channel coding theorem [27]. This theorem showed
that data could be transmitted over a noisy channel with little or no error, provided that the
data rate does not exceed the channel capacity. The publication of this paper led to a
number of different theorems, and therefore channel coding techniques, on how to achieve
this goal. Common to all techniques is the inclusion of a channel encoder and
channel decoder in the communications system. A simple communications system is
shown in Figure 3.
[Figure 3 blocks: Source, Source Encoder, Channel Encoder, Modulator, Demodulator,
Channel Decoder, Source Decoder, Destination]
Figure 3: Overview of a simple communications system
The encoder and decoder implemented can perform simple error detection, where the
system simply realises that one or more of the bits received is in error and takes action such as
requesting that the data in error be re-transmitted, or a more complex scheme called forward
error correction (FEC). FEC systems both detect and correct errors; this enhancement is
paid for by the complexity of the FEC decoder. Two coding techniques are used to
implement FEC systems: block codes and convolutional codes. The major distinction
between the two is that the output from a block encoder is completely reliant on its current
input alone. The output from a convolutional encoder is reliant on the current input plus the
previous m inputs, where m is the number of memory elements in the convolutional
encoder. The number of memory elements in a convolutional encoder is known as the
constraint length. A block encoder accepts k bits at its input and produces n parity bits from
this input. In total, n+k bits are transmitted. Of the bits transmitted, n are redundant as they
contain no real information. Redundancy can be measured in terms of code rate, R. For a
block code redundancy can be measured using (1). The code rate falls as
the number of parity bits generated rises. Hence, as redundancy increases the code rate
decreases and less useful information is transmitted. The advantage of increasing
redundancy is that the error correcting power of the code is increased.
$$R = \frac{k}{n + k} \qquad (1)$$
The code rate of a convolutional encoder can be calculated using (2), where k is the number
of bits into the encoder and n is the number of bits out.
$$R = \frac{k}{n} \qquad (2)$$
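As a small worked illustration of the two code rate definitions, the sketch below evaluates R = k/(n+k) for a block code and R = k/n for a convolutional code using exact fractions. The function names and the example values (a block code with k = 4 information bits and n = 3 parity bits) are illustrative assumptions and are not taken from the thesis.

```python
from fractions import Fraction


def block_code_rate(k, n_parity):
    # Block code: k information bits plus n parity bits are transmitted.
    return Fraction(k, n_parity + k)


def convolutional_code_rate(k, n):
    # Convolutional code: k bits into the encoder, n bits out.
    return Fraction(k, n)


# Hypothetical block code: 4 information bits protected by 3 parity bits.
print(block_code_rate(4, 3))          # 4/7
# Convolutional encoder with one input and two parity outputs.
print(convolutional_code_rate(1, 2))  # 1/2
```

Adding parity bits to the block code lowers the rate, for example from 4/7 with three parity bits to 4/8 = 1/2 with four.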
As Turbo codes are a convolutional FEC technique this portfolio thesis shall look at this
type of code in more depth. The reader is referred to more comprehensive publications on
the subject of block codes [28, 29] for more information on this subject.
6.1. Convolutional Codes
An example of a convolutional encoder is shown in Figure 4. The component encoders in
Turbo encoders are recursive systematic convolutional (RSC) encoders. The encoder
shown in Figure 4 is both non-recursive and non-systematic. The differences between the
two types of convolutional encoders will be highlighted in this section. In Figure 4 the
input, $u_t$, is passed through memory elements and modulo-2 adders to generate parity
streams, $X_t^{p0}$ and $X_t^{p1}$. Using (2) it can be shown that the code rate for this encoder is $\frac{1}{2}$.
Figure 4: Convolutional encoder
A convolutional encoder can be represented using a polynomial. The polynomial that
represents $X_t^{p0}$ is shown in (3); the polynomial that represents $X_t^{p1}$ is shown in (4).
$$g_1(D) = 1 + D + D^2 \qquad (3)$$
$$g_2(D) = 1 + D^2 \qquad (4)$$
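The generator polynomials $1 + D + D^2$ and $1 + D^2$ can be read as tap positions on a two-element shift register, where each power of D is one delay. The following is a minimal sketch of such a non-recursive, non-systematic encoder, assuming the register starts in the all zero state. The function name `conv_encode` is hypothetical and the code is an illustration, not the RE's implementation.

```python
def conv_encode(bits):
    """Rate 1/2 non-recursive convolutional encoder (illustrative sketch)."""
    s1 = s2 = 0  # s1 holds u[t-1], s2 holds u[t-2]
    parity0, parity1 = [], []
    for u in bits:
        parity0.append(u ^ s1 ^ s2)  # taps of 1 + D + D^2
        parity1.append(u ^ s2)       # taps of 1 + D^2
        s1, s2 = u, s1               # shift the register by one step
    return parity0, parity1


# The response to a single 1 reads off the polynomial coefficients.
print(conv_encode([1, 0, 0]))  # ([1, 1, 1], [1, 0, 1])
```

Each input bit produces two output bits, which gives the code rate of 1/2.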
Any non-systematic, non-recursive convolutional encoder can be converted to an RSC
encoder using the rule shown in (5).
$$G(D) = \left[ 1, \frac{g_2(D)}{g_1(D)} \right] \qquad (5)$$
Figure 5 shows the equivalent RSC encoder created using (5) from (3) and (4).
Figure 5: RSC encoder
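The conversion to an RSC encoder can be sketched in the same way: the denominator polynomial $1 + D + D^2$ becomes the feedback path and the numerator $1 + D^2$ becomes the feedforward path, while the input is passed straight through as the systematic output. This is a minimal sketch under the assumption of a zero initial state; `rsc_encode` is a hypothetical name and the code is not taken from the thesis.

```python
def rsc_encode(bits):
    """RSC encoder with G(D) = [1, (1 + D^2) / (1 + D + D^2)] (sketch)."""
    s1 = s2 = 0  # the two memory elements
    systematic, parity = [], []
    for u in bits:
        a = u ^ s1 ^ s2         # feedback sum, taps of 1 + D + D^2
        systematic.append(u)    # systematic output is the input itself
        parity.append(a ^ s2)   # feedforward sum, taps of 1 + D^2
        s1, s2 = a, s1          # shift the feedback value into the register
    return systematic, parity


print(rsc_encode([1, 0, 0, 0]))  # ([1, 0, 0, 0], [1, 1, 1, 0])
```

Because of the feedback, a single 1 at the input keeps the register cycling through non-zero states, so the parity response does not die out after a fixed number of steps as it does in the non-recursive encoder.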
For Turbo codes the main advantage of using RSC instead of non-RSC encoders is that
when RSC encoders are concatenated in parallel they produce relatively more high weight
code words, leading to a better BER performance [30]. Berrou et al [31] and Benedetto et
al [32] both show that the BER performance of an RSC encoder is better than the BER
performance of the corresponding non-RSC encoder.
To decode a code word it is advantageous for the decoder to know the starting state of the
encoder that produced the code word. This is done by returning the encoder to the all zero
state after a packet has been encoded. This is relatively easy for the non-RSC encoder as
all that is required is to input a sequence of m zeros, where m is the number of memory
elements. For the RSC encoder a process known as trellis termination [33] must be used to
return the encoder to the all zero state. Figure 6 shows the process of trellis termination.
When a block has been encoded the data input, $u_t$, to the first XOR gate is replaced by the
feedback input to the first XOR gate, thereby always producing an input of zero. The
outputs of the encoder when in the trellis termination state are known as tail bits.
Figure 6: RSC encoder with trellis termination
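Trellis termination can be sketched by extending an RSC encoder with a tail phase in which the data input is replaced by the feedback value, so that the first XOR gate always outputs zero and zeros are shifted into the register. The sketch below assumes the two-memory-element example encoder with feedback $1 + D + D^2$ and feedforward $1 + D^2$; the function name is hypothetical and the code is an illustration only.

```python
def rsc_encode_terminated(bits):
    """RSC encoding followed by trellis termination (illustrative sketch)."""
    s1 = s2 = 0
    systematic, parity = [], []

    def step(u):
        nonlocal s1, s2
        a = u ^ s1 ^ s2        # output of the first XOR gate
        systematic.append(u)
        parity.append(a ^ s2)
        s1, s2 = a, s1

    for u in bits:             # normal encoding of the data block
        step(u)
    for _ in range(2):         # one tail step per memory element
        step(s1 ^ s2)          # input = feedback, so the XOR output is 0
    return systematic, parity, (s1, s2)


sys_bits, par_bits, final_state = rsc_encode_terminated([1])
print(sys_bits, par_bits, final_state)  # [1, 1, 1] [1, 0, 1] (0, 0)
```

The last two entries of each output list are the tail bits, and the final state is all zero whatever the data block was.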
6.2. Turbo Codes
Introduced in 1993, Turbo codes [31] are a relatively recent innovation in channel coding
research. They have already been included in numerous international standards, including
deep space telemetry [34], digital video broadcasting (DVB) [8], UMTS [2] and cdma2000
[5]. Their popularity has grown due to their BER performance at low signal to noise ratios;
the initial paper [31] came within 0.7 dB of the Shannon limit. The unprecedented
performance of Turbo codes can be attributed to three main factors: the use of parallel
concatenated convolutional encoders; the use of an interleaver between each component
encoder; and the use of iterative decoding. A top-level view of a Turbo encoder is shown in
Figure 7. The systematic and parity outputs can be punctured before transmission.
Puncturing is a process where certain encoder outputs are deleted so that the code rate is
increased, hence increasing the information rate. When an encoder output is punctured it is
replaced with an all zero codeword at the decoder input. Using puncturing will usually
result in a degraded BER performance, as will be shown in Chapter 7. The systematic
output from RSC2, $X_t^{\prime s}$, is always punctured; in this case the BER performance is not
compromised as $X_t^{\prime s}$ can be reconstructed using an interleaver and the systematic data from
RSC1, $X_t^{s}$. In Figure 7, $X_t^{p0}$ represents parity data output by RSC1 and $X_t^{p1}$ represents
data output by RSC2.
[Figure 7 blocks: Interleaver, RSC1 Encoder, RSC2 Encoder]
Figure 7: Top-level view of a Turbo encoder
A Turbo encoder is created from three components: two recursive systematic convolutional
encoders (RSCs), discussed in Section 6.2.1, and an interleaver, discussed in Section 6.2.2.
Each RSC is known as a component encoder.
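The parallel concatenation of two component encoders around an interleaver can be sketched as follows. This is a minimal illustration under several assumptions: the component encoder is the small two-memory-element example RSC (feedback $1 + D + D^2$, feedforward $1 + D^2$) rather than one taken from a standard, the interleaver is an arbitrary hypothetical permutation, trellis termination and puncturing of the parity streams are omitted, and the names `rsc_parity` and `turbo_encode` are invented for the example.

```python
def rsc_parity(bits):
    # Parity stream of the example RSC component encoder (zero initial state).
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2
        parity.append(a ^ s2)
        s1, s2 = a, s1
    return parity


def turbo_encode(bits, permutation):
    """Return (systematic, parity from RSC1, parity from RSC2)."""
    interleaved = [bits[i] for i in permutation]  # interleaver feeding RSC2
    systematic = list(bits)             # systematic data from RSC1
    parity1 = rsc_parity(bits)          # RSC1 encodes the data in natural order
    parity2 = rsc_parity(interleaved)   # RSC2 encodes the interleaved data
    # The systematic output of RSC2 is not returned: it is always punctured.
    return systematic, parity1, parity2


# Hypothetical 4-bit block and permutation, for illustration only.
print(turbo_encode([1, 0, 0, 0], [3, 1, 0, 2]))
# ([1, 0, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1])
```

With one systematic stream and two parity streams transmitted for every input bit, the unpunctured code rate of this arrangement is 1/3.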
A Turbo decoder is created using four components: two soft-in soft-out (SISO) decoders, an
interleaver and a de-interleaver. The top-level of a Turbo decoder is shown in Figure 8.
For i iterations each SISO shares its output data with the preceding decoder. Data output
from the SISO decoder, $L_e^{ab}(u_t)$, is the SISO estimate of the original data input to the
corresponding Turbo encoder and is known as extrinsic data. As (7) will show, the
systematic input to the decoder, $Y_t^{s}$, is contained in the natural output of the SISO decoder,
$L(u_k)$. Therefore, in a real hardware implementation it is unlikely that only $L_e^{ab}(u_t)$ will be
output by the SISO decoder; the actual value output will be $L_e^{ab}(u_t) + Y_t^{s}$. However, the
SISO in Figure 8 shows only $L_e^{ab}(u_t)$ being output for clarity. In Figure 8, a $\lambda$ subscript
indicates that the associated variable is in interleaved form, relative to the original input
data sequence. $Y_t^{p0}$ represents the noisy version of the data output by RSC1 in Figure 7;
$Y_t^{p1}$ represents the noisy version of the data output by RSC2. After i iterations, the
components shown as dashed lines are initialized. The hard decision (HD) block is a basic
thresholder that compares the summed input to 0. If the input is greater than 0 the decoder
estimate, $\hat{u}_k$, is a binary 1. If the summed input is less than 0 the decoder estimate is a
binary 0. If the summed input is exactly 0 there is the same probability that the output
should be a 0 or a 1. In this case the system designer must decide what the output should
be. This could involve an additional algorithm, e.g. alternating the output for a 0 input
between 0 and 1, or simply making the output always be 1 or 0 when a 0 input is received.
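The behaviour of the hard decision block, including the two tie-breaking options mentioned for an input of exactly 0, can be sketched as below. The function name `hard_decisions` and the `tie_policy` parameter are hypothetical; in hardware this would be a comparator on the summed soft value rather than software.

```python
def hard_decisions(values, tie_policy="alternate"):
    """Threshold summed soft inputs at 0 (illustrative sketch).

    tie_policy is "alternate" to alternate 0, 1, 0, ... on exact zeros,
    or the fixed bit (0 or 1) to output whenever the input is exactly 0.
    """
    decisions = []
    next_tie = 0  # next output used by the alternating policy
    for v in values:
        if v > 0:
            decisions.append(1)
        elif v < 0:
            decisions.append(0)
        elif tie_policy == "alternate":
            decisions.append(next_tie)
            next_tie ^= 1
        else:
            decisions.append(int(tie_policy))
    return decisions


print(hard_decisions([2.5, -1.0, 0.0, 0.0, 0.0]))  # [1, 0, 0, 1, 0]
print(hard_decisions([0.0, 0.0], tie_policy=1))    # [1, 1]
```

Alternating the output keeps the two decisions equally likely over a run of ties, while the fixed policy is simpler to implement.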
[Figure 8 blocks: SISO1, Interleaver, SISO2, De-Interleaver, HD Block; decoding loop
repeated for i iterations]
Figure 8: Top-level view of a Turbo decoder
Sharing of information between each SISO decoder is a major contribution to the 
performance of Turbo codes. The Turbo decoder and its components are discussed more 
comprehensively in Section 6.2.3.
6.2.1. Turbo Encoder
The input to the Turbo encoder comes from a data set of size N, known as the block size.
Traditionally, the input to the encoder is 1 bit wide. However, systems such as the DVB
Turbo encoder accept multiple bit inputs. As Figure 7 shows, the systematic output from
RSC2 is not transmitted. This is achieved by puncturing the Turbo encoder output.
Puncturing the output of a Turbo encoder involves deleting certain bits according to a
puncturing pattern.
The component RSC encoder for the UMTS standard is shown in Figure 9.
Figure 9: UMTS component encoder
As each UMTS RSC encoder outputs only one parity stream the UMTS Turbo encoder has
a standard code rate of $\frac{1}{3}$.
A cdma2000 component RSC encoder is shown in Figure 10.
Figure 10: cdma2000 component encoder
The cdma2000 RSC encoder has two parity stream outputs and can therefore have a code
rate of $\frac{1}{2}$, $\frac{1}{3}$, $\frac{1}{4}$ or $\frac{1}{5}$. The puncturing pattern of the cdma2000 standard is shown in Table
3.
Table 3: cdma2000 puncturing patterns for data

Output    Code rate
          1/2    1/3    1/4    1/5
X^s       1 1    1 1    1 1    1 1
X^p0      1 0    1 1    1 1    1 1
X^p1      0 0    0 0    1 0    1 1
X'^s      0 0    0 0    0 0    0 0
X'^p0     0 1    1 1    0 1    1 1
X'^p1     0 0    0 0    1 1    1 1
In the puncturing table a 0 represents a bit that is deleted and a non-zero number shows 
how many times the symbol in question is transmitted. For example, if data is being 
transmitted at a code rate of 1/2, the output from the encoder over two time periods will be 
X^s, X^p0, X^s, X'^p0. When trellis termination is initiated the puncturing pattern is altered 
so that X'^s is transmitted, and certain bits are repeated so that the code rate during trellis 
termination matches the code rate requested by the user. The puncturing patterns for trellis 
termination are shown in Table 4.
Table 4: cdma2000 puncturing patterns for trellis termination

Output    Code rate
          1/2        1/3        1/4        1/5
X^s       111 000    222 000    222 000    333 000
X^p0      111 000    111 000    111 000    111 000
X^p1      000 000    000 000    111 000    111 000
X'^s      000 111    000 222    000 222    000 333
X'^p0     000 111    000 111    000 111    000 111
X'^p1     000 000    000 000    000 111    000 111
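The way a puncturing pattern selects and repeats symbols can be illustrated with a short Python sketch. It uses the data patterns of Table 3, with the six encoder streams labelled Xs, Xp0, Xp1 (first RSC encoder) and X's, X'p0, X'p1 (second RSC encoder). Reading each pattern column from top to bottom before moving to the next time period is an assumption chosen to match the rate 1/2 example in the text, and the function and variable names are hypothetical.

```python
# Stream order, top to bottom, as in the rows of Table 3.
STREAMS = ["Xs", "Xp0", "Xp1", "X's", "X'p0", "X'p1"]

# One row per stream, one entry per time period of the 2-period pattern.
DATA_PATTERNS = {
    "1/2": [[1, 1], [1, 0], [0, 0], [0, 0], [0, 1], [0, 0]],
    "1/3": [[1, 1], [1, 1], [0, 0], [0, 0], [1, 1], [0, 0]],
    "1/4": [[1, 1], [1, 1], [1, 0], [0, 0], [0, 1], [1, 1]],
    "1/5": [[1, 1], [1, 1], [1, 1], [0, 0], [1, 1], [1, 1]],
}


def puncture(streams, pattern):
    """Apply a puncturing pattern to equal-length encoder output streams.

    streams maps each name in STREAMS to a list of symbols. A pattern
    entry of 0 deletes the symbol; an entry of k transmits it k times.
    """
    period = len(pattern[0])
    length = len(streams[STREAMS[0]])
    transmitted = []
    for t in range(length):
        for name, row in zip(STREAMS, pattern):
            transmitted.extend([streams[name][t]] * row[t % period])
    return transmitted


# Label every symbol so the transmitted order is visible.
labelled = {name: [f"{name}[{t}]" for t in range(2)] for name in STREAMS}
print(puncture(labelled, DATA_PATTERNS["1/2"]))
# prints ['Xs[0]', 'Xp0[0]', 'Xs[1]', "X'p0[1]"]
```

The same function handles the repetition entries (2 and 3) of a trellis termination pattern, since each entry is treated as a repeat count.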
The DVB component RSC encoder is fundamentally different from the UMTS and 
cdma2000 RSC component encoders as it accepts two binary inputs for each time period, t,
as highlighted in Figure 11. Appendix E {vol. 2/pp. 41-50} gives an in-depth overview of 
the subject of duo-binary Turbo codes.
Figure 11: DVB component encoder (signals labelled A_t, B_t, W_t and Y_t)
The DVB standard Turbo encoder can output data in any one of seven code rates. These 
code rates are shown in Table 5.
Table 5: DVB standard puncturing patterns

Output    Code rate
          1/3    2/5    1/2    2/3    3/4    4/5     6/7
Y_t       1      11     1      10     100    1000    100000
W_t       1      10     0      00     000    0000    000000
6.2.2. Interleaver
The interleaver improves the performance of the Turbo codec as it reduces the correlation 
between the data entering RSC1 and the data entering RSC2. Interleaving the data 
entering RSC2 also protects the encoded data from burst errors. Each item of input data to an 
interleaver is assigned an address, which is then changed using a specific algorithm. 
A block diagram of the address generator for the cdma2000 interleaver is shown in Figure 12.
Figure 12: cdma2000 standard interleaver (address generator blocks: get d from a look-up table such that N < 2^(d+5); initialise a (d+5)-bit counter to 0; add 1 to the MSBs and select the d LSBs; table look-up on the 5 LSBs; multiply and select the d LSBs; bit-reverse the 5 LSBs; discard if input > N)
If the address generated is greater than the block size of the current input data set then it is 
discarded. The components in the cdma2000 interleaver are easily implementable in 
hardware as they contain common mathematical operations such as counters, multipliers 
and bit reversals. The UMTS standard interleaver [2] is more mathematically complex and 
could be considered more difficult to implement in hardware.
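The counter, look-up, multiply and bit-reverse steps of the address generator can be sketched as follows. This is a structural illustration only. The 32-entry multiplier table is a placeholder of odd numbers, not the look-up values defined by the cdma2000 standard, and d is computed rather than read from a table. Addresses are taken to be 0-based, so any address equal to or above the block size is treated as out of range. All names are hypothetical.

```python
def bit_reverse(value, width):
    """Reverse the order of the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Placeholder table of 32 odd multipliers. These are NOT the cdma2000 values.
PLACEHOLDER_MULTIPLIERS = [2 * i + 1 for i in range(32)]


def interleaver_addresses(block_size, multipliers=PLACEHOLDER_MULTIPLIERS):
    """Generate interleaved addresses for one block of `block_size` items."""
    d = 0
    while (1 << (d + 5)) < block_size:       # smallest d that covers the block
        d += 1
    mask = (1 << d) - 1
    addresses = []
    for counter in range(1 << (d + 5)):      # (d+5)-bit counter starting at 0
        msbs, lsbs = counter >> 5, counter & 0x1F
        stepped = (msbs + 1) & mask          # add 1, keep the d LSBs
        product = (stepped * multipliers[lsbs]) & mask   # multiply, keep d LSBs
        address = (bit_reverse(lsbs, 5) << d) | product  # reversed LSBs lead
        if address < block_size:             # discard out-of-range addresses
            addresses.append(address)
    return addresses


addresses = interleaver_addresses(378)
print(len(addresses), sorted(addresses) == list(range(378)))  # prints 378 True
```

Because every placeholder multiplier is odd, each step is a one-to-one mapping, so pruning the out-of-range addresses leaves a permutation of the block.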
6.2.3. Turbo Decoder
The key component of the Turbo decoder is the soft-in soft-out (SISO) decoder. SISO 
decoders, as their name suggests, accept soft value inputs and give soft value outputs. Soft 
values are j bits wide, with the j bits representing a fractional number. An example of a 
decoder that accepts soft inputs is the Viterbi decoder [35]. One disadvantage of the 
traditional Viterbi decoder is that it only outputs single bits. By using soft inputs and soft 
outputs the BER performance of the decoder is improved. The main algorithms used to 
implement SISO decoders are the soft output Viterbi algorithm (SOVA) [36] and the 
Maximum a Posteriori (MAP) algorithm [37]. Using the MAP algorithm results in a better 
BER performance than the SOVA algorithm [38]. The MAP algorithm produces the most 
probable information bit at each time instance for a given data set, whereas the Viterbi 
algorithm produces a maximum likelihood sequence for a given data set. By performing a 
forward and backward traversal of the
trellis, the MAP algorithm produces two probabilities for the same time instance. This 
improves the BER performance of the Turbo decoder.
The Turbo decoders tested during the RE's industrial project were designed using 
variations of the MAP algorithm. Therefore, only this algorithm and its variants are 
described in this portfolio thesis. The variants described are the log-MAP algorithm, the 
Max-log-MAP algorithm and the Max Scale algorithm. All of these algorithms offer 
varying levels of performance and use varying levels of resources on the FPGA when 
implemented. Usually, as resources increase the BER performance of the implemented 
algorithm improves.
The soft inputs to the MAP algorithm are input in the form of a log-likelihood ratio, $L(u_t)$. 
A log-likelihood ratio shows the likelihood that a soft value represents a binary 1 or 0. 
Equation 6 shows how a log-likelihood ratio is calculated, where $y_t$ represents the data 
received from the communications system demodulator.
$$L(u_t) = \ln\left(\frac{P(u_t = 1 \mid y_t)}{P(u_t = 0 \mid y_t)}\right) \qquad (6)$$
If the result of (6) is positive then it is most likely that the result should be a binary 1; if the 
result is negative it is most likely that the result should be a binary 0. The confidence in 
that decision increases as the magnitude of the result increases. If the result of 
(6) is exactly 0 then it is equally likely that the result is 0 or 1.
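As a small numerical illustration of (6), the following Python sketch converts a posterior probability into a log-likelihood ratio and back. The function names are hypothetical and the probability is supplied directly rather than derived from a demodulator.

```python
import math


def log_likelihood_ratio(p_one):
    """LLR of a bit given P(u = 1 | y) = p_one, following equation (6)."""
    if not 0.0 < p_one < 1.0:
        raise ValueError("p_one must lie strictly between 0 and 1")
    return math.log(p_one / (1.0 - p_one))


def probability_of_one(llr):
    """Invert equation (6): recover P(u = 1 | y) from the LLR."""
    return 1.0 / (1.0 + math.exp(-llr))


for p in (0.1, 0.5, 0.9, 0.99):
    print(p, round(log_likelihood_ratio(p), 3))
```

The sign of the result gives the hard decision and the magnitude grows as the probability moves away from 0.5, which is the behaviour described above.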
Ryan [39] shows that the output from a MAP decoder consists of three elements, shown in (7).
$$L(u_t) = L^e_{IN}(u_t) + L^e_{OUT}(u_t) + L_c Y_t^s \qquad (7)$$
The variable $L_c$ is dependent on the variance of the channel. $L^e_{IN}(u_t)$ represents the extrinsic 
data input to the SISO decoder, e.g. $L^e_{12}(u_t)$ into SISO2 in Figure 8. $L^e_{OUT}(u_t)$ represents 
the extrinsic output from the SISO decoder, e.g. $L^e_{21}(u_t)$ out of SISO2 in Figure 8.
The three main variables used to calculate the outputs from the MAP decoder are $\alpha_t(s)$, the 
alpha probability of state s at time t, $\beta_t(s)$, the beta probability of state s at time t, and 
$\gamma_t(s', s)$, the branch probability. The alpha probability is produced by performing a forward trace 
of the trellis; the beta probability is produced by a backward trace of the trellis.
The branch probability is the probability that the trellis moves from state s' at time t-1 to 
state s at time t. Ryan [39] shows that the branch metric can be calculated using (8).
$$\gamma_t(s', s) = \exp\left[\tfrac{1}{2} X_t^s \left(L^e_{IN}(u_t) + L_c Y_t^s\right) + \tfrac{1}{2} L_c \left(X_t^{p0} Y_t^{p0} + X_t^{p1} Y_t^{p1}\right)\right] \qquad (8)$$
For a standard Turbo decoder each state in the trellis has two branches entering it. This 
means that each state is associated with two alpha probabilities and two beta probabilities, 
each of which has a branch probability associated with it. Equation (9) 
shows how $\alpha_t(s)$ is calculated in a forward trace; this is shown graphically in Figure 13. 
The probability that the trellis is in state s at time t is equal to the sum, over the branches 
entering s, of the probability that the previous state was s' and the trellis travelled along 
branch $\gamma_t(s', s)$ to get to state s. As 
Figure 13 shows, each branch is associated with either a 0 or a 1. If a branch is associated 
with a 0 then the state it comes from will have a negative alpha probability; similarly, a branch 
associated with a 1 will have come from a state with a positive alpha probability. 
Consequently (9) will yield either a positive or negative alpha depending on the magnitude of 
the two previous states.
$$\alpha_t(s) = \sum_{s'} \alpha_{t-1}(s') \times \gamma_t(s', s), \qquad t = 1, \ldots, N \qquad (9)$$
Figure 13: Calculating alpha probability
The beta probability is calculated in a similar fashion to the alpha probability, except it is 
calculated when a backward trace is performed. Figure 14 shows graphically how the beta 
probability is calculated. The actual equation performed is shown in (10). The state, s, at 
time t-1 has two probabilities associated with it: the probability that the state at time t was 
$\beta^1_t(s')$ and the trellis travelled along branch $\gamma^1_t(s', s)$, and the probability that the state at 
time t was $\beta^0_t(s')$ and the trellis travelled along branch $\gamma^0_t(s', s)$. Whichever of the two 
probabilities is more likely will have the larger magnitude, and therefore determines whether 
the output of (10) is positive or negative.
$$\beta_{t-1}(s) = \sum_{s'} \beta_t(s') \times \gamma_t(s', s), \qquad t = N, \ldots, 2 \qquad (10)$$
Figure 14: Calculating beta probability
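As Figure 8 shows, the output of the decoder after i iterations is given by (11).
$$L(u_t) = L^e_{12}(u_t) + L^e_{21}(u_t) + L_c Y_t^s \qquad (11)$$
In terms of $\alpha_t(s)$, $\beta_t(s)$ and $\gamma_t(s', s)$ the output from the decoder is calculated as shown in (12).
$$L(u_t) = \ln\left(\frac{\sum_{(s',s):\, u_t = +1} \alpha_{t-1}(s') \times \gamma_t(s', s) \times \beta_t(s)}{\sum_{(s',s):\, u_t = -1} \alpha_{t-1}(s') \times \gamma_t(s', s) \times \beta_t(s)}\right) \qquad (12)$$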
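The forward trace, backward trace and final combination can be illustrated on a toy trellis. The Python sketch below is not one of the decoder cores tested in this portfolio. It assumes a hypothetical 2-state trellis in which the next state is the current state XOR the input bit, takes the branch probabilities as given instead of computing them from (8), starts in state 0 and leaves the trellis unterminated. Branches are indexed as (previous state, next state), and an input bit of 1 or 0 corresponds to $u_t = +1$ or $u_t = -1$. No normalisation is applied, so it is only suitable for short blocks.

```python
import math

# Toy 2-state trellis: next state = state XOR input bit.
# Keys are (previous state, next state); values are the input bit on the branch.
BRANCH_INPUT = {(0, 0): 0, (0, 1): 1, (1, 1): 0, (1, 0): 1}


def map_llrs(gammas, num_states=2, branch_input=BRANCH_INPUT):
    """Return L(u_t) for t = 1..N given gammas[t-1][(s_prev, s)]."""
    n = len(gammas)

    # Forward trace: alpha at time t from alpha at time t-1, as in (9).
    alpha = [[0.0] * num_states for _ in range(n + 1)]
    alpha[0][0] = 1.0                        # encoder assumed to start in state 0
    for t in range(1, n + 1):
        for (s_prev, s), g in gammas[t - 1].items():
            alpha[t][s] += alpha[t - 1][s_prev] * g

    # Backward trace: beta at time t-1 from beta at time t, as in (10).
    beta = [[0.0] * num_states for _ in range(n + 1)]
    beta[n] = [1.0 / num_states] * num_states    # unterminated: all states equal
    for t in range(n, 0, -1):
        for (s_prev, s), g in gammas[t - 1].items():
            beta[t - 1][s_prev] += beta[t][s] * g

    # Combine alpha, gamma and beta over the 1-branches and 0-branches, as in (12).
    llrs = []
    for t in range(1, n + 1):
        ones = zeros = 0.0
        for (s_prev, s), g in gammas[t - 1].items():
            joint = alpha[t - 1][s_prev] * g * beta[t][s]
            if branch_input[(s_prev, s)] == 1:
                ones += joint
            else:
                zeros += joint
        llrs.append(math.log(ones / zeros))
    return llrs


# Every branch carrying a 1 is nine times as likely as a branch carrying a 0.
step = {branch: (0.9 if bit else 0.1) for branch, bit in BRANCH_INPUT.items()}
print(map_llrs([step, step, step]))
```

With these branch probabilities every output is approximately ln(9), a positive value, so each bit would be decided as a 1.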
The MAP algorithm gives exceptional BER results at relatively low $E_b/N_0$. However, the 
MAP algorithm is extremely difficult to implement in hardware as it contains mathematical 
operations such as logarithms and divisions, which are costly in terms of hardware 
resources. A modification of the MAP algorithm is the log-MAP algorithm [40]. As the 
name suggests, the log-MAP algorithm is an implementation of the MAP algorithm in the 
logarithm domain. Taking the logarithm of (9), (10) and (8) yields (13), (14) and (15) respectively.
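$$A_t(s) = \ln\left(\alpha_t(s)\right) \qquad (13)$$
$$B_t(s) = \ln\left(\beta_t(s)\right) \qquad (14)$$
$$G_t(s', s) = \ln\left(\gamma_t(s', s)\right) = \tfrac{1}{2} X_t^s \left(L^e_{IN}(u_t) + L_c Y_t^s\right) + \tfrac{1}{2} L_c \left(Y_t^{p0} X_t^{p0} + Y_t^{p1} X_t^{p1}\right) \qquad (15)$$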
The Jacobian logarithm, (16), can be used to obtain a formula for both (13) and (14) that 
can be easily implemented in hardware, shown in (17) and (18).
$$\mathrm{MAX}^*(A^0, A^1) = \ln\left(e^{A^0} + e^{A^1}\right)$$
$$\mathrm{MAX}^*(A^0, A^1) = \mathrm{MAX}(A^0, A^1) + \ln\left(1 + e^{-|A^0 - A^1|}\right) \qquad (16)$$
$$A_t(s) = \mathrm{MAX}^*_{s'}\left(A_{t-1}(s') + G_t(s', s)\right), \qquad t = 1, \ldots, N-1 \qquad (17)$$
$$B_{t-1}(s) = \mathrm{MAX}^*_{s'}\left(B_t(s') + G_t(s', s)\right), \qquad t = N, \ldots, 2 \qquad (18)$$
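Similarly, the output from the decoder can also be calculated using the Jacobian logarithm, 
shown in (19).
$$L(u_t) = \mathrm{MAX}^*_{(s',s):\, u_t = +1}\left(A_{t-1}(s') + G_t(s', s) + B_t(s)\right) - \mathrm{MAX}^*_{(s',s):\, u_t = -1}\left(A_{t-1}(s') + G_t(s', s) + B_t(s)\right) \qquad (19)$$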
The Max* computation involves adding each alpha or beta metric to the branch metric 
it is associated with, then comparing the two summed metrics. The larger is 
selected and the offset, read from a look-up table, is added. This output is then passed to the next 
state in the trellis diagram. The add, compare, select, offset (ACSO) component is at the heart of 
all of the MAX* computations; it is shown graphically in Figure 15 [41]. One ACSO unit 
is needed for every state in the trellis diagram.
Figure 15: Add, compare, select, offset unit (includes an offset LUT)
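The add, compare, select, offset operation can be sketched in Python as below. The exact Jacobian logarithm is shown next to a table-based version. The 8-entry offset table and its step size of 0.5 are illustrative assumptions, not the table used in the decoder cores, and the function names are hypothetical.

```python
import math


def max_star(a, b):
    """Exact Jacobian logarithm ln(e^a + e^b), written as in equation (16)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


# Hypothetical offset table: ln(1 + e^-d) sampled at d = 0, 0.5, ..., 3.5.
LUT_STEP = 0.5
OFFSET_LUT = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(8)]


def acso(metric0, branch0, metric1, branch1):
    """Add, compare, select, offset with a quantised offset look-up."""
    sum0 = metric0 + branch0                 # add
    sum1 = metric1 + branch1
    difference = abs(sum0 - sum1)            # compare
    selected = max(sum0, sum1)               # select
    index = int(difference / LUT_STEP)
    offset = OFFSET_LUT[index] if index < len(OFFSET_LUT) else 0.0
    return selected + offset                 # offset


print(max_star(1.4, 2.7), acso(0.3, 1.1, -0.2, 2.9))
```

Setting the offset to 0 for every difference reduces `acso` to a plain maximum, which is the simplification used by the Max-log-MAP algorithm.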
The log-MAP algorithm can be further simplified by excluding the offset, $\ln\left(1 + e^{-|A^0 - A^1|}\right)$, 
in (16), resulting in an implementation called the Max-log-MAP algorithm. Chapter 7 will 
show that removing the offset results in a relatively poor BER performance. A 
compromise between the log-MAP and Max-log-MAP algorithms is the Max Scale 
algorithm. The Max Scale algorithm has a similar complexity to the Max-log-MAP 
algorithm but achieves a coding gain of between 0.2 dB and 0.4 dB compared with the 
Max-log-MAP algorithm [42]. The coding gain is achieved by multiplying the SISO decoder 
output by a scaling factor, sf, as shown in (20).
$$L^e_{OUT}(u_t) = \left[L(u_t) - \left(L^e_{IN}(u_t) + L_c Y_t^s\right)\right] \times sf \qquad (20)$$
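A minimal Python sketch of the scaling step in (20) follows. The default scaling factor of 0.75 is purely an illustrative assumption and is not a value taken from this portfolio; the function and argument names are hypothetical.

```python
def scaled_extrinsic(l_total, l_in, l_c, y_sys, scale_factor=0.75):
    """Extrinsic output of a Max Scale SISO decoder, following equation (20).

    l_total is L(u_t), l_in is the extrinsic input, l_c the channel
    reliability and y_sys the received systematic value. The default
    scale_factor is an assumption chosen only for illustration.
    """
    return (l_total - (l_in + l_c * y_sys)) * scale_factor


print(scaled_extrinsic(5.0, 1.0, 2.0, 0.5, scale_factor=0.5))  # prints 1.5
```

With a scaling factor of 1 the function simply rearranges (7), so the scaling is the only difference from the Max-log-MAP extrinsic output.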
6.3. Field Programmable Gate Arrays
Field Programmable Gate Arrays (FPGAs) are off-the-shelf hardware parts that offer a 
compromise between the speed of ASICs and the reconfigurability of DSPs. There are 
three basic elements in FPGAs: configurable logic blocks (CLBs), input 
output blocks (IOBs) and routing. The basic structure of an FPGA is shown in Figure 16.
In the Xilinx Virtex-II FPGA each CLB contains 4 slices. Each slice contains two 4-input 
look-up tables (LUTs), two registers, some combinatorial logic used to implement fast 
carry chains, and two multiplexers. The resources used by any given design are measured 
in slices.
Figure 16: FPGA basic structure (CLBs, IOBs and routing matrix)
The routing matrix is programmable, allowing the user to change routing for each new 
design downloaded to the FPGA. As has been stated previously, FPGAs offer a speed 
advantage over DSPs and a reconfigurability advantage over ASICs. However, both DSPs 
and ASICs are smaller and consume less power than FPGAs. To compete with DSPs 
FPGAs now contain elements such as memory and embedded multipliers. The Xilinx 
Virtex-II FPGA [26] has up to 168 18bit x 18bit embedded multipliers (EMults), along with 
168 18Kbit block RAMs (BRAMs). The Xilinx Virtex-II Pro [43] extended the 
programmability of Xilinx FPGAs by including up to 2 embedded PowerPC processors. 
Adding these extra components allows FPGAs to outperform DSPs in applications that 
require intensive computing.
It is well known that the development time of an FPGA design is significantly shorter than that 
of an ASIC design. However, DSPs have a shorter design development time than FPGAs. This is
mainly due to the proliferation of tools available to DSP designers and the fact that DSPs 
can be programmed using high level languages. To combat this Xilinx introduced their 
System Generator tool. System Generator is a Matlab Simulink add-on that allows FPGAs 
to be designed using a simple schematic capture environment using a flow that DSP 
designers are familiar with. The System Generator design flow is shown in Figure 17 [44]. 
System Generator includes various library components that can be used for DSP design. 
These vary in complexity from simple logic gates to a Viterbi decoder. Library 
components are designed in low level VHDL so are highly optimised for implementation in 
Xilinx FPGAs.
Figure 17: System Generator DSP design flow (stages: system level simulation using Simulink blocks; automatic generation of VHDL and testbench; automatic creation of hardware co-simulation target; hardware in the loop co-simulation)
The System Generator design flow can also be presented in a fashion that is familiar to 
hardware design engineers, as shown in Figure 18 [15].
Figure 18: System Generator hardware design flow (stages: Matlab/Simulink System Generator design; verification by Simulink simulation and ModelSim cosimulation; synthesis; map and PAR; generate bitstream and hardware token; FPGA)
This shows the flexibility of System Generator. While System Generator may not allow 
hardware engineers to fine-tune designs in the same way a hardware description language 
(HDL) does, it does allow hardware and systems engineers to work together on the same 
design using a common platform, improving productivity.
6.4. Third Generation Mobile Technology
The main 3G standards are based around spread spectrum technology [45, 46, 47]. Spread 
spectrum uses spreading sequences such as Walsh codes and Gold codes to distinguish 
between different users. This is required as all users within a cell share both time and 
frequency. Figure 19 shows an example of the spreading process. Each user in a cell is 
assigned a specific chip sequence. The base station controlling a cell stores the chip 
sequence assigned to each user and uses a correlator to determine which user is 
transmitting to the base station. The chip rate for the cdma2000 standard is 3.6864 
Mchips/sec (see footnote 1), while UMTS has a chip rate of 3.84 Mchips/sec. One other 
difference between the two standards is that the cdma2000 standard is synchronised with a 
GPS clock while the UMTS standard is completely asynchronous. This makes the UMTS 
receiver more complex; however, the cdma2000 receiver is dependent on government 
satellites for the required GPS information. The cdma2000 standard is to be implemented in 
the United States, while the UMTS standard will be used in Europe. Handover in the US 
cdma2000 system is simpler than in the European UMTS system because previous US 2G 
standards, such as cdmaOne, also used spread spectrum, whereas previous European 
systems have been based on the non-spread-spectrum GSM standard.
Figure 19: Spread spectrum signal (chip period marked)
1 The actual chip rate used is usually 1.2288 Mchips/sec, to match the IS-95 standard.
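The spreading and correlation process described in this section can be sketched with Walsh codes in Python. This is an idealised illustration that assumes perfectly synchronised users, no noise and no multipath. The function names are hypothetical.

```python
def walsh_codes(order):
    """Rows of a 2^order by 2^order Hadamard matrix with +1/-1 entries."""
    codes = [[1]]
    for _ in range(order):
        codes = ([row + row for row in codes]
                 + [row + [-chip for chip in row] for row in codes])
    return codes


def spread(bits, code):
    """Map bits 0/1 to -1/+1 and multiply each by the user's chip sequence."""
    return [(1 if bit else -1) * chip for bit in bits for chip in code]


def despread(chips, code):
    """Correlate the received chips with one user's code, then threshold."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        correlation = sum(c * k for c, k in zip(chips[start:start + n], code))
        bits.append(1 if correlation > 0 else 0)
    return bits


codes = walsh_codes(2)                       # four chip sequences of length 4
user_a, user_b = [1, 0, 1], [0, 0, 1]
# Both users transmit at the same time and frequency, so their chips add.
channel = [a + b for a, b in zip(spread(user_a, codes[1]),
                                 spread(user_b, codes[2]))]
print(despread(channel, codes[1]), despread(channel, codes[2]))
# prints [1, 0, 1] [0, 0, 1]
```

Each user is recovered because the two chip sequences are orthogonal, so the other user's contribution sums to zero in the correlator.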
The multipath delays inherent to mobile channels can cause problems in spread spectrum 
systems as they reduce orthogonality. Figure 20 shows a multipath channel between a user 
and a base station; Figure 21 shows the effect this has on the signal received at the base 
station.
Figure 20: Multipath channel (paths x(t) and x(t+T), arriving as x(t+T) + x(t))
Figure 21: Signal received at base station (waveforms x(t), x(t+T) and x(t+T) + x(t))
A simple solution could be to use the strongest signal received from each user. However, 
to obtain the best possible signal the base station must extract all signals received from each 
user. This is achieved in spread spectrum systems by using a rake receiver [48] at the base 
station. A pilot sequence, known by both the mobile and the base station, is used to 
determine the characteristics of the channel. This pilot sequence can be 
used by the rake receiver to determine the different components of a user's transmitted 
signal. A rake receiver is shown in Figure 22, where Δx represents the delay between each 
received signal.
Figure 22: Rake receiver (the received signal passes through delays Δ1, Δ2 and Δ3 into correlators that share the chip sequence)
Figure 23 shows a typical 3G system architecture, highlighting where channel coding takes 
place within the transport channel. Turbo codes are used in 3G for coding data traffic; the 
inherent latency associated with the recursive Turbo decoder means they are currently 
unlikely to be used for voice traffic.
As Figure 23 shows, multiple users' data are transmitted by one base station, and each user is 
assigned a channel to transmit on. The ability of FPGAs to process multiple channels, or 
users, at once is one major advantage that FPGAs have over DSPs for implementing 
components required by base stations. This advantage is gained because of the parallel 
nature of FPGA architectures. Although the cost of an FPGA is significantly higher than 
the cost of a DSP, the cost per channel is significantly lower [24]. The savings from using an 
FPGA rather than a DSP are estimated to be $490 per channel [24]. Some DSPs now 
contain dedicated hardware to perform tasks such as Viterbi and Turbo decoding [49]. The 
disadvantage of using these hard-wired components is that they cannot be reconfigured 
when a particular standard changes, and they become obsolete when a new standard needs 
to be implemented.
Figure 23: Transport channel of a typical 3G system (for each of User 1, User 2, ..., User i: CRC attachment, channel coding, rate matching/puncturing, 1st interleaving and radio frame segmentation; followed by transport channel multiplexing, physical channel segmentation, 2nd interleaving and physical channel mapping)
The changing nature of 3G standards is one reason for choosing an FPGA implementation 
over an ASIC implementation. One example of this was the fifth release of the UMTS 
standard, which contained standards for High-speed Downlink Packet Access (HSDPA). It 
is estimated that future UMTS systems will transmit 80% of data using HSDPA [50]. 
Therefore, HSDPA will provide a lot of revenue for network providers. Any UMTS base 
station built from a combination of ASICs and DSPs designed before release 5 would have to 
be redesigned, causing lost revenue. If a UMTS base station was 
designed using FPGAs, the release 5 standard could be implemented by reconfiguring the 
FPGAs in the base station. A base station designed using FPGAs gives the network 
provider confidence that future standards can be adhered to without expensive re-designs. 
This is important as network standards are sure to evolve even after they are set. 
Network providers will want to implement systems such as multiple-in, multiple-out
(MIMO) [51] when they become available to increase data rates and improve network 
efficiency.
One other important future development in mobile technology is software defined radio 
(SDR) [52]. The main idea behind SDR is that a mobile can be reconfigured depending 
upon the environment in which it is being used. For example, a user moving from a 
country using a UMTS standard to a country using a cdma2000 standard would have their 
handset reconfigured so that it could be used in a cdma2000 environment. When the user 
returns to a UMTS environment the handset would again be reconfigured. SDR could also 
be used for entertainment purposes: a handset could be reconfigured to play various types 
of media when required. Current ASICs cannot be used to implement SDR systems, whereas 
FPGAs and DSPs can. However, as has already been mentioned, DSPs cannot compete 
with the speed of FPGAs and do not have the same parallel computing power that FPGAs 
have.
Chapter 7: Results and Discussion
7. Results and Discussion
Results in this Chapter show how the cdma2000 and UMTS Turbo codecs perform when 
certain parameters and channel conditions are altered. All results discussed were produced 
using the System Generator BERT and in all cases an additive white Gaussian noise 
(AWGN) channel is used. Section 7.1 discusses the System Generator BERT and its 
novelty. Results produced by the BERT platform are then shown and discussed for both 
the cdma2000 and UMTS Turbo code standards. These results show how each of the 
standards performs when basic inputs such as block size, code rate and the number of 
iterations to be performed are altered. The UMTS Turbo decoder included a fast 
termination algorithm; this algorithm is presented and results obtained when using different 
fast termination thresholds (FTTs) are shown. The BERT was also used to produce novel 
BER results. One value that could be changed in the BERT platform was channel variance. 
The RE used this to find a novel channel variance value that optimised the cdma2000 
Turbo decoder core. A parameterisable Turbo decoder is also presented; by altering the 
parameters of the decoder core the user can determine the amount of FPGA resources used. 
This is done by allowing the user to change parameters such as the SISO 
implementation algorithm, the sliding window size, the width of the input data metric and the 
width of the internal data that represent the α, β and γ metrics. Results obtained when these 
parameters are changed and the impact on the resources used are discussed. Another novel 
result produced by the RE shows that the cdma2000 Turbo code standard outperforms the 
UMTS Turbo code standard. Reasons for the difference in performance are presented.
7.1. Bit Error Rate Test Platform
A top-level view of the Bit Error Rate Test (BERT) Platform is shown in Figure 24. 
Random data from the BERT platform is input to the encoder under test. Encoded data is 
then passed to an additive white Gaussian noise channel, composed of a white Gaussian 
noise (WGN) generator and a soft converter. The soft converter converts a binary 0 to a -1 
and a binary 1 to a +1. After the encoded data is soft converted the WGN block adds noise. 
The noisy data is then passed to the decoder. Decoded data is passed to the BER calculator 
along with the original input data to the encoder. The two are compared and the 
BER is calculated from this comparison; the result is then plotted by a Matlab script. 
A BERT is started by running a Matlab script; the script contains all information needed by
the BERT to run a test, i.e. the number of iterations to be performed, block size values and 
the code rate of the encoder. Figure 25 shows the procedure for running a test on the BERT.
Figure 24: BERT Platform top-level (BERT platform blocks: random data, soft converter, WGN and BER calculator; codec under test: encoder and decoder)
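The soft converter, WGN block and BER calculator can be imitated in software with the Python sketch below. It is not the System Generator implementation. The relation between $E_b/N_0$, code rate and noise standard deviation assumes unit-energy antipodal symbols, which is an assumption made for this illustration, and all function names are hypothetical.

```python
import math
import random


def soft_convert(bit):
    """Soft converter: binary 0 becomes -1, binary 1 becomes +1."""
    return 1.0 if bit else -1.0


def noise_sigma(ebn0_db, code_rate):
    """Noise standard deviation, assuming unit-energy antipodal symbols."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * code_rate * ebn0))


def awgn_channel(bits, ebn0_db, code_rate, rng):
    """Soft convert the encoded bits, then add white Gaussian noise."""
    sigma = noise_sigma(ebn0_db, code_rate)
    return [soft_convert(bit) + rng.gauss(0.0, sigma) for bit in bits]


def bit_error_rate(sent, decoded):
    """BER calculator: fraction of positions where the two sequences differ."""
    errors = sum(1 for s, d in zip(sent, decoded) if s != d)
    return errors / len(sent)


rng = random.Random(1)                       # fixed seed for repeatability
sent = [rng.randint(0, 1) for _ in range(10000)]
received = awgn_channel(sent, ebn0_db=2.0, code_rate=0.5, rng=rng)
decoded = [1 if value > 0 else 0 for value in received]   # no decoder: threshold
print(bit_error_rate(sent, decoded))
```

Here a simple threshold stands in for the decoder under test, so the printed value is an uncoded error rate and is not comparable with the Turbo codec results in this chapter.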
The Matlab script controlling the BERT platform can contain multiple tests, each test 
containing a different value for block size, number of iterations to be performed and code 
rate. The script automatically begins a test once the previous test has finished. If there are 
no more tests to be performed then the simulation is stopped. The results presented in this 
thesis were for two different Turbo codecs. However, the BERT platform could be adapted 
to test other codecs.
For a point to be plotted one of two conditions must be met: the number of errors detected 
must be equal to or greater than x, a value input by the user, or the decoder must have 
output y bits, a value also determined by the user. If neither of these events occurs, 
another packet of data, whose length is equal to the block size stated in the Matlab script, is 
encoded. If either of the conditions is met the BER value to be plotted is checked against a 
minimum BER value, which is determined by the user. If the BER value to be plotted is 
below the minimum BER value it is discarded and the next test is executed. If the BER value 
to be plotted is greater than the minimum BER value it is plotted by 
a Matlab script; the $E_b/N_0$ value is then incremented and the process of plotting another point 
is restarted. If a point is plotted because y bits are received then another user defined input,
minimum errors, is used to ensure that the result to be plotted is within confidence 
intervals. If the Turbo decoder processes y bits, at least z of these bits have to be in error 
for a point to be plotted. If they are not, the next test is executed without that point being 
plotted. Normally for statistical reliability z would be set to 100 bits [53], meaning that to 
meet a certain minimum BER y has to be chosen accordingly. For example, if z = 100 and 
minimum BER = 1 x 10^-7 then y must be at least 1 x 10^9.
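The plotting rules and the choice of y can be checked with a short Python sketch. The function names and return strings are hypothetical, and a BER exactly equal to the minimum is treated as plottable, which is an assumption since the text only covers the below and above cases. Exact fractions are used so that the arithmetic is not affected by floating-point rounding.

```python
import math
from fractions import Fraction


def required_bits(min_errors, min_ber):
    """Smallest y for which min_errors errors in y bits still meets min_ber."""
    return math.ceil(Fraction(min_errors) / Fraction(min_ber))


def point_decision(errors, bits, x, y, z, min_ber):
    """Decide what the BERT does next: 'continue', 'plot' or 'next test'."""
    if errors >= x:
        pass                                 # x errors detected
    elif bits >= y:
        if errors < z:
            return "next test"               # y bits received but too few errors
    else:
        return "continue"                    # encode another packet
    ber = Fraction(errors, bits)
    return "plot" if ber >= Fraction(min_ber) else "next test"


min_ber = Fraction(1, 10**7)
print(required_bits(100, min_ber))                                 # prints 1000000000
print(point_decision(100, 10**6, 100, 10**9, 100, min_ber))        # prints plot
```

The first line reproduces the worked example above: with z = 100 and a minimum BER of 1 x 10^-7, y has to be 1 x 10^9 bits.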
Figure 25: Procedure for running BERT (flowchart steps: run Matlab script; encode random data; add noise; decode; decisions on whether x errors have been detected, whether y bits have been received with more than z errors detected, whether BER < min BER, and whether more tests are to be executed; plot point and write result to file; increment Eb/N0)
The BERT platform implemented in the System Generator tool is shown in Figure 26. The 
control logic in the BERT contains components that compare the encoder input with the 
decoder output and decide when a BER point should be plotted. The advantage of the 
BERT being implemented in System Generator is that it allows non-FPGA designers to use 
and upgrade a BERT platform in an FPGA environment without using standard FPGA 
design techniques such as HDLs. One example of this is the implementation of a 
puncturing block for both the cdma2000 and UMTS Turbo codecs. Neither of these cores 
originally contained a puncturing block. The RE created puncturing blocks for both the 
cdma2000 and UMTS Turbo codecs using System Generator.
Figure 26: System Generator implementation of BERT platform (blocks: random data, cdma2000/UMTS Turbo encoder, AWGN channels, cdma2000/UMTS Turbo decoder and control logic)
Although another BERT platform has been designed in System Generator [44], the BERT 
platform presented in this thesis is almost entirely user programmable whereas the BERT 
previously implemented in System Generator was not. The following elements of the 
BERT platform are user programmable:
• Number of errors to be detected before a point is plotted
• Number of bits to be processed before a point is plotted
• Minimum BER value to be plotted
• Number of iterations to be performed
• Block size of data to be encoded
• Code rate of data to be encoded
Having a user programmable BERT has a two-fold advantage for the sponsoring company. 
It gives them a test platform that is tuneable to any FEC system. Also, having the ability to 
change the test parameters, e.g. number of errors to be detected before a point is plotted, 
highlights the reconfigurability of FPGAs.
Originally the BERT platform contained 5 AWGN channels, one for each encoder output in 
the cdma2000 standard. The smallest FPGA device the BERT platform plus the target 
Turbo codec could be implemented on was a Xilinx Virtex-II 3000. The Virtex-II 3000 
contains a total of 14,336 slices. Although the BERT and Turbo codec only occupy around 
3000 slices, the number of block RAMs they require means that they have to be implemented on
the Xilinx Virtex-II 3000. One advancement made by the RE was to reduce the number of 
channels in the BERT to one; this reduces the number of block RAMs required and means 
the BERT platform and the Turbo codec being tested could be implemented on a Xilinx 
Virtex-II 2000 device. This is highlighted in Table 6, which shows the resources available on both 
the Virtex-II 2000 and the Virtex-II 3000 and the resources used by the BERT when 
implemented with five channels and with only one channel. When the BERT was being 
tested with five channels a flaw was discovered in the Xilinx Synthesis Tool (XST) that 
caused it to crash. The tool would crash because of the number of files generated by the 
BERT when five channels were included in the design. The only solution available at the 
time was to use a different synthesis tool. This flaw was not found when using only a 
single channel. Using the BERT with only one channel therefore had more advantages than 
just reducing the resources required. It also allowed Xilinx customers to take the BERT 
through the System Generator design flow using only Xilinx software tools, meaning 
Xilinx customers would not have to spend extra money on licenses for synthesis tools.
Table 6: Resources available and resources required for various BERT implementations
                                          Slices   EMults   BRAMs
Device Resources Available
  Virtex-II 2000                          10,752   56       56
  Virtex-II 3000                          14,336   96       96
BERT Implementation Resources Required
  Single Channel                          3,481    10       54
  Five Channels                           7,181    34       86
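The device selection implied by Table 6 can be checked with a short sketch. The figures are copied from Table 6; the dictionary layout and the function name `smallest_fitting_device` are illustrative assumptions and are not part of the BERT platform.

```python
# Resource figures copied from Table 6. The data layout is an assumption
# made for this sketch only.
DEVICES = {
    "Virtex-II 2000": {"slices": 10752, "emults": 56, "brams": 56},
    "Virtex-II 3000": {"slices": 14336, "emults": 96, "brams": 96},
}

BERT_DESIGNS = {
    "single channel": {"slices": 3481, "emults": 10, "brams": 54},
    "five channels": {"slices": 7181, "emults": 34, "brams": 86},
}


def smallest_fitting_device(required):
    """Return the smallest listed device that satisfies every requirement."""
    by_size = sorted(DEVICES.items(), key=lambda item: item[1]["slices"])
    for name, available in by_size:
        if all(required[key] <= available[key] for key in required):
            return name
    return None  # none of the listed devices is large enough


for design, required in BERT_DESIGNS.items():
    print(design, "->", smallest_fitting_device(required))
```

With these figures the block RAM count, rather than the slice count, is the resource that rules out the Virtex-II 2000 for the five channel design.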
The main advantage of implementing the BERT platform in hardware instead of software
was the speedup obtained. It was shown in [15] that a hardware implementation of the
BERT platform gave a speedup of 17,500 over a software implementation of the BERT
platform.
7.2. Block Size
When the block size of the data input to a Turbo encoder is increased the BER performance
of the Turbo codec improves. This is shown in Figure 27 and Figure 28. The latter shows
how BER performance improves as block size increases for the UMTS Turbo codec.
Figure 27 shows how BER performance improves as block size increases for the cdma2000
Turbo codec. All BER curves in Figure 27 were produced using the Max Scale SISO
algorithm, a code rate of 1/5 and 5 iterations of the Turbo decoder.
[Bit Error Rate Plot: BER against Eb/N0; cdma2000, Max Scale, 5 iterations, rate 1/5, block sizes 378, 570, 762, 1146, 1530, 2298 and 3066]
Figure 27: BER performance of cdma2000 Turbo codec as block size changes
[Bit Error Rate Plot: BER against Eb/N0; UMTS, Max Scale, 5 iterations, rate matching 0, block sizes 378, 762, 1530 and 3066]
Figure 28: BER performance of UMTS Turbo codec as block size changes
BER performance improves as block size increases because of the interleaver in the Turbo
encoder. As the number of bits input to the Turbo interleaver increases the correlation
between the data input to RSC1 and RSC2 decreases. Decreasing the correlation of the
RSC inputs allows each decoder to output a vastly different representation of the data set
input to the Turbo encoder, allowing each SISO decoder to create a more independent
estimate of the original input data set. Figure 28 shows the error floor for the UMTS Turbo
decoder being reached. As will be shown in Section 7.6, the error floor for the cdma2000
Turbo code implementation is approached later than that of the UMTS standard Turbo code,
leading to the cdma2000 standard having a slightly better performance than the UMTS
standard.
7.3. Iterations
Increasing the number of iterations performed by the Turbo decoder improves the BER
performance. Figure 29 and Figure 30 show how the number of iterations performed affects
the BER performance of the cdma2000 and UMTS Turbo codecs respectively.
[Bit Error Rate Plot: BER against Eb/N0; cdma2000, Max Scale, block size 3066, rate 1/3, iterations 3, 5, 8 and 15]
Figure 29: BER performance of cdma2000 Turbo codec as number of iterations performed increases
[Bit Error Rate Plot: BER against Eb/N0; UMTS, Max Scale, block size 378, rate matching 0, iterations 3, 5 and 9]
Figure 30: BER performance of UMTS Turbo codec as number of iterations performed increases
As the number of iterations increases the coding gain between each single increment
decreases. This can be seen in Figure 29. The coding gain between the BER curves for
iterations equal to 3 and 5 is 0.5dB at a BER of 1×10⁻⁶. At the same BER the coding gain
between iterations equal to 5 and 15 is only 0.4dB.
The UMTS Turbo codec also contained the option of including a fast termination
algorithm. Fast termination allows the Turbo decoder to stop before the specified number
of iterations has been performed. There are a number of different algorithms
available to achieve this [54, 55]. The algorithm implemented for the UMTS Turbo
decoder [54] monitors hard outputs from the SISO decoders and is one of the simplest
algorithms to implement. When a certain number of consecutive SISO decoder outputs are
equal for a particular decoder input at time t the decoder outputs data. The number of
consecutive SISO decoder outputs that have to be equal is determined by the user. The user
can also specify a maximum number of iterations. If this maximum is reached before the
required number of consecutive SISO decoder outputs are determined to be equal, the
decoder begins to output data.
The advantage of fast termination is that the average number of iterations is reduced,
thereby reducing the average latency of the Turbo decoder. As the number of iterations is
reduced the power consumed by the FPGA per decoder cycle is also reduced. Given that
power consumption is a major consideration for base station designers this is a relatively
important result.
The fast termination threshold (FTT) determines how many consecutive SISO decoder
outputs must be equal for a particular decoder input at time t. If the FTT is set to zero,
fast termination is switched off. If the FTT is set to one, two consecutive SISO decoder
outputs must be equal before the decoded data is output. If the FTT is set to two, three
consecutive SISO decoder outputs must be equal before the decoded data is output. This
process continues up to a maximum FTT of seven, in which case eight consecutive SISO
decoder outputs must be equal before the decoded data is output. Figure 31 shows a BER
plot for different FTTs. Figure 32 shows the average number of iterations performed for
each FTT. The disadvantage of using a low FTT is shown between Eb/N0 values of 1.5dB
and 2.25dB in Figure 31. At these Eb/N0 values the curve with an FTT of 3 outperforms
the curve with an FTT of 1. The main advantage of using a low FTT is that the average
number of iterations is lower than for higher FTTs. Up until 1dB the BER curves are
relatively similar because the average numbers of iterations performed are relatively close
together. After 1dB, Figure 31 shows that the two BER curves start to diverge slightly.
The divergence is most obvious above 1.5dB.
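The FTT rule described above can be illustrated with a minimal software sketch. It is not the hardware implementation shown in Figure 33: the function names are hypothetical, the hard outputs are compared as whole blocks rather than per decoder input, and it assumes two SISO operations per iteration.

```python
from typing import List, Sequence


def required_equal_outputs(ftt: int) -> int:
    """Consecutive equal SISO hard outputs needed to stop early (0 = disabled)."""
    if not 0 <= ftt <= 7:
        raise ValueError("FTT must lie between 0 and 7")
    return 0 if ftt == 0 else ftt + 1


def iterations_performed(hard_outputs: Sequence[List[int]], ftt: int,
                         max_iterations: int) -> int:
    """Return the iteration at which decoding stops.

    hard_outputs holds one hard-decision block per SISO operation, in the
    order they are produced (assumed: two SISO operations per iteration).
    """
    needed = required_equal_outputs(ftt)
    run_length = 0
    previous = None
    for index, current in enumerate(hard_outputs[: 2 * max_iterations]):
        # Extend the run if this output matches the previous one, else restart.
        run_length = run_length + 1 if current == previous else 1
        previous = current
        if needed and run_length >= needed:
            return index // 2 + 1  # iteration in which agreement was reached
    return max_iterations  # threshold never met, or fast termination disabled


outputs = [[0, 1], [1, 1], [1, 1], [1, 1], [1, 1]]
print(iterations_performed(outputs, ftt=1, max_iterations=5))
print(iterations_performed(outputs, ftt=3, max_iterations=5))
```

A low FTT stops sooner on the same sequence of hard outputs, which is the latency and BER trade-off visible in Figure 31 and Figure 32.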
[Bit Error Rate plot: BER against Eb/N0; UMTS, Max Scale, block size 762, maximum 5 iterations, rate matching 0, FTT 1 and FTT 3]
Figure 31: BER plot of different fast termination thresholds
[Average Iterations Plot: average iterations against Eb/N0; UMTS, Max Scale, block size 762, maximum 5 iterations, rate matching 0, FTT 1 and FTT 3]
Figure 32: Average iterations plot for different fast termination thresholds
The fast termination algorithm implemented uses hard outputs from each SISO decoder. 
As each SISO decoder is initially not configured to output hard values the Turbo decoder 
structure must be changed. The Turbo decoder structure for implementing the particular 
fast termination algorithm described in this section is shown in Figure 33.
[Block diagram; labelled blocks and signals: fast termination threshold, counter, enable, SISO 1, interleaver, SISO 2, de-interleaver, HD, "for i iterations"]
Figure 33: Turbo decoder with fast termination implemented
The extra resources required for the Turbo decoder with fast termination are one block RAM
and approximately thirty slices.
7.4. Code Rate
Decreasing the Turbo encoder code rate has the advantage of improving BER performance,
as shown in Figure 34. However, it also increases the amount of redundant information
sent by the Turbo encoder. The cdma2000 standard Turbo encoder's puncturing pattern
was discussed in Section 6.2.1. Results obtained when changing the code rate in the
cdma2000 Turbo codec are shown in Figure 34. The drop in performance at higher code
rates is due to the punctured bits being replaced with an all zero codeword at the decoder
input.
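The replacement of punctured bits with zeros at the decoder input can be sketched as follows. The keep mask used here is purely illustrative and is not the cdma2000 puncturing pattern of Section 6.2.1; the function names are also assumptions.

```python
from itertools import cycle


def puncture(coded, keep_mask):
    """Drop every coded value whose position in the repeating mask is 0."""
    return [value for value, keep in zip(coded, cycle(keep_mask)) if keep]


def depuncture(received, keep_mask, coded_length):
    """Re-insert a neutral zero wherever the encoder punctured a value."""
    values = iter(received)
    return [next(values) if keep else 0.0
            for keep, _ in zip(cycle(keep_mask), range(coded_length))]


# Hypothetical mask: keep two values, drop the third.
mask = [1, 1, 0]
coded = [0.9, -1.1, 0.8, 1.2, -0.7, 1.0]
sent = puncture(coded, mask)
restored = depuncture(sent, mask, len(coded))
print(sent)
print(restored)
```

The zeros carry no information about the punctured bits, so the decoder has less to work with as more positions are punctured.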
[Bit Error Rate Plot: BER against Eb/N0; cdma2000, Max Scale, block size 570, 5 iterations, rates 1/2, 1/3, 1/4 and 1/5]
Figure 34: Effect of using different code rates on BER performance in cdma2000 Turbo codec
7.5. Parameterisable Turbo Decoder
The object of the novel parameterisable Turbo decoder is to offer the user of the decoder
various options that they can use to balance performance and the resources used by the
FPGA to suit their particular need. The parameterisable options available are:
• Input data width: The input to the Turbo decoder can be anywhere between 3 and 7
bits in length. The integer part of this input can be either 2 or 3 bits. The fractional
part can be between 1 and 4 bits in length.
• Internal metric width: The integer part of the α, β, γ and log-likelihood metrics can
be either 6 or 7 bits. The fractional part of these metrics can be between 1 and 4
bits.
• Sliding window size: The size of the sliding window can be either 32 or 64.
• SISO algorithm: The SISO algorithm can be either the log-MAP, Max-log-MAP or
Max Scale algorithms.
• External RAM: The data processed and produced by the Turbo decoder can use
memory available on the FPGA or external RAM.
The sliding window technique [56] is a way of reducing the amount of memory required by
the Turbo decoder. If the sliding window technique was not used the number of β and γ
metrics required to be stored would be directly proportional to the block size of the packet
being decoded. If the sliding window technique is used the number of β and γ metrics that
need to be stored is directly proportional to the sliding window size. Depending on the
sliding window size the amount of memory needed for the β and γ metrics is reduced by a
factor of up to 159 for the UMTS standard and up to 647 for the cdma2000 standard.
Figure 35 shows how the sliding window technique works. The packet to be decoded is
split into windows of the same size. In decoding stage 1 an estimate of the starting β metric
for window 1 is gained by performing a backwards recursion on window 2. The initial
value of all β metrics in window 2 is an all zero symbol. In stage 1 the β values calculated
for window 2 are discarded. At the same time a forward recursion is performed on
window 1. All α metrics calculated for window 1 are stored. In stage 2 the β values for
window 1 are calculated. These values and the α metrics calculated in stage 1 can then be
used to calculate the log-likelihood ratios for window 1. At the same time an initial
backward recursion is performed on window 3 to gain a starting point for the β metric
calculations in window 2. A forward recursion on window 2 is also performed at this time.
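The memory reduction factors quoted above appear consistent with dividing the largest block size of each standard by the smaller sliding window size. The sketch below reproduces them under that reading. The UMTS maximum of 5,114 bits is quoted in Section 7.6; the cdma2000 maximum of 20,730 bits is an assumption that is not stated in this chapter.

```python
def memory_reduction_factor(block_size: int, window_size: int) -> int:
    """Whole-number ratio of metrics stored without and with a sliding window."""
    return block_size // window_size


UMTS_MAX_BLOCK = 5114       # largest UMTS packet width, quoted in Section 7.6
CDMA2000_MAX_BLOCK = 20730  # ASSUMPTION: largest cdma2000 Turbo block size

for window in (32, 64):
    print(window,
          memory_reduction_factor(UMTS_MAX_BLOCK, window),
          memory_reduction_factor(CDMA2000_MAX_BLOCK, window))
```

Under these assumptions the quoted factors of 159 and 647 correspond to a sliding window of size 32, and a window of size 64 roughly halves them.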
[Diagram: one packet split into equal windows 1, 2, 3, ...; decoding stage 1 and stage 2 showing the α and β recursions performed on each window]
Figure 35: Sliding window technique
As mentioned previously the sliding window technique decreases the memory required by
the Turbo decoder. It also decreases the clock cycle latency of the Turbo decoder.
However, the BER performance is compromised as the β metrics calculated using the
sliding window technique will have an error associated with them. Traditional Turbo
decoders require a latency that is at least twice the block size of the packet being decoded.
Appendix D {vol. 2/pp. 26-40} presents an architecture that requires a latency of only one
times the block size of a single packet while still retaining the correct β metrics. Although
this architecture was never implemented, theoretically it should reduce the latency of one
Turbo decoder iteration by 1/3, where one Turbo decoder iteration includes two SISO
decoder operations, one interleaver operation and one de-interleaver operation. In a
traditional Turbo decoder a SISO must perform both a forward and backward traversal of
the trellis, so one SISO operation takes approximately twice as many clock cycles as an
interleaver or de-interleaver operation. If it takes c clock cycles to interleave or
de-interleave a data set, then it will take 2c clock cycles to complete one SISO operation on
the same data set, giving a total of 6c clock cycles to complete one Turbo decoder iteration
on any given data set for a traditional Turbo decoder. The architecture proposed halves the
number of clock cycles required to complete one SISO operation to c, giving a total of 4c
clock cycles to complete one Turbo decoder iteration.
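The cycle count argument above can be written out as a small worked example. All counts are in units of c, the number of clock cycles needed to interleave or de-interleave one data set; the function name is an assumption made for this sketch.

```python
from fractions import Fraction


def iteration_cycles(siso_cost: int) -> int:
    """Cycles (in units of c) for one Turbo decoder iteration:
    two SISO operations, one interleave and one de-interleave."""
    interleave_cost = 1
    deinterleave_cost = 1
    return 2 * siso_cost + interleave_cost + deinterleave_cost


traditional = iteration_cycles(siso_cost=2)  # forward and backward traversal
proposed = iteration_cycles(siso_cost=1)     # architecture of Appendix D
reduction = Fraction(traditional - proposed, traditional)

print(traditional, proposed, reduction)
```

The saving of 2c out of 6c is where the reduction of one third in iteration latency comes from.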
The advantage of using a larger sliding window size is that a more accurate value for the β
metrics is obtained. However, the latency required by a larger sliding window size is
relatively larger. This is shown in Figure 36, which also shows that the latency for a
traditional Turbo decoder is almost twice the latency for a Turbo decoder using the sliding
window technique.
[Timing diagram: decoding stages and α/β recursions for a packet of block size 256, shown for a traditional Turbo decoder (two decoding stages of 256 symbols), a sliding window of size 64 (windows 1 to 4) and a sliding window of size 32 (windows 1 to 8)]
Figure 36: Decoding of packet with block size 256 using traditional Turbo decoder, Turbo decoder with
sliding window 64 and Turbo decoder with sliding window 32
A sliding window of size 32 can operate on two windows in the same amount of time that it
takes a sliding window of size 64 to operate on one. Hence, each decoding stage contains
two operations for a sliding window of size 32 and only one operation for a sliding window
of size 64. Each decoding stage for the traditional Turbo decoder takes a total of 256 clock
cycles, one clock cycle per input symbol. Therefore, a total of 512 clock cycles are
required to output all log-likelihood metrics. Each decoding stage for the sliding
window implementations takes 64 clock cycles to complete. The only exception is the
sliding window of size 32 in decoding stage 5, which, as Figure 36 shows, takes only 32
clock cycles. By adding the number of clock cycles for both sliding window sizes in
Figure 36, it can be shown that a sliding window of size 64 will take 320 clock cycles to
decode a set of data containing 256 symbols. A sliding window of size 32 will take 288
clock cycles to decode the same data set. Both of these results conform to (20). Therefore
(20) can be used to estimate the total clock cycles required to decode a data set, given the
number of symbols in the data set and the sliding window size being used to decode the
data set.
clock cycles = block size + sliding window size    (20)
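As a quick check, (20) can be evaluated for the block size of 256 used in Figure 36 and compared with the traditional decoder, which needs two passes over the block. The function names below are assumptions made for this sketch.

```python
def sliding_window_cycles(block_size: int, window_size: int) -> int:
    """Estimate of decode latency from (20)."""
    return block_size + window_size


def traditional_cycles(block_size: int) -> int:
    """Two decoding stages, each one clock cycle per input symbol."""
    return 2 * block_size


block = 256
print(traditional_cycles(block))         # traditional Turbo decoder
print(sliding_window_cycles(block, 64))  # sliding window of size 64
print(sliding_window_cycles(block, 32))  # sliding window of size 32
```

The same totals follow from counting stages in Figure 36: five stages of 64 cycles for a window of 64, and four stages of 64 cycles plus one of 32 cycles for a window of 32.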
The BERT platform is capable of producing frame error rate (FER) plots as well as
BER plots. The FER is the number of errors occurring in one packet of data. Figure 37
shows the FER for a sliding window of size 64 and a sliding window of size 32,
highlighting the advantage of using a relatively larger sliding window size. However, as
Section 7.8 will show, a standard Turbo decoder outperforms a Turbo decoder using the
sliding window technique.
[Plot titled "Bit Error Rate": error rate against Eb/N0; UMTS, Max Scale, block size 762, rate matching 0, sliding window sizes 32 and 64]
Figure 37: FER plot showing different sliding window sizes
Implementing a sliding window of size 64 will require more resources than implementing a 
sliding window of size 32. A Turbo decoder using the Max Scale algorithm with 5 bits 
representing input to the decoder, 9 bits representing internal metrics and a sliding window 
of size 32 will require 3,401 LUTs, 1,668 registers and 47 block memory components. The 
same configuration, with a sliding window size of 64 uses 3,697 LUTs, 1,883 registers and 
47 block memory components.
Sliding window size is not the only parameter that can be used to alter the memory required
by a Turbo decoder. The input and internal metric widths also have an impact on both
memory and performance. As the widths of the internal metrics and the input data increase,
the BER performance of the Turbo decoder improves. Increasing the width of these values
essentially decreases quantization noise. The input data comes from a demodulator, as
Figure 3 shows, and there will be a conversion of the data coming from the demodulator
into the Turbo decoder. Allowing the Turbo decoder to use more bits to represent the input
data means that the quantization error will be reduced. The internal metric width applies to
the α, β, γ and log-likelihood metrics. Increasing the width of these metrics allows the
decoder to produce a more accurate representation of the internal metrics, improving the
BER performance of the decoder. This is shown in Figure 38 for different internal metric
fractional widths; the integer width for all of these BER plots is 6.
[Bit Error Rate plot: BER against Eb/N0; cdma2000, Max Scale, sliding window 32, block size 3066, 5 iterations, rate 1/3, fractional widths 2, 3 and 4]
Figure 38: BER plot for different internal metrics
The Turbo decoder used in Figure 38 uses the Max Scale algorithm with an input data
width of 4 bits and a sliding window of size 32. Fractional widths of 3 and 4 produce
very similar results; however, both are a great improvement over a fractional width of 2. At
a BER of 1×10⁻⁴ the coding gain of using 4 fractional bits compared with 2 fractional bits is
0.07dB. Using 3 fractional bits gives a coding gain of 0.05dB compared with 2 fractional
bits at the same BER. The resources required to implement the Turbo decoder with
different internal fractional widths are shown in Table 7.
Table 7: FPGA resources required for different internal fractional widths
Fractional width   Slices   EMults   BRAMs
2                  1800     2        40
3                  1920     2        42
4                  2066     2        43
As Table 7 shows, the resources required by a Turbo decoder using 4 internal fractional bits
are slightly more than those required by a Turbo decoder using 2 or 3 internal fractional
bits. If the amount of resources used by the Turbo decoder is not an issue, then the core
with an internal fractional width of 4 bits should always be used. Just over 10% more
resources are required when compared with a Turbo decoder using 2 internal fractional
bits. However, if the resources used were an issue then a compromise would be to use a
Turbo decoder with only 3 fractional bits. At a BER of 1×10⁻⁴ the coding gain of using 4
fractional bits compared with 3 fractional bits is only 0.02dB. Each state in the Turbo
decoder is associated with 4 internal metrics: an α, a β and two γ metrics. Each time
instance is also associated with a log-likelihood metric. Each time instance has 8 states and
therefore in total 33 metrics (8 × 4 + 1) are associated with a single time instance.
Increasing the internal metric width by a single bit means that the memory required
increases by 33 bits per time instance.
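The metric count above can be reproduced with a few lines of arithmetic. The constant and function names are assumptions made for this sketch; the values come from the paragraph above.

```python
STATES_PER_TIME_INSTANCE = 8
METRICS_PER_STATE = 4            # one alpha, one beta and two gamma metrics
LLR_METRICS_PER_TIME_INSTANCE = 1


def metrics_per_time_instance() -> int:
    """Number of internal metrics associated with one time instance."""
    return (STATES_PER_TIME_INSTANCE * METRICS_PER_STATE
            + LLR_METRICS_PER_TIME_INSTANCE)


def storage_bits_per_time_instance(metric_width: int) -> int:
    """Bits needed to hold every metric of one time instance."""
    return metrics_per_time_instance() * metric_width


extra_bits = storage_bits_per_time_instance(10) - storage_bits_per_time_instance(9)
print(metrics_per_time_instance(), extra_bits)
```

This counts every metric for a time instance; how many time instances are held at once depends on the sliding window size discussed earlier.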
The SISO algorithm chosen has a significant bearing on the performance of the Turbo 
decoder. Figure 39 highlights the differences between the three algorithms described in 
Section 6.2.3. The Turbo decoder used to create the plots in Figure 39 had an input data 
width of 5 bits, an internal metric width of 9 bits and a sliding window of size 32. Table 8 
shows the resources used for the Turbo decoder implemented with different SISO 
algorithms.
[Bit Error Rate plot: BER against Eb/N0; cdma2000, block size 3066, 5 iterations, rate 1/3; Max-log-MAP, Max Scale and log-MAP]
Figure 39: BER performance of log-MAP, Max-log-MAP and Max Scale algorithm
Table 8: FPGA resources used for different SISO algorithms
SISO Algorithm   Slices   EMults   BRAMs
Max-log-MAP      1971     2        47
Max Scale        1942     2        47
log-MAP          2137     2        47
As mentioned in Section 6.2.3 the Max Scale and Max-log-MAP algorithms are extremely
similar, hence the resources required are very similar. However, Figure 39 shows that the
Max Scale algorithm gives a significant coding gain relative to the Max-log-MAP
algorithm. The coding gain for the Max Scale algorithm compared with the Max-log-MAP
algorithm is almost 0.3dB at a BER of 1×10⁻⁴. Figure 39 also shows the coding gain that is
achieved by the more complex log-MAP algorithm. This has a coding gain of
approximately 0.1dB compared with the Max Scale algorithm. The extra resources required
by the log-MAP algorithm over the other two SISO algorithms are used to implement the
look up tables used in the log-MAP algorithm. Again, if the amount of resources required
by the Turbo decoder were not an issue, then the log-MAP algorithm should always be
chosen as the SISO decoder implementation, as only around 10% more resources are
required by the log-MAP implementation when compared with the Max-log-MAP
implementation.
7.6. cdma2000 vs. UMTS
Using the System Generator BERT the RE was able to compare the cdma2000 and UMTS
Turbo code standards. Figure 40 shows the BER plots for the cdma2000 and UMTS Turbo
decoders at various block sizes. The UMTS results are shown in red, the cdma2000 results
are shown in blue.
Figure 40 shows that the cdma2000 Turbo decoder is better than the UMTS Turbo decoder
at all block sizes tested for the Max Scale implementation. This result was also observed
for the log-MAP algorithm. In some instances the variation in performance could be
attributed to different design techniques or design flows. However, both decoders were
implemented using the same design tools and design flow, which would discount this
argument. The most likely reason for the difference in performance is either the Turbo
encoder or Turbo interleaver. Section 6.2.2 discussed the differences between the
cdma2000 and UMTS Turbo interleavers. This is likely to be a contributing factor to the
coding gain obtained by the cdma2000 Turbo interleaver relative to the UMTS Turbo
interleaver. Although the UMTS Turbo interleaver is more complex and therefore more
difficult to implement in hardware it does have some advantages over the cdma2000 Turbo
interleaver. The main advantage is that the cdma2000 Turbo interleaver can only process
discrete block sizes whereas the UMTS Turbo interleaver can process a packet of any width
between 40 bits and 5,114 bits.
[Bit Error Rate Plot: BER against Eb/N0; cdma2000 and UMTS, Max Scale, 9 iterations, block sizes 378, 1530 and 4602]
Figure 40: Comparison of cdma2000 and UMTS Turbo decoders
Figure 40 also reveals more advantages of using hardware rather than software for
implementing the BERT. Using hardware means that BER plots can be generated at very
low BER values. A software implementation would have to run for weeks or even months
to get to these BER levels. At the low BER levels it can be seen that the BER plot is
approaching the Turbo error floor. The error floor is an inherent problem with Turbo
interleavers. The error floor occurs after the so-called waterfall region of the BER plot.
The waterfall region of the BER plot is where the BER curve has a very steep gradient, i.e.
between 0.3dB and 0.8dB for a block size of 4602 in Figure 40. The error floor of this
BER curve occurs above 0.8dB. One point of note from Figure 40 is that as the plot
showing a block size of 4602 approaches the error floor it has a shallower gradient than the
other two BER curves. By referring to Figure 40 it can be seen that as block size increases
the gradient of the BER plot as it approaches the error floor becomes shallower. Had the
BER curve for a block size of 4602 continued it would have crossed the BER curve for a
block size of 1530. Section 7.7 discusses one particular situation where the BER curve for a
code rate of 1/5 crosses over the BER curves of higher code rates. A novel solution to this
problem is given.
7.7. Channel Variance
The channel variance value, σ², of an AWGN channel has an impact on the input to the
Turbo decoder, which is in the form of a log-likelihood ratio, L(u_t). The effect is shown in
(21), where μ is the mean and y_t is the received channel value.
L(u_t) = 2μy_t / σ²    (21)
The data input to the decoder would usually take the form of 2y_t, as both the mean and
variance would be estimated to be 1.
Channel variance is also linked to Eb/N0 by (22).
Eb/N0 = 1 / (2 × rate × σ²)    (22)
A plot of (22) is shown in Figure 41. If a channel variance estimate of 1 was used then it
means that the Turbo decoder is optimal at Eb/N0 values of 0dB, 1.8dB and 3dB for code
rates of 1/2, 1/3 and 1/4 respectively.
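The Eb/N0 values quoted above follow directly from (22) with a channel variance of 1. A minimal sketch, with function names chosen for illustration only:

```python
import math


def ebn0_db(rate: float, variance: float = 1.0) -> float:
    """Eb/N0 in dB from (22): Eb/N0 = 1 / (2 * rate * variance)."""
    return 10.0 * math.log10(1.0 / (2.0 * rate * variance))


def variance_for(rate: float, ebn0_in_db: float) -> float:
    """Inverse of (22): the channel variance at a given Eb/N0 in dB."""
    return 1.0 / (2.0 * rate * 10.0 ** (ebn0_in_db / 10.0))


for denominator in (2, 3, 4, 5):
    print(f"rate 1/{denominator}: {ebn0_db(1 / denominator):.2f} dB")
```

By the same relation a code rate of 1/5 with a variance of 1 corresponds to an Eb/N0 of roughly 4dB.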
[Plot "Eb/No vs Noise Variance": noise variance against Eb/No (dB); curves for rates 1/5, 1/4, 1/3 and 1/2]
Figure 41: Plot of Eb/N0 against noise variance
It would be expected that if a number of BER plots were generated, each with a different
code rate, the BER curve with the lowest code rate would have the best BER performance.
However, one result produced by the BERT platform, shown in Figure 42, shows that the
BER curves for two of the higher code rates both have a better BER performance than a
code rate of 1/5 above an Eb/N0 value of 1.8dB. This result could have occurred because of
the Turbo interleaver as discussed in Section 7.6. However, after referring to Figure 41 the
RE decided to investigate changing the noise variance value to see how it affected the BER
plot shown in Figure 42. A channel variance value of 1 was used in the channel to generate
the plots in Figure 42.
[Plot "BER vs Eb/No": BER against Eb/No (dB); block size 378, 5 iterations, rates 1/2, 1/3, 1/4 and 1/5]
Figure 42: Non-optimal BER performance of code rate 1/5 using σ² value of 1
The RE used the BERT platform to alter the channel variance value in (21). A number of
channel variance values were tested until an optimum value was found for the code rates in
Figure 42. Noise variance was incremented from a value of 1 to a value of 1.35 in
increments of 0.05 to find the optimal noise variance value. Figure 43 shows the results
obtained when using a code rate of 1/5, zoomed into the region of interest. Values of 1.1
and 1.15 were found to be optimal for a code rate of 1/5. Although variance values of 1,
1.05 and 1.3 seem to perform better at the highest Eb/N0 value they suffer from the
crossover problem highlighted earlier. For a second code rate, noise variance values of 1.1,
1.2 and 1.25 were found to be optimal. Results obtained for the remaining two code rates
showed a similar pattern. As a value of 1.1 is optimal for all code rates, it was chosen as the
preferred σ² value. Figure 12 compares the results obtained when the value of noise
variance was set to 1 and 1.1.
[Plot: BER against Eb/No (dB); rate 1/5, block size 378, 5 iterations, noise variance values 1, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3 and 1.35]
Figure 43: Code rate 1/5 BER plots for noise variance values between 1 and 1.35
[Bit Error Rate plot: BER against Eb/N0; cdma2000, Max-log-MAP, block size 378, 5 iterations, rates 1/2, 1/3, 1/4 and 1/5]
Figure 44: Optimal BER performance of code rate 1/5 using σ² value of 1.1
As Figure 44 shows, the problem of crossover has been resolved, giving a more preferable
result. However, at the expense of solving the crossover problem, the BER performance of
all code rates is compromised, relative to BER plots where the variance value is calculated
using (22). This is shown in Figure 45. BER plots where σ² is known are shown in red,
BER plots where σ² = 1.1 are shown in blue.
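[Bit Error Rate plot: BER against Eb/N0; cdma2000, Max-log-MAP, block size 378, rates 1/2, 1/3, 1/4 and 1/5, each with variance 1.1 and with known variance]
Figure 45: Comparison of BER plots with known σ² and BER plots using σ² = 1.1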
As Figure 45 shows, the BER plots with a known σ² value clearly outperform BER plots
whose σ² = 1.1. At a BER of 10⁻⁶, the BER plots using a known σ² value outperform the
BER plots whose σ² = 1.1 at three of the code rates by around 0.25dB.
7.8. Comparison With Published Results
To qualify the results published in this portfolio, BER plots generated using the System
Generator BERT were compared to results already published by other researchers.
Figure 46 shows results produced by the RE for the cdma2000 decoder. This plot shows
how the log-MAP algorithm performs when using a block size of 1530 at all available
puncturing rates. The results shown in Figure 46 are comparable with those published by
Valenti and Sun [57]. Table 9 shows how both sets of results compare at a BER of 10⁻⁶. At
all rates the results shown in [57] are within 0.1dB of the results produced by the RE.
[Figure 45 legend: cdma2000, Max-log-MAP, block size 378; code rates 1/2, 1/3, 1/4 and 1/5,
each plotted with a variance of 1.1 and with known variance]
Table 9: Comparison of Valenti and Sun [57] with results produced by the RE at a BER of 10^-6
Rate                  1/2       1/3      1/4       1/5
Valenti & Sun [57]    1.7 dB    1 dB     0.75 dB   0.6 dB
RE                    1.75 dB   1.1 dB   0.7 dB    0.65 dB
[Figure 46 plot: Bit Error Rate against Eb/N0]
Figure 46: BER plot for comparison with Valenti and Sun [57]
Figure 47 shows results produced by the RE using the cdma2000 Turbo decoder for a block
size of 378 and between 1 and 8 iterations. Qi [58] reports results for these decoder
settings at an Eb/N0 of 1.2 dB. Table 10 compares the BER results from [58] with the results
shown in Figure 47. The improvement seen in the results published by Qi can be explained
by the fact that Qi used the traditional method of decoding Turbo codes, whereas the RE
used a sliding window of size 32. The improvement is most evident when 5, 6 or 7
iterations are performed. Table 10 shows that for these numbers of iterations
the standard Turbo decoder has a BER that is almost one decade better than the Turbo
decoder using the sliding window technique.
[Figure 46 legend: cdma2000, log-MAP, block size 1530, 14 iterations; code rates 1/2, 1/3,
1/4 and 1/5]
Table 10: Comparison of Qi [58] with results produced by the RE
Iterations 1 2 3 4 5 6 7 8
Qi [58] 6x10 '' 3x 10 2 9x10 '2 2x1 O'3 6x1 O'3
oXr-~ 8x1 O'3 7x1 O'3
RE 3-5x10'' 9-5x1 O'1 4x1 O'2 8-5x1 O'2 9-25x1 O'2 9-75x1 O'2 lx lO '3 2x1 O'3
[Figure 47 plot: Bit Error Rate against Eb/N0]
Figure 47: BER plot for comparison with Qi [58]
Figure 48 shows how the results produced by the RE compare with Sugimoto et al [59] for
a block size of 378, 6 iterations and a code rate of 1/4 using the cdma2000 decoder. Table
11 compares the results produced by the RE and by Sugimoto et al at BERs of 10^-3,
10^-4 and 10^-5. The algorithm used in both cases is the Max Scale algorithm, called the
NSubMap in [59].
Table 11: Comparison of results produced by RE and Sugimoto et al [59]
BER                   10^-3    10^-4     10^-5
Sugimoto et al [59]   0.9 dB   1.2 dB    1.4 dB
RE                    0.9 dB   1.15 dB   1.4 dB
Table 11 shows that the results in [59] are closely comparable with those produced by
the RE, reinforcing the quality of the results produced in this portfolio.
[Figure 47 legend: cdma2000, log-MAP, sliding window 32, block size 378, code rate 1/3; one
curve for each of 1 to 8 iterations]
[Figure 48 plot: Bit Error Rate against Eb/N0; cdma2000, Max Scale, block size 378,
6 iterations, code rate 1/4]
Figure 48: BER plot for comparison with Sugimoto et al [59]
Sugimoto et al [59] was also used for comparison with the results produced by the RE shown
in Figure 49. This plot shows the BER performance of the cdma2000 decoder with a block size
of 6138 at code rates of 1/2 and 1/3, performing 6 iterations. Table 12 compares the results
produced by the RE and those published in [59] at BERs of 10^-3, 10^-4 and 10^-5. The results
shown in [59] and in Figure 49 both use the Max Scale algorithm.
Table 12: Comparison of results produced by RE and Sugimoto et al [59]
Rate                  1/2                           1/3
BER                   10^-3     10^-4    10^-5      10^-3    10^-4    10^-5
Sugimoto et al [59]   0.95 dB   1.2 dB   1.45 dB    0.5 dB   0.6 dB   0.675 dB
RE                    1.1 dB    1.2 dB   1.3 dB     0.5 dB   0.6 dB   0.7 dB
Table 12 again shows that the results produced by the RE are comparable with those shown
in [59]. The difference between the BER curves at the selected BERs is, at most, 0.15 dB.
The UMTS Turbo decoder was used to produce the results shown in Figure 50. Table 13
compares the results shown in [57] with those produced by the RE for a block size of
1530, a decoder using the log-MAP algorithm, a code rate of 1/3 and 5, 7 and 9 iterations
at a BER of 10^-6.
Table 13: Comparison of Valenti and Sun [57] with results produced by the RE at a BER of 10^-6
Iterations            5         7         9
Valenti & Sun [57]    1.2 dB    1.1 dB    1 dB
RE                    1.25 dB   1.15 dB   1.05 dB
[Figure 49 plot: Bit Error Rate against Eb/N0; cdma2000, Max Scale, block size 6138,
6 iterations, code rates 1/2 and 1/3]
Figure 49: BER plot for comparison with Sugimoto et al [59]
[Figure 50 plot: Bit Error Rate against Eb/N0; UMTS, log-MAP, block size 1530, code rate 1/3;
5, 7 and 9 iterations]
Figure 50: BER plot for comparison with Valenti and Sun [57]
As Table 13 shows, the RE and [57] produce results that are, at most, 0.05 dB apart,
showing that the results produced by the RE using the UMTS Turbo decoder are
comparable with those published by other researchers.
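The maximum differences quoted in this section can be checked directly from the tabulated values. The short Python sketch below does so for Tables 9, 12 and 13; the numbers are transcribed from those tables, and the helper name is illustrative only.

```python
# dB values transcribed from Tables 9, 12 and 13 (published work vs. the RE).
TABLES = {
    "Table 9": {
        "published": [1.7, 1.0, 0.75, 0.6],
        "re": [1.75, 1.1, 0.7, 0.65],
    },
    "Table 12": {
        "published": [0.95, 1.2, 1.45, 0.5, 0.6, 0.675],
        "re": [1.1, 1.2, 1.3, 0.5, 0.6, 0.7],
    },
    "Table 13": {
        "published": [1.2, 1.1, 1.0],
        "re": [1.25, 1.15, 1.05],
    },
}


def max_gap_db(published, re):
    """Largest absolute difference between paired entries, rounded to 0.001 dB."""
    return round(max(abs(p - r) for p, r in zip(published, re)), 3)


for name, rows in TABLES.items():
    print(name, max_gap_db(rows["published"], rows["re"]), "dB")
```

The rounding step only removes floating-point noise from the subtraction; it does not change the tabulated precision.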
8. Conclusion
In this portfolio thesis the RE has introduced the subject of Turbo codes, FPGAs and 3G
mobile technology in general. FPGAs were evaluated against other silicon devices such as
ASICs and DSPs. Presently, the main competitor for FPGAs in the 3G market is DSPs.
This thesis has shown that FPGAs have more processing power than DSPs and that
they are cheaper per channel than DSPs. The main advantage of FPGAs over ASICs is
their reconfigurability. Being able to reconfigure FPGAs means that new standards, or
upgrades to standards, can be implemented remotely, potentially saving the network
provider money. The case of the UMTS standard release 5 was highlighted; release 5
standardised the HSDPA architecture for UMTS. Any network provider who had
implemented an earlier release of the UMTS standard in an ASIC would have had to
respin their design, absorbing large non-recurring engineering costs in the process. If the
design had been implemented in an FPGA, the new standard could have been downloaded remotely.
The design of Turbo codes in FPGAs was also covered. The RE proposed a novel hardware
architecture that halved the latency compared to a traditional Turbo decoder. One of the
main advantages of implementing a Turbo decoder in an FPGA rather than a DSP is that
the FPGA can process multiple channels simultaneously. The maximum number of Turbo
decoders that can be implemented on a Xilinx Virtex-II 3000 FPGA is six: this figure is for
a cdma2000 Turbo decoder using the Max Scale algorithm with a data input width of 5 bits,
an internal metric width of 9 bits, a sliding window size of 32 and using external RAM.
The number of cores that can be implemented on a single device is restricted by the number
of block RAMs used. When external RAM is used only 15 BRAMs are required; if
external RAM is not used 47 BRAMs are required. This highlights the large memory
requirements of Turbo decoders.
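The effect of the block RAM budget on the number of cores can be illustrated with simple integer arithmetic. The sketch below assumes that the device provides 96 block RAMs; that figure is not stated in this chapter and should be checked against the Virtex-II data sheet [26]. It also assumes that block RAM is the only limiting resource, which ignores logic and routing.

```python
# Assumption: the target device provides 96 block RAMs (not stated in this
# chapter; verify against the Virtex-II data sheet before relying on it).
DEVICE_BRAMS = 96


def max_decoders(brams_per_decoder: int, device_brams: int = DEVICE_BRAMS) -> int:
    """Number of decoder cores that fit if block RAM is the only limit."""
    return device_brams // brams_per_decoder


print(max_decoders(15))  # decoder using external RAM (15 BRAMs per core)
print(max_decoders(47))  # decoder holding all memory in block RAM (47 BRAMs per core)
```

Under these assumptions the 15-BRAM configuration gives the six cores quoted above, while the 47-BRAM configuration would leave room for far fewer.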
A novel parameterisable Turbo decoder was described. The novelty of this decoder is that
the user can specify a number of parameters that allow them to trade performance against
implementation resources. The portfolio thesis highlighted the performance gains of
altering certain parameters and revealed the extra resources that these performance gains
would require.
Implementing the BERT in System Generator produced a novel design that the sponsoring
company could use to evaluate all FEC cores. It also helped the sponsoring company
evaluate using System Generator for core development. Using System Generator allowed
the RE to create a unique design that could be easily used and could also be configured to
the user's needs. The configurable options in the System Generator BERT allowed the user
to specify how many errors should be detected before plotting a point; at the same time the
BERT monitored how many bits were processed by the decoder. Once a certain threshold
was reached a point would be plotted. The advantage of this technique is that at low Eb/N0
errors are accumulated rapidly; however, at relatively higher Eb/N0 values it takes longer for
the error threshold to be reached. When this occurs the number of bits processed threshold
is used to plot a point, which speeds up the BERT for any given test. The user can also
specify the minimum BER point to be plotted. When this threshold is reached the BERT is
stopped. A Matlab script is used to automate the BERT platform; in the script a user can
specify a number of code rates, block sizes and iterations to be performed so that a number
of tests can be performed consecutively. The RE was able to reduce the resources used by
the System Generator BERT. This allowed the BERT to fit onto a smaller device and
allowed it to be taken through the System Generator design flow using only Xilinx software
tools.
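The stopping rules described above can be sketched in software. The Python below is an illustrative model only, not the System Generator design: the function names, the thresholds and the random error source that stands in for the decoder under test are all assumptions introduced for the example.

```python
import random


def measure_ber_point(block_errors, block_size, error_threshold, bit_threshold):
    """Accumulate blocks until either threshold is reached, then return (BER, bits).

    block_errors is a callable returning the number of bit errors in one
    decoded block (a stand-in for the decoder under test).
    """
    errors = 0
    bits = 0
    while errors < error_threshold and bits < bit_threshold:
        errors += block_errors()
        bits += block_size
    return errors / bits, bits


def run_sweep(error_rates, block_size, error_threshold, bit_threshold, min_ber, seed=1):
    """Measure one BER point per entry and stop once a point falls below min_ber."""
    rng = random.Random(seed)
    points = []
    for p in error_rates:
        # Hypothetical error source: each bit is wrong with probability p.
        source = lambda p=p: sum(rng.random() < p for _ in range(block_size))
        ber, bits = measure_ber_point(source, block_size, error_threshold, bit_threshold)
        points.append((ber, bits))
        if ber < min_ber:
            break
    return points


points = run_sweep([1e-1, 1e-2, 1e-3], block_size=378,
                   error_threshold=100, bit_threshold=50_000, min_ber=5e-3)
for ber, bits in points:
    print(f"BER {ber:.2e} after {bits} bits")
```

With a high error probability the error threshold ends the measurement after a few blocks; with a low error probability the bit threshold takes over, which mirrors the speed-up described in the text.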
The BERT allowed the RE to produce a number of novel results. These were a comparison
of UMTS and cdma2000 Turbo codecs and a channel variance value that optimised the
cdma2000 Turbo decoder.
The BERT was also a great commercial tool for the sponsoring company. It allowed the
sponsoring company to evaluate their FEC cores using their own tools rather than third
party software such as a C simulator, while obtaining a relatively large speed-up. It also
allowed them to give their customers a tool that they could easily use even if they were not
familiar with the FPGA design process. The BERT system is also used extensively as a
company demonstration of the System Generator hardware emulation.
Other work completed by the RE was also of great commercial relevance to Xilinx. The
RE designed a Turbo encoder behavioural model that could be used by Xilinx customers
who purchased their UMTS Turbo encoder core. A cdma2000 standard RSC encoder was
also designed by the RE.
A number of research reports produced by the RE allowed Xilinx to evaluate different
proposals for their Turbo decoder designs. These included reports on subjects such as
Turbo decoder hardware architectures, allowing Xilinx to decide which hardware
architecture to use in their cdma2000 decoder. The RE also evaluated how the DVB
standard Turbo codec could be designed, taking note of the main differences between this
duo-binary Turbo codec and standard binary Turbo codecs. A report on how to calculate
the input values to Turbo decoders led to a novel variance value that improved the
cdma2000 Turbo decoder core. Finally, a number of papers were published by the RE;
these papers highlighted the performance and parameterisability of the Turbo codecs
presented in this thesis.
8.1. Future Direction
The System Generator BERT described in this thesis used an AWGN channel to evaluate
the Turbo codecs presented. An AWGN channel allows the user to evaluate the Turbo
codecs presented in this thesis against other implementations; however, AWGN channels do
not represent a real mobile channel. The AWGN channel could be replaced by one which
is more like a real mobile environment, such as a Rayleigh fading or Rician fading channel.
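The difference between the two channel types can be illustrated with a short model. The Python sketch below is an assumption-laden example, not part of the BERT: it uses BPSK symbols, flat Rayleigh fading with an independent gain per symbol and unit mean-square gain, and no Rician line-of-sight component. The function names are illustrative only.

```python
import math
import random


def awgn(symbol: float, variance: float, rng: random.Random) -> float:
    """Add zero-mean Gaussian noise of the given variance to one BPSK symbol."""
    return symbol + rng.gauss(0.0, math.sqrt(variance))


def rayleigh_gain(rng: random.Random) -> float:
    """Rayleigh-distributed amplitude with unit mean-square value."""
    in_phase = rng.gauss(0.0, math.sqrt(0.5))
    quadrature = rng.gauss(0.0, math.sqrt(0.5))
    return math.hypot(in_phase, quadrature)


def rayleigh_fading(symbol: float, variance: float, rng: random.Random) -> float:
    """Scale the symbol by an independent Rayleigh gain, then add Gaussian noise."""
    return awgn(rayleigh_gain(rng) * symbol, variance, rng)


rng = random.Random(0)
tx = [1.0, -1.0, 1.0, 1.0]
rx_awgn = [awgn(s, 0.5, rng) for s in tx]
rx_fading = [rayleigh_fading(s, 0.5, rng) for s in tx]
print(rx_awgn)
print(rx_fading)
```

A decoder fed from the fading model would also need the per-symbol gain, or an estimate of it, to scale its inputs correctly, which is one reason the fading case is harder to evaluate than AWGN.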
Another department in the sponsoring company produced a cdma searcher in System
Generator. If more 3G components were implemented in System Generator the sponsoring
company could build a demonstration of a 3G base station. The System Generator BERT
would play an integral part in this model. Using such a model would be of great
commercial advantage to the sponsoring company.
Some of the architectures proposed by the RE could be implemented and tested against the
sliding window architectures already implemented. Potentially the most interesting is
the novel architecture proposed by the RE: although the memory usage would be high, it
would be interesting to see the performance advantages gained.
The RE implemented one novel channel variance value to improve the cdma2000 Turbo
decoder core. A future development would be to extend this so that each code rate at
discrete Eb/N0 values was associated with a certain variance value. These values could be
stored as a look-up table on the FPGA. Alternatively, (21) could be implemented. Both of
these implementations would require an estimate of the channel to be input to the decoder.
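A look-up table of this kind could be organised as sketched below. This is a hypothetical software model, not an FPGA implementation: the Eb/N0 grid is arbitrary, and the table entries are placeholders computed from the standard unit-energy BPSK relation σ² = 1/(2·R·Eb/N0), whereas a real table would hold tuned values.

```python
import bisect

# Hypothetical grid of Eb/N0 points (dB) at which variance values are stored.
EBNO_POINTS_DB = [0.0, 0.5, 1.0, 1.5, 2.0]


def build_table(code_rates):
    """Placeholder entries from sigma^2 = 1 / (2 * R * Eb/N0); not tuned values."""
    table = {}
    for rate in code_rates:
        table[rate] = [1.0 / (2.0 * rate * 10.0 ** (db / 10.0)) for db in EBNO_POINTS_DB]
    return table


def lookup_variance(table, code_rate, ebno_estimate_db):
    """Return the stored variance for the highest grid point not above the estimate."""
    index = bisect.bisect_right(EBNO_POINTS_DB, ebno_estimate_db) - 1
    index = max(0, min(index, len(EBNO_POINTS_DB) - 1))  # clamp to the table
    return table[code_rate][index]


TABLE = build_table([1 / 2, 1 / 3, 1 / 4, 1 / 5])
print(lookup_variance(TABLE, 1 / 2, 1.2))
```

The look-up takes the channel estimate as an input, which reflects the point made above that either approach depends on an estimate of the channel being available to the decoder.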
In terms of future coding developments, low-density parity check (LDPC) codes [60] are
emerging as a challenge to Turbo codes in future mobile standards. LDPC codes should be
evaluated against current Turbo code implementations to determine what improvements are
possible.
9. References
[1] C. Dick, J. Steensma. FPGA Implementation of a 3GPP Turbo codec. Signals,
Systems and Computers, 2001. Conference Record of the Thirty-Fifth Asilomar
Conference on, Volume 1, 4-7 Nov. 2001, pp. 61-65.
[2] 3rd Generation Partnership Project (3GPP). Technical Specification Group Radio
Access Network; Multiplexing and Channel Coding (FDD). 3GPP TS 25.212 V5.6.0,
2003.
[3] D. Johnson. Programming a Xilinx FPGA in "C". Xilinx Xcell Journal No. 34, pp.
26-30, December 1999.
[4] D. Johnson. Architectural Synthesis from Behavioral Code to Implementation in a
Xilinx FPGA. Xilinx Xcell Journal No. 36, pp. 23-25, June 2000.
[5] 3rd Generation Partnership Project 2 (3GPP2). C.S0002-C Physical Layer Standard
for Spread Spectrum Systems, Version 1.0, Release C, May 2002.
[6] Nallatech. BERT Test Platform User Guide Issue 10. May 2002,
www.nallatech.com, last accessed July 2002.
[7] E. Brown. Reconfigurable cores for wireless appliances: turbo codes. Poster
presentation, SET for Europe, London, UK, December 2002.
[8] Digital Video Broadcasting. Interaction Channel for Satellite Distribution Systems,
ETSI EN 301 790, Version 1.3.1, April 2003.
[9] E. Brown, J. Irvine, B. Wilkie. A memory-efficient parameterisable FPGA
implementation of the cdma2000 codec. Colloquium on DSPenabled Radio,
Livingston, Scotland, 22-23 Sep 2003.
[10] E. Brown, J. Irvine, B. Wilkie. A memory-efficient parameterisable FPGA
implementation of the cdma2000 codec. World Wireless Congress, San Francisco,
CA, 25-28 May 2004.
[11] Xilinx. System Generator User Guide.
http://www.xilinx.com/products/software/sysgen/app_docs/user_guide.htm, last
accessed June 2004.
[12] Nallatech. Virtex-II XtremeDSP Development Kit.
http://www.xilinx.com/ipcenter/dsp/XtremeDSP_Development_Kit-II_User_Guide.pdf,
last accessed June 2004.
[13] Annapolis. Annapolis Wildcard II Data Sheet.
http://www.annapmicro.com/datasheets/wcii_markdata_l2969_l_2.pdf, last accessed
June 2004.
[14] Alpha Data. ADM-XRC-II PC Mezzanine Card User Guide v1.9.
http://www.alpha-data.com/pdf/adm-xp.pdf, last accessed June 2004.
[15] E. Brown, J. Irvine, B. Wilkie. Rapid Prototyping of an Automated Test Harness for
Forward Error Correcting Codes. 11th European Wireless Conference, pp. 79-83,
Nicosia, Cyprus, 10-13 April 2005.
[16] E. Brown, J. Irvine, B. Wilkie. Rapid Prototyping of a Test Harness for Forward
Error Correcting Codes. ACM/SIGDA Thirteenth International Symposium on Field
Programmable Gate Arrays (FPGA 2005), Monterey, CA, 20-22 Feb 2005.
[17] BBC. 3G Goes Live in the UK. http://news.bbc.co.uk/1/hi/technology/2808761.stm,
last accessed February 2005.
[18] M. Benson, H.J. Thomas. Investigation of the UMTS to GSM Handover Procedure.
IEEE 55th Vehicular Technology Conference, VTC Spring 2002, Volume 4, pp. 1829-1833.
Birmingham, USA, 6-9 May 2002.
[19] E. Jugl, U. Bernhard, H. Pampel. Strategy and Performance on UMTS-GSM
Handover. 4th International Conference on 3G Mobile Communication Technologies,
3G 2003, Volume 2, pp. 1307-1312. London, UK, 25-27 June 2003.
[20] D. Lugara, J. Tartiere, L. Girard. Performance of UMTS to GSM Handover
Algorithms. 15th International Symposium on Personal, Indoor and Mobile Radio
Communications, PIMRC 2004, Volume 1, pp. 444-448. Barcelona, Spain, 5-8
September 2004.
[21] BBC. Video Phones Show Slow Take Off.
http://news.bbc.co.uk/1/hi/technology/3322359.stm, last accessed February 2005.
[22] BBC. UK Gearing Up For 3G Christmas.
http://news.bbc.co.uk/1/hi/business/3739834.stm, last accessed February 2005.
[23] ZDNet UK. 3 Leans on LG as 3G Gathers Momentum.
http://news.zdnet.co.uk/communications/3ggprs/0,39020339,39164118,00.htm, last
accessed February 2005.
[24] BDTI. Alternatives to DSP: What and Why?
www.bdti.com/articles/20030527_Alternatives_to_DSPs.pdf, last accessed April
2005.
[25] www.sdrforum.com
[26] Xilinx. Xilinx Virtex-II Data Sheet.
http://direct.xilinx.com/bvdocs/publications/ds031.pdf, last accessed February 2005.
[27] C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical
Journal, Vol. 27, pp. 379-423, July 1948, pp. 623-656, October 1948.
[28] S. Lin, D.J. Costello Jr. Error Control Coding: Fundamentals and Applications.
Prentice-Hall, 1983, ISBN 0-13-283796-X, pp. 51-84.
[29] S. Benedetto, E. Biglieri. Principles of Digital Transmission With Wireless
Applications. Kluwer Academic/Plenum Publishers, 1999, ISBN 0-306-45753-9, pp.
452-527.
[30] D. Divsalar, F. Pollara. Turbo Codes for PCS Applications. IEEE Proceedings 1995
International Communications Conference, Seattle, USA, pp. 54-59, June 1995.
[31] C. Berrou, A. Glavieux, P. Thitimajshima. Near Shannon Limit Error-correcting
Coding and Decoding: Turbo codes. IEEE Proceedings 1993 International
Communications Conference, pp. 1064-1070, 1993.
[32] S. Benedetto, G. Montorsi. Unveiling Turbo Codes: Some Results on Parallel
Concatenated Coding Schemes. IEEE Transactions on Information Theory, Vol. 42,
No. 2, pp. 409-428, March 1996.
[33] D. Divsalar, F. Pollara. Turbo Codes for Deep-Space Communications.
http://tmo.jpl.nasa.gov/tmo/progress_report/42-120/120D.pdf, last accessed April
2005.
[34] Consultative Committee for Space Data Systems (CCSDS). Recommendation for space
data system standards: Telemetry and channel coding. CCSDS 101.0-B-6, Blue Book,
October 2002.
[35] A.J. Viterbi. Convolutional Codes and Their Performance in Communication
Systems. IEEE Transactions on Communications, Vol. 19, No. 5, pp. 751-772,
October 1971.
[36] J. Hagenauer and L. Papke. Decoding Turbo Codes With The Soft Output Viterbi
Algorithm (SOVA). Proceedings of the International Symposium on Information
Theory, pp. 164, 1994.
[37] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv. Optimal Decoding of Linear Codes for
Minimizing Symbol Error Rate. IEEE Transactions on Information Theory, pp. 284-287,
March 1974.
[38] P. Robertson, E. Villebrun and P. Hoeher. A Comparison of Optimal and Sub-Optimal
MAP Decoding Algorithms Operating in the Log Domain. IEEE
International Conference on Communications, Vol. 2, pp. 1009-1013, 18-22 June
1995.
[39] W. Ryan. A Turbo Code Tutorial.
http://www.ee.udel.edu/~fu/images/tc_tutorial.pdf, last accessed April 2005.
[40] P. Robertson and P. Hoeher. Optimal and sub-optimal maximum a posteriori
algorithms suitable for Turbo decoding. European Trans. on Telecommunications,
Vol. 8, No. 2, 1997, pp. 119-125.
[41] E. Boutillon, W.J. Gross, G. Gulak. VLSI Architectures for the MAP Algorithm.
IEEE Transactions on Communications, Vol. 51, No. 2, pp. 175-185, February 2003.
[42] J. Vogt, A. Finger. Improving the MAX Log MAP Turbo Decoder. IEEE Electronics
Letters, Vol. 36, No. 23, pp. 1937-1939, November 2000.
[43] Xilinx. Xilinx Virtex-II Pro Data Sheet.
http://direct.xilinx.com/bvdocs/publications/ds083.pdf, last accessed April 2005.
[44] V. Singh, A. Root, E. Hemphill, N. Shirazi, J. Hwang. Accelerating Bit Error Rate
Testing Using a System Level Design Tool. 11th Annual IEEE Symposium on
Field-Programmable Custom Computing Machines, 9-11 April 2003, pp. 62-68.
[45] G.R. Cooper, R.W. Nettleton, D.P. Grybos. Cellular Land-Mobile Radio: Why
Spread Spectrum. IEEE Communications Magazine, Vol. 17, No. 2, March 1979, pp.
17-23.
[46] H. Holma, A. Toskala. WCDMA for UMTS - Second Edition. J. Wiley and Sons,
ISBN 0470844671, 2002.
[47] A.J. Viterbi. CDMA - Principles of Spread Spectrum Communications.
Addison-Wesley, ISBN 0201633744, 1995.
[48] R. Price, P.E. Green Jr. A Communication Technique for Multipath Channels.
Proceedings of Institute of Radio Engineers, Vol. 46, March 1958, pp. 555-570.
[49] Texas Instruments. TMS320TCI100 Fixed-Point Digital Signal Processor Datasheet.
http://focus.ti.com/general/docs/lit/getliterature.tsp?baseLiteratureNumber=sprs218,
last accessed April 2005.
[50] Xilinx. Using FPGAs in Wireless Base Station Designs.
http://www.xilinx.com/publications/xcellonline/xcell_52/xc_pdf/xc_v4wireless52.pdf,
last accessed April 2005.
[51] X. Li, H. Huang, G.J. Foschini, R.A. Valenzuela. Effects of Iterative Detection and
Decoding on the Performance of BLAST. IEEE Global Telecommunications
Conference: GLOBECOM 2000, 27 November - 1 December, Vol. 2, pp. 1061-1066.
[52] IEEE. Software Radios. IEEE Communications Magazine, Vol. 3, No. 5, May 1995.
[53] M.C. Jeruchim, P. Balaban, and K.S. Shanmugan. Simulation of
Communication Systems. Plenum Press, New York, 1992.
[54] A. Matache, S. Dolinar, F. Pollara. Stopping Rules for Turbo Decoders. TMO
Progress Report 42-142, http://tmo.jpl.nasa.gov/tmo/progress_report/42-142/142J.pdf,
August 15, 2000, last accessed April 2005.
[55] D. Garrett, Xu Bing, C. Nicol. Energy efficient turbo decoding for 3G mobile.
International Symposium on Low-Power Electronics and Design, 6-7 August 2001,
pp. 328-333.
[56] S.A. Barbulescu. Sliding Window and Interleaver Design. Electronics Letters, Vol.
37, Issue 21, 11 October 2001, pp. 1299-1300.
[57] F. Dowla. Handbook of RF and Wireless Technologies. Newnes, 2003. ISBN
0750676957, pp. 375-400.
[58] J. Qi. Turbo Code in IS-2000 CDMA Communications Under Fading. Wichita State
University, MSc Thesis, October 1999, pp. 28.
[59] H. Sugimoto, T. Wang, X. Ping. A Sub-MAP Decoder Performance in IMT2000
Mobile Environment. 2nd International Symposium on Turbo Codes & Related Topics,
4-7 September 2000.
[60] R.G. Gallager. Low Density Parity Check Codes. IRE Transactions on Information
Theory, Vol. IT-8, pp. 21-28, January 1962.
Reconfigurable Cores for Wireless 
Appliances: Turbo Codes
VOLUME 2 (OF 2)
Edward Brown
A themed portfolio submitted to 
The Universities of
Edinburgh 
Glasgow 
Heriot Watt 
Strathclyde
for the Degree of 
Doctor of Engineering in System Level Integration
© Edward Brown, July 2004
Contents
Appendix A: Algorithm Development - Encoder Behavioural Model
Appendix B: Algorithm Development - Decoder Behavioural Model
Appendix C: Introduction to Turbo Codes
Appendix D: Investigation of Turbo Decoder Hardware Architectures
Appendix E: Duo-Binary Turbo Codes
Appendix F: Calculating Input Values for Turbo Decoders
Appendix G: Published Papers
