A New Simplified Algorithm Suitable for Implementation on FPGA for Turbo Codes by Thangarajah, Krishnamohan
University of Windsor 
Scholarship at UWindsor 
Electronic Theses and Dissertations Theses, Dissertations, and Major Papers 
2010 
Recommended Citation 
Thangarajah, Krishnamohan, "A New Simplified Algorithm Suitable for Implementation on FPGA for Turbo 
Codes" (2010). Electronic Theses and Dissertations. 7998. 
https://scholar.uwindsor.ca/etd/7998 




Submitted to the Faculty of Graduate Studies
Through Electrical and Computer Engineering
In Partial Fulfillment of the Requirements
for the Degree of Master of Applied Science
at the University of Windsor
Windsor, Ontario, Canada
2010
© 2010 Krishnamohan Thangarajah













ISBN: 978-0-494-70590-2




Author's Declaration of Originality
I hereby certify that I am the sole author of this thesis and, to the best of my
knowledge, my thesis does not infringe upon anyone's copyright nor violate any
proprietary rights and that any ideas, techniques, quotations, or any other material from
the work of other people included in my thesis, published or otherwise, are fully
acknowledged in accordance with the standard referencing practices.
I declare that this is a true copy of my thesis, including any final revisions, as
approved by my thesis committee and the Graduate Studies office, and that this thesis
has not been submitted for a higher degree to any other University or Institution.
Abstract
In this thesis, a new algorithm for Turbo codes and a novel implementation of a
turbo decoder employing this algorithm are developed. The decoder has optimal
performance in terms of Bit Error Rate (BER) at all Signal to Noise Ratios (SNR), for all
frame sizes and for any number of states of the Turbo code. In the hardware
implementation, we combine the normalization and metric modules into a single module in
order to minimize the internal connection delay, which is the bottleneck in hardware
implementation, so that the result can be obtained in a single clock cycle. With this
implementation, a data rate of 28 Mbps has been achieved for a 16-state decoder. This can
be further improved by changing the algorithm for the normalization modules and LLR
modules to use the MAX operator. The metric modules with the proposed algorithm and the
normalization and




To my ever loving family
Mom, Dad, sisters Mala and Kala, my wife Shammy, my kids Vithu and Jathu,
my nephew Keeran, and my niece Nakeeta.




With utmost sincerity I express my gratitude and respect to my advisors Dr.
Behnam Shahrrava and Dr. Mohammed Khalid, who have always inspired me to work
with honesty and discipline. Their timely guidance has been indispensable to the
completion of this thesis. I am also thankful to my committee members Dr. Robert Kent
and Dr. Mitra Mirhassani for their remarkable comments.
I am also thankful to Sundeep Lal and Matt Murawski for their helpful comments,
and I extend my thanks to Dr. Rashid Rashidzadeh for letting me use the
RCIM lab for the simulations.
This note would be incomplete without thanking the graduate departmental
secretary, Andria Ballo, for always being there with valuable guidance.
Table of Contents
Author's Declaration of Originality
Abstract
A Sincere Dedication
Acknowledgement
List of Figures
List of Tables
List of Abbreviations
CHAPTER 1 INTRODUCTION
1.1 Problem Statement
1.2 Motivation
1.3 Research Methodology
1.4 Principal Results
1.5 Thesis Organization
CHAPTER 2 Turbo Codes and Its Implementation Issues
2.1 Fundamentals of Turbo Codes
2.1.1 Mathematical Background
2.1.2 Literature Survey
2.1.2.1 LOG-MAP
2.1.2.2 MAX-LOG-MAP
2.1.2.3 SIMPLIFIED LOG-MAP
2.1.2.4 IMPROVED LOG-MAP
2.2 Fixed Point Representation
2.3 Implementation Issues of Turbo decoder
CHAPTER 3 New Simplified Algorithm Suitable for Implementation on FPGA for Turbo Decoding
3.1 New Algorithm
3.1.1 Comparison of Complexity of All Algorithms
3.2 Software Implementation and Simulation
3.2.1 Software Implementation of Turbo Decoder with New Algorithm
3.2.2 Floating Point Simulation
3.2.2.1 State-4 Turbo Decoder
3.2.2.2 State-16 Turbo Decoder
3.2.3 Fixed Point Simulation
3.3 Observations From the Simulation
CHAPTER 4 HARDWARE IMPLEMENTATION AND VALIDATION
4.1 Hardware Implementation of Turbo Decoder
4.1.1 16-State Turbo Decoder on FPGA
4.1.1.1 Trellis Diagram of the Turbo Decoder
4.1.1.2 Gamma Module
4.1.1.3 FinalAlpha Module
4.1.1.4 FinalBeta Module
4.1.1.5 LLRCore of Turbo Decoder
4.1.1.6 LECore of Turbo Decoder
4.1.1.7 Control Module of Turbo Decoder
4.1.2 MyLog104 Module for MAX*
4.2 Optimization of Data Rate
4.3 Hardware Synthesis and Simulation
4.3.1 Hardware Synthesis of Turbo Decoder
4.4 Novelties of Implementation
4.5 Observation from the Hardware Simulation
CHAPTER 5 CONCLUSIONS
REFERENCES
APPENDIX
A1. Matlab codes used in this thesis





List of Figures
Figure 2-1 Turbo Encoder with code rate 1/3
Figure 2-2 RSC encoder with 4 memory elements
Figure 2-3 Serial MAP Turbo decoder
Figure 3-1 Correction values to be added with MAX
Figure 3-2 Flowchart for MATLAB simulation of Turbo decoder
Figure 3-3 Simulation results for all algorithms
Figure 3-4 Simulation result for low SNR
Figure 3.5 Effect of forward normalization and backward normalization
Figure 3.6 Different frame sizes with 0 dB SNR
Figure 3.7 Different frame sizes for 0.5 dB
Figure 3.8 Different frame sizes for SNR 1
Figure 3.9 Different iterations with frame size 1024
Figure 3.10 From very low to moderate SNR with different iterations
Figure 3.11 Simulation results for 16-state decoder with 5 iterations for all algorithms
Figure 3.12 Simulation results of 16-state decoder for MAP and new algorithm with 4
iterations and Improved MAX-LOG-MAP and Simplified MAX-LOG-MAP with 5 iterations
Figure 3.13 Bit Error Rate for all algorithms with low SNR
Figure 3.14 Simulation results for two different fixed point representations
Figure 3.15 Comparison of fixed point (10,4) and (8,4) representations
Figure 4.1 HDL blocks for the Turbo decoder
Figure 4.2 Trellis diagram for the generator polynomial [1 0 0 0 1; 1 1 1 1 1]
Figure 4.3 Gamma module used as GAMMAF and GAMMAB in the DECODER
Figure 4.4 Final alpha for the Turbo Decoder
Figure 4.5 Total Alpha module without normalization
Figure 4.6 Basic block for Alpha module
Figure 4.7 Normalization module for Alpha
Figure 4.8 Final Beta with normalization
Figure 4.9 Final Beta without normalization
Figure 4.10 Basic block for backward metric calculation
Figure 4.11 Normalization module for Beta
Figure 4.12 Architecture for finding LLRone
Figure 4.13 Architecture for computing LLRzero
Figure 4.14 LLR module
Figure 4.15 Block diagram for extrinsic computation
Figure 4.16 Flow chart for the control module
Figure 4.17 Control module
Figure 4.18 Signal flow diagram for the new algorithm
Figure 4.19 Comparison of single algorithm with double algorithm




List of Tables
Table 3.1 Slope and the intersection for the selected regions
Table 3.2 Complexity of the algorithms
Table 4.1 Stratix II: EP2S180F150814 features
Table 4.2 Memory and its usage
Table 4.3 Description of the modules used in the DECODER
Table 4.4 Signal description for GAMMACORE
Table 4.5 Signal description for Alpha module
Table 4.6 Signal description of FINALBETA
Table 4.7 State and signal assignment
Table 4.8 Resources and frequency comparison
Table 4.9 Resource comparison for the Turbo Decoder
Table 4.10 Comparison of throughput with recent implementation
List of Abbreviations
3GPP Third Generation Partnership Project
4G Fourth Generation
ABS Absolute
AWGN Additive White Gaussian Noise
ALUT Adaptive Look Up Table
BCJR Bahl, Cocke, Jelinek, Raviv
BER Bit Error Rate
BPSK Binary Phase Shift Keying
CE Cross Entropy
CRC Cyclic Redundancy Check
DSP Digital Signal Processors
FPGA Field Programmable Gate Arrays
HDA Hard Decision Aided
HDL Hardware Description Language
LLR Log Likelihood Ratio
MAP Maximum A Posteriori Probability
MAX Maximum
MULT Multiplier
NRE Non-Recurring Engineering
RSC Recursive Systematic Convolutional
SCR Sign Change Ratio
SISO Soft Input Soft Output
SNR Signal to Noise Ratio
SOVA Soft Output Viterbi Algorithm
VHDL Very high speed integrated circuit HDL
VLSI Very Large Scale Integration
UMTS Universal Mobile Telecommunication System
CHAPTER 1
INTRODUCTION
Channel codes can be classified into two major classes: block codes and
convolutional codes. In block codes, an information sequence of length k is
mapped into a binary sequence of length n, called a codeword, and the code rate is
defined as k/n. Block codes are memoryless, i.e. the codeword depends only on the
current k information bits, whereas convolutional codes are generated by finite-state
machines, so the current codeword depends not only on the current data bit but also
on the state of the machine. Turbo codes belong to the class of convolutional codes.
In this chapter, the importance of Turbo codes and their usage are addressed, and
the motivation for this thesis and its principal results are briefly explained.
1.1 Problem Statement
Due to the presence of distortion, noise, and interference, error-free digital
communication is not possible without channel coding, which adds redundant
information, called parity bits, to the data bits in order to detect and correct errors.
After the invention of Turbo codes [1], whose BER performance is very close to
Shannon's theoretical limit, most researchers tried to find a practical decoding
algorithm that can be implemented in real-world applications. These algorithms are
optimal at medium to high SNR for small constraint lengths of the encoder.
The decoding algorithm used in turbo codes can be MAP or SOVA, but MAP has
better BER performance than SOVA. The complexity inherent in the MAP algorithm
makes it impractical to implement in hardware; as a result, the LOG-MAP algorithm
became a feasible alternative to MAP without incurring any performance loss. Since
research on improving the data rate while minimizing the BER performance loss is still
ongoing, a VLSI implementation of turbo codes is not a practical solution because of its
NRE cost. The feasible solution is an implementation on FPGA, which can be reconfigured
when modifications are needed or when the standards change according to the
research outcomes.
To improve the data rate, researchers came up with the windowing technique [22],
in which the frame is divided into smaller windows that are processed in parallel, with a
data acquisition phase used to initialize the backward metrics of each window. This type
of implementation has some drawbacks: it cannot be used in power-limited applications,
it has some BER performance degradation, and it is not a feasible solution for small
frame sizes.
Because of their near-Shannon-limit performance, turbo codes have been
incorporated into many standards such as those of the Consultative Committee for Space
Data Systems (CCSDS), 3GPP/UMTS, and cdma2000 [10], as well as the IEEE 802.16
(WiMAX) standard [11]. For 4G systems, turbo codes should support different frame sizes
ranging from 100 to 10,000 and data rates of 100 Mbps to 1 Gbps. The existing
algorithms suffer BER degradation for large frame sizes.
The objective of this thesis is to develop an optimal algorithm for MAX* that
avoids approximation errors as much as possible, so that the quantization errors due to
the fixed point implementation are minimized, and to increase the throughput of the
decoder so that it can support 4G applications.
1.2 Motivation
The near-Shannon-limit BER performance of Turbo codes motivated the search for a
feasible algorithm that achieves a performance similar to that of the LOG-MAP algorithm.
The high data rate needed for 4G applications motivated this research to find a solution
that does not affect the BER performance for any frame size, at all SNR values from low
to high, and independent of the encoder's constraint length.
1.3 Research Methodology
Research papers related to Turbo decoding and its implementation, provided by
my advisors, were analyzed thoroughly to understand the principle of Turbo codes and
their implementation issues, and recent publications were also studied to analyze the
role of the interleavers and the windowing technique in Turbo codes. Once the principle
of Turbo codes had been analyzed, a new algorithm was developed and its performance was
simulated using Matlab to verify its validity for different frame sizes and constraint
lengths. Fixed point simulation of the algorithm was then carried out to finalize the
fixed point representation that best validates the findings for the hardware
implementation.
Once the validity of the new algorithm was assured, an FPGA was chosen that could
accommodate the memory and DSP blocks needed for the Turbo decoders. The first step in
the hardware implementation was to develop the MAX* function, which is the major module
that dominates the performance of the Turbo decoder, and to check its output against the
floating point calculation to see how much it differs. After the validation of MAX*,
which is named "mylog104", each individual module was realized separately following a
modular design, so that if any changes are to be made, the particular module
can be modified without affecting the whole design.
Finally, all the modules were integrated and tested with data from Matlab
using a test bench to evaluate the BER performance of the designed Turbo
decoder.
1.4 Principal Results
• A new simplified and implementable algorithm for the MAX* function was
developed, which has BER performance similar to LOG-MAP for any SNR,
constraint length, and frame size.
• A complete decoder was implemented on FPGA, under the assumption that the
received data and parity information are available.
• A high data rate of 60 Mbps was achieved without penalizing the BER
performance.
• The number of adders and MAX* functions needed for the normalization
module used in the Alpha and Beta modules was reduced significantly.
1.5 Thesis Organization
Chapter 2 discusses Turbo codes and the mathematical background needed
for the turbo decoder. The research work related to Turbo codes, the existing
algorithms used for the MAX* function, and the implementation issues with each
algorithm are also presented in this chapter.
Chapter 3 introduces a new algorithm, whose validity is evaluated
with Matlab simulation. All existing algorithms discussed in Chapter 2 are compared
against it to show the benefit of the new algorithm. Fixed point simulation is also
carried out to determine the word length suitable for hardware implementation at all
SNR values from low to high.
Chapter 4 describes the hardware implementation of the 16-state Turbo decoder,
the optimization of the normalization module, and the data rate of the turbo decoder.
The turbo decoder is also evaluated by comparing it with the floating point
simulation in Matlab.
Finally, Chapter 5 presents the conclusions of this research work and the
benchmark of this thesis. Throughout this thesis we assume that the modulation scheme is
BPSK and that the channel is AWGN.
CHAPTER 2
Turbo Codes and Its Implementation Issues
In this chapter, the fundamentals of turbo codes and their mathematical background
are discussed, and the research work related to turbo codes and their
implementation is reviewed briefly. The fixed point representation of signed numbers
and the factors that need to be considered for the hardware implementation are also
discussed.
2.1 Fundamentals of Turbo Codes
A Turbo code consists of two RSC codes concatenated through an interleaver. Figure 2.1
shows the basic block diagram of the turbo encoder. The two RSC encoders can be
identical or different. For this particular turbo code, the code rate is 1/3, but this
can be increased by puncturing the parity bits from the encoders; the higher the code
rate, the better the spectral efficiency.
Figure 2-1 Turbo Encoder with code rate 1/3.
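The effect of puncturing on the code rate can be illustrated with a minimal sketch. The alternating pattern used here (parity 1 on even positions, parity 2 on odd positions) and the function name `puncture_rate_half` are assumptions chosen for illustration; the thesis does not specify a puncturing pattern.

```python
def puncture_rate_half(systematic, parity1, parity2):
    """Multiplex a rate-1/3 turbo codeword into a rate-1/2 stream.

    Assumed pattern (not from the thesis): keep every systematic bit and
    alternate between parity 1 (even k) and parity 2 (odd k).
    """
    out = []
    for k, d in enumerate(systematic):
        out.append(d)
        out.append(parity1[k] if k % 2 == 0 else parity2[k])
    return out


bits = [1, 0, 1, 1]
p1 = [1, 1, 0, 0]   # parity from the first RSC encoder (example values)
p2 = [0, 1, 1, 0]   # parity from the second RSC encoder (example values)
tx = puncture_rate_half(bits, p1, p2)
rate = len(bits) / len(tx)   # 4 data bits over 8 transmitted bits
```

Without puncturing, the same four data bits would occupy twelve transmitted bits, which is the rate 1/3 case shown in the figure.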
The interleaver (Π) is usually selected to be a block pseudorandom interleaver
that reorders the bits in the information sequence before feeding them to the second
encoder. However, for high data rate turbo codes, the interleaver must be designed to
avoid the memory contention caused by multiple processes trying to access data
from the memory at the same time. The purpose of the interleaver is to decorrelate the
data and to produce a code that contains very few codewords of low weight; the number of
such codewords, called the multiplicity, is a factor in the coding gain of the turbo
code.
An important factor in the performance of the turbo code is the length of the
interleaver, whose effect is referred to as interleaver gain. With a sufficiently large
interleaver [1], the performance of the turbo code is very close to the Shannon limit.
The data bits along with the parity bits are modulated and transmitted over the channel
serially. The systematic bit from the second RSC encoder is simply discarded.
The RSC codes are given by their generator matrix of the form
$$G(D) = \left[\, 1, \ \frac{g_2(D)}{g_1(D)} \,\right],$$
where $g_1(D)$ and $g_2(D)$ are the feedback and feedforward polynomials, respectively.
Figure 2.2 shows a $(37, 21)_{oct}$ RSC encoder, where $g_1(D) = [1\ 1\ 1\ 1\ 1]$ and
$g_2(D) = [1\ 0\ 0\ 0\ 1]$, corresponding to $g_1(D) = 1 + D + D^2 + D^3 + D^4$ and
$g_2(D) = 1 + D^4$.
Figure 2-2 RSC encoder with 4 memory elements
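The (37, 21) octal RSC encoder described above can be sketched in a few lines. This is an illustrative software model, not the author's Matlab code: the function name `rsc_encode` is hypothetical, the encoder is assumed to start in the all-zero state, and trellis termination is left out.

```python
def rsc_encode(bits):
    """Encode bits with a (37, 21)_oct RSC encoder, starting from state 0.

    Feedback g1(D) = 1 + D + D^2 + D^3 + D^4, feedforward g2(D) = 1 + D^4.
    Returns (systematic, parity); trellis termination is not handled here.
    """
    state = [0, 0, 0, 0]  # state[0] is the newest register (delay D)
    systematic, parity = [], []
    for d in bits:
        # Feedback: input XOR all four register outputs (taps D..D^4 of g1).
        a = d ^ state[0] ^ state[1] ^ state[2] ^ state[3]
        # Feedforward: taps 1 and D^4 of g2.
        p = a ^ state[3]
        systematic.append(d)
        parity.append(p)
        state = [a] + state[:3]  # shift the register by one position
    return systematic, parity


# Impulse input: a single 1 followed by zeros.
sys_bits, par_bits = rsc_encode([1, 0, 0, 0, 0])
```

Because the encoder is recursive, a single 1 at the input keeps the registers active after the input returns to zero, which is why the parity sequence does not die out after five bits.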
In the receiver, turbo decoding can start as soon as the data and parity
bits become available; in this case the branch metrics and the forward metrics are
calculated and stored, and the backward metrics and log likelihood ratios are
calculated once all the branch and forward metrics have been computed. With dual path
processing, i.e. when the backward and forward metrics are calculated simultaneously,
the decoder has to wait until all the data and parity bits have been received. The
advantage of dual path processing is that it doubles the data rate while minimizing the
decoding delay [12].
The turbo decoder can use either the SOVA or the MAP algorithm to decode the
data, but the MAP algorithm has better BER performance than SOVA [5]. Figure 2.3 shows
the serial MAP turbo decoder.





Figure 2-3 Serial MAP Turbo decoder
Initially, the extrinsic values for the first MAP decoder are set to zero, which assumes
equal probability for 1 and 0. The first MAP decoder computes its LLR values from the
systematic data, the parity data, and the extrinsic information. Since the channel
information is available to both decoders and the incoming extrinsic values originate
from the other MAP decoder, the first MAP decoder must subtract the channel information
and the incoming extrinsic values from its LLR values to obtain the extrinsic values to
be fed to the next MAP decoder. The extrinsic







The second decoder then computes its own LLR values from the channel
information, which includes the interleaved data and the parity bits from the second
encoder, and computes the extrinsic values to be passed to the first MAP decoder as
explained before. The extrinsic information generated by one decoder acts as the a
priori probability information for the next stage. This process continues iteratively
until a stopping criterion is met. At the end of the iterations, the LLR values are used
to decode the data according to their sign: positive means 1 and negative means 0. This
is also called hard decision making.
Since the extrinsic values, which are referred to as soft values, are the input to and
the output from each MAP decoder, the decoder is sometimes referred to as a SISO
decoder. More iterations give a better BER, but once the decoder reaches
the error floor, which occurs at high SNR, there is no significant further
improvement in BER.
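The two per-bit operations described above, removing the channel and a priori terms to obtain the extrinsic value and taking the sign of the final LLR, can be sketched as follows. The function names and the input lists are hypothetical; `channel` stands for the channel term of each systematic bit as it enters the LLR, and the scaling of that term is assumed to have been applied already.

```python
def extrinsic(llr, channel, a_priori):
    """Extrinsic value: LLR minus the channel term and the incoming a priori term."""
    return [l - c - a for l, c, a in zip(llr, channel, a_priori)]


def hard_decision(llr):
    """Positive LLR decodes to 1, anything else to 0."""
    return [1 if l > 0 else 0 for l in llr]


llr = [2.5, -1.0, 0.5]        # example LLR values from one MAP decoder
channel = [1.0, -0.5, 0.25]   # example channel terms
a_priori = [0.5, 0.0, -0.25]  # extrinsic values received from the other decoder
ext = extrinsic(llr, channel, a_priori)
decoded = hard_decision(llr)
```

In an iterative decoder, `ext` would be interleaved (or de-interleaved) and passed on as the a priori input of the other decoder, while `hard_decision` is applied only after the last iteration.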
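2.1.1 Mathematical Background
The computation of the a posteriori probability is the key factor in the turbo decoding
algorithm. The MAP decoder, which implements the BCJR algorithm, has to evaluate the
LLR defined as:
$$\Lambda(d_k) = \log \frac{\Pr(d_k = 1 \mid y)}{\Pr(d_k = 0 \mid y)} \qquad \text{Eq1}$$
Since the encoder has a very short memory, i.e. the output depends only on the
current state of the encoder and the current input to the encoder, the process can be
considered a Markov process [8]. Eq1 can be simplified as:
$$\Lambda(d_k) = \log \frac{\sum_{s_k} \sum_{s_{k-1}} \gamma_1(y_k, s_{k-1}, s_k)\, \alpha_{k-1}(s_{k-1})\, \beta_k(s_k)}{\sum_{s_k} \sum_{s_{k-1}} \gamma_0(y_k, s_{k-1}, s_k)\, \alpha_{k-1}(s_{k-1})\, \beta_k(s_k)} \qquad \text{Eq2}$$
where
$$\gamma_i[(y_k^s, y_k^p), s_{k-1}, s_k] = p(y_k^s \mid d_k = i)\; p(y_k^p \mid d_k = i, s_k, s_{k-1})\; q(d_k = i \mid s_k, s_{k-1})\; \Pr\{s_k \mid s_{k-1}\} \qquad \text{Eq3}$$
$$\alpha_k(s_k) = \sum_{s_{k-1}} \sum_{i=0}^{1} \gamma_i(y_k, s_{k-1}, s_k)\, \alpha_{k-1}(s_{k-1}) \qquad \text{Eq4}$$
$$\beta_k(s_k) = \sum_{s_{k+1}} \sum_{i=0}^{1} \gamma_i(y_{k+1}, s_k, s_{k+1})\, \beta_{k+1}(s_{k+1}) \qquad \text{Eq5}$$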
If the initial state of the encoder is known, i.e. $s_0 = 0$, then
$$\alpha_0(m) = \Pr\{s_0 = m\} = \begin{cases} 1, & m = 0 \\ 0, & m \neq 0 \end{cases}$$
and if the encoder is terminated in a known state, i.e. $s_N = 0$, then
$$\beta_N(m) = \Pr\{s_N = m\} = \begin{cases} 1, & m = 0 \\ 0, & m \neq 0 \end{cases}$$
If the encoder is not terminated, then
$$\beta_N(m) = \Pr\{s_N = m\} = 1/P,$$
where P is the number of states of the encoder. In our case, the first encoder is
terminated to a known state whereas the second encoder is left open, i.e. it can end in
any one of the 16 states.
In Eq3, the value of $q(d_k = i \mid s_k, s_{k-1})$ is either one or zero, depending on
whether bit $i$ is associated with the transition from state $s_{k-1}$ to $s_k$ or not
[3]. Since there are no parallel transitions, i.e. at most one transition connects any
pair of states, the following hold [3]:
$$\Pr\{s_k \mid s_{k-1}\} = \Pr\{d_k = 1\} \quad \text{when } q(d_k = 1 \mid s_k, s_{k-1}) = 1, \text{ and}$$
$$\Pr\{s_k \mid s_{k-1}\} = \Pr\{d_k = 0\} \quad \text{when } q(d_k = 0 \mid s_k, s_{k-1}) = 1.$$
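The forward and backward recursions of Eq4 and Eq5 can be sketched in the probability domain on a toy trellis. This is a minimal illustration under stated assumptions: `gamma[t][sp][s]` is a hypothetical array holding the branch metric of the transition from state `sp` to state `s` at trellis step `t`, already summed over the input bit, and the two-state example values are made up.

```python
def forward(gamma, alpha0):
    """alpha[t+1][s] = sum over sp of gamma[t][sp][s] * alpha[t][sp]."""
    alpha = [list(alpha0)]
    for g in gamma:
        prev = alpha[-1]
        n = len(prev)
        alpha.append([sum(g[sp][s] * prev[sp] for sp in range(n)) for s in range(n)])
    return alpha


def backward(gamma, beta_n):
    """beta[t][s] = sum over sn of gamma[t][s][sn] * beta[t+1][sn]."""
    beta = [list(beta_n)]
    for g in reversed(gamma):
        nxt = beta[0]
        n = len(nxt)
        beta.insert(0, [sum(g[s][sn] * nxt[sn] for sn in range(n)) for s in range(n)])
    return beta


# Toy two-state trellis with two steps (values are illustrative only).
gamma = [[[0.6, 0.2], [0.3, 0.7]],
         [[0.5, 0.1], [0.2, 0.8]]]
alpha = forward(gamma, [1.0, 0.0])    # known initial state 0
beta = backward(gamma, [0.5, 0.5])    # open (unterminated) end, 1/P with P = 2
totals = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(alpha))]
```

The list `totals` holds the sum of alpha times beta over the states at each time index; for consistent recursions this sum is the same at every index, which makes it a convenient sanity check.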
2.1.2 Literature Survey
Since the introduction of turbo codes [1] and their near-Shannon-limit
error correction performance, considerable interest has been raised in finding practical
decoding algorithms for implementation in real systems. The original BCJR algorithm [6]
used in MAP cannot practically be realized in hardware because of its complex
probability functions and nonlinear functions. The modified BCJR algorithm [3], which is
a logarithmic version of the original algorithm, was proposed by Robertson, Hoeher, and
Villebrun, and in turn led to other algorithms such as MAX-LOG-MAP, simplified
MAX-LOG-MAP, and improved MAX-LOG-MAP. Among these algorithms, MAX-LOG-MAP is not
sensitive to SNR mismatch [9] and requires a max operation only. To improve the
performance of MAX-LOG-MAP, a simple look-up table [3] or a threshold detector [2] was
utilized to provide a correction value. Each of these algorithms is discussed in the
following sections.
Due to the iterative nature of the decoding process, a high computational
complexity is inevitable; as a result, the latency and energy consumption increase
linearly with the number of iterations. In particular, more iterations do not
improve the BER at high SNR because of the fast convergence to the error floor. The
effectiveness of the decoding process also depends strongly on the channel
characteristics, which can change from block to block due to noise and fading. In some
cases, successful decoding is impossible even with a very large number of iterations.
Therefore, some stopping criteria must be implemented in turbo decoders. The
sum-reliability criterion and a combined minimum LLR and sum-reliability criterion were
proposed in [23]. In the sum-reliability criterion, the sum of the absolute values of
the LLRs is computed after each iteration and compared with the previous
sum-reliability value. If the sum-reliability for the current iteration is less than the
previous one, the decoding process is halted. In the second stopping criterion, the
minimum LLR value is compared with a threshold, and the sum-reliability is also used.
Meeting either one of the conditions stops the decoding process. Other stopping criteria
such as CE [24], mean-reliability [25], minimum LLR [26], SCR [27], HDA [27], and
CRC [28] have also been proposed in the literature.
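The sum-reliability rule as summarized above can be sketched as a small helper. The function names and the handling of the first iteration (no previous value, so never stop) are assumptions; the exact formulation in [23] may differ.

```python
def sum_reliability(llr):
    """Sum of the absolute LLR values of one frame."""
    return sum(abs(v) for v in llr)


def should_stop(prev_sum, llr):
    """Return (stop, current_sum) for the sum-reliability rule.

    Stops when the current sum-reliability is lower than the previous one.
    prev_sum is None on the first iteration (assumed convention).
    """
    current = sum_reliability(llr)
    stop = prev_sum is not None and current < prev_sum
    return stop, current


stop1, s1 = should_stop(None, [1.0, -2.0])   # first iteration
stop2, s2 = should_stop(s1, [2.0, -2.0])     # reliability grew: continue
stop3, s3 = should_stop(s2, [0.5, -1.0])     # reliability dropped: halt
```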
Due to the complexity of the MAP algorithm, it is impossible to design a high
throughput turbo decoder unless the windowing technique is employed, wherein
several MAP processors operate on smaller sized windows within each received frame
[8]. This is essential for 4G applications, where the peak data rate is in the range of
200 Mbps [29] with extremely tight delay constraints. In the windowing technique, the
frame is divided into smaller windows and each window is associated with its own MAP
processor. A data acquisition phase is performed in order to initialize the backward
metrics and reduce the errors in the initialization. The problem with
the windowing technique is memory contention [8], where multiple processors try
to access the same memory simultaneously. Memory contention can be resolved using
buffers [33] or modified memory addressing [34], and specifically designed
interleavers [30, 31, 32] can be employed to avoid memory contention altogether. If an
interleaver is contention free for all window sizes that divide the interleaver length,
it is called a maximum contention free interleaver [8]. The advantage of the maximum
contention free interleaver is that there are no restrictions on selecting the window
size other than that it should divide the interleaver length. Quadratic permutation
polynomials over integer rings and maximum contention free permutation polynomial
interleavers were proposed in [8]. The contention free requirements are discussed in
[29].
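A quadratic permutation polynomial interleaver and a simple contention check can be sketched as follows. The parameters N = 40, f1 = 3, f2 = 10 are illustrative values that are not taken from the thesis, and the check covers only the forward direction of the interleaver: at every offset inside a window, the parallel processors must read from distinct memory banks, with bank index taken as address divided by window size.

```python
def qpp_interleaver(n, f1, f2):
    """Quadratic permutation polynomial: pi(i) = (f1*i + f2*i^2) mod n."""
    return [(f1 * i + f2 * i * i) % n for i in range(n)]


def is_contention_free(perm, window):
    """Check that, at every offset i, all windows access distinct memory banks."""
    n = len(perm)
    num_windows = n // window
    for i in range(window):
        banks = {perm[i + j * window] // window for j in range(num_windows)}
        if len(banks) != num_windows:
            return False
    return True


N, F1, F2 = 40, 3, 10  # illustrative parameters, not taken from the thesis
perm = qpp_interleaver(N, F1, F2)
is_permutation = sorted(perm) == list(range(N))
ok_w10 = is_contention_free(perm, 10)  # 4 parallel windows
ok_w8 = is_contention_free(perm, 8)    # 5 parallel windows
```

Checking several window sizes that divide N, as done here for 10 and 8, mirrors the idea of a maximum contention free interleaver, although a complete check would also cover the inverse permutation.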
The implementation of turbo codes can be classified into serial and parallel
architectures. In the serial architecture, one MAP decoder is used to decode the whole
frame, whereas in the parallel architecture the frame is divided into several
sub-blocks, which are processed in parallel by the MAP decoders associated with each
sub-block in order to reduce the latency and at the same time increase the throughput,
which is directly proportional to the number of sub-blocks. The parallel architecture
can use contention free interleavers [8] or a network on chip [35] in order to avoid
memory contention. The bit width used to represent the data, parity, and extrinsic
information stored in the memory has an impact on power consumption, since
most of the power consumption comes from power dissipation in the memory [35].
The optimization of the bit width for extrinsic information was proposed in [5].
Since the throughput of the decoder depends strongly on the MAX*
operator used in the MAP decoder and on the recursive computations needed for the LLR
calculations, a tree architecture [20] is necessary to reduce the number of clock cycles
for the recursions and the critical path delay of the LLR module. Dual path processing
[12] and radix-4 [16] are also used to improve the throughput; however, the radix-4
implementation has some BER degradation.
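2.1.2.1 LOG-MAP
In Eq3, the probability functions can be written as
$$p(y_k^p \mid d_k = i, s_k, s_{k-1}) = \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{1}{N_0}\left(y_k^p - x_k^p(i, s_k, s_{k-1})\right)^2\right) \qquad \text{Eq6}$$
$$p(y_k^s \mid d_k = i) = \frac{1}{\sqrt{\pi N_0}} \exp\left(-\frac{1}{N_0}\left(y_k^s - x_k^s(i)\right)^2\right) \qquad \text{Eq7}$$
Due to the exponential terms in Eq6 and Eq7, Eq3, Eq4, and Eq5 are calculated in the
logarithmic domain to reduce the complexity of MAP. These equations can be written as
$$\ln \gamma_i[(y_k^s, y_k^p), s_{k-1}, s_k] = \frac{2 y_k^s x_k^s(i)}{N_0} + \frac{2 y_k^p x_k^p(i, s_k, s_{k-1})}{N_0} + \ln \Pr\{s_k \mid s_{k-1}\} + K \qquad \text{Eq8}$$
$$\ln \alpha_k(s_k) = \ln \left( \sum_{s_{k-1}} \sum_{i=0}^{1} e^{\ln \gamma_i(y_k, s_{k-1}, s_k) + \ln \alpha_{k-1}(s_{k-1})} \right) \qquad \text{Eq9}$$
$$\ln \beta_k(s_k) = \ln \left( \sum_{s_{k+1}} \sum_{i=0}^{1} e^{\ln \gamma_i(y_{k+1}, s_k, s_{k+1}) + \ln \beta_{k+1}(s_{k+1})} \right) \qquad \text{Eq10}$$
The value K in Eq8 can be ignored [3].
$$\bar{\gamma}_i[(y_k^s, y_k^p), s_{k-1}, s_k] = \ln \gamma_i[(y_k^s, y_k^p), s_{k-1}, s_k]$$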
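Using the Jacobian logarithm, we can write
$$\ln(e^x + e^y) = \max(x, y) + \ln\left(1 + e^{-|x - y|}\right) \qquad \text{Eq11}$$
Therefore the LLR can be written as:
$$\Lambda(d_k) = \max^*_{(s_k, s_{k-1}),\, d_k = 1} \left( \bar{\alpha}_{k-1}(s_{k-1}) + \bar{\gamma}_k(s_{k-1}, s_k) + \bar{\beta}_k(s_k) \right) - \max^*_{(s_k, s_{k-1}),\, d_k = 0} \left( \bar{\alpha}_{k-1}(s_{k-1}) + \bar{\gamma}_k(s_{k-1}, s_k) + \bar{\beta}_k(s_k) \right) \qquad \text{Eq12}$$
where the barred quantities denote the logarithms of the corresponding metrics,
$\bar{\gamma}_k(s_{k-1}, s_k)$ is the log branch metric of the transition whose input
bit is fixed by the condition on $d_k$, and
$$\max^*(x, y) = \text{MAX*} = \max(x, y) + \ln\left(1 + e^{-|x - y|}\right)$$
In Eq11, the second term is considered the correction function. Depending on the
method used to evaluate this correction function, the algorithm is referred to as
MAX-LOG-MAP, simplified MAX-LOG-MAP, or improved MAX-LOG-MAP.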
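The MAX* operator can be checked numerically against a direct evaluation of the logarithm of a sum of exponentials. The sketch below is a floating point illustration only; the function names are hypothetical and it says nothing about the fixed point hardware realization.

```python
import math


def max_star(x, y):
    """Jacobian logarithm: ln(e^x + e^y) = max(x, y) + ln(1 + e^-|x - y|)."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def max_star_n(values):
    """Apply MAX* pairwise over a list, as needed when several branches merge."""
    result = values[0]
    for v in values[1:]:
        result = max_star(result, v)
    return result


pair = max_star(1.0, 2.0)
direct_pair = math.log(math.exp(1.0) + math.exp(2.0))
triple = max_star_n([0.5, -1.0, 2.0])
direct_triple = math.log(math.exp(0.5) + math.exp(-1.0) + math.exp(2.0))
```

Because the correction term only depends on the absolute difference of the two inputs, it is the natural target for look-up tables or piecewise approximations in hardware.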
2.1.2.2 MAX-LOG-MAP
In MAX-LOG-MAP, the correction function is ignored. This omission has a
significant impact at SNR values of 0 to 3 dB, but not at SNR values greater than 5 dB.
Even in the range of 3 to 5 dB, more iterations may be needed to
achieve the same BER as the other algorithms. From the hardware point of
view, however, it is the least complex of the existing algorithms, and it is not
sensitive to SNR mismatch [9].
2.1.2.3 SIMPLIFIED LOG-MAP
In this algorithm [2], the correction function takes one of two values. If the absolute
difference between the two inputs is less than 2, a value of 0.375 is added to the
result of the max operation; otherwise zero is added. This algorithm also deviates
somewhat from the MAP algorithm, because the errors due to the
quantization are not evenly distributed.
2.1.2.4 IMPROVED LOG-MAP
A Maclaurin series is employed in this algorithm [4] to find the correction function,
as explained below:
$\log(1 + \exp(-|x - y|)) \approx \log 2 - \tfrac{1}{2}|x - y|$
$\qquad \approx \max(0, \log 2 - \tfrac{1}{2}|x - y|)$
This approximation works well when the absolute value of the difference is less than 1.3863.
However, the shift operation and the additional max operation are overhead in the hardware
implementation, and the algorithm has poor BER performance for higher-order turbo decoders.
Even though this algorithm also deviates somewhat from MAP at SNR values of 0 to 2 dB,
it is better than the simplified algorithm when the absolute values are less than
1.3863. In our case, we will compare this algorithm and the MAP algorithm for the fixed
point analysis.
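The threshold 1.3863 is 2 ln 2, the point where the linear term reaches zero. A minimal sketch of this correction follows, assuming a slope of 1/2 (a one-bit right shift in hardware); the function name is illustrative.

```python
import math


def improved_correction(diff: float) -> float:
    """Linear approximation of ln(1 + e^-|d|), clipped at zero."""
    return max(0.0, math.log(2.0) - 0.5 * abs(diff))


for d in (0.0, 0.5, 1.0, 1.3863, 3.0):
    exact = math.log1p(math.exp(-d))
    print(d, improved_correction(d), exact)
```

For differences beyond 2 ln 2 the approximation returns zero while the exact term is still positive, which is one source of the deviation from MAP noted above.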
2.2 Fixed Point Representation
The fixed point representation is very important in the FPGA implementation of any
digital circuit, since a floating point implementation is costly and power consuming. A
signed fixed point number is represented as A(a,b), where a is the word length and b is the
fraction length. The range of the representation is from $-2^{a-b-1}$ to $2^{a-b-1} - 1/2^{b}$.
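As a small worked check of the range, the sketch below assumes a two's complement format in which the sign bit is counted in the word length a. This is the convention that reproduces the range of -32 to 31.9375 quoted for the (10,4) representation in Section 3.3. The function name is illustrative.

```python
def fixed_point_range(a: int, b: int):
    """(min, max) of a signed A(a, b) value: a-bit word, b fractional bits."""
    lowest = -float(2 ** (a - b - 1))
    highest = 2 ** (a - b - 1) - 2.0 ** (-b)
    return lowest, highest


print(fixed_point_range(10, 4))  # (-32.0, 31.9375)
print(fixed_point_range(8, 4))   # (-8.0, 7.9375)
```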
2.3 Implementation Issues of Turbo decoder
When a turbo decoder is to be designed, we need to consider design
constraints such as area, power, memory requirements, throughput and latency, and the
acceptable BER. The fixed point representation plays a major role in deciding the constraints
mentioned above. Since the received data is corrupted by noise, the SNR decides
the range of the received data; as a result, the fixed point representation must include
all the data from minimum to maximum. If the range of the fixed point representation is
too large, most of the input will be quantized to a zero value, i.e. an erasure [21]. On the
other hand, if the range is too small, most of the input values are saturated, and the
soft quantization effectively degenerates into a hard decision [21].
The data rate and the latency of the turbo decoder are determined by the inverse of the
maximum critical path delay, which comes from the LLR module and the normalization module.
Fortunately, the BER performance of the turbo decoder is not sensitive to either of these
two computations. In contrast, the branch, forward, and backward matrix calculations are
the ones that determine the BER performance of the turbo decoder, in addition to other
factors such as the interleaver and the generator polynomial. Therefore a relaxed algorithm
can be selected for the LLR and normalization computations in order to increase the data
rate while maintaining the BER performance. This also gives a significant
reduction in area and power consumption.
The selection of the algorithm also has an impact on the constraints mentioned
above. MAX-LOG-MAP is the simplest algorithm and gives high throughput, but at the cost of
very poor BER performance. The choice of algorithm and of fixed point representation is
therefore left to the designer and depends on the application.
CHAPTER 3
New Simplified Algorithm Suitable for Implementation
on FPGA for Turbo Decoding
In this chapter, a new algorithm is derived based on the correction function
mentioned in the previous chapter. The simulation of the new algorithm with the
existing algorithms is analysed. Fixed point simulation is also performed in order to
come up with a suitable word length which does not affect the BER performance in any
SNR.
3.1 New Algorithm
The objective of developing this new algorithm is to have a BER performance
similar to LOG-MAP in low SNR applications. The development of the algorithm starts
with the correction function in the MAX* operation. The MAX* operation is defined as:
$\mathrm{MAX}^{*}(x, y) = \max(x, y) + \ln(1 + e^{-|x-y|})$  Eq13.
Figure 3-1 shows the correction values versus the absolute difference of x
and y. The function can be modeled with linear functions by dividing the range
from 0 to 4 into sub-ranges; the rest of the range can be ignored. The
correction values have a significant effect when the absolute difference is between 0 and
1 for low SNR applications.





Figure 3-1 Correction values to be added with MAX
Therefore the correction function can be written as:
$\ln(1 + e^{-|x-y|}) \approx m \cdot |x - y| + c,$
where m and c are the slope and the intersection of the linear function for each region
shown in Table 3-1.
Since the region between 1 and 2 in Figure 3-1 has the maximum curvature, it is
split into two regions to reduce the correction errors. Note that
these slopes and constants do not depend on the SNR. Reducing the
correction errors gives better BER performance at all SNR values, from very low to high,
and it also minimizes the effect of quantization errors in the hardware implementation.
Table 3-1 Slope and intersection for the selected regions
|x-y|       Slope (m)    Intersection (c)
0 to 1      -0.3788      0.6931
1 to 1.5    -0.2238      0.5371
1.5 to 2    -0.1490      0.4249
2 to 3      -0.0783      0.2835
3 to 4      -0.0305      0.1401
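The table can be turned into a lookup directly. The following Python sketch is illustrative only (it is not the thesis's MATLAB or VHDL code). Assigning each region boundary to the lower region is an assumption, and the names `REGIONS`, `pwl_correction` and `max_star_pwl` are hypothetical.

```python
import math

# (upper bound of region, slope m, intersection c), taken from Table 3-1.
REGIONS = [
    (1.0, -0.3788, 0.6931),
    (1.5, -0.2238, 0.5371),
    (2.0, -0.1490, 0.4249),
    (3.0, -0.0783, 0.2835),
    (4.0, -0.0305, 0.1401),
]


def pwl_correction(diff: float) -> float:
    """Piecewise-linear approximation of ln(1 + e^-|diff|); zero beyond 4."""
    d = abs(diff)
    for upper, m, c in REGIONS:
        if d <= upper:  # assumption: a boundary value belongs to the lower region
            return m * d + c
    return 0.0


def max_star_pwl(x: float, y: float) -> float:
    """MAX* with the piecewise-linear correction term."""
    return max(x, y) + pwl_correction(x - y)


# Largest deviation from the exact correction term on a grid over [0, 6].
worst = max(abs(pwl_correction(i / 100) - math.log1p(math.exp(-i / 100)))
            for i in range(601))
print(worst)
```

With the tabulated constants, the deviation from the exact correction term stays within a few hundredths over the whole range.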
3.1.1 Comparison of Complexity of All Algorithms
Table 3-2 compares the complexity of the proposed algorithm with that of the existing
algorithms from an algorithmic point of view.
Table 3-2: Complexity of the algorithms.





The proposed algorithm requires a multiplier, but today's
technology allows very fast multipliers to be implemented in FPGAs. To reduce the area and
the critical path delay of the multiplier, the slope and intersection can be represented
in unsigned (4, 4) format, and the absolute difference can be truncated to unsigned (6, 4)
format, so that the multiplier inputs have widths 4 and 6. This is possible because the slope
and intersection values are always less than 1 in magnitude, and correction values are
neglected when the absolute difference exceeds 4.
3.2 Software Implementation and Simulation
MATLAB simulations of the turbo decoder with the new algorithm are carried out in
this section, and the results are presented. The algorithms
discussed in Chapter 2 are included for comparison with the new
algorithm, and various frame sizes at a fixed SNR are also considered to evaluate
the performance of the turbo decoder as a function of frame size. A fixed point simulation is
also carried out to determine the best word length and fraction length across all SNR values
for the hardware implementation. The results from the MATLAB simulation validate
the developed algorithm and form the basis for its HDL implementation.
3.2.1 Software Implementation of Turbo Decoder with New Algorithm
Following the literature survey and the development of the new algorithm, the
next step is a detailed MATLAB simulation of the proposed algorithm. MATLAB R2009a
Version 7.8 has been used to develop the code to verify the turbo decoder performance
in terms of BER.
There are three stages of testing the algorithm in MATLAB:
1. Test the algorithm with fixed frame size and different SNR.
2. Test the algorithm with fixed SNR and various frame sizes.
3. Test the algorithm with fixed frame size and various SNR with fixed point
representation
The flowchart in Figure 3.2 shows the sequence of steps in the MATLAB program:
1. Read the length, error frame limit, iteration, and SNR parameters.
2. Compute the next state and previous state matrices, and the interleaver.
3. Generate random data and feed it to the encoder.
4. Modulate the data with BPSK.
5. Generate AWGN noise.
6. Compute alpha and beta, and then the LLR for each data.
7. Calculate the extrinsic information for each data, and interleave the extrinsic values.
8. Compute the LLR and extrinsic values, deinterleave the LLR and extrinsic values, and compute the errors.
Figure 3-2 Flowchart for MATLAB simulation of Turbo
3.2.2 Floating Point Simulation
For the floating point simulation, decoders with two constraint lengths were considered in
order to evaluate the BER for 4-state and 16-state decoders. The purpose of simulating
both is to show how the existing algorithms fail to achieve optimal BER
performance when they are applied to higher-order decoders.
3.2.2.1 State-4 Turbo Decoder
• Generator Polynomial [1 0 1; 1 1 1].
• Random Interleaver.
• AWGN channel.
• Frame error limit 25.
• Frame size 1024.
For the simulation, encoder 1 is brought to a known state, state 0, by adding 2 tail
bits to the frame, and encoder 2 is left open; it can be in any state at the end of the
frame bits plus tail bits. As a result, the following matrices are initialized as:
• Encoder 1 forward and backward matrices are set to [0, infty, infty, infty].
• Encoder 2 forward matrix is set to [0, infty, infty, infty].
• Encoder 2 backward matrix is set to [log 0.25, log 0.25, log 0.25, log 0.25].
• Extrinsic values are set to 0.







Figure 3-3 Simulation results for all algorithms
From the simulation, it can be seen that all algorithms except the proposed
algorithm deviate somewhat from the original MAP algorithm, and that they exhibit
almost the same BER performance above 1.9 dB SNR.
The following simulation is done to show the results clearly at low SNR.








Figure 3-4: Simulation result for Low SNR.
The following simulation was done to show that the normalization of alpha and
beta can be done either using total alpha or total beta. This is very important in
hardware implementation for the dual processes in order to optimize the data rate. The












Figure 3.5 Effect of forward normalization and backward normalization
Figures 3.6 to 3.9 show the effect of the frame size for all the algorithms. Frame
sizes from 100 to 4000 were considered. The simulation results show that
as the SNR decreases, the performance of the other algorithms deviates from the
original MAP algorithm due to error propagation along the frame.





[Plot: x-axis Frame Size, 0 to 4000]
[Plot: x-axis Frame Size, 500 to 4000]
Figure 3.7: Different frame sizes for 0.5 dB.








Figure 3.8: Different Frame size for SNR 1.





Figure 3.9: Different Iterations with frame size 1024.





Figure 3.10: From very low to moderate SNR with different iterations
From Figure 3.9, the proposed algorithm needs 4 iterations to reach the same BER
for which the other algorithms need 5 iterations. The simulation result
in Figure 3.10 compares the proposed algorithm with 6 iterations and improved MAX-LOG-MAP
with 8 iterations, from very low to moderate SNR. Increasing the
number of iterations increases the latency of the decoder, which is directly
proportional to the frame size.
3.2.2.2 State-16 Turbo Decoder
• Generator Polynomial [1 0 0 0 1; 1 1 1 1 1].
• Random Interleaver.
• AWGN channel.
• Frame error limit 25
• Frame size 1024.
For the simulation, encoder 1 is brought to a known state, state 0, by adding 4 tail
bits to the frame, and encoder 2 is left open; it can be in any state at the end of the
frame. The data and the tail bits, along with the parity bits, are transmitted through an
AWGN channel. As a result, the following matrices are initialized as:
• Encoder 1 forward and backward matrices are set to [0, infty => 15] (15 entries of infty).
• Encoder 2 forward matrix is set to [0, infty => 15].
• Encoder 2 backward matrix is set to [log(1/16) => 16] (16 entries of log(1/16)).
• Extrinsic values are set to 0.






Figure 3.11 Simulation results for 16-state decoder with 5 iterations for all
algorithms.
[Plot title: 16 State Bit Error Rate for frame size = 1024; legend entry: MAP 4iter]









Figure 3.12 Simulation results of the 16-state decoder for MAP and the new algorithm with
4 iterations, and for Improved MAX-LOG-MAP and Simplified MAX-LOG-MAP with 5
iterations.
From Figures 3.11 and 3.12 it can be concluded that the performance of the Improved and
Simplified MAX-LOG-MAP algorithms depends on the number of states of the decoder, whereas
the proposed algorithm shows optimal BER performance. Note that most satellite
communication systems use 16-state turbo codes to obtain better BER performance.
3.2.3 Fixed Point Simulation
The following are the settings for the fixed point simulation.
• Round mode - Nearest
• Overflow mode - Saturate
• Product word length - 8, 10 bits
• Product fraction length - 2, 4 bits
• Sum word length - 8, 10 bits
• Sum fraction length - 2, 4 bits
• Frame size - 400
• Random interleaver
Because the fixed point simulation is time consuming, a frame
size of 400 was used. The purpose of the fixed point simulation is to find
the optimal word length and fraction length for all SNR values, from very low to high,
without affecting the BER. Increasing the word length slows the decoder data rate
and increases the area needed for the decoder. The choice therefore depends on the
application and on the trade-off among BER, area, and data rate.
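The effect of the round and overflow modes can be illustrated with a small quantizer. This is a Python sketch rather than the MATLAB fixed point objects used for the simulation. It assumes that "Nearest" rounds ties towards positive infinity and that the word length includes the sign bit; the function name `quantize` is hypothetical.

```python
import math


def quantize(value: float, word_len: int, frac_len: int) -> float:
    """Round to the nearest representable value and saturate on overflow."""
    scale = 1 << frac_len
    code_min = -(1 << (word_len - 1))
    code_max = (1 << (word_len - 1)) - 1
    code = math.floor(value * scale + 0.5)       # round to nearest, ties upward
    code = max(code_min, min(code_max, code))    # saturate instead of wrapping
    return code / scale


print(quantize(0.6931, 10, 4))  # 0.6875, resolution 1/16
print(quantize(0.6931, 8, 2))   # 0.75, resolution 1/4
print(quantize(100.0, 10, 4))   # 31.9375, saturated at the top of the range
```

A fraction length of 4 bits gives a step of 0.0625, while 2 bits gives 0.25, so the smaller correction values are lost or distorted with the shorter fraction length.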










[Plot: x-axis SNR]
Figure 3.13 Bit Error Rate for all algorithms with low SNR
The Figure 3.13 shows the simulation results for MAP algorithm with floating
point and for other algorithm with fixed point representation. It can be seen from the







even in the presence of quantization errors, since the errors induced by the algorithm are
reduced to the thousandths position, which can be represented by the fixed point
representation chosen for this simulation without incurring quantization errors. The SNR
range was selected deliberately to reduce the simulation time needed for all
algorithms, which was almost 5 min for one frame transmission, while still showing
the deviation from the original algorithm clearly.
For the following simulation, only the proposed algorithm and the improved
MAX-LOG-MAP were considered.







[Plot: x-axis SNR, 0 to 2]














[Plot: x-axis SNR]
Figure 3.15 Comparison of fixed point (10,4) and (8,4) representations
From Figures 3.14 and 3.15, it can be seen that the fixed point representation
(10,4) has better BER performance at all SNR values than the other representations.
3.3 Observations From the Simulation
The simulation results from floating point and fixed point confirm the validity of
the proposed algorithm for low SNR applications. In fact, all algorithms exhibit the same BER
performance at high SNR, since the errors introduced by the MAX* operator can be
ignored given the high reliability of the data bits. In low SNR applications, increasing the
frame size does not improve the performance of the decoder for the other algorithms; this is
contrary to the general expectation that increasing the frame size improves the BER performance.
The number of iterations the proposed algorithm needs to reach a given BER is less than that of
the other algorithms. This is due to the errors from the MAX* operation which uses different







The fixed point representation of (10,4), which has a range of -32 to 31.9375, has
better BER performance in all SNR. This representation is chosen to implement the turbo
decoder in the hardware.
CHAPTER 4
HARDWARE IMPLEMENTATION AND VALIDATION
The turbo decoder with the MAP algorithm using the proposed MAX* operation is
implemented in VHDL, and the modular design is shown in this chapter. The
data flow through the individual modules is described in top-down fashion, and an
overview of the entire HDL implementation is given. A way to optimize the data rate of the
turbo decoder using hybrid algorithms, with slight degradation in BER, is also presented.
The system is analyzed and synthesized using Altera Quartus, and the test bench was
simulated using ModelSim. It is assumed that the received systematic and parity data
are available before the decoding process is initiated, and the output from the decoder
is stored to compare with the original data to calculate the errors in the decoding
process. The performance of the decoder is also presented in this chapter.
4.1 Hardware Implementation of Turbo Decoder
For the hardware implementation, a 16-state decoder is chosen to incorporate
with the 4G applications. The following are the parameters chosen for the turbo
decoder:
• Generator Polynomial : [1 0 0 0 1; 1 1 1 1 1].
• Random Interleaver
• Frame size 1024, including the tail bits.
• 5 iterations.
The four tail bits are generated by encoder 1 at the end of the frame, which
has only 1020 data bits, and these four bits are used to bring encoder 1 to a known state,
state 0. The same four bits are fed to the second encoder, and the final state of the
second encoder is left open, i.e., any state is possible. The generated bits are also
transmitted with the parity bits generated by both encoders.
The targeted device for hardware implementation of the turbo decoder is the Altera
Stratix II EP2S180F1508I4. Table 4.1 highlights the main aspects of the selected FPGA.
Table 4.1: Stratix II EP2S180F1508I4 features.
Feature                    Value
Combinational ALUTs        143,520
Block/Distributed RAM      9,383,040
DSP block 9 bit            768
Maximum Clock Frequency    550 MHz
Gate Technology            90 nm
Core Voltage               1.2 V
4.1.1 16-State Turbo Decoder on FPGA
The block diagram of the HDL representation of a turbo decoder on FPGA is
presented in Figure 4.1. The language used for the FPGA implementation is VHDL, and
the internal modules are described in the following sections. At the end of this Chapter,
the hardware simulation is compared with the MATLAB simulation and area and data rate






























Figure 4.1: HDL blocks for the Turbo decoder.
The DECODER has only two inputs, which are single bit, and one output of ten bits.
The sign of the DATAOUT is used to decode the data. The CONTROL module is the brain
of the decoder, as it controls all the modules, including the memory modules and
temporary registers. Table 4.3 briefly describes each of the modules, and Table 4.2 shows
the memory modules and their sizes.
As mentioned earlier, the data and parity bits are assumed to be available, and
the LEMEM is initialized to zero before the start of decoding process. GAMMAF and
GAMMAB read the data, parity, and the extrinsic values from the beginning and end of
the frame respectively and calculate the branch matrix, and during this clock period, the
initialization values for the alpha and beta are saved in the ALPHAMEM and BETAMEM
respectively, and also these values are made available for FINALALPHA and FINALBETA
through a temporary register, to calculate the next alpha and beta values.
In the next clock cycle, outputs from the gamma modules and the temporary
registers are fed to the FINALALPHA and FINALBETA to calculate the forward and
backward matrices, which are loaded back to the temporary registers for the next data.
The values in the temporary registers are stored in the ALPHAMEM and BETAMEM
during the next clock cycle while the gamma values are computed. This process will
continue until the middle of the frame is reached. At this point the control generates the
enable signal for LLRCORE and LEMODULE.
Once the middle of the frame is reached, the gamma modules are disabled. The
LLRCORE for the forward direction reads the gamma and beta values and evaluates
the a posteriori probability using the alpha values from the temporary register that holds
them, whereas the LLRCORE for the backward direction reads the gamma and
alpha values from the memory, and the beta values from the temporary register. LEMODULE
then reads the data value from the data memory and calculates the extrinsic values.
During the next clock cycle, alpha and beta are calculated and the extrinsic
values are stored in the LEMEM for the next iteration. Once the end of the frame is
reached the next decoder starts decoding as explained above, but data and extrinsic
values are read using the interleaver addresses, which are hard wired constants within
the control unit.
When the iteration reaches its predefined value and the frame position is at the
middle, the decoding process starts with the outputs from both LLRCOREs while the
LEMODULE is deactivated. Since we have two LLRCOREs, the decoder decodes two data
per one clock cycle.
Table 4.2: Memory and its usage
MEMORY        SIZE (bits)    USAGE
DATAMEM       1024X10        Holds the received data.
PARITY1MEM    1024X10        Holds parity from encoder 1.
PARITY2MEM    1024X10        Holds parity from encoder 2.
LEMEM         1024X10        Saves the extrinsic values.
ALPHAMEM      512X10         Saves the alpha values.
BETAMEM       512X10         Saves the beta values.
Table 4.3 Description of the modules used in the DECODER
MODULE        FUNCTIONALITY
GAMMAF        Calculates the branch matrix from the beginning of the frame.
GAMMAB        Calculates the branch matrix from the end of the frame.
FINALALPHA    Calculates the forward matrix.
FINALBETA     Calculates the backward matrix.
LLRCORE       Calculates the a posteriori probability for each data.
LEMODULE      Calculates the extrinsic information for each data.
CONTROL       Generates all the addresses for the memory and control signals for the modules.
4.1.1.1 Trellis Diagram Of the Turbo Decoder
The trellis diagram is derived from the generator polynomial which is used to
encode the data. Since the encoder has 4 memory elements, it can have 16 states, and since
the input is binary, the trellis diagram has 32 branches. Figure 4.2 shows the transitions






Figure 4.2 Trellis Diagram for the generator polynomial [1 0 0 0 1; 1 1 1 1 1]
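The 32 branches can be enumerated programmatically. The Python sketch below rests on assumptions not stated explicitly in the text: the second row of the generator polynomial (1 1 1 1 1) is taken as the feedback taps, the first row (1 0 0 0 1) as the feedforward taps, and the state number is read with the newest register bit as the most significant bit. These choices were made because they reproduce the state transitions that appear in the LLR equations of Section 4.1.1.5. The function name is hypothetical.

```python
def build_trellis():
    """Return {(state, bit): (next_state, parity)} for the 16-state encoder."""
    trellis = {}
    for state in range(16):
        # Register contents s1..s4; s1 (newest) is the most significant bit.
        s = [(state >> shift) & 1 for shift in (3, 2, 1, 0)]
        for bit in (0, 1):
            a = bit ^ s[0] ^ s[1] ^ s[2] ^ s[3]   # feedback taps 1 1 1 1 1
            parity = a ^ s[3]                      # feedforward taps 1 0 0 0 1
            next_state = (a << 3) | (state >> 1)   # shift the new bit in
            trellis[(state, bit)] = (next_state, parity)
    return trellis


trellis = build_trellis()
print(len(trellis))       # 32 branches: 16 states x 2 input bits
print(trellis[(0, 1)])    # (8, 1): state 0, input 1 -> state 8 with parity 1
```

Under this convention the branch metric index used in the later sections is 2 x bit + parity, so the branch from state 0 with input 1 uses gamma index 3.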
4.1.1.2 Gamma Module
The GAMMACORE is used to calculate the branch matrix associated with the data, parity,
and extrinsic information for a particular frame position k. Since the decoder is binary,
there are only four possible transitions: 0 to 0, 0 to 1, 1 to 0, and 1 to 1. The 0
to 0 and 0 to 1 transitions each need a three-input adder, and the other two transitions
need a four-input adder. Although the name implies an adder, the operation mixes addition and
subtraction; as a result, all the inputs are two's complemented so that the adder can
also be used to subtract. The output GAMMA[K] is a 4x10 bus which carries gamma00, gamma01,
gamma10, and gamma11 respectively. The equations implemented in this module are:
$\gamma_0(k) = -\mathrm{datain}(k) - \mathrm{parityin}(k) - \mathrm{LE}(k)$
$\gamma_1(k) = -\mathrm{datain}(k) + \mathrm{parityin}(k) - \mathrm{LE}(k)$
$\gamma_2(k) = \mathrm{datain}(k) - \mathrm{parityin}(k) + \mathrm{LEin}(k) - \mathrm{LE}(k)$









Figure 4.3: Gamma module used as GAMMAF and GAMMAB in the DECODER.
Table 4.4 Signal description for GAMMACORE
SIGNAL      Size    Description
ENABLE              Input from the control; when enable is high, the output is read.
DATAIN      10      Input from the data memory.
PARITYIN    10      Input from the parity memory.
LEIN        10      Input from the LE memory.
GAMMA       4X10    Output to alpha and beta modules.
The GAMMACORE in Figure 4.3 is instantiated as GAMMAF and GAMMAB in the
turbo decoder.
4.1.1.3 FinalAlpha Module
The Alpha module is used to calculate the 16 forward matrices from the branch
matrices and the previous forward matrices. From Figure 4.4, all 16 forward matrices are
derived and implemented in the alpha core. In the fixed point implementation, we need
to normalize the alpha values to avoid the overflow. The normalization factor does not
affect the BER performance of the decoder, but it affects the throughput of the decoder












Figure 4.4: Final alpha for the Turbo Decoder
The alpha values and gamma values from the current data are used to calculate
the alpha values for the next data. The output of TOTALALPHAGAMMA is subtracted
from the output of the TOTALALPHACORE, in order to normalize the alpha values.
Table 4.5: Signal description for Alpha module
Signal        Size     Description
Enable                 Synchronized with the clock; activates the FINALALPHA.
Alpha[k-1]    16X10    Forward matrices for the current data.
Gamma[k-1]    4X10     Branch matrices for the current data.
Alpha[k]      16X10    Forward matrices for the next data.
4.1.1.3.1 TotalAlphaCore Of FinalAlpha
The following 16 equations are implemented in the TOTALALPHACORE.
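$\alpha_0(k) = \gamma_0(k-1)\alpha_0(k-1) + \gamma_3(k-1)\alpha_1(k-1)$
$\alpha_1(k) = \gamma_2(k-1)\alpha_2(k-1) + \gamma_1(k-1)\alpha_3(k-1)$
$\alpha_2(k) = \gamma_1(k-1)\alpha_5(k-1) + \gamma_2(k-1)\alpha_4(k-1)$
$\alpha_3(k) = \gamma_0(k-1)\alpha_6(k-1) + \gamma_3(k-1)\alpha_7(k-1)$
$\alpha_4(k) = \gamma_1(k-1)\alpha_9(k-1) + \gamma_2(k-1)\alpha_8(k-1)$
$\alpha_5(k) = \gamma_0(k-1)\alpha_{10}(k-1) + \gamma_3(k-1)\alpha_{11}(k-1)$
$\alpha_6(k) = \gamma_0(k-1)\alpha_{12}(k-1) + \gamma_3(k-1)\alpha_{13}(k-1)$
$\alpha_7(k) = \gamma_1(k-1)\alpha_{15}(k-1) + \gamma_2(k-1)\alpha_{14}(k-1)$
$\alpha_8(k) = \gamma_0(k-1)\alpha_1(k-1) + \gamma_3(k-1)\alpha_0(k-1)$
$\alpha_9(k) = \gamma_1(k-1)\alpha_2(k-1) + \gamma_2(k-1)\alpha_3(k-1)$
$\alpha_{10}(k) = \gamma_1(k-1)\alpha_4(k-1) + \gamma_2(k-1)\alpha_5(k-1)$
$\alpha_{11}(k) = \gamma_0(k-1)\alpha_7(k-1) + \gamma_3(k-1)\alpha_6(k-1)$
$\alpha_{12}(k) = \gamma_1(k-1)\alpha_8(k-1) + \gamma_2(k-1)\alpha_9(k-1)$
$\alpha_{13}(k) = \gamma_0(k-1)\alpha_{11}(k-1) + \gamma_3(k-1)\alpha_{10}(k-1)$
$\alpha_{14}(k) = \gamma_0(k-1)\alpha_{13}(k-1) + \gamma_3(k-1)\alpha_{12}(k-1)$
$\alpha_{15}(k) = \gamma_1(k-1)\alpha_{14}(k-1) + \gamma_2(k-1)\alpha_{15}(k-1)$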
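In the log domain each product becomes a sum and each two-term sum becomes a MAX* operation. The Python sketch below illustrates one forward update under that reading. The table `PRED` is transcribed from the equations above, except for the entry for state 15, which is inferred from the LLR equations of Section 4.1.1.5. The names `PRED`, `max_star` and `alpha_step` are hypothetical, and normalization is left out.

```python
import math

# PRED[s] = [(previous state, gamma index), ...] for each state s.
PRED = {
    0: [(0, 0), (1, 3)],    1: [(2, 2), (3, 1)],
    2: [(5, 1), (4, 2)],    3: [(6, 0), (7, 3)],
    4: [(9, 1), (8, 2)],    5: [(10, 0), (11, 3)],
    6: [(12, 0), (13, 3)],  7: [(15, 1), (14, 2)],
    8: [(1, 0), (0, 3)],    9: [(2, 1), (3, 2)],
    10: [(4, 1), (5, 2)],   11: [(7, 0), (6, 3)],
    12: [(8, 1), (9, 2)],   13: [(11, 0), (10, 3)],
    14: [(13, 0), (12, 3)],
    15: [(14, 1), (15, 2)],  # inferred from the LLR equations
}


def max_star(x: float, y: float) -> float:
    """ln(e^x + e^y), guarded for the case where both inputs are -inf."""
    if x == -math.inf and y == -math.inf:
        return -math.inf
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def alpha_step(log_alpha_prev, log_gamma):
    """One forward update; all inputs and outputs are logarithms."""
    return [max_star(*(log_gamma[g] + log_alpha_prev[p] for p, g in PRED[s]))
            for s in range(16)]


# Compare with the probability-domain form of the same update.
alpha_lin = [(i + 1) / 136.0 for i in range(16)]
gamma_lin = [0.1, 0.2, 0.3, 0.4]
out = alpha_step([math.log(a) for a in alpha_lin],
                 [math.log(g) for g in gamma_lin])
print(math.exp(out[0]), gamma_lin[0] * alpha_lin[0] + gamma_lin[3] * alpha_lin[1])
```

Replacing `max_star` with any of the approximations from Chapters 2 and 3 changes only the inner operation; the wiring given by `PRED` stays the same.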























Figure 4.5: Total Alpha module without normalization.
4.1.1.3.1.1 AlphaCore of TotalAlphaCore Module
This is the basic block used in the TOTALALPHACORE. This block instantiates the
MYLOG104. It just adds the alpha values with the associated gamma values and these
values are captured by the MYLOG104 module to give the final output. The MYLOG104
module is described at the end of this Chapter. Figure 4.6 shows the block diagram of
the ALPHACORE. If the decoder is intended to modify for other algorithm we need to










Figure 4.6 : Basic Block for Alpha module
4.1.1.3.2 TotalAlphaGamma of FinalAlpha
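From Figure 4.2, there are 32 branches for the alpha and beta calculations, all of which
contribute to the calculation of the normalization values for alpha and beta. Therefore the
normalization function can be written as:
TOTALGAMMAALPHA(K)
$= \log\big( \gamma_{K-1}(0)\alpha_{K-1}(0) + \gamma_{K-1}(3)\alpha_{K-1}(0) + \gamma_{K-1}(0)\alpha_{K-1}(1)$
$+ \gamma_{K-1}(3)\alpha_{K-1}(1) + \gamma_{K-1}(2)\alpha_{K-1}(2) + \gamma_{K-1}(1)\alpha_{K-1}(2)$
$+ \gamma_{K-1}(2)\alpha_{K-1}(3) + \gamma_{K-1}(1)\alpha_{K-1}(3) + \gamma_{K-1}(2)\alpha_{K-1}(4)$
$+ \gamma_{K-1}(1)\alpha_{K-1}(4) + \gamma_{K-1}(2)\alpha_{K-1}(5) + \gamma_{K-1}(1)\alpha_{K-1}(5)$
$+ \gamma_{K-1}(0)\alpha_{K-1}(6) + \gamma_{K-1}(3)\alpha_{K-1}(6) + \gamma_{K-1}(0)\alpha_{K-1}(7)$
$+ \gamma_{K-1}(3)\alpha_{K-1}(7) + \gamma_{K-1}(2)\alpha_{K-1}(8) + \gamma_{K-1}(1)\alpha_{K-1}(8)$
$+ \gamma_{K-1}(2)\alpha_{K-1}(9) + \gamma_{K-1}(1)\alpha_{K-1}(9) + \gamma_{K-1}(0)\alpha_{K-1}(10)$
$+ \gamma_{K-1}(3)\alpha_{K-1}(10) + \gamma_{K-1}(0)\alpha_{K-1}(11) + \gamma_{K-1}(3)\alpha_{K-1}(11)$
$+ \gamma_{K-1}(0)\alpha_{K-1}(12) + \gamma_{K-1}(3)\alpha_{K-1}(12) + \gamma_{K-1}(0)\alpha_{K-1}(13)$
$+ \gamma_{K-1}(3)\alpha_{K-1}(13) + \gamma_{K-1}(2)\alpha_{K-1}(14) + \gamma_{K-1}(1)\alpha_{K-1}(14)$
$+ \gamma_{K-1}(2)\alpha_{K-1}(15) + \gamma_{K-1}(1)\alpha_{K-1}(15) \big)$
For simplicity, we drop the time index from the equation.
$= \log\big( \gamma_0(\alpha_0 + \alpha_1 + \alpha_6 + \alpha_7 + \alpha_{10} + \alpha_{11} + \alpha_{12} + \alpha_{13})$
$+ \gamma_3(\alpha_0 + \alpha_1 + \alpha_6 + \alpha_7 + \alpha_{10} + \alpha_{11} + \alpha_{12} + \alpha_{13})$
$+ \gamma_1(\alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_8 + \alpha_9 + \alpha_{14} + \alpha_{15})$
$+ \gamma_2(\alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_8 + \alpha_9 + \alpha_{14} + \alpha_{15}) \big)$
$= \log\big( (\gamma_0 + \gamma_3)(\alpha_0 + \alpha_1 + \alpha_6 + \alpha_7 + \alpha_{10} + \alpha_{11} + \alpha_{12} + \alpha_{13})$
$+ (\gamma_1 + \gamma_2)(\alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_8 + \alpha_9 + \alpha_{14} + \alpha_{15}) \big)$
Let $x = (\gamma_0 + \gamma_3)(\alpha_0 + \alpha_1 + \alpha_6 + \alpha_7 + \alpha_{10} + \alpha_{11} + \alpha_{12} + \alpha_{13})$ and
$y = (\gamma_1 + \gamma_2)(\alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_8 + \alpha_9 + \alpha_{14} + \alpha_{15})$
Therefore
TOTALGAMMAALPHA $= \log(x + y)$
$= \log(e^{X} + e^{Y})$
Where $X = \log\big((\gamma_0 + \gamma_3)(\alpha_0 + \alpha_1 + \alpha_6 + \alpha_7 + \alpha_{10} + \alpha_{11} + \alpha_{12} + \alpha_{13})\big)$
$= \log(\gamma_0 + \gamma_3) + \log(\alpha_0 + \alpha_1 + \alpha_6 + \alpha_7 + \alpha_{10} + \alpha_{11} + \alpha_{12} + \alpha_{13})$
Similarly
$Y = \log(\gamma_1 + \gamma_2) + \log(\alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \alpha_8 + \alpha_9 + \alpha_{14} + \alpha_{15})$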
Therefore instead of adding the alpha values with the corresponding gamma values we
came up with the following architecture for the normalization module.
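The factorization can be checked numerically. In the Python sketch below, `total_direct` combines all 32 gamma-plus-alpha branch sums with MAX*, while `total_factored` combines the gamma values and the alpha values separately and then needs only two additions before the final MAX*. All function and variable names are hypothetical, and exact MAX* is used in place of the MYLOG104 approximation.

```python
import math
from functools import reduce

GROUP_A = [0, 1, 6, 7, 10, 11, 12, 13]  # states paired with gamma0 and gamma3
GROUP_B = [2, 3, 4, 5, 8, 9, 14, 15]    # states paired with gamma1 and gamma2


def max_star(x: float, y: float) -> float:
    """Exact ln(e^x + e^y) for finite inputs."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def log_sum(values):
    """ln of a sum of exponentials, built from repeated MAX* operations."""
    return reduce(max_star, values)


def total_direct(log_alpha, log_gamma):
    """All 32 branches: one addition per branch before the MAX* tree."""
    terms = []
    for s in GROUP_A:
        terms += [log_gamma[0] + log_alpha[s], log_gamma[3] + log_alpha[s]]
    for s in GROUP_B:
        terms += [log_gamma[1] + log_alpha[s], log_gamma[2] + log_alpha[s]]
    return log_sum(terms)


def total_factored(log_alpha, log_gamma):
    """Factored form: only two additions, X and Y, before the last MAX*."""
    x = max_star(log_gamma[0], log_gamma[3]) + log_sum([log_alpha[s] for s in GROUP_A])
    y = max_star(log_gamma[1], log_gamma[2]) + log_sum([log_alpha[s] for s in GROUP_B])
    return max_star(x, y)


log_alpha = [math.log((i + 1) / 136.0) for i in range(16)]
log_gamma = [math.log(g) for g in (0.1, 0.2, 0.3, 0.4)]
print(total_direct(log_alpha, log_gamma), total_factored(log_alpha, log_gamma))
```

With exact MAX* the two forms agree to rounding error; with an approximate MAX* they would differ slightly because the approximation errors accumulate along different paths.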
Figure 4.7 shows the block diagram of the TOTALALPHAGAMMA. It is
worth noting that this module has the longest critical path
delay and therefore limits the data rate of the decoder. A way to improve the data rate without
affecting BER performance is also given at the end of this chapter, and the BER










































[Block labels: mylog104, ADDER2]
Figure 4.7: Normalization module for Alpha.
4.1.1.4 FinalBeta Module
The FINALBETA module is similar to the FINALALPHA module in terms of signal
processing, but it differs in how the inputs and outputs are related. In this module the beta
values of the current data are used to compute the beta values for the previous data.
In addition, the backward matrices are normalized using the backward matrices themselves,
instead of the conventional method which uses the forward matrices, so that storing the forward
matrices can be avoided. All the backward matrices are derived from Figure 4.2 and













Figure 4.8: Final Beta with normalization.
The output of TOTALGAMMABETA is two's complemented so that the adder can
be used to subtract it from the output of TOTALBETACORE.
Table 4.6: Signal description of FINALBETA.
SIGNAL        SIZE     DESCRIPTION
ENABLE                 Synchronized with the clock; used to activate the module.
GAMMA[K+1]    4X10     Input from the GAMMA module.
BETA[K+1]     16X10    Feedback from the FINALBETA.
BETA[K]       16X10    Output fed back to the same module.
4.1.1.4.1 TotalBetaCore of FinalBeta
The following equations are implemented in the TOTALBETACORE.
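$\beta_0(k-1) = \gamma_0(k)\beta_0(k) + \gamma_3(k)\beta_8(k)$
$\beta_1(k-1) = \gamma_0(k)\beta_8(k) + \gamma_3(k)\beta_0(k)$
$\beta_2(k-1) = \gamma_2(k)\beta_1(k) + \gamma_1(k)\beta_9(k)$
$\beta_3(k-1) = \gamma_2(k)\beta_9(k) + \gamma_1(k)\beta_1(k)$
$\beta_4(k-1) = \gamma_2(k)\beta_2(k) + \gamma_1(k)\beta_{10}(k)$
$\beta_5(k-1) = \gamma_2(k)\beta_{10}(k) + \gamma_1(k)\beta_2(k)$
$\beta_6(k-1) = \gamma_0(k)\beta_3(k) + \gamma_3(k)\beta_{11}(k)$
$\beta_7(k-1) = \gamma_0(k)\beta_{11}(k) + \gamma_3(k)\beta_3(k)$
$\beta_8(k-1) = \gamma_2(k)\beta_4(k) + \gamma_1(k)\beta_{12}(k)$
$\beta_9(k-1) = \gamma_2(k)\beta_{12}(k) + \gamma_1(k)\beta_4(k)$
$\beta_{10}(k-1) = \gamma_0(k)\beta_5(k) + \gamma_3(k)\beta_{13}(k)$
$\beta_{11}(k-1) = \gamma_0(k)\beta_{13}(k) + \gamma_3(k)\beta_5(k)$
$\beta_{12}(k-1) = \gamma_0(k)\beta_6(k) + \gamma_3(k)\beta_{14}(k)$
$\beta_{13}(k-1) = \gamma_0(k)\beta_{14}(k) + \gamma_3(k)\beta_6(k)$
$\beta_{14}(k-1) = \gamma_2(k)\beta_7(k) + \gamma_1(k)\beta_{15}(k)$
$\beta_{15}(k-1) = \gamma_2(k)\beta_{15}(k) + \gamma_1(k)\beta_7(k)$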






















Figure 4.9: Final Beta without normalization
4.1.1.4.1.1 BetaCore of TotalBeta Module
The BETACORE computes the backward matrices for the previous data from the
current data. The beta values are added with the associated gamma values and the









Figure 4.10: Basic block for backward matrix calculation.
4.1.1.4.2 TotalBetaGamma of FinalBeta
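This module implements the following equation, which is derived from the trellis
diagram, in order to normalize the beta values.
TOTALGAMMABETA(K - 1)
$= \log\big( \gamma_{K-1}(0)\beta_K(0) + \gamma_{K-1}(0)\beta_K(8) + \gamma_{K-1}(0)\beta_K(3)$
$+ \gamma_{K-1}(0)\beta_K(11) + \gamma_{K-1}(0)\beta_K(5) + \gamma_{K-1}(0)\beta_K(13) + \gamma_{K-1}(0)\beta_K(6)$
$+ \gamma_{K-1}(0)\beta_K(14) + \gamma_{K-1}(3)\beta_K(0) + \gamma_{K-1}(3)\beta_K(8) + \gamma_{K-1}(3)\beta_K(3)$
$+ \gamma_{K-1}(3)\beta_K(11) + \gamma_{K-1}(3)\beta_K(5) + \gamma_{K-1}(3)\beta_K(13) + \gamma_{K-1}(3)\beta_K(6)$
$+ \gamma_{K-1}(3)\beta_K(14) + \gamma_{K-1}(1)\beta_K(1) + \gamma_{K-1}(1)\beta_K(9) + \gamma_{K-1}(1)\beta_K(10)$
$+ \gamma_{K-1}(1)\beta_K(2) + \gamma_{K-1}(1)\beta_K(12) + \gamma_{K-1}(1)\beta_K(4) + \gamma_{K-1}(1)\beta_K(15)$
$+ \gamma_{K-1}(1)\beta_K(7) + \gamma_{K-1}(2)\beta_K(1) + \gamma_{K-1}(2)\beta_K(9) + \gamma_{K-1}(2)\beta_K(10)$
$+ \gamma_{K-1}(2)\beta_K(2) + \gamma_{K-1}(2)\beta_K(12) + \gamma_{K-1}(2)\beta_K(4) + \gamma_{K-1}(2)\beta_K(15)$
$+ \gamma_{K-1}(2)\beta_K(7) \big)$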















































Figure 4.11: Normalization module for Beta.
The novelty of this architecture is the reduced number of adders, achieved
by avoiding the addition of gamma and beta at the beginning. Instead, the beta and gamma
values are fed to the MYLOG104 modules separately, and the resulting gamma and beta values
are then added. This reduces the number of adders from 64 to 2, and it does not change the
critical path delay of the module.
4.1.1.5 LLRCore of Turbo Decoder
The LLRCORE consists of two modules: one computes the a
posteriori probability of input 1 and the other computes it for input 0. The
implemented equations are derived from the trellis diagram shown in Figure 4.2, by
considering the forward and backward matrices associated with gamma10 and
gamma11 for the case of input 1, and with gamma00 and gamma01 for the case of input 0.
The derived equation for input 1:
$\mathrm{LLRone}(k) = \log[\alpha_k(0)\gamma_k(3)\beta_k(8) + \alpha_k(1)\gamma_k(3)\beta_k(0) + \alpha_k(2)\gamma_k(2)\beta_k(1)$
$+ \alpha_k(3)\gamma_k(2)\beta_k(9) + \alpha_k(4)\gamma_k(2)\beta_k(2) + \alpha_k(5)\gamma_k(2)\beta_k(10)$
$+ \alpha_k(6)\gamma_k(3)\beta_k(11) + \alpha_k(7)\gamma_k(3)\beta_k(3) + \alpha_k(8)\gamma_k(2)\beta_k(4)$
$+ \alpha_k(9)\gamma_k(2)\beta_k(12) + \alpha_k(10)\gamma_k(3)\beta_k(13) + \alpha_k(11)\gamma_k(3)\beta_k(5)$
$+ \alpha_k(12)\gamma_k(3)\beta_k(14) + \alpha_k(13)\gamma_k(3)\beta_k(6) + \alpha_k(14)\gamma_k(2)\beta_k(7)$
$+ \alpha_k(15)\gamma_k(2)\beta_k(15)]$
The derived equation for input 0:
$\mathrm{LLRzero}(k) = \log[\alpha_k(0)\gamma_k(0)\beta_k(0) + \alpha_k(1)\gamma_k(0)\beta_k(8) + \alpha_k(2)\gamma_k(1)\beta_k(9)$
$+ \alpha_k(3)\gamma_k(1)\beta_k(1) + \alpha_k(4)\gamma_k(1)\beta_k(10) + \alpha_k(5)\gamma_k(1)\beta_k(2)$
$+ \alpha_k(6)\gamma_k(0)\beta_k(3) + \alpha_k(7)\gamma_k(0)\beta_k(11) + \alpha_k(8)\gamma_k(1)\beta_k(12)$
$+ \alpha_k(9)\gamma_k(1)\beta_k(4) + \alpha_k(10)\gamma_k(0)\beta_k(5) + \alpha_k(11)\gamma_k(0)\beta_k(13)$
$+ \alpha_k(12)\gamma_k(0)\beta_k(6) + \alpha_k(13)\gamma_k(0)\beta_k(14) + \alpha_k(14)\gamma_k(1)\beta_k(15)$
$+ \alpha_k(15)\gamma_k(1)\beta_k(7)]$
The implementation of these equations is shown in Figure 4.12 and Figure 4.13.
Though the implementation employs a parallel architecture, the critical path delay of




































































































































































Figure 4.13: Architecture for computing LLRzero.









Figure 4.14: LLR module
4.1.1.6 LECore of Turbo Decoder
LECORE is implemented to find the extrinsic information that each decoder
has learnt by itself. Since both decoders have access to the channel information, LECORE must
suppress the channel information as well as the extrinsic information learnt from the other








Figure 4.15: Block diagram for extrinsic computation
4.1.1.7 Control Module of Turbo Decoder
The flow chart for the control module is shown in Figure 4.16. The control
module has only two inputs: clk and start. Because the control module supplies the
memory addresses for the data, parity, and extrinsic values, and the decoder processes the
data frame from both directions, the control module has two counters, one counting up
and the other counting down. When the decoding process switches to the second
decoder, the control module must also supply the interleaved addresses for the data
and extrinsic memory locations. So that no additional clock transition is needed, the interleaver
addresses are defined within the control module as hard-wired constants. This ensures
that, at each clock transition, the data and extrinsic addresses are readily
available along with the parity address.
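The address generation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name memory_addresses, the dictionary keys, and the idea that the down counter mirrors the up counter are all hypothetical, and the interleaver table is passed in as a list rather than hard-wired.

```python
def memory_addresses(up_count, frame_len, interleaver, second_decoder):
    """Return the addresses driven for one value of the up counter.

    interleaver is a constant lookup table (hard-wired in the thesis design).
    The parity address is never interleaved; the data and extrinsic
    addresses are interleaved only while the second decoder is active.
    """
    # Assumption: the down counter starts at the end of the frame.
    down_count = frame_len - 1 - up_count
    data_addr = interleaver[up_count] if second_decoder else up_count
    return {
        "parity": up_count,
        "data": data_addr,
        "extrinsic": data_addr,
        "reverse": down_count,
    }
```

Because the table lookup is a constant selection rather than a memory read, the interleaved address is available in the same cycle as the parity address, which is the point the paragraph above makes.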
The control module has five states: IDLE, STA0, STA1, STA2, and STA3. Table 4.7
describes the signal assignments carried out by the control module in each state.
Detailed information can be found in the Appendix under the VHDL description of the control
module.
Table 4.7: State and signal assignment.
STATE   SIGNAL ASSIGNMENT
IDLE    All memories are enabled.
        Alphawr, betawr, gammawr, gammaenable, Ld_Areg, and Ld_Breg are set to 1.
        Alphaenable, betaenable, LLRenable, LEenable, and decode are set to 0.
STA0    Memory modules for gamma, beta, and alpha are disabled under condition 1.
        Alphawr, betawr, gammawr, gammaenable, Ld_Areg, and Ld_Breg are set to 0.
        Alphaenable and betaenable are set to 1.
        Up counting and down counting.
STA1    Memory modules for gamma, beta, and alpha are enabled under condition 2.
        Gammawr, alphawr, betawr, gammaenable, Ld_Areg, and Ld_Breg are set to 1 under condition 2.
        Alphaenable and betaenable are set to 0.
        Decode is set to 1 under condition 3.
        LEenable is set to 0 under condition 4.
STA2    Memories for gamma, alpha, and beta are enabled.
        Alphaenable and betaenable are set to 1.
        LLRenable, LEenable, LEwr, Ld_Areg, and Ld_Breg are set to 0.
        Up counting and down counting.
        Decoder is switched under condition 5.
        Iteration is decreased by 1 under condition 6.
STA3    Memories for gamma, alpha, and beta are disabled.
        LLRenable, LEwr, Ld_Areg, and Ld_Breg are set to 1.
        LEenable is set to 0 or 1 under condition 7.
Conditions:
1. Up counter is less than 511.
2. Up counter is less than or equal to 511.
3. Iteration = 0, decoder = 1, and up counter = 512.
4. Iteration = 0 and decoder = 1.
5. Up counter = 1023.
6. Up counter = 1023 and decoder = 1.














































Figure 4.17: Control module
Thick lines are 10 bits wide; all other lines are single bit.
Apart from the state transitions, the control module has two other processes, which
are triggered by the upcounter and decoder signals. One process selects which
parity memory must be enabled, based on the active decoder. The other process determines
whether the memories for data and extrinsic values are addressed in interleaved order, and how alpha and beta
are addressed while the counter is less than or equal to 511. These processes are vital to
this design for two reasons: only one decoder is needed, so the
area is reduced by 50%, and the memory needed for storing alpha and
beta is also reduced by 50%.
The novelty of this turbo decoder implementation lies in integrating the
submodules needed by each of the alpha, beta, gamma, and LLR modules
within that module. This increases the throughput of each module by
reducing the interconnection delays, which are the bottleneck in most digital
designs.
4.1.2 MyLog104 Module for MAX*
The MYLOG104 module implements the new algorithm described in detail in
Chapter 3. This module is made up of MAX, ABS, COMPARATOR, and MULT units. It has
two 10-bit inputs and one 10-bit output, each representing a sign bit, an integer part, and a
fractional part. Figure 4.18 shows the block diagram of the module used in alpha,














Figure 4.18: Signal flow diagram for the new algorithm.
The intersection and slope used in the correction function are represented in
(8,7) format: since these values are always less than one, the representation does not need any
bits for the integer part. First, the absolute difference of x and y is calculated and used as the
selection signal for the intersection and slope, which are hard-wired constants. In the
meantime, the maximum value of x and y is computed. The multiplier takes the
output of the ABS unit and the selected slope as its inputs. Since the
ABS output is (10,4) and the slope is (8,7), the resulting output of the MULT is (18,11). As
a result, the intersection and the MAX output must be padded with trailing
bits and leading bits so that their binary points line up with the product.
The first adder sums the multiplier output and
the intersection to give the correction value, which a second adder adds to the maximum
value. The signed output of the second adder is truncated to obtain the
output of the module in (10,4) representation. The ADJUST process is used to increase
the precision of the output.
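The datapath above can be followed in integer arithmetic. The Python sketch below is an illustration, not the thesis code: the slope and intersection values are the signed (8,7) constants m0 to m4 and c0 to c4 from the VHDL listing in the appendix, decoded to integers, and the segment bounds are the comparator thresholds visible in that listing. The segments beyond |x - y| = 4 are elided in the listing, so the sketch assumes a zero correction there. Saturation in adder2 and the ADJUST process are not modelled, and the names SEGMENTS and max_star_fixed are hypothetical.

```python
# Each entry: (upper bound of |x - y| in (10,4) units, slope, intersection).
# Slope and intersection are signed integers with 7 fractional bits.
SEGMENTS = [
    (4, -60, 89),    # |x - y| <= 0.25
    (8, -52, 87),    # <= 0.5
    (12, -45, 83),   # <= 0.75
    (16, -38, 78),   # <= 1.0
    (32, -24, 64),   # <= 2.0
    (48, -10, 36),   # <= 3.0
    (64, -4, 18),    # <= 4.0
]


def max_star_fixed(x, y):
    """Approximate log(exp(x) + exp(y)) on (10,4) integers.

    x and y are integers holding value * 16. The result uses the same scale.
    """
    diff = abs(x - y)                      # ABS unit, (10,4)
    corr = 0                               # assumed correction above 4.0
    for bound, slope, intersection in SEGMENTS:
        if diff <= bound:
            # (10,4) * (8,7) gives 11 fractional bits; the intersection
            # is shifted left by 4 to reach the same scale.
            corr = slope * diff + (intersection << 4)
            break
    total = (max(x, y) << 7) + corr        # align MAX to 11 fractional bits
    return total >> 7                      # truncate back to 4 fractional bits
```

For example, with x = y = 0 the correction is 89/128, close to ln 2, and truncation to four fractional bits leaves 11/16.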
4.2 Optimization of Data Rate
The BER performance of the turbo decoder depends mostly on the algorithm
used to compute the correction values for the forward, backward, and branch metrics.
The longest critical path in the alpha and beta modules lies in the
normalization module, which does not have any significant impact on the accuracy of
these metrics. The normalization module can therefore use the MAX operation
to increase the throughput of these modules without affecting the BER performance.
The longest critical path of the LLR module can be shortened by
using the same operation. This slightly decreases the BER performance
compared to LOG-MAP in low-SNR regions. To overcome this problem, an error scaling
factor is added to the MAX operation. Figure 4.19 shows the MatLab simulation of the original
algorithm and of the original algorithm with MAX used for normalization and LLR computation.
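Why the choice of normalization operator has little effect on accuracy can be seen in a small sketch. Normalization subtracts one common total from every state metric, so the differences between metrics, which drive the decoding decisions, are the same whether the total is computed exactly (MAX*) or with MAX. The function names below are hypothetical, floating point is used instead of fixed point, and the error scaling factor is not modelled.

```python
import math


def total_max_star(metrics):
    """Exact log-domain total: log(sum(exp(m))) over all state metrics."""
    peak = max(metrics)
    return peak + math.log(sum(math.exp(m - peak) for m in metrics))


def total_max(metrics):
    """MAX approximation of the total: the largest state metric."""
    return max(metrics)


def normalize(metrics, total):
    """Subtract the common total from every state metric."""
    return [m - total for m in metrics]


alpha = [-1.5, 0.25, -3.0, 2.0]
exact = normalize(alpha, total_max_star(alpha))
approx = normalize(alpha, total_max(alpha))
# Differences between any two states are identical in both versions.
gaps_exact = [e - exact[0] for e in exact]
gaps_approx = [a - approx[0] for a in approx]
```

The two normalized vectors differ only by a constant offset, while the MAX version needs just comparators instead of a chain of MAX* units.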





Figure 4.19: Comparison of single algorithm with double algorithm.
For simplicity, we call this method the hybrid algorithm for the turbo decoder. The
hybrid algorithm shows some degradation in BER, but the hardware
simulation shows a significant improvement in the throughput of the individual modules,
and the area is also substantially reduced.
4.3 Hardware Synthesis And Simulation
Hardware synthesis was done using Quartus II version 9.0. Each module was
analyzed and synthesized separately to find the resources needed by each individual
module. The TimeQuest Timing Analyzer from Altera was used to find the maximum
frequency of each individual module. The results are compared with [20] in Table 5.7.








GAMMA 356 82.66 356 82.66 1138 141.22
BETA 7616 20.21 5911 63.34 1055 139.86
ALPHA 7616 20.31 5911 63.41 1138 141.22
LLR 6766 20.46 3760 63.98 2111 143.02
A test bench was created to capture the data output from the decoder at the end
of five iterations. The simulation was run with Altera ModelSim. The data and parity bits
were created using MatLab, the received data was converted to (10,4) fixed-point
representation, and these values were copied to the memory initialization files
linked to the memory modules. Once the data and parity are available, the test bench is
simulated for 210 µs so that five iterations complete. At the end of the simulation,
the MatLab script is run to find the number of errors from the hardware simulation.
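The conversion of received samples to the (10,4) format can be sketched as below. The thesis does not state the rounding or overflow rule used when preparing the memory initialization files, so round-to-nearest and saturation are assumptions here, and the function names to_fixed_10_4 and to_bits are hypothetical. The format itself (10 bits in two's complement, 4 of them fractional) follows the VHDL listing, where "1000000000" represents -32.

```python
def to_fixed_10_4(value):
    """Quantize a real value to a signed 10-bit integer with 4 fractional bits."""
    scaled = int(round(value * 16))        # 4 fractional bits -> scale by 2**4
    return max(-512, min(511, scaled))     # assumed saturation to 10 bits


def to_bits(fixed):
    """Two's-complement bit string, as it would appear in a memory file."""
    return format(fixed & 0x3FF, "010b")
```

With this scaling one least-significant bit is 1/16, and the representable range runs from -32 to 31.9375.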






Figure 4.20: Hardware simulation for both realizations to compare with MatLab simulation.
4.3.1 Hardware Synthesis of Turbo Decoder
After each individual module was analyzed and synthesized, the full turbo decoder
was implemented by integrating all the modules, including the memory modules and the control
module, to estimate the resources. Table 4.9 shows the resource usage for both models
considered above.
Table 4.9: Resource comparison for the Turbo Decoder.
Resource     HDL with mylog104               HDL with hybrid algorithm
             # of resources   Percentage     # of resources   Percentage
ALUTs        31,191           22%            21,159           14%
DSPs         256              33%            68               9%
MEMORY       409,600          4%             409,600          4%
LOGIC REG    36               1%             36               1%
4.4 Novelties of Implementation
1. All the submodules needed by alpha, beta, gamma, and LLR are integrated within
each individual module to reduce the internal connection delays, so that
the throughput of each module is increased.
2. The number of adders needed for alpha and beta normalization is reduced from
124 to 4. This reduces the area and the power consumption of the
turbo decoder.
3. The number of clock cycles is minimized by integrating the combinational
modules into a single module that is enabled on a clock transition.
4. Normalization of alpha and beta is done using total alpha and total beta,
respectively, to avoid the memory needed to save the total alpha.
5. Gamma values are written into memory while alpha and beta values are
calculated, and alpha and beta are written while gamma values are computed.
This reduces the number of clock cycles by 50%.
4.5 Observation from the Hardware Simulation
The HDL realization of the turbo decoder with the mylog104 algorithm has BER
performance similar to LOG-MAP, whereas the implementation of the turbo decoder with
two algorithms, one for alpha, gamma, and beta and the other for normalization and LLR,
has throughput similar to MAX-LOG-MAP. The second implementation deviates only
slightly from LOG-MAP in terms of BER performance.
Table 4.10 compares the throughput with that of some other implementations. Our
implementation shows a significant improvement in throughput while maintaining the
BER performance of the turbo decoder.
Table 4.10: Comparison of throughput with recent implementations
Ref         # of Processors   Technology   Throughput (Mbps)
[17]        64                VLSI         930
[18]                          VLSI         27.6




This work FPGA 51




In this thesis we have implemented a very high speed turbo decoder with the
new algorithm, which gives optimal performance in terms of BER. We have also presented
an architecture for the normalization module that reduces the number of adders and MAX*
units while improving the critical path delay. In addition, a hybrid architecture was implemented
to increase the data rate while reducing the area needed by the turbo decoder, without
affecting the BER performance.
The MatLab simulation and the hardware simulation show that the
architecture of the turbo decoder does not degrade the BER performance of the
decoder at any SNR, from very low to high, compared to the LOG-MAP
algorithm. This does not imply that the algorithm is insensitive to SNR mismatch.
Only MAX-LOG-MAP is insensitive to SNR mismatch. All the log versions of the
MAP algorithm, however, have minimal sensitivity to SNR mismatch, whereas the MAP algorithm
depends entirely on how accurately the SNR can be estimated from the channel
information.
In a hardware implementation of a turbo decoder, there is always a tradeoff
between performance parameters such as area, power, cost, speed, and BER. If the
decoder is implemented with the proposed algorithm for the branch, forward,
and backward metrics, and with MAX-LOG-MAP for the normalization and the LLR, the
result is a very high data rate with slight degradation in BER performance, together with
a reduction in area and power. On the other hand, if the metric
calculations use the new MAX* algorithm and the others use MAX-LOG-MAP
with ESF, the decoder exhibits BER performance similar to LOG-MAP,
with a significant data rate reduction and a slight increase in area. The choice therefore depends
entirely on the application in which the turbo decoder is to be used.
The Stratix II is built on a 90 nm process. If the same architecture is implemented
on a top-end FPGA, the data rate should improve significantly over the older
technology, and the area should be reduced as well.
REFERENCES
1. C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-correcting Coding:
Turbo codes," Proc. 1993 IEEE International Conference on Communications, pp. 1064-1070,
May 1993.
2. W. J. Gross and P. G. Gulak, "Simplified MAP algorithm suitable for implementation of turbo
decoders," Electronics Letters, vol. 34, no. 16, pp. 1577-1578, August 1998.
3. Patrick Robertson, Peter Hoeher, and Emmanuelle Villebrun, "Optimal and sub-optimal MAP
algorithms suitable for turbo decoding," European Trans. on Telecomm., vol. 8, no. 2, pp.
119-126, March-April 1997.
4. Shahram Talakoub, Leila Sabeti, Behnam Shahrrava, and Majid Ahmadi, "An Improved Max-
Log-MAP algorithm for Turbo decoding and Turbo Equalization," IEEE Transactions on
Instrumentation and Measurement, vol. 56, no. 3, pp. 1058-1063, June 2007.
5. Ashwani Singh, E. Boutillon, and G. Masera, "Bit width optimization of extrinsic information
in Turbo decoder," 5th International Symposium on Turbo Codes and Related Topics, pp. 134-
138, 2008.
6. L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimum decoding of linear codes for minimizing
symbol error rate," IEEE Trans. on Inf. Theory, vol. IT-20, pp. 284-287, Mar 1974.
7. B. P. Lathi and Zhi Ding, "Error Correcting Codes," in Modern Digital and Analog
Communication Systems, Fourth edition. New York: Oxford University Press, 2009, pp. 951-
959.
8. Michael J. Thul and Norbert Wehn, "FPGA implementation of parallel turbo decoders,"
Integrated Circuits and Systems Design, SBCCI 2004, 17th Symposium, pp. 198-203, 2004.
9. Alexander Worm, Peter Hoeher, and Norbert Wehn, "Turbo decoding without SNR
estimation," IEEE Communications Letters, vol. 4, no. 6, pp. 193-195, June 2000.
10. Third Generation Partnership Project 2 (3GPP2), Physical layer standard for cdma2000 spread
spectrum systems, Release D, version 1, Feb 2004.
11. IEEE standard for local and metropolitan area networks. Part 16: air interface for fixed
broadband wireless access systems, Nov 2004.
12. Hamid R. Sadjadpour, "Maximum a posteriori decoding algorithms for turbo codes,"
Proceedings of SPIE, vol. 4045, pp. 73-83, 2000.
13. M. J. Thul, F. Gilbert, T. Vogt, G. Kreiselmaier, and N. Wehn, "A Scalable System Architecture
for High Throughput Turbo Decoders," Journal of VLSI Signal Processing Systems, vol. 39, pp.
63-77, 2005.
14. G. Prescher, T. Gemmeke, and T. Noll, "A Parameterizable Low-Power High-Throughput
Turbo Decoder," IEEE International Conference on Acoustics, Speech, and Signal Processing,
pp. 25-28, Mar 2005.
15. Oscar Y. Takeshita, "On Maximum Contention-Free Interleavers and Permutation
Polynomials Over Integer Rings," IEEE Transactions on Information Theory, vol. 52, no. 3, pp.
1249-1253, March 2006.
16. Duk Gun Choi, Min-Hyuk Kim, Jin Hee Jeong, and Ji Won Jung, "An FPGA Implementation of
High speed Flexible 27-Mbps 8-state Turbo Decoder," ETRI Journal, vol. 29, no. 3, pp. 363-
370, June 2007.
17. S. M. Karim and I. Chakrabarti, "An improved low power high throughput LOG-MAP turbo
decoder," IEEE Transactions on Consumer Electronics, Issue 2, pp. 450-457, May 2010.
18. S. J. Lee, N. R. Shanbhag, and A. C. Singer, "A 285-MHz pipelined MAP decoder in 0.18 µm
CMOS," IEEE Journal of Solid-State Circuits, vol. 40, no. 8, pp. 1718-1725, August 2005.
19. Martin I. del Barco, Gabriel N. Maggio, and Damián A. Morero, "FPGA Implementation of
high speed parallel maximum a posteriori (MAP) decoders," Proceedings of the Argentine
School of Micro-Nanoelectronics, Technology and Applications, pp. 98-102, 2009.
20. Roberto Ramirez Martin, Andres David Garcia Garcia, Luis Fernando Gonzalez Perez, and
Javier Eduardo Gonzalez Villarruel, "Hardware architecture of MAP algorithm for Turbo
Codes implemented in a FPGA," Proceedings of the 15th International Conference on
Electronics, Communications and Computers, pp. 70-75, 2005.
21. E. Boutillon, C. Douillard, and G. Montorsi, "Iterative decoding of concatenated convolutional
codes: Implementation Issues," Proceedings of the IEEE, vol. 95, Issue 6, pp. 1201-1227, June
2007.
22. Yuping Zhang and Keshab K. Parhi, "Parallel Turbo Decoding," IEEE International Symposium
on Circuits and Systems, pp. 509-512, 2004.
23. F. Gilbert, F. Kienle, and N. Wehn, "Low complexity stopping criteria for UMTS turbo
decoders," Vehicular Technology Conference, vol. 4, pp. 2376-2380, 2003.
24. J. Hagenauer, E. Offer, and L. Papke, "Iterative Decoding of Binary Block and Convolutional
Codes," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 429-445, Mar 1996.
25. F. Zhai and I. J. Fair, "New Error Detection Techniques and Stopping Criteria for Turbo
Decoding," in Proc. 2000 IEEE Canadian Conference on Electrical and Computer Engineering,
pp. 58-62, Mar 2000.
26. Z. Wang and K. K. Parhi, "Decoding Metrics and their Applications in VLSI Turbo Decoders,"
in Proc. 2000 Conference on Acoustics, Speech, and Signal Processing, pp. 3370-3373, Sept
2000.
27. R. Y. Shao, S. Lin, and M. C. P. Fossorier, "Two Simple Stopping Criteria for Turbo Decoding,"
IEEE Transactions on Communications, vol. 47, no. 8, pp. 1117-1120, Aug 1999.
28. A. Shibutani, H. Suda, and F. Adachi, "Reducing average number of turbo decoding
iterations," Electron. Lett., vol. 35, no. 9, pp. 701-702, Apr 1999.
29. Ajit Nimbalker, T. Blankenship, Brian Classon, Thomas E. Fuja, and Daniel J. Costello,
"Contention-Free Interleavers for High-Throughput Turbo Decoding," IEEE Transactions on
Communications, vol. 56, no. 8, pp. 1258-1266, August 2008.
30. T. K. Blankenship, B. Classon, and V. Desai, "Channel coding for 4G systems with adaptive
modulation and coding," IEEE Wireless Communication Mag., vol. 9, pp. 8-13, Apr 2002.
31. A. Giulietti, L. van der Perre, and M. Strum, "Parallel turbo decoding interleavers: avoiding
collisions in accesses to storage elements," Electron. Lett., vol. 38, pp. 232-234, Feb 2002.
32. A. Nimbalker, T. K. Blankenship, B. Classon, T. E. Fuja, and D. J. Costello, "Inter-Window
shuffle interleavers for high throughput turbo decoding," in Proc. Int. Symp. Turbo Codes and
Related Topics, pp. 355-358, Sept 2003.
33. M. J. Thul, F. Gilbert, and N. Wehn, "Optimized concurrent interleaving architecture for high
throughput turbo decoding," in Proc. Int. Conf. Electronics, Circuits and Systems, pp. 1099-
1102, Sept 2002.
34. A. Tarable, S. Benedetto, and G. Montorsi, "Mapping interleaving laws to parallel turbo and
LDPC decoder architectures," IEEE Trans. Information Theory, vol. 50, pp. 2002-2009, Sept
2004.
35. C. Schurgers, F. Catthoor, and M. Engels, "Memory optimization of MAP turbo decoder
algorithms," IEEE Trans. on VLSI Systems, vol. 9, no. 2, pp. 305-312, April 2001.
APPENDIX
A1. Matlab codes used in this thesis
% This m file simulates all algorithms
% MAX-LOG-MAP, Improved-MAX-LOG-MAP, LOG-MAP, Simplified-MAX-LOG-MAP
% and the proposed algorithms are considered.
% Channel is assumed to be AWGN,
% for different generator g must be changed
% Frame size and frame limit can be changed
% iteration can be changed
clear all
%diary myfileallalgo.txt
a = 1; %channel fading factor
ferrlim = 25; %Frame limit
niter = 5; %Iteration
Infty = 1e20; % Define the infinity
g = [1 0 0 0 1;1 1 1 1 1]; %generator polynomial
%Find the next state and previous states
[n,k] = size(g); %find the number of rows and columns
m = k-1; %memory elements in the encoder
[next0,next1] = stateout(g); %call the function stateout
%generate the next_state matrix containing state transitions and
% the input and output associated with it.
next_state = [next0(:,2) next1(:,2) next0(:,3) next1(:,3)];
[temp,alp] = sort(next0(:,2));
[temp,alpp] = sort(next1(:,2));
%previous states are generated.
pre_state = [alp alpp];
%no punctures
puncture = 1;
rate = 1/(2+puncture); % rate is assumed to be 1/3
nstates = 2^m; % number of states in the encoder
N = 1024; % Frame length.
L_total =N+m; % Tail bits for terminating first encoder
[temp alphaD] = sort(rand(1,L_total-m)); %Random interleaver.
temp1 = max(alphaD);
%Tail bits are not interleaved
newalphaD = [alphaD temp1+1:temp1+m];
EbN0db =[0 0.5 1 1.5 2 ]; %define the SNR range
nEN = 1;
for nEN = 1:length(EbN0db)
%define the matrix for holding errors for each SNR
%Initialize to zeros.
errsLogD(nEN,1:niter) = zeros(1,niter); % errors for LOG-MAP
errsCorlD(nEN,1:niter) = zeros(1,niter); % errors for PROPOSED
errsSimD(nEN,1:niter) = zeros(1,niter); % Errors for Simplified
errsImpD(nEN,1:niter) = zeros(1,niter); % Errors for Improved
nferrImpD(nEN,1:niter) = zeros(1,niter); % Frame errors for Improved
nferrCorlD(nEN,1:niter) = zeros(1,niter); %frame errors for proposed
nferrLogD(nEN,1:niter) = zeros(1,niter); %Frame errors for Log-map
nferrSimD(nEN,1:niter) = zeros(1,niter); %Frame errors for simplified
en = 10^(EbN0db(nEN)/10); %Calculate in terms of energy
L_c = 4*a*en*rate; %Channel coefficient
sigma = 1/sqrt(2*rate*en); %Noise standard deviation
nframe = 0; %Initialize number of frames sent to zero
while nferrImpD(nEN,niter) < ferrlim
nframe = nframe+1;
x = round(rand(1,L_total-m)); % Create random data
% Call myencode_bit function to encode the data
en_outputD = myencode_bit(x,g,alphaD); %encode the data
rD = en_outputD + sigma*randn(1,L_total*3); %add the random noise
%extract the data for decoder 1 and decoder 2
%Call mydemultiplex function
ykD = mydemultiplex(rD,alphaD,m);
ykmD = 0.5*L_c*ykD; %Multiply with the channel coefficient
rec_sD = ykmD(1,:); %Data for decoder 1
rec_s2D = ykmD(2,:); %Data for decoder 2
%Initially extrinsic values are set to 0
L_ecorlD(1:L_total) = zeros(1,L_total); %Extrinsic for proposed
L_eImpD(1:L_total) = zeros(1,L_total); %Extrinsic for Improved
L_eLogD(1:L_total) = zeros(1,L_total); %Extrinsic for Log-MAP
L_eSimD(1:L_total) = zeros(1,L_total); %Extrinsic for Simplified
for iter = 1:niter


























for i1 = 1:L_total
%Branch matrices associated with input 1 and input 0 for










for state = 1:nstates
%find the state transition probability for each algorithm.
%findlog, Impfindlog, Simfindlog are used to calculate
% log(exp(x)+exp(y)) based on the algorithm of its own.
llrcorlD = findlog([0 L_acorlD(i1)]);
llrLogD = log(1+exp(L_aLogD(i1)));
llrImpD = Impfindlog([0 L_aImpD(i1)]);
llrSimD = Simfindlog([0 L_aSimD(i1)]);
%find the branch metric for the transition














gamma0SimD(i1,state) = -rec_sD(2*i1-1) + rec_sD(2*i1)...
*next_state(state,3)-llrSimD;
gamma1SimD(i1,state) = rec_sD(2*i1-1) + rec_sD(2*i1)...
*next_state(state,4)+L_aSimD(i1)-llrSimD;
end % end of "for state=..."











































for i2 = L_total:-1:1








LogBeta1D(i2,sta1) = log(tempBeta1) - total_gamA1LogD(i2);
end














for da = 1:L_total
























% Calculate the extrinsic values for each algorithm
% these need to be interleaved for the second decoder
L_ecorlD = L_allcorlD - 2*rec_sD(1,1:2:2*L_total) - L_acorlD;
L_eSimD = L_allSimD - 2*rec_sD(1,1:2:2*L_total) - L_aSimD;
L_eLogD = L_allLogD - 2*rec_sD(1,1:2:2*L_total) - L_aLogD;
L_eImpD = L_allImpD - 2*rec_sD(1,1:2:2*L_total) - L_aImpD;
%First decoder finishes its process and passes extrinsic values
%to the second decoder
% Initialize Alpha and Beta for the second decoder
% Second decoder is not terminated; Left open




















% Interleave extrinsic values from the first decoder
% Temporary variables to hold the interleaved extrinsic values





for i1 = 1:L_total
% Initialize all gamma matrices associated with input 1 and 0 for










%find the probability for state transition
llrcorlD = findlog([0 L_a2corlD(i1)]);
llrSimD = Simfindlog([0 L_a2SimD(i1)]);
llrLogD = log(1+exp(L_a2LogD(i1)));
llrImpD = Impfindlog([0 L_a2ImpD(i1)]);
% Calculate the branch metric from the systematic data
% and the parity bits from the second decoder
gamma20corlD(i1,sta2) = -rec_s2D(2*i1-1) + rec_s2D(2*i1)...
*next_sta2(sta2,3)-llrcorlD;
gamma21corlD(i1,sta2) = rec_s2D(2*i1-1) + rec_s2D(2*i1)...
*next_sta2(sta2,4)+L_a2corlD(i1)-llrcorlD;
gamma20ImpD(i1,sta2) = -rec_s2D(2*i1-1) + rec_s2D(2*i1)...
*next_sta2(sta2,3)-llrImpD;
gamma21ImpD(i1,sta2) = rec_s2D(2*i1-1) + rec_s2D(2*i1)...
*next_sta2(sta2,4)+L_a2ImpD(i1)-llrImpD;
gamma20LogD(i1,sta2) = -rec_s2D(2*i1-1) + rec_s2D(2*i1)...
*next_sta2(sta2,3)-llrLogD;
gamma21LogD(i1,sta2) = rec_s2D(2*i1-1) + rec_s2D(2*i1)...
*next_sta2(sta2,4)+L_a2LogD(i1)-llrLogD;
gamma20SimD(i1,sta2) = -rec_s2D(2*i1-1) + rec_s2D(2*i1)...
*next_sta2(sta2,3)-llrSimD;
gamma21SimD(i1,sta2) = rec_s2D(2*i1-1) + rec_s2D(2*i1)...
*next_sta2(sta2,4)+L_a2SimD(i1)-llrSimD;
end %sta2





































for i2 = L_total:-1:1







LogBeta2D(i2,sta1) = log(tempBeta2) - total_gamA2LogD(i2);
end








































%Extrinsic values are calculated for decoder 1
L_ecorlD = L_all2corlD - 2*rec_s2D(1:2:2*L_total) - L_a2corlD;
L_eSimD = L_all2SimD - 2*rec_s2D(1:2:2*L_total) - L_a2SimD;
L_eImpD = L_all2ImpD - 2*rec_s2D(1:2:2*L_total) - L_a2ImpD;
L_eLogD = L_all2LogD - 2*rec_s2D(1:2:2*L_total) - L_a2LogD;










% if any errors update the frame error for the particular iteration
if errcorlD(iter) > 0
nferrCorlD(nEN,iter) = nferrCorlD(nEN,iter) + 1;
end
if errLogD(iter) > 0
nferrLogD(nEN,iter) = nferrLogD(nEN,iter) + 1;
end
if errImpD(iter) > 0
nferrImpD(nEN,iter) = nferrImpD(nEN,iter) + 1;
end
if errSimD(iter) > 0
nferrSimD(nEN,iter) = nferrSimD(nEN,iter) + 1;
end
end





% display the number of errors and frames
% transmitted for every three frames
if rem(nframe, 3) == 0 || nferrImpD(nEN,niter) >= ferrlim
%Bit error rate is calculated for each algorithm
berCorl(nEN,1:niter) = errsCorlD(nEN,1:niter)/nframe/(L_total -m);
berLog(nEN,1:niter) = errsLogD(nEN,1:niter)/nframe/(L_total -m);
berImp(nEN,1:niter) = errsImpD(nEN,1:niter)/nframe/(L_total -m);
berSim(nEN,1:niter) = errsSimD(nEN,1:niter)/nframe/(L_total -m);
fprintf('********Frame size = %d ***********\n',L_total);
fprintf('********EbN0 = %5.2f ********\n',EbN0db(nEN));
fprintf('%d frames transmitted, %d frames in Mul_Corl error.\n'...
,nframe,nferrCorlD(nEN,niter));
fprintf('%d frames transmitted, %d frames in Sim error.\n'...
,nframe,nferrSimD(nEN,niter));
fprintf('%d frames transmitted, %d frames in Log error.\n'...
,nframe,nferrLogD(nEN,niter));
fprintf('%d frames transmitted, %d frames in Imp error.\n'...
,nframe,nferrImpD(nEN,niter));
fprintf('Bit error rate\n');
fprintf('Log Error MulCor Error Imp Error Sim Error\n');
for i7 = 1:niter




% Number of frame transmitted is limited to 5000










title(['Bit error Rate for frame size = ',num2str(N)])
xlabel('SNR');
ylabel('BER');
% This function changes an integer number to binary for a given
% wordlength b
function binary = bin_num(a,b)
state = zeros(1,b); %Assign all zeros
k = a;
for i = 1:b















% This function finds log(exp(x1)+exp(x2)+...) for the elements of x.
% x is a matrix.
% The fnmycorrection function calculates the correction value in log(exp(x)
% +exp(y)).
% Iteratively calculates the final result of log(exp(x1)+exp(x2)+exp(x3)+ ....)






for i = 1:llenx-1




% This function computes the integer value from a binary number
% This function is used to encode the data.
function integer = intnum(a)
ele = length(a);
temp = 0;
for i = 1:ele
temp = temp + a(i)*2^(ele-i);
end
integer = temp;
function correction = fnmycorrection(x,y)





m152 = -0.1490;
c152 = 0.4249;
m2 = -0.0783; %-(0.3133-0.1269); %-0.1864;
c2 = 0.2835; %m2*(-1)+0.3133; %0.4997
m3 = -0.0305; %(0.1269-0.0489); %-0.078
c3 = 0.1401; % m3*(-2)+0.1269; %0.2829
diff = abs(x-y); %Absolute value of x - y;
% select the slope and intersection based on the abs of (x-y)
if diff >= 0 && diff <= 1
correction = m0*diff + c0;
elseif diff > 1 && diff <= 1.5
correction = m115*diff + c115;
elseif diff > 1.5 && diff <= 2
correction = m152*diff + c152;
elseif diff > 2 && diff <= 3
correction = m2*diff + c2;
elseif diff > 3 && diff <= 4




% This function encodes the data for both encoders.
% First encoder is terminated.
% Second encoder is left open
% input1 is the randomly generated data.
% Alpha is the random interleaver.
function en_out = myencode_bit(input1,g,alpha)
data = input1;
% Interleave the data for the second encoder
input2 = input1(alpha);
len_input = length(input1);




% Call the function stateout
[state0,state1] = stateout(g);
state_transition = ini_state;
% iteration for the input
for i = 1:len_input
if(input1(i) == 0) % if the input is 0
temp_out(1,i) = state0(ini_state,3); %Parity from encoder1
ini_state = state0(ini_state,2);
state_transition(i+1) = ini_state;







% iteration for the termination
% Needs to find the data sequence for the termination
for j = 1:m
%to encode the data, state should be in binary form
% Call the function bin_num
bin = bin_num(ini_state-1,m);
dk = mod(bin*g(2,2:k)',2); % dk is 1 or 0;
data(1,i+j) = dk; % Data is stored for the 2nd encoder
if(dk==0)









temp_data = 2*data-1; % BPSK modulation for original data
ini_state = 1;
state2_transition = ini_state;
for h = 1:len_input
if(input2(h) == 0) % if the input is 0
% the output is already modulated









for d = 1:m












len_out = length(temp_out); %Length of the parity vector
%Extract data and create a matrix
%that holds the data, parity1, and parity2 in an order




% This function computes the next state
% for the input 1 and 0
% and also computes the output and applies BPSK modulation
function [next_state0,next_state1] = stateout(g)
[n,k] = size(g);
m = k-1; % number of memory elements
% input is 0
d_k = 0;
for i = 1:2^m
next_state0(i,1) = i; % initial state





% find the output due to the transition
next_state0(i,3) = 2*mod(fa*g(1,:)',2)-1; %BPSK modulation
bin(2:m) = bin(1:m-1);
bin(1,1) = ak;





next_state1(i,1) = i; %initial state






% Find the output for the transition
next_state1(i,3) = 2*mod(fa*g(1,:)',2)-1; %BPSK modulation
bin(2:m) = bin(1:m-1);
bin(1,1) = ak;
next_state1(i,2) = intnum(bin)+1; % Next state in integer
end
% This function extracts the received data and stores in a
% matrix with two rows one for decoder 1 and the other one
% is for the second decoder, Systematic data is interleaved.
function mydemux = mydemultiplex(datain,alpha,m)
data_len = length(datain);
temp = max(alpha);








%Script for the hardware simulation
%Decoded data is stored in datatout_B_file
%and dataoutF_file using the Test bench






mydata = [datai tempdataf];
oridata(newalphaD) = mydata;
% x is the data created randomly by Matlab.
HDLerrs = length(find(oridata(1:L_total-m)~=x))
A2. VHDL Codes for the implementation
--This module computes the log(exp(x)+exp(y)).





entity mylog104 is -- entity name is mylog104
port( x,y : in std_logic_vector(9 downto 0) ; -- two input definition; 10 bits
logout : out std_logic_vector(9 downto 0) -- output 10 bits
);
end entity mylog104;
architecture beha of mylog104 is
-- component mult is generated using megafunction from Quartus.
-- Multiplier has two inputs with width 10 bits and 8 bits, the output is 18 bits
component mult
PORT
( dataa : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
datab : IN STD_LOGIC_VECTOR (7 DOWNTO 0);
result : OUT STD_LOGIC_VECTOR (17 DOWNTO 0)
);
end component;
-- Slope and intersection are defined with 8 bits signed.
-- ms(8,7) are slopes and cs(8,7) are intersections with signed representation.
constant m0 : std_logic_vector(7 downto 0):="11000100";
constant m25 : std_logic_vector(7 downto 0):="11001100";
constant m50 : std_logic_vector(7 downto 0):="11010011";
constant m75 : std_logic_vector(7 downto 0):="11011010";
constant m2 : std_logic_vector(7 downto 0):="11101000";
constant m3 : std_logic_vector(7 downto 0):="11110110";
constant m4 : std_logic_vector(7 downto 0):="11111100";
constant m5 : std_logic_vector(7 downto 0):="00000000";
constant m6 : std_logic_vector(7 downto 0):="00000000";
constant c0 : std_logic_vector(7 downto 0):="01011001";
constant c25 : std_logic_vector(7 downto 0):="01010111";
constant c50 : std_logic_vector(7 downto 0):="01010011";
constant c75 : std_logic_vector(7 downto 0):="01001110";
constant c2 : std_logic_vector(7 downto 0):="01000000";
constant c3 : std_logic_vector(7 downto 0):="00100100";
constant c4 : std_logic_vector(7 downto 0):="00010010";
constant c5 : std_logic_vector(7 downto 0):="00001000";
constant c6 : std_logic_vector(7 downto 0):="00000100";
-- module adder2 is defined. It consists of a tempadder2 which is generated from
-- megafunction. Overflow is saturated within adder2 module.
component adder2 is
port (dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);
dataout : out std_logic_vector(9 downto 0)
);
end component;
-- Intermediate signals are defined.
signal maxy,tempout : std_logic_vector(9 downto 0);
signal tempm,tempc : std_logic_vector(7 downto 0);
signal tempout1,temp_out1 : std_logic_vector(17 downto 0);
signal tempout2 : std_logic_vector(17 downto 0);
signal tempdisp,tempdisp1,tempdisp2 : std_logic_vector(17 downto 0);
signal temp2 : std_logic_vector(10 downto 0);
signal display2 : std_logic_vector(9 downto 0);
signal tempabs,tempy : std_logic_vector(9 downto 0);
begin -- behavior of the architecture begins
absprocess : process(tempabs,x,y) -- process for calculating absolute value.
begin -- absprocess
-- if x and y are equal to -32 the abs is 0;
-- tempabs is the output from the adder2 (subtract)
-- if tempabs is -32 then the absolute value is rounded to 31.75.




if tempabs = "1000000000" then
tempout <= "0111111111";
else






end process; -- absprocess
-- Take the 2's complement of input y so that adder2 can be used to subtract.
process(y) -- 2's complement process
begin
if y = "1000000000" then
tempy <= "0111111111";
else
tempy <= (not y) + 1;
end if;
end process; -- 2's complement process
-- Instantiate adder2 to subtract
findabs : adder2 port map(x,tempy,tempabs);








-- Select the slope and intercept based on the absolute difference of x and y.
-- tempm is the slope and tempc is the intercept.
mux : process(tempout)
begin -- mux process
if(tempout >= "0000000000" and tempout <= "0000000100") then
tempm <= m0;
tempc <= c0;
elsif(tempout > "0000000100" and tempout <= "0000001000") then
tempm <= m25;
tempc <= c25;
elsif(tempout > "0000001000" and tempout <= "0000001100") then
tempm <= m50;
tempc <= c50;
elsif(tempout > "0000001100" and tempout <= "0000010000" ) then
tempm <= m75;
tempc <= c75;
elsif(tempout > "0000010000" and tempout <= "0000100000" ) then
tempm <= m2;
tempc <= c2;
elsif(tempout > "0000100000" and tempout <= "0000110000" ) then
tempm <= m3;
tempc <= c3;
elsif(tempout > "0000110000" and tempout <= "0001000000" ) then
tempm <= m4;
tempc <= c4;











end process mux; -- process mux
-- Instantiate the multiplier; inputs are the absolute value (tempout) and the slope (tempm).
multiply : mult port map(tempout,tempm,temp_out1);
-- Add the intercept. tempc is concatenated so that its fraction and integer places line up.
tempout1 <= temp_out1 + ("000000" & tempc & "0000");
-- Add the maximum of x and y. maxy is sign-extended and aligned to the fraction bits.
tempdisp <= tempout1 + ('0' & maxy & "0000000") when maxy(9) = '0' else
tempout1 + ('1' & maxy & "0000000");
-- Only 10 of the 18 bits need to be extracted. Since the output is in signed representation,
-- the 2's complement must be taken before extracting.
process(tempdisp) -- 2's complement for tempdisp = log(exp(x)+exp(y))
begin -- process
if(tempdisp(17) = '1') then




end process; -- 2's complement for tempdisp
adjust : process(tempdisp1) -- the value is rounded up to the next highest number
begin -- adjust process
if (tempdisp1(6) = '1') then





-- 10 bits are extracted and the sign is restored.
temp2 <= tempdisp2(17 downto 7) when tempdisp(17) = '0' else
((not tempdisp2(17 downto 7)) + 1);
-- Overflow is checked.
overflow : process(temp2)
begin -- overflow process
case temp2(10 downto 9) is
when "00" => display2 <= temp2(9 downto 0) ;
when "01" => display2 <= "0111111111" ;
when "10" => display2 <= "1000000000" ;
when "11" => display2 <= temp2(9 downto 0) ;
when others => null;
end case;
end process overflow; -- end process
logout <= display2; -- Assign the output
end beha; -- End the behavior of the architecture
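-- Sketch (not part of the original listing): the saturating 2's-complement
-- negation used by the processes above, written as a reusable function.
-- Assumptions: ieee.numeric_std is used here in place of whatever arithmetic
-- package the listing relies on, and the names sat_sketch_pkg and sat_negate
-- are illustrative only.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package sat_sketch_pkg is
  function sat_negate(v : std_logic_vector(9 downto 0)) return std_logic_vector;
end package sat_sketch_pkg;

package body sat_sketch_pkg is
  function sat_negate(v : std_logic_vector(9 downto 0)) return std_logic_vector is
  begin
    -- The most negative 10-bit code has no positive counterpart, so it is
    -- clamped to the largest positive code, as the listing does.
    if v = "1000000000" then
      return "0111111111";
    else
      return std_logic_vector(-signed(v));
    end if;
  end function sat_negate;
end package body sat_sketch_pkg;
-- With such a function, a subtraction x - y could be written as
-- adder2 port map(x, sat_negate(y), ...) after assigning sat_negate(y) to a signal.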
-- Three-input adder
-- Overflow is saturated based on the sign
library ieee;
use ieee.std_logic_1164.all;
-- Entity definition: 3 inputs and an output, each 10 bits wide
entity adder3 is
port ( dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);
datac : in std_logic_vector(9 downto 0);




architecture beha of adder3 is
-- component description











IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
OUT STD_LOGIC_VECTOR (11 DOWNTO 0)
signal tempresult1 : std_logic_vector(11 downto 0);
signal overflow : std_logic;
signal tempout : std_logic_vector(9 downto 0);
begin -- architecture behaviour
-- Instantiate tempadder3
-- tempresult1 is the output from tempadder3
addi : tempadder3 port map(dataa,datab,datac,tempresult1);
process(tempresult1) -- Overflow is detected
begin -- Overflow
if ((tempresult1(11) = '0') and (tempresult1(10) = '0') and (tempresult1(9) = '0')) or






process(overflow,tempresult1) -- Overflow is saturated based on the sign.
begin -- Overflow saturation
if (overflow = '1' and tempresult1(11) = '1') then -- Overflow is detected and the sign is negative
tempout <= "1000000000"; -- minus 32
elsif (overflow = '1' and tempresult1(11) = '0') then -- Overflow is detected and the sign is positive
tempout <= "0111111111"; -- +31.75
else
tempout <= tempresult1(9 downto 0); -- no overflow is detected
end if;
end process; -- overflow saturation
dataout <= tempout; -- output is assigned
end beha; -- end the architecture for adder3
-- Two-input adder for signed numbers




-- Entity definition for adder2: two inputs and an output, each 10 bits wide
entity adder2 is
port (dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);




architecture beha of adder2 is
-- Intermediate signal definitions.
signal tempout : std_logic_vector(9 downto 0);
signal over,sign : std_logic;
-- component description




dataa : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
datab : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
overflow : OUT STD_LOGIC ;





-- Inputs are dataa and datab; outputs are over and tempout.
-- over is the overflow flag from tempadder2.
add : tempadder2 port map(dataa,datab,over,tempout);
sign <= dataa(9); -- the sign of one of the inputs is used to choose the saturation value
process(tempout,over,sign) -- the output from tempadder2 is saturated
begin -- process
if (over = '1' and sign = '1') then -- overflow and negative number
dataout <= "1000000000"; -- negative 32
elsif (over = '1' and sign = '0') then -- overflow and positive number
dataout <= "0111111111"; -- +31.75
else
dataout <= tempout; -- no overflow
end if;
end process;
end beha; -- end behavior of the architecture
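-- Sketch (not part of the original listing): a behavioural equivalent of the
-- two-input saturating adder that does not depend on the tempadder2
-- megafunction. It assumes ieee.numeric_std and the hypothetical entity name
-- adder2_sketch; the thesis design itself uses the megafunction's overflow flag.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder2_sketch is
  port ( dataa   : in  std_logic_vector(9 downto 0);
         datab   : in  std_logic_vector(9 downto 0);
         dataout : out std_logic_vector(9 downto 0) );
end entity adder2_sketch;

architecture beha of adder2_sketch is
begin
  process(dataa, datab)
    variable sum : signed(10 downto 0);  -- one guard bit holds the carry
  begin
    sum := resize(signed(dataa), 11) + resize(signed(datab), 11);
    if sum > 511 then
      dataout <= "0111111111";           -- clamp to the largest positive code
    elsif sum < -512 then
      dataout <= "1000000000";           -- clamp to the most negative code
    else
      dataout <= std_logic_vector(resize(sum, 10));
    end if;
  end process;
end architecture beha;
-- Overflow can only occur when both operands have the same sign, which is why
-- the listing can pick the saturation value from the sign of dataa alone.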
-- Four-input adder




-- Entity definition: 4 inputs and an output
entity adder4 is
port( datain1 : in std_logic_vector(9 downto 0);
datain2 : in std_logic_vector(9 downto 0);
datain3 : in std_logic_vector(9 downto 0);
datain4 : in std_logic_vector(9 downto 0);
dataout : out std_logic_vector(9 downto 0)
);
end entity;
-- Description of the architecture
architecture beha of adder4 is
-- Component description
-- tempadder4 is a parallel adder from a Quartus megafunction




IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);









signal tempresult1 : std_logic_vector(11 downto 0);
signal overflow : std_logic;
signal tempout : std_logic_vector(9 downto 0);
begin -- architecture behaviour
-- Instantiate tempadder4; the output is tempresult1
FINDTEMPRESULT1 : tempadder4 port map(datain1,datain2,datain3,datain4,tempresult1);
process(tempresult1) -- overflow detection
begin -- process
-- two MSBs are considered to detect the overflow
if ((tempresult1(11) = '0') and (tempresult1(10) = '0') and
(tempresult1(9) = '0')) or ((tempresult1(11) = '1') and





end process; -- end of overflow detection
process(overflow,tempresult1) -- overflow is saturated
begin -- process
if (overflow = '1' and tempresult1(11) = '1') then -- overflow and negative output
tempout <= "1000000000";




tempout <= tempresult1(9 downto 0); -- no overflow
end if;
end process;
dataout <= tempout; -- Assign the output
end beha; -- End of architecture for adder4
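-- Sketch (not part of the original listing): the overflow test used by adder3
-- and adder4, written as one function. A 12-bit signed sum fits in 10 bits
-- exactly when its three top bits agree (all '0' or all '1'); otherwise the
-- sign bit selects the saturation value. The names clip_sketch_pkg and
-- sat_clip12 are assumptions made for illustration.
library ieee;
use ieee.std_logic_1164.all;

package clip_sketch_pkg is
  function sat_clip12(s : std_logic_vector(11 downto 0)) return std_logic_vector;
end package clip_sketch_pkg;

package body clip_sketch_pkg is
  function sat_clip12(s : std_logic_vector(11 downto 0)) return std_logic_vector is
  begin
    if s(11 downto 9) = "000" or s(11 downto 9) = "111" then
      return s(9 downto 0);      -- value already fits in 10 bits
    elsif s(11) = '1' then
      return "1000000000";       -- negative overflow
    else
      return "0111111111";       -- positive overflow
    end if;
  end function sat_clip12;
end package body clip_sketch_pkg;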
-- This is the basic module for calculating a forward metric from the branch metrics and the
-- previous forward metrics. It is used to calculate all the forward metrics in TotalAlphaCore.
-- It takes two alpha/gamma input pairs and produces one output.




use ieee.numeric_std.all;
-- Entity definition
-- alpha1in and alpha2in are the previous alpha values
-- gamma1 and gamma2 are the branch metrics associated with alpha1in and alpha2in
entity AlphaCore is
port ( alpha1in : in std_logic_vector(9 downto 0);
alpha2in : in std_logic_vector(9 downto 0);
gamma1 : in std_logic_vector(9 downto 0);
gamma2 : in std_logic_vector(9 downto 0);
alphaout : out std_logic_vector(9 downto 0)
);
end entity;
architecture beha of AlphaCore is
-- adder2 is declared
component adder2 is
port (dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);
dataout : out std_logic_vector(9 downto 0)
);
end component; -- adder2
-- mylog104 is declared
component mylog104 is
port( x,y : in std_logic_vector(9 downto 0) ;





signal tempalphagamma1, tempalphagamma2 : std_logic_vector(9 downto 0);
signal tempalpha : std_logic_vector(9 downto 0);
begin -- architecture
-- Add the alpha and gamma values
-- Instantiate adder2 twice
Alphagamma1 : adder2 port map(alpha1in,gamma1,tempalphagamma1);
Alphagamma2 : adder2 port map(alpha2in,gamma2,tempalphagamma2);
-- Instantiate mylog104
calculateALPHA : mylog104 port map(tempalphagamma1,tempalphagamma2,tempalpha);
alphaout <= tempalpha; -- Assign the output from mylog104
end beha; -- End architecture for AlphaCore
-- This module uses AlphaCore to calculate all alpha values for a particular
-- generator polynomial. The alpha and gamma inputs must be assigned to each AlphaCore
-- based on the trellis diagram.
-- A similar module should be implemented for beta (backward recursion).
-- In our case 16 alpha values need to be calculated, so 16 AlphaCore modules are instantiated.
-- No normalization is done in this module.




use work.AlphaType.all; -- AlphaType is a user-defined package
-- Entity definition
entity TotalAlphaCore is
port ( alphain : in Alpha; -- 16x10 bus defined in AlphaType package
gammain : in Gamma; -- 4x10 bus defined in AlphaType package
Alphaout : out Alpha -- 16x10 bus defined in AlphaType package
);
end entity;
architecture beha of TotalAlphaCore is
component AlphaCore is -- Component declaration for AlphaCore
port ( alpha1in : in std_logic_vector(9 downto 0);
alpha2in : in std_logic_vector(9 downto 0);
gamma1 : in std_logic_vector(9 downto 0);
gamma2 : in std_logic_vector(9 downto 0);




signal tempalpha,tempout : Alpha ;
signal tempgamma : Gamma;
begin -- Architecture behaviour
alphaout <= tempout; -- output is assigned
-- temporary signal assignments
-- (the inputs could also be used directly)
tempAlpha <= alphain;
tempgamma <= gammain;
-- Instantiate 16 AlphaCore modules; the output of each module is assigned to tempout
CALCULATEALPHA0 : Alphacore port map
(tempalpha(0),tempalpha(1),tempgamma(0),tempgamma(3),tempout(0));
CALCULATEALPHA1 : Alphacore port map
(tempalpha(2),tempalpha(3),tempgamma(2),tempgamma(1),tempout(1));
CALCULATEALPHA2 : Alphacore port map
(tempalpha(5),tempalpha(4),tempgamma(1),tempgamma(2),tempout(2));
CALCULATEALPHA3 : Alphacore port map
(tempalpha(6),tempalpha(7),tempgamma(0),tempgamma(3),tempout(3));
CALCULATEALPHA4 : Alphacore port map
(tempalpha(9),tempalpha(8),tempgamma(1),tempgamma(2),tempout(4));
CALCULATEALPHA5 : Alphacore port map
(tempalpha(10),tempalpha(11),tempgamma(0),tempgamma(3),tempout(5));
CALCULATEALPHA6 : Alphacore port map
(tempalpha(12),tempalpha(13),tempgamma(0),tempgamma(3),tempout(6));
CALCULATEALPHA7 : Alphacore port map
(tempalpha(15),tempalpha(14),tempgamma(1),tempgamma(2),tempout(7));
CALCULATEALPHA8 : Alphacore port map
(tempalpha(1),tempalpha(0),tempgamma(0),tempgamma(3),tempout(8));
CALCULATEALPHA9 : Alphacore port map
(tempalpha(2),tempalpha(3),tempgamma(1),tempgamma(2),tempout(9));
CALCULATEALPHAa : Alphacore port map
(tempalpha(4),tempalpha(5),tempgamma(1),tempgamma(2),tempout(10));
CALCULATEALPHAb : Alphacore port map
(tempalpha(7),tempalpha(6),tempgamma(0),tempgamma(3),tempout(11));
CALCULATEALPHAc : Alphacore port map
(tempalpha(8),tempalpha(9),tempgamma(1),tempgamma(2),tempout(12));
CALCULATEALPHAd : Alphacore port map
(tempalpha(11),tempalpha(10),tempgamma(0),tempgamma(3),tempout(13));
CALCULATEALPHAe : Alphacore port map
(tempalpha(13),tempalpha(12),tempgamma(0),tempgamma(3),tempout(14));
CALCULATEALPHAf : Alphacore port map
(tempalpha(14),tempalpha(15),tempgamma(1),tempgamma(2),tempout(15));
end beha; -- End the behavior of the architecture for TotalAlphaCore
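-- Sketch (not part of the original listing): the AlphaType package is used
-- above but is not reproduced in this part of the appendix. A minimal version
-- consistent with the port comments (a 16x10 bus and a 4x10 bus) and with the
-- indices used in the port maps might look as follows. The package name
-- AlphaType_sketch and the index ranges 0 to 15 and 0 to 3 are assumptions;
-- the actual package may differ.
library ieee;
use ieee.std_logic_1164.all;

package AlphaType_sketch is
  -- sixteen 10-bit state metrics, one per trellis state
  type Alpha is array (0 to 15) of std_logic_vector(9 downto 0);
  -- four 10-bit branch metrics (00, 01, 10, 11)
  type Gamma is array (0 to 3) of std_logic_vector(9 downto 0);
end package AlphaType_sketch;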
-- This module calculates the normalization value used in the FinalAlpha module.
-- maxlog is used to calculate log(exp(x) + exp(y)); only max(x,y) is considered.





use work.AlphaType.all; -- User-defined package
entity total_gammaAlpha is -- Entity definition
port ( gammain : in Gamma; -- 4x10 bus defined in AlphaType
alphain : in Alpha; -- 16x10 bus defined in AlphaType
totalout : out std_logic_vector(9 downto 0)
);
end entity;
architecture beha of total_gammaAlpha is
component maxlog is -- component maxlog is declared
port ( x,y : in std_logic_vector(9 downto 0);
logout : out std_logic_vector(9 downto 0)
);
end component;
component adder2 is -- adder2 declaration
port (dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);




signal tempgamma1,tempgamma2 : std_logic_vector(9 downto 0);
signal temp1,temp2,temp3,temp4 : std_logic_vector(9 downto 0);
signal temp5,temp6,temp7,temp8 : std_logic_vector(9 downto 0);
signal temp9,temp10,temp11,temp12 : std_logic_vector(9 downto 0);
signal temp13,temp14,temp15,temp16 : std_logic_vector(9 downto 0);
signal tempout : std_logic_vector(9 downto 0);
begin -- architecture behavior
-- The inputs are selected from the trellis diagram
-- First branch calculation, by instantiating 8 maxlog modules.
-- "12" means alpha(0) and alpha(1) are the inputs to the maxlog module
FINDALPHA12 : maxlog port map(alphain(0),alphain(1),temp1);
FINDALPHA78 : maxlog port map(alphain(6),alphain(7),temp2);
FINDALPHA1112 : maxlog port map(alphain(10),alphain(11),temp3);
FINDALPHA1314 : maxlog port map(alphain(12),alphain(13),temp4);
FINDALPHA34 : maxlog port map(alphain(2),alphain(3),temp5);
FINDALPHA56 : maxlog port map(alphain(4),alphain(5),temp6);
FINDALPHA910 : maxlog port map(alphain(8),alphain(9),temp7);
FINDALPHA1516 : maxlog port map(alphain(14),alphain(15),temp8);
-- Second branch calculation
-- The inputs are the outputs of the previous modules. 4 maxlog modules are instantiated.
-- "1278" means it gets its inputs from 12 and 78
FINDALPHA1278 : maxlog port map(temp1,temp2,temp9);
FINDALPHA11121314 : maxlog port map(temp3,temp4,temp10);
FINDALPHA3456 : maxlog port map(temp5,temp6,temp11);
FINDALPHA9101516 : maxlog port map(temp7,temp8,temp12);
-- Third branch calculation
-- FIRST gets its inputs from 1278 and 11121314
-- TWO gets its inputs from 3456 and 9101516
FINDALPHAFIRST : maxlog port map(temp9,temp10,temp13);
FINDALPHATWO : maxlog port map(temp11,temp12,temp14);
FINDGAMMA1 : maxlog port map(gammain(0),gammain(3),tempgamma1);
FINDGAMMA2 : maxlog port map(gammain(1),gammain(2),tempgamma2);
ADDALPHAFIRSTGAMMA1 : adder2 port map(temp13,tempgamma1,temp15);
ADDALPHATWOGAMMA2 : adder2 port map(temp14,tempgamma2,temp16);
FINDTOTAL : maxlog port map(temp15,temp16,tempout);
totalout <= tempout;
end beha; -- End architecture.
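-- Sketch (not part of the original listing): the maxlog component is declared
-- above but its body does not appear in this part of the appendix. Going by the
-- comment that only max(x,y) is considered, a minimal version could simply
-- forward the larger of the two signed inputs. The entity name maxlog_sketch
-- and the use of ieee.numeric_std are assumptions.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity maxlog_sketch is
  port ( x, y   : in  std_logic_vector(9 downto 0);
         logout : out std_logic_vector(9 downto 0) );
end entity maxlog_sketch;

architecture beha of maxlog_sketch is
begin
  -- log(exp(x) + exp(y)) is approximated by max(x, y); the correction term
  -- computed by mylog104 is dropped, which shortens the critical path.
  logout <= x when signed(x) >= signed(y) else y;
end architecture beha;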
-- This module computes the normalized alpha values.
-- It instantiates TotalAlphaCore and total_gammaAlpha.




use work.AlphaType.all; -- User-defined package
-- Entity definition
entity FinalAlpha is
port( enable : in std_logic;
alphain : in Alpha; -- Alpha and Gamma are defined in AlphaType
gammain : in Gamma;
Alphaout : out Alpha
);
end entity;
-- Architecture (behavior) of FinalAlpha
architecture beha of FinalAlpha is
component TotalAlphaCore is -- Component declaration of TotalAlphaCore
port ( alphain : in Alpha;




component adder2 is -- Component declaration of adder2
port (dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);
dataout : out std_logic_vector(9 downto 0)
);
end component;
component total_gammaAlpha is -- Component declaration of total_gammaAlpha
port ( gammain : in Gamma;
alphain : in Alpha;




signal tempalpha,temptotalAlpha : Alpha;
signal temptotal,temptotal1 : std_logic_vector(9 downto 0);
begin
-- The output from total_gammaAlpha is 2's complemented
-- so that adder2 can be used as a subtractor
process(temptotal) -- 2's complement process
begin
if temptotal = "1000000000" then
temptotal1 <= "0111111111";
else
temptotal1 <= (not temptotal) + 1;
end if;
end process;
-- Instantiate TotalAlphaCore to calculate the alpha values without normalization
CALCULATEALPHA : TotalAlphaCore port map(alphain,gammain,tempalpha);
-- Compute the normalization value to be subtracted, by instantiating total_gammaAlpha
CALCULATETOTAL : total_gammaAlpha port map(gammain,alphain,temptotal);
-- Instantiate 16 adder2 modules to normalize the output from TotalAlphaCore
CALTEMPALPHA0 : adder2 port map(tempalpha(0),temptotal1,temptotalAlpha(0));
CALTEMPALPHA1 : adder2 port map(tempalpha(1),temptotal1,temptotalAlpha(1));
CALTEMPALPHA2 : adder2 port map(tempalpha(2),temptotal1,temptotalAlpha(2));
CALTEMPALPHA3 : adder2 port map(tempalpha(3),temptotal1,temptotalAlpha(3));
CALTEMPALPHA4 : adder2 port map(tempalpha(4),temptotal1,temptotalAlpha(4));
CALTEMPALPHA5 : adder2 port map(tempalpha(5),temptotal1,temptotalAlpha(5));
CALTEMPALPHA6 : adder2 port map(tempalpha(6),temptotal1,temptotalAlpha(6));
CALTEMPALPHA7 : adder2 port map(tempalpha(7),temptotal1,temptotalAlpha(7));
CALTEMPALPHA8 : adder2 port map(tempalpha(8),temptotal1,temptotalAlpha(8));
CALTEMPALPHA9 : adder2 port map(tempalpha(9),temptotal1,temptotalAlpha(9));
CALTEMPALPHA10 : adder2 port map(tempalpha(10),temptotal1,temptotalAlpha(10));
CALTEMPALPHA11 : adder2 port map(tempalpha(11),temptotal1,temptotalAlpha(11));
CALTEMPALPHA12 : adder2 port map(tempalpha(12),temptotal1,temptotalAlpha(12));
CALTEMPALPHA13 : adder2 port map(tempalpha(13),temptotal1,temptotalAlpha(13));
CALTEMPALPHA14 : adder2 port map(tempalpha(14),temptotal1,temptotalAlpha(14));
CALTEMPALPHA15 : adder2 port map(tempalpha(15),temptotal1,temptotalAlpha(15));
-- When temptotalAlpha changes, check whether enable is 1.
-- enable comes from the control module; the output is available when enable is 1.
process(enable,temptotalAlpha)
begin




end beha; -- End the behavior of FinalAlpha
-- This module computes the branch metrics from the systematic data, parity data, and
-- extrinsic values.
-- mylog104 is used to calculate log(exp(x)+exp(y))
-- Gammaout contains gamma00 (0/0: input 0 and output 0), gamma01 (0/1), gamma10 (1/0),
-- and gamma11 (1/1)





use work.AlphaType.all; -- User-defined package
-- Entity definition
entity gammaF is
port (enable : in std_logic;
datain : in std_logic_vector(9 downto 0); -- Systematic data
parityin : in std_logic_vector(9 downto 0); -- parity data
L_eadd : in std_logic_vector(9 downto 0); -- extrinsic value




architecture beha of gammaF is
-- Intermediate signal definitions
signal tempLe : std_logic_vector(9 downto 0);
signal gamma00 : std_logic_vector(9 downto 0);
signal gamma01 : std_logic_vector(9 downto 0);
signal gamma10 : std_logic_vector(9 downto 0);
signal gamma11 : std_logic_vector(9 downto 0);
signal tempdatain : std_logic_vector(9 downto 0);
signal tempparityin : std_logic_vector(9 downto 0);
signal tempL_E : std_logic_vector(9 downto 0);
component mylog104 is -- Component mylog104 is declared
port( x,y : in std_logic_vector(9 downto 0) ;
logout : out std_logic_vector(9 downto 0)
);
end component mylog104;
component adder3 is -- Component adder3 declaration
port ( dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);
datac : in std_logic_vector(9 downto 0);
dataout : out std_logic_vector(9 downto 0)
);
end component;
component adder4 is -- Component adder4 declaration
port( datain1 : in std_logic_vector(9 downto 0);
datain2 : in std_logic_vector(9 downto 0);
datain3 : in std_logic_vector(9 downto 0);
datain4 : in std_logic_vector(9 downto 0);







if datain = "1000000000" then
tempdatain <= "0111111111";
else














if tempLe = "1000000000" then
tempL_E <= "0111111111";
else
tempL_E <= (not tempLe) + 1;
end if;
end process;
-- Calculate the state transition probability from the extrinsic value
-- Instantiate the mylog104 module
calculateL_e : mylog104 port map("0000000000",L_eadd,tempLe);
-- The equations are derived under the Gamma Module in this thesis
-- Gamma00 calculation
CALCULATEGAMMA00 : adder3 port map(tempdatain,tempparityin,tempL_E,gamma00);
-- Gamma01 calculation
CALCULATEGAMMA01 : adder3 port map(tempdatain,parityin,tempL_E,gamma01);
-- Gamma10 calculation
CALCULATEGAMMA10 : adder4 port map(datain,tempparityin,tempL_E,L_eadd,gamma10);
-- Gamma11 calculation
CALCULATEGAMMA11 : adder4 port map(datain,parityin,tempL_E,L_eadd,gamma11);
-- When enable is 1 the output is updated.
-- The output is held until enable next becomes 1 (latch operation).
process(enable,gamma00,gamma01,gamma10,gamma11)
begin







end beha; -- End the behavior of GammaF
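-- Sketch (not part of the original listing): the bodies of the enable
-- processes are not legible in this part of the appendix. The comments
-- describe a latch that passes its input while enable is 1 and holds the last
-- value otherwise. A minimal stand-alone version of that behaviour, with the
-- hypothetical entity name enable_latch_sketch and a single 10-bit value
-- instead of the Gamma bus, could be written as:
library ieee;
use ieee.std_logic_1164.all;

entity enable_latch_sketch is
  port ( enable : in  std_logic;
         d      : in  std_logic_vector(9 downto 0);
         q      : out std_logic_vector(9 downto 0) );
end entity enable_latch_sketch;

architecture beha of enable_latch_sketch is
begin
  process(enable, d)
  begin
    if enable = '1' then
      q <= d;      -- transparent while enable is 1
    end if;        -- no else branch: q keeps its value, so a latch is inferred
  end process;
end architecture beha;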
-- This module is used to calculate the LLR.
-- The inputs are Alpha (16x10 bus), Beta (16x10 bus), and Gamma (4x10 bus).
-- maxlog is used to calculate log(exp(x)+exp(y)) to reduce the critical path delay.
-- A tree architecture is used to calculate the LLR.
-- Inputs for the adder3 modules must be derived from the trellis diagram.






use work.AlphaType.all; -- user-defined package
-- Entity definition
entity LLRCore is
port ( enable : in std_logic;
alphain : in Alpha;
betain : in Alpha;
gammain : in Gamma;
LLRout : out std_logic_vector(9 downto 0)
);
end entity;
architecture beha of LLRCore is
component adder2 is -- adder2 declaration
port (dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);
dataout : out std_logic_vector(9 downto 0)
);
end component;
component adder3 is -- adder3 declaration
port ( dataa : in std_logic_vector(9 downto 0);
datab : in std_logic_vector(9 downto 0);
datac : in std_logic_vector(9 downto 0);
dataout : out std_logic_vector(9 downto 0)
);
end component;
component maxlog is -- maxlog declaration
port( -- clk : in std_logic;
x,y : in std_logic_vector(9 downto 0) ;
logout : out std_logic_vector(9 downto 0)
);
end component maxlog;
-- Intermediate signal definitions for input 1
signal temp0,temp1,temp2,temp3 : std_logic_vector(9 downto 0);
signal temp4,temp5,temp6,temp7 : std_logic_vector(9 downto 0);
signal temp8,temp9,temp10,temp11 : std_logic_vector(9 downto 0);
signal temp12,temp13,temp14,temp15 : std_logic_vector(9 downto 0);
signal templog0,templog1,templog2 : std_logic_vector(9 downto 0);
signal templog3,templog4,templog5,templog6,templog7 : std_logic_vector(9 downto 0);
signal templog20,templog21,templog22,templog23 : std_logic_vector(9 downto 0);
signal templog30,templog31 : std_logic_vector(9 downto 0);
-- Intermediate signal definitions for input 0
signal Ztemp0,Ztemp1,Ztemp2,Ztemp3 : std_logic_vector(9 downto 0);
signal Ztemp4,Ztemp5,Ztemp6,Ztemp7 : std_logic_vector(9 downto 0);
signal Ztemp8,Ztemp9,Ztemp10,Ztemp11 : std_logic_vector(9 downto 0);
signal Ztemp12,Ztemp13,Ztemp14,Ztemp15 : std_logic_vector(9 downto 0);
signal Ztemplog0,Ztemplog1,Ztemplog2,Ztemplog3 : std_logic_vector(9 downto 0);
signal Ztemplog4,Ztemplog5,Ztemplog6,Ztemplog7 : std_logic_vector(9 downto 0);
signal Ztemplog20,Ztemplog21,Ztemplog22,Ztemplog23 : std_logic_vector(9 downto 0);
signal Ztemplog30,Ztemplog31 : std_logic_vector(9 downto 0);
signal tempLLR1,tempLLR0 : std_logic_vector(9 downto 0);
signal twosLLR0,tempLLRout : std_logic_vector(9 downto 0);
begin
-- When enable is 1 the output is updated (latch)
process(enable,tempLLRout)
begin




-- The LLR is calculated by subtracting the probability due to input 0 from the probability due to input 1.
-- The probability due to input 0 is 2's complemented.
process(tempLLR0) -- 2's complement
begin
if tempLLR0 = "1000000000" then
twosLLR0 <= "0111111111";
else
twosLLR0 <= (not tempLLR0) + 1;
end if;
end process;
-- adder2 is instantiated to calculate the final LLR
FINDLLROUT : adder2 port map(tempLLR1,twosLLR0,tempLLRout);
-- First branch calculations for input 1: the inputs are simply added and fed to maxlog.
-- 16 adder modules are instantiated for input 1.
FINDTEMP0 : adder3 port map(gammain(3),alphain(0),betain(8),temp0);
FINDtemp1 : adder3 port map( gammain(3),alphain(1),betain(0),temp1);
FINDtemp2 : adder3 port map( gammain(2),alphain(2),betain(1),temp2);
FINDtemp3 : adder3 port map( gammain(2), alphain(3), betain(9),temp3);
FINDtemp4 : adder3 port map( gammain(2), alphain(4), betain(2),temp4);
FINDtemp5 : adder3 port map( gammain(2),alphain(5), betain(10),temp5);
FINDtemp6 : adder3 port map( gammain(3),alphain(6), betain(11),temp6);
FINDtemp7 : adder3 port map( gammain(3),alphain(7) , betain(3),temp7);









FINDtemp9 : adder3 port map( gammain(2) , alphain(9), betain(12),temp9);
FINDtemp10 : adder3 port map( gammain(3) , alphain(10) , betain(13),temp10);
FINDtemp11 : adder3 port map( gammain(3) , alphain(11), betain(5),temp11);
FINDtemp12 : adder3 port map( gammain(3) , alphain(12), betain(14),temp12);
FINDtemp13 : adder3 port map( gammain(3) , alphain(13),betain(6),temp13);
FINDtemp14 : adder3 port map( gammain(2) , alphain(14), betain(7),temp14);
FINDtemp15 : adder3 port map( gammain(2) , alphain(15) , betain(15),temp15);
-- First branch calculations for input 0: the inputs are simply added and fed to maxlog

















FINDZtemp0 : adder3 port map( gammain(0) , alphain(0) , betain(0),Ztemp0);
FINDZtemp1 : adder3 port map( gammain(0) , alphain(1) , betain(8),Ztemp1);
FINDZtemp2 : adder3 port map( gammain(1) , alphain(2) , betain(9),Ztemp2);
FINDZtemp3 : adder3 port map( gammain(1) , alphain(3) , betain(1),Ztemp3);
FINDZtemp4 : adder3 port map( gammain(1) , alphain(4) , betain(10),Ztemp4);
FINDZtemp5 : adder3 port map( gammain(1) , alphain(5) , betain(2),Ztemp5);
FINDZtemp6 : adder3 port map( gammain(0) , alphain(6) , betain(3),Ztemp6);
FINDZtemp7 : adder3 port map( gammain(0) , alphain(7) , betain(11),Ztemp7);
FINDZtemp8 : adder3 port map( gammain(1) , alphain(8) , betain(12),Ztemp8);
FINDZtemp9 : adder3 port map( gammain(1) , alphain(9) , betain(4),Ztemp9);
FINDZtemp10 : adder3 port map( gammain(0) , alphain(10) , betain(5),Ztemp10);
FINDZtemp11 : adder3 port map( gammain(0) , alphain(11) , betain(13),Ztemp11);
FINDZtemp12 : adder3 port map( gammain(0) , alphain(12) , betain(6),Ztemp12);
FINDZtemp13 : adder3 port map( gammain(0) , alphain(13) , betain(14),Ztemp13);
FINDZtemp14 : adder3 port map( gammain(1) , alphain(14) , betain(15),Ztemp14);
FINDZtemp15 : adder3 port map( gammain(1) , alphain(15) , betain(7),Ztemp15);
-- Calculate the intermediate log values for input 1
-- First branch modules (adder3) provide the inputs to the maxlog modules
-- Second branch calculations for input 1
-- 8 maxlog modules are instantiated
CALCULATETEMPLOG0 : maxlog port map(temp0,temp1,templog0);
CALCULATETEMPLOG1 : maxlog port map(temp2,temp3,templog1);
CALCULATETEMPLOG2 : maxlog port map(temp4,temp5,templog2);
CALCULATETEMPLOG3 : maxlog port map(temp6,temp7,templog3);
CALCULATETEMPLOG4 : maxlog port map(temp8,temp9,templog4);
CALCULATETEMPLOG5 : maxlog port map(temp10,temp11,templog5);
CALCULATETEMPLOG6 : maxlog port map(temp12,temp13,templog6);
CALCULATETEMPLOG7 : maxlog port map(temp14,temp15,templog7);
-- Third branch calculation for input 1
-- Second branch modules provide the input signals.
-- 4 maxlog modules are instantiated; inputs are from the previous maxlog modules
CALCULATETEMPLOG20 : maxlog port map(templog0,templog1,templog20);
CALCULATETEMPLOG21 : maxlog port map(templog2,templog3,templog21);
CALCULATETEMPLOG22 : maxlog port map(templog4,templog5,templog22);
CALCULATETEMPLOG23 : maxlog port map(templog6,templog7,templog23);
-- Fourth branch calculation for input 1
-- Third branch modules provide the input signals
-- 2 maxlog modules are instantiated; inputs are from the previous maxlog modules
CALCULATETEMPLOG30 : maxlog port map(templog20,templog21,templog30);
CALCULATETEMPLOG31 : maxlog port map(templog22,templog23,templog31);
-- Probability for input 1
CALCULATETEMPLLR1 : maxlog port map(templog30,templog31,tempLLR1);
-- Calculate the intermediate log values for input 0
-- First branch modules (adder3 for input 0) provide the inputs to the maxlog modules
-- Second branch calculations for input 0
-- 8 maxlog modules are instantiated
CALCULATEZTEMPLOG0 : maxlog port map(Ztemp0,Ztemp1,Ztemplog0);
CALCULATEZTEMPLOG1 : maxlog port map(Ztemp2,Ztemp3,Ztemplog1);
CALCULATEZTEMPLOG2 : maxlog port map(Ztemp4,Ztemp5,Ztemplog2);
CALCULATEZTEMPLOG3 : maxlog port map(Ztemp6,Ztemp7,Ztemplog3);
CALCULATEZTEMPLOG4 : maxlog port map(Ztemp8,Ztemp9,Ztemplog4);
CALCULATEZTEMPLOG5 : maxlog port map(Ztemp10,Ztemp11,Ztemplog5);
CALCULATEZTEMPLOG6 : maxlog port map(Ztemp12,Ztemp13,Ztemplog6);
CALCULATEZTEMPLOG7 : maxlog port map(Ztemp14,Ztemp15,Ztemplog7);
-- Third branch calculation for input 0
-- Second branch modules provide the input signals
-- 4 maxlog modules are instantiated; inputs are from the previous maxlog modules
CALCULATEZTEMPLOG20 : maxlog port map(Ztemplog0,Ztemplog1,Ztemplog20);
CALCULATEZTEMPLOG21 : maxlog port map(Ztemplog2,Ztemplog3,Ztemplog21);
CALCULATEZTEMPLOG22 : maxlog port map(Ztemplog4,Ztemplog5,Ztemplog22);
CALCULATEZTEMPLOG23 : maxlog port map(Ztemplog6,Ztemplog7,Ztemplog23);
-- Fourth branch calculation for input 0
-- Third branch modules provide the input signals
-- 2 maxlog modules are instantiated; inputs are from the previous maxlog modules
CALCULATEZTEMPLOG30 : maxlog port map(Ztemplog20,Ztemplog21,Ztemplog30);
CALCULATEZTEMPLOG31 : maxlog port map(Ztemplog22,Ztemplog23,Ztemplog31);
-- Probability for input 0
CALCULATETEMPLLR0 : maxlog port map(Ztemplog30,Ztemplog31,tempLLR0);
end beha; -- End the behavior for LLR
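-- Sketch (not part of the original listing): the four maxlog levels above
-- reduce 16 values to one in a binary tree (16 -> 8 -> 4 -> 2 -> 1). The same
-- reduction can be expressed as a function, which may be handy as a reference
-- when checking the instantiated tree in simulation. It assumes the max-only
-- behaviour described for maxlog; the names tree_sketch_pkg, metric16 and
-- tree_max are illustrative.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package tree_sketch_pkg is
  type metric16 is array (0 to 15) of signed(9 downto 0);
  function tree_max(v : metric16) return signed;
end package tree_sketch_pkg;

package body tree_sketch_pkg is
  function tree_max(v : metric16) return signed is
    variable t : metric16 := v;
    variable n : natural := 16;   -- number of live entries at the current level
  begin
    for level in 1 to 4 loop
      n := n / 2;
      for i in 0 to 7 loop
        if i < n then
          -- entry i of the next level is the larger of one adjacent pair
          if t(2*i + 1) > t(2*i) then
            t(i) := t(2*i + 1);
          else
            t(i) := t(2*i);
          end if;
        end if;
      end loop;
    end loop;
    return t(0);
  end function tree_max;
end package body tree_sketch_pkg;
-- A tree of depth 4 places four comparators on the critical path, whereas a
-- linear chain over 16 values would place fifteen.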
-- This module calculates the extrinsic value for each data value.
-- The inputs are the output of the LLR module, datain from datamem, and LEin from LEmem.







port (enable : in std_logic; -- Signal from the control module
LLRin : in std_logic_vector(9 downto 0);
datain : in std_logic_vector(9 downto 0);
LEin : in std_logic_vector(9 downto 0);
LEout : out std_logic_vector(9 downto 0)
);
end entity;
architecture beha of LEModule is
-- Component description
component adder4 is
port( datain1 : in std_logic_vector(9 downto 0);
datain2 : in std_logic_vector(9 downto 0);
datain3 : in std_logic_vector(9 downto 0);
datain4 : in std_logic_vector(9 downto 0);
dataout : out std_logic_vector(9 downto 0)
);
end component;
signal tempLEout,tempdatain,tempLEin : std_logic_vector(9 downto 0);
begin
process(enable,tempLEout) -- process to assign the output when enable is 1
begin




process(datain) --2's complement for datain
begin
if datain = "1000000000" then
tempdatain <= "0111111111";
else
tempdatain <= (not datain) + 1;
end if;
end process;
process(LEin) -- 2's complement for LEin
begin
if LEin = "1000000000" then
tempLEin <= "0111111111";
else




-- Instantiate adder4 to subtract datain and LEin from LLRin
FINDLE : adder4 port map(LLRin,tempdatain,tempdatain,tempLEin,tempLEout);
end beha;
-- This module provides all the control signals and the addresses needed for the data, parity,
-- and extrinsic values.




use ieee.numeric_std.all;
entity control is
port( clk : in std_logic; -- Clock signal input
start : in std_logic; -- Starts the decoding process
countup : out std_logic_vector(9 downto 0); -- counter value for the forward direction
countdown : out std_logic_vector(9 downto 0); -- counter value for the backward direction
dataLEaddressF : out std_logic_vector(9 downto 0); -- Address for LE, forward
dataLEaddressB : out std_logic_vector(9 downto 0); -- Address for LE, backward
parityaddressF : out std_logic_vector(9 downto 0); -- Parity address, forward
parityaddressB : out std_logic_vector(9 downto 0); -- Parity address, backward
alphaAddress : out std_logic_vector(9 downto 0); -- Address for alpha
betaAddress : out std_logic_vector(9 downto 0); -- Address for beta
gammaAddressF : out std_logic_vector(9 downto 0); -- Gamma address, forward
gammaAddressB : out std_logic_vector(9 downto 0); -- Gamma address, backward
datamemenable : out std_logic; -- memory enable for datamem
parity1enable : out std_logic; -- memory enable for parity1mem
parity2enable : out std_logic; -- memory enable for parity2mem
LEmemenable : out std_logic; -- memory enable for LEmem
datawr : out std_logic; -- Read/write enable for datamem
parity1wr : out std_logic; -- Read/write enable for parity1mem
parity2wr : out std_logic; -- Read/write enable for parity2mem
LEwr : out std_logic; -- Read/write enable for LEmem
alphamemenable : out std_logic; -- memory enable for alphamem
betamemenable : out std_logic; -- memory enable for betamem
gammamemenable : out std_logic; -- memory enable for gamma memory
alphawr : out std_logic; -- Read/write enable for alphamem
betawr : out std_logic; -- Read/write enable for betamem
gammawr : out std_logic; -- Read/write enable for gammamem
alphaenable : out std_logic; -- Enable for FinalAlpha module
betaenable : out std_logic; -- Enable for FinalBeta module
gammaenable : out std_logic; -- Enable for GammaF module
LLRenable : out std_logic; -- Enable for LLR module
LEenable : out std_logic; -- Enable for LEModule
Ld_Areg : out std_logic; -- Load signal for Alpha register
Ld_Breg : out std_logic; -- Load signal for Beta register
decode : out std_logic; -- decode is set to one at the end of the iteration
decoder : out std_logic -- Used to switch from decoder 1 to decoder 2
);
end entity;
architecture beha of control is
-- Define an array type to hold the interleaver addresses
type inter is array (0 to 1023) of std_logic_vector(9 downto 0);
-- Interleaver addresses are defined as constants.
































signal sig_countup : std_logic_vector(9 downto 0) := "0000000000"; -- initialized to zero
signal sig_countdown : std_logic_vector(9 downto 0) := "1111111111"; -- initialized to 1023
signal sig_decoder : std_logic := '0'; -- DECODER IS INITIALIZED TO ZERO => decoder 1
type state_type is (IDLE,STA0,STA1,STA2,STA3); -- State transition for the control module
signal state, next_state : state_type;
signal iter : std_logic_vector(2 downto 0) := "100"; -- NUMBER OF ITERATIONS
signal sig_dataLEaddressF : std_logic_vector(9 downto 0) := "0000000000";
signal sig_dataLEaddressB : std_logic_vector(9 downto 0) := "1111111111";
signal sig_parityaddressF : std_logic_vector(9 downto 0);
signal sig_parityaddressB : std_logic_vector(9 downto 0);
signal sig_alphaAddress : std_logic_vector(9 downto 0);
signal sig_betaAddress : std_logic_vector(9 downto 0);
signal sig_gammaAddressF : std_logic_vector(9 downto 0);
signal sig_gammaAddressB : std_logic_vector(9 downto 0);
signal sig_datamemenable : std_logic := ?;
signal sig_parity1enable : std_logic :='G ;
signal sig_parity2enable : std_logic := ?';
signal sig_LEmemenable : std_logic := '1';
signal sig_datawr : std_logic := ?' ;
signal sig_parity1wr : std_logic := ?' ;
signal sig_parity2wr : std_logic := '0';
signal sig_LEwr : std_logic := '0';
signal sig_alphamemenable : std_logic := '1';
signal sig_betamemenable : std_logic := '1';
signal sig_gammamemenable : std_logic := ?;
signal sig_alphawr : std_logic := ?;
signal sig_betawr : std_logic := '1';
signal sig_gammawr : std_logic := '1';
signal sig_alphaenable : std_logic; -- ALPHA MODULE
signal sig_betaenable : std_logic; -- BETA MODULE
signal sig_gammaenable : std_logic; -- GAMMA MODULE
signal sig_LLRenable : std_logic; -- LLR MODULE
signal sig_LEenable : std_logic; -- LE MODULE
signal sig_Ld_Areg : std_logic := '0';
signal sig_Ld_Breg : std_logic := ?';
signal sig_decode : std_logic := '1';
begin
-- BASED ON THE DECODER, THE PARITY ENABLE IS SELECTED
PARITYSELECT: process(sig_decoder)
begin









-- DATA ADDRESSES ARE ASSIGNED ACCORDING TO THE DECODER
begin
-- if (clk'event and clk = ?') then
if sig_decoder = '0' then -- decoder 1
sig_dataLEaddressF <= sig_countup;
sig_dataLEaddressB <= sig_countdown;




-- ONCE THE MIDDLE OF THE FRAME IS REACHED,
-- ALPHA AND BETA ADDRESSES ARE SWITCHED TO READ FROM THE MEMORY









process(clk) -- State transitions for the control module
begin









if (sig_decoder = ?') then -- if the decoding process is already started
next_state <= STA0;
elsif (sig_decoder = '0' and (start = '1' or sig_decode = '0')) then -- if start is active high
next_state <= STA0;
else -- Either decoding not started or start is active low
next_state <= IDLE; -- remains until start is active high
end if;
when STA0 => -- decoding is in process
next_state <= STA1; -- go to next state
when STA1 =>
if sig_countup <= "0111111111" then -- counter has not reached the middle of the frame
next_state <= STA0; -- go to STA0
else
next_state <= STA2; -- go to STA2; LLR calculation
end if;
when STA2 =>
if sig_countup = "1111111111" then -- counter reached the end of the frame
next_state <= IDLE;






end process; -- state transitions for control module end
process(clk) -- Control signal assignments based on the state
begin




































sig_countup <= sig_countup + 1;
sig_countdown <= sig_countdown - 1;
when STA1 =>



































if (iter = "000" and sig_decoder = '1') then













sig_countup <= sig_countup + 1;
sig_countdown <= sig_countdown - 1;
if (sig_countup = "1111111111") then -- at the end of the frame
sig_decoder <= not sig_decoder; -- Switch the decoder
end if;
-- the iteration count is reduced by one when the second decoder has reached the end of the frame
if (sig_decoder = '1' and sig_countup = "1111111111") then



















end process; --Signal assignment process
-- Assign all temporary signal values to the outputs of the control module
countup <= sig_countup;
countdown <= sig_countdown;
dataLEaddressF <= sig_dataLEaddressF ;
dataLEaddressB <= sig_dataLEaddressB ;








parity2enable <= sig_parity2enable ;
LEmemenable <= sig_LEmemenable ;
datawr <= sig_datawr ;
parity1wr <= sig_parity1wr ;
parity2wr <= sig_parity2wr ;
LEwr <= sig_LEwr ;
alphamemenable <= sig_alphamemenable ;
betamemenable <= sig_betamemenable ;
gammamemenable <= sig_gammamemenable;
alphawr <= sig_alphawr ;
betawr <= sig_betawr ;
gammawr <= sig_gammawr ;
alphaenable <= sig_alphaenable ;
betaenable <=sig_betaenable ;
gammaenable <= sig_gammaenable ;
LLRenable <= sig_LLRenable ;
LEenable <= sig_LEenable;
Ld_Areg <= sig_Ld_Areg ;
Ld_Breg <= sig_Ld_Breg ;
decoder <= sig_decoder;
decode <= sig_decode;
end beha; --Behavior of the control module
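The control module above steps a pair of 10-bit counters through the 1024-symbol frame, one counting up for the forward recursion and one counting down for the backward recursion, and it changes behaviour once the up counter passes the middle of the frame. The sketch below isolates only that counter pair. It is not the thesis's control module: the entity name frame_counter_sketch and the ports run, second_half and frame_done are hypothetical names introduced for illustration, and numeric_std arithmetic is assumed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical illustration only; names are not taken from the thesis.
entity frame_counter_sketch is
  port (
    clk         : in  std_logic;
    run         : in  std_logic;                      -- advance the counters while '1'
    countup     : out std_logic_vector(9 downto 0);   -- forward index, 0 -> 1023
    countdown   : out std_logic_vector(9 downto 0);   -- backward index, 1023 -> 0
    second_half : out std_logic;                      -- '1' once countup > 511
    frame_done  : out std_logic                       -- '1' when countup = 1023
  );
end entity;

architecture rtl of frame_counter_sketch is
  signal up   : unsigned(9 downto 0) := (others => '0');
  signal down : unsigned(9 downto 0) := (others => '1');
begin
  -- Both counters move in lock step and wrap modulo 1024.
  process (clk)
  begin
    if rising_edge(clk) then
      if run = '1' then
        up   <= up + 1;
        down <= down - 1;
      end if;
    end if;
  end process;

  countup     <= std_logic_vector(up);
  countdown   <= std_logic_vector(down);
  -- Same threshold as the listing's test sig_countup <= "0111111111".
  second_half <= '1' when up > 511 else '0';
  frame_done  <= '1' when up = 1023 else '0';
end architecture;
```

Because the two counters always sum to 1023, a design could derive one from the other; keeping both as registers simply mirrors the two address outputs of the listing.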
-- This is the module for the complete decoder







port (clk : in std_logic;
start : in std_logic;
-- for test purposes; not required in the actual implementation
testcountup : out std_logic_vector(9 downto 0); -- just for test purpose
testcountdown : out std_logic_vector(9 downto 0); -- just for test purpose
testdecode : out std_logic; -- just for test purpose
testdecoder : out std_logic; -- just for test purpose
testllrenable : out std_logic;
gammaFout : out Gamma;
gammaBout : out Gamma;
alphaout : out Alpha;
betaout : out Alpha;
-- The following two outputs are the actual outputs from the decoder
DataoutB : out std_logic; -- decoded data from the forward modules




architecture beha of TestGa
-- data memory generated


















IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);







OUT STD_LOGIC_VECTOR (9 DOWNTO 0);
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)














IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);







OUT STD_LOGIC_VECTOR (9 DOWNTO 0);
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
end component;









IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);






















































OUT STD_LOGIC_VECTOR (9 DOWNTO 0);
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
-- Dual port memory
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC := '1';
IN STD_LOGIC := ?;
IN STD_LOGIC := '1';
: OUT STD_LOGIC_VECTOR (9 DOWNTO 0);
: OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
: OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC ;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;


























IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
IN STD_LOGIC ;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;






















IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;


















































IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
: OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
);
end component;









IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;













IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
























IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
end component;
















IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;





IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC ;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
);
end component;








IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
);
end component;








IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
IN STD_LOGIC ;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;













IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC ;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
);
end component;









IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;

























component beta6 -- Memory for Beta(6)
PORT
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
-- Memory for Beta(S)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;








IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;













IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
end component;

























IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;









IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;















IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;











IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC ;















component beta14 -- Memory for Beta(14)
PORT
(
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;








IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;












IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;



























-- Memory for gamma00; Dual port memory
a : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
b : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC := '1';
IN STD_LOGIC := '1';
IN STD_LOGIC := '1';
OUT STD_LOGIC_VECTOR (9 DOWNTO 0);
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
-- Memory for gamma01; Dual port memory
a : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
b : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);























IN STD_LOGIC := '1';
OUT STD_LOGIC_VECTOR (9 DOWNTO 0);
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
-- Memory for gamma10; Dual port memory
a : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
b : IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC := '1';
IN STD_LOGIC := '1';
IN STD_LOGIC := '1';
OUT STD_LOGIC_VECTOR (9 DOWNTO 0);
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
component gamma3
PORT












: IN STD_LOGIC_VECTOR (9 DOWNTO 0);
: IN STD_LOGIC_VECTOR (9 DOWNTO 0);
IN STD_LOGIC;
IN STD_LOGIC_VECTOR (9 DOWNTO 0);







OUT STD_LOGIC_VECTOR (9 DOWNTO 0);
OUT STD_LOGIC_VECTOR (9 DOWNTO 0)
);
end component;
component FinalBeta is -- FinalBeta module
port( enable : in std_logic;
betain : in Alpha;
gammain : Gamma;
Betaout : out Alpha
);
end component;
component FinalAlpha is -- FinalAlpha module
port( enable : in std_logic;
alphain : in Alpha;
gammain : Gamma;
Alphaout : out Alpha
);
end component;
component gammaF is -- GammaF module
port ( enable : in std_logic;
datain : in std_logic_vector(9 downto 0);
parityin : in std_logic_vector(9 downto 0);
L_eadd : in std_logic_vector(9 downto 0);
gammaout : out Gamma
);
end component;
component LLRCore is -- LLRCore module
port ( enable : in std_logic;
alphain : in Alpha;
betain : in Alpha;
gammain : in Gamma;
LLRout : out std_logic_vector(9 downto 0)
);
end component;
component LEModule is -- LEModule definition
port ( enable : in std_logic;
LLRin : in std_logic_vector(9 downto 0);
datain : in std_logic_vector(9 downto 0);
LEin : in std_logic_vector(9 downto 0);
LEout : out std_logic_vector(9 downto 0)
);
end component;
component control is -- control module definition
port( clk : in std_logic;
start : in std_logic;
countup : out std_logic_vector(9 downto 0);
countdown : out std_logic_vector(9 downto 0);
dataLEaddressF : out std_logic_vector(9 downto 0);
dataLEaddressB : out std_logic_vector(9 downto 0);
parityaddressF : out std_logic_vector(9 downto 0);
parityaddressB : out std_logic_vector(9 downto 0);
alphaAddress : out std_logic_vector(9 downto 0);
betaAddress : out std_logic_vector(9 downto 0);
gammaAddressF : out std_logic_vector(9 downto 0);
gammaAddressB : out std_logic_vector(9 downto 0);
datamemenable : out std_logic;
parity1enable : out std_logic;
parity2enable : out std_logic;
LEmemenable : out std_logic;
datawr : out std_logic;
parity1wr : out std_logic;
parity2wr : out std_logic;
LEwr : out std_logic;
alphamemenable : out std_logic;
betamemenable : out std_logic;
gammamemenable : out std_logic;
alphawr : out std_logic;
betawr : out std_logic;
gammawr : out std_logic;
alphaenable : out std_logic;
betaenable : out std_logic;
gammaenable : out std_logic;
LLRenable : out std_logic;
LEenable : out std_logic;
Ld_Areg : out std_logic;
Ld_Breg : out std_logic;
decode : out std_logic;




























signal iniBeta : Alpha ;
signal sig_countup : std_logic_vector(9 downto 0);
signal sig_countdown : std_logic_vector(9 downto 0);
signal sig_dataLEaddressB : std_logic_vector(9 downto 0);
signal sig_dataLEaddressF : std_logic_vector(9 downto 0);
signal sig_parityaddressF : std_logic_vector(9 downto 0);
signal sig_parityaddressB : std_logic_vector(9 downto 0);
signal sig_alphaAddress : std_logic_vector(9 downto 0);
signal sig_betaAddress : std_logic_vector(9 downto 0);
signal sig_gammaAddressF : std_logic_vector(9 downto 0);




























































-- Alpha register is initialized to (1 => 0, rest => -32 (largest-magnitude negative value for the (10,4) signed representation))




-- Beta values initialized for decoder 1; encoder 1 is terminated to state 0




-- Beta values initialized for decoder 2; encoder 2 is not terminated.
signal Beta_reg1 : Alpha := (others => "1111010100"); -- log(1/16), all states are possible
signal LLRAlpha,LLRBeta : Alpha;
signal gammaFadd,gammaBadd : std_logic_vector(9 downto 0);
signal tempgammaoutF,tempgammaoutB : Gamma;
signal tempgammaF,tempgammaB : Gamma;
signal gammaFin,gammaBin : Gamma;
signal sig_decoder : std_logic ;
signal tempdataF,tempdataB : std_logic_vector(9 downto 0);
signal tempparity1F,tempparity1B : std_logic_vector(9 downto 0);
signal tempparity2F,tempparity2B : std_logic_vector(9 downto 0);
signal tempparityF,tempparityB : std_logic_vector(9 downto 0);
signal tempLEF,tempLEB : std_logic_vector(9 downto 0);
signal sig_LEoutF,sig_LEoutB,sig_LLRoutB,sig_LLRoutF : std_logic_vector(9 downto 0);
begin
-- the parity inputs are assigned based on the decoder
tempparityF <= tempparity1F when (sig_parity1enable = '1') else
tempparity2F;
tempparityB <= tempparity1B when sig_parity1enable = '1' else
tempparity2B;
-- Decoding process for backward direction
DECODEDATAB : process(sig_LLRoutB,sig_decode) -- decode the data when decode is high
begin
if sig_decode = '1' then






end process; -- End of decoding process
-- Decoding process for forward direction
DECODEDATAF : process(sig_LLRoutF,sig_decode)
begin
if sig_decode = '1' then







-- Load the registers based on the decoder and the frame position
LOADREGISTERA : process(sig_Ld_Areg,tempalpha)
begin
if (sig_countup = "0000000000" and sig_Ld_Areg = '1') then
inialpha <= Alpha_reg; -- inialpha is loaded from the Alpha register (initialization)
elsif (sig_countup > "0000000000" and sig_Ld_Areg = '1') then
inialpha <= tempalpha; -- inialpha is loaded from the FinalAlpha module
end if;
if sig_countup = "0000000000" then
if (sig_decoder = '1' and sig_Ld_Breg = '1') then
iniBeta <= beta_reg1; -- iniBeta is loaded for decoder 2 from the register beta_reg1
elsif (sig_decoder = '0' and sig_Ld_Breg = '1') then
iniBeta <= beta_reg0; -- iniBeta is loaded for decoder 1 from the register beta_reg0
end if;
elsif (sig_countup > "0000000000" and sig_Ld_Breg = '1') then
iniBeta <= tempbeta; -- iniBeta is loaded from the FinalBeta module
end if;
end process; -- end of the load process
-- Gamma values are either stored or read based on the frame position
SELECTGAMMAF : process(tempgammaF,tempgammaoutF) -- Forward direction
begin
if (sig_countup <= "0111111111") then
gammaFin <= tempgammaF; -- from the gammaF module, needs to be stored
else
gammaFin <= tempgammaoutF; -- from the memory module
end if;
end process;
SELECTGAMMAB : process(tempgammaB,tempgammaoutB) -- backward direction
begin
if (sig_countup <= "0111111111") then
gammaBin <= tempgammaB; -- from the gammaF module, to be stored
else
gammaBin <= tempgammaoutB; -- from the gamma memory
end if;
end process;
-- Instantiate the control module





































-- Instantiate the data memory module
-- Memory is read only during the decoding process












-- Instantiate the parity1 memory module
-- Memory is read only during the decoding process









q_a => tempparity1F,
q_b => tempparity1B
);
-- Instantiate the parity2 memory module
-- Memory is read only during the decoding process













-- Instantiate the gamma00 memory module
-- Memory is read/written during the decoding process












-- Instantiate the gamma01 memory module
-- Memory is read/written during the decoding process












-- Instantiate the gamma10 memory module
-- Memory is read/written during the decoding process













-- Instantiate the gamma11 memory module
-- Memory is read/written during the decoding process












-- Instantiate the LEmem memory module
-- Memory is read/written during the decoding process












-- Instantiate the alpha(0) memory









-- Instantiate the alpha(1) memory













































wren => sig_alphawr,
q => LLRAlpha(6)
);
alpha7_inst : alpha7 PORT MAP ( -- Instantiate the alpha(7) memory













































































































































































































-- Instantiate GammaF module for forward direction
CALCULATEGAMMAF : gammaF port map
(sig_gammaenable,tempdataF,tempparityF,tempLEF,tempgammaF);
-- Instantiate GammaF module for backward direction
CALCULATEGAMMAB : gammaF port map
(sig_gammaenable,tempdataB,tempparityB,tempLEB,tempgammaB);
-- Instantiate the FinalAlpha module
CALCULATEALPHA : FinalAlpha port map(sig_alphaenable,inialpha,gammaFin,tempalpha);
-- Instantiate the FinalBeta module
CALCULATEBETA : FinalBeta port map(sig_betaenable,iniBeta,gammaBin,tempbeta);
-- Instantiate LLRCore module for forward direction






-- Instantiate LLRCore module for backward direction






-- Instantiate LE module for forward direction







-- Instantiate LE module for backward direction







gammaFout <= gammaFin; -- for test purpose
gammaBout <= gammaBin; -- for test purpose
alphaout <= tempalpha; -- for test purpose
betaout <= tempbeta; -- for test purpose
testcountup <= sig_countup; -- for test purpose
testcountdown <= sig_countdown; -- for test purpose
testdecode <= sig_decode; -- for test purpose
testdecoder <= sig_decoder; -- for test purpose
dataoutF <= sig_dataoutF; -- Assign the decoded data to dataoutF
dataoutB <= sig_dataoutB; -- Assign the decoded data to dataoutB
testllrenable <= sig_LLRenable; -- for test purpose
end beha; -- End of the behavior of the complete decoder
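The register initial values in the decoder above are 10-bit words in what the comments call the (10,4) signed representation. Assuming that notation means a 10-bit two's complement word with 4 fractional bits (an assumption, but one consistent with the -32 limit and with the log(1/16) constant), the pattern "1111010100" follows from ln(1/16) ≈ -2.7726, scaled by 16 to about -44.4 and stored as -44. The simulation-only sketch below checks that arithmetic; the entity name fixed_point_check and the helper to_q10_4 are hypothetical and do not appear in the thesis.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical self-checking sketch; not part of the thesis design.
entity fixed_point_check is
end entity;

architecture sim of fixed_point_check is
  constant FRAC_BITS : natural := 4;  -- assumed number of fractional bits

  -- Scale a real value by 2**FRAC_BITS, round to the nearest integer,
  -- and return it as a 10-bit two's complement word.
  function to_q10_4 (x : real) return std_logic_vector is
  begin
    return std_logic_vector(to_signed(integer(x * 2.0 ** FRAC_BITS), 10));
  end function;
begin
  process
  begin
    -- ln(1/16) is about -2.7726; -2.7726 * 16 = -44.36, stored as -44.
    assert to_q10_4(-2.7726) = "1111010100"
      report "log(1/16) constant mismatch" severity failure;
    -- -32.0 * 16 = -512, the most negative 10-bit value.
    assert to_q10_4(-32.0) = "1000000000"
      report "-32 constant mismatch" severity failure;
    assert to_q10_4(0.0) = "0000000000"
      report "zero constant mismatch" severity failure;
    report "fixed-point constants check passed";
    wait;
  end process;
end architecture;
```

With 4 fractional bits the resolution is 1/16 = 0.0625, so -44/16 = -2.75 approximates ln(1/16) with an error of roughly 0.023.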




type Alpha is array(0 to 15) of std_logic_vector(9 downto 0);











architecture beha of testbenchGALBETA is
signal clk : std_logic := '0';
signal start : std_logic := '1';
signal test_countup : std_logic_vector(9 downto 0);
signal test_countdown : std_logic_vector(9 downto 0);
signal test_decode : std_logic;
signal test_decoder : std_logic;
signal test_dataoutF : std_logic;
signal test_dataoutB : std_logic;
signal test_llrenable : std_logic;
-- file for writing the decoded data in the forward direction
file out_fileF : text open WRITE_MODE is "/home/vlsi/thanga2/dataoutF_file";
-- file for writing the decoded data in the backward direction
file out_fileB : text open WRITE_MODE is "/home/vlsi/thanga2/dataoutB_file";
component TestGammaAlphaBeta is -- component under test
port (clk : in std_logic;
start : in std_logic;
testcountup : out std_logic_vector(9 downto 0); -- just for test purpose
testcountdown : out std_logic_vector(9 downto 0); -- just for test purpose
testdecode : out std_logic; -- just for test purpose
testdecoder : out std_logic; -- just for test purpose
testllrenable : out std_logic;
gammaFout : out Gamma;
gammaBout : out Gamma;
alphaout : out Alpha;
betaout : out Alpha;
DataoutB : out std_logic;




start <= '0' after 20 ns; -- Start signal is active high for 20 ns
process(clk) -- clock signal is generated, period is 10 ns
begin
clk <= not clk after 5 ns;
end process;
process(test_llrenable)
-- Variables of type line hold the decoded data
variable out_lineF,out_lineB : line; -- defined in the textio package
begin
if (test_decode = '1' and test_llrenable = ?') then
write(out_lineF,test_dataoutF); -- Format the dataoutF
writeline(out_fileF,out_lineF); -- write the data into the file
write(out_lineB,test_dataoutB); -- Format the dataoutB
writeline(out_fileB,out_lineB); -- write the data into the file
end if;
end process;
--Instantiate the unit under test







end beha; --End of the test bench
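The test bench above generates a 10 ns clock and logs one decoded bit per line through the textio package. A minimal, self-contained sketch of that logging pattern is shown below, using only std.textio and converting each std_logic value to a character by hand so that no extra text I/O package for std_logic is required. The entity name tb_logging_sketch, the signals bit_out and done, and the file name decoded_bits_sketch.txt are all hypothetical; the toggling bit_out merely stands in for a decoder output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use std.textio.all;

-- Hypothetical logging sketch; it does not instantiate the thesis decoder.
entity tb_logging_sketch is
end entity;

architecture sim of tb_logging_sketch is
  signal clk     : std_logic := '0';
  signal bit_out : std_logic := '0';   -- stand-in for a decoded data bit
  signal done    : boolean   := false; -- stops the clock so simulation ends
  file out_file : text open write_mode is "decoded_bits_sketch.txt";
begin
  -- 10 ns clock that halts once 'done' is set.
  clk_gen : process
  begin
    while not done loop
      clk <= '0';
      wait for 5 ns;
      clk <= '1';
      wait for 5 ns;
    end loop;
    wait;
  end process;

  -- On each rising edge, write the current bit as one line of text.
  log_bits : process (clk)
    variable l : line;
    variable n : natural := 0;
  begin
    if rising_edge(clk) then
      if bit_out = '1' then
        write(l, character'('1'));
      else
        write(l, character'('0'));
      end if;
      writeline(out_file, l);
      bit_out <= not bit_out;
      n := n + 1;
      if n = 8 then
        done <= true;
      end if;
    end if;
  end process;
end architecture;
```

Driving the clock from a process with wait statements, rather than from a self-retriggering assignment, makes it straightforward to stop the simulation once the last bit has been written.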
VITA AUCTORIS
Krishnamohan Thangarajah was born in Sri Lanka. He completed his B.A.Sc. Eng. (Hons.)
degree in Communication Engineering at the University of Windsor, Canada, in 2008.
At the time of writing this thesis, Krishnamohan is a candidate for the degree of
M.A.Sc. in Electrical and Computer Engineering at the University of Windsor
(Ontario, Canada).
His research interests include error correction coding and communications systems,
digital signal processing on FPGAs for such systems and hardware realizations, and
wireless communication systems.
