Digital VLSI Architectures for Advanced Channel Decoders by Biroli, ANDREA DARIO GIANCARLO
04 August 2020
POLITECNICO DI TORINO
Repository ISTITUZIONALE
Digital VLSI Architectures for Advanced Channel Decoders / Biroli, ANDREA DARIO GIANCARLO. - (2016).
DOI:10.6092/polito/porto/2653143
Availability:
This version is available at: 11583/2653143 since: 2016-10-16T17:32:44Z
Politecnico di Torino
POLITECNICO DI TORINO
DOCTORAL SCHOOL
PhD in Electronic Engineering – XXVIII cycle
Doctoral Thesis
Digital VLSI Architectures for
Advanced Channel Decoders
Ing. Andrea Dario Giancarlo Biroli
Supervisor
Prof. Guido Masera
27 May 2016
To my beloved parents
Contents
List of Figures
1 Summary
2 Introduction
2.1 Brief history of information and coding theory
2.2 Near-Capacity Channel Codes
2.2.1 Linear block codes
2.2.2 Coding
2.3 Iterative Decoding
2.3.1 Optimum decoding
2.4 Log-Likelihood ratio for AWGN channel and BPSK modulation
2.5 QAM modulation
2.5.1 Square QAM
3 Polar code theory
3.1 Preliminary definitions
3.1.1 Channel models and channel coding
3.2 Channel polarization effect
3.2.1 Channel combining
3.2.2 Channel splitting
3.2.3 Operations on recursive synthetic channels
3.2.4 Channel polarization
3.3 Encoding and decoding of Polar Codes
3.4 Belief propagation in polar codes
3.5 Belief propagation scheduling
3.6 Uniform Belief Propagation Decoder Structure
3.7 Graph Representation: Redundant Trellises
3.8 Min-Sum Approximation
3.9 Performance evaluation
4 Belief Propagation Polar Decoder software model
4.1 Introduction to C model
4.1.1 Setting file
4.1.2 Encoding
4.1.3 Channel simulation and soft information evaluation
4.1.4 Graph description and Decoding
4.2 Simulation Results
5 State of art for polar codes decoder implementations
6 Belief Propagation Decoder hardware implementation
6.1 Effects of scheduling algorithms on architectures
6.2 Hardware Implementation
6.2.1 Main Entity
6.2.2 Processing Element
6.3 Synthesis results
7 Conclusions
7.1 Achieved results
7.2 Future work
Bibliography
List of Figures
2.1 Performance of example codes: Hamming, Golay, Reed-Muller (RM), concatenated (CC), BCH, Reed-Solomon (RS), LDPC, turbo (TC) and polar (PC) codes. Practical implementations of codes are also reported for deep-space probes: Mariner (RM) 1969, Pioneer (CC) 1968, Voyager (RS) 1977, Galileo (Viterbi) 1989; and for cellular applications: GSM (Viterbi) 1987, IS-95 standard (RM) 1995. Figure from [1].
2.2 Scheme representing transmitter, transmission channel and receiver.
2.3 Scheme of AWGN channel.
2.4 BPSK signal constellation.
2.5 Used Mapping.
2.6 Probability density for the AWGN channel.
2.7 Examples of type I, II, and III QAM constellations.
2.8 The MASK constellation.
2.9 The QAM symbol probability. From [2].
2.10 The 16-QAM constellation with Gray coding.
3.1 Binary symmetric channel (BSC).
3.2 Binary Erasure Channel (BEC).
3.3 $W_2$ channel.
3.4 $W_4$ channel obtained by recursions of $W_2$ and $W$.
3.5 Generalized $W_N$ channel obtained by recursion of two $W_{N/2}$.
3.6 Channel variables relationship.
3.7 Binary tree for the recursive construction of synthetic channels.
3.8 Symmetric capacity $I(W_N^{(i)})$ for N identical BEC channels with erasure probability $\epsilon = 0.5$. a) N = 16 b) N = 64 c) N = 128 d) N = 1024.
3.9 Forney-style graph representation of a polar code of length eight.
3.10 Basic computational block for BP polar code decoding. Pointed out messages for the evaluation of $\hat{L}_1$ information.
3.11 Basic computational block for BP polar code decoding. Pointed out messages for the evaluation of $\hat{L}_2$ information.
3.12 Basic computational block for BP polar code decoding. Pointed out messages for the evaluation of $\hat{R}_2$ information.
3.13 Basic computational block for BP polar code decoding. Pointed out messages for the evaluation of $\hat{R}_1$ information.
3.14 Bidirectional scheduling algorithms: A) Linear LR. B) Linear RL. C) Circular LR. D) Circular RL.
3.15 Bidirectional scheduling algorithms: A) Linear LR, start R. B) Linear RL, start L. Unidirectional scheduling algorithms: G) Circular LR. H) Circular RL.
3.16 Scheduling algorithms: I) Circular double-wave. J) All-on. K) Odd-Even.
3.17 Scheduling algorithms comparison for N = 1024, rate 1/2.
3.18 Uniform graph representation with shuffle operators for $F^{\otimes 3}$.
3.19 Uniform graph representation with reverse-shuffle operators for $F^{\otimes 3}$.
3.20 Example of three redundant trellises.
3.21 Comparison result of FER performance of two redundant graph representations for 15 iterations.
3.22 Function $\Omega(x, y)$.
3.23 Min-Sum approximation of function $\Omega(x, y)$.
3.24 Approximation error of using Min-sum compared to function $\Omega(x, y)$.
3.25 Performance comparison of approximations for polar decoding. From [3].
3.26 Typical digital coded and uncoded communication system performance.
3.27 Typical regions in an error probability curve for iterative decoding algorithms: the solid line identifies the waterfall region and the error floor region. The trade-off between the two regions is illustrated by the second curve (dashed line), which has a lower error floor at the expense of a higher convergence threshold.
3.28 Cycles example on polar code factor graph for the case of N = 8, girth $g_{min} = 12$. From [4].
4.1 Block description of C model.
4.2 Structure of LLRs message allocation for C model.
4.3 Example of S(1)3.
4.4 Structure of LLRs message allocation for C model.
4.5 Performance for BP decoding with different schedules, BPSK modulation, floating point decoding, rate 1/2, codeword length N = 1024, 150 states. a) BER. b) FER.
4.6 FER performance for BP decoding with different schedules, BPSK modulation, 200 operations. a) finite precision decoding (9 bit), rate 1/2, codeword length N = 1024. b) floating point decoding, rate 1/2, codeword length N = 1024, 650 operations. c) floating point decoding, rate 1/2, variable codeword length (N = 512, 256). d) floating point decoding, variable rate (1/4, 3/4), codeword length N = 1024.
4.7 FER performance for BP decoding with different schedules, floating point decoding, rate 1/2, codeword length N = 1024. a) 200 operations, 16-QAM modulation. c) 200 operations, 64-QAM modulation. d) 200 operations, 256-QAM modulation.
6.1 Scheduling C message update example.
6.2 Fully parallel architecture for a code of length N = 4.
6.3 Reduced complexity architecture.
6.4 Bidirectional wave architecture.
6.5 Bidirectional wave register memory block detail.
6.6 Bidirectional wave architecture refined I scheduling.
6.7 Normalized min-sum block for two's complement data representation.
6.8 Normalized min-sum block for modulus and sign data representation.
6.9 Sum block for modulus and sign data representation.
Chapter 1
Summary
Every modern digital communication and storage system makes heavy use of error-correcting codes. Typical applications are wireless communications, sensor networks, optical communications, computer hard drives, flash memories and space probes. The demand for codes with better error-correcting capability in new and emerging applications, together with the design and implementation of such high-gain error-correcting codes, remains an open challenge. Mapping decoding algorithms directly to hardware often leads to very complex architectures, because the algorithms usually involve complex mathematical computations; a large body of research is therefore devoted to reducing complexity while enhancing decoding performance.
This work focuses on polar codes, a recent class of channel codes with the proven ability to make the decoding error probability arbitrarily small as the block length increases, provided that the code rate is less than the capacity of the channel. This property, together with the recursive construction of these codes, has attracted wide interest from the communications community.
Hardware architectures with reduced complexity can efficiently implement a polar code decoder using either the successive cancellation approximation or belief propagation algorithms. The latter offers higher throughput at high signal-to-noise ratio, thanks to the inherently parallel decision-making capability of this type of decoder.
Chapter 2 gives an overview of the technological improvements in channel coding, from the dawn of the field and its forefront applications to current solutions. The same chapter presents the general characteristics of error-correcting block codes and the properties of iterative decoding. The effects of additive white Gaussian noise are explored in relation to different modulations.
Chapter 3 analyzes polar code theory in depth. Channel polarization and the effects of channel combining and splitting are described. Encoding and decoding methods, approximations and performance evaluation are presented. A novel examination of scheduling algorithms for belief propagation is given. The resulting observations make it possible to shift the focus of the research from the simple maximum achievable throughput to the ratio of throughput to area (area efficiency). Properties of the polar code graph representation, such as uniformity and trellis redundancy, are evaluated.
The model used to simulate the different algorithms and hardware architectures is introduced in Chapter 4. Simulation results are discussed and general observations are drawn.
Chapter 5 presents the state of the art for belief propagation polar code decoders.
Chapter 6 describes the implemented architectures in relation to the best scheduling algorithms identified. Synthesis results for the processing elements and for the overall structures are presented. The research objectives in terms of absolute throughput and area efficiency are met.
The last chapter reports the conclusions and compares the achieved results with the best architectures in the literature. It also suggests future work aimed at further improving the obtained results.
Chapter 2
Introduction
2.1 Brief history of information and coding theory
The most fundamental parameter of a communication channel is its capacity C, a concept introduced in 1948 by Claude E. Shannon [5], who showed how to calculate the highest rate at which information can be transmitted reliably over the channel. This cut-off rate is considered to be the "practical coding limit" and is also known as the "Shannon limit".
For the next half century, the central objective of channel coding was to find practical coding schemes that could approach channel capacity on well-understood channels such as the additive white Gaussian noise (AWGN) channel. This goal proved to be challenging, but not impossible [6].
For the first couple of decades, algebraic coding dominated the field of channel coding. The principal objective of algebraic coding theory is to maximize the minimum Hamming distance of a given code, and thereby its error-correction power. To improve error detection and correction, binary linear block codes were invented: Hamming codes [7], Golay codes [8] and Reed-Muller codes [9] [10].
Even though algebraic codes were clearly limited to a few channel models, their development appeared in the early sixties to be an area of considerable potential payoff, in applied as well as theoretical directions. It began with the algebraic representation [11] of all linear error-correcting codes and culminated at the end of the decade with the important class of Bose-Chaudhuri-Hocquenghem (BCH) codes [12], [13], followed by Reed–Solomon codes [14], which are a class of non-binary BCH codes.
On the road to modern capacity-approaching codes, an essential step has been to replace hard-decision with soft-decision decoding (decoding that takes into account the reliability of the received channel outputs). The earliest soft-decision decoding algorithm in the literature is Wagner decoding, described in [15].
Figure 2.1: Performance of example codes: Hamming, Golay, Reed-Muller (RM), concatenated (CC), BCH, Reed-Solomon (RS), LDPC, turbo (TC) and polar (PC) codes. Practical implementations of codes are also reported for deep-space probes: Mariner (RM) 1969, Pioneer (CC) 1968, Voyager (RS) 1977, Galileo (Viterbi) 1989; and for cellular applications: GSM (Viterbi) 1987, IS-95 standard (RM) 1995. Figure from [1].
An alternative line of development, more directly inspired by Shannon's probabilistic approach to coding, is named "probabilistic coding". It is more concerned with finding classes of codes that optimize average performance as a function of coding and decoding complexity. The following coding schemes fall into this class.
Convolutional codes and product codes were invented by Peter Elias in 1954-1955 [16] [17]. In 1962 Gallager developed low-density parity-check (LDPC) codes [18], but his work was largely overlooked until it was revisited by David MacKay in the nineties.
Concatenated codes were introduced by Dave Forney in 1966 [19]. They are based on an inner and an outer code that can be relatively short and easy to encode and decode, while the resulting concatenated code is powerful and much longer.
To decode convolutional codes, Andy Viterbi proposed in 1967 an "asymptotically optimal" algorithm (known as the Viterbi algorithm), which performs maximum-likelihood decoding by searching for the closest sequence on a trellis with a finite number of states. The main developments in codes approaching the Shannon bound date from the nineties onward: turbo codes (1993) [20], the rediscovery of LDPC codes (1996) [21], [22] and polar codes (2008) [23].
Figure 2.1 shows a performance comparison among the cited codes, together with forefront applications of these codes. Notice that since the start in 1948, coding research has been oriented toward closing the gap between practically achievable performance (with a proper code structure, influenced by technological constraints) and the Shannon bound:
$$\frac{E_b}{N_0} > \frac{2^{r}-1}{r} > \ln 2 \simeq -1.6\ \mathrm{dB}$$
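The numerical value of the bound can be checked with a short sketch. It assumes that r is read as the spectral efficiency of the transmission (an interpretation, since the symbol is not defined at this point), and the function name `min_ebn0_db` is purely illustrative.

```python
import math

def min_ebn0_db(r: float) -> float:
    """Lower bound (2**r - 1) / r on Eb/N0, expressed in dB (r assumed > 0)."""
    return 10.0 * math.log10((2.0 ** r - 1.0) / r)

# Limit of the bound for r -> 0, i.e. ln 2 expressed in dB.
ultimate_limit_db = 10.0 * math.log10(math.log(2.0))

for r in (2.0, 1.0, 0.5, 0.01):
    print(f"r = {r:5.2f}  ->  Eb/N0 > {min_ebn0_db(r):6.2f} dB")
print(f"r -> 0     ->  Eb/N0 > {ultimate_limit_db:6.2f} dB")
```

The bound decreases monotonically as r shrinks and approaches ln 2, which is about −1.59 dB.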
2.2 Near-Capacity Channel Codes
2.2.1 Linear block codes
Figure 2.2 presents the classical scheme of a generic point-to-point digital transmission, even though part of current research, especially in multimedia applications, studies the possibility of combining source and channel coding in order to optimize transmission resources. Block codes are channel codes that make it possible to detect and correct errors introduced by the transmission channel. The fundamental concepts of linear block codes are briefly reported below.
Figure 2.2: Scheme representing transmitter, transmission channel and receiver.
Let $F = GF(2)$ be the Galois field of order 2 and $F^N$ a vector space of dimension N over GF(2). A block code $C \subseteq F^N$ is a mapping $C : F^K \to F^N$. It is linear if:
• $\forall \lambda \in F,\ u \in C$, also $\lambda u \in C$;
• $\forall u, v \in C$, also $u \oplus v \in C$.
C is characterized by the triple (N, K, d), where d is the minimum distance of the code, defined as in (2.1):
$$d \triangleq \min_{u \neq v} d_H(u, v) \tag{2.1}$$
where $d_H$ is the Hamming distance. The information bitstream is divided into vectors u of dimension 1×K, each mapped onto a codeword x of dimension 1×N through the linear transformation (2.2):
$$x = uG \tag{2.2}$$
The ratio R = K/N is called the code rate. A code is called systematic if x can be rewritten as x = [u|c]. The matrix G, of dimension K×N, is called the generator matrix, and it is in systematic form if it satisfies the following three properties:
• The leftmost non-zero value of every row is a 1;
• Every column containing such a leftmost 1 has all other entries equal to 0;
• If the leftmost non-zero value of row i is in column $t_i$, then $t_1 < t_2 < \ldots < t_r$.
A particular case occurs when the matrix is built as follows:
$$G = [I_K \mid P_{N-K}] \tag{2.3}$$
where $I_K$ is the K×K identity matrix and $P_{N-K}$ has dimension K×(N−K).
In this case the encoding for the code C has the desirable characteristic that the information symbols appear explicitly in the codeword x; more generally, the symbol $u_i$ appears in position $t_i$ of the codeword x = uG if the leftmost non-zero value of the i-th row of G is in column $t_i$.
Every code (linear or not) that admits an encoding rule for which the information symbols appear explicitly in the codeword is called systematic. It can be proven that every linear code can be made systematic.
In a digital transmission system, the sent codeword x can be corrupted by the noisy channel, and the received word y can be expressed as in (2.4):
$$y = x + e \tag{2.4}$$
where e is the error vector that represents the channel effects. Knowing the received signal y, the decoder tries to compute an estimate x̂ of the transmitted codeword x, and hence of the information word u. The estimated codeword x̂ is chosen as the word inside the codeword set C that is closest to the received one, according to a distance metric (either Hamming or Euclidean distance). To check a received word, the decoder computes the (N − K) parity bits and compares them with the parity bits carried by the received word: if they are equal, the received word is considered correct. An error is therefore detected if at least one parity check fails.
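As a minimal sketch of systematic encoding with G = [I_K | P], of the minimum distance (2.1) and of the parity comparison described above, the following example uses a (7,4) Hamming code. The parity part `P` and all function names are illustrative choices, not taken from the developed software.

```python
from itertools import product

K, N = 4, 7
# Parity part P of a (7,4) Hamming code (illustrative choice).
P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
# Systematic generator matrix G = [I_K | P], of dimension K x N.
G = [[int(i == j) for j in range(K)] + P[i] for i in range(K)]

def encode(u):
    """Compute x = uG over GF(2)."""
    return [sum(u[i] * G[i][j] for i in range(K)) % 2 for j in range(N)]

def parity_checks_ok(y):
    """Recompute the N-K parity bits from the first K bits of y and compare."""
    return encode(y[:K])[K:] == y[K:]

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

# Minimum distance d: smallest Hamming distance between distinct codewords.
codewords = [encode(list(u)) for u in product((0, 1), repeat=K)]
d_min = min(
    hamming_distance(a, b)
    for i, a in enumerate(codewords)
    for b in codewords[i + 1:]
)

x = encode([1, 0, 1, 1])   # the message bits appear as the first K bits of x
y = x.copy()
y[2] ^= 1                  # single-bit channel error
print(x, d_min, parity_checks_ok(x), parity_checks_ok(y))
```

With this choice of P the code has d = 3, so any single-bit error makes at least one parity check fail.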
2.2.2 Coding
For simplicity of exposition, the following paragraph assumes that the message bits u are placed at the end of the codeword. The codeword is thus divided into M check bits c followed by K message bits u:
$$x = [c \mid u] \tag{2.5}$$
Under this assumption, the parity-check matrix H can be divided into an M×M matrix A, which occupies the first M columns of H, and an M×K matrix B, which occupies the remaining columns of H:
$$H = [A \mid B] \tag{2.6}$$
The requirement that a codeword satisfies all parity checks (Hx = 0) can then be written as:
$$Ac + Bu = 0 \tag{2.7}$$
Assuming A is non-singular, it follows that:
$$c = A^{-1}Bu \tag{2.8}$$
A can be singular for some choices of which codeword bits are message bits; however, a choice for which A is non-singular always exists if the rows of H are linearly independent. It is also possible that the rows of H are not linearly independent (i.e. some rows are redundant), but this particular case is not very interesting.
Equation (2.8) defines what the check bits should be; the way those bits are actually computed, however, can be implemented in different ways.
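One possible way to evaluate (2.8) is to solve Ac = Bu directly by Gaussian elimination over GF(2), where subtraction coincides with addition, instead of inverting A explicitly. The sketch below assumes small illustrative matrices A and B; the helper names `solve_gf2` and `mat_vec_gf2` are hypothetical.

```python
def mat_vec_gf2(matrix, vector):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, vector)) % 2 for row in matrix]

def solve_gf2(A, b):
    """Solve A c = b over GF(2) by Gauss-Jordan elimination (A must be non-singular)."""
    m = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(m):
        pivot = next((r for r in range(col, m) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("A is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col and aug[r][col]:
                # Row addition over GF(2) is a bitwise XOR.
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]

# Illustrative split H = [A | B] with M = 3 check bits and K = 4 message bits.
A = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
B = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1]]
u = [1, 0, 1, 1]

c = solve_gf2(A, mat_vec_gf2(B, u))   # Ac = Bu, equivalent to (2.7) over GF(2)
x = c + u                             # codeword x = [c | u]
H = [a_row + b_row for a_row, b_row in zip(A, B)]
syndrome = mat_vec_gf2(H, x)          # all zeros for a valid codeword
print(c, x, syndrome)
```

Solving the linear system avoids storing the inverse of A, which is in general dense even when H is sparse.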
The developed software does not use a sparse matrix representation, because of the high number of ones in the polar code generator matrix, equal to $3^{\log_2 N}$, where N is the codeword length. The explicit representation of the generator matrix G in systematic form is given in (2.9):
$$G = [A^{-1}B \mid I_K] \tag{2.9}$$
This form, expressed for a generic generator matrix, does not take into account the specific recursive structure of polar codes, which is discussed in detail in Section 3.3.
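The number of ones quoted above can be verified with a short sketch that builds the n-fold Kronecker power of the standard 2×2 polar kernel F = [[1, 0], [1, 1]], with N = 2^n. The function name is illustrative, and any bit-reversal permutation is ignored because it does not change the count of ones.

```python
def kronecker_power(n):
    """n-fold Kronecker power of F = [[1, 0], [1, 1]], returned as a list of rows."""
    F = [[1, 0], [1, 1]]
    G = [[1]]
    for _ in range(n):
        # Kronecker product G (x) F: every entry of G is replaced by a scaled copy of F.
        G = [[a * b for a in row_g for b in row_f] for row_g in G for row_f in F]
    return G

for n in (1, 3, 5):
    G = kronecker_power(n)
    N = len(G)
    ones = sum(map(sum, G))
    print(f"N = {N:3d}: {ones} ones = 3^{n}, density {ones / N**2:.3f}")
```

For N = 8 the matrix holds 27 ones out of 64 entries, so its density, (3/4)^n, falls only slowly with the code length.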
2.3 Iterative Decoding
Current high-performance decoders work with data represented as so-called "soft information".
Block codes can be decoded through two different strategies, which depend on the format of the input data at the decoder. When the decoder inputs are bits (0 or 1), decoding is called "hard"; when the inputs are real values (e.g. 0.784), decoding is called "soft".
Recalling Figure 2.2, the channel decoder inputs are the digital demodulator outputs. If this component, which receives the channel signal, represents it as real values, the following decoder is "soft". If instead the digital demodulator also takes a decision on the transmitted data, choosing whether the sent bit was a 0 or a 1, the decoder is "hard". Moreover, since soft decoding involves real values, the distance metric used is the Euclidean distance, while hard decoding uses the Hamming distance.
From the point of view of decoding performance, it is well known in information theory that soft decoding outperforms hard decoding, because it is able to correct a larger number of errors. Decoding is a difficult process, since it searches the space of all possible messages for the word that minimizes the probability of error. Hard decoding restricts this search to the subspace of binary words, which is a limited set. Soft decoding instead "extends" the search by considering all real values, so the search space is considerably larger. However, the probability of finding the transmitted codeword is also increased, and with the same error-correcting code soft decoding achieves better gains than hard decoding. For these reasons, high-performance channel decoders use the soft decoding strategy, even if the decoder architectures are more complex.
Polar code decoders, specifically, do not work on the raw soft data received from the channel, but on a "reliability" measure obtained from the data. These metrics represent how close the received signal is to bit 0 or bit 1 of the transmitted signal; they are called likelihood ratios ($\lambda_d$) and are defined as in (2.10):
$$\lambda_d = \frac{P(y_d \mid x_d = 0)}{P(y_d \mid x_d = 1)} = \frac{1 - P_d}{P_d} \tag{2.10}$$
where $y_d$ is the d-th received bit, $x_d$ is the d-th transmitted bit and $P_d$ indicates the probability of receiving a bit 1 in position d given that a bit 1 has been transmitted in position d. Note that evaluating the conditional probability on the input signal requires knowledge of the transmitted information, that is, whether a 1 or a 0 was sent by the transmitter.
The log-likelihood ratio (LLR) is also defined, as in (2.11):
$$\Lambda_d \triangleq \ln(\lambda_d) = \ln\frac{P(y_d \mid x_d = 0)}{P(y_d \mid x_d = 1)} = \ln\frac{1 - P_d}{P_d} \tag{2.11}$$
The purpose of the decoder is to determine the correct transmitted value, starting from the knowledge of the received information, by iteratively updating the likelihood-ratio metrics, which means determining the a posteriori probability of the received data. Since these metrics are probabilities, the iterative update consists of multiplications among them, and the resulting probability is interpreted as information on how close the received message is to a codeword. The choice of using logarithms is therefore motivated by the ease of implementing an algorithm that uses sums instead of multiplications: multipliers are more expensive in terms of hardware and less stable from a numerical representation point of view than adders. Moreover, the introduction of the logarithmic domain does not modify the overall behaviour of the decoder.
For the AWGN channel and BPSK modulation, the formula for evaluating the LLRs simplifies as presented in Section 2.4, while for QAM modulation a detailed overview is given in Section 2.5.
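The equivalence between multiplying likelihood ratios and adding LLRs can be illustrated with a minimal sketch. The probability values are hypothetical, and combining them for a single bit assumes independent observations of that bit.

```python
import math

def likelihood_ratio(p_d: float) -> float:
    """lambda_d = (1 - P_d) / P_d, following (2.10)."""
    return (1.0 - p_d) / p_d

def llr(p_d: float) -> float:
    """Lambda_d = ln(lambda_d), following (2.11)."""
    return math.log(likelihood_ratio(p_d))

# Hypothetical values of P_d from three independent observations of one bit.
p_values = [0.1, 0.3, 0.45]

product_of_ratios = math.prod(likelihood_ratio(p) for p in p_values)
sum_of_llrs = sum(llr(p) for p in p_values)

# A positive LLR favours bit 0, a negative LLR favours bit 1.
hard_decision = 0 if sum_of_llrs > 0 else 1
print(math.log(product_of_ratios), sum_of_llrs, hard_decision)
```

The sign of the accumulated LLR gives the hard decision and its magnitude the reliability, which is why an adder-based datapath carries the same information as a multiplier-based one.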
2.3.1 Optimum decoding
The purpose of a decoder is to find the codeword ĉ with the highest probability of having been sent over the channel, given the channel output y and the knowledge of the code used:
$$\hat{c} = \arg\max_{c' \in C} P(c = c' \mid y) \tag{2.12}$$
This kind of decoding is called word maximum a posteriori (W-MAP). The knowledge of the code appears in the conditional probability. Using Bayes' theorem, the a posteriori probability $P(c = c' \mid y)$ can be expressed as:
$$P(c \mid y) = \frac{P(y \mid c)P(c)}{P(y)} = \frac{P(y \mid c)P(c)}{\sum_{c \in C} P(y \mid c)P(c)} \tag{2.13}$$
If the a priori probability P(c) is the same for every c (equiprobable source), then, since P(y) does not depend on c, (2.12) can be expressed as:
$$\hat{c} = \arg\max_{c' \in C} P(y \mid c = c') \tag{2.14}$$
This criterion is known as word maximum likelihood (W-ML). $P(y \mid c)$ is the likelihood function when it is regarded as a function of c for a particular y, whereas for a given c it represents the probability density function of y.
If the channel is memoryless, the following also holds:
$$P(y \mid c = c') = \prod_{j=0}^{N-1} P(y_j \mid c_j = c'_j) \tag{2.15}$$
So the W-ML criterion becomes:
$$\hat{c}_j = \arg\max_{c' \in \{0,1\}} P(y_j \mid c_j = c') \tag{2.16}$$
The only way to obtain an optimal W-MAP decision is to test every candidate codeword, which means $2^K$ combinations if a binary source is considered. W-MAP and W-ML decoders are equivalent optimum decoders if the source is equiprobable.
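The exhaustive search over all 2^K codewords can be sketched as follows. The example assumes a (7,4) Hamming code and a memoryless binary symmetric channel with crossover probability `p`; both are illustrative choices, as are the function names.

```python
import math
from itertools import product

K, N = 4, 7
# Parity part of an illustrative (7,4) Hamming code in systematic form.
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]

def encode(u):
    """Systematic encoding: message bits followed by N-K parity bits."""
    parity = [sum(u[i] * P[i][j] for i in range(K)) % 2 for j in range(N - K)]
    return list(u) + parity

def log_likelihood(y, c, p):
    """ln P(y|c) for a memoryless BSC, factorised bit by bit as in (2.15)."""
    return sum(math.log(1.0 - p if yj == cj else p) for yj, cj in zip(y, c))

def ml_decode(y, p=0.1):
    """W-ML decoding (2.14): test all 2**K codewords and keep the most likely one."""
    candidates = (encode(u) for u in product((0, 1), repeat=K))
    return max(candidates, key=lambda c: log_likelihood(y, c, p))

sent = encode([1, 0, 1, 1])
received = sent.copy()
received[5] ^= 1                     # one bit flipped by the channel
decoded = ml_decode(received)
print(sent, received, decoded)
```

The search visits 16 codewords here, but its cost doubles with every additional message bit, which is why practical decoders for long codes rely on iterative algorithms instead.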
2.4 Log-Likelihood ratio for AWGN channel and BPSK modulation
Figure 2.3: Scheme of AWGN channel.
Suppose the transmission medium is represented as an AWGN (additive white Gaussian noise) channel with noise distributed as $\mathcal{N}(0, \sigma^2 = N_0/2)$. The received signal y(t) can be written as y(t) = x(t) + n(t), where x(t) is the transmitted signal, drawn from a constellation of only two symbols as in binary phase shift keying (BPSK) modulation, and n(t) represents the noise component (Figure 2.3).
Phase shift keying (PSK) is widely used in the communication industry, and BPSK is its simplest implementation, with just two signals with different phases.
Typically these two phases are 0 and π, and the signals are:
$$\begin{aligned} s_1(t) &= A\cos(2\pi f_c t), \quad 0 \le t \le T, \quad \text{for } 1 \\ s_2(t) &= -A\cos(2\pi f_c t), \quad 0 \le t \le T, \quad \text{for } 0 \end{aligned} \tag{2.17}$$
These signals are called antipodal and they can also be represented graphically by a signal constellation (Figure 2.4) in the coordinate system with
$$\Phi_1(t) = \sqrt{\frac{2}{T}}\cos(2\pi f_c t), \quad 0 \le t \le T \tag{2.18}$$
The energy of a transmitted signal is therefore:
$$E = \frac{A^2 T}{2} \tag{2.19}$$
Figure 2.4: BPSK signal constellation.
Figure 2.5: Used Mapping.
Figure 2.6: Probability density for the AWGN channel.
Recall that for the AWGN channel the probability density function is defined by (2.20):
$$f(r_i \mid b_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(r_i - (1 - 2b_i))^2}{2\sigma^2}} \tag{2.20}$$
where $b_i$ is the transmitted bit, mapped as $b_i = 0$ for symbol $x = s_2$ and $b_i = 1$ for symbol $x = s_1$, with energy normalized to unity (as in Figure 2.5), and $r_i$ is the received symbol. For the Gaussian channel with noise variance $\sigma^2$, the signal-to-noise ratio is given by $\frac{E_S}{N_0} = \frac{1}{2\sigma^2}$. With the LLR defined as in (2.11), the following can be written.
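$$\Lambda_d = \ln\frac{P(y_d \mid x_d = s_2)}{P(y_d \mid x_d = s_1)} = \ln\frac{f(r_d \mid b_d = 0)}{f(r_d \mid b_d = 1)} = \ln\frac{e^{-\frac{(r_d - 1)^2}{2\sigma^2}}}{e^{-\frac{(r_d + 1)^2}{2\sigma^2}}} = \ln e^{-\frac{(-2r_d - 2r_d)}{2\sigma^2}} = \frac{2 r_d}{\sigma^2} = 4\,\frac{E_S}{N_0}\, r_d \tag{2.21}$$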
The final result of (2.21) is used in the software implementation to compute the channel LLRs when simulating the AWGN transmission channel with BPSK mapping of the sent data.
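The closed form of (2.21) can be cross-checked against the direct ratio of the two Gaussian densities of (2.20). The sketch below is written in Python for compactness and is not the C model described later; the operating point `es_n0_db` is a hypothetical value.

```python
import math

def gaussian_pdf(r: float, b: int, sigma2: float) -> float:
    """f(r|b) of (2.20): Gaussian with mean 1 - 2b and variance sigma2."""
    return math.exp(-(r - (1 - 2 * b)) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def llr_direct(r: float, sigma2: float) -> float:
    """LLR computed from its definition as a log ratio of densities."""
    return math.log(gaussian_pdf(r, 0, sigma2) / gaussian_pdf(r, 1, sigma2))

def llr_closed_form(r: float, es_n0: float) -> float:
    """Final result of (2.21): 4 * (Es/N0) * r."""
    return 4.0 * es_n0 * r

es_n0_db = 3.0                       # hypothetical operating point
es_n0 = 10.0 ** (es_n0_db / 10.0)    # linear Es/N0
sigma2 = 1.0 / (2.0 * es_n0)         # from Es/N0 = 1 / (2 sigma^2)

for r in (-1.3, 0.2, 0.9):
    print(r, llr_direct(r, sigma2), llr_closed_form(r, es_n0))
```

Since the LLR is linear in the received sample, the channel stage only needs one multiplication per symbol by a constant that depends on the signal-to-noise ratio.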
2.5 QAM modulation
Quadrature amplitude modulation (QAM) is a class of non-constant envelope schemes that can achieve higher bandwidth efficiency than M-ary phase shift keying (MPSK) with the same average signal power. The first QAM scheme was proposed by C. R. Cahn in 1960 [24]. He extended phase modulation to a multi-amplitude phase modulation, meaning that more than one amplitude is associated with every allowed phase. Many different types of constellation have been presented in the literature over the years. Adopting the description of QAM constellations in [25], the three most significant types are introduced below.
A type I constellation has a fixed number of signal points (or phasors) equally spaced on each of N circles, where N is the number of amplitude levels (Figure 2.7(a)). This type of constellation suffers from the problem that the points on the inner ring are closest together in distance and so are the most vulnerable to errors. To mitigate this problem, the type II constellation was proposed by Hancock and Lucky a few months later [26] (Figure 2.7(b)). The signal points of a type II constellation are still on circles, but the number of points on the inner circle is lower than the number of points on the outer circle. This makes the distance between two adjacent points on the inner circle approximately equal to that on the outer circle.
Figure 2.7: Examples of type I, II, and III QAM constellations.
In 1962 Campopiano and Glazer [27] proposed the square QAM constellation shown in Figure 2.7(c). This type III constellation does not offer a big improvement in performance over the type II system, but its implementation is considerably simpler than that of types I and II. It can be easily generated by two M-ary amplitude shift keying (MASK) signals sent on two phase-quadrature carriers and demodulated to yield the two quadrature components. A few other constellations offer slightly better error performance on AWGN channels, but with a more complicated system implementation. For this reason, the type III constellation has been the most widely used. QAM is used, for instance, in modems designed for telephone channels: QAM schemes ranging from uncoded 16QAM to trellis-coded 128QAM are used in the CCITT telephone circuit modem standards V.29 to V.33. QAM is also the subject of very active research for satellite systems, point-to-point wireless systems and mobile cellular telephone systems [25].
2.5.1 Square QAM
Considering both amplitude and phase modulation in one scheme, a general QAM signal can be written as:
$$s_i(t) = A_i\cos(2\pi f_c t + \theta_i), \quad i = 1, 2, \ldots, M \tag{2.22}$$
where $A_i$ is the amplitude and $\theta_i$ is the phase of the i-th signal in the M-ary signal set. The pulse shaping function p(t), which multiplies $s_i(t)$, has been neglected, because the improvement of the spectrum and the intersymbol interference (ISI) are of no particular interest for the QAM description used for simulation; it should however be kept in mind for a practical implementation. In fact, even if pulse shaping is not desired, it inevitably occurs because of the limited bandwidth of the system considered. Pulse shaping is usually achieved through filtering, so that $P(f) = H_T(f)H_C(f)H_R(f)$ (or equivalently $p(t) = h_T(t) * h_C(t) * h_R(t)$), where $H_T(f)$, $H_C(f)$ and $H_R(f)$ are the spectral responses of the transmitter filter, the channel and the receiver filter. A common choice of P(f) is the raised cosine (or its approximated, delayed version, due to causality issues), whose time-domain function p(t) is zero at the sampling instants except at t = 0, so that p(t) incurs no ISI.
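Equation 2.22 can be written as
$$
\begin{aligned}
s_i(t) &= I_i \sqrt{\frac{E_0}{T}} \cos(2\pi f_c t) + Q_i \sqrt{\frac{E_0}{T}} \sin(2\pi f_c t) \\
&= I_i \sqrt{\frac{E_0}{2}}\, \Phi_1(t) + Q_i \sqrt{\frac{E_0}{2}}\, \Phi_2(t)
\end{aligned}
\tag{2.23}
$$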
where $(I_i, Q_i)$ is a pair of independent integers that determines the location of the signal point in the constellation. $E_0$ is the energy of the signal with the lowest amplitude, or equivalently the average energy of the signal when $(I_i, Q_i)$ are normalized. The two orthonormal functions in 2.23 are:
$$
\begin{aligned}
\Phi_1(t) &= \sqrt{\frac{2}{T}} \cos(2\pi f_c t), \quad 0 \le t \le T \\
\Phi_2(t) &= \sqrt{\frac{2}{T}} \sin(2\pi f_c t), \quad 0 \le t \le T
\end{aligned}
\tag{2.24}
$$
The pair $(I_i, Q_i)$ is an element of an $L \times L$ matrix containing all possible combinations. For instance, the 16QAM matrix has $L = 4$ and is represented by:
$$
[I_i, Q_i] =
\begin{bmatrix}
(-3, 3) & (-1, 3) & (1, 3) & (3, 3) \\
(-3, 1) & (-1, 1) & (1, 1) & (3, 1) \\
(-3, -1) & (-1, -1) & (1, -1) & (3, -1) \\
(-3, -3) & (-1, -3) & (1, -3) & (3, -3)
\end{bmatrix}
\tag{2.25}
$$
The generalized M-QAM matrix, where $M = 4^n$, $n = 1, 2, 3, \ldots$, and $L = \sqrt{M}$, can be written as:
$$
[I_i, Q_i] =
\begin{bmatrix}
(-(L-1), (L-1)) & (-(L-3), (L-1)) & \cdots & ((L-1), (L-1)) \\
(-(L-1), (L-3)) & (-(L-3), (L-3)) & \cdots & ((L-1), (L-3)) \\
\vdots & \vdots & & \vdots \\
(-(L-1), -(L-1)) & (-(L-3), -(L-1)) & \cdots & ((L-1), -(L-1))
\end{bmatrix}
\tag{2.26}
$$
The constellation can be conveniently expressed in terms of $(I_i, Q_i)$, but the phasor for the square QAM is
$$s_i = \left( I_i \sqrt{\frac{E_0}{2}},\; Q_i \sqrt{\frac{E_0}{2}} \right) \tag{2.27}$$
Consequently, the average energy is given by:
$$E_{avg} = E\left\{ \frac{E_0}{2} \left( I_i^2 + Q_i^2 \right) \right\} = \frac{E_0}{2} \left[ E\{I_i^2\} + E\{Q_i^2\} \right] = E_0\, E\{I_i^2\} \tag{2.28}$$
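where
$$
\begin{aligned}
E\{I_i^2\} &= \frac{1}{L} \left[ (-(L-1))^2 + (-(L-3))^2 + \cdots + (L-3)^2 + (L-1)^2 \right] \\
&= \frac{2}{L} \left[ 1^2 + 3^2 + \cdots + (L-1)^2 \right] = \frac{2}{L} \left[ \sum_{i=1}^{L/2} (2i-1)^2 \right] \\
&= \frac{1}{3}(L^2 - 1) = \frac{1}{3}(M - 1)
\end{aligned}
\tag{2.29}
$$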
Thus the average power is
$$P_{avg} = \frac{E_0}{3T}(M - 1) = \frac{P_0}{3}(M - 1) \tag{2.30}$$
where $P_0$ is the power associated with the smallest signal. The average transmitted power required to achieve a given minimum distance is only slightly greater than the average power required for the best M-ary QAM signal constellation. For these reasons, rectangular M-ary QAM signals are most frequently used in practice.
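The closed form in 2.28 to 2.30 can be checked numerically. The following is a minimal sketch, not part of the original derivation: it enumerates the $(I_i, Q_i)$ grid of 2.26 and averages the symbol energies. The function names `square_qam_points` and `average_energy` are illustrative, and $E_0$ is normalized to 1 by assumption.

```python
import itertools


def square_qam_points(M):
    """Return the (I, Q) integer pairs of a square M-QAM constellation (eq. 2.26)."""
    L = int(round(M ** 0.5))
    if L * L != M or L % 2:
        raise ValueError("M must be of the form 4**n, e.g. 4, 16, 64")
    levels = range(-(L - 1), L, 2)  # -(L-1), -(L-3), ..., (L-3), (L-1)
    return list(itertools.product(levels, repeat=2))


def average_energy(M, E0=1.0):
    """Average symbol energy: mean of (E0/2) * (I^2 + Q^2) over all points (eq. 2.28)."""
    points = square_qam_points(M)
    return sum(E0 / 2 * (i * i + q * q) for i, q in points) / len(points)


for M in (4, 16, 64, 256):
    # The two printed values should agree: enumeration vs. E0 * (M - 1) / 3
    print(M, average_energy(M), (M - 1) / 3)
```

Dividing the result by the symbol period T gives the average power of 2.30.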
Error probability
Square QAM signal constellations have the distinct advantage of being easily generated as two MASK signals impressed on the in-phase and quadrature carriers, each having $L = \sqrt{M}$ signal points. An error occurs if the additive Gaussian noise component $n_1$ or $n_2$ along the orthonormal basis functions $\Phi_1$, $\Phi_2$ is large enough to cause an error in one of the two MASK signals. A QAM symbol is detected correctly only when both MASK symbols are detected correctly. Thus the probability of correct detection of a QAM symbol is the product of the correct-decision probabilities of the constituent MASK systems.
Figure 2.8: The MASK constellation.
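$$P_{c,M\text{-QAM}} = P_{c,\sqrt{M}\text{-ASK}}^2 = \left(1 - P_{e,\sqrt{M}\text{-ASK}}\right)^2 \tag{2.31}$$
Consequently, the error probability is
$$
\begin{aligned}
P_{e,M\text{-QAM}} &= 1 - \left(1 - P_{e,\sqrt{M}\text{-ASK}}\right)^2 \\
&= 2 P_{e,\sqrt{M}\text{-ASK}} - P_{e,\sqrt{M}\text{-ASK}}^2
\end{aligned}
\tag{2.32}
$$
where $P_{e,\sqrt{M}\text{-ASK}}$ is the symbol error probability of a single $\sqrt{M}$-ary ASK signal with one-half the average power of the QAM signal.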
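Recall that the symbol error probability of a MASK signal is given by
$$
\begin{aligned}
P_{e,M\text{-ASK}} &= \frac{1}{M} \sum_{m=1}^{M} P(\mathrm{err}|m) \\
&= \frac{1}{M} \left[ 2(M-2)\, Q\!\left( \frac{d_{min}}{\sqrt{2 N_0}} \right) + 2\, Q\!\left( \frac{d_{min}}{\sqrt{2 N_0}} \right) \right] \\
&= \frac{2(M-1)}{M}\, Q\!\left( \frac{d_{min}}{\sqrt{2 N_0}} \right) \\
&= 2 \left( 1 - \frac{1}{M} \right) Q\!\left( \sqrt{ \frac{6 \log_2 M}{M^2 - 1} \frac{E_{b,avg}}{N_0} } \right)
\end{aligned}
\tag{2.33}
$$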
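where $d_{min}$ is the distance between two adjacent signal points of the constellation (as in Figure 2.9) and $Q(x) = \frac{1}{2} \operatorname{erfc}\!\left( \frac{x}{\sqrt{2}} \right)$ [2]. It follows that 2.32 becomes:
$$
\begin{aligned}
P_{e,\sqrt{M}\text{-ASK}} &= 2 \left( 1 - \frac{1}{\sqrt{M}} \right) Q\!\left( \sqrt{ \frac{3 \log_2 M}{M - 1} \frac{E_{b,avg}}{N_0} } \right) \\
P_{e,M\text{-QAM}} &= 4 \left( 1 - \frac{1}{\sqrt{M}} \right) Q\!\left( \sqrt{ \frac{3 \log_2 M}{M - 1} \frac{E_{b,avg}}{N_0} } \right) - \left( 2 \left( 1 - \frac{1}{\sqrt{M}} \right) Q\!\left( \sqrt{ \frac{3 \log_2 M}{M - 1} \frac{E_{b,avg}}{N_0} } \right) \right)^2
\end{aligned}
\tag{2.34}
$$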
This is an exact expression for the symbol error probability of an M-ary QAM signal.
Figure 2.9: The QAM symbol probability. From [2].
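As an illustration of 2.32 to 2.34, the following minimal sketch evaluates the symbol error probability of square M-QAM from the error probability of its $\sqrt{M}$-ASK components. The function names are illustrative choices and $E_{b,avg}/N_0$ is assumed to be given as a linear ratio, not in dB.

```python
import math


def q_function(x):
    """Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def ask_component_ser(M, ebn0):
    """Symbol error probability of one sqrt(M)-ASK component (first line of 2.34)."""
    arg = math.sqrt(3 * math.log2(M) / (M - 1) * ebn0)
    return 2 * (1 - 1 / math.sqrt(M)) * q_function(arg)


def qam_ser(M, ebn0):
    """Symbol error probability of square M-QAM, 2*P - P^2 (eq. 2.32)."""
    p = ask_component_ser(M, ebn0)
    return 2 * p - p * p


for ebn0_db in (0, 5, 10, 15):
    ebn0 = 10 ** (ebn0_db / 10)  # convert dB to a linear ratio
    print(ebn0_db, qam_ser(16, ebn0))
```

For M = 4 the component error probability reduces to $Q(\sqrt{2 E_b/N_0})$, the familiar result for binary antipodal signalling on each quadrature carrier, which is a convenient sanity check.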
Gray coding
To minimize the bit error probability when n-tuples of bits are mapped onto QAM points, Gray coding is typically used. A Gray code is constructed by imposing that adjacent symbols differ in only one bit. Figure 2.10 gives an example of Gray coding for the 16-QAM constellation.
Figure 2.10: The 16-QAM constellation with Gray coding.
Since square QAM can be perfectly Gray coded, the bit error probability can be obtained from the symbol error probability as
$$P_b \cong \frac{P_S}{\log_2 M} \tag{2.35}$$
assuming that symbol errors occur between adjacent symbols, which differ in only one bit.
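The Gray property can be verified programmatically. The sketch below is only an illustration: it assumes that the I and Q axes are Gray coded independently with the binary-reflected Gray code and that the two halves are concatenated, which may differ from the exact labeling drawn in Figure 2.10 while still satisfying the same adjacency property.

```python
def gray(k):
    """Binary-reflected Gray code of the non-negative integer k."""
    return k ^ (k >> 1)


def gray_qam_labels(M):
    """Map each (I, Q) point of square M-QAM to an integer bit label.

    Assumption: each axis carries log2(sqrt(M)) Gray-coded bits; the I bits
    form the most significant half of the label.
    """
    L = int(round(M ** 0.5))
    bits_per_axis = L.bit_length() - 1
    labels = {}
    for a in range(L):
        for b in range(L):
            point = (2 * a - (L - 1), 2 * b - (L - 1))
            labels[point] = (gray(a) << bits_per_axis) | gray(b)
    return labels


def is_gray_mapped(labels):
    """True if every pair of nearest neighbours differs in exactly one bit."""
    for (i, q), label in labels.items():
        for di, dq in ((2, 0), (0, 2)):
            neighbour = (i + di, q + dq)
            if neighbour in labels:
                if bin(label ^ labels[neighbour]).count("1") != 1:
                    return False
    return True


labels_16 = gray_qam_labels(16)
print(is_gray_mapped(labels_16))
```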
Chapter 3
Polar code theory
Polar coding was discovered while studying a technique to improve the cutoff rate of sequential decoding in a concatenated coding scheme. Starting from a vector channel and splitting it into multiple correlated subchannels, it was possible to use a different sequential decoder on each subchannel. Polar coding was originally designed as a simple recursive operation to be used as a low-complexity inner code implementing this behaviour. It was then noticed that the performance of polar coding was so good that no outer convolutional code was needed to raise the cutoff rate to channel capacity [28].
3.1 Preliminary definitions
3.1.1 Channel models and channel coding
In order to specify a mathematical model for a channel, we shall specify:
1. the set of possible inputs to the channel,
2. the set of possible outputs,
3. for each input, a probability measure on the set of outputs.
Discrete memoryless channels (DMC) are a simple class of channel models and can be defined as follows: the input is a sequence of letters from a finite alphabet $\mathcal{X} = \{a_1, \ldots, a_K\}$, and the output is a sequence of letters from the same or a different alphabet $\mathcal{Y} = \{b_1, \ldots, b_J\}$. Each letter in the output sequence is statistically dependent only on the letter in the corresponding position of the input sequence and is determined by a fixed conditional probability assignment $P(b_j|a_k)$ defined for each letter $a_k$ in the input alphabet and each letter $b_j$ in the output alphabet. For example, the binary symmetric channel (BSC) (see Figure 3.1) is a DMC with binary input and output sequences, where each digit in the input sequence is reproduced correctly at the channel output with some fixed probability $1 - p$ and is altered by noise into the opposite digit with probability $p$. In general, for discrete memoryless channels, the transition probability assignment tells us everything that we have to know about how the noise combines with the channel input to produce the channel output.
Figure 3.1: Binary symmetric channel (BSC).
Another class of channel models, which bears a more immediate resemblance to physical channels, is the class where the set of inputs and the set of outputs are each a set of time functions (waveforms), and for each input waveform the output is a random process. A particular model in this class, of great theoretical and practical importance (particularly in space communication), is the additive white Gaussian noise channel. The set of inputs for this model is the set of time functions with a given upper limit on power, and the output is the sum of the input plus white Gaussian noise [29].
A binary-input discrete memoryless channel (B-DMC) is denoted by $W : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X} = \{0, 1\}$ is the input alphabet, $\mathcal{Y}$ is the output alphabet, and $W(y|x)$ are the transition probabilities for every $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The output alphabet and the transition probabilities may be arbitrary.
Given a B-DMC, there are two channel parameters of primary interest for polar codes: the symmetric capacity and the Bhattacharyya parameter. The symmetric capacity is the average mutual information obtained with equiprobable inputs (it coincides with the maximum of the average mutual information, i.e. the capacity, when the channel is symmetric). It is the mean of the mutual information, which is a random variable defined as in 3.1.
$$I_{X;Y}(a_k; b_j) = \log \frac{P_{X|Y}(a_k|b_j)}{P_X(a_k)} = \log \frac{P_{Y|X}(b_j|a_k)}{P_Y(b_j)} = I_{Y;X}(b_j; a_k) \tag{3.1}$$
where $\{a_1, \ldots, a_K\}$ is the sample space of X, $\{b_1, \ldots, b_J\}$ is the sample space of Y, and XY is the joint ensemble with probability assignment $P_{XY}(a_k, b_j)$. An event $x = a_k$ may be interpreted as the input letter of a noisy discrete channel and $y = b_j$ as its output, so 3.1 gives the information provided about the event x by the occurrence of the event y.
The base of the logarithm defines the numerical scale used to measure information. For base-2 logarithms, the numerical value is the number of bits (binary digits) of information, while for base e (natural logarithms) it is the number of nats (natural units).
The average mutual information between input and output is given by 3.2.
$$I(W) = I(X;Y) \triangleq \sum_{k=1}^{K} \sum_{j=1}^{J} P_{XY}(a_k, b_j) \log \frac{P_{X|Y}(a_k|b_j)}{P_X(a_k)} \tag{3.2}$$
For a DMC where $Q(k)$ is the probability of input letter k and $P(j|k)$ is the transition probability of receiving letter j given that k is the channel input, 3.2 can be written as
$$I(W) = I(X;Y) = \sum_{k=1}^{K} \sum_{j=1}^{J} Q(k) P(j|k) \log \frac{P(j|k)}{\sum_{i=1}^{K} Q(i) P(j|i)} \tag{3.3}$$
The capacity C of a discrete memoryless channel (DMC) can be written as
$$C \triangleq \max_{Q(0), \ldots, Q(K-1)} \sum_{k,j} Q(k) P(j|k) \log \frac{P(j|k)}{\sum_{i} Q(i) P(j|i)} \tag{3.4}$$
Notice that $I(X;Y)$ is a function of both the channel and the input assignment, while C is a function of the channel only.
The evaluation of C involves a maximization over K variables with the following constraints: $Q(k) \ge 0$ and $\sum_k Q(k) = 1$. Since the function is continuous and the maximization is over a closed bounded region of vector space, the maximum value must exist.
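To make 3.3 and 3.4 concrete, the following minimal sketch computes the average mutual information of a DMC for a given input assignment Q and then approximates the capacity of a binary-input channel by a crude grid search over Q(0). The grid search is an illustrative shortcut chosen here for brevity, not the method of [29], and all function names are hypothetical.

```python
import math


def mutual_information(Q, P):
    """Average mutual information in bits (eq. 3.3).

    Q[k] is the probability of input k, P[k][j] the probability of
    output j given input k.
    """
    n_out = len(P[0])
    out = [sum(Q[i] * P[i][j] for i in range(len(Q))) for j in range(n_out)]
    total = 0.0
    for k, qk in enumerate(Q):
        for j in range(n_out):
            if qk > 0 and P[k][j] > 0:  # terms with zero probability contribute 0
                total += qk * P[k][j] * math.log2(P[k][j] / out[j])
    return total


def binary_input_capacity(P, steps=1000):
    """Approximate eq. 3.4 for K = 2 by scanning Q(0) over a uniform grid."""
    best = max(range(steps + 1),
               key=lambda s: mutual_information([s / steps, 1 - s / steps], P))
    q0 = best / steps
    return q0, mutual_information([q0, 1 - q0], P)


p = 0.11
bsc_matrix = [[1 - p, p], [p, 1 - p]]
q0, C = binary_input_capacity(bsc_matrix)
print(q0, C)
```

For the BSC the maximizing assignment is the uniform one, so the capacity and the average mutual information with equiprobable inputs coincide.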
The average mutual information of a DMC with independent, identically distributed (i.i.d.) equiprobable inputs is the symmetric capacity (see [29, Sec. 4.5]); for a B-DMC it can be written as in 3.5
$$I(W) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} \frac{1}{2} W(y|x) \log \frac{W(y|x)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)} \tag{3.5}$$
Another important quantity is the Bhattacharyya parameter of W,
$$Z(W) \triangleq \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\, W(y|1)} \tag{3.6}$$
which is an upper bound on the probability of maximum-likelihood decision error when W is used only once to transmit a 0 or a 1. The Bhattacharyya measure has a simple geometric interpretation as the cosine of the angle between the K-dimensional vectors $(\sqrt{W(y_1|0)}, \ldots, \sqrt{W(y_K|0)})$ and $(\sqrt{W(y_1|1)}, \ldots, \sqrt{W(y_K|1)})$ [30].
The parameter in 3.6 will be used instead of 3.5 to select the information set of good channels $W_N^{(i)}$. Intuitively, the relations between $I(W)$ and $Z(W)$ are:
$$
\begin{aligned}
I(W) \approx 1 &\Leftrightarrow Z(W) \approx 0 \\
I(W) \approx 0 &\Leftrightarrow Z(W) \approx 1
\end{aligned}
\tag{3.7}
$$
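Consider now the function $E_0(\rho, Q)$ of [29], from which the symmetric cutoff rate of the channel is obtained for $\rho = 1$ and equiprobable inputs:
$$E_0(\rho, Q) = -\log \sum_{j=0}^{J-1} \left[ \sum_{k=0}^{K-1} Q(k) P(j|k)^{1/(1+\rho)} \right]^{1+\rho} \tag{3.8}$$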
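It is proven in [29, Sec. 5.6] that $I(W) \ge E_0(1, Q)$. For a B-DMC with equiprobable inputs, $Q(0) = Q(1) = \frac{1}{2}$, this quantity can be rewritten as
$$
\begin{aligned}
E_0(1, Q) &= -\log \sum_{j=0}^{J-1} \left[ \sum_{k=0}^{K-1} Q(k) P(j|k)^{1/2} \right]^2 \\
&= -\log \sum_{j=0}^{J-1} \left[ Q(0) P(j|0)^{\frac{1}{2}} + Q(1) P(j|1)^{\frac{1}{2}} \right]^2 \\
&= -\log \sum_{j=0}^{J-1} \left[ \frac{1}{2} \sqrt{P(j|0)} + \frac{1}{2} \sqrt{P(j|1)} \right]^2 \\
&= \log \frac{1}{\sum_{j=0}^{J-1} \left[ \frac{1}{4} P(j|0) + \frac{1}{4} P(j|1) + \frac{1}{2} \sqrt{P(j|0)} \sqrt{P(j|1)} \right]} \\
&= \log \frac{2}{\sum_{j=0}^{J-1} \left[ \frac{1}{2} P(j|0) + \frac{1}{2} P(j|1) \right] + \sum_{j=0}^{J-1} \sqrt{P(j|0) P(j|1)}} \\
&= \log \frac{2}{1 + Z(W)}
\end{aligned}
\tag{3.9}
$$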
The information of 3.7 is then partially given by the following inequality.
$$I(W) \ge \log \frac{2}{1 + Z(W)} \tag{3.10}$$
To impose an upper limit on $I(W)$ for a B-DMC, first define the variation distance $d(W)$ as
$$d(W) \triangleq \frac{1}{2} \sum_{y \in \mathcal{Y}} |W(y|0) - W(y|1)| \tag{3.11}$$
Let us consider 3.5; it can be written explicitly for both inputs as
$$I(W) = \sum_{y \in \mathcal{Y}} \frac{1}{2} \left[ W(y|0) \log \frac{W(y|0)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)} + W(y|1) \log \frac{W(y|1)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)} \right] \tag{3.12}$$
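The term in brackets for a given output y has the form
$$f(x) = x \log \frac{x}{x + \delta} + (x + 2\delta) \log \frac{x + 2\delta}{x + \delta} \tag{3.13}$$
where $x = \min\{W(y|0), W(y|1)\}$ and $\delta = \frac{1}{2} |W(y|0) - W(y|1)|$. To maximize $f(x)$ over $0 \le x \le 1 - 2\delta$, its derivative is computed:
$$\frac{df(x)}{dx} = \log \left( \frac{x (x + 2\delta)}{(x + \delta)^2} \right) = 2 \log \left( \frac{\sqrt{x (x + 2\delta)}}{x + \delta} \right) \tag{3.14}$$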
Here $\sqrt{x (x + 2\delta)}$ is the geometric mean and $(x + \delta)$ the arithmetic mean of $x$ and $(x + 2\delta)$; consequently $\frac{df}{dx} \le 0$ and $f(x)$ is maximum at $x = 0$. In conclusion $f(x)|_{x=0} = 2\delta$ (with base-2 logarithms), so $f(x) \le 2\delta$, and substituting in 3.12 it follows:
$$I(W) \le \sum_{y \in \mathcal{Y}} \frac{1}{2} |W(y|0) - W(y|1)| = d(W) \tag{3.15}$$
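Let $R_i \triangleq (W(y_i|0) + W(y_i|1))/2$ and $\delta_i \triangleq \frac{1}{2} |W(y_i|0) - W(y_i|1)|$; then $Z(W)$ can be rewritten as:
$$Z(W) = \sum_{y_i \in \mathcal{Y}} \sqrt{(R_i + \delta_i)(R_i - \delta_i)} = \sum_{i=1}^{J} \sqrt{R_i^2 - \delta_i^2} \tag{3.16}$$
To carry out the maximization of 3.16 over $\delta_i$, subject to $0 \le \delta_i \le R_i$ and $i = 1, \ldots, J$, the partial derivatives are computed:
$$
\begin{aligned}
\frac{\partial Z(W)}{\partial \delta_i} &= -\frac{\delta_i}{\sqrt{R_i^2 - \delta_i^2}} \\
\frac{\partial^2 Z(W)}{\partial \delta_i^2} &= -\frac{R_i^2}{(R_i^2 - \delta_i^2)^{\frac{3}{2}}}
\end{aligned}
\tag{3.17}
$$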
$Z(W)$ is therefore decreasing in each $\delta_i$ over its whole domain and is a concave function of $\delta_i$ for each i. For a fixed value of the sum of the $\delta_i$, the maximum occurs at the solution of the set of equations
$$\frac{\partial Z(W)}{\partial \delta_i} = k, \quad i = 1, \ldots, J,$$
where k is a constant, in other words for $\delta_i = R_i \sqrt{\frac{k^2}{1 + k^2}}$.
Imposing $d(W) = \sum_{i=1}^{J} \delta_i = \delta$ and noting that $\sum_{i=1}^{J} R_i = 1$, we find $\sqrt{\frac{k^2}{1 + k^2}} = \delta$.
So the maximum occurs at $\delta_i = \delta R_i$, and 3.16 takes the value $\sum_{i=1}^{J} \sqrt{R_i^2 - \delta^2 R_i^2} = \sqrt{1 - \delta^2}$. We have thus shown that $Z(W) \le \sqrt{1 - d(W)^2}$, which is equivalent to $d(W) \le \sqrt{1 - Z(W)^2}$.
In conclusion, the following inequality relating $I(W)$ and $Z(W)$ can be written.
$$I(W) \le d(W) \le \sqrt{1 - Z(W)^2} \tag{3.18}$$
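The bounds 3.10 and 3.18 can be checked numerically on simple channels. The following minimal sketch, with illustrative function names, represents a B-DMC as a pair of lists `W[x][y]` and evaluates $I(W)$, $Z(W)$ and $d(W)$ for a BSC and for a binary erasure channel whose three outputs are ordered here, by assumption, as (0, erasure, 1).

```python
import math


def symmetric_capacity(W):
    """I(W) in bits for a B-DMC given as W[x][y], x in {0, 1} (eq. 3.5)."""
    total = 0.0
    for y in range(len(W[0])):
        avg = 0.5 * W[0][y] + 0.5 * W[1][y]
        for x in (0, 1):
            if W[x][y] > 0:
                total += 0.5 * W[x][y] * math.log2(W[x][y] / avg)
    return total


def bhattacharyya(W):
    """Z(W), sum over y of sqrt(W(y|0) W(y|1)) (eq. 3.6)."""
    return sum(math.sqrt(W[0][y] * W[1][y]) for y in range(len(W[0])))


def variation_distance(W):
    """d(W), half the L1 distance between the two rows (eq. 3.11)."""
    return 0.5 * sum(abs(W[0][y] - W[1][y]) for y in range(len(W[0])))


def bsc(p):
    return [[1 - p, p], [p, 1 - p]]


def bec(eps):
    # Outputs ordered as (0, erasure, 1); this ordering is an arbitrary choice.
    return [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]


for name, W in (("BSC(0.1)", bsc(0.1)), ("BEC(0.5)", bec(0.5))):
    I, Z, d = symmetric_capacity(W), bhattacharyya(W), variation_distance(W)
    lower = math.log2(2 / (1 + Z))   # right-hand side of 3.10
    upper = math.sqrt(1 - Z * Z)     # right-hand side of 3.18
    print(name, lower, I, d, upper)
```

For the BEC the first inequality of 3.18 holds with equality, since $I(W) = d(W) = 1 - \epsilon$.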
An important channel, used to obtain specific results (discussed later) in polar code theory, is the B-DMC called the binary erasure channel (BEC). As represented in Figure 3.2, this channel has binary input $\mathcal{X} = \{0, 1\}$ and ternary output $\mathcal{Y} = \{0, 1, e\}$: each input is received unaltered with fixed probability $1 - p$; otherwise it is completely lost (erased), so the symbol e is received with probability p, called the erasure probability.
Figure 3.2: Binary Erasure Channel (BEC).
Another important definition is the Kronecker product of a matrix $A = [A_{i,j}]$ of dimension $m \times n$ and a matrix $B = [B_{i,j}]$ of dimension $r \times s$:
$$
A \otimes B =
\begin{bmatrix}
A_{1,1} B & \cdots & A_{1,n} B \\
\vdots & \ddots & \vdots \\
A_{m,1} B & \cdots & A_{m,n} B
\end{bmatrix}
\tag{3.19}
$$
which is an $mr \times ns$ matrix. Starting from this definition, the Kronecker power $A^{\otimes n}$ is defined recursively for all $n \ge 1$ as:
$$A^{\otimes n} = A \otimes A^{\otimes (n-1)} \tag{3.20}$$
By convention, $A^{\otimes 0} \triangleq [1]$.
3.2 Channel polarization effect
The channel polarization effect described in [23] consists in the creation of a vector of N synthetic channels $\{W_N^{(i)} : 1 \le i \le N\}$ from N independent copies of a given B-DMC such that, as N becomes large, the symmetric capacity terms $\{I(W_N^{(i)})\}$ tend towards 0 or 1 for all but a vanishing fraction of indices i. This effect is achieved through channel combining and channel splitting operations based on [31].
3.2.1 Channel combining
Starting from identical copies of a given B-DMC W, a recursive method produces the synthetic vector channel $W_N : \mathcal{X}^N \to \mathcal{Y}^N$, where $N = 2^n$, $n \ge 0$. For $n = 0$ the synthetic vector channel equals the given channel, $W_1 \triangleq W$, so no modification is introduced. For the second step of the recursion ($n = 1$), two copies of $W_1$ are used to obtain the channel $W_2 : \mathcal{X}^2 \to \mathcal{Y}^2$ with transition probability
$$W_2(y_1, y_2|u_1, u_2) = W(y_1|u_1 \oplus u_2)\, W(y_2|u_2) \tag{3.21}$$
Figure 3.3: $W_2$ channel.
Figure 3.3 gives a graphical representation of how two channels are combined to obtain the synthetic vector channel for $n = 1$. The recursion then proceeds by combining the new synthetic channels in the same way as in the previous step: two copies of $W_2$ are used to generate $W_4 : \mathcal{X}^4 \to \mathcal{Y}^4$ with transition probability
$$W_4(y_1^4|u_1^4) = W_2(y_1^2|u_1 \oplus u_2, u_3 \oplus u_4)\, W_2(y_3^4|u_2, u_4) \tag{3.22}$$
where $a_1^N$ denotes the row vector $(a_1, \ldots, a_N)$ and $a_i^j$, with $1 \le i, j \le N$, denotes the subvector $(a_i, \ldots, a_j)$. If $j < i$, $a_i^j$ is considered void. Figure 3.4 shows the interconnections of channel $W_4$ and points out the permutation operation $R_4$ that maps an input $s_1^4 = (s_1, s_2, s_3, s_4)$ to $v_1^4 = (s_1, s_3, s_2, s_4)$.
Figure 3.4: $W_4$ channel obtained by recursions of $W_2$ and $W$.
The generator matrix that maps $u_1^4 \to x_1^4$, from the inputs of $W_4$ to the inputs of the individual copies of W (this set of copies will be called $W^4$), can be written as
$$
x_1^4 = u_1^4 G_4 = u_1^4
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1
\end{bmatrix}
\tag{3.23}
$$
This yields the relation between the transition probabilities of $W_4$ and those of $W^4$:
$$W_4(y_1^4|u_1^4) = W^4(y_1^4|u_1^4 G_4) \tag{3.24}$$
The recursion for creating the vector channel is generalized in Figure 3.5, where two independent copies of $W_{N/2}$ are combined to produce the channel $W_N$.
Figure 3.5: Generalized $W_N$ channel obtained by recursion of two $W_{N/2}$.
The input vector $u_1^N$ is transformed according to:
$$
\begin{cases}
s_{2i-1} = u_{2i-1} \oplus u_{2i} \\
s_{2i} = u_{2i}
\end{cases}
\tag{3.25}
$$
for $1 \le i \le \frac{N}{2}$. The operator $R_N$ in the figure is a reverse shuffle permutation that acts on the input $s_1^N$ to produce $v_1^N = (s_1, s_3, \ldots, s_{N-1}, s_2, s_4, \ldots, s_N)$. The vector $v_1^N$ is the input of the two copies of $W_{N/2}$.
The overall mapping $u_1^N \to x_1^N$, from the inputs of $W_N$ to the inputs of the raw channels $W^N$, is linear because every mapping $u_1^N \to v_1^N$ in the recursion is linear over GF(2). The final mapping can therefore be represented by a generator matrix $G_N$ of size $N \times N$, where
$$x_1^N = u_1^N G_N \tag{3.26}$$
The transition probabilities of the two channels $W_N$ and $W^N$ are related by
$$W_N(y_1^N|u_1^N) = W^N(y_1^N|u_1^N G_N) \tag{3.27}$$
for every $y_1^N \in \mathcal{Y}^N$, $u_1^N \in \mathcal{X}^N$.
Section 3.3 shows that $G_N = B_N F^{\otimes n}$, where $N = 2^n$, $n \ge 0$, $B_N$ is the bit-reversal permutation matrix and $F \triangleq \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.
As presented, the channel combining operation is fully specified by F, and $G_N$ and $F^{\otimes n}$ differ only by the (bit-reversed) order of rows.
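The construction $G_N = B_N F^{\otimes n}$ can be sketched in a few lines. The following is a minimal illustration in plain Python, with matrices stored as lists of rows; the helper names `kron`, `kron_power`, `bit_reverse` and `generator_matrix` are hypothetical. It builds the Kronecker power with the recursion 3.20 and then reorders the rows by bit reversal, which for $n = 2$ reproduces the matrix of 3.23.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows (eq. 3.19)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def kron_power(A, n):
    """Kronecker power via A^{(x)n} = A (x) A^{(x)(n-1)}, with A^{(x)0} = [1]."""
    result = [[1]]
    for _ in range(n):
        result = kron(A, result)
    return result


def bit_reverse(i, n):
    """Reverse the n-bit binary representation of the integer i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


F = [[1, 0], [1, 1]]


def generator_matrix(n):
    """G_N = B_N F^{(x)n}: the rows of F^{(x)n} taken in bit-reversed order."""
    Fn = kron_power(F, n)
    return [Fn[bit_reverse(i, n)] for i in range(2 ** n)]


for row in generator_matrix(2):
    print(row)
```

Since all entries are 0 or 1, ordinary integer products suffice here; reductions modulo 2 are only needed when a vector is multiplied by $G_N$.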
3.2.2 Channel splitting
The process of splitting the vector channel $W_N$ back into a set of N binary-input coordinate channels is called channel splitting. Formally, $W_N^{(i)} : \mathcal{X} \to \mathcal{Y}^N \times \mathcal{X}^{i-1}$, for $1 \le i \le N$, is defined by the transition probabilities
$$W_N^{(i)}(y_1^N, u_1^{i-1}|u_i) \triangleq \sum_{u_{i+1}^N \in \mathcal{X}^{N-i}} \frac{1}{2^{N-1}} W_N(y_1^N|u_1^N) \tag{3.28}$$
where $u_i$ is the given input and $(y_1^N, u_1^{i-1})$ is the output of $W_N^{(i)}$. To gain an intuitive understanding of the channels $\{W_N^{(i)}\}$, consider a genie-aided decoder in which the i-th decision, on $u_i$, is taken after observing $y_1^N$ and the previous channel inputs $u_1^{i-1}$, supplied correctly by the genie regardless of any decision errors at earlier stages. If $u_1^N$ is a-priori uniform on $\mathcal{X}^N$, then $W_N^{(i)}$ is the effective channel seen by the i-th decision element.
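For example, the synthetic vector channel $W_2$ can be split into $W_2^{(1)}$ and $W_2^{(2)}$; the transition probabilities of $W_2^{(1)} : \mathcal{X} \to \mathcal{Y}^2$ and $W_2^{(2)} : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{X}$ can be written as:
$$
\begin{aligned}
W_2^{(1)}(y_1^2|u_1) &= \frac{1}{2} \sum_{u_2} W_2(y_1^2|u_1^2) = \frac{1}{2} \sum_{u_2} W(y_1|u_1 \oplus u_2)\, W(y_2|u_2) \\
W_2^{(2)}(y_1^2, u_1|u_2) &= \frac{1}{2} W_2(y_1^2|u_1^2) = \frac{1}{2} W(y_1|u_1 \oplus u_2)\, W(y_2|u_2)
\end{aligned}
\tag{3.29}
$$
where the channel combining information of 3.25 has been used. For the $\{W_N^{(i)}\}$ channels the Bhattacharyya parameter 3.6 becomes
$$Z(W_N^{(i)}) = \sum_{y_1^N \in \mathcal{Y}^N} \sum_{u_1^{i-1} \in \mathcal{X}^{i-1}} \sqrt{W_N^{(i)}(y_1^N, u_1^{i-1}|0)\, W_N^{(i)}(y_1^N, u_1^{i-1}|1)} \tag{3.30}$$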
3.2.3 Operations on recursive synthetic channels
The goal of this section is to show that the operations 3.27 and 3.28, obtained for the whole set of synthetic channels $W_N^{(i)}$, originate from the recursive application of a single-step transformation to the channels from which they are built.
A pair of binary-input channels $W' : \mathcal{X} \to \tilde{\mathcal{Y}}$ and $W'' : \mathcal{X} \to \tilde{\mathcal{Y}} \times \mathcal{X}$ is said to be obtained by a single-step transformation of two independent copies of a binary-input channel $W : \mathcal{X} \to \mathcal{Y}$ if there exists a one-to-one mapping $f : \mathcal{Y}^2 \to \tilde{\mathcal{Y}}$ such that
$$
\begin{aligned}
W'(f(y_1, y_2)|u_1) &= \sum_{u_2'} \frac{1}{2} W(y_1|u_1 \oplus u_2')\, W(y_2|u_2') \\
W''(f(y_1, y_2), u_1|u_2) &= \frac{1}{2} W(y_1|u_1 \oplus u_2)\, W(y_2|u_2)
\end{aligned}
\tag{3.31}
$$
for every $u_1, u_2 \in \mathcal{X}$, $y_1, y_2 \in \mathcal{Y}$. In that case we write:
$$(W, W) \to (W', W'') \tag{3.32}$$
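Recalling 3.28 for 2N channels and $1 \le i \le N$, we write
$$
\begin{aligned}
W_{2N}^{(2i-1)}(y_1^{2N}, u_1^{2i-2}|u_{2i-1}) &= \sum_{u_{2i}^{2N}} \frac{1}{2^{2N-1}} W_{2N}(y_1^{2N}|u_1^{2N}) \\
W_{2N}^{(2i)}(y_1^{2N}, u_1^{2i-1}|u_{2i}) &= \sum_{u_{2i+1}^{2N}} \frac{1}{2^{2N-1}} W_{2N}(y_1^{2N}|u_1^{2N})
\end{aligned}
\tag{3.33}
$$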
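Considering the first equation and introducing the subscripts o and e on a vector to denote the subvectors composed of its elements with odd and even indices respectively, it follows that:
$$
\begin{aligned}
W_{2N}^{(2i-1)}(y_1^{2N}, u_1^{2i-2}|u_{2i-1}) &= \sum_{u_{2i,o}^{2N},\, u_{2i,e}^{2N}} \frac{1}{2^{2N-1}} W_{2N}(y_1^{2N}|u_1^{2N}) \\
&= \sum_{u_{2i,o}^{2N},\, u_{2i,e}^{2N}} \frac{1}{2^{2N-1}} W_N(y_1^N|u_{1,o}^{2N} \oplus u_{1,e}^{2N})\, W_N(y_{N+1}^{2N}|u_{1,e}^{2N}) \\
&= \sum_{u_{2i}} \frac{1}{2} \sum_{u_{2i+1,e}^{2N}} \frac{1}{2^{N-1}} W_N(y_{N+1}^{2N}|u_{1,e}^{2N}) \sum_{u_{2i+1,o}^{2N}} \frac{1}{2^{N-1}} W_N(y_1^N|u_{1,o}^{2N} \oplus u_{1,e}^{2N}) \\
&= \sum_{u_{2i}} \frac{1}{2} \sum_{u_{2i+1,e}^{2N}} \frac{1}{2^{N-1}} W_N(y_{N+1}^{2N}|u_{1,e}^{2N})\, W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2}|u_{2i-1} \oplus u_{2i}) \\
&= \frac{1}{2} \sum_{u_{2i}} W_N^{(i)}(y_{N+1}^{2N}, u_{1,e}^{2i-2}|u_{2i})\, W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2}|u_{2i-1} \oplus u_{2i})
\end{aligned}
\tag{3.34}
$$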
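because, as $u_{2i+1,o}^{2N}$ ranges over $\mathcal{X}^{N-i}$, $u_{2i+1,o}^{2N} \oplus u_{2i+1,e}^{2N}$ also ranges over $\mathcal{X}^{N-i}$. Similarly, for the second equation of 3.33 we can write
$$
\begin{aligned}
W_{2N}^{(2i)}(y_1^{2N}, u_1^{2i-1}|u_{2i}) &= \sum_{u_{2i+1}^{2N}} \frac{1}{2^{2N-1}} W_{2N}(y_1^{2N}|u_1^{2N}) \\
&= \frac{1}{2} \sum_{u_{2i+1,e}^{2N}} \frac{1}{2^{N-1}} W_N(y_{N+1}^{2N}|u_{1,e}^{2N}) \sum_{u_{2i+1,o}^{2N}} \frac{1}{2^{N-1}} W_N(y_1^N|u_{1,o}^{2N} \oplus u_{1,e}^{2N}) \\
&= \frac{1}{2} W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2}|u_{2i-1} \oplus u_{2i})\, W_N^{(i)}(y_{N+1}^{2N}, u_{1,e}^{2i-2}|u_{2i})
\end{aligned}
\tag{3.35}
$$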
With the following substitutions in 3.34 and 3.35, it is possible to prove that these equations have the same form as 3.31.
$$
\begin{aligned}
W_N^{(i)} &\to W, \\
W_{2N}^{(2i-1)} &\to W', \\
W_{2N}^{(2i)} &\to W'', \\
u_{2i-1} &\to u_1, \\
u_{2i} &\to u_2, \\
(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2}) &\to y_1, \\
(y_{N+1}^{2N}, u_{1,e}^{2i-2}) &\to y_2, \\
(y_1^{2N}, u_1^{2i-2}) &\to f(y_1, y_2).
\end{aligned}
\tag{3.36}
$$
So the recursive mapping used can be written in general form as
$$(W_N^{(i)}, W_N^{(i)}) \to (W_{2N}^{(2i-1)}, W_{2N}^{(2i)}) \tag{3.37}$$
This shows that the channel transformation from $W^N$ to $(W_N^{(1)}, \ldots, W_N^{(N)})$ can be broken into single-step channel transformations.
3.2.4 Channel polarization
In the previous sections, using the channel combining and splitting operations, we have created N new channels $W_N^{(i)}$. Let us consider the core recursion for two identical channels; as a simplified example, this can be thought of as the $n = 1$ recursion that generates 3.29. The validity of this approach was proven in Section 3.2.3.
This result can be exploited to evaluate how the rate $I(W_N^{(i)})$ and reliability $Z(W_N^{(i)})$ parameters change under the single-step transformation.
Suppose $(W, W) \to (W^-, W^+)$, where $W : \mathcal{X} \to \mathcal{Y}$, $W^- : \mathcal{X} \to \tilde{\mathcal{Y}}$, $W^+ : \mathcal{X} \to \tilde{\mathcal{Y}} \times \mathcal{X}$, and there is a one-to-one function $f : \mathcal{Y}^2 \to \tilde{\mathcal{Y}}$ such that 3.31 holds. Consider the pair of random variables $(U_0, U_1)$ uniformly distributed over $\mathcal{X}^2$; they generate $(X_0, X_1) = (U_0 \oplus U_1, U_1)$ (Figure 3.6).
Figure 3.6: Channel variables relationship.
The transition probability becomes $P_{Y_0, Y_1|X_0, X_1}(y_0, y_1|x_0, x_1) = W(y_0|x_0)\, W(y_1|x_1)$, and we define $\tilde{Y} = f(Y_0, Y_1)$. It is possible to write
$$
\begin{aligned}
W^-(\tilde{y}|u_0) &= P_{\tilde{Y}|U_0}(\tilde{y}|u_0) \\
W^+(\tilde{y}, u_0|u_1) &= P_{\tilde{Y} U_0|U_1}(\tilde{y}, u_0|u_1)
\end{aligned}
\tag{3.38}
$$
Considering also the fact that $(Y_0, Y_1) \to \tilde{Y}$ is invertible, we get
$$
\begin{aligned}
I(W^-) &= I(U_0; \tilde{Y}) = I(U_0; Y_0 Y_1) \\
I(W^+) &= I(U_1; \tilde{Y} U_0) = I(U_1; Y_0 Y_1 U_0)
\end{aligned}
\tag{3.39}
$$
Since $U_0$ and $U_1$ are independent, $I(U_1; Y_0 Y_1 U_0) = I(U_1; Y_0 Y_1|U_0)$; using the chain rule of mutual information it is possible to write
$$
\begin{aligned}
I(W^-) + I(W^+) &= I(U_0; Y_0 Y_1) + I(U_1; Y_0 Y_1 U_0) \\
&= I(U_0; Y_0 Y_1) + I(U_1; Y_0 Y_1|U_0) \\
&= I(U_0 U_1; Y_0 Y_1) \\
&= I(X_0 X_1; Y_0 Y_1)
\end{aligned}
\tag{3.40}
$$
where the one-to-one relation between $(X_0, X_1)$ and $(U_0, U_1)$ has been used. From Figure 3.6 it can be observed that
$$I(X_0 X_1; Y_0 Y_1) = I(X_0; Y_0) + I(X_1; Y_1) = 2 I(W) \tag{3.41}$$
Starting from this result, it is possible to prove $I(W^+) \ge I(W)$ by noticing that
$$
\begin{aligned}
I(W^+) &= I(U_1; Y_0 Y_1 U_0) \\
&= I(U_1; Y_1) + I(U_1; Y_0 U_0|Y_1) \\
&= I(W) + I(U_1; Y_0 U_0|Y_1)
\end{aligned}
\tag{3.42}
$$
In conclusion, it is possible to write
$$
\begin{aligned}
I(W^-) + I(W^+) &= 2 I(W) \\
I(W^-) \le I(W) &\le I(W^+)
\end{aligned}
\tag{3.43}
$$
The first equality in 3.43 means that the single-step channel transform preserves the symmetric capacity. The second relation becomes $I(W^-) = I(W) = I(W^+)$ if and only if W is a perfect noiseless channel ($I(W) = 1$) or a completely noisy channel ($I(W) = 0$); otherwise the single-step transform moves the symmetric capacity apart, $I(W^-) < I(W) < I(W^+)$, thus generating polarization. The last statement is proven by studying when $I(U_1; Y_0 U_0|Y_1) = 0$, which can be rewritten equivalently as
$$P_{U_0, U_1, Y_0|Y_1}(u_0, u_1, y_0|y_1) = P_{U_0, Y_0|Y_1}(u_0, y_0|y_1)\, P_{U_1|Y_1}(u_1|y_1) \tag{3.44}$$
for all $(u_0, u_1, y_0, y_1)$ such that $P_{Y_1}(y_1) > 0$, or equivalently
$$P_{Y_0, Y_1|U_0, U_1}(y_0, y_1|u_0, u_1)\, P_{Y_1}(y_1) = P_{Y_0, Y_1|U_0}(y_0, y_1|u_0)\, P_{Y_1|U_1}(y_1|u_1) \tag{3.45}$$
for all $(u_0, u_1, y_0, y_1)$. Since $P_{Y_0, Y_1|U_0, U_1}(y_0, y_1|u_0, u_1) = W(y_0|u_0 \oplus u_1)\, W(y_1|u_1)$, it is possible to write 3.45 as
$$W(y_1|u_1) \left[ W(y_0|u_0 \oplus u_1)\, P_{Y_1}(y_1) - P_{Y_0, Y_1|U_0}(y_0, y_1|u_0) \right] = 0 \tag{3.46}$$
By construction $P_{Y_1}(y_1) = \frac{1}{2} W(y_1|u_1) + \frac{1}{2} W(y_1|u_1 \oplus 1)$ and $P_{Y_0, Y_1|U_0}(y_0, y_1|u_0) = \frac{1}{2} W(y_0|u_0 \oplus u_1)\, W(y_1|u_1) + \frac{1}{2} W(y_0|u_0 \oplus u_1 \oplus 1)\, W(y_1|u_1 \oplus 1)$; simplifying, it is possible to obtain:
$$W(y_1|u_1)\, W(y_1|u_1 \oplus 1) \left[ W(y_0|u_0 \oplus u_1) - W(y_0|u_0 \oplus u_1 \oplus 1) \right] = 0 \tag{3.47}$$
Choosing $(u_0, u_1) = (0, 0)$ (the result is the same for all four realizations), 3.47 becomes:
$$W(y_1|0)\, W(y_1|1) \left[ W(y_0|0) - W(y_0|1) \right] = 0 \tag{3.48}$$
so either $W(y_0|0) = W(y_0|1)$ for all $y_0$, which implies $I(W) = 0$, or there exists no $y_1$ such that $W(y_1|0)\, W(y_1|1) > 0$, which means $I(W) = 1$.
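The relations in 3.43 can be verified numerically for a specific channel. The following minimal sketch builds $W^-$ and $W^+$ from two copies of a BSC using the single-step formulas of 3.31 with f taken as the identity, then compares the symmetric capacities. Channels are stored as nested dictionaries `W[x][y]`; this representation and the function names are illustrative choices.

```python
import math
from itertools import product


def bsc(p):
    """Binary symmetric channel as W[x][y]."""
    return {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}


def single_step(W):
    """Return (W_minus, W_plus) obtained from two copies of W (eq. 3.31)."""
    outputs = list(W[0])
    minus = {0: {}, 1: {}}
    plus = {0: {}, 1: {}}
    for y1, y2 in product(outputs, repeat=2):
        for u1, u2 in product((0, 1), repeat=2):
            term = 0.5 * W[u1 ^ u2][y1] * W[u2][y2]
            # W^-: u2 is unknown, so it is summed out
            minus[u1][(y1, y2)] = minus[u1].get((y1, y2), 0.0) + term
            # W^+: u1 is part of the channel output
            plus[u2][(y1, y2, u1)] = term
    return minus, plus


def symmetric_capacity(W):
    """I(W) in bits for a binary-input channel W[x][y] (eq. 3.5)."""
    total = 0.0
    for y in W[0]:
        avg = 0.5 * (W[0][y] + W[1][y])
        for x in (0, 1):
            if W[x][y] > 0:
                total += 0.5 * W[x][y] * math.log2(W[x][y] / avg)
    return total


W = bsc(0.1)
W_minus, W_plus = single_step(W)
I, I_minus, I_plus = (symmetric_capacity(c) for c in (W, W_minus, W_plus))
print(I_minus, I, I_plus, I_minus + I_plus - 2 * I)
```

Because `single_step` returns channels in the same dictionary form, it can be applied again to its own outputs to follow the recursion 3.37 for small N.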
Figure 3.7: Binary tree for the recursive construction of synthetic channels.
Figure 3.7 shows the process of recursive construction of the channels as a binary tree. Starting from the root node, which is associated with channel W, two child channels $W_2^{(1)}$ and $W_2^{(2)}$ are generated. According to the previous result 3.43, the second channel has better transmission properties. This better channel is then used in the recursion to produce two further channels ($W_4^{(3)}$ and $W_4^{(4)}$), which are again subject to 3.43, so one of them is better in terms of capacity than the originating channel, and so on. This concept can be applied to all generated channels, both good and bad.
The final step is to prove, with the following theorem, that the recursion produces channels that become more and more noiseless or noisy.
Theorem
For any B-DMC W, the channels $\{W_N^{(i)}\}$ polarize in the sense that, for any fixed $\delta \in (0, 1)$, as N goes to infinity through powers of two, the fraction of indices $i \in \{1, \ldots, N\}$ for which $I(W_N^{(i)}) \in (1 - \delta, 1]$ goes to $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$.
Proof
Let us consider the stochastic convergence of the random sequence $\{I_n\}$, where $I_n = I(K_n)$ is a random process obtained from the random channel process $K_n = W_{b_1, \ldots, b_n}$.
Here $(b_1, \ldots, b_n)$ is the binary label of a channel $W_{2^n}^{(i)}$ (as also reported in Figure 3.7). Let $\Omega$ be the space of all binary sequences $(b_1, b_2, \ldots) \in \{0, 1\}^\infty$ and $\mathcal{F}$ the Borel field (BF) generated by the cylinder sets $S(b_1, \ldots, b_n) \triangleq \{\omega \in \Omega : \omega_1 = b_1, \ldots, \omega_n = b_n\}$. Let $\mathcal{F}_n$ be the BF generated by the cylinder sets $S(b_1, \ldots, b_i)$, $1 \le i \le n$, and define $\mathcal{F}_0$ as the trivial BF consisting of the null set and $\Omega$, from which it follows that $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n$.
The sequence of random variables and BFs $\{I_n, \mathcal{F}_n; n \ge 0\}$ is a martingale if:
1. $\mathcal{F}_n \subset \mathcal{F}_{n+1}$ and $I_n$ is $\mathcal{F}_n$-measurable
2. $E\{|I_n|\} < \infty$
3. $I_n = E\{I_{n+1}|\mathcal{F}_n\}$
The first condition is true by construction. The second is verified because $0 \le I_n \le 1$. The last condition is verified by writing
$$E\{I_{n+1}|S(b_1, \ldots, b_n)\} = \frac{1}{2} I(W_{b_1, \ldots, b_n, 0}) + \frac{1}{2} I(W_{b_1, \ldots, b_n, 1}) = I(W_{b_1, \ldots, b_n}) \tag{3.49}$$
where the first equation of 3.43 is used and $I(W_{b_1, \ldots, b_n})$ is the value of $I_n$ on $S(b_1, \ldots, b_n)$.
It follows that the sequence $\{I_n; n \ge 0\}$ converges almost everywhere to a random variable $I_\infty$ such that $E\{I_\infty\} = I_0$, by the general convergence results for uniformly integrable martingales.
The limit random variable $I_\infty$ takes values in $\{0, 1\}$ because of the transformation $(W, W) \to (W^-, W^+)$, as determined by the equality condition of the second relation in 3.43.
Figure 3.8 shows the polarization effect for a BEC as the codeword length N increases; in particular, it can be noticed that $I(W_N^{(i)})$ tends to be near 0 for small i and near 1 for $i \to N$.
Figure 3.8: Symmetric capacity $I(W_N^{(i)})$ (vertical axis, from 0 to 1) for N identical BEC channels with erasure probability $\epsilon = 0.5$. (a) N = 16, (b) N = 64, (c) N = 128, (d) N = 1024.
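Data of the kind plotted in Figure 3.8 can be generated with a short recursion. The sketch below relies on a standard closed-form result for the BEC that is not derived in the text and is therefore an assumption here: under the single-step transform 3.37, a BEC with symmetric capacity I yields channels with capacities $I^2$ (for $W^-$) and $2I - I^2$ (for $W^+$). The function name is illustrative.

```python
def bec_symmetric_capacities(n, erasure=0.5):
    """Return [I(W_N^(1)), ..., I(W_N^(N))] for N = 2**n copies of a BEC.

    Assumption (standard BEC result, not derived in the text): one
    single-step transform maps capacity c to c**2 and 2*c - c**2.
    """
    caps = [1.0 - erasure]
    for _ in range(n):
        next_caps = []
        for c in caps:
            next_caps.append(c * c)          # W_{2N}^{(2i-1)}, the worse channel
            next_caps.append(2 * c - c * c)  # W_{2N}^{(2i)}, the better channel
        caps = next_caps
    return caps


for n in (4, 6, 7, 10):  # N = 16, 64, 128, 1024 as in Figure 3.8
    caps = bec_symmetric_capacities(n)
    good = sum(c > 0.9 for c in caps) / len(caps)
    bad = sum(c < 0.1 for c in caps) / len(caps)
    print(len(caps), good, bad)
```

The printed fractions of nearly noiseless and nearly useless channels give a rough numerical picture of the polarization statement of the theorem; the thresholds 0.9 and 0.1 are arbitrary.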
3.3 Encoding and decoding of Polar Codes
The polarization effect can be exploited to construct codes that achieve the symmetric channel capacity $I(W)$ by a method called polar coding. The fundamental idea of polar coding is to synthesize, out of N independent identical copies of a given B-DMC, a second set of N binary-input channels $W_N^{(i)}$ that can be accessed individually, and to send data only through those for which $Z(W_N^{(i)}) \simeq 0$, and hence $I(W_N^{(i)}) \simeq 1$, i.e. the almost noiseless channels.
A general linear block code is expressed by
$$x_1^N = u_1^N G_N \tag{3.50}$$
where $G_N$ is the generator matrix of order N as defined in 3.26. Given a subset $\mathcal{A} \subset \{1, \ldots, N\}$, it is possible to rewrite it as
$$x_1^N = u_{\mathcal{A}} G_N(\mathcal{A}) \oplus u_{\mathcal{A}^c} G_N(\mathcal{A}^c) \tag{3.51}$$
where $G_N(\mathcal{A})$ is the submatrix of $G_N$ formed by the rows with indices in $\mathcal{A}$. The set $\mathcal{A}$ is called the information set and $u_{\mathcal{A}^c} \in \mathcal{X}^{N-K}$ is the frozen bit vector. A mapping from the source blocks $u_{\mathcal{A}}$ to the codeword blocks $x_1^N$ is obtained by fixing $\mathcal{A}$ and $u_{\mathcal{A}^c}$ and leaving $u_{\mathcal{A}}$ as a free variable. This class of codes is called $G_N$-coset codes; each code is identified by a parameter vector $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$, where N is the size of the original $G_N$ matrix and K is the size of $\mathcal{A}$, so K/N is the code rate.
The following example shows the encoder mapping for a code $(4, 2, \{2, 4\}, (0, 0))$:
$$
x_1^4 = u_1^4 G_4 = (u_2, u_4)
\begin{bmatrix}
1 & 0 & 1 & 0 \\
1 & 1 & 1 & 1
\end{bmatrix}
+ (0, 0)
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0
\end{bmatrix}
\tag{3.52}
$$
So, given a realization of the source block $(u_2, u_4) = (1, 1)$, the coded block becomes $x_1^4 = (0, 1, 0, 1)$. In polar codes the choice of the frozen bit positions is therefore of particular importance to achieve the best coding performance, as previously discussed.
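The encoder mapping of 3.51 and 3.52 can be sketched as follows. This is a minimal illustration: $G_4$ is copied from 3.23, indices in the information set are 1-based as in the text, and the function name `polar_encode` is hypothetical.

```python
# Generator matrix G_4 as given in eq. 3.23
G4 = [
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]


def polar_encode(G, info_set, info_bits, frozen_bits):
    """Encode with a G_N-coset code: x = u_A G(A) + u_Ac G(Ac) over GF(2).

    info_set holds 1-based indices; info_bits and frozen_bits are consumed
    in increasing index order.
    """
    N = len(G)
    info_iter = iter(info_bits)
    frozen_iter = iter(frozen_bits)
    # Assemble the full input vector u_1^N from information and frozen bits
    u = [next(info_iter) if i in info_set else next(frozen_iter)
         for i in range(1, N + 1)]
    # Row vector times matrix, reduced modulo 2
    x = [sum(u[r] * G[r][c] for r in range(N)) % 2 for c in range(N)]
    return u, x


u, x = polar_encode(G4, {2, 4}, (1, 1), (0, 0))
print(u, x)
```

Multiplying the full vector $u_1^N$ by $G_N$ is equivalent to the split form of 3.51, since the rows selected by $\mathcal{A}$ and $\mathcal{A}^c$ together make up the whole matrix.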
The initial data vector $u_1^N$, composed of information and frozen bits, is encoded into the codeword $x_1^N$, which is sent over the vector channel $W^N$, and the channel output $y_1^N$ is received. The goal of the decoder is to produce a correct estimate $\hat{u}_1^N$ of $u_1^N$, given knowledge of $y_1^N$, $\mathcal{A}$ and $u_{\mathcal{A}^c}$. The decoder therefore only needs to compute the estimate $\hat{u}_{\mathcal{A}}$ of $u_{\mathcal{A}}$, because the frozen bits are already known.
The first decoding method proposed for polar codes was Successive Cancellation (SC), where, given any $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ code, the decisions $\hat{u}_1^N$ are taken by computing:
$$
\hat{u}_i \triangleq
\begin{cases}
u_i, & \text{if } i \in \mathcal{A}^c \\
h_i(y_1^N, \hat{u}_1^{i-1}), & \text{if } i \in \mathcal{A}
\end{cases}
\tag{3.53}
$$
where i goes in increasing order from 1 to N and the decision functions $h_i : \mathcal{Y}^N \times \mathcal{X}^{i-1} \to \mathcal{X}$, $i \in \mathcal{A}$, are defined as:
$$
h_i(y_1^N, \hat{u}_1^{i-1}) \triangleq
\begin{cases}
0, & \text{if } \dfrac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1}|0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1}|1)} \ge 1 \\
1, & \text{otherwise}
\end{cases}
\tag{3.54}
$$
for all $y_1^N \in \mathcal{Y}^N$, $\hat{u}_1^{i-1} \in \mathcal{X}^{i-1}$. A block error occurs if $\hat{u}_1^N \ne u_1^N$ or, equivalently, $\hat{u}_{\mathcal{A}} \ne u_{\mathcal{A}}$. The decision functions $\{h_i\}$ can be computed efficiently using recursive formulas, but as a drawback they are not maximum-likelihood (ML) decisions, because the "future" frozen bits ($u_j : j > i$, $j \in \mathcal{A}^c$) are treated as random variables instead of known bits. The belief propagation algorithm can use all the available a-priori information when computing the a-posteriori probabilities, so better decoding performance is expected, at the cost of increased computational complexity.
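The decision rule 3.53 and 3.54 can be illustrated with a deliberately naive decoder. The sketch below evaluates each $W_N^{(i)}$ directly from the definition 3.28, by summing over all values of the future bits, rather than with the efficient recursive formulas mentioned above; its complexity is exponential in N, so it is only meant for very short codes. A BSC with crossover probability p is assumed for W, and all function names are illustrative.

```python
from itertools import product

# Generator matrix G_4 as given in eq. 3.23
G4 = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]


def encode(u, G):
    """x = u G over GF(2)."""
    N = len(G)
    return [sum(u[r] * G[r][c] for r in range(N)) % 2 for c in range(N)]


def bsc_likelihood(y, x, p):
    """W^N(y|x) for N independent uses of a BSC with crossover probability p."""
    prob = 1.0
    for yi, xi in zip(y, x):
        prob *= p if yi != xi else 1.0 - p
    return prob


def split_channel(y, prefix, ui, G, p):
    """W_N^(i)(y, prefix | ui) from eq. 3.28, summing over all future bits."""
    N = len(G)
    total = 0.0
    for tail in product((0, 1), repeat=N - len(prefix) - 1):
        u = list(prefix) + [ui] + list(tail)
        total += bsc_likelihood(y, encode(u, G), p)
    return total / 2 ** (N - 1)


def sc_decode(y, G, info_set, frozen_bits, p):
    """Successive cancellation decoding (eqs. 3.53 and 3.54), 1-based info_set."""
    frozen_iter = iter(frozen_bits)
    u_hat = []
    for i in range(1, len(G) + 1):
        if i not in info_set:
            u_hat.append(next(frozen_iter))  # frozen bit, known to the decoder
        else:
            w0 = split_channel(y, u_hat, 0, G, p)
            w1 = split_channel(y, u_hat, 1, G, p)
            u_hat.append(0 if w0 >= w1 else 1)  # likelihood ratio >= 1 gives 0
    return u_hat


received = [0, 1, 0, 1]  # codeword of the (4, 2, {2, 4}, (0, 0)) example, no errors
print(sc_decode(received, G4, {2, 4}, (0, 0), p=0.1))
```

Comparing `w0 >= w1` instead of forming the ratio avoids a division by zero when the denominator of 3.54 vanishes.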
3.4 Belief propagation in polar codes
Polar codes are similar to Reed-Muller codes, as discussed in [32], so they can be decoded using Belief Propagation (BP) on Forney-style factor graphs, as described in [33] for Reed-Muller codes. An implementation for polar codes was conceptually presented in [34]. It was based on the direct mapping of the factor graph representation (as in Figure 3.9 for codeword length N = 8) into a fully parallel architecture. Each node of the factor graph can be identified by a pair of integers $(i, j)$, $1 \le i \le n + 1$, $1 \le j \le N$, where the polar code length is $N = 2^n$, so the factor graph contains $(n + 1)$ levels. The rightmost nodes $(n + 1, j)$ are associated with the codeword x, which in the decoding scheme is observed through the noisy channel as the output y, while the leftmost nodes $(1, j)$ are associated with the source word u, which is the information to be recovered.
General BP algorithm allows to determine codeword û , by implementing 3.55,
that is the ratio of the a posteriori probabilities of having received a 0 rather than
a 1 in position j and evaluated as in 2.13.
$$
\frac{P(x_j = 0 \mid y)}{P(x_j = 1 \mid y)} = \frac{P(y_j \mid x_j = 0)}{P(y_j \mid x_j = 1)} \cdot \frac{P(x_j = 0)}{P(x_j = 1)}
\tag{3.55}
$$
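The estimate is determined bit by bit in the following way:
$$
\begin{aligned}
\hat{x}_j = 0 \quad &\text{if } P(x_j = 0 \mid y) > P(x_j = 1 \mid y) \;\Rightarrow\; \frac{P(x_j = 0 \mid y)}{P(x_j = 1 \mid y)} > 1 \\
\hat{x}_j = 1 \quad &\text{if } P(x_j = 0 \mid y) < P(x_j = 1 \mid y) \;\Rightarrow\; \frac{P(x_j = 0 \mid y)}{P(x_j = 1 \mid y)} < 1
\end{aligned}
\tag{3.56}
$$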
Figure 3.9: Forney-style graph representation of a polar code of length eight.
The final probability on $\hat{x}_j$ used in 3.56 can be computed by evaluating the probabilities associated with each node $(i, j)$ of the propagation graph, exploiting the recursive structure of polar codes already discussed. It is therefore possible to study the single BP computational block, composed of the coding part plus the channels, which may represent either external or synthetic channels.
Figure 3.10: Basic computational block for BP polar code decoding. The messages used for the evaluation of $\hat{L}_1$ are highlighted.
Messages inside each block are called left-propagating or right-propagating according to the direction in which they cross the block. Figure 3.10 presents the case of the evaluation of the upper left-going probability ratio, called $\hat{L}_1$, given the information available from all possible sources of data (left- and right-propagating messages). The left-hand side of 3.55 can be rewritten as:
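$$
\hat{L}_1 = \frac{P(u = 0 \mid y_1, y_2, \tilde{y}_2)}{P(u = 1 \mid y_1, y_2, \tilde{y}_2)}
= \frac{P_{x_1,x_2,y_1,y_2,\tilde{y}_2}(x_1, x_1 \oplus 0, y_1, y_2, \tilde{y}_2)}{P_{x_1,x_2,y_1,y_2,\tilde{y}_2}(x_1, x_1 \oplus 1, y_1, y_2, \tilde{y}_2)}
$$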
where the support variables $x_1$, $x_2$ have been introduced. By imposing the coding property $u = x_1 \oplus x_2$ and setting $u$ to the corresponding value in the numerator ($u = 0$) and in the denominator ($u = 1$), $x_2$ depends only on the value of $x_1$. Using Bayes' theorem, as in the right-hand side of 3.55, it follows that:
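$$
\begin{aligned}
\hat{L}_1 &= \frac{\left.\sum_{x_1=0}^{1} P(x_1) \cdot P(x_2) \cdot P(y_2 \mid x_2) \cdot P(\tilde{y}_2 \mid x_2) \cdot P(y_1 \mid x_1)\right|_{x_2 = x_1}}{\left.\sum_{x_1=0}^{1} P(x_1) \cdot P(x_2) \cdot P(y_2 \mid x_2) \cdot P(\tilde{y}_2 \mid x_2) \cdot P(y_1 \mid x_1)\right|_{x_2 = x_1 \oplus 1}} = \\
&= \frac{\sum_{x_1=0}^{1} P_{x_1}(x_1) \cdot P_{x_2}(x_1) \cdot P(y_2 \mid x_1) \cdot P(\tilde{y}_2 \mid x_1) \cdot P(y_1 \mid x_1)}{\sum_{x_1=0}^{1} P_{x_1}(x_1) \cdot P_{x_2}(x_1 \oplus 1) \cdot P(y_2 \mid x_1 \oplus 1) \cdot P(\tilde{y}_2 \mid x_1 \oplus 1) \cdot P(y_1 \mid x_1)}
\end{aligned}
$$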
Then, by writing out the last equation explicitly, assuming equiprobable information data $x_1$ and $x_2$ and performing the algebraic manipulations reported below, result 3.57 is obtained.
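$$
\begin{aligned}
\hat{L}_1 &= \frac{P_{x_1}(0)P_{x_2}(0)P(y_2|0)P(\tilde{y}_2|0)P(y_1|0) + P_{x_1}(1)P_{x_2}(1)P(y_2|1)P(\tilde{y}_2|1)P(y_1|1)}{P_{x_1}(0)P_{x_2}(1)P(y_2|1)P(\tilde{y}_2|1)P(y_1|0) + P_{x_1}(1)P_{x_2}(0)P(y_2|0)P(\tilde{y}_2|0)P(y_1|1)} = \\
&= \frac{\dfrac{P_{x_1}(0)}{P_{x_1}(1)}\dfrac{P_{x_2}(0)}{P_{x_2}(1)}\dfrac{P(y_2|0)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)}\dfrac{P(y_1|0)}{P(y_1|1)} + 1}{\dfrac{P_{x_1}(0)}{P_{x_1}(1)}\dfrac{P_{x_2}(1)}{P_{x_2}(1)}\dfrac{P(y_2|1)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|1)}{P(\tilde{y}_2|1)}\dfrac{P(y_1|0)}{P(y_1|1)} + \dfrac{P_{x_1}(1)}{P_{x_1}(1)}\dfrac{P_{x_2}(0)}{P_{x_2}(1)}\dfrac{P(y_2|0)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)}\dfrac{P(y_1|1)}{P(y_1|1)}} = \\
&= \frac{\dfrac{P(y_2|0)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)}\dfrac{P(y_1|0)}{P(y_1|1)} + 1}{\dfrac{P(y_1|0)}{P(y_1|1)} + \dfrac{P(y_2|0)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)}}
\end{aligned}
$$
Observing that $L_1 = \frac{P(y_1|0)}{P(y_1|1)}$, $L_2 = \frac{P(y_2|0)}{P(y_2|1)}$ and $\tilde{R}_2 = \frac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)}$ we get:
$$
\hat{L}_1 = \frac{L_2 \tilde{R}_2 L_1 + 1}{L_1 + \tilde{R}_2 L_2}
\tag{3.57}
$$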
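As a sanity check of 3.57, the closed form can be compared with a direct marginalization over the support variables. The sketch below is only illustrative: the function names and the numerical likelihood values are hypothetical, and equiprobable $x_1$, $x_2$ are assumed as in the derivation.

```python
import itertools

def lhat1_closed_form(l1, l2, r2_tilde):
    """Upper left-going likelihood ratio, equation 3.57."""
    return (l2 * r2_tilde * l1 + 1.0) / (l1 + r2_tilde * l2)

def lhat1_brute_force(p_y1, p_y2, p_y2_tilde):
    """Marginalize over (x1, x2) with u = x1 XOR x2 and uniform priors.

    Each argument is a pair (P(obs | 0), P(obs | 1)) for one observation.
    """
    score = [0.0, 0.0]  # score[u] is proportional to P(u | y1, y2, y2~)
    for x1, x2 in itertools.product((0, 1), repeat=2):
        u = x1 ^ x2
        score[u] += p_y1[x1] * p_y2[x2] * p_y2_tilde[x2]
    return score[0] / score[1]

# Hypothetical likelihood pairs, chosen only for illustration.
p_y1, p_y2, p_y2_tilde = (0.7, 0.2), (0.1, 0.6), (0.4, 0.3)
l1 = p_y1[0] / p_y1[1]
l2 = p_y2[0] / p_y2[1]
r2_tilde = p_y2_tilde[0] / p_y2_tilde[1]

closed = lhat1_closed_form(l1, l2, r2_tilde)
brute = lhat1_brute_force(p_y1, p_y2, p_y2_tilde)
print(closed, brute)
```

Both values coincide up to rounding, since the two sums in the brute-force version are exactly the numerator and denominator of the derivation once the common prior factors cancel.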
Figure 3.11: Basic computational block for BP polar code decoding. The messages used for the evaluation of $\hat{L}_2$ are highlighted.
With a similar process, and with reference to Figure 3.11, it is possible to compute the lower left-propagating message $\hat{L}_2$ as
$$
\hat{L}_2 = \frac{P(\tilde{y}_2 = 0 \mid u, y_1, y_2)}{P(\tilde{y}_2 = 1 \mid u, y_1, y_2)}
= \frac{P_{x_1,x_2,y_1,y_2,u}(x_1, u = x_1, y_1, y_2, u = x_1 \oplus x_2)}{P_{x_1,x_2,y_1,y_2,u}(x_1, u = x_1 \oplus 1, y_1, y_2, u = x_1 \oplus x_2)}
$$
where the variable $x_2$ is set equal to 0 in the numerator and equal to 1 in the denominator.
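$$
\begin{aligned}
\hat{L}_2 &= \frac{\sum_{x_1=0}^{1} P(x_1) \cdot P(y_1|x_1) \cdot P(u|x_1) \cdot P(y_2|0) \cdot P_{x_2}(0)}{\sum_{x_1=0}^{1} P(x_1) \cdot P(y_1|x_1) \cdot P(u|x_1 \oplus 1) \cdot P(y_2|1) \cdot P_{x_2}(1)} = \\
&= \frac{P_{x_1}(0)P(y_1|0)P(u|0)P(y_2|0)P_{x_2}(0) + P_{x_1}(1)P(y_1|1)P(u|1)P(y_2|0)P_{x_2}(0)}{P_{x_1}(0)P(y_1|0)P(u|1)P(y_2|1)P_{x_2}(1) + P_{x_1}(1)P(y_1|1)P(u|0)P(y_2|1)P_{x_2}(1)} = \\
&= \frac{\dfrac{P_{x_1}(0)}{P_{x_1}(1)}\dfrac{P(y_1|0)}{P(y_1|1)}\dfrac{P(u|0)}{P(u|1)}\dfrac{P(y_2|0)}{P(y_2|0)}\dfrac{P_{x_2}(0)}{P_{x_2}(0)} + 1}{\dfrac{P_{x_1}(0)}{P_{x_1}(1)}\dfrac{P(y_1|0)}{P(y_1|1)}\dfrac{P(u|1)}{P(u|1)}\dfrac{P(y_2|1)}{P(y_2|0)}\dfrac{P_{x_2}(1)}{P_{x_2}(0)} + \dfrac{P_{x_1}(1)}{P_{x_1}(1)}\dfrac{P(y_1|1)}{P(y_1|1)}\dfrac{P(u|0)}{P(u|1)}\dfrac{P(y_2|1)}{P(y_2|0)}\dfrac{P_{x_2}(1)}{P_{x_2}(0)}} = \\
&= \frac{\dfrac{P(y_1|0)}{P(y_1|1)}\dfrac{P(u|0)}{P(u|1)} + 1}{\dfrac{P(y_1|0)}{P(y_1|1)}\dfrac{P(y_2|1)}{P(y_2|0)} + \dfrac{P(u|0)}{P(u|1)}\dfrac{P(y_2|1)}{P(y_2|0)}}
\end{aligned}
$$
Defining $R_1 = \frac{P(u|0)}{P(u|1)}$, we get:
$$
\hat{L}_2 = \frac{L_1 R_1 + 1}{L_1 \dfrac{1}{L_2} + R_1 \dfrac{1}{L_2}} = L_2 \frac{R_1 L_1 + 1}{L_1 + R_1}
\tag{3.58}
$$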
So the result for the $\hat{L}_2$ probability ratio is given by 3.58.
Comparing Figure 3.11 with Figure 3.12, it is easy to note that, by considering $P(\tilde{y}_2|x_2)$ instead of $P(y_2|x_2)$, the equations remain the same. It is therefore possible to substitute $L_2$ with $\tilde{R}_2$ to obtain result 3.59.
$$
\hat{L}_2 = L_2 \frac{R_1 L_1 + 1}{L_1 + R_1}
\;\Rightarrow\;
\hat{R}_2 = \tilde{R}_2 \frac{R_1 L_1 + 1}{L_1 + R_1}
\tag{3.59}
$$
Figure 3.12: Basic computational block for BP polar code decoding. The messages used for the evaluation of $\hat{R}_2$ are highlighted.
Figure 3.13: Basic computational block for BP polar code decoding. The messages used for the evaluation of $\hat{R}_1$ are highlighted.
The last result, for the upper right-propagating message $\hat{R}_1$ with reference to Figure 3.13, can be computed similarly to 3.57 as
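$$
\begin{aligned}
\hat{R}_1 &= \frac{P(y_1 = 0 \mid u, y_2, \tilde{y}_2)}{P(y_1 = 1 \mid u, y_2, \tilde{y}_2)}
= \frac{P_{x_1,x_2,y_2,\tilde{y}_2,u}(0, x_2, y_2, \tilde{y}_2, x_2)}{P_{x_1,x_2,y_2,\tilde{y}_2,u}(1, x_2, y_2, \tilde{y}_2, x_2 \oplus 1)} = \\
&= \frac{\sum_{x_2=0}^{1} P_{x_1}(0) \cdot P(x_2) \cdot P(y_2|x_2) \cdot P(\tilde{y}_2|x_2) \cdot P(u|x_2)}{\sum_{x_2=0}^{1} P_{x_1}(1) \cdot P(x_2) \cdot P(y_2|x_2) \cdot P(\tilde{y}_2|x_2) \cdot P(u|x_2 \oplus 1)} = \\
&= \frac{P_{x_1}(0)P_{x_2}(0)P(y_2|0)P(\tilde{y}_2|0)P(u|0) + P_{x_1}(0)P_{x_2}(1)P(y_2|1)P(\tilde{y}_2|1)P(u|1)}{P_{x_1}(1)P_{x_2}(0)P(y_2|0)P(\tilde{y}_2|0)P(u|1) + P_{x_1}(1)P_{x_2}(1)P(y_2|1)P(\tilde{y}_2|1)P(u|0)} = \\
&= \frac{\dfrac{P_{x_1}(0)}{P_{x_1}(0)}\dfrac{P_{x_2}(0)}{P_{x_2}(1)}\dfrac{P(y_2|0)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)}\dfrac{P(u|0)}{P(u|1)} + 1}{\dfrac{P_{x_1}(1)}{P_{x_1}(0)}\dfrac{P_{x_2}(0)}{P_{x_2}(1)}\dfrac{P(y_2|0)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)}\dfrac{P(u|1)}{P(u|1)} + \dfrac{P_{x_1}(1)}{P_{x_1}(0)}\dfrac{P_{x_2}(1)}{P_{x_2}(1)}\dfrac{P(y_2|1)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|1)}{P(\tilde{y}_2|1)}\dfrac{P(u|0)}{P(u|1)}} = \\
&= \frac{\dfrac{P(y_2|0)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)}\dfrac{P(u|0)}{P(u|1)} + 1}{\dfrac{P(y_2|0)}{P(y_2|1)}\dfrac{P(\tilde{y}_2|0)}{P(\tilde{y}_2|1)} + \dfrac{P(u|0)}{P(u|1)}}
\end{aligned}
$$
$$
\hat{R}_1 = \frac{\tilde{R}_2 L_2 R_1 + 1}{\tilde{R}_2 L_2 + R_1}
\tag{3.60}
$$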
A logarithmic representation converts multiplications into sums; in order to simplify the final equations to be implemented, LR messages are therefore converted into log-likelihood ratio (LLR) messages according to 3.61.
$$
\ell_d \triangleq \ln L_d, \qquad r_d \triangleq \ln R_d
\tag{3.61}
$$
where $d$ is the index of the considered message. With reference to the general graph representation (e.g. Figure 3.9), equations 3.57, 3.58, 3.60 and 3.59 can be respectively written as
$$
\ell^{(t+1)}_{i,j} = f\left(\ell^{(t)}_{i+1,j},\; \ell^{(t)}_{i+1,j+N_i} + r^{(t)}_{i,j+N_i}\right)
\tag{3.62}
$$
$$
\ell^{(t+1)}_{i,j+N_i} = \ell^{(t)}_{i+1,j+N_i} + f\left(\ell^{(t)}_{i+1,j},\; r^{(t)}_{i,j}\right)
\tag{3.63}
$$
$$
r^{(t+1)}_{i+1,j} = f\left(r^{(t)}_{i,j},\; \ell^{(t)}_{i+1,j+N_i} + r^{(t)}_{i,j+N_i}\right)
\tag{3.64}
$$
$$
r^{(t+1)}_{i+1,j+N_i} = r^{(t)}_{i,j+N_i} + f\left(r^{(t)}_{i,j},\; \ell^{(t)}_{i+1,j}\right)
\tag{3.65}
$$
where $\ell^{(t)}_{i,j}$ is the log-likelihood ratio (LLR) of the left-propagating message of node $(i, j)$ at time (or, equivalently, iteration) index $t = 0, 1, \ldots$, and $r^{(t)}_{i,j}$ is the LLR of the right-propagating message of the same node. These equations are implemented in a processing element (PE), which performs the iterative update of the LLR messages. The function $f$ is given by
$$
f(x, y) = \ln\left[\frac{1 + e^{x+y}}{e^{x} + e^{y}}\right]
\tag{3.66}
$$
for any two LLRs $x$, $y$. The function $f$ can be replaced by the min-sum approximation presented in Section 3.8. The LLR messages are initialized as
$$
\ell^{(0)}_{n+1,j} = \ln\left(\frac{P(y_j \mid x_j = 0)}{P(y_j \mid x_j = 1)}\right) = \ln\left(\frac{W(y_j \mid 0)}{W(y_j \mid 1)}\right)
\tag{3.67}
$$
$$
r^{(0)}_{1,j} =
\begin{cases}
0 & \text{if } j \text{ is a data index} \\
\infty & \text{if } j \text{ is a frozen-bit index}
\end{cases}
\tag{3.68}
$$
Setting $r^{(0)}_{1,j} = \infty$ for a frozen bit means that the associated probability of being a zero is 1 and, consequently, the probability of being a one goes to 0, while setting it to 0 corresponds (in the logarithmic domain) to having equal probability of a zero or a one in position $j$. Every other $r^{(0)}_{i,j}$ and $\ell^{(0)}_{i,j}$ is equal to 0 at iteration $t = 0$.
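The function $f$ of 3.66 and the initialization of 3.67 and 3.68 can be sketched as follows. This is a minimal illustration under stated assumptions: the BPSK mapping 0 to +1 and 1 to -1 with noise variance `sigma2` (so that the channel LLR is $2y_j/\sigma^2$), the 0-based indexing, the use of a large finite constant `FROZEN_LLR` in place of $\infty$, and all function names are choices made here, not taken from the thesis software.

```python
import math

FROZEN_LLR = 1.0e6  # assumption: large finite stand-in for the infinite LLR of 3.68

def f(x, y):
    """Equation 3.66, ln((1 + e^(x+y)) / (e^x + e^y)), in an overflow-safe form."""
    return (max(0.0, x + y) - max(x, y)
            + math.log1p(math.exp(-abs(x + y)))
            - math.log1p(math.exp(-abs(x - y))))

def f_direct(x, y):
    """Literal transcription of 3.66; usable only for small |x| and |y|."""
    return math.log((1.0 + math.exp(x + y)) / (math.exp(x) + math.exp(y)))

def init_left(y, sigma2):
    """Channel LLRs of 3.67, assuming BPSK (0 -> +1, 1 -> -1) over AWGN."""
    return [2.0 * yj / sigma2 for yj in y]

def init_right(n_bits, frozen):
    """Source-side LLRs of 3.68; `frozen` holds 0-based frozen-bit indices."""
    return [FROZEN_LLR if j in frozen else 0.0 for j in range(n_bits)]

# f applied to LLRs matches the logarithm of the likelihood-ratio form (ab + 1) / (a + b).
a, b = 2.0, 3.0
print(f(math.log(a), math.log(b)), math.log((a * b + 1.0) / (a + b)))
print(init_right(4, {0, 1}), init_left([1.0, -0.5], 0.5))
```

The rearranged form of `f` uses the identity 3.73 on both logarithms, which avoids evaluating large exponentials; with a known (frozen) input it simply returns the other argument, and with a zero input it returns zero.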
As previously presented, every PE has four connection nodes, two on each side of the same level; for a general polar decoding graph of length $N$, the direct-mapping implementation is therefore composed of $\frac{1}{2} N \log_2 N$ PEs.
The decision on the final result at the end of iteration $t$ is taken, for every variable node $(i, j)$, by computing $\ell^{(t)}_{i,j} + r^{(t)}_{i,j}$; for practical BP decoding, however, only the first level needs to be analyzed in order to obtain the estimated uncoded vector $\hat{u}$, as in 3.69
$$
\begin{aligned}
\hat{u}_j = 0 \quad &\text{if } \ell^{(t)}_{1,j} + r^{(t)}_{1,j} \ge 0 \\
\hat{u}_j = 1 \quad &\text{if } \ell^{(t)}_{1,j} + r^{(t)}_{1,j} < 0
\end{aligned}
\tag{3.69}
$$
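Equations 3.62 to 3.69 can be put together in a compact software sketch of the decoder. The version below is only an illustration and not the thesis C model: it assumes 0-based indices, $N_i = N/2^i$ as the distance between the two nodes of a PE at stage $i$, a finite `FROZEN_LLR` in place of $\infty$, and an update of all stages in parallel from the messages of the previous iteration. The frozen set and the test vector are hypothetical.

```python
import math

FROZEN_LLR = 1.0e6  # assumption: large finite stand-in for the infinite LLR of 3.68

def f(x, y):
    """Equation 3.66 in an overflow-safe, algebraically equivalent form."""
    return (max(0.0, x + y) - max(x, y)
            + math.log1p(math.exp(-abs(x + y)))
            - math.log1p(math.exp(-abs(x - y))))

def polar_encode(u):
    """Staged encoder: at each stage the upper node becomes upper XOR lower."""
    x = list(u)
    d = len(x) // 2          # distance between paired nodes, N/2 for the first stage
    while d >= 1:
        for j in range(len(x)):
            if not j & d:    # j is the upper node of a pair
                x[j] ^= x[j + d]
        d //= 2
    return x

def bp_decode(channel_llr, frozen, iterations):
    """BP decoding with all stages updated in parallel at every iteration."""
    N = len(channel_llr)
    n = N.bit_length() - 1
    L = [[0.0] * N for _ in range(n + 1)]   # left-propagating LLRs, levels 0..n
    R = [[0.0] * N for _ in range(n + 1)]   # right-propagating LLRs, levels 0..n
    L[n] = list(channel_llr)                                         # 3.67
    R[0] = [FROZEN_LLR if j in frozen else 0.0 for j in range(N)]    # 3.68
    for _ in range(iterations):
        new_L = [row[:] for row in L]
        new_R = [row[:] for row in R]
        for s in range(n):                  # stage s joins level s and level s + 1
            d = N >> (s + 1)
            for j in range(N):
                if j & d:
                    continue                # visit each PE once, from its upper node
                new_L[s][j] = f(L[s + 1][j], L[s + 1][j + d] + R[s][j + d])      # 3.62
                new_L[s][j + d] = L[s + 1][j + d] + f(L[s + 1][j], R[s][j])      # 3.63
                new_R[s + 1][j] = f(R[s][j], L[s + 1][j + d] + R[s][j + d])      # 3.64
                new_R[s + 1][j + d] = R[s][j + d] + f(R[s][j], L[s + 1][j])      # 3.65
        L, R = new_L, new_R
    return [0 if L[0][j] + R[0][j] >= 0 else 1 for j in range(N)]               # 3.69

frozen = {0, 1, 2, 4}                      # hypothetical frozen set for N = 8
u = [0, 0, 0, 1, 0, 1, 1, 0]               # frozen positions carry zeros
x = polar_encode(u)
llr = [8.0 if bit == 0 else -8.0 for bit in x]   # noiseless, reliable channel LLRs
u_hat = bp_decode(llr, frozen, iterations=5)
print(u_hat == u)
```

With noiseless inputs every message either is zero or carries the sign of the true bit, so the example only checks that the indexing of the four update rules is consistent with the encoder; it says nothing about error-rate performance.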
3.5 Belief propagation scheduling
The scheduling of a belief propagation algorithm is the order in which messages are propagated in the graph. For a code without cycles, the scheduling does not affect the convergence of the algorithm. For practical codes with cycles, the order in which processing elements are activated and process messages (which is what scheduling refers to) affects the convergence, due to the presence of cycles in the graph. In the literature the scheduling problem has not been addressed systematically for polar codes. A brief discussion of this topic is given in [35], where Hussami et al. state that for a general BDMC the chosen scheduling directly affects decoding performance, and only present the scheduling they used, which was found empirically (considered later in this work as scheduling C). This scheduling is also used in [4], where the authors state that it leads to better performance than the other schedules they tested.
Another scheduling is implicitly considered in [36]; it is described below as scheduling G, and it allows a significant reduction of the implemented hardware resources and, as a consequence, an improvement of the critical path delay.
Figure 3.14: Bidirectional scheduling algorithms: A) Linear LR. B) Linear RL. C) Circular LR. D) Circular RL.
In order to describe the different scheduling algorithms for BP polar decoding, the general graph structure has been considered, where every level (column of PEs) is treated as a single block that can evaluate left or right LLR messages. Eleven BP schedulings have been considered; they are identified by a capital letter and an intuitive name as follows:
A Linear bidirectional Left to Right. Every iteration starts from the left-most stage and ends at the right-most one; LLR updates are bidirectional.
B Linear bidirectional Right to Left. Every iteration starts from the right-most stage and ends at the left-most one; LLR updates are bidirectional.
C Circular bidirectional Left to Right. One iteration is composed of bidirectional updates of each stage from left-most to right-most and back from right-most to left-most.
D Circular bidirectional Right to Left. Bidirectional updates in one iteration are performed from right-most to left-most and back from left-most to right-most.
E Linear bidirectional Left to Right, Right start. The initial iteration is performed as in scheduling B, then the following iterations are as in scheduling A.
F Linear bidirectional Right to Left, Left start. The initial iteration is performed as in scheduling A, then the following iterations are as in scheduling B.
G Circular unidirectional Left to Right. During one iteration stages are updated as in scheduling C, but only right messages are propagated when moving from left-most to right-most, and only left messages are updated when moving from right-most to left-most.
H Circular unidirectional Right to Left. During one iteration stages are updated as in scheduling D, but only left messages are propagated when moving from right-most to left-most, and only right messages are updated when moving from left-most to right-most.
I Circular double-wave. Both schedulings G and H are executed in parallel.
J All-on. All stages are bidirectionally updated in parallel at every iteration.
K Odd-Even. During one iteration only odd-numbered stages are enabled, then in the following step only even-numbered stages are updated. LLR updates are bidirectional.
Figure 3.15: Bidirectional scheduling algorithms: E) Linear LR, start R. F) Linear RL, start L. Unidirectional scheduling algorithms: G) Circular LR. H) Circular RL.
Figure 3.16: Scheduling algorithms: I) Circular double-wave. J) All-on. K) Odd-Even.
All the presented scheduling algorithms (SAs) are depicted in Figures 3.14, 3.15 and 3.16 for the case of a 3-stage decoder (equivalent to a code block length of N = 8), together with their respective flow diagrams for better comprehension. Every stage $i$ may update $\ell^{(t)}_{i,j}$ or $r^{(t)}_{i+1,j}$ for every $1 \le j \le N$ according to equations 3.62 to 3.65 for a given $t$. In the presented flow graphs, $\ell_{i,j}$ computations are indicated by iL and, similarly, $r_{i+1,j}$ updates by iR.
A scheduling is called bidirectional if, in the same operation, the entire column of PEs composing one stage outputs both left and right messages at once. On the other hand, if in the same state only left or only right updates are carried out, the scheduling is called unidirectional. With reference to Figures 3.14 to 3.16, an SA can be easily categorized by observing its flow chart: if both iL and iR appear inside the same state, the scheduling is bidirectional, while if only one of them appears it is unidirectional.
The minimum sequence of different operations that has to be performed is called an iteration. Figures 3.14 to 3.16 also show a sample sequence of activated stages for each scheduling. The red numbers identify the initial stage labels of every iteration.
The specific target is to find the best scheduling for high-throughput BP decoding, where all LLRs are computed in parallel. In fact, it is easy to see from equations 3.62, 3.63, 3.64 and 3.65 that there is no dependence among the inputs within a single processing stage. The maximum delay of the PEs is therefore guaranteed to be the smallest possible by exploiting the intrinsic hardware parallelism. This solution thus avoids assigning an order of computation to the LLRs, as done in [35] and [4], and follows the classical belief propagation philosophy of "flooding", meaning that the messages from the same node are all updated together, so inside all PEs all messages are computed in parallel.
In order to compare the SAs fairly, the number of iterations is not a good parameter, because the number of operations per iteration changes from one scheduling to another; for an equal number of iterations, the delay needed to obtain the output results, which in a practical implementation means time delay, would not be taken into account. The choice of the scheduling has to point out the best solution in terms of convergence of the algorithm compared to the effort required to obtain it. To quantify the effort, the number of states of the considered scheduling and the number of parallel operations for each state have been considered. Table 3.1 presents the general result for each schedule in terms of number of states (S), parallel operations (PO) and the derived parameter operation frequency $f_{OP} = \frac{1}{S \cdot PO}$, which describes the "weight" of a single operation compared to a full iteration of the scheduling.
Scheduling   # of states S   Parallel operations PO   Operation frequency f_OP
A, E         n               2                        1/(2n)
B, F         n               2                        1/(2n)
C            2n − 2          2                        1/(4(n − 1))
D            2n − 2          2                        1/(4(n − 1))
G            2n − 2          1                        1/(2(n − 1))
H            2n − 2          1                        1/(2(n − 1))
I            n − 1           2                        1/(2(n − 1))
J            1               n                        1/n
K            2               n/2                      1/n
Table 3.1: Scheduling properties for codeword length $N = 2^n$.
Nevertheless, the number of iterations is still needed in order to run simulations on different codes. A relatively small, fixed maximum number of operations has therefore been assigned, from which comparable numbers of iterations are derived for the different SAs presented. This total number of operations has been chosen iteratively, so that at least one algorithm behaves well and the performance of the different SAs can be clearly distinguished. From the implementation perspective, the total number of operations can be thought of as the number of clock cycles allowed for the final architecture, so the number of states S becomes the number of clock periods per iteration, and the product of the number of operations and the operation frequency becomes the number of iterations of the architecture.
The simulated performance of the different SAs for a code length of $N = 2^{10}$ is presented in Figure 3.17. The numbers of iterations used for 200 operations are: A, B, E, F: 10; G, H, I: 11; J, K: 20. In the figure, the (A, E) and (B, F) SAs have been grouped because their results are superposed, while C and D are not drawn separately because, for the same number of iterations, their performance is equal to that of G and H respectively, as proven in [37]. Note that the Successive Cancellation (SC) decoding curve lies below the other curves, which indicates that, by increasing the number of iterations, these curves would get closer to it and even cross it. The min-sum approximation (discussed in Section 3.8) has been used for both the BP SAs and SC decoding, in order to also take into account the effects of the approximation on the scheduling algorithms.
Figure 3.17: Scheduling algorithms comparison for N = 1024, rate 1/2. [Plot of FER versus Eb/N0 (dB), from 1 to 4 dB, FER from 10^-4 to 10^0; curves: SC, (A, E), (B, F), (C, G), (D, H), K, J, I.]
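The iteration counts quoted above follow directly from Table 3.1. The short sketch below reproduces them for $n = 10$ and a budget of 200 operations; the function names are hypothetical and rounding down to whole iterations is an assumption that matches the quoted figures.

```python
import math
from fractions import Fraction

def table_3_1(n):
    """States S and parallel operations PO of each scheduling, as in Table 3.1."""
    return {
        "A": (n, 2), "B": (n, 2), "E": (n, 2), "F": (n, 2),
        "C": (2 * n - 2, 2), "D": (2 * n - 2, 2),
        "G": (2 * n - 2, 1), "H": (2 * n - 2, 1),
        "I": (n - 1, 2),
        "J": (1, n),
        "K": (2, Fraction(n, 2)),
    }

def operation_frequency(states, parallel_ops):
    """f_OP = 1 / (S * PO), kept exact with rational arithmetic."""
    return Fraction(1) / (states * parallel_ops)

def iterations_for_budget(total_operations, n):
    """Whole iterations per scheduling: floor(total_operations * f_OP)."""
    return {name: math.floor(total_operations * operation_frequency(s, po))
            for name, (s, po) in table_3_1(n).items()}

budget = iterations_for_budget(200, 10)
for name in sorted(budget):
    print(name, budget[name])
```

For example, scheduling G has $S = 18$ and $PO = 1$, so 200 operations correspond to $200/18 \approx 11.1$, that is 11 complete iterations.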
The most efficient algorithms appear to be G and H, where only one operation per state is used. Another interesting schedule is I: if the number of parallel operations (which in the implementation becomes the number of PE stages) is left unconstrained, the same performance can be achieved in half the iterations. The number of parallel operations is clearly related to the area used in the implementation; for the other SAs the efficiency therefore drops, and comparing the numbers of parallel operations makes it clear that a better throughput-to-area ratio could be achieved just by replicating a simpler architecture based on one of the schedulings analyzed above.
3.6 Uniform Belief Propagation Decoder Structure
The factor graph for polar coding, as represented in Figure 3.9, could be implemented as is in encoder and decoder architectures (either software or hardware). However, the non-uniform structure of the graph from one level to the next makes it more difficult to reuse processing modules. A more uniform architecture can be obtained for the graph by introducing an operator which reorders the indices of the input vector [34].
Figure 3.18: Uniform graph representation with shuffle operators for $F^{\otimes 3}$.
An equivalent graph architecture can be obtained by introducing a shuffle operator $S$, which maps an input vector $v_1^N$ of length $N = 2^n$ into the vector $(v_1, v_{\frac{N}{2}+1}, v_2, v_{\frac{N}{2}+2}, \ldots, v_i, v_{\frac{N}{2}+i}, \ldots, v_{\frac{N}{2}}, v_N)$, where $1 \le i \le \frac{N}{2}$.
The block $\oplus$ is the modulo-2 addition operation that transforms the vector $v_1^N$ into $(v_1 \oplus v_2, v_2, v_3 \oplus v_4, v_4, \ldots, v_{2i-1} \oplus v_{2i}, v_{2i}, \ldots, v_{N-1} \oplus v_N, v_N)$, where $1 \le i \le \frac{N}{2}$. This $\oplus$ block is equivalent to the last level of Figure 3.9. The new uniform structure graph is presented in Figure 3.18. Note that the uniform structure is end-to-end equivalent to the non-uniform case and only the internal interconnections have been rearranged. The number of introduced $S$ and $\oplus$ blocks is equal to $n$.
Figure 3.19: Uniform graph representation with reverse-shuffle operators for $F^{\otimes 3}$.
Another possible uniform graph representation is given in Figure 3.19, where $R$ is the reverse-shuffle operator that transforms a vector $v_1^N$ of even length $N$ into $(v_1, v_3, \ldots, v_{i_{odd}}, \ldots, v_{N-1}, v_2, v_4, \ldots, v_{i_{even}}, \ldots, v_N)$, for $1 \le i \le N$. This graph implements the inverse of the previous uniform structure; since the inverse of the generator matrix of polar codes is the matrix itself, this realization is also correct.
These two uniform realizations have been presented because of the particularly simple structure of the $\oplus$ block; nevertheless, there are many other uniform factor graphs.
Hardware implementations can benefit from these uniform realizations because the
same block can be reused without changing interconnections.
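The operators $S$, $R$ and $\oplus$ defined above can be sketched on plain lists as follows. The function names are hypothetical, and the composition used in `uniform_encode` (each of the $n$ stages applies $S$ followed by $\oplus$) is an assumption about the stage layout of Figure 3.18; the staged encoder `butterfly_encode` is included only as a reference for comparison.

```python
from itertools import product

def shuffle(v):
    """S: (v1, v_{N/2+1}, v2, v_{N/2+2}, ..., v_{N/2}, v_N)."""
    half = len(v) // 2
    out = []
    for first, second in zip(v[:half], v[half:]):
        out += [first, second]
    return out

def reverse_shuffle(v):
    """R: entries in odd positions (1-based) first, then those in even positions."""
    return v[0::2] + v[1::2]

def xor_block(v):
    """The modulo-2 block: (v1 ^ v2, v2, v3 ^ v4, v4, ...)."""
    out = list(v)
    for k in range(0, len(out), 2):
        out[k] ^= out[k + 1]
    return out

def uniform_encode(u):
    """n identical stages, each a shuffle followed by the modulo-2 block."""
    x = list(u)
    for _ in range(len(u).bit_length() - 1):
        x = xor_block(shuffle(x))
    return x

def butterfly_encode(u):
    """Non-uniform staged encoder with pair distances N/2, N/4, ..., 1."""
    x = list(u)
    d = len(x) // 2
    while d >= 1:
        for j in range(len(x)):
            if not j & d:
                x[j] ^= x[j + d]
        d //= 2
    return x

v = [1, 2, 3, 4, 5, 6, 7, 8]
print(shuffle(v), reverse_shuffle(shuffle(v)))
same = all(uniform_encode(list(u)) == butterfly_encode(list(u))
           for u in product((0, 1), repeat=8))
print(same)
```

The shuffle rotates the binary representation of each index by one position, so after $n$ stages every index bit has been paired exactly once and the vector is back in natural order, which is why the same block can be reused at every stage.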
3.7 Graph Representation: Redundant Trellises
Figure 3.20: Example of three redundant trellises, (a), (b) and (c).
As observed in [35], the generator matrix of polar codes can be correctly described by many redundant representations. In Figure 3.20 three valid representations are given for a code block length N = 8. For a generic code block length $N = 2^n$, the number of possible different representations is $n!$. The decoding performance of these overcomplete representations differs even when all other conditions are equal.
Since the problem was open, it was decided to systematically investigate the properties of redundant trellises for a code size of practical use. $N = 2^{10}$ was selected, which leads to $10! \approx 3.6 \cdot 10^6$ different structures. To perform this task, a modified version of the simulation C code was used. Starting from the known architecture as defined in the literature [34], a number was assigned to every stage (P 0123456789); then all possible permutations of that sequence were used to represent the different interconnections.
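The enumeration of the candidate structures can be illustrated with a few lines of code. This is a sketch under assumptions: the label format follows the "P 0123456789" notation used above, but the mapping between a digit and a stage (digit $k$ is taken here as the stage whose paired nodes are $N/2^{k+1}$ positions apart) and the function names are hypothetical.

```python
import itertools
import math

def trellis_labels(n):
    """Yield one label per ordering of the n stages, e.g. 'P 012' for n = 3."""
    for order in itertools.permutations(range(n)):
        yield "P " + "".join(str(stage) for stage in order)

def encode_with_order(u, order):
    """Apply the n encoding stages in the given order (hypothetical digit mapping)."""
    x = list(u)
    for stage in order:
        d = len(x) >> (stage + 1)
        for j in range(len(x)):
            if not j & d:
                x[j] ^= x[j + d]
    return x

labels = list(trellis_labels(3))
print(len(labels), labels[0], math.factorial(10))

u = [1, 0, 1, 1, 0, 0, 1, 0]
codewords = {tuple(encode_with_order(u, order))
             for order in itertools.permutations(range(3))}
print(len(codewords))
```

All orderings produce the same codeword because the stages act on different index bits and therefore commute; the representations differ only in how messages flow during iterative decoding, which is what the search described here evaluates.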
The list of structures under test has been implemented as a linked list, so that bad candidates can be eliminated and only good structures are left for further testing. In fact, due to the large number of candidates, simply traversing a list and checking a flag (good or bad candidate) would require too much time.
The optimization search has been conducted with the D scheduling algorithm for BP decoding, with 15 iterations at an SNR of 1 dB, so that few frames are needed while FER variations can still be observed. The results show that the classical structure is among the 6 best trellises. The top 10 results have also been studied from 0 to 4 dB; their FER curves are so close that they can be considered practically superposed. In Figure 3.21 the best architecture found (P 0124356789) is compared with the classical implementation.
Figure 3.21: Comparison of the FER performance of two redundant graph representations for 15 iterations. [Plot of FER versus Eb/N0 (dB), from 0 to 4 dB, FER from 10^-5 to 10^0; curves: SC, P 0123456789, P 0124356789.]
Since the classical architecture has the useful property of being a uniform structure, choosing a different trellis structure has proven not to be rewarding.
3.8 Min-Sum Approximation
The min-sum approximation was introduced to simplify the evaluation of the function $\Psi(x) = \ln \tanh\frac{x}{2} = \ln\frac{e^{x}-1}{e^{x}+1}$ in the implementation of Belief Propagation for LDPC codes. With the finite precision of a hardware implementation this function shows numerical stability problems, so relaxing the problem (with small losses, < 0.5 dB) is acceptable, considering that losses would occur in any case. Other solutions, different from the one presented here, are based on look-up tables or piecewise linear approximations of the function $\Psi$, but in every case the problem of numerical stability remains.
This simplification first appeared in [38] under the name λ-Min algorithm. It represents a good compromise between achieved performance and implementation complexity. The use of this solution for polar codes has been proposed in [3].
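Let us consider the magnitude of the probability ratio (in the logarithmic domain) for the generic parity check $\varphi$, given by expression 3.70.
$$
\Lambda_\varphi = 2\tanh^{-1}\left(\prod_{i=1}^{n} \tanh\left(\frac{|\Lambda_i|}{2}\right)\right)
\tag{3.70}
$$
When $n = 2$ it can be equivalently written as
$$
\Omega(x, y) = 2\tanh^{-1}\left(\tanh\frac{x}{2}\tanh\frac{y}{2}\right)
= 2\,\mathrm{sgn}(x)\,\mathrm{sgn}(y)\tanh^{-1}\left(\tanh\frac{|x|}{2}\tanh\frac{|y|}{2}\right)
\tag{3.71}
$$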
where the function $\Omega(x, y)$ can be proven to be an associative operator: for example, given $\varphi = c_1 \oplus c_2 \oplus c_3$, $\Lambda_{12} = \Omega(\Lambda_1, \Lambda_2)$ and $\Lambda_{23} = \Omega(\Lambda_2, \Lambda_3)$, then $\Omega(\Lambda_{12}, \Lambda_3) = \Omega(\Lambda_1, \Lambda_{23})$, which means that the order in which the terms are evaluated does not matter. A graphical representation of the function $\Omega(x, y)$ is reported in Figure 3.22.
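Using the identity $\tanh^{-1}(x) = \frac{1}{2}\ln\frac{1+x}{1-x}$ and the exponential (Euler) form of the hyperbolic tangent, the following algebraic steps lead to an expression in terms of exponentials and logarithms.
$$
\begin{aligned}
2\tanh^{-1}\left(\tanh\frac{|x|}{2}\tanh\frac{|y|}{2}\right)
&= \ln\frac{1 + \tanh\frac{|x|}{2}\tanh\frac{|y|}{2}}{1 - \tanh\frac{|x|}{2}\tanh\frac{|y|}{2}} = \\
&= \ln\frac{1 + \dfrac{e^{\frac{|x|}{2}} - e^{-\frac{|x|}{2}}}{e^{\frac{|x|}{2}} + e^{-\frac{|x|}{2}}} \cdot \dfrac{e^{\frac{|y|}{2}} - e^{-\frac{|y|}{2}}}{e^{\frac{|y|}{2}} + e^{-\frac{|y|}{2}}}}{1 - \dfrac{e^{\frac{|x|}{2}} - e^{-\frac{|x|}{2}}}{e^{\frac{|x|}{2}} + e^{-\frac{|x|}{2}}} \cdot \dfrac{e^{\frac{|y|}{2}} - e^{-\frac{|y|}{2}}}{e^{\frac{|y|}{2}} + e^{-\frac{|y|}{2}}}} = \\
&= \ln\frac{\left(e^{\frac{|x|}{2}} + e^{-\frac{|x|}{2}}\right)\left(e^{\frac{|y|}{2}} + e^{-\frac{|y|}{2}}\right) + \left(e^{\frac{|x|}{2}} - e^{-\frac{|x|}{2}}\right)\left(e^{\frac{|y|}{2}} - e^{-\frac{|y|}{2}}\right)}{\left(e^{\frac{|x|}{2}} + e^{-\frac{|x|}{2}}\right)\left(e^{\frac{|y|}{2}} + e^{-\frac{|y|}{2}}\right) - \left(e^{\frac{|x|}{2}} - e^{-\frac{|x|}{2}}\right)\left(e^{\frac{|y|}{2}} - e^{-\frac{|y|}{2}}\right)} = \\
&= \ln\frac{2e^{\frac{|x|+|y|}{2}} + 2e^{\frac{-|x|-|y|}{2}}}{2e^{\frac{|x|-|y|}{2}} + 2e^{\frac{-|x|+|y|}{2}}} = \\
&= \ln\left[\frac{e^{\frac{|x|+|y|}{2}}}{e^{\frac{|x|+|y|}{2}}} \cdot \frac{1 + e^{-|x|-|y|}}{e^{-|x|} + e^{-|y|}}\right] = \\
&= \ln\left(1 + e^{-|x|-|y|}\right) - \ln\left(e^{-|x|} + e^{-|y|}\right)
\end{aligned}
\tag{3.72}
$$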
Notice that the same process used to obtain 3.72, applied without taking the modulus of the inputs, gives 3.66 as a result.
The following property also holds:
$$
\ln\left(e^{a} + e^{b}\right) = \max(a, b) + \ln\left(1 + e^{-|a-b|}\right)
\tag{3.73}
$$
Proof:
Let $a > b$, so that $b - a < 0$: $\ln\left(e^{a} + e^{b}\right) = \ln\left[e^{a}\left(1 + e^{b-a}\right)\right] = a + \ln\left(1 + e^{-|a-b|}\right)$.
Let $b > a$, so that $a - b < 0$: $\ln\left(e^{a} + e^{b}\right) = \ln\left[e^{b}\left(1 + e^{a-b}\right)\right] = b + \ln\left(1 + e^{-|a-b|}\right)$.
Hence 3.73 is valid.
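As a consequence, 3.72 can be rewritten as follows:
$$
\begin{aligned}
&\ln\left(1 + e^{-|x|-|y|}\right) - \max(-|x|, -|y|) - \ln\left(1 + e^{-|-|x|+|y||}\right) = \\
&= \ln\left(1 + e^{-|x|-|y|}\right) + \min(|x|, |y|) - \ln\left(1 + e^{-||x|-|y||}\right)
\end{aligned}
$$
So 3.71 becomes:
$$
\Omega(x, y) = \mathrm{sgn}(x)\,\mathrm{sgn}(y)\left(\min(|x|, |y|) + \ln\frac{1 + e^{-|x|-|y|}}{1 + e^{-||x|-|y||}}\right)
\tag{3.74}
$$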
This exact result can be approximated by neglecting the second term in parentheses. The approximation is accurate when $|x|, |y| \gg 0$ and $||x| - |y|| \gg 0$; moreover, this second term is exactly equal to 0 when $x = 0$ or $y = 0$. Figure 3.24 shows the error introduced by the approximation compared to the function $\Omega(x, y)$.
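The equivalence between 3.71 and 3.74, the property 3.73 and the effect of dropping the correction term can be checked numerically. The sketch below is illustrative only; the function names are hypothetical and the sign of a zero input is taken as positive by convention.

```python
import math

def sign(v):
    return -1.0 if v < 0 else 1.0

def omega_exact(x, y):
    """Equation 3.71."""
    return 2.0 * math.atanh(math.tanh(x / 2.0) * math.tanh(y / 2.0))

def correction(x, y):
    """Second term in parentheses of 3.74."""
    ax, ay = abs(x), abs(y)
    return math.log1p(math.exp(-(ax + ay))) - math.log1p(math.exp(-abs(ax - ay)))

def omega_3_74(x, y):
    """Equation 3.74: min-sum term plus correction term."""
    return sign(x) * sign(y) * (min(abs(x), abs(y)) + correction(x, y))

def omega_min_sum(x, y):
    """Min-sum approximation: the correction term is neglected."""
    return sign(x) * sign(y) * min(abs(x), abs(y))

def jacobian_log(a, b):
    """Right-hand side of 3.73."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

for x, y in [(1.3, -0.4), (6.0, -2.0), (0.5, 0.5)]:
    print(x, y, omega_exact(x, y), omega_3_74(x, y), omega_min_sum(x, y))
```

Since $|x| + |y| \ge ||x| - |y||$, the correction term is never positive, so the min-sum value always has a magnitude at least as large as the exact one; the gap is largest when $|x|$ and $|y|$ are close to each other.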
Figure 3.22: Function $\Omega(x, y)$.
In general, the loss of performance introduced by the min-sum approximation with respect to the exact formula can be partially recovered through two correction methods:
• Offset
• Normalization
In the first case, the equation of the processing node becomes:
$$
\tilde{\Omega}(x, y) = \mathrm{sign}(x) \cdot \mathrm{sign}(y) \cdot \max\left(\min(|x|, |y|) - \beta, 0\right)
\tag{3.75}
$$
where $\beta$ is an offset optimized for the quantization scheme used in the architecture. To the best of our knowledge, typical values of $\beta$ have not been explored in the literature for polar codes, but referring to LDPC codes we expect $\beta \le 0.25$.
In the second case the equation becomes:
$$
\tilde{\Omega}(x, y) = \mathrm{sign}(x) \cdot \mathrm{sign}(y) \cdot \alpha \cdot \min(|x|, |y|)
\tag{3.76}
$$
Figure 3.23: Min-Sum approximation of function Ω(x, y).
Figure 3.24: Approximation error of using Min-sum compared to function Ω(x, y).
Figure 3.25: Performance comparison of approximations for polar decoding. From
[3].
Here $\alpha < 1$ is the normalization factor of the processing node; its value is typically $\alpha \ge 0.8$. This case has been explored in [3] under the name of Scaled Min-Sum (SMS). Figure 3.25 shows that, with a normalization factor of $\alpha = 0.9375$, for BPSK modulation over the AWGN channel, codeword length 1024 and rate 1/2, the loss introduced by the min-sum approximation is compensated by the scaling factor $\alpha$. The performance therefore overlaps that of the original Belief Propagation (BP) algorithm at 60 iterations and of the Successive Cancellation (SC) algorithm. The choice of $\alpha = 0.9375 = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} = 1 - 2^{-4}$ is made for ease of hardware implementation: it can be realized with just a shift and a sum stage. This choice has also been adopted in the current work for hardware implementations.
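The two corrections of 3.75 and 3.76, together with the shift-based scaling, can be sketched as follows. The function names are hypothetical, and the integer version `scale_by_15_16` is only one possible fixed-point realization (its rounding behaviour is an assumption, not the one of the implemented architecture).

```python
def sign(v):
    return -1.0 if v < 0 else 1.0

def offset_min_sum(x, y, beta):
    """Equation 3.75: min-sum with offset beta."""
    return sign(x) * sign(y) * max(min(abs(x), abs(y)) - beta, 0.0)

def scaled_min_sum(x, y, alpha=0.9375):
    """Equation 3.76: min-sum scaled by the normalization factor alpha."""
    return sign(x) * sign(y) * alpha * min(abs(x), abs(y))

def scale_by_15_16(magnitude):
    """alpha = 1 - 2**-4 on a non-negative integer: one shift and one subtraction."""
    return magnitude - (magnitude >> 4)

print(offset_min_sum(3.0, -1.0, 0.25))
print(scaled_min_sum(-4.0, 2.0))
print(scale_by_15_16(32), 0.9375 * 32)
```

The integer form shows why $1 - 2^{-4}$ is convenient: no multiplier is needed, and the result equals $0.9375 \cdot m$ exactly whenever the magnitude $m$ is a multiple of 16.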
In principle, more complex normalization algorithms, or combinations of offset and normalization corrections, could be used to achieve better results; however, the good proven results of the simple SMS architecture make the introduction of complex corrections inadvisable when the final purpose is a hardware implementation. The variation of code performance with the introduced correction also depends on the bit quantization and on the decoding algorithm used; such studies are called Density Evolution Analysis.
3.9 Performance evaluation
The performance of a communication system with uncoded messages, compared to a system with coded data, usually behaves as in Figure 3.26. For $E_b/N_0$ values lower than a certain quantity (known as the convergence abscissa), the coded system usually performs worse than the uncoded one: since some redundancy is introduced, more data are transmitted over the channel and so more errors may occur. Outside this region, where coded communications are not reliable either, coded systems allow the transmitted power to be reduced, because the same error probability (Bit Error Rate) can be achieved with a lower signal-to-noise ratio $E_b/N_0$.
Figure 3.26: Typical digital coded and uncoded communication system performance.
For a fixed BER value, the Coding Gain of a coded system is the difference, in dB, between the signal-to-noise ratio $E_b/N_0$ needed by the uncoded system and the one needed by the coded system to achieve the same desired BER.
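The definition of coding gain can be made concrete with a small numerical sketch. It assumes uncoded BPSK over the AWGN channel, for which the bit error rate is $\frac{1}{2}\,\mathrm{erfc}\left(\sqrt{E_b/N_0}\right)$; the function names, the bisection bounds and the coded operating point of 3.0 dB are hypothetical values used only for illustration.

```python
import math

def uncoded_bpsk_ber(ebn0_db):
    """BER of uncoded BPSK over AWGN for Eb/N0 expressed in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def required_ebn0_db(target_ber, low=-5.0, high=20.0):
    """Eb/N0 (dB) at which uncoded BPSK reaches target_ber, found by bisection."""
    for _ in range(60):
        mid = 0.5 * (low + high)
        if uncoded_bpsk_ber(mid) > target_ber:
            low = mid      # still too many errors: more Eb/N0 is needed
        else:
            high = mid
    return 0.5 * (low + high)

def coding_gain_db(target_ber, coded_ebn0_db):
    """Difference in dB between the uncoded and the coded Eb/N0 at the same BER."""
    return required_ebn0_db(target_ber) - coded_ebn0_db

uncoded_db = required_ebn0_db(1e-5)
gain_db = coding_gain_db(1e-5, 3.0)   # 3.0 dB is a hypothetical coded operating point
print(round(uncoded_db, 2), round(gain_db, 2))
```

The coded value has to be read from a simulated or measured curve of the code under study; only the uncoded reference is available in closed form.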
The error rate of iteratively decoded codes has a typical shape, as sketched in Figure 3.27 for two cases. For the solid line three different regions can be distinguished:
• The first region, below the convergence threshold, is where the code is inefficient. Even if the number of iterations is increased, there is no performance improvement.
• The waterfall region, where the error rate has a sharp fall, whose slope increases with the number of iterations.
• The error floor region, where the descent of the error rate is slower than in the waterfall region. The error floor is due to the minimum Hamming distance of the code.
Figure 3.27: Typical regions in an error probability curve for iterative decoding algorithms: the solid line identifies the waterfall region and the error floor region. The trade-off between the two regions is illustrated by the second curve (dashed line), which has a lower error floor at the expense of a higher convergence threshold.
As illustrated by the dashed curve, there is often a trade-off between code performance in the waterfall region and in the error floor region. For the specific case of polar codes, it is stated in [39] and [4] that the BER performance of polar codes under belief propagation decoding shows no sign of an error floor down to very low BER for the AWGN channel ($< 10^{-9}$ for $N = 2^{13}$, rate 1/2). It is proven that this behaviour is due to the high minimum distance (equal to the stopping distance) achieved by polar codes and to the girth of the related trellis graph.
The girth $g_n$ of a variable node $v_n$ is the length (number of edges) of the smallest cycle which crosses $v_n$. The girth of the graph, $g_{\min} = \min_n (g_n)$ with the minimum taken over all the nodes, is then the smallest number of steps that avoids the self-confirmation of information among all possible nodes. A stopping set is a non-empty set of nodes built as follows: given a set of nodes on the left-most side of the graph, every node connected to the right of these initial nodes belongs to the set; the same rule is then applied to the newly added nodes, up to the right-most nodes. A stopping tree is a stopping set whose initial set is a single node that contains an information bit. The stopping distance is defined as the size of the smallest stopping set.
In Figure 3.28 an example of cycles for polar codes is given. An interesting consideration about those cycles is that, since the graph of a polar code of length $N = 2^n$ is always contained in the graph of a code of length $N = 2^{n+1}$, the cycles of smaller codes are always present in bigger ones. Moreover, first-order cycles appear in the graph for a code length of N = 4, and second-order cycles, which connect the first-order cycles, appear for N = 8; it is clear that the successive cycles for increasing code lengths connect the two halves of the graph of the previous size.
Figure 3.28: Cycles example on polar code factor graph for the case of N = 8, girth $g_{\min} = 12$. From [4].
59
Chapter 4
Belief Propagation Polar Decoder
software model
4.1 Introduction to C model
The software model has been developed to simulate the performance of polar codes under different conditions and to evaluate the behaviour of hardware decoders. The general block structure is presented in Figure 4.1.
Figure 4.1: Block description of C model.
The code starts by reading a setting file, which contains the specific information needed to perform the simulation analysis. The first block generates uniformly distributed random binary inputs as uncoded information. These data are then encoded according to the specified polar matrix to obtain coded bit sequences. Every coded sequence generated is a frame that is tested by sending it through the channel. The purpose of the channel block is to simulate the effects of sending signals through an AWGN channel and receiving its output. After this block, the received channel signals must be converted into soft information to feed the described polar decoder. The polar decoder carries out its task by iterating the decoding algorithm multiple times. When the decoder iterations are completed, a final hard decision is taken based on the final soft updates. As the last step for a single frame, the decoded bitstream is compared with the original message and the error statistics are updated. Multiple frames are processed in order to statistically estimate the performance of the decoding structure through updates of the error statistics.
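The last step of the frame loop, comparing the decoded bitstream with the original message and updating the error statistics, can be sketched as follows. This is a minimal illustration and not the thesis code: the type `error_stats` and the functions `update_error_stats`, `ber` and `fer` are hypothetical names introduced here.

```c
#include <stddef.h>

/* Hypothetical accumulator for bit and frame error statistics. */
typedef struct {
    unsigned long frames;       /* frames processed so far */
    unsigned long frame_errors; /* frames with at least one wrong bit */
    unsigned long bits;         /* information bits compared so far */
    unsigned long bit_errors;   /* wrong information bits so far */
} error_stats;

/* Compare one decoded frame with the transmitted message and update stats. */
void update_error_stats(error_stats *s, const int *sent,
                        const int *decoded, size_t k)
{
    unsigned long wrong = 0;
    for (size_t i = 0; i < k; i++)
        if (sent[i] != decoded[i])
            wrong++;
    s->frames++;
    s->bits += k;
    s->bit_errors += wrong;
    if (wrong > 0)
        s->frame_errors++;
}

/* Bit error rate estimated so far (0 if nothing has been processed). */
double ber(const error_stats *s)
{
    return s->bits ? (double)s->bit_errors / (double)s->bits : 0.0;
}

/* Frame error rate estimated so far (0 if nothing has been processed). */
double fer(const error_stats *s)
{
    return s->frames ? (double)s->frame_errors / (double)s->frames : 0.0;
}
```

A frame counts as wrong as soon as a single bit differs, which is why the FER estimate is always at least as large as the BER estimate.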
4.1.1 Setting file
The setting file is a text file passed as a parameter to the main function of the software. To give a clear insight into the developed code, the information contained in the setting file is described below:
Output file: Name of the result output file.
Check results period: Number of seconds between displayed outputs. This output gives the number of processed frames out of the total and an estimate of the time needed to complete the decoding simulation. This value is also used for the periodic saving of updated error statistics to a log file.
uncoded frame size: Number of uncoded bits randomly generated.
Bhattacharyya epsilon: Initial erasure probability of the BEC channel, used to compute the symmetric capacity of the generated synthetic channels. This parameter is ignored if the code is set (inside the source) to use specific Bhattacharyya parameters evaluated through simulations for the AWGN channel.
Max number of iterations: Maximum number of iterations performed by the BP decoding.
Max number of frame simulated: Maximum number of simulated frames if not enough errors occur.
Eb/N0: Minimum SNR, increase step and maximum SNR.
lfsr_seed: The first number is the seed for the random generating function of the uncoded bits; the second parameter specifies which random function should be used for the generation. '0' calls the random generation obtained through the C primitive rand(), while '1' invokes a Box-Müller algorithm implemented in C.
unif_seed: It takes the same input parameters as lfsr_seed. It is used to set the seed for the Gaussian noise evaluation.
intrinsic information fractional bits, minimum and maximum value: These parameters are respectively the number of fractional bits used by the decoder, and the minimum and maximum representable values inside the decoding architecture.
extrinsic information minimum and maximum value: Data representation of the values from the channel.
scaling factor for min-sum algorithm: Normalization factor for the min-sum approximation.
Starting frame number: Frame number from which a resumed simulation starts; if equal to 1, it is a new simulation.
Decoding type method: Every decoding schedule described in the software is associated with a number. This parameter specifies which decoder to use.
BPSK=0 or 16-64-256QAM, BICM=0,1->off,on: The first parameter selects which modulation to use; the second one enables or disables bit-interleaved coded modulation (BICM).
So the setting file is a card which describes the simulation to be processed.
4.1.2 Encoding
Once the uncoded data u are generated, the encoding is performed as for a generic block code. The generator matrix G is computed by the recursive construction presented earlier, then the coded information x is computed as $x^T = G^T u^T$. To save execution time, the $G^T$ matrix is computed only once, before the start of the frame iterations. Also, to reduce memory occupation, the G matrix is then deleted and its memory freed.
Once the bits are coded, they may be interleaved if the BICM option is selected. This functionality emulates practical coding schemes used in real application environments.
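As an illustration of the encoding step, the product $x = uG$ can also be evaluated without storing G, by applying the butterfly structure of the recursive construction in place. The sketch below is a matrix-free alternative and not the matrix-based method used by the C model. It assumes that G is the n-fold Kronecker power of F = [1 0; 1 1] with no bit-reversal permutation; the function name `polar_encode` is hypothetical.

```c
#include <stddef.h>

/* Encode in place: on entry x holds the uncoded bits u, on exit x = u * G
 * over GF(2), with G the n-fold Kronecker power of F = [1 0; 1 1] and
 * len = 2^n. No bit-reversal permutation is applied (an assumption). */
void polar_encode(int *x, size_t len)
{
    for (size_t half = 1; half < len; half *= 2)          /* one pass per stage */
        for (size_t base = 0; base < len; base += 2 * half)
            for (size_t i = base; i < base + half; i++)
                x[i] ^= x[i + half];                      /* upper = upper XOR lower */
}
```

For N = 4 this gives $x_0 = u_0 \oplus u_1 \oplus u_2 \oplus u_3$, $x_1 = u_1 \oplus u_3$, $x_2 = u_2 \oplus u_3$ and $x_3 = u_3$, which are the columns of the Kronecker power. The cost is $\frac{N}{2}\log_2 N$ XOR operations per frame instead of a full matrix product.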
4.1.3 Channel simulation and soft information evaluation
For BPSK modulation over an AWGN channel, the addition of Gaussian noise is simulated according to (2.21). For QAM symbols, the AWGN channel is expressed by $Y = X_k + N_k$, where $N_k$ is a complex Gaussian noise with distribution $\mathcal{N}(0, \frac{N_0}{2})$. It is therefore possible to write the probability of having received Y, given that the signal $X_k \in S$ was transmitted, as

$$P(Y|X_k) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{|Y - X_k|^2}{2\sigma^2}} \tag{4.1}$$

and consequently

$$f(Y|X_k) = e^{-\frac{|Y - X_k|^2}{2\sigma^2}}, \qquad \ln f(Y|X_k) = -\frac{|Y - X_k|^2}{2\sigma^2} = -\frac{E_S\,|Y - X_k|^2}{N_0} \tag{4.2}$$

where the average energy per symbol is $E\{|x|^2\} = E_S$. As can be seen in Figure 4.2, the coded frame is mapped into a list of signals.
Figure 4.2: Structure of LLRs message allocation for C model.
The list of symbols of the frame is sent over the AWGN channel, so for each signal in S the receiver computes $f(Y|X_k)$. The LLR estimation of the n-th bit is given by (4.3):

$$\begin{aligned}
\lambda_n = \ln\frac{P(d_n = 0)}{P(d_n = 1)} &= \ln\frac{\sum_{X_k \in S_n^{(0)}} f(Y|X_k)}{\sum_{X_k \in S_n^{(1)}} f(Y|X_k)} \\
&= \ln\sum_{X_k \in S_n^{(0)}} f(Y|X_k) - \ln\sum_{X_k \in S_n^{(1)}} f(Y|X_k) \\
&= \max^*_{X_k \in S_n^{(0)}} \big(\ln f(Y|X_k)\big) - \max^*_{X_k \in S_n^{(1)}} \big(\ln f(Y|X_k)\big)
\end{aligned} \tag{4.3}$$

where $\max^*(a, b) = \ln(e^a + e^b)$ is the Jacobian logarithm, $S_n^{(0)}$ represents the set of symbols with n-th bit equal to 0 and $S_n^{(1)}$ is the set of symbols with n-th bit equal to 1. In Figure 4.3 an example of $S_3^{(1)}$ is given.
Figure 4.3: Example of $S_3^{(1)}$.
Figure 4.4: Structure of LLRs message allocation for C model.
4.1.4 Graph description and Decoding
The LLR messages computed during the simulation are stored in two memories of size N × (n + 1). Each matrix stores only left-going or only right-going messages. Figure 4.4 depicts the memory representation; it also shows the initialization of the memories with the channel LLR estimations for matrix_messages_left and with the a-priori information RF for matrix_messages_right.
All other cells are set to zero at the start of decoding.
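A possible allocation and initialization of the two message memories is sketched below. The layout is an assumption made for illustration: the matrices are stored row-major, the channel LLRs are placed in the right-most column of the left-going matrix, and the a-priori information is placed in the left-most column of the right-going matrix, with a large constant standing in for the certainty of a frozen bit. The names `alloc_messages`, `init_messages` and `FROZEN_LLR` are hypothetical.

```c
#include <stdlib.h>

#define FROZEN_LLR 1.0e6 /* stand-in for "infinite" confidence in a frozen 0 */

/* Allocate an N x (n+1) message matrix, zero-filled, in row-major order. */
double *alloc_messages(size_t N, size_t n)
{
    return calloc(N * (n + 1), sizeof(double));
}

/* Assumed layout: column n of the left-going matrix holds channel LLRs,
 * column 0 of the right-going matrix holds the a-priori information. */
void init_messages(double *left, double *right, size_t N, size_t n,
                   const double *channel_llr, const int *is_frozen)
{
    for (size_t row = 0; row < N; row++) {
        left[row * (n + 1) + n] = channel_llr[row];
        right[row * (n + 1) + 0] = is_frozen[row] ? FROZEN_LLR : 0.0;
    }
}
```

Because calloc zero-fills the buffers, every cell that is not explicitly initialized starts at zero, as required at the start of decoding.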
This memory representation stores the update messages in the same order as the graph description for polar codes in Figure 3.9. To reduce the complexity of the code, the final version of the simulation software does not implement explicit node interconnections, but exploits the uniform structure of the decoder. The reverse-shuffle function, called sorter, and its inverse, back-sorter, are used to appropriately remap the data of the columns involved. The scheduling algorithm is defined with a structure composed of one integer and six pointers.
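typedef struct generic_fun_struct {
    int tot;          /* number of elements in vector_col */
    int *vector_col;  /* ordered list of the stages executed by the SA */
    int *vector_upd;  /* 1 where support memories are copied back */
    int *calc_left;   /* 1 where left propagation is performed */
    int *calc_right;  /* 1 where right propagation is performed */
    int *left_temp;   /* support memory for left-going messages */
    int *right_temp;  /* support memory for right-going messages */
} generic_f;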
The integer tot defines the total number of elements in vector_col, which is the ordered list of the stages executed by the SA. For the stage pointed to by vector_col, calc_left and calc_right store whether left propagation, right propagation or both must be performed. left_temp and right_temp are pointers to two memory locations of the same size as matrix_messages_left. Before any operation on a stage, the data from matrix_messages_left and matrix_messages_right are copied into these memories. The results of the propagated messages are stored inside these support memories. Only when vector_upd has a 1 in the position corresponding to the column are the results in the support memories copied back, after the evaluation of the propagation messages, to the corresponding matrix_messages_left and matrix_messages_right. This solution makes it possible to simulate the hardware behaviour when multiple operations are performed on the same clock cycle.
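The use of the support memories can be illustrated with a simplified loop. This sketch is one possible reading of the description, not the thesis code: the working copies are taken once and stay equal to the message matrices after every commit, so that all stages evaluated between two commits read the same old values. The type `schedule`, the callback type `stage_fn`, and the functions `run_schedule` and `demo_update` are hypothetical; `demo_update` is only a placeholder for the real BP update equations.

```c
#include <stddef.h>
#include <string.h>

/* Signature of a per-column update; reads committed data, writes into out. */
typedef void (*stage_fn)(const double *left, const double *right,
                         double *out, int col, size_t rows, size_t cols);

/* Simplified, hypothetical counterpart of the generic_f structure. */
typedef struct {
    int tot;
    const int *vector_col;
    const int *vector_upd;
    const int *calc_left;
    const int *calc_right;
    double *left_temp;
    double *right_temp;
} schedule;

/* Placeholder for the real BP update: out = left + right + 1 on one column. */
void demo_update(const double *left, const double *right,
                 double *out, int col, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        out[r * cols + (size_t)col] =
            left[r * cols + (size_t)col] + right[r * cols + (size_t)col] + 1.0;
}

void run_schedule(const schedule *sa, double *left, double *right,
                  size_t rows, size_t cols,
                  stage_fn update_left, stage_fn update_right)
{
    size_t bytes = rows * cols * sizeof(double);
    /* Working copies start equal to the committed matrices. */
    memcpy(sa->left_temp, left, bytes);
    memcpy(sa->right_temp, right, bytes);
    for (int s = 0; s < sa->tot; s++) {
        int col = sa->vector_col[s];
        /* Updates read the committed matrices and write the working copies. */
        if (sa->calc_left[s])
            update_left(left, right, sa->left_temp, col, rows, cols);
        if (sa->calc_right[s])
            update_right(left, right, sa->right_temp, col, rows, cols);
        /* Commit only when the schedule asks for it ("same clock" group). */
        if (sa->vector_upd[s]) {
            memcpy(left, sa->left_temp, bytes);
            memcpy(right, sa->right_temp, bytes);
        }
    }
}
```

Stages that share a commit point behave as if they were computed in parallel by the hardware, since none of them can observe the result of another before the copy back.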
4.2 Simulation Results
The computed results in terms of BER and FER for every iteration step are saved into the result text file. With simple Matlab scripts, this file can be processed and plotted to obtain the following curves. From a practical point of view, to design and compare decoder architectures, the preferred representation is given by the FER results, which indicate whether the received packet was decoded completely correctly or not.
Since the length of an iteration differs among Scheduling Algorithms (SAs), performance for a given fixed time can be compared by assigning a given number of states of the flow graph. Figure 4.5 compares the presented SAs for the same number of states. It is possible to see that the best performing scheduling for the same number of states is J, followed by K and I, which also outperform SC decoding.
Comparing schedulings at a fixed number of states does not give a reliable comparison because, as can be seen in their flow charts, the number of operations which compose a state may be different.
Figure 4.5: Performance for BP decoding with different schedules, BPSK modulation, floating point decoding, rate $\frac{1}{2}$, codeword length N = 1024, 150 states. a) BER. b) FER. (Both panels plot the error rate against Eb/N0 (dB) for SC decoding and schedulings A to K.)
To obtain information on the speed of convergence of a scheduling, instead of considering the number of states it is possible to consider the number of operations needed to achieve a given performance. The result anticipated in Figure 3.17 reports FER results for BPSK modulation over an AWGN channel with floating point decoding precision and a maximum of 200 decoding operations. It is possible to compare this result with the finite precision result for the 9-bit quantized messages of the final architecture in Figure 4.6a. This comparison shows that the chosen precision does not modify the plotted curves. This quantization parameter has been chosen for the final hardware implementation; however, by decreasing the number of bits and accepting a small loss in decoding performance, it is possible to reduce the area of the final hardware implementation.
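The effect of a finite precision message representation can be reproduced in software with a saturating quantizer driven by the number of fractional bits and by the minimum and maximum representable values. The sketch below is an assumption for illustration; the function name `quantize_llr` is hypothetical, and the split of the 9 bits between integer and fractional part is not given in the text.

```c
/* Quantize v to a signed fixed-point grid with frac_bits fractional bits,
 * saturating at [min_val, max_val]. Returns the represented real value. */
double quantize_llr(double v, int frac_bits, double min_val, double max_val)
{
    /* Saturate first so that the integer conversion below cannot overflow. */
    if (v >= max_val) return max_val;
    if (v <= min_val) return min_val;
    double scale = (double)(1L << frac_bits);
    double scaled = v * scale;
    /* Round half away from zero without needing the math library. */
    long q = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    double r = (double)q / scale;
    if (r > max_val) r = max_val;
    if (r < min_val) r = min_val;
    return r;
}
```

As an example, a 9-bit two's complement word with 2 fractional bits (an illustrative choice) has a step of 0.25 and covers the range from -64 to 63.75, so `quantize_llr(v, 2, -64.0, 63.75)` models that format.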
A deep investigation has been carried out to identify properties for comparing these schedulings. The main effects analysed are listed below:
• differences among scheduling policies tend to increase when the block size is
increased;
• differences among scheduling policies tend to increase when the code rate is
reduced;
• differences among scheduling policies tend to increase when the number of
iterations is reduced.
As can be seen from the first and second graphs of Figure 4.6, the effect of the different schedulings is present both at a high number of operations, which can be considered the convergence of the decoding (650 operations), and at a low number of operations (200 operations). The effect of scheduling becomes more evident as the decoding time is reduced, as can be appreciated by comparing these two graphs. In the third graph of Figure 4.6, the codeword length N directly affects the Frame Error Rate (FER) in accordance with theory, but the scheduling effect is also present. Comparing the G and I schedulings, the differences are small and vary from about 0.3 dB for N = 256 to 0.4 dB for N = 1024, while for scheduling comparisons like I to F the differences are about 1 dB in the same range.
The performance related to the SA also depends on the rate. As shown in the fourth graph of Figure 4.6, decreasing the rate offers better performance, in agreement with theory. For the case of small decoding time, as before, the difference between G and I becomes more and more significant as the code rate is reduced. In the presented figure, rate $\frac{3}{4}$ gives a difference of about 0.5 dB, which becomes 2.5 dB for rate $\frac{1}{4}$. The reduction of the rate is however not always feasible in practical applications because it reduces the effective information sent in every frame. In the first to third graphs of Figure 4.7, the effects of the channel modulation, for an Additive White Gaussian Noise (AWGN) channel and Quadrature Amplitude Modulation (16-QAM, 64-QAM, 256-QAM), on all schedulings and on the SC decoding algorithm with min-sum approximation are analysed. All QAM modulations use Gray code bit assignment. The effect of the modulation is that the differences among scheduling policies tend to reduce slightly when the modulation order is increased. For example, considering the G and I schedulings, the gain in moving from 16-QAM to 256-QAM is about 0.25 dB.
Figure 4.6: FER performance for BP decoding with different schedules, BPSK modulation, 200 operations. a) finite precision decoding (9 bit), rate $\frac{1}{2}$, codeword length N = 1024. b) floating point decoding, rate $\frac{1}{2}$, codeword length N = 1024, 650 operations. c) floating point decoding, rate $\frac{1}{2}$, variable codeword length (N = 512, 256). d) floating point decoding, variable rate ($\frac{1}{4}$, $\frac{3}{4}$), codeword length N = 1024. (All panels plot FER against Eb/N0 (dB); panels a and b show SC and schedulings A to K, panel c shows SC, F, G, J and I for N = 512 and N = 256, panel d shows SC, F, G, J and I for K = 256 and K = 768.)
Figure 4.7: FER performance for BP decoding with different schedules, floating point decoding, rate $\frac{1}{2}$, codeword length N = 1024. a) 200 operations, 16-QAM modulation. b) 200 operations, 64-QAM modulation. c) 200 operations, 256-QAM modulation. (All panels plot FER against Eb/N0 (dB) for SC and schedulings A to K.)
Chapter 5
State of art for polar codes decoder
implementations
An initial description of the equations and of the feasibility of Belief Propagation (BP) decoding was given in [34] in 2010. This article led the way to different BP architectures in the following years. A review of published implementations is presented here to give the research context. At the end of this chapter, Table 5.1 summarizes the main performance results and hardware costs.
2011
The first BP decoding architecture found in the literature implemented the BP algorithm on an FPGA [40]. This implementation was created with the aim of giving maximum flexibility in terms of possible decoding graphs. The use of an FPGA allows the number of processing units to be traded against the control part by reprogramming the device, so the achievable throughput changes according to these choices. The code rate can instead be changed at runtime, by setting the correct frozen data. By adding an extra address generator it is also possible to decode graphs smaller than the maximum codeword length N without reprogramming. The FPGA used is a Xilinx XC5VLX85, which allows a maximum block size of N = 8192 to be implemented with 16 PEs. The highest reported throughput is 53 Mbps for a (256,192) polar code and 5 iterations; however, decoding performance results are given for 50 iterations.
2012
In [41] a GPU implementation is presented to explore the parallel computing capability of this solution. The NVIDIA Fermi architecture is used to prove the feasibility of implementing a BP decoder and to exploit multiprocessors, on-chip registers, shared and global memory to improve the occupancy of the available resources for parallel computing. Operations are performed in floating point and the computed messages are likelihood ratios (LRs). Parallel threads evaluate the LR update equations; data are exchanged among threads through the global memory, and the addresses related to the data also need to be interchanged in order to perform the shuffle and its inverse. The shared memory (faster than the global one) is used to copy left and right messages and perform the operations. At the end of the iteration, the results are updated in the global memory. The card used is an NVIDIA GTX 560 Ti, with 384 cores and a working frequency of 1.66 GHz, which is reported to implement block sizes up to N = 2048; for a small block length N = 256 it reaches a throughput of 57.20 Mbps at 10 iterations.
2013
As described in Section 3.8, the use of the min-sum approximation for BP polar decoding was first suggested in [3]. A particular PE (divided in two parts, for upper and lower messages) was also proposed, in which the data representation is changed from sign-magnitude to two's complement to perform additions, and then converted back to sign-magnitude for comparison, scaling and output. It is estimated that the introduction of the scaling block increases the overall path delay by about $\frac{1}{5}$. A critical path reduction is introduced by merging the sum module and the two conversion modules inside the PE. The overall BP architecture is then used as a systolic architecture by inserting a pipeline register between stages to store the update messages. Only one stage is activated at a time, so during every iteration
n − 1 stages are not working, where $N = 2^n$ is the codeword length.
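The scaled min-sum approximation used in such PEs replaces the exact check-node combination of two LLRs with the product of their signs times the smaller magnitude, multiplied by a scaling factor. A minimal sketch follows; the function name `min_sum` and the parameter `scale` are hypothetical, and the value of the scaling factor is left to the caller.

```c
/* Scaled min-sum approximation of the check-node (box-plus) operation:
 * sign(a) * sign(b) * min(|a|, |b|), multiplied by a scaling factor. */
double min_sum(double a, double b, double scale)
{
    double mag_a = a < 0.0 ? -a : a;
    double mag_b = b < 0.0 ? -b : b;
    double mag = mag_a < mag_b ? mag_a : mag_b;
    int negative = (a < 0.0) != (b < 0.0); /* signs differ */
    return scale * (negative ? -mag : mag);
}
```

The sign and magnitude are handled separately here, which mirrors the sign-magnitude representation used for the comparison and scaling steps.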
2014
In June 2014, the work of Park was presented in [36]. It describes a parallel BP decoder architecture that at 4 dB achieves 4.68 Gbps of "coded throughput", which means that no frozen bits are used for the transmission. The main improvement was the reduction of the unnecessary operations of what is called scheduling C in this work, thus obtaining scheduling G (more details in Section 6.1) [37]. The resulting architecture needs 45 kb of memory to store data and is composed of a pipelined double stage in order to mitigate the memory access delay. The structure also implements an early termination rule based on 3 successive agreements of the hard decoding decision to achieve the declared coded throughput (P1). Without early termination the maximum coded throughput is 2.02 Gbps, for a codeword length N = 1024 at the nominal supply of 1 V. This architecture has been fabricated as a chip in a TSMC 65 nm CMOS process, where the decoder core area occupies 1.8 mm × 0.82 mm. Relevant measurements of the chip characteristics are given in Table 5.1, where a low-frequency, low-voltage condition is also reported (P2).
The architecture described in [42] aims to reduce the complexity of the BP decoder by eliminating the calculation of frozen bit LLRs. The resulting architecture is therefore strictly tied to the specific polar code implemented. Four different types of PE are implemented, to take care of all possible cases of presence of frozen data. This simplified BP decoder reduces the number of operations (sums and probability updates) by about 20-25% for code block lengths ranging from N = 128 to N = 32768.
The work in [43] optimizes the previous architecture of the same authors [3] with the classical analysis of operation dependencies over time, retiming and pipelining for different decoding messages. This approach allows the hardware utilization to be increased up to 100%. Starting from the initial implementation, four solutions are presented:
- Overlapping at iteration level, which means reordering operations according to data dependencies to reduce latency. It is done by filling unused PE stages with the following tasks (rescheduling).
- Overlapping at codeword level, which is common message pipelining. This choice requires n times the initial memory of the architecture, where the codeword length is $N = 2^n$, to store all messages.
- Joint overlapping at codeword and iteration level, which combines the previous approaches. The overlapping at iteration level is pipelined; this reduces the total amount of memory needed to just double that of the initial architecture.
- Folded architecture, which implements the architecture with only a fixed number of stages. The folding factor can grow from 1 (P1) to n/2 (P2), where the code block length is $N = 2^n$. Furthermore, overlapping at iteration level is used to decrease latency (P2).
The introduction of early stopping criteria for BP decoding has been studied in [44]. Two possible solutions are presented. The first one evaluates the estimates of the uncoded bits $\hat{u}$ and coded bits $\hat{x}$, and uses the generator matrix G of polar codes to verify whether $\hat{x} = \hat{u}G$. This method is called G-matrix-based detection. The second method, named minLLR-based detection, declares the decoding process complete if the minimum absolute value among all LLRs is above a certain threshold. This method also uses an estimate of the channel condition in order to set the correct value of the threshold; the estimate is computed from the Hamming distance between $\hat{x}$ and $\hat{u}G$. The architecture implementing the first early stopping criterion (P1) is not affected in terms of critical path (the PE has a longer critical path for codes with common lengths N ≤ 10000), while for the second criterion (P2) a pipeline stage is necessary to reduce the critical path of the Hamming distance measurement block in high-speed designs. The G-matrix solution can reduce the number of iterations by up to 42.5%, while the adaptive minLLR-based solution reaches 23.2%, at 3.5 dB. The base BP decoder remains unchanged with respect to [43].
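The two stopping tests can be sketched in software as follows. This is an illustration under stated assumptions rather than the architecture of [44]: the re-encoding uses the Kronecker-power butterfly without bit-reversal permutation, the threshold is supplied by the caller instead of being derived from a channel estimate, and the names `g_matrix_converged` and `min_llr_converged` are hypothetical.

```c
#include <stddef.h>

/* G-matrix style check: re-encode u_hat and compare the result with x_hat.
 * scratch must hold len integers; len is a power of two. */
int g_matrix_converged(const int *u_hat, const int *x_hat,
                       int *scratch, size_t len)
{
    for (size_t i = 0; i < len; i++)
        scratch[i] = u_hat[i];
    /* In-place butterfly evaluation of u_hat * G over GF(2). */
    for (size_t half = 1; half < len; half *= 2)
        for (size_t base = 0; base < len; base += 2 * half)
            for (size_t i = base; i < base + half; i++)
                scratch[i] ^= scratch[i + half];
    for (size_t i = 0; i < len; i++)
        if (scratch[i] != x_hat[i])
            return 0;
    return 1;
}

/* minLLR style check: stop when every |LLR| is above the threshold. */
int min_llr_converged(const double *llr, size_t len, double threshold)
{
    for (size_t i = 0; i < len; i++) {
        double mag = llr[i] < 0.0 ? -llr[i] : llr[i];
        if (mag <= threshold)
            return 0;
    }
    return 1;
}
```

Either function would be called once per iteration; decoding stops as soon as the selected test returns 1 or the maximum number of iterations is reached.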
In [45] a hybrid Belief Propagation-Successive Cancellation decoder is proposed. The main idea is to use the BP decoder with early termination of [44] and, if the codeword is not decoded after the maximum number of 60 iterations, to perform SC decoding on the updated LLRs from the BP decoder. For the case study of a (1024,512) polar code, it is shown that this hybrid solution has a performance about 0.2 dB lower than the BP decoder over the entire SNR region. Compared to the SC decoder, for medium SNR (around 2.5 dB) the hybrid decoding reaches a gain of 0.2 dB, while for higher and lower SNR its behaviour converges to the SC performance (which for high SNR is better than BP). From the hardware point of view, the SC and BP computations are carried out by the same PEs: since BP decoding is a generalized case of SC, the BP PE has been generalized with the proper input signals and the introduction of some multiplexers.
2015
To reduce the total amount of memory used to implement polar code BP decoders, the work in [46] proposes combining four adjacent 2 by 2 PEs of the polar code graph into one suitably interconnected 4 by 4 block. By doing so, the total amount of memory and the number of stages are halved, while the critical path increases from 2.05 ns for the typical solution (implemented by the authors) to 5.07 ns. The decoder area becomes advantageous for codeword lengths beyond N = 1024: for $N = 2^{12}, 2^{14}, 2^{16}, 2^{18}$, the area is reduced by 18.4%, 45.5%, 51.5% and 52.7% compared to the typical implementation. This solution is interesting because it works in the region where polar codes perform better. The drawback of the presented results is that the throughput ratio between this solution and the conventional solution also decreases with N due to the critical path.
The architecture presented in [47] implements a Soft-output Cancellation (SCAN) decoder, which uses the polar BP update equations to decode the trellis graph in the specific SC order. This solution forces the exploration of the graph as a binary tree, but the exploration is iterated multiple times as in BP. This choice of scheduling allows the overall memory to be reduced from $N(n+1)$ for the classical trellis graph to $\frac{nN}{2} + \frac{N}{2} - 2$, as presented in [48], for the codeword length $N = 2^n$. This architecture has been implemented on an XC5VLX85 FPGA device with a reduced number of PEs. The reported decoding performance is compared with a BP decoder of lower than average performance, while with respect to normal BP decoding it is comparable or worse. The obtained throughput is comparable to the FPGA implementation of [40]. The number of implemented Look-Up Tables grows linearly from +18% for small N = 256 up to +43% for N = 8192. However, for the same explored cases it reduces the number of flip-flops by approximately one order of magnitude.
To reduce the energy dissipation of BP decoding and to mitigate the latency introduced by multiple iterations, a method called sub-factor-graph freezing is presented in [49]. It avoids calculating updates for already converged portions of the graph during subsequent iterations. The presented decoding method offers a very small improvement (less than 0.1 dB) compared with classical BP decoding. Through this method it is possible to reduce the average number of iterations by about 40-46% compared to a generic BP decoder as in [34], and by 17% if the G-matrix-based method is implemented as in [44], at an SNR of 3 dB. The lower average number of computations reduces the power consumption, with an estimated saving of 65% and 30-46% respectively for the same comparisons. An effective hardware implementation of this method does not yet exist in the literature.
In [50] a modified PE is introduced: the left-propagating upper LLR message is evaluated only through left-going LLR data (except for the last stage). The other equations for polar code BP probability updates remain unchanged. This simplification reduces the total number of LLR messages to be stored to 5N − 3. The graph exploration evolves like the SCAN algorithm and presents similar performance in terms of frame error rate, with negligible degradation. The implemented decoder area is about 66% smaller than [36], but the throughput is about 20% of that of [36]. Compared with [44], the decoder dimensions are reduced by 83% while the throughput is 11% of that of the other architecture.
The architecture proposed in [51] exploits the simplification of PE nodes, so four new types of PE are used:
- All-frozen node: for the part directly connected to frozen leaves, it is not necessary to compute LLRs.
- All-information node: all leaf nodes are information bits. The right-propagating LLRs carry zero information and will not be updated, so they can be removed.
- Repetition node: there is only a single information leaf, on the last node. The information bit is just copied multiple times, so it is possible to avoid message passing over multiple stages.
- Single parity check node: there is only a single frozen leaf node. It can be substituted by a single parity check node to evaluate the belief information.
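The four node types listed above can be recognised from the frozen-bit pattern of the leaves below a node. The sketch below is illustrative only: the enumeration `node_kind` and the function `classify_node` are hypothetical, and the single frozen leaf of a single parity check node is assumed to be the first leaf, which the text does not state.

```c
#include <stddef.h>

typedef enum {
    NODE_GENERIC,    /* no simplification applies */
    NODE_ALL_FROZEN, /* every leaf is frozen */
    NODE_ALL_INFO,   /* every leaf is an information bit */
    NODE_REPETITION, /* only the last leaf is an information bit */
    NODE_SPC         /* only the first leaf is frozen (assumption) */
} node_kind;

/* Classify the sub-graph whose leaves are is_frozen[first .. first+count-1]. */
node_kind classify_node(const int *is_frozen, size_t first, size_t count)
{
    size_t frozen = 0;
    for (size_t i = 0; i < count; i++)
        frozen += (is_frozen[first + i] != 0);
    if (frozen == count)
        return NODE_ALL_FROZEN;
    if (frozen == 0)
        return NODE_ALL_INFO;
    if (frozen == count - 1 && !is_frozen[first + count - 1])
        return NODE_REPETITION;
    if (frozen == 1 && is_frozen[first])
        return NODE_SPC;
    return NODE_GENERIC;
}
```

For a two-leaf node with pattern {frozen, information}, both the repetition and the single parity check conditions hold; the order of the tests resolves the tie in favour of the repetition node.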
The scheduling proposed in [36] is used, together with the general early termination rule for block codes given by the equation $\hat{x}H = 0$, where H is the parity-check matrix and $\hat{x}$ is the hard estimate of the LLR information updated at the right-most side of the decoding graph. From the points of view of both frame error rate and average number of decoding iterations, it is shown that the architecture in [44] and the proposed one have practically the same performance. The reduction in computation of the proposed architecture due to the new nodes is instead up to 40% at low SNR compared to [36].
Table 5.1, first part. Columns: [40], [41], [3], [36] P1, [36] P2, [43] P1, [43] P2 (values of each row in reading order; blank cells are omitted).
Decoding type: A; G; A
Block length: 1024; 256; 2048; 256; 1024; 1024; 1024
Rate: 0.5; 0.75; 0.5; 0.5; 0.5
Technology: FPGA; GPU; CMOS; CMOS
Process [nm]: 65; 45
Core area [mm²]: 1.476
Supply [V]: 1; 0.475
Frequency [MHz]: 160; 1660; 300; 50; 500
Power [mW]: 477.5; 18.6
Iterations: 5; 35; 10; 6.57 (c); 6.57 (c); 60
Throughput [Mb/s]: 27.83; 53.33; 1.23; 57.20; 37; 4676 (a); 779.3 (a); 426; 2000
Energy efficiency [pJ/bit]: 102.1; 23.8
Energy eff. per iter. [pJ/b/iter]: 15.54; 3.63
Area efficiency [Mb/s/mm²]: 3168; 528.0

Table 5.1, second part. Columns: [44], [44] P1, [44] P2, [46], [47], [50] (values of each row in reading order; blank cells are omitted).
Decoding type: A; C; SCAN; SCAN
Block length: 1024; 1024; 1024; 1024; 32768
Rate: 0.5; 0.5; 0.5; 0.9
Technology: CMOS; CMOS; FPGA; CMOS; CMOS
Process [nm]: 45; 45; 90; 90
Core area [mm²]: 0.747; 0.97; 4.734
Supply [V]: no entries
Frequency [MHz]: 500; 197; 90; 571; 518
Power [mW]: no entries
Iterations: 40; 23 (b); 30.7 (b); 15; 5; 1 (b); 1 (b)
Throughput [Mb/s]: 2900; 4500; 3500; 1683 (a); 17.56; 958 (a); 2208 (a)
Energy efficiency [pJ/bit]: 328; 220; 291
Energy eff. per iter. [pJ/b/iter]: no entries
Area efficiency [Mb/s/mm²]: 987; 466

(a) Coded throughput; (b) Early termination at 3.5 dB; (c) Early termination at 4 dB.
Table 5.1: Implemented BP architectures in literature.
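The derived figures of merit in Table 5.1 follow from the measured quantities by simple ratios: energy per bit is power divided by throughput, and area efficiency is throughput divided by core area. The sketch below makes the unit conversions explicit; the function names are hypothetical. As a consistency check, the values 477.5 mW, 4676 Mb/s, 1.476 mm² and 6.57 iterations that appear in the table give about 102.1 pJ/bit, 15.54 pJ/bit/iteration and 3168 Mb/s/mm², which are also listed there.

```c
/* Energy per decoded bit in pJ/bit from power in mW and throughput in Mb/s. */
double energy_pj_per_bit(double power_mw, double throughput_mbps)
{
    /* mW / (Mb/s) = nJ/bit, so multiply by 1000 to obtain pJ/bit. */
    return 1000.0 * power_mw / throughput_mbps;
}

/* Energy per bit and per iteration in pJ/bit/iteration. */
double energy_pj_per_bit_iter(double power_mw, double throughput_mbps,
                              double iterations)
{
    return energy_pj_per_bit(power_mw, throughput_mbps) / iterations;
}

/* Area efficiency in Mb/s/mm^2 from throughput in Mb/s and area in mm^2. */
double area_efficiency(double throughput_mbps, double area_mm2)
{
    return throughput_mbps / area_mm2;
}
```

Normalising the energy by the number of iterations makes designs with different early termination behaviour easier to compare.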
Chapter 6
Belief Propagation Decoder
hardware implementation
This chapter describes the architectures developed to evaluate the performance of hardware implementations.
A top-down approach is used to present the designed structures. An initial decision taken for the developed designs is to avoid constraining the architectures as in [42], [51]. Those solutions reduce hardware complexity and increase working frequency by focusing on a specific target code and optimizing the hardware accordingly, but the resulting designs lose the capability to decode different polar codes. In fact, the frozen bit information is fixed in hardware and cannot be changed during operation. For practical applications, this forces a redesign of the decoder architecture if improved polar codes are discovered. Moreover, it forces the final application to implement a specific decoder for every single polar code used for transmission. This lack of flexibility has been judged unacceptable for the development of practically usable designs.
6.1 Effects of scheduling algorithms on architectures
Scheduling Algorithms (SAs) directly affect the properties of the implemented architectures.
From the point of view of parallelism, architecture designs may be classified as
• Fully serial,
• Partially parallel,
• Fully parallel.
The serial implementation is clearly the simplest and most compact, but its limited computational resources make it the slowest. This solution can be sufficient to meet moderate throughput requirements. On the contrary, the fully parallel architecture can reach the highest speed but may face two fundamental problems:
• Lack of flexibility;
• Conflicts in memory access.
For the specific purpose of studying the achievable performance of polar code decoders, fully parallel implementations are preferred because they do not penalize throughput or increase latency; however, if the code dimensions become too large it is necessary to realize partially parallel architectures.
The previous considerations do not affect the possibility of reducing the hardware structure thanks to different schedulings while achieving maximum performance.
The following descriptions of SAs refer to the ones that have been implemented in a hardware design.
Figure 6.1: Scheduling C message update example.
For scheduling C, if all update messages are observed, when Left to Right (L → R) bidirectional updates are performed, the evaluated left-propagating messages are then rewritten during the Right to Left (R → L) stage updates, so the first update is lost without having been used. Generalizing this consideration allows moving from scheduling C to scheduling G without loss of performance. Moreover, the PE only needs to calculate half of the previous information, becoming unidirectional. Figure 6.1 presents the time evolution of the updates for a 4-state C scheduling (or equivalently a 3-stage decoder), with the same naming convention given in Section 3.5 (RF are a-priori information and Lch are channel LLRs). It is possible to see that in this example the LLR 2L is calculated in (L → R) but never used, so it is not necessary to save that value in memory. Moreover, the updated information is used only in the following time period; this observation guarantees that these messages can be overwritten in the data storage block of the implemented design.
These properties are exploited to reduce the hardware components in the final
architecture. Identical considerations lead from scheduling D to H. The architecture that
implements these observations is called the Reduced Complexity (RC) architecture.
By similar considerations, the hardware required to implement scheduling I can be
simplified without loss of performance. Only two operations are needed in each
state; these are executed by the Bidirectional Wave (BI-W) architecture.
To implement scheduling J, the number of stages must equal the number of
operations needed in one state. It can be considered the upper bound on the resources
implemented for the previous SAs. For this algorithm the Fully Parallel (FP)
architecture is used.
6.2 Hardware Implementation
6.2.1 Main Entity
Fully Parallel Architecture
Figure 6.2: Fully parallel architecture for a code of length N = 4.
Conceptually, the simplest approach to implementing the BP decoder is to map the
factor graph of Figure 3.9 directly to hardware, as shown in Figure 6.2. This fully
parallel (FP) architecture uses $\frac{N}{2}\log_2(N)$ PEs. A supplementary feature in
Figure 6.2 is the presence of switching networks, which are circuits implementing certain
permutation operations, such as the perfect shuffle described in Section 3.6. The decoder
contains banks of memory registers. The registers marked “L” hold left-going
LLR messages, while those marked “R” hold right-going LLR messages. The
R-registers at level 1 are initialized in accordance with (3.68) and the L-registers at
level n + 1 are initialized in accordance with (3.67). All remaining registers are
initialized to zero. Note that there is no need for any R-registers at the channel
level n + 1. Following the initialization, the registers are updated in accordance
with formulas (3.62)–(3.65) and with an update schedule. In order to
conserve energy, the update schedule should avoid executing an update when there
is no change in the messages coming from the neighbours.
When suitably controlled, this architecture can perform any scheduling algorithm
by construction; moreover, the a-priori information of the inner memory registers may be
preloaded to save computations and reduce dynamic energy consumption.
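As a quick check of the resource count quoted above, the following Python sketch evaluates the (N/2)·log2(N) expression. The function name `fp_pe_count` is illustrative and not taken from the design.

```python
def fp_pe_count(n_code: int) -> int:
    """Number of PEs in the fully parallel mapping: (N/2) * log2(N)."""
    if n_code < 2 or n_code & (n_code - 1):
        raise ValueError("block length must be a power of two, N >= 2")
    stages = n_code.bit_length() - 1  # log2(N) for a power of two
    return (n_code // 2) * stages


print(fp_pe_count(4))     # code length used in Figure 6.2
print(fp_pe_count(1024))  # code length used for the synthesized decoders
```

The count grows as N·log2(N), which is the area cost that the reduced architectures described next try to avoid.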
Reduced Complexity Architecture
Figure 6.3: Reduced complexity architecture.
The reduced complexity (RC) architecture depicted in Figure 6.3 is composed
of a single stage of PEs that receive the LLR values either from a register memory
block or from a RAM (or ROM if the polar code is fixed) memory R0 at the start of
each iteration. R0 stores the a-priori information of the frozen bits, while the memory
block stores (n + 1) × N LLR messages. The information from the channel is stored
in the rightmost memory column Ln, the final soft output is stored in the leftmost
memory column L0, and the other columns hold temporary information
for both left and right messages. This kind of memory reduction was
proposed earlier in [37]. The switching networks SWNL and SWNR permute
the LLR messages back and forth according to the reverse shuffle of Section 3.6. This
choice allows the LLR messages from the channel to be stored in the memory block in
their original order. The control unit is responsible for sending the correct signals and
addresses to select the proper outputs and store the data updates.
To save the initial a-priori computations, fixed results (allowed by the implemented G
scheduling) are preloaded into the register memory block.
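The permutation role of SWNL and SWNR can be illustrated with a short Python sketch. It assumes the definition commonly used in the polar coding literature, in which the reverse shuffle gathers the even-indexed entries (0-based) followed by the odd-indexed ones, and the perfect shuffle is its inverse. Section 3.6 remains the reference for the exact convention, and the function names are illustrative.

```python
def reverse_shuffle(msgs):
    """Even-indexed entries first, then odd-indexed entries (0-based)."""
    if len(msgs) % 2:
        raise ValueError("an even number of messages is required")
    return msgs[0::2] + msgs[1::2]


def perfect_shuffle(msgs):
    """Interleave the lower and upper halves; inverse of reverse_shuffle."""
    if len(msgs) % 2:
        raise ValueError("an even number of messages is required")
    half = len(msgs) // 2
    out = []
    for low, high in zip(msgs[:half], msgs[half:]):
        out += [low, high]
    return out


column = list(range(8))
print(reverse_shuffle(column))
print(perfect_shuffle(reverse_shuffle(column)))  # back to the original order
```

Because the two permutations undo each other, applying one on the way to the PE stage and the other on the way back leaves each memory column in its original order, which is the property the text relies on.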
Bidirectional Wave Architecture
The Bidirectional Wave (BI-W) architecture can be seen as an extension of the previous
RC architecture. As can be seen in Figure 6.4, the number of PE stages
is doubled, so two sets of LLR updates can be calculated in parallel. According
to the observations on I scheduling, the register memory block has size (n + 2) × N. As
can be seen from Figure 6.5, by assigning a specific memory organization that separates
even- and odd-index messages of the lower or upper half, the hardware complexity can
be reduced to four single double-port memory registers suitably
interconnected. This simplification is possible thanks to the message order specified by
I scheduling. The R0 RAM memory and the switching networks SWNL and SWNR are
unchanged.
Figure 6.4: Bidirectional wave architecture.
Figure 6.5: Bidirectional wave register memory block detail.
To enhance the performance of the final design, a modified I scheduling has
actually been implemented, as reported in Figure 6.6. From the computational point of
view the new scheduling introduces no changes; however, the new flow
map explicitly points out the operations that are effectively performed. The initial iR
messages are precomputed and loaded into the correct memory positions for a-priori LLRs,
so in the first iteration only half of the PE stages are used. In the last iteration, only
the left-going updates of the second half of the iteration's operations are necessary to
compute the final estimated decoded message. Combined, these observations allow the
decoding of a new codeword, if available, to start during the last iteration of the
previous one, saving half of the total number of operations of the first iteration.
Figure 6.6: Bidirectional wave architecture refined I scheduling.
6.2.2 Processing Element
The basic PE for the proposed BP decoding architectures is a component with
four inputs and four outputs that implements two sets of belief updates: a left-wave
update given by (3.62) and (3.63) and a right-wave update given by (3.64) and (3.65).
One implementation option is to define a full PE that can carry out the left- and
right-wave updates simultaneously. One may conserve hardware by noting the
congruent nature of the two updates and defining a half PE that can carry out either
the left or the right update at any given time.
Figure 6.7: Normalized min-sum block for two’s complement data representation.
With proper scheduling this half-size PE does not increase latency, while it reduces
complexity significantly. The normalized min-sum approximation is used to evaluate
part of the four outputs of the general PE component. For higher clock speed,
every half-size PE is implemented using the sign and modulus representation. This choice
results from comparing the two’s complement (2’c) and sign and modulus (SM)
solutions.
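The congruence between the two updates can be sketched in Python. The update rules below follow the form commonly found in the BP polar decoding literature and are an assumption here; they should be checked against (3.62)–(3.65). The names `g`, `half_pe`, `left_update` and `right_update` are illustrative, and floating-point values stand in for the fixed-point LLRs of the hardware.

```python
SCALE = 0.9375  # normalization factor of the min-sum approximation


def g(x, y):
    """Normalized min-sum approximation of the check-node function."""
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    return SCALE * sign * min(abs(x), abs(y))


def half_pe(p, q, r, s):
    """One wave of updates: two outputs from four inputs."""
    return g(p, q + s), g(p, r) + q


def left_update(l_a, l_b, r_a, r_b):
    # Left-going outputs from the right-side L inputs and left-side R inputs.
    return half_pe(l_a, l_b, r_a, r_b)


def right_update(l_a, l_b, r_a, r_b):
    # Same datapath with the roles of the L and R inputs exchanged.
    return half_pe(r_a, r_b, l_a, l_b)


print(left_update(2.0, -1.0, 4.0, 3.0))
print(right_update(2.0, -1.0, 4.0, 3.0))
```

Under this assumption both waves use one `g` plus adder path per output, so a single half PE with input multiplexing can serve either direction.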
Since the only detailed architecture of polar code PEs in the literature is
given in [3], both the 2’c and the SM basic PE components are presented here, to point
out the architectural improvements and to show how the synthesis results have been
achieved. In particular, no conversions between 2’c and SM are performed within the same
architecture, which saves the delay of the conversion blocks.
For the 2’c implementation, the sum is the classical sum block, and a min-sum
component specialized for 2’c is required. The min-sum component presented in
Figure 6.7 has been designed to achieve the best performance in terms
of minimum combinational delay.
The basic structure is made of four sum stages which evaluate $(1-2^{-\alpha})a$,
$(1-2^{-\alpha})b$ or their opposite values in parallel. A multiplexer selects the correct
output from each stage. The signals res_sum and res_dif are the carry/remainder of the
sum or subtraction operation on k + 1 bits of a and b; Φ is computed as
$\Phi = a_{k-1}b_{k-1} + b_{k-1}(a>b)_k + a_{k-1}(a>b)_k$.
Figure 6.8: Normalized min-sum block for modulus and sign data representation.
The normalization used is $(1-2^{-\alpha}) = 0.9375$ (that is, α = 4), so the scaling
operation can be implemented with a simple shift-and-add circuit as in [3]. To reduce the
rounding errors caused by truncation, the sum stage uses α decimal bits to represent data,
and an additional $2^{-\alpha}$ is added to approximate the round function. This
approximation behaves as true rounding for positive numbers; for negative values it differs
only when the decimal part is exactly −0.5, where the result is rounded up instead of down,
while in all other cases the round function is preserved. This relaxation makes it possible
to add just $2^{-\alpha}$ for both positive and negative 2’c numbers and still obtain the
correct behaviour.
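The shift-and-add scaling with add-then-truncate rounding can be sketched with Python integers, whose right shift is arithmetic like a two's complement shifter. The rounding offset used below (half of the output LSB) is an assumption chosen to reproduce the tie behaviour described above; the constant in the actual datapath may be expressed differently. The function name `scale_2c` is illustrative.

```python
ALPHA = 4  # 1 - 2**-4 = 0.9375


def scale_2c(x: int, alpha: int = ALPHA) -> int:
    """Multiply a two's complement integer by (1 - 2**-alpha) and round."""
    extended = (x << alpha) - x               # x*(2**alpha - 1), alpha fractional bits
    rounded = extended + (1 << (alpha - 1))   # assumed offset: half an output LSB
    return rounded >> alpha                   # arithmetic shift drops the fraction


for value in (16, -16, 8, -8):
    print(value, scale_2c(value))
```

For x = 8 the exact product is 7.5 and the result is 8, while for x = -8 the exact product is -7.5 and the result is -7: the negative tie is rounded up, and every other case matches ordinary rounding.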
The sign and modulus (SM) solution allows a simple comparison of the moduli
to implement the min-sum approximation. As can be seen in Figure 6.8, the
min-sum component reduces to a single comparison block, which directly computes the
sign of the difference between the two moduli, a multiplexer that selects the smaller one,
and normalization blocks similar to those that normalize the two inputs in the 2’c case
(here the moduli are always treated as positive inputs). To generate the sign
output, an XOR gate evaluates the sign of the result, except when at
least one of the two inputs is zero; this case is detected by a simple compare-to-zero
module, and the correct output is selected through a multiplexer.
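A behavioural Python sketch of this SM min-sum block is given below. Values are modelled as (sign, modulus) pairs with sign 1 meaning negative, which is an assumed encoding, and the rounding offset in the normalization is likewise an assumption rather than a detail confirmed by the text. The function name `sm_min_sum` is illustrative.

```python
def sm_min_sum(a, b, alpha=4):
    """Normalized min-sum on (sign, modulus) pairs; sign 1 means negative."""
    (sign_a, mod_a), (sign_b, mod_b) = a, b
    # Comparison block plus multiplexer: keep the smaller modulus.
    smallest = mod_a if mod_a < mod_b else mod_b
    # Normalization by (1 - 2**-alpha) on a non-negative value.
    scaled = ((smallest << alpha) - smallest + (1 << (alpha - 1))) >> alpha
    # XOR of the signs, overridden when either input is zero.
    either_zero = mod_a == 0 or mod_b == 0
    sign = 0 if either_zero else sign_a ^ sign_b
    return sign, scaled


print(sm_min_sum((1, 5), (0, 3)))
print(sm_min_sum((1, 0), (0, 7)))
```

The zero check keeps the output from becoming a negative zero, which is the reason the XOR result cannot be used unconditionally.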
The sum module becomes more complex in the SM case. As represented in Figure
6.9, the simple sum component of the 2’c case is replaced by one adder and two subtractor
modules, which perform all possible combinations of operations between the two inputs.
Figure 6.9: Sum block for modulus and sign data representation.
This component decomposes the operation A + B into all possible combinations and
evaluates the modulus of the output as:
\[
|O| =
\begin{cases}
|A| + |B| & \text{if } (A>0)(B>0) + (A<0)(B<0) \\
|A| - |B| & \text{if } (|A|>|B|)\,((A>0)(B<0) + (A<0)(B>0)) \\
|B| - |A| & \text{if } (|A|<|B|)\,((A>0)(B<0) + (A<0)(B>0))
\end{cases}
\]
When the signs agree, as indicated by the XOR of the sign inputs, the multiplexer selects
the output of the sum module; otherwise the remainder res_dif of the first subtractor
module is used to select between the outputs of the two subtractor modules.
To evaluate the sign of the sum, res_dif indicates which input has the
larger modulus, and the multiplexer selects the output sign accordingly. If the two inputs
have the same modulus but different signs, the sign output must be zero. To perform this
check a compare module is added; its result is ANDed with the XOR
of the sign inputs to form the first control signal of the sign-output multiplexer.
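The selection logic of the SM sum block can be mirrored in a few lines of Python. This is a behavioural sketch only: values are (sign, modulus) pairs with sign 1 meaning negative (an assumed encoding), and the helper names `sm_add`, `to_sm` and `from_sm` are illustrative.

```python
def to_sm(x):
    """Integer to (sign, modulus); zero is always encoded with sign 0."""
    return (1 if x < 0 else 0, abs(x))


def from_sm(value):
    sign, modulus = value
    return -modulus if sign else modulus


def sm_add(a, b):
    """Add two sign-modulus values by selecting among three candidates."""
    (sign_a, mod_a), (sign_b, mod_b) = a, b
    # The three candidates are formed in parallel, as in the hardware.
    total = mod_a + mod_b
    a_minus_b = mod_a - mod_b
    b_minus_a = mod_b - mod_a
    if sign_a == sign_b:          # signs agree: take the adder output
        return sign_a, total
    if a_minus_b > 0:             # |A| > |B|: result keeps the sign of A
        return sign_a, a_minus_b
    if b_minus_a > 0:             # |B| > |A|: result keeps the sign of B
        return sign_b, b_minus_a
    return 0, 0                   # equal moduli, opposite signs: positive zero


print(sm_add(to_sm(5), to_sm(-3)))
print(sm_add(to_sm(-5), to_sm(5)))
```

The final branch corresponds to the extra compare module: without it the sign of an exact cancellation would depend on the arbitrary choice between the two subtractors.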
6.3 Synthesis results
Both the SM and the 2’c solutions for the PE have been implemented in VHDL, validated and
synthesized using Synopsys Design Compiler (version Z-2007.03-SP1) and the
STMicroelectronics standard cell library for 45 nm CMOS VLSI design. The results obtained
for the 2’c PE and the SM PE are presented in Table 6.1.
Property            Unit   SM        2’c
Critical Path       ns     0.95      1.05
Combinational Area  µm²    2873.91   1482.82
Switching power     mW     0.4905    0.2117
Total power         mW     0.8941    0.3829
Table 6.1: Comparison of developed PEs
To the best of our knowledge, the presented critical path results are the best among those
available in the literature. The fast solution proposed takes 10% less time to compute its
outputs; as a drawback, the combinational area increases by 94%, and the switching and
total power increase by 131% and 133% respectively for the single PE.
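The percentages quoted for the single PE follow directly from Table 6.1; the short Python sketch below recomputes them with the 2’c PE as the reference. Variable and key names are illustrative.

```python
# Table 6.1 values: sign-modulus (SM) PE and two's complement (2'c) PE.
sm_pe = {"critical_path_ns": 0.95, "area_um2": 2873.91,
         "switching_mw": 0.4905, "total_mw": 0.8941}
tc_pe = {"critical_path_ns": 1.05, "area_um2": 1482.82,
         "switching_mw": 0.2117, "total_mw": 0.3829}


def percent_change(new, reference):
    """Relative change of `new` with respect to `reference`, in percent."""
    return 100.0 * (new - reference) / reference


changes = {key: percent_change(sm_pe[key], tc_pe[key]) for key in sm_pe}
for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
```

The computed figures agree with the rounded values in the text to within about one percentage point.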
Both PE structures have been used to implement the proposed decoder architectures.
The main features of the implemented designs are reported in Table 6.2.
As expected, the maximum achievable throughput is obtained by the sign and modulus
(SM) Fully Parallel (FP) architecture.
The speed-up introduced by selecting the best data representation for both FP and
RC architectures in terms of frequency and throughput is only 1%. This is
attributable to the close PE results in terms of critical delay.
An unexpected result also appears: as can be seen from the core area figures,
the synthesis tool performs better, in terms of both maximum achievable
frequency and core area, on the larger SM structures, by a constant ratio above 11%.
This result has been exploited by implementing the Bidirectional Wave (BI-W)
architecture only for the SM case. For the BI-W architecture the throughput is computed for
single codeword decoding. In this worst-case condition, the throughput over area ratio
is about 33% better than that of the RC solution.
The energy has been estimated by propagating the switching activity
from the input ports, assumed uniformly distributed, through all internal nets of the
architecture. It appears that BI-W uses 15% more energy per bit than the RC solution.
Data representation                2’s complement      sign-modulus
Property               Unit        FP        RC        FP        RC        BI-W
Decoding Scheduling                BP J      BP G      BP J      BP G      BP I
Block length                       1024      1024      1024      1024      1024
Rate                               0.5       0.5       0.5       0.5       0.5
CMOS Process           nm          45        45        45        45        45
Core area              mm²         12.46     1.65      10.89     1.48      1.93
Supply                 V           1         1         1         1         1
Frequency              MHz         606       555       625       588       571
Power                  mW          2056.5    328.4     1846.7    331.0     638.14
Iterations                         65        9         65        9         10
Throughput             Mb/s        4773      1754      4923      1858      3215
Energy efficiency      pJ/bit      430.82    187.22    375.11    178.09    198.48
Energy eff. per iter.  pJ/b/iter   6.63      20.80     5.77      19.79     19.85
Area efficiency        Mb/s/mm²    383.10    1063.08   452.14    1253.96   1665.84
Table 6.2: Comparison of developed polar decoders
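The last three rows of Table 6.2 can be recomputed from the rows above them. The Python sketch below assumes the usual definitions (energy per bit as power divided by throughput, area efficiency as throughput divided by core area), which reproduce the tabulated values to within the rounding of the inputs. The dictionary keys and the function name `derived_metrics` are illustrative.

```python
# Primary figures copied from Table 6.2.
designs = {
    "FP 2c":   {"area_mm2": 12.46, "power_mw": 2056.5, "iterations": 65, "throughput_mbps": 4773},
    "RC 2c":   {"area_mm2": 1.65,  "power_mw": 328.4,  "iterations": 9,  "throughput_mbps": 1754},
    "FP SM":   {"area_mm2": 10.89, "power_mw": 1846.7, "iterations": 65, "throughput_mbps": 4923},
    "RC SM":   {"area_mm2": 1.48,  "power_mw": 331.0,  "iterations": 9,  "throughput_mbps": 1858},
    "BI-W SM": {"area_mm2": 1.93,  "power_mw": 638.14, "iterations": 10, "throughput_mbps": 3215},
}


def derived_metrics(design):
    # mW / (Mb/s) is nJ/bit, so multiply by 1000 to obtain pJ/bit.
    energy = 1000.0 * design["power_mw"] / design["throughput_mbps"]
    return {
        "energy_pj_per_bit": energy,
        "energy_pj_per_bit_iter": energy / design["iterations"],
        "area_eff_mbps_per_mm2": design["throughput_mbps"] / design["area_mm2"],
    }


metrics = {name: derived_metrics(d) for name, d in designs.items()}
for name, values in metrics.items():
    print(name, {key: round(v, 2) for key, v in values.items()})

gain = metrics["BI-W SM"]["area_eff_mbps_per_mm2"] / metrics["RC SM"]["area_eff_mbps_per_mm2"]
print(f"BI-W vs RC (SM) area efficiency: {100 * (gain - 1):.0f}% higher")
```

With these definitions the BI-W area efficiency comes out roughly one third higher than that of the SM RC design, in line with the figure quoted in the text.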
Chapter 7
Conclusions
7.1 Achieved results
In Table 7.1 the designed architectures are compared with the main Belief Propagation (BP)
solutions available in the literature. As expected, the Fully Parallel (FP) architecture
achieves the maximum throughput of almost 5 Gb/s, about 2 times that of the
highest-throughput design in the literature. The cost of this performance is the area
occupied to implement all the PEs. To overcome this problem, the research has been oriented
towards optimizing the area efficiency. The two best scheduling candidates have
been implemented. The Reduced Complexity (RC) architecture reaches performance
comparable with existing solutions, while the Bidirectional Wave (BI-W) architecture
improves the throughput over area ratio by more than 33% compared to the second
candidate (and to the literature). This qualifies the BI-W architecture as the best decoder
design in terms of throughput over area.
From the energy efficiency point of view, the results show that the RC architecture
improves on the best result in the literature by about 46%, while the BI-W architecture
reduces this gain by only 6%.
7.2 Future work
Many paths are open for the future development of the current work:
Polar codes algorithmic improvement. The developed C software for simulation
allows many aspects of polar codes to be inspected; in particular, the choice of the
Frozen Bit Set can improve decoding performance without any architectural
modification. The frozen bits depend on, and can be optimized for, the considered channel
but also the decoding algorithm. The sets currently used come from the best set for
the BEC or from the set optimized for SC decoding over the AWGN channel at 0 dB of SNR.
The C software could be used as a starting point to optimize the set, via Monte Carlo
simulations (or other kinds of optimization), for BP decoding specifically and for the
target channel.
Evaluation of early stopping rules. Starting from the existing early stopping
rules, it is left open to implement them in order to enhance the maximum achievable
throughput of the presented architectures by reducing the total number of
iterations under good message SNR conditions. Moreover, [44] claims that an
H-matrix-based approach, similar to the one used for most block codes (like LDPC
codes), cannot be exploited because of the update of the channel message $\hat{x}$ needed
to test whether $\hat{x}H^T = 0$, where H is the parity-check matrix. However, this
statement is incorrect (as confirmed in [51]): it is in fact possible to compute
$\hat{x}_j = \ell_{n+1,j} + r_{n+1,j}$, where $1 \le j \le N$ and $N = 2^n$ is the codeword
length. Usually this information is not used (and not computed), because the decoder
exploits the property of polar codes that the trellis graph already provides the uncoded
result. It would therefore be interesting to compare their proposed G-matrix-based solution
with the H-matrix-based one in terms of hardware complexity.
ASIC realization. The best BP architectures presented could be further studied to
obtain an actual ASIC chip that targets a gate-level technology. Constraining
the architecture to the available hardware allows the real performance to be evaluated
and compared with the post-synthesis results.
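The H-matrix-style test mentioned among the early stopping rules can be sketched in Python as follows. Several assumptions are made that are not stated in the text: frozen bits are all zero, the generator is the plain Kronecker power of F = [[1, 0], [1, 1]] without bit-reversal (so that it is its own inverse over GF(2) and the parity checks are its columns at the frozen positions), and a non-negative LLR is decided as bit 0. The frozen set and all function names are hypothetical.

```python
def polar_transform_matrix(n):
    """n-th Kronecker power of F = [[1, 0], [1, 1]] as a list of rows."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        g = [row + [0] * size for row in g] + [row + row for row in g]
    return g


def gf2_vec_mat(vec, mat):
    """Row vector times matrix over GF(2)."""
    return [sum(vec[i] & mat[i][j] for i in range(len(vec))) % 2
            for j in range(len(mat[0]))]


def hard_decision(l_msgs, r_msgs):
    """Hard decision on l + r at the channel level (sign convention assumed)."""
    return [0 if (l + r) >= 0 else 1 for l, r in zip(l_msgs, r_msgs)]


def parity_checks_satisfied(x_hat, frozen, g):
    """True when the frozen positions of x_hat * G are all zero."""
    u_hat = gf2_vec_mat(x_hat, g)  # G is an involution over GF(2)
    return all(u_hat[i] == 0 for i in frozen)


G4 = polar_transform_matrix(2)
FROZEN = [0, 1]                       # hypothetical frozen set for N = 4
l_level = [2.0, -1.5, 0.5, -3.0]      # illustrative left-going LLRs
r_level = [0.5, -0.5, 1.0, 0.0]       # illustrative right-going LLRs
x_hat = hard_decision(l_level, r_level)
print(x_hat, parity_checks_satisfied(x_hat, FROZEN, G4))
```

In a decoder the check would run once per iteration, and decoding would stop as soon as it succeeds; the cost to weigh against a G-matrix-based rule is the extra adders needed to form l + r at the channel level.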
Property               Unit       [36]     [43] P2  [44]     [46]     [50]     FP       RC       BI-W
Decoding Scheduling               BP G     BP A     BP A     BP C     SCAN     BP J     BP G     BP I
Block length                      1024     1024     1024     1024     1024     1024     1024     1024
Rate                              0.5      0.5      0.5      0.5      0.5      0.5      0.5      0.5
Technology                        CMOS     CMOS     CMOS     CMOS     CMOS     CMOS     CMOS     CMOS
Process                nm         65       45       45       45       90       45       45       45
Core area              mm²        1.48     4.80†    4.80†    0.75     0.97     10.89    1.48     1.93
Supply                 V          1        -        -        -        -        1        1        1
Frequency              MHz        300      500      500      197      571      625      588      571
Power                  mW         477.5    -        -        -        -        1846.7   331.0    638.14
Iterations                        15       60       40       15       1        65       9        10
Throughput∗            Mb/s       1024     2000     2900     841      479      4923     1858     3215
Energy efficiency      pJ/bit     466.31   328      328      -        -        375.11   178.09   198.48
Energy eff. per iter.  pJ/b/iter  31.09    5.47     8.20     -        -        5.77     19.79    19.85
Area efficiency        Mb/s/mm²   693.77   619.17   897.80   1126.50  493.81   452.14   1253.96  1665.84
Normalized to 45 nm according to ITRS roadmap
Throughput∗            Mb/s       1263     2000     2900     841      652.42   4923     1858     3215
Energy efficiency      pJ/b       683.09   328      328      -        -        375.11   178.09   198.48
Area efficiency        Mb/s/mm²   1250.21  619.17   897.80   1126.50  1195.74  452.14   1253.96  1665.84
∗Throughput obtained by disabling the BP early-stopping rules for fair comparison.
†Estimation from gate count.
Table 7.1: Comparison of implemented architectures with literature.
Bibliography
[1] Matthieu Bloch. A (very) brief history of coding theory.
http://users.ece.gatech.edu/mbloch/sp10_ece6606/coding_history.pdf.
[2] John G. Proakis and Masoud Salehi. Digital communications. McGraw-Hill Higher
Education, Boston, 5th edition, 2008.
[3] Bo Yuan and K.K. Parhi. Architecture optimizations for BP polar decoders.
In 2013 IEEE International Conference on Acoustics, Speech and Signal Pro-
cessing (ICASSP), pages 2654–2658, May 2013.
[4] A. Eslami and H. Pishro-nik. On Finite-Length Performance of Polar Codes:
Stopping Sets, Error Floor, and Concatenated Design. IEEE Transactions on
Communications, 61(3):919–929, March 2013.
[5] C. E. Shannon. A mathematical theory of communication. The Bell System
Technical Journal, 27(3):379–423, July 1948.
[6] D. J. Costello and G. D. Forney. Channel coding: The road to channel capacity.
Proceedings of the IEEE, 95(6):1150–1177, June 2007.
[7] R. W. Hamming. Error detecting and error correcting codes. Bell Syst. Tech.
J., 29:147–160, 1950.
[8] M. J. E. Golay. Notes on digital coding. Proc. IRE, 37:657, June 1949.
[9] D. E. Muller. Application of boolean algebra to switching circuit design and to
error detection. IRE Trans. Electron. Comput., EC-3:6–12, September 1954.
[10] I. S. Reed. A class of multiple-error-correcting codes and the decoding scheme.
IRE Trans. Inform. Theory, IT-4:38–49, September 1954.
[11] D. Slepian. A class of binary signal alphabets. Bell Syst. Tech. J., 35:203–234,
1956.
[12] A. Hocquenghem. Codes correcteurs d’erreurs. Chiffres, 2:147–156, 1959.
[13] R. C. Bose and D. K. Ray-Chaudhuri. On a class of error correcting binary
group codes. Inform. Contr., 3:68–79, 1960.
[14] I. S. Reed and G. Solomon. Polynomial codes over certain finite fields. J.
SIAM, 8:300–304, June 1960.
[15] R. A. Silverman and M. Balser. Coding for constant-data-rate systems. IRE
Trans. Inform. Theory, PGIT-4:50–63, September 1954.
[16] P. Elias. Coding for noisy channels. IRE Conv. Rec., 4:37–46, March 1955.
[17] P. Elias. Error-free coding. IRE Trans. Inform. Theory, IT-4:29–37, September
1954.
[18] R. Gallager. Low-density parity-check codes. IRE Transactions on Information
Theory, 8(1):21–28, January 1962.
[19] G. D. Forney, Jr. Concatenated Codes. MIT Press, Cambridge, MA, 1966.
[20] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error-
correcting coding and decoding: Turbo codes. In Proc. 1993 Int. Conf. Com-
mun., pages 1064–1070, Geneva, Switzerland, May 1993.
[21] D. J. C. MacKay and R. M. Neal. Good codes on very sparse matrices. In
C. Boyd, editor, Proc. Cryptography Coding. 5th IMA Conf., pages 100–111,
Berlin, Germany, 1995. Springer.
[22] D. J. C. MacKay and R. M. Neal. Near Shannon limit performance of low-
density parity-check codes. Elect. Lett., 32:1645–1646, August 1996.
[23] E. Arikan. Channel polarization: A method for constructing capacity-achieving
codes. In IEEE International Symposium on Information Theory, 2008. ISIT
2008, pages 1173–1177, July 2008.
[24] C. Cahn. Combined Digital Phase and Amplitude Modulation Communica-
tion Systems. IRE Transactions on Communications Systems, 8(3):150–155,
September 1960.
[25] Fuqin Xiong. Digital Modulation Techniques. Artech House, Boston, 2nd edition,
2006.
[26] J. Hancock and R. Lucky. Performance of Combined Amplitude and Phase-
Modulated Communication Systems. IRE Transactions on Communications
Systems, 8(4):232–237, December 1960.
[27] C. Campopiano and B. Glazer. A Coherent Digital Amplitude and Phase Mod-
ulation Scheme. IRE Transactions on Communications Systems, 10(1):90–95,
March 1962.
[28] E. Arıkan. On the Origin of Polar Coding. IEEE Journal on Selected Areas in
Communications, 34(2):209–223, February 2016.
[29] Robert G. Gallager. Information Theory and Reliable Communication. Wiley,
1968.
[30] Frank J. Aherne, Neil A. Thacker, and Peter I. Rockett. The Bhattacharyya
metric as an absolute similarity measure for frequency coded data. Kybernetika,
34(4):363–368, 1998.
[31] E. Arikan. Channel combining and splitting for cutoff rate improvement. IEEE
Transactions on Information Theory, 52(2):628–639, February 2006.
[32] E. Arıkan. Channel polarization: A method for constructing capacity-achieving
codes for symmetric binary-input memoryless channels. IEEE Transactions on
Information Theory, 55(7):3051–3073, July 2009.
[33] G.D. Forney Jr. Codes on graphs: normal realizations. IEEE Transactions on
Information Theory, 47(2):520–548, February 2001.
[34] E. Arıkan. Polar codes: A pipelined implementation. In Proc. 4th Int. Symp.
Broadband Communication, Melaka, Malaysia, 11-14 July 2010.
[35] N. Hussami, R. Urbanke, and S.B. Korada. Performance of polar codes for
channel and source coding. In Information Theory, 2009. ISIT 2009. IEEE
International Symposium on, pages 1488–1492, June 28–July 3, 2009.
[36] Youn Sung Park, Yaoyu Tao, Shuanghong Sun, and Zhengya Zhang. A 4.68Gb/s
belief propagation polar decoder with bit-splitting register file. In 2014 Sym-
posium on VLSI Circuits Digest of Technical Papers, pages 1–2, June 2014.
[37] Youn Sung Park. Energy-Efficient Decoders of Near-Capacity Channel Codes.
PhD thesis, 2014.
[38] F. Guilloud, E. Boutillon, and J.L. Danger. λ-min decoding algorithm of regular
and irregular LDPC codes. In International Symposium on Turbo Codes and
Related Topics (ISTC), pages 451–454. IEEE, September 2003.
[39] A. Eslami and H. Pishro-Nik. On bit error rate performance of polar codes in
finite regime. In 2010 48th Annual Allerton Conference on Communication,
Control, and Computing (Allerton), pages 188–194, September 2010.
[40] A. Pamuk. An FPGA implementation architecture for decoding of polar codes.
In 2011 8th International Symposium on Wireless Communication Systems
(ISWCS), pages 437–441, November 2011.
[41] R.L. Bharath Kumar and N. Chandrachoodan. A GPU implementation of belief
propagation decoder for polar codes. In 2012 Conference Record of the Forty
Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR),
pages 1272–1276, November 2012.
[42] Y. Zhang, Q. Zhang, X. Pan, Z. Ye, and C. Gong. A simplified belief prop-
agation decoder for polar codes. In Wireless Symposium (IWS), 2014 IEEE
International, pages 1–4, March 2014.
[43] B. Yuan and K. K. Parhi. Architectures for polar BP decoders using folding. In
2014 IEEE International Symposium on Circuits and Systems (ISCAS), pages
205–208, June 2014.
[44] Bo Yuan and K.K. Parhi. Early Stopping Criteria for Energy-Efficient Low-
Latency Belief-Propagation Polar Code Decoders. IEEE Transactions on Signal
Processing, 62(24):6496–6506, December 2014.
[45] Bo Yuan and K.K. Parhi. Algorithm and architecture for hybrid decoding
of polar codes. In 2014 48th Asilomar Conference on Signals, Systems and
Computers, pages 2050–2053, November 2014.
[46] J. Sha, X. Liu, Z. Wang, and X. Zeng. A memory efficient belief propagation
decoder for polar codes. China Communications, 12(5):34–41, May 2015.
[47] G. Berhault, C. Leroux, C. Jego, and D. Dallet. Hardware implementation of
a soft cancellation decoder for polar codes. In 2015 Conference on Design and
Architectures for Signal and Image Processing (DASIP), pages 1–8, September
2015.
[48] C. Leroux, A. J. Raymond, G. Sarkis, and W. J. Gross. A Semi-Parallel
Successive-Cancellation Decoder for Polar Codes. IEEE Transactions on Signal
Processing, 61(2):289–299, January 2013.
[49] S. M. Abbas, Y. Fan, J. Chen, and C. Y. Tsui. Low complexity belief propaga-
tion polar code decoder. In 2015 IEEE Workshop on Signal Processing Systems
(SiPS), pages 1–6, October 2015.
[50] J. Lin, C. Xiong, and Z. Yan. Reduced complexity belief propagation decoders
for polar codes. In 2015 IEEE Workshop on Signal Processing Systems (SiPS),
pages 1–6, October 2015.
[51] J. Xu, T. Che, and G. Choi. XJ-BP: Express Journey Belief Propagation
Decoding for Polar Codes. In 2015 IEEE Global Communications Conference
(GLOBECOM), pages 1–6, December 2015.
[52] Yan Zhang, Laurence T. Yang, and Jiming Chen. RFID and sensor networks:
architectures, protocols, security and integrations. CRC Press, 2010.
[53] A. Viterbi. Information theory in the sixties. IEEE Transactions on Informa-
tion Theory, 19(3):257–262, May 1973.
[54] R. Koetter and P. Vontobel. Graph-covers and iterative decoding of finite length
codes. In 3rd International Symposium on Turbo Codes and Related Topics, 1-5
Sept. 2003.
[55] David J.C. MacKay and Michael S. Postol. Weaknesses of Margulis and Ramanujan-
Margulis low-density parity-check codes. Electronic Notes in Theoretical Com-
puter Science, 74, 2003.
[56] D. Oh and K.K. Parhi. Min-sum decoder architectures with reduced word length
for LDPC codes. IEEE Transactions on Circuits and Systems, 57(1):105–115,
January 2010.
