Recursive Descriptions of Polar Codes by Presman, Noam & Litsyn, Simon
arXiv:1209.4818v3 [cs.IT] 18 Jun 2015
Recursive Descriptions of Polar Codes
Noam Presman and Simon Litsyn∗
Abstract
Polar codes are recursive general concatenated codes. This property motivates a recursive formalization
of the known decoding algorithms: Successive Cancellation, Successive Cancellation with Lists
and Belief Propagation. Such descriptions allow an easy development of these algorithms for
arbitrary polarizing kernels. Hardware architectures for these decoding algorithms are also described
in a recursive way, both for Arikan’s standard polar codes and for arbitrary polarizing kernels.
1 Introduction
Polar codes were introduced by Arikan [1] and provided a scheme for achieving the symmetric capacity of
binary memoryless channels (B-MC) with polynomial encoding and decoding complexities. Arikan used
the so-called (u+ v, v) construction, which is based on the following linear kernel
$$G_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$$
In this scheme, a $2^n \times 2^n$ matrix, $G_2^{\otimes n}$, is generated by taking the $n$-th Kronecker power of $G_2$. An input
vector u of length $N = 2^n$ is transformed into an N length vector x by multiplying a certain permutation
of the vector u by $G_2^{\otimes n}$. The vector x is transmitted over N independent copies of the memoryless
channel, W . This results in new N (dependent) channels between the individual components of u and
the outputs of the channels. Arikan showed that these channels exhibit the phenomenon of polarization
under Successive Cancellation (SC) decoding. This means that as n grows, there is a proportion of I(W)
(the symmetric channel capacity) of the channels that become clean channels (i.e. having the capacity
approaching 1) and the rest of the channels become completely noisy (i.e. with the capacity approaching
0). Arikan showed that the SC decoding algorithm has an algorithmic time and space complexity which
is O(N · log(N)) (the same asymptotic complexities apply also for the encoding algorithm). Furthermore,
it was shown [2] that asymptotically in the block length N , the block error probability of this scheme
decays to zero like $O(2^{-\sqrt{N}})$.
Generalizations of Arikan’s code structures were soon to follow. Korada et al. considered binary and
linear kernels [3]. They showed that a binary linear kernel is polarizing if and only if there does not exist a
column permutation of its generating matrix which is upper-triangular, and analyzed its rate of polarization,
by introducing the notion of the kernel exponent. Mori and Tanaka considered the general case of a
mapping g(·), which is not necessarily linear and binary, as a basis for channel polarization constructions
[4]. They gave sufficient conditions for polarization and generalized the exponent for these cases. They
further showed examples of linear and non-binary Reed-Solomon codes and Algebraic Geometry codes
with exponents that are far better than the exponents of the known binary kernels [5]. The authors of
this correspondence gave examples of binary but non-linear kernels having the optimal exponent per their
kernel dimensions [6, 7].
∗Noam Presman and Simon Litsyn are with the School of Electrical Engineering, Tel Aviv University, Ramat Aviv
69978 Israel. (e-mails: {presmann, litsyn}@eng.tau.ac.il.).
All of the aforementioned polar code structures have homogenous kernels, meaning that the alphabets
of their inputs and their outputs are the same. The authors of this correspondence considered the case
in which some of the inputs of a kernel may have a different alphabet than the rest of the inputs [8]. This results
in the so-called mixed-kernels structures, which have demonstrated good performance for finite length codes
in many cases. A further generalization of the polar code structure was suggested by Trifonov [9], in which
the outer polar codes were replaced by suitable codes along with their appropriate decoding algorithms.
We note here, that the representation of polar codes as instances of general concatenated codes (GCC) is
fundamental to this correspondence, and we elaborate on it in the sequel.
Generalizations and alternatives to SC as the decoding algorithm were also extensively studied. Tal
and Vardy introduced the Successive Cancellation List (SCL) decoder [10, 11]. In this algorithm, the
decoder considers up to L concurrent decoding alternatives on each one of its stages, where L is the size of
the list. At the final stage of the algorithm, the most likely result is selected from the list. The asymptotic
time and space complexities of this decoder are the same as those of the standard SC algorithm, multiplied
by L. Furthermore, incorporation of a cyclic redundancy check code (CRC) as an outer-code, results in a
scheme with an excellent error-correcting performance, which in many cases is comparable with state of
the art schemes (see e.g. [11, Section V]).
Belief-Propagation is an alternative to the SC decoding algorithm. This is a message passing iterative
decoding algorithm that operates on the normal factor graph representation of the code. It is known
to outperform SC over the Binary Erasure Channel (BEC) [12] and seems to have good performance on
other channels as well [12, 13].
Leroux et al. considered efficient hardware implementations for the SC decoder of the (u+ v, v) polar
code [14, 15]. They gave an explicit design of a "line decoder" with N/2 processing elements and O(N)
memory elements. Their work contains an efficient approximate min-sum decoder and a discussion of
a fixed point implementation. Their design is verified by an ASIC synthesis. Efficient limited parallelism
decoders were considered by Leroux et al. [16] and by Pamuk and Arikan [17]. Hardware implementations
of the SCL decoder were discussed by Balatsoukas-Stimming et al. [18, 19]. Pamuk considered a
hardware design of a BP decoder tailored for an FPGA implementation [20].
The goal of this paper is to emphasize the formalization of polar codes as recursive GCCs and the
implication of this property on the encoding and decoding algorithms. The main contributions of this
manuscript are as follows: 1) Formalizing Tal and Vardy’s SCL as a recursive algorithm, and thereby
generalizing it to arbitrary kernels. 2) Formalizing Leroux et al.’s SC line decoder and generalizing it to
arbitrary kernels. 3) Defining a BP decoder with GCC schedule, and suggesting a BP line architecture
for it.
The paper is organized as follows. In Section 2 we describe polar code kernels as the generating
building blocks of polar codes. We then elaborate on the fact that polar codes are examples of recursive
GCC structures. This fundamental notion is the motivation for formalizing the encoding and decoding
algorithms in a recursive fashion in Sections 3 and 4, respectively. In particular, we study the standard
SC, the SCL (both for Arikan’s kernels and arbitrary ones) and BP (for linear lower triangular kernels)
decoding algorithms. These formalizations lay the ground for schematic architectures of the decoding
algorithms in Subsection 5.1. Specifically, we restate Leroux et al.’s SC pipeline and SC line decoders,
and introduce a line decoder for the GCC schedule of the BP algorithm. Finally, in Subsection 5.2, we
consider generalizations of these architectures for arbitrary kernels.
2 Preliminaries
Throughout we use the following notations. For a natural number ℓ, we denote $[\ell] = \{1, 2, 3, \ldots, \ell\}$ and
$[\ell]_- = \{0, 1, 2, \ldots, \ell-1\}$. We represent vectors by bold letters. For $i \ge j$, let $\mathbf{u}_j^i = [u_j\ u_{j+1}\ \ldots\ u_i]$ be the
sub-vector of u of length $i-j+1$ (if $i < j$ we say that $\mathbf{u}_j^i = [\,]$, the empty vector, and its length is 0). For
two vectors u and v of lengths $n_u$ and $n_v$, we denote the $n_u + n_v$ length vector which is the concatenation
of u to v by [u, v] or [u v] or u • v or just uv. For a scalar x, the $n_u + 1$ length vector u • x is just the
concatenation of the vector u with the length one vector containing x. Matrices are denoted by boldface
capital letters. We denote the set of all the matrices of $n_1$ rows and $n_2$ columns over a field F by $F^{n_1 \times n_2}$.
Let $A \in F^{n_1 \times n_2}$. We denote row i (column j) of the matrix by $A_{i\to}$ ($A_{\downarrow j}$). The element at row i and
column j is denoted by $A_{i,j}$. The sub-matrix containing only rows $i_1 \le i \le i_2$ and columns $j_1 \le j \le j_2$ is
denoted as $A_{i_1:i_2,\, j_1:j_2}$.
In this paper we consider kernels that are based on bijective transformations over a field F. A channel
polarization kernel of ℓ dimensions, denoted by g(·), is a mapping
$$g : F^{\ell} \to F^{\ell}.$$
This means that $g(\mathbf{u}) = \mathbf{x}$, where $\mathbf{u}, \mathbf{x} \in F^{\ell}$.
We refer to this type of kernel as a homogenous kernel, because its ℓ input coordinates and ℓ output
coordinates are from the same alphabet F. Symbols from an alphabet F are called F-symbols in this
paper. The homogenous kernel g(·) may generate a polar code of length $\ell^n$ F-symbols by inducing a
larger mapping from it, in the following way [4].
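Definition 1 (Homogenous Polar Code Generation) Given an ℓ dimensions transformation g(·),
we construct a mapping $g^{(n)}(\cdot)$ of $N = \ell^n$ dimensions (i.e. $g^{(n)}(\cdot) : F^{\ell^n} \to F^{\ell^n}$) in the following recursive
fashion.
$$g^{(1)}\left(\mathbf{u}_0^{\ell-1}\right) = g\left(\mathbf{u}_0^{\ell-1}\right);$$
$$\text{for } n > 1,\quad g^{(n)} = \Big[\, g\left(\gamma_{0,0}, \gamma_{1,0}, \gamma_{2,0}, \ldots, \gamma_{\ell-1,0}\right),\ g\left(\gamma_{0,1}, \gamma_{1,1}, \gamma_{2,1}, \ldots, \gamma_{\ell-1,1}\right),\ \ldots,\ g\left(\gamma_{0,N/\ell-1}, \gamma_{1,N/\ell-1}, \gamma_{2,N/\ell-1}, \ldots, \gamma_{\ell-1,N/\ell-1}\right) \Big],$$
where
$$\left[\gamma_{i,j}\right]_{j=0}^{N/\ell-1} = g^{(n-1)}\left(\mathbf{u}_{i\cdot(N/\ell)}^{(i+1)\cdot(N/\ell)-1}\right),\quad i \in [\ell]_-.$$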
2.1 Polar Codes as Recursive General Concatenated Codes
General Concatenated Codes (GCC)¹ are error correcting codes that are constructed by a technique which
was introduced by Blokh and Zyablov [22] and Zinoviev [23]. In this construction, we have ℓ outer-codes
$\{C_i\}_{i=0}^{\ell-1}$, where $C_i$ is an $N_{out}$ length code of size $M_i$ over alphabet $F_i$. We also have an inner-code of length
$N_{in}$ and size $\prod_{i=0}^{\ell-1} |F_i|$ over alphabet F, with a nested encoding function $\phi : F_0 \times F_1 \times \ldots \times F_{\ell-1} \to F^{N_{in}}$.
The GCC that is generated by these components is a code of length $N_{out} \cdot N_{in}$ symbols and of size $\prod_{i=0}^{\ell-1} M_i$.
It is created by taking an $\ell \times N_{out}$ matrix, in which the $i$-th row is a codeword from $C_i$, and applying the
inner mapping φ on each of the $N_{out}$ columns of the matrix. As Dumer describes in his survey [24], GCCs
can give good code parameters for short length codes when using appropriate combinations of outer-codes
and a nested inner-code. In fact, some of them give the best parameters known. Moreover, decoding
algorithms may utilize their structure by performing local decoding steps on the outer-codes and utilizing
the inner-code layer for exchanging decisions between the outer-codes.
As Arikan already noted, polar codes are instances of recursive GCCs [1, Section I.D]. This observation
is useful as it allows us to formalize the construction of a long polar code as a concatenation of several
shorter polar codes (outer-codes) by using a kernel mapping (an inner-code). Therefore, applying
this notion to Definition 1, we observe that a polar code of length $\ell^n$ symbols may be regarded as a
collection of ℓ outer polar codes of length $\ell^{n-1}$ (the $i$-th outer-code is
$\left[\gamma_{i,j}\right]_{j=0}^{N/\ell-1} = g^{(n-1)}\left(\mathbf{u}_{i\cdot N/\ell}^{(i+1)\cdot N/\ell-1}\right)$
for $i \in [\ell]_-$). These codes are then joined together by employing an inner-code (defined by the kernel
function g(·)) on the outputs of these mappings. There are N/ℓ instances of the inner-mapping, such that
instance number $j \in [N/\ell]_-$ is applied on the $j$-th symbol from each outer-code.
¹ The construction of the GCCs is a generalization of Forney’s code concatenation method [21].
Figure 1: A GCC representation of a polar code of length $\ell^n$ symbols constructed by a homogenous kernel
according to Definition 1
The above GCC formalization is illustrated in Figure 1. In this figure, we see the ℓ outer-code
codewords of length $\ell^{n-1}$ depicted as gray horizontal rectangles (similar to rows of a matrix). The
instances of the inner-codeword mapping are depicted as vertical rectangles that are located on top of the
gray outer-codes rows (resembling columns of a matrix). This is appropriate, as this mapping operates
on columns of the matrix whose rows are the outer-code codewords. Note that for brevity we only drew
three instances of the inner mapping, but there should be $\ell^{n-1}$ instances of it, one for each column of
this matrix. In the homogenous case, the outer-codes themselves are constructed in the same manner.
Note, however, that even though these outer-codes have the same structure, they form different codes in
the general case. The reason is that they may have different sets of frozen symbols.
Example 1 (Arikan’s Construction) Let $g(u_0, u_1) = [u_0\ u_1] \cdot G_2$. Let u be an $N = 2^n$ length binary
vector. The vector u is transformed into an N length vector x by using a bijective mapping
$g^{(n)}(\cdot) : \{0,1\}^N \to \{0,1\}^N$. The transformation is defined recursively as
$$\text{for } n = 1:\quad g^{(1)}(\mathbf{u}) = g(\mathbf{u}) = [u_0 + u_1,\ u_1],$$
$$\text{for } n > 1:\quad g^{(n)}(\mathbf{u}) = \mathbf{x}_0^{N-1}, \tag{1}$$
where $[x_{2j}, x_{2j+1}] = [\gamma_{0,j} + \gamma_{1,j},\ \gamma_{1,j}]$ for $j \in [N/2]_-$, and
$\left[\gamma_{0,j}\right]_{j=0}^{N/2-1} = g^{(n-1)}\left(\mathbf{u}_0^{N/2-1}\right)$,
$\left[\gamma_{1,j}\right]_{j=0}^{N/2-1} = g^{(n-1)}\left(\mathbf{u}_{N/2}^{N-1}\right)$
are the two outer-codes (each one of length N/2 bits). Figure 2 depicts the GCC block
diagram for this example.
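As a quick check of recursion (1), the case N = 4 (n = 2) can be unrolled by hand. Nothing beyond the definitions of Example 1 is assumed, and all additions are over GF(2).

```latex
\begin{align*}
[\gamma_{0,0},\ \gamma_{0,1}] &= g^{(1)}(u_0, u_1) = [u_0 + u_1,\ u_1], \\
[\gamma_{1,0},\ \gamma_{1,1}] &= g^{(1)}(u_2, u_3) = [u_2 + u_3,\ u_3], \\
[x_0,\ x_1] &= [\gamma_{0,0} + \gamma_{1,0},\ \gamma_{1,0}] = [u_0 + u_1 + u_2 + u_3,\ u_2 + u_3], \\
[x_2,\ x_3] &= [\gamma_{0,1} + \gamma_{1,1},\ \gamma_{1,1}] = [u_1 + u_3,\ u_3].
\end{align*}
```

The result coincides with multiplying the permuted input $[u_0, u_2, u_1, u_3]$ by $G_2^{\otimes 2}$, which illustrates, for this small case, the permutation mentioned in the Introduction.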
The GCC structure of polar codes can also be represented by a layered² Forney normal factor graph
[25]. Layer #0 of this graph contains the inner mappings (represented as sets of vertices), and therefore
we refer to it as the inner-layer. Layer #1 contains the vertices of the inner layers of all the outer-codes
that are concatenated by layer #0. We may continue and generate layer #i by considering the outer-codes
that are concatenated by layer #(i − 1) and include in this layer all the vertices describing their inner
mappings. This recursive construction process may continue until we reach outer-codes that cannot be
decomposed into non-trivial inner-codes and outer-codes. Edges (representing variables) connect the
outputs of the outer-codes to the inputs of the inner mappings. This representation can be viewed as
observing the GCC structure in Figure 1 from its side.
² In a layered graph, the vertex set can be partitioned into a sequence of sub-sets called layers and denoted by
$L_0, L_1, \ldots, L_{k-1}$. The edges of the graph connect only vertices within the same layer or in successive layers.
Figure 2: Example 1’s GCC representation (Arikan’s construction)
Example 2 (Layered Normal Factor Graph for Arikan’s Construction) Figures 3 and 4 depict
a layered factor graph representation for a length $N = 2^n$ symbols polar code with a kernel of ℓ = 2
dimensions. Figure 3 gives only a block structure of the graph, in which we have the two outer-codes
of length N/2 that are connected by the inner layer (note the similarities to the GCC block diagram in
Figure 2). Half edges represent the inputs $\mathbf{u}_0^{N-1}$ and the outputs $\mathbf{x}_0^{N-1}$ of the transformation. The edges
(denoted by $\gamma_{i,j}$, $j \in [N/2]_-$, $i \in [2]_-$) connect the outputs of the two outer-codes to the inputs of the
inner mapping blocks, g(·). A more elaborate version of this figure is given in Figure 4, in which we
unfolded the recursive construction.
Strictly speaking, the green blocks that represent the g(·) inner-mapping are themselves factor graphs
(i.e. collections of vertices and edges). An example of a normal factor graph specifying such a block is
given in Figure 5 for Arikan’s (u + v, v) construction (see Example 1). Vertex $a_0$ represents a parity
constraint and vertex $e_1$ represents an equivalence constraint. The half edges $u_0, u_1$ represent the inputs
of the mapping, and the half edges $x_0, x_1$ represent its outputs. This graphical structure is probably the
most popular visual representation of polar codes (see e.g. [1, Figure 12] and [26, Figure 5.2]) and is also
known as the "butterflies" graph because of the arrangement of the edges in Figure 4.
2.2 Mixed-Kernels Polar Codes
Thus far, we described homogenous kernel constructions in which a single kernel and code alphabet
are used for generating the polar code structures. It may be advantageous in terms of error-correction
performance and complexity to combine two types of kernels (each one over a different alphabet) into
one structure. Such constructions are called mixed-kernels structures [8, 27]. To introduce the notion
of mixed-kernels more concretely, we give an example of the structure (taken
from [8, 27]).
Figure 3: Representation of a polar code with kernel of ℓ = 2 dimensions as a layered factor graph
Figure 4: Representation of a polar code with kernel of ℓ = 2 dimensions as a layered factor graph (detailed
version of Figure 3 - recursion unfolded)
Figure 5: Normal factor graph representation of the g(·) block from Figures 3 and 4 for Arikan’s (u+v, v)
construction
Example 3 (Mixed-Kernels Construction) Let g(·) be a four dimensions binary mapping defined as
$g(\mathbf{u}) = \mathbf{u} \cdot G_2^{\otimes 2}$. Using g(·) we define an additional kernel
$$g_0\left(u_0, u_{(1,2)}, u_3\right) \triangleq g(u_0, u_1, u_2, u_3), \quad \text{where } u_{(1,2)} \triangleq [u_1, u_2] \in \{0,1\}^2. \tag{2}$$
In other words we take the $u_1$ and $u_2$ binary inputs to g(·) and combine them into a single quaternary
entity $u_{(1,2)}$. We informally say that $u_1$ and $u_2$ were glued together generating $u_{(1,2)}$.
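A minimal Python sketch of definition (2) may help make the gluing concrete. The helper names `kron`, `g` and `g0` are illustrative choices rather than part of the construction, and representing the quaternary symbol $u_{(1,2)}$ as a pair of bits is an assumption of this sketch.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


G2 = [[1, 0], [1, 1]]
G4 = kron(G2, G2)  # the 4 x 4 matrix G2 (x) G2


def g(u):
    """Four-dimensional binary mapping g(u) = u * G4 over GF(2)."""
    return [sum(u[i] * G4[i][c] for i in range(4)) % 2 for c in range(4)]


def g0(u0, u12, u3):
    """Kernel g0 of (2): u12 is the glued pair (u1, u2), treated as one symbol."""
    u1, u2 = u12
    return g([u0, u1, u2, u3])
```

The kernel `g0` computes exactly the same output bits as `g`; only the interface changes, so that the two middle inputs can be supplied jointly by a quaternary outer-code.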
Let $g_1(\cdot) : \left(\{0,1\}^2\right)^4 \to \left(\{0,1\}^2\right)^4$ be a polarizing kernel over the quaternary alphabet. For example,
$g_1(\cdot)$ can be a kernel based on the extended Reed-Solomon code of length four, $G_{RS}(4)$, that was proven
by Mori and Tanaka [28, Example 20] to be a polarizing kernel. The homogenous polar code generated by
$g_1(\cdot)$ is dubbed the RS4 polar code. Using $g_1(\cdot)$, we can extend the mapping of $g_0(\cdot)$ to a length $N = 4^n$ bits
code. Both $g_0(\cdot)$ and $g_1(\cdot)$ are referred to as the constituent kernels of the construction. Note that $g_1(\cdot)$
is introduced in order to handle the glued bits $u_{(1,2)}$ of the input of $g_0(\cdot)$ and therefore is also referred to
as the auxiliary kernel of the construction. The standard Arikan’s construction (based on the Kronecker
power) does not suffice here, because of the glued bits $u_{(1,2)}$, which need to be jointly treated as a quaternary
symbol.
The mixed-kernels construction can be readily explained in terms of GCC structure. Let $g^{(1)}(\cdot) = g_0(\cdot)$.
In order to extend this construction to a mapping $g^{(n)}\left(\mathbf{u}_0^{4^n-1}\right)$, n > 1, for which some of the inputs are
glued, we suggest the following recursive GCC construction. We define three outer-codes:
outer-code #0: $\left[\gamma_{0,j}\right]_{j=0}^{N/4-1} = g^{(n-1)}\left(\mathbf{u}_0^{N/4-1}\right)$, $u_j, \gamma_{0,j} \in \{0,1\}$, $j \in [N/4]_-$;
outer-code #1: $\left[\gamma_{1,j}\right]_{j=0}^{N/4-1} = g_2^{(n-1)}\left(\left[u_{(N/4+2j,\,N/4+2j+1)}\right]_{j=0}^{N/4-1}\right)$, $u_{(N/4+2j,\,N/4+2j+1)}, \gamma_{1,j} \in \{0,1\}^2$, $j \in [N/4]_-$;
outer-code #2: $\left[\gamma_{2,j}\right]_{j=0}^{N/4-1} = g^{(n-1)}\left(\mathbf{u}_{3N/4}^{N-1}\right)$, $u_{j+3N/4}, \gamma_{2,j} \in \{0,1\}$, $j \in [N/4]_-$,
where $u_{(i,j)}$ means that the items of sub-vector $\mathbf{u}_i^j$ were glued together, generating an element from
the larger alphabet $\{0,1\}^{j-i+1}$. Note that outer-codes #0 and #2 are just mixed-kernels constructions of
length N/4 bits. The outputs of these outer-codes are binary vectors, but the input is a mixture of binary
and quaternary symbols (generated by bits that were glued together). Outer-code #1 is a homogenous polar
code construction of length N/4 quaternary symbols, in which all of the input symbols and output symbols are
bits that were glued together in pairs. Finally, these three outer-codes are combined together using the
$g_0(\cdot)$ inner mapping.
$$g^{(n)} = \Big[\, g_0\left(\gamma_{0,0}, \gamma_{1,0}, \gamma_{2,0}\right),\ g_0\left(\gamma_{0,1}, \gamma_{1,1}, \gamma_{2,1}\right),\ \ldots,\ g_0\left(\gamma_{0,N/4-1}, \gamma_{1,N/4-1}, \gamma_{2,N/4-1}\right) \Big].$$
Figure 6 depicts this GCC construction. Note that outer-code #1 was drawn as a rectangle having the
same width as outer-code #0 (or #2). This property symbolizes that all the outer-codes have the same
length in terms of symbols. On the other hand, the height of the rectangle of outer-code #1 is twice the
height of each of the rectangles of the other two outer-codes. This property indicates that the symbols
alphabet size of outer-code #1 is twice the size of the symbols alphabet of the other outer-codes (for which
the symbols are bits). This is because outer-code #1 is a quaternary mapping in which both the input
symbols and the output symbols are pairs of glued bits.
Figure 6: A GCC representation of the length $N = 4^n$ bits mixed-kernels polar code $g^{(n)}(\cdot)$ described in
Example 3
The recursive GCC structure of polar codes enables recursive formalizations of the algorithms associated
with them. These algorithms benefit from simple and clear descriptions, which support elegant
analysis. Furthermore, in some cases it allows reuse of resources and indicates which operations may be
done in parallel. The essence of the recursive encoding algorithm has already been described in Definition
1. In Section 3 we formalize these ideas and in addition describe an algorithm for systematic encoding
of polar codes. Afterwards, we consider the decoding algorithms of polar codes, giving them a recursive
formulation in Section 4.
3 Recursive Descriptions of Polar Codes Encoding Algorithms
In this section we discuss encoding algorithms for polar codes. We begin in Subsection 3.1 by describing
a non-systematic encoding algorithm that is a direct consequence of the GCC structure discussed in
Subsection 2.1. Subsection 3.2 considers a systematic encoding algorithm for linear polar codes with a lower
triangular kernel generating matrix.
3.1 Non-Systematic Encoding
In this subsection we consider a non-systematic recursive encoding algorithm that is based on the recursive
GCC structures of polar codes. Let us begin by describing the algorithm for Arikan’s (u + v, v) polar
code. Let u be an N length binary vector, serving as the encoder input. Each polar code of length N
is defined by its N length frozen-indicator vector z, such that $z_i = 1$ if and only if the $i$-th input of the
encoder is frozen (i.e. fixed and known to both the encoder and the decoder) and $z_i = 0$ otherwise. For
a (u + v, v) polar code of dimension k, we have that $w_H(\mathbf{z}) = N - k$, where $w_H(\cdot)$ is the Hamming weight
of the vector. Given an information vector $\breve{\mathbf{u}} \in \{0,1\}^k$, it is the role of the encoder to output the
corresponding binary codeword $\mathbf{x} \in \{0,1\}^N$. Given an information vector $\breve{\mathbf{u}}$ and z,
it is easy to generate $\mathbf{u} \in \{0,1\}^N$, the encoder input, such that the values of $\breve{\mathbf{u}}$ are sequentially assigned to
the non-frozen components of u and elements corresponding to frozen indices are set to a predetermined
value (here, we arbitrarily decided to set the frozen values of u to zero), i.e.
$$\text{if } z_i = 1 \text{ then } u_i = 0,\quad \forall i \in [N]_-; \tag{3}$$
$$u_{\theta_i} = \breve{u}_i,\quad \forall i \in [k]_-,$$
where θ is a k length vector, such that $\theta_i$ is the $i$-th index of z corresponding to a zero value (i.e. indicating
a non-frozen input symbol). The signature of the non-systematic encoding algorithm for an $N = 2^n$ code is
$$\mathbf{x} = \text{NonSysEncoder}(\mathbf{u}). \tag{4}$$
Algorithm 1 describes a recursive implementation of the encoder. Note that for a scalar input u to (4)
(i.e. n = 0) we have the output x equal to u.
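The assignment in (3) can be sketched in a few lines of Python. The function name `build_encoder_input` and the use of plain lists are assumptions made for illustration only.

```python
def build_encoder_input(info, z):
    """Place the information symbols on the non-frozen positions, as in (3).

    info : list of k information symbols (the vector u-breve)
    z    : frozen-indicator list of length N (1 = frozen, 0 = non-frozen)
    Frozen positions are set to zero, the arbitrary choice made in the text.
    """
    if len(info) != z.count(0):
        raise ValueError("number of information symbols must equal the number of zeros in z")
    symbols = iter(info)
    # Walking z in order assigns info[i] to position theta_i, the i-th zero of z.
    return [0 if frozen else next(symbols) for frozen in z]
```

For instance, with z = [1, 0, 0, 1, 0, 1] the three information symbols land on positions 1, 2 and 4 of u.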
Algorithm 1 Non-Systematic Encoder for (u + v, v) Polar Code, of Length $N = 2^n$ Bits, n ≥ 1
[◮◮◮] Input: u.
//Initialization:
⊲ Allocate two binary vectors $\mathbf{x}^{(0)}$ and $\mathbf{x}^{(1)}$, each one of length N/2.
//Encode the Outer-Codes:
⊲ Encode the two outer-codes of length N/2 using the information sub-vectors $\mathbf{u}_0^{N/2-1}$ and $\mathbf{u}_{N/2}^{N-1}$:
$$\mathbf{x}^{(i)} = \text{NonSysEncoder}\left(\mathbf{u}_{i\cdot N/2}^{(i+1)\cdot N/2-1}\right),\quad \forall i \in \{0, 1\}. \tag{5}$$
//Encode the Inner-Code:
⊲ Apply the inner-code (u + v, v) on the pairs $\left[x_j^{(0)}, x_j^{(1)}\right]$:
$$\mathbf{x}_{2j}^{2j+1} = \left[x_j^{(0)} + x_j^{(1)},\ x_j^{(1)}\right],\quad \forall j \in [N/2]_-. \tag{6}$$
[◭◭◭] Output: x.
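A direct Python transcription of Algorithm 1 is sketched below, assuming bits are stored as Python integers 0/1 and that the input length is a power of two. The function name is illustrative, and the scalar case (n = 0) is used as the recursion base, as noted in the text.

```python
def nonsys_encode_arikan(u):
    """Recursive non-systematic encoder for the (u+v, v) polar code (Algorithm 1)."""
    n_len = len(u)
    if n_len == 1:
        return list(u)  # scalar input: the output equals the input
    if n_len == 0 or n_len % 2:
        raise ValueError("length must be a power of two")
    half = n_len // 2
    # Encode the two outer-codes, step (5).
    x0 = nonsys_encode_arikan(u[:half])
    x1 = nonsys_encode_arikan(u[half:])
    # Apply the inner (u+v, v) mapping on each pair, step (6); XOR is addition in GF(2).
    x = []
    for a, b in zip(x0, x1):
        x.extend([a ^ b, b])
    return x
```

Each recursion level performs N/2 additions and there are log2(N) levels, which is consistent with the O(N · log(N)) encoding complexity quoted in the Introduction.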
Let us now consider a general kernel g(·) of ℓ dimensions over a field F, i.e. $g : F^{\ell} \to F^{\ell}$. The signature
of the encoder remains the same, except that both u and x are in $F^N$. Algorithm 2 describes the encoding
procedure for this case. Similarly to the (u + v, v) case, the function has its output equal to its input
for scalar inputs.
Encoding of mixed-kernels is performed in a similar fashion. The difference is that in the outer-code
encoding phase we need to provide information sub-vectors of different lengths. Let us consider the
mixed-kernels instance given in Example 3. We have three computations of outer-codes
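$$\mathbf{x}^{(0)} = \text{NonSysEncoder}\left(\mathbf{u}_0^{N/4-1}\right);\quad \mathbf{x}^{(1)} = \text{NonSysEncoder}^{(RS4)}\left(\mathbf{u}_{N/4}^{3\cdot N/4-1}\right);\quad \mathbf{x}^{(2)} = \text{NonSysEncoder}\left(\mathbf{u}_{3N/4}^{N-1}\right), \tag{9}$$
where $\mathbf{x}^{(0)}, \mathbf{x}^{(2)} \in \{0,1\}^{N/4}$ and $\mathbf{x}^{(1)} \in \{0,1\}^{N/2}$. The function $\text{NonSysEncoder}^{(RS4)}$ is the encoding
procedure of the homogenous RS4 code whose input and output are GF(4) vectors. The elements of
GF(4) are represented by their binary vector (cartesian) form.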
Algorithm 2 Non-Systematic Encoder for Homogenous Polar Code of Length $N = \ell^n$ F-Symbols, Based
on Kernel g(·), n ≥ 1
[◮◮◮] Input: u.
//Initialization:
⊲ Allocate ℓ vectors $\left\{\mathbf{x}^{(i)}\right\}_{i=0}^{\ell-1}$, each one of length N/ℓ F-symbols.
//Encode the Outer-Codes:
⊲ Encode the ℓ outer-codes of length N/ℓ using the information sub-vectors $\left\{\mathbf{u}_{i\cdot N/\ell}^{(i+1)\cdot N/\ell-1}\right\}_{i\in[\ell]_-}$:
$$\mathbf{x}^{(i)} = \text{NonSysEncoder}\left(\mathbf{u}_{i\cdot N/\ell}^{(i+1)\cdot N/\ell-1}\right),\quad \forall i \in [\ell]_-. \tag{7}$$
//Encode the Inner-Code:
⊲ Apply the inner-code g(·) on the sub-vectors $\left[x_j^{(i)}\right]_{i\in[\ell]_-}$, $\forall j \in [N/\ell]_-$:
$$\mathbf{x}_{j\cdot\ell}^{(j+1)\cdot\ell-1} = g\left(x_j^{(0)}, x_j^{(1)}, \ldots, x_j^{(\ell-1)}\right),\quad \forall j \in [N/\ell]_-. \tag{8}$$
[◭◭◭] Output: x.
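Algorithm 2 can be sketched in Python with the kernel passed in as a function. The names `nonsys_encode` and `kernel`, and the convention that the kernel maps a list of ℓ symbols to a list of ℓ symbols, are assumptions of this sketch; the kernel need not be linear.

```python
def nonsys_encode(u, kernel, ell):
    """Recursive non-systematic encoder for a homogenous polar code (Algorithm 2).

    u      : list of N = ell**n symbols
    kernel : function mapping a list of ell symbols to a list of ell symbols
    ell    : kernel dimension (ell >= 2)
    """
    n_len = len(u)
    if n_len == 1:
        return list(u)  # scalar input: the output equals the input
    if n_len == 0 or n_len % ell:
        raise ValueError("length must be a power of ell")
    sub = n_len // ell
    # Encode the ell outer-codes, step (7).
    outer = [nonsys_encode(u[i * sub:(i + 1) * sub], kernel, ell) for i in range(ell)]
    # Apply the inner mapping on the j-th symbol of every outer-code, step (8).
    x = []
    for j in range(sub):
        x.extend(kernel([outer[i][j] for i in range(ell)]))
    return x


def arikan_kernel(v):
    """The (u+v, v) kernel of Example 1 over GF(2)."""
    return [v[0] ^ v[1], v[1]]
```

Calling `nonsys_encode(u, arikan_kernel, 2)` reproduces the (u + v, v) encoder of Algorithm 1, since only the inner mapping differs between the two algorithms.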
3.2 Systematic Encoding
In this subsection we consider systematic encoding of polar codes. A systematic encoder has the property
that the non-frozen symbols of the encoder input vector, u, appear explicitly in their corresponding
codeword, x. Formally speaking, for a length N code, we define a bijective mapping function
$m_N(\cdot) : [N]_- \to [N]_-$, such that a systematic encoder corresponding to $m_N(\cdot)$ outputs x, satisfying
$u_t = x_{m_N(t)}$ for all non-frozen indices $t \in [N]_-$ (i.e. $z_t = 0$). A systematic encoder is advantageous
because it facilitates retrieval of the user information without performing a decoding first (assuming no
errors occurred in the received codeword). Furthermore, Arikan demonstrated by simulations that systematic
coding systems achieve better BER performance than non-systematic coding systems using the
same (u + v, v) polar code [29].
In this paper we consider systematic encoders for linear kernels having a lower triangular generating
matrix $G \in F^{\ell\times\ell}$. The signature of the systematic encoder is defined as follows:
$$[\mathbf{x}, \tilde{\mathbf{u}}] = \text{SysEncoder}(\mathbf{u}, \mathbf{z}), \tag{10}$$
where the vectors x, u and z were defined before in Subsection 3.1 and x is a systematic encoding of u. The
vector $\tilde{\mathbf{u}} \in F^N$ is the input for the non-systematic encoder that results in x, i.e. $\mathbf{x} = \text{NonSysEncoder}(\tilde{\mathbf{u}})$
and $\tilde{u}_t = 0$ if $z_t = 1$ for all $t \in [N]_-$. While not being a necessary output of the algorithm, $\tilde{\mathbf{u}}$ is used here
to enable a more comprehensible description of the systematic encoder. Indeed, the systematic encoder
may be understood as an algorithm for finding the vector $\tilde{\mathbf{u}}$ meeting these requirements.
Let us first consider the N = ℓ F-symbols case. In this case we have to find $\tilde{\mathbf{u}}$ such that $\tilde{\mathbf{u}} \cdot G = \mathbf{x}$,
and
$$\forall t \in [\ell]_-,\quad u_t = x_{m_\ell(t)} \text{ if } z_t = 0 \text{ and otherwise } \tilde{u}_t = 0. \tag{11}$$
For this base case we take $m_\ell(\cdot)$ to be the identity function, i.e. $m_\ell(t) = t$, $\forall t \in [\ell]_-$. Algorithm 3
describes the systematic encoding procedure for this case. It can be easily shown by induction on the
for-loop variable j (beginning with ℓ − 1 and ending with 0) that on each step condition (11) is met.
Algorithm 3 Systematic Encoder for Homogenous Length N = ℓ F-Symbols Polar Code, Based on
Lower Triangular Kernel $G \in F^{\ell\times\ell}$
[◮◮◮] Input: u; z.
//Initializations:
⊲ Allocate two vectors $\tilde{\mathbf{u}}, \mathbf{x} \in F^{\ell}$. Initialize x = 0.
//Successively encode u:
⊲ For j = ℓ − 1 to 0 Do
• If $z_j == 0$ Then set $\tilde{u}_j = G_{j,j}^{-1} \cdot (u_j - x_j)$; Else set $\tilde{u}_j = 0$;
• Set $\mathbf{x} = \mathbf{x} + \tilde{u}_j \cdot G_{j\to}$;
[◭◭◭] Output: x; $\tilde{\mathbf{u}}$.
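The base case of Algorithm 3 can be sketched in Python as follows. To keep the sketch short, the field F is assumed to be a prime field GF(p), so that arithmetic is integer arithmetic modulo p and inverses come from the built-in `pow(a, -1, p)` (Python 3.8 or later). The function name is illustrative.

```python
def sys_encode_base(u, z, G, p=2):
    """Systematic encoder for a length-ell code (Algorithm 3) over GF(p), p prime.

    G is an ell x ell lower triangular matrix (list of rows) with non-zero diagonal.
    Returns (x, u_tilde) with x = u_tilde * G, x[t] == u[t] for non-frozen t,
    and u_tilde[t] == 0 for frozen t.
    """
    ell = len(G)
    x = [0] * ell
    u_tilde = [0] * ell
    for j in range(ell - 1, -1, -1):
        if z[j] == 0:
            # Choose u_tilde[j] so that adding u_tilde[j] * G[j][j] turns x[j] into u[j].
            u_tilde[j] = pow(G[j][j], -1, p) * (u[j] - x[j]) % p
        # x <- x + u_tilde[j] * (row j of G); columns above j are untouched
        # because G is lower triangular.
        for c in range(ell):
            x[c] = (x[c] + u_tilde[j] * G[j][c]) % p
    return x, u_tilde
```

Processing the rows from last to first matters here: row j of a lower triangular G has zeros in columns j + 1, ..., ℓ − 1, so positions that were already fixed are never disturbed by later iterations.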
For the general $N = \ell^n$ case where n > 1 we utilize the GCC structure of the polar code in order to
perform systematic encoding. Let us first describe the indices mapping function $m_N(\cdot)$. As was already
noted in the GCC discussion, and was exemplified in the non-systematic encoder (Algorithm 2), the input
sub-vector $\mathbf{u}_{i\cdot N/\ell}^{(i+1)N/\ell-1}$ is also the input of outer-code $C_i$ for all $i \in [\ell]_-$. The following requirement of the
mapping function will prove useful in the recursive implementation.
$$m_N(t) \equiv \left\lfloor \frac{t}{N/\ell} \right\rfloor \pmod{\ell}\quad \forall t \in [N]_-,\ \forall N = \ell^n. \tag{12}$$
The implication of (12) is that non-frozen symbols placed at index t, such that $b \cdot N/\ell \le t < (b+1) \cdot N/\ell$,
where $b \in [\ell]_-$, should appear at the output $x_\tau$ where $\tau = a \cdot \ell + b$ and a is some number in $[N/\ell]_-$. Note
that index t of the input corresponds to the inputs of outer-code $C_b$. Furthermore, if $\mathbf{x}^{(i)}$ is the outer-code
codeword of $C_i$ we have $x_\tau = \sum_{i=b}^{\ell-1} G_{i,b} \cdot x_a^{(i)}$ (see Figure 1). This connection is useful, because if we
already systematically encoded all the inputs $t'$ such that $(b+1) \cdot N/\ell \le t'$ (corresponding to outer-codes
$C_{b'}$ where $b' \ge b + 1$), by appropriately calling the systematic encoder of $C_b$ we can ensure that indeed
$x_\tau = u_t$.
It can be proven by induction that a mapping function implementing the following recursion formula
indeed satisfies (12):
$$\text{for } n > 1,\quad m_{\ell^n}(t) = \ell \cdot m_{\ell^{n-1}}\left(R_{\ell^{n-1}}[t]\right) + \left\lfloor \frac{t}{\ell^{n-1}} \right\rfloor,\quad \forall t \in [\ell^n]_-; \tag{13}$$
$$m_{\ell}(t) = t,\quad \forall t \in [\ell]_-,$$
where $R_\beta[\alpha]$ is the remainder of α divided by β. Note that according to this definition, $m_N(\cdot)$ is a base
ℓ reversal function, i.e. $m_N(t)$ has its base ℓ representation being equal to the base ℓ representation of t
given in reverse order (for ℓ = 2 this transformation is also known as the reverse shuffle operation).
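Recursion (13) is short enough to transcribe directly. In the Python sketch below, the function name `index_map` and the choice to pass the code length N (rather than the exponent n) are illustrative assumptions.

```python
def index_map(t, ell, n_len):
    """The mapping m_N(t) of recursion (13), for N = n_len = ell**n and 0 <= t < N."""
    if n_len == ell:
        return t  # base case: m_ell is the identity
    sub = n_len // ell  # ell**(n-1)
    # ell * m_{N/ell}(t mod N/ell) + floor(t / (N/ell))
    return ell * index_map(t % sub, ell, sub) + t // sub


if __name__ == "__main__":
    # For ell = 2 and N = 8 this lists the bit-reversal permutation.
    print([index_map(t, 2, 8) for t in range(8)])
```

The most significant base-ℓ digit of t becomes the least significant digit of the result and the remaining digits are handled recursively, which is why requirement (12) holds and why the overall effect is a digit reversal.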
Algorithm 4 describes the recursive algorithm for the general N = ℓn case (for n > 1). The algorithm
can also be easily adapted for mixed-kernels. Let us prove that the algorithm meets the systematic
encoding requirement.
Observation 1 After round i of the for-loop (beginning with ℓ − 1 and ending with 0), codeword components
$x_\tau$ such that $R_\ell[\tau] \ge i$ are not changed anymore by Algorithm 4.
Proof Equation (16) updates vector x at the end of round i. Since G is lower triangular, we always have
that $\forall i' \in [\ell]_-$, $G_{i',j'} = 0$ for $j' > i'$. Therefore all the updates for rounds $i' < i$ of the for-loop will have
zeros in the vector $G_{i'\to}$ in places corresponding to $x_\tau$ where $R_\ell[\tau] \ge i$. ♦
Observation 2 After round i of the for-loop (beginning with ℓ − 1 and ending with 0), we have $x_{m_N(t)} = u_t$
for all non-frozen components $u_t$ such that $R_\ell[m_N(t)] = i$, where $i \in [\ell]_-$ and $t \in [N]_-$.
Proof Let t be such that $R_\ell[m_N(t)] = i$. Following (12) we have that $\frac{N}{\ell} \cdot i \le t \le \frac{N}{\ell} \cdot (i+1) - 1$.
Consequently, it is encoded at round i of the for-loop, dedicated for encoding $C_i$. Assume that $t = \frac{N}{\ell} \cdot i + r$
where $r = R_{N/\ell}[t]$. In (14) we have $\tilde{u}'_r = G_{i,i}^{-1} \cdot \left(u_t - x_{m_N(t)}\right)$. After applying the systematic encoder in
(15), we have that $\tilde{x}_{m_{N/\ell}(r)} = \tilde{u}'_r$. Following the execution of (16), we have $x_{m_N(t)} = x_{m_N(t)} + \tilde{x}_{\lfloor m_N(t)/\ell \rfloor} \cdot G_{i,i}$.
However, due to (13), we have $\lfloor m_N(t)/\ell \rfloor = m_{N/\ell}\left(R_{N/\ell}[t]\right) = m_{N/\ell}(r)$. Therefore, we have
$x_{m_N(t)} = x_{m_N(t)} + \tilde{u}'_r \cdot G_{i,i} = u_t$. Owing to Observation 1, the value of $x_{m_N(t)}$ will not further change in the
algorithm, which proves the statement. ♦
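Algorithm 4 Systematic Encoder for Homogenous Length $N = \ell^n$ F-Symbols Polar Code, Based on
Lower Triangular Kernel $G \in F^{\ell\times\ell}$
[◮◮◮] Input: u; z.
//Initializations:
⊲ Allocate ℓ vectors $\left\{\tilde{\mathbf{u}}^{(i)}\right\}_{i\in[\ell]_-}$ of length N/ℓ F-symbols.
⊲ Allocate two vectors $\tilde{\mathbf{u}}', \tilde{\mathbf{x}} \in F^{N/\ell}$.
⊲ Allocate two vectors $\tilde{\mathbf{u}}, \mathbf{x} \in F^N$. Initialize x = 0.
//Successively encode u:
⊲ For i = ℓ − 1 to 0 Do //encode $C_i$
• Prepare vector $\tilde{\mathbf{u}}'$ which serves as the modified input to the encoder of $C_i$:
$$\tilde{u}'_r = \begin{cases} 0, & z_{i\cdot\frac{N}{\ell}+r} = 1; \\ G_{i,i}^{-1} \cdot \left(u_{i\cdot\frac{N}{\ell}+r} - x_{m_N\left(i\cdot\frac{N}{\ell}+r\right)}\right), & \text{otherwise,} \end{cases} \quad \forall r \in \left[\tfrac{N}{\ell}\right]_-; \tag{14}$$
• Run $C_i$ systematic encoder:
$$\left[\tilde{\mathbf{x}}, \tilde{\mathbf{u}}^{(i)}\right] = \text{SysEncoder}\left(\tilde{\mathbf{u}}',\ \mathbf{z}_{i\cdot\frac{N}{\ell}}^{(i+1)\cdot\frac{N}{\ell}-1}\right); \tag{15}$$
• Update the encoded vector x:
$$\mathbf{x}_{r'\cdot\ell}^{(r'+1)\cdot\ell-1} = \mathbf{x}_{r'\cdot\ell}^{(r'+1)\cdot\ell-1} + \tilde{x}_{r'} \cdot G_{i\to},\quad \forall r' \in \left[\tfrac{N}{\ell}\right]_-; \tag{16}$$
[◭◭◭] Output: • x;
• $\tilde{\mathbf{u}} = \left[\tilde{\mathbf{u}}^{(0)}, \tilde{\mathbf{u}}^{(1)}, \ldots, \tilde{\mathbf{u}}^{(\ell-1)}\right]$;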
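The following Python sketch puts Algorithm 4 together with the index mapping (13) and a linear-kernel version of the non-systematic encoder, so that the systematic property can be checked numerically. Several choices are assumptions of the sketch and not part of the algorithm as stated: the field is taken to be a prime field GF(p), the recursion bottoms out at N = 1 (which reproduces the steps of Algorithm 3 when N = ℓ) instead of calling Algorithm 3, and all function names are illustrative.

```python
def index_map(t, ell, n_len):
    """m_N(t) from recursion (13); n_len = N = ell**n."""
    if n_len == ell:
        return t
    sub = n_len // ell
    return ell * index_map(t % sub, ell, sub) + t // sub


def nonsys_encode_linear(u, G, p):
    """Algorithm 2 specialised to the linear kernel g(v) = v * G over GF(p)."""
    n_len, ell = len(u), len(G)
    if n_len == 1:
        return [u[0] % p]
    sub = n_len // ell
    outer = [nonsys_encode_linear(u[i * sub:(i + 1) * sub], G, p) for i in range(ell)]
    x = []
    for j in range(sub):
        x.extend(sum(outer[i][j] * G[i][b] for i in range(ell)) % p for b in range(ell))
    return x


def sys_encode(u, z, G, p):
    """Recursive systematic encoder (Algorithm 4); G lower triangular, p prime."""
    n_len, ell = len(u), len(G)
    if n_len == 1:
        # Assumed base case: a single symbol is copied unless it is frozen.
        v = 0 if z[0] else u[0] % p
        return [v], [v]
    sub = n_len // ell
    x = [0] * n_len
    u_tilde = [None] * ell
    for i in range(ell - 1, -1, -1):  # encode outer-code C_i
        inv = pow(G[i][i], -1, p)  # modular inverse, Python 3.8+
        # Step (14): modified input for the systematic encoder of C_i.
        u_mod = [0 if z[i * sub + r] else
                 inv * (u[i * sub + r] - x[index_map(i * sub + r, ell, n_len)]) % p
                 for r in range(sub)]
        # Step (15): systematic encoding of C_i.
        x_sub, u_tilde[i] = sys_encode(u_mod, z[i * sub:(i + 1) * sub], G, p)
        # Step (16): add x_sub[r] * (row i of G) to the r-th block of x.
        for r in range(sub):
            for b in range(ell):
                x[r * ell + b] = (x[r * ell + b] + x_sub[r] * G[i][b]) % p
    return x, [s for part in u_tilde for s in part]
```

A simple way to exercise the sketch is to verify, for some u and z, that `nonsys_encode_linear(u_tilde, G, p)` equals the returned x, that `u_tilde[t]` is zero at frozen positions, and that `x[index_map(t, ell, N)]` equals `u[t]` at non-frozen positions, mirroring Observations 1 and 2.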
4 Recursive Descriptions of Polar Codes Decoding Algorithms
In this section we describe decoding algorithms for polar codes in a recursive framework that is induced
from their recursive GCC structures. Roughly speaking, all the algorithms we consider here have a similar
format. Consider the GCC structure of Figure 1. In this construction we have a length N symbols code
that is composed of ℓ outer-codes, denoted by {Ci}
ℓ−1
i=0 , each one of length N/ℓ symbols. The decoding
algorithms that are considered here are composed of ℓ pairs of steps. The ith pair is dedicated to decoding
Ci as described in Algorithm 5.
Algorithm 5 Decoding Outer-code Ci, i ∈ [ℓ]−
//STEP 2 · i:
⊲ Using the previous steps, prepare the inputs to the decoder of outer-code Ci.
//STEP 2 · i+ 1:
⊲ Run the decoder of code Ci on the inputs you prepared.
⊲ Process the output of this decoder, together with the outputs of the previous steps.
Typically, the codes $\{C_i\}_{i=0}^{\ell-1}$ are polar codes of length N/ℓ symbols, thereby creating the recursive
structure of the decoding algorithm.
Note that the decoding algorithm structure in Algorithm 5 is quite typical for decoding algorithms
of GCCs. As an example, see the decoding algorithms in Dumer’s survey on GCCs [24]. In addition,
the recursive decoding algorithms for Reed-Muller (RM) codes, utilizing their Plotkin (u+ v, v) recursive
GCC structure were extensively studied by Dumer [30, 31] and are closely related to the algorithms we
present here. In fact, Dumer’s simplified decoding algorithm for RM codes [31, Section IV] is the SC
decoding for Arikan’s structure that we describe in Subsection 4.1.
The algorithms we describe in a recursive fashion are the SC (Subsection 4.1), Tal and Vardy’s SCL
(Subsection 4.2) and BP (Subsection 4.3). For all of these algorithms, we first consider Arikan’s (u+ v, v)
code and then provide generalizations for other kernels, both homogenous and mixed. We note that, when
possible, we prefer that the inputs to the algorithm and the internal computations be interpreted as log
likelihood ratios (LLRs). Consequently, the SC algorithm and BP are described in this manner. In SCL,
however, we need to be able to decide among different simultaneous decoding options; therefore we use
log-likelihoods (LLs) instead of LLRs.
Furthermore, in our discussion we do not consider how to efficiently compute these quantities. In some
cases, especially with large kernels or with large alphabet size, these calculations pose a computational
challenge. Approaches to address this challenge are efficient decoding algorithms (such as variants of
Viterbi algorithms) or approximations of the computations (for example, the min-sum approximation
that Leroux et al. used [15] or the near Maximum Likelihood (ML) decoding algorithms that were used
by Trifonov [9]).
Remark 1 (SCL Decoding with LLRs) Balatsoukas-Stimming et al. presented an LLR based SCL
decoder [19] in which the decoding options are selected based on a measure called the path-metric (PM).
PM is a function of the computed LLRs and the already decided information symbols. It can be easily seen
that tracking the PM measure can also be integrated into the recursive description given in Subsection 4.2.
This can be achieved by introducing an additional data-structure to hold its computations.
4.1 A Recursive Description of the SC Algorithm
We begin by considering the SC decoder for Arikan’s (u+ v, v) construction. Description of the algorithm
for generalized arbitrary kernels then follows. The inputs of the SC algorithm for Arikan’s construction
are listed below.
• An N length vector of input LLRs, λ, such that $\lambda_j = \ln\frac{\Pr(Y_j = y_j \mid X_j = 0)}{\Pr(Y_j = y_j \mid X_j = 1)}$ for $j \in [N]_-$, where $Y_j$ is
the measurement of the jth channel $X_j \to Y_j$.
• Vector indicator $z \in \{0,1\}^N$, in which $z_i = 1$ if and only if element number i of the information
vector u is frozen.
The algorithm outputs the following structures.
• An N length binary vector uˆ containing the information word that the decoder estimated. This
vector includes the frozen symbols placed in their appropriate positions.
• An N length binary vector xˆ which is the codeword corresponding to uˆ.
The SC function signature is defined as
[uˆ, xˆ] = SCDecoder (λ, z) . (17)
First, let us describe the decoding algorithm for length N = 2 bits code, i.e. for the basic kernel
g(1)(u, v) = (u+ v, v). We get as input λ = [λ0, λ1] which are the LLRs of the output of the channel (λ0
corresponds to the first output of the channel and λ1 corresponds to the second output). The procedure
has four steps as described in Algorithm 6. Note that steps 1 and 3, may be done based on the LLRs
Algorithm 6 SC of the (u+ v, v) Kernel
[◮◮◮] Input: λ; z.
//STEP 0:
⊲ Compute the LLR of u: $\hat\lambda = 2\tanh^{-1}\left(\tanh(\lambda_0/2)\tanh(\lambda_1/2)\right)$.
//STEP 1:
⊲ Decide on u (denote the decision by $\hat{u}$).
//STEP 2:
⊲ Compute the LLR of v (given the estimate $\hat{u}$): $\hat\lambda = (-1)^{\hat{u}}\cdot\lambda_0 + \lambda_1$.
//STEP 3:
⊲ Decide on v (denote the decision by $\hat{v}$).
[◭◭◭] Output: • $\hat{\mathbf{u}} = [\hat{u}, \hat{v}]$;
• $\hat{\mathbf{x}} = [\hat{u} + \hat{v}, \hat{v}]$.
computed on steps 0 and 2, respectively (i.e. by their sign), or by using an additional side information
(for example, if u is frozen, then the decision is based on its known value). A decoder for length N polar
code is described in Algorithm 7.
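The four steps of Algorithm 6 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, frozen symbols are assumed to be fixed to 0, non-frozen symbols are decided by the sign of the LLR, and the clamp before `atanh` is an added numerical safeguard for very large LLR magnitudes.

```python
import math

def llr_of_u(l0, l1):
    """STEP 0: LLR of u, 2*atanh(tanh(l0/2)*tanh(l1/2))."""
    prod = math.tanh(l0 / 2.0) * math.tanh(l1 / 2.0)
    # Safeguard (an addition of this sketch): keep atanh's argument inside (-1, 1).
    prod = max(-1.0 + 1e-15, min(1.0 - 1e-15, prod))
    return 2.0 * math.atanh(prod)

def llr_of_v(l0, l1, u_hat):
    """STEP 2: LLR of v given the decision on u, (-1)^u_hat * l0 + l1."""
    return (-1) ** u_hat * l0 + l1

def decide(llr, frozen):
    """STEPS 1 and 3: a frozen symbol takes its known value (assumed 0 here);
    otherwise decide by sign, since the LLR is ln(P(.|0) / P(.|1))."""
    if frozen:
        return 0
    return 0 if llr >= 0 else 1

def sc_kernel(lam, z):
    """SC decoding of the length-2 (u+v, v) code; returns (u_hat, x_hat)."""
    u_hat = decide(llr_of_u(lam[0], lam[1]), z[0])
    v_hat = decide(llr_of_v(lam[0], lam[1], u_hat), z[1])
    return [u_hat, v_hat], [(u_hat + v_hat) % 2, v_hat]
```

For instance, with `lam = [-2.0, 1.5]` and no frozen symbols the first LLR is negative, so the sketch decides u = 1; the second LLR becomes 2.0 + 1.5, so v = 0.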
Let us now generalize this decoding algorithm for a GCC homogenous scheme with general kernel. In
this case for length N F-symbols code, we have an ℓ length mapping g(u) = x over the F alphabet, i.e.
$g(\cdot): F^{\ell} \to F^{\ell}$. The inputs and the outputs of the decoding algorithm are the same as in the (u + v, v)
case, except that here the LLRs may correspond to a non-binary alphabet. As a consequence, we need to
have $|F| - 1$ LLR input vectors $\{\lambda^{(t)}\}_{t\in F\setminus\{0\}}$, each one of length N and defined such that
$$\lambda_j^{(t)} = \ln\frac{\Pr(Y_j = y_j \mid X_j = 0)}{\Pr(Y_j = y_j \mid X_j = t)} \qquad (20)$$
for $j \in [N]_-$, where $Y_j$ is the measurement of the jth channel $X_j \to Y_j$. Furthermore, u and x are in $F^N$.
Note that we always have $\lambda_j^{(0)} = 0$ and therefore it doesn’t have to be calculated. The following is the
signature for the general SC decoder
$$[\hat{u}, \hat{x}] = \text{SCDecoder}\left(\{\lambda^{(t)}\}_{t\in F\setminus\{0\}}, z\right). \qquad (21)$$
Algorithm 7 SC Recursive Description for Length $N = 2^n$ Bits (u + v, v) Polar Code
[◮◮◮] Input: λ; z.
//STEP 0:
⊲ Compute the LLR input vector, $\hat\lambda_0^{N/2-1}$, for the first outer-code such that
$$\hat\lambda_i = 2\tanh^{-1}\left(\tanh(\lambda_{2i}/2)\tanh(\lambda_{2i+1}/2)\right), \quad \forall i \in [N/2]_-.$$
//STEP 1:
⊲ Give the vector $\hat\lambda$ as an input to the polar code decoder of length N/2. Also provide to the
decoding algorithm the indices of the frozen bits from the first half of the codeword (corresponding
to the first outer-code), i.e. run
$$\left[\hat{u}^{(0)}, \hat{x}^{(0)}\right] = \text{SCDecoder}\left(\hat\lambda, z_0^{N/2-1}\right). \qquad (18)$$
According to (17), $\hat{u}^{(0)}$ is the information word estimation for the first outer-code, and $\hat{x}^{(0)}$ is its
corresponding codeword.
//STEP 2:
⊲ Using λ and $\hat{x}^{(0)}$, prepare the LLR input vector, $\hat\lambda_0^{N/2-1}$, for the second outer-code, such that
$$\hat\lambda_i = (-1)^{\hat{x}_i^{(0)}}\cdot\lambda_{2i} + \lambda_{2i+1}, \quad \forall i \in [N/2]_-.$$
//STEP 3:
⊲ Give the vector $\hat\lambda$ as an input to the polar code decoder of length N/2. In addition, provide
the indices of the frozen bits from the second half of the codeword (corresponding to the second
outer-code), i.e. run
$$\left[\hat{u}^{(1)}, \hat{x}^{(1)}\right] = \text{SCDecoder}\left(\hat\lambda, z_{N/2}^{N-1}\right), \qquad (19)$$
where $\hat{u}^{(1)}$ and $\hat{x}^{(1)}$ are the estimations of the information word and its corresponding codeword of
the second outer-code.
[◭◭◭] Output: • $\hat{u} = \left[\hat{u}^{(0)}, \hat{u}^{(1)}\right]$;
• $\hat{x} = \left[\hat{x}_i^{(0)} + \hat{x}_i^{(1)}, \hat{x}_i^{(1)}\right]_{i=0}^{N/2-1}$.
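The recursion of Algorithm 7 can be sketched compactly with NumPy. This is only an illustrative sketch under stated assumptions: the name `sc_decode` is hypothetical, frozen bits are assumed to be 0, the recursion bottoms out at a single bit (which reproduces Algorithm 6 for N = 2) rather than at the length-2 kernel, and the clipping before `arctanh` is an added numerical safeguard.

```python
import numpy as np

def sc_decode(lam, z):
    """Recursive SC decoding of a length N = 2^n (u+v, v) polar code.

    lam: channel LLRs, ln(P(y|0) / P(y|1)); z: frozen indicators (1 = frozen).
    Returns (u_hat, x_hat) as integer arrays of length N.
    """
    lam = np.asarray(lam, dtype=float)
    z = np.asarray(z, dtype=int)
    n = lam.size
    if n == 1:
        # Single symbol: frozen bits are assumed to be 0, otherwise decide by sign.
        bit = 0 if (z[0] == 1 or lam[0] >= 0) else 1
        return np.array([bit]), np.array([bit])
    even, odd = lam[0::2], lam[1::2]
    # STEP 0: LLRs for the first outer-code.
    prod = np.clip(np.tanh(even / 2) * np.tanh(odd / 2), -1 + 1e-15, 1 - 1e-15)
    lam0 = 2 * np.arctanh(prod)
    # STEP 1: decode the first outer-code, as in (18).
    u0, x0 = sc_decode(lam0, z[: n // 2])
    # STEP 2: LLRs for the second outer-code, (-1)^x0 * lam_{2i} + lam_{2i+1}.
    lam1 = (1 - 2 * x0) * even + odd
    # STEP 3: decode the second outer-code, as in (19).
    u1, x1 = sc_decode(lam1, z[n // 2 :])
    # Output: interleave [x0_i + x1_i, x1_i] over i.
    x = np.empty(n, dtype=int)
    x[0::2] = (x0 + x1) % 2
    x[1::2] = x1
    return np.concatenate([u0, u1]), x
```

Each call performs work linear in its input length and issues two calls on half-length inputs, which matches the O(N · log N) complexity quoted for SC.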
In the GCC structure of this polar code there exist at most ℓ outer-codes {Ci}, each one of length N/ℓ
symbols. We may have fewer than ℓ outer-codes if some of the inputs are glued (which results in a
mixed-kernels construction). In such cases, the outer-code corresponding to the glued inputs is considered
to be over a larger size input alphabet. We assume that each outer-code has a decoding algorithm
associated with it. This decoding algorithm is assumed to receive as input the "channel" observations on
the outer-code symbols (usually manifested as probability matrices, or LLR vectors). If the outer-code
is a polar code, then this algorithm should also receive the indices of the frozen symbols of the outer-code.
We require that the algorithm outputs its estimation on the information vector and its corresponding
outer-code codeword.
Let us first consider an ℓ length code generated by a single application of the kernel, i.e. x = g(u). Note
that this is the base case of the recursion. Assuming that we already decided on symbols $u_0^{i-1}$ (denote this
decision by $\hat{u}_0^{i-1}$), computing the LLR vector $\hat\lambda^{(t)}$ corresponding to the ith input of the transformation
(i.e. $u_i$) is done according to the following rule
$$\hat\lambda^{(t)} = \ln\left(\frac{\sum_{u_{i+1}^{\ell-1}\in F^{\ell-i-1}} R_g\left(\hat{u}_0^{i-1}, 0, u_{i+1}^{\ell-1}\right)}{\sum_{u_{i+1}^{\ell-1}\in F^{\ell-i-1}} R_g\left(\hat{u}_0^{i-1}, t, u_{i+1}^{\ell-1}\right)}\right), \qquad (22)$$
where
$$R_g\left(u_0^{\ell-1}\right) = \exp\left(-\sum_{r=0}^{\ell-1}\lambda_r^{(x_r)}\right), \quad \text{such that } x = g(u). \qquad (23)$$
Consequently, SC decoding for the ℓ length polar code includes sequential calculations of the likelihood
values $\{\hat\lambda^{(t)}\}_{t\in F\setminus\{0\}}$ corresponding to non-frozen $u_i$ according to (22) followed by a decision on $u_i$ (denoted
by $\hat{u}_i$) for $i \in [\ell]_-$. If $u_i$ is frozen, we set $\hat{u}_i$ to be equal to its predetermined value. Finally in (21) we
output $\hat{u} = [\hat{u}_0\ \hat{u}_1\ \ldots\ \hat{u}_{\ell-1}]$, and $\hat{x} = g(\hat{u})$.
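The rule in (22) and (23) can be evaluated by brute-force enumeration, as in the following sketch. It is meant only to make the formula concrete: the function name, the representation of the alphabet F as the integers 0..q−1, and the user-supplied kernel callable `g` are all assumptions of this example, and the enumeration is exponential in ℓ, so it is not an efficient method.

```python
import itertools
import math

def kernel_llrs(lam, g, q, i, u_prefix):
    """LLRs of input u_i of a single kernel application, following (22)-(23).

    lam[r][s] = ln(P(y_r | x_r = 0) / P(y_r | x_r = s)), with lam[r][0] = 0.
    g: maps a length-ell tuple u to the length-ell tuple x = g(u).
    q: alphabet size (symbols are assumed to be 0..q-1).
    u_prefix: the already decided symbols (u_0, ..., u_{i-1}).
    Returns a dict {t: LLR for u_i = t} for t = 1..q-1.
    """
    ell = len(lam)

    def r_g(u):
        # Equation (23): exp(-sum_r lam_r^(x_r)) with x = g(u).
        x = g(u)
        return math.exp(-sum(lam[r][x[r]] for r in range(ell)))

    def total(t):
        # Sum of R_g over all assignments to the undecided suffix u_{i+1}^{ell-1}.
        tails = itertools.product(range(q), repeat=ell - i - 1)
        return sum(r_g(tuple(u_prefix) + (t,) + tail) for tail in tails)

    numerator = total(0)
    return {t: math.log(numerator / total(t)) for t in range(1, q)}
```

For the binary (u + v, v) kernel, `g = lambda u: ((u[0] + u[1]) % 2, u[1])`, the call with `i = 0` reproduces the tanh rule of Algorithm 6 and the call with `i = 1` reproduces $(-1)^{\hat{u}}\lambda_0 + \lambda_1$.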
We now turn to describe the SC decoding algorithm for length N > ℓ homogenous polar code over F
based on the same kernel g(·). As we already mentioned, due to the code structure, the decoding algorithm
is composed of pairs of steps, such that the ith pair deals with the ith outer-code, where i ∈ [ℓ]−.
We denote the information word that was estimated by the decoder of the mth outer-code by uˆ(m)
and its corresponding codeword by xˆ(m), both of them are of length N/ℓ symbols. Algorithm 8 describes
the pair of steps of the SC algorithm i ∈ [ℓ]− and Algorithm 9 specifies its output generation.
Remark 2 (LLR Calculations Simplification for Linear Kernels) Let us assume that g(·) is an ℓ
dimensions linear kernel, having a generating matrix $G \in F^{\ell\times\ell}$, such that $x = g(u) = u\cdot G$. It can be
easily seen that if $\hat{x} = \hat{u}_0^{i-1}\cdot G_{(0:(i-1)),(0:(\ell-1))}$, then (22) is equivalent to
$$\hat\lambda^{(t)} = \ln\left(\frac{\sum_{x\in\Gamma_i+\hat{x}}\exp\left(-\sum_{r=0}^{\ell-1}\lambda_r^{(x_r)}\right)}{\sum_{x\in\Gamma_i+\hat{x}+t\cdot G_{i\to}}\exp\left(-\sum_{r=0}^{\ell-1}\lambda_r^{(x_r)}\right)}\right), \quad t \in F\setminus\{0\}, \qquad (26)$$
where $\Gamma_i = \left\{v \mid v = w\cdot G_{((i+1):(\ell-1)),(0:(\ell-1))},\ v \in F^{\ell},\ w \in F^{\ell-1-i}\right\}$. Note that $\Gamma_i$ is the linear code
induced by the last ℓ − 1 − i rows of the generating matrix G. Furthermore, $\Gamma_i + \hat{x}$ is the coset of the linear
code $\Gamma_i$ that is induced by the coset vector $\hat{x}$.
The calculation method in (26) is attractive because it implements the enumeration of the cosets members
as a summation of $\hat{x}$ (the estimated coset vector, computed throughout the algorithm) with predetermined
sets $\Gamma_i + t\cdot G_{i\to}$ (the cosets of $\Gamma_i$ in $\Gamma_{i-1}$). Therefore, efficient ways to calculate (26) for the case
of $\hat{x} = 0$ (e.g. using trellis decoding by employing the dual code of $\Gamma_i$) can be easily utilized for calculating
(26) for non-zero $\hat{x}$. This can be done by appropriately modifying the input LLR vector reflecting
the notion that all the possible enumerated codewords are members of the cosets, considered in the case
of $\hat{x} = 0$, shifted by the constant vector $\hat{x}$. As a consequence, using as inputs the LLRs of a modified
Algorithm 8 SC Decoder Steps Dedicated for Outer-Code $C_i$, $i \in [\ell]_-$
//STEP 2 · i:
⊲ Prepare $|F| - 1$ LLR input vectors $\{\hat\lambda^{(t)}\}_{t\in F\setminus\{0\}}$, each one of length N/ℓ, using (24), i.e.
$$\hat\lambda_j^{(t)} = \ln\left(\frac{\sum_{w_{i+1}^{\ell-1}\in F^{\ell-i-1}} R_g\left(\hat{x}_j^{(0)}, \hat{x}_j^{(1)}, \ldots, \hat{x}_j^{(i-1)}, 0, w_{i+1}^{\ell-1}\right)}{\sum_{w_{i+1}^{\ell-1}\in F^{\ell-i-1}} R_g\left(\hat{x}_j^{(0)}, \hat{x}_j^{(1)}, \ldots, \hat{x}_j^{(i-1)}, t, w_{i+1}^{\ell-1}\right)}\right), \quad \forall t \in F\setminus\{0\} \text{ and } \forall j \in [N/\ell]_-, \qquad (24)$$
where $\hat{x}^{(m)}$ is the estimated codeword of outer-code $C_m$ that was computed at the previous steps,
$m \in [i]_-$. Note that for the LLR calculation of $\hat\lambda_j^{(t)}$ in (24) we use input LLRs corresponding to
channel indices $j\cdot\ell,\ j\cdot\ell+1,\ \ldots,\ (j+1)\cdot\ell-1$.
//STEP 2 · i+ 1:
⊲ Decode the ith outer-code using the computed LLR vectors $\{\hat\lambda^{(t)}\}_{t\in F\setminus\{0\}}$, i.e.
$$\left[\hat{u}^{(i)}, \hat{x}^{(i)}\right] = \text{SCDecoder}\left(\{\hat\lambda^{(t)}\}_{t\in F\setminus\{0\}}, z_{i\cdot N/\ell}^{(i+1)\cdot N/\ell-1}\right). \qquad (25)$$
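Algorithm 9 SC Decoder Output Generation
[◭◭◭] Output: (occurs after applying Algorithm 8 for all $i \in [\ell]_-$)
• $\hat{u} = \left[\hat{u}^{(0)}, \hat{u}^{(1)}, \ldots, \hat{u}^{(\ell-1)}\right]$;
• $\hat{x}_{j\cdot\ell}^{(j+1)\cdot\ell-1} = g\left(\hat{x}_j^{(0)}, \hat{x}_j^{(1)}, \ldots, \hat{x}_j^{(\ell-1)}\right)$, $\forall j \in [N/\ell]_-$.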
channel generated by adding the known vector xˆ to the original channel output will allow to employ the
computations of the zero case for general cases.
Algorithm 8 can be adapted to support the computation technique suggested here. First, we initialize
the vector $\hat{x}$ (later given as output) to be the all-zeros vector. Secondly, we replace (24) by the following
calculation
$$\hat\lambda_j^{(t)} = \ln\left(\frac{\sum_{x\in\Gamma_i+\hat{x}_{j\cdot\ell}^{(j+1)\cdot\ell-1}}\exp\left(-\sum_{r=0}^{\ell-1}\lambda_{j\cdot\ell+r}^{(x_r)}\right)}{\sum_{x\in\Gamma_i+\hat{x}_{j\cdot\ell}^{(j+1)\cdot\ell-1}+t\cdot G_{i\to}}\exp\left(-\sum_{r=0}^{\ell-1}\lambda_{j\cdot\ell+r}^{(x_r)}\right)}\right), \quad t \in F\setminus\{0\}. \qquad (27)$$
Thirdly, after estimating $\hat{x}^{(i)}$, the outer-code codeword of $C_i$ in (25), we need to update the coset vector $\hat{x}$
by calculating
$$\hat{x}_{j\cdot\ell}^{(j+1)\cdot\ell-1} = \hat{x}_{j\cdot\ell}^{(j+1)\cdot\ell-1} + \hat{x}_j^{(i)}\cdot G_{i\to}, \quad \forall j \in [N/\ell]_-. \qquad (28)$$
As a consequence, in Algorithm 9 the decoder can just output $\hat{x}$ that was calculated throughout the odd
steps of the algorithm. This simplification is used in our suggested schematic implementation in Subsection
5.2.1.
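The incremental update in (28) can be sketched as follows, to illustrate why the accumulated coset vector ends up equal to the output that Algorithm 9 would otherwise compute with g(·). The function name is hypothetical, and the sketch assumes that F is the integers modulo a prime q, so that field arithmetic is plain modular arithmetic.

```python
import numpy as np

def accumulate_codeword(x_outer, G, q):
    """Accumulate the coset vector x_hat by applying (28) for i = 0..ell-1.

    x_outer[i]: the length N/ell codeword estimated for outer-code C_i
                (entries in 0..q-1; F = Z_q with q prime is assumed).
    G: the ell x ell generating matrix of the linear kernel.
    Returns the length-N vector x_hat.
    """
    G = np.asarray(G)
    ell = G.shape[0]
    n_outer = len(x_outer[0])
    x_hat = np.zeros(n_outer * ell, dtype=int)  # initialized to all-zeros
    for i in range(ell):  # in the decoder this runs right after step 2*i+1
        for j in range(n_outer):
            blk = slice(j * ell, (j + 1) * ell)
            # x_hat[j*ell : (j+1)*ell] += x_hat^(i)_j * (row i of G)
            x_hat[blk] = (x_hat[blk] + x_outer[i][j] * G[i]) % q
    return x_hat
```

After the last outer-code has been processed, each length-ℓ block of `x_hat` equals $[\hat{x}_j^{(0)}, \ldots, \hat{x}_j^{(\ell-1)}]\cdot G$, which is exactly the output of Algorithm 9 for a linear kernel.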
In case we have a mixed-kernels construction, the generalization is quite easy. In order to illustrate
this we consider an example of an ℓ dimensions kernel in which we have glued the symbols $u_1$ and $u_2$ to a
new symbol $u_{1,2} \in F^2$ (see Example 3 for an instance of such structure). In this case, we treat these
two symbols as one entity, and consider the outer-code associated with them, denoted as $C_{1,2}$, as an N/ℓ
length code over the alphabet $F^2$. The only change we have in the decoding algorithm is for the pair of
decoding steps of Algorithm 8 corresponding to this "glued" symbols outer-code. For the first step in the
pair, we need to compute $|F|^2 - 1$ LLR vectors $\{\hat\lambda^{(t_1,t_2)}\}_{(t_1,t_2)\in F^2\setminus\{(0,0)\}}$, each one of length N/ℓ. These
vectors serve as an input to the decoder of $C_{1,2}$. In this case, each LLR component in the vector is a
function of both the $u_1$ and $u_2$ inputs to the kernel. Equation (24) is therefore updated as follows:
$$\hat\lambda_j^{(t_1,t_2)} = \ln\left(\frac{\sum_{w_3^{\ell-1}\in F^{\ell-3}} R_g\left(\hat{x}_j^{(0)}, 0, 0, w_3^{\ell-1}\right)}{\sum_{w_3^{\ell-1}\in F^{\ell-3}} R_g\left(\hat{x}_j^{(0)}, t_1, t_2, w_3^{\ell-1}\right)}\right), \quad \forall j \in [N/\ell]_-. \qquad (29)$$
The second step of the pair in Algorithm 8 remains unchanged.
4.2 A Recursive Description of the SCL Algorithm
In this subsection we provide a recursive description of the SCL decoder, originally introduced by Tal and
Vardy [11]. Each stage of the SCL algorithm involves comparisons of likelihoods of different SC decoding
possibilities (resulting from keeping more than one decision option at the previous decoding stages).
Therefore, we assume that the inputs to the algorithm as well as its internal computation values are
interpreted as likelihoods, instead of LLRs³ (note, however, that LLRs can be used as well, see Remark 1).
Note, that if the decoding list is of size 1, then the formulation given below is of the SC decoder described
in Subsection 4.1 (with the only difference that likelihoods are employed instead of LLRs).
The SCL algorithm, described in this subsection, returns as output a list of decoding possibilities. The
most likely element of this list should be given as output.
4.2.1 Sequential Decoders as Path Traversal Algorithms in Decoding Trees
Before delving into the details of SCL, let us discuss the general idea that this algorithm entails. Sequential
decoding algorithms examine their decision space (i.e. the set of all possible results) and choose
a result from it, by gradually refining the space (i.e. eliminating some of the possible outcomes) until a
³ The notion of likelihoods normalization that was considered by Tal and Vardy [11, Algorithm 14] to avoid floating-point
or fixed-point underflows is also applicable here and should be employed for numerical stability.
Figure 7: Decoding tree for (u + v, v) polar code illustrating the decision space of the SC and SCL
algorithms
predetermined number of outcomes remains (in SC the number is 1, in SCL the number is L, from which
the best outcome is chosen). In SC and SCL the decision space is described by the input vector to the
encoder, u. The decision space is refined by determining the components of u in a consecutive order.
It is quite common to describe the decision space of an algorithm by an edge-labeled directed tree
dubbed a decoding tree. Note, however, that, strictly speaking, the graphical structure of the decoding
tree is generally a forest, because we may have multiple nodes at the top of the tree (representing different
input models) which are not connected to each other. These nodes are dubbed the roots of the decoding
tree.
Figure 7 illustrates such a decoding tree used in sequential decoding of Arikan’s (u+ v, v) polar code.
The decoding tree is a layered graph, such that the edges of each layer correspond to a single entry of
the vector u (the layers boundaries are indicated by the dotted lines in Figure 7). The nodes in the
graph indicate sequential decision junctions and the edges emanating from each node represent possible
assignments to the variable of the layer. The single path between the roots of the decoding tree and a
node appearing on the top of layer $u_i$ in the graph indicates previous decisions (on variables $u_0^{i-1}$) that
preceded the decision on $u_i$. Consequently, the paths between the root of the tree and the leaves of the
tree correspond to all possible assignments to the vector u. For example, in Figure 7 the paths of the
illustrated tree correspond to all the binary assignments to $u_0^{i+3}$ given that $u_0^{i-1}$ is a fixed prefix (indicated
in the figure by the string 01001..., and further denoted by $\hat{u}_0^{i-1}$) and $u_i^{i+3} \in \{0,1\}^4$.
In the SC algorithm the decoder always considers a single path (dubbed as a decoding-path) among the
possible tree paths. The decoding-path is gradually paved by sequentially joining to it edges emanating
from nodes reached by the previous stages. On stage #i of the algorithm the edge selection corresponds
to the most preferable assignment to the variable ui (assuming that the path leading to ui’s layer is fixed).
Figure 8 illustrates this abstraction. Given a certain prefix assignment to the sub-vector $u_0^{i-1} = \hat{u}_0^{i-1}$, we
first turn to decide which edge emanating from the node corresponding to this prefix is better (i.e. we select
the best assignment to $u_i$ given the prefix). In SC this is done by calculating the likelihood of $u_i$ using
the channel observation vector y and the information prefix $u_0^{i-1} = \hat{u}_0^{i-1}$, i.e. $W\left(y, u_0^{i-1} = \hat{u}_0^{i-1} \mid u_i = b\right)$
where $b \in \{0,1\}$. We call this likelihood function the observed model of the decision node and we view it
as a function of only the variable b (the value of $u_i$) while the other elements (y and $u_0^{i-1}$) are considered
as observations that define the statistical model. Based on this model's calculations the '0' edge was chosen
in Figure 8 (indicated by the thick black edge). This decision, in turn, is used to update the model to
$W\left(y, u_0^{i-1} = \hat{u}_0^{i-1}, u_i = 0 \mid u_{i+1} = b\right)$, which is further used to decide on $u_{i+1}$ on the next step. Moving
forward, the algorithm successively updates the model and chooses to add to the existing path, $\hat{u}_0^{i+1}$, the
edges corresponding to $\hat{u}_{i+2} = 1$ and $\hat{u}_{i+3} = 0$. In case a certain variable is frozen, the next
edge is chosen as the one corresponding to its fixed value.
When applied to polar codes, SC has an advantage in terms of its algorithmic complexity. Utilizing
the recursive structure of the code, it is possible to efficiently compute the likelihoods needed for the
Figure 8: Representation of SC as a sequential walk on a decoding tree
decision on $u_i$ by reusing previous computation results obtained when deciding on $u_0^{i-1}$. In other words,
in SC the channel observation model is easily updated given some results of the calculations performed
for determining the former observation model. The space needed to store these temporary calculations is
linear in the code length (assuming the kernel size ℓ and the alphabet size |F| are fixed). On the other
hand, when the SC algorithm decides on $u_i$ it does not take into account the existence of possible frozen
values of its descendant nodes. In other words, it assumes that all the assignments to $u_i^{N-1}$ are possible
when calculating the likelihoods, even though the code structure enforces certain variables to be fixed.
This lack of "future awareness" and the inability of the algorithm to change its past decisions (i.e. the
algorithm always advances in one direction in the tree, from "top" to "bottom") are fundamental reasons
for its sub-optimality.
The SCL algorithm with list of size L is a generalization of SC, in which the decoder considers
simultaneously at most L possible decoding-paths. It can be seen that the complexity of SCL both in
time and in space will be approximately bounded from above by the complexity of SC times L. The
reason for this is that the operations associated with each tree junction in SCL are roughly the same ones that
would have been associated with the junction if it were on a single SC path. We need, however, additional
operations for choosing the L edges with the maximal likelihoods for continuing the paths. This can
be done in linear time (in L) per each decoding tree layer (information symbol, ui). Secondly, tracking
data structures need to be defined and utilized, in order to keep tabs on the existing decoding-paths while
allowing an emulation of the SC algorithm for each separate decoding path. As we next see we can employ
such structures and algorithms that will not exceed L times the asymptotic complexity of SC.
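The pruning operation mentioned above (keeping the L most likely of the candidate extensions) can be sketched with a selection routine instead of a full sort. The function name and the returned tuple layout are assumptions of this example; `numpy.argpartition` is used because it selects the largest entries without fully ordering them.

```python
import numpy as np

def prune_paths(p0, p1, L):
    """Keep the (at most) L most likely extensions of the current paths.

    p0[r], p1[r]: likelihood of extending decoding-path r with symbol 0 or 1.
    Returns a list of (sigma, beta, likelihood) tuples, where sigma is the
    index of the source path and beta is the label of the selected edge.
    The kept candidates are returned in no particular order.
    """
    p = np.concatenate([p0, p1])           # p = [p^(0), p^(1)]
    rho = min(p.size, L)                   # number of surviving paths
    # Partial selection of the rho largest values (no full sort is needed).
    keep = np.argpartition(-p, rho - 1)[:rho]
    n_paths = len(p0)
    return [(int(k % n_paths), int(k // n_paths), float(p[k])) for k in keep]
```

For example, with `p0 = [0.1, 0.4]`, `p1 = [0.3, 0.05]` and `L = 2`, the surviving candidates are path 1 extended by 0 and path 0 extended by 1.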
In Figure 9 we described the SC and SCL algorithms as a sequence of decoding tree refinements. The
models for these decisions are sequentially updated using past decisions (selected paths) and the current
observation model. The connection between the input observation model and the input model for the
next outer-code is defined by the inner-code layer. These updated input models are recursively provided
to the smaller outer-codes, until we reach codes of single symbols (corresponding to the elements of
$u_0^7$) in which decisions are made. It is indeed a property of the recursive description of SC and SCL
that each recursion step utilizes a decoding tree of which the layers are the outer-codes. A fundamental
property of the algorithms is that on each recursion step, updating a model based on an edge selection
is linear in the outer-code length (assuming that the kernel is fixed). As a consequence, the number of
operations of SC is O (N · logN).
4.2.2 Data Structures for Tracking Decoding Paths in SCL
Tracking the employed observation-model is easy in SC because at any given point in time we assume
only a single model (induced by previous SC decisions). On the other hand, in SCL, multiple models are
considered simultaneously and it is therefore required to efficiently keep track of them. Specifically, for
each constituent code of the GCC we must store the tree structure connecting between its outer-codes.
We now propose data structures for meeting this requirement.
Figure 9: SCL (L = 4) algorithm example of (u + v, v) with N = 8 bits (see Figure 7) illustrated on the
right a decoding tree on the outer-codes of the structure (C0, C1). The left decoding tree expands each
edge of the right tree into decoding-paths on the outer-codes of C0 and C1 . The labels of the edges are
the values of the outer-codes.
• $S^{(e)}$ - an ℓ × L matrix describing the edges of the decoding tree. Specifically, $S^{(e)}_{0\to}$ contains indications
for the edges in the $C_0$ layer and $S^{(e)}_{1\to}$ corresponds to the $C_1$ layer. The only interesting nodes in the
tree are the ones having a decoding path leading to them (we call them active nodes). We use the
arbitrary convention that active nodes are assigned numbers in $[L]_-$ starting from the level’s leftmost
node to the rightmost node as they appear in the figure. To represent this in our data structure we
let $S^{(e)}_{i,j}$ contain the index of the single node at the top of layer i that is connected to node j at the
bottom of the layer. In case there are fewer than L nodes at the bottom of a layer, the matrix entries
corresponding to the missing nodes are assigned the null symbol, φ.
• $S^{(p)}$ - an L × ℓ matrix, such that $S^{(p)}_{i\to}$ defines the single path between the roots of the tree and the
ith node at the bottom of the final layer. This path is specified in terms of the node indices, such
that $S^{(p)}_{i,j}$ is the node located on the top of layer j in the path. Note that $S^{(p)}$ is easily derived from
$S^{(e)}$.
• s - an L length vector describing the origin model for each decoding path, i.e. $s = \left(S^{(p)}_{\downarrow 0}\right)^T$.
• $\hat{X}^{(i)}$ where $i \in [\ell]_-$ - ℓ matrices (of dimensions L × N/ℓ) used for keeping the labels of the selected
edges in $S^{(e)}$. Here $\hat{X}^{(i)}_{r\to}$ contains the label of the edge pointing to node $r \in [L]_-$ at the bottom of
layer $i \in [\ell]_-$ in the decoding tree. Note that this edge is represented by the $S^{(e)}_{i,r}$ entry.
• $\hat{U}^{(i)}$ where $i \in [\ell]_-$ - ℓ matrices (of dimensions L × N/ℓ), such that $\hat{U}^{(i)}_{r\to}$ is the information word
(including the assignment of the frozen symbols) of the outer-code corresponding to the $\hat{X}^{(i)}_{r\to}$ codeword.
The decoding-paths data structures are generated throughout the decoding process. As SCL sequentially
traverses the decoding tree from its first level to its leaves, these matrices are updated. After
deciding on layer $i \in [\ell]_-$ we write row i in $S^{(e)}$, and prepare matrices $\hat{X}^{(i)}$ and $\hat{U}^{(i)}$. Following the update of
$S^{(e)}$, we prepare a new version of the paths matrix $S^{(p)}$ and its corresponding source vector s. Note
that on stage i, we interpret $S^{(p)}_{r,0:(i-1)}$ as the single path leading from the roots to node r at the bottom
of layer i.
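The derivation of the paths matrix from the edges matrix can be sketched by walking each bottom node back to its root. This is a hypothetical illustration rather than the paper's procedure: the function name is invented, the null symbol φ is represented by `None`, and active nodes are assumed to occupy a contiguous prefix of each row, as in the numbering convention above.

```python
def paths_from_edges(S_e, decided_layers):
    """Derive the paths matrix S_p and the source vector s from the edges matrix.

    S_e[i][j]: index of the node at the top of layer i that is connected to
               node j at the bottom of layer i (None stands for the null symbol).
    decided_layers: number of layers already decided (rows of S_e in use).
    Returns (S_p, s), where S_p[r][j] is the node at the top of layer j on the
    path leading to node r at the bottom of the last decided layer, and
    s[r] = S_p[r][0] is the index of the originating input model.
    """
    last = decided_layers - 1
    S_p = []
    for r in range(len(S_e[last])):
        node = S_e[last][r]
        if node is None:
            break  # remaining entries are inactive by the prefix assumption
        row = [None] * decided_layers
        row[last] = node
        # The node at the top of layer j+1 is the same node as the one at the
        # bottom of layer j, so its parent is found in row j of S_e.
        for j in range(last - 1, -1, -1):
            node = S_e[j][node]
            row[j] = node
        S_p.append(row)
    s = [row[0] for row in S_p]
    return S_p, s
```

For instance, with `S_e = [[0, 1, 1, None], [0, 2, 2, 1]]` and two decided layers, node 3 at the bottom of layer 1 hangs from node 1 at the top of layer 1, which in turn originates from root 1.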
4.2.3 SCL Recursive Definition
Having defined the decoding paths tracking data structures, we are now ready to describe the SCL
algorithm’s inputs and outputs. Consider SCL for the (u+ v, v) polar code of length N bits with list size
L. The inputs of the algorithm are listed below.
• Two likelihood matrices $\Pi^{(0)}$ and $\Pi^{(1)}$ of L × N dimensions. Each row of the matrices corresponds to
a different input observation model option, considered by the decoder. The plurality of models exists
due to SCL’s feature of constantly keeping a list of L decoding-paths representing past decisions on
the information word symbols. Each decoding-path induces a different statistical model, in which
it is assumed that the information sub-vector, associated with it, is the one that was transmitted.
We have
$$\Pi^{(b)}_{i,j} = \Pr\left(Y_j^{(i)} = y_j^{(i)} \mid V_j = b\right), \qquad (30)$$
where $Y_j^{(i)}$ is the measurement of the jth channel $V_j \to Y_j$ of the ith option in the list and $b \in \{0,1\}$.
• A scalar $\rho_{in}$ indicating how many rows in $\Pi^{(0)}$ and $\Pi^{(1)}$ are occupied. The algorithm supports
tracking of $\rho_{in} \in [L]$ input models simultaneously.
• A vector indicator $z \in \{0,1\}^N$, in which $z_i = 1$ if and only if the ith component of u is frozen.
The algorithm outputs the following structures.
• A matrix Uˆ of L × N dimensions, which represents L arrays of information values (each array of
length N) - this is the list of the possible information words that the decoder estimated.
• A matrix Xˆ of L×N dimensions, which represents L arrays of codewords (each array of length N)
- this is the list of codewords that correspond to the information words in Uˆ.
• A vector $s_0^{L-1}$ that indicates, for each row in $\hat{U}$ and $\hat{X}$, the row of the inputs $\Pi^{(0)}$ and $\Pi^{(1)}$ from which it
originated (i.e. it refers to the statistical model that was assumed when estimating this row).
• A scalar ρout indicating how many rows in Uˆ or Xˆ are occupied.
The SCL function signature is defined as
$$\left[\hat{U}, \hat{X}, s, \rho_{out}\right] = \text{SCLDecoder}\left(\{\Pi^{(b)}\}_{b\in\{0,1\}}, \rho_{in}, z\right). \qquad (31)$$
For length N = 2 bits code (i.e. the code induced by a single application of the kernel) the procedure
is described in Algorithm 10. In order to specify the SCL decoder for length $N = 2^n$ polar code, let us
assume that we already developed an SCL decoder for length N/2 polar code. Using this assumption, a
recursive decoder for length N polar code is described in Algorithm 11. Let T(n) be the decoding time
complexity for length $N = 2^n$ bits polar code. Then $T(n) = 2\cdot T(n-1) + O(L\cdot N)$, and $T(1) = O(L)$,
which results in $T(n) = O(L\cdot N\cdot\log N)$. Similarly, the space complexity of the algorithm can be shown
to be $O(L\cdot N)$.
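The closed form of the time complexity follows by unrolling the recurrence. In the short derivation below, the constant c is introduced only to make the O(·) term explicit and is not part of the original text; recall that $N = 2^n$.

```latex
\begin{align*}
T(n) &\le 2\,T(n-1) + c\,L\,2^{n} \\
     &\le 2^{k}\,T(n-k) + k\,c\,L\,2^{n}, \qquad 1 \le k \le n-1, \\
     &\le 2^{n-1}\,T(1) + (n-1)\,c\,L\,2^{n} \\
     &= O(L\cdot N) + O(L\cdot N\cdot\log N) = O(L\cdot N\cdot\log N).
\end{align*}
```

Each unrolling step doubles the number of sub-problems while halving their length, so every recursion level contributes the same $c\,L\,2^{n}$ term, and there are $n - 1 = \log_2 N - 1$ such levels above the base case.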
The generalization of the decoding algorithm for a homogenous kernel of ℓ dimensions with alphabet
F is quite straightforward. Here we emphasize the principal changes from the (u + v, v) case. Firstly,
the only change in the input is that we should have |F| channel matrices, $\Pi^{(b)}$, one for each alphabet
symbol $b \in F$. With this change in alphabet the definition of each matrix in (30) remains. Consequently,
the function signature is defined as follows.
$$\left[\hat{U}, \hat{X}, s, \rho_{out}\right] = \text{SCLDecoder}\left(\{\Pi^{(b)}\}_{b\in F}, \rho_{in}, z\right). \qquad (40)$$
In the decoding algorithm, we have ℓ pairs of steps, such that each one is dedicated to a different outer-code.
Before reaching step 2 · i − 1, we already decoded outer-codes $\{C_m\}_{m=0}^{i-1}$. Using the decoding tree
terminology, we can say that we have traversed i layers of the tree (starting from the roots) generating at
most L decoding-paths. As a result we have the paths tracking data structures $S^{(e)}$, $S^{(p)}$, s, $\{\hat{U}^{(m)}\}_{m=0}^{i-1}$
Algorithm 10 SCL Decoding for the (u+ v, v) Kernel
[◮◮◮] Input: $\{\Pi^{(b)}\}_{b\in\{0,1\}}$; $\rho_{in}$; z.
//Initialization: Initialize the decoding-paths data structures: $S^{(e)}$, $S^{(p)}$, s, $\hat{U}^{(0)}$, $\hat{X}^{(0)}$, $\hat{U}^{(1)}$ and $\hat{X}^{(1)}$.
//STEP 0:
⊲ Generate two $\rho_{in}$ length vectors, $p^{(0)}$ and $p^{(1)}$. For each of the $\rho_{in}$ occupied rows of $\Pi^{(0)}$ and $\Pi^{(1)}$
compute
$$p_r^{(0)} = \frac{1}{2}\left(\Pi^{(0)}_{r,0}\cdot\Pi^{(0)}_{r,1} + \Pi^{(1)}_{r,0}\cdot\Pi^{(1)}_{r,1}\right) \quad \text{and} \quad p_r^{(1)} = \frac{1}{2}\left(\Pi^{(0)}_{r,0}\cdot\Pi^{(1)}_{r,1} + \Pi^{(1)}_{r,0}\cdot\Pi^{(0)}_{r,1}\right), \quad \text{for } r \in [\rho_{in}]_-.$$
//STEP 1:
⊲ Concatenate the two vectors into one $2\cdot\rho_{in}$ length vector, $p = [p^{(0)}, p^{(1)}]$.
⊲ Let $\tilde{p}$ be a vector that contains the $\rho = \min\{2\cdot\rho_{in}, L\}$ largest values of p.
⊲ For each $r \in [\rho]_-$ have $S^{(e)}_{0,r} = \sigma$ and $\hat{U}^{(0)}_{r,0} = \beta$ if and only if the rth component of $\tilde{p}$ originated
from $p^{(\beta)}_{\sigma}$. In other words, its source model is σ and the decoding tree edge connecting the
source model (represented by a node at the top level of the graph) and node r at the bottom of the
first layer has label β.
REMARK: If u is frozen (without loss of generality assume that it is set to the 0 value), then steps
0 and 1 can be skipped and $\rho = \rho_{in}$, $S^{(e)}_{0,0:\rho_{in}-1} = [0, 1, \ldots, \rho_{in}-1]$, $\hat{U}^{(0)} = 0$.
⊲ Update $S^{(p)}$ and s accordingly.
//STEP 2:
⊲ Generate two ρ length vectors, $p^{(0)}$ and $p^{(1)}$. For each of the ρ occupied rows of $S^{(p)}$ compute
($\forall r \in [\rho]_-$)
$$p_r^{(0)} = \frac{1}{2}\cdot\begin{cases}\Pi^{(0)}_{s_r,0}\cdot\Pi^{(0)}_{s_r,1}, & \hat{U}^{(0)}_{r,0} = 0; \\ \Pi^{(1)}_{s_r,0}\cdot\Pi^{(0)}_{s_r,1}, & \hat{U}^{(0)}_{r,0} = 1,\end{cases} \qquad (32)$$
$$p_r^{(1)} = \frac{1}{2}\cdot\begin{cases}\Pi^{(1)}_{s_r,0}\cdot\Pi^{(1)}_{s_r,1}, & \hat{U}^{(0)}_{r,0} = 0; \\ \Pi^{(0)}_{s_r,0}\cdot\Pi^{(1)}_{s_r,1}, & \hat{U}^{(0)}_{r,0} = 1.\end{cases} \qquad (33)$$
//STEP 3:
⊲ Concatenate the two vectors into one 2 · ρ length vector, $p = [p^{(0)}, p^{(1)}]$.
⊲ Let $\tilde{p}$ be a vector that contains the $\rho_{out} = \min\{2\cdot\rho, L\}$ largest values of p.
⊲ For each $r \in [\rho_{out}]_-$ have $S^{(e)}_{1,r} = \sigma$ and $\hat{U}^{(1)}_{r,0} = \beta$ if and only if the rth component of $\tilde{p}$
originated from $p^{(\beta)}_{\sigma}$.
REMARK: If the second bit is frozen (without loss of generality assume that it is set to the 0 value),
then steps 2 and 3 can be skipped and $S^{(e)}_{1,0:\rho-1} = [0, 1, \ldots, \rho-1]$, $\hat{U}^{(1)} = 0$, $\rho_{out} = \rho$.
⊲ Update $S^{(p)}$ and s accordingly.
[◭◭◭] Output: • $\hat{U}_{r\to} = \left[\hat{U}^{(0)}_{S^{(p)}_{r,1},0},\ \hat{U}^{(1)}_{r,0}\right]$, $\forall r \in [\rho_{out}]_-$;
• $\hat{X} = \left[\hat{U}_{\downarrow 0} + \hat{U}_{\downarrow 1},\ \hat{U}_{\downarrow 1}\right]$;
• s;
• $\rho_{out}$.
Algorithm 11 SCL Decoder for Length N = 2n Bits (u + v, v) Polar Code
[◮◮◮] Input:
{
Π(b)
}
b∈{0,1}; ρin; z.
//Initialization: ⊲ Initialize the decoding-paths data structures : S(e), S(p), s, Uˆ(0), Xˆ(0), Uˆ(1) and
Xˆ(1).
//STEP 0:
⊲ Prepare the probability transition matrices for the first outer-code decoder. Specifically, generate
two matrices P(b) of dimensions L×N/2, b ∈ {0, 1}, such that
P
(0)
r,j =
1
2
(
Π
(0)
r,2·j ·Π
(0)
r,2·j+1 +Π
(1)
r,2·j ·Π
(1)
r,2·j+1
)
(34)
and
P
(1)
r,j =
1
2
(
Π
(0)
r,2·j · Π
(1)
r,2·j+1 +Π
(1)
r,2·j · Π
(0)
r,2·j+1
)
, ∀r ∈ [ρin]− , ∀j ∈ [N/2]− (35)
//STEP 1:
⊲ Decode the first outer-code using the updated channel model matrix, i.e.
[
Uˆ(0), Xˆ(0), S
(e)
0→, ρ
]
= SCLDecoder
({
P(b)
}
b∈{0,1}
, ρin, z
N/2−1
0
)
. (36)
⊲ Update S(p) and s following (36).
//STEP 2:
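⊲ Prepare the input matrices for the decoder of the second outer-code of length $N/2$. Specifically, generate two matrices $P^{(b)}$ of dimensions $L \times N/2$, $b \in \{0, 1\}$, such that
\[
P^{(0)}_{r,j} = \frac{1}{2} \cdot
\begin{cases}
\Pi^{(0)}_{s_r,2j} \cdot \Pi^{(0)}_{s_r,2j+1}, & \hat{X}^{(0)}_{r,j} = 0; \\
\Pi^{(1)}_{s_r,2j} \cdot \Pi^{(0)}_{s_r,2j+1}, & \hat{X}^{(0)}_{r,j} = 1,
\end{cases}
\tag{37}
\]
and
\[
P^{(1)}_{r,j} = \frac{1}{2} \cdot
\begin{cases}
\Pi^{(1)}_{s_r,2j} \cdot \Pi^{(1)}_{s_r,2j+1}, & \hat{X}^{(0)}_{r,j} = 0; \\
\Pi^{(0)}_{s_r,2j} \cdot \Pi^{(1)}_{s_r,2j+1}, & \hat{X}^{(0)}_{r,j} = 1,
\end{cases}
\quad \forall r \in [\rho]_-,\ \forall j \in [N/2]_-.
\tag{38}
\]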
//STEP 3:
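⊲ Decode the second outer-code using the updated channel model matrix, i.e.
\[
\left[\hat{U}^{(1)}, \hat{X}^{(1)}, S^{(e)}_{1\rightarrow}, \rho_{out}\right]
= \mathrm{SCLDecoder}\left(\left\{P^{(b)}\right\}_{b\in\{0,1\}}, \rho, z_{N/2}^{N-1}\right).
\tag{39}
\]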
⊲ Update S(p) and s following (39).
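[◭◭◭] Output: • $\hat{U}_{\rightarrow r} = \left[\hat{U}^{(0)}_{S^{(p)}_{r,1}\rightarrow},\ \hat{U}^{(1)}_{r}\right]$, $\forall r \in [\rho_{out}]_-$;
• $\hat{X}_{r,even} = \hat{X}^{(0)}_{S^{(p)}_{r,1}\rightarrow} + \hat{X}^{(1)}_{r\rightarrow}$ and $\hat{X}_{r,odd} = \hat{X}^{(1)}_{r\rightarrow}$, $\forall r \in [\rho_{out}]_-$;
• $s$;
• $\rho_{out}$.
Here $\hat{X}_{r,even}$ ($\hat{X}_{r,odd}$) are the vectors of the even (odd) indices columns of row number $r$ in matrix $\hat{X}$.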
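The likelihood preparation of steps 0 and 2 can be illustrated for a single decoding path as follows. This is a sketch under stated assumptions: the function names and the use of plain Python lists for one row of $\Pi^{(0)}$ and $\Pi^{(1)}$ are introduced here for illustration only and do not appear in the original description.

```python
def first_outer_likelihoods(pi0, pi1):
    """Equations (34) and (35) for one row: pi0[k], pi1[k] are the likelihoods
    of code bit k being 0 or 1, respectively."""
    half = len(pi0) // 2
    p0 = [0.5 * (pi0[2 * j] * pi0[2 * j + 1] + pi1[2 * j] * pi1[2 * j + 1])
          for j in range(half)]
    p1 = [0.5 * (pi0[2 * j] * pi1[2 * j + 1] + pi1[2 * j] * pi0[2 * j + 1])
          for j in range(half)]
    return p0, p1


def second_outer_likelihoods(pi0, pi1, x0_hat):
    """Equations (37) and (38) for one row, given the decided first
    outer-codeword x0_hat of this path."""
    half = len(pi0) // 2
    p0, p1 = [], []
    for j in range(half):
        # If x0_hat[j] = 1, the even-indexed code bit is the complement of v.
        even_if_v0 = pi1[2 * j] if x0_hat[j] else pi0[2 * j]
        even_if_v1 = pi0[2 * j] if x0_hat[j] else pi1[2 * j]
        p0.append(0.5 * even_if_v0 * pi0[2 * j + 1])
        p1.append(0.5 * even_if_v1 * pi1[2 * j + 1])
    return p0, p1
```

In a full decoder these functions would be applied to row $s_r$ of the input matrices for every active path $r$.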
and $\left\{\hat{X}^{(m)}\right\}_{m=0}^{i-1}$ updated and describing the possible paths that reach nodes at the top of the $i$th layer.
Algorithm 12 elaborates on steps 2 · i and 2 · i + 1 which find the sequel to the L paths in layer i of the
tree. The output generation of SCL is described in Algorithm 13.
Algorithm 12 SCL Decoding Steps Dedicated for Outer-Code $C_i$, $i \in [\ell]_-$
//Let $\rho$ be set to the number of active nodes at the top of layer $i$. For $i = 0$ set $\rho = \rho_{in}$.
STEP $2 \cdot i$
⊲ Using the decoding results of the outer-codewords from the previous steps, i.e. $\hat{X}^{(m)}$ for $m \in [i-1]_-$, prepare the $N/\ell$ length likelihood lists, $\left\{P^{(b)}\right\}_{b\in F}$. Each item in the list is an $L \times N/\ell$ matrix, and all of them will serve as inputs to the decoder of the $N/\ell$ length outer-code #$i$. For the computation of row $r$ of $P^{(b)}$, use the input statistical model $s_r$, that is the likelihoods in rows $\left\{\Pi^{(b)}_{s_r\rightarrow}\right\}_{b\in F}$.
\[
P^{(b)}_{r,j} = \sum_{x \in A^{(r,j,b)}} \Pi^{(x_0)}_{s_r,\,j\cdot\ell} \cdot \Pi^{(x_1)}_{s_r,\,j\cdot\ell+1} \cdot \Pi^{(x_2)}_{s_r,\,j\cdot\ell+2} \cdot \ldots \cdot \Pi^{(x_{\ell-1})}_{s_r,\,(j+1)\cdot\ell-1},
\quad \forall r \in [\rho]_-,\ \forall j \in [N/\ell]_-,
\tag{41}
\]
where $A^{(r,j,b)}$ is defined to be the set of all possible codewords $c = g(v)$ of the inner-code (defined by the kernel $g(\cdot)$), having $v_i = b$ and the prefix $v_0^{i-1}$ defined by the $r$th decoding-path edge labels. Note that the $r$th decoding path nodes are $\sigma = \left[S^{(p)}_{r,0:(i-1)}, r\right]$ and their corresponding $j$th inner-code information prefix is $v = \left[\hat{X}^{(m)}_{\sigma_{m+1},j}\right]_{m=0}^{i-1}$. Consequently we have
\[
A^{(r,j,b)} \triangleq \left\{ g(v) \,\middle|\, v_{i+1}^{\ell-1} \in F^{\ell-1-i} \wedge v_i = b \wedge v_{i-1} = \hat{X}^{(i-1)}_{r,j} \wedge v_m = \hat{X}^{(m)}_{S^{(p)}_{r,m+1},j} \text{ where } m \in [i-2]_- \right\}.
\tag{42}
\]
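STEP $2 \cdot i + 1$
⊲ SCL decode the $i$th outer-code using the updated channel model matrix, i.e.
\[
\left[\hat{U}^{(i)}, \hat{X}^{(i)}, S^{(e)}_{i\rightarrow}, \rho\right]
= \mathrm{SCLDecoder}\left(\left\{P^{(b)}\right\}_{b\in F}, \rho, z_{i\cdot N/\ell}^{(i+1)\cdot N/\ell-1}\right).
\tag{43}
\]
⊲ Update $S^{(p)}$ and $s$ following (43).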
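The marginalization in (41) and (42) can be sketched by brute-force enumeration. The example below is an illustration under stated assumptions: it handles only a binary linear kernel $g(v) = v \cdot G$ over GF(2), treats one inner codeword (one value of $j$) of one path, and uses the hypothetical name `outer_likelihood` and the layout `pi[x][k]` (likelihood of symbol `x` at position `k` of the inner codeword).

```python
import itertools


def outer_likelihood(pi, G, prefix, b):
    """Sum over all inner codewords g(v) with v[:i] == prefix and v[i] == b,
    where i = len(prefix), of the product of the symbol likelihoods (eq. (41))."""
    ell = len(G)
    i = len(prefix)
    total = 0.0
    # Enumerate the undecided suffix v_{i+1}, ..., v_{ell-1} as in (42).
    for suffix in itertools.product((0, 1), repeat=ell - 1 - i):
        v = list(prefix) + [b] + list(suffix)
        # Inner-code encoding x = v * G over GF(2).
        x = [sum(v[k] * G[k][c] for k in range(ell)) % 2 for c in range(ell)]
        term = 1.0
        for c, xc in enumerate(x):
            term *= pi[xc][c]
        total += term
    return total
```

For Arikan's kernel the result agrees with (34), (35), (37) and (38) up to the constant factor 1/2, which (41) does not include.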
The decoder for the basic $N = \ell$ length code also contains $\ell$ pairs of steps. The procedure is similar to Algorithm 12. However, instead of delivering the likelihood matrices $\left\{P^{(b)}\right\}_{b\in F}$ (here these matrices are actually column vectors) to an outer-code decoder, we concatenate them to a vector $\tilde{p}$ and choose the $\rho = \min\{L, |F| \cdot \rho\}$ maximal elements from it. Following this selection we update the decoding path tracking structures. This is a generalization of the case of the $N = 2$ decoder in the (u + v, v) construction.
In case the kernel is mixed, the generalization is also quite easy. Let us consider the mixed-kernels example from the end of Subsection 4.1. The only changes we have in the decoding algorithm are for the pair of steps in Algorithm 12 associated with the glued outer-code $C_{1,2}$. In step 3 (the preparation step for this outer-code), we prepare $|F|^2$ input matrices $P^{(b_1,b_2)}$, for all $(b_1, b_2) \in F^2$. In order to do this, we modify equations (41) and (42), replacing $b$ with the pair $(b_1, b_2)$, corresponding to $v_1$ and $v_2$ in (42). The decoder of $C_{1,2}$ is supposed to return a list of estimations of the information words, their corresponding codewords and the model indicator vectors. These outputs and the temporary structures are re-organized, as is done in step $2 \cdot r$ for the decoding algorithm of the homogenous kernel polar code. Note, however, that at the end of step 3, there are three information words lists $\hat{U}^{(0)}$, $\hat{U}^{(1)}$ and $\hat{U}^{(2)}$ along
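Algorithm 13 SCL Decoding Algorithm Output Generation
[◭◭◭] Output: (occurs after applying Algorithm 12 for all $i \in [\ell]_-$)
• $\hat{U}_{\rightarrow r} = \left[\hat{U}^{(0)}_{S^{(p)}_{r,1}\rightarrow},\ \hat{U}^{(1)}_{S^{(p)}_{r,2}\rightarrow},\ \ldots,\ \hat{U}^{(\ell-2)}_{S^{(p)}_{r,\ell-1}\rightarrow},\ \hat{U}^{(\ell-1)}_{r}\right]$, $\forall r \in [\rho]_-$;
• $\hat{X}_{r,(j\cdot\ell):((j+1)\cdot\ell-1)} = g\left(\hat{X}^{(0)}_{S^{(p)}_{r,1}\rightarrow},\ \hat{X}^{(1)}_{S^{(p)}_{r,2}\rightarrow},\ \ldots,\ \hat{X}^{(\ell-2)}_{S^{(p)}_{r,\ell-1}\rightarrow},\ \hat{X}^{(\ell-1)}_{r}\right)$, $\forall r \in [\rho]_-$, $\forall j \in [N/\ell]_-$;
• $s$;
• $\rho_{out} = \rho$.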
with their corresponding three outer-code codewords lists. This is because we have decoded C1,2’s glued
symbols simultaneously, which resulted in retrieving Uˆ(1), Uˆ(2), Xˆ(1) and Xˆ(2) in a single decoding step.
4.3 A Recursive Description of the BP Algorithm
BP is an iterative message-passing decoding algorithm whose messages are sent over Forney's normal factor graph [32]. Although BP is an alternative to SC decoding [1], there is no evidence as to which algorithm has better performance over general channels, except for the BEC, on which BP is shown to outperform SC [12]. Simulations, however, suggest that in many cases BP outperforms SC. On the other hand, SCL with small list size L outperforms BP in many cases.
The order of sending the messages on the graph is called the schedule of the algorithm. Hussami et al. suggested employing the "Z shape schedule" for transferring the messages [12, Section II.A]. In this correspondence we introduce a serial schedule which is induced from the GCC structure of the code.
We begin by describing the types of messages that are computed throughout the algorithm for the
(u + v, v) polar code. Figure 5 depicts the normal factor graph representation of Arikan’s kernel. We
have four symbol half edges denoted by u, v, x0 and x1. These symbols have the following functional
dependencies among them: x0 = u + v and x1 = v. The messages and the inputs that are sent on the
graph are assumed to be LLRs, and their values are taken from $\mathbb{R} \cup \{\pm\infty\}$. The values $\infty$ and $-\infty$ are special
types of LLR values that indicate known assignment of 0 and 1, respectively. They are used to support
the existence of the polar code’s frozen symbols.
We associate four input LLR messages with the symbols half edges. These messages may be generated by the output of the channel, by known values associated with frozen bits or by computations that were done in this iteration or previous ones. We represent these messages by $\mu^{(in)}_{u}$, $\mu^{(in)}_{v}$, $\mu^{(in)}_{x_0}$ and $\mu^{(in)}_{x_1}$. The algorithm computes four output LLR messages, $\mu^{(out)}_{u}$, $\mu^{(out)}_{v}$, $\mu^{(out)}_{x_0}$ and $\mu^{(out)}_{x_1}$, indicating the estimations of $u$, $v$, $x_0$ and $x_1$, respectively, by the decoding algorithm. The messages are computed according to the extrinsic information principle, i.e. each message that is sent from a node on an adjacent edge is a function of all the messages that were previously sent to the node, except the message that was received over the particular edge. The nodes of the graphs are denoted by $a_0$ (the adder functional) and $e_1$ (the equality functional). Using the ideas mentioned above we have the following computation rules.
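\[
\mu_{e_1\rightarrow a_0} = f_{(=)}\left(\mu^{(in)}_{x_1}, \mu^{(in)}_{v}\right), \tag{44}
\]
\[
\mu_{a_0\rightarrow e_1} = f_{(+)}\left(\mu^{(in)}_{x_0}, \mu^{(in)}_{u}\right), \tag{45}
\]
\[
\mu^{(out)}_{u} = f_{(+)}\left(\mu^{(in)}_{x_0}, \mu_{e_1\rightarrow a_0}\right), \tag{46}
\]
\[
\mu^{(out)}_{v} = f_{(=)}\left(\mu^{(in)}_{x_1}, \mu_{a_0\rightarrow e_1}\right), \tag{47}
\]
\[
\mu^{(out)}_{x_0} = f_{(+)}\left(\mu^{(in)}_{u}, \mu_{e_1\rightarrow a_0}\right), \tag{48}
\]
\[
\mu^{(out)}_{x_1} = f_{(=)}\left(\mu^{(in)}_{v}, \mu_{a_0\rightarrow e_1}\right), \tag{49}
\]
where $f_{(=)}(z_0, z_1) \triangleq z_0 + z_1$ and $f_{(+)}(z_0, z_1) \triangleq 2\tanh^{-1}\left(\tanh(z_0/2) \cdot \tanh(z_1/2)\right)$. We denote by $\mu_{\alpha\rightarrow\beta}$, where $\alpha, \beta \in \{e_1, a_0\}$, the message sent from node $\alpha$ to node $\beta$. $\mu^{(out)}_{u}$ and $\mu^{(out)}_{x_0}$ are sent from $a_0$ over the half edges corresponding to symbols $u$ and $x_0$, respectively. $\mu^{(out)}_{v}$ and $\mu^{(out)}_{x_1}$ are sent from $e_1$ over the half edges corresponding to symbols $v$ and $x_1$, respectively. Note that
\[
f_{(=)}(\pm\infty, z_1) = f_{(=)}(z_0, \pm\infty) = \pm\infty \tag{50}
\]
\[
f_{(+)}(\pm\infty, z_1) = \pm z_1, \quad f_{(+)}(z_0, \pm\infty) = \pm z_0. \tag{51}
\]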
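The update rules (44)-(49), together with the limiting cases (50) and (51), can be sketched in Python as below. This is an illustrative sketch, not a reference implementation: the function names, the dictionary returned by `kernel_messages` and the clamping constant used to keep `atanh` finite for saturated finite inputs are assumptions made for this example.

```python
import math


def f_eq(z0, z1):
    """Equality-node rule: f_(=)(z0, z1) = z0 + z1; (50) follows from float infinity."""
    return z0 + z1


def f_plus(z0, z1):
    """Adder-node rule 2*atanh(tanh(z0/2)*tanh(z1/2)) with the limits of (51)."""
    if math.isinf(z0):
        return z1 if z0 > 0 else -z1
    if math.isinf(z1):
        return z0 if z1 > 0 else -z0
    t = math.tanh(z0 / 2.0) * math.tanh(z1 / 2.0)
    # Assumed safeguard: clamp so atanh stays finite when both tanh values saturate.
    t = max(min(t, 1.0 - 1e-15), -1.0 + 1e-15)
    return 2.0 * math.atanh(t)


def kernel_messages(mu_in_u, mu_in_v, mu_in_x0, mu_in_x1):
    """Compute the four output messages of one (u+v, v) kernel."""
    mu_e1_a0 = f_eq(mu_in_x1, mu_in_v)      # (44)
    mu_a0_e1 = f_plus(mu_in_x0, mu_in_u)    # (45)
    return {
        "u": f_plus(mu_in_x0, mu_e1_a0),    # (46)
        "v": f_eq(mu_in_x1, mu_a0_e1),      # (47)
        "x0": f_plus(mu_in_u, mu_e1_a0),    # (48)
        "x1": f_eq(mu_in_v, mu_a0_e1),      # (49)
    }
```

Passing `math.inf` as `mu_in_u` models a bit `u` frozen to zero; the adder node then forwards the other incoming LLR unchanged, as (51) states.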
We now turn to give a recursive description of an iteration of the algorithm. As depicted in Figure 4, the factor graph of the length $N$ bits code has $\log_2 N$ layers. In each layer, there exist $N/2$ copies of the kernel normal factor graph, depicted in Figure 5. As a consequence, for each layer, we have $N/2$ realizations of each type of input messages, output messages and inner messages (each one corresponding to a different set of symbols and interconnect). To denote the $i$th realization of these messages, we use the notation $\mu_{\alpha\rightarrow\beta,i}$, $\mu^{(in)}_{\gamma,i}$ and $\mu^{(out)}_{\gamma,i}$, where $\alpha, \beta \in \{a_0, e_1\}$ and $\gamma \in \{x_0, x_1, u, v\}$. As before, we denote the channel LLRs by the $N$ length vector $\lambda$. Each input message or inner message, unless given (by the channel output or by prior knowledge of the frozen bits), is set to 0 before the first iteration. It is assumed that the inner messages are preserved between the iterations (see a further discussion in the sequel).
Let us describe the BP decoder inputs and outputs for the (u+v, v) code of length N bits. The inputs
of the algorithm are the following.
• An $N$ length vector of input LLRs, $\lambda$, containing the observation from the channel.
• A pointer to a matrix $M^{(u,in)}$ of $N \times \log_2(N)$ dimensions, which is used to hold the $\mu^{(in)}_{v}$ and $\mu^{(in)}_{u}$ messages between iterations. We employ a pointer here because we would like to be able to change the values of the matrix as the algorithm progresses.
• A vector indicator $z \in \{0, 1\}^N$, in which $z_i = 1$ if and only if the $i$th component of the information vector $u$ is frozen.
The algorithm outputs the following structures.
• An $N$ length binary vector $\hat{u}$ containing the information word that the decoder estimated (including its frozen symbols).
• An $N$ length vector $\hat{x}$ containing the LLRs for the estimated codeword symbols. This structure is used to store the $\mu^{(out)}_{x_0}$ and $\mu^{(out)}_{x_1}$ messages.
The BP function signature is defined as follows
\[
[\hat{u}, \hat{x}] = \mathrm{BPDecoder}\left(\lambda, M^{(u,in)}, z\right). \tag{52}
\]
Algorithm 14 outlines the BP iteration for length N > 2 code. Algorithm 15 completes this recursive
description by considering the case of length N = 2 bits code. Note that in the algorithms we use aliases
for several of our inputs in order to improve the procedure readability. We say that s is an alias for a
variable w (and denote it by s :≡ w), if s is an alternative name for the memory space of w, and therefore
any algorithmic operation on w has the same results and side-effects as performing the operation on s.
General schedules of BP may require dedicated memory for storing the $\mu^{(in)}_{u}$, $\mu^{(in)}_{v}$, $\mu^{(in)}_{x_0}$, $\mu^{(in)}_{x_1}$ and $\mu_{a_0\rightarrow e_1}$ types of messages that were previously computed. This memory may be needed for each realization of such messages, specifically, for each layer of the graph and for each (u + v, v) normal subgraph, as in Figure 5. However, for our GCC schedule, excluding $\mu^{(in)}_{v}$, we do not need to save any message beyond the
Algorithm 14 BP Decoder of Length $N = 2^n$ Bits (u + v, v) Polar Code
[◮◮◮] Input: $\lambda$; $M^{(u,in)}$; $z$.
//Initializations:
We use the following aliasing for the inputs of the algorithm.
$\mu^{(in)}_{x_0,r} :\equiv \lambda_{2r}$ and $\mu^{(in)}_{x_1,r} :\equiv \lambda_{2r+1}$, $\forall r \in [N/2]_-$;
$\mu^{(in)}_{u,r} :\equiv M^{(u,in)}_{2r,0}$ and $\mu^{(in)}_{v,r} :\equiv M^{(u,in)}_{2r+1,0}$, $\forall r \in [N/2]_-$;
//STEP 0:
⊲ Compute messages $\left[\mu_{e_1\rightarrow a_0,r}\right]_{r=0}^{N/2-1}$ using (44).
⊲ Compute messages $\left[\mu^{(out)}_{u,r}\right]_{r=0}^{N/2-1}$ using (46).
//STEP 1:
⊲ Perform an iteration on the first outer-code: give the vector $\left[\mu^{(out)}_{u,r}\right]_{r=0}^{N/2-1}$ as an input to the polar code BP iterative decoder of length $N/2$ bits. Also provide the indices of the frozen bits from the first half of the codeword. The decoder outputs an estimation of the first outer-code codeword to $\left[\mu^{(in)}_{u,r}\right]_{r=0}^{N/2-1}$ (manifested as LLRs) and an estimation of its information word to the binary vector $\hat{u}^{(0)}$, i.e.
\[
\left[\hat{u}^{(0)}, \left[\mu^{(in)}_{u,r}\right]_{r=0}^{N/2-1}\right]
= \mathrm{BPDecoder}\left(\left[\mu^{(out)}_{u,r}\right]_{r=0}^{N/2-1},\ M^{(u,in)}_{0:(N/2-1),\,1:(\log_2(N)-1)},\ z_0^{N/2-1}\right). \tag{53}
\]
//STEP 2:
⊲ Compute the messages $\left[\mu_{a_0\rightarrow e_1,r}\right]_{r=0}^{N/2-1}$ using (45).
⊲ Compute the messages $\left[\mu^{(out)}_{v,r}\right]_{r=0}^{N/2-1}$ using (47) (note that these two steps can be combined into one computation).
//STEP 3:
⊲ Perform an iteration on the second outer-code: give the vector $\left[\mu^{(out)}_{v,r}\right]_{r=0}^{N/2-1}$ as an input to the polar code BP iterative decoder of length $N/2$. Also provide the indices of the frozen bits from the second half of the codeword. The decoder outputs an estimation of the second outer-code codeword to $\left[\mu^{(in)}_{v,r}\right]_{r=0}^{N/2-1}$ (manifested as LLRs) and an estimation of its information word to the binary vector $\hat{u}^{(1)}$, i.e.
\[
\left[\hat{u}^{(1)}, \left[\mu^{(in)}_{v,r}\right]_{r=0}^{N/2-1}\right]
= \mathrm{BPDecoder}\left(\left[\mu^{(out)}_{v,r}\right]_{r=0}^{N/2-1},\ M^{(u,in)}_{N/2:(N-1),\,1:(\log_2(N)-1)},\ z_{N/2}^{N-1}\right). \tag{54}
\]
⊲ Compute messages $\left[\mu_{e_1\rightarrow a_0,r}\right]_{r=0}^{N/2-1}$ using (44).
⊲ Compute messages $\left[\mu^{(out)}_{x_0,r}\right]_{r=0}^{N/2-1}$ and $\left[\mu^{(out)}_{x_1,r}\right]_{r=0}^{N/2-1}$ using (48) and (49), respectively.
[◭◭◭] Output: • $\hat{u} = \left[\hat{u}^{(0)}, \hat{u}^{(1)}\right]$;
• $\hat{x}_{2\cdot r} = \mu^{(out)}_{x_0,r}$; $\hat{x}_{2\cdot r+1} = \mu^{(out)}_{x_1,r}$, $\forall r \in [N/2]_-$.
Algorithm 15 BP Decoder for Length $N = 2$ Bits (u + v, v) Polar Code
[◮◮◮] Input: $\lambda$; $M^{(u,in)}$; $z$.
//Initializations:
⊲ We use the following aliasing to the inputs of the algorithm.
$\mu^{(in)}_{x_0} :\equiv \lambda_0$ and $\mu^{(in)}_{x_1} :\equiv \lambda_1$;
⊲ Initialize the $u$, $v$ input LLR messages (we assume that frozen variables are fixed to the zero value)
\[
\mu^{(in)}_{w} =
\begin{cases}
0, & w \text{ is not frozen}; \\
\infty, & w \text{ is frozen},
\end{cases}
\quad \forall w \in \{u, v\}. \tag{55}
\]
//STEP 0:
⊲ Compute $\mu_{e_1\rightarrow a_0}$ according to (44).
//STEP 1:
⊲ If $u$ is not frozen, compute $\mu^{(out)}_{u}$ according to (46), and make a hard decision on this bit, based on its sign (denote it by $\hat{u}$). Otherwise, $\hat{u} = 0$.
//STEP 2:
⊲ Compute $\mu_{a_0\rightarrow e_1}$ according to (45).
//STEP 3:
⊲ If $v$ is not frozen, compute $\mu^{(out)}_{v}$ according to (47), and make a hard decision on it, based on its sign (denote it by $\hat{v}$). Otherwise, $\hat{v} = 0$.
⊲ Compute $\mu^{(out)}_{x_0}$ and $\mu^{(out)}_{x_1}$ according to (48) and (49), respectively.
[◭◭◭] Output: • $\hat{u} = [\hat{u}, \hat{v}]$;
• $\hat{x} = \left[\mu^{(out)}_{x_0}, \mu^{(out)}_{x_1}\right]$.
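One iteration of Algorithms 14 and 15 can be sketched recursively as follows. This is a simplified illustration under several assumptions that are not part of the original text: the matrix $M^{(u,in)}$ is a list of rows mutated in place, sub-matrices are addressed by the hypothetical offsets `row0` and `col0` instead of pointers, a zero LLR is decided as bit 0, and the names `bp_iteration` and `hard` are invented for the example.

```python
import math


def f_eq(z0, z1):
    return z0 + z1


def f_plus(z0, z1):
    if math.isinf(z0):
        return z1 if z0 > 0 else -z1
    if math.isinf(z1):
        return z0 if z1 > 0 else -z0
    t = math.tanh(z0 / 2.0) * math.tanh(z1 / 2.0)
    t = max(min(t, 1.0 - 1e-15), -1.0 + 1e-15)  # assumed numerical safeguard
    return 2.0 * math.atanh(t)


def hard(llr):
    return 0 if llr >= 0 else 1


def bp_iteration(lam, M, z, row0=0, col0=0):
    """One BP iteration; returns (u_hat, x_hat) and updates M in place."""
    half = len(lam) // 2
    x0, x1 = lam[0::2], lam[1::2]
    if len(lam) == 2:  # Algorithm 15
        in_u = math.inf if z[0] else 0.0                      # (55)
        in_v = math.inf if z[1] else 0.0
        e1a0 = f_eq(x1[0], in_v)                              # (44)
        u_hat = 0 if z[0] else hard(f_plus(x0[0], e1a0))      # (46)
        a0e1 = f_plus(x0[0], in_u)                            # (45)
        v_hat = 0 if z[1] else hard(f_eq(x1[0], a0e1))        # (47)
        return [u_hat, v_hat], [f_plus(in_u, e1a0), f_eq(in_v, a0e1)]
    # Algorithm 14: only the mu^(in)_v messages are read from memory.
    in_v = [M[row0 + 2 * r + 1][col0] for r in range(half)]
    e1a0 = [f_eq(x1[r], in_v[r]) for r in range(half)]        # STEP 0
    out_u = [f_plus(x0[r], e1a0[r]) for r in range(half)]
    u0, in_u = bp_iteration(out_u, M, z[:half], row0, col0 + 1)          # STEP 1
    a0e1 = [f_plus(x0[r], in_u[r]) for r in range(half)]      # STEP 2
    out_v = [f_eq(x1[r], a0e1[r]) for r in range(half)]
    u1, in_v = bp_iteration(out_v, M, z[half:], row0 + half, col0 + 1)   # STEP 3
    for r in range(half):
        M[row0 + 2 * r][col0] = in_u[r]
        M[row0 + 2 * r + 1][col0] = in_v[r]
    e1a0 = [f_eq(x1[r], in_v[r]) for r in range(half)]
    x_hat = []
    for r in range(half):
        x_hat += [f_plus(in_u[r], e1a0[r]), f_eq(in_v[r], a0e1[r])]     # (48), (49)
    return u0 + u1, x_hat
```

A caller would allocate `M` as an all-zero list of `N` rows with `log2(N)` columns before the first iteration and reuse it across iterations.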
Figure 10: Normal factor graph representations of polar code kernels: (a) a binary linear kernel with lower triangular generating matrix; (b) the RS3 kernel defined in (56).
iteration boundary. This is because, in each iteration, all the messages except $\mu^{(in)}_{v}$ are re-computed before their first usage (in the iteration). The implication of this observation is that the required memory consumption can be reduced (see Subsection 5.1.6). Furthermore, the memory used for the other messages is temporary and needed only for the same iteration. It can be seen that the memory for these temporary messages is linear in the block length. The requirement to keep all the $\mu^{(in)}_{v}$ type of messages beyond the iteration boundary of the algorithm results in memory consumption of $\Theta(N \cdot \log(N))$.
In each iteration, we send one instance for each of the possible messages and for each (u+ v, v) block
realization in the code, except for the µe1→a0 type of message for which we send two messages (for all the
layers, besides the last one). Consequently the iteration time complexity is Θ (N · log(N)).
A complete BP implementation may require several iterations. The number of iterations may be fixed or set adaptively, which means that the algorithm continues until some consistency constraints are satisfied. An example of such a constraint is that the signs of the LLR estimations for all the frozen bits agree with their known values (i.e. if all the frozen bits are set to zero, then $\mu^{(out)}_{w} > 0$ for all the frozen bits, $w$). In this case, it is possible to stop an iteration in the middle by keeping a counter, in a similar way to the method that is usually used in BP decoding of LDPC codes using the check-node based serial schedules (see e.g. [33]). We note, however, that in the LDPC case, the consistency is manifested by the fact that all the parity check equations are satisfied.
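The adaptive stopping rule based on the frozen bits can be sketched as follows. This is a minimal illustration: `iterate` is a hypothetical callable that performs one BP iteration and returns the output LLRs $\mu^{(out)}_{w}$ of all information bits, and the check is applied only at iteration boundaries (the mid-iteration counter mentioned above is not modeled).

```python
def frozen_bits_consistent(mu_out, z):
    """True if every frozen bit (z[i] == 1, frozen to zero) has a positive output LLR."""
    return all(llr > 0 for llr, frozen in zip(mu_out, z) if frozen)


def run_bp(iterate, z, max_iterations):
    """Run BP iterations until the frozen-bit constraint holds or the budget ends.

    Returns the number of iterations that were executed.
    """
    for count in range(1, max_iterations + 1):
        mu_out = iterate()
        if frozen_bits_consistent(mu_out, z):
            return count
    return max_iterations
```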
Let us now consider BP for polar codes with general kernels. For this description we require the kernels to be linear and to be represented by a lower triangular generating matrix. The input to the kernel and the output of the kernel are $\ell$ length vectors $u$ and $x \in F^{\ell}$, respectively, satisfying $x = g(u) = u \cdot G$, where $G$ is an $\ell \times \ell$ lower triangular generating matrix. Figure 10a depicts a normal factor graph for such an $\ell$ dimensions binary kernel. We have that an edge $e_i \rightarrow a_j$ exists in the graph if and only if $G_{i,j} \neq 0$. Being a lower triangular matrix means that in the factor graph there are no edges $e_i \rightarrow a_j$ such that $j > i$. In case the kernel is non-binary, each edge $e_i \rightarrow a_j$ also has a label equal to its $G_{i,j}$ value. For example, Figure 10b depicts the normal factor graph corresponding to the RS3 kernel, with generating matrix
\[
G_{RS3} =
\begin{bmatrix}
1 & 0 & 0 \\
1 & 1 & 0 \\
\alpha^2 & \alpha & 1
\end{bmatrix}. \tag{56}
\]
Similarly to the discussion on the (u + v, v) code, we define input messages to the factor graph denoted by $\mu^{(in)(t)}_{u_i}$ and $\mu^{(in)(t)}_{x_j}$ and their corresponding output messages $\mu^{(out)(t)}_{u_i}$ and $\mu^{(out)(t)}_{x_j}$, where $i, j \in [\ell]_-$ and $t \in F\setminus\{0\}$. Moreover, we have messages $\mu^{(t)}_{e_i\rightarrow a_j}$ and $\mu^{(t)}_{a_j\rightarrow e_i}$ for every edge $e_i \rightarrow a_j$. All of these messages are LLRs such that for a message $\mu^{(t)}$ we have $\mu^{(t)} = \ln\left(\frac{\Pr(y|\omega=0)}{\Pr(y|\omega=t)}\right)$, where $y$ is a vector of observations, and $\omega$ is the variable associated with the edge on which the message $\mu^{(t)}$ is transmitted. In case the code is binary the letter indication $t$ may be omitted. All the messages are calculated using the (typically false) assumption that the factor graph is cycle free, and consequently for each node the messages sent to it are statistically independent. Let $a_j$ be an adder node corresponding to column $j$ of the generating matrix, $G_{\downarrow j}$, such that $\sum_{i=j}^{\ell-1} G_{i,j} \cdot u_i = x_j$. We have for $i, j \in [\ell]_-$ and $i \geq j$
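\[
\mu^{(t)}_{e_i\rightarrow a_j} = f_{(=)}\left(t,\ \left[\left[\mu^{(\tau)}_{a_r\rightarrow e_i}\right]_{\tau\in F\setminus\{0\}}\right]_{r=0,\,r\neq i}^{j},\ \left[\mu^{(in)(\tau)}_{u_i}\right]_{\tau\in F\setminus\{0\}}\right); \tag{57}
\]
\[
\mu^{(t)}_{a_j\rightarrow e_i} = f_{(+)}\left(t,\ \left[\left[\mu^{(\tau)}_{e_r\rightarrow a_j}\right]_{\tau\in F\setminus\{0\}}\right]_{r=j,\,r\neq i}^{\ell-1},\ \left[\mu^{(in)(\tau)}_{x_j}\right]_{\tau\in F\setminus\{0\}},\ G^{-1}_{i,j}\cdot\left[\left[G_{r,j}\right]_{r=j,\,r\neq i}^{\ell-1},\ 1\right]\right); \tag{58}
\]
\[
\mu^{(out)(t)}_{u_i} = f_{(=)}\left(t,\ \left[\left[\mu^{(\tau)}_{a_r\rightarrow e_i}\right]_{\tau\in F\setminus\{0\}}\right]_{r=0}^{j}\right); \tag{59}
\]
\[
\mu^{(out)(t)}_{x_j} = f_{(+)}\left(t,\ \left[\left[\mu^{(\tau)}_{e_r\rightarrow a_j}\right]_{\tau\in F\setminus\{0\}}\right]_{r=j}^{\ell-1},\ \left[G_{r,j}\right]_{r=j}^{\ell-1}\right). \tag{60}
\]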
The functions $f_{(=)}(\cdot)$ and $f_{(+)}(\cdot)$ are generalizations of the functions that were presented before for the (u + v, v) case. Note that for these functions, the number of input arguments that follow $t$ (the alphabet symbol) may vary. This number is equal to the degree of the sending node minus one. Consequently, we denoted them in (57)-(60) as a vector of vectors (i.e. using the $\left[\left[\mu^{(\tau)}_{r}\right]_{\tau\in F\setminus\{0\}}\right]_{r\in B}$ notation); however, it should be understood that each element of this vector of vectors is a different argument to the functions.
Given an equality node $e_i$ with degree $d$, the function $f_{(=)}(\cdot)$ is defined as follows
\[
f_{(=)}\left(t,\ \left[\mu^{(\tau)}_{0}\right]_{\tau\in F\setminus\{0\}},\ \left[\mu^{(\tau)}_{1}\right]_{\tau\in F\setminus\{0\}},\ \ldots,\ \left[\mu^{(\tau)}_{d-2}\right]_{\tau\in F\setminus\{0\}}\right) \triangleq \sum_{r=0}^{d-2} \mu^{(t)}_{r}, \tag{61}
\]
where $t \in F\setminus\{0\}$ and $\left[\mu^{(\tau)}_{r}\right]_{\tau\in F\setminus\{0\}}$ are LLR messages received at the node from $d-1$ edges adjacent to it. Denote the variable associated with these edges by $\omega_r$ for $r \in [d-1]_-$ and let $\omega$ denote the variable associated with the edge whose messages were not given as input (we refer to this edge as the "missing edge"). The output of this function is the LLR message sent from $e_i$ on the missing edge. Being a repetition constraint ($\omega = \omega_r$ for all $r \in [d-1]_-$), the LLR calculated in (61) appears as a summation of the LLRs corresponding to the same alphabet letter $t$. See Figure 11a for an illustration of this case.
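Given an adder node $a_j$ with degree $d$, the function $f_{(+)}(\cdot)$ is defined as follows
\[
f_{(+)}\left(t,\ \left[\mu^{(\tau)}_{0}\right]_{\tau\in F\setminus\{0\}},\ \left[\mu^{(\tau)}_{1}\right]_{\tau\in F\setminus\{0\}},\ \ldots,\ \left[\mu^{(\tau)}_{d-2}\right]_{\tau\in F\setminus\{0\}},\ \gamma\right)
\triangleq \ln\left(\frac{\sum_{\omega_0^{d-2}\in A^{(\gamma,0)}} \exp\left(-\sum_{r=0}^{d-2}\mu^{(\omega_r)}_{r}\right)}{\sum_{\omega_0^{d-2}\in A^{(\gamma,t)}} \exp\left(-\sum_{r=0}^{d-2}\mu^{(\omega_r)}_{r}\right)}\right) \tag{62}
\]
where $t \in F\setminus\{0\}$, $\mu^{(0)}_{r} \triangleq 0$ and $\left[\mu^{(\tau)}_{r}\right]_{\tau\in F\setminus\{0\}}$ are LLR messages received at the node from $d-1$ edges adjacent to it, $r \in [d-1]_-$. Let us denote the variables corresponding to these edges by $\{\omega_r\}_{r=0}^{d-2}$ and the variable corresponding to the missing edge by $\omega$. It is assumed that the generating matrix equation corresponding to node $a_j$ is
\[
\omega = \sum_{r=0}^{d-2} \gamma_r \cdot \omega_r. \tag{63}
\]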
Figure 11: Messages of BP algorithm: (a) equality node messages; (b) adder node messages.
Consequently, the set $A^{(\gamma,t)}$ is defined as the set of all assignments to $\omega_0^{d-2}$ such that $\omega = t$ in (63), i.e. $A^{(\gamma,t)} = \left\{\omega_0^{d-2} \,\middle|\, \sum_{r=0}^{d-2} \gamma_r \cdot \omega_r = t\right\}$. See Figure 11b for an illustration of this case. Note that naive calculation of (62) for all $t \in F\setminus\{0\}$ has time complexity of $O(|F|^d)$ while computation that uses a trellis has complexity of $O(d \cdot |F|^2)$.
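The naive evaluation of (62) can be sketched by enumerating all assignments. To keep the example self-contained, it assumes a prime field GF(p) with integer arithmetic modulo `p`; this is a simplification made only for illustration (kernels such as RS3 or RS4 over GF(4) would need extension-field arithmetic). The name `f_plus_naive` and the layout `msgs[r][w]` are likewise hypothetical.

```python
import itertools
import math


def f_plus_naive(t, msgs, gamma, p):
    """Adder-node message of (62) by brute force over GF(p), p prime (assumed).

    msgs[r][w] : LLR ln(Pr(y | w_r = 0) / Pr(y | w_r = w)), with msgs[r][0] == 0
    gamma[r]   : coefficient of w_r in (63)
    """
    def mass(target):
        total = 0.0
        # Enumerate every assignment to (w_0, ..., w_{d-2}) and keep those in A^(gamma, target).
        for omega in itertools.product(range(p), repeat=len(msgs)):
            if sum(g * w for g, w in zip(gamma, omega)) % p == target:
                total += math.exp(-sum(msgs[r][w] for r, w in enumerate(omega)))
        return total

    return math.log(mass(0) / mass(t))
```

For `p = 2` and two incoming messages the result coincides with the binary rule $2\tanh^{-1}(\tanh(z_0/2)\cdot\tanh(z_1/2))$, which provides a quick sanity check of the sign conventions.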
We are now ready to describe the BP algorithm for general lower triangular kernels. The inputs of the algorithm are as follows.
• $N$ length vectors of input LLRs, $\left\{\lambda^{(t)}\right\}_{t\in F\setminus\{0\}}$, containing the observations of the channel ($t$ indicates the code alphabet letters).
• Pointers to matrices $\left\{\left\{M^{(u_i,in)(t)}\right\}_{t\in F\setminus\{0\}}\right\}_{i\in[\ell]_-}$ of $N/\ell \times \log_\ell(N)$ dimensions, which are used to hold the $\mu^{(in)(t)}_{u_i}$ messages between iterations. Pointers are employed here because we would like to be able to change the values of the matrix as the algorithm progresses.
• Pointers to matrices $\left\{\left\{M^{(a_j\rightarrow e_i)(t)}\right\}_{t\in F\setminus\{0\}}\right\}_{0\leq j\leq i\leq \ell-1}$ of $N/\ell \times \log_\ell(N)$ dimensions, which are used to hold the $\mu^{(t)}_{a_j\rightarrow e_i}$ messages between iterations.
• Vector indicator $z \in \{0, 1\}^N$, in which $z_i = 1$ if and only if the $i$th component of the information vector $u$ is frozen.
The algorithm outputs the following structures.
• An $N$ length vector $\hat{u} \in F^N$, containing the information vector that the decoder estimated (including its frozen symbols).
• $N$ length vectors $\left\{\hat{x}^{(t)}\right\}_{t\in F\setminus\{0\}}$, which are the LLRs for the estimated codeword. This structure is used for delivering the $\mu^{(out)(t)}_{x_j}$ messages.
The BP function signature is defined as follows (note that the $i, j$ in the third argument are limited such that $0 \leq j \leq i \leq \ell-1$)
\[
\left[\hat{u},\ \left\{\hat{x}^{(t)}\right\}_{t\in F\setminus\{0\}}\right]
= \mathrm{BPDecoder}\left(\left\{\lambda^{(t)}\right\}_{t\in F\setminus\{0\}},\ \left\{M^{(u,in)(t)}\right\}_{t\in F\setminus\{0\}},\ \left\{\left\{M^{(a_j\rightarrow e_i)(t)}\right\}_{t\in F\setminus\{0\}}\right\}_{i,j},\ z\right). \tag{64}
\]
We start with the $N = \ell$ symbols case. Algorithm 16 gives the description for this case. Algorithm 17 considers the $N > \ell$ symbols case.
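Algorithm 16 BP Decoder for Length $N = \ell$ $F$-Symbols Polar Code
[◮◮◮] Input: $\left\{\lambda^{(t)}\right\}_{t\in F\setminus\{0\}}$; $\left\{M^{(u,in)(t)}\right\}_{t\in F\setminus\{0\}}$; $\left\{\left\{M^{(a_j\rightarrow e_i)(t)}\right\}_{t\in F\setminus\{0\}}\right\}_{i,j}$; $z$.
//Initializations:
⊲ Use the following aliases to the inputs of the algorithm.
$\left[\mu^{(in)(t)}_{x_j}\right]_{t\in F\setminus\{0\}} :\equiv \left[\lambda^{(t)}_{j}\right]_{t\in F\setminus\{0\}}$, $\forall j \in [\ell]_-$;
$\mu^{(t)}_{a_j\rightarrow e_i} :\equiv M^{(a_j\rightarrow e_i)(t)}_{0,0}$, $\forall t \in F\setminus\{0\}$, $\forall\, 0 \leq j \leq i \leq \ell-1$.
⊲ Initialize the vector $\left[\mu^{(in)(t)}_{u_i}\right]_{t\in F\setminus\{0\}}$:
\[
\mu^{(in)(t)}_{u_i} =
\begin{cases}
0, & z_i = 0; \\
\infty, & z_i \neq 0,
\end{cases}
\quad \forall i \in [\ell]_-,\ \forall t \in F\setminus\{0\}.
\]
//Iteration:
⊲ For $j = \ell-1$ to $0$ Do
• Compute $\left[\mu^{(t)}_{e_i\rightarrow a_j}\right]_{t\in F\setminus\{0\}}$, $\forall i$ s.t. $j < i \leq \ell-1$ using (57);
• Compute $\left[\mu^{(t)}_{a_j\rightarrow e_j}\right]_{t\in F\setminus\{0\}}$ using (58).
⊲ For $i = 0$ to $\ell-1$ Do
• Compute $\left[\mu^{(t)}_{a_j\rightarrow e_i}\right]_{t\in F\setminus\{0\}}$, $\forall j$ s.t. $0 \leq j < i$ using (58);
• If $u_i$ is not frozen, compute $\left[\mu^{(out)(t)}_{u_i}\right]_{t\in F\setminus\{0\}}$ according to (59), and make a hard decision on this symbol, based on the LLR vector (denote the hard decision by $\hat{u}_i$). If $u_i$ is frozen, set $\hat{u}_i = 0$;
• Compute $\left[\mu^{(t)}_{e_i\rightarrow a_j}\right]_{t\in F\setminus\{0\}}$, $\forall j$ s.t. $0 \leq j < i$ using (57).
⊲ Compute $\left[\mu^{(out)(t)}_{x_j}\right]_{t\in F\setminus\{0\}}$, $\forall j \in [\ell]_-$ using (60).
[◭◭◭] Output: • $\hat{u} = [\hat{u}_0, \hat{u}_1, \ldots, \hat{u}_{\ell-1}]$;
• $\hat{x} = \left[\left[\mu^{(out)(t)}_{x_0}\right]_{t\in F\setminus\{0\}},\ \left[\mu^{(out)(t)}_{x_1}\right]_{t\in F\setminus\{0\}},\ \ldots,\ \left[\mu^{(out)(t)}_{x_{\ell-1}}\right]_{t\in F\setminus\{0\}}\right]$.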
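Algorithm 17 BP Decoder of Length $N = \ell^n$ $F$-Symbols Polar Code
[◮◮◮] Input: $\left\{\lambda^{(t)}\right\}_{t\in F\setminus\{0\}}$; $\left\{M^{(u,in)(t)}\right\}_{t\in F\setminus\{0\}}$; $\left\{\left\{M^{(a_j\rightarrow e_i)(t)}\right\}_{t\in F\setminus\{0\}}\right\}_{i,j}$; $z$.
//Initializations:
⊲ Use the following aliases to the inputs of the algorithm.
$\left[\mu^{(in)(t)}_{x_j,r}\right]_{t\in F\setminus\{0\}} :\equiv \left[\lambda^{(t)}_{r\cdot\ell+j}\right]_{t\in F\setminus\{0\}}$, $\forall j \in [\ell]_-$ and $\forall r \in [N/\ell]_-$;
$\left[\mu^{(in)(t)}_{u_i,r}\right]_{t\in F\setminus\{0\}} :\equiv \left[M^{(u_i)(t)}_{r,0}\right]_{t\in F\setminus\{0\}}$, $\forall i \in [\ell]_-$ and $\forall r \in [N/\ell]_-$;
$\mu^{(t)}_{a_j\rightarrow e_i,r} :\equiv M^{(a_j\rightarrow e_i)(t)}_{r,0}$, $\forall t \in F\setminus\{0\}$, $\forall\, 0 \leq j \leq i \leq \ell-1$ and $\forall r \in [N/\ell]_-$.
//Iteration:
⊲ For $j = \ell-1$ to $0$ Do
• Compute $\left[\mu^{(t)}_{e_i\rightarrow a_j,r}\right]_{t\in F\setminus\{0\}}$, $\forall i$ s.t. $j < i \leq \ell-1$ and $\forall r \in [N/\ell]_-$ using (57);
• Compute $\left[\mu^{(t)}_{a_j\rightarrow e_j,r}\right]_{t\in F\setminus\{0\}}$, $\forall r \in [N/\ell]_-$ using (58).
⊲ For $i = 0$ to $\ell-1$ Do
• Run steps $2 \cdot i$ and $2 \cdot i+1$ of Algorithm 18.
⊲ Compute $\left[\mu^{(out)(t)}_{x_j}\right]_{t\in F\setminus\{0\}}$, $\forall j \in [\ell]_-$ using (60).
[◭◭◭] Output: • $\hat{u} = \left[\hat{u}^{(0)}, \hat{u}^{(1)}, \ldots, \hat{u}^{(\ell-1)}\right]$;
• $\hat{x}^{(t)}_{r\cdot\ell+j} = \mu^{(out)(t)}_{x_j,r}$, $\forall j \in [\ell]_-$, $\forall t \in F\setminus\{0\}$ and $\forall r \in [N/\ell]_-$.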
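Algorithm 18 BP Iterations Steps Dedicated for Decoding of Outer-Code $C_i$, $i \in [\ell]_-$
//STEP $2 \cdot i$:
⊲ Compute $\left[\mu^{(t)}_{a_j\rightarrow e_i,r}\right]_{t\in F\setminus\{0\}}$, $\forall j$ s.t. $0 \leq j < i$ and $\forall r \in [N/\ell]_-$ using (58).
⊲ Compute $\left[\mu^{(out)(t)}_{u_i,r}\right]_{t\in F\setminus\{0\}}$, $\forall r \in [N/\ell]_-$ according to (59).
//STEP $2 \cdot i+1$:
⊲ Give the vector $\left\{\left[\mu^{(out)(t)}_{u_i,r}\right]_{r=0}^{N/\ell-1}\right\}_{t\in F\setminus\{0\}}$ as an input to the polar code decoder of length $N/\ell$ symbols. Also provide to this decoder the indices of the frozen symbols corresponding to $C_i$ and pointers to the matrices containing the messages of this outer-code. Assume that the decoder outputs $\left[\left[\mu^{(in)(t)}_{u_i,r}\right]_{t\in F\setminus\{0\}}\right]_{r=0}^{N/\ell-1}$ and the estimation of the information word of $C_i$, i.e.
\[
\left[\hat{u}^{(i)},\ \left[\left[\mu^{(in)(t)}_{u_i,r}\right]_{t\in F\setminus\{0\}}\right]_{r=0}^{N/\ell-1}\right]
= \mathrm{BPDecoder}\left(
\left\{\left[\mu^{(out)(t)}_{u_i,r}\right]_{r=0}^{N/\ell-1}\right\}_{t\in F\setminus\{0\}},\
\left\{M^{(u,in)(t)}_{i\cdot N/\ell:((i+1)\cdot N/\ell-1),\,1:(\log_\ell N-1)}\right\}_{t\in F\setminus\{0\}},\
\left\{\left\{M^{(a_{j'}\rightarrow e_{i'})(t)}_{i\cdot N/\ell:((i+1)\cdot N/\ell-1),\,1:(\log_\ell N-1)}\right\}_{t\in F\setminus\{0\}}\right\}_{i',j'},\
z_{i\cdot N/\ell}^{(i+1)\cdot N/\ell-1}
\right). \tag{65}
\]
// Note that $i', j'$ in the third argument of (65) are limited such that $0 \leq j' \leq i' \leq \ell-1$.
⊲ Compute $\left[\mu^{(t)}_{e_i\rightarrow a_j,r}\right]_{t\in F\setminus\{0\}}$, $\forall j$ s.t. $0 \leq j < i$ and $\forall r \in [N/\ell]_-$ using (57).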
Thus far we discussed homogenous kernels. BP on mixed-kernels polar codes can be defined in a similar manner. In mixed-kernels structures we have at least two types of constituent kernels, each one with a different alphabet. In order to connect these kernels, we combine several input symbols of the first kernel and consider them as a single entity for decoding purposes. We say that these symbols are "glued" together, thereby creating a symbol of the larger-alphabet kernel. The output symbols of the larger alphabet size kernel are given as input to the glued input entry of the inner mapping defined by the first kernel. In order to support this gluing operation we introduce an additional node to the normal factor graph, and label it by the '&' symbol. This node serves as a "bridge" between the two alphabets.
Example 4 (BP on Mixed-Kernels) Let us consider the mixed-kernels code discussed in Example 3. In this example we use the $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes 2}$ binary matrix as our first kernel and glue its input components $u_1$ and $u_2 \in GF(2)$ into one entity called $u_{(1,2)} \in GF(4)$. The second kernel is the RS4 kernel described by the generating matrix (??). All the BP messages sent over the edges of this kernel and the RS4 kernel were already discussed above, except the ones sent and received by the $\&_{(1,2)}$ node. Note that the correspondence between the binary representation of $u_{(1,2)}$ and its representation over $GF(4)$ is as follows: $[u_2, u_1] = [0, 0] \equiv 0$; $[0, 1] \equiv 1$; $[1, 0] \equiv \alpha$ and $[1, 1] \equiv \alpha^2$.
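\[
\mu_{\&_{(1,2)}\rightarrow e_1} = \ln\left(\frac{\exp\left\{-\mu_{e_2\rightarrow\&_{(1,2)}} - \mu^{(in)(\alpha)}_{u_{(1,2)}}\right\} + 1}{\exp\left\{-\mu_{e_2\rightarrow\&_{(1,2)}} - \mu^{(in)(\alpha^2)}_{u_{(1,2)}}\right\} + \exp\left\{-\mu^{(in)(1)}_{u_{(1,2)}}\right\}}\right) \tag{66}
\]
\[
\mu_{\&_{(1,2)}\rightarrow e_2} = \ln\left(\frac{\exp\left\{-\mu_{e_1\rightarrow\&_{(1,2)}} - \mu^{(in)(1)}_{u_{(1,2)}}\right\} + 1}{\exp\left\{-\mu_{e_1\rightarrow\&_{(1,2)}} - \mu^{(in)(\alpha^2)}_{u_{(1,2)}}\right\} + \exp\left\{-\mu^{(in)(\alpha)}_{u_{(1,2)}}\right\}}\right) \tag{67}
\]
\[
\mu^{(out)(\alpha)}_{u_{(1,2)}} = \mu_{e_2\rightarrow\&_{(1,2)}} \tag{68}
\]
\[
\mu^{(out)(1)}_{u_{(1,2)}} = \mu_{e_1\rightarrow\&_{(1,2)}} \tag{69}
\]
\[
\mu^{(out)(\alpha^2)}_{u_{(1,2)}} = \mu_{e_1\rightarrow\&_{(1,2)}} + \mu_{e_2\rightarrow\&_{(1,2)}} \tag{70}
\]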
We use the following aliases between the messages mentioned in (66)-(70) and the messages of the standard homogenous kernel defined in (57)-(60): $\mu^{(in)}_{u_i} :\equiv \mu_{\&_{(1,2)}\rightarrow e_i}$; $\mu^{(out)}_{u_i} :\equiv \mu_{e_i\rightarrow\&_{(1,2)}}$; for $i \in \{1, 2\}$. The BP schedule suggested in Algorithm 17 is preserved, i.e. each iteration starts in an initialization step and then moves to BP decoding of its outer-codes. Messages (68)-(70) are computed before calling the BP decoder of the RS4 outer-code, in order to convert binary LLRs into quaternary ones. Moreover, messages (66) and (67) are employed after the BP iteration on the RS4 outer-code has finished, in order to convert the quaternary LLRs into binary ones.
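The two conversions performed by the & node can be sketched as follows. This is an illustrative sketch only: the function names and the dictionary keys `"1"`, `"a"` and `"a2"` (standing for the GF(4) letters $1$, $\alpha$ and $\alpha^2$) are assumptions of this example, and all inputs are assumed to be finite LLRs.

```python
import math


def binary_to_gf4(mu_e1, mu_e2):
    """Equations (68)-(70): binary LLRs of u1, u2 -> LLRs of the glued GF(4) symbol."""
    return {"1": mu_e1, "a": mu_e2, "a2": mu_e1 + mu_e2}


def gf4_to_binary(mu_e1, mu_e2, mu_in):
    """Equations (66) and (67): quaternary input LLRs -> messages to e1 and e2."""
    to_e1 = math.log(
        (math.exp(-mu_e2 - mu_in["a"]) + 1.0)
        / (math.exp(-mu_e2 - mu_in["a2"]) + math.exp(-mu_in["1"]))
    )
    to_e2 = math.log(
        (math.exp(-mu_e1 - mu_in["1"]) + 1.0)
        / (math.exp(-mu_e1 - mu_in["a2"]) + math.exp(-mu_in["a"]))
    )
    return to_e1, to_e2
```

As a consistency check, if the quaternary input has the product form produced by `binary_to_gf4(a, b)`, then `gf4_to_binary` returns `a` and `b` regardless of the messages arriving from `e1` and `e2`.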
In the next section we describe architectures implementing the decoding algorithms we covered so far.
5 Recursive Descriptions of Polar Code Decoders Hardware Architectures
In this section we study schematic architectures that are induced from the recursive decoding algorithms presented in Section 4. Indeed most of the algorithmic details were given in that section, therefore the purpose of our discussion here is to consider aspects of hardware algorithms, such as possible parallelism, scheduling and memory resources management. Note, however, that throughout the discussion, our presentation is relatively abstract, emphasizing the important concepts and features of the recursive designs without delving into all the specifics. Consequently, the figures representing the block diagrams should not be considered as fully detailed specifications of the implementation, but rather as illustrations that aim to guide the reader in the task of designing the decoder.
Figure 12: Normal factor graph representation for the first kernel of Example 4. This kernel is constructed by gluing inputs $u_1$, $u_2$ of the mapping defined by the generating matrix $G$.
Throughout this section we use the same notations for signals arrays and registers arrays. Let u(0 : N − 1) be an N length signals array. We denote its $i$th component by u(i). If v is a two dimensional array (i.e. a matrix) of M rows and N columns, we denote it by v(0 : M − 1, 0 : N − 1). Naturally, the $i$th row of this array is denoted by v(i, 0 : N − 1), and it is a one dimensional array of N elements, of which the $j$th element is denoted by v(i, j).
5.1 Arikan’s Construction Decoders
This subsection covers architectures for Arikan’s (u+v, v) construction. Generalizations of this discussion
for other polar code types are presented in Subsection 5.2. We begin with the simple SC pipeline decoder
(Subsection 5.1.2), and then proceed to the more efficient SC line decoder (Subsection 5.1.3). Both of
these designs were previously presented by Leroux et al. [14, 15] in a non-recursive fashion. We conclude
by introducing a BP line decoder (Subsection 5.1.6).
5.1.1 The Processing Element
The basic computation element of the decoding circuits, described in Subsections 5.1.2 and 5.1.3, is the
processing element (PE). Figure 13 depicts the PE block. Note that throughout Subsection 5.1 we use
thick arrows to designate signals corresponding to real numbers (to be represented by some quantization
method) and thin arrows to designate binary signals. The PE block has three inputs:
• $\lambda(0 : 1)$ - an array of two input LLRs.
• $\hat{u}^{(in)}$ - an estimation of the "u" bit from the coded pair (u + v, v).
• $c_u$ - a binary control signal determining the type of LLR that the circuit gives as output in $\lambda^{(out)}$; $c_u = 0$ means that we calculate the LLR of $u$ and $c_u = 1$ means that we calculate the LLR of $v$ given the estimation of $u$ (the input signal $\hat{u}^{(in)}$).
The circuit outputs the LLR of $u$ or $v$ depending on the control signal $c_u$
\[
\lambda^{(out)} =
\begin{cases}
2\tanh^{-1}\left(\tanh(\lambda(0)/2)\tanh(\lambda(1)/2)\right), & c_u = 0; \\
(-1)^{\hat{u}^{(in)}} \cdot \lambda(0) + \lambda(1), & c_u = 1.
\end{cases} \tag{71}
\]
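A behavioral model of the PE defined by (71) can be written as below. It is a floating-point sketch for illustration; a hardware PE would operate on quantized LLRs, and the function name as well as the clamping constant that keeps `atanh` finite are assumptions of this example.

```python
import math


def processing_element(lam0, lam1, u_hat_in, c_u):
    """Behavioral model of the PE output in (71).

    c_u == 0: LLR of u; c_u == 1: LLR of v given the estimate u_hat_in of u.
    """
    if c_u == 0:
        t = math.tanh(lam0 / 2.0) * math.tanh(lam1 / 2.0)
        # Assumed safeguard against atanh(+-1) for saturated inputs.
        t = max(min(t, 1.0 - 1e-15), -1.0 + 1e-15)
        return 2.0 * math.atanh(t)
    sign = -1.0 if u_hat_in else 1.0
    return sign * lam0 + lam1
```

With `c_u = 1` the PE adds the two LLRs when the estimate of `u` is 0 and subtracts the first from the second when it is 1.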
Figure 13: (u+ v, v) polar code PE block
Figure 14: Blocks of the (u + v, v) polar code decoders of length N bits: (a) pipeline decoder; (b) line decoder.
5.1.2 The SC Pipeline Decoder
Figure 15 contains a block description of the SC pipeline decoder. The decoder’s signals λ(0 : N − 1),
z(0 : N − 1), uˆ(0 : N − 1) and xˆ(0 : N − 1) correspond to the inputs and outputs of the SCDecoder
function (17) λ, z, uˆ and xˆ, respectively. For code length N = 2 bits, the SC decoder includes a single
PE and a slicer. It operates according to Algorithm 6.
A block diagram of the implementation of this decoder for N > 2 is depicted in Figure 15. Scanning the diagram from right to left we can observe the following ingredients. The λ(0 : N − 1) LLR input to the circuit is given as input to an array of N/2 PEs, $\{PE_j\}_{j=0}^{N/2-1}$, all of which are controlled by the same control signal, $c^{(internal)}_{u}$. The output of these PEs is denoted by the array of signals Λ(0 : N−1) and stored in an array of N/2 registers R(0 : N/2 − 1) (depicted as rectangle blocks with the register names, R(i), written in them). These registers are given as the LLR input to an SC pipeline decoder of length N/2 bits. This decoder is referred to as the embedded N/2 length decoder within the N length decoder.
The embedded decoder is also given as input the frozen bits indicator signals z˜(0 : N/2 − 1) (binary
array), which is generated by splitting the z(0 : N − 1) binary array into two halves using the MUX array
(M0a). The multiplexers in (M0a) are controlled by the internal binary signal outerCodeID that indicates
the ordinal of the outer-code that the embedded decoder decodes. For instance, if outerCodeID = 0 then
the embedded decoder handles the first outer-code and therefore it should be given as input the first half
of the z array. The two outputs of the embedded decoder are denoted by signals arrays u˜(0 : N/2 − 1)
and x˜(0 : N/2− 1). The array u˜(0 : N/2 − 1) is given as input to the two halves of the output decoded
information bits array uˆ(0 : N − 1). The DeMUX array (M0b) determines to which part of the uˆ array u˜
is written.
The Encoding Unit performs the encoding of the outer-code’s estimated codewords into the estimated
codeword of the N length code. The binary register tmpxˆ(0 : N − 1) stores the temporary value of the
estimated codeword xˆ which is the signals array at its input. The encoding layer is given as input the
outerCodeID signal and the two signals arrays x˜(0 : N/2− 1) and tmpxˆ(0 : N − 1). Its output is derived
Figure 15: Block diagram for the SC pipeline decoder
as follows
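\[
\hat{x}\left(2j : (2j+1)\right) =
\begin{cases}
\left[\tilde{x}(j),\ 0\right], & \text{outerCodeID} = 0;\\
\left[\tilde{x}(j) + \text{tmp}\hat{x}(2j),\ \tilde{x}(j)\right], & \text{outerCodeID} = 1,
\end{cases}
\qquad \forall j \in \left[\frac{N}{2}\right]_{-}.
\tag{72}
\]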
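The behaviour of the Encoding Unit in (72) can be sketched in a few lines of Python. The function name `encoding_unit` and the use of Python lists for the binary arrays are assumptions made for this illustration; addition is taken over GF(2), i.e. as xor.

```python
def encoding_unit(x_tilde, tmp_x, outer_code_id):
    """Software model of equation (72) (hypothetical helper).

    x_tilde: estimated codeword of the current outer-code (length N/2).
    tmp_x:   content of the tmp register (length N), used when outer_code_id = 1.
    Returns the length-N signals array x_hat.
    """
    half = len(x_tilde)
    x_hat = [0] * (2 * half)
    for j in range(half):
        if outer_code_id == 0:
            # First outer-code: place its bit in the even position, zero in the odd one.
            x_hat[2 * j], x_hat[2 * j + 1] = x_tilde[j], 0
        else:
            # Second outer-code: complete the (u + v, v) pair.
            x_hat[2 * j] = x_tilde[j] ^ tmp_x[2 * j]
            x_hat[2 * j + 1] = x_tilde[j]
    return x_hat
```

A typical use is two consecutive calls: the output of the call with `outer_code_id = 0` plays the role of the tmp register content for the call with `outer_code_id = 1`.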
Note that in order to avoid delays due to sampling by registers, it is important that the codeword
estimation (which is one of the outputs of the decoder) will be the output of the encoding layer and not
the register following it. This issue and further timing concerns are considered in the next subsection.
We describe the recursive schematic decoding procedure for N > 2 in Algorithm 19. Let us consider
the complexity of this circuit. We assume that a PE finishes its operation in one clock cycle. Denote
by T(n) the time (in terms of the number of clock cycles) that is required to complete the decoding of
a length N = 2^n polar code. Then T(n) = 2 + 2 · T(n − 1) for n > 1 and T(1) = 2. This recursion yields
T(n) = 2N − 2. Denote by P(n) the number of PEs of a decoder for a length N = 2^n bits polar code; we
have P(n) = 2^{n−1} + P(n − 1) for n > 1 and P(1) = 1, resulting in P(n) = 2^n − 1 = N − 1. The cost of
the encoding unit is 2 · ∑_{i=1}^{n} 2^i = 4 · (N − 1) bit registers and ∑_{i=0}^{n−1} 2^i = N − 1 xor circuits. We
should have ρ(n) registers for holding LLR values, so ρ(n) = 2^{n−1} + ρ(n − 1) for n > 1 and ρ(1) = 0, so
ρ(n) = N − 2. Note that in this design we assume that the encoding layer is a combinatorial circuit.
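The recursion of Algorithm 19 and the counts above can be checked with a small software model. The following Python sketch is an illustration under several assumptions that are not taken from the text: all names are hypothetical, z(i) = 1 is assumed to mark a frozen bit whose value is 0, a positive LLR is assumed to favour the bit value 0, and the length-2 base case (Algorithm 6) is assumed to use the single PE twice. The c_u = 0 branch of the PE is written in an algebraically equivalent form that avoids evaluating tanh^{-1} at 1 for large LLRs.

```python
import math


def f_llr(a, b):
    """c_u = 0 branch of the PE; equals 2*atanh(tanh(a/2)*tanh(b/2))."""
    sign = (1 if a >= 0 else -1) * (1 if b >= 0 else -1)
    return (sign * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))


def g_llr(a, b, u):
    """c_u = 1 branch of the PE."""
    return (-1) ** u * a + b


def encode(u):
    """Recursive (u + v, v) encoder in the interleaved order of equation (72)."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    x = []
    for a, b in zip(encode(u[:half]), encode(u[half:])):
        x += [a ^ b, b]
    return x


def sc_pipeline(llr, z):
    """Return (u_hat, x_hat, clock_cycles); len(llr) is a power of two >= 2."""
    half = len(llr) // 2
    if half == 1:
        # Assumed base case: one PE used twice plus a slicer, two clock cycles.
        u = 0 if z[0] else int(f_llr(llr[0], llr[1]) < 0)
        v = 0 if z[1] else int(g_llr(llr[0], llr[1], u) < 0)
        return [u, v], [u ^ v, v], 2
    # STEP 0: LLRs for the first outer-code (one clock cycle).
    regs = [f_llr(llr[2 * j], llr[2 * j + 1]) for j in range(half)]
    # STEP 1: embedded decoder on the first outer-code.
    u1, x_outer, t1 = sc_pipeline(regs, z[:half])
    # STEP 2: LLRs for the second outer-code (one clock cycle).
    regs = [g_llr(llr[2 * j], llr[2 * j + 1], x_outer[j]) for j in range(half)]
    # STEP 3: embedded decoder on the second outer-code.
    u2, x2, t3 = sc_pipeline(regs, z[half:])
    x_hat = []
    for a, b in zip(x_outer, x2):  # encoding unit, equation (72)
        x_hat += [a ^ b, b]
    return u1 + u2, x_hat, 1 + t1 + 1 + t3


def pipeline_pes(n):
    """P(n) = 2^(n-1) + P(n-1), P(1) = 1."""
    return 1 if n == 1 else 2 ** (n - 1) + pipeline_pes(n - 1)


def pipeline_llr_regs(n):
    """rho(n) = 2^(n-1) + rho(n-1), rho(1) = 0."""
    return 0 if n == 1 else 2 ** (n - 1) + pipeline_llr_regs(n - 1)
```

On a noiseless input the model returns the transmitted word, and the cycle counter it accumulates agrees with T(n) = 2N − 2; the two helper recursions reproduce P(n) = N − 1 and ρ(n) = N − 2.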
5.1.3 The SC Line Decoder
In the pipeline decoder design for a length N polar code, the N/2 processing elements {PE_j}_{j=0}^{N/2−1} are
only employed during steps 0 and 2 of the algorithm. During the other steps (that ideally consume
2 · T (n − 1) = 2N − 4 clock cycles of the total 2N − 2 clock cycles) these processors are idle, resulting
in an inefficient design. In order to increase the processors utilization we observe that the maximum
number of operations that can be done in parallel by the PEs in the SC decoding algorithm is N/2. As
Algorithm 19 SC Pipeline Decoder of Length N (u+ v, v) Polar Code
//STEP 0:
⊲ Set c_u^(internal) = outerCodeID = 0.
Using the PEs array {PE_j}_{j=0}^{N/2−1}, prepare the LLRs input array for the embedded decoder of the
first N/2 length outer-code and output it on the signals array Λ(0 : N/2 − 1), such that
Λ(j) = 2 tanh^{−1}(tanh(λ(2j)/2) · tanh(λ(2j + 1)/2)), j ∈ [N/2]_−.
Sample the Λ(0 : N/2− 1) array by the registers array R(0 : N/2− 1). Sample the first half of the
frozen bits indicator z by the z˜ register, i.e. z˜(0 : N/2− 1) = z(0 : N/2− 1).
//STEP 1:
⊲ Execute the embedded decoder on R (0 : N/2− 1) and z˜(0 : N/2− 1).
⊲ Sample the u˜(0 : N/2− 1) output array by the first half of uˆ, i.e. uˆ(0 : N/2− 1) = u˜(0 : N/2− 1).
Sample the x˜(0 : N/2− 1) output array by the x(outer)(0 : N/2 − 1) register, i.e. x(outer)(0 :
N/2− 1) = x˜(0 : N/2− 1). Let the Encoding Unit process x˜(0 : N/2− 1) according to (72).
//STEP 2:
⊲ Set c_u^(internal) = outerCodeID = 1.
Using the PEs array {PE_j}_{j=0}^{N/2−1}, prepare the LLRs input array for the embedded decoder of the
second N/2 length outer-code and output it on the signals array Λ(0 : N/2 − 1), such that
Λ(j) = (−1)^{x^(outer)(j)} · λ(2j) + λ(2j + 1), j ∈ [N/2]_−.
Sample the Λ(0 : N/2 − 1) array by the registers array R(0 : N/2 − 1). Sample the second half of
the frozen bits indicator z by the z˜ register, i.e. z˜(0 : N/2− 1) = z(N/2 : N − 1).
//STEP 3:
⊲ Execute the embedded decoder on R (0 : N/2− 1) and z˜(0 : N/2− 1).
⊲ Sample the u˜(0 : N/2− 1) output array by the second half of uˆ, i.e. uˆ(N/2 : N − 1) = u˜(0 :
N/2− 1). Let the Encoding Unit process x˜(0 : N/2− 1) according to (72).
a consequence, in order to support the maximum level of parallelism, the design has to include at least
N/2 PEs. The line decoder4, that we describe in this subsection, achieves this lower-bound.
Figure 14b depicts the line decoder block for length N bits code. The line decoder has two operation
modes.
Standard Mode (S-Mode): modeIn = 0
The decoder gets as input LLRs array, λ(0 : N−1), and the frozen bits indicator vector, z(0 : N−1).
Upon completion of its operation the decoder outputs the hard decision on the information word
uˆ(0 : N − 1) and its corresponding codeword xˆ(0 : N − 1) (this is the operation mode we supported
thus far in the pipeline decoder).
PE-Array Mode (P-Mode): modeIn = 1
The decoder gets as input a signals array of LLRs λ(0 : N − 1), a control signal c_u^(in) and a binary
vector û^(in)(0 : N/2 − 1). The output is a signals array λ^(out)(0 : N/2 − 1) of LLRs, where
\[
\lambda^{(out)}(j) =
\begin{cases}
2\cdot\tanh^{-1}\left(\tanh\left(\lambda(2j)/2\right)\cdot\tanh\left(\lambda(2j+1)/2\right)\right), & c_u^{(in)} = 0;\\
(-1)^{\hat{u}^{(in)}(j)}\cdot\lambda(2j) + \lambda(2j+1), & c_u^{(in)} = 1,
\end{cases}
\qquad \forall j \in \left[\frac{N}{2}\right]_{-}.
\tag{73}
\]
In Figure 16, we provide a block diagram for this decoder. Note, that in order to maintain the maximum
level of parallelism, the length N polar code decoder ought to have N/2 processors. Thus, in order to
build the length N polar code decoder using an embedded N/2 length polar code decoder (already having
N/4 processors), we use an additional array of N/4 PEs, which is referred to as the auxiliary array. The
input signal modeIn indicates whether the decoder is used in S-Mode or in P-Mode. The mode signal is an
internal signal that controls whether the N/2 length embedded decoder is in P-Mode.
Let us scan Figure 16 from right to left and observe its important ingredients. The auxiliary PEs array
contains N/4 processors {PE_j}_{j=0}^{N/4−1}, to which the second half of the input array, λ(N/2 : N − 1), is connected.
The first half of the input LLRs array, λ(0 : N/2 − 1), is connected to the embedded line decoder via
the MUX array (M2), in which all the multiplexers are controlled by the binary signal c_m. The other
input alternative of the (M2) array is the registers array R(0 : N/2 − 1). The c_u input of {PE_j}_{j=0}^{N/4−1} is
determined by the (M3) multiplexer, such that in S-Mode (modeIn = 0) the input is c_u^(internal) (an internal
signal) and otherwise c_u = c_u^(in) (one of the inputs to the N length decoder). The output of (M3) also
serves as the c_u^(in) input to the embedded decoder. The modeIn signal further controls the (M4) MUX
array, such that in P-Mode the û^(in) input to {PE_j}_{j=0}^{N/4−1} is the input sub-vector û^(in)(N/4 : N/2 − 1)
and otherwise the input is x^(outer)(N/4 : N/2 − 1) (the second half of the estimated codeword output of
the embedded decoder). Furthermore, modeIn also controls the (M1) MUX array that selects between
û^(in)(0 : N/4 − 1) and x^(outer)(0 : N/4 − 1) for modeIn = 1 and modeIn = 0, respectively. The output
of (M1) serves as the û^(in) input of the embedded decoder. The internal binary signal, mode, is given to
the embedded decoder as its modeIn input.
The S-Mode and the P-Mode procedures of the line decoder are described in Algorithms 20 and 21,
respectively. Let us discuss the complexity of the decoder. Let P(n) be the number of processors of the
N = 2^n decoder. Then P(n) = 2^{n−2} + P(n − 1) and P(1) = 1, so P(n) = 2^{n−1} = N/2. The number of LLR
registers is ρ(n) = 2^{n−1} + ρ(n − 1), ρ(1) = 1, so we have ρ(n) = 2^n − 1 = N − 1. Note that ρ(n) doesn't
account for the binary registers for z̃, tmpx̂ and û.
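The way the two modes nest can be illustrated with a small Python model of the line decoder. This is only a sketch under assumptions: the class and method names are hypothetical, z(i) = 1 is assumed to mark a frozen bit of value 0, a positive LLR is assumed to favour the value 0, and the length-2 decoder is assumed to consist of a single PE and a slicer. The model ignores timing and registers and only mirrors which PEs compute which LLRs.

```python
import math


def pe(a, b, u, cu):
    """One PE: cu = 0 gives the LLR of u, cu = 1 the LLR of v given u."""
    if cu == 0:
        # Equivalent to 2*atanh(tanh(a/2)*tanh(b/2)), stable for large LLRs.
        sign = (1 if a >= 0 else -1) * (1 if b >= 0 else -1)
        return (sign * min(abs(a), abs(b))
                + math.log1p(math.exp(-abs(a + b)))
                - math.log1p(math.exp(-abs(a - b))))
    return (-1) ** u * a + b


class LineDecoder:
    """Length N = 2**n line decoder built around an embedded N/2 decoder."""

    def __init__(self, n):
        self.n_bits = 2 ** n
        self.embedded = LineDecoder(n - 1) if n > 1 else None
        # N/4 auxiliary PEs; the length-2 decoder is assumed to hold one PE.
        self.aux_pes = self.n_bits // 4 if n > 1 else 1

    def num_pes(self):
        inner = self.embedded.num_pes() if self.embedded else 0
        return self.aux_pes + inner

    def p_mode(self, llr, u_in, cu):
        """Equation (73): N/2 PE outputs computed in parallel."""
        half = self.n_bits // 2
        if self.embedded is None:
            return [pe(llr[0], llr[1], u_in[0], cu)]
        quarter = half // 2
        # The embedded decoder (in P-Mode) serves the first half of the input.
        low = self.embedded.p_mode(llr[:half], u_in[:quarter], cu)
        # The auxiliary array serves the second half of the input.
        high = [pe(llr[2 * j], llr[2 * j + 1], u_in[j], cu)
                for j in range(quarter, half)]
        return low + high

    def s_mode(self, llr, z):
        """Return (u_hat, x_hat), following the four steps of the S-Mode."""
        half = self.n_bits // 2
        if self.embedded is None:
            u = 0 if z[0] else int(pe(llr[0], llr[1], 0, 0) < 0)
            v = 0 if z[1] else int(pe(llr[0], llr[1], u, 1) < 0)
            return [u, v], [u ^ v, v]
        regs = self.p_mode(llr, [0] * half, 0)               # step 0
        u1, x_outer = self.embedded.s_mode(regs, z[:half])   # step 1
        regs = self.p_mode(llr, x_outer, 1)                  # step 2
        u2, x2 = self.embedded.s_mode(regs, z[half:])        # step 3
        x_hat = []
        for a, b in zip(x_outer, x2):
            x_hat += [a ^ b, b]
        return u1 + u2, x_hat
```

Counting the PEs of this structure gives N/2 for every n, in agreement with P(n) above, because the embedded decoder's processors are reused as the first half of the array in P-Mode.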
At this point, we would like to make a remark regarding the efficiency of the proposed design. The
recursive design has the benefit of being a comprehensible reflection of the implemented algorithm. It
also has the advantage of emphasizing the parts of the system that may be reused. However, it might be
argued that it has a disadvantage considering the routing of signals in the circuit. This is because we use
4 Note that strictly speaking, the original line decoder, presented by Leroux et al. [15, Section 3.3], is not precisely the
same design, discussed here. The differences, however, appear to be minor (existing mostly in the routing between the LLR
registers and the PEs). As a consequence we preferred not to distinguish it from Leroux’s design.
Figure 16: Block diagram for the SC line decoder
Algorithm 20 S-Mode of SC Line-Decoder of Length N (u+ v, v) Polar Code (modeIn = 0)
//STEP 0:
⊲ Set c_m = c_u^(internal) = outerCodeID = 0, mode = 1.
Operate the embedded decoder in P-Mode, such that at the output of the decoder we have
Λ(j) = 2 · tanh^{−1}(tanh(λ(2j)/2) · tanh(λ(2j + 1)/2)) ∀j ∈ [N/4]_−.
Use the auxiliary PEs array and compute
Λ(j) = 2 · tanh^{−1}(tanh(λ(2j)/2) · tanh(λ(2j + 1)/2)) ∀N/4 ≤ j ≤ N/2 − 1.
Sample the Λ(0 : N/2− 1) array by the registers array R(0 : N/2− 1). Sample the first half of the
frozen bits indicator z by the z˜ register, i.e. z˜(0 : N/2− 1) = z(0 : N/2− 1).
//STEP 1:
⊲ Set mode = 0 and cm = 1.
Execute the embedded decoder in S-Mode on R (0 : N/2− 1) and z˜(0 : N/2− 1).
⊲ Sample the u˜(0 : N/2− 1) output array by the first half of uˆ, i.e. uˆ(0 : N/2− 1) = u˜(0 : N/2− 1).
Sample the x˜(0 : N/2− 1) output array by the x(outer)(0 : N/2 − 1) register, i.e. x(outer)(0 :
N/2− 1) = x˜(0 : N/2− 1). Let the Encoding Unit process x˜(0 : N/2− 1) according to (72).
//STEP 2:
⊲ Set c_m = 0, c_u^(internal) = mode = outerCodeID = 1.
Operate the embedded decoder in P-Mode, such that at the output of the decoder we have
Λ(j) = (−1)^{x^(outer)(j)} · λ(2j) + λ(2j + 1) ∀j ∈ [N/4]_−.
Use the auxiliary PEs array and compute
Λ(j) = (−1)^{x^(outer)(j)} · λ(2j) + λ(2j + 1) ∀N/4 ≤ j ≤ N/2 − 1.
Sample the Λ(0 : N/2 − 1) array by the registers array R(0 : N/2 − 1). Sample the second half of
the frozen bits indicator z by the z˜ register, i.e. z˜(0 : N/2− 1) = z(N/2 : N − 1).
//STEP 3:
⊲ Set mode = 0 and cm = 1.
Execute the embedded decoder in S-Mode on R (0 : N/2− 1) and z˜(0 : N/2− 1).
⊲ Sample the u˜(0 : N/2− 1) output array by the second half of uˆ, i.e. uˆ(N/2 : N − 1) = u˜(0 :
N/2− 1). Let the Encoding Unit process x˜(0 : N/2− 1) according to (72).
Algorithm 21 P-Mode of SC Line-Decoder of Length N (u+ v, v) Polar Code (modeIn = 1)
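⊲ Set c_m = 0, mode = 1.
Operate the embedded decoder in P-Mode, such that at the output of the decoder we have
\[
\Lambda(j) =
\begin{cases}
2\cdot\tanh^{-1}\left(\tanh\left(\lambda(2j)/2\right)\cdot\tanh\left(\lambda(2j+1)/2\right)\right), & c_u^{(in)} = 0;\\
(-1)^{\hat{u}^{(in)}(j)}\cdot\lambda(2j) + \lambda(2j+1), & c_u^{(in)} = 1,
\end{cases}
\qquad \forall j \in \left[N/4\right]_{-}.
\]
Use the auxiliary PEs array and compute
\[
\Lambda(j) =
\begin{cases}
2\cdot\tanh^{-1}\left(\tanh\left(\lambda(2j)/2\right)\cdot\tanh\left(\lambda(2j+1)/2\right)\right), & c_u^{(in)} = 0;\\
(-1)^{\hat{u}^{(in)}(j)}\cdot\lambda(2j) + \lambda(2j+1), & c_u^{(in)} = 1,
\end{cases}
\qquad \forall\, N/4 \le j \le N/2 - 1.
\]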
//Note that the signals array Λ(0 : N/2− 1) is wired to the output signals array λ(out)(0 : N/2− 1).
the embedded decoder as a black box and consequently we route all the signals from it and to it, using its
interface. As a result, some of the signals traverse lengthy paths before reaching their target processor.
These paths may be too long for the decoder circuit to have an adequate clock frequency, thereby resulting
in degradation of the achievable throughput. We therefore recommend that after constructing the circuit
in a recursive manner, it should be optimized by unfolding the recursive units and contracting the paths.
Furthermore, we advise that for building a decoder of length 2N bits code, the designer will use the
already optimized design of the N length decoder (for the embedded unit), thereby taking advantage of
the recursion.
We give below two examples of long paths hazards, that are likely to pose a problem. Workarounds
for these challenges are further provided.
1. The (M2) MUX array at the input of the embedded line decoder of the length N/2 code was included
because of the introduction of P-Mode. A closer examination of our design, reveals that some of
(M2) input signals traverse long paths before reaching their destination PE. For example, the inputs
λ(0) and λ(1) need to traverse log2(N)− 1 multiplexer layers before reaching their processor. Since
P-Mode needs to be accomplished in a single clock cycle, this long path might be prohibitive. By
unfolding the N/2 length embedded decoder block, the designer is able to control the lengths by
carefully routing the signals.
2. The encoding layer also suffers from long routing. In our analysis, we assumed that the encoding
procedure is combinatorial, and therefore has to be completed within one clock cycle. This may be
a problem when several encoding circuits are operated one after the other. For instance, this is the
case of step 3 of the decoder of length N/2^i code, that occurs within step 3 of the decoder of length
N/2^{i−1} code for all i ∈ [2N − 2]. In this case, O(logN) operations need to occur in a sequential
manner in one clock cycle. For large N and high clock frequency circuit, this might not be feasible.
The idea of Leroux et al. [15] was to use flip-flops for saving the partial encoding for each code bit
in the different layers of the decoding circuit. Each such flip-flop, is connected using a xor circuit to
the signal line of the estimated information bit. As such, whenever the SC decoder decides on an
information bit, the flip-flops corresponding to the code bits that are dependent on this information
bit are updated accordingly. These flip-flops need to be reset whenever we start decoding their
corresponding outer-code. For example, when we start using the embedded N/2 length decoder (on
step 1 and step 3) its flip-flops of partial encoding need to be erased (because they correspond to a
new instance of outer-code).
The above notion may also be described recursively, by changing the specification of the length
N polar code decoder in S-mode, and requiring it to output the estimated information bits as
soon as they’re ready. The decoder should also have an N length binary indicator vector, that
indicates which code bits are dependent on the currently estimated information bit. It is easy to see
that using the indicator vector of the length N/2 decoder, it is possible to calculate the N length
indicator vector, by using the (u + v, v) mapping. This, however, generates again a computation
path of length Θ(logN). This problem, can be addressed, by having a fixed indicator circuit for each
partially encoded-bit flip-flop. This circuit will indicate which information bit should be accumulated
depending on the ordinal number of this bit. For example, for the decoder of the code of length
N , we should have an array of N/2 flip-flops, each one corresponds to a bit of the codeword of
the N/2 length first outer-code. Each one of these flip-flops, should have an indicator circuit, that
gets as input a value of a counter signaling the ordinal number of the information bit that has
been estimated, and returns 1 iff its corresponding codeword bit is influenced by this information
bit. For example, the indicator circuit corresponding to the first code bit is a constant 1, because
x_0 = ∑_{i=0}^{N/2−1} u_i, i.e. it is dependent on all the information bits. On the other hand, the last bit's
indicator (i.e. of x_{N/2−1}) returns 1 iff its input equals N/2 − 1, because x_{N/2−1} = u_{N/2−1}. Using
the global counter (that is advanced whenever an information bit is estimated) and the indicator
circuits, each code bit that is influenced by this information bit changes its flip-flop state accordingly.
Using the Kronecker power form of the generating matrix of the (u+ v, v) polar code, it can be seen
that each of such indicator circuits can be designed by using no more than O(log n) = O(log logN)
AND and NOT circuits, therefore the total cost of these circuits will be of O(N log logN) in terms of
space complexity. Further improvements to the efficiency of the circuit can be achieved by employing
Fan and Tsui’s high performance partial sum network [34]. This network implements the indicator
circuits with constant space complexity and delay (per circuit).
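The indicator idea of item 2 can be illustrated in software. The Python sketch below is a hedged illustration, not the circuit of [15] or [34]: all function names are hypothetical, and it assumes the interleaved codeword ordering of equation (72). Under that assumption, code bit x_j of a length 2^n code depends on information bit u_i exactly when every 1 in the bit-reversed binary expansion of j is also set in i, which is the rule the function `indicator` implements.

```python
def encode(u):
    """Recursive (u + v, v) encoder in the interleaved order of equation (72)."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    x = []
    for a, b in zip(encode(u[:half]), encode(u[half:])):
        x += [a ^ b, b]
    return x


def bit_reverse(j, n):
    """Reverse the n-bit binary expansion of j."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (j & 1)
        j >>= 1
    return r


def indicator(j, i, n):
    """Return 1 iff code bit x_j is influenced by information bit u_i."""
    mask = bit_reverse(j, n)
    return int((i & mask) == mask)


def partial_sums(u, n):
    """Accumulate the codeword bit by bit, as the flip-flops would.

    The loop index i plays the role of the global counter; each flip-flop j
    toggles when its indicator fires and the newly estimated bit equals 1.
    """
    flops = [0] * (2 ** n)
    for i, bit in enumerate(u):
        for j in range(2 ** n):
            if bit and indicator(j, i, n):
                flops[j] ^= 1
    return flops
```

After all information bits have been processed, the flip-flop contents coincide with the codeword produced by the recursive encoder, so no O(log N) deep combinatorial encoding path is needed at the end of the outer-code.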
In summary, the recursive architecture may be developed and modified to achieve the timing require-
ments of the circuit. This may be done by ”opening the box” of the embedded decoders, and altering
them to support more efficient designs.
A careful examination of the line-decoder reveals that the auxiliary PEs array is only used on steps 0
and 2, and is idle on the other steps. This fact motivates us to consider two variations on this design. The
first one adds hardware and uses these arrays to increase the throughput, while the second one decreases
the throughput and thereby reduces the required hardware.
5.1.4 Parallel Decoding of Multiple Codewords
High throughput communication systems may require support of simultaneous decoding of multiple code-
words. A naive approach to meet this challenge is implementing p instances of the decoder when there is
a need for decoding p codewords simultaneously. However, because the PEs auxiliary array is idle most
of the time, it seems like a good idea to ”share” this array among several decoders. By appropriately
scheduling the commands to the processors, it is possible to have a decoder implementation for p parallel
codewords which is less expensive than just duplicating the decoders.
Since the array is idle during steps 1 and 3, in which the embedded length N/2 decoder is active,
it is possible to have p ≤ T(n − 1) + 1 = N − 1 decoders sharing the same auxiliary array. The
decoding of each one of them is issued with a delay of one clock cycle from the previous one. Assuming that
p = N − 1, we have a decoding time of T(n) + N − 2 = 3N − 4 for N − 1 codewords while having
p · P(n − 1) + N/4 = (N − 1) · N/4 + N/4 = N^2/4 processors, which is about half of the number of
processors of the naive solution.
This notion can be further developed. For the embedded N/2 length decoder, there is an auxiliary
array of N/8 processors. This auxiliary array is used on steps 0 and 2 of the decoders of length N and
length N/2. Therefore, it is idle most of the time, and we can share it among the p decoders of length
N/2. Assuming that p = N − 1, we may allocate three auxiliary arrays that will be shared among the
decoders, each one is dedicated to one of these different steps: one array for step 0 (and 2) of the N
length decoder, one array for step 0 of the N/2 length decoder and one array for step 2 of the N/2 length
decoder. For each of the decoded codewords the number of clock cycles between these steps is at least p,
therefore there will be no contention on these resources and the throughput will not suffer because of this
hardware reduction.
In general, for p = N − 1, the auxiliary array within the embedded decoder of length N/2^i polar decoder
(i ∈ [log2(N) − 2]), can be shared among the p decoders, provided that we allocate an instance of the
array for each of the decoding steps it is used in, during the first half of the decoding algorithm for the
length N code (i.e. during the N length decoder's steps 0 and 1). As a consequence, for this specific
array, we have one call in step 0 of the N length decoder, one call for step 0 and one call for step
2 of the embedded N/2 length decoder, two calls for step 0 and two calls for step 2 of the N/2^2 length
embedded decoder, ..., 2^{i−1} calls for step 0 and 2^{i−1} calls for step 2 for the length N/2^i embedded decoder.
In summary, we require ∑_{t=0}^{i} 2^t = 2^{i+1} − 1 auxiliary arrays of processors, each one containing N/2^{i+2} PEs.
In particular, we need N − 1 PEs for the length 2 decoder (each PE is allocated to a specific decoder),
and (N/2) · ∑_{i=0}^{log2(N)−2} (2^{i+1} − 1)/2^{i+1} ≈ (N/2) · (log2(N) − 1) PEs for the other decoder lengths. This adds up to
approximately (N/2) · (1 + log2(N)) PEs. We conclude that this solution allows an increase of the throughput
by a multiplicative factor of N, while the PEs hardware is only increased by approximately a log2(N) factor.
Note that the number of registers should be increased by a multiplicative factor of O(p) = O(N) as well.
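The PE count of the shared design can be tallied directly from the description above. The following Python sketch is an illustration with hypothetical function names; it assumes p = N − 1 and counts, for each i, the 2^{i+1} − 1 auxiliary arrays of N/2^{i+2} PEs plus the N − 1 PEs of the length 2 decoders.

```python
def shared_pe_count(n):
    """Exact PE count of the shared-array design for p = N - 1, N = 2**n."""
    n_bits = 2 ** n
    total = n_bits - 1  # one PE per length-2 decoder instance
    for i in range(n - 1):  # embedded decoders of length N / 2**i, i = 0..n-2
        arrays = 2 ** (i + 1) - 1
        pes_per_array = n_bits // 2 ** (i + 2)
        total += arrays * pes_per_array
    return total


def naive_pe_count(n):
    """N - 1 independent line decoders with N/2 PEs each."""
    n_bits = 2 ** n
    return (n_bits - 1) * (n_bits // 2)
```

Under this reading the tally evaluates to exactly (N/2) · log2(N), slightly below the approximation (N/2) · (1 + log2(N)), whereas the naive duplication needs (N − 1) · N/2 PEs.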
A closer look at the above design, reveals that we actually allocated for each sub-step of steps 0 and 1
of the N length decoder a different array of processors. The decoding operations of the p codewords will
go through these units in a sequential order. However, each decoder should have its own set of registers
saving the state of the decoding algorithm. Another observation is that when we finish decoding the first
codeword (i.e. the one we started decoding in time 0), we can start decoding codeword number N in the
next time slot (and then codeword number N + 1, etc.), in a pipelined fashion. Note that Leroux et al.
considered a similar idea, and referred to it as the vector-overlapping architecture [14].
5.1.5 Limited Parallelism Decoding
An alternative approach for addressing the problem of low utilization of the auxiliary PEs arrays is
to limit the number of processing elements that may be allowed to operate simultaneously. This is a
practical consideration, since typically, a system design has a parallelism limitation which is due to power
consumption and silicon area constraints. The limited parallelism, inevitably results in an increase of the
decoding time, and thereby a decrease of the throughput.
The length N line decoder has PE parallelism of N/2, because it may simultaneously compute at most
N/2 LLRs using the N/2 PEs. Let us consider a line decoder of a length N code with limited parallelism
of N/2^i, where i ∈ [log2 N]. This means that the decoder has exactly N/2^i PEs. If i = 1 then the decoder
is actually the standard line decoder. Figure 17 depicts the block diagram of the decoder for i > 1. We
highlight the changes that were applied to the standard line decoder (Figure 16) in creating Figure 17.
• The auxiliary PEs array was omitted.
• The embedded line decoder of the N/2 length code was replaced by a limited parallelism line decoder,
with parallelism of N/2^i.
• The input to the registers array R(N/4 : N/2− 1) is the signals array Λ(0 : N/4− 1).
• A MUX array (M2a) was added providing the ”channel” inputs to the (M2) MUX array. The control
signal of the (M2a) array is an internal binary signal called subStep, such that the output of the
array is λ(0 : N/2− 1) if subStep = 0 and otherwise it equals to λ(N/2 : N − 1). Similarly, subStep
is the control signal of two additional MUX arrays (M1a) and (M1b) providing inputs to the (M1)
MUX array. We have the outputs of these arrays equal uˆ(in) (0 : N/4− 1) and x(outer) (0 : N/4− 1)
for subStep = 0 and uˆ(in) (N/4 : N/2− 1) and x(outer) (N/4 : N/2− 1) otherwise.
• The output LLR signals array λ(out) (0 : N/2− 1) is routed such that
λ(out) (0 : N/4− 1) = R (0 : N/4− 1) and λ(out) (N/4 : N/2− 1) = Λ (0 : N/4− 1) . (74)
The limited parallelism S-mode decoding algorithm has four steps as before, however steps 0 and 2 are
modified including now two sub-steps. On each sub-step we calculate half of the LLRs because we don’t
have an auxiliary array. Note that depending on the parallelism of the embedded decoder, those sub-steps
may require more than one clock cycle. In a similar manner the P-mode operation is also amended, and
now contains two sub-steps.
Let us analyze the time complexity of this algorithm. We denote by T(n, n − i) the S-Mode running
time (in terms of clock cycles) for a length N = 2^n bits polar code with limited parallelism of N/2^i = 2^{n−i}.
We note that T(n, n − 1) = T(n), where T(n) = 2N − 2 is the time complexity of the standard line
decoder. The following recursion formula is derived
\[
T(n, n-i) = 2\cdot T(n-1, n-i) + 4\cdot T_p(n-1, n-i),
\tag{75}
\]
where T_p(n, m) is the running time of the length N = 2^n bits decoder with limited parallelism of 2^m in
P-Mode,
\[
T_p(n, m) =
\begin{cases}
1, & n - m \le 1;\\
2\cdot T_p(n-1, m), & \text{otherwise.}
\end{cases}
\tag{76}
\]
Therefore,
\[
T_p(n, m) =
\begin{cases}
1, & n - m \le 1;\\
2^{\,n-m-1}, & \text{otherwise.}
\end{cases}
\tag{77}
\]
It can be shown that
\[
T(n, n-i) = 2\cdot N + (i-2)\cdot 2^{i}, \qquad i \ge 1.
\tag{78}
\]
Equation (78) reveals the tradeoff between the number of PEs and the running time of the algorithm. For
example, decreasing the number of processors by a multiplicative factor of 8, compared to the standard
case (i.e. i = 4), results in an increase of only 34 clock cycles in the decoding time. We note however,
that in order to implement such a decoder, additional routing circuitry (e.g. multiplexers layers) should
be included.
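The closed form (78) can be checked numerically against the recursions (75) and (76). The Python sketch below uses hypothetical function names and assumes that the recursion (75) is applied for i > 1, with the standard line decoder, T(n, n − 1) = 2N − 2, as its base case.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def t_p(n, m):
    """P-Mode running time, equation (76)."""
    return 1 if n - m <= 1 else 2 * t_p(n - 1, m)


@lru_cache(maxsize=None)
def t_s(n, m):
    """S-Mode running time for length 2**n and parallelism 2**m, equation (75)."""
    if n - m == 1:
        # Assumed base case: the standard line decoder, T(n) = 2N - 2.
        return 2 * 2 ** n - 2
    return 2 * t_s(n - 1, m) + 4 * t_p(n - 1, m)


def closed_form(n, i):
    """Equation (78): T(n, n - i) = 2N + (i - 2) * 2**i."""
    return 2 * 2 ** n + (i - 2) * 2 ** i
```

For instance, `closed_form(n, 4) - closed_form(n, 1)` evaluates to 34 for every n ≥ 4, which is the increase in clock cycles quoted above for an eightfold reduction in the number of PEs.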
Remark 3 (SCL Implementation) For a limited list size, the SCL decoder may also be implemented by
a line decoder. This requires to duplicate the hardware by the list size, L, and to introduce the appropriate
logic (i.e. comparators and multiplexer layers). It is possible to provide an implementation with O(f(L)·N)
time complexity, where f(·) is a polynomially bounded function, that is dependent on the efficiency of
algorithms for selection of L most likely decoding paths from a list of 2L paths (which is done by the
N = 2 length decoder). Furthermore, the normalization of the likelihoods should be carefully considered,
and also should have its impact on the precise (i.e. non asymptotic) time complexity. As was mentioned in
Subsection 5.1.5, by limiting the parallelism of the decoder it is possible to reduce the number of processors
with a reasonable hit to the throughput.
5.1.6 The BP Line Decoder
As we already noticed in Subsection 4.3, BP is an iterative algorithm, in which messages are sent on
the normal factor graph representing the code. In this subsection, we consider an implementation of the
BP decoder that employs the GCC serial schedule. Figure 18a depicts the processing element (PE) of the
proposed design. This unit has two inputs for message LLRs (µ_0^(in) and µ_1^(in)), and depending on the control
signal c^(BPPE) it performs either the f_(+)(·, ·) function or the f_(=)(·, ·) function, i.e.
\[
\mu^{(out)} =
\begin{cases}
f_{(=)}\left(\mu_0^{(in)}, \mu_1^{(in)}\right), & c^{(BPPE)} = 0;\\
f_{(+)}\left(\mu_0^{(in)}, \mu_1^{(in)}\right), & c^{(BPPE)} = 1.
\end{cases}
\tag{79}
\]
Since the PE has to support the implementation of equations (44)-(49), we introduce two routing layers
for the inputs (OP-MUX) and the outputs (OP-De-MUX) that ensure that the proper inputs are given
Figure 17: Block diagram for the limited parallelism line decoder
(a) Processing element and routing layers
(b) BP line decoder block
Figure 18: BP line decoder components definitions
c(opMux), c(opDeMux)   c(BPPE)   µ_0^(in)        µ_1^(in)        µ^(out)        Equation
0                      0         µ_x1^(in)       µ_v^(in)        µ_{e1→a0}      (44)
1                      1         µ_x0^(in)       µ_u^(in)        µ_{a0→e1}      (45)
2                      1         µ_x0^(in)       µ_{e1→a0}       µ_u^(out)      (46)
3                      0         µ_x1^(in)       µ_{a0→e1}       µ_v^(out)      (47)
4                      1         µ_u^(in)        µ_{e1→a0}       µ_x0^(out)     (48)
5                      0         µ_v^(in)        µ_{a0→e1}       µ_x1^(out)     (49)
6                      0 or 1    µ_0^(ext,in)    µ_1^(ext,in)    µ^(ext,out)    (79)
Table 1: Routing tables for OP-MUX and OP-DEMUX in Figure 18
to the processor and that its output is dispatched to the appropriate destination. These routing units are
controlled by two control signals, c^(opMux) and c^(opDeMux), each of which has seven possible values and is
therefore represented by three bits. Table 1 specifies the valid assignments of c^(BPPE), c^(opMux) and c^(opDeMux)
for implementing the different operations. The last option (c^(opMux) = c^(opDeMux) = 6) is used in the decoder's
P-Mode, which is defined in the sequel.
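The interplay between the PE of (79) and the routing of Table 1 can be sketched as a table-driven software model. The Python code below is an illustration under assumptions: the dictionary keys and function names are hypothetical, and the two node functions are taken to be the usual LLR-domain updates (addition for f_(=) and the tanh rule for f_(+)), which is an assumption here since equations (44)-(49) are not restated in this subsection. Only rows 0 to 5 of Table 1 are modelled.

```python
import math


def f_eq(a, b):
    """Assumed equality-node update: the two LLRs add."""
    return a + b


def f_plus(a, b):
    """Assumed parity-node update in the LLR domain."""
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))


def bp_pe(mu0, mu1, c_bppe):
    """Equation (79): f_(=) for c_bppe = 0, f_(+) for c_bppe = 1."""
    return f_eq(mu0, mu1) if c_bppe == 0 else f_plus(mu0, mu1)


# Rows 0..5 of Table 1: routing value -> (c_BPPE, input 0, input 1, output).
ROUTING = {
    0: (0, "x1_in", "v_in", "e1_to_a0"),
    1: (1, "x0_in", "u_in", "a0_to_e1"),
    2: (1, "x0_in", "e1_to_a0", "u_out"),
    3: (0, "x1_in", "a0_to_e1", "v_out"),
    4: (1, "u_in", "e1_to_a0", "x0_out"),
    5: (0, "v_in", "a0_to_e1", "x1_out"),
}


def run_op(msgs, op):
    """Apply one routed PE operation to the message store `msgs` (a dict)."""
    c_bppe, in0, in1, out = ROUTING[op]
    msgs[out] = bp_pe(msgs[in0], msgs[in1], c_bppe)
    return msgs
```

Running the operations 0 to 5 in order on a store that holds the four input messages fills in all internal and output messages of one butterfly; this order is used only for illustration and is not meant to reproduce the GCC serial schedule.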
The proposed decoder structure is inspired by the recursive structure of the SC line decoder. Figure
18b depicts the BP line decoder block. Similarly to the SC line decoder we specify two operation modes:
• S-Mode (modeIn = 0): the decoder completes a single iteration of the BP decoder, given the inputs
λ(0 : N − 1), z(0 : N − 1) and outputs uˆ(0 : N − 1) and xˆ(0 : N − 1) (defined in the BP signature,
(52)).
• P-Mode (modeIn = 1): the decoder serves as an array of N/2 processors and performs simulta-
neously a parallel computation on the input array λ(0 : N − 1) such that ∀i ∈ [N/2]− the output
λ(out)(i) is the outcome of applying the BP PE on inputs λ(i) and λ(i +N/2) with C(BPPE,in) as
the control signal.
Figure 19 contains a block diagram for this design. Due to the vast number of details in this figure,
we chose to enlarge three parts of it, named sub-figures A, B and C, in Figures 20, 21 and 22,
respectively. The memory plays a fundamental role in the design, as it enables storing messages within
the iteration boundary and beyond it. The basic requirement is that each "butterfly" realization of the
(u + v, v) factor graph should have memory resources to store its messages. To allow messages to be kept
within the iteration boundary, it is only required to have one registers array for each length of outer-code
and for each message type. However, the need for keeping a message beyond the iteration boundary
requires a dedicated memory array for each outer-code instance. Note that messages whose values
Figure 19: Block diagram for the BP line decoder. Details of figure appear in Figures 20, 21 and 22
corresponding to sub-figures A, B and C respectively.
Figure 20: Block diagram for the BP line decoder (Figure 19) - zoom-in: Sub-figure A
are calculated before being used for the first time in each iteration are not required to be kept beyond the
iteration boundary. In the case of the (u + v, v) code and the GCC schedule, only messages of type µ_v^(in)
need to be kept beyond the iteration boundary. We suggest satisfying this requirement in the following
way. In the length N decoder, we associate a registers matrix µ_v^(in)(0 : #r(N) − 1, 0 : N/2 − 1). Here,
#r(N) is the number of realizations of factor graphs corresponding to outer-codes of size N that exist in
our code.
For the N bits length code, there is only one factor graph of this size (i.e. the entire graph), and
therefore for this decoder #r(N) = 1. Consider now the N/2 bits length decoder that is embedded within
the N length decoder. We see in Figure 19, that this decoder has its number of realizations as 2 ·#r(N),
i.e. for the N bits length decoder we have #r(N/2) = 2. This is because we have two outer-codes of
length N/2 bits in the N length code. Therefore, the memory matrix associated with it has two rows
and N/4 columns. The first row is dedicated to the first realization of the outer-code and the second row
is dedicated to the second realization. Within this N/2 bits length decoder, there is an embedded N/4
length decoder with 2 ·#r(N/2) realizations, so in this case #r(N/4) = 4. As a result, it has a registers
matrix with 4 rows and N/8 columns (each row is dedicated to one of the 4 outer-codes of length N/4 in
this GCC scheme). This development continues, until we reach the embedded decoder of length 2, which,
by induction, has #r(2) = N/2 realizations for the N length decoder, so it requires a registers matrix
with N/2 rows and one column.
For a correct operation of the decoder, it is required to inform the embedded decoders to which
realization of the outer-code’s factor graph they are currently referring. This is the role of the realizationID
input signal in Figure 18b, that takes decimal values in [#r (N)]−, and therefore requires ⌈log2 (#r (N))⌉
bits for their representation. Moving to the implementation in Figure 19 we can observe that indeed
RealizationID is used to select the row of µ_v^(in) corresponding to the outer-code realization that is currently
processed. Furthermore, an internal signal RealizationID(N/2) is defined as the RealizationID input of
Figure 21: Block diagram for the BP line decoder (Figure 19) - zoom-in: Sub-figure B
Figure 22: Block diagram for the BP line decoder (Figure 19) - zoom-in: Sub-figure C
the embedded N/2 length decoder, such that
RealizationID(N/2) = 2 ·RealizationID+ outerCodeID, (80)
where outerCodeID ∈ {0, 1} indicates the ordinal of the N/2 bits length outer-code (of the current
decoded length N code) that is currently processed.
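As an illustration of (80), the following Python sketch propagates realization IDs down the nesting of
decoders. The function names are hypothetical, and we assume that the full-length decoder is addressed with
ID 0, since it has a single realization.

```python
def embedded_realization_id(realization_id, outer_code_id):
    """Equation (80): the ID passed to the embedded half-length decoder."""
    if outer_code_id not in (0, 1):
        raise ValueError("outerCodeID must be 0 or 1")
    return 2 * realization_id + outer_code_id


def realization_ids_at_depth(depth):
    """IDs that reach the decoder nested `depth` levels below the top one."""
    ids = [0]  # assumption: the single top-level realization has ID 0
    for _ in range(depth):
        ids = [embedded_realization_id(r, o) for r in ids for o in (0, 1)]
    return ids


def id_width_bits(num_realizations):
    """ceil(log2(num_realizations)), computed without floating point."""
    return (num_realizations - 1).bit_length()


if __name__ == "__main__":
    ids = realization_ids_at_depth(3)
    print(ids, "need", id_width_bits(len(ids)), "bits")
```

Each nesting level appends one bit (the outerCodeID) to the ID, so the IDs at a given depth are distinct
and address the rows of the corresponding memory matrix one to one.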
We also need to have registers arrays for the messages of type $\mu_{e_1 \to a_0}$, $\mu_{a_0 \to e_1}$,
$\mu_u^{(in)}$, $\mu_u^{(out)}$ and $\mu_v^{(out)}$, each one of them of length N/2. We denote them by
$\mu_{e_1 \to a_0}(0 : N/2-1)$, $\mu_{a_0 \to e_1}(0 : N/2-1)$, $\mu_u^{(in)}(0 : N/2-1)$,
$\mu_u^{(out)}(0 : N/2-1)$ and $\mu_v^{(out)}(0 : N/2-1)$, respectively. Note that, as opposed to the memory
structure for the $\mu_v^{(in)}$ messages, these arrays do not need to be available beyond the iteration
boundary; therefore it is sufficient to have them as arrays and not matrices. Furthermore, the arrays for
messages $\mu_{e_1 \to a_0}$, $\mu_u^{(out)}$ and $\mu_v^{(out)}$ can be replaced by a single temporary array
of length N/2. However, in the description of the hardware structure, we chose not to do this, in order to
keep the discussion more comprehensible.
The routing units OP-MUX and OP-De-MUX that appeared in Figure 18a were grouped together in
Figure 19 into routing arrays (M3a), (M3b), (M4a) and (M4b). The inputs and outputs to these routing
arrays are arrays of inputs and outputs corresponding to the types of inputs and outputs that appear in
Figure 18a. The convention is that in these routing arrays, the ith output corresponds to the ith input from
each signals array (the signals array is selected by the control signal of the routing array). Moreover, the
ith output of the OP-MUX array corresponds to the consecutive ith processor from the array of processors
it serves. Similarly, the ith input of the OP-De-MUX array corresponds to the ith consecutive processor
from the array of processors it serves.
MUX arrays (M1a), (M1b), (M2a) and (M2b) are used to select the LLR inputs to the embedded
decoder, $\tilde{\lambda}(0 : N/2-1)$. The select signal $c_m$ determines whether the inputs to the embedded
decoder come from the outputs of the OP-MUX arrays (M3a) and (M3b) ($c_m = 0$), or from the MUX arrays
(M2a) and (M2b) ($c_m = 1$). We shall see that $c_m = 0$ is used when the embedded decoder is employed
in S-Mode, while $c_m = 1$ is used when it is employed in P-Mode. The multiplexer (M5) selects the
appropriate source for the $c^{(BPPE)}$ control signal, such that in S-Mode (modeIn = 0), $c^{(BPPE)}$ takes
the internal $c^{(BPPE,internal)}$ signal, and in P-Mode it takes the input signal $c^{(BPPE,in)}$. Finally,
note that the λ(0 : N − 1) input signals array is wired both to the $\mu_{x_0}^{(in)}(0 : N/2-1)$ and
$\mu_{x_1}^{(in)}(0 : N/2-1)$ signals arrays (used in S-Mode) and to $\mu_0^{(ext,in)}(0 : N/2-1)$ and
$\mu_1^{(ext,in)}(0 : N/2-1)$ (used in P-Mode). The $\mu_{x_0}^{(out)}(0 : N/2-1)$ and
$\mu_{x_1}^{(out)}(0 : N/2-1)$ signals arrays are wired to the $\hat{x}(0 : N-1)$ output signals array.
The S-Mode operation of the decoder is described in Algorithms 22 and 23. The P-Mode procedure
is described in Algorithm 24.
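To make the data flow of the S-Mode steps concrete, we give a software sketch in Python of one iteration
of the GCC schedule. It models only the message computations, not the control signals or the hardware
routing. Several ingredients are our assumptions rather than part of the design: f_eq is taken to be the
LLR sum and f_plus the box-plus rule, positive LLRs favour the zero symbol, a frozen bit is modelled at
the recursion leaf by the hypothetical finite constant FROZEN_LLR, and the dictionary `state` stands in
for the memory matrices of the messages that survive the iteration boundary.

```python
import math

FROZEN_LLR = 30.0  # hypothetical finite stand-in for "this bit is surely 0"


def f_eq(a, b):
    """Equality-node rule, assumed here to be the sum of the two LLRs."""
    return a + b


def f_plus(a, b):
    """Parity-node (box-plus) rule, written in an exact, overflow-free form."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return (sign * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))


def bp_s_mode(llr, frozen, state, realization_id=0):
    """Run one S-Mode iteration on a length-len(llr) (u+v, v) code.

    llr    -- input LLRs, first half feeding x0 and second half feeding x1
    frozen -- frozen indicator z (truthy means the bit is frozen to 0)
    state  -- dict keeping mu_v_in per (length, realization_id) between calls
    Returns (LLR estimates of u, messages mu_x_out returned to the caller).
    """
    n = len(llr)
    if n == 1:  # assumed leaf rule: a frozen bit contributes a strong prior
        prior = FROZEN_LLR if frozen[0] else 0.0
        return [llr[0] + prior], [prior]
    half = n // 2
    x0_in, x1_in = llr[:half], llr[half:]
    v_in = state.setdefault((n, realization_id), [0.0] * half)

    # Step 0: messages towards the first outer code.
    e1_a0 = [f_eq(x1, v) for x1, v in zip(x1_in, v_in)]
    u_out = [f_plus(x0, e) for x0, e in zip(x0_in, e1_a0)]
    # Step 1: embedded decoder on outer code 0 (outerCodeID = 0).
    u_hat0, u_in = bp_s_mode(u_out, frozen[:half], state, 2 * realization_id)
    # Step 2: messages towards the second outer code.
    a0_e1 = [f_plus(x0, u) for x0, u in zip(x0_in, u_in)]
    v_out = [f_eq(x1, a) for x1, a in zip(x1_in, a0_e1)]
    # Step 3: embedded decoder on outer code 1, then the outgoing messages.
    u_hat1, v_in = bp_s_mode(v_out, frozen[half:], state, 2 * realization_id + 1)
    state[(n, realization_id)] = v_in  # kept beyond the iteration boundary
    e1_a0 = [f_eq(x1, v) for x1, v in zip(x1_in, v_in)]
    x0_out = [f_plus(u, e) for u, e in zip(u_in, e1_a0)]
    x1_out = [f_eq(v, a) for v, a in zip(v_in, a0_e1)]
    return u_hat0 + u_hat1, x0_out + x1_out


if __name__ == "__main__":
    # Codeword 1010 (from u = 0010) observed as LLRs of magnitude 4.
    state = {}
    u_llr, _ = bp_s_mode([-4.0, 4.0, -4.0, 4.0], [1, 1, 0, 0], state)
    print([0 if value >= 0 else 1 for value in u_llr])
```

Calling bp_s_mode repeatedly with the same `state` corresponds to running further iterations. The keys of
`state` after one call also show the realization structure: one entry for the full-length decoder and one
entry per realization of each embedded decoder.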
Let us now consider the time complexity (in terms of the number of clock cycles for running an
iteration) of this design. As before, let T(n) be the time complexity of the decoder for a polar code of
length $N = 2^n$ bits. We assume that each operation of the BP PE requires one clock cycle. As a
consequence, we have
T(n) = 2 · T(n − 1) + 7, for n > 1 (81)
and T(1) = 4, resulting in T(n) = 5.5 · N − 7 = Θ(N). The memory consumption, however, is Θ(N · log N),
because of the memory matrices for the $\mu_v^{(in)}$ type of messages. The number of processing elements in
this design is N/2. Note that our proposed PE can be further improved to support some PE operations
occurring in parallel. For example, if the BP PE is designed such that the operation of $f_{(+)}(\cdot)$ and the
operation of $f_{(=)}(\cdot)$ can be performed simultaneously in one clock cycle, we can execute the last two
operations in step 3 in one clock cycle. Consequently, this will reduce the free addend in (81) to 6.
Further reduction is possible if the processor can execute $f_{(+)}(\cdot)$ and direct its output to
$f_{(=)}(\cdot)$ in one clock cycle. This improvement will result in joining the two operations in step 2 into
one operation. Enabling the computation of $f_{(=)}(\cdot)$ and directing its output to $f_{(+)}(\cdot)$ in the
same clock cycle results in consolidation of the two operations of step 0 into one operation (actually, the
latter change may also allow
Algorithm 22 S-Mode (Steps 0 and 1) of BP on Length N (u + v, v) Polar Code (modeIn = 0)
//STEP 0:
⊲ Set $c_m = c^{(BPPE,internal)} = c^{(opMux)} = c^{(opDeMux)} = 0$, mode = 1.
Operate the embedded decoder in P-Mode, such that at the output of the decoder we have
$$\mu_{e_1 \to a_0}(j) = f_{(=)}\left(\mu_{x_1}^{(in)}(j), \mu_v^{(in)}(j)\right) \quad \forall j \in [N/4]_-.$$
Use the auxiliary PEs array and compute
$$\mu_{e_1 \to a_0}(j) = f_{(=)}\left(\mu_{x_1}^{(in)}(j), \mu_v^{(in)}(j)\right) \quad \forall\, N/4 \le j \le N/2 - 1.$$
Store these messages in their designated memory array.
⊲ Set $c_m$ = outerCodeID = 0, $c^{(BPPE,internal)}$ = mode = 1, $c^{(opMux)} = c^{(opDeMux)} = 2$.
Simultaneously operate the embedded decoder (P-Mode) and the auxiliary array and store their
outputs in the memory area such that
$$\mu_u^{(out)}(j) = f_{(+)}\left(\mu_{x_0}^{(in)}(j), \mu_{e_1 \to a_0}(j)\right) \quad \forall j \in [N/2]_-.$$
Sample the first half of the frozen bits indicator z by the $\tilde{z}$ register, i.e.
$\tilde{z}(0 : N/2-1) = z(0 : N/2-1)$.
//STEP 1:
⊲ Set mode = outerCodeID = 0, $c_m = 1$.
Execute the embedded decoder in S-Mode on $\mu_u^{(out)}(0 : N/2-1)$ as the LLR input and
$\tilde{z}(0 : N/2-1)$ as the frozen symbols indicator vector. The realization ID of the embedded decoder
(denoted by realizationID(N/2)) is calculated according to (80).
⊲ Sample the $\tilde{u}(0 : N/2-1)$ signals array by the first half of $\hat{u}$, i.e.
$\hat{u}(0 : N/2-1) = \tilde{u}(0 : N/2-1)$. Sample the $\tilde{x}(0 : N/2-1)$ signals array by the
registers array $\mu_u^{(in)}(0 : N/2-1)$, i.e. $\mu_u^{(in)}(0 : N/2-1) = \tilde{x}(0 : N/2-1)$.
Algorithm 23 S-Mode (Steps 2 and 3) of BP on Length N (u + v, v) Polar Code (modeIn = 0)
//STEP 2:
⊲ Set $c_m = 0$, $c^{(BPPE,internal)} = c^{(opMux)} = c^{(opDeMux)}$ = mode = 1.
Simultaneously operate the embedded decoder (P-Mode) and the auxiliary array and store their
outputs in the memory area such that
$$\mu_{a_0 \to e_1}(j) = f_{(+)}\left(\mu_{x_0}^{(in)}(j), \mu_u^{(in)}(j)\right) \quad \forall j \in [N/2]_-.$$
⊲ Set $c_m = c^{(BPPE,internal)} = 0$, mode = outerCodeID = 1, $c^{(opMux)} = c^{(opDeMux)} = 3$.
Simultaneously operate the embedded decoder (P-Mode) and the auxiliary array and store their
outputs in the memory area such that
$$\mu_v^{(out)}(j) = f_{(=)}\left(\mu_{x_1}^{(in)}(j), \mu_{a_0 \to e_1}(j)\right) \quad \forall j \in [N/2]_-.$$
Sample the second half of the frozen bits indicator z by the $\tilde{z}$ register, i.e.
$\tilde{z}(0 : N/2-1) = z(N/2 : N-1)$.
//STEP 3:
⊲ Set $c_m$ = mode = 0, outerCodeID = 1.
Execute the embedded decoder in S-Mode on $\mu_v^{(out)}(0 : N/2-1)$ as the LLR input and
$\tilde{z}(0 : N/2-1)$ as the frozen symbols indicator vector. The realization ID of the embedded decoder
(denoted by realizationID(N/2)) is calculated according to (80).
⊲ Sample the $\tilde{u}(0 : N/2-1)$ signals array by the second half of $\hat{u}$, i.e.
$\hat{u}(N/2 : N-1) = \tilde{u}(0 : N/2-1)$. Sample the $\tilde{x}(0 : N/2-1)$ signals array by registers
array $\mu_v^{(in)}(0 : N/2-1)$, i.e. $\mu_v^{(in)}(0 : N/2-1) = \tilde{x}(0 : N/2-1)$.
⊲ Set $c_m = c^{(BPPE,internal)} = c^{(opMux)} = c^{(opDeMux)} = 0$, mode = 1.
Simultaneously operate the embedded decoder (P-Mode) and the auxiliary array and store their
outputs in the memory area such that
$$\mu_{e_1 \to a_0}(j) = f_{(=)}\left(\mu_{x_1}^{(in)}(j), \mu_v^{(in)}(j)\right) \quad \forall j \in [N/2]_-.$$
⊲ Set $c_m = 0$, $c^{(BPPE,internal)}$ = mode = 1, $c^{(opMux)} = c^{(opDeMux)} = 4$.
Simultaneously operate the embedded decoder (P-Mode) and the auxiliary array and store their
outputs in the memory area such that
$$\mu_{x_0}^{(out)}(j) = f_{(+)}\left(\mu_u^{(in)}(j), \mu_{e_1 \to a_0}(j)\right) \quad \forall j \in [N/2]_-.$$
⊲ Set $c_m = c^{(BPPE,internal)} = 0$, mode = 1, $c^{(opMux)} = c^{(opDeMux)} = 5$.
Simultaneously operate the embedded decoder (P-Mode) and the auxiliary array and store their
outputs in the memory area such that
$$\mu_{x_1}^{(out)}(j) = f_{(=)}\left(\mu_v^{(in)}(j), \mu_{a_0 \to e_1}(j)\right) \quad \forall j \in [N/2]_-.$$
//Note that $\mu_{x_0}$ and $\mu_{x_1}$ signals array are wired to the $\hat{x}(0 : N-1)$ output signals
array, as specified in Figure 18a.
Algorithm 24 P-Mode of the BP Line Decoder of Length N (u + v, v) Polar Code (modeIn = 1)
⊲ Set $c_m = 1$, $c^{(opMux)} = c^{(opDeMux)} = 6$.
Simultaneously operate the embedded decoder (P-Mode) and the auxiliary array such that we have
at the output of the decoder
$$\lambda^{(out)}(j) = \begin{cases} f_{(=)}\left(\lambda(j), \lambda(j + N/2)\right), & c^{(BPPE,in)} = 0; \\ f_{(+)}\left(\lambda(j), \lambda(j + N/2)\right), & c^{(BPPE,in)} = 1, \end{cases} \quad \forall j \in [N/2]_-.$$
to consolidate the second and third computations in step 3, leaving our first suggested change obsolete).
These changes result in 4 as the free addend in (81) and T(1) = 2, so T(n) = 3 · N − 4.
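The closed forms quoted for the clock-cycle count can be checked numerically. The Python sketch below
unrolls a recurrence of the form of (81) and compares it with the general solution
(base + addend) · 2^(n−1) − addend. The function names are ours. For the improved PE we take the base
value 2 at n = 1, which is the value for which the closed form 3 · N − 4 holds.

```python
def cycles(n, addend=7, base=4):
    """Unroll T(n) = 2*T(n-1) + addend with T(1) = base."""
    if n < 1:
        raise ValueError("n must be at least 1")
    t = base
    for _ in range(n - 1):
        t = 2 * t + addend
    return t


def closed_form(n, addend=7, base=4):
    """General solution (base + addend) * 2**(n-1) - addend of the recurrence."""
    return (base + addend) * 2 ** (n - 1) - addend


if __name__ == "__main__":
    for n in range(1, 6):
        N = 2 ** n
        print(N, cycles(n), 5.5 * N - 7, cycles(n, addend=4, base=2), 3 * N - 4)
```

With addend 7 and base 4 the general solution reads 11 · 2^(n−1) − 7 = 5.5 · N − 7, and with addend 4 and
base 2 it reads 6 · 2^(n−1) − 4 = 3 · N − 4.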
The remarks raised on the recursive design of the SC line decoder at the end of Subsection 5.1.3 also
apply here. Specifically, this design also suffers from long-path hazards, especially in the routing layers
of P-Mode. Consequently, more efficient designs may be obtained by unfolding the recursive blocks.
Furthermore, the issue of idle clock cycles for the BP PE is also a problem of this design, and the
solutions of Subsections 5.1.4 and 5.1.5 may be adapted to this decoder too.
Note, however, that while in the SC decoder the existence of inactive PEs is due to the properties of
the SC algorithm, which dictates the scheduling of the message computation, in the BP case it is due
to the scheduling we chose and is not a mandatory property of the algorithm. Other types of scheduling do
exist, and currently there is no evidence which scheduling is better (for example, in terms of the achieved
error rate or in terms of the average number of iterations required for convergence). Hussami et al. [12]
proposed to use the Z-shape schedule, whose description suggests a constant level of parallelism of N
PEs (of the type we considered here) operating all the time. This seems to give the Z-shape schedule
an advantage over the GCC schedule if the number of processors is not limited (unless the technique of
Subsection 5.1.4 is applied). It is an interesting question which schedule is better when the
number of processors is limited. This is a matter for further research.
5.2 Decoding Architectures for General Polar Codes
Thus far, we described decoding algorithms for the (u + v, v) polar code. This notion has enabled us to
restate the SC implementations for Arikan’s construction that were proposed by Leroux et al. [15]. In
addition, we suggested a BP decoding implementation employing the GCC schedule. In this subsection, we
generalize these constructions for other types of polar codes. Since we already covered implementations for
Arikan’s code in some detail, in this subsection we provide a more concise description of the
implementations, mainly emphasizing the principal differences from the designs in Subsection 5.1.
5.2.1 Recursive Description of the SC Line Decoder for General Linear Kernels
Let C be a homogenous linear polar code over field F , constructed by a kernel of ℓ dimensions. This kernel
has an ℓ × ℓ generating matrix, G associated with it. Let f be the number of bits required to represent
all the field elements, i.e. f = ⌈log2 |F |⌉.
(a) Processing element
(b) Decoder block definition
Figure 23: Block definitions of SC line decoder for length N polar code based on a linear ℓ dimensions
kernel with alphabet F
Figure 23a depicts the basic processing element (PE) of the SC line decoder. The LLR input λ(0 : ℓ−1)
and output $\lambda^{(out)}$ are specified such that each entry λ(j) is actually a vector of |F| − 1 elements.
These elements are the logarithms of the likelihood ratios between the zero symbol and each one of the
F\{0} symbols (denoted by $\lambda^{(t)}$ in (20), where t ∈ F\{0}). In our block diagrams, thick lines are
used to carry these LLR signals. In other words, assuming that each aforementioned $\lambda^{(t)}$ is
represented by β bits, each thick line is composed of β · (|F| − 1) bit lines. The input signal $c_u$ has ℓ
possible values, each one corresponding to a different LLR processing step as specified in (26). The input
signals array $\hat{x}^{(in)}(0 : \ell f - 1)$ represents a coset vector for the currently processed kernel.
This is an ℓ length word over F and as such it is represented by ℓ · f bits. Let $\hat{x} \in F^{\ell}$ be the
vector represented by this register array; furthermore, let $\left[\lambda_i^{(t)}\right]_{t \in F \setminus \{0\}}$
be the LLR vector corresponding to the signal input λ(i), where i ∈ [ℓ]−. Similarly, let
$\left[\hat{\lambda}^{(t)}\right]_{t \in F \setminus \{0\}}$ be the LLR vector corresponding to the output signal
$\lambda^{(out)}$. If the $c_u$ input represents the decimal value i (denote it by $c_u \equiv i$), the
circuit’s output is defined by Equation (26).
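The signal widths implied by this description can be collected in a small Python sketch. The function and
key names are ours. The width of the control input is taken as the number of bits needed to index ℓ
processing steps, which is our reading of "ℓ possible values".

```python
def pe_signal_widths(field_size, kernel_dim, llr_bits):
    """Bit widths of the PE signals for alphabet size |F| and kernel dimension l.

    field_size -- |F|, the alphabet size (at least 2)
    kernel_dim -- l, the kernel dimension
    llr_bits   -- beta, the bits used for each scalar LLR
    """
    if field_size < 2 or kernel_dim < 1 or llr_bits < 1:
        raise ValueError("invalid parameters")
    f = (field_size - 1).bit_length()             # f = ceil(log2 |F|)
    return {
        "symbol_bits": f,                         # one symbol of F
        "llr_bus_bits": llr_bits * (field_size - 1),  # one thick line
        "coset_bits": kernel_dim * f,             # coset vector of l symbols
        "cu_bits": (kernel_dim - 1).bit_length(), # index of one of l steps
    }


if __name__ == "__main__":
    print(pe_signal_widths(field_size=4, kernel_dim=4, llr_bits=6))
```

For the binary (u + v, v) kernel (|F| = 2, ℓ = 2) each thick line degenerates to a single β-bit LLR and the
coset vector to two bits.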
Figure 23b specifies the block definition for this general kernel line decoder. Most of the labels of
this block’s input and output signals are the same as in Figure 14b and they keep their functionality as
well. There are some modifications, however, that are required in order to support the change in the kernel
and the alphabet. The signals arrays $\hat{x}^{(in)}$, $\hat{u}$ and $\hat{x}$ represent vectors of length N
over the F alphabet. As a consequence, each entry in them is represented by f bits. The input signal
$c_u^{(in)}(0 : \lceil \log_2 \ell \rceil - 1)$, used in P-Mode (modeIn = 1), has ℓ possible values, each one
corresponding to a different LLR processing step as specified in (24) in Algorithm 8. Since the maximum
number of PEs employed simultaneously is N/ℓ, the line decoder is designed to have an LLR output signals
array of length N/ℓ. The functionality of the decoder in P-Mode is that for all j ∈ [N/ℓ]−,
$\lambda^{(out)}(j)$ is the output of a PE that is given as inputs the LLR array λ(j · ℓ : (j + 1) · ℓ − 1),
the coset vector $\hat{x}^{(in)}(j \cdot \ell : (j + 1) \cdot \ell - 1)$ and $c_u = c_u^{(in)}$.
In S-Mode (modeIn = 0) the decoder outputs its estimations for the information word $\hat{u}(0 : Nf - 1)$
and its corresponding codeword $\hat{x}(0 : Nf - 1)$ given the LLR input signals array λ(0 : N − 1) and the
frozen indicator vector z(0 : N − 1).
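The P-Mode wiring amounts to applying one PE to each block of ℓ consecutive inputs. A minimal Python
sketch follows. The callable `pe` is a placeholder for the PE function of Equation (26), which we do not
model here, and the function name p_mode is ours.

```python
def p_mode(llr, coset, c_u, kernel_dim, pe):
    """Feed each PE with consecutive slices of length kernel_dim.

    llr        -- sequence of N LLR entries (each may itself be a vector)
    coset      -- sequence of N coset symbols
    c_u        -- processing-step selector shared by all PEs
    kernel_dim -- l, the kernel dimension
    pe         -- placeholder callable pe(llr_slice, coset_slice, c_u)
    Returns the N / l outputs, one per PE.
    """
    if len(llr) != len(coset) or len(llr) % kernel_dim:
        raise ValueError("inputs must have equal length, a multiple of kernel_dim")
    return [
        pe(llr[j * kernel_dim:(j + 1) * kernel_dim],
           coset[j * kernel_dim:(j + 1) * kernel_dim],
           c_u)
        for j in range(len(llr) // kernel_dim)
    ]


if __name__ == "__main__":
    # A dummy PE that only reports which inputs it received.
    echo = lambda lam, x, cu: (tuple(lam), tuple(x), cu)
    for out in p_mode(list(range(8)), [0] * 8, c_u=1, kernel_dim=4, pe=echo):
        print(out)
```

The dummy PE in the usage example only demonstrates the slicing; any real PE would replace it.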
The generalization of the (u + v, v) block diagram in Figure 17 and its corresponding algorithms can
be easily completed using the above PE description and Algorithms 8 and 9. We leave the details for the
reader.
5.2.2 Recursive Description of the BP Line Decoder for General Kernels
Subsection 5.2.1 considered the adaptation of the (u + v, v) line decoder for supporting general kernels.
Designing a BP line decoder for general polar codes entails similar difficulties. In this subsection we only
highlight the principal necessary modifications to the BP decoder in Subsection 5.1.6 in order to adjust
it to the case of ℓ dimensions kernel over alphabet F .
• The LLR inputs, internal signals and memories should be extended to support LLRs over F. See
Subsection 5.2.1 for more details.
• The routing layers OP-MUX and OP-De-MUX need to be extended in order to support all the
different messages calculated by the PE.
• The Memory Region in Figure 19 needs to include registers arrays to support each of the algorithm’s
possible messages. Messages that are required to be kept beyond the iteration boundary have to
be stored in a matrix, such that each row corresponds to a different realization of the code. The
number of LLRs in each row of these matrices is N/ℓ, the outer-code length in F symbols. On the
other hand, messages whose values are calculated in each iteration before being used for the
first time (in that iteration) require only registers arrays of length N/ℓ. See Subsection 5.1.6 for
more details on the distinction between these two types of messages.
• Algorithms 22 and 23 are replaced by ℓ pairs of steps, each pair dedicated to a different outer-code
$C_i$. See Algorithms 17 and 18 for further details.
5.2.3 Decoders for Mixed-Kernels and General Concatenated Codes
So far, we considered decoders for homogenous polar codes over alphabet F . These codes have the
attractive property, that the outer-codes in their GCC structure are themselves (shorter) polar codes
from the same family. Therefore, we were able to use a single embedded decoder of a code of length N/ℓ
symbols within the decoder of the code of length N symbols. This embedded decoder is used ℓ times,
each time on different inputs (i.e. indices of the frozen symbols and the input messages). Unfortunately,
this property no longer applies when mixed-kernels polar codes are used.
Let us consider the ℓ = 4 dimensions mixed-kernels polar code described in Example 3. In the decoder
for length $N = 4^n$ bits code, we need to have an embedded decoder of the mixed-kernels code of length
N/4 bits and an additional embedded decoder for the RS4 polar code of length N/4 quaternary symbols.
Note, however, that even here, a reuse of circuits is still possible, as the decoder for the RS4 code of
length N/4, requires an embedded decoder for the RS4 code of length N/16 within it. The latter decoder
(and its embedded decoders) can be shared with the decoder for the mixed-kernels code of length N/4
(that requires an embedded RS4 decoder of the same length).
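The sharing argument can be illustrated by enumerating the distinct embedded decoders. The Python sketch
below is only a counting aid under our assumptions: a mixed-kernels decoder of length L embeds a
mixed-kernels decoder and an RS4 decoder of length L/4, an RS4 decoder of length L embeds an RS4 decoder of
length L/4, lengths are measured in symbols of each code's own alphabet, and the recursion stops below
length 4. The labels "mixed" and "rs4" and the function name are hypothetical.

```python
def required_decoders(length, kind="mixed", found=None):
    """Collect the distinct (kind, length) decoder blocks needed when sharing.

    A block that was already collected is not expanded again, which models
    the reuse of one physical embedded decoder by several parent decoders.
    """
    if found is None:
        found = set()
    if length < 4 or (kind, length) in found:
        return found
    found.add((kind, length))
    if kind == "mixed":
        required_decoders(length // 4, "mixed", found)
    required_decoders(length // 4, "rs4", found)
    return found


if __name__ == "__main__":
    for block in sorted(required_decoders(64)):
        print(block)
```

For length 64 the RS4 decoder of length 4 is requested twice (by the RS4 decoder and by the mixed-kernels
decoder of length 16) but is collected once.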
Summary and Conclusions
We considered the recursive GCC structures of polar codes which led to recursive descriptions of their
encoding and decoding algorithms. Specifically, known algorithms (SC, SCL and BP) were formalized in
a recursive fashion, and then were generalized for arbitrary kernels. Moreover, recursive architectures for
these algorithms were considered. We restated known architectures, and generalized them for arbitrary
kernels.
In our discussion, we preferred, for brevity, to give somewhat abstract descriptions of the subjects,
emphasizing the main properties while neglecting some of the technical details. However, a complete
design requires a full treatment of all of these specifics (see e.g. Leroux et al. for the (u + v, v) case [15]).
A subject that requires more careful attention is the study of the BP decoder, and specifically the
proposed GCC schedule. A comparison between this schedule and other proposed schedules (e.g. the
Z-shape schedule) is an intriguing question. Furthermore, a comparison of the BP decoder versus the SCL
decoder for general kernels, taking into account error-correction performance and the decoder’s complexity,
is also an interesting topic. These questions are subjects for further research.
References
[1] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric
binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[2] E. Arikan and E. Telatar, “On the rate of channel polarization,” Jul. 2008. [Online]. Available:
http://arxiv.com/abs/0807.3806
[3] S. B. Korada, E. Sasoglu, and R. Urbanke, “Polar codes: Characterization of exponent, bounds,
and constructions,” Jan. 2009. [Online]. Available: http://arxiv.com/abs/0901.0536
[4] R. Mori and T. Tanaka, “Channel polarization on q-ary discrete memoryless channels by arbitrary
kernels,” Jan. 2010. [Online]. Available: http://arxiv.org/abs/1001.2662
[5] ——, “Non-binary polar codes using Reed-Solomon codes and algebraic geometry codes,” Jul. 2010.
[Online]. Available: http://arxiv.org/abs/1007.3661
[6] N. Presman, O. Shapira, and S. Litsyn, “Binary polar code kernels from code decompositions,” Jan.
2011. [Online]. Available: http://arxiv.org/abs/1101.0764
[7] N. Presman, O. Shapira, S. Litsyn, T. Etzion, and A. Vardy, “Binary polarization kernels from code
decompositions,” Oct. 2014.
[8] N. Presman, O. Shapira, and S. Litsyn, “Polar codes with mixed kernels,” in Information Theory
Proceedings (ISIT), 2011 IEEE International Symposium on, 2011, pp. 6–10. [Online]. Available:
(full version) http://arxiv.org/abs/1107.0478
[9] P. Trifonov, “Efficient design and decoding of polar codes.” [Online]. Available:
http://dcn.infos.ru/~petert/
[10] I. Tal and A. Vardy, “List decoding of polar codes,” 2011.
[11] ——, “List decoding of polar codes,” Jun. 2012. [Online]. Available:
http://webee.technion.ac.il/people/idotal/papers/preprints/polarList.pdf
[12] N. Hussami, S. B. Korada, and R. Urbanke, “Performance of polar codes for channel and source
coding,” Jan. 2009.
[13] E. Arikan, “A performance comparison of polar codes and Reed-Muller codes,” IEEE Commun. Lett.,
vol. 12, no. 6, pp. 447–449, 2008.
[14] C. Leroux, I. Tal, A. Vardy, and W. J. Gross, “Hardware architectures for successive cancellation
decoding of polar codes,” Nov. 2010. [Online]. Available: http://arxiv.org/abs/1011.2919
[15] C. Leroux, A. Raymond, G. Sarkis, I. Tal, A. Vardy, and W. Gross, “Hardware
implementation of successive-cancellation decoders for polar codes,” Journal of Signal
Processing Systems, vol. 69, pp. 305–315, 2012, 10.1007/s11265-012-0685-3. [Online]. Available:
http://dx.doi.org/10.1007/s11265-012-0685-3
[16] C. Leroux, A. Raymond, G. Sarkis, and W. Gross, “A semi-parallel successive-cancellation decoder
for polar codes,” IEEE Trans. Signal Process., vol. 61, no. 2, pp. 289–299, 2013. [Online]. Available:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6327689
[17] A. Pamuk and E. Arikan, “A two phase successive cancellation decoder architecture for polar
codes,” in Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, 2013,
pp. 957–961. [Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6620368
[18] A. Balatsoukas-Stimming, A. J. Raymond, W. J. Gross, and A. Burg, “Hardware architecture for
list SC decoding of polar codes,” Mar. 2013.
[19] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “LLR-based successive cancellation
list decoding of polar codes,” arXiv preprint arXiv:1401.3753, 2013. [Online]. Available:
http://arxiv.org/abs/1401.3753
[20] A. Pamuk, “An FPGA implementation architecture for decoding of polar codes,” in Wireless
Communication Systems (ISWCS), 2011 8th International Symposium on, Nov. 2011, pp. 437–441.
[21] G. D. Forney, Jr., Concatenated Codes. Cambridge, MA: M.I.T. Press, 1966.
[22] E. Blokh and V. Zyablov, “Coding of generalized concatenated codes,” Probl. Peredachi. Inform.,
vol. 10, no. 3, pp. 45–50, 1974.
[23] V. Zinoviev, “Generalized concatenated codes,” Probl. Peredachi. Inform., vol. 12, no. 1, pp. 5–15,
1976.
[24] I. Dumer, Handbook of Coding Theory. Eds., Elsevier, The Netherlands, 1998, ch. Concatenated
Codes and Their Multilevel Generalizations.
[25] G. D. Forney, Jr., “Codes on graphs: normal realizations,” IEEE Trans. Inf. Theory, vol. 47, no. 2,
pp. 520–548, 2001.
[26] S. B. Korada, “Polar codes for channel and source coding,” Ph.D. dissertation, EPFL, 2009.
[Online]. Available: http://library.epfl.ch/en/theses/?nr=4461
[27] N. Presman, O. Shapira, and S. Litsyn, “Polar codes with mixed kernels,” arXiv preprint
arXiv:1107.0478, 2011. [Online]. Available: http://arxiv.org/abs/1107.0478
[28] R. Mori and T. Tanaka, “Performance and construction of polar codes on symmetric binary-input
memoryless channels,” in Proc. IEEE Int. Symp. Information Theory ISIT 2009, 2009, pp. 1496–1500.
[29] E. Arikan, “Systematic polar coding,” IEEE Commun. Lett., vol. 15, no. 8, pp. 860–862, 2011.
[Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5934670
[30] I. Dumer and K. Shabunov, “Soft-decision decoding of Reed-Muller codes: recursive lists,” IEEE
Trans. Inf. Theory, vol. 52, no. 3, pp. 1260–1266, Mar. 2006.
[31] I. Dumer, “Soft-decision decoding of Reed-Muller codes: a simplified algorithm,” IEEE Trans. Inf.
Theory, vol. 52, no. 3, pp. 954–963, Mar. 2006.
[32] G. D. Forney, Jr., “Codes on graphs: normal realizations,” IEEE Trans. Inf. Theory, vol. 47, no. 2,
pp. 520–548, 2001.
[33] E. Sharon, S. Litsyn, and J. Goldberger, “Efficient serial message-passing schedules for LDPC
decoding,” IEEE Trans. Inf. Theory, vol. 53, no. 11, pp. 4076–4091, Nov. 2007.
[34] Y. Fan and C.-Y. Tsui, “An efficient partial-sum network architecture for semi-parallel polar
codes decoder implementation,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3165–3179, 2014.
[Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6803952
