Towards Practical Software Stack Decoding of Polar Codes by Aurora, Harsh & Gross, Warren J.
arXiv:1809.03606v1 [cs.IT] 10 Sep 2018
Towards Practical Software Stack
Decoding of Polar Codes
by Harsh Aurora, Warren Gross
ABSTRACT
The successive cancellation list decoding algorithm for polar codes yields near-optimal
decoding performance at the cost of high implementation complexity. The successive
cancellation stack algorithm has been shown to provide similar decoding performance at a
much lower computational complexity, but software implementations report sub-par
throughput (T/P). In this technical report, the benefits of the fast simplified successive
cancellation list decoder are extended to the stack algorithm, resulting in a throughput
increase of two orders of magnitude over the traditional stack decoder.
Department of Electrical and Computer Engineering
McGill University, Montréal
A technical report on material drawn from a Master’s Thesis submitted on
August 13, 2018
I Introduction
Polar codes were proposed by Erdal Arikan in 2008 [1] as the first set of linear block codes
that have an explicit construction and provably achieve the symmetric capacity of a binary
memoryless channel. Polar codes faced initial resistance due to the low throughput of the
sequential successive cancellation (SC) decoding algorithm, as well as its mediocre
error-correction performance at short code lengths. The successive cancellation list (SCL)
algorithm [2] and its CRC-aided variant [3] enabled near-optimal decoding performance at
short lengths, and fast simplified successive cancellation decoding [4–6] improved the
throughput considerably, making polar codes a viable candidate for practical
applications. In 2016, polar codes were selected by 3GPP as one of the
error-correcting codes to be used in the enhanced Mobile Broadband (eMBB) control
channel [7, 8].
The successive cancellation stack (SCS) algorithm proposed in 2012 [9, 10] provides
error-correcting performance similar to that of the SCL algorithm, with a complexity that
varies with the channel conditions. At high channel noise, SCS has the same complexity as
SCL, and as channel noise decreases, the SCS complexity approaches that of SC. Despite
this attractive complexity, software implementations report a mediocre throughput (T/P).
This technical report outlines a method to apply the fast simplified decoding scheme
in [4, 5] to the reduced memory stack decoder in [11], resulting in a software T/P
improvement of over two orders of magnitude, from 9 Kbps to 930 Kbps.
This report is organized as follows: Section II provides relevant background information
regarding polar codes and the pertinent decoding algorithms. Section III highlights key
software implementation details of the decoders. Section IV describes the fast simplified
scheme applied to stack decoding, and Section V presents and discusses the simulation
results. Finally, Section VI concludes this report.
II Background
II-A Polar codes
Polar codes asymptotically achieve the symmetric channel capacity for a B-DMC W by
considering a set of N independent copies of W and recursively applying a polarizing
transform $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ to the inputs of the channels,
resulting in a second set of N channels $\{W_N^{(i)}\}$ that are said to be polarized in the
sense that K of the inputs are completely reliable, while the remaining (N − K) inputs are
completely unreliable, and as N → ∞ the fraction $K/N \to I(W)$.
A polar code of length N and message bit length K shall be denoted by PC(N, K).
Given an information bit set A of size K and a corresponding frozen bit set $A^C$ of size
(N − K), the input $u_0^{N-1}$ to the polarized channels is constructed from a message bit
sequence $m_0^{K-1}$ by placing the bits at the indices contained in A, and setting the
remaining indices to 0. The encoding step to generate the codeword $c_0^{N-1}$ can then be
expressed as the matrix multiplication
\[ c_0^{N-1} = u_0^{N-1} F^{\otimes n}, \]
where $n = \log_2 N$ and $F^{\otimes n}$ is the nth Kronecker power of the kernel F, and can be
represented by the XOR tree shown in Figure 1. The tree has n stages, and the variable
λ ∈ [0, n] is used to denote the current stage in the tree. Given a stage λ, there are
$2^{n-\lambda}$ branches denoted by φ ∈ [0, $2^{n-\lambda}$ − 1], and the size of each branch is
Λ = $2^{\lambda}$.
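The encoding step can be illustrated with a minimal C sketch. It assumes bits are stored one per byte and applies the XOR tree in place, one pass per stage; the function names `place_message` and `polar_encode` are illustrative and do not come from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the input vector u: message bits at the indices listed in info_set
 * (the set A), zeros at every frozen index. */
void place_message(const uint8_t *m, const size_t *info_set, size_t K,
                   uint8_t *u, size_t N)
{
    for (size_t i = 0; i < N; i++)
        u[i] = 0;
    for (size_t k = 0; k < K; k++) {
        assert(info_set[k] < N);
        u[info_set[k]] = m[k] & 1u;
    }
}

/* Non-systematic encoding c = u * F^(kron n), computed in place.
 * x holds u on entry and c on return; N must be a power of two. */
void polar_encode(uint8_t *x, size_t N)
{
    assert(N > 0 && (N & (N - 1)) == 0);
    for (size_t half = 1; half < N; half <<= 1) {        /* one pass per stage */
        for (size_t base = 0; base < N; base += 2 * half) {
            for (size_t i = 0; i < half; i++)
                x[base + i] ^= x[base + half + i];       /* upper ^= lower */
        }
    }
}
```

For PC(8, 4) with A = {3, 5, 6, 7}, the message bits land at indices 3, 5, 6 and 7 of u. Because $F^{\otimes n}$ is its own inverse over GF(2), applying `polar_encode` a second time recovers u.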
II-B Decoding algorithms
All decoding algorithms in this section are described in the LLR domain.
Successive cancellation
The successive cancellation (SC) decoding algorithm [1] operates on the encoding tree,
propagating the channel values $\mathrm{LLR}(y_i)$ from stage n to produce
$\mathrm{LLR}(y_0^{N-1}, \hat{u}_0^{i-1} \mid \hat{u}_i)$ at stage 0, according to the min-sum
approximation [12] in Figure 2 and Equations (1) and (2). The estimate $\hat{u}_i$ can then
be made following Equation (3).
Figure 1 – Encoding tree for PC(8, 4) with A = {3, 5, 6, 7}. [XOR tree with stages λ = 3, 2, 1, 0; inputs 0, 0, 0, m0, 0, m1, m2, m3; outputs c0 to c7.]
Figure 2 – Min-sum approximation over the polarizing kernel. [Kernel with LLR inputs α0 and α1, outputs f(α0, α1) and g(α0, α1, β), and partial sum β.]
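\[ f(\alpha_0, \alpha_1) = \operatorname{sign}(\alpha_0)\,\operatorname{sign}(\alpha_1)\,\min(|\alpha_0|, |\alpha_1|), \quad \alpha \in \mathbb{R} \tag{1} \]
\[ g(\alpha_0, \alpha_1, \beta) = \alpha_1 + (-1)^{\beta}\,\alpha_0, \quad \alpha \in \mathbb{R},\ \beta \in \{0, 1\} \tag{2} \]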
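\[ \hat{u}_i = \begin{cases} \mathrm{HD}\left(\mathrm{LLR}(y_0^{N-1}, \hat{u}_0^{i-1} \mid \hat{u}_i)\right) & \text{if } i \in A \\ 0 & \text{otherwise} \end{cases} \tag{3} \]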
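The two kernel updates and the hard decision can be sketched in C as below. This is an illustration under stated assumptions rather than the report's code: LLRs are floats, a non-negative LLR maps to bit 0, and the g update is written as α1 + (1 − 2β)α0, the form that follows from the kernel F (upper output u0 ⊕ u1, lower output u1). The names `hard_decision`, `f_minsum` and `g_update` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hard decision HD: LLR >= 0 gives bit 0, LLR < 0 gives bit 1 (assumed). */
uint8_t hard_decision(float llr)
{
    return llr < 0.0f ? 1u : 0u;
}

/* Min-sum f update: product of the signs times the smaller magnitude. */
float f_minsum(float a0, float a1)
{
    float m0 = a0 < 0.0f ? -a0 : a0;
    float m1 = a1 < 0.0f ? -a1 : a1;
    float mag = m0 < m1 ? m0 : m1;
    return ((a0 < 0.0f) != (a1 < 0.0f)) ? -mag : mag;
}

/* g update: a1 + a0 when the partial sum beta is 0, a1 - a0 when it is 1. */
float g_update(float a0, float a1, uint8_t beta)
{
    assert(beta <= 1u);
    return beta ? a1 - a0 : a1 + a0;
}
```

Avoiding `<math.h>` keeps the sketch free of a link-time dependency; a production decoder would more likely use vector intrinsics for both updates.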
Figure 3 – SC decoding tree and schedule for PC(8, 4). [Binary tree of nodes (λ, φ): root (3, 0); stage 2 nodes (2, 0) and (2, 1); stage 1 nodes (1, 0) to (1, 3); leaf nodes (0, 0) to (0, 7). The legend distinguishes α calculation from β propagation.]
The XOR encoding tree in Figure 1 is reinterpreted as a binary tree as shown in
Figure 3, and the stage λ and branch φ are used to identify each node, denoted by (λ, φ).
A node v = (λ, φ) has associated LLR values $\alpha_v[i]$ and bit estimates $\beta_v[i]$,
where i ∈ [0, Λ − 1]. The LLRs of the root node at (n, 0) are obtained directly from the
channel output, and the LLRs of a child node v = (λ, φ) are calculated from the parent node
$p = (\lambda + 1, \lfloor \phi/2 \rfloor)$ and the previous branch u = (λ, φ − 1) according to
Equation (4). The LLR $\alpha_{(0,i)}$ calculated for a leaf node is the desired
$\mathrm{LLR}(y_0^{N-1}, \hat{u}_0^{i-1} \mid \hat{u}_i)$.
\[ \alpha_{(n,0)}[i] = \mathrm{LLR}(y[i]), \quad i \in [0, N - 1] \]
\[ \alpha_v[i] = \begin{cases} f(\alpha_p[i], \alpha_p[i + \Lambda]) & \text{if } \phi \text{ is even} \\ g(\alpha_p[i], \alpha_p[i + \Lambda], \beta_u[i]) & \text{if } \phi \text{ is odd} \end{cases}, \quad i \in [0, \Lambda - 1] \tag{4} \]
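The bit estimates of the leaf nodes at (0, i) correspond to $\hat{u}_i$, and are obtained via a
hard decision on the leaf LLR. The bit estimates of a parent node v = (λ, φ) are calculated
by propagating those of both child nodes l = (λ − 1, 2φ) and r = (λ − 1, 2φ + 1), as
shown in Equation (5).
\[ \beta_{(0,i)}[0] = \begin{cases} \mathrm{HD}(\alpha_{(0,i)}[0]) & \text{if } i \in A \\ 0 & \text{otherwise} \end{cases}, \quad i \in [0, N - 1] \]
\[ \begin{aligned} \beta_v[i] &= \beta_l[i] \oplus \beta_r[i] \\ \beta_v\left[i + \tfrac{\Lambda}{2}\right] &= \beta_r[i] \end{aligned}, \quad i \in \left[0, \tfrac{\Lambda}{2} - 1\right] \tag{5} \]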
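Taken together, the α calculation and β propagation amount to a depth-first traversal of the tree. The recursive C sketch below is one way to express that traversal; it is not the space-efficient schedule used later in the report, and it allocates scratch buffers on the call stack for clarity. `sc_decode_node`, `MAX_N` and the `frozen` flag array are assumptions made for the example, as is the g form α1 + (1 − 2β)α0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_N 1024  /* illustrative upper bound on the code length */

static float mag(float x) { return x < 0.0f ? -x : x; }

static float f_minsum(float a0, float a1)
{
    float m = mag(a0) < mag(a1) ? mag(a0) : mag(a1);
    return ((a0 < 0.0f) != (a1 < 0.0f)) ? -m : m;
}

static float g_update(float a0, float a1, uint8_t b)
{
    return b ? a1 - a0 : a1 + a0;
}

/* Decode the node with LLRs alpha[0..size-1] whose leftmost leaf is bit
 * index `first`. Writes the node bit estimates to beta[0..size-1] and the
 * leaf decisions to u_hat. frozen[i] != 0 marks a frozen bit. */
void sc_decode_node(const float *alpha, size_t size, size_t first,
                    const uint8_t *frozen, uint8_t *beta, uint8_t *u_hat)
{
    assert(size >= 1 && size <= MAX_N);
    if (size == 1) {                         /* leaf: frozen bits are 0 */
        beta[0] = (!frozen[first] && alpha[0] < 0.0f) ? 1u : 0u;
        u_hat[first] = beta[0];
        return;
    }
    size_t half = size / 2;
    float child[MAX_N / 2];
    uint8_t bl[MAX_N / 2], br[MAX_N / 2];

    for (size_t i = 0; i < half; i++)        /* even branch: f update */
        child[i] = f_minsum(alpha[i], alpha[i + half]);
    sc_decode_node(child, half, first, frozen, bl, u_hat);

    for (size_t i = 0; i < half; i++)        /* odd branch: g update */
        child[i] = g_update(alpha[i], alpha[i + half], bl[i]);
    sc_decode_node(child, half, first + half, frozen, br, u_hat);

    for (size_t i = 0; i < half; i++) {      /* combine child estimates */
        beta[i] = bl[i] ^ br[i];
        beta[i + half] = br[i];
    }
}
```

Calling the function on the root with the channel LLRs leaves the estimated codeword in `beta` and the estimated input bits in `u_hat`.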
Fast simplified successive cancellation
The fast simplified successive cancellation (FSSC) decoding algorithm [4] improves upon
the computational complexity of the SC decoding algorithm by recognizing constituent
codes in the SC decoding tree and pruning the nodes. The four nodes considered are:
• Rate-0
Rate-0 (R-0) nodes are the nodes in the SC tree below which all the leaf nodes
correspond to frozen bits. For an R-0 node at v = (λ, φ) in the decoding tree, no
further traversal is needed and the bit estimates for the stage can be updated as
follows:
\[ \beta_v[i] = 0, \quad i \in [0, \Lambda - 1] \]
• Repetition
Repetition (REP) nodes contain only a single information bit at the rightmost leaf
node. The bit estimates for a REP node at v = (λ, φ) in the tree can therefore only
be all 0’s or all 1’s, and the decision is made using an efficient ML decoding by:
\[ \beta_v[i] = \mathrm{HD}\left(\sum_{k=0}^{\Lambda - 1} \alpha_v[k]\right), \quad i \in [0, \Lambda - 1] \]
• Rate-1
Rate-1 (R-1) nodes are the nodes in the SC tree below which all the leaf nodes
correspond to information bits. Similar to R-0 nodes, an R-1 node at v = (λ, φ)
requires no further traversal and the bit estimates for the stage can be updated by
taking a hard decision on the stage LLRs:
\[ \beta_v[i] = \mathrm{HD}(\alpha_v[i]), \quad i \in [0, \Lambda - 1] \]
• Single parity check
Single parity check (SPC) nodes contain only a single frozen bit at the leftmost leaf
node. The bit estimates for an SPC node at v = (λ, φ) in the tree therefore have to
satisfy a parity constraint such that the XOR of all the estimates is 0. This
can be achieved by computing the parity of the hard decisions of the SPC node
LLRs, and then flipping the least reliable estimate if the parity is 1:
\[ \mathrm{parity} = \bigoplus_{k=0}^{\Lambda - 1} \mathrm{HD}(\alpha_v[k]) \]
\[ j = \operatorname*{argmin}_{i} |\alpha_v[i]|, \quad i \in [0, \Lambda - 1] \]
\[ \beta_v[i] = \begin{cases} \mathrm{HD}(\alpha_v[i]) \oplus \mathrm{parity} & \text{if } i = j \\ \mathrm{HD}(\alpha_v[i]) & \text{otherwise} \end{cases}, \quad i \in [0, \Lambda - 1] \]
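The REP and SPC rules above reduce to a few lines each. The following C sketch assumes float LLRs with a non-negative LLR mapping to bit 0; `decode_rep` and `decode_spc` are illustrative names, not functions from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Repetition node: every bit equals the hard decision on the LLR sum. */
void decode_rep(const float *alpha, size_t size, uint8_t *beta)
{
    assert(size >= 1);
    float sum = 0.0f;
    for (size_t k = 0; k < size; k++)
        sum += alpha[k];
    uint8_t bit = sum < 0.0f ? 1u : 0u;
    for (size_t i = 0; i < size; i++)
        beta[i] = bit;
}

/* Single parity check node: take hard decisions, then flip the least
 * reliable bit when the overall parity is odd. */
void decode_spc(const float *alpha, size_t size, uint8_t *beta)
{
    assert(size >= 1);
    uint8_t parity = 0;
    size_t j = 0;                                   /* argmin of |alpha| */
    float min_mag = alpha[0] < 0.0f ? -alpha[0] : alpha[0];
    for (size_t i = 0; i < size; i++) {
        float m = alpha[i] < 0.0f ? -alpha[i] : alpha[i];
        beta[i] = alpha[i] < 0.0f ? 1u : 0u;
        parity ^= beta[i];
        if (m < min_mag) {
            min_mag = m;
            j = i;
        }
    }
    beta[j] ^= parity;                              /* no-op when parity is 0 */
}
```

Both functions touch each LLR once, which is where the saving over a full traversal of the subtree comes from.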
Since the FSSC scheme does not traverse the decoding tree down to the leaf nodes, the bit
estimates $\hat{u}_i$ are not readily available. With non-systematic encoding, $\hat{u}_i$ can be
obtained by re-encoding the estimated codeword present in the bit estimates at the root node
of the tree, $\beta_{(n,0)}[i]$, i ∈ [0, N − 1]. With systematic encoding [13, 14], $\hat{u}_i$ is
directly available in $\beta_{(n,0)}[i]$, i ∈ [0, N − 1].
Successive cancellation list
When the SC decoding algorithm encounters an information bit, an immediate decision
is made and half the potential remaining paths are discarded from consideration. On
the other hand, by considering both possibilities for information bits, ML decoding
performance is achieved at the cost of searching through paths that grow exponentially in
number. The successive cancellation list (SCL) decoding algorithm [2, 3] is a trade-off
between these two extremes in that it limits the number of paths under consideration to
a fixed list size L. At each information bit index, the number of paths is doubled. When
the number of paths exceeds L, the decoder only considers the L most reliable paths and
discards the rest.
In order to ascertain which paths should remain in the list and which should be
discarded, each path is associated with a path metric (PM) that is updated using the LLRs
when a decision is made at the leaf nodes for bit index i, as shown in (6) [15].
\[ \mathrm{PM}_l = \begin{cases} \mathrm{PM}_l & \text{if } \hat{u}_i = \mathrm{HD}(\alpha_{(0,i),l}) \\ \mathrm{PM}_l + |\alpha_{(0,i),l}| & \text{otherwise} \end{cases}, \quad \forall\, l \text{ paths in the list} \tag{6} \]
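As a small illustration of Equation (6), the C sketch below updates a single path metric for one leaf decision. It assumes the penalty form in which a smaller PM means a more reliable path; `pm_update` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Path metric after deciding bit u_hat at a leaf with LLR alpha.
 * Agreeing with the hard decision costs nothing; disagreeing adds |alpha|. */
float pm_update(float pm, float alpha, uint8_t u_hat)
{
    assert(u_hat <= 1u);
    uint8_t hd = alpha < 0.0f ? 1u : 0u;
    float penalty = alpha < 0.0f ? -alpha : alpha;
    return (u_hat == hd) ? pm : pm + penalty;
}
```

At an information bit, a list decoder would call this twice per path, once with `u_hat = 0` and once with `u_hat = 1`, and keep the L candidates with the smallest metrics.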
After the SCL decoder has estimated all N bits, the path with the best PM is returned
as the decoding output. Results in [3] show a significant improvement in error correction
performance by appending a small cyclic redundancy check (CRC) code with the message
bits to aid the SCL decoder in choosing the correct path from the final candidates in the
list.
Fast simplified successive cancellation list
The FSSC scheme is applied to SCL decoding in [5], by defining the path creation and
PM update for an FSSC node v located at (λ, φ) as follows:
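• Rate-0
An R-0 node creates no new paths, and the PMs and node bit estimates are updated
according to:
\[ \beta_{v,l}[i] = 0, \quad i \in [0, \Lambda - 1] \]
\[ \mathrm{PM}_l = \mathrm{PM}_l + \sum_{i=0}^{\Lambda - 1} \mathrm{HD}(\alpha_{v,l}[i])\, |\alpha_{v,l}[i]|, \quad \forall\, l \text{ paths in the list} \]
• Repetition
REP nodes create only two candidate paths for each path in the list, and the bit
estimates and PM updates are given by:
\[ \beta^{(1)}_{v,l}[i] = 0, \quad i \in [0, \Lambda - 1] \]
\[ \mathrm{PM}^{(1)}_l = \mathrm{PM}_l + \sum_{i=0}^{\Lambda - 1} \mathrm{HD}(\alpha_{v,l}[i])\, |\alpha_{v,l}[i]| \]
\[ \beta^{(2)}_{v,l}[i] = 1, \quad i \in [0, \Lambda - 1] \]
\[ \mathrm{PM}^{(2)}_l = \mathrm{PM}_l + \sum_{i=0}^{\Lambda - 1} \left(1 - \mathrm{HD}(\alpha_{v,l}[i])\right) |\alpha_{v,l}[i]|, \quad \forall\, l \text{ paths in the list} \]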
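• Rate-1
R-1 nodes are limited in the number of paths that are created following the Chase-II
decoding algorithm. Each path in the list is extended with the four possible
combinations of flipping the hard decisions of the bits at indices m1 and m2, corresponding
to the two least reliable LLRs. The 4L paths are then pruned back down to L.
\[ \beta^{(1)}_{v,l}[i] = \mathrm{HD}(\alpha_{v,l}[i]), \quad i \in [0, \Lambda - 1] \]
\[ \mathrm{PM}^{(1)}_l = \mathrm{PM}_l \]
\[ \beta^{(2)}_{v,l}[i] = \begin{cases} \mathrm{HD}(\alpha_{v,l}[i]) \oplus 1 & \text{if } i = m_1 \\ \mathrm{HD}(\alpha_{v,l}[i]) & \text{otherwise} \end{cases}, \quad i \in [0, \Lambda - 1] \]
\[ \mathrm{PM}^{(2)}_l = \mathrm{PM}_l + |\alpha_{v,l}[m_1]| \]
\[ \beta^{(3)}_{v,l}[i] = \begin{cases} \mathrm{HD}(\alpha_{v,l}[i]) \oplus 1 & \text{if } i = m_2 \\ \mathrm{HD}(\alpha_{v,l}[i]) & \text{otherwise} \end{cases}, \quad i \in [0, \Lambda - 1] \]
\[ \mathrm{PM}^{(3)}_l = \mathrm{PM}_l + |\alpha_{v,l}[m_2]| \]
\[ \beta^{(4)}_{v,l}[i] = \begin{cases} \mathrm{HD}(\alpha_{v,l}[i]) \oplus 1 & \text{if } i \in \{m_1, m_2\} \\ \mathrm{HD}(\alpha_{v,l}[i]) & \text{otherwise} \end{cases}, \quad i \in [0, \Lambda - 1] \]
\[ \mathrm{PM}^{(4)}_l = \mathrm{PM}_l + |\alpha_{v,l}[m_1]| + |\alpha_{v,l}[m_2]|, \quad \forall\, l \text{ paths in the list} \]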
• Single parity check
SPC nodes are more complex than the preceding nodes because all candidate paths
created have to pass the parity check of the node. The number of candidate paths
created in an SPC node is limited in a manner similar to R-1 nodes. Upon determining
the indices m1, m2, m3 and m4 of the four least reliable LLRs, there are 16 possible
combinations of flipping the hard decisions of the bits at these indices, of which only
the half that satisfy the parity constraint are considered. The SPC node thus creates 8
candidate paths from each path in the list, and the total of 8L paths are then pruned
down to L. The bit estimates and PM update equations for the SPC node are omitted for
the sake of brevity, and can be referenced from [5].
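To make the R-1 rule concrete, the C sketch below builds the four Chase-II candidates for one path. It is a sketch under assumptions: the node has at least two bits, candidates are stored row-major in a caller-supplied buffer, and the names `two_least_reliable` and `r1_candidates` are invented for the example. Pruning the 4L candidates back to L is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static float mag(float x) { return x < 0.0f ? -x : x; }

/* Indices m1, m2 of the two smallest |alpha| values (size >= 2). */
void two_least_reliable(const float *alpha, size_t size,
                        size_t *m1, size_t *m2)
{
    assert(size >= 2);
    size_t a = 0, b = 1;
    if (mag(alpha[b]) < mag(alpha[a])) { a = 1; b = 0; }
    for (size_t i = 2; i < size; i++) {
        if (mag(alpha[i]) < mag(alpha[a])) { b = a; a = i; }
        else if (mag(alpha[i]) < mag(alpha[b])) { b = i; }
    }
    *m1 = a;
    *m2 = b;
}

/* Fill beta (4 rows of `size` bits) and pm_out[4] with the candidates:
 * row 0 no flip, row 1 flip m1, row 2 flip m2, row 3 flip both. */
void r1_candidates(const float *alpha, size_t size, float pm,
                   uint8_t *beta, float pm_out[4])
{
    size_t m1, m2;
    two_least_reliable(alpha, size, &m1, &m2);
    for (unsigned c = 0; c < 4; c++) {
        uint8_t *row = beta + (size_t)c * size;
        for (size_t i = 0; i < size; i++)
            row[i] = alpha[i] < 0.0f ? 1u : 0u;     /* hard decisions */
        pm_out[c] = pm;
        if (c & 1u) { row[m1] ^= 1u; pm_out[c] += mag(alpha[m1]); }
        if (c & 2u) { row[m2] ^= 1u; pm_out[c] += mag(alpha[m2]); }
    }
}
```

The same pattern extends to SPC nodes with four flip positions, keeping only the combinations that satisfy the parity constraint.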
Successive cancellation stack
The SCL decoder considers L candidate paths for each bit estimate in the codeword,
resulting in a total search space of LN paths. At this point, the term iteration is defined
as a decoder making a leaf node bit estimate for a candidate path. The SC decoder
therefore takes N iterations to produce the decoding result, while the SCL decoder takes
NL iterations.
The successive cancellation stack (SCS) algorithm [9] is a sequential traversal through
the same search space as the SCL decoder. The algorithm begins by extending an initial
path following the SC procedure, and updating its PM following Equation (6). At the
time of estimating information bits, both candidates are considered and the less reliable
path is stored in a stack of size D that is assumed to be sufficiently large. As the algorithm
proceeds, the number of candidates in the stack grows, and in each iteration only the path
with the winning PM is extended.
The stack contains candidates of different lengths, and over the course of decoding if
L paths of length Ω ∈ [1, N ] have been extended, then all paths with length ω ≤ Ω are
removed from the stack [10], thus ensuring the same search space as SCL.
If the winning path has a length of N , its bit estimates are returned as the decoding
result and the algorithm terminates. Alternatively, the CRC-aided scheme in SCL can be
applied to validate the decoded result [10]. If the CRC check fails, the path is removed
from the stack and the algorithm continues. By nature of the algorithm, if L paths fail the
final CRC check, then all paths are removed from the stack and the algorithm terminates.
Figure 4 – α and β memory for PC(8, 4). [Stages λ = 3, 2, 1, 0; α memory for LLR calculation, β memory for bit propagation.]
An upper bound on the size of the stack is D = LN [10], which is the maximum
number of paths the SCS algorithm can investigate. While results in [9, 10, 16–18] show
that it is possible to achieve similar error correction performance with much smaller values
of D (especially at high SNRs), there is no general approach to determining the smaller
value of D for different code parameters and channel conditions. In SCS implementations
where D is less than the upper bound and the stack is full, a new candidate path replaces
the path with the least reliable PM, but only if the PM of the new path is itself more
reliable.
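The replacement rule for a full stack can be sketched in C as follows. The example assumes that a smaller PM is more reliable and that the caller owns the two arrays; the `PathStack` type and `stack_push` function are hypothetical and track only the PM and path length.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float  *pm;        /* pm[k]: path metric of entry k */
    size_t *pl;        /* pl[k]: path length of entry k */
    size_t  count;     /* entries currently in use */
    size_t  capacity;  /* stack size D */
} PathStack;

/* Insert a candidate. Returns the slot used, or s->capacity when the stack
 * is full and the candidate is no more reliable than the worst entry. */
size_t stack_push(PathStack *s, float pm, size_t pl)
{
    assert(s->capacity > 0 && s->count <= s->capacity);
    size_t slot;
    if (s->count < s->capacity) {
        slot = s->count++;
    } else {
        slot = 0;                                   /* find the worst entry */
        for (size_t k = 1; k < s->count; k++)
            if (s->pm[k] > s->pm[slot])
                slot = k;
        if (pm >= s->pm[slot])
            return s->capacity;                     /* reject the candidate */
    }
    s->pm[slot] = pm;
    s->pl[slot] = pl;
    return slot;
}
```

With D set to the upper bound LN the rejection branch is never taken, which is the configuration used for the results reported later.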
III Implementation Details
This section introduces the memory layout and decoding schedule implementation for the
successive cancellation family of polar decoders, which is then extended to incorporate
list decoding. Finally, the stack decoder implementation is discussed.
III-A Successive cancellation decoders
The SC and FSSC algorithms make use of the α and β memory tree structures shown
in Figure 4 to store intermediate LLR calculations and bit propagations. The memory
is structured according to the space-efficient scheme outlined in [3], and has a spatial
complexity O(N) that scales linearly with the code length N.
Algorithm 1: recursively_calc_α(λ, φ)
  if λ = n then
    return
  if φ is even then
    recursively_calc_α(λ + 1, ⌊φ/2⌋)
  v = (λ, φ)
  u = (λ, φ − 1)
  p = (λ + 1, ⌊φ/2⌋)
  Λ = 2^λ
  for i = 0, 1, . . . , Λ − 1 do
    if φ is even then
      α_v[i] = f(α_p[i], α_p[i + Λ])
    else
      α_v[i] = g(α_p[i], α_p[i + Λ], β_u[i])
Each stage λ in the α memory is only given Λ slots of memory, enough to store the
LLRs of a single branch φ. This is possible because, upon observing the SC schedule,
one can see that when calculating the LLRs α_v[i] at a node v = (λ, φ), the LLRs for all
branches φ′ < φ in the same stage λ will not be used again and can be safely overwritten.
Each stage in the β memory is given 2Λ slots of memory. This is because a stage must
store the bit estimates from two child branches in order to update the parent node in
the stage above. Once the bit estimates have been propagated, the values can be safely
overwritten by subsequent nodes in the stage.
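One way to picture this layout is as two flat arrays with the stages packed back to back. The C sketch below computes the offsets under that assumption; the report does not state how the stages are packed, so the helper names and the choice to place branch φ in half (φ mod 2) of its β stage are illustrative only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Start of stage lambda in a packed alpha array: stages 0..lambda-1 hold
 * 1 + 2 + ... + 2^(lambda-1) = 2^lambda - 1 slots in total. */
size_t alpha_offset(unsigned lambda)
{
    assert(lambda < sizeof(size_t) * CHAR_BIT - 2);
    return ((size_t)1 << lambda) - 1;
}

/* Total alpha slots for a code of length N = 2^n: 2N - 1. */
size_t alpha_total(unsigned n)
{
    return alpha_offset(n + 1);
}

/* Start of branch phi inside stage lambda of a packed beta array. Each
 * stage holds 2 * 2^lambda slots; even branches use the first half and
 * odd branches the second half. */
size_t beta_offset(unsigned lambda, size_t phi)
{
    size_t size = (size_t)1 << lambda;
    return 2 * alpha_offset(lambda) + (phi & 1u) * size;
}

/* Total beta slots: twice the alpha total. */
size_t beta_total(unsigned n)
{
    return 2 * alpha_total(n);
}
```

For PC(8, 4) this gives 15 α slots and 30 β slots, both linear in N as stated above.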
The schedule of operations in the SC decoder is realized at run time using the index i
of the current bit $\hat{u}_i$ being estimated, by implementing Equations (4) and (5) according
to Algorithms 1 and 2 [3].
Computing the FSSC schedule at run time incurs a significant computational penalty
since the entire decoding tree has to be traversed to identify the FSSC nodes. The
FSSC schedule is therefore created and stored when the decoder is instantiated, and the
decoder then loads and loops through it for each decoding run. The schedule is stored as a
sequence of operations and the nodes in the tree at which they are performed. Figure 5
illustrates an example of an FSSC schedule created for the SC decoding tree in Figure 3.
By abuse of notation, the operations that implement Equations (4) and (5) are denoted by
α and β respectively, and the operations R-0, R-1, REP and SPC implement the equation
for the corresponding node.
Algorithm 2: recursively_update_β(λ, φ)
  if φ is even then
    return
  v = (λ + 1, ⌊φ/2⌋)
  l = (λ, φ − 1)
  r = (λ, φ)
  Λ = 2^λ
  for i = 0, 1, . . . , Λ − 1 do
    β_v[i] = β_l[i] ⊕ β_r[i]
    β_v[i + Λ] = β_r[i]
  recursively_update_β(λ + 1, ⌊φ/2⌋)
Operation:  α      REP    α      SPC    β
Node:       (2,0)  (2,0)  (2,1)  (2,1)  (2,1)
Figure 5 – FSSC schedule for PC(8, 4) with A = {3, 5, 6, 7}.
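A schedule of this kind can be generated once, at instantiation, by walking the tree and classifying each node from its frozen-bit pattern. The C sketch below is one possible construction and not the report's implementation; the `FsscOp` and `FsscStep` types, the `frozen` flag array and the rule of emitting a β step after every odd branch are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_ALPHA, OP_BETA, OP_R0, OP_R1, OP_REP, OP_SPC } FsscOp;
typedef struct { FsscOp op; unsigned lambda; size_t phi; } FsscStep;

/* Classify the node covering bits [first, first + size). Returns 1 and
 * sets *op when the node is one of the four special types. */
static int classify(const uint8_t *frozen, size_t first, size_t size,
                    FsscOp *op)
{
    size_t nf = 0;
    for (size_t i = 0; i < size; i++)
        nf += frozen[first + i] ? 1u : 0u;
    if (nf == size) { *op = OP_R0; return 1; }
    if (nf == 0)    { *op = OP_R1; return 1; }
    if (nf == size - 1 && !frozen[first + size - 1]) { *op = OP_REP; return 1; }
    if (nf == 1 && frozen[first]) { *op = OP_SPC; return 1; }
    return 0;
}

/* Append the steps for node (lambda, phi) to out[len..]; returns the new
 * length. n is log2 of the code length, cap is the capacity of out. */
size_t build_schedule(const uint8_t *frozen, unsigned n, unsigned lambda,
                      size_t phi, FsscStep *out, size_t cap, size_t len)
{
    size_t size = (size_t)1 << lambda, first = phi * size;
    FsscOp op;
    if (lambda < n) {                      /* the root LLRs come from the channel */
        assert(len < cap);
        out[len++] = (FsscStep){ OP_ALPHA, lambda, phi };
    }
    if (classify(frozen, first, size, &op)) {
        assert(len < cap);
        out[len++] = (FsscStep){ op, lambda, phi };
    } else {                               /* mixed node: descend to both children */
        len = build_schedule(frozen, n, lambda - 1, 2 * phi, out, cap, len);
        len = build_schedule(frozen, n, lambda - 1, 2 * phi + 1, out, cap, len);
    }
    if (phi & 1u) {                        /* odd branch: propagate to the parent */
        assert(len < cap);
        out[len++] = (FsscStep){ OP_BETA, lambda, phi };
    }
    return len;
}
```

For PC(8, 4) with A = {3, 5, 6, 7}, calling `build_schedule(frozen, 3, 3, 0, steps, cap, 0)` yields the five steps of Figure 5. A node of size 1 is always classified as R-0 or R-1, so the recursion never descends below the leaves.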
III-B List decoders
The SCL family of decoders extends up to L paths simultaneously, each of which has
different values for intermediate LLRs and bit estimates. The SCL and FSSCL decoders
therefore instantiate L copies of the α and β memory trees from Section III-A, resulting
in a spatial complexity of O(LN).
The naive approach to use these memory trees is to duplicate the α and β values for
new candidate paths, which results in wasted memory operations for paths that are killed
before the values are used.
The authors in [3] propose a lazy-copy scheme in which α and β memory is allocated
stage by stage, rather than the tree as a whole, and new candidate paths point to the
memory of the parent path that created them. Memory duplication now only occurs
when a path needs to modify a stage in memory pointed to by multiple paths, and only
that stage is duplicated. The SCL and FSSCL decoders in this work implement a minor
modification to the lazy-copy scheme of [3] to support decoding in the LLR domain.
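The idea behind lazy copying can be shown with a reference-counted stage buffer. The C sketch below is written in the spirit of the scheme and is not the bookkeeping of [3] or of the decoders in this work; `StageBuf` and its helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    float   *data;   /* values of one stage */
    size_t   size;   /* number of slots in the stage */
    unsigned refs;   /* number of paths pointing at this buffer */
} StageBuf;

StageBuf *stage_new(size_t size)
{
    StageBuf *b = malloc(sizeof *b);
    if (!b) return NULL;
    b->data = calloc(size, sizeof *b->data);
    if (!b->data) { free(b); return NULL; }
    b->size = size;
    b->refs = 1;
    return b;
}

/* A new candidate path shares the parent's buffer instead of copying it. */
StageBuf *stage_share(StageBuf *b)
{
    assert(b);
    b->refs++;
    return b;
}

void stage_release(StageBuf *b)
{
    assert(b && b->refs > 0);
    if (--b->refs == 0) { free(b->data); free(b); }
}

/* Copy-on-write: returns a buffer that only the caller's path points to.
 * *slot is the path's pointer for this stage and is updated on a copy. */
StageBuf *stage_writable(StageBuf **slot)
{
    StageBuf *b = *slot;
    if (b->refs == 1) return b;            /* already private */
    StageBuf *copy = stage_new(b->size);
    if (!copy) return NULL;
    memcpy(copy->data, b->data, b->size * sizeof *b->data);
    b->refs--;
    *slot = copy;
    return copy;
}
```

A path that is killed before it writes to a shared stage therefore never triggers a copy of that stage, which is the saving the scheme is after.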
The decoding schedule for SCL and FSSCL is realized in the same manner as outlined
for their counterparts SC and FSSC.
III-C Stack decoder
A candidate path that is placed on the stack must store its:
• path metric (PM)
• path length (PL)
• bit estimates uˆi , i ∈ [0, N − 1]
• intermediate α and β values
The stack is implemented as arrays of length D of these data structures, and when a
path is placed on the stack, it is assigned an index at which to store its values in these
arrays. The winning path for each iteration is determined through a linear search on the
PM array.
The PM and PL arrays are one-dimensional with a space complexity of O(D), while the
bit estimates array is two dimensional with a complexity of O(DN). The data-structure
for the α and β values follows the same structure as in Figure 4, resulting in a memory
complexity of O(DN). Its usage is also governed by the lazy copy scheme in [3]. Finally,
the schedule for SCS is realized following the same Algorithms 1 and 2 as in the SC
decoder.
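The array-based stack and the linear search for the winning path might look as follows in C. This is a sketch that covers only the PM, PL and bit-estimate arrays, assumes that a smaller PM wins, and uses invented names (`ScsStack`, `scs_stack_best` and so on); the α and β storage is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    float   *pm;      /* [D]     path metrics */
    size_t  *pl;      /* [D]     path lengths */
    uint8_t *u_hat;   /* [D * N] bit estimates, one row of N bits per path */
    size_t   count, D, N;
} ScsStack;

int scs_stack_init(ScsStack *s, size_t D, size_t N)
{
    s->pm = malloc(D * sizeof *s->pm);
    s->pl = malloc(D * sizeof *s->pl);
    s->u_hat = calloc(D, N);
    s->count = 0;
    s->D = D;
    s->N = N;
    return s->pm && s->pl && s->u_hat;
}

void scs_stack_free(ScsStack *s)
{
    free(s->pm);
    free(s->pl);
    free(s->u_hat);
}

/* Store a path of length pl in the next free index and return that index. */
size_t scs_stack_add(ScsStack *s, float pm, size_t pl, const uint8_t *bits)
{
    assert(s->count < s->D && pl <= s->N);
    size_t k = s->count++;
    s->pm[k] = pm;
    s->pl[k] = pl;
    memcpy(s->u_hat + k * s->N, bits, pl);
    return k;
}

/* Linear search over the PM array for the winning path. */
size_t scs_stack_best(const ScsStack *s)
{
    assert(s->count > 0);
    size_t best = 0;
    for (size_t k = 1; k < s->count; k++)
        if (s->pm[k] < s->pm[best])
            best = k;
    return best;
}

/* Remove entry k by moving the last entry into its place. */
void scs_stack_remove(ScsStack *s, size_t k)
{
    assert(k < s->count);
    size_t last = --s->count;
    if (k != last) {
        s->pm[k] = s->pm[last];
        s->pl[k] = s->pl[last];
        memcpy(s->u_hat + k * s->N, s->u_hat + last * s->N, s->N);
    }
}
```

A linear search costs O(D) per iteration; a heap would lower that cost, but the report describes a linear search, so the sketch follows it.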
Based on the observation that the SCS decoder extends only one path at a time, a
reduced memory scheme (SCS-RM) is proposed in [11] in which only a single copy of the
α and β memory is instantiated. The initial path is created, and as long as there is no
path switch, intermediate α and β values remain valid and the path can continue to be
extended. Potential candidates that are created store only their PM, PL and leaf node
bit estimates $\hat{u}_0^{i-1}$, where i is the current length of the path.
Algorithm 3: Populating β memory with a new path p
  for i = 0, 1, . . . , PL_p − 1 do
    β_(0,i)[0] = û_p[i]
    recursively_update_β(0, i)
Algorithm 4: recursively_calc_α(λ, φ) modified for SCS-RM
  if λ = n then
    return
  if φ is even or path has switched then
    recursively_calc_α(λ + 1, ⌊φ/2⌋)
  v = (λ, φ)
  u = (λ, φ − 1)
  p = (λ + 1, ⌊φ/2⌋)
  Λ = 2^λ
  for i = 0, 1, . . . , Λ − 1 do
    if φ is even then
      α_v[i] = f(α_p[i], α_p[i + Λ])
    else
      α_v[i] = g(α_p[i], α_p[i + Λ], β_u[i])
A path switch renders the α and β memory values invalid, which now have to be
recalculated for the new path. This is achieved by first populating the β memory with the
estimates $\hat{u}_0^{i-1}$, following which the α memory is updated by initiating the
calculation of $\mathrm{LLR}(y_0^{N-1}, \hat{u}_0^{i-1} \mid \hat{u}_i)$ from the channel values
at the root node of the decoding tree, rather than from an intermediate stage as dictated by
the standard SC decoding procedure. The α and β memory can now be used following the
traditional SC schedule until the next path switch is encountered, at which point the
recalculation is performed again.
Populating the β memory for the newly switched path p uses the same Algorithm 2
defined for the SC schedule. The procedure is highlighted in Algorithm 3, which is
performed only once when the path is switched. Recalculating the α memory from the root
node requires a minor modification to the SC Algorithm 1, as highlighted in Algorithm 4.
IV Fast simplified stack decoding
The FSSCL scheme of [5] can readily be applied to the SCS decoder. The key difference is
that the FSSCL decoder has all candidate paths available at a given node, and is able to
prune paths and pick the survivors immediately. In contrast, the FSSCS decoder creates
all the candidate paths for the node and places them on the stack, and the paths are
either further extended or killed at a later point following the SCS algorithm rules.
Two key implementation details are highlighted, the first of which is that the FSSCS
decoder switches between paths at different points in the FSSC schedule. While the path
length alone can be used to determine the coordinates of the current node (λ, φ) in the
decoding tree, it is not sufficient to determine which FSSC operation (α, β, R-0, R-1,
REP or SPC) must be performed. To this end, when a path is placed on the stack, it
stores an additional parameter: its current progress in the FSSC schedule.
The second detail involves applying the SCS-RM scheme of [11] to the FSSCS decoder,
referred to as FSSCS-RM. Since the FSSC scheme does not necessarily traverse down to
the leaf nodes to make bit estimates, it is impossible for the FSSCS-RM decoder to
repopulate the β memory via the SCS-RM Algorithm 3. This hurdle is overcome by
changing the structure of the β memory of the FSSCS-RM decoder. The work in [19]
presents an efficient scheme to compute and store the β values in the context of VLSI
design, which is adapted to software in this work.
The β memory is now organized as an array of N bits as shown in Figure 6a. When
a node’s bit estimates are made following the FSSCL equations, the estimates are stored
directly in the β memory array beginning at the index corresponding to the length of the
path. In the case of a β propagation operation, the bits are XOR-ed in place. Figure 6
shows the usage of the β memory for the FSSC schedule of Figure 5. In Figures 6b
and 6c, the four bits corresponding to the REP and SPC node respectively are stored
at the correct locations, following which Figure 6d shows the β operation performed in
place.
The final content of the β memory is the estimated codeword at the root node of the
tree, and by using systematic encoding, the estimated message bits are readily available.
Figure 6 – β memory structure and usage in the FSSCS-RM decoder for PC(8, 4).
(a) β memory structure: an array of N = 8 bits.
(b) β memory contents following REP node: β(2,0)[0] to β(2,0)[3] in slots 0 to 3.
(c) β memory contents following SPC node: β(2,0)[0] to β(2,0)[3] in slots 0 to 3, β(2,1)[0] to β(2,1)[3] in slots 4 to 7.
(d) β memory contents following propagation: β(3,0)[0] to β(3,0)[7] in slots 0 to 7.
This leads to the observation that each path on the stack can store the β array directly
instead of the bit estimates uˆi. An additional advantage is that a path switch in the
FSSCS-RM scheme does not need to re-populate the β array, since the propagations are
already correctly stored in place.
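The flat β array and its in-place propagation can be sketched in C as below. The sketch assumes one bit per byte and that a node (λ, φ) occupies the Λ slots starting at index φ·Λ, which equals the path length when the node is reached; `beta_store` and `beta_combine` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a node's `size` bit estimates starting at the current path length.
 * Returns the new path length. */
size_t beta_store(uint8_t *beta, size_t N, size_t path_len,
                  const uint8_t *node_bits, size_t size)
{
    assert(path_len + size <= N);
    memcpy(beta + path_len, node_bits, size);
    return path_len + size;
}

/* Beta operation at an odd branch (lambda, phi): XOR this node into its
 * left sibling in place. The right half is already in its final position,
 * so the two halves together now hold the parent's estimates. */
void beta_combine(uint8_t *beta, size_t N, unsigned lambda, size_t phi)
{
    size_t size = (size_t)1 << lambda;
    assert((phi & 1u) && (phi + 1) * size <= N);
    uint8_t *left = beta + (phi - 1) * size;
    const uint8_t *right = left + size;
    for (size_t i = 0; i < size; i++)
        left[i] ^= right[i];
}
```

Running the schedule of Figure 5 with REP estimates 1, 1, 1, 1 and SPC estimates 0, 1, 0, 1 leaves the estimated codeword in the array after the single `beta_combine` call at node (2, 1).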
V Results and discussion
Simulations are performed for PC(1024, 512), and the set of information bit indices A is
obtained from the polar code sequence listed in the 3GPP technical specification for the
5G standard [7]. The CRC used in all variants of the SCL and SCS decoders is the 24-bit
CRC-24C with a polynomial of 0xB2B117, also provided in [7]. The list parameter L is
set to 8 and the stack size D is set to the maximum size NL = 8192 for all decoders.
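For reference, a bitwise CRC with the polynomial 0xB2B117 can be written as below. The sketch assumes one bit per byte, most significant coefficient first, a zero initial register and no final XOR; the report does not state these details, so they are assumptions, as are the function names `crc24c_bits` and `crc24c_append`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC24C_POLY 0xB2B117u   /* generator polynomial without the x^24 term */

/* Remainder of (message * x^24) divided by the generator polynomial. */
uint32_t crc24c_bits(const uint8_t *bits, size_t len)
{
    uint32_t reg = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t feedback = ((reg >> 23) & 1u) ^ (bits[i] & 1u);
        reg = (reg << 1) & 0xFFFFFFu;
        if (feedback)
            reg ^= CRC24C_POLY;
    }
    return reg;
}

/* Append the 24 CRC bits after msg_len message bits. The buffer must have
 * room for msg_len + 24 entries. */
void crc24c_append(uint8_t *bits, size_t msg_len)
{
    uint32_t crc = crc24c_bits(bits, msg_len);
    for (unsigned k = 0; k < 24; k++)
        bits[msg_len + k] = (uint8_t)((crc >> (23 - k)) & 1u);
}
```

Under these assumptions a candidate path passes the check when `crc24c_bits` over its message and CRC bits together returns zero.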
All code is written in C and compiled with GCC version 6.3.0 using the
-Ofast, -march=native, -funroll-loops and -finline-functions compiler flags. α
and β values are implemented using 32-bit floating point numbers and 8-bit unsigned
integers respectively. Simulations are run using 6 threads on an AMD Ryzen 5 1600
6-Core CPU clocked at 3.2 GHz. The T/P of the decoder is reported as an average per
thread, and considering only information bits.
Figure 7a shows that the FER performance is maintained for all variants of the stack
and list decoders. The slight FER performance degradation in the fast simplified decoders
is attributed to the Chase-II approximation used [5].
Figure 7b shows that the baseline T/P of the SCS decoder is, at best, 9 Kbps at an
Eb/No of 3 dB, which is more than an order of magnitude lower than the SCL T/P of
314 Kbps. The SCS-RM scheme is able to improve the SCS throughput by more than
an order of magnitude to 232 Kbps. The FSSCL decoder reports a T/P of 1.22 Mbps,
which is roughly four times the T/P of SCL. Finally, applying the fast simplified scheme
to SCS decoding results in throughput gains similar to those observed with SCL. At an
Eb/No of 3 dB, FSSCS-RM provides a T/P of 930 Kbps, which is four times the T/P of
SCS-RM and two orders of magnitude more than the T/P of the baseline SCS.
Figure 7 – Performance of the FSSCS decoder. (a) FER versus Eb/No (dB). (b) T/P (Mb/s) versus Eb/No (dB). Curves in both panels: SCS, SCS-RM, SCL, FSSCS-RM, FSSCL.
VI Conclusion
This report outlines a procedure for applying the fast simplified scheme [5] to the reduced
memory stack decoder [11]. Results show that the T/P of the FSSCS-RM decoder is
improved by two orders of magnitude over the baseline SCS decoder, from 9 Kbps to
930 Kbps. The FSSCS-RM decoder using the largest stack size approaches the T/P of the
FSSCL decoder at practical SNRs.
Bibliography
[1] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, pp. 3051–3073, July 2009.
[2] K. Chen, K. Niu, and J. R. Lin, “List successive cancellation decoding of polar codes,” Electronics Letters, vol. 48, pp. 500–501, April 2012.
[3] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Transactions on Information Theory, vol. 61, pp. 2213–2226, May 2015.
[4] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast polar decoders: Algorithm and implementation,” IEEE Journal on Selected Areas in Communications, vol. 32, pp. 946–957, May 2014.
[5] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast list decoders for polar codes,” IEEE Journal on Selected Areas in Communications, vol. 34, pp. 318–328, Feb 2016.
[6] S. A. Hashemi, C. Condo, and W. J. Gross, “Fast simplified successive-cancellation list decoding of polar codes,” in 2017 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), pp. 1–6, March 2017.
[7] 3rd Generation Partnership Project (3GPP), “Multiplexing and channel coding,” 3GPP TS 38.212 V.15.1.1, 2018.
[8] “Final report of 3GPP TSG RAN WG1 #87 v1.0.0.” http://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_87/Report/Final_Minutes_report_RAN1%2387_v100.zip. Reno, USA, November 2016.
[9] K. Niu and K. Chen, “Stack decoding of polar codes,” Electronics Letters, vol. 48, pp. 695–697, June 2012.
[10] K. Niu and K. Chen, “CRC-aided decoding of polar codes,” IEEE Communications Letters, vol. 16, pp. 1668–1671, October 2012.
[11] H. Aurora, C. Condo, and W. J. Gross, “Low-complexity software stack decoding of polar codes,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5, May 2018.
[12] C. Leroux, I. Tal, A. Vardy, and W. J. Gross, “Hardware architectures for successive cancellation decoding of polar codes,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1665–1668, May 2011.
[13] E. Arikan, “Systematic polar coding,” IEEE Communications Letters, vol. 15, pp. 860–862, August 2011.
[14] G. Sarkis, I. Tal, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Flexible and low-complexity encoding and decoding of systematic polar codes,” IEEE Transactions on Communications, vol. 64, pp. 2732–2745, July 2016.
[15] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “LLR-based successive cancellation list decoding of polar codes,” IEEE Transactions on Signal Processing, vol. 63, pp. 5165–5179, Oct 2015.
[16] K. Chen, K. Niu, and J. Lin, “Improved successive cancellation decoding of polar codes,” IEEE Transactions on Communications, vol. 61, pp. 3100–3107, August 2013.
[17] V. Miloslavskaya and P. Trifonov, “Sequential decoding of polar codes,” IEEE Communications Letters, vol. 18, pp. 1127–1130, July 2014.
[18] P. Trifonov, V. Miloslavskaya, and R. Morozov, “Fast sequential decoding of polar codes,” CoRR, vol. abs/1703.06592, 2017.
[19] G. Berhault, C. Leroux, C. Jego, and D. Dallet, “Partial sums computation in polar codes decoding,” in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 826–829, May 2015.
