Threshold-Based Fast Successive-Cancellation Decoding of Polar Codes by Zheng, Haotian et al.
Threshold-Based Fast Successive-Cancellation Decoding of Polar Codes
Haotian Zheng, Student Member, IEEE, Seyyed Ali Hashemi, Member, IEEE,
Alexios Balatsoukas-Stimming, Member, IEEE, Zizheng Cao, Member, IEEE, Ton Koonen, Fellow, IEEE,
John Cioffi, Fellow, IEEE, Andrea Goldsmith, Fellow, IEEE
Abstract—This paper focuses on developing fast successive-
cancellation (SC) decoding methods for polar codes. Fast SC
decoding overcomes the latency caused by the serial nature of the
SC decoding by identifying new nodes in the upper levels of the
SC decoding tree and implementing their fast parallel decoders.
Our proposed methods consist of several new techniques. First,
a novel sequence repetition node corresponding to a class of
bit sequences is presented. Most existing special node types are
special cases of the proposed sequence repetition node. Then a
fast parallel decoder is proposed for this class of node. To further
speed up the decoding process of general nodes outside this class,
a threshold-based hard-decision-aided scheme is introduced.
The threshold value that guarantees a given error-correction
performance in the proposed scheme is derived theoretically.
Analyses and simulations on a polar code of length 1024 and
rate 1/2 show that the fast decoding algorithm with the proposed
node can provide 19% latency reduction at Eb/N0 = 2.0 dB
compared to the fastest SC decoding algorithm in literature
without tangibly altering the error-correction performance of the
code. In addition, with the help of the proposed threshold-based
hard-decision-aided scheme, the decoding latency can be further
reduced by 54% at Eb/N0 = 5.0 dB.
Index Terms—Polar codes, Fast successive-cancellation decoding, Sequence repetition node, Threshold-based hard-decision-aided scheme.
I. INTRODUCTION
POLAR codes represent a channel coding scheme that can provably achieve the capacity of a binary-input memoryless channel [1]. The explicit coding structure and low-complexity successive-cancellation (SC) decoding algorithm have generated significant interest in polar code research across
both industry and academia. In particular, in the latest cellular
standard of 5G [2], polar codes are adopted in the control
channel of the enhanced mobile broadband (eMBB) use case.
Although SC decoding provides a low-complexity capacity-achieving solution for polar codes with long block lengths, its sequential bit-by-bit decoding nature leads to high decoding latency, which constrains its application in low-latency communication scenarios such as the ultra-reliable low-latency communication (URLLC) [2] scheme of 5G. Therefore, the design of fast SC-based decoding algorithms for polar codes with low decoding latency has received a lot of attention [3].

H. Zheng, A. Balatsoukas-Stimming, Z. Cao, and A. M. J. Koonen are with the Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands (e-mails: {h.zheng, a.k.balatsoukas.stimming, z.cao, a.m.j.koonen}@tue.nl). (Corresponding author: Zizheng Cao)
S. A. Hashemi, J. Cioffi, and A. Goldsmith are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mails: {ahashemi, cioffi}@stanford.edu, andrea@wsl.stanford.edu).
Part of this work has been accepted for presentation at the IEEE International Conference on Communications, Dublin, Ireland, June 2020.

A look-ahead technique was adopted to speed up the bit-by-
bit SC decoding in [4], [5] by pre-computing all the possible
likelihoods of the bits that have not been decoded yet, and
selecting the appropriate likelihood once the corresponding
bit is estimated. Alternatively, using the binary tree representation of SC decoding of polar codes, parallel multi-bit decisions can be performed at the intermediate nodes of the SC decoding tree instead of at the bit level, which corresponds to the leaf nodes of the tree. An exhaustive-search decoding
algorithm is used in [6]–[9] to make multi-bit decisions and to
avoid the latency caused by the traversal of the SC decoding
tree to compute the intermediate likelihoods. However, due to
the high complexity of the exhaustive search, this method is
generally only suitable for nodes that represent codes of very
short lengths.
It was shown in [10] that a node in the SC decoding tree
that represents a code of rate 0 (Rate-0 node) or a code of rate
1 (Rate-1 node) can be decoded efficiently without traversing
the SC decoding tree. In [11], fast decoders of repetition (REP)
and single parity-check (SPC) nodes were proposed for the SC
decoding. Techniques were developed in [12]–[15] to adjust
the codes that are represented by the nodes in the SC decoding
tree to increase the number of nodes that can be decoded
efficiently. However, these methods result in a degraded error-
correction performance. On the basis of the works in [10],
[11], five new nodes (Type-I, Type-II, Type-III, Type-IV, and
Type-V) were identified and their fast decoders were designed
in [16]. In [17], a generalized REP (G-REP) node and a
generalized parity-check (G-PC) node were proposed to reduce
the latency of the SC decoding even further. In [18], seven of the most prevalent node patterns in short polar codes were analyzed, and efficient algorithms for processing these node patterns in parallel were proposed. However, the decoding of some of these node patterns leads to significant performance loss. All these works require the design of a separate decoder for each class of node, which inevitably increases the implementation complexity. In addition, as shown in this work, the
achievable parallelism in decoding can be further increased
without degrading the error-correction performance.
For general nodes that do not fall in one of the above
node categories, [19] proposed a hard-decision scheme based
on node error probability. Specifically, in that work it was
shown that extra latency reduction can be achieved when the
communications channel has low noise. However, the hard-decision threshold is calculated empirically rather than for a
desired error-correction performance. In [20], a hypothesis-testing-based strategy is designed to select reliable unstructured nodes for hard decision. However, additional operations are required to calculate the decision rule, which incurs extra decoding latency. For all the existing
hard-decision schemes, a threshold comparison operation is
required each time a general constituent code is encountered
in the course of the SC decoding algorithm.
In this paper, a fast SC decoding algorithm with a higher
degree of parallelism than the state of the art is proposed.
First, a class of sequence repetition (SR) nodes is proposed
which provides a unified description of most of the existing
special nodes. This class of nodes is typically found at a higher
level of the decoding tree than other existing special nodes.
Utilizing this class of nodes, a fast simplified SC decoding algorithm, called the SR node-based fast SC (SRFSC) decoding algorithm, is proposed. The proposed SRFSC decoding
algorithm achieves a higher degree of parallelism and has
smaller latency than the state of the art, without degrading the
error-correction performance. Performance results show that
the proposed SRFSC decoder achieves up to 50%, 31%, and
29% latency reduction with respect to the decoders in [11],
[16], and [17], respectively.
Second, a threshold-based hard-decision-aided (TA) scheme is proposed to speed up the decoding of the nodes that are not SR nodes for a binary additive white Gaussian noise (BAWGN) channel. Combining this scheme with SRFSC decoding, a TA-SRFSC decoding algorithm is proposed that adopts a simpler hard-decision threshold than that in [19]. The effect of the defined threshold on the error-correction performance of the proposed TA-SRFSC decoding algorithm is analyzed. Moreover, a systematic way to derive the threshold value for a desired upper bound on the block error rate (BLER) is determined.
Performance results show that, with the help of the proposed
TA scheme, the decoding latency of SRFSC decoding can
be further reduced by 54% at Eb/N0 = 5 dB on a polar
code of length 1024 and rate 1/2. To mitigate the possible
error-correction performance loss caused by the proposed TA
scheme, a multi-stage decoding strategy is introduced that
achieves significant average latency reduction with negligible error-correction performance deterioration with respect to
SRFSC decoding.
The rest of this paper is organized as follows. Section II
gives a brief introduction to the basic concept of polar codes
and fast SC decoding. In Section III, the SRFSC decoding
algorithm is introduced. With the help of the proposed TA scheme, the TA-SRFSC decoding algorithm is presented in Section IV.
Section V analyzes the decoding latency and simulation results
are shown in Section VI. Finally, Section VII gives a summary
of the paper and concluding remarks.
II. PRELIMINARIES
A. Notation Conventions
In this paper, blackboard letters, such as $\mathbb{X}$, denote a set and $|\mathbb{X}|$ denotes the number of elements in $\mathbb{X}$. Bold letters, such as $\mathbf{v}$, denote a row vector, $\mathbf{v}^T$ denotes the transpose of $\mathbf{v}$, and the notation $\mathbf{v}[i:j]$, $1 \le i < j$, represents the subvector $(v[i], v[i+1], \ldots, v[j])$. The symbol $\oplus$ denotes the bitwise XOR operation, and $\mathbf{v}[i:j] \oplus z = (v[i] \oplus z, v[i+1] \oplus z, \ldots, v[j] \oplus z)$, $z \in \{0, 1\}$. The Kronecker product of two matrices $\mathbf{F}$ and $\mathbf{G}$ is written as $\mathbf{F} \otimes \mathbf{G}$. $\mathbf{F}_N$ represents an $N \times N$ square matrix and $\mathbf{F}_N^{\otimes n}$ denotes the $n$-th Kronecker power of $\mathbf{F}_N$. Throughout this paper, $\ln(x)$ denotes the natural logarithm of $x$ and $\log(x)$ denotes its base-2 logarithm.

Fig. 1. SC decoding on the factor graph of a polar code with N = 8.
B. Polar Codes
A polar code with code length $N = 2^n$ and information length $K$ is denoted by $\mathcal{P}(N, K)$ and has rate $R = K/N$. The encoding process can be expressed as $\mathbf{x} = \mathbf{u}\mathbf{G}_N$, where $\mathbf{u} = (u[1], u[2], \ldots, u[N])$ is the input bit sequence and $\mathbf{x} = (x[1], x[2], \ldots, x[N])$ is the encoded bit sequence. $\mathbf{G}_N = \mathbf{R}_N \mathbf{F}_2^{\otimes n}$ is the generator matrix, where $\mathbf{R}_N$ is a bit-reversal permutation matrix and
$$ \mathbf{F}_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. $$
The input bit sequence $\mathbf{u}$ consists of $K$ information bits and $N - K$ frozen bits. The indices of the information bits form the set $\mathbb{A}$, and the indices of the frozen bits, which carry fixed values known to the receiver, form the complementary set $\mathbb{A}^c$. For symmetric channels, without loss of generality, all frozen bits are set to zero [1]. To distinguish between frozen and information bits, a vector of flags $\mathbf{d} = (d[1], d[2], \ldots, d[N])$ is used, where each flag $d[k]$ is assigned as
$$ d[k] = \begin{cases} 0, & \text{if } k \in \mathbb{A}^c, \\ 1, & \text{otherwise}. \end{cases} \tag{1} $$
The codeword x is transmitted through a channel after mod-
ulation. In this paper, non-systematic polar codes and binary
phase-shift keying (BPSK) modulation which maps {0, 1} to
{+1,−1} are considered. Transmission takes place over an
additive white Gaussian noise (AWGN) channel.
C. SC Decoding and Binary Tree Representation
SC decoding can be illustrated on the factor graph of polar codes as shown in Fig. 1. The factor graph consists of $n + 1$ levels and, by grouping all the operations that can be performed in parallel, SC decoding can be represented as the traversal of a binary tree. This traversal is shown in Fig. 2, starting from the left side of the binary tree. At level $j$ of the SC decoding tree with $n + 1$ levels, there are $2^{n-j}$ nodes ($0 \le j \le n$), and the $i$-th node at level $j$ ($1 \le i \le 2^{n-j}$) of the SC decoding tree is denoted as $\mathcal{N}_j^i$. The left and the right child nodes of $\mathcal{N}_j^i$ are $\mathcal{N}_{j-1}^{2i-1}$ and $\mathcal{N}_{j-1}^{2i}$, respectively, as illustrated in Fig. 2. For $\mathcal{N}_j^i$, $\alpha_j^i[k]$, $1 \le k \le 2^j$, indicates the $k$-th input logarithmic likelihood ratio (LLR) value, and $\beta_j^i[k]$, $1 \le k \le 2^j$, denotes the $k$-th output binary hard-valued message. For the AWGN channel, the received vector $\mathbf{y} = (y[1], y[2], \ldots, y[N])$ can be used to calculate the channel LLRs as $2\mathbf{y}/\sigma^2$, where $\sigma^2$ is the variance of the Gaussian noise. SC decoding starts by setting $\alpha_n^1[1:N] = 2\mathbf{y}/\sigma^2$. A node is activated once all its inputs are available. When LLR messages pass through a node in the factor graph that is indicated by the $\oplus$ sign, the $f$ function over the LLR domain is executed as
$$ \alpha_{j-1}^{2i-1}[k] = f\left(\alpha_j^i[2k-1], \alpha_j^i[2k]\right), \tag{2} $$
and when LLR messages pass through a node in the factor graph that is indicated by the $=$ sign, the $g$ function over the LLR domain is executed as
$$ \alpha_{j-1}^{2i}[k] = g\left(\alpha_j^i[2k-1], \alpha_j^i[2k], \beta_{j-1}^{2i-1}[k]\right), \tag{3} $$
where
$$ f(x, y) = 2\,\mathrm{arctanh}\left(\tanh\left(\frac{x}{2}\right)\tanh\left(\frac{y}{2}\right)\right), \tag{4} $$
$$ g(x, y, u) = (-1)^u x + y. \tag{5} $$
The $f$ function can be approximated as [21]
$$ f(x, y) \approx \mathrm{sign}(x)\,\mathrm{sign}(y)\min\left(|x|, |y|\right). \tag{6} $$
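A minimal Python sketch of the LLR updates (4) to (6) follows. The function names are our own, and the clamp inside `f_exact` is an added numerical safeguard, not part of (4).

```python
import math


def f_exact(x: float, y: float) -> float:
    """Check-node update (4): 2 * atanh(tanh(x / 2) * tanh(y / 2))."""
    t = math.tanh(x / 2.0) * math.tanh(y / 2.0)
    # Clamp so atanh stays finite if the product rounds to +/-1 (our safeguard).
    t = max(min(t, 1.0 - 1e-15), -1.0 + 1e-15)
    return 2.0 * math.atanh(t)


def f_minsum(x: float, y: float) -> float:
    """Min-sum approximation (6): sign(x) * sign(y) * min(|x|, |y|)."""
    magnitude = min(abs(x), abs(y))
    return magnitude if (x >= 0) == (y >= 0) else -magnitude


def g(x: float, y: float, u: int) -> float:
    """Update (5): (-1)^u * x + y, where u is the left-child hard decision."""
    return (-x if u else x) + y


if __name__ == "__main__":
    for x, y in [(2.0, 3.0), (-1.5, 4.0), (0.5, -0.2)]:
        print(x, y, f_exact(x, y), f_minsum(x, y), g(x, y, 0), g(x, y, 1))
```

For every input pair, `f_minsum` has the same sign as `f_exact` and a magnitude that is never smaller, which is why (6) is an approximation rather than an identity.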
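When the LLR value of the $k$-th bit at level zero, $\alpha_0^k$, $1 \le k \le N$, is calculated, the estimate of $u[k]$, denoted as $\hat{u}[k]$, can be obtained as
$$ \hat{u}[k] = \hat{\beta}_0^k = \begin{cases} 0, & \text{if } k \in \mathbb{A}^c, \\ \dfrac{1 - \mathrm{sign}(\alpha_0^k)}{2}, & \text{otherwise}. \end{cases} \tag{7} $$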
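The hard-valued messages are propagated back to the parent node as
$$ \hat{\beta}_j^i[k] = \begin{cases} \hat{\beta}_{j-1}^{2i-1}\left[\frac{k+1}{2}\right] \oplus \hat{\beta}_{j-1}^{2i}\left[\frac{k+1}{2}\right], & \text{if } \mathrm{mod}(k, 2) = 1, \\ \hat{\beta}_{j-1}^{2i}\left[\frac{k}{2}\right], & \text{if } \mathrm{mod}(k, 2) = 0. \end{cases} \tag{8} $$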
After traversing all the nodes in the SC decoding tree, $\hat{\mathbf{u}}$ contains the decoding result. Thus, the latency of SC decoding for a polar code of length $N$, in terms of the number of time steps, can be represented by the number of nodes in the SC decoding tree as [1]
$$ T_{\mathrm{SC}} = 2N - 2. \tag{9} $$
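To make the schedule of (2), (3), (7), and (8) concrete, the following sketch implements plain recursive SC decoding in Python. It assumes min-sum updates (6), frozen bits equal to zero, and an encoder written with the same butterfly as (8), so that encoder and decoder agree on bit ordering; for $N = 4$ we checked by hand that this encoder reproduces $\mathbf{x} = \mathbf{u}\mathbf{R}_4\mathbf{F}_2^{\otimes 2}$. The names `encode`, `sc_decode`, and the node counter are illustrative, not from the paper.

```python
from typing import List, Tuple


def f(x: float, y: float) -> float:
    """Min-sum check-node update, eq. (6)."""
    m = min(abs(x), abs(y))
    return m if (x >= 0) == (y >= 0) else -m


def g(x: float, y: float, u: int) -> float:
    """Variable-node update, eq. (5)."""
    return (-x if u else x) + y


def encode(u: List[int]) -> List[int]:
    """Encoder built from the same butterfly as (8), so it matches the decoder."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    a, b = encode(u[:half]), encode(u[half:])
    x: List[int] = []
    for k in range(half):
        x += [a[k] ^ b[k], b[k]]
    return x


def _sc_node(alpha, d, u_hat, visited):
    visited[0] += 1                      # one more node of the SC tree is visited
    if len(alpha) == 1:                  # leaf at level 0, eq. (7); LLR 0 -> bit 0
        bit = 0 if d[0] == 0 or alpha[0] >= 0 else 1
        u_hat.append(bit)
        return [bit]
    half = len(alpha) // 2
    left = _sc_node([f(alpha[2 * k], alpha[2 * k + 1]) for k in range(half)],
                    d[:half], u_hat, visited)                            # eq. (2)
    right = _sc_node([g(alpha[2 * k], alpha[2 * k + 1], left[k]) for k in range(half)],
                     d[half:], u_hat, visited)                           # eq. (3)
    beta: List[int] = []
    for k in range(half):                                                # eq. (8)
        beta += [left[k] ^ right[k], right[k]]
    return beta


def sc_decode(alpha: List[float], d: List[int]) -> Tuple[List[int], List[int], int]:
    """Return (u_hat, root hard messages, number of visited tree nodes)."""
    u_hat: List[int] = []
    visited = [0]
    beta = _sc_node(alpha, d, u_hat, visited)
    return u_hat, beta, visited[0]


if __name__ == "__main__":
    d = [0, 0, 0, 1, 0, 1, 1, 1]          # hypothetical frozen pattern, N = 8
    u = [0, 0, 0, 1, 0, 1, 0, 1]
    x = encode(u)
    llr = [2.0 * (1 - 2 * b) for b in x]  # noiseless BPSK, sigma^2 = 1
    print(sc_decode(llr, d))
```

Each node is visited exactly once, so the counter grows linearly with $N$ in the same way as (9); this serial node-by-node traversal is the latency that fast SC decoding targets.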
D. Fast SC Decoding
SC decoding has strong data dependencies that limit the amount of parallelism that can be exploited within the algorithm, because the estimate of each bit depends on the estimates of all previous bits. This leads to a large latency in the SC decoding algorithm.

Fig. 2. Binary tree representation of a SC decoder for a polar code with N = 8.

It was pointed out in [8] that for a node $\mathcal{N}_j^i$, $\beta_j^i[1:2^j]$ can be estimated without traversing the decoding tree by calculating
$$ \hat{\beta}_j^i[1:2^j] = \underset{\beta_j^i[1:2^j] \in \mathbb{C}_j^i}{\arg\max} \sum_{k=1}^{2^j} (-1)^{\beta_j^i[k]} \alpha_j^i[k], \tag{10} $$
where $\mathbb{C}_j^i$ is the set of all the codewords associated with node $\mathcal{N}_j^i$. Multi-bit decoding can thus be performed directly at an intermediate level instead of bit-by-bit sequential decoding at level 0, so that fewer nodes of the SC decoding tree are traversed and, consequently, the latency caused by data computation and exchange is reduced. However, the evaluation of (10) generally requires an exhaustive search over all the codewords in the set $\mathbb{C}_j^i$, which is computationally intensive in practice.
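The exhaustive search behind (10) can be sketched as follows, assuming an encoder that uses the butterfly ordering of (8) and frozen bits equal to zero; `node_codebook` and `ml_node_decode` are hypothetical names. Enumerating all $2^K$ codewords of a node with $K$ information bits is exactly what makes this approach impractical except for short nodes.

```python
from itertools import product
from typing import List, Sequence


def encode(u: Sequence[int]) -> List[int]:
    """Polar transform with the butterfly of (8) (same convention as the SC tree)."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    a, b = encode(u[:half]), encode(u[half:])
    out: List[int] = []
    for k in range(half):
        out += [a[k] ^ b[k], b[k]]
    return out


def node_codebook(d: Sequence[int]) -> List[List[int]]:
    """All codewords of a node whose frozen/information flags are d."""
    info = [k for k, flag in enumerate(d) if flag == 1]
    book = []
    for bits in product((0, 1), repeat=len(info)):   # 2^K candidates
        u = [0] * len(d)                             # frozen bits are zero
        for k, b in zip(info, bits):
            u[k] = b
        book.append(encode(u))
    return book


def ml_node_decode(alpha: Sequence[float], d: Sequence[int]) -> List[int]:
    """Exhaustive evaluation of (10): maximize sum_k (-1)^beta[k] * alpha[k]."""
    def correlation(c: List[int]) -> float:
        return sum(a if b == 0 else -a for a, b in zip(alpha, c))
    return max(node_codebook(d), key=correlation)
```

For example, with flags `[0, 1, 1, 1]` the codebook is the set of all even-weight words of length 4, and with `[0, 0, 0, 1]` it contains only the all-zero and all-one words.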
In [11], a fast SC (FSC) decoding algorithm was proposed
which performs fast parallel decoding when specific special
node types are encountered. The sequences of information
and frozen bits of these node types have special bit-patterns.
Therefore, they can be decoded more efficiently without the
need for exhaustive search. Using the vector d, the four special
node types proposed in [11] are described as:
• Rate-0 node: all bits are frozen bits, d = (0, 0, ..., 0).
• Rate-1 node: all bits are non-frozen bits, d = (1, 1, ..., 1).
• REP node: all bits are frozen bits except the last one,
d = (0, ..., 0, 1).
• SPC node: all bits are non-frozen bits except the first one,
d = (0, 1, ..., 1).
In [16], five additional special node types and their corre-
sponding fast decoders were introduced. This enhanced FSC
decoding algorithm can achieve a lower decoding latency than
FSC decoding. The five special node types are:
• Type-I node: all bits are frozen bits except the last two,
d = (0, ..., 0, 1, 1).
• Type-II node: all bits are frozen bits except the last three,
d = (0, ..., 0, 1, 1, 1).
• Type-III node: all bits are non-frozen bits except the first
two, d = (0, 0, 1, ..., 1).
• Type-IV node: all bits are non-frozen bits except the first
three, d = (0, 0, 0, 1, ..., 1).
• Type-V node: all bits are frozen bits except the last three and the fifth to last, d = (0, ..., 0, 1, 0, 1, 1, 1).

Fig. 3. General structures of G-PC node (left) and G-REP node (right).
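The bit patterns of the Rate-0, Rate-1, REP, SPC, and Type-I to Type-V nodes listed above can be checked mechanically. The hypothetical helper below returns the name of the first matching type for the flag vector d of a node, or `None` for a generic node. For short nodes some patterns coincide (for example, d = (0, 0, 1, 1) is both Type-I and Type-III), and the tie-breaking order used here is our own choice.

```python
from typing import Iterator, Optional, Sequence, Tuple


def _templates(n: int) -> Iterator[Tuple[str, Tuple[int, ...]]]:
    """Flag vectors d of length n for each special node type listed above."""
    yield "REP", (0,) * (n - 1) + (1,)
    yield "SPC", (0,) + (1,) * (n - 1)
    yield "Type-I", (0,) * (n - 2) + (1, 1)
    yield "Type-II", (0,) * (n - 3) + (1, 1, 1)
    yield "Type-III", (0, 0) + (1,) * (n - 2)
    yield "Type-IV", (0, 0, 0) + (1,) * (n - 3)
    yield "Type-V", (0,) * (n - 5) + (1, 0, 1, 1, 1)


def classify_node(d: Sequence[int]) -> Optional[str]:
    """Return the first matching special node type, or None for a generic node."""
    d = tuple(d)
    if not any(d):
        return "Rate-0"
    if all(d):
        return "Rate-1"
    for name, template in _templates(len(d)):
        # A template that is longer than the node has the wrong length
        # and therefore never matches.
        if template == d:
            return name
    return None
```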
A generalized FSC (GFSC) decoding algorithm was pro-
posed in [17] by introducing the G-PC node and the G-REP
node. The G-PC node is a node at level j having all its
descendants as Rate-1 nodes except the leftmost one at a
certain level r < j, that is a Rate-0 node. The G-REP node
is a node at level j for which all its descendants are Rate-0
nodes, except the rightmost one at a certain level r < j, which
is a generic node of rate C (Rate-C). The structures of G-PC
and G-REP nodes are depicted in Fig. 3.
The key advantage of using specific parallel decoders for the
aforementioned special nodes is that, since the SC decoding
tree is not traversed when one of these nodes is encountered,
significant latency saving can be achieved. For example, if $\mathcal{N}_j^i$ is a Rate-1 node, hard-decision decoding can be used to immediately obtain the decoding result as
$$ \hat{\beta}_j^i[k] = h\left(\alpha_j^i[k]\right) = \begin{cases} 0, & \text{if } \alpha_j^i[k] \ge 0, \\ 1, & \text{otherwise}. \end{cases} \tag{11} $$
If $\mathcal{N}_j^i$ is an SPC node, hard decisions based on (11) are first derived, followed by the calculation of the parity of the output using modulo-2 addition. The index of the least reliable bit is found as
$$ k' = \underset{k}{\arg\min} \left|\alpha_j^i[k]\right|. \tag{12} $$
Finally, the bits in an SPC node are estimated as
$$ \hat{\beta}_j^i[k] = \begin{cases} h\left(\alpha_j^i[k]\right) \oplus \text{parity}, & \text{if } k = k', \\ h\left(\alpha_j^i[k]\right), & \text{otherwise}. \end{cases} \tag{13} $$
This operation can be performed in a single time step [16].
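A minimal sketch of Rate-1 and SPC decoding via (11) to (13), in Python with function names of our own. Flipping the least reliable bit only when the hard-decision parity is odd is the Wagner rule, so the output always satisfies the even-parity constraint.

```python
from typing import List, Sequence


def h(llr: float) -> int:
    """Hard decision (11): 0 if the LLR is non-negative, 1 otherwise."""
    return 0 if llr >= 0 else 1


def decode_rate1(alpha: Sequence[float]) -> List[int]:
    """Rate-1 node: bitwise hard decisions, eq. (11)."""
    return [h(a) for a in alpha]


def decode_spc(alpha: Sequence[float]) -> List[int]:
    """SPC node: hard decisions, parity, and least-reliable-bit flip, eqs. (11) to (13)."""
    beta = decode_rate1(alpha)
    parity = sum(beta) % 2                                          # modulo-2 sum
    k_prime = min(range(len(alpha)), key=lambda k: abs(alpha[k]))   # eq. (12)
    beta[k_prime] ^= parity                                         # eq. (13)
    return beta
```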
Similarly, if $\mathcal{N}_j^i$ is a G-PC node, the decoding can be viewed as a parallel decoding of several separate SPC nodes. The decoding of a G-PC node can generally be performed in one time step considering parallel SPC decoders [17].
All of the aforementioned fast SC decoding algorithms
perform parallel decoding at an intermediate level of the
decoding tree in order to reduce the number of traversed nodes.
In fact, a parallel decoding algorithm that can decode a node
at a higher level of the decoding tree generally results in more
savings in terms of latency than the one that decodes a node
at a lower level of the decoding tree. This is because a node at a higher level of the SC decoding tree has a longer length than the nodes below it, so decoding it in parallel avoids the traversal of more nodes of the tree. The following section introduces a class of nodes that is located at a higher level of the SC decoding tree and shows how parallel decoding at this class of nodes can be exploited to achieve significant latency savings in comparison with the state of the art.

Fig. 4. Example of a decoding tree of the proposed SR node in comparison with different available special nodes: (a) SC, (b) [11], (c) [16], (d) Proposed.

Fig. 5. General structure of a sequence repetition node.
III. FAST SC DECODING WITH SEQUENCE REPETITION NODES
A. Sequence Repetition (SR) Node
Let $\mathcal{N}_j^i$ be a node at level $j$ of the binary tree representation of SC decoding as shown in Fig. 2. An SR node is any node at level $j$ for which all its descendants are either Rate-0 or REP nodes, except the rightmost one at a certain level $r$, $0 \le r \le j$, which is a generic node of rate C. An example of a decoding tree of the proposed SR node in comparison with different available special nodes is illustrated in Fig. 4. The general structure of an SR node is depicted in Fig. 5. The rightmost node $\mathcal{N}_r^{i \times 2^{j-r}}$ at level $r$ is denoted as the source node of the SR node $\mathcal{N}_j^i$. Let $E = i \times 2^{j-r}$, so that the source node can be denoted as $\mathcal{N}_r^E$.
An SR node can be represented by three parameters as $\mathrm{SR}(\mathbf{v}, \mathrm{SNT}, r)$, where $r$ is the level of the SC decoding tree in which $\mathcal{N}_r^E$ is located, SNT is the source node type, and $\mathbf{v} = (v[j], v[j-1], \ldots, v[r+1])$ is a vector of length $j - r$ such that, for the left child node of the parent node of $\mathcal{N}_r^E$ at level $k$, $r < k \le j$, $v[k]$ is calculated as
$$ v[k] = \begin{cases} 0, & \text{if the left child node is a Rate-0 node}, \\ 1, & \text{if the left child node is a REP node}. \end{cases} \tag{14} $$
Note that when $r = j$, $\mathcal{N}_j^i$ is a source node and thus $\mathbf{v}$ is an empty vector, denoted as $\mathbf{v} = \emptyset$.

Fig. 6. General structure of Extended G-PC (EG-PC) Node.
B. Source Node
To define the source node type, an extended class of G-PC (EG-PC) nodes is first introduced. The structure of the EG-PC node is depicted in Fig. 6. The EG-PC node differs from the G-PC node in its leftmost descendant node, which can be either a Rate-0 or a REP node. The bits in an EG-PC node satisfy the following parity-check constraint:
$$ z = \bigoplus_{m=(k-1)2^{j-r}+1}^{k \times 2^{j-r}} \beta_j^i[m], \tag{15} $$
where $k \in \{1, \ldots, 2^r\}$ and $z \in \{0, 1\}$ is the parity. Unlike G-PC nodes, whose parity is always even ($z = 0$), the EG-PC node can have either even parity ($z = 0$) or odd parity ($z = 1$). The parity of the EG-PC node can be calculated as
$$ z = \begin{cases} 0, & \text{if the leftmost node is Rate-0}, \\ h\left(\sum_{k=1}^{2^r} \alpha_k\right), & \text{otherwise}, \end{cases} \tag{16} $$
where $\alpha_k = 2\tanh^{-1}\left(\prod_{m=(k-1)2^{j-r}+1}^{k 2^{j-r}} \tanh\left(\frac{\alpha_j^i[m]}{2}\right)\right)$. After computing $z$, Wagner decoders [22] can be used to decode the $2^r$ SPC codes with either even or odd parity constraints.
SPC, Type-III, Type-IV, and G-PC nodes can be represented
as special cases of EG-PC nodes. As a result, most of the
common special nodes can be represented as SR nodes where
SNT ∈ {Rate-0,Rate-1,EG-PC,Rate-C}. Table I shows the
corresponding representation of different node types at level j
in the decoding tree as special cases of the SR nodes.
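The EG-PC decoding steps (15) and (16) can be sketched as follows. The parameters `r` (level of the leftmost descendant) and `leftmost_is_rep` are our own encoding of the node structure in Fig. 6; following (15), each of the $2^r$ SPC constraints covers a contiguous block of $2^{j-r}$ bits, and Wagner decoding of a block flips its least reliable bit whenever its hard-decision parity differs from $z$. The clamp in `parity_llr` is an added numerical safeguard.

```python
import math
from typing import List, Sequence, Tuple


def h(llr: float) -> int:
    """Hard decision (11)."""
    return 0 if llr >= 0 else 1


def parity_llr(block: Sequence[float]) -> float:
    """alpha_k in (16): LLR of the XOR of the bits in one block."""
    p = 1.0
    for a in block:
        p *= math.tanh(a / 2.0)
    p = max(min(p, 1.0 - 1e-15), -1.0 + 1e-15)   # keep atanh finite
    return 2.0 * math.atanh(p)


def decode_egpc(alpha: Sequence[float], r: int,
                leftmost_is_rep: bool) -> Tuple[List[int], int]:
    """Decode an EG-PC node whose leftmost descendant sits at level r.

    The node holds 2^r SPC blocks of length 2^(j-r) that share the parity z.
    """
    n_blocks = 2 ** r
    size = len(alpha) // n_blocks
    blocks = [list(alpha[k * size:(k + 1) * size]) for k in range(n_blocks)]
    # Eq. (16): z is 0 for a Rate-0 leftmost node, else a soft vote over blocks.
    z = h(sum(parity_llr(b) for b in blocks)) if leftmost_is_rep else 0
    beta: List[int] = []
    for b in blocks:                     # Wagner decoding of each block
        bits = [h(a) for a in b]
        if sum(bits) % 2 != z:           # constraint (15) violated
            k = min(range(size), key=lambda m: abs(b[m]))
            bits[k] ^= 1                 # flip the least reliable bit
        beta += bits
    return beta, z
```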
C. Repetition Sequence
In this subsection, a set of sequences, called repetition sequences, is defined that can be used to calculate the output bit estimates of an SR node based on the output bit estimates of its source node. To derive the repetition sequences, $\mathbf{v}$ is used to generate all the possible sequences that have to be XORed with the output of the source node to generate the output bit estimates of the SR node. Let $\eta_k$ denote the rightmost bit value of the left child node of the parent node of $\mathcal{N}_r^E$ at level $k+1$. When $v[k+1] = 0$, the left child node is a Rate-0 node, so $\eta_k = 0$. When $v[k+1] = 1$, the left child node is a REP node, so $\eta_k$ can take the value of either 0 or 1. The number of repetition sequences therefore depends on the number of different values that the $\eta_k$'s can take. Let $W_{\mathbf{v}}$ denote the number of 1s in $\mathbf{v}$. The number of all possible repetition sequences is thus $2^{W_{\mathbf{v}}}$. Let $\mathbb{S} = \{\mathbf{s}_1, \ldots, \mathbf{s}_{2^{W_{\mathbf{v}}}}\}$ denote the set of all possible repetition sequences.

TABLE I
SR NODE REPRESENTATION OF COMMON NODE TYPES.
Node Type | SR Node Representation | Length of v
Rate-0 | SR(∅, Rate-0, j) | 0
REP | SR((0, . . . , 0), Rate-1, 0) | j
SPC | SR(∅, EG-PC, j) | 0
Rate-1 | SR(∅, Rate-1, j) | 0
Type-I | SR((0, . . . , 0), Rate-1, 1) | j − 1
Type-II | SR((0, . . . , 0), EG-PC, 2) | j − 2
Type-III | SR(∅, EG-PC, j) | 0
Type-IV | SR(∅, EG-PC, j) | 0
Type-V | SR((0, . . . , 0, 1), EG-PC, 2) | j − 2
G-PC | SR(∅, EG-PC, j) | 0
G-REP | SR((0, . . . , 0), Rate-C, r) | j − r
The output bits of the SR node, $\beta_j^i[1:2^j]$, have the property that their repetition sequence is repeated in blocks of length $2^{j-r}$. Let $\beta_r^E[1:2^r]$ denote the output bits of the source node of an SR node $\mathcal{N}_j^i$. The output bits for each block of length $2^{j-r}$ in $\mathcal{N}_j^i$ with respect to the output bits of its source node can be written as
$$ \beta_j^i\left[(k-1)2^{j-r}+1 : k 2^{j-r}\right] = \beta_r^E[k] \oplus \mathbf{s}_l, \tag{17} $$
where $k \in \{1, \ldots, 2^r\}$ and $\mathbf{s}_l = (s_l[1], \ldots, s_l[2^{j-r}])$ is the $l$-th repetition sequence in $\mathbb{S}$. To obtain the repetition sequence $\mathbf{s}_l$, and with a slight abuse of terminology and notation for convenience, the Kronecker sum operator $\boxplus$ is used, which is equivalent to the Kronecker product operator except that addition in GF(2) is used instead of multiplication. For each set of values that the $\eta_k$'s can take, $\mathbf{s}_l$ can be calculated as
$$ \mathbf{s}_l = (\eta_r, 0) \boxplus (\eta_{r+1}, 0) \boxplus \cdots \boxplus (\eta_{j-1}, 0). \tag{18} $$
Example 1 (Repetition sequences for SR((1, 1), EG-PC, 2)). Consider the example in Fig. 4, in which the SR node $\mathcal{N}_4^1$ is located at level 4 of the decoding tree and its source node $\mathcal{N}_2^4$ is an EG-PC (in this case SPC) node located at level 2. Since $\mathbf{v} = (1, 1)$, $W_{\mathbf{v}} = 2$ and $|\mathbb{S}| = 4$. For $\eta_2 \in \{0, 1\}$ and $\eta_3 \in \{0, 1\}$,
$$ \begin{aligned} \mathbf{s}_1 &= (0, 0) \boxplus (0, 0) = (0, 0, 0, 0), \\ \mathbf{s}_2 &= (1, 0) \boxplus (0, 0) = (1, 1, 0, 0), \\ \mathbf{s}_3 &= (0, 0) \boxplus (1, 0) = (1, 0, 1, 0), \\ \mathbf{s}_4 &= (1, 0) \boxplus (1, 0) = (0, 1, 1, 0). \end{aligned} $$
For a polar code with a given d, the locations of SR nodes
in the decoding tree are fixed and can be determined off-line.
Therefore, the repetition sequences in S of all of the SR nodes
can be pre-computed and used in the course of decoding.
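The off-line construction (18) can be reproduced in a few lines. The sketch below enumerates $\mathbb{S}$ for a given $\mathbf{v}$, using the mapping between $\eta_k$ and $v[k+1]$ from the text; the enumeration order ($\eta_r$ varying fastest) is our choice, made only so that the output lists the sequences in the same order as Example 1.

```python
from itertools import product
from typing import List, Sequence


def kron_sum_gf2(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Kronecker 'sum' over GF(2): Kronecker product layout with XOR instead of AND."""
    return [x ^ y for x in a for y in b]


def repetition_sequences(v: Sequence[int]) -> List[List[int]]:
    """Enumerate the set S of (18) for v = (v[j], v[j-1], ..., v[r+1])."""
    # eta_k is tied to v[k+1]; reversing v lists the flags as eta_r, ..., eta_{j-1}.
    flags = list(reversed(v))
    choices = [(0, 1) if flag == 1 else (0,) for flag in flags]  # REP: free, Rate-0: 0
    sequences = []
    # Iterate over reversed factors so that eta_r changes fastest.
    for etas_rev in product(*reversed(choices)):
        s = [0]                          # start value: kron_sum_gf2([0], (eta, 0)) == [eta, 0]
        for eta in reversed(etas_rev):   # eta_r first, eta_{j-1} last, as in (18)
            s = kron_sum_gf2(s, (eta, 0))
        sequences.append(s)
    return sequences
```

For $\mathbf{v} = (1, 1)$ this returns the four sequences of Example 1 in the order $\mathbf{s}_1, \ldots, \mathbf{s}_4$; for an empty $\mathbf{v}$ it returns the single length-1 sequence $(0)$.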
D. Decoding of SR Nodes
To decode SR nodes, the LLR values $\alpha_{r_l}^E[1:2^r]$ of the source node $\mathcal{N}_r^E$ are calculated from the LLR values $\alpha_j^i[1:2^j]$ of the SR node $\mathcal{N}_j^i$ for every repetition sequence $\mathbf{s}_l$ by the following proposition:
Proposition 1. Let $\alpha_j^i[1:2^j]$ be the LLR values of the SR node $\mathcal{N}_j^i$ and $\alpha_{r_l}^E[1:2^r]$ be the LLR values of its source node $\mathcal{N}_r^E$ associated with the $l$-th repetition sequence $\mathbf{s}_l$. For $k \in \{1, \ldots, 2^r\}$ and $l \in \{1, \ldots, 2^{W_{\mathbf{v}}}\}$,
$$ \alpha_{r_l}^E[k] = \sum_{m=1}^{2^{j-r}} \alpha_j^i\left[(k-1)2^{j-r}+m\right](-1)^{s_l[m]}. \tag{19} $$
Proof: See Appendix A.
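Using (17) and (19), (10) can be written as
$$ \begin{aligned} \hat{\beta}_j^i[1:2^j] &= \underset{\beta_j^i[1:2^j] \in \mathbb{C}_j^i}{\arg\max} \sum_{k=1}^{2^j} (-1)^{\beta_j^i[k]} \alpha_j^i[k] \\ &= \underset{\substack{\beta_r^E[1:2^r] \in \mathbb{C}_r^E \\ \mathbf{s}_l \in \mathbb{S}}}{\arg\max} \sum_{k=1}^{2^r} (-1)^{\beta_r^E[k]} \sum_{m=1}^{2^{j-r}} \alpha_j^i\left[(k-1)2^{j-r}+m\right](-1)^{s_l[m]} \\ &= \underset{\substack{\beta_r^E[1:2^r] \in \mathbb{C}_r^E \\ l \in \{1, \ldots, |\mathbb{S}|\}}}{\arg\max} \sum_{k=1}^{2^r} (-1)^{\beta_r^E[k]} \alpha_{r_l}^E[k]. \end{aligned} \tag{20} $$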
Thus, the bit estimates of an SR node, $\hat{\beta}_j^i[1:2^j]$, can be calculated by finding the bit estimates of its source node, $\beta_r^E[1:2^r]$, using (20) and the repetition sequences as shown in (17).
The decoding algorithm of an SR node $\mathcal{N}_j^i$ is described in Algorithm 1. It first computes $\alpha_{r_l}^E$ for $l \in \{1, \ldots, |\mathbb{S}|\}$ and generates $|\mathbb{S}|$ new paths by extending the decoding path at the $j - r$ rightmost bits corresponding to $\eta_r, \eta_{r+1}, \ldots, \eta_{j-1}$. The $l$-th path is generated when the repetition sequence is $\mathbf{s}_l$, and $\alpha_{r_l}^E$, $\hat{\beta}_{r_l}^E$, and $\hat{\beta}_{j_l}^i$ are its soft and hard messages. Then, the source node is decoded under the rule of SC decoding. If the source node is a special node, a hard decision is made directly; if it is an EG-PC node, parity check and bit flipping are additionally performed using a Wagner decoder. Finally, the optimal decoding path index is selected according to the comparison in (21), and the decoding result is obtained according to (17).
Based on Algorithm 1, the SR node-based fast SC (SRFSC)
decoding algorithm is proposed. It follows the SC decoding
algorithm schedule until an SR node is encountered where
Algorithm 1 is executed. Note that the path selection operation in step 3 can be performed in parallel with the decoding of the source node in step 2 and the subsequent $g$ function. Once the selected index $\hat{l}$ is obtained, only the $\hat{l}$-th decoding path is retained and the remaining paths are deleted.
Algorithm 1: Decoding algorithm of SR node $\mathcal{N}_j^i$
Input: $\alpha_j^i[1:2^j]$, $\mathbb{S}$;
Output: $\hat{\beta}_j^i[1:2^j]$;
1) Soft message computation
   for $l \in \{1, \ldots, |\mathbb{S}|\}$ do
      Calculate $\alpha_{r_l}^E$ according to (19).
   end
2) Decoding of source node $\mathcal{N}_r^E$
   for $l \in \{1, \ldots, |\mathbb{S}|\}$ do
      if SNT = Rate-C then
         Decode source node $\mathcal{N}_r^E$ using $\alpha_{r_l}^E$ and obtain $\hat{\beta}_{r_l}^E$.
      else
         if SNT = Rate-0 then
            $\hat{\beta}_{r_l}^E[k] = 0$, $k \in \{1, \ldots, 2^r\}$,
         else
            $\hat{\beta}_{r_l}^E[k] = h\left(\alpha_{r_l}^E[k]\right)$, $k \in \{1, \ldots, 2^r\}$.
         end
      end
      if SNT = EG-PC then
         Perform parity check and bit flipping on $\hat{\beta}_{r_l}^E$ using $\alpha_{r_l}^E$.
      end
   end
3) Comparison and path selection
$$ \hat{l} = \underset{l \in \{1, \ldots, |\mathbb{S}|\}}{\arg\max} \sum_{k=1}^{2^r} \left|\alpha_{r_l}^E[k]\right|. \tag{21} $$
   Return $\hat{\beta}_{j_{\hat{l}}}^i$ to parent node according to (17).
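Algorithm 1 can be sketched in Python as below for three source node types: Rate-0, Rate-1, and SPC (the simplest EG-PC case, a single even-parity block). Rate-C sources, which would need a recursive SC call, are omitted. One deliberate deviation: each path is scored with the correlation metric of (20) after source decoding; for a Rate-1 source this equals the metric of (21), while for Rate-0 and SPC sources it also accounts for forced or flipped bits. All function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def h(llr: float) -> int:
    """Hard decision (11)."""
    return 0 if llr >= 0 else 1


def repetition_sequences(v: Sequence[int]) -> List[List[int]]:
    """Set S of (18), enumerated with eta_r varying fastest."""
    choices = [(0, 1) if flag == 1 else (0,) for flag in reversed(v)]
    out = []
    for etas_rev in product(*reversed(choices)):
        s = [0]
        for eta in reversed(etas_rev):
            s = [x ^ y for x in s for y in (eta, 0)]   # GF(2) Kronecker sum
        out.append(s)
    return out


def decode_source(alpha_src: Sequence[float], snt: str) -> List[int]:
    """Step 2 of Algorithm 1 for Rate-0, Rate-1, and SPC source nodes."""
    if snt == "Rate-0":
        return [0] * len(alpha_src)
    bits = [h(a) for a in alpha_src]
    if snt == "SPC":                     # Wagner decoding with even parity
        if sum(bits) % 2:
            k = min(range(len(bits)), key=lambda m: abs(alpha_src[m]))
            bits[k] ^= 1
    elif snt != "Rate-1":
        raise ValueError(f"unsupported source node type: {snt}")
    return bits


def decode_sr_node(alpha: Sequence[float], v: Sequence[int], snt: str) -> List[int]:
    block = 2 ** len(v)                  # 2^(j-r) bits per repetition block
    n_src = len(alpha) // block          # 2^r source-node bits
    best_metric, best_s, best_bits = None, None, None
    for s in repetition_sequences(v):
        # Step 1, eq. (19): fold each block of SR-node LLRs onto one source LLR.
        a_src = [sum(alpha[k * block + m] * (-1.0 if s[m] else 1.0) for m in range(block))
                 for k in range(n_src)]
        bits = decode_source(a_src, snt)                       # step 2
        # Step 3: correlation metric of (20); equals (21) for a Rate-1 source.
        metric = sum(a if b == 0 else -a for a, b in zip(a_src, bits))
        if best_metric is None or metric > best_metric:
            best_metric, best_s, best_bits = metric, s, bits
    # Eq. (17): expand the selected path back to the 2^j SR-node bits.
    return [b ^ best_s[m] for b in best_bits for m in range(block)]
```

As a usage example, the SR node of Example 1 is `decode_sr_node(alpha, (1, 1), "SPC")`, and a REP node of length 8 is `decode_sr_node(alpha, (0, 0, 0), "Rate-1")`, following the REP row of Table I.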
IV. HARD-DECISION-AIDED FAST SC DECODING WITH SEQUENCE REPETITION NODES
In this section, a novel threshold-based hard-decision-aided
scheme is proposed for the BAWGN channel. The purpose of
this algorithm is to speed up the decoding of general nodes
with no specific structure in the SRFSC decoding, especially
when the transmission channel has low noise. Additionally, a
multi-stage decoding strategy is introduced to eliminate the
possible error-correction performance degradation caused by
the proposed TA scheme.
A. Threshold-based Hard-decision-aided Scheme
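For a BAWGN channel with noise standard deviation $\sigma_n$, it was shown in [23] that, assuming all the previous bits are decoded correctly, the LLR value $\alpha_j^i[k]$, $1 \le k \le 2^j$, input into node $\mathcal{N}_j^i$ can be approximated as a Gaussian variable using a Gaussian approximation as
$$ \alpha_j^i[k] \sim \mathcal{N}\left(M_j^i[k], 2\left|M_j^i[k]\right|\right), \tag{22} $$
where $M_j^i[k]$ is the expectation of $\alpha_j^i[k]$ [24] such that
$$ M_j^i[k] = \begin{cases} m_j^i, & \text{if } \beta_j^i[k] = 0, \\ -m_j^i, & \text{if } \beta_j^i[k] = 1, \end{cases} \tag{23} $$
and $m_j^i$ can be calculated recursively offline, assuming the all-zero codeword is transmitted, as
$$ m_n^1 = 2/\sigma_n^2, \tag{24} $$
$$ m_{j-1}^{2i-1} = \phi^{-1}\left(1 - \left[1 - \phi\left(m_j^i\right)\right]^2\right), \tag{25} $$
$$ m_{j-1}^{2i} = 2 m_j^i, \tag{26} $$
where
$$ \phi(x) = \begin{cases} 1 - \dfrac{1}{\sqrt{4\pi|x|}} \displaystyle\int_{-\infty}^{\infty} \tanh\left(\frac{u}{2}\right) e^{-\frac{(u-x)^2}{4|x|}}\, du, & x \neq 0, \\ 1, & x = 0. \end{cases} \tag{27} $$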
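The recursion (24) to (26) is computed offline. Evaluating the integral in (27) numerically is awkward for large arguments, so this sketch swaps in the widely used two-branch closed-form approximation of $\phi$ from the density-evolution literature, $\phi(x) \approx \exp(-0.4527x^{0.86} + 0.0218)$ for $x < 10$ and $\phi(x) \approx \sqrt{\pi/x}\, e^{-x/4}(1 - 10/(7x))$ otherwise, and works with $\ln\phi$ to avoid underflow. This is an approximation of (27), not the paper's exact procedure; the function names are ours.

```python
import math
from typing import Dict, Tuple


def ln_phi(x: float) -> float:
    """ln of the two-branch closed-form approximation of phi(x), for x > 0."""
    if x < 10.0:
        return -0.4527 * x ** 0.86 + 0.0218
    return 0.5 * math.log(math.pi / x) - x / 4.0 + math.log(1.0 - 10.0 / (7.0 * x))


def phi_inv_ln(target: float) -> float:
    """Solve ln_phi(x) = target by bisection on log10(x).

    Each branch of ln_phi is decreasing; the small mismatch between the two
    branches at x = 10 only shifts the root locally.
    """
    lo, hi = -12.0, 9.0                  # search x in [1e-12, 1e9]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ln_phi(10.0 ** mid) > target:
            lo = mid
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))


def left_mean(m: float) -> float:
    """Eq. (25); 1 - (1 - phi)^2 = phi * (2 - phi) is evaluated in the log domain."""
    lp = ln_phi(m)
    return phi_inv_ln(lp + math.log(2.0 - math.exp(lp)))


def node_means(n: int, sigma: float) -> Dict[Tuple[int, int], float]:
    """m_j^i for every node, keyed by (level j, index i), via (24) to (26)."""
    means = {(n, 1): 2.0 / sigma ** 2}                   # eq. (24)
    for j in range(n, 0, -1):
        for i in range(1, 2 ** (n - j) + 1):
            m = means[(j, i)]
            means[(j - 1, 2 * i - 1)] = left_mean(m)     # eq. (25)
            means[(j - 1, 2 * i)] = 2.0 * m              # eq. (26)
    return means
```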
It was shown in [19] that, when the magnitudes of the LLR values at a certain node in the SC decoding tree are large enough, the node is reliable enough to perform hard decision directly at the node without tangibly altering the error-correction performance. To determine the reliability of the node, a threshold is defined in [19] as
$$ T = c_t \log \frac{1 - \frac{1}{2}\mathrm{erfc}\left(0.5\sqrt{m_j^i}\right)}{\frac{1}{2}\mathrm{erfc}\left(0.5\sqrt{m_j^i}\right)}, \tag{28} $$
where $c_t \ge 1$ is a constant that is selected empirically. The hard-decision estimate of the received LLR values is calculated using
$$ HB_j^i[k] = \begin{cases} 0, & \text{if } \alpha_j^i[k] > T, \\ 1, & \text{if } \alpha_j^i[k] < -T. \end{cases} \tag{29} $$
Fig. 7 shows the distribution of $\alpha_j^i[k]$ under the Gaussian approximation. The red area represents the probability of correct hard decision $P_c$ and the blue area represents the probability of incorrect hard decision $P_e$ when $\beta_j^i[k] = 0$, such that
$$ P_c = Q\left(\frac{T - m_j^i}{\sqrt{2 m_j^i}}\right), \tag{30} $$
$$ P_e = Q\left(\frac{T + m_j^i}{\sqrt{2 m_j^i}}\right), \tag{31} $$
where
$$ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\frac{t^2}{2}}\, dt. \tag{32} $$
The area between the two dashed lines represents the probability that a hard decision is not performed, which is equal to $1 - P_c - P_e$.
The issue with the method in [19] is that the threshold defined in (28) contains complex calculations of complementary error functions $\mathrm{erfc}(\cdot)$, making the calculation in (30) and (31) inefficient. Moreover, the hard-decision threshold is calculated empirically rather than for a desired error-correction performance, and the threshold comparison in (29) is performed every time an unstructured node is encountered in the SC decoding process. To simplify the calculation of the threshold and also solve the other two problems, we take a different approach than [19]: using the Gaussian distribution of $\alpha_j^i[k]$ in Fig. 7, we constrain the probability of error when $\beta_j^i[k] = 0$ to be
$$ P_e = Q\left(\frac{T + m_j^i}{\sqrt{2 m_j^i}}\right) < Q(c), \tag{33} $$
where $c$ (and thus $Q(c)$) is a positive constant, whose selection method is given in Proposition 2. Since $Q(\cdot)$ is strictly decreasing, this is equivalent to
$$ T > -m_j^i + c\sqrt{2 m_j^i}. \tag{34} $$
Therefore, the threshold can be written as
$$ T = \left|-m_j^i + c\sqrt{2 m_j^i}\right|, \tag{35} $$
where the absolute value ensures that $T$ is positive for all values of $m_j^i > 0$ with any $c > 0$.

Fig. 7. Probability distribution of $\alpha_j^i[k]$ under Gaussian approximation for $m_j^i = 8$, showing the densities for $\beta_j^i[k] = 0$ and $\beta_j^i[k] = 1$ and the thresholds $\pm T$. The red dashed area on the right represents the probability of correct hard decision and the blue solid area on the left represents the probability of incorrect hard decision when $\beta_j^i[k] = 0$.
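A small sketch of the proposed threshold (35), the error probability (31), and the node-level hard decision (29) follows. A node is hard-decided only when every LLR clears the threshold; otherwise the sketch returns `None` to signal a fallback to SC decoding. The function names are ours, and $Q$ is written through `math.erfc`, which is only needed offline.

```python
import math
from typing import List, Optional, Sequence


def Q(x: float) -> float:
    """Gaussian tail function (32), written with math.erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ta_threshold(m: float, c: float) -> float:
    """Proposed threshold (35): T = |-m + c * sqrt(2m)|."""
    return abs(-m + c * math.sqrt(2.0 * m))


def error_prob(T: float, m: float) -> float:
    """P_e in (31) for a bit with beta = 0 under the Gaussian approximation (22)."""
    return Q((T + m) / math.sqrt(2.0 * m))


def ta_hard_decision(alpha: Sequence[float], T: float) -> Optional[List[int]]:
    """Apply (29) to all LLRs of a node; None means fall back to SC decoding."""
    if all(abs(a) > T for a in alpha):
        return [0 if a > T else 1 for a in alpha]
    return None
```

For instance, with $m_j^i = 8$ and $c = 3$ the threshold is $T = |-8 + 3 \cdot 4| = 4$, and $P_e = Q((4 + 8)/4) = Q(3)$, so the bound (33) is met with equality at the threshold itself.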
For a node that undergoes hard decision in (29), the proposed threshold bounds the probability of hard-decision error, as shown in the following proposition.

Proposition 2. Let $0 < \varepsilon < 1$ and $c > 0$ be real numbers such that

$$Q(c) \le 1 - \sqrt[2^n]{\varepsilon}. \tag{36}$$

Performing hard decision in (29) with the threshold in (35) on nodes $\mathcal{N}_j^i$ whose $m_j^i$ satisfy

$$m_j^i \ge \frac{1}{2}\left[c - Q^{-1}\left(\frac{Q(c)}{\frac{1}{\sqrt[2^n]{\varepsilon}} - 1}\right)\right]^2 \tag{37}$$

results in a probability of hard-decision error that is upper bounded by $1 - \varepsilon$ for any $\varepsilon$ close to 1.

Proof: See Appendix B.
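Conditions (36) and (37) can be evaluated offline in a few lines of code. The sketch below is our own illustration, and the function names `c_min`, `m_min`, and `per_bit_ratio` are hypothetical. It makes three choices:
- It takes the code length to be $2^n$.
- It computes $Q^{-1}$ with Python's `statistics.NormalDist.inv_cdf`.
- It uses `math.expm1` to avoid cancellation in $1 - \sqrt[2^n]{\varepsilon}$, which is of order $10^{-4}$ or smaller for $N = 1024$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # N(0, 1)


def q_func(x):
    """Gaussian tail probability Q(x) of (32)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p):
    """Inverse of Q on (0, 1), i.e. Q(q_inv(p)) == p."""
    return -_STD_NORMAL.inv_cdf(p)


def root_terms(eps, n):
    """Return (1 - eps**(1/2**n), eps**(-1/2**n) - 1) without cancellation."""
    a = math.log(eps) / 2 ** n  # log of the 2^n-th root of eps (negative)
    return -math.expm1(a), math.expm1(-a)


def c_min(eps, n):
    """Smallest c satisfying (36): Q(c) <= 1 - eps**(1/2**n)."""
    return q_inv(root_terms(eps, n)[0])


def m_min(eps, c, n):
    """Right-hand side of (37): smallest LLR mean eligible for hard decision."""
    one_minus_root, inv_root_minus_one = root_terms(eps, n)
    if q_func(c) > one_minus_root:
        raise ValueError("c violates condition (36)")
    return 0.5 * (c - q_inv(q_func(c) / inv_root_minus_one)) ** 2


def per_bit_ratio(m, c):
    """P_c / (P_c + P_e) from Lemma 1, with T from (35) and alpha ~ N(m, 2m)."""
    t = abs(-m + c * math.sqrt(2.0 * m))
    sigma = math.sqrt(2.0 * m)
    p_c, p_e = q_func((t - m) / sigma), q_func((t + m) / sigma)
    return p_c / (p_c + p_e)


if __name__ == "__main__":
    for eps, c in [(0.9, 3.8), (0.99, 4.3), (0.999, 4.8)]:
        print(eps, round(c_min(eps, 10), 3), round(m_min(eps, c, 10), 4))
```

Results for $n = 10$:
- `c_min` returns roughly 3.71, 4.27, and 4.76 for $\varepsilon = 0.9, 0.99, 0.999$.
- `m_min` with $c = 3.8, 4.3, 4.8$ lands within 0.01 of 9.3891, 14.7255, and 16.1604, the values quoted in Section VI.

In both cases of the proof in Appendix B, $P_c/(P_c+P_e)$ increases with $m_j^i$. Checking `per_bit_ratio` on nodes above the bound therefore confirms (46).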
The proposed TA scheme performs hard decision on a node $\mathcal{N}_j^i$ only if (37) is satisfied. Since (37) can be evaluated offline, threshold comparisons are carried out only for the fraction of nodes that satisfy it, which avoids unnecessary comparisons during decoding. Furthermore, a hard decision is performed on such a node only if every input LLR value $\alpha_j^i[k]$ falls into one of the two cases of (29), i.e., $|\alpha_j^i[k]| > T$ for every $k$. Otherwise, standard SC decoding is applied on $\mathcal{N}_j^i$.

To speed up decoding, the proposed TA scheme is combined with SRFSC decoding. The result is the threshold-based hard-decision-aided SR node-based fast SC (TA-SRFSC) decoding algorithm, which works as follows:
- When a special node considered in SRFSC decoding is encountered, SRFSC decoding is performed.
- When a general node with no special structure is encountered, the proposed TA scheme is applied.

The following proposition provides an upper bound on the BLER of the proposed TA-SRFSC decoding.
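The node-level rule of the TA scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that eligibility under (37) has already been decided offline, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def ta_hard_decision(alpha: Sequence[float], t: float, eligible: bool) -> Optional[List[int]]:
    """Hard-decision estimate (29) of a general node, or None to fall back to SC.

    alpha    -- input LLRs alpha_j^i[1..2^j] of the node
    t        -- threshold T from (35)
    eligible -- True if the node's mean m_j^i satisfies (37) (precomputed offline)
    """
    if not eligible:
        return None  # no threshold comparison at all for this node
    estimate = []
    for a in alpha:
        if a > t:
            estimate.append(0)
        elif a < -t:
            estimate.append(1)
        else:
            return None  # some |alpha[k]| <= T: decode the node with standard SC
    return estimate
```

In hardware, the $2^j$ comparisons would run in parallel. The early return above is only a software shortcut.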
Proposition 3. Let $\mathrm{BLER}_{\text{TA-SRFSC}}$ and $\mathrm{BLER}_{\text{SRFSC}}$ denote the BLER of TA-SRFSC decoding and SRFSC decoding, respectively. Then

$$\mathrm{BLER}_{\text{TA-SRFSC}} \le 1 - \varepsilon\left(1 - \mathrm{BLER}_{\text{SRFSC}}\right). \tag{38}$$

Proof: See Appendix C.
Proposition 3 provides a method to derive the threshold value for a desired upper bound on the BLER of TA-SRFSC decoding. In general, a large threshold gives better error-correction performance than a small one. However, a large threshold allows only a few nodes to undergo hard decision in (29), so this gain comes at the cost of lower decoding speed. The proposed TA-SRFSC decoding algorithm therefore offers a trade-off between error-correction performance and decoding speed.
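The bound (38) is a one-line computation. Writing it out (function name hypothetical) shows that the right-hand side equals $1-\varepsilon+\varepsilon\,\mathrm{BLER}_{\text{SRFSC}}$. It therefore never falls below $1-\varepsilon$, which is why a larger $\varepsilon$, and hence a larger threshold, is needed for a tighter BLER guarantee.

```python
def bler_bound_ta_srfsc(eps: float, bler_srfsc: float) -> float:
    """Right-hand side of (38): upper bound on the BLER of TA-SRFSC decoding."""
    return 1.0 - eps * (1.0 - bler_srfsc)


if __name__ == "__main__":
    # Hypothetical SRFSC BLER values; the bound saturates at 1 - eps.
    for bler in (1e-2, 1e-4, 1e-6):
        print(bler, [round(bler_bound_ta_srfsc(e, bler), 6) for e in (0.9, 0.99, 0.999)])
```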
B. Multi-stage Decoding
To mitigate the effect of the proposed TA scheme on the error-correction performance of TA-SRFSC decoding, we adopt a multi-stage decoding strategy with at most two decoding attempts:
1) The first attempt uses TA-SRFSC decoding.
2) If this attempt fails and at least one node underwent hard decision during it, a second attempt using SRFSC decoding is conducted. If no node underwent hard decision, the first attempt is identical to SRFSC decoding, so repeating it cannot help.

To determine whether TA-SRFSC decoding failed, a cyclic redundancy check (CRC) is concatenated with the polar code and verified after TA-SRFSC decoding. The CRC is also verified after the second attempt to determine whether the overall decoding process succeeded.
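The control flow of the two-attempt strategy can be sketched as follows. The decoders and the CRC check are passed in as callables with hypothetical signatures. In particular, we assume the TA-SRFSC decoder also reports whether any node underwent hard decision.

```python
from typing import Callable, Sequence, Tuple

Bits = Sequence[int]


def multi_stage_decode(
    llrs: Sequence[float],
    ta_srfsc: Callable[[Sequence[float]], Tuple[Bits, bool]],
    srfsc: Callable[[Sequence[float]], Bits],
    crc_ok: Callable[[Bits], bool],
) -> Tuple[Bits, bool, int]:
    """Two-attempt decoding; returns (bits, CRC passed, number of attempts used).

    ta_srfsc returns the decoded bits and a flag telling whether any node
    underwent hard decision (a hypothetical interface, not from the paper).
    """
    bits, used_hard_decision = ta_srfsc(llrs)  # attempt 1: TA-SRFSC
    passed = crc_ok(bits)
    if passed or not used_hard_decision:
        # Success, or TA-SRFSC behaved exactly like SRFSC: no second attempt.
        return bits, passed, 1
    bits = srfsc(llrs)  # attempt 2: SRFSC without hard decisions
    return bits, crc_ok(bits), 2
```

The sketch leaves the location of the CRC bits inside the decoded frame to the `crc_ok` callable.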
As shown in Section VI, the proposed TA-SRFSC decoding correctly decodes most received frames in the first attempt. As a result, the average decoding latency of the proposed multi-stage SRFSC decoding is very close to that of TA-SRFSC decoding, while its error-correction performance is very close to that of SRFSC decoding.

The proposed multi-stage decoding strategy can be generalized to more than two decoding attempts. The first two attempts are the same as described above: TA-SRFSC decoding is used first, followed by SRFSC decoding if any node underwent hard decision. The third and any subsequent attempts would use decoding techniques more powerful than the first two, such as successive-cancellation list (SCL) decoding.
V. DECODING LATENCY
In this section, we analyze the decoding latency of the proposed fast decoders, measured in time steps, under the same assumptions as in [1], [16]. More specifically:
1) There is no resource limitation, so all parallelizable instructions are performed in one time step.
2) Bit operations are carried out instantaneously.
3) Addition/subtraction of real numbers and the check-node operation each consume one time step.
4) Wagner decoding can be performed in one time step.
A. SRFSC and TA-SRFSC
For any node $\mathcal{N}_j^i$ that is not an SR node and that satisfies (37), the threshold comparison in (29) is performed in parallel with the calculation of the LLR values of its left child node. Therefore, the proposed hard-decision scheme adds no latency for the nodes that undergo hard decision.
The number of time steps required to decode an SR node is calculated according to Algorithm 1. In Step 1, the calculation of the LLR values of the source node requires one time step if $v \neq \emptyset$. If $v = \emptyset$, the LLR values of the source node are available immediately. Thus, the required number of time steps for Step 1 is

$$T_1 = \begin{cases} 0, & \text{if } v = \emptyset,\\ 1, & \text{if } v \neq \emptyset. \end{cases} \tag{39}$$
The time step requirement of Step 2 depends on the source node type (SNT):
- If SNT = Rate-C, Step 2 requires the time steps of the Rate-C node.
- If SNT = Rate-0 or Rate-1, Step 2 adds no latency.
- If SNT = EG-PC, then, in accordance with Section III-B, $z$ can be estimated in two time steps if the leftmost node is a REP node: one time step for the check-node operation and one for adding the LLR values. Wagner decoding can be performed in parallel with the estimation of $z$, under both hypotheses $z = 0$ and $z = 1$. As such, the parity check and bit flipping of the EG-PC node require at most two time steps.

The required number of time steps for Step 2 is

$$T_2 = \begin{cases} 0, & \text{if SNT} = \text{Rate-0 or Rate-1},\\ 1 \text{ or } 2, & \text{if SNT} = \text{EG-PC},\\ 2^{r+1} - 2, & \text{if SNT} = \text{Rate-C}. \end{cases} \tag{40}$$
Step 3 consumes two time steps using an adder tree and a comparison tree if $|S| > 1$. If $|S| = 1$, Step 3 adds no latency. Thus, the required number of time steps for Step 3 is

$$T_3 = \begin{cases} 0, & \text{if } |S| = 1,\\ 2, & \text{if } |S| > 1. \end{cases} \tag{41}$$
Path selection in Step 3 can be executed in parallel with the decoding of the source node in Step 2 and the subsequent g function calculation. The total number of time steps required to decode an SR node can therefore be expressed as

$$T_{\mathrm{SR}} = T_1 + \max\left(T_2,\, T_3 - 1\right), \tag{42}$$

where the term $T_3 - 1$ reflects that parallelizing Step 3 with the g function calculation saves at least one time step of $T_3$. Therefore, $T_{\mathrm{SR}}$ varies with the parameters of the SR node. However, for a given polar code, the total number of time steps required for SRFSC decoding is fixed, regardless of the channel conditions.
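Equations (39)–(42) can be collected into a small helper. This is a sketch with hypothetical parameter names. Because (40) allows either 1 or 2 time steps for an EG-PC source node, the caller supplies that value.

```python
def sr_node_time_steps(v_empty: bool, snt: str, s_size: int,
                       r: int = 0, eg_pc_steps: int = 2) -> int:
    """Time steps T_SR of an SR node according to (39)-(42).

    v_empty     -- True if v is the empty set (Step 1 costs nothing)
    snt         -- source node type: "Rate-0", "Rate-1", "EG-PC", or "Rate-C"
    s_size      -- |S|, the number of candidates compared in Step 3
    r           -- the source node has length 2^r (used for Rate-C)
    eg_pc_steps -- 1 or 2 for an EG-PC source node, as allowed by (40)
    """
    t1 = 0 if v_empty else 1  # (39)
    if snt in ("Rate-0", "Rate-1"):
        t2 = 0
    elif snt == "EG-PC":
        if eg_pc_steps not in (1, 2):
            raise ValueError("an EG-PC source node needs 1 or 2 time steps")
        t2 = eg_pc_steps
    elif snt == "Rate-C":
        t2 = 2 ** (r + 1) - 2  # (40), Rate-C source node of length 2^r
    else:
        raise ValueError(f"unknown source node type {snt!r}")
    t3 = 0 if s_size == 1 else 2  # (41)
    return t1 + max(t2, t3 - 1)  # (42)
```

For example, a Rate-C source node of length 4 ($r = 2$) with $v \neq \emptyset$ and $|S| = 2$ needs $1 + \max(6, 1) = 7$ time steps.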
B. Multi-stage SRFSC
Let $T_{\text{TA-SRFSC}}$ and $T_{\text{SRFSC}}$ denote the average decoding latency of the proposed TA-SRFSC decoding and of SRFSC decoding, respectively, in time steps. The average decoding latency of the proposed multi-stage SRFSC decoding, $T_{\text{Multi-stage SRFSC}}$, is given by

$$T_{\text{Multi-stage SRFSC}} = T_{\text{TA-SRFSC}} + P_{\text{Re-decoding}}\, T_{\text{SRFSC}}, \tag{43}$$

where $P_{\text{Re-decoding}}$ is the probability that TA-SRFSC decoding fails and at least one node undergoes hard decision during it.

$P_{\text{Re-decoding}}$ is less than or equal to the probability that the output of TA-SRFSC decoding fails the CRC verification. That probability can be approximated by $\mathrm{BLER}_{\text{TA-SRFSC}}$, because the undetected error probability of the CRC is negligible [25]. Applying Proposition 3, the average decoding latency of the proposed multi-stage SRFSC decoding is approximately bounded as

$$T_{\text{Multi-stage SRFSC}} \lesssim T_{\text{TA-SRFSC}} + \left(1 - \varepsilon\left(1 - \mathrm{BLER}_{\text{SRFSC}}\right)\right) T_{\text{SRFSC}}. \tag{44}$$
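The latency expressions (43) and (44) can be evaluated directly. The sketch below uses hypothetical function names. The bound (44) always exceeds $T_{\text{TA-SRFSC}}$ by at least $(1-\varepsilon)T_{\text{SRFSC}}$, while the actual re-decoding probability in (43) can be much smaller. This is consistent with the bound being looser for small $\varepsilon$.

```python
def multistage_latency(t_ta: float, t_srfsc: float, p_redecode: float) -> float:
    """Average latency (43) of multi-stage SRFSC decoding, in time steps."""
    return t_ta + p_redecode * t_srfsc


def multistage_latency_bound(t_ta: float, t_srfsc: float,
                             eps: float, bler_srfsc: float) -> float:
    """Approximate upper bound (44): P_Re-decoding replaced by the bound (38)."""
    return multistage_latency(t_ta, t_srfsc, 1.0 - eps * (1.0 - bler_srfsc))


if __name__ == "__main__":
    # T_SRFSC = 124 for N = 1024, R = 1/2 (Table IV); T_TA-SRFSC = 60 is hypothetical.
    for eps in (0.9, 0.99, 0.999):
        print(eps, round(multistage_latency_bound(60, 124, eps, 1e-4), 2))
```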
Note that the decoding latency of SRFSC decoding is fixed. Therefore, the average and worst-case decoding latencies of SRFSC decoding are equal.

The worst-case decoding latency of the proposed TA-SRFSC decoding occurs when none of the nodes in the decoding tree undergo hard decision, which happens when the channel is very noisy. This worst case equals the decoding latency of SRFSC decoding. Moreover, in a very noisy channel $P_{\text{Re-decoding}} \approx 0$, because almost no nodes undergo hard decision. Thus, the worst-case decoding latency of the proposed multi-stage SRFSC decoding equals that of TA-SRFSC decoding, which is the latency of SRFSC decoding.
VI. NUMERICAL RESULTS
In this section, the average decoding latency and the error-correction performance of the proposed decoding algorithms are analyzed and compared with state-of-the-art fast SC decoding algorithms.

TABLE II
THE NUMBER OF SR NODES WITH DIFFERENT |S| IN 5G POLAR CODES OF LENGTHS N ∈ {128, 512, 1024} AND RATES R ∈ {1/4, 1/2, 3/4}.

N      R      |S| = 1   |S| = 2   |S| = 4   |S| = 8   |S| = 16   Total
128    1/4    5         1         0         1         0          7
128    1/2    5         1         1         1         0          8
128    3/4    3         0         1         0         0          4
512    1/4    11        3         3         1         0          18
512    1/2    17        3         3         1         0          24
512    3/4    15        3         2         0         0          20
1024   1/4    17        4         3         2         1          27
1024   1/2    24        8         4         1         1          38
1024   3/4    26        7         2         1         0          36

To derive the results, we use the polar codes of lengths $N \in \{128, 512, 1024\}$ adopted in the 5G standard [26] and test a total of $10^7$ frames. All transmitted frames use the length-16 CRC of the 5G standard, with generator polynomial $D^{16} + D^{12} + D^{5} + 1$, to identify whether decoding succeeded or failed. For fairness, the latency of the baseline decoding algorithms in the simulations is calculated under the same assumptions as in Section V.
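For completeness, here is a bit-serial sketch of a CRC-16 check with generator $D^{16}+D^{12}+D^{5}+1$. It assumes MSB-first bit order and an all-zero initial register, as in the usual polynomial-division definition. The exact bit ordering used in [26] is not reproduced here and should be taken from the specification. With these conventions, the parameters coincide with the CRC-16/XMODEM parameter set, whose standard check value for ASCII "123456789" is 0x31C3. The helper names are hypothetical.

```python
from typing import List, Sequence

G_CRC16 = 0x1021  # D^16 + D^12 + D^5 + 1, with the D^16 term implicit


def crc16_bits(bits: Sequence[int]) -> List[int]:
    """16 parity bits: remainder of a(D) * D^16 modulo g(D), MSB first, zero initial state."""
    reg = 0
    for b in bits:
        feedback = ((reg >> 15) & 1) ^ (b & 1)
        reg = (reg << 1) & 0xFFFF
        if feedback:
            reg ^= G_CRC16
    return [(reg >> (15 - i)) & 1 for i in range(16)]


def crc_ok(frame: Sequence[int]) -> bool:
    """True if a decoded frame (payload followed by 16 CRC bits) passes the check."""
    return crc16_bits(frame[:-16]) == list(frame[-16:])
```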
To study the effect of $\varepsilon$ on the error-correction performance and the latency of the proposed decoding algorithms, we select three values, $\varepsilon \in \{0.9, 0.99, 0.999\}$. For $N = 1024$ ($n = 10$), (36) gives the following lower limits on $c$:
- $c \ge 3.8$ for $\varepsilon = 0.9$,
- $c \ge 4.3$ for $\varepsilon = 0.99$,
- $c \ge 4.8$ for $\varepsilon = 0.999$.

According to (33), as $c$ increases the bound on $P_e$ decreases, which means that fewer nodes undergo hard decision and the decoding latency increases. To balance error-correction performance against latency reduction, we set $c$ to these lower limits: $c = 3.8$, $4.3$, and $4.8$, respectively.

Consequently, by (37), hard decision is allowed only for nodes with:
- $m_j^i \ge 9.3891$ for $\varepsilon = 0.9$,
- $m_j^i \ge 14.7255$ for $\varepsilon = 0.99$,
- $m_j^i \ge 16.1604$ for $\varepsilon = 0.999$.

From these values, the following can be calculated for each $\varepsilon$: the threshold $T$ in (35), the BLER upper bound for TA-SRFSC decoding in (38), and the approximate upper bound on the average decoding latency of multi-stage SRFSC decoding in (44).
Table II reports the number of SR nodes with different $|S|$ at different code lengths and rates. For code lengths 128, 512, and 1024, the codes with rates 1/2, 1/4, and 1/4, respectively, have the largest proportion of nodes with $|S| > 1$. These codes obtain larger latency savings, because such nodes allow a higher degree of parallelism. Table III shows the length of the SR nodes with different $|S|$ at different code lengths when $R = 1/2$. The length of an SR node corresponds to the level of the decoding tree at which it is located. SR nodes with larger $|S|$ that are located at higher levels of the decoding tree contribute more to the overall latency reduction.
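The statement about the proportion of SR nodes with $|S| > 1$ can be checked directly from Table II. The snippet below transcribes the table and, for each code length, picks the rate with the largest share of SR nodes having $|S| > 1$. The helper names are hypothetical.

```python
# Table II: (N, R) -> SR-node counts for |S| = 1, 2, 4, 8, 16
TABLE_II = {
    (128, "1/4"): (5, 1, 0, 1, 0),
    (128, "1/2"): (5, 1, 1, 1, 0),
    (128, "3/4"): (3, 0, 1, 0, 0),
    (512, "1/4"): (11, 3, 3, 1, 0),
    (512, "1/2"): (17, 3, 3, 1, 0),
    (512, "3/4"): (15, 3, 2, 0, 0),
    (1024, "1/4"): (17, 4, 3, 2, 1),
    (1024, "1/2"): (24, 8, 4, 1, 1),
    (1024, "3/4"): (26, 7, 2, 1, 0),
}


def share_multi_candidate(counts):
    """Fraction of SR nodes with |S| > 1."""
    return sum(counts[1:]) / sum(counts)


def best_rate(n):
    """Rate whose code has the largest share of SR nodes with |S| > 1 at length n."""
    rates = [r for (length, r) in TABLE_II if length == n]
    return max(rates, key=lambda r: share_multi_candidate(TABLE_II[(n, r)]))


if __name__ == "__main__":
    for n in (128, 512, 1024):
        print(n, best_rate(n))
```

For $N = 1024$ the margin is small: 10/27 (about 0.370) for $R = 1/4$ versus 14/38 (about 0.368) for $R = 1/2$.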
TABLE III
NODE LENGTH OF SR NODES WITH DIFFERENT |S| IN 5G POLAR CODES OF LENGTHS N ∈ {128, 512, 1024} AND RATE R = 1/2.
(Each entry is the number of SR nodes with the given node length and |S|.)

N      Node length   |S| = 1   |S| = 2   |S| = 4   |S| = 8   |S| = 16
128    4             1         1         0         0         0
128    8             1         0         0         0         0
128    16            2         0         1         0         0
128    32            1         0         0         1         0
512    8             5         3         0         0         0
512    16            5         0         3         0         0
512    32            5         0         0         1         0
512    64            2         0         0         0         0
1024   8             10        6         0         0         0
1024   16            6         1         3         0         0
1024   32            3         0         1         1         0
1024   64            3         1         0         0         1
1024   128           2         0         0         0         0

TABLE IV
NUMBER OF TIME STEPS FOR DIFFERENT FAST SC DECODING ALGORITHMS OF POLAR CODES OF LENGTHS N ∈ {128, 512, 1024} AND RATES R ∈ {1/4, 1/2, 3/4}.

N      R      [11]   [16]   [17]   SRFSC
128    1/4    40     29     29     21
128    1/2    46     35     34     24
128    3/4    20     15     14     11
512    1/4    112    75     75     58
512    1/2    126    92     89     75
512    3/4    104    70     68     63
1024   1/4    178    127    121    89
1024   1/2    222    154    154    124
1024   3/4    186    128    126    111

Table IV reports the number of time steps required to decode polar codes of lengths $N \in \{128, 512, 1024\}$ and rates $R \in \{1/4, 1/2, 3/4\}$ with the proposed SRFSC decoding algorithm. It compares these with the number of time steps required by the decoders in [11], [16], and [17]. The proposed SRFSC decoder provides latency reductions of up to 50% with respect to the decoder in [11], up to 31% with respect to [16], and up to 29% with respect to [17].
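The quoted maxima follow from Table IV by simple arithmetic. The snippet below transcribes the table and computes the relative reduction $1 - T_{\text{SRFSC}}/T_{\text{baseline}}$ for each code. The helper names are hypothetical.

```python
# Time steps transcribed from Table IV: (N, R) -> ([11], [16], [17], SRFSC)
TABLE_IV = {
    (128, "1/4"): (40, 29, 29, 21),
    (128, "1/2"): (46, 35, 34, 24),
    (128, "3/4"): (20, 15, 14, 11),
    (512, "1/4"): (112, 75, 75, 58),
    (512, "1/2"): (126, 92, 89, 75),
    (512, "3/4"): (104, 70, 68, 63),
    (1024, "1/4"): (178, 127, 121, 89),
    (1024, "1/2"): (222, 154, 154, 124),
    (1024, "3/4"): (186, 128, 126, 111),
}
BASELINES = ("[11]", "[16]", "[17]")


def reduction(baseline, key):
    """Relative latency reduction of SRFSC with respect to one baseline decoder."""
    row = TABLE_IV[key]
    return 1.0 - row[3] / row[BASELINES.index(baseline)]


def max_reduction(baseline):
    """Largest reduction over all (N, R) pairs of Table IV."""
    return max(reduction(baseline, key) for key in TABLE_IV)


if __name__ == "__main__":
    for b in BASELINES:
        print(b, f"{100 * max_reduction(b):.1f}%")
```

The maxima are 50.0%, 31.4%, and 29.4%, and the reduction with respect to [17] for $N = 1024$, $R = 1/2$ is 30/154, about 19.5%.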
Table V compares the number of special nodes for polar codes with $N = 1024$ and $R \in \{1/4, 1/2, 3/4\}$ under the proposed SRFSC decoding algorithm and the decoders in [11], [16], and [17]. The proposed SRFSC decoding algorithm uses the fewest special node types and the smallest total number of special nodes, because SR nodes are located at higher levels of the decoding tree.
Fig. 8. BLER performance of different decoding algorithms for the 5G polar code of length N = 1024 and rate R = 1/2. (Plot of BLER versus $E_b/N_0$ in dB for TA-SRFSC and its upper bound with $\varepsilon \in \{0.9, 0.99, 0.999\}$, SC, SRFSC, [19] with ct ∈ {1, 2}, and multi-stage SRFSC.)

Fig. 8 and Fig. 9 show the BLER and BER performance, respectively, of different decoding algorithms for $N = 1024$ and $R = 1/2$ at different values of the energy per bit to noise power spectral density ratio ($E_b/N_0$). For each value of $\varepsilon$, the BLER of TA-SRFSC decoding is shown together with the upper bound from Proposition 3. Introducing the TA scheme causes a BLER performance loss for the proposed TA-SRFSC decoding with respect to SC and SRFSC decoding, especially at higher values of $E_b/N_0$.
Moreover, the simulations confirm that the BLER curves of
TA-SRFSC decoding fall below their respective upper bounds.
It can also be seen that, as $E_b/N_0$ increases beyond a certain point, the BLER/BER performance of TA-SRFSC decoding degrades. This is caused by the performance gap between hard-decision and soft-decision decoding. Since the mean LLR $m_j^i$ of a node grows with $E_b/N_0$, more nodes satisfy (37) and undergo hard-decision decoding at larger values of $E_b/N_0$. Therefore, even as the channel conditions improve, hard-decision decoding introduces errors that offset the error-correction gain associated with these larger $E_b/N_0$ values, and the BLER/BER performance degrades beyond a certain $E_b/N_0$. This phenomenon persists as long as some nodes can still switch to hard-decision decoding. Once all the nodes are decoded using hard decision, the BLER/BER performance improves again as $E_b/N_0$ increases.
For all values of $\varepsilon$, the proposed multi-stage SRFSC decoding achieves almost the same BLER/BER performance. Therefore, only one multi-stage SRFSC curve is plotted in Fig. 8 and Fig. 9. Its performance is only negligibly worse than that of the conventional SC and SRFSC decoders, and it provides better BLER/BER performance than the method in [19] for ct ∈ {1, 2}. A similar trend holds for the BLER/BER performance of the different schemes for polar codes of other lengths and rates.
TABLE V
NUMBER OF SPECIAL NODES FOR DIFFERENT DECODING ALGORITHMS OF POLAR CODES OF LENGTH N = 1024 AND RATES R ∈ {1/4, 1/2, 3/4}. A DASH INDICATES A NODE TYPE NOT USED BY THE DECODER.

Decoder  R     Rate-0  Rate-1  SPC  REP  Type-I  Type-II  Type-III  Type-IV  Type-V  G-PC  G-REP  SR
[11]     1/4   18      15      16   22   -       -        -         -        -       -     -      -
[11]     1/2   17      17      26   26   -       -        -         -        -       -     -      -
[11]     3/4   11      26      21   17   -       -        -         -        -       -     -      -
[16]     1/4   3       2       7    13   2       2        3         2        7       -     -      -
[16]     1/2   0       4       12   12   2       2        2         2        12      -     -      -
[16]     3/4   1       7       11   7    1       2        3         2        8       -     -      -
[17]     1/4   -       2       -    -    -       -        -         -        -       19    19     -
[17]     1/2   -       4       -    -    -       -        -         -        -       26    18     -
[17]     3/4   -       7       -    -    -       -        -         -        -       23    12     -
SRFSC    1/4   -       -       -    -    -       -        -         -        -       -     -      27
SRFSC    1/2   -       -       -    -    -       -        -         -        -       -     -      38
SRFSC    3/4   -       -       -    -    -       -        -         -        -       -     -      36

Fig. 9. BER performance of different decoding algorithms for the 5G polar code of length N = 1024 and rate R = 1/2. (Plot of BER versus $E_b/N_0$ in dB for TA-SRFSC with $\varepsilon \in \{0.9, 0.99, 0.999\}$, [19] with ct ∈ {1, 2}, SC, SRFSC, and multi-stage SRFSC.)

Fig. 10. Average decoding latency of different decoding algorithms for the 5G polar code of length N = 1024 and rate R = 1/2. (Plot of the number of time steps versus $E_b/N_0$ in dB for TA-SRFSC and multi-stage SRFSC with $\varepsilon \in \{0.9, 0.99, 0.999\}$, the upper bound for each $\varepsilon$, [19] with ct ∈ {1, 2}, and SRFSC.)

Fig. 10 presents the average decoding latency, in required time steps, of the proposed TA-SRFSC decoding with and without multi-stage decoding. It compares both with the latency of SRFSC decoding and of the decoder in [19] at different values of $E_b/N_0$ when $N = 1024$ and $R = 1/2$. The required number of time steps for TA-SRFSC decoding decreases as $E_b/N_0$ increases. At $E_b/N_0 = 5$ dB, compared with SRFSC decoding, it is reduced by:
- 36% for $\varepsilon = 0.999$,
- 45% for $\varepsilon = 0.99$,
- 54% for $\varepsilon = 0.9$.

The required number of time steps for the proposed multi-stage SRFSC decoding is close to that of TA-SRFSC decoding. For $\varepsilon = 0.9$ at $E_b/N_0 = 5$ dB, it is 21% lower than that of the method in [19] with ct = 1, while providing significantly better BLER/BER performance. Fig. 10 also shows the approximate upper bound derived in (44), which becomes tighter as $\varepsilon$ increases.
Fig. 11. Average number of threshold comparisons of the proposed TA-SRFSC decoding in comparison with the hard-decision scheme in [19] for the 5G polar code of length N = 1024 and rate R = 1/2. (Plot of the average number of threshold comparisons versus $E_b/N_0$ in dB for TA-SRFSC with $\varepsilon \in \{0.9, 0.99, 0.999\}$ and [19] with ct ∈ {1, 2}.)

Fig. 11 compares the average number of threshold comparisons in (29) for the proposed TA-SRFSC decoder with $\varepsilon \in \{0.9, 0.99, 0.999\}$ and for the decoder in [19] with ct ∈ {1, 2}. The proposed TA-SRFSC decoder requires significantly fewer threshold comparisons on average than [19]. With $\varepsilon = 0.9$, it requires at least 39% fewer threshold comparisons than [19] with ct = 1, while also having a lower decoding latency. The decoder in [19] thus executes many unnecessary threshold comparisons, whereas TA-SRFSC decoding attempts hard decisions only on nodes that satisfy (37).
VII. CONCLUSION
In this work, a new sequence repetition (SR) node is
identified in the successive-cancellation (SC) decoding tree of
polar codes and an SR node-based fast SC (SRFSC) decoder
is proposed. In addition, to speed up the decoding of nodes
with no specific structure, the SRFSC decoder is combined
with a threshold-based hard-decision-aided (TA) scheme and
a multi-stage decoding strategy. We show that this method
further reduces the decoding latency without tangibly affecting
the error-correction performance when the communications
channel has low noise. In particular, simulation results for
a polar code of length 1024 and rate 1/2 show that SRFSC
decoding obtains up to 19.5% decoding latency reduction with
respect to the fastest known decoding algorithm in [17], and
the reduction reaches 26.5% at code length 1024 and rate 1/4.
In addition, the proposed multi-stage SRFSC decoding reduces
the average decoding latency by 54% with respect to SRFSC
decoding at Eb/N0 = 5 dB on a polar code of length 1024 and
rate 1/2. This average latency saving is particularly important
in real-time applications such as video. Future work includes
the design of a fast SC list decoder using SR nodes.
APPENDIX A
PROOF OF PROPOSITION 1
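Let $I_k$ denote the $k \times k$ identity matrix for $k \ge 1$. Since the source node is the rightmost node in an SR node, the g function calculation in (3) can be written as

$$\alpha_r^{E}[1:2^r] = \alpha_j^i[1:2^j] \times \left(I_{2^{j-1}} \otimes \left((-1)^{\eta_{j-1}}, 1\right)^T\right) \times \left(I_{2^{j-2}} \otimes \left((-1)^{\eta_{j-2}}, 1\right)^T\right) \times \cdots \times \left(I_{2^r} \otimes \left((-1)^{\eta_r}, 1\right)^T\right).$$

Using the identity $(A \otimes B) \times (C \otimes D) = (A \times C) \otimes (B \times D)$ with $A = I_{2^{j-2}}$, $B = I_2 \otimes ((-1)^{\eta_{j-1}}, 1)^T$, $C = I_{2^{j-2}}$, and $D = ((-1)^{\eta_{j-2}}, 1)^T$ results in

$$\alpha_r^{E}[1:2^r] = \alpha_j^i[1:2^j] \times \left[\left(I_{2^{j-2}} \times I_{2^{j-2}}\right) \otimes \left(\left(I_2 \otimes \left((-1)^{\eta_{j-1}}, 1\right)^T\right) \times \left((-1)^{\eta_{j-2}}, 1\right)^T\right)\right] \times \left(I_{2^{j-3}} \otimes \left((-1)^{\eta_{j-3}}, 1\right)^T\right) \times \cdots \times \left(I_{2^r} \otimes \left((-1)^{\eta_r}, 1\right)^T\right),$$

which can be written as

$$\alpha_r^{E}[1:2^r] = \alpha_j^i[1:2^j] \times \left(I_{2^{j-2}} \otimes \left(\left((-1)^{\eta_{j-2}}, 1\right)^T \otimes \left((-1)^{\eta_{j-1}}, 1\right)^T\right)\right) \times \left(I_{2^{j-3}} \otimes \left((-1)^{\eta_{j-3}}, 1\right)^T\right) \times \cdots \times \left(I_{2^r} \otimes \left((-1)^{\eta_r}, 1\right)^T\right),$$

where the identity $\left(I_2 \otimes (a_1, \ldots, a_k)^T\right) \times (b_1, b_2)^T = (b_1, b_2)^T \otimes (a_1, \ldots, a_k)^T$ is used. Repeating the above procedure results in

$$\alpha_r^{E}[1:2^r] = \alpha_j^i[1:2^j] \times \left(I_{2^r} \otimes \left(\left((-1)^{\eta_r}, 1\right)^T \otimes \cdots \otimes \left((-1)^{\eta_{j-1}}, 1\right)^T\right)\right) = \alpha_j^i[1:2^j] \times \left(I_{2^r} \otimes \left((-1)^{s_l[1]}, (-1)^{s_l[2]}, \ldots, (-1)^{s_l[2^{j-r}]}\right)^T\right).$$

Thus, for $k \in \{1, \ldots, 2^r\}$,

$$\alpha_r^{E_l}[k] = \sum_{m=1}^{2^{j-r}} \alpha_j^i\left[(k-1)\,2^{j-r} + m\right] (-1)^{s_l[m]}.$$

This completes the proof.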
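The chain of Kronecker-product manipulations above can be checked numerically. The NumPy sketch below (function names hypothetical) builds the left-hand side as the explicit product of the matrices $I_{2^t} \otimes ((-1)^{\eta_t}, 1)^T$ for $t = j-1, \ldots, r$. It then compares the result with the closed-form block sum, using arbitrary 0/1 choices for the sign bits $\eta_t$. It verifies only the algebra of this appendix, not the definition of the sequence $s_l$ from the earlier sections.

```python
from functools import reduce

import numpy as np


def sign_vec(eta):
    """Column vector ((-1)^eta, 1)^T."""
    return np.array([[(-1.0) ** eta], [1.0]])


def llr_matrix_chain(alpha, etas, r):
    """alpha_j^i[1:2^j] times I_{2^t} (x) ((-1)^eta_t, 1)^T for t = j-1, ..., r.

    etas[t - r] holds eta_t for t = r, ..., j-1.
    """
    j = r + len(etas)
    out = np.asarray(alpha, dtype=float).reshape(1, 2 ** j)  # row vector
    for t in range(j - 1, r - 1, -1):
        out = out @ np.kron(np.eye(2 ** t), sign_vec(etas[t - r]))
    return out.ravel()


def llr_closed_form(alpha, etas, r):
    """Block sums alpha[(k-1) 2^(j-r) + m] * w[m] with w = v_r (x) ... (x) v_{j-1}."""
    w = reduce(np.kron, [sign_vec(e) for e in etas]).ravel()  # entries (-1)^{s_l[m]}
    return np.asarray(alpha, dtype=float).reshape(2 ** r, w.size) @ w


if __name__ == "__main__":
    alpha = np.random.default_rng(0).normal(size=2 ** 6)
    print(np.allclose(llr_matrix_chain(alpha, [1, 0, 1, 1], 2),
                      llr_closed_form(alpha, [1, 0, 1, 1], 2)))
```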
APPENDIX B
PROOF OF PROPOSITION 2
To prove Proposition 2, we first introduce the following lemma.

Lemma 1. For any node $\mathcal{N}_j^i$ whose $2^j$ bits undergo a hard decision in (29), the probability of correct decoding is $\left(\frac{P_c}{P_e + P_c}\right)^{2^j}$.

Proof: Consider any node $\mathcal{N}_j^i$ and assume that all previous bits are decoded correctly. By Fig. 7, the probability that the $k$-th bit ($1 \le k \le 2^j$) in the node undergoes a hard decision is $P_c + P_e$. The probability of a correct hard decision for the $k$-th bit is $P_c$, regardless of the value of $\beta_j^i[k]$. Thus, the conditional probability that the hard decision on the $k$-th bit is correct, given that the bit undergoes a hard decision, is $\frac{P_c}{P_e + P_c}$. Since the LLR values of the bits in a node are independent of each other, the conditional probability that the hard decisions on all $2^j$ bits of node $\mathcal{N}_j^i$ are correct, given that all $2^j$ bits undergo hard decisions, is $\left(\frac{P_c}{P_e + P_c}\right)^{2^j}$.
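We want the probability of correct decoding to be at least $\varepsilon$ for all nodes that undergo hard decision in a polar code of length $2^n$. To this end, any such node $\mathcal{N}_j^i$ is assumed to have a probability of correct decoding of at least $\sqrt[2^{n-j}]{\varepsilon} = \varepsilon^{2^j/2^n}$. Under this assumption, the probability of correct decoding is at least $\varepsilon$ even in the worst case that all bits in the code are decoded by hard decision: each such node contributes a factor of at least $\varepsilon^{2^j/2^n}$, and the nodes together cover at most $2^n$ bits. Therefore, by Lemma 1,

$$\left(\frac{P_c}{P_e + P_c}\right)^{2^j} \ge \sqrt[2^{n-j}]{\varepsilon}, \tag{45}$$

which is equivalent to

$$\frac{P_c}{P_e + P_c} \ge \sqrt[2^n]{\varepsilon}. \tag{46}$$

If $m_j^i \le 2c^2$, then $T = -m_j^i + c\sqrt{2m_j^i}$, $P_c = Q\left(c - \sqrt{2m_j^i}\right)$, and $P_e = Q(c)$. Thus, (46) can be written as

$$\frac{1}{2}\left[c - Q^{-1}\left(\frac{Q(c)}{\frac{1}{\sqrt[2^n]{\varepsilon}} - 1}\right)\right]^2 \le m_j^i \le 2c^2, \tag{47}$$

which requires

$$Q(c) \le 1 - \sqrt[2^n]{\varepsilon}. \tag{48}$$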
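The step from (46) to (47) and (48) can be expanded as follows (a worked derivation added for readability). Write $\delta = \sqrt[2^n]{\varepsilon}$ and use that $Q(\cdot)$ is strictly decreasing. In the case $m_j^i \le 2c^2$,

$$\begin{aligned}
\frac{P_c}{P_e + P_c} \ge \delta
&\iff P_c \ge \frac{\delta}{1-\delta}\,P_e
\iff Q\left(c - \sqrt{2m_j^i}\right) \ge \frac{Q(c)}{\delta^{-1} - 1} \\
&\iff \sqrt{2m_j^i} \ge c - Q^{-1}\left(\frac{Q(c)}{\delta^{-1} - 1}\right)
\iff m_j^i \ge \frac{1}{2}\left[c - Q^{-1}\left(\frac{Q(c)}{\delta^{-1} - 1}\right)\right]^2 .
\end{aligned}$$

The last step requires $c - Q^{-1}\left(\frac{Q(c)}{\delta^{-1}-1}\right) > 0$. This holds whenever $\delta > 1/2$, since then $\frac{Q(c)}{\delta^{-1}-1} = \frac{\delta}{1-\delta}Q(c) > Q(c)$.

The lower bound does not exceed $2c^2$ if and only if $Q^{-1}\left(\frac{Q(c)}{\delta^{-1}-1}\right) \ge -c$, i.e.,

$$\frac{Q(c)}{\delta^{-1}-1} \le Q(-c) = 1 - Q(c).$$

This simplifies to $Q(c) \le 1 - \delta$, which is condition (48).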
If $m_j^i \ge 2c^2$, then $T = m_j^i - c\sqrt{2m_j^i}$, $P_c = Q(-c)$, and $P_e = Q\left(\sqrt{2m_j^i} - c\right)$. Thus, (46) can be written as

$$m_j^i \ge \max\left\{2c^2,\ \frac{1}{2}\left[c + Q^{-1}\left(\left(\frac{1}{\sqrt[2^n]{\varepsilon}} - 1\right) Q(-c)\right)\right]^2\right\}. \tag{49}$$

If (48) holds, then, using the fact that $Q(-c) = 1 - Q(c)$,

$$2c^2 \ge \frac{1}{2}\left[c + Q^{-1}\left(\left(\frac{1}{\sqrt[2^n]{\varepsilon}} - 1\right) Q(-c)\right)\right]^2. \tag{50}$$

Thus, (49) reduces to $m_j^i \ge 2c^2$, which always holds by the assumption of this case. Therefore, it is sufficient to have

$$m_j^i \ge \frac{1}{2}\left[c - Q^{-1}\left(\frac{Q(c)}{\frac{1}{\sqrt[2^n]{\varepsilon}} - 1}\right)\right]^2, \tag{51}$$

together with (48), to ensure (46). In other words, if (48) and (51) are satisfied, the probability that all nodes undergoing hard decision (29) in the decoding process are decoded correctly is lower bounded by $\varepsilon$. Thus, the probability of hard-decision error is upper bounded by $1 - \varepsilon$. This completes the proof.
APPENDIX C
PROOF OF PROPOSITION 3
Based on Proposition 2, any node $\mathcal{N}_j^i$ that is decoded using (29) has a probability of correct hard decision of at least $\sqrt[2^{n-j}]{\varepsilon}$. For any node that undergoes SRFSC decoding, the probability of correct decoding is determined by the error rate of SRFSC decoding. Thus, the probability of correct decoding for TA-SRFSC decoding is at least $\varepsilon\left(1 - \mathrm{BLER}_{\text{SRFSC}}\right)$. Consequently, $\mathrm{BLER}_{\text{TA-SRFSC}} \le 1 - \varepsilon\left(1 - \mathrm{BLER}_{\text{SRFSC}}\right)$. This completes the proof.
ACKNOWLEDGMENTS
This work is supported by ERC Proof-of-Concept project
BROWSE+; NWO Zwaartekracht program on Integrated
Nanophotonics; International collaboration fund of Changchun
Institute of Optics, Fine Mechanics and Physics (CIOMP);
Open Fund of the State Key Laboratory of Optoelectronic
Materials and Technologies (Sun Yat-sen University); and
Huawei. In addition, S. A. Hashemi is supported by a Postdoc-
toral Fellowship from the Natural Sciences and Engineering
Research Council of Canada (NSERC).
REFERENCES
[1] E. Arıkan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
[2] 3GPP, “Final report of 3GPP TSG RAN WG1 #87 v1.0.0,” Reno,
USA, Nov. 2016. [Online]. Available: http://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_87.
[3] K. Niu, K. Chen, J. Lin, and Q. Zhang, “Polar codes: Primary concepts
and practical decoding algorithms,” IEEE Commun. Mag., vol. 52, no. 7,
pp. 192–203, 2014.
[4] C. Zhang, B. Yuan, and K. K. Parhi, “Reduced-latency SC polar decoder
architectures,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2012, pp.
3471–3475.
[5] A. Mishra, A. J. Raymond, L. G. Amaru, G. Sarkis, C. Leroux,
P. Meinerzhagen, A. Burg, and W. J. Gross, “A successive cancellation
decoder ASIC for a 1024-bit polar code in 180nm CMOS,” in Proc. Asian
Solid-State Circuits Conf., Nov. 2012, pp. 205–208.
[6] B. Yuan and K. K. Parhi, “Reduced-latency LLR-based SC list decoder
for polar codes,” in Proc. 25th Ed. Great Lakes Symp. VLSI, May 2015,
pp. 107–110.
[7] ——, “Low-latency successive-cancellation list decoders for polar codes
with multibit decision,” IEEE Trans. VLSI Syst., vol. 23, no. 10, pp.
2268–2280, Oct. 2015.
[8] G. Sarkis and W. J. Gross, “Increasing the throughput of polar decoders,”
IEEE Commun. Lett., vol. 17, no. 4, pp. 725–728, Apr. 2013.
[9] C. Husmann, P. C. Nikolaou, and K. Nikitopoulos, “Reduced latency
ML polar decoding via multiple sphere-decoding tree searches,” IEEE
Trans. Veh. Technol., vol. 67, no. 2, pp. 1835–1839, Feb. 2017.
[10] A. Alamdar-Yazdi and F. R. Kschischang, “A simplified successive-
cancellation decoder for polar codes,” IEEE Commun. Lett., vol. 15,
no. 12, pp. 1378–1380, Dec. 2011.
[11] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast polar
decoders: Algorithm and implementation,” IEEE J. Sel. Areas Commun.,
vol. 32, no. 5, pp. 946–957, May 2014.
[12] Z. Huang, C. Diao, and M. Chen, “Latency reduced method for modified
successive cancellation decoding of polar codes,” Electron. Lett., vol. 48,
no. 23, pp. 1505–1506, Nov. 2012.
[13] A. Balatsoukas-Stimming, G. Karakonstantis, and A. Burg, “Enabling
complexity-performance trade-offs for successive cancellation decoding
of polar codes,” in Proc. IEEE Int. Symp. Inf. Theory Process. (ISIT),
Jun. 2014, pp. 2977–2981.
[14] L. Zhang, Z. Zhang, X. Wang, C. Zhong, and L. Ping, “Simplified
successive-cancellation decoding using information set reselection for
polar codes with arbitrary blocklength,” IET Commun., vol. 9, no. 11,
pp. 1380–1387, Jul. 2015.
[15] P. Giard, A. Balatsoukas-Stimming, G. Sarkis, C. Thibeault, and W. J.
Gross, “Fast low-complexity decoders for low-rate polar codes,” J. Sign.
Process. Syst., vol. 90, no. 5, pp. 675–685, May 2018.
[16] M. Hanif and M. Ardakani, “Fast successive-cancellation decoding of
polar codes: identification and decoding of new nodes,” IEEE Commun.
Lett., vol. 21, no. 11, pp. 2360–2363, Nov. 2017.
[17] C. Condo, V. Bioglio, and I. Land, “Generalized fast decoding of polar
codes,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec.
2018, pp. 1–6.
[18] H. Gamage, V. Ranasinghe, N. Rajatheva, and M. Latva-aho, “Low
latency decoder for short blocklength polar codes,” arXiv preprint
arXiv:1911.03201, 2019.
[19] S. Li, Y. Deng, L. Lu, J. Liu, and T. Huang, “A low-latency simplified
successive cancellation decoder for polar codes based on node error
probability,” IEEE Commun. Lett., vol. 22, no. 12, pp. 2439–2442, Dec.
2018.
[20] H. Sun, R. Liu, and C. Gao, “A simplified decoding method of polar
codes based on hypothesis testing,” IEEE Commun. Lett., pp. 1–1, Jan.
2020.
[21] C. Leroux, I. Tal, A. Vardy, and W. J. Gross, “Hardware architectures
for successive cancellation decoding of polar codes,” in Proc. IEEE Int.
Conf. Acoust., Speech Signal Process., May 2011, pp. 1665–1668.
[22] R. Silverman and M. Balser, “Coding for constant-data-rate systems,”
Trans. IRE Prof. Group Inf. Theory, vol. 4, no. 4, pp. 50–63, Sep. 1954.
[23] P. Trifonov, “Efficient design and decoding of polar codes,” IEEE Trans.
Commun., vol. 60, no. 11, pp. 3221–3227, Nov. 2012.
[24] Z. Zhang and L. Zhang, “A split-reduced successive cancellation list
decoder for polar codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2,
pp. 292–302, Feb. 2016.
[25] M. El-Khamy, J. Lee, and I. Kang, “Detection analysis of CRC-assisted
decoding,” IEEE Commun. Lett., vol. 19, no. 3, pp. 483–486, Mar. 2015.
[26] 3GPP, “3GPP TS RAN 38.212 v1.2.1,” Dec. 2017. [Online]. Available:
http://www.3gpp.org/ftp/Specs/archive/38_series/38.212/38212-f30.zip.
