Segmented Successive Cancellation List Polar Decoding with Tailored CRC by Zhou, Huayi et al.
Huayi Zhou · Xiao Liang · Liping Li · Zaichen Zhang · Xiaohu You ·
Chuan Zhang∗
Received: October 00, 2017 / Accepted: date
Abstract As the first error correction codes provably
achieving the symmetric capacity of binary-input dis-
crete memory-less channels (B-DMCs), polar codes have
been recently chosen by 3GPP for eMBB control chan-
nel. Among existing algorithms, CRC-aided successive
cancellation list (CA-SCL) decoding is favorable due to
its good performance, where CRC is placed at the end
of the decoding and helps to eliminate the invalid can-
didates before final selection. However, the good per-
formance is obtained with a complexity increase that
is linear in list size L. In this paper, the tailored CRC-
aided SCL (TCA-SCL) decoding is proposed to balance
performance and complexity. A method for choosing the proper CRC for a given segment is developed with the help of the virtual transform and virtual length. For further performance improvement, a hybrid automatic repeat request (HARQ) scheme is incorporated. Numerical results show that, at a complexity similar to the state of the art, the proposed TCA-SCL and HARQ-TCA-SCL schemes achieve 0.1 dB and 0.25 dB performance gains at frame error rate FER = $10^{-2}$, respectively. Finally, an efficient TCA-SCL decoder is implemented on FPGA, demonstrating its advantages over the CA-SCL decoder.
Huayi Zhou · Xiao Liang · Zaichen Zhang · Xiaohu You ·
Chuan Zhang∗
Lab of Efficient Architectures for Digital-communication and
Signal-processing (LEADS),
National Mobile Communications Research Laboratory,
Southeast University, Nanjing, China
E-mail: {hyzhou, xiao liang, zczhang, xhyu,
chzhang}@seu.edu.cn
∗corresponding author
Liping Li
Key Laboratory of Intelligent Computing and Signal Process-
ing of the MoE, Anhui University, Hefei, China
E-mail: liping li@ahu.edu.cn
Keywords Polar codes · segmented CA-SCL · tailored
CRC · HARQ · VLSI
1 Introduction
Polar codes, proposed by Arıkan [1, 2], are considered
as a breakthrough of coding theory. It is shown that
polar codes can provably achieve the symmetric capac-
ity of binary-input discrete memory-less channels (B-
DMCs) [2]. Besides the capacity achieving performance,
the asset of polar coding compared to the state-of-the-
art (SOA) is its corresponding low-complexity decoding
algorithms. Therefore, polar codes have been adopted
by 3GPP for eMBB control channels.
Though linear programming (LP) decoder [3], suc-
cessive cancellation (SC) decoder, and belief propaga-
tion (BP) decoder [4, 5] have been proposed for polar
codes, their performance is not comparable with maxi-
mum likelihood (ML) decoder. Thus, the breadth-first
SC decoder named SC list (SCL) decoder, was pro-
posed by [6, 7]. Cyclic redundancy check (CRC), widely
adopted for error detection, has been proved as a sim-
ple and effective enabler for further performance im-
provement with respect to SCL decoder. Numerical re-
sults have shown that, CRC-aided SCL (CA-SCL) de-
coder [8] achieves at least no worse performance than
the SOA turbo and low-density parity-check (LDPC)
decoders [9]. Usually, CRC is placed at the end of decoding to eliminate invalid candidates before the final decision. The disadvantages are: 1) Though it performs better than the SCL decoder, the CA-SCL decoder still suffers from time and space complexity that grow with the list size L. 2) Intermediate candidates that have already gone wrong cannot be eliminated early; they are carried until the end of decoding is reached, and the computation spent on them afterward is in vain.
arXiv:1803.00521v2 [eess.SP] 16 Mar 2018
To address the complexity and redundancy, [10] pro-
posed a segmented CA-SCL (SCA-SCL) decoder. At
the same time, [11] independently proposed a partitioned CA-SCL (PSCL) decoder, which is similar to the SCA-SCL decoder but uses a different partition method. Both decoders divide the code bits into segments and insert CRC bits in between, so that invalid candidates are ruled out per segment rather than at the end of decoding. Thus, they reduce redundancy while keeping performance comparable to that of CA-SCL decoders.
However, existing decoders usually apply the same CRC
length to the same number of information or code bits.
Though convenient, those straightforward schemes fail
to take the code construction into consideration. It is
not clear whether the existing uniform partition schemes
are optimal and whether better performance can be
achieved with the same number of CRC bits.
To the best of our knowledge, no existing literature has discussed the CRC distribution for SCA-SCL decoding or its hardware implementation. By analyzing the CRC requirement of unequal-length segments and introducing the concepts of virtual transform and virtual length, this paper develops a tailored CA-SCL (TCA-SCL) decoding with better performance and lower complexity than the SOA. A HARQ-TCA-SCL decoding is proposed for further performance improvement. The contributions of this paper are: 1) An efficient CRC distribution is proposed for the first time, showing a performance advantage over the SOA. 2) This paper does not limit itself to a specific decoder design, but proposes a formal TCA methodology that can be readily applied to any existing SCA-SCL decoder. 3) An efficient implementation methodology is also proposed and verified with FPGA implementations.
The remainder of the paper is organized as follows.
Section 2 reviews the preliminaries. Section 3 analyzes
the SCA-SCL decoders for possible refinement. The TCA-
SCL decoding is given in Section 4. The HARQ-TCA-
SCL decoding is given in Section 5. Section 6 gives the
performance and complexity analysis of the proposed
decoding schemes. Section 7 proposes a hardware ar-
chitecture for TCA-SCL decoding. FPGA implementa-
tions are given in the same section. Finally, Section 8
concludes the entire paper.
2 Preliminaries
2.A Polar Codes
Denote the input alphabet, output alphabet, and transition probabilities of a B-DMC by $\mathcal{X}$, $\mathcal{Y}$, and $W(y|x)$. With block length $N = 2^n$, the information vector, encoded vector, and received vector are $u_1^N = (u_1, ..., u_N)$, $x_1^N = (x_1, ..., x_N)$, and $y_1^N = (y_1, ..., y_N)$. The polar encoding is given by
$x_1^N = u_1^N G_N = u_1^N B_N F^{\otimes n}$, (1)
where $G_N$ and $B_N$ are the generator matrix and the bit-reversal permutation matrix, respectively, and $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.
The transmitting channels between $x_1^N$ and $y_1^N$ are $W_N^{(i)}(y_1^N, u_1^{i-1} | u_i)$, derived by channel combining
$W_N(y_1^N | x_1^N) = W_N(y_1^N | u_1^N G_N)$ (2)
and channel splitting
$W_N^{(i)}(y_1^N, u_1^{i-1} | u_i) = \sum_{u_{i+1}^N} \frac{1}{2^{N-1}} W_N(y_1^N | x_1^N), \quad i = 1, ..., N.$ (3)
Define $I(W)$ as the symmetric capacity. For a B-DMC $W$ and any $\delta \in (0, 1)$, the channels $W_N^{(i)}$ polarize: as $N$ goes to infinity via powers of 2, the fraction of indices $i$ with $I(W_N^{(i)}) \in (1 - \delta, 1]$ approaches $I(W)$, and the fraction with $I(W_N^{(i)}) \in [0, \delta)$ approaches $1 - I(W)$. In an $(N, K)$ code, the $K$ most reliable channels, with indices in the information set $\mathcal{A}$, are chosen to transmit the $K$ information bits in $u_1^N$, whereas the others, with indices in the frozen set $\mathcal{A}^c$, transmit the $(N - K)$ frozen bits.
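As an illustration of Eq. (1), the following minimal sketch computes $x_1^N = u_1^N B_N F^{\otimes n}$ over GF(2). The function names `bit_reverse` and `polar_encode` are hypothetical and chosen only for this example; indices are 0-based, and the Kronecker power is applied as $n$ butterfly stages, which is one common way to evaluate it rather than a method prescribed by this paper.

```python
def bit_reverse(index, n):
    """Reverse the n-bit binary representation of index."""
    result = 0
    for _ in range(n):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def polar_encode(u):
    """Return x = u * B_N * F^{(x)n} over GF(2) for a bit list u of length 2^n."""
    length = len(u)
    n = length.bit_length() - 1
    if length == 0 or (1 << n) != length:
        raise ValueError("block length must be a power of 2")
    # u * B_N: B_N is the bit-reversal permutation matrix.
    x = [u[bit_reverse(j, n)] for j in range(length)]
    # Multiply by F^{(x)n} using n butterfly stages (XOR is addition in GF(2)).
    half = 1
    while half < length:
        for start in range(0, length, 2 * half):
            for j in range(start, start + half):
                x[j] ^= x[j + half]
        half *= 2
    return x
```

For N = 4, the input (0, 1, 0, 0) selects the second row of $G_4$, so `polar_encode([0, 1, 0, 0])` returns `[1, 0, 1, 0]`.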
2.B SC and SCL Polar Decoders
The SC polar decoding tree is a full binary tree. Fig. 1 shows a toy example for N = 8. For each node at the n-th level, the two possible choices are 0 and 1. Each set consisting of all the leaf nodes is associated with a unique estimated codeword $\hat{u}_1^N = (\hat{u}_1, \hat{u}_2, ..., \hat{u}_N)$. If $i \in \mathcal{A}^c$, $\hat{u}_i = 0$. Otherwise, the SC decoder computes its log-likelihood ratio (LLR):
$L_N^{(i)}(y_1^N, \hat{u}_1^{i-1}) = \log \frac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} | u_i = 0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} | u_i = 1)}$, (4)
and generates its decision as
$\hat{u}_i = \begin{cases} 0, & \text{if } L_N^{(i)}(y_1^N, \hat{u}_1^{i-1}) \ge 0; \\ 1, & \text{otherwise.} \end{cases}$ (5)
The LLR updating is conducted based on the two equations listed in Eq. (6), where max* denotes the Jacobi logarithm:
$\max{}^*(x_1, x_2) \triangleq \ln(e^{x_1} + e^{x_2}) = \max(x_1, x_2) + \ln(1 + e^{-|x_1 - x_2|})$. (7)
This recursive process starts from each (sub-)tree's root and always traverses the left branch before the right (Fig. 1). When the leaf level is reached, a hard decision is made and returned to the parent node.
$\begin{aligned} L_N^{(2i-1)}(y_1^N, \hat{u}_1^{2i-2} | u_{2i-1}) = \max{}^*\big( & L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2} | u_{2i-1}) + L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2} | 0), \\ & L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2} | \bar{u}_{2i-1}) + L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2} | 1) \big), \\ L_N^{(2i)}(y_1^N, \hat{u}_1^{2i-1} | u_{2i}) = \; & L_{N/2}^{(i)}(y_1^{N/2}, \hat{u}_{1,o}^{2i-2} \oplus \hat{u}_{1,e}^{2i-2} | u_{2i-1} \oplus u_{2i}) + L_{N/2}^{(i)}(y_{N/2+1}^{N}, \hat{u}_{1,e}^{2i-2} | u_{2i}). \end{aligned}$ (6)
Fig. 1 Tree illustration of SC decoding process.
As a greedy search algorithm, SC decoding keeps only one path based on step-wise decisions, with complexity O(N log N). However, this single-candidate method only guarantees local optimality and may produce an incorrect result. To this end, SCL decoding, which keeps a list of L survivors, was proposed by [6, 7] independently. Fig. 2 illustrates the difference between the SC and SCL algorithms. The complexity of the SCL decoder is O(LN log N). At the i-th step, if $i \in \mathcal{A}$, the SCL decoder splits each current path into two paths with $\hat{u}_i = 0$ and $\hat{u}_i = 1$. Out of the 2L paths, only the L best ones are kept. Finally, the decoder chooses the best path at the end of the decoding process.
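The split-and-prune step described above can be sketched as follows. This is a toy illustration under stated assumptions, not the decoder of [6, 7]: `scl_step` and the callback `bit_score` are hypothetical names, a path is a pair (bits, metric), and a larger metric is assumed to mean a more likely path. In a real decoder the score increment would come from the recursion in Eq. (6).

```python
def scl_step(paths, is_info_bit, bit_score, list_size):
    """Extend every surviving path by one bit and keep the list_size best.

    paths: list of (bits, metric) with bits a tuple of 0/1 decisions.
    bit_score(bits, b): metric increment for appending bit b (assumed callback).
    """
    candidates = []
    for bits, metric in paths:
        # Information bits split the path in two; frozen bits are fixed to 0.
        for b in ((0, 1) if is_info_bit else (0,)):
            candidates.append((bits + (b,), metric + bit_score(bits, b)))
    # Out of at most 2L candidates, only the L best ones survive.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:list_size]
```

Starting from `[((), 0.0)]` and calling `scl_step` once per bit index reproduces the breadth-first search of Fig. 2 for a chosen list size.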
Fig. 2 SC decoding and SCL decoding with L = 2.
Fig. 3 CA-SCL polar decoding.
2.C CA-SCL Polar Decoder
For further improvement, the CA-SCL decoder introduces CRC as a detection tool at the end of decoding [8]. As illustrated in Fig. 3, the CRC detector helps to decide which candidates are possibly correct before metric comparison. Here, m denotes the number of CRC bits. The CRC-passed candidate with the largest metric value is chosen as the final result. If no candidate passes the CRC detection, a decoding failure is declared.
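The final selection rule can be written in a few lines. In this sketch, `ca_scl_select` and the callback `passes_crc` are hypothetical names; the CRC check itself is left abstract, and `None` is used here to signal a decoding failure.

```python
def ca_scl_select(candidates, passes_crc):
    """Pick the CRC-passing candidate with the largest metric.

    candidates: list of (bits, metric) produced by the SCL decoder.
    passes_crc(bits): assumed callback returning True if the CRC holds.
    Returns the chosen bits, or None when no candidate passes.
    """
    passing = [c for c in candidates if passes_crc(c[0])]
    if not passing:
        return None  # decoding failure
    return max(passing, key=lambda c: c[1])[0]
```

Note that the candidate with the overall largest metric is discarded when it fails the CRC, which is where CA-SCL differs from plain SCL.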
3 Segmented CA-SCL Decoding Schemes
In this section, we first introduce two SCA-SCL decoding schemes, then propose a refined version. Without loss of generality, the (1024, 512) code [2] is employed as a running example, whose polarization is shown in Fig. 4. Here $W$ is a BEC with erasure probability $\epsilon = 0.5$, and $I(W_N^{(i)})$ is computed by
$I(W_N^{(2i-1)}) = I(W_{N/2}^{(i)})^2$,
$I(W_N^{(2i)}) = 2 I(W_{N/2}^{(i)}) - I(W_{N/2}^{(i)})^2$, (8)
with $I(W_1^{(1)}) = 1 - \epsilon$. The blue stars in Fig. 4 denote the information bits, whereas the red points denote the frozen bits.
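The recursion in Eq. (8) is straightforward to evaluate. The sketch below computes the symmetric capacities of all $N = 2^n$ polarized BEC channels and picks the K most reliable indices; the function names are hypothetical, indices are 0-based, and ties between equally reliable channels are broken by index order, which is an assumption of this example.

```python
def bec_capacities(n, erasure_prob=0.5):
    """Symmetric capacities I(W_N^{(i)}) of a BEC for N = 2^n, via Eq. (8)."""
    caps = [1.0 - erasure_prob]  # I(W_1^{(1)}) = 1 - epsilon
    for _ in range(n):
        nxt = []
        for c in caps:
            nxt.append(c * c)            # channel 2i-1 (degraded)
            nxt.append(2.0 * c - c * c)  # channel 2i (upgraded)
        caps = nxt
    return caps


def information_set(caps, k):
    """0-based indices of the k channels with the largest capacity."""
    order = sorted(range(len(caps)), key=lambda i: caps[i], reverse=True)
    return sorted(order[:k])
```

For the running example, `information_set(bec_capacities(10), 512)` gives the information set of the (1024, 512) code under this construction. Each recursion step preserves the total capacity, so the capacities always sum to $N(1 - \epsilon)$.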
Fig. 4 Channel polarization for a BEC with $\epsilon$ = 0.5. (Horizontal axis: channel index, 1 to 1024; vertical axis: symmetric capacity, 0 to 1; legend: frozen bits, information bits.)
3.A Comparison of Different Segmented Schemes
To the authors' best knowledge, there are two segmented CRC-aided SCL methods. The PSCL scheme proposed in [11] aims to reduce memory consumption and applies uniform partitions to the code bits for implementation convenience. It always forces the number of candidate paths to 1 after each CRC, so the hardware reduction comes at the cost of some performance loss compared to the conventional CA-SCL algorithm. The SCA-SCL scheme proposed in [10] aims to reduce both the time and space complexity. Uniform segments are applied to the information bits, and CRC is employed as a tool to eliminate decoding redundancy without harming the performance.
Fig. 5 Different segmented decoding schemes with P = 4. (PSCL: uniform partition of code bits into T1, ..., T4; SCA-SCL: uniform partition of information bits into S1, ..., S4.)
Let $P$ denote the number of segments. For the PSCL decoder, the index set of Segment-$i$ is $T_i$ ($1 \le i \le P$), and $|\cdot|$ denotes the cardinality of a set. We have
$\sum_{i=1}^{P} |T_i| = N$, and $|T_i| = N/P$. (9)
For the SCA-SCL decoder, the index set of Segment-$i$ is $S_i$ ($1 \le i \le P$). We have
$\sum_{i=1}^{P} |S_i| = N$, and $|S_i \cap \mathcal{A}| = K/P$. (10)
One simple example with $P = 4$ is illustrated in Fig. 5.
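The two partition rules in Eqs. (9) and (10) can be contrasted with a short sketch. The function names are hypothetical and indices are 0-based. Eq. (10) fixes only the number of information bits per segment, so the exact boundary placement below (each segment ends right after its last information bit, and the last segment absorbs the remaining code bits) is an assumption of this example.

```python
def pscl_partition(n_bits, num_segments):
    """Eq. (9): uniform partition of the code bits, |T_i| = N / P."""
    size = n_bits // num_segments
    return [list(range(s * size, (s + 1) * size)) for s in range(num_segments)]


def sca_scl_partition(n_bits, info_set, num_segments):
    """Eq. (10): every segment holds K / P information bits."""
    info = sorted(info_set)
    per_segment = len(info) // num_segments
    segments, start = [], 0
    for s in range(num_segments):
        if s == num_segments - 1:
            end = n_bits  # last segment takes all remaining code bits
        else:
            end = info[(s + 1) * per_segment - 1] + 1
        segments.append(list(range(start, end)))
        start = end
    return segments
```

With N = 8, information set {3, 5, 6, 7}, and P = 2, the code-bit partition yields segments holding 1 and 3 information bits, whereas the information-bit partition yields segments of 6 and 2 code bits holding 2 information bits each.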
Theoretically, both schemes are similar but differ
in the partition methods. PSCL decoding employs uni-
form code bit partition, which is implementation friendly.
However, since only one candidate can survive after
each CRC, small performance degradation is expected,
especially in low SNR region. SCA-SCL decoding em-
ploys uniform information bit partition, which can keep
the performance as CA-SCL decoding while successfully
reducing the space and time complexity. This advantage
comes from the decoding flexibility. However, the flexi-
bility will make the implementation more complicated.
3.B PSCL with Early Termination
The first observation is that both schemes apply the same CRC to the uniformly partitioned segments. Because it ignores the symmetric capacity of each binary channel, this straightforward scheme may not be optimal. The second observation is that both schemes have their own merits, so it would be smarter to merge them. In other words, we expect that a new approach can be both implementation friendly and adaptive.
One simple mixture of both schemes is to introduce early termination to PSCL decoding. However, this simple combination may not be reasonable in certain cases. Fig. 6 gives an example with P = 4. As shown in Fig. 7, for the (1024, 512) code with m = 32, the information lengths of the four segments are 20, 123, 156, and 245, respectively. If uniform CRC bits are employed, each segment carries m/P = 8 CRC bits, so the first segment has $|T'_1| = |T_1| - |C_1| = 12$ information bits and the last segment has $|T'_4| = |T_4| - |C_4| = 237$ information bits. It is unreasonable to apply the same 8-bit CRC to both the 12-bit and the 237-bit segments. To this end, the TCA-SCL decoding is proposed in the following section.
4 CA-SCL Decoding with Tailored CRC
In this section, we first discuss how to measure the
requirement of CRC bits for different segments. Then
the concepts of virtual transform and virtual length are
introduced. A visualization method of polarized chan-
nel’s symmetric capacity is also proposed. The detailed
TCA-SCL decoding is finally proposed. It should be
noted that though the TCA-SCL decoding is based on
uniform partition of code bits, it can be readily applied
to other uniform or nonuniform partition schemes.
4.A Requirement of CRC Length for Polar Codes
Assume a total of m CRC bits are available and are divided among the P segments as C1, C2, . . . , CP . It may not be suitable to set |C1| = |C2| = . . . = |CP | for P segments with different lengths. How to measure the CRC length required by each segment is therefore critical. To the authors' best knowledge, no literature has addressed this specific problem. For independent channels, it is concluded that a longer sequence requires more CRC bits to maintain the same error detection capability [12]. However, this conclusion does not suit polar codes, because the reliabilities of the polarized channels differ. A reasonable measure of the required CRC length should take both the sequence length and the symmetric capacity into account. In the following, the concepts of virtual transform and virtual length are proposed to this end.
4.B Virtual Transform and Virtual Length
Including CRC bits, we always pick the $K + m$ most reliable bits out of $N$ based on the symmetric capacity $I(W_N^{(i)})$ with $i \in \mathcal{A}'$. $\mathcal{A}'$ is the new information set including CRC bits, and $|\mathcal{A}'| = K + m$. Calculate $\bar{I}$ as follows:
$\bar{I} = \frac{1}{K + m} \sum_{i \in \mathcal{A}'} I(W_N^{(i)})$. (11)
Fig. 6 PSCL decoding with early termination.
Fig. 7 The CRC allocation of PSCL (N = 1024, K = 512).
Definition 1 Virtual Transform To operate the virtual transform, we first calculate $I'(i)$:
$I'(i) = \bar{I} / I(W_N^{(i)})$. (12)
The virtual value of the channel is
$J(i) = \begin{cases} 1 + \dfrac{I'(i) - 1}{2(1 - \bar{I})}, & \text{if } I'(i) \ge 1; \\ 1 - \dfrac{1 - I'(i)}{2(1 - \bar{I})}, & \text{if } I'(i) < 1. \end{cases}$ (13)
Definition 2 Virtual Length The summation of $J(i)$ in the $k$-th segment is its virtual length:
$vl_k = \sum_{i \in \{T_k \cap \mathcal{A}'\}} J(i)$. (14)
The CRC allocation is given by
$|C_1| : \ldots : |C_P| = \mathrm{adjust}\left( \dfrac{m \times vl_1}{\sum_{i=1}^{P} vl_i}, \ldots, \dfrac{m \times vl_P}{\sum_{i=1}^{P} vl_i} \right)$, (15)
where adjust(·) is a function that adjusts the allocation results to nearby integers through the following steps: 1) find an unmarked $k$ that has the minimum $\left| \mathrm{ROUND}\left( \frac{m \times vl_k}{\sum_{i=1}^{P} vl_i} \right) - \frac{m \times vl_k}{\sum_{i=1}^{P} vl_i} \right|$, then mark $k$ and set $|C_k| = \mathrm{ROUND}\left( \frac{m \times vl_k}{\sum_{i=1}^{P} vl_i} \right)$; 2) repeat step 1) $(P - 2)$ times; 3) let the remaining unmarked index be $k'$, and set $|C_{k'}| = m - \sum_{i \ne k'} |C_i|$, where $1 \le i \le P$.
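Eqs. (11) to (15) translate almost directly into code. The sketch below is one possible reading of the definitions, with hypothetical function names and 0-based indices. ROUND is taken as round-half-up, ties in step 1) are broken by index order, and $\bar{I} < 1$ is assumed so that the denominator in Eq. (13) is nonzero; none of these details is fixed by the text above.

```python
import math


def virtual_values(caps, info_set):
    """Virtual transform: J(i) for every i in the extended information set A'."""
    mean_cap = sum(caps[i] for i in info_set) / len(info_set)  # Eq. (11)
    values = {}
    for i in info_set:
        ratio = mean_cap / caps[i]  # Eq. (12)
        if ratio >= 1:              # Eq. (13), both branches as stated
            values[i] = 1 + (ratio - 1) / (2 * (1 - mean_cap))
        else:
            values[i] = 1 - (1 - ratio) / (2 * (1 - mean_cap))
    return values


def virtual_lengths(values, segments):
    """Eq. (14): sum of J(i) over the information indices of each segment."""
    return [sum(values[i] for i in seg if i in values) for seg in segments]


def allocate_crc(vl, m):
    """Eq. (15): split m CRC bits in proportion to the virtual lengths."""
    total = sum(vl)
    shares = [m * v / total for v in vl]
    rounded = [math.floor(s + 0.5) for s in shares]
    # Steps 1) and 2): round the P - 1 shares closest to an integer first.
    order = sorted(range(len(vl)), key=lambda k: abs(rounded[k] - shares[k]))
    alloc = [0] * len(vl)
    for k in order[:-1]:
        alloc[k] = rounded[k]
    # Step 3): the remaining segment receives whatever is left of m.
    alloc[order[-1]] = m - sum(alloc)
    return alloc
```

Because the last segment absorbs the rounding error, the allocation always sums to exactly m.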
4.C Visualization of Channel Symmetric Capacity
Before giving more details of the proposed TCA-SCL decoding, we propose a visualization method for the symmetric capacity for easy understanding and illustration. In this visualization, a color gradient across the spectrum is used to represent the symmetric capacity of each channel. According to the legend, the closer the symmetric capacity is to 1 (0), the more bathochromic (hypsochromic) the color. Fig. 8(a) shows the visualization for polar codes with N = 64.
Example 1 For (1024, 512) polar codes with 32 CRC
bits, visualization of code bits is given in Fig. 8(b).
The visualization of 512 information bits is given in
Fig. 8(c). For TCA-SCL decoding, set P = 4. Accord-
ing to Definition 2, the ratio of virtual lengths is:
vl1 : vl2 : vl3 : vl4 = 3.54 : 9.84 : 10.91 : 7.70, (16)
which is illustrated by Fig. 8(d). Then the CRC alloca-
tion is obtained according to Eq. (15):
|C1| : |C2| : |C3| : |C4| = 3 : 10 : 11 : 8. (17)
The refined CRC allocation based on virtual length is
given in Fig. 8(e).
Remark 1 Generally speaking, the hypsochromic part
in the visualization chart mainly contributes to the vir-
tual length. The more hypsochromic segment requires
more CRC bits.
This refined SCA-SCL decoding based on virtual length is named TCA-SCL decoding. Details of TCA-SCL decoding are given as follows. The corresponding performance and implementation are discussed in Section 6 and Section 7.
4.D Tailored CA-SCL Decoding
The detailed tailored CA-SCL decoding is given in this subsection. For TCA-SCL encoding, we set P segments and perform the virtual transform to obtain the corresponding virtual lengths. Then we allocate the CRC bits according to the ratio of virtual lengths before polar encoding.
Fig. 8 Visualization illustration of symmetric capacity. (a) Visualization of symmetric capacity for code bits (N = 64). (b) Visualization of symmetric capacity for code bits (N = 1024). (c) Visualization of symmetric capacity for information bits (N = 1024, K = 512). (d) Four segments of information bits with virtual lengths (N = 1024, K = 512). (e) The CRC allocation of TCA-SCL (N = 1024, K = 512).
Here, addCRC(·) is a function that performs Eq. (15), and the function encoder(·) performs conventional polar encoding. For TCA-SCL decoding, SCL decoding with early termination is performed as follows. Here, the function SCL′(·) is the SCL decoding for Segment-i. Define $U_i$ as the set of output paths of SCL′(·) in the i-th segment. Define passCRC(·) as the function that checks whether at least one path of $U_i$ can pass the CRC. If one or more paths pass the CRC, the one with the largest metric among them is chosen to refresh $\hat{u}_1^N$.
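The segment-wise decoding with early termination described above can be sketched as follows. The segment decoder and the CRC check are passed in as callbacks, since their internals are outside the scope of this example; `tca_scl_decode`, `decode_segment`, and `passes_crc` are hypothetical names, and returning a success flag alongside the decoded segments is a choice made for this illustration.

```python
def tca_scl_decode(num_segments, decode_segment, passes_crc):
    """Decode segment by segment and stop at the first CRC failure.

    decode_segment(i, decoded): stands in for SCL'; returns a list of
        (bits, metric) candidate paths for Segment-i (assumed callback).
    passes_crc(i, bits): CRC check of Segment-i (assumed callback).
    Returns (decoded_segments, success).
    """
    decoded = []
    for i in range(1, num_segments + 1):
        paths = decode_segment(i, decoded)
        passing = [p for p in paths if passes_crc(i, p[0])]
        if not passing:
            return decoded, False  # early termination: skip later segments
        # Keep only the CRC-passing path with the largest metric.
        best_bits, _ = max(passing, key=lambda p: p[1])
        decoded.append(best_bits)
    return decoded, True
```

Since a single path survives each segment, the segments after a failure are never decoded, which is the source of the complexity saving.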
5 TCA-SCL Decoding with HARQ
Besides early termination, the proposed TCA-SCL decoding can also work in a HARQ manner when a segmented CRC fails. HARQ has been widely used in delay-insensitive communication systems for a capacity-approaching throughput [13–15]. Recently, HARQ has been considered for polar decoding. [16] introduced a HARQ scheme based on a class of rate-compatible polar codes constructed by performing punctures and repetitions using punctured polar coding [17]. An incremental redundancy HARQ (IR-HARQ) scheme via puncturing and extending of polar codes is proposed in [18]. Both algorithms use puncturing patterns to suit different rates. However, puncturing causes a performance loss, and remedying it requires hybrid decoding schemes with high complexity. Moreover, the IR-HARQ scheme needs to retransmit frozen bits one by one after transmitting the K information bits. Therefore, the decoding complexity of IR-HARQ is $O(N^2 \log N)$, which is high for a large N.

Algorithm 1 TCA-SCL Polar Encoding
Input: $u_1^N$, $I(W_N^{(i)})$, N, K, m, P.
Set P segments;
$\bar{I} = \frac{1}{K+m} \sum_{i \in \mathcal{A}'} I(W_N^{(i)})$;
for i = 1; i <= N; i++ do
    $I'(i) = \bar{I} / I(W_N^{(i)})$;
    if $I'(i) \ge 1$ then
        $J(i) = 1 + \frac{I'(i) - 1}{2(1 - \bar{I})}$;
    else
        $J(i) = 1 - \frac{1 - I'(i)}{2(1 - \bar{I})}$;
    end if
end for
for k = 1; k <= P; k++ do
    $vl_k = \sum_{i \in \{T_k \cap \mathcal{A}'\}} J(i)$;
end for
addCRC($u_1^N$, $vl_1$, $vl_2$, ..., $vl_P$);
$x_1^N$ = encoder($u_1^N$).
Output: $x_1^N$.

Algorithm 2 TCA-SCL Polar Decoding
Input: $y_1^N$, N, P, L.
for i = 1; i <= P; i++ do
    $U_i$ = SCL′($y_1^N$, i, L);
    if passCRC($U_i$) = false then
        break;
    end if
    refresh $\hat{u}_1^N$ by the survival path in $U_i$;
end for
Output: $\hat{u}_1^N$.
To overcome this issue, we give a HARQ-TCA-SCL scheme based on TCA-SCL decoding. When a segment decoding failure occurs, the system resends the specific segment and merges the new information bits with the old ones by maximum ratio combining (MRC). For different segments sharing the same SNR, the decoder can apply linear superposition to obtain the average value. As the number of segment retransmissions goes up, the noise power converges to zero, which helps to improve the performance effectively.
Fig. 9 Proposed HARQ-TCA-SCL decoding scheme.
The proposed HARQ-TCA-SCL scheme is illustrated in Fig. 9. Let i denote the current number of transmission attempts, T the maximum number of retransmissions, and j (≤ P) the current segment position. The details of the HARQ-TCA-SCL scheme are as follows.
We initialize i = 1 for the HARQ-TCA-SCL decoding, then perform the SCL decoding for Segment-j (function SCL′(·)) and obtain the CRC results on each survival path at the end of the segment SCL decoding. If at least one path passes the CRC, we save the path with the highest probability and move to the next segment. Otherwise, we update i to i + 1, combine the retransmitted Segment-j with the old copies, and redo the TCA-SCL decoding. The algorithm terminates with a decoding failure if i = T.

Algorithm 3 HARQ-TCA-SCL Polar Decoding
Input: $y_1^N$, T, P, L
i = 1;
for j = 1 to P do
    mark = false;
    while i < T and mark = false do
        $U_j$ = SCL′($y_1^N$, j, L);
        mark = passCRC($U_j$);
        if mark = false then
            i = i + 1;
            Retransmit and combine Segment-j;
        end if
    end while
    if mark = false then
        break;
    end if
    refresh $\hat{u}_1^N$ by the survival path in $U_j$;
end for
Output: $\hat{u}_1^N$
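The control flow of Algorithm 3 can be mirrored in a short sketch. The channel, the segment decoder, and the combining step are abstracted as callbacks; all function and parameter names here are hypothetical. As in Algorithm 3, the attempt counter is shared by all segments and is not reset when moving to the next segment.

```python
def harq_tca_scl_decode(num_segments, max_attempts, decode_segment,
                        passes_crc, retransmit_and_combine):
    """Segment-wise decoding with retransmission on CRC failure.

    decode_segment(j): stands in for SCL'; returns (bits, metric) paths.
    passes_crc(j, bits): CRC check of Segment-j.
    retransmit_and_combine(j): requests and merges a new copy of Segment-j.
    Returns (decoded_segments, success).
    """
    decoded = []
    attempt = 1  # i in Algorithm 3
    for j in range(1, num_segments + 1):
        best = None
        while attempt < max_attempts and best is None:
            paths = decode_segment(j)
            passing = [p for p in paths if passes_crc(j, p[0])]
            if passing:
                best = max(passing, key=lambda p: p[1])[0]
            else:
                attempt += 1
                retransmit_and_combine(j)
        if best is None:
            return decoded, False  # attempt budget exhausted
        decoded.append(best)
    return decoded, True
```

With `max_attempts` = 3, a segment that fails once and then passes consumes one retransmission, leaving one more attempt for the remaining segments.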
6 Performance and Complexity Analysis
6.A Performance Analysis
In this subsection, a performance comparison between different algorithms is given over binary-input additive white Gaussian noise channels (BI-AWGNCs). Different code lengths, rates, and partition schemes are considered: for Fig. 10(a), we have N = 64, K = 36, m = 8, and P = 2; for Fig. 10(b), we have N = 1024, K = 512, m = 32, and P = 4. The information set $\mathcal{A}$ is selected according to [2, 19]. We use the corresponding hex value to represent each CRC polynomial. For example, a CRC-4 detector with polynomial $g(D) = D^4 + D + 1$ is described as CRC-4 (0x9) in this paper (the '+1' is implicit in the hex value). For the (64, 36) code, we set 2 copies of CRC-4 (0x9) for the (HARQ-)PSCL scheme, and CRC-5 (0x12) and CRC-3 (0x5) for the (HARQ-)TCA-SCL scheme. For the (1024, 512) code, we set 4 copies of CRC-8 (0xA6) for the (HARQ-)PSCL scheme, and CRC-3 (0x5), CRC-10 (0x327), CRC-11 (0x583), and CRC-8 (0xA6) for the (HARQ-)TCA-SCL scheme. All the CRC detectors use the best CRC generator polynomials suggested by [12].
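The hex notation above can be unpacked mechanically: the hex value lists the coefficients of $D^d$ down to $D^1$, and the constant term is appended. The sketch below recovers the full polynomial and computes CRC bits by polynomial long division over GF(2). The function names are hypothetical, and the plain division with zero initial state is an assumption of this example, not a description of the encoder used in the simulations.

```python
def crc_degree(hex_value):
    """Degree of g(D): the top bit of the hex value is the D^degree term."""
    return hex_value.bit_length()


def full_polynomial(hex_value):
    """Append the implicit '+1'; bit d of the result is the coefficient of D^d."""
    return (hex_value << 1) | 1


def crc_remainder(bits, hex_value):
    """Remainder of bits(D) * D^degree divided by g(D) over GF(2)."""
    degree = crc_degree(hex_value)
    poly = full_polynomial(hex_value)
    reg = 0
    for b in list(bits) + [0] * degree:
        reg = (reg << 1) | b
        if reg >> degree:  # leading term present: subtract (XOR) g(D)
            reg ^= poly
    return [(reg >> s) & 1 for s in range(degree - 1, -1, -1)]
```

For CRC-4 (0x9), `full_polynomial(0x9)` is `0b10011`, that is, $D^4 + D + 1$. The degrees of the tailored CRCs for the (1024, 512) code are 3, 10, 11, and 8, which add up to m = 32.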
According to Fig. 10, compared with the PSCL scheme, the proposed TCA-SCL scheme has a 0.1 dB performance gain at FER = $10^{-2}$ for both the (64, 36) and (1024, 512) codes. The HARQ-TCA-SCL (T = 3) scheme achieves 0.25 dB and 0.13 dB gains over the HARQ-PSCL scheme at frame error rate FER = $10^{-2}$ for the (64, 36) and (1024, 512) codes, respectively.
Fig. 10 FER comparison of (HARQ-)TCA-SCL and (HARQ-)PSCL schemes. (a) (N = 64, K = 36, m = 8, P = 2). (b) (N = 1024, K = 512, m = 32, P = 4). (Horizontal axis: Eb/N0 (dB); vertical axis: FER; curves: PSCL, TCA-SCL, HARQ-PSCL (T = 3), HARQ-TCA-SCL (T = 3).)
6.B Complexity Analysis
Define the product of the actual decoding length and list size as the average list size. Since the average computational complexity is proportional to the average list size, here we analyze the average list sizes of the TCA-SCL and HARQ-TCA-SCL decoders, denoted by $\bar{L}_T$ and $\bar{L}_H$, respectively. Assume the total frame number is $F$, and the decoder ends at the $P_i$-th segment of the $i$-th frame. For the TCA-SCL decoder, $\bar{L}_T$ can be calculated as
$\bar{L}_T = \dfrac{L \times \sum_{i=1}^{F} P_i}{P \times F}$. (18)
Fig. 11 Average list sizes of (HARQ-)TCA-SCL and (HARQ-)PSCL schemes. (a) (N = 64, K = 36, m = 8, P = 2). (b) (N = 1024, K = 512, m = 32, P = 4). (Horizontal axis: Eb/N0 (dB); vertical axis: average list size; curves: PSCL, TCA-SCL, HARQ-PSCL (T = 3), HARQ-TCA-SCL (T = 3).)
Suppose the $i$-th frame is retransmitted $R_i$ times ($0 \le R_i \le T$). For the HARQ-TCA-SCL decoder, $\bar{L}_H$ is calculated as
$\bar{L}_H = \dfrac{L \times \sum_{i=1}^{F} (P_i + R_i)}{P \times F}$. (19)
For low SNR, thanks to the early termination, $\bar{L}_T$ is small due to the high error rate. On the other hand, a larger number of retransmissions leads to a higher $\bar{L}_H$ for the HARQ-TCA-SCL decoder. As SNR increases, $\bar{L}_T$ and $\bar{L}_H$ converge to $L$: 1) the TCA-SCL decoder is more likely to finish the decoding process, and 2) the number of retransmissions of the HARQ-TCA-SCL decoder converges to 0. It should be noted that, according to Eqs. (18) and (19), $0 \le \bar{L}_H - \bar{L}_T \le \frac{L}{P} T$.
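Eqs. (18) and (19) can be evaluated from per-frame statistics as in the sketch below. The function name and argument names are hypothetical; `end_segments` holds the $P_i$ values and `retransmissions` the $R_i$ values collected during a simulation.

```python
def average_list_sizes(list_size, num_segments, end_segments, retransmissions):
    """Return (L_T, L_H) as defined in Eqs. (18) and (19)."""
    frames = len(end_segments)
    denom = num_segments * frames
    l_t = list_size * sum(end_segments) / denom
    l_h = list_size * sum(p + r for p, r in
                          zip(end_segments, retransmissions)) / denom
    return l_t, l_h
```

For L = 2, P = 4, four frames ending at segments (4, 4, 2, 1) with (0, 1, 2, 0) retransmissions, this gives $\bar{L}_T$ = 1.375 and $\bar{L}_H$ = 1.75. The difference equals $L \sum_i R_i / (P F)$, and since each $R_i \le T$, it cannot exceed $\frac{L}{P} T$.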
As shown in Fig. 11, the (HARQ-)TCA-SCL scheme has the same complexity as the (HARQ-)PSCL scheme. The HARQ-TCA-SCL scheme has 50.3% and 38.5% higher complexity than the PSCL scheme at SNR = 1.5 dB for the (64, 36) and (1024, 512) codes, respectively. As SNR goes up, the complexity of the HARQ-TCA-SCL scheme asymptotically approaches that of the PSCL scheme, while its performance remains better.
7 Efficient TCA-SCL Decoder Architectures
To facilitate the application of the proposed TCA-SCL decoder, efficient architectures and FPGA implementations are given in this section to demonstrate its merits. Since hardware consumption and decoding latency are the two main concerns of SCL-family decoders, the proposed architecture aims to achieve a good balance between them. The HARQ-TCA-SCL decoder can be designed similarly.
7.A Hardware Consumption Analysis
7.A.1 Full Module TCA-SCL Architecture
In this subsection, a full module TCA-SCL architecture
is proposed, which is mainly based on the conventional
folded SC architecture proposed in [20]. The architec-
ture for full module TCA-SCL decoder is illustrated
in Fig. 12. It divides all mixed node modules (MNs)
into n = log2N stages, and each MN implements two
types of calculations mentioned in Eq. (6). According
to the conclusions in [20], for an N -bit SC decoder,
(N − 1) MNs are required. For an N -bit CA-SCL de-
coder, L(N − 1) MNs are employed.
Theorem 1 For a $P$-segmented TCA-SCL decoder with list size $L$, the total number of MNs is
$MN_{total} = N - L + \dfrac{(L - 1)N}{P}$. (20)
Proof For the given decoder, its MNs can be categorized into two parts. The first part includes Stages 1 to $\log_2 P$. The second part includes Stages $(\log_2 P + 1)$ to $n$. It should be noted that since $N$ is a power of 2 and the code bits are partitioned uniformly, $P$ is also a power of 2, so $\log_2 P$ is always an integer.
Since each segment outputs only one candidate, the first part obeys the SC decoding rule, and the list size $L$ is not necessary. The number of MNs is
$MN_1 = \sum_{i=1}^{\log_2 P} N/2^{i} = N - \dfrac{N}{P}$. (21)
The second part obeys the CA-SCL decoding rule without considering fine-grain scheduling. The number of MNs is
$MN_2 = L \sum_{i=\log_2 P + 1}^{n} N/2^{i} = L\left(\dfrac{N}{P} - 1\right)$. (22)
Adding the two parts gives $MN_{total} = MN_1 + MN_2 = N - L + (L - 1)N/P$.
Since the memory blocks correspond to MNs, the memory complexity is as follows.
Corollary 1 Assume the quantization length for the LLR message is $q$. The memory bits required are
$mem_{total} = q \times MN_{total} = q\left(N - L + \dfrac{(L - 1)N}{P}\right)$. (23)
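The count in Theorem 1 can be checked numerically by summing the MNs stage by stage. The sketch below assumes that Stage $i$ holds $N/2^i$ mixed nodes, which is the per-stage count that reproduces the closed forms $N - N/P$ and $L(N/P - 1)$ and the $(N - 1)$ MNs of an $N$-bit SC decoder; the function names are hypothetical.

```python
def mn_count_full(n_bits, list_size, num_segments):
    """Total MNs of the full module TCA-SCL architecture, summed per stage."""
    n = n_bits.bit_length() - 1            # n = log2 N
    split = num_segments.bit_length() - 1  # log2 P
    # Stages 1 .. log2 P follow the SC rule (single copy).
    mn1 = sum(n_bits >> i for i in range(1, split + 1))
    # Stages log2 P + 1 .. n follow the CA-SCL rule (L copies).
    mn2 = list_size * sum(n_bits >> i for i in range(split + 1, n + 1))
    return mn1 + mn2


def memory_bits(n_bits, list_size, num_segments, q):
    """LLR memory with q-bit quantization: q times the number of MNs."""
    return q * mn_count_full(n_bits, list_size, num_segments)
```

For N = 1024, L = 2, and P = 4, the count is 1278, against 2046 when P = 1, which corresponds to the conventional CA-SCL count $L(N - 1)$.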
The list core (LC) module in Fig. 12 mainly imple-
ments the sorting operation. In order to reduce both the
sorting latency and complexity, the efficient distributed
sorting (DS) proposed in [21] is employed here.
7.A.2 Folded Module TCA-SCL Architecture
Thanks to the early termination scheme, the proposed full module architecture for TCA-SCL decoding is memory efficient compared to conventional CA-SCL decoding. However, the hardware utilization ratio (HUR) of MNs is very low. Borrowing the fine-folding idea proposed in [21, 22], this paper then proposes the folded module TCA-SCL architecture for higher HUR. We set up a sub-decoder with $(2^{\lceil n/2 \rceil} - 1)L$ MNs for Stages 1 to $\lceil n/2 \rceil$. Stages $(\lceil n/2 \rceil + 1)$ to $n$ can also be implemented by this sub-decoder in a time-multiplexing manner. Fig. 13 gives an example for even $n$: Stage 1 and Stage $n/2 + 1$, Stage 2 and Stage $n/2 + 2$, ..., Stage $n/2$ and Stage $n$ are time-multiplexed. If $n$ is odd, Stage 1 and Stage $(n + 1)/2 + 1$, Stage 2 and Stage $(n + 1)/2 + 2$, ..., Stage $(n + 1)/2 - 1$ and Stage $n$ are time-multiplexed, and Stage $(n + 1)/2$ uses the last stage alone. Parameter $j$ in Fig. 13 denotes the current folding order. However, the property in Section 7.A.1 that helps to reduce the complexity of the first $\log_2 P$ stages cannot be exploited here, because the folding technique is based on uniform hardware. The complexity is as follows.
Theorem 2 For a folded module TCA-SCL decoder with list size $L$, the total number of MNs is
$MN_{total} = (2^{\lceil n/2 \rceil} - 1)L$. (24)
Proof When implementing Stages 1 to $\lceil n/2 \rceil$, all the input and output multiplexers choose mode '0'. $2^{\lceil n/2 \rceil + 1}$ executions are required to output $2^{\lceil n/2 \rceil + 1} L$ LLRs for Stage $\lceil n/2 \rceil$. For a $P$-segmented decoder, if $\log_2 P \ge \lceil n/2 \rceil$, $(2^{\lceil n/2 \rceil} - 1)(L - 1)$ MNs are idle during this decoding stage. Otherwise, according to Eq. (21), $\left(2^{\lceil n/2 \rceil} - \frac{2^{\lceil n/2 \rceil}}{P}\right)(L - 1)$ MNs are idle. Therefore, $(2^{\lceil n/2 \rceil} - 1)L$ MNs are sufficient.
When implementing Stages $(\lceil n/2 \rceil + 1)$ to $n$, all the input and output multiplexers choose mode '1'. Since $2^{\lceil n/2 \rceil + 1} L$ LLRs become the input of the sub-decoder, no MN is idle during this stage. Therefore, the total number of MNs is $(2^{\lceil n/2 \rceil} - 1)L$.
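The folded MN count and the stage pairing described for the folded architecture can be sketched as follows. The function names are hypothetical, and the pairing simply enumerates which logical stages share one physical stage of the sub-decoder for even and odd n.

```python
def folded_mn_count(n, list_size):
    """Eq. (24): (2^ceil(n/2) - 1) * L mixed nodes in the sub-decoder."""
    half = (n + 1) // 2  # ceil(n / 2)
    return ((1 << half) - 1) * list_size


def stage_schedule(n):
    """Pairs of time-multiplexed stages, plus the unpaired stage for odd n."""
    half = (n + 1) // 2
    pairs = [(s, s + half) for s in range(1, n - half + 1)]
    alone = [half] if n % 2 else []
    return pairs, alone
```

For N = 1024 (n = 10) and L = 2, the folded sub-decoder needs 62 MNs, and `stage_schedule(10)` pairs Stage 1 with Stage 6 up to Stage 5 with Stage 10. For n = 5, the pairs are (1, 4) and (2, 5), and Stage 3 is not shared.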
Theorem 3 Assume the quantization length for the LLR
message is q, the memory bits required by the folded
10 Huayi Zhou et al.
Fig. 12 Architecture for full module TCA-SCL decoder. (MN: mixed node; LC: list core; C: comparator; CRC: CRC detector.)
Fig. 13 Architecture for folded TCA-SCL decoder (n is even). (MN: mixed node; LC: list core; C: comparator; CRC: CRC detector; LLR: LLR message.)
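Table 1 Implementation Analysis for Different Schemes

| Schemes | Mixed node # | Memory (bit): LLRs | Memory (bit): Outputs | Latency (clock cycles) |
|---|---|---|---|---|
| CA-SCL | $(N-1)L$ [2046] | $(N-1)Lq$ [2046q] | $(K+m)L$ [1088] | $T_{CA}$ |
| TCA-SCL (SF) | $N - L + \frac{(L-1)N}{P}$ [1278] | $(N - L + \frac{(L-1)N}{P})q$ [1278q] | $(K+m)L$ [1088] | $T_{CA} + (T_1 + \cdots + T_{P-1})L$ |
| TCA-SCL (DF) | $N - L + \frac{(L-1)N}{P}$ [1278] | $(N - L + \frac{(L-1)N}{P})q$ [1278q] | $2(K+m)L$ [2176] | $T_{CA} + P\log_2 P - 2P + 2$ |
| FTCA-SCL (SF) | $(\sqrt{N} - 1)L$ [62] | $(N - L + \frac{(L-1)N}{P})q$ [1278q] | $(K+m)L$ [1088] | $T_{CA} + F + (T_1 + \cdots + T_{P-1})L$ |
| FTCA-SCL (DF) | $(\sqrt{N} - 1)L$ [62] | $(N - L + \frac{(L-1)N}{P})q$ [1278q] | $2(K+m)L$ [2176] | $T_{CA} + F + P\log_2 P - 2P + 2$ |

Bracketed values are for the example $N = 1024$, $K = 512$, $m = 32$, $P = 4$, $L = 2$.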
module TCA-SCL architecture is
$$mem_{\mathrm{total}} = q\left(N - L + \frac{(L-1)N}{P}\right).$$
Proof The folded design reduces only the number of MNs. The memory requirement stays the same as that of the full module TCA-SCL architecture.
Table 2 gives FPGA results in accordance with Theorem 3.
7.B Timing Analysis
7.B.1 Single Frame Scheme
As shown in Fig. 12, the decoding process for TCA-SCL has the following steps: 1) In Segment $j$, MNs complete the main decoding in Eq. (6). The $2L$ LLRs correspond to $\hat{u}_i$ for each path. 2) The $2L$ LLRs are input to the LC module. The DS method [21] is employed to select the best $L$ paths. 3) The memory is updated and the partial sum vector $\hat{u}_{sum}$ is calculated for $\hat{u}_{i+1}$. 4) We repeat the above steps to get the $L$ paths for $\hat{u}_1^{(N/P)j-1}$. Then, $\hat{u}_{(N/P)j}$ is directly chosen as ‘0’ or ‘1’ for each path without decoding. After that, we input the information bits in $\hat{u}_{(N/P)(j-1)}^{(N/P)j}$ for the $2L$ paths to CRC$_j$ to pick the only path for Segment $j + 1$. The CRC is implemented with a linear feedback shift register (LFSR) [23], where the CRC polynomial determines the XOR coefficients. As shown in Fig. 12, $P$ CRC modules are employed. It should be noted that CRC$_j$ here checks the $2L$ paths in a serial manner. Admittedly, designers can process the $2L$ paths with parallel CRCs. Considering the simplicity of the
Fig. 14 Timing analysis for TCA-SCL polar decoder with SF and DF schemes: (a) TCA-SCL polar decoder with SF scheme; (b) TCA-SCL polar decoder with DF scheme.
CRC and its short processing time, the serial manner is employed here.
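The serial CRC check of the $2L$ candidate paths can be sketched in Python as follows. This is a minimal illustration, not the implemented circuit: the generator polynomial $g(x) = x^3 + x + 1$, the bit patterns, and the function names are assumptions chosen for the example, since the CRC polynomials are not listed here.

```python
def lfsr_crc(bits, taps):
    """Bit-serial CRC remainder computed with an LFSR.

    taps holds the generator coefficients g_{r-1} ... g_0 (the leading x^r
    term is implicit); every 1 marks an XOR in the feedback path.
    """
    reg = [0] * len(taps)
    for b in bits:
        feedback = b ^ reg[0]        # incoming bit XOR register MSB
        reg = reg[1:] + [0]          # shift the register by one position
        if feedback:
            reg = [r ^ g for r, g in zip(reg, taps)]
    return reg


def check_paths_serially(paths, taps):
    """Feed the candidate paths one after another through a single CRC unit.

    Returns the indices of the paths whose remainder is all zero.
    """
    return [i for i, p in enumerate(paths) if not any(lfsr_crc(p, taps))]


# Hypothetical example: g(x) = x^3 + x + 1, L = 2, hence 2L = 4 candidates.
TAPS = [0, 1, 1]
info = [1, 0, 1, 1, 0]
valid = info + lfsr_crc(info, TAPS)          # information bits plus CRC bits


def flip(bits, k):
    """Return a copy of bits with position k inverted."""
    return bits[:k] + [bits[k] ^ 1] + bits[k + 1:]


paths = [flip(valid, 0), valid, flip(valid, 3), flip(valid, 6)]
passing = check_paths_serially(paths, TAPS)
```

Only the uncorrupted candidate leaves an all-zero register, so `passing` singles out one path for the next segment. Because one CRC unit handles the candidates in turn, the checking time grows with the number of candidates, which is the serial cost accounted for in the latency analysis.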
The scheduling of this single frame (SF) scheme is shown in Fig. 14(a). The latency of the SF TCA-SCL decoder is given by the following theorem.
Theorem 4 Assume the latency of CA-SCL is $T_{CA}$ clock cycles and the latency of CRC$_i$ is $T_i$. For one SF $P$-segmented TCA-SCL decoder with list size $L$, the decoding latency is
$$T_{SF} = T_{CA} + 2L(T_1 + \cdots + T_{P-1}). \tag{25}$$
Proof After checking all $2L$ paths of Segment $i$, the decoder selects one path and begins to decode Segment $(i+1)$. In the SF scheme, the segmented CRC scheme increases the latency by the serial checking time of Segment $i$,
$$T_{crc_i} = 2LT_i. \tag{26}$$
In the SF scheme, checking Segment $P$ of Frame 1 and decoding Segment 1 of Frame 2 can be done at the same time. Since the checking time is shorter than the decoding time, the latency increase is
$$T_{inc} = 2L\sum_{i=1}^{P-1} T_i. \tag{27}$$
Now the proof is immediate.
The folded module TCA-SCL decoder can also work in the proposed SF scheme.
Corollary 2 Assuming the folding technique introduces $F$ extra clock cycles per frame, the latency of the SF folded module TCA-SCL decoder is
$$T_{SF} = T_{CA} + F + 2L(T_1 + \cdots + T_{P-1}). \tag{28}$$
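As a worked instance of Eqs. (25) and (28), the following Python sketch evaluates the SF latency. The function name and all numerical inputs (baseline latency, per-segment CRC latencies, folding overhead) are hypothetical values chosen only to illustrate the formula.

```python
def sf_latency(t_ca, crc_latencies, L, folding_cycles=0):
    """SF latency per Eqs. (25) and (28).

    t_ca           : CA-SCL latency T_CA in clock cycles
    crc_latencies  : [T_1, ..., T_P], one CRC latency per segment
    L              : list size
    folding_cycles : F, extra cycles of the folded design (0 if unfolded)
    """
    # The check of the last segment overlaps with the decoding of the next
    # frame, so only T_1 ... T_{P-1} add to the latency.
    serial_crc = 2 * L * sum(crc_latencies[:-1])
    return t_ca + folding_cycles + serial_crc


# Hypothetical numbers: T_CA = 1000, P = 4 segments, L = 2.
unfolded = sf_latency(1000, [10, 20, 30, 40], L=2)
folded = sf_latency(1000, [10, 20, 30, 40], L=2, folding_cycles=100)
```

With these inputs the serial checks add $2 \cdot 2 \cdot (10 + 20 + 30) = 240$ cycles, giving 1240 cycles unfolded and 1340 cycles folded.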
7.B.2 Double Frame Scheme
SF decoding introduces $2L(T_1 + \cdots + T_{P-1})$ extra clock cycles per frame. During CRC detection, all MNs are idle and the HUR is therefore low. To address this, the double frame (DF) scheme is proposed.
The main idea of DF is shown in Fig. 14(b). Two frames are decoded simultaneously in an interleaved manner: when Frame 1 checks (decodes) its Segment $i$, Frame 2 decodes (checks) its Segment $i$ ($i - 1$). Since both frames share the same architecture, every time a new segment is decoded, all LLRs in memory belong to the other frame. If we keep the decoding latency of each frame the same as that of the CA-SCL decoder, Stage 1 to Stage $(\log_2 P - 1)$ need an extra memory block of $q(N/2 + N/4 + \cdots + 2N/P)$ bits to save LLRs, which is undesirable for hardware design.
If no extra memory is available, each new segment begins its decoding with Stage 1. In this way, the DF scheme still requires a memory block of $q\left(N - L + \frac{(L-1)N}{P}\right)$ bits, with slightly increased latency. For the DF full module TCA-SCL decoder:
Theorem 5 For one DF $P$-segmented TCA-SCL decoder with list size $L$, the decoding latency is
$$T_{DF} = T_{CA} + P\log_2 P - 2P + 2. \tag{29}$$
Proof For the interleaved manner in Fig. 14(b), the latency of each segment is
$$T_{seg_i} = \max\{T_{dec_i}, T_{crc_i}\}, \tag{30}$$
where $T_{dec_i}$ denotes the SCL decoding latency for Segment $i$, which includes SC decoding and DS. According to [21], the DS latency for Segment $i$ is approximately $2LT_i$, therefore
$$T_{dec_i} > 2LT_i. \tag{31}$$
Since $T_{crc_i} = 2LT_i$, the decoding latency dominates each segment and
$$T_{DF} = \sum_{i=1}^{P} T_{seg_i} = \sum_{i=1}^{P} T_{dec_i}. \tag{32}$$
There are $2^i$ segments that could start their calculation from Stage $(i + 1)$ but now start from Stage 1, which introduces a latency of $i \cdot 2^i$. Therefore, the increase in decoding latency is
$$T_{inc} = \sum_{i=1}^{\log_2 P - 1} i \cdot 2^i = P\log_2 P - 2P + 2. \tag{33}$$
Now the proof is immediate.
The folded module TCA-SCL decoder can also work in the proposed DF scheme with the following latency.
Corollary 3 Assuming the folding technique introduces $F$ extra clock cycles per frame, the latency of the DF folded module TCA-SCL decoder is
$$T_{DF} = T_{CA} + F + P\log_2 P - 2P + 2. \tag{34}$$
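The closed form in Eq. (33) can be checked numerically with a few lines of Python. This is only a sanity check of the summation; the function names are hypothetical and $P$ is assumed to be a power of two.

```python
def df_increase_by_sum(P):
    """Left-hand side of Eq. (33): sum of i * 2^i for i = 1 .. log2(P) - 1."""
    log2_p = P.bit_length() - 1      # exact log2 for a power of two
    return sum(i * 2 ** i for i in range(1, log2_p))


def df_increase_closed_form(P):
    """Right-hand side of Eq. (33): P * log2(P) - 2P + 2."""
    log2_p = P.bit_length() - 1
    return P * log2_p - 2 * P + 2


# Compare both sides for a few segment counts.
increase = {P: df_increase_by_sum(P) for P in (2, 4, 8, 16)}
```

Both expressions agree for every tested $P$. For $P = 4$ the increase is 2 clock cycles, which is consistent with the latencies of 2655 and 2657 clock cycles reported for CA-SCL and TCA-SCL (DF) in Table 2.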
Table 1 compares five different schemes: the CA-SCL decoder, the SF (DF) full module TCA-SCL decoders, and the SF (DF) folded module TCA-SCL decoders. According to Section 4.C, the CRC allocation is $(|C_1|, |C_2|, |C_3|, |C_4|) = (3, 10, 11, 8)$. The numerical values in Table 1 (data in red) correspond to the example of $N = 1024$, $K = 512$, $P = 4$, and $L = 2$.
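The example values of Table 1 can be reproduced with the short Python sketch below. The function and key names are hypothetical; the expressions are those of Table 1, with $m = 32$ taken from the CRC allocation above and $\sqrt{N}$ assumed to be an integer (even $n$).

```python
from math import isqrt


def table1_example(N=1024, K=512, m=32, P=4, L=2):
    """Evaluate the Table 1 expressions (LLR memory given in units of q)."""
    segmented = N - L + (L - 1) * N // P     # N - L + (L-1)N/P
    folded = (isqrt(N) - 1) * L              # (sqrt(N) - 1)L
    single = (K + m) * L                     # output bits, SF schemes
    double = 2 * (K + m) * L                 # output bits, DF schemes
    full = (N - 1) * L                       # unfolded CA-SCL
    return {
        "CA-SCL":        {"mn": full,      "llr_over_q": full,      "out": single},
        "TCA-SCL (SF)":  {"mn": segmented, "llr_over_q": segmented, "out": single},
        "TCA-SCL (DF)":  {"mn": segmented, "llr_over_q": segmented, "out": double},
        "FTCA-SCL (SF)": {"mn": folded,    "llr_over_q": segmented, "out": single},
        "FTCA-SCL (DF)": {"mn": folded,    "llr_over_q": segmented, "out": double},
    }


values = table1_example()
```

The sketch gives 2046, 1278, and 62 MNs for the CA-SCL, TCA-SCL, and FTCA-SCL decoders, and 1088 or 2176 output bits for the SF or DF schemes.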
7.C FPGA Implementation Results
To better demonstrate the advantages of the proposed TCA-SCL decoders, FPGA implementations based on Altera Stratix V are given as well. In accordance with Table 2, five decoders have been implemented. The same parameters as in the aforementioned example are employed: $N = 1024$, $K = 512$, $m = 32$, $P = 4$, and $L = 2$. All five decoders employ the same LLR quantization scheme of 1 sign bit, 6 integer bits, and 1 fractional bit. In Fig. 15, the FER comparison of floating-point SC and quantized SC with $q = 8$ bits indicates the validity of the quantization scheme.
The implementation results are compared in terms of adaptive logic modules (ALMs), registers, and memory bits. Compared to the CA-SCL decoder, the TCA-SCL (SF or DF) decoder achieves an 18.8% or 15.0% ALM reduction. For further ALM reduction, with the help of the folding technique, the FTCA-SCL (SF or DF) decoder consumes only 40.11% or 42.9% of the ALMs of the TCA-SCL (SF or DF) decoder, with slightly increased latency, as analyzed in [22]. It is also observed that the
Fig. 15 Performance comparison regarding quantization (N = 1024, K = 512): FER versus Eb/N0 (dB) for floating SC and quantized-SC (8 bits).
ALM reduction is smaller than the reduction of MNs listed in Table 1. This is because Table 1 does not consider the comparison part, which accounts for the major part of the ALM consumption and stays the same across the different architectures.
For implementation convenience, memory has been employed by both folded decoders. Therefore, we consider the sum of registers and memory bits as the total memory consumption. It is observed that the TCA-SCL (SF or DF) decoder requires 77.03% or 82.1% of the memory of the CA-SCL decoder. Also, the introduction of the folding technique does not affect the memory cost, as indicated by Theorem 3. Comparing the FTCA-SCL (DF) and FTCA-SCL (SF) decoders, when the DF scheme is employed, the latency is reduced by 15.90% at the cost of 11.99% more ALMs.
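The percentages quoted above follow directly from the entries of Table 2, as the following Python sketch shows. The dictionary layout and helper names are hypothetical; the numbers are copied from Table 2.

```python
TABLE2 = {
    "CA-SCL":        {"alms": 102847, "registers": 20064, "memory": 0,     "latency": 2655},
    "TCA-SCL (SF)":  {"alms": 83529,  "registers": 15456, "memory": 0,     "latency": 3253},
    "TCA-SCL (DF)":  {"alms": 87454,  "registers": 16480, "memory": 0,     "latency": 2657},
    "FTCA-SCL (SF)": {"alms": 33502,  "registers": 5515,  "memory": 11264, "latency": 3749},
    "FTCA-SCL (DF)": {"alms": 37518,  "registers": 6558,  "memory": 11264, "latency": 3153},
}


def ratio_pct(scheme, reference, key):
    """Value of `key` for `scheme` as a percentage of the reference scheme."""
    return 100.0 * TABLE2[scheme][key] / TABLE2[reference][key]


def storage(scheme):
    """Total memory consumption: registers plus memory bits."""
    return TABLE2[scheme]["registers"] + TABLE2[scheme]["memory"]


summary = {
    "alm_reduction_sf": 100.0 - ratio_pct("TCA-SCL (SF)", "CA-SCL", "alms"),
    "alm_reduction_df": 100.0 - ratio_pct("TCA-SCL (DF)", "CA-SCL", "alms"),
    "folded_alm_share_sf": ratio_pct("FTCA-SCL (SF)", "TCA-SCL (SF)", "alms"),
    "folded_alm_share_df": ratio_pct("FTCA-SCL (DF)", "TCA-SCL (DF)", "alms"),
    "memory_share_sf": 100.0 * storage("TCA-SCL (SF)") / storage("CA-SCL"),
    "memory_share_df": 100.0 * storage("TCA-SCL (DF)") / storage("CA-SCL"),
    "df_latency_reduction": 100.0 - ratio_pct("FTCA-SCL (DF)", "FTCA-SCL (SF)", "latency"),
    "df_alm_increase": ratio_pct("FTCA-SCL (DF)", "FTCA-SCL (SF)", "alms") - 100.0,
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}%")
```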
For the latency issue, since the critical paths of all designs are determined by the critical path of the same SC decoding kernel, we believe it is safe to compare in terms of the number of clock cycles. First, the segmented CRC decoders introduce more latency due to more serial CRC operations. Second, the DF scheme is more time efficient. Third, the folded versions come at the cost of higher latency.
In general, the four proposed TCA-SCL decoder architectures can reduce the hardware consumption compared to the CA-SCL decoder. Designers can choose the suitable one according to different application requirements.
8 Conclusions
In this paper, segmented SCL polar decoding with tailored CRC is proposed. A method for choosing the proper CRC for a given segment is proposed with the help
Table 2 FPGA Implementation Results for Different Schemes

| Schemes | ALMs | Registers | Memory | Latency |
|---|---|---|---|---|
| CA-SCL | 102,847 | 20,064 | 0 | 2655 |
| TCA-SCL (SF) | 83,529 | 15,456 | 0 | 3253 |
| TCA-SCL (DF) | 87,454 | 16,480 | 0 | 2657 |
| FTCA-SCL (SF) | 33,502 | 5,515 | 11,264 | 3749 |
| FTCA-SCL (DF) | 37,518 | 6,558 | 11,264 | 3153 |
of the concepts of virtual transform and virtual length. Numerical results have shown that the proposed TCA-SCL decoder can achieve better performance and lower complexity than the conventional CA-SCL decoder. Thanks to the more reasonable CRC partition scheme, the TCA-SCL decoder can also outperform the PSCL decoder. For further performance improvement, the HARQ-TCA-SCL scheme is proposed at the cost of increased complexity. Efficient architectures and FPGA implementations are also proposed for a good balance between hardware consumption and decoding latency.
References
1. E. Arıkan and E. Telatar, “On the rate of channel
polarization,” in Proc. IEEE International Sym-
posium on Information Theory (ISIT), 2009, pp.
1493–1495.
2. E. Arıkan, “Channel polarization: A method for
constructing capacity-achieving codes for sym-
metric binary-input memoryless channels,” IEEE
Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073,
July 2009.
3. N. Goela, S. B. Korada, and M. Gastpar, “On LP
decoding of polar codes,” in Proc. IEEE Informa-
tion Theory Workshop (ITW), 2010, pp. 1–5.
4. E. Arıkan, “A performance comparison of polar
codes and Reed-Muller codes,” IEEE Commun.
Lett., vol. 12, no. 6, pp. 447–449, June 2008.
5. N. Hussami, S. B. Korada, and R. Urbanke, “Per-
formance of polar codes for channel and source cod-
ing,” in Proc. IEEE International Symposium on
Information Theory (ISIT), 2009, pp. 1488–1492.
6. I. Tal and A. Vardy, “List decoding of polar codes,”
in Proc. IEEE International Symposium on Infor-
mation Theory Proceedings (ISIT), 2011, pp. 1–5.
7. K. Chen, K. Niu, and J. Lin, “List successive can-
cellation decoding of polar codes,” Electronics Let-
ters, vol. 48, no. 9, pp. 500–501, April 2012.
8. K. Niu and K. Chen, “CRC-Aided decoding of po-
lar codes,” IEEE Commun. Lett., vol. 16, no. 10,
pp. 1668–1671, 2012.
9. E. Arıkan, “Polar Coding for 5G Wireless?” June
2015, Invited Talk of International Workshop on
Polar Code.
10. H. Zhou, C. Zhang, W. Song, S. Xu, and X. You,
“Segmented CRC-Aided SC list polar decoding,”
in Proc. IEEE Vehicular Technology Conference
(VTC), 2016, pp. 1–5.
11. S. A. Hashemi, A. Balatsoukas-Stimming, P. Giard, C. Thibeault, and W. J. Gross, “Partitioned successive-cancellation list decoding of polar codes,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 957–960.
12. P. Koopman and T. Chakravarty, “Cyclic redun-
dancy code (CRC) polynomial selection for em-
bedded networks,” in Proc. Annual IEEE/IFIP In-
ternational Conference on Dependable Systems and
Networks (DSN), 2004, pp. 145–154.
13. J. Hagenauer, “Rate-compatible punctured convo-
lutional codes (RCPC codes) and their applica-
tions,” IEEE Trans. Commun., vol. 36, no. 4, pp.
389–400, 1988.
14. D. N. Rowitch and L. B. Milstein, “On the per-
formance of hybrid FEC/ARQ systems using rate
compatible punctured turbo (RCPT) codes,” IEEE
Trans. Commun., vol. 48, no. 6, pp. 948–959, 2000.
15. G. Yue, X. Wang, and M. Madihian, “Design
of rate-compatible irregular repeat accumulate
codes,” IEEE Trans. Commun., vol. 55, no. 6, pp.
1153–1163, 2007.
16. K. Chen, K. Niu, and J. Lin, “A hybrid ARQ
scheme based on polar codes,” IEEE Commun.
Lett., vol. 17, no. 10, pp. 1996–1999, 2013.
17. K. Niu, K. Chen, and J. Lin, “Beyond turbo codes:
Rate-compatible punctured polar codes,” in Proc.
IEEE International Conference on Communica-
tions (ICC), 2013, pp. 3423–3427.
18. H. Saber and I. Marsland, “An incremental redun-
dancy hybrid ARQ scheme via puncturing and ex-
tending of polar codes,” IEEE Trans. Commun.,
vol. 63, no. 11, pp. 3964–3973, 2015.
19. I. Tal and A. Vardy, “How to construct polar codes,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6562–6582, 2013.
20. C. Zhang, B. Yuan, and K. K. Parhi, “Reduced-
Latency SC polar decoder architectures,” in Proc.
IEEE International Conference on Communica-
tions (ICC), 2011, pp. 3471–3475.
21. X. Liang, J. Yang, C. Zhang, W. Song, and X. You, “Hardware efficient and low-latency CA-SCL decoder based on distributed sorting,” in Proc. IEEE Global Communications Conference (GLOBECOM), Dec 2016, pp. 1–6.
22. X. Liang, C. Zhang, S. Zhang, and X. You,
“Hardware-Efficient folded SC polar decoder based
on k-segment decomposition,” in Proc. IEEE Asia
Pacific Conference on Circuits and Systems (APC-
CAS), Oct 2016, pp. 1–4.
23. S. Lin and D. J. Costello, “Error control coding,”
Principles of Mobile Communication, vol. 44, no. 2,
pp. 607 – 610, 2004.
