Partitioned Successive-Cancellation List Decoding of Polar Codes by Hashemi, Seyyed Ali et al.
arXiv:1512.03128v2 [cs.AR] 22 Jan 2016
PARTITIONED SUCCESSIVE-CANCELLATION LIST DECODING OF POLAR CODES
Seyyed Ali Hashemi⋆, Alexios Balatsoukas-Stimming†, Pascal Giard⋆,
Claude Thibeault⋄ and Warren J. Gross⋆
⋆McGill University, Montréal, Québec, Canada
† École polytechnique fédérale de Lausanne, Lausanne, Switzerland
⋄ École de technologie supérieure, Montréal, Québec, Canada
ABSTRACT
Successive-cancellation list (SCL) decoding is an algorithm
that provides very good error-correction performance for po-
lar codes. However, its hardware implementation requires a
large amount of memory, mainly to store intermediate results.
In this paper, a partitioned SCL algorithm is proposed to re-
duce the large memory requirements of the conventional SCL
algorithm. The decoder tree is broken into partitions that are
decoded separately. We show that with careful selection of
list sizes and number of partitions, the proposed algorithm
can outperform conventional SCL while requiring less mem-
ory.
Index Terms— Partitioned List Decoder, Successive-
Cancellation List Decoder, Polar Codes, Hardware Imple-
mentation.
1. INTRODUCTION
Polar codes provably achieve the symmetric capacity of mem-
oryless channels and therefore have gained a lot of attention as
promising error-correcting codes [1]. Successive-cancellation
(SC) decoding was first proposed as a low-complexity decod-
ing algorithm for polar codes. It was shown that the error
probability of polar codes under SC decoding goes to zero as
the blocklength goes to infinity, provided that the rate of the
polar code is less than the capacity of the channel. From a
hardware implementation point of view, SC decoding can be
represented as a decoder tree having a fixed time and space
complexity and is thus very attractive [2]. However, the algo-
rithm is sub-optimal, especially for decoding moderate-length
polar codes.
To improve the error-correction performance of SC de-
coding, the SC list (SCL) decoding algorithm was proposed
in [3]. Unlike SC decoding, which estimates each bit based on
the estimation of previous bits, SCL keeps a constrained list of
the most likely candidates at each decoding step using the log-
likelihood (LL) of each candidate. SCL reduces the gap be-
tween SC and maximum likelihood (ML) decoding at the cost
of increased complexity. Furthermore, it was shown that con-
catenating polar codes with a cyclic redundancy check (CRC)
as an outer code improves the performance of SCL to the ex-
tent where polar codes decoded with CRC-aided SCL are able
to outperform low-density parity-check (LDPC) codes of the
same length and rate [3]. To reduce the hardware complexity
associated with LL-based SCL decoding, log-likelihood ra-
tio (LLR) values were used and the path metric calculations
adapted accordingly in [4]. Unfortunately, like its LL-based
counterpart, LLR-based SCL decoding requires a large memory
to store the intermediate values, so the total core area is
often largely dominated by memory [4].
In this paper, a partitioned SCL (PSCL) decoding algo-
rithm is proposed in order to reduce the memory requirements
associated with SCL decoding. More specifically, PSCL de-
coding performs SCL decoding on partitions of the decoder
tree and only one path candidate is transferred from one
partition to the next. As a result, memory can be shared between
the different partitions of the code, which significantly
reduces the overall memory requirements. Without loss of
generality, we propose a CRC-aided scheme.
The paper is organized as follows: Section 2 offers a brief
overview on polar encoding and decoding. Section 3 de-
scribes the proposed PSCL algorithm and compares its error-
correction performance with that of conventional SCL de-
coding. In Section 4 hardware implementation results are
presented showing memory and total area savings of up to
41% and 42%, respectively, at similar error-correction perfor-
mance. Finally, conclusions are drawn in Section 5.
2. POLAR CODES
A polar code of length $N = 2^n$ which carries $K$ information bits, denoted by $\mathcal{P}(N, K)$, has rate $R \triangleq K/N$ and is constructed by concatenating two polar codes of length $N/2$. Let us consider an input set $u_0^{N-1} = \{u_0, u_1, \ldots, u_{N-1}\}$ and a coded set $x_0^{N-1} = \{x_0, x_1, \ldots, x_{N-1}\}$. The recursive concatenation process can be expressed as a modulo-2 matrix multiplication as in
$$x_0^{N-1} = u_0^{N-1} G^{\otimes n}, \tag{1}$$
where $G^{\otimes n}$ is the $n$-th Kronecker power of the polarizing matrix $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.
[Figure omitted: encoder graph with inputs (0, 0, 0, u3, 0, u5, u6, u7), outputs x0, ..., x7, and stages labeled level 0 to level 3.]
Fig. 1: Polar encoding for P(8, 4).
Polar encoding consists of finding the $K$ most reliable bit-channels and transmitting the information bits through them. The $N - K$ least reliable bits are set to a predefined value (usually 0) which is known by the decoder, and thus are called frozen bits. An example of polar encoding for P(8, 4) is illustrated in Fig. 1, where $x_0^{N-1}$ is generated by the encoder before being modulated and sent through the channel. The noisy channel output $y_0^{N-1}$ is input to the polar decoder.
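As an illustration of Eq. (1), the following minimal Python sketch encodes one P(8, 4) input vector. The function names (`kronecker_power`, `polar_encode`, `encode_by_matrix`), the recursive formulation, and the example payload are assumptions made for illustration and are not taken from the paper; the frozen positions {0, 1, 2, 4} are read off Fig. 1.

```python
def kronecker_power(n):
    """Build the n-th Kronecker power of G = [[1, 0], [1, 1]] as a list of rows."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        # kron(G, g) = [[g, 0], [g, g]]
        top = [row + [0] * size for row in g]
        bottom = [row + row for row in g]
        g = top + bottom
    return g


def polar_encode(u):
    """Return x = u * G^(kron n) over GF(2), where len(u) = 2**n."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    upper, lower = u[:half], u[half:]
    # [a, b] * [[G', 0], [G', G']] = [(a xor b) G', b G']
    mixed = [a ^ b for a, b in zip(upper, lower)]
    return polar_encode(mixed) + polar_encode(lower)


def encode_by_matrix(u):
    """Reference check: explicit modulo-2 vector-matrix product of Eq. (1)."""
    size = len(u)
    g = kronecker_power(size.bit_length() - 1)
    return [sum(u[i] * g[i][j] for i in range(size)) % 2 for j in range(size)]


frozen = {0, 1, 2, 4}          # frozen positions of P(8, 4), as in Fig. 1
info_bits = [1, 0, 1, 1]       # arbitrary example payload (assumption)
payload = iter(info_bits)
u = [0 if i in frozen else next(payload) for i in range(8)]
x = polar_encode(u)
print(u, x)
```

The recursive version avoids building the N-by-N matrix; `encode_by_matrix` is included only to cross-check it against the literal matrix form.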
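SC decoding provides each bit estimate $\hat{u}_i$ based on $y_0^{N-1}$, the previously estimated bits $\hat{u}_0^{i-1}$, and the location of frozen bits $\mathcal{F}$. The LLR-based formulation is
$$\hat{u}_i = \begin{cases} 0, & \text{if } i \in \mathcal{F} \text{ or } \log \dfrac{\Pr(y_0^{N-1}, \hat{u}_0^{i-1} \mid \hat{u}_i = 0)}{\Pr(y_0^{N-1}, \hat{u}_0^{i-1} \mid \hat{u}_i = 1)} \geq 0, \\ 1, & \text{otherwise.} \end{cases} \tag{2}$$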
SC works on a decoder tree such as the one illustrated in Fig. 2. There are two types of messages passed through the different levels in a decoder tree: 1) the soft messages which contain the LLR values $\alpha$; 2) the hard bit estimates $\beta$. Each node at level $s$ of the decoder tree contains $2^s$ bits and the messages in Fig. 2 are calculated as
$$\begin{aligned} \alpha_l[i] &= \mathrm{sgn}(\alpha[i])\,\mathrm{sgn}(\alpha[i + 2^{s-1}]) \min(|\alpha[i]|, |\alpha[i + 2^{s-1}]|), \\ \alpha_r[i] &= \alpha[i + 2^{s-1}] + (1 - 2\beta_l[i])\,\alpha[i], \end{aligned} \tag{3}$$
for $0 \leq i < 2^{s-1}$, and
$$\beta[i] = \begin{cases} \beta_l[i] \oplus \beta_r[i], & \text{if } i < 2^{s-1}, \\ \beta_r[i - 2^{s-1}], & \text{otherwise,} \end{cases} \tag{4}$$
where $\oplus$ denotes the bitwise XOR operation [2] and $\beta[i]$ are called partial sums.
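The update rules (3) and (4), together with the decision rule (2), can be combined into a short recursive SC decoder. The Python sketch below is a minimal illustration under stated assumptions: the names `sgn`, `f_update`, `g_update`, `combine`, and `sc_decode` are hypothetical, sgn(0) is taken as +1, and the upper half of the combined partial sums is simply a copy of the right child's partial sums.

```python
def sgn(value):
    # Assumption: sgn(0) is treated as +1, so a zero LLR decodes to bit 0.
    return 1.0 if value >= 0 else -1.0


def f_update(alpha):
    """Left-child LLRs, first line of Eq. (3)."""
    half = len(alpha) // 2
    return [sgn(alpha[i]) * sgn(alpha[i + half])
            * min(abs(alpha[i]), abs(alpha[i + half])) for i in range(half)]


def g_update(alpha, beta_left):
    """Right-child LLRs, second line of Eq. (3)."""
    half = len(alpha) // 2
    return [alpha[i + half] + (1 - 2 * beta_left[i]) * alpha[i]
            for i in range(half)]


def combine(beta_left, beta_right):
    """Partial sums of Eq. (4): XOR in the lower half, copy of beta_right above."""
    return [bl ^ br for bl, br in zip(beta_left, beta_right)] + list(beta_right)


def sc_decode(alpha, frozen):
    """Decode one subtree; returns (estimated input bits, partial sums)."""
    if len(alpha) == 1:
        bit = 0 if frozen[0] or alpha[0] >= 0 else 1   # Eq. (2)
        return [bit], [bit]
    half = len(alpha) // 2
    u_left, beta_left = sc_decode(f_update(alpha), frozen[:half])
    u_right, beta_right = sc_decode(g_update(alpha, beta_left), frozen[half:])
    return u_left + u_right, combine(beta_left, beta_right)


# Toy example: length-4 code with u0 frozen, channel LLRs chosen by hand.
u_hat, x_hat = sc_decode([1.0, 2.0, -3.0, -1.5], [True, False, False, False])
print(u_hat, x_hat)
```

In this toy case the decoder returns the input estimate [0, 1, 0, 1] and the re-encoded codeword [0, 0, 1, 1], whose signs agree with the hard decisions on the channel LLRs.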
[Figure omitted: binary decoder tree with levels 3 down to 0, annotated with the messages α, β, α_l, β_l, α_r, β_r.]
Fig. 2: SC decoder tree for P(8, 4).
To improve the error-correcting performance of SC decoding, for each non-frozen bit, SCL decoding creates two tentative paths on the decoding tree corresponding to $\hat{u}_i = 0$ and $\hat{u}_i = 1$. In order to avoid an exponential growth in the number of tentative paths, only the $L$ best (i.e., most likely) paths are kept. Specifically, in LLR-based SCL decoding [4], the $L$ best paths are determined by the following path metric
$$\mathrm{PM}_i^l = \begin{cases} \mathrm{PM}_{i-1}^l, & \text{if } \hat{u}_i^l = \frac{1}{2}\left(1 - \mathrm{sgn}(\alpha_i^l)\right), \\ \mathrm{PM}_{i-1}^l + |\alpha_i^l|, & \text{otherwise,} \end{cases} \tag{5}$$
where $l$ is the path index and $\alpha_i^l$ is the LLR value associated with the $i$-th bit at path $l$. A smaller path metric indicates a more reliable path.
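The fork-and-prune step driven by the path metric of Eq. (5) can be sketched as follows. This is a minimal illustration, not the architecture of [4]: the names `path_metric_update` and `extend_and_prune`, the representation of a path as a `(metric, bits)` pair, and the tie-breaking by sort order are all assumptions.

```python
def path_metric_update(pm_prev, llr, bit):
    """Eq. (5): penalize a path only when the bit disagrees with the LLR sign."""
    hard_decision = 0 if llr >= 0 else 1   # equals (1 - sgn(llr)) / 2, sgn(0) = +1 assumed
    return pm_prev if bit == hard_decision else pm_prev + abs(llr)


def extend_and_prune(paths, llrs, list_size):
    """Fork every path on a non-frozen bit and keep the list_size best ones.

    paths: list of (path_metric, bits) tuples
    llrs:  LLR of the current bit, one value per path
    """
    candidates = []
    for (pm, bits), llr in zip(paths, llrs):
        for bit in (0, 1):
            candidates.append((path_metric_update(pm, llr, bit), bits + [bit]))
    candidates.sort(key=lambda candidate: candidate[0])   # smaller metric is better
    return candidates[:list_size]


paths = [(0.0, [])]
paths = extend_and_prune(paths, [2.0], list_size=2)
paths = extend_and_prune(paths, [-1.0, 0.5], list_size=2)
print(paths)
```

After the second step only the two extensions of the first path survive, because both extensions of the second path carry a metric of at least 2.0.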
Unfortunately, SCL decoding requires a large amount of memory to store the intermediate values. Let us assume that the LLR and path metric values are quantized with $Q_\alpha$ and $Q_{\mathrm{PM}}$ bits, respectively. The total memory requirements for the storage of the LLR values $\alpha$, the path metrics PM, and the partial sum values $\beta$ in the SC and SCL algorithms are [2]
$$M_{\mathrm{SC}} = \underbrace{(2N - 1)Q_\alpha}_{\alpha \text{ (LLR values)}} + \underbrace{N - 1}_{\beta \text{ (partial sums)}}, \tag{6}$$
and [4]
$$M_{\mathrm{SCL}} = \underbrace{(N + (N - 1)L)Q_\alpha}_{\alpha \text{ (LLR values)}} + \underbrace{L Q_{\mathrm{PM}}}_{\text{path metrics}} + \underbrace{(2N - 1)L}_{\beta \text{ (partial sums)}}, \tag{7}$$
respectively. We note that $Q_{\mathrm{PM}}$ grows at most as $\log N$ [4], so the term $L Q_{\mathrm{PM}}$ is negligible in Eq. (7).
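To give a feel for the magnitudes in Eqs. (6) and (7), the sketch below evaluates both for N = 2048 with 6-bit LLRs and 8-bit path metrics, the quantization later used for Fig. 5. The function names `memory_sc` and `memory_scl` are hypothetical; only the formulas come from the text.

```python
def memory_sc(n, q_alpha):
    """Eq. (6): LLR storage plus partial sums, in bits."""
    return (2 * n - 1) * q_alpha + (n - 1)


def memory_scl(n, list_size, q_alpha, q_pm):
    """Eq. (7): LLR storage, path metrics, and partial sums, in bits."""
    llr_bits = (n + (n - 1) * list_size) * q_alpha
    path_metric_bits = list_size * q_pm
    partial_sum_bits = (2 * n - 1) * list_size
    return llr_bits + path_metric_bits + partial_sum_bits


N, Q_ALPHA, Q_PM = 2048, 6, 8
print("SC  :", memory_sc(N, Q_ALPHA))
for list_size in (2, 4, 8):
    print(f"SCL{list_size}:", memory_scl(N, list_size, Q_ALPHA, Q_PM))
```

For these parameters SC needs 26617 bits while SCL needs 45058, 77828, and 143368 bits for L = 2, 4, and 8; the path-metric term contributes at most 64 bits, which illustrates why it is negligible.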
3. PARTITIONED SCL DECODING OF POLAR CODES
The large memory requirements of the SCL algorithm trans-
late into a large area occupation in the actual hardware de-
coder implementation. In fact, the total area is often largely
dominated by memory, e.g. the memory area accounts for
45% of the total area in [4]. In order to reduce the required
memory and, therefore, the area of the decoder, we propose a
partitioned SCL (PSCL) decoding technique.
3.1. Proposed PSCL Decoding Algorithm
The conventional CRC-aided SCL decoding algorithm first
performs SCL decoding to obtain the L most likely codeword
candidates and, in the end, selects the (hopefully) correct es-
timate by choosing the candidate that matches the expected
CRC. If no codeword verifies the CRC, the candidate with
the best path metric is selected.
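The final selection step described above can be sketched in a few lines of Python. The name `select_candidate`, the `(metric, bits)` representation, and the even-parity check used as a stand-in for a real CRC are assumptions for illustration; the text also does not say which candidate wins when several pass the CRC, so this sketch assumes the passing candidate with the smallest path metric.

```python
def select_candidate(candidates, crc_passes):
    """Pick the output of CRC-aided list decoding.

    candidates: list of (path_metric, bits) tuples
    crc_passes: callable taking a bit list and returning True if the CRC matches
    """
    ranked = sorted(candidates, key=lambda candidate: candidate[0])
    for _, bits in ranked:
        if crc_passes(bits):
            return bits          # assumption: best-metric candidate among those that pass
    return ranked[0][1]          # no candidate passes: fall back to the best path metric


def even_parity(bits):
    # Toy stand-in for a CRC check (not a real CRC polynomial).
    return sum(bits) % 2 == 0


candidates = [(1.5, [1, 0, 1]), (0.5, [1, 1, 1]), (2.0, [0, 0, 0])]
print(select_candidate(candidates, even_parity))         # [1, 0, 1]
print(select_candidate(candidates, lambda bits: False))  # [1, 1, 1]
```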
In PSCL decoding, on the other hand, the decoder tree is broken into partitions (i.e., subtrees) and SCL decoding is performed only on the partitions, while the standard SC rules are applied to the remainder of the decoding tree. Each partition outputs a single candidate codeword which is selected with the help of a CRC and then sent to the next partition for further decoding. The decoding process starts with the standard SC update rules given by (3) and (4). Therefore, the decoder does not require memory to store $L$ entire trees of internal LLRs, but only $L$ copies of the partitions on which SCL decoding is performed.
[Figure omitted: decoder tree with levels n and n − 1, messages α_l, β_l, α_r, β_r at the root, and two CRC-aided SCL blocks.]
Fig. 3: PSCL with two partitions.
[Figure omitted: decoder tree with levels n, n − 1, and n − 2, messages α, β, α_l, β_l, α_r, β_r, and four CRC-aided SCL blocks.]
Fig. 4: PSCL with four partitions.
Fig. 3 and Fig. 4 show the PSCL process when a code is broken into two and four partitions, respectively. The memory used in each CRC-aided SCL decoding block can be shared with the next decoding block since only one candidate survives after decoding each partition. The total memory usage in PSCL with $P$ partitions and list size $L$ can be calculated as
$$M_{\mathrm{PSCL}} = \underbrace{\left( \sum_{k=0}^{P-1} \frac{N}{2^k} + \left( \frac{N}{2^{P-1}} - 1 \right) L \right) Q_\alpha}_{\alpha \text{ (LLR values)}} + \underbrace{L Q_{\mathrm{PM}}}_{\text{path metrics}} + \underbrace{\sum_{k=1}^{P-2} \frac{N}{2^k} + \left( \frac{N}{2^{P-2}} - 1 \right) L}_{\beta \text{ (partial sums)}}, \tag{8}$$
where $P \geq 2$ and $P = 1$ makes PSCL decoding equivalent to conventional SCL decoding. Also note that when $P = 2$, $\sum_{k=1}^{P-2} \frac{N}{2^k} = 0$.
It should be noted that the lower bound on the memory
usage for PSCL is the memory requirement of the SC algo-
rithm and the upper bound is the memory required by SCL
with list size L. Fig. 5 illustrates the PSCL memory usage
with different numbers of partitions and list sizes for a po-
lar code with N = 2048, Qα = 6 bits, and QPM = 8 bits.
[Figure omitted: memory in bits (vertical axis, up to about 1.5 · 10^5) versus number of partitions (horizontal axis, 2^0 to 2^11) for PSCL2, PSCL4, and PSCL8, with the SC, SCL2, SCL4, and SCL8 bounds shown for reference.]
Fig. 5: Memory requirements for polar codes of length N = 2048. PSCL$L$ (SCL$L$) denotes the PSCL (SCL) decoding algorithm with list size $L$.
As can be seen in the figure, the amount of memory decays
exponentially towards the SC bound as the number of partitions
increases. In other words, a small increase in the number of
partitions results in significant savings; for example, using
four partitions in PSCL4 is expected to require less memory
than SCL2.
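The trend described above can be reproduced numerically. The sketch below evaluates Eq. (8) at face value, with the argument `p` playing the role of the symbol P exactly as it appears in the sums and exponents of that equation; the function name `memory_pscl` and the use of integer division are assumptions, and N = 2048, Q_α = 6, Q_PM = 8 are the values quoted for Fig. 5.

```python
def memory_pscl(n, p, list_size, q_alpha, q_pm):
    """Eq. (8) evaluated literally, in bits; p = 1 reduces to Eq. (7)."""
    # LLR term: sum_{k=0}^{p-1} N / 2^k + (N / 2^(p-1) - 1) L
    llr_words = sum(n // 2 ** k for k in range(p))
    llr_words += (n // 2 ** (p - 1) - 1) * list_size
    # Partial-sum term: sum_{k=1}^{p-2} N / 2^k + (N / 2^(p-2) - 1) L
    # N / 2^(p-2) is written as 2N / 2^(p-1) so that p = 1 stays an integer.
    partial_sums = sum(n // 2 ** k for k in range(1, p - 1))
    partial_sums += ((2 * n) // 2 ** (p - 1) - 1) * list_size
    return llr_words * q_alpha + list_size * q_pm + partial_sums


N, Q_ALPHA, Q_PM = 2048, 6, 8
for list_size in (2, 4, 8):
    row = [memory_pscl(N, p, list_size, Q_ALPHA, Q_PM) for p in range(1, 7)]
    print(f"L = {list_size}:", row)
```

With these parameters, p = 1 and L = 2 gives 45058 bits (the SCL2 figure of Eq. (7)), while L = 4 gives 38916 bits at p = 3 and 32772 bits at p = 4, both below the SCL2 requirement.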
3.2. Error-Correction Performance
Fig. 6 shows the frame error rate (FER) and bit error rate
(BER) performance of SCL and PSCL. The error-correction
performance of the plain SC algorithm is also included as a
reference. SCL$L$-CRC$x$ denotes the SCL algorithm with list
size $L$ and CRC length $x$, and PSCL($P$, $L$)-CRC$x$ represents
the PSCL algorithm with $P$ partitions, list size $L$, and a CRC
of length $x$.
The performance results are provided for a polar code of
length N = 2048 and rate R = 1/2, while a CRC of length
32 is used for the conventional SCL decoding algorithm. To
keep the code rate unchanged and to have a fair comparison,
PSCL(2, L) uses a CRC of length 16, i.e. each of its two
partitions uses a CRC16. Similarly, for PSCL(4, L) each of
the four partitions uses a CRC8. The CRC polynomials were
taken from [5, 6].
Fig. 6 shows that PSCL(2, 2)-CRC16 has identical FER
and BER performance compared to SCL2-CRC32 and there
is only a slight deterioration in performance when the code
is broken into four partitions, as shown by the PSCL(4, 2)-
CRC8 curve. However, in Fig. 6, it can also be seen that
PSCL(4, 4)-CRC8 has superior error-correction performance
compared to that of SCL2-CRC32. Furthermore, PSCL(4, 4)
actually requires slightly less memory than SCL2, as shown in
Fig. 5. Thus, PSCL achieves better performance and reduces
memory usage at the same time.
[Figure omitted: two panels plotting FER (10^-6 to 10^0) and BER (10^-7 to 10^-1) against Eb/N0 [dB] from 1 to 3 dB, with curves for SC, SCL2-CRC32, SCL4-CRC32, PSCL(2, 2)-CRC16, PSCL(4, 2)-CRC8, and PSCL(4, 4)-CRC8.]
Fig. 6: Frame error rate (FER) and bit error rate (BER) performance comparison between CRC-aided SCL and PSCL decoding of P(2048, 1024). The code is optimized for Eb/N0 = 2 dB.
4. HARDWARE IMPLEMENTATION RESULTS
Table 1 presents indicative synthesis results to compare SC,
the conventional CRC-aided SCL algorithm with L ∈ {2, 4},
and the proposed PSCL algorithm for L ∈ {2, 4} and P ∈
{2, 4}, for a polar code of blocklength N = 2048. For the
CRC-aided SCL algorithm, the hardware architecture of [4] is
used while an appropriately modified version of [4] was used
for the PSCL algorithm. All synthesis results are for a TSMC
90 nm CMOS library (1 V, 25 °C) with a target frequency of
500 MHz. All decoders have an equal latency of 5248 clock
cycles (10.5 µs) and throughput of 164 Mbps.
From Table 1, we observe that the PSCL(2, 2) and
PSCL(4, 2) decoders require 23% and 41% less memory
area than the SCL2 decoder, respectively. The PSCL(4, 4)
decoder implementation is shown to require 23% less mem-
ory area than the SCL2 decoder while offering a better coding
gain by approximately 0.25 dB at a BER of $10^{-5}$. Regardless
of the implementation, the memory area of the list decoders
amounts to 40%–45% of the total area. The memory savings
observed for the PSCL implementations thus translate into
very significant reductions in the total area, making them very
attractive compared to the conventional SCL decoders.
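The percentages quoted above follow directly from the area figures in Table 1. The short sketch below recomputes them; the dictionary layout and the helper name `saving_percent` are assumptions, and the numbers are copied from the table.

```python
# (total area, memory area) in mm^2, as reported in Table 1
AREAS = {
    "SC": (0.723, 0.413),
    "SCL2-CRC32": (1.563, 0.702),
    "SCL4-CRC32": (3.075, 1.214),
    "PSCL(2, 2)-CRC16": (1.189, 0.540),
    "PSCL(4, 2)-CRC8": (0.909, 0.415),
    "PSCL(4, 4)-CRC8": (1.356, 0.543),
}


def saving_percent(reference, candidate):
    """Relative reduction of candidate with respect to reference, in percent."""
    return 100.0 * (reference - candidate) / reference


ref_total, ref_memory = AREAS["SCL2-CRC32"]
for name in ("PSCL(2, 2)-CRC16", "PSCL(4, 2)-CRC8", "PSCL(4, 4)-CRC8"):
    total, memory = AREAS[name]
    print(f"{name}: memory saving {saving_percent(ref_memory, memory):.0f}%, "
          f"total saving {saving_percent(ref_total, total):.0f}%")
```

Relative to SCL2-CRC32 this gives memory savings of about 23%, 41%, and 23%, and total-area savings of about 42% for PSCL(4, 2) and 13% for PSCL(4, 4), matching the figures stated in the text.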
Table 1: Synthesis area results for the SC, CRC-aided SCL, and PSCL decoding algorithms.
Algorithm            Total area (mm²)   Memory area (mm²)
SC                   0.723              0.413
SCL2-CRC32           1.563              0.702
SCL4-CRC32           3.075              1.214
PSCL(2, 2)-CRC16     1.189              0.540
PSCL(4, 2)-CRC8      0.909              0.415
PSCL(4, 4)-CRC8      1.356              0.543
5. CONCLUSION
In this paper, we have proposed a novel partitioned list decoding algorithm for polar codes. In this algorithm, the code is broken into partitions and each partition is decoded with a CRC-aided successive-cancellation list decoder. Since the memory is shared between different partitions in the code, the memory requirements of a hardware implementation of a partitioned list decoder are significantly smaller than those of a conventional list decoder without any error-correction performance degradation. Implementation results show that at equivalent error-correction performance, the proposed algorithm leads to memory and total area savings of 41% and 42%, respectively, when compared to a similar list decoder implementation. Moreover, the proposed algorithm enables a coding gain of approximately 0.25 dB at a bit error rate of $10^{-5}$ while occupying 13% less total area than the conventional CRC-aided successive-cancellation list decoder.
ACKNOWLEDGEMENT
The authors would like to thank Gabi Sarkis and Carlo Condo
of McGill University for helpful discussions.
6. REFERENCES
[1] E. Arıkan, “Channel polarization: A method for con-
structing capacity-achieving codes for symmetric binary-
input memoryless channels,” IEEE Trans. Inf. Theory,
vol. 55, no. 7, pp. 3051–3073, July 2009.
[2] C. Leroux, A.J. Raymond, G. Sarkis, and W.J. Gross,
“A semi-parallel successive-cancellation decoder for po-
lar codes,” IEEE Trans. Signal Process., vol. 61, no. 2,
pp. 289–299, Jan 2013.
[3] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE
Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, May
2015.
[4] A. Balatsoukas-Stimming, M. Bastani Parizi, and
A. Burg, “LLR-based successive cancellation list decod-
ing of polar codes,” IEEE Trans. Signal Process., vol. 63,
no. 19, pp. 5165–5179, Oct 2015.
[5] P. Koopman and T. Chakravarty, “Cyclic redundancy
code (CRC) polynomial selection for embedded net-
works,” in IEEE Int. Conf. on Dependable Syst. and Netw.
(DSN), 2004, pp. 145–154.
[6] P. Koopman, “32-bit cyclic redundancy codes for internet
applications,” in IEEE Int. Conf. on Dependable Syst. and
Netw. (DSN), 2002, pp. 459–468.
