On Error-Correction Performance and Implementation of Polar Code List
  Decoders for 5G by Ercan, Furkan et al.
arXiv:1708.04706v2 [cs.IT] 12 Oct 2017
On Error-Correction Performance and
Implementation of Polar Code List Decoders for 5G
Furkan Ercan, Carlo Condo, Seyyed Ali Hashemi, Warren J. Gross
Department of Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada
Email: furkan.ercan@mail.mcgill.ca, carlo.condo@mail.mcgill.ca, seyyed.hashemi@mail.mcgill.ca, warren.gross@mcgill.ca
Abstract—Polar codes are a class of capacity-achieving
error-correcting codes that has recently been selected for the next
generation of wireless communication standards (5G). Polar
code decoding algorithms have evolved in various directions,
striking different balances between error-correction performance,
speed and complexity. Successive-cancellation list (SCL) decoding and
its variants constitute a powerful, well-studied set of algorithms
under constant improvement. At the same time, different implementation
approaches provide a wide range of area occupations and
latency results. 5G puts a focus on improved error-correction
performance, high throughput and low power consumption: a
comprehensive study considering all these metrics is currently
lacking in literature. In this work, we evaluate SCL-based
decoding algorithms in terms of error-correction performance
and compare them to low-density parity-check (LDPC) codes.
Moreover, we consider various decoder implementations, for both
polar and LDPC codes, and compare their area occupation and
power and energy consumption when targeting short code lengths
and rates. Our work shows that among SCL-based decoders,
the partitioned SCL (PSCL) provides the lowest area occupation
and power consumption, whereas fast simplified SCL (Fast-
SSCL) yields the lowest energy consumption. Compared to LDPC
decoder architectures, different SCL implementations occupy up
to 17.1× less area, dissipate up to 7.35× less power, and up to
26× less energy.
I. INTRODUCTION
Polar codes, introduced by Arıkan in [1], are a class of error-
correcting codes that can provably achieve channel capacity
on a memoryless channel when the code length N tends to
infinity. They have been selected for the next generation of
wireless communication standards [2].
The 5G standardization process is putting a particular fo-
cus on improved error-correction performance, lower power
consumption and higher throughput. For example, machine-
to-machine communications in 5G target massive connectivity
among a high number of devices, on a scale higher than the
most bandwidth-demanding applications in 3G and 4G [3],
with a limited power budget. Therefore, reliable and efficient
encoding and decoding methods need to be designed.
In [1], the successive-cancellation (SC) decoding algorithm
is proposed for polar codes: it can be represented as a binary
tree search. While optimal with infinite code length, this
approach suffers from long decoding latency and mediocre
error-correction performance at moderate code lengths. To
improve the error-correction performance of SC, the SC List
(SCL) decoding algorithm, which relies on a list of L codeword
candidates, was proposed in [4]. A cyclic-redundancy check
(CRC) is also concatenated to the polar code, to help in
the selection of the correct candidate at the end of the SCL
decoding process. The improved error-correction performance
of CRC-aided SCL comes at the cost of additional compu-
tational complexity and latency. A hardware implementation
for SCL using logarithmic likelihood ratio (LLR) values was
presented in [5]. In order to reduce latency and increase
throughput, simplified SCL (SSCL) [6] and Fast-SSCL [7]
decoding algorithms were proposed, that rely on the identi-
fication of bit patterns to prune the SC decoding tree and
reduce the number of required bit estimations, with minor
or no error-correction performance degradation. Compared to
the conventional SCL, SSCL and Fast-SSCL can reduce the
number of time steps required to decode one codeword by up
to 88% [7]. To address the high implementation complexity
of SCL decoders, a partitioned SCL (PSCL) decoder was
proposed in [8]: it shows substantial area occupation reduction
and negligible error-correction performance loss with respect
to conventional SCL decoders.
SCL-based decoders are currently one of the best candidates
to meet 5G error-correction performance requirements and
throughput. While most recent decoder architectures for polar
codes focus on improving throughput and area occupation,
little work has been done in terms of power consumption
[9], [10]. A large part of machine-to-machine connected
devices are mobile-end platforms that use batteries and small-
scale energy harvesting electronics: ultra-low power/energy
consumption for these devices is crucial [11].
This work provides an extensive study on polar code SCL-
based decoders in terms of frame error rate (FER) perfor-
mance, area occupation, and power/energy consumption. We
focus on short to medium code lengths, similar to those
chosen for the eMBB control channel [12]. For rates 1/2 and
2/3, SCL-based decoders are compared against low-density
parity-check (LDPC) codes from the IEEE 802.16e (WiMAX)
standard with variable maximum iteration number. Then, we
address power consumption of polar code decoders based on
SCL, SSCL, Fast-SSCL and PSCL, and compare them against
LDPC codes.
The rest of this work is organized as follows. In Section II,
polar codes are briefly introduced along with various SCL-
based decoding algorithms. Hardware implementations of po-
lar code decoders are discussed in Section III. In Section IV,
the error-correction performance of polar codes is analyzed
and compared to that of LDPC codes from communications
standards. Section V presents synthesis results for a wide
variety of decoder architectures, and compares them to LDPC
decoders in literature. Conclusions are drawn in Section VI.
Fig. 1. Polar code encoding for PC(8, 4). Gray indices indicate frozen bits
while black indices represent information bits. [Encoder graph omitted: inputs u0, ..., u7 and outputs x0, ..., x7.]
II. BACKGROUND
A. Polar Codes
Polar codes are able to achieve channel capacity through
channel polarization, which splits N channel utilizations into K
reliable ones, through which information bits are sent, and
N − K unreliable ones, used for frozen bits. A polar code,
represented as PC(N, K), is a linear block code of length
$N = 2^n$ and rate $R = K/N$. Encoding of a polar code can
be represented by a matrix multiplication:
$$x_0^{N-1} = u_0^{N-1} G^{\otimes n}, \tag{1}$$
where $u_0^{N-1} = \{u_0, u_1, \ldots, u_{N-1}\}$ is the input vector,
$x_0^{N-1} = \{x_0, x_1, \ldots, x_{N-1}\}$ is the encoded vector, and the
generator matrix $G^{\otimes n}$ is the n-th Kronecker power of the
polar code matrix $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. A polar code of length N is
composed of two concatenated polar codes of length N/2;
Fig. 1 depicts the encoding process for PC(8, 4).
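As a minimal sketch of (1), the recursion below uses the block structure of the Kronecker power: writing $A = G^{\otimes (n-1)}$, the matrix $G^{\otimes n}$ has the blocks [[A, 0], [A, A]], so the encoded vector is the concatenation of the encoded XOR of the two input halves and the encoded second half. The function names `polar_encode` and `build_input` are illustrative and not taken from the paper, and no bit-reversal permutation is applied since none appears in (1).

```python
def polar_encode(u):
    """Return x = u * G^{(x)n} over GF(2); len(u) must be a power of two."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    upper, lower = u[:half], u[half:]
    # With G^{(x)n} = [[A, 0], [A, A]] and A = G^{(x)(n-1)}:
    # x = [(upper XOR lower) * A, lower * A].
    mixed = [a ^ b for a, b in zip(upper, lower)]
    return polar_encode(mixed) + polar_encode(lower)


def build_input(info_bits, frozen, length):
    """Place information bits on non-frozen indices; frozen indices are set to zero."""
    bits = iter(info_bits)
    return [0 if i in frozen else next(bits) for i in range(length)]


# PC(8, 4) example from Fig. 1: indices 0, 1, 2 and 4 are frozen.
FROZEN_PC_8_4 = {0, 1, 2, 4}
u = build_input([1, 0, 1, 1], FROZEN_PC_8_4, 8)
x = polar_encode(u)
```

Because $G^{\otimes n}$ is its own inverse over GF(2), applying `polar_encode` twice returns the original input vector, which gives a quick sanity check for the sketch.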
In [1], it was shown that as N →∞, encoded bits become
either completely unreliable or completely reliable. For a polar
code of rate R = K/N, the N − K most unreliable bits are
fixed to a constant that is known by the decoder, usually
zero; the remaining K reliable locations are used to transmit the
information bits. For the PC(8, 4) code in Fig. 1, bits u0, u1,
u2, and u4 are located on the least reliable indices, thus are
frozen and indicated with set Φ (gray indices in the figure),
while bits u3, u5, u6, and u7 are located on the most reliable
indices, which carry the information bits (black indices in the
figure).
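By its serial nature, SC decoding estimates a bit $\hat{u}_i$ according
to the channel output $y_0^{N-1} = \{y_0, y_1, \ldots, y_{N-1}\}$ and
previously estimated bits $\hat{u}_0^{i-1} = \{\hat{u}_0, \hat{u}_1, \ldots, \hat{u}_{i-1}\}$. Let us
represent the LLR value of $u_i$ as
$$\alpha_{u_i} = \ln \frac{\Pr[y_0^{N-1}, \hat{u}_0^{i-1} \mid \hat{u}_i = 0]}{\Pr[y_0^{N-1}, \hat{u}_0^{i-1} \mid \hat{u}_i = 1]}.$$
SC estimates each bit in accordance with
$$\hat{u}_i = \begin{cases} 0, & \text{when } \alpha_{u_i} \geq 0 \text{ or } i \in \Phi; \\ 1, & \text{otherwise.} \end{cases} \tag{2}$$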
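Fig. 2. Successive-cancellation decoding tree for a PC(8, 4) code. [Tree diagram omitted: stages S = 3 (root) down to S = 0 (leaves $\hat{u}_0, \ldots, \hat{u}_7$); a node receives α and returns β, and exchanges $\alpha^l$, $\beta^l$ and $\alpha^r$, $\beta^r$ with its left and right children.]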
SC decoding traverses the polar code tree in Fig. 2 starting
from the root node, and advances recursively from left to right.
Each parent node at stage S contains soft information (LLR
values) $\alpha = \{\alpha_0, \alpha_1, \ldots, \alpha_{2^S-1}\}$, and passes this soft
information to its left and right children. Hard decision estimates
$\beta = \{\beta_0, \beta_1, \ldots, \beta_{2^S-1}\}$ are passed from child nodes to their
parent nodes. From a parent node at stage S, the soft information
passed to the left child $\alpha^l = \{\alpha^l_0, \alpha^l_1, \ldots, \alpha^l_{2^{S-1}-1}\}$ and to the
right child $\alpha^r = \{\alpha^r_0, \alpha^r_1, \ldots, \alpha^r_{2^{S-1}-1}\}$ can be approximated
as
$$\alpha^l_i = \operatorname{sgn}(\alpha_i) \operatorname{sgn}(\alpha_{i+2^{S-1}}) \min(|\alpha_i|, |\alpha_{i+2^{S-1}}|), \tag{3}$$
$$\alpha^r_i = \alpha_{i+2^{S-1}} + (1 - 2\beta^l_i)\alpha_i. \tag{4}$$
The hard decision estimates β are calculated at each stage S
via the left and right messages received from child nodes,
$\beta^l = \{\beta^l_0, \beta^l_1, \ldots, \beta^l_{2^{S-1}-1}\}$ and $\beta^r = \{\beta^r_0, \beta^r_1, \ldots, \beta^r_{2^{S-1}-1}\}$, as
$$\beta_i = \begin{cases} \beta^l_i \oplus \beta^r_i, & \text{if } i < 2^{S-1}, \\ \beta^r_{i-2^{S-1}}, & \text{otherwise,} \end{cases} \tag{5}$$
where ⊕ denotes bitwise XOR operation, and $0 \leq i < 2^S$.
At the leaf nodes, β values are hard decisions computed
by (2). The computational complexity of SC decoding is
$O(N \log_2 N)$.
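The recursion described by (2) to (5) can be sketched in a few lines. This is an illustrative floating-point model, not the hardware architecture discussed later: the names `f`, `g` and `sc_decode` are assumptions, the left-child update uses the min-sum form with magnitudes, and the frozen set is passed as a list of booleans (one flag per leaf).

```python
def f(a, b):
    """Left-child LLR, min-sum form of (3)."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))


def g(a, b, beta_l):
    """Right-child LLR as in (4); beta_l is the left child's hard decision."""
    return b + (1 - 2 * beta_l) * a


def sc_decode(alpha, frozen):
    """Decode one node. Returns (estimated leaf bits, hard decisions beta)."""
    n = len(alpha)
    if n == 1:
        # Leaf rule (2): frozen bits are zero, otherwise threshold the LLR.
        bit = 0 if (frozen[0] or alpha[0] >= 0) else 1
        return [bit], [bit]
    half = n // 2
    alpha_l = [f(alpha[i], alpha[i + half]) for i in range(half)]
    u_l, beta_l = sc_decode(alpha_l, frozen[:half])
    alpha_r = [g(alpha[i], alpha[i + half], beta_l[i]) for i in range(half)]
    u_r, beta_r = sc_decode(alpha_r, frozen[half:])
    # Combine rule (5): XOR for the first half, copy for the second half.
    beta = [beta_l[i] ^ beta_r[i] for i in range(half)] + beta_r
    return u_l + u_r, beta


# PC(8, 4): indices 0, 1, 2, 4 are frozen. Noise-free LLRs for the
# codeword [1, 1, 1, 1, 0, 0, 0, 0] (positive LLR means bit 0).
frozen_flags = [True, True, True, False, True, False, False, False]
channel_llrs = [-2.0, -2.0, -2.0, -2.0, 2.0, 2.0, 2.0, 2.0]
u_hat, beta_root = sc_decode(channel_llrs, frozen_flags)
```

In this noise-free example the decoder returns the input vector whose only nonzero bit is $u_3$, and the hard decisions at the root reproduce the transmitted codeword.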
B. Successive-Cancellation List (SCL) Decoding
In SCL decoding [4], when a bit is to be estimated, the
decoding process splits into two paths; one path estimates the
bit as a '0', and the other as a '1'. Therefore, at each bit
estimation, the number of codeword paths doubles, until a list
size L is reached. In this context, SC can be considered as
SCL with list size L = 1. Each path carries information on
the likelihood of the path being the correct codeword, which
is defined as a path metric (PM). When estimating another bit
in the sequence doubles the number of paths from L to 2L, the L
least likely paths are dropped based on their PM information, and
the list is updated. Compared to SC, SCL decoding yields a
better error-correction performance. Fig. 3 depicts the
decoding process with list size L = 2 for PC(4, 3): $\hat{u}_0$ is a
frozen bit and as a result no path splitting occurs. Estimating
$\hat{u}_1$ creates two paths with associated path reliability values.
When $\hat{u}_2$ is estimated, out of four possible paths, the two
with the least reliable PMs are discarded.
Fig. 3. SCL decoding stages for list size L = 2. [Binary decision tree omitted: one level per estimated bit $\hat{u}_0$, $\hat{u}_1$, $\hat{u}_2$, $\hat{u}_3$.]
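The PM is initialized as 0, and at each bit estimation, PM
is updated as
$$\mathrm{PM}_l^i = \begin{cases} \mathrm{PM}_l^{i-1}, & \text{if } \hat{u}_i = \frac{1 - \operatorname{sgn}(\alpha_{u_i})}{2}, \\ \mathrm{PM}_l^{i-1} + |\alpha_{u_i}|, & \text{otherwise,} \end{cases} \tag{6}$$
where l is the path index ($0 \leq l < L$), and i is the estimated
bit index.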
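The path-metric update (6) and the pruning step can be illustrated with the short sketch below. It only models the bookkeeping for a single bit estimation: the per-path decision LLRs are assumed to be supplied by L parallel SC decoders, and the names `pm_update` and `extend_paths` are hypothetical. A larger PM means a less likely path, so the L smallest metrics survive.

```python
def pm_update(pm, llr, bit):
    """Path-metric update (6): penalize a decision that disagrees with the LLR sign."""
    hard_decision = 0 if llr >= 0 else 1      # equals (1 - sgn(llr)) / 2
    return pm if bit == hard_decision else pm + abs(llr)


def extend_paths(paths, llrs, list_size, frozen):
    """Extend every path by one bit and keep the list_size most likely ones.

    paths: list of (pm, bits) tuples; llrs: one decision LLR per path.
    A frozen bit does not split the path; it is forced to zero.
    """
    candidates = []
    for (pm, bits), llr in zip(paths, llrs):
        for bit in ((0,) if frozen else (0, 1)):
            candidates.append((pm_update(pm, llr, bit), bits + [bit]))
    candidates.sort(key=lambda candidate: candidate[0])
    return candidates[:list_size]


# Two information bits with L = 2, starting from a single empty path.
paths = [(0.0, [])]
paths = extend_paths(paths, [1.5], list_size=2, frozen=False)
paths = extend_paths(paths, [-0.5, 2.0], list_size=2, frozen=False)
```

After the second step four candidates exist and only the two with the smallest PM are kept, mirroring the pruning shown for $\hat{u}_2$ in Fig. 3.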
In [4], it was observed that SCL decoding could pick
a wrong codeword out of the final candidates if they are
evaluated only by their PM, even when the correct codeword
is present in the final list. Thus, a CRC is added as an outer
decoding process to aid SCL decoding, which improves the
error-correction performance significantly. On the other hand,
SCL decoding suffers from long decoding latency and higher
computational complexity of $O(LN \log_2 N)$.
C. Simplified SCL and Fast-SSCL Decoding
The throughput of SC can be improved by an order of mag-
nitude when applying the fast decoding techniques proposed in
[13] and [14]. These techniques identify particular information
and frozen bit patterns, reducing the decoding latency of
SC with no error-correction performance degradation. Such
special patterns are associated to nodes in the decoding tree:
Rate-0 nodes (with no information bits), Rate-1 nodes (with no
frozen bits), repetition (Rep) nodes (with a single information
bit) and single parity-check (SPC) nodes (with a single frozen
bit). In Fig. 2 the left and right child of the root node are
examples of a Rep node and an SPC node, respectively. In
[15], it was shown that adaptation of these special nodes is
applicable to SCL, yielding significant reduction in latency at
the cost of error-correction performance loss.
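The four special node types can be recognized from the frozen-bit pattern of the leaves under a node. The sketch below is illustrative only: the function name is hypothetical, and it additionally assumes, as is usual in the fast SC decoding literature, that the single information bit of a Rep node sits at the last index and the single frozen bit of an SPC node at the first index.

```python
def classify_node(frozen_flags):
    """Classify a sub-tree from the frozen flags of its leaves (True = frozen)."""
    length = len(frozen_flags)
    n_info = length - sum(frozen_flags)
    if n_info == 0:
        return "Rate-0"
    if n_info == length:
        return "Rate-1"
    if n_info == 1 and not frozen_flags[-1]:
        return "Rep"          # all frozen except the last bit
    if n_info == length - 1 and frozen_flags[0]:
        return "SPC"          # all information except the first bit
    return "Other"


# PC(8, 4) from Fig. 2: indices 0, 1, 2 and 4 are frozen.
frozen_flags = [True, True, True, False, True, False, False, False]
left_child = classify_node(frozen_flags[:4])
right_child = classify_node(frozen_flags[4:])
```

For the PC(8, 4) code this reproduces the statement above: the left child of the root is a Rep node and the right child is an SPC node.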
The SSCL algorithm from [6] proposes an efficient decoding
technique by proving that Rate-0, Rate-1 and Rep
nodes need not be traversed to update the PM, while
guaranteeing error-correction performance preservation. This
approach reduces the number of decoding steps for a node of
length Nv from 3Nv − 2 to Nv for Rate-1 nodes, to 1 for
Rate-0 nodes, and to 2 for Rep nodes [16].
Fig. 4. Partitioned SCL decoding tree with P = 4. [Tree diagram omitted: stages S = log2 N, S = log2 N − 1 and S = log2 N − 2, with four CRC-aided SCL partitions at the lowest stage shown.]
Fast-SSCL decoding [17] proposes an enhanced method
to reduce the decoding latency further for Rate-1 nodes,
down to min(L − 1, Nv) time steps with zero error-correction
performance degradation. It was shown that, when splitting
the paths over a Rate-1 node, the path split that does not
match the sign of the LLR will always be discarded after the
(L − 1)-th step.
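The step counts quoted in this subsection can be collected in a small helper to compare the algorithms on a given node. This is a sketch under the figures stated above only (3Nv − 2 for conventional SCL; Nv, 1 and 2 for Rate-1, Rate-0 and Rep nodes in SSCL; min(L − 1, Nv) for Rate-1 nodes in Fast-SSCL). The function name is hypothetical, and node types for which no count is given here raise an error rather than guess.

```python
def node_time_steps(kind, nv, list_size, algorithm):
    """Time steps to decode one special node of length nv."""
    if algorithm == "SCL":
        return 3 * nv - 2                     # full traversal of the sub-tree
    if algorithm not in ("SSCL", "Fast-SSCL"):
        raise ValueError("unknown algorithm: " + algorithm)
    if kind == "Rate-0":
        return 1
    if kind == "Rep":
        return 2
    if kind == "Rate-1":
        if algorithm == "Fast-SSCL":
            return min(list_size - 1, nv)
        return nv
    raise ValueError("no step count stated for node type: " + kind)


# A Rate-1 node of length 8 with list size L = 4.
steps = {alg: node_time_steps("Rate-1", 8, 4, alg)
         for alg in ("SCL", "SSCL", "Fast-SSCL")}
```

For this node the count drops from 22 steps in conventional SCL to 8 in SSCL and 3 in Fast-SSCL, which shows why the gain of Fast-SSCL is largest for small list sizes and long Rate-1 nodes.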
D. Partitioned SCL Decoding
PSCL decoding divides the polar code into P constituent
sub-trees of length N/P, and every partition is decoded
by the CRC-aided SCL algorithm [8]. Each partition has
its own CRC, thus only one candidate is passed at the end
of each partition to the next, using standard SC rules [1].
This approach helps reduce the memory requirements, since
instead of storing L copies of the complete tree, only L copies of
a single partition are required. In addition, the same physical
memory can be reused for different partitions. As a result, the
memory requirements decrease exponentially with P. Fig. 4
depicts a generic PSCL decoder tree with P = 4
partitions.
The reduced memory in PSCL comes at the cost of error-
correction performance degradation compared to the conven-
tional SCL decoding. As the number of partitions increases,
the error-correction performance decays towards that of SC
decoding. It was shown in [18] that a careful code construction
and CRC selection can improve the error-correction perfor-
mance of PSCL.
III. HARDWARE ARCHITECTURES FOR SCL-BASED
DECODERS
A. SCL Decoder
The architecture of the SCL decoder follows the one de-
scribed in [5]. It consists of five components: memory units,
metric computation unit (MCU), metric sorting unit (MSU),
address translation unit, and a controller. The MCU employs
L parallel SC decoders performing (3), (4), and (5), one for
each candidate codeword in the list. It also calculates the
PM values according to (6) whenever L decision LLR values
are computed. It then takes one clock cycle to update
and sort the PMs using the MSU. PMs are stored in a
register-based memory architecture for each candidate, and are passed
to a compute/swap unit at the end of each bit estimation.
LLR and β memory units have L banks each, one for each
parallel decoder unit. Considering there are Pe processing
elements available, each bank is itself divided into two parts,
one handling the top stages of the decoding tree, where stage
S > log2(Pe), and one for the lower stages.
B. SSCL & Fast-SSCL Decoder
The architectures of SSCL and Fast-SSCL decoders are
based on the SCL architecture described in Section III-A: they
however expand MCU to perform Rate-0, Rate-1 and Rep node
calculations. Size and position of special nodes in the decoder
tree are computed offline and used by the decoder as inputs.
For Rate-0, no path splitting occurs, and a single step is used to
update the PM list. Rate-1 nodes are computed in two stages:
first the portion of the information bits (all of them in SSCL)
that are subject to path splitting is calculated. Then, in case of
Fast-SSCL, the remaining bits are estimated in a single step,
and their LLR values are used to update the PM according to
(6). Computations for Rep nodes are similar to those of Rate-0
nodes: the frozen bits are treated as in Rate-0 nodes, while an
additional step estimates the single information bit.
Both SSCL and Fast-SSCL architectures employ an L-
parallel CRC computation unit that updates the CRC as soon
as a bit is estimated by the SC decoders. They include
different degrees of parallelism to accommodate the single-
step estimation of multiple bits in Rate-0, Rate-1 and Rep
nodes.
C. PSCL Decoder
The PSCL decoder modifies the SCL decoder by reducing
the size of the LLR and β memories to fit the partition size. A
single memory takes care of the top of the tree, where the SC
rules are applied, and ad-hoc routing to processing elements
is performed depending on the tree stage.
IV. ERROR CORRECTION PERFORMANCE COMPARISON
In this section, the error correction performances of SCL-
based decoding algorithms described in Section II are evalu-
ated and compared against each other, and against LDPC codes
taken from the IEEE 802.16e standard where applicable. For
polar codes, we consider code lengths of 256 and 512: these
lengths are included in the 5G eMBB control channel [2]. The
LDPC code with N = 576 has been instead selected from the
WiMAX standard, being the only one of length comparable
to that of polar codes. Our simulation environment considers
additive white Gaussian noise (AWGN) channel and BPSK
modulation.
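A minimal sketch of such a simulation front end is given below. It rests on standard textbook assumptions that are not spelled out in the text: BPSK maps bit 0 to +1 and bit 1 to −1 with unit symbol energy, the noise variance is $\sigma^2 = 1/(2 R \, E_b/N_0)$, and the channel LLR is $2y/\sigma^2$, so that a positive LLR favors bit 0 as in (2). The function names are illustrative.

```python
import math
import random


def noise_sigma(ebn0_db, rate):
    """Noise standard deviation for unit-energy BPSK at a given Eb/N0 and code rate."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebn0))


def channel_llrs(codeword, ebn0_db, rate, rng):
    """Transmit a codeword over BPSK/AWGN and return the channel LLRs."""
    sigma = noise_sigma(ebn0_db, rate)
    llrs = []
    for bit in codeword:
        symbol = 1.0 - 2.0 * bit              # 0 -> +1, 1 -> -1
        received = symbol + rng.gauss(0.0, sigma)
        llrs.append(2.0 * received / sigma ** 2)
    return llrs


rng = random.Random(2017)                     # fixed seed for repeatability
llrs = channel_llrs([0, 1, 0, 1], ebn0_db=20.0, rate=0.5, rng=rng)
```

An FER curve is then obtained by repeating this for many random codewords per Eb/N0 point, decoding each LLR vector, and counting the frames that differ from the transmitted ones.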
The error-correction performance of SCL, SSCL and Fast-SSCL
is identical. Therefore, they are referred to with the
notation SCL$L$-CRC$C$, where $L$ and $C$ denote the list size
and the CRC length, respectively. PSCL decoders are referred
to as PSCL($P$, $L$)-CRC($c_0, c_1, \ldots, c_{P-1}$), where $P$ denotes the
number of partitions and $c_p$ represents the CRC length of
partition $p$. For LDPC codes, $T$ denotes the maximum number
of iterations, and the normalized min-sum algorithm is used
for decoding [19], together with layered scheduling [20].
The target code rates are R ∈ {1/6, 1/3, 1/2, 2/3} for polar codes,
having been investigated in 5G discussions [12]. Among these
rates, WiMAX LDPC codes allow for R ∈ {1/2, 2/3}.
A CRC of length 8 is selected for polar codes. For PSCL,
the CRC selection criterion from [18] was adopted. For a
target Eb/N0 value, a simulation sweeps the error-correction
performance of PSCL with different CRC lengths. Only CRC
polynomials of degrees which are multiples of four are
considered, to reduce the algorithm complexity. Then, for each
code length and rate, CRC lengths that provide the best
error-correction performance are selected.
Fig. 5. FER of SCL and PSCL for polar codes with N = 256 (left) and
N = 512 (right), and R = 1/6. [Plots omitted: FER versus Eb/N0 [dB]; curves: SCL4-CRC8, SCL8-CRC8, PSCL(2,4)-CRC(4,4), PSCL(2,8)-CRC(4,4).]
Figs. 5, 6, 7, and 8 present the FER for SCL and PSCL
algorithms with list sizes of L ∈ {4, 8}, and code lengths
of N ∈ {256, 512} for code rates 1/6, 1/3, 1/2, and 2/3,
respectively. A consistent improvement in FER can be seen
for all rates and lengths when the list size of the SCL
decoder is increased. For a target FER of $10^{-4}$, this
improvement reaches 1 dB when a polar code of length 512
with rate 1/3 is used. Similar observations can be made
for PSCL, with a peak improvement of 0.25 dB for
PC(512, 256). In all cases, SCL decoders provide better
error-correction performance than their PSCL counterparts.
Figs. 9 and 10 present the FER of polar codes with N = 512
against LDPC WiMAX codes with N = 576, for R ∈
{1/2, 2/3}. The maximum number of iterations considered
for LDPC decoding is T ∈ {5, 10, 20}. For R = 1/2 codes,
the SCL algorithm with L = 2 outperforms LDPC with T = 20,
while PSCL(2,2)-CRC(8,8) matches it at FER = $10^{-4}$.
For R = 2/3 codes in Fig. 10, LDPC with T = 10 matches
the error-correction performance of SCL8-CRC8. Based on
these results, in the following section we compare decoder
architectures that target codes with matching FER. Thus,
LDPC decoders are compared to PSCL for R = 2/3, while
SCL, SSCL and Fast-SSCL are used in case of R = 1/2.
V. ASIC IMPLEMENTATION RESULTS
Fig. 6. FER of SCL and PSCL for polar codes with N = 256 (left) and
N = 512 (right), and R = 1/3. [Plots omitted: FER versus Eb/N0 [dB]; curves: SCL4-CRC8, SCL8-CRC8, PSCL(2,4)-CRC(4,4), PSCL(2,8)-CRC(4,4).]
Fig. 7. FER of SCL and PSCL for polar codes with N = 256 (left) and
N = 512 (right), and R = 1/2. [Plots omitted: FER versus Eb/N0 [dB]; curves: SCL4-CRC8, SCL8-CRC8, PSCL(2,4)-CRC(8,8), PSCL(2,8)-CRC(8,8).]
In this section, synthesis results for SCL, SSCL, Fast-SSCL
and PSCL for N ∈ {256, 512}, R ∈ {1/2, 2/3}, and L ∈ {2, 4, 8}
are presented. For each architecture, the number of parallel
processing elements is Pe = 32. Based on simulations, PM
quantization is selected as 8 bits. For channel LLR and internal
LLR values, quantization is 4 and 6 bits respectively, two of
which are assigned to the fractional part. All memories have
been synthesized as registers.
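The effect of the 4-bit channel and 6-bit internal LLR formats can be explored in a floating-point simulation with a simple fixed-point model. The sketch below is an assumption-laden illustration: it takes the representation to be signed two's complement with saturation, which the text does not specify, and the function name `quantize` is hypothetical.

```python
def quantize(value, total_bits, frac_bits=2):
    """Round to a signed fixed-point grid with frac_bits fractional bits and saturate.

    Assumes a two's complement range of [-2^(total_bits-1), 2^(total_bits-1) - 1]
    in units of 2^-frac_bits; this format is an assumption, not taken from the paper.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = int(round(value * scale))
    code = max(lowest, min(highest, code))
    return code / scale


channel_llr = quantize(5.0, total_bits=4)     # saturates at the 4-bit limit
internal_llr = quantize(5.0, total_bits=6)    # fits in the 6-bit format
```

Under these assumptions a 4-bit channel LLR covers the range −2.0 to 1.75 in steps of 0.25, while a 6-bit internal LLR covers −8.0 to 7.75.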
The architectures are synthesized with TSMC 65 nm CMOS
technology, targeting a frequency of f = 800 MHz. Table I
compares the total area, power and energy consumption per
codeword for all four SCL-based decoder implementations
under the aforementioned design parameters.
The SCL decoder yields lower area occupation and power
consumption compared to SSCL and Fast-SSCL, with all the
considered code lengths, rates and list sizes. This is due to
the fact that the special node computations in SSCL and
Fast-SSCL add substantial logic complexity. Additional complexity
is also caused by the parallel CRC units necessary to update
Rate-0 and Rep nodes in SSCL, and also Rate-1 nodes in
Fast-SSCL. In terms of energy consumption, Fast-SSCL provides
the best results compared to its predecessors: although the
power consumption is higher, the number of time steps needed
to decode a codeword is reduced dramatically, yielding the
lowest energy per frame.
Fig. 8. FER of SCL and PSCL for polar codes with N = 256 (left) and
N = 512 (right), and R = 2/3. [Plots omitted: FER versus Eb/N0 [dB]; curves: SCL4-CRC8, SCL8-CRC8, PSCL(2,4)-CRC(4,4), PSCL(2,8)-CRC(4,4).]
Fig. 9. FER for polar code with N = 512, R = 1/2 with SCL decoding
compared against LDPC N = 576, R = 1/2 with various T. [Plots omitted: FER versus Eb/N0 [dB]; curves: SCL2-CRC8, SCL4-CRC8, PSCL(2,2)-CRC(8,8), PSCL(2,4)-CRC(8,8), LDPC T = 5, LDPC T = 10, LDPC T = 20.]
In SCL-based implementations, memory is a major contributor
to both area occupation and power consumption,
and it decreases exponentially with the partitioning factor of
PSCL. Thus PSCL, with its reduced memory requirements,
has the smallest area occupation and power consumption.
In this context, with a minor degradation in performance,
PSCL provides the best results for area- and power-efficient
implementations.
Fig. 10. FER for polar code with N = 512, R = 2/3 with SCL decoding
compared against LDPC N = 576, R = 2/3 with various T. [Plots omitted: FER versus Eb/N0 [dB]; curves: SCL4-CRC8, SCL8-CRC8, PSCL(2,4)-CRC(4,4), PSCL(2,8)-CRC(4,4), LDPC T = 5, LDPC T = 10, LDPC T = 20.]
For all considered rates in Table I when N = 256, energy
consumption per codeword of PSCL follows a close trend to
SSCL when L = 2. SCL, with its long decoding process,
has the worst energy consumption, while Fast-SSCL has the
lowest one. As L increases, PSCL energy dissipation becomes
comparable to that of Fast-SSCL. For N = 512, energy
consumption for PSCL sits between that of SCL and SSCL
for L ∈ {2, 4}. For L = 8, the energy consumption of PSCL
is lower than that of SSCL and higher than that of Fast-SSCL.
This is due to the nonlinear increment in power consumption
that both SSCL and Fast-SSCL experience with increasing L.
Table II compares power, energy, and area of the considered
polar code decoders against architectures for LDPC 802.16e
codes taken from [21], [22], and [23]. Polar code decoders are
selected based on the observations from Figs. 9 and 10. Note that
the LDPC decoder architectures from [21] and [23] support
both considered rates R ∈ {1/2, 2/3}. Energy consumption for
the LDPC architectures in Table II is calculated with the
number of iterations T required to match the FER of polar
codes from Section IV.
In Table II, the area occupation for the LDPC decoders is
scaled to 65 nm technology for a fair comparison. For rate
R = 1/2, the total area of polar code decoders ranges from
7.7× (Fast-SSCL vs. [21]) to 17.1× (PSCL vs. [23]) less than
that of LDPC WiMAX implementations. For rate R = 2/3, the
advantage of polar decoders over LDPC decoders is lower,
with a minimum of 2.46× less area occupation.
TABLE I
SYNTHESIS AREA AND ENERGY CONSUMPTION RESULTS WITH 65 NM
TSMC CMOS TECHNOLOGY FOR SCL, SSCL, FAST-SSCL AND PSCL
DECODING OF POLAR CODES, Pe = 32, AND f = 800 MHz.
Algorithm     CRC [bits]  N    R    L  Area [mm^2]  Power [mW]  Energy [nJ]
SCL           8           256  1/2  2  0.116        35.99       30.68
                                    4  0.233        83.85       71.48
                                    8  0.554        192.52      164.12
                               2/3  2  0.116        35.99       32.62
                                    4  0.233        83.85       75.99
                                    8  0.554        192.52      174.47
                          512  1/2  2  0.215        75.27       128.54
                                    4  0.432        150.03      256.21
                                    8  1.006        345.39      589.82
                               2/3  2  0.215        75.27       136.65
                                    4  0.432        150.03      272.38
                                    8  1.006        345.39      627.06
SSCL          8           256  1/2  2  0.206        56.77       23.99
                                    4  0.395        119.91      50.66
                                    8  0.826        277.17      117.10
                               2/3  2  0.206        56.77       29.45
                                    4  0.395        119.91      62.20
                                    8  0.826        277.17      143.78
                          512  1/2  2  0.343        98.91       83.70
                                    4  0.628        192.97      163.30
                                    8  1.314        421.47      356.67
                               2/3  2  0.343        98.91       102.25
                                    4  0.628        192.97      199.48
                                    8  1.314        421.47      435.69
Fast-SSCL     8           256  1/2  2  0.247        65.85       14.90
                                    4  0.490        142.33      41.28
                                    8  1.049        323.46      109.17
                               2/3  2  0.247        65.85       14.40
                                    4  0.490        142.33      42.88
                                    8  1.049        323.46      125.75
                          512  1/2  2  0.422        119.68      51.01
                                    4  0.795        221.92      124.00
                                    8  1.685        493.43      341.08
                               2/3  2  0.422        119.68      45.18
                                    4  0.795        221.92      112.35
                                    8  1.685        493.43      315.18
PSCL (P = 2)  (4,4)       256  1/2  2  0.091        28.62       24.40
                                    4  0.164        50.12       42.73
                                    8  0.389        123.55      105.33
                               2/3  2  0.091        28.62       25.94
                                    4  0.164        50.12       45.42
                                    8  0.389        123.55      111.97
                          512  1/2  2  0.191        63.19       107.91
                                    4  0.337        119.53      204.13
                                    8  0.694        205.68      351.25
                               2/3  2  0.191        63.19       114.72
                                    4  0.337        119.53      217.01
                                    8  0.694        205.68      373.41
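The power and energy columns of Table I can be related through the decoding latency. The sketch below assumes that the energy per codeword equals the average power multiplied by the decoding time, which is a reading of the table rather than a statement from the text, and converts that time into clock cycles at f = 800 MHz. The function name is hypothetical, and the results are approximate because the table entries are rounded.

```python
def implied_latency_cycles(power_mw, energy_nj, freq_mhz=800.0):
    """Decoding latency in clock cycles, assuming energy = power * time."""
    latency_us = energy_nj / power_mw         # nJ / mW = microseconds
    return latency_us * freq_mhz              # microseconds * MHz = cycles


# (power [mW], energy [nJ]) from Table I for N = 256, R = 1/2, L = 2.
table_entries = {
    "SCL": (35.99, 30.68),
    "SSCL": (56.77, 23.99),
    "Fast-SSCL": (65.85, 14.90),
    "PSCL": (28.62, 24.40),
}
cycles = {name: round(implied_latency_cycles(power, energy))
          for name, (power, energy) in table_entries.items()}
```

Under this assumption the implied latency falls from roughly 680 cycles for SCL to roughly 180 cycles for Fast-SSCL, which is consistent with the observation that Fast-SSCL has the lowest energy per frame despite its higher power.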
TABLE II
COMPARATIVE AREA, POWER, AND ENERGY CONSUMPTION RESULTS FOR SCL, SSCL, FAST-SSCL AND PSCL ARCHITECTURES FOR POLAR CODES
WITH N = 512 AGAINST LDPC 802.16E ARCHITECTURES WITH N = 576.
                SCL^a   Fast-SSCL^a  PSCL^a,b  SCL^c   SSCL^c  Fast-SSCL^c  LDPC [21]  LDPC [22]  LDPC [23]
Tech. (nm)      65      65           65        65      65      65           90         180        90
Rate            1/2     1/2          1/2       2/3     2/3     2/3          Any        1/2        Any
Area (mm^2)     0.215   0.422        0.191     1.006   1.314   1.685        6.22       N/A        6.25
Area^d (mm^2)   0.215   0.422        0.191     1.006   1.314   1.685        3.24       N/A        3.26
Power (mW)      75.27   119.68       63.19     345.39  421.47  493.43       528        553        264
Energy (nJ)     128.54  51.01        107.91    589.82  356.67  315.18       1368       232.9      690.6
a: L = 2, C = 8. b: P = 2, (c0, c1) = (8, 8). c: L = 8, C = 8. d: Scaled to 65 nm.
Comparing power and energy consumption of architectures
implemented with different technology nodes is not desirable,
since power scaling leads to highly inaccurate figures. However,
with the current scheme, SCL-based decoders consume up to 8.75×
less power, and up to 26.8× less energy per frame in case
of R = 1/2. For R = 2/3, polar codes yield 2.6× less power
consumption and 3.9× less energy dissipation per frame.
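The R = 1/2 ratios quoted in this section can be reproduced directly from the Table II entries. The short sketch below does so for area (scaled to 65 nm), power and energy; like the text, it compares the unscaled power and energy figures across technology nodes, so the ratios should be read with that caveat. The dictionary and function names are illustrative.

```python
# Table II entries used for the R = 1/2 comparison.
AREA_MM2_65NM = {"Fast-SSCL": 0.422, "PSCL": 0.191,
                 "LDPC [21]": 3.24, "LDPC [23]": 3.26}
POWER_MW = {"PSCL": 63.19, "LDPC [22]": 553.0}
ENERGY_NJ = {"Fast-SSCL": 51.01, "LDPC [21]": 1368.0}


def reduction(table, ldpc, polar):
    """How many times smaller the polar decoder figure is than the LDPC one."""
    return table[ldpc] / table[polar]


area_low = reduction(AREA_MM2_65NM, "LDPC [21]", "Fast-SSCL")
area_high = reduction(AREA_MM2_65NM, "LDPC [23]", "PSCL")
power_gain = reduction(POWER_MW, "LDPC [22]", "PSCL")
energy_gain = reduction(ENERGY_NJ, "LDPC [21]", "Fast-SSCL")
```

Rounded, these evaluate to the 7.7× and 17.1× area figures and the 8.75× power and 26.8× energy figures given for R = 1/2.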
According to these results, SCL-based polar code decoder
implementations offer good solutions for 5G applications that
require low area, power or energy consumption. For com-
munication devices that require low power and energy, SCL,
Fast-SSCL, and PSCL offer better figures than LDPC codes
at comparable FER, code lengths and rates. Considering area
occupation along with power consumption, PSCL provides a
very favorable solution with negligible loss in error-correction
performance.
VI. CONCLUSION
In this work, we evaluate SCL-based polar code decoder
implementations in terms of error-correction performance, area
occupation, power consumption, and energy consumption, for
a code set case study. SCL, SSCL and Fast-SSCL have
the same error-correction performance, while PSCL suffers
minor FER loss. We show that the considered polar code de-
coders have comparable error-correction performance against
WiMAX LDPC codes. We also show and compare the area,
power and energy consumption for all four decoder implementations,
and discuss their trade-offs. Comparing selected
SCL-based decoder implementations against WiMAX LDPC
architectures shows that polar code decoders have reduced
area, power and energy consumption, which makes them more
suitable for potential 5G communications.
REFERENCES
[1] E. Arıkan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July
2009.
[2] “3GPP TSG RAN WG1 meeting no. 87, chairman's notes of
agenda item 7.1.5 channel coding and modulation,” October 2016.
[3] K. Zheng, S. Ou, J. Alonso-Zarate, M. Dohler, F. Liu, and H. Zhu,
“Challenges of massive access in highly dense LTE-Advanced networks
with machine-to-machine communications,” IEEE Wireless Communi-
cations, vol. 21, no. 3, pp. 12–18, June 2014.
[4] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Transactions
on Information Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
[5] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “LLR-based
successive cancellation list decoding of polar codes,” IEEE Transactions on
Signal Processing, vol. 63, no. 19, pp. 5165–5179, Oct 2015.
[6] S. A. Hashemi, C. Condo, and W. J. Gross, “Simplified successive-
cancellation list decoding of polar codes,” in 2016 IEEE International
Symposium on Information Theory (ISIT), July 2016, pp. 815–819.
[7] S. A. Hashemi, C. Condo, and W. J. Gross, “Fast and flexible successive-
cancellation list decoders for polar codes,” CoRR, vol. abs/1703.08208,
2017. [Online]. Available: http://arxiv.org/abs/1703.08208
[8] S. A. Hashemi, A. Balatsoukas-Stimming, P. Giard, C. Thibeault, and
W. J. Gross, “Partitioned successive-cancellation list decoding of polar
codes,” in 2016 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), March 2016, pp. 957–960.
[9] A. Ren, B. Yuan, and Y. Wang, “Design of high-speed low-power
polar BP decoder using emerging technologies,” in 2016 29th IEEE
International System-on-Chip Conference (SOCC), Sept 2016, pp. 312–
316.
[10] A. Cassagne, O. Aumage, C. Leroux, D. Barthou, and B. L. Gal,
“Energy consumption analysis of software polar decoders on low power
processors,” in 2016 24th European Signal Processing Conference
(EUSIPCO), Aug 2016, pp. 642–646.
[11] J. Alonso-Zarate and M. Dohler, “M2M communications in 5G,” in
5G mobile communications, W. Xiang, K. Zheng, and X. Shen, Eds.
Switzerland: Springer International Publishing, 2017, pp. 361–379.
[12] “Evaluation on channel coding candidates for eMBB control channel,”
3GPP TSG RAN WG1 #87, R1-1611109, Reno, USA, Nov. 2016.
[13] A. Alamdar-Yazdi and F. R. Kschischang, “A simplified successive-
cancellation decoder for polar codes,” IEEE Communications Letters,
vol. 15, no. 12, pp. 1378–1380, December 2011.
[14] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast polar
decoders: Algorithm and implementation,” IEEE Journal on Selected
Areas in Communications, vol. 32, no. 5, pp. 946–957, May 2014.
[15] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast
list decoders for polar codes,” IEEE Journal on Selected Areas in
Communications, vol. 34, no. 2, pp. 318–328, Feb 2016.
[16] S. A. Hashemi, C. Condo, and W. J. Gross, “A fast polar code list
decoder architecture based on sphere decoding,” IEEE Trans. Circuits
Syst. I, vol. 63, no. 12, pp. 2368–2380, December 2016.
[17] S. A. Hashemi, C. Condo, and W. J. Gross, “Fast simplified successive-
cancellation list decoding of polar codes,” in 2017 IEEE Wireless
Communications and Networking Conference Workshops (WCNCW),
March 2017, pp. 1–6.
[18] S. A. Hashemi, M. Mondelli, S. H. Hassani, R. L. Urbanke, and
W. J. Gross, “Partitioned list decoding of polar codes: Analysis and
improvement of finite length performance,” CoRR, vol. abs/1705.05497,
2017. [Online]. Available: http://arxiv.org/abs/1705.05497
[19] M. P. C. Fossorier, M. Mihaljević, and H. Imai, “Reduced complexity
iterative decoding of low-density parity check codes based on belief
propagation,” IEEE Transactions on Communications, vol. 47, no. 5,
pp. 673–680, May 1999.
[20] D. E. Hocevar, “A reduced complexity decoder architecture via layered
decoding of LDPC codes,” in IEEE Workshop on Signal Processing
Systems, 2004. SIPS 2004., Oct 2004, pp. 107–112.
[21] C. H. Liu, C. C. Lin, S. W. Yen, C. L. Chen, H. C. Chang, C. Y. Lee, Y. S.
Hsu, and S. J. Jou, “Design of a multimode QC-LDPC decoder based
on shift-routing network,” IEEE Transactions on Circuits and Systems
II: Express Briefs, vol. 56, no. 9, pp. 734–738, Sept 2009.
[22] J. H. Hung and S. G. Chen, “A 1.45Gb/s (576,288) LDPC decoder for
802.16e standard,” in 2007 IEEE International Symposium on Signal
Processing and Information Technology, Dec 2007, pp. 916–921.
[23] C. H. Liu, S. W. Yen, C. L. Chen, H. C. Chang, C. Y. Lee, Y. S. Hsu,
and S. J. Jou, “An LDPC decoder chip based on self-routing network
for IEEE 802.16e applications,” IEEE Journal of Solid-State Circuits,
vol. 43, no. 3, pp. 684–694, March 2008.
