Fast Encoding and Decoding of Flexible-Rate and Flexible-Length Polar
  Codes by Hanif, Muhammad & Ardakani, Masoud
Muhammad Hanif and Masoud Ardakani, Senior Member, IEEE
Abstract—This work is on fast encoding and decoding of
polar codes. We propose and detail 8-bit and 16-bit parallel
decoders that can be used to reduce the decoding latency of the
successive-cancellation decoder. These decoders are universal and
can decode flexible-rate and flexible-length polar codes. We also
present fast encoders that can be used to increase the throughput
of serially-implemented polar encoders.
Index Terms—Polar codes, domination contiguity, maximum
a posteriori, maximum likelihood, minimum distance, systematic
codes, non-systematic codes.
I. INTRODUCTION
POLAR codes are capacity-achieving block codes that were recently introduced by Arıkan [1]. Due to their provable
capacity-achieving performance with low-complexity encoding
and decoding, they have gained significant interest [2],
[3]. In particular, polar codes can be encoded and decoded in
a recursive fashion, which results in encoding and decoding
complexities of $O(N \log_2 N)$, where $N$ is the code length [1].
One of the main challenges associated with polar codes is
their high decoding latency and low throughput [2], [4]. Since
the serial nature of decoding bottlenecks fast implementation
of polar coding, researchers have introduced novel ways to
reduce the decoding time [2]–[6]. For example, [5] introduces
the notion of rate-zero and rate-one nodes to reduce the de-
coding depth of the successive cancellation (SC) decoder. The
resulting simplified SC (SSC) decoder reduces the decoding
latency by up to 20 times [6]. The decoding latency can further be
improved by identifying single-parity-check (SPC), repetition
(REP) and REP-SPC nodes in the decoding tree of polar codes
and implementing their fast decoders [2].
The key idea behind the above-mentioned strategies is to
increase the decoding speed by reducing the decoding depth
and implementing fast parallel decoders for some particular
frozen-bit sequences. As such, these schemes work only on
specific codes. In particular, the decoding tree of a given
rate and length polar code is first constructed, and then
the afore-mentioned nodes are identified and implemented in
hardware. Changing the code length or rate will necessitate re-
identification of these nodes. As such, these schemes are not
suitable for variable-rate and/or variable-length polar codes.
The new radio access technology will use a variable rate and
length polar code for downlink control information to fully
utilize the physical resources [7]–[9]. As such, implementing
fast polar decoders that work for any rate and length is of
practical interest.
M. Hanif and M. Ardakani are with the Department of Electrical
and Computer Engineering, University of Alberta, AB, Canada (email:
mhanif@uvic.ca, ardakani@ualberta.ca).
One such decoder, which reduces the decoding depth by one, was proposed in [10]. In particular, the authors
proposed to decode two bits in parallel by implementing
four decoders corresponding to all frozen-bit sequences. The
same authors then extended this concept to larger power-of-2
block sizes in [11], [12], but the extension results in a huge
hardware cost as the number of all frozen-bit sequences grows
exponentially with the block size. Secondly, their decoding
methodology does not identify and utilize the code structure
for different frozen-bit sequences to reduce the hardware area
or computational complexity.
In this paper, we present fast decoders for variable-rate
and/or variable-length polar codes. In particular, we borrow
the idea presented in [10]–[12] of implementing R-bit (R =
2, 4, 8, 16) parallel decoders at the last stage. But, unlike [10]–
[12], we do not implement $2^R$ parallel decoders for decoding
each contiguous block of R bits. Rather, we rely on a key
characteristic of properly-designed polar codes, domination
contiguity of the set of good bit-channel indexes [3], to sig-
nificantly reduce the required number of parallel decoders for
each block. Additionally, we use the minimum distance of the
polar code corresponding to each domination-contiguous set to
further reduce the number of required decoders. For example,
we implement only 21 instead of $2^{16}$ parallel decoders for
R = 16, which implies that the required number of decoders
is reduced by 99.97% compared to a simple application of
ideas presented in [11], [12].
Achieving hardware-area reduction is not the only aim of
our proposed decoding strategy. We also reduce the decoding
complexity by relying on specific structure of the polar code
corresponding to each bit-channel index set. We aim to mini-
mize computationally-intensive operations, such as check-node
operations, while ensuring that the proposed parallel decoders
do not tangibly alter the performance of the SC decoder.
Unlike decoding, which is serial in nature, encoding of polar
codes can be done in parallel. In fact, the seminal work on
polar codes presented a fully-parallel encoding architecture for
non-systematic polar codes [1]. Although very high encoding
speed can be achieved by implementing a fully-parallel en-
coder, such an implementation is highly disadvantaged due to
large memory size and number of XOR gates, especially for
long polar codes. As such, folded or serial implementations of
polar-code encoders are implemented to reduce the hardware
area [13]. Moreover, systematic polar codes, as proposed in
[14], are serial by nature.
As with decoding, a serial implementation of the polar encoder
results in high encoding latency. Similar to our proposed
decoding strategy, we can increase the encoding speed by
implementing R-bit parallel encoders at the last stage. Such an
implementation is particularly helpful for flexible-rate and/or
flexible-length systematic polar codes as they are non-trivial to
parallelize due to their bidirectional information transfer
[2], [3], [14].
arXiv:1704.00651v1 [cs.IT] 3 Apr 2017
Our main contributions are summarized in the following.
1) We present fast parallel decoders for variable-rate and
variable-length polar codes. In particular, we detail 9
(instead of $2^8$) decoders for 8-bit parallel decoding and
21 (instead of $2^{16}$) decoders for 16-bit parallel decoding
of polar codes. These decoders accommodate all frozen-
bit sequences that can occur in a block of 8 or 16 bits.
2) Secondly, our scheme improves the encoding speed of
serially-implemented variable-rate or variable-length polar
codes. Our scheme is particularly useful for flexible-rate
or flexible-length systematic polar codes as their
encoding is hard to parallelize.
In the following, we first provide a background on polar
codes in Section II, primarily to establish some notation
and explain the challenges associated with variable-rate polar
codes. We also review the domination contiguity of the set of
good bit-channel indexes in Section II. We then present our
proposed R-bit parallel encoders/decoders in Section III. Afterwards,
we present some numerical results in Section IV to corroborate
our proposed scheme, followed by concluding
remarks in Section V.
Although we will be focussing mainly on the systematic
polar codes (due to their superior performance and difficulty
in parallelization [3], [14]), similar results and conclusions
can be drawn for non-systematic polar codes with little or no
modifications.
II. BACKGROUND
A. Encoding Polar Codes
Polar codes defined on the binary field, $\mathbb{F}_2$, are block codes,
which can be mathematically described as
$$x = u G_N, \tag{1}$$
where $x \in \mathbb{F}_2^N$ is the codeword of length $N = 2^n$, $u \in \mathbb{F}_2^N$ is
the input vector comprising information and frozen bits, and
$G_N \in \mathbb{F}_2^{N \times N}$ denotes the generator matrix. The matrix $G_N$
for non-reversed polar codes$^1$ is $G_N = F^{\otimes n}$, where $F^{\otimes n}$ is
the $n$th tensor power of $F$ defined as
$$F^{\otimes n} = \begin{bmatrix} F^{\otimes (n-1)} & O \\ F^{\otimes (n-1)} & F^{\otimes (n-1)} \end{bmatrix}, \tag{2}$$
with $F^{\otimes 0} = 1$. Denoting the left and right halves of $u$ ($x$)
by $u_0$ and $u_1$ ($x_0$ and $x_1$), respectively, (2) implies $x_1 = u_1 G_{N/2}$
and $x_0 = u_0 G_{N/2} + x_1$. Consequently, a message
vector of length $N$ can be encoded by encoding two message
vectors of length $N/2$ each. Repeating this process $n$ times
results in encoding $N$ message bits individually. As such, polar
codes have a low encoding complexity of $O(N \log_2 N)$.
$^1$For the sake of exposition, we consider only non-reversed polar codes
as similar conclusions can be drawn for reversed polar codes with proper
permutation.
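The recursion implied by (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `polar_encode` and the representation of vectors as Python lists of 0/1 integers are choices made here for the example.

```python
def polar_encode(u):
    """Non-systematic, non-reversed polar encoding x = u G_N over F_2.

    Follows the split implied by (2): x1 = u1 G_{N/2} and
    x0 = u0 G_{N/2} + x1, where u0/u1 are the left/right halves of u.
    The length of u is assumed to be a power of two.
    """
    n = len(u)
    if n == 1:
        return list(u)  # F^{(x)0} = 1, so a single bit encodes to itself
    half = n // 2
    x1 = polar_encode(u[half:])                       # right half
    x0 = [a ^ b for a, b in zip(polar_encode(u[:half]), x1)]  # left half
    return x0 + x1
```

Since $F^{\otimes n}$ is its own inverse over $\mathbb{F}_2$, applying `polar_encode` twice returns the original input, which gives a quick sanity check of the sketch.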
B. Constructing Polar Codes
Polar codes rely on the phenomenon of channel polarization,
which constructs $N$ polarized bit channels out of $N$ independent
copies of a given binary memoryless channel, $W$. In
particular, $N$ copies of $W$ are combined and then split into a set
of $N$ binary-input channels, $W_N^{(i)}$, where $i = 0, 1, \cdots, N-1$,
such that the symmetric capacity of $W_N^{(i)}$ tends towards
either 0 or 1 as $N$ becomes large. Bit channels having near-unity
symmetric capacity are identified as ‘good’ channels,
whereas the others are classified as ‘bad’ channels and are
frozen to zero [1]. Mathematically, denoting the sets of ‘good’
and ‘bad’ bit-channel indexes by $A$ and $A^c$, respectively,
$u_{A^c} = (u_i : i \in A^c) = 0$.
C. Decoding Polar Codes
Since a polar code can be encoded recursively, it can be
represented by a binary tree, where each node represents a
codeword [2], [5]. Fig. 1 (a) shows such a tree corresponding
to a polar code of length 16, where the white and black leaves
correspond to frozen and information bits, respectively.
Fig. 1. Decoding trees corresponding to the (a) SC, (b) SSC, (c) fast-SSC,
(d) proposed (R = 8), and (e) proposed (R = 16) decoding algorithms.
We explain different decoding algorithms using the tree
representation as follows. In the SC decoder [1], the root node
receives $y$, the channel log-likelihood ratios (LLRs). Denoting
the left and right halves of $y$ by $y_0$ and $y_1$, respectively, the
root node sends the outputs of check-node operations between
$y_0$ and $y_1$ to its left child. Mathematically, the left child
receives $\tilde{y}_0 = y_0 \boxplus y_1$, a vector of $N/2$ real numbers whose
$i$th element, $\tilde{y}_{0i}$, is computed as
$$\tilde{y}_{0i} = y_{0i} \boxplus y_{1i} = 2 \tanh^{-1}\left[\tanh\left(\frac{y_{0i}}{2}\right) \tanh\left(\frac{y_{1i}}{2}\right)\right], \tag{3}$$
where $y_{0i}$ and $y_{1i}$ are the $i$th elements of $y_0$ and $y_1$, respectively.
Afterwards, the left child performs decoding on $\tilde{y}_0$ (which
we will explain later) and returns a binary vector, $\tilde{x}_0$, to the
root node. The root node then computes $\tilde{y}_1$ and sends it to
the right child. The $i$th element of $\tilde{y}_1$, $\tilde{y}_{1i}$, is computed as
$$\tilde{y}_{1i} = (1 - 2\tilde{x}_{0i})\, y_{0i} + y_{1i}, \tag{4}$$
where $\tilde{x}_{0i}$ is the $i$th element of $\tilde{x}_0$. The right child then
performs decoding similar to the left child and returns a binary
vector, $\tilde{x}_1$, to the root node. The root node then returns an
estimate of the codeword based on $\tilde{x}_0$ and $\tilde{x}_1$ as
$$\hat{x} = \begin{bmatrix} \tilde{x}_0 + \tilde{x}_1 & \tilde{x}_1 \end{bmatrix}, \tag{5}$$
where the vector addition is performed in $\mathbb{F}_2^{N/2}$.
For non-systematic polar codes, the left and right children
also return $\hat{u}_0$ and $\hat{u}_1$, respectively, to the root node. These
binary vectors are estimates of the left and right halves of the
input, $u$. The root node, in addition to computing $\hat{x}$, computes
an estimate of $u$ as $\hat{u} = [\hat{u}_0 \;\; \hat{u}_1]$.
Since each child of the root node can be considered as the
root node of a subtree, each child performs exactly the same
operations as its parent node. This process continues until the
leaf nodes receive real-valued messages from their parents.
Since leaf nodes do not have any children, they send either 0 or
hard-decision estimates based on the received LLRs to their
parents depending on the frozen-bit sequence.
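The message passing described above can be condensed into a short recursive sketch. It is only an illustration of the plain SC schedule in (3) to (5), not an optimized or hardware-oriented decoder. The names `check_node` and `sc_decode`, the boolean `frozen` list, and the convention that a non-negative LLR maps to bit 0 are assumptions made for this example.

```python
import math


def check_node(a, b):
    """Check-node operation of (3) applied to two scalar LLRs."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    # Clamp so that atanh stays finite for very large LLR magnitudes.
    t = max(min(t, 1.0 - 1e-12), -1.0 + 1e-12)
    return 2.0 * math.atanh(t)


def sc_decode(y, frozen):
    """Recursive SC decoding.

    y      : list of LLRs (length a power of two)
    frozen : list of booleans, True where the bit channel is frozen
    Returns (x_hat, u_hat) as lists of 0/1 integers.
    """
    n = len(y)
    if n == 1:
        # Leaf: frozen bits are 0, otherwise take a hard decision.
        bit = 0 if (frozen[0] or y[0] >= 0) else 1
        return [bit], [bit]
    half = n // 2
    y0, y1 = y[:half], y[half:]
    # Message to the left child, as in (3).
    left_llr = [check_node(a, b) for a, b in zip(y0, y1)]
    x0, u0 = sc_decode(left_llr, frozen[:half])
    # Message to the right child, as in (4).
    right_llr = [(1 - 2 * xb) * a + b for xb, a, b in zip(x0, y0, y1)]
    x1, u1 = sc_decode(right_llr, frozen[half:])
    # Codeword estimate of this node, as in (5).
    x_hat = [a ^ b for a, b in zip(x0, x1)] + x1
    return x_hat, u0 + u1
```

The sketch returns both estimates at every node, so the same routine serves the non-systematic case (read `u_hat`) and the systematic case (read `x_hat` at the information positions).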
The decoding latency of the SC decoder depends on
the computation time of the check-node operation and the
decoding-tree depth. The SSC decoder [5] improves the la-
tency by identifying and removing descendants of rate-0 and
rate-1 nodes in the code tree as shown in Fig. 1 (b). Here,
instead of traversing (which involves performing check-node
operations) the subtree rooted in a rate-0 node, an all-zero
vector is sent to its parent node. Similarly, a rate-1 node
computes and sends the binary vector(s) to its parent node
without traversing its descendants.
The fast-SSC decoder [2] further prunes the decoding tree
by identifying the SPC, REP and REP-SPC nodes in the tree.
For example, in Fig. 1 (c), the subtrees of two SPC nodes are
removed from the tree resulting in the decoding depth of only
two.
As is clear from Fig. 1, both the SSC and fast-SSC decoders
require identification of some special nodes based on the
frozen-bit sequence, which can be used to eliminate corre-
sponding subtrees in the code tree. In polar codes with flexible
rate or length, the frozen-bit sequence changes with the code
rate and length (see Fig. 2). Hence, these algorithms are not
directly applicable.
For variable-rate or variable-length polar codes, R-bit paral-
lel decoders as proposed in [10]–[12] can be used to improve
the decoding speed, where R = 2, 4, 8, · · · . Fig. 2 shows
such a case where a 16-bit parallel decoder is implemented
to decode a variable-rate polar code of length N = 64. Here, the
frozen-bit sequence is represented in hexadecimal notation.
For example, when the frozen-bit sequence is FFFE, all bits
except the last one are frozen to zero in a block of 16 bits.
Observe that, unlike Fig. 1 (b) and (c), the code-tree struc-
ture remains the same regardless of the code rate. Secondly, the
code tree is also complete, i.e., all nodes except the leaf nodes
have two children. A similar observation can be made about
Fig. 1 (d) and (e), where 8-bit and 16-bit parallel decoders are
used to improve the decoding speed, respectively.
The R-bit parallel decoders must be able to decode all the
frozen-bit sequences that can occur in a block of R bits. One
way to satisfy this requirement is to implement $2^R$ parallel
sub-decoders and select the appropriate one corresponding to
the frozen-bit sequence as proposed in [10]–[12]. Unfortunately,
this solution becomes impractical even for small values
of R. However, we can reduce the required number of decoders
by using a key characteristic of properly-designed polar codes,
domination contiguity, as described below.
Fig. 2. An illustration of (a) a code tree corresponding to length-64 polar
codes with code rates k/64, where k = 31, 48, and 53, and (b) the proposed
16-bit parallel decoder. The frozen-bit sequences of the four 16-bit blocks in (a) are
FFFE, FEE0, FC80, 8000 for k = 31; FEE8, E000, C000, 0000 for k = 48; and
FEC0, 8000, 8000, 0000 for k = 53.
D. Domination Contiguity
Let $\{\{N\}\}$ denote the set $\{0, 1, \cdots, N-1\}$, where $N = 2^n$
and $n$ is a positive integer. For $i \in \{\{N\}\}$, we use $\langle i \rangle_2$
to represent the $n$-bit binary representation of $i$, i.e., $\langle i \rangle_2 = i_n i_{n-1} \cdots i_1$,
where $i_n$ is the most-significant bit. For $i, j \in \{\{N\}\}$,
$i$ is said to binary dominate $j$, denoted by $i \succeq j$, if and
only if $i_k \geq j_k$ for all $1 \leq k \leq n$, where $i_k$ and $j_k$ are the
$k$th least-significant bits in the binary representations of $i$ and
$j$, respectively. A set, $S \subseteq \{\{N\}\}$, is domination contiguous if
$h, j \in S$ and $h \succeq i \succeq j$ implies $i \in S$.
For a properly-designed polar code, the set of good bit-channel
indexes, $A$, must be domination contiguous [3]. Since
not all subsets of $\{\{N\}\}$ are domination contiguous, the cardinality
of the set of all possible frozen-bit sequences of a
properly-designed polar code is less than $2^N$. In the following,
we will explain how this key characteristic can be used to
significantly reduce the number of required decoders from $2^R$
for R-bit parallel decoders.
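The two definitions above translate directly into bit operations. The following is a small sketch for experimentation; the helper names `dominates` and `is_domination_contiguous` are introduced here for illustration and do not come from the cited works.

```python
def dominates(i, j):
    """True if i binary-dominates j, i.e. every 1-bit of j is also set in i."""
    return (i & j) == j


def is_domination_contiguous(S, N):
    """Direct check of the definition: whenever h and j are in S and
    h dominates i dominates j, the index i must also be in S."""
    S = set(S)
    for h in S:
        for j in S:
            if not dominates(h, j):
                continue
            for i in range(N):
                if dominates(h, i) and dominates(i, j) and i not in S:
                    return False
    return True
```

For example, with N = 8 the set {3, 5, 6, 7} passes the check, whereas {4, 7} fails because 7 dominates 5, 5 dominates 4, and 5 is missing.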
III. PROPOSED ENCODER/DECODER STRUCTURE
The proposed encoders/decoders rely on parallel encoding/decoding
of R bit channels simultaneously, where $R = 2^t$,
and $t$ is a positive integer. Specifically, we divide the $N$-bit $u$
into $N/R$ consecutive groups each containing R bits, i.e.,
$u = [u_0 \cdots u_{N/R-1}]$. Then we encode/decode R bits in each
group simultaneously to reduce the encoding/decoding depth
of the code tree by $t$ levels.
Quite intuitively, the greater the value of R, the faster the
encoder/decoder will be. On the downside, we require more
hardware to implement all the encoders/decoders for a block
size of R. Theorem 1 describes how the domination
contiguity of $A$ is inherited by each block
of R bits, which will be used later to optimize the number of
required encoders/decoders for an R-bit parallel decoder.
Theorem 1. Let $\tilde{A}_i$ denote the set of good bit-channel indexes
corresponding to $u_i$, $i = 0, 1, \cdots, N/R - 1$. For a properly-designed
polar code, $\tilde{A}_i$ is domination contiguous.
Proof: Observe that all elements of each $\tilde{A}_i$ have the
same $n - t$ most-significant bits, where $n = \log_2(N)$ and $t = \log_2(R)$.
Further, the $n - t$ most-significant bits corresponding
to each $\tilde{A}_i$ differ in at least one bit from those of $\tilde{A}_j$, where
$j = 0, 1, \cdots, N/R - 1$ and $j \neq i$. Domination contiguity
of $A$ implies that if $h, j \in \tilde{A}_i$ and $h \succeq k \succeq j$ then $k \in A$.
But $h$ and $j$ have the same $n - t$ most-significant bits. As
such, $h \succeq k \succeq j$ implies that $k$ also has the same $n - t$
most-significant bits. Therefore, $k \in \tilde{A}_i$.
An immediate consequence of the domination contiguity of
the $\tilde{A}_i$'s is that the number of required encoders/decoders can be
reduced from $2^R$. The following theorem shows that the number
of all domination-contiguous sets becomes significantly
less than $2^R$ as R grows to infinity.
Theorem 2. Let $U_R^i = \{\tilde{A}_i\}$ denote the universal set containing
all possible sets of good bit-channel indexes of the $i$th R-bit block, $u_i$, for
$i = 0, 1, \cdots, N/R - 1$. Denoting the cardinality of a set $S$ by
$|S|$, the ratio $|U_R^i| / 2^R$ is a decreasing function of $R$, and in
the limit as $R$ goes to infinity, the ratio $|U_R^i| / 2^R$ goes to zero.
Proof: Without loss of generality, we consider the first
R-bit block ($i = 0$). Furthermore, we let $P_R = |U_R^0|$ and split
the elements of any $\tilde{A}_0 \in U_R^0$ into two sets, $S_0$ and $S_1$, where
$S_0$ contains those elements of $\tilde{A}_0$ that are less than $R/2$, and
$S_1$ contains the remaining elements.
Theorem 1 asserts that the domination contiguity of $\tilde{A}_0$
begets domination contiguity of $S_0$ and $S_1$. Consequently,
$U_R^0 \subseteq \mathcal{B}$, where $\mathcal{B} = \{B : B = B_0 \cup B_1\}$, $B_0 \in U_{R/2}^0$,
$B_1 = \{b_i : (b_i - R/2) \in \tilde{B}\}$, and $\tilde{B} \in U_{R/2}^0$. Clearly,
$|\mathcal{B}| = P_{R/2}^2$, and as such, $|U_R^0| = P_R \leq P_{R/2}^2$.
Moreover, if $\tilde{A}_0 \neq \{\}$ then $R - 1 \in \tilde{A}_0$ [2]. Equivalently,
$\tilde{A}_0 \neq \{\}$ implies $S_1 \neq \{\}$. Conversely, if $S_1 = \{\}$ then
$\tilde{A}_0 = \{\}$ and $S_0 = \{\}$. But $\mathcal{B}$ contains those sets which
correspond to $S_1 = \{\}$ and $S_0 \neq \{\}$. Therefore, $U_R^0 \subset \mathcal{B}$, and
$P_R < P_{R/2}^2$.
Lastly, we define a sequence $a_t = P_R / 2^R = P_{2^t} / 2^{2^t}$, where
$t = \log_2(R)$ and $t = 1, 2, \cdots$. Using the fact that $0 < a_t < 1$
and $a_{t+1} < a_t^2$, we get $\lim_{t \to \infty} a_t = \lim_{R \to \infty} P_R / 2^R = 0$.
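The last step of the proof can be made explicit by unrolling the recursion. The short derivation below is added for illustration; it uses $P_2 = 3$, the value for R = 2 quoted in the discussion following the theorem, so that $a_1 = 3/4$:

$$a_{t+1} < a_t^2 \;\Longrightarrow\; a_t < a_1^{2^{t-1}} = \left(\tfrac{3}{4}\right)^{2^{t-1}} \quad (t \geq 2), \qquad \left(\tfrac{3}{4}\right)^{2^{t-1}} \xrightarrow[t \to \infty]{} 0.$$

The bound decays doubly exponentially in $t$. It is consistent with the quoted values of $P_R$: for instance, $a_2 = 6/16 = 0.375 < (3/4)^2 \approx 0.56$ and $a_3 = 20/256 \approx 0.078 < (3/4)^4 \approx 0.32$.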
Theorem 2 shows that the ratio $P_R / 2^R$ is a decreasing
function of $R$. But it does not imply that $P_R$ does not increase
rapidly. In fact, $P_R$ can be shown to be equal to 2, 3, 6, 20, 168,
and 7581 for R = 1, 2, 4, 8, 16, and 32, respectively. Clearly,
implementing $P_R$ parallel decoders to increase the decoding
speed becomes impractical even for moderate block lengths.
Fortunately, we can reduce the number of required decoders
further by eliminating some frozen-bit sequences depending
on the minimum distance of the corresponding code and the
position of frozen bits in the sequence as explained below.
Note that the minimum required number of encoders/decoders
corresponding to each block of R bits is R + 1 because the
number of frozen bits in a block of R bits can vary from 0
to R. Hence, at least R + 1 encoders/decoders are needed.
A solution based on R + 1 encoders/decoders, if it exists,
minimizes the hardware area.
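The quoted values of $P_R$ can be reproduced by brute force for small R. The sketch below rests on one assumption, stated here explicitly: a candidate set is counted if it is domination contiguous and, when non-empty, contains R − 1 (the property used in the proof of Theorem 2). Together these two conditions mean that the set is closed upwards under binary domination, which is what the code tests. The function name `count_block_sets` is hypothetical.

```python
def count_block_sets(R):
    """Count subsets of {0, ..., R-1} that contain, along with every
    element j, all indexes that binary-dominate j.

    Subsets are represented as R-bit masks (bit j set <=> j in the set).
    """
    # up[j] is the mask of all indexes i that dominate j.
    up = [sum(1 << i for i in range(R) if (i & j) == j) for j in range(R)]
    count = 0
    for mask in range(1 << R):
        if all((mask & up[j]) == up[j] for j in range(R) if (mask >> j) & 1):
            count += 1
    return count
```

Running the function for R = 1, 2, 4, 8, and 16 gives 2, 3, 6, 20, and 168, matching the values stated above. The case R = 32 is out of reach for this exhaustive enumeration.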
In the following, we present encoders for all the 20 cases
corresponding to A˜i for R = 8. Later, we reduce this number
to the minimum, i.e. 9, based on the position of frozen bits and
the minimum distance of the polar code corresponding to each
frozen-bit sequence. To do so, we define the notion of max-
frozen set, which will be used to eliminate some candidate
sets.
Let $\pi_n : \{\{N\}\} \to \{\{N\}\}$ denote a bit-permutation function
that maps $j \in \{\{N\}\}$ to $\pi_n(j)$ such that the bits in $\langle \pi_n(j) \rangle_2$
are a permutation of the bits in $\langle j \rangle_2$. Further, for $D \subseteq \{\{N\}\}$,
we define $\pi_n(D) = \{\pi_n(d) : d \in D\}$. For example, the bit-reversal
permutation [1] for $N = 4$ maps $j$ to $\pi_2(j)$, where
$\langle \pi_2(j) \rangle_2 = j_1 j_2$ and $\langle j \rangle_2 = j_2 j_1$. Likewise, if $D = \{0, 2, 3\}$
then $\pi_2(D) = \{0, 1, 3\}$.
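The bit-reversal example can be reproduced with a small helper. The function names `bit_permute` and `bit_reverse`, and the convention that `perm[k]` names the source bit position (0 being the least-significant bit) for output bit k, are assumptions made for this illustration.

```python
def bit_permute(j, perm):
    """Apply a bit permutation to the index j.

    perm[k] is the position (0 = least significant) of the bit of j that
    becomes bit k of the result; len(perm) is the number of bits n.
    """
    return sum(((j >> perm[k]) & 1) << k for k in range(len(perm)))


def bit_reverse(j, n):
    """Bit-reversal permutation on n-bit indexes."""
    return bit_permute(j, list(range(n - 1, -1, -1)))


def permute_set(D, perm):
    """Image of the index set D under the bit permutation."""
    return {bit_permute(d, perm) for d in D}
```

With n = 2, `permute_set({0, 2, 3}, [1, 0])` returns {0, 1, 3}, in agreement with the example in the text.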
Lemma 1. If $D$ is domination contiguous then so is $\pi_n(D)$.
Proof: Let $\pi_n^{-1}(\cdot)$ be the inverse permutation function,
i.e., $\pi_n^{-1}(\pi_n(j)) = j$. Further, let $h, j \in \pi_n(D)$ and $h \succeq k \succeq j$.
As such, $\pi_n^{-1}(h), \pi_n^{-1}(j) \in D$. Also, we observe that
binary domination is invariant to bit permutations, i.e., $h \succeq k \succeq j$
implies $\pi_n^{-1}(h) \succeq \pi_n^{-1}(k) \succeq \pi_n^{-1}(j)$. Since domination
contiguity of $D$ implies $\pi_n^{-1}(k) \in D$, $k \in \pi_n(D)$. Therefore,
$\pi_n(D)$ is domination contiguous.
Let $P_N = [e_{\pi_n(0)} \cdots e_{\pi_n(N-1)}]$ denote the permutation
matrix corresponding to the bit-permutation function $\pi_n(\cdot)$.
Here, $e_l$ denotes a column vector whose $l$th element is 1 and
the remaining $N - 1$ elements are 0. It was shown in [15], [16]
that $P_N G_N = G_N P_N$. Therefore, $u P_N G_N = x P_N$. Denoting
$u P_N = u_{\pi_n}$ and $x P_N = x_{\pi_n}$, we get $u_{\pi_n} G_N = x_{\pi_n}$.
Consequently, if $x$ is the polar codeword for input $u$ then permuting
the bit positions of $u$ permutes $x$ in the same manner. We refer
to these codes as the conjugates of the original code. Observe
that the set of good bit-channel indexes of a conjugate polar
code is $\pi_n(A)$.
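The commutation property can be checked exhaustively for short lengths. The sketch below is a numerical sanity check under the conventions stated in its comments, not a proof. The helper names are hypothetical, and indexes are taken as 0-based so that the $l$th entry of $u P_N$ is $u_{\pi_n(l)}$.

```python
from itertools import permutations, product


def polar_encode(u):
    """x = u G_N over F_2, using the recursive split of the generator matrix."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    x1 = polar_encode(u[half:])
    x0 = [a ^ b for a, b in zip(polar_encode(u[:half]), x1)]
    return x0 + x1


def bit_permute(j, perm):
    """perm[k] is the source bit position (0 = LSB) for output bit k."""
    return sum(((j >> perm[k]) & 1) << k for k in range(len(perm)))


def permute_vector(v, perm):
    """Return v P_N, i.e. the vector whose l-th entry is v[pi(l)]."""
    return [v[bit_permute(l, perm)] for l in range(len(v))]


def commutes_for_all_inputs(n):
    """Check (u P_N) G_N == (u G_N) P_N for every bit permutation on n bits
    and every binary input u of length N = 2^n."""
    N = 1 << n
    for perm in permutations(range(n)):
        for bits in product([0, 1], repeat=N):
            u = list(bits)
            lhs = polar_encode(permute_vector(u, perm))
            rhs = permute_vector(polar_encode(u), perm)
            if lhs != rhs:
                return False
    return True
```

The check passes for n = 2 and n = 3, reflecting the fact that entry $(i, j)$ of $G_N$ is 1 exactly when $i \succeq j$, a relation that bit permutations preserve.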
Lemma 1 confirms that, similar to $A$, $\pi_n(A)$ is domination
contiguous. Further, when $\pi_n(A)$ differs from $A$, the SC decoding
shows worse performance [16]. However, the SC decoder
can be modified to decode in the permuted order [15], [16]
to achieve the same performance. Consequently, for a given
decoding order, only one set of good bit-channel indexes
will show the best performance. Quite intuitively, the set
which results in early ‘decoding’ of the most frozen bits will
outperform others [17, Section 7.4.3]. We call this set the max-
frozen set. As such, for a given decoding order, the number of
required decoders can be reduced by implementing decoders
only for the max-frozen sets and ignoring their distinct bit-
permuted sets.
In the following, we present encoders and decoders for R = 8.
For the sake of clarity, we will drop the subscript $i$ from $\tilde{A}_i$.
Also, the set of frozen bit-channel indexes will be represented
by $\tilde{A}^c$. Lastly, we will consider the natural-order decoding of
polar codes to eliminate all the bit-permuted sets of the max-frozen
set.
TABLE I
ALL POSSIBLE FROZEN-BIT SEQUENCES AND THE CORRESPONDING OUTPUTS FOR R = 8.
k | f (Binary) | f (Hex) | $\tilde{A}$ | Output, x | dmin
0 | 1, 1, 1, 1, 1, 1, 1, 1 | FF | {} | 0, 0, 0, 0, 0, 0, 0, 0 | -
1 | 1, 1, 1, 1, 1, 1, 1, 0 | FE | {7} | x7, x7, x7, x7, x7, x7, x7, x7 | 8
2 | 1, 1, 1, 1, 1, 1, 0, 0 | FC | {6,7} | x6, x7, x6, x7, x6, x7, x6, x7 | 4
2 | 1, 1, 1, 1, 1, 0, 1, 0 | FA | {5,7} | x5, x5, x7, x7, x5, x5, x7, x7 | 4
2 | 1, 1, 1, 0, 1, 1, 1, 0 | EE | {3,7} | x3, x3, x3, x3, x7, x7, x7, x7 | 4
3 | 1, 1, 1, 1, 1, 0, 0, 0 | F8 | {5,6,7} | x567, x5, x6, x7, x567, x5, x6, x7 | 4
3 | 1, 1, 1, 0, 1, 1, 0, 0 | EC | {3,6,7} | x367, x3, x367, x3, x6, x7, x6, x7 | 4
3 | 1, 1, 1, 0, 1, 0, 1, 0 | EA | {3,5,7} | x357, x357, x3, x3, x5, x5, x7, x7 | 4
4 | 1, 1, 1, 0, 1, 0, 0, 0 | E8 | {3,5,6,7} | x356, x357, x367, x3, x567, x5, x6, x7 | 4
4 | 1, 1, 1, 1, 0, 0, 0, 0 | F0 | {4,5,6,7} | x4, x5, x6, x7, x4, x5, x6, x7 | 2
4 | 1, 1, 0, 0, 1, 1, 0, 0 | CC | {2,3,6,7} | x2, x3, x2, x3, x6, x7, x6, x7 | 2
4 | 1, 0, 1, 0, 1, 0, 1, 0 | AA | {1,3,5,7} | x1, x1, x3, x3, x5, x5, x7, x7 | 2
5 | 1, 1, 1, 0, 0, 0, 0, 0 | E0 | {3,4,5,6,7} | x347, x357, x367, x3, x4, x5, x6, x7 | 2
5 | 1, 1, 0, 0, 1, 0, 0, 0 | C8 | {2,3,5,6,7} | x257, x357, x2, x3, x567, x5, x6, x7 | 2
5 | 1, 0, 1, 0, 1, 0, 0, 0 | A8 | {1,3,5,6,7} | x167, x1, x367, x3, x567, x5, x6, x7 | 2
6 | 1, 1, 0, 0, 0, 0, 0, 0 | C0 | {2,3,4,5,6,7} | x246, x357, x2, x3, x4, x5, x6, x7 | 2
6 | 1, 0, 1, 0, 0, 0, 0, 0 | A0 | {1,3,4,5,6,7} | x145, x1, x367, x3, x4, x5, x6, x7 | 2
6 | 1, 0, 0, 0, 1, 0, 0, 0 | 88 | {1,2,3,5,6,7} | x123, x1, x2, x3, x567, x5, x6, x7 | 2
7 | 1, 0, 0, 0, 0, 0, 0, 0 | 80 | {1,2,3,4,5,6,7} | x1234567, x1, x2, x3, x4, x5, x6, x7 | 2
8 | 0, 0, 0, 0, 0, 0, 0, 0 | 00 | {0,1,2,3,4,5,6,7} | x0, x1, x2, x3, x4, x5, x6, x7 | 1
A. Block Size 8 Encoders/Decoders
Table I lists all the 20 frozen-bit sequences, $f$, that can
occur in a block of 8 consecutive bit channels of a properly-designed
polar code. The $i$th component of $f$ is 1 if the $i$th
bit channel is frozen and is 0 otherwise. These sequences are
grouped into 9 different cases depending on the number of
information bits in the block of 8 bits. The corresponding sets
of good bit-channel indexes are also tabulated. Observe that
all the sets are domination contiguous. Lastly, the systematic
polar codeword, $x$, corresponding to each frozen-bit sequence is
also listed along with the minimum distance, $d_{\min}$, of the code. As
we explain below, the code structure and its corresponding
frozen-bit sequence and $d_{\min}$ will be used to further reduce
the number of possible cases from 20 to 9. For the sake of
brevity, we have used the notation $x_{abc}$ (typeset as xabc in the tables) to denote $x_a + x_b + x_c$.
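The minimum distances in Table I can be recomputed by enumerating the codewords of each block code. The sketch below uses the non-systematic encoder, which generates the same set of codewords as the systematic one, so the minimum distance is unaffected. The function names and the hex-string interface are assumptions made for this example.

```python
from itertools import product


def polar_encode(u):
    """x = u G_N over F_2, using the recursive split of the generator matrix."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    x1 = polar_encode(u[half:])
    x0 = [a ^ b for a, b in zip(polar_encode(u[:half]), x1)]
    return x0 + x1


def min_distance(frozen_hex, R=8):
    """Minimum Hamming weight over all non-zero codewords of the block code
    whose frozen-bit sequence f is given in hexadecimal (first component of
    f is the most-significant bit). Returns None for the all-frozen case."""
    value = int(frozen_hex, 16)
    f = [(value >> (R - 1 - i)) & 1 for i in range(R)]
    info = [i for i in range(R) if f[i] == 0]
    best = None
    for bits in product([0, 1], repeat=len(info)):
        if not any(bits):
            continue  # skip the all-zero codeword
        u = [0] * R
        for i, b in zip(info, bits):
            u[i] = b
        weight = sum(polar_encode(u))
        best = weight if best is None else min(best, weight)
    return best
```

For instance, `min_distance("E8")` evaluates to 4 while `min_distance("F0")`, `min_distance("CC")`, and `min_distance("AA")` evaluate to 2, which is the distinction used in Case 4 to single out $\tilde{A} = \{3, 5, 6, 7\}$.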
In the following, we discuss each individual case and
explain why a particular frozen-bit sequence is kept in each
case.
1) Case 0: This case corresponds to the rate-0 node introduced
in [5], and the optimal decoder assigns an all-zero vector
to the output.
2) Case 1: This is an (8, 1) repetition code, and the optimal
maximum-likelihood (ML) decoder will add the LLRs of all
the channel outputs and perform threshold detection on the
sum [18]. The same decoder is used in [2], where they have
outlined some low-latency decoding strategies for improving
the decoding speed.
3) Case 2: All three cases are (8, 2) repetition codes
and are conjugates of one another. For the SC decoder with
natural-order decoding, the first case is the max-frozen set.
Consequently, the other cases will not occur. The optimal decoder,
like that of Case 1, will add the LLRs of four outputs to estimate x7
and the other four LLRs to estimate x6.
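Cases 1 and 2 share the same structure: LLRs belonging to the same repeated bit are summed and the sum is thresholded. A compact sketch follows, assuming that a non-negative LLR maps to bit 0; the function name `decode_repetition` and its `period` parameter are hypothetical. For the max-frozen sequences, `period = 1` corresponds to f = FE and `period = 2` to f = FC.

```python
def decode_repetition(y, period):
    """Decode a block in which the codeword repeats with the given period.

    y      : list of channel LLRs for the block
    period : 1 for Case 1 (all positions carry x7),
             2 for Case 2 with f = FC (even positions carry x6, odd carry x7)
    Returns the hard-decision codeword estimate for the whole block.
    """
    # Sum the LLRs of all positions that carry the same code bit.
    sums = [sum(y[i::period]) for i in range(period)]
    bits = [0 if s >= 0 else 1 for s in sums]
    # Tile the decided bits back over the block.
    return bits * (len(y) // period)
```

No check-node operation is involved; the decoder needs only additions and sign decisions.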
4) Case 3: These codes are concatenated (8,4) repetition
and (4,3) single parity-check (SPC) codes and are conjugates
of one another. Since A˜ = {5, 6, 7} is the max-frozen set,
only the first case will occur in practice. The decoding
can be carried out by first adding the LLRs of the outputs
corresponding to the same bits. As such, we are left with
the LLRs of a (4,3) SPC code. The optimal ML decoder of
the SPC codes, Wagner decoder [19], makes hard-decision
estimates of xi’s and flips the least-reliable bit if the parity
check is not satisfied.
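The two steps for the max-frozen sequence f = F8 (combine repeated positions, then run the Wagner decoder) can be sketched as follows. The function names are hypothetical, a non-negative LLR is assumed to map to bit 0, and the pairing of position i with position i + 4 follows the output column of Table I for $\tilde{A} = \{5, 6, 7\}$.

```python
def wagner_decode(llrs):
    """Wagner decoding of an even-parity SPC code: take hard decisions and,
    if the parity check fails, flip the least-reliable bit."""
    bits = [0 if l >= 0 else 1 for l in llrs]
    if sum(bits) % 2 == 1:
        weakest = min(range(len(llrs)), key=lambda i: abs(llrs[i]))
        bits[weakest] ^= 1
    return bits


def decode_case3(y):
    """Decoder sketch for f = F8, where x = (x567, x5, x6, x7) repeated twice."""
    # Positions i and i + 4 carry the same code bit, so their LLRs add.
    combined = [y[i] + y[i + 4] for i in range(4)]
    # (x567, x5, x6, x7) has even parity, i.e. it is a (4,3) SPC codeword.
    half = wagner_decode(combined)
    return half + half
```

As in Cases 1 and 2, only additions, comparisons, and a single bit flip are required.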
5) Case 4: There are four candidate sets of good bit-channel indexes, $\tilde{A}$,
when k = 4. Amongst them, three correspond to (8, 4)
repetition codes. As such, their dmin = 2. In only one case,
A˜ = {3, 5, 6, 7}, the minimum distance turns out to be 4. Since
code performance heavily depends on dmin, only this case will
occur in practice. In fact, this is an extended Hamming code
[20] and is equivalent to the repetition-SPC code introduced
in [2]. Although the optimal ML decoder for such a code
can be implemented easily [18], a low-complexity decoder of
this code was mentioned in [2]. Furthermore, the bit-error rate
(BER) performance of the code is not considerably altered
by implementing the low-complexity decoder instead of the
optimal decoder [2]. For completeness, we briefly mention the
low-complexity decoder below.
First, observe that (x0 + x4, x1 + x5, x2 + x6, x3 + x7)
constitute a (4,1) repetition code, while (x4, x5, x6, x7) is a
(4,3) SPC code. The repetition code can easily be decoded by
adding the LLRs, resulting in x̂8, a hard-decision estimate of
x8 = x3+x7. Afterwards, additional LLRs for (x4, x5, x6, x7)
are trivially computed either by keeping or switching the sign
of the LLRs of (x0, x1, x2, x3) depending on the value of x̂8.
After adding the LLRs of (x4, x5, x6, x7), we are left with a
(4,3) SPC code, which can be decoded by the Wagner decoder.
Fig. 3. Tanner graph for the systematic polar code corresponding to $\tilde{A} = \{3, 4, 5, 6, 7\}$.
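The low-complexity decoder of Case 4 can be sketched in Python as below. This is one reading of the description above under stated assumptions, not a reference implementation: the LLR of each sum $x_i + x_{i+4}$ is obtained with the check-node operation, a non-negative LLR maps to bit 0, and the final step that recovers (x0, x1, x2, x3) as $x_{i+4} + \hat{x}_8$ is inferred from the code structure. All function names are hypothetical.

```python
import math


def check_node(a, b):
    """Check-node operation on two scalar LLRs."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    t = max(min(t, 1.0 - 1e-12), -1.0 + 1e-12)  # keep atanh finite
    return 2.0 * math.atanh(t)


def wagner_decode(llrs):
    """Hard decisions, then flip the least-reliable bit if parity is odd."""
    bits = [0 if l >= 0 else 1 for l in llrs]
    if sum(bits) % 2 == 1:
        weakest = min(range(len(llrs)), key=lambda i: abs(llrs[i]))
        bits[weakest] ^= 1
    return bits


def decode_case4(y):
    """Low-complexity decoder sketch for f = E8 (A = {3, 5, 6, 7})."""
    # Step 1: (x0+x4, x1+x5, x2+x6, x3+x7) is a (4,1) repetition code of x8.
    rep_llr = sum(check_node(y[i], y[i + 4]) for i in range(4))
    x8 = 0 if rep_llr >= 0 else 1
    # Step 2: extra LLRs for (x4..x7) from (x0..x3), sign set by x8.
    spc_llrs = [y[i + 4] + (1 - 2 * x8) * y[i] for i in range(4)]
    # Step 3: (x4, x5, x6, x7) is a (4,3) SPC codeword.
    upper = wagner_decode(spc_llrs)
    # Step 4 (inferred): x_i = x_{i+4} + x8 for i = 0..3.
    lower = [b ^ x8 for b in upper]
    return lower + upper
```

The four check-node operations in step 1 are independent of one another, so they can be evaluated in parallel.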
6) Case 5: Observe that all the codes are conjugates of one
another, and A˜ = {3, 4, 5, 6, 7} is the max-frozen set. Thus,
only the first case will occur in practice. By introducing an
additional node, x8 = x3 + x7, a cycle-free Tanner graph of
the code can be obtained as shown in Fig. 3. As such, a non-
iterative optimal maximum-a-posteriori (MAP) decoder can be
implemented [21].
Like Case 4, a low-complexity sub-optimal decoder can also
be implemented by first making a hard decision about x8.
Afterwards, depending on $\hat{x}_8$, the LLRs of (x0, x1, x2, x3) can be
added to or subtracted from those of (x4, x5, x6, x7). Hard decisions
of (x4, x5, x6, x7) can then be carried out, which along with x̂8
can be used to find estimates of (x0, x1, x2, x3). The decoding
latency can be reduced by implementing two parallel decoders
assuming x̂8 equals 0 or 1 and selecting the appropriate output
depending on the actual value of x̂8. Also, as verified by
the simulation results, the BER performance is not tangibly
degraded by implementing the sub-optimal decoder instead of
the optimal MAP decoder.
7) Case 6: All of the codes are conjugates of one another. But
only the first case will occur in practice as $\tilde{A} = \{2, 3, 4, 5, 6, 7\}$ is
the max-frozen set. Since this code is just two (4,3) SPC codes
put together, two Wagner decoders can be used to optimally
decode it.
8) Case 7: This is an (8,7) SPC code, and the Wagner
decoder can be used to optimally decode it. Note that this
code is equivalent to the one corresponding to the SPC node
mentioned in [2], which was introduced to increase the
decoding speed, especially for high-rate polar codes.
9) Case 8: This case is equivalent to the rate-1 node
introduced in [5], and a hard decision of the observed outputs
gives the decoded output.
Remark 1. For non-systematic codes, exactly the same de-
coders can be used to find a hard-decision estimate of x,
which can be used to decode the input, u. For example, for
f = (1, 1, 1, 1, 1, 1, 0, 0), x = (u6 + u7, u7, u6 + u7, u7, u6 +
u7, u7, u6+u7, u7), which is an (8,2) repetition code. Like in
Case 2, we can find hard-decision estimates of u6 + u7 and
u7, which can be used to find a hard-decision estimate of u6.
Remark 2. For permuted polar codes, similar conclusions
can be drawn as the set of good bit-channel indexes is
also domination contiguous for permuted (systematic/non-
systematic) polar codes. In particular, the max-frozen sets
for an individual case will be determined according to the
permuted decoding order. Also, the presented low-complexity
decoders can trivially be modified to decode the polar codes
corresponding to the max-frozen sets.
Remark 3. Implementing the proposed low-complexity de-
coders reduces the number of check-node operations signif-
icantly. In particular, the SC decoder uses 12 check-node
operations for decoding 8 bits for each of the afore-mentioned
cases. In our case, however, barring Case 4 and Case 5,
check-node operations are not used. Furthermore, the decoders of
Case 4 and Case 5 use only 4 check-node operations, which
are implemented in parallel. As such, compared with the SC
decoder, where 12 check-node operations are used and are
carried out sequentially, the proposed decoders require fewer
check-node operations and are faster as these operations can
be implemented in parallel.
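The figure of 12 check-node operations is consistent with a standard SC decoding tree, in which a node of length $L$ performs $L/2$ check-node operations before visiting its left child. As an illustrative count (added here, not taken from the cited works), level $l$ of a length-8 subtree contains $2^{l-1}$ nodes of length $8/2^{l-1}$, so that

$$\sum_{l=1}^{3} 2^{l-1} \cdot \frac{8}{2^{l}} = 4 + 4 + 4 = 12 = \left.\frac{R}{2} \log_2 R \,\right|_{R = 8}.$$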
B. Block Size 16 Encoders/Decoders
Having discussed all the possible cases for R = 8, we now
consider polar codes for R = 16. Similar to the R = 8 case,
we can significantly reduce the required number of parallel
encoders/decoders (from 168 to just 21). Table II lists these
21 cases along with the dmin of the corresponding codes. Here,
we have used hexadecimal indexes for the sake of brevity.
The appendix provides a detailed description of the proposed
decoders for the cases tabulated in Table II.
It is worth noting that at least 17 encoders/decoders are
required for encoding/decoding flexible-rate polar codes. So
only four extra encoders/decoders are needed to ensure that
the polar codes designed for any rate, length and channel can
be encoded/decoded. The following theorems assert that the
encoders/decoders for f = FFC0, f = FF80, f = FCC0, and
f = C0C0 are not required when polar codes are designed for
a binary-erasure channel (BEC) or by the Huawei formula [7].
Theorem 3. If polar codes are constructed for a BEC,
encoders/decoders for f = FFC0, f = FF80, f = FCC0,
and f = C0C0 are not required.
Proof: This assertion can be proved by noting that,
regardless of the value of the erasure probability, the
aforementioned four cases do not occur when N = 16. The result
immediately follows by noting that the frozen-bit sequence for
each 16-bit block is generated for a BEC [1].
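The N = 16 claim can be spot-checked numerically. The sketch below assumes the standard BEC erasure-probability recursion of [1] (Z → 2Z − Z² for the degraded bit-channel, Z → Z² for the upgraded one) and the hexadecimal convention of Table II, in which bit-channel 0 is the most-significant bit of f and a 1 marks a frozen bit. The function names and the grid of erasure probabilities are illustrative choices, and a grid check is evidence rather than a proof.

```python
def bec_erasure_probs(eps, n=4):
    """Erasure probabilities of the 2**n polarized bit-channels of a BEC(eps)."""
    z = [eps]
    for _ in range(n):
        # Index 2i is the degraded channel, index 2i+1 the upgraded one.
        z = [v for zi in z for v in (2 * zi - zi * zi, zi * zi)]
    return z


def frozen_mask(info, length=16):
    """Frozen-bit sequence as an integer; bit-channel 0 is the MSB, 1 = frozen."""
    mask = 0
    for j in range(length):
        if j not in info:
            mask |= 1 << (length - 1 - j)
    return mask


def bec_masks(eps):
    """Frozen-bit sequences for K = 0, 1, ..., 16 information bits."""
    z = bec_erasure_probs(eps)
    # Most reliable first; rounding ties are broken in favour of the larger index.
    order = sorted(range(16), key=lambda j: (z[j], -j))
    return [frozen_mask(set(order[:k])) for k in range(17)]


EXTRA = {0xFFC0, 0xFF80, 0xFCC0, 0xC0C0}

seen = set()
for step in range(1, 100):  # eps = 0.01, 0.02, ..., 0.99
    seen.update(bec_masks(step / 100))

print(sorted(f"{m:04X}" for m in seen & EXTRA))
```

An empty list means that none of the four extra frozen-bit sequences appeared for any erasure probability on the grid.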
Recently, at the 3GPP RAN1 #87 meeting, an agreement was
reached to use variable-rate polar codes for the uplink control
channel [7], [9]. Since polar-code design is channel dependent,
and the location of frozen bits varies with the channel
conditions, Huawei presented a channel-independent reliability
metric for constructing polar codes [7]. In particular, each
polarized bit-channel, $W_N^{(j)}$, is assigned a reliability metric,
$Q_j$, computed as
$$Q_j = \sum_{k=1}^{n} j_k \, 2^{\frac{k-1}{4}}, \tag{6}$$
TABLE II
FROZEN-BIT SEQUENCES AND THE CORRESPONDING OUTPUTS FOR R = 16.

| k | f (Hex) | Output, x | dmin |
|---|---|---|---|
| 0 | FFFF | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 | - |
| 1 | FFFE | xF, xF, xF, xF, xF, xF, xF, xF, xF, xF, xF, xF, xF, xF, xF, xF | 16 |
| 2 | FFFC | xE, xF, xE, xF, xE, xF, xE, xF, xE, xF, xE, xF, xE, xF, xE, xF | 8 |
| 3 | FFF8 | xDEF, xD, xE, xF, xDEF, xD, xE, xF, xDEF, xD, xE, xF, xDEF, xD, xE, xF | 8 |
| 4 | FFE8 | xBDE, xBDF, xBEF, xB, xDEF, xD, xE, xF, xBDE, xBDF, xBEF, xB, xDEF, xD, xE, xF | 8 |
| 5 | FEE8 | x7BDEF, x7BD, x7BE, x7BF, x7DE, x7DF, x7EF, x7, xBDE, xBDF, xBEF, xB, xDEF, xD, xE, xF | 8 |
| 6 | FFC0 | xACE, xBDF, xA, xB, xC, xD, xE, xF, xACE, xBDF, xA, xB, xC, xD, xE, xF | 4 |
| 6 | FEE0 | x7BC, x7BD, x7BE, x7BF, x7CF, x7DF, x7EF, x7, xBCF, xBDF, xBEF, xB, xC, xD, xE, xF | 4 |
| 7 | FF80 | x9ABCDEF, x9, xA, xB, xC, xD, xE, xF, x9ABCDEF, x9, xA, xB, xC, xD, xE, xF | 4 |
| 7 | FEC0 | x7ACEF, x7BD, x7AF, x7BF, x7CF, x7DF, x7EF, x7, xACE, xBDF, xA, xB, xC, xD, xE, xF | 4 |
| 8 | FE80 | x79ABCDF, x79F, x7AF, x7BF, x7CF, x7DF, x7EF, x7, x9ABCDEF, x9, xA, xB, xC, xD, xE, xF | 4 |
| 8 | FCC0 | x6AC, x7BD, x6AE, x7BF, x6CE, x7DF, x6, x7, xACE, xBDF, xA, xB, xC, xD, xE, xF | 4 |
| 9 | FC80 | x69ABCDF, x79F, x6AE, x7BF, x6CE, x7DF, x6, x7, x9ABCDEF, x9, xA, xB, xC, xD, xE, xF | 4 |
| 10 | F880 | x5679ABC, x59D, x6AE, x7BF, x567CDEF, x5, x6, x7, x9ABCDEF, x9, xA, xB, xC, xD, xE, xF | 4 |
| 11 | E880 | x3569ACF, x3579BDF, x367ABEF, x3, x567CDEF, x5, x6, x7, x9ABCDEF, x9, xA, xB, xC, xD, xE, xF | 4 |
| 12 | E800 | x3568BDE, x3579BDF, x367ABEF, x3, x567CDEF, x5, x6, x7, x8, x9, xA, xB, xC, xD, xE, xF | 2 |
| 12 | C0C0 | x246, x357, x2, x3, x4, x5, x6, x7, xACE, xBDF, xA, xB, xC, xD, xE, xF | 2 |
| 13 | E000 | x3478BCF, x3579BDF, x367ABEF, x3, x4, x5, x6, x7, x8, x9, xA, xB, xC, xD, xE, xF | 2 |
| 14 | C000 | x2468ACE, x3579BDF, x2, x3, x4, x5, x6, x7, x8, x9, xA, xB, xC, xD, xE, xF | 2 |
| 15 | 8000 | x123456789ABCDEF, x1, x2, x3, x4, x5, x6, x7, x8, x9, xA, xB, xC, xD, xE, xF | 2 |
| 16 | 0000 | x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, xA, xB, xC, xD, xE, xF | 1 |
where $j_k$ is the $k$th least-significant bit in the $n$-bit binary
representation of $j$, i.e., $\langle j \rangle_2 = j_n j_{n-1} \cdots j_1$.
The following theorem asserts that the extra cases are not
required when polar codes are constructed by the Huawei formula.
Theorem 4. If polar codes are constructed by using the Huawei
formula, the four extra cases (f = FFC0, f = FF80, f =
FCC0, and f = C0C0) do not occur at all.
Proof: Observe that
$$Q_j = \sum_{k=1}^{n} j_k \, 2^{\frac{k-1}{4}} = \underbrace{\sum_{k=1}^{4} j_k \, 2^{\frac{k-1}{4}}}_{T_1} + \underbrace{\sum_{k=5}^{n} j_k \, 2^{\frac{k-1}{4}}}_{T_2}. \tag{7}$$
Next, we partition {{N}} into N/16 consecutive groups, each
containing 16 numbers. By denoting them with Gi, where
i = 0, 1, · · · , N/16 − 1, we have G0 = {{16}} and Gi =
{16i + g : g ∈ G0}. Further, observe that the metrics Qj of all
the elements in Gi have exactly the same value of T2. As such,
inclusion of j ∈ Gi in $\tilde{A}_i$ depends solely on T1, or the
last four bits of 〈j〉2. Letting $\tilde{j}$ denote the decimal number
corresponding to the last 4 bits of j, we observe that the
value of T1 decreases monotonically when $\tilde{j}$ takes on values
from $Q_0^{15}$ = (15, 14, 13, 11, 7, 12, 10, 9, 6, 5, 3, 8, 4, 2, 1, 0) left
to right. As such, for Case k (k = 0, 1, · · · , 16), the first k
bit-channel indexes in $Q_0^{15}$ are selected for information transfer,
and the rest are frozen to zero. Consequently, only 17 unique
cases will occur, and the aforementioned four extra cases will
not occur.
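The ordering used in the proof can be reproduced directly from the T1 term of (7). The following is a minimal sketch, assuming the hexadecimal convention of Table II (bit-channel 0 is the most-significant bit of f, and a 1 marks a frozen bit); the helper names are illustrative and not taken from [7].

```python
def huawei_t1(j, n=4):
    """T1 of Eq. (7): sum over k of j_k * 2**((k-1)/4), j_k being the kth LSB of j."""
    return sum(((j >> (k - 1)) & 1) * 2 ** ((k - 1) / 4) for k in range(1, n + 1))


def frozen_mask(info, length=16):
    """Frozen-bit sequence as an integer; bit-channel 0 is the MSB, 1 = frozen."""
    mask = 0
    for j in range(length):
        if j not in info:
            mask |= 1 << (length - 1 - j)
    return mask


# Bit-channel indexes sorted from most to least reliable.
order = sorted(range(16), key=huawei_t1, reverse=True)

# Case k keeps the k most reliable indexes for information and freezes the rest.
masks = [f"{frozen_mask(set(order[:k])):04X}" for k in range(17)]

print(order)
print(masks)
```

The 17 masks produced this way coincide with one entry of each row k of Table II, and FFC0, FF80, FCC0 and C0C0 are not among them.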
[Figure 4: BER (logarithmic axis, 10^-6 to 10^0) versus Ec/N0 (0 to 5). Curves: SC Decoder (Non-Systematic), Proposed Low-Complexity (Non-Systematic), Proposed Optimal (Non-Systematic), SC Decoder (Systematic), Proposed Low-Complexity (Systematic), Proposed Optimal (Systematic).]
Fig. 4. BER performance of systematic and non-systematic rate-1/2 polar
codes of length N = 256 in additive-white-Gaussian-noise channel with the
signal-to-noise ratio of Ec/N0 when decoded by conventional SC decoder
and the proposed decoders for R = 8.
IV. RESULTS
In this section, we compare the proposed decoding strategy
with the SC decoder in terms of the bit-error-rate and decoding
latency performances.
A. BER Performance
Fig. 4 compares the BER performance of the proposed 8-bit
parallel decoders with that of the SC decoder. Both optimal
and low-complexity sub-optimal decoders are implemented for
the proposed decoders. Here, the polar codes are constructed
for the binary-erasure channel with the erasure probability of
$e^{-1} \approx 0.37$ [22], [23].
Some interesting observations can be made by analyzing
the figure. First, the proposed decoders do not deteriorate the
performance of the SC decoder. Second, the performance gap
between the optimal and the sub-optimal decoders is negligibly
small, which implies that the low-complexity decoders can be
used instead of the optimal ones. Last, the proposed schemes
can be used for both systematic and non-systematic polar
codes.
B. Decoding Latency
One way to approximate the decoding latency of different
polar decoders is as follows. We presume that bit operations
and addition/subtraction of real numbers can be carried out in
one clock cycle, whereas a check-node operation takes Tc and
finding the minimum of a list takes Tm clock cycles. Note that
finding a minimum requires significantly fewer computations
than performing a check-node operation [24].
For a block of R bits, the SC decoder performs R/2 check-
node operations in each of log2(R) stages. But all the check-
node operations can be performed in parallel in the first stage,
whereas the last stage requires all check-node operations to
be performed in a sequential manner. In general, the number
of parallel check-node operations performed in the mth stage
is R/2m, where m = 1, 2, · · · , log2(R). Consequently, a
decoding latency of Tc(R/2 + R/4 + · · · + 1) = (R − 1)Tc
cycles is incurred in performing check-node operations for the
SC decoder. Similarly, the number of binary and real additions
involved in decoding R bits can be shown to be 1 and R − 1,
respectively. Therefore, an SC decoder will take R+(R−1)Tc
clock cycles to decode a block of R bits. Note that this
decoding latency is fixed regardless of the frozen-bit sequence.
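The stage-by-stage count above can be written out as a short calculation. This is a minimal sketch of the latency model stated in the text, with illustrative function names; Tc is left as a free parameter because the text does not fix its value.

```python
def sc_check_node_count(R, Tc):
    """Return (total check-node operations, check-node latency in cycles)."""
    stages = R.bit_length() - 1           # log2(R) for a power-of-two R
    total_ops = (R // 2) * stages         # R/2 operations in each stage
    sequential_steps = 0
    for m in range(1, stages + 1):
        parallel = R // 2 ** m            # operations that can run together in stage m
        sequential_steps += (R // 2) // parallel
    return total_ops, sequential_steps * Tc


def sc_latency(R, Tc):
    """Total SC latency: R cycles of additions plus (R - 1) check-node steps."""
    return R + (R - 1) * Tc


print(sc_check_node_count(8, 1))   # 12 operations, 7 sequential steps
print(sc_latency(8, 1))
```

For R = 8 the count gives the 12 check-node operations quoted in Remark 3, of which R − 1 = 7 steps cannot be overlapped.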
For the proposed schemes, the decoding latency varies
depending on the frozen-bit sequence. For example, for R = 8,
Case 0 can be executed instantaneously, whereas Case 4 incurs
the highest decoding latency of 1 + max {Tc, Tm} cycles,
which is calculated as follows. The check-node operations are
performed in parallel, so they take only Tc cycles for execution.
Meanwhile, two outputs can be generated corresponding to
x8 = 0 and 1 by two Wagner decoders (Tm+1 clock cycles),
and, depending on the value of x8 (computation time is 1
cycle), one of the outputs is selected. Therefore, the decoding
latency for Case 4 is max {1 + Tc, 1 + Tm}.
Observe that even the highest decoding latency of the pro-
posed decoders is much smaller than that of the SC decoder.
Hence, our proposed scheme will significantly improve the
decoding speed of variable-rate polar codes.
Table III shows the decoding latency, L, of the proposed
low-complexity decoders for blocks of length R = 8 and R =
16 bits. Observe that the proposed decoders have significantly
lower decoding latency than the SC decoder, which has
the decoding latency of R+(R−1)Tc cycles for all the cases.
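To make the comparison concrete, the R = 8 entries of Table III can be evaluated next to the SC latency. The values Tc = 4 and Tm = 2 below are purely hypothetical, chosen only so that finding a minimum is cheaper than a check-node operation; they are not measurements from this work.

```python
def proposed_latencies_r8(Tc, Tm):
    """R = 8 column of Table III, indexed by case number."""
    return {
        0: 0,
        1: 1,
        2: 1,
        3: 1 + Tm,
        4: 1 + max(Tc, Tm),
        5: 1 + Tc,
        6: Tm,
        7: Tm,
        8: 0,
    }


def sc_latency(R, Tc):
    """SC decoder latency, identical for every frozen-bit sequence."""
    return R + (R - 1) * Tc


Tc, Tm = 4, 2  # hypothetical cycle counts
latencies = proposed_latencies_r8(Tc, Tm)
worst = max(latencies.values())

print(worst, sc_latency(8, Tc))  # 5 36
```

Under these assumed cycle counts, even the slowest proposed case needs 5 cycles against 36 for the SC decoder.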
V. CONCLUSION
In this work, we presented fast 8-bit and 16-bit parallel
decoders that can reduce the depth of the decoding
tree of variable-rate and variable-length polar codes. They can
reduce both the decoding latency and hardware complexity
without deteriorating the bit-error-rate performance of the
successive-cancellation decoder.
APPENDIX
BLOCK SIZE 16 DECODERS
This appendix details the proposed low-complexity parallel
decoders for the cases tabulated in Table II.
1) Case 0: The optimal decoder assigns an all-zero vector
to the output.
2) Case 1: This is a repetition code, and the optimal
decoder makes a hard decision on the sum of the LLRs of
the received bits.
3) Case 2: This is a (16,2) repetition code, and the optimal
ML estimates of xE and xF can be found by making hard-
decisions on the LLR sums of even-indexed and odd-indexed
bits, respectively.
4) Case 3: This is a (4,3) SPC code concatenated with a
(16,4) repetition code. The optimal decoder will first add the
LLRs of the received bits corresponding to xDEF, xD, xE, and xF
before finding their hard estimates with the Wagner decoder.
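The Case 3 decoder can be sketched as follows. The sketch assumes that a non-negative LLR favours bit 0 and that the received LLRs are indexed y0, ..., yF in the output order of Table II; the function names are illustrative. The Wagner rule used here is the usual one: take hard decisions and, if the parity check fails, flip the least reliable bit.

```python
def wagner_decode(llrs):
    """ML decoding of a single-parity-check code by the Wagner rule."""
    bits = [0 if llr >= 0 else 1 for llr in llrs]
    if sum(bits) % 2:  # parity violated: flip the least reliable position
        weakest = min(range(len(llrs)), key=lambda i: abs(llrs[i]))
        bits[weakest] ^= 1
    return bits


def decode_case3(y):
    """f = FFF8: (4,3) SPC codeword (xDEF, xD, xE, xF) repeated four times."""
    # Add the LLRs of the four copies of each SPC bit.
    sums = [sum(y[i::4]) for i in range(4)]
    block = wagner_decode(sums)
    return block * 4


# Noisy copy of the codeword (0, 1, 1, 0) repeated four times.
y = [1.0, -1.0, -1.0, 1.0] * 4
y[2], y[6], y[10] = 0.5, 0.5, 0.5   # three corrupted copies of the third bit
print(decode_case3(y))
```

In this example the summed LLR of the third bit has the wrong sign, the parity check fails, and the Wagner rule flips that bit back because it is the least reliable of the four.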
5) Case 4: This is an (8,4) extended Hamming code con-
catenated with a (16,8) repetition code. Decoders mentioned
in Case 4 for R = 8 can be used after adding the LLRs of the
first half to the second’s.
6) Case 5: Although an exhaustive-search based ML de-
coder can be implemented, we can reduce the decoding
complexity by introducing a new variable, z = x7 + xF. By
noting that x0 + x8 = x1 + x9 = · · · = x7 + xF = z, we
first find ẑ, a hard-decision estimate of z. Specifically, we add
y0 ⊞ y8, y1 ⊞ y9, · · · , y7 ⊞ yF, where ⊞ denotes the check-node
operation, to get an LLR for z and make a
hard decision on the LLR to compute ẑ.
Afterwards, the decoder computes y0±y8, y1±y9, · · · , y7±
yF, where addition is performed when ẑ = 0, and subtraction
otherwise. These values are then input to one of the decoders
of Case 4 for R = 8 to get estimates of x8, x9, · · · , xF. Lastly
ẑ is added to x̂8, x̂9, · · · , x̂F to compute x̂0, x̂1, · · · , x̂7.
Implementing two parallel decoders corresponding to ẑ
equalling 0 and 1 and choosing an appropriate output after
computing ẑ can reduce the decoding latency.
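The Case 5 procedure can be sketched end to end. Several assumptions are made here and are not taken from this work: the check-node operation is approximated by the min-sum rule, a non-negative LLR favours bit 0, and the Case 4 (R = 8) decoder is replaced by a brute-force ML search over the 16 codewords of the lower half, whose structure is read off the FEE8 row of Table II. All function names are illustrative.

```python
from itertools import product


def boxplus(a, b):
    """Min-sum approximation of the check-node operation (an assumption)."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))


def lower_half(b, d, e, f):
    """x8..xF for information bits xB, xD, xE, xF (FEE8 row of Table II)."""
    return (b ^ d ^ e, b ^ d ^ f, b ^ e ^ f, b, d ^ e ^ f, d, e, f)


CODEWORDS = [lower_half(*bits) for bits in product((0, 1), repeat=4)]


def ml_decode(llrs):
    """Brute-force ML stand-in for the Case 4 (R = 8) decoder."""
    return max(CODEWORDS, key=lambda c: sum(l * (1 - 2 * x) for l, x in zip(llrs, c)))


def encode_case5(x7, b, d, e, f):
    """Codeword x0..xF; the upper half is the lower half plus z = x7 + xF."""
    low = lower_half(b, d, e, f)
    z = x7 ^ f
    return [x ^ z for x in low] + list(low)


def decode_case5(y):
    # Step 1: hard decision on z from the sum of the eight check-node outputs.
    z_hat = 0 if sum(boxplus(y[i], y[i + 8]) for i in range(8)) >= 0 else 1
    # Step 2: combine y_i with y_{i+8}, adding when z_hat = 0, subtracting otherwise.
    s = 1 - 2 * z_hat
    combined = [y[i + 8] + s * y[i] for i in range(8)]
    # Step 3: decode x8..xF, then recover x0..x7 by adding z_hat.
    low = ml_decode(combined)
    return [x ^ z_hat for x in low] + list(low)


cw = encode_case5(1, 0, 1, 1, 0)
y = [2.0 * (1 - 2 * x) for x in cw]
y[0] = -0.5 * y[0]  # one unreliable, wrong-signed LLR
print(decode_case5(y) == cw)
```

The two parallel decoders mentioned above correspond to evaluating Step 2 and Step 3 for both values of s before ẑ is known and selecting one result afterwards.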
7) Case 6: For the first case, f = FFC0, an optimal
decoder can be implemented by noting that the codeword
comprises two concatenated (4,3) SPC and (8,4) repetition
codes. Two separate Wagner decoders can be used to find hard
estimates of the transmitted bits after adding the LLRs of the
repeated bits.
For the second case, f = FEE0, the decoder presented in
Case 5 (R = 16) can be used to decode the received LLRs
with the exception that the decoder of Case 4 (R = 8) is
replaced with the low-complexity decoder of Case 5 (R = 8).
TABLE III
DECODING LATENCIES OF THE PROPOSED LOW-COMPLEXITY DECODERS FOR R = 8 AND R = 16.

| R = 8 | Latency, L | R = 16 | Latency, L | R = 16 | Latency, L |
|---|---|---|---|---|---|
| Case 0 | 0 | Case 0 | 0 | Case 8 | 1 + max {Tc, Tm} |
| Case 1 | 1 | Case 1 | 1 | Case 9 | 1 + max {Tc, Tm} |
| Case 2 | 1 | Case 2 | 1 | Case 10 | 3 + Tc + Tm |
| Case 3 | 1 + Tm | Case 3 | 1 + Tm | Case 11 | 3 + Tc + Tm + max {Tc, Tm} |
| Case 4 | 1 + max {Tc, Tm} | Case 4 | 2 + max {Tc, Tm} | Case 12 | 3 + Tc + max {Tc, Tm} |
| Case 5 | 1 + Tc | Case 5 | 3 + max {Tc, Tm} | Case 12 | Tm |
| Case 6 | Tm | Case 6 | 1 + Tm | Case 13 | max {1 + Tc, Tm} |
| Case 7 | Tm | Case 6 | 3 + Tc | Case 14 | Tm |
| Case 8 | 0 | Case 7 | 1 + Tm | Case 15 | Tm |
| | | Case 7 | 2 + Tm | Case 16 | 0 |
| | | Case 8 | 2 + Tm | | |
8) Case 7: The first case, f = FF80, corresponds to a
concatenated (8,7) SPC and (16,8) repetition code. Therefore,
the code can be decoded optimally by adding the LLRs
of x0, x1, · · · , x7 to that of x8, x9, · · · , xF and finding hard
estimates by the Wagner decoder.
For the second case, f = FEC0, the decoder presented in
Case 5 (R = 16) can be used to decode the received LLRs
with the exception that the low-complexity decoder of Case 6
(R = 8) replaces the decoder of Case 4 (R = 8).
9) Case 8: For the first case, f = FE80, the decoder
presented in Case 5 (R = 16) can be used to decode the
received LLRs with the exception that the Wagner decoder is
used instead of the decoder of Case 4 (R = 8).
For the second case, f = FCC0, the even-indexed and
the odd-indexed bits constitute two separate (8,4) extended
Hamming codes. Therefore, the decoders for Case 4 (R = 8)
can be used to decode the received code vector.
10) Case 9: A low-complexity decoder can be implemented
by defining z0 = x6 + xE and z1 = x7 + xF. Observe that
x0 + x8 = x2 + xA = x4 + xC = x6 + xE = z0, and x1 +
x9 = x3 + xB = x5 + xD = x7 + xF = z1. The proposed
decoder makes hard decisions on y0 ⊞ y8 + · · · + y6 ⊞ yE and
y1 ⊞ y9 + · · · + y7 ⊞ yF, where ⊞ denotes the check-node
operation, and assigns them to ẑ0 and ẑ1, the hard estimates
of z0 and z1, respectively. Afterwards,
additional LLRs for x8, x9, · · · , xF are computed by using ẑ0
and ẑ1. For example, the additional LLR of x8 is y0 when ẑ0
is 0 and −y0 otherwise. After adding the additional LLRs to
y8, y9, · · · , yF, the Wagner decoder is used to find hard estimates
of x8, x9, · · · , xF, which along with ẑ0 and ẑ1 are used to
estimate x0, x1, · · · , x7.
The decoding latency can be reduced by implementing four
Wagner decoders (corresponding to four possible values of
(ẑ0, ẑ1)) and using the computed value of (ẑ0, ẑ1) to select
the output of corresponding Wagner decoder.
11) Case 10: A low-complexity decoder can be implemented
by defining four variables: z0 = x0 + x8 = x4 + xC,
z1 = x1 + x9 = x5 + xD, z2 = x2 + xA = x6 + xE, and
z3 = x3 + xB = x7 + xF. The decoder will first compute
ẑi’s, hard estimates of zi’s, where i = 0, 1, 2, 3, from the
LLRs obtained by adding LLRs of the output of check-node
operations. For example, the LLR of z0 is computed by adding
y0 ⊞ y8 and y4 ⊞ yC, where yi denotes the LLR of xi and ⊞
denotes the check-node operation. Depending
on the values of the ẑi’s, additional LLRs for x8, x9, · · · , xF are
obtained from y0, y1, · · · , y7. For example, the additional LLR
for x8 is y0 when ẑ0 = 0 and is −y0 when ẑ0 = 1.
After adding the additional LLRs to the received LLRs, the
decoder finds hard estimates of x8, x9, · · · , xF using Wagner
decoder. Finally, these hard estimates are used along with ẑi’s
to estimate x0, x1, · · · , x7.
12) Case 11: A low-complexity decoder can be imple-
mented by observing that z0, z1, · · · , z7 constitute an (8,4)
extended Hamming code, where z0 = x0 + x8, z1 = x1 +
x9, · · · , z7 = x7 + xF. As such, the decoders of Case 4
for R = 8 can be used to find ẑi’s, estimates of zi’s
for i = 0, 1, · · · , 7. Then, depending on the values of ẑi,
additional LLRs for x8, x9, · · · , xF are obtained from yi’s,
where i = 0, 1, · · · , 7. For example, the additional LLR
for x8 is y0 if ẑ0 = 0 and −y0 otherwise. After adding
the LLRs, Wagner decoder is used to compute estimates of
x8, x9, · · · , xF. The decoded bits along with ẑ0, ẑ1, · · · , ẑ7 are
then used to compute estimates of x0, x1, · · · , x7.
13) Case 12: For the first case, f = E800, the decoder
of Case 11 can be used to decode the received LLRs, with
the exception that the Wagner decoder is not used at all.
Rather, hard decisions are made on the updated LLRs of
x8, x9, · · · , xF.
In the second case, f = C0C0, the codeword consists
of four separate (4,3) SPC codes, which can be individually
decoded by the Wagner rule.
14) Case 13: Introducing a new variable, z = x3 + x7 +
xB +xF, results in a cycle-free Tanner graph as shown in Fig.
5. As such, a non-iterative optimal MAP decoder can easily
be implemented for this case.
A low-complexity decoder can also be constructed for this
code. For example, making a hard decision on z results in four
separate SPC codes, which can be decoded by the Wagner rule.
15) Case 14: This code consists of two (8,7) SPC codes,
which can be optimally decoded by two Wagner decoders.
16) Case 15: This is a (16,15) SPC code, and Wagner
decoder can be used to decode the received LLRs optimally.
17) Case 16: The optimal decoder will make hard decisions
on the LLRs of the received bits.
[Figure 5: Tanner graph with variable nodes x0 through xF and the auxiliary variable node z.]
Fig. 5. Tanner graph for the systematic-polar code corresponding to f =
E000.
REFERENCES
[1] E. Arıkan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
[2] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross, “Fast polar
decoders: Algorithm and implementation,” IEEE J. Sel. Areas Commun.,
vol. 32, no. 5, pp. 946–957, May 2014.
[3] G. Sarkis, I. Tal, P. Giard, A. Vardy, C. Thibeault, and W. J. Gross,
“Flexible and low-complexity encoding and decoding of systematic polar
codes,” IEEE Trans. Commun., vol. 64, no. 7, pp. 2732–2745, Jul. 2016.
[4] P. Giard, G. Sarkis, C. Thibeault, and W. J. Gross, “237 Gbit/s unrolled
hardware polar decoder,” Electron. Lett., vol. 51, no. 10, pp. 762–763,
May 2015.
[5] A. Alamdar-Yazdi and F. R. Kschischang, “A simplified successive-
cancellation decoder for polar codes,” IEEE Commun. Lett., vol. 15,
no. 12, pp. 1378–1380, Dec. 2011.
[6] K. Niu, K. Chen, J. Lin, and Q. T. Zhang, “Polar codes: Primary concepts
and practical decoding algorithms,” IEEE Commun. Mag., vol. 52, no. 7,
pp. 192–203, Jul. 2014.
[7] Huawei and HiSilicon, “Details of the polar code design,” 3GPP TSG
RAN WG1 Meeting#87, Reno, USA, Tech. Rep. R1-1611254, Nov.
2016.
[8] Qualcomm Incorporated, “LDPC rate and compatible design overview,”
3GPP TSG-RAN WG1 #86bis, Lisbon, Portugal, Tech. Rep. R1-
1610137, Oct. 2016.
[9] Ericsson, “Performance study of polar code candidates,” 3GPP TSG-
RAN WG1 #88, Athens, Greece, Tech. Rep. R1-1703538, Feb. 2017.
[10] B. Yuan and K. K. Parhi, “Low-latency successive-cancellation polar
decoder architectures using 2-bit decoding,” IEEE Trans. Circuits Syst.
I, vol. 61, no. 4, pp. 1241–1254, Apr. 2014.
[11] ——, “Low-latency successive-cancellation list decoders for polar codes
with multibit decision,” IEEE Trans. VLSI Syst., vol. 23, no. 10, pp.
2268–2280, Oct. 2015.
[12] ——, “LLR-based successive-cancellation list decoder for polar codes
with multibit decision,” IEEE Trans. Circuits Syst. II, vol. 64, no. 1, pp.
21–25, Jan. 2017.
[13] H. Yoo and I. C. Park, “Partially parallel encoder architecture for long
polar codes,” IEEE Trans. Circuits Syst. II, vol. 62, no. 3, pp. 306–310,
Mar. 2015.
[14] E. Arıkan, “Systematic polar coding,” IEEE Commun. Lett., vol. 15,
no. 8, pp. 860–862, Aug. 2011.
[15] N. Hussami, S. B. Korada, and R. Urbanke, “Performance of polar codes
for channel and source coding,” in IEEE Int. Symp. Inf. Theory, Jun.
2009, pp. 1488–1492.
[16] H. Vangala, E. Viterbo, and Y. Hong, “Permuted successive cancellation
decoder for polar codes,” in Int. Symp. Inf. Theory Applicat., Oct. 2014,
pp. 438–442.
[17] J. Guo, “Polar codes for reliable transmission: Theoretical analysis
and applications,” Ph.D. dissertation, University of Cambridge,
Jun. 2015. [Online]. Available: http://itc.upf.edu/system/files/biblio-pdf/
Thesis JingGuo.pdf
[18] J. Snyders and Y. Be’ery, “Maximum likelihood soft decoding of binary
block codes and decoders for the Golay codes,” IEEE Trans. Inf. Theory,
vol. 35, no. 5, pp. 963–975, Sep. 1989.
[19] R. Silverman and M. Balser, “Coding for constant-data-rate systems,”
IRE Trans. Prof. Group Inform. Theory, vol. 4, no. 4, pp. 50–63, Sep.
1954.
[20] S. B. Wicker, Error Control Systems for Digital Communication and
Storage. Englewood Cliffs: Prentice Hall, 1995, vol. 1.
[21] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, “Factor graphs and
the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp.
498–519, Feb. 2001.
[22] E. Arıkan, “A performance comparison of polar codes and Reed-Muller
codes,” IEEE Commun. Lett., vol. 12, no. 6, pp. 447–449, Jun. 2008.
[23] H. Vangala, E. Viterbo, and Y. Hong, “A comparative study of polar code
constructions for the AWGN channel,” arXiv preprint arXiv:1501.02473,
2015. [Online]. Available: http://arxiv.org/abs/1501.02473
[24] M. P. C. Fossorier, M. Mihaljevic, and H. Imai, “Reduced complexity
iterative decoding of low-density parity check codes based on belief
propagation,” IEEE Trans. Commun., vol. 47, no. 5, pp. 673–680, May
1999.
