Polar Codes with Memory by Zhou, Wenyue et al.
arXiv:1907.00527v1 [cs.IT] 1 Jul 2019
Polar Codes with Memory
Wenyue Zhou, Student Member, IEEE, Qiang Liu, Student Member, IEEE,
Yifei Shen, Student Member, IEEE, Xiaofeng Zhou, Student Member, IEEE,
Chuan Zhang, Member, IEEE, Yaohua Xu, Member, IEEE and Liping Li, Member, IEEE.
Abstract—Polar codes with memory (PCM) are proposed in this paper: a pair of consecutive code blocks shares a controlled number of mutual information bits. The shared mutual information bits of the successfully decoded block can help the failed block to recover. The underlying polar codes can employ any decoding scheme, such as successive cancellation (SC) decoding (PCM-SC), belief propagation (BP) decoding (PCM-BP), and successive cancellation list (SCL) decoding (PCM-SCL). The analysis shows that the packet error rate (PER) of PCM decreases to the order of the PER squared while maintaining the same complexity as the underlying polar codes. Simulation results indicate that for PCM-SC, the PER is comparable to (within 0.3 dB of) that of stand-alone SCL decoding with two lists for the block length N = 256. The PER of PCM-SCL with L lists can match that of stand-alone SCL decoding with 2L lists. Two hardware decoders for PCM are also implemented: the in-serial (IS) decoder and the low-latency interleaved (LLI) decoder. For N = 256, synthesis results show that in the worst case, the latency of the PCM LLI decoder is only 16.1% of that of the adaptive SCL decoder with L = 2, while the throughput is 13 times higher.
Index Terms—polar codes, successive cancellation decoding, mutual information bits, interleaved decoder, polar codes with memory.
I. INTRODUCTION
Polar codes, invented by Arıkan [1], have been proven to achieve the capacity of symmetric binary-input discrete memoryless channels (B-DMCs) with low-complexity encoding and successive cancellation (SC) decoding. Nevertheless, owing to insufficient polarization, the error-correcting performance of moderate-length polar codes under SC decoding is unsatisfactory [2], [3]. To acquire better finite-length performance, successive cancellation list (SCL) decoding was proposed in [3]–[5], and its error-correcting performance is comparable to that of low-density parity-check (LDPC) codes. Belief propagation (BP) [2], [6], [7] is an alternative decoding algorithm that operates over the factor graph of polar codes. It performs better than SC decoding and supports parallel decoding. However, the bit error rate (BER) performance of polar codes under BP decoding is still inferior to that under SCL decoding, as shown in this paper.

This work was supported in part by the National Natural Science Foundation of China through grant 61501002, in part by the Natural Science Project of Ministry of Education of Anhui through grant KJ2015A102, and in part by the Talents Recruitment Program of Anhui University.

Wenyue Zhou, Yaohua Xu and Liping Li are with the Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, China (liping li@ahu.edu.cn).

Qiang Liu, Yifei Shen, Xiaofeng Zhou and Chuan Zhang are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing, China (chzhang@seu.edu.cn).
In this paper, a new construction scheme of polar codes is proposed in which a controlled number of information bits is shared between two consecutive encoding blocks. The input stream is divided into an odd stream and an even stream. In the encoding process, the corresponding odd and even blocks share a fraction of their information bits, which are called mutual information bits in this paper. Cyclic redundancy check (CRC) bits are attached to the information bits in each block. The two blocks can be encoded sequentially or in parallel. In the decoding process, when exactly one block of the pair is decoded correctly, the successfully decoded block provides estimates of the mutual information bits to the failed block. With a proper positioning of the mutual information bits, the failed block can then be recovered with another round of decoding.
Since two consecutive blocks share mutual information bits, the encoding process effectively has memory. Therefore, we call the proposed scheme polar codes with memory (PCM) to differentiate it from the traditional polar encoding scheme. In addition, this scheme can be directly extended to m (m > 2) blocks. Based on this extension, a general PCM is proposed in this paper, which reduces the effective code rate loss compared with the direct extension of PCM while maintaining the same order of PER. Analysis shows that the packet error rate (PER) of PCM is on the order of the square of the PER of the underlying polar codes. Note that this performance improvement comes with the same decoding complexity as the underlying polar codes: each block is decoded by the underlying decoder, and an additional round of decoding is needed only when exactly one block of a pair fails. The decoding of PCM can be implemented with SC, BP, or SCL decoding; in other words, the decoding complexity of PCM is that of the underlying SC, BP, or SCL decoding. For ease of description, PCM-SC-2 refers to PCM employing SC decoding with two blocks sharing mutual information bits. Similarly, PCM-BP-2 and PCM-SCL-2 refer to two blocks sharing mutual information bits, with each block employing BP and SCL decoding, respectively. In addition, PCM-SC-m (m > 2) refers to the general PCM employing SC decoding with m blocks.
The simulation results show that the PER of PCM-SC-2 is only 0.3 dB away from that of stand-alone SCL decoding with L = 2 (L being the list size) for the case studied in this paper. PCM-BP-2 achieves the same performance as stand-alone SCL decoding with L = 2 at high signal-to-noise ratios. In addition, the performance of PCM-SCL-2 with L lists matches that of stand-alone SCL decoding with 2L lists. Two hardware architectures are also proposed in the paper: an in-serial (IS) architecture and a low-latency interleaved (LLI) architecture. Implementation results show that for block length 256, the proposed LLI architecture for PCM with SC decoding has lower latency and higher throughput than the adaptive SCL decoder (L = 4) [8].
The rest of the paper is organized as follows.
Section II is on the basics of polar codes. Section III
introduces the proposed PCM scheme. Specifically,
Section III-A introduces the encoding process of
PCM, Section III-B is about the corresponding
decoding process, and in Section III-C, the optimal
strategy to position mutual information bits is pro-
posed. The error performance of PCM is analyzed
in Section III-D. The application of a BP or SCL
decoder to PCM is introduced in Section III-E. We
also compare PCM with Turbo codes in Section
III-F. Moreover, we extend the PCM to m > 2
blocks in Section IV. In Section V, the simulation
results are provided to validate the proposed PCM.
In Section VI, the hardware architectures of PCM
decoding are implemented. The concluding remarks
are provided at the end.
II. PRELIMINARIES OF POLAR CODES
Denote $v_1^N$ as an $N$-length vector $(v_1, \ldots, v_N)$. Let $W: \mathcal{X} \to \mathcal{Y}$ denote a symmetric B-DMC, with the input alphabet $\mathcal{X} = \{0, 1\}$, the output alphabet $\mathcal{Y}$, and the channel transition probability $W(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Let $N = 2^n$ ($n \ge 1$) denote the block code length. The generator matrix of polar codes is $G = B_N F^{\otimes n}$. Here $B_N$ denotes the bit-reversal permutation matrix, $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, and $F^{\otimes n}$ represents the $n$-th Kronecker power of $F$ over the binary field $\mathbb{F}_2$. The codeword $x_1^N$ is obtained as $x_1^N = u_1^N G$, where $u_1^N$ is the source vector, consisting of $K$ information bits and $N - K$ frozen bits (the fixed bits in the source vector). The codeword $x_1^N$ is transmitted over $N$ independent copies of $W$, written as $W^N$, with the transition probability $W^N(y_1^N | x_1^N)$.
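The encoding $x_1^N = u_1^N G$ with $G = B_N F^{\otimes n}$ can be sketched directly from these definitions. This is a minimal, unoptimized illustration (pure Python, an O(N^2) matrix-vector product over GF(2)); the helper names `kron`, `bit_reverse`, `generator_matrix`, and `encode` are our own, not from the paper.

```python
from typing import List

Matrix = List[List[int]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two 0/1 matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def bit_reverse(i: int, n: int) -> int:
    """Reverse the n-bit binary representation of i."""
    return int(format(i, f"0{n}b")[::-1], 2)


def generator_matrix(n: int) -> Matrix:
    """G = B_N F^{(x)n} for N = 2**n, with F = [[1, 0], [1, 1]]."""
    f = [[1, 0], [1, 1]]
    f_n = [[1]]
    for _ in range(n):
        f_n = kron(f_n, f)
    # Left-multiplying by the bit-reversal permutation B_N reorders the rows.
    return [f_n[bit_reverse(i, n)] for i in range(1 << n)]


def encode(u: List[int], g: Matrix) -> List[int]:
    """x_1^N = u_1^N G over GF(2)."""
    size = len(g)
    return [sum(u[i] & g[i][j] for i in range(size)) % 2 for j in range(size)]


if __name__ == "__main__":
    g4 = generator_matrix(2)
    print(g4)  # [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
    print(encode([0, 1, 0, 1], g4))
```

Because $B_N$ commutes with $F^{\otimes n}$ [1], the same $G$ can equivalently be read as $F^{\otimes n}$ with its columns bit-reversed; the test below checks this for $N = 8$.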
The channel polarization process has two parts: channel combining and channel splitting. Channel combining combines copies of $W$ in a recursive manner to produce a vector channel $W_N$, with $W_N(y_1^N | u_1^N) = W^N(y_1^N | x_1^N)$. Channel splitting splits $W_N$ back into a set of $N$ binary-input channels $W_N^{(i)}$, $i \in \{1, 2, \ldots, N\}$. The $i$-th such channel is called bit channel $i$ (meaning the channel that bit $i$ virtually experiences). According to [1], $I(W_N^{(i)})$ (the capacity of bit channel $i$) converges to either 0 or 1 as $N$ tends to infinity, and the fraction of bit channels with capacity 1 approaches $I(W)$.
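Polarization is easiest to observe on the binary erasure channel (BEC), where the Bhattacharyya parameters of the bit channels obey an exact recursion [1]. The sketch below assumes a BEC with erasure probability `eps` as an illustrative stand-in for the general B-DMC $W$ and counts the fractions of nearly noiseless and nearly useless bit channels; the helper names and the threshold `delta` are our own choices.

```python
from typing import List, Tuple


def bec_bhattacharyya(n: int, eps: float) -> List[float]:
    """Z(W_N^{(i)}) for i = 1..N, N = 2**n, over a BEC(eps).

    Exact for the BEC: Z(W_{2N}^{(2i-1)}) = 2Z - Z^2 and Z(W_{2N}^{(2i)}) = Z^2,
    with Z = Z(W_N^{(i)}).  On the BEC, I(W) = 1 - Z(W).
    """
    z = [eps]
    for _ in range(n):
        z = [v for zi in z for v in (2 * zi - zi * zi, zi * zi)]
    return z


def polarized_fractions(n: int, eps: float, delta: float = 1e-3) -> Tuple[float, float]:
    """Fractions of bit channels with Z < delta ("good") and Z > 1 - delta ("bad")."""
    z = bec_bhattacharyya(n, eps)
    good = sum(1 for v in z if v < delta) / len(z)
    bad = sum(1 for v in z if v > 1 - delta) / len(z)
    return good, bad


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        good, bad = polarized_fractions(n, eps=0.5)
        print(f"N = {2 ** n:6d}: good {good:.3f}, bad {bad:.3f}")
```

As $n$ grows, the "good" fraction should creep toward $I(W) = 1 - \epsilon$ and the "bad" fraction toward $\epsilon$, while the total capacity $\sum_i I(W_N^{(i)}) = N\,I(W)$ is preserved at every level of the recursion.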
With finite block lengths, not all bit channels are fully polarized. The principle of polar codes is to choose the $K$ most reliable of the $N$ bit channels to convey information bits. The remaining bits, called frozen bits, are fixed and transmitted on the remaining channels. The information set is denoted as $\mathcal{A}$ and its complementary set as $\mathcal{A}^c$. Denote $u_{\mathcal{A}}$ as the subvector of $u_1^N$ consisting of the elements indexed by $\mathcal{A}$.
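Selecting $\mathcal{A}$ is then a sort by reliability. The sketch below takes any per-bit-channel error proxy (here the BEC Bhattacharyya parameters, an assumption used only for illustration) and returns $\mathcal{A}$ listed in ascending order of reliability, which is the ordering assumed later in Section III-C, together with the frozen set $\mathcal{A}^c$. Indices are 0-based, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def bec_bhattacharyya(n: int, eps: float) -> List[float]:
    """Illustrative reliability proxy: exact Z(W_N^{(i)}) for a BEC(eps), N = 2**n."""
    z = [eps]
    for _ in range(n):
        z = [v for zi in z for v in (2 * zi - zi * zi, zi * zi)]
    return z


def select_information_set(pe: Sequence[float], k: int) -> Tuple[List[int], List[int]]:
    """Return (A, A^c) from per-bit-channel error proxies pe (smaller = more reliable).

    A holds the k most reliable indices, least reliable first (ascending reliability);
    the frozen set A^c is returned in increasing index order.
    """
    by_reliability = sorted(range(len(pe)), key=lambda i: pe[i])  # most reliable first
    info = sorted(by_reliability[:k], key=lambda i: pe[i], reverse=True)
    frozen = sorted(by_reliability[k:])
    return info, frozen


if __name__ == "__main__":
    a, a_c = select_information_set(bec_bhattacharyya(3, 0.5), k=4)
    print(a, a_c)  # [3, 5, 6, 7] [0, 1, 2, 4]
```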
The SC decoder is proposed in [1]; it recursively computes the likelihood ratio (LR) of bit $i$ from

$$L_N^{(i)}(y_1^N, \hat{u}_1^{i-1}) \triangleq \frac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 1)}, \tag{1}$$

where $\hat{u}_1^{i-1}$ is the estimate of bits $u_1^{i-1}$. The SC decoder generates the estimate $\hat{u}_i$ of bit $u_i$ ($i \in \mathcal{A}$) from

$$\hat{u}_i = \begin{cases} 0, & \text{if } L_N^{(i)}(y_1^N, \hat{u}_1^{i-1}) \ge 1, \\ 1, & \text{otherwise}. \end{cases} \tag{2}$$

For a frozen bit ($i \in \mathcal{A}^c$), $\hat{u}_i$ is set to its known value. The decoding complexity of SC is $O(N \log N)$ [1].

[Figure: PCM system model with blocks labeled Sequential Input, Block Odd / Block Even, CRC Attachment, Polar Encoder (N, K), Channel, N Samples Memory, SC Decoder, K_p Bits Memory, CRC Check, Check Bits, Decision, Result, and Output; K_info = K_i + K_p.]
Fig. 1: System model of PCM employing the SC decoding. The encoding diagram is shown here in a sequential manner. A parallel encoding can be done with two CRC attachment modules and two polar encoders.
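In practice, SC decoding is usually run in the log-likelihood ratio (LLR) domain, where Eq. (2) becomes "$\hat{u}_i = 0$ if the LLR is non-negative". The recursive sketch below is a minimal illustration, not the decoder of [1] or [10]: it uses the min-sum approximation for the check-node update (an assumption), and it encodes with $F^{\otimes n}$ in natural order, dropping $B_N$. Because $B_N F^{\otimes n} = F^{\otimes n} B_N$ [1], $B_N$ only permutes codeword positions, which does not change the bit channels of a memoryless channel. The hypothetical `fixed` argument lets any bit be forced to a known value, which is how frozen bits are handled (and, in PCM, how mutual information bits can be handled).

```python
from typing import List, Optional, Sequence, Tuple


def f_minsum(a: float, b: float) -> float:
    """Min-sum approximation of the check-node LLR update (assumption)."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))


def sc_decode(llr: Sequence[float],
              fixed: Sequence[Optional[int]]) -> Tuple[List[int], List[int]]:
    """Recursive SC decoding for x = u F^{(x)n} (natural order, no B_N).

    llr[j] = ln(W(y_j|0) / W(y_j|1)); fixed[i] is None for a bit to be estimated,
    otherwise the known value of bit i.  Returns (u_hat, re-encoded x_hat).
    """
    n = len(llr)
    if n == 1:
        if fixed[0] is not None:
            u = fixed[0]
        else:
            u = 0 if llr[0] >= 0 else 1  # Eq. (2): LR >= 1  <=>  LLR >= 0
        return [u], [u]
    h = n // 2
    l1, l2 = llr[:h], llr[h:]
    # Left half of x is a xor b, right half is b, where a, b encode the two halves of u.
    ua, a = sc_decode([f_minsum(p, q) for p, q in zip(l1, l2)], fixed[:h])
    lb = [q + (1 - 2 * s) * p for p, q, s in zip(l1, l2, a)]
    ub, b = sc_decode(lb, fixed[h:])
    return ua + ub, [s ^ t for s, t in zip(a, b)] + b


def polar_encode(u: Sequence[int]) -> List[int]:
    """x = u F^{(x)n} over GF(2), natural order."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    a, b = polar_encode(u[:h]), polar_encode(u[h:])
    return [s ^ t for s, t in zip(a, b)] + b


if __name__ == "__main__":
    fixed = [0, 0, 0, None, 0, None, None, None]  # A = {3, 5, 6, 7} (0-based)
    u = [0, 0, 0, 1, 0, 1, 0, 1]
    llr = [4.0 if bit == 0 else -4.0 for bit in polar_encode(u)]  # noiseless BPSK
    print(sc_decode(llr, fixed)[0] == u)  # True
```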
III. POLAR CODES WITH MEMORY
In this section, the encoding of PCM and the decoding strategies are introduced, which improve the performance of polar codes under SC, BP, or SCL decoding.
A. Encoding with Memory
The top-level scheme is shown in Fig. 1. Let $K_{crc}$ denote the number of CRC bits in each block; these CRC bits are part of the $K$ information bits. Then there are $K_{info} = K - K_{crc}$ pure information bits in each block. Let $K_p$ be the number of mutual information bits, and denote the number of the remaining information bits as $K_i = K_{info} - K_p$.

In the encoding process, a frame of sequential input bits is first divided into chunks of length $2K_i + K_p$. Each chunk is then divided into two blocks: Block Odd (with $K_i + K_p$ bits) and Block Even (with $K_i$ bits). Block Even then takes the $K_p$ bits from Block Odd to form an input vector of length $K_{info}$ for its CRC generation. In this way, there are $K_p$ mutual information bits that are included in both Block Odd and Block Even. These mutual information bits are placed at the same indices in both blocks, and the mutual information set is denoted as $\mathcal{B}$. The input bit stream arrangement is shown in Fig. 2.

[Figure: consecutive chunks of input bits laid out as K_i, K_p, K_i; Block Odd takes K_i + K_p bits, Block Even takes the last K_i bits plus the same K_p bits, and each block is fed to its own polar encoder.]
Fig. 2: The input bit arrangement of PCM.
The encoding of the two blocks can be done sequentially or in parallel, as seen from Fig. 1, where both the CRC attachment and the polar encoding are applied to Block Odd and Block Even alternately, under the control of a switch.
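The bit arrangement of Fig. 2 can be sketched as follows. The chunk layout ($K_i$, $K_p$, $K_i$) is read from Fig. 2; the functions `split_chunk` and `place_block` are hypothetical helpers. The CRC bits are passed in precomputed (the paper computes Block Even's CRC over its own $K_i$ bits plus the $K_p$ shared bits), and assigning data and CRC bits to $\mathcal{A} \setminus \mathcal{B}$ in increasing index order is an assumption made only for illustration.

```python
from typing import List, Sequence, Tuple

Bits = List[int]


def split_chunk(chunk: Sequence[int], k_i: int, k_p: int) -> Tuple[Bits, Bits, Bits]:
    """Split one chunk of 2*k_i + k_p bits into (own_odd, own_even, shared)."""
    if len(chunk) != 2 * k_i + k_p:
        raise ValueError("a chunk must contain 2*k_i + k_p bits")
    own_odd = list(chunk[:k_i])
    shared = list(chunk[k_i:k_i + k_p])  # the K_p mutual information bits
    own_even = list(chunk[k_i + k_p:])
    return own_odd, own_even, shared


def place_block(own: Sequence[int], shared: Sequence[int], crc: Sequence[int],
                info_set: Sequence[int], mutual_set: Sequence[int], n: int) -> Bits:
    """Build u_1^N for one block: shared bits on B, own + CRC bits on A \\ B."""
    if len(mutual_set) != len(shared):
        raise ValueError("|B| must equal K_p")
    u = [0] * n  # frozen bits set to 0
    for pos, bit in zip(sorted(mutual_set), shared):
        u[pos] = bit  # same positions B in both blocks
    rest = sorted(set(info_set) - set(mutual_set))
    payload = list(own) + list(crc)
    if len(rest) != len(payload):
        raise ValueError("|A \\ B| must equal K_i + K_crc")
    for pos, bit in zip(rest, payload):
        u[pos] = bit
    return u


if __name__ == "__main__":
    own_odd, own_even, shared = split_chunk([1, 0, 1, 1, 1, 0, 0, 1], k_i=3, k_p=2)
    A, B = [7, 10, 11, 12, 13, 14, 15], [7, 10]  # toy sets, N = 16, K_crc = 2
    u_odd = place_block(own_odd, shared, [0, 1], A, B, 16)   # toy CRC values
    u_even = place_block(own_even, shared, [1, 0], A, B, 16)
    print([u_odd[b] for b in B], [u_even[b] for b in B])
```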
B. The Decoding Process
The symbols of the encoded blocks are transmitted over the symmetric B-DMC $W$, and noisy versions of them are observed at the receiver. The receiver collects chunks of samples of length $2N$: the first $N$ samples for Block Odd and the rest for Block Even. The SC decoder generates an estimate $\hat{u}_1^N$ for each block, and the CRC check module returns a check result for each block. The possible check results are:
• Case 1: Both Block Odd and Block Even are decoded correctly;
• Case 2: Block Odd is decoded correctly but Block Even is decoded incorrectly;
• Case 3: Block Odd is decoded incorrectly while Block Even is decoded correctly;
• Case 4: Both blocks are decoded incorrectly.
For Case 1 and Case 2, since Block Odd is decoded correctly, the $K_p$ estimates of the mutual information bits are stored in memory for possible reuse in a second round of decoding of Block Even. For Case 1, since Block Even is also decoded correctly, no further action is needed. For Case 2, since Block Even is decoded incorrectly, a new round of SC decoding for Block Even can be carried out. For Case 3 and Case 4, since Block Odd is decoded incorrectly, the initial 4N LR values of this block need to be saved for a possible new round of decoding. For Case 3, the correctly decoded Block Even provides the estimates of the mutual information bits to Block Odd, invoking a new round of SC decoding of Block Odd. For Case 4, since Block Even is also decoded incorrectly, neither block can be recovered.
The new round of SC decoding proceeds as follows. The $K_p$ estimates of the mutual information bits from the correctly decoded block are fed to the incorrectly decoded block. Take Case 2 as an example. The decoder of Block Even repeats SC decoding up to the first bit in $\mathcal{B}$. When it reaches the first bit with index $i \in \mathcal{B}$, the decoder treats this bit as a frozen bit: regardless of the calculated LR value of $u_i$, $\hat{u}_i$ is set to the decision taken from Block Odd. SC decoding continues to the end, treating all bits in $\mathcal{B}$ as frozen bits. The re-decoding of Block Odd in Case 3 is the same as that in Case 2.
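The control flow of Cases 1 to 4 can be summarized in a short sketch. The `decode` and `crc_ok` callables are a hypothetical interface standing in for any SC, BP, or SCL decoder and the CRC check module; the point shown is only that the mutual information bits of the block that passed its CRC check become frozen bits for one re-decoding of the other block.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Bits = List[int]
Fixed = List[Optional[int]]
# Hypothetical decoder interface: decode(llr, fixed) -> u_hat, where fixed[i] is
# None for a bit to be estimated and otherwise the value that bit i is forced to.
Decoder = Callable[[Sequence[float], Fixed], Bits]


def pcm2_decode(llr_odd: Sequence[float], llr_even: Sequence[float], fixed: Fixed,
                mutual_set: Sequence[int], decode: Decoder,
                crc_ok: Callable[[Bits], bool]) -> Tuple[Bits, Bits, bool, bool, int]:
    """Decode one chunk (Block Odd, Block Even) following Cases 1-4 of Section III-B.

    Returns (u_odd, u_even, ok_odd, ok_even, extra_rounds).
    """
    u_odd = decode(llr_odd, fixed)        # first round, Block Odd
    ok_odd = crc_ok(u_odd)
    u_even = decode(llr_even, fixed)      # first round, Block Even
    ok_even = crc_ok(u_even)
    if ok_odd == ok_even:                 # Case 1 (nothing to do) or Case 4 (nothing possible)
        return u_odd, u_even, ok_odd, ok_even, 0
    donor = u_odd if ok_odd else u_even   # the block that passed its CRC check
    refixed = list(fixed)
    for i in mutual_set:                  # mutual information bits become frozen bits
        refixed[i] = donor[i]
    if ok_odd:                            # Case 2: re-decode Block Even
        u_even = decode(llr_even, refixed)
        ok_even = crc_ok(u_even)
    else:                                 # Case 3: re-decode Block Odd from its saved LLRs
        u_odd = decode(llr_odd, refixed)
        ok_odd = crc_ok(u_odd)
    return u_odd, u_even, ok_odd, ok_even, 1
```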
C. Positioning of Mutual Information Bits
Every two consecutive transmitted blocks share $K_p$ mutual information bits. Positioning the mutual information bits means finding an optimal way to assign them to the inputs of the two blocks (Block Odd and Block Even), where "optimal" means the best system error performance. The exact formulation is derived as follows. The size of set $\mathcal{B}$ is $|\mathcal{B}| = K_p$, and the subvector $u_{\mathcal{B}}$ contains the mutual information bits. In principle, there are $\binom{K}{K_p}$ ways to choose the set $\mathcal{B}$. Assume the information set $\mathcal{A} = \{i_1, i_2, \ldots, i_K\}$ is ordered in ascending order of bit-channel reliability, that is,

$$P_e(W_N^{(i_1)}) \ge P_e(W_N^{(i_2)}) \ge \cdots \ge P_e(W_N^{(i_K)}),$$

where $P_e(W_N^{(i)})$ is the error probability of bit channel $i$. The following proposition states an optimal way to achieve the best union bound.
Proposition 1. Suppose the information set $\mathcal{A} = \{i_1, i_2, \ldots, i_K\}$ is ordered in ascending order of bit-channel reliability. Then the set $\mathcal{B}$ containing the first $K_p$ elements of $\mathcal{A}$ as the mutual information bit indices produces the minimum union bound.
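Proof. Define the PER over the information set $\mathcal{A}$ as $P_B(\mathcal{A})$. Its union bound [9] is

$$P_B(\mathcal{A}) \le \sum_{i \in \mathcal{A}} P_e(W_N^{(i)}). \tag{3}$$

For a pair of consecutive code blocks, when re-decoding is performed for either block, it is equivalent to the case that the information set of that block is $\mathcal{A}' = \mathcal{A} \setminus \mathcal{B}$. This is because the other block is decoded correctly, and the mutual information bits are now treated as frozen bits for the block being re-decoded. In this circumstance, the union bound for the incorrectly decoded block is

$$P_B(\mathcal{A}') \le \sum_{i \in \mathcal{A}'} P_e(W_N^{(i)}). \tag{4}$$

Suppose $\mathcal{B}'$ is any other mutual information set. The equivalent information set of the incorrectly decoded block is then $\mathcal{A}'' = \mathcal{A} \setminus \mathcal{B}'$, and

$$P_B(\mathcal{A}'') \le \sum_{i \in \mathcal{A}''} P_e(W_N^{(i)}). \tag{5}$$

Because $\mathcal{B}$ contains the indices corresponding to the $K_p$ largest error probabilities in $\mathcal{A}$,

$$\sum_{i \in \mathcal{B}} P_e(W_N^{(i)}) \ge \sum_{i \in \mathcal{B}'} P_e(W_N^{(i)}). \tag{6}$$

Since $\mathcal{B}, \mathcal{B}' \subseteq \mathcal{A}$, the sum over $\mathcal{A}'$ equals the sum over $\mathcal{A}$ minus the sum over $\mathcal{B}$, and likewise for $\mathcal{A}''$ and $\mathcal{B}'$. Therefore,

$$\sum_{i \in \mathcal{A}'} P_e(W_N^{(i)}) \le \sum_{i \in \mathcal{A}''} P_e(W_N^{(i)}). \tag{7}$$

That is, the union bound on $P_B(\mathcal{A}')$ is no larger than that on $P_B(\mathcal{A}'')$. Since $\mathcal{B}'$ is arbitrary, we conclude that $\mathcal{B}$ yields the smallest union bound on $P_B(\mathcal{A}')$.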
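Proposition 1 can be checked by brute force on a small example: search all $\binom{|\mathcal{A}|}{K_p}$ candidate sets $\mathcal{B}'$, minimize the union bound on the remaining set, and compare with the $K_p$ least reliable indices. The error probabilities below are made-up illustrative values, and the helper names are our own.

```python
from itertools import combinations
from typing import Iterable, Sequence, Set, Tuple


def union_bound(pe: Sequence[float], indices: Iterable[int]) -> float:
    """Right-hand side of Eqs. (3)-(5): sum of bit-channel error probabilities."""
    return sum(pe[i] for i in indices)


def best_mutual_set_bruteforce(pe: Sequence[float], info_set: Sequence[int],
                               k_p: int) -> Tuple[float, Set[int]]:
    """Minimize the bound on P_B(A \\ B') over all candidate sets B' of size k_p."""
    best = None
    for cand in combinations(info_set, k_p):
        bound = union_bound(pe, set(info_set) - set(cand))
        if best is None or bound < best[0]:
            best = (bound, set(cand))
    return best


def proposition1_set(pe: Sequence[float], info_set: Sequence[int], k_p: int) -> Set[int]:
    """Proposition 1: the k_p least reliable indices of A (largest P_e)."""
    return set(sorted(info_set, key=lambda i: pe[i], reverse=True)[:k_p])


if __name__ == "__main__":
    pe = [0.5, 0.3, 0.2, 0.05, 0.1, 0.01, 0.001, 1e-5]  # illustrative values only
    A = [1, 2, 3, 4, 5, 6, 7]
    print(best_mutual_set_bruteforce(pe, A, 3)[1], proposition1_set(pe, A, 3))
```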
D. Error Performance Analysis
In this subsection, the error performance of PCM is analyzed. For compactness, the argument of $P_B(\mathcal{A})$ is omitted: the symbol $P_B$ represents the underlying PER of polar codes with the information set $\mathcal{A}$. The PER of PCM consists of two parts:
• Part 1: Block Odd and Block Even are both decoded incorrectly, corresponding to Case 4 in Section III-B.
• Part 2: The re-decoding of Block Even (Case 2) or Block Odd (Case 3) fails.
For Part 1, the error probability is $P_B^2$. For Part 2, suppose the PER of the re-decoding is $P'_B$. Case 2 and Case 3 each occur with probability $P_B(1-P_B)$, and a failed re-decoding leaves one erroneous block out of the two blocks in the chunk, so the per-block error probability of Part 2 is $2 \cdot \frac{1}{2} P_B(1-P_B)P'_B = P_B(1-P_B)P'_B$. The PER of PCM is therefore

$$P_{new} = P_B^2 + P_B(1-P_B)P'_B. \tag{8}$$
With the optimal placement of the mutual information bits in Section III-C, some failed blocks can be recovered with the help of the additional $K_p$ frozen bits. Representing $P'_B$ by $\alpha P_B$, where $\alpha$ is obtained empirically for now, Eq. (8) can be rewritten as

$$P_{new} = P_B^2 + P_B(1-P_B)\alpha P_B = (1+\alpha)P_B^2 - \alpha P_B^3. \tag{9}$$
By Eq. (9), it is shown that with the same com-
plexity of the SC decoding, PCM can achieve a
PER which is on the order of the underlying PER
squared.
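Eqs. (8) and (9) are easy to evaluate directly. The sketch below uses the range of $\alpha$ reported in Section V (0.38 to 6.9) purely as example inputs; the function names are our own.

```python
def pcm_per(p_b: float, p_b_re: float) -> float:
    """Eq. (8): per-block PER of two-block PCM; p_b_re is P'_B."""
    return p_b ** 2 + p_b * (1 - p_b) * p_b_re


def pcm_per_alpha(p_b: float, alpha: float) -> float:
    """Eq. (9): Eq. (8) with P'_B = alpha * P_B."""
    return (1 + alpha) * p_b ** 2 - alpha * p_b ** 3


if __name__ == "__main__":
    # alpha range taken from the simulations in Section V
    for p_b in (1e-1, 1e-2, 1e-3):
        lo, hi = pcm_per_alpha(p_b, 0.38), pcm_per_alpha(p_b, 6.9)
        print(f"P_B = {p_b:.0e}: P_new between {lo:.2e} and {hi:.2e}")
```

Under this constant-$\alpha$ model, Eq. (9) factors as $P_{new} - P_B = -P_B(\alpha P_B - 1)(P_B - 1)$, so PCM improves on the underlying code whenever $P_B < 1/\alpha$ (and for every $P_B < 1$ when $\alpha \le 1$).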
E. Decoding with a BP or SCL Decoder
In the proposed PCM, the SC decoder can be directly replaced by a BP or SCL decoder. In Case 2 and Case 3, only one block is decoded correctly, and it provides correct decisions on the mutual information bits to the incorrectly decoded block. Note that for BP decoding, the best way to use these decisions is to treat the mutual information bits as frozen bits, instead of using their soft values. The reason is simple: treating them as frozen bits is equivalent to setting the magnitudes of their initial LLRs to infinity, which is more reliable than using the finite soft values from the correctly decoded block. For SCL decoding, the mutual information bits are likewise treated directly as frozen bits. Therefore, even with BP or SCL decoding, the mutual information bits are used in the same way as in PCM with SC decoding.
F. Comparison with Turbo Codes
The encoding of PCM shares a certain number of information bits between a pair of consecutive blocks, which invites a comparison with Turbo codes built from two parallel identical encoders. There are several differences.

First, in Turbo codes all incoming information bits go through both encoders, whereas PCM shares only a fraction of the information bits between two encoding blocks, enabling a flexible code rate configuration. Second, PCM does not continually exchange soft information between the two blocks during decoding. Instead, estimates of the mutual information bits are fed from the succeeded block to the failed block only when one block fails and the other succeeds. This information passing is sporadic: an additional round of decoding is needed only in Case 2 or Case 3, which occurs with probability $2P_B(1-P_B)$ per chunk of two blocks, so the average fraction of additional decoding rounds, relative to the first-round decodings, is only

$$P_a = P_B(1-P_B) = P_B - P_B^2 < P_B. \tag{10}$$

Compared with stand-alone polar codes, the additional decoding can significantly reduce the PER, as shown in Eq. (9), while accounting for only a small percentage of the overall decoding operations.
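Eqs. (8) and (10) can be cross-checked with a small Monte Carlo simulation of the error model behind them. The sketch assumes, as the analysis implicitly does, that first-round failures of Block Odd and Block Even are independent events with probability $P_B$ and that a re-decoding fails with probability $P'_B$; no actual polar decoding is simulated, and `simulate_pcm2` is a hypothetical helper.

```python
import random
from typing import Tuple


def simulate_pcm2(p_b: float, p_b_re: float, chunks: int, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo of the two-block PCM error model.

    Returns (per-block error rate, additional decoding rounds per first-round decoding).
    """
    rng = random.Random(seed)
    block_errors = 0
    extra_rounds = 0
    for _ in range(chunks):
        odd_fail = rng.random() < p_b
        even_fail = rng.random() < p_b
        if odd_fail and even_fail:      # Case 4: neither block can be recovered
            block_errors += 2
        elif odd_fail or even_fail:     # Case 2 or Case 3: one re-decoding
            extra_rounds += 1
            if rng.random() < p_b_re:   # the re-decoding fails as well
                block_errors += 1
    return block_errors / (2 * chunks), extra_rounds / (2 * chunks)


if __name__ == "__main__":
    p_b, p_b_re = 0.1, 0.2
    per, p_a = simulate_pcm2(p_b, p_b_re, chunks=200_000)
    print(f"simulated PER {per:.4f}, Eq. (8): {p_b**2 + p_b*(1-p_b)*p_b_re:.4f}")
    print(f"simulated P_a {p_a:.4f}, Eq. (10): {p_b*(1-p_b):.4f}")
```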
IV. GENERAL POLAR CODES WITH MEMORY
In Section III, PCM is proposed, in which two consecutive blocks share a controlled number of information bits. A natural question arises: can this scheme be extended to m > 2 polar blocks, possibly achieving better error performance? The direct extension of the encoding scheme from two blocks to m blocks (m > 2) is first analyzed in this section. Then an improved encoding scheme is proposed that achieves the same order of PER while improving the overall code rate of the direct extension. This improved version is called general PCM in this paper.
A. Direct Extension of Polar Codes with Memory
The PER of PCM in Section III-D is $P_B^2 + P_B(1-P_B)P'_B$, where each chunk contains two blocks. When this scheme is extended to $m$ blocks, each containing $K_p$ mutual information bits, the PER consists of the following parts:
• Part 1: Only one block is decoded incorrectly,
and the new round of the decoding fails again;
• Part 2: Two blocks are decoded incorrectly, and
at least one block in the new round of decoding
fails again;
• ...
• Part m: All of the m blocks are decoded
incorrectly.
For Part 1, because the re-decoding of the failed block fails again, there is one block error among the $m$ polar blocks. The PER in this case is therefore

$$P_1 = \frac{1}{m} C_m^1 P_B (1-P_B)^{m-1} P'_B. \tag{11}$$

For Part 2, with two blocks failed, the final block errors among the $m$ blocks arise in two cases: 1) one of the re-decoded blocks fails, and 2) both re-decoded blocks are decoded incorrectly. Therefore, the PER is

$$P_2 = \frac{2}{m} C_m^2 P_B^2 (1-P_B)^{m-2} \left( \frac{1}{2} C_2^1 P'_B (1-P'_B) + (P'_B)^2 \right). \tag{12}$$
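Generally, for Part $k$ ($1 \le k < m$), the error probability $P_k$ is

$$P_k = \frac{k}{m} C_m^k P_B^k (1-P_B)^{m-k} \left( \frac{1}{k} C_k^1 P'_B (1-P'_B)^{k-1} + \frac{2}{k} C_k^2 (P'_B)^2 (1-P'_B)^{k-2} + \cdots + (P'_B)^k \right). \tag{13}$$

For Part $m$, because all of the blocks are decoded incorrectly, the error probability is simply $P_m = P_B^m$. The bracketed sum in Eq. (13) equals $\frac{1}{k}\sum_{j=1}^{k} j\, C_k^j (P'_B)^j (1-P'_B)^{k-j} = \frac{1}{k} \cdot k P'_B = P'_B$ (the mean of a binomial distribution divided by $k$), so $P_k = \frac{k}{m} C_m^k P_B^k (1-P_B)^{m-k} P'_B$ for $1 \le k < m$. Accumulating the error probability of each part, the PER of the direct extension of PCM is

$$P_{new} = \sum_{k=1}^{m} P_k = P_B(1-P_B)^{m-1}P'_B + \frac{2}{m} C_m^2 P_B^2 (1-P_B)^{m-2} P'_B + \cdots + P_B^m. \tag{14}$$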
Replacing $P'_B$ by $\alpha P_B$ in Eq. (14), we obtain a new PER:

$$P_{new} = \alpha P_B^2 (1-P_B)^{m-1} + \frac{2}{m} C_m^2 \alpha P_B^3 (1-P_B)^{m-2} + \cdots + P_B^m. \tag{15}$$
With a relatively small $P_B$, the new PER is dominated by $\alpha P_B^2 (1-P_B)^{m-1}$, which corresponds to the situation in which only one block is decoded incorrectly. All the other parts have terms of order at least $P_B^3$. Based on this fact, a general encoding scheme is proposed in the next subsection to handle the case where one block among the $m$ blocks is decoded incorrectly. For all the other cases, no re-decoding is performed. This allows the scheme to maintain the same PER order while improving the overall code rate.
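The simplification from Eq. (13) to Eq. (14) can be checked numerically. The sketch below (our own helper names) evaluates every part term by term, with the binomial inner sum of Eq. (13) written out, and compares the total with the simplified form; for $m = 2$ it reduces to Eq. (8).

```python
from math import comb


def part_k(p_b: float, p_re: float, m: int, k: int) -> float:
    """P_k from Eqs. (11)-(13), evaluated term by term; Part m is P_B**m."""
    if k == m:
        return p_b ** m
    inner = sum(j / k * comb(k, j) * p_re ** j * (1 - p_re) ** (k - j)
                for j in range(1, k + 1))
    return k / m * comb(m, k) * p_b ** k * (1 - p_b) ** (m - k) * inner


def direct_extension_per(p_b: float, p_re: float, m: int) -> float:
    """Eq. (14): sum of P_1, ..., P_m for the direct extension of PCM."""
    return sum(part_k(p_b, p_re, m, k) for k in range(1, m + 1))


if __name__ == "__main__":
    alpha, m = 2.0, 4
    for p_b in (1e-2, 1e-3):
        total = direct_extension_per(p_b, alpha * p_b, m)
        leading = alpha * p_b ** 2 * (1 - p_b) ** (m - 1)  # dominant term of Eq. (15)
        print(f"P_B = {p_b:.0e}: P_new = {total:.3e}, leading term = {leading:.3e}")
```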
B. The General Polar Codes with Memory
In this subsection, a general encoding scheme of PCM is proposed. From the discussion in the previous subsection, the direct extension of the encoding scheme does not increase the minimum order of the PER: the PER performance is limited by the error event in which only one block among the $m$ blocks fails, and all the other error events have lower PER levels. If the encoding scheme is designed to recover only this limiting error event while ignoring the error events with lower PER levels, the overall effective code rate can be improved.
For the direct extension of PCM, the effective overall code rate is

$$R_m = \frac{m(K - K_{crc}) - (m-1)K_p}{mN} = R - \frac{K_{crc}}{N} - \frac{m-1}{m}\frac{K_p}{N}, \tag{16}$$

with a rate loss of $\frac{K_{crc}}{N} + \frac{m-1}{m}\frac{K_p}{N}$, where $R$ denotes the code rate of the underlying polar codes. To reduce the rate loss of the direct extension of PCM, a general encoding scheme is proposed. Fig. 3 shows the input bit arrangement of the general PCM, where each chunk contains $m$ blocks. In Fig. 3, the first $m-1$ blocks have their own information bits, and no mutual information bits are shared among them. However, from each of these $m-1$ blocks, $K_p$ information bits are taken out and added together (modulo-2 addition). The resulting $K_p$ bits are used as the mutual information bits of the last block. The input bit arrangement of the general PCM is therefore

$$u_{\mathcal{B}}^m = u_{\mathcal{B}}^1 \oplus u_{\mathcal{B}}^2 \oplus \cdots \oplus u_{\mathcal{B}}^{m-1}, \tag{17}$$

where $u_{\mathcal{B}}^k$, $k \in \{1, 2, \ldots, m\}$, denotes the $K_p$ mutual information bits of block $k$. The positioning of the mutual information bits for all $m$ blocks follows Proposition 1: they are placed at the most poorly protected information bits in each block.
In this way, the effective code rate of the general PCM is

$$R_m = \frac{m(K - K_{crc}) - K_p}{mN} = R - \frac{K_{crc}}{N} - \frac{K_p}{mN}. \tag{18}$$
With a large $m$, the fractional rate loss $K_p/(mN)$ is negligible for a fixed $K_p$. However, a large $m$ comes with a higher decoding complexity, so a trade-off can always be made between a small rate loss and a lower decoding latency.

[Figure: a chunk of input bits split into Block 1 to Block m, each with K_i own bits and K_p mutual information bits, each fed to its own polar encoder.]
Fig. 3: The input bit arrangement of the general PCM.
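Eqs. (16) and (18) can be evaluated exactly with rational arithmetic. The sketch plugs in the parameters of Section V ($N = 256$, $K = 140$, $K_{crc} = 12$, $K_p = 24$); the function names are our own.

```python
from fractions import Fraction


def direct_extension_rate(m: int, n: int, k: int, k_crc: int, k_p: int) -> Fraction:
    """Eq. (16): R_m = (m(K - K_crc) - (m - 1) K_p) / (m N)."""
    return Fraction(m * (k - k_crc) - (m - 1) * k_p, m * n)


def general_pcm_rate(m: int, n: int, k: int, k_crc: int, k_p: int) -> Fraction:
    """Eq. (18): R_m = (m(K - K_crc) - K_p) / (m N)."""
    return Fraction(m * (k - k_crc) - k_p, m * n)


if __name__ == "__main__":
    params = dict(n=256, k=140, k_crc=12, k_p=24)  # Section V parameters
    for m in (2, 3, 4):
        d, g = direct_extension_rate(m, **params), general_pcm_rate(m, **params)
        print(f"m = {m}: direct {float(d):.4f}, general {float(g):.4f}")
```

For $m = 2$ the two schemes coincide at $R_2 = 29/64 \approx 0.4531$, and the general scheme with $m = 3$ gives $R_3 = 15/32 \approx 0.4688$, the rates used in Section V; the direct extension with $m = 3$ would drop to $7/16 = 0.4375$.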
The general encoding scheme can recover the $K_p$ mutual information bits of a failed block $k$ if all other $m-1$ blocks in the chunk are decoded successfully: $u_{\mathcal{B}}^k = \bigoplus_{i=1, i \ne k}^{m} u_{\mathcal{B}}^i$. This scheme cannot correct more than one block error among the $m$ blocks. When there is only one incorrectly decoded block, its correct $K_p$ mutual information bits can be recovered and a new round of decoding can be performed.
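Eq. (17) and the recovery rule above amount to a single parity over GF(2), as in the sketch below. The function names are hypothetical, and each entry of `decoded` stands for the $K_p$ mutual information bits of a block that passed its CRC check (or `None` if it failed).

```python
from functools import reduce
from typing import List, Optional, Sequence, Tuple

Bits = List[int]


def xor_bits(a: Sequence[int], b: Sequence[int]) -> Bits:
    """Bitwise modulo-2 addition of two equal-length bit vectors."""
    return [x ^ y for x, y in zip(a, b)]


def last_block_mutual_bits(mutual: Sequence[Sequence[int]]) -> Bits:
    """Eq. (17): u_B^m = u_B^1 xor ... xor u_B^{m-1}."""
    return list(reduce(xor_bits, mutual))


def recover_mutual_bits(decoded: Sequence[Optional[Bits]]) -> Optional[Tuple[int, Bits]]:
    """decoded[k] holds block k's K_p mutual bits if its CRC passed, else None.

    Returns (k, u_B^k) for the single failed block, or None if no block or more
    than one block failed (the latter is not correctable by this scheme).
    """
    failed = [k for k, bits in enumerate(decoded) if bits is None]
    if len(failed) != 1:
        return None
    survivors = [bits for bits in decoded if bits is not None]
    return failed[0], list(reduce(xor_bits, survivors))


if __name__ == "__main__":
    u1, u2, u3 = [1, 0, 1, 1, 0], [0, 1, 1, 0, 0], [1, 1, 0, 1, 1]
    u4 = last_block_mutual_bits([u1, u2, u3])
    print(u4, recover_mutual_bits([None, u2, u3, u4]))
```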
For the general PCM, a new round of decoding occurs only when exactly one block is decoded incorrectly, so the PER of the proposed scheme is

$$P_{new} = P_B(1-P_B)^{m-1}P'_B + \frac{2}{m} C_m^2 P_B^2 (1-P_B)^{m-2} + \cdots + \frac{k}{m} C_m^k P_B^k (1-P_B)^{m-k} + \cdots + P_B^m, \tag{19}$$

which can be rewritten by replacing $P'_B$ with $\alpha P_B$:

$$P_{new} = \alpha P_B^2 (1-P_B)^{m-1} + \frac{2}{m} C_m^2 P_B^2 (1-P_B)^{m-2} + \cdots + \frac{k}{m} C_m^k P_B^k (1-P_B)^{m-k} + \cdots + P_B^m. \tag{20}$$
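Comparing Eq. (15) and Eq. (20), the general PCM scheme keeps the PER on the order of $P_B^2$, as the direct extension does. The difference lies in the second term: since $\frac{2}{m} C_m^2 = m - 1$, it becomes $(m-1)P_B^2(1-P_B)^{m-2}$ instead of a term of order $P_B^3$, so for small $P_B$ the general PCM has a $P_B^2$ coefficient of $\alpha + m - 1$ instead of $\alpha$, in exchange for the higher code rate in Eq. (18).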
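Eq. (20) can be evaluated directly and cross-checked against both the closed form for $m = 3$ given later in Eq. (21) and a Monte Carlo run of the same error model. As before, the simulation assumes independent block failures with probability $P_B$ and a re-decoding failure probability $P'_B = \alpha P_B$; it does not run any polar decoder, and the function names are our own.

```python
import random
from math import comb


def general_pcm_per(p_b: float, alpha: float, m: int) -> float:
    """Eq. (20): per-block PER of general PCM with P'_B = alpha * P_B."""
    per = alpha * p_b ** 2 * (1 - p_b) ** (m - 1)    # one failure, re-decoding fails
    for k in range(2, m + 1):                        # k >= 2 failures: no re-decoding
        per += k / m * comb(m, k) * p_b ** k * (1 - p_b) ** (m - k)
    return per


def eq21(p_b: float, alpha: float) -> float:
    """Eq. (21): closed form of Eq. (20) for m = 3."""
    return (2 + alpha) * p_b ** 2 - (1 + 2 * alpha) * p_b ** 3 + alpha * p_b ** 4


def simulate_general_pcm(p_b: float, p_re: float, m: int, chunks: int, seed: int = 1) -> float:
    """Monte Carlo of the general-PCM error model; returns the per-block error rate."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(chunks):
        failures = sum(rng.random() < p_b for _ in range(m))
        if failures == 1:
            errors += int(rng.random() < p_re)  # XOR recovery, then one re-decoding
        else:
            errors += failures                  # 0 or >= 2 failures: no re-decoding
    return errors / (m * chunks)


if __name__ == "__main__":
    print(general_pcm_per(0.1, 2.0, 3), eq21(0.1, 2.0))
    print(simulate_general_pcm(0.1, 0.2, 3, 100_000))
```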
V. SIMULATION RESULTS
In this section, we provide simulation results to show the performance of PCM. The channel is the additive white Gaussian noise (AWGN) channel. The block length of the polar codes is $N = 256$, and the number of underlying information bits is $K = 140$, including a 12-bit CRC with the generator polynomial $g(x) = x^{12} + x^{11} + x^{10} + x^9 + x^8 + x^4 + x + 1$. The code rate of the underlying polar codes is therefore $R = K/N = 0.5469$. The number of mutual information bits shared between two consecutive blocks is set to $K_p = 24$.
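Each block carries a 12-bit CRC with the generator polynomial above. The sketch below attaches and checks it by plain polynomial long division over GF(2). The bit order (most significant first) and the all-zero initial register are assumptions, since the paper does not specify them, and the helper names are our own.

```python
from typing import List, Sequence

Bits = List[int]

# g(x) = x^12 + x^11 + x^10 + x^9 + x^8 + x^4 + x + 1, coefficients from x^12 down to x^0
G12: Bits = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1]


def _mod2_divide(bits: Sequence[int], gen: Sequence[int], steps: int) -> Bits:
    """Long division over GF(2); returns the last len(gen) - 1 bits (the remainder)."""
    work = list(bits)
    for i in range(steps):
        if work[i]:
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return work[len(work) - (len(gen) - 1):]


def attach_crc(msg: Sequence[int], gen: Sequence[int] = G12) -> Bits:
    """Append the remainder of msg(x) * x^r modulo gen(x), where r = deg gen."""
    r = len(gen) - 1
    return list(msg) + _mod2_divide(list(msg) + [0] * r, gen, len(msg))


def crc_ok(block: Sequence[int], gen: Sequence[int] = G12) -> bool:
    """True if block(x) is divisible by gen(x)."""
    r = len(gen) - 1
    return not any(_mod2_divide(block, gen, len(block) - r))


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    payload = [rng.randint(0, 1) for _ in range(128)]  # K_info = K - K_crc = 128
    block = attach_crc(payload)                       # K = 140 bits per block
    print(len(block), crc_ok(block))                  # 140 True
```

Because $g(x)$ has a nonzero constant term and more than one term, any single bit flip in a block is always detected.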
Fig. 4 reports the BER performance of PCM with two consecutive blocks sharing mutual information bits. The effective code rate of PCM-SC-2 is $R_2 = \frac{2(K - K_{crc}) - K_p}{2N} = R - \frac{K_{crc}}{N} - \frac{K_p}{2N} = 0.4531$. For a fair comparison, the code rate of the stand-alone polar codes with SC, BP, and SCL decoding is adjusted to $R_2$, and the stand-alone polar codes with SC and BP decoding also contain a 12-bit CRC. For the stand-alone SCL decoding, list sizes $L = 2$ and $L = 4$ are simulated. It is observed that PCM-SC-2 outperforms the traditional SC and BP decoding by about 0.41 dB and 0.22 dB at BER $= 10^{-4}$, respectively. In addition, PCM-SC-2 achieves performance comparable to (within 0.3 dB of) SCL decoding with $L = 2$ at the same BER level. On the other hand, PCM-BP-2 achieves the same performance as SCL decoding with $L = 2$ when $E_b/N_0 \ge 4$ dB. Fig. 5 shows the corresponding PER performance, and the trend is consistent with that shown in Fig. 4.
Fig. 6 shows the simulated PER of PCM-SC-2 and the PER analyzed in Eq. (9). Here the maximum (6.9) and minimum (0.38) values of $\alpha$ are found from the simulations, producing $P_{new}^{upper}$ and $P_{new}^{lower}$ in Fig. 6. The PER of PCM-SC-2 follows the lower bound for small $E_b/N_0$ values (below 3 dB) and the upper bound for large $E_b/N_0$ values (above 3 dB), which indicates that the PER of PCM-SC-2 is on the level of the squared PER of the underlying polar codes.

Fig. 7 reports the PER performance of PCM employing SCL decoding. PCM-SCL-2 with $L = 2$ achieves the same performance as stand-alone SCL decoding with $L = 4$ when $E_b/N_0 > 3.5$ dB. PCM-SCL-2 with $L = 4$ and $L = 8$ outperforms stand-alone SCL decoding with $L = 8$ and $L = 16$ by about 0.1 dB and 0.15 dB at PER $= 10^{-4}$, respectively.
The good performance of PCM comes at the cost of additional rounds of decoding, as shown by Eq. (10). Fig. 8 shows the ratio of the additional decoding to the overall decoding for the same system as in Figs. 4 and 5. The curve labeled $P_B$ in this figure is the PER of the underlying polar codes. The success rate of the additional decoding is also provided in this figure, shown by the line with asterisks. For $E_b/N_0 \ge 3$ dB, the additional decoding rate and the additional success rate match. More importantly, the additional decoding effort is controlled by the PER of the underlying polar codes: the $P_B$ curve also closely matches the other two curves for large $E_b/N_0$. This can be seen from Eq. (10): when $P_B$ is small ($1 - P_B$ approaches 1), the additional decoding rate $P_a$ is determined by $P_B$. The decoding failure rate of PCM is therefore left with an order of $P_B^2$.

Fig. 4: BER of PCM-SC-2 and PCM-BP-2 over AWGN channels: the block length is $N = 256$ and each block contains $K = 140$ information bits, including a 12-bit CRC and $K_p = 24$ mutual information bits. The code rate of the underlying polar codes is $R = 0.5469$. The stand-alone polar codes with SC, BP, and SCL decoding all have the adjusted code rate $R_2 = 0.4531$.

Fig. 5: PER of PCM-SC-2 and PCM-BP-2 over AWGN channels. The parameters are the same as those in Fig. 4.

Fig. 6: The simulated PER of PCM-SC-2 and the analyzed PER in Eq. (9). The parameters are the same as those in Fig. 4. $\alpha$ is set to 6.9 for $P_{new}^{upper}$ and to 0.38 for $P_{new}^{lower}$.

Fig. 7: PER of PCM-SCL-2 over AWGN channels. The parameters are the same as those in Fig. 4.
Fig. 9 presents the PER curves of the general PCM with $m = 3$; the parameters are the same as those in Fig. 4. With the proposed general encoding scheme, the PER of PCM-SC-3 follows from Eq. (20) with $m = 3$:

$$P_{new} = (2+\alpha)P_B^2 - (1+2\alpha)P_B^3 + \alpha P_B^4. \tag{21}$$

According to Eq. (18), the code rate of PCM-SC-3 is $R_3 = 0.4688$, so the stand-alone polar codes with SC, BP, and SCL decoding are all adjusted to the same code rate $R_3$. PCM-SC-3 is about 0.18 dB worse than stand-alone SCL decoding with $L = 2$ at the PER $= 10^{-3}$ level.

Fig. 8: The additional decoding rate of PCM-SC-2 over AWGN channels. The parameters are the same as those in Fig. 4. The curve with legend $P_B$ is the PER of the underlying polar codes.

Fig. 9: PER of PCM-SC-3 over AWGN channels. The parameters are the same as those in Fig. 4. The effective code rate of PCM-SC-3 is $R_3 = 0.4688$, and the stand-alone polar codes with SC, BP, and SCL decoding all have the same code rate $R_3$.
VI. HARDWARE ARCHITECTURE
In this section, two hardware architectures for PCM-SC-2 are proposed: the IS architecture and the LLI architecture. The IS architecture is based on the SC decoder proposed in [10], where the processing elements (PEs) are designed with precomputation. The proposed architecture can perform both SC decoding and PCM-SC-2 decoding. The LLI architecture is inspired by the 2-interleaved SC polar decoder [11]; it substantially reduces decoding latency with only a small increase in hardware consumption compared with the IS architecture.

[Figure: IS decoder block diagram with LLR Memory, SC Decoder (SC Stage 1 to SC Stage 7), CRC Check, Feedback Part, and Bit Memory.]
Fig. 10: IS PCM-SC-2 decoder.
A. In-serial PCM-SC-2 Decoder
To increase hardware utilization and reduce computational complexity, the decoder processes data in the log-likelihood ratio (LLR) domain instead of the likelihood ratio (LR) domain. The top-level architecture of the proposed IS PCM-SC-2 decoder is shown in Fig. 10. It consists of five main modules: the LLR memory module, the SC decoder module, the CRC check module, the feedback module, and the bit memory module. Compared with the conventional SC decoder [10], the LLR memory module and the bit memory module are new additions.
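To illustrate why the LLR domain is cheaper, the sketch below places Arıkan's LR-domain updates next to their LLR-domain counterparts. In the LR domain each update needs multiplications and a division. In the LLR domain the same updates reduce to additions and comparisons. The min-sum form of f is a common hardware approximation and is an assumption here. It is not necessarily the exact PE arithmetic of [10]. All function names are illustrative.

```python
import math

# LR domain (Arikan's recursions): multiplications and a division per update.
def f_lr(l1, l2):
    return (l1 * l2 + 1.0) / (l1 + l2)

def g_lr(l1, l2, u):
    # u is the partial-sum bit fed back from the upper branch
    return l2 * l1 ** (1 - 2 * u)

# LLR domain: the same updates become additions, signs, and comparisons.
def f_llr_exact(a, b):
    # Exact form; valid for moderate |a|, |b| (tanh saturates to 1.0 in floats)
    return 2.0 * math.atanh(math.tanh(a / 2.0) * math.tanh(b / 2.0))

def f_llr_minsum(a, b):
    # Min-sum approximation: product of signs times the smaller magnitude
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))

def g_llr(a, b, u):
    # Exact in the LLR domain: add or subtract depending on the partial sum
    return b + (1 - 2 * u) * a

a, b = 1.5, -2.0
print(f_llr_minsum(a, b))                                   # -1.5
print(round(f_llr_exact(a, b), 3),
      round(math.log(f_lr(math.exp(a), math.exp(b))), 3))   # -1.056 -1.056
```

The exact LLR update matches the logarithm of the LR update, and g is exact in both domains. Only f is approximated when min-sum is used.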
The LLR memory stores the LLRs needed for Case 2 and Case 3. The bit memory module is the key module of the architecture. In a conventional SC decoder, the locations and values of the frozen bits are set in advance. When the bit memory receives a frozen bit, it ignores it, and the frozen bit is sent directly to the feedback module. In the PCM-SC-2 decoder, Block Odd and Block Even are decoded in the first round exactly as in a conventional SC decoder. When Block Odd or Block Even passes the CRC check, the bit memory immediately stores the mutual information bits of that block. In Case 2 and Case 3, the bit memory reads these mutual information bits and treats them as frozen bits in the second-round decoding of the failed block. In this way, the Kp mutual information bit estimates from the correctly decoded block are effectively fed into the bit memory of the incorrectly decoded block.
Fig. 11: LLI PCM-SC-2 decoder. (Block diagram with a 2-interleaved SC decoder, breakpoint memories, CRC check, feedback modules, and separate bit memories for Block Odd and Block Even.)
Fig. 12: Processing element of LLI architecture.
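The bit-memory mechanism can be sketched in software as follows. This is a minimal model, not the authors' hardware. It uses a recursive SC decoder (natural bit order, no bit reversal, min-sum f, all assumptions) that takes a dictionary of known bit values. Frozen bits are supplied through this dictionary. In the second round, the mutual information bits copied from the block that passed its CRC check are supplied the same way, so the decoder never makes its own decision at those indices. The CRC check and the case-selection logic are omitted. `second_round` and the example positions are hypothetical.

```python
def f(a, b):
    """Min-sum LLR update for the upper branch."""
    sign = 1.0 if (a >= 0) == (b >= 0) else -1.0
    return sign * min(abs(a), abs(b))


def polar_encode(u):
    """x = u times the n-fold Kronecker power of F = [[1, 0], [1, 1]]."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    upper, lower = polar_encode(u[:half]), polar_encode(u[half:])
    return [p ^ q for p, q in zip(upper, lower)] + lower


def sc_decode(llr, fixed, offset=0):
    """SC decoding of the sub-block starting at bit index `offset`.
    `fixed` maps bit index -> known value (frozen bits, or mutual bits
    borrowed from the other block). Returns (u_hat, x_hat)."""
    if len(llr) == 1:
        bit = fixed.get(offset, 0 if llr[0] >= 0 else 1)
        return [bit], [bit]
    half = len(llr) // 2
    a, b = llr[:half], llr[half:]
    u_up, v_up = sc_decode([f(x, y) for x, y in zip(a, b)], fixed, offset)
    g_llrs = [y + (1 - 2 * v) * x for x, y, v in zip(a, b, v_up)]
    u_low, v_low = sc_decode(g_llrs, fixed, offset + half)
    x_hat = [p ^ q for p, q in zip(v_up, v_low)] + v_low
    return u_up + u_low, x_hat


def second_round(llr, frozen, mutual_positions, good_bits):
    """Hypothetical helper: re-decode a failed block, treating the mutual
    information bits of the successfully decoded block as extra frozen bits
    (assumes both blocks place mutual bits at the same indices)."""
    fixed = dict(frozen)
    fixed.update({i: good_bits[i] for i in mutual_positions})
    u_hat, _ = sc_decode(llr, fixed)
    return u_hat


# Usage with a hypothetical N = 8 code and noiseless BPSK LLRs (bit 0 -> +).
frozen = {0: 0, 1: 0, 2: 0, 4: 0}
u = [0, 0, 0, 1, 0, 1, 1, 0]
llr = [4.0 if bit == 0 else -4.0 for bit in polar_encode(u)]
print(sc_decode(llr, frozen)[0] == u)                      # True
print(second_round([0.0] * 8, frozen, [3, 5], u))          # [0, 0, 0, 1, 0, 1, 0, 0]
```

In the last call the channel carries no information at all. The borrowed bits at indices 3 and 5 are still reproduced exactly, because they bypass the hard decision just as frozen bits do.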
B. Low-latency Interleaved PCM-SC-2 Decoder
Note that in Case 2 and Case 3, the decoding latency of the IS PCM-SC-2 decoder is 1.5 times that of the conventional SC decoder. Moreover, when Block Odd or Block Even performs a new round of decoding, the LLR computations before the first erroneous mutual information bit are redundant, because all decisions up to that point are identical to those of the first round. Based on this observation, the LLI architecture shown in Fig. 11 is proposed. It employs interleaved decoding and is described as follows.
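One way to read the 1.5 factor is as follows. Assume the comparison is made per pair of blocks (Block Odd and Block Even), and that every SC pass, whether first-round or second-round, takes the same latency T_SC. These are both assumptions, not statements from the design.

```latex
\frac{T_{\mathrm{IS}}}{T_{\mathrm{conv}}}
  = \frac{\overbrace{T_{\mathrm{SC}} + T_{\mathrm{SC}}}^{\text{first round, both blocks}}
          + \overbrace{T_{\mathrm{SC}}}^{\text{second round, failed block}}}
         {2\,T_{\mathrm{SC}}}
  = \frac{3}{2} = 1.5
```

Under this reading, the second-round term is the part that skipping the redundant LLR computations can shorten.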
A conventional N-bit SC decoder has n stages, Stage 1 to Stage n, and only one stage is active in any clock cycle. As described in [11], a 2-interleaved SC decoder can decode two polar blocks simultaneously. The LLI architecture is inspired by this design. The main idea is that when Block Odd is being decoded in Stage i (i ≠ n), Block Even can be decoded in Stage i − 1, since that stage is idle for Block Odd. The decoding processes of the two blocks conflict in the last stage, Stage n, because every block needs to stay in Stage n for two clock cycles. Therefore, an additional PE is needed in this stage. As shown in Fig. 11, the LLI PCM-SC-2 decoder has an extra PE in Stage n. In addition, two independent bit memories and feedback modules are designed for Block Odd and Block Even so that the two blocks can be decoded simultaneously. Fig. 12 shows the PE of the LLI PCM-SC-2 decoder. Compared with the PE of the IS PCM-SC-2 decoder, it has two additional registers that store the LLRs of the two blocks.
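The stage-conflict argument can be checked with a small schedule model. The sketch below assumes a conventional SC decoder that performs one f or g stage update per clock cycle, giving 2N − 2 cycles per block. This is not the precomputation-based schedule of [10]. The model simply delays Block Even by one cycle relative to Block Odd. Function names are hypothetical.

```python
def sc_stage_schedule(n):
    """Stage (1 = channel side, n = bit side) active in each clock cycle of an
    SC decoder that performs one f or g stage update per cycle."""
    schedule = []

    def visit(stage):
        if stage > n:             # leaf: hard decision, no PE cycle modelled
            return
        schedule.append(stage)    # f update feeding the upper subtree
        visit(stage + 1)
        schedule.append(stage)    # g update feeding the lower subtree
        visit(stage + 1)

    visit(1)
    return schedule


def interleave(n):
    """Run Block Even one cycle behind Block Odd on the same stages. Return
    (total cycles for the pair, [(cycle, stage)] where both need one stage)."""
    odd = sc_stage_schedule(n) + [None]
    even = [None] + sc_stage_schedule(n)
    conflicts = [(t, s) for t, (s, e) in enumerate(zip(odd, even))
                 if s is not None and s == e]
    return len(odd), conflicts


total, conflicts = interleave(8)                     # N = 256
print(len(sc_stage_schedule(8)), total)              # 510 511
print({s for _, s in conflicts}, len(conflicts))     # {8} 128
```

Under this model, the only clashes fall in Stage n, once for each pair of sibling bits. This is the load that the extra PE in Stage n absorbs. Both blocks finish in 2N − 1 cycles instead of the 2(2N − 2) cycles needed to decode them one after the other.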
With the proposed design, whenever a mutual information bit in Block Even is decoded, it is compared with the mutual information bit at the same location in Block Odd, which was decoded one clock cycle earlier. If the two bits differ, all intermediate LLR values of the two blocks, which are stored in the registers, are immediately sent to the breakpoint memory, and decoding then continues. In Case 2 or Case 3, the incorrectly decoded block starts its second round of decoding from the position of the first differing mutual information bit. The intermediate LLRs are read directly from the breakpoint memory instead of being recalculated. In the studied case of PCM with the same parameters as in Fig. 4, the indices of the first and last mutual information bits are 32 and 209, respectively. Hence, if a second round of decoding is required, the computation of the LLRs before bit index 32 is avoided in the worst case, and the computation of the LLRs before bit index 209 is avoided in the best case.
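The breakpoint idea can be approximated with a rough software model under stated assumptions. `first_mismatch` finds where Block Even first disagrees with Block Odd on a mutual information bit. `leaf_cycles` counts how many cycles a conventional SC schedule spends before each bit's LLR is ready. That schedule performs one f or g update per clock, giving 2N − 2 cycles per block; it is not the precomputation-based PE of [10]. Bit indices are assumed to be 0-based, and all names are hypothetical.

```python
def first_mismatch(mutual_positions, odd_bits, even_bits):
    """Index of the first mutual information bit whose estimates differ
    between Block Odd and Block Even, or None if they all agree."""
    for i in sorted(mutual_positions):
        if odd_bits[i] != even_bits[i]:
            return i
    return None


def leaf_cycles(n):
    """Cycles elapsed (one f or g update per cycle) when the LLR of each
    bit u_0 ... u_{N-1} becomes available; the last entry equals 2N - 2."""
    cycles, ready = 0, []

    def visit(stage):
        nonlocal cycles
        if stage > n:              # leaf reached: record elapsed cycles
            ready.append(cycles)
            return
        cycles += 1                # f update, then upper subtree
        visit(stage + 1)
        cycles += 1                # g update, then lower subtree
        visit(stage + 1)

    visit(1)
    return ready


def skipped_fraction(n, resume_index):
    """Fraction of one block's schedule that need not be repeated when the
    second round resumes at `resume_index` with LLRs from breakpoint memory."""
    ready = leaf_cycles(n)
    return ready[resume_index] / ready[-1]


odd = [0, 1, 1, 0, 1, 0, 0, 1]
even = [0, 1, 0, 0, 1, 1, 0, 1]
print(first_mismatch([2, 5], odd, even))                          # 2
print(round(skipped_fraction(8, 32), 3),
      round(skipped_fraction(8, 209), 3))                         # 0.139 0.827
```

With these assumptions and N = 256, resuming at index 32 skips about 14% of the single-block schedule, and resuming at index 209 skips about 83%. The savings actually achieved by the proposed decoder depend on its PE design.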
C. Implementation Results
The two decoders are implemented on the Xilinx
ZNYQ-7000 field-programmable gate array (FPGA)
platform. The latency of the LLI decoder is lower
than that of the IS decoder, and it is reduced nearly
by half in the first round of decoding, due to
the interleaved decoding for Block Odd and Block
Even. It is also reduced in the second round of
decoding, because the LLI decoder can begin with
the first erroneous mutual information bit. Fig. 13
shows the reduction rate of the average latency for
the LLI PCM-SC-2 decoder in the second round of
decoding, and the number of the samples is 100000
at each Eb/N0. It is remarkable that the LLI PCM-
SC-2 decoder can reduce the latency of the second
11
TABLE I: SYNTHESIS RESULTS COMPARISON OF DIFFERENT POLAR DECODERS FOR N =
256.
Decoder
PCM-SC-2 Decoder Combinational
SC Decoder
[12]
Adaptive SCL Decoder [8]
IS LLI L = 2 L = 4
LUTs 4302 4781 35152 12589 16565
FFs 4629 15381 1561 7809 10217
Total 8931 20126 36944 20398 26782
RAM [bit] 0 0 1792 0 0
Block RAMs 0 0 0 14 28
Max. Freq. [MHz] 105.3 104.7 — 135.5 121.5
Min.-Max.Latency [µs] 2.44 − 2.95 1.23 − 1.82 — 1.12 − 11.32 1.25− 12.67
Min.-Max.T/P [Mbps] 83− 100 166 − 201 — 12− 197 11− 177
1 1.5 2 2.5 3 3.5 4
0.45
0.5
0.55
0.6
0.65
0.7
Fig. 13: Reduction rate of the average latency for
the LLI PCM-SC-2 decoder in the second round of
decoding.
1 1.5 2 2.5 3 3.5 4
200
300
400
500
600
700 IS PCM-SC-2 decoderLLI PCM-SC-2 decoder
Fig. 14: Average latency of the IS PCM-SC-2 de-
coder and the LLI PCM-SC-2 decoder (in clock
cycles).
round decoding by 49.3% and 66.9% at Eb/N0 = 1
dB and Eb/N0 = 4 dB, respectively. Fig. 14 shows
the average latency of the two decoders. It can be
seen that the average latency of the LLI PCM-SC-
2 decoder is approximately half of that of the IS
PCM-SC-2 decoder because the latency reduction
rates are both around 50% in the first round and the
second round.
Table I compares the synthesis results of different polar decoders for N = 256, including the IS and LLI PCM-SC-2 decoders, the combinational SC decoder in [12], and the adaptive SCL decoder in [8] with L = 2 and L = 4. As shown in Table I, the IS PCM-SC-2 decoder consumes the fewest hardware resources, although its maximum throughput is lower than those of the other decoders. The total resource consumption (LUTs plus FFs) of the IS and LLI PCM-SC-2 decoders is only 24.2% and 54.5%, respectively, of that of the SC decoder in [12]. The hardware consumption of the LLI PCM-SC-2 decoder is 98.7% and 75.2% of that of the adaptive SCL decoder with L = 2 and L = 4, respectively. The reason is that the SCL decoder needs L SC decoder modules, whereas the proposed architecture needs only one. Moreover, the decoders in [8] and [12] use additional RAM or block RAMs, which further increases their hardware consumption, whereas the proposed PCM-SC-2 decoders use neither.
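The resource percentages quoted above are ratios of the "Total" row of Table I. The short sketch below recomputes them from the table entries as reported. It reproduces the quoted figures to within about 0.1 percentage point. The dictionary keys are shorthand labels, not names from the paper.

```python
# Total resource counts as reported in the "Total" row of Table I.
TOTAL = {
    "IS": 8931,
    "LLI": 20126,
    "SC [12]": 36944,
    "SCL L=2": 20398,
    "SCL L=4": 26782,
}


def share(decoder, reference):
    """Resource usage of `decoder` as a fraction of that of `reference`."""
    return TOTAL[decoder] / TOTAL[reference]


for dec, ref in [("IS", "SC [12]"), ("LLI", "SC [12]"),
                 ("LLI", "SCL L=2"), ("LLI", "SCL L=4")]:
    print(f"{dec} vs {ref}: {100 * share(dec, ref):.1f}%")
```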
Table I also lists the latency and throughput ranges of the PCM-SC-2 decoders and the adaptive SCL decoders. The minimum latency and the maximum throughput of the LLI PCM-SC-2 decoder are comparable to those of the adaptive SCL decoder with L = 2 and slightly better than those of the adaptive SCL decoder with L = 4. In the worst case, the maximum latency of the LLI PCM-SC-2 decoder is only 16.1% and 14.4% of that of the adaptive SCL decoder with L = 2 and L = 4, respectively. Its minimum throughput is more than 13 and 15 times theirs, respectively.
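The worst-case comparison can be recomputed directly from Table I. It uses the maximum latency and the minimum throughput of each decoder. Variable names are shorthand and not taken from the paper.

```python
# Worst-case entries of Table I: (max latency in µs, min throughput in Mbps).
LLI_WORST = (1.82, 166)
SCL_WORST = {2: (11.32, 12), 4: (12.67, 11)}


def worst_case_comparison(lli, scl):
    """Return (latency ratio LLI/SCL, throughput ratio LLI/SCL)."""
    return lli[0] / scl[0], lli[1] / scl[1]


for L, scl in SCL_WORST.items():
    lat, tp = worst_case_comparison(LLI_WORST, scl)
    print(f"L = {L}: latency {100 * lat:.1f}%, throughput x{tp:.1f}")
# L = 2: latency 16.1%, throughput x13.8
# L = 4: latency 14.4%, throughput x15.1
```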
VII. CONCLUSION
In this paper, PCM employing SC, BP, or SCL decoding is proposed. By sharing a controlled number of mutual information bits between a pair of blocks, this scheme reduces the PER to the order of the square of the PER of the underlying polar codes. Results show that for block length N = 256, the proposed PCM-SC-2 and PCM-BP-2 decoders can match the PER of the stand-alone SCL decoder with two lists. The PER of the PCM-SCL-2 decoder with L lists can match that of the stand-alone SCL decoder with 2L lists. Meanwhile, for N = 256, the proposed LLI hardware architecture for PCM achieves more than 13 times the worst-case throughput of the adaptive SCL decoder with two lists.
REFERENCES
[1] E. Arıkan, “Channel polarization: A method for constructing
capacity-achieving codes for symmetric binary-input memoryless
channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp.
3051–3073, 2009.
[2] N. Hussami, S. Korada, and R. Urbanke, “Performance of polar
codes for channel and source coding,” in Proc. IEEE Int. Symp.
Inf. Theory, June 2009, pp. 1488–1492.
[3] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans.
Inf. Theory, vol. 61, no. 5, pp. 2213–2226, 2015.
[4] K. Niu and K. Chen, “CRC-aided decoding of polar codes,”
IEEE Commun. Lett., vol. 16, no. 10, pp. 1668–1671, October
2012.
[5] K. Chen, K. Niu, and J. Lin, “Improved successive cancellation
decoding of polar codes,” IEEE Trans. Commun., vol. 61, no. 8,
pp. 3100–3107, August 2013.
[6] E. Arıkan, “A performance comparison of polar codes and
Reed-Muller codes,” IEEE Commun. Lett., vol. 12, no. 6, pp. 447–449,
2008.
[7] A. Eslami and H. Pishro-Nik, “On bit error rate performance of
polar codes in finite regime,” in Proc. Annual Allerton Conf. on
Commun., Control, Computing (Allerton), 2010, pp. 188–194.
[8] A. Süral and E. Arıkan, “An FPGA implementation of successive
cancellation list decoding for polar codes,” Ph.D. dissertation,
Bilkent Univ., Ankara, 2016.
[9] I. Tal and A. Vardy, “How to construct polar codes,” IEEE
Trans. Inf. Theory, vol. 59, no. 10, pp. 6562–6582, October
2013.
[10] C. Zhang and K. K. Parhi, “Low-latency sequential and overlapped
architectures for successive cancellation polar decoder,” IEEE
Trans. Signal Process., vol. 61, no. 10, pp. 2429–2441, May
2013.
[11] C. Zhang and K. K. Parhi, “Interleaved successive cancellation
polar decoders,” in Proc. 2014 IEEE Int. Symp. Circuits Syst.
(ISCAS), June 2014, pp. 401–404.
[12] O. Dizdar, “High throughput decoding methods and architecture
for polar codes with high energy-efficiency and low latency,”
Ph.D. dissertation, Bilkent Univ., Ankara, 2017.
