On Path Memory in List Successive Cancellation Decoder of Polar Codes by Xia, ChenYang et al.
On Path Memory in List Successive Cancellation
Decoder of Polar Codes
ChenYang Xia, YouZhe Fan, Ji Chen, Chi-Ying Tsui
Department of Electronic and Computer Engineering, the HKUST, Hong Kong
{cxia, jasonfan, jchenbh}@connect.ust.hk, eetsui@ust.hk
Abstract—Polar code is a breakthrough in coding theory. Using
list successive cancellation decoding with large list size L, polar
codes can achieve excellent error correction performance. The
L partial decoded vectors are stored in the path memory and
updated according to the results of list management. In the state-
of-the-art designs, the memories are implemented with registers
and a large crossbar is used for copying the partial decoded
vectors from one block of memory to another during the update.
The architectures are quite area-costly when the code length
and list size are large. To solve this problem, we propose two
optimization schemes for the path memory in this work. First, a
folded path memory architecture is presented to reduce the area
cost. Second, we show a scheme that the path memory can be
totally removed from the architecture. Experimental results show
that these schemes effectively reduce the area of path memory.
Index Terms—Polar codes, List successive cancellation decod-
ing, Path memory, Partial-sums
I. INTRODUCTION
Polar codes [1] are the first kind of forward error correction
code that is proved to achieve channel capacity. The basic
decoding scheme of polar codes is called successive cancella-
tion decoding (SCD) [1]. The decoding is sequential in nature
as the decoding of a new bit has dependency on the already
decoded bits. Specifically, the data dependency comes from
the partial-sums which are obtained by encoding some of the
already decoded bits. These partial-sums are used as the inputs
of the subsequent computations. List successive cancellation
decoding (LSCD), proposed in [2], includes L parallelly-
decoded SCDs and keeps L partial decoded vectors during
decoding. By using a large list size and selecting a decoded
vector satisfying the cyclic redundancy check (CRC) [3], [4]
at the end of decoding, the error correction performance of
polar codes is greatly improved.
The hardware architecture of LSCD [5]–[10] implements L
SCD kernels to support the parallel calculations of L paths,
indicating the hardware complexity is at least L times that of
the single SCD. To achieve a moderate hardware complexity,
the semi-parallel architecture [11] and the folded partial-sum
network (PSN) [12] originally proposed for a single SCD are
also adopted in the LSCD architecture [5]–[9] due to their
low complexity. Several L × L crossbars are used for the
permutations of log-likelihood ratios (LLRs), partial-sums and
partial decoded vectors among different memories according
to the results of list management. In [5], pointers are used
to access the corresponding LLRs for computation instead of
directly copying the LLRs during update.
The path memory with L ·N memory bits is implemented
to store the L partial decoded vectors in the existing LSCD
architecture [5]–[7]. It updates the contents when a new bit is
decoded. The partial decoded vector of a path is duplicated
if both its expanded paths are kept. After this duplication, an
N -bit crossbar is needed for the permutation of the updated
partial decoded vectors, which has a very high complexity and
takes a large area when the code length and list size are large.
Moreover, the path memory needs to be implemented with
registers which usually have a larger area than SRAMs.
In this work, to optimize the complexity of the path memory,
we first propose a folded path memory by mimicking the
architecture of the folded PSN. Then we present a method
to recover the decoded bits from the partial-sums which
are already available in the folded PSN and hence the path
memory can be omitted. It is shown that the latency of this
recovery can be hidden in the decoding process, therefore will
not cause any latency overhead in most conditions.
Notation: In this paper, matrices and row vectors are de-
noted in boldface uppercase and lowercase letters, respectively.
XM represents a square matrix of order M and xM represents
an M -dimensional vector. xi is the ith element of a vector x.
II. MISCELLANEOUS
A. Introduction of Polar Codes
Polar codes [1] are a kind of linear block codes whose code
length is denoted as N . Its generator matrix is Kronecker
matrix F⊗n, where F =
[
1 0
1 1
]
and n = log2N . A
codeword xN can be encoded from a source word uN by
xN = uN ·F⊗n, where uN ,xN ∈ {0, 1}N . Figure 1(a) shows
an encoding signal flow graph of F⊗3 in which each “⊕” node
executes an XOR operation and each “·” node split its input.
LSCD of polar codes can be represented by a scheduling
tree shown in Figure 1(b). It includes two parts. The upper half
is a full binary tree with n+1 stages, representing L identical
SCDs for L paths. The stage indices are in descending order
from the root to the leaf nodes. Two kinds of nodes, denoted
as F-nodes and G-nodes, exist in this tree. The number of
functions executed in each node is also marked in Figure 1(b).
The functions in G-nodes depend on the partial-sums which
are encoded from the already decoded bits. Specifically, the
partial-sums for the jth node from the left on stage λ are
calculated as
[sˆλ(j−1)·Λ, ..., sˆ
λ
j·Λ−1] = [uˆ(j−1)·Λ, ..., uˆj·Λ−1] · F⊗λ, (1)
ar
X
iv
:1
71
2.
02
05
3v
2 
 [c
s.I
T]
  9
 D
ec
 20
17
s6 
0
u7 
s4 
s5 
1
1
s0 
s1 
s2 
s3 
2
2
2
2
s0 
s1 
s2 
s3 
s4 
s5 
s6 
s7 
3
3
3
3
3
3
3
3
s0 
s1 
s2 
s3 
s4 
s5 
s7 
0
0
0
0
0
0
0
u0 
u1 
u2 
u3 
u4 
u5 
u6 
s0 
s1 
s2 
s3 
s6 
s7 
1
1
1
1
1
1
s4 
s5 
s6 
s7 
2
2
2
2 x7 
x0 
x1 
x2 
x3 
x4 
x5 
x6 
(a)
1 1 1 1 1 1 1 1
2 2 2 2
4 4
8
u0 u1 u2 u3 u4 u5 u6 u7 LM
0
1
2
3F G
(b)
Cycle 0 1 2 3
Input 0 s17 s
1
6 s
1
5 s
1
4 s
2
3 s
2
2 s
2
1 s
2
0
Input 1 - - s37 s
3
6 s
3
7 s
3
6 s
3
5 s
3
4
Output s37 s
3
6 s
3
5 s
3
4 s
3
3 s
3
2 s
3
1 s
3
0
(c)
Figure 1. (a) Encoding signal flow graph and (b) LSCD scheduling tree of
polar codes and (c) the steps of partial-sum update in a folded PSN (Λ = 8
and P = 2).
PSN for 
length
2P codes
π
Mux
Partial-
sum
memory
Parallel PSN serially-updated PSN
P
^.ui
To PEs
P
P P
P
Figure 2. The block diagram of folded partial-sum network.
where stage index λ ∈ [0, n−1] and j ∈ [0, 2n−λ−1], Λ = 2λ
is the bit width of partial-sums at stage λ and uˆ and sˆ are the
decoded bits and partial-sums, respectively1. For example, the
G-node at stage 2 (j = 1) in Figure 1(b) needs the partial-
sums generated from [uˆ0, ..., uˆ3] in Figure 1(a).
The source bits are decoded in an ascending order. At each
leaf node, a source bit is decoded. For each path in LSCD,
either possibility that the decoded bit is 0 or 1 is considered
and the number of paths is doubled. If the number of paths
exceeds the list size L, list management operations, denoted
by the squares in the scheduling tree, are executed to keep the
best L decoding paths in the list and discard the others.
B. Folded Partial-sum Network
In SCD, 2λ F- or G-functions can be calculated in parallel
at stage λ, so N2 processing elements (each is used to calculate
one function) should be implemented if we want to maximize
the parallelism. However, the area cost will be very high.
To reduce the hardware complexity, semi-parallel architecture
[11] was proposed to limit the computational parallelism to
P = 2p( N), i.e., at most P functions are calculated in
one clock cycle and a node is calculated in
⌈
2λ−p
⌉
clock
cycles. The complexity of PSN in this kind of semi-parallel
architecture can also be reduced. In [12], an folded PSN
architecture is proposed, which generates at most P partial-
sum bits in one clock cycle. Its block diagram is shown in
Figure 2. The partial-sums for the nodes at stages not higher
than p are updated by a parallel PSN and stored in a P -bit
1For simplicity, the hats in uˆ and sˆ are omitted in the rest of this paper
Memory 0
Memory 1
Memory L-1
Shifter 0
Shifter 1
Shifter L-1
...
...Cross
bar
N
N
Decoded bits
Figure 3. The block diagram of the traditional path memory.
Reg. 0
Reg. 1
Reg. L-1C
ro
ss
b
a
rShifter 0
Shifter 1
Shifter L-1
SRAM 0
SRAM 1
SRAM L-1 C
ro
ss
b
a
r
LM Pointer mem
...
...
...
P PP PP1
LlogL LlogL
Figure 4. The block diagram of the proposed folded path memory.
register bank. The partial-sums at higher stages are serially
updated in a word of P bits and stored in an N2 -bit SRAM.
The encoding signal flow graph in Figure 1(a) can be used
to illustrate how to generate Λ = 8 bits of partial-sums for
G-node at stage 3. Supposing that the parallelism P = 2, the
partial-sums already generated are {[s20, s21, s22, s23], [s14, s15], s06}
and the newly decoded bits is u7. First, [s16, s
1
7] are parallelly
updated from s06 and u7. Then, the required partial-sums
[s30, ..., s
3
7] are generated according to the schedule shown in
Figure 1(c) within 4 clock cycles. According to the synthesis
results in [12], the folded PSN has a much smaller area than
other fully-parallel PSN architecture, such as the partial-sum
update logic [11] and the feed forward architecture [13].
C. Problems of the Existing Path Memory
The block diagram of the traditional path memory archi-
tecture for LSCD [5]–[7] is shown in Figure 3. L blocks of
memories are implemented to store the partial decoded vectors
of L paths. Each memory includes N bits of registers. After
the list management operation is executed, some paths are
pruned while other paths are kept and duplicated, and the
contents in the path memory are updated. First, the crossbar
permutes the paths according to the list management results.
Then the newly decoded bit of each path is appended to the
corresponding permuted partial decoded vector by a shifter.
Finally, the updated paths are stored in the path memory.
According to the synthesis results, the crossbar used in this
architecture, which has a quadratic complexity with respect to
the list size, takes a very large area when large code length and
list size are used and this becomes a significant issue of the
existing architecture. The registers also take a large area and
are expected to be substituted with other hardware-friendly
memory elements.
III. FOLDED PATH MEMORY
As discussed in Section II-C, in the traditional path memory,
the area overhead is mainly due to the N -bit crossbar when
the list size is large. Consequently, the key to reduce the
complexity of the path memory is to reduce the crossbar size.
u0 
u1 
u2 
u3 
u4 
u5 
u6 
u7 
u4 
u5 
u6 
u7 
u0 
u1 
u2 
u3 
(a)
30 21 s0 
s1 
s2 
s3 
s4 
s5 
s6 
s7 
3
3
3
3
3
3
3
s0 
s1 
s2 
s3 
s4 
s5 
s7 
0
0
0
0
0
0
u0 
u1 
u2 
u3 
u4 
u5 
u6 
s4 
s5 
1
1
s0 
s1 
s2 
s3 
2
2
2
s0 
s1 
s2 
s3 
s6 
s7 
1
1
1
1
1
s4 
s5 
s6 
s7 
2
2
2
2
(b)
Cycle 0 1 2 3
Input 0 s30 s
3
1 s
3
2 s
3
3 s
2
0 s
2
1 s
2
4 s
2
5
Input 1 s34 s
3
5 s
3
6 s
3
7 s
2
2 s
2
3 s
2
6 s
2
7
“⊕” s20 s21 s22 s23 s10 s11 s14 s15
“·” s24 s25 s26 s27 s12 s13 s16 s17
(c)
Figure 5. The signal flow graph of (a) folded path memory and (b) decoded
bits recovery and (c) the recovery schedule (Λ = 8 and P = 2).
As presented in Section II-B, the folded PSN updates at most
P partial-sum bits in one clock cycle. If this architecture is
used in an LSCD, the crossbar size is only P bits, which is
much smaller than that of the N -bit crossbar in a parallel
path memory. According to Section II-B, the partial-sums
and decoded bits have the same bit width and are always
updated at the same time during the decoding. Based on these
observations, we propose an architecture called folded path
memory which mimics the architecture of the folded PSN, as
shown in Figure 4.
The left part of the folded path memory includes L P -
bit register banks, L shifters and a P -bit crossbar. After the
list management operation, the crossbar read the P -bit partial
decoded vectors from the register banks and update them in
the same way as the parallel path memory shown in Figure 3.
When each register bank in the left part is full with P bits,
these bits are sent to the right part.
The right part uses L blocks of SRAMs to store the partial
decoded vectors. The port width of each SRAM is P bits
and its total size equals to N bits. The stored vectors are not
permuted for update. Instead, we can use NP pointers to store
the block indices of the SRAM in which each P bits are stored.
However, to update the pointers, we still need extra hardware.
To use as few pointers as possible, we still use a crossbar to
permute the decoded bits which have the same indices with
the partial-sums that are being updated. Take an example with
Λ = 8 and P = 2, whose signal flow graph shown in Figure
5(a) can be obtained by changing the “⊕” nodes in Figure 1(a)
to “·” nodes. During the four clock cycles when the partial-
sums at stage 3, [s30, ..., s
3
7], are generated, the corresponding
[u0, ..., u3], [u4, u5] and [u6, u7] of this path but previously
stored in different blocks of memories are permuted through
the crossbar and stored in the SRAM of this path in these four
cycles. By doing so, [u0, ..., u7] of each path can be pointed
by a pointer instead of three pointers. For a polar code with
code length equal to N , the partial decoded bits are store in
n− p+ 1 groups with their length Λ ∈ {N2 , N4 , ..., 2P, P, P}.
This means only n− p+ 1 pointers are enough for each path.
P/2 P/2 P/2 P/2
P P
Λ/2 Λ/2
Λ F G
SRAM activated
SRAM not activated
Figure 6. The scheduling tree of recovering the decoded bits.
Finally, as the two crossbars are never activated simul-
taneously, only one crossbar is implemented in the final
architecture. Comparing with the existing architectures, the
folded path memory uses a much smaller crossbar while it is
adaptive to LSCD with any code length and list size and also
easy to implement.
IV. RECOVERING DECODED BITS FROM PARTIAL-SUMS
In this section, based on the fact that the partial-sums are
encoded from the decoded bits, we introduce a scheme to
directly recover the decoded bits from the partial-sums stored
in a folded PSN. We also show that the proposed scheme do
not introduce any extra latency comparing with the traditional
semi-parallel decoding schedule. By doing so, the folded PSN
and the path memory are merged and the path memory can
be omitted.
We rewrite (1) in the form of block matrices, i.e., we divide
the vectors into P -dimensional sub-vectors and the generator
matrix into sub-matrices of order P and we have
[(sλP )0, ..., (s
λ
P ) Λ
P −1] =
[(uP )0, ..., (uP ) Λ
P −1] · (F
⊗p ⊗ F⊗λ−p) (2)
where (sλP )j = [s
λ
j·P , ..., s
λ
(j+1)·P−1] and (uP )j =
[uj·P , ..., u(j+1)·P−1] (j ∈ [0, ΛP − 1]). Each P -bit sub-vector
(sλP )j is the content stored in one address in the SRAM of
the folded PSN and is a linear combination of (uP )j · F⊗p.
Consequently, to recover the decoded bits from the partial-
sums, we first calculate all the intermediate values (uP )j ·F⊗p
from the partial-sums, then we encode the intermediate values
to get the corresponding decoded bits because (uP )j · F⊗p ·
F⊗p = (uP )j · IP = (uP )j , where I is an identity matrix.
Next, we derive the equations of (uP )j · F⊗p. From the
mixed-product property2 of Kronecker product, we can get
(F⊗p ⊗ F⊗λ−p) · (IP ⊗ F⊗λ−p) = F⊗p ⊗ I Λ
P
. (3)
Multiply both sides of (2) by (IP ⊗ F⊗λ−p), we can get
[(sλP )0, ..., (s
λ
P ) Λ
P −1] · (IP ⊗ F
⊗λ−p) =
[(uP )0, ..., (uP ) Λ
P −1] · (F
⊗p ⊗ I Λ
P
). (4)
Take an numerical example of (4) with ΛP = 4, by using the
multiplication of block matrices, we can get
(uP )0 · F⊗p = (sλP )0 + (sλP )1 + (sλP )2 + (sλP )3
(uP )1 · F⊗p = (sλP )1 + (sλP )3
(uP )2 · F⊗p = (sλP )2 + (sλP )3
(uP )3 · F⊗p = (sλP )3
(5)
2(A⊗B)(C⊗D) = (AC)⊗ (BD), if AC and BD exist.
Table I
THE SRAM SIZE IN ALL THE MENTIONED ARCHITECTURES
Port width SRAM size
Folded PSN 2P N
2
Folded path memory P N
Merged memory 2P N
The left hand side are the (uP )j · F⊗p we want to calculate,
and the right hand side are the P -bit sub-vectors of the partial-
sums. So (4) can be regarded as encoding P groups of ΛP -
bit sub-codes. With the P XOR gates in the folded PSN,
one XOR calculation in each of the P groups of encoding is
executed in one clock cycle. This indicates that for a sub-code
whose length equals to Λ, its latency for recovery equals to the
number of the “⊕” nodes in the encoding signal flow graph
of a ΛP -bit polar code. Consequently, the latency to recover Λ
decoded bits from the corresponding partial-sums is Λ2P log2
Λ
P
clock cycles. For example, to recover the Λ = 8 decoded bits
in an LSCD with P = 2 in Figure 5(b), we encode two 4-bit
sub-codes, [s30, s
3
2, s
3
4, s
3
6] and [s
3
1, s
3
3, s
3
5, s
3
7], whose schedule
is shown in Figure 5(c) and the total latency is 4 clock cycles.
Finally, an extra P -bit encoder for each path is used to encode
(uP )j · F⊗p.
For an N -bit polar code, n−p+1 groups of partial decoded
bits with their length Λ ∈ {N2 , N4 , ..., 2P, P, P} need to be
recovered from the folded PSN. The size of the SRAM is
N bits, which is twice that of a traditional folded PSN as
the N2 -bit partial-sums for stage n − 1 are not stored in a
traditional folded PSN [12]. The corresponding decoded bits
can be recovered after the right most G-node at stage λ is
calculated because these memory bits are never used to store
or update partial-sums in the subsequent decoding. By using
the cycles in which the folded PSN is idle, the latency can
be hidden in the decoding process. Specifically, as shown in
Figure 6, all the clock cycles used to calculate the nodes below
stage p before the beginning of the next recovery of Λ2 bits can
be used to recover the Λ bits because the SRAMs in the folded
PSN are not activated. By calculation, the number of cycles
in these stages is Λ(1 − 1P ) and it should be larger than the
latency for the recovery of Λ decoded bits which is Λ2P log2
Λ
P
cycles. Thus, the relationship between Λ and P where no extra
latency is introduced can be derived as
Λ < P · 22P−2. (6)
With practical parallelism P = 64 which is used in most of
the existing architecture [5]–[9], (6) is satisfied for the LSCD
with code length even equal to N = 220.
For simplicity, we call the folded PSN which can recover
the decoded bits the merged memory.
V. IMPLEMENTATION RESULTS
To show the area saving achieved by the proposed path
memory architectures, we synthesize folded PSN, traditional
path memory, folded path memory and merged memory in the
LSCD with different combinations of list size and code length
Table II
THE SYNTHESIS RESULTS WITH UMC 90NM TECHNOLOGY (UNIT: mm2)
Code length 210 213
List size 8 16 32 8 16 32
(1) Folded PSN 0.416 0.894 2.018 0.826 1.713 3.655
(2) Traditional
path memory
0.521 1.692 6.158 4.251 15.83 54.72
(3) Folded path
memory
0.228 0.526 1.279 0.696 1.462 3.151
(4) Merged
memory
0.511 1.056 2.288 1.165 2.365 4.905
(1) + (2) 0.937 2.586 8.176 5.077 17.54 58.38
(1) + (3) 0.644 1.420 3.297 1.522 3.175 6.806
Only (4) 0.511 1.056 2.288 1.165 2.365 4.905
with UMC 90 nm technology. The timing constraint for all
the designs is 1ns and P = 64 for a fair comparison. All the
SRAMs used in these architectures are summarized in Table I.
All of them have two ports so that they can read and write data
at the same time. The synthesis results are shown in Table II.
The area of pointer memory is not included as the pointers for
LLR memory are valid and can be reused for these memories.
Comparing with the traditional path memory, the folded path
memory achieves an area saving of more than 50% for all
different list size and code length combinations. It also has a
smaller area than the folded PSN and the merged memory as
the read port width of the SRAM is P bits instead of 2P bits.
For the merged memory, all the combinations satisfy (6),
indicating the decoded bits can be recovered without any
latency overhead comparing with the traditional schedule.
Each merged memory is slightly larger than its corresponding
folded PSN because of the extra encoders and SRAM bits.
For the storage of both partial-sums and decoded bits,
we can use either a folded PSN and a folded path memory
(“(1)+(3)”) or just a merged memory (“Only (4)”). The sum
of the area of a folded PSN and a traditional path memory
(“(1)+(2)”) is used as a benchmark for comparison. The larger
the list size and the code length are, the more saving we can get
from the proposed architecture. The merged memory brings
us the most saving as the path memory is no more needed.
Regarding to the area saving with respect to the whole LSCD,
the area of an LSCD with N = 210 and L = 16 is 7.47 mm2
according to [6], which means about 20% of the total area can
be saved if a merged memory is used in an LSCD.
VI. CONCLUSION
In this paper, we propose two methods to optimize the
hardware complexity of the path memory in the LSCD of
polar codes. The folded path memory mimics the architecture
of the folded PSN to reduce the bit width of the crossbar. It is
easy to implement and can be used in any semi-parallel LSCD
architecture. The merged memory can recover the decoded bits
from the partial-sums stored in the folded PSN in almost all the
practical LSCD and hence the path memory can be omitted.
Synthesis results show that a large area saving can be achieved.
REFERENCES
[1] E. Arıkan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, June 2009.
[2] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf.
Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
[3] K. Niu and K. Chen, “Crc-aided decoding of polar codes,” IEEE
Commun. Lett., vol. 16, no. 10, pp. 1668–1671, Oct 2012.
[4] B. Li, H. Shen, and D. Tse, “An adaptive successive cancellation list
decoder for polar codes with cyclic redundancy check,” IEEE Commun.
Lett., vol. 16, no. 12, pp. 2044–2047, Dec 2012.
[5] A. Balatsoukas-Stimming, M. Bastani Parizi, and A. Burg, “LLR-based
successive cancellation list decoding of polar codes,” IEEE Trans. Signal
Process., vol. 63, no. 19, pp. 5165–5179, Oct 2015.
[6] Y. Fan, C. Xia, J. Chen, C. Tsui, J. Jin, H. Shen, and B. Li, “A low-
latency list successive-cancellation decoding implementation for polar
codes,” IEEE J. Sel. Areas Commun., vol. 34, no. 2, pp. 303–317, Feb.
2016.
[7] S. A. Hashemi, C. Condo, and W. J. Gross, “Fast and flexible successive-
cancellation list decoders for polar codes,” IEEE Trans. Signal Process.,
vol. PP, no. 99, pp. 1–14, 2017.
[8] J. Lin, C. Xiong, and Z. Yan, “A high throughput list decoder architecture
for polar codes,” IEEE Trans. VLSI Syst., vol. 24, no. 6, pp. 2378–2391,
June 2016.
[9] C. Xiong, J. Lin, and Z. Yan, “Symbol-decision successive cancellation
list decoder for polar codes,” IEEE Trans. Signal Process., vol. 64, no. 3,
pp. 675–687, Feb 2016.
[10] S. M. Abbas, Y. Fan, J. Chen, and C.-Y. Tsui, “Low complexity belief
propagation polar code decoder,” in IEEE Workshop on Signal Process.
Syst. (SiPS), 2015, pp. 1–6.
[11] C. Leroux, A. J. Raymond, G. Sarkis, and W. J. Gross, “A semi-parallel
successive-cancellation decoder for polar codes,” IEEE Trans. Signal
Process., vol. 61, no. 2, pp. 289–299, Jan 2013.
[12] Y. Fan and C.-Y. Tsui, “An efficient partial-sum network architecture for
semi-parallel polar codes decoder implementation,” IEEE Trans. Signal
Process., vol. 62, no. 12, pp. 3165–3179, Jun 2014.
[13] C. Zhang and K. K. Parhi, “Low-latency sequential and overlapped
architectures for successive cancellation polar decoder,” IEEE Trans.
Signal Process., vol. 61, no. 10, pp. 2429–2441, May 2013.
