Memory Management in Successive-Cancellation based Decoders for
  Multi-Kernel Polar Codes by Bioglio, Valerio et al.
arXiv:1809.09436v1 [cs.IT] 25 Sep 2018
Valerio Bioglio, Carlo Condo, Ingmar Land
Mathematical and Algorithmic Sciences Lab
Huawei Technologies France SASU
Email: {valerio.bioglio,carlo.condo,ingmar.land}@huawei.com
Abstract—Multi-kernel polar codes have recently been proposed to construct polar codes of lengths different from powers of two. Decoder implementations for multi-kernel polar codes need to account for this feature, which becomes critical in memory management. We propose an efficient, generalized memory management framework for the implementation of successive-cancellation (SC) decoding of multi-kernel polar codes. It can be used on many types of hardware architectures and with different flavors of SC decoding algorithms. We illustrate the proposed solution for small kernel sizes, and give complexity estimates for various kernel combinations and code lengths.
Index Terms—Polar Codes, Successive Cancellation Decoding,
Decoder Architectures.
Topic Designation–A. Communication Systems, 1. Modulation and Coding.
I. INTRODUCTION
Polar codes [1] are a family of error-correcting codes with the capacity-achieving property over various classes of channels, providing excellent error-rate performance for practical code lengths [2]. The construction of polar codes is based on the polarization effect of the Kronecker powers of the binary 2 × 2 kernel matrix $T_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$. A major drawback of this construction is the restriction of achievable block lengths to powers of 2. Puncturing and shortening techniques can be used to adjust the code length, at the cost of reduced bit polarization [3]. To overcome this limitation, multi-kernel polar codes have been introduced in [4]. By mixing binary kernels of different sizes in the construction of the code, these codes make many more block lengths achievable while preserving the polarization effect.
Many software and hardware implementations of polar code decoders have been proposed in the literature. While software guarantees a higher degree of flexibility in terms of data structures, fast software decoders have to rely on efficient memory management [5], [6]. The importance of smart memory usage is even more evident in hardware implementations, where memory accounts for the majority of area occupation and power consumption, and heavily impacts decoder speed [7]–[9]. The memory structure first proposed in [10] for purely binary polar codes, and widely adopted in SC-based decoders [11], relies on the observation that memory requirements decrease as the decoding stage increases. We show how this trend continues in multi-kernel polar codes, proposing an efficient memory structure for SC-based polar decoders and providing functions for the evaluation of the overall memory requirements. This structure supports the decoding of codes constructed with any combination of kernel sizes, making it an ideal framework for multi-kernel decoder hardware implementations [12].

$$
G_{12} = \begin{pmatrix}
1&1&1&0&0&0&0&0&0&0&0&0\\
1&0&1&0&0&0&0&0&0&0&0&0\\
0&1&1&0&0&0&0&0&0&0&0&0\\
1&1&1&1&1&1&0&0&0&0&0&0\\
1&0&1&1&0&1&0&0&0&0&0&0\\
0&1&1&0&1&1&0&0&0&0&0&0\\
1&1&1&0&0&0&1&1&1&0&0&0\\
1&0&1&0&0&0&1&0&1&0&0&0\\
0&1&1&0&0&0&0&1&1&0&0&0\\
1&1&1&1&1&1&1&1&1&1&1&1\\
1&0&1&1&0&1&1&0&1&1&0&1\\
0&1&1&0&1&1&0&1&1&0&1&1
\end{pmatrix}
$$

Fig. 1: Transformation matrix G12 = T2 ⊗ T2 ⊗ T3.
II. MULTI-KERNEL POLAR CODES
Multi-kernel polar codes generalize the construction of polar codes by mixing binary kernels of different sizes. Similarly to polar codes, an (N, K) multi-kernel polar code is completely defined by an N × N transformation matrix $G_N$ and a frozen set $\mathcal{F}$, with $|\mathcal{F}| = N - K$. The transformation matrix has the form

$$G_N = T_{p_1} \otimes T_{p_2} \otimes \cdots \otimes T_{p_s}, \qquad (1)$$

where $T_{p_i}$ is a $p_i \times p_i$ binary matrix, $i = 1, 2, \ldots, s$, denoting a polarizing kernel of size $p_i$, and $N = p_1 \cdot p_2 \cdot \ldots \cdot p_s$. Binary kernels of different sizes can be found in [13]. The transformation matrix $G_{12} = T_2 \otimes T_2 \otimes T_3$ is shown in Figure 1, where

$$T_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad T_3 = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad (2)$$

and the recursive structure of the matrix is highlighted. The frozen set $\mathcal{F}$ indicates the N − K bits to be frozen in the code construction, and can generally be designed according to bit reliabilities [4] or minimum distance [14]. Finally, the encoder is defined by $x = u \cdot G_N$, mapping the input vector $u \in \mathbb{F}_2^N$ to the codeword $x \in \mathbb{F}_2^N$, where $u_i = 0$ for $i \in \mathcal{F}$, while the bits $u_i$, $i \notin \mathcal{F}$, store the information bits. The set $\mathcal{I} = \mathcal{F}^c$ is termed the information set.

Fig. 2: Tanner graph defined by G12 = T2 ⊗ T2 ⊗ T3 (Stages 1 to 3, permutations P1, P2, P3).
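Eq. (1) and the encoding rule can be checked numerically. The short Python sketch below builds $G_N$ from a list of kernels, stored as plain lists of bits, and reproduces the matrix of Figure 1. The helper names kron, transformation_matrix and encode are our own, chosen for illustration.

```python
from functools import reduce

# Kernels of Eq. (2), stored as lists of rows.
T2 = [[1, 0],
      [1, 1]]
T3 = [[1, 1, 1],
      [1, 0, 1],
      [0, 1, 1]]


def kron(a, b):
    """Kronecker product of two binary matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def transformation_matrix(kernels):
    """G_N = T_{p1} (x) T_{p2} (x) ... (x) T_{ps}, as in Eq. (1)."""
    return reduce(kron, kernels)


def encode(u, G):
    """x = u . G_N over GF(2); frozen positions of u must already be 0."""
    N = len(G)
    return [sum(u[r] & G[r][c] for r in range(N)) % 2 for c in range(N)]


G12 = transformation_matrix([T2, T2, T3])
print(G12[9])   # the all-one row of Figure 1
```

Encoding does not need the frozen set itself, because frozen bits are simply zeros in u. The dense $O(N^2)$ product is for illustration only.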
The structure of multi-kernel polar codes can be better understood through the Tanner graph of the code, which consists of various $p_i \times p_i$ blocks $B_{p_i}$. These blocks correspond to the different kernels $T_{p_i}$ used in the construction of the transformation matrix, and they connect the input vector and the codeword. Each of the s stages composing the graph is formed by $N_i = N/p_i$ kernel blocks $B_{p_i}$, performing the operations involving kernel $T_{p_i}$. The permutations $P_i$ between stages are described in [4]; an example of the Tanner graph for $G_{12}$ is given in Figure 2.
Multi-kernel polar codes can be decoded through successive cancellation (SC) decoding on the Tanner graph of the code. Log-likelihood ratios (LLRs) [15] are passed from right to left, while partial sums (PSs), based on hard decisions on the decoded bits, are passed from left to right. LLRs and PSs are calculated in the kernel blocks $B_p$, which have the pairs $(u_0, l_0), \ldots, (u_{p-1}, l_{p-1})$ on their left side and the pairs $(x_0, L_0), \ldots, (x_{p-1}, L_{p-1})$ on their right side.
Blocks in the same column belong to the same stage and can perform decoding operations in parallel. Roughly speaking, $l_i$ and $L_i$ represent the LLRs of the partial sums $u_i$ and $x_i$, respectively. However, PSs are calculated on the basis of the previously decoded bits, hence they may not match the hard decisions on the connected LLRs. We denote by $L_{i,(j-1)p_i}, \ldots, L_{i,jp_i-1}$ and $u_{i,(j-1)p_i}, \ldots, u_{i,jp_i-1}$ the input LLRs and PSs of the j-th block of stage i, respectively, with $1 \le j \le N_i = N/p_i$. LLRs $L_{1,0}, \ldots, L_{1,N-1}$ correspond to the channel LLRs, while $u_{s,0}, \ldots, u_{s,N-1}$ correspond to the decoded bits. An example of this labeling is given in Figure 2.
Given the binary input vector $u = (u_0, u_1, \ldots, u_{p-1})$, corresponding to the partial sums calculated from the decoded bits, the output vector $x = (x_0, x_1, \ldots, x_{p-1})$ is calculated as $x = u \cdot T_p$. If we call $T_p^{i}$ the i-th column of the kernel matrix $T_p$, the update rule for the PSs can be written as

$$x_i = u \cdot T_p^{i}. \qquad (3)$$

The vector x corresponds to the partial sums calculated by the kernel $T_p$, which are then used as input for the LLR calculations of other blocks. This update is performed from left to right, and can also be used for encoding.

Fig. 3: Data flow graph of the SC decoder for the multi-kernel polar code defined by G12 = T2 ⊗ T2 ⊗ T3 (horizontal axis: decoding step, 1 to 22). Circles represent LLR updates, squares represent PS updates.
Output LLRs $l_0, \ldots, l_{p-1}$ are calculated sequentially using the input LLRs $L_0, \ldots, L_{p-1}$ coming from the previous stage and the PSs corresponding to the previously decoded bits, i.e.,

$$l_i = f_i^{p}(L_0, \ldots, L_{p-1}, u_0, \ldots, u_{i-1}), \qquad (4)$$

with $l_0 = f_0^{p}(L_0, \ldots, L_{p-1})$. This update is performed from right to left, and corresponds to the successive cancellation principle. Rules for the derivation of LLR update functions for arbitrary binary kernels can be found in [16].
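Eqs. (3) and (4) fully specify a kernel block. The sketch below evaluates the PS rule (3) directly. It obtains the LLR rule (4) by brute-force marginalization over the not-yet-decided inputs, which is exact for any binary kernel but exponential in p. The closed-form rules of [16] avoid this enumeration. The function names and the LLR sign convention, log P(0)/P(1), are our assumptions.

```python
from itertools import product
from math import exp, log

# Kernels of Eq. (2), stored as lists of rows.
T2 = [[1, 0], [1, 1]]
T3 = [[1, 1, 1], [1, 0, 1], [0, 1, 1]]


def kernel_ps(u, T):
    """Eq. (3): x_c = u . T^c over GF(2), for every column c of the kernel T."""
    p = len(T)
    return [sum(u[r] & T[r][c] for r in range(p)) % 2 for c in range(p)]


def kernel_llr(i, L, u_prev, T):
    """Eq. (4) by exhaustive marginalization: exact LLR of u_i given the input
    LLRs L[0..p-1] and the already decided bits u_prev = (u_0, ..., u_{i-1}).

    Assumes LLRs of the form log P(0)/P(1), so the likelihood of a candidate
    kernel output x is proportional to exp(-sum_c x_c * L_c).
    """
    p = len(T)
    weight = [0.0, 0.0]                                   # mass for u_i = 0 and u_i = 1
    for ui in (0, 1):
        for tail in product((0, 1), repeat=p - i - 1):    # undecided u_{i+1}, ...
            x = kernel_ps(list(u_prev) + [ui] + list(tail), T)
            weight[ui] += exp(-sum(xc * Lc for xc, Lc in zip(x, L)))
    return log(weight[0] / weight[1])
```

For $T_2$ this reproduces the familiar rules: $l_0$ is the box-plus of $L_0$ and $L_1$, and $l_1 = (1 - 2u_0) L_0 + L_1$. For instance, kernel_llr(1, [1.0, 2.0], [1], T2) evaluates to approximately 1.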
III. MEMORY MANAGEMENT
Similarly to polar codes, it is possible to describe the SC
decoding process of a multi-kernel polar code using a data flow
graph, depicted in Figure 3 for the code generated by G12.
The data flow graph represents memory dependencies arising
during the decoding, where circles and squares represent
memory needed to store LLRs and PSs, respectively, and
black circles represent channel LLRs. In particular, circles
and squares identify the need for new memory allocation,
while horizontal lines determine the number of time steps for
which the values need to be stored. Thick lines represent LLR
updates, while dotted lines identify operations involving partial
sums, i.e. LLR updates when they merge with thick lines and
PSs updates when they connect squares. The study of the data
flow graph highlights the strong dependencies among data,
along with a repetitive structure in the LLR update functions,
and gives a precise order in the scheduling of the decoding
operations. In general, the hardest constraint for the LLR
update functions is given by the calculation of the necessary
PSs. The memory usage patterns observed in Figure 3 can be found for codes of any length, and can be exploited to develop a memory management framework as follows.
A. Memory Structure
The memory structure of a generic SC decoder for the multi-kernel polar code defined by G12 is presented in Figure 4, along with memory dependencies. We call Λ, Π and Υ the data structures used to store LLRs, PSs and decoded bits, respectively. We define Q as the number of bits assigned to the representation of each internal LLR, while a partial sum and a decoded bit are, by definition, single-bit values. The proposed memory structure relies on the observation made in [10] that memory requirements for polar code decoding decrease as the stage index increases; we show that this phenomenon extends to multi-kernel polar codes.
The memory structure of a multi-kernel polar code decoder depends on the order of the kernels defining the transformation matrix $G_N = T_{p_1} \otimes \ldots \otimes T_{p_s}$, where s is the number of stages, i.e., the number of factors in the Kronecker product. LLRs can be stored in s + 1 vectors $\Lambda_0, \ldots, \Lambda_s$ of Q-bit values with different lengths, i.e., with a different number of elements. The length of vector $\Lambda_s$ is always 1, and it stores the LLR of the currently decoded bit. The length of vector $\Lambda_i$ is given by the product of the last s − i kernel sizes, i.e., $\Lambda_i$ has $p_{i+1} \cdot \ldots \cdot p_s$ entries. PSs are stored in s binary matrices $\Pi_1, \ldots, \Pi_s$ of different width and depth, depending on the decoding stage. The width of $\Pi_i$ is given by the kernel size $p_i$, while its depth is given by the product of the last s − i kernel sizes, i.e., $p_{i+1} \cdot \ldots \cdot p_s$, similarly to the LLR vectors. The first matrix to be updated, $\Pi_s$, is an exception, since it has width $p_s - 1$. This is because the last column of $\Pi_s$ would be updated during the PS update phase of the decoding of the last bit. Since the PS update is executed right after the bit estimation, we skip this last PS update and do not need to store the last column of $\Pi_s$. Finally, the decoded bits are stored in the binary vector Υ of length N.
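The sizes listed above can be tabulated with a few lines of Python. In this sketch, memory_layout is a hypothetical helper that follows this subsection literally, including the reduced width $p_s - 1$ of $\Pi_s$. Kernels are given as the list $[p_1, \ldots, p_s]$.

```python
from math import prod


def memory_layout(kernels):
    """Sizes of Lambda_0..Lambda_s, Pi_1..Pi_s and Upsilon as in Section III-A.

    kernels = [p_1, ..., p_s]; pi_shapes[j - 1] is the (depth, width) of Pi_j.
    """
    s = len(kernels)
    # Lambda_i holds p_{i+1} * ... * p_s LLRs (N for i = 0, 1 for i = s).
    lambda_lengths = [prod(kernels[i:]) for i in range(s + 1)]
    pi_shapes = []
    for j in range(1, s + 1):
        depth = prod(kernels[j:])      # p_{j+1} * ... * p_s
        width = kernels[j - 1]         # p_j
        if j == s:
            width -= 1                 # last column of Pi_s is not stored
        pi_shapes.append((depth, width))
    return lambda_lengths, pi_shapes, prod(kernels)


print(memory_layout([2, 2, 3]))  # ([12, 6, 3, 1], [(6, 2), (3, 2), (1, 2)], 12)
```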
B. Memory Update
Algorithm 1 SC Algorithm
  Initialize Λ, Π, Υ
  for i = 0 ... N − 1 do
    LLR update
    u_i calculation
    PS update
  end for
  return Υ
Algorithm 1 depicts the logical flow of operations required by SC decoding. In this section, we follow its schedule and first describe the update operations for the LLRs, then for the decoded bits u, and finally for the PSs. The memory update operations are performed by the kernel blocks. LLR vector $\Lambda_0$ is initially filled with the N LLRs extracted from the received symbols, while the rest of the memory is initialized to zero; we recall that the LLRs have to be permuted according to $P_1$ before being inserted in $\Lambda_0$. According to the SC algorithm, bits are decoded sequentially, hence some of the memory structures are updated at every bit estimation; this update process is illustrated for the decoding of a generic input bit $u_i$.

Fig. 4: Memory structure for G12 = T2 ⊗ T2 ⊗ T3. White dots represent LLR updates, black dots represent PS updates. Red lines represent B2 blocks, blue lines represent B3 blocks.
LLR update:
The update function (4) to be used in this phase depends on the
index i of the decoded bit ui, and it is selected using the mixed
radix representation of i based on the kernels composing the
transformation matrix of the code.
In a base-p radix system, non-negative integers are represented as a finite sequence of digits smaller than p. A mixed radix system is a non-standard positional numeral system, generalizing the classic radix system, in which the numerical base depends on the digit position. A well-known example of a mixed radix numeral system is the one used to measure time in hours, minutes and seconds. We use the sequence $\langle p_1, \ldots, p_s \rangle$ of the sizes of the kernels constructing the transformation matrix as the base of a finite mixed radix system representing the index i of the decoded bit $u_i$. According to this representation, any integer i < N can be expressed as a vector of s digits $i = b_1^{(i)} \ldots b_s^{(i)}$, with $0 \le b_j^{(i)} < p_j$ and

$$i = b_s^{(i)} + \sum_{j=1}^{s-1} b_j^{(i)} \cdot (p_{j+1} \cdot \ldots \cdot p_s). \qquad (5)$$
The mixed radix representation of the decoded bit indices for $G_{12} = T_2 \otimes T_2 \otimes T_3$ is given by:

i          0  1  2  3  4  5  6  7  8  9  10  11
b_1^(i)    0  0  0  0  0  0  1  1  1  1   1   1
b_2^(i)    0  0  0  1  1  1  0  0  0  1   1   1
b_3^(i)    0  1  2  0  1  2  0  1  2  0   1   2
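The table above can be generated by repeated division, treating $b_s^{(i)}$ as the least significant digit. The following sketch uses hypothetical helper names. The inverse map evaluates Eq. (5) in Horner form.

```python
def mixed_radix(i, kernels):
    """Digits [b_1, ..., b_s] of i in the mixed radix <p_1, ..., p_s> of Eq. (5)."""
    digits = []
    for p in reversed(kernels):      # b_s is the least significant digit
        i, b = divmod(i, p)
        digits.append(b)
    return digits[::-1]


def from_mixed_radix(digits, kernels):
    """Inverse map: Eq. (5) evaluated as ((b_1 p_2 + b_2) p_3 + ...) + b_s."""
    i = 0
    for b, p in zip(digits, kernels):
        i = i * p + b
    return i


for i in range(12):
    print(i, mixed_radix(i, [2, 2, 3]))
```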
The update of the LLR vectors proceeds from right to left in Figure 4. Starting from $\Lambda_1$, all the vectors are updated using the previous LLR vector and the current PS matrix as input; in general, vector $\Lambda_j$ is updated using vector $\Lambda_{j-1}$ and the partial sums stored in $\Pi_j$. The LLR update rule for $\Lambda_j$ is selected using the mixed radix representation of i; more precisely, the rule $f^{p_j}_{b_j^{(i)}}$ is used.
This method is an extension of the method proposed in [10] for the scheduling of the f and g functions in the decoding of polar codes. Each entry $\Lambda_j(k)$ of the LLR vector is calculated as

$$\Lambda_j(k) = f^{p_j}_{b_j^{(i)}}\left(\Lambda_{j-1}(k \cdot p_j), \ldots, \Lambda_{j-1}((k+1) \cdot p_j - 1), \Pi_j(k, 0), \ldots, \Pi_j(k, b_j^{(i)} - 1)\right), \qquad (6)$$

where, consistently with (4), no PS argument is passed when $b_j^{(i)} = 0$. The update operations of a vector $\Lambda_j$ can be run in parallel to reduce latency, using up to $p_{j+1} \cdot \ldots \cdot p_s$ kernel blocks.
Using the proposed LLR update algorithm, s LLR vectors, i.e., from $\Lambda_1$ to $\Lambda_s$, are updated for every decoded bit $u_i$. However, the data flow presented in Figure 3 shows that the number of vectors to be updated actually depends on the index i of the decoded bit. A closer look at the mixed radix representation table reveals the reason for this scheduling: the mixed radix representations of two consecutive numbers differ only from the position of the rightmost nonzero digit of the second number onward. In practice, given $i - 1 = b_1^{(i-1)} \ldots b_s^{(i-1)}$, if there exists an index z such that $b_z^{(i)} \neq 0$ and $b_j^{(i)} = 0$ for all j > z, we have $b_j^{(i)} = b_j^{(i-1)}$ for all j < z. As a consequence, to decode bit $u_i$ it is not necessary to update the vectors $\Lambda_j$ with indices j < z, and the update can start from $\Lambda_z$. Of course, for i = 0 all the vectors have to be updated. This acceleration technique is a generalization of the one proposed in [15] for polar codes; for G12 it halves the number of vector updates, from 3 · 12 = 36 to 18.
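The starting stage z can be found without computing the full mixed radix representation, by stripping the trailing zero digits of i. The following sketch, with hypothetical helper names, does this and counts the LLR vector updates over one codeword. Stages are numbered from 1, and i = 0 returns z = 1 so that every vector is updated.

```python
def first_updated_stage(i, kernels):
    """Stage z (1-based) from which the LLR update for u_i starts.

    z is the position of the rightmost nonzero mixed radix digit of i;
    for i = 0 all vectors Lambda_1..Lambda_s are updated, so z = 1.
    """
    z = len(kernels)
    while z > 1 and i % kernels[z - 1] == 0:
        i //= kernels[z - 1]       # drop a trailing zero digit b_z
        z -= 1
    return z


def llr_vector_updates(kernels):
    """Total number of LLR vector updates when decoding one codeword."""
    N = 1
    for p in kernels:
        N *= p
    s = len(kernels)
    return sum(s - first_updated_stage(i, kernels) + 1 for i in range(N))


print(llr_vector_updates([2, 2, 3]))  # 18, versus 3 * 12 = 36 without skipping
```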
This property allows a further simplification of the LLR update algorithm. We have seen that the LLR update starts from $\Lambda_z$, with z such that $b_z^{(i)} \neq 0$ and $b_j^{(i)} = 0$ for all j > z. This means that vector $\Lambda_z$ is updated using function $f^{p_z}_{b_z^{(i)}}$, while all the following vectors $\Lambda_j$, j > z, are updated using functions of the form $f^{p_j}_0$. Hence, it is only necessary to find the subscript of the first LLR update function, since all the following ones have subscript 0.
u_i estimation:
If i ∈ F, i.e., it belongs to the frozen set, its value is known to be zero, hence Υ(i) = 0. Otherwise, i.e., if i ∉ F, the value of the decoded bit is decided by hard decision on its LLR. After the LLR update phase, the LLR of bit $u_i$ is stored in $\Lambda_s$, as explained above. In our implementation, negative LLRs represent bit 1, while positive LLRs represent bit 0. Through hard decision, we therefore set $\Upsilon(i) = \frac{1 - \mathrm{sgn}(\Lambda_s(0))}{2}$. To sum up,

$$\Upsilon(i) = \begin{cases} 0 & \text{if } i \in \mathcal{F} \\ \frac{1 - \mathrm{sgn}(\Lambda_s(0))}{2} & \text{if } i \notin \mathcal{F} \end{cases} \qquad (7)$$
Algorithm 2 LLR update
  z = s, t = i, n = 1
  while z > 1 and t mod p_z = 0 do          (skip the trailing zero digits of i)
    t = t / p_z
    n = n · p_z
    z = z − 1
  end while
  b = t mod p_z                             (b = b_z^(i); for i = 0, z = 1 and b = 0)
  for k = 0 ... n − 1 do
    Λ_z(k) = f^{p_z}_b(Λ_{z−1}(k·p_z), ..., Λ_{z−1}((k+1)·p_z − 1), Π_z(k, 0), ..., Π_z(k, b − 1))
  end for
  for j = z + 1 ... s do
    n = n / p_j
    for k = 0 ... n − 1 do
      Λ_j(k) = f^{p_j}_0(Λ_{j−1}(k·p_j), ..., Λ_{j−1}((k+1)·p_j − 1))
    end for
  end for

Algorithm 3 u_i calculation
  if i ∈ F then
    Υ(i) = 0
  else
    Υ(i) = (1 − sgn(Λ_s(0)))/2
  end if

PS update:
PS matrices are updated in decreasing order starting from $\Pi_s$. Inside each matrix, the entries are updated per column, in increasing order starting from the first column. When the last column of a matrix is filled, a column of the next matrix is updated. Similarly to LLRs, the update function depends on the mixed radix representation of index i; in particular, the number of matrices to be updated is one plus the number of consecutive digits of the mixed radix representation of i, counting from the last digit, that are equal to the highest symbol admitted by their radix.
The update always starts from the last PS matrix $\Pi_s$, which is a row vector of width $p_s$. The value of the decoded bit $u_i$ is copied in column $b_s^{(i)}$ of the matrix, i.e., $\Pi_s(0, b_s^{(i)}) = \Upsilon(i)$. When $b_s^{(i)} = p_s - 1$, the last column of the matrix has been filled, and column $b_{s-1}^{(i)}$ of matrix $\Pi_{s-1}$ is updated; otherwise, the update process ends. In general, if $b_j^{(i)} = p_j - 1$ for all j > z and $b_z^{(i)} < p_z - 1$, the matrices $\Pi_s, \Pi_{s-1}, \ldots, \Pi_z$ are updated. When the last column of matrix $\Pi_j$ is filled, i.e., when $b_j^{(i)} = p_j - 1$, column $b_{j-1}^{(i)}$ of matrix $\Pi_{j-1}$ has to be updated. In this case, each row of $\Pi_j$ is used to update column $b_{j-1}^{(i)}$ of $\Pi_{j-1}$ as

$$\left[\Pi_{j-1}(k \cdot p_j, b_{j-1}^{(i)}), \ldots, \Pi_{j-1}((k+1) \cdot p_j - 1, b_{j-1}^{(i)})\right] = \left[\Pi_j(k, 0), \ldots, \Pi_j(k, p_j - 1)\right] \cdot T_{p_j}$$

for $k = 0, \ldots, p_{j+1} \cdot \ldots \cdot p_s - 1$. If we call $T_p^{c}$ the vector formed by the c-th column of the kernel matrix $T_p$, the update rule for the PSs can be rewritten as

$$\Pi_{j-1}(k, b_{j-1}^{(i)}) = \Pi_j\left(\left\lfloor k / p_j \right\rfloor, -\right) \cdot T_{p_j}^{c} \qquad (8)$$

for $k = 0, \ldots, p_j \cdot \ldots \cdot p_s - 1$, where $\Pi_j(k, -)$ represents the k-th row of $\Pi_j$ and $c = k \bmod p_j$. As an exception, the PS update step is not executed for the last decoded bit $u_{N-1}$: it would only be needed after the decoding of the last bit, and would therefore be pointless.
Algorithm 4 PS update
  if i = N − 1 then
    return
  end if
  Π_s(0, i mod p_s) = Υ(i)
  n = 1
  for j = s − 1 ... 1 do
    if (i + 1) mod p_{j+1} ≠ 0 then
      return
    end if
    i = (i + 1)/p_{j+1} − 1                 (index of the completed block at stage j)
    n = n · p_{j+1}
    b = i mod p_j
    for k = 0 ... n − 1 do
      c = k mod p_{j+1}
      Π_j(k, b) = Π_{j+1}(⌊k/p_{j+1}⌋, −) · T^c_{p_{j+1}}
    end for
  end for
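The following Python sketch checks how Algorithms 1 to 4 fit together by running them on the Λ/Π/Υ memory of this section for kernels $T_2$ and $T_3$. It rests on several assumptions. The functions f are min-sum versions of the update rules for $T_2$ and $T_3$, derived by hand rather than taken from [16]. $\Pi_s$ is kept at its full width $p_s$, as in the PS update description. lambda0_order is our own derivation of the input permutation $P_1$ for this indexing, and the permutations of [4] may be labeled differently. encode and noiseless_llrs are hypothetical test helpers.

```python
from math import prod

# Kernels of Eq. (2) as lists of rows; a block computes x_c = sum_r u_r T_p[r][c] mod 2.
T = {2: [[1, 0], [1, 1]], 3: [[1, 1, 1], [1, 0, 1], [0, 1, 1]]}

def minsum(*v):
    """Min-sum approximation of the box-plus of several LLRs."""
    sign = -1 if sum(x < 0 for x in v) % 2 else 1
    return sign * min(abs(x) for x in v)

def f(p, b, L, u):
    """Min-sum f^p_b(L_0..L_{p-1}, u_0..u_{b-1}) for T2 and T3 (hand-derived)."""
    s0 = 1 - 2 * u[0] if b > 0 else 1                     # sign imposed by u_0
    if p == 2:
        return minsum(L[0], L[1]) if b == 0 else s0 * L[0] + L[1]
    if b == 0:
        return minsum(L[0], L[1], L[2])                   # u_0 = x_0 + x_1 + x_2
    if b == 1:
        return s0 * L[0] + minsum(L[1], L[2])             # u_1 = x_0 + u_0 = x_1 + x_2
    return s0 * L[1] + (1 - 2 * (u[0] ^ u[1])) * L[2]     # u_2 = x_1 + u_0 = x_2 + u_0 + u_1

def kernel_ps(row, p, c):
    """Eq. (3): partial sum row . T_p^c, using the c-th column of T_p."""
    return sum(row[r] & T[p][r][c] for r in range(p)) % 2

def sc_decode(llr, kernels, frozen):
    """Algorithms 1-4 on the Lambda / Pi / Upsilon memory of Section III.
    llr must be in Lambda_0 order; Pi_s is kept at its full width p_s."""
    s, N = len(kernels), prod(kernels)
    p = [None] + list(kernels)                            # 1-based: p[1..s]
    depth = [prod(kernels[j:]) for j in range(s + 1)]     # p_{j+1} * ... * p_s
    Lam = [list(llr)] + [[0.0] * depth[j] for j in range(1, s + 1)]
    Pi = [None] + [[[0] * p[j] for _ in range(depth[j])] for j in range(1, s + 1)]
    Y = [0] * N
    for i in range(N):
        # Algorithm 2: drop trailing zero digits of i to find the first stage z.
        z, t = s, i
        while z > 1 and t % p[z] == 0:
            t //= p[z]
            z -= 1
        for j in range(z, s + 1):
            b = t % p[z] if j == z else 0
            for k in range(depth[j]):
                L = Lam[j - 1][k * p[j]:(k + 1) * p[j]]
                Lam[j][k] = f(p[j], b, L, Pi[j][k][:b])
        # Algorithm 3: hard decision, a negative LLR gives bit 1.
        Y[i] = 0 if i in frozen else int(Lam[s][0] < 0)
        if i == N - 1:
            break                                         # last PS update is skipped
        # Algorithm 4: store u_i in Pi_s, then propagate every completed matrix.
        Pi[s][0][i % p[s]] = Y[i]
        t = i
        for j in range(s - 1, 0, -1):
            if (t + 1) % p[j + 1] != 0:
                break
            t = (t + 1) // p[j + 1] - 1
            b = t % p[j]
            for k in range(depth[j]):
                Pi[j][k][b] = kernel_ps(Pi[j + 1][k // p[j + 1]], p[j + 1], k % p[j + 1])
    return Y

def encode(u, kernels):
    """x = u . G_N in natural order, with G_N built as in Eq. (1)."""
    G = [[1]]
    for p in kernels:
        G = [[a * b for a in ra for b in rb] for ra in G for rb in T[p]]
    return [sum(u[r] & G[r][c] for r in range(len(u))) % 2 for c in range(len(u))]

def lambda0_order(kernels):
    """pi with Lambda_0(m) = LLR of x[pi[m]]; our stand-in for P_1 (assumption)."""
    if len(kernels) == 1:
        return list(range(kernels[0]))
    rest = lambda0_order(kernels[1:])
    return [t * len(rest) + rest[k] for k in range(len(rest)) for t in range(kernels[0])]

def noiseless_llrs(u, kernels):
    """Noise-free LLRs (+1 for bit 0, -1 for bit 1) of u's codeword, in Lambda_0 order."""
    x = encode(u, kernels)
    return [1.0 - 2.0 * x[m] for m in lambda0_order(kernels)]
```

With noise-free LLRs, every term entering an f function agrees in sign with the correct bit. For example, sc_decode(noiseless_llrs(u, [2, 2, 3]), [2, 2, 3], {0, 1, 2}) returns u for any u with $u_0 = u_1 = u_2 = 0$. With noisy LLRs, the output is simply the (min-sum) SC decision.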
IV. ANALYSIS AND CONCLUSIONS
The proposed memory structure limits the memory requirements of a multi-kernel polar decoder. In fact, a naïve memory management of the SC decoder for a multi-kernel polar code with transformation matrix $G_N = T_{p_1} \otimes \ldots \otimes T_{p_s}$ requires storing all the LLRs and PSs depicted in the Tanner graph of the code. As a consequence, $M_{LLR} = N \cdot (s+1)$ LLRs and $M_{PS} = N \cdot s$ PSs, with $N = p_1 \cdot \ldots \cdot p_s$, have to be stored, with space complexity O(sN). The memory requirement hence depends linearly on both the code length N and the number of kernels s.
In the proposed memory structure, every LLR vector $\Lambda_i$ with $i \geq 1$ stores $\frac{N}{p_1 \cdot \ldots \cdot p_i}$ LLRs, while the first vector $\Lambda_0$ stores the N LLRs derived from the received signal. In total, for the proposed memory framework

$$M_{LLR}^{prop} = N + \frac{N}{p_1} + \frac{N}{p_1 p_2} + \ldots + 1 = (\ldots((p_1 + 1) \cdot p_2 + 1) \cdot \ldots) \cdot p_s + 1 \qquad (9)$$

LLRs have to be stored. Similarly, every PS matrix $\Pi_i$ with i > 1 stores $\frac{N}{p_1 \cdot \ldots \cdot p_i} \cdot p_i = \frac{N}{p_1 \cdot \ldots \cdot p_{i-1}}$ partial sums, while $\Pi_1$ stores $\frac{N}{p_1} \cdot (p_1 - 1)$ PSs. Then, the total number of PSs is

$$M_{PS}^{prop} = \frac{N}{p_1} \cdot (p_1 - 1) + \frac{N}{p_1} + \frac{N}{p_1 p_2} + \ldots + p_s = (\ldots((p_1 \cdot p_2 + 1) \cdot p_3 + 1) \cdot \ldots) \cdot p_s. \qquad (10)$$

Since every kernel has size $p_i \geq 2$, we have $N \le M_{PS}^{prop} < M_{LLR}^{prop} \le N(1 + 1/2 + \ldots + 2^{-s}) < 2N$, hence the space complexity for both LLRs and PSs is reduced to O(N). A comparison between the memory requirements of the proposed memory structure and of the naïve one, for codes involving only kernels of sizes 2 and 3, is presented here:
N                    12    72    144    384    972
M_LLR (proposed)     22   139    283    766   1822
M_LLR (naïve)        48   432   1008   3456   7776
M_PS (proposed)      15   102    210    573   1335
M_PS (naïve)         36   360    864   3072   6804
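The nested forms of (9) and (10) are easy to evaluate with Horner-style loops. The text only states that kernels of sizes 2 and 3 are used, so the kernel orderings in ORDERS below are our assumption. With the size-2 kernels placed first, these hypothetical helpers reproduce every entry of the table.

```python
def llr_memory(kernels):
    """Eq. (9) in nested form: (...((p1 + 1) p2 + 1) ...) ps + 1."""
    acc = kernels[0]
    for p in kernels[1:]:
        acc = (acc + 1) * p
    return acc + 1


def ps_memory(kernels):
    """Eq. (10) in nested form: (...((p1 p2 + 1) p3 + 1) ...) ps; needs s >= 2."""
    acc = kernels[0] * kernels[1]
    for p in kernels[2:]:
        acc = (acc + 1) * p
    return acc


def naive_memory(kernels):
    """(M_LLR, M_PS) = (N (s + 1), N s) for storing the whole Tanner graph."""
    N, s = 1, len(kernels)
    for p in kernels:
        N *= p
    return N * (s + 1), N * s


# Assumed kernel orders (size-2 kernels first) for the code lengths of the table.
ORDERS = {12: [2, 2, 3], 72: [2, 2, 2, 3, 3], 144: [2, 2, 2, 2, 3, 3],
          384: [2] * 7 + [3], 972: [2, 2, 3, 3, 3, 3, 3]}

for N, ks in ORDERS.items():
    llr_naive, ps_naive = naive_memory(ks)
    print(N, llr_memory(ks), llr_naive, ps_memory(ks), ps_naive)
```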
The memory requirement reduction enabled by the proposed memory structure is substantial: for N = 972, the LLR memory drops from 7776 to 1822 values and the PS memory from 6804 to 1335 bits. This shows that multi-kernel polar codes can be a valid alternative to punctured polar codes in terms of memory complexity. Given the similarities between polar codes and multi-kernel polar codes, it is straightforward to apply the proposed memory structure to list or simplified SC decoders. Finally, the proposed implementation can easily be transposed to hardware, reducing the complexity of dedicated ASIC or FPGA architectures.
REFERENCES
[1] E. Arikan, “Channel polarization: a method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,”
IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–
3073, July 2009.
[2] G. Liva, L. Gaudio, T. Ninacs, and T. Jerkovits, “Code design for short
blocks: A survey,” in arXiv preprint, arXiv:1610.00873, Oct. 2016.
[3] V. Bioglio, F. Gabry, and I. Land, “Low-complexity puncturing and
shortening of polar codes,” in IEEE Wireless Communications and
Networking Conference (WCNC), San Francisco, CA, USA, March
2017.
[4] F. Gabry, V. Bioglio, I. Land, and J.-C. Belfiore, “Multi-kernel
construction of polar codes,” in IEEE International Conference on
Communications (ICC), Paris, France, May 2017.
[5] B. Le Gal, C. Leroux, and C. Jego, “Software polar decoder on an
embedded processor,” in IEEE Workshop on Signal Processing Systems
(SiPS), Belfast, UK, Oct. 2014.
[6] Y. Shen, C. Zhang, J. Yang, S. Zhang, and X. You, “Low-latency
software successive cancellation list polar decoder using stage-located
copy,” in IEEE International Conference on Digital Signal Processing
(DSP), Beijing, China, Oct. 2016.
[7] C. Leroux, I. Tal, A. Vardy, and W. J. Gross, “Hardware architectures for
successive cancellation decoding of polar codes,” in Acoustics, Speech
and Signal Processing (ICASSP), 2011 IEEE International Conference
on, Prague, Czech Republic, May 2011.
[8] S. A. Hashemi, C. Condo, F. Ercan, and W. J. Gross, “Memory-efficient
polar decoders,” IEEE Journal on Emerging and Selected Topics in
Circuits and Systems, vol. 7, no. 4, pp. 604–615, Dec. 2017.
[9] F. Ercan, C. Condo, S. A. Hashemi, and W. J. Gross, “On error-
correction performance and implementation of polar code list decoders
for 5G,” in Allerton Conference on Communication, Control, and
Computing, Monticello, IL, USA, Oct. 2017.
[10] C. Leroux, A. J. Raymond, G. Sarkis, and W. J. Gross, “A semi-parallel
successive-cancellation decoder for polar codes,” IEEE Transactions on
Signal Processing, vol. 61, no. 2, pp. 289–299, 2013.
[11] A. Balatsoukas-Stimming, A. J. Raymond, W. J. Gross, and A. Burg,
“Hardware architecture for list successive cancellation decoding of polar
codes,” IEEE Transactions on Circuits and Systems II: Express Briefs,
vol. 61, no. 8, pp. 609–613, 2014.
[12] G. Coppolino, C. Condo, G. Masera, and W. J. Gross, “A multi-kernel
multi-code polar decoder architecture,” IEEE Transactions on Circuits
and Systems I: Regular Papers, pp. 1–10, 2018.
[13] N. Presman, O. Shapira, S. Litsyn, T. Etzion, and A. Vardy, “Binary
polarization kernels from code decompositions,” IEEE Transactions on
Information Theory, vol. 61, no. 5, pp. 2227–2239, May 2015.
[14] V. Bioglio, F. Gabry, I. Land, and J.-C. Belfiore, “Minimum-distance
based construction of multi-kernel polar codes,” in IEEE Global
Communications Conference (GLOBECOM), Singapore, Dec. 2017.
[15] A. Balatsoukas-Stimming, M. Bastani Parizi, and A. Burg, “LLR-
based successive cancellation list decoding of polar codes,” in IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), Florence, Italy, May 2014.
[16] V. Bioglio and I. Land, “On the marginalization of polarizing kernels,”
in International Symposium on Turbo Codes & Iterative Information
Processing (ISTC), Hong Kong, Dec. 2018.
