A 5.16Gbps decoder ASIC for Polar Code in 16nm FinFET by Liu, Xiaocheng et al.
arXiv:1807.01451v1 [cs.IT] 4 Jul 2018
A 5.16Gbps decoder ASIC for Polar Code
in 16nm FinFET
Xiaocheng Liu, Qifan Zhang, Pengcheng Qiu, Jiajie Tong, Huazi Zhang, Changyong Zhao, Jun Wang
Huawei Technologies Co. Ltd.
Email: {liuxiaocheng, Qifan.Zhang, qiupengcheng, justin.wangjun}@huawei.com
Abstract—Polar codes have been selected for the 5G standard.
However, only a couple of ASIC decoders have been fabricated,
and none of them supports list size L > 4 and code length
N > 1024. This paper presents an ASIC implementation of three
decoders for polar codes: a successive cancellation (SC) decoder,
a flexible decoder and an ultra-reliable decoder. These decoders
are all SC-based, supporting list sizes up to 1, 8 and 32 and code
lengths up to $2^{15}$, $2^{14}$ and $2^{11}$, respectively. The chip
is fabricated in a 16nm TSMC FinFET technology and can be clocked
at 1 GHz. Optimization techniques are proposed and employed to
increase throughput. Experimental results show that the throughput
reaches up to 5.16 Gbps. Compared with fabricated ASIC decoders
and synthesized decoders in the literature, the flexible decoder
achieves higher area efficiency.
Index Terms—Polar code, ASIC, decoding, SCL.
I. INTRODUCTION
Polar codes, proposed by Arikan [1], have been selected
for the 5G standard. Although polar codes with successive-cancellation
(SC) decoding are proved to achieve channel capacity in the
asymptotic sense, their error-correction performance
is inferior to that of low-density parity-check (LDPC) or Turbo
codes at short or moderate lengths. SC list (SCL) decoding,
regarded as the most efficient decoding algorithm for polar
codes, improves the error-correction performance but suffers
from high latency and low throughput due to the serial nature
of SC. Much effort has been made to optimize the decoding
of polar codes [2]–[9]. However, most works lack an ASIC
implementation and thus bear less practical relevance.
A couple of ASIC decoders have been fabricated [10]–[12].
The chip presented in [10] implements the SC decoding
algorithm; the chip presented in [11] implements the
belief-propagation decoding algorithm. Both suffer from
mediocre error-correction performance. The chip presented in
[12] implements SCL decoding, but is limited to a largest list
size $L_{max} = 4$ and a largest code length $N_{max} = 1024$, which
limits its application scope.
A. Motivation and Contribution
This work is motivated by the desire to provide an ASIC
decoder that supports polar code research and speeds up
prototype building of 5G communication systems. The ASIC
decoder should have low latency and high error-correction
performance, and support a wide range of list sizes and code
lengths. To satisfy all the desired properties, we integrated
three decoders in one chip: an SC decoder, a flexible decoder and
an ultra-reliable decoder.
• The SC decoder is designed for low latency and long code
length with $N = 2^{15}$;
• The flexible decoder is an SCL decoder with $N_{max} = 2^{14}$ and
$L_{max} = 8$. The list size of the flexible decoder can be
configured during runtime;
• The ultra-reliable decoder is also an SCL decoder, designed for
ultra-reliable scenarios with largest code length $N_{max} = 2^{11}$
and list size $L = 32$.
All the decoders support any code rate. This is the first ASIC
implementation of an SCL decoder supporting L > 4 and N > 1024.
To improve throughput, several optimization techniques are
proposed. We propose a new storage method for internal
log-likelihood ratio (LLR) messages which saves 86% of
the internal LLR memory. A serial list processing architecture
is proposed to avoid the LLR crossbar. This reduces
resource usage and improves timing performance. To improve the
utilization ratio of the processing elements (PEs), we propose to
decode two packages simultaneously, which improves throughput
by 54%. We also recover the decoded bits from the partial sums to
save memory.
B. Outline
The rest of this paper is organized as follows. Section II
gives a brief review of polar codes and SC-based decoding
algorithms. The proposed ASIC architectures and optimization
techniques are presented in Section III. Section IV presents
the implementation results and comparison with state-of-the-
art works. Section V concludes the paper.
II. POLAR CODE
An $(N, k)$ polar code has a code length $N$ and $k$ information
bits. The code rate $R$ is defined by $R = k/N$. The information
bits are assigned to the $k$ most reliable sub-channels, and
the remaining sub-channels are set to a pre-defined value,
typically zero; these are called frozen bits. The encoding of a
polar code can be defined as $c = uG$, where $u$ is the source
vector and $G$ is the generator matrix, defined as
$G \triangleq F^{\otimes n}$, where
$F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$
is the kernel, $\otimes n$ denotes the $n$-th Kronecker power, and
$n = \log_2 N$.
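As a rough illustration of the encoding rule c = uG, the following Python sketch builds G = F^{⊗n} by repeated Kronecker products and multiplies over GF(2). The helper names (kron_gf2, generator_matrix, polar_encode) are illustrative and do not come from the paper; no bit-reversal permutation is applied, in line with the definition above.

```python
F = [[1, 0],
     [1, 1]]  # 2x2 polar kernel


def kron_gf2(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def generator_matrix(n):
    """Return G = F^{(x)n}, an N x N matrix with N = 2**n."""
    g = [[1]]
    for _ in range(n):
        g = kron_gf2(g, F)
    return g


def polar_encode(u):
    """Compute c = u * G over GF(2) for a source vector u of length 2**n."""
    size = len(u)
    n = size.bit_length() - 1
    assert size == 1 << n, "length must be a power of two"
    g = generator_matrix(n)
    return [sum(u[i] * g[i][j] for i in range(size)) % 2
            for j in range(size)]
```

For N = 4, polar_encode([0, 0, 0, 1]) returns the last row of G, [1, 1, 1, 1]. The dense matrix form is only meant to mirror the formula; a practical encoder would use XOR butterflies instead.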
A. SC-based Decoders
The decoding graph of SC decoder is shown in Fig. 1.
The soft values propagate from right to left and the hard
bits propagate from left to right. The information vector u is
decoded sequentially from top to bottom.
[Fig. 1. Decoding graph, drawn for N = 8: f and g nodes in stages 0 to 2, channel LLRs y0 to y7, and estimated values û0 to û7.]
A hardware-friendly version of soft value updating is carried
out in the log-likelihood ratio (LLR) domain. Two incoming LLRs
($L_{in1}$ and $L_{in2}$) are combined to produce $L_{out}$ with the
following f-function
$$L_{out} = \mathrm{sign}(L_{in1} \cdot L_{in2}) \cdot \min(|L_{in1}|, |L_{in2}|) \tag{1}$$
or g-function
$$L_{out} = L_{in1} + (-1)^{\hat{s}} \cdot L_{in2}, \tag{2}$$
where $\hat{s}$ is called the partial sum (PS). For an SCL decoder, the
decoding process is similar to that of the SC decoder except that
it keeps $L$ paths. When making a hard decision for each bit, the
$L$ paths split into $2L$ paths, and the $L$ paths with the smallest
path metric (PM) are kept. For list path $l$ and bit $u_i$, the LLR
of stage 0 is denoted as $L_{0,i}^{l}$ and its hard decision is
denoted as $\beta_{0,i}^{l}$. The PM updates according to
$$PM_i^{l} = \begin{cases} PM_{i-1}^{l}, & \text{if } u_i^{l} = \beta_{0,i}^{l} \\ PM_{i-1}^{l} + |L_{0,i}^{l}|, & \text{otherwise} \end{cases} \tag{3}$$
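The update rules (1) to (3) and the split-and-prune step can be sketched in a few lines of Python. This is a minimal software model, not the chip's datapath. It assumes that a non-negative LLR maps to the hard decision 0, and all function names are hypothetical.

```python
def f_func(l_in1, l_in2):
    """f-function of (1): sign of the product times the smaller magnitude."""
    sign = -1 if (l_in1 < 0) != (l_in2 < 0) else 1
    return sign * min(abs(l_in1), abs(l_in2))


def g_func(l_in1, l_in2, s_hat):
    """g-function as written in (2); s_hat is the partial sum (0 or 1)."""
    return l_in1 + (-1) ** s_hat * l_in2


def hard_decision(llr):
    """Assumed convention: non-negative LLR -> bit 0, negative -> bit 1."""
    return 0 if llr >= 0 else 1


def update_pm(pm_prev, llr, u):
    """Path metric update of (3): penalize a bit that contradicts the LLR."""
    return pm_prev if u == hard_decision(llr) else pm_prev + abs(llr)


def split_and_prune(paths, llrs, list_size):
    """Split every (pm, bits) path into two and keep the list_size best."""
    candidates = []
    for (pm, bits), llr in zip(paths, llrs):
        for u in (0, 1):
            candidates.append((update_pm(pm, llr, u), bits + [u]))
    candidates.sort(key=lambda c: c[0])  # smallest PM first
    return candidates[:list_size]
```

Starting from a single empty path with stage-0 LLR 4, split_and_prune([(0, [])], [4], 2) keeps both children: the one agreeing with the LLR at PM 0 and the one contradicting it at PM 4.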
After all bits are decoded, the path with the smallest PM
is selected as the decoding output. To further improve
error-correction performance, concatenated polar codes have been
proposed. For cyclic redundancy check (CRC) aided SCL (CA-SCL)
[13], the most reliable path that passes the CRC is selected
as the decoding output. For parity-check SCL (PC-SCL) [14],
each parity bit is decided by its parity function rather than by
the LLR.
III. ARCHITECTURE
An overview of our ASIC design is shown in Fig. 2. It
mainly comprises six units: the SC decoder, flexible decoder,
ultra-reliable decoder, de-frozen unit, code construction unit and
scheduler. Five flexible decoders are integrated in the chip
to achieve high throughput. The flexible decoder and
ultra-reliable decoder can be configured as SCL, CA-SCL or PC-SCL
during runtime. The code construction unit generates the
frozen bit set for all three decoders. This avoids
transmission of the frozen bit set and supports any code rate. The
de-frozen unit is responsible for removing the frozen bits from
the source vector. The data flow is managed by the input scheduler
and output scheduler.
/GHF/GHFFlexible
decoder
Ultra-reliable
decoder
SC
decoder De-frozen
In_scheduler Out_scheduler
Data in
CMU
Core clk
ref_clk
construction
LVDS clk
x5
LVDS
rceiver
LVDS
sender
Data out
SPI_bus
Fig. 2. The architecture of the chip.
[Fig. 3. The architecture of flexible decoder. Blocks shown: channel LLR memory, frozen bit memory, good bit memory, internal LLR memory, serial unit (128, 64 and 32 PEs), LLR registers (32 DFFs), 4-bit PM/PS parallel unit (16, 8 and 4 PEs, maximum likelihood, x8), sorting, mux, CRC/parity check, PM memory, PS memory, decoded bits out.]
There are four clock domains in the chip. All six
units are clocked by “core clk”, which is generated by the clock
management unit (CMU). The CMU also generates “LVDS
clk” for the LVDS (Low Voltage Differential Signaling) sender.
The LVDS receiver and the SPI (Serial Peripheral Interface) bus are
clocked by external input clock.
The LLRs are represented in sign-and-magnitude form as
in [10]. We denote $Q_i$ and $Q_c$ as the number of bits used to
represent the internal LLR and the channel LLR, respectively. In
our ASIC design, we set $Q_c = 6$ for all decoders. We denote
$Q_{sort}$ and $Q_{PM}$ as the number of bits used to represent the PM
in the metric sorter and the PM in memory, respectively. After
sorting, the minimum PM is subtracted from the PM of all list paths
and the quantization of the PM is reduced from $Q_{sort}$ to
$Q_{PM}$. Therefore, $Q_{PM}$
of our ASIC is smaller than that of [12].
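The PM normalization described above can be sketched as follows. The paper does not state how values that still exceed the $Q_{PM}$ range are handled, so the saturation used here is an assumption, and normalize_pms is a hypothetical name.

```python
def normalize_pms(sorted_pms, q_pm):
    """Subtract the minimum PM and clip to q_pm unsigned bits.

    sorted_pms must be in ascending order, as delivered by the sorter.
    Saturating at 2**q_pm - 1 is an assumption, not stated in the paper.
    """
    base = sorted_pms[0]           # smallest PM after sorting
    max_val = (1 << q_pm) - 1      # largest value storable in q_pm bits
    return [min(pm - base, max_val) for pm in sorted_pms]
```

With 7-bit sorter values [70, 75, 100, 127] and $Q_{PM} = 6$, the stored values become [0, 5, 30, 57]. Only the differences between path metrics matter for the next sorting step, which is why the common offset can be dropped.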
A. Flexible decoder
The flexible decoder supports variable list size and code length,
with upper limits $L_{max} = 8$ and $N_{max} = 2^{14}$. The architecture
of the flexible decoder is shown in Fig. 3. The channel LLR memory
stores the received channel LLRs. The internal LLR memory
stores the LLRs generated during the decoding process. Good
bits are information bits with higher reliability. They are stored
and used to reduce path splitting as in [8]. In this decoder, we
set $Q_i = 6$, $Q_{sort} = 7$, and $Q_{PM} = 6$ to preserve the same
error performance as a floating-point decoder.
The PEs are capable of performing the f and g functions. If the
parallelism of an f/g node is larger than 16, LLR processing is
executed in the serial unit. Otherwise, it is executed in the
parallel unit. Four bits are decoded simultaneously in the parallel
unit by employing multi-bit decision [4]. Up to 32 rate-zero nodes
[2] and rate-one nodes [2] in which all bits are good bits
are also decoded simultaneously in the parallel unit. Moreover,
decoding starts from the first non-frozen bit as in [12].
We propose optimization techniques to improve throughput.
They are employed in our decoders and presented below.
1) LLR Memory Reduction: In a decoder chip, the internal
LLR memory takes up most of the total core area. The $L_{out}$
of each stage must be stored and will be reused, as shown in
Fig. 1. The memory size for the internal LLRs is
$$\mathrm{MEM} = L \times Q_i \times \sum_{i=0}^{n-1} 2^i = L Q_i (N - 1). \tag{4}$$
In the flexible decoder, we only save the internal LLRs of one
stage out of every three neighboring stages. The LLRs between
these stages can be re-calculated on the fly from the stored LLRs.
Therefore, the memory size for the internal LLRs is reduced to
$$\mathrm{MEM} = L \times Q_i \times \sum_{i=n/3-3}^{n/3-1} 2^{3i} \approx 0.14 \times L Q_i (N - 1). \tag{5}$$
We can see that almost 86% of the internal LLR memory is
saved. To compensate for the latency introduced by LLR
re-calculation, more PEs are utilized.
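The 0.14 factor in (5) can be checked numerically. The sketch below evaluates (4) and (5) with the summation limits as printed. It uses n = 15 purely as an illustrative value that is divisible by 3 (the flexible decoder itself has n = 14), and the function names are hypothetical.

```python
def full_llr_memory(n, list_size, q_i):
    """Internal LLR memory in bits when every stage is stored, as in (4)."""
    return list_size * q_i * sum(2 ** i for i in range(n))


def reduced_llr_memory(n, list_size, q_i):
    """Internal LLR memory in bits per (5), limits taken as printed."""
    assert n % 3 == 0 and n >= 9, "sketch assumes n divisible by 3"
    lo, hi = n // 3 - 3, n // 3 - 1
    return list_size * q_i * sum(2 ** (3 * i) for i in range(lo, hi + 1))


n, list_size, q_i = 15, 8, 6   # illustrative values, not a chip configuration
ratio = reduced_llr_memory(n, list_size, q_i) / full_llr_memory(n, list_size, q_i)
print(f"remaining fraction: {ratio:.4f}, saved: {1 - ratio:.1%}")
```

For these values the remaining fraction is about 0.143, consistent with the roughly 86% saving quoted in the text.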
2) Serial List Processing: We propose a serial list
processing architecture, which executes the LLR processing of
different list paths serially. As far as we know, all hardware
architectures in the literature [3]–[5], [9] contain $L$ SC decoder
cores and execute the LLR processing of different list paths in
parallel. Due to the LLR exchange among paths, this architecture
requires an LLR crossbar. The crossbar contains $L$ $L$-to-1
multiplexers, with complexity growing in proportion to $L^2$.
Our serial list processing architecture processes the LLRs of
different list paths one by one and thus does not require a
crossbar. The LLR exchange among list paths can be implemented by
exchanging memory addresses. Compared with the parallel
architecture, the serial architecture introduces no extra latency
when the number of PEs is the same. However, the complexity is
reduced and the timing performance improves significantly,
especially when the list size and the number of PEs are large.
At stage $t$, only $2^t$ f/g functions need to be executed, as
shown in Fig. 1. Therefore, the large number of PEs in the serial
unit cannot be fully used when $t \le 4$. For these stages, we
apply the parallel unit to decrease the latency.
3) Double-Package Mode: Due to the serial nature of SC
decoding, the PEs are idle during the PM sorting period. The
sorting latency is comparable with the f/g execution latency when
the list size is large and multi-bit decoding is used. To improve
the utilization ratio of the PEs, we apply a double-package mode in
which two packages are decoded simultaneously. While the PMs of one
package are being sorted, the PEs are used to execute the f/g
functions for the other package.
We define the decoding time as $T$ when only one package is
decoded in the decoder. It requires less than $1.3T$ to decode
two packages under double-package mode. This can improve
the throughput by 54%.
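The 54% figure follows directly from the stated bound. As a quick check, assume both packages carry $k$ information bits and take the double-package decoding time at its quoted upper bound of $1.3T$:

```latex
\frac{\text{T/P}_{\text{double}}}{\text{T/P}_{\text{single}}}
  = \frac{2k / (1.3\,T)}{k / T}
  = \frac{2}{1.3}
  \approx 1.54
```

A ratio of about 1.54 corresponds to the reported 54% improvement.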
4) Decoded-bit Recovery: In general, independent memories
for $u$ and the PS are required in a decoder. Their sizes are both
$LN$ bits [3]. However, the memory for $u$ is not necessary since
$u$ is only required when the decoded bits are sent out.
Proposition 1: For an SCL decoder, $u$ can be recovered from
the PS after all bits are decoded.
Proof: We denote $\hat{S}_t$ as the stored PS vector of stage $t$,
and $u_a^b$ as the sub-vector $(u_a, \ldots, u_b)$.
Owing to the serial nature of SC, only $2^t$ elements of $\hat{S}_t$
are updated simultaneously and need to be stored. After all bits
are decoded, the stored PS is located in the lower-right triangle
of the decoding graph (e.g., the green PS in Fig. 1). We can infer
that the final stored
$$\hat{S}_t = u_{N-2^{(t+1)}}^{N-2^t-1} \cdot G_t, \tag{6}$$
where $G_t = F^{\otimes t}$. A characteristic of the generator matrix
$G$ is $G = G^{-1}$. So, we can deduce that
$$u_{N-2^{(t+1)}}^{N-2^t-1} = \hat{S}_t \cdot G_t^{-1} = \hat{S}_t \cdot G_t. \tag{7}$$
This can be seen as a polar encoding of $\hat{S}_t$. According to (7),
$u_0^{N-2}$ can be obtained. $u_{N-1}$ can be obtained when the last
bit is decoded. Thus, $u$ can be recovered after all bits are decoded.
The polar encoding can be implemented by bitwise XOR. It
takes much less chip area than an $LN$-bit memory. Furthermore,
the encoding of the PS can be executed in parallel with the decoding
of the next package and therefore has no effect on throughput.
We thus use an $LN$-bit memory to store the PS and an $N$-bit memory
to store the recovered $u$. Compared with the general method, we
save $(L-1)N$ bits of memory.
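The key step of the proof is that the polar transform is its own inverse over GF(2), so re-encoding a stored PS segment returns the corresponding segment of u. A minimal sketch with XOR butterflies follows; polar_transform is a hypothetical name, and the segment is a toy example rather than data from the chip.

```python
def polar_transform(bits):
    """Return bits * F^{(x)t} over GF(2) using XOR butterflies.

    len(bits) must be a power of two (2**t).
    """
    x = list(bits)
    size = len(x)
    step = 1
    while step < size:
        for start in range(0, size, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]   # upper branch absorbs the lower branch
        step *= 2
    return x


# Toy example: a 4-bit segment of u and the partial sums stored for it.
u_segment = [1, 0, 1, 1]
stored_ps = polar_transform(u_segment)   # S_t = u * G_t, as in (6)
recovered = polar_transform(stored_ps)   # u = S_t * G_t, as in (7)
print(stored_ps, recovered)
```

Here stored_ps is [1, 1, 0, 1] and recovered equals the original segment [1, 0, 1, 1], illustrating why only XOR logic is needed for the recovery.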
B. SC Decoder
The SC decoder is a simplified version of the flexible decoder
without path metric management. The SC decoder exploits the
SSC decoding algorithm [2] and supports code length
$N = 2^{15}$. Due to the long code length, we set $Q_i = 7$ to avoid
error-correction performance loss.
C. Ultra-reliable Decoder
Aiming at $L = 32$, we design an ultra-reliable decoder
based on the flexible decoder. In this decoder, $Q_{sort}$ and $Q_{PM}$
are the same as those in the flexible decoder, with $Q_i = 7$ at
stage 0 and $Q_i = 6$ at the other stages. The main differences
between the ultra-reliable decoder and the flexible decoder are
listed below:
1) Serial List Processing: There is a semi-parallel unit
besides the serial unit and parallel unit in the ultra-reliable
decoder. The number of PEs is constrained by the number of
dependent nodes and cannot be increased arbitrarily to decrease
the latency. Due to the large list size and the number of PEs, the
latency of serial processing decreases the throughput
significantly at some stages. Therefore, four list paths of
stages 4 ∼ 3 are executed in parallel. The logic for these stages
is called the semi-parallel unit.
2) LLR Memory Reduction: A larger list size requires a larger
memory for the internal LLRs according to (4). Therefore, we
save the internal LLRs of one stage out of every 4 neighboring
stages to save more memory in the ultra-reliable decoder. However,
four LLR copies of stage 5 need to be stored to support the
semi-parallel unit and increase the throughput. In total, almost
87% of the internal LLR memory is saved compared with (4).
[Fig. 4. The micrograph of the decoder ASIC. Labeled blocks: five flexible decoders, ultra-reliable decoder, SC decoder, I/O buffer & scheduler.]
[Fig. 5. The photograph of the decoder ASIC.]
3) Multi-bit Parallel Processing: Multi-bit decision is also
adopted in the ultra-reliable decoder. However, only 2 bits and up
to 4 rate-0/rate-1 nodes can be decoded simultaneously.
4) Double-Package Mode: To save memory, double-package mode
is not supported.
IV. IMPLEMENTATION RESULTS AND MEASUREMENT
This ASIC is fabricated in a 16nm TSMC FinFET technology.
The chip area is 6 mm² with $f_{clk}$ = 1 GHz, where $f_{clk}$
is the highest frequency of “core clk”. The micrograph and
photograph of the chip are shown in Fig. 4 and Fig. 5.
A. Measurement Setup
To test the decoder ASIC, we designed a printed circuit
board (PCB) which integrates the decoder ASIC and a Xilinx
xc7vx690t FPGA. The PCB can be inserted into the PCIe slot
of a computer. Test data is generated on the computer and
sent to the FPGA through PCIe. The FPGA acts as a bridge
between the computer and the decoder ASIC.
B. Error-Correction Performance and Throughput
The frame error rate (FER) under various list sizes, code
lengths and code rates is measured on the designed PCB
with a 24-bit CRC, and plotted in Fig. 6. The codewords
are randomly generated, modulated with quadrature phase-shift
keying (QPSK) and transmitted over an additive white
Gaussian noise channel. As a reference, the floating-point
results are also plotted in Fig. 6. It can be seen that quantization
incurs a performance loss of less than 0.1 dB.
TABLE I
MEASURED THROUGHPUT
T/P (Mbps) by code rate
| condition | 1/4 | 1/2 | 2/3 | 3/4 | 8/9 |
|---|---|---|---|---|---|
| L = 1, N = 2^15 | 1649 | 2599 | 3351 | 3821 | 4786 |
| L = 8, N = 2^14 | 1750 | 2968 | 3777 | 4245 | 5164 |
| L = 32, N = 2^11 | 33 | 54 | 70 | 75 | 91 |
The measured throughputs are summarized in Table I. The
throughput (T/P) is defined by
$$\text{T/P} = k \times f_{clk} / T, \tag{8}$$
where $T$ is the decoding latency. The highest throughput is
5.164 Gbps at code rate $R = 8/9$. The throughput of the
flexible decoder is even higher than that of the SC decoder since
5 flexible decoder cores are implemented. In terms of area
efficiency, the SC decoder is the highest of the three decoders.
Due to the large list size, the throughput of the ultra-reliable
decoder is much lower than the other two decoders.
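Equation (8) can be turned around to estimate the latency implied by a measured throughput. The sketch below assumes that T is counted in clock cycles, that f_clk is given in MHz (so the result is in Mbps), and that k = N × R with no CRC overhead; none of these conventions is spelled out in the paper, and the function names are hypothetical.

```python
def throughput_mbps(k, f_clk_mhz, latency_cycles):
    """T/P = k * f_clk / T, with T in clock cycles and f_clk in MHz."""
    return k * f_clk_mhz / latency_cycles


def implied_latency_cycles(k, f_clk_mhz, tp_mbps):
    """Invert (8) to get the latency that a measured throughput implies."""
    return k * f_clk_mhz / tp_mbps


# SC decoder row of Table I: N = 2**15, R = 1/2, 2599 Mbps at 1000 MHz.
k = (2 ** 15) // 2
cycles = implied_latency_cycles(k, 1000, 2599)
print(f"implied latency: about {cycles:.0f} cycles per codeword")
```

Under these assumptions the SC decoder needs roughly 6300 cycles per codeword at rate 1/2. The same inversion does not apply directly to the flexible decoder row, whose throughput is aggregated over five cores.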
C. Comparison With State-of-the-Art Fabricated ASICs
The comparison with state-of-the-art fabricated ASICs is
shown in Table II. Our SC decoder supports $N = 2^{15}$, but the SC
decoders in [12] and [10] only support $N = 2^{10}$. Therefore, it is
hard to give a precise comparison for these SC decoders in the
table. The flexible decoder and [12] run at the same code rate
and length, but the former runs with a larger list size ($L = 8$)
and can support larger code lengths. Even so, the area
efficiency of the flexible decoder is much higher than the scaled
result of [12]. As for the ultra-reliable decoder, no fabricated
ASIC decoder with $L = 32$ has been reported in the literature.
To further evaluate our architecture, we present the synthesis
results of the flexible decoder and state-of-the-art decoders
in Table III. We re-synthesize one flexible decoder and set
$N_{max} = 1024$ for a fair comparison. The $f_{clk}$ increases to
1.1 GHz. The scaled results show that the flexible decoder
outperforms state-of-the-art decoders in terms of area efficiency.
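The normalization used for the scaled rows of Tables II and III (area scaled as λ², frequency as 1/λ) can be sketched as below. The function name and argument layout are illustrative; throughput is assumed to scale with frequency.

```python
def normalize_to_node(throughput_mbps, area_mm2, node_nm, target_nm=16.0):
    """Scale throughput and area from node_nm to target_nm.

    Frequency (and with it throughput) scales as 1/lambda,
    area scales as lambda**2.
    """
    s = node_nm / target_nm
    tp = throughput_mbps * s
    area = area_mm2 / (s * s)
    return tp, area, tp / area


# SC mode of [12] in Table II: 7836 Mbps, 0.35 mm^2, 28nm.
tp, area, eff = normalize_to_node(7836, 0.35, 28)
print(f"{tp:.0f} Mbps, {area:.3f} mm^2, {eff:.0f} Mbps/mm^2")
```

This reproduces the tabulated 13713 Mbps and 0.114 mm² for [12]. The computed area efficiency differs slightly from the tabulated 120289 Mbps/mm², which is what 13713 / 0.114 gives, so the table value appears to be derived from the rounded area.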
V. CONCLUSION
In this paper, we present an ASIC implementation of
three SC-based decoders for polar codes in a 16nm TSMC
FinFET technology. To our knowledge, this is the first ASIC
implementation of an SCL decoder supporting L > 4 and N > 1024.
To improve throughput, several optimization techniques are
proposed. Measurement results show that the throughput of the
SC decoder, flexible decoder and ultra-reliable decoder reaches
up to 4.79 Gbps, 5.16 Gbps and 91 Mbps, respectively (Table I).
Compared with fabricated ASIC decoders and synthesized
decoders in the literature, the flexible decoder achieves higher
area efficiency.
REFERENCES
[1] E. Arikan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July
2009.
[Fig. 6 plots: FER versus Es/N0 (dB) in three panels, (N = 2048, L = 32), (N = 16384, L = 8) and (N = 32768, L = 1), each comparing floating-point and ASIC decoder curves.]
Fig. 6. The error-correction performance of the three decoders under code rates R ∈ [1/3, 1/2, 2/3].
TABLE II
COMPARISON WITH STATE-OF-THE-ART FABRICATED ASICS
| implementation | SC decoder | SC decoder | flexible decoder | ultra-reliable decoder | [12] | [12] | [10] | [11] |
|---|---|---|---|---|---|---|---|---|
| algorithm | SC | SC | SCL (L=8) | SCL (L=32) | SC | SCL (L=4) | SC | BP (15 iter) |
| code length | 32768 | 32768 | 1024 | 2048 | 1024 | 1024 | 1024 | 1024 |
| code rate | 1/2 | 869/1024 | 1/2 | 1/2 | 869/1024 | 1/2 | 1/2 | 1/2 |
| technology | 16nm | 16nm | 16nm | 16nm | 28nm | 28nm | 180nm | 65nm |
| supply (V) | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | 1.3 | 1.0 |
| frequency (MHz) | 1000 | 1000 | 1000 | 1000 | 452 | 308 | 150 | 300 |
| T/P (Mbps) | 2599 | 4442 | 3241 | 54 | 7836 | 65.5 | 49.0 | 1024 (2) |
| area (mm²) | 0.35 | 0.35 | 2.27 | 0.43 | 0.35 | 0.44 | 1.71 | 1.48 |
| area eff. (Mbps/mm²) | 7426 | 12691 | 1428 | 126 | 22389 | 148 | 28.7 | 692 |
| Normalized for 16nm (1) | | | | | | | | |
| T/P (Mbps) | 2599 | 4442 | 3241 | 54 | 13713 | 115 | 551 | 4160 (2) |
| area (mm²) | 0.35 | 0.35 | 2.27 | 0.43 | 0.114 | 0.144 | 0.0135 | 0.0897 |
| area eff. (Mbps/mm²) | 7426 | 12691 | 1428 | 126 | 120289 | 793 | 40815 | 46377 |
(1) Area is scaled as λ², frequency as 1/λ, where λ is the technology feature size.
(2) The throughput is scaled to worst case.
TABLE III
COMPARISON OF SYNTHESIS RESULTS FOR N = 1024
| implementation | This work | [9] | [6] | [7] |
|---|---|---|---|---|
| list size | 8 | 8 | 8 | 8 |
| technology | 16nm | 65nm | 90nm | 90nm |
| frequency (MHz) | 1100 | 722 | 289 | 637 |
| T/P (Mbps) | 713 | 599 | 374 | 123 |
| area (mm²) | 0.06 | 3.975 | 7.22 | 3.58 |
| area eff. (Mbps/mm²) | 11883 | 151 | 51 | 34.4 |
| Normalized for 16nm (1) | | | | |
| T/P (Mbps) | 713 | 2434 | 2104 | 692 |
| area (mm²) | 0.06 | 0.241 | 0.228 | 0.113 |
| area eff. (Mbps/mm²) | 11883 | 10124 | 9228 | 6124 |
(1) Area is scaled as λ², frequency as 1/λ, where λ is the technology feature size.
[2] A. Alamdar-Yazdi and F. R. Kschischang, “A simplified successive-
cancellation decoder for polar codes,” IEEE Communications Letters,
vol. 15, no. 12, pp. 1378–1380, December 2011.
[3] A. Balatsoukas-Stimming, A. J. Raymond, W. J. Gross, and A. Burg,
“Hardware architecture for list successive cancellation decoding of polar
codes,” IEEE Transactions on Circuits and Systems II: Express Briefs,
vol. 61, no. 8, pp. 609–613, Aug 2014.
[4] B. Yuan and K. K. Parhi, “Low-latency successive-cancellation list
decoders for polar codes with multibit decision,” IEEE Transactions
on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 10, pp.
2268–2280, Oct 2015.
[5] J. Lin and Z. Yan, “An efficient list decoder architecture for polar codes,”
IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 23, no. 11, pp. 2508–2518, Nov 2015.
[6] J. Lin, C. Xiong, and Z. Yan, “A high throughput list decoder architecture
for polar codes,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 24, no. 6, pp. 2378–2391, June 2016.
[7] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “Llr-based suc-
cessive cancellation list decoding of polar codes,” IEEE Transactions on
Signal Processing, vol. 63, no. 19, pp. 5165–5179, Oct 2015.
[8] B. Li, H. Shen, and K. Chen, “A decision-aided parallel sc-list decoder
for polar codes,” arXiv preprint arXiv:1506.02955(2015), 2015.
[9] S. A. Hashemi, C. Condo, and W. J. Gross, “Fast and flexible successive-
cancellation list decoders for polar codes,” IEEE Transactions on Signal
Processing, vol. 65, no. 21, pp. 5756–5769, Nov 2017.
[10] A. Mishra, A. J. Raymond, L. G. Amaru, G. Sarkis, C. Leroux,
P. Meinerzhagen, A. Burg, and W. J. Gross, “A successive cancellation
decoder asic for a 1024-bit polar code in 180nm cmos,” in 2012 IEEE
Asian Solid State Circuits Conference (A-SSCC), Nov 2012, pp. 205–
208.
[11] Y. S. Park, Y. Tao, S. Sun, and Z. Zhang, “A 4.68gb/s belief propagation
polar decoder with bit-splitting register file,” in 2014 Symposium on VLSI
Circuits Digest of Technical Papers, June 2014, pp. 1–2.
[12] P. Giard, A. Balatsoukas-Stimming, T. C. Müller, A. Bonetti, C. Thibeault,
W. J. Gross, P. Flatresse, and A. Burg, “Polarbear: A 28-nm fd-soi asic
for decoding of polar codes,” IEEE Journal on Emerging and Selected
Topics in Circuits and Systems, vol. 7, no. 4, pp. 616–629, Dec 2017.
[13] K. Niu and K. Chen, “Crc-aided decoding of polar codes,” IEEE
Communications Letters, vol. 16, no. 10, pp. 1668–1671, October 2012.
[14] H. Zhang, R. Li, J. Wang, S. Dai, G. Zhang, Y. Chen, H. Luo, and J. Wang,
“Parity-check polar coding for 5g and beyond,” 2018 International
Conference on Communications (ICC), pp. 1–6, May 2018.
