Toward Terabits-per-second Communications: A High-Throughput Hardware
  Implementation of $G_N$-Coset Codes by Tong, Jiajie et al.
arXiv:2004.09897v1 [cs.IT] 21 Apr 2020
Toward Terabits-per-second Communications: A
High-Throughput Hardware Implementation of
GN-Coset Codes
Jiajie Tong, Xianbin Wang, Qifan Zhang, Huazi Zhang, Shengchen Dai, Rong Li, Jun Wang
Huawei Technologies Co. Ltd.
Email: {tongjiajie, wangxianbin1, Qifan.Zhang, zhanghuazi, daishengchen, lirongone.li,
justin.wangjun}@huawei.com
Abstract—Recently, a parallel decoding algorithm of GN -coset
codes was proposed. The algorithm exploits two equivalent decoding
graphs. For each graph, the inner code part, which consists of
independent component codes, is decoded in parallel. The extrinsic
information of the code bits is obtained and iteratively exchanged
between the graphs until convergence. This algorithm enjoys higher
decoding parallelism than previous successive cancellation algorithms
because it avoids serial outer code processing. In this work, we present
a hardware implementation of the parallel decoding algorithm that
supports code lengths up to N = 16384. We complete the decoder's physical
layout in a TSMC 16nm process; its size is 999.936 µm × 999.936 µm ≈
1.00 mm². The decoder's area efficiency and power consumption are
evaluated for the cases of N = 16384, K = 13225 and N = 16384, K = 14161.
Scaled to a 7nm process, the decoder's area efficiency exceeds
477 Gbps/mm² and 533 Gbps/mm², respectively, with five iterations.
I. INTRODUCTION
A. Background
High throughput is one of the primary targets for the evolution of
mobile communications. The next generation of mobile communication,
i.e., 6G, is expected to supply 1 Tb/s throughput [1], which requires
roughly a 100x increase in throughput over the 5G standards.
$G_N$-coset codes, defined by Arıkan in [2], are a class of linear block
codes with the generator matrix $G_N$. $G_N$ is an $N \times N$ binary
matrix defined as $G_N \triangleq F^{\otimes n}$, in which $N = 2^n$ and
$F^{\otimes n}$ denotes the $n$-th Kronecker power of
$F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Polar codes are a
specific type of $G_N$-coset codes and have been adopted for the 5G
control channel. The throughput of polar codes is limited by successive
cancellation (SC) decoders, since they are serial in nature.
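As a small illustration of the definition above, the following sketch builds $G_N$ by repeated Kronecker products and encodes a vector over GF(2). The function names `kronecker_power` and `encode` are hypothetical and chosen only for this example; the code is a plain reference computation, not the hardware encoder.

```python
def kronecker_power(n):
    """Return G_N = F^(kron n) over GF(2) as a list of rows, F = [[1, 0], [1, 1]]."""
    g = [[1]]
    for _ in range(n):
        size = len(g)
        # The Kronecker product of F with g has the block form [[g, 0], [g, g]].
        top = [row + [0] * size for row in g]
        bottom = [row + row for row in g]
        g = top + bottom
    return g


def encode(u, g):
    """Compute x = u * G_N with mod-2 arithmetic."""
    n = len(g)
    return [sum(u[r] * g[r][c] for r in range(n)) % 2 for c in range(n)]


G4 = kronecker_power(2)        # N = 4
x = encode([0, 1, 0, 1], G4)   # sum of rows 1 and 3 of G4, mod 2
```

Because $F \cdot F$ is the identity over GF(2), applying `encode` twice returns the original vector.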
Recently, a parallel decoding framework for $G_N$-coset codes was
proposed in [3]. A codeword is decoded alternately on two factor graphs
$G$ and $G_\pi$, as shown in Fig. 1. The permuted graph $G_\pi$ is
generated by swapping the inner codes and outer codes. The decoder only
decodes the inner codes of each graph $\Lambda \in \{G, G_\pi\}$. In each
$\Lambda$, the inner codes are $\sqrt{N}$ independent sub-codes that can
be decoded in parallel. The code construction under the parallel decoding
algorithm is different from that of polar/RM codes, and is studied
separately in [3].
B. Motivations and Contributions
This paper introduces an ASIC implementation based on the parallel
decoding framework (PDF). We build a decoder that supports $G_N$-coset
codes of length N = 16384. It deploys $\sqrt{N} = 128$ sub-decoders to
decode the 128 independent sub-codes in parallel. The target of high
throughput and area efficiency is decomposed into three objectives,
namely reducing the sub-code decoding latency, the worst-case iteration
time, and the chip area, each of which is optimized separately.
We adopt the proposal in [4], which employs successive cancellation (SC)
decoders as the component decoders. It supports soft-in-hard-out
decoding, which results in low decoding complexity and reduced
interconnection among the component decoders. In this work, we propose
hardware-oriented optimizations on LLR generation and quantization. We
implement the whole decoder in hardware and present the ASIC layout to
evaluate multiple key metrics. The hardware-specific data are obtained
from the cell toggle (flip) rates in the circuit simulation results and
the parasitic capacitance extracted from the layout. With the 16nm
process, the area efficiency is 120 Gbps/mm², the power consumption is
100 mW and the energy efficiency is around 1 pJ/bit. Scaled to 7nm, the
area efficiency can reach 533 Gbps/mm² with five iterations.
II. PARALLEL DECODING
A parallel decoding framework is introduced in [3], where three types of
component decoders, (i) soft-output SC list (SCL), (ii) soft-output SC
permutation list and (iii) soft cancellation (SCAN), are employed to
decode the sub-codes (inner codes). To achieve even higher area
efficiency, this work adopts SC [4], i.e., without list decoding, as the
sub-decoder. In this section, we describe the parallel decoding framework
with successive cancellation (PDF-SC) algorithm from the implementation
point of view.
A. Parallel decoding framework (PDF)
We use $(i, j)$ to denote the $j$-th code bit position of the $i$-th
inner sub-code, and $k$ to denote the code bit position in the
$G_N$-coset code. They have the following relationship:
$$k = \begin{cases} j \times \sqrt{N} + i, & \text{for } \Lambda = G, \\ i \times \sqrt{N} + j, & \text{for } \Lambda = G_\pi. \end{cases} \tag{1}$$
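The index relationship in (1) can be sketched as a pair of helper functions. The names `to_global` and `to_local` and the boolean flag `permuted` (True for $G_\pi$, False for $G$) are assumptions made for this illustration.

```python
import math


def to_global(i, j, n_total, permuted):
    """Map (i, j) to the code bit position k according to (1)."""
    root = math.isqrt(n_total)
    if root * root != n_total:
        raise ValueError("N must be a perfect square")
    return i * root + j if permuted else j * root + i


def to_local(k, n_total, permuted):
    """Inverse of to_global: recover (i, j) from k."""
    root = math.isqrt(n_total)
    high, low = divmod(k, root)
    return (high, low) if permuted else (low, high)


# Bit (3, 5) on graph G and bit (5, 3) on the permuted graph are the same code bit.
k_g = to_global(3, 5, 16384, permuted=False)
k_pi = to_global(5, 3, 16384, permuted=True)
```

It follows directly from (1) that position $(i, j)$ on one graph and position $(j, i)$ on the other refer to the same code bit $k$, which is why the two graphs can exchange information bit by bit.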
[Figure 1: two encoding graphs, panels (a) and (b), each drawn with Stage 1 to Stage 4 and divided into outer codes and inner codes.]
Fig. 1. For $G_N$-coset codes, equivalent encoding graphs may be obtained
based on stage permutations: (a) Arıkan's original encoding graph [2] and
(b) stage-permuted encoding graph. Each node adds (mod-2) the signals on
all incoming edges from the left and sends the result out on all edges to
the right.
The aforementioned parallel turbo-like decoding framework is described in
Algorithm 1. The algorithm alternates between the two graphs $G$ and
$G_\pi$ from one iteration to the next (line 3 in Algorithm 1), decoding
each with $\sqrt{N}$ inner component decoders (line 4). The $i$-th
component decoder, denoted by SubDecoder(), is an SC decoder assisted by
the error detector.
Note that each component decoder takes a soft input vector $L_i^t$, but
outputs hard code bit estimates $\hat{c}$ and error detecting results
$e$. The mismatch between soft input and hard output poses a challenge
for iterative decoding, since the SC component decoders in the next
iteration cannot directly take the hard output from the previous
iteration as input. To solve this problem, [4] proposes to generate soft
values from the hard outputs. Specifically, the log likelihood ratio
(LLR) of the $(i, j)$-th code bit in the $t$-th iteration, denoted by
$L_{(i,j)}^t$, is calculated from the hard decoder outputs from the
alternate graph (line 5).
For the $i$-th component decoder, the hard output vector and soft input
vector have length $\sqrt{N}$:
$$\hat{c}_i^t \triangleq \{\hat{c}_{(i,j)}^t,\ j = 0, \dots, \sqrt{N} - 1\},$$
$$L_i^t \triangleq \{L_{(i,j)}^t,\ j = 0, \dots, \sqrt{N} - 1\},$$
and the error detection indicator $e_i^t$ is a binary value.
The $\sqrt{N}$ independent inner sub-codes allow us to instantiate
$\sqrt{N}$ component decoders for the maximum degree of parallelism.
After $t_{max}$ iterations, the algorithm outputs all $N$ hard bits.
B. Component decoder: SubDecoder()
The component decoder SubDecoder() [4] is described in Algorithm 2.
Before each SC decoding, error detection is performed. This can be
achieved by applying a syndrome check based on hard decisions (lines 2-5
in Algorithm 2). The cases with detected errors are denoted by Type-1,
and the others by Type-2.
The error detector brings two-fold advantages. On the one hand, since
Type-2 component codes require no further SC decoding, this approach
reduces power consumption. If all $\sqrt{N}$ sub-codes pass error
detection, decoding can be terminated early for further power saving. On
the other hand, the error detection result provides us with additional
information that the input LLRs of Type-2 component codes are more
reliable than those of Type-1. Such information can be used to improve
the overall performance when estimating the input LLRs from the hard
outputs.

Algorithm 1 Parallel decoding framework.
Require:
  The received signal $y = \{y_k, k = 0, \dots, N - 1\}$;
Ensure:
  The recovered codeword: $\hat{x} = \{\hat{x}_k, k = 0, \dots, N - 1\}$;
1: Initialize: $L_{ch,k} \triangleq \frac{2 y_k}{\sigma^2}$ for $k = 0, \dots, N - 1$;
2: for iterations: $t = 1, \dots, t_{max}$ do
3:   Select decoding graph: $\Lambda = (t \% 2)\ ?\ G : G_\pi$;
4:   for inner component decoders: $i = 0, 1, \dots, \sqrt{N} - 1$ (in parallel) do
5:     $L_{(i,j)}^t = Lgen(L_{ch,k}, \hat{c}_{(j,i)}^{t-1}, \hat{c}_{(i,j)}^{t-2}, e_j^{t-1}), \forall j$;
6:     $\hat{c}_i^t, e_i^t = SubDecoder(L_i^t, \mathcal{F}_{\Lambda,i})$;
7:   end for
8: end for
9: for $\forall (i, j)$ do
10:   $\hat{x}_k = \hat{c}_{(i,j)}^{t_{max}}$; $(i, j) \rightleftharpoons k$ is described in (1).
11: end for
Algorithm 2 The $i$-th SubDecoder()
Require:
  The input LLRs: $L_i^t$;
  Frozen set: $\mathcal{F}_{\Lambda,i}$.
Ensure:
  Binary output $\hat{c}_i^t$, error detected indicator $e_i^t$;
1: $e_i^t$ = False;
2: Hard decisions: $\hat{c}_{(i,j)}^t = (L_{(i,j)}^t < 0)$, $j \in \{0, 1, \dots, \sqrt{N} - 1\}$;
3: Vector $\hat{u}_i^t = \hat{c}_i^t \times G_{\sqrt{N}}$;
4: Syndrome check:
5: if $\hat{u}_{(i,j)}^t \neq 0$ for some $j \in \mathcal{F}_{\Lambda,i}$ then
6:   $e_i^t$ = True;
7:   SC decoding: $\hat{c}_i^t = SCDecoder(L_i^t, \mathcal{F}_{\Lambda,i})$;
8: end if
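The error-detection part of Algorithm 2 (hard decision, re-encoding and syndrome check) can be sketched as follows. This is a minimal software illustration under stated assumptions: the multiplication by $G_{\sqrt{N}}$ is carried out with a butterfly network instead of an explicit matrix product, and the names `polar_transform` and `detect_error` are hypothetical. The SC decoding step itself is not shown.

```python
def polar_transform(bits):
    """Return bits * G_n over GF(2) using XOR butterflies (len(bits) is a power of two)."""
    u = list(bits)
    n = len(u)
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for k in range(start, start + step):
                u[k] ^= u[k + step]
        step *= 2
    return u


def detect_error(llrs, frozen):
    """Hard decisions plus syndrome check; returns (hard_bits, error_detected)."""
    hard = [1 if llr < 0 else 0 for llr in llrs]
    u_hat = polar_transform(hard)
    # An error is flagged when any frozen position of u_hat is non-zero.
    error = any(u_hat[j] for j in frozen)
    return hard, error


# Toy N = 4 sub-code with frozen set {0}; [0, 0, 1, 1] is a valid codeword.
clean = detect_error([2.0, 1.5, -3.0, -0.5], frozen={0})
noisy = detect_error([2.0, 1.5, -3.0, 0.5], frozen={0})
```

Multiplying by the generator matrix recovers $\hat{u}$ from $\hat{c}$ because the matrix is its own inverse over GF(2). In the toy example the first input passes the check (Type-2) and the second, with one sign flipped, is flagged (Type-1).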
C. Input LLR generator: Lgen()
In each iteration, the input LLRs are calculated by
• Type-1: $e_j^{t-1} = 1$
$$L_{(i,j)}^t = L_{ch,k} + \frac{2\alpha_t}{\sigma^2}(1 - 2\hat{c}_{(j,i)}^{t-1}) - \frac{2\beta_t}{\sigma^2}(1 - 2\hat{c}_{(i,j)}^{t-2}), \tag{2}$$
• Type-2: $e_j^{t-1} = 0$
$$L_{(i,j)}^t = L_{ch,k} + \frac{2\gamma_t}{\sigma^2}(1 - 2\hat{c}_{(j,i)}^{t-1}), \tag{3}$$
where a set of damping factors $(\alpha_t, \beta_t, \gamma_t)$ is defined
for each iteration, and $k$ is calculated from $(i, j)$ according to (1).
The specific values of the damping factors are optimized in [4].
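Since the SC decoder is invariant to input LLR scaling, we can cancel the
noise variance $\sigma^2$ during LLR initialization. By multiplying both
sides of (2) and (3) by $\frac{\sigma^2}{2}$, the equations are
simplified as follows:
$$\begin{cases} \tilde{L}_{(i,j)}^t = y_k + \alpha_t(1 - 2\hat{c}_{(j,i)}^{t-1}) - \beta_t(1 - 2\hat{c}_{(i,j)}^{t-2}), & e_j^{t-1} = 1, \\ \tilde{L}_{(i,j)}^t = y_k + \gamma_t(1 - 2\hat{c}_{(j,i)}^{t-1}), & e_j^{t-1} = 0, \end{cases} \tag{4}$$
where $\tilde{L} \triangleq L \times \frac{\sigma^2}{2}$ and $y$ is the
received signal.
We use a pair of new coefficients to replace $\alpha_t$ and $\beta_t$:
$$\begin{cases} \delta_t = \alpha_t + \beta_t, \\ \theta_t = \alpha_t - \beta_t. \end{cases}$$
Hence (4) can be replaced by (5):
$$\tilde{L}_{(i,j)}^t = y_k + \Delta_{(i,j)}^t (1 - 2\hat{c}_{(j,i)}^{t-1}), \tag{5}$$
in which $\Delta_{(i,j)}^t$ is determined by the binary $e_j^{t-1}$ and
the hard outputs of the previous two iterations, $\hat{c}_{(j,i)}^{t-1}$
and $\hat{c}_{(i,j)}^{t-2}$, as in (6). The input signal calculation
reduces to only one addition operation.
$$\Delta_{(i,j)}^t = \begin{cases} \delta_t, & \text{where } e_j^{t-1} = 1 \text{ and } \hat{c}_{(j,i)}^{t-1} \neq \hat{c}_{(i,j)}^{t-2}, \\ \theta_t, & \text{where } e_j^{t-1} = 1 \text{ and } \hat{c}_{(j,i)}^{t-1} = \hat{c}_{(i,j)}^{t-2}, \\ \gamma_t, & \text{where } e_j^{t-1} = 0. \end{cases} \tag{6}$$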
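The equivalence between the two-term form (4) and the single-addition form (5) with the coefficient selection (6) can be checked numerically. In the sketch below the function names and the damping values 0.5, 0.25 and 0.75 are hypothetical placeholders for illustration only; they are not the optimized factors from [4].

```python
def llr_two_terms(y, c_prev, c_prev2, e_prev, alpha, beta, gamma):
    """Scaled input LLR computed directly from (4)."""
    if e_prev:
        return y + alpha * (1 - 2 * c_prev) - beta * (1 - 2 * c_prev2)
    return y + gamma * (1 - 2 * c_prev)


def select_delta(c_prev, c_prev2, e_prev, alpha, beta, gamma):
    """Coefficient selection of (6): delta, theta or gamma."""
    if not e_prev:
        return gamma
    return alpha + beta if c_prev != c_prev2 else alpha - beta


def llr_single_add(y, c_prev, c_prev2, e_prev, alpha, beta, gamma):
    """Scaled input LLR computed from (5): one signed coefficient added to y."""
    coeff = select_delta(c_prev, c_prev2, e_prev, alpha, beta, gamma)
    return y + coeff * (1 - 2 * c_prev)


# Exhaustive comparison over all binary inputs for one received value.
ALPHA, BETA, GAMMA = 0.5, 0.25, 0.75   # hypothetical damping factors
mismatches = [
    (c1, c2, e)
    for c1 in (0, 1) for c2 in (0, 1) for e in (0, 1)
    if abs(llr_two_terms(0.3, c1, c2, e, ALPHA, BETA, GAMMA)
           - llr_single_add(0.3, c1, c2, e, ALPHA, BETA, GAMMA)) > 1e-12
]
```

The list `mismatches` stays empty because, when the two hard outputs differ, the signs $(1 - 2\hat{c})$ are opposite and the two terms add up, whereas when they agree the terms partially cancel.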
III. AN ASIC IMPLEMENTATION
In this section, we present the ASIC implementation of a PDF-SC decoder
in the TSMC 16nm process for N = 16384. The hardware optimization
addresses both the SC decoders for component codes and the overall
parallel decoding framework for $G_N$-coset codes.
A. Bit Quantization
Lower-precision quantization is key to higher throughput, thanks to its
reduced implementation area and increased clock frequency. As a tradeoff,
some performance loss is expected. To maximize throughput while retaining
performance, we must determine an appropriate quantization width.
Specifically, we use simulation to find the smallest quantization width
for which a fixed-point decoder stays within 0.1 dB of a floating-point
decoder.
First, we compare the performance of component codes under Algorithm 2.
We test two cases, $N_{sub} = 128, K_{sub} = 115$ and
$N_{sub} = 128, K_{sub} = 119$, with different quantization widths.
According to Fig. 2, 6-bit quantization achieves the same performance as
floating-point, 5-bit quantization incurs < 0.1 dB loss, and 4-bit
quantization yields significant loss. Therefore, we set the quantization
width to 5 or 6 bits.
We then simulate the BLER performance of the long codes
$N = 128^2, K = 115^2$ and $N = 128^2, K = 119^2$ under different
quantizations, as shown in Fig. 3. Similarly, 6-bit quantization has no
performance loss, and 5-bit quantization only incurs < 0.1 dB loss for
both code rates. Again, 4-bit quantization suffers from performance
degradation.
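To make the notion of quantization width concrete, the sketch below shows one common way to map a real-valued LLR to a signed fixed-point level. The text does not specify the quantizer, so the symmetric saturating range, the rounding rule, the step size of 0.5 and the name `quantize_llr` are all assumptions made for illustration.

```python
def quantize_llr(value, width, step):
    """Uniform, saturating quantizer to a signed integer level of the given bit width."""
    max_level = (1 << (width - 1)) - 1          # e.g. 15 for 5 bits, 31 for 6 bits
    level = round(value / step)                 # nearest quantization level
    return max(-max_level, min(max_level, level))


# A large-magnitude LLR saturates sooner at a smaller width (hypothetical step of 0.5).
levels = {width: quantize_llr(12.0, width, step=0.5) for width in (4, 5, 6)}
```

With these assumed settings the same input maps to levels 7, 15 and 24 for widths 4, 5 and 6, so only the widest format keeps it unsaturated. Narrower widths clip more of the LLR range, which is one plausible source of the loss observed at 4 bits.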
[Figure 2: BLER versus Es/N0 (dB) in two panels, N=128, K=115 and N=128, K=119; curves: Floating; Fixed, Width=6; Fixed, Width=5; Fixed, Width=4.]
Fig. 2. Sub-decoder performance comparison between floating point and
fixed point.
[Figure 3: BLER versus Es/N0 (dB) in two panels, N=16384, K=13225 and N=16384, K=14161; curves: Floating; Fixed, Width=6; Fixed, Width=5; Fixed, Width=4.]
Fig. 3. Decoder performance comparison between floating point and fixed
point.
B. SC Core Optimization
A component SC decoder is optimized via Rate-0 nodes, Rate-1 nodes [5],
single parity check (SPC) nodes and repetition (REP) nodes [6]. The
decoder skips all Rate-0 nodes and processes Rate-1, SPC and REP nodes in
parallel for code blocks shorter than 32. If none of these applies,
maximum-likelihood (ML) multi-bit decision [7] is employed for 4-bit
blocks.
The architecture of the SC decoders used here is described in [8], with
the supported code length reduced from 32768 to 128 to save area. With
TSMC 16nm technology, an SC core synthesizes to an area of 4,100 µm². Its
latency at a clock frequency of 1.05 GHz is shown in Table I.
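The cycle counts and nanosecond figures in Table I are related by the clock period. The sketch below assumes a period of 0.95 ns, which is the value implied by the tabulated numbers and is close to the reciprocal of 1.05 GHz (about 0.952 ns); the constant and function names are hypothetical.

```python
CLOCK_PERIOD_NS = 0.95   # assumption: period implied by Table I, roughly 1 / 1.05 GHz


def latency_ns(cycles, period_ns=CLOCK_PERIOD_NS):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles * period_ns


# Cycle counts from Table I, keyed by the number of information bits K.
table_i_cycles = {111: 24, 115: 19, 119: 18, 122: 13}
latencies = {k: round(latency_ns(c), 2) for k, c in table_i_cycles.items()}
```

Under this assumption the computed values agree with the nanosecond row of Table I.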
TABLE I
SUB-DECODER DECODING LATENCY FOR N = 128 POLAR CODES
Information bits (K)    111     115     119     122
Latency (cycles)        24      19      18      13
Latency (ns)            22.8    18.05   17.1    12.35

C. SC Core Sharing
A unique design that significantly reduces area is called “SC core
sharing”. In particular, we bind four SC cores into a sub-decoder group.
The four SC cores share input/output pins, LLR updating circuits and
error detector related components. The sharing saves considerable local
computation resources and global routing resources, but increases the
latency between iterations. Nevertheless, the overall area efficiency and
power efficiency are improved.

[Figure 4: block diagram of a sub-decoder group containing sub-decoders #i to #i+3, with LLR storage, PEs used as adders, an error detector, and top interconnections (global routing) from and to other sub-decoder groups.]
Fig. 4. The sub-decoder group architecture.

TABLE II
DECODER AREA EFFICIENCY
Info size  Iterations  Es/N0 (dB)  Latency (ns)  Area Eff. (Gbps/mm²)  Converted to 10nm  Converted to 7nm
13225      4           7.14        91.2          135.07                310.66             596.47
13225      5           6.82        114           108.06                248.53             477.17
13225      6           6.55        136.8         90.05                 207.11             397.64
13225      7           6.36        159.6         77.18                 177.52             340.84
13225      8           6.20        182.4         67.53                 155.33             298.23
14161      4           7.79        87.4          150.92                347.11             666.45
14161      5           7.48        109.25        120.73                277.69             533.16
14161      6           7.22        131.1         100.61                231.41             444.30
14161      7           7.06        152.95        86.24                 198.35             380.83
14161      8           6.97        174.8         75.46                 173.55             322.22
In addition, we reuse the adders in the SC core to perform the input LLR
addition in (5). These adders are otherwise used for the f+-function
calculation in each processing element (PE) [9]. Altogether, 30% of the
area can be saved for each sub-decoder group. Fig. 4 shows the
architecture of a sub-decoder group, including how the adders in the PEs
are reused. We run the synthesis flow with the TSMC 16nm process, and the
resulting area of a sub-decoder group is 19,400 µm².
D. Global Layout
Combining all algorithmic and hardware optimizations, the synthesized
decoder ASIC requires an area of 999.936 µm × 999.936 µm ≈ 1.00 mm². The
global layout is presented in Fig. 5. In the center of the layout is the
top logic, including the input channel LLR storage, finite state machine
(FSM) controller, interleaved connection routing and output buffer. The
32 sub-decoder groups (128 sub-decoders in groups of four; SG in the
figure) are placed around the top logic, highlighted by different colors.
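The figures quoted so far allow a rough area budget. The sketch below only combines numbers stated in the text (128 sub-decoders, four SC cores per group, 19,400 µm² per synthesized group, and the 999.936 µm die side). Comparing a synthesis-level area with the final layout area is an assumption made for illustration, so the resulting share is indicative only.

```python
SUB_DECODERS = 128          # sqrt(N) for N = 16384
CORES_PER_GROUP = 4         # SC cores bound into one sub-decoder group
GROUP_AREA_UM2 = 19_400     # synthesized area of one sub-decoder group
DIE_SIDE_UM = 999.936       # side length of the square layout

groups = SUB_DECODERS // CORES_PER_GROUP
groups_area_um2 = groups * GROUP_AREA_UM2
die_area_um2 = DIE_SIDE_UM ** 2

# Indicative share of the die taken by sub-decoder groups (synthesis area vs. layout area).
share = groups_area_um2 / die_area_um2
```

With these inputs there are 32 groups totalling about 0.62 mm², i.e. roughly 62% of the die, leaving the remainder for the top logic, routing and placement overhead.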
[Figure 5: chip floorplan, 999.936 µm on a side, with sub-decoder groups SG-0 to SG-31 arranged around a central block holding the channel-LLR storage, FSM controller, global routing and output buffer; input and output ports are marked.]
Fig. 5. The global layout of the ASIC for parallel decoding.
IV. KEY PERFORMANCE INDICATORS
The key performance indicators (KPIs) are examined.
First of all, we evaluate the area efficiency using the equation¹
$$\text{AreaEff (Gbps/mm}^2\text{)} = \frac{\text{Info Size (bits)}}{\text{Latency (ns)} \times \text{Area (mm}^2\text{)}}.$$
The proposed decoder can reach up to hundreds of gigabits per second
within one square millimeter. The evaluated throughput is given in
Table II². With the TSMC 16nm process, the area efficiencies for code
rates 13225/16384 and 14161/16384 are 67 Gbps/mm² and 75 Gbps/mm²,
respectively, when the maximum number of iterations is eight. If we
reduce to five iterations by allowing 0.5 dB performance loss, the area
efficiency can reach 108 Gbps/mm² and 120 Gbps/mm². The estimated
throughput under 10nm and 7nm technologies can be converted from 16nm³.
With the more advanced 7nm process, the area efficiency is as high as
477 Gbps/mm² and 533 Gbps/mm², which is much higher than the
100 Gbps/mm² at 7nm target given in the EPIC project [12]. Note that the
KPI is achieved at code length 16384, which exhibits significant coding
gain over codes with length 1024 [4]. With future technologies of 5nm and
below, it appears feasible to achieve the extremely challenging target of
1 Tbps/mm².
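The area-efficiency definition and its dependence on the iteration count can be sketched as follows. The function names are hypothetical, the first call uses round numbers chosen only to show the units, and the rescaling assumes that latency grows linearly with the number of iterations, as the latency column of Table II suggests. The sketch does not attempt to re-derive the tabulated values from latency and area.

```python
def area_efficiency(info_bits, latency_ns, area_mm2):
    """Area efficiency in Gbps/mm^2; bits per nanosecond is numerically Gbps."""
    return info_bits / (latency_ns * area_mm2)


def rescale_iterations(efficiency, from_iters, to_iters):
    """Rescale an efficiency figure, assuming latency is proportional to iterations."""
    return efficiency * from_iters / to_iters


# Hypothetical round numbers: 10,000 bits in 100 ns on 1 mm^2.
example = area_efficiency(info_bits=10_000, latency_ns=100.0, area_mm2=1.0)

# Table II, info size 13225: the 4-iteration figure rescaled to 5 and 8 iterations.
eff_5 = rescale_iterations(135.07, 4, 5)
eff_8 = rescale_iterations(135.07, 4, 8)
```

The rescaled values agree with the 5-iteration and 8-iteration entries of Table II to within rounding, which illustrates the direct trade between iteration count and area efficiency.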
The area efficiency is also compared with a highly optimized and
fabricated ASIC polar⁴ decoder in [8]. For both code rates, the area
efficiency of the proposed decoder (with five iterations) is about nine
times that of a polar fast-SC decoder, and 53 times that of a
CA-SC-List-8 decoder⁵. Detailed comparison results can be found in
Table III.

¹ The error detector can terminate the decoding early, but the
worst-case latency is guaranteed by the maximum number of iterations, as
required by most practical communication systems.
² The third column “Es/N0” is chosen such that BLER = 10^-4.
³ The converting ratios, including the cell density ratio and speed
improvement ratio, are obtained from the TSMC process introductions
[10], [11].
⁴ The polar codes are constructed by Gaussian approximation at
Es/N0 = 5.3 dB, 6.3 dB, 5.3 dB and 6.0 dB for (N, R) = (16384, 0.807),
(16384, 0.864), (32768, 0.807) and (32768, 0.864), respectively. N is the
code length and R is the code rate.
⁵ In [8], the fabricated ASIC SC decoder supports code length 32768.

TABLE III
COMPARISON WITH FABRICATED ASIC OF TRADITIONAL POLAR DECODER
Implementation              This Work    This Work    This Work    This Work    [8]      [8]      [8]          [8]
Construction                $G_N$-coset  $G_N$-coset  $G_N$-coset  $G_N$-coset  Polar    Polar    Polar        Polar
Decoding Algorithm          PDF-SC       PDF-SC       PDF-SC       PDF-SC       SC       SC       CA-SC-List   CA-SC-List
List size / Iterations      5            8            5            8            1        1        8            8
Code Length                 16384        16384        16384        16384        32768    32768    16384        16384
Code Rate                   0.807        0.807        0.864        0.864        0.807    0.864    0.807        0.864
Es/N0 (dB) at BLER = 10^-4  6.82         6.20         7.48         6.97         5.61     6.49     5.24         6.13
Technology                  All in TSMC 16nm
Clock Frequency (GHz)       1.05         1.05         1.05         1.05         1.00     1.00     1.00         1.00
Throughput (Gbps)           108.06       67.53        120.73       75.46        4.16     4.56     0.91         1.01
Area (mm²)                  1.00         1.00         1.00         1.00         0.35     0.35     0.45         0.45
Area Eff. (Gbps/mm²)        108.06       67.53        120.73       75.46        11.89    13.02    2.02         2.24

[Figure 6: three panels plotted against EsN0 (dB) from 5 to 8, with axes labeled Average Sub-Decoder Run Times, Power (mW) and Energy (nJ/bit); curves: Info-size=13225 and Info-size=14161, each with maxIter=5 and maxIter=8.]
Fig. 6. Left: Sub-decoder run times per decoded packet. Center: Decoding
power efficiency (mW/mm²). Right: Energy efficiency (pJ/bit).
We further evaluate the average running time and power consumption (per
packet). In the lower SNR region, longer decoding time and higher power
consumption are observed. In the higher SNR region, however, both
decoding time and power consumption are much smaller, thanks to the
built-in error detection and early termination function described in
Section II-B. Specifically, only 15% of the component codes fail the
error detection, while the remaining 85% can skip SC decoding.
The power consumption (mW/mm²) and energy efficiency (pJ/bit) follow a
trend similar to that of the average running time. When Es/N0 > 8 dB, the
power consumption is smaller than 100 mW/mm². The energy efficiency is
around 1 pJ/bit, again meeting the target proposed in [12]. These
results⁶ are plotted in Fig. 6. The power consumption and energy
efficiency are evaluated with the TSMC 16nm process.
V. CONCLUSIONS
In this paper, we present a high-throughput ASIC implementation of
$G_N$-coset codes. The parallel decoding framework leads to hardware with
high area efficiency and low decoding power consumption. An area
efficiency of 120 Gbps/mm² is achieved within approximately 1 mm² in a
16nm process. The power consumption is as low as 100 mW and the energy
efficiency is around 1 pJ/bit. Scaled to a 7nm process, the area
efficiency can reach 533 Gbps/mm². This confirms that $G_N$-coset codes
can meet the high-throughput demand of next-generation wireless
communication systems.

⁶ Note that the power consumption with eight iterations is lower than
that with five iterations. This is due to the lower error-detection
success rate during the first five (1-5) iterations. The last three (6-8)
iterations consume much less power, which reduces the average power
level.
REFERENCES
[1] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems:
applications, trends, technologies, and open research problems,” IEEE
Network, 2019.
[2] E. Arıkan, “Channel polarization: a method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Transactions on Information Theory, vol. 55, no. 7, pp. 3051-3073, Jul.
2009.
[3] X. Wang et al., “On the construction of GN -coset codes for parallel
decoding,” Accepted by IEEE Wireless communications and Networking
Conference 2019.
[4] X. Wang et al., “Toward Terabits-per-second Communications: Low-
Complexity Parallel Decoding of GN -Coset Codes,” Available on ArXiv.
[5] A. Alamdar-Yazdi and F. R. Kschischang, “A Simplified Successive-
Cancellation Decoder for Polar Codes,” in IEEE Communications Letters,
vol. 15, no. 12, pp. 1378-1380, December 2011.
[6] S. A. Hashemi, C. Condo, and W. J. Gross, “Fast and flexible successive-
cancellation list decoders for polar codes,” IEEE Transactions on Signal
Processing, vol. 65, no. 21, pp. 5756–5769, Nov 2017.
[7] G. Sarkis, P. Giard, A. Vardy, C. Thibeault and W. J. Gross, “Fast Polar
Decoders: Algorithm and Implementation,” in IEEE Journal on Selected
Areas in Communications, vol. 32, no. 5, pp. 946-957, May 2014.
[8] X. Liu et al., “A 5.16Gbps decoder ASIC for Polar Code in 16nm
FinFET,” 2018 15th International Symposium on Wireless Communication
Systems (ISWCS), Lisbon, 2018, pp. 1-5.
[9] A. Balatsoukas-Stimming, M. B. Parizi and A. Burg, “LLR-Based
Successive Cancellation List Decoding of Polar Codes,” in IEEE
Transactions on Signal Processing, vol. 63, no. 19, pp. 5165-5179, Oct.
2015.
[10] Taiwan Semiconductor Manufacturing Company Limited, “TSMC 10nm
Technology,” [Online]. Available:
https://www.tsmc.com/english/dedicatedFoundry/technology/10nm.htm.
[11] Taiwan Semiconductor Manufacturing Company Limited, “TSMC 7nm
Technology,” [Online]. Available:
https://www.tsmc.com/english/dedicatedFoundry/technology/7nm.htm.
[12] “EPIC - Enabling practical wireless Tb/s communications with next
generation channel coding.” [Online]. Available:
https://epic-h2020.eu/results.
