Toward Terabits-per-second Communications: Low-Complexity Parallel
  Decoding of $G_N$-Coset Codes by Wang, Xianbin et al.
arXiv:2004.09907v1 [cs.IT] 21 Apr 2020
Toward Terabits-per-second Communications:
Low-Complexity Parallel Decoding of GN-Coset
Codes
Xianbin Wang, Jiajie Tong, Huazi Zhang, Shengchen Dai, Rong Li, Jun Wang
Hangzhou Research Center, Huawei Technologies, Hangzhou, China
Emails: {wangxianbin1,zhanghuazi,lirongone.li,justin.wangjun}@huawei.com
Abstract—Recently, a parallel decoding framework of $G_N$-coset
codes was proposed. High throughput is achieved by
decoding the independent component polar codes in parallel.
Various algorithms can be employed to decode these component
codes, enabling a flexible throughput-performance tradeoff. In
this work, we adopt SC as the component decoders to achieve
the highest-throughput end of the tradeoff. The benefits over
soft-output component decoders are reduced complexity and
simpler (binary) interconnections among component decoders. To
reduce performance degradation, we integrate an error detector
and a log-likelihood ratio (LLR) generator into each component
decoder. The LLR generator, specifically the damping factors
therein, is designed by a genetic algorithm. This low-complexity
design can achieve an area efficiency of 533 Gbps/mm$^2$ under
7 nm technology.
I. INTRODUCTION
A. $G_N$-coset codes
$G_N$-coset codes, defined by Arıkan in [1], are a class of
linear block codes with the generator matrix $G_N$.
$G_N$ is an $N \times N$ binary matrix defined as
$$G_N \triangleq F^{\otimes n}, \tag{1}$$
in which $N = 2^n$ and $F^{\otimes n}$ denotes the $n$-th Kronecker power
of $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.
The encoding process is
$$x_1^N = u_1^N G_N, \tag{2}$$
where $x_1^N \triangleq \{x_1, x_2, \cdots, x_N\}$ and $u_1^N \triangleq \{u_1, u_2, \cdots, u_N\}$
denote the code bit sequence and the information bit sequence,
respectively.
An $(N, K)$ $G_N$-coset code [1] is defined by an information
set $\mathcal{A} \subset \{1, 2, \ldots, N\}$, $|\mathcal{A}| = K$. Its generator matrix $G_N(\mathcal{A})$
is composed of the rows of $G_N$ indexed by $\mathcal{A}$. Thus (2) is
rewritten as
$$x_1^N = u(\mathcal{A}) G_N(\mathcal{A}), \tag{3}$$
where $u(\mathcal{A}) \triangleq \{u_i \mid i \in \mathcal{A}\}$.
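As a minimal sketch of (1) to (3), the following Python snippet builds $G_N$ by repeated Kronecker products and encodes an information vector. The function names, the 1-based information set passed as a Python set, and the choice of zero-valued bits outside the information set are assumptions made for illustration.

```python
def generator_matrix(n):
    """Return G_N = F^{(x)n} over GF(2) as a list of rows, with F = [[1, 0], [1, 1]]."""
    G = [[1]]
    for _ in range(n):
        size = len(G)
        # Kronecker product F (x) G has the block form [[G, 0], [G, G]].
        top = [row + [0] * size for row in G]
        bottom = [row + row for row in G]
        G = top + bottom
    return G


def encode(u, G):
    """Compute x = u G over GF(2), as in (2)."""
    N = len(G)
    return [sum(u[i] & G[i][j] for i in range(N)) % 2 for j in range(N)]


def encode_coset(info_bits, info_set, n):
    """Encode K information bits placed on the 1-based positions in info_set, as in (3).

    Positions outside info_set are set to 0 (an assumption of this sketch).
    """
    N = 1 << n
    u = [0] * N
    for bit, pos in zip(info_bits, sorted(info_set)):
        u[pos - 1] = bit
    return encode(u, generator_matrix(n))


G4 = generator_matrix(2)
x = encode_coset([1, 1], {3, 4}, 2)   # sum of rows 3 and 4 of G_4
print(G4)
print(x)
```

Here `x` is the mod-2 sum of the third and fourth rows of $G_4$, which is how (3) selects rows of $G_N$ through the information set.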
The key to constructing $G_N$-coset codes is to properly
determine an information set $\mathcal{A}$. RM codes [2] and polar codes
[1] are two well-known examples of $G_N$-coset codes. They
determine $\mathcal{A}$ according to Hamming weight and sub-channel
reliability, respectively.
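To make the Hamming-weight rule concrete, the sketch below selects an information set in the RM style. It relies on the fact that the $i$-th row of $G_N$ has weight $2^{\mathrm{wt}(i-1)}$, where $\mathrm{wt}(\cdot)$ counts the ones in the binary expansion. The function names and the tie-breaking rule are assumptions of this sketch; a polar-style rule would rank positions by sub-channel reliability instead, which is not shown here.

```python
def row_weight(i):
    """Hamming weight of the i-th row (1-based) of G_N: 2 ** popcount(i - 1)."""
    return 1 << bin(i - 1).count("1")


def rm_information_set(n, K):
    """Pick the K rows of G_N (N = 2 ** n) with the largest Hamming weight.

    Ties are broken in favour of larger indices, an arbitrary choice for this sketch.
    """
    N = 1 << n
    order = sorted(range(1, N + 1), key=lambda i: (row_weight(i), i), reverse=True)
    return set(order[:K])


A = rm_information_set(3, 4)   # rows of G_8 with weight at least 4
print(sorted(A))
```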
Recently, a parallel decoding framework for $G_N$-coset codes
was proposed in [3]. As shown in Fig. 1(a), the encoding process
of $G_N$-coset codes can be described by an $n$-stage encoding
graph. The former and latter stages correspond to the outer and
inner codes, respectively. The inner codes are independent
component codes that can be decoded in parallel [4].
[Figure 1: panels (a) and (b) show four-stage encoding graphs (Stage 1 to Stage 4), each split into outer codes and inner codes; panel (c) shows the parallel decoding of the inner codes by DEC-1 to DEC-4 on each graph, with LLRs exchanged until decoding succeeds or the maximum number of iterations is reached.]
Fig. 1. For $G_N$-coset codes, equivalent encoding graphs may be obtained
based on stage permutations: (a) Arıkan’s original encoding graph [1] and (b)
stage-permuted encoding graph. Each node adds (mod-2) the signals on all
incoming edges from the left and sends the result out on all edges to the right.
(c) The parallel decoding framework only processes the inner code parts of
the equivalent graphs, leaving the outer codes unprocessed.
This parallel framework first produces equivalent encod-
ing/decoding graphs by permuting the inner and outer parts of
the original decoding graph G (see Fig. 1). During decoding,
we only process the inner code parts of these equivalent
graphs, leaving the outer codes unprocessed. The LLRs from
different graphs about the same code bit are exchanged itera-
tively to reach a consensus. Since all inner codes are decoded
in parallel, this decoding framework supports a very high
degree of parallelism. The code construction under the parallel
decoding algorithm is different from polar/RM codes, and is
studied separately in [3].
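One way to see why two equivalent graphs exist, assuming $N$ is a perfect square and the $N$ bits are arranged row by row in a $\sqrt{N} \times \sqrt{N}$ array: since $G_N = G_{\sqrt{N}} \otimes G_{\sqrt{N}}$, encoding can be carried out by applying $G_{\sqrt{N}}$ along one dimension of the array and then along the other, in either order. In this view, whichever dimension is transformed last plays the role of the inner codes. The sketch below checks the identity numerically for $N = 16$; the helper names and the array layout are illustrative assumptions and are not taken from [3].

```python
def generator_matrix(n):
    """G_N = F^{(x)n} over GF(2), built from the block form [[G, 0], [G, G]]."""
    G = [[1]]
    for _ in range(n):
        size = len(G)
        G = [row + [0] * size for row in G] + [row + row for row in G]
    return G


def encode(u, G):
    """x = u G over GF(2)."""
    N = len(G)
    return [sum(u[i] & G[i][j] for i in range(N)) % 2 for j in range(N)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def encode_rows(X, G):
    """Apply the short transform to every row of the array."""
    return [encode(row, G) for row in X]


def encode_columns(X, G):
    """Apply the short transform to every column of the array."""
    return transpose(encode_rows(transpose(X), G))


def flatten(X):
    return [bit for row in X for bit in row]


m = 4                                   # sqrt(N)
G_m = generator_matrix(2)               # G_4
G_N = generator_matrix(4)               # G_16
u = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
U = [u[r * m:(r + 1) * m] for r in range(m)]

x_direct = encode(u, G_N)
x_cols_then_rows = flatten(encode_rows(encode_columns(U, G_m), G_m))
x_rows_then_cols = flatten(encode_columns(encode_rows(U, G_m), G_m))
print(x_direct == x_cols_then_rows == x_rows_then_cols)
```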
B. Motivations and Contributions
This paper mainly focuses on further enhancing decod-
ing throughput. The aforementioned iterative LLR exchange
procedure requires soft-output component decoders such as
SCL and SCAN to provide extrinsic LLRs [3]. But if we
aim at an ultra-high-throughput decoder, implementing soft-
output component decoders gives rise to two problems. First,
the area efficiency of SCL and SCAN is much lower than that of
fast-SC. Second, the interconnections among the large number of
component decoders consume considerable chip area.
To alleviate both problems, we propose to adopt hard-output
SC as the component decoder. First, the complexity and stor-
age are reduced within each component decoder. Compared
with SCAN with one iteration, SC has 1/4 decoding complex-
ity and 1/2 storage. Compared with soft-output SCL with list
size 8, SC has 1/16 decoding complexity and 1/8 storage [5].
Second, the interconnections among component decoders are
also reduced. Compared with soft-output decoders, hard-output
SC decoders significantly simplify routing because
only hard bits are propagated among component decoders.
Besides, we introduce an error detector before each SC
decoder to opportunistically reduce computation. If no error
is detected, we skip the SC decoding and directly output the
hard decisions.
To minimize the performance loss due to the above simplifi-
cations, we propose a genetic algorithm based LLR generator.
LLR input (for this iteration) is generated from SC decoding
output (from previous iterations) via a set of damping factors
to determine the amplitudes. The damping factors have a significant
impact on the decoding performance, and are “learned”
offline through a genetic algorithm based on unsupervised
learning. Compared with “hand-picked” parameters based on
greedy stepwise optimization, the proposed genetic algorithm
exhibits better performance.
II. STAGE PERMUTED PARALLEL DECODING
$G_N$-coset codes [3] natively support parallel decoding, as
the inner codes are independent. To decode these component
codes, various soft-output decoders, e.g., SCL, SC permutation
list and SCAN, are employed [3]. In this work, we propose
hard-output SC decoders to achieve higher area efficiency.
A. Parallel decoding framework
The parallel decoding framework in [3] is modified to
support SC component decoders. In Algorithm 1, a $G_N$-coset
code is alternately decoded on two factor graphs $G$ and $G_\pi$,
as shown in Fig. 1. The stage-permuted graph $G_\pi$ is generated
by swapping the inner codes and outer codes in $G$. Only the
inner codes of each graph $\Lambda \in \{G, G_\pi\}$ are decoded, and their
decoding outputs are exchanged between the decoding graphs.
The $\sqrt{N}$ component decoders can be implemented in parallel.
Now that we use SC to decode the component codes, the
hard output must be converted into soft LLRs as input for the
next iteration. Therefore, an LLR generator is placed before
the SC decoder (line 10). Meanwhile, an error detector is placed
before the LLR generator (line 6).
Algorithm 1 Parallel decoding framework.
Require: The received signal $y = \{y_i, i = 1 \cdots N\}$;
Ensure: The recovered codeword $\hat{x} = \{\hat{x}_i, i = 1 \cdots N\}$;
1: Initialize $L_{ch,i} \triangleq \frac{2y_i}{\sigma^2}, \forall i$; $e_{\pi,i} = 0$, $e_i = 0$, $\forall i$; $\Lambda = G$;
2: for iterations: $t = 1 \cdots t_{\max}$ do
3:   Select decoding graph: $\Lambda = G_\pi$ if $\Lambda == G$ else $G$;
4:   if $\Lambda$ is $G$ then
5:     for inner component codes: $i = 1 \cdots \sqrt{N}$ (in parallel) do
6:       $e_i = \mathrm{ErrorDetector}(\hat{x}^{t-1}_{\pi,i,\forall j})$;
7:       if $e_i == 0$ then
8:         $\hat{x}^{t}_{i,\forall j} = \hat{x}^{t-1}_{\pi,i,\forall j}$;
9:       else
10:        $L^{t}_{i,j} = \mathrm{LLRgen}(L_{ch,\,i+(j-1)\sqrt{N}}, \hat{x}^{t-1}_{\pi,i,j}, \hat{x}^{t-2}_{i,j}, e_{\pi,j})$, $\forall j$;
11:        $\hat{x}^{t}_{i,\forall j} = \mathrm{SCdecoder}(L^{t}_{i,\forall j})$;
12:      end if
13:    end for
14:  else
15:    for inner component codes: $i = 1 \cdots \sqrt{N}$ (in parallel) do
16:      $e_{\pi,i} = \mathrm{ErrorDetector}(\hat{x}^{t-1}_{\forall j,i})$;
17:      if $e_{\pi,i} == 0$ then
18:        $\hat{x}^{t}_{\pi,\forall j,i} = \hat{x}^{t-1}_{\forall j,i}$;
19:      else
20:        $L^{t}_{\pi,j,i} = \mathrm{LLRgen}(L_{ch,\,(i-1)\sqrt{N}+j}, \hat{x}^{t-1}_{j,i}, \hat{x}^{t-2}_{\pi,j,i}, e_{j})$, $\forall j$;
21:        $\hat{x}^{t}_{\pi,\forall j,i} = \mathrm{SCdecoder}(L^{t}_{\pi,\forall j,i})$;
22:      end if
23:    end for
24:  end if
25: end for
For decoding graph $G$ (resp. $G_\pi$), the $j$-th code bit of
the $i$-th inner component code is denoted by $x(i, j)$ (resp.
$x_\pi(j, i)$). Taking graph $G$ as an example, the hard outputs (HO)
from different component decoders of the previous iteration
are combined into $\hat{x}^{t-1}_{\pi,i,\forall j}$ and then sent for error detection
(line 6).
• If no error is detected, i.e., the error detection output
(E) is $e_i = 0$, then $\hat{x}^{t-1}_{\pi,i,\forall j}$ is directly taken as the new
“HO” result of this iteration (line 8), and SC decoding is
skipped.
• Otherwise, if $e_i = 1$, the LLR of code bit $x(i, j)$ in the
$t$-th iteration, denoted by $L^{t}_{i,j}$, is generated from the channel
LLR $L_{ch,\,i+(j-1)\sqrt{N}}$ and the previous “HO&E” results
(line 10). The generated LLRs are decoded by SC to
output new “HO” results (line 11).
Either way, new “HO&E” results are sent to the next iteration.
After $t_{\max}$ iterations, the algorithm outputs the estimated
codeword of the last decoding iteration as the result.
[Figure 2: block diagram with an error detector, an LLR generator, an SC decoder and two MUXes; the signals are labeled $L_{ch}$, E and HO.]
Fig. 2. A component decoder consists of an error detector, an LLR generator
and an SC decoder. The “E” result of this iteration is taken to switch the MUX.
If no error is detected (E=0), the “HO” result from the previous iteration is
directly taken as the new “HO” result of this iteration. Otherwise, LLRs are
generated for SC decoding to output a new “HO” result.
B. SC as component decoder
A component decoder consists of three parts, an error
detector, an LLR generator and an SC decoder (see Fig. 2).
An SC decoder takes soft LLR input but generates hard bits
output. The mismatch between hard output and soft input
poses a challenge for iterative decoding, as the hard output
cannot be directly used as soft input for the next iteration. To
solve this problem, an LLR generator is required to generate
soft values from the hard output.
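For reference, a hard-output SC decoder can be sketched recursively as follows. This is a generic textbook-style SC decoder for $x = u G_N$ in natural bit order, using the min-sum approximation for the check-node update; it is not the authors' hardware design. The function names, the `frozen` flag list, the zero value of frozen bits, and the convention that a non-negative LLR maps to bit 0 are assumptions of this sketch.

```python
import math


def f_minsum(a, b):
    """Check-node update (min-sum approximation)."""
    return math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))


def sc_decode(llr, frozen):
    """Return the hard codeword estimate for soft input llr.

    frozen[k] is True when u_{k+1} is frozen to 0. With G_N = [[G, 0], [G, G]],
    the codeword is x = (c1 xor c2, c2), where c1 and c2 are the codewords of
    the first and second half of u.
    """
    N = len(llr)
    if N == 1:
        if frozen[0]:
            return [0]
        return [0 if llr[0] >= 0 else 1]
    half = N // 2
    left, right = llr[:half], llr[half:]
    # Decode the first half from the check-node LLRs.
    c1 = sc_decode([f_minsum(a, b) for a, b in zip(left, right)], frozen[:half])
    # Decode the second half, using the hard decisions c1 as feedback.
    c2 = sc_decode([b + (1 - 2 * s) * a for a, b, s in zip(left, right, c1)],
                   frozen[half:])
    return [s ^ t for s, t in zip(c1, c2)] + c2


frozen = [True, True, True, False]                   # only u_4 carries information
x_hat_a = sc_decode([2.0, -0.5, 2.0, 2.0], frozen)    # one weak, wrong-signed LLR
x_hat_b = sc_decode([-2.0, -2.0, -2.0, -2.0], frozen)
print(x_hat_a, x_hat_b)
```

The input is a vector of real-valued LLRs while the output is a list of bits, which is the soft-in, hard-out mismatch that the LLR generator has to bridge.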
An error detector is placed before the LLR generator and it
serves two purposes. First, if its input vector (i.e., HO from the
previous iteration) is a codeword (no error detected), LLR gen-
eration and SC decoding can be skipped to save computation.
Second, it provides a way to estimate the reliability of hard
bits. Heuristically, if an input vector is already a codeword,
its bits are deemed more reliable. Otherwise, if error detection
fails, there is a chance that the errors cannot be corrected by
an SC decoder, which implies lower reliability. Therefore, error
detection results facilitate the “recovery” of soft LLRs for the
next iteration.
In practice, an error detector based on syndrome check can
be implemented by reusing the encoding circuit. It costs almost
no additional hardware resource.
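A minimal sketch of such a syndrome check, assuming the positions outside the information set carry zeros: because $G_N$ is its own inverse over GF(2), re-encoding the hard decisions recovers $\hat{u}$, and an error is flagged when any position of $\hat{u}$ outside the information set is nonzero. The butterfly transform and the names below are illustrative assumptions, not the circuit of [10].

```python
def polar_transform(x):
    """Compute x G_N over GF(2) with a butterfly network (len(x) must be a power of two)."""
    u = list(x)
    N = len(u)
    step = 1
    while step < N:
        for start in range(0, N, 2 * step):
            for k in range(start, start + step):
                u[k] ^= u[k + step]
        step *= 2
    return u


def error_detector(x_hat, info_set):
    """Return 0 if x_hat is a codeword, 1 otherwise.

    info_set holds 1-based information positions; all other positions of
    u_hat = x_hat G_N must be zero (an assumption of this sketch).
    """
    u_hat = polar_transform(x_hat)
    return int(any(bit for pos, bit in enumerate(u_hat, start=1)
                   if pos not in info_set))


info_set = {3, 4}
e_ok = error_detector([0, 1, 0, 1], info_set)    # a valid codeword
e_bad = error_detector([1, 1, 0, 1], info_set)   # first bit flipped
print(e_ok, e_bad)
```

The same `polar_transform` serves as the encoder, which mirrors the reuse of the encoding circuit mentioned above.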
The LLR generator is activated when error detection fails.
It takes four inputs: (i) the channel LLR, (ii, iii) the hard outputs
of the previous two iterations, and (iv) the error detection
output of the previous iteration.
Take the non-permuted graph $G$ as an example. For code bit
$x_{i,j}$, the input LLR is generated according to the previous-iteration
error detection output $e_{\pi,j} \in \{0, 1\}$.
If $e_{\pi,j} = 1$, meaning that error detection failed and the hard output
came from an SC decoder, the input LLR is the channel LLR plus
the scaled hard output of the previous iteration minus the scaled
hard output of the iteration before it:
$$L^{t}_{i,j} = L_{ch,\,i+(j-1)\sqrt{N}} + \frac{2\alpha_t}{\sigma^2}\left(1 - 2\hat{x}^{t-1}_{\pi,i,j}\right) - \frac{2\beta_t}{\sigma^2}\left(1 - 2\hat{x}^{t-2}_{i,j}\right), \tag{4}$$
where $\alpha_t$ and $\beta_t$ denote the damping factors,
which determine the amplitudes.
If $e_{\pi,j} = 0$, meaning that the hard output came directly from an error
detector since no error was found, the input LLR is the sum of
the channel LLR and the scaled hard output of the previous iteration:
$$L^{t}_{i,j} = L_{ch,\,i+(j-1)\sqrt{N}} + \frac{2\gamma_t}{\sigma^2}\left(1 - 2\hat{x}^{t-1}_{\pi,i,j}\right), \tag{5}$$
where the damping factor is denoted by $\gamma_t$.
Finally, the input LLR vector is sent to an SC decoder to
output new “HO” results.
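Equations (4) and (5) translate directly into a small function. The sketch below handles a single code bit; the function name, argument names and the numerical values in the usage lines are hypothetical and chosen only to exercise both branches.

```python
def llr_gen(l_ch, x_prev, x_prev2, e_prev, alpha, beta, gamma, sigma2):
    """Generate the SC input LLR for one code bit.

    l_ch    : channel LLR of the bit
    x_prev  : hard output of the previous iteration (other graph), 0 or 1
    x_prev2 : hard output two iterations ago (same graph), 0 or 1
    e_prev  : previous-iteration error detection output, 0 or 1
    alpha, beta, gamma : damping factors of the current iteration
    sigma2  : noise variance
    """
    s_prev = 1 - 2 * x_prev            # map bit {0, 1} to sign {+1, -1}
    if e_prev == 1:
        s_prev2 = 1 - 2 * x_prev2
        # Equation (4): previous output added, older output subtracted.
        return l_ch + 2 * alpha / sigma2 * s_prev - 2 * beta / sigma2 * s_prev2
    # Equation (5): the previous output passed the error detector.
    return l_ch + 2 * gamma / sigma2 * s_prev


llr_case4 = llr_gen(0.5, 1, 0, 1, alpha=0.4, beta=0.2, gamma=0.8, sigma2=0.5)
llr_case5 = llr_gen(0.5, 1, 0, 0, alpha=0.4, beta=0.2, gamma=0.8, sigma2=0.5)
print(llr_case4, llr_case5)
```

With all three damping factors set to 0, the function returns the channel LLR unchanged.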
III. GENETIC ALGORITHM BASED LLR GENERATOR DESIGN
The LLR generator, parameterized by the three damping
factors, has a significant impact on the overall performance.
Unfortunately, a theoretical optimum is difficult to obtain
due to the following reasons. First, the extrinsic information
transfer analysis is hard with the proposed component decoder.
Second, the output of the component decoder is correlated with
all of its inputs due to the loopy decoding graph. Both make
conventional density evolution methods inapplicable.
Artificial intelligence provides an alternative method in
the case where a precise theoretical approach is unavailable.
Recently, deep learning, reinforcement learning and genetic
algorithms have been applied to design better code constructions
[7] and decoding algorithms [8], [9].
Inspired by this, we exploit a genetic algorithm based on
unsupervised learning to design the damping factors. Damping
factors play a role similar to that of chromosomes in the genetic
algorithm, because they both individually and collaboratively
contribute to the fitness of a candidate. A good candidate
requires that all its damping factors are respectively good.
As such, a pair of good parents is likely to produce a good
offspring, and this suggests that the genetic algorithm may
ultimately converge to a good candidate.
First, we start the genetic algorithm by initializing a
population of size $M$. Each candidate contains $3t_{\max}$ damping
factors, namely $\alpha_t$, $\beta_t$ and $\gamma_t$ for $t = 1, 2, \ldots, t_{\max}$, where $t_{\max}$
denotes the maximum number of decoding iterations. We initialize each
candidate as follows.
• Without any prior knowledge, the initial damping
factors are sampled from a uniform distribution
$U(0, v_{\text{sup}})$. By adjusting the parameter $v_{\text{sup}}$, we can trade
optimality (a larger $v_{\text{sup}}$) for convergence rate (a smaller
$v_{\text{sup}}$).
We observe that $\alpha_1$, $\beta_1$ and $\gamma_1$ (used to calculate the LLRs for
the first decoding iteration) can be directly set to 0 without
any performance loss, since there is no information from a
previous iteration. Similarly, $\beta_2$ is directly set to 0.
The population is evaluated through Monte Carlo simulation
and then ordered by decoding performance. The minimum
signal-to-noise ratio required to achieve a target block error rate
(SNR@targetBLER) is taken as the performance metric.
TABLE I
HYPERPARAMETERS OF THE GENETIC ALGORITHM
Parameter                                     Value
Population size ($M$)                         32
$v_{\text{sup}}$                              2
Sample focus ($\lambda$)                      0.01
Mutate probability ($p_{\text{mutate}}$)      0.07
Mutate variance ($\sigma_{\text{mutate}}$)    0.3
[Figure 3: “Evolution process of the required SNR @ BLER=$10^{-3}$”. x-axis: Iteration (0 to 3000); y-axis: SNR@BLER=$10^{-3}$ (6.5 to 9). Curves: learning trajectories and “hand-picked” parameters, each for $G_N$-coset + SC + LLRgen and for $G_N$-coset + SC. Marked points (iteration, SNR): Case 1 (208, 8.056), Case 2 (838, 7.396), Case 3 (3156, 7.322), Case 4 (447, 7.166), Case 5 (1237, 6.895), Case 6 (3000, 6.823); Baseline a: 7.51; Baseline b: 7.02.]
Fig. 3. The learning trajectories of the SNR@BLER $= 10^{-3}$. After about
600 iterations, the genetic algorithm “learned” better damping factors than
the “hand-picked” ones based on greedy stepwise optimization. After about
3000 iterations, the algorithms in both cases converge.
Then, the algorithm enters a loop consisting of four steps.
1) Select two distinct parents from the population. The
$i$-th candidate is selected with the (normalized) probability
$$\frac{e^{-\lambda i}}{\sum_{j=1}^{M} e^{-\lambda j}},$$
where $\lambda$ is called the sample focus. In this way, a better
candidate is selected with a higher probability. By adjusting
the parameter $\lambda$, we can trade off between exploitation
(a larger $\lambda$) and exploration (a smaller $\lambda$).
2) Crossover between parents to produce an offspring.
Specifically, each damping factor of the offspring is
randomly selected from the corresponding ones of its
parents.
3) Mutate the offspring randomly. This is implemented
by independently mutating each damping factor with
probability $p_{\text{mutate}}$. Specifically, if a damping factor
is mutated, a random value sampled from the Gaussian
distribution $\mathcal{N}(0, \sigma^2_{\text{mutate}})$ is added to it. By adjusting
$p_{\text{mutate}}$ and $\sigma^2_{\text{mutate}}$, we can trade off optimality (larger
$\sigma^2_{\text{mutate}}$ and $p_{\text{mutate}}$) and convergence rate (smaller
$\sigma^2_{\text{mutate}}$ and $p_{\text{mutate}}$).
4) Insert the offspring back into the population according to
its decoding performance.
The algorithm loop is terminated after reaching a maximum
number of iterations.
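The initialization and the four-step loop can be sketched in Python as below. Several details are assumptions of this sketch and are not stated in the text: the worst candidate is dropped after each insertion so that the population size stays at $M$; the damping factors fixed to 0 ($\alpha_1$, $\beta_1$, $\gamma_1$, $\beta_2$) are never mutated; the value 0.3 from Table I is used as the standard deviation of the mutation noise; and `toy_fitness` is a hypothetical stand-in for the Monte Carlo evaluation of SNR@targetBLER (lower is better).

```python
import math
import random


def init_candidate(t_max, v_sup, rng):
    """A candidate is a list of [alpha_t, beta_t, gamma_t] triples, t = 1..t_max."""
    cand = [[rng.uniform(0, v_sup) for _ in range(3)] for _ in range(t_max)]
    cand[0] = [0.0, 0.0, 0.0]            # alpha_1 = beta_1 = gamma_1 = 0
    if t_max > 1:
        cand[1][1] = 0.0                 # beta_2 = 0
    return cand


def select_parents(M, lam, rng):
    """Draw two distinct ranks (0 = best) with probability proportional to exp(-lam * i)."""
    weights = [math.exp(-lam * (i + 1)) for i in range(M)]
    first = rng.choices(range(M), weights=weights)[0]
    rest = [i for i in range(M) if i != first]
    second = rng.choices(rest, weights=[weights[i] for i in rest])[0]
    return first, second


def crossover_and_mutate(p1, p2, p_mutate, sigma_mutate, rng):
    child = []
    for t, (a, b) in enumerate(zip(p1, p2)):
        row = []
        for k in range(3):
            value = rng.choice((a[k], b[k]))                  # crossover
            fixed_zero = (t == 0) or (t == 1 and k == 1)
            if not fixed_zero and rng.random() < p_mutate:    # mutation
                value += rng.gauss(0.0, sigma_mutate)
            row.append(value)
        child.append(row)
    return child


def evolve(fitness, t_max, M=32, v_sup=2.0, lam=0.01, p_mutate=0.07,
           sigma_mutate=0.3, iterations=200, seed=0):
    rng = random.Random(seed)
    population = sorted((init_candidate(t_max, v_sup, rng) for _ in range(M)),
                        key=fitness)
    for _ in range(iterations):
        i, j = select_parents(M, lam, rng)
        child = crossover_and_mutate(population[i], population[j],
                                     p_mutate, sigma_mutate, rng)
        # Insert by performance and drop the worst candidate (assumption).
        population = sorted(population + [child], key=fitness)[:M]
    return population


def toy_fitness(cand):
    """Hypothetical placeholder for SNR@targetBLER; lower is better."""
    return sum((v - 0.5) ** 2 for row in cand[1:] for v in row)


initial = evolve(toy_fitness, t_max=4, iterations=0)
final = evolve(toy_fitness, t_max=4, iterations=50)
print(toy_fitness(initial[0]), toy_fitness(final[0]))
```

In an actual design run, `toy_fitness` would be replaced by a simulation that decodes with the candidate damping factors and returns the required SNR, which is by far the most expensive part of the loop.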
IV. PERFORMANCE EVALUATION
We evaluate the performance gains brought by the proposed
LLR generator and the genetic algorithm, respectively. The
hyperparameters$^1$ of the genetic algorithm are provided in
Table I. The learning trajectories of the required SNR to
achieve BLER $= 10^{-3}$ are presented in Fig. 3. They show that
the decoding performance first improves rapidly as the genetic
algorithm iterates and then converges gradually.
$^1$ Further optimization of the hyperparameters may bring improved
performance. This is outside the scope of this paper.
Two types of gains can be observed from Fig. 3. First,
the gain brought by the proposed LLR generator is 0.5 dB at
BLER $= 10^{-3}$, for both converged and non-converged cases.
The error detection results facilitate the “recovery” of soft
LLRs in the proposed LLR generator, leading to the observed
gain.
Second, we exemplify the “learning gain” through three
points on the learning curve and present their BLER performances
in Fig. 4. On the one hand, this demonstrates the effectiveness
of the genetic algorithm in designing good damping factors.
With the converged damping factors in Table II, the proposed
scheme is 0.2 dB better than the best “hand-picked” damping
factors$^2$. On the other hand, it confirms that the component
decoder with the proposed LLR generator exhibits better
decoding performance than the case without it.
TABLE II
THE DAMPING FACTORS DESIGNED BY THE GENETIC ALGORITHM
$i$    $\alpha_i$    $\beta_i$    $\gamma_i$
1      0         0         0
2      0.2680    0         1.9997
3      0.4236    0.2075    0.6695
4      0.5051    0.2542    0.8296
5      0.6147    0.3574    0.7598
6      1.2661    0.9922    0.7647
7      0.4054    0.2714    0.7851
8      0.5360    0.1566    0.8723
[Figure 4: BLER ($10^{0}$ down to $10^{-4}$) versus $E_s/N_0$ (6 to 9 dB), $K = 14161$, QPSK/AWGN. Curves: Cases 1 to 3 and Baseline a ($G_N$-coset + SC); Cases 4 to 6 and Baseline b ($G_N$-coset + SC + LLRgen).]
Fig. 4. With converged damping factors, the gains brought by the proposed
LLR generator and the genetic algorithm are 0.7 dB and 0.2 dB at BLER $= 10^{-4}$.
$^2$ The “hand-picked” method is a greedy stepwise optimization that chooses
the best damping factors in every iteration.
Next, we compare our scheme with several baselines from the
literature.
1) The same code construction decoded by the parallel
soft-output decoding algorithm [3]. This scheme exhibits a
similar degree of parallelism to the proposed decoding
algorithm, but incurs higher implementation complexity
due to the difficulty in handling the internal decoder data
flow.
2) A polar code with the same length and code rate,
evaluated under SC decoding. It enjoys more coding
gain but incurs larger decoding latency due to the serial
nature of SC decoding.
3) A recently proposed polar coding scheme with a similar
target of terabit/s throughput [6], which employs
an unrolled hardware architecture for high throughput.
“Unrolling” is only applicable to relatively short codes
(e.g., 1024) and thus sacrifices coding gain.
[Figure 5: two panels, $N = 16384$, $K = 14043$ and $N = 16384$, $K = 14161$. x-axis: $E_s/N_0$ (6 to 8 dB); y-axis: Block Error Rate ($10^{0}$ down to $10^{-4}$). Curves: $G_N$-coset + SC + LLRgen; $G_N$-coset + SC; $G_N$-coset + SCAN; Polar + SC ($N = 16384$); Polar + SC ($N = 1024$).]
Fig. 5. Compared with Type-1 and Type-2 baselines, the proposed decoder
only trades a 0.25 dB to 0.3 dB loss at BLER $= 10^{-4}$ for improved area
efficiency and reduced decoding latency. Compared with the Type-3 baseline,
the proposed scheme exhibits a 0.75 dB gain at BLER $= 10^{-4}$. The polar
codes are constructed by Gaussian approximation at $E_s/N_0 =$ 6.3 dB, 6.8 dB,
6.0 dB and 6.8 dB for code rates 14161/16384, 885/1024, 14043/16384
and 877/1024, respectively.
[Figure 6: two panels, $N = 16384$, $K = 14161$ and $N = 16384$, $K = 14043$. x-axis: $E_s/N_0$ (5 to 7 dB); y-axis: Complexity Ratio (0 to 1). Curve: $G_N$-coset.]
Fig. 6. At BLER $= 10^{-4}$, bypassing SC decoding reduces decoding
complexity by 75%.
The evaluation results are presented in Fig. 5. Compared
with Type-1 and Type-2 baselines, the proposed decoder only
trades a 0.25 dB to 0.3 dB loss at BLER $= 10^{-4}$ for improved
area efficiency and reduced decoding latency. Compared with the
Type-3 baseline, the proposed scheme exhibits a 0.75 dB gain at
BLER $= 10^{-4}$.
Then, we evaluate the complexity reduction due to skipped
SC decoding. The number of activated SC decoders is measured
to evaluate the complexity. The results, presented
in Fig. 6, show that the complexity reduction ratio varies
with SNR. At higher SNR (lower BLER), the complexity
reduction is larger. At BLER $= 10^{-4}$, bypassing SC
decoding reduces decoding complexity by 75%.
Finally, the area efficiency of the proposed decoder is
presented in Table III (see details in our ASIC implementation
[10]). With the TSMC 16 nm process, the area efficiency for
code rate 14161/16384 is 75 Gbps/mm$^2$ when the maximum
number of iterations is eight. The equivalent area efficiency under
7 nm technology is about 322 Gbps/mm$^2$ with eight iterations
and 533 Gbps/mm$^2$ with five iterations.
TABLE III
DECODER AREA EFFICIENCY
Info size   Iterations   Latency (ns)   Area eff. at 16 nm (Gbps/mm$^2$)   Converted to 10 nm   Converted to 7 nm
14161       5            109.25         120.73                             277.69               533.16
14161       6            131.1          100.61                             231.41               444.30
14161       7            152.95         86.24                              198.35               380.83
14161       8            174.8          75.46                              173.55               322.22
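The latency and 16 nm area-efficiency columns of Table III can be cross-checked with a few lines of arithmetic. The sketch below assumes that one codeword of $K = 14161$ information bits is delivered per latency period, so that throughput equals $K$ divided by latency; this definition is an assumption of the sketch and is not stated here (see [10] for the actual architecture). Under that assumption, every row corresponds to the same per-iteration latency of 21.85 ns and to the same implied decoder area of roughly 1.07 mm$^2$, so the area efficiency scales inversely with the number of iterations.

```python
K = 14161  # information bits per codeword (Table III)

# iterations -> (latency in ns, area efficiency in Gbps/mm^2 at 16 nm), from Table III
table_iii = {
    5: (109.25, 120.73),
    6: (131.1, 100.61),
    7: (152.95, 86.24),
    8: (174.8, 75.46),
}

per_iteration_ns = {}
implied_area_mm2 = {}
for iters, (latency_ns, area_eff) in table_iii.items():
    per_iteration_ns[iters] = latency_ns / iters
    throughput_gbps = K / latency_ns          # bits per ns equals Gbps (assumed definition)
    implied_area_mm2[iters] = throughput_gbps / area_eff

for iters in table_iii:
    print(iters, round(per_iteration_ns[iters], 2), round(implied_area_mm2[iters], 3))
```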
V. CONCLUSIONS
In this work, we propose a low-complexity parallel decoding
algorithm for $G_N$-coset codes. The framework exploits two
equivalent decoding graphs. For each graph, the inner component
codes are independent and support parallel decoding.
The component decoder adopts a novel design comprising an
error detector, an LLR generator and an SC decoder. The
LLR generator, parameterized by a set of damping factors,
is “learned” offline by a genetic algorithm based on unsupervised
learning. The proposed decoding algorithm achieves performance
comparable to that of soft-output component
decoders and conventional polar codes, but requires much lower
decoding and hardware implementation complexity.
REFERENCES
[1] E. Arıkan, “Channel Polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” in
IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051-3073,
July 2009.
[2] S. Kudekar, S. Kumar, M. Mondelli, H. D. Pfister, E. Şaşoğlu and
R. L. Urbanke, “Reed-Muller codes achieve capacity on erasure channels,”
in IEEE Transactions on Information Theory, vol. 63, no. 7, pp. 4298-
4316, July 2017.
[3] X. Wang, H. Zhang, R. Li, J. Tong, Y. Ge, and J. Wang, “On the
construction of $G_N$-coset codes for parallel decoding,” accepted by IEEE
Wireless communications and Networking Conference, 2020 (available:
https://arxiv.org/abs/1904.13182).
[4] H. Zhang et al., “A flip-syndrome-list polar decoder architecture for ultra-
low-latency communications,” in IEEE Access, vol. 7, pp. 1149-1159,
2018.
[5] X. Liu et al., “A 5.16Gbps decoder ASIC for polar code in 16nm FinFET,”
2018 15th International Symposium on Wireless Communication Systems
(ISWCS), Lisbon, 2018, pp. 1-5.
[6] A. Süral, E. G. Sezer, Y. Ertuğrul, O. Arıkan and E. Arıkan, “Terabits-
per-second throughput for polar codes,” 2019 IEEE 30th International
Symposium on Personal, Indoor and Mobile Radio Communications
(PIMRC Workshops), Istanbul, Turkey, 2019, pp. 1-7.
[7] L. Huang, H. Zhang, R. Li, Y. Ge and J. Wang, “AI coding: learning to
construct error correction codes,” in IEEE Transactions on Communica-
tions, vol. 68, no. 1, pp. 26-39, Jan. 2020.
[8] X. Wang et al., “Learning to flip successive cancellation decoding of
polar codes with LSTM networks,” 2019 IEEE 30th Annual International
Symposium on Personal, Indoor and Mobile Radio Communications
(PIMRC), Istanbul, Turkey, 2019, pp. 1-5.
[9] F. Carpi, C. Häger, M. Martalò, R. Raheli, and H. D. Pfister, “Reinforcement
learning for channel coding: learned bit-flipping decoding,”
2019 57th Annual Allerton Conference on Communication, Control, and
Computing (Allerton), Monticello, IL, USA, 2019, pp. 922-929.
[10] J. Tong, X. Wang, Q. Zhang, H. Zhang, S. Dai, R. Li, and J. Wang,
“Toward terabits-per-second communications: a hardware implementation
of high-throughput $G_N$-coset codes,” Available: https://arxiv.org/, 2020.
