High-Throughput VLSI Architecture for GRAND
Syed Mohsin Abbas, Thibaud Tonnellier, Furkan Ercan, and Warren J. Gross
Department of Electrical and Computer Engineering
McGill University, Montréal, Québec, Canada
Emails: syed.abbas@mail.mcgill.ca, thibaud.tonnellier@mcgill.ca, furkan.ercan@mail.mcgill.ca, warren.gross@mcgill.ca
Abstract—Guessing Random Additive Noise Decoding
(GRAND) is a recently proposed universal decoding algorithm
for linear error correcting codes. Since GRAND does not
depend on the structure of the code, it can be used for any
code encountered in contemporary communication standards
or may even be used for random linear network coding. This
property makes this new algorithm particularly appealing.
Instead of trying to decode the received vector, GRAND
attempts to identify the noise that corrupted the codeword.
To that end, GRAND relies on the generation of test error
patterns that are successively applied to the received vector. In
this paper, we propose the first hardware architecture for the
GRAND algorithm. Considering GRAND with ABandonment
(GRANDAB), which limits the number of test patterns, the
proposed architecture needs only $2 + \sum_{i=2}^{n} \lfloor i/2 \rfloor$ time steps to
perform the $\sum_{i=1}^{3} \binom{n}{i}$ queries required when AB = 3. For
a code length of 128, the number of time steps is only a
fraction (1.2%) of the total number of queries. Synthesis results
using TSMC 65 nm CMOS technology show that average throughputs of 32
Gbps to 64 Gbps can be achieved at an SNR of 10 dB for a code
length of 128 and code rates higher than 0.75, transmitted
over an AWGN channel. Comparisons with a decoder tailored
for a (79, 64) BCH code show that the proposed architecture
can achieve a slightly higher average throughput at high SNRs,
while obtaining the same decoding performance.
Index Terms—Error correcting code (ECC), guessing random
additive noise decoding (GRAND), maximum likelihood decoding
(MLD), VLSI architecture.
I. INTRODUCTION
Since Shannon's landmark paper [1] in 1948, one
of the goals of researchers in the field of information theory
has been to find good error correcting codes that can be
efficiently decoded. As early as 1950, Hamming proposed his
eponymous codes, which can always correct one error [2]. Ten
years later, Bose-Chaudhuri-Hocquenghem (BCH) codes [3],
[4] were discovered. They have the convenient property that
the number of correctable errors is chosen by design. To
find the locations of the errors, two main algorithms can be
considered: the Berlekamp-Massey algorithm [5], [6] or the
Peterson-Gorenstein-Zierler (PGZ) algorithm [7]. After major
advances and rediscoveries in the 1990s, polar codes were
proposed in 2008 along with their decoding algorithm, the
successive cancellation (SC) algorithm [8]. This is the first
class of codes proven to asymptotically reach the Shannon
limit. Serially concatenated with an outer cyclic redundancy
check (CRC) code [9], polar codes have been selected as
part of the 5G New Radio (NR) standard [10]. All of these
codes require dedicated decoding algorithms, and a decoder
tailored to one algorithm cannot be directly reused
for another.
Recently, a universal decoding algorithm for linear codes
has been proposed [11]. Named Guessing Random Additive
Noise Decoding (GRAND), the algorithm does not rely on
the underlying channel code. Instead of using the properties
and the structure of the code to identify the errors that may
occur at the reception due to the channel noise, GRAND
guesses the noise present in the received vector. In other
words, GRAND is noise-centric rather than being code-centric
and can tackle any of the aforementioned coding schemes that
have been developed over the course of information theory.
GRAND is not the only decoding algorithm that is code
agnostic. However, for high-rate codes, GRAND has a
lower computational complexity than a brute-force search [11]
and does not require the costly Gaussian elimination needed
for information set decoding [12]. Gaussian elimination
is also required for random linear network coding [13] and
GRAND could be an efficient way to reduce the computational
complexity of this powerful encoding scheme. In addition, an
approach similar to GRAND has been used to lower the error
floor of turbo codes in [14], which could lead to the use of
GRAND in conjunction with usual decoding techniques.
To identify the noise, GRAND has three main steps. First,
error patterns are generated in a specific order. Then, the
error patterns are combined with the received vector, and
finally, the resulting words are queried for codebook
membership. Generating all possible error patterns is
impractical and undesirable. Thus, GRAND with ABandonment
(GRANDAB) has also been proposed to limit the number of
queries performed during the process [11].
In [15], the application of GRANDAB for short length and
high rate CRC-polar codes encountered in the 5G NR standard
has been demonstrated. Considering a code of length 128 and
up to 3 errors, a maximum number of 349 632 queries may
be required. However, it has been observed that the average
number of queries is much smaller than in the worst-case
scenario for practical signal-to-noise ratio (SNR) conditions.
Thus, GRANDAB offers high throughput with low average
latency in moderate-to-high SNR regimes and is tailored to
high code rates, both of which are particularly crucial for ECC
storage applications.
In [16] and in [17], GRAND is enhanced to consider soft-
information at its input. Impressive decoding performance is
achieved, where substantial gains over the SC-List [18] are
presented. However, we limit the scope of this work to hard
input decoding.
arXiv:2007.07328v1 [cs.IT] 14 Jul 2020
In this paper, we propose a high throughput hardware
architecture for GRANDAB. To this end, we first show how
the computations required by the GRAND algorithm can be
shared, using basic linear algebra. Then we propose an efficient
hardware architecture that exploits this sharing. To the best of our
knowledge, this is the first hardware architecture implementing
the GRAND algorithm. Considering a code of length 128, and
with a correction capability of up to three errors, the proposed
architecture can achieve an average coded throughput of up to
64 Gbps. Moreover, the proposed architecture can achieve the
same average throughput as a recently proposed decoder that
can only consider a (79, 64) BCH code.
The rest of this work is organized as follows: In Section II,
preliminaries regarding linear codes and GRAND algorithms
are given. In Section III, the proposed hardware architecture
is detailed. In Section IV, synthesis results and a comparison
with a state-of-the-art BCH decoder are given. Finally, concluding
remarks are drawn in Section V.
II. PRELIMINARIES
A. Notations
Matrices are denoted by bold upper-case letters ($\mathbf{M}$),
while vectors are denoted by bold lower-case letters ($\mathbf{v}$).
The transpose operator is represented by $^\top$. The number of
k-combinations from a given set of n elements is denoted by
$\binom{n}{k}$. $\mathbb{1}_n$ is the indicator vector where all locations except the
nth are 0 and the nth is 1. All indices start at 1.
B. Linear block codes
Due to their convenient representations, linear block codes
are a class of error-correcting codes widely adopted by
communication standards. In the following, we restrict ourselves
to operations in the Galois field with 2 elements, denoted $\mathbb{F}_2$.
A block code is a mapping $g : \mathbb{F}_2^k \to \mathbb{F}_2^n$, where $k < n$. This
way, a vector $u$ of size $k$ maps to a vector $c$ of size $n$. The
set of the $2^k$ vectors $c$ is called a code $\mathcal{C}$, whose elements $c$
are called codewords. The ratio $R \triangleq k/n$ is the code rate. If $g$
is a linear mapping, then $\mathcal{C}$ is a linear block code. Thus, there
exists a $k \times n$ matrix $G$, called the generator matrix of the code $\mathcal{C}$,
and the encoding process can be realized as a vector-matrix
product: $c = u \cdot G$. We can define $H$, the $(n-k) \times n$ generator
matrix of the dual code of $\mathcal{C}$. $H$ is also called the parity-check
matrix of $\mathcal{C}$ and satisfies the following property:
$$\forall c \in \mathcal{C}, \quad H \cdot c^\top = 0. \qquad (1)$$
Consider that $c$ has been transmitted over a noisy channel
and that $r$ is received at the output of the channel. Because
of the channel noise, $r$ can differ from $c$. Therefore, we can
express the relationship between $r$ and $c$ as $r = c \oplus e$,
where $e$ is the error pattern caused by the channel noise. The
syndrome is defined by $s \triangleq H \cdot r^\top$. According to (1), $s$ is
zero if and only if $r$ is a codeword. Thus, if $s$ is zero, either
there is no error or the error pattern is itself a codeword. This
is the basic principle of standard array decoding [19], and
also of GRAND.
Algorithm 1: GRAND for linear codes
Input: $H$, $G^{-1}$, $r$
Output: $\hat{u}$
  $e \leftarrow 0$
  while $H \cdot (r \oplus e)^\top \neq 0$ do
      $e \leftarrow$ generateNewErrorPattern()
  $\hat{u} \leftarrow (r \oplus e) \cdot G^{-1}$
  return $\hat{u}$
C. Maximum Likelihood Decoding via GRAND
Guessing Random Additive Noise Decoding (GRAND) is a
recently proposed hard detection decoder that has been proven
to be a maximum likelihood (ML) decoder [11]. Algorithm 1
summarizes the steps of the GRAND procedure. The principle
of GRAND is to generate test error patterns, to apply them to
the received vector, and to check whether the resulting candidate is
a codeword by verifying that
$$H \cdot (r \oplus e)^\top \qquad (2)$$
is equal to zero. If so, $\hat{c} \triangleq r \oplus e$ is the estimated codeword.
To complete the decoding, $\hat{c}$ has to be converted into the
estimated message: $\hat{u} \triangleq \hat{c} \cdot G^{-1}$, where $G^{-1}$ is the $n \times k$
matrix such that $G \cdot G^{-1}$ is the identity matrix of size $k$. Note
that this step is not required for systematic codes, since the
message bits directly appear in the codeword.
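Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that error patterns are generated in order of increasing Hamming weight, represents each column of $H$ as an integer bitmask, stops at a maximum weight (as GRANDAB does), and returns the estimated codeword rather than the message. The names `syndrome`, `grand_decode`, and `H_COLS` are hypothetical, and the (7, 4) Hamming code is used only as a small test case.

```python
from itertools import combinations

def syndrome(H_cols, word):
    """Return H * word^T over F2; H is given as a list of column bitmasks."""
    s = 0
    for col, bit in zip(H_cols, word):
        if bit:
            s ^= col
    return s

def grand_decode(H_cols, r, max_weight):
    """Guess error patterns by increasing Hamming weight (assumed order).

    Returns (estimated codeword, flipped positions), or (None, None) when
    the search is abandoned after max_weight is exceeded.
    """
    n = len(r)
    for weight in range(max_weight + 1):
        for flips in combinations(range(n), weight):
            candidate = list(r)
            for i in flips:
                candidate[i] ^= 1
            if syndrome(H_cols, candidate) == 0:
                return candidate, flips
    return None, None

# (7, 4) Hamming code: column i (1-based) of H is the binary expansion of i.
H_COLS = [1, 2, 3, 4, 5, 6, 7]
codeword = [1, 1, 1, 0, 0, 0, 0]      # columns 1 ^ 2 ^ 3 = 0, so this is a codeword
received = list(codeword)
received[4] ^= 1                       # single bit flip at 0-based position 4
decoded, flips = grand_decode(H_COLS, received, max_weight=1)
```

In this sketch positions are 0-based, whereas the paper indexes from 1.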
The most important property of GRAND is that it imposes
no condition on the code other than linearity. Thus,
GRAND can be used for any linear code, provided that its
parity-check matrix ($H$) is available. To the best of our
knowledge, the only other decoders that can decode any linear
code are the brute-force ML decoder and information set
decoding [12], or its more recent version known as ordered
statistics decoding [20]. However, the brute-force ML decoder is
impractical for high-rate codes since all $2^k$ codewords have to be
evaluated, while the two others require Gaussian elimination,
which is challenging to implement efficiently in hardware [21].
D. GRAND with Abandonment
To limit the computational complexity of the GRAND
algorithm, GRAND with ABandonment (GRANDAB) is also
proposed in [11]. In that case, the decoder abandons the
search for the error pattern after a fixed number of queries
is reached. Therefore, GRANDAB results in approximate
ML decoding. The notation GRANDAB with AB = t means
that the Hamming weight of the considered error patterns does
not exceed t. As a result, the maximum number of queries for
a code of length n is given by
$$\sum_{i=1}^{t} \binom{n}{i}. \qquad (3)$$
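As a quick numerical check of (3), the following sketch evaluates the bound for the parameters used in this paper. The function name `max_queries` is hypothetical.

```python
from math import comb

def max_queries(n: int, t: int) -> int:
    """Maximum number of GRANDAB queries for code length n and AB = t, as in (3)."""
    return sum(comb(n, i) for i in range(1, t + 1))

queries_128_ab3 = max_queries(128, 3)   # 128 + 8128 + 341376
queries_79_ab2 = max_queries(79, 2)     # 79 + 3081
```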
For illustrative purposes, Fig. 1 compares the frame error
rate (FER) performance obtained with GRANDAB (AB = 3),
using CRC codes with different rates ($R \geq 0.75$). All
codes have a length n of 128. The generator polynomials
are 0x04C11DB7, 0xB2B117, 0x1021, and 0xD5 for
k = 96, k = 104, k = 112, and k = 120, respectively.
BPSK modulation and an AWGN channel with variance
$\sigma^2$ are considered. The SNR in dB is defined as
$\mathrm{SNR} = -10 \log_{10} \sigma^2$. The demodulator provides hard decisions to
the GRANDAB decoder. Observe that the FER performance
improves with the number of redundancy bits, up to the
case CRC(128, 104). Adding more redundancy bits does
not improve the decoding performance since the considered
version of GRANDAB cannot correct more than 3 errors.
[Figure 1: FER versus SNR (dB) for CRC(128,120), CRC(128,112), CRC(128,104), and CRC(128, 96).]
Fig. 1. Comparison of the GRANDAB (AB = 3) decoding performance
using CRC codes for several rates and n = 128.
Nevertheless, GRANDAB is an effective way of decoding
any code, especially compared to its high-complexity agnostic
hard decoder counterparts, for which an impractical number
of computations is required. With n = 128 and AB = 3,
a maximum of 349 632 queries may be required. Despite this
large number of queries, at a target FER of $10^{-4}$, the average
number of queries becomes 445, 412, 4.58, and 1.01 for
k = 96, k = 104, k = 112, and k = 120, respectively. This
is another advantage of GRAND: the average computational
complexity decreases sharply as channel conditions improve.
III. VLSI ARCHITECTURE FOR GRAND
In this section, we provide details of the proposed VLSI
architecture for GRANDAB (AB = 3) decoding of linear
codes. Since GRAND decoding is agnostic to the underlying
channel code, the proposed VLSI architecture can be used
to decode any linear block code conforming with the length
and rate constraints, given the parity check matrix (H) of
that code. Before presenting the details of our proposed VLSI
architecture, we provide a minimal mathematical background
– exploiting the linearity of the considered codes – required
to simplify the problem.
A. Computations reformulation
For the one-bit-flip error patterns $\mathbb{1}_i$, with $i \in \{1, \dots, n\}$,
using the distributivity rule, (2) can be written as
$$H \cdot (r \oplus \mathbb{1}_i)^\top = H \cdot r^\top \oplus H \cdot \mathbb{1}_i^\top, \qquad (4)$$
where $H \cdot r^\top$ is the $(n-k)$-bit syndrome associated with
the received vector $r$ and $H \cdot \mathbb{1}_i^\top$ is the $(n-k)$-bit syndrome
associated with the one-bit-flip error pattern $\mathbb{1}_i$.
Fig. 2. Content of the dials for checking the one-bit-flip error patterns
(Dial 1 holds $s_1, \dots, s_n$; Dial 2 holds null vectors).
Noticing that the two-bit-flip noise sequences $\mathbb{1}_{i,j}$, with
$i, j \in \{1, \dots, n\}$ and $i \neq j$, can be written as
$\mathbb{1}_{i,j} = \mathbb{1}_i \oplus \mathbb{1}_j$, (2) can be expressed as
$$H \cdot (r \oplus \mathbb{1}_{i,j})^\top = H \cdot r^\top \oplus H \cdot \mathbb{1}_i^\top \oplus H \cdot \mathbb{1}_j^\top, \qquad (5)$$
for the two-bit-flip case. Similarly, the three-bit-flip noise
sequences $\mathbb{1}_{i,j,k}$, where $i$, $j$, and $k$ are the flipped bit positions,
can be checked for code membership with
$$H \cdot (r \oplus \mathbb{1}_{i,j,k})^\top = H \cdot r^\top \oplus H \cdot \mathbb{1}_i^\top \oplus H \cdot \mathbb{1}_j^\top \oplus H \cdot \mathbb{1}_k^\top. \qquad (6)$$
Equations (4)-(6) are the core of the proposed VLSI
architecture. By combining several one-bit-flip noise
sequence syndromes, it is possible to compute all the queries
corresponding to multiple bit flips. In the following, we denote
by $s_i$ the syndrome corresponding to the one-bit-flip error
pattern at location $i$: $s_i = H \cdot \mathbb{1}_i^\top$, which also corresponds
to the $i$th column of the parity-check matrix.
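The identity in (6) can be checked numerically. The sketch below is illustrative only: it assumes a randomly drawn parity-check matrix (stored as column bitmasks) and a random received vector, and the names `H_cols`, `flipped`, and `all_match` are hypothetical. It verifies that recomputing the syndrome of the flipped word gives the same result as XORing the received syndrome with three columns of $H$.

```python
import random
from itertools import combinations

def syndrome(H_cols, word):
    """Return H * word^T over F2; H is given as a list of column bitmasks."""
    s = 0
    for col, bit in zip(H_cols, word):
        if bit:
            s ^= col
    return s

def flipped(word, positions):
    """Return a copy of word with the given (0-based) positions flipped."""
    out = list(word)
    for p in positions:
        out[p] ^= 1
    return out

random.seed(0)
n, n_minus_k = 16, 6                       # small assumed sizes for the check
H_cols = [random.randrange(1, 2 ** n_minus_k) for _ in range(n)]
r = [random.randint(0, 1) for _ in range(n)]
s_r = syndrome(H_cols, r)                  # computed once, then shared

# Equation (6): the full recomputation equals s_r XOR s_i XOR s_j XOR s_k.
all_match = all(
    syndrome(H_cols, flipped(r, (i, j, k))) == s_r ^ H_cols[i] ^ H_cols[j] ^ H_cols[k]
    for i, j, k in combinations(range(n), 3)
)
```

The right-hand side needs only three XORs per query, which is the sharing that the dials exploit.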
B. Principle, Details and Scheduling
The scheduling of the proposed architecture comprises four
fundamental decoding steps. In the first step, the syndrome of
the received word ($H \cdot r^\top$) is computed. In the second step,
all the error patterns with a Hamming weight of 1 are
independently combined with the syndrome of the received word.
In the third and fourth steps, Hamming weights of 2 and 3 are
considered, respectively. During any of these steps, as soon
as (2) evaluates to zero, the corresponding estimated word is
output and the procedure terminates.
To efficiently generate the different error patterns, the proposed
architecture is based on what we call dials.
A dial is an $n \times (n-k)$-bit register file that stores all
the $n$ syndromes associated with the one-bit-flip error patterns
($s_i$). The dial can shift its content cyclically at each time
step: the content of the second row is shifted to the first row,
and so on, while the content of the first row is shifted to the
last row. Moreover, during a cyclic shift, the
content of the last row may be replaced by the $(n-k)$-bit-wide
null vector. This operation is called shift-up. After a
shift-up operation has taken place, the following cyclic shifts
exclude rows containing null vectors. Note that a dial works
in conjunction with an index dial, an $n \times \log_2 n$-bit cyclic shift
register file, which performs the same operations (cyclic shift
or shift-up) to keep track of the indices ($i$ in $s_i$). As explained
later, only 2 dials are used in the proposed architecture.
Fig. 3. Content of the dials for checking the two-bit-flip error patterns at
different time steps: (a) first time step; (b) second time step.
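The two dial operations can be modelled in software as follows. This is a behavioural sketch under stated assumptions, not the register-transfer design: the class name `Dial`, its attributes, and the use of integer row labels (with 0 standing for the null vector) are hypothetical choices made for illustration.

```python
class Dial:
    """Behavioural model of a dial: rows shift upward, optionally losing the wrapped row."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.active = len(self.rows)     # rows not yet replaced by a null vector

    def cyclic_shift(self):
        """Rotate the active rows up by one; the first row wraps to the last active row."""
        head = self.rows[:self.active]
        if head:
            self.rows[:self.active] = head[1:] + head[:1]

    def shift_up(self, null=0):
        """Shift the active rows up by one and put a null vector in the last active row."""
        if self.active == 0:
            return
        self.rows[:self.active] = self.rows[1:self.active] + [null]
        self.active -= 1                 # later cyclic shifts skip the nulled row

# Index dial for n = 5: labels 1..5 stand for s_1..s_5.
dial2 = Dial(range(1, 6))
dial2.shift_up()        # rows become [2, 3, 4, 5, 0]
dial2.cyclic_shift()    # rows become [3, 4, 5, 2, 0]
```

The same class can model either a dial of syndromes or its index dial, since both perform identical operations.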
For checking the one-bit-flip error patterns, the content of
the dials is depicted in Fig. 2. By combining each row of
the dials with the syndrome of the received vector, we can
compute (4) in one time step.
Fig. 3(a) shows the content of the dials at the first time step
when checking the two-bit-flip error patterns: the content
of dial 2 is the image of dial 1 cyclically shifted by one.
By combining each row of the dials with the syndrome of the
received vector, we can check $n$ two-bit-flip error patterns
in one time step. At the next time step, the content of dial 2
is cyclically shifted by one, as shown in Fig. 3(b). Observing
that $\mathbb{1}_{i,j} = \mathbb{1}_{j,i}$, all the $\binom{n}{2}$ two-bit-flip error patterns are
tested for code membership after a total of $\lfloor n/2 \rfloor - 1$ cyclic shifts
from the original setting (Fig. 3(a)). Hence, a total of $\lfloor n/2 \rfloor$ time
steps are required to compute (5). Note that, to keep track of
the indices, whenever a dial is rotated, its corresponding index
shift register (index dial) is also rotated.
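The claim that $\lfloor n/2 \rfloor$ time steps cover all $\binom{n}{2}$ two-bit-flip patterns can be checked with a small index-level simulation. The sketch below tracks only the index dials (1-based indices); the function names are hypothetical.

```python
from itertools import combinations

def two_flip_schedule(n):
    """Yield, for each time step, the n index pairs read from dial 1 and dial 2."""
    dial1 = list(range(1, n + 1))
    for shift in range(1, n // 2 + 1):           # dial 2 = dial 1 rotated by `shift`
        dial2 = dial1[shift:] + dial1[:shift]
        yield list(zip(dial1, dial2))

def covered_pairs(n):
    """Set of distinct unordered pairs {i, j} tested over the whole schedule."""
    return {frozenset(p) for step in two_flip_schedule(n) for p in step}

n = 128
steps = sum(1 for _ in two_flip_schedule(n))      # number of time steps used
expected = {frozenset(p) for p in combinations(range(1, n + 1), 2)}
all_covered = covered_pairs(n) == expected
```

For even n, the last time step tests each pair twice (once as (i, j) and once as (j, i)), which does not affect coverage.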
Regarding the three-bit-flip error patterns, we show that
two dials are sufficient. Indeed, if three dials were used,
the scheduling and the associated hardware would become
more complex in order to avoid duplicated error patterns. Instead, a
controller is used in conjunction with the dials to generate
the test patterns. The controller takes care of the first bit
flip, while the dials are responsible for the two
other bit flips. Fig. 4(a) shows the content of the dials and the
syndrome output by the controller to generate $n-1$
three-bit-flip error patterns at time step 1. To do so, at
initialization, dial 1 is shifted up by 1, while dial 2 is shifted
up by 1 and cyclically shifted by 1. In the next time
step, dial 2 is cyclically shifted by 1 to generate the next
$n-1$ three-bit-flip noise sequences, as shown in Fig. 4(b). After
$\lfloor (n-1)/2 \rfloor$ time steps, all the $\binom{n-1}{2}$ three-bit-flip error patterns
with $s_1$ are generated. In the next time step, the controller
outputs $s_2$ while dial 1 is shifted up by 1 and dial
2 is reset, shifted up by 2, and cyclically shifted by 1. This
generates $n-2$ three-bit-flip error patterns, as shown in Fig.
4(c). In the next time step, dial 2 is cyclically shifted by 1,
which generates the next $n-2$ three-bit-flip error patterns,
as shown in Fig. 4(d). Hence, $\lfloor (n-2)/2 \rfloor$ time steps are used to
generate all the $\binom{n-2}{2}$ three-bit-flip error patterns with $s_2$ set
and $s_1$ excluded. This process is repeated until $s_{n-2}$
is output by the controller, where only one three-bit-flip
error pattern is generated: $H \cdot r^\top \oplus s_{n-2} \oplus s_{n-1} \oplus s_n$.
Finally, checking all the three-bit-flip error patterns requires
$\sum_{i=2}^{n-1} \lfloor i/2 \rfloor$ time steps.
Fig. 4. Content of the dials and syndrome output by the controller for
checking the three-bit-flip error patterns at different time steps: (a) first
time step; (b) second time step; (c) time step $\lfloor (n-1)/2 \rfloor + 1$;
(d) time step $\lfloor (n-1)/2 \rfloor + 2$.
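The three-bit-flip schedule can be simulated at the index level in the same way. The sketch below is an illustrative model, not the hardware itself: it assumes the controller steps through $s_1, \dots, s_{n-2}$ and that, for controller index c, the dials hold the remaining indices c+1, ..., n. The function name `three_flip_schedule` is hypothetical.

```python
from itertools import combinations

def three_flip_schedule(n):
    """Yield, for each time step, the index triples (controller, dial 1, dial 2)."""
    for c in range(1, n - 1):                     # controller outputs s_c, c = 1 .. n-2
        remaining = list(range(c + 1, n + 1))     # dial content after c shift-ups
        m = len(remaining)
        for shift in range(1, m // 2 + 1):        # dial 2 rotated among active rows
            rotated = remaining[shift:] + remaining[:shift]
            yield [(c, a, b) for a, b in zip(remaining, rotated)]

n = 12                                            # small length chosen for the check
schedule = list(three_flip_schedule(n))
num_steps = len(schedule)
tested = {tuple(sorted(t)) for step in schedule for t in step}
expected_steps = sum(i // 2 for i in range(2, n))          # sum over i = 2 .. n-1
expected_triples = set(combinations(range(1, n + 1), 3))
```

As in the two-bit-flip case, some time steps test the same pattern in two rows when the number of active rows is even; the set `tested` counts each pattern once.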
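In summary, the number of time steps required to check all
the error patterns with Hamming weights of 3 or less (one step
for the syndrome of the received word, one for the one-bit-flip
patterns, $\lfloor n/2 \rfloor$ for the two-bit-flip patterns, and
$\sum_{i=2}^{n-1} \lfloor i/2 \rfloor$ for the three-bit-flip patterns) is given
by:
$$2 + \sum_{i=2}^{n} \left\lfloor \frac{i}{2} \right\rfloor. \qquad (7)$$
The ratio between (3) and (7), which expresses the parallelization
factor, can be approximated by $\frac{2n}{3}$. Thus, the longer the code, the higher
the savings compared with a conventional serial approach.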
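The following sketch evaluates (3) and (7) for n = 128 and compares their ratio with the $\frac{2n}{3}$ approximation. The function names are hypothetical.

```python
from math import comb

def time_steps(n: int) -> int:
    """Number of time steps from (7) for AB = 3."""
    return 2 + sum(i // 2 for i in range(2, n + 1))

def max_queries(n: int, t: int = 3) -> int:
    """Maximum number of queries from (3)."""
    return sum(comb(n, i) for i in range(1, t + 1))

n = 128
steps = time_steps(n)
queries = max_queries(n)
factor = queries / steps          # measured parallelization factor
approx = 2 * n / 3                # approximation stated in the text
fraction = steps / queries        # time steps as a fraction of the queries
```

For n = 128 the measured factor and the approximation agree to within 0.1, and `fraction` is about 1.2%.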
The proposed hardware architecture for the GRANDAB
algorithm with AB = 3 is shown in Fig. 5. Its input is the hard
decision vector $r$ of length $n$ and its output is the estimated
word $\hat{u}$, padded with zeros to match the length $n$. For the
sake of clarity, the control and clock signals are omitted in the
figure. At any time, an $H$ matrix can be loaded to support any
code within the length and rate constraints. The data path
consists essentially of the interconnection, through $2 \times n + 1$
$(n-k)$-bit-wide XOR gates, of the dials, the syndrome of the
received word, and the syndrome provided by the controller
($2 \times n$ for the dials and 1 for the controller), as described in the
previous paragraphs. Each of the $n$ test syndromes is
NOR-reduced to feed an $n$-to-$\log_2 n$ priority encoder. The output
of each NOR-reduce is 1 if and only if all the bits of the
syndrome computed by (2) are 0. The output of the priority
encoder controls two multiplexers, used to forward the indices
associated with the valid syndrome to the word generator.
Finally, the word generator combines the hard decision vector
$r$ and the three indices to produce the estimated codeword,
which is translated into the estimated word and output.
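The check performed in one time step can be modelled in software as below. This is a behavioural sketch, not the synthesized data path: syndromes are integers, the priority encoder is assumed to favour the lowest row index, and all function names are hypothetical.

```python
def nor_reduce(syndrome_bits: int) -> int:
    """Return 1 if and only if every bit of the test syndrome is 0."""
    return 1 if syndrome_bits == 0 else 0

def priority_encoder(flags):
    """Return the index of the first asserted flag (assumed priority), or None."""
    for row, flag in enumerate(flags):
        if flag:
            return row
    return None

def check_time_step(s_r, s_ctrl, dial1, dial2):
    """XOR the received, controller, and dial syndromes row by row, then locate a zero."""
    tests = [s_r ^ s_ctrl ^ a ^ b for a, b in zip(dial1, dial2)]
    return priority_encoder([nor_reduce(t) for t in tests])

# Toy example with 3-bit syndromes; row 2 gives 0b101 ^ 0b100 ^ 0b001 = 0.
hit_row = check_time_step(0b101, 0, [0b001, 0b010, 0b100, 0b111],
                          [0b010, 0b100, 0b001, 0b000])
```

In the architecture, the selected row would drive the multiplexers that forward the corresponding entries of the index dials to the word generator.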
IV. IMPLEMENTATION RESULTS
The proposed architecture has been implemented in Verilog
HDL and synthesized using the Synopsys Design Compiler
version P-2019.03 with TSMC 65nm CMOS technology. The
design has been verified using test benches generated via the
bit-true C model of the proposed hardware.
Table I presents the synthesis results for the proposed
decoder with n = 128, AB = 3, and a code rate between
0.75 and 1. Thus, the length of the syndromes is constrained
to the range $\{0, \dots, 32\}$. The implementation supports a
maximum frequency of 500 MHz. No pipelining strategy is
used; therefore, one clock cycle corresponds to one time step.
Using (7), 4 098 cycles are required in the worst case (W.C.)
for decoding a code of length 128. Recall that GRANDAB
(AB = 3) requires up to 349 632 queries for
decoding any code of length 128. Hence, the worst-case latency
of the proposed VLSI architecture, in cycles, is only a fraction
(1.2%) of the total number of queries. With a frequency of
500 MHz, the proposed architecture results in a worst-case
information throughput (W.C. T/P) of 11.71 to 14.64 Mbps
for the CRC codes considered in Section II-D. However, the
average latency is much shorter than the worst-case latency,
especially in the mid-to-high SNR region. Using the bit-true
model, the average latency is computed after observing at
least 100 frames in error for each SNR point. Fig. 6(a) depicts
the average latency for the considered codes. Irrespective of
the code rate, the average latency decreases
as the channel conditions improve, up to the point
where the average latency reaches only 1 cycle per decoded
codeword. The counterpart of the latency, the throughput, is
given in Fig. 6(b). Observe that the information throughput
grows with the SNR until it reaches 48 Gbps to
60 Gbps, depending on the code rate. In addition, at
an FER of $10^{-4}$, average information throughputs of 9 Gbps,
9 Gbps, 56 Gbps, and 60 Gbps are obtained for the information
lengths of 96, 104, 112, and 120, respectively.
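The worst-case and peak figures quoted above follow from the clock frequency and the cycle count. The sketch below assumes that the information throughput is k bits per decoded codeword divided by the decoding time; the function name and constants are illustrative.

```python
def info_throughput_bps(k: int, f_hz: float, cycles: float) -> float:
    """Information throughput when one codeword (k information bits) takes `cycles` cycles."""
    return k * f_hz / cycles

F_HZ = 500e6          # maximum frequency from Table I
WC_CYCLES = 4098      # worst-case latency from (7) with n = 128

# Worst case (Mbps) and best case of one cycle per codeword (Gbps).
worst_case_mbps = {k: info_throughput_bps(k, F_HZ, WC_CYCLES) / 1e6
                   for k in (96, 104, 112, 120)}
peak_gbps = {k: info_throughput_bps(k, F_HZ, 1) / 1e9
             for k in (96, 104, 112, 120)}
coded_peak_gbps = info_throughput_bps(128, F_HZ, 1) / 1e9   # counts all n = 128 bits
```

With these assumptions the worst case ranges from about 11.71 Mbps (k = 96) to about 14.64 Mbps (k = 120), and the one-cycle case gives 48 Gbps to 60 Gbps of information throughput, or 64 Gbps of coded throughput.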
TABLE I
TSMC 65 NM CMOS IMPLEMENTATION RESULTS FOR GRANDAB
(AB = 3) AND n = 128.
Technology (nm)            65
Supply (V)                 0.9
Max. Freq (MHz)            500
Area (mm^2)                0.25
W.C. Latency (cycles)      4098
W.C. T/P (Mbps)
  (128, 96)                11.71
  (128, 104)               12.68
  (128, 112)               13.66
  (128, 120)               14.64

TABLE II
TSMC 65 NM CMOS IMPLEMENTATION COMPARISON FOR GRANDAB
(AB = 2) AND n = 79.
                     GRANDAB (AB = 2)    (79, 64) BCH decoder [22]
Technology (nm)      65                  65
Supply (V)           1.1                 1.2
Frequency (GHz)      1                   N/A
Area (µm^2)          126 733             3 264
Latency (ns)
  min.               1                   1.1
  avg.               1.09                1.1
  max.               41                  3
Code compatible      Yes                 No
Rate compatible      Yes                 No

To the best of our knowledge, there is no hardware
implementation of a hard detection decoder in the literature that
achieves the same code flexibility as our proposed architecture.
Thus, performing a fair comparison is difficult. However, we
propose to compare it with a state-of-the-art hard decision
algebraic decoder. Recently, a high-throughput VLSI architecture
based on the PGZ algorithm for a (79, 64) BCH code
decoder has been proposed [22]. Since up to 2 errors can
be corrected with this decoder, GRANDAB with AB = 2 is
enough to achieve the same decoding performance. Therefore,
we re-synthesized our architecture by limiting n to 79 and by
setting AB = 2. Hence, a total of $1 + 1 + \lfloor 79/2 \rfloor = 41$ time steps
are required to decode any code of length 79 with at most 2
errors.
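The 41-time-step worst case, and the corresponding maximum latency at the 1 GHz clock listed in Table II, can be reproduced as follows. The sketch assumes one time step per clock cycle, as stated for the unpipelined design; the names are hypothetical.

```python
def time_steps_ab2(n: int) -> int:
    """Syndrome step + one-bit-flip step + floor(n / 2) two-bit-flip steps."""
    return 1 + 1 + n // 2

CLOCK_HZ = 1e9                              # frequency from Table II
steps_79 = time_steps_ab2(79)
period_ns = 1e9 / CLOCK_HZ                  # clock period in nanoseconds
worst_case_latency_ns = steps_79 * period_ns
```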
Table II compares the implementation results of the
GRANDAB (AB = 2) decoder and the BCH decoder in [22].
The proposed decoder is 41× larger and has 13.6× higher
worst-case latency. On the other hand, the average latencies
of the two decoders are equivalent at an SNR of 10 dB. At
higher SNRs, the proposed decoder exhibits a slightly better
minimum latency and achieves an information throughput of
64 Gbps, while the BCH decoder is limited to 58 Gbps.
Finally, while [22] can only decode the (79, 64) BCH code,
the proposed GRANDAB (AB = 2) can decode any code with
n = 79 and R ≥ 0.75.
V. CONCLUSION
In this paper, we proposed the first hardware architecture
for the GRANDAB algorithm. The decoding algorithm has the
uncommon property of being able to decode any linear code.
By using basic linear algebra, we were able to decompose
the computations of the GRAND algorithm to expose its
inherent parallelism. By doing so, the proposed hardware
architecture can accomplish 349 632 queries in 4 098 time steps.
ASIC synthesis results showed that an average information
throughput of at least 9 Gbps can be achieved with a block
length of 128 when an FER of $10^{-4}$ is targeted. Moreover, the
average throughput increases as the channel conditions
improve. Hence, the average coded throughput for the same
parameters can reach up to 64 Gbps. Finally, the architecture
can achieve the same average throughput as a BCH decoder
tailored for a (79, 64) code. The proposed architecture paves
the way for future implementations of the GRAND algorithm
that can take soft information as input.
Fig. 5. Proposed architecture for GRANDAB (AB = 3).
Fig. 6. Average latency and average information throughput of the proposed
hardware architecture, using the same coding schemes as in Fig. 1:
(a) latency (cycles) versus SNR (dB); (b) information throughput (Mbps)
versus SNR (dB).
REFERENCES
[1] C. E. Shannon, “A mathematical theory of communication,” Bell System
Technical Journal, 1948.
[2] R. W. Hamming, “Error detecting and error correcting codes,” Bell
System Technical Journal, vol. 29, pp. 147–160, 1950.
[3] A. Hocquenghem, “Codes correcteurs d’erreurs,” Chiffres, 1959.
[4] R. C. Bose and D. K. Ray-Chaudhuri, “On a class of error correcting
binary group codes,” Information and control, vol. 3, no. 1, pp. 68–79,
1960.
[5] E. Berlekamp, “Nonbinary BCH decoding (abstr.),” IEEE Transactions
on Information Theory, vol. 14, no. 2, pp. 242–242, 1968.
[6] J. Massey, “Shift-register synthesis and BCH decoding,” IEEE Transac-
tions on Information Theory, vol. 15, no. 1, pp. 122–127, 1969.
[7] W. W. Peterson, “Encoding and error-correction procedures for the Bose-
Chaudhuri codes,” IRE Trans. Inf. Theory, vol. IT-6, no. 1, pp. 459–470,
1960.
[8] E. Arikan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073,
2009.
[9] W. W. Peterson and D. T. Brown, “Cyclic codes for error detection,”
Proceedings of the IRE, vol. 49, no. 1, pp. 228–235, 1961.
[10] 3GPP, “NR; Multiplexing and Channel Coding,” http://www.3gpp.org/
DynaReport/38-series.htm, Tech. Rep. TS 38.212, April 2020, Rel. 16.1.
[11] K. R. Duffy, J. Li, and M. Médard, “Capacity-achieving guessing random
additive noise decoding,” IEEE Transactions on Information Theory,
vol. 65, no. 7, pp. 4023–4040, 2019.
[12] E. Prange, “The use of information sets in decoding cyclic codes,” IRE
Transactions on Information Theory, vol. 8, no. 5, pp. 5–9, 1962.
[13] T. Ho, R. Koetter, M. Médard, D. R. Karger, and M. Effros, “The benefits
of coding over routing in a randomized setting,” in IEEE International
Symposium on Information Theory, 2003. Proceedings., 2003, pp. 442–.
[14] T. Tonnellier, C. Leroux, B. Le Gal, B. Gadat, C. Jego, and N. Van
Wambeke, “Lowering the error floor of turbo codes with CRC verifica-
tion,” IEEE Wireless Communications Letters, vol. 5, no. 4, pp. 404–407,
2016.
[15] K. R. Duffy, A. Solomon, K. M. Konwar, and M. Médard, “5G NR CA-
Polar maximum likelihood decoding by GRAND,” in 2020 54th Annual
Conference on Information Sciences and Systems (CISS). IEEE, 2020,
pp. 1–5.
[16] K. R. Duffy, “Ordered reliability bits guessing random additive noise
decoding,” arXiv preprint arXiv:2001.00546, 2020.
[17] A. Solomon, K. R. Duffy, and M. Médard, “Soft maximum likelihood
decoding using GRAND,” arXiv preprint arXiv:2001.03089, 2020.
[18] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Transactions
on Information Theory, vol. 61, no. 5, pp. 2213–2226, 2015.
[19] D. Slepian, “A class of binary signaling alphabets,” Bell System Tech.
J., vol. 35, pp. 203–234, 1956.
[20] M. P. C. Fossorier and S. Lin, “Soft-decision decoding of linear block
codes based on ordered statistics,” IEEE Transactions on Information
Theory, vol. 41, no. 5, pp. 1379–1396, 1995.
[21] S. Scholl, C. Stumm, and N. Wehn, “Hardware implementations of
Gaussian elimination over GF(2) for channel decoding algorithms,” in
2013 Africon, 2013, pp. 1–5.
[22] S. Choi, H. K. Ahn, B. K. Song, J. P. Kim, S. H. Kang, and S. Jung,
“A decoder for short BCH codes with high decoding efficiency and low
power for emerging memories,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 27, no. 2, pp. 387–397, 2019.
