On the High-Speed VLSI Implementation of
Errors-and-Erasures Correcting Reed-Solomon Decoders
Tong Zhang and Keshab K. Parhi
Department of Electrical and Computer Engineering, University of Minnesota
{tzhang,parhi}@ece.umn.edu
ABSTRACT
Recently a novel algorithm transformation was proposed to reduce the
critical path of Berlekamp-Massey algorithm implementations for
errors-alone Reed-Solomon decoding. In this paper, we apply the same
methodology to transform the Berlekamp-Massey algorithm for
errors-and-erasures RS decoding. We present a regular hardware
architecture to implement the reformulated Berlekamp-Massey algorithm,
which can achieve high throughput. Moreover, an operation scheduling
scheme is proposed to further reduce the hardware complexity without
loss of throughput.
Categories and Subject Descriptors
B.2.4 [ARITHMETIC AND LOGIC STRUCTURES]:
High-Speed Arithmetic
General Terms
Design
Keywords
Reed-Solomon codes, Berlekamp-Massey algorithm, erasure,
VLSI architectures
1. INTRODUCTION
Reed-Solomon (RS) codes are widely used for forward error correction
(FEC) in numerous communication systems because of their good error
correction capability for burst errors. A conventional errors-alone RS
decoding procedure can be modified to correct both errors and erasures
[1][4] (an erasure occurs when the position of a corrupted symbol is
known). An RS decoder capable of correcting erasures as well as errors
improves performance in various systems [9][3].
To achieve higher decoding speed, besides applying more advanced
physical-level technology (e.g., sub-micron processes) to improve the
throughput, algorithm/architecture-level transformations and
modifications also play important roles. In RS decoders, the throughput
bottleneck lies in solving Berlekamp's key equation, for which two
algorithms are typically employed [1]: the extended Euclidean (eE)
algorithm and the Berlekamp-Massey (BM) algorithm. Recently a novel
algorithm transformation was proposed in [7] to reformulate the BM
algorithm for errors-alone RS decoders so that the key-equation-solving
block has a very regular semi-systolic architecture with a critical path
of only $T_{mult} + T_{add}$, where $T_{mult}$ and $T_{add}$ are the
delays of the finite field multiplier and adder, respectively. This
critical path is much shorter than that of previous RS decoder
implementations using the BM algorithm and comparable to the best known
result for implementations based on the eE algorithm. The basic
methodology behind this transformation is to remove the data dependency
inside one iteration, at the cost of increased computation complexity,
so that the critical path can be reduced. In this paper, we show that
this methodology can be readily applied to errors-and-erasures RS
decoder implementations to achieve a regular semi-systolic architecture
and a critical path of $T_{mult} + T_{add}$. Moreover, exploiting the
characteristics of errors-and-erasures RS decoding, we propose an
operation scheduling scheme for the developed decoder architecture to
reduce the hardware complexity without loss of throughput.
GLSVLSI'02, April 18-19, 2002, New York, New York, USA.
Copyright 2002 ACM 1-58113-462-2/02/0004.
This paper is organized as follows. We introduce the fundamentals of RS
decoding in Section 2. An inverse-free BM algorithm for
errors-and-erasures RS decoding is described in Section 3. In Section 4,
we apply the methodology proposed in [7] to transform the BM algorithm
for errors-and-erasures RS decoding, based on which we present a
semi-systolic high-speed hardware architecture. To facilitate the
reader's reference to [7], we use the same notation as much as possible
in the algorithm and hardware architecture descriptions.
2. RS CODES DECODING
Consider an $(n, k)$ RS code over $GF(2^m)$, where each code symbol is
an $m$-bit value, $n = 2^m - 1$ is the codeword block length and $k$ is
the information length. Since RS codes are maximum distance separable
(MDS) codes [6], the $(n, k)$ RS code has a minimum distance
$d = n - k + 1$. When such an RS code is used on a channel that
introduces both errors and erasures, any pattern of $v$ errors and
$\rho$ erasures can be corrected provided that $d \ge 2v + \rho + 1$.
Let $C(z)$ denote the transmitted codeword polynomial and $R(z)$ denote
the received word polynomial, in which erased symbols are represented as
blanks (or zeroes). We have
$$R(z) = C(z) + E(z) + F(z),$$
where $E(z)$ and $F(z)$ represent the error polynomial and erasure
polynomial. Define the errata polynomial as
$$\tilde{E}(z) = E(z) + F(z) = \tilde{e}_{i_1} z^{i_1} + \tilde{e}_{i_2} z^{i_2} + \cdots + \tilde{e}_{i_{v+\rho}} z^{i_{v+\rho}}.$$
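The correction condition $d \ge 2v + \rho + 1$ can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the (255, 239) code is an assumed example rather than one taken from this paper.

```python
def min_distance(n, k):
    """Minimum distance of an (n, k) MDS code such as an RS code."""
    return n - k + 1

def is_correctable(n, k, v, rho):
    """True when v errors plus rho erasures satisfy d >= 2v + rho + 1."""
    return min_distance(n, k) >= 2 * v + rho + 1

# Assumed example: a (255, 239) RS code has d = 17.
for v, rho in [(8, 0), (4, 8), (0, 16), (5, 7)]:
    print(v, rho, is_correctable(255, 239, v, rho))
```

Each erasure costs one unit of the distance budget $d - 1$ while each error costs two, which is why knowing the positions of corrupted symbols extends the number of symbols that can be repaired.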
Let $\alpha$ be a primitive element of $GF(2^m)$. We say that errata
magnitudes $\tilde{Y}_l = \tilde{e}_{i_l}$, for $l = 1, \ldots, v+\rho$,
occurred at errata locations $\tilde{X}_l = \alpha^{i_l}$, for
$l = 1, \ldots, v+\rho$. Suppose that the $d-1$ consecutive powers of
$\alpha$ used in the RS code construction are
$\alpha^{m_0}, \alpha^{m_0+1}, \cdots, \alpha^{m_0+d-2}$. The RS decoder
always begins by computing the syndromes
$$s_k = R(\alpha^{m_0+k}) = \tilde{E}(\alpha^{m_0+k}), \quad 0 \le k \le d-2.$$
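The syndrome computation can be sketched in Python as follows. The field $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, the parameters $d = 7$ and $m_0 = 1$, and the errata values are all assumptions chosen to keep the example small; the helper names are hypothetical.

```python
# GF(2^4) tables from the primitive polynomial x^4 + x + 1 (assumed example field)
M, PRIM = 4, 0b10011
N = (1 << M) - 1                      # n = 2^m - 1 = 15
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i             # EXP[i] = alpha^i, LOG is its inverse
    x <<= 1
    if x >> M:
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(p, z):
    """Evaluate p(z) by Horner's rule; p[i] is the coefficient of z^i."""
    acc = 0
    for c in reversed(p):
        acc = mul(acc, z) ^ c
    return acc

def syndromes(received, d, m0=0):
    """s_k = R(alpha^(m0+k)) for 0 <= k <= d-2."""
    return [poly_eval(received, EXP[(m0 + k) % N]) for k in range(d - 1)]

# Usage: the all-zero codeword of a (15, 9) code (d = 7) hit by two errata.
d, m0 = 7, 1
errata = {2: 7, 9: 12}                # position i_l -> magnitude Y_l (illustrative)
received = [0] * N
for pos, mag in errata.items():
    received[pos] ^= mag
s = syndromes(received, d, m0)
print(s)
```

Because the transmitted codeword is all-zero here, the syndromes depend on the errata alone, in line with $R(\alpha^{m_0+k}) = \tilde{E}(\alpha^{m_0+k})$.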
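We introduce the syndrome polynomial $S(z) = \sum_{k=0}^{d-2} s_k z^k$,
and define the errata locator polynomial $\Lambda(z)$ and errata
evaluator polynomial $\Omega(z)$ as
$$\Lambda(z) = \prod_{j=1}^{v+\rho} (1 - \tilde{X}_j z) = 1 + \lambda_1 z + \cdots + \lambda_{v+\rho} z^{v+\rho},$$
$$\Omega(z) = \sum_{i=1}^{v+\rho} \tilde{Y}_i \tilde{X}_i^{m_0} \prod_{j=1, j \ne i}^{v+\rho} (1 - \tilde{X}_j z) = \omega_0 + \omega_1 z + \cdots + \omega_{v+\rho-1} z^{v+\rho-1}.$$
We can determine the above two unknown polynomials $\Lambda(z)$ and
$\Omega(z)$ by solving the key equation [1]:
$$\Omega(z) \equiv \Lambda(z) S(z) \bmod z^{d-1}. \tag{1}$$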
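The key equation (1) can be checked numerically by building $\Lambda(z)$ and $\Omega(z)$ directly from their definitions. The sketch below assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, $d = 7$, $m_0 = 1$ and three illustrative errata; none of these values come from the paper.

```python
# GF(2^4) tables from the primitive polynomial x^4 + x + 1 (assumed example field)
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_mul(a, b):
    """Product of two polynomials with coefficients in GF(2^m), low order first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= mul(ai, bj)
    return out

d, m0 = 7, 1
positions = [2, 5, 11]        # exponents i_l, so X_l = alpha^(i_l) (illustrative)
magnitudes = [3, 9, 14]       # Y_l (illustrative)
X = [EXP[p] for p in positions]

# Syndromes from the errata alone: s_k = sum_l Y_l * X_l^(m0+k)
s = [0] * (d - 1)
for k in range(d - 1):
    for p, y in zip(positions, magnitudes):
        s[k] ^= mul(y, EXP[(p * (m0 + k)) % N])

# Lambda(z) = prod_j (1 - X_j z); subtraction equals addition in GF(2^m)
lam = [1]
for xj in X:
    lam = poly_mul(lam, [1, xj])

# Omega(z) = sum_i Y_i X_i^m0 prod_{j != i} (1 - X_j z)
omega = [0] * (d - 1)
for i, (p, y) in enumerate(zip(positions, magnitudes)):
    term = [mul(y, EXP[(p * m0) % N])]
    for j, xj in enumerate(X):
        if j != i:
            term = poly_mul(term, [1, xj])
    for t, c in enumerate(term):
        omega[t] ^= c

lhs = poly_mul(lam, s)[: d - 1]   # Lambda(z) S(z) mod z^(d-1)
key_equation_holds = lhs == omega
print(key_equation_holds)
```

The truncated product has no terms of degree $v + \rho$ or higher, which is exactly the property that the Berlekamp-Massey iterations enforce one coefficient at a time.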
Notice that the $v + \rho$ roots of $\Lambda(z)$ consist of the inverses
of $v$ unknown error locations and $\rho$ known erasure locations. Once
$\Lambda(z)$ and $\Omega(z)$ are found, the decoder can use Chien search
to identify the $v$ unknown roots of $\Lambda(z)$ so that all the
$v + \rho$ errata locations $\tilde{X}_i$ become known. Then all the
$v + \rho$ errata magnitudes $\tilde{Y}_i$ can be determined using the
Forney algorithm [4]
$$\tilde{Y}_i = -\frac{\tilde{X}_i^{1-m_0}\,\Omega(\tilde{X}_i^{-1})}{\Lambda'(\tilde{X}_i^{-1})}, \quad \text{for } i = 1, 2, \cdots, v+\rho. \tag{2}$$
Using all the $\tilde{X}_i$'s and $\tilde{Y}_i$'s we can derive the
errata polynomial $\tilde{E}(z)$, and finally reconstruct the correct
codeword $C(z) = R(z) - \tilde{E}(z)$.
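Chien search and the Forney formula (2) can be illustrated with a short Python sketch. It assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, $d = 7$, $m_0 = 1$ and made-up errata; `chien_search` and `forney` are hypothetical helper names, and the exhaustive root search stands in for a hardware Chien search.

```python
# GF(2^4) tables from the primitive polynomial x^4 + x + 1 (assumed example field)
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def div(a, b):
    """a / b for nonzero b."""
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % N]

def poly_eval(p, z):
    acc = 0
    for c in reversed(p):
        acc = mul(acc, z) ^ c
    return acc

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= mul(ai, bj)
    return out

def chien_search(lam):
    """Positions i such that Lambda(alpha^(-i)) = 0."""
    return [i for i in range(N) if poly_eval(lam, EXP[(-i) % N]) == 0]

def forney(lam, omega, positions, m0):
    """Errata magnitudes from (2); the sign vanishes in characteristic 2."""
    # Formal derivative over GF(2^m): only odd-degree terms survive.
    dlam = [c if i % 2 else 0 for i, c in enumerate(lam)][1:]
    out = []
    for i in positions:
        x_inv = EXP[(-i) % N]
        num = mul(EXP[(i * (1 - m0)) % N], poly_eval(omega, x_inv))
        out.append(div(num, poly_eval(dlam, x_inv)))
    return out

d, m0 = 7, 1
errata = {2: 3, 5: 9, 11: 14}                  # position -> magnitude (illustrative)
s = [0] * (d - 1)
for k in range(d - 1):
    for p, y in errata.items():
        s[k] ^= mul(y, EXP[(p * (m0 + k)) % N])
lam = [1]
for p in errata:
    lam = poly_mul(lam, [1, EXP[p]])
omega = poly_mul(lam, s)[: d - 1]              # key equation (1)
found = chien_search(lam)
values = forney(lam, omega, found, m0)
print(found, values)
```

Scaling both polynomials by the same nonzero field element leaves the quotient in (2) unchanged, which is why an inverse-free key equation solver can return scalar multiples of $\Lambda(z)$ and $\Omega(z)$.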
The VLSI implementation of the above decoding process typically contains
four consecutive pipelined stages: 1. the syndrome computation (SC)
block; 2. the key equation solver (KES) block; 3. the Chien search (CS)
block; 4. the errata evaluator (EE) block. All the blocks work in
parallel on consecutive codewords. Pipelining is also typically used
within each block to achieve higher throughput. Compared with the other
three blocks, which can be pipelined in a straightforward manner, the
KES block, based on the BM or eE algorithm, cannot be easily pipelined
due to the presence of inherent feedback loops. Thus the throughput
bottleneck in RS decoders is the KES block.
3. BERLEKAMP-MASSEY ALGORITHM
Recently an efficient inverse-free BM algorithm was proposed in [5] for
errors-and-erasures RS decoding. We adopt this algorithm as the starting
point of this work. In the following, we briefly describe this algorithm
based on [5]. Notice that this algorithm actually finds scalar multiples
$\beta \cdot \Lambda(z)$ and $\beta \cdot \Omega(z)$, where $\beta$ is a
field element in $GF(2^m)$. Since this scaling does not affect the
computation of the errata locations using Chien search or of the errata
magnitudes using (2), we still refer to the polynomials output by the
following BM algorithm as $\Lambda(z)$ and $\Omega(z)$.
Suppose errata locations
$\tilde{X}_{l_1}, \tilde{X}_{l_2}, \cdots, \tilde{X}_{l_\rho}$ correspond
to the $\rho$ erasures. We define the erasure locator polynomial as
$$\Psi(z) = \prod_{i=1}^{\rho} (1 - \tilde{X}_{l_i} z) = 1 + \psi_1 z + \cdots + \psi_\rho z^{\rho}. \tag{3}$$
Notice that, provided $2v + \rho + 1 \le d$, when $\rho = d - 1$ or
$\rho = d - 2$, no errors occurred ($v = 0$) and the errata locator
polynomial $\Lambda(z) = \Psi(z)$ is immediately available. The decoder
can then bypass the following BM algorithm and compute the errata
evaluator polynomial $\Omega(z)$ directly using (1).
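Algorithm 3.1. Inverse-Free BM Algorithm for Errors-and-Erasures
Decoding
1. Initialization: $r = k^{(0)} = 0$, $\gamma^{(0)} = 1$,
$\Lambda^{(0)}(z) = B^{(0)}(z) = \Psi(z)$.
2. Set $r = r + 1$. If $r > d - \rho - 1$, go to step 6, else compute
the discrepancy
$$\delta^{(r-1)} = \lambda_0^{(r-1)} \cdot s_{r+\rho-1} + \cdots + \lambda_{r+\rho-1}^{(r-1)} \cdot s_0. \tag{4}$$
3. Compute
$\Lambda^{(r)}(z) = \gamma^{(r-1)} \cdot \Lambda^{(r-1)}(z) - z \cdot \delta^{(r-1)} \cdot B^{(r-1)}(z)$
so that for $i = 0, 1, \cdots, d-3$,
$$\lambda_i^{(r)} = \gamma^{(r-1)} \cdot \lambda_i^{(r-1)} - \delta^{(r-1)} \cdot b_{i-1}^{(r-1)}. \tag{5}$$
4. If $\delta^{(r-1)} \ne 0$ and $k^{(r-1)} \ge 0$, then
$$\begin{cases} B^{(r)}(z) = \Lambda^{(r-1)}(z), \\ \gamma^{(r)} = \delta^{(r-1)}, \\ k^{(r)} = -k^{(r-1)} - 1, \end{cases}$$
else
$$\begin{cases} B^{(r)}(z) = z \cdot B^{(r-1)}(z), \\ \gamma^{(r)} = \gamma^{(r-1)}, \\ k^{(r)} = k^{(r-1)} + 1. \end{cases}$$
5. Return to step 2.
6. Set $\Lambda(z) = \Lambda^{(d-\rho-1)}(z)$, and compute
$\Omega(z) = (\Lambda(z) \cdot S(z)) \bmod z^{d-1}$ so that for
$i = 0, 1, \cdots, d-3$,
$$\omega_i = \lambda_0^{(d-\rho-1)} \cdot s_i + \lambda_1^{(d-\rho-1)} \cdot s_{i-1} + \cdots + \lambda_i^{(d-\rho-1)} \cdot s_0.$$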
Notice that the $b_{-1}^{(r)}$ occurring in the above algorithm are set
to zero. This algorithm provides the solutions of the key equation,
$\Lambda(z)$ and $\Omega(z)$, where all the $\lambda_i$ for
$i > v + \rho$ and $\omega_j$ at least for $j > v + \rho - 1$ are zero.
Notice that the loop between steps 2 and 5 is completed in
$d - \rho - 1$ iterations, and in each iteration there is a data
dependency between the computations of (4) and (5) through the
discrepancy $\delta^{(r-1)}$. Thus, if we want to realize the above
algorithm in such a way that one iteration is completed in one clock
cycle for minimal latency, the critical path will be longer than
$2 \cdot (T_{mult} + T_{add})$, which is typically much longer than the
critical paths of all other building blocks in the RS decoder.
Therefore, reducing this critical path becomes the key to achieving
higher RS decoding throughput.
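A software rendering of Algorithm 3.1 may help make the data dependency concrete. The sketch below is a minimal, unoptimized transcription under stated assumptions: $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, $d = 7$, $m_0 = 1$, two erasures and two errors with made-up magnitudes. The function name `inverse_free_bm` is hypothetical, and the coefficient lists are sized to $d$ entries for simplicity rather than to match any register count in the hardware.

```python
# GF(2^4) tables from the primitive polynomial x^4 + x + 1 (assumed example field)
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(p, z):
    acc = 0
    for c in reversed(p):
        acc = mul(acc, z) ^ c
    return acc

def inverse_free_bm(s, psi, d):
    """Algorithm 3.1 sketch: returns scalar multiples of Lambda(z) and Omega(z)."""
    rho = len(psi) - 1
    lam = psi + [0] * (d - len(psi))       # Lambda^(0) = Psi, padded to d coefficients
    b = lam[:]                             # B^(0) = Psi
    gamma, k = 1, 0
    for r in range(1, d - rho):            # r = 1 .. d - rho - 1
        # (4): coefficient of z^(r+rho-1) in Lambda^(r-1)(z) S(z)
        delta = 0
        for j in range(r + rho):
            delta ^= mul(lam[j], s[r + rho - 1 - j])
        # (5): needs delta first, which is the intra-iteration dependency
        new_lam = [mul(gamma, lam[i]) ^ (mul(delta, b[i - 1]) if i else 0)
                   for i in range(d)]
        if delta != 0 and k >= 0:
            b, gamma, k = lam, delta, -k - 1
        else:
            b, k = [0] + b[:-1], k + 1     # B^(r) = z * B^(r-1)
        lam = new_lam
    omega = [0] * (d - 1)                  # step 6: Lambda(z) S(z) mod z^(d-1)
    for i in range(d - 1):
        for j in range(i + 1):
            omega[i] ^= mul(lam[j], s[i - j])
    return lam, omega

d, m0 = 7, 1
erasures = [3, 10]                                   # known positions (illustrative)
errata = {1: 6, 3: 13, 7: 2, 10: 9}                  # two errors plus the two erasures
s = [0] * (d - 1)
for k in range(d - 1):
    for p, y in errata.items():
        s[k] ^= mul(y, EXP[(p * (m0 + k)) % N])
psi = [1]
for p in erasures:                                   # Psi(z) = prod (1 - alpha^p z)
    psi = [c ^ mul(EXP[p], prev) for c, prev in zip(psi + [0], [0] + psi)]
lam, omega = inverse_free_bm(s, psi, d)
roots = [i for i in range(N) if poly_eval(lam, EXP[(-i) % N]) == 0]
print(roots)
```

In this sequential form the multiply-accumulate for `delta` must finish before the multiply-add for `new_lam` can start, which mirrors the two chained multiplier-adder delays discussed above.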
4. HIGH-SPEED RS DECODER ARCHITECTURE
As mentioned earlier, the authors of [7] proposed a novel algorithm
transformation that reformulates the errors-alone BM algorithm to
significantly reduce the critical path. The basic idea is to remove the
data dependency inside one iteration at the cost of increased
computation complexity. In this section, we apply the same methodology
to reformulate the above BM algorithm for errors-and-erasures RS
decoding. The developed KES block has a regular semi-systolic
architecture with a critical path of only $T_{mult} + T_{add}$.
Moreover, to reduce the overall RS decoder hardware complexity without
loss of latency, we propose an operation scheduling scheme that makes
the KES block not only solve the key equation but also compute an
intermediate polynomial that is a primary input to the reformulated BM
algorithm.
4.1 Reformulated BM Algorithm
In the following we present how to apply the essential methodology
behind the algorithm transformation proposed in [7] for errors-alone RS
decoding to the errors-and-erasures RS decoding of interest here.
Readers are strongly encouraged to consult [7] for a detailed
description.
First notice that the discrepancy $\delta^{(r-1)}$ in Algorithm 3.1, the
pivotal element along the data dependency path, is actually
$\delta_{r+\rho-1}^{(r-1)}$, the coefficient of $z^{r+\rho-1}$ in the
polynomial
$$\Delta^{(r-1)}(z) = \Lambda^{(r-1)}(z) \cdot S(z) = \delta_0^{(r-1)} + \delta_1^{(r-1)} z + \cdots + \delta_{r+\rho-1}^{(r-1)} z^{r+\rho-1} + \cdots.$$
If we introduce a new polynomial
$\Theta^{(r)}(z) = B^{(r)}(z) \cdot S(z)$, according to Algorithm 3.1,
we have
$$\begin{aligned} \Delta^{(r)}(z) &= \Lambda^{(r)}(z) \cdot S(z) \\ &= (\gamma^{(r-1)} \cdot \Lambda^{(r-1)}(z) - z \cdot \delta^{(r-1)} \cdot B^{(r-1)}(z)) \cdot S(z) \\ &= \gamma^{(r-1)} \cdot \Delta^{(r-1)}(z) - z \cdot \delta^{(r-1)} \cdot \Theta^{(r-1)}(z), \end{aligned} \tag{6}$$
where $\Theta^{(r)}(z) = B^{(r)}(z) \cdot S(z)$ will be either
$\Delta^{(r-1)}(z) = \Lambda^{(r-1)}(z) \cdot S(z)$ or
$z \cdot \Theta^{(r-1)}(z) = z \cdot B^{(r-1)}(z) \cdot S(z)$. We can
easily prove that, if $\Delta^{(0)}(z)$ is available at the beginning
and we replace the computation of (4) in Algorithm 3.1 with (6), the
discrepancy $\delta^{(r-1)} = \delta_{r+\rho-1}^{(r-1)}$, the crucial
element in each iteration, will be computed in the previous clock cycle
instead of the current clock cycle (that is, the discrepancy is computed
in a look-ahead style). In this way, the original data dependency within
each iteration (intra-iteration data dependency) is replaced by a
dependency between successive iterations (inter-iteration data
dependency). With the intra-iteration data dependency gone, all the
computations in one iteration can be performed in parallel and, as shown
later, the corresponding KES block can achieve a critical path of only
$T_{mult} + T_{add}$.
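Let $z^i \bmod z^{d-1} = 0$ for $i < 0$. We introduce the polynomial
$\Phi(z)$:
$$\Phi(z) = (z^{-\rho} \cdot \Psi(z) \cdot S(z)) \bmod z^{d-1} = \sum_{i=0}^{d-2} \phi_i z^i, \tag{7}$$
where
$$\phi_i = \psi_0 \cdot s_{i+\rho} + \psi_1 \cdot s_{i+\rho-1} + \cdots + \psi_\rho \cdot s_i,$$
with $s_k = 0$ for $k > d - 2$ since $S(z)$ has degree $d - 2$.
Moreover, notice that for any $i < r + \rho$, $\delta_i^{(r)}$ and
$\theta_i^{(r)}$ cannot affect the value of any later discrepancy
$\delta_{r+\rho+j}^{(r+j)}$. Consequently, we need not store
$\delta_i^{(r)}$ and $\theta_i^{(r)}$ for $i < r + \rho$. Thus we define
$\hat{\delta}_i^{(r)} = \delta_{i+r+\rho}^{(r)}$,
$\hat{\theta}_i^{(r)} = \theta_{i+r+\rho}^{(r)}$ and the polynomials
$$\hat{\Delta}^{(r)}(z) = \sum_{i=0}^{d-2} \hat{\delta}_i^{(r)} z^i \quad \text{and} \quad \hat{\Theta}^{(r)}(z) = \sum_{i=0}^{d-2} \hat{\theta}_i^{(r)} z^i \tag{8}$$
with initial values
$\hat{\Delta}^{(0)}(z) = \hat{\Theta}^{(0)}(z) = \Phi(z)$. Based on the
above discussion and the introduced polynomials, we can readily
transform Algorithm 3.1 into the following algorithm suitable for
high-speed implementations: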
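Algorithm 4.1. Reformulated Inverse-Free BM Algorithm for
Errors-and-Erasures Decoding
1. Initialization: $r = k^{(0)} = 0$, $\gamma^{(0)} = 1$,
$\Lambda^{(0)}(z) = B^{(0)}(z) = \Psi(z)$ and
$\hat{\Delta}^{(0)}(z) = \hat{\Theta}^{(0)}(z) = \Phi(z)$.
2. Set $r = r + 1$. If $r > d - \rho - 1$, the algorithm terminates,
else compute
$\hat{\Delta}^{(r)}(z) = z^{-1} \cdot \gamma^{(r-1)} \cdot \hat{\Delta}^{(r-1)}(z) - \hat{\delta}_0^{(r-1)} \cdot \hat{\Theta}^{(r-1)}(z)$
so that for $i = 0, 1, \cdots, d-2$,
$$\hat{\delta}_i^{(r)} = \gamma^{(r-1)} \cdot \hat{\delta}_{i+1}^{(r-1)} - \hat{\delta}_0^{(r-1)} \cdot \hat{\theta}_i^{(r-1)}. \tag{9}$$
3. Compute
$\Lambda^{(r)}(z) = \gamma^{(r-1)} \cdot \Lambda^{(r-1)}(z) - z \cdot \hat{\delta}_0^{(r-1)} \cdot B^{(r-1)}(z)$
so that for $i = 0, 1, \cdots, d-3$,
$$\lambda_i^{(r)} = \gamma^{(r-1)} \cdot \lambda_i^{(r-1)} - \hat{\delta}_0^{(r-1)} \cdot b_{i-1}^{(r-1)}. \tag{10}$$
4. If $\hat{\delta}_0^{(r-1)} \ne 0$ and $k^{(r-1)} \ge 0$, then
$$\begin{cases} B^{(r)}(z) = \Lambda^{(r-1)}(z), \\ \hat{\Theta}^{(r)}(z) = z^{-1} \cdot \hat{\Delta}^{(r-1)}(z), \\ \gamma^{(r)} = \hat{\delta}_0^{(r-1)}, \\ k^{(r)} = -k^{(r-1)} - 1, \end{cases}$$
else
$$\begin{cases} B^{(r)}(z) = z \cdot B^{(r-1)}(z), \\ \hat{\Theta}^{(r)}(z) = \hat{\Theta}^{(r-1)}(z), \\ \gamma^{(r)} = \gamma^{(r-1)}, \\ k^{(r)} = k^{(r-1)} + 1. \end{cases}$$
5. Return to step 2.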
It is clear that, in the above algorithm, all the computations in one
iteration, (9) and (10), can be carried out in parallel, from which we
may easily develop its hardware realization, as shown later, with a
critical path of only $T_{mult} + T_{add}$. Notice that
$\Lambda^{(d-\rho-1)}(z)$ is exactly the solution of the key equation
for $\Lambda(z)$. Although
$\hat{\Omega}(z) = \hat{\Delta}^{(d-\rho-1)}(z)$ is not the solution of
the key equation for $\Omega(z)$, following the argument in [7], we have
$$\Omega(\tilde{X}_i^{-1}) = -\tilde{X}_i^{-(d-1)} \hat{\Omega}(\tilde{X}_i^{-1}), \quad \text{for } i = 1, 2, \cdots, v+\rho.$$
Figure 1: Key equation solver (KES) block architecture, comprising a
control unit, the error locator update (ELU) block of PE0 cells and the
discrepancy computation (DC) block of PE1 cells.
Thus, using the above reformulated algorithm, we can find all the errata
locations $\tilde{X}_i$ by applying Chien search to $\Lambda(z)$ and
derive all the errata magnitudes by rewriting (2) as
$$\tilde{Y}_i = -\frac{\tilde{X}_i^{2-d-m_0}\,\hat{\Omega}(\tilde{X}_i^{-1})}{\Lambda'(\tilde{X}_i^{-1})}, \quad \text{for } i = 1, 2, \cdots, v+\rho,$$
where the sign is immaterial because $-1 = 1$ in $GF(2^m)$. In this way,
we still say that Algorithm 4.1 solves the RS decoding key equation, and
the corresponding hardware block is referred to as the key equation
solver (KES).
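The reformulated algorithm and the rewritten Forney formula can be exercised together in a short Python sketch. As before, the field $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, the parameters $d = 7$ and $m_0 = 1$, the errata values and the helper names are assumptions made for illustration. The code follows the coefficient recurrences (9) and (10) and models each clock cycle as one loop iteration; it is not a description of the circuit itself.

```python
# GF(2^4) tables from the primitive polynomial x^4 + x + 1 (assumed example field)
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % N]

def poly_eval(p, z):
    acc = 0
    for c in reversed(p):
        acc = mul(acc, z) ^ c
    return acc

def reformulated_bm(s, psi, d):
    """Algorithm 4.1 sketch: returns (scaled Lambda, Omega-hat)."""
    rho = len(psi) - 1
    phi = [0] * (d - 1)                        # (7), with s_k = 0 for k > d-2
    for i in range(d - 1):
        for j in range(rho + 1):
            if i + rho - j <= d - 2:
                phi[i] ^= mul(psi[j], s[i + rho - j])
    lam = psi + [0] * (d - len(psi))
    b, dhat, that = lam[:], phi[:], phi[:]
    gamma, k = 1, 0
    for r in range(1, d - rho):
        d0 = dhat[0]                           # discrepancy, ready from the last cycle
        shifted = dhat[1:] + [0]               # coefficients of z^-1 * Delta-hat
        # (9) and (10) read only values from iteration r-1, so they can run in parallel
        new_dhat = [mul(gamma, shifted[i]) ^ mul(d0, that[i]) for i in range(d - 1)]
        new_lam = [mul(gamma, lam[i]) ^ (mul(d0, b[i - 1]) if i else 0)
                   for i in range(d)]
        if d0 != 0 and k >= 0:
            b, that, gamma, k = lam, shifted, d0, -k - 1
        else:
            b, k = [0] + b[:-1], k + 1
        lam, dhat = new_lam, new_dhat
    return lam, dhat

def errata_magnitudes(lam, omega_hat, positions, d, m0):
    """Rewritten Forney formula using Omega-hat (sign dropped in GF(2^m))."""
    dlam = [c if i % 2 else 0 for i, c in enumerate(lam)][1:]
    out = []
    for p in positions:
        x_inv = EXP[(-p) % N]
        num = mul(EXP[(p * (2 - d - m0)) % N], poly_eval(omega_hat, x_inv))
        out.append(div(num, poly_eval(dlam, x_inv)))
    return out

d, m0 = 7, 1
erasures = [3, 10]                                   # illustrative
errata = {1: 6, 3: 13, 7: 2, 10: 9}                  # illustrative
s = [0] * (d - 1)
for k in range(d - 1):
    for p, y in errata.items():
        s[k] ^= mul(y, EXP[(p * (m0 + k)) % N])
psi = [1]
for p in erasures:
    psi = [c ^ mul(EXP[p], prev) for c, prev in zip(psi + [0], [0] + psi)]
lam, omega_hat = reformulated_bm(s, psi, d)
located = [i for i in range(N) if poly_eval(lam, EXP[(-i) % N]) == 0]
recovered = errata_magnitudes(lam, omega_hat, located, d, m0)
print(located, recovered)
```

Compared with a direct transcription of Algorithm 3.1, no inner-product loop for the discrepancy remains inside the iteration: the value is simply read from `dhat[0]`, at the price of updating the additional coefficient arrays `dhat` and `that` every cycle.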
4.2 Decoder Architecture
Following the above discussion, such an errors-and-erasures RS decoder
principally contains six blocks: 1. the syndrome computation (SC) block;
2. the erasure locator (EL) block, which computes the erasure locator
polynomial $\Psi(z)$; 3. the $\Phi$-block, which computes the polynomial
$\Phi(z)$; 4. the key equation solver (KES) block; 5. the Chien search
(CS) block; 6. the errata evaluator (EE) block. Since the other blocks
have been well investigated in conventional errors-alone RS decoder
implementations, in what follows we only consider the architecture
design of the EL, $\Phi$ and KES blocks. For the implementation of the
other blocks, readers are referred to [2][8].
The architecture design for the EL block is quite straightforward. Let
$\alpha^{t_1}, \alpha^{t_2}, \cdots, \alpha^{t_\rho}$ be the $\rho$
erasure locations, where $t_1 < t_2 < \cdots < t_\rho$. Suppose that,
before the erasure at $\alpha^{t_s}$ is detected, we have obtained the
intermediate erasure locator polynomial
$$\Psi^{(s-1)}(z) = \prod_{i=1}^{s-1} (1 - \alpha^{t_i} z) = 1 + \psi_1^{(s-1)} z + \cdots + \psi_{s-1}^{(s-1)} z^{s-1}.$$
Then, after detecting the erasure at $\alpha^{t_s}$, we update the
intermediate erasure locator polynomial as
$$\Psi^{(s)}(z) = \Psi^{(s-1)}(z) \cdot (1 - \alpha^{t_s} z) = 1 + \psi_1^{(s)} z + \cdots + \psi_s^{(s)} z^{s},$$
where
$\psi_i^{(s)} = \psi_i^{(s-1)} - \alpha^{t_s} \cdot \psi_{i-1}^{(s-1)}$.
Based on this iterative update rule, we can easily develop one possible
hardware architecture for the EL block, as shown in Fig. 2, if the
received codeword is fed to the decoder one code symbol per clock cycle,
which is a typical configuration in conventional RS decoder design.
Notice that the critical path here is $T_{mult} + T_{add}$.
Figure 2: Erasure locator (EL) block architecture. The control signal C
is 1 when an erasure is detected and 0 otherwise.
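The iterative update of the erasure locator polynomial can be modelled in software as follows. This is a behavioural sketch only: it assumes $GF(2^4)$ with primitive polynomial $x^4 + x + 1$, assumes that the symbol arriving at clock $t$ sits at location $\alpha^t$, and uses a hypothetical list of erasure flags in place of the control signal C.

```python
# GF(2^4) tables from the primitive polynomial x^4 + x + 1 (assumed example field)
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(p, z):
    acc = 0
    for c in reversed(p):
        acc = mul(acc, z) ^ c
    return acc

def erasure_locator(erasure_flags):
    """One update per received symbol; erasure_flags[t] plays the role of C."""
    psi = [1]                                   # Psi^(0)(z) = 1
    for t, erased in enumerate(erasure_flags):
        if erased:
            a = EXP[t % N]                      # alpha^t (assumed symbol ordering)
            # psi_i <- psi_i - alpha^t * psi_(i-1); minus is XOR in GF(2^m)
            psi = [c ^ mul(a, prev) for c, prev in zip(psi + [0], [0] + psi)]
    return psi

flags = [0] * N
for t in (3, 7, 12):                            # illustrative erasure positions
    flags[t] = 1
psi = erasure_locator(flags)
psi_roots = [i for i in range(N) if poly_eval(psi, EXP[(-i) % N]) == 0]
print(psi, psi_roots)
```

Every coefficient update is one multiplication followed by one addition, which is consistent with the $T_{mult} + T_{add}$ critical path stated for the EL block.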
Next let's consider the KES architecture. Directly exploiting the above
reformulated BM algorithm and referring to the semi-systolic KES
architecture presented in [7], we readily obtain the semi-systolic KES
architecture shown in Fig. 1 for errors-and-erasures RS decoders. This
KES architecture mainly consists of two sub-blocks: the discrepancy
computation (DC) block and the error locator update (ELU) block, which
perform the computations of (9) and (10) in each iteration,
respectively. Compared with the KES for the errors-alone RS decoder [7],
this architecture contains $\lfloor \frac{d-1}{2} \rfloor - 1$ more PE0
blocks due to the higher possible degree of the errata locator
polynomial $\Lambda(z)$.
Notice that, as shown in Fig. 1, $\kappa_i^{(U)}$ and $\kappa_i^{(L)}$
in each PE0$_i$ and $\mu_i^{(U)}$ and $\mu_i^{(L)}$ in each PE1$_i$
represent the initialization values of the corresponding upper and lower
DFFs (D flip-flops) in each PE0$_i$ and PE1$_i$ block. According to
Algorithm 4.1, if we initialize all the DFFs with
$$\begin{cases} \kappa_i^{(U)} = \kappa_i^{(L)} = \psi_i, & \text{for } i = 0, 1, \cdots, d-3, \\ \mu_i^{(U)} = \mu_i^{(L)} = \phi_i, & \text{for } i = 0, 1, \cdots, d-2, \end{cases}$$
the KES block will produce $\Lambda(z)$ and $\hat{\Omega}(z)$ in
$d - \rho - 1$ clock cycles.
Finally let's consider the implementation of the $\Phi$-block for the
computation of the polynomial $\Phi(z)$. From (7) we know that $\Phi(z)$
is obtained through a polynomial multiplication followed by reduction
modulo $z^{d-1}$. It is well known that polynomial multiplication can be
realized using an FIR filter with the coefficients initialized by one of
the two polynomials being multiplied [1]. Thus we can implement the
$\Phi$-block using an FIR filter of order $d - 1$, as shown in Fig. 3:
each filter coefficient is set to $s_i$, the corresponding coefficient
of $S(z)$, and the coefficients $\psi_i$ of $\Psi(z)$ are fed
sequentially to the FIR filter as its primary input.
Figure 3: FIR architecture for the $\Phi$-block.
Each DFF in this FIR filter is reset to zero at the beginning, then the
coefficients $\psi_i$ are input sequentially with the subscript $i$
increasing from 0 to $\rho$. We can easily prove that after the last
element $\psi_\rho$ is fed to the FIR filter at the $(\rho + 1)$-th
clock cycle, as shown in Fig. 3, the outputs of the DFFs in this FIR
filter are exactly $\phi_{d-2}, \phi_{d-3}, \cdots, \phi_0$, the
coefficients of the desired polynomial $\Phi(z)$.
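The claim about the register contents after $\rho + 1$ clock cycles can be checked with a small simulation. The sketch assumes a transposed-form filter in which register $j$ takes the value of register $j + 1$ plus the input times $s_j$; this register arrangement is an assumption and may differ in detail from Fig. 3. The field $GF(2^4)$ with primitive polynomial $x^4 + x + 1$ and the sample values of $s_i$ and $\psi_i$ are likewise illustrative.

```python
# GF(2^4) tables from the primitive polynomial x^4 + x + 1 (assumed example field)
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
x = 1
for i in range(N):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x >> M:
        x ^= PRIM
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def phi_direct(s, psi, d):
    """phi_i from (7): sum_j psi_j * s_(i+rho-j), with s_k = 0 for k > d-2."""
    rho = len(psi) - 1
    phi = [0] * (d - 1)
    for i in range(d - 1):
        for j in range(rho + 1):
            if i + rho - j <= d - 2:
                phi[i] ^= mul(psi[j], s[i + rho - j])
    return phi

def phi_fir(s, psi, d):
    """Clock psi_0 .. psi_rho through d-1 registers that start at zero."""
    regs = [0] * (d - 1)
    for psi_i in psi:                           # one input coefficient per clock
        shifted = regs[1:] + [0]                # register j takes register j+1
        regs = [shifted[j] ^ mul(psi_i, s[j]) for j in range(d - 1)]
    return regs

d = 7
s = [5, 0, 9, 14, 1, 7]                         # illustrative syndromes s_0 .. s_5
psi = [1, 11, 4]                                # illustrative Psi(z), rho = 2
print(phi_fir(s, psi, d), phi_direct(s, psi, d))
```

With $\rho = 0$ the only input is $\psi_0 = 1$ and the registers simply load the syndromes, matching $\Phi(z) = S(z)$ in the errors-alone case.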
From the above discussion, we know that the $\Phi$-block and the KES
block complete the computation for one codeword frame in $\rho + 1$ and
$d - \rho - 1$ clock cycles, respectively. Since the number of erasures
$\rho$ may vary from 0 to $d - 1$, it is desirable to implement the
$\Phi$-block and the KES block in the same pipeline stage so that they
work on the same codeword frame and produce the polynomials $\Lambda(z)$
and $\hat{\Omega}(z)$ in a fixed $d$ clock cycles. However, in every $d$
clock cycles, the $\Phi$-block and the KES block will be idle for
$d - \rho - 1$ and $\rho + 1$ clock cycles, respectively, which wastes
silicon area. In the following, we propose a simple scheduling scheme
for the KES block such that it can both compute $\Phi(z)$ and solve the
key equation in a total of $d$ clock cycles. Therefore, we do not need
to physically implement the above FIR filter of order $d - 1$ for the
$\Phi$-block.
Notice that, as shown in Fig. 1, if we set $\gamma^{(r)} = 1$ and
MC = 0 and consecutively assign $\delta^{(r)}$ the values $\psi_i$ with
the subscript $i$ increasing from 0 to $\rho$, the DC block will perform
in exactly the same way as the FIR filter of order $d - 1$ for the
$\Phi$-block in Fig. 3. Based on this observation, we propose the
following scheduling scheme for the KES block to both compute $\Phi(z)$
and solve the key equation in a total of $d$ clock cycles:
Procedure 4.1. KES Block Operation Scheduling
1. In the first $\rho + 1$ clock cycles, compute $\Phi(z)$ using the DC
block only:
(a) Initialize all the DFFs in the DC block with $\mu_i^{(U)} = 0$ and
$\mu_i^{(L)} = s_i$, for $i = 0, 1, \cdots, d-2$, and set
$\gamma^{(r)} = 1$ and MC = 0.
(b) For $0 \le r \le \rho$, in each clock cycle, set
$\delta^{(r)} = \psi_r$.
After $\rho + 1$ clock cycles, the content of the upper DFF in each
PE1$_i$ block, $\hat{\delta}_i^{(\rho+1)}$, is equal to $\phi_i$.
2. In the following $d - \rho - 1$ clock cycles, compute $\Lambda(z)$
using the ELU block and $\hat{\Omega}(z)$ using the DC block:
(a) Re-initialize all the DFFs in the DC and ELU blocks with
$\mu_i^{(L)} = \mu_i^{(U)} = \phi_i$ and
$\kappa_i^{(L)} = \kappa_i^{(U)} = \psi_i$.
(b) For $0 \le r \le d - \rho - 2$, in each clock cycle, set
$\delta^{(r)} = \hat{\delta}_0^{(r)}$ and generate $\gamma^{(r)}$ and
MC$^{(r)}$ according to Algorithm 4.1.
Finally, we obtain the polynomials $\Lambda(z)$ and $\hat{\Omega}(z)$ as
illustrated in Fig. 1.
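The cycle budget of Procedure 4.1 can be tabulated with a few lines of Python. The function name and the choice $d = 17$ are illustrative assumptions; the cycle counts themselves are the ones stated in the procedure.

```python
def kes_schedule(d, rho):
    """Clock cycles spent in the two phases of Procedure 4.1."""
    if not 0 <= rho <= d - 1:
        raise ValueError("rho must lie in 0..d-1")
    phi_cycles = rho + 1          # phase 1: the DC block computes Phi(z)
    bm_cycles = d - rho - 1       # phase 2: ELU and DC blocks run Algorithm 4.1
    return {"phi_cycles": phi_cycles, "bm_cycles": bm_cycles,
            "total": phi_cycles + bm_cycles}

# Assumed example: d = 17, sweeping the number of erasures.
for rho in (0, 4, 16):
    print(rho, kes_schedule(17, rho))
```

The total is $d$ cycles for every $\rho$, so the shared KES block fits a fixed-length pipeline stage, whereas separate $\Phi$ and KES blocks would each sit idle for the cycles listed under the other phase.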
5. CONCLUSION
Applying the methodology behind the algorithm transformation proposed in
[7], in this paper we reformulated the BM algorithm for
errors-and-erasures RS decoding and presented the corresponding
semi-systolic architecture, which has a critical path of only
$T_{mult} + T_{add}$. Moreover, an operation scheduling scheme has been
proposed for the key equation solver (KES) block so that the overall RS
decoder hardware complexity can be reduced without loss of throughput.
Compared with the errors-alone RS decoder, the presented
errors-and-erasures RS decoder can achieve the same throughput but
requires an extra $d + 2 \cdot \lfloor \frac{d-1}{2} \rfloor - 3$
multipliers and $d + \lfloor \frac{d-1}{2} \rfloor - 3$ adders, due to
the implementation of the erasure locator (EL) block and the higher
possible degree of the errata locator polynomial $\Lambda(z)$ in the KES
block.
6. REFERENCES
[1] R. E. Blahut. Theory and Practice of Error Control Codes. Addison
Wesley, 1984.
[2] H.-C. Chang, C. B. Shung, and C.-Y. Lee. A Reed-Solomon product-code
(RS-PC) decoder chip for DVD applications. IEEE Journal of Solid-State
Circuits, 36(2):229-238, Feb. 2001.
[3] C. B. et al. The U.S. HDTV standard the grand. IEEE Spectrum,
32:36-45, April 1995.
[4] J. Forney. On decoding BCH codes. IEEE Trans. on Inform. Theory,
IT-11:549-557, Oct. 1965.
[5] J.-H. Jeng and T.-K. Truong. On decoding of both errors and erasures
of a Reed-Solomon code using an inverse-free Berlekamp-Massey algorithm.
IEEE Trans. on Communications, 47(10):1488-1494, Oct. 1999.
[6] V. Pless. Introduction to the Theory of Error-Correcting Codes.
Wiley, 1998.
[7] D. V. Sarwate and N. R. Shanbhag. High-speed architecture for
Reed-Solomon decoders. IEEE Trans. on VLSI Systems, 9(5):641-655, Oct.
2001, available at http://www.icims.csl.uiuc.edu/~shanbhag/myhome.
[8] K. Seki, K. Mikami, M. Baba, N. Shinohara, S. Suzuki, and H. Tezuka.
Single-chip 10.7Gb/s FEC CODEC LSI using time-multiplexed RS decoder. In
IEEE Custom Integrated Circuits Conference, pages 289-292, 2001.
[9] L.-L. Yang and L. Hanzo. Performance analysis of coded M-ary
orthogonal signaling using errors-and-erasures decoding over
frequency-selective fading channels. IEEE Journal on Selected Areas in
Communications, 19(2):211-221, Feb. 2001.