Abstract: A Reed-Solomon decoder for errors-and-erasures correction, based on a new algebraic decoding algorithm, is presented. This high-speed decoder requires only n clock cycles for decoding each received n-symbol block. A serial structure that requires very few multipliers and provides a general expression to calculate the coefficients of the erasure-locator polynomial is also presented. A (15, 11) RS decoder and its shortened version, a (7, 3) RS decoder, are used as design examples to illustrate the operating procedure of the new decoding algorithm.
Introduction
The encoder/decoder for an RS code differs from a binary encoder/decoder in that it operates on multiple-bit symbols instead of individual bits [1-4]. An (n, k) RS code is a block sequence of symbols in a Galois field $GF(2^m)$. This sequence of symbols can be considered as the coefficients of a code polynomial $C(X) = c_0 + c_1X + c_2X^2 + \cdots + c_{n-1}X^{n-1}$, where $c_i \in GF(2^m)$. The parameters of an (n, k) RS code are listed as follows:

$m$ = number of bits per symbol
$n = 2^m - 1$ = block length of a codeword in symbols
$k$ = number of information symbols in a codeword
$t$ = maximum number of error symbols that can be corrected
$d = n - k + 1 = 2t + 1$ = minimum distance of the code

For the t-error-correcting RS code, the generator polynomial is given by

$$g(X) = \prod_{i=1}^{2t} (X - \alpha^i)$$

and the systematic codeword for an information polynomial $K(X)$ is

$$C(X) = K(X)X^{n-k} + \mathrm{mod}\{K(X)X^{n-k}/g(X)\}$$

where $\mathrm{mod}\{K(X)X^{n-k}/g(X)\}$ indicates the remainder polynomial of $K(X)X^{n-k}$ divided by $g(X)$. An erasure is an error for which the error position is known but the magnitude is not. An RS code with minimum distance d is capable of decoding any pattern of w errors and s erasures as long as $2w + s < d$.

Some RS decoding algorithms for errors-and-erasures correction have already been proposed [1-5], including the standard algebraic method [1-4], the transform method [6, 7], and Euclid's method [1, 7]. Recently, a modified step-by-step decoding algorithm [8, 9] for t-error-correcting cyclic codes was presented. In this paper, by combining the standard algebraic decoding method and the step-by-step decoding concept, a new algebraic decoding method is proposed that reduces the computational complexity so that it can be easily implemented in VLSI technology.
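The generator polynomial and the systematic encoding rule above can be checked with a short Python sketch for the (15, 11) code (t = 2). It is a minimal illustration under stated assumptions, not the paper's encoder: $GF(2^4)$ is assumed to be generated by the primitive polynomial $x^4 + x + 1$ with $\alpha$ represented as 2, and polynomials are stored as coefficient lists, lowest degree first. Because subtraction equals addition in $GF(2^m)$, each factor $(X - \alpha^i)$ is stored as `[alpha^i, 1]`.

```python
# Sketch: generator polynomial and systematic encoding of a (15, 11) RS code.
# Assumptions: GF(2^4) from x^4 + x + 1, alpha = 2, polynomials lowest degree first.
PRIM = 0b10011
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0b10000:
        v ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    """Multiply two GF(16) elements with log/antilog tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gmul(a, b)      # addition in GF(2^m) is XOR
    return out

def poly_mod(num, g):
    """Remainder of num(X) / g(X) for a monic g(X)."""
    rem, dg = list(num), len(g) - 1
    for i in range(len(rem) - 1, dg - 1, -1):
        c = rem[i]
        if c:
            for j in range(dg + 1):
                rem[i - dg + j] ^= gmul(c, g[j])
    return rem[:dg]

def poly_eval(p, x):
    y = 0
    for c in reversed(p):                 # Horner's rule, highest degree first
        y = gmul(y, x) ^ c
    return y

n, k = 15, 11
t = (n - k) // 2
g = [1]
for i in range(1, 2 * t + 1):             # g(X) = prod (X - alpha^i), i = 1..2t
    g = poly_mul(g, [EXP[i], 1])

def encode(msg):
    """C(X) = K(X) X^(n-k) + mod{K(X) X^(n-k) / g(X)}."""
    shifted = [0] * (n - k) + list(msg)
    return poly_mod(shifted, g) + list(msg)

cw = encode([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
print([poly_eval(cw, EXP[q]) for q in range(1, 2 * t + 1)])  # [0, 0, 0, 0]
```

Every codeword has $\alpha, \ldots, \alpha^{2t}$ as roots, which is exactly what the syndrome step below relies on.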
Standard algebraic decoding algorithm
The standard algebraic decoding method [3-5] for errors-and-erasures correction of RS codes is briefly described as follows. Suppose that a code vector $C(X)$ is transmitted and that errors and erasures occur, so that the received word $r(X) = C(X) + e(X) = r_0 + r_1X + \cdots + r_{n-1}X^{n-1}$ is obtained. The error pattern $e(X)$ can be described by a list of the values and locations of its nonzero components. A location is given in terms of an error-location number, which is simply $\alpha^j$ for the $(n-j)$th symbol. Thus, each nonzero component of $e(X)$ is described by a pair of field elements, $Y_i$ (the error value) and $X_i$ (the error-location number), where $X_i$ and $Y_i$ are both elements of $GF(2^m)$. If $X_i$ is known and $Y_i$ is unknown, then the pair $(U_i, V_i)$ is used in place of $(X_i, Y_i)$, where $U_i$ and $V_i$ denote the erasure locator and the erasure magnitude, respectively. Suppose that $w \le t$ errors occur in positions $X_1, X_2, \ldots, X_w$ with nonzero magnitudes $Y_1, Y_2, \ldots, Y_w$, respectively, and that $s$ erasures occur in positions $U_1, U_2, \ldots, U_s$ with respective magnitudes $V_1, V_2, \ldots, V_s$. Furthermore, vector …

This work was supported by the National Science Council of the Republic of China under grant NSC82-0404-EW9-122.
It is assumed that $2w + s < d$. The first step in the decoding process is to calculate the syndrome values. Since $C(X)$ is a code vector and has $\alpha, \alpha^2, \ldots, \alpha^{2t}$ as roots, the syndromes $S_q$ are given by

$$S_q = r(\alpha^q) = \sum_{i=0}^{n-1} r_i\,\alpha^{iq} = C(\alpha^q) + e(\alpha^q) = e(\alpha^q), \qquad q = 1, 2, \ldots, 2t.$$
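The syndrome definition can be illustrated with a short Python sketch that evaluates $r(\alpha^q)$ with Horner's rule, consuming symbols highest order first as a serial one-stage syndrome register would. The field polynomial $x^4 + x + 1$ and the function name `syndromes` are assumptions made for illustration. With the all-zero codeword transmitted and a single error of value $Y = 6$ at error-location number $X = \alpha^{11}$, each syndrome reduces to $S_q = YX^q$.

```python
# Sketch: syndromes S_q = r(alpha^q), q = 1..2t, for n = 15.
# Assumption: GF(2^4) generated by x^4 + x + 1 with alpha = 2.
PRIM = 0b10011
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0b10000:
        v ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def syndromes(r, two_t):
    """r[i] is the coefficient r_i; symbols enter highest order first (Horner)."""
    out = []
    for q in range(1, two_t + 1):
        s = 0
        for sym in reversed(r):
            s = gmul(s, EXP[q]) ^ sym     # s <- s * alpha^q + r_i
        out.append(s)
    return out

# All-zero codeword plus one error: only e(X) shows up in the syndromes.
r = [0] * 15
r[11] = 6                                 # Y = 6 at locator X = alpha^11
S = syndromes(r, 4)
print(S == [gmul(6, EXP[(11 * q) % 15]) for q in range(1, 5)])  # True
```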
Once the syndrome values are known, the remaining tasks are to find the error-locators and the corresponding error values, and also to find the erasure values, provided that the erasure-locators are already known. Define the erasure-locator polynomial and the error-locator polynomial, respectively, as follows:
Also, define the modified syndromes as

As soon as the coefficients of the error-locator polynomial are obtained by solving eqn. 7, the error-locators can be found using the Chien search. Let the given erasure-locators and the calculated error-locators …

By the same procedure, all other erasure values can be solved step by step.

The new algebraic decoding method is a combination of the standard algebraic decoding method and the step-by-step method.
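The Chien search mentioned above tests every error-location number $\alpha^j$ as a root of the error-locator polynomial. The Python sketch below models the usual hardware form: one register per coefficient, with register $i$ multiplied by $\alpha^i$ at each step, so that the register sum equals the locator polynomial evaluated at $\alpha^j$. It assumes $x^4 + x + 1$ as the field polynomial and a locator written as $\prod (X - X_i)$ with coefficients lowest degree first; the exact form of the paper's error-locator polynomial is not shown here, so treat that form as an assumption.

```python
# Sketch: Chien search over the n = 15 error-location numbers alpha^j.
# Assumptions: GF(2^4) from x^4 + x + 1, alpha = 2; locator = prod (X - X_i).
PRIM = 0b10011
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0b10000:
        v ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gmul(a, b)
    return out

def chien_search(locator, n=15):
    """Return every j for which alpha^j is a root of locator(X)."""
    terms = list(locator)                 # terms[i] = lambda_i * alpha^(i*j)
    roots = []
    for j in range(n):
        if j > 0:                         # one step: register i times alpha^i
            terms = [gmul(c, EXP[i]) for i, c in enumerate(terms)]
        total = 0
        for c in terms:
            total ^= c                    # GF(2^m) addition is XOR
        if total == 0:
            roots.append(j)
    return roots

lam = poly_mul([EXP[3], 1], [EXP[11], 1])  # (X - alpha^3)(X - alpha^11)
print(chien_search(lam))                   # [3, 11]
```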
In the following description a decoder for RS codes of distance 5 is used to illustrate the new algebraic decoding algorithm. This RS decoder has the capability of correcting two errors, or one error and two erasures, or four erasures. In our discussion, only cases for which the received word has, at most, one error and some erasures are considered.
Decoding algorithm for RS codes of distance 5
If the decoder restricts itself to correcting at most one error, the (n, k) RS code of distance 5 can work in the following ten cases:
(1) no error and no erasure
(2) no error and one erasure, say $(U_1, V_1)$
(3) no error and two erasures, say $(U_1, V_1)$, $(U_2, V_2)$
(4) no error and three erasures, say $(U_1, V_1)$, $(U_2, V_2)$, $(U_3, V_3)$

To make $U_1 = 1$, the entire received word is cyclically shifted step by step, and the lowest-order symbol of the cyclically shifted word is checked to determine whether it is an erasure symbol. If there is an erasure symbol at the lowest-order position of the cyclically shifted word, then the corresponding erasure value is calculated using the following formula:

The computation to find the coefficients of $\sigma_d(X)$, the erasure-locator polynomial with $U_1$ deleted, is very complicated. To reduce the hardware complexity, a serial structure for calculating these coefficients is proposed here.
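One identity that yields the value at the lowest-order position, not necessarily identical in form to the paper's own formula, follows directly from $S_q = \sum_k e_k Z_k^q$ once every locator $Z_k$ is known (errors already converted to erasure-locators): with $\sigma_d(X) = \sum_j \sigma_{d,j}X^j$ having all locators except $Z_1$ as roots, $\sum_j \sigma_{d,j}S_{j+1} = e_1 Z_1 \sigma_d(Z_1)$, so for $Z_1 = 1$ the value is $\sum_j \sigma_{d,j}S_{j+1} / \sigma_d(1)$. The Python sketch below applies this step by step: each cyclic shift multiplies $S_q$ by $\alpha^q$ and every locator by $\alpha$. The helper names (`shift_once`, `value_at_unit_locator`) and the field polynomial $x^4 + x + 1$ are assumptions.

```python
# Sketch: step-by-step erasure values after cyclic shifts (assumed formula above).
PRIM = 0b10011
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0b10000:
        v ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def ginv(a):
    return EXP[(15 - LOG[a]) % 15]

def poly_eval(p, x):
    y = 0
    for c in reversed(p):
        y = gmul(y, x) ^ c
    return y

def poly_from_roots(roots):
    p = [1]
    for u in roots:                       # multiply by (X - u)
        p = [a ^ gmul(u, b) for a, b in zip([0] + p, p + [0])]
    return p

def shift_once(S, locs):
    """Cyclic shift X*r(X) mod (X^15 - 1): S_q <- alpha^q S_q, U <- alpha U."""
    return ([gmul(s, EXP[q + 1]) for q, s in enumerate(S)],
            [gmul(u, EXP[1]) for u in locs])

def value_at_unit_locator(S, locs):
    """Hypothetical helper: value of the symbol whose locator equals 1."""
    sigma_d = poly_from_roots([u for u in locs if u != 1])
    num = 0
    for j, c in enumerate(sigma_d):
        num ^= gmul(c, S[j])              # S[j] holds S_(j+1)
    return gmul(num, ginv(poly_eval(sigma_d, 1)))

# Values 5, 7, 6 at positions 4, 7, 11, all with known locators.
r = [0] * 15
r[4], r[7], r[11] = 5, 7, 6
S = [poly_eval(r, EXP[q]) for q in range(1, 5)]   # S_1..S_4
locs = [EXP[4], EXP[7], EXP[11]]
values = {}
for m in range(1, 16):
    S, locs = shift_once(S, locs)
    if 1 in locs:                         # position 15 - m is now the lowest order
        values[15 - m] = value_at_unit_locator(S, locs)
print(values)                             # {11: 6, 7: 7, 4: 5}
```

The identity needs at most 2t syndromes, which matches the capability condition for this distance-5 code when the single error has already been turned into an erasure.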
If the decoding circuit is used for decoding the (n − f, k − f) RS code, a shortened version of the (n, k) RS code, then the proper syndromes for decoding the received symbol $r_{n-1-f}$ are equal to the remainders resulting from dividing $X^f r(X)$ by $M_i(X)$, where $M_i(X) = X - \alpha^i$ for $i = 1, 2, \ldots, 2t$. This computation can be accomplished with a one-stage LFSR whose input is multiplied by a constant field element before entering the LFSR. By computing all $S_i^{(f)}$ in this way, the extra f shifts of the syndrome register can be avoided, and the decoding algorithm discussed in the previous section is directly applicable to shortened RS codes.

The erasure-locator polynomial defined by eqn. 4 can be rewritten as

$$\sigma(X) = \prod_{q=1}^{s} (X - U_q) \qquad (13)$$

In fact, $\sigma(X)$ can be calculated serially, because multiplying a polynomial, say $A(X)$, by $(X - U_q)$ is equivalent to multiplying $A(X)$ by $X$ and subtracting the polynomial obtained by multiplying each coefficient of $A(X)$ by $U_q$. From this point of view, $\sigma(X)$ can be obtained by the following procedure:
(i) Set $\sigma(X) = 1$ and $q = 1$
(iv) If $q < s$, then increment $q$ by 1 and go to step (iii); otherwise, stop.
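The serial procedure can be modelled in a few lines of Python: each pass multiplies the current polynomial by $X$ (a coefficient shift) and adds $U_q$ times the old coefficients, since subtraction is XOR in $GF(2^m)$. This is a software sketch of the idea, not the sigma circuit itself; the names `times_x_minus_u` and `erasure_locator` and the field polynomial $x^4 + x + 1$ are assumptions.

```python
# Sketch: serial computation of sigma(X) = prod (X - U_q), one erasure per pass.
# Assumptions: GF(2^4) from x^4 + x + 1, alpha = 2, coefficients lowest degree first.
PRIM = 0b10011
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0b10000:
        v ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(p, x):
    y = 0
    for c in reversed(p):
        y = gmul(y, x) ^ c
    return y

def times_x_minus_u(a, u):
    """A(X)(X - U) = X*A(X) - U*A(X); subtraction is XOR in GF(2^m)."""
    x_times_a = [0] + a                   # multiply by X: shift coefficients up
    u_times_a = [gmul(u, c) for c in a] + [0]
    return [p ^ q for p, q in zip(x_times_a, u_times_a)]

def erasure_locator(locators):
    sigma = [1]                           # step (i): sigma(X) = 1
    for u in locators:                    # one pass per erasure-locator U_q
        sigma = times_x_minus_u(sigma, u)
    return sigma

print(erasure_locator([1, EXP[1]]))       # (X - 1)(X - alpha) -> [2, 3, 1]
```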
The erasure-locator polynomial with $U_1$ deleted, defined by eqn. 8, can be rewritten as

$$\sigma_d(X) = \prod_{q=2}^{s} (X - U_q)$$

Then $\sigma_d(X)$ can be obtained by the same algorithm with a slight modification, as described below.
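Besides rerunning the serial product with $U_1$ skipped, $\sigma_d(X)$ can also be obtained by dividing $\sigma(X)$ by $(X - U_1)$, since $U_1$ is a known root. The Python sketch below uses synthetic division for this; that technique is swapped in here for illustration and is not the paper's circuit. The field polynomial $x^4 + x + 1$ and the names `deflate` and `erasure_locator` are assumptions.

```python
# Sketch: sigma_d(X) = sigma(X) / (X - U_1) by synthetic division (alternative method).
PRIM = 0b10011
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0b10000:
        v ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def erasure_locator(locators):
    sigma = [1]
    for u in locators:                    # multiply by (X - u)
        sigma = [a ^ gmul(u, b) for a, b in zip([0] + sigma, sigma + [0])]
    return sigma

def deflate(sigma, u):
    """Quotient of sigma(X) / (X - u); u must be a root of sigma(X)."""
    s = len(sigma) - 1
    quot = [0] * s
    quot[s - 1] = sigma[s]
    for i in range(s - 1, 0, -1):         # q_(i-1) = sigma_i + u * q_i
        quot[i - 1] = sigma[i] ^ gmul(u, quot[i])
    return quot

locs = [EXP[4], EXP[7], EXP[11]]          # U_1, U_2, U_3
sigma = erasure_locator(locs)
print(deflate(sigma, locs[0]) == erasure_locator(locs[1:]))  # True
```

Division needs s − 1 multiply-adds and reuses an already latched $\sigma(X)$, whereas rerunning the product keeps the same serial hardware; which is cheaper depends on the surrounding datapath.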
Buffer module
The buffer module is composed of 16 one-symbol shift registers in $GF(2^4)$. Each one-symbol shift register is constructed from four binary shift registers linked in parallel. The first 15 symbol registers are used to store the received word, while the last one is used to latch the symbol that is shifted out and is ready for decoding. Since the number of shifts required for decoding errors and erasures is equal to that required for calculating the initial syndrome values and erasure-locators, the syndromes and erasure-locators of the next received word can be calculated at the same time. Thus, the average number of shifts required for decoding one received n-symbol block is just n. This line-speed decoding capability is achieved by using two syndrome and erasure-locator calculation modules in the decoder structure, as shown in Fig. 1. The two modules alternate in calculating the syndrome values and erasure-locators of the received words. While a block is being shifted into the buffer module, the first syndrome and erasure-locator calculation module calculates the syndrome values and erasure-locators of the current received block, and the second one holds the calculated syndrome values and erasure-locators of the previous block, which are ready for the errors-and-erasures correction modules and others. As soon as the current block has been completely received, the result from the first syndrome and erasure-locator calculation module is passed to the errors-and-erasures correction modules for decoding that block, and the second syndrome and erasure-locator calculation module is enabled to calculate the syndrome values and erasure-locators of the next received block.
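A toy timeline of the two alternating syndrome and erasure-locator calculation modules may make the throughput argument concrete. This is a hypothetical cycle model, not taken from the design: block b is assumed to be shifted in during cycles $[bn, (b+1)n)$ while module $(b \bmod 2) + 1$ computes its syndromes and erasure-locators, and to be corrected during the next n cycles while the other module takes block b + 1.

```python
# Sketch: ping-pong use of two syndrome/erasure-locator modules (hypothetical model).
def schedule(num_blocks, n=15):
    """Return (block, module, receive_window, correct_window) tuples in cycles."""
    rows = []
    for b in range(num_blocks):
        module = b % 2 + 1                     # modules 1 and 2 alternate
        receive = (b * n, (b + 1) * n)         # block shifted in, syndromes computed
        correct = ((b + 1) * n, (b + 2) * n)   # errors-and-erasures correction
        rows.append((b, module, receive, correct))
    return rows

for row in schedule(3):
    print(row)
# (0, 1, (0, 15), (15, 30))
# (1, 2, (15, 30), (30, 45))
# (2, 1, (30, 45), (45, 60))
```

Each block occupies 2n cycles from first input to last corrected symbol, but a new block finishes every n cycles, which is the line-speed property claimed above.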
2 Syndrome and erasure-locator calculation

Errors-and-erasures correction module
The errors-and-erasures correction module is used to calculate the error-locator and the erasure values from the results of the syndrome and erasure-locator calculation module. The block diagram of the errors-and-erasures correction module is illustrated in Fig. 3; it comprises an erasure-locator polynomial calculation …

The erasure-locator polynomial calculation circuit, comprising a sigma circuit and a sigma clock, is used to calculate the coefficients of the erasure-locator polynomial using the algorithm derived above. The sigma circuit first sets $\sigma(X) = 1$ and then multiplies $\sigma(X)$ by $(X - U_q)$, $q = 1, 2, 3, 4$. This circuit can be realised by three multipliers, four adders, and four one-stage LFSRs. The modified syndromes TI, To, and Tp can be obtained from the syndrome values and the coefficients of the erasure-locator polynomial by the following equations:
Therefore, the modified syndrome calculation circuit can be implemented by seven multipliers and five adders. 
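Since these circuits are counted in $GF(2^4)$ multipliers and adders, a bit-level software model of those primitives may help; the multiplier, inverse and adder are discussed in the next subsection. The sketch below is a shift-and-add multiplier (a sequential analogue of what a cellular array computes combinationally), an inverse computed as $a^{2^m-2}$ by square-and-multiply, and XOR addition. The field polynomial $x^4 + x + 1$ is an assumption; the names `gf_mul_bits`, `gf_inv` and `gf_add` are illustrative.

```python
# Sketch: bit-level GF(2^4) multiplication, inversion and addition.
# Assumption: field polynomial x^4 + x + 1.
M = 4
PRIM = 0b10011

def gf_mul_bits(a, b):
    """Shift-and-add product a*b reduced modulo PRIM (MSB of b first)."""
    p = 0
    for i in range(M - 1, -1, -1):
        p <<= 1                           # multiply the partial product by x
        if p & (1 << M):
            p ^= PRIM                     # reduce: x^4 -> x + 1
        if (b >> i) & 1:
            p ^= a                        # add a when bit i of b is set
    return p

def gf_inv(a):
    """a^-1 = a^(2^M - 2) for a != 0, because a^(2^M - 1) = 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    result, base, e = 1, a, 2 ** M - 2
    while e:
        if e & 1:
            result = gf_mul_bits(result, base)
        base = gf_mul_bits(base, base)
        e >>= 1
    return result

def gf_add(a, b):
    return a ^ b                          # one 2-input XOR gate per bit

print(gf_mul_bits(2, 8))                  # alpha * alpha^3 = alpha^4 = 3
print(all(gf_mul_bits(a, gf_inv(a)) == 1 for a in range(1, 16)))  # True
```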
Finite field multiplier, inverse and adder
The decoding speed of the decoder is determined mainly by the delay (computation time) of the multiplier. The multiplication of two elements in $GF(2^4)$ can be achieved by using an $m = 4$ cellular-array multiplier [10]. Another type, the Massey-Omura multiplier [11], taking … the decoding speed, although some extra basis-conversion operations would have to be added to transform elements represented in the conventional basis $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$ into ones represented in a normal basis $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$. For any $a$ in $GF(2^m)$, $a^{2^m} = a$. Hence, for $a \neq 0$, the inverse of $a$ is $a^{-1} = a^{2^m-2}$. The inverse in $GF(2^4)$ can be computed with combinational logic circuits, and addition in $GF(2^4)$ with 2-input exclusive-OR (XOR) gates.

Control module

Fig. 4 shows the control signals required by the (15, 11) RS decoder for errors-and-erasures correction. This module is implemented in Fig. 5. The function of each control signal is described in the following list.
(1) CK is the data rate of the received word.
(2) CKX8 is an internal clock of the decoder, and its frequency must be at least 8 times that of CK.
(3) VD is the DC voltage source.
(4) RESET is the initial reset of the decoder.
(5) RS1 and RS2 are used to reset the syndrome and erasure-locator calculation modules after the results are computed and passed out, once every 2n clock cycles, which is the decoding time of a received block.
(6) IN is the serial received-data input.
(7) FLAG is the serially received erasure flag.
(8) ERR indicates whether the decoder has error-correction capability. If the received word contains no more than two erasures, then ERR = 1, which indicates that the decoder can correct one error; otherwise ERR = 0, which indicates that the decoder cannot correct any error.
(9) ERA is an indicator to indicate that there is an …
(10) GATE1 is the signal used to control some switch circuits of the decoder. GATE1 alternates between low and high every n cycles. When GATE1 is low, the current input symbols from IN are switched to the first syndrome and erasure-locator calculation module to calculate their initial syndrome values and erasure-locators. Meanwhile, the calculated syndrome values and erasure-locators of the previously received block, held in the second syndrome and erasure-locator calculation module, are passed to the errors-and-erasures correction modules for further processing. After n clock cycles, GATE1 goes high and the roles of the first and second syndrome and erasure-locator calculation modules are exchanged.
(11) GATE3 is high for one-half clock cycle after GATE1 has changed. This signal is used to select the erasure-locator polynomial calculation circuit to calculate the coefficients of the erasure-locator polynomial.
(12) GATE4 goes high one-half clock cycle after GATE1 has changed and stays high for one-half clock cycle. This signal is used to select the error-locator and erasure value calculation module to calculate the error-locator $U_{ERR}$ and save $U_{ERR}$ as an erasure-locator. When GATE4 is low, the error-locator and erasure value calculation module is selected to calculate the erasure value LV.
(13) CU1, CU2, CU3 and CU4 are used to control the data input of the erasure-locator polynomial calculation module. When CU1 (or CU3) is high, the current erasure-locator LO1 (or LO3) from the erasure-locator calculation module is passed to the erasure-locator polynomial calculation module via the data bus LO13. Similarly, when CU2 (or CU4) is high, the current erasure-locator LO2 (or LO4) from the erasure-locator calculation module is passed to the erasure-locator polynomial calculation module via the data bus LO24.
(14) CLEAR is used to reset the sigma circuit after the proper coefficients of the erasure-locator polynomial are latched.
(15) SIGNAL13 and SIGNAL24 are fed into the sigma-clock circuit to synthesise SIGNAL, the LFSR triggering clock of the sigma circuit.
(16) CTL is the latch clock of the sigma circuit, used to latch the proper coefficients of the erasure-locator polynomial calculated by the sigma circuit.
(17) SAMPLE is the sampling clock used to sample the final corrected result after the error or erasure has been removed from the symbol being decoded.

4.6 Operation sequence and simulation results

Assume that $r(X) = r_1X^4 + r_2X^7 + r_3X^{11}$ is the received block, where $r_1 = 5$, $r_2 = 7$ and $r_3 = 6$; $r_1$ and $r_2$ are erasures and $r_3$ is an error. The simulation result of the (15, 11) RS decoder is illustrated in Fig. 6. The period of CKX8 is 50 time units and that of CK is 800 time units. The operation sequence of the decoder is described as follows:
(1) In the first 15 cycles (time 0 to 12000), GATE1 is low, so r(X) is shifted into the buffer module and, at the same time, the syndrome values and erasure-locators are calculated in the first syndrome and erasure-locator calculation module.
(2) At time 12000, GATE1 switches to high and the next word is ready for decoding, while the results calculated in the first syndrome and erasure-locator calculation module are switched to the errors-and-erasures correction module.
(3) At time 12400, GATE4 switches from low to high to select the error-locator and erasure value calculation module, which computes the error-locator of the first received word and saves it as an erasure-locator. Since $r_3$ is an error and its locator is $\alpha^{11}$, the calculated $U_{ERR}$ is equal to EH, where H denotes hexadecimal representation. GATE4 switches from high to low after $U_{ERR}$ is obtained.
(4) At time 12400, the highest order symbol of r(X) is shifted out and the syndrome values and erasure-locators in the first syndrome and erasure-locator calculation module are cyclically shifted once, while the first symbol of the next block is received and its syndrome values and erasure-locators are calculated in the second syndrome and erasure-locator calculation module.
(5) From time 12400 to time 24400, the errors-and-erasures correction module checks each of the erasure-locator LFSRs. If one of the erasure-locators is found to be 1 (i.e. $\alpha^0$), the symbol currently being shifted out is an erasure, and the erasure value LV is calculated by the errors-and-erasures correction module.
(6) After the erasure value LV is known, LV is added to the output of the buffer module BO, which generates the output AD. The output AD is sampled at the rising edge of SAMPLE and the final decoded result of the first symbol of r(X) is obtained as the output OUT.
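The operation sequence above can be replayed in software for the example word. The Python sketch below is a hedged model, not the decoder's circuitry: it assumes $GF(2^4)$ generated by $x^4 + x + 1$, reads the symbol values 5, 7 and 6 as 4-bit field elements, and locates the single error with Forney-style modified syndromes $T_j = \sum_i \sigma_i S_{i+j}$ (a textbook form assumed here, since the paper's own modified-syndrome equations are not shown). For one error at $X$ with value $Y$, $T_j = YX^j\sigma(X)$, so $X = T_2/T_1$; the result is then saved as an erasure-locator and every value is recovered step by step while shifting.

```python
# Sketch: decoding r(X) = 5X^4 + 7X^7 + 6X^11 (two erasures, one error).
# Assumptions: GF(2^4) from x^4 + x + 1 (alpha = 2); Forney-style modified syndromes.
PRIM = 0b10011
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0b10000:
        v ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def ginv(a):
    return EXP[(15 - LOG[a]) % 15]

def poly_eval(p, x):
    y = 0
    for c in reversed(p):
        y = gmul(y, x) ^ c
    return y

def from_roots(roots):
    p = [1]
    for u in roots:                       # multiply by (X - u)
        p = [a ^ gmul(u, b) for a, b in zip([0] + p, p + [0])]
    return p

n = 15
r = [0] * n
r[4], r[7], r[11] = 5, 7, 6               # r1, r2 (erasures) and r3 (error)
erasures = [EXP[4], EXP[7]]               # erasure-locators flagged by FLAG
S = [poly_eval(r, EXP[q]) for q in range(1, 5)]   # S[0] holds S_1

# Modified syndromes T_j = sum_i sigma_i S_(i+j), j = 1, 2; then X = T_2 / T_1.
sigma = from_roots(erasures)
T = [0, 0]
for j in (1, 2):
    for i, c in enumerate(sigma):
        T[j - 1] ^= gmul(c, S[i + j - 1])
u_err = gmul(T[1], ginv(T[0]))
print(hex(u_err))                         # 0xe, i.e. alpha^11

# Save U_ERR as an erasure-locator, then correct symbol by symbol while shifting.
locs = erasures + [u_err]
out = list(r)
for m in range(1, n + 1):
    S = [gmul(s, EXP[q + 1]) for q, s in enumerate(S)]
    locs = [gmul(u, EXP[1]) for u in locs]
    if 1 in locs:                         # symbol X^(n-m) is being shifted out
        sigma_d = from_roots([u for u in locs if u != 1])
        num = 0
        for j, c in enumerate(sigma_d):
            num ^= gmul(c, S[j])
        out[n - m] ^= gmul(num, ginv(poly_eval(sigma_d, 1)))
print(out == [0] * n)                     # True: the decoded word is zero
```

With $x^4 + x + 1$, $\alpha^{11}$ is 1110 in binary, which matches the value EH reported for $U_{ERR}$.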
The simulation result in Fig. 6 illustrates that the error symbol $r_3$ of $r(X)$ is found at time 15200 and the erasure symbols $r_2$ and $r_1$ are found at times 18400 and 20800, respectively. The final decoded word of $r(X)$ is zero, which is a codeword of the (15, 11) RS code.

The syndrome values and erasure-locators corresponding to the first decoded symbol of the shortened RS code can be obtained by multiplying $r(X)$ by $X^8$ before calculating the syndromes and erasure-locators.
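Multiplying $r(X)$ by $X^f$ (here f = 8 for the (7, 3) code) does not require f extra shifts: the remainder of $X^f r(X)$ divided by $(X - \alpha^i)$ is $\alpha^{if} r(\alpha^i)$, so a one-stage LFSR can simply scale each input symbol by the constant $\alpha^{if}$. The Python sketch below checks this against direct evaluation. The field polynomial $x^4 + x + 1$ and the function name `shortened_syndromes` are assumptions.

```python
# Sketch: syndromes of X^f r(X) for a shortened code via constant-scaled LFSR input.
# Assumptions: GF(2^4) from x^4 + x + 1, alpha = 2.
PRIM = 0b10011
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0b10000:
        v ^= PRIM
for i in range(15, 30):
    EXP[i] = EXP[i - 15]

def gmul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(p, x):
    y = 0
    for c in reversed(p):
        y = gmul(y, x) ^ c
    return y

def shortened_syndromes(r, f, two_t):
    """Remainder of X^f r(X) / (X - alpha^i), i = 1..2t, from one-stage LFSRs
    whose input symbol is first multiplied by the constant alpha^(i*f)."""
    out = []
    for i in range(1, two_t + 1):
        const = EXP[(i * f) % 15]
        s = 0
        for sym in reversed(r):           # highest-order symbol enters first
            s = gmul(s, EXP[i]) ^ gmul(const, sym)
        out.append(s)
    return out

r7 = [1, 2, 3, 4, 5, 6, 7]                # an arbitrary 7-symbol block r_0..r_6
direct = [poly_eval([0] * 8 + r7, EXP[i]) for i in range(1, 5)]  # X^8 r(X)
print(shortened_syndromes(r7, 8, 4) == direct)  # True
```

After premultiplication, $r_6$ sits at $X^{14}$, the highest-order position, so it is the first symbol decoded, just as $r_{n-1}$ is for the full-length code.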
Fig. 6 Simulation result of the (15, 11) RS decoder (signal traces including LV, time axis 0 to 32000 time units)

The working data rate of the decoder is determined mainly by the computation time of the errors-and-erasures correction module, especially that of the multiplications. The decoder works successfully when the period of CK is 800 time units, assuming that one gate delay is one time unit. Furthermore, if an IC technology with 0.5 ns gate delay is employed, for example a 1.0 μm CMOS process, then the working data rate of this decoder will be up to 2M symbols/s.
5 Conclusion
A high-speed RS decoder for errors-and-erasures correction based on a new algebraic decoding method has been presented. A serial structure that provides a general expression for calculating the erasure-locator polynomial is also proposed. Consequently, a (15, 9) RS decoder with $d_{min} = 7$ can also be applied to decode the (15, 11) or (15, 13) RS code with $d_{min} = 5$ or 3, respectively, by modifying only the control module.

The (7, 3) RS code, a shortened code derived from the (15, 11) RS code, can be decoded by the same decoder structure with slight modifications of the buffer module, the syndrome and erasure-locator calculation module, and the control module. Since the block length of this code is 7, only 8 one-symbol registers are needed in the buffer module to store the received block and latch the shifted-out symbol. The syndrome …

The numbers of gates required in the (15, 11) RS decoder and the (7, 3) RS decoder are shown in Table 1. A brief comparison of hardware complexity and computation time between the standard algebraic decoding method and the new algebraic decoding method for RS decoders is also given in Table 2, where the chip area taken by a $GF(2^4)$ multiplier is defined as one unit a and a multiplier delay is defined as one unit mt.
