A new VLSI design of a pipeline Reed-Solomon decoder is presented. The transform
I. Introduction
Recently a VLSI design of a pipeline Reed-Solomon decoder was presented [1]. A modified form of Euclid's algorithm was developed which avoided computations of inverse elements. A systolic array architecture was designed, from a suggestion by Brent and Kung [2], to implement the modified Euclid's algorithm. More recently, another VLSI design of an RS decoder was introduced [3]. It combined the algorithm in [4] and the modified Euclid's algorithm instead of the continued fraction technique. The decoder design in [3] used a time domain decoding algorithm to reduce the massive circuitry required by the inverse transform in [1].
The decoder design also included the erasure correction capability, and, during the design process, a recursive architecture was derived to implement the modified Euclid's algorithm with far fewer circuits than used in [1].
It has been pointed out [5] that the errata locator polynomial can be obtained directly from the Massey-Berlekamp algorithm if initialized properly. This suggestion led to improvements in the VLSI design in [3].
In this article, an efficient time domain RS decoding algorithm is described and verified. It is shown that the modified Euclid's algorithm can produce the errata locator polynomial and errata evaluator polynomial simultaneously, similar to the Massey-Berlekamp algorithm. The VLSI architectures for syndrome computation, polynomial expansion, the modified Euclid's algorithm, and polynomial evaluation are also described.
This work was carried out during the architectural phase of the Advanced Reed-Solomon Decoder (ARSD) project and should be viewed as a companion to the recent work of Truong et al. [6]. In that article, a transform domain decoder architecture is developed which, due to its design simplicity, has been chosen for the prototype VLSI implementation of the ARSD. However, the work presented here and in [6] clearly shows that the time domain architecture has many desirable features which make it an attractive candidate for future VLSI implementation.
II. The Time Domain Reed-Solomon Decoding Algorithm
Let $N = 2^m - 1$ be the length of the $(N, Z)$ RS code with design distance $d$. Let

$$r(X) = \sum_{i=0}^{N-1} r_i X^i$$

be the received message. Suppose $e$ errors and $E$ erasures occur, and $2e + E \le d - 1$. Define $\Lambda = \{\alpha^{-i} \mid r_i \text{ declared as an erasure}\}$.
The decoding algorithm is as follows:
Step 1. Compute the syndromes

$$S_k = \sum_{i=0}^{N-1} r_i \alpha^{ik}, \quad 1 \le k \le d - 1.$$

Step 2. Compute the erasure locator polynomial $\Lambda(X)$. Assume the erasure location information is received in the form of a binary sequence synchronous to the received message. Then for each symbol $r_i$ that is labeled as an erasure, $\alpha^{-i}$ should be the root of the erasure locator polynomial $\Lambda(X)$. That is,

$$\Lambda(X) = \prod_{\alpha^{-i} \in \Lambda} (X - \alpha^{-i}).$$

Step 3. Multiply the syndrome polynomial $S(X)$ by the erasure locator polynomial $\Lambda(X)$ to form the modified syndrome polynomial

$$T(X) \equiv S(X)\Lambda(X) \bmod X^{d-1}.$$
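Step 4. If $\deg(\Lambda(X)) > \deg(T(X))$, then no error has occurred, i.e., $e = 0$, and there is no need to perform the modified Euclid's algorithm. Let the errata locator polynomial $\sigma(X) = \Lambda(X)$ and the errata evaluator polynomial $\omega(X) = T(X)$. Otherwise, perform the modified Euclid's algorithm on $X^{d-1}$ and $T(X)$ to obtain $\sigma(X)$ and $\omega(X)$.
Step 5. Evaluate the errata locator polynomial $\sigma(X)$ for $\alpha^{-i}$, $0 \le i \le N - 1$. If $\sigma(\alpha^{-i}) = 0$, then $r_i$ is a corrupted symbol.
Step 6. Compute the corresponding errata magnitudes $e_i$ from the ratio of $\omega(X)$ to the derivative $\sigma'(X)$, both evaluated at $X = \alpha^{-i}$. Note that the scale factor carried by $\omega(X)$ and $\sigma(X)$ is automatically cancelled by this division.
Step 7. Subtracting the errata magnitude $e_i$ from $r_i$ yields the decoded codeword.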
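The front end (Steps 1 to 3) and the back end (Steps 5 to 7) of the algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it works in the prime field GF(17) with $\alpha = 2$, $N = 8$, $d = 5$ (the parameters of the worked example later in this section), it takes $S(X) = \sum_{k=1}^{d-1} S_k X^{k-1}$ with coefficients stored low order first, and it uses the magnitude rule $e_i = -\omega(\alpha^{-i})/\sigma'(\alpha^{-i})$. The indexing of $S(X)$, the minus sign, and all function names are choices made for this sketch so that it is self-consistent. Step 4 is not included; `correct` expects $\sigma(X)$ and $\omega(X)$ as inputs.

```python
P = 17       # prime field GF(17), assumed for this sketch
ALPHA = 2    # element of order N = 8 in GF(17)
N, D = 8, 5  # code length and design distance (assumed RS(8,4) setting)

def inv(x):
    return pow(x, P - 2, P)          # Fermat inverse, valid in a prime field

A_INV = inv(ALPHA)

def poly_eval(p, x):
    acc = 0
    for c in reversed(p):            # Horner's rule, coefficients low order first
        acc = (acc * x + c) % P
    return acc

def syndromes(r):
    """Step 1: S_k = r(alpha^k), k = 1..D-1, all updated as each symbol arrives."""
    roots = [pow(ALPHA, k, P) for k in range(1, D)]
    s = [0] * (D - 1)
    for sym in reversed(r):          # r_{N-1} is the first symbol received
        s = [(acc * a + sym) % P for acc, a in zip(s, roots)]
    return s

def expand(poly, root, limit=None):
    """Multiply poly by (X - root): shift, scale by root, subtract."""
    out = [0] + list(poly)                       # X * poly
    for j, c in enumerate(poly):
        out[j] = (out[j] - root * c) % P         # minus root * poly
    return out[:limit]

def erasure_locator(positions):
    """Step 2: Lambda(X) = prod (X - alpha^-i), starting from 1."""
    lam = [1]
    for i in positions:
        lam = expand(lam, pow(A_INV, i, P))
    return lam

def modified_syndrome(s, positions):
    """Step 3: T(X) = S(X) Lambda(X) mod X^(D-1), starting from S(X)."""
    t = list(s)
    for i in positions:
        t = expand(t, pow(A_INV, i, P), limit=D - 1)
    return t

def correct(r, sigma, omega):
    """Steps 5-7: root search, magnitude by division, subtraction."""
    dsigma = [(k * c) % P for k, c in enumerate(sigma)][1:]   # formal derivative
    out = list(r)
    for i in range(N):
        x = pow(A_INV, i, P)
        if poly_eval(sigma, x) == 0:             # r_i is a corrupted symbol
            e_i = (-poly_eval(omega, x) * inv(poly_eval(dsigma, x))) % P
            out[i] = (out[i] - e_i) % P
    return out

# Hypothetical data: all-zero codeword, erasure at position 0, error at position 1.
r = [1, 1, 0, 0, 0, 0, 0, 0]
s = syndromes(r)                      # [3, 5, 9, 0]
lam = erasure_locator([0])            # [16, 1], i.e. X - 1
t = modified_syndrome(s, [0])         # [14, 15, 13, 9]
print(correct(r, [4, 5, 8], [12, 1])) # all zeros for sigma = 8X^2 + 5X + 4, omega = X + 12
```

Because `correct` divides $\omega$ by $\sigma'$, any common nonzero scale factor on the two polynomials drops out, which is the cancellation noted in Step 6.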
Note that the modified Euclid's algorithm in Step 4 is a combination of three techniques. First, observe that the error locator polynomial $\lambda(X)$ and the errata evaluator polynomial $\omega(X)$ can be obtained from Euclid's algorithm by computing the GCD of the modified syndrome $T(X)$ and $X^{d-1}$ with the following initializations:
Since $e$ errors and $E$ erasures occur and $2e + E \le d - 1$, as in Theorem 8.4 of [7], the following properties hold:
Applying properties (14) and (17) to Theorem 8.5 of [7] implies that there exist a unique $j$ and a unique polynomial $p(X)$ such that $\lambda(X) = p(X)\lambda_j(X)$ and $\omega(X) = p(X)R_j(X)$. By properties (15) and (16), $p(X)$ is a constant, which can be taken to be unity without affecting the roots of $\lambda(X)$ or the magnitudes $e_i$.
The second technique applied to the modified Euclid's algorithm is that the errata locator polynomial $\sigma(X) = \Lambda(X)\lambda(X)$ can be obtained directly from Euclid's algorithm. To achieve this, $\mu_0(X)$ must be initialized to be the erasure locator polynomial $\Lambda(X)$ instead of 1, and the iteration stop criterion must be changed to $\deg(R_i(X)) < \deg(\lambda_i(X))$. Such a change simply results in all $\lambda_i(X)$ carrying the factor $\Lambda(X)$. The errata evaluator polynomial $\omega(X)$ is not affected by such initialization because the computation of $R_i(X)$ does not involve $\lambda_i(X)$. As will be shown later, using the modified Euclid's algorithm to compute the errata locator polynomial directly eliminates the need for polynomial multiplication circuits and delay lines in a VLSI pipeline implementation.
Third, the modified Euclid's algorithm uses cross multiplication and subtraction to replace polynomial division. Such operations eliminate the need to compute finite field inverse elements in this step, a computation that is otherwise performed by table look-up. Since a look-up table occupies a large silicon area in VLSI, it is preferable to do this as infrequently as possible.
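The three techniques can be combined in a short software sketch. The version below is an illustration under assumptions, not the authors' recursion: it works over the prime field GF(17), stores coefficients low order first, starts from $R = X^{d-1}$, $Q = T(X)$, $\lambda = 0$, $\mu = \Lambda(X)$, and swaps the two register pairs whenever $\deg R < \deg Q$. The register name `Q`, the swap rule, and the function names are bookkeeping choices of this sketch. Each step cross multiplies by the two leading coefficients and subtracts, so no field inverse is ever computed, and the loop stops as soon as $\deg R < \deg \lambda$.

```python
P = 17  # prime field GF(17), assumed for this sketch

def trim(p):
    """Drop zero leading coefficients in place (low order first storage)."""
    while p and p[-1] % P == 0:
        p.pop()
    return p

def deg(p):
    return len(p) - 1            # the zero polynomial [] has degree -1

def cross_step(hi, lo, shift, a, b):
    """Return b*hi - a*X^shift*lo, using multiplications and subtractions only."""
    out = [(b * c) % P for c in hi]
    out += [0] * max(0, len(lo) + shift - len(out))
    for j, c in enumerate(lo):
        out[j + shift] = (out[j + shift] - a * c) % P
    return trim(out)

def modified_euclid(T, Lam, d):
    """Return (sigma, omega), each defined up to a common nonzero scale factor."""
    R = [0] * (d - 1) + [1]      # X^(d-1)
    Q = trim(list(T))
    lam, mu = [], trim(list(Lam))
    if deg(mu) > deg(Q):         # Step 4 shortcut: no errors, erasures only
        return mu, Q
    while deg(R) >= deg(lam):    # stop criterion deg(R) < deg(lambda)
        if deg(R) < deg(Q):      # keep the higher-degree polynomial in R
            R, Q = Q, R
            lam, mu = mu, lam
        a, b = R[-1], Q[-1]      # leading coefficients
        shift = deg(R) - deg(Q)
        R = cross_step(R, Q, shift, a, b)
        lam = cross_step(lam, mu, shift, a, b)
    return lam, R

# Hypothetical data for d = 5: one erasure at position 0 and one error at position 1,
# giving T(X) = 9X^3 + 13X^2 + 15X + 14 and Lambda(X) = X - 1 (= X + 16 mod 17).
sigma, omega = modified_euclid([14, 15, 13, 9], [16, 1], 5)
print(sigma, omega)   # [4, 5, 8] [12, 1]
```

In this run $\sigma(X) = 8X^2 + 5X + 4$ vanishes at $X = 1$ and $X = 9 = 2^{-1}$, the two errata locations, and it carries the erasure factor $X - 1$ without any separate polynomial multiplication.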
Example. Consider an RS (8,4) code over GF(17) with generator polynomial $g(X) = (X - 2)(X - 2^2)(X - 2^3)(X - 2^4)$.
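As a quick sanity check on the parameters of this example, the following sketch confirms that 2 has multiplicative order 8 in GF(17), so that it can play the role of $\alpha$ for a length-8 code, and expands $g(X)$ into coefficient form. The four consecutive roots $2, 2^2, 2^3, 2^4$ correspond to design distance $d = 5$. The helper names are illustrative only.

```python
P = 17          # field size from the example
ALPHA = 2       # base of the roots of g(X) in the example
N, K = 8, 4     # RS(8,4)

def poly_mul(a, b):
    """Product of two polynomials over GF(P), coefficients low order first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def poly_eval(p, x):
    acc = 0
    for c in reversed(p):        # Horner's rule
        acc = (acc * x + c) % P
    return acc

def multiplicative_order(a):
    k, x = 1, a % P
    while x != 1:
        x = (x * a) % P
        k += 1
    return k

def generator_poly():
    """g(X) = (X - alpha)(X - alpha^2)...(X - alpha^(N-K))."""
    g = [1]
    for k in range(1, N - K + 1):
        g = poly_mul(g, [(-pow(ALPHA, k, P)) % P, 1])
    return g

g = generator_poly()
print(multiplicative_order(ALPHA))   # 8
print(g)                             # [4, 9, 8, 4, 1], i.e. X^4 + 4X^3 + 8X^2 + 9X + 4
```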
Thus the errata evaluator is $\omega(X) = R_2(X) = 10X^2 - X + 2$ and the errata locator is $\sigma(X) = \lambda_2(X) = 9X^3 + 2X^2 + 2X + 8$.
The VLSI architecture of the pipeline RS decoder is shown in Fig. 1. The syndromes $S(X)$ are computed by a form of polynomial evaluation. The $\alpha^k$ generation block converts binary erasure location information to powers of $\alpha$ which are the roots of the erasure locator polynomial. The modified syndromes $T(X)$ and the erasure locator polynomial $\Lambda(X)$ can be computed by two polynomial multiplication circuits. By the use of a multiplexing and recursive technique, the modified Euclid's algorithm is implemented with a significant reduction of cells over a previous design [1]. The errata evaluator polynomial $\omega(X)$ and the errata locator polynomial $\sigma(X)$ are then evaluated using two polynomial evaluation circuits different from the one used for syndrome computation. The errata locations thus obtained direct the subtractions of the errata from the received messages to produce the decoded messages. In the following, the VLSI design of each functional block is described.
III. VLSI Implementation of the Syndrome Computation
The syndrome computation is an evaluation of a polynomial of length $N$ on $d - 1$ points.
Since $N > d - 1$, it is best to compute all syndromes simultaneously in the following manner as each $r_i$ is received:

$$S_k = (\cdots((r_{N-1}\alpha^k + r_{N-2})\alpha^k + r_{N-3})\alpha^k + \cdots)\alpha^k + r_0, \quad 1 \le k \le d - 1.$$

Note that $r_{N-1}$ is the first received symbol. Starting from the innermost parentheses, syndrome $S_k$ is gradually computed as the $r_i$ are received. After $r_0$ is entered, all $d - 1$ syndrome computations are completed at the same time. They are ready to be shifted out serially at that point. A systolic array design of a syndrome computation circuit is shown in Fig. 2.
IV. A VLSI Design for Polynomial Expansion
The erasure locator polynomial $\Lambda(X)$ is expanded from the initial condition 1, one root $\alpha^{-i}$ at a time. The modified syndrome polynomial

$$T(X) \equiv S(X)\Lambda(X) \bmod X^{d-1}$$

can also be computed in the same manner except $T(X)$ uses $S(X)$, instead of 1, as an initial condition. Therefore, a polynomial expansion circuit is developed to calculate $T(X)$ and $\Lambda(X)$.
Note that for an arbitrary $S(X)$, which may be 1,

$$S(X)(X - \alpha^{-i}) = X S(X) - \alpha^{-i} S(X).$$

This computation can be accomplished by a linear shift of $S(X)$, multiplication of every coefficient of $S(X)$ by $\alpha^{-i}$, and finite field additions. A systolic array is designed, as shown in Fig. 3, to implement such simple operations. The control signal "zero" ensures that the resultant polynomial would not be changed if $\alpha^{-i} = 0$.
V. A New Architecture to Perform the Modified Euclidean Algorithm
A systolic array was designed in [2] to compute the error locator polynomial by a modified Euclidean algorithm. The array required $2t$ cells, twice the number of correctable errors. It is capable of performing the modified Euclidean algorithm continuously.
In the modified Euclidean algorithm only one syndrome polynomial is computed in the time interval of one code word. As a consequence, the original architecture in [2] of a pipeline RS decoder is not as efficient as it might be. A substantial portion of the systolic array is always idling. This fact makes possible a more efficient design with fewer cells and no loss in the throughput rate. Figure 4 shows the new alternate architectural design. The input multiplexer directs the syndrome polynomials to different cells. Each processor cell is almost identical to the cell presented in [2], except that it is used to process data recursively.
The architecture of the new basic cell is given in Fig. 5. Compared with the previous systolic array design [2], the present scheme for multiplexing the recursive cell computations significantly reduces the number of cells and as a consequence the number of circuits. Table 1 shows that the cell reduction is greater for high rate codes.
VI. A VLSI Design of a Polynomial Evaluation Circuit
In RS decoding the errata locator polynomial $\sigma(X)$ must be evaluated at each $\alpha^{-i}$, as required by Step 5 of the decoding algorithm.
One last observation on the polynomial evaluation: the evaluation of $\sigma'(X)$ uses only the coefficients of $\sigma(X)$ with odd power terms. This property makes it possible to obtain the evaluation of $\sigma'(X)$ as a by-product from the evaluation of $\sigma(X)$ at no cost. As illustrated in Fig. 7, simply use two smaller exclusive-OR trees to sum the even terms and odd terms of $\sigma(X)$ separately. The summation of the odd terms yields $\sigma'(\alpha^{-i})$. Another exclusive-OR operation on the two partial sums results in $\sigma(\alpha^{-i})$ itself.
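The even/odd split can be tried out in software over a small binary extension field. The sketch below is an assumption-laden illustration rather than a model of the circuit in Fig. 7: it uses GF(16) built from the polynomial $x^4 + x + 1$ (chosen here for convenience), represents field elements as 4-bit integers, and stores coefficients low order first. In this sketch the odd-term sum equals $x\,\sigma'(x)$, so one factor of $x$ separates it from $\sigma'(x)$; where that factor is absorbed in the hardware is not modeled here.

```python
M = 4
PRIM = 0b10011      # x^4 + x + 1, assumed field polynomial for GF(16)

def gf_mul(a, b):
    """Multiply two GF(2^M) elements by shift-and-add with reduction by PRIM."""
    acc = 0
    while b:
        if b & 1:
            acc ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM
    return acc

def gf_pow(a, n):
    out = 1
    for _ in range(n):
        out = gf_mul(out, a)
    return out

def eval_split(sigma, x):
    """Return (even_sum, odd_sum): the two partial sums of sigma(x)."""
    even = odd = 0
    for k, c in enumerate(sigma):
        term = gf_mul(c, gf_pow(x, k))
        if k % 2:
            odd ^= term      # XOR tree over the odd power terms
        else:
            even ^= term     # XOR tree over the even power terms
    return even, odd

def formal_derivative(sigma):
    # In characteristic 2 the factor k vanishes for even k and is 1 for odd k.
    return [c if k % 2 else 0 for k, c in enumerate(sigma)][1:]

def poly_eval(p, x):
    acc = 0
    for c in reversed(p):    # Horner's rule; addition is XOR
        acc = gf_mul(acc, x) ^ c
    return acc

sigma = [7, 3, 0, 9, 5]      # arbitrary coefficients, for illustration only
for x in range(1, 16):
    even, odd = eval_split(sigma, x)
    assert even ^ odd == poly_eval(sigma, x)                            # one more XOR gives sigma(x)
    assert odd == gf_mul(x, poly_eval(formal_derivative(sigma), x))     # odd sum is x * sigma'(x)
```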
