# N86-22791 

# A Single Chip VLSI Reed-Solomon Decoder 

H. M. Shao, T. K. Truong, I. S. Hsu, and L. J. Deutsch<br>Communications Systems Research Section<br>I. S. Reed<br>University of Southern California


#### Abstract

A new VLSI design of a pipeline Reed-Solomon decoder is presented. The transform decoding technique used in a previous design is replaced by a time domain algorithm. A new architecture that implements such an algorithm permits efficient pipeline processing with minimum circuitry. A systolc array is also developed to perform erasure corrections in the new design. A modified form of Euclid's algorithm is implemented by a new architecture that maintains the throughput rate with less circuitry. Such improvements result in both enhanced capability and a significant reduction in silicon area, therefore making it possible to build a pipeline (31,15) RS decoder on a single VLSI chip


## I. Introduction

Recently Brent and Kung (Ref. 1) suggested a systolic array architecture to compute the greatest common divisor (gcd) of two polynomials. Based on this idea a VLSI design of a plpelne Reed-Solomon decoder was developed (Ref. 2). The syndrome computation of this decoder for a 4-bit $(15,9)$ RS code was implemented on a chip (Ref. 3).

In the design of the chip for the above-mentioned decoder, three major problems arose:
(1) While the architecture for syndrome computation took ( $N-I$ ) cells for an ( $N, I$ ) RS code, it required $N$ identical cells to implement the inverse transform in the architecture suggested in Ref. 2. As a consequence for a long code such as the $(255,223)$ RS code, the inverse transform circuit would need 255 cells and be quite large.
(2) The basic cell of the systolic array needed to perform a modified form of Euclid's algorithm occupied considerable silicon area, approximately 60 times the size of a syndrome computing cell. Since the decoding algorithm in Ref. 2 required ( $N-I$ ) of such cells, the entre systolic array needed much more sllicon area than desired.
(3) Erasure corrections became necessary and were not included in the orignal design. Hence the decoder required several modifications of the orignal architecture design in Ref. 2.

To reduce the large circuit area required by the inverse transform operation it was decided to modify the onginal transform decoding algorithm. Also after considering the need for erasure correction, it was found that the decoding algorthm given in Ref. 4 could accommodate both requirements.

In this algorithm the errata magnitudes are calculated in the time domain and a Chien search is used to find the error locations. The architecture of the new algorithm is designed to operate sequentally in a pipeline, thereby enabling the circuif. șize to grow with the error correctıng capability $(N-I)$ instead of the code length $N$.

The systolic array designed origınally for the modified form of Euclid's algorthm could process polynomials continuously (Ref. 2). However, in real-tıme RS decoding, there is a need to compute only one syndrome polynomial for each received codeword. If one takes advantage of this by a better utilization of multiplexing, the required pipeline throughput rate can be maintained by the use of fewer basic cells.

In this article, an improved VLSI architecture over that in Ref. 2 is developed utilizing the above observations. A systolic array is also designed for the needed polynomial expansion used in the erasure polynomial computation. These new modifications result in both an enhanced capability and a significant reduction in slicon area without any loss in the pipeline throughput rate.

## II. The Decoder Architecture

Let $N=2^{m}-1$ the length of the ( $N, I$ ) RS code over $G F\left(2^{m}\right)$ with design distance $d$. Suppose that $t$ errors and $s$ erasures occur, and $s+2 t<d-1$. The decoding procedure in Ref. 4 is summarized as follows

Let $X_{i}$ be an error location or an erasure location and $\Lambda=\left\{X_{i} \mid X_{i}\right.$ is an erasure location $\}, \lambda=\left\{X_{i} \mid X_{i}\right.$ is an error location\}. Let $Y_{l}$ be the corresponding errata magnitude and $r=\left(r_{0}, r_{1}, \ldots, r_{N-1}\right)$ be the recenved vector.

Step 1 Compute the syndrome polynomial

$$
S(Z)=\sum_{k=1}^{\infty} S_{k} Z^{-k}
$$

where

$$
\begin{align*}
S_{k} & =\sum_{n=0}^{N-1} r_{n} \alpha^{n k} \\
& =\sum_{i=1}^{S+t} Y_{i} X_{i}^{k} \tag{1}
\end{align*}
$$

for $1 \leqslant k \leqslant d-1$

The derivative $P^{\prime}(Z)$ of $P(Z)$ is obtained by dropping the even terms of $P(Z)$. The errata magnitudes $Y_{k}$ are calculated then by a field inversion and a number of multiplications Next the error locations are obtained in the form of a binary sequence by the use of another polynomial evaluation circuit which performs the Chien search on $\lambda(Z)$. This sequence of error locations, together with the input erasure location binary sequence, directs the addition of $Y_{k}$ to the received message.

## III. A VLSI Design for Expanding the Erasure Locator Polynomial

It is reasonable to assume that the erasure location informatoon derived from outside the chip, possibly from a convolutoonal decoder. Let it arrive serially in the form of 1 's and 0's. A simple circuit of the form shown in Fig. 2(a) first converts this erasure data into a sequence of $\alpha^{k}$ 's and 0 's, where $\alpha^{k} \in \Lambda$.

Given $\alpha^{k} \in \Lambda$, the computation of the erasure polynomial demands the expansion of

$$
\begin{align*}
\Lambda(Z) & =\prod_{a^{k} \in \Lambda}\left(Z-\alpha^{k}\right) \\
& =\left(Z-\alpha^{k_{1}}\right)\left(Z-\alpha^{k_{2}}\right) \ldots\left(Z-\alpha^{k_{S}}\right) \tag{6}
\end{align*}
$$

Note that for an arbitrary polynomial $Q(Z)$ that

$$
\begin{equation*}
Q(Z)\left(Z-\alpha^{k}\right)=Z Q(Z)-\alpha^{k} Q(Z) \tag{7}
\end{equation*}
$$

Such an operation involves polynomial shifts, scalar multıplications and additions Thus the multiplications of ( $Z-\alpha^{k}$ ) in Eq. (6) can be implemented by the systolic array given in Fig. 2(b). Since it contains zeros as well as $\alpha^{k}$ 's, the input stream is used to control the updating of the latches in each basic cell. At the end of the arrivals of the erasure locations, the coefficients of $\Lambda(Z)$ are loaded from the latches into registers and shifted out serially.

## IV. A New Architecture to Perform the Modified Euclidean Algorithm

A systolic array was designed in Ref. 2 to compute the error locator polynomial by a modified Euclidean algorithm. The array required $2 t$ cells, twice the number of correctable errors. It is capable of performing the modified Euclidean algonthm continuously

In the modified Euclidean algorithm only one syndrome polynomial is computed in the time interval of one code word. As a consequence, for the original architecture in Ref 2, a pipelıne RS decoder is not as efficient as it might be. A substantial portion of the systolic array is always idling This fact makes possible a more efficient design with fewer cells and no loss in the throughput rate.

For the ( $N, I$ ) RS code, the length of the syndrome polynomial is $N-I$. The maximum length of the resultant Forney syndrome polynomial is also $N-I$. Imagine now that a single cell is used recursively to perform the successive steps of the modified Euclidean algorithm instead of pipelining data to the next cell. Then it would take $N-I$ recursions to complete the algorithm, where each recursion requires $N-I$ symbol times. Therefore, using a single cell recursively requires only a total of ( $N-I)^{2}$ symbol time to complete the modified form of Euclidean algorithm. Since a syndrome polynomial needs to arrive every $N$ symbol tımes, only $\left\lfloor(N-I)^{2} / N\right\rfloor$ cells are needed to process successive syndrome polynomials at a full pipeline throughput rate.

Figure 3 shows the new alternate architecture design. The input multıplexer directs the syndrome polynomials to different cells. Each processor cell is almost identical to the cell presented in Ref.2, except that it is used to process data recursively.

The primary difference in the new cell structure from the architecture of the previous cell (Ref.2) is presented as follows: Since division is avoided in the modified form of Euclid's algorithm, a scalar factor appears at the output Although such a scale factor, call it $K$, is irrelevant to the problem of finding roots of the error locator polynomial $\lambda(Z)$, it must be removed from the errata evaluator polynomial $A(Z)$. In order to effectively utilize the processor cell given in Ref. 2, the factor $K$ which appears at the output of each cell is calculated independently of the cell computation This is accomplished by using a multiplier, operating recursively, to accumulate the product of all the nonzero leading coefficients of the divisor polynomials. An inverse computation circuit and a multiplier after the demultiplexer is used to remove the unwanted scalar $K$ from $K A(Z)$. This computational process is illustrated in Fig. 3

The architecture of the new basic cell is given in Fig 4. Compared with the previous systolic array design (Ref 2), the present scheme for multiplexing the recursive cell computations significantly reduces the number of cells and as a consequence the number of crrcuits Table 1 shows that the cell reduction is greater for high rate codes.

## V. A Polynomial Evaluation Pipeline

Polynomials are evaluated not only in the Chien search process, but also when the errata magnitudes are computed. In RS decoding, one needs to evaluate

$$
\begin{equation*}
A(Z)=\sum_{i=0}^{S+t-1} A_{i} Z^{\prime} \tag{8}
\end{equation*}
$$

for $Z=\alpha^{k}$ and $0 \leqslant k \leqslant N-1$ given $A_{i}, 0 \leqslant i \leqslant s+t-1$ Note that Eq (8) has a form which is identical to the syndrome computation Eq (1).

However, in Eq. (8) the polynomial is shorter than in Eq. (1). Also since $N>s+t-1$, Eq. (8) is evaluated over a wider range than Eq. (1) is computed. These two differences make it inefficient to implement Eq. (8) in a manner similar to that used for syndrome computations. A better method is to evaluate $A_{i}\left(\alpha^{I}\right)^{k}$ sequentially for each $k$ at cell $i$. This is illustrated in Fig. 5. The polynomial coefficient $A_{i}$ is multiplied by $\alpha^{i}$ at the initialization of cell $i$. From then on a feedback loop computes the quantities $A_{i}\left(\alpha^{i}\right)^{k}$ for $k=1,2,3, \ldots, N-1$. The summation shown at the bottom of the figure is implemented quite simply since all quantities are binary.

## VI. Conclusion

An improved VLSI architecture of a pipeline Reed-Solomon decoder is presented herein Compared with the previous design in Ref 2, this architecture not only now corrects erasures, it is simpler, more regular, smaller in chip area and operates equally as fast. It is estimated that the polynomial expansion circuit and the polynomial multiplication circuit need approximately the same number of transistors as the syndrome computing pipeline. On the other hand, each polynomial evaluation circuit takes about half the number of transistors. Finally, each cell in the modified form of Euclid's algonithm circuit requires approximately the same chip area as the syndrome circuit.

Based on a previous nMOS chip fabrication of the syndrome pipeline (Ref. 3) and the design of the basic cell of the modified form of Euclid's algorithm (Ref. 6), it is estimated that a $(15,9)$ RS decoder chip would require about 29 thousand transistors. A $(31,15)$ RS decoder would require about 88 thousand transistors. Considering the presently existing VLSI technology, a high throughput 5-bit $(31,15)$ RS decoder could be implemented readily on a single VLSI chip. Of course such a chip would have a possible immediate application to JTIDS (for Joint Tactical Information Distribution System of DoD ).

## References

1. Brent, R.P., and Kung, H. T., "Systolic VLSI arrays for polynomial GCD computations," Dep. Computer Science, Carnegie-Mellon Univ., Pittsburgh, PA, Rep., 1982.
2. Shao, H. M., Truong, T. K., Deutsch, L. J., Yuen, J. H., and Reed, I. S., "A VLSI Design of a Pıpeline Reed-Solomon Decoder," IEEE Trans. on Computer, Vol. C-34, No. 5, May 1985, pp. 393-403.
3. Shao, H. M., "A VLSI Syndrome Computing chip for Reed-Solomon Decoding," to be published in TDA Progress Report 42-85, Jet Propulsion Laboratory, Pasadena, Calif.
4. Reed, I. S., Truong, T. K. and Miller, R. L., "Decoding of B. C. H. and R.S codes with errors and erasures using contınued fractions," Electronic Letters, August 1979, Vol 15, No. 17, pp. 542-544.
5. Peterson and Weldon, Error-Correcting Codes, 2nd Edition, the MIT Press, Cambridge, pp. 172-173, 1972.
6. Hsu, I. S. and Shao, H M., "A VLSI chip for the implementation of the modified Euclid's Algorithm," to be published in TDA Progress Report 42-85, Jet Propulsion Laboratory, Pasadena, Calif.

Table 1. The comparison of the number of cells required in the modified Euclid's algorithm computation

| RS Code | Full Systolc Array | Multuplexing on <br> Recursive Calls |
| :---: | :---: | :---: |
| $(15,9)$ | 6 | 3 |
| $(31,15)$ | 16 | 9 |
| $(255,223)$ | 32 | 5 |



Fig. 1. VLSI architecture of a pipeline Reed-Solomon decoder for both errors and erasures correction


Fig. 2. A systolic array to expand a polynomial


Fig. 3. The new architecture to perform the modified form of Euclid's algorithm where $n=\left\lfloor(N-I)^{2 / N}\right\rfloor$


Fig. 4. The structure of a recursive cell that performs the modified Euclid's algorithm


Fig. 5. A polynomial evaluation pipeline circuit

