Abstract: A new cellular structure for a versatile Reed-Solomon (RS) decoder is introduced, based on the time-domain decoding algorithm. The time-domain decoding algorithm is restructured to be suitable for introducing the cellular structure. The main advantages of this structure are its versatility and its very simple cellular organization. By a versatile decoder we mean a decoder that can be programmed to decode any (n, k) RS code defined in the Galois field GF(2^m) with a fixed block length n and a fixed symbol size m. This decoder can correct both errors and erasures for any message length k. The introduced decoder is cellular and has a very simple structure, and hence it is suitable for VLSI designs.
INTRODUCTION
Reed-Solomon (RS) codes are used in both communications and data storage systems. Every particular application has its own distinct requirements, usually satisfied by its own individual hardware design. Recently, two structures for versatile RS decoders were developed, one based on algebraic decoding [1], [2] and the other based on transform decoding [3]. These structures can decode a large range of RS codes, but they are not cellular and are very complex to design. Other structures have also been presented [4], [5], [6], [7], [8], based on algebraic or transform decoding algorithms, which are suitable for RS decoders with a specific error correction capability. These structures can be used to design RS decoders on a single VLSI chip with low complexity and high throughput. However, they are not efficient for designing versatile RS decoders with programmable error correction capability.
In this paper, our main objective is to introduce a very simple cellular structure for versatile RS decoders. The introduced structure is based on the time-domain decoding algorithm [9], [10]. The time-domain decoding algorithm can be used to design both noncellular [11] and cellular versatile decoders.
In this paper, the existing time-domain algorithm is first restructured to be suitable for introducing the cellular RS decoder. Then the cellular structure is explained in detail. The cellular decoder is versatile and can be used to decode any (n, k) RS code defined in GF(2^m) with a fixed block length n and a fixed symbol size m. This decoder can correct both errors and erasures, and it can be programmed for the message size k or, equivalently, for the error and erasure correction capability. The complexity and throughput of the introduced decoder are also explained. It is shown that the versatile decoder has a very simple cellular structure and can easily be built on a single VLSI chip.
DECODING ALGORITHM
Let GF(2^m) be the finite field of 2^m elements. Also, let n = 2^m - 1 be the length of the (n, k) RS code over GF(2^m) with minimum distance d = n - k + 1, where k denotes the number of m-bit message symbols. This code is capable of correcting P errors and ρ erasures as long as 2P + ρ ≤ n - k. The time-domain decoding algorithm was first presented by Blahut [9]. To design a cellular structure for the Reed-Solomon decoder, some modifications and restructurings should be introduced in the time-domain decoding algorithm. These modifications mainly reduce the complexity of the decoder. As shown in [9], there are three steps in the time-domain decoding algorithm for finding the errata locator and errata value vectors. The operations in these three steps are different, but there are some similarities, and altogether n iterations are required to perform the decoding. Our main objective is to combine these three steps and obtain an algorithm which has n iterations with the same operations in each iteration.
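The correction condition above can be checked numerically. The following is a minimal sketch; the function name `can_correct` and its argument names are illustrative and do not appear in the paper.

```python
def can_correct(n: int, k: int, errors: int, erasures: int) -> bool:
    """Return True if 2*errors + erasures <= n - k for an (n, k) RS code."""
    if not (0 < k <= n):
        raise ValueError("require 0 < k <= n")
    if errors < 0 or erasures < 0:
        raise ValueError("error and erasure counts must be non-negative")
    return 2 * errors + erasures <= n - k


# Example with m = 8, n = 2**8 - 1 = 255 and k = 223 (n - k = 32).
print(can_correct(255, 223, errors=16, erasures=0))   # 2*16 + 0  = 32
print(can_correct(255, 223, errors=10, erasures=12))  # 2*10 + 12 = 32
print(can_correct(255, 223, errors=10, erasures=13))  # 2*10 + 13 = 33
```

Because each erasure costs half as much as an error in this bound, a programmable k trades message length directly against the mix of errors and erasures that can be handled.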
The first step, which has ρ iterations, is the time-domain erasure locator vector calculation. In this step, there is only one vector, the erasure locator vector, which is updated for ρ iterations.
In the second step, the Berlekamp-Massey algorithm is performed to find the errata locator vector λ. In this step, the vector λ is initialized with the result of the first step and updated in each iteration; after n - k - ρ iterations, λ = λ^{(n-k)} is the errata locator vector.
In the third step, which has k iterations, the errata value vector is calculated. In this step, the vector λ is fixed and has the final value of the second step of the algorithm. But this time the value of another vector s is initialized with the received noisy vector v and updated in each iteration to correct v in the nth iteration.
To combine these three steps, two control variables are added to the algorithm to differentiate between the three steps of the time-domain decoding algorithm. These variables, σ and β, are both equal to one in the first ρ iterations. In iterations ρ + 1, ..., n - k, σ is zero and β is one. In the last k iterations, both σ and β are zero.
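The schedule of the two control variables can be sketched as a small function of the iteration number. This is only an illustration of the three phases described above; the function name `control_variables` is hypothetical, and ρ (here `rho`) denotes the number of erasures.

```python
from typing import Tuple


def control_variables(r: int, n: int, k: int, rho: int) -> Tuple[int, int]:
    """Return (sigma, beta) for iteration r, with r running from 1 to n."""
    if not (1 <= r <= n):
        raise ValueError("iteration number r must satisfy 1 <= r <= n")
    if not (0 <= rho <= n - k):
        raise ValueError("number of erasures must satisfy 0 <= rho <= n - k")
    if r <= rho:
        return (1, 1)   # erasure locator phase: first rho iterations
    if r <= n - k:
        return (0, 1)   # Berlekamp-Massey phase: iterations rho+1 .. n-k
    return (0, 0)       # errata value phase: last k iterations


# Example: a (15, 9) code with 2 erasures.
schedule = [control_variables(r, n=15, k=9, rho=2) for r in range(1, 16)]
print(schedule.count((1, 1)), schedule.count((0, 1)), schedule.count((0, 0)))
```

For this example the three phases last 2, 4, and 9 iterations, which sum to n = 15.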
The restructured time-domain decoding algorithm, which has the same operations in all n iterations, is as follows. Let v be the received noisy Reed-Solomon code word with erasures at locations j_r, r = 1, ..., ρ. The following set of recursive equations can be used to compute e_i^{(n)} for i = 0, 1, ..., n - 1:
In this algorithm, one modification is also provided which decreases the complexity of the decoder. As shown above in the re[...] is altered by a factor equal to α^{in}, but this equals one since α is the primitive element of GF(2^m) and has order n. Therefore, this algorithm has the same result as the original time-domain decoding algorithm. The detailed proof of the modified algorithm is given in [12].
The flow diagram of the restructured time-domain decoding algorithm is shown in Fig. 1. In this algorithm, the initialization is performed first. The iterations start by incrementing the iteration counter r and calculating the discrepancy ∆. Then, the control variables β and σ are found based on the iteration counter r, and the value of ∆ is updated. In the next section, this restructured algorithm is used to introduce a cellular structure for Reed-Solomon decoders. The part of the algorithm inside the dashed box forms the control unit of the decoder. Note that the evaluation of ∆ and of the variables λ_i, b_i, e_i, and s_i does not depend explicitly on the value of the iteration counter r. This point helps in introducing a simple structure for the cells of the Reed-Solomon decoder.
0018-9340/97$10.00 ©1997 IEEE
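The statement that the extra factor equals one rests on a standard fact: a primitive element of GF(2^m) has multiplicative order n = 2^m - 1, so raising it to the power i·n gives 1 for every i. The sketch below checks this for m = 4. The field polynomial x^4 + x + 1 and the helper names `gf_mul` and `gf_pow` are choices made for this illustration, not details taken from the paper.

```python
def gf_mul(a: int, b: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply a and b in GF(2^m), reducing by the given field polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly            # reduce when the degree reaches m
    return result


def gf_pow(a: int, e: int, m: int = 4, poly: int = 0b10011) -> int:
    """Raise a to the non-negative integer power e by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a, m, poly)
    return result


m = 4
n = 2**m - 1                      # n = 15
alpha = 0b0010                    # the element x, primitive for x^4 + x + 1

# Multiplicative order of alpha: smallest e >= 1 with alpha^e = 1.
order = next(e for e in range(1, n + 1) if gf_pow(alpha, e) == 1)
print(order == n)

# alpha^(i*n) = 1 for every cell index i = 0 .. n-1.
print(all(gf_pow(alpha, i * n) == 1 for i in range(n)))
```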
CELLULAR STRUCTURE
In this section, a cellular structure for Reed-Solomon decoders is introduced. This structure is based on the restructured time-domain decoding algorithm shown in Fig. 1. The introduced decoder can decode any (n, k) RS code with fixed code word length n = 2^m - 1 and programmable message length k. The structure of the decoder is shown in Fig. 2. This decoder consists of three sets of n identical cells, a Decoding-Control cell, and an Exponentiation cell. The Input/Output (I/O) cells receive the input to the RS decoder symbol by symbol with a symbol rate of Clk-In and provide each received component v_i to the ith Decoding cell. The Decoding cells evaluate the errata value vector and apply the ith component of this vector, e_i, to the ith I/O cell for all i. Then the ith I/O cell corrects the received symbol v_i by adding the errata symbol e_i. After correction, the message part of the corrected code vector is sent to the output of the decoder with a symbol rate of Clk-Out.
As the received vector is being stored in the I/O cells, the erasure information ER is passed through the Exponentiation cell and stored in the Erasure cells. After the whole block of the received vector has been received, the number of erasures ρ and the values of α^{j_r}, r = 1, ..., ρ, are available to the Decoding-Control cell. The Decoding cells are responsible for all the functions outside the dashed box of Fig. 1. These cells calculate the components of the errata value vector in n iterations using Clk-In. The Decoding-Control circuitry controls the function of the Decoding cells in each iteration based on the erasure information, the discrepancy ∆_0, and the number of information symbols k.
Input/Output Cell
To explain the function of the I/O cells, let T be the time interval for receiving one block of data (n symbols), and assume that the decoder receives the symbol v_{n-1} first and the symbol v_0 last. The output of the decoder is in the same order, which means that d_{k-1} = c_{n-1} is transmitted first and d_0 = c_{n-k} last. Moreover, we have considered a systematic RS(n, k) code, and the information symbols of v_i and c_i are in locations with indices i = n - 1, n - 2, ..., n - k, while the parity symbols are in locations i = n - k - 1, n - k - 2, ..., 0.
During each time interval T, three functions are performed in parallel. The first function is receiving a data block, v, symbol by symbol through the in_i registers and shifting the data towards the rightmost I/O cell with the Clk-In clock. The second function is decoding the previously received data block available in the dec_i registers. The third function is transmitting the decoded data block available in the out_i registers by shifting the data block symbol by symbol towards the rightmost I/O cell with the Clk-Out clock.
In Fig. 3, η is a control variable which is equal to 1 after receiving the last symbol of each data block and before receiving the first symbol of the next data block, and is equal to zero otherwise. Therefore, at the end of each time interval T, η is 1, the contents of the dec_i registers are stored in the out_i registers, and the contents of the in_i registers are stored in the dec_i registers.
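The three register banks behave like a three-stage block pipeline: while one block is being received, the previous one is being decoded and the one before that is being transmitted. The sketch below models only the Clk-In shift and the block hand-over when η = 1. The class name `IOCells`, the list layout (index 0 taken as the leftmost cell), and the omission of the decoding and Clk-Out shifting are all simplifying assumptions made for this illustration.

```python
from typing import List


class IOCells:
    """Behavioural model of the in/dec/out register banks of n I/O cells."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.in_regs: List[int] = [0] * n    # block currently being received
        self.dec_regs: List[int] = [0] * n   # block currently being decoded
        self.out_regs: List[int] = [0] * n   # block currently being transmitted

    def clk_in(self, symbol: int) -> None:
        """Shift one received symbol in; existing data moves one cell right."""
        self.in_regs = [symbol] + self.in_regs[:-1]

    def end_of_block(self) -> None:
        """Hand-over performed when eta = 1: dec -> out, then in -> dec."""
        self.out_regs = list(self.dec_regs)
        self.dec_regs = list(self.in_regs)


io = IOCells(4)
for v in [13, 12, 11, 10]:        # received in the order v_3, v_2, v_1, v_0
    io.clk_in(v)
io.end_of_block()
print(io.dec_regs)                # cell i now holds v_i

for v in [23, 22, 21, 20]:        # next block arrives during decoding
    io.clk_in(v)
io.end_of_block()
print(io.out_regs, io.dec_regs)
```

With this layout, the symbol received first (v_{n-1}) ends up in the rightmost cell, so after a full block cell i holds v_i.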
Erasure and Exponentiation Cells
There is an input, ER, to the decoder which gives the erasure information about each received symbol. If the received symbol v_i is an erasure, then we need to store [...] registers. In the next time interval T, the eout_i registers are shifted to the left by Clk-In to be used by the Decoding-Control cell. During the same time interval, the ein_i registers store the erasure information of the next data block. In Fig. 4, the erasure counter is responsible for finding the number of erasures in each data block and storing it in the m-bit register R_ρ. The output of the R_ρ register is available to the Decoding-Control cell.
Decoding-Control Cell
The Decoding-Control cell is shown in Fig. 6. The algorithm of this unit is based on the dashed box in Fig. 1. The decoder has a Reset input which becomes high to enable the RS decoder. This input is applied only to the iteration counter and is not propagated to any other cell of the decoder. The iteration counter, which is a divide-by-n counter, is activated by Clk-In. The output of this counter, r, indicates the iteration number of the decoding process. A comparator evaluates the iteration controls η, σ, and β based on the iteration number r, the message length k, and the number of erasures in each data block, ρ. The value of η is 1 in iteration r = 0 and zero otherwise. The values of σ and β are as shown in Fig. 1.
There are two other signals for controlling the decoding process of the Decoding cells, which are the m-bit discrepancy ∆ and the 1-bit variable δ. In iterations r = 1, 2, ..., ρ, the erasure information α^{j_r} is directed to the ∆ output. In the rest of the iterations, the value of ∆ is equal to the received value of the discrepancy ∆_0, which is calculated in the Decoding cells. The control variable δ is evaluated as shown in Fig. 1. This value is found based on the iteration number r, the number of erasures ρ, the discrepancy ∆_0, and the value of the temporary variable L. The value of L is stored in an m-bit register which is initially set to zero and updated in each iteration based on (2).
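The selection of the ∆ output can be pictured as a two-way multiplexer driven by the iteration number. The sketch below covers only that selection; the update of δ and L is not modelled because it depends on Fig. 1 and equation (2). The function name `select_delta` and the list `erasure_locators` (assumed to hold the erasure information for r = 1, 2, ... in order) are hypothetical.

```python
from typing import Sequence


def select_delta(r: int, rho: int, erasure_locators: Sequence[int], delta0: int) -> int:
    """Return the value driven on the Delta output in iteration r.

    For r = 1 .. rho the r-th stored erasure value is passed through;
    in all later iterations the discrepancy delta0 from the Decoding
    cells is passed through instead.
    """
    if r < 1:
        raise ValueError("iterations are numbered from 1")
    if rho > len(erasure_locators):
        raise ValueError("fewer stored erasure values than rho")
    if r <= rho:
        return erasure_locators[r - 1]
    return delta0


# Example with two erasures whose stored field elements are 0b0100 and 0b0011.
stored = [0b0100, 0b0011]
print([select_delta(r, 2, stored, delta0=0b1001) for r in range(1, 5)])
```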
Decoding Cell
The structure of the Decoding cell is based on the operations outside the dashed box in Fig. 1. The detailed structure of the Decoding cell is given in Fig. 7. This cell has three sets of m-bit registers for storing the values of s_i, λ_i, and b_i after each iteration. The m-bit symbol v_i is the input to the cell, and the m-bit symbol e_i is the component of the errata value vector which is calculated at the end of the decoding process. In each iteration, the ith cell evaluates the partial value of the discrepancy, ∆_i, and propagates it to the next cell. This partial discrepancy is calculated as,
based on the partial discrepancy coming from the previous cell, ∆_{i+1}. As shown in Fig. 2, the partial discrepancy ∆_n is fixed to zero, and this forces the output of the left Decoding cell in Fig. 2 to have the value of the discrepancy ∆_0, which is an input to the Decoding-Control unit.
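The cell-to-cell propagation of the partial discrepancy can be illustrated with a short simulation. The sketch assumes that each cell adds the product of its own λ_i and s_i to the incoming value, that is, ∆_i = ∆_{i+1} + λ_i·s_i over GF(2^m) with ∆_n = 0. This recurrence is an assumption made for the illustration; the paper's exact cell equation is not reproduced here. The field polynomial x^4 + x + 1 and the function names are likewise illustrative.

```python
from typing import Sequence


def gf_mul(a: int, b: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply a and b in GF(2^m), reducing by the given field polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly
    return result


def chain_discrepancy(lam: Sequence[int], s: Sequence[int]) -> int:
    """Propagate the partial discrepancy from cell n-1 down to cell 0.

    Assumed recurrence: delta_i = delta_{i+1} XOR lam[i]*s[i], delta_n = 0.
    The value returned is delta_0, the input to the control unit.
    """
    if len(lam) != len(s):
        raise ValueError("lam and s must have the same length")
    delta = 0                                  # delta_n is fixed to zero
    for i in range(len(lam) - 1, -1, -1):      # cell n-1 first, cell 0 last
        delta ^= gf_mul(lam[i], s[i])          # addition in GF(2^m) is XOR
    return delta


# Three-cell example: 1*3 = 3, 2*4 = 8, 0*5 = 0, and 3 XOR 8 = 11.
print(chain_discrepancy([1, 2, 0], [3, 4, 5]))
```

Since the chain passes through all n cells in one iteration, its length is one reason the propagation delay per iteration matters for throughput.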
In each iteration, the Decoding-Control cell updates the discrepancy ∆_0 and feeds it back to the cells. The Decoding-Control cell also calculates the control variables β, σ, and δ.
COMPLEXITY AND THROUGHPUT
The introduced cellular structure is for a versatile error-and-erasures RS decoder with programmable code parameters n, k, and t. Such a versatile decoder can be used in different applications with different requirements on error correction capabilities. The length of the codeword, n, and the number of information symbols, k, can be varied from 0 to 2^m - 1. The main objective of this cellular structure is to provide VLSI designers with a simple and time-efficient development process. The introduced cellular structure is not optimized for any specific set of values of the code parameters. Therefore, it is neither easy nor fair to compare the complexity and throughput of this structure with those of decoders or structures designed for fixed and specific coding parameters [4], [5], [6], [7], [8]. In this section, the complexity and throughput of the introduced cellular structure are examined for various values of m.
Complexity
As shown in the previous section, the main building blocks of the introduced cellular structure are the Galois field multiplier and the m-bit register. To discuss the complexity, let us define the RS-Decoder cell as the collection of one Erasure cell (Fig. 5), one I/O cell (Fig. 3), and one Decoding cell (Fig. 7). The Decoder-Control cell is the combination of the Decoding-Control cell (Fig. 6) and the Exponentiation cell (Fig. 4). Therefore, the cellular structure consists of n = 2^m - 1 RS-Decoder cells and one Decoder-Control cell.
Each RS-Decoder cell consists of five Galois field multipliers and eight m-bit registers. There are also four m-bit exclusive-OR (XOR) gates, four m-bit switches (multiplexers), four m-bit simple switches, and two 1-bit switches. By a simple switch we mean a switch in which one of the inputs is constant. Moreover, because of fan-out limits, all the control signals must pass through a delay gate, and therefore 2m + 4 extra gates are needed for this purpose.
The m-bit registers are very simple, and each of them is a D flip-flop with one D input, one clock input, and one Q output. These registers do not need clear or set controls. The XOR gates, switches, and delay gates of an RS-Decoder cell are altogether equivalent to five m-bit registers from the gate count point of view. Therefore, an RS-Decoder cell requires the equivalent of 13 m-bit registers and five m-bit Galois field multipliers. The complexity of the decoder in terms of the number of multipliers and the number of m-bit registers is given in Table 1. As shown in the table, a (255, k) RS decoder with a programmable number of message symbols k requires 340,500 gates, which can fit on a single VLSI chip. For the calculation of the total number of gates, a standard basis multiplier and inverter are considered. The reason for this choice is that the design has many multipliers and only one inverter, and the multiplier in standard basis has the least complexity compared to other multipliers [13]. For the inverter, a combinational logic circuit or a table look-up ROM can be used.
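The per-cell figures scale linearly with the number of cells, n = 2^m - 1. The sketch below tallies only the RS-Decoder cells (five multipliers and the equivalent of 13 m-bit registers each); it leaves out the single Decoder-Control cell and does not convert to a gate count, since the per-multiplier gate cost is not restated here. The function name `decoder_resources` is hypothetical.

```python
from typing import Dict


def decoder_resources(m: int) -> Dict[str, int]:
    """Tally multipliers and m-bit register equivalents over all RS-Decoder cells."""
    if m < 2:
        raise ValueError("symbol size m must be at least 2")
    n = 2**m - 1                        # number of RS-Decoder cells
    return {
        "cells": n,
        "multipliers": 5 * n,           # five GF(2^m) multipliers per cell
        "register_equivalents": 13 * n  # 8 registers + 5 for XORs/switches/delays
    }


for m in (4, 6, 8):
    print(m, decoder_resources(m))
```

For m = 8 this gives 255 cells, 1,275 multipliers, and 3,315 m-bit register equivalents.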
Throughput
To discuss the throughput of the introduced decoder, the maximum propagation delay path for each iteration should be found. This path has the following three major components:
The second delay component can also be reduced using the same kind of idea. In this case, instead of passing the control signals from one Decoding cell to another with one gate delay, we introduce a fan-out circuitry. That is, each bit of the control signals is applied to the cells in such a way that one control bit supplies eight of the Decoding cells. Therefore, the delay of this circuitry is only m - 2 gate delays.
The total delay for one iteration in this case is equal to the delay of 5m + 8 gates, and the maximum bit rate at the input of the decoder is [...], where [...] is the delay of one gate. In this case, the maximum bit rate of the decoder, given in Table 2, is much higher. Note that this increase in the maximum bit rate decreases the number of gates of the decoder, but the design of the decoder will be more difficult.
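A rough estimate of the achievable input rate follows from the iteration delay. The sketch assumes that one decoding iteration is performed per input symbol period (the Decoding cells run n iterations on Clk-In while n symbols are received), so the symbol rate is bounded by 1 / ((5m + 8)·τ) and the bit rate by m / ((5m + 8)·τ), where τ is the delay of one gate. This relation is a reconstruction under that assumption, not a formula quoted from the paper, and the 1 ns gate delay in the example is a hypothetical value.

```python
def max_bit_rate(m: int, gate_delay_s: float) -> float:
    """Estimate the maximum input bit rate in bits per second.

    Assumes one iteration per received m-bit symbol and an iteration
    delay of (5m + 8) gate delays.
    """
    if m < 1 or gate_delay_s <= 0:
        raise ValueError("m must be positive and the gate delay must be > 0")
    iteration_delay_s = (5 * m + 8) * gate_delay_s
    return m / iteration_delay_s


# Hypothetical 1 ns gate delay, for illustration only.
for m in (4, 6, 8):
    print(m, 5 * m + 8, round(max_bit_rate(m, 1e-9) / 1e6, 1), "Mbit/s")
```

Under these assumptions the iteration delay grows linearly in m while each symbol carries m bits, so the estimated bit rate approaches 1/(5τ) from below as m increases.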
CONCLUSION
In this paper, a new cellular structure for a versatile Reed-Solomon decoder was presented. The time-domain decoding algorithm was restructured to be suitable for introducing the cellular structure. The structure of the cellular decoder is such that different RS codes defined in GF(2^m) can be programmed to correct errors and erasures for a fixed block length n and a fixed symbol size m by changing the message length k = 1, 2, ..., 255. From the point of view of versatility, the introduced cellular decoder is superior to all other RS decoder structures.
