Abstract
Reed-Solomon (RS) codes play an important role in providing error protection and data integrity. Among the various RS decoding algorithms, the Peterson-Gorenstein-Zierler (PGZ) algorithm generally has the lowest computational complexity for small t values. However, unlike the iterative approaches (e.g., the Berlekamp-Massey algorithm), it encounters divide-by-zero problems when a single circuit must handle multiple t values. In this paper, we propose a multi-mode hardware architecture for error numbers ranging from zero to three. We first propose a cost-down technique to reduce the hardware complexity of a t=3 decoder. Then, we perform an algorithmic-level derivation to identify the configurable features of our design. With these manipulations, we are able to perform multi-mode RS decoding in one unified VLSI architecture with a very simple control scheme. The very low cost and simple datapath make our design a good choice for small-footprint embedded VLSI systems, such as Error Control Coding (ECC) in memory systems.
INTRODUCTION
Reed-Solomon (RS) codes are widely used for forward error correction in digital transmission and storage systems. They are a special case of BCH codes and have become a popular choice for providing data integrity because of their good correction capability for burst transmission errors [1][2][3].
Among the various RS decoding algorithms, the Peterson-Gorenstein-Zierler (PGZ) algorithm [4][5] provides the simplest way to realize an RS decoder for t ≤ 3. It is very cost-effective for systems that require only a small correcting capability, e.g., Error Control Coding (ECC) in processor-memory systems and digital answering machines. Unlike the iterative RS decoding methods (e.g., the Berlekamp-Massey algorithm [6][7]), the major drawback of the conventional PGZ algorithm is that a given circuit works for only a single correction capability. That is, the PGZ circuit that solves t=3 cannot function correctly if the actual number of errors is 1 or 2. As a result, a t ≤ 3 PGZ decoder needs three copies of hardware components to compute t=1, t=2, and t=3, respectively. The whole circuit is shown in Figure 1(a).
[Figure 1(b): The proposed multi-mode PGZ decoder]

Obviously, placing three copies of the decoder on one circuit wastes silicon area and cost. We therefore seek a simple way to merge the three decoders into one unified VLSI circuit. In this paper, we derive a configurable VLSI architecture that performs multi-mode RS decoding for various correction capabilities (i.e., different t values) based on the PGZ algorithm. We call it the multi-mode PGZ decoder, as illustrated in Figure 1(b). The reconfigurable feature of the proposed decoder handles t=0, 1, 2, and 3 errors in a single circuit, which leads to a significant saving in hardware cost.
The rest of this paper is organized as follows. In Sec. 2, we go through the details of the PGZ decoding algorithm and then derive the reduced-complexity RS decoder for t=3. In Secs. 3 and 4, we present the multi-mode RS decoder. In Sec. 5, we discuss the hardware complexity to illustrate the hardware saving of our approach. Finally, we conclude our work in Sec. 6.
REVIEW OF PGZ ALGORITHM
Syndrome Calculation
Let the polynomial $c(x)$ denote the transmitted code word. Then the received code word, $r(x)$, can be represented as $r(x) = c(x) + e(x)$, where $e(x)$ is the error polynomial.
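As a minimal sketch of the syndrome step, the Python below evaluates the received polynomial at successive powers of a primitive element. The field GF(2^8), the reduction polynomial 0x11D, the choice alpha = 2, the root set alpha^1 ... alpha^(2t), and all helper names are assumptions made for illustration; they are not taken from the paper.

```python
# Illustrative assumptions: GF(2^8), primitive polynomial 0x11D, alpha = 2.
ALPHA = 2

def gf_mul(a, b, poly=0x11D):
    """Multiply two GF(2^8) elements (shift-and-XOR, reduced modulo poly)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coeffs[i] multiplies x**i) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

def syndromes(received, t):
    """Return [S_1, ..., S_2t] with S_j = r(alpha**j)."""
    return [poly_eval(received, gf_pow(ALPHA, j)) for j in range(1, 2 * t + 1)]

# The all-zero word is a codeword of any linear code, so r(x) = e(x) here.
received = [0] * 15
received[3] ^= 5          # one error of value 5 at position 3
S = syndromes(received, t=3)
```

With a single error of value Y at position p, every syndrome reduces to S_j = Y * alpha^(p*j), which is the structure the PGZ equations exploit.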
PGZ Algorithm for t=1
Given t=1, Eq. (4) reduces to a single linear equation in $\sigma_0$. The error location polynomial is then $\sigma(x) = \sigma_0 + x$. Next, we solve the key equation for t=1, where the error value polynomial is $\Omega(x) = \omega_0$, and $\omega_0 = \sigma_0 S_1$.
PGZ Algorithm for t=2
For t=2, Eq. (4) reduces to a 2-by-2 linear system in $\sigma_0$ and $\sigma_1$. The error location polynomial can then be written as $\sigma(x) = \sigma_0 + \sigma_1 x + x^2$.
Solving the key equation for t=2 then yields the error value polynomial $\Omega(x) = \omega_0 + \omega_1 x$.
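The sketch below illustrates the t=2 case in Python by solving the 2-by-2 system with Cramer's rule; no minus signs appear because addition and subtraction coincide in characteristic 2. The convention used here (a monic $\sigma(x)$ whose roots are the inverse error locators, syndromes $S_j = \sum_k Y_k X_k^j$, GF(2^8) with polynomial 0x11D, and every function name) is an assumption. It was chosen because it reproduces the sums $S_1S_3+S_2S_2$, $S_2S_3+S_1S_4$, and $S_2S_4+S_3S_3$ that Sec. 3 lists for t=2; it is not copied from the paper's Eq. (14).

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) (assumed field; shift-and-XOR with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    """Inverse via a**254, since a**255 == 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)

def error_syndromes(errors, count):
    """S_j = sum of Y * alpha**(pos*j) over (pos, Y) pairs, j = 1..count."""
    out = []
    for j in range(1, count + 1):
        s = 0
        for pos, val in errors:
            s ^= gf_mul(val, gf_pow(2, pos * j))
        out.append(s)
    return out

def sigma_t2(S1, S2, S3, S4):
    """Solve S3*s0 + S2*s1 = S1 and S4*s0 + S3*s1 = S2 by Cramer's rule."""
    inv_den = gf_inv(gf_mul(S2, S4) ^ gf_mul(S3, S3))
    sigma0 = gf_mul(gf_mul(S1, S3) ^ gf_mul(S2, S2), inv_den)
    sigma1 = gf_mul(gf_mul(S2, S3) ^ gf_mul(S1, S4), inv_den)
    return sigma0, sigma1

# Two hypothetical errors: (position, value).
errors = [(4, 0x1D), (10, 0x7A)]
sigma0, sigma1 = sigma_t2(*error_syndromes(errors, 4))

def sigma_at(x):
    """Evaluate sigma(x) = sigma0 + sigma1*x + x**2."""
    return sigma0 ^ gf_mul(sigma1, x) ^ gf_mul(x, x)

# Under the assumed convention, sigma vanishes at each inverse locator.
residuals = [sigma_at(gf_inv(gf_pow(2, pos))) for pos, _ in errors]
```

Note that the two numerators and the denominator share the products $S_2S_3$, $S_3S_3$, and so on with nothing else, which is why these terms are natural candidates for reuse in a larger decoder.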
PGZ Algorithm for t=3
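Similarly, for t=3, Eq. (4) becomes a 3-by-3 linear system whose solution for $\sigma_0$, $\sigma_1$, and $\sigma_2$ is given by Eq. (21).
Then, the error location polynomial can be written as

$$\sigma(x) = \sigma_0 + \sigma_1 x + \sigma_2 x^2 + x^3 \tag{22}$$

The key equation for t=3 is then solved for the error value polynomial. In what follows, we provide a method to calculate $\sigma_0$, $\sigma_1$, and $\sigma_2$ in a cost-efficient way.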
The Reduced-Complexity Decoder Architecture for t=3
According to our observation, the denominator in Eq. (21) contains two $S_3S_4S_5$ terms, which cancel under finite-field addition. The same holds for the numerator of $\sigma_0$, which contains two $S_2S_3S_4$ terms. We also find that the product $S_2S_5$ appears quite often in Eq. (21), e.g., it is a common factor of $S_2S_2S_5$, $S_2S_3S_5$, $S_2S_4S_5$, and $S_2S_5S_5$. Thus, if we calculate $S_2S_5$ first, the overall computational complexity is reduced significantly. Similarly, we can identify other common products, such as $S_3S_3$, and calculate them first, which leads to the cost-efficient architecture illustrated in Figure 2. When $\sigma_0$, $\sigma_1$, and $\sigma_2$ are available, $\omega_0$, $\omega_1$, and $\omega_2$ can be obtained from Eq. (26), as illustrated in Figure 3.
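The cancellation can be checked numerically. The Python sketch below compares a full six-term expansion of a 3-by-3 determinant with a four-term form in which the duplicated $S_3S_4S_5$ product is dropped. It assumes that the denominator is the determinant of the matrix with rows $(S_2,S_3,S_4)$, $(S_3,S_4,S_5)$, $(S_4,S_5,S_6)$, which is our reading of the four-term expression quoted in Sec. 3, and it assumes GF(2^8) with polynomial 0x11D; function names are hypothetical.

```python
import random

def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) (assumed field; shift-and-XOR with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def det_full(S2, S3, S4, S5, S6):
    """Six-term expansion: 12 multiplications. Subtraction is XOR here."""
    m = gf_mul
    return (m(m(S2, S4), S6) ^ m(m(S3, S5), S4) ^ m(m(S4, S3), S5)
            ^ m(m(S4, S4), S4) ^ m(m(S3, S3), S6) ^ m(m(S2, S5), S5))

def det_reduced(S2, S3, S4, S5, S6):
    """Four-term form: the two S3*S4*S5 products cancel (x ^ x == 0)."""
    m = gf_mul
    p25 = m(S2, S5)   # shared product, computed once and reusable elsewhere
    return m(m(S2, S4), S6) ^ m(m(S4, S4), S4) ^ m(m(S3, S3), S6) ^ m(p25, S5)

rng = random.Random(0)
samples = [[rng.randrange(256) for _ in range(5)] for _ in range(1000)]
all_equal = all(det_full(*s) == det_reduced(*s) for s in samples)
```

The saving shown here (8 multiplications instead of 12 for one determinant) is only the cancellation part; the further reduction from sharing products across several numerators is not modeled.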
MULTI-MODE PGZ ALGORITHM AND ARCHITECTURE

Problems of t=3 PGZ Architecture when t=1 or 2
The block diagram introduced in Sec. 2.D functions correctly only when the received code word has exactly three errors. If the error number is less than three, a divide-by-zero problem occurs. Specifically, for t=3, we have to solve a 3-by-3 linear system.
If the error number is less than 3, the three columns of the 3-by-3 matrix become linearly dependent, so its determinant, which is the denominator of every $\sigma$ coefficient, is zero. The $\sigma$ values then require a division by zero and can no longer be computed. Hence, the t=3 architecture above cannot guarantee the right result when t=1 or 2. To overcome this situation, three copies of hardware (Figure 1(a)) are needed, together with a dedicated state machine to check the error status.
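The failure mode can be reproduced in a few lines. The Python sketch below builds syndromes for two and for three injected errors and tries to invert the t=3 denominator in each case. The field (GF(2^8), polynomial 0x11D), the syndrome definition $S_j = \sum_k Y_k X_k^j$, the matrix whose determinant is used, and all names are assumptions for illustration.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) (assumed field; shift-and-XOR with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)

def error_syndromes(errors, count=6):
    """S_j = sum of Y * alpha**(pos*j) over (pos, Y) pairs, j = 1..count."""
    out = []
    for j in range(1, count + 1):
        s = 0
        for pos, val in errors:
            s ^= gf_mul(val, gf_pow(2, pos * j))
        out.append(s)
    return out

def det3(S):
    """Four-term determinant of [[S2,S3,S4],[S3,S4,S5],[S4,S5,S6]]."""
    _, S2, S3, S4, S5, S6 = S
    m = gf_mul
    return (m(m(S2, S4), S6) ^ m(m(S4, S4), S4)
            ^ m(m(S3, S3), S6) ^ m(m(S2, S5), S5))

det_three = det3(error_syndromes([(2, 7), (9, 33), (11, 200)]))
det_two = det3(error_syndromes([(2, 7), (9, 33)]))

try:
    gf_inv(det_two)            # what a fixed t=3 datapath would attempt
    division_failed = False
except ZeroDivisionError:
    division_failed = True
```

In hardware there is no exception to catch, so the zero denominator has to be detected explicitly before the divider output is used.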
The Proposed Multi-mode Decoding Algorithm
In fact, the zero values carry information that facilitates our derivation. That is, by examining one of the four terms in Eq. (29) and one of the three terms in Eq. (30), the error number can be decided. For instance, $(S_2S_4S_6 + S_4S_4S_4 + S_3S_3S_6 + S_2S_5S_5)$ equals zero when t=0, 1, 2; $(S_2S_4 + S_3S_3)$ equals zero when t=0, 1; and $S_2$ equals zero when t=0. Consequently, we employ these three terms to detect the error number t. Figure 4 shows the flowchart used to detect the error number.

By examining Eq. (14) carefully, we find that $S_1S_3$, $S_2S_2$, $S_2S_4$, $S_3S_3$, $S_2S_3$, $S_1S_4$, and the sums $S_1S_3+S_2S_2$, $S_2S_4+S_3S_3$, $S_2S_3+S_1S_4$ are generated when calculating $\sigma$ for t=2. Our approach is to compute $\sigma$ for t=3 using these terms as a basis. Meanwhile, as mentioned in Sec. 2.D, the two $S_3S_4S_5$ terms and the two $S_2S_3S_4$ terms can be dropped, which helps considerably in reducing the overall complexity. Although more hardware is required than for a single-mode decoder, the multi-mode PGZ decoder generates the terms needed to calculate $\sigma$ for t=1, 2, 3 at the same time. Provided that we know the error number, the correct terms for calculating $\sigma$ can be chosen; multiplexers in the multi-mode decoder perform this selection. Figure 5 and Figure 6 show the block diagram of the proposed multi-mode PGZ decoder architecture. The controller algorithm is based on the flowchart in Figure 4.
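A possible software rendering of the detection step is sketched below in Python. It tests the three expressions named in the text from the largest to the smallest; this ordering is our assumption about the flowchart, which is not reproduced here. The field (GF(2^8), polynomial 0x11D), the syndrome definition, and the function names are likewise illustrative assumptions, and the sketch makes no claim about inputs with more than three errors.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) (assumed field; shift-and-XOR with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def error_syndromes(errors, count=6):
    """S_j = sum of Y * alpha**(pos*j) over (pos, Y) pairs, j = 1..count."""
    out = []
    for j in range(1, count + 1):
        s = 0
        for pos, val in errors:
            s ^= gf_mul(val, gf_pow(2, pos * j))
        out.append(s)
    return out

def detect_error_count(S):
    """Return 0..3 by testing the three zero-indicator terms in turn."""
    _, S2, S3, S4, S5, S6 = S
    m = gf_mul
    term3 = (m(m(S2, S4), S6) ^ m(m(S4, S4), S4)
             ^ m(m(S3, S3), S6) ^ m(m(S2, S5), S5))
    if term3 != 0:
        return 3                       # nonzero only with three errors
    if m(S2, S4) ^ m(S3, S3) != 0:
        return 2                       # zero for t = 0 and t = 1
    if S2 != 0:
        return 1                       # zero only for t = 0
    return 0

# Hypothetical error patterns: lists of (position, value) pairs.
patterns = [[], [(5, 9)], [(5, 9), (12, 77)], [(5, 9), (12, 77), (20, 3)]]
detected = [detect_error_count(error_syndromes(p)) for p in patterns]
```

The returned count plays the role of the multiplexer select signal: it decides which set of precomputed terms feeds the $\sigma$ computation.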
MULTI-MODE CHIEN'S SEARCH & FORNEY'S METHOD
After all $\sigma$ and $\omega$ values have been found, the error location polynomial of Eq. (5) and the error value polynomial of Eq. (7) take the following forms, the last being Eq. (34), for t=1, t=2, and t=3, respectively:
The equation for the t=3 case is obviously the most complicated. Therefore, once an architecture can resolve it, the equations for t=2 and t=1 can be evaluated simply by changing coefficients. The controller drives the multiplexers to select the proper $\omega$ values in Figure 6. Figure 7 shows the implementation of Chien's search and Forney's method. The offset is the corrupted data, which must be added to the corresponding error value E to produce the corrected data.
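The idea of reusing the t=3 evaluator by changing coefficients can be sketched as follows in Python. A single cubic evaluator is stepped through all nonzero field elements, and the lower modes are obtained by padding the coefficient list with a leading 1 and zeros. The padding scheme, the field (GF(2^8), polynomial 0x11D), and the function names are assumptions for illustration; mapping the returned exponents to symbol positions depends on the locator convention and is not shown, and Forney's error-value step is omitted.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply in GF(2^8) (assumed field; shift-and-XOR with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def mode_coeffs(t, sigma):
    """Pad sigma_0..sigma_{t-1} to four coefficients: 1 at x**t, 0 above."""
    return tuple(sigma[:t]) + (1,) + (0,) * (3 - t)

def chien_search(c0, c1, c2, c3):
    """Return every i in 0..254 with c0 + c1*x + c2*x^2 + c3*x^3 == 0 at x = alpha**i."""
    roots = []
    x = 1
    for i in range(255):
        x2 = gf_mul(x, x)
        value = c0 ^ gf_mul(c1, x) ^ gf_mul(c2, x2) ^ gf_mul(c3, gf_mul(x2, x))
        if value == 0:
            roots.append(i)
        x = gf_mul(x, 2)               # step to the next power of alpha
    return roots

# Build sigma from known roots a, b, c so the search result can be checked.
a, b, c = gf_pow(2, 3), gf_pow(2, 20), gf_pow(2, 100)
m = gf_mul
roots_t1 = chien_search(*mode_coeffs(1, (a,)))
roots_t2 = chien_search(*mode_coeffs(2, (m(a, b), a ^ b)))
roots_t3 = chien_search(*mode_coeffs(3, (m(m(a, b), c),
                                         m(a, b) ^ m(a, c) ^ m(b, c),
                                         a ^ b ^ c)))
```

Because the unused high-order coefficients are simply set to zero, the same datapath serves all three modes and only the coefficient registers change.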
COMPARISON OF COMPLEXITY
The main drawback of the PGZ algorithm is that its hardware complexity rises rapidly when t is larger than three. A direct implementation of the PGZ algorithm for t=3 without any cost-down techniques requires 40 finite-field multipliers (FFMs) and 16 finite-field adders (FFAs). By exploiting the special properties of finite-field operations described in Sec. 2.D, we derived a reduced-complexity PGZ decoder for t=3. It requires only 21 FFMs and 11 FFAs, so the hardware complexity is reduced by approximately 50%. Furthermore, the design techniques of the reduced-complexity t=3 design are applied to our multi-mode PGZ decoder for any t ≤ 3, which needs only 24 FFMs and 12 FFAs. The comparison of hardware complexity is shown in Table 1. As we can see, compared with the reduced-complexity design, only three additional FFMs and one additional FFA are required in our multi-mode PGZ architecture. That is, our multi-mode PGZ architecture can solve for t=0, 1, 2, 3 errors in one unified VLSI architecture with very small hardware overhead.
Table 1. Comparison of hardware complexity.

Architecture type                                      FFM   FFA
Direct implementation of PGZ algorithm for t = 3        40    16
The derived reduced-complexity PGZ for t = 3            21    11
The proposed multi-mode PGZ for t = 0, 1, 2, 3          24    12
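The figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the FFM and FFA counts stated in the text; the dictionary keys and the helper name are hypothetical, and counting each multiplier or adder as one unit is a simplifying assumption, since the gate cost of an FFM and an FFA differ.

```python
# Component counts as stated in the text (FFM = multiplier, FFA = adder).
COSTS = {
    "direct t=3": {"FFM": 40, "FFA": 16},
    "reduced t=3": {"FFM": 21, "FFA": 11},
    "multi-mode": {"FFM": 24, "FFA": 12},
}

def percent_saved(before, after):
    """Relative saving in percent when going from `before` to `after` units."""
    return 100.0 * (before - after) / before

direct, reduced, multi = (COSTS[k] for k in ("direct t=3", "reduced t=3", "multi-mode"))

ffm_saved = percent_saved(direct["FFM"], reduced["FFM"])   # multipliers
ffa_saved = percent_saved(direct["FFA"], reduced["FFA"])   # adders

# Extra components needed to go from the reduced t=3 design to multi-mode.
overhead = {k: multi[k] - reduced[k] for k in ("FFM", "FFA")}
```

The multiplier saving works out to 47.5%, in line with the "approximately 50%" quoted for the reduced-complexity design, and the multi-mode overhead is the three FFMs and one FFA stated in the text.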
CONCLUSIONS
In this paper, we presented the algorithm derivation and VLSI architecture of a multi-mode PGZ-based RS decoder. It computes the correct error locations and error values for any t less than four with very low hardware complexity. Thanks to the configurable architecture, we can easily perform RS decoding for different values of t without redesigning the hardware.
