Forward error correction (FEC) is a key component of a communication system and typically consists of a scrambler, Reed-Solomon coding, interleaving, and trellis coding. Because performance and complexity requirements differ, the design parameters vary across applications. In this paper, a multi-mode FEC processor is presented that meets different system requirements with a power- and area-efficient architecture. The proposed processor is fully compliant with the ITU-T J.83 cable modem system and includes a reconfigurable Reed-Solomon decoder and a memory-based universal convolutional interleaver. In 0.18 µm 1P6M CMOS technology, simulation results show that the FEC decoder can operate above 100 MHz at a cost of 54.5K gates and two 376×8-bit embedded dual-port SRAMs. The average power consumption in the most critical mode is about 34.2 mW at 100 MHz. When running at 7 MHz, which meets the symbol rate of a cable modem, the power dissipation is 2.32 mW.
INTRODUCTION
In a communication system, channel coding is a key module that uses various error-correcting algorithms and interleaving techniques to minimize the effect of channel noise during data transmission. In most systems it can be summarized as four parts: scrambler, Reed-Solomon (RS) coding, interleaving, and trellis coding. Each application chooses specific parameters to obtain an optimum system. Because the FEC sections of standards such as ITU-T J.83, DVB, and ATSC Digital TV are similar, a multi-mode FEC design is an important way to lower the design cost. For the RS code, however, it is not easy to implement a decoder that supports different finite field definitions and generator polynomials, so each application usually has its own dedicated RS decoding hardware. Moreover, it is difficult for the memory controller of the interleaver to generate proper addresses for multiple standards. In this paper, a multi-mode FEC decoder architecture is proposed, which mainly contains a multi-mode RS decoder and a universal convolutional interleaver. The proposed design supports the different annexes of J.83 as well as DVB with minimal overhead.
The organization of this paper is as follows. In Section 2, the motivation for multi-mode FEC design is described, and the ITU-T J.83 cable modem system is chosen as the design platform. Reed-Solomon coding and its multi-mode implementation methods are addressed in Section 3. In Section 4, a novel, low-complexity memory-based interleaving methodology is presented. Section 5 covers the scrambler and trellis coding. Finally, the overall multi-mode FEC decoder and its performance are discussed in Section 6.
MULTI-MODE FEC DESIGN
An efficient multi-mode architecture is both important and challenging as a way to lower the design cost. The ITU-T J.83 recommendation defines four annexes for digital transmission systems, and a digital television cable network should use one of the systems specified in annex A, B, C, or D. A comparison of the FEC sections of the different annexes of ITU-T J.83 is listed in table 1. There are three modes of RS codes and various parameters for convolutional interleaving. Designing a multi-mode FEC decoder that supports these standards while keeping complexity and power consumption low is a challenge. An efficient multi-mode FEC architecture is proposed in the following sections.
MULTI-MODE RS DECODER
The Reed-Solomon decoding process can be divided into four steps [1]. First, a finite field multiplier (FFM) that supports the different finite field definitions must be designed. Then, the syndrome calculator computes a set of syndromes from the received codeword. The key equation solver produces the error locator polynomial $\sigma(x)$ and the error value evaluator polynomial $\Omega(x)$ from the syndromes. Finally, the Chien search and the error value evaluator yield the error locations and the error values, respectively. The proposed multi-mode architecture is described in the following sub-sections. It can be used in many applications, such as ITU-T J.83 and DVB systems.
Multi-Mode Finite Field Multiplier
For different RS codes, the different primitive polynomials make the design of a finite field multiplier (FFM) challenging. However, an FFM can be split into a multiplication and a modular reduction. The primitive polynomial affects only the modular reduction, so the complexity of a programmable design lies entirely in that step. A multi-mode FFM is proposed as shown in figure 1. … (2^7), which are decided by the current mode. The architecture of the multi-mode syndrome calculator is shown in figure 2(c). For each specification, a specific group of cells is chosen.
Moreover, based on [2], if the first t syndromes are all zero, then all syndromes are zero, which simplifies the error detection procedure. This not only lowers the power consumption but also reduces the complexity.
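The split into a mode-independent multiplication and a mode-dependent reduction can be illustrated in software. The following is only a minimal sketch, not the circuit of figure 1: the function name `gf_mul`, the `MODES` table, and the two primitive polynomials listed in it are assumptions chosen for illustration and should be checked against the relevant standard.

```python
def gf_mul(a: int, b: int, prim_poly: int, m: int) -> int:
    """Multiply a and b in GF(2^m); bit i of prim_poly is the coefficient of x^i."""
    # Step 1: carry-less polynomial multiplication (does not depend on the mode).
    product = 0
    for i in range(m):
        if (b >> i) & 1:
            product ^= a << i
    # Step 2: modular reduction; only this step depends on the primitive polynomial.
    for bit in range(2 * m - 2, m - 1, -1):
        if (product >> bit) & 1:
            product ^= prim_poly << (bit - m)
    return product


# Hypothetical mode table (assumed values, for illustration only).
MODES = {
    "GF(2^8)": (0x11D, 8),  # x^8 + x^4 + x^3 + x^2 + 1
    "GF(2^7)": (0x89, 7),   # x^7 + x^3 + 1
}

if __name__ == "__main__":
    for name, (poly, m) in MODES.items():
        print(name, hex(gf_mul(1 << (m - 1), 2, poly, m)))
```

Switching modes changes only the `prim_poly` and `m` arguments, which mirrors the observation that the programmable part of the multiplier is confined to the reduction step.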
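A short sketch of syndrome calculation with the early zero check may help here. It assumes GF(2^8) with the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1, the primitive element 2, and a codeword stored highest-degree coefficient first; the names `syndromes` and `looks_error_free` are hypothetical. The early check is valid when the received word contains at most t symbol errors.

```python
PRIM_POLY, M = 0x11D, 8  # assumed field definition, for illustration only


def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    product = 0
    for i in range(M):
        if (b >> i) & 1:
            product ^= a << i
    for bit in range(2 * M - 2, M - 1, -1):
        if (product >> bit) & 1:
            product ^= PRIM_POLY << (bit - M)
    return product


def gf_pow(a, n):
    """Raise a to the non-negative power n by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def syndromes(received, count, first_root=0, alpha=2):
    """Return S_0..S_{count-1}, where S_i = r(alpha^(first_root + i))."""
    out = []
    root = gf_pow(alpha, first_root)
    for _ in range(count):
        acc = 0
        for symbol in received:  # Horner's rule, highest degree first
            acc = gf_mul(acc, root) ^ symbol
        out.append(acc)
        root = gf_mul(root, alpha)
    return out


def looks_error_free(received, t, first_root=0):
    """Early check: evaluate only the first t of the 2t syndromes."""
    return not any(syndromes(received, t, first_root))
```

Because the all-zero word is a valid codeword of any linear code, a received word that is zero everywhere except at the error positions is enough to exercise the functions without an encoder.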
Key Equation Solver
To solve the key equation

$$\Omega(x) = \sigma(x)\,S(x) \bmod x^{2t} \qquad (1)$$

the Berlekamp-Massey (BM) algorithm is used because of its regular operation. For a given t, it needs 2t iterations to find the error locator polynomial $\sigma(x)$. Based on the proposed multi-mode FFM and the modified decomposed algorithm [1] [2], a multi-mode key equation solver is proposed. Computing $\Omega(x)$ after $\sigma(x)$ requires fewer multiplications and additions than the original BM algorithm. The design includes only one key equation solver with three of the proposed multi-mode FFMs to calculate $\sigma(x)$ and $\Omega(x)$ respectively. Hence, the hardware complexity is reduced. The architecture is depicted in figure 3.
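For reference, the key equation can be solved in software with the textbook Berlekamp-Massey recursion, which takes one iteration per syndrome. The sketch below is that textbook form with field inversion, not the modified decomposed algorithm of [1] [2]. It assumes GF(2^8) with x^8 + x^4 + x^3 + x^2 + 1 and polynomials stored lowest-degree coefficient first; all function names are hypothetical.

```python
PRIM_POLY, M = 0x11D, 8  # assumed field definition, for illustration only


def gf_mul(a, b):
    product = 0
    for i in range(M):
        if (b >> i) & 1:
            product ^= a << i
    for bit in range(2 * M - 2, M - 1, -1):
        if (product >> bit) & 1:
            product ^= PRIM_POLY << (bit - M)
    return product


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    return gf_pow(a, (1 << M) - 2)  # a^(2^m - 2) is the inverse of a nonzero a


def berlekamp_massey(S):
    """Return the error locator sigma(x) for the syndrome list S (length 2t)."""
    sigma, prev = [1], [1]       # current locator and last saved locator
    L, shift, prev_disc = 0, 1, 1
    for n in range(len(S)):
        disc = S[n]              # discrepancy of the current locator at step n
        for i in range(1, len(sigma)):
            disc ^= gf_mul(sigma[i], S[n - i])
        if disc == 0:
            shift += 1
            continue
        coef = gf_mul(disc, gf_inv(prev_disc))
        candidate = sigma + [0] * max(0, len(prev) + shift - len(sigma))
        for i, c in enumerate(prev):
            candidate[i + shift] ^= gf_mul(coef, c)
        if 2 * L <= n:           # the locator length must grow
            prev, prev_disc, L, shift = sigma, disc, n + 1 - L, 1
        else:
            shift += 1
        sigma = candidate
    return sigma


def error_evaluator(sigma, S):
    """Omega(x) = sigma(x) * S(x) mod x^(2t), computed after sigma(x)."""
    omega = [0] * len(S)
    for i, a in enumerate(sigma):
        for j, s in enumerate(S):
            if i + j < len(S):
                omega[i + j] ^= gf_mul(a, s)
    return omega
```

Computing `error_evaluator` only once, after the final locator is known, is the software analogue of computing the evaluator polynomial after the locator polynomial.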
Error Value Evaluator
The Forney algorithm is used to implement the error value evaluator. Assume $\beta_j$ is the j-th root of the error locator polynomial $\sigma(x)$. For annex A, C, and D, the error value is $\Omega(\beta_j) / (\beta_j\,\sigma'(\beta_j))$. Figure 6 shows the proposed architecture. It calculates $\sigma'(\beta_j)$ and $\Omega(\beta_j)$ at the same time: the left mux chooses $\beta_j^2$ while the bottom mux chooses $\beta_j$. After $\sigma'(\beta_j)$ has been calculated, it is multiplied by $\beta_j$ in annex A, C, and D. To calculate the final error value, the bottom mux then chooses the upper path.
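The error value computation can be checked numerically with a small model. This sketch assumes GF(2^8) with x^8 + x^4 + x^3 + x^2 + 1, a locator of the form sigma(x) = prod(1 + X_k x) whose roots are the inverses of the error locators X_k, and syndromes S_i taken at consecutive powers of the primitive element starting at exponent `first_root`. Under those assumptions the denominator carries an extra factor of the root when `first_root` is 0 and no extra factor when it is 1. All names are hypothetical.

```python
PRIM_POLY, M = 0x11D, 8  # assumed field definition, for illustration only


def gf_mul(a, b):
    product = 0
    for i in range(M):
        if (b >> i) & 1:
            product ^= a << i
    for bit in range(2 * M - 2, M - 1, -1):
        if (product >> bit) & 1:
            product ^= PRIM_POLY << (bit - M)
    return product


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    return gf_pow(a, (1 << M) - 2)


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def poly_eval(p, x):
    acc = 0
    for c in reversed(p):  # coefficients are stored lowest degree first
        acc = gf_mul(acc, x) ^ c
    return acc


def formal_derivative(p):
    # In characteristic 2 only the odd-degree terms survive differentiation.
    return [c if i % 2 == 1 else 0 for i, c in enumerate(p)][1:]


def forney(sigma, omega, root, multiply_by_root):
    """Error value at one root of sigma(x)."""
    den = poly_eval(formal_derivative(sigma), root)
    if multiply_by_root:   # case first_root == 0
        den = gf_mul(den, root)
    return gf_mul(poly_eval(omega, root), gf_inv(den))


def build_case(locators, values, first_root, two_t):
    """Construct sigma(x) and Omega(x) directly from a known error pattern."""
    sigma = [1]
    for X in locators:
        sigma = poly_mul(sigma, [1, X])
    S = [0] * two_t
    for i in range(two_t):
        for X, e in zip(locators, values):
            S[i] ^= gf_mul(e, gf_pow(X, first_root + i))
    return sigma, poly_mul(sigma, S)[:two_t]  # truncation is "mod x^(2t)"
```

Because the derivative keeps only odd-degree terms, it is a polynomial in the square of its argument, which is consistent with evaluating it on the squared root while the evaluator polynomial is evaluated on the root itself.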
UNIVERSAL CONVOLUTIONAL INTERLEAVING
The main purpose of interleaving is to combat burst errors induced by a noisy channel. Convolutional interleaving spreads burst errors better than block interleaving. The structure of an (I, J) convolutional interleaver based on the Forney approach and the Ramsey type III approach [3] is shown in figure 7.
Figure 7: Structure of (I, J) convolutional interleaving, showing the interleaver and the de-interleaver
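The FIFO-based structure can be modelled in a few lines as a behavioural reference. This sketch assumes the usual arrangement in which branch i of the interleaver delays by i × J symbols and branch i of the de-interleaver by (I − 1 − i) × J symbols, with a commutator that advances one branch per symbol. The class and function names are hypothetical, and the model is the FIFO reference, not the memory-based method proposed in this paper.

```python
from collections import deque


class BranchDelayLine:
    """Commutated bank of FIFOs: one branch is visited per input symbol."""

    def __init__(self, delays, fill=None):
        self.fifos = [deque([fill] * d) for d in delays]
        self.branch = 0

    def push(self, symbol):
        fifo = self.fifos[self.branch]
        self.branch = (self.branch + 1) % len(self.fifos)
        if not fifo:            # zero-delay branch passes the symbol through
            return symbol
        fifo.append(symbol)
        return fifo.popleft()   # symbol written len(fifo) visits earlier

    def storage(self):
        return sum(len(f) for f in self.fifos)


def make_interleaver(I, J):
    return BranchDelayLine([i * J for i in range(I)])


def make_deinterleaver(I, J):
    return BranchDelayLine([(I - 1 - i) * J for i in range(I)])


if __name__ == "__main__":
    I, J = 12, 17
    inter, deint = make_interleaver(I, J), make_deinterleaver(I, J)
    data = list(range(5000))
    out = [deint.push(inter.push(x)) for x in data]
    delay = I * (I - 1) * J     # end-to-end latency in symbols
    print(out[delay:delay + 5], deint.storage())
```

Each side holds J × I × (I − 1) / 2 symbols in total, which is the minimum memory requirement that a single-RAM implementation aims to approach.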
Implementing so many separate FIFOs in hardware is inefficient: it consumes considerable power and area and complicates routing. A better solution is to use a two-port SRAM to reduce area and power consumption. The key issue then becomes how to generate the correct address for each input and output symbol. Therefore, a novel, low-complexity memory-based method for implementing the multi-mode convolutional interleaver and de-interleaver is proposed, derived from [4] [5].
A Low Complexity Algorithm for Interleaving
Take as an example the (12, 17) convolutional de-interleaver adopted in ITU-T J.83A, C and the DVB system. In figure 8, the received data are 0 x x … x 12 x x x …204 1 x x x … 2244 2041 …11 …, where each number is the input index at the interleaver and x is a don't-care symbol at the beginning. When de-interleaving, as in figure 9(a), after 0 is written to memory, the interval between 0 and the next writing address is (I-1)×J = 187 (the skipped locations hold x). The interval between that address and the next one is (I-2)×J = 170, and so on, down to 2J = 34. These intervals equal the FIFO lengths on the corresponding branches. When 12 is written, the address returns to the initial writing address minus 1, and the same sequence of intervals is repeated. After 202 has been written, the data stored in memory look like figure 9(b). The interval between 0 and 1 is then (I-2)×J = 170, the distance between 1 and 2 is (I-3)×J = 153, and so on. At this point, the memory size in figure 9(b) is J×I×(I-1)/2, exactly the minimum memory requirement in figure 7. Because there is no space left to write 2244, the memory must be enlarged; by observation, J additional locations are needed. As shown in figure 9(c), when 0 is read out of memory, 2244 is written into memory. Then 1 is read out and 2041 is written to the original position of 0, and so on. Therefore, the required memory size is J×I×(I-1)/2 + J. The maximum size is 65032 bytes for (128, 8) in ITU-T J.83B, which is only 8 bytes more than the original structure.
However, when an address exceeds the memory size, it must be reduced modulo the memory size.
The detailed operations of the universal de-interleaver are described as pseudocode in figure 10. The interleaver is simply the inverse of the de-interleaver and can be formulated just as easily.
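The sizes and intervals quoted above follow directly from the formulas, as this small calculation shows. It is only an arithmetic check with hypothetical function names; it does not reproduce the address generator of figure 10.

```python
def fifo_storage(I, J):
    """Minimum storage of the FIFO structure: J * I * (I - 1) / 2 symbols."""
    return J * I * (I - 1) // 2


def single_ram_size(I, J):
    """Storage of the memory-based scheme: J extra locations on top of the minimum."""
    return fifo_storage(I, J) + J


def write_intervals(I, J):
    """Intervals between successive writes: (I-1)*J, (I-2)*J, ..., 2*J."""
    return [(I - k) * J for k in range(1, I - 1)]


def wrap(address, size):
    """Addresses past the end of the memory wrap around modulo its size."""
    return address % size


if __name__ == "__main__":
    for I, J in [(12, 17), (128, 8)]:
        print((I, J), fifo_storage(I, J), single_ram_size(I, J))
    print(write_intervals(12, 17)[:3], write_intervals(12, 17)[-1])
```

For (12, 17) this gives 1122 and 1139 locations, and for (128, 8) it gives 65024 and 65032, so the overhead over the minimum is J locations in both cases.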
SCRAMBLER AND TRELLIS CODING
The circuit for randomization is simple, so it is suggested to use dedicated hardware for each annex.
Trellis coding is included only in J.83B. A hard-decision Viterbi decoder with 16 states satisfies the requirement of ITU-T J.83.
DISCUSSION
Based on the proposed multi-mode RS decoder, universal de-interleaver, descrambler, and trellis decoder, the overall multi-mode FEC decoder is illustrated in figure 11. Implemented in 0.18 µm 1P6M CMOS technology, simulation results show that the FEC decoder can operate above 100 MHz at a cost of 54.5K gates, two 376×8-bit embedded dual-port SRAMs, and 65032 bytes of external memory for the de-interleaver, with only 8 bytes of overhead. In fact, 7 MHz already meets the requirement. The detailed gate counts of each module are listed in table 2. Table 2 also shows the gate count of the RS decoder for ITU-T J.83D, which is the most complex RS code in ITU-T J.83. The proposed multi-mode RS decoder is only about 1.1K gates larger than the decoder dedicated to J.83D. In addition, the (12, 17) interleaver in [6] needs two 128-byte RAMs and four 256-byte RAMs, for a total memory size of 1280 bytes. For the same interleaver, the proposed algorithm and architecture need only one 1139-byte RAM and a low-complexity controller. The designs in [4], [5], and [6] can each serve only a specific standard with the same components, whereas the proposed FEC processor can be used in many standards, such as ITU-T J.83, DVB, and ATSC Digital TV. The average power consumption of each mode in post-layout simulation is listed in table 3. The floorplan of the layout is shown in figure 12, and the chip size is 1892×1892 µm².
