Proposed is a new area-efficient truncated inversionless Berlekamp-Massey (TiBM) architecture for the Reed-Solomon (RS) decoder. The area efficiency of the proposed architecture is obtained by truncating redundant processing elements (PEs) in the key equation solver (KES) block, which uses the BM algorithm. This increases the hardware utilisation of the PEs used to solve the key equation and reduces the hardware complexity of the KES block. The proposed TiBM architecture has the lowest hardware complexity compared with conventional KES architectures.
Existing RiBM architecture: The RiBM architecture [1] consists of 3t + 1 PEs and a control unit, and each PE contains two registers. For the RS(255,239) code (t = 8), the registers in PE_0 to PE_{2t−1} are initialised with the 16 syndrome values and the registers in PE_{2t} to PE_{3t−1} are initialised with 8 zeros. The registers in PE_{3t} are initialised to 1. The upper and lower registers hold the updated δ_i(r) and θ_i(r), respectively. The control unit controls MC(r), γ(r) and δ_0(r). After 2t clock cycles, PE_0 to PE_{t−1} hold the error evaluator polynomial ω(x) and PE_t to PE_{2t} hold the error locator polynomial λ(x). Because PE_{2t} to PE_{3t−1} are initialised with zeros, each of them keeps feeding a zero to the next PE for some period, as shown in Fig. 1. The PEs that perform only these unnecessary zero operations are therefore redundant and can be removed.
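
The register layout described above can be illustrated with a small behavioural model. The following is a minimal sketch, not the hardware of [1]: it assumes the usual RiBM recursion (the delta registers are updated from the neighbouring PE and the local theta register, and theta, gamma and a counter k are updated when the discrepancy is non-zero and k ≥ 0), a GF(2^8) field built with the polynomial 0x11D, and syndromes evaluated at α^1 to α^2t. All function names are hypothetical.

```python
# Behavioural sketch only. Assumptions (not from the letter): GF(2^8) built
# with field polynomial 0x11D, syndromes S_j = e(alpha^(j+1)), all names.
T = 8  # error-correcting capability t of RS(255,239)

EXP, LOG = [0] * 510, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements via log/antilog tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def syndromes(errors, t=T):
    """Syndromes of an error pattern {position: value} (hypothetical helper)."""
    out = []
    for j in range(2 * t):
        acc = 0
        for pos, val in errors.items():
            acc ^= gf_mul(val, EXP[(pos * (j + 1)) % 255])
        out.append(acc)
    return out


def ribm(s, t=T, trace=None):
    """Run 2t iterations on 3t + 1 register pairs; return (locator, evaluator)."""
    n = 3 * t + 1
    delta = list(s) + [0] * t + [1]    # 2t syndromes, then t zeros, then 1
    theta = list(delta)
    gamma, k = 1, 0
    for _ in range(2 * t):
        if trace is not None:
            trace.append(list(delta))  # delta registers at the start of this cycle
        d0 = delta[0]
        shifted = delta[1:] + [0]      # value arriving from the next PE; 0 past PE_3t
        new_delta = [gf_mul(gamma, shifted[i]) ^ gf_mul(d0, theta[i])
                     for i in range(n)]
        if d0 != 0 and k >= 0:         # update theta, gamma and k
            theta, gamma, k = shifted, d0, -k - 1
        else:                          # hold theta and gamma
            k += 1
        delta = new_delta
    # Locator read from PE_t..PE_2t, evaluator from PE_0..PE_{t-1}.
    return delta[t:2 * t + 1], delta[:t]


def poly_eval(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc
```

As a usage example under the same assumptions, calling `ribm(syndromes({3: 0x57, 100: 0x01, 200: 0xA5}))` should return a locator whose roots are the inverses of α^3, α^100 and α^200. Passing a list as `trace` records the delta registers per cycle, which makes it possible to inspect how long the zero-initialised registers stay at zero.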
Proposed area-efficient TiBM architecture: Fig. 1a shows the data flow over the 2t iterations of the RiBM algorithm for the RS(255,239) decoder. The parts marked by the dotted line always hold zero values regardless of the number of errors because, as mentioned earlier, PE_{2t} to PE_{3t−1} are initialised with zeros and each PE passes its zero value to the next PE in the RiBM algorithm. These fixed zero values, and the t − 1 PEs that only process them, can therefore be removed from the RiBM architecture. The TiBM architecture has 2t + 2 PEs, whereas the RiBM architecture requires 3t + 1 PEs. In the TiBM architecture, t + 1 original PE1s, as employed in the RiBM architecture, are used as PE1_0 to PE1_t, and t + 1 modified PE2s are used as PE2_{t+1} to PE2_{2t+1}. Truncating the t − 1 PE1s removes some zero values that are still needed, so MUX(1) and MUX(2) are added to the PE2s to supply the zero values at the appropriate times.
The selection signal of each of the 9 MUX(1)s is represented by 2 bits, taking the values 00, 01 and 10, so the selection signals of the 9 MUX(1)s total 18 bits. The selection signal of each of the 9 MUX(2)s is represented by 1 bit, taking the values 0 and 1, so the selection signals of the 9 MUX(2)s total 9 bits. MUX signal Gen. 1 and MUX signal Gen. 2 generate the 27-bit selection signals. The output of MUX signal Gen. 1 is formed by concatenating the 18 bits for MUX(1) and the 9 bits for MUX(2). The former 18 bits move one symbol position to the right every two clock cycles and '2' is inserted at the far left of the control 2 unit, as shown in Fig. 3. Likewise, the latter 9 bits move one position to the right every two clock cycles and '1' is inserted at the far left. For instance, starting from the initial MUX(1) field 2,2,0,1,1,1,1,1,1, the selection signals become 2,2,2,0,1,1,1,1,1 and 1,1,1,0,0,0,0,0,0 after two clock cycles. The next selection signals are then updated to 2,2,2,2,0,1,1,1,1 and 1,1,1,1,0,0,0,0,0. Finally, the 27-bit outputs are selected by the FSM.
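
The shifting of the selection signals can be mimicked in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, each MUX(1) symbol is packed as its 2-bit binary value, and the MUX(1) field is placed before the MUX(2) field in the 27-bit word. It models only the shift-and-insert rule, not the FSM that selects the outputs.

```python
def shift_in(field, fill):
    """Shift a list of selection symbols one position right, inserting `fill` at the left."""
    return [fill] + field[:-1]


def advance(mux1, mux2, clock_cycles):
    """Return both fields after `clock_cycles`; each field shifts once every two cycles."""
    for _ in range(clock_cycles // 2):
        mux1 = shift_in(mux1, 2)   # '2' enters the MUX(1) field
        mux2 = shift_in(mux2, 1)   # '1' enters the MUX(2) field
    return mux1, mux2


def pack_bits(mux1, mux2):
    """Concatenate nine 2-bit MUX(1) symbols and nine 1-bit MUX(2) symbols (assumed order)."""
    return "".join(format(s, "02b") for s in mux1) + "".join(str(b) for b in mux2)


# One shift (two clock cycles) applied to a state quoted in the text.
m1, m2 = advance([2, 2, 2, 0, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0, 0, 0, 0], 2)
word = pack_bits(m1, m2)           # 27-character bit string
```

Here `m1` and `m2` come out as 2,2,2,2,0,1,1,1,1 and 1,1,1,1,0,0,0,0,0, which matches the updated selection signals given in the text.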
For the correct λ(x) and ω(x), δ_i(r) and θ_i(r) must be propagated exactly through the KES block during the 2t clock cycles. δ_i(r) and θ_i(r) are propagated according to the following cases. Secondly, θ_i(r) is propagated according to the following two cases:
• Case 1: If θ_i(r) is determined by its own previous value θ_i(r − 1), the selection signal is 0.
• Case 2: If θ_i(r) is determined by the neighbouring PE2 that holds θ_{i−1}(r), the selection signal is 1.
If the selection signals of MUX(1) and MUX(2) are set in this way, the error locator polynomial λ(x) and the error evaluator polynomial ω(x) are obtained correctly after 2t iterations using only 2t + 2 PEs.
Performance evaluation and comparison: The RS(255,239) decoder using the proposed TiBM architecture was coded in Verilog HDL and synthesised using Synopsys design tools with a 90 nm, 1.1 V CMOS technology library. Table 1 shows the implementation results and the performance comparison. For a fair comparison, we re-synthesised the conventional RiBM architecture under the same 90 nm CMOS technology conditions. The RS decoder using the TiBM architecture can operate at 400 MHz with a total gate count of 19 730 excluding the FIFO, of which the TiBM architecture accounts for 13 500 gates. Hence, the RS decoder using the TiBM architecture has about 25% fewer gates than the conventional RS decoders. As shown in Table 1, the proposed TiBM architecture has lower hardware complexity than the RiBM [1] and E-DCME [3] architectures. The RiBM architecture consists of 6t + 2 registers, 3t + 1 MUXs, 6t + 2 GF multipliers, 3t + 1 GF adders and other control circuits. In contrast, the proposed TiBM architecture consists of 4t + 4 registers, 4t + 4 MUXs, 4t + 4 GF multipliers and 2t + 2 GF adders. Compared with the latest area-efficient folded architectures [4, 5], the proposed TiBM architecture has a much lower latency, 16 clock cycles against 260 clock cycles for the folded architecture, with comparable hardware complexity.
Experimental results demonstrate that the RS decoder using the TiBM architecture has the smallest hardware complexity among previously reported area-efficient RS decoders. The proposed architecture is well suited to high-speed, low-complexity RS decoder design.
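
The component counts quoted in the comparison can be tabulated for a given t. The sketch below simply evaluates the expressions stated in the text; the function and key names are hypothetical, control circuits are left out, and the counts are not a substitute for the synthesised gate counts in Table 1.

```python
def ribm_resources(t):
    """Component counts of the RiBM architecture as stated in the text."""
    return {"PEs": 3 * t + 1, "registers": 6 * t + 2, "MUXs": 3 * t + 1,
            "GF multipliers": 6 * t + 2, "GF adders": 3 * t + 1}


def tibm_resources(t):
    """Component counts of the TiBM architecture as stated in the text."""
    return {"PEs": 2 * t + 2, "registers": 4 * t + 4, "MUXs": 4 * t + 4,
            "GF multipliers": 4 * t + 4, "GF adders": 2 * t + 2}


def difference(t):
    """RiBM count minus TiBM count per component (negative means TiBM uses more)."""
    ribm, tibm = ribm_resources(t), tibm_resources(t)
    return {name: ribm[name] - tibm[name] for name in ribm}


if __name__ == "__main__":
    t = 8  # RS(255,239)
    for name, saved in difference(t).items():
        print(f"{name}: RiBM {ribm_resources(t)[name]}, "
              f"TiBM {tibm_resources(t)[name]}, difference {saved}")
```

For t = 8 the TiBM architecture needs t − 1 = 7 fewer PEs, 14 fewer registers and GF multipliers, and 7 fewer GF adders, at the cost of 11 additional MUXs for the zero-insertion logic.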
