This paper presents a novel Discrepancy-Computationless RiBM (DcRiBM) algorithm and its architecture for decoding BCH codes. The DcRiBM algorithm eliminates the discrepancy computation control block and reduces hardware complexity compared with the conventional RiBM architecture. The low-complexity DcRiBM architecture has been designed and implemented in 0.18-μm CMOS standard-cell technology with a 1.8-V supply voltage. The BCH (2040, 1930) decoder with the proposed architecture achieves a throughput of approximately 2.9 Gb/s at a clock frequency of 265 MHz and requires approximately 32% fewer gates than the conventional RiBM architecture.
or ME algorithm can be used to solve the key equation for the error locator polynomial $\lambda(r)$ in the BCH decoding procedure.
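For reference, the key equation can be written in its standard form. The symbols here are conventional notation assumed for illustration, not taken from the original: $\Lambda(x)$ is the error locator polynomial, $\Omega(x)$ the error evaluator polynomial, $S(x)$ the syndrome polynomial with narrow-sense syndromes $S_i = r(\alpha^{i+1})$, and $X_j = \alpha^{e_j}$ the locators of $\nu$ errors at positions $e_j$:

$$S(x) = \sum_{i=0}^{2t-1} S_i x^i, \qquad \Lambda(x) = \prod_{j=1}^{\nu} \left(1 - X_j x\right), \qquad \Lambda(x)\,S(x) \equiv \Omega(x) \pmod{x^{2t}}$$

The roots of $\Lambda(x)$ are the inverses $X_j^{-1}$ of the error locators, so a Chien search over $\Lambda(\alpha^{-j})$ yields the error positions.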
The well-known reformulated inversionless Berlekamp-Massey (RiBM) algorithm [2] computes the error locator polynomial and the error evaluator polynomial through an iterative procedure that solves the key equation. This algorithm computes the next discrepancy $\tilde{\delta}(r+1)$ at the same time as the current polynomial update. The RiBM architecture has an extremely regular semisystolic structure with a critical path of only $T_{mult}+T_{add}$, compared with more than $2(T_{mult}+T_{add})$ for the earlier iBM architecture. Moreover, the RiBM architecture has lower gate complexity and simpler control than architectures based on the ME algorithm [2]. However, it uses many Galois-field (GF) multipliers and GF adders, which occupy a large silicon area and result in high power consumption. In this paper, a novel discrepancy-computationless RiBM (DcRiBM) algorithm and its architecture are proposed to reduce hardware complexity and improve the clock frequency of high-speed, low-complexity BCH decoders. The proposed design does not require the discrepancy computation control block and uses fewer processing elements (PEs) than the conventional RiBM architecture. It therefore needs fewer GF multipliers and GF adders, which considerably reduces hardware complexity.
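To make the simultaneous discrepancy computation concrete, the RiBM iteration of [2] can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: arithmetic is over GF($2^4$) with primitive polynomial $x^4+x+1$, syndromes are narrow-sense ($S_i = r(\alpha^{i+1})$, $i = 0, \ldots, 2t-1$), and the helper names `gf_mul`, `poly_eval`, `syndromes`, and `ribm` are hypothetical. Each pass updates all $3t+1$ PE values in parallel, and the new `delta[0]` is already the next discrepancy; the returned trace records $\tilde{\delta}_0(r)$ for every iteration.

```python
# Minimal RiBM sketch. Assumptions: GF(2^4) with primitive polynomial x^4 + x + 1,
# narrow-sense syndromes S_i = r(alpha^(i+1)), i = 0..2t-1.
M, PRIM = 4, 0b10011
N = (1 << M) - 1                      # 15 nonzero field elements
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
v = 1
for e in range(N):                    # antilog/log tables: EXP[e] = alpha^e
    EXP[e], LOG[v] = v, e
    v <<= 1
    if v & (1 << M):
        v ^= PRIM
for e in range(N, 2 * N):             # duplicate so LOG[a] + LOG[b] never overflows
    EXP[e] = EXP[e - N]


def gf_mul(a, b):
    """Multiply two GF(2^4) elements; field addition is XOR."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(p, x):
    """Evaluate a polynomial (lowest-degree coefficient first) by Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def syndromes(bits, t):
    """S_0..S_{2t-1}, S_i = r(alpha^(i+1)), for a binary received word."""
    out = []
    for i in range(1, 2 * t + 1):
        s = 0
        for j, bit in enumerate(bits):
            if bit:
                s ^= EXP[(i * j) % N]
        out.append(s)
    return out


def ribm(S, t):
    """RiBM with 3t+1 PEs; returns (Lambda coefficients, delta_0(r) for each r)."""
    delta = list(S) + [0] * t + [1]   # S_0..S_{2t-1}, t zeros, delta_{3t}(0) = 1
    theta = list(delta)
    gamma, k = 1, 0
    trace = []
    for _ in range(2 * t):
        d0 = delta[0]                  # current discrepancy
        trace.append(d0)
        shifted = delta[1:] + [0]      # delta_{i+1}(r), with delta_{3t+1}(r) = 0
        # Step 1: all PEs update in parallel; new_delta[0] is the next discrepancy.
        new_delta = [gf_mul(gamma, shifted[i]) ^ gf_mul(d0, theta[i])
                     for i in range(3 * t + 1)]
        # Step 2: the control decision made by the discrepancy control block.
        if d0 != 0 and k >= 0:
            theta, gamma, k = shifted, d0, -k - 1
        else:
            k += 1
        delta = new_delta
    return delta[t:2 * t + 1], trace   # Lambda_i = delta_{t+i}(2t), i = 0..t


t = 2
received = [0] * N
received[2] = received[7] = 1          # two bit errors on the all-zero codeword
lam, trace = ribm(syndromes(received, t), t)
error_positions = [j for j in range(N) if poly_eval(lam, EXP[(N - j) % N]) == 0]
```

For this binary example, the recorded discrepancy is zero at every odd iteration, which is the regularity the DcRiBM design relies on.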
Section II presents the proposed DcRiBM algorithm and its architecture for high-speed, low-complexity BCH decoders. Section III presents implementation results and a performance comparison. Finally, Section IV concludes the paper.
II. PROPOSED DISCREPANCY COMPUTATIONLESS RIBM ALGORITHM AND ARCHITECTURE

A. Proposed Discrepancy-Computationless RiBM Algorithm
We propose the novel DcRiBM algorithm to remove the discrepancy computation control block and reduce hardware complexity. The generator polynomial of a BCH code is specified in terms of its roots over GF($2^m$). The generator polynomial $g(x)$ is the least common multiple of the minimal polynomials $m_i(x)$ of its roots $\alpha^i$:

$$g(x) = \mathrm{LCM}\{m_L(x), m_{L+1}(x), \ldots, m_{L+2t-1}(x)\}. \quad (1)$$

The odd-numbered syndromes can be computed by evaluating the received polynomial at the necessary odd-powered roots of the generator polynomial.
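As a small worked instance of the least-common-multiple construction of $g(x)$, the following Python sketch builds the generator polynomial of a narrow-sense ($L = 1$) binary BCH code over GF($2^4$) with primitive polynomial $x^4+x+1$. The field, the code parameters, and all function names are illustrative assumptions, far smaller than the GF($2^{11}$) and GF($2^{12}$) codes in this paper. Because the conjugates $\alpha^i, \alpha^{2i}, \alpha^{4i}, \ldots$ share one minimal polynomial, the LCM reduces to multiplying each distinct minimal polynomial once; this is also why every even-powered root is covered by an odd-powered one.

```python
# Generator polynomial sketch. Assumptions: GF(2^4), primitive polynomial x^4 + x + 1,
# narrow-sense code (L = 1). Polynomials are lists, lowest-degree coefficient first.
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
v = 1
for e in range(N):                    # EXP[e] = alpha^e, LOG is its inverse
    EXP[e], LOG[v] = v, e
    v <<= 1
    if v & (1 << M):
        v ^= PRIM
for e in range(N, 2 * N):
    EXP[e] = EXP[e - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p, q):
    """Multiply two polynomials with GF(2^4) coefficients."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def coset(i):
    """Cyclotomic coset {i, 2i, 4i, ...} mod N: exponents of the conjugates of alpha^i."""
    members, j = [], i % N
    while j not in members:
        members.append(j)
        j = (2 * j) % N
    return members


def minimal_poly(i):
    """m_i(x): product of (x + alpha^j) over all conjugates alpha^j of alpha^i."""
    poly = [1]
    for j in coset(i):
        poly = poly_mul(poly, [EXP[j], 1])
    return poly


def generator_poly(t, L=1):
    """LCM of m_L(x), ..., m_{L+2t-1}(x): each distinct minimal polynomial used once."""
    g, used = [1], set()
    for i in range(L, L + 2 * t):
        key = min(coset(i))
        if key not in used:
            used.add(key)
            g = poly_mul(g, minimal_poly(i))
    return g


g = generator_poly(t=2)   # double-error-correcting narrow-sense BCH code of length 15
```

For $t = 2$ the result is $g(x) = x^8 + x^7 + x^6 + x^4 + 1$, the generator of the (15, 7) BCH code.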
The necessary roots of the generator polynomial comprise at least 2t consecutive powers $\alpha^{L}, \alpha^{L+1}, \ldots, \alpha^{L+2t-1}$, where $\alpha$ is the primitive element and $L$ is an appropriate starting power. The syndromes $s_i$ of binary BCH codes possess the property

$$s_{2i} = s_i^{2}. \quad (2)$$
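The squaring property of binary BCH syndromes can be checked numerically and used the way the text describes: only odd-indexed syndromes are evaluated directly, and each even-indexed one is obtained by squaring. This Python sketch assumes GF($2^4$) with primitive polynomial $x^4+x+1$ and narrow-sense indexing ($L = 1$, so $s_i = r(\alpha^i)$); the function names are hypothetical.

```python
# Syndrome sketch. Assumptions: GF(2^4), primitive polynomial x^4 + x + 1, s_i = r(alpha^i).
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
v = 1
for e in range(N):
    EXP[e], LOG[v] = v, e
    v <<= 1
    if v & (1 << M):
        v ^= PRIM
for e in range(N, 2 * N):
    EXP[e] = EXP[e - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndrome(bits, i):
    """s_i = r(alpha^i) for a received word whose coefficients r_j are bits."""
    s = 0
    for j, bit in enumerate(bits):
        if bit:
            s ^= EXP[(i * j) % N]
    return s


def syndromes_by_squaring(bits, t):
    """s_1..s_2t: odd ones by direct evaluation, even ones as s_{2i} = s_i^2."""
    s = [0] * (2 * t + 1)              # s[0] is unused
    for i in range(1, 2 * t + 1, 2):
        s[i] = syndrome(bits, i)
    for i in range(2, 2 * t + 1, 2):   # ascending, so s[i // 2] is already known
        s[i] = gf_mul(s[i // 2], s[i // 2])
    return s[1:]


received = [0] * N
received[2] = received[7] = received[11] = 1   # three bit errors on the all-zero codeword
fast = syndromes_by_squaring(received, t=3)
direct = [syndrome(received, i) for i in range(1, 7)]
```

The property holds because squaring is additive in characteristic 2 and every $r_j \in \{0, 1\}$, so $r(\alpha^{2i}) = r(\alpha^i)^2$; half of the syndrome evaluations can therefore be replaced by squarers.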
Equation (3) shows how a regularly repeating control signal is generated. Because the discrepancy $\tilde{\delta}_0(r)$ is a multiple of the appropriate symbol value, every odd-numbered iteration regularly produces the discrepancy value $\tilde{\delta}_0(r) = 0$. The even-numbered syndromes can be computed by (3). By exploiting this property, the number of iterative operations in the BCH decoding process can be reduced by half.
The proposed DcRiBM algorithm is described by the following pseudo-code.
The DcRiBM Algorithm
Initialization:
B. Proposed Discrepancy-Computationless RiBM Architecture
The disadvantage of the conventional RiBM architecture described in [2] is that it uses many GF multipliers and GF adders, which occupy a large silicon area and result in high power consumption. Fig. 1 shows the novel DcRiBM architecture, which has no discrepancy computation control block and fewer PEs. In the RiBM architecture, the discrepancy computation control block determines whether $\tilde{\theta}_i(r)$ is updated; the DcRiBM architecture eliminates this block. Using a 1-bit counter, both the PE blocks and the $\tilde{\delta}_0(r)$ block are controlled consecutively. Also, the value '0' is stored into
Note that the discrepancy $\tilde{\delta}_0(r)$ generated at the fixed (zeroth) PE position is always '0' in odd-numbered iterations, whereas in even-numbered iterations it is always '1'. This regularly repeating control signal not only reduces the number of iterations but also eliminates the discrepancy computation control block of the RiBM architecture.
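To illustrate why a regularly toggling control signal can stand in for the discrepancy test, the following Python sketch runs the RiBM iteration of [2] with an optional 1-bit counter. It is our own hypothetical construction, not the authors' DcRiBM (whose update equations (4) are not reproduced here): in counter mode, odd iterations skip the $\tilde{\delta}_0(r)\,\tilde{\theta}_i(r)$ products and the control test, relying on $\tilde{\delta}_0(r) = 0$ for binary BCH syndromes. It assumes GF($2^4$) with primitive polynomial $x^4+x+1$ and narrow-sense syndromes; all names are hypothetical.

```python
# Counter-gated RiBM sketch. Assumptions: GF(2^4), primitive polynomial x^4 + x + 1,
# narrow-sense binary BCH syndromes S_i = r(alpha^(i+1)).
M, PRIM = 4, 0b10011
N = (1 << M) - 1
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
v = 1
for e in range(N):
    EXP[e], LOG[v] = v, e
    v <<= 1
    if v & (1 << M):
        v ^= PRIM
for e in range(N, 2 * N):
    EXP[e] = EXP[e - N]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(bits, t):
    return [
        sum_xor
        for sum_xor in (
            _eval_bits(bits, i) for i in range(1, 2 * t + 1)
        )
    ]


def _eval_bits(bits, i):
    """r(alpha^i): XOR of alpha^(i*j) over positions j holding a 1."""
    s = 0
    for j, bit in enumerate(bits):
        if bit:
            s ^= EXP[(i * j) % N]
    return s


def ribm_locator(S, t, counter_control=False):
    """Lambda from a (3t+1)-PE RiBM pass; optionally gate odd steps by a 1-bit counter."""
    delta = list(S) + [0] * t + [1]
    theta = list(delta)
    gamma, k = 1, 0
    odd = 0                                   # hypothetical 1-bit iteration counter
    for _ in range(2 * t):
        shifted = delta[1:] + [0]
        if counter_control and odd:
            # delta_0(r) = 0 is assumed here, so only the scaled shift remains and
            # the "no update" branch is taken without inspecting delta_0(r).
            delta = [gf_mul(gamma, x) for x in shifted]
            k += 1
        else:
            d0 = delta[0]
            new_delta = [gf_mul(gamma, shifted[i]) ^ gf_mul(d0, theta[i])
                         for i in range(3 * t + 1)]
            if d0 != 0 and k >= 0:
                theta, gamma, k = shifted, d0, -k - 1
            else:
                k += 1
            delta = new_delta
        odd ^= 1                              # the counter toggles every iteration
    return delta[t:2 * t + 1]


t = 3
results = []
for errors in [(), (5,), (0, 1), (2, 7), (3, 9, 14)]:
    bits = [1 if j in errors else 0 for j in range(N)]
    S = syndromes(bits, t)
    results.append(ribm_locator(S, t) == ribm_locator(S, t, counter_control=True))
```

Both modes return the same $\Lambda(x)$ for these binary error patterns. For a syndrome sequence without the squaring property (for example, that of a non-binary RS code) the odd discrepancies are not zero in general, so this shortcut is specific to binary BCH codes.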
Based on initial conditions that differ from those of the existing RiBM algorithm, the novel DcRiBM algorithm computes the next discrepancy $\tilde{\delta}_i(r+1)$ as in (4), where $\tilde{\delta}_0(r)$ is the discrepancy, which is fed into every PE at the same time. The upper equation in (4) has a critical path of $T_{mult}\cdot(T_{mult}+T_{add})$, so we reformulate it to reduce the critical path delay to $T_{mult}+T_{add}+2T_{mux}$. The proposed DcRiBM architecture has almost the same critical path delay as the conventional RiBM architecture.
Furthermore, the proposed DcRiBM architecture has a systolic array structure, as shown in Fig. 2, which depicts the PE of the DcRiBM architecture. Each PE consists of 2 GF multipliers, 1 GF adder, 2 latches, 1 D flip-flop, and 5 muxes. The conventional RiBM architecture for RS decoders must compute both the error evaluator polynomial and the error locator polynomial, whereas the proposed DcRiBM architecture for BCH decoders computes only the error locator polynomial. Because every error value in a binary BCH code is 1, errors can be detected and corrected using the error locator polynomial alone. The conventional RiBM architecture, with 3t+1 PEs, uses the first t PEs to obtain the error evaluator polynomial and PE$_t$ through PE$_{2t}$ to obtain the error locator polynomial. Since a BCH decoder does not need the error evaluator polynomial, the first t PEs are eliminated and their positions are taken over by PE$_t$ through PE$_{2t}$. After 2t clock cycles, the DcRiBM architecture outputs the coefficients of the error locator polynomial consecutively from PE$_0$ to PE$_t$.

Table 1 summarizes the hardware complexity and path delay of various KES architectures and compares them with the proposed DcRiBM architecture. The RiBM architecture requires 3t+1 GF adders, 6t+2 GF multipliers, 6t+2 latches, and 3t+1 muxes. In contrast, the proposed DcRiBM architecture, consisting of 2t+1 PEs, requires 2t+1 GF adders, 4t+2 GF multipliers, 4t+2 latches, 2t+1 D flip-flops, and 10t+5 muxes. Because it needs only 2t+1 PEs, the DcRiBM architecture requires fewer GF multipliers and GF adders than the RiBM architecture, and thus has a smaller area and higher performance. Its critical path of $T_{mult}+T_{add}+2T_{mux}$ is comparable to that of the RiBM architecture.
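The Table 1 component formulas can be evaluated for the codes used in this paper with a short Python sketch. It assumes $n - k = m\,t$ to infer $t$ (which gives $t = 10$ for BCH (2040, 1930) over GF($2^{11}$) and $t = 3$ for BCH (3860, 3824) over GF($2^{12}$)); the class and function names are hypothetical, and the RiBM D flip-flop count is left as `None` because the text does not list one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KesCost:
    """Component counts of a key-equation-solver (KES) array, per the formulas in the text."""
    gf_adders: int
    gf_multipliers: int
    latches: int
    d_flip_flops: Optional[int]   # None: no count given in the text for this architecture
    muxes: int


def bch_t(n: int, k: int, m: int) -> int:
    """Error-correcting capability, assuming n - k = m * t."""
    if (n - k) % m != 0:
        raise ValueError("n - k is not a multiple of m; t cannot be inferred this way")
    return (n - k) // m


def ribm_cost(t: int) -> KesCost:
    pes = 3 * t + 1               # each PE: 2 GF mult, 1 GF add, 2 latches, 1 mux
    return KesCost(pes, 2 * pes, 2 * pes, None, pes)


def dcribm_cost(t: int) -> KesCost:
    pes = 2 * t + 1               # each PE: 2 GF mult, 1 GF add, 2 latches, 1 DFF, 5 muxes
    return KesCost(pes, 2 * pes, 2 * pes, pes, 5 * pes)


t_inner = bch_t(2040, 1930, 11)   # BCH(2040, 1930) over GF(2^11)
t_outer = bch_t(3860, 3824, 12)   # BCH(3860, 3824) over GF(2^12)
old, new = ribm_cost(t_inner), dcribm_cost(t_inner)
mult_saving = 1 - new.gf_multipliers / old.gf_multipliers
```

For $t = 10$ the multiplier count drops from 62 to 42 (about 32% fewer multipliers), while the mux count rises from 31 to 105. These are component counts only; they do not model the synthesized gate area.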
III. IMPLEMENTATION AND COMPARISON
The proposed DcRiBM architecture for BCH decoders was designed in Verilog HDL and simulated to verify its functionality. It was synthesized using appropriate timing and area constraints. Both simulation and synthesis were carried out using Synopsys design tools and 0.18-μm CMOS technology optimized for a 1.8-V supply voltage. Table 2 shows the implementation results of the proposed DcRiBM architecture and the conventional RiBM architecture for BCH (2040, 1930) over GF($2^{11}$), which serves as the inner code of the concatenated BCH code. Table 3 shows the implementation results of both architectures for BCH (3860, 3824) over GF($2^{12}$), which serves as the outer code of the concatenated BCH code. The BCH (2040, 1930) decoder with the proposed DcRiBM architecture operates at a clock frequency of approximately 265 MHz with a throughput of approximately 2.9 Gb/s, and requires approximately 32% fewer gates and much simpler control logic than the conventional RiBM architecture.
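As a simple consistency check on the reported figures (plain arithmetic on the numbers above, not an additional measurement), the throughput per clock cycle is

$$\frac{2.9 \times 10^{9}\ \text{b/s}}{265 \times 10^{6}\ \text{cycles/s}} \approx 10.9\ \text{bits per clock cycle}.$$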
IV. CONCLUSION
This paper presents a novel DcRiBM algorithm and its architecture for high-speed, low-complexity BCH decoders. Because the discrepancy $\tilde{\delta}_0(r)$ repeats regularly across iterations, the discrepancy computation control block can be eliminated and the number of iterative operations in the BCH decoding process can be reduced. The DcRiBM algorithm therefore eliminates the discrepancy computation control block and reduces hardware complexity compared with the conventional RiBM architecture. The BCH (2040, 1930) decoder with the proposed architecture achieves approximately 2.9 Gb/s at a clock frequency of 265 MHz and requires approximately 32% fewer gates than the RiBM architecture. The DcRiBM architecture can also be used in the concatenated BCH code schemes commonly employed in submarine and terrestrial optical fiber systems to achieve higher correction capability and throughput.
