Abstract-Compared with traditional hard Bose-Chaudhuri-Hocquenghem (BCH) decoders, soft BCH decoders provide better error-correcting performance but at much higher hardware complexity. In this brief, an improved soft BCH decoding algorithm is presented that achieves both competitive hardware complexity and better error-correcting performance by dealing with the least reliable bits and compensating one extra error outside the least reliable set. For BCH (255, 239; 2) and (255, 231; 3) codes, the proposed soft BCH decoders achieve up to 0.75-dB coding gain with one extra error compensation and 5% less complexity than the traditional hard BCH decoders.
I. INTRODUCTION
Bose-Chaudhuri-Hocquenghem (BCH) [1] codes are popular in storage devices and communication systems, such as Flash memories, DMB-T, DVB-T2, and DVB-S2 systems. While operating under GF($2^m$), an (N, K; t) BCH code has an error-correcting capability t with a block length of N bits and an information length of K bits, where
In general, soft-decision maximum likelihood decoding (MLD) can provide around 3-dB coding gain for the same codes as compared with hard-decision decoding algorithms. However, the MLD algorithm requires extremely high computational complexity and is impractical for hardware implementation. Therefore, suboptimal soft decoding algorithms for error-control codes have become popular and have attracted considerable research interest in BCH decoding [2]-[8]. Forney [2] developed the generalized-minimum-distance (GMD) algorithm, which uses soft information to generate test sequences for several hard BCH decoders, forms a list of candidate code words, and chooses the most likely one from the candidate list. With a similar concept, the Chase, modified GMD, and Chase-GMD algorithms [3]-[5] are also widely used to efficiently generate the candidate list and have been applied in many applications. In addition, a suboptimum maximum a posteriori (MAP) algorithm [6] with a Hamming SISO decoder and the adaptive belief propagation algorithm were proposed for soft BCH decoding in 2005 and 2008, respectively [7], [8].
The hardware complexity and the storage requirement of a soft BCH decoder are generally much higher than those of a hard BCH decoder [2]-[8]. On the other hand, an error-magnitude-solver-based soft BCH decoding method that collects and deals with the least reliable bits instead of the entire code word was developed in 1997 to achieve lower complexity decoders [9]. A total of 2t least reliable bits are chosen, and their corresponding error magnitudes are calculated to determine the error locations among these 2t positions. Because the possible error locations are limited, this method can provide lower complexity than other soft decoders, and even lower complexity than the traditional hard decoder. However, this kind of soft decoder corrects the errors only when all actual error locations are collected in the limited possible locations. The decoder is unable to correct any error even if only one error occurs outside these locations. The hardware complexity is improved, but the error-correcting performance depends heavily on the reliability of the input signals. As a result, the tradeoff between error-correcting performance and hardware complexity is a bottleneck for soft BCH decoders. In this brief, the proposed soft decoding algorithm has a concept similar to [9], but adds one extra error compensation to enhance the correcting performance for high code-rate BCH decoders. Consequently, in contrast to the conventional hard BCH decoders, the proposed soft BCH decoders achieve better performance by compensating one extra error outside the least reliable set and provide comparable hardware complexity by dealing with the least reliable bits. Moreover, the conventional BCH decoder contains three major blocks: syndrome calculator, key equation solver, and Chien search. For long-block-length BCH decoders, the decoding latency is dominated by the syndrome calculator and the Chien search.
Unlike the conventional algorithms that use a parallel Chien search to enhance throughput, an error-locator evaluator is proposed to eliminate the Chien search procedure for higher throughput [10].
This brief is organized as follows. Section II describes the proposed soft BCH decoding algorithm. The proposed architecture and comparison between hard and soft decoders are presented in Section III. Based on the proposed method, Section IV demonstrates the implementation results of the soft BCH decoders. Finally, we conclude this brief in Section V.
II. PROPOSED SOFT BCH DECODING ALGORITHM
The proposed soft BCH decoder includes three major blocks: syndrome calculator, error-locator evaluator, and compensation error magnitude solver (CEMS) [11] . As compared with [9] , the proposed soft BCH decoder enhances the error-correcting performance by compensating one extra error outside the least reliable locations while maintaining the low-complexity property.
A. Proposed Decoding Scheme
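In the proposed decoding scheme, the reliability values are fed into the soft decoder, and the received polynomial $R(x)$ is generated by inverting their sign bits under BPSK modulation. The syndrome polynomial $S(x) = S_1 + S_2 x + \cdots + S_{2t} x^{2t-1}$ has the coefficients

$$S_j = R(\alpha^j) = \sum_{i=1}^{v} (\alpha^{e_i})^j = \sum_{i=1}^{v} \beta_{e_i}^{j}, \quad j = 1, 2, \ldots, 2t \tag{1}$$

where $\alpha$ is the primitive element over GF($2^m$) and $v$ is the number of actual errors. Notice that $e_i$ is the $i$th actual error location and $\beta_{e_i} = \alpha^{e_i}$ indicates the corresponding error locator. With the soft inputs, the decoder chooses the 2t least reliable inputs and evaluates their corresponding error locators to form the error-locator set $B = [\beta_{l_1}, \beta_{l_2}, \ldots, \beta_{l_{2t}}]$. The location set $L = [l_1, l_2, \ldots, l_{2t}]$ can be obtained in accordance with $B$, because $\beta_{l_i}$ is the error locator of the $l_i$th location and $\beta_{l_i} = \alpha^{l_i}$. In BCH codes, if the $l_i$th location is an exact error location, the error magnitude $\gamma_i$ is equal to 1; otherwise, $\gamma_i$ is equal to 0. The error-magnitude set $\Gamma = [\gamma_1, \gamma_2, \ldots, \gamma_{2t}]^T$ is defined as the error magnitude in accordance with $L$, and is valid only if it is a binary vector. The relation between $B$, $\Gamma$, and the syndrome vector $S = [S_1, S_2, \ldots, S_{2t}]^T$ is

$$\begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_{2t} \end{bmatrix} = \begin{bmatrix} \beta_{l_1} & \beta_{l_2} & \cdots & \beta_{l_{2t}} \\ \beta_{l_1}^{2} & \beta_{l_2}^{2} & \cdots & \beta_{l_{2t}}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{l_1}^{2t} & \beta_{l_2}^{2t} & \cdots & \beta_{l_{2t}}^{2t} \end{bmatrix} \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_{2t} \end{bmatrix} \tag{2}$$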
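where the $2t \times 2t$ matrix in (2) is defined as the error-locator matrix $B$.
To represent the difference between $S$ and the product of $B$ and $\Gamma$, a discrepancy vector $\Delta = [\delta_1, \delta_2, \ldots, \delta_{2t}]^T$ is defined as

$$\Delta = S + B \times \Gamma \tag{3}$$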
Notice that the operations in both (2) and (3) are under GF($2^m$). It is evident that if all errors are located within the location set $L$, a valid $\Gamma$ can be determined that makes $\Delta$ a zero vector. Otherwise, $\Gamma$ is calculated as a nonbinary vector and this decoding approach fails to correct errors. If any error occurs outside $L$, the decoder is unable to correct any error, resulting in lower correcting performance. However, the error-correcting ability can be enhanced by correcting not only errors located inside $L$ but also errors outside $L$. Under the analysis of $\Delta$, an error located at $l_{\text{miss}}$ and outside $L$ turns $\Delta$ into the geometrical progression $\Delta = [\beta_{l_{\text{miss}}}, \beta_{l_{\text{miss}}}^{2}, \ldots, \beta_{l_{\text{miss}}}^{2t}]^T$, where $\beta_{l_{\text{miss}}} = \alpha^{l_{\text{miss}}}$. Notice that a geometrical progression is a sequence of numbers in which each term is obtained by multiplying the previous one by a common ratio. To improve the error-correcting ability, we can additionally check whether $\Delta$ has the property of a geometrical progression and compensate by finding the missing location $l_{\text{miss}}$ from $\Delta$. Accordingly, the proposed compensation soft BCH decoder can correct up to 2t + 1 errors. Except for $l_{\text{miss}}$, the other error locations are the $l_i$ whose corresponding $\gamma_i$ equals 1. The estimated code word polynomial $\hat{C}(x)$ can be obtained by inverting the values at these error locations in the received polynomial $R(x)$. Notice that the proposed soft decoding algorithm can be applied to RS codes as well, but the computation of $\Gamma$ becomes much more complex because of the nonbinary characteristic.
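The geometrical-progression property can be checked with a short worked derivation. As an illustration only, assume that the actual errors are the locations in $L$ whose $\gamma_i$ equals 1 plus exactly one location $l_{\text{miss}}$ outside $L$, and write $\delta_j$ for the $j$th entry of the discrepancy vector, that is, the $j$th syndrome plus the $j$th row of the error-locator matrix applied to the error magnitudes:

$$S_j = \sum_{i=1}^{2t} \gamma_i \beta_{l_i}^{j} + \beta_{l_{\text{miss}}}^{j}, \qquad \delta_j = S_j + \sum_{i=1}^{2t} \gamma_i \beta_{l_i}^{j} = \beta_{l_{\text{miss}}}^{j}, \qquad j = 1, 2, \ldots, 2t$$

Because addition over GF($2^m$) is its own inverse, the contributions from inside $L$ cancel. What remains satisfies $\delta_{j+1} = \beta_{l_{\text{miss}}} \, \delta_j$, so the entries form a geometrical progression with common ratio $\beta_{l_{\text{miss}}}$, and the missing location follows from $\delta_1 = \alpha^{l_{\text{miss}}}$.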
B. CEMS Algorithm
The CEMS algorithm is applied to calculate $\Gamma$ and $\Delta$ according to (2) and (3). The Gauss elimination method is the most intuitive way to solve (2); however, it may provide an invalid (nonbinary) error magnitude $\gamma_i$, and its computational complexity is $O(n^3)$. In BCH codes, a valid error magnitude in $\Gamma$ is a binary value. Accordingly, solving (2) and (3) 
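The modified discrepancy vector from (4) is

$$\Delta_{\text{odd}} = [\delta_1, \delta_3, \ldots, \delta_{2t-1}]^T = B_{\text{odd}} \times \Gamma + S_{\text{odd}}$$

where $B_{\text{odd}}$ and $S_{\text{odd}}$ retain only the odd-indexed rows of $B$ and $S$.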
Notice that keeping only t rows in $B_{\text{odd}}$ and $S_{\text{odd}}$ means that calculating $\Delta_{\text{odd}}$ takes only half the computations of calculating $\Delta$, leading to a significant computation reduction. The following steps illustrate the details of the proposed algorithm for CEMS.
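A brief justification for dropping the even rows can be written out as follows. This derivation relies on the standard conjugacy property of binary BCH syndromes, $S_{2j} = S_j^2$, and on squaring being additive over GF($2^m$); it is offered as a sketch and is not quoted from the text. For a binary error-magnitude vector ($\gamma_i^2 = \gamma_i$), with $\delta_j$ the $j$th discrepancy entry:

$$\delta_{2j} = S_{2j} + \sum_{i=1}^{2t} \gamma_i \beta_{l_i}^{2j} = S_j^{2} + \Big( \sum_{i=1}^{2t} \gamma_i \beta_{l_i}^{j} \Big)^{2} = \Big( S_j + \sum_{i=1}^{2t} \gamma_i \beta_{l_i}^{j} \Big)^{2} = \delta_j^{2}$$

Every even-indexed entry is therefore determined by an odd-indexed one, so testing the odd-indexed entries is sufficient whenever the candidate error magnitudes are binary.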
A heuristic search over all binary combinations is completed by iteratively counting the $\Gamma$ value from 0 to $2^{2t} - 1$. At each iteration, the solver verifies whether or not $\Delta_{\text{odd}}$ becomes a geometrical progression. Once the geometrical progression check passes at a certain $\Gamma$ value, the corresponding error locations in $L$ and $l_{\text{miss}}$ can be found with $\Gamma$ and $\Delta_{\text{odd}}$.
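The search described above can be sketched in software. The following Python is a minimal behavioral model, not the authors' implementation: the primitive polynomial `0x11D`, the BPSK mapping (negative soft value means bit 1), and all function names (`select_lrps`, `odd_syndromes`, `cems_search`) are assumptions made for illustration. It takes the first error-magnitude candidate that passes the check $\delta_i \times \delta_1^2 = \delta_{i+2}$.

```python
from itertools import product

M = 8                 # GF(2^8), matching the length-255 codes
PRIM_POLY = 0x11D     # assumed primitive polynomial x^8+x^4+x^3+x^2+1
N = (1 << M) - 1      # block length 255

# Antilog/log tables: EXP[k] = alpha^k and LOG[alpha^k] = k.
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
_x = 1
for _k in range(N):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x >> M:
        _x ^= PRIM_POLY
for _k in range(N, 2 * N):
    EXP[_k] = EXP[_k - N]


def gf_mul(a, b):
    """Multiply two field elements through the log tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def select_lrps(soft, count):
    """Hard-decide the soft inputs; return bits and the `count` least reliable locations."""
    bits = [1 if s < 0 else 0 for s in soft]          # assumed mapping: negative -> 1
    order = sorted(range(len(soft)), key=lambda i: abs(soft[i]))
    return bits, order[:count]


def odd_syndromes(bits, t):
    """Return [S_1, S_3, ..., S_{2t-1}] with S_j = R(alpha^j)."""
    synd = []
    for j in range(1, 2 * t, 2):
        s = 0
        for loc, bit in enumerate(bits):
            if bit:
                s ^= EXP[(loc * j) % N]
        synd.append(s)
    return synd


def cems_search(s_odd, lrp, t):
    """Try every binary Gamma; return (gamma, l_miss) for the first passing check, else None."""
    # Row r of b_odd holds beta_{l_i}^(2r+1) for each least reliable location l_i.
    b_odd = [[EXP[(loc * j) % N] for loc in lrp] for j in range(1, 2 * t, 2)]
    for gamma in product((0, 1), repeat=len(lrp)):
        delta = list(s_odd)                            # Delta_odd = B_odd x Gamma + S_odd
        for r in range(t):
            for i, g in enumerate(gamma):
                if g:
                    delta[r] ^= b_odd[r][i]
        d1_sq = gf_mul(delta[0], delta[0])
        if all(gf_mul(delta[r], d1_sq) == delta[r + 1] for r in range(t - 1)):
            # An all-zero Delta_odd means no error lies outside the chosen set.
            return gamma, (None if delta[0] == 0 else LOG[delta[0]])
    return None
```

For t = 2 the loop visits at most 16 candidates. Any returned candidate, once its flagged locations and `l_miss` are inverted in the hard-decision word, yields all-zero odd syndromes; whether that candidate is the transmitted code word still depends on the error pattern, as discussed in the text.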
C. Simulation Results
For the purpose of comparison with existing methods, our proposed designs are compared with the traditional hard decision, GMD [2], the 2-b flipping Chase [3], the modified GMD [4], and the two-iteration suboptimum MAP [7] decoding algorithms. In all cases, BPSK modulation and the AWGN channel are used, and all performances are compared at a BER of $10^{-5}$.
The simulation results of 2-error-correcting and 3-error-correcting BCH codes with a 255-b code word length are presented in Fig. 1(a) and (b), respectively. The proposed soft decoder can correct at least one random error and as many as 2t + 1 errors if there are 2t errors located at the least reliable positions (LRPs). The GMD and modified GMD decoders, which can correct at least t random errors, can correct at most 2t and 2t + 1 errors, respectively, if all of them are located at the 2t and 2t + 1 LRPs. The 2-b flipping Chase decoder can correct at least t random errors and as many as t + 2 errors if there are two errors located at the two LRPs. The proposed soft BCH (255, 239; 2) decoder outperforms the hard BCH (255, 239; 2) decoder by 0.75 dB. In addition, our proposed decoder is comparable with the Chase decoder while providing an improvement of 0.13-0.35 dB over the other soft decoders. For BCH (255, 231; 3) codes, a coding gain of 0.4 dB can be achieved by our soft BCH decoder when compared with the hard decoder. In contrast to the GMD, the modified GMD, and the Chase decoders, our soft decoder has a 0.03-0.22 dB performance loss. Notice that two errors outside the limited possible locations are sufficient to make the proposed algorithm fail, whereas such error patterns will be corrected by other soft decoders. Therefore, the proposed soft decoder cannot offer better performance than other soft decoders in the high-SNR region (BER < $10^{-8}$). However, these soft decoders demand several times the hardware complexity of our proposed soft decoder. Fig. 2 demonstrates the performance of the hard and soft BCH decoders with code word lengths of 63, 255, and 1023 b, respectively. Our proposed soft decoders can outperform hard decoders for error-correcting capability t = 2 to 4. Notice that the achieved coding gain decreases as t increases.
III. VLSI ARCHITECTURE FOR PROPOSED SOFT BCH DECODERS
As mentioned in Section II, the proposed soft decoder includes three major blocks. In [11], we discussed an efficient architecture for each block. However, in contrast to the CEMS presented in [11], the CEMS proposed in this brief provides a new sharing architecture, leading to a hardware reduction of (t − 1) multipliers. The architecture comparison between the hard and soft BCH decoders is presented at the end of this section.
A. Compensation Error Magnitude Solver
Based on algorithm A, Fig. 3 illustrates the CEMS architecture that evaluates $\Delta_{\text{odd}} = B_{\text{odd}} \times \Gamma + S_{\text{odd}}$ with $S_{\text{odd}}$ and $B$. The solid lines are the data flow of the $B_{\text{odd}}$ matrix construction procedure, while the dashed lines are the data flow of the geometrical progression check procedure. There are $2t^2$ registers for storing all entries in the $B_{\text{odd}}$ matrix. In the $i$th column, the initial value of the first-row register is set as $\beta_{l_i}$ so that the output of the squarer will always be $\beta_{l_i}^{2}$. The $t$th-row register is also initially set as $\beta_{l_i}$ and iteratively multiplied by $\beta_{l_i}^{2}$ for operating cycles j = 1∼(t − 1). Consequently, the register values of the $i$th column form a geometric progression with the common ratio $\beta_{l_i}^{2}$ after t − 1 cycles. The $B_{\text{odd}}$ matrix is calculated with a total of only 2t multipliers and 2t squarers in t − 1 cycles. These registers hold their values during the matrix multiplication procedure $\Delta_{\text{odd}} = B_{\text{odd}} \times \Gamma + S_{\text{odd}}$.
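The column construction can be modeled in a few lines of Python. This is a behavioral sketch only: it reproduces the sequence of values (one squaring, then t − 1 multiplications by the squarer output) but not the register-level data flow of Fig. 3. The primitive polynomial `0x11D` and the name `build_b_odd_column` are assumptions for illustration.

```python
M = 8
PRIM_POLY = 0x11D     # assumed primitive polynomial for GF(2^8)
N = (1 << M) - 1

# Antilog/log tables: EXP[k] = alpha^k and LOG[alpha^k] = k.
EXP, LOG = [0] * (2 * N), [0] * (N + 1)
_x = 1
for _k in range(N):
    EXP[_k], LOG[_x] = _x, _k
    _x <<= 1
    if _x >> M:
        _x ^= PRIM_POLY
for _k in range(N, 2 * N):
    EXP[_k] = EXP[_k - N]


def gf_mul(a, b):
    """Multiply two field elements through the log tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def build_b_odd_column(beta, t):
    """Return [beta, beta^3, ..., beta^(2t-1)] for one least reliable location."""
    ratio = gf_mul(beta, beta)            # squarer output beta^2, fixed for the column
    column = [beta]
    for _ in range(t - 1):                # operating cycles j = 1 .. t-1
        column.append(gf_mul(column[-1], ratio))
    return column
```

Each column needs one squarer and one multiplier, which is consistent with the 2t multipliers and 2t squarers quoted for the 2t columns.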
Both the matrix multiplication and the geometrical progression check are evaluated simultaneously in the following $2^{2t}$ cycles. A heuristic search over all binary combinations is completed by iteratively counting the $\Gamma$ value from 0 to $2^{2t} - 1$. At each iteration, the powers of $\beta_{l_i}$ stored in the registers are combined with $\gamma_i$ to generate the modified discrepancy vector $\Delta_{\text{odd}}$. Then, the solver verifies whether $\Delta_{\text{odd}}$ is a geometrical progression or not. In the geometrical progression check procedure, $\delta_1$ passes through a squarer to generate $\delta_1^2$, which is multiplied with each $\delta_i$ value and compared with $\delta_{i+2}$. If $\Delta_{\text{odd}}$ is a geometrical progression, then $\delta_i \times \delta_1^2 = \delta_{i+2}$ for i = 1, 3, . . . , 2t − 3. The CEMS applies t − 1 multipliers and one squarer to check this relation, and employs an LUT to obtain $l_{\text{miss}}$ according to $\delta_1 = \alpha^{l_{\text{miss}}}$. However, because the geometrical progression check is processed after the $B_{\text{odd}}$ matrix construction, the squarer and multipliers can be shared, leading to a total of only 2t multipliers and 2t squarers in the proposed CEMS architecture. The critical path of the matrix multiplication procedure is ($T_{\text{and}}$ + 2t × $T_{\text{xor}}$) for generating $\Delta_{\text{odd}}$, while that of the geometrical progression check procedure is ($T_{\text{xor}}$ + 2$T_{\text{mux}}$ + $T_{\text{sq}}$ + $T_{\text{mult}}$) for using $\delta_1$ to verify the relation $\delta_i \times \delta_1^2 = \delta_{i+2}$ with i = 1, 3, . . . , 2t − 3. Notice that $T_{\text{and}}$, $T_{\text{xor}}$, $T_{\text{mux}}$, $T_{\text{sq}}$, and $T_{\text{mult}}$ represent the critical paths of the AND gate, XOR gate, multiplexer, squarer, and multiplier, respectively. Consequently, the critical path of CEMS is ($T_{\text{and}}$ + (2t
B. Architecture Comparison
The architectures of the hard and soft BCH decoders are compared in Table I. The proposed soft BCH decoder is designed with the CEMS approach, whereas the hard BCH decoders are designed with the inversionless Berlekamp-Massey (iBM) algorithm [12] and the simplified iBM (SiBM) algorithm [13], respectively. A total of 2t multipliers, 2t squarers, and one LUT are utilized in the soft BCH decoder. The registers in the first row of the $B_{\text{odd}}$ matrix in the CEMS are used to store the error-locator set $B$, which is also stored in the registers of the error-locator evaluator. Therefore, these registers can be shared, resulting in a total of $2t^2 - 2t$ registers used in the CEMS. In addition, the syndrome calculator and the error-locator evaluator simultaneously take N clock cycles in the decoding process, and the CEMS takes $2^{2t} + t - 1$ clock cycles.
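The cycle counts quoted above can be tabulated with a small helper. The sketch below assumes that the N-cycle front end (syndrome calculator and error-locator evaluator running in parallel) and the CEMS run back to back without overlap; the function name `soft_decoder_cycles` is hypothetical. The hard-decoder latencies from Table I are not reproduced here.

```python
def soft_decoder_cycles(n, t):
    """Cycles for one code word: N for the front end plus 2^(2t) + t - 1 for the CEMS."""
    front_end = n                          # syndrome calculator and error-locator evaluator
    cems = (1 << (2 * t)) + (t - 1)        # exhaustive search plus B_odd construction
    return front_end + cems


for t in (2, 3, 4):
    print(t, soft_decoder_cycles(255, t))
```

Under this assumption the count is 272 cycles for t = 2, 321 for t = 3, and 514 for t = 4, which shows how quickly the $2^{2t}$ search term grows with t.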
In finite field operations, a multiplier is more complex than a register and a multiplexer. Due to fewer multipliers, the proposed soft BCH decoder, with more registers and multiplexers as well as an additional LUT, has similar hardware complexity when compared with the hard BCH decoders with the iBM and SiBM algorithms. According to the synthesis results in CMOS 90-nm technology, the complexity ratio over GF($2^8$) among each 8-b 2-to-1 multiplexer, squarer, constant multiplier, 8-b register, multiplier, and LUT is 1:1.5:1.5:2.5:12:27. The normalized complexity of the soft BCH decoder is around ($5t^2 + 49t + 26.5$) 8-b 2-to-1 multiplexers, whereas that of the hard BCH decoder with the iBM/SiBM algorithms is (54t + 42)/(72t + 5) 8-b 2-to-1 multiplexers, respectively. For high-code-rate BCH codes, the error-correcting capability t is small, implying that the proposed soft decoder can provide hardware complexity similar to that of hard decoders even though the complexity of the hard and soft decoders is linear and quadratic in t, respectively.
Based on Table I, the effect of the error-correcting capability t on the hardware complexity and latency can be illustrated. Our soft decoder provides competitive hardware complexity when t equals 2 or 3. For example, the normalized complexity of the soft BCH (255, 239; 2) decoder is around 144.5 8-b 2-to-1 multiplexers, whereas that of the hard BCH (255, 239; 2) decoder with the iBM/SiBM algorithms is 150/149 8-b 2-to-1 multiplexers, respectively. Moreover, the proposed soft decoder searches for error locations in the error-locator evaluator procedure, leading to a latency of less than 62.6% of that of the hard BCH decoders when t is smaller than 4.
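The normalized-complexity expressions can be evaluated directly to reproduce the figures quoted for t = 2. The sketch below only encodes the three formulas given in the text, in units of one 8-b 2-to-1 multiplexer; the function names are hypothetical.

```python
def soft_complexity(t):
    """Proposed soft decoder: 5t^2 + 49t + 26.5."""
    return 5 * t * t + 49 * t + 26.5


def ibm_complexity(t):
    """Hard decoder with the iBM algorithm: 54t + 42."""
    return 54 * t + 42


def sibm_complexity(t):
    """Hard decoder with the SiBM algorithm: 72t + 5."""
    return 72 * t + 5


for t in (2, 3):
    print(t, soft_complexity(t), ibm_complexity(t), sibm_complexity(t))
```

For t = 2 this gives 144.5, 150, and 149, matching the values above. For t = 3 the same formulas give 218.5, 204, and 221, so the quadratic term begins to show even at this small t.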
IV. IMPLEMENTATION RESULTS
In Table II , the BCH decoders with hard and soft-decision methods are implemented for BCH (255, 239; 2) and BCH (255, 231; 3) codes. The hard BCH decoders solve key equation with iBM and SiBM algorithms, respectively, as well as evaluate error locations with Chien search while the soft BCH decoder is designed with CEMS.
The implementation results reveal that the proposed soft BCH (255, 239; 2) and (255, 231; 3) decoders reach gate counts of 4.2 K and 6.7 K at operating frequencies of 400 and 360 MHz, respectively, in standard CMOS 90-nm technology, which is similar to the results of the hard BCH decoders. Although the hard BCH decoders with SiBM can operate at 500 MHz, our proposed soft decoders provide better throughput because of lower latency. Compared with the traditional hard BCH decoders, the proposed soft BCH decoders, which compute error locations without a Chien search, achieve a 1.6-1.9 times throughput enhancement.
V. CONCLUSION
This brief presented improved soft decoders with one extra error compensation. Compared with the conventional hard BCH decoder, our proposed soft BCH decoders not only achieve better error-correcting performance but also provide competitive hardware complexity. The decoders with soft information reduce hardware complexity by focusing on the least reliable bits. Meanwhile, the error-correcting ability is improved with one extra error compensation. Experimental results show that the proposed soft BCH (255, 239; 2) and BCH (255, 231; 3) decoders obtain 0.75- and 0.4-dB coding gain, respectively, over the corresponding hard BCH decoders at a BER of $10^{-5}$. According to postlayout simulation in 90-nm CMOS technology, the proposed soft decoders achieve up to 1.9 times throughput enhancement and 5% gate count reduction as compared with the traditional hard BCH decoders.
