Abstract-This paper presents a low-complexity non-iterative soft-decision Bose-Chaudhuri-Hocquenghem (SD-BCH) decoder architecture and design technique for wireless body area networks (WBANs). An SD-BCH decoder with test syndrome computation, a syndrome calculator, Chien search and metric check, and error location decision is proposed. The proposed SD-BCH decoder uses test syndromes instead of a test pattern generator and does not require an iteration process. The proposed SD-BCH decoder provides a 0.75 to 1 dB coding gain compared to a hard-decision BCH (HD-BCH) decoder, and nearly the same coding gain as a conventional SD-BCH decoder. The proposed SD-BCH (63, 51) decoder was designed and implemented using 90-nm CMOS standard cell technology. Synthesis results show that the proposed non-iterative SD-BCH decoder using a serial structure can lead to a 75% reduction in hardware complexity and a clock speed 3.8 times faster than a conventional SD-BCH decoder.
I. INTRODUCTION
Recently, wireless body area network (WBAN) technology has drawn wide attention owing to the demand for short-range wireless communication and the advent of energy-efficient VLSI technology. The WBAN [1] is a special purpose sensor network designed to operate autonomously to connect various medical sensors and appliances, located inside and outside of a human body. It consists of small, intelligent devices attached on or implanted in the body, which are capable of establishing a wireless communication link.
Bose-Chaudhuri-Hocquenghem (BCH) codes are important multiple-error-correcting cyclic codes that are widely used in communications and storage systems [2]-[4]. In the WBAN [1], the BCH (63, 51) code and its shortened (31, 19) code are adopted to enhance transmission reliability under different channel conditions. Either hard-decision BCH (HD-BCH) decoders or soft-decision BCH (SD-BCH) decoders can be adopted at the receiver. In general, an SD-BCH decoder achieves better error performance than an HD-BCH decoder, and the improvement in bit error rate (BER) performance of the SD-BCH decoder can translate into power savings at the transmitter, given the same data link requirements.
There are several soft-decision decoding methods for BCH codes. Maximum likelihood decoding (MLD) [5], generalized minimum distance (GMD) [6], sliding encoding-window (SEW) [7], and Chase algorithms [8] were developed to produce a list of candidate codewords. Among these soft-decoding methods, a Chase algorithm can correct errors using a hard-decision kernel. Generally, an SD-BCH decoder with a Chase algorithm has an iteration process and also requires a test pattern generator.
Yang et al. [9] found that an SD-BCH decoder for energy-constrained WBANs provided a 1 dB coding gain compared to an HD-BCH decoder. In order to reduce the energy dissipation and area, Yang et al. [9] presented early termination (ET), probabilistic sorting, and a pass-transistor logic-based Chien search. The early termination scheme was used to reduce the number of unnecessary test patterns. The probabilistic sorting scheme was used to reduce the architectural complexity of the sorting circuit. For the hard-decision kernel, Yang et al. used a Peterson algorithm, which uses the determinant value.
In this paper, we propose a non-iterative SD-BCH decoding algorithm and an efficient decoder architecture. Specifically, the SD-BCH decoder has test syndrome computation (TSC), which eliminates the test pattern generator, a hard-decision kernel based on a modified step-by-step (m-SBS) algorithm [4], and error location decision (ELD). Although an SD-BCH decoder generally uses an iterative method, the proposed SD-BCH decoder does not require iteration. Moreover, in order to reduce hardware complexity, optimized test syndrome computation, a syndrome factor calculator, and error location decision are also presented.
The rest of this paper is organized as follows. In Section II, the SD-BCH decoding algorithm is briefly described. Section III describes the proposed non-iterative SD-BCH decoding algorithm and provides an analysis of a BER simulation. Section IV presents the proposed SD-BCH decoder architecture and design techniques. In Section V, the results and a performance comparison are presented. Finally, a conclusion is provided in Section VI.
II. OVERVIEW OF SOFT-DECISION BCH DECODING ALGORITHM
In this section, we introduce the basics of the SD-BCH (n, k, t) decoding algorithm, where n and k denote the codeword length and the information length, respectively, and t denotes the error-correcting capability. The Type-II Chase-based SD-BCH decoding algorithm has been discussed in [8, 9]. The Chase-II algorithm is a sub-optimum soft-decision algorithm that uses error-correction-only hard-decision decoding (HDD) as the kernel.
The procedure for the Chase-II algorithm is described as follows:
1) Find the locations of the p least reliable bits (LRBs), where p = ⌊d_min/2⌋ and d_min is the minimum Hamming distance of the code.
2) Generate test patterns by considering all combinations of the LRBs.
3) Decode each test pattern by using the HDD kernel. If this test pattern is decoded successfully using the HDD kernel, the decoded word is regarded as a candidate codeword.
4) Evaluate the Euclidean distance for each candidate codeword in the list and select the one with the smallest Euclidean distance as the best decision codeword.
In fact, it is unnecessary for the HDD kernel in the Chase algorithm to process all test patterns since redundant test patterns can be eliminated to save energy.
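The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the decoder of [8] or [9]: a Hamming (7, 4) syndrome decoder stands in for the BCH HDD kernel (so d_min = 3 and p = 1), bit 0 is mapped to +1 and bit 1 to -1, and the names `hamming74_hdd` and `chase2` are hypothetical.

```python
from itertools import product


def hamming74_hdd(bits):
    """Stand-in HDD kernel (assumption): Hamming (7,4) syndrome decoder.

    Column i of the parity-check matrix is the binary value i + 1, so a
    single-bit error at position i yields the syndrome value i + 1.
    """
    syndrome = 0
    for i, b in enumerate(bits):
        if b:
            syndrome ^= i + 1
    out = list(bits)
    if syndrome:
        out[syndrome - 1] ^= 1  # correct the single indicated position
    return out


def chase2(soft, p, hdd):
    """Chase-II decoding of soft values `soft` with p least reliable bits."""
    n = len(soft)
    hard = [1 if v < 0 else 0 for v in soft]
    # Step 1: locate the p least reliable bits (smallest magnitudes).
    lrbs = sorted(range(n), key=lambda i: abs(soft[i]))[:p]
    best, best_dist = None, float("inf")
    # Step 2: all 2^p combinations of flips on the LRBs.
    for flips in product((0, 1), repeat=p):
        test = list(hard)
        for pos, f in zip(lrbs, flips):
            test[pos] ^= f
        # Step 3: decode the test pattern with the HDD kernel.
        cand = hdd(test)
        if cand is None:  # kernel reported a decoding failure
            continue
        # Step 4: squared Euclidean distance to the BPSK image of the candidate.
        dist = sum((v - (1 - 2 * c)) ** 2 for v, c in zip(soft, cand))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


# All-zero codeword sent; positions 1 and 3 are received with the wrong sign.
received = [0.9, -0.2, 1.1, -0.1, 0.8, 1.0, 0.7]
decoded = chase2(received, p=1, hdd=hamming74_hdd)
```

In this example the hard-decision word contains two errors, which the single-error-correcting kernel alone miscorrects, while the test pattern that flips the least reliable bit leads to the transmitted codeword.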
In [9, 10], the authors proposed a test pattern generator to correct more errors than an HD-BCH decoder. The idea is to add the LRB patterns to the received word, which is then passed to a hard-decision unit. The procedure for conventional SD-BCH decoding using a test pattern generator is described as follows:
Syndromes of test pattern 2:
Syndromes of test pattern 3:
Syndromes of test pattern 4:
where 
III. PROPOSED NON-ITERATIVE SOFT-DECISION BCH DECODING ALGORITHM
Suppose an HD-BCH (n, k, t) code has an n-bit codeword, a k-bit message, and t correctable bits using algebraic decoders, operates over a Galois field GF(2^m), and that α is a primitive root of the primitive polynomial f(x). In contrast, an SD-BCH code based on a Chase algorithm [8] can correct up to t + p errors by determining the positions of the p least reliable bits (LRBs), where p ≤ d_min/2. This requires a test pattern generator and an error-correction-only hard-decision decoder (HDD) as the kernel. Chase algorithm-based soft-decision decoders (SDDs) evaluate the soft-decision metric using the HDD kernel. However, in the case of the BCH (63, 51, 2) code and p = 2, we only need the test syndromes α^t0, α^t1 without a test pattern generator.
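The reason test syndromes can replace a test pattern generator is that syndromes are linear: flipping bit j of a word adds α^j to S_1 and α^(3j) to S_3. The following sketch checks this numerically. It assumes f(x) = x^6 + x + 1 for GF(2^6) (the text does not state f(x)), and the ordering of the four candidates (hard decision, flip t0, flip t1, flip both) is also an assumption; `build_gf64`, `syndromes`, and `candidate_syndromes` are hypothetical names.

```python
def build_gf64():
    """Powers of alpha in GF(2^6), assuming f(x) = x^6 + x + 1."""
    exp = []
    x = 1
    for _ in range(63):
        exp.append(x)
        x <<= 1
        if x & 0x40:      # degree reached 6
            x ^= 0x43     # reduce modulo x^6 + x + 1
    return exp


EXP = build_gf64()


def syndromes(bits):
    """S1 and S3 of a 63-bit word; bits[i] is the coefficient of x^i."""
    s1 = s3 = 0
    for i, b in enumerate(bits):
        if b:
            s1 ^= EXP[i]
            s3 ^= EXP[(3 * i) % 63]
    return s1, s3


def candidate_syndromes(s1, s3, t0, t1):
    """Syndromes of the four candidates, derived without building test patterns."""
    a0, a1 = EXP[t0], EXP[t1]
    b0, b1 = EXP[(3 * t0) % 63], EXP[(3 * t1) % 63]
    return [
        (s1, s3),                      # hard decision
        (s1 ^ a0, s3 ^ b0),            # bit t0 flipped
        (s1 ^ a1, s3 ^ b1),            # bit t1 flipped
        (s1 ^ a0 ^ a1, s3 ^ b0 ^ b1),  # both bits flipped
    ]


# Example word with a few set bits, and two hypothetical LRB positions.
word = [0] * 63
for i in (2, 17, 40, 58):
    word[i] = 1
t0, t1 = 45, 9
cands = candidate_syndromes(*syndromes(word), t0, t1)
```

Each entry of `cands` equals the syndromes that would be obtained by explicitly flipping the corresponding bits of `word` and recomputing from scratch.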
The major steps for the proposed non-iterative SD-BCH (63, 51, 2) decoding are described as follows:
1) Separate the hard-decision value and magnitude in the log likelihood ratio (LLR) input bits.
2) …
In addition, the proposed non-iterative SD-BCH decoding algorithm is described in detail as follows:
Syndrome Calculator:
Initially, we separate the hard decision r_HD,i and the magnitude |r_i| (i = n-1, …, 0) from the received LLR. The hard decision r_HD,i, which is one bit, is the most significant bit of the LLR and is fed into the hard-decision kernel to calculate the syndromes. The magnitude |r_i|, on the other hand, is used to find a test syndrome value in the test syndrome computation.
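As a small illustration of this separation, the sketch below splits a list of signed LLR values into hard-decision bits and magnitudes. The sign convention (a negative LLR maps to bit 1, mirroring a sign bit in the most significant position) and the name `split_llr` are assumptions; the actual fixed-point format is not specified here.

```python
def split_llr(llrs):
    """Split signed LLR values into hard-decision bits and magnitudes.

    Assumption: a negative LLR corresponds to a sign bit (MSB) of 1.
    """
    hard = [1 if v < 0 else 0 for v in llrs]
    magnitude = [abs(v) for v in llrs]
    return hard, magnitude


hard_bits, magnitudes = split_llr([5, -3, 0, -7, 2])
```

The hard bits would feed the syndrome calculation, while the magnitudes are only compared against each other to find the least reliable positions.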
Yang et al. [9] demonstrated a probabilistic sorting scheme that obtains the correct second minimum value with high probability. The second minimum value is generated from the last second stages of the tree. In the proposed method, by contrast, the two minimum values are searched serially, with the total clock cycles divided into two halves. Therefore, α^t0 is a syndrome value out of {α^(n-1), …, α^((n-1)/2+1)}, indicating the location of the minimum magnitude value from |r_(n-1)| to |r_((n-1)/2+1)|. Also, α^t1 is determined to be a syndrome value out of {α
Syndrome of test pattern 1:
Syndrome of test pattern 2:
While the H value is calculated, the metric check computes the M value, which is the number of errors, by adding the H values. Finally, in the error location decision, the decision codeword d is set to one of the hard-decision and test-pattern candidates by the controller.
Fig. 1 shows the BER performance comparison for the proposed (63, 51) SD-BCH decoder, a conventional SD-BCH decoder [9], and an HD-BCH decoder. An additive white Gaussian noise (AWGN) channel and BPSK modulation are considered. The proposed SD-BCH decoder provides a 0.75 to 1 dB coding gain (at BER = 10^-5) compared to the HD-BCH decoder, and nearly the same coding gain as the conventional SD-BCH decoder.
IV. PROPOSED NON-ITERATIVE SOFT-DECISION BCH DECODER ARCHITECTURE
The proposed SD-BCH decoder architecture consists of four major units: TSC, hard-decision kernel, ELD, and controller, as shown in Fig. 2. The hard-decision kernel consists of a syndrome calculator (SC), a syndrome factor calculator (SFC), and a combined Chien search (CS) and metric check (MC) block.
Test Syndrome Computation
In this subsection, we discuss the detailed hardware architecture of the proposed TSC block, which uses a loop circuit. Fig. 3 shows the proposed TSC block. Since a test pattern generator requires many registers, the proposed TSC architecture uses only the test syndrome values: a loop circuit generates the syndrome values directly, without storing the locations of min_1 and min_2. The proposed TSC block generates the test syndromes α^t0, α^t1 from the inserted magnitudes of the received LLRs. The TSC block needs a control signal ctr_TS and a comparison signal cp to obtain the test syndromes α^t0 and α^t1; ctr_TS is 1 while the magnitudes from |r_(n-1)| to |r_((n-1)/2)| are inserted. If cp is 1, the inserted magnitude value is smaller than the stored magnitude value, and test syndrome α^t0 is updated to the syndrome value that indicates the location of the smallest magnitude. If cp is 0, the stored magnitude value is smaller than the inserted magnitude value, and α^t0 holds the previous value.
Hard-Decision Kernel
The hard-decision kernel uses an HD-BCH decoding structure based on an m-SBS algorithm [4]. It consists of the SC block, the SFC block, and the CS and MC block, as shown in Fig. 2.
A. Syndrome Calculator
The SC block calculates all the syndromes S_i (1 ≤ i ≤ 2t-1) from the inserted hard-decision bits. As mentioned, S_i (i = 1, 3) is required in m-SBS algorithm-based HD-BCH decoding [4]. The SC consists of parallel SC cells that receive the parallelized codeword, as shown in Fig. 4(a).
B. Syndrome Factor Calculator
Fig. 4(b) shows the detailed hardware architecture of the proposed SFC block. The SFC block calculates the syndrome factors R, A, and B using the syndrome values S_1, S_3 from the SC block and the test syndrome values α^t0, α^t1 from the TSC block. The SFC block consists of GF multipliers and GF adders. Its architecture is comparatively simple and area-efficient, because it is the same for serial and parallel structures.
C. Chien Search and Metric Check
Fig. 4(c) shows the CS and MC block, which consists of four parallel blocks because the proposed SD-BCH decoder operates without iteration. The syndrome factors, which are the outputs of the SFC block, are fed to the CS block. A NOR gate converts the syndrome value into the H value using Eq. (2). The CS block outputs the H values, which carry the error location information; that is, the CS block checks whether an error has occurred. If R + Aα^j + Bα^2j = 0, then the H value is 1; that is, an error has occurred in the j-th location. Otherwise, the H value is zero; that is, no error has occurred. The CS block consists of constant GF multipliers, GF adders, multiplexers, NOR gates, and D flip-flops. In the MC block, the metric value M, which is the number of errors, is computed by adding the H values during 63 clock cycles. For example, if the number of errors is two, the M value becomes 2.
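The Chien search test R + Aα^j + Bα^2j = 0 and the metric M can be illustrated with a short behavioral sketch. The definitions R = S_1^3 + S_3, A = S_1^2, and B = S_1 used here are an assumption: they are one standard choice for a double-error-correcting BCH code that makes the test vanish exactly at the error locations, and may differ from the paper's own definitions. The field polynomial f(x) = x^6 + x + 1 and all function names are likewise assumptions.

```python
def build_gf64():
    """Exp and log tables for GF(2^6), assuming f(x) = x^6 + x + 1."""
    exp, log = [], {}
    x = 1
    for i in range(63):
        exp.append(x)
        log[x] = i
        x <<= 1
        if x & 0x40:
            x ^= 0x43
    return exp, log


EXP, LOG = build_gf64()


def gf_mul(a, b):
    """Multiply two GF(2^6) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 63]


def syndrome_factors(s1, s3):
    """Assumed factors: R = S1^3 + S3, A = S1^2, B = S1."""
    s1_sq = gf_mul(s1, s1)
    return gf_mul(s1_sq, s1) ^ s3, s1_sq, s1


def chien_and_metric(r, a, b):
    """Return the H values for j = 0..62 and the metric M (their sum)."""
    h = []
    for j in range(63):
        value = r ^ gf_mul(a, EXP[j]) ^ gf_mul(b, EXP[(2 * j) % 63])
        h.append(1 if value == 0 else 0)  # NOR over the bits of `value`
    return h, sum(h)


# Two bit errors at positions 10 and 47 on the all-zero codeword.
s1 = EXP[10] ^ EXP[47]
s3 = EXP[(3 * 10) % 63] ^ EXP[(3 * 47) % 63]
h_values, m = chien_and_metric(*syndrome_factors(s1, s3))
```

Under these definitions, the H values flag the two error positions and M counts them. Note that an error-free word gives R = A = B = 0, so every H value would be 1; a practical design has to treat that case separately.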
Error Location Decision
The ELD block consists of a constant GF multiplier, GF adders, multiplexers, NOR gates, and D flip-flops (D-FFs), as shown in Fig. 5. The ELD block checks the error location and corrects errors according to the H values. The m-bit value can be reduced to a one-bit H value by a NOR across its bits: the one-bit value is 1 only when all m bits are zero; otherwise, it is zero. In the proposed method, two types of H value are required to check the error location and correct errors. The first type of H value is the reference value that has information about the number of errors in the codeword. The second type of H value is compared with the reference H value to check whether the bit location is erroneous. The decision codeword d is decided by the control signal ctr, which is selected by the controller. The main loop circuit generates α^j, where j = n-1, …, 0.
Controller
In the controller, ctr is selected by R_HD, R_tp2, R_tp3, and R_tp4, which are the syndrome factors from the SFC block, and by the metric value M. First, the control signal ctr is decided by the R values. For example, if R_HD is zero, ctr is 0 and selects the hard-decision candidate codeword. If R_HD is not zero and R_tp2 is zero, ctr is 1 and selects test pattern 2. If R_HD and R_tp2 are not zero and R_tp3 is zero, ctr is 2 and selects test pattern 3. Finally, if R_tp4 is zero and R_HD, R_tp2, and R_tp3 are not zero, ctr is 3 and selects test pattern 4. In other words, the control signal ctr selects one of the candidates, because an R value of zero indicates that the candidate contains no errors or one error.
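The priority rule on the R values can be written as a short selection function. This is a sketch under assumptions: the four cases are encoded as ctr = 0 to 3 in priority order, and the function returns `None` when no R value is zero, because the way the metric value M resolves that case is not detailed in the text. The name `select_ctr` is hypothetical.

```python
def select_ctr(r_hd, r_tp2, r_tp3, r_tp4):
    """Return the index of the first candidate whose R value is zero.

    0: hard decision, 1: test pattern 2, 2: test pattern 3, 3: test pattern 4.
    Returns None when no R value is zero (handling via M is not modeled).
    """
    for ctr, r in enumerate((r_hd, r_tp2, r_tp3, r_tp4)):
        if r == 0:
            return ctr
    return None


choice = select_ctr(r_hd=9, r_tp2=0, r_tp3=0, r_tp4=4)
```

Because the candidates are checked in a fixed order, a zero R_tp2 takes precedence over a zero R_tp3, as in the example call.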
V. RESULTS AND COMPARISON
The proposed SD-BCH decoder architecture was modeled in Verilog HDL and then simulated to verify its functionality using test patterns generated from a C simulator. After complete verification of the design functionality, it was synthesized using appropriate time and area constraints. Both the simulation and synthesis steps were carried out using the SYNOPSYS design tool and 90-nm CMOS technology.
Table 1 shows the hardware complexity comparison of the proposed SD-BCH decoder for a parallel factor P. The basic building blocks of the SD-BCH decoder are the GF multiplier, GF adder, multiplexer, D flip-flop, and comparator. The number of complex operation units (e.g., GF multiplier, GF adder) increases with the parallel factor. However, the number of GF multipliers and GF adders in the SFC block is the same, regardless of the parallel factor.
Table 2 shows the performance comparisons between the proposed SD-BCH decoders using levels of parallelism (P) = 1, 3, 7 and the conventional SD-BCH decoder using a fully-parallel hard-decision kernel [9]. The proposed SD-BCH decoder using a serial structure requires 4,121 equivalent NAND gates according to the synthesis results, which is a 75% gate-count saving compared with the conventional SD-BCH decoder. The proposed BCH decoder operates at a clock rate of 250 MHz, has a latency of 126 clocks, and has a throughput of 202 Mbps. In addition, the proposed SD-BCH decoder using a 7-parallel structure achieves 1.68 times the throughput with a gate-count saving of 50%, compared with the conventional SD-BCH decoder. If there is a single error in the input, the conventional SD-BCH decoder [9] has 1 processing cycle (= 15 ns) due to its fully-parallel (P = 63) hard-decision kernel and early termination scheme. However, if there are more than two errors in the input, the processing cycles of the conventional SD-BCH decoder increase with iteration. The proposed SD-BCH decoder has the same number of processing cycles regardless of the number of errors. That is, the proposed SD-BCH (63, 51) decoders with 3-parallel (P = 3) and 7-parallel (P = 7) structures have 42 and 18 processing cycles, respectively. In addition, the efficiency of the SD-BCH decoder is calculated as the throughput-to-gate-count ratio (Mbps/M gates). The proposed SD-BCH decoders with P = 3 and P = 7 have much better efficiency than the conventional SD-BCH decoder using a fully-parallel (P = 63) hard-decision kernel. For the same BER performance, the proposed SD-BCH decoder is area-efficient and operates continuously without iteration.
* Gate count is calculated from the area information of Yang et al. [9] and the unit gate area in the TSMC 90-nm CMOS library.
† Throughput is calculated by P×f×R, where P = levels of parallelism = number of bits in one clock cycle, f = clock speed, and R = k/n.
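The throughput formula P×f×R can be checked against the reported serial figure with a few lines of Python. The cycle count 2n/P used below is an observation that matches the reported values (126, 42, and 18 cycles for P = 1, 3, 7), not a formula stated in the text, and the function names are hypothetical.

```python
def throughput_mbps(p, f_mhz, n=63, k=51):
    """Throughput = P x f x R with code rate R = k/n, in Mbps for f in MHz."""
    return p * f_mhz * k / n


def processing_cycles(p, n=63):
    """Cycle count assumed to follow 2n/P, consistent with the reported numbers."""
    return (2 * n) // p


serial_throughput = throughput_mbps(p=1, f_mhz=250)
cycles = [processing_cycles(p) for p in (1, 3, 7)]
```

For the serial decoder at 250 MHz, the formula gives about 202.4 Mbps, in line with the 202 Mbps quoted above.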
VI. CONCLUSIONS
This paper presents a low-complexity non-iterative SD-BCH decoder architecture and its efficient design techniques. A hardware-friendly, non-iterative SD-BCH decoding algorithm is proposed and adopted for the SD-BCH decoder. In addition, a novel test syndrome computation, a hard-decision kernel, and an error location decision block are proposed. The proposed SD-BCH decoder has better BER performance than an HD-BCH decoder and significantly less hardware complexity than a conventional SD-BCH decoder.
Taesung Kim received the B.S
