Abstract: A 50 Mbps, 2.24 mm² double-binary turbo decoder is designed and implemented in a 0.13 μm CMOS process for the WiMAX standard. To reduce the large extrinsic memory needed in double-binary turbo decoding, the proposed decoder exchanges bit-level extrinsic information values rather than the traditional symbol-level extrinsic information values. This is achieved by deriving two simple conversions between the two representations. The proposed turbo decoder includes a low-complexity hardware interleaver that generates interleaved addresses for two data flows simultaneously. It also provides an efficient stopping criterion for double-binary turbo decoding based on the bit-level extrinsic information, and it reduces the total memory size by 20.6%.
I. INTRODUCTION
The turbo code, introduced in 1993, is one of the most powerful forward error correction channel codes and provides near-optimal performance approaching the Shannon limit [1]. Recently, the double-binary turbo code has received great attention. It has been adopted in several communication standards, such as DVB-RCS and IEEE 802.16 (WiMAX) [2], because it offers many advantages over the single-binary turbo code [3].
Although there are many implementations of classical single-binary turbo decoders [4]-[6], little research has been dedicated to the hardware implementation of double-binary turbo decoders [7], [8]. Compared to the single-binary turbo code, the double-binary turbo code requires much more memory for decoding. In particular, the extrinsic information memory becomes large, because each trellis step requires three extrinsic information values instead of one to be exchanged between the two soft-input soft-output (SISO) decoders. In addition, the increased complexity of a SISO decoder makes it prohibitive to implement a high-throughput turbo decoder in which several SISO decoders are employed for parallel turbo decoding [4].
In this paper, we propose a decoding architecture based on bit-level extrinsic information that reduces the number of extrinsic information values exchanged between the two SISO decoders. This leads to a large memory reduction in double-binary turbo decoder implementations. We also present a low-complexity SISO decoder structure. To verify the proposed architecture, we implemented a double-binary turbo decoder for WiMAX based on bit-level extrinsic information exchange. The decoder employs a single SISO decoder and a dedicated double-flow interleaver.
II. DESIGN ISSUES IN DOUBLE-BINARY TURBO DECODERS
A typical turbo decoder is based on the time-multiplex architecture, which contains one or more SISO decoders, an interleaver, and an extrinsic information memory [5]. Double-binary turbo decoding based on the MAP algorithm [1] is well described in [8]. Focusing on double-binary turbo decoding, this section describes the conventional symbol-level extrinsic information and the associated implementation issues.
A. Increased Hardware Complexity of a SISO Decoder
The metric calculation complexity of non-binary turbo codes is higher than that of single-binary turbo codes because more bits are processed at a time. In double-binary turbo codes, two bits are processed at a time, so the number of branches connected to each trellis state increases from two to four. As described in [8], this leads to three times the hardware complexity of classical single-binary turbo codes.
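To make the radix-4 cost more concrete, the sketch below models a max-log add-compare-select (ACS) operation for a single trellis state. With four incoming branches, selecting the survivor takes three two-input comparisons instead of one. This is only an illustrative model: it uses integer metrics and treats the comparator count as the cost proxy. The complexity figure quoted from [8] may also account for other units, such as the additional branch metric additions.

```python
from typing import List, Tuple


def add_compare_select(prev_metrics: List[int], branch_metrics: List[int]) -> Tuple[int, int]:
    """Max-log ACS for one trellis state.

    Each incoming branch contributes prev_metric + branch_metric ("add");
    the largest candidate is kept ("compare-select"). Returns the new state
    metric and the number of two-input comparisons performed.
    """
    if len(prev_metrics) != len(branch_metrics) or not prev_metrics:
        raise ValueError("need one branch metric per incoming branch")
    candidates = [a + g for a, g in zip(prev_metrics, branch_metrics)]
    best = candidates[0]
    comparisons = 0
    for c in candidates[1:]:
        comparisons += 1  # one two-input comparator per additional branch
        if c > best:
            best = c
    return best, comparisons


# Single-binary (radix-2): two branches enter each state.
print(add_compare_select([0, 3], [5, 1]))                # (5, 1)
# Double-binary (radix-4): four branches enter each state.
print(add_compare_select([0, 3, 2, -1], [5, 1, 4, 7]))   # (6, 3)
```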
B. Huge Extrinsic Information Memory Requirement
In the double-binary turbo code, three symbol-level extrinsic information values are defined as follows [7], [8]:

$$L_e(u_k = z) = \ln\frac{p(u_k = z)}{p(u_k = 00)} \qquad (1)$$
where $z \in \{01, 10, 11\}$, $u_k$ is the input symbol consisting of two bits, and $p(\cdot)$ denotes probability. The extrinsic information is exchanged iteratively between the two SISO decoders throughout the decoding process. Double-binary turbo decoding must exchange more extrinsic information values than single-binary turbo decoding, which stores only one extrinsic information value per trellis step. Therefore, a double-binary turbo decoder needs a large memory.
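The short sketch below computes the three symbol-level values for one trellis step from a given symbol probability distribution. It assumes that each value is the log-ratio against the all-zero symbol $u_k = 00$, so the value for 00 is implicitly zero and does not need to be stored. The probabilities used are hypothetical.

```python
import math
from typing import Dict

SYMBOLS = ("00", "01", "10", "11")


def symbol_extrinsic(prob: Dict[str, float]) -> Dict[str, float]:
    """Return L_e(u_k = z) = ln(p(z) / p(00)) for z in {01, 10, 11}.

    The value for 00 is implicitly 0, so only three numbers need storing.
    """
    p00 = prob["00"]
    return {z: math.log(prob[z] / p00) for z in SYMBOLS[1:]}


le = symbol_extrinsic({"00": 0.1, "01": 0.2, "10": 0.3, "11": 0.4})
print({z: round(v, 3) for z, v in le.items()})  # {'01': 0.693, '10': 1.099, '11': 1.386}
```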
Taking into account the quantization scheme indicated in Table I, Fig. 1 summarizes the memory sizes required for a double-binary SISO decoder. As Fig. 1 indicates, the extrinsic information memory is much larger than the memory required for SISO decoding. Reducing the extrinsic information memory is therefore crucial, even when several SISO decoders are adopted for parallel decoding.
III. PROPOSED DECODER ARCHITECTURE AND OPERATION
An efficient turbo decoder implementation requires an optimized SISO decoder, especially when several SISO decoders are adopted to achieve high throughput. In addition, reducing the number of extrinsic information values lowers the overall turbo decoder complexity. Focusing on the time-multiplex turbo decoder, we propose a new double-binary turbo decoder architecture that includes a double-flow hardware interleaver.
A. Low-Complexity SISO Decoder Design
As the complexity of the metric calculation increases, the sliding window with border memory becomes efficient for non-binary turbo decoding, because it eliminates the need for complex dummy backward calculations. In the proposed architecture, we adopt the 4-bit border metric encoding presented in [8] to reduce the border memory size. With border metric encoding, the border memory size is constant regardless of the number of SISO decoders in a turbo decoder. We can therefore remove all complex dummy metric calculation units at the expense of a small border memory and a slight performance degradation.
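The sketch below illustrates the border-memory idea on a toy 2-state trellis. Each window's backward recursion starts from the backward metric stored at the next window's start in the previous iteration, so no dummy backward recursion is run. The following simplifications are assumptions made for illustration: the 4-bit border metric encoding of [8] is not modeled, the circular (tail-biting) trellis is replaced by an all-zero end state, and branch metrics are held fixed across iterations. Under these conditions, each iteration propagates exact border values back by one window.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Transition = Tuple[int, int]  # (from_state, to_state)


def backward_recursion(gammas: Sequence[Sequence[float]],
                       transitions: Sequence[Transition],
                       num_states: int,
                       beta_end: Sequence[float]) -> List[List[float]]:
    """Max-log backward recursion over len(gammas) steps.

    gammas[k][t] is the branch metric of transitions[t] at step k.
    Returns [beta_0, ..., beta_L]; each computed beta is normalized to max 0.
    """
    betas: List[List[float]] = [list(beta_end)]
    for gamma_k in reversed(gammas):
        nxt = betas[0]
        beta = [-math.inf] * num_states
        for t, (s_from, s_to) in enumerate(transitions):
            beta[s_from] = max(beta[s_from], gamma_k[t] + nxt[s_to])
        top = max(beta)
        betas.insert(0, [b - top for b in beta])
    return betas


def sliding_window_pass(gammas: Sequence[Sequence[float]],
                        transitions: Sequence[Transition],
                        num_states: int,
                        window: int,
                        border: Dict[int, List[float]]) -> List[List[float]]:
    """One backward pass, window by window, with no dummy backward recursion.

    Window w covers steps [w*window, (w+1)*window). It starts from border[w+1],
    the beta stored at the start of window w+1 in the previous iteration
    (all zeros if nothing has been stored yet), then refreshes border[w].
    """
    n = len(gammas)
    out: List[List[float]] = [[] for _ in range(n + 1)]
    for w in range((n + window - 1) // window):
        lo, hi = w * window, min((w + 1) * window, n)
        init = border.get(w + 1, [0.0] * num_states)
        betas = backward_recursion(gammas[lo:hi], transitions, num_states, init)
        border[w] = betas[0]  # read by window w-1 in the next iteration
        out[lo:hi + 1] = betas
    return out


# Toy 2-state trellis with branch metrics held fixed across iterations.
rng = random.Random(1)
TRANS = [(0, 0), (0, 1), (1, 0), (1, 1)]
GAMMAS = [[rng.uniform(-1.0, 1.0) for _ in TRANS] for _ in range(12)]
border_memory: Dict[int, List[float]] = {}
for _ in range(3):  # three windows of four steps each
    result = sliding_window_pass(GAMMAS, TRANS, 2, 4, border_memory)
full = backward_recursion(GAMMAS, TRANS, 2, [0.0, 0.0])
print(result == full)  # True: the stored border values now match the full recursion
```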
To further lower the complexity of the SISO decoder, we exploit a branch metric recovery scheme. For the double-binary turbo code, the branch metrics are defined as follows.
where $z \in \{01, 10, 11\}$, $x_k$ and $y_k$ are the transmitted and received codewords, respectively, and binary phase shift keying (BPSK) modulation is assumed. The superscripts p and s denote the parity bits and systematic bits, respectively. In (2), $L_e(u_k = z)$ is the extrinsic information received from the other SISO decoder. The double-binary turbo code has 16 unique branch metrics, so the branch metric memory becomes significant, especially when multiple SISO decoders are adopted to achieve high throughput. We therefore propose a new branch metric recovery scheme. Instead of storing all branch metric values, it stores only the essential values needed to recover them. From (2), we obtain the following relation:
L is the intrinsic information defined as follows. 
Therefore, by storing only the essential sub-metrics as shown in Fig. 2, we can significantly reduce the memory required for branch metrics at the cost of the simple calculation expressed in (3). As also shown in Fig. 2, storing the proposed bit-level extrinsic information reduces the branch metric memory even further.
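The following is a minimal sketch of one way to recover the branch metrics; it is not the exact set of sub-metrics in Fig. 2 or relation (3). It assumes that bit $b$ is mapped to $x = 2b - 1$. Under this mapping, each channel term $(L_c/2)\,x\,y$ equals $b\,L_c y - L_c y/2$, and the second part is the same for every branch at step $k$, so it cancels in the MAP ratios. Each of the 16 branch metrics is then the symbol extrinsic value plus the channel values of whichever systematic bits (A, B) and parity bits (Y, W) equal 1. The names `llr["A"]`, `llr["B"]`, `llr["Y"]`, and `llr["W"]` are hypothetical. Under this model, seven stored numbers regenerate all 16 metrics.

```python
from itertools import product
from typing import Dict, Tuple

Bits = Tuple[int, int, int, int]  # (a, b, y, w): systematic A, B; parity Y, W


def recover_branch_metrics(le: Dict[str, float],
                           llr: Dict[str, float]) -> Dict[Bits, float]:
    """Rebuild all 16 branch metrics from 3 extrinsic + 4 channel values.

    le  : symbol-level extrinsic values for "01", "10", "11" (00 is 0).
    llr : hypothetical channel values Lc*y for the bits "A", "B", "Y", "W".
    Metrics are relative to the all-zero branch; a per-step constant that
    cancels in the MAP ratios has been dropped.
    """
    full_le = {"00": 0.0, **le}
    metrics: Dict[Bits, float] = {}
    for a, b, y, w in product((0, 1), repeat=4):
        metrics[(a, b, y, w)] = (full_le[f"{a}{b}"]
                                 + a * llr["A"] + b * llr["B"]
                                 + y * llr["Y"] + w * llr["W"])
    return metrics


stored_le = {"01": 0.5, "10": -1.0, "11": 2.0}
stored_llr = {"A": 1.5, "B": -0.5, "Y": 0.25, "W": -2.0}
gamma = recover_branch_metrics(stored_le, stored_llr)
print(len(gamma), gamma[(1, 1, 0, 1)])  # 16 1.0
```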
B. Bit-level Extrinsic Information Exchange
As discussed in Section II, three symbol-level extrinsic information values are exchanged between the two SISO decoders. To reduce the memory size, the extrinsic information can be stored at the bit level. This subsection describes how to derive the bit-level extrinsic information from the symbol-level extrinsic information. It also describes the two simple converters shown in Fig. 3.
Two bit-level extrinsic information values are defined as follows.
$$L_e(A) = \ln\frac{p(A = 1)}{p(A = 0)}, \qquad L_e(B) = \ln\frac{p(B = 1)}{p(B = 0)} \qquad (5)$$
where the input symbol $u_k$ consists of two bits, A and B, i.e., $u_k = AB$. By expressing the bit-level probabilities in terms of the symbol-level probabilities, we can derive the symbol-to-bit conversion as follows.
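The following max-log derivation is consistent with this description, although it may differ in detail from (6). It assumes the bit-level values $L_e(A) = \ln\frac{p(A=1)}{p(A=0)}$ and $L_e(B) = \ln\frac{p(B=1)}{p(B=0)}$, the symbol-level values $L_e(u_k = z) = \ln\frac{p(u_k=z)}{p(u_k=00)}$, and that A is the first bit of $z$. The derivation divides the numerator and the denominator by $p(u_k = 00)$ and then applies $\ln(e^x + e^y) \approx \max(x, y)$:

$$
\begin{aligned}
L_e(A) &= \ln\frac{p(u_k=10)+p(u_k=11)}{p(u_k=00)+p(u_k=01)}
        = \ln\frac{e^{L_e(u_k=10)}+e^{L_e(u_k=11)}}{1+e^{L_e(u_k=01)}} \\
       &\approx \max\bigl(L_e(u_k=10),\,L_e(u_k=11)\bigr)-\max\bigl(0,\,L_e(u_k=01)\bigr), \\
L_e(B) &= \ln\frac{p(u_k=01)+p(u_k=11)}{p(u_k=00)+p(u_k=10)}
        \approx \max\bigl(L_e(u_k=01),\,L_e(u_k=11)\bigr)-\max\bigl(0,\,L_e(u_k=10)\bigr).
\end{aligned}
$$

Only additions and two-input maxima are needed. Each max-log step drops a correction term of at most $\ln 2$.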
Converting the three symbol-level values into two bit-level values, as expressed in (6), reduces the number of values stored in memory from three to two. The next SISO decoding stage needs the symbol-level extrinsic information values back, so a bit-to-symbol conversion is required. We first express the symbol-level extrinsic information in terms of the bit-level extrinsic information. We then apply appropriate approximations according to the signs of the bit-level extrinsic information values, as follows.
1) Case-I : L be
Through (7)-(10), we can retrieve the three symbol-level values required for double-binary SISO decoding from the two bit-level values stored in memory. Both converters, described by (6)-(10), can be implemented with simple operations such as addition and maximum, as shown in Fig. 4. Regardless of the bit width of the extrinsic information, the proposed architecture decreases the number of extrinsic information values to be exchanged. This reduces the extrinsic information memory, which is the largest memory in a double-binary turbo decoder, as shown in Fig. 1.
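The sketch below implements both converters using integer (fixed-point-like) values. The symbol-to-bit function follows the max-log form $L_e(A) \approx \max(L_e(10), L_e(11)) - \max(0, L_e(01))$, with $L_e(B)$ computed analogously. The bit-to-symbol function is a plainly simpler substitute, not the sign-dependent approximations (7)-(10). It assumes A and B are independent, so $\ln\frac{p(AB)}{p(00)} = A\,L_e(A) + B\,L_e(B)$. With this pairing, bit-to-symbol-to-bit returns the original pair exactly. Symbol-to-bit-to-symbol is lossy, because three values are compressed into two.

```python
from typing import Dict, Tuple


def symbol_to_bit(le: Dict[str, int]) -> Tuple[int, int]:
    """Max-log symbol-to-bit conversion (z written as AB; the value for 00 is 0)."""
    le_a = max(le["10"], le["11"]) - max(0, le["01"])
    le_b = max(le["01"], le["11"]) - max(0, le["10"])
    return le_a, le_b


def bit_to_symbol(le_a: int, le_b: int) -> Dict[str, int]:
    """Simplified bit-to-symbol conversion assuming independent bits A and B.

    NOTE: a substitute for the paper's sign-dependent cases (7)-(10):
    it uses ln p(AB)/p(00) = A*L_e(A) + B*L_e(B).
    """
    return {"01": le_b, "10": le_a, "11": le_a + le_b}


bits = symbol_to_bit({"01": 3, "10": -2, "11": 5})
print(bits)                  # (2, 5)
print(bit_to_symbol(*bits))  # {'01': 5, '10': 2, '11': 7}
```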
C. Dedicated Double-Flow Hardware Interleaver
Compared to the traditional memory-based interleaver, a dedicated interleaver can achieve a small area as well as low power consumption [5]. Based on the dedicated hardware interleaver presented in [8], we propose a double-flow interleaver that generates both the write addresses and the read addresses for the extrinsic information memory, as shown in Fig. 5. In Fig. 5, $P_0$ and the initial values are determined according to the frame length $N$ [2]. With two sets of initial values, two addresses can be generated on the fly from one shared accumulator. Because the write addresses are generated on the fly, the address queue that the time-multiplex architecture [5], [8] needs to delay the read addresses can be removed. A small lookup table is sufficient to manage the initial address values.
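The sketch below illustrates one possible reading of the shared-accumulator idea; it is not the circuit in Fig. 5. It assumes the second-step permutation of the WiMAX CTC interleaver takes the form $P(j) = (P_0 j + 1 + c_{j \bmod 4}) \bmod N$, with $c = (0,\ N/2 + P_1,\ P_2,\ N/2 + P_3)$. This is our reading of the standard, and it should be checked against [2]. The first interleaving step, which swaps A and B in odd-indexed pairs, is omitted. The parameter `delay` models a hypothetical SISO latency between reading a pair and writing its new extrinsic value. The example parameters ($N = 24$, $P_0 = 5$, $P_1 = P_2 = P_3 = 0$) are assumed. A single accumulator advances by $P_0$ each cycle, so no multiplier is needed. The write flow reuses that accumulator with a constant shift, which plays the role of the second set of initial values.

```python
from typing import Iterator, List, Optional, Tuple


def step_offsets(n: int, p1: int, p2: int, p3: int) -> List[int]:
    """Per-(j mod 4) constant added to P0*j in the second permutation step."""
    half = n // 2
    return [1, 1 + half + p1, 1 + p2, 1 + half + p3]


def interleave_direct(j: int, n: int, p0: int, p1: int, p2: int, p3: int) -> int:
    """Reference: interleaved address of pair j computed with a multiplication."""
    return (p0 * j + step_offsets(n, p1, p2, p3)[j % 4]) % n


def double_flow_addresses(n: int, p0: int, p1: int, p2: int, p3: int,
                          delay: int) -> Iterator[Tuple[Optional[int], Optional[int]]]:
    """Yield (read_addr, write_addr) per cycle from ONE shared accumulator.

    read_addr  : interleaved address of pair j (None once the frame is read).
    write_addr : interleaved address of pair j - delay (None while the
                 hypothetical SISO pipeline of depth `delay` is filling).
    """
    off = step_offsets(n, p1, p2, p3)
    shift = (-p0 * delay) % n   # constant offset for the write flow
    acc = 0                     # acc == (p0 * j) % n, updated by addition only
    for j in range(n + delay):
        read = (acc + off[j % 4]) % n if j < n else None
        write = (acc + shift + off[(j - delay) % 4]) % n if j >= delay else None
        yield read, write
        acc = (acc + p0) % n


N, P0, P1, P2, P3 = 24, 5, 0, 0, 0  # example parameter set (assumed)
pairs = list(double_flow_addresses(N, P0, P1, P2, P3, delay=3))
reads = [r for r, _ in pairs if r is not None]
writes = [w for _, w in pairs if w is not None]
print(reads[:6])  # [1, 18, 11, 4, 21, 14]
```

Both flows visit the same interleaved address sequence, with the write flow lagging the read flow by `delay` cycles. This lag is why no address queue is needed in this model.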
IV. DOUBLE-BINARY TURBO DECODER DESIGN FOR WIMAX
To verify the proposed architecture, we implemented a double-binary turbo decoder for WiMAX. The design parameters used in the implementation are the same as those in Table I. As shown in Fig. 6, the proposed time-multiplex turbo decoder consists of a single SISO decoder, two converters, the dedicated double-flow interleaver, and the memory that holds the bit-level extrinsic information.
A. Proposed Stopping Criterion with Bit-Level Extrinsic Information
To avoid unnecessary iterations at high signal-to-noise ratios (SNRs), a simple early stopping criterion is employed. It compares the signs of the incoming bit-level extrinsic information, $S_e^A$ and $S_e^B$, with the hard-decision bits, $S_{llr}^A$ and $S_{llr}^B$. Fig. 7 shows the effect of the stopping criterion and its hardware schematic. At the end of SISO decoding, the proposed turbo decoder stops iterating if the following condition is satisfied for all pairs in a frame:
$$\left(S_e^A = S_{llr}^A\right) \;\&\&\; \left(S_e^B = S_{llr}^B\right)$$
As shown in Fig. 7, the proposed stopping criterion can be implemented with low hardware complexity. If the above condition is satisfied for all pairs in a frame, STOP is set to 1. However, if any pair fails to satisfy the condition, STOP remains 0 and decoding continues.
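The following is a minimal software model of this check. Each bit's sign comparison corresponds to an XNOR, and the per-frame result corresponds to an AND over all pairs. It assumes that a value of zero is treated as positive (sign bit 0), as in a two's-complement sign bit. The function names are hypothetical.

```python
from typing import Sequence


def sign_bit(x: int) -> int:
    """Two's-complement style sign bit: 1 for negative values, 0 otherwise."""
    return 1 if x < 0 else 0


def stop_flag(ext_a: Sequence[int], ext_b: Sequence[int],
              llr_a: Sequence[int], llr_b: Sequence[int]) -> int:
    """Return STOP = 1 if, for every pair k, the sign of the incoming bit-level
    extrinsic value agrees with the hard decision for both bits A and B."""
    if not (len(ext_a) == len(ext_b) == len(llr_a) == len(llr_b)):
        raise ValueError("all sequences must cover the same frame")
    for ea, eb, la, lb in zip(ext_a, ext_b, llr_a, llr_b):
        if sign_bit(ea) != sign_bit(la) or sign_bit(eb) != sign_bit(lb):
            return 0  # one disagreeing pair is enough to keep iterating
    return 1


print(stop_flag([3, -1], [2, 4], [10, -7], [5, 1]))  # 1
print(stop_flag([3, -1], [2, 4], [10, 7], [5, 1]))   # 0
```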
B. Double-Binary Turbo Decoder Implementation Results
Fig. 8 shows the BER performance of the proposed turbo decoder and compares it with that of a conventional turbo decoder based on symbol-level extrinsic information [8]. Table II summarizes the memory size required for the proposed turbo decoder. As Table II shows, the proposed bit-level extrinsic information exchange reduces the extrinsic information memory to two-thirds of that of the conventional method. This leads to a total memory size reduction of 20.6%. When several SISO decoders are adopted to achieve higher throughput [4], the state metric memory must grow. The proposed architecture nevertheless remains effective in reducing the total memory size, because the extrinsic information memory is much larger than the state metric memory, as indicated in Table II. Fig. 9 summarizes the implementation results. The proposed turbo decoder is implemented in a 0.13 μm 1-poly 6-metal standard CMOS process. The decoder occupies 2.24 mm² and takes 4,948 cycles per iteration to process a 2400-pair (4800-bit) frame. As a result, the proposed decoder provides up to 50 Mbps at a clock frequency of 200 MHz.
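The reported figures can be related with the usual throughput formula, frame bits × clock frequency / (cycles per iteration × iterations). This assumes frames are processed back to back with no other overhead. The iteration count behind the 50 Mbps figure is not stated. Under this formula, one iteration alone corresponds to about 194 Mbps, so 50 Mbps corresponds to an average of roughly 3.9 iterations per frame. This is an inference, not a reported value.

```python
def throughput_mbps(frame_bits: int, clock_hz: float,
                    cycles_per_iteration: int, iterations: float) -> float:
    """Decoded information bits per second (in Mbps) for back-to-back frames."""
    seconds_per_frame = cycles_per_iteration * iterations / clock_hz
    return frame_bits / seconds_per_frame / 1e6


per_iter = throughput_mbps(4800, 200e6, 4948, 1)
print(round(per_iter, 1))       # 194.0
print(round(per_iter / 50, 2))  # 3.88 (average iterations implied by 50 Mbps)
```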
V. CONCLUSION
This paper has presented a double-binary turbo decoder developed for the WiMAX standard. To reduce the memory size required in double-binary turbo decoding, the proposed decoder exchanges bit-level extrinsic information instead of the conventional symbol-level extrinsic information. The conversions between the symbol-level and bit-level information have been derived and realized with two low-complexity converters. The proposed decoder consists of a single optimized SISO decoder and a low-complexity double-flow hardware interleaver. It is implemented in a 0.13 μm CMOS process and occupies 2.24 mm².
