Abstract-This paper presents a six-iteration concatenated Bose-Chaudhuri-Hocquenghem (BCH) code and its high-speed two-parallel decoder architecture for 100 Gb/s optical communications. The proposed architecture features a very high data processing rate as well as excellent error correction capability. The proposed six-iteration concatenated BCH code structure with a block interleaving method allows the decoder to achieve a 9.19 dB net coding gain at a decoder output bit error rate of 10^-15, compensating for serious transmission quality degradation. The proposed high-speed concatenated BCH decoder architecture was implemented to support a 100 Gb/s data processing rate. Thus, it has potential applications in next-generation forward error correction schemes for 100 Gb/s long-haul optical communications.
I. INTRODUCTION
The Bose-Chaudhuri-Hocquenghem (BCH) codes are a class of powerful multiple error-correcting cyclic codes [1]. BCH codes are used in a broad range of applications, such as optical fiber communication systems, second-generation Digital Video Broadcasting (DVB-S2), and digital communication systems.
The Reed-Solomon (RS) (255,239) code has been used and standardized in the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) G.975 and G.709 [2]. This code has a net coding gain (NCG) of 6.2 dB at a decoder output bit error rate (BER) of 10^-15 with a 6.69% redundancy ratio. However, for high-speed (40 Gb/s and beyond) optical fiber communication systems, more powerful forward error correction (FEC) codes have become necessary in order to achieve higher correction ability than the RS(255,239) code and to compensate for serious transmission quality degradation. Thus, several Super-FEC schemes are considered and recommended in the ITU-T G.975.1 recommendation [2].
Furthermore, the standardization of a hard-decision FEC, which allows the redundancy ratio up to 7%, for a 100 Gb/s optical channel transport network 4 (OTN4) is under discussion at the ITU-T. As a result, the RS(255,239) code has become mandatory for short-reach systems. However, no specific FEC was determined as a standard for metro and long-haul systems, although several candidates proposed their own FEC codes. In this paper, we propose a six-iteration concatenated BCH code for OTN and its high-speed two-parallel decoder architecture for 100 Gb/s long-haul optical communication systems.
The rest of this paper is organized as follows. Section II describes the proposed six-iteration concatenated BCH code scheme, and Section III describes the proposed high-speed concatenated BCH decoder architecture. In Section IV, the implementation and comparison are presented. Conclusions are presented in Section V.
II. PROPOSED CONCATENATED BCH CODE

A. Conventional Concatenated BCH Codes
The conventional concatenated BCH code described in subclause I.3 of the G.975.1 recommendation [2] (I.3-CBCH code hereafter) uses BCH(2040,1930) and BCH(3860,3824) codes, which can correct up to 10 and 3 bit errors per inner and outer codeword, respectively. Furthermore, the I.3-CBCH code provides 8.99 dB NCG at a decoder output BER of 10^-15 with three-iterative decoding, without additional redundancy compared to the RS(255,239) code. This technique can improve the error correction capability without decreasing the code rate. The I.3-CBCH decoder for 100 Gb/s OTN was proposed in [3]. However, when implementing a concatenated decoder, the iterations are unfolded to process a continuous data stream. Therefore, the hardware complexity of the concatenated BCH decoder in [3] was quite high.
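The NCG figures quoted here can be related to a required Eb/N0 with a short calculation. The sketch below assumes BPSK over an AWGN channel (the model used for the simulations later in the paper) and the common convention that NCG is the Eb/N0 an uncoded link needs to reach the reference BER minus the Eb/N0 the coded link needs. The paper does not spell out its NCG formula, so this convention and the function names are assumptions for illustration only.

```python
import math

def bpsk_ber(ebn0_db):
    """Uncoded BPSK bit error rate over AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def uncoded_ebn0_db(target_ber, lo=0.0, hi=20.0):
    """Eb/N0 (dB) at which uncoded BPSK reaches target_ber, by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bpsk_ber(mid) > target_ber:
            lo = mid  # still too many errors: more Eb/N0 is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

def net_coding_gain_db(coded_ebn0_db, target_ber):
    """NCG under the assumed convention: uncoded minus coded Eb/N0 (dB)."""
    return uncoded_ebn0_db(target_ber) - coded_ebn0_db

reference_db = uncoded_ebn0_db(1e-15)
# Eb/N0 at which a code with 8.99 dB NCG would reach BER 1e-15,
# if the assumed convention applies.
implied_coded_db = reference_db - 8.99
print(f"uncoded reference: {reference_db:.2f} dB")
print(f"implied coded Eb/N0 for 8.99 dB NCG: {implied_coded_db:.2f} dB")
```

Under these assumptions the uncoded reference comes out at roughly 15 dB, so each additional 0.1 dB of NCG lowers the Eb/N0 required at the decoder input by the same amount.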
In [4], a concatenated BCH(2040,1952) and BCH(3904,3820) code, which can correct up to 8 and 7 bit errors per inner and outer codeword, respectively, was proposed (L-CBCH hereafter). The L-CBCH code provides 8.91 dB NCG at a decoder output BER of 10^-15 with only two-iterative decoding. The L-CBCH decoder for 100 Gb/s OTN in [4] reduced hardware complexity by approximately 34% with only a 0.08 dB decrease in NCG compared to the I.3-CBCH code. However, the redundancy ratio of the L-CBCH code is 6.81% because it uses the fixed stuff (FS) bytes in the optical channel data unit (ODU) frame. Thus, it loses compatibility with the standard RS(255,239) code.
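The redundancy ratios and correction capabilities quoted for the two conventional codes can be checked from the code parameters alone. The sketch below assumes that the inner code also encodes the outer parity, so the two expansions multiply, and that each BCH code is shortened from the smallest primitive length 2^m - 1 that fits, with m·t parity bits. The second assumption is not stated in the text, and the helper names are illustrative.

```python
from fractions import Fraction

def serial_redundancy(outer, inner):
    """Parity-to-payload ratio when the inner code also encodes the
    outer parity, so the two code expansions multiply."""
    (n_out, k_out), (n_in, k_in) = outer, inner
    return Fraction(n_out, k_out) * Fraction(n_in, k_in) - 1

def correctable_errors(n, k):
    """t, assuming n - k = m * t over the smallest GF(2^m) with
    2^m - 1 >= n (an assumption, not stated in the text)."""
    m = n.bit_length()  # smallest m such that 2**m - 1 >= n
    t, remainder = divmod(n - k, m)
    if remainder:
        raise ValueError("parity bits are not a multiple of m")
    return t

rs = Fraction(255 - 239, 239)
i3 = serial_redundancy(outer=(3860, 3824), inner=(2040, 1930))
l_cbch = serial_redundancy(outer=(3904, 3820), inner=(2040, 1952))

print(f"RS(255,239): {float(rs):.2%}")
print(f"I.3-CBCH:    {float(i3):.2%}  (same as RS: {i3 == rs})")
print(f"L-CBCH:      {float(l_cbch):.2%}")
print("I.3-CBCH t (inner, outer):",
      correctable_errors(2040, 1930), correctable_errors(3860, 3824))
print("L-CBCH t (inner, outer):  ",
      correctable_errors(2040, 1952), correctable_errors(3904, 3820))
```

Using exact fractions makes the compatibility point visible: the I.3-CBCH ratio reduces to 16/239, identical to RS(255,239), while the L-CBCH ratio does not.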
B. Proposed Six-iteration Concatenated BCH Code Scheme
Although the I.3-CBCH and L-CBCH codes provide approximately 9 dB of NCG, this may not be sufficient for long-haul systems such as intercontinental submarine optical communication systems. We therefore focused on developing a novel concatenated code that provides higher NCG. To that end, we introduce a parity usage different from that of the I.3-CBCH and L-CBCH codes. Fig. 1 (a) shows the parity scheme of the conventional I.3-CBCH and L-CBCH codes. First, the outer code encodes the payload data, and then the inner code encodes both the payload data and the parity of the outer code. The proposed parity scheme in Fig. 1 (b) is slightly different. The outer code still encodes the payload data first, but the inner code encodes only the payload data, not the parity of the outer code. With this parity scheme, the bits spent on 'parity on parity' in Fig. 1 (a) can instead be used to strengthen the error correction capability of either the inner or the outer code over the payload data.
Table I (partial; earlier rows and column headings not recovered):
    6.55e-6  3.23e-6  6.59e-7  1.62e-7  1.09e-8  2.52e-9
5   3.22e-6  1.52e-6  2.11e-7  3.72e-8  0 †      0 †
6   2.23e-6  1.03e-6  1.21e-7  1.66e-8  0 †      0 †
†: no errors are observed.
With the parity scheme in Fig. 1 (b) in mind, we carried out an extensive analysis over an extremely noisy channel. As a result, we propose a six-iteration concatenated BCH code with the aim of maximizing the NCG for long-haul optical communication systems. The proposed concatenated BCH code uses BCH(996,956) and BCH(3920,3824) codes as the inner and outer codes, which can correct up to 4 and 8 bit errors, respectively. In the range where the ratio of transmitted energy per bit to noise power spectral density (Eb/N0) is 5.4 to 5.5 dB, the BCH(996,956) code provides the best error correction capability against random errors among BCH codes with a redundancy ratio of 4.2 to 4.5%. Compared to the L-CBCH code, the error correction capability of the outer code is increased from 7 to 8 bits thanks to the parity scheme in Fig. 1 (b). Theoretically, a drawback of the parity scheme in Fig. 1 (b) is that the error floor occurs earlier than with the parity scheme in Fig. 1 (a). However, the error floor is still well below a decoder output BER of 10^-15, so it does not need to be considered in our simulations.
Table I shows the performance simulation result. From our C simulation using binary phase shift keying (BPSK) transmission over the additive white Gaussian noise (AWGN) channel, the proposed code shows 9.19 dB NCG for six-iterative decoding. This is 0.2 dB and 0.28 dB higher than the conventional I.3-CBCH code with three-iterative decoding and the L-CBCH code with two-iterative decoding, respectively, at a decoder output BER of 10^-15. Table I shows a C simulation result for fair comparison. The simulation was executed using 10^12 test bits at Eb/N0 values of 5.50 and 5.51 dB, which is the region of interest for the proposed code. At the first iteration, the I.3-CBCH code shows the best performance owing to its strong inner code. However, the L-CBCH code and the proposed code outperform the I.3-CBCH code as the number of iterations increases.
Compared with the L-CBCH code, the proposed code shows stronger error correction capability as the number of iterations increases, because it has a stronger outer code while the inner code strengths of the two codes are similar.
Fig. 3 (a) shows a block diagram of the proposed six-iteration concatenated BCH scheme. It consists of BCH encoders, BCH decoders, and interleavers/deinterleavers. A two-parallel processing structure is necessary to achieve 100 Gb/s throughput at a practical clock frequency. The A-to-B and B-to-A blocks shown in Fig. 3 (a) represent the frame converters required for two-parallel processing. The original serial OTN frame structure is called the A-format, and the parallelized OTN frame structure is called the B-format. With the B-format OTN frame structure, two-parallel encoding and decoding is possible. The detailed structures of the A- and B-formats are described in [3], [4]. After conversion, each B-format frame is processed in the outer encoder and aligned into 8 BCH(3920,3824) codewords, as shown in Fig. 3 (b). The 8 B-format frames are then collected in the interleaver and block-interleaved. The interleaved data in each frame is then aligned into 32 BCH(996,956) codewords, as shown in Fig. 3 (c). The decoding process is executed in the reverse order. The interleaving/deinterleaving scheme of the proposed six-iteration concatenated BCH code is the same as that of the conventional I.3-CBCH and L-CBCH codes and is described in [2].
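The per-frame codeword counts above can be cross-checked against the parity scheme of Fig. 1 (b). The following sketch assumes that the 32 inner codewords cover exactly the payload bits carried by the 8 outer codewords, as that scheme suggests, and that all parity is counted against the payload. The constant and function names are illustrative and not taken from the paper.

```python
from fractions import Fraction

OUTER = (3920, 3824)   # BCH(n, k) outer code
INNER = (996, 956)     # BCH(n, k) inner code
OUTER_PER_FRAME = 8    # outer codewords per frame
INNER_PER_FRAME = 32   # inner codewords per frame

def frame_budget():
    """Bit budget of one frame when the inner code encodes only the
    payload, not the outer parity (assumed reading of Fig. 1 (b))."""
    n_out, k_out = OUTER
    n_in, k_in = INNER
    outer_parity = OUTER_PER_FRAME * (n_out - k_out)
    inner_parity = INNER_PER_FRAME * (n_in - k_in)
    payload = OUTER_PER_FRAME * k_out
    return {
        "payload": payload,
        "inner_payload": INNER_PER_FRAME * k_in,
        "outer_parity": outer_parity,
        "inner_parity": inner_parity,
        "frame_bits": payload + outer_parity + inner_parity,
    }

budget = frame_budget()
parity = budget["outer_parity"] + budget["inner_parity"]
redundancy = Fraction(parity, budget["payload"])

print(budget)
print(f"redundancy ratio: {float(redundancy):.2%}")
print("matches RS(255,239):", redundancy == Fraction(16, 239))
```

Under these assumptions the payload seen by both codes is the same number of bits, and the combined parity gives the same 16/239 redundancy ratio as RS(255,239).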
III. PROPOSED SIX-ITERATION CONCATENATED BCH DECODER ARCHITECTURE
Fig. 4 shows the block diagrams of the proposed six-iteration concatenated BCH decoder, which has two-parallel BCH(3920,3824) and BCH(996,956) decoders. Fig. 4 (a) shows the block diagram of the outer decoder. Since a single OTN frame consists of eight BCH(3920,3824) codewords, the two-parallel outer decoder processes 16 interleaved BCH(3920,3824) codewords simultaneously. The upper and lower 128-bit sections, each as wide as a single symbol of the B-format OTN frame, each process eight BCH(3920,3824) codewords. The outer decoder has 16 syndrome computation (SC) blocks, 1 shared Dual-processing simplified inverseless Berlekamp-Massey (Dual-pSiBM) key equation solver (KES) block, and 16 Chien search and error correction blocks. Each SC block processes 16 parallel bits and operates on symbols of the Galois field GF(2^12). Likewise, since a single OTN frame consists of 32 BCH(996,956) codewords, the two-parallel inner decoder processes 64 codewords simultaneously; it has 64 SC blocks, 2 shared Dual-pSiBM KES blocks, and 64 Chien search and error correction blocks. Each SC block processes 4 parallel bits and operates on symbols of GF(2^10). Both the inner and outer decoders process 256 bits per clock cycle. The detailed architectures of the sub-blocks are described in [3], [4].
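To illustrate what one SC block computes, the sketch below evaluates the 2t syndromes of a received BCH(996,956) word over GF(2^10) in software. It is a bit-serial illustration, not the 4-bit-parallel hardware described above. The primitive polynomial x^10 + x^3 + 1 and the convention that bit i is the coefficient of x^i are assumptions, since the text specifies neither.

```python
M = 10
PRIM_POLY = 0b10000001001   # x^10 + x^3 + 1 (assumed, not given in the text)
FIELD_ORDER = (1 << M) - 1  # 1023 nonzero field elements
N, T = 996, 4               # BCH(996,956) corrects up to 4 bit errors

# Antilog table: EXP[i] holds alpha^i as a 10-bit integer.
EXP = [0] * FIELD_ORDER
value = 1
for i in range(FIELD_ORDER):
    EXP[i] = value
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY

def syndromes(received_bits, t=T):
    """Return [S_1, ..., S_2t] with S_j = r(alpha^j), where bit i of the
    received word is taken as the coefficient of x^i."""
    result = []
    for j in range(1, 2 * t + 1):
        s = 0
        for i, bit in enumerate(received_bits):
            if bit:
                s ^= EXP[(i * j) % FIELD_ORDER]
        result.append(s)
    return result

# The all-zero word is a codeword of any linear code.
clean = [0] * N
corrupted = list(clean)
corrupted[5] = 1            # inject a single bit error at position 5

print(syndromes(clean))      # all zeros: no error detected
print(syndromes(corrupted))  # S_j = alpha^(5 * j) for a single error
```

All-zero syndromes indicate that no correction is needed. Nonzero syndromes are what the KES block would take as input to find the error locator polynomial, whose roots the Chien search then locates.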
IV. IMPLEMENTATION RESULTS AND COMPARISON
The proposed two-parallel concatenated BCH decoder architecture was modeled in Verilog HDL and simulated to verify its functionality using a test pattern generated from a C simulator. After complete verification of the design functionality, it was synthesized using appropriate timing and area constraints. Both the simulation and synthesis steps were carried out using Synopsys design tools and a 90-nm CMOS technology optimized for a 1.1 V supply voltage.
Table II shows the implementation results of the conventional I.3-CBCH, L-CBCH, and proposed concatenated BCH decoder architectures, each of which includes an interleaver, a deinterleaver, an inner decoder, and an outer decoder for a single iteration. The proposed decoder has the lowest hardware complexity because the error correction capability of the BCH(996,956) inner code is only 4 bits. However, the memory size and latency are almost identical in all three decoders.
Table III shows the implementation results of the conventional I.3-CBCH, L-CBCH, and proposed concatenated BCH decoder architectures with their respective numbers of iterations. The proposed decoder provides 9.19 dB NCG, which is 0.2 and 0.28 dB higher than the I.3-CBCH and L-CBCH decoders, respectively. The proposed six-iteration concatenated BCH decoder consists of 2 frame converters, 5 interleavers, 6 deinterleavers, and 6 inner/outer decoders. Its hardware complexity is 3,732,000 gates and 4,114,176 bits of memory. This is quite high compared to the I.3-CBCH and L-CBCH decoders because of the six-iterative decoding. However, in some long-haul optical communication systems such as intercontinental submarine systems, a 0.2 dB improvement in NCG can outweigh the cost of a few million additional gates of silicon area, because it can reduce the required number of submarine optical repeaters, which are expensive. Also, the L-CBCH
