Abstract: Constrained turbo block convolutional (CTBC) codes are developed for 100G and beyond optical transmissions. The CTBC codes developed herein each fit within one optical transport network (OTN) frame. The CTBC codes involve a simple outer block code that is serially concatenated with a simple inner recursive convolutional code using a constrained interleaver that simultaneously delivers a high interleaver gain and a high minimum Hamming distance. Codes with 11.1 dB net coding gain (NCG) at 12.5% overhead (OH), 11.3 dB NCG at 15% OH, 11.6 dB NCG at 20% OH, and 11.9 dB NCG at 23.4% OH are reported. Compared with other codes that have been previously proposed for OTN applications, CTBC codes have much lower encoding/decoding complexity and improved NCG/OH tradeoffs, and they avoid negative error floor effects.
I. INTRODUCTION

RESEARCH and development into the 100G and beyond optical transport network (OTN) continues to evolve [1]-[9]. Soft decision forward error correction coders (SD FECs) are being developed to cope with signal strengths that are about 10 dB weaker in 100G transmissions than in 10G transmissions. SD FECs are designed to provide high net coding gains (NCGs) at $\mathrm{BER}_{out} = 10^{-15}$ and overheads (OHs) at or below 20% as recommended by [2], although codes with OHs as high as 25% are reported in [2]. SD FECs that are considered for OTN applications are mainly either turbo product codes (TPCs), also known as block turbo codes (BTCs), or low density parity check (LDPC) codes. TPCs with 11.1 dB NCG at 15% OH and 11.3 dB NCG at 20% OH are currently being used in the industry [5]-[7]. LDPC codes, including non-binary LDPC codes with multi-dimensional signaling, are often recommended for OTN applications by concatenating with other powerful codes to mitigate their adverse error floor effects at $\mathrm{BER}_{out} = 10^{-15}$ [1], [2], [8], [9].
In this letter, we develop a family of SD FECs called constrained turbo block convolutional (CTBC) codes. CTBC codes are structured similarly to TPC, but use a simpler outer block component code and a much simpler recursive convolutional inner code to provide an improved NCG/OH tradeoff and a lower encoding/decoding complexity as compared to a TPC. CTBC codes avoid adverse error floor effects and achieve their NCG improvements through the use of a constrained interleaver [10] that simultaneously provides a high minimum Hamming distance (MHD) and a high interleaver gain.
Manuscript received January 23, 2014; revised March 7, 2014.
II. CONSTRAINED INTERLEAVERS FOR OTN APPLICATIONS
Constrained interleavers were developed in [10] and [11]. Normally, row/column interleavers or helical interleavers are used with TPCs to ensure that the MHD of the TPC is equal to the product distance, MHD $= d_o d_i$, where $d_o$ and $d_i$ are the MHDs of the outer and inner codes, respectively; however, they provide little or no interleaver gain [11]. In [11], constrained interleavers (CIs) are presented that provide an interleaver gain similar to uniform interleaving, while simultaneously maintaining the product distance of a TPC. In [10], the CI design technique in [11] is extended to concatenations with inner recursive codes. The CI-2 interleaver design technique described in [10] is used to construct the above-mentioned CTBC codes. As shown in Fig. 1, CTBC codes are constructed similarly to a TPC, but achieve improved performance over a TPC using a less powerful outer code, B, a CI-2 interleaver [10], and a much simpler recursive convolutional code (RCC) for the inner code. The CI-2 interleaver of [10] implements a semi-random permutation function that achieves an interleaver gain similar to fully random interleaving, but also meets a set of deterministic constraints that are designed to cause the MHD of the resulting CTBC code to be as high as MHD $= d_o^2 d_i$. The CTBC codes of Fig. 1 use the CI-2 to increase both MHD and interleaver gain so that the less powerful outer code B and the much simpler inner RCC can be used to achieve an improved NCG/OH tradeoff as compared to previously reported SD FECs for the OTN application. Typically, a single or double error correcting outer block code B, along with a simple accumulator, which has generator function $G(D) = 1/(1 + D)$, rate one, and MHD $d_i = 1$, can be used to generate powerful CTBC codes.
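As a small illustration of the accumulator named above, the following Python sketch implements $G(D) = 1/(1 + D)$ as a running XOR. It is not taken from [10], and the function name `accumulate_bits` is hypothetical. Under these assumptions it shows the recursive behavior (a single input one produces a run of output ones) and why $d_i = 1$ (two adjacent input ones produce a single output one).

```python
from itertools import accumulate
from operator import xor

def accumulate_bits(bits):
    """Rate-one accumulator G(D) = 1/(1 + D): y[t] = x[t] XOR y[t-1], with y[-1] = 0."""
    return list(accumulate(bits, xor))

# A weight-1 input never returns the accumulator to the zero state.
single_one = accumulate_bits([1, 0, 0, 0, 0])
# Two adjacent ones return it to zero immediately, giving output weight 1.
adjacent_ones = accumulate_bits([0, 1, 1, 0, 0])

print(single_one, sum(single_one))
print(adjacent_ones, sum(adjacent_ones))
```

Because low-weight outputs arise only from specific input patterns, the interleaver can be constrained so that such patterns from the outer code rarely line up, which is the role the CI-2 plays in the construction.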
1041-1135 © 2014 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
As discussed in [10], the permutation function of the CI-2 interleaver that satisfies all constraints can be designed using a row/column structure with $\rho$ codewords of the outer code B per row. Each codeword of the outer code B is placed on a randomly selected row until the CI-2 row/column structure is filled. Then the CI-2 bit-wise-randomizes the contents of rows subject to an additional set of pre-selected inter-row constraints. Using the notations in [10], the inter-row constraints ensure that the columns occupied by coded bits of a codeword on row $i$ and coded bits of a codeword on row $(i - l)$ can share no more than $k(l)$ columns, for row separations $l$ up to a maximum of $l_{\max}$. The values $l_{\max}$ and $k(l)$, $l = 1, 2, \ldots, l_{\max}$, uniquely define all inter-row constraints. Rows of CI-2 can be filled one row at a time, and one coded bit position at a time of every codeword on that row up to the last coded bit position. When randomizing any bit position of any codeword, CI-2 first uses the inter-row constraints to identify bit positions where a particular bit cannot be placed, and then randomly selects an interleaved bit position among all remaining bit positions on that row for that bit. This randomization process achieves the highest interleaver gain subject to all inter-row constraints, thereby maintaining the desired MHD. As per Lemma 1 of [10], a CI-2 interleaver can be designed to achieve a pre-selected MHD value for the concatenation that can vary from the product distance $d_o d_i$ up to $d_o^2 d_i$.
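The row-filling procedure described above can be illustrated with a toy Python sketch. It is a simplification under stated assumptions, not the CI-2 design of [10]: codewords are filled one after another on each row, a row is simply restarted when a bit has no admissible column (the restart strategy is an assumption of this sketch), and the names `try_fill_row`, `build_rows` and `max_shared_columns` are hypothetical. The toy parameters are far smaller than those of an OTN frame.

```python
import random

def try_fill_row(rng, i, rho, n, k, owner):
    """Attempt one random fill of row i; return the row, or None on a dead end."""
    width = rho * n
    row = [None] * width                    # row[col] = codeword occupying col
    free = set(range(width))
    active = [l for l in k if i - l >= 0]   # row separations that apply to row i
    for c in range(rho):
        # shared[l][c2]: columns codeword c already shares with codeword c2 on row i-l
        shared = {l: {} for l in active}
        for _ in range(n):
            # Exclude columns that would push any overlap count above k(l).
            allowed = [col for col in sorted(free)
                       if all(shared[l].get(owner[i - l][col], 0) < k[l]
                              for l in active)]
            if not allowed:
                return None
            col = rng.choice(allowed)       # random choice among admissible columns
            row[col] = c
            free.remove(col)
            for l in active:
                c2 = owner[i - l][col]
                shared[l][c2] = shared[l].get(c2, 0) + 1
    return row

def build_rows(L, rho, n, k, seed=0, max_tries=10000):
    """Fill L rows of rho codewords (n bits each) under constraints k = {l: k(l)}."""
    rng = random.Random(seed)
    owner = []
    for i in range(L):
        for _ in range(max_tries):
            row = try_fill_row(rng, i, rho, n, k, owner)
            if row is not None:
                owner.append(row)
                break
        else:
            raise RuntimeError(f"no valid fill found for row {i}")
    return owner

def max_shared_columns(owner, i, l):
    """Largest number of columns shared by a codeword on row i and one on row i-l."""
    counts = {}
    for pair in zip(owner[i], owner[i - l]):
        counts[pair] = counts.get(pair, 0) + 1
    return max(counts.values())

# Toy example: 4 rows, 6 codewords of 4 bits per row, k(1) = 1 and k(2) = 2.
owner = build_rows(L=4, rho=6, n=4, k={1: 1, 2: 2})
print([max_shared_columns(owner, i, 1) for i in range(1, 4)])
```

The sketch only demonstrates how the inter-row constraints restrict an otherwise random placement; choosing $L$, $l_{\max}$ and $k(l)$ to reach a target MHD follows [10] and is not reproduced here.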
To achieve a pre-selected MHD value, the required number of rows $L$ and the set of inter-row constraints can be chosen according to equations (3) and (4) in [10]. In this letter, the maximum allowed interleaver length is restricted to a single OTN frame size, so that in some cases the pre-selected MHD value is less than $d_o^2 d_i$. However, in systems where an integer number of OTN frames may be coded and transmitted together, longer CI-2 interleavers can be designed to provide higher MHDs and higher interleaver gains to thereby achieve better NCG/OH tradeoffs than those reported herein.
Following [10], to determine the best CI-2 interleaver whose length is equal to the OTN frame size, (a) find the set of possible MHD values and design the corresponding CI-2 interleaver by properly selecting the parameters $L$, $l_{\max}$ and $k(l)$, $l = 1, 2, \ldots, l_{\max}$, at each of the candidate achievable pre-selected MHD values, (b) find their error floor variations according to [10], and (c) based on the error floor variations, select the most suitable MHD and the corresponding CI-2 interleaver to maximize NCG at a given $\mathrm{BER}_{out} = 10^{-15}$.
The above-described CI-2 design technique need only be executed once, prior to VLSI design time. At run time, in a massively parallel VLSI SISO decoder, each processor alternates between performing outer decoding on one or more codewords of the outer code B, and inner decoding on a subsequence of elements of the RCC using an approach similar to [12] and/or the references cited therein. To implement the parallel CI-2 interleaver/deinterleaver, similar to Figs. 5 and 6 in [12] , a state machine periodically generates a predetermined control and addressing sequence that causes an interconnection network followed by a parallel set of dual port output buffers to implement the desired semi-randomized permutation function. As the parallel processors update each new set of extrinsic information elements, the parallel interleaver is busy distributing each previously updated extrinsic information element to its respective target location in a target dual ported buffer associated with a target processor.
III. PROPOSED CODES
In this section, four different CTBC codes, labeled CTBC-1, CTBC-2, CTBC-3 and CTBC-4, are constructed as illustrated in Fig. 1 . All codes considered here are compatible with a single standard OTN frame that includes 122,368 message bits plus coding overhead bits. The details of CTBC-1 through CTBC-4 are presented in this section, and for reference, the key properties of each of these codes are presented in Table I .
A. CTBC-1
As illustrated in Fig. 1, CTBC codes are constructed by concatenating an outer block code B, via a CI-2 interleaver [10], with an inner recursive convolutional code. In order to fit into the OTN frame size, CTBC-1 selects B to be a simple single bit error correcting (72, 64) code with $d_o = 4$ and uses the accumulator, with $d_i = 1$, as the inner code. Its CI-2 is designed to achieve MHD $= d_o^2 d_i = 16$ and to provide an interleaver gain similar to that of the uniform interleaver. Using the notations in [10], the CI-2 used in CTBC-1 is designed to have $L = 8$ rows, $\rho = 239$ codewords of B per row and a set of inter-row constraints defined by $l_{\max} = 3$, $k(1) = k(2) = 1$ and $k(3) = 2$. Using OH $= (1/R - 1) \times 100\%$ [13], where $R$ is the overall rate of the code, the OH of CTBC-1 is 12.5%.
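The frame fit and overhead of CTBC-1 can be checked with a few lines of arithmetic. The Python sketch below assumes the (72, 64) outer code with $d_o = 4$ that Section III-D attributes to CTBC-1 through CTBC-3 and the rate-one accumulator; the function name `overhead_percent` is hypothetical.

```python
from fractions import Fraction

def overhead_percent(rate):
    """OH = (1/R - 1) x 100%, as defined in the text."""
    return (1 / rate - 1) * 100

L_ROWS, RHO = 8, 239        # CI-2 rows and codewords per row (from the text)
N_OUTER, K_OUTER = 72, 64   # (72, 64) outer code, assumed from Section III-D

codewords = L_ROWS * RHO
message_bits = codewords * K_OUTER   # should equal the 122,368-bit OTN payload
coded_bits = codewords * N_OUTER     # the accumulator is rate one
rate = Fraction(message_bits, coded_bits)

print(message_bits, coded_bits, float(overhead_percent(rate)))
```

With these values, $8 \times 239 = 1912$ outer codewords carry exactly 122,368 message bits, and the rate $R = 8/9$ gives the stated 12.5% OH.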
B. CTBC-2 and CTBC-3
The CTBC-2 and CTBC-3 codes use an inner recursive convolutional code with $d_i = 2$. They are constructed by coupling the output of a CTBC-1 encoder described in Section III-A with a $(\lambda, \lambda - 1)$ single parity check (SPC) encoder. The value of $\lambda$ can be properly chosen to achieve a desired OH. Specifically, we have chosen $\lambda = 46$ to generate CTBC-2 with 15% OH and $\lambda = 16$ to generate CTBC-3 with 20% OH. At the decoder, each of these codes can be considered to be a single concatenated code with an inner code that is a combination of the accumulator and the SPC code. This combination follows a 4-state trellis structure shown in Fig. 2, where every state $S = (S_A, S_B)$ is defined by the accumulator state $S_A$ and the SPC state $S_B$. During every $\lambda$th interval, in which the extra parity bit $b_p$ is added by the SPC code, the combined state structure in Fig. 2 ensures that $S_B = 0$ at the end of that interval without modifying $S_A$, thereby preserving the recursive nature of the inner convolutional code. The combined inner recursive convolutional code formed by the accumulator and the SPC code has MHD $d_i = 2$, and thus provides additional power as compared to the accumulator alone, which has MHD $d_i = 1$. CTBC-2 and CTBC-3 use the same CI-2 as the CTBC-1 code and have the same MHD $= 16$ as CTBC-1, but provide additional NCG due to the increased reliability of the extrinsic information that gets passed to B from the accumulator and SPC combination during iterative decoding. While the CI-2 in CTBC-2 and CTBC-3 can be redesigned to increase the MHD to MHD $= d_o^2 d_i = 32$, such a design would require a higher number of OTN frames and is not considered here. If one sets $\lambda = 10$, a code with NCG $= 11.8$ dB and OH $= 25\%$ results. However, as shown below, a more efficient CTBC code can be constructed.
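To make the combined inner code concrete, the following Python sketch encodes with an accumulator followed by a $(\lambda, \lambda - 1)$ SPC code while tracking the state pair $(S_A, S_B)$. It is one reading of the description above rather than the authors' implementation: it assumes the parity bit is computed over the accumulator outputs of each SPC block and that the input length is a multiple of $\lambda - 1$. The names `accumulator_spc_encode` and `overhead_percent` are hypothetical, and the outer rate 64/72 is taken from the (72, 64) code mentioned in Section III-D.

```python
from fractions import Fraction

def accumulator_spc_encode(bits, lam):
    """Accumulator followed by a (lam, lam-1) SPC code, with state (s_a, s_b)."""
    s_a = 0   # accumulator state: the last accumulator output
    s_b = 0   # SPC state: running parity of the current SPC block
    out = []
    for x in bits:
        s_a ^= x              # accumulator update, y[t] = x[t] XOR y[t-1]
        out.append(s_a)
        s_b ^= s_a
        if len(out) % lam == lam - 1:
            out.append(s_b)   # extra parity bit b_p closes the SPC block
            s_b = 0           # S_B returns to 0; S_A is left unchanged
    return out

def overhead_percent(outer_rate, lam):
    """OH of the concatenation with a rate-(lam-1)/lam combined inner code."""
    rate = outer_rate * Fraction(lam - 1, lam)
    return (1 / rate - 1) * 100

encoded = accumulator_spc_encode([0, 1, 1, 0, 1, 0], lam=4)
print(encoded)

outer_rate = Fraction(64, 72)
for lam in (46, 16, 10):
    print(lam, float(overhead_percent(outer_rate, lam)))
```

Every SPC block in the output has even weight, which is why the combined inner code has $d_i = 2$, and the overhead values reproduce the 15%, 20% and 25% figures quoted for $\lambda = 46$, 16 and 10.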
C. CTBC-4
The CTBC-4 code uses a double error correcting (79, 64) shortened extended BCH code with MHD $= 6$ as the outer code B and, like CTBC-1, uses the accumulator as the inner code. In order to fit into the OTN frame, the CI-2 interleaver of CTBC-4 was designed the same way as that of CTBC-1 with $L = 8$ rows, $\rho = 239$ codewords of B per row and a set of inter-row constraints defined by $l_{\max} = 3$,
D. Decoding and BER Performance
The proposed codes CTBC-1 through CTBC-4 can each be decoded as a single concatenation by running SISO iterations. The accumulator can be soft decoded using the standard BCJR algorithm [14] or the min-sum algorithm [15], while the BCH code B can be soft decoded using the reduced complexity versions of either the Pyndiah algorithm [16], [17], ordered statistics decoding (OSD) [18], [19], or using other known methods of soft decoding block codes [14]. The combined inner decoder in CTBC-2 and CTBC-3 can be soft decoded using the BCJR algorithm according to the state diagram in Fig. 2. Fig. 3 shows the simulated $\mathrm{BER}_{out}$ variations of CTBC-1 through CTBC-4 with $Q_{in}$ over a Gaussian channel with BPSK signaling, where $Q_{in}$ is defined using the input BER to the FEC, $\mathrm{BER}_{in}$, as $Q_{in} = \sqrt{2}\,\mathrm{erfc}^{-1}(2\,\mathrm{BER}_{in})$ [3]. The simulated results in Fig. 3 have been collected by decoding the accumulator (and the combined accumulator/SPC inner recursive code in the case of CTBC-2 and CTBC-3) using the BCJR algorithm and running eight SISO iterations. In Fig. 3, for the decoding of every codeword of the outer block code B, we have separately considered OSD decoding [18] and simplified Pyndiah decoding with $p = 6$ least reliable received bits [16], [17]. In the case of OSD, we have chosen the order of OSD to be $\lceil d_o/4 \rceil$ [18]. Hence, second order OSD, OSD(2), is used in CTBC-4, which employs the (79, 64) outer code with $d_o = 6$, while first order OSD, OSD(1), is used in all the other CTBC codes, which employ the (72, 64) outer code with $d_o = 4$.
It was verified in [10] that CTBC code BER simulation results match well with the error floor calculated using equations (6) and (7) in [10]. Equations (6) and (7) respectively account for the error floor contributions from all possible combinations of one and two weight-$d_o$ outer codewords that generate a minimum weight codeword of the CTBC. Each of these error floor contributions is calculated by multiplying its reduced CI-2 error coefficient by the Q function term corresponding to the MHD of the CTBC [10]. The same match was also observed with different outer codes using Pyndiah decoding. The error floors of CTBC-1 and CTBC-4 are found by directly following [10]. Hence, the projected $Q_{in}$ values of all CTBC codes to reach $\mathrm{BER}_{out} = 10^{-15}$ can be numerically found by extending their error rate variations in Fig. 3. Recalling that $Q_{in} = 18$ dB for uncoded signaling, the NCG of each variation in Fig. 3 can be calculated from the projected $Q_{in}$ value in Fig. 3 as NCG $= 18 - Q_{in} + 10\log_{10} R$ at $\mathrm{BER}_{out} = 10^{-15}$. Table I lists the NCG at $\mathrm{BER}_{out} = 10^{-15}$, OH and other properties of the four proposed CTBC codes. Below Table I are provided the numerically found best scaling factors $\alpha_i = (\alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,8})$ used in soft decoding to scale the extrinsic information generated by the $i$th decoding stage, where $i = 1$ corresponds to the inner code, $i = 2$ corresponds to the outer block code, and $\alpha_{i,j}$ denotes the scaling factor used by the $i$th stage during the $j$th iteration. When simulated at the $Q_{in}$ values corresponding to the NCG values of Table I, we observed zero errors in over 30 million OTN frames of each CTBC code, suggesting that the error probability curves can be extended as shown in Fig. 3.
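The relations between BER, $Q$ and NCG used above can be evaluated numerically. The Python sketch below relies on the identity $\sqrt{2}\,\mathrm{erfc}^{-1}(2p) = -\Phi^{-1}(p)$, where $\Phi$ is the standard normal CDF, so that only the standard library is needed. The function names are hypothetical, and the worked numbers assume the CTBC-1 figures quoted in this letter (NCG of 11.1 dB at $R = 8/9$).

```python
from math import log10
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def q_db_from_ber(ber):
    """Q in dB for a BER below 0.5: Q = sqrt(2) * erfcinv(2 * BER) = -Phi^{-1}(BER)."""
    return 20 * log10(-_STD_NORMAL.inv_cdf(ber))

def ber_from_q_db(q_db):
    """Inverse mapping, intended here for moderate Q values such as the FEC input."""
    return _STD_NORMAL.cdf(-(10 ** (q_db / 20)))

def ncg_db(q_in_db, rate, q_uncoded_db=18.0):
    """NCG = 18 - Q_in + 10 log10(R), with all Q values in dB."""
    return q_uncoded_db - q_in_db + 10 * log10(rate)

def q_in_db_for_ncg(ncg, rate, q_uncoded_db=18.0):
    """Q_in (dB) at which a code of the given rate attains the given NCG."""
    return q_uncoded_db - ncg + 10 * log10(rate)

# Uncoded signaling at BER = 1e-15 corresponds to Q of about 18 dB.
print(q_db_from_ber(1e-15))

# CTBC-1 figures from this letter: NCG = 11.1 dB at R = 8/9.
q_in = q_in_db_for_ncg(11.1, 8 / 9)
print(q_in, ber_from_q_db(q_in))
```

The last line gives the input $Q_{in}$ and the corresponding pre-FEC $\mathrm{BER}_{in}$ implied by the quoted NCG, which is the operating point at which the letter reports observing zero errors.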
E. Complexity
Due to the use of much simpler component codes, both the decoding and encoding complexity of CTBC-1 through CTBC-4 are significantly lower than those of the previously known SD FECs. The decoding complexity depends on the number of decoding iterations and the number of operations required in the decoding of each component code per message bit during each SISO iteration. The soft decoding of the accumulator using the min-sum algorithm requires only $2/R$ additions and $3/R$ min/max operations [15], where $R$ is the overall rate of the CTBC code. The accumulator in CTBC-1 and CTBC-4, and the combination of the accumulator and the SPC code in CTBC-2 and CTBC-3, can also be decoded using the standard BCJR algorithm, which can be efficiently implemented in parallel as described in [12], or according to the sum product algorithm as described in [15]. As to the outer code B, soft decoding of a single codeword of an $(n, k)$ block code according to the reduced complexity Pyndiah algorithm in [16] requires less than $7 \times 2^p + 6n + 6$ real additions and multiplications, while hard decoding of different test patterns can be done using lookup tables with further simplifications [17]. Hence, soft decoding of B requires less than $14/R$ real additions and multiplications when $p = 6$. Parallel decoding of codewords reduces latency. OSD decoding [14] with a complexity on the order of $O(k^2)$, reduced complexity OSD decoding [19], or other algorithms for soft decoding of block codes [14] can be used to achieve the NCGs reported herein with OSD, at the cost of a complexity higher than that of Pyndiah decoding.
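The operation counts quoted above can be tabulated directly. The Python sketch below assumes that the per-codeword Pyndiah bound is normalized per message bit by dividing by $k$, which is an interpretation of the text rather than a statement from [16], and it uses the outer-code rate alone for $R$ (appropriate for CTBC-1 and CTBC-4, whose inner code is rate one). The function names are hypothetical.

```python
def pyndiah_ops_per_codeword(n, p):
    """Upper bound 7 * 2^p + 6n + 6 on real additions and multiplications."""
    return 7 * 2**p + 6 * n + 6

def pyndiah_ops_per_message_bit(n, k, p):
    """Per-message-bit normalization (assumed here to be a division by k)."""
    return pyndiah_ops_per_codeword(n, p) / k

def accumulator_min_sum_ops(rate):
    """(additions, min/max operations) per message bit for the accumulator."""
    return 2 / rate, 3 / rate

for n, k in ((72, 64), (79, 64)):    # outer codes of CTBC-1 and CTBC-4
    rate = k / n
    per_bit = pyndiah_ops_per_message_bit(n, k, p=6)
    print((n, k), per_bit, 14 / rate, accumulator_min_sum_ops(rate))
```

Under these assumptions the outer-code bound evaluates to fewer than 15 operations per message bit for both outer codes, which is consistent with the stated $14/R$ limit.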
F. Comparison
We compare the performance and complexity of the proposed CTBC-1 through CTBC-4 codes with the most promising previously known codes for OTN applications [1]-[9]. As stated in Section II, similar to SD FECs involving both LDPC codes and TPCs, all CTBC codes discussed here can be implemented using massive parallelism. Fig. 4 illustrates the NCG/OH tradeoff of the proposed CTBC codes along with the best known TPC-based SD FECs as used in the industry [5]-[7] as well as the LDPC codes of [8], [9]. Fig. 4 shows that CTBC-1 through CTBC-4 provide a range of improved NCG/OH tradeoffs. Performance of the CTBC codes versus various other LDPC codes can be seen by comparing Fig. 4 to the figure in Appendix A of [2]. The NCGs of the CTBC codes as shown in Fig. 4 were obtained using the OSD decoding of the outer block code B. Compared with TPCs, CTBC codes employ a shorter and simpler outer code and a much simpler inner code. As a result, CTBC codes have a significantly lower complexity than known powerful TPCs [5]-[7].
IV. CONCLUSION
Powerful constrained turbo block convolutional (CTBC) codes that are compatible with a single OTN frame have been constructed for 100G and beyond applications. Specifically, four different CTBC codes have been proposed. CTBC codes have been constructed by concatenating relatively simple component codes and using a constrained interleaver type 2 (CI-2) [10] to boost the MHD of the concatenation to as high as the square of the MHD of the outer code and to reduce the associated error coefficients. This allows the proposed CTBC codes to provide a better NCG/OH tradeoff without any negative error floor effects, and to do so with a much lower encoding/decoding complexity as compared to previously known SD FECs for OTN applications.
