Forward error correction (FEC) is one of the key technologies enabling next-generation high-speed fiber-optic communications. In this paper, we propose a rate-adaptive scheme using a class of generalized low-density parity-check (GLDPC) codes with a Hamming code as the local code. We show that, with the proposed unified GLDPC decoder architecture, variable net coding gains (NCGs) can be achieved with no error floor down to a BER of $10^{-15}$. This makes the scheme a viable solution for next-generation high-speed fiber-optic communications.
INTRODUCTION
To date, commercial fiber-optic communication systems have employed a fixed forward error correction (FEC) code and a fixed constellation size. To closely approach the Shannon limit, these systems have used iteratively decodable FEC codes, such as turbo product codes (TPC) and low-density parity-check (LDPC) codes with soft-decision decoding (SDD). Rate-adaptive techniques enable higher information rates over short links and reliable transmission over long links. They are likely to become more important as network traffic demands keep increasing. In the literature, serial concatenation of two Reed-Solomon (RS) codes can provide rate variation by using shortening and puncturing techniques [1]. More recently, LDPC codes of different code rates have been demonstrated as an alternative solution for rate adaptation [2]. Adjusting both the field size and the code rate of non-binary LDPC codes has been shown to bring more flexibility [3].
The concept of generalized LDPC (GLDPC) codes was first introduced in [4]. A class of well-constructed codes was later proposed and simulated in [5] [6] [7]. Since then, braided BCH codes, staircase codes, and spatially coupled LDPC codes have been viewed as specific classes of GLDPC codes [8] [9] [10].

In this paper, we construct GLDPC codes by replacing parity-check equations in the parity-check matrix of a global LDPC code with a simple linear block code. Decoding relies on several low-complexity soft-input soft-output (SISO) linear block decoders operating in parallel. After a small number of iterations, these decoders provide more accurate estimates of bit reliabilities than the single-parity-check decoders of a global LDPC decoder. However, because of the high complexity of the BCJR decoder, GLDPC coding is limited to simple linear block component codes such as Hamming, binary BCH, and Reed-Muller (RM) codes. GLDPC codes have already been studied for optical communication applications; see [11].

In this paper, we propose a rate-adaptive scheme based on GLDPC codes that use Hamming codes as component codes. We show that a carefully designed GLDPC code can achieve a larger minimum distance and a larger girth than conventional LDPC codes. We also provide a software-reconfigurable FPGA decoder for the proposed GLDPC codes and show their remarkable flexibility for code-rate adaptation.
The paper is organized as follows. In Section 2, we describe the proposed class of GLDPC codes, derived from Hamming codes, which is suitable for ultra-high-speed optical transport networks. In Section 3, we present the FPGA-based decoder architecture of the proposed GLDPC codes, including the generalized check-node processor. The performance analysis is provided in Section 4, and some important concluding remarks are provided in Section 5.
GENERALIZED LDPC CODES

Design of GLDPC codes
In this section, we consider the design of high-rate GLDPC codes with low error floors. This process consists of two steps.

The first step is based on the same principle as the design of LDPC codes. For a given code rate, density evolution [12] or extrinsic information transfer (EXIT) chart analysis [13] can be used to derive the decoding thresholds of bipartite graphs. The node distributions of these graphs are optimized to minimize the signal-to-noise ratio threshold. However, the search for a good mixture of component codes can be impractical because of the large set of possible node types involved in GLDPC codes. For efficient implementation, we therefore first design a quasi-cyclic LDPC code with column weight $w_c$ and row weight $w_r$ based on permutation matrices [14]. Its parity-check matrix can be written as

$$
\mathbf{H} = \begin{bmatrix}
\mathbf{P}_{0,0} & \mathbf{P}_{0,1} & \cdots & \mathbf{P}_{0,w_r-1} \\
\mathbf{P}_{1,0} & \mathbf{P}_{1,1} & \cdots & \mathbf{P}_{1,w_r-1} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{P}_{w_c-1,0} & \mathbf{P}_{w_c-1,1} & \cdots & \mathbf{P}_{w_c-1,w_r-1}
\end{bmatrix},
$$

where $\mathbf{P}_{i,j}$ represents the $B \times B$ circulant permutation matrix with a one at column $(r + s_{i,j}) \bmod B$ for row $r$.

Once the quasi-cyclic LDPC code is constructed, we can substitute a certain number of its single-parity-check codes with simple block codes, such as Hamming, Reed-Muller, or BCH codes. In this paper, we use only Hamming codes because of their low implementation complexity. By varying the number of single-parity-check codes that are replaced, the code rate can be fine-tuned. Suppose the original LDPC code has parameters (N, K, M) and code rate $R_{\mathrm{LDPC}}$, and every d-th single-parity check is replaced with an (n, k, m) block code. The resulting code rate $R_{\mathrm{GLDPC}}$ is then bounded by
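As a rough cross-check of how d controls the rate, consider the following assumptions, which are illustrative rather than taken from the paper:

- M counts the single-parity checks of the mother code.
- Each replaced check contributes the n − k parity constraints of the (n, k) component code instead of one.
- All constraints are linearly independent.
- The number of replaced checks is the real-valued M/d.

Under these assumptions, $R_{\mathrm{GLDPC}} \approx \big(K - (M/d)(n-k-1)\big)/N$. Dependent constraints would make the true rate slightly higher. The Python sketch below evaluates this estimate for the mother code of Section 4. The function name `gldpc_rate` is hypothetical, and M = 3N/15 = 6927 follows from the (3, 15)-regular structure.

```python
import math


def gldpc_rate(N, K, M, d, n=15, k=11):
    """Estimate the GLDPC code rate when every d-th single-parity check (SPC)
    of an (N, K, M) mother LDPC code is replaced by an (n, k) component code.

    Assumptions (illustrative, not from the paper): each replaced SPC adds
    (n - k - 1) extra, linearly independent parity constraints, and the number
    of replaced checks is taken as the real-valued M / d.
    """
    if math.isinf(d):
        return K / N  # no replacement: the mother LDPC code itself
    extra_constraints = (M / d) * (n - k - 1)
    return (K - extra_constraints) / N


if __name__ == "__main__":
    N, K = 34635, 27710
    M = 3 * N // 15  # (3, 15)-regular code: 6927 check nodes
    for d in (math.inf, 127, 63, 31, 15):
        print(d, round(gldpc_rate(N, K, M, d), 4))
```

For d = ∞, 127, 31, and 15, the rounded estimates coincide with the rates quoted in Section 4. For d = 63, the estimate is 0.7905 versus the quoted 0.7906, presumably because of how the non-integer count M/d is rounded.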
Example: Let us start with a very simple code with 6 check nodes and 45 variable nodes, in which each check node is connected to 15 variable nodes. Thus, the check nodes have degree 15 and the variable nodes have degree 2. Both the original and the GLDPC parity-check matrices are given as follows
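A minimal Python sketch of how matrices with these parameters can be built uses a quasi-cyclic array of 2 × 15 circulant permutation blocks of size 3 × 3. This gives 6 checks of degree 15 and 45 variables of degree 2. Every d-th check is then replaced by the four rows of a (15, 11) Hamming parity-check matrix, applied to that check's 15 variable nodes. The shift values, the choice d = 3, and all function names are hypothetical choices for illustration, not the authors' matrices.

```python
def circulant(B, shift):
    """B x B circulant permutation matrix: row r has its one at column (r + shift) mod B."""
    return [[1 if c == (r + shift) % B else 0 for c in range(B)] for r in range(B)]


def qc_parity_check(shifts, B):
    """Stack the circulant blocks circulant(B, shifts[i][j]) into a binary matrix H."""
    H = []
    for shift_row in shifts:
        blocks = [circulant(B, s) for s in shift_row]
        for r in range(B):
            H.append([bit for blk in blocks for bit in blk[r]])
    return H


# (15, 11) Hamming parity-check matrix: column j is the 4-bit binary form of j + 1.
HAMMING_15_11 = [[((j + 1) >> b) & 1 for j in range(15)] for b in range(4)]


def to_gldpc(H, d, H_local):
    """Replace every d-th check row (c % d == 0) by the rows of H_local,
    applied to the variable nodes that the replaced check is connected to."""
    G = []
    for c, row in enumerate(H):
        if c % d != 0:
            G.append(row)  # this check stays a single-parity check
            continue
        support = [v for v, bit in enumerate(row) if bit]
        assert len(support) == len(H_local[0]), "check degree must equal n"
        for local_row in H_local:
            new_row = [0] * len(row)
            for v, bit in zip(support, local_row):
                new_row[v] = bit
            G.append(new_row)
    return G


if __name__ == "__main__":
    B = 3
    shifts = [[0] * 15, [j % B for j in range(15)]]  # hypothetical shift values
    H = qc_parity_check(shifts, B)
    G = to_gldpc(H, d=3, H_local=HAMMING_15_11)  # checks 0 and 3 become Hamming checks
    print(len(H), len(H[0]), len(G))  # prints: 6 45 12
```

Each replaced single-parity check turns into four Hamming checks of weight 8 over the same 15 variables. The row count therefore grows by three per replaced check, which is what lowers the rate.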
Decoding of GLDPC codes
For the sake of completeness, we provide the layered decoding algorithm of the proposed GLDPC code with a Hamming code as the component code; the algorithm is, however, applicable to any linear block code used as the local code. We use the following notation:

- $R_{c,v}^{(k,l)}$ denotes the check-$c$-to-variable-$v$ message at the $k$-th iteration and $l$-th layer.
- $L_{v,c}^{(k,l)}$ denotes the variable-$v$-to-check-$c$ message at the $k$-th iteration and $l$-th layer.
- $L_v$ denotes the log-likelihood ratio (LLR) from the channel.

Here $k = 1, \dots, k_{\max}$ and $l = 1, \dots, N_L$. The layered min-max algorithm is adopted in this paper, and its data flow can be summarized as follows:
• Bit decision step:
• Variable node processing rule:
• Check node processing rule:
◦ If the check node is a local code, the BCJR-based APP updating rule is applied:
◦ Otherwise, the scaled min-sum updating rule is applied:
The index $k'$ in Eq. (8) is set to $k - 1$ when $l < l_c$, where $l_c$ is the layer containing check node $c$, and to $k$ otherwise. The implementation of Eq. (9) is discussed in the next section. Each layer involves only a $1/N_L$ portion of the check nodes, where $N_L$ is the number of layers. For this reason, early termination is declared when the decoded bits converge to the transmitted codeword.
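The data flow above can be sketched in Python for the single-parity-check nodes only. This minimal sketch keeps a running posterior LLR per variable node, which is one common way to realize layered scheduling. Subtracting a check's previous message means each variable-to-check message automatically combines two kinds of input: the newest check-to-variable messages from layers already processed in the current iteration, and the previous iteration's messages from the remaining layers. This is the behaviour the iteration-index rule for Eq. (8) describes.

In the GLDPC decoder, a check flagged as a local code would instead call the BCJR-based SISO decoder. The following are assumptions rather than the paper's design:

- the scaling factor 0.75;
- the list-of-layers data layout;
- the zero-syndrome early-termination test, which is swapped in for the paper's criterion of convergence of the decoded bits.

```python
def layered_scaled_min_sum(layers, llr, alpha=0.75, max_iter=10):
    """Layered decoding with the scaled min-sum check-node rule (SPC checks only).

    layers : list of layers; each layer is a list of checks, and each check is the
             list of variable-node indices it connects to (hypothetical layout)
    llr    : channel LLRs, log P(x = 0) / P(x = 1)
    alpha  : constant scaling factor (0.75 is an illustrative value)
    Returns (hard_decisions, iterations_used).
    """
    post = list(llr)  # running a posteriori LLR of every variable node
    # Check-to-variable messages from the previous visit of each check.
    R = [[[0.0] * len(chk) for chk in layer] for layer in layers]
    hard = [0 if p >= 0 else 1 for p in post]
    for it in range(1, max_iter + 1):
        for l, layer in enumerate(layers):
            for c, chk in enumerate(layer):
                # Variable-to-check messages: posterior minus this check's old message.
                q = [post[v] - R[l][c][i] for i, v in enumerate(chk)]
                mags = [abs(x) for x in q]
                i_min = min(range(len(q)), key=mags.__getitem__)
                min1 = mags[i_min]
                min2 = min(m for i, m in enumerate(mags) if i != i_min)
                sign_all = 1
                for x in q:
                    if x < 0:
                        sign_all = -sign_all
                for i, v in enumerate(chk):
                    # Sign of the product over all other inputs of this check.
                    sign_i = sign_all * (-1 if q[i] < 0 else 1)
                    R[l][c][i] = alpha * sign_i * (min2 if i == i_min else min1)
                    post[v] = q[i] + R[l][c][i]  # refresh the posterior immediately
        hard = [0 if p >= 0 else 1 for p in post]
        # Early termination on a zero syndrome (swapped-in criterion).
        if all(sum(hard[v] for v in chk) % 2 == 0 for layer in layers for chk in layer):
            return hard, it
    return hard, max_iter


if __name__ == "__main__":
    # Toy (7, 4) Hamming code, one check per layer; all-zero codeword, bit 0 unreliable.
    layers = [[[0, 1, 2, 4]], [[1, 2, 3, 5]], [[0, 1, 3, 6]]]
    llr = [2.0] * 7
    llr[0] = -0.5
    print(layered_scaled_min_sum(layers, llr))
```

The "two smallest magnitudes" bookkeeping (`min1`, `min2`) mirrors how min-sum check-node processors are usually built in hardware. Each output needs only one comparison instead of a full minimum over the other inputs.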
ARCHITECTURE OF GLDPC DECODER
Overview
To verify the performance of the proposed GLDPC codes, we use a field-programmable gate array (FPGA) platform, whose high-level diagram is illustrated in Fig. 1. This platform is similar to other platforms reported in the literature [15] [16] [17]. It consists of three parts: a Gaussian noise generator, a GLDPC decoder, and an error counter circuit.

The Gaussian noise generator combines two LFSR-based uniform random number generators with the Box-Muller algorithm to generate samples of white Gaussian noise. The generated samples are multiplied by the standard deviation of the AWGN and fed to the GLDPC decoder as quantized log-likelihood ratios (LLRs).

The GLDPC decoder is based on the layered decoding algorithm [18]. It uses a scaled min-sum updating rule with a constant scaling factor for the single-parity-check decoders, and a maximum a posteriori (MAP) probability computation rule based on the BCJR algorithm for the local codes. Because the local codes are distributed uniformly among the check nodes, addressing the local codes is greatly simplified. Depending on the local-code valid signal, the most up-to-date variable-to-check message is routed either to the conventional check-node processor or to the BCJR-based MAP processor.
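The noise path described above can be modelled in software with a minimal Python sketch, assuming Galois-type LFSRs. The following are all illustrative choices, not values from the paper:

- the LFSR widths (32 and 31 bits), tap masks, and seeds;
- the 24-bit resolution of each uniform sample;
- the all-zero codeword sent as +1 BPSK symbols;
- the LLR quantization step.

The sketch maps two LFSR-derived uniforms to Gaussian samples with Box-Muller and scales them by the noise standard deviation. It then quantizes the channel LLRs to 5 bits, the LLR precision used in Section 4.

```python
import math


class GaloisLFSR:
    """Galois-type LFSR used as a pseudo-random bit source (illustrative parameters)."""

    def __init__(self, width, mask, seed):
        self.state = seed & ((1 << width) - 1)
        if self.state == 0:
            raise ValueError("LFSR seed must be nonzero")
        self.mask = mask

    def next_bit(self):
        out = self.state & 1
        self.state >>= 1
        if out:
            self.state ^= self.mask  # feed the output bit back into the tap positions
        return out

    def uniform(self, bits=24):
        """Pack `bits` output bits into an integer and map it into the open interval (0, 1)."""
        v = 0
        for _ in range(bits):
            v = (v << 1) | self.next_bit()
        return (v + 0.5) / (1 << bits)


def box_muller(u1, u2):
    """Map two independent uniforms in (0, 1) to two independent N(0, 1) samples."""
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)


def gaussian_samples(num, seeds=(0xACE1, 0x1234567)):
    """Unit-variance Gaussian samples from two LFSR-based uniform generators."""
    g1 = GaloisLFSR(32, 0x80200003, seeds[0])  # taps 32, 22, 2, 1 (assumed)
    g2 = GaloisLFSR(31, 0x48000000, seeds[1])  # taps 31, 28 (assumed)
    out = []
    while len(out) < num:
        out.extend(box_muller(g1.uniform(), g2.uniform()))
    return out[:num]


def quantized_llrs(noise, sigma, bits=5, step=1.0):
    """All-zero codeword sent as +1 BPSK symbols over AWGN with standard deviation
    sigma; returns channel LLRs 2y/sigma^2 quantized to symmetric signed
    `bits`-bit integers with quantization step `step` (step size is an assumption)."""
    limit = (1 << (bits - 1)) - 1
    llrs = []
    for n in noise:
        y = 1.0 + sigma * n  # unit-variance sample scaled by the noise standard deviation
        q = round(2.0 * y / sigma ** 2 / step)
        llrs.append(max(-limit, min(limit, q)))
    return llrs


if __name__ == "__main__":
    z = gaussian_samples(10000)
    print(sum(z) / len(z))  # sample mean, which should be close to 0
    print(quantized_llrs(z[:8], sigma=0.8))
```

Using two separate LFSRs for the two Box-Muller inputs avoids drawing both uniforms from one sequence. In a hardware generator, the logarithm, square root, and trigonometric functions would typically be replaced by lookup tables or piecewise approximations.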
BCJR-based APP decoder architecture
The implementation of the BCJR-based MAP decoder can be divided into three parts, as shown in Fig. 2:

- the first part calculates the forward and backward recursion likelihoods;
- the second part consists of the memories storing the intermediate data $\alpha$ and $\beta$;
- the third part is a combiner that calculates the output.

Since the trellis derived from a block code is time-variant, selection signals must be pre-stored in ROM so that the appropriate feedback can be selected at the inputs of the forward and backward recursion blocks. To keep complexity and latency reasonable, we replace the max-star operation by the max operation. We also adopt a bidirectional recursion scheme, which minimizes both the memory required for $\alpha$ and $\beta$ and the latency.

To be more specific, Fig. 3 shows the timing diagram for decoding the (15, 11) Hamming code. During the first half of the trellis, the forward and backward recursions are updated and their results are stored in memory. The current $\alpha$ and current $\beta$ are then combined to produce an output. After that, each cycle generates two outputs:

- one obtained by combining the current $\alpha$ with $\beta$ read from memory;
- the other obtained by combining the current $\beta$ with $\alpha$ read from memory.

This technique reduces the total latency to the length of the local code plus the latency of the parallel-to-serial conversion of the input and the serial-to-parallel conversion of the output. In summary, the complexity of the MAP decoder is reasonably low, which makes the proposed GLDPC code very promising.
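The arithmetic of this decoder can be checked in software with a minimal Python sketch of max-log-MAP decoding (BCJR with max in place of max-star). It runs on the syndrome trellis of the (15, 11) Hamming code, where the state is the 4-bit partial syndrome and the branch labels change from stage to stage; this is the time-variant trellis mentioned above. The sketch makes the following assumptions:

- the LLR convention is log P(0)/P(1);
- full forward and backward passes are run and both are stored, so the bidirectional schedule of Fig. 3, which reduces storage and latency, is not modelled;
- the function names are hypothetical.

The extrinsic outputs are what a local-code check node would return as check-to-variable messages.

```python
import math

NEG_INF = -math.inf


def hamming15_columns():
    """Columns of the (15, 11) Hamming parity-check matrix as 4-bit integers:
    column j is the binary representation of j + 1."""
    return [j + 1 for j in range(15)]


def maxlog_bcjr(llr, cols, m=4):
    """Max-log-MAP decoding on the syndrome trellis of a binary linear code.

    llr  : input LLRs, log P(x = 0) / P(x = 1), one per code bit
    cols : parity-check columns packed as m-bit integers (state = partial syndrome)
    Returns (app, ext): a posteriori LLRs and extrinsic LLRs (app - llr).
    """
    n, num_states = len(llr), 1 << m

    def gamma(j, x):
        # Branch metric for bit j taking value x (max-log, symmetric form).
        return (1 - 2 * x) * llr[j] / 2.0

    def next_state(j, s, x):
        # Time-variant trellis: the branch label at stage j depends on column j.
        return s ^ cols[j] if x else s

    # Forward recursion (alpha), starting from the all-zero state.
    alpha = [[NEG_INF] * num_states for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for j in range(n):
        for s in range(num_states):
            for x in (0, 1):
                t = next_state(j, s, x)
                alpha[j + 1][t] = max(alpha[j + 1][t], alpha[j][s] + gamma(j, x))

    # Backward recursion (beta), ending in the all-zero state (valid codeword).
    beta = [[NEG_INF] * num_states for _ in range(n + 1)]
    beta[n][0] = 0.0
    for j in range(n - 1, -1, -1):
        for s in range(num_states):
            for x in (0, 1):
                t = next_state(j, s, x)
                beta[j][s] = max(beta[j][s], beta[j + 1][t] + gamma(j, x))

    # Combiner: best path with bit j = 0 versus best path with bit j = 1.
    app = []
    for j in range(n):
        best = [NEG_INF, NEG_INF]
        for s in range(num_states):
            for x in (0, 1):
                t = next_state(j, s, x)
                best[x] = max(best[x], alpha[j][s] + gamma(j, x) + beta[j + 1][t])
        app.append(best[0] - best[1])
    ext = [a - l for a, l in zip(app, llr)]
    return app, ext


if __name__ == "__main__":
    llr = [4.0] * 15
    llr[3] = -1.0  # one weak, wrongly signed bit of the all-zero codeword
    app, ext = maxlog_bcjr(llr, hamming15_columns())
    print([0 if a >= 0 else 1 for a in app])  # prints fifteen zeros
```

In this example, the best competing codeword with bit 3 set has weight 3. It therefore costs two confident bits, and the a posteriori LLR of bit 3 comes out as +7.0, so the single error is corrected.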
EMULATION RESULTS AND ANALYSIS
To demonstrate the advantages of the proposed rate-adaptive GLDPC codes, we start from a well-designed LDPC code: a quasi-cyclic (3, 15)-regular, girth-10 (34635, 27710, 0.8) binary LDPC code. We choose a simple (15, 11) Hamming code as the component code. For ease of implementation, we sweep the parameter d over the range {∞, 127, 63, 31, 15}, which corresponds to the code rates {0.8, 0.7953, 0.7906, 0.7807, 0.7601}. The precisions of the LLRs, variable-to-check messages, and check-to-variable messages are set to 5, 5, and 6 bits, respectively, and the maximum number of iterations is set to either 10 or 15. The bit error rate (BER) performance vs. Q factor is presented in Fig. 4 for 10 iterations and in Fig. 5 for 15 iterations. The net coding gain of the designed mother code is 11.61 dB and 11.71 dB for 10 and 15 iterations, respectively, which demonstrates its fast convergence. The figures clearly show that the BER performance improves as d decreases, so fine-tuning of the code rate can be achieved.
Figure 4. BER performance vs. Q factor with the maximum number of iterations set to 10.
Figure 5. BER performance vs. Q factor with the maximum number of iterations set to 15.
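The results are reported as BER versus Q factor and as net coding gain, so the standard conversions between these quantities are sketched below in Python. The sketch assumes the usual Gaussian relation BER = ½ erfc(Q/√2) for the uncoded channel, Q factor in dB as 20 log10 Q, and the usual NCG definition at a reference output BER of $10^{-15}$.

It does not reproduce the 11.61 dB and 11.71 dB values; those require the measured input-BER thresholds read from Figs. 4 and 5, which are not restated in the text. The function names are hypothetical; `statistics.NormalDist` is from the Python standard library.

```python
from math import log10
from statistics import NormalDist


def q_from_ber(ber):
    """Linear Q factor for which BER = 0.5 * erfc(Q / sqrt(2)), i.e. BER = Phi(-Q)."""
    return -NormalDist().inv_cdf(ber)


def q_db(ber):
    """Q factor in dB, 20 * log10(Q)."""
    return 20.0 * log10(q_from_ber(ber))


def net_coding_gain_db(ber_in, rate, ber_ref=1e-15):
    """NCG = 20 log10(Q_ref) - 20 log10(Q_in) + 10 log10(R).

    ber_in : uncoded (input) BER at which the decoded BER reaches ber_ref
    rate   : code rate R
    """
    return q_db(ber_ref) - q_db(ber_in) + 10.0 * log10(rate)


if __name__ == "__main__":
    print(round(q_from_ber(1e-15), 2))  # Q factor needed for 1e-15 without coding, about 7.94
```

The 10 log10 R term charges the code for its overhead. For example, a rate-0.8 code loses about 0.97 dB of NCG relative to a hypothetical rate-1 code with the same input-BER threshold.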
CONCLUSION
In this paper, we have highlighted a class of generalized LDPC codes as potential candidates for future lightwave transmission systems. We have provided a construction method for GLDPC codes based on simple block codes and demonstrated their flexibility for rate adaptation with a unified hardware architecture. We have optimized their decoding complexity and shown, by means of FPGA-based emulation, that very low error rates can be achieved. To the best of the authors' knowledge, this is the first FPGA implementation of GLDPC codes published in the literature.
ACKNOWLEDGMENT
This work was supported in part by NSF ERC Center for Integrated Access Networks (CIAN) under grant EEC-0812072 as well as by MURI ONR program.
