In this invited paper, we describe a rate-adaptive FEC scheme based on LDPC codes together with its software-reconfigurable unified FPGA architecture. By FPGA emulation, we demonstrate that this class of rate-adaptive LDPC codes based on shortening, with an overhead from 25% to 42.9%, provides a coding gain ranging from 13.08 dB to 14.28 dB at a post-FEC BER of $10^{-15}$ for BPSK transmission. In addition, the proposed rate-adaptive LDPC coding has been demonstrated in combination with higher-order modulation formats, including QPSK, 8-QAM, 16-QAM, 32-QAM, and 64-QAM, which together cover a wide range of signal-to-noise ratios. Furthermore, we apply unequal error protection by employing different LDPC codes on different bits in 16-QAM and 64-QAM, which results in an additional 0.5 dB gain compared to conventional LDPC-coded modulation at the same overall code rate.
, and that are preferably adaptable to the time-varying optical channel conditions [4] [5]. Recently, soft-decision binary and non-binary LDPC codes with an outer hard-decision code that pushes the system bit-error rate (BER) below the target BER have been proposed in [6] [7]. Meanwhile, spatially coupled LDPC codes have been demonstrated to have very low error floors, below the system's target BER [8] [9]. While it is essential to design an optimized FEC code offering the best coding gain, determining the optimal tradeoff between high-order modulation formats and FEC overhead is a major concern for next-generation 400-Gbit/s technology. Most recently, DP-QPSK, DP-16-QAM, and DP-64-QAM with varying code rates have been studied to achieve the highest generalized mutual information (GMI) at a given signal-to-noise ratio (SNR); this study explored a total of 10 modulation formats to find the best combination of spectral efficiency and span loss budget [10] [11].
In this invited paper, based on our recent publications [12] [13] [14], we describe an adaptive field-programmable gate array (FPGA)-based LDPC-coded modulation scheme for the next generation of optical communication systems. Our motivation for this adaptive coded-modulation scheme is twofold. First, a well-constructed capacity-approaching LDPC code offers the promise of substantial performance gain. Second, a unified architecture of an LDPC decoder together with various modulation formats has been shown to allow a wide range of performance for optical transport networks (OTNs), where a large number of parameters can be reconfigured in order to cope with time-varying optical channel conditions and service requirements.
The remainder of the paper is organized as follows. In Section 2, we first present the data flow of the LDPC-coded modulation emulator and the associated unified FPGA-based architecture, followed by the corresponding performance and an analysis of logic utilization, power consumption, latency, and throughput. In Section 3, we then describe a rate-adaptive LDPC coding scheme combined with higher-order modulation formats. Section 4 provides some concluding remarks.
FPGA-BASED LDPC-CODED MODULATION EMULATOR
Let $\lambda(s_i)$ and $L(b_j)$ represent the symbol log-likelihood ratio (LLR) of symbol $s_i$ and the bit LLR of bit $b_j$ in one symbol, respectively, and let $P(s_i \mid r)$ represent the a posteriori probability of the symbol $s_i$ given the received symbol $r$. For the LDPC decoder, let $R_{c,v}^{(k,l)}$, $L_{v,c}^{(k,l)}$, and $L_v$ represent the check $c$ to variable $v$ extrinsic information (message), the variable $v$ to check $c$ message at the $k$-th iteration and $l$-th layer, and the LLR from the channel, respectively, where $k = 1, \ldots, I_{max}$ ($I_{max}$ denotes the maximum number of iterations) and $l = 1, \ldots, \gamma$ ($\gamma$ is the column weight). The layered scaled min-sum algorithm (with the scaling factor set to 0.75) is adopted in this paper [12] [13] [14]. The emulation processing can be summarized by Eqs. (1)-(6), where Eqs. (1)-(3) describe the symbol LLR and bit LLR calculation, while Eqs. (4)-(6) are related to the layered decoding algorithm.
In Eq. (1), $s_0$ represents a reference symbol, while $s_f$ in Eq. (6) represents the scaling factor.
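The symbol and bit LLR calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the emulator's implementation: it assumes an AWGN channel with per-dimension noise variance sigma^2, takes the first constellation point as the reference symbol, uses the max approximation in place of max*, and adopts the sign convention that a positive bit LLR favors bit 0. The QPSK labeling and the names `symbol_llrs` and `bit_llrs` are hypothetical.

```python
def symbol_llrs(r, constellation, sigma):
    """Symbol LLRs relative to the reference symbol constellation[0] (AWGN assumed)."""
    d_ref = abs(r - constellation[0]) ** 2
    return [-(abs(r - s) ** 2 - d_ref) / (2.0 * sigma ** 2) for s in constellation]


def bit_llrs(sym_llrs, bits_per_symbol):
    """Max-approximated bit LLRs: best symbol with bit = 0 minus best symbol with bit = 1."""
    out = []
    for j in range(bits_per_symbol):
        shift = bits_per_symbol - 1 - j  # bit 0 is the most significant label bit
        best0 = max(l for i, l in enumerate(sym_llrs) if ((i >> shift) & 1) == 0)
        best1 = max(l for i, l in enumerate(sym_llrs) if ((i >> shift) & 1) == 1)
        out.append(best0 - best1)
    return out


# Hypothetical Gray-labeled QPSK: the list index written in binary is the label (b0 b1).
QPSK = [complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1)]

received = complex(-1.0, 1.0)
llrs = bit_llrs(symbol_llrs(received, QPSK, sigma=0.5), bits_per_symbol=2)
```

With this labeling, the received point sits on the symbol labeled 10, so the first bit LLR comes out negative and the second positive.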
FPGA Architecture
We study the performance of our rate-adaptive LDPC-coded modulation on an FPGA platform, whose high-level diagram is illustrated in Fig. 1(a). This platform consists of the following parts: a set of PRBS-31 generators, an M-QAM mapper, two Gaussian noise generators, a symbol log-likelihood ratio (LLR) calculator, a bit LLR calculator, a rate-adaptive LDPC decoder based on the layered scaled min-sum algorithm, and an error counter circuit. Each PRBS-31 generator is based on a linear feedback shift register (LFSR) with a 31-bit initial value. The M-QAM mapper is stored in two read-only memories (ROMs). The Gaussian noise generator, implemented using two LFSR-based uniform generators combined with the Box-Muller algorithm, generates samples of white Gaussian noise. The generated sequence of samples is multiplied by the noise standard deviation σ and fed to the symbol LLR block, which is implemented based on Eq. (1). It is worth mentioning that the max-star (max*) operation is replaced by the max operation to reduce complexity. Then the quantized bit LLR is obtained based on Eqn.
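As a rough software analogue of the PRBS and noise-generation blocks described above, the following Python sketch combines a 31-bit LFSR with the Box-Muller transform. Several details are assumptions rather than facts from the paper: the feedback polynomial x^31 + x^28 + 1 (the common PRBS-31 choice), the reuse of the same LFSR structure for the two uniform generators, the 24-bit uniform word width, and all class and function names.

```python
import math


class Lfsr31:
    """Fibonacci LFSR producing a PRBS-31 bit stream (assumed polynomial x^31 + x^28 + 1)."""

    MASK = 0x7FFFFFFF

    def __init__(self, seed=0x7FFFFFFF):
        if seed & self.MASK == 0:
            raise ValueError("seed must be non-zero")
        self.state = seed & self.MASK

    def next_bit(self):
        # XOR of taps 31 and 28, which are bit indices 30 and 27.
        bit = ((self.state >> 30) ^ (self.state >> 27)) & 1
        self.state = ((self.state << 1) | bit) & self.MASK
        return bit

    def next_uniform(self, nbits=24):
        """Uniform sample strictly inside (0, 1), built from nbits consecutive LFSR bits."""
        value = 0
        for _ in range(nbits):
            value = (value << 1) | self.next_bit()
        return (value + 0.5) / (1 << nbits)


def gaussian_pair(u1, u2, sigma):
    """Box-Muller: map two uniforms in (0, 1) to two Gaussian samples with std sigma."""
    radius = sigma * math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def noise_sample(gen_a, gen_b, sigma):
    """One complex-noise sample (in-phase, quadrature) from two uniform generators."""
    return gaussian_pair(gen_a.next_uniform(), gen_b.next_uniform(), sigma)


gen_a, gen_b = Lfsr31(seed=0x12345678), Lfsr31(seed=0x0BADF00D)
noise_i, noise_q = noise_sample(gen_a, gen_b, sigma=0.5)
```

Packing consecutive LFSR bits into a word is the simplest way to get a uniform value, but it is not claimed to match the statistical quality of the hardware generator.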
The architecture of the code-rate-reconfigurable binary LDPC decoder is shown in Fig. 1(b). There are three types of processors shown in the figure: (i) the variable node unit (VNU), based on Eqns. (3)-(4), takes its inputs from the check-to-variable message memory ($R$) and the channel LLR memory ($L_v$) and produces the updated variable-node outputs, including the variable-to-check messages $L_{v,c}$; (ii) the scaled min-sum check node unit (CNU), based on Eqn. (5), takes the variable-to-check messages $L_{v,c}$ as inputs and produces the check-to-variable messages $R_{c,v}$; (iii) the early termination unit (ETU) makes a bit decision based on the current decoder messages. In addition, there are four types of memories in the implementation: (i) the memory for $R$, with size $\gamma \times n \times W_R$, stores the check-to-variable messages $R_{c,v}$; (ii) the memory for $L_v$, with size $n \times W_L$, stores the initial LLRs; (iii) the memory for $\hat{c}$, with size $n$, stores the decoded bits. In the discussion above, $\gamma$ denotes the column weight, $n$ is the codeword length, and $W_R$ and $W_L$ represent the word lengths for $R$ and $L_v$. The most computationally complex module is the CNU, shown in Fig. 1(c). In this module, the ABS block first takes the absolute value of the inputs and the sign XOR array produces the output sign. Then we find the first minimum value via a binary tree and trace back the survivors to find the second minimum value as well as the position of the first minimum value. Finally, we reconstruct the output data from the sign bits and the three outputs of the scaled two-minimum finder block. Furthermore, we can take advantage of the technique to significantly reduce the memory usage.
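The check node processing described above (absolute values, sign XOR, first and second minimum, output reconstruction) can be illustrated with a short Python sketch. It is a behavioral model under simplifying assumptions: messages are floats rather than quantized words, the two minima are found by a linear scan instead of a binary tree with trace-back, and the function name is hypothetical. The scaling factor of 0.75 is taken from the text.

```python
def scaled_min_sum_cnu(inputs, scale=0.75):
    """Scaled min-sum check-node update: one outgoing message per incoming edge."""
    magnitudes = [abs(x) for x in inputs]      # ABS block
    signs = [1 if x < 0 else 0 for x in inputs]

    total_sign = 0                             # sign XOR array
    for s in signs:
        total_sign ^= s

    # First minimum, its position, and second minimum.
    min1 = min2 = float("inf")
    pos = -1
    for i, m in enumerate(magnitudes):
        if m < min1:
            min2, min1, pos = min1, m, i
        elif m < min2:
            min2 = m

    # Reconstruction: each edge sees the minimum over all *other* edges
    # and the XOR of all *other* signs.
    out = []
    for i in range(len(inputs)):
        mag = scale * (min2 if i == pos else min1)
        out.append(-mag if (total_sign ^ signs[i]) else mag)
    return out


messages = scaled_min_sum_cnu([2.0, -4.0, 1.0, 8.0])
# messages == [-0.75, 0.75, -1.5, -0.75]
```

Only the two minima, the position of the first minimum, and the sign bits are needed to rebuild every outgoing message, which is what makes this unit compact in hardware.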
Emulation Results and Analysis
The (3, 15) mother LDPC code (n = 34635, k = 27710) is constructed based on permutation matrices due to its efficient implementation [15], and rate adaptation is achieved by eliminating several blocks from the mother code, which is done by setting the corresponding initial LLRs to the largest integer value. We employ an 8-bit uniform quantization scheme for the check-to-variable messages, variable-to-check messages, and channel LLRs ($R_{c,v}$, $L_{v,c}$, $L_v$) to ensure that the error floor phenomenon is due to the code design itself instead of the finite-precision representation, while keeping the decoding complexity reasonably low.
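The relation between the number of eliminated blocks and the resulting code rate can be checked with a small Python sketch. It assumes that each eliminated block is one 2309-column block of the (3, 15) mother code, uses design rates (ignoring any rank deficiency of the parity-check matrix), places the shortened positions at the start of the codeword, and takes 127 as the largest value of an 8-bit signed LLR with positive values favoring bit 0. All of these, as well as the function names, are illustrative assumptions.

```python
def shortened_code(blocks_removed, row_weight=15, col_weight=3, block_size=2309):
    """Length, design rate, and overhead after shortening whole column blocks."""
    n_blocks = row_weight - blocks_removed
    n = n_blocks * block_size
    rate = 1.0 - col_weight / n_blocks
    overhead = col_weight / (n_blocks - col_weight)  # parity bits / information bits
    return n, rate, overhead


def init_llrs(channel_llrs, blocks_removed, block_size=2309, llr_max=127):
    """Mother-code-length LLR vector with shortened positions pinned to llr_max."""
    return [llr_max] * (blocks_removed * block_size) + list(channel_llrs)


for b in range(6):
    n, rate, overhead = shortened_code(b)
    print(f"{b} blocks removed: n = {n}, rate = {rate:.3f}, overhead = {100 * overhead:.1f}%")
```

Under these assumptions, removing zero to five blocks gives design rates of 0.800, 0.786, 0.769, 0.750, 0.727, and 0.700, with the overhead running from 25% to about 42.9%.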
The BER vs. SNR performance of the proposed rate-adaptive LDPC code with the number of layered iterations set to 45 is summarized in Figure 2, where we show a set of LDPC component codes of code rates {0.8, 0.786, 0.77, 0.75, 0.727, 0.7}, in which rate adaptation is performed via shortening, combined with a set of modulation formats, namely BPSK, QPSK, 8-QAM, 16-QAM, 32-QAM, and 64-QAM. Table 1 presents the coding gains at a BER of $10^{-15}$; the last point of each curve, corresponding to a BER of $10^{-15}$, is obtained via extrapolation. One can clearly observe that flexible net coding gains (NCGs) ranging from 11.83 dB to 12.25 dB can be achieved by employing our rate-adaptive LDPC coding scheme. Additionally, the proposed rate adaptation, when applied to both the component code rate and the modulation format size, can offer extremely flexible performance by adapting to the time-varying optical channel conditions. It is worth mentioning that the coding gain decreases as the constellation size increases. We will explain and address this observation in the next section.
Implementation Analysis
Apart from the error correction performance of the rate-adaptive LDPC-coded modulation, logic utilization, power consumption, and latency are other relevant aspects. We compare the logic utilization and power consumption of six LDPC-coded modulation schemes, which have been implemented in a Xilinx xc6vsx475t. Each emulator comprises $\log_2(M)$ PRBS generators ($M$ is the signal constellation size), one Gaussian noise generator for BPSK or two for higher-order modulation formats, one symbol LLR calculator, one bit LLR calculator, and one reconfigurable LDPC decoder. The resource utilization is summarized in Table 2. One can clearly notice that the occupied slice utilization increases as the modulation format size increases, while the memory utilization is almost the same, since the amount of memory used outside the LDPC decoder is negligible. In addition, the on-chip power consumption from clocks, logic, signals, BRAMs, DSPs, MMCMs, and IOs is shown in the last column of Table 2. The power consumption increases with the modulation format size, but this increase is not significant. As we discussed above, we duplicate four LDPC-coded modulation emulators in one FPGA, and with four FPGAs available in our rapid prototyping platform, 16 emulators are employed in total. Each decoder consists of 3 CNUs and 45 VNUs in the implementation; hence the throughput of the decoder can be calculated by $f \cdot n / [(B/P + D) \cdot I_{max}]$, where $f$ = 200 MHz is the FPGA running frequency, $n$ is the number of bits per codeword, $B$ = 2309 is the block size, $P$ = 3 is the pipeline depth, $D$ = 7 is the latency of VNP and CNP, and $I_{max}$ = 45 is the maximum number of layered iterations. It is worth noting that the decoder converges quickly in the high-SNR regime (~24 iterations, verified by simulation). The aggregate throughput of the mother code is ~3.17 Gbit/s in the low-SNR regime and ~5.94 Gbit/s in the high-SNR regime, while the throughput of the code of rate 0.7 is ~2.11 Gbit/s and ~3.96 Gbit/s, respectively.
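The throughput figures quoted above can be reproduced with a short Python calculation. The sketch reads the throughput expression as clock frequency times codeword length, divided by (block size / pipeline depth + latency) cycles per iteration times the number of iterations; this reading, the parameter names, and the codeword length of 23090 bits for the rate-0.7 code (ten blocks of 2309 bits) are assumptions, although the results agree with the quoted numbers.

```python
def decoder_throughput_bps(n_bits, iterations, f_clk=200e6, block_size=2309,
                           pipeline_depth=3, latency=7):
    """Throughput of one decoder in bit/s (coded bits per second)."""
    cycles_per_iteration = block_size / pipeline_depth + latency
    return f_clk * n_bits / (cycles_per_iteration * iterations)


EMULATORS = 16  # four emulators per FPGA, four FPGAs

for label, n_bits in (("mother code, rate 0.8", 34635), ("rate 0.7", 23090)):
    for regime, iterations in (("low SNR", 45), ("high SNR", 24)):
        total = EMULATORS * decoder_throughput_bps(n_bits, iterations)
        print(f"{label}, {regime}: {total / 1e9:.2f} Gbit/s")
```

The printed values are close to the ~3.17, ~5.94, ~2.11, and ~3.96 Gbit/s figures given in the text, which supports this reading of the expression.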
RATE-ADAPTIVE LDPC-CODED MODULATION BY UNEQUAL ERROR PROTECTION (UEP)
The uncoded and coded BER vs. SNR performance of each bit within a symbol in BPSK, QPSK, 8-QAM, 16-QAM, 32-QAM, and 64-QAM is shown in Fig. 3(a) and (b), respectively. A close look at Fig. 3 reveals that the bits in higher-order modulation formats are protected unequally. For instance, in 16-QAM the first and third bits have the same performance, as do the second and fourth bits. Additionally, at the input BER threshold of $4.2 \times 10^{-2}$ of the LDPC code with code rate 0.75, the corresponding SNR limits of the first and second bits in 16-QAM are 9.37 dB and 11.72 dB, respectively. This phenomenon is illustrated in Fig. 3(b) as well, since the SNR gap of the coded BER is approximately 2.4 dB. The overall SNR limit at a post-FEC BER of $10^{-15}$ is set by the worst bit, which inspires us to use unequal error protection (UEP), with rate-adaptive LDPC codes applied to different component bits in higher-order modulation formats. Another interesting observation is that the best bit performance in 64-QAM is comparable to the worst bit performance in 16-QAM. In addition, the slopes of the performance curves associated with different bits in high-order modulation formats are slightly different, since the distribution of the bit LLRs is no longer Gaussian.
(a) (b) Figure 3 .
BER vs. SNR performance for: (a) uncoded, (b) LDPC-coded cases.
In order to bridge the gap between different bits in high-order modulation formats, non-binary LDPC codes can be employed [16]. Due to their extremely high implementation complexity [7], we instead propose to apply codes of different error-correction strength to different bits in a high-order modulation format. Namely, instead of applying a code rate of 0.75 to all four bits in 16-QAM and all six bits in 64-QAM, we apply a code rate of 0.7 to the first and third bits and a code rate of 0.8 to the second and fourth bits in 16-QAM. Meanwhile, in 64-QAM we apply code rates of 0.7, 0.75, and 0.8 to the first and fourth, second and fifth, and third and sixth bits, respectively; both configurations result in the same overall code rate of 0.75. As shown in Fig. 4, the BER vs. SNR performance reveals a coding gain improvement of the UEP scheme over the conventional scheme. More specifically, the UEP scheme provides 0.5 dB of additional gain at a BER of $10^{-15}$ compared to the corresponding (3, 12)-regular QC-LDPC (27708, 20781) code when 16-QAM and 64-QAM are used. Meanwhile, no error floor is observed at a BER of $10^{-15}$ after ~$10^{16}$ bits have been emulated, which demonstrates the effectiveness of the high-girth QC-LDPC code design. It is worth mentioning that the gap between different bits in large constellations can be bridged further by using more flexible component codes; however, handling the differences in latency and throughput among the component decoders might be more challenging.

[Figure legend: BPSK: bit0; QPSK: bit0, bit1; 8-QAM: bit0, bit1, bit2; 16-QAM: bit0/bit2, bit1/bit3; 32-QAM: bit0, bit1, bit2, bit3, bit4; 64-QAM: bit0/bit3, bit1/bit4, bit2/bit5]
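The claim that both UEP configurations keep the overall code rate at 0.75 follows from simple averaging, as the Python sketch below illustrates. It assumes that every bit position of a symbol carries the same number of coded bits, so that the overall rate is the arithmetic mean of the per-bit component code rates; the function names are hypothetical.

```python
def overall_code_rate(rates_per_bit):
    """Overall rate when each bit position carries an equal share of the coded bits."""
    return sum(rates_per_bit) / len(rates_per_bit)


def info_bits_per_symbol(rates_per_bit):
    """Information bits carried by one symbol under the same assumption."""
    return sum(rates_per_bit)


# Per-bit component code rates, in bit order (first bit first), as given in the text.
uep_16qam = [0.7, 0.8, 0.7, 0.8]
uep_64qam = [0.7, 0.75, 0.8, 0.7, 0.75, 0.8]

for name, rates in (("16-QAM", uep_16qam), ("64-QAM", uep_64qam)):
    print(f"{name}: overall rate = {overall_code_rate(rates):.3f}, "
          f"information bits per symbol = {info_bits_per_symbol(rates):.2f}")
```

Both configurations therefore carry the same payload per symbol as the conventional scheme with a single rate-0.75 code: 3 information bits per 16-QAM symbol and 4.5 per 64-QAM symbol.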
CONCLUDING REMARKS
In this invited paper, we have described our novel class of FPGA-reconfigurable rate-adaptive LDPC codes with overhead ranging from 25% to 42.9% for high-speed optical transmission systems. The BER performance has been verified through an FPGA emulation system, and it has been shown that the FPGA-based LDPC-coded modulation schemes exhibit superior waterfall performance and excellent error floor performance down to a BER of $10^{-15}$. Moreover, an additional SNR gain of 0.5 dB can be achieved by applying the rate-adaptive LDPC codes to 16-QAM and 64-QAM in a way that provides unequal error protection. To the best of our knowledge, these are the first FPGA implementation results for flexible LDPC-coded modulation. We believe that FPGA-based rate-adaptive QC-LDPC codes together with several modulation schemes represent one of the promising candidates for the next generation of optical communication systems.
