Abstract
We propose and demonstrate a low-complexity LDPC FEC system for coherent optical applications. Implementation results show an estimated NCG of 11.0 dB with 20% overhead, 160 Gbps throughput, and energy consumption of 3.4 pJ per bit.
Introduction
Forward error correction (FEC) is a critical component of modern optical communication systems, which demand very high FEC performance: throughput of at least 100 Gbps, low power consumption, and coding gain approaching the theoretical limit [3]. There is a continual demand for higher-performing FEC, as higher net coding gain (NCG) permits longer maximum ranges and increased capacity.
Soft-decision (SD) FEC operates on reliability measures of received symbols, and thus can achieve higher performance than hard-decision (HD) FEC, which operates only on the most likely received symbols. In coherent optical systems, soft information is readily available, so SD-FEC is possible. Turbo product codes (TPCs) [2] and low-density parity check (LDPC) codes [5, 7] are the most commonly proposed FEC codes for such systems. Spatially-coupled (SC) LDPC codes, which provide greater NCG than block LDPC codes, have been the focus of much recent work [3, 9, 10]. Staircase codes, a type of SC HD code that achieves coding gains approaching those of SD codes, have also been proposed for optical transport applications [11]. TPCs and LDPC codes are decoded using iterative message passing algorithms, which are quite costly in terms of computation and power consumption. It has been estimated that SD-FEC using LDPC codes consumes approximately 15-20% of the total energy in a long-haul 100 Gbps coherent optical link [8]. The application-specific integrated circuit (ASIC) implementations of these high-throughput, computationally complex decoders also require high silicon area, which translates directly to high capital cost. Careful consideration of the trade-offs between conflicting performance and complexity requirements is therefore needed when designing such FEC systems [6].
We previously proposed the adaptive degeneration (AD) LDPC decoding algorithm as a compromise between performance and complexity for SD-FEC in coherent optical communication systems [1]. In this paper, we present an improved version of the AD algorithm, called the "prior-assisted AD" (PAD) algorithm, so named because it preserves prior information and uses it throughout decoding. In the following sections, we describe the algorithm, then present field-programmable gate array (FPGA) simulation and ASIC synthesis results of a PAD decoder using a (36000, 30000) quasi-cyclic (QC) LDPC code. It achieves an estimated NCG of 11.0 dB at a bit error rate (BER) of 10^{-15} with 20% coding overhead (OH), which represents an improvement of about 0.4 dB over standard AD, and reduces the gap to normalized min-sum algorithm (NMSA) based FEC to 0.4 dB.
The PAD Algorithm
An LDPC code is characterized by an m × n parity check matrix H, with n bits and m parity checks. A bit i participates in parity check j iff H_{j,i} = 1. If all parity checks are met, the n bits form a valid codeword. An (n, k) LDPC code has k information bits and n − k parity bits. An LDPC code can also be described with a Tanner graph [...] The PAD algorithm is described in Algorithm 1, while free parameters are listed and described in Table 1. [...] but if no decoding progress has been made in the previous iterations (where progress is defined as a reduction in the number of unsatisfied parity checks), a larger value γ_1 is used. The PAD algorithm adds measures that improve trapping set correction capability in LDPC codes with d_v = 4, as well as performance in the waterfall region. The major addition is the storage and use of prior LLR information in κ_v and λ_v. If the prior LLR magnitude λ_v is sufficiently large, then its sign κ_v acts as an additional input to the VN message sum, either on every 4th iteration (if T_0 < λ_v < T_1), or on every iteration if λ_v is very large (λ_v ≥ T_1). This is disabled via control register 1 when the number of unsatisfied parity checks falls below a threshold τ, since it can interfere with the correction of trapping sets containing VNs with high-magnitude but incorrect λ_v.
Additionally, M_v is prevented from having a higher magnitude than λ_v. This measure increases the ability of the algorithm to correct certain classes of trapping sets, in which incorrect VNs would otherwise allow M_v to increase to the maximum magnitude. To prevent correct bits from being flipped, λ_v cannot be set smaller than γ_1. This threshold was found to achieve a good balance between trapping set correction capability and avoiding mass flips of correct bits.
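The gating rule for the stored prior can be sketched in a few lines of Python. This is only an illustration of the conditions stated in the text, not a reproduction of Algorithm 1: the function name `prior_vote`, the unit weight of the prior's contribution, and the choice that "every 4th iteration" means an iteration index divisible by 4 are all assumptions, and the threshold values used in the example calls are placeholders.

```python
def prior_vote(kappa_v, lambda_v, iteration, unsatisfied, t0, t1, tau):
    """Extra input contributed by the stored prior to the VN message sum.

    kappa_v     -- sign of the prior LLR (+1 or -1)
    lambda_v    -- magnitude of the prior LLR
    iteration   -- current decoding iteration index (assumed to start at 0)
    unsatisfied -- current number of unsatisfied parity checks
    t0, t1, tau -- thresholds T_0, T_1 and tau (placeholder values in usage)
    """
    # Prior assistance is switched off once few checks remain unsatisfied,
    # so that confident-but-wrong priors do not hold trapping sets in place.
    if unsatisfied < tau:
        return 0
    # Very reliable prior: its sign votes on every iteration.
    if lambda_v >= t1:
        return kappa_v
    # Moderately reliable prior: its sign votes on every 4th iteration only
    # (assumed here to mean iteration indices divisible by 4).
    if t0 < lambda_v < t1 and iteration % 4 == 0:
        return kappa_v
    # Weak prior: no contribution.
    return 0


# Placeholder thresholds, chosen only to exercise each branch.
print(prior_vote(-1, 5.0, 3, 100, 2.0, 4.0, 10))  # strong prior
print(prior_vote(+1, 3.0, 4, 100, 2.0, 4.0, 10))  # moderate prior, 4th iteration
print(prior_vote(+1, 3.0, 3, 100, 2.0, 4.0, 10))  # moderate prior, other iteration
```

A weight of one sign vote is the simplest reading of "acts as an additional input"; the actual weighting is defined by Algorithm 1.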
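The two magnitude constraints described above amount to a floor on the stored prior and a symmetric clamp on the VN state. The following Python sketch shows one way to express them; the helper names `store_prior_magnitude` and `clamp_state` are hypothetical, and how M_v is updated before clamping is left to Algorithm 1.

```python
def store_prior_magnitude(channel_llr, gamma_1):
    """Prior magnitude lambda_v, floored at gamma_1 so that weak priors
    do not make correct bits too easy to flip."""
    return max(abs(channel_llr), gamma_1)


def clamp_state(m_v, lambda_v):
    """Limit the VN state M_v to the range [-lambda_v, +lambda_v]."""
    return max(-lambda_v, min(lambda_v, m_v))


# Placeholder numbers for illustration only.
lam = store_prior_magnitude(channel_llr=-0.5, gamma_1=1.5)
print(lam)                     # 1.5: the floor applies
print(clamp_state(6.0, lam))   # 1.5: state limited by the prior magnitude
print(clamp_state(-0.5, lam))  # -0.5: already within range
```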
Decoder Implementation Results
To characterize the performance of the PAD algorithm, we implemented a PAD decoder for a (36000, 30000) regular QC-LDPC code with d_v = 4 and d_c = 24, constructed using the finite field subset method [4]. LLRs and M_v all use 5-bit quantization with a standard fixed-point format of 1 sign bit, 3 integer bits, and 1 fractional bit. The maximum number of iterations i_m is set to 74. The decoder architecture is fully parallel (the FPGA implementation is partially parallel, but emulates a fully parallel design). Fig. 1 shows frame error rate (FER) and BER performance results obtained from software and FPGA implementations using BPSK modulation over an AWGN channel. No failures due to small trapping sets were observed with E_b/N_0 set to 3.9 dB, and no errors were observed in 9 × 10^{14} bits at 3.95 dB. Based on these observations, we estimate NCG of 11.0 dB at a BER of 10^{-15} by extrapolating the BER curve [1, 9]. This is roughly 1.65 dB away from the Shannon limit. Compared to other FEC systems, this decoder's estimated NCG is about 0.4 dB less than NMSA with the same OH [7], 0.4 dB more than AD [1], and 0.6 dB more than hard-decision staircase codes [11].
ASIC synthesis results for this decoder are summarized in Table 2. The fabrication process is STMicroelectronics 28 nm FD-SOI, using a nominal supply voltage of 0.9 V. The power estimate was obtained via netlist simulation with back-annotated parasitics, and random codewords with E_b/N_0 set to 4.0 dB. Due to the PAD algorithm's increased complexity, silicon area and energy consumption are both about 50% higher than AD. Throughput is lower as well, primarily because the d_v = 4 LDPC code converges more slowly than the d_v = 6 codes used with AD, which necessitates a higher maximum number of iterations.
However, the estimated energy consumption of 3.37 pJ per bit remains several times lower than reported figures for more complex FEC systems. Estimates for the energy consumption of SD-LDPC decoders for coherent optical systems range from 20 pJ per bit [5] to 60 pJ per bit [8] in 28 nm fabrication technologies, and 70 pJ per bit for a TPC decoder in 40 nm CMOS [2].
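The coding figures quoted in this section can be sanity-checked with a short calculation. The Python sketch below computes the overhead and rate of the (36000, 30000) code, and the E_b/N_0 that uncoded BPSK over AWGN needs to reach a BER of 10^{-15}, using BER = Q(sqrt(2 E_b/N_0)). The coded operating point of 4.0 dB is an assumed value for illustration only (the text reports an extrapolated estimate, not a measured point at 10^{-15}), and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def uncoded_bpsk_ebn0_db(ber):
    """E_b/N_0 in dB at which uncoded BPSK over AWGN reaches the given BER."""
    # Q^{-1}(ber), obtained from the lower tail of the standard normal.
    q_inv = -NormalDist().inv_cdf(ber)
    # BER = Q(sqrt(2 * Eb/N0))  =>  Eb/N0 = Q^{-1}(BER)^2 / 2
    return 10 * math.log10(q_inv ** 2 / 2)


n, k = 36000, 30000
overhead = (n - k) / k  # parity bits relative to information bits
rate = k / n            # code rate

reference_db = uncoded_bpsk_ebn0_db(1e-15)

# ASSUMPTION: illustrative coded E_b/N_0 at BER = 1e-15, not a figure
# reported in the text.
assumed_coded_ebn0_db = 4.0
ncg_db = reference_db - assumed_coded_ebn0_db

print(f"overhead = {overhead:.0%}, rate = {rate:.4f}")
print(f"uncoded BPSK needs {reference_db:.2f} dB for BER 1e-15")
print(f"NCG at assumed coded point = {ncg_db:.2f} dB")
```

Because the comparison is made in E_b/N_0, the rate penalty of the code is already included. The uncoded requirement comes out at about 15.0 dB, so a coded operating point near 4.0 dB corresponds to an NCG of about 11.0 dB.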
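As a rough cross-check of these figures, energy per bit multiplied by throughput gives average power. The Python sketch below assumes that the 3.37 pJ per bit figure is normalized to the same bit rate as the 160 Gbps throughput quoted in the abstract; the text does not state whether both refer to information bits or coded bits.

```python
# Figures taken from the text.
energy_per_bit_j = 3.37e-12  # 3.37 pJ per bit
throughput_bps = 160e9       # 160 Gbps

# ASSUMPTION: both figures are normalized to the same kind of bit.
power_w = energy_per_bit_j * throughput_bps

# Ratios to the comparison figures cited for other decoders (pJ per bit).
reported_pj = {"SD-LDPC, low estimate": 20.0,
               "SD-LDPC, high estimate": 60.0,
               "TPC, 40 nm CMOS": 70.0}
ratios = {name: pj / 3.37 for name, pj in reported_pj.items()}

print(f"implied decoder power = {power_w:.3f} W")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the PAD energy per bit")
```

Under that assumption the decoder power is roughly 0.54 W, and the cited alternatives consume between about 6 and 21 times as much energy per bit.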
Conclusions
The PAD algorithm is more complex than AD, but achieves significantly higher NCG. With an estimated NCG of 11.0 dB with 20% OH, and energy consumption of 3.37 pJ per bit, it represents an excellent trade-off for FEC in coherent optical systems where cost and power consumption are high priorities.
