In ultra-high-speed (>400 Gb/s per wavelength), high-spectral-efficiency coherent optical communication systems using multi-carrier spectral superchannels, the maximum reach is severely limited by linear and, foremost, nonlinear impairments. Hence, the implementation of advanced digital signal processing (DSP) techniques in optical transceivers is crucial for alleviating the impact of such impairments. However, the DSP performance improvement comes at the expense of increased cost and power consumption. Given that the computational complexity of the applied linear and nonlinear equalizers determines the trade-off between performance improvement and cost, in this study we provide an extended analysis of the computational complexity of various linear and nonlinear equalization approaches. First, we compare the complexity of a conventional OFDM coherent receiver with that of a filter-bank based OFDM receiver and show that the latter provides significant complexity savings. Second, we present a comparison between the digital back-propagation split-step Fourier (DBP-SSF) method and the inverse Volterra series transfer function nonlinear equalizer (IVSTF-NLE) in terms of performance and computational complexity for a 32 Gbaud polarization multiplexed (PM)-16 quadrature amplitude modulation (QAM) OFDM superchannel.
Keywords: multi-band OFDM superchannels, filter-bank modulation, nonlinear compensation, computational complexity, power consumption.
INTRODUCTION
Fibre bandwidth exhaustion and exponentially increasing traffic due to data-intensive multimedia services make it necessary to upgrade the current wavelength division multiplexed (WDM) optical networks, which can accommodate up to 100 Gb/s per wavelength. In order to meet these high capacity requirements, superchannel transceivers have been proposed, enabling the transmission of high bit rates such as 400 Gb/s and 1 Tb/s [1]. On the other hand, the high susceptibility of these systems to linear and, mostly, nonlinear fibre impairments requires the development of advanced digital signal processing (DSP) algorithms to mitigate such impairments. Nonetheless, this advancement in the DSP part should follow the trends in transceiver design, taking into account the limitations on computational complexity and power consumption. System vendors are interested in moving the pluggable transceivers for 100G coherent applications into smaller form factors, such as the C form factor pluggable 2 (CFP2), in order to increase the bandwidth density. Although the CFP2 appears to be a very promising candidate, it is challenging to include all the necessary elements of a 100G coherent transceiver within its footprint and electrical power budget. Another solution is the CFP2-ACO (analog coherent optics) [2]. In this design, the DSP ASIC is placed on the line card rather than on the module, reducing the footprint and the power consumption compared to other alternatives, such as the CFP or the OIF MSA. The main drawback of the CFP2-ACO is reduced flexibility, since it can only be plugged into line card slots specifically designed for this transceiver technology. Under this restriction, the focus has now switched to the CFP2-digital coherent optics (DCO), which incorporates the DSP chip (anticipated to be released in 2017).
However, the major challenge is to integrate the DSP chip in a limited package space while reducing the power consumption further in order to meet the stringent target specifications of CFP-DCO [3] .
According to OIF-Tech-Options-400G-01.0, the most viable solution to reduce the computational complexity of the DSP chip, and consequently the power consumption, in superchannel transmission systems (carrying bit rates greater than 100 Gb/s) is to divide the signal into many sub-channels operating at a lower baud rate whilst maintaining the same digital-to-analog converter (DAC)/analog-to-digital converter (ADC) requirements [4]. The European Union FP7 ASTRON project has suggested and developed a filter-bank based optically-shaped orthogonal frequency division multiplexing (OS-OFDM) transceiver. The concept is to break the digital processing into multiple parallel virtual sub-channels occupying disjoint spectral sub-bands. The main advantage of this transceiver is that the greater the number of sub-channels, the more the delay spread due to chromatic dispersion (CD) is reduced, hence lowering the DSP complexity [5]. It is shown that the ASTRON project filter-bank based OS-OFDM superchannel solution provides a reduction in complexity of more than 40% compared to a conventional transceiver [6].
In this paper, we carry out a comparison in terms of DSP complexity between the conventional OS-OFDM and the filter-bank based OS-OFDM transceiver. Moreover, considering that the filter-bank approach compensates only for linear impairments, we extend our investigation to two different nonlinear equalization techniques: digital back-propagation based on the split-step Fourier method (DBP-SSF) [7] and the 3rd-order inverse Volterra series transfer function nonlinear equalizer (IVSTF-NLE) [8]. For cases of low complexity and power consumption (i.e. a small number of steps-per-span), it has been shown experimentally [9] that the performance of the IVSTF-NLE, in terms of Q-factor improvement, is comparable to the performance of the DBP-SSF method (i.e. ~0.3 dB after 10×1000 km of single-mode fiber (SMF) in a WDM transmission line [9]). Therefore, we compare the IVSTF-NLE with the single-step-per-span DBP-SSF (DBP-SSF 1 ) and three-steps-per-span DBP-SSF (DBP-SSF 3 ) equalizers in terms of computational complexity and power consumption. Regarding computational complexity, our results reveal that the DBP-SSF 1 is only slightly less complex than the IVSTF-NLE, while the latter is almost three times less complex than the DBP-SSF 3 . Finally, the power consumption of the nonlinear equalizers is compared, both theoretically and in real-time operation, considering 90-nm and 45-nm ASIC technology. We use the power consumption of a commercially available Intel 18-core Xeon processor chip, which is equal to 165 W, as a power baseline. Both approaches show that the DBP-SSF 1 and the IVSTF-NLE consume ~200 W for a 1200 km transmission distance, which can be considered within practical limits. In contrast, the DBP-SSF 3 equalizer consumes ~220 W for a transmission distance of just 400 km. Therefore, its power consumption after 1200 km, which significantly exceeds the 165 W power baseline, is prohibitively high for real-time implementations.
The rest of the paper is organized as follows: In Section 2, the design of the filter-bank based OS-OFDM transceiver is described and compared with the conventional transceiver in terms of computational complexity. In Section 3, the IVSTF-NLE is compared with the DBP-SSF 1 and -SSF 3 equalizers in terms of computational complexity and power consumption. Finally, the conclusions drawn from this study are presented in Section 4.
COMPUTATIONAL COMPLEXITY EVALUATION OF LINEAR EQUALIZATION SCHEMES
In this section, the design of the filter-bank based OS-OFDM transceiver is explained and its complexity, in comparison to a conventional receiver solution, is estimated. Figure 1 depicts the basic architecture of a filter-bank based OS-OFDM transceiver. In the system under development within the ASTRON project and discussed in this paper, the generated signals carry a 1 Tb/s bit rate, accommodated in eight channels with a total bandwidth of 175 GHz. Each channel has a bandwidth of 25 GHz and is divided into 16 sub-bands (each occupying 1.6 GHz of bandwidth) using the discrete Fourier transform (DFT)-spread-OFDM (DFTS-OFDM) modulation scheme. Only 15 sub-bands are used for transmitting data, while the 16th sub-band is used as a sampling guard-band dedicated to the DAC/ADC filter roll-off (i.e., the symbol rate is 25 Gbaud but the ADC sampling rate is (25 GS/s)(16/15) = 26.6 GS/s). There are 960 = 1024×15/16 subcarriers per channel (i.e., data, null and pilot subcarriers), modulated with quadrature phase shift keying (QPSK) and 16-quadrature amplitude modulation (16-QAM) with differential encoding. Note that no guard-bands are inserted, either between adjacent sub-bands (the ASTRON filter-bank is guard-band-free) or between adjacent channels.
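The numerology quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the variable names are ours and are not taken from the ASTRON design, while the numerical values are those stated in the text.

```python
# Sanity check of the per-channel OS-OFDM numerology stated in the text.
# Variable names are illustrative (hypothetical), values are from the text.

channel_bw_ghz = 25.0    # channel bandwidth; the symbol rate is 25 Gbaud
num_sub_bands = 16       # sub-bands per channel
used_sub_bands = 15      # the 16th sub-band is a sampling guard-band
ofdm_fft_size = 1024     # OFDM symbol size per channel

sub_band_bw_ghz = channel_bw_ghz / num_sub_bands
adc_rate_gsps = channel_bw_ghz * num_sub_bands / used_sub_bands
active_subcarriers = ofdm_fft_size * used_sub_bands // num_sub_bands
oversampling_factor = adc_rate_gsps / channel_bw_ghz

print(f"sub-band bandwidth  : {sub_band_bw_ghz:.4f} GHz")
print(f"ADC sampling rate   : {adc_rate_gsps:.2f} GS/s")
print(f"active subcarriers  : {active_subcarriers}")
print(f"oversampling factor : {oversampling_factor:.3f}")
```

The exact sub-band width is 25/16 GHz, which the text rounds to 1.6 GHz.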
Figure 1. The basic architecture of the filter-bank based OS-OFDM transceiver.
The eight optical sub-carriers are generated by a mode-locked laser (MLL). Subsequently, the spectral lines are separated using an arrayed waveguide grating (AWG) followed by a bank of InP in-phase/quadrature (IQ) modulators. Each of the modulators is driven by a pair of DACs. The modulator outputs are passively combined to generate the optical output of the transmitter (Tx). At the receiver (Rx), the signals are combined with optical local oscillators (LOs) generated using an MLL. For each channel, a polarization beam splitter separates the two polarizations and, following balanced detection, the signals are digitized using ADCs. DSP modules are used to perform signal shaping in the transmitter, and dispersion and polarization mode dispersion (PMD) compensation, polarization tracking, frequency offset correction and phase noise mitigation in the receiver.
In Fig. 2, the basic diagram of the DSP for the filter-bank based OS-OFDM approach is presented. The signal is filtered into sub-bands, which are independently processed, as described above. The main advantage offered by the filter-bank based OS-OFDM DSP is high chromatic dispersion (CD) compensation at very low computational complexity. This is due to the CD delay spread, measured in sampling intervals, being quadratic in the total bandwidth (BW). Therefore, the CD-induced delay spread is reduced by a factor of M^2 by slicing the total BW into M sub-bands, as illustrated in Fig. 2 [5]. Consequently, effective dispersion compensation is achieved with lower computational complexity, as quantified in the following sub-section.
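The quadratic dependence can be illustrated numerically. The sketch below is not taken from [5]; it assumes a dispersion parameter of 17 ps/(nm km), a 1550 nm carrier, a 4,000 km link and critical sampling (sampling rate equal to the bandwidth), all of which are our own illustrative assumptions, and the function names are hypothetical.

```python
# Illustrative estimate of how sub-banding shrinks the CD-induced memory.
# ASSUMPTIONS (not from the paper): D = 17 ps/(nm km), 1550 nm carrier,
# 4,000 km link, and a sampling rate equal to the processed bandwidth.

C_M_PER_S = 299_792_458.0


def cd_delay_spread_s(bandwidth_hz, length_km, d_ps_nm_km=17.0,
                      wavelength_m=1550e-9):
    """Delay spread |D| * L * delta_lambda across the given bandwidth."""
    delta_lambda_nm = wavelength_m ** 2 * bandwidth_hz / C_M_PER_S * 1e9
    return d_ps_nm_km * length_km * delta_lambda_nm * 1e-12


def cd_memory_samples(bandwidth_hz, length_km):
    """Delay spread in sampling intervals at a rate equal to the bandwidth."""
    return cd_delay_spread_s(bandwidth_hz, length_km) * bandwidth_hz


total_bw_hz = 25e9    # one channel, as in the text
length_km = 4000.0    # assumed link length
M = 16                # number of sub-bands per channel

full_band = cd_memory_samples(total_bw_hz, length_km)
per_sub_band = cd_memory_samples(total_bw_hz / M, length_km)

print(f"full-band memory    : {full_band:.1f} samples")
print(f"per-sub-band memory : {per_sub_band:.2f} samples")
print(f"reduction factor    : {full_band / per_sub_band:.0f} (M^2 = {M ** 2})")
```

The delay spread in seconds scales linearly with bandwidth, and the sampling interval scales inversely with it, which is where the M^2 factor comes from under these assumptions.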
Complexity comparison between the filter-bank based OS-OFDM and the conventional transceiver
In this subsection, we compare a full-band conventional receiver (Rx) with a multi-sub-band (MSB) DFT-S OFDM Rx for 4,000 km transmission over SMF. We will show, via complexity calculations, that the filter-bank based OS-OFDM DSP provides a substantial complexity advantage when implemented in either field programmable gate array (FPGA) or application specific integrated circuit (ASIC) DSP hardware. Evidently, the power saving advantages will be most pronounced in ASIC implementations, but even in an FPGA implementation the savings are still highly beneficial, as they reduce the total number of FPGAs needed to implement the receiver. Initially, we define the complexity rate, C, as the number of multipliers per second, based on the assumption that the multipliers are the heaviest DSP operation, although the adders contribute non-negligibly in certain cases. Next, we define the complexity figure of merit, c = C/R, as the number of multipliers per given standard time interval (e.g., a symbol interval, a sampling interval or a hardware clock interval), where R is the number of standard time intervals per second. In this study, we use 1/(419 MHz) as the standard FPGA clock interval. Hence, C and c are given by the following formulas:
where V is the oversampling rate (the filter-bank is twice under-decimated, i.e., twice oversampled relative to a conventional critically sampled filter bank), M is the size of the FFT used in the filter-bank (M = 16), and N is the FFT size used in the initial per-sub-band filtering of the DFT-spread OFDM sub-band receivers (in our study, N = 128 is used for the initial per-sub-band filtering, followed by a DFT-despreader FFT of size N/2 = 64). Equation (2) is the basis for the evaluation of the complexity of the filter-bank based solution. For the conventional receiver solution, we have assumed a conventional MIMO polarization equalizer with a dozen taps and an overlap-and-save (OLS) receiver, with its complexity modelled as follows:
Then, we optimize over l, where L_s and L_h are the duration of the signal record used to perform the OLS FFT and the duration of the overlap window (equal to or exceeding the duration of the impulse response, h), respectively.
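The trade-off behind this optimization can be illustrated with a generic textbook overlap-and-save cost model. The sketch below is not the complexity model of this paper: it assumes a radix-2 FFT costing (N/2)·log2(N) complex multiplications, and the overlap of 340 samples is a hypothetical value chosen purely for illustration. Here `fft_size` plays the role of the signal record duration and `overlap` that of the overlap window.

```python
import math


def ols_mults_per_output(fft_size, overlap):
    """Complex multiplications per useful output sample for overlap-and-save.

    ASSUMED cost model: one forward FFT and one inverse FFT, each costing
    (N/2)*log2(N) multiplications, plus N multiplications by the filter
    frequency response. Each block yields (N - overlap) useful samples.
    """
    if fft_size <= overlap:
        raise ValueError("FFT size must exceed the overlap length")
    per_block = fft_size * math.log2(fft_size) + fft_size
    return per_block / (fft_size - overlap)


def best_fft_size(overlap, max_log2=20):
    """Search power-of-two FFT sizes for the lowest cost per output sample."""
    candidates = [2 ** p for p in range(1, max_log2 + 1) if 2 ** p > overlap]
    return min(candidates, key=lambda n: ols_mults_per_output(n, overlap))


overlap = 340  # hypothetical impulse-response duration in samples
n_opt = best_fft_size(overlap)
print(f"best FFT size: {n_opt}, "
      f"cost: {ols_mults_per_output(n_opt, overlap):.2f} mults/sample")
```

Short records waste most of each block on the overlap, whereas very long records pay a growing log2(N) term, so the cost per output sample has a minimum at an intermediate record length.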
Based on these analytic complexity models, Fig. 3 shows the complexity comparison between a conventional receiver and the multi-sub-band (MSB) DFT-S OFDM receiver, as published in [6].
Figure 3. The complexity comparison of a full-band conventional Rx versus a multi-sub-band (MSB) DFT-S OFDM Rx for 4,000-km transmission over SMF and for 12 taps of memory for the conventional POL-demux 2×2 MIMO EQZ [6].
The main source of complexity savings of the MSB solution stems from the more efficient CD and 2×2 MIMO (PolDemux) equalizers (EQZs) enabled by the use of sub-bands (we have already discussed the positive impact of sub-banding on CD, but a similar benefit is obtained for polarization equalization, as the frequency dependence of polarization mode dispersion (PMD) is negligible over each narrowband sub-band). The "overhead" of the twice under-decimated (Udeci) filter bank (FB) that enables these savings, namely the computational resources invested in partitioning the channel spectrum into sub-bands (the complexity of the filter-bank DSP structure itself), is seen to be just several percent of the total receiver complexity (the low complexity is a key advantage of our patented 2× under-decimated filter-bank structure). We note that both systems were designed to operate at the same high spectral efficiency over 2,000-km of SSMF: very low cyclic prefix (CP) spectral overhead of 1.56% (=2/128 =8/1,024). The full-band DFT-S OFDM transmitter (used with both receivers) uses 1,024-point OFDM symbols and inserts eight samples of CP, while each of the MSB sub-band receivers simply drops one CP sample every 128 samples. In contrast, the full-band receiver needs heavy CD and adaptive 2×2 MIMO EQZs in the time domain (TD) before OFDM processing (an alternative for the conventional receiver would be a very long CP, reducing the spectral efficiency (SE), but we chose to conduct the comparison under identical very high SE assumptions). In summary, the following conclusions are evident: we save 47% of the DSP complexity by counting multipliers, i.e., the complexity is reduced by a factor of 1/(100% - 47%) = 1/0.53 = 1.89. In addition, the conventional single-carrier transmission uses twice oversampling, whereas the proposed scheme uses an oversampling factor of 1.06. Hence, our OS-OFDM approach further reduces the computational complexity.
This is not to be confused with our twice-oversampling, which occurs within each sub-band (we indeed sample at an oversampling factor of 1.06, e.g. 26.6 GS/s for a 25 GHz spectrum, and have even demonstrated an RF anti-aliasing filter that allows us to take advantage of the compact spectrum afforded by our DSP, which is the counterpart of a tight Nyquist spectrum). Thus, complexity is directly reduced, on account of the substantial reduction in sampling ratio, by another factor of 2/1.06 = 1.89 (in addition to the factor, which also happens to be 1.89, due to the 47% reduction in the DSP complexity). Consequently, the total complexity is reduced by a factor of 1.89×1.89 = 3.57. However, the DSP is typically about half of the receiver ASIC (the other half being the soft-decision forward error correction (FEC)), so we save a factor of approximately 3.57/2 = 1.79 for the whole receiver digital ASIC. As 1/1.79 = 0.56, we save 1 - 0.56 = 0.44 = 44%. In summary, under the various stated assumptions, the filter-bank based OS-OFDM transceiver can offer a 44% saving in complexity (accounting for both the multiplier and the sampling rate reductions).
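The chain of factors above can be reproduced step by step. The sketch below simply follows the arithmetic as stated in the text, including the division by two for the DSP share of the ASIC; the variable names are ours.

```python
# Worked arithmetic for the complexity-saving chain quoted in the text.
# Variable names are illustrative; the numbers are those given in the text.

dsp_remaining_fraction = 0.53     # from the quoted factor 1/0.53
oversampling_conventional = 2.0   # conventional single-carrier receiver
oversampling_proposed = 1.06      # proposed OS-OFDM scheme
dsp_share_of_asic = 0.5           # DSP is about half of the receiver ASIC

multiplier_factor = 1.0 / dsp_remaining_fraction
sampling_factor = oversampling_conventional / oversampling_proposed
dsp_factor = multiplier_factor * sampling_factor

# The text's "3.57 / 2" step, reproduced as stated.
asic_factor = dsp_factor * dsp_share_of_asic
asic_saving = 1.0 - 1.0 / asic_factor

print(f"multiplier factor : {multiplier_factor:.2f}")
print(f"sampling factor   : {sampling_factor:.2f}")
print(f"total DSP factor  : {dsp_factor:.2f}")
print(f"whole-ASIC factor : {asic_factor:.2f}")
print(f"whole-ASIC saving : {asic_saving:.0%}")
```

Carrying full precision gives a total DSP factor of about 3.56 rather than 3.57, because the text squares the rounded value 1.89; the final saving still rounds to 44%.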
COMPARISON OF NONLINEAR EQUALIZATION SCHEMES IN TERMS OF COMPLEXITY AND POWER CONSUMPTION
In this section we compare three different nonlinear equalizers, namely the IVSTF-NLE, the DBP-SSF 1 and the DBP-SSF 3 , in terms of computational complexity and power consumption using ASIC technologies. The 3rd-order IVSTF-NLE discussed in this paper is a simplified version of the work presented in [7] . Both approaches, IVSTF-NLE and DBP-SSF, provide an approximate solution of the Manakov equation (12) . However, the key question is which of the two methods introduces the smallest error and at what expense in terms of computational complexity and power consumption. The signal propagation is described by the 
Comparison of computational complexity
The operating principle of the DBP-SSF is depicted in Fig. 4. The algorithm calculates the propagation of the signal through the inverse of the actual link, with inverted fibre loss, dispersion and nonlinearity, and negative amplifier gains. It makes use of the efficient and well-known split-step Fourier method. It functions as a zero-forcing equalizer and, although it has been shown to be sub-optimal when the effects of optical noise are included, it still provides almost optimal performance at an achievable level of complexity. The detailed model of the algorithm can be found in [7]. According to [8], the total complexity of the DBP-SSF with N_steps steps per span, in terms of the number of real multiplications, is equal to $4\,N_{steps} N_{spans}\,N \log_2(N) + 10.5\,N_{steps} N_{spans}\,N$.
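As a rough illustration of the back-propagation principle, the sketch below propagates a toy signal through a simplified fibre model and then undoes each span in reverse order. It is a minimal sketch under strong assumptions and is not the algorithm of [7]: it uses a scalar (single-polarization) model rather than the Manakov equation, one step per span, a lumped nonlinear phase based on the effective length, no noise, and fibre parameters chosen by us. All function and parameter names are hypothetical.

```python
import cmath
import math
import random


def fft(x, inverse=False):
    """Unnormalised recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dispersion(a, beta2, length, dt, direction):
    """Multiply the spectrum by exp(direction * j * (beta2/2) * w^2 * length)."""
    n = len(a)
    spec = fft(a)
    for k in range(n):
        w = 2 * math.pi * (k if k < n // 2 else k - n) / (n * dt)
        spec[k] *= cmath.exp(direction * 0.5j * beta2 * w * w * length)
    return [v / n for v in fft(spec, inverse=True)]


def kerr(a, gamma, l_eff, direction):
    """Rotate each sample by the phase direction * gamma * l_eff * |a|^2."""
    return [v * cmath.exp(direction * 1j * gamma * l_eff * abs(v) ** 2) for v in a]


def forward_span(a, p):
    # Toy fibre span: lumped Kerr phase at the span input, then dispersion.
    a = kerr(a, p["gamma"], p["l_eff"], +1)
    return dispersion(a, p["beta2"], p["span"], p["dt"], +1)


def dbp_span(a, p):
    # Back-propagation: undo the dispersion first, then the Kerr phase.
    a = dispersion(a, p["beta2"], p["span"], p["dt"], -1)
    return kerr(a, p["gamma"], p["l_eff"], -1)


alpha = 0.2 / 4.343 / 1e3                   # assumed loss of 0.2 dB/km, in 1/m
span_m = 80e3                               # assumed 80 km spans
params = {
    "beta2": -21.7e-27,                     # assumed, s^2/m
    "gamma": 1.3e-3,                        # assumed, 1/(W m)
    "span": span_m,
    "l_eff": (1 - math.exp(-alpha * span_m)) / alpha,
    "dt": 1 / 64e9,                         # assumed sampling interval, s
}

random.seed(1)
amp = math.sqrt(1e-3 / 2)                   # about 1 mW average power
tx = [amp * complex(random.choice((-1, 1)), random.choice((-1, 1)))
      for _ in range(64)]

rx = tx
for _ in range(3):                          # three spans of the toy link
    rx = forward_span(rx, params)

eq = rx
for _ in range(3):                          # undo the spans in reverse order
    eq = dbp_span(eq, params)

err_before = max(abs(r - t) for r, t in zip(rx, tx))
err_after = max(abs(e - t) for e, t in zip(eq, tx))
print(f"max error before DBP: {err_before:.3e}, after DBP: {err_after:.3e}")
```

In this noiseless toy model the inversion is exact up to numerical precision, because the equalizer uses exactly the same step structure as the channel. In a real link the fibre acts continuously along each span, so the accuracy depends on the number of steps per span, which is what drives the complexity discussed in this section.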
The operating principle of the IVSTF-NLE is illustrated in Fig. 5 . In this case, the solution of the Manakov equation is obtained with an analytical approach using the inverse Volterra series transfer function (IVSTF) kernels up to the third order, as described in [8] . The equalization process is divided into two parts: The linear part is realized in a single stage for all the fibre spans in the frequency domain (Fig. 5a) , whereas the nonlinear part is realized separately for each fibre span in the time domain (Fig. 5b) . The nonlinear compensation is realized by sweeping the adjustable parameter c in the vicinity of its nominal value, 0 1 [8] .
For the expressions of the IVSTF kernels of the first and third order, we use their simplified versions, as published in [8] . The simplified versions are based on two assumptions:
1. An optical fibre link with a total number of spans N_spans, without dispersion compensation fibre (DCF). Thus, the IVSTF kernel of the first order is expressed as
where L_span denotes the span length.
2. We take a simplified version of
in which the second term on the right-hand side is simplified to a term which represents the effective fiber length per span. Also, the waveform distortion within a span is ignored. Under these simplifications, the 3rd-order kernel of the IVSTF is written as
where is the spacing between the discrete frequencies in the sampled spectrum. The detailed derivation of the above expressions can be found in [8] .
In Fig. 5(a) we present the linear part of the nonlinear equalizer, whose operation is summarized by the following relation, as given in [8] ( ) ( ) ( )
where is the output from the linear branch and ( ) . Following the rationale as described in [8] , the necessary number of real multiplications is 
N
As shown in Fig. 6 , we observe that the DBP-SSF 1 and the IVSTF-NLE equalizers differ only slightly in complexity, while DBP-SSF 3 appears to be almost three times more complex compared to the previous two. 
Comparison of power consumption based on ASIC technologies
In this section, we assess the power consumed for nonlinear compensation when using the IVSTF-NLE, the DBP-SSF 1 and the DBP-SSF 3 equalizers in a 32 Gbaud PM-16QAM system with the following parameters: 1200 km of SMF, consisting of 15 spans with a span length of 80 km, and an erbium-doped fiber amplifier (EDFA) noise figure of 4.5 dB. The number of samples per symbol (SpS) was equal to 2 for both linear and nonlinear compensation. The power consumption of the IVSTF-NLE and the DBP-SSF equalizers was estimated by calculating the number of multipliers used in the equalizer and assuming the use of 90-nm complementary metal oxide semiconductor (CMOS) ASIC technology. Additionally, for the DBP-SSF equalizers, a complete 45-nm CMOS based circuit design study was carried out using a synthesis tool and used to estimate the power consumption. Based on the calculated number of real multiplications for the three different equalizers (Section 3.1), we estimated the corresponding consumed power in each case. For the IVSTF-NLE, the number of real multiplications required for its realization was estimated as follows: the length of the 32 Gbaud PM-16QAM OFDM signal, directly before the input of the IVSTF-NLE and after being downsampled to its initial sampling frequency, is N = 137546 samples, equal to the sum of the number of OFDM symbols in the frame (i.e. data, null and pilot subcarriers). Then, using the aforementioned formula for the number of real multiplications per polarization per sample, with N = 137546 and N_spans = 10, it is calculated that 860.0189 real multiplications are carried out. If we apply the overlap-and-save (OLS) method with an FFT block size of 256 and an overlap size of 46 samples, then 184 bits are carried for 32 Gbaud PM-16QAM. In that case, the number of real multiplications per bit was found to be 4.6740.
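The per-bit figure quoted above follows directly from the two preceding numbers, as the short check below shows. For comparison, the sketch also evaluates an assumed per-sample DBP-SSF cost of (4·log2(N) + 10.5) real multiplications per step and per span; this form is an assumption made for illustration and should not be read as the exact expression used in the paper. Variable and function names are hypothetical.

```python
import math

n_samples = 137546                   # signal length N quoted in the text
n_spans = 10                         # number of spans used in the example
ivstf_mults_per_sample = 860.0189    # quoted IVSTF-NLE figure
bits_carried = 184                   # quoted number of bits carried

# Real multiplications per bit for the IVSTF-NLE.
ivstf_mults_per_bit = ivstf_mults_per_sample / bits_carried


def dbp_mults_per_sample(n, spans, steps_per_span):
    """ASSUMED model: (4*log2(n) + 10.5) real multiplications per sample,
    per step and per span, for DBP-SSF."""
    return spans * steps_per_span * (4 * math.log2(n) + 10.5)


dbp1 = dbp_mults_per_sample(n_samples, n_spans, 1)
dbp3 = dbp_mults_per_sample(n_samples, n_spans, 3)

print(f"IVSTF-NLE : {ivstf_mults_per_bit:.4f} real mults per bit")
print(f"DBP-SSF 1 : {dbp1:.1f} real mults per sample (assumed model)")
print(f"DBP-SSF 3 : {dbp3:.1f} real mults per sample (assumed model)")
print(f"DBP-SSF 3 / IVSTF-NLE ratio: {dbp3 / ivstf_mults_per_sample:.2f}")
```

Under this assumed model, the single-step DBP-SSF comes out slightly below the quoted IVSTF-NLE figure and the three-step variant a little under three times above it, which matches the qualitative ordering reported for Fig. 6.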
The number of real multiplications per bit was then multiplied by the power consumed per multiplier in order to obtain a rough first-order estimate of the power consumption. The power consumption per multiplier was calculated following the rationale described in [11], and it has been estimated to be ~0.24 W. Using this approximation, we estimated the power consumed by the IVSTF-NLE, DBP-SSF 1 and DBP-SSF 3 equalizers for transmission distances ranging from 100 to 1200 km, as presented in Fig. 7. In Fig. 7, the power of the linear part of the equalizer is also plotted for comparison. We observe that the power consumption of the IVSTF-NLE and the DBP-SSF 1 , after 1200 km, is ~250 W and ~220 W, respectively. This can be compared with the power consumption of the DBP-SSF 3 equalizer, which is ~220 W after only 400 km, and greater than 600 W after a 1200 km transmission distance.
Since the previous approach gives only an approximate estimate of the power consumption (although it relies on an accurate computational complexity analysis), we also performed a more thorough and accurate power consumption analysis based on the full digital design of the DBP-SSF DSP, suitable for ASIC implementation. The power consumed by the DBP-SSF 1 is calculated to be ~200 W after a 1200 km transmission distance. Although this value is high, it is within practically achievable limits, as shown by comparison with the power consumption of the commercially available Intel 18-core Xeon processor chip, which is 165 W. The power consumption of the DBP-SSF 3 is more than three times higher than that of the DBP-SSF 1 for a 1200 km transmission distance. Thus, while the performance gains from the DBP-SSF with this step size are greater, its use is limited to a distance of 400 km in order to remain relatively close to the 165 W power limit. These results were obtained without intensive optimization and can be further enhanced by optimizing the circuit at the algorithmic level and reducing the total required number of steps-per-span. The results presented with this second approach were based on an ASIC design in 45-nm CMOS technology. Nonetheless, by switching from the 45-nm to the 22-nm CMOS technology node, the power can be reduced by a factor of almost 2.5, as suggested in [12].
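To put the technology-scaling remark in perspective, the sketch below applies the approximate factor of 2.5 to the 45-nm figures quoted above and compares the results with the 165 W baseline. This is only a back-of-the-envelope illustration under the assumption that the scaling factor applies uniformly to the whole design; the 600 W entry is a lower bound derived from the statement that the DBP-SSF 3 consumes more than three times the power of the DBP-SSF 1 . Names are hypothetical.

```python
# Back-of-the-envelope scaling of the quoted 45-nm power figures to 22 nm.
# ASSUMPTION: the ~2.5x reduction suggested in [12] applies uniformly.

BASELINE_W = 165.0           # Intel 18-core Xeon reference from the text
SCALING_45_TO_22_NM = 2.5    # approximate reduction factor from the text

power_45nm_w = {
    "DBP-SSF 1 @ 1200 km": 200.0,                 # quoted estimate
    "DBP-SSF 3 @ 1200 km (lower bound)": 600.0,   # more than 3 x 200 W
}

power_22nm_w = {name: watts / SCALING_45_TO_22_NM
                for name, watts in power_45nm_w.items()}

for name, watts in power_22nm_w.items():
    relation = "below" if watts < BASELINE_W else "above"
    print(f"{name}: ~{watts:.0f} W at 22 nm, {relation} the "
          f"{BASELINE_W:.0f} W baseline")
```

Under this simple assumption, the single-step equalizer would drop well below the baseline, whereas the three-step equalizer would remain above it at 1200 km.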
CONCLUSION
In this study, a filter-bank based OS-OFDM transceiver is presented as a possible solution to the design considerations for integrating the DSP chip in the limited package space of a CFP-DCO with low power consumption. Its computational complexity was compared with the computational complexity of a conventional transceiver. It is shown that the filter-bank based OS-OFDM transceiver can provide a DSP complexity reduction of over 40% compared to the conventional one.
Given that the filter-bank based OS-OFDM DSP provides only linear compensation, the study was extended by estimating the computational complexity and power consumption of three different nonlinear equalizers, namely the IVSTF-NLE, the DBP-SSF 1 and the DBP-SSF 3 . The results reveal that the computational complexity of the DBP-SSF 1 is slightly lower than that of the IVSTF-NLE, in terms of the required number of real multiplications, whereas the DBP-SSF 3 is almost three times more complex than the other two. Regarding power consumption, the IVSTF-NLE consumes slightly more than the DBP-SSF 1 , which is in agreement with the computational complexity analysis. The power consumption of the DBP-SSF 1 is close to the power limit of 165 W (comparable to that of the high-performance Intel 18-core Xeon processor) after 1200 km, while the DBP-SSF 3 consumes prohibitively high power, exceeding ~600 W. Therefore, given the stringent requirements for CFP-DCO transceivers, it is apparent from our results that considerable effort should be focused on reducing the power consumption of the DSP chip implementation when nonlinear compensation techniques are taken into account, which might come at the expense of the transmission performance (as measured in reach increase).
