Design tradeoffs and challenges in practical coherent optical transceiver implementations by Morero, Damián Alfonso et al.
JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 34, NO. 1, JANUARY 1, 2016 121
Design Tradeoffs and Challenges in Practical
Coherent Optical Transceiver Implementations
Damián A. Morero, Mario A. Castrillón, Alejandro Aguirre, Mario R. Hueda, and Oscar E. Agazzi, Life Fellow, IEEE
(Tutorial Review)
Abstract—This tutorial discusses the design and ASIC imple-
mentation of coherent optical transceivers. Algorithmic and archi-
tectural options and tradeoffs between performance and complex-
ity/power dissipation are presented. Particular emphasis is placed
on flexible (or reconfigurable) transceivers because of their im-
portance as building blocks of software-defined optical networks.
The paper elaborates on some advanced digital signal processing
(DSP) techniques such as iterative decoding, which are likely to
be applied in future coherent transceivers based on higher order
modulations. Complexity and performance of critical DSP blocks
such as the forward error correction decoder and the frequency-
domain bulk chromatic dispersion equalizer are analyzed in detail.
Other important ASIC implementation aspects including physical
design, signal and power integrity, and design for testability, are
also discussed.
Index Terms—ASIC, chromatic dispersion equalization, CMOS
implementation, DSP, FEC, iterative optical receivers, optical
fiber, reconfigurable coherent transceivers, VLSI.
I. INTRODUCTION
The combination of digital signal processing (DSP), advanced CMOS VLSI technology, and coherent optical
transmission has revolutionized optical communications and it
has enabled major increases in speed, capacity, spectral effi-
ciency and flexibility of optical transmission, as well as major
cost reductions [1]–[4]. Transmission at 100 Gigabits per sec-
ond (Gb/s) with dual polarization (DP) quadrature phase shift
keying (QPSK) coherent systems (i.e., symbol rate ≈ 32 Giga-
baud, GBd) has reached maturity and widespread deployment.
200 Gb/s DP 16-quadrature amplitude modulation (16-QAM)
coherent transceivers are commercially available today. The
semiconductor industry is currently engaged in the ASIC development
of transceivers for 400 Gb/s and will soon move to
1000 Gb/s (1 Tb/s). One way to reach 400 Gb/s immediately
is to transmit on two wavelengths at 200 Gb/s each [1].
Key drivers of coherent technology today are: (i) increases in
speed, (ii) reduction of power dissipation, and (iii) flexibility
in modulation formats, data rates, coding gain (CG), host-side
protocols, etc. [5]. The latter functionality is a fundamental
aspect of reconfigurable transceivers, which are key components
of software defined optical networks (SDON) [6], [7].
Manuscript received June 15, 2015; revised July 26, 2015; accepted August
13, 2015. Date of publication August 18, 2015; date of current version January
24, 2016.
D. A. Morero is with the ClariPhy Argentina S.A., Bloque Suquía, Complejo
Capitalinas, Córdoba X5000SFV, Argentina, and also with the Universidad
Nacional de Córdoba - CONICET, Córdoba X5016GCA, Argentina (e-mail:
dmorero@efn.uncor.edu).
M. A. Castrillón and M. R. Hueda are with the Universidad Nacional de
Córdoba - CONICET, Córdoba X5016GCA, Argentina (e-mail:
acastrillon@efn.uncor.edu; mhueda@com.uncor.edu).
A. Aguirre is with the ClariPhy Argentina S.A., Bloque Suquía,
Complejo Capitalinas, Córdoba X5000SFV, Argentina (e-mail:
alejandro.aguirre@clariphy.com.ar).
O. E. Agazzi is with the ClariPhy Communications, Inc., Irvine, CA 92618
USA (e-mail: oscar.agazzi@clariphy.com).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JLT.2015.2470114
This article presents tradeoffs in the design and implementa-
tion of coherent optical reconfigurable transceivers. Algorithmic
and architectural options and tradeoffs between performance
and complexity are presented and candidate DSP algorithms
for next generation coherent transceivers are discussed. As an
example, we discuss the joint iterative detection and decod-
ing (JIDD) algorithm proposed in [8]. This algorithm has been
found to be effective in compensating laser phase noise, laser
frequency fluctuations, and fiber nonlinearities [9].
The telecommunications industry is migrating from static
networks with little flexibility to SDON [10], [11]. The moti-
vation for this paradigm shift is that large increases in traffic
cannot be supported by static increases in network capacity.
The usable bandwidth in fibers is limited and increases in
spectral efficiency come at the cost of reduced reach. As
a result, a homogeneous increase in data rate or spectral
efficiency is not economically viable. Furthermore, telecom-
munications companies face an increasingly heterogeneous and
dynamic environment in optical transport networks (OTN).
This includes, among other parameters, connection lengths,
bandwidth requirements, and connection hold times. Another
reason for this heterogeneous and dynamic environment is the
migration to flexible grid dense wavelength division multiplexing
(DWDM) links [11]. Supporting elastic networks or SDONs requires
transceivers that support adaptation of at least the following
parameters and functionality: modulation format, symbol rate,
channel spacing, and forward error correction (FEC) overhead
and CG [1], [7], [12]. By adapting these and other parameters
it is possible to achieve excellent power dissipation/reach/data
rate tradeoffs. The same transceiver can operate in metro,
long haul, ultra long haul and submarine links with optimal
performance and high power efficiency. Towards this end,
FEC codes with medium-high net coding gains (NCG) (e.g.,
NCG = [9 − 13] dB) at a bit error rate (BER) of $10^{-15}$ and
variable overhead (OH) (e.g., OH = [7 − 60]%) are mandatory
(e.g., see [13] and references therein). Furthermore, the demand
for higher symbol rates (e.g., 45 or 64 GBd) on long fiber links
0733-8724 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
prompts the implementation of complex and flexible bulk chro-
matic dispersion (BCD) equalizers capable of compensating
a wide range of chromatic dispersion (e.g., [0 − 250] ns/nm)
[4], [14], [15]. Complexity and power consumption of high
performance FEC and BCD blocks are major challenges for
practical implementations of commercial devices.¹
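To give a feel for why the BCD equalizer is demanding, the following minimal sketch estimates the channel memory implied by an accumulated dispersion of 250 ns/nm. It uses the common approximation Δτ ≈ D_acc · λ² · B / c. The 1550 nm wavelength and the choice of a signal bandwidth equal to the symbol rate are assumptions made here for illustration; they are not values taken from this paper.

```python
# Rough estimate of the channel memory caused by chromatic dispersion.
# Assumptions (not from the paper): 1550 nm carrier, bandwidth = symbol rate.
C = 299_792_458.0  # speed of light in vacuum [m/s]


def cd_spread_seconds(d_acc_ns_per_nm, bandwidth_hz, wavelength_m=1550e-9):
    """Approximate pulse spreading: delta_tau = D_acc * delta_lambda."""
    # Spectral width expressed in wavelength: delta_lambda = lambda^2 * B / c [m]
    delta_lambda_m = wavelength_m ** 2 * bandwidth_hz / C
    # 1 ns/nm is numerically equal to 1 s/m, so no unit conversion is needed.
    return d_acc_ns_per_nm * delta_lambda_m


def cd_spread_symbols(d_acc_ns_per_nm, baud, wavelength_m=1550e-9):
    """Pulse spreading expressed in symbol periods."""
    return cd_spread_seconds(d_acc_ns_per_nm, baud, wavelength_m) * baud


if __name__ == "__main__":
    for baud in (32e9, 45e9, 64e9):
        n = cd_spread_symbols(250.0, baud)
        print(f"{baud / 1e9:.0f} GBd: about {n:.0f} symbol periods of memory")
```

Under these assumptions the memory in symbol periods grows with the square of the symbol rate, which is one way to see why higher symbol rates on long links drive up equalizer complexity.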
In this paper we address practical aspects of the architecture
of commercial coherent transceivers for metro, long haul, and
submarine fiber optic communications. We focus the analysis
on performance-complexity tradeoffs of the most critical DSP
blocks: BCD equalizer and FEC decoder. Implementation top-
ics such as layout, energy efficiency, wiring, signal and power
integrity issues, testability, etc., are also presented. Since these
factors have a significant impact on performance, it is crucial to
take them into account during the design and implementation of
the transceiver.
The rest of this paper is organized as follows. Section II dis-
cusses architectural tradeoffs as well as some promising DSP
techniques. Section III analyzes performance-complexity trade-
offs of the most critical DSP blocks in coherent transceivers,
while Section IV addresses practical implementation topics. Fi-
nally, concluding remarks are given in Section V. To facilitate
reading, frequently used abbreviations are listed in Table I.
II. ARCHITECTURAL TRADEOFFS
Optical network operators want to achieve the best possible
tradeoff between spectral efficiency and reach. On short fibers,
where the optical signal-to-noise ratio (OSNR) is high, they want
to be able to transmit more bits per unit bandwidth by selecting
larger constellations such as 16-QAM, 32-QAM or 64-QAM.
But on longer fibers where the OSNR is much lower, they want
to be able to switch to smaller constellations such as QPSK or
even binary phase shift keying (BPSK) to be able to close the
link at the expense of a reduced data rate. Other tradeoffs can be
achieved by controlling the code rate or overhead and the CG,
varying the symbol rate, etc. If the transceiver can support this
type of programmability, it is also possible to achieve the best
tradeoff between performance and power dissipation. Moreover,
the same transceiver can be used in very different applications,
ranging from metro to long haul, ultra long haul or submarine, by
just reconfiguring it appropriately. This kind of reconfigurable
transceiver is a key component of SDON. In the following we
analyze some tradeoffs in the design of the most critical blocks
of reconfigurable coherent transceivers.
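The relation between constellation size, symbol rate, and data rate discussed above can be made concrete with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the single `overhead_pct` parameter lumps FEC, framing, and pilot overhead together, which is a simplifying assumption.

```python
# Raw and net data rates of a dual polarization coherent link.
# Assumption: all overheads (FEC, framing, pilots) are lumped into one figure.
BITS_PER_SYMBOL = {
    "BPSK": 1, "QPSK": 2, "8-QAM": 3,
    "16-QAM": 4, "32-QAM": 5, "64-QAM": 6,
}


def raw_line_rate_gbps(modulation, baud_gbd, polarizations=2):
    """Raw bit rate on the line: polarizations * bits per symbol * symbol rate."""
    return polarizations * BITS_PER_SYMBOL[modulation] * baud_gbd


def net_rate_gbps(modulation, baud_gbd, overhead_pct, polarizations=2):
    """Information rate left after removing the lumped overhead."""
    raw = raw_line_rate_gbps(modulation, baud_gbd, polarizations)
    return raw / (1.0 + overhead_pct / 100.0)


if __name__ == "__main__":
    for mod in ("BPSK", "QPSK", "16-QAM", "64-QAM"):
        raw = raw_line_rate_gbps(mod, 32.0)
        net = net_rate_gbps(mod, 32.0, overhead_pct=28.0)
        print(f"DP-{mod} at 32 GBd: raw {raw:.0f} Gb/s, net {net:.1f} Gb/s")
```

With a lumped overhead of 28% (an illustrative value), DP-QPSK at 32 GBd carries about 100 Gb/s of payload and DP-16-QAM about 200 Gb/s, in line with the rates quoted in the Introduction.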
A. Practical Architecture of Reconfigurable Transceivers
Reconfigurable transceivers support link adaptation, where
transmission parameters such as modulation and coding are
adjusted to take advantage of prevailing channel conditions.
In this way it is possible to achieve excellent power dissipation/reach/data
rate tradeoffs. A related concept is that of sliceable
transceivers [7], where, for example, a 400 Gb/s transceiver
can be sliced into three virtual transceivers with rates 200, 100
and 100 Gb/s. Table II shows some examples of programmable
parameters in a reconfigurable transceiver.
¹Development of high speed, high resolution analog-to-digital and digital-to-analog
converters (i.e., ADC and DAC) is also a difficult task (e.g., see [16]).
TABLE I
LIST OF COMMONLY USED ABBREVIATIONS
ASIC      Application-Specific Integrated Circuit
B2B       Back to Back
BCD       Bulk Chromatic Dispersion
BER       Bit Error Rate
BICM      Bit Interleaved Coded Modulation
BPS       Blind Phase Search
BPSK      Binary Phase Shift Keying
CDE       Chromatic Dispersion Estimation
CG        Coding Gain
CPR       Carrier Phase Recovery
CTS       Clock Tree Synthesis
DFT       Design for Testability
DP        Dual Polarization
DSP       Digital Signal Processing
DWDM      Dense Wavelength Division Multiplexing
ECPR      Explicit Carrier Phase Recovery
FEC       Forward Error Correction
FFE       Feedforward Equalizer
FFT       Fast Fourier Transform
FTN       Faster-Than-Nyquist
HD        Hard Decision
IFFT      Inverse Fast Fourier Transform
JIDD      Joint Iterative Detection and Decoding
LC-CM     Low Complexity Coded Modulation
LDPC      Low Density Parity Check
NCG       Net Coding Gain
OH        Overhead
OSNR      Optical Signal-to-Noise Ratio
PDN       Power Distribution Network
PMD       Polarization Mode Dispersion
QAM       Quadrature Amplitude Modulation
QPSK      Quadrature Phase Shift Keying
RAM       Random Access Memory
SD        Soft Decision
SDD       Soft Decision Demapper
SDON      Software Defined Optical Networks
SQRT-RC   Square Root Raised Cosine
SNR       Signal-to-Noise Ratio
STA       Static Timing Analysis
TPC       Turbo Product Codes
UQ        Uniform Quantization
TABLE II
EXAMPLES OF PROGRAMMABLE PARAMETERS IN A RECONFIGURABLE TRANSCEIVER
Parameter                                    Value
Symbol Rate                                  10–32 GBd
Modulation Schemes (diff/no-diff)            DP-BPSK, DP-QPSK, DP-8-QAM, DP-16-QAM
Line-side SD-FEC (Gain [dB]/Overhead [%])    LDPC (11.3/20, 11.2/18, 11.1/16)
Line-side HD-FEC (Gain [dB]/Overhead [%])    RS (8/6.67)
Pilot Symbol Overhead                        0–5 %
CD Compensation Capability                   2–250 ns/nm
Mean PMD Compensation Capability             10–50 ps
Polarization State Tracking                  10–100 kHz
Host Protocol                                100G Ethernet, OTU3, OTU4
Fig. 1 describes the egress path (transmit path from the client
towards the optical fiber) of a typical reconfigurable transceiver
available today. Data is received from the host on up to 20 lanes
Fig. 1. Block diagram of the egress path.
Fig. 2. Block diagram of the ingress path.
at 10 Gb/s per lane, and assembled in OTU4 Frames, according
to the OTN standards. The data is then modulated in one of vari-
ous user selectable modulation formats, such as BPSK, QPSK or
16-QAM, either differential or non-differential. Then the sym-
bols are filtered by spectral shaping filters having a square root
raised cosine response (SQRT-RC), possibly combined with a
pre-emphasis response (necessary to pre-compensate for the at-
tenuation of the electrical circuits connecting the chip with the
optical modulator). The sampling rate of these filters is typi-
cally twice the symbol rate, but it can be adapted by digital
interpolation to a different sampling rate in the DACs, chosen
to optimize analog design. The outputs of the transmitter are
four signals representing the in-phase and the quadrature com-
ponents of the horizontal and vertical polarizations. Therefore
the optical signal is modulated in both phase and polarization.
Fig. 2 describes the ingress path (receive path from the opti-
cal fiber towards the client). The optical signal is converted to
electrical and it enters the transceiver as four lanes represent-
ing the in-phase and quadrature components of the two polar-
izations. These are sampled and digitized by four high speed
ADCs, sampling typically at a rate of between 55 and 64 GHz.
After some pre-processing,² the signal is passed to the BCD
equalizer, which estimates the chromatic dispersion of the link
through the fiber length estimator (FLE) block, and compensates
the chromatic dispersion of several thousand kilometers of
fiber. SQRT-RC filtering is also implemented in the BCD block.
²The preprocessing before BCD includes: retiming and demultiplexing to
reduce the clock rate for parallel processing, compensation of demodulator
skews and angular errors, and coarse carrier frequency offset compensation.
Typical values of the sampling rate in the BCD filter could be
in the range of 1.5/T to 1.75/T . A FIFO is used to cross from
the clock domain of the BCD filter to that of the feedforward
equalizer (FFE). The interpolator resamples the signal to 2/T
and adjusts the sampling phase under control of the timing re-
covery (TR) block. TR synchronizes the local clock to the clock
of the transmitter at the opposite end of the fiber [17]. It also
adjusts the phase to maximize the signal-to-noise ratio (SNR).
Then the signal is passed to the FFE, which compensates for
polarization mode dispersion (PMD) and other forms of inter-
symbol interference (ISI), and demultiplexes the polarizations
[18]. Then the carrier phase recovery (CPR) block eliminates
any residual carrier not eliminated by the local oscillator and
compensates phase noise [19], [20], and finally the signal is
detected by a soft decision demapper (SDD). The SDD is a gen-
eralization of a simple slicer. Instead of hard decisions (HDs),
it computes soft decisions that represent the log likelihood ratio
of a 0 versus a 1. This information is used by the soft decision
(SD) FEC decoder to achieve higher CG than a traditional hard
decision FEC decoder [21]. Finally, the OTU4 frames are iden-
tified, the payload is extracted, errors are corrected by the soft
decision FEC decoder (e.g., a low density parity check (LDPC)
decoder) and the bits are passed to the client through the host
interface.
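The difference between a slicer and a soft decision demapper can be illustrated with a small sketch. It computes, for one real dimension of a 16-QAM constellation (a 4-PAM signal), the log likelihood ratio of a 0 versus a 1 for each bit, assuming additive white Gaussian noise and equiprobable symbols. The Gray labelling in `PAM4_GRAY` and the function names are assumptions chosen for illustration, not the mapping of any particular transceiver.

```python
import math

# Gray-labelled 4-PAM levels (one dimension of 16-QAM).
# The bit labelling is a hypothetical choice made for this example.
PAM4_GRAY = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): 1.0, (1, 0): 3.0}


def _log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def soft_demap(y, noise_var, constellation=PAM4_GRAY):
    """Return one LLR per bit: ln P(bit = 0 | y) - ln P(bit = 1 | y)."""
    n_bits = len(next(iter(constellation)))
    llrs = []
    for i in range(n_bits):
        # Gaussian log-likelihoods of the points whose i-th bit is 0 or 1.
        m0 = [-(y - s) ** 2 / (2.0 * noise_var)
              for bits, s in constellation.items() if bits[i] == 0]
        m1 = [-(y - s) ** 2 / (2.0 * noise_var)
              for bits, s in constellation.items() if bits[i] == 1]
        llrs.append(_log_sum_exp(m0) - _log_sum_exp(m1))
    return llrs


def hard_slice(llrs):
    """A slicer keeps only the sign of each LLR."""
    return [0 if value >= 0.0 else 1 for value in llrs]


if __name__ == "__main__":
    for y in (-2.8, -0.1, 0.9):
        llrs = soft_demap(y, noise_var=0.5)
        print(y, [round(v, 2) for v in llrs], hard_slice(llrs))
```

A sample close to a decision boundary produces an LLR of small magnitude, and that reliability information is what a soft decision decoder can exploit and a hard decision decoder discards.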
B. DSP Tradeoffs
1) Carrier Phase Recovery: CPR is a key function of in-
tradyne coherent optical QPSK/M -QAM receivers [19], [22].
In these devices, CPR algorithms are required to track effects
such as phase noise as well as laser frequency fluctuations in-
troduced by mechanical vibrations or other sources, including
power supply noise [23]. Phase noise constitutes one of the ma-
jor factors that limit the performance of coherent transceivers.
Phase noise may exist as a result of the nonzero linewidth of
the transmit laser or the local oscillator. It is also introduced by
self and cross phase modulations (SPM/XPM) in DWDM sys-
tems [24]–[27]. Feedforward phase estimation schemes such as
the Viterbi-Viterbi (VV) or blind phase search (BPS) algorithms
have been widely used as a result of their good laser linewidth
tolerance and feasibility for parallel implementation [19], [22].
These schemes can be used in combination with feedback CPR
techniques in order to improve the tracking of short-term fre-
quency instabilities of the lasers [20]. Alternatively, a JIDD
technique with pilot symbols has been found to provide a better
performance than traditional CPR solutions in the presence of
laser phase noise, fiber nonlinearities, and frequency fluctuations [9].³
However, complexity reduction of the JIDD receiver
is still required for it to be applied in commercial devices. In
DWDM systems with high XPM-induced phase noise, a pilot
RF tone inserted at the edge of the band in the transmitted signal
can be exploited by the receiver to extract a reference for the car-
rier recovery system [28], [29]. Since the pilot tone is affected
by the same phase noise as the signal, the extracted RF tone can
be used to compensate the carrier phase noise. A pilot RF tone
could also be combined with the JIDD technique to improve the
decoding process and avoid the overhead of the pilot symbols
typically used in non-differential modulation systems.
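As an illustration of a feedforward phase estimator, the following minimal sketch applies the fourth-power (Viterbi-Viterbi) idea to a block of QPSK symbols. It assumes QPSK points at odd multiples of π/4, a phase that is constant over the block, and no noise in the demonstration; practical implementations add averaging windows, phase unwrapping, and parallelism that are not shown here.

```python
import cmath
import math


def vv_phase_estimate(block):
    """Fourth-power estimate of the carrier phase of a QPSK block.

    Raising a QPSK symbol at pi/4 + k*pi/2 to the 4th power removes the
    modulation (every point maps to -1), leaving 4 times the phase error.
    """
    acc = sum(r ** 4 for r in block)
    # The estimate is only unambiguous within (-pi/4, pi/4].
    return cmath.phase(-acc) / 4.0


def derotate(block, phase):
    """Remove the estimated phase from every symbol in the block."""
    return [r * cmath.exp(-1j * phase) for r in block]


if __name__ == "__main__":
    theta = 0.2  # illustrative phase error in radians
    tx = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2))
          for k in (0, 1, 3, 2, 1, 0, 2, 3)]
    rx = [s * cmath.exp(1j * theta) for s in tx]
    estimate = vv_phase_estimate(rx)
    print("estimated phase:", estimate)
    print("max residual error:",
          max(abs(a - b) for a, b in zip(derotate(rx, estimate), tx)))
```

Because the estimate is unambiguous only within a 90° interval, an estimator of this kind cannot distinguish a phase error θ from θ plus a multiple of 90°, which is the ambiguity behind cycle slips.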
2) Chromatic Dispersion Equalization: As we shall show
in Section III, reducing the power consumption of the chro-
matic dispersion equalizer is a challenge in ultra long haul or
submarine transceivers required to compensate CD up to 250-
300 ns/nm. Another challenge in all applications of coherent
transceivers is achieving high accuracy and fast convergence of
the chromatic dispersion estimation (CDE) algorithm [30], [31].
The accuracy of frequency domain CDE techniques based on the
timing tone level decreases in the presence of spectral shaping
with low roll-off factor. Accuracy and speed of convergence of
CDE algorithms based on time domain cost functions depend
on the modulation format [30] and may degrade in the presence
of high noise and residual channel dispersion (e.g., high-order
PMD or controlled ISI in faster-than-Nyquist (FTN) systems
[32]). Exhaustive computer simulations are necessary to ensure
reliable operation of CDE.
3) Nonlinear Compensation: Nonlinear compensation us-
ing digital back propagation has been extensively discussed in
the literature [29], [33]–[36]. Some reduced complexity implementations
of back propagation have been proposed; however,
complexity remains very high. A lower complexity technique
called perturbation based precompensation has been proposed
[37], [38]. This technique is based on computing long sums of
cross products of transmitted symbols. The terms are weighted
by channel-dependent coefficients. Although the number of
terms involved in the summations could be large, complexity
is not high because the cross products are essentially logic operations
which are efficiently implemented with a few gates. The
main challenge with this technique is that the coefficients are
not easily determined and there is no known way to make them
adaptive.
³Additional discussion on iterative receivers is provided in Section II-E.
4) Spectral Shaping: A new standard, called the flexible
grid, enables the allocation of fine-grained units of bandwidth to
each wavelength. Spectral shaping aims at reducing the band-
width of the channels with the objective of packing as many
channels as possible in a given optical fiber bandwidth, tak-
ing advantage of the flexible grid. Conversely, spectral shaping
allows maximizing the data rate transmitted on a given band-
width, such as the typical 50 GHz DWDM channels. As we
described previously, spectral shaping is typically implemented
using SQRT-RC filters and requires DACs at the transmitter (TX) side.
Customers are interested in using roll-off factors as low as possible
(e.g., < 5%) in order to maximize the spectral efficiency. This
creates challenges for several DSP blocks such as TR where a de-
crease of the roll-off factor reduces the energy of the timing tone
used to recover the clock. Recent works have proposed new TR
techniques for Nyquist-shaped signals with small roll-off factor
[3], [17]. Further work is required to assess their tolerance in
the presence of channel impairments such as high-order PMD
or residual CD.
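The benefit of a low roll-off factor can be quantified with the standard relation for raised-cosine shaped signals, whose occupied bandwidth is (1 + α) times the symbol rate. The sketch below is illustrative only: the function names, the absence of guard bands, and the example symbol rates are assumptions.

```python
# Occupied bandwidth of a (square root) raised-cosine shaped signal.
# Assumption: no guard band unless guard_ghz is given explicitly.


def occupied_bandwidth_ghz(baud_gbd, roll_off):
    """Two-sided bandwidth (1 + alpha) * Rs of a raised-cosine spectrum."""
    if not 0.0 <= roll_off <= 1.0:
        raise ValueError("roll-off factor must lie in [0, 1]")
    return (1.0 + roll_off) * baud_gbd


def fits_in_slot(baud_gbd, roll_off, slot_ghz=50.0, guard_ghz=0.0):
    """Check whether the shaped signal fits in a DWDM slot of slot_ghz."""
    return occupied_bandwidth_ghz(baud_gbd, roll_off) + guard_ghz <= slot_ghz


if __name__ == "__main__":
    for baud, alpha in ((32.0, 0.05), (45.0, 0.20), (45.0, 0.05)):
        bw = occupied_bandwidth_ghz(baud, alpha)
        print(f"{baud:.0f} GBd, roll-off {alpha:.2f}: {bw:.2f} GHz, "
              f"fits in 50 GHz: {fits_in_slot(baud, alpha)}")
```

In this example a 45 GBd signal fits a 50 GHz channel with a 5% roll-off but not with 20%. The excess bandwidth α·Rs that is being removed is also where the timing tone resides, which is why TR becomes harder as the roll-off shrinks.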
C. FEC Tradeoffs and Trends
1) Performance: Let K and N be the code dimension and
the length of a block FEC code. The code rate is R = K/N
while the overhead is given by OH = 100 × (N/K − 1) [%].
CG is the SNR difference between the uncoded and coded system
required to achieve a certain BER (e.g., $10^{-15}$) over an
ideal BPSK modulated additive white Gaussian noise (AWGN)
channel. The NCG, defined by
$$\mathrm{NCG} = \mathrm{CG} + 10 \log_{10}(R), \qquad (1)$$
is used to evaluate the performance of a FEC. Note that the NCG
takes into account the SNR penalty caused by the bandwidth
expansion required to keep the effective information rate of the
coded system.
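The definitions of code rate, overhead, and NCG in (1) translate directly into code. The following sketch uses a hypothetical code with K = 5 and N = 6 (that is, 20% overhead) and an illustrative CG value; neither is taken from a specific FEC in this paper.

```python
import math


def code_rate(k, n):
    """Code rate R = K / N."""
    return k / n


def overhead_pct(k, n):
    """Overhead OH = 100 * (N / K - 1) in percent."""
    return 100.0 * (n / k - 1.0)


def net_coding_gain_db(cg_db, k, n):
    """NCG = CG + 10 * log10(R), as in (1); log10(R) is negative for R < 1."""
    return cg_db + 10.0 * math.log10(code_rate(k, n))


if __name__ == "__main__":
    k, n = 5, 6          # hypothetical code dimensions
    cg_db = 12.0         # illustrative coding gain
    print(f"R = {code_rate(k, n):.4f}, OH = {overhead_pct(k, n):.1f} %")
    print(f"NCG = {net_coding_gain_db(cg_db, k, n):.2f} dB")
```

For a 20% overhead the rate term costs roughly 0.8 dB, so a code must deliver at least that much additional CG before a higher overhead pays off in NCG.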
Fig. 3 shows the NCG versus OH map of the state of the
art FECs suitable for high speed optical communication [21],
[39]–[51]. Since current commercial FECs use moderate OHs
(∼ 20%), an important performance increase is expected in the
short term by moving from an OH of ∼ 20% to values between
25% and 60%.⁴ Note that a further increase of the OH (i.e.,
> 60%) does not provide significant benefits in NCG. Therefore,
new elaborate modulation, coding, and DSP solutions will be
required in the medium-long term to improve the performance
of optical coherent reconfigurable transceivers. Some examples
of the most promising techniques are discussed in Section II-E.
⁴In applications with limited bandwidth, such as those which have 50 GHz
optical filters, the additional overhead may be absorbed by the use of higher
order modulations without increasing the signal bandwidth.
Fig. 3. NCG versus OH of the state of the art FECs suitable for high speed
optical communication [21], [39]–[51].
2) FEC and Pilot Symbols: Constellations with rotational
symmetry, such as QPSK or 16-QAM, suffer from certain catastrophic
errors called cycle slips, which happen when the carrier
phase rotates by a multiple of 90° [52]. This effect can
be mitigated by using differential encoding, but this causes an
OSNR penalty in the range of 1 dB. To reduce this penalty, non-
differential modulation can be used in combination with certain
known pilot symbols introduced by the transmitter [53]. They
allow the receiver to unambiguously determine the carrier phase
avoiding or greatly mitigating the effect of cycle slips. Cycle slip
correction eliminates most of the penalty of differential modu-
lation. Note that the pilot rate introduces an extra overhead to
the coded system. The optimal values of the pilot rate and FEC
OH depend on channel impairments (such as phase noise level)
and link configuration (e.g., modulation format). In order to im-
prove spectral efficiency, the OH of the pilot symbols and FEC
code should be among the configurable parameters discussed in
Section II.
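The way differential encoding limits the damage of a cycle slip can be shown on quadrant indices alone. In this minimal sketch each QPSK symbol is represented by its quadrant (0 to 3), and a cycle slip is modelled as a 90° rotation applied to every symbol from some index onward. The function names and the test sequence are hypothetical, and the sketch does not model noise or the OSNR penalty of differential detection.

```python
def diff_encode(data, start=0):
    """Transmit quadrant d_k = (d_{k-1} + a_k) mod 4 for data quadrants a_k."""
    out, prev = [], start
    for a in data:
        prev = (prev + a) % 4
        out.append(prev)
    return out


def diff_decode(received, start=0):
    """Recover a_k = (d_k - d_{k-1}) mod 4 from received quadrants."""
    out, prev = [], start
    for d in received:
        out.append((d - prev) % 4)
        prev = d
    return out


def apply_cycle_slip(quadrants, slip_index, slip=1):
    """Rotate every symbol from slip_index onward by slip * 90 degrees."""
    return [(q + slip) % 4 if i >= slip_index else q
            for i, q in enumerate(quadrants)]


def count_errors(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    data = [0, 1, 3, 2, 2, 0, 1, 3]
    # Non-differential: every symbol after the slip is wrong.
    plain_rx = apply_cycle_slip(data, slip_index=4)
    print("non-differential errors:", count_errors(data, plain_rx))
    # Differential: only the symbol at the slip is affected.
    diff_rx = diff_decode(apply_cycle_slip(diff_encode(data), slip_index=4))
    print("differential errors:", count_errors(data, diff_rx))
```

Without differential encoding the error burst lasts until the phase reference is restored, for example by a pilot symbol, which is why the pilot rate trades overhead against robustness.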
3) FEC Design and Error Floor: Powerful FEC techniques
such as LDPC codes or turbo product codes (TPC) with iter-
ative soft-decision decoding are widely used to provide high
NCG at very low BER ($10^{-15}$) as required in optical networks.
Unfortunately, LDPC and TPC with iterative SD decoding may
suffer from error floors at low error rates (e.g., $< 10^{-10}$) [54].
The error floor problem can be mitigated (or avoided) by a very
careful design of both the parity check matrix and the decoder
algorithm [42]. Serial code concatenation [55] (SCC) can also
be used to combat the error floor problem. Although SCC al-
lows the use of iterative FEC codes optimized for high CG and
not for error-floor reduction (e.g., irregular LDPC codes), non-
concatenated FEC schemes may provide a better solution in
terms of latency and parameter programmability.
4) Soft and Hard Decoding (SD Versus HD): Soft-decision
decoding is able to provide up to 1.5 dB gain over hard-decision
decoding at the expense of a higher implementation complex-
ity. This increase in complexity (and power consumption) of
SD decoders cannot be avoided in certain applications (e.g.,
OH ∼ 20% and NCG ≳ 11.5 dB). On the other hand, FEC
codes with OH ≲ 15% and HD decoding are preferred in applications
where low power consumption is mandatory. In moderate
performance situations (e.g., NCG ∼ [9 − 11] dB), a careful
comparison between low OH-SD and high OH-HD is needed
to derive a FEC solution that achieves a proper tradeoff be-
tween NCG and implementation complexity. The latter topic is
discussed in more detail in Section III-D.
5) FEC Latency: FEC latency is of particular importance in
applications such as high frequency trading [56] which require
very low latency transactions. In these situations the latency
introduced by FEC at the transmitter can be avoided by using
systematic encoding. However at the receiver the decoder must
first collect the whole codeword before correcting the errors. As
a consequence, a short codeword length is required to reduce
latency. This limits the use of high performance HD-FECs, such
as [39], which require large codeword length. Furthermore, in
iterative decoders the latency is in general proportional to the
number of iterations. Codes with fast convergence are prefer-
able. These codes must be designed to optimize their perfor-
mance under the constraint of a small number of iterations. The
latency caused by iterations can also be reduced by implement-
ing a very fast recursive hardware engine instead of a pipelined
chain of concatenated iteration stages. Also note that since short
codeword length SD-FECs can provide a performance similar to
that of long codeword length HD-FECs, the former becomes an
interesting candidate for low latency applications with moderate
FEC performance.
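A very simple latency model makes the two contributions named above explicit: the time to collect a codeword and the time spent iterating. This is a rough sketch under stated assumptions; the function name, the linear dependence on the iteration count, and all numerical values are hypothetical and do not describe any particular decoder.

```python
def decoder_latency_us(codeword_bits, line_rate_gbps,
                       iterations=1, time_per_iteration_us=0.0):
    """Latency = time to receive one codeword + iterations * time per iteration.

    1 Gb/s corresponds to 1e3 bits per microsecond.
    """
    collect_us = codeword_bits / (line_rate_gbps * 1e3)
    return collect_us + iterations * time_per_iteration_us


if __name__ == "__main__":
    # Hypothetical long HD-FEC codeword versus short SD-FEC codeword.
    long_hd = decoder_latency_us(500_000, 100.0, iterations=4,
                                 time_per_iteration_us=0.5)
    short_sd = decoder_latency_us(20_000, 100.0, iterations=10,
                                  time_per_iteration_us=0.1)
    print(f"long codeword:  {long_hd:.2f} us")
    print(f"short codeword: {short_sd:.2f} us")
```

In this model the codeword collection term dominates for long codes, while for short iterative codes the iteration term matters most, which matches the motivation for fast-converging codes.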
D. Modulation Tradeoffs
1) Bit Resolution: High order modulation schemes (e.g., 32
or 64 QAM) are required to increase the capacity of limited
bandwidth fiber links. This improvement of the fiber capacity
is achieved at the expense of an increase in the required OSNR. As a
result of the implementation penalty caused by several imper-
fections in practical transceivers, extra OSNR must be added to
achieve the expected performance. The limited effective num-
ber of bits (ENOB) of ADC/DAC, laser linewidth, and the finite
resolution of DSP algorithms are some of the most important
limitations in practical VLSI implementation (e.g., see [57] and
references therein). In particular, the design of high speed and
low power ADCs with moderate resolution (e.g., ENOB ∼ 6–7
bit) represents an important challenge for the telecommunica-
tion industry. Faster than Nyquist signaling is being considered
as an alternative solution to increase the spectral efficiency. This
improvement of capacity may be achieved without increasing
the modulation order or the sampling rate, thus the requirements
for the ADC may be relaxed. Although the complexity of the
detection algorithm grows as a result of the controlled ISI introduced
in FTN, it might be managed by using known high-speed
DSP techniques. Therefore, FTN becomes interesting as an al-
ternative to high order modulation based on orthogonal pulses.
2) Bit Mapping: Typically, the constellation points and the
bit mapping are designed independently in separate stages.
Thus, most of the theory related to the design of constellations
focuses on the asymptotic symbol error rate (SER) instead
of the BER as a criterion to optimize the design [58]. Once
the constellation is designed, bit mapping is achieved by
using the minimum BER criterion. Better performance can
be achieved if the constellation points and the bit mapping
are jointly designed [59]. This optimization must be done for
each operation point of interest, e.g., at the OSNR threshold
of each FEC. Furthermore, the overall performance can be
improved if the constellation points, bit mapping, FEC, and
interleaving are jointly designed. Towards this end, extrinsic
information transfer (EXIT) charts can be used [60]. For coded
modulation with a given number of iterations the modulation
and the decoder should be jointly designed to maximize the
mutual information in the EXIT chart.⁵ On the other hand, for
hard-decision FECs (e.g., staircase code [39]), the constellation
can be designed by using the distance spectrum [59].
3) Coded Modulation: Coded modulation is the natural evo-
lution of classical constellations to higher dimensions and im-
proved distance properties [61]. It is based on the introduction
of interdependencies among sequences of signal points such
that not all sequences are possible. As a result, the minimum
distance $d_{\min}$ between two possible sequences can be greater
than the minimum distance $d_0$ in 2D space. This results in a CG
of $10 \log_{10}(d_{\min}^2/d_0^2)$. In the short term, only low complexity
coded-modulation (LC-CM) schemes will be feasible for prac-
tical implementation in high-speed optical transceivers. LC-CM
combines a simple FEC code with the modulation. To achieve
the NCG required in optical networks (e.g., NCG ≳ 11 dB),
LC-CM must be serially concatenated with a powerful outer
code. Tradeoffs must be made between the OHs of the inner
and the outer FEC codes. For instance, LC-CM schemes may
be able to provide an extra gain if HD-FECs are used and part
of its OH is shifted to the soft-decision domain of the LC-CM.
In the near term LC-CM schemes combined with HD-FEC may
be an alternative solution to SD-FEC with moderate gain. When
high NCG is required, however, most of the OH should be in the
powerful outer FEC, therefore the improvements achieved by
LC-CM are small. Furthermore, in some cases the CG achieved
by a LC-CM serially concatenated with a powerful outer code
will be lower than the one provided by a traditional bit inter-
leaved coded modulation (BICM) with an SD-FEC and similar
OH. This can be observed from Fig. 4 where the BER versus
the SNR per bit (Eb/N0) of two LC-CM schemes proposed for
high speed optical communication are analyzed: Polarization
Switched QPSK (PS-QPSK) [62], [63] and 128 Set Partition
QAM (128-SP-QAM) [64]. Comparisons with two BICM sys-
tems are also included: BICM-1 and BICM-2. The LDPC code
used in BICM-1 is the outer code of the concatenated solu-
tion based on LC-CM (i.e., a 20% LDPC). BICM-2 employs
a LDPC code with OH similar to the one of the serially con-
catenated scheme (i.e., LC-CM+FEC code used in BICM-1).
From Fig. 4 we observe that LC-CM is able to slightly improve
the performance with respect to BICM-1. However, notice that
BICM-2 achieves a better performance than the LC-CM-based
solution. Taking into account that the implementation complexities
of LC-CM and the high OH-based BICM solutions used
in these simulations are similar, we conclude that LC-CM may
not be useful in these applications.⁶ In the long term, all the
OH of the FEC should be moved to the inner code in order to
maximize the benefits of coded modulation. Towards this end,
trellis coded-modulation (TCM) and iterative demodulation and
decoding may be adopted. In particular, iterative demodulation
and decoding seems to be one of the most promising alternatives
in terms of complexity and integration with the other DSP
blocks, as we shall discuss in the following section.
⁵Although Gray mapping is optimal for non-iterative decoding, it may not be
the best solution for iterative decoding of coded modulation schemes.
Fig. 4. Performance of BICM schemes versus LC-CM in combination with soft
decision FECs.
Fig. 5. Iterative receiver architecture: DSP and FEC integration levels.
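The asymptotic gain expression given in the Coded Modulation discussion, 10 log10 of the ratio of squared minimum distances, can be evaluated with a one-line function. The sketch below is illustrative only: the function name and the distance values are hypothetical, and the expression says nothing about the rate or energy normalization a full comparison would need.

```python
import math


def asymptotic_coding_gain_db(d_min_sq, d0_sq):
    """Gain 10 * log10(d_min^2 / d_0^2) from increased sequence distance."""
    if d_min_sq <= 0.0 or d0_sq <= 0.0:
        raise ValueError("squared distances must be positive")
    return 10.0 * math.log10(d_min_sq / d0_sq)


if __name__ == "__main__":
    # Hypothetical squared distances, normalized so that d_0^2 = 1.
    for ratio in (1.0, 2.0, 4.0):
        gain = asymptotic_coding_gain_db(ratio, 1.0)
        print(f"d_min^2 / d_0^2 = {ratio:.0f}: {gain:.2f} dB")
```

Doubling the squared minimum distance gives about 3 dB, which is small compared with the NCG targets quoted for optical networks and is consistent with the need to concatenate LC-CM with a powerful outer code.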
E. Advanced DSP Techniques
1) Iterative Receivers: Performance of coherent receivers
can be improved by combining the decoding of the soft de-
cision FEC and demapping, phase and frequency estimation,
and feedforward equalization. This can be done by introducing
iterations between the FEC decoding and the other blocks as
shown in Fig. 5. The first integration step combines modula-
tion and coding (i.e., iterative coded-modulation), as discussed
6Similar observations have been recently reported in [65] for a 256-point 4D
constellation based on the $D_4$ lattice (256-$D_4$), which does not outperform the
classical PM-16-QAM when combined with a TPC-based FEC.
Fig. 6. Post-FEC BER versus $E_b/N_0$ for DP-16-QAM and 32 GBd. Laser
linewidth = 500 kHz and frequency fluctuation of 200 MHz at 35 kHz (see [9]
for more details).
previously. The second integration step includes the CPR in the
iterative algorithm such as the JIDD algorithm proposed in [8].
Fig. 6 shows the performance of JIDD-based optical coherent
receivers in the presence of phase noise and laser frequency
fluctuations for DP 16-QAM and 32 GBd. The JIDD technique
is compared with two state of the art techniques based on an ex-
plicit CPR (ECPR). The first one denoted ECPR-1 is based on
the BPS algorithm combined with differential modulation [19],
and the second one denoted ECPR-2 is based on an interpolation
filter aided by pilot symbols followed by a BPS stage [66]. It
can be seen that joint iterative decoding and detection greatly
outperforms these alternative techniques and continues to work
when the others break down. This superiority of the JIDD is still
observed even without frequency fluctuations [9].
From the above, it is expected that further integration of
other blocks will improve the performance of the transceiver.
However, as a result of the prohibitively high implementation
complexity of an iterative super-receiver, the integration pro-
cess is best carried out through local integrations, such as it-
erative coded-modulation [67], [68], joint state of polarization
and carrier phase compensation [69], and stochastic digital-
backpropagation [70].
2) Multiple Carrier Receiver (Superchannels): A super-
channel system combines multiple coherent optical carriers in
order to create a unified channel of a higher data rate (e.g., see
[1] and references therein). In a superchannel system, all opti-
cal signals are modulated and multiplexed together at the same
site, transmitted and routed together over the optical link, and
received at a common site. Superchannels increase the spectral
efficiency (i.e., the channel gap is smaller than in DWDM) and
the operational scalability. An interesting feature of a super-
channel system is the possibility to exploit at the receiver the
information of its constituents to implement joint processing
[1]. This can be used to simplify some DSP blocks (e.g., car-
rier frequency recovery) or mitigate certain impairments such
as crosstalk between adjacent signals. Taking into account the
high interest of the telecommunication industry in sliceable
transceivers [7], it is expected that practical joint processing
architectures will be required in the medium term.
Fig. 7. Channel capacity versus OSNR.
Although superchannels have all the advantages mentioned
above, they also require a higher optical component count and
increase the total power dissipation, since multiple modula-
tor/demodulator, TIA, and modulator driver sets are required.
Other tradeoffs of superchannels are discussed in [71]. One of
the solutions to overcome the limitations of superchannels men-
tioned above is the use of integrated photonics [1]. There is a
complementary trend in the industry to move towards higher
data rate transmission on a single wavelength [72]. This can
be accomplished by increasing spectral efficiency (for example
using higher order modulations [73] and/or faster than Nyquist
signaling [32],[74]), but this in general comes at the cost of
reduced reach. Higher symbol rate transmission with moder-
ate spectral efficiency on a single wavelength is one of the
techniques considered for next generation long haul coherent
transceivers at data rates of 400 Gb/s to 1Tb/s [75]. Of course,
higher symbol rate transmission and high spectral efficiency
techniques can be used in combination with superchannels to
achieve even higher capacity [76].
III. PERFORMANCE-COMPLEXITY TRADEOFFS
A. Fundamental Limits and Implementation Penalty
It is useful to look at some of the fundamental limits of the
performance of the transmission system. In Fig. 7 we plot the
channel capacity in Gb/s as a function of the OSNR for three
channel bandwidths commonly used in optical communications:
100, 50, and 37.5 GHz. We assume DP, AWGN, and bandwidth
equal to the channel spacing in the DWDM grid. Thus, the
classical formula for the channel capacity in Gaussian channels
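can be expressed in terms of the OSNR over 0.1 nm as
$$C = 2 B_w \log_2\left(1 + \frac{B_0}{B_w}\,\mathrm{OSNR}\right), \qquad (2)$$
where $B_w$ is the channel width, OSNR is expressed in linear units, and $B_0 = 12.5$ GHz (0.1 nm).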
One very important measure of the receiver performance of-
ten used by customers to compare different receivers is Back-to-
TABLE III
EXAMPLES OF B2B OSNR
Data Rate [Gb/s]   Bandwidth [GHz]   B2B OSNR at Capacity [dB]   Commercial Transceivers [dB]
100                37.5              7                           10.5
200                50                11                          17.5
400                100               13                          N/A
Fig. 8. BER versus OSNR for QPSK and 16-QAM. Symbol rate: 32 GBd and
FEC threshold: $2.4 \times 10^{-2}$.
Back (B2B) OSNR. This measure ignores penalties introduced
by the channel and it is dominated by the CG of the FEC and
(to a lesser extent) by the implementation penalties. It is inter-
esting to find its fundamental limits based on channel capacity.
These limits are independent of any particular modulation and
coding scheme. They are useful to compare versus specific im-
plementations. Table III compares typical values of OSNR in
B2B conditions for state of the art commercial transceivers ver-
sus the fundamental limit imposed by the channel capacity given
by eq. (2) for various data rates and channel bandwidths.
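The capacity-based limit can be evaluated numerically. The following is a minimal sketch, assuming that the DP AWGN capacity takes the form C = 2 Bw log2(1 + OSNR B0/Bw) with OSNR in linear units; the function names are illustrative and do not come from the paper.

```python
import math

B0_HZ = 12.5e9  # reference noise bandwidth (0.1 nm), as stated in the text


def capacity_bps(osnr_db, bw_hz):
    """DP AWGN capacity, assumed form C = 2*Bw*log2(1 + OSNR*B0/Bw)."""
    osnr = 10 ** (osnr_db / 10)
    return 2 * bw_hz * math.log2(1 + osnr * B0_HZ / bw_hz)


def b2b_osnr_at_capacity_db(rate_bps, bw_hz):
    """Invert the capacity expression: lowest OSNR (dB) that supports rate_bps."""
    snr = 2 ** (rate_bps / (2 * bw_hz)) - 1   # required SNR per polarization
    return 10 * math.log10(snr * bw_hz / B0_HZ)


if __name__ == "__main__":
    # Data rate / bandwidth pairs taken from Table III.
    for rate, bw in [(100e9, 37.5e9), (200e9, 50e9), (400e9, 100e9)]:
        limit = b2b_osnr_at_capacity_db(rate, bw)
        print(f"{rate / 1e9:.0f} Gb/s in {bw / 1e9:g} GHz: {limit:.1f} dB")
```

Since the expression is inverted in closed form, the sketch can be used to explore other data rate and bandwidth combinations under the same assumptions.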
Unlike in the previous analysis where the ideal B2B OSNR
was based on the channel capacity and not on any particular
modulation and coding scheme, next we assume specific modulation
schemes (QPSK and 16-QAM), a specific net coding
gain (NCG = 11.3 dB at BER = $10^{-15}$ with 20% overhead), and
a specific symbol rate (32 GBd), resulting in a BER threshold
for the FEC of $2.4 \times 10^{-2}$ (i.e., CG = 12 dB). From Fig. 8 we
verify that the ideal B2B OSNR in these conditions is 10 dB
for QPSK and about 16 dB for 16-QAM. Comparing the actual
B2B OSNR of Table III versus these limits gives an indication
of the implementation penalty.
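The OSNR at which an ideal receiver crosses the FEC threshold can be estimated as follows. This is a sketch under stated assumptions: textbook AWGN BER expressions for Gray-coded QPSK and 16-QAM (the latter a nearest-neighbour approximation), and the DP relation OSNR = SNR · fB / B0. None of these helper names appear in the paper.

```python
import math

B0_HZ = 12.5e9  # 0.1 nm reference bandwidth


def ber_qpsk(snr):
    """Gray-coded QPSK over AWGN; snr is Es/N0 per polarization (linear)."""
    return 0.5 * math.erfc(math.sqrt(snr / 2))


def ber_16qam(snr):
    """Nearest-neighbour approximation for Gray-coded 16-QAM over AWGN."""
    return 0.375 * math.erfc(math.sqrt(snr / 10))


def required_osnr_db(ber_fn, ber_target, symbol_rate_hz):
    """Bisect (on a log scale) for the SNR where ber_fn meets ber_target."""
    lo, hi = 1e-3, 1e4                    # linear SNR bracket
    for _ in range(200):
        mid = math.sqrt(lo * hi)          # geometric midpoint
        if ber_fn(mid) > ber_target:
            lo = mid                      # BER still too high: need more SNR
        else:
            hi = mid
    # Assumed dual-polarization conversion from SNR to OSNR over 0.1 nm.
    return 10 * math.log10(hi * symbol_rate_hz / B0_HZ)


if __name__ == "__main__":
    for name, fn in [("QPSK", ber_qpsk), ("16-QAM", ber_16qam)]:
        print(name, round(required_osnr_db(fn, 2.4e-2, 32e9), 2), "dB")
```

Subtracting these values from a measured B2B OSNR gives a rough estimate of the implementation penalty for that modulation format.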
B. Transceiver Complexity
Table IV shows the relative complexity of blocks in a
transceiver that works at 100 Gb/s using QPSK modulation,
or at 200 Gb/s using 16-QAM. The measure of complexity
in this table is power dissipation. One interesting conclusion
is that 16-QAM doubles the data rate at the expense of only
30% power increase. Another conclusion is that the blocks that
TABLE IV
RELATIVE COMPLEXITY OF THE TRANSCEIVER BLOCKS
Block 100 Gb/s QPSK 200 Gb/s 16-QAM
FEC Encoder 0.02 0.04
TX DSP 0.02 0.02
BCD Equalizer 0.21 0.21
FFE 0.14 0.14
Carrier Recovery 0.02 0.02
Soft Decision Comp. 0.03 0.03





consume the most power are the BCD equalizer and the LDPC
decoder, followed by the FFE and the AFE. Therefore we ad-
dress performance/complexity tradeoffs for the BCD and the
FEC blocks in some detail in the following sections.
C. Power Consumption of BCD Equalizers
Complexity of the BCD equalizer is dominated by the com-
plexity of the fast Fourier transform (FFT) and inverse FFT
(IFFT) engines. The most important measure of complexity
from a practical standpoint is power dissipation. In practice it is
found that power dissipation correlates well with the number of
complex multiply-add operations per unit time in the FFT.
BCD equalizer power estimation is based on scaling a known
design which may be in a different technology node and for a
different symbol or sampling rate, and it may use a different FFT
block size. Scaling is based mainly on the number of complex
multiply-add operations per FFT block, and it also accounts for
scaling of symbol rate and technology. A good estimate of the
BCD power is given by
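$$P_{\mathrm{BCD}} \approx K_{\mathrm{Tech}}\, P_{\mathrm{Ref}}\, f_B\, N_{\mathrm{OPS}}, \qquad (3)$$
where $K_{\mathrm{Tech}}$ is the technology scaling factor, $P_{\mathrm{Ref}}$ is the power of the
reference design, $f_B$ is the symbol rate, and $N_{\mathrm{OPS}}$ is the number of
operations per symbol, given by
$$N_{\mathrm{OPS}} = R\,\frac{2 N_{\mathrm{FFT}} + S_{\mathrm{FFT}}}{S_{\mathrm{FFT}} - M + 1}, \qquad (4)$$
where $R$ is the oversampling factor, $N_{\mathrm{FFT}}$ is the number of
operations per FFT block, $S_{\mathrm{FFT}}$ is the FFT block size, and $M$ is
the length (in samples) of the CD impulse response. The latter
parameter can be roughly approximated by
$$M = R f_B \nu_m C_d, \qquad (5)$$
where $C_d$ is the CD (e.g., in ns/nm) and $\nu_m$ is the modulated
signal bandwidth expressed in wavelength units,
$$\nu_m = (1 + \delta)\, f_B\, \frac{\lambda^2}{c}, \qquad (6)$$
with $\delta$, $\lambda$, and $c$ being the roll-off factor of the filter, the wavelength
(e.g., 1550 nm), and the speed of light, respectively.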
Equation (4) takes into account the operations required by
both FFT and IFFT blocks (2NFFT), as well as the complex
Fig. 9. BCD power versus CD and FFT size. Assumptions: (i) excess bandwidth
20%, (ii) oversampling factor 1.6, (iii) reference design with 32 GBd symbol
rate, 55 ns/nm CD compensation capability, and 4k FFT size, and (iv) radix-2 FFT.
multiplications used for frequency filtering ($S_{\mathrm{FFT}}$). The approximate
numbers of operations per FFT block for radix-2 and radix-4 FFTs are
$N_{\mathrm{FFT}} = \frac{S_{\mathrm{FFT}}}{2}(\log_2 S_{\mathrm{FFT}} - 1)$ and
$N_{\mathrm{FFT}} = \frac{3 S_{\mathrm{FFT}}}{4}(\log_4 S_{\mathrm{FFT}} - 1)$, respectively. There are theoretical
bounds for the minimum number of operations in the
FFT, such as the Winograd or Heideman bounds [77]. A particular
algorithm can be compared to these bounds to evaluate
its performance. Many algorithms come close to the bounds, so
there is little room for optimization here.
Fig. 9 shows the normalized power dissipation of the BCD
equalizer ($\bar{P}_{\mathrm{BCD}} = P_{\mathrm{BCD}}/P_{\mathrm{BCD,Ref}}$) as a function of CD compensation
capability, FFT block size, and symbol rate. We use
$\delta = 0.20$, $R = 1.6$, and radix-2 FFT. The reference design
($P_{\mathrm{BCD,Ref}}$) considers a receiver with 32 GBd symbol rate, 55
ns/nm CD compensation capability, and 4k FFT block size. The
same fabrication technology is assumed for all the cases (i.e.,
$K_{\mathrm{Tech}}$ is constant). It can be seen that large FFT block sizes
are in general more efficient than small blocks, except in the re-
gion of very low CD compensation. For large block size, power
dissipation increases modestly as a function of increasing CD.
However, small block sizes exhibit a dramatic increase as a re-
sult of the loss of efficiency in the computation because of the
large size of the overlap block compared to the FFT block. Also,
symbol rate has a dramatic effect on the power dissipation.
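The trends described above can be reproduced with a short script. This is a sketch only: it assumes an overlap-save structure in which each FFT block yields S_FFT − M + 1 useful samples, takes the modulated bandwidth in wavelength units as (1 + δ) fB λ²/c, and uses the radix-2 operation count; all function names are hypothetical.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light, m/s


def cd_impulse_length(f_b, cd_ns_per_nm, r=1.6, delta=0.2, wavelength=1550e-9):
    """Length M (in samples) of the CD impulse response (assumed form)."""
    nu_m = (1 + delta) * f_b * wavelength ** 2 / C_LIGHT  # bandwidth in metres
    cd_s_per_m = cd_ns_per_nm                             # 1 ns/nm equals 1 s/m
    return math.ceil(r * f_b * nu_m * cd_s_per_m)


def fft_ops_radix2(s_fft):
    """Approximate complex operations per radix-2 FFT block."""
    return s_fft / 2 * (math.log2(s_fft) - 1)


def ops_per_symbol(s_fft, m, r=1.6):
    """FFT + IFFT + frequency-domain filter, per symbol (overlap-save)."""
    useful = s_fft - m + 1          # new output samples per block
    if useful <= 0:
        return math.inf             # block too small for this amount of CD
    return r * (2 * fft_ops_radix2(s_fft) + s_fft) / useful


def normalized_bcd_power(s_fft, cd_ns_per_nm, f_b, ref=(4096, 55.0, 32e9)):
    """Power relative to the reference design, same technology assumed."""
    def raw(s, cd, f):
        return f * ops_per_symbol(s, cd_impulse_length(f, cd))
    return raw(s_fft, cd_ns_per_nm, f_b) / raw(*ref)


if __name__ == "__main__":
    for s in (1024, 4096, 16384):
        print(s, [round(normalized_bcd_power(s, cd, 32e9), 2) for cd in (5, 25, 55)])
```

With these assumptions, a 1k block is slightly cheaper than a 4k block at very low CD but becomes several times more expensive near 55 ns/nm, because the overlap consumes most of each block.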
D. Complexity of FEC Codes
Soft decision coding schemes such as LDPC codes or TPCs
with large codeword length (e.g., $10^4$) are mandatory to
achieve near Shannon limit performance. Parallel architectures
are required for multigigabit transceivers.7 Parallel architectures
for LDPC and TPC usually suffer from high layout complexity
as a result of the complex interconnection patterns inherent in
their iterative decoders. This problem is exacerbated by code
rate programmability, required to provide flexibility to SDONs.
7Serial digital algorithms operating at clock frequencies of hundreds of GHz
are not possible with current CMOS technology (e.g., 16 nm).
Fig. 10. FEC complexity versus OH and NCG normalized to the FEC reported
in [42], [78], [79] with $N \approx 2 \cdot 10^4$ and average parity check matrix column
weight 4. Reference (point): NCG = 11.3 dB at BER = $10^{-15}$ with 20%
OH.
To mitigate this problem, a well structured parity check matrix
is needed. Although current commercial products show high
performance and good flexibility [78], further work
will be necessary to efficiently implement future generations of
FECs with high gains (e.g., NCG of 12 dB) and variable code
rates. Given the crucial role of FECs with near Shannon limit
performance for development of SDON, some practical related
aspects of FEC implementation are discussed in the following.
1) Complexity versus FEC Overhead: As observed in Fig. 3,
NCGs of current FEC codes closely approach the Shannon ca-
pacity. Unfortunately, for a given OH, an important increase
in complexity is needed to further increase the NCG by just a
fraction of a dB. This can be inferred from Fig. 10, where an
estimated normalized complexity of FEC is depicted as a func-
tion of the OH and NCG8. For example, NCG can be increased
from 11.3 to 11.9 dB with 20% OH at the expense of an increase
of three times (∼3×) in complexity. Notice that the same NCG
improvement can be achieved by a FEC with ∼42% OH and
lower (∼ 0.8×) complexity. Therefore, industry is considering
the use of higher OHs as a practical approach to (i) increase the
NCG and (ii) reduce the impact on the FEC complexity. It is
important to realize that the higher OH requires an increase of
8The NCG and the complexity of an LDPC code for fixed codeword length
$N$ and average parity check matrix column weight $\bar{w}_c$ can be approximated
by functions $\Phi(R, i, b)$ and $O(\nu, R, i, b)$, respectively, where $\nu$ is the uncoded
bit-rate, $R$ is the code-rate, $i$ is the number of iterations and $b$ is the number
of bits used in the internal fixed point resolution of the decoder. Therefore,
the NCG as a function of $R$, $O$ and $\nu$, denoted $\Psi(R, O, \nu)$, can be computed
as $\Psi(R, O, \nu) \approx \max_{i,b} \Phi(R, i, b)$ subject to $O(\nu, R, i, b) = O$. However,
the latter optimization is not straightforward due to the inherent complexity of
$O(\cdot)$ and $\Phi(\cdot)$. We suggest an approximation based on the observation that
SD-FEC complexity is dominated by the complexity of the decoder, which is
approximately proportional to $i$, $b$ and the coded bit-rate $\nu/R$. This simplifies
the complexity constraint to $O R i' b' \nu' = O' R' i b \nu$, where $O'$ is a reference
complexity for a fixed set of parameters $\nu'$, $i'$, $b'$ and $R'$. In Fig. 10 we used
$\nu = \nu' = 100$ Gb/s, $\bar{w}_c = 4$, $N \approx 2 \cdot 10^4$ and $\Phi(R, i, b)$ estimated by computer
simulation based on the LDPC code described in [79]. A similar tradeoff is
expected for alternative architectures such as those described in [80].
the symbol rate or the modulation order. Therefore the impact
of a high OH FEC solution on the implementation complexity
of other critical blocks such as ADCs, DACs, DSP blocks, etc.,
must be included in the analysis.
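The proportionality used for the complexity estimate (decoder complexity roughly proportional to the number of iterations, the internal resolution, and the coded bit-rate) can be sketched as below. The reference values of 10 iterations and 5 bits are hypothetical placeholders, not figures from the paper, and the code rate is derived from the overhead as R = 1/(1 + OH).

```python
def code_rate(overhead):
    """Code rate R for a fractional overhead OH = (n - k) / k."""
    return 1.0 / (1.0 + overhead)


def relative_complexity(overhead, iterations, bits, bit_rate,
                        ref=(0.20, 10, 5, 100e9)):
    """Ratio O/O' assuming O is proportional to i * b * nu / R.

    ref = (overhead, iterations, bits, uncoded bit-rate) of the reference
    decoder; the iteration count and bit width here are assumed values.
    """
    oh0, i0, b0, nu0 = ref
    num = iterations * bits * bit_rate / code_rate(overhead)
    den = i0 * b0 * nu0 / code_rate(oh0)
    return num / den


if __name__ == "__main__":
    # Same decoder settings, higher overhead: only the coded bit-rate grows.
    print(relative_complexity(0.42, 10, 5, 100e9))
    # Same overhead, twice the iterations: complexity doubles.
    print(relative_complexity(0.20, 20, 5, 100e9))
```

The sketch only captures the constraint side of the optimization; the NCG achieved for each (R, i, b) still has to come from simulation.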
2) Structured Parity Check Matrix: A well structured parity
check matrix can be used to mitigate the problem caused by the
high interconnection complexity of parallel implementations of
LDPC and TPC [80]. For instance, regular column-partition
(RCP) [81], [82] combined with quasi-cyclic (QC) constraint
was proposed in [42] for LDPC codes. Shift LDPC code also
provides an efficient hardware oriented parity check matrix con-
straint [83]. We highlight that owing to the natural product code
constraint, interconnection complexity in TPC is simpler than
in most LDPC codes [84]–[86].
3) Variable Code Rates: The interconnection problem is
exacerbated in rate-programmable FECs. A careful design is
required to avoid (or minimize) further increase of area and
power dissipation. The use of independent FECs allows archi-
tectures highly optimized for a fixed set of parameters to be
designed. This approach avoids the power penalty by turning
off the FEC blocks that are not used. However, its disadvantage
is the high area that increases the cost of the chip. Therefore,
an important effort is being directed towards designing pro-
grammable FEC architectures with low power and area penal-
ties. A rate-adaptable architecture can be dynamically recon-
figured to change the code-rate by varying the code-dimension
or the code-length. To reduce complexity, code-rate variation is
done by preserving the underlying structure of the parity check
matrix, for example by adding or removing cyclic submatrices
in a QC-LDPC code [79]. Thus, a well structured parity check
matrix not only reduces the interconnection complexity but also
provides a direct way to implement rate-adaptability. In this re-
gard, LDPC codes have been shown to provide more flexibility
than TPC codes [86].
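Rate adaptation by removing cyclic submatrices can be illustrated with a toy quasi-cyclic construction. This is a minimal sketch, not the code of [79]: the base matrix of shift values is arbitrary, and the helper names are illustrative.

```python
def circulant(z, shift):
    """z x z identity matrix cyclically shifted by `shift` columns."""
    return [[1 if (r + shift) % z == c else 0 for c in range(z)]
            for r in range(z)]


def expand(base, z):
    """Expand a base matrix of shift values into a binary QC parity check matrix."""
    H = []
    for base_row in base:
        blocks = [circulant(z, s) for s in base_row]
        for r in range(z):
            H.append([bit for blk in blocks for bit in blk[r]])
    return H


def drop_block_columns(base, count):
    """Remove the first `count` block columns (shortens the information part)."""
    return [row[count:] for row in base]


def design_rate(base):
    """Design rate 1 - mb/nb, assuming a full-rank parity check matrix."""
    return 1 - len(base) / len(base[0])


if __name__ == "__main__":
    base = [[0, 1, 2, 3, 0, 1],      # arbitrary example shifts
            [1, 3, 0, 2, 1, 0]]
    print(design_rate(base))                           # higher-rate code
    print(design_rate(drop_block_columns(base, 2)))    # lower-rate code
```

Because every remaining block is still a circulant, the same shift-based interconnect serves both rates, which is the property that makes this form of rate adaptation attractive in hardware.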
4) Early Termination: The CG of SD-FEC increases as the
number of iterations, Ni, increases. On the other hand, latency
and power consumption grow with Ni. The number of iterations
must be selected to provide a good tradeoff between complexity
and performance (e.g., 5 to 15 iterations). For a given performance,
the optimal (minimum) value of Ni depends on the
channel conditions. The value of Ni required to achieve a cer-
tain performance reduces as OSNR increases. Furthermore, the
number of iterations may vary at a certain OSNR due to different
noise realizations. Then, Ni should be dynamically adjusted to
minimize power consumption and increase the FEC throughput.
Several techniques to stop the decoding process have been reported
in the literature [87], [88]. Typically, these schemes
are built upon two criteria:
C1. Stop when the decoder has converged or a maximum number
of iterations has been reached;
C2. Stop when the syndrome is zero or a maximum number of
iterations has been reached.
Criterion C1 avoids iterations once the soft-decision outputs
are stable. This fact minimizes the probability of high values of
Ni . When the errors in the received codeword can be corrected
by the FEC, criterion C2 minimizes the number of iterations. On
the other hand, when the errors in the received codeword cannot
be corrected, the maximum value of Ni will be reached in C2-
based decoders. Therefore, a combination of C1 and C2 seems
to be the best approach to minimize the number of iterations and
power consumption.
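A combined C1/C2 stopping rule can be sketched on top of a small min-sum decoder. The decoder below is a generic flooding min-sum implementation chosen for illustration, not one of the schemes in [87], [88]; convergence (C1) is approximated by unchanged hard decisions over `stable_iters` consecutive iterations, which is an assumption of this sketch.

```python
def syndrome_is_zero(H, bits):
    """True when every parity check is satisfied."""
    return all(sum(h * b for h, b in zip(row, bits)) % 2 == 0 for row in H)


def min_sum_decode(H, llr, max_iter=15, stable_iters=3):
    """Flooding min-sum decoding with early termination.

    Positive LLR means bit 0. Every check is assumed to involve at least
    two variables. Returns (hard_bits, iterations_used, stop_reason).
    """
    checks = [[j for j, h in enumerate(row) if h] for row in H]
    c2v = [{j: 0.0 for j in vs} for vs in checks]   # check-to-variable messages
    hard = [1 if l < 0 else 0 for l in llr]
    prev_hard, stable = None, 0
    for it in range(1, max_iter + 1):
        total = list(llr)
        for msgs in c2v:
            for j, m in msgs.items():
                total[j] += m
        for i, vs in enumerate(checks):
            v2c = {j: total[j] - c2v[i][j] for j in vs}   # extrinsic inputs
            for j in vs:
                others = [v2c[k] for k in vs if k != j]
                sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
                c2v[i][j] = sign * min(abs(v) for v in others)
        post = list(llr)
        for msgs in c2v:
            for j, m in msgs.items():
                post[j] += m
        hard = [1 if p < 0 else 0 for p in post]
        if syndrome_is_zero(H, hard):              # criterion C2
            return hard, it, "syndrome"
        stable = stable + 1 if hard == prev_hard else 0
        if stable >= stable_iters:                 # criterion C1 (proxy)
            return hard, it, "converged"
        prev_hard = hard
    return hard, max_iter, "max_iter"


if __name__ == "__main__":
    H = [[1, 0, 1, 0, 1, 0, 1],      # (7,4) Hamming parity check matrix
         [0, 1, 1, 0, 0, 1, 1],
         [0, 0, 0, 1, 1, 1, 1]]
    print(min_sum_decode(H, [2.0, 2.0, -1.0, 2.0, 2.0, 2.0, 2.0]))
```

In this arrangement C2 ends decoding as soon as a valid codeword is found, while C1 bounds the iteration count for words the decoder cannot correct.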
5) Quantization: Fixed-point implementation of SD decoders
plays an important role in both the performance and
the complexity of FEC. The number of bits must be selected to
minimize the effects of quantization and saturation. Typically,
4- or 5-bit uniform quantization (UQ) is adopted in practical
VLSI implementations. Error floors caused by quantization effects
with UQ [42] may be efficiently mitigated (or avoided)
by using non-uniform quantization (NUQ) [89]. However, this
improvement can be achieved at the expense of an increase of
complexity of the arithmetic operations. Therefore, a careful
analysis of UQ and NUQ must be carried out to assess the best
tradeoff between complexity (e.g., arithmetic operations) and
performance (e.g., number of bits).
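A uniform quantizer with saturation, as commonly used for decoder inputs, can be sketched as follows. The step size and the symmetric mid-tread level range are assumptions of this example; the paper does not specify them.

```python
def quantize_uniform(llr, bits, step):
    """Uniform mid-tread quantizer with symmetric saturation.

    Returns an integer level in [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    Python's round() uses round-half-to-even for exact ties.
    """
    max_level = 2 ** (bits - 1) - 1
    level = round(llr / step)
    return max(-max_level, min(max_level, level))


def dequantize(level, step):
    """Map an integer level back to its reconstruction value."""
    return level * step


if __name__ == "__main__":
    for bits in (4, 5):
        levels = [quantize_uniform(x, bits, 0.5) for x in (-9.0, -0.3, 0.3, 2.6, 9.0)]
        print(bits, levels)
```

The step size trades resolution near zero against saturation of large-magnitude values, which is the balance referred to above.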
IV. PRACTICAL IMPLEMENTATION TOPICS
This section discusses low level ASIC implementation as-
pects with significant impact on the complexity and perfor-
mance. These aspects are: (i) Floor planning, (ii) Placement,
(iii) Clock tree synthesis (CTS), (iv) Routing, (v) Timing sig-
noff, (vi) Power dissipation, (vii) Power and SI, and (viii)
Design for testability (DFT).
A. Floor Planning
Current generation coherent optical transceivers typically in-
tegrate in a single monolithic device functionality such as that
exemplified in Figs. 1 and 2 and features such as those listed in
Table II. Complexity may be in the 350 million gates range and
beyond. Partitioning the design into blocks of manageable size
is fundamental. There is a tradeoff between the computation time
required by the electronic design automation (EDA) tools
to process each block and the integration time. As the number of blocks
decreases so does the time and effort required for their integra-
tion. However, a reduced number of blocks requires managing
large block sizes increasing the processing time for each individ-
ual block. The optimal size is in the range of 9 to 13.5 million
gates. This is particularly important in DSP and FEC blocks,
which perform a large amount of computation. This often leads
to large data buses and complex interconnections. These must
be managed beginning with the floor planning stage by adding
physical restrictions (e.g., block shape, block area and port
location). A square shape is commonly used as a starting point
but sometimes it is not optimum for achieving the lowest power
dissipation. The aspect ratio and the location of input/output
(I/O) ports may have a significant impact on the placement effort
and the final power dissipation. Fig. 11 shows the top-level par-
titioning of a typical coherent optical transceiver as described
in Figs. 1 and 2. Design must be partitioned in such a way as
to minimize the number of physical ports per block. Some DSP
algorithms such as BCD or FEC require some amount of local
data storage. It is important to analyze the tradeoff between us-
ing random access memories (RAMs) and flip flops. Usually a
large number of flip flops will increase the power dissipation
Fig. 11. Example of a top level partitioning for a typical optical coherent
transceiver.
of the clock tree. It can also increase the complexity of the
scan chains9 and require extra buffers for hold time fixes during
static timing analysis (STA). On the other hand, RAMs generate
physical restrictions to the place and route engines. This is exac-
erbated if memory compilers do not have the right combination
of address and word length needed, restricting even more the de-
sign space.10 Advanced compilers support features targeting low
power design such as power down, sleep mode, redundancy and
transistors with high voltage threshold (Vt) for peripheral logic,
etc. If memories are in the data path (as required in some regular
structures such as FFT and IFFT), they should be pre-placed at
the floor planning stage in a fixed location following the data
flow, thus reducing long interconnections and avoiding criss-
crossing. This task may help reduce the parasitic capacitance of
the wires and minimize the switching power component. Since
not all ranges of frequencies are supported for the same RAM
configuration, the memory may restrict the clock frequency of
the logic computation surrounding it. This restriction can be
avoided by using, for example, dual clock domains or even dual
voltage domains.
B. Placement
At this stage the physical location of all components is de-
fined. They remain essentially fixed during the rest of the physi-
cal design flow. The quality of the final results depends critically
on achieving good placement. Typically, a significant increase
in the number of cells compared to the post synthesis netlist oc-
curs in this step. Buffer trees are built to distribute loads on nets
with high fanout. It is possible to create hard or soft regions to
guide placement. Moreover, it is possible to block certain areas.
Placement can be driven to minimize congestion or to focus on
9Scan chains are sequential elements connected back to back to shift-in and
shift-out test data. The goal of the scan chains is to make each node in the circuit
controllable and observable.
10This is further exacerbated by the insertion of hardware to test defects on
the memory blocks during the manufacturing process.
Fig. 12. Placement of a complex DSP block. (a) Flat placement (3.82 W). Fly
lines show the input to output path through each group of taps from a filter. (b)
Stratified placement (2.55 W). I/O ports are placed at the left and right, and
additional placement guide regions are used to guide the data path.
timing. Manual placement (tiling) is used in regular data paths
or in areas with large cell density. Initial placement attempts to
achieve a homogeneous distribution and an area utilization11 of
about 60%. A certain percentage of spare cells is added during
the placement. These cells can be used for small functional
corrections at the very last stage of the design, before tape-out.
These corrections are made by modifying interconnections
based on metal layers only. If a new tape-out is needed to fix
the problem, this technique may help reduce the manufacturing
costs as only a few masks will change, without affecting the
base layer masks. Fig. 12 shows two different floor plans of a
DSP block. In Fig. 12(a) the tool automatically managed the
location of the logic for each sub-block (flat placement) while
in Fig. 12(b) the tool was restricted to follow a certain pattern
in the location of logic and I/Os (stratified placement) based on
the designer’s knowledge of the functionality of the block. Dif-
ferent colors represent standard-cell density for each sub-block,
while cyan regions show part of the wires connected to I/O ports
with internal logic. In the flat placement, I/O ports are located
at the top and bottom, while in the stratified placement the I/O
ports are located on the left and right sides to minimize congestion and
wire length. Different regions in the stratified placement indi-
cate soft constraints used to guide the placement tools (there
are four regions and each one comprises two sub-blocks). The
total power dissipations for the flat and the stratified placements
were 3.82 and 2.55 W, respectively. This example shows that a
human-assisted placement design is preferred for complex DSP
blocks.
High power consumption of the BCD equalizer is mainly
caused by the complex multiply-add operations required in
11Area utilization is the ratio between standard cells+macros area and the
effective placement area.
FFT/IFFT blocks. Custom multipliers in combination with a
carefully selected subset of standard cells would enable tiling
them manually in a very regular structure during placement, thus
reducing the length of the interconnections. Placement must be
guided in a different way to manage the complex interconnec-
tion of high-performance FEC parallel architectures. Clustering
the processing unit elements and minimizing the drive strength
of the buffers used in far interconnections between clusters are
some of the options available to improve the placement of com-
plex interconnection blocks such as iterative LDPC decoders.
C. Clock Tree Synthesis
Clock trees are one of the main sources of dynamic power
usage. This makes CTS a critical power optimization step. Tra-
ditional automatic CTS design algorithms often work against
the goal of power reduction by introducing more buffering than
necessary. This is exacerbated by (i) uncertainties in IC man-
ufacturing and (ii) number of operating modes and power do-
mains. A human assisted CTS design taking into account the
implementation architecture of the involved blocks is needed
to optimize power. Towards this end, CTS design must pur-
sue strategies that lower the overall capacitance and minimize
switching activity. Furthermore, CTS optimization should ide-
ally trace back to the system level architecture design in order
to improve the overall tradeoff between chip performance and
complexity.
D. Routing
In blocks with intensive routing, such as SD-FEC, it is con-
venient to use the maximum stack of metals offered by the
technology. Upper metal layers are used to build the power grid,
medium layers for long interconnections, and low metal layers
are preferred mostly for local interconnections. It is recom-
mended to use via redundancy for reliability. Non default rules
such as double width and double or triple space rules in clock
nets may help minimize signal integrity (SI) effects such as
cross coupling among clock nets and signal routing. Default
rules represent a good starting point for signal nets. It is impor-
tant to control the cell pin density in early stages to improve the
pin accessibility and routability of the design. Over constrained
design rules such as maximum transition or capacitance may
result in high power dissipation.
E. Timing Signoff
The STA flow is set based on guidelines provided by the
foundry and experience in taping out at the same technology
node. The full chip is analyzed under process, voltage and tem-
perature (PVT) variations for each operation mode. The quality
of results is measured in terms of the number of cases and magnitude
of: (i) setup and hold timing violations, (ii) signal and clock
transitions, (iii) minimum pulse width and period at clock pins
of sequential elements and (iv) glitch or noise immunity. SI
is verified in STA taking into account both crosstalk coupling
noise and noise glitch effect. The former impacts delay neg-
atively and the latter can upset adjacent logic circuitry. Noise
effects are exacerbated at low voltage supplies; therefore, effects
such as dynamic voltage drop (DVD) must be considered in
STA. In multimillion instance designs the voltage drop back an-
notation flow must be integrated in STA. This approach reduces
risks associated with the above mentioned phenomena, which
are difficult to predict accurately due to their dependence on
system activity.
F. Power Dissipation
In CMOS technology the short circuit current and the current
flow in interconnections determine dynamic power dissipation.
Static power is dissipated even in stable states by the leakage
currents of the transistors. Leakage exists unless the power supply
is completely turned off. It can be reduced by using high Vt cells
as much as possible when large portions of the design are idle.
Otherwise using large amounts of high Vt cells may increase
dynamic power due to the short circuit current in CMOS devices.
It is also possible to automatically downsize the cells or swap Vt
types if timing is not degraded. In the same way, it is possible to
remove hold buffers if there is enough hold margin. On the other
hand, dynamic power depends on how much of the circuit
is switching at the same time. In data path intensive algorithms
such as BCD, FFE, and CPR, the entire circuit switches almost
every clock cycle; therefore, finding the lowest operating frequency
is important. In designs where processing data cycle to cycle
is not required, clock gating techniques are extensively used to
reduce the switching activity on the clock path. Alternatively,
the sharing operator technique is commonly used when there
are several clock cycles available to process the incoming data.
This allows arithmetic processing elements such as multipliers
and adders to be reused. From the technology point of view,
choosing advanced technologies such as FinFET transistors is a
good option to reduce power in comparison to planar transistors.
From the point of view of the algorithms some tradeoffs need
to be analyzed. For instance, reducing hardware by increasing
frequency does not necessarily result in power savings.
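The last point can be illustrated with the classic switching-power model P = a C V² f, which covers only the capacitive switching component of dynamic power. All numbers below (capacitances, voltages, and the 10% multiplexing overhead) are hypothetical values chosen for illustration.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """Switching power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Parallel design: two processing units at 500 MHz (hypothetical values).
    parallel = dynamic_power(0.2, 2.0e-9, 0.8, 500e6)

    # Shared design at the same voltage: half the hardware at twice the clock.
    shared_ideal = dynamic_power(0.2, 1.0e-9, 0.8, 1e9)

    # Shared design with assumed 10% multiplexing overhead and a higher
    # supply voltage needed to close timing at 1 GHz.
    shared_real = dynamic_power(0.2, 1.1e-9, 0.9, 1e9)

    print(parallel, shared_ideal, shared_real)
```

Under this model, halving the hardware while doubling the clock leaves switching power unchanged at best, and any overhead or voltage increase needed at the higher frequency makes the shared design more power hungry.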
To meet power specifications it is necessary to have an early
estimation of power consumption. This allows designers to re-
act during the algorithm and architecture design phase. With
modern EDA tools power can be estimated at register transfer
level (RTL). These tools use the information from the technol-
ogy libraries plus generic parasitic models extracted from a real
layout in combination with the toggling information provided
by the RTL simulation. The fast estimation of this approach al-
lows different architectures to be studied and refinements to be
made. Iteration time is low as the layout flow is skipped in this
loop. The accuracy of these power numbers is good enough to
make decisions and select among different architectures.
Final power estimations are performed post layout, having
at this point the physical information of the routing. The accuracy
of the results is set by the quality of the inputs fed into
the power calculation signoff tool. The inputs are: (i) standard
parasitic exchange format (SPEF) files, (ii) timing libraries,
and (iii) value change dump (VCD) files. SPEF files provide
parasitic information with all the capacitances extracted using
signoff parasitic extraction tools. Timing libraries include power
information and they are selected based on the signoff criteria
determined by the application. VCD files provide the toggle in-
formation needed to analyze the dynamic power components.
The VCD is typically dumped from gate level simulations un-
der real conditions. These simulations annotate the cell and wire
delays extracted from STA signoff tools. Using the real delay of
the cells and wires in the simulation is important to capture the
glitches that occur in the data path. These glitches have an important
impact on the toggle count. Omitting glitch effects in DSP
systems results in optimistic power estimations.
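The effect of glitches on the estimate can be sketched with a toggle-count power model. The following is a minimal illustration, assuming that each toggle dissipates 0.5·C·V² on its net; the function name, net names, capacitances, and toggle counts are hypothetical and are not output from any signoff tool:

```python
def dynamic_power_w(toggle_counts, net_cap_f, vdd_v, sim_time_s):
    """Average switching power from per-net toggle counts.

    Each toggle is assumed to dissipate 0.5 * C * Vdd^2 (first-order model).
    """
    energy_j = sum(
        0.5 * net_cap_f[net] * vdd_v ** 2 * count
        for net, count in toggle_counts.items()
    )
    return energy_j / sim_time_s


# Hypothetical data path: two nets of 2 fF each, simulated for 1 us at 0.8 V.
net_cap_f = {"sum0": 2e-15, "sum1": 2e-15}
zero_delay_toggles = {"sum0": 1000, "sum1": 1000}  # functional transitions only
timed_toggles = {"sum0": 1000, "sum1": 1600}       # includes glitch transitions

p_zero_delay = dynamic_power_w(zero_delay_toggles, net_cap_f, 0.8, 1e-6)
p_timed = dynamic_power_w(timed_toggles, net_cap_f, 0.8, 1e-6)
print(f"zero-delay: {p_zero_delay:.3e} W, with glitches: {p_timed:.3e} W")
```

In this toy case the delay-annotated toggle counts give a higher power than the zero-delay counts, which is the sense in which an estimate that ignores glitches is optimistic.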
G. Power and SI
Power integrity is related to the stability of power sources.
Ideally, the maximum current can be delivered at any rate to
the power distribution network (PDN) while the voltage at the
output of the power source remains constant. Unfortunately, real
power sources suffer from degradations due to resistance, reactance,
and resonance, called the 3Rs. The PDN resistance, which increases
with wire length, generates a static voltage drop caused by the average
current flowing through the PDN; this is known as IR drop.
The voltage drop in the PDN due to the reactance is a dynamic voltage
drop (DVD) proportional to the inductance of the PDN and to the temporal
variations of the current flowing through the PDN. Therefore,
a high inductance and a fast rate of change of the current exacerbate
this degradation. Resonance, on the other hand,
is intrinsic to the system. The PDN can be represented by an
RLC circuit with a resonance frequency f_r. At this frequency
the impedance seen from the switching circuits toward the
voltage source is maximum [90], [91]. This situation must be
avoided by ensuring that f_r is far enough from
the frequency of operation of the circuit. Inserting additional
capacitance is useful; on-chip decoupling capacitors and capacitors
integrated in the package are usually used for this purpose. For
instance, circuits such as FFT/IFFT blocks, which require a large
amount of current in a small fraction of time and operate at
frequencies close to the resonant frequency of the PDN, may
generate power integrity issues such as ground bounce. Proper
control of the resonance frequency or of the frequency of
operation helps to avoid power integrity issues. Toward this end,
a complete model of the entire PDN is required for an accurate
analysis (e.g., the model must include board, package, and die
impedances).
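The three effects can be sketched numerically with a lumped model. The code below is a minimal illustration, assuming a single series R-L path toward the source in parallel with the on-die capacitance; a real analysis would use the complete board, package, and die model. All function names and component values are hypothetical:

```python
import math


def resonance_hz(l_h, c_f):
    """Resonance frequency of an LC pair: f_r = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def pdn_impedance_ohm(freq_hz, r_ohm, l_h, c_f):
    """|Z| seen by the switching circuits: (R + jwL) in parallel with 1/(jwC)."""
    w = 2.0 * math.pi * freq_hz
    z_supply = complex(r_ohm, w * l_h)   # path toward the voltage source
    z_cap = 1.0 / complex(0.0, w * c_f)  # on-die capacitance
    return abs(z_supply * z_cap / (z_supply + z_cap))


def ir_drop_v(i_avg_a, r_ohm):
    """Static drop caused by the average current."""
    return i_avg_a * r_ohm


def inductive_drop_v(l_h, di_a, dt_s):
    """Dynamic drop L * di/dt caused by a current step."""
    return l_h * di_a / dt_s


# Illustrative values only (not from the paper).
R, L, C = 1e-3, 100e-12, 100e-9
f_r = resonance_hz(L, C)
for f in (f_r / 10, f_r, 10 * f_r):
    print(f"{f / 1e6:8.2f} MHz -> |Z| = {pdn_impedance_ohm(f, R, L, C):.4f} ohm")
print("IR drop:", ir_drop_v(10.0, R), "V")
print("L di/dt:", inductive_drop_v(L, 5.0, 1e-9), "V")
```

With these values the impedance near f_r is orders of magnitude larger than a decade away from it, which is why a block drawing large current bursts near f_r is problematic.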
Power integrity noise in power and ground lines is caused by
simultaneous switching of large numbers of gates and registers.
Therefore, minimizing the impedance of the PDN (on board,
on package and on chip) helps reduce the problem. Impedance
minimization can be achieved by reducing the PDN resistance
and inductance or by using decoupling capacitors. However,
the latter have an equivalent series resistance and an equivalent series
inductance, which can also cause resonant behavior.
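The influence of the equivalent series resistance (ESR) and equivalent series inductance (ESL) of a decoupling capacitor can be sketched with a series R-L-C model. This is a minimal illustration under that assumption; the function names and component values are hypothetical:

```python
import math


def decap_impedance(freq_hz, c_f, esr_ohm, esl_h):
    """Complex impedance of a capacitor with series ESR and ESL."""
    w = 2.0 * math.pi * freq_hz
    return complex(esr_ohm, w * esl_h - 1.0 / (w * c_f))


def self_resonance_hz(c_f, esl_h):
    """Frequency at which the capacitive and inductive reactances cancel."""
    return 1.0 / (2.0 * math.pi * math.sqrt(esl_h * c_f))


# Illustrative values only (not from the paper).
C_DECAP, ESR, ESL = 100e-9, 10e-3, 0.5e-9
f_srf = self_resonance_hz(C_DECAP, ESL)

z_low = decap_impedance(f_srf / 10, C_DECAP, ESR, ESL)   # below self-resonance
z_res = decap_impedance(f_srf, C_DECAP, ESR, ESL)        # at self-resonance
z_high = decap_impedance(10 * f_srf, C_DECAP, ESR, ESL)  # above self-resonance

print(f"self-resonance: {f_srf / 1e6:.2f} MHz")
print(f"below: |Z| = {abs(z_low):.4f} ohm, reactance {z_low.imag:+.4f} ohm")
print(f"at:    |Z| = {abs(z_res):.4f} ohm")
print(f"above: |Z| = {abs(z_high):.4f} ohm, reactance {z_high.imag:+.4f} ohm")
```

Below self-resonance the reactance is negative (capacitive), at self-resonance the impedance reduces to the ESR, and above it the reactance is positive (inductive), so the capacitor no longer helps to lower the PDN impedance and can interact with other PDN elements to create additional resonances.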
Finally, SI requires preserving the bandwidth, reducing la-
tency, minimizing noise, and reducing power dissipation of
interconnects. Coupling noise has three primary deleterious
effects: (i) functional failure, (ii) glitch power consumption,
and (iii) delay uncertainty. Circuit-level methodologies for
SI include the design of tapered buffers, repeater insertion,
shielding, gate and wire sizing, wire spacing, signal rerouting,
and wire reordering (e.g., see [92] for a thorough analysis of
these topics).
H. Design for Testability
Detecting failures in integrated circuits at submicron technol-
ogy nodes has become a major challenge. Large and complex
designs require both structural DFT and ad-hoc techniques to de-
tect manufacturing defects and to test performance of the design.
A combination of dedicated hardware and embedded firmware
is needed to ensure the highest level of test quality in complex
systems on chip (SoCs) for optical coherent transceivers. Design
size, operation modes, flip-flop count, test time and power
dissipation are parameters commonly explored when design-
ing a DFT strategy. Core test wrapping methodology has been
extensively used to give more flexibility and control over the
observability points [93]. These core wrappers can be selected
independently to receive test patterns, so that a
specific region of the chip can be put under test.
Ad-hoc DFT techniques are used for debugging and functional
verification. The transceiver typically incorporates a serial pe-
ripheral interface which provides access to registers used to
control the device operation, read or write parameters and co-
efficients, and read status signals. It also has a diagnostic unit
(DU), which provides a host of observability and controllability
features used for testing, characterization, and channel diagnostics.
For example, the DU captures data in real time from chip blocks such
as the ADC, FFE, CPR, etc., and the captured data are then read out
externally at lower speed. In this way, internal data from almost
any point in the chip can be captured and analyzed by software.
V. CONCLUSION
Coherent transceivers are among the most complex devices
designed by the semiconductor industry. They implement so-
phisticated signal processing, coding, framing and mapping
functions at ultra high speed, meeting high performance and
low power dissipation requirements. Flexible transceivers able
to operate in software defined networks at rates up to 200 Gb/s
per wavelength are now commercially available. The semiconductor
industry is actively working on transceivers
operating at 400 Gb/s and beyond. The DSP revolution in opti-
cal communications is expected to continue in the future with
the application of even more advanced techniques. This will
result in increases in data rate, spectral efficiency and flexibility
of the network as well as in significant cost reductions.
REFERENCES
[1] X. Liu, S. Chandrasekhar, and P. Winzer, “Digital signal processing tech-
niques enabling multi-Tb/s superchannel transmission,” IEEE Signal Pro-
cess. Mag., vol. 31, no. 2, pp. 16–24, Mar. 2014.
[2] M. Kuschnerov et al., “Advances in signal processing,” presented at the
European Conf. Exhibition Optical Communication, Amsterdam, The
Netherlands, 2012, Paper We.2.A.1.
[3] X. Zhou and L. Nelson, “Advanced DSP for 400Gb/s and beyond optical
networks,” J. Lightw. Technol., vol. 32, no. 16, pp. 2716–2725, Aug. 2014.
[4] M. Kuschnerov et al., “DSP for coherent single-carrier receivers,” J.
Lightw. Technol., vol. 27, no. 16, pp. 3614–3622, Aug. 2009.
[5] M. Jinno et al., “Distance-adaptive spectrum resource allocation in
spectrum-sliced elastic optical path network,” IEEE Commun. Mag.,
vol. 48, no. 8, pp. 138–145, Aug. 2010.
[6] P. Layec et al., “Elastic optical networks: The global evolution to software
configurable optical networks,” Bell Labs Tech. J., vol. 18, no. 3,
pp. 133–155, 2013.
[7] N. Sambo et al., “Next generation sliceable bandwidth variable transponders,”
IEEE Commun. Mag., vol. 53, no. 2, pp. 163–171, Feb. 2015.
[8] A. Barbieri, G. Colavolpe, and G. Caire, “Joint iterative detection and
decoding in the presence of phase noise and frequency offset,” IEEE
Trans. Commun., vol. 55, no. 1, pp. 171–179, Jan. 2007.
[9] M. A. Castrillon et al. (2015). “On the performance of joint iterative
detection and decoding in coherent optical channels with
laser frequency fluctuations,” Opt. Fiber Technol. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S1068520015000474
[10] O. Gerstel et al., “Elastic optical networking: A new dawn for the optical
layer?” IEEE Commun. Mag., vol. 50, no. 2, pp. s12–s20, Feb.
2012.
[11] M. Jinno et al., “Spectrum-efficient and scalable elastic optical path network:
Architecture, benefits, and enabling technologies,” IEEE Commun.
Mag., vol. 47, no. 11, pp. 66–72, Nov. 2009.
[12] G. Gho and J. Kahn, “Rate-adaptive modulation and coding for op-
tical fiber transmission systems,” J. Lightw. Technol., vol. 30, no. 12,
pp. 1818–1828, Jun. 2012.
[13] G. Tzimpragos et al., “A survey on FEC codes for 100 G and beyond
optical networks,” IEEE Commun. Surveys Tuts., to be published.
[14] E. Ip and J. Kahn, “Fiber impairment compensation using coherent de-
tection and digital signal processing,” J. Lightw. Technol., vol. 28, no. 4,
pp. 502–519, Feb. 2010.
[15] D. Crivelli et al., “Architecture of a single-chip 50 Gb/s DP-QPSK/BPSK
transceiver with electronic dispersion compensation for coherent opti-
cal channels,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 4,
pp. 1012–1025, Apr. 2014.
[16] C. Laperle, “Advances in high speed ADC, DAC, and DSP for optical
transceivers,” in Proc. Opt. Fiber Commun. Conf., Anaheim, CA, USA,
2013, Paper OTh1F.5.
[17] N. Stojanovic, Y. Zhao, and C. Xie, “Feed-forward and feedback timing
recovery for Nyquist and faster than Nyquist systems,” presented at the
Optical Fiber Communication Conf., San Francisco, CA, USA, 2014,
Paper Th3E.3.
[18] D. Crivelli, H. Carrer, and M. Hueda, “Adaptive digital equalization in the
presence of chromatic dispersion, PMD, and phase noise in coherent fiber
optic systems,” in Proc. IEEE GLOBECOM, 2004, vol. 4, pp. 2545–2551.
[19] T. Pfau, S. Hoffmann, and R. Noe, “Hardware-Efficient coherent digital
receiver concept with feedforward carrier recovery for M-QAM constel-
lations,” J. Lightw. Technol., vol. 27, no. 8, pp. 989–999, Apr. 2009.
[20] P. Gianni et al., “Compensation of laser frequency fluctuations and
phase noise in 16-QAM coherent receivers,” IEEE Photon. Technol. Lett.,
vol. 25, no. 5, pp. 442–445, Mar. 2013.
[21] K. Onohara et al., “Soft-decision-based forward error correction for 100
Gb/s transport systems,” IEEE J. Sel. Topics Quantum Electron., vol. 16,
no. 5, pp. 1258–1267, Sep. 2010.
[22] M. Taylor, “Phase estimation methods for optical coherent detection using
digital signal processing,” J. Lightw. Technol., vol. 27, no. 7, pp. 901–914,
Apr. 2009.
[23] P. Gianni et al., “A new parallel carrier recovery architecture for intradyne
coherent optical receivers in the presence of laser frequency fluctuations,”
in Proc. IEEE GLOBECOM, 2011, pp. 1–6.
[24] G. P. Agrawal, Nonlinear Fiber Optics (ser. Optics and Photonics). San
Diego, CA, USA: Academic, 2006.
[25] M. Magarini et al., “Empirical modeling and simulation of phase noise
in long-haul coherent optical transmission systems,” Opt. Exp., vol. 19,
no. 23, pp. 22 455–22 461, Nov. 2011.
[26] Z. Tao et al., “Simple fiber model for determination of XPM effects,” J.
Lightw. Technol., vol. 29, no. 7, pp. 974–986, Apr. 2011.
[27] S. Kumar, “Analysis of nonlinear phase noise in coherent fiber-optic sys-
tems based on phase shift keying,” J. Lightw. Technol., vol. 27, no. 21,
pp. 4722–4733, Nov. 2009.
[28] T. Pfau, S. Hoffmann, and R. Noe, “Coherent optical 25.8-Gb/s OFDM
transmission over 4160-km SSMF,” J. Lightw. Technol., vol. 26, no. 1,
pp. 6–15, Feb. 2008.
[29] A. Diaz et al., “Analysis of back-propagation and RF pilot-tone based
nonlinearity compensation for a 9× 224Gb/s POLMUX-16QAM system,”
presented at the Optical Fiber Communication Conf., Los Angeles, CA,
USA, 2012, Paper OTh3C.5.
[30] R. Soriano et al., “Chromatic dispersion estimation in digital coherent
receivers,” J. Lightw. Technol., vol. 29, no. 11, pp. 1627–1637, May 2011.
[31] S. Qi, A. Lau, and L. Chao, “Fast and robust blind chromatic dispersion
estimation using auto-correlation of signal power waveform for digital
coherent systems,” J. Lightw. Technol., vol. 31, no. 2, pp. 306–312, Jan.
2013.
[32] J. E. Mazo, “Faster-than-Nyquist signaling,” Bell Syst. Tech. J., vol. 54,
no. 8, pp. 1451–1462, 1975.
[33] X. Li et al., “Electronic post-compensation of WDM transmission impair-
ments using coherent detection and digital signal processing,” Opt. Exp.,
vol. 16, no. 2, pp. 880–888, 2008.
[34] Z. Maalej et al., “Reduced complexity back-propagation for optical com-
munication systems,” presented at the Optical Fiber Communication
Conf., Los Angeles, CA, USA, 2012, Paper JW2A.63.
[35] D. Millar et al., “Mitigation of fiber nonlinearity using a digital coher-
ent receiver,” IEEE J. Sel. Topics Quantum Electron., vol. 16, no. 5,
pp. 1217–1226, Sep. 2010.
[36] C.-Y. Lin et al., “Adaptive digital back-propagation for optical commu-
nication systems,” presented at the Optical Fiber Communication Conf.,
San Francisco, CA, USA, 2014, Paper M3C.4.
[37] Z. Tao et al., “Multiplier-free intrachannel nonlinearity compensating al-
gorithm operating at symbol rate,” J. Lightw. Technol., vol. 29, no. 17,
pp. 2570–2576, Sep. 2011.
[38] Q. Zhuge et al., “Aggressive quantization on perturbation coefficients for
nonlinear pre-distortion,” presented at the Optical Fiber Communication
Conf., San Francisco, CA, USA, 2014, Paper Th4D.7.
[39] B. Smith et al., “Staircase codes: FEC for 100 Gb/s OTN,” J. Lightw.
Technol., vol. 30, no. 1, pp. 110–117, Jan. 2012.
[40] D. Chang et al., “LDPC convolutional codes using layered decoding al-
gorithm for high speed coherent optical transmission,” presented at the
Optical Fiber Communication Conf., Los Angeles, CA, USA, 2012, Pa-
per OW1H.4.
[41] L. Nelson et al., “WDM performance and multiple-path interference toler-
ance of a real-time 120 Gbps Pol-Mux QPSK transceiver with soft decision
FEC,” presented at the Optical Fiber Communication Conf., Los Angeles,
CA, USA, 2012, Paper NTh1I.5.
[42] D. Morero et al., “Non-concatenated FEC codes for ultra-high speed
optical transport networks,” in Proc. IEEE GLOBECOM, 2011, pp. 1–5.
[43] Y. Miyata et al., “UEP-BCH product code based hard-decision FEC for
100 Gb/s optical transport networks,” presented at the Optical Fiber Com-
munication Conf., Los Angeles, CA, USA, 2012, Paper JW2A.7.
[44] I. Djordjevic, L. Xu, and T. Wang, “Reverse concatenated coded mod-
ulation for high-speed optical communication,” IEEE Photon. J., vol. 2,
no. 6, pp. 1034–1039, Dec. 2010.
[45] K. Liu et al., “Quasi-cyclic LDPC codes: Construction and rank analysis
of their parity-check matrices,” in Proc. Inform. Theory Appl. Workshop,
San Diego, CA, USA, 2012, pp. 227–233.
[46] S. Song et al., “A unified approach to the construction of binary and
nonbinary quasi-cyclic LDPC codes based on finite fields,” IEEE Trans.
Commun., vol. 57, no. 1, pp. 84–93, Jan. 2009.
[47] L. Zhang et al., “Quasi-cyclic LDPC codes on cyclic subgroups of finite
fields,” IEEE Trans. Commun., vol. 59, no. 9, pp. 2330–2336, Sep. 2011.
[48] K. Sugihara et al., “A spatially-coupled type LDPC code with an NCG of
12 dB for optical transmission beyond 100 Gb/s,” presented at the Optical
Fiber Communication Conf., Anaheim, CA, USA, 2013, Paper OM2B.4.
[49] J. Li et al., “Algebraic quasi-cyclic LDPC codes: Construction, low error-
floor, large girth and a reduced-complexity decoding scheme,” IEEE Trans.
Commun., vol. 62, no. 8, pp. 2626–2637, Aug. 2014.
[50] F. Buchali et al., “Implementation of 64QAM at 42.66 GBaud using
1.5 samples per symbol DAC and demonstration of up to 300 km fiber
transmission,” presented at the Optical Fiber Communication Conf., San
Francisco, CA, USA, 2014, Paper M2A.1.
[51] Digital Video Broadcasting, ETSI EN Standard 302 307, Rev. 1.2.1, 2009.
[52] M. Castrillon, D. Morero, and M. Hueda, “A new cycle slip compensation
technique for ultra high speed coherent optical communications,” in Proc.
IEEE Photon. Conf., Burlingame, CA, USA, Sep. 2012, pp. 175–176,
Paper MU2.
[53] H. Zhang et al., “Cycle slip mitigation in POLMUX-QPSK modulation,”
presented at the Optical Fiber Communication Conf., Los Angeles, CA,
USA, 2011, Paper OMJ7.
[54] T. Richardson, “Error floors of LDPC codes,” presented at the 41st Annual
Allerton Communication Control Computing Conf., Monticello, IL, USA,
2003.
[55] D. Morero and M. Hueda, “Novel serial code concatenation strategies
for error floor mitigation of low-density parity-check and turbo product
codes,” Can. J. Elect. Comput. Eng., vol. 36, no. 2, pp. 52–59, Spring
2013.
[56] D. Schneider, “The microsecond market,” IEEE Spectrum, vol. 49, no. 6,
pp. 66–81, Jun. 2012.
[57] J. Geyer et al., “Practical implementation of higher order modulation
beyond 16-QAM,” presented at the Optical Fiber Communication Conf.,
Los Angeles, CA, USA, 2015, Paper Th1B.1.
[58] C. Campopiano and B. Glazer, “A coherent digital amplitude and phase
modulation scheme,” IRE Trans. Commun. Syst., vol. 10, no. 1, pp. 90–95,
Mar. 1962.
[59] F. Schreckenbach et al., “Optimization of symbol mappings for bit-
interleaved coded modulation with iterative decoding,” IEEE Commun.
Lett., vol. 7, no. 12, pp. 593–595, Dec. 2003.
[60] S. ten Brink, “Convergence of iterative decoding,” Electron. Lett., vol. 35,
no. 10, pp. 806–808, May 1999.
[61] G. Forney et al., “Efficient modulation for band-limited channels,” IEEE
J. Select. Areas Commun., vol. 2, no. 5, pp. 632–647, Sep. 1984.
[62] M. Karlsson and E. Agrell, “Which is the most power-efficient modulation
format in optical links?” Opt. Exp., vol. 17, no. 13, pp. 10 814–10 819,
Jun. 2009.
[63] M. Karlsson and E. Agrell, “Four-dimensional optimized constellations
for coherent optical transmission systems,” presented at the European
Optical Communication Conf., Torino, Italy, 2010, Paper We.8.C.3.
[64] L. Coelho and N. Hanik, “Global optimization of fiber-optic communi-
cation systems using four-dimensional modulation formats,” presented at
the European Optical Communication Conf., Geneva, Switzerland, 2011,
Paper Mo.2.B.4.
[65] T. A. Eriksson et al., “Experimental investigation of a four-dimensional
256-ary lattice-based modulation format,” presented at the Optical Fiber
Communication Conf., Los Angeles, CA, USA, 2015, Paper W4K.3.
[66] M. Magarini et al., “Pilot-symbols-aided carrier-phase recovery for 100-
G PM-QPSK digital coherent receivers,” IEEE Photon. Technol. Lett.,
vol. 24, no. 9, pp. 739–741, May 2012.
[67] F. Yu et al., “Soft-decision LDPC turbo decoding for DQPSK modulation
in coherent optical receivers,” presented at the European Optical Commu-
nication Conf., Geneva, Switzerland, 2011, Paper We.10.P1.70.
[68] M. A. Castrillon, D. A. Morero, and M. R. Hueda. (2012). Joint
demapping and decoding for DQPSK optical coherent receivers. CoRR.
vol. abs/1206.4914, [Online]. Available: http://arxiv.org/abs/1206.4914
[69] C. B. Czegledi, E. Agrell, and M. Karlsson, “Symbol-by-symbol joint polarization
and phase tracking in coherent receivers,” presented at the Optical
Fiber Communication Conf., Los Angeles, CA, USA, 2015, Paper W1E.3.
[70] N. Irukulapati et al., “Stochastic digital backpropagation,” IEEE Trans.
Commun., vol. 62, no. 11, pp. 3956–3968, Nov. 2014.
[71] S. O. Arik, K.-P. Ho, and J. M. Kahn, “Optical network scaling:
Roles of spectral and spatial aggregation,” Opt. Exp., vol. 22, no. 24,
pp. 29 868–29 887, Dec. 2014.
[72] D. Hillerkuss, Single-Laser Multi-Terabit/s Systems (ser. XVIII), 184 S.
Karlsruhe, Germany: KIT Scientific Publishing, 2013.
[73] P. Winzer, “High-spectral-efficiency optical modulation formats,” J.
Lightw. Technol., vol. 30, no. 24, pp. 3824–3835, Dec. 2012.
[74] J. Anderson, F. Rusek, and V. Owall, “Faster-than-Nyquist signaling,”
Proc. IEEE, vol. 101, no. 8, pp. 1817–1830, Aug. 2013.
[75] G. Raybon et al., “Single-carrier and dual-carrier 400-Gb/s and 1.0-Tb/s
transmission systems,” presented at the Optical Fiber Communication
Conf., San Francisco, CA, USA, 2014, Paper Th4F.1.
[76] G. Raybon, “High symbol rate transmission systems for data rates from
400 Gb/s to 1Tb/s,” presented at the Optical Fiber Communication Conf.,
Los Angeles, CA, USA, 2015, Paper M3G.1.
[77] P. Duhamel, “Algorithms meeting the lower bounds on the multiplicative
complexity of length-2^n DFTs and their connection with practical
algorithms,” IEEE Trans. Acoust. Speech Signal Process., vol. 38, no. 9,
pp. 1504–1511, Sep. 1990.
[78] O. E. Agazzi, “Design trade-offs in practical ASIC implementations,”
presented at the Optical Fiber Communication Conf., Los Angeles, CA,
USA, 2015, Paper Th1B.3.
[79] D. Morero et al., “Non-concatenated FEC codes for ultra-high speed
optical transport networks,” U.S. Patent 8 918 694, Dec. 23, 2014.
[80] Z. Wang, Z. Cui, and J. Sha, “VLSI design for low-density parity-check
code decoding,” IEEE Circuits Syst. Mag., vol. 11, no. 1, pp. 52–69,
Jan./Mar. 2011.
[81] D. Morero, G. Corral-Briones, and M. Hueda, “Parallel architecture for
decoding LDPC codes on high speed communication systems,” in Proc.
Argentine School Micro-Nanoelectronics Technol. Appl., Buenos Aires,
Argentina, 2008, pp. 107–110.
[82] L. Liu and C.-J. Shi, “Sliced message passing: High throughput overlapped
decoding of high-rate low-density parity-check codes,” IEEE Trans. Cir-
cuits Syst. I, Reg. Papers, vol. 55, no. 11, pp. 3697–3710, Dec. 2008.
[83] J. Sha et al., “Multi-Gb/s LDPC code design and implementation,” IEEE
Trans. Very Large Scale Integr. Syst., vol. 17, no. 2, pp. 262–268, Feb.
2009.
[84] C. Jego, P. Adde, and C. Leroux, “Full-parallel architecture for
turbo decoding of product codes,” Electron. Lett., vol. 42, no. 18,
pp. 1052–1053, Aug. 2006.
[85] R. Pyndiah et al., “Near optimum decoding of product codes,” in Proc.
IEEE GLOBECOM, 1994, vol. 1, pp. 339–343.
[86] S. Dave et al., “Soft-decision forward error correction in a 40-nm ASIC
for 100-Gbps OTN applications,” in Proc. Opt. Fiber Commun. Conf., Los
Angeles, CA, USA, 2011, Paper JWA014.
[87] Y.-H. Chen et al., “A channel-adaptive early termination strategy for LDPC
decoders,” in Proc. IEEE Workshop Signal Process. Syst., Tampere, Fin-
land, 2009, pp. 226–231.
[88] W. Stoye, “LDPC iteration control by partial parity check,” in Proc. IEEE
Int. Conf. Ultra-Wideband, Vancouver, BC, Canada, 2009, pp. 602–605.
[89] X. Zhang and P. Siegel, “Quantized iterative message passing decoders
with low error floor for LDPC codes,” IEEE Trans. Commun., vol. 62,
no. 1, pp. 1–14, Jan. 2014.
[90] R. Nair and D. Bennett, Power Integrity Analysis and Management for
Integrated Circuits (Prentice Hall Signal Integrity Library), Englewood
Cliffs, NJ, USA: Prentice-Hall, 2010.
[91] M. Popovich, A. Mezhiba, and E. Friedman, Power Distribution Networks
With On-Chip Decoupling Capacitors. New York, NY, USA: Springer,
2007.
[92] E. Salman and E. G. Friedman, High Performance Integrated Circuit
Design. New York, NY, USA: McGraw Hill, 2012.
[93] P. Girard, N. Nicolici, and X. Wen, Power-Aware Testing and Test Strate-
gies for Low Power Devices (SpringerLink: Bücher), New York, NY, USA:
Springer, 2010.
Damián A. Morero was born in Córdoba, Argentina. He received the degree in
electronic engineering with honors and the Ph.D. degree in engineering science
from the National University of Cordoba (UNC), Cordoba, Argentina. In 2003
and 2005, he received the Academic Excellence Award from the Engineers As-
sociation of Cordoba Argentina and the UNC, respectively. From 2006 to 2009,
he received Ph.D. Fellowships from the Secretary of Science and Technology
(SeCyT), Argentina. He is currently the Project Director at ClariPhy Argentina
S.A. where he has been engaged in the research and development of Error Cor-
rection Coding schemes and Digital Signal Processing algorithms for high speed
optical communications and data storage. His research interests include coding,
information theory, signal processing, computer science and VLSI design.
Mario A. Castrillón received the Electronic Engineer degree from the National
Technological University, Córdoba, Argentina, in 2009. In the same year, he
joined the Digital Communications Research Laboratory, Department of Elec-
tronic Engineering, National University of Córdoba, Córdoba, Argentina, where
he is currently working toward the Ph.D. degree in engineering sciences. His
research interests include coding, information theory, and signal processing.
From 2009 to 2013, he received the Ph.D. Fellowship from the Secretary of
Science and Technology, Argentina. He received a Type-II Grant awarded by
National Scientific and Technical Research Council, Argentina.
Alejandro Aguirre was born in Corrientes, Argentina. He received the Elec-
tronic Engineer degree from the National University of Córdoba, Argentina, in
2011. In 2009, he joined ClariPhy Argentina S.A. where he started working on
high speed analog circuit design and later moved to physical design of mixed
signal integrated circuits. Since 2010, he has been responsible for the physical
design of several generations of optical coherent transceivers at speeds of 10,
40, 100, and 200 Gb/s.
Mario R. Hueda was born in Jujuy, Argentina. He received the Electrical and
Electronic Engineer degree in 1994, and the Ph.D. degree in 2002, both from the
National University of Cordoba, Argentina. Since 1997, he has been with the
Digital Communications Research Laboratory at the Department of Electronic
Engineering, National University of Cordoba, Argentina. He is also with CON-
ICET (National Scientific and Technological Research Council of Argentina).
His research interests include digital communications, signal processing, and
synchronization. He has 15 patents issued or pending, and has published more
than 60 technical papers in journals and conferences.
Oscar E. Agazzi received the Ph.D. degree in electronic engineering from the
University of California at Berkeley, Berkeley, CA, USA. He is the Senior
Vice-President and Chief Systems Architect at ClariPhy Communications, Inc.
He is the Chief Architect of the ClariPhy family of coherent as well as direct
detection optical transceivers. Prior to joining ClariPhy, he worked at Broad-
com Corporation and at Lucent Technologies Bell Laboratories. He has more
than 150 patents issued or pending, and has published more than 60 technical
papers in journals and conferences. He is a Lucent Technologies Bell Labs
Fellow.
