Power-Efficient ASIC Implementation of DSP Algorithms for Coherent Optical Communication by Larsson-Edefors, Per & Börjeson, Erik
Downloaded from: https://research.chalmers.se, 2021-08-31 11:46 UTC
Citation for the original published paper (version of record):
Larsson-Edefors, P., Börjeson, E. (2020)
Power-Efficient ASIC Implementation of DSP Algorithms for Coherent Optical Communication
IEEE Photonics Society Summer Topical Meeting Series, July 2020
http://dx.doi.org/10.1109/SUM48678.2020.9161072
N.B. When citing this work, cite the original published paper.
Power-Efficient ASIC Implementation of DSP Algorithms
for Coherent Optical Communication
Per Larsson-Edefors and Erik Börjeson
Chalmers University of Technology, Gothenburg, Sweden
perla@chalmers.se
Abstract—Coherent optical communication critically relies on efficient
digital signal processing (DSP). We outline the application-specific
integrated circuit (ASIC) implementation flow for DSP algorithms and
discuss approaches to reducing the digital ASIC power dissipation of
high-throughput DSP implementations for coherent fiber-optic
communication systems.
I. POWER-AWARE ASIC IMPLEMENTATION: A PRIMER
Digital ASIC design involves a number of phases and electronic
design automation (EDA) tools, as shown in Fig. 1. Once a fixed-point
model of an algorithm has been developed, the designer creates
a hardware description language (HDL) implementation containing
different digital modules. Since throughput requirements are very
strict for fiber-optic communication systems, the choice of parallelism
and pipelining is essential: The bit rate can be increased either by
parallelizing hardware in several data lanes or by inserting synchronization
elements that pipeline the computations, effectively increasing
the clock rate. In algorithms with feedback, however, it is far from

Fig. 1. Digital ASIC implementation phases.
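The two routes to a higher bit rate described above, more parallel lanes or a higher clock rate through pipelining, can be illustrated with a minimal sketch. All numbers (lane count, bits per lane, logic delay, register overhead) are hypothetical, and the model assumes that the logic can be split into perfectly balanced pipeline stages, which is an idealization.

```python
def throughput_bps(lanes, bits_per_lane, clock_rate_hz):
    """Bit rate of a datapath that processes `lanes` samples per clock cycle."""
    return lanes * bits_per_lane * clock_rate_hz


def pipelined_clock_rate_hz(logic_delay_ns, stages, register_overhead_ns):
    """Clock rate when the logic is split into `stages` balanced pipeline stages.

    Assumption: each stage gets an equal share of the logic delay, plus a
    fixed overhead for the synchronizing element.
    """
    period_ns = logic_delay_ns / stages + register_overhead_ns
    return 1e9 / period_ns


# Hypothetical design point: 64 lanes, 4 bits per lane, 500-MHz clock.
base = throughput_bps(lanes=64, bits_per_lane=4, clock_rate_hz=500e6)
print(f"64 lanes at 500 MHz: {base / 1e9:.0f} Gb/s")

# Hypothetical 4-ns logic path with 0.2 ns of register overhead per stage.
for stages in (1, 2, 4):
    f_clk = pipelined_clock_rate_hz(4.0, stages, 0.2)
    print(f"{stages} stage(s): clock rate = {f_clk / 1e6:.0f} MHz")
```

In this idealized model, doubling the number of lanes and doubling the clock rate give the same throughput; which option is preferable depends on area, latency and power, which the sketch does not model.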
Once the HDL code has been thoroughly verified for functionality
using logic simulations, the next EDA tool performs netlist synthesis.
With information on the IC technology and its logic gate cells, the
synthesis tool maps the HDL code into a cell netlist under a timing
constraint, which is the longest delay that we can accept between any
two synchronizing elements, to sustain the required bit rate. If the
timing constraint is not met, the designer needs to optimize the HDL
implementation or, worse, reconsider the whole design approach.
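The timing check performed during synthesis can be sketched as follows. The path delays and the 500-MHz clock target are hypothetical values chosen for illustration; a real synthesis tool runs static timing analysis over all paths between synchronizing elements rather than over a short list.

```python
def timing_slack(path_delays_ns, clock_rate_hz):
    """Return (longest_delay_ns, slack_ns) for a set of register-to-register paths.

    The timing constraint is taken to be the clock period: the longest delay
    between any two synchronizing elements must not exceed it.
    """
    period_ns = 1e9 / clock_rate_hz
    longest = max(path_delays_ns)
    return longest, period_ns - longest


# Hypothetical path delays (ns) and a hypothetical 500-MHz clock target.
delays = [1.2, 1.75, 0.9, 1.6]
longest, slack = timing_slack(delays, 500e6)
timing_met = slack >= 0.0
print(f"longest delay = {longest:.2f} ns, slack = {slack:.2f} ns, met = {timing_met}")
```

A negative slack would correspond to the case in which the designer has to optimize the HDL implementation or reconsider the design approach.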
In digital CMOS circuits, power dissipation is either static (leakage
in off-state transistors) or dynamic (switching gate outputs). Leakage
is mainly a problem in performance-oriented IC technologies, in
which switching speed is prioritized. The always-present switching
power dissipation, Psw, is caused by charges brought from the power
supply down to ground to charge and discharge logic gate outputs.
Assuming Q is the charge and VDD is the supply voltage, an energy of
Q · VDD = (C VDD) · VDD is dissipated during a charge-discharge cycle
of a gate output with capacitance C. For a netlist with N logic cells,

$$P_{sw} = f \, V_{DD}^{2} \sum_{i=1}^{N} \alpha_i C_i.$$

Here, f is the clock rate, while Ci and αi are the capacitance and
the switching activity, respectively, of the output of gate i. The
switching activity is defined as the fraction of clock cycles in which
the gate output switches from 0 to 1. To identify αi, we perform
netlist simulations based on meaningful test vectors.
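As a minimal sketch of this power estimate, the Python code below derives switching activities from simulated output traces and sums f · VDD² · αi · Ci over all gate outputs, i.e., the energy C · VDD² per charge-discharge cycle weighted by how often each output is charged. The traces, capacitances, supply voltage and clock rate are hypothetical values, not data from the paper; a real flow takes activities from netlist simulation and capacitances from the cell library and the layout.

```python
def switching_activity(trace):
    """Fraction of clock cycles in which a gate output switches from 0 to 1.

    `trace` holds one sampled logic value (0 or 1) per clock cycle.
    """
    rising = sum(1 for prev, cur in zip(trace, trace[1:]) if prev == 0 and cur == 1)
    return rising / (len(trace) - 1)


def switching_power(clock_rate_hz, vdd, capacitances, activities):
    """Switching power: f * VDD^2 * sum(alpha_i * C_i) over all gate outputs."""
    weighted_cap = sum(a * c for a, c in zip(activities, capacitances))
    return clock_rate_hz * vdd ** 2 * weighted_cap


# Hypothetical two-gate netlist: output traces and output capacitances (farads).
traces = [
    [0, 1, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 1, 1],
]
capacitances = [2e-15, 1e-15]

activities = [switching_activity(t) for t in traces]
p_sw = switching_power(clock_rate_hz=500e6, vdd=0.8,
                       capacitances=capacitances, activities=activities)
print(f"activities = {activities}, P_sw = {p_sw * 1e6:.3f} uW")
```

Because the activities come from the test vectors, unrepresentative vectors directly distort the power estimate.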
The final phase needed to complete the ASIC design is to run
place and route (P&R) [1], in which logic cells are placed by
abutment, inter-cell logic signal wires routed, and a clock network
is constructed. This completes the physical implementation and
provides very accurate layout-based data from which to estimate
area usage, longest delay, and power dissipation. It should be noted,
however, that modern synthesis tools for digital ASICs can often
accurately estimate wire lengths without having to perform full P&R.
II. DESIGN OF FEC CIRCUITS
While DSP has often come to include forward error correction
(FEC), these two functions have little in common. In fact, DSP
circuits mainly perform filtering to continuously shape the data
stream of a signal, while the FEC circuit is, for most of the time, only
monitoring a stream of digital data. The actual correction of an error
does cause significant power dissipation, but this event is rare [2].
In fiber-optic communication, the FEC module is often specified to
reduce a pre-FEC bit-error rate (BER) around 10^-3–10^-2 to a
post-FEC BER of 10^-15. This means the switching power dissipation is
higher in early iterations, at the front-end of the FEC module [3].
Since observable post-FEC errors are extremely rare, the analysis
of coding gain and error floor is recognized to be a formidable
challenge: One of our field-programmable gate array (FPGA) prototypes
was running continuously for 30 days to capture statistically stable
data for a post-FEC BER of 10^-15 [4] and in a post-deadline paper
at OFC'18, ZTE demonstrated a system with 50 FPGAs to explore
deep-BER behavior of FEC circuits intended for OIF 400G ZR [5].
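To see why deep-BER measurements take so long, one can roughly estimate the time needed to observe a given number of bit errors. The sketch below assumes independent errors, a target of 100 observed errors, and a hypothetical emulation throughput of 100 Gb/s; none of these figures are taken from [4] or [5].

```python
SECONDS_PER_DAY = 86_400


def days_to_observe(target_ber, errors_wanted, throughput_bps):
    """Expected run time in days until `errors_wanted` bit errors have occurred."""
    bits_needed = errors_wanted / target_ber
    return bits_needed / throughput_bps / SECONDS_PER_DAY


# Hypothetical setup: 100 errors at a BER of 1e-15, processed at 100 Gb/s.
days = days_to_observe(target_ber=1e-15, errors_wanted=100, throughput_bps=100e9)
print(f"about {days:.1f} days of continuous operation")
```

Since the run time is inversely proportional to both the BER and the throughput, each additional decade of BER depth costs ten times more time or ten times more parallel hardware.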

Fig. 2. Simulation time of decoder HDL code as a function of Eb/N0.
Since FEC power dissipation depends strongly on how many errors
are corrected, netlist simulations (Fig. 1) are instrumental to an
accurate power analysis. But, again, since the correction of an error
is rare, FEC simulations in general are very time consuming. Fig. 2
shows the time it takes to perform logic simulations (Fig. 1) for a
BER analysis of a product decoder [3] based on BCH(115,94,3)
component codes. The longest simulation corresponds to a run-time
of 26 days, yet this reaches only down to a post-FEC BER of 3 · 10^-8.
Since they only occasionally correct erroneous data in the stream,
FEC circuits can be aggressively optimized for low power dissipation.
For example, we have demonstrated hard- and soft-decision FEC
circuits for optical communication, with bit rates in the range of
400 Gb/s to 1 Tb/s, having energy efficiencies of around 1 pJ/bit [3],
[4], [6]. We prefer a metric of energy efficiency, i.e., energy per bit,
Ebit = Psw/throughput,
over absolute power dissipation as it allows us to conveniently
perform design tradeoffs between different system components.
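As a small worked example of this metric, the sketch below converts between power, throughput and energy per bit. The 1-pJ/bit figure and the 400-Gb/s and 1-Tb/s bit rates are the approximate values quoted above; the function names are chosen here for illustration.

```python
def energy_per_bit_pj(power_w, throughput_bps):
    """Energy per bit in pJ/bit: power divided by throughput."""
    return power_w / throughput_bps * 1e12


def power_w(energy_pj_per_bit, throughput_bps):
    """Power in watts implied by an energy efficiency at a given bit rate."""
    return energy_pj_per_bit * 1e-12 * throughput_bps


# About 1 pJ/bit at the two ends of the quoted bit-rate range.
for rate in (400e9, 1e12):
    print(f"{rate / 1e9:.0f} Gb/s at 1 pJ/bit -> {power_w(1.0, rate):.2f} W")
```

Because energy per bit is normalized to throughput, the contributions of components that process the same data stream can simply be added, which is what makes the metric convenient for tradeoffs.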
III. DESIGN OF DSP CIRCUITS
Since coherent schemes require sophisticated DSP functions, DSP
power dissipation is an issue [7]. In contrast to FEC, DSP circuits
operate on each data sample and thus have high switching activities.
This severely restricts the available power-saving design options.
A good starting point for ASIC design can be to separate the part of
the DSP algorithm that is involved in estimation of signal properties
from the actual operation on the data samples. The fundamental
reason for this is that the estimation circuits may not need the full
parallelism required by operations at full bit rate: For the adaptive
equalizer, which we expect to dominate DSP power dissipation for
20–100-km coherent links [8], we can reduce the power dissipation
by more than 50% via simplifications to the tap update algorithm [9].
To save power in DSP circuits, simplified coherent receiver architectures
can be used [10]. A self-homodyne receiver represents a
lower limit of DSP functions: One polarization is sacrificed for a pilot
tone, to avoid the local oscillator [11]. But for the sake of throughput,
we would prefer to use the full optical field. If we scale a conventional
coherent receiver architecture down to shorter reaches, we can neglect
some impairments, e.g., chromatic and polarization-mode dispersion,
which allows us to use only a short adaptive equalizer. This in turn
will move the design focus to analog-to-digital converters (ADCs) [12]
and DSP components that are indispensable for any fiber length.
Unless pilot symbols are used, the carrier phase recovery (CPR)
module will constitute a significant portion of a short-reach full-field
coherent receiver. While ADC power dissipation scales linearly with
sampling rate but exponentially with resolution [12], CPR power
dissipation scales linearly with bit rate but quadratically with modulation
format. As we move to a higher-order modulation format, the resolution
needs to be increased, which further increases CPR power dissipation.
Fig. 3 shows a CPR design [13] based on the blind phase search
(BPS) algorithm. Here we can discern estimation blocks and a phase
compensation block. While the compensation has to be done in many
parallel lanes to meet the required total CPR bit rate, the estimation is
only partly parallelized. Rather, as much as possible of the estimation,
especially the later portions of the algorithm, is handled without

Fig. 3. CPR block diagram.
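The split between estimation and compensation in a BPS-based CPR can be illustrated with a minimal floating-point sketch for 16-QAM. This is a textbook-style, block-wise version of blind phase search written for illustration, not the fixed-point hardware design of [13]; the number of test phases, the block contents and the 0.2-rad phase error are hypothetical, and phase unwrapping (needed to handle cycle slips) is omitted.

```python
import cmath
import math

LEVELS = (-3.0, -1.0, 1.0, 3.0)  # 16-QAM amplitude levels per dimension


def decide(sample):
    """Hard decision: nearest 16-QAM constellation point."""
    re = min(LEVELS, key=lambda level: abs(sample.real - level))
    im = min(LEVELS, key=lambda level: abs(sample.imag - level))
    return complex(re, im)


def bps_estimate(block, num_test_phases=32):
    """Estimation: test phase with the smallest summed squared decision distance."""
    best_phase, best_cost = 0.0, float("inf")
    for b in range(num_test_phases):
        # Test phases cover [-pi/4, pi/4), one quadrant of rotational symmetry.
        phase = (b / num_test_phases - 0.5) * (math.pi / 2)
        rotation = cmath.exp(-1j * phase)
        cost = sum(abs(s * rotation - decide(s * rotation)) ** 2 for s in block)
        if cost < best_cost:
            best_phase, best_cost = phase, cost
    return best_phase


def bps_compensate(block, phase):
    """Compensation: de-rotate every sample by the estimated phase."""
    rotation = cmath.exp(-1j * phase)
    return [s * rotation for s in block]


# Noise-free example: all 16 constellation points with a common phase error.
tx = [complex(i, q) for i in LEVELS for q in LEVELS]
rx = [s * cmath.exp(1j * 0.2) for s in tx]
estimate = bps_estimate(rx)
recovered = [decide(s) for s in bps_compensate(rx, estimate)]
print(f"estimated phase = {estimate:.4f} rad, all recovered = {recovered == tx}")
```

In the sketch, `bps_compensate` must touch every sample, whereas `bps_estimate` produces a single phase value per block, which mirrors why only the compensation needs full parallelism.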
Several proposed CPR algorithms use multiple estimation-compensation
stages to improve performance. But in a 32-GBd 16-QAM CPR module,
with an energy efficiency of 1.1 pJ/bit [13], 25%
is dissipated in the compensation block. Thus, cascading of up to four
separate estimation-compensation stages [14] will be costly in terms
of power dissipation. In contrast, the use of pilot symbols makes CPR
implementations less complex. Here, estimation is straightforward
to implement and, thus, the compensation block dominates the
total power dissipation of a pilot-based CPR module, whose energy
efficiency is 0.38 pJ/bit for a 32-GBd 16-QAM design [13].
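The cost argument can be made concrete with a short calculation based on the figures quoted above. The sketch assumes, purely for illustration, that every cascaded stage repeats a compensation block with the same energy per bit as in the single-stage design; the estimation blocks of the additional stages are left out, so the result is only a rough lower bound.

```python
BPS_CPR_PJ_PER_BIT = 1.1     # BPS-based CPR, 32-GBd 16-QAM [13]
COMPENSATION_SHARE = 0.25    # share dissipated in the compensation block
PILOT_CPR_PJ_PER_BIT = 0.38  # pilot-based CPR, 32-GBd 16-QAM [13]

compensation_pj = BPS_CPR_PJ_PER_BIT * COMPENSATION_SHARE
estimation_pj = BPS_CPR_PJ_PER_BIT - compensation_pj


def cascaded_compensation_pj(stages):
    """Compensation energy alone, assuming identical blocks in every stage."""
    return stages * compensation_pj


print(f"single stage: estimation {estimation_pj:.3f} pJ/bit, "
      f"compensation {compensation_pj:.3f} pJ/bit")
for stages in (2, 4):
    print(f"{stages} stages: compensation alone "
          f"{cascaded_compensation_pj(stages):.2f} pJ/bit")
print(f"pilot-based CPR for comparison: {PILOT_CPR_PJ_PER_BIT} pJ/bit")
```

Under this assumption, the compensation blocks of four stages alone would dissipate as much as the entire single-stage BPS module.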
Simulation run-time is not only an issue for FEC, but also for
deep-BER analysis of DSP. For example, to analyze a CPR module’s
sensitivity to cycle slips, the module needs to be analyzed at a
hardware-centric level, to ensure the detailed logic circuit behavior
is captured. One way to describe the module in such a bit-equivalent
manner is to write HDL code and run this as an emulation prototype
in an FPGA system. The simulation time speed-up when using one
FPGA, instead of using MATLAB-HDL co-simulation on a capable
workstation, was shown to be five orders of magnitude [15].
IV. CONCLUSION
The design of power-efficient digital ASICs is essential to optical
communication systems. Coherent schemes offer many advantages,
but they are known to lead to complex DSP and FEC circuits. While
DSP circuits operate on all data samples, the correction of an error is
a rare operation in FEC circuits. This distinction between principles
of operation, which is still to be addressed in algorithms published
in the open literature, allows us to suggest two different approaches
to power-efficient implementation of DSP and FEC algorithms.
REFERENCES
[1] D. A. Morero et al., “Design tradeoffs and challenges in practical
coherent optical transceiver implementations,” IEEE J. Lightw. Technol.,
vol. 34, no. 1, pp. 121–136, Jan. 2016.
[2] P. Larsson-Edefors et al., “Implementation challenges for energy-
efficient error correction in optical communication systems [invited],” in
OSA Advanced Photonics Congress, SPPCom, July 2018, p. SpTh4F.2.
[3] C. Fougstedt et al., “Energy-efficient high-throughput VLSI architectures
for product-like codes,” IEEE J. Lightw. Technol., vol. 37, no. 2, pp.
477–485, Jan. 2019.
[4] K. Cushon et al., “Low-power 400-Gbps soft-decision LDPC FEC for
optical transport networks,” IEEE J. Lightw. Technol., vol. 34, no. 18,
pp. 4304–4311, Sept. 2016.
[5] Y. Cai et al., “FPGA investigation on error-floor performance of a
concatenated staircase and Hamming code for 400G-ZR forward error
correction,” in Opt. Fiber Commun. Conf. (OFC), Mar. 2018, p. Th4C.2.
[6] K. Cushon et al., “A high-throughput low-power soft bit-flipping LDPC
decoder in 28 nm FD-SOI,” in European Solid State Circuits Conf.
(ESSCIRC), Sept. 2018, pp. 102–105.
[7] T. Kupfer et al., “Optimizing power consumption of a coherent DSP
for metro and data center interconnects,” in Opt. Fiber Commun. Conf.
(OFC), Mar. 2017, p. Th3G.2.
[8] C. Fougstedt et al., “ASIC design exploration for DSP and FEC of
400-Gbit/s coherent data-center interconnect receivers,” in Opt. Fiber
Commun. Conf. (OFC), Mar. 2020, p. Th2A.38.
[9] C. Fougstedt et al., “Dynamic equalizer power dissipation optimization,”
in Opt. Fiber Commun. Conf. (OFC), Mar. 2016, p. W4A.2.
[10] X. Zhou et al., “Beyond 1 Tb/s intra-data center interconnect technology:
IM-DD or coherent?” IEEE J. Lightw. Technol., vol. 38, no. 2, pp. 475–
484, Jan. 2020.
[11] L. Lundberg et al., “Power consumption of a minimal-DSP coherent link
with a polarization multiplexed pilot-tone,” in Eur. Conf. Opt. Commun.
(ECOC), Sept. 2016, pp. 1190–1192.
[12] T. Drenski et al., “ADC/DAC and ASIC technology trends,” in Opto-
Electronics and Communications Conf. (OECC), July 2019, p. TuB2-1.
[13] E. Börjeson et al., “VLSI implementations of carrier phase recovery
algorithms for M-QAM fiber-optic systems,” IEEE J. Lightw. Technol.,
2020, Early access, doi: 10.1109/JLT.2020.2976166.
[14] S. M. Bilal et al., “Multistage carrier phase estimation algorithms for
phase noise mitigation in 64-quadrature amplitude modulation optical
systems,” IEEE J. Lightw. Technol., vol. 32, no. 17, pp. 2973–2980,
Sept. 2014.
[15] E. Börjeson et al., “Towards FPGA emulation of fiber-optic channels
for deep-BER evaluation of DSP implementations [invited],” in OSA
Advanced Photonics Congress, SPPCom, July 2019, p. SpTh1E.4.
