We present the detailed architecture of a 4×4 continuous-time adaptive equalizer for analog signal processing-based coherent optical dual polarization-quadrature phase shift keying (DP-QPSK) receivers. The proof-of-concept equalizer, which uses the constant modulus algorithm (CMA) for weight coefficient update, is implemented in a 130 nm BiCMOS technology. The equalizer, designed for 100-Gb/s operation, occupies an area of ∼1.4 mm×1.35 mm. The equalizer is validated experimentally at a 40-Gb/s data rate and by post-layout circuit simulations at 100-Gb/s. The equalizer output, after processing with a behavioral Costas loop based carrier phase recovery and compensation (CPRC) module, shows an error vector magnitude (EVM) of 28% when the transmitter EVM is 24% in a 40-Gb/s link of 10 km single mode fiber (SMF). Performance of the equalizer can be improved further by implementing it in more advanced technologies. The simple architecture makes the equalizer suitable for intra data center analog coherent interconnects with SMF channels of length less than 10 km and carrier wavelengths of either 1310 nm or 1550 nm.
I. INTRODUCTION
ENHANCEMENT of energy efficiency in short-reach optical interconnects is an actively investigated area, driven by the need to curtail the power consumption in data centers, which grows with the speed and capacity of the interconnects [1]. Intensity modulation-direct detection (IMDD) through multi-mode fibers is giving way to more spectrally efficient modulation formats through single mode fibers (SMF) in short-reach links as well [2, 3]. Even though IMDD based techniques with pulse amplitude modulation formats are currently popular, scalability of data rate is a challenge for such methods in short-reach links [4]. In contrast, coherent optical reception, which is the primary choice for long-haul communications, helps to achieve data rate scalability along with an increased receiver sensitivity [5].
A typical optical link for long-haul communication uses a digital coherent receiver [5, 6]. In a digital coherent receiver, the received signals are converted to the digital domain using analog-to-digital converters (ADC) after the optical-to-electrical (O/E) conversion. The digitized signals are processed using digital signal processors (DSP). This ADC+DSP approach makes receivers too complex to be used in short-reach links [6, 7]. The complexity of coherent receivers for short-reach links can be reduced if the signals are processed in the analog domain itself [6]-[11]. For example, the feasibility of a DSP-free homodyne DP-QPSK receiver for data center links has been studied analytically in [6]. The reported polarization multiplexing based architecture uses a simplified 2 × 2 equalizer for correcting small amounts of dispersion and bandwidth limitations, the weight coefficients of which can be updated by using either the CMA or a least mean square algorithm. In another study, validation results of a proof-of-concept 400 Gb/s DSP-free coherent transceiver for data center interconnects that uses the dual polarization-16 quadrature amplitude modulation format are reported [10]. More recently, an analog coherent engine which uses analog signal processing followed by a feed forward equalizer filter for improving signal quality over a 15 m SMF has been demonstrated in [7]. This engine works up to 400 Gb/s and dissipates 2 W of power.
Studies have also proposed low-power analog coherent optics based on an optical phase-locked loop (OPLL), which work only over very short distances due to the choice of the low-dispersion but high-attenuation O-band [3, 11]. Another study demonstrates a 40-Gb/s OPLL based analog coherent BPSK receiver that integrates electronic and photonic integrated circuits (IC), which eliminates the DSP based approach for carrier offset removal [8]. More recently, a technique to achieve carrier phase synchronization using an analog signal processing based IC, a phase modulator, and tunable lasers has been demonstrated [12]. These receivers use an OPLL for phase and frequency offset compensation and achieve significant power savings by removing the DSP completely. From the literature, it may be summarized that a miniaturized analog coherent receiver using electronic and photonic integration is a solution for the power dissipation-size-cost problem in future generation data centers.
An analog domain processing-based 100-Gb/s DP-QPSK receiver for short-reach links and dispersion compensated links was proposed earlier [9], a block diagram of which is shown in Fig. 1. In this receiver, the three signal processing operations, namely equalization, CPRC, and clock and data recovery (CDR), are performed in the analog domain itself. An equalizer is used in the receiver for polarization rotation and to mitigate the effects of polarization mode dispersion and residual chromatic dispersion. A CPRC is required to compensate for the phase and frequency mismatches between the transmitter and receiver lasers, and a CDR is required to recover the clock from the received signals and re-time them. Analog domain processing is an attractive choice for equalization in high-speed links. For instance, continuous-time low-power equalizers for various high-speed applications were reported in [13]-[20]. However, the first continuous-time DP-QPSK equalizer that uses the CMA for weight coefficient update was demonstrated in [21]. This paper gives a detailed description of its architecture and its validation results.
The remainder of this paper is organized as follows. Section II gives the detailed architecture of the CMA Equalizer. Section III describes the block-wise implementation and integration of a two tap equalizer using 130 nm BiCMOS technology. Section IV discusses the details of the experimental validation of the equalizer at 40-Gb/s and post-layout circuit simulation results of the equalizer at 100-Gb/s. Section V draws the conclusions of the paper.
II. SYSTEM OVERVIEW
Due to channel dispersion, polarization misalignment, and polarization dependent loss, the four signals coming out of the optical front-end of a DP-QPSK receiver will not be independent of each other. Hence, a multi-dimensional equalizer that has four inputs and four outputs needs to be used to process all four signals jointly. The equalizer should adaptively mitigate the dispersion in the fiber, which is a time-varying phenomenon. To adapt the equalizer weight coefficients, the CMA [22] is used, which is one of the simplest blind equalization algorithms that can be used with DP-QPSK signals. Fig. 2(a) shows the architecture of a 4×4 equalizer that uses the CMA for adaptation. The equalizer has a feedforward block and an error generator. It has inputs x and y and outputs x_eq and y_eq, which are complex signals corresponding to the X and Y polarizations, respectively. The feedforward block has four transversal filters with coefficients h_xx, h_xy, h_yx, and h_yy, which are arranged as a butterfly structure. This structure generates the equalized output signals x_eq and y_eq from the inputs and the CMA error signals ε_x and ε_y, which are calculated in the error generator. The outputs are given by [9]
$$x_{eq} = \mathbf{h}_{xx}^{T}\mathbf{x} + \mathbf{h}_{xy}^{T}\mathbf{y} \tag{1}$$
$$y_{eq} = \mathbf{h}_{yx}^{T}\mathbf{x} + \mathbf{h}_{yy}^{T}\mathbf{y} \tag{2}$$
where x and y are vectors containing delayed input signals. The continuous-time update equations of the equalizer weight coefficients are given by [9] h xx,
where x = x_I + jx_Q and y = y_I + jy_Q are the X and Y polarization input signals of the equalizer, x_eq = x_eq,I + jx_eq,Q and y_eq = y_eq,I + jy_eq,Q are the equalizer outputs in the corresponding polarizations, τ_d is the tap delay, and 0 ≤ k ≤ L, where L is the total number of delay cells in each transversal filter. The subscripts I and Q represent the in-phase and quadrature-phase components of the signals, respectively.
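As a rough illustration of the butterfly structure and of CMA adaptation, the following discrete-time NumPy sketch forms the two outputs from four complex tap vectors and adjusts each tap with the textbook CMA gradient step. It is not the circuit described in this paper and is not a transcription of the continuous-time update equations: the function name `cma_butterfly`, the step size `mu`, the unit target modulus `radius`, the pass-through initialization, and the per-sample accumulation that stands in for the analog integrators are all assumptions made for illustration.

```python
import numpy as np

def cma_butterfly(x, y, n_taps=2, mu=5e-3, radius=1.0):
    """Discrete-time sketch of a 2x2 complex (4x4 real) CMA butterfly equalizer.

    x, y   : complex input samples for the X and Y polarizations.
    n_taps : taps per transversal filter (hypothetical default of 2).
    mu     : adaptation step size (assumed value, plays the role of integrator gain).
    radius : target squared modulus of the equalized signal (assumed 1).
    """
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    # Assumed initialization: direct paths pass the input, cross paths are zero.
    h_xx = np.zeros(n_taps, dtype=complex); h_xx[0] = 1.0
    h_yy = np.zeros(n_taps, dtype=complex); h_yy[0] = 1.0
    h_xy = np.zeros(n_taps, dtype=complex)
    h_yx = np.zeros(n_taps, dtype=complex)

    x_eq = np.zeros(len(x), dtype=complex)
    y_eq = np.zeros(len(y), dtype=complex)
    delays = np.arange(n_taps)
    for n in range(n_taps - 1, len(x)):
        # Vectors of delayed inputs: element k is the input delayed by k taps.
        xv = x[n - delays]
        yv = y[n - delays]
        # Butterfly outputs (plain inner products, no conjugation).
        x_eq[n] = h_xx @ xv + h_xy @ yv
        y_eq[n] = h_yx @ xv + h_yy @ yv
        # CMA errors: deviation of the output power from the target modulus.
        e_x = radius - abs(x_eq[n]) ** 2
        e_y = radius - abs(y_eq[n]) ** 2
        # Gradient steps; the running sum approximates an integrator.
        h_xx += mu * e_x * x_eq[n] * np.conj(xv)
        h_xy += mu * e_x * x_eq[n] * np.conj(yv)
        h_yx += mu * e_y * y_eq[n] * np.conj(xv)
        h_yy += mu * e_y * y_eq[n] * np.conj(yv)
    return x_eq, y_eq, (h_xx, h_xy, h_yx, h_yy)
```

Because the error depends only on the output modulus, a sketch of this kind is blind to the absolute carrier phase, which is why a separate carrier phase recovery stage follows the equalizer.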
III. IMPLEMENTATION DETAILS
A fractionally spaced two tap continuous-time CMA equalizer for 100-Gb/s DP-QPSK links is designed and fabricated using 130 nm BiCMOS technology from ST Microelectronics as a proof of concept. Fig. 2(b) shows a detailed block diagram of the prototype equalizer, which consists of linear transversal filters, error generators, and weight update modules. The figure is a block level translation of the system described by Eqns. (2)-(6) with a few minor modifications to take care of the circuit level issues discussed in the following sub-section.
A. Issues of Delay Mismatch and Signal Swing
Path delay is an inherent problem associated with any circuit which gets worse as the number of circuit blocks in the path is increased. This problem becomes crucial if the circuit has multiple parallel paths. Signal delays through the parallel paths are to be made equal over the desired frequency range. In the CMA equalizer, it is evident from Eqns. (3)-(6) that there are parallel paths, signals through which get multiplied later on. In such cases, delay cells are inserted in the paths that have lower group delays. In Fig. 2 (b) delay cells τ c are inserted to compensate for the delay mismatch in the forward paths and τ e are inserted to compensate for the delay mismatch in the error generators. Eqns. (3)-(6) assume a normalized signal magnitude which is taken as 100 mV single ended in the circuit. Gains of all the building blocks are scaled according to this signal level. In Fig. 2 (b) the expected amplitude is represented by A, which is 100 mV for a single-ended signal. Taking these into consideration, Eqn. (3) can be modified as
Eqns. (4)-(6) can also be modified correspondingly. It can be summarized from Fig. 2(b) that the main operations in the equalizer are delay, multiplication, addition, and integration. The following sub-section briefly describes the design details of the basic building blocks of the equalizer.
B. Basic Building Blocks of the Equalizer
1) Delay Cell: An active delay cell is chosen over a passive delay cell to save chip area and to ease routing. The delay cell is implemented by cascading several common emitter (CE) stages followed by a common collector (CC) buffer to drive a large capacitive load. A circuit schematic of the delay cell is shown in Fig. 3. The group delay of the cell, τ_d, with five cascaded CE stages is shown in Fig. 4. The delay cell, with an area of ∼95 µm×50 µm, has a DC gain of −1.4 dB and a bandwidth of 20.7 GHz. It provides a group delay of 24.3 ps with a 10% delay-bandwidth of 22.1 GHz. The delay cells τ_c and τ_e are implemented similarly by varying the number of CE stages as per the delay needed.
Fig. 6. DC transfer characteristics of the multiplier. Within the signal swing range of 200 mV the multiplier shows a maximum of 10% error at the output.
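To see why a 24.3 ps tap makes the equalizer fractionally spaced, the short sketch below compares the tap delay with the symbol period. It assumes only that DP-QPSK carries 4 bits per symbol (2 bits on each of 2 polarizations); the helper name `symbol_period_ps` is hypothetical.

```python
def symbol_period_ps(bit_rate_gbps, bits_per_symbol=4):
    """Symbol period in ps; DP-QPSK carries 2 bits on each of 2 polarizations."""
    baud_gbd = bit_rate_gbps / bits_per_symbol  # symbol rate in GBaud
    return 1e3 / baud_gbd                       # 1 / GBaud expressed in ps

TAP_DELAY_PS = 24.3  # group delay of the delay cell quoted in the text

for rate_gbps in (100, 40):
    period = symbol_period_ps(rate_gbps)
    ratio = TAP_DELAY_PS / period
    print(f"{rate_gbps} Gb/s: T = {period:.1f} ps, tau_d / T = {ratio:.2f}")
```

At 100 Gb/s the symbol period is 40 ps, so the tap spacing is roughly 0.61 of a symbol; at 40 Gb/s the period is 100 ps and the same tap spans roughly 0.24 of a symbol.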
2) Multiplier: The multiplication operation is realized using the Gilbert cell topology [23]. Fig. 5 shows a circuit schematic of the multiplier. The circuit has degeneration resistances to improve the linearity. The multiplier has a DC gain of 7.4 dB and a bandwidth of 20.2 GHz from I1 when I2 is 200 mV, and a DC gain of 6.02 dB and a bandwidth of 17.9 GHz from I2 when I1 is 200 mV. Due to the lower bandwidth of I2, low-speed signals such as filter weight coefficients are applied at this input. The DC transfer characteristics of the multiplier are shown in Fig. 6. It can be observed that the maximum deviation from linearity is only 10% over the signal swing of concern. A single multiplier occupies an area of ∼40 µm×45 µm. A complex multiplier is implemented by connecting four such Gilbert cells, and a squaring circuit is implemented by applying the same signal to both inputs.
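The composition of a complex multiplier from four real multipliers, and of a squarer from a multiplier with tied inputs, can be sketched behaviorally as follows. The function names are hypothetical, and `real_mult` is an idealized stand-in (unity gain, perfectly linear) rather than a model of the Gilbert cell itself.

```python
def real_mult(a, b):
    """Idealized stand-in for one four-quadrant multiplier (unity gain, linear)."""
    return a * b

def complex_mult(a_i, a_q, b_i, b_q):
    """(a_i + j*a_q) * (b_i + j*b_q) built from four real multipliers and two adders."""
    out_i = real_mult(a_i, b_i) - real_mult(a_q, b_q)
    out_q = real_mult(a_i, b_q) + real_mult(a_q, b_i)
    return out_i, out_q

def modulus_squared(a_i, a_q):
    """|a|^2 from two squarers, each a multiplier with both inputs tied together."""
    return real_mult(a_i, a_i) + real_mult(a_q, a_q)
```

For example, `complex_mult(1, 2, 3, 4)` corresponds to (1 + 2j)(3 + 4j), and `modulus_squared` gives the quantity that a CMA error generator compares against the target modulus.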
3) Adder: The addition operation is performed in the current domain by adding two current signals onto a common resistor. This technique is used in blocks such as the complex multiplier, where routing is minimal. When an additional gain is required, two CE sections are used with common load resistors, as shown in Fig. 7. The adder has a DC gain of 12.3 dB and a bandwidth of 18.4 GHz. For the convenient placement of the adder circuit in the equalizer layout, the adder is designed as half circuits, each of which has an area of ∼20 µm×40 µm.
4) Integrator: The settling behavior and steady-state error performance of the equalizer are primarily decided by the DC gain and cut-off frequency of the integrator, which makes it a critical building block. A folded BiCMOS amplifier topology with a large capacitive load is chosen to get a very high DC gain without compromising the pole locations. A circuit schematic of the integrator is shown in Fig. 8. The high gain amplifier of the integrator is a modified version of the basic folded cascode amplifier discussed in [24]-[26], and the G_m stage of the amplifier is designed with bipolar transistors as suggested in [27]. To obtain a very low-frequency pole without affecting the high gain, the output resistance of the circuit has to be maximized. Hence, the cascode section of the amplifier is biased at a low current. The amplifier gain is given by [24]
the output resistance is given by
Fig. 10. Schematic of the output buffer. The buffer occupies an area of ∼44 µm×46 µm and has a bandwidth of 37.8 GHz.
Fig. 11. Output resistance of the buffer. The differential output resistance is ∼100 Ω in the band of concern.
and the cut-off frequency is given by
The integrator shown in Fig. 8 has a common-mode feedback structure that comprises transistors N_5-N_9 and the dummy transistors Q_P2 and Q_N2, the detailed design of which can be found in [24]. The integrator has an area of ∼120 µm×160 µm and a DC gain of 103.9 dB with a bandwidth of 56.2 Hz.
5) Reset Circuit: To initialize the weight coefficients, the reset circuit shown in Fig. 9 is used. An enable signal, EN_1, is applied at the startup of the equalizer operation, which turns on switches N_3 and N_4. The enable signal EN_2 is a delayed version of EN_1 to avoid any unwanted discharge of the integrator capacitor. Different combinations of R_1 to R_4 are chosen so as to initialize the weight coefficients at the integrator output nodes. When a global reset signal is asserted the equalizer tap coefficients are reset to the values
6) Output Buffer: Output nodes of the equalizer are designed to drive high-speed transmission lines. To match the impedance at the output nodes, CE buffers are used, a schematic of which is shown in Fig. 10. The parallel combination 2R_C||R_L helps to match the impedance without decreasing the value of R_C. This configuration also helps to achieve different AC and DC output resistances while maintaining the transistors in the active region of operation. The buffer has a DC gain of 4.9 dB and a bandwidth of 37.8 GHz. The output of the buffer, which occupies an area of ∼44 µm×46 µm, is matched, as shown in Fig. 11, with a differential output resistance of 99.7 Ω.
7) Other Building Blocks: Level shifters are used to shift common-modes up or down at various nodes, and AC coupling is used wherever this level shifting is not possible. All bias currents are mirrored from a single bias current which is supplied from outside. To match the impedance of high-speed inputs, the circuit shown in Fig. 12 is used. This configuration provides an input differential matching of 100 Ω and the desired common mode to the input signals. There is also an electrostatic discharge (ESD) protection circuit made up of ESD diodes. Simulations show an S_11 that is better than −10 dB in the frequency band of concern, as shown in Fig. 13.
Fig. 13. Input reflection coefficient of the matching circuit. S_11 is lower than −10 dB in the band of concern.
Fig. 14. Micrograph of the equalizer IC. The block labels correspond to the explanation given in Fig. 2(b). Die dimensions marked in the figure: 1407 µm × 1351 µm.
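The matching figures quoted for the buffer and the input network can be related through the reflection coefficient. The sketch below assumes a purely resistive termination against a 100 Ω differential reference, so it ignores the frequency-dependent parasitics that set the real S_11 curve; the function name `s11_db` is hypothetical.

```python
import math

def s11_db(z_ohm, z0_ohm=100.0):
    """|S11| in dB for a purely resistive termination z_ohm against reference z0_ohm."""
    gamma = abs((z_ohm - z0_ohm) / (z_ohm + z0_ohm))  # reflection coefficient magnitude
    if gamma == 0:
        return -math.inf  # perfect match reflects nothing
    return 20.0 * math.log10(gamma)

# A 99.7 ohm differential resistance is an excellent low-frequency match.
print(f"99.7 ohm: {s11_db(99.7):.0f} dB")
# The -10 dB bound tolerates roughly 52 to 192 ohm of purely resistive mismatch.
print(f"52 ohm: {s11_db(52.0):.1f} dB, 192 ohm: {s11_db(192.0):.1f} dB")
```

Under these assumptions, the 99.7 Ω output resistance corresponds to a reflection of roughly −56 dB at low frequency, far below the −10 dB bound used as the matching criterion.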
C. The Equalizer IC
A micrograph of the prototype equalizer is shown in Fig. 14, in which all the sub-blocks are marked corresponding to Fig. 2(b). The IC has 50 pads, of which 28 are meant for high-speed differential signals and are arranged in a ground-signal-signal-ground-signal-signal-ground (GSSGSSG) pattern. The rest of the pads are used for the reset signal, bias current, amplitude control signals, supply voltage, and ground. The equalizer occupies a chip area of ∼1.4 mm×1.35 mm and draws a current of 1 A from a 2.5 V supply.
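The supply figures translate into power and energy per bit by simple arithmetic, sketched below. These are derived numbers for the equalizer IC alone, computed from the 1 A and 2.5 V quoted above; they are not measured results reported by this paper, and the function name is hypothetical.

```python
def energy_per_bit_pj(current_a, supply_v, bit_rate_gbps):
    """Energy per bit in pJ from DC supply current, supply voltage, and line rate."""
    power_w = current_a * supply_v                  # DC power drawn from the supply
    return power_w / (bit_rate_gbps * 1e9) * 1e12   # J/bit converted to pJ/bit

for rate_gbps in (100, 40):
    print(f"{rate_gbps} Gb/s: {energy_per_bit_pj(1.0, 2.5, rate_gbps):.1f} pJ/bit")
```

With 2.5 W of DC power, this gives about 25 pJ/bit at the designed 100-Gb/s rate and about 62.5 pJ/bit at the 40-Gb/s rate used in the measurements.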
IV. RESULTS AND DISCUSSION
The equalizer is validated experimentally at 40-Gb/s data rate and by post-layout simulations at 100-Gb/s data rate, details of which are given in the following sub-sections.
A. Measurement Results at 40-Gb/s Data Rate
Fig. 15 shows a block diagram of the experimental setup with a 40-Gb/s DP-QPSK system. An external cavity laser of 1550 nm wavelength is used as the carrier source at the transmitter. A 50:50 power splitter (PS) divides the laser output into two parts, one of which gets modulated at the transmitter while the other is used as a local oscillator (LO) at the receiver. Two independent 10-Gb/s data streams are generated using a 10-Gb/s pseudo-random binary sequence generator which is clocked by a 10 GHz source. The two data streams (I and Q) are amplified to drive the nested Mach-Zehnder modulator (MZM) shown in the figure. The MZM gives out a 20-Gb/s QPSK modulated carrier which is split into X and Y polarizations using a polarization beam splitter (PBS). The X output of the PBS is directly connected to the X input of a polarization beam combiner (PBC), whereas the Y output is delayed using a 2 m-long polarization maintaining fiber before connecting to the Y input of the PBC. The optical delay is used to de-correlate the X and Y QPSK signals. The MZM-PBS-optical delay-PBC combination emulates a DP-QPSK modulator which gives out a 40-Gb/s DP-QPSK modulated carrier to the channel. At the receiver side, the signal from the channel, S, and the LO are fed to the inputs of an integrated coherent optical receiver front-end. The required power level of the LO is maintained using a variable optical attenuator (VOA), which is connected just before the receiver front-end. The receiver front-end consists of PBSs, optical hybrids, balanced photodiodes, and trans-impedance amplifiers in the mentioned order. This module maintains a signal level of 400 mVpp on all four inputs (x_I, x_Q, y_I, and y_Q) of the equalizer IC, which is wirebonded to a printed circuit board (PCB). The PCB shown in Fig. 16 is designed using RT/duroid 6010LM laminate.
It has four pairs of input transmission lines and four pairs of output transmission lines which end on SMA connectors mounted along the periphery of the PCB. The outputs of the equalizer are captured using a 21 GHz real-time oscilloscope and processed further using a behavioral model of a CPRC. Fig. 17 depicts the block diagram of the behavioral CPRC that consists of a single sideband mixer, phase detector, loop filter, and quadrature-phase voltage controlled oscillator. Further details of the CPRC can be found in [9]. The EVM of the output signal is calculated by sampling the CPRC outputs where the eye-opening is maximum. The EVM of a signal in the I-Q space is calculated from I_k and Q_k, the I and Q components of the k-th received symbol, Î_k and Q̂_k, the expected I and Q components of the k-th received symbol, and N, the total number of received symbols.
Performance of the equalizer is characterized using the 40-Gb/s system for different link distances. Fig. 18 shows the eye-diagrams obtained at various stages of the experimental setup with a back-to-back optical link. The eye-diagram shown in Fig. 18(a) is that of the transmitted signal with 24% EVM. Fig. 18(b) shows the received signal eye-diagram, which is distorted due to the system non-idealities. Fig. 18(c) shows the equalizer output eye-diagram and Fig. 18(d) shows the CPRC output eye-diagram with 28% EVM.
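One common way to compute an EVM of this kind from sampled symbols is sketched below. It uses the RMS error normalized to the RMS power of the expected symbols, which is an assumption: the exact normalization used for the figures in this paper is not recoverable from the text. The function names and the decision-directed choice of the expected symbols are also hypothetical.

```python
import numpy as np

def evm_percent(rx, ref):
    """RMS EVM in percent: error power relative to the expected-symbol power."""
    rx = np.asarray(rx, dtype=complex)    # received symbols I_k + j*Q_k
    ref = np.asarray(ref, dtype=complex)  # expected symbols for the same indices
    err_power = np.mean(np.abs(rx - ref) ** 2)
    ref_power = np.mean(np.abs(ref) ** 2)
    return 100.0 * np.sqrt(err_power / ref_power)

def nearest_qpsk(rx):
    """Decision-directed reference: nearest unit-power QPSK point for each symbol."""
    rx = np.asarray(rx, dtype=complex)
    return (np.sign(rx.real) + 1j * np.sign(rx.imag)) / np.sqrt(2)
```

A typical call would be `evm_percent(samples, nearest_qpsk(samples))`, after scaling `samples` to unit average power and sampling at the maximum eye opening.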
X polarization constellations at various stages of the system are shown in Fig. 19, with results of a back-to-back link in the top row, a 5 km link in the middle row, and a 10 km link in the bottom row. Since the same laser is used at the transmitter and receiver, there is only a minimal phase offset between the received signal and the LO at the receiver in the back-to-back link. Under such conditions, the equalizer can recover four constellation points, but with a slight rotation due to the phase insensitivity of the CMA. However, in the 5 km and 10 km links, a large optical path difference between the signal and the LO results in a frequency offset, which is evident from the equalized constellation diagrams that appear as rings on the I-Q plane. Post-processing carried out with the behavioral CPRC results in the constellations shown in Fig. 19(c), with an EVM of 28% in the back-to-back link, 32% in the 5 km link, and 33% in the 10 km link. Similar constellations are obtained for the Y polarization also. Preliminary results of the equalizer at a lower data rate were presented earlier in [21]. Also, results of an analog domain Costas loop based carrier phase recovery and compensation chip were demonstrated in [12].
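The ring-shaped constellations and their recovery can be illustrated with a behavioral, one-sample-per-symbol QPSK Costas loop. This sketch is not the CPRC of [9] or of Fig. 17: it is a generic second-order discrete-time loop, and the gains `kp` and `ki`, the sign-based phase detector, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def costas_qpsk(samples, kp=0.05, ki=0.001):
    """Behavioral QPSK Costas loop (proportional-plus-integral loop filter)."""
    samples = np.asarray(samples, dtype=complex)
    out = np.empty(len(samples), dtype=complex)
    phase = 0.0  # oscillator phase estimate in rad
    freq = 0.0   # integrator state of the loop filter in rad/sample
    for n, s in enumerate(samples):
        z = s * np.exp(-1j * phase)  # mixer: de-rotate by the current estimate
        out[n] = z
        # QPSK phase detector: zero at the four ideal points, and approximately
        # proportional to the residual phase error for small errors.
        err = np.sign(z.real) * z.imag - np.sign(z.imag) * z.real
        freq += ki * err             # integral path tracks the frequency offset
        phase += freq + kp * err     # oscillator update
    return out

# A frequency offset leaves every sample on the unit circle, so a modulus-based
# error sees nothing wrong even though the constellation smears into a ring.
rng = np.random.default_rng(1)
symbols = (rng.choice([-1, 1], 5000) + 1j * rng.choice([-1, 1], 5000)) / np.sqrt(2)
rotating = symbols * np.exp(1j * (0.01 * np.arange(5000) + 0.2))
recovered = costas_qpsk(rotating)
```

In this sketch `rotating` has constant modulus but a continuously rotating phase, while the tail of `recovered` settles onto four points, up to the usual 90° ambiguity of a QPSK Costas loop.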
B. Post Layout Simulation Results at 100-Gb/s Data Rate
Due to the non-availability of a cost-effective packaging solution for high-speed ICs with a large pin count, experimental validation of the equalizer at the designed data rate is not performed. Also, the small pad pitch of the IC limited the possibilities of an optimal direct die attach and of well-matched on-PCB transmission lines. In addition, bond-wire inductance, transmission lines, and SMA connectors limit the available bandwidth. Hence, the functionality of the equalizer at 100-Gb/s is verified through post-layout simulations. A 100-Gb/s transmission system is modeled in VPItransmissionMaker, from which the received signals are exported for circuit simulation after the O/E conversion. Fig. 20 shows the results of a post-layout simulation carried out in an all-typical corner with the data from a simulation model of an optical link with a 5 km SMF channel. Fig. 20(a) shows the eye-diagram of the transmitted data with an EVM of 1%, which becomes distorted when it reaches the receiver side, as shown in Fig. 20(b). Fig. 20(c) is the eye-diagram of the equalizer output and Fig. 20(d) is the output of a behavioral CPRC with an EVM of 27.8%. A detailed performance analysis of the equalizer has been reported earlier in [9] using pre-layout circuit simulations.
V. CONCLUSION
The detailed architecture of a continuous-time adaptive equalizer that uses the CMA for weight coefficient update is presented. A two tap prototype of the equalizer, which is meant for 100-Gb/s DP-QPSK coherent optical receivers, is implemented in a 130 nm BiCMOS technology. Experimental validation carried out with a 40-Gb/s back-to-back link shows an EVM of 28%, which is close to the transmitter EVM of 24%. Performance of the equalizer at the 100-Gb/s data rate is verified through post-layout circuit simulations considering all the parasitics. The equalizer architecture can be extended to coherent optical communication systems that use higher order modulation formats to increase the data rate. Even though the results shown in this paper correspond to the 1550 nm wavelength, the equalizer can be used with the 1310 nm wavelength as well, thereby increasing the overall channel length. The promising results presented make the equalizer an ideal choice for short-reach optical links such as data center interconnects.
