# A Circuit-Level Implementation of Voltage-Tuning Scheme for Realizing Optical PAM-4 Using Three-Segment Microring Modulator 

Rui Wang*<br>Department of Computer and Electrical Engineering, University of Idaho, Moscow, USA<br>Received 18 December 2019; received in revised form 08 January 2020; accepted 24 March 2020

DOI: https://doi.org/10.46604/ijeti.2020.5072


#### Abstract

Silicon photonics is one of the solutions for meeting ever-increasing data bandwidth demand, which has become more challenging with recent technologies such as the Internet of Things (IoT). Higher-order pulse amplitude modulation (PAM) schemes are one way to push towards higher data rates in the presence of bandwidth-limited optical devices. In this paper, we implement a circuit-level PAM-4 transmitter design based on the voltage-tuning scheme for realizing optical PAM-4 using a three-segment microring modulator. Simulation results based on the extracted layout, using TSMC 65 nm LP technology and IMEC-ePIXfab SiPhotonics ISIPP50G technology, show that the proposed circuit-level transmitter structure achieves a PAM-4 data rate of 25 Gb/s with an extinction ratio of 9 dB and a PAM-4 energy efficiency of $0.5 \mathrm{pJ} / \mathrm{bit}$. The results also verify that the scheme achieves high tuning flexibility, at the cost of higher power consumption in the proposed transmitter.


Keywords: interconnect, optical transmitter, photonics IC, PAM-4 transmitters

## 1. Introduction

The Internet of Things (IoT) is gradually being realized as a growing number of physical objects are connected to the Internet at an unprecedented rate. IoT enables physical objects to see, hear, think, and perform jobs by having them communicate with each other so as to share information and coordinate decisions [1]. To answer the challenge of the continuing bandwidth growth that comes with IoT, IEEE has defined 50, 100, 200, and 400-Gb/s links in the IEEE 802.3bs and 802.3cd standards for data center connectivity [2]. Silicon photonics platforms have recently been demonstrated for 400G data center applications [3]. Therefore, silicon photonics can be considered one of the solutions for satisfying the IEEE standards mentioned above.

Silicon photonics was first proposed in the 1980s by Soref [4]. After the rapid breakthroughs of the last decade, in which the same foundries that build transistors became capable of building chips that manipulate light, it has become one of the most suitable platforms for integrated optics owing to its low power consumption, low cost, small footprint, and, most importantly, complementary metal oxide semiconductor (CMOS) compatibility [5]. As one of the key components for realizing high-speed data transmission, a high-performance silicon-based optical modulator is required to convert electrical data into optical power. Currently, silicon-based microring modulators (MRMs) [6] are among the most researched optical modulators due to their small size, mostly capacitive loading, and consequently lower power consumption in the drivers, which provide a promising energy-efficient solution to push data transmission toward 400G. Due to the bandwidth limitation of the optical components in optical transceivers [7], pulse amplitude modulation (PAM)-4 is presented as one of the solutions to double the data rate without requiring a larger modulator bandwidth. As one example, all of the proposed long-reach 400G IEEE standards (400G-DR4, 400G-FR8, and 400G-LR8) are based on the PAM-4 scheme [8].

As one of the schemes for realizing optical PAM-4 output, using tunable voltage-mode drivers to control a three-segment microring modulator can achieve an equally spaced PAM-4 optical eye with high linearity even when phase-shifter length variation in the ring modulator is present [9]. However, only the voltage-mode drivers in that reference are implemented at the circuit level; the other parts are all implemented with ideal circuits, which does not demonstrate that the scheme can be realized at the circuit level.

This paper demonstrates a circuit-level transmitter implementation of the voltage-tuning scheme on the basis of a $5 \mu \mathrm{~m}$ radius three-segment microring modulator (MRM) as shown in Fig. 1. The transmitter structure is realized in TSMC 65 nm LP technology, while the modeling of the three-segment microring modulator is based on IMEC-ePIXfab SiPhotonics ISIPP50G technology. The simulation results based on the extracted layout show that the voltage-tuning scheme can be implemented at the circuit level, and that the implemented transmitter design achieves high energy efficiency in terms of modulator and driver. Also, the transmitter structure can accommodate $\pm 10 \%$ phase-shifter length variation inside the three-segment MRM while maintaining a high level separation mismatch ratio (RLM).


Fig. 1 Simplified systematic block diagram

The paper is organized as follows. Section 2 gives an overview of the voltage-tuning scheme for realizing optical PAM-4 using the three-segment MRM. The proposed circuit-level PAM-4 transmitter structure based on the voltage-tuning scheme is presented in Section 3. The detailed circuit implementation of each block in the transmitter structure is described in Section 4. The simulation results derived from the extracted layout of the transmitter are presented in Section 5. Section 6 concludes the paper.

## 2. Overview of the Voltage-Tuning Scheme for Realizing Optical PAM-4 using Three-Segment Microring Modulator

The goal of the voltage-tuning scheme is to realize four equally spaced optical power levels, which is equivalent to four equally spaced transmission intensities at the resonant wavelength. We denote the transmission intensity of the three-segment MRM as $|T|^{2}\left(V_{1}, V_{2}, V_{3}, \lambda\right)$, where $V_{1,2,3}$ are the reverse-bias voltages applied to the three phase-shifters inside the microring. If the three reverse-bias voltages satisfy $V_{1,2,3} \in\left[V_{\min }, V_{\max }\right]$, the top and bottom levels of the transmission intensity at the resonant wavelength $\lambda_{0}$ can be expressed as:

$$
\begin{equation*}
\left|T_{11}\right|^{2}=|T|^{2}\left(V_{\max }, V_{\max }, V_{\max }, \lambda_{0}\right) \tag{1}
\end{equation*}
$$

$$
\begin{equation*}
\left|T_{00}\right|^{2}=|T|^{2}\left(V_{\min }, V_{\min }, V_{\min }, \lambda_{0}\right) \tag{2}
\end{equation*}
$$

Thus, the transmission intensity for the two middle levels can be obtained by:

$$
\begin{align*}
& \left|T_{10}\right|^{2}=\left(\left|T_{11}\right|^{2}-\left|T_{00}\right|^{2}\right) \times \frac{2}{3}  \tag{3}\\
& \left|T_{01}\right|^{2}=\left(\left|T_{11}\right|^{2}-\left|T_{00}\right|^{2}\right) \times \frac{1}{3} \tag{4}
\end{align*}
$$
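As a small numerical illustration of these level targets, the sketch below places the two middle levels one third and two thirds of the way between the bottom and top levels. It measures the middle levels from $\left|T_{00}\right|^{2}$ so that all four levels are equally spaced, which is an assumed reading of Eqs. (3)-(4); the function name and the example intensities are hypothetical and are not values from the simulation.

```python
def pam4_target_levels(t00: float, t11: float) -> list[float]:
    """Return four equally spaced transmission intensities from t00 up to t11."""
    span = t11 - t00
    # Levels sit at 0, 1/3, 2/3 and 3/3 of the span above the bottom level.
    return [t00 + span * k / 3 for k in range(4)]


# Example with made-up intensities (not simulation data).
levels = pam4_target_levels(0.1, 0.7)
spacings = [upper - lower for lower, upper in zip(levels, levels[1:])]
```

With these example inputs the three spacings are identical, which is the property the fine-tuning step aims to reach.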

Instead of taking only two values, $V_{1,2,3}$ are discretized more finely and are denoted as $V_{1}\left(x_{i}, a_{i, 3: 0}, b_{i, 3: 0}\right)$, $V_{2}\left(x_{j}, a_{j, 3: 0}, b_{j, 3: 0}\right)$, and $V_{3}\left(x_{k}, a_{k, 3: 0}, b_{k, 3: 0}\right)$ respectively. Each bit of $x_{q}$, $a_{q, 3: 0}$, and $b_{q, 3: 0}$, where $q=i, j, k$, takes the binary level "1" or "0". As a result, the scheme can be divided into two parts: coarse tuning and fine tuning.

### 2.1. Coarse tuning

If $a_{q, 3: 0}=$ "0000" and $b_{q, 3: 0}=$ "0000", $V_{p}$, where $p=1,2,3$, is given by Eq. (5). As a result, there are only eight different combinations for $\left(V_{1}, V_{2}, V_{3}\right)$, which is why we call this scenario coarse tuning.

$$
\begin{equation*}
V_{p}=\begin{cases}V_{\max } & \text { when } x_{q}=\text{"1"} \\ V_{\min } & \text { when } x_{q}=\text{"0"}\end{cases} \tag{5}
\end{equation*}
$$

According to Eqs. (1)-(2), $\left|T_{11}\right|^{2}$ has to use $\left(V_{1}, V_{2}, V_{3}\right)=\left(V_{\max }, V_{\max }, V_{\max }\right)$, corresponding to $x_{i} x_{j} x_{k}=$ "111", while $\left|T_{00}\right|^{2}$ has to use $\left(V_{1}, V_{2}, V_{3}\right)=\left(V_{\min }, V_{\min }, V_{\min }\right)$, corresponding to $x_{i} x_{j} x_{k}=$ "000". The other two optical power levels, $\left|T_{10}\right|^{2}$ and $\left|T_{01}\right|^{2}$, can be chosen from the six remaining voltage combinations, corresponding to $x_{i} x_{j} x_{k}=$ "110", "101", "100", "011", "010", and "001". Mathematically, this procedure is equivalent to a 2-to-3 bit mapping, as shown in Fig. 2(a), where $B_{1} B_{0}$ from $\left|T_{B_{1} B_{0}}\right|^{2}$ is mapped to $x_{i} x_{j} x_{k}$ from $\left|T_{x_{i} x_{j} x_{k}}\right|^{2}$.

If $B_{1} B_{0}$ are two uncorrelated NRZ bit streams, this mapping can be realized through three 4-to-1 selectors whose inputs are static binary codes "1" or "0"; the four inputs of each selector correspond to one column of the mapped result, $x_{i}$, $x_{j}$, or $x_{k}$.
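The selector-based mapping can be sketched behaviorally as follows. This is only an illustration: the codes for "11" and "00" follow Eqs. (1)-(2), but the two middle rows of `COARSE_MAP` are hypothetical choices among the six remaining codes, and all function names are made up for this example.

```python
# Hypothetical 2-to-3 bit map: keys are (B1, B0), values are (x_i, x_j, x_k).
# The "11" and "00" rows are fixed by Eqs. (1)-(2); the two middle rows are
# assumed here and would in practice be picked from the six remaining codes.
COARSE_MAP = {
    (0, 0): (0, 0, 0),
    (0, 1): (0, 0, 1),  # assumed choice for |T01|^2
    (1, 0): (1, 1, 0),  # assumed choice for |T10|^2
    (1, 1): (1, 1, 1),
}


def selector_4to1(static_inputs, b1, b0):
    """4-to-1 selector: pick one of four static inputs using the pair (b1, b0)."""
    return static_inputs[2 * b1 + b0]


def coarse_tune(b1, b0):
    """Return (x_i, x_j, x_k) using one 4-to-1 selector per output column."""
    outputs = []
    for column in range(3):
        # The four static inputs of this selector are one column of the map,
        # ordered by select code 00, 01, 10, 11.
        static_inputs = [COARSE_MAP[(s1, s0)][column] for s1 in (0, 1) for s0 in (0, 1)]
        outputs.append(selector_4to1(static_inputs, b1, b0))
    return tuple(outputs)
```

Changing the coarse setting for a middle level then only requires changing the static inputs of the selectors, not the selector structure.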


(a) Block diagram for realizing coarse tuning
(b) Precise tuning disable signal

Fig. 2 Voltage-tuning scheme related block diagrams

### 2.2. Fine tuning

After $x_{q}$, where $q=i, j, k$, has been chosen for both $\left|T_{10}\right|^{2}$ and $\left|T_{01}\right|^{2}$ through coarse tuning, $V_{p}$, where $p=1,2,3$, can take 16 possible values when $x_{q}=$ "1" and 16 possible values when $x_{q}=$ "0". As a result, there are 4096 ($16 \times 16 \times 16$) possible optical power levels in total, and thus we name this scenario fine tuning.

Unlike $x_{q}$, which is mapped from the two NRZ streams $B_{1} B_{0}$, $a_{q, 3: 0}$ and $b_{q, 3: 0}$ are two constant 4-bit binary vectors. However, as long as $a_{q, 3: 0}$ and $b_{q, 3: 0}$ are not "0000", $\left|T_{11}\right|^{2}$ and $\left|T_{00}\right|^{2}$ will be affected, so the extinction ratio will be reduced. Therefore, we need to generate another signal, $VT\_EN$, to disable fine tuning when $B_{1} B_{0}$ equals either "11" or "00", which can be realized through the XOR gate as shown in Fig. 2(b).
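A minimal behavioral sketch of the two points above, assuming one 4-bit code is active per driver for a given $x_{q}$: it counts the fine-tuning settings and tabulates the XOR-based enable signal. The names `vt_en`, `fine_settings`, and `enable_table` are hypothetical.

```python
from itertools import product


def vt_en(b1: int, b0: int) -> int:
    """Fine-tuning enable: 1 only for the middle symbols "10" and "01"."""
    return b1 ^ b0


# One 4-bit code per driver gives 16 values; three drivers are set independently.
codes_per_driver = 2 ** 4
fine_settings = len(list(product(range(codes_per_driver), repeat=3)))

# Truth table of the enable signal for every (B1, B0) pair.
enable_table = {(b1, b0): vt_en(b1, b0) for b1 in (0, 1) for b0 in (0, 1)}
```

The table shows fine tuning disabled for "11" and "00", so the top and bottom levels, and hence the extinction ratio, are left untouched.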

## 3. Transmitter Structure

Based on the voltage-tuning scheme discussed in the previous section, a transmitter block diagram can be implemented as shown in Fig. 3.


Fig. 3 Transmitter block diagram

In the transmitter block diagram, there is a path [q], where $q=1,2,3$, generating the voltage $V_{q}$ that drives one of the three phase-shifters of the three-segment MRM. Each path contains one voltage-mode driver, two inverter-based predrivers, two CML-to-CMOS converters, one CML 4-to-1 selector, and one CML D-type flip-flop.

The driver has CMOS logic inputs $x_{q u}$ and $x_{q d}$ along with two 4-bit DC inputs $a_{q, 3: 0}$ and $b_{q, 3: 0}$. On the other hand, the two uncorrelated NRZ streams $B_{1} B_{0}$, which act as the selecting signals of the 4-to-1 selectors in order to realize coarse tuning, come from an external bit-error-rate tester (BERT) and thus have to be CML logic so as to achieve high speed. As a result, the 4-to-1 selector has to use CML logic, and there has to be a CML-to-CMOS converter between the driver and the selector. The driver sets $V_{\max }$ to 2.4 V, $V_{D D I}$ to 1.2 V, and $V_{\min }$ to 0 V, while all the circuits with CML logic, including the CML-to-CMOS converter, use 1.2 V as their $V_{D D}$ and 0 V as their ground.

There are two pairs of binary inputs for the driver. The first pair, $x_{q u}$ and $x_{q d}$, come from the outputs of the CML-to-CMOS converters. Since $x_{q u}$ and $x_{q d}$ drive the pull-up and pull-down branches of the driver respectively, they should share the same waveform but have different power supplies and grounds. $x_{q u}$ has a power supply of 2.4 V and uses 1.2 V as its ground, while $x_{q d}$ has a power supply of 1.2 V and uses 0 V as its ground. Also, an inverter-based predriver has to be inserted after each CML-to-CMOS converter so that there is enough driving capacity for the driver inputs. To drive both the pull-up and pull-down branches of the driver, we use two predrivers: one with 2.4 V as the power supply and 1.2 V as the ground, the other with 1.2 V as the power supply and 0 V as the ground. By putting AC-coupling capacitors before the two predrivers, their respective CML-to-CMOS converters can share the same power supply and ground, 1.2 V and 0 V respectively. However, in order to avoid charge sharing between the two AC-coupling capacitors due to the different power supplies and grounds of the two predrivers, we use one CML-to-CMOS converter for each predriver even though the converters have the same power supply and ground.

The second pair of driver inputs, $VT\_EN_{d}$ and $\overline{VT\_EN_{u}}$, are the signals for disabling fine tuning when the two uncorrelated NRZ streams $B_{1} B_{0}$ are either "11" or "00". Due to the structure of the driver, the two pairs of driver binary inputs have to arrive at the same time. Therefore, the path for creating the fine-tuning disable signals has to share the same block diagram as path [q]; it contains one CML XOR gate, one CML D-type flip-flop, two CML-to-CMOS converters, and four inverter-based predrivers. Although functionally we only need one inverter-based predriver each for $VT\_EN_{d}$ and $\overline{VT\_EN_{u}}$, their outputs have to drive three drivers instead of one, unlike the signal $x_{q u}$. As a result, we add two more predrivers so that there is enough driving capacity for three drivers without introducing extra loading on the signal path.

In the transmitter, there are three paths driving the three phase-shifters and one extra path supplying the signals for disabling fine tuning. By making all paths contain the same blocks and have the same loading, we ensure that there is minimal skew between them. Inserting a CML D-type flip-flop before a CML-to-CMOS converter and after a CML 4-to-1 selector or the CML XOR gate both synchronizes the outputs of those four CML blocks and reduces their output jitter. To realize the synchronization, we have to generate "clean" clock signals $C L K_{q}$, where $q=i, j, k, x$, for the CML D-type flip-flops, which can be achieved through robust design of a clock distribution network.

Just like the clock signals $C L K_{q}$, the two NRZ bit streams $B_{1 q}$ and $B_{0 q}$, where $q=i, j, k, x$, are the input signals for four CML circuits: three CML 4-to-1 selectors and one CML XOR gate. Additionally, their original signals, $C L K$, $B_{1}$, and $B_{0}$, all come externally from the BERT. As a result, we implement two more distribution networks, similar to the clock distribution network mentioned above, so as to generate $B_{1 q}$ and $B_{0 q}$, where $q=i, j, k, x$, respectively.

In our design, we have implemented flip-chip bonding to connect our drivers and three-segment MRM. Therefore, we add a flip-chip ball equivalent circuit at the loading of each driver [10].

## 4. Circuit Implementation

### 4.1. CML 4-to-1 selector and CML XOR gate

Although it is possible to realize a CML 4-to-1 selector using a single CML circuit, deep sub-micron technology does not lend itself to stacking more than four transistors between supply and ground without driving one of them out of the saturation region [11]. Therefore, we implement the CML 4-to-1 selector by combining three CML 2-to-1 selectors as shown in Fig. 4: two 2-to-1 selectors of type selector1 sit at the front, with their inputs provided off-chip by DC supply/ground, and one 2-to-1 selector2 has its inputs fed by the outputs of the two selector1 blocks.
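The two-stage structure can be checked at the logic level with a short sketch. It is purely behavioral (no CML levels or delays), the function names are hypothetical, and it assumes that $B_{0}$ selects in the first stage and $B_{1}$ in the second.

```python
def mux2(in0, in1, sel):
    """2-to-1 selector: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def mux4_from_mux2(d, b1, b0):
    """4-to-1 selector built from three 2-to-1 selectors.

    d holds the four static inputs ordered by select code b1 b0 = 00, 01, 10, 11.
    """
    low = mux2(d[0], d[1], b0)   # first-stage selector, used when b1 = 0
    high = mux2(d[2], d[3], b0)  # first-stage selector, used when b1 = 1
    return mux2(low, high, b1)   # second-stage selector
```

For every input pattern and select code this returns `d[2 * b1 + b0]`, i.e. the same result as a single-stage 4-to-1 selector.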

In our design, we assume 1.2 V as CML logic "1" and 0.8 V as CML logic "0", with a DC operating point of 1 V. Since the outputs of the CML 2-to-1 selector1 control the input transistors $M_{1}$ to $M_{4}$ of the CML 2-to-1 selector2, they should share the same DC operating point of 1 V. Thus, the input transistors $M_{1}$ to $M_{4}$ of the CML 2-to-1 selector1 should also have the same DC operating point. Therefore, the DC inputs of the CML 2-to-1 selector1 have to be converted to CML logic by a buffer before being fed into its input transistors $M_{1}$ to $M_{4}$. The buffer ends up having a $400 \Omega$ loading resistor $R_{\mathrm{D}}$ and a 1 mA tail current $I_{0}$. The loading resistors and tail currents of the Gilbert cells inside both types of 2-to-1 selectors have the same values as those of the buffers. Since the drains of transistors $M_{5}$ and $M_{6}$ are connected to the sources of transistors $M_{1}$ to $M_{4}$, the selecting signals $B_{1 i}$ and $B_{0 i}$ have to be shifted to a lower DC operating point of 0.7 V before being applied to transistors $M_{5}$ and $M_{6}$. This is realized by a level-shifter with loading resistors $R_{\mathrm{D} 1}$ of $200 \Omega$ and $R_{\mathrm{D} 2}$ of $350 \Omega$ and a tail current $I_{1}$ of 1 mA. Due to the limited voltage headroom, transistors $M_{1}$ to $M_{8}$ are all low-threshold-voltage (lvt) transistors.
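The buffer values can be cross-checked with a first-order calculation, sketched below under the assumption that the tail current is fully steered into one branch at a time, so that the high level equals the supply and the low level is one $I_{0} R_{\mathrm{D}}$ drop below it. The helper name `cml_levels` is hypothetical.

```python
def cml_levels(vdd: float, tail_current: float, load_resistor: float):
    """Return (high, low, common-mode) single-ended levels of a CML stage.

    First-order model: the tail current flows entirely through one load at a time.
    """
    swing = tail_current * load_resistor
    high = vdd          # branch carrying no current sits at the supply
    low = vdd - swing   # branch carrying the full tail current
    return high, low, (high + low) / 2


# Buffer values from the text: 1.2 V supply, 1 mA tail current, 400-ohm load.
high, low, common_mode = cml_levels(vdd=1.2, tail_current=1e-3, load_resistor=400.0)
```

Under this assumption the result reproduces the 1.2 V / 0.8 V logic levels and the 1 V DC operating point stated above.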

According to the schematic of the CML 4-to-1 selector, the selecting signal $B_{0 i}$ passes through a level-shifter and a Gilbert cell before arriving at the inputs of the CML 2-to-1 selector2, while the selecting signal $B_{1 i}$ only passes through a level-shifter. Because signals $B_{0 i}$ and $B_{1 i}$ are synchronized, one buffer with the same delay as the Gilbert cell has to be inserted after signal $B_{1 i}$ to compensate for the skew.

At the same time, different loading acts as another source of skew: signal $B_{0 i}$ drives two level-shifters, while signal $B_{1 i}$ only drives one buffer. Therefore, we insert one buffer after signal $B_{0 i}$ and add another one in cascade with the existing one after signal $B_{1 i}$. Furthermore, one dummy buffer is added directly after signal $B_{1 i}$. As a result, both $B_{1 i}$ and $B_{0 i}$ drive two buffers.

According to the transmitter structure shown in Fig. 3, the outputs of the CML 4-to-1 selectors and the CML XOR gate should be synchronized. Therefore, instead of proposing a different structure, we implement the CML XOR gate with a CML 4-to-1 selector by using signals $B_{1 i}$ and $B_{0 i}$ as the two selecting signals and DC inputs "1", "0", "1", and "0" as the four selector inputs, which correspond to 1.2 V, 0 V, 1.2 V, and 0 V respectively.


Fig. 4 CML 4-to-1 selector schematic and CML XOR gate

### 4.2. CML D-type flip-flop

In our CML D-type flip-flop design, shown in Fig. 5(a), two CML D-latches are placed in cascade, with the first D-latch clocked by $\overline{C L K}$ and the second by $C L K$ [12]. For each D-latch, transistors $M_{5}$ and $M_{6}$, with a width of $10 \mu \mathrm{~m}$, are switched by the clock signals from 0 to $2 \times I_{\text {bias }}=2.4 \mathrm{~mA}$, which results in a current density of $0.24 \mathrm{~mA} / \mu \mathrm{m}$, close to the peak-$f_{T}$ current density of n-MOSFETs [13]. Although this is CML logic, we target a larger single-ended output amplitude of $600 \mathrm{~mV}_{p p}$, which facilitates the jitter reduction of the CML-to-CMOS converter. The resulting loading resistor $R_{D}$ is set to $250 \Omega$.
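As a quick consistency check of these values, assuming the full tail current $2 I_{\text {bias }}$ is steered through one loading resistor at a time and writing $W$ for the width of $M_{5}$ and $M_{6}$ (a symbol introduced here only for this check):

$$
\begin{aligned}
J &= \frac{2 I_{\text {bias }}}{W}=\frac{2.4 \mathrm{~mA}}{10 \mu \mathrm{m}}=0.24 \mathrm{~mA} / \mu \mathrm{m}, \\
V_{p p} &= 2 I_{\text {bias }} R_{D}=2.4 \mathrm{~mA} \times 250 \Omega=600 \mathrm{~mV} .
\end{aligned}
$$

Both results agree with the current density and the single-ended amplitude quoted above.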

In order to improve the speed of the latch, we also choose transistors of different $V_{T}$ for signal path and clock path: $M_{1}, M_{2}$, $M_{3}$, and $M_{4}$, which are in the signal path, are low- $V_{T}$ transistors, while $M_{5}$ and $M_{6}$, which are in the clock path, are high- $V_{T}$ transistors. By using high- $V_{T}$ transistors in the clock path, the input pair of amplifiers that feeds the clock signals from the clock distribution network will be biased close to the velocity saturation region by having larger $V_{D S}$. By using low- $V_{T}$ transistors in the signal path, larger $g_{m}$ can be achieved with the same tail current.


Fig. 5 CML circuit schematics

### 4.3. CML-to-CMOS converter

Our proposed CML-to-CMOS converter, shown in Fig. 5(b), contains an amplifier, an inverter using Differential Cascode Voltage Switch Logic (DCVSL), and two inverter chains whose outputs are connected through six back-to-back inverter pairs.

Because of the high DC operating point of the CML logic in our design, and in order to ensure low-jitter performance of the CML-to-CMOS converter, we further amplify the output signals of the CML D-type flip-flop and lower their DC operating point before feeding them into the DCVSL stage, so that the outputs of the DCVSL stage can swing rail-to-rail. As a result, the amplifier has a single-ended amplitude of $800 \mathrm{~mV}_{p p}$ with a DC operating point of 0.8 V.

After the amplification, we use DCVSL to convert CML logic to CMOS logic [14]. This logic uses a positive-feedback latch as its load, and its pull-down network (PDN) can be realized by two simple NMOS transistors driven by the differential outputs of the amplifier. Using this logic, the CML-to-CMOS converter output can reach a rail-to-rail amplitude, which is essential for reducing the jitter of the inverter chains.

As shown in the transmitter schematic, only one of the differential outputs of the CML-to-CMOS converter is fed to the predrivers. Besides, before arriving at the same output node of the DCVSL stage, the differential output signals of the amplifier pass through different numbers of transistors, and thus the high-to-low transition at the DCVSL output is faster than the low-to-high transition. Therefore, there is duty-cycle distortion after passing through the DCVSL stage [15]. To correct the duty-cycle distortion, we add another inverter chain at the other DCVSL output as well as back-to-back connected inverters between the two inverter chains. Since the high-to-low transition is always faster, and the two inverter chains after the differential outputs of the DCVSL stage have opposite polarities, the back-to-back inverters that connect the two inverter chains average out the skew between the two transitions and thus correct the duty-cycle distortion. In our design, we add six of them along with nine inverters in cascade so that the inverter chains have sufficient stages both to correct the duty cycle and to supply enough driving capacity for the predrivers.

### 4.4. Clock and NRZ bit stream distribution network

The three distribution networks that are used to generate $C L K_{q}, B_{1 q}$, and $B_{0 q}$ where $q=i, j, k, x$ as mentioned in the transmitter design are shown in Fig. 6. There are three types of clock distribution networks: transmission line, inverter chain and CML logic. Among these three, transmission line has the best balance between jitter and power dissipation; this is why we decide to implement it in our design [16]. Also, the skew between different output nodes on the same transmission line will be minimized, which will result in minimizing the skew among $C L K_{q}, B_{1 q}$, and $B_{0 q}$ themselves where $q=i, j, k, x$.


Fig. 6 Clock and NRZ bit stream distribution network

The differential clock inputs from the external BERT are modeled by two 12.5 GHz differential sine waves, $C L K$ and $\overline{C L K}$, with a single-ended amplitude of $300 \mathrm{~mV}_{p p}$ and a DC operating point of 1 V. The clock signals pass through an open-drain amplifier, which distributes them by driving the transmission line (T-Line) as its load. Normally, the T-Line is designed to have a characteristic impedance of $50 \Omega$, so a $50 \Omega$ termination resistor would be needed to minimize reflection. However, with such a low termination resistance, a significant amount of power is needed to provide the same gain as with a higher termination resistance. As a result, we use $125 \Omega$ as the termination resistor $R_{T}$ to balance power against reflection, which requires a 4 mA tail current to give the same gain and DC operating point as a 1 mA tail current with a $500 \Omega$ loading resistor. Since the transmission line has to drive the tail transistors $M_{5}$ and $M_{6}$ of the CML D-type flip-flops, we need four buffers both to shift the DC operating point of the transmission-line outputs and to further increase their amplitude. As a result, the final amplifiers have loading resistors $R_{D 1}$ of $50 \Omega$ and $R_{D}$ of $3340 \Omega$ and a tail current of 1.5 mA, which results in a single-ended output of $300 \mathrm{~mV}_{p p}$ and a DC operating point of 0.875 V.
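The termination trade-off can be illustrated with a first-order sketch. It assumes an ideal $50 \Omega$ line, treats the termination resistor as the only load of the open-drain amplifier, and takes the swing as tail current times load resistance; the function and variable names are hypothetical.

```python
def reflection_coefficient(z_load: float, z0: float) -> float:
    """Voltage reflection coefficient of a line of impedance z0 ended in z_load."""
    return (z_load - z0) / (z_load + z0)


def swing(tail_current: float, load: float) -> float:
    """Swing when the full tail current flows through the load resistance."""
    return tail_current * load


gamma_matched = reflection_coefficient(50.0, 50.0)  # ideal 50-ohm termination
gamma_chosen = reflection_coefficient(125.0, 50.0)  # 75 / 175 for the 125-ohm choice

swing_tline = swing(4e-3, 125.0)      # open-drain amplifier with R_T = 125 ohm
swing_reference = swing(1e-3, 500.0)  # reference case quoted in the text

# Tail current a matched 50-ohm termination would need for the same swing.
current_if_matched = swing_reference / 50.0
```

In this simplified picture the $125 \Omega$ termination keeps the swing of the 1 mA / $500 \Omega$ reference at 4 mA instead of the 10 mA a matched termination would need, in exchange for a nonzero reflection coefficient.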

The two pairs of differential NRZ bit streams, $B_{1} / \overline{B_{1}}$ and $B_{0} / \overline{B_{0}}$, from the external BERT are modeled by two pairs of differential PRBS voltages with a single-ended amplitude of $300 \mathrm{~mV}_{p p}$ and a DC operating point of 0.9 V. Both bit streams use PN8 as their LFSR mode with seed $=1$, but $B_{0}$ is delayed by 85 periods relative to $B_{1}$. Unlike the clock distribution network, the NRZ bit stream distribution network needs no level shifting, and a simple amplifier "AMP1" as shown in Fig. 5 is sufficient. Amplifier "AMP1" has a loading resistor $R_{D}$ of $400 \Omega$ and a tail current of 0.75 mA.
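For readers who want to reproduce comparable stimulus outside the simulator, the sketch below generates two such streams. It assumes that PN8 denotes a maximal-length 8-bit LFSR with taps at stages 8, 6, 5, and 4; the polynomial actually used by the BERT model is not stated in the text and may differ. The function name `prbs8` is hypothetical.

```python
def prbs8(seed: int = 1, length: int = 255) -> list[int]:
    """Generate bits from an 8-bit Fibonacci LFSR.

    Assumed taps: stages 8, 6, 5 and 4 (a maximal-length choice with
    period 255). The BERT's PN8 polynomial may be different.
    """
    state = seed & 0xFF
    bits = []
    for _ in range(length):
        bits.append((state >> 7) & 1)  # output the oldest bit (stage 8)
        feedback = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | feedback) & 0xFF
    return bits


b1 = prbs8(seed=1)            # one full period of the first stream
b0 = b1[-85:] + b1[:-85]      # same sequence delayed by 85 bit periods (cyclic)
```

Delaying one stream by a fraction of the period is a simple way to obtain two streams that look uncorrelated over short windows while using a single generator.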

The transmission line structure we have used in our design is differential microstrip with coplanar shields routed in metal 7 (M7) and metal 5 (M5). Instead of using Virtuoso to simulate transmission line effect, we have used the more specialized electromagnetics tool SONNET.

### 4.5. Predriver and driver

The simplified schematic of the predriver and driver is shown in Fig. 7. There are six inputs to the driver: two from the outputs of the CML 4-to-1 selector, two from the output of the CML XOR gate, $\overline{a_{i, 3: 0}}$, and $b_{q, 3: 0}$. The first four are NRZ bit streams, which need predrivers to increase their driving capacity, while the final two are DC signals and can be applied directly to the driver inputs. The input signals of the four predrivers are $X_{i}$, $\overline{VT\_EN}$, $VT\_EN$, and $X_{i}$; the resulting output signals of the four predrivers are $X_{i u}$, $\overline{VT\_EN_{u}}$, $VT\_EN_{d}$, and $X_{i d}$ respectively.


Fig. 7 Driver and predriver schematic

In addition to increasing the driving capacity, the predriver also performs level-shifting by placing an AC-coupling capacitor and two back-to-back inverters in front of a series of cascaded inverters with a fanout of 2. The two predrivers that drive the pull-up branch of the driver have 2.4 V as their power supply and 1.2 V as their ground, while the two predrivers that drive the pull-down branch have 1.2 V as their power supply and 0 V as their ground. As a result, the DC operating point of each predriver self-biases to the mid-level between its power supply and ground: 1.8 V for the two predrivers that drive the pull-up branch and 0.6 V for the two predrivers that drive the pull-down branch, which maximizes the noise margin of the inverter chains.

When the NRZ bit stream contains a string of consecutive identical digits, the voltage at the AC-coupling node droops, resulting in low-frequency pattern-dependent jitter (PDJ) [17]. In our design, although we use PN8 with a delay of 85 periods between the two NRZ streams, the longest run of consecutive identical digits is 13 bits due to the use of three 4-to-1 selectors. With an AC-coupling capacitor of 2.5 pF, the PDJ in our system is 2.2 ps, which is smaller than $3 \%$ of the bit period.
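The $3 \%$ figure can be checked with a few lines of arithmetic, taking the bit period from the 12.5 GHz clock used in the design (one bit per clock period for each NRZ stream, which is an assumption about how the bit period is defined here). The variable names are hypothetical.

```python
clock_frequency_hz = 12.5e9                  # clock frequency used in the design
bit_period_ps = 1e12 / clock_frequency_hz    # one bit per clock period, in ps
pdj_ps = 2.2                                 # pattern-dependent jitter quoted in the text

pdj_fraction = pdj_ps / bit_period_ps        # PDJ as a fraction of the bit period
within_budget = pdj_fraction < 0.03
```

With an 80 ps bit period, 2.2 ps corresponds to about $2.75 \%$ of the bit period, consistent with the statement above.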

The driver is a voltage-mode driver with 2.4 V as its supply and 0 V as its ground. Between supply and ground, four transistors are stacked so that their $V_{D S}$ does not exceed the nominal power supply of the 65 nm technology. NMOS transistor M5 and PMOS transistor M6 are connected to the sources of transistors M2 and M3 so that when the pull-down branch changes from enabled to disabled, the source of transistor M3 is discharged to 1.2 V; when the pull-up branch changes from enabled to disabled, the source of transistor M2 is discharged to 1.2 V. There are two blocks at the gates of cascode transistors M2 and M3 that generate edge-triggered pulses [18]. When signal $X_{i d}$ jumps from 0 V to 1.2 V, the gate of M2 drops to 0 V for a short period of time and then returns to 1.2 V, which increases the $V_{S G}$ of transistor M2 for a short period of time and accelerates the charging of the driver output; when signal $X_{i u}$ drops from 2.4 V to 1.2 V, the gate of M3 rises to 2.4 V for a short period of time and then returns to 1.2 V, which increases the $V_{G S}$ of transistor M3 for a short period of time and accelerates the discharging of the driver output.

The output voltage tuning is realized by adding four NMOS transistors with a size ratio of 8:4:2:1 to the source of transistor M2 and four PMOS transistors with the same size ratio to the source of transistor M3. Each of the four transistors in the pull-up branch is controlled by a NOR gate through signals $\overline{a_{i, 3: 0}}$ and $b_{q, 3: 0}$, while each of the four transistors in the pull-down branch is controlled by a NAND gate through signals $VT\_EN_{d}$ and $b_{i, 3: 0}$. The sources of both groups of transistors are connected to 1.2 V.

Due to the driver structure, the loads of the four predrivers are different, which would introduce skew between the paths after they pass through their respective predrivers. Therefore, in our design, we add dummy logic at the load of each predriver to balance the loading and minimize the skew between the paths.

## 5. Simulation Results

Our proposed transmitter is implemented in TSMC 65 nm LP technology, and the three-segment MRM is implemented in IMEC-ePIXfab SiPhotonics ISIPP50G technology. In order to simulate silicon photonics components in Cadence Virtuoso, we use the Verilog-A block diagram shown in Fig. 8 to model the three-segment MRM [19], which is based on the layout shown in Fig. 9 and occupies an area of $0.0705 \mathrm{~mm}^{2}$ (0.15 mm by 0.47 mm).


Fig. 8 Three-segment MRM and its Verilog-A block diagram

As shown in Fig. 8(a), the ring modulator contains a silicon-on-insulator (SOI) waveguide bus, a $5 \mu \mathrm{~m}$ radius ring phase-shifter, and a coupling section between them. The three segments inside the $5 \mu \mathrm{~m}$ radius ring phase-shifter have center angles in the ratio $\pi: \pi / 2: \pi / 4$, which are modeled by phase-shifter3, phase-shifter2, and phase-shifter1 respectively, as shown in Fig. 8(b). The remaining length inside the ring is divided into three parts that separate the three segments; these parts are modeled together using the waveguide model. Besides, the continuous-wave (CW) laser and grating couplers shown in Fig. 8(a) represent the flow of optical signals. All of the parameters used in the Verilog-A models that form the block diagram shown in Fig. 8(b) are based on IMEC-ePIXfab SiPhotonics ISIPP50G technology; the real and imaginary parts of $\Delta n_{e f f}$ as well as the junction capacitance used in the phase-shifter model are shown in Fig. 10 as functions of the biasing voltage.


Fig. 9 Transmitter core layout and three-segment MRM layout


Fig. 10 The change of effective index and capacitance of phase-shifters with respect to biasing voltage

The resulting transmission curve of the three-segment MRM derived from its Verilog-A model is shown in Fig. 11, where $|T|^{2}(0 \mathrm{~V}, 0 \mathrm{~V}, 0 \mathrm{~V}, \lambda)$ is the transmission curve under the zero-biasing condition, and $\left|T_{0}\right|^{2}(2.4 \mathrm{~V}, 2.4 \mathrm{~V}, 2.4 \mathrm{~V}, \lambda)$ is the transmission curve under the maximum reverse-biasing condition. As a result, the extinction ratio of the three-segment MRM, $E R_{0}$, is 8.7 dB.

In order to verify the adaptability of our proposed transmitter structure, we model two variations of the $5 \mu \mathrm{~m}$ three-segment MRM, where the first has three $10 \%$ longer phase-shifters and the second has three $10 \%$ shorter phase-shifters. The first variation, with longer phase-shifters, has three phase-shifters with lengths of $10 \pi \times 1.1 / 8$, $10 \pi \times 1.1 / 4$, and $10 \pi \times 1.1 / 2$ respectively and one waveguide with a length of $10 \pi \times 3 / 80$, all in units of $\mu \mathrm{m}$. The second variation, with shorter phase-shifters, has three phase-shifters with lengths of $10 \pi \times 0.9 / 8$, $10 \pi \times 0.9 / 4$, and $10 \pi \times 0.9 / 2$ respectively and one waveguide with a length of $10 \pi \times 17 / 80$, all in units of $\mu \mathrm{m}$.
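The waveguide lengths follow from requiring that the segments still add up to the ring circumference of $10 \pi \mu \mathrm{m}$. A small sketch of that bookkeeping, using exact fractions of the circumference (the names `ring_fractions` and `NOMINAL_FRACTIONS` are hypothetical):

```python
from fractions import Fraction
from math import pi

RADIUS_UM = 5
CIRCUMFERENCE_UM = 2 * pi * RADIUS_UM  # 10*pi micrometres

# Nominal phase-shifter arcs as fractions of the circumference
# (centre angles pi/4, pi/2 and pi out of 2*pi).
NOMINAL_FRACTIONS = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2))


def ring_fractions(scale: Fraction):
    """Return (phase-shifter fractions, waveguide fraction) for a length scale."""
    shifters = tuple(f * scale for f in NOMINAL_FRACTIONS)
    waveguide = 1 - sum(shifters)  # whatever is left of the ring
    return shifters, waveguide


longer = ring_fractions(Fraction(11, 10))   # phase-shifters 10 % longer
shorter = ring_fractions(Fraction(9, 10))   # phase-shifters 10 % shorter
waveguide_lengths_um = {
    "longer": float(longer[1]) * CIRCUMFERENCE_UM,
    "shorter": float(shorter[1]) * CIRCUMFERENCE_UM,
}
```

The waveguide fractions come out as 3/80 and 17/80 of the circumference, matching the two lengths given above.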

The transmission curves of the two variations can also be seen in Fig. 11. Both variations share the same transmission curve $|T|^{2}(0 V, 0 V, 0 V, \lambda)$ as the proposed three-segment MRM under zero-biasing condition. Under maximum reverse-biasing condition, the first variation has the transmission curve $\left|T_{2}\right|^{2}(2.4 V, 2.4 V, 2.4 V, \lambda)$, while the second variation has the transmission curve $\left|T_{1}\right|^{2}(2.4 V, 2.4 V, 2.4 V, \lambda)$, which results in extinction ratios of $8.3 \mathrm{~dB}\left(E R_{2}\right)$ and $9.3 \mathrm{~dB}\left(E R_{1}\right)$ for the first and second variation, respectively.
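To give a feel for what these extinction ratios mean on a linear scale, the sketch below converts the three reported values into intensity ratios. It assumes the extinction ratio is defined as $10 \log _{10}$ of the ratio between the transmission under maximum reverse bias and the transmission under zero bias at the operating wavelength; the helper names are illustrative.

```python
import math


def er_db_to_ratio(er_db):
    """Convert an extinction ratio in dB to a linear intensity ratio."""
    return 10 ** (er_db / 10)


def ratio_to_er_db(t_high, t_low):
    """Extinction ratio in dB from two transmission intensities (assumed definition)."""
    return 10 * math.log10(t_high / t_low)


# Extinction ratios reported in the text.
reported_db = {
    "ER_0 (proposed MRM)": 8.7,
    "ER_2 (10% longer phase-shifters)": 8.3,
    "ER_1 (10% shorter phase-shifters)": 9.3,
}

for name, er_db in reported_db.items():
    print(f"{name}: {er_db} dB corresponds to a ratio of {er_db_to_ratio(er_db):.2f}")
```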

In order to verify our proposed circuit-level transmitter structure, the eye-diagram of the optical power output $P_{\text {out }}$ of our proposed three-segment MRM is used as the main criterion. Eye-diagrams of the two variations of our proposed MRM are further used to verify the adaptability of our transmitter structure. At the same time, because the optical PAM-4 transmitter has to operate as fast as 12.5 GHz in our design, parasitic effects play a significant role in the circuit-level design; we have therefore drawn the layout of our proposed circuit-level transmitter design in order to perform post-layout simulation. The implemented layout can be seen in Fig. 9, and its circuit area is $0.1728 \mathrm{~mm}^{2}$ ($0.48 \mathrm{~mm}$ by $0.36 \mathrm{~mm}$). The extracted view derived from the implemented transmitter layout uses $R-C-C_{c}$ modeling and is used in our transient simulation to derive the final eye-diagram of $P_{\text {out }}$.


Fig. 11 Transmission curve of the three-segment MRM and its two variations

The transient simulation is performed under the following conditions. First, 1 mW of optical power is injected into our proposed three-segment MRM from a laser source, which corresponds to a $V_{\text {laser }}$ of 31.6 mV. Second, the frequency offset $\Delta f$ is set to -1.232 THz, which corresponds to the resonant wavelength of 1559.94 nm as shown in Fig. 10. The differential clock inputs from the external BERT are modeled by two 12.5 GHz differential sine waves with a single-ended amplitude of $300 \mathrm{~mV}_{p p}$ and a DC operating point of 1 V, while the two pairs of uncorrelated differential NRZ bit streams from the external BERT are modeled by two pairs of 12.5 GHz differential PRBS voltages with a single-ended amplitude of $300 \mathrm{~mV}_{p p}$ and a DC operating point of 0.9 V.
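The two optical settings above can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that are not stated explicitly in this section: the laser amplitude $V_{\text {laser }}$ is normalized so that its square equals the optical power in watts, and the frequency offset $\Delta f$ is measured from a 1550 nm reference wavelength. The function names are illustrative.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light in vacuum


def field_amplitude(power_w):
    """Amplitude whose square equals the optical power (assumed normalization)."""
    return math.sqrt(power_w)


def wavelength_nm(reference_nm, delta_f_hz):
    """Wavelength obtained by shifting the reference optical frequency by delta_f_hz."""
    f_ref = C_M_PER_S / (reference_nm * 1e-9)
    return C_M_PER_S / (f_ref + delta_f_hz) * 1e9


v_laser = field_amplitude(1e-3)            # 1 mW injected power
lam = wavelength_nm(1550.0, -1.232e12)     # assumed 1550 nm reference, -1.232 THz offset
print(f"V_laser = {v_laser * 1e3:.1f} mV, wavelength = {lam:.2f} nm")
# prints: V_laser = 31.6 mV, wavelength = 1559.94 nm
```

Under these assumptions the sketch recovers both the 31.6 mV amplitude and the 1559.94 nm resonant wavelength quoted in the text.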

The eye-diagram of $P_{\text {out }}$ at a $12.5 \mathrm{GS} / \mathrm{s}$ symbol rate from our proposed three-segment MRM, derived from the extracted-layout simulation of the circuit-level transmitter structure, can be seen in Fig. 12, where the four levels are nearly equally spaced, with a level separation mismatch ratio of $R_{L M}=98 \%$.
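For readers unfamiliar with $R_{L M}$, the sketch below shows one common way to compute it from four PAM-4 levels. It assumes the definition used for PAM-4 transmitter linearity in IEEE 802.3, which may differ in detail from how $R_{L M}$ was extracted from the eye-diagrams here. The level values in the second call are hypothetical and are not taken from the simulation.

```python
def level_mismatch_ratio(v0, v1, v2, v3):
    """R_LM for four PAM-4 levels ordered from lowest (v0) to highest (v3).

    Assumed definition: with v_mid = (v0 + v3) / 2,
    ES1 = (v1 - v_mid) / (v0 - v_mid), ES2 = (v2 - v_mid) / (v3 - v_mid),
    R_LM = min(3*ES1, 3*ES2, 2 - 3*ES1, 2 - 3*ES2).
    """
    v_mid = (v0 + v3) / 2
    es1 = (v1 - v_mid) / (v0 - v_mid)
    es2 = (v2 - v_mid) / (v3 - v_mid)
    return min(3 * es1, 3 * es2, 2 - 3 * es1, 2 - 3 * es2)


# Perfectly equal spacing gives R_LM = 1 (100 %).
print(f"{level_mismatch_ratio(0.0, 1.0, 2.0, 3.0):.3f}")

# Hypothetical normalized optical levels with slightly compressed inner eyes.
print(f"{level_mismatch_ratio(0.10, 0.36, 0.64, 0.90):.3f}")
```

An $R_{L M}$ close to 100 % therefore indicates that the three PAM-4 eyes have nearly the same height.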


Fig. 12 Eye-diagram of Pout at $12.5 \mathrm{GS} / \mathrm{s}$ symbol rate based on the three-segment MRM

The transmission intensities for levels "10" and "01", on which this eye-diagram is based, can be expressed as:

$$
\begin{align*}
& \left|T_{10}\right|^{2}=|T|^{2}\left[V_{1 L}(1,0000), \quad V_{2 L}(1,1110), \quad V_{3 L}(0,0000)\right]  \tag{6}\\
& \left|T_{01}\right|^{2}=|T|^{2}\left[V_{1 L}(1,0000), \quad V_{2 L}(1,1110), \quad V_{3 L}(0,0000)\right] \tag{7}
\end{align*}
$$



Fig. 13 Eye-diagram of Pout at $12.5 \mathrm{GS} / \mathrm{s}$ symbol rate based on the first MRM variation


Fig. 14 Eye-diagram of Pout at $12.5 \mathrm{GS} / \mathrm{s}$ symbol rate based on the second MRM variation

The eye-diagrams of $P_{\text {out }}$ at a $12.5 \mathrm{GS} / \mathrm{s}$ symbol rate from the first and the second variation of our proposed three-segment MRM can be seen in Fig. 13 and Fig. 14, respectively, where the four levels are nearly equally spaced, with level separation mismatch ratios of $R_{L M}=99.1 \%$ and $R_{L M}=97.7 \%$, respectively. The transmission intensities for levels "10" and "01", on which the eye-diagrams in Fig. 13 and Fig. 14 are based, can be expressed as:

$$
\begin{align*}
& \left|T_{10}\right|^{2}=|T|^{2}\left[V_{1 L}(1,0010), \quad V_{2 L}(1,1110), \quad V_{3 L}(0,0000)\right]  \tag{8}\\
& \left|T_{01}\right|^{2}=|T|^{2}\left[V_{1 L}(1,0010), \quad V_{2 L}(1,1110), \quad V_{3 L}(0,0000)\right]  \tag{9}\\
& \left|T_{10}\right|^{2}=|T|^{2}\left[V_{1 L}(1,0000), \quad V_{2 L}(1,1111), \quad V_{3 L}(0,0000)\right]  \tag{10}\\
& \left|T_{01}\right|^{2}=|T|^{2}\left[V_{1 L}(1,0000), \quad V_{2 L}(0,1111), \quad V_{3 L}(0,0000)\right] \tag{11}
\end{align*}
$$

As can be seen from Eqs. (6)-(11), lower "1" tuning is used for all three scenarios [9]. Moreover, $V_{3 L}(0,0000)$ appears in all three scenarios, which means that fine tuning in $x_{k}$ is always disabled. Therefore, the two predrivers that supply the signals for disabling fine tuning only need to drive the driver for $x_{i}$ and $x_{j}$, which not only simplifies global routing but also reduces the associated parasitic capacitance, as shown in Fig. 9. With fine tuning disabled in $x_{k}$, fewer PAM-4 optical levels can be produced for both lower "1" tuning and higher "0" tuning: 6 out of 15 different PAM-4 optical levels have 16 possible PAM-4 optical levels each, while 9 out of 15 have 256 each. As a result, lower "1" tuning and higher "0" tuning each have $6 \times 16+9 \times 256=2400$ possible PAM-4 optical levels.
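The level count can be checked with a short calculation. The sketch assumes that each fine-tuning word has four bits, as the four-digit arguments in Eqs. (6)-(11) suggest, so that one tunable segment offers $2^{4}=16$ settings and two tunable segments offer $16^{2}=256$. Under that assumption, the total of 2400 per tuning mode is reached with 16 settings for the 6 single-segment cases. The split into "one tunable segment" and "two tunable segments" is an interpretation for illustration, and the variable names are not from the paper.

```python
FINE_TUNING_BITS = 4                              # digits in e.g. V_1L(1,0000)
SETTINGS_PER_SEGMENT = 2 ** FINE_TUNING_BITS      # 16 fine-tuning settings

CASES_ONE_SEGMENT = 6     # cases with 16 possible levels each
CASES_TWO_SEGMENTS = 9    # cases with 256 possible levels each

levels_per_mode = (
    CASES_ONE_SEGMENT * SETTINGS_PER_SEGMENT
    + CASES_TWO_SEGMENTS * SETTINGS_PER_SEGMENT ** 2
)

# Lower "1" tuning and higher "0" tuning each contribute the same number.
total_levels = 2 * levels_per_mode

print(levels_per_mode, total_levels)   # prints: 2400 4800
```

The total of 4800 matches the number of possible PAM-4 levels listed in Table 2.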

The power consumption of the transmitter is summarized in Table 1. There are two power supplies: 1.2 V and 2.4 V. Only the predrivers and drivers use both power supplies. Most of the power is consumed in the CML circuits, which include three CML 4-to-1 selectors, one CML XOR gate, and four CML D-type flip-flops, and whose power consumption is 145.9 mW. The CML-to-CMOS converter consumes 28.4 mW, while the combined power consumption of the predrivers and drivers is 34.45 mW. The energy efficiency of the whole transmitter structure is $8.29 \mathrm{pJ} / \mathrm{bit}$. If we only consider the drivers and predrivers, the energy efficiency is $1.37 \mathrm{pJ} / \mathrm{bit}$, and if we only consider the drivers and the modulator, the energy efficiency is as low as $0.5 \mathrm{pJ} / \mathrm{bit}$.

Table 1 Transmitter Power Consumption Breakdown

| Circuit | Supply Voltage [V] | Supply Current [mA] | Power Consumption [mW] |
| :---: | :---: | :---: | :---: |
| CML Circuit | 1.2 | 121.59 | 145.9 |
| CML-to-CMOS Converter | 1.2 | 23.7 | 28.4 |
| Predrivers | 1.2 | 1.16 | 1.39 |
|  | 2.4 | 9 | 21.6 |
| Drivers | 1.2 | 1.83 | 2.2 |
|  | 2.4 | 3.86 | 9.26 |
| Total Transmitter Power Consumption |  |  | 208 |
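The entries of Table 1 can be cross-checked by multiplying each supply voltage by its supply current and dividing the resulting power by the 25 Gb/s data rate. The sketch below assumes the currents are in mA, which is consistent with the listed powers in mW. Because it starts from the rounded table entries, its totals land close to, but not exactly on, the reported 208 mW, $8.29 \mathrm{pJ} / \mathrm{bit}$, and $1.37 \mathrm{pJ} / \mathrm{bit}$. The drivers-only figure excludes the modulator itself, so it is only a rough counterpart of the reported $0.5 \mathrm{pJ} / \mathrm{bit}$.

```python
DATA_RATE_GBPS = 25.0   # PAM-4 data rate from the text

# (circuit, supply voltage in V, supply current in mA), taken from Table 1.
RAILS = [
    ("CML circuit", 1.2, 121.59),
    ("CML-to-CMOS converter", 1.2, 23.7),
    ("Predrivers", 1.2, 1.16),
    ("Predrivers", 2.4, 9.0),
    ("Drivers", 1.2, 1.83),
    ("Drivers", 2.4, 3.86),
]


def power_mw(circuits=None):
    """Sum V * I in mW over all rails, or only over the named circuits."""
    return sum(v * i for name, v, i in RAILS if circuits is None or name in circuits)


def energy_pj_per_bit(p_mw):
    """1 mW at 1 Gb/s equals 1 pJ/bit, so divide mW by Gb/s."""
    return p_mw / DATA_RATE_GBPS


total_mw = power_mw()
front_end_mw = power_mw({"Predrivers", "Drivers"})
drivers_mw = power_mw({"Drivers"})

print(f"total: {total_mw:.1f} mW, {energy_pj_per_bit(total_mw):.2f} pJ/bit")
print(f"predrivers + drivers: {front_end_mw:.2f} mW, {energy_pj_per_bit(front_end_mw):.2f} pJ/bit")
print(f"drivers only: {drivers_mw:.2f} mW, {energy_pj_per_bit(drivers_mw):.2f} pJ/bit")
```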

Table 2 Performance Summary of our Circuit-level Transmitter Design

| Photonics Circuit | IMEC iSiPP50G with LP 65nm CMOS |
| :---: | :---: |
| Integration | Flip-Chip |
| Wavelength | 1550 nm |
| Driver Supply | 2.4 V |
| Modulator Device | Microring-Resonator |
| Extinction Ratio | 9 dB |
| Number of Possible PAM-4 Levels | 4800 |
| PAM-4 Data Rate | $25 \mathrm{~Gb} / \mathrm{s}$ |
| PAM-4 Energy Efficiency | $0.5 \mathrm{pJ} / \mathrm{bit}$ |

We have summarized the performance of our circuit-level transmitter in Table 2. The circuit-level transmitter structure achieves the targeted PAM-4 data rate of $25 \mathrm{~Gb} / \mathrm{s}$ with an extinction ratio of 9 dB. We chose the flip-chip integration method so that the transmitter structure can achieve better energy efficiency than with bond-wire integration. The optical signal operates around a wavelength of 1550 nm, which is within the C-band. The driver supply is twice the nominal voltage of TSMC 65 nm. By implementing flip-chip bonding, we have been able to achieve a PAM-4 energy efficiency of $0.5 \mathrm{pJ} / \mathrm{bit}$, which only includes the power of the optical modulator and its driver. Although fine tuning is always disabled for the LSB $x_{k}$ due to the layout constraint, we can still achieve a total of 4800 possible PAM-4 levels, which is much higher than other state-of-the-art designs. However, the flexibility of the scheme comes at a price: a large amount of power is consumed in the CML circuits, one of whose main purposes is to convert two PRBS signals into three NRZ streams so as to control the three phase-shifters inside the microring. Nonetheless, the performance results show that the scheme can be realized through the circuit-level transmitter structure that we have proposed.

## 6. Conclusion

To summarize, we have implemented a circuit-level PAM-4 transmitter design based on the voltage-tuning scheme for realizing optical PAM-4 using a three-segment microring modulator. First, we gave an overview of the voltage-tuning scheme, including its categorization into fine tuning and coarse tuning. Then, the transmitter structure was proposed and each of its blocks was introduced. Finally, the detailed circuit implementation of each block was discussed, and the simulation results based on the extracted layout were reported. The final simulation results show that our proposed circuit-level transmitter design is able to achieve a PAM-4 data rate of $25 \mathrm{~Gb} / \mathrm{s}$ with a high level separation mismatch ratio $R_{L M}$. The simulation also shows that although the transmitter design attains much higher flexibility in the number of possible PAM-4 levels, the CML circuits required to make that happen consume a large amount of power.

## Conflicts of Interest

The authors declare no conflict of interest.

## List of Acronyms

| BERT | Bit-error-rate Tester |
| :--- | :--- |
| CML | Current-Mode Logic |
| CMOS | Complementary Metal Oxide Semiconductor |
| CW | Continuous Wave |
| DCVSL | Differential Cascode Voltage Switch Logic |
| IoT | Internet of Things |
| MRM | Microring Modulator |
| NRZ | Non-return to Zero |
| PAM | Pulse-amplitude Modulation |
| PDJ | Pattern-dependent Jitter |
| PDN | Pull-down Network |
| SOI | Silicon-on-insulator |

## References

[1] A. Al-Fuqaha, M. Guizani, M. Mohammadi, and M. Aledhari, "Internet of things: A survey on enabling technologies, protocols, and applications," IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347-2376, 2015.
[2] Y. Sun, R. Shubochkin, and D. Braganza, "Technical feasibility of new $200 \mathrm{~Gb} / \mathrm{s}$ and $400 \mathrm{~Gb} / \mathrm{s}$ links for data centers," 2018 IEEE Optical Interconnects Conference, June 2018, pp. 37-38.
[3] T. Shi, T. I. Su, N. Zhang, C. Y. Hong, and D. Pan, "Silicon photonics platform for 400G data center applications," 2018 Optical Fiber Communications Conference and Exposition, March 2018, pp. 1-3.
[4] R. Soref and J. Lorenzo, "All-silicon active and passive guided-wave components for $\lambda=1.3$ and $1.6 \mu \mathrm{~m}$," IEEE Journal of Quantum Electronics, vol. 22, no. 6, pp. 873-879, June 1986.
[5] J. Wang and Y. Long, "On-chip silicon photonic signaling and processing: a review," Science Bulletin, vol. 63, no. 19, pp. 1267-1310, 2018.
[6] P. Dong, S. Liao, D. Feng, H. Liang, D. Zheng, R. Shafiiha, and A. V. Krishnamoorthy, "Low Vpp, ultralow-energy, compact, high-speed silicon electro-optic modulator," Optics Express, vol. 17, no. 25, pp. 22484-22490, December 2009.
[7] J. Van Campenhout, Y. Ban, P. De Heyn, A. Srinivasan, J. De Coster, S. Lardenois, and S. Janssen, "Silicon photonics for 56G NRZ optical interconnects," 2018 Optical Fiber Communications Conference and Exposition, March 2018, pp. 1-3.
[8] S. Moazeni, S. Lin, M. Wade, L. Alloatti, R. J. Ram, M. Popović, and V. Stojanović, "A 40-Gb/s PAM-4 transmitter based on a ring-resonator optical DAC in 45-nm SOI CMOS," IEEE Journal of Solid-State Circuits, vol. 52, no. 12, pp. 3503-3516, December 2017.
[9] R. Wang and V. Saxena, "A CMOS photonic optical PAM4 transmitter linearized using three-segment ring modulator," 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems, August 2019, pp. 1114-111.
[10] U. Pfeiffer and B. Welch, "Equivalent circuit model extraction of flip-chip ball interconnects based on direct probing techniques," IEEE Microwave and Wireless Components Letters, vol. 15, no. 9, pp. 594-596, September 2005.
[11] Q. Huang, F. Piazza, P. Orsatti, and T. Ohguro, "The impact of scaling down to deep submicron on CMOS RF circuits," IEEE Journal of Solid-State Circuits, vol. 33, no. 7, pp. 1023-1036, July 1998.
[12] T. Chalvatzis, K. H. Yau, R. A. Aroca, P. Schvan, M. T. Yang, and S. P. Voinigescu, "Low-voltage topologies for 40-Gb/s circuits in nanoscale CMOS," IEEE Journal of Solid-State Circuits, vol. 42, no. 7, pp. 1564-1573, July 2007.
[13] T. O. Dickson, K. H. Yau, T. Chalvatzis, A. M. Mangan, E. Laskin, R. Beerkens, and S. P. Voinigescu, "The invariance of characteristic current densities in nanoscale MOSFETs and its impact on algorithmic design methodologies and design porting of SiGe Bi-CMOS high-speed building blocks," IEEE Journal of Solid-State Circuits, vol. 41, no. 8, pp. 1830-1845, August 2006.
[14] S. Liang, D. H. K. Hoe, and C. A. T. Salama, "BiCMOS DCVSL gate," Electronics Letters, vol. 27, no. 4, pp. 346-347, February 1991.
[15] G. Balamurugan, J. Kennedy, G. Banerjee, J. E. Jaussi, M. Mansuri, F. O'Mahony, and R. Mooney, "A scalable 5-15 Gbps, 14-75 mW low-power I/O transceiver in 65 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 1010-1019, April 2008.
[16] F. O'Mahony, M. Mansuri, B. Casper, J. E. Jaussi, and R. Mooney, "A low-jitter PLL and repeaterless clock distribution network for a $20 \mathrm{~Gb} / \mathrm{s}$ link," 2006 Symposium on VLSI Circuits, Digest of Technical Papers, June 2006.
[17] B. Analui, J. F. Buckwalter, and A. Hajimiri, "Data-dependent jitter in serial communications," IEEE Transactions on Microwave Theory and Techniques, vol. 53, no. 11, pp. 3388-3397, November 2005.
[18] H. Li, Z. Xuan, A. Titriku, C. Li, K. Yu, B. Wang, and T. Baehr-Jones, "A $25 \mathrm{~Gb} / \mathrm{s}$, 4.4 V-swing, AC-coupled ring modulator-based WDM transmitter with wavelength stabilization in 65 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 50, no. 12, pp. 3145-3159, December 2015.
[19] R. Wang, J. Shawon, and V. Saxena, "A CMOS photonic PAM-4 electro-optic DAC using coupling based microrings," IEEE Int. Midwest Symposium on Circuits and Systems, 2018.


Copyright© by the authors. Licensee TAETI, Taiwan. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY-NC) license (https://creativecommons.org/licenses/by-nc/4.0/).


[^0]:    * Corresponding author. E-mail address: wang2430@vandals.uidaho.edu

