# A PAM-4 VCSEL TRANSMITTER WITH 2.5 TAP NONLINEAR EQUALIZER IN 65NM CMOS

### A Thesis

by

# ABHINAV TYAGI

Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the requirements for the degree of

# MASTER OF SCIENCE

| Chair of Committee, | Samuel Palermo      |  |
|---------------------|---------------------|--|
| Committee Members,  | Jose Silva-Martinez |  |
|                     | Laszlo Kish         |  |
|                     | Rabi N. Mahapatra   |  |
| Head of Department, | Miroslav M. Begovic |  |

December 2017

Major Subject: Electrical Engineering

Copyright 2017 Abhinav Tyagi

#### ABSTRACT

This thesis presents a Vertical Cavity Surface Emitting Laser (VCSEL) based transmitter that uses a nonlinear equalizer to compensate for the nonlinear and bandwidth-limited behavior of the VCSEL. The transmitter employs PAM4 modulation and a 2.5 tap nonlinear equalizer to maximize the vertical eye opening and reduce the skew in the PAM4 eyes that results from this nonlinear behavior. The equalizer can also compensate for the static nonlinearity caused by the finite output impedance of the tail current sources and for the low bandwidth caused by the large capacitance (parasitic and pad) and large resistance (of the VCSEL) at the output node. The nonlinear equalizer reduces to a traditional linear equalizer in cases where the VCSEL can be approximated as linear, e.g., at high bias currents. In such cases, the 2.5 tap equalizer still improves on a traditional 2 tap equalizer because of its larger memory. The proposed architecture implements the 2.5 tap nonlinear equalizer using a look-up-table approach and can equalize all 32 $(4^{2.5})$ rising, falling and non-transitioning edges separately. It also uses a nonuniform DAC in the current mode output driver, which uses information about unused levels to improve resolution compared with the traditionally used uniform DAC. The transmitter consumes 250mW and achieves a data rate of 50Gbps, for a power efficiency of 5pJ/bit. The core transmitter area, including the PRBS generator, LUT, serializer and output driver, is 375um × 500um, while the total chip area is 1.4mm × 1.4mm. The transmitter has been implemented in 65nm CMOS technology.


# DEDICATION

To my Tai Ji and Tau Ji.

#### ACKNOWLEDGEMENTS

I would like to take this opportunity to thank my advisor, Prof. Samuel Palermo, for his guidance and constant support throughout the course of my Master's thesis. I am grateful to him for putting faith in me and giving me the opportunity to work on this project. I am thankful to him for taking out time every week to discuss the progress of my work and to provide invaluable suggestions. I am deeply inspired by his organization and time management skills and would want to incorporate them in my life.

I would like to thank Prof. Jose Silva-Martinez, Prof. Laszlo Kish and Prof. Rabi Mahapatra for serving on my committee. Prof. Silva's course on Network Theory deepened my interest in Analog/Mixed Signal Circuits. Prof. Kish's courses on Noise have inspired me to research noise analysis of circuits.

I feel fortunate to be a part of Prof. Palermo's very well-organized research group. I am extremely indebted to Ashkan Roshan Zamir and Takayuki Iwai for the technical support and discussions I had with them. This work would not have been possible without their help. I am grateful to Kunzhi Yu for helping me with PCB design and testing. I would also like to thank Sagi Mathai, Wayne Sorin and Binhao Wang from HPE for providing technical support on PCB design, VCSEL integration and testing.

I would like to thank my friends Harsheen Kaur, Robin Gupta, Kaiwalya Swami, Sitanshu Satpathy and Avadhut Junnarkar for their friendship and the wonderful time I spent here at Texas A&M. I would also like to thank Anish Morakhia, Venkatraman Natarajan and Sasank Ganesh for all the technical discussions we had during my research.


I am thankful to Prof. Laxminidhi T. and Prof. U. Sripati Acharya at NITK Surathkal, India for introducing me to Analog Electronics and Electromagnetic Wave Theory. Their dedication to teaching is remarkable and was one of the reasons that got me interested in Analog/Mixed-Signal/RF IC design.

I would like to thank my friends Anshul Tyagi, Shashank Tiwari, Sandeep Verma, Navin Kumar, Neeraj Verma and Kushagra Rao for always being supportive despite being on the other side of the world.

My family has been a constant source of support and motivation throughout my life. The same was true during this thesis. I especially thank my parents for supporting my decision to pursue higher studies abroad.

### CONTRIBUTORS AND FUNDING SOURCES

# Contributors

This work was supervised by a thesis committee consisting of Professor Samuel Palermo (Advisor), Professor Jose Silva-Martinez and Professor Laszlo Kish of the Department of Electrical and Computer Engineering and Professor Rabi N. Mahapatra of the Department of Computer Science and Engineering.

The work discussed in Section 4.2 was done by Takayuki Iwai, visiting scholar from Toshiba Memory Corp., Japan.

# **Funding Sources**

This work was supported by Hewlett Packard Enterprise (HPE), Palo Alto, California.

# TABLE OF CONTENTS

| Section | Title                                     |
|---------|-------------------------------------------|
|         | ABSTRACT                                  |
|         | DEDICATION                                |
|         | ACKNOWLEDGEMENTS                          |
|         | CONTRIBUTORS AND FUNDING SOURCES          |
|         | TABLE OF CONTENTS                         |
|         | LIST OF FIGURES                           |
|         | LIST OF TABLES                            |
| 1.      | INTRODUCTION                              |
| 1.1     | High Speed Serial Links                   |
| 1.2     | Motivation for Optical Links              |
| 1.2.1   | Optical Channels                          |
| 1.2.2   | LASERs                                    |
| 1.2.3   | Semiconductor Lasers                      |
| 1.2.4   | VCSEL Based Links                         |
| 1.3     | Research Contribution                     |
| 1.4     | Thesis Organization                       |
| 2.      | LITERATURE SURVEY                         |
| 2.1     | Transmitter                               |
| 3.      | RESEARCH PROPOSAL                         |
| 3.1     | Limitations of Existing Architectures     |
| 3.2     | Challenges with VCSEL Based Transmitters  |
| 3.2.1   | Effects of Nonlinearity and Low Bandwidth |
| 3.2.2   | Sources of Nonlinearity and Low Bandwidth |
| 3.3     | Proposed Idea                             |
| 3.4     | Coefficients of Nonlinear Equalizer       |
| 3.5     | Optimal Number of Taps for Nonlinear Equalizer |
| 3.6     | Equalization for VCSEL Biased at High Current |
| 4.      | TRANSMITTER ARCHITECTURE                  |
| 4.1     | Fundamental Blocks                        |
| 4.1.1   | Pseudo Random Bit Sequence (PRBS)         |
| 4.1.2   | Look-Up-Table (LUT)                       |
| 4.1.3   | Serializer                                |
| 4.1.4   | Output Driver                             |
| 4.2     | PLL and Input Clock Path                  |
| 4.2.1   | Bypass and PLL Clock Paths                |
| 4.2.2   | Input and CML Buffers                     |
| 4.2.3   | Voltage Controlled Oscillator (VCO)       |
| 4.2.4   | Phase Locked Loop (PLL)                   |
| 4.3     | Transmitter Top Level                     |
| 4.3.1   | Nonuniform Output DAC                     |
| 5.      | MEASUREMENT RESULTS                       |
| 5.1     | Measurement Setup                         |
| 5.2     | Printed Circuit Boards                    |
| 5.3     | Measurement Results                       |
| 6.      | CONCLUSION                                |
|         | REFERENCES                                |

# LIST OF FIGURES

| Figure | Title |
|--------|-------|
| 1.1    | General Optical Link |
| 1.2    | Optical Fiber |
| 1.3    | Current Mode Driver |
| 2.1    | Transmitter Output Stage (Current Mode Driver) |
| 2.2    | 2-Tap FFE Frequency Response |
| 2.3    | TX Equalization with FIR Filter |
| 2.4    | RX Equalization with FIR Filter |
| 2.5    | VCSEL's Equivalent Small Signal Model |
| 2.6    | Electrical and Optical Stages in VCSEL Model |
| 2.7    | Complex Zero Equalizer |
| 2.8    | Block Diagram of PAM-4 Transmitter with 2-Tap Linear FFE |
| 2.9    | Asymmetric Equalizer for 2-Tap NRZ |
| 3.1    | Step Response for all Possible Transitions in 2-Tap PAM4 |
| 3.2    | Effect of VCSEL Nonlinearity on PAM-4 Eye Diagram |
| 3.3    | Current Levels in Output DAC |
| 3.4    | Process of Selecting Optimal Coefficients |
| 3.5    | Output DAC Characterization |
| 3.6    | All Possible Transitions for 6-tap PAM-4 |
| 3.7    | Effect of Nonlinear Equalization on PAM-4 Eye Diagram: (a) No Equalization, (b) 2-Tap Equalization, (c) 2.5-Tap Equalization, and (d) 3-Tap Equalization |
| 3.8    | Vertical and Horizontal Eye Opening Versus Number of Taps |
| 3.9    | Improvement Shown by 2.5-Tap Nonlinear Equalizer |
| 4.1    | PRBS-15 Block and Serialization of Symbols |
| 4.2    | Connections Between PRBS and LUT |
| 4.3    | Mapping of PRBS Bits on LUT for 1 Bit of Output Driver |
| 4.4    | Mapping of PRBS Bits on LUT for 5 Bits of Output Driver |
| 4.5    | Quadrature 8-to-1 MUX and Clocking Unit |
| 4.6    | Direct FIR Architecture |
| 4.7    | Segmented DAC Architecture |
| 4.8    | Output Driver |
| 4.9    | Input Clock Path: (a) Complete Path, (b) Bypass Clock Path, and (c) PLL Clock Path |
| 4.10   | Shunt Inductive Peaking CML Buffer |
| 4.11   | LC-VCO Circuit Diagram Using Two LC Phase Noise Filters |
| 4.12   | Third Order, Type II Charge Pump PLL Diagram |
| 4.13   | Transmitter Architecture |
| 4.14   | Basic Cell of LUT: 32-to-1 Mux |
| 4.15   | Comparison of Uniform and Nonuniform DAC |
| 5.1    | Measurement Setup |
| 5.2    | Printed Circuit Board |
| 5.3    | Placement of Transmitter IC and VCSEL on PCB |
| 5.4    | Micrograph of Transmitter IC |
| 5.5    | Micrograph of VCSEL (a) Philips, (b) VI Systems |
| 5.6    | Bonding between Transmitter IC and VCSEL |
| 5.7    | Measurement Setup in Lab (a) Complete Test Setup, (b) Close-Up Near RF Board |
| 5.8    | (a) Measured L-I Curve, (b) L-I-V Curve from VI Systems VCSEL Datasheet |
| 5.9    | Pulse Responses with Modulation Current of 3mA: (a) $I_{bias}$=2.5mA, (b) $I_{bias}$=4.5mA and (c) $I_{bias}$=6.5mA |
| 5.10   | Pulse Responses with Extinction Ratio of 2: (a) $I_{bias}$=1.5mA, (b) $I_{bias}$=4.5mA and (c) $I_{bias}$=7.5mA |
| 5.11   | NRZ Results: (a) 16Gbps, (b) 18Gbps, and (c) 22 Gbps |
| 5.12   | 40 Gbps PAM4: (a) No Equalization, (b) 2-Tap Equalizer, and (c) 2.5-Tap Equalizer |
| 5.13   | 36 Gbps PAM4: (a) No Equalization, and (b) 2.5-Tap Nonlinear Equalization |
| 5.14   | 44 Gbps PAM4: (a) No Equalization, and (b) 2.5-Tap Equalizer |
| 5.15   | 50 Gbps PAM4: (a) No Equalization, (b) 2-Tap Linear Equalizer, (c) 2.5-Tap Linear Equalizer, and (d) 2.5-Tap Nonlinear Equalizer |
| 5.16   | Power Efficiency vs Data Rate |
| 5.17   | Transmitter Power Breakdown |

# LIST OF TABLES

| Table | Title                                                         |
|-------|---------------------------------------------------------------|
| 2.1   | 2-Tap Equalizer                                               |
| 4.1   | Summary of VCO Design Parameters for 14GHz Resonant Frequency |
| 4.2   | Summary of PLL System Parameters                              |
| 4.3   | Output Clock Jitter and Current Consumption                   |
| 5.1   | Performance Summary and Comparison                            |

#### **1. INTRODUCTION**

#### **1.1 High Speed Serial Links**

The demand for high-speed data transfer has grown because computational capacity has increased significantly without a proportional increase in the number of I/O pins. The number of I/O pins is limited by space constraints, so each pin must carry a higher data rate. This demand for speed and performance has driven a significant increase in the data rate and complexity of today's high-speed systems and interconnects. Electrical copper links are generally used to interconnect processors in multiprocessor systems as well as to connect I/O across relatively long distances [1].

The frequency-dependent losses in electrical (copper) channels, which arise from the skin effect and dielectric loss, give these channels a low-pass nature. As the data rate increases, the channel attenuates a significant fraction of the spectrum of the rectangular pulses that are generally used to transmit data in digital communication. This attenuation in the frequency domain widens the pulses in the time domain at the far end of the channel, so that each pulse spreads into neighboring symbol periods. This phenomenon, called Inter Symbol Interference (ISI), is one of the major limitations of current high-speed serial links [2]. Several equalization schemes at the transmitter and receiver have helped overcome this bandwidth limitation and hence extend the range and performance of electrical (copper) links.
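
As a rough illustration of how a low-pass channel turns one transmitted pulse into ISI, the sketch below passes a pulse that is one unit interval (UI) wide through a single-pole low-pass filter and samples the result once per UI. The single-pole model, the function name and the bandwidth values are assumptions chosen for illustration; a real copper channel has a more complicated loss profile.

```python
import math

def pulse_response_samples(bandwidth_ratio, n_symbols=5):
    """Sample a single-pole low-pass channel's response to a one-UI pulse.

    bandwidth_ratio is the 3 dB bandwidth divided by the symbol rate
    (an assumed, illustrative parameter). The samples are taken at the end
    of each UI: the first is the cursor, the rest are post-cursor ISI.
    """
    # Time constant in units of UI: tau = 1 / (2 * pi * f_3dB).
    tau = 1.0 / (2.0 * math.pi * bandwidth_ratio)
    samples = []
    for k in range(1, n_symbols + 1):
        # Step response is 1 - exp(-t / tau); a pulse is step(t) - step(t - 1).
        rising = 1.0 - math.exp(-k / tau)
        falling = 1.0 - math.exp(-(k - 1) / tau)
        samples.append(rising - falling)
    return samples

wide = pulse_response_samples(bandwidth_ratio=1.0)
narrow = pulse_response_samples(bandwidth_ratio=0.2)
print("bandwidth = 1.0 x symbol rate:", [round(s, 3) for s in wide])
print("bandwidth = 0.2 x symbol rate:", [round(s, 3) for s in narrow])
```

With the narrower bandwidth the cursor sample shrinks and the following samples grow, which is the energy that interferes with later symbols.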

### **1.2 Motivation for Optical Links**

Advances in CMOS technology and the introduction of faster and smaller nodes, e.g., sub-20nm CMOS and FinFET technologies, have enabled CMOS digital circuitry to perform the operations required by serial links (such as data serialization) at higher speeds. The low channel bandwidth of electrical links is therefore the main bottleneck that prevents them from achieving high data rates. This has motivated research into optical links for high-speed data transmission.

In optical interconnects, lasers and photodiodes (PDs) are used as transducers on the transmitter and receiver side, respectively, as shown in Figure 1.1. Lasers convert a current signal into optical power, and photodiodes convert optical power back into current. The transmitter circuitry before the laser and the receiver circuitry after the PD remain the same as in electrical links, except for the driver design on the transmitter side and the transimpedance amplifier (TIA) design on the receiver side.



Figure 1.1: General Optical Link

#### **1.2.1 Optical Channels**

Optical channels are the media through which light travels from one place to another. They are analogous to the copper traces on which electrical signals propagate in electrical links. Light can be transferred over short distances either through free space or through optical fibers.

In free-space transmission, the light coming out of the laser (the optical transmitter) is collimated (made parallel) using a collimator lens. A convex lens can be used for this purpose. After the collimator, the beam has a narrow divergence angle and can therefore travel large distances without spreading significantly. This method can be used for short- and medium-range line-of-sight communication but has some disadvantages. The system is very sensitive to alignment, as a small shift in position can lead to loss of the entire signal if the receiver is far away, and this sensitivity grows as the distance between transmitter and receiver increases. The system is also sensitive to environmental factors such as atmospheric vibrations. Another method of transferring light over short distances, e.g., for chip-to-chip communication, is to use optical fibers as the optical channel. These systems are more robust to environmental disturbances, and alignment is much easier (e.g., for multimode fibers) than in free-space transmission.

An optical fiber consists of several concentric layers, e.g., core, cladding, coating, strength member and outer jacket. Most of the light travels inside the core. For the light to remain inside the core, the condition for total internal reflection (TIR) must be satisfied. TIR occurs if the incidence angle at each point on the core-cladding boundary is greater than the critical angle. This critical angle is determined by the refractive indices of the core and cladding, which are therefore selected so that a reasonable critical angle is achieved. The structure of the fiber ensures that the angle of incidence remains greater than the critical angle unless the fiber is bent sharply, i.e., beyond a predetermined angle. As the light reflects back and forth at the boundary between core and cladding, it remains confined inside the fiber and thus propagates from one end to the other without much loss (per unit length).

Figure 1.2: Optical Fiber
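
The total internal reflection condition can be sketched numerically using Snell's law, under which the critical angle (measured from the normal to the boundary) is the arcsine of the cladding-to-core index ratio. The function names and the refractive indices below are illustrative assumptions, not values taken from this work.

```python
import math

def critical_angle_deg(n_core, n_cladding):
    """Critical angle at the core-cladding boundary, measured from the normal."""
    if n_cladding >= n_core:
        raise ValueError("total internal reflection needs n_core > n_cladding")
    return math.degrees(math.asin(n_cladding / n_core))

def is_guided(incidence_angle_deg, n_core, n_cladding):
    """True when a ray meets the boundary beyond the critical angle."""
    return incidence_angle_deg > critical_angle_deg(n_core, n_cladding)

# Assumed example indices for a silica fiber.
theta_c = critical_angle_deg(1.48, 1.46)
print(f"critical angle: {theta_c:.2f} degrees")
print("ray at 85 degrees guided:", is_guided(85.0, 1.48, 1.46))
print("ray at 70 degrees guided:", is_guided(70.0, 1.48, 1.46))
```

Because the two indices are close, the critical angle is large, so only rays travelling nearly parallel to the fiber axis stay confined.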

Light is an electromagnetic wave and thus obeys Maxwell's equations; combined with the boundary conditions, these equations lead to the Helmholtz equation for waves. Depending on its boundary conditions, a fiber can allow one or more modes to propagate through it. Modes are the different possible solutions of the Helmholtz equation. Since different modes travel at different speeds, they exit the fiber at different times, i.e., a short pulse of light sent into one end of the fiber appears dispersed at the other end. This phenomenon is called modal dispersion and is analogous to ISI in electrical links. Modal dispersion is one of the major factors limiting high-speed data transmission over large distances.
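
A simple ray-optics estimate gives a feel for the size of modal dispersion. The sketch below assumes a step-index fiber and compares the fastest ray (along the axis) with the slowest guided ray (at the critical angle); the function name and index values are illustrative assumptions. Practical multimode fibers are usually graded-index and show much smaller spreads than this bound.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum

def modal_delay_spread_s(length_m, n_core, n_cladding):
    """Ray-optics delay spread of a step-index fiber (an upper-bound sketch)."""
    # Fastest ray: straight along the axis.
    axial_delay = length_m * n_core / C_M_PER_S
    # Slowest guided ray: zig-zags at the critical angle, so its path is
    # longer by the factor n_core / n_cladding.
    steepest_delay = axial_delay * (n_core / n_cladding)
    return steepest_delay - axial_delay

spread_100m = modal_delay_spread_s(100.0, 1.48, 1.46)
print(f"delay spread over 100 m: {spread_100m * 1e9:.2f} ns")
```

The spread grows linearly with length, which is why modal dispersion limits the product of data rate and distance.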

#### **1.2.1.1 Single Mode Fibers**

One of the important boundary conditions that determines the supported modes is the inner (core) diameter of the fiber. As the core diameter is reduced, the fiber supports fewer and fewer modes until only one mode can propagate. Such a fiber is called a Single Mode Fiber (SMF). The core diameter of an SMF is typically less than 10um. Since the diameter is very small, alignment becomes challenging if coupling loss is to be avoided. Light travels in only one mode, so modal dispersion is absent. Single mode fibers can thus support higher data rates over long distances than multi-mode fibers, which are generally limited by modal dispersion. The loss of these fibers is minimum at a wavelength of about 1550nm (~0.2dB/km), and they are generally used with lasers operating at this wavelength.
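
The link between core size and the number of modes is often summarized by the normalized frequency (V-number) of a step-index fiber, which must stay below about 2.405 for single-mode operation. The sketch below applies that textbook criterion; the step-index assumption, the function name and the example dimensions and indices are illustrative and are not taken from this work.

```python
import math

SINGLE_MODE_CUTOFF = 2.405  # first zero of the Bessel function J0

def v_number(core_radius_m, wavelength_m, n_core, n_cladding):
    """Normalized frequency V of a step-index fiber."""
    numerical_aperture = math.sqrt(n_core**2 - n_cladding**2)
    return 2.0 * math.pi * core_radius_m / wavelength_m * numerical_aperture

# Assumed examples: a small core near 1550 nm and a 50 um core at 850 nm.
v_small_core = v_number(4.1e-6, 1550e-9, 1.450, 1.445)
v_large_core = v_number(25e-6, 850e-9, 1.48, 1.46)
print(f"small core: V = {v_small_core:.2f}, single mode: {v_small_core < SINGLE_MODE_CUTOFF}")
print(f"large core: V = {v_large_core:.2f}, single mode: {v_large_core < SINGLE_MODE_CUTOFF}")
```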

### **1.2.1.2 Multi Mode Fibers**

Multi-mode fibers (MMF) have a larger core diameter (~50um) and thus allow more than one mode to propagate. Because of the larger diameter, it is relatively easy to couple light into these fibers. They are generally used with lasers operating at 850nm, so their loss is higher than that of SMFs, but the major problem is modal dispersion. Due to large modal dispersion, these fibers cannot support high data rates over long distances. Due to their lower coupling loss and ease of coupling compared to SMF, they are generally preferred in short- and medium-distance applications such as datacenters.

#### 1.2.2 LASERs

LASER is an acronym for Light Amplification by Stimulated Emission of Radiation. Lasers are essentially optical oscillators. A laser consists of two mirrors that form a cavity, and the length of the cavity determines the frequencies of light that the cavity can support. The light is reflected back and forth between the two mirrors. Since the reflectivity of the mirrors is less than 100%, light that is simply left to bounce between them cannot sustain the oscillation, and the laser stops "lasing". So, like any other oscillator, a laser needs a mechanism to sustain the oscillation: the energy that the light loses in one round trip between the mirrors must be restored. This energy is transferred through the material in the cavity, and different materials give different types of lasers, e.g., gas lasers, chemical lasers, dye lasers, metal-vapor lasers, solid-state lasers and semiconductor lasers. An external source, called the pump source, supplies this energy to the cavity.

For example, in a helium-neon (gas) laser, an electrical discharge acts as the pump source. The energy provided by the electrical discharge excites the electrons in the gas atoms to higher energy states. When light whose photon energy equals the difference between the energy states of the atom travels inside the cavity, it interacts with these atoms. In general, three outcomes are possible. If the density of electrons in the higher state is less than that in the ground state, the atoms absorb energy from the light and the intensity of the light (number of photons) reduces. This phenomenon is called stimulated absorption. If the density of electrons in the higher state is the same as that in the lower state, the light passes through the cavity unaffected. And if the density of electrons in the higher state is larger than that in the lower state, electrons make transitions from the higher energy state to the lower energy state, releasing photons, and the intensity of the light increases. This phenomenon is called stimulated emission and is the basis of all lasers. The state in which the density of electrons in the higher energy state exceeds that in the ground state is called population inversion. Population inversion is achieved by transferring energy from the pump source to the material inside the cavity.

If the length of the cavity, the material in the cavity and the pump source are suitable, the light is amplified. The amplification can be made larger than the loss incurred by the light while travelling in the cavity, so one of the mirrors can be made partly transparent. The additional loss due to this lower reflectivity is compensated by the energy from the external source, and a net light output is obtained from the laser.
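
One common way to make this round-trip argument quantitative is sketched here, assuming a simple two-mirror cavity of length $L$ with mirror power reflectivities $R_1$ and $R_2$, a gain coefficient $g$ and an internal loss coefficient $\alpha_i$ per unit length. These symbols are introduced only for illustration and are not taken from this thesis. Requiring that the optical power reproduce itself after one round trip gives the threshold gain $g_{th}$:

$$R_1 R_2 \, e^{2 (g_{th} - \alpha_i) L} = 1 \quad\Longrightarrow\quad g_{th} = \alpha_i + \frac{1}{2L} \ln\frac{1}{R_1 R_2}$$

Under these assumptions, lowering the reflectivity of one mirror to let light out raises the gain that the pump must sustain, which is the trade-off described above.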

#### **1.2.3 Semiconductor Lasers**

Just as an electrical discharge is used to achieve population inversion in the material inside the cavity of a gas laser, an electrical current can be used to achieve population inversion inside the semiconductor material that acts as the cavity of a semiconductor laser, thus achieving laser operation.

A semiconductor laser has mirrors as reflecting surfaces, and the cavity is formed by the PN junction of a diode. The diode is forward biased and is generally driven by a current source. When the diode is forward biased, the population of electrons in the conduction band rises above the population in the valence band, thus achieving population inversion. When light passes through the PN junction, it stimulates electrons to drop from the conduction band to the valence band, and the energy they release increases the intensity of the light. Effectively, electrical energy is converted into light energy, which can then be transmitted outside.

Depending on the application and the wavelength of light required, different semiconductor materials can be used in the PN diode, e.g., GaN for 0.4um, InGaN for 0.4-0.5um and InGaAsP for 1.0-2.1um. Apart from the material, lasers also differ in how they are manufactured. Types of lasers that are available commercially include Fabry-Perot lasers, Distributed-Feedback (DFB) lasers and hybrid silicon lasers.

Semiconductor lasers can emit light from the edge or vertically (from either the top or the bottom). While edge-emitting lasers have historically been easier to manufacture, they have certain disadvantages: they generally emit light in an elliptical mode and have higher testing and packaging costs. Vertical Cavity Surface Emitting Lasers (VCSELs) emit light vertically and have many advantages over edge-emitting lasers. They can easily be arranged in 2D arrays, they emit light in a circular mode, their testing and packaging costs are lower, and they are smaller (and hence have lower operating currents). Their disadvantage is that they are hard to manufacture because highly reflective mirrors must be grown.

#### **1.2.4 VCSEL Based Links**

Due to the advantages mentioned above, VCSEL based links have recently become the choice for short-haul communication, especially within computing systems and data centers. VCSELs show a linear relationship between current and output light intensity over a wide range and hence are generally current driven. The light output is modulated by modulating the current passing through the VCSEL. Amplitude modulation is generally used and is achieved by changing the amplitude of the current through the VCSEL. A current mode driver, shown in Figure 1.3, is used for this purpose. Non-Return-to-Zero (NRZ) PAM2 (simply called NRZ) is the most commonly used modulation technique in high-speed serial links, both electrical and optical. PAM4 modulation, which uses four power levels and hence transmits two bits per symbol, is also used, especially to achieve higher data rates.



Figure 1.3: Current Mode Driver
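
The idea of carrying two bits per PAM4 symbol and turning each symbol into a drive current can be sketched as follows. The natural binary mapping (MSB first), the evenly spaced current levels, the function names and the example currents are all assumptions made for illustration; they are not the coding or the DAC levels used in this work.

```python
def bits_to_pam4_levels(bits):
    """Group bits in pairs (MSB first) and map each pair to a level 0..3."""
    if len(bits) % 2 != 0:
        raise ValueError("PAM4 needs an even number of bits")
    return [2 * msb + lsb for msb, lsb in zip(bits[0::2], bits[1::2])]

def level_to_current_ma(level, i_bias_ma, i_mod_ma):
    """Evenly spaced drive current between i_bias and i_bias + i_mod."""
    return i_bias_ma + i_mod_ma * level / 3

bits = [0, 0, 0, 1, 1, 0, 1, 1]
levels = bits_to_pam4_levels(bits)
currents = [level_to_current_ma(lv, i_bias_ma=2.5, i_mod_ma=3.0) for lv in levels]
print("levels  :", levels)
print("currents:", currents)
```

Eight bits become four symbols, so for the same symbol rate PAM4 carries twice the data of NRZ, at the cost of smaller spacing between adjacent levels.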

#### **1.3 Research Contribution**

While VCSELs appear to be ideal candidates for optical links, they have their limitations. VCSELs are bandwidth-limited devices, and their bandwidth is a function of the average current flowing through them; because the response thus depends on the input level, the VCSEL is by definition a nonlinear device. Due to this nonlinearity and bandwidth limitation, the response of the VCSEL to rising and falling edges is not the same. Rising edges are underdamped, which makes them fast and causes them to overshoot. Falling edges, on the other hand, are overdamped (or less underdamped) and therefore slower. This effect is not as pronounced in NRZ as it is in PAM-4. The nonlinear and bandwidth-limited behavior manifests itself as misaligned eyes in the eye diagram and reduces both the vertical and horizontal eye opening. The top eye is generally shifted to the left and the bottom eye to the right with respect to the middle eye.

Linear equalizers have traditionally been used to compensate for the limited bandwidth of systems, but they are ideally meant only for linear systems, e.g., electrical channels. These equalizers can be used when the VCSEL is biased at a higher current and can be approximated as linear. When the VCSEL is biased at a lower current, its nonlinearity increases and linear equalizers prove to be sub-optimal, because they cannot compensate for the skew caused by the inherent nonlinearity of the VCSEL. In this work, we use a nonlinear equalizer to compensate for the nonlinear response of the VCSEL. This nonlinear equalizer is implemented using a look-up-table (LUT), and the codes in the LUT are selected to maximize the vertical eye opening. We have implemented a 2.5 tap nonlinear equalizer that uses information from the present symbol, the previous symbol and the MSB of the next symbol. This is equivalent to one main cursor, one post-cursor and half a pre-cursor, hence 2.5 taps. To show the effectiveness of this approach, the VCSEL was biased at a low current to make it more nonlinear. A significant improvement in eye quality was observed after turning on the equalizer compared to the unequalized case. Data rates of 36Gbps and 50Gbps with PAM4 modulation were used for this measurement.

#### **1.4 Thesis Organization**

This thesis is organized as follows. Section 2 discusses the basics of transmitter design, its basic blocks, VCSEL characteristics and the latest architectures in the literature. Section 3 introduces the proposed idea for transmitter design. Section 4 discusses the architecture of the transmitter and details of the major blocks. Section 5 discusses in detail the measurement results. The thesis is then concluded with Section 6 that summarizes the research work along with the potential improvements for future work.

#### 2. LITERATURE SURVEY

### **2.1 Transmitter**

A transmitter sends information in the form of a quantity that can be easily detected by the receiver. It sends electrical signals in the case of electrical links, electromagnetic waves in the case of wireless communication and light in the case of optical communication. The required transmitter swing may vary depending on the application. In general, a transmitter should be able to generate accurate voltage or current levels while also meeting other requirements. For example, an electrical channel typically has a characteristic impedance of 50 Ohms, and thus the output impedance of an electrical transmitter should remain around 50 Ohms in order to attenuate any reflections coming from the channel. In the case of optical transmitters, the laser is generally kept near the transmitter in order to decrease the length of the connection between them. Depending on the length of the connection, we may or may not be able to ignore transmission line effects. If the connection is long and the transmission line effects can't be ignored, we may need to terminate the transmitter in order to attenuate the reflections coming from the laser.

Both current and voltage mode drivers can be used as output drivers in transmitters. Current mode drivers (shown in Figure 2.1) use current mode logic and steer the current between their two arms. The termination is placed in parallel with the driver, and since the driver is high impedance, the effective output impedance is set by the termination resistor alone, except at high frequencies where the parasitics also contribute to the output impedance of the transmitter. With resistor termination, the current that the driver must provide doubles for the same output swing, because half of the current is dissipated in the termination resistor. For this reason, voltage mode drivers are preferred in applications where power efficiency is important. In voltage mode drivers, the output stages are calibrated to have an output impedance of 50 Ohms. The design is more complicated than that of current mode drivers, but the current required can be as low as a quarter of that required by current mode drivers.



Figure 2.1: Transmitter Output Stage (Current Mode Driver)

#### **2.2 Modulation Schemes**

The most common modulation scheme used in serial links is Non-Return to Zero PAM2, also simply referred to as NRZ. Here the information is transmitted by changing the amplitude of the transmitted signal. For example, in a current mode driver for optical transmitters, the light intensity coming out of the laser is proportional to the current flowing through it, and hence the information can be transmitted by changing the amplitude of the current passing through the laser. A high current can represent a symbol '1' and a low current a symbol '0'.

While NRZ is simple, its performance starts to degrade as we approach higher data rates. The Nyquist frequency for NRZ is at $R_b/2$, where $R_b$ is the bit rate. As the bit rate increases, the spectral content at higher frequencies increases, and since the channel is generally band-limited, more and more spectral content is lost, which leads to inter-symbol interference (ISI). Thus, link designers use advanced modulation formats like PAM4, where 4 amplitude levels are used compared to 2 in NRZ. This way, two bits are transmitted in each symbol. For the same data rate, the Nyquist frequency of PAM4 is half that of NRZ, i.e., the bandwidth requirement is halved, and hence the data rate can theoretically be doubled while still using the same bandwidth as NRZ. However, due to the transmitter's peak power limit, the overall amplitude remains the same and 3 eyes now have to be accommodated in the same amplitude range. Thus, the maximum eye opening reduces to 1/3 of that of NRZ. A general rule of thumb says that if the channel loss at the Nyquist frequency of PAM2 exceeds the loss at the Nyquist frequency of PAM4 (one octave lower) by more than 9.54dB ($20\log_{10}3$), PAM4 can theoretically provide better performance.
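The numbers behind this rule of thumb can be reproduced with a few lines of arithmetic. The following is a minimal sketch; the function names are illustrative and not part of the original work, and the 50Gbps figure is simply the data rate quoted in the abstract.

```python
import math

def nyquist_hz(bit_rate_bps, bits_per_symbol):
    """Nyquist frequency, i.e. half of the symbol rate."""
    symbol_rate = bit_rate_bps / bits_per_symbol
    return symbol_rate / 2.0

def pam_eye_penalty_db(levels):
    """Eye-height penalty relative to NRZ for the same peak-to-peak swing.

    A PAM-L signal stacks (L - 1) eyes into the swing that NRZ uses for one.
    """
    return 20.0 * math.log10(levels - 1)

bit_rate = 50e9                    # 50 Gbps
f_nrz = nyquist_hz(bit_rate, 1)    # NRZ carries 1 bit per symbol
f_pam4 = nyquist_hz(bit_rate, 2)   # PAM4 carries 2 bits per symbol
penalty = pam_eye_penalty_db(4)    # 20*log10(3)

print(f"NRZ Nyquist:  {f_nrz / 1e9:.1f} GHz")
print(f"PAM4 Nyquist: {f_pam4 / 1e9:.1f} GHz")
print(f"PAM4 eye penalty: {penalty:.2f} dB")
```

PAM4 is therefore attractive when the extra channel loss between the two Nyquist frequencies is larger than the eye-height penalty.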

#### **2.3 Equalization**

Equalization techniques are generally used to overcome the effects of lower bandwidth of the channel. It increases the maximum data rate by cancelling the ISI caused by distortions from channel. Equalizers can be implemented on both transmitter and receiver side. They can be both linear and nonlinear and can be realized in both discrete and continuous time.

Linear equalizers are generally implemented as FIR filters on either the transmitter or the receiver side. Transmit equalization with FIR filters is the most common form of equalization used in serial links. Here, the signal is predistorted and launched onto the channel. The idea is that the channel can be treated as a linear filter and the transmit equalizer is another filter that, when placed before the channel, can equalize its response. The transmitter "pre-emphasis" filter can be represented in terms of a transfer function, e.g., $H(z) = 1 - a z^{-1}$, where "a" is a positive number. This is a two tap equalizer where the post cursor is weighted by "a" with respect to the main cursor.

Table 2.1: 2-Tap Equalizer

| Present Symbol | Previous Symbol | Filter Output |
|----------------|-----------------|---------------|
| 1              | 1               | 1-a           |
| 1              | -1              | 1+a           |
| -1             | 1               | -(1+a)        |
| -1             | -1              | -(1-a)        |

As can be seen from Table 2.1, the output of the FIR filter depends on the present and the previous symbol. Its magnitude is smaller (1-a) if the present and previous symbols are the same and larger (1+a) if they are different. This means that the signal level increases at every transition, which makes the transitioning bits faster. We can also look at it in the frequency domain and say that the high frequency components (which are dominant when the signal is transitioning) are boosted by the equalizer and thereby compensate for the low pass response of the channel that follows.
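The entries of Table 2.1 can be reproduced by applying $y[n] = x[n] - a\,x[n-1]$ to a short NRZ sequence. This is a minimal sketch; the function name `two_tap_ffe`, the test sequence and the choice of initial history are illustrative assumptions.

```python
def two_tap_ffe(symbols, a, initial_previous=None):
    """Apply y[n] = x[n] - a * x[n-1] to a sequence of +1/-1 symbols.

    If no history is given, the first symbol is assumed to have been
    preceded by an identical symbol (no transition at the start).
    """
    previous = symbols[0] if initial_previous is None else initial_previous
    output = []
    for x in symbols:
        output.append(x - a * previous)
        previous = x
    return output

a = 0.2
data = [1, 1, -1, -1, 1]
levels = two_tap_ffe(data, a)

for x, y in zip(data, levels):
    print(f"symbol {x:+d} -> level {y:+.1f}")
```

Repeated symbols come out at magnitude 1-a, while every transition comes out at magnitude 1+a, matching the four rows of the table.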

Plotting the poles and zeros of the same transfer function gives a pole at z=0 and a zero at z=a=0.2 (assuming a=0.2). Since frequency increases counterclockwise along the unit circle |z|=1, starting from z=1 which represents DC, the zero has a larger effect on the lower frequencies and reduces the gain at these frequencies. The same thing can be seen in the magnitude plot on the right side. This is clearly a high pass response which can compensate for the response of a low pass filter (the channel). Here, we see that the gain of the filter goes above one, but that is generally not practical because of the maximum transmitter power limitation. We thus scale down all the taps in order to meet this requirement. The weight of the main cursor (which is one for the unequalized case) also reduces, and for this reason the transmitter "pre-emphasis" filter is also called a "de-emphasis" filter.
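The high pass behavior and the effect of tap scaling can be checked numerically by evaluating $H(z)$ on the unit circle. The sketch below assumes that the taps are scaled by the sum of their absolute values so that the peak output stays at one; this is a common normalization and is used here only for illustration.

```python
import cmath
import math

def ffe_gain(a, omega, normalize=False):
    """Magnitude of H(z) = 1 - a*z^-1 at z = exp(j*omega).

    omega is in radians per sample: 0 is DC and pi is the Nyquist frequency.
    With normalize=True both taps are divided by (1 + |a|), an assumed
    peak-power normalization.
    """
    h = 1 - a * cmath.exp(-1j * omega)
    scale = 1.0 / (1.0 + abs(a)) if normalize else 1.0
    return abs(h) * scale

a = 0.2
dc_gain = ffe_gain(a, 0.0)                     # 1 - a
nyquist_gain = ffe_gain(a, math.pi)            # 1 + a
dc_gain_scaled = ffe_gain(a, 0.0, normalize=True)
nyquist_gain_scaled = ffe_gain(a, math.pi, normalize=True)
boost_db = 20 * math.log10(nyquist_gain / dc_gain)

print(f"unscaled: DC {dc_gain:.2f}, Nyquist {nyquist_gain:.2f}")
print(f"scaled:   DC {dc_gain_scaled:.3f}, Nyquist {nyquist_gain_scaled:.3f}")
print(f"high-frequency boost: {boost_db:.2f} dB")
```

After scaling, the Nyquist gain is one and the DC gain is below one, which is why the same filter is described as de-emphasis: the relative boost is unchanged, but it is obtained by attenuating the low frequencies.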



Figure 2.2: 2-Tap FFE Frequency Response



Figure 2.3: TX Equalization with FIR Filter

As discussed above, transmit equalizers can be represented in terms of a transfer function in the discrete domain. They are implemented as FIR filters as shown in Figure 2.3. The same filter can also be implemented on the receiver side as shown in Figure 2.4. But since the signal levels are generally very small on the receiver side, we can actually emphasize the signal (or boost the high frequency components) as opposed to de-emphasizing it on the transmitter side. One of the problems faced by linear receiver equalizers is the amplification of high frequency noise and crosstalk along with the incoming signal. Another challenge is the implementation of the analog delay elements, which are often implemented by time-interleaved sample-and-hold stages or through pure analog delays with a large area of passive components. One of the major advantages of a receiver side equalizer is that the filter can be tuned adaptively and optimized depending on the channel. This saves the "back-channel" which the transmitter requires in order to get feedback from the receiver regarding the channel characteristics.

Figure 2.4: RX Equalization with FIR Filter

Linear receiver equalizers can also be implemented in the form of continuous time linear equalizers (CTLE). Here, R-C degeneration is used, which implements a zero in the transfer function of the CTLE. This zero, followed by poles, forms a high pass transfer function which compensates for the low pass characteristics of the channel. While the design of a CTLE is simple and it provides equalization at low area overhead, it has some disadvantages. The CTLE needs to provide gain at the maximum signal data rate, which can limit the maximum achievable data rate. Tuning the parameters of a CTLE can also be a challenging task.

Nonlinear equalizers are generally implemented as decision feedback equalizers (DFE) on the receiver side. A DFE works on the decisions made by the sampler and attempts to remove the ISI directly from the incoming symbol by using the resolved data to control the polarity of the equalization taps. Since the DFE uses quantized values, it does not amplify the noise of the incoming symbol. In this regard, it is better than other receiver equalizers. The disadvantages of the DFE include error propagation, which can occur if the resolved data is wrong. The effect of ISI gets doubled in this case instead of being cancelled, thus increasing the probability of error in the next symbol. Another disadvantage is that the DFE can only work on previous symbols and hence can only remove post cursor ISI. This follows from causality: future symbols are not available to the receiver before the present symbol has been resolved.
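The feedback operation and the ISI doubling on a wrong decision can be illustrated with a one tap NRZ example. This is a minimal sketch under simplifying assumptions: a noiseless channel with a single post cursor of weight `h1`, a DFE tap exactly equal to `h1`, and an assumed initial history symbol; none of these values come from the original work.

```python
def channel(symbols, h1, first_previous=1):
    """Toy channel: main cursor of 1 plus one post cursor of weight h1."""
    previous = first_previous
    received = []
    for x in symbols:
        received.append(x + h1 * previous)
        previous = x
    return received

def dfe_1tap(received, h1, first_previous=1):
    """Subtract the post-cursor ISI estimated from the previous decision, then slice."""
    previous_decision = first_previous
    decisions = []
    for y in received:
        corrected = y - h1 * previous_decision
        decision = 1 if corrected >= 0 else -1
        decisions.append(decision)
        previous_decision = decision
    return decisions

def residual_isi(h1, previous_true, previous_decided):
    """ISI left after feedback: zero when the decision was right, doubled when wrong."""
    return h1 * previous_true - h1 * previous_decided

h1 = 0.6
tx = [1, -1, -1, 1, -1, 1, 1]
rx = channel(tx, h1)
recovered = dfe_1tap(rx, h1)

print("recovered == transmitted:", recovered == tx)
print("residual ISI, correct decision:", residual_isi(h1, 1, 1))
print("residual ISI, wrong decision:  ", residual_isi(h1, 1, -1))
```

With a wrong previous decision the residual ISI is 2·h1, which in this example exceeds the main cursor and would flip the next decision; this is the error propagation mechanism described above.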

#### 2.4 VCSELs

As discussed earlier, VCSELs are semiconductor lasers. Due to advantages such as simple direct modulation, excellent energy efficiency, low packaging cost and their ability to emit light from the top (which allows the formation of 2D grids), VCSEL based links have recently become popular in short-haul optical communication, especially within computing centers and data centers. The PAM2 modulation scheme is generally used to modulate the VCSEL current, but advanced modulation schemes such as PAM4 can also be used to increase the data rate without reducing the operating distance.

### 2.4.1 VCSEL Modelling

VCSEL bandwidth depends on bias and temperature and is determined by a combination of electrical and optical parameters. The electrical parasitics of the VCSEL, together with the electron-photon interaction described by second order rate equations, determine the VCSEL bandwidth. This leads to a bandwidth limitation in the VCSEL, and hence equalization circuitry is used to achieve high data rates. As the bandwidth depends on the bias current (input), the VCSEL displays nonlinear characteristics. Therefore, the equalization circuitry used can be nonlinear (to compensate for the nonlinear characteristics) and is generally complex. Accurate knowledge of the DC, AC and transient behavior of the VCSEL is very important for optimally designing the transmitter drivers. This calls for accurate modelling of VCSELs in an environment compatible with circuit simulators such as Cadence. The model should consider the effect of temperature, bias current, electrical parasitics and the rate equations for electrical to optical conversion. It should provide DC, small signal and large signal simulation capabilities. One such model was developed in [3], where the various regions in a VCSEL are modelled by passive elements. This can be seen in Figure 2.5.

The electrical parasitics can be modelled as shown in the Figure 2.6. The model has two parts: electrical input stage and optical output stage. The first stage models the effect of various regions in VCSEL in terms of resistors and capacitors. The second stage models the electron-photon interaction which is governed by second order rate equations.



Figure 2.5: VCSEL's Equivalent Small Signal Model



Figure 2.6: Electrical and Optical Stages in VCSEL Model

Here, Cp represents the parasitic capacitance between the VCSEL's cathode and anode terminals, Rm models the p and n type distributed Bragg reflectors (DBRs), Ra represents the active region resistance and Ca represents the active region capacitance. The second stage in the figure is the rate equation based optical output stage. The electrical and optical stages are connected by the current that flows through Ra and acts as the input to the optical stage. In the second stage, dN<sub>c</sub>/dt represents the rate of change of the number of charge carriers and dP<sub>c</sub>/dt represents the rate of change of the number of photons. This model accurately predicts the DC, small signal and large signal dynamics and has been used to model the VCSEL in Cadence simulations.

#### **2.5 Existing Architectures**

Many works on VCSEL based transmitters have been published recently. In some of these works, VCSELs are driven directly using pattern generators [4-7]. These architectures were based on both NRZ [4] and PAM-4 [5-7] modulation schemes. Since very good data generators are available, these designs were limited only by the bandwidth of the VCSELs and hence were able to achieve high data rates. But this is generally not the case in practice, where on-chip drivers are designed to drive the VCSELs.

The simplest way to drive a VCSEL is to drive it directly through a current mode driver. This CML driver steers the current either into the VCSEL (if the incoming bit is 1) or into a dummy VCSEL (if the incoming bit is 0) and thus uses PAM-2 amplitude modulation to transmit the data. Such works have been published before [8-10]. The output driver in these designs is very simple, as its inputs depend only on the present symbol.

However, due to the bandwidth limitation of the VCSEL and of the output driver driving it, the data rate is limited for a fixed distance, and equalization is generally required to extend the data rate without reducing the operating distance. Linear equalizers traditionally used for electrical channels have also been used to achieve higher data rates for VCSEL transmitters [11-16]. Linear FIR based transmit equalizers that can equalize the low pass characteristics of VCSELs have been used for a long time. The delay in these equalizers can be one symbol or less. Depending on the delay used, the equalizers can be symbol spaced [11-14] or sub-symbol spaced [15-16]. Similarly, linear equalizers have also been used for the PAM4 modulation scheme [17]. A complex zero equalizer [18], where a pair of complex zeros is used to cancel the pair of complex poles in the electrical-to-optical transfer function, was published recently. Equalizers that compensate for group delay variation [19-20] have also been published.

Some equalizers that identify the asymmetric nature of the VCSEL and compensate for it using nonlinear (or asymmetric) equalization have been published recently [21-25]. Asymmetric equalizers can control the width, delay and magnitude of the equalization current provided at each transition. They can also detect rising and falling edges and compensate for them differently. This type of equalizer has been implemented for NRZ [25].

#### **2.5.1 Linear Equalizers for NRZ**

[14] reports a VCSEL based optical link. The transmitter here uses NRZ modulation and a linear 2 tap FFE to achieve 71 Gbps of data rate. This architecture was implemented in 130nm SiGe technology.

Another work [18] that uses a complex zero equalizer to cancel the effect of the complex poles in the electrical-to-optical transfer function was reported recently. The complex poles in the VCSEL transfer function lead to ringing (and overshoot) in its optical output, which results in eye closure. This equalizer tries to eliminate the ringing in the VCSEL response by using a filter with complex zeros and real poles in cascade with the VCSEL (which has complex poles). The equalizer thus cancels the complex poles of the VCSEL and changes the response from underdamped to overdamped. Figure 2.7 shows the basic idea behind the complex zero equalizer based transmitter.

#### 2.5.2 Linear Equalizers for PAM4

Figure 2.7: Complex Zero Equalizer

A 40 Gb/s PAM-4 transmitter IC for long-wavelength VCSEL links was reported in [17]. It uses a Single Mode Fiber (SMF) to eliminate the effect of modal dispersion and achieves data rates of 28Gbps with NRZ and 40Gbps with PAM4 over a distance of 1km. As shown in Figure 2.8, this work uses a conventional FFE for equalization. Two current sources, representing the MSB and LSB of the output driver, are used for the main cursor. The weight of the MSB tail current source is kept at twice that of the LSB. The tail currents for the post cursor are likewise kept in the ratio of 1:2 (LSB to MSB). Since the equalizer here can't distinguish between rising and falling edges, it is essentially a linear equalizer.

#### **2.5.3 Nonlinear Equalizers for NRZ**

Figure 2.8: Block Diagram of PAM-4 Transmitter with 2-Tap Linear FFE

A 20Gbps 0.77pJ/bit VCSEL transmitter with nonlinear equalization in 32nm SOI CMOS was presented in [25]. This work uses the NRZ modulation scheme and a nonlinear equalizer to compensate for the nonlinear response of the VCSEL. As can be seen in Figure 2.9, the output driver employs rise and fall detectors, which give the system the ability to equalize rising and falling edges differently. Since it can differentiate between rising and falling edges and equalize them differently, it is essentially a nonlinear equalizer. The design also employs a tunable analog delay instead of a fixed symbol delay. Thus, by controlling parameters such as the equalization current for the rising edge, the equalization current for the falling edge and the delay in the equalizer path, the system can compensate for the nonlinear band limited response of the VCSEL. The electrical-to-optical transfer function is used to find these equalization coefficients. The coefficients are selected to cancel the peak in the electrical-to-optical transfer function. This method can, in theory, reduce the underdamped response of the VCSEL and remove the ringing from the transient response. This leads to increased eye opening compared to the unequalized case.

Figure 2.9: Asymmetric Equalizer for 2-Tap NRZ

### **3. RESEARCH PROPOSAL**

#### **3.1 Limitations of Existing Architectures**

The previous section discusses various architectures that are currently in use for VCSEL based transmitters. Earlier works used linear equalizers, similar to the ones used in electrical links, to compensate for the bandwidth limitation of VCSELs. VCSEL transmitters employing linear equalizers were published for both NRZ and PAM4 modulation schemes and were successful in extending data rates. The VCSEL was biased at high current in these architectures to get higher bandwidth. Since the VCSEL is relatively linear at higher bias currents, biasing it at high current also helped in reducing its nonlinearity apart from extending the bandwidth. These equalizers worked well for high bias current settings, but a high bias current may not always be feasible. For example, for a fixed maximum current of a VCSEL device, if we want to increase the optical modulation amplitude (OMA), the bias current must be reduced to accommodate the swing. Also, it is not a good idea to bias the VCSEL near the roll-over current, as the characteristics of the VCSEL change with temperature and we may see compression, especially in the top eye of PAM4.

As discussed in the last section, a VCSEL based transmitter using a complex zero equalizer, which compensates for the complex poles in the current-to-power transfer function to change the system characteristics from underdamped to overdamped, has also been published. This system again uses a linear approximation to set the complex zeros, but since the actual input is amplitude modulated, the linearity assumption fails and hence this method is not optimal.

Recently, nonlinear equalizers designed to compensate for the nonlinear and bandwidth limited behavior were published. Asymmetric (or nonlinear) equalizers were used to compensate for rising and falling edges differently. But the method of finding the coefficients there was based on the fact that the current-to-power transfer function can be modelled as second order for small changes (as the system can be assumed linear for small changes). Using such an approach to find the coefficients in the case of amplitude modulation is not correct, since the swing is generally very large and the system can't be approximated as linear. While it may give reasonable results for NRZ if the optical modulation amplitude (OMA) is small, it is definitely not the best approach for PAM-4.

#### **3.2 Challenges with VCSEL Based Transmitters**

### **3.2.1 Effects of Nonlinearity and Low Bandwidth**

Assumption of linearity allows for simple analysis of a system. The response of a bit stream in a linear system can simply be calculated by superimposing the pulse responses of individual bits. Electrical channel is a typical example of Linear Time Invariant (LTI) system. Even though the system is bandwidth limited and hence has memory, principle of superposition is still valid and therefore we can use the pulse response to calculate the response of a random bit stream.

The VCSEL, on the other hand, is a nonlinear system, i.e., superposition is not valid. The system is also bandwidth limited, as the transfer function from current input to power output shows second order low pass characteristics. The nonlinearity of VCSELs has been discussed and modelled in detail in [3] and [25]. It can be seen by plotting the step responses of all rising and falling transitions between two consecutive symbols. Out of the 16 possible combinations of two consecutive symbols, 6 are rising, 6 are falling and 4 are non-transitioning. Figure 3.1 shows the output for the 12 transitions (rising and falling). The offset has been removed from all responses so that they start from 0mW. They are then normalized so that the DC power after making the transition is 1mW.
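The count of 6 rising, 6 falling and 4 non-transitioning combinations, and the normalization applied to each step response, can be sketched as follows. The function names are illustrative, and the normalization helper assumes that the last sample of a trace has already settled to its final DC value.

```python
from itertools import product

PAM4_LEVELS = range(4)  # 0 is the lowest level, 3 the highest

def classify(previous, present):
    """Classify a pair of consecutive PAM4 symbols."""
    if present > previous:
        return "rising"
    if present < previous:
        return "falling"
    return "non-transitioning"

counts = {"rising": 0, "falling": 0, "non-transitioning": 0}
for previous, present in product(PAM4_LEVELS, repeat=2):
    counts[classify(previous, present)] += 1

def normalize_step(trace_mw):
    """Remove the starting offset and scale so the settled value is 1.

    Assumes trace_mw[0] is the initial DC power and trace_mw[-1] the
    settled power after the transition; falling edges also map to 0 -> 1.
    """
    start, end = trace_mw[0], trace_mw[-1]
    return [(p - start) / (end - start) for p in trace_mw]

print(counts)
print(normalize_step([2.0, 1.5, 1.0]))
```

Because falling edges are divided by a negative power step, rising and falling responses end up on the same 0 to 1 scale and can be compared directly.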



Figure 3.1: Step Response for all Possible Transitions in 2-Tap PAM4

Here, 0 is the lowest and 3 is the highest level of PAM4. I0, I1, I2 and I3 indicate the currents corresponding to the four PAM4 levels. The legend on the right side color codes the responses in the graph. It can be noted that for rising edges, the response is faster and underdamped compared to falling edges, where the response is more overdamped and sluggish. This effect is also visible in the eye diagram in Figure 3.2, plotted with random data.



Figure 3.2: Effect of VCSEL Nonlinearity on PAM-4 Eye Diagram

#### **3.2.2 Sources of Nonlinearity and Low Bandwidth**

There are various sources of nonlinearity in the VCSEL transmitter. The output driver adds static nonlinearity due to channel length modulation in the tail current sources of the CML DAC. As the current in the VCSEL increases, the voltage drop across it also increases. This is due to the large series resistance of the VCSEL, which can be as large as 100 Ohms. Since the supply is fixed, this drop reduces the voltage available for the CML DAC current mirrors. As the voltage at the drain of a current source goes down, the current passing through it decreases due to channel length modulation. Thus, if a 2 bit binary DAC is used for driving the VCSEL, the top eye generally gets compressed, adding static nonlinearity to the transmitter. The other source of nonlinearity is the bias current and temperature dependence of the VCSEL parameters. Since these parameters change dynamically, they add dynamic nonlinearity to the VCSEL's current-to-power transfer function.

The two major sources of bandwidth limitation are the pole at the output node of the transmitter, formed by the capacitance at that node and the series impedance of the VCSEL, and the limited bandwidth of the VCSEL itself. The capacitance at the output node of the driver, which includes the routing parasitics, the drain capacitance of the input switches, the pad capacitance and other parasitic capacitances, becomes substantially large. This, combined with the large series resistance of the VCSEL, creates a pole at the output which limits the signal current that goes into the VCSEL. This is the major source of bandwidth limitation in older technologies and can be improved by moving to faster technologies. The other source of bandwidth limitation is the VCSEL itself. The current-to-power transfer function of a VCSEL can be expressed as a second order low pass transfer function for small changes. This low pass characteristic limits the bandwidth of the VCSEL. Advancements in VCSEL technology have resulted in the production of high bandwidth VCSELs. VCSELs with bandwidths of ~20GHz have already been developed.
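A first-order estimate of the output node pole illustrates why this node matters. The sketch below treats the node as a single RC and ignores the finite output impedance of the driver. The 100 Ohm resistance is the upper figure quoted earlier for the VCSEL; the 150fF node capacitance is a purely hypothetical value chosen for illustration.

```python
import math

def output_pole_hz(resistance_ohm, capacitance_farad):
    """First-order RC pole frequency: f = 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_farad)

r_vcsel_ohm = 100.0     # series resistance, upper figure quoted in the text
c_node_farad = 150e-15  # hypothetical: routing + switch drains + pad

f_pole = output_pole_hz(r_vcsel_ohm, c_node_farad)
print(f"output pole: {f_pole / 1e9:.2f} GHz")
```

Under these assumed values the electrical pole falls near 10.6GHz, which is below the ~20GHz VCSEL bandwidth mentioned above, so the driver output node rather than the laser would set the overall bandwidth.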

#### **3.3 Proposed Idea**

A nonlinear equalization technique for PAM4 modulation scheme is proposed in this research work. The main idea behind this method is to extend the idea of nonlinear equalization, which was only limited to NRZ systems, to PAM4 systems as well and to introduce a method of finding the coefficients of nonlinear equalizer. This design can compensate for nonlinearity and bandwidth limitation arising because of multiple sources (discussed earlier). VCSEL biased at higher current can be approximated as linear system and this method of nonlinear equalization reduces to traditional linear equalization in that case.

As discussed before, the VCSEL is generally biased at high currents to reduce its nonlinearity and increase its bandwidth. But there can be scenarios where it needs to be biased at low current. As the maximum current through the VCSEL is fixed, we may need to bias it at low current to increase the OMA. The characteristics of the VCSEL also change with temperature, and it may not be a good idea to bias it near the roll-over current. Low current also reduces the power consumption and increases the mean time to failure. If the VCSEL is driven at lower current, however, its inherent nonlinearity increases and its bandwidth degrades. As discussed before, the CMOS driver further degrades both bandwidth and linearity. Because of these nonlinearities, the rise and fall times of the VCSEL begin to differ.

This effect is more pronounced in PAM4 than in NRZ and its effect can be seen on eye diagram. The three eyes of PAM4 stop being aligned. The top eye generally shifts towards left and bottom eye towards right. This reduces the horizontal and vertical opening of PAM4 eyes.

To improve this behavior, we propose a new method of nonlinear equalization where we treat VCSEL as a nonlinear bandwidth limited system and try to compensate for its response using a nonlinear filter before that. The proposed method tries to align the eyes and maximize the vertical opening thus improving both horizontal and vertical eye opening. It does that by detecting the transitions and equalizing for them separately. Since this method works on maximizing the vertical eye opening like many other linear equalizers in electrical channels (e.g., MMSE, Zero Forcing etc), this method of nonlinear equalization gives similar results for linear systems as obtained from the equalizers mentioned above.

The rising edges are generally fast and tend to overshoot, while falling edges are slow. This leads to nonaligned eyes. The rising edges thus need to "slow down" and falling edges need to "speed up" in order to achieve alignment. Depending on the memory in the equalizer, the nonlinear equalizer can detect a number of transitions and equalize each of them differently, e.g., a 2 tap nonlinear equalizer using information from the present symbol and the previous symbol can detect 16 (4 possible present symbols \* 4 possible previous symbols) possible transitions and can thus equalize all of them differently. In general, an N-tap PAM4 nonlinear equalizer can detect 4<sup>N</sup> different transitions and equalize each of them differently. Thus, the rising edges can be slowed down by reducing the current on detection of a rising edge, and falling edges can be made faster by again reducing the current on detection of a falling edge. This method provides the flexibility to increase or decrease the current corresponding to the present symbol and to control the magnitude of this change. This functionality involves treating rising and falling edges differently and thus can't be performed by traditional linear equalizers.
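The pattern counts can be enumerated directly. The sketch below counts the patterns for a full 2 tap equalizer and for the 2.5 tap case, where the half tap is taken to be the MSB of the next symbol as described in Section 1.3; the helper name `pattern_count` is illustrative.

```python
from itertools import product

def pattern_count(full_taps, extra_bits=0):
    """Distinct input patterns: 4 per full PAM4 tap, 2 per extra single bit."""
    return 4 ** full_taps * 2 ** extra_bits

# Every (previous symbol, present symbol, MSB of next symbol) combination
# that a 2.5 tap equalizer can tell apart.
patterns_2p5_tap = list(product(range(4), range(4), range(2)))

print("2 tap patterns:  ", pattern_count(2))
print("2.5 tap patterns:", pattern_count(2, extra_bits=1))
print("enumerated:      ", len(patterns_2p5_tap))
```

The 2.5 tap count of 32 equals $4^{2.5}$, the number of separately equalized cases quoted in the abstract.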

The nonlinear filter (equalizer) here has been implemented using look-up-table (LUT) approach. The equalizer coefficients are entered from outside into volatile memory which are then selected depending on the symbols to be transmitted. This operation is performed at low speeds where it is relatively easy to look at the neighboring symbols and select the required code from the LUT. The codes generated from LUT are serialized to high speed using serializer and are then fed to output driver which in turn drives the VCSEL.

### **3.4 Coefficients of Nonlinear Equalizer**

Before understanding how to find the coefficients of the nonlinear equalizer, we must first understand its implementation. As discussed before, the nonlinear equalizer coefficients are stored in the memory of a LUT, and depending on the present and neighboring symbols, a code is selected from the LUT. This code is then serialized to high speed and fed to the output driver, which in turn drives the VCSEL. The output driver consists of a 5 bit DAC, of which the first two bits form the Symbol DAC and the last three bits form the Equalization DAC. The input to the 2 bit Symbol DAC is the same as the 2 bit input (present) symbol, with the MSB of the present symbol driving the MSB of the Symbol DAC and the LSB of the present symbol driving the LSB of the Symbol DAC. This part is similar to a traditional output driver with no equalization.

Equalization functionality is added by the 3 bit Equalization DAC, the inputs to which come from the LUT. Depending on the present and neighboring symbols, a predefined code is selected from the LUT and fed to the 3 bits of the Equalization DAC. The Symbol DAC provides the current to generate the four symbol levels, and the Equalization DAC generates 8 ($2^3$) equalization levels around each PAM4 level. This can be seen in Figure 3.3.

The 2 bits in the Symbol DAC are binary weighted and thus produce four equally spaced levels. The Equalization DAC is also binary weighted and generates 8 levels which are also equally spaced. Although the Symbol DAC and the Equalization DAC are each binary weighted, the entire output DAC (Symbol + Equalization) is not. This is because, depending on the bias current, data rate or other operating conditions, the range of current required from the Equalization DAC can be smaller than the LSB of the Symbol DAC. If the required range of the Equalization DAC is indeed smaller than the LSB of the Symbol DAC, we can reduce the LSB size of the Equalization DAC to improve its resolution.
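The resolution benefit of the nonuniform DAC can be illustrated with a simple current model. This is a sketch under stated assumptions: the Symbol DAC and Equalization DAC currents are assumed to simply add on top of a bias current, and all current values (`I_BIAS_MA`, `I_SYM_LSB_MA`, `I_EQ_LSB_MA`) are hypothetical numbers chosen for illustration, not values from the implemented chip.

```python
I_BIAS_MA = 1.0     # hypothetical bias current
I_SYM_LSB_MA = 2.0  # hypothetical Symbol DAC LSB (spacing of PAM4 levels)
I_EQ_LSB_MA = 0.2   # hypothetical Equalization DAC LSB, set independently

def dac_current_ma(symbol, eq_code):
    """Total output current for a 2 bit symbol and a 3 bit equalization code.

    Assumes the two binary weighted DACs add linearly.
    """
    if not 0 <= symbol <= 3:
        raise ValueError("symbol must be a 2 bit value")
    if not 0 <= eq_code <= 7:
        raise ValueError("eq_code must be a 3 bit value")
    return I_BIAS_MA + symbol * I_SYM_LSB_MA + eq_code * I_EQ_LSB_MA

levels = sorted({round(dac_current_ma(s, e), 6)
                 for s in range(4) for e in range(8)})

# A uniform 5 bit binary DAC with the same symbol spacing would be forced
# to use an equalization step of one eighth of the Symbol DAC LSB.
uniform_eq_lsb_ma = I_SYM_LSB_MA / 8
eq_range_ma = 7 * I_EQ_LSB_MA

print("distinct output levels:", len(levels))
print("equalization range (mA):", round(eq_range_ma, 3))
print("nonuniform step:", I_EQ_LSB_MA, "mA vs uniform step:", uniform_eq_lsb_ma, "mA")
```

With these assumed numbers the equalization range (1.4mA) is smaller than the Symbol DAC LSB (2mA), so the equalization step can be made finer than the 0.25mA a uniform binary DAC would provide, while the code range between symbol groups that would never be used is skipped.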

Now, we can understand the process of selecting the optimal coefficients. We first assume that the system has a memory of N symbols (including present, past and future symbols) and that the system can be modelled using a Volterra series of order M. This method of modelling is similar to the one used in [26]. The model expresses the output power of the VCSEL, sampled at an optimal instant (at the center of the middle eye), in terms of present, past and future symbols as shown in Equation 1. Here, we assume that the total memory in the system is N and that an order of 3 is sufficient to model the nonlinearity. A memory of N implies that there can be 4<sup>N</sup> distinct input symbol patterns (each of length N) that can produce distinct responses at the VCSEL output.

Figure 3.3: Current Levels in Output DAC

Now, if we also use an N tap nonlinear equalizer, by definition it can differentiate between 4<sup>N</sup> different input symbol patterns (each of length N) and can equalize for them separately. Equalization here is done by first detecting the input symbol pattern. The equalization code is then selected from the LUT. This process maps the 2 bit present symbol (unequalized) to a 5 bit code (equalized). Depending on this 5 bit code, one of the 8 levels around the symbol level is selected, which modifies the VCSEL response at the output.

If we ignore the quantization error of the DAC, we can select the $4^{N}$ coefficients of the equalizer such that the optical response of each of the $4^{N}$ input symbol patterns passes through the DC power levels at the optimal sampling instant. The DC power levels here are the power levels at the VCSEL output generated when a symbol is transmitted repeatedly. These power levels are represented by P<sub>0</sub>, P<sub>1</sub>, P<sub>2</sub> and P<sub>3</sub> in Figure 3.4. Here, P<sub>0</sub> is the power at the output of the transmitter produced when the symbol 0 ("00") is transmitted repeatedly. Corresponding to these DC power levels, we can also define DC current levels, i.e. I<sub>0</sub>, I<sub>1</sub>, I<sub>2</sub> and I<sub>3</sub>, and input codes to the DAC, i.e. C0, C1, C2 and C3. The optimal sampling point is the instant in time where we want to set the maximum vertical eye opening after equalization. It can be selected as the center of the middle (unequalized) eye.

If the optical response of all the 4<sup>N</sup> symbol patterns passes through these power levels (P<sub>0</sub>, P<sub>1</sub>, P<sub>2</sub> or P<sub>3</sub>) at the optimal sampling point, we can theoretically get 100% vertical eye opening. Since the maximum eye opening for all three eyes then occurs at the same time (the optimal sampling instant), the eyes are also horizontally aligned, which maximizes the horizontal eye opening as well.

To find the 4<sup>N</sup> coefficients of an N tap equalizer in a VCSEL system with a memory of N symbols, we consider each input symbol pattern (of length N) and observe the response at the VCSEL output. The current corresponding to all symbols except the present symbol is held fixed at the DC codes, and the input DAC code corresponding to the present symbol is swept. For example, let us assume that the memory in the system is 3 symbols (1 present, 1 last and 1 next symbol) and that the PAM4 levels are 0, 1, 2 and 3 from bottom to top. I<sub>0</sub>, I<sub>1</sub>, I<sub>2</sub> and I<sub>3</sub> are the DC current levels and P<sub>0</sub>, P<sub>1</sub>, P<sub>2</sub> and P<sub>3</sub> are the corresponding DC power levels. This is shown in Figure 3.4. Let us assume that the codes 00100, 01100, 10100 and 11100 produce DC power levels P<sub>0</sub>, P<sub>1</sub>, P<sub>2</sub> and P<sub>3</sub> respectively at the VCSEL output, where the first bit is the MSB and the last bit is the LSB in each code, i.e. C0=00100, C1=01100, C2=10100 and C3=11100.

To find the optimal coefficient for a particular symbol transition, e.g. $S^{-1} = 0$, $S^{0} = 2$, $S^{1} = 3$, where 0 is the last symbol, 2 is the present symbol and 3 is the next symbol, we observe the VCSEL output for 8 cases. In all of them, $I^{-1}$ is the current corresponding to a code of 00100, i.e. $I_0$, and $I^1$ is the current corresponding to a code of 11100 (DC symbol 3), i.e. $I_3$, while $I^0$ varies from the minimum to the maximum current possible for symbol 2 by varying the output DAC code from 10000 to 10111 (i.e. 8 possible values). As the code corresponding to the present symbol is swept, different responses are obtained at the VCSEL output. We select the one which produces the output power closest to the DC power level P<sub>2</sub> at the sampling instant. For example, in Figure 3.4, the eight inputs corresponding to 8 different present currents in the first plot produce eight different responses (shown in the second plot). The green one produces the least error compared to P<sub>2</sub> when the comparison is made at the optimal sampling instant. Thus, the optimal coefficient for this particular input symbol pattern is the input code corresponding to the green waveform.

The same process can be repeated for all 64 (4<sup>3</sup>) possible input symbol patterns. Thus, this method provides a way to find the optimal coefficients for an N tap nonlinear equalizer. All codes before $C^{-1}$ can be fixed at 00100 and all codes after $C^{1}$ can be fixed at 11100 to remove the effect of ISI from symbols outside the three symbols under consideration.
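The sweep described above can be sketched in a few lines for the 3 symbol memory example. Everything below is illustrative: `select_coefficients`, `toy_current` and `toy_simulate` are hypothetical names, and the toy linear ISI model merely stands in for the transient simulation (or measurement) of the VCSEL response.

```python
from itertools import product


def select_coefficients(simulate, dc_codes, dc_power):
    """Return {(prev, present, next): best 5-bit code} for a 3-symbol memory.

    simulate(prev_code, present_code, next_code) must return the optical
    power at the sampling instant for the given DAC codes.
    """
    table = {}
    for prev, cur, nxt in product(range(4), repeat=3):
        # Neighbours are held at their DC codes; only the present code is swept.
        candidates = [(cur << 3) | eq for eq in range(8)]
        table[(prev, cur, nxt)] = min(
            candidates,
            key=lambda c: abs(simulate(dc_codes[prev], c, dc_codes[nxt]) - dc_power[cur]),
        )
    return table


def toy_current(code):
    # Assumed LSB sizes in mA: 2.0 for the Symbol DAC, 0.25 for the Equalization DAC.
    return 2.0 * (code >> 3) + 0.25 * (code & 0b111)


def toy_simulate(prev_code, cur_code, next_code):
    # Hypothetical linear ISI model, NOT the VCSEL model used in this work.
    return (0.1 * toy_current(prev_code)
            + 0.8 * toy_current(cur_code)
            + 0.1 * toy_current(next_code))


dc_codes = [0b00100, 0b01100, 0b10100, 0b11100]        # C0..C3 from the text
dc_power = [toy_simulate(c, c, c) for c in dc_codes]   # P0..P3 of the toy model
table = select_coefficients(toy_simulate, dc_codes, dc_power)
```

In this toy model the pattern (0, 2, 3) needs slightly more present current than the DC code, because the low previous symbol pulls the sampled power down more than the high next symbol pushes it up.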



Figure 3.4: Process of Selecting Optimal Coefficients

Equivalently, we can also look at the VCSEL as a Volterra system with memory N. Assuming an order of 3 is enough to capture the nonlinearity in the VCSEL, we can represent the VCSEL with Equation (1). Here, x(n-i) represents the symbols and y(n) represents the power level at the output of the VCSEL at the optimal sampling point. Similar to the VCSEL, we can also represent the nonlinear equalizer with another Volterra series and find its coefficients such that the cascade of the equalizer and the VCSEL produces no ISI at the output. This method corrects for nonlinear bandwidth limited behavior.

$$y(n) = \sum_{i_1=0}^{N} b_1(i_1)\,x(n-i_1) + \sum_{i_1=0}^{N} \sum_{i_2=i_1}^{N} b_2(i_1,i_2)\,x(n-i_1)\,x(n-i_2) + \sum_{i_1=0}^{N} \sum_{i_2=i_1}^{N} \sum_{i_3=i_2}^{N} b_3(i_1,i_2,i_3)\,x(n-i_1)\,x(n-i_2)\,x(n-i_3)$$
(1)

Where, x(n) is the present symbol,

x(n-k) is the kth sample before the present symbol,

y(n) is the power at VCSEL output at an optimal instant,

N is the memory of nonlinear system,

and $b_k(i_1, i_2, \ldots, i_k)$ is the Volterra kernel of degree k.
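As a rough illustration, Equation (1) can be evaluated numerically as shown below. The function name `volterra3`, the dictionary representation of the kernels and the kernel values in the example are assumptions made for this sketch; they are not fitted to any VCSEL.

```python
def volterra3(x, n, b1, b2, b3, N):
    """Evaluate the third-order Volterra series of Equation (1) at index n.

    x is the symbol sequence; b1, b2 and b3 are dicts keyed by i1,
    (i1, i2) and (i1, i2, i3). Missing keys count as zero kernel entries.
    """
    if n - N < 0 or n >= len(x):
        raise IndexError("need N samples of history before index n")
    y = 0.0
    for i1 in range(N + 1):
        y += b1.get(i1, 0.0) * x[n - i1]
        for i2 in range(i1, N + 1):
            y += b2.get((i1, i2), 0.0) * x[n - i1] * x[n - i2]
            for i3 in range(i2, N + 1):
                y += b3.get((i1, i2, i3), 0.0) * x[n - i1] * x[n - i2] * x[n - i3]
    return y


# Made-up kernels: linear ISI plus small quadratic and cubic terms.
b1 = {0: 1.0, 1: 0.5}
b2 = {(0, 0): 0.1}
b3 = {(0, 0, 1): 0.01}
y = volterra3([2, 3], n=1, b1=b1, b2=b2, b3=b3, N=1)
```

The nested loops follow the index ranges of Equation (1) directly, so each unordered combination of delays is counted once.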

Finding the equalizer coefficients analytically, as closed-form expressions of the Volterra kernels, is very challenging, and even if such a solution is found it may not give much insight. A practical method is therefore needed to solve this problem.

Let us take the previous example where we have a VCSEL with a memory of 3 symbols (last, present and next symbols). Also, assume that it can be represented using a Volterra series as shown in Equation 1. Here, x(n-1) represents the previous symbol, x(n) represents the present symbol, x(n+1) represents the next symbol and y(n) represents the value of the VCSEL output at the optimal instant. The values of the inputs can be 0, 1, 2 or 3 depending on the input symbol. Thus, in this case x(n-1)=0, x(n)=2 and x(n+1)=3. Putting these values in Equation (1) gives the value of y(n). Now, this value of y(n) changes with x(n), and a solution for x(n) can be found that makes y(n) equal to P<sub>2</sub>. Note that x(n-1) and x(n+1) have been fixed and were not changed here. The reason is that, for a random incoming pattern, the input levels corresponding to these symbols also vary due to equalization but can be averaged to the DC symbol levels.

If the current corresponding to the solution for x(n) is made to flow through the VCSEL, maximum vertical and horizontal eye openings can be achieved. Since we use a DAC at the output, this real-valued solution for x(n) needs to be quantized to one of the levels of the DAC. This DAC is the 5 bit output DAC in the output driver. This method thus reduces to the previous method of finding the coefficients by sweeping the Equalization DAC inputs and choosing the value which results in maximum vertical eye opening. The coefficients (in the form of 5 bit DAC codes) found from either of these methods define the equalizer, and when this equalizer is cascaded with the VCSEL, it produces outputs whose difference from the DC power level at the optimal sampling instant is minimized. Thus, this method maximizes the vertical eye opening and also helps in aligning the eyes.

# 3.5 Optimal Number of Taps for Nonlinear Equalizer

In the previous section, we discussed the method of finding the optimal coefficients for an N tap equalizer. We also saw that the number of coefficients in an N tap equalizer is $4^{N}$ and thus increases exponentially with the number of taps. This means that the hardware complexity of the LUT increases exponentially with the number of taps. On the other hand, as the number of taps increases, the equalizer has a larger memory and its ability to cancel the effect of ISI improves. Hence, there exists a trade-off between hardware complexity and the amount of ISI that can be cancelled by the equalizer, and it is important to find the number of taps which provides good eye opening with minimum complexity.

Here, we assume that the system has a memory of N and try to find 4<sup>N</sup> coefficients for the equalizer. We then sweep the value of N from 1 to 6, 1 being the case for no equalization. First, the DAC is characterized by running a DC sweep. The input of DAC is fixed to one code and is then allowed to settle. The code is changed at an interval of 1ns and then allowed to settle after every change. This process is repeated for DAC inputs from "00000" to "11111".

We also observed from simulations that the 0 level needs equalization only on the lower side, i.e. transitions reaching the 0 level are always slow and need to be sped up. Thus, the current required while making a transition to the 0 level is always lower than I<sub>0</sub> (DC current level 0). The DC power level corresponding to symbol 00 is therefore selected as the power level corresponding to code "00111". The DC power level corresponding to symbol 11 is selected in the middle, at the power level corresponding to code "11011", because it was observed from simulations that for transitions reaching level 3 the equalization current is required in both directions. The DC levels for symbols 01 and 10 are selected such that the difference in power levels of adjacent symbols is the same. The DAC codes corresponding to these power levels were found to be "01101" and "10011". This can be seen in Figure 3.5.

If we want to find the coefficients of a 6-tap equalizer ($4^6$ different coefficients), we need to look at the output response of all $4^6$ symbol patterns. Additionally, we need to evaluate 8 cases corresponding to the 3 Equalization bits (for the present symbol) in each symbol pattern. Thus, we need to evaluate a total of $4^6 \times 8$ transitions. These transitions are shown in Figure 3.6. For a memory of 6 symbols (including the present one), each symbol can take 4 different values except for the present one, which can take 32 values (4 symbol levels times 8 equalization codes).

Figure 3.5: Output DAC Characterization

Figure 3.6: All Possible Transitions for 6-tap PAM-4

As an example, consider a case with symbol pattern $0 \rightarrow 1 \rightarrow 1 \rightarrow 2 \rightarrow \text{"3"} \rightarrow 2$ (the present symbol is shown in quotes). Here, the present symbol is 11 (3), the four symbols before it were, most recent first, 10 (2), 01 (1), 01 (1) and 00 (0), and the next symbol is 10 (2). In this case, we fix the current corresponding to all other symbols and iterate the current corresponding to the present symbol. This is done by changing the Equalization DAC code from 000 to 111, i.e. the output DAC code from 11000 to 11111. The output power at the optimal sampling point is stored and compared with the DC power level (P<sub>3</sub> in this case). The Equalization DAC code that results in the output power closest to P<sub>3</sub> is the optimal code for this symbol pattern. This process is repeated for all 4<sup>6</sup> possible symbol patterns and the optimal code is found in each case. Transient simulations are then run for 600ns (16800 symbols), which is greater than $2^{14}$ (16384) symbols.
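To make the size of this search concrete, the cases to be simulated can be enumerated as in the sketch below. The helper name `sweep_cases` and the choice of the fifth position as the present symbol (matching the example pattern) are assumptions for illustration.

```python
from itertools import product


def sweep_cases(memory, present_index):
    """Yield (pattern, dac_code) for every case that has to be simulated.

    pattern is a tuple of `memory` PAM4 symbols; dac_code is the 5-bit
    output DAC code applied to the symbol at present_index.
    """
    for pattern in product(range(4), repeat=memory):
        symbol = pattern[present_index]
        for eq in range(8):                 # 3 Equalization bits
            yield pattern, (symbol << 3) | eq


cases = list(sweep_cases(memory=6, present_index=4))
print(len(cases))  # 4**6 * 8 = 32768

# Codes swept for the example pattern 0 -> 1 -> 1 -> 2 -> "3" -> 2
example = [code for pattern, code in cases if pattern == (0, 1, 1, 2, 3, 2)]
print([format(code, "05b") for code in example])
```

For the example pattern, the second print lists the eight codes from 11000 to 11111, as described in the text.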





Figure 3.7: Effect of Nonlinear Equalization on PAM-4 Eye Diagram: (a) No Equalization, (b) 2-Tap Equalization, (c) 2.5-Tap Equalization, and (d) 3-Tap Equalization

Consider the example of a 2 tap equalizer, where the memory in the equalizer is 2 symbols and the system can distinguish between $4^2$ different symbol transitions. The system then treats the symbol pattern $0 \rightarrow 1 \rightarrow 1 \rightarrow 2 \rightarrow \text{"3"} \rightarrow 2$ (or any symbol pattern $X \rightarrow X \rightarrow X \rightarrow X \rightarrow \text{"3"} \rightarrow 2$) as $3 \rightarrow 3 \rightarrow 3 \rightarrow 3 \rightarrow \text{"3"} \rightarrow 2$, where X can take any value from 0 to 3. As discussed earlier, for half symbols, the system treats 2 and 3 as the same and 0 and 1 as the same; thus $0 \rightarrow 1 \rightarrow 1 \rightarrow 2 \rightarrow \text{"3"} \rightarrow 2$ and $0 \rightarrow 1 \rightarrow 1 \rightarrow 3 \rightarrow \text{"3"} \rightarrow 2$ will both be treated as $3 \rightarrow 3 \rightarrow 3 \rightarrow 3 \rightarrow \text{"3"} \rightarrow 2$.

Figure 3.7 shows four eye diagrams with different numbers of taps. It can be seen that as the number of taps increases, the ability of the system to compensate for nonlinearity improves. This is evident from the skewed PAM4 eyes in the case of no equalization and the relatively aligned eyes for the higher tap cases.

To find the optimal number of taps to compensate for nonlinearity, the vertical eye opening and horizontal eye opening are plotted corresponding to each eye diagram (Figure 3.8). For vertical eye opening, the minimum of the eye openings at an optimal sampling point is selected. For horizontal eye opening, the horizontal opening common to three eyes is plotted. The horizontal and vertical eye openings mentioned above are plotted for two bias currents but same OMA. The bias currents are selected to be 5.5mA and 6.5mA and OMA (single eye) is selected to be 1mW.

Both curves show a similar trend: the vertical and horizontal eye openings increase with the number of taps. The only exception is the 2-tap case at low bias current, where the eye openings actually decrease. This may be due to the low range of equalization provided by the Equalization DAC. Although the eye opening increases with the number of taps, the complexity involved in implementing the LUT also increases. As discussed before, the complexity of the LUT increases exponentially with the number of taps, e.g. it takes 2 times the hardware to implement a 2.5 tap equalizer compared to a 2 tap equalizer. Considering these trade-offs, an equalizer with 2.5 taps was selected. A 3 tap equalizer would also have been a good choice, but it was not feasible to implement the LUT for 3 taps due to its large size. Figure 3.9 again shows the improvement in eye opening achieved by the use of the 2.5 tap nonlinear equalizer.

Figure 3.8: Vertical and Horizontal Eye Opening Versus Number of Taps



Figure 3.9: Improvement Shown by 2.5-Tap Nonlinear Equalizer

# 3.6 Equalization for VCSEL Biased at High Current

As discussed earlier, the VCSEL can also be biased at a high current to reduce its intrinsic nonlinearity and achieve higher bandwidth. In that case, we only need to deal with the static nonlinearity arising from the finite output impedance of the tail current sources. If the bandwidths of the driver and the VCSEL are comparable, the overall bandwidth of the system also improves when the bias current is increased. On the other hand, if the overall bandwidth is limited by the bandwidth of the driver, not much improvement in total bandwidth is achieved by increasing the bias current of the VCSEL. Even though the bandwidth of the driver in this technology (~10GHz) is lower than that of the VCSEL (~20GHz), the VCSEL contribution is not negligible and we still see a rise in overall bandwidth as the bias current of the VCSEL (and hence its bandwidth) is increased. In such systems, where the nonlinearity of the VCSEL can be ignored, linear equalizers can be used, and their coefficients can be found using well established techniques for linear electrical channels such as MMSE.

Since the system here has a memory of 2.5 symbols, it can store one previous symbol (2 bits) and half of the next symbol (1 bit). The LSB of the next symbol is not stored, so the system cannot distinguish between next symbols 00 and 01 and treats them as the same. Similarly, it cannot distinguish between next symbols 10 and 11 and treats them as the same as well. This method is thus inferior to a 3 tap equalizer in this regard but is much easier to implement in hardware.

#### **4. TRANSMITTER ARCHITECTURE**

# **4.1 Fundamental Blocks**

This section explains the four fundamental blocks used in this architecture: Pseudo Random Bit Sequence generator, Look-Up-Table, 8-to-1 Serializer and Output Driver.

#### **4.1.1 Pseudo Random Bit Sequence (PRBS)**

Pseudo Random Bit Sequences (PRBS) are used mainly as on-chip data sources for testing. They are used as low speed data sources which can then be serialized to higher speeds for high speed testing. PRBS generators are implemented using Linear Feedback Shift Registers (LFSR). A series PRBS generator with N memory elements can produce a PRBS of length 2<sup>N</sup>-1.

Any series PRBS generator with N memory elements can be converted to a parallel PRBS generator with K parallel outputs by adding K additional XOR gates. These parallel PRBS generators are obtained by restructuring the series PRBS generator. The parallel outputs are such that, when serialized, they again form a PRBS with the same length as the original PRBS.

In this case, we have used a parallel PRBS generator with 15 D flip-flops and 16 parallel outputs. The output obtained after serializing the PRBS parallel outputs has a period of $2^{15}$-1. Each of the parallel outputs is a down sampled version of the final serialized output. Since a PRBS retains the same sequence length even after down sampling, all 16 parallel outputs of the parallel PRBS have a sequence length of $2^{15}$-1. Figure 4.1 shows the PRBS-15 block used in the transmitter design as the data source. It gives out 16 parallel outputs, which is equivalent to 8 PAM4 symbols. They can be serialized as shown in Figure 4.1. The characteristic polynomial used here is $x^{15}+x^{14}+1$.
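A behavioral model of the PRBS-15 source is sketched below. It is a serial Fibonacci LFSR for the polynomial $x^{15}+x^{14}+1$, not the parallel hardware structure; the seed value and the pairing of consecutive bits into PAM4 symbols (MSB first) are assumptions, since the actual bit-to-symbol mapping is defined by Figure 4.1.

```python
def prbs15(n_bits, seed=0x7FFF):
    """Generate n_bits of a PRBS-15 sequence, polynomial x^15 + x^14 + 1."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be nonzero")
    out = []
    for _ in range(n_bits):
        # Feedback is the XOR of shift-register stages 15 and 14.
        new_bit = ((state >> 14) ^ (state >> 13)) & 1
        out.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7FFF
    return out


def to_pam4(bits):
    """Pair consecutive bits (MSB first, an assumed mapping) into PAM4 symbols."""
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits) - 1, 2)]


period = 2**15 - 1
seq = prbs15(2 * period)
assert seq[:period] == seq[period:]      # the sequence repeats after 2^15 - 1 bits
symbols = to_pam4(prbs15(16))            # 16 bits per cycle -> 8 PAM4 symbols
```

One period of the sequence contains $2^{14}$ ones and $2^{14}-1$ zeros, which is a quick sanity check that the register runs through all nonzero states.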





Figure 4.1: PRBS-15 Block and Serialization of Symbols

## 4.1.2 Look-Up-Table (LUT)

Since we need to implement a 2.5 tap PAM4 architecture here, the system should be able to differentiate between  $4^{2.5}$  different transitions i.e. 32 transitions and for each detected transition, we should be able to provide a predefined code from the memory. Thus, we need a block that can differentiate between 2.5 different PAM4 symbols (or 5 bits) and can choose from 32 (2<sup>5</sup>) inputs from the memory. This functionality can be implemented using a 32-to-1 Mux with the present symbol, previous symbol and MSB of next symbol as the select lines to it. The 32 inputs can be entered from outside by the user and depending on the present, last and next symbol, it can select the right code from the memory. This 32-to-1 mux is the basic block of LUT.
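The 32-to-1 selection can be modelled as an index computed from the five select bits. In this sketch the ordering of the select bits (previous symbol in the upper bits, next-symbol MSB as the lowest bit) and the names `lut_select` and `lut_lookup` are assumptions; the hardware may wire the select lines differently.

```python
def lut_select(prev_symbol, present_symbol, next_symbol):
    """5-bit select index from previous symbol, present symbol and next-symbol MSB."""
    next_msb = (next_symbol >> 1) & 1      # only half of the next symbol is seen
    return (prev_symbol << 3) | (present_symbol << 1) | next_msb


def lut_lookup(lut, prev_symbol, present_symbol, next_symbol):
    """Return the stored code for a 2.5-tap pattern; lut has 32 user-programmed entries."""
    return lut[lut_select(prev_symbol, present_symbol, next_symbol)]


lut = [0b100] * 32                         # placeholder contents, entered by the user
code = lut_lookup(lut, prev_symbol=0, present_symbol=2, next_symbol=3)
```

Because only the MSB of the next symbol is used, next symbols 2 and 3 map to the same entry, as do next symbols 0 and 1.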

If we look at the parallel output of the PRBS, we can see that if S3 is the present symbol, its neighboring symbols S2 and S4 are the previous and next symbols, depending on how the serialization is performed. If S2 is the previous symbol for S3 and S4 is the next symbol, we can use these bits as the select lines of the MUX as shown in Figure 4.2. All the coefficient selection is done at the lower speed inside the LUT. The serializer then serializes the codes selected by the LUT and feeds them to the output driver. If the LUT could work at the maximum frequency (data rate), we would calculate the output DAC code at full rate and feed the output driver directly, and no serializer would be required. Since the LUT cannot work at such high speeds, we pre-calculate the output DAC codes corresponding to the symbols from the PRBS in parallel. These codes are then serialized to the higher data rate by the serializer.



Figure 4.2: Connections Between PRBS and LUT

If we use an N:1 serializer, we need to calculate N parallel outputs from the LUT, which translates to N times the hardware, but this also relaxes the timing of the slower PRBS and LUT blocks by N times. They can now run at an N times lower frequency than the symbol rate. The value of N depends on various factors such as the maximum delay in the path of the PRBS and LUT, the speed of the technology, clock skew and parasitics. Depending on the delay in the data and clock paths, retiming is sometimes needed in the data path if the propagation delay and setup time cannot be met in one clock cycle.

In this architecture, we have selected the value of N=8 i.e. the PRBS runs at 8 times lower frequency as compared with the symbol rate. No retiming has been done in PRBS, LUT or between them i.e. the data generation through PRBS and selection of equalizer coefficients is done in one cycle. This saves power in PRBS and LUT blocks.

Since the PRBS gives out 16 parallel bits i.e. 8 symbol outputs and after coefficient selection from LUT, they are serialized by 8:1 serializer, all the data generated by the PRBS in one cycle gets fed to output driver after parallel to serial conversion. Thus, the use of 16 bit parallel output PRBS is justified.

Now, eight 32-to-1 muxes are placed in parallel to calculate 8 parallel bits. Each of these 8 bits is one bit of the 5 bit code used at the output driver for one of the 8 parallel symbols. The eight 32-to-1 muxes in parallel are shown below with the PRBS block providing parallel outputs. Since we are implementing a 2.5 tap equalizer here, we need two additional symbols of memory. These are the symbols S8- from the previous cycle of the PRBS and S1+ from the next cycle of the PRBS. The mapping of these bits from the PRBS output to the LUT select lines can be seen in Figure 4.3. The select lines have been color coded and are related to the mux shown in the same color. The mapping is just an extension of the mapping of 5 PRBS bits onto the select lines of the single MUX shown earlier.

Now, since each code of the output driver is 5 bits, we need to calculate these five bits separately and combine them at the input of the output driver. This can be done by using 40 (5\*8) 32-to-1 muxes in the LUT, as can be seen in Figure 4.4. The select lines to all five groups of eight 32-to-1 muxes are the same, the only difference being the inputs from the memory. Not all five groups need to be implemented as muxes, however. Two of these groups correspond to the MSB and 2<sup>nd</sup> MSB of the output driver, and their values depend only on the present symbol. The present symbols from the PRBS are therefore used directly as the Symbol DAC inputs, and the 32-to-1 muxes for these two bits are replaced by simple delay logic. This delay logic mimics the delay incurred by signals passing through the 32-to-1 muxes in the LUT.



Figure 4.3: Mapping of PRBS Bits on LUT for 1 Bit of Output Driver



Figure 4.4: Mapping of PRBS Bits on LUT for 5 Bits of Output Driver

Thus, we need only IN3<32:1>, IN2<32:1> and IN1<32:1> in memory which can be entered by user from outside.
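The per-cycle behavior of the parallel LUT can be summarized with the following sketch. It is a behavioral model only: `lut_cycle` is a hypothetical name, the select-bit ordering is an assumption, and `eq_table` packs the user-programmed bits IN3, IN2 and IN1 of each of the 32 entries into one 3 bit value.

```python
def lut_cycle(symbols, prev_last, next_first, eq_table):
    """Return eight 5-bit DAC codes for one PRBS clock cycle.

    symbols:    the 8 PAM4 symbols S1..S8 of this cycle
    prev_last:  S8 of the previous cycle (S8-)
    next_first: S1 of the next cycle (S1+)
    eq_table:   32 three-bit equalization codes (bits IN3, IN2, IN1)
    """
    if len(symbols) != 8 or len(eq_table) != 32:
        raise ValueError("expected 8 symbols and 32 table entries")
    padded = [prev_last] + list(symbols) + [next_first]
    codes = []
    for k in range(1, 9):
        prev, cur, nxt = padded[k - 1], padded[k], padded[k + 1]
        select = (prev << 3) | (cur << 1) | (nxt >> 1)   # assumed select-bit order
        # The two symbol bits bypass the LUT; only the 3 equalization bits are looked up.
        codes.append((cur << 3) | eq_table[select])
    return codes


codes = lut_cycle([0, 1, 2, 3, 0, 1, 2, 3], prev_last=3, next_first=0,
                  eq_table=[0b100] * 32)
```

With a constant table, every output code is simply the symbol followed by the same equalization bits, which corresponds to no equalization.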

## 4.1.3 Serializer

The serializer converts the 8 parallel outputs corresponding to each of the 5 DAC bits to one output each [27]. It also generates the inverted outputs required for the functionality of the output DAC. Thus, 5\*8 bits at the output of the LUT are converted to 5\*1 bits at the output of the 8-to-1 serializer. These 5 bits drive the output driver. The 8-to-1 serialization is performed in 3 stages, as can be seen in Figure 4.5. The figure also shows the clocking unit used for the serializer. The high speed input clock (frequency Fin) is first divided by 2 to generate quadrature phases (frequency Fin/2). Each of these phases is again divided by 2 and the resultant clocks (frequency Fin/4) drive the select lines of the 2-to-1 muxes which convert the 8 input bits to 4 output bits. The quadrature phases produced earlier latch the four outputs of this 8-to-4 stage. These four outputs become the inputs to two 2-to-1 muxes, whose select lines are two 180° out-of-phase clocks (frequency Fin/2) taken from the quadrature clocks. This converts the four data bits to 2 data bits. These 2 bits directly reach the input of the final 2-to-1 mux, whose select line is the incoming clock of frequency Fin. Since the final mux in the system is 2:1, this is essentially a half-rate architecture. Due to the high speeds, there is no retiming before the final 2-to-1 mux.
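The data ordering through the three mux stages can be checked with a small behavioral model. The lane pairing used here (lane i with lane i+4, then stream i with stream i+2) is an assumption chosen so that the bits leave in order d0 to d7; clock phases, latches and timing are not modelled.

```python
def mux2(a, b):
    """2-to-1 mux: interleave two equal-rate streams into one at twice the rate."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def serialize_8to1(words):
    """Serialize a list of 8-bit parallel words (d0 first) through 3 mux stages."""
    lanes = [[w[i] for w in words] for i in range(8)]
    # Stage 1 (8 -> 4): lane i is paired with lane i+4.
    s1 = [mux2(lanes[i], lanes[i + 4]) for i in range(4)]
    # Stage 2 (4 -> 2): stream i is paired with stream i+2.
    s2 = [mux2(s1[i], s1[i + 2]) for i in range(2)]
    # Stage 3 (2 -> 1): final half-rate mux.
    return mux2(s2[0], s2[1])


stream = serialize_8to1([list(range(8)), list(range(8, 16))])
```

In the transmitter, five such serializers run in parallel, one for each bit of the output DAC code.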

The half rate architecture used here has some advantages and disadvantages. A full rate architecture involves running a D flip-flop at the maximum clock rate, which can be extremely power consuming. On the other hand, a quarter rate architecture uses a 4-to-1 mux at the end of the serializer and runs at a quarter of the full rate, thus reducing the speed at which the circuitry must work. Its drawback is the need to maintain four clock phases at high frequency. Maintaining a good phase relationship between four phases can be very difficult compared with two phases in the half rate architecture. The half rate architecture is thus a trade-off between speed and maintaining a good phase relationship. The architecture also includes duty cycle correction inverters that can correct for any duty cycle error accumulated from input to output.

Figure 4.5: Quadrature 8-to-1 MUX and Clocking Unit

#### **4.1.4 Output Driver**

The two most common designs for a transmitter with equalization are the Direct FIR architecture [28] and the Segmented DAC architecture [29]. The direct FIR design performs equalization at the output node. All the equalization taps add their currents (if the final stage is current mode) at the output node, so there is one parallel output driver per tap. Each driver is sized to handle its maximum current. The capacitance at the output node therefore increases with the number of parallel output drivers, which limits the bandwidth at the output node and is one of the major disadvantages of this architecture. The predriver in this topology is very simple and the overall architecture consumes lower power compared to the segmented DAC design. The architecture is shown in Figure 4.6.

The second topology generally used is the segmented DAC architecture, shown in Figure 4.7. The output driver is implemented as a DAC with tail current sources which are generally binary weighted. The switching transistors are scaled in the same ratio as the tail currents in order to have similar voltage drops in the different DAC arms. The output DAC is thus able to generate a number of levels depending on the number of bits, e.g. the 6-bit DAC shown in the diagram can generate 64 levels of current at the output. The output transistors here are also sized to handle the peak current, but their size remains smaller than in the direct FIR case. This leads to smaller capacitance at the output node, resulting in larger bandwidth. The design is relatively complex as the equalization is performed at lower speed using the LUT and the selected coefficients are then serialized. Since the tap weights of the equalizer are controlled from the memory (LUT), which can be programmed from outside, this design provides great flexibility in equalization.

Figure 4.7: Segmented DAC Architecture

The complexity of output driver increases with number of taps in direct FIR case and becomes large for nonlinear NRZ equalization because it also involves detecting the rising and falling edges. The complexity becomes very large if we want to implement nonlinear equalization for PAM-4 modulation scheme because number of possible transitions in PAM4 increases as power of 4 compared to NRZ where they increase with power of 2 e.g., for 2 tap equalization, number of possible transitions between 2 consecutive symbols in NRZ is 4 ( $2^2$ ) while it is 16 for PAM4 ( $4^2$ ). Detecting these edges is essential for nonlinear equalization but becomes very challenging in direct FIR architecture. Thus, because of high complexity of output driver, the segmented DAC architecture was used in this work.

Since the equalization is performed in the LUT, almost all the complexity arising from nonlinear PAM4 equalization is absorbed by the low speed LUT and the output driver is relatively simple. The output driver is the simplest yet the most important block of this design. It consists of a 5-bit DAC in which each stage consists of an NMOS switching pair and a tail current source, as shown in Figure 4.8.

The tail current source is implemented using a cascode current mirror configuration. A sink current is generated outside the IC, which is mirrored and then used for generating the bias voltage (IBIAS) and the cascode transistor gate voltage (ICAS). These voltages are used to generate the tail currents for the DAC. The cascode configuration is selected for the tail current sources due to static nonlinearity considerations for PAM4. As the current through the VCSEL increases while transmitting the highest level, the voltage drop across it increases significantly due to the large series resistance of the VCSEL, e.g. if the current difference between symbols "00" and "11" is 6mA and the series resistance of the VCSEL is 100 Ohms, the voltage at the output node while transmitting level "11" drops by 600mV compared to "00". This voltage drop reduces the current through the tail current sources due to channel length modulation (finite output impedance). Hence, cascode transistors have been used to reduce this effect.



Figure 4.8: Output Driver

The bandwidth at the output node is generally the major factor that limits the data rate, and hence all transistors used in the output stage have minimum length. The width of the switching pair should also be as small as possible as it directly contributes to the capacitance at the output node. However, the switches should not be made very small, as they operate in the linear region when conducting and their resistance increases as their size reduces, increasing the voltage drop across them. The input transistors are sized to keep the voltage drop across them always less than 200mV. The minimum voltage at the output node is 600mV, and hence the drain of the tail current sources always remains above 400mV, keeping the tail current sources in saturation (350mV is required to keep the tail current sources in saturation). The voltage swing at the output node is 600mV if the modulation current and VCSEL resistance are taken to be 6mA and 100 Ohms respectively. Thus, the output node swings from 600mV to 1.2V and does not create any reliability issues on the VCSEL side.
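The headroom arithmetic in this paragraph can be written out as a quick check. The function name `headroom_check` and its argument names are hypothetical; the numbers are the ones quoted in the text, expressed in mA, Ohms and mV so that the arithmetic stays in integers.

```python
def headroom_check(i_mod_ma, r_vcsel_ohm, v_out_min_mv, v_switch_drop_mv, v_sat_mv):
    """Return output swing and tail-source headroom, all voltages in mV."""
    swing_mv = i_mod_ma * r_vcsel_ohm                    # mA * Ohm = mV
    v_out_max_mv = v_out_min_mv + swing_mv
    v_tail_drain_min_mv = v_out_min_mv - v_switch_drop_mv
    return {
        "swing_mv": swing_mv,
        "v_out_max_mv": v_out_max_mv,
        "v_tail_drain_min_mv": v_tail_drain_min_mv,
        "saturation_margin_mv": v_tail_drain_min_mv - v_sat_mv,
        "in_saturation": v_tail_drain_min_mv >= v_sat_mv,
    }


result = headroom_check(i_mod_ma=6, r_vcsel_ohm=100, v_out_min_mv=600,
                        v_switch_drop_mv=200, v_sat_mv=350)
```

With the quoted values, the worst-case switch drop leaves a 50mV margin above the 350mV needed to keep the tail current sources in saturation.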

The VCSEL here is biased at 3V and the minimum drop across it is 1.8 V. On the other side, where there is no VCSEL but the supply is still 3V, a voltage drop similar to that of the VCSEL is required in order to avoid stressing the input NMOS transistor. This is achieved by putting three diode connected transistors in series. Each transistor provides a $V_{GS}$ drop of the order of 500-600mV. Three transistors thus provide a drop of around 1.8 V, similar to the VCSEL. High V<sub>t</sub> transistors have been used here to increase the drop provided by them. A leakage current of 100uA is also used to keep the transistors in the ON state even when all the current is being driven into the VCSEL. The transistors in the dummy VCSEL have been sized to give similar voltage fluctuation at the OUT and OUTB nodes (the drains of the NMOS switching pairs).

The tail current sources have been implemented as DACs in order to have control over the optical modulation amplitude (OMA). Changing the tail currents of the MSB and $2^{nd}$ MSB of the output DAC changes the symbol levels and thus the OMA at the output. Changing the tail currents corresponding to the last three bits of the output DAC changes the resolution and range of the Equalization DAC. Thus, increasing the tail currents corresponding to the last three bits of the output DAC increases the range at the expense of resolution. The bias current (I<sub>Bias</sub>) has also been implemented as a DAC and can be controlled from outside. The bias current can be increased to get more bandwidth or reduced if a large OMA is required at the output with a fixed peak current. All these bits can be programmed from outside using a scan chain.

## 4.2 PLL and Input Clock Path

### 4.2.1 Bypass and PLL Clock Paths

A differential clock is supplied to the transmitter as input. This differential clock can be supplied directly from outside the chip or can be generated on-chip using the PLL, as shown in Figure 4.9. Either the bypass clock path or the PLL clock path (including the reference clock path) can be selected using a 2-to-1 CML MUX. The self-biased AC-coupled inverters following the CML MUX convert the CML-level signal to a CMOS-level signal and output the differential clock to the transmitter. The clock delivered to the transmitter can be monitored through dedicated clock monitor pads.



Figure 4.9: Input Clock Path: (a) Complete Path, (b) Bypass Clock Path, and (c) PLL Clock Path

The bypass differential clock in the range of 0.5GHz to 14GHz is propagated through the path shown in Figure 4.9(b). The differential clock input to the pads outside the chip is received by the shunt inductive peaking CML buffer. The clock is propagated to the CML MUX through five-stage CML buffers placed after the shunt inductive peaking CML buffer.

On the other hand, the PLL can provide a clock with better random jitter performance than the bypass clock. The PLL output clock range is 12GHz to 15GHz, which is determined by the VCO tuning range. Figure 4.9(c) shows the PLL clock path. The reference clock is provided through the same pads as the bypass clock. The CMOS-level single-ended reference clock is generated by the self-biased AC-coupled inverter placed after the shunt inductive peaking CML buffer and five-stage CML buffers.

### **4.2.2 Input and CML Buffers**

The circuit diagram of shunt inductive-peaking CML buffer is shown in Figure 4.10. It receives the differential bypass clock or the differential reference clock input from the pads. The series inductor increases the bandwidth and compensates for reduction in amplitude due to the cable channel loss. The traditional resistive load CML buffers are used for buffering the clock. The tail currents of CML buffers are tunable with the 4-bit current DAC to adjust output amplitude.

### 4.2.3 Voltage Controlled Oscillator (VCO)

Figure 4.10: Shunt Inductive Peaking CML Buffer

An LC-VCO has been used in the PLL to achieve low phase noise and low random jitter. Figure 4.11 shows the LC-VCO circuit, which uses two LC phase noise filters to improve the VCO phase noise [31]. The phase noise filters, which are tuned to present a high impedance at twice the resonant frequency, reduce the thermal noise. Using this method, the phase noise is improved by 3dB at 6.87MHz offset frequency in simulation. The VCO design parameters for a 14GHz resonant frequency are summarized in Table 4.1. The MIM capacitor and varactor capacitance values are tunable with a 5 bit/16fF step DAC and a 4 bit/18fF step DAC respectively to achieve a VCO tuning range from 12GHz to 15GHz. VCO oscillation is controlled by the top PMOS and the VCO enable signal. The VCO is disabled when the bypass clock is selected.

Figure 4.11: LC-VCO Circuit Diagram Using Two LC Phase Noise Filters

Table 4.1: Summary of VCO Design Parameters for 14GHz Resonant Frequency

| Parameter                  | Value  |
|----------------------------|--------|
| Inductor L value           | 283 pH |
| Inductor Q value           | 19.5   |
| MIM capacitor value        | 126 fF |
| Varactor capacitance value | 322 fF |
| Output Voltage Swing       | 1.2V   |
| VCO current consumption    | 5.2 mA |
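As a quick sanity check on Table 4.1, the ideal LC tank resonance $f_0 = 1/(2\pi\sqrt{LC})$ can be evaluated from the listed component values. The sketch below assumes that the MIM and varactor capacitances simply add in parallel and that parasitic capacitance is ignored; both are simplifying assumptions made here, not statements from the design.

```python
import math

# Values from Table 4.1; the parallel-sum tank model is an assumption.
L_TANK_H = 283e-12   # inductor value
C_MIM_F = 126e-15    # MIM capacitor value
C_VAR_F = 322e-15    # varactor capacitance value


def lc_resonance_hz(inductance_h, capacitance_f):
    """Ideal LC tank resonance, f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


f0_hz = lc_resonance_hz(L_TANK_H, C_MIM_F + C_VAR_F)
print(f"ideal tank resonance: {f0_hz / 1e9:.2f} GHz")
```

Under these assumptions the result lands close to the 14GHz target, which suggests the table values are self-consistent.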



Figure 4.12: Third Order, Type II Charge Pump PLL Diagram

Table 4.2: Summary of PLL System Parameters

| Parameter                       | Value                                |
|---------------------------------|--------------------------------------|
| Loop Bandwidth                  | f3dB = 6.87 MHz                      |
| Damping Factor                  | $\zeta = 1.492$                      |
| Natural Frequency               | $\omega_n = 10.29 \text{ Mrad/sec}$  |
| VCO Tuning Range                | 12 - 15 GHz                          |
| VCO Conversion Gain             | Kvco = 1.6 GHz/V                     |
| Loop Division Ratio             | N = 12                               |
| Charge Pump Current             | Icp = 115.2 uA                       |
| Primary Loop Filter Capacitor   | C1 = 145 pF                          |
| Loop Filter Resistor            | $R = 2 k\Omega$                      |
| Secondary Loop Filter Capacitor | C2 = 6 pF                            |
| Phase Margin                    | PM = 64.91 deg                       |
| Jitter Peak Gain                | 0.85 dB                              |
| Reference clock frequency       | fref = 1.167 GHz                     |

Table 4.3: Output Clock Jitter and Current Consumption

| Parameter                                | Value  |
|------------------------------------------|--------|
| PLL output Dj in simulation              | 190 fs |
| PLL output Rj in simulation              | 43 fs  |
| PLL current consumption                  | 41 mA  |
| Bypass clock current consumption         | 32 mA  |
| Clock MUX and Driver current consumption | 16 mA  |

### 4.2.4 Phase Locked Loop (PLL)

A third-order, type II charge pump PLL has been implemented in this work. The PLL block-level diagram and a summary of the PLL system parameters are shown in Figure 4.12 and Table 4.2 respectively. The output clock jitter and PLL current consumption are summarized in Table 4.3.
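The natural frequency, damping factor and reference frequency in Table 4.2 can be cross-checked from the other table entries. The sketch below uses the textbook second-order approximation of a charge pump PLL, $\omega_n = \sqrt{I_{cp} K_{vco} / (N C_1)}$ and $\zeta = (R C_1 / 2)\,\omega_n$, which neglects the secondary capacitor C2. These formulas are an assumption made here for illustration; the thesis does not state how the values were derived.

```python
import math

# Values from Table 4.2.
I_CP_A = 115.2e-6        # charge pump current
K_VCO_HZ_PER_V = 1.6e9   # VCO conversion gain
N_DIV = 12               # loop division ratio
C1_F = 145e-12           # primary loop filter capacitor
R_OHM = 2e3              # loop filter resistor
F_VCO_HZ = 14e9          # VCO frequency for which the table is quoted


def natural_frequency_rad_s(i_cp, k_vco_hz_per_v, n_div, c1):
    """Second-order approximation, C2 neglected.

    The phase detector gain is I_cp / (2*pi) and the VCO gain is
    2*pi*K_vco in rad/s/V, so the factors of 2*pi cancel.
    """
    return math.sqrt(i_cp * k_vco_hz_per_v / (n_div * c1))


def damping_factor(omega_n, r, c1):
    """zeta = (R * C1 / 2) * omega_n for the same approximation."""
    return 0.5 * r * c1 * omega_n


omega_n = natural_frequency_rad_s(I_CP_A, K_VCO_HZ_PER_V, N_DIV, C1_F)
zeta = damping_factor(omega_n, R_OHM, C1_F)
f_ref_hz = F_VCO_HZ / N_DIV

print(f"natural frequency : {omega_n / 1e6:.2f} Mrad/s")
print(f"damping factor    : {zeta:.3f}")
print(f"reference clock   : {f_ref_hz / 1e9:.3f} GHz")
```

The computed values agree with the tabulated natural frequency, damping factor and reference clock frequency to within rounding.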

## 4.3 Transmitter Top Level

Figure 4.13 shows the transmitter architecture. The major blocks are the PRBS generator, LUT, 8-to-1 serializer and output driver. The output driver is a 5 bit current DAC which steers current between the VCSEL and the dummy VCSEL depending on the input code. It works at the maximum clock frequency, e.g., for a 14GHz clock input the output driver runs at 28Gbaud and hence can reach a data rate of 56Gbps. The 2 MSBs of the DAC form the symbol DAC and generate the 4 PAM4 levels. These 2 bits are the same as the present symbol at any time and do not depend on the last or next symbol. The 3 LSBs of the DAC form the equalization DAC and provide the equalization current required depending on the present and neighboring symbols. The 2 bit symbol DAC and 3 bit equalization DAC are each binary weighted, but the combined 5 bit DAC is non-binary weighted, i.e., the resolution and range of the two DACs can be fixed independently. This improves the resolution of the equalization DAC if its range requirement is less than the LSB of the symbol DAC, which is generally the case, especially at low data rates. For an N tap equalizer, the system can distinguish between 4<sup>N</sup> different input data transitions with a total memory of N symbols, e.g., for a 2.5 tap equalizer the system can distinguish between 32 different input data transitions and has a memory of 2.5 symbols. The equalization current for each of these input data transitions can be set independently by programming the corresponding code in the look-up-table (LUT).

Figure 4.13: Transmitter Architecture

A half-rate architecture has been implemented here as it is easier to maintain the phase relation between two 180° clocks at high speeds than between four 90° clocks in a quarter-rate design. An on-chip PRBS-15 generator generates a pseudo-random sequence with a 16 bit (8 PAM4 symbols) output and runs at 1/8<sup>th</sup> the frequency of the output driver. For 28Gbaud transmission, the PRBS is clocked at 3.5GHz. Since the LUT uses a total of 5 bits, i.e., 2 bits from the present symbol, 2 bits from the last symbol and the MSB of the next symbol, its basic cell is a 32-to-1 mux (as shown in Figure 4.14), and the complete LUT is a combinational block consisting of 8\*5 (40) 32-to-1 muxes. In general, an N tap equalizer would have a 2<sup>2N</sup>:1 mux as the basic cell. Since we have a 5 bit DAC at the output and the PRBS runs at 8x lower frequency to relieve the timing requirements of the PRBS and LUT, we have 5\*8 such 32-to-1 muxes. As mentioned earlier, the input to the symbol DAC (first two MSBs of the output DAC) is the same as the present symbol itself, and hence the inputs to the first two MSBs in the output driver are taken directly from the present symbol. This reduces the size of the memory in the LUT from 5\*32 bits to 3\*32 bits. The LUT looks at the present, previous and next symbols and generates eight 5 bit DAC codes. These eight 5 bit codes are then serialized using five 8-to-1 serializers to produce a 5 bit code at 28Gbaud. This code feeds the output driver, which drives the laser at 28Gbaud or 56Gbps.
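The look-up described above can be modelled behaviourally in a few lines. The following is a minimal sketch, not the implemented circuit: the bit ordering of the 5 bit address, the function names and the handling of the first and last symbol of a sequence are all assumptions made for illustration. Only the structure (2 bits of the previous symbol, 2 bits of the present symbol, the MSB of the next symbol, a stored 3 bit equalization code, and symbol bits passed straight to the two DAC MSBs) follows the text.

```python
def lut_address(prev_sym, cur_sym, next_sym):
    """Form a 5-bit LUT address from three PAM4 symbols (each 0..3).

    Assumed bit order: [prev(2 bits) | cur(2 bits) | MSB of next(1 bit)].
    """
    for sym in (prev_sym, cur_sym, next_sym):
        if not 0 <= sym <= 3:
            raise ValueError("PAM4 symbols must be in 0..3")
    next_msb = (next_sym >> 1) & 1
    return (prev_sym << 3) | (cur_sym << 1) | next_msb


def dac_code(lut, prev_sym, cur_sym, next_sym):
    """Return the 5-bit driver code: 2 symbol MSBs plus 3 equalization LSBs."""
    eq_code = lut[lut_address(prev_sym, cur_sym, next_sym)]
    if not 0 <= eq_code <= 7:
        raise ValueError("equalization codes are 3 bits wide")
    # The symbol DAC bits bypass the LUT and come from the present symbol.
    return (cur_sym << 3) | eq_code


def encode(symbols, lut):
    """Map a symbol sequence to DAC codes.

    Edge symbols reuse themselves as the missing neighbour (an assumption).
    """
    codes = []
    for i, cur in enumerate(symbols):
        prev = symbols[i - 1] if i > 0 else cur
        nxt = symbols[i + 1] if i < len(symbols) - 1 else cur
        codes.append(dac_code(lut, prev, cur, nxt))
    return codes


# Hypothetical LUT contents: the same equalization code for every pattern.
example_lut = [5] * 32
print([format(code, "05b") for code in encode([0, 3, 1, 2], example_lut)])
```

Only 32 three-bit entries are stored in this model, which mirrors the reduction from 5\*32 bits to 3\*32 bits described above.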

### 4.3.1 Nonuniform Output DAC

Figure 4.14: Basic Cell of LUT: 32-to-1 Mux

The output driver is a 5 bit DAC, of which the first 2 bits form the symbol DAC and the last 3 bits form the equalization DAC. Traditionally, a uniform DAC has been used at the output to provide equalization, but here we use a nonuniform DAC. The symbol DAC generates the 4 PAM4 symbol levels and the equalization DAC provides the equalization current required to improve the eye opening. This nonuniform DAC behaves like a uniform DAC when the range of the equalization DAC is equal to the LSB of the symbol DAC. If the range required from the equalization DAC is less than the LSB of the symbol DAC, the LSB of the equalization DAC (and hence its range) can be reduced, which improves the resolution without increasing the number of bits.
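The difference between the uniform and nonuniform configurations can be illustrated with a small behavioural model. The sketch below assumes that the equalization current simply adds to the symbol current and uses hypothetical LSB values (a 2mA symbol LSB and a 0.1mA equalization LSB); neither the numbers nor the function name come from the design.

```python
def dac_current_ma(code, symbol_lsb_ma, eq_lsb_ma):
    """Output current for a 5-bit code: 2-bit symbol DAC plus 3-bit equalization DAC."""
    if not 0 <= code <= 31:
        raise ValueError("code must be 5 bits wide")
    symbol = (code >> 3) & 0b11   # two MSBs select the PAM4 level
    eq = code & 0b111             # three LSBs set the equalization current
    return symbol * symbol_lsb_ma + eq * eq_lsb_ma


SYMBOL_LSB_MA = 2.0  # hypothetical spacing between adjacent PAM4 levels

# Uniform case: the equalization DAC spans one symbol LSB, so the combined
# DAC is an ordinary binary-weighted 5-bit DAC.
uniform_eq_lsb_ma = SYMBOL_LSB_MA / 8

# Nonuniform case: a smaller equalization range gives a finer step with the
# same number of bits (0.1 mA is a hypothetical choice).
nonuniform_eq_lsb_ma = 0.1

for label, eq_lsb in (("uniform", uniform_eq_lsb_ma), ("nonuniform", nonuniform_eq_lsb_ma)):
    print(f"{label:10s}: step {eq_lsb:.3f} mA, equalization range {7 * eq_lsb:.3f} mA")
```

In the nonuniform case the codes between two symbol levels no longer fill the whole gap, which is the sense in which the unused levels of a uniform DAC are traded for resolution.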



Figure 4.15: Comparison of Uniform and Nonuniform DAC

#### **5. MEASUREMENT RESULTS**

## 5.1 Measurement Setup

The measurement setup is shown in Figure 5.1. The DC PCB generates the DC voltages, bias current and scan signals required by the transmitter IC and is connected to the RF PCB through ribbon cables. The transmitter and VCSEL are placed beside each other on the RF PCB. The connections from the RF board to the transmitter are made through bond wires, as is the connection between the transmitter and the VCSEL. The two chips are placed so as to minimize the length of the bond wires between them. Both cathode and anode connections of the VCSEL are taken from the transmitter. The transmitter drives the VCSEL, which in turn emits the light.

Butt coupling was first tried to couple the light from the VCSEL directly into a bare fiber. However, since the divergence angle of the VCSEL is large, a significant amount of light escapes into free space, and the coupling efficiency in this case was only around 50%. A lens system was then tried to improve the coupling efficiency. The light coming out of the VCSEL first reaches a collimator, which makes the light parallel. This parallel light travels inside a lens tube and is focused by a focusing lens into the fiber at the other end. The collimating lens has a focal length of 10mm and the focusing lens has a focal length of 25mm. The diameter of both lenses is 12.5mm. Both lenses are placed inside a lens tube as shown in Figure 5.1. The focusing lens focuses the light into a multimode fiber, which is held in place by an FC/APC fiber adapter. An X-Y-Z translational stage holds the lens tube and can move it in the X, Y and Z directions, making it easier to capture the light. The VCSEL is kept at the focus of the collimator and the FC/APC fiber adapter is kept at the focus of the focusing lens so that the light is properly collimated and focused. The coupling efficiency with this setup was found to be around 85%.

The light coupled into the multimode fiber is fed to a photodiode (PD). A high-bandwidth PD (30GHz) has been used here to avoid bandwidth limitation after the VCSEL. The PD converts the light signal to a current and drives that current into a 50 Ohm load. This resistance converts the current to a voltage and also acts as a termination for the circuitry that follows. The PD is followed by an RF amplifier that amplifies the voltage signal. This RF amplifier is AC coupled to the PD and has a bandwidth of 40GHz. Although the gain of the amplifier can go up to 25, its noise performance degrades at high gain, and thus the amplifier has been used only at its default settings, where the noise performance is best. The gain provided by the amplifier in this setting is 4-5.

## 5.2 Printed Circuit Boards

The DC PCB provides the scan inputs and DC power required on the chip. An NI DAQ generates the scan inputs, which are then level shifted using a level shifter on the DC board. These scan signals are then supplied as inputs to the RF board through a ribbon cable. The DC board also houses several LDOs to generate the power supplies required by the transmitter IC. Another ribbon cable is used to supply the DC voltages to the RF board. The DC board can also generate current sources for the bias circuitry on the RF chip.

Figure 5.1: Measurement Setup

The RF board is shown in Figure 5.2. The scan signals are fed directly to the IC. The DC voltages and bias currents, on the other hand, are decoupled with capacitors to remove noise on the supply. Two capacitors of 10uF and 6.8nF are used in parallel on each supply and bias current. They are placed close to the point where the DC voltages are supplied to the board. Another capacitor of 100pF is placed on each supply and bias current near the IC. A current sink is generated on the RF board as it is required by the output driver in the transmitter. This is achieved using a potentiometer, which controls the flow of current through the bias circuitry. The RF PCB receives the high-speed clock from a function generator; the clock is routed from the clock source through SMA cables and reaches the board through SMA connectors. This clock is then routed on the board for a small distance over an electrical trace, which is 50 Ohm impedance controlled and shielded by vias on both sides. Two additional clock pads are used on the board to monitor the bypass or PLL clock from the chip. An electrical trace, similar to the one on the input side and also 50 Ohm impedance controlled, connects the IC to an SMA connector placed at the edge of the board. Figure 5.3 shows the placement of the transmitter IC and VCSEL on the PCB.



Figure 5.2: Printed Circuit Board



Figure 5.3: Placement of Transmitter IC and VCSEL on PCB



Figure 5.4: Micrograph of Transmitter IC



Figure 5.5: Micrograph of VCSEL (a) Philips, (b) VI Systems

Figure 5.6: Bonding between Transmitter IC and VCSEL

Figure 5.7: Measurement Setup in Lab (a) Complete Test Setup, (b) Close-Up Near RF Board



Figure 5.8: (a) Measured L-I Curve, (b) L-I-V Curve from VI Systems VCSEL Datasheet

## 5.3 Measurement Results

Figure 5.4 shows the micrograph of the transmitter IC. The bottom half shows the input clock buffer and PLL, and the top half shows the transmitter. Figure 5.5 shows the Philips and VI Systems VCSELs used for measurements. The VCSEL in (a) measures 235um\*235um with a thickness of 150um, and the VCSEL in (b) measures 210um\*250um with a thickness of 140um. The -3dB bandwidths of the VCSELs in (a) and (b) are 11GHz and 21GHz respectively. Figure 5.6 shows the bonding between the transmitter and the VCSEL.

Figure 5.8 shows the L-I characteristics of one of the VCSELs. The threshold current for this VCSEL is 0.5mA and the slope efficiency is 0.5W/A. According to the data sheet, the maximum forward current is 9mA, and hence results are not taken for higher currents. Above the threshold current, the optical power increases linearly with VCSEL current, which suggests that complex modulation techniques, e.g., PAM-4, can be implemented simply by varying the current through the VCSEL in equal steps.
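The linear L-I relation above can be written as $P = \eta\,(I - I_{th})$ for $I > I_{th}$, with $\eta = 0.5$W/A and $I_{th} = 0.5$mA. The sketch below is an idealized static model under that assumption only; it ignores thermal rollover, dynamics and coupling loss, and the 2mA to 8mA drive range is a hypothetical example chosen to stay below the 9mA limit.

```python
I_TH_A = 0.5e-3        # threshold current from the measured L-I curve
SLOPE_W_PER_A = 0.5    # slope efficiency from the measured L-I curve
I_MAX_A = 9e-3         # maximum forward current from the data sheet


def optical_power_w(current_a):
    """Idealized static L-I model: zero below threshold, linear above it."""
    if current_a > I_MAX_A:
        raise ValueError("current exceeds the maximum forward current")
    return SLOPE_W_PER_A * max(current_a - I_TH_A, 0.0)


def pam4_level_currents(i_low_a, i_high_a):
    """Four equally spaced drive currents between the lowest and highest level."""
    step = (i_high_a - i_low_a) / 3
    return [i_low_a + k * step for k in range(4)]


currents = pam4_level_currents(2e-3, 8e-3)   # hypothetical drive range
powers = [optical_power_w(i) for i in currents]
power_steps = [b - a for a, b in zip(powers, powers[1:])]

for i, p in zip(currents, powers):
    print(f"I = {i * 1e3:.1f} mA -> P = {p * 1e3:.2f} mW")
```

In this static model, equal current steps give equal optical power steps as long as the lowest level stays above threshold. The skew between PAM4 eyes reported later comes from the dynamic behavior of the VCSEL, which this model does not capture.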



Figure 5.9: Pulse Responses with Modulation Current of 3mA: (a) Ibias=2.5mA, (b) Ibias=4.5mA, and (c) Ibias=6.5mA

Figure 5.9 plots the responses to an isolated 1 and an isolated 0 on the same graphs. The pulse response corresponding to the isolated 0 has been inverted and plotted on top of the isolated 1 response for better comparison. The offset has been removed so that all the responses have a DC level of 0. This is done to remove the offset added by the amplifier after the photodiode because of AC coupling. A moving average of each response was calculated to smooth it. As evident from the plots, the difference between the isolated 1 and isolated 0 responses is larger for lower bias currents, and the responses become similar as the bias current increases.

Figure 5.10: Pulse Responses with Extinction Ratio of 2: (a) Ibias=1.5mA, (b) Ibias=4.5mA, and (c) Ibias=7.5mA

Figure 5.11: NRZ Results: (a) 16Gbps, (b) 18Gbps, and (c) 22 Gbps

Figure 5.10 plots the isolated 1 and isolated 0 responses on the same graphs for a fixed extinction ratio of 2. As before, the difference between the isolated 1 and isolated 0 responses is larger for lower bias currents, and the responses become similar as the bias current increases.
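The post-processing applied to the pulse responses in Figures 5.9 and 5.10 (offset removal, inversion of the isolated 0 response and moving-average smoothing) can be sketched as follows. The window length, the edge handling and the function names are assumptions, since the text does not specify them, and the sample data is synthetic.

```python
def remove_offset(samples):
    """Subtract the mean so the response has zero DC level."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def moving_average(samples, window):
    """Centered moving average; the window shrinks at the edges (assumption)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def prepare_for_overlay(isolated_one, isolated_zero, window=3):
    """Return both responses in a form that can be plotted on top of each other."""
    one = moving_average(remove_offset(isolated_one), window)
    # The isolated-0 response is inverted so that both pulses point the same way.
    zero = moving_average([-s for s in remove_offset(isolated_zero)], window)
    return one, zero


# Synthetic, perfectly symmetric responses: the two overlays should coincide.
one, zero = prepare_for_overlay([0.0, 0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 1.0, 1.0])
mismatch = max(abs(a - b) for a, b in zip(one, zero))
print(f"largest difference between overlays: {mismatch:.3g}")
```

For a linear device the two overlays coincide, so any residual mismatch after this processing indicates asymmetry between rising and falling behavior, which is what the plots show at low bias currents.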

Figure 5.11 shows the NRZ eyes at 16, 18 and 22 Gbps. The results were obtained after pattern locking and noise averaging. No equalization was used for 16 and 18 Gbps, while 3 tap linear equalization was used for 22 Gbps. The eyes show that the system is not bandwidth limited at least up to 18 Gbps.









Figure 5.12: 40 Gbps PAM4: (a) No Equalization, (b) 2-Tap Equalizer, and (c) 2.5-Tap Equalizer

Figure 5.12 shows the results for 40 Gbps PAM4. The currents corresponding to level 00, 01, 10 and 11 are 2.7mA, 4.1mA, 5.56mA and 7mA respectively. Bias current is 4.85mA and modulation current is 4.3mA. Optical modulation amplitude (OMA) at the output of VCSEL is found to be 1.94mW (2.9dBm).

Figure 5.13 shows the efficacy of the nonlinear equalizer. The VCSEL here is biased at 4mA and the overall modulation current is fixed at 4.3mA. As seen from Figure 5.13(a), the three eyes are skewed and the effective horizontal eye opening is only 6.9ps. Apart from increasing the vertical eye opening, the nonlinear equalizer also aligns the eyes by reducing the skew. As a result of this reduction in skew, the effective horizontal eye opening increases from 6.9ps to 10.1ps. The OMA here is 1.94mW (2.9dBm).



Figure 5.13: 36 Gbps PAM4: (a) No Equalization, and (b) 2.5-Tap Nonlinear Equalization

Figure 5.14 shows the results for 44 Gbps PAM4. The currents corresponding to level 00, 01, 10 and 11 are 2.7mA, 4.1mA, 5.56mA and 7mA respectively. Bias current is 4.85mA and modulation current is 4.3mA. OMA at the output of VCSEL is found to be 1.94mW (2.9dBm).

Figure 5.15 shows the results for 50 Gbps PAM4. The currents corresponding to level 00, 01, 10 and 11 are 1.8mA, 3.4mA, 5.1mA and 6.7mA respectively. Bias current is 3.38mA and modulation current is 4mA. OMA at the output of the VCSEL is 1.6mW (2dBm). The eye diagram without equalization shows very little horizontal and vertical opening. The eyes are also skewed due to nonlinearity. The 2 and 2.5 tap linear equalizers result in better eye opening but cannot correct the skew. The 2.5 tap nonlinear equalizer identifies this nonlinearity and compensates for it by choosing nonlinear coefficients. This results in both improved vertical eye opening and alignment of the three eyes.
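The OMA figures quoted for the PAM4 measurements are given in both mW and dBm, and the level currents are listed explicitly, so both can be checked with a few lines. The sketch below uses the standard conversion $P_{dBm} = 10\log_{10}(P_{mW})$; the helper names are illustrative.

```python
import math


def mw_to_dbm(power_mw):
    """Convert optical power in mW to dBm."""
    return 10.0 * math.log10(power_mw)


def level_steps(levels):
    """Spacing between adjacent PAM4 levels."""
    return [b - a for a, b in zip(levels, levels[1:])]


# Level currents (mA) quoted in the text for levels 00, 01, 10 and 11.
LEVELS_40G_MA = [2.7, 4.1, 5.56, 7.0]
LEVELS_50G_MA = [1.8, 3.4, 5.1, 6.7]

print(f"1.94 mW = {mw_to_dbm(1.94):.2f} dBm")
print(f"1.60 mW = {mw_to_dbm(1.6):.2f} dBm")
print("40 Gbps level steps (mA):", [round(s, 2) for s in level_steps(LEVELS_40G_MA)])
print("50 Gbps level steps (mA):", [round(s, 2) for s in level_steps(LEVELS_50G_MA)])
```

The conversions reproduce the quoted dBm values after rounding, and the level currents are close to, but not exactly, equally spaced.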



Figure 5.14: 44 Gbps PAM4: (a) No Equalization, and (b) 2.5-Tap Equalizer



Figure 5.15: 50 Gbps PAM4: (a) No Equalization, (b) 2-Tap Linear Equalizer, (c) 2.5-Tap Linear Equalizer, and (d) 2.5-Tap Nonlinear Equalizer

Figure 5.16 shows power efficiency plotted against data rate. The power dissipated by the LUT, serializer, clocking and output driver has been considered while calculating the total power. PRBS power was not included, as the PRBS emulates data coming from low-speed blocks and will be absent in an actual product. All the blocks except the output driver work from a 1.2V supply; the output driver works from a 3V supply.
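Power efficiency here is the energy spent per transmitted bit, i.e., total power divided by data rate. A minimal sketch of that calculation is shown below, using the 250mW and 50Gbps figures reported for this transmitter; the function names are illustrative.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ/bit; mW divided by Gbps gives pJ/bit directly."""
    if data_rate_gbps <= 0:
        raise ValueError("data rate must be positive")
    return power_mw / data_rate_gbps


def pam4_data_rate_gbps(baud_rate_gbaud):
    """PAM4 carries 2 bits per symbol."""
    return 2.0 * baud_rate_gbaud


efficiency = energy_per_bit_pj(250.0, 50.0)
print(f"{efficiency:.1f} pJ/bit at {pam4_data_rate_gbps(25.0):.0f} Gbps")
```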



Figure 5.16: Power Efficiency vs Data Rate

Figure 5.17 shows the power breakdown in the transmitter. A major fraction of the power is consumed by clocking and the final 2-to-1 mux. The input clock at maximum frequency is buffered and routed over a long distance before reaching the transmitter, because it passes through the clock mux, where a selection is made between the bypass clock (coming directly from outside) and the PLL clock (synthesized on-chip from a low-speed reference clock). The clock from the clock mux is divided and given to the 8-to-4 and 4-to-2 muxes and is used directly for the final 2-to-1 mux. The clocking unit has been placed in the middle with all muxes on either side, so the clock is routed to all the muxes from the middle. This distribution of the clock at high speed and over large distances leads to large power consumption.



Figure 5.17: Transmitter Power Breakdown

|                           | IEEE Photonics 15 [14] | CICC 17 [18] | JSSC 16 [25]     | JSSC 17 [32]         | IEEE Photonics 15 [17] | This Work              |
|---------------------------|------------------------|--------------|------------------|----------------------|------------------------|------------------------|
| Technology                | 0.13um SiGe            | 28nm CMOS    | 32nm CMOS SOI    | 0.13um SiGe          | 0.13um SiGe            | 65nm CMOS              |
| Modulation Scheme         | NRZ                    | NRZ          | NRZ              | NRZ                  | PAM4                   | PAM4                   |
| Equalization Technique    | 2-tap linear FFE       | Complex Zero | 2-tap Asymmetric | 3-tap Asymmetric FFE | 2-tap linear FFE       | 2.5 tap Asymmetric FFE |
| Data Rate (Gbps)          | 71                     | 40           | 20               | 50                   | 40                     | 50                     |
| Serialization             | Na                     | Na           | No               | Na                   | Na                     | 8:1                    |
| Outer OMA (dBm)           | 0.09                   | 1.3          | 0.9              | 0.7                  | -3.8                   | 2                      |
| Supply (Driver/VCSEL)     | 4/-                    | 1/-1.1       | 1.0/2.5          | 2.5/3.3              | 2.5/3.85               | 1.2/3.0                |
| Power Efficiency (pJ/bit) | 13.4                   | 0.5          | 0.77             | 3.8                  | 9.4                    | 5                      |
| Data sequence             | PRBS-7                 | PRBS-7       | PRBS-15          | PRBS-7               | PRBS-7                 | PRBS-15                |
| Drive type                | CA                     | CC           | CA               | CC                   | CA                     | CA                     |

Table 5.1: Performance Summary and Comparison

Serialization = Na indicates the absence of on-chip data generation and the use of a BERT for the same.

Serialization = No indicates on-chip data generation but the absence of a serializer.

CA = Common Anode

CC = Common Cathode

#### **6. CONCLUSION**

Measurement results show a substantial improvement in the eye diagram with the proposed equalization method. The new nonlinear equalization technique improves performance significantly and helps in achieving a data rate of 50 Gbps. The nonuniform DAC at the output provides better resolution than a uniform DAC without any increase in the number of bits. To our knowledge, this is the highest data rate that has been achieved by any transmitter using CMOS technology.

The transmitter dissipates a total power of 250mW and achieves a data rate of 50Gbps, corresponding to a power efficiency of 5 pJ/bit. The core area of the transmitter including PRBS, LUT, serializer and output driver is 375um\*500um. The total chip area including core transmitter, PLL, clock mux and decaps is 1.4mm\*1.4mm. The transmitter has been implemented in 65nm CMOS technology.

#### REFERENCES

[1] Palermo, S. (2011). High-speed serial I/O design for channel-limited and power-constrained systems. CMOS Nanoelectronic Analog and RF VLSI Circuits, K. Iniewski. New York, NY: McGraw-Hill, 289-336.

[2] Broomall, J. R., & Van Deusen, H. (1997, May). Extending the useful range of copper interconnects for high data rate signal transmission. In Electronic Components and Technology Conference, 1997. Proceedings., 47th (pp. 196-203). IEEE.

[3] Wang, B., Sorin, W. V., Palermo, S., & Tan, M. R. (2016). Comprehensive vertical-cavity surface-emitting laser model for optical interconnect transceiver circuit design. Optical Engineering, 55(12), 126103-126103.

[4] Westbergh, P., Haglund, E. P., Haglund, E., Safaisini, R., Gustavsson, J. S., & Larsson, A. (2013). High-speed 850 nm VCSELs operating error free up to 57Gbit/s. Electronics Letters, 49(16), 1021-1023.

[5] Tan, M., Wang, B., Sorin, W., Mathai, S., & Rosenberg, P. (2017). 50Gb/s PAM4
Modulated 1065nm Single-Mode VCSELs using SMF-28 for Mega-Data Centers. IEEE
Photonics Technology Letters, 29(13).

[6] Pavan, S. K., Lavrencik, J., Shubochkin, R., Sun, Y., Kim, J., Vaidya, D. S., Lingle,
R., Kise, T., & Ralph, S. (2014, March). 50Gbit/s PAM-4 MMF transmission using
1060nm VCSELs with reach beyond 200m. In Optical Fiber Communication
Conference (pp. W1F-5). Optical Society of America.

[7] Szczerba, K., Westbergh, P., Karlsson, M., Andrekson, P. A., & Larsson, A. (2013).
60 Gbits error-free 4-PAM operation with 850 nm VCSEL. Electronics Letters, 49(15),
953-955.

[8] Khafaji, M., Pliva, J., Henker, R., & Ellinger, F. (2017). A 42 Gbps VCSEL Driver Suitable for Burst Mode Operation in 14 nm Bulk CMOS. IEEE Photonics Technology Letters.

[9] Proesel, J., Schow, C., & Rylyakov, A. (2012, February). 25Gb/s 3.6 pJ/b and 15Gb/s
1.37 pJ/b VCSEL-based optical links in 90nm CMOS. In Solid-State Circuits
Conference Digest of Technical Papers (ISSCC), 2012 IEEE International (pp. 418-420).
IEEE.

[10] Proesel, J., Lee, B. G., Baks, C. W., & Schow, C. (2013, March). 35-Gb/s VCSEL-based optical link using 32-nm SOI CMOS circuits. In Optical Fiber Communication Conference (pp. OM2H-2). Optical Society of America.

[11] Kuchta, D. M., Rylyakov, A. V., Schow, C. L., Proesel, J. E., Baks, C. W., Westbergh, P., Gustavsson, J. S., & Larsson, A. (2015). A 50 Gb/s NRZ modulated 850 nm VCSEL transmitter operating error free to 90 C. Journal of Lightwave Technology, 33(4), 802-810.

[12] Kuchta, D. M., Huynh, T. N., Doany, F. E., Schares, L., Baks, C. W., Neumeyr, C.,
Daly, A., Kögel, B., Rosskopf, J. & Ortsiefer, M. (2016). Error-free 56 Gb/s NRZ
modulation of a 1530-nm VCSEL link. Journal of Lightwave Technology, 34(14), 3275-3282.

[13] Bruensteiner, M., Papen, G. C., Poulton, J., Tell, S., Palmer, R., Giboney, K., Dolfi,
D., & Corzine, S. (1999). 3.3-V CMOS pre-equalization VCSEL transmitter for gigabit
multimode fiber links. IEEE Photonics Technology Letters, 11(10), 1301-1303.

[14] Kuchta, D. M., Schow, C. L., Rylyakov, A. V., Proesel, J. E., Doany, F. E., Baks, C., Hamel-Bissell, B. H., Kocot, C., Graham, L., Johnson, R., & Landry, G. (2013, March). A 56.1 Gb/s NRZ modulated 850nm VCSEL-based optical link. In Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), 2013 (pp. 1-3). IEEE.

[15] Rylyakov, A., Schow, C., Proesel, J., Kuchta, D. M., Baks, C., Li, N. Y., Xie, C., & Jackson, K. (2012, March). A 40-Gb/s, 850-nm, VCSEL-based full optical link.
In Optical Fiber Communication Conference (pp. OTh1E-1). Optical Society of America.

[16] Wang, J., Qi, N., Wang, Z., Yang, Q., Guo, H., Bai, R., Hong, Z., & Chiang, P. Y.
(2015, April). 4× 30 Gbps 155mW/channel VCSEL driver in 65nm CMOS. In Optical Interconnects Conference (OI), 2015 IEEE (pp. 111-112). IEEE.

[17] Soenen, W., Vaernewyck, R., Yin, X., Spiga, S., Amann, M. C., Kaur, K. S., Bakopoulos, P., & Bauwelinck, J. (2015). 40 Gb/s PAM-4 transmitter IC for long-wavelength VCSEL links. IEEE Photonics Technology Letters, 27(4), 344-347.

[18] Sharif-Bakhtiar, A., Lee, M. G., & Carusone, A. C. (2017, April). A 40-Gbps 0.5-pJ/bit VCSEL driver in 28nm CMOS with complex zero equalizer. In Custom Integrated Circuits Conference (CICC), 2017 IEEE (pp. 1-4). IEEE.

[19] Tsunoda, Y., Sugawara, M., Oku, H., Ide, S., & Tanaka, K. (2014, March). A 40-Gb/s VCSEL transmitter for optical interconnect with group-delay compensation pre-emphasis. In Optical Fiber Communications Conference and Exhibition (OFC), 2014 (pp. 1-3). IEEE.

[20] Tsunoda, Y., Sugawara, M., Oku, H., Ide, S., & Tanaka, K. (2014, February). 8.9 A
40Gb/s VCSEL over-driving IC with group-delay-tunable pre-emphasis for optical
interconnection. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC),
2014 IEEE International (pp. 154-155). IEEE.

[21] Ohhata, K., Imamura, H., Ohno, T., Taniguchi, T., Yamashita, K., Yazaki, T., &
Chujo, N. (2010, May). 17 Gb/s VCSEL driver using double-pulse asymmetric emphasis technique in 90-nm CMOS for optical interconnection. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on (pp. 1847-1850).
IEEE.

[22] Ohhata, K., Seki, K., Imamura, H., Takeshita, Y., Yamashita, K., Kanai, H., &
Chujo, N. (2008, December). A 90-nm CMOS 4× 10 Gb/s VCSEL driver using asymmetric emphasis technique for optical interconnection. In Microwave Conference, 2008. APMC 2008. Asia-Pacific (pp. 1-4). IEEE.

[23] Tsunoda, Y., Shiraishi, T., Sugawara, M., Oku, H., Ide, S., & Tanaka, K. (2013, March). 25-Gb/s transmission over 250-m MMF using over-drive of 10-Gb/s VCSEL by utilizing asymmetric pre-emphasis. In Optical Fiber Communication Conference (pp. OW1B-2). Optical Society of America.

[24] Yazaki, T., Chujo, N., Yamashita, H., Takemoto, T., Lee, Y., & Matsuoka, Y. (2014, June). 25-Gbps× 4 optical transmitter with adjustable asymmetric pre-emphasis in 65-nm CMOS. In Circuits and Systems (ISCAS), 2014 IEEE International Symposium on (pp. 2692-2695). IEEE.

[25] Raj, M., Monge, M., & Emami, A. (2016). A Modelling and Nonlinear Equalization
Technique for a 20 Gb/s 0.77 pJ/b VCSEL Transmitter in 32 nm SOI CMOS. IEEE
Journal of Solid-State Circuits, 51(8), 1734-1743.

[26] Ekşioğlu, E. M., & Kayran, A. H. (2005). Volterra kernel estimation for nonlinear communication channels using deterministic sequences. AEU-International Journal of Electronics and Communications, 59(2), 118-127.

[27] Li, H., Xuan, Z., Titriku, A., Li, C., Yu, K., Wang, B., Shafik, A., Qi, N., Liu, Y.,
Ding, R., & Baehr-Jones, T. (2015). A 25 Gb/s, 4.4 V-swing, AC-coupled ring
modulator-based WDM transmitter with wavelength stabilization in 65 nm CMOS. IEEE
Journal of Solid-State Circuits, 50(12), 3145-3159.

[28] Rylyakov, A., & Rylov, S. (2005). A low power 10 Gb/s serial link transmitter in
90-nm CMOS. In Compound Semiconductor Integrated Circuit Symposium, 2005.
CSIC'05. IEEE (pp. 4-pp). IEEE.

[29] Casper, B., Jaussi, J., O'Mahony, F., Mansuri, M., Canagasaby, K., Kennedy, J.,
Yeung, E., & Mooney, R. (2006, February). A 20Gb/s embedded clock transceiver in
90nm CMOS. In 2006 IEEE International Solid State Circuits Conference-Digest of
Technical Papers.

[30] Jiang, J. Y., Chiang, P. C., Hung, H. W., Lin, C. L., Yoon, T., & Lee, J. (2013, February). 100Gb/s ethernet chipsets in 65nm CMOS technology. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2013 IEEE International (pp. 120-121). IEEE.

[31] Hegazi, E., Sjoland, H., & Abidi, A. A. (2001). A filtering technique to lower LC oscillator phase noise. IEEE Journal of Solid-State Circuits, 36(12), 1921-1930.

[32] Belfiore, G., Khafaji, M., Henker, R., & Ellinger, F. (2017). A 50 Gb/s 190 mW Asymmetric 3-Tap FFE VCSEL Driver. IEEE Journal of Solid-State Circuits, 52(9), 2422-2429.